11 - Learning: Sachs
Section outline
- The report and your answers to the questions are part of your evaluation
- Use pyAgrum to load the Sachs BN
- init

```python
import pyagrum as gum
import pyagrum.lib.notebook as gnb
import pyagrum.lib.bn_vs_bn as bnvsbn
```

- example

```python
bnTheo = gum.loadBN('sachs.xdsl')
EGTheo = gum.EssentialGraph(bnTheo)
gnb.sideBySide(bnTheo, EGTheo)
```
- generate several datasets from the Sachs network, for instance 5 datasets for each of the sizes 50, 500, 1000 and 5000
- example

```python
gum.generateSample(bnTheo, 500, "sample_sachs_500_1.csv", True)
```
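The single call above can be wrapped in a small loop to produce all 5 × 4 datasets. This is only a sketch: the helper names (`dataset_specs`, `generate_datasets`) are ours, not part of pyAgrum.

```python
def dataset_specs(sizes=(50, 500, 1000, 5000), n_repeats=5):
    """List (size, filename) pairs for every dataset to generate."""
    return [(size, f"sample_sachs_{size}_{k}.csv")
            for size in sizes
            for k in range(1, n_repeats + 1)]

def generate_datasets(bn, specs):
    """Write one CSV sample per (size, filename) pair with pyAgrum."""
    import pyagrum as gum  # deferred import: dataset_specs itself needs no pyAgrum
    for size, name in specs:
        gum.generateSample(bn, size, name, False)

# generate_datasets(bnTheo, dataset_specs())  # would write 20 CSV files
```

Keeping the size and repetition index in the filename makes it easy to recover them later when aggregating results.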
- for each dataset, you will be able to run at least two algorithms provided by pyAgrum
- Greedy Hill Climbing, which is a score-based algorithm that returns one DAG
```python
learner = gum.BNLearner("sample_sachs_500_1.csv", bnTheo)  # using bnTheo as template for the variables
learner.useGreedyHillClimbing()
learner.useScoreAIC()  # or useScoreBIC, useScoreBDeu
bnApp = learner.learnBN()
print("Learned in {0}ms".format(1000 * learner.currentTime()))
EGApp = gum.EssentialGraph(bnApp)
gnb.sideBySide(EGTheo, EGApp)
```

- MIIC, which is a recent constraint-based algorithm that returns one essential graph

```python
learner2 = gum.BNLearner("sample_sachs_500_1.csv", bnTheo)  # using bnTheo as template for the variables
learner2.useMIIC()
learner2.useNoCorrection()  # test with / without this correction
bnApp2 = learner2.learnBN()
print("Learned in {0}ms".format(1000 * learner2.currentTime()))
EGApp2 = gum.EssentialGraph(bnApp2)
gnb.sideBySide(EGTheo, EGApp2)
```
- you can estimate the quality of the learnt model by comparing its essential graph to the theoretical one
- example

```python
bncmp = bnvsbn.GraphicalBNComparator(bnApp, bnTheo)
bncmp.hamming()
```

- where the hamming function returns 2 values:
- hamming: the Hamming distance between the skeletons
- structural hamming: the Hamming distance between the corresponding essential graphs
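As a small illustration of reading those two values (the dictionary keys below match the dict returned by pyAgrum's `bn_vs_bn` module; the `report_hamming` helper itself is our own):

```python
def report_hamming(h):
    """Format the dict returned by GraphicalBNComparator.hamming()."""
    return (f"skeleton Hamming distance: {h['hamming']}, "
            f"structural Hamming distance: {h['structural hamming']}")

# e.g. print(report_hamming(bncmp.hamming()))
```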
- questions:
- why are we repeating each experiment (for one data size) several times?
- for one experiment, can you explain why it is "dangerous" to compare the two DAGs directly, and why comparing the two essential graphs is meaningful?
- plot the learning time and the structural Hamming distance with respect to the data size. Test several scoring functions or other algorithm parameters. What are your conclusions?
- example
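One way to organize the plotting experiment is sketched below. The names `mean_by_size`, `run_experiment` and `plot_results` are our own; the sketch assumes the 20 datasets generated earlier exist on disk, and it fixes Greedy Hill Climbing with BIC as one configuration among those you should compare.

```python
from collections import defaultdict

def mean_by_size(records):
    """records: iterable of (size, value) -> {size: mean value}, sizes sorted."""
    acc = defaultdict(list)
    for size, v in records:
        acc[size].append(v)
    return {s: sum(vs) / len(vs) for s, vs in sorted(acc.items())}

def run_experiment(sizes=(50, 500, 1000, 5000), n_repeats=5):
    """Learn a BN on every dataset; collect (size, time) and (size, SHD) pairs."""
    import pyagrum as gum
    import pyagrum.lib.bn_vs_bn as bnvsbn
    bnTheo = gum.loadBN('sachs.xdsl')
    times, shds = [], []
    for size in sizes:
        for k in range(1, n_repeats + 1):
            learner = gum.BNLearner(f"sample_sachs_{size}_{k}.csv", bnTheo)
            learner.useGreedyHillClimbing()
            learner.useScoreBIC()  # swap in other scores here to compare
            bnApp = learner.learnBN()
            times.append((size, learner.currentTime()))
            comparator = bnvsbn.GraphicalBNComparator(bnApp, bnTheo)
            shds.append((size, comparator.hamming()["structural hamming"]))
    return mean_by_size(times), mean_by_size(shds)

def plot_results(mean_times, mean_shds):
    """Plot mean learning time and mean SHD against dataset size."""
    import matplotlib.pyplot as plt
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(list(mean_times), list(mean_times.values()), marker="o")
    ax1.set(xlabel="dataset size", ylabel="learning time (s)", xscale="log")
    ax2.plot(list(mean_shds), list(mean_shds.values()), marker="o")
    ax2.set(xlabel="dataset size", ylabel="structural Hamming distance", xscale="log")
    plt.tight_layout()
    plt.show()

# plot_results(*run_experiment())
```

Averaging over the 5 repetitions per size is what makes the repeated experiments useful: a single run only gives one noisy point per size.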