Section overview

    • Use pyAgrum to load the Sachs BN
      • init
        import pyAgrum as gum
        import pyAgrum.lib.notebook as gnb
        import pyAgrum.lib.bn_vs_bn as bnvsbn
      • example
        bnTheo=gum.loadBN('sachs.xdsl')
        EGTheo=gum.EssentialGraph(bnTheo)
        gnb.sideBySide(bnTheo,EGTheo)

    • generate several datasets from the Sachs network, for instance 5 datasets for each of the sizes 50, 500, 1000 and 5000 (a loop sketch is given at the end of this section)
      • example
        gum.generateSample(bnTheo, 500, "sample_sachs_500_1.csv",True)
          • for each dataset, you will be able to run at least two algorithms provided by pyAgrum
            • Greedy Hill Climbing, which is a score-based algorithm that returns one DAG
              learner=gum.BNLearner("sample_sachs_500_1.csv",bnTheo) #using bn as template for variables
              learner.useGreedyHillClimbing()
              learner.useScoreAIC() # or useScoreBIC, useScoreBDeu
              bnApp=learner.learnBN()
              print("Learned in {0}ms".format(1000*learner.currentTime()))
              EGApp=gum.EssentialGraph(bnApp)
              gnb.sideBySide(bnApp,EGApp)
            • MIIC, which is a recent constraint-based algorithm that returns one essential graph
              learner2=gum.BNLearner("sample_sachs_500_1.csv",bnTheo) #using bn as template for variables
              learner2.useMIIC()
              learner2.useNoCorrection()
              EGApp=learner2.learnEssentialGraph()
              print("Learned in {0}ms".format(1000*learner2.currentTime()))
              gnb.sideBySide(EGTheo,EGApp)

          • you can estimate the quality of the learnt model by comparing its essential graph to the theoretical one
            • bncmp=bnvsbn.GraphicalBNComparator(bnApp,bnTheo)
              bncmp.hamming()
            • where the hamming function returns 2 values:
              • hamming: the Hamming distance between the skeletons
              • structural hamming: the Hamming distance between the corresponding essential graphs

          • questions:
            1. Why do we repeat each experiment (for one data size) several times?
            2. For a single experiment, can you explain why it is "dangerous" to compare the two DAGs directly, and why comparing the two essential graphs is meaningful?
            3. Plot the learning time and the structural Hamming distance with respect to the data size (see the sketch at the end of this section). What are your conclusions?
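
A minimal sketch of the dataset-generation loop, assuming the file naming convention sample_sachs_<size>_<run>.csv used in the example above; the sizes and the 5 repetitions per size are the ones suggested in the exercise:

    import pyAgrum as gum

    bnTheo = gum.loadBN('sachs.xdsl')

    sizes = [50, 500, 1000, 5000]   # dataset sizes suggested above
    nb_runs = 5                     # repetitions per size

    for size in sizes:
        for run in range(1, nb_runs + 1):
            # same call and trailing flag as in the single example above
            gum.generateSample(bnTheo, size, f"sample_sachs_{size}_{run}.csv", True)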
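
As a starting point for question 3, here is a minimal sketch, assuming that GraphicalBNComparator.hamming() returns a dictionary with the keys 'hamming' and 'structural hamming' as described above, and using matplotlib (not part of the init cell) for the plots; it learns a BN with Greedy Hill Climbing on every dataset, averages the learning time and the structural Hamming distance over the 5 runs, and plots both against the data size:

    import pyAgrum as gum
    import pyAgrum.lib.bn_vs_bn as bnvsbn
    import matplotlib.pyplot as plt

    bnTheo = gum.loadBN('sachs.xdsl')
    sizes = [50, 500, 1000, 5000]
    nb_runs = 5

    mean_times, mean_shd = [], []
    for size in sizes:
        times, shds = [], []
        for run in range(1, nb_runs + 1):
            learner = gum.BNLearner(f"sample_sachs_{size}_{run}.csv", bnTheo)
            learner.useGreedyHillClimbing()
            learner.useScoreBIC()                        # any of the scores above could be used
            bnApp = learner.learnBN()
            times.append(1000 * learner.currentTime())   # learning time in ms
            comparator = bnvsbn.GraphicalBNComparator(bnApp, bnTheo)
            shds.append(comparator.hamming()['structural hamming'])  # distance between essential graphs
        mean_times.append(sum(times) / nb_runs)
        mean_shd.append(sum(shds) / nb_runs)

    # one curve per quantity, both as a function of the dataset size
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(sizes, mean_times, marker='o')
    ax1.set_xlabel("dataset size")
    ax1.set_ylabel("mean learning time (ms)")
    ax2.plot(sizes, mean_shd, marker='o')
    ax2.set_xlabel("dataset size")
    ax2.set_ylabel("mean structural Hamming distance")
    plt.tight_layout()
    plt.show()

The MIIC learner from the previous cell can be plugged into the same loop, provided the comparison step is adapted to the essential graph it returns.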