Section summary

    • Report + Question answers are part of your evaluation
    • Use pyAgrum to load the Sachs BN
      • init
        import pyagrum as gum
        import pyagrum.lib.notebook as gnb
        import pyagrum.lib.bn_vs_bn as bnvsbn
      • example
        bnTheo=gum.loadBN('sachs.xdsl')
        EGTheo=gum.EssentialGraph(bnTheo)
        gnb.sideBySide(bnTheo,EGTheo)
    • generate several datasets from the Sachs network, for instance 5 datasets for each of the sizes 50, 500, 1000 and 5000
      • example
        gum.generateSample(bnTheo, 500, "sample_sachs_500_1.csv",True)
        • for each dataset, you will be able to run at least two algorithms provided by pyAgrum
          • Greedy Hill Climbing, which is a score-based algorithm that returns one DAG
            learner=gum.BNLearner("sample_sachs_500_1.csv",bnTheo) #using bn as template for variables
            learner.useGreedyHillClimbing()
            learner.useScoreAIC() # or useScoreBIC, useScoreBDeu
            bnApp=learner.learnBN()
            print("Learned in {0}ms".format(1000*learner.currentTime()))
            EGApp=gum.EssentialGraph(bnApp)
            gnb.sideBySide(EGTheo,EGApp)
          • MIIC, which is a recent constraint-based algorithm that returns one essential graph
            learner2=gum.BNLearner("sample_sachs_500_1.csv",bnTheo) #using bn as template for variables
            learner2.useMIIC()
            learner2.useNoCorrection() # compare results with and without this correction
            bnApp2=learner2.learnBN()
            print("Learned in {0}ms".format(1000*learner2.currentTime()))
            EGApp2=gum.EssentialGraph(bnApp2)
            gnb.sideBySide(EGTheo,EGApp2)
             
        • you can estimate the quality of the learnt model by comparing its essential graph to the theoretical one
          • bncmp=bnvsbn.GraphicalBNComparator(bnApp,bnTheo)
            bncmp.hamming()
          • where the hamming function returns a dictionary with two entries:
            • hamming: the Hamming distance between the skeletons of the two DAGs
            • structural hamming: the Hamming distance between the corresponding essential graphs

        • questions:
          1. why are we repeating each experiment (for one data size) several times?
          2. for one experiment, can you explain why it is "dangerous" to compare the two DAGs directly, and why comparing both essential graphs is meaningful?
          3. plot the learning time and the structural Hamming distance with respect to the data size. Test several scoring functions or other algorithm parameters. What are your conclusions?