INFO4 - S8 - ID - Probabilistic reasoning systems (ID)
Overview of the sections
-
The need to represent knowledge and to reason from these representations has given rise to many models. Probabilistic graphical models, and more precisely Bayesian networks (BNs), introduced by Judea Pearl in the 1980s under the name of probabilistic expert systems, have proven to be very practical tools for representing uncertain knowledge and reasoning from incomplete information, in many fields such as bioinformatics, risk management, marketing, computer security, transportation, etc.
The goal of this course is:
- to define the notions of Bayesian network and probabilistic reasoning,
- to give an overview of the algorithms used to perform this probabilistic reasoning (or inference),
- to introduce the notion of automatic construction/learning of these models from data,
- to present the dynamic and relational extensions of these models.
-
- Lecture: slides part I - 2-5
- Assignments
- Please fill in the following document (Student statistics).
- Once the document is complete, estimate the following distributions:
- P(Gl, Re, Ta, Lo) [from the statistics gathered] This is a joint distribution, check that its sum is one.
- P(Gl, Re, Ta) [marginalization] This is another joint distribution summing to one.
- P(Lo | Gl, Re, Ta) [definition of conditional probability distribution] This is a conditional distribution, i.e. a set of distributions (one for each value of {Gl, Re, Ta}), so you can check that each of them sums to one.
- and so on for P(Gl, Re), P(Ta | Gl, Re), P(Gl) and P(Re | Gl)
- Finally, choose one configuration (set of values) for {Gl, Re, Ta, Lo} and check for this configuration that P(Gl, Re, Ta, Lo) = P(Gl) * P(Re | Gl) * P(Ta | Gl, Re) * P(Lo | Gl, Re, Ta) [chain rule] (see the sketch after this list)
- Let's imagine that Re and Gl are independent, and that Lo is independent of Re and Ta given Gl. How does the chain rule decomposition simplify when we add such assumptions?
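- if you want to automate these checks, here is a minimal Python sketch (assuming the gathered statistics sit in a pandas DataFrame with columns Gl, Re, Ta, Lo; the sample values below are hypothetical placeholders):
import pandas as pd

# hypothetical placeholder data: replace with the gathered student statistics
df = pd.DataFrame({"Gl": [0, 0, 1, 1, 0, 1],
                   "Re": [0, 1, 0, 1, 1, 0],
                   "Ta": [0, 0, 1, 1, 0, 1],
                   "Lo": [1, 0, 0, 1, 1, 0]})

# P(Gl, Re, Ta, Lo): normalized counts, a joint distribution summing to one
joint = df.value_counts(normalize=True)
assert abs(joint.sum() - 1.0) < 1e-9

# P(Gl, Re, Ta): marginalize Lo out by summing over its values
p_gl_re_ta = joint.groupby(level=["Gl", "Re", "Ta"]).sum()

# P(Lo | Gl, Re, Ta): divide the joint by the marginal of the conditioning set
p_lo_given = joint / joint.groupby(level=["Gl", "Re", "Ta"]).transform("sum")

# chain rule check for one configuration (pick one that occurs in the data)
def marg(levels):
    return joint.groupby(level=levels).sum()

g, r, t, l = 0, 1, 0, 1
lhs = joint[(g, r, t, l)]
rhs = (marg(["Gl"])[g]
       * marg(["Gl", "Re"])[(g, r)] / marg(["Gl"])[g]
       * marg(["Gl", "Re", "Ta"])[(g, r, t)] / marg(["Gl", "Re"])[(g, r)]
       * joint[(g, r, t, l)] / marg(["Gl", "Re", "Ta"])[(g, r, t)])
assert abs(lhs - rhs) < 1e-9  # the chain rule holds for any distribution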
-
2-3 - Bayesian network definition - dependence/independence modelling
- Lecture: slides part I - 7-17
- Assignments
- Let's consider 3 variables A, B and C. Which directed acyclic graphs encode the following assumptions? Provide a realistic example for each of these scenarios.
- A, B and C are all independent.
- A and B are "directly" dependent (they are dependent, and remain dependent conditionally on any other variable), B and C are independent, A and C are independent.
- A and B are "directly" dependent, B and C are "directly" dependent, A and C are dependent but independent given B.
- A and B are "directly" dependent, B and C are "directly" dependent, A and C are independent but dependent given B.
- Write the decomposition of the joint distribution as a product of local conditional distributions for the graph depicted below
- Markov equivalence:
- definition: "inferred edges" are the arcs that do not belong to a V-structure but whose orientation is nevertheless constrained (inverting them could, for example, create a new V-structure not present in the original graph)
- Let us consider 3 variables. How many different DAGs with 3 variables exist? Which ones are Markov equivalent? And how many different CPDAGs exist with these 3 variables? (you can cross-check a few cases with the sketch below)
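- a quick way to cross-check a few cases with pyAgrum (a sketch; it assumes binary variables built with fastBN, and that EssentialGraph exposes arcs() and edges()):
import pyAgrum as gum

# three DAGs over the same skeleton A - B - C
bn1 = gum.fastBN("A->B->C")   # chain
bn2 = gum.fastBN("C->B->A")   # reversed chain
bn3 = gum.fastBN("A->B<-C")   # V-structure

# Markov-equivalent DAGs share the same essential graph (CPDAG)
for bn in (bn1, bn2, bn3):
    eg = gum.EssentialGraph(bn)
    print(eg.arcs(), eg.edges())  # bn1 and bn2 coincide; the V-structure differs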
-
- Lecture: slides part I - 19-20
- cf slide 19
- implement your model in pyAgrum to answer the following questions (you can check the answers with simple logical reasoning; a possible modelling sketch is given after this list)
- what is the probability that the reading in cell 2 equals 0 (N2=0), i.e. you click on cell 2 and nothing happens (no bomb in cell 2 nor in its adjacent cells)?
- what is the probability distribution of N2 if you observed that N1=0 ? if you observed that N1=0 and N3=1 ?
- what is the probability of N4 when N1=0 and N3=1 ?
- bonus questions :
- let us imagine that we know the total number of bombs in the grid. How can we model this information in our BN? Add it to your pyAgrum implementation and answer the following questions:
- what is the probability of N2=0 if you know that there are 2 bombs in the grid ?
- what are the probability distributions of N2, N3 and N4 if you observed that N1=0 ?
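- a possible pyAgrum sketch (assumptions: a 1x4 line of cells, independent bomb priors of 0.5 per cell, and a binary reading Ni=1 iff there is a bomb in cell i or in an adjacent cell; adapt these choices to the actual model of slide 19):
import pyAgrum as gum

bn = gum.BayesNet("minesweeper")
for i in range(1, 5):
    bn.add(gum.LabelizedVariable(f"B{i}", f"bomb in cell {i}", 2))
    bn.add(gum.LabelizedVariable(f"N{i}", f"reading in cell {i}", 2))
for i in range(1, 5):
    bn.cpt(f"B{i}").fillWith([0.5, 0.5])  # assumed prior: a bomb with probability 0.5
    for j in (i - 1, i, i + 1):           # a reading depends on its cell and its neighbours
        if 1 <= j <= 4:
            bn.addArc(f"B{j}", f"N{i}")

# deterministic CPT: Ni=1 iff at least one bomb among cell i and its neighbours
for i in range(1, 5):
    cpt = bn.cpt(f"N{i}")
    I = gum.Instantiation(cpt)
    I.setFirst()
    while not I.end():
        bomb = any(I.val(v) == 1 for v in I.variablesSequence() if v.name() != f"N{i}")
        cpt.set(I, 1.0 if I.val(cpt.variable(0)) == int(bomb) else 0.0)
        I.inc()

ie = gum.LazyPropagation(bn)
ie.makeInference()
print(ie.posterior("N2"))                      # P(N2) without evidence
ie.setEvidence({"N1": 0, "N3": 1})
ie.makeInference()
print(ie.posterior("N2"), ie.posterior("N4"))  # P(N2 | N1=0, N3=1) and P(N4 | N1=0, N3=1)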
-
- Lecture: slides part I - 21-25
- DAG: how can we recognize a tree? a poly-tree? (cf. Wikipedia; a small check is sketched below)
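- a possible check in pyAgrum (a sketch; it assumes the DAG is connected):
import pyAgrum as gum

def is_polytree(bn):
    # a connected DAG is a polytree iff its skeleton has no cycle,
    # i.e. exactly (number of nodes - 1) arcs
    return bn.sizeArcs() == bn.size() - 1

def is_tree(bn):
    # a polytree is a tree iff every node has at most one parent
    return is_polytree(bn) and all(len(bn.parents(n)) <= 1 for n in bn.nodes())

print(is_tree(gum.fastBN("A->B->C")))      # True: a chain is a tree
print(is_tree(gum.fastBN("A->B<-C")))      # False: B has two parents...
print(is_polytree(gum.fastBN("A->B<-C")))  # ...but the skeleton is still acyclic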
-
- (original file: Animals.xdsl)
- BN used during the course: Animals2.xdsl (generated by replacing the logical dependencies by CPDs)
- Scenario 1 : without evidence
- initialization of λ vectors : ?
- first λ messages : λWa(Cl)= ? λBo(Cl)= ? λHa(Cl)= ? but also λEn(An)= ? λBe(An)= ?
- aggregation of some λ messages : λ(Cl)=?
- last λ messages : λCl(An)=?
- last aggregation : λ(An)=?
- initialization of π vectors ?
- first π messages : πEn(An)=? πCl(An)=? πBe(An)=?
- first π aggregations : π(En)=? π(Cl)=? π(Be)=?
- other π messages : πWa(Cl)=? πBo(Cl)=? πHa(Cl)=?
- last π aggregations : π(Wa)=? π(Bo)=? π(Ha)=?
- and finally, what is the probability of each variable given the (empty) evidence ? (check with pyAgrum)
- Scenario 2 : with evidence {Class=Mammal, Bears Young As=Eggs}
- initialization of λ vectors : ?
- first λ messages :
- λWa(Cl), λBo(Cl), λHa(Cl) are skipped because Cl is observed
- λEn(An)= ? λBe(An)= ? λCl(An)= ?
- aggregation : λ(An)=?
- initialization of π vectors ?
- first π messages :
- πCl(An) and πBe(An) are skipped because Cl is observed
- πEn(An)=?
- first π aggregation : π(En)
- other π messages : πWa(Cl)=? πBo(Cl)=? πHa(Cl)=? you can notice that the information about Be is not useful here!
- last π aggregations : π(Wa)=? π(Bo)=? π(Ha)=?
- and finally, what is the probability of each variable given this evidence ? (check with pyAgrum)
- Scenario 3 : with evidence {Warm Blooded=true, Bears Young As=Eggs}
- do it by yourself :-)
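- to check your hand-computed messages and posteriors, a small pyAgrum sketch (the variable names and evidence labels below are guesses; adapt them to the actual .xdsl file):
import pyAgrum as gum

bn = gum.loadBN("Animals2.xdsl")
ie = gum.LazyPropagation(bn)   # exact inference, equivalent to Pearl's algorithm on this tree
ie.makeInference()
for name in bn.names():
    print(name, ie.posterior(name))   # scenario 1: empty evidence

ie.setEvidence({"Class": "Mammal", "BearsYoungAs": "Eggs"})   # scenario 2 (names assumed)
ie.makeInference()
for name in bn.names():
    print(name, ie.posterior(name))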
-
- Lecture: slides part II - 3-7
- Assignment: ?
-
- Lecture: slides part II - 9-14 (complete data)
- Assignment
- Let's use the data provided in the file example-parameter-learning-complete.csv (cf. below)
- The CPDs (conditional probability distributions) corresponding to this network are P(A), P(B|A), P(D), P(C | A, D).
- The contingency tables associated with these CPDs are N(A), N(B,A), N(D), N(C, A, D). Fill these tables by counting the number of corresponding events in the dataset.
- Maximum likelihood: how can we estimate P(A) from N(A)? P(D) from N(D)? P(B|A) from N(B,A)? P(C | A,D) from N(C, A,D)? What are the results?
- Let's consider the following Dirichlet coefficients αA=[1 1], αD=[3 7], αBA=[1 1; 1 1], αCAD=[1 1 1 1; 1 1 1 1]. What are the CPD estimates with the expectation a posteriori approach? (see the sketch below)
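- a numeric sketch of both estimators (plain numpy; the counts below are hypothetical placeholders, use the contingency tables you filled in above):
import numpy as np

N_A = np.array([30, 20])       # hypothetical counts N(A=0), N(A=1)
alpha_A = np.array([1, 1])     # Dirichlet coefficients αA from the assignment

p_ml = N_A / N_A.sum()                            # maximum likelihood: N_i / sum_j N_j
p_eap = (N_A + alpha_A) / (N_A + alpha_A).sum()   # EAP: posterior mean of the Dirichlet
print(p_ml, p_eap)

# for a conditional distribution such as P(B | A), apply the same normalization
# to each slice of N(B, A) separately, i.e. one estimation per value of A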
- Lecture: slides part II - 15-17 (incomplete data)
- Assignment
- Let's use the data provided in the file example-parameter-learning-incomplete.csv (cf. below)
- How many of the 50 lines of data will we keep by applying the Complete Case Analysis approach?
- If we now consider the Available Case Analysis approach, how many lines will we use for the estimation of P(A)? P(D)? P(B | A)?
- EM algorithm :
- initialize the parameters by applying (by hand) the Available Case Analysis approach
- use pyAgrum to implement this network
- First E step :
- the first missing value occurs in the first sample, with A=0, C=0, D=0 and B missing.
- use your pyAgrum implementation to ask your current network (initialized with ACA) for P(B | A=0, C=0, D=0).
- what are the contingency tables impacted by this missing value ?
- if B were observed and equal to 0, N(B=0) would be increased by 1; if B were observed and equal to 1, N(B=1) would be increased by 1. Here B is missing, but we have an estimate of its probability, so N(B=0) and N(B=1) are increased by P(B=0 | A=0, C=0, D=0) and P(B=1 | A=0, C=0, D=0) respectively. The same reasoning applies to N(A,B), where two cells are impacted: N(A=0, B=0) and N(A=0, B=1).
- First M step: apply maximum likelihood estimation to compute a new version of each of your CPDs, and update your BN
- And officially, you can repeat this until convergence :-)
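- once done by hand, you can let pyAgrum run EM until convergence (a sketch; it assumes the structure A->B, A->C, D->C implied by the CPDs above, and that missing values are encoded as "?" in the CSV file):
import pyAgrum as gum

bn = gum.fastBN("A->B;A->C;D->C")   # structure matching P(A), P(B|A), P(D), P(C|A,D)
learner = gum.BNLearner("example-parameter-learning-incomplete.csv", bn, ["?"])
learner.useEM(1e-4)                 # stop when the log-likelihood gain falls below 1e-4
bnEM = learner.learnParameters(bn.dag())
print(bnEM.cpt("B"))                # compare with your hand-computed EM iteration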
-
- Lecture: slides part II - 21-35
- Assignment
- Scoring function: slide 26
- MWST algorithm: slide 31
- Greedy search algorithm: slide 34
-
- Use PyAgrum to load the Sachs BN
- init
import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.bn_vs_bn as bnvsbn
- example
bnTheo=gum.loadBN('sachs.xdsl')
EGTheo=gum.EssentialGraph(bnTheo)
gnb.sideBySide(bnTheo,EGTheo)
- generate several datasets from the Sachs network, for instance 5 datasets for each of the sizes = 50, 500, 1000, 5000
- example
gum.generateSample(bnTheo, 500, "sample_sachs_500_1.csv",True)
- for each dataset, you will be able to run at least two algorithms provided by pyAgrum
- Greedy Hill Climbing, which is a score-based algorithm that returns one DAG
learner=gum.BNLearner("sample_sachs_500_1.csv",bnTheo) #using bn as template for variables
learner.useGreedyHillClimbing()
learner.useScoreAIC() # or useScoreBIC, useScoreBDeu
bnApp=learner.learnBN()
print("Learned in {0}ms".format(1000*learner.currentTime()))
EGApp=gum.EssentialGraph(bnApp)
gnb.sideBySide(bnApp,EGApp)
- MIIC, which is a recent constraint-based algorithm that returns one essential graph
learner2=gum.BNLearner("sample_sachs_500_1.csv",bnTheo) #using bn as template for variables
learner2.useMIIC()
learner2.useNoCorrection()
EGApp=learner2.learnEssentialGraph()
print("Learned in {0}ms".format(1000*learner2.currentTime()))
gnb.sideBySide(EGTheo,EGApp)
- you can estimate the quality of the learnt model by comparing its essential graph to the theoretical one
bncmp=bnvsbn.GraphicalBNComparator(bnApp,bnTheo)
bncmp.hamming()
- where the hamming function returns 2 values :
- hamming : the Hamming distance between the skeletons
- structural hamming : the Hamming distance between the corresponding essential graphs
- questions :
- why are we repeating each experiment (for one data size) several times ?
- for one experiment, can you explain why it is "dangerous" to compare the two DAGs directly? and why comparing the two essential graphs is meaningful?
- plot the learning time and the structural Hamming distance with respect to the data size. What are your conclusions ?
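- a plotting sketch (matplotlib; times[s] and shd[s] are the lists of results you collected over the repeated runs for each size s):
import numpy as np
import matplotlib.pyplot as plt

sizes = [50, 500, 1000, 5000]
times = {s: [] for s in sizes}   # fill with the learner.currentTime() measurements
shd = {s: [] for s in sizes}     # fill with the structural Hamming distances

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.errorbar(sizes, [np.mean(times[s]) for s in sizes], yerr=[np.std(times[s]) for s in sizes])
ax1.set(xlabel="dataset size", ylabel="learning time (s)", xscale="log")
ax2.errorbar(sizes, [np.mean(shd[s]) for s in sizes], yerr=[np.std(shd[s]) for s in sizes])
ax2.set(xlabel="dataset size", ylabel="structural Hamming distance", xscale="log")
plt.tight_layout()
plt.show()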
-
- Lecture: slides part IV - PRM
- Assignment
- complete the definition of the PRM described in figure 1 below (for the sake of simplicity, this model is absolutely unrealistic)
- Movie.Genre domain is [Drama, Horror, Comedy]
- Movie.ReleaseDate domain is [<2000, >=2000]
- User.Gender domain is [Male, Female]
- User.Age domain is [<10, 10-18, >18]
- User.Occupation domain is [student, working, retired]
- Vote.Rating domain is [Low, High]
- What CPDs are needed but not defined in fig.1 ?
- Use the instance of the database provided below to generate the corresponding Ground Bayesian Network (a small grounding sketch is given after this list)
- Bonus question: "nightmare PRM": a person's eye color depends on the eye colors of two other persons, the one who is the mother of the first person and the one who is their father. Model this sentence with the simplest possible PRM.
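- a hand-grounding sketch in pyAgrum for a tiny instance (hypothetical: one movie m1, one user u1 and one vote v1; the parents of Rating are an assumption here, since figure 1 defines the actual dependencies):
import pyAgrum as gum

gbn = gum.BayesNet("groundBN")
gbn.add(gum.LabelizedVariable("m1_Genre", "", ["Drama", "Horror", "Comedy"]))
gbn.add(gum.LabelizedVariable("m1_ReleaseDate", "", ["<2000", ">=2000"]))
gbn.add(gum.LabelizedVariable("u1_Gender", "", ["Male", "Female"]))
gbn.add(gum.LabelizedVariable("u1_Age", "", ["<10", "10-18", ">18"]))
gbn.add(gum.LabelizedVariable("u1_Occupation", "", ["student", "working", "retired"]))
gbn.add(gum.LabelizedVariable("v1_Rating", "", ["Low", "High"]))
# one copy of the class-level dependencies per instance: here, a single vote
for parent in ["m1_Genre", "m1_ReleaseDate", "u1_Age"]:   # assumed parents of Rating
    gbn.addArc(parent, "v1_Rating")
# in a real grounding, every Vote instance would reuse the same class-level P(Rating | ...)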
-
How it works:
- one article per PERSON (and one person per article)
- analyze the article: synthesize the methods used, the experiments, the results, ... and give your personal conclusions on the article
- registration: use the forum to send me (one message from the delegate) the distribution of the articles, before March 25, 2024. Contact me at least ONE WEEK BEFORE if you want an article in a particular field.
- the report must be submitted on Madoc, in PDF format, before April 19, 2024, 7 p.m.