3 The Bayesian Network Representation
3.1 Exploiting Independence Properties
3.1.1 Independent Random Variables
3.1.2 The Conditional Parameterization
chain rule: P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn−1)
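The chain rule can be checked numerically. A minimal sketch (the joint distribution below is made up for illustration):

```python
# Verify the chain rule P(a, b, c) = P(a) * P(b | a) * P(c | a, b)
# on a small hand-made joint over three binary variables A, B, C.
from itertools import product

# Hypothetical joint distribution (sums to 1).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.20, (1, 0, 1): 0.10,
    (1, 1, 0): 0.15, (1, 1, 1): 0.05,
}

def marginal(assign):
    """Sum the joint over all entries consistent with a partial assignment
    {variable_index: value}."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[i] == v for i, v in assign.items()))

for a, b, c in product([0, 1], repeat=3):
    p_a = marginal({0: a})
    p_b_given_a = marginal({0: a, 1: b}) / p_a
    p_c_given_ab = marginal({0: a, 1: b, 2: c}) / marginal({0: a, 1: b})
    assert abs(joint[(a, b, c)] - p_a * p_b_given_a * p_c_given_ab) < 1e-12
```

The decomposition holds for any ordering of the variables, which is why the chain rule alone gives no savings; the savings come from the independence assumptions introduced next.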
3.1.3 The Naive Bayes Model
CPDs: conditional probability distributions
naive Bayes assumption: the features are conditionally independent given the instance’s class. (In other words, within each class of instances, the different properties can be determined independently.)
Concept: The Naive Bayes Model:
The model is often used in practice because of its simplicity and the small number of parameters required; it is generally used for classification.
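Under the naive Bayes assumption, the posterior factorizes as P(C | x1, …, xn) ∝ P(C) ∏i P(xi | C). A minimal classification sketch (the class names, feature names, and CPD numbers are all hypothetical):

```python
# Naive Bayes: score each class by prior * product of per-feature CPDs,
# then normalize.  All numbers below are illustrative.
prior = {"spam": 0.4, "ham": 0.6}
p_feature = {
    # P(feature = 1 | class)
    ("contains_link", "spam"): 0.7, ("contains_link", "ham"): 0.1,
    ("all_caps", "spam"): 0.5, ("all_caps", "ham"): 0.05,
}

def posterior(features):
    """Return P(class | features) for observed binary features."""
    scores = {}
    for c in prior:
        score = prior[c]
        for f, value in features.items():
            p1 = p_feature[(f, c)]
            score *= p1 if value else (1 - p1)   # conditional independence
        scores[c] = score
    z = sum(scores.values())                     # normalizing constant
    return {c: s / z for c, s in scores.items()}

print(posterior({"contains_link": 1, "all_caps": 1}))
```

Note the parameter count: one prior per class plus one number per (feature, class) pair, linear in the number of features rather than exponential.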
3.2 Bayesian Networks
They are not restricted to representing distributions satisfying the strong independence assumptions implicit in the naive Bayes model.
DAG: directed acyclic graph
3.2.1 The Student Example Revisited
local probability models
3.2.1.2 Reasoning Patterns
A joint distribution can be used to answer any probabilistic query: the probability of any event y given any observations e.
causal reasoning or prediction:
intelligence and the difficulty of class => the recommendation letter
evidential reasoning or explanation:
from effects to causes
grade and the recommendation letter => intelligence
intercausal reasoning:
Explaining away is an instance of a general reasoning pattern called intercausal reasoning, where different causes of the same effect can interact; this pattern is very common in human reasoning.
For example, when we have fever and a sore throat, and are concerned about mononucleosis, we are greatly relieved to be told we have the flu.
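The flu/mononucleosis example can be worked out numerically. A sketch with made-up probabilities, using the v-structure Flu → Fever ← Mono: observing Fever raises the probability of Mono (evidential reasoning), but additionally observing Flu lowers it again (explaining away):

```python
# Explaining away on Flu -> Fever <- Mono.  All CPD numbers are invented.
from itertools import product

p_flu, p_mono = 0.10, 0.01

def p_fever(flu, mono):
    # Hypothetical CPD P(Fever=1 | Flu, Mono).
    return {(0, 0): 0.02, (1, 0): 0.90, (0, 1): 0.95, (1, 1): 0.99}[(flu, mono)]

def joint(flu, mono, fever):
    pf = p_flu if flu else 1 - p_flu
    pm = p_mono if mono else 1 - p_mono
    pe = p_fever(flu, mono) if fever else 1 - p_fever(flu, mono)
    return pf * pm * pe          # Flu and Mono are a priori independent

def prob_mono(fever=None, flu=None):
    """P(Mono = 1 | evidence), by summing out the joint."""
    num = den = 0.0
    for f, m, e in product([0, 1], repeat=3):
        if fever is not None and e != fever:
            continue
        if flu is not None and f != flu:
            continue
        p = joint(f, m, e)
        den += p
        if m == 1:
            num += p
    return num / den

assert prob_mono(fever=1) > p_mono                 # fever raises P(Mono)
assert prob_mono(fever=1, flu=1) < prob_mono(fever=1)  # flu explains it away
```

With these numbers the two causes are marginally independent but become dependent once their common effect is observed, which is exactly the intercausal interaction described above.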
3.2.2.2 Bayesian Network Semantics
Definition: Bayesian network structure, local independencies
A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables X1, …, Xn. Let PaG(Xi) denote the parents of Xi in G, and NonDescendants(Xi) denote the variables in the graph that are not descendants of Xi. Then G encodes the following set of local independence assumptions: for each variable Xi, (Xi ⊥ NonDescendants(Xi) | PaG(Xi)).
3.2.3 Graphs and Distributions
I-Maps
Definition: independencies in P
Let P be a distribution over X . We define I(P) to be the set of independence assertions of the
form (X ⊥ Y | Z) that hold in P .
Definition: I-map
Let K be any graph object associated with a set of independencies I(K). We say that K is an
I-map for a set of independencies I if I(K) ⊆ I.
G is an I-map for P if G is an I-map for I(P): any independence that G asserts must also hold in P .
I-Map to Factorization
Definition: factorization
Let G be a BN graph over the variables X1, …, Xn. We say that a distribution P over the same space factorizes according to G if P can be expressed as a product:
P(X1, …, Xn) = ∏i P(Xi | PaG(Xi)).
This equation is called the chain rule for Bayesian networks.
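The chain rule for Bayesian networks can be sketched on the student example (Difficulty and Intelligence are parents of Grade, Intelligence is a parent of SAT, Grade is a parent of Letter). The CPD numbers below are illustrative, not taken from the text:

```python
# Chain rule for BNs on the student network:
# P(D, I, G, S, L) = P(D) P(I) P(G | I, D) P(S | I) P(L | G).
p_D = {0: 0.6, 1: 0.4}                      # Difficulty (1 = hard)
p_I = {0: 0.7, 1: 0.3}                      # Intelligence (1 = high)
p_G = {  # P(Grade | Intelligence, Difficulty), grades 1 (A) .. 3 (C)
    (0, 0): {1: 0.30, 2: 0.40, 3: 0.30},
    (0, 1): {1: 0.05, 2: 0.25, 3: 0.70},
    (1, 0): {1: 0.90, 2: 0.08, 3: 0.02},
    (1, 1): {1: 0.50, 2: 0.30, 3: 0.20},
}
p_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.20, 1: 0.80}}  # P(SAT | Intelligence)
p_L = {1: {0: 0.10, 1: 0.90},                          # P(Letter | Grade)
       2: {0: 0.40, 1: 0.60},
       3: {0: 0.99, 1: 0.01}}

def joint(d, i, g, s, l):
    """One entry of the joint, as a product of local CPDs."""
    return p_D[d] * p_I[i] * p_G[(i, d)][g] * p_S[i][s] * p_L[g][l]

# The products of CPDs define a valid joint distribution: entries sum to 1.
total = sum(joint(d, i, g, s, l)
            for d in p_D for i in p_I for g in (1, 2, 3)
            for s in (0, 1) for l in (0, 1))
assert abs(total - 1.0) < 1e-12
```

The full joint over these five variables would need 2·2·3·2·2 − 1 = 47 independent parameters; the factorized form above uses far fewer local parameters.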
Factorization to I-Map
3.3 Independencies in Graphs
3.3.1 D-separation
v-structure: X → Z ← Y
observed variable active trail: Let G be a BN structure, and X1 … Xn a trail in G. Let Z be a subset of observed variables. The trail X1 … Xn is active given Z if
• Whenever we have a v-structure Xi−1 → Xi ← Xi+1, then Xi or one of its descendants is in Z;
• no other node along the trail is in Z.
d-separation: Let X, Y , Z be three sets of nodes in G. We say that X and Y are d-separated given Z, denoted d-sepG (X; Y | Z), if there is no active trail between any node X ∈ X and Y ∈ Y given Z.
We use I(G) to denote the set of independencies that correspond to d-separation:
I(G) = {(X ⊥ Y | Z) : d-sepG (X; Y | Z)}.
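For small graphs, d-separation can be checked directly from the definitions above by enumerating all trails between two nodes and testing whether any is active. A sketch on the student network (real implementations use a linear-time reachability scheme instead of trail enumeration):

```python
# d-separation by brute-force trail enumeration in a small DAG,
# given as {node: list-of-parents}.  Student network as an example.
parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

def descendants(x):
    """All descendants of x in the DAG."""
    out, stack = set(), [x]
    while stack:
        n = stack.pop()
        for child, ps in parents.items():
            if n in ps and child not in out:
                out.add(child)
                stack.append(child)
    return out

def trails(x, y, path):
    """All simple trails from x to y in the skeleton (edges either way)."""
    if x == y:
        yield path
        return
    for n in parents:
        connected = n in parents[x] or x in parents[n]
        if connected and n not in path:
            yield from trails(n, y, path + [n])

def is_active(trail, z):
    """Apply the active-trail definition to one trail given observed set z."""
    for a, b, c in zip(trail, trail[1:], trail[2:]):
        if a in parents[b] and c in parents[b]:      # v-structure a -> b <- c
            if b not in z and not (descendants(b) & z):
                return False                         # blocked collider
        elif b in z:
            return False                             # observed non-collider
    return True

def d_separated(x, y, z):
    return not any(is_active(t, set(z)) for t in trails(x, y, [x]))

assert d_separated("D", "S", [])         # blocked at the collider G
assert not d_separated("D", "S", ["G"])  # observing G activates D -> G <- I
```

Note how the two assertions mirror the v-structure clause: the only trail from D to S passes through the collider at G, which is blocked when G is unobserved and activated when G (or a descendant such as L) is observed.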
This set is also called the set of global Markov independencies.
soundness of d-separation: if we find that two nodes X and Y are d-separated given some Z, then we are guaranteed that they are, in fact, conditionally independent given Z.
completeness of d-separation: d-separation detects all possible independencies.
Definition faithful: A distribution P is faithful to G if, whenever (X ⊥ Y | Z) ∈ I(P ), then d-sepG (X; Y | Z). In other words, any independence in P is reflected in the d-separation properties of the graph.
if X and Y are not d-separated given Z in G, then X and Y are dependent given Z in some distribution P that factorizes over G (indeed, in almost all such distributions).
I-Equivalence
I = independence
The three structures X → Z → Y, X ← Z ← Y, and X ← Z → Y all encode precisely the same independence assumption: (X ⊥ Y | Z).
I-equivalence: Two graph structures K1 and K2 over X are I-equivalent if I(K1 ) = I(K2). The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes, which are the set of equivalence classes induced by the I-equivalence relation.
skeleton: the undirected graph obtained from a BN graph G by replacing every directed edge with an undirected one.
Theorem: let G1 and G2 be two graphs over X; if G1 and G2 have the same skeleton and the same set of v-structures, then they are I-equivalent.
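The skeleton/v-structure test is easy to implement. A sketch using the three-node structures from the I-equivalence discussion (graphs given as {node: set-of-parents}):

```python
# Sufficient condition for I-equivalence: same skeleton + same v-structures.
def skeleton(g):
    """Undirected edges of the DAG, as frozensets."""
    return {frozenset((c, p)) for c in g for p in g[c]}

def v_structures(g):
    """Triples (a, b, c) with a -> b <- c and no edge between a and c."""
    out = set()
    for b in g:
        for a in g[b]:
            for c in g[b]:
                if a < c and frozenset((a, c)) not in skeleton(g):
                    out.add((a, b, c))
    return out

def same_skeleton_and_v_structures(g1, g2):
    return (skeleton(g1) == skeleton(g2)
            and v_structures(g1) == v_structures(g2))

# X -> Z -> Y and X <- Z <- Y share a skeleton and have no v-structures,
# so the test certifies their I-equivalence; X -> Z <- Y has a v-structure
# and is not I-equivalent to either chain.
chain1 = {"X": set(), "Z": {"X"}, "Y": {"Z"}}
chain2 = {"Y": set(), "Z": {"Y"}, "X": {"Z"}}
collider = {"X": set(), "Y": set(), "Z": {"X", "Y"}}
assert same_skeleton_and_v_structures(chain1, chain2)
assert not same_skeleton_and_v_structures(chain1, collider)
```

The v-structure check excludes pairs of parents that are themselves adjacent, since a covered collider does not constrain the independencies in the same way.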
minimal I-map: A graph K is a minimal I-map for a set of independencies I if it is an I-map for I, and if the removal of even a single edge from K renders it not an I-map.
perfect map: We say that a graph K is a perfect map (P-map) for a set of independencies I if we have that I(K) = I. We say that K is a perfect map for P if I(K) = I(P ).