3 The Bayesian Network Representation
3.1 Exploiting Independence Properties
3.1.1 Independent Random Variables
3.1.2 The Conditional Parameterization
chain rule: P(X1, …, Xn) = P(X1) P(X2 | X1) ⋯ P(Xn | X1, …, Xn−1)
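The chain rule can be checked numerically. A minimal sketch (the joint distribution below is made up for illustration):

```python
# Verify the chain rule P(a, b, c) = P(a) * P(b | a) * P(c | a, b)
# on a small hand-made joint over three binary variables A, B, C.
from itertools import product

# Hypothetical joint distribution (sums to 1).
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.20, (1, 0, 1): 0.10,
    (1, 1, 0): 0.15, (1, 1, 1): 0.05,
}

def marginal(assign):
    """Sum the joint over all entries consistent with a partial assignment
    {variable_index: value}."""
    return sum(p for outcome, p in joint.items()
               if all(outcome[i] == v for i, v in assign.items()))

for a, b, c in product([0, 1], repeat=3):
    p_a = marginal({0: a})
    p_b_given_a = marginal({0: a, 1: b}) / p_a
    p_c_given_ab = marginal({0: a, 1: b, 2: c}) / marginal({0: a, 1: b})
    assert abs(joint[(a, b, c)] - p_a * p_b_given_a * p_c_given_ab) < 1e-12
```

The decomposition holds for any ordering of the variables, which is why the chain rule alone gives no savings; the savings come from the independence assumptions introduced next.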
3.1.3 The Naive Bayes Model
CPDs: conditional probability distributions
naive Bayes assumption: the features are conditionally independent given the instance’s class. (In other words, within each class of instances, the different properties can be determined independently.)
Concept: The Naive Bayes Model:
The model is often used in practice because of its simplicity and the small number of parameters required; it is generally used for classification.
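Under the naive Bayes assumption, the posterior factorizes as P(C | x1, …, xn) ∝ P(C) ∏i P(xi | C). A minimal classification sketch (the class names, feature names, and CPD numbers are all hypothetical):

```python
# Naive Bayes: score each class by prior * product of per-feature CPDs,
# then normalize.  All numbers below are illustrative.
prior = {"spam": 0.4, "ham": 0.6}
p_feature = {
    # P(feature = 1 | class)
    ("contains_link", "spam"): 0.7, ("contains_link", "ham"): 0.1,
    ("all_caps", "spam"): 0.5, ("all_caps", "ham"): 0.05,
}

def posterior(features):
    """Return P(class | features) for observed binary features."""
    scores = {}
    for c in prior:
        score = prior[c]
        for f, value in features.items():
            p1 = p_feature[(f, c)]
            score *= p1 if value else (1 - p1)   # conditional independence
        scores[c] = score
    z = sum(scores.values())                     # normalizing constant
    return {c: s / z for c, s in scores.items()}

print(posterior({"contains_link": 1, "all_caps": 1}))
```

Note the parameter count: one prior per class plus one number per (feature, class) pair, linear in the number of features rather than exponential.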
3.2 Bayesian Networks
They are not restricted to representing distributions satisfying the strong independence assumptions implicit in the naive Bayes model.
DAG: directed acyclic graph
3.2.1 The Student Example Revisited
local probability models
3.2.1.2 Reasoning Patterns
A joint distribution can be used to answer any probabilistic query: the probability of any event y given any observations e.
causal reasoning or prediction:
intelligence and the difficulty of class => the recommendation letter
evidential reasoning or explanation:
from effects to causes
grade and the recommendation letter => intelligence
intercausal reasoning:
Explaining away is an instance of a general reasoning pattern called intercausal reasoning, where different causes of the same effect can interact; this pattern is very common in human reasoning.
For example, when we have fever and a sore throat, and are concerned about mononucleosis, we are greatly relieved to be told we have the flu.
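The flu/mononucleosis example can be worked out numerically. A sketch with made-up probabilities, using the v-structure Flu → Fever ← Mono: observing Fever raises the probability of Mono (evidential reasoning), but additionally observing Flu lowers it again (explaining away):

```python
# Explaining away on Flu -> Fever <- Mono.  All CPD numbers are invented.
from itertools import product

p_flu, p_mono = 0.10, 0.01

def p_fever(flu, mono):
    # Hypothetical CPD P(Fever=1 | Flu, Mono).
    return {(0, 0): 0.02, (1, 0): 0.90, (0, 1): 0.95, (1, 1): 0.99}[(flu, mono)]

def joint(flu, mono, fever):
    pf = p_flu if flu else 1 - p_flu
    pm = p_mono if mono else 1 - p_mono
    pe = p_fever(flu, mono) if fever else 1 - p_fever(flu, mono)
    return pf * pm * pe          # Flu and Mono are a priori independent

def prob_mono(fever=None, flu=None):
    """P(Mono = 1 | evidence), by summing out the joint."""
    num = den = 0.0
    for f, m, e in product([0, 1], repeat=3):
        if fever is not None and e != fever:
            continue
        if flu is not None and f != flu:
            continue
        p = joint(f, m, e)
        den += p
        if m == 1:
            num += p
    return num / den

assert prob_mono(fever=1) > p_mono                 # fever raises P(Mono)
assert prob_mono(fever=1, flu=1) < prob_mono(fever=1)  # flu explains it away
```

With these numbers the two causes are marginally independent but become dependent once their common effect is observed, which is exactly the intercausal interaction described above.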
3.2.2.2 Bayesian Network Semantics
Definition: Bayesian network structure, local independencies
A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables X1, …, Xn. Let PaG(Xi) denote the parents of Xi in G, and NonDescendants(Xi) denote the variables in the graph that are not descendants of Xi. Then G encodes the following set of local independence assumptions: for each variable Xi, (Xi ⊥ NonDescendants(Xi) | PaG(Xi)).
3.2.3 Graphs and Distributions
I-Maps
Definition: independencies in P
Let P be a distribution over X . We define I(P) to be the set of independence assertions of the
form (X ⊥ Y | Z) that hold in P .
Definition: I-map
Let K be any graph object associated with a set of independencies I(K). We say that K is an
I-map for a set of independencies I if I(K) ⊆ I.
G is an I-map for P if G is an I-map for I(P): any independence that G asserts must also hold in P .
I-Map to Factorization
Definition: factorization
Let G be a BN graph over the variables X1, …, Xn. We say that a distribution P over the same space factorizes according to G if P can be expressed as a product:
P(X1, …, Xn) = ∏i P(Xi | PaG(Xi)).
This equation is called the chain rule for Bayesian networks.
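The chain rule for Bayesian networks can be sketched on the student example (Difficulty and Intelligence are parents of Grade, Intelligence is a parent of SAT, Grade is a parent of Letter). The CPD numbers below are illustrative, not taken from the text:

```python
# Chain rule for BNs on the student network:
# P(D, I, G, S, L) = P(D) P(I) P(G | I, D) P(S | I) P(L | G).
p_D = {0: 0.6, 1: 0.4}                      # Difficulty (1 = hard)
p_I = {0: 0.7, 1: 0.3}                      # Intelligence (1 = high)
p_G = {  # P(Grade | Intelligence, Difficulty), grades 1 (A) .. 3 (C)
    (0, 0): {1: 0.30, 2: 0.40, 3: 0.30},
    (0, 1): {1: 0.05, 2: 0.25, 3: 0.70},
    (1, 0): {1: 0.90, 2: 0.08, 3: 0.02},
    (1, 1): {1: 0.50, 2: 0.30, 3: 0.20},
}
p_S = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.20, 1: 0.80}}  # P(SAT | Intelligence)
p_L = {1: {0: 0.10, 1: 0.90},                          # P(Letter | Grade)
       2: {0: 0.40, 1: 0.60},
       3: {0: 0.99, 1: 0.01}}

def joint(d, i, g, s, l):
    """One entry of the joint, as a product of local CPDs."""
    return p_D[d] * p_I[i] * p_G[(i, d)][g] * p_S[i][s] * p_L[g][l]

# The products of CPDs define a valid joint distribution: entries sum to 1.
total = sum(joint(d, i, g, s, l)
            for d in p_D for i in p_I for g in (1, 2, 3)
            for s in (0, 1) for l in (0, 1))
assert abs(total - 1.0) < 1e-12
```

The full joint over these five variables would need 2·2·3·2·2 − 1 = 47 independent parameters; the factorized form above uses far fewer local parameters.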
Factorization to I-Map
3.3 Independencies in Graphs
3.3.1 D-separation
v-structure: X → Z ← Y
observed variable active trail: Let G be a BN structure, and X1 … Xn a trail in G. Let Z be a subset of observed variables. The trail X1 … Xn is active given Z if
• Whenever we have a v-structure Xi−1 → Xi ← Xi+1, then Xi or one of its descendants is in Z;
• no other node along the trail is in Z.
d-separation: Let X, Y , Z be three sets of nodes in G. We say that X and Y are d-separated given Z, denoted d-sepG (X; Y | Z), if there is no active trail between any node X ∈ X and Y ∈ Y given Z.
We use I(G) to denote the set of independencies that correspond to d-separation:
I(G) = {(X ⊥ Y | Z) : d-sepG (X; Y | Z)}.
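For small graphs, d-separation can be checked directly from the definitions above by enumerating all trails between two nodes and testing whether any is active. A sketch on the student network (real implementations use a linear-time reachability scheme instead of trail enumeration):

```python
# d-separation by brute-force trail enumeration in a small DAG,
# given as {node: list-of-parents}.  Student network as an example.
parents = {"D": [], "I": [], "G": ["D", "I"], "S": ["I"], "L": ["G"]}

def descendants(x):
    """All descendants of x in the DAG."""
    out, stack = set(), [x]
    while stack:
        n = stack.pop()
        for child, ps in parents.items():
            if n in ps and child not in out:
                out.add(child)
                stack.append(child)
    return out

def trails(x, y, path):
    """All simple trails from x to y in the skeleton (edges either way)."""
    if x == y:
        yield path
        return
    for n in parents:
        connected = n in parents[x] or x in parents[n]
        if connected and n not in path:
            yield from trails(n, y, path + [n])

def is_active(trail, z):
    """Apply the active-trail definition to one trail given observed set z."""
    for a, b, c in zip(trail, trail[1:], trail[2:]):
        if a in parents[b] and c in parents[b]:      # v-structure a -> b <- c
            if b not in z and not (descendants(b) & z):
                return False                         # blocked collider
        elif b in z:
            return False                             # observed non-collider
    return True

def d_separated(x, y, z):
    return not any(is_active(t, set(z)) for t in trails(x, y, [x]))

assert d_separated("D", "S", [])         # blocked at the collider G
assert not d_separated("D", "S", ["G"])  # observing G activates D -> G <- I
```

Note how the two assertions mirror the v-structure clause: the only trail from D to S passes through the collider at G, which is blocked when G is unobserved and activated when G (or a descendant such as L) is observed.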
This set is also called the set of global Markov independencies.
soundness of d-separation: if we find that two nodes X and Y are d-separated given some Z, then we are guaranteed that they are, in fact, conditionally independent given Z.
completeness of d-separation: d-separation detects all possible independencies.
Definition faithful: A distribution P is faithful to G if, whenever (X ⊥ Y | Z) ∈ I(P ), then d-sepG (X; Y | Z). In other words, any independence in P is reflected in the d-separation properties of the graph.
if X and Y are not d-separated given Z in G, then X and Y are dependent given Z in some distribution P that factorizes over G (indeed, in almost all such distributions).
I-Equivalence
I = independence
The three structures X → Z → Y, X ← Z ← Y, and X ← Z → Y all encode precisely the same independence assumption: (X ⊥ Y | Z).
I-equivalence: Two graph structures K1 and K2 over X are I-equivalent if I(K1 ) = I(K2). The set of all graphs over X is partitioned into a set of mutually exclusive and exhaustive I-equivalence classes, which are the set of equivalence classes induced by the I-equivalence relation.
skeleton: the undirected graph obtained from a BN graph G by replacing every directed edge with an undirected one.
Theorem: let G1 and G2 be two graphs over X; if G1 and G2 have the same skeleton and the same set of v-structures, then they are I-equivalent.
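The skeleton/v-structure test is easy to implement. A sketch using the three-node structures from the I-equivalence discussion (graphs given as {node: set-of-parents}):

```python
# Sufficient condition for I-equivalence: same skeleton + same v-structures.
def skeleton(g):
    """Undirected edges of the DAG, as frozensets."""
    return {frozenset((c, p)) for c in g for p in g[c]}

def v_structures(g):
    """Triples (a, b, c) with a -> b <- c and no edge between a and c."""
    out = set()
    for b in g:
        for a in g[b]:
            for c in g[b]:
                if a < c and frozenset((a, c)) not in skeleton(g):
                    out.add((a, b, c))
    return out

def same_skeleton_and_v_structures(g1, g2):
    return (skeleton(g1) == skeleton(g2)
            and v_structures(g1) == v_structures(g2))

# X -> Z -> Y and X <- Z <- Y share a skeleton and have no v-structures,
# so the test certifies their I-equivalence; X -> Z <- Y has a v-structure
# and is not I-equivalent to either chain.
chain1 = {"X": set(), "Z": {"X"}, "Y": {"Z"}}
chain2 = {"Y": set(), "Z": {"Y"}, "X": {"Z"}}
collider = {"X": set(), "Y": set(), "Z": {"X", "Y"}}
assert same_skeleton_and_v_structures(chain1, chain2)
assert not same_skeleton_and_v_structures(chain1, collider)
```

The v-structure check excludes pairs of parents that are themselves adjacent, since a covered collider does not constrain the independencies in the same way.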
minimal I-map: A graph K is a minimal I-map for a set of independencies I if it is an I-map for I, and if the removal of even a single edge from K renders it not an I-map.
perfect map: We say that a graph K is a perfect map (P-map) for a set of independencies I if we have that I(K) = I. We say that K is a perfect map for P if I(K) = I(P ).