Lecture 33: Inference in Graphical Models and Intro to Structure Learning

Lecture slides covering Bayesian belief networks (BBNs), including inference and learning techniques, causal discovery, and learning distributions in BBNs. The lecture also discusses learning structure, overfitting prevention, and the role of inference in structure learning.


Inference in Graphical Models

and Intro to Structure Learning

Lecture 33 of 41

Lecture Outline

  • More Bayesian Belief Networks (BBNs)
    • Inference: applying CPTs
    • Learning: CPTs from data, elicitation
    • In-class exercises
      • Hugin , BKD demos
      • CPT elicitation, application
  • Learning BBN Structure
    • K2 algorithm
    • Other probabilistic scores and search algorithms
  • Causal Discovery: Learning Causality from Observations
  • Incomplete Data: Learning and Inference (Expectation-Maximization)
  • Next Week: BBNs Concluded; Review for Midterm (11 October 2001)
  • After Midterm: EM Algorithm, Unsupervised Learning, Clustering

Bayesian Networks:

Quick Review

[Figure: "Sprinkler" BBN over five nodes: X1 Season {Spring, Summer, Fall, Winter}, X2 Sprinkler {On, Off}, X3 Rain {None, Drizzle, Steady, Downpour}, X4 Ground-Moisture {Wet, Dry}, X5 Ground-Slipperiness {Slippery, Not-Slippery}]

P(Summer, Off, Drizzle, Wet, Not-Slippery) = P(S) · P(O | S) · P(D | S) · P(W | O, D) · P(N | W)

  • Recall: Conditional Independence (CI) Assumptions
  • Bayesian Network: Digraph Model
    • Vertices (nodes): denote events (each a random variable)
    • Edges (arcs, links): denote conditional dependencies
  • Chain Rule for (Exact) Inference in BBNs
    • Arbitrary Bayesian networks: NP -complete
    • Polytrees: linear time
  • Example (“Sprinkler” BBN)
  • MAP, ML Estimation over BBNs

P(X1, X2, …, Xn) = ∏i=1..n P(Xi | parents(Xi))

hML ≡ argmaxh∈H P(D | h)
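The factored joint above can be evaluated mechanically by multiplying one CPT entry per node. Below is a minimal Python sketch for the Sprinkler example; the CPT numbers and the dictionary layout are illustrative assumptions, not values from the lecture.

```python
# Minimal sketch: evaluating the Sprinkler BBN joint via the chain rule.
# CPT values below are illustrative placeholders, not the lecture's numbers.

cpts = {
    "Season":       {("Summer",): 0.25},                # P(S)
    "Sprinkler":    {("Off", "Summer"): 0.4},           # P(O | S)
    "Rain":         {("Drizzle", "Summer"): 0.1},       # P(R | S)
    "Moisture":     {("Wet", "Off", "Drizzle"): 0.8},   # P(W | O, R)
    "Slipperiness": {("Not-Slippery", "Wet"): 0.3},     # P(N | W)
}

def joint(a):
    """P(x1, ..., xn) = product over i of P(xi | parents(xi))."""
    p = 1.0
    p *= cpts["Season"][(a["Season"],)]
    p *= cpts["Sprinkler"][(a["Sprinkler"], a["Season"])]
    p *= cpts["Rain"][(a["Rain"], a["Season"])]
    p *= cpts["Moisture"][(a["Moisture"], a["Sprinkler"], a["Rain"])]
    p *= cpts["Slipperiness"][(a["Slipperiness"], a["Moisture"])]
    return p

print(joint({"Season": "Summer", "Sprinkler": "Off", "Rain": "Drizzle",
             "Moisture": "Wet", "Slipperiness": "Not-Slippery"}))
```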

Learning Distributions in BBNs:

Quick Review

  • Learning Distributions
    • Shortcomings of Naïve Bayes
    • Making judicious CI assumptions
    • Scaling up to BBNs: need to learn a CPT for all parent sets
    • Goal: generalization
      • Given D (e.g., {1011, 1001, 0100})
      • Would like to know P(schema): e.g., P(11**) ≡ P(x1 = 1, x2 = 1)
  • Variants
    • Known or unknown structure
    • Training examples may have missing values
  • Gradient Learning Algorithm
    • Weight update rule
    • Learns CPTs given data points D

wijk ← wijk + r · Σx∈D [ Ph(yij, uik | x) / wijk ],  where wijk ≡ P(Yi = yij | Ui = uik)
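As a rough illustration of the weight-update rule above, here is a hedged Python sketch of one gradient step over the CPT entries wijk. The nested-dictionary layout, the learning rate r, and the posterior(i, j, k, x) helper (a stand-in for BBN inference returning Ph(yij, uik | x) under the current parameters) are all assumptions made for the example, not part of the lecture.

```python
# Sketch of one gradient step on CPT entries w[i][j][k] = P(Y_i = y_ij | U_i = u_ik).
# `posterior(i, j, k, x)` is an assumed stand-in for inference in the current network,
# returning P(Y_i = y_ij, U_i = u_ik | x); it is not implemented here.

def gradient_step(w, data, posterior, r=0.05):
    for i in w:                                    # each variable Y_i
        for j in w[i]:                             # each value y_ij
            for k in w[i][j]:                      # each parent configuration u_ik
                grad = sum(posterior(i, j, k, x) / w[i][j][k] for x in data)
                w[i][j][k] += r * grad
        # Renormalize each conditional distribution P(. | u_ik) so it sums to 1.
        for k in next(iter(w[i].values())):
            total = sum(w[i][j][k] for j in w[i])
            for j in w[i]:
                w[i][j][k] /= total
    return w
```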

Learning Structure:

Constraints Versus Scores

  • Constraint-Based
    • Perform tests of conditional independence
    • Search for network consistent with observed dependencies (or lack thereof)
    • Intuitive; closely follows definition of BBNs
    • Separates construction from form of CI tests
    • Sensitive to errors in individual tests
  • Score-Based
    • Define scoring function ( aka score) that evaluates how well (in)dependencies in a structure match observations
    • Search for structure that maximizes score
    • Statistically and information theoretically motivated
    • Can make compromises
  • Common Properties
    • Soundness: with sufficient data and computation, both learn correct structure
    • Both learn structure from observations and can incorporate knowledge

Learning Structure:

Maximum Weight Spanning Tree (Chow-Liu)

  • Algorithm Learn-Tree-Structure-I (D)
    • Estimate P(x) and P(x, y) for all single RVs, pairs; I(X; Y) = D(P(X, Y) || P(X) · P(Y))
    • Build complete undirected graph: variables as vertices, I(X; Y) as edge weights
    • T ← Build-MWST (V × V, Weights) // Chow-Liu algorithm: weight function ≡ I
    • Set directional flow on T and place the CPTs on its edges (gradient learning)
    • RETURN: tree-structured BBN with CPT values
  • Algorithm Build-MWST-Kruskal (E ⊆ V × V, Weights: E → R+)
    • H ← Build-Heap (E, Weights) // aka priority queue, Θ(|E|)
    • E′ ← Ø; Forest ← {{v} | v ∈ V} // E′: edge set; Forest: union-find, Θ(|V|)
    • WHILE Forest.Size > 1 DO // Θ(|E|) iterations
      • e ← H.Delete-Max () // e ≡ new edge from H, Θ(lg |E|)
      • IF (TS ← Forest.Find (e.Start)) ≠ (TE ← Forest.Find (e.End)) THEN // Θ(lg* |E|)
        • E′.Union (e) // append edge e; E′.Size++, Θ(1)
        • Forest.Union (TS, TE) // Forest.Size--, Θ(1)
    • RETURN E′ // Θ(1)
  • Running Time: Θ(|E| lg |E|) = Θ(|V|² lg |V|²) = Θ(|V|² lg |V|) = Θ(n² lg n)
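A compact Python sketch of the two routines above, under the assumption that D is a list of discrete value tuples: pairwise mutual information is estimated from empirical counts, and the maximum-weight spanning tree is built with a sorted-edge Kruskal pass rather than an explicit heap. Setting edge directions and filling in CPTs is left out, as the slide notes.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, a, b):
    """Empirical I(X_a; X_b) = sum over (x, y) of P(x, y) * log(P(x, y) / (P(x) P(y)))."""
    n = len(data)
    pxy = Counter((row[a], row[b]) for row in data)
    px = Counter(row[a] for row in data)
    py = Counter(row[b] for row in data)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def chow_liu_tree(data, n_vars):
    """Return edges of a maximum-weight spanning tree over variables 0..n_vars-1."""
    edges = sorted(((mutual_information(data, a, b), a, b)
                    for a, b in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))        # simple union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    tree = []
    for w, a, b in edges:               # Kruskal: heaviest edge that joins two trees
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, w))
    return tree

# Example: four binary variables, a handful of illustrative tuples.
D = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 0), (1, 1, 1, 1), (0, 0, 0, 0)]
print(chow_liu_tree(D, 4))
```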
Scores for Learning Structure:

The Role of Inference

  • General-Case BBN Structure Learning: Use Inference to Compute Scores
  • Recall: Bayesian Inference aka Bayesian Reasoning
    • Assumption: hH are mutually exclusive and exhaustive
    • Optimal strategy: combine predictions of hypotheses in proportion to likelihood
      • Compute conditional probability of hypothesis h given observed data D
      • i.e., compute expectation over unknown h for unseen cases
      • Let h ≡ structure, parameters Θ ≡ CPTs

  • Predicting the next case x(m+1) from D = {x(1), x(2), …, x(m)}:

    P(x(m+1) | D) = P(x(m+1) | x(1), x(2), …, x(m)) = Σh∈H P(x(m+1) | D, h) · P(h | D)

  • Posterior score: P(h | D) ∝ P(D | h) · P(h)
    (posterior score ∝ marginal likelihood · prior over structures)

  • Marginal likelihood: P(D | h) = ∫ P(D | h, Θ) · P(Θ | h) dΘ
    (marginal likelihood = ∫ likelihood · prior over parameters)
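Both the marginal likelihood above and the "max Dirichlet score" used by K2 later in the lecture have a closed form for multinomial CPTs with Dirichlet priors. The sketch below computes that log score for a single node given one fixed parent set; the counts-matrix input and the uniform pseudo-count alpha are assumptions made for illustration.

```python
from math import lgamma

def log_bd_family_score(counts, alpha=1.0):
    """
    Closed-form log marginal likelihood for one node X_i with a fixed parent set:
      sum over parent configurations j of
        lgamma(a_ij) - lgamma(a_ij + N_ij)
        + sum over child values k of (lgamma(a_ijk + N_ijk) - lgamma(a_ijk))
    `counts[j][k]` holds N_ijk; a uniform Dirichlet pseudo-count `alpha` is assumed.
    """
    score = 0.0
    for row in counts:                  # one row per parent configuration j
        n_ij = sum(row)
        a_ij = alpha * len(row)
        score += lgamma(a_ij) - lgamma(a_ij + n_ij)
        for n_ijk in row:
            score += lgamma(alpha + n_ijk) - lgamma(alpha)
    return score

# Example: a binary node with one binary parent (two parent configurations).
print(log_bd_family_score([[8, 2], [1, 9]]))
```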

Scores for Learning Structure:

Prior over Parameters

  • Likelihood L(Θ : D)
    • Definition: L(Θ : D) ≡ P(D | Θ) = ∏x∈D P(x | Θ)
    • General BBN (i.i.d. data x): L(Θ : D) = ∏x∈D ∏i P(xi | Parents(xi) ~ Θ) = ∏i L(Θi : D)
      • NB: Θi specifies CPTs for Parents(xi)
      • Likelihood decomposes according to the structure of the BBN
  • Estimating Prior over Parameters: P(Θ | D) ∝ P(Θ) · P(D | Θ) ≡ P(Θ) · L(Θ : D)
    • Example: Sprinkler
      • Scenarios D = {( Season ( i ), Sprinkler ( i ), Rain ( i ), Moisture ( i ), Slipperiness ( i ))}
      • P ( Su , Off , Dr , Wet , NS ) = P ( S ) · P ( O | S ) · P ( D | S ) · P ( W | O , D ) · P ( N | W )
    • MLE for multinomial distribution (e.g., {Spring, Summer, Fall, Winter}):
    • Likelihood for multinomials
    • Binomial case: N 1 = # heads, N 2 = # tails (“frequency is ML estimator”)

      Θ̂k = Nk / Σl Nl

      L(Θ : D) = ∏k=1..K Θk^Nk
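As a tiny worked illustration of the two formulas above (with made-up season counts, not data from the lecture):

```python
import math

# Illustrative season counts N_k (not taken from the lecture's data).
counts = {"Spring": 30, "Summer": 45, "Fall": 15, "Winter": 10}
total = sum(counts.values())

# MLE for a multinomial: theta_hat_k = N_k / sum over l of N_l
theta_hat = {k: n / total for k, n in counts.items()}

# Log-likelihood: log L(theta : D) = sum over k of N_k * log(theta_k)
log_likelihood = sum(n * math.log(theta_hat[k]) for k, n in counts.items())
print(theta_hat, log_likelihood)
```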

Learning Structure:

K2 Algorithm and ALARM

  • Algorithm Learn-BBN-Structure-K2 (D, Max-Parents)

      FOR i ← 1 to n DO                                // arbitrary ordering of variables {x1, x2, …, xn}
        WHILE (Parents[xi].Size < Max-Parents) DO      // find best candidate parent
          Best ← argmaxj>i (P(D | xj ∪ Parents[xi]))   // max Dirichlet score
          IF ((Parents[xi] + Best).Score > Parents[xi].Score) THEN
            Parents[xi] += Best
      RETURN ({Parents[xi] | i ∈ {1, 2, …, n}})

  • A Logical Alarm Reduction Mechanism [Beinlich et al , 1989]
    • BBN model for patient monitoring in surgical anesthesia
    • Vertices (37): findings (e.g., esophageal intubation ), intermediates, observables
    • K2 : found BBN different in only 1 edge from gold standard (elicited from expert)
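Here is a hedged Python sketch of the K2 loop above. The family_score argument stands in for the Dirichlet/BD family score (for instance, the log_bd_family_score idea sketched earlier, applied to counts collected from D); following the usual K2 convention, candidate parents are restricted to variables that precede xi in the fixed ordering, and this version also stops when no candidate improves the score, which the slide's simplified pseudocode leaves implicit.

```python
def k2(data, order, max_parents, family_score):
    """
    Greedy K2-style structure search. `order` is a fixed variable ordering;
    candidate parents of a variable are the variables preceding it in the ordering.
    `family_score(data, child, parents)` is an assumed stand-in for the
    Dirichlet/BD family score.
    """
    parents = {x: [] for x in order}
    for pos, x in enumerate(order):
        current = family_score(data, x, parents[x])
        while len(parents[x]) < max_parents:
            candidates = [c for c in order[:pos] if c not in parents[x]]
            if not candidates:
                break
            best = max(candidates, key=lambda c: family_score(data, x, parents[x] + [c]))
            best_score = family_score(data, x, parents[x] + [best])
            if best_score > current:
                parents[x].append(best)      # adding `best` improves the score
                current = best_score
            else:
                break                        # no candidate parent helps; stop for x
    return parents
```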

[Figure: the ALARM network, a BBN over 37 numbered vertices, shown as the gold-standard structure that K2 recovered to within one edge.]

Learning Structure:

(Score-Based) Hypothesis Space Search

  • Learning Structure: Beyond Trees
    • Problem not as easy for more complex networks
    • Example
      • Allow two parents (even singly-connected case, aka polytree)
      • Greedy algorithms no longer guaranteed to find optimal network
      • In fact, no efficient algorithm exists
    • Theorem: finding the network structure with maximal score, where H is restricted to BBNs with at most k parents per variable, is NP-hard for k > 1
  • Heuristic Search of Search Space H
    • Define H : elements denote possible structures, adjacency relation denotes transformation (e.g., arc addition, deletion, reversal)
    • Traverse this space looking for high-scoring structures
    • Algorithms
      • Greedy hill-climbing
      • Best-first search
      • Simulated annealing
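A sketch of the first of these, greedy hill-climbing in the structure space, assuming a caller-supplied decomposable score(edges) and an is_acyclic(edges) test; arc addition, deletion, and reversal are the neighborhood operators, matching the transformations listed above.

```python
def hill_climb(score, variables, is_acyclic, max_steps=100):
    """
    Greedy hill-climbing over BBN structures. `score(edges)` stands in for any
    structure score; `is_acyclic(edges)` rejects cyclic graphs. Neighbors are
    generated by single-arc addition, deletion, and reversal.
    """
    edges = set()                                   # start from the empty graph
    best = score(edges)
    for _ in range(max_steps):
        candidates = []
        for a in variables:
            for b in variables:
                if a == b:
                    continue
                if (a, b) in edges:
                    candidates.append(edges - {(a, b)})                 # delete arc
                    candidates.append(edges - {(a, b)} | {(b, a)})      # reverse arc
                else:
                    candidates.append(edges | {(a, b)})                 # add arc
        candidates = [g for g in candidates if is_acyclic(g)]
        if not candidates:
            break
        g_best = max(candidates, key=score)
        if score(g_best) <= best:
            break                                   # local maximum reached
        edges, best = g_best, score(g_best)
    return edges
```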

In-Class Exercise:

Hugin Demo

  • Hugin
    • Commercial product for BBN inference: http://www.hugin.com
    • First developed at University of Aalborg, Denmark
  • Applications
    • Popular research tool for inference and learning
    • Used for real-world decision support applications
      • Safety and risk evaluation: http://www.hugin.com/serene/
      • Diagnosis and control in unmanned subs: http://advocate.e-motive.com
      • Customer support automation: http://www.cs.auc.dk/research/DSS/SACSO/
  • Capabilities
    • Lauritzen-Spiegelhalter algorithm for inference (clustering aka clique reduction)
    • Object Oriented Bayesian Networks (OOBNs): structured learning and inference
    • Influence diagrams for decision-theoretic inference (utility + probability)
    • See: http://www.hugin.com/doc.html

In-Class Exercise:

Hugin and CPT Elicitation

  • Hugin Tutorials
    • Introduction: causal reasoning for diagnosis in decision support (toy problem)
      • http://www.hugin.com/hugintro/bbn_pane.html
      • Example domain: explaining low yield (drought versus disease)
    • Tutorial 1: constructing a simple BBN in Hugin
      • http://www.hugin.com/hugintro/bbn_tu_pane.html
      • Eliciting CPTs (or collecting from data) and entering them
    • Tutorial 2: constructing a simple influence diagram (decision network) in Hugin
      • http://www.hugin.com/hugintro/id_tu_pane.html
      • Eliciting utilities (or collecting from data) and entering them
  • Other Important BBN Resources
    • Microsoft Bayesian Networks: http://www.research.microsoft.com/dtas/msbn/
    • XML BN (Interchange Format): http://www.research.microsoft.com/dtas/bnformat/
    • BBN Repository (more data sets) http://www-nt.cs.berkeley.edu/home/nir/public_html/Repository/index.htm

Bayesian Network Learning:

Related Fields and References

  • ANNs: BBNs as Connectionist Models
  • GAs: BBN Inference, Learning as Genetic Optimization, Programming
  • Hybrid Systems (Symbolic / Numerical AI)
  • Conferences
    • General (with respect to machine learning)
      • International Conference on Machine Learning (ICML)
      • American Association for Artificial Intelligence (AAAI)
      • International Joint Conference on Artificial Intelligence (IJCAI, biennial)
    • Specialty
      • International Joint Conference on Neural Networks (IJCNN)
      • Genetic and Evolutionary Computation Conference (GECCO)
      • Neural Information Processing Systems (NIPS)
      • Uncertainty in Artificial Intelligence (UAI)
      • Computational Learning Theory (COLT)
  • Journals
    • General: Artificial Intelligence , Machine Learning , Journal of AI Research
    • Specialty: Neural Networks , Evolutionary Computation , etc.

Learning Bayesian Networks:

Missing Observations

  • Problem Definition
    • Given: data ( n -tuples) with missing values, aka partially observable (PO) data
    • Kinds of missing values
      • Undefined, unknown (possible new )
      • Missing, corrupted (not properly collected)
    • Second case ("truly missing"): want to fill in "?" with expected value
  • Solution Approaches
    • Expected = distribution over possible values
    • Use “best guess” BBN to estimate distribution
    • Expectation-Maximization (EM) algorithm can be used here
  • Intuitive Idea
    • Want to find hML in PO case (D ≡ unobserved variables ∪ observed variables)
    • Estimation step: calculate E[unobserved variables | h], assuming current h
    • Maximization step: update wijk to maximize E[lg P(D | h)], D ≡ all variables
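To make the E/M structure above concrete, here is a schematic Python loop. It assumes two caller-supplied routines that are not shown: expected_counts(params, data), standing in for inference that returns expected sufficient statistics E[Nijk | D, h] over the unobserved variables, and mle_from_counts(counts), which re-estimates the CPT entries wijk from them. This is a sketch of the control flow only, not a full EM implementation.

```python
def em_for_bbn(data, init_params, expected_counts, mle_from_counts, iters=20):
    """
    Schematic EM loop for BBN parameter learning with partially observed data.
      E-step: `expected_counts(params, data)` fills in unobserved variables in
              expectation under the current hypothesis h (via inference).
      M-step: `mle_from_counts(counts)` updates w_ijk to maximize E[lg P(D | h)].
    """
    params = init_params
    for _ in range(iters):
        counts = expected_counts(params, data)   # E-step
        params = mle_from_counts(counts)         # M-step
    return params
```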