Inference in Graphical Models
and Intro to Structure Learning
Lecture 33 of 41
Lecture Outline
- More Bayesian Belief Networks (BBNs)
- Inference: applying CPTs
- Learning: CPTs from data, elicitation
- In-class exercises
- Hugin, BKD demos
- CPT elicitation, application
- Learning BBN Structure
- K2 algorithm
- Other probabilistic scores and search algorithms
- Causal Discovery: Learning Causality from Observations
- Incomplete Data: Learning and Inference (Expectation-Maximization)
- Next Week: BBNs Concluded; Review for Midterm (11 October 2001)
- After Midterm: EM Algorithm, Unsupervised Learning, Clustering
Bayesian Networks:
Quick Review
[Figure: "Sprinkler" BBN with edges X1 (Season) → X2 (Sprinkler), X1 → X3 (Rain), {X2, X3} → X4 (Ground-Moisture), X4 → X5 (Ground-Slipperiness)]
- X1 Season: Spring, Summer, Fall, Winter
- X2 Sprinkler: On, Off
- X3 Rain: None, Drizzle, Steady, Downpour
- X4 Ground-Moisture: Wet, Dry
- X5 Ground-Slipperiness: Slippery, Not-Slippery
P ( Summer , Off , Drizzle , Wet , Not-Slippery ) = P ( S ) · P ( O | S ) · P ( D | S ) · P ( W | O , D ) · P ( N | W )
- Recall: Conditional Independence (CI) Assumptions
- Bayesian Network: Digraph Model
- Vertices (nodes): denote events (each a random variable)
- Edges (arcs, links): denote conditional dependencies
- Chain Rule for (Exact) Inference in BBNs
- Arbitrary Bayesian networks: NP-complete
- Polytrees: linear time
- Example (“Sprinkler” BBN)
- MAP, ML Estimation over BBNs
P(X1, X2, …, Xn) = ∏i=1..n P(Xi | parents(Xi))
hML ≡ argmaxh∈H P(D | h)
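To make the factorization concrete, here is a minimal Python sketch (not from the lecture) that evaluates the joint probability of the "Sprinkler" assignment above by multiplying CPT entries along the chain rule; all numeric probabilities are made-up placeholders, not elicited values.

```python
# Minimal sketch: evaluate P(Season, Sprinkler, Rain, Moisture, Slipperiness)
# via the factorization P(S) * P(O|S) * P(D|S) * P(W|O,D) * P(N|W).
# Every probability below is an illustrative placeholder.

P_season = {"Summer": 0.25, "Spring": 0.25, "Fall": 0.25, "Winter": 0.25}
P_sprinkler_given_season = {("Off", "Summer"): 0.4}            # P(Sprinkler | Season)
P_rain_given_season = {("Drizzle", "Summer"): 0.1}             # P(Rain | Season)
P_moisture_given_sprinkler_rain = {("Wet", "Off", "Drizzle"): 0.8}
P_slipperiness_given_moisture = {("Not-Slippery", "Wet"): 0.3}

def joint(season, sprinkler, rain, moisture, slipperiness):
    """Chain rule for the sprinkler network under its CI assumptions."""
    return (P_season[season]
            * P_sprinkler_given_season[(sprinkler, season)]
            * P_rain_given_season[(rain, season)]
            * P_moisture_given_sprinkler_rain[(moisture, sprinkler, rain)]
            * P_slipperiness_given_moisture[(slipperiness, moisture)])

print(joint("Summer", "Off", "Drizzle", "Wet", "Not-Slippery"))  # 0.25*0.4*0.1*0.8*0.3
```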
Learning Distributions in BBNs:
Quick Review
- Learning Distributions
- Shortcomings of Naïve Bayes
- Making judicious CI assumptions
- Scaling up to BBNs: need to learn a CPT for all parent sets
- Goal: generalization
- Given D (e.g., {1011, 1001, 0100})
- Would like to know P ( schema ): e.g., P (11**) ≡ P ( x 1 = 1, x 2 = 1)
- Variants
- Known or unknown structure
- Training examples may have missing values
- Gradient Learning Algorithm
- Weight update rule
- Learns CPTs given data points D
wijk ← wijk + r · Σx∈D Ph(yij, uik | x) / wijk
(wijk ≡ P(Yi = yij | Ui = uik), the CPT entry; r is the learning rate)
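A rough Python sketch of this update, assuming a user-supplied `posterior(i, j, k, x)` routine that performs inference in the current network h to return P(yij, uik | x); the learning rate `r`, the dictionary layout, and the renormalization pass are our illustrative choices, not code from the lecture.

```python
def gradient_step(w, data, posterior, r=0.05):
    """One gradient-ascent step on CPT entries w[i][(j, k)] = P(Y_i=y_ij | U_i=u_ik).

    posterior(i, j, k, x) must return P(y_ij, u_ik | x) under the current network,
    obtained by ordinary BBN inference; here it is an assumed callback.
    """
    for i in w:                                   # each variable Y_i
        for (j, k), w_ijk in list(w[i].items()):
            grad = sum(posterior(i, j, k, x) / w_ijk for x in data)
            w[i][(j, k)] = w_ijk + r * grad
    # Renormalize so that, for each parent configuration u_ik, sum_j w_ijk = 1.
    for i in w:
        for k in {k for (_, k) in w[i]}:
            z = sum(v for (j2, k2), v in w[i].items() if k2 == k)
            for (j2, k2) in list(w[i]):
                if k2 == k:
                    w[i][(j2, k2)] /= z
    return w
```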
Learning Structure:
Constraints Versus Scores
- Constraint-Based
- Perform tests of conditional independence
- Search for network consistent with observed dependencies (or lack thereof)
- Intuitive; closely follows definition of BBNs
- Separates construction from form of CI tests
- Sensitive to errors in individual tests
- Score-Based
- Define scoring function ( aka score) that evaluates how well (in)dependencies in a structure match observations
- Search for structure that maximizes score
- Statistically and information theoretically motivated
- Can make compromises
- Common Properties
- Soundness: with sufficient data and computation, both learn correct structure
- Both learn structure from observations and can incorporate knowledge
Learning Structure:
Maximum Weight Spanning Tree (Chow-Liu)
- Algorithm Learn-Tree-Structure-I ( D )
- Estimate P ( x ) and P ( x , y ) for all single RVs, pairs; I( X ; Y ) = D( P ( X , Y ) || P ( X ) · P ( Y ))
- Build complete undirected graph: variables as vertices, I( X ; Y ) as edge weights
- T ← Build-MWST ( V × V , Weights ) // Chow-Liu algorithm: weight function I
- Set directional flow on T and place the CPTs on its edges (gradient learning)
- RETURN: tree-structured BBN with CPT values
- Algorithm Build-MWST-Kruskal ( E ⊆ V × V , Weights : E → R+)
  - H ← Build-Heap ( E , Weights ) // aka priority queue, Θ(| E |)
  - E' ← Ø; Forest ← {{ v } | v ∈ V } // E' : set; Forest : union-find, Θ(| V |)
  - WHILE Forest.Size > 1 DO // Θ(| E |) iterations
    - e ← H. Delete-Max () // e ← new edge from H , Θ(lg | E |)
    - IF (( TS ← Forest.Find ( e. Start )) ≠ ( TE ← Forest.Find ( e. End ))) THEN // Θ(lg* | E |)
      - E'.Union ( e ) // append edge e ; E'. Size ++, Θ(1)
      - Forest.Union ( TS , TE ) // Forest.Size--, Θ(1)
  - RETURN E' // Θ(1)
- Running Time: Θ(| E | lg | E |) = Θ(| V |² lg | V |²) = Θ(| V |² lg | V |) = Θ( n ² lg n )
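The following Python sketch puts the two steps together for discrete, fully observed data: pairwise mutual information estimated from counts, then a Kruskal-style maximum-weight spanning tree with union-find. Function names and the toy sample set are ours; directing the tree and fitting CPTs are omitted.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Estimate I(X_i; X_j) from complete samples (tuples of discrete values)."""
    n = len(data)
    pi, pj, pij = Counter(), Counter(), Counter()
    for x in data:
        pi[x[i]] += 1; pj[x[j]] += 1; pij[(x[i], x[j])] += 1
    return sum((c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

def chow_liu_tree(data, n_vars):
    """Return the edge set of a maximum-weight spanning tree under I(X_i; X_j)."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))                 # union-find forest
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]        # path halving
            v = parent[v]
        return v
    tree = []
    for w, i, j in edges:                        # Kruskal: heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy example: three binary variables, X2 roughly copies X1.
samples = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0)]
print(chow_liu_tree(samples, 3))
```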
Scores for Learning Structure:
The Role of Inference
- General-Case BBN Structure Learning: Use Inference to Compute Scores
- Recall: Bayesian Inference aka Bayesian Reasoning
- Assumption: h ∈ H are mutually exclusive and exhaustive
- Optimal strategy: combine predictions of hypotheses in proportion to likelihood
- Compute conditional probability of hypothesis h given observed data D
- i.e., compute expectation over unknown h for unseen cases
- Let h ≡ structure, parameters Θ ≡ CPTs
P(xm+1 | D) = P(xm+1 | x1, x2, …, xm) = Σh∈H P(xm+1 | D, h) · P(h | D)
P(h | D) ∝ P(D | h) · P(h)    [posterior score ∝ marginal likelihood × prior over structures]
P(D | h) = ∫ P(D | h, Θ) · P(Θ | h) dΘ    [likelihood × prior over parameters]
Scores for Learning Structure:
Prior over Parameters
- Likelihood L ( Θ : D )
- Definition: L ( Θ : D ) ≡ P ( D | Θ ) = ∏x∈D P ( x | Θ )
- General BBN (i.i.d. data x ): L ( Θ : D ) = ∏x∈D ∏i P ( xi | Parents ( xi ) : Θi ) = ∏i L ( Θi : D )
- NB: Θi specifies the CPT entries for xi given Parents ( xi )
- Likelihood decomposes according to the structure of the BBN
- Estimating Prior over Parameters: P ( Θ | D ) ∝ P ( Θ ) · P ( D | Θ ) = P ( Θ ) · L ( Θ : D )
- Example: Sprinkler
- Scenarios D = {( Season ( i ), Sprinkler ( i ), Rain ( i ), Moisture ( i ), Slipperiness ( i ))}
- P ( Su , Off , Dr , Wet , NS ) = P ( S ) · P ( O | S ) · P ( D | S ) · P ( W | O , D ) · P ( N | W )
- MLE for multinomial distribution (e.g., {Spring, Summer, Fall, Winter}):
- Likelihood for multinomials
- Binomial case: N 1 = # heads, N 2 = # tails (“frequency is ML estimator”)
Θ̂k = Nk / Σl Nl
L(Θ : D) = ∏k=1..K Θk^Nk
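A tiny Python sketch of the multinomial MLE and its log-likelihood, using an invented set of season observations; it just restates the "frequency is the ML estimator" point above.

```python
import math
from collections import Counter

observations = ["Summer", "Summer", "Fall", "Winter", "Spring", "Summer", "Fall"]
counts = Counter(observations)                            # N_k for each value k
n = sum(counts.values())

theta_hat = {k: nk / n for k, nk in counts.items()}       # MLE: relative frequencies
log_likelihood = sum(nk * math.log(theta_hat[k]) for k, nk in counts.items())

print(theta_hat)        # e.g. Summer -> 3/7
print(log_likelihood)   # lg L(theta : D) = sum_k N_k * lg theta_k
```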
Learning Structure:
K2 Algorithm and ALARM
- Algorithm Learn-BBN-Structure-K2 ( D, Max-Parents )
FOR i ← 1 to n DO // arbitrary ordering of variables { x 1 , x 2 , …, xn }
  WHILE ( Parents [ xi ]. Size < Max-Parents ) DO // find best candidate parent
    Best ← argmaxj>i ( P ( D | xj ∪ Parents [ xi ])) // max Dirichlet score
    IF (( Parents [ xi ] + Best ). Score > Parents [ xi ]. Score ) THEN
      Parents [ xi ] += Best
    ELSE BREAK // no remaining candidate improves the score
RETURN ({ Parents [ xi ] | i ∈ {1, 2, …, n }})
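Below is a greedy Python sketch in the spirit of K2, assuming a caller-supplied `score(xi, parent_set, data)` routine (e.g., the Bayesian Dirichlet score of Cooper and Herskovits). One difference to note: this sketch draws candidate parents from variables earlier in the ordering, which is the usual K2 convention; the pseudocode above indexes candidates the other way, which is equivalent under a reversed ordering.

```python
def k2_structure(variables, data, score, max_parents=2):
    """Greedy K2-style search: for each variable (in a fixed ordering), add the
    single candidate parent that most improves score(x_i, parents, data), and
    stop when no addition helps or max_parents is reached.

    `score` is an assumed callback returning the (log) Bayesian score of x_i
    given a candidate parent set; candidates are earlier variables in the order.
    """
    parents = {x: [] for x in variables}
    for idx, xi in enumerate(variables):
        best = score(xi, parents[xi], data)
        while len(parents[xi]) < max_parents:
            candidates = [xj for xj in variables[:idx] if xj not in parents[xi]]
            if not candidates:
                break
            scored = [(score(xi, parents[xi] + [xj], data), xj) for xj in candidates]
            new_score, best_parent = max(scored)
            if new_score > best:                 # keep only improving additions
                parents[xi].append(best_parent)
                best = new_score
            else:
                break
    return parents
```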
- A Logical Alarm Reduction Mechanism [Beinlich et al , 1989]
- BBN model for patient monitoring in surgical anesthesia
- Vertices (37): findings (e.g., esophageal intubation ), intermediates, observables
- K2 : found BBN different in only 1 edge from gold standard (elicited from expert)
[Figure: ALARM network structure (37 vertices); node index labels omitted]
Learning Structure:
(Score-Based) Hypothesis Space Search
- Learning Structure: Beyond Trees
- Problem not as easy for more complex networks
- Example
- Allow two parents (even singly-connected case, aka polytree)
- Greedy algorithms no longer guaranteed to find optimal network
- In fact, no efficient algorithm exists
- Theorem: finding network structure with maximal score, where H restricted to BBNs with at most k parents for each variable, is NP-hard for k > 1
- Heuristic Search of Search Space H
- Define H : elements denote possible structures, adjacency relation denotes transformation (e.g., arc addition, deletion, reversal)
- Traverse this space looking for high-scoring structures
- Algorithms
- Greedy hill-climbing
- Best-first search
- Simulated annealing
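As a sketch of the simplest of these, greedy hill-climbing over structures can be written as below, assuming caller-supplied `score(structure, data)` and `neighbors(structure)` routines (the latter generating acyclic arc additions, deletions, and reversals); restarts or simulated annealing would be layered on top to escape local maxima.

```python
def hill_climb_structure(initial, data, score, neighbors, max_steps=1000):
    """Greedy search over BBN structures: repeatedly move to the best-scoring
    neighbor (one arc added, deleted, or reversed) until no neighbor improves.

    `score` and `neighbors` are assumed callbacks; this illustrates only the
    traversal of the hypothesis space H, not any particular scoring function.
    """
    current, current_score = initial, score(initial, data)
    for _ in range(max_steps):
        scored = [(score(g, data), g) for g in neighbors(current)]
        if not scored:
            break
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= current_score:          # local maximum reached
            break
        current, current_score = best, best_score
    return current, current_score
```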
In-Class Exercise:
Hugin Demo
- Hugin
- Commercial product for BBN inference: http://www.hugin.com
- First developed at University of Aalborg, Denmark
- Applications
- Popular research tool for inference and learning
- Used for real-world decision support applications
- Safety and risk evaluation: http://www.hugin.com/serene/
- Diagnosis and control in unmanned subs: http://advocate.e-motive.com
- Customer support automation: http://www.cs.auc.dk/research/DSS/SACSO/
- Capabilities
- Lauritzen-Spiegelhalter algorithm for inference (clustering aka clique reduction)
- Object Oriented Bayesian Networks (OOBNs): structured learning and inference
- Influence diagrams for decision-theoretic inference (utility + probability)
- See: http://www.hugin.com/doc.html
In-Class Exercise:
Hugin and CPT Elicitation
- Hugin Tutorials
- Introduction: causal reasoning for diagnosis in decision support (toy problem)
- http://www.hugin.com/hugintro/bbn_pane.html
- Example domain: explaining low yield (drought versus disease)
- Tutorial 1: constructing a simple BBN in Hugin
- http://www.hugin.com/hugintro/bbn_tu_pane.html
- Eliciting CPTs (or collecting from data) and entering them
- Tutorial 2: constructing a simple influence diagram (decision network) in Hugin
- http://www.hugin.com/hugintro/id_tu_pane.html
- Eliciting utilities (or collecting from data) and entering them
- Other Important BBN Resources
- Microsoft Bayesian Networks: http://www.research.microsoft.com/dtas/msbn/
- XML BN (Interchange Format): http://www.research.microsoft.com/dtas/bnformat/
- BBN Repository (more data sets) http://www-nt.cs.berkeley.edu/home/nir/public_html/Repository/index.htm
Bayesian Network Learning:
Related Fields and References
- ANNs: BBNs as Connectionist Models
- GAs: BBN Inference, Learning as Genetic Optimization, Programming
- Hybrid Systems (Symbolic / Numerical AI)
- Conferences
- General (with respect to machine learning)
- International Conference on Machine Learning (ICML)
- American Association for Artificial Intelligence (AAAI)
- International Joint Conference on Artificial Intelligence (IJCAI, biennial)
- Specialty
- International Joint Conference on Neural Networks (IJCNN)
- Genetic and Evolutionary Computation Conference (GECCO)
- Neural Information Processing Systems (NIPS)
- Uncertainty in Artificial Intelligence (UAI)
- Computational Learning Theory (COLT)
- Journals
- General: Artificial Intelligence , Machine Learning , Journal of AI Research
- Specialty: Neural Networks , Evolutionary Computation , etc.
Learning Bayesian Networks:
Missing Observations
- Problem Definition
- Given: data ( n -tuples) with missing values, aka partially observable (PO) data
- Kinds of missing values
- Undefined, unknown (possibly new values)
- Missing, corrupted (not properly collected)
- Second case (“truly missing”): want to fill in “?” with expected value
- Solution Approaches
- Expected = distribution over possible values
- Use “best guess” BBN to estimate distribution
- Expectation-Maximization (EM) algorithm can be used here
- Intuitive Idea
- Want to find hML in PO case ( D ≡ unobserved variables ∪ observed variables )
- Estimation step: calculate E [ unobserved variables | h ], assuming current h
- Maximization step: update wijk to maximize E [lg P ( D | h )], D all variables
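A schematic EM loop for this setting, assuming caller-supplied `expected_counts(params, data)` (the estimation step, done by inference over the unobserved variables) and `maximize(counts)` (the maximization step, refitting the CPT entries wijk and returning the expected log-likelihood); only the control flow is shown.

```python
def em_for_bbn(initial_params, data, expected_counts, maximize,
               max_iters=100, tol=1e-6):
    """Expectation-Maximization for a BBN with missing values (control flow only).

    E step: expected_counts(params, data) fills in each missing value with a
            distribution over its possible values, via inference in the current
            network, and returns expected sufficient statistics.
    M step: maximize(counts) returns new CPT parameters and the expected
            log-likelihood E[lg P(D | h)] under those counts.
    Both routines are assumed callbacks, not defined here.
    """
    params, prev_ll = initial_params, float("-inf")
    for _ in range(max_iters):
        counts = expected_counts(params, data)         # E step
        params, ll = maximize(counts)                   # M step
        if ll - prev_ll < tol:                          # converged (LL non-decreasing)
            break
        prev_ll = ll
    return params
```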