Lecture 34 of 41
Introduction to Machine Learning
Lecture Outline
- Next Week: Sections 18.3-18.4, 18.6-18.7, Russell and Norvig
- Previously: Representation and Reasoning (Inference)
- Logical
- Probabilistic (“Soft Computing”)
- Today: Introduction to Learning
- Machine learning framework
- Definitions
- Taxonomies: supervised, unsupervised, reinforcement
- Instance spaces ( X )
- Hypotheses ( h ) and hypothesis spaces ( H )
- Basic examples
- Version spaces and candidate elimination algorithm
- Next Thursday: Inductive Bias and Learning Decision Trees
Example (Revisited):
Learning to Play Board Games
- Type of Training Experience
- Direct or indirect?
- Teacher or not?
- Knowledge about the game (e.g., openings/endgames)?
- Problem: Is Training Experience Representative (of Performance Goal)?
- Software Design
- Assumptions of the learning system: legal move generator exists
- Software requirements: generator, evaluator(s), parametric target function
- Choosing a Target Function
- ChooseMove: Board → Move (action selection function, or policy)
- V: Board → ℝ (board evaluation function)
- Ideal target V; approximated target V̂
- Goal of learning process: operational description (approximation) V̂ of V
Implicit Representation in Learning:
Target Evaluation Function for Checkers
- Possible Definition
- If b is a final board state that is won, then V(b) = 100
- If b is a final board state that is lost, then V(b) = -100
- If b is a final board state that is drawn, then V(b) = 0
- If b is not a final board state in the game, then V(b) = V(b’) where b’ is the best final board state that can be achieved starting from b and playing optimally until the end of the game
- Correct values, but not operational
- Choosing a Representation for the Target Function
- Collection of rules?
- Neural network?
- Polynomial function (e.g., linear, quadratic combination) of board features?
- Other?
A Representation for Learned Function
- bp/rp = number of black/red pieces; bk/rk = number of black/red kings; bt/rt = number of black/red pieces threatened (can be taken on next turn)
V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
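The linear evaluation function can be sketched directly in Python. The feature values and weights below are illustrative placeholders (nothing here is learned or taken from an actual checkers engine):

```python
# Sketch of the linear board-evaluation function
# V_hat(b) = w0 + w1*bp(b) + w2*rp(b) + w3*bk(b) + w4*rk(b) + w5*bt(b) + w6*rt(b)

def v_hat(features, weights):
    """Weighted linear combination of board features, plus a bias term w0."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * f for w, f in zip(ws, features))

# features = (bp, rp, bk, rk, bt, rt) for some hypothetical board b
features = (6, 5, 1, 0, 2, 1)
# weights (w0..w6) are assumed for illustration, not learned values
weights = (0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5)
score = v_hat(features, weights)
print(score)  # 3.5
```

Learning then reduces to adjusting the weight vector so that V̂ tracks training estimates of V.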
Performance Element:
What to Learn?
- Classification Functions
- Hidden functions: estimating (“fitting”) parameters
- Concepts (e.g., chair, face, game)
- Diagnosis, prognosis: medical, risk assessment, fraud, mechanical systems
- Models
- Map (for navigation)
- Distribution (query answering, aka QA)
- Language model (e.g., automaton/grammar)
- Skills
- Playing games
- Planning
- Reasoning (acquiring representation to use in reasoning)
- Cluster Definitions for Pattern Recognition
- Shapes of objects
- Functional or taxonomic definition
- Many Learning Problems Can Be Reduced to Classification
Representations and Algorithms:
How to Learn It?
- Supervised
- What is learned? Classification function; other models
- Inputs and outputs? Learning: examples ⟨x, f(x)⟩ ⇒ approximation f̂(x)
- How is it learned? Presentation of examples to learner (by teacher)
- Unsupervised
- What is learned? Cluster definition, or vector quantization function (codebook)
- Inputs and outputs? Learning: observations x, distance metric d(x1, x2) ⇒ discrete codebook f(x)
- How is it learned? Formation, segmentation, labeling of clusters based on observations, metric
- Reinforcement
- What is learned? Control policy (function from states of the world to actions)
- Inputs and outputs? Learning: state/reward sequence ⟨si, ri⟩, 1 ≤ i ≤ n ⇒ policy p: s → a
- How is it learned? (Delayed) feedback of reward values to agent based on actions selected; model updated based on reward, (partially) observable state
Example:
Supervised Inductive Learning Problem
Unknown Function
Inputs x1, x2, x3, x4 → [Unknown Function] → y = f(x1, x2, x3, x4)
- xi ∈ ti, y ∈ t, f: (t1 × t2 × t3 × t4) → t
- Our learning function: Vector (t1 × t2 × t3 × t4 × t) → ((t1 × t2 × t3 × t4) → t)

Example  x1  x2  x3  x4  y
   0      0   1   1   0  0
   1      0   0   0   0  0
   2      0   0   1   1  1
   3      1   0   0   1  1
   4      0   1   1   0  0
   5      1   1   0   0  0
   6      0   1   0   1  0
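One concrete way to check a candidate hypothesis against the training table is shown below. The candidate f̂(x) = x4 ∧ ¬x2 is just one function consistent with these seven examples, chosen here for illustration; it is not the unique (or a known-correct) answer:

```python
# Training examples from the table above: ((x1, x2, x3, x4), y)
examples = [
    ((0, 1, 1, 0), 0),
    ((0, 0, 0, 0), 0),
    ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0),
    ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]

def f_hat(x):
    """Candidate hypothesis: y = x4 AND (NOT x2) -- one of many consistent choices."""
    x1, x2, x3, x4 = x
    return int(x4 == 1 and x2 == 0)

consistent = all(f_hat(x) == y for x, y in examples)
print(consistent)  # True
```

Many other functions also agree with all seven rows, which is exactly the ambiguity the next slide quantifies.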
Hypothesis Space:
Unrestricted Case
- | A → B | = | B |^| A |
- | H | = | {0,1} × {0,1} × {0,1} × {0,1} → {0,1} | = 2^(2^4) = 2^16 = 65,536 possible functions
- Complete Ignorance: Is Learning Possible?
- Need to see every possible input/output pair
- After 7 examples, still have 2^9 = 512 possibilities (out of 65536) for f Example x 1 x 2 x 3 x 4 y 0 0 0 0 0? 1 0 0 0 1? 2 0 0 1 0 0 3 0 0 1 1 1 4 0 1 0 0 0 5 0 1 0 1 0 6 0 1 1 0 0 7 0 1 1 1? 8 1 0 0 0? 9 1 0 0 1 1 10 1 0 1 0? 11 1 0 1 1? 12 1 1 0 0 0 13 1 1 0 1? 14 1 1 1 0? 15 1 1 1 1?
Representing Hypotheses
- Many Possible Representations
- Hypothesis h : Conjunction of Constraints on Attributes
- Constraint Values
- Specific value (e.g., Water = Warm )
- Don’t care (e.g., “Water = ?” )
- No value allowed (e.g., “Water = Ø ”)
- Example Hypothesis for EnjoySport
- Sky, AirTemp, Humidity, Wind, Water, Forecast: <Sunny, ?, ?, Strong, ?, Same>
- Is this consistent with the training examples?
- What are some hypotheses that are consistent with the examples?
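A minimal sketch of how such a conjunctive hypothesis is applied to an instance. The attribute order follows the slide; the sample instances are assumed for illustration (no EnjoySport training data is given here):

```python
# Attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast
h = ("Sunny", "?", "?", "Strong", "?", "Same")

def matches(hypothesis, instance):
    """True iff every constraint is satisfied: '?' accepts any value,
    the empty constraint 'Ø' accepts no value, and a specific literal
    must match the instance's attribute value exactly."""
    return all(
        c != "Ø" and (c == "?" or c == v)
        for c, v in zip(hypothesis, instance)
    )

x = ("Sunny", "Warm", "High", "Strong", "Cool", "Same")  # assumed instance
print(matches(h, x))                      # True
print(matches(h, ("Rainy",) + x[1:]))     # False: Sky constraint violated
```

A hypothesis containing Ø in any position matches no instance at all, which is why <Ø, …, Ø> serves as the most specific element of H.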
Typical Concept Learning Tasks
- Given
- Instances X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
- Target function c ≡ EnjoySport: X → {0, 1}, where X = {Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} × {Cool, Warm} × {Same, Change}
- Hypotheses H: conjunctions of literals (e.g., <?, Cold, High, ?, ?, ?>)
- Training examples D : positive and negative examples of the target function
- Determine
- Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
- Such h are consistent with the training data
- Training Examples
- Assumption: no missing X values
- Noise in values of c (contradictory labels)?
- D = {⟨x1, c(x1)⟩, …, ⟨xm, c(xm)⟩}
Instances, Hypotheses, and
the Partial Ordering Less-Specific-Than
Instances X:
- x1 = <Sunny, Warm, High, Strong, Cool, Same>
- x2 = <Sunny, Warm, High, Light, Warm, Same>

Hypotheses H:
- h1 = <Sunny, ?, ?, Strong, ?, ?>
- h2 = <Sunny, ?, ?, ?, ?, ?>
- h3 = <Sunny, ?, ?, ?, Cool, ?>

Ordering: h2 P h1 and h2 P h3, where P ≡ Less-Specific-Than (≡ More-General-Than); h2 is more general than both h1 and h3.

[Figure: instances x1, x2 on the left; hypotheses h1, h3 (more specific) and h2 (more general) on the right, arranged along a Specific → General axis.]
Find-S Algorithm
- Initialize h to the most specific hypothesis in H
H : the hypothesis space (partially ordered set under relation Less-Specific-Than )
- For each positive training instance x
For each attribute constraint ai in h:
IF the constraint ai in h is satisfied by x
THEN do nothing
ELSE replace ai in h by the next more general constraint that is satisfied by x
- Output hypothesis h
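Find-S can be implemented directly for conjunctive hypotheses over nominal attributes. This is a sketch: the three positive training instances are assumed sample data, and the generalization step only covers the Ø / literal / ? cases described above:

```python
def find_s(positive_examples, n_attrs):
    """Find-S: start at the most specific hypothesis <Ø, ..., Ø> and
    minimally generalize each violated constraint on every positive example."""
    h = ["Ø"] * n_attrs
    for x in positive_examples:
        for i, (c, v) in enumerate(zip(h, x)):
            if c == "Ø":                 # no value allowed yet: adopt x's value
                h[i] = v
            elif c != "?" and c != v:    # literal conflicts with x: generalize to '?'
                h[i] = "?"
            # c == "?" or c == v: constraint already satisfied, do nothing
    return tuple(h)

# Assumed positive examples (Sky, AirTemp, Humidity, Wind, Water, Forecast)
positives = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High",   "Strong", "Cool", "Change"),
]
h = find_s(positives, n_attrs=6)
print(h)  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

Note that Find-S ignores negative examples entirely; handling them is what motivates the version-space approach on the next slides.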
Version Spaces
- Definition: Consistent Hypotheses
- A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example < x, c(x) > in D.
- Consistent(h, D) ≡ ∀ <x, c(x)> ∈ D. h(x) = c(x)
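The consistency predicate translates almost literally into code. The conjunctive matcher and the labeled data below are assumed for illustration:

```python
def matches(h, x):
    """Conjunctive hypothesis semantics: '?' accepts any value,
    'Ø' accepts none, and a literal must match exactly."""
    return all(c != "Ø" and (c == "?" or c == v) for c, v in zip(h, x))

def consistent(h, D):
    """Consistent(h, D) iff h(x) = c(x) for every <x, c(x)> in D."""
    return all(int(matches(h, x)) == label for x, label in D)

# Assumed labeled data: (instance, c(x))
D = [
    (("Sunny", "Warm", "High", "Strong", "Cool", "Same"), 1),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Same"), 0),
]
h = ("Sunny", "?", "?", "Strong", "?", "?")
ok = consistent(h, D)
print(ok)  # True
```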
- Definition: Version Space
- The version space VSH,D , with respect to hypothesis space H and training examples D , is the subset of hypotheses from H consistent with all training examples in D.
- VSH,D ≡ { h ∈ H | Consistent(h, D) }
- Initialization
G ← (singleton) set containing the most general hypothesis in H, denoted {<?, …, ?>}
S ← set of most specific hypotheses in H, denoted {<Ø, …, Ø>}
- For each training example d:
If d is a positive example (Update-S):
Remove from G any hypotheses inconsistent with d
For each hypothesis s in S that is not consistent with d:
Remove s from S
Add to S all minimal generalizations h of s such that
1. h is consistent with d, and
2. some member of G is more general than h
(These are the greatest lower bounds, or meets, s ∧ d, in VSH,D)
Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominated elements)
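For conjunctions of literals, the minimal generalization of s needed to cover a positive example d has a simple closed form and is unique, so the Update-S step adds exactly one hypothesis per revised s. A sketch (the example instances are assumed):

```python
def min_generalization(s, d):
    """Minimally generalize hypothesis s so that it covers positive example d.
    For conjunctive hypotheses this is unique:
    'Ø' -> the example's value; a conflicting literal -> '?'."""
    h = []
    for c, v in zip(s, d):
        if c == "Ø":
            h.append(v)          # no value allowed yet: adopt d's value
        elif c == "?" or c == v:
            h.append(c)          # constraint already covers d
        else:
            h.append("?")        # conflicting literal: drop the constraint
    return tuple(h)

s0 = ("Ø",) * 6
d  = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")  # assumed positive
s1 = min_generalization(s0, d)   # from <Ø,...,Ø>: becomes d itself
s2 = min_generalization(s1, ("Sunny", "Warm", "High", "Strong", "Warm", "Same"))
print(s2)  # ('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')
```

The algorithm then keeps only those generalizations that some member of G still dominates, and prunes any member of S more general than another.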
Candidate Elimination Algorithm [1]