Machine Learning - Artificial Intelligence - Lecture Slides

Concepts of Artificial Intelligence covered include Agents and Problem Solving, Autonomy, Programs, Classical and Modern Planning, First-Order Logic, Resolution Theorem Proving, Search Strategies, and Structure Learning. The main points of this lecture are: Machine Learning, Representation and Reasoning (Logical, Probabilistic), the Machine Learning Framework, Taxonomies, Instance Spaces, Hypotheses, and Hypothesis Spaces.



Lecture 34 of 41

Introduction to Machine Learning

Lecture Outline

  • Next Week: Sections 18.3-18.4, 18.6-18.7, Russell and Norvig
  • Previously: Representation and Reasoning (Inference)
    • Logical
    • Probabilistic (“Soft Computing”)
  • Today: Introduction to Learning
    • Machine learning framework
    • Definitions
      • Taxonomies: supervised, unsupervised, reinforcement
      • Instance spaces ( X )
      • Hypotheses ( h ) and hypothesis spaces ( H )
    • Basic examples
    • Version spaces and candidate elimination algorithm
  • Next Thursday: Inductive Bias and Learning Decision Trees

Example (Revisited):

Learning to Play Board Games

  • Type of Training Experience
    • Direct or indirect?
    • Teacher or not?
    • Knowledge about the game (e.g., openings/endgames)?
  • Problem: Is Training Experience Representative (of Performance Goal)?
  • Software Design
    • Assumptions of the learning system: legal move generator exists
    • Software requirements: generator, evaluator(s), parametric target function
  • Choosing a Target Function
    • ChooseMove: Board → Move (action selection function, or policy)
    • V: Board → ℝ (board evaluation function)
    • Ideal target V; approximated target V̂
    • Goal of learning process: operational description (approximation) of V

Implicit Representation in Learning:

Target Evaluation Function for Checkers

  • Possible Definition
    • If b is a final board state that is won, then V(b) = 100
    • If b is a final board state that is lost, then V(b) = -100
    • If b is a final board state that is drawn, then V(b) = 0
    • If b is not a final board state in the game, then V(b) = V(b’) where b’ is the best final board state that can be achieved starting from b and playing optimally until the end of the game
    • Correct values, but not operational
  • Choosing a Representation for the Target Function
    • Collection of rules?
    • Neural network?
    • Polynomial function (e.g., linear, quadratic combination) of board features?
    • Other?
  • A Representation for Learned Function

    • bp/rp = number of black/red pieces; bk/rk = number of black/red kings; bt/rt = number of black/red pieces threatened (can be taken on next turn)

V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
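To make the representation concrete, here is a minimal Python sketch of this linear evaluation function. The feature names follow the slide; the particular weight values and the example position are placeholder assumptions, not learned values.

```python
# Sketch of the learned evaluation function V-hat(b) for checkers.
# Features (per the slide): bp/rp = black/red pieces, bk/rk = black/red kings,
# bt/rt = black/red pieces threatened. The weights below are arbitrary
# placeholders; a learning algorithm would tune them from training experience.

def v_hat(features, weights):
    """Compute w0 + w1*bp + w2*rp + w3*bk + w4*rk + w5*bt + w6*rt."""
    bp, rp, bk, rk, bt, rt = features
    w0, w1, w2, w3, w4, w5, w6 = weights
    return w0 + w1 * bp + w2 * rp + w3 * bk + w4 * rk + w5 * bt + w6 * rt

# Hypothetical position: 5 black pieces, 4 red pieces, 1 black king,
# 0 red kings, 2 black pieces threatened, 1 red piece threatened.
weights = (0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5)
print(v_hat((5, 4, 1, 0, 2, 1), weights))   # 3.5
```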

Performance Element:

What to Learn?

  • Classification Functions
    • Hidden functions: estimating (“fitting”) parameters
    • Concepts (e.g., chair, face, game)
    • Diagnosis, prognosis: medical, risk assessment, fraud, mechanical systems
  • Models
    • Map (for navigation)
    • Distribution (query answering, aka QA)
    • Language model (e.g., automaton/grammar)
  • Skills
    • Playing games
    • Planning
    • Reasoning (acquiring representation to use in reasoning)
  • Cluster Definitions for Pattern Recognition
    • Shapes of objects
    • Functional or taxonomic definition
  • Many Learning Problems Can Be Reduced to Classification

Representations and Algorithms:

How to Learn It?

  • Supervised
    • What is learned? Classification function; other models
    • Inputs and outputs? Learning: examples ⟨x, f(x)⟩ → approximation f̂(x)
    • How is it learned? Presentation of examples to learner (by teacher)
  • Unsupervised
    • What is learned? Cluster definition, or vector quantization function (codebook)
    • Inputs and outputs? Learning: observations x, distance metric d(x1, x2) → discrete codebook f(x)
    • How is it learned? Formation, segmentation, labeling of clusters based on observations, metric
  • Reinforcement
    • What is learned? Control policy (function from states of the world to actions)
    • Inputs and outputs? Learning: state/reward sequence ⟨(si, ri) : 1 ≤ i ≤ n⟩ → policy p: s → a
    • How is it learned? (Delayed) feedback of reward values to agent based on actions selected; model updated based on reward, (partially) observable state
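To make the three input/output signatures above concrete, the sketch below writes them as Python type aliases. The alias names and the use of an integer cluster index for the codebook are illustrative assumptions, not part of the lecture.

```python
# Input/output signatures of the three learning settings, written as Python
# type aliases (notation only; nothing is executed beyond the definitions).
from typing import Callable, Sequence, Tuple, TypeVar

X = TypeVar("X")   # instance
Y = TypeVar("Y")   # target value f(x)
S = TypeVar("S")   # world state
A = TypeVar("A")   # action

# Supervised: examples <x, f(x)>  ->  approximation of f
SupervisedLearner = Callable[[Sequence[Tuple[X, Y]]], Callable[[X], Y]]

# Unsupervised: observations x plus a distance metric d(x1, x2)  ->  discrete codebook
UnsupervisedLearner = Callable[[Sequence[X], Callable[[X, X], float]], Callable[[X], int]]

# Reinforcement: state/reward sequence <(s_i, r_i) : 1 <= i <= n>  ->  policy p: s -> a
ReinforcementLearner = Callable[[Sequence[Tuple[S, float]]], Callable[[S], A]]
```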

Example:

Supervised Inductive Learning Problem

Unknown function: y = f(x1, x2, x3, x4), with input attributes x1, x2, x3, x4

  • xi: ti, y: t, f: (t1 × t2 × t3 × t4) → t
  • Our learning function: Vector(t1 × t2 × t3 × t4 × t) → (t1 × t2 × t3 × t4) → t

Example   x1 x2 x3 x4   y
   0       0  0  1  0   0
   1       0  1  0  0   0
   2       0  0  1  1   1
   3       1  0  0  1   1
   4       0  1  1  0   0
   5       1  1  0  0   0
   6       0  1  0  1   0

Hypothesis Space:

Unrestricted Case

  • |A → B| = |B|^|A|
  • |{0,1}^4 → {0,1}| = |{0,1} × {0,1} × {0,1} × {0,1} → {0,1}| = 2^(2^4) = 65536 function values
  • Complete Ignorance: Is Learning Possible?
    • Need to see every possible input/output pair
    • After 7 examples, still have 2^9 = 512 possibilities (out of 65536) for f

Example   x1 x2 x3 x4   y
   0       0  0  0  0   ?
   1       0  0  0  1   ?
   2       0  0  1  0   0
   3       0  0  1  1   1
   4       0  1  0  0   0
   5       0  1  0  1   0
   6       0  1  1  0   0
   7       0  1  1  1   ?
   8       1  0  0  0   ?
   9       1  0  0  1   1
  10       1  0  1  0   ?
  11       1  0  1  1   ?
  12       1  1  0  0   0
  13       1  1  0  1   ?
  14       1  1  1  0   ?
  15       1  1  1  1   ?
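The counting argument can be checked directly by brute force. The sketch below enumerates every Boolean function over the 4 attributes and counts those that agree with the 7 labelled rows in the table above; everything else in it is ordinary standard-library Python.

```python
# Brute-force check: of the 2^(2^4) = 65536 Boolean functions f: {0,1}^4 -> {0,1},
# how many agree with the 7 labelled examples? Expected answer: 2^9 = 512.
from itertools import product

inputs = list(product([0, 1], repeat=4))          # all 16 possible input vectors
labelled = {                                      # the 7 rows with known y
    (0, 0, 1, 0): 0, (0, 0, 1, 1): 1, (0, 1, 0, 0): 0, (0, 1, 0, 1): 0,
    (0, 1, 1, 0): 0, (1, 0, 0, 1): 1, (1, 1, 0, 0): 0,
}

count = 0
for outputs in product([0, 1], repeat=len(inputs)):   # one candidate function f
    f = dict(zip(inputs, outputs))
    if all(f[x] == y for x, y in labelled.items()):
        count += 1

print(count)   # 512
```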

Representing Hypotheses

  • Many Possible Representations
  • Hypothesis h : Conjunction of Constraints on Attributes
  • Constraint Values
    • Specific value (e.g., Water = Warm )
    • Don’t care (e.g., “Water = ?” )
    • No value allowed (e.g., “Water = Ø ”)
  • Example Hypothesis for EnjoySport
    • Sky AirTemp Humidity Wind Water Forecast: <Sunny, ?, ?, Strong, ?, Same>
    • Is this consistent with the training examples?
    • What are some hypotheses that are consistent with the examples?
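As a minimal sketch, the conjunction-of-constraints representation can be coded as a tuple of per-attribute constraints together with a membership test. The attribute order and the hypothesis reuse the EnjoySport example above; the instances x1 and x2 are the ones shown on a later slide.

```python
# Hypothesis = tuple of attribute constraints in the order
# (Sky, AirTemp, Humidity, Wind, Water, Forecast).
# '?' means "any value allowed", 'Ø' means "no value allowed",
# anything else is a required specific value.

def satisfies(hypothesis, instance):
    """True iff every attribute constraint in the hypothesis is met by the instance."""
    return all(
        c == "?" or (c != "Ø" and c == v)
        for c, v in zip(hypothesis, instance)
    )

h  = ("Sunny", "?", "?", "Strong", "?", "Same")
x1 = ("Sunny", "Warm", "High", "Strong", "Cool", "Same")
x2 = ("Sunny", "Warm", "High", "Light", "Warm", "Same")
print(satisfies(h, x1), satisfies(h, x2))   # True False
```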

Typical Concept Learning Tasks

  • Given
    • Instances X: possible days, each described by attributes Sky, AirTemp, Humidity, Wind, Water, Forecast
    • Target function c ≡ EnjoySport: X ≡ {Rainy, Sunny} × {Warm, Cold} × {Normal, High} × {None, Mild, Strong} × {Cool, Warm} × {Same, Change} → {0, 1}
    • Hypotheses H: conjunctions of literals (e.g., <Sunny, ?, ?, Strong, ?, Same>)
    • Training examples D : positive and negative examples of the target function
  • Determine
    • Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
    • Such h are consistent with the training data
  • Training Examples D: ⟨x1, c(x1)⟩, …, ⟨xm, c(xm)⟩
    • Assumption: no missing X values
    • Noise in values of c (contradictory labels)?

Instances, Hypotheses, and

the Partial Ordering Less-Specific-Than

[Figure: partial ordering of instances (X) and hypotheses (H), arranged from Specific to General]

Instances X:
  x1 = <Sunny, Warm, High, Strong, Cool, Same>
  x2 = <Sunny, Warm, High, Light, Warm, Same>

Hypotheses H:
  h1 = <Sunny, ?, ?, Strong, ?, ?>
  h2 = <Sunny, ?, ?, ?, ?, ?>
  h3 = <Sunny, ?, ?, ?, Cool, ?>

h2 ≥P h1 and h2 ≥P h3

≥P ≡ Less-Specific-Than ≡ More-General-Than

Find-S Algorithm

  1. Initialize h to the most specific hypothesis in H

     (H: the hypothesis space, a partially ordered set under the relation Less-Specific-Than)

  2. For each positive training instance x

     For each attribute constraint ai in h:
       IF the constraint ai in h is satisfied by x
       THEN do nothing
       ELSE replace ai in h by the next more general constraint that is satisfied by x

  3. Output hypothesis h
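A runnable sketch of Find-S for this attribute-vector language follows. The generalization step here is the attribute-wise one (replace a mismatched value with "?"); the EnjoySport positive examples used at the end are illustrative, not taken from a table in these slides.

```python
# Find-S for conjunctive attribute-vector hypotheses.
# 'Ø' = "no value allowed" (the most specific constraint), '?' = "don't care".

def find_s(positive_examples, n_attributes):
    h = ["Ø"] * n_attributes                 # 1. most specific hypothesis in H
    for x in positive_examples:              # 2. for each positive instance x
        for i, (constraint, value) in enumerate(zip(h, x)):
            if constraint == "Ø":            # first positive value seen: adopt it
                h[i] = value
            elif constraint != value:        # constraint violated: generalize to '?'
                h[i] = "?"
            # otherwise the constraint is already satisfied by x; do nothing
    return tuple(h)                          # 3. output hypothesis h

# Illustrative EnjoySport positives (Sky, AirTemp, Humidity, Wind, Water, Forecast):
positives = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High",   "Strong", "Cool", "Change"),
]
print(find_s(positives, 6))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```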

Version Spaces

  • Definition: Consistent Hypotheses
    • A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example ⟨x, c(x)⟩ in D.
    • Consistent(h, D) ≡ ∀ ⟨x, c(x)⟩ ∈ D . h(x) = c(x)
  • Definition: Version Space
    • The version space VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.
    • VS_{H,D} ≡ { h ∈ H | Consistent(h, D) }
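A minimal sketch of these two definitions, using a toy two-attribute hypothesis language (Sky, AirTemp) so that the whole hypothesis space H can be enumerated; the "Ø" constraint is omitted for brevity, and the function names are illustrative.

```python
# Consistent(h, D) and the version space VS_{H,D}, computed by brute force over
# a small conjunctive hypothesis language (no 'Ø' constraint, for brevity).
from itertools import product

def satisfies(h, x):
    # '?' matches any value; otherwise the constraint must equal the attribute value
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, D):
    """Consistent(h, D)  <=>  h(x) = c(x) for every <x, c(x)> in D."""
    return all(satisfies(h, x) == label for x, label in D)

def version_space(H, D):
    """VS_{H,D} = { h in H | Consistent(h, D) }."""
    return [h for h in H if consistent(h, D)]

H = list(product(["Sunny", "Rainy", "?"], ["Warm", "Cold", "?"]))
D = [(("Sunny", "Warm"), True), (("Rainy", "Cold"), False)]
print(version_space(H, D))   # [('Sunny', 'Warm'), ('Sunny', '?'), ('?', 'Warm')]
```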
  1. Initialization

     G ← (singleton) set containing the most general hypothesis in H, denoted {<?, …, ?>}
     S ← set of most specific hypotheses in H, denoted {<Ø, …, Ø>}

  2. For each training example d

     If d is a positive example (Update-S):
       Remove from G any hypotheses inconsistent with d
       For each hypothesis s in S that is not consistent with d:
         Remove s from S
         Add to S all minimal generalizations h of s such that
           1. h is consistent with d, and
           2. some member of G is more general than h
         (these are the greatest lower bounds, or meets, s ∧ d, in VS_{H,D})
       Remove from S any hypothesis that is more general than another hypothesis in S (remove any dominated elements)
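The sketch below implements just this positive-example branch (Update-S) for the attribute-vector language. In that language each hypothesis has a single minimal generalization covering a new positive example, so the "add all minimal generalizations" step yields one candidate; the helper names are illustrative assumptions.

```python
# Update-S: processing one positive example in candidate elimination,
# for conjunctive attribute-vector hypotheses ('?' = don't care, 'Ø' = no value).

def satisfies(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(g, h):
    # g >=P h iff every constraint in g is '?' or equals the corresponding one in h
    return all(gc == "?" or gc == hc for gc, hc in zip(g, h))

def min_generalization(s, x):
    # least generalization of s that is satisfied by the positive example x
    return tuple(v if c == "Ø" else (c if c == v else "?") for c, v in zip(s, x))

def update_on_positive(G, S, x):
    G = [g for g in G if satisfies(g, x)]            # remove inconsistent g from G
    candidates = []
    for s in S:
        if satisfies(s, x):
            candidates.append(s)                     # s already consistent: keep it
        else:
            h = min_generalization(s, x)             # unique minimal generalization
            if any(more_general_or_equal(g, h) for g in G):
                candidates.append(h)                 # keep h only if some g >=P h
    # remove any member of S more general than another member (dominated elements)
    S = [s for s in candidates
         if not any(t != s and more_general_or_equal(s, t) for t in candidates)]
    return G, S

# One step on EnjoySport: G starts at {<?, ..., ?>}, S at {<Ø, ..., Ø>}.
G0, S0 = [("?",) * 6], [("Ø",) * 6]
x_pos = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(update_on_positive(G0, S0, x_pos))
```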

Candidate Elimination Algorithm [1]