Chapter 22
Learning, Linear Separability and Linear Programming
CS 573: Algorithms, Fall 2013
November 12, 2013
22.1 The Perceptron algorithm
22.1.0.1 Labeling...
(A) Given examples: a database of cars.
(B) We would like to determine which cars are sports cars.
(C) Each car record is interpreted as a point in high dimensions.
(D) Example: a sports car with 4 doors, manufactured in 1997 by Quaky (manufacturer ID 6):
(4, 1997, 6).
Labeled as a sports car.
(E) A tractor by General Mess (manufacturer ID 3), manufactured in 1998: (0, 1998, 3).
Labeled as not a sports car.
(F) Real world: hundreds of attributes. In some cases even millions of attributes!
(G) Goal: automate this classification process, labeling sports/regular cars automatically.
22.1.0.2 Automatic classification...
(A) A learning algorithm:
    (a) is given several (or many) classified examples...
    (b) ...develops its own conjecture for a classification rule...
    (c) ...and can then use it to classify new data.
(B) Learning = training + classifying.
(C) Learn a function f : IR^d → {−1, 1}.
(D) Challenge: f might have infinite complexity...
(E) ...a rare situation in the real world. Assume learnable functions.
(F) Example: red and blue points that are linearly separable.
(G) We try to learn a line that separates the red points from the blue points.

22.1.0.3 Linear separability example...

[Figure: red and blue points in the plane, separated by a line.]

22.1.0.4 Learning linear separation

(A) Given red and blue points – how do we compute the separating line?
(B) A line/plane/hyperplane is the zero set of a linear function.
(C) Form: ∀x ∈ IR^d, f(x) = ⟨a, x⟩ + b, where a = (a_1, ..., a_d) ∈ IR^d and b ∈ IR.
    ⟨a, x⟩ = Σ_i a_i x_i is the dot product of a and x.
(D) Classification is done by computing the sign of f(x): sign(f(x)).
(E) If sign(f(x)) is negative: x is not in the class. If positive: x is in the class.
(F) A set of training examples:

    S = { (x_1, y_1), ..., (x_n, y_n) },

    where x_i ∈ IR^d and y_i ∈ {−1, 1}, for i = 1, ..., n.

22.1.0.5 Classification...

(A) A linear classifier h: a pair (w, b), where w ∈ IR^d and b ∈ IR.
(B) The classification of x ∈ IR^d is sign(⟨w, x⟩ + b).
(C) For a labeled example (x, y), h classifies (x, y) correctly if sign(⟨w, x⟩ + b) = y.
(D) Assume a linear classifier exists.
(E) Given n labeled examples, how do we compute a linear classifier for them?
(F) Use linear programming...
(G) We look for (w, b) such that for all (x_i, y_i) we have sign(⟨w, x_i⟩ + b) = y_i, which is

    ⟨w, x_i⟩ + b ≥ 0 if y_i = 1,  and  ⟨w, x_i⟩ + b ≤ 0 if y_i = −1.

    (See the sketch below.)
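To make the linear-programming step concrete, here is a minimal sketch (not from the notes) using scipy.optimize.linprog. The function names, the margin trick of asking for y_i(⟨w, x_i⟩ + b) ≥ 1 instead of ≥ 0 (which rules out the trivial solution w = 0, b = 0), and the toy car data are all assumptions for illustration.

```python
# A minimal sketch: learn a linear classifier (w, b) with an LP feasibility problem.
# Assumes the labeled examples are linearly separable; the points below are made up.
import numpy as np
from scipy.optimize import linprog

def learn_linear_classifier(X, y):
    """X: (n, d) array of points, y: (n,) array of labels in {-1, +1}.
    Returns (w, b) with y_i * (<w, x_i> + b) >= 1 for every example, or None."""
    n, d = X.shape
    # Variables: z = (w_1, ..., w_d, b).  Constraints: -y_i * (<w, x_i> + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    c = np.zeros(d + 1)                      # feasibility only: any feasible point will do
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (d + 1))
    return (res.x[:d], res.x[d]) if res.success else None

def classify(w, b, x):
    """Classification rule from the notes: sign(<w, x> + b)."""
    return 1 if np.dot(w, x) + b >= 0 else -1

if __name__ == "__main__":
    X = np.array([[4.0, 1997, 6], [2.0, 1995, 6], [0.0, 1998, 3], [0.0, 1990, 3]])
    y = np.array([1, 1, -1, -1])             # +1: sports car, -1: not a sports car
    w, b = learn_linear_classifier(X, y)
    print([classify(w, b, x) for x in X])    # should reproduce the training labels
```

Any feasible point of this LP is a valid separator; no objective is needed beyond feasibility.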

22.1.0.11 Claim by figure...

[Figure: two linearly separable point sets of radius R with optimal separator w_opt. Left ("hard"): small margin γ, # errors: (R/γ)^2. Right ("easy"): larger margin γ′, # errors: (R/γ′)^2.]

22.1.0.12 Proof of Perceptron convergence...

(A) Idea of proof: the Perceptron weight vector converges to w_opt.
(B) Distance between w_opt and the kth update vector:

    α_k = ‖ w_k − (R^2/γ) w_opt ‖^2.

(C) Quantify the change between α_k and α_{k+1}.
(D) The example being misclassified is (x, y).

22.1.0.13 Proof of Perceptron convergence...

(A) The example being misclassified is (x, y) (both are constants).
(B) Update rule: w_{k+1} ← w_k + y x.
(C) Then (using y^2 = 1):

    α_{k+1} = ‖ w_{k+1} − (R^2/γ) w_opt ‖^2
            = ‖ w_k + y x − (R^2/γ) w_opt ‖^2
            = ‖ (w_k − (R^2/γ) w_opt) + y x ‖^2
            = ⟨ (w_k − (R^2/γ) w_opt) + y x , (w_k − (R^2/γ) w_opt) + y x ⟩
            = ⟨ w_k − (R^2/γ) w_opt , w_k − (R^2/γ) w_opt ⟩ + 2y ⟨ w_k − (R^2/γ) w_opt , x ⟩ + ⟨ x, x ⟩
            = α_k + 2y ⟨ w_k − (R^2/γ) w_opt , x ⟩ + ‖x‖^2.

22.1.0.14 Proof of Perceptron convergence...

(A) We proved: α_{k+1} = α_k + 2y ⟨ w_k − (R^2/γ) w_opt , x ⟩ + ‖x‖^2.
(B) (x, y) is misclassified: sign(⟨w_k, x⟩) ≠ y
(C) =⇒ sign(y ⟨w_k, x⟩) = −1
(D) =⇒ y ⟨w_k, x⟩ < 0.
(E) ‖x‖ ≤ R =⇒

    α_{k+1} ≤ α_k + R^2 + 2y ⟨w_k, x⟩ − 2y (R^2/γ) ⟨w_opt, x⟩
            ≤ α_k + R^2 − 2 (R^2/γ) y ⟨w_opt, x⟩,

(F) ...since 2y ⟨w_k, x⟩ < 0.

22.1.0.15 Proof of Perceptron convergence...

(A) Proved: α_{k+1} ≤ α_k + R^2 − 2 (R^2/γ) y ⟨w_opt, x⟩.
(B) sign(⟨w_opt, x⟩) = y.
(C) By the margin assumption: y ⟨w_opt, x⟩ ≥ γ, ∀(x, y) ∈ S.
(D) Hence

    α_{k+1} ≤ α_k + R^2 − 2 (R^2/γ) y ⟨w_opt, x⟩ ≤ α_k + R^2 − 2 (R^2/γ) γ = α_k + R^2 − 2R^2 = α_k − R^2.

22.1.0.16 Proof of Perceptron convergence...

(A) We have: α_{k+1} ≤ α_k − R^2.
(B) α_0 = ‖ 0 − (R^2/γ) w_opt ‖^2 = (R^4/γ^2) ‖w_opt‖^2 = R^4/γ^2 (since ‖w_opt‖ = 1).
(C) ∀i: α_i ≥ 0.
(D) Q: What is the maximum number of classification errors the algorithm can make?
(E) ...it equals the number of updates...
(F) ...and the number of updates is ≤ α_0 / R^2...
(G) A: ≤ R^2 / γ^2. (See the sketch below.)
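The convergence argument above analyzes the update rule w_{k+1} ← w_k + y x, applied whenever an example is misclassified. Below is a minimal sketch of that loop, not taken from the notes: as in the analysis, the separator passes through the origin (a bias term is not handled here), and the data points, the function name, and the update cap are assumptions for illustration.

```python
# A minimal sketch of the Perceptron update analyzed above (w_{k+1} <- w_k + y x).
# As in the proof, the separator passes through the origin; the data are made up
# and assumed linearly separable with some margin gamma > 0.
import numpy as np

def perceptron(X, y, max_updates=10_000):
    """X: (n, d) points, y: (n,) labels in {-1, +1}.
    Repeatedly fixes a misclassified example; the analysis bounds the number of
    updates (hence mistakes) by (R / gamma)^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_updates):
        mistakes = [i for i in range(n) if np.sign(w @ X[i]) != y[i]]
        if not mistakes:
            return w                      # every example is classified correctly
        i = mistakes[0]
        w = w + y[i] * X[i]               # the update rule from the notes
    return w

if __name__ == "__main__":
    X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-1.0, -2.5]])
    y = np.array([1, 1, -1, -1])
    w = perceptron(X, y)
    print(w, [int(np.sign(w @ x)) for x in X])
```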

(C) ∀(x, y, x^2 + y^2) ∈ B: a x + b y + c (x^2 + y^2) + d ≥ 0.
(D) U(h) = { (x, y) | h((x, y, x^2 + y^2)) ≤ 0 }.
(E) If U(h) is a circle =⇒ R ⊆ U(h) and B ∩ U(h) = ∅.
(F) U(h) ≡ a x + b y + c (x^2 + y^2) ≤ −d
(G) ⇐⇒ (dividing by c, for c > 0) (x^2 + (a/c) x) + (y^2 + (b/c) y) ≤ −d/c
(H) ⇐⇒ (x + a/(2c))^2 + (y + b/(2c))^2 ≤ (a^2 + b^2)/(4c^2) − d/c.
(I) This is a disk in the plane, as claimed.

22.2.0.22 A closing comment...

Linear separability is a powerful technique that can be used to learn concepts considerably more complicated than separation by a hyperplane. The lifting technique shown above is known as the kernel technique, or linearization. A small sketch of the idea in code follows.
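As an illustration of linearization, the sketch below (not from the notes) lifts planar points to (x, y, x^2 + y^2) and looks for a separating plane with scipy.optimize.linprog; fixing the lifted coordinate's coefficient to 1 forces the corresponding planar region to be a disk, matching the derivation above with c = 1. The point sets, the function name, and the strictness constant eps are assumptions.

```python
# A minimal sketch of linearization: separate red from blue points by a circle
# by lifting (x, y) -> (x, y, x^2 + y^2) and finding a separating plane.
# The point sets below are made up; eps is an arbitrary strictness margin.
import numpy as np
from scipy.optimize import linprog

def separating_disk(red, blue, eps=1e-3):
    """Find (a, b, d) such that x^2 + y^2 + a x + b y + d <= 0 on red points and
    >= eps on blue points, i.e. a disk containing red and excluding blue.
    (The lifted coordinate's coefficient is fixed to 1, so the region is a disk.)
    Returns (center, radius) or None if no such disk exists."""
    red, blue = np.asarray(red, float), np.asarray(blue, float)
    rows, rhs = [], []
    for x, yy in red:                     # inside:  a*x + b*yy + d <= -(x^2 + yy^2)
        rows.append([x, yy, 1.0]);   rhs.append(-(x * x + yy * yy))
    for x, yy in blue:                    # outside: -(a*x + b*yy + d) <= x^2 + yy^2 - eps
        rows.append([-x, -yy, -1.0]); rhs.append(x * x + yy * yy - eps)
    res = linprog(np.zeros(3), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * 3)
    if not res.success:
        return None
    a, b, d = res.x
    center = np.array([-a / 2.0, -b / 2.0])           # matches (x + a/2)^2 + (y + b/2)^2
    radius = np.sqrt(max((a * a + b * b) / 4.0 - d, 0.0))
    return center, radius

if __name__ == "__main__":
    red = [(0.0, 0.0), (0.5, 0.3), (-0.4, 0.2)]        # points to enclose
    blue = [(2.0, 0.0), (0.0, 2.2), (-2.1, -0.1)]      # points to exclude
    print(separating_disk(red, blue))
```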

22.3 A Little Bit On VC Dimension

22.3.0.23 A Little Bit On VC Dimension

(A) Q: How complex is the function we are trying to learn?
(B) The VC dimension is one way of capturing this notion. (VC = Vapnik–Chervonenkis, 1971.)
(C) A matter of expressivity – what is harder to learn:
    (a) a rectangle in the plane,
    (b) a halfplane, or
    (c) a convex polygon with k sides?

22.3.0.24 Thinking about concepts as binary functions...

(A) X = {p_1, p_2, ..., p_m}: points in the plane.
(B) H: the set of all halfplanes.
(C) A halfplane r ∈ H defines a binary vector
    r(X) = (b_1, ..., b_m), where b_i = 1 if and only if p_i is inside r.
(D) The possible binary vectors generated by halfplanes: U(X, H) = { r(X) | r ∈ H }.
(E) A set X of m elements is shattered by a set of ranges R if
    |U(X, R)| = 2^m.
(F) What does this mean?
(G) The VC dimension of a set of ranges R is the size of the largest set that it can shatter. (See the sketch below.)
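To make U(X, H) and shattering concrete, here is a minimal sketch (not from the notes): it enumerates all 2^m labelings of a small made-up point set and uses a linear program to test which labelings are realized by a halfplane, i.e. it computes |U(X, H)|. The function names, the point coordinates, and the strictness constant eps are assumptions.

```python
# A minimal sketch of U(X, H) and shattering for halfplanes, using an LP to test
# whether a given 0/1 labeling of the points is realized by some halfplane.
# The point sets are made up; eps is an arbitrary strictness margin.
from itertools import product
import numpy as np
from scipy.optimize import linprog

def realizable_by_halfplane(points, labels, eps=1e-3):
    """True if there is a halfplane <w, p> + b >= 0 containing exactly the
    points with label 1 (the others are pushed strictly to the other side)."""
    rows, rhs = [], []
    for (x, y), lab in zip(points, labels):
        if lab == 1:                      # inside:  -( w1*x + w2*y + b ) <= 0
            rows.append([-x, -y, -1.0]); rhs.append(0.0)
        else:                             # outside: w1*x + w2*y + b <= -eps
            rows.append([x, y, 1.0]);    rhs.append(-eps)
    res = linprog(np.zeros(3), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * 3)
    return res.success

def count_realized_labelings(points):
    """|U(X, H)|: number of distinct 0/1 vectors r(X) generated by halfplanes."""
    m = len(points)
    return sum(realizable_by_halfplane(points, labels)
               for labels in product([0, 1], repeat=m))

if __name__ == "__main__":
    three = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]          # non-collinear: shattered
    four = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(count_realized_labelings(three), "of", 2 ** 3)  # expect 8 of 8
    print(count_realized_labelings(four), "of", 2 ** 4)   # fewer than 16: not shattered
```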

22.3.1 Examples

22.3.1.1 Examples

What is the VC dimension of circles in the plane? X is a set of n points in the plane, and C is the set of all circles (i.e., disks). Consider X = {p, q, r, s}.

What subsets of X can we generate by a circle?

[Figure: four points p, q, r, s in the plane.]

22.3.1.2 Subsets realized by disks

[Figure: the four points p, q, r, s, with disks realizing various subsets.]

{}, {r}, {p}, {q}, {s}, {p, s}, {p, q}, {p, r}, {r, q}, {q, s}, {r, p, q}, {p, r, s}, {p, s, q}, {s, q, r}, and {r, p, q, s}.

We got only 15 sets; one subset is missing. Which one? No set of four points can be shattered by disks, while any three points in general position can be, so the VC dimension of circles in the plane is 3. (A small computational check of this count appears below.)
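Here is a minimal sketch (not from the notes) that counts which subsets of a 4-point set are realizable as "the points inside some disk", reusing the lifting (x, y) → (x, y, x^2 + y^2) and an LP as in the linearization section. The coordinates are made up (they are not necessarily the configuration drawn in the notes), and the function names and eps are assumptions.

```python
# A minimal sketch: count the subsets of a 4-point set that are realizable as
# "the points inside some disk", via lifting (x, y) -> (x, y, x^2 + y^2) and an LP.
# The coordinates below are made up; eps is an arbitrary strictness margin.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def realizable_by_disk(inside, outside, eps=1e-3):
    """True if some disk x^2 + y^2 + a x + b y + d <= 0 contains every point of
    `inside` and none of `outside` (their values must be >= eps).
    (For the empty subset, any small far-away disk also works.)"""
    rows, rhs = [], []
    for x, y in inside:
        rows.append([x, y, 1.0]);    rhs.append(-(x * x + y * y))
    for x, y in outside:
        rows.append([-x, -y, -1.0]); rhs.append(x * x + y * y - eps)
    res = linprog(np.zeros(3), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * 3)
    return res.success

if __name__ == "__main__":
    pts = {"p": (0.0, 0.3), "q": (0.0, -0.3), "r": (-2.0, 0.0), "s": (2.0, 0.0)}
    realized = []
    for k in range(5):
        for names in combinations(sorted(pts), k):
            inside = [pts[n] for n in names]
            outside = [pts[n] for n in pts if n not in names]
            if realizable_by_disk(inside, outside):
                realized.append(set(names))
    # For this made-up configuration the only missing subset should be the pair
    # {r, s} (15 of 16), so this 4-point set is not shattered by disks.
    print(len(realized), "of 16 subsets realized")
```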

22.3.1.3 Sauer’s Lemma

Lemma 22.3.1 (Sauer's Lemma). If R has VC dimension d, then |U(X, R)| = O(m^d), where m is the size of X.