Clustering Algorithms: Types, Sequential Algorithms, and Challenges, Slides of Pattern Classification and Recognition

An overview of clustering algorithms, their major categories, and the challenges they face. It discusses sequential clustering algorithms, such as the basic sequential clustering algorithm (BSAS) and the maxmin algorithm, together with refinement stages. It also touches upon the issues of cluster closeness and sensitivity to the order of data presentation.

Typology: Slides

2011/2012

Uploaded on 07/17/2012

bandhula
1
CLUSTERING ALGORITHMS

Number of possible clusterings
Let $X = \{x_1, x_2, \ldots, x_N\}$.
Question: In how many ways can the $N$ points be assigned into $m$ groups?
Answer:
$$S(N,m) = \frac{1}{m!} \sum_{i=0}^{m} (-1)^{m-i} \binom{m}{i} i^N$$
Examples:
$S(15,3) = 2\,375\,101$
$S(20,4) = 45\,232\,115\,901$
$S(100,5) \approx 10^{68}$ !!
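As a quick check of the formula and the examples above, here is a minimal Python sketch. The function name `stirling2` is an illustrative choice, not something from the slides; it evaluates the sum directly with exact integer arithmetic.

```python
from math import comb, factorial

def stirling2(n: int, m: int) -> int:
    """Number of ways to assign n points into m non-empty groups, S(n, m)."""
    total = sum((-1) ** (m - i) * comb(m, i) * i ** n for i in range(m + 1))
    return total // factorial(m)  # the sum is always an exact multiple of m!

print(stirling2(15, 3))               # 2375101
print(stirling2(20, 4))               # 45232115901
print(len(str(stirling2(100, 5))))    # 68 digits, i.e. on the order of 10^68
```

Even for modest N and m the count is far too large for exhaustive search, which is what motivates the algorithms that follow.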

2

^

A way out:^ 

Consider only a small fraction of clusterings of

X^

and

select a “sensible” clustering among them.• Question 1: Which fraction of clusterings is

considered?• Question 2: What “sensible” means?• The answer depends on the specific clusteringalgorithm and the specific criteria to be adopted.

4

^

Cost function optimization.

For most of the cases a

single clustering is obtained.^ ^ Hard clustering

(each point belongs exclusively to a single

cluster):• Basic hard clustering algorithms (e.g.,

k -means)

-^ k

-medoids algorithms

  • Mixture decomposition• Branch and bound• Simulated annealing• Deterministic annealing• Boundary detection• Mode seeking• Genetic clustering algorithms

^ Fuzzy clustering

(each point belongs to more than one

clusters simultaneously).  Possibilistic clustering

(it is based on the

possibility of a

point to belong to a cluster).

5

^

Other schemes:^ ^ Algorithms based on graph theory

(^ e.g., Minimum

Spanning Tree, regions of influence, directed trees).  Competitive learning algorithms (basic competitivelearning scheme, Kohonen self organizing maps).  Subspace clustering algorithms.  Binary morphology clustering algorithms.

7

^

Basic Sequential Clustering Algorithm (BSAS)^ •^

m =

{number of clusters}\

-^ C

={ m

x }^1

-^ For

i =

to^

N

^ Find

C: dk^

( x^ ,Ci^

)= mink

1  jm

d ( x^ i

,C ) j

^ If

( d (

x^ ,Ci^

)> Θk

)^ AND

( m <

q )^ then

o^ m

= m + o^ C

={ m x^ } i

^ Else

o^ C

= Ck

{ k x^ } i

o Where necessary, update representatives (*) ^ End {if}

-^ End {for} (*) When the mean vector

m^ C

is used as representative of the cluster

C

with

nc^

elements, the updating in the light of a new vector

x^ becomes

m^ C

new =(

nC^

m^ C

+^ x

nC +1)
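The BSAS steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the slides' own code: points are tuples of floats, the dissimilarity d(x, C) is taken to be the Euclidean distance to the cluster mean, and the names `bsas`, `theta` and `q` are chosen here for readability.

```python
from math import dist

def bsas(points, theta, q):
    """BSAS sketch with mean-vector representatives (an assumed choice of d).

    points: equal-length numeric tuples, in presentation order
    theta:  dissimilarity threshold; q: maximum number of clusters
    Returns (clusters, means); clusters holds lists of point indices.
    """
    clusters = [[0]]                    # C_1 = {x_1}
    means = [list(points[0])]
    for i in range(1, len(points)):
        x = points[i]
        # Nearest cluster, measured against the current mean vectors.
        k = min(range(len(means)), key=lambda j: dist(x, means[j]))
        if dist(x, means[k]) > theta and len(clusters) < q:
            clusters.append([i])        # open a new cluster C_m = {x_i}
            means.append(list(x))
        else:
            n = len(clusters[k])        # n_C before x is added
            means[k] = [(n * mu + xi) / (n + 1) for mu, xi in zip(means[k], x)]
            clusters[k].append(i)
    return clusters, means

data = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0), (0.0, 0.5)]
clusters, means = bsas(data, theta=2.0, q=3)
print(clusters)   # [[0, 1, 4], [2, 3]]
```

The mean update inside the `else` branch is the formula marked (*), applied coordinate by coordinate.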

8

^

Remarks:•^

The order of presentation of the data in the algorithm plays importantrole in the clustering results. Different order of presentation may leadto totally different clustering results, in terms of the number ofclusters as well as the clusters themselves.• In BSAS the decision for a vector

x^ is reached prior to the final cluster

formation.• BSAS perform a single pass on the data. Its complexity is

O (
N ).

-^ If clusters are represented by point representatives, compact clustersare favored.
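The sensitivity to presentation order can be seen on a toy case. The sketch below is a compact one-dimensional variant of BSAS written only for this illustration (the name `bsas_1d`, the data and the threshold are all made up); it assumes mean representatives and absolute difference as the distance.

```python
def bsas_1d(xs, theta, q):
    """Compact 1-D BSAS; each cluster is stored as [sum, count]."""
    stats = [[xs[0], 1]]
    for x in xs[1:]:
        # Index of the cluster whose mean is nearest to x.
        k = min(range(len(stats)), key=lambda j: abs(x - stats[j][0] / stats[j][1]))
        if abs(x - stats[k][0] / stats[k][1]) > theta and len(stats) < q:
            stats.append([x, 1])       # start a new cluster
        else:
            stats[k][0] += x           # absorb x into cluster k
            stats[k][1] += 1
    return [s / n for s, n in stats]   # cluster means

# Same three values, two presentation orders.
print(bsas_1d([0.0, 3.0, 1.5], theta=2.5, q=5))   # [0.75, 3.0]
print(bsas_1d([1.5, 0.0, 3.0], theta=2.5, q=5))   # [1.5]
```

With the first order two clusters are formed; with the second order the running mean drifts so that every value stays within the threshold and only one cluster results.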

10

^

MBSAS, a modification of BSASIn BSAS a decision for a data vector

x^ is reached prior to the final cluster

formation, which is determined after all vectors have been presented tothe algorithm.•^

MBSAS deals with the above drawback,

at the cost of presenting the

data twice to the algorithm.• MBSAS consists of:^ ^

A cluster determination phase (first pass on the data),which is the same as BSAS with the exception that no vector is assignedto an already formed cluster. At the end of this phase, each clusterconsists of a single element.  A pattern classification phase (second pass on the data),where each one of the unassigned vector is assigned to its closest cluster.

^ Remarks:•^

In MBSAS, a decision for a vector

x^ during the pattern classification

phase is reached taking into account all clusters.• MBSAS is

sensitive to the order of presentation of the vectors.

-^ MBSAS requires two passes on the data. Its complexity is

O (
N ).
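A minimal sketch of the two MBSAS phases, under the same assumptions as before (Euclidean distance to the cluster mean as d(x, C), and illustrative names `mbsas`, `theta`, `q`). Updating the mean during the second pass is one possible choice for "update representatives".

```python
from math import dist

def mbsas(points, theta, q):
    """Two-pass MBSAS sketch; returns (labels, means)."""
    labels = [None] * len(points)
    labels[0] = 0
    means = [list(points[0])]
    # Pass 1 (cluster determination): only open new clusters.
    for i in range(1, len(points)):
        d_min = min(dist(points[i], mu) for mu in means)
        if d_min > theta and len(means) < q:
            labels[i] = len(means)
            means.append(list(points[i]))
    counts = [1] * len(means)           # every cluster is a single element here
    # Pass 2 (pattern classification): assign each remaining vector.
    for i, x in enumerate(points):
        if labels[i] is None:
            k = min(range(len(means)), key=lambda j: dist(x, means[j]))
            n = counts[k]
            means[k] = [(n * mu + xi) / (n + 1) for mu, xi in zip(means[k], x)]
            counts[k] = n + 1
            labels[i] = k
    return labels, means

data = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0), (0.0, 0.5)]
labels, means = mbsas(data, theta=2.0, q=3)
print(labels)   # [0, 0, 1, 1, 0]
```

Because all clusters exist before any vector is classified, the second pass compares each vector against every cluster, which is the point made in the first remark.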

11

^

The maxmin algorithmLet

W^

be the set of all points that have been chosen to form clusters up to the current iteration step. The formation of clusters is carried out asfollows:

  • For each

x

X-W

determine

d = x

min

zW^

d ( x,z

  • Determine

y: d

=max y

xX-W

d x

  • If

d y^

is greater than a prespecified threshold then  this vector forms a new cluster

  • else

^ the cluster determination phase of the algorithm terminates.

  • End {if} After the formation of the clusters, each unassigned vector is assigned toits closest cluster.
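The maxmin steps above can be sketched as follows. Two points are assumptions of this sketch rather than statements from the slide: W is seeded with the first point, and in the final step "closest cluster" is measured by the Euclidean distance to the point that founded the cluster. The name `maxmin` is illustrative.

```python
from math import dist

def maxmin(points, threshold):
    """Maxmin sketch; returns (w, labels) with w the indices of chosen points."""
    w = [0]                                   # assumption: start W with x_1
    while len(w) < len(points):
        rest = [i for i in range(len(points)) if i not in w]
        # d_x = min over z in W of d(x, z), for every x in X - W
        d = {i: min(dist(points[i], points[z]) for z in w) for i in rest}
        y = max(rest, key=d.get)              # the point farthest from W
        if d[y] > threshold:
            w.append(y)                       # y forms a new cluster
        else:
            break                             # cluster determination ends
    # Assign every vector to the cluster whose founding point is closest.
    labels = [min(range(len(w)), key=lambda c: dist(p, points[w[c]]))
              for p in points]
    return w, labels

data = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0), (0.0, 0.5)]
w, labels = maxmin(data, threshold=2.0)
print(w)        # [0, 3]
print(labels)   # [0, 0, 1, 1, 0]
```

Each iteration recomputes all distances to W, so this version favors clarity over speed.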

13

^

Refinement stagesThe problem of closeness of clusters: “

In all the above algorithms it may

happen that two formed clusters lie very close to each other”.^ ^

A simple merging procedure• (

A) Find

C , i

Cj^

( i < j

) such that

d ( C

,C )= i j

min

k,r =

,…,m,k

d (≠ r C,Ck^

) r

  • If

d ( C

,C ) i j

 M

then { 1

M^1

is a user-defined threshold }

^ Merge

C , i

Cj^

to^ C

and eliminate i

Cj^

^ If necessary, update the cluster representative of

Ci^

^ Rename the clusters

Cj +

,…,C

to m C,…,Cj^

, respectively. m -

^ m=m

^ Go to (

A)

  • Else

^ Stop

  • End {if}
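A minimal sketch of the merging procedure, assuming that clusters are given as lists of points and that d(C_i, C_j) is the Euclidean distance between cluster means (the slide leaves the choice of d open). The names `merge_close_clusters` and `m1` are illustrative.

```python
from math import dist

def mean(cluster):
    """Mean vector of a non-empty list of equal-length tuples."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def merge_close_clusters(clusters, m1):
    """Repeatedly merge the closest pair of clusters while its distance <= m1."""
    clusters = [list(c) for c in clusters]        # work on a copy
    while len(clusters) > 1:
        reps = [mean(c) for c in clusters]        # updated representatives
        # (A) closest pair (i, j) with i < j
        i, j = min(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda ab: dist(reps[ab[0]], reps[ab[1]]),
        )
        if dist(reps[i], reps[j]) > m1:
            break                                 # Stop
        clusters[i].extend(clusters[j])           # merge C_j into C_i
        del clusters[j]                           # later clusters shift down: m = m - 1
    return clusters

parts = [[(0.0, 0.0), (0.0, 1.0)], [(0.5, 0.5)], [(10.0, 10.0)]]
merged = merge_close_clusters(parts, m1=1.0)
print(len(merged))   # 2
```

Deleting the list element takes care of the renaming step, since the remaining clusters are renumbered automatically.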

14

^

The problem of sensitivity to the order of data presentation:“A vector

x^ may have been assigned to a cluster

Ci

at the current stage

but another cluster

Cj

may be formed at a later stage that lies closer to

x

^ A simple reassignment procedure

  • For

i =

to^

N

^ Find

Cj^

such that

d ( x

,C )= i^ j

min

k =1 ,…,m

d ( x^ i

,C ) k

^ Set

b ( i

)= j^

{^ b

( i )^ is the index of the cluster that lies closet to

x } i^

  • End {for}• For

j =

to^

m ^ Set

C ={ j

x^  i X: b

( i )=

j }

^ If necessary, update representatives

  • End {for}
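The reassignment procedure can be sketched as below, assuming each cluster is summarized by a representative vector and d(x, C) is the Euclidean distance to it. Keeping the old representative for a cluster that ends up empty is a choice made for this sketch; the slide does not cover that case. The name `reassign` is illustrative.

```python
from math import dist

def reassign(points, reps):
    """One reassignment pass; returns (clusters, new_reps)."""
    m = len(reps)
    # b[i] is the index of the cluster that lies closest to x_i.
    b = [min(range(m), key=lambda k: dist(x, reps[k])) for x in points]
    # C_j = {x_i in X : b(i) = j}, stored as lists of point indices.
    clusters = [[i for i, bi in enumerate(b) if bi == j] for j in range(m)]
    new_reps = []
    for j, members in enumerate(clusters):
        if members:
            cols = zip(*(points[i] for i in members))
            new_reps.append([sum(c) / len(members) for c in cols])
        else:
            new_reps.append(list(reps[j]))    # assumption: keep the old one
    return clusters, new_reps

pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
clusters, reps = reassign(pts, [(0.0, 0.0), (3.0, 0.0)])
print(clusters)   # [[0, 1], [2, 3]]
print(reps)       # [[0.5, 0.0], [4.5, 0.0]]
```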


16

^

Remarks:•^

In practice, a few passes (

^2 ) of the data set are required.

-^ TTSAS is less sensitive to the order of data presentation, compared toBSAS.