Density-based Clustering: DBSCAN Algorithm and Parameter Selection, Exams of Data Mining

A lecture on density-based clustering, focusing on the DBSCAN algorithm and its parameters. It covers the basic idea of density-based clustering, the definitions of neighborhood and density, core, border, and outlier points, and the DBSCAN algorithm. It also discusses the pros and cons of DBSCAN and the method for determining the parameters Eps and MinPts.

Typology: Exams

2019/2020

Uploaded on 07/02/2020

sampat-aheer
Clustering
Lecture 4: Density-based Methods
Jing Gao
SUNY Buffalo

Outline

  • Basics
    • Motivation, definition, evaluation
  • Methods
    • Partitional
    • Hierarchical
    • Density-based
    • Mixture model
    • Spectral methods
  • Advanced topics
    • Clustering ensemble
    • Clustering in MapReduce
    • Semi-supervised clustering, subspace clustering, co-clustering, etc.

Density Definition

  • ε-Neighborhood – Objects within a radius of ε from an object:
    $N_\varepsilon(p) = \{\, q \mid d(p, q) \le \varepsilon \,\}$
  • “High density” – ε-Neighborhood of an object contains at least MinPts objects.

[Figure: ε-neighborhood of p and ε-neighborhood of q. With MinPts = 4, the density of p is “high” and the density of q is “low”.]
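The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names `eps_neighborhood` and `is_high_density` and the sample data are hypothetical, the distance d is assumed to be Euclidean, and a point is assumed to count as a member of its own neighborhood.

```python
import math

def eps_neighborhood(points, p, eps):
    """Return every q in `points` with d(p, q) <= eps (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_high_density(points, p, eps, min_pts):
    """True if the eps-neighborhood of p holds at least min_pts objects."""
    return len(eps_neighborhood(points, p, eps)) >= min_pts

# Hypothetical 2-D data: a tight group of four points plus one far-away point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5), (5.0, 5.0)]

dense = is_high_density(points, (0.0, 0.0), eps=1.0, min_pts=4)   # 4 neighbors
sparse = is_high_density(points, (5.0, 5.0), eps=1.0, min_pts=4)  # only itself
print(dense, sparse)
```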

Core, Border & Outlier

Given ε and MinPts, categorize the objects into three exclusive groups.

[Figure: example points labeled Core, Border and Outlier, with ε = 1 unit and MinPts = 5.]

A point is a core point if it has at least MinPts points within Eps; these are points in the interior of a cluster. A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point. A noise point (outlier) is any point that is neither a core point nor a border point.
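The three categories can be computed directly from the neighborhood counts. The sketch below is illustrative only: `classify_points` and the sample data are hypothetical names, Euclidean distance is assumed, each point is assumed to count toward its own neighborhood, and "at least MinPts" is used as the core threshold.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise'."""
    n = len(points)
    # neighbors[i] holds indices j with d(points[i], points[j]) <= eps (i included).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical 1-D data, written as 1-tuples.
points = [(0.0,), (0.5,), (1.0,), (1.9,), (10.0,)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)  # ['core', 'core', 'core', 'border', 'noise']
```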

Density-reachability

  • Directly density-reachable
    • An object q is directly density-reachable from object p if p is a core object and q is in p’s ε-neighborhood.

[Figure: objects p and q with their ε-neighborhoods, MinPts = 4.]

  • q is directly density-reachable from p
  • p is not directly density-reachable from q
  • Density-reachability is asymmetric

Density-reachability

  • Density-Reachable (directly and indirectly):
    • A point p is directly density-reachable from p₂
    • p₂ is directly density-reachable from p₁
    • p₁ is directly density-reachable from q
    • p ← p₂ ← p₁ ← q form a chain

[Figure: chain of points q, p₁, p₂, p, with MinPts = 7.]

  • p is (indirectly) density-reachable from q
  • q is not density-reachable from p
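One way to check density-reachability is a breadth-first search that only expands core objects, because every link in the chain must start at a core object. The sketch below is an illustration under assumptions, not the lecture's code: `neighbors_of`, `density_reachable` and the sample data are hypothetical, distance is assumed Euclidean, and a point counts toward its own neighborhood.

```python
import math
from collections import deque

def neighbors_of(points, i, eps):
    """Indices of all points within eps of points[i] (i included)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def density_reachable(points, src, dst, eps, min_pts):
    """True if points[dst] is density-reachable from points[src]."""
    seen = {src}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        nbrs = neighbors_of(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # not a core object: nothing is directly reachable from it
        for j in nbrs:
            if j == dst:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

# Hypothetical 1-D data: indices 0-2 are core, index 3 is a border point.
points = [(0.0,), (0.5,), (1.0,), (1.9,), (10.0,)]
forward = density_reachable(points, 0, 3, eps=1.0, min_pts=3)   # core -> border
backward = density_reachable(points, 3, 0, eps=1.0, min_pts=3)  # border -> core
print(forward, backward)
```

The two calls return different answers for the same pair of points, which is the asymmetry noted on the slide.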

DBSCAN Algorithm: Example

  • Parameter
    • ε = 2 cm
    • MinPts = 3

for each o ∈ D do
    if o is not yet classified then
        if o is a core-object then
            collect all objects density-reachable from o
            and assign them to a new cluster.
        else
            assign o to NOISE
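The pseudocode above can be turned into a small runnable sketch. This is one possible reading rather than a reference implementation: the names `dbscan`, `region` and `NOISE` and the sample data are hypothetical, distance is assumed Euclidean, a point counts toward its own neighborhood, and a point first marked NOISE is relabeled as a border point if a core object later reaches it.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster id per point (0, 1, ...) or NOISE."""
    n = len(points)
    labels = [None] * n  # None means "not yet classified"

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for o in range(n):
        if labels[o] is not None:
            continue
        seeds = region(o)
        if len(seeds) < min_pts:
            labels[o] = NOISE  # may be relabeled later as a border point
            continue
        # o is a core object: collect everything density-reachable from it.
        labels[o] = cluster_id
        stack = [j for j in seeds if j != o]
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core object
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = region(j)
            if len(nbrs) >= min_pts:  # only core objects extend the cluster
                stack.extend(nbrs)
        cluster_id += 1
    return labels

# Hypothetical data: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=2.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Each neighborhood query here scans all points, so the sketch is quadratic in the number of points; it is meant to mirror the pseudocode, not to be efficient.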

DBSCAN: Determining Eps and MinPts

  • Idea is that for points in a cluster, their k-th nearest neighbors are at roughly the same distance
  • Noise points have the k-th nearest neighbor at a farther distance
  • So, plot the sorted distance of every point to its k-th nearest neighbor
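The sorted k-distance values described above can be computed as follows. This is a minimal sketch under assumptions: `sorted_k_distances` and the sample data are hypothetical, distance is assumed Euclidean, and a point is not counted as its own neighbor. Plotting the returned list and looking for the point where the curve rises sharply is a common way to pick a candidate Eps for the chosen k.

```python
import math

def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Hypothetical data: four clustered points and one far-away noise point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
k_dist = sorted_k_distances(points, k=2)
print(k_dist)  # four small values, then one much larger value for the noise point
```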

When DBSCAN Works Well

[Figure: two panels, “Original Points” and “Clusters”.]

  • Resistant to Noise
  • Can handle clusters of different shapes and sizes

Take-away Message

  • The basic idea of density-based clustering
  • The two important parameters and the definitions of neighborhood and density in DBSCAN

  • Core, border and outlier points
  • DBSCAN algorithm
  • DBSCAN’s pros and cons