Mining Frequent Patterns and Association Rules in Data Mining, Study Guides, Projects, Research of Data Mining

This chapter from the book 'Data Mining: Concepts and Techniques' by Jiawei Han explores the concepts and techniques of mining frequent patterns and association rules. It covers the basics of frequent patterns and association rules, closed patterns and max-patterns, scalable methods for mining frequent patterns, pruning, candidate generation, partitioning patterns and databases, recursion, mining frequent patterns with FP-trees, visualization of association rules, system optimization, and constraints in data mining.

What you will learn

  • What is constraint-based association mining and how does it work?
  • What are the different kinds of association rules that can be mined?
  • What are the efficient and scalable methods for mining frequent itemsets?
  • What are the basic concepts of data mining for frequent patterns, association, and correlations?
  • What are the advantages of FP-growth over Apriori for scalability in data mining?

Typology: Study Guides, Projects, Research

2017/2018

Uploaded on 03/26/2018

sujeeth-reddy
May 10, 2010
Data Mining:
Concepts and Techniques
— Chapter 5 —
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
Revised by Zhongfei (Mark) Zhang
Department of Computer Science
SUNY Binghamton


Chapter 5: Mining Frequent Patterns, Association and Correlations

  • Basic concepts and a road map
  • Efficient and scalable frequent itemset mining methods
  • Mining various kinds of association rules
  • From association mining to correlation analysis
  • Constraint-based association mining
  • Summary

Why Is Freq. Pattern Mining Important?

  • Discloses an intrinsic and important property of data sets
  • Forms the foundation for many essential data mining tasks:
    • Association, correlation, and causality analysis
    • Sequential, structural (e.g., sub-graph) patterns
    • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
    • Classification: associative classification
    • Cluster analysis: frequent pattern-based clustering
    • Data warehousing: iceberg cube and cube-gradient
    • Semantic data compression: fascicles
    • Broad applications

Basic Concepts: Frequent Patterns and Association Rules

  • Itemset X = {x1, …, xk}
  • Find all the rules X ⇒ Y with minimum support and confidence
    • support, s: probability that a transaction contains X ∪ Y
    • confidence, c: conditional probability that a transaction having X also contains Y

  Transaction-id | Items bought
  ---------------|--------------
  10             | A, B, D
  20             | A, C, D
  30             | A, D, E
  40             | B, E, F
  50             | B, C, D, E, F

  (Figure: Venn diagram of customers buying beer, customers buying diapers, and customers buying both)

  Let sup_min = 50%, conf_min = 50%:
    Frequent patterns: {A:3, B:3, D:4, E:3, AD:3}
    Association rules:
      A ⇒ D (support 60%, confidence 100%)
      D ⇒ A (support 60%, confidence 75%)
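The support and confidence definitions above can be checked directly against the slide's five-transaction table. The following is a minimal sketch (the function and variable names are illustrative, not from the slides):

```python
# Transaction table from the slide, keyed by transaction-id.
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db.values()) / len(db)

def confidence(lhs, rhs, db):
    """Conditional probability that a transaction with `lhs` also has `rhs`."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)

print(support({"A", "D"}, transactions))       # 0.6  -> rule support 60%
print(confidence({"A"}, {"D"}, transactions))  # 1.0  -> A => D, confidence 100%
print(confidence({"D"}, {"A"}, transactions))  # 0.75 -> D => A, confidence 75%
```

This reproduces the slide's numbers: both rules share support 60% (AD appears in 3 of 5 transactions), while confidence differs because D occurs in one more transaction than A.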

Closed Patterns and Max-Patterns

  • Exercise. DB = {<a1, …, a100>, <a1, …, a50>}, min_sup = 1
  • What is the set of closed itemsets?
    • <a1, …, a100>: 1
    • <a1, …, a50>: 2
  • What is the set of max-patterns?
    • <a1, …, a100>: 1
  • What is the set of all patterns?
    • Every nonempty subset of <a1, …, a100> — far too many to enumerate!
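The definitions behind the exercise (closed: no proper superset with the same support; max: no frequent proper superset) can be illustrated on a small analogue of the exercise's database, since a1..a100 is too large to enumerate. A sketch, with illustrative names, where {a, b} plays the role of <a1, …, a50> and {a, b, c, d} the role of <a1, …, a100>:

```python
from itertools import combinations

# Two transactions: one long, one that is a prefix of it (as in the exercise).
db = [frozenset("abcd"), frozenset("ab")]

def support(itemset, db):
    return sum(itemset <= t for t in db)

# Enumerate all frequent itemsets (min_sup = 1).
items = sorted(set().union(*db))
frequent = {}
for k in range(1, len(items) + 1):
    for cand in map(frozenset, combinations(items, k)):
        s = support(cand, db)
        if s >= 1:
            frequent[cand] = s

# Closed: no proper superset has the same support.
closed = [x for x in frequent
          if not any(x < y and frequent[y] == frequent[x] for y in frequent)]
# Max: no proper superset is frequent at all.
maximal = [x for x in frequent if not any(x < y for y in frequent)]

print(sorted("".join(sorted(c)) for c in closed))   # ['ab', 'abcd']
print(sorted("".join(sorted(m)) for m in maximal))  # ['abcd']
```

Exactly as in the exercise: the short transaction and the full transaction are the only closed patterns (supports 2 and 1), and only the full transaction is maximal, while 2^4 - 1 = 15 patterns are frequent overall.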

Chapter 5: Mining Frequent Patterns, Association and Correlations

  • Basic concepts and a road map
  • Efficient and scalable frequent itemset mining methods
  • Mining various kinds of association rules
  • From association mining to correlation analysis
  • Constraint-based association mining
  • Summary

Apriori: A Candidate Generation-and-Test Approach

  • Apriori pruning principle: if there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @ VLDB'94; Mannila, et al. @ KDD'94)
  • Method:
    • Initially, scan DB once to get the frequent 1-itemsets
    • Generate length-(k+1) candidate itemsets from length-k frequent itemsets
    • Test the candidates against DB
    • Terminate when no frequent or candidate set can be generated
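The level-wise method above can be sketched in a few lines. This is a minimal illustration, not the optimized algorithm from the paper; the join here unions arbitrary pairs rather than prefix-matched ones, and `min_sup` is an absolute count:

```python
from itertools import combinations

def apriori(db, min_sup):
    """Level-wise Apriori sketch: generate length-(k+1) candidates from
    length-k frequent itemsets, prune, then count against the database."""
    # One DB scan for frequent 1-itemsets.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s for s, c in counts.items() if c >= min_sup}
    frequent = set(Lk)
    k = 1
    while Lk:
        # Self-join: unions of two level-k sets that form a (k+1)-itemset.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Apriori pruning: every k-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Test the surviving candidates against the DB.
        Lk = {c for c in candidates if sum(c <= t for t in db) >= min_sup}
        frequent |= Lk
        k += 1
    return frequent

db = [frozenset("ACD"), frozenset("BCE"), frozenset("ABCE"), frozenset("BE")]
result = apriori(db, min_sup=2)
print(frozenset("BCE") in result)  # True
```

On the four-transaction database used in the worked example below, this yields {B, C, E} as the only frequent 3-itemset, matching the slide.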

The Apriori Algorithm — An Example (sup_min = 2)

Database TDB:
  Tid | Items
  ----|------------
  10  | A, C, D
  20  | B, C, E
  30  | A, B, C, E
  40  | B, E

1st scan — C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

C2: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan — C2 counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3: {B,C,E}
3rd scan — L3: {B,C,E}:2

Important Details of Apriori

  • How to generate candidates?
    • Step 1: self-joining Lk
    • Step 2: pruning
  • How to count supports of candidates?
  • Example of candidate generation:
    • L3 = {abc, abd, acd, ace, bcd}
    • Self-joining: L3 * L3
      • abcd from abc and abd
      • acde from acd and ace
    • Pruning:
      • acde is removed because ade is not in L3
    • C4 = {abcd}

How to Generate Candidates?

  • Suppose the items in Lk-1 are listed in an order
  • Step 1: self-joining Lk-1

      insert into Ck
      select p.item1, p.item2, …, p.itemk-1, q.itemk-1
      from Lk-1 p, Lk-1 q
      where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

  • Step 2: pruning

      forall itemsets c in Ck do
          forall (k-1)-subsets s of c do
              if (s is not in Lk-1) then delete c from Ck

Partition: Scan Database Only Twice

  • Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
  • Scan 1: partition the database and find local frequent patterns
  • Scan 2: consolidate global frequent patterns
  • A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB'95

DIC: Reduce Number of Scans

  • Once both A and D are determined frequent, the counting of AD begins
  • Once all length-2 subsets of BCD are determined frequent, the counting of BCD begins
  • (Figure: itemset lattice over {A, B, C, D}, from {} up to ABCD; timeline contrasting Apriori's strictly level-by-level scans with DIC's overlapped counting of 1-itemsets, 2-itemsets, 3-itemsets within the same pass over the transactions)
  • S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD'97

Mining Frequent Patterns Without Candidate Generation

  • Grow long patterns from short ones using local frequent items
    • "abc" is a frequent pattern
    • Get all transactions having "abc": DB|abc
    • "d" is a local frequent item in DB|abc ⇒ abcd is a frequent pattern
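The pattern-growth step above rests on conditional (projected) databases. A minimal sketch of the projection idea, with illustrative data and names (not the slides' FP-tree machinery, which follows below):

```python
def conditional_db(db, pattern):
    """DB|pattern: transactions containing `pattern`, with `pattern` removed."""
    pattern = set(pattern)
    return [t - pattern for t in db if pattern <= t]

# Toy database in which "abc" is frequent.
db = [set("abcd"), set("abcde"), set("abd"), set("bce")]

proj = conditional_db(db, "abc")  # DB|abc
local = {}
for t in proj:
    for item in t:
        local[item] = local.get(item, 0) + 1
print(local)  # {'d': 2, 'e': 1}
```

Here "d" is a local frequent item in DB|abc (count 2), so "abc" grows into the longer frequent pattern "abcd" without ever generating candidate sets.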

Construct FP-tree from a Transaction Database (min_support = 3)

  TID | Items bought             | (ordered) frequent items
  ----|--------------------------|-------------------------
  100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
  200 | {a, b, c, f, l, m, o}    | {f, c, a, b, m}
  300 | {b, f, h, j, o, w}       | {f, b}
  400 | {b, c, k, s, p}          | {c, b, p}
  500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

Header table (item: frequency): f:4, c:4, a:3, b:3, m:3, p:3

FP-tree:
  {}
  ├─ f:4
  │  ├─ c:3
  │  │  └─ a:3
  │  │     ├─ m:2
  │  │     │  └─ p:2
  │  │     └─ b:1
  │  │        └─ m:1
  │  └─ b:1
  └─ c:1
     └─ b:1
        └─ p:1

  1. Scan DB once, find the frequent 1-itemsets (single-item patterns)
  2. Sort frequent items in frequency-descending order into the f-list: f-c-a-b-m-p
  3. Scan DB again, construct the FP-tree
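The three construction steps can be sketched as two passes over the database. This is an illustrative implementation, not the book's: the `Node` class and variable names are invented, the header table just collects nodes per item (omitting node-links), and ties in item frequency are broken alphabetically, so with f and c both at count 4 the f-list here comes out c-f-a-b-m-p rather than the slide's f-c-a-b-m-p:

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(db, min_sup):
    """Two-scan FP-tree construction: find frequent items, order each
    transaction by the f-list, and insert it along a shared-prefix path."""
    # Scan 1: frequent single items, sorted into the f-list
    # (frequency-descending, alphabetical tie-break for determinism).
    freq = {i: c for i, c in Counter(i for t in db for i in t).items()
            if c >= min_sup}
    flist = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {i: r for r, i in enumerate(flist)}
    root = Node(None, None)
    header = defaultdict(list)  # item -> its nodes, for later mining
    # Scan 2: insert each filtered, f-list-ordered transaction.
    for t in db:
        path = sorted((i for i in t if i in rank), key=rank.get)
        node = root
        for item in path:
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header, flist

db = [set("facdgimp"), set("abcflmo"), set("bfhjow"),
      set("bcksp"), set("afcelpmn")]
root, header, flist = build_fp_tree(db, min_sup=3)
print(flist)                      # ['c', 'f', 'a', 'b', 'm', 'p']
print(root.children['c'].count)   # 4
```

Whatever the tie order, the per-item totals match the header table above: summing the counts of an item's nodes recovers its frequency (e.g., the p-nodes sum to 3), which is what FP-growth mines from.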