Apriori Algorithm and its Example


Introduction

Short stories or tales always help us understand a concept better, but this one is a true story: Wal-Mart's beer and diaper parable. A salesperson from Wal-Mart tried to increase the store's sales by bundling products together and giving discounts on them. He bundled bread and jam, which made it easy for a customer to find them together, and the discount encouraged customers to buy them together.

To find more such opportunities and more products that could be tied together, the salesperson analyzed all the sales records. What he found was intriguing: many customers who purchased diapers also bought beer. The two products are obviously unrelated, so he decided to dig deeper. He found that raising kids is grueling, and to relieve stress, parents bought beer. He paired diapers with beer, and the sales escalated. This is a perfect example of association rules in data mining.

This article takes you through a beginner's-level explanation of the Apriori algorithm in data mining. We will also look at the definition of association rules. Toward the end, we will look at the pros and cons of the Apriori algorithm, along with its R implementation.

Let's begin by understanding what the Apriori algorithm is and why it is important to learn.

Apriori Algorithm

With the quick growth of e-commerce applications, vast quantities of data accumulate in months rather than years. Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of examining such data to find anomalies, correlations, patterns, and trends in order to predict outcomes.

The Apriori algorithm is a classical algorithm in data mining. It is used for mining frequent itemsets and the relevant association rules. It is devised to operate on a database containing a large number of transactions, for instance, the items bought by customers in a store.

It is very important for effective market basket analysis, which helps customers purchase their items with more ease and thereby increases the store's sales. It has also been used in the field of healthcare for the detection of adverse drug reactions (ADRs), producing association rules that indicate which combinations of medications and patient characteristics lead to ADRs.

Association rules

Association rule learning is a prominent and well-explored method for determining relations among variables in large databases. Let us take a look at the formal definition of the association rule problem given by Rakesh Agrawal, the President and Founder of the Data Insights Laboratories.

Let I = {i1, i2, i3, …, in} be a set of n attributes called items, and let D = {t1, t2, …, tm} be a set of transactions, called the database. Every transaction ti in D has a unique transaction ID and consists of a subset of the items in I.

A rule is defined as an implication X ⟶ Y, where X and Y are subsets of I (X, Y ⊆ I) that have no element in common, i.e., X ∩ Y = ∅. X and Y are called the antecedent and the consequent of the rule, respectively.

Let's take an easy example from the supermarket sphere. The example we are considering is quite small; in practical situations, datasets contain millions or billions of transactions. The set of items is I = {Onion, Burger, Potato, Milk, Beer}, and the database consists of six transactions. Each transaction is a tuple of 0s and 1s, where 0 represents the absence of an item and 1 its presence. An example of a rule in this scenario would be {Onion, Potato} => {Burger}, which means that if onion and potato are bought, customers also buy a burger.

Transaction ID   Onion   Potato   Burger   Milk   Beer
t1               1       1        1        0      0
t2               0       1        1        1      0
t3               0       0        0        1      1
t4               1       1        0        1      0
t5               1       1        1        0      1
t6               1       1        1        1      1

There are multiple rules possible even from a very small database, so in order to select the interesting ones, we use constraints on various measures of interest and significance. We will look at some of these useful measures such as support, confidence, lift and conviction.

Support

The support of an itemset X, supp(X), is the proportion of transactions in the database in which the itemset X appears. It signifies the popularity of an itemset.

supp(X) = (Number of transactions in which X appears) / (Total number of transactions)

In the example above, supp(Onion) = 4/6 ≈ 0.67, since Onion appears in four of the six transactions (t1, t4, t5, and t6).

If the sales of a particular product (item) above a certain proportion have a meaningful effect on profits, that proportion can be considered the support threshold. Furthermore, we can identify itemsets whose support values are beyond this threshold as significant itemsets.
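To make this concrete, here is a minimal sketch in R, the language whose implementation the article refers to. The `db` matrix encodes the 0/1 transaction table above, and `supp()` is our own illustrative helper, not a function from any package:

# The 0/1 transaction matrix from the table above (rows = transactions)
db <- matrix(c(1, 1, 1, 0, 0,
               0, 1, 1, 1, 0,
               0, 0, 0, 1, 1,
               1, 1, 0, 1, 0,
               1, 1, 1, 0, 1,
               1, 1, 1, 1, 1),
             nrow = 6, byrow = TRUE,
             dimnames = list(paste0("t", 1:6),
                             c("Onion", "Potato", "Burger", "Milk", "Beer")))

# Support = fraction of transactions in which every item of the set appears
supp <- function(items) mean(apply(db[, items, drop = FALSE], 1, all))

supp("Onion")               # 4/6, about 0.67
supp(c("Onion", "Potato"))  # also 4/6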

Confidence

Confidence of a rule is defined as follows:

conf(X ⟶ Y) = supp(X ∪ Y) / supp(X)

It signifies the likelihood of the item Y being purchased when the item X is purchased. So, for the rule {Onion, Potato} => {Burger}:

conf({Onion, Potato} ⟶ {Burger}) = supp({Onion, Potato, Burger}) / supp({Onion, Potato}) = (3/6) / (4/6) = 0.75

This implies that for 75% of the transactions containing onion and potato, the rule is correct. It can also be interpreted as the conditional probability P(Y|X), i.e., the probability of finding the itemset Y in a transaction given that the transaction already contains X.
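Continuing the R sketch from the Support section (again with our own helper names), confidence is just a ratio of two supports:

# conf(X -> Y) = supp(X union Y) / supp(X)
conf <- function(x, y) supp(c(x, y)) / supp(x)

conf(c("Onion", "Potato"), "Burger")   # (3/6) / (4/6) = 0.75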

It can give some important insights, but it also has a major drawback. It only takes into account the popularity of the itemset X and not the popularity of Y. If Y is as popular as X, then there will be a higher probability that a transaction containing X also contains Y, which inflates the confidence regardless of any real association. This is the drawback that measures such as lift and conviction correct for; lift, for instance, divides supp(X ∪ Y) by supp(X) × supp(Y), the support we would expect if X and Y were independent.

An example of the Apriori algorithm

Let us now walk through the algorithm on the transaction database above, with a support threshold of 50%, i.e., an itemset must appear in at least three of the six transactions to be significant.

Step 1: Count the frequency of each item, i.e., the number of transactions in which it occurs.

Item        Frequency (No. of transactions)
Onion(O)    4
Potato(P)   5
Burger(B)   4
Milk(M)     4
Beer(Be)    2

Step 2: We know that only those elements are significant for which the support is greater than or equal to the threshold support. Here, the support threshold is 50%, hence only those items are significant which occur in at least three transactions, and such items are Onion(O), Potato(P), Burger(B), and Milk(M). Therefore, we are left with:

Item        Frequency (No. of transactions)
Onion(O)    4
Potato(P)   5
Burger(B)   4
Milk(M)     4

The table above represents the single items that are purchased frequently by the customers.

Step 3: The next step is to make all the possible pairs of the significant items, keeping in mind that the order doesn't matter, i.e., AB is the same as BA. To do this, take the first item and pair it with all the others: OP, OB, OM. Similarly, take the second item and pair it with the items that follow it: PB, PM. We skip PO because it is the same as OP, which already exists. So, all the pairs in our example are OP, OB, OM, PB, PM, and BM.

Step 4: We will now count the occurrences of each pair in all the transactions.

Itemset   Frequency (No. of transactions)
OP        4
OB        3
OM        2
PB        4
PM        3
BM        2

Step 5: Again, only those itemsets are significant which cross the support threshold, and those are OP, OB, PB, and PM.

Step 6: Now let's say we would like to look for a set of three items that are purchased together. We will use the itemsets found in Step 5 and create the sets of 3 items.

To create the sets of 3 items, another rule, called self-join, is required. It says that from the item pairs OP, OB, PB, and PM, we look for two pairs with an identical first letter, and so we get:

  • OP and OB, which gives OPB
  • PB and PM, which gives PBM

Next, we find the frequency for these two itemsets.

Itemset   Frequency (No. of transactions)
OPB       4
PBM       3

Applying the threshold rule again, we find that OPB is the only significant itemset.

Therefore, the set of 3 items that was purchased most frequently is OPB.
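The pair-joining step above can be expressed in a few lines of R. This is a sketch using the two-letter labels from the walkthrough; `self_join_pairs` is our own illustrative name:

# Join two pairs whenever they share the same first letter,
# e.g. OP and OB give OPB
self_join_pairs <- function(pairs) {
  out <- character(0)
  for (i in seq_along(pairs))
    for (j in seq_along(pairs))
      if (j > i && substr(pairs[i], 1, 1) == substr(pairs[j], 1, 1))
        out <- c(out, paste0(pairs[i], substr(pairs[j], 2, 2)))
  out
}

self_join_pairs(c("OP", "OB", "PB", "PM"))   # "OPB" "PBM"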

The example we considered was a fairly simple one, and mining the frequent itemsets stopped at 3 items, but in practice there are dozens of items and this process can continue to much larger itemsets. Suppose we got the significant 3-item sets OPQ, OPR, OQR, OQS, and PQR, and now we want to generate the 4-item sets. For this, we look at the sets which have their first two letters in common, i.e.,

  • OP Q and OP R gives OPQR
  • OQ R and OQ S gives OQRS

In general, we have to look for sets which only differ in their last letter/item.

Now that we have looked at an example of the functionality of Apriori Algorithm, let us formulate the general process.

General Process of the Apriori algorithm

The entire algorithm can be divided into two steps:

Step 1: Apply minimum support to find all the frequent sets with k items in a database.

Step 2: Use the self-join rule to find the frequent sets with k+1 items with the help of the frequent k-itemsets. Repeat this process from k = 1 to the point when we are unable to apply the self-join rule.

This approach of extending a frequent itemset one at a time is called the “bottom up” approach.
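As a rough sketch of this bottom-up loop in R, reusing the `db` matrix and `supp()` helper defined earlier (the function name is our own, and only support-based pruning is shown, without the full algorithm's extra subset-pruning step):

# Grow frequent itemsets level by level: self-join, then prune by support
apriori_itemsets <- function(minsup) {
  level <- as.list(colnames(db)[colMeans(db) >= minsup])  # frequent 1-itemsets
  frequent <- level
  while (length(level) > 1) {
    # self-join: merge two k-itemsets that agree on all but their last item
    cand <- list()
    for (i in seq_along(level))
      for (j in seq_along(level))
        if (j > i) {
          a <- level[[i]]; b <- level[[j]]; k <- length(a)
          if (k == 1 || identical(a[-k], b[-k]))
            cand[[length(cand) + 1]] <- sort(union(a, b))
        }
    # prune: keep only the candidates that meet the minimum support
    level <- Filter(function(s) supp(s) >= minsup, unique(cand))
    frequent <- c(frequent, level)
  }
  frequent
}

# Note: in the 0/1 matrix, Beer appears in 3 of the 6 transactions, so at a
# 50% cutoff it survives the first level here, unlike in the hand-worked table;
# the surviving pairs and the final triple {Burger, Onion, Potato} still match
apriori_itemsets(0.5)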

Mining Association Rules

Till now, we have looked at the Apriori algorithm with respect to frequent itemset generation. There is another task for which we can use this algorithm, i.e., finding association rules efficiently.

For finding association rules, we need to find all rules having support greater than the threshold support and confidence greater than the threshold confidence.

But how do we find these? One possible way is brute force: list all the possible association rules and calculate the support and confidence for each rule, then eliminate the rules that fail the threshold support and confidence. But this is computationally very heavy and prohibitive, as the number of possible association rules increases exponentially with the number of items.

Given there are n items in the set I, the total number of possible association rules is 3^n − 2^(n+1) + 1. For the n = 5 items of our toy example, that is already 3^5 − 2^6 + 1 = 180 candidate rules.

We can instead use the two-step approach to find the association rules efficiently: first, mine all the frequent itemsets whose support crosses the threshold support, and then, from every frequent itemset, generate candidate rules by splitting it into an antecedent and a consequent, keeping only the rules whose confidence crosses the threshold confidence.
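As a sketch of what the R implementation mentioned in the introduction can look like, here is the same toy database run through the arules package. This assumes arules is installed; it is a standard choice for association rule mining in R, though the article's own code is not shown in this preview:

library(arules)

# The six transactions from the table, as item lists
txns <- list(
  t1 = c("Onion", "Potato", "Burger"),
  t2 = c("Potato", "Burger", "Milk"),
  t3 = c("Milk", "Beer"),
  t4 = c("Onion", "Potato", "Milk"),
  t5 = c("Onion", "Potato", "Burger", "Beer"),
  t6 = c("Onion", "Potato", "Burger", "Milk", "Beer")
)
trans <- as(txns, "transactions")

# Keep rules passing both thresholds; minlen = 2 demands a non-empty antecedent
rules <- apriori(trans, parameter = list(supp = 0.5, conf = 0.75, minlen = 2))
inspect(rules)   # includes {Onion, Potato} => {Burger} with confidence 0.75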

This is how stores can provide us discounts on certain bundles of products. The use cases of the Apriori algorithm stretch to Google's auto-completion features and Amazon's recommendation systems.

This tutorial aimed to make the reader familiar with the fundamentals of the Apriori algorithm and the general process followed to mine frequent itemsets. Hope you are familiar now!