
Assignment No:

Name: Ajinkya Dhurgude


Roll No: BA1044
Batch: C

Problem Statement:
Implement the Apriori approach for data mining to organize the data items
on a shelf, using the following table of items purchased in a mall:

Objectives:
To understand the basic concept of the Apriori algorithm.
Theory:
Apriori is an algorithm for frequent item set mining and association rule
learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and
larger item sets as long as those item sets appear sufficiently often in the
database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database: this
has applications in domains such as market basket analysis.
Key Concepts:
- Frequent item sets: the sets of items that have minimum support (denoted
by Li for the frequent i-item sets).
- Apriori property: any subset of a frequent item set must also be frequent.
- Join operation: to find Lk, a set of candidate k-item sets is generated by
joining Lk-1 with itself.
The Apriori Algorithm in a Nutshell:
- Find the frequent item sets: the sets of items that have minimum support.
  - A subset of a frequent item set must also be a frequent item set, i.e.,
    if AB is a frequent item set, both A and B must be frequent item sets.
  - Iteratively find frequent item sets with cardinality from 1 to k (k-item
    sets).
- Use the frequent item sets to generate association rules.
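The Apriori property can be checked directly on a small example. The sketch below is only an illustration: the transactions are hypothetical toy data (not the mall table from the problem statement), and the helper names `support_count` and `subsets_at_least_as_frequent` are made up for this example.

```python
from itertools import combinations

# Hypothetical toy transactions, used only to illustrate the property.
transactions = [
    {"beer", "cheese", "eggs"},
    {"beer", "cheese"},
    {"cheese", "eggs"},
    {"beer", "eggs", "honey"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def subsets_at_least_as_frequent(itemset, transactions):
    """True if every non-empty proper subset occurs at least as often as `itemset`."""
    base = support_count(itemset, transactions)
    return all(
        support_count(set(sub), transactions) >= base
        for r in range(1, len(itemset))
        for sub in combinations(itemset, r)
    )

ab = {"beer", "cheese"}
print(support_count(ab, transactions))                 # 2
print(support_count({"beer"}, transactions))           # 3
print(subsets_at_least_as_frequent(ab, transactions))  # True
```

Because a subset can never occur less often than its superset, any item set that contains an infrequent subset can be discarded without counting it.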

Advantages:
- Uses the large item set property.
- Easily parallelized.
- Easy to implement.

Disadvantages:
- Assumes transaction database is memory resident.
- Requires many database scans.

Terminology
- k-item set: a set of k items. E.g.
  - {beer, cheese, eggs} is a 3-item set.
  - {cheese} is a 1-item set.
  - {honey, ice-cream} is a 2-item set.
- support: an item set has support s% if s% of the records in the DB contain
  that item set.
- minimum support: the Apriori algorithm starts with the specification of
  a minimum level of support, and will focus on item sets with this level or
  above.
- sets: Let A be a set (A = {cat, dog}) and
  let B be a set (B = {dog, eel, rat}) and
  let C = {eel, rat}.
  I use A + B to mean A union B.
  So A + B = {cat, dog, eel, rat}.
  When X is a subset of Y, I use Y - X to mean
  the set of things in Y which are not in X. E.g.
  B - C = {dog}.
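The terminology above maps closely onto Python's built-in sets. The following is a minimal sketch under stated assumptions: the `records` list is hypothetical toy data, support is expressed as a fraction rather than a percentage, and the function names `support` and `is_frequent` are chosen here for illustration.

```python
# Sets from the terminology section.
A = {"cat", "dog"}
B = {"dog", "eel", "rat"}
C = {"eel", "rat"}

union_ab = A | B        # written as A + B in the text
difference_bc = B - C   # things in B that are not in C

print(sorted(union_ab))       # ['cat', 'dog', 'eel', 'rat']
print(sorted(difference_bc))  # ['dog']

# Hypothetical toy database of records (each record is a set of items).
records = [
    {"beer", "cheese", "eggs"},
    {"cheese"},
    {"honey", "ice-cream"},
    {"beer", "cheese"},
    {"honey", "ice-cream", "cheese"},
]

def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)

def is_frequent(itemset, records, min_support):
    """True if the item set reaches the minimum support level."""
    return support(itemset, records) >= min_support

print(support({"cheese"}, records))                             # 0.8
print(is_frequent({"honey", "ice-cream"}, records, 0.4))        # True
print(is_frequent({"beer", "cheese", "eggs"}, records, 0.4))    # False
```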

Applications:
- Market Basket Analysis: given a database of customer transactions, where
each transaction is a set of items the goal is to find groups of items which
are frequently purchased together.
- Telecommunication (each customer is a transaction containing the set of
phone calls).

- Credit Cards / Banking Services (each card/account is a transaction containing the set of the customer's payments).
- Medical Treatments (each patient is represented as a transaction containing
the ordered set of diseases).
- Basketball-Game Analysis (each game is represented as a transaction containing the ordered set of ball passes).

Association Rule Definitions:


- I = {i1, i2, ..., in}: the set of all the items.
- Transaction T: a set of items such that T ⊆ I.
- Transaction Database D: a set of transactions.
- A transaction T ⊆ I contains a set X ⊆ I of some items if X ⊆ T.
- An Association Rule: an implication of the form X ⇒ Y, where X, Y ⊆ I.
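Rules of the form X ⇒ Y can be derived from a frequent item set by splitting it into an antecedent X and a consequent Y. The sketch below ranks rules by confidence, count(X ∪ Y) / count(X), which is a standard measure that this write-up does not define; treat it, the toy database `D`, and the function names as assumptions made for illustration.

```python
from itertools import combinations

def count(itemset, database):
    """Number of transactions in `database` that contain `itemset`."""
    return sum(1 for t in database if itemset <= t)

def rules_from_itemset(itemset, database, min_confidence):
    """Return (X, Y, confidence) for each split of `itemset` into X => Y."""
    itemset = frozenset(itemset)
    whole = count(itemset, database)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            X = frozenset(antecedent)
            Y = itemset - X
            x_count = count(X, database)
            if x_count == 0:
                continue  # X never occurs, so confidence is undefined
            confidence = whole / x_count
            if confidence >= min_confidence:
                rules.append((X, Y, confidence))
    return rules

# Hypothetical toy transaction database.
D = [
    {"beer", "cheese"},
    {"beer", "cheese", "eggs"},
    {"beer"},
    {"cheese", "eggs"},
]

rules = rules_from_itemset({"beer", "cheese"}, D, min_confidence=0.6)
for X, Y, conf in rules:
    print(sorted(X), "=>", sorted(Y), round(conf, 2))
```

With this data both beer ⇒ cheese and cheese ⇒ beer are kept, each with confidence 2/3.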
How to Generate Candidates:
- Input: Li-1, the set of frequent item sets of size i-1.
- Output: Ci, the set of candidate item sets of size i.
- Ci = empty set;
- for each item set J in Li-1 do
  - for each item set K in Li-1 s.t. K ≠ J do
    - if i-2 of the elements in J and K are equal then
      - if all subsets of size i-1 of K ∪ J are in Li-1 then
        - Ci = Ci ∪ {K ∪ J}
- return Ci;
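The candidate generation steps above can be sketched in Python as follows. This is one possible reading of the pseudocode, not the assignment's own implementation: item sets are represented as `frozenset` objects, and the function name `generate_candidates` is hypothetical.

```python
from itertools import combinations

def generate_candidates(prev_frequent, i):
    """Build Ci from Li-1: join pairs sharing i-2 items, then prune."""
    prev_frequent = set(prev_frequent)  # Li-1 as a set of frozensets
    candidates = set()
    for J in prev_frequent:
        for K in prev_frequent:
            if J == K:
                continue
            # Join step: J and K must agree on i-2 elements,
            # so their union has exactly i elements.
            if len(J & K) == i - 2:
                union = J | K
                # Prune step: every (i-1)-subset must itself be frequent.
                if all(frozenset(s) in prev_frequent
                       for s in combinations(union, i - 1)):
                    candidates.add(union)
    return candidates

# Toy L2 with single-letter items: AB, AC, BC, BD.
L2 = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")}
C3 = generate_candidates(L2, 3)
print([sorted(c) for c in C3])  # [['A', 'B', 'C']]
```

ABD and BCD are joined but then pruned, because AD and CD are not in L2.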
Algorithm :
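The level-wise procedure described in the Theory section can be sketched as below. This is a minimal illustration under stated assumptions, not the submitted program: minimum support is given as an absolute count (`min_count`), the transactions are hypothetical toy data rather than the mall table, and the function name `apriori` is chosen for this example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {item set: count} for every item set with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    def count_frequent(candidates):
        # One scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Level 1: frequent individual items.
    items = {frozenset([item]) for t in transactions for item in t}
    level = count_frequent(items)
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                # Join to size k, then prune by the Apriori property.
                if len(union) == k and all(
                    frozenset(s) in prev for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
        level = count_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy transactions.
toy = [
    {"beer", "cheese", "eggs"},
    {"beer", "cheese"},
    {"beer", "eggs"},
    {"cheese", "eggs"},
    {"beer", "cheese", "eggs", "honey"},
]

result = apriori(toy, min_count=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With this data the three single items and the three pairs are frequent, while {beer, cheese, eggs} appears in only two transactions and is dropped, which ends the loop.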

Mathematical Model:
Let S be the system.
S = { }
Identify I as input.
I = {D}
Where
D = Database
S = {I}
Identify P as a process.
P = {Sup, MS, F}
Where
Sup = Support.
MS = Min Support.
F = Frequent Item Set.
S = {I, P}
Identify O as output.
O = {F}
Where
F = Frequent Item Set.
S = {I, P, O}
Identify A as case of success.
A = {program is correct, syntax is right}
S = {I, P, O, A}
Identify Fl as case of failure.
Fl = {w, x, y, z}
Where
w = syntax not written correctly
x = program not completed
y = lex support files not installed
z = program does not run correctly
S = {I, P, O, A, Fl}

Test Cases :

Black Box Testing:

White Box Testing:

Positive Testing:

Negative Testing:

Conclusion :
Thus, we successfully implemented the Apriori algorithm.
