
https://irjet.net/archives/V3/i8/IRJET-V3I8126.pdf


FOR DISTRIBUTED DATA

Garima Goel (1)

(1) NCCE/CSE Dept., Israna-Panipat, India

Email: garimagoel06@gmail.com

Abstract: For business and real-time applications, large data sets are used to extract unknown patterns, an approach termed data mining. Classification and clustering algorithms are used to organize the data of a large data set in a supervised and an unsupervised manner, respectively. The assignment of a set of observations into subsets (called clusters), so that observations in the same cluster are similar in some way, is termed cluster analysis or clustering. Through these algorithms, the inferences of the clustering process and its competence in the application domain are determined. The research work deals with the K-means and MCL algorithms, which are among the most representative clustering algorithms.

Keywords: Data Mining, Cluster, K-Mean, LIC.

1. INTRODUCTION

We are living in an era often referred to as the information age. Because we believe that information leads to success and power, we have been collecting remarkable amounts of it, thanks to refined technologies such as computers and satellites. Initially, with the introduction of computers and resources for mass digital storage, we started collecting and storing all sorts of data in huge amounts, containing supervised as well as unsupervised data. Unfortunately, these enormous collections of data stored on unrelated structures rapidly became overwhelming. This disorder has led to the creation of structured databases and database management systems (DBMS). Effective and efficient retrieval of particular information from a large collection whenever it is needed is an important asset, so managing a huge corpus of data requires an efficient database management system. Today, from business transactions and scientific data to satellite pictures, text reports and military intelligence, we have far more information than we can handle. For decision-making, information retrieval is simply not enough anymore. Confronted with vast collections of data, we now have new needs to help us make improved managerial choices. These requirements are automatic summarization of data, extraction of the essence of the information stored, and the detection of patterns in raw data.

2. PROBLEM STATEMENT

The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct implementation of the k-means method requires time proportional to the product of the number of patterns and the number of clusters per iteration. This is computationally very expensive, especially for large datasets. The main disadvantage of the k-means algorithm is that a data point can exist in only one cluster: no two clusters can share a common data point. In this dissertation we present a modified algorithm, a combination of two algorithms, that shows the membership of a data point within the clusters. The basic procedure involves producing all the segmented datasets for 2 clusters up to Kmax clusters, where Kmax represents an upper limit on the number of clusters. Then our membership function is calculated to determine how much membership each data point has, with a value between 0 and 1. At 0 there is no membership, while at 1 there is membership with all clusters. We propose a modified algorithm for implementing the k-means method. Our algorithm produces the same or com.

3. PROPOSED METHODOLOGY

1. First, load the data in MATLAB. Some parameters are used to load the data.

2. After loading the data, apply the division rule to divide it into small packets of datasets.

3. Then apply the k-means clustering algorithm to the resulting packets. K-means clustering is a data mining technique.

4. In data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean.

5. After applying the k-means clustering algorithm, apply the MCL algorithm as well; through both clustering algorithms we can find the membership function.

6. After applying both algorithms, we get a membership function plot.

From the resulting membership function we obtain some graphs as results.

4. CLUSTERING IN DATA MINING

We describe the techniques used by data mining to achieve KDD (knowledge discovery in databases). There are basically two methods of data mining, classification and clustering, but here we discuss only clustering. We also describe the existing algorithms; by combining any two of them we can generate a third algorithm, which is called the Modifying Algorithm.
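As a rough illustration of two ideas from the problem statement and the proposed methodology, the sketch below divides a dataset into small packets and computes a membership value between 0 and 1 for a data point with respect to each cluster centre. The paper does not give its division rule or its membership formula, so both are assumptions here: the packets are simple fixed-size slices, and the membership is the standard fuzzy c-means formula with a fuzzifier `m`. The names `split_into_packets` and `memberships` are hypothetical, and the MCL step is not shown.

```python
import math


def split_into_packets(rows, packet_size):
    """Divide the dataset into consecutive packets of at most packet_size rows.

    Assumption: the paper's "division rule" is not specified, so plain
    fixed-size slicing is used here.
    """
    return [rows[i:i + packet_size] for i in range(0, len(rows), packet_size)]


def memberships(point, centers, m=2.0):
    """Degree of membership of one point in each cluster, in [0, 1].

    Assumption: fuzzy c-means style memberships (not taken from the paper).
    The values sum to 1 over all clusters; 0 means no membership.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a centre belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


if __name__ == "__main__":
    packets = split_into_packets([(1.0, 2.0), (1.5, 1.8), (8.0, 8.0)], 2)
    print(len(packets), "packets")
    # Hypothetical centres, e.g. as produced by k-means on one packet.
    print(memberships((1.0, 2.0), [(1.2, 1.9), (8.0, 8.0)]))
```

With this formula a point can have partial membership in several clusters at once, which is the behaviour that hard k-means assignment lacks.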

2016, IRJET | Impact Factor value: 4.45 | ISO 9001:2008 Certified Journal | Page 695

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056

Volume: 03 Issue: 08 | Aug-2016 www.irjet.net p-ISSN: 2395-0072

Figure 1. Data Mining Cycle

Business Transactions: Every transaction in the business industry is (often) memorized for perpetuity. Such transactions are usually time related and can be inter-business deals such as purchases, exchanges, banking, stock, etc., or intra-business operations such as management of in-house wares and assets.

Scientific Data: Whether in a Swiss nuclear accelerator laboratory counting particles, in the Canadian forest studying readings from a grizzly bear radio collar, on a South Pole iceberg gathering data about oceanic activity, or in an American university investigating human psychology, our society is amassing colossal amounts of scientific data that need to be analyzed.

Medical and Personal Data: From government census to personnel and customer files, very large collections of information are continuously gathered about individuals and groups. Governments, companies and organizations such as hospitals are stockpiling very important quantities of personal data to help them manage human resources, better understand a market, or simply assist clientele. Regardless of the privacy issues this type of data often reveals, this information is collected, used and even shared.

Surveillance Video and Pictures: With the amazing collapse of video camera prices, video cameras are becoming ubiquitous. Video tapes from surveillance cameras are usually recycled and thus the content is lost. However, there is a tendency today to store the tapes and even digitize them for future use and analysis.

Satellite Sensing: There are a countless number of satellites around the globe: some are geo-stationary above a region, and some are orbiting around the Earth, but all are sending a non-stop stream of data to the surface. NASA, which controls a large number of satellites, receives more data every second than what all NASA researchers and engineers can cope with. Many satellite pictures and data are made public as soon as they are received in the hopes that other researchers can analyze them.

Text Reports and Memos (e-mail Messages): Most of the communications within and between companies or research organizations, or even private people, are based on reports and memos in textual form, often exchanged by e-mail. These messages are regularly stored in digital form for future use and reference, creating formidable digital libraries.

The World Wide Web Repositories: Since the World Wide Web became publicly available in the early 1990s, documents of all sorts of formats, content and description have been collected and inter-connected with hyperlinks, making it the largest repository of data ever built. Despite its dynamic and unstructured nature, its heterogeneous characteristic, and its frequent redundancy and inconsistency, the World Wide Web is the most important data collection regularly used for reference because of the broad variety of topics covered and the infinite contributions of resources and publishers.

5. ALGORITHM

Several major data mining techniques have been developed and used in data mining projects recently, including association, classification, clustering, prediction and sequential patterns. We will briefly examine those data mining techniques with examples to have a good overview of them. Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between a particular item and other items in the same transaction. For example, the association technique is used in market basket analysis to identify which products customers frequently purchase together. Based on this data, businesses can run corresponding marketing campaigns to sell more products and make more profit. The main techniques that we will discuss here are the ones most commonly used on existing business problems. There are certainly many other ones, as well as proprietary techniques from particular vendors, but in general the industry is converging on those techniques that work consistently and are understandable and explainable.

One of the most popular heuristics for solving the k-means problem is based on a simple iterative scheme for finding a locally minimal solution. This algorithm is often called the k-means algorithm. There are a number of variants of this algorithm, so, to clarify which version we are using, we will refer to it as Lloyd's algorithm. (More accurately, it should be called the generalized Lloyd's algorithm, since Lloyd's original result was for scalar data.)

Genetic algorithms (GAs) work on a coding of the parameter set over which the search has to be performed, rather than the parameters themselves. These encoded parameters are called solutions or chromosomes, and the objective function value at a solution is the objective function value at the corresponding parameters. GAs solve optimization problems using a population of a fixed number of solutions, called the population size.

The main disadvantage of the k-means algorithm is that the number of clusters, K, must be supplied as a parameter. In this dissertation we present a simple validity measure based on the intra-cluster and inter-cluster distance measures which allows the number of clusters to be determined automatically. The basic procedure involves producing all the segmented datasets for 2 clusters up to Kmax clusters, where Kmax represents an upper limit on the number of clusters. Then our validity measure is calculated to determine which is the best clustering by finding the minimum value for our measure. The validity measure is tested on the LIC dataset, for which the number of clusters is known.
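The procedure just described (run k-means for 2 up to Kmax clusters and keep the clustering with the smallest validity value) can be sketched as follows. This is a minimal illustration and not the paper's MATLAB code. It assumes that the validity measure is the ratio of the mean squared intra-cluster distance to the smallest squared inter-centre distance, which is one common choice and is not spelled out in the text, and that k-means is plain Lloyd iteration with random initial centres. The function names are hypothetical.

```python
import math
import random


def nearest(point, centers):
    """Index of the centre closest to the point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def lloyd_kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and mean-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random initial centres
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New centre = mean of its group; an empty group keeps its old centre.
        updated = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if updated == centers:  # converged to a local minimum
            break
        centers = updated
    return centers


def validity(points, centers):
    """Assumed measure: intra-cluster / inter-cluster distance (smaller is better)."""
    intra = sum(math.dist(p, centers[nearest(p, centers)]) ** 2
                for p in points) / len(points)
    inter = min(math.dist(a, b) ** 2
                for i, a in enumerate(centers) for b in centers[i + 1:])
    return math.inf if inter == 0.0 else intra / inter


def best_k(points, k_max, seed=0):
    """Cluster for k = 2..k_max and return the k with the minimum validity."""
    scores = {k: validity(points, lloyd_kmeans(points, k, seed=seed))
              for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    data = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.9, 0.0), (10.0, 0.3)]
    k, scores = best_k(data, k_max=4)
    print("selected k:", k)
    print(scores)
```

Each Lloyd iteration compares every point with every centre, which is the per-iteration cost proportional to the number of patterns times the number of clusters noted in the problem statement. Because the result depends on the initial centres, repeated runs with different seeds may select different values of k.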

In this section, we describe the modified algorithm. As mentioned earlier, the algorithm is based on storing the multidimensional data points in a kd-tree. For completeness, we summarize the basic elements of this data structure. Define a box to be an axis-aligned hyper-rectangle. The bounding box of a point set is the smallest box containing all the points. A kd-tree is a binary tree which represents a hierarchical subdivision of the point set's bounding box using axis-aligned splitting hyperplanes. Each node of the kd-tree is associated with a closed box, called a cell. The root's cell is the bounding box of the point set. If the cell contains at most one point (or, more generally, fewer than some small constant), then it is declared to be a leaf. Otherwise, it is split into two hyper-rectangles by an axis-orthogonal hyperplane. The points of the cell are then partitioned to one side or the other of this hyperplane. (Points lying on the hyperplane can be placed on either side.) The resulting subcells are the children of the original cell, thus leading to a binary tree structure. There are a number of ways to select the splitting hyperplane. One simple way is to split orthogonally to the longest side of the cell through the median coordinate of the associated points. Given n points, this produces a tree with O(n) nodes and O(log n) depth.

7.1. RESULT

STEP 1 Run the MATLAB platform and browse to the folder where the code file is stored.

STEP 2 Select the code file to run and press the F5 button to run the program. The program can also be run from the Debug tab in the menu bar by clicking Run.

STEP 3 After running the program, select the dataset on which the algorithm is to be applied. The dataset is classified by assigning a different colour to each column, as described below:

This colour shows the Policy Number in Dataset.

This colour shows the Initial Name in Dataset.

This colour shows the Premium Amount in Dataset.

This colour shows the Age in Dataset.

This colour shows the Agent Code in Dataset.

STEP 4 MF Plot for dataset 1 before implementing the algorithm.

STEP 5 Click on the Start button to get the 5 clusters, which represent the similar data in different groups, as shown in figure 2.

Figure 2. CLUSTER ON THE BASIS OF SIMILAR PROPERTIES

STEP 6 Select any one cluster by clicking on it; the MF Plot is generated after implementation of the algorithm, as shown in figure 3.

Figure 3. MF PLOT IS GENERATED FOR THE CLUSTER WHICH REPRESENT POLICY NUMBER

7.2. ACCURACY

The accuracy of cluster 5 is described by plotting Time Deviation against Data Accuracy, as shown in figure 4.

Figure 4. Accuracy V/S Time Deviation for Cluster 5

Data Accuracy             0.0  0.5  0.4  0.5  0.6  0.7  0.8  1
Time Deviation (seconds)  1    2    3    4    5    6    7    8
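The kd-tree described for the modified algorithm (cells as axis-aligned boxes, split orthogonally to the longest side at the median) can be sketched as below. This is an illustrative construction only, not the authors' implementation. As a simplifying assumption, each child's cell is taken to be the bounding box of its own points, and the median split is done by position in the sorted order so that both halves are always non-empty. The names `KDNode`, `build_kdtree`, `count_nodes` and `depth` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    lo: Point                      # lower corner of the cell
    hi: Point                      # upper corner of the cell
    points: List[Point]            # points are stored only at leaves
    axis: Optional[int] = None     # splitting axis (None for a leaf)
    split: Optional[float] = None  # coordinate of the splitting hyperplane
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def bounding_box(points: List[Point]) -> Tuple[Point, Point]:
    """Smallest axis-aligned box containing all the points."""
    lo = tuple(min(c) for c in zip(*points))
    hi = tuple(max(c) for c in zip(*points))
    return lo, hi


def build_kdtree(points: List[Point], leaf_size: int = 1) -> KDNode:
    """Recursively split along the longest side of the cell at the median."""
    lo, hi = bounding_box(points)
    if len(points) <= leaf_size:
        return KDNode(lo, hi, list(points))
    axis = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node = KDNode(lo, hi, [], axis, ordered[mid][axis])
    node.left = build_kdtree(ordered[:mid], leaf_size)
    node.right = build_kdtree(ordered[mid:], leaf_size)
    return node


def count_nodes(node: Optional[KDNode]) -> int:
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)


def depth(node: KDNode) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if node.axis is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))


if __name__ == "__main__":
    pts = [(float(i), float(i % 3)) for i in range(8)]
    tree = build_kdtree(pts)
    print(count_nodes(tree), "nodes, depth", depth(tree))
```

Splitting at the median halves the point count at every level, which is where the O(n) node count and O(log n) depth stated for this construction come from.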


8. CONCLUSION

We conclude that the K-means algorithm is an excellent algorithm when we are dealing with small or medium-sized data. It provides a good performance vector every time. A direct implementation of the k-means method requires time proportional to the product of the number of patterns and the number of clusters per iteration. This is computationally very expensive, especially for large datasets. The main disadvantage of the k-means algorithm is that the number of clusters, K, must be supplied as a parameter. To overcome this, we have made a modified algorithm which is a combination of K-means and MCL.

