Introduction
Publishing microdata without adequate anonymization may lead to insufficient privacy protection. This work therefore aims to provide privacy for microdata publishing.
Literature Survey
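In both the generalization and bucketization approaches, attributes are partitioned into three categories:
1) Some attributes are identifiers that can uniquely identify an individual.
2) Some attributes are Quasi-Identifiers (QIs), which taken together can potentially identify an individual, such as Age, Sex, and Zipcode.
3) Some attributes are Sensitive Attributes (SAs), which are unknown to the adversary and are considered sensitive, such as Disease and Salary.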
In both generalization and bucketization, one first removes the identifiers from the data and then partitions the tuples into buckets. The two techniques differ in the next step. Generalization transforms the QI values in each bucket. Bucketization separates the SAs from the QIs by randomly permuting the SA values in each bucket.
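The difference between the two techniques can be illustrated on a single bucket. The following is a minimal sketch, not the implementation behind these slides: the records, the function names `generalize` and `bucketize`, and the choice of an age range plus a zipcode prefix as the generalized form are all assumptions made for illustration.

```python
import random

# Toy bucket of (age, zipcode, disease) records; the values are made up.
bucket = [(22, "47906", "flu"), (25, "47905", "dyspepsia"), (29, "47902", "bronchitis")]

def generalize(bucket):
    """Replace every QI value with a bucket-wide range or common prefix."""
    ages = [age for age, _, _ in bucket]
    zips = [z for _, z, _ in bucket]
    # Longest common zipcode prefix (assumes equal-length zipcodes), padded with '*'.
    prefix_len = 0
    while prefix_len < len(zips[0]) and all(z[prefix_len] == zips[0][prefix_len] for z in zips):
        prefix_len += 1
    gen_zip = zips[0][:prefix_len] + "*" * (len(zips[0]) - prefix_len)
    gen_age = f"[{min(ages)}-{max(ages)}]"
    return [(gen_age, gen_zip, disease) for _, _, disease in bucket]

def bucketize(bucket, rng):
    """Keep the QI values intact but randomly permute the SA values in the bucket."""
    diseases = [d for _, _, d in bucket]
    rng.shuffle(diseases)
    return [(age, z, d) for (age, z, _), d in zip(bucket, diseases)]

generalized = generalize(bucket)
bucketized = bucketize(bucket, random.Random(0))
```

In the generalized bucket all tuples share the same QI values, while in the bucketized one the exact QI values survive and only their link to the SA values is hidden.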
Privacy threats
When publishing microdata, there are three types of privacy disclosure threats: 1) membership disclosure, 2) identity disclosure, and 3) attribute disclosure.
Problem Specification
The anonymization techniques for privacy-preserving microdata publishing are: 1) Generalization 2) Bucketization
Bucketization does not prevent membership disclosure. It requires a clear separation between QIs and SAs, yet in many data sets it is unclear which attributes are QIs and which are SAs. By separating the SAs from the QI attributes, it also breaks the correlations between the QIs and the SAs.
Slicing
Slicing partitions the dataset both vertically and horizontally. In the vertical partition, attributes are grouped into columns, and each column contains a subset of the attributes. In the horizontal partition, tuples are grouped into buckets, and each bucket contains a subset of the tuples.
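The two partitions can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `slice_table`, the fixed-size chunking used to form buckets, and the sample rows are hypothetical, and the values of each column are randomly permuted within a bucket so that the columns are no longer linked row by row.

```python
import random

def slice_table(rows, columns, bucket_size, rng):
    """rows: list of dicts; columns: list of attribute-name tuples (vertical partition)."""
    sliced = []
    # Horizontal partition: fixed-size chunks stand in for real tuple partitioning.
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        pieces = []
        for col in columns:
            # Project the bucket onto one column (a group of attributes).
            values = [tuple(r[a] for a in col) for r in bucket]
            rng.shuffle(values)  # break the linkage between different columns
            pieces.append(values)
        # Re-assemble rows from the independently permuted column pieces.
        sliced.append(list(zip(*pieces)))
    return sliced

rows = [
    {"age": 22, "sex": "M", "zipcode": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zipcode": "47906", "disease": "flu"},
    {"age": 33, "sex": "F", "zipcode": "47905", "disease": "flu"},
    {"age": 52, "sex": "F", "zipcode": "47905", "disease": "bronchitis"},
]
columns = [("age", "sex"), ("zipcode", "disease")]
sliced = slice_table(rows, columns, bucket_size=2, rng=random.Random(0))
```

Attributes placed in the same column keep their exact joint values, which is how slicing can preserve correlations that bucketization would break.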
Slicing Algorithms
Our algorithm consists of three phases: 1) Attribute partitioning 2) Column generalization 3) Tuple partitioning
Algorithm
The algorithm maintains two data structures: a queue of buckets Q and a set of sliced buckets SB. Initially, Q contains only one bucket which includes all tuples and SB is empty. In each iteration, the algorithm removes a bucket from Q and splits the bucket into two buckets.
If the sliced table after the split satisfies l-diversity, then the algorithm puts the two buckets at the end of the queue Q.
Otherwise, we cannot split the bucket anymore and the algorithm puts the bucket into SB. When Q becomes empty, we have computed the sliced table. The set of sliced buckets is SB. The main part of the tuple-partition algorithm is to check whether a sliced table satisfies l-diversity.
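The queue-based loop described above can be sketched as follows. Several parts are assumptions rather than the slides' method: the diversity check is simplified to "at least l distinct sensitive values per bucket", the split is a median split on the sorted tuples, and the names `is_l_diverse`, `split`, and `tuple_partition` are hypothetical. The sketch also assumes the initial bucket itself passes the check, so testing only the two new buckets is enough to keep the whole table diverse.

```python
from collections import deque

def is_l_diverse(bucket, l):
    # Simplified stand-in for the l-diversity check: count distinct SA values.
    return len({sa for _, sa in bucket}) >= l

def split(bucket):
    # Hypothetical split rule: sort the tuples and cut at the median.
    ordered = sorted(bucket)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def tuple_partition(tuples, l):
    queue = deque([list(tuples)])   # Q: initially one bucket with all tuples
    sliced_buckets = []             # SB: initially empty
    while queue:
        bucket = queue.popleft()
        left, right = split(bucket)
        if left and right and is_l_diverse(left, l) and is_l_diverse(right, l):
            # The split keeps the table diverse: keep refining both halves.
            queue.append(left)
            queue.append(right)
        else:
            # The bucket cannot be split any further: it is final.
            sliced_buckets.append(bucket)
    return sliced_buckets

# Toy (age, disease) tuples; the values are made up.
data = [(20, "flu"), (25, "cancer"), (30, "flu"), (35, "cancer"),
        (40, "flu"), (45, "cancer"), (50, "flu"), (55, "cancer")]
buckets = tuple_partition(data, l=2)
```

Because a bucket is only split when both halves stay diverse, every bucket that ends up in SB satisfies the check.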
This shows the original microdata table and its anonymized versions produced by the anonymization techniques.
The table in the above figure consists of QIs and an SA: Age, Sex, and Zipcode are the QIs and Disease is the SA. The generalized table satisfies 4-anonymity.
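A k-anonymity check of this kind can be sketched as below: every combination of QI values must occur in at least k rows. The function name `satisfies_k_anonymity` and the sample rows are assumptions for illustration and are not the table from the figure.

```python
from collections import Counter

def satisfies_k_anonymity(rows, qi_indices, k):
    """True if each combination of QI values is shared by at least k rows."""
    groups = Counter(tuple(row[i] for i in qi_indices) for row in rows)
    return all(count >= k for count in groups.values())

# Made-up generalized rows: (age, sex, zipcode, disease).
generalized_rows = [
    ("[20-52]", "*", "4790*", "dyspepsia"),
    ("[20-52]", "*", "4790*", "flu"),
    ("[20-52]", "*", "4790*", "flu"),
    ("[20-52]", "*", "4790*", "bronchitis"),
    ("[54-64]", "*", "4730*", "flu"),
    ("[54-64]", "*", "4730*", "dyspepsia"),
    ("[54-64]", "*", "4730*", "dyspepsia"),
    ("[54-64]", "*", "4730*", "gastritis"),
]
is_4_anonymous = satisfies_k_anonymity(generalized_rows, qi_indices=(0, 1, 2), k=4)
```

The sensitive attribute is deliberately left out of `qi_indices`, since k-anonymity only constrains the QI values.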
The above table is the bucketized table, which satisfies 2-diversity.
The above tables show multiset-based generalization and one-attribute-per-column slicing, and the table below shows the sliced table.
[Diagram: User, with the activities Diverse Slicing, Correlation Measure, Attribute Clustering, and Tuple Partitioning]
The diagram shows the activities of the user. The user can provide privacy for the microdata by generalizing the records, dividing them into a number of buckets, and breaking the correlation between the attributes. The attributes of the table are sliced by applying random permutation and a probability function.
Class diagram
The class diagram shows how the probability functions are calculated and how values are randomly permuted. It has five classes, each with its attributes and methods, and shows how data is retrieved and how the methods are applied.
Modules
1) Data slicing 2) Diverse slicing 3) Correlation measure 4) Attribute clustering 5) Tuple partitioning
bucketized.
Slicing first partitions attributes into columns and then partitions tuples into buckets. Diverse slicing extends the above analysis to the general case and introduces the notion of l-
Two correlation measures are used: one for measuring the correlation between two continuous attributes and one for measuring the correlation between two categorical attributes.
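The slides do not name the two measures. As an assumption, the sketch below uses the Pearson correlation coefficient for continuous attributes and the mean-square contingency coefficient for categorical attributes; the function names are hypothetical, and the categorical measure assumes each attribute takes at least two distinct values.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two continuous attributes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def mean_square_contingency(a, b):
    """Mean-square contingency coefficient between two categorical attributes."""
    n = len(a)
    freq_a, freq_b, freq_ab = Counter(a), Counter(b), Counter(zip(a, b))
    d = min(len(freq_a), len(freq_b))  # smaller domain size, assumed >= 2
    total = 0.0
    for value_a, count_a in freq_a.items():
        for value_b, count_b in freq_b.items():
            expected = (count_a / n) * (count_b / n)   # product of marginals
            observed = freq_ab[(value_a, value_b)] / n  # joint fraction
            total += (observed - expected) ** 2 / expected
    return total / (d - 1)

linear = pearson([1, 2, 3], [2, 4, 6])
dependent = mean_square_contingency(["M", "M", "F", "F"], ["x", "x", "y", "y"])
independent = mean_square_contingency(["M", "M", "F", "F"], ["x", "y", "x", "y"])
```

The categorical measure ranges from 0 for independent attributes to 1 for attributes that fully determine each other, which makes it comparable across attribute pairs.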
After computing the correlations for each pair of attributes, we use clustering to partition the attributes into columns. Tuple partitioning is then performed, in which the tuples are partitioned into buckets.
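The clustering step can be illustrated with a deliberately simple rule. The slides do not say which clustering method is used, so the greedy threshold-based merging below is a swapped-in stand-in; the function name `cluster_attributes`, the threshold, and the correlation values are all hypothetical.

```python
def cluster_attributes(attributes, corr, threshold):
    """Greedy grouping: merge two columns when any cross pair is correlated enough.

    corr maps frozenset({a, b}) to a correlation value; missing pairs count as 0.
    """
    columns = [{a} for a in attributes]  # start with one attribute per column
    merged = True
    while merged:
        merged = False
        for i in range(len(columns)):
            for j in range(i + 1, len(columns)):
                linked = any(
                    corr.get(frozenset((a, b)), 0.0) >= threshold
                    for a in columns[i] for b in columns[j]
                )
                if linked:
                    other = columns.pop(j)  # j > i, so index i stays valid
                    columns[i] |= other
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return columns

# Made-up pairwise correlations.
corr = {
    frozenset(("age", "sex")): 0.8,
    frozenset(("zipcode", "disease")): 0.7,
    frozenset(("age", "disease")): 0.1,
}
columns = cluster_attributes(["age", "sex", "zipcode", "disease"], corr, threshold=0.5)
```

Highly correlated attributes end up in the same column, so their joint values are published together once the table is sliced.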
Results
Displaying the message in the textbox when the user cancels the browsing
Displaying the values which have a clear separation between Quasi-Identifiers and Sensitive Attributes
Showing the probability based on the countries and salary of all the buckets
Providing security for sliced sets and storing the values in the database
Comparison
[Chart: Time (msec) for Anonymity, Diversity, and Slicing]
Conclusion
A dataset is taken and anonymization techniques are applied to it to protect the privacy of the microdata.
implemented.
DES is used to secure the sliced table. Overlapping slicing is performed, which duplicates an attribute in more than one column. The comparison, based on the time consumed, indicates that slicing preserves better data utility than generalization and bucketization.
Future Work
Slicing gives better privacy than generalization and bucketization, but there is still scope to increase privacy for microdata publishing in the future by using different anonymization techniques.
References
[1] Tiancheng Li, Ninghui Li, Jian Zhang, and Ian Molloy, Slicing: A New Approach for Privacy Preserving Data Publishing, IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 3, March 2012.
[2] A. Inan, M. Kantarcioglu, and E. Bertino, Using Anonymized Data for Classification, Proc. IEEE 25th Intl Conf. Data Eng. (ICDE), pp. 429-440, 2009.
[3] B.-C. Chen, K. LeFevre, and R. Ramakrishnan, Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge, Proc. Intl Conf. Very Large Data
Thank you..