Growing network complexity, wider Internet access, increased information
sharing and the growing impact of the Internet have made security and privacy a
major research interest. “Data mining is a technique for extracting knowledge
automatically and intelligently from huge amount of data.” Privacy
preserving data mining (PPDM) refers to securing the privacy of personal data
or sensitive information without losing the usefulness of the data.
Privacy preserving data mining has drawn growing attention in recent years
with the rapid development of Internet, data processing and data storage
technologies. An individual's privacy is not considered violated unless that
individual feels his or her private information is being used unfavorably.
Once personal information is disclosed, no one can prevent it from being
misused. Several methods have been proposed to address privacy concerns, but
this branch of research is still in its infancy.
“A number of techniques and methods have been refined
for privacy preserving data mining that allows one to extract required and
significant knowledge from huge amount of data, and hiding sensitive data from revealing
or inference at the same time.”
PPDM research follows these approaches:
Data Hiding: Sensitive information such as name, address and contact number is
replaced, blocked or trimmed from the database. This prevents the user of the
data from compromising other individuals' personal information.
Rule Hiding: Sensitive information or rules extracted by the data mining
process are blocked from use, so the private information uncovered by mining
cannot be exploited.
Secure Multiparty Computation (SMC): The data is encrypted before being shared
for computation so that it is not leaked.
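The idea behind SMC can be illustrated with a secure sum. The text above speaks of encrypting the data; the minimal sketch below swaps in additive secret sharing instead, which is a simpler way to show the same goal of computing a joint result without revealing any single input. The names `MODULUS`, `make_shares` and `secure_sum` are hypothetical, and all parties are simulated in one process.

```python
import secrets

# Assumed public modulus; it must be larger than any possible true sum.
MODULUS = 2**61 - 1

def make_shares(value, n_parties):
    """Split value into n_parties random shares that add up to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_inputs):
    """Simulate a secure sum in which no party sees another party's input."""
    n = len(private_inputs)
    # Party i splits its input and sends the j-th share to party j.
    all_shares = [make_shares(v, n) for v in private_inputs]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums reveal the total but not the individual inputs.
    return sum(partial_sums) % MODULUS

if __name__ == "__main__":
    print(secure_sum([10, 20, 30]))
```

Each individual share is a uniformly random number, so a party that sees only the shares sent to it learns nothing about the sender's input.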
Privacy preserving data mining techniques are classified along the following
dimensions:
Data distribution
Data or rule hiding
Data modification
Data mining algorithm
Privacy preservation
Data distribution: On the basis of data distribution, PPDM algorithms are
categorized as centralized or distributed. In a centralized database system,
all the data is gathered in a single database, while in a distributed database
the data may reside in different databases at different locations. Distributed
data is further classified as horizontally or vertically distributed. In the
horizontal approach, different records reside at different locations, while in
the vertical approach, the values of different attributes reside at different
locations.
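The difference between the two distributions can be sketched on a small table. The records, the attribute names and the helper functions `horizontal_partition` and `vertical_partition` below are hypothetical and serve only as an illustration; the `id` join key kept at every site in the vertical case is also an assumption.

```python
# Hypothetical records used only for illustration.
records = [
    {"id": 1, "age": 34, "city": "Pune", "diagnosis": "flu"},
    {"id": 2, "age": 51, "city": "Delhi", "diagnosis": "asthma"},
    {"id": 3, "age": 27, "city": "Pune", "diagnosis": "flu"},
    {"id": 4, "age": 45, "city": "Mumbai", "diagnosis": "diabetes"},
]

def horizontal_partition(rows, n_sites):
    """Each site holds complete records, but only a subset of them."""
    return [rows[i::n_sites] for i in range(n_sites)]

def vertical_partition(rows, attribute_groups, key="id"):
    """Each site holds every record, but only some attributes plus the join key."""
    return [
        [{key: r[key], **{a: r[a] for a in attrs}} for r in rows]
        for attrs in attribute_groups
    ]

if __name__ == "__main__":
    for site in horizontal_partition(records, 2):
        print("horizontal site:", site)
    for site in vertical_partition(records, [["age", "city"], ["diagnosis"]]):
        print("vertical site:", site)
```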
Data or rule hiding: On the basis of the hiding principle, PPDM algorithms are
classified as data hiding or rule hiding. In the data hiding approach,
sensitive information such as name, address and contact number is replaced,
blocked or trimmed from the database. This prevents the user of the data from
compromising other individuals' personal information. Most procedures use data
hiding techniques to keep information from being revealed, hiding specific
patterns by modifying the data.
Data modification: Modification changes the data in order to attain a high
level of privacy. The data can be modified by perturbation, blocking,
aggregation or merging, swapping or sampling, or by a combination of these
techniques.
Perturbation: Replacing the original value with a new value, for example
changing 1 to 0 or 0 to 1, i.e. adding noise to it.
Blocking: Preventing data from being disclosed by substituting the current
attribute value with ‘?’.
Aggregation or merging: Combining several values into a coarser group.
Swapping: Interchanging the values of particular records.
Sampling: Releasing data for only a sample of the records.
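The five modification techniques listed above can be sketched in a few lines each. The function names, the record layout and the parameters (`flip_prob`, `width`, `k`) are hypothetical; this is an illustration under those assumptions rather than a reference implementation.

```python
import random

def perturb_bits(bits, flip_prob, rng):
    """Perturbation: flip each 0/1 value with probability flip_prob."""
    return [1 - b if rng.random() < flip_prob else b for b in bits]

def block(rows, attribute):
    """Blocking: replace the value of one attribute with '?'."""
    return [{**r, attribute: "?"} for r in rows]

def aggregate(value, width):
    """Aggregation: map a number to a coarser range such as '30-39'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

def swap(rows, attribute, rng):
    """Swapping: interchange the values of one attribute between records."""
    values = [r[attribute] for r in rows]
    rng.shuffle(values)
    return [{**r, attribute: v} for r, v in zip(rows, values)]

def sample(rows, k, rng):
    """Sampling: release only k randomly chosen records."""
    return rng.sample(rows, k)

if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the run is repeatable
    rows = [{"name": "A", "age": 34}, {"name": "B", "age": 51}, {"name": "C", "age": 27}]
    print(perturb_bits([1, 0, 1, 1], 0.25, rng))
    print(block(rows, "name"))
    print([aggregate(r["age"], 10) for r in rows])
    print(swap(rows, "age", rng))
    print(sample(rows, 2, rng))
```

Swapping keeps the overall distribution of an attribute intact while breaking the link between a value and its original record, which is why it is often combined with blocking of direct identifiers.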
Data mining algorithms: Various data mining algorithms provide the basis for
the analysis and design of data hiding algorithms. Some of the important
algorithms are:
Decision tree inducers
Association rule mining
At present, PPDM techniques are applied to classification, association rule
mining and clustering. Association rule mining refers to detecting rules among
items that frequently occur together. Clustering is the task of dividing a data
set into different groups. Classification refers to finding a set of models
that predict the class of a record on the basis of the input provided.
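Association rules are commonly scored by support and confidence. These two standard measures are not defined in the text above, so the sketch below is offered as background, with hypothetical transactions and helper names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the whole rule divided by the support of its antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market-basket data used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    print(support(transactions, {"bread", "milk"}))
    print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule hiding method would modify the transactions until the support or confidence of a sensitive rule falls below the threshold used by the miner.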
Privacy preservation: “The selective modification of data is done using
PPDM technique and is required to achieve higher utility for the modified data
given that the privacy is not lost.” The techniques used with centralized data
involve sanitation, blocking, distortion and generalization. Secure multiparty
computation is another technique: it computes any function on any input, where
each participant holds one input and no private information is disclosed to
any participant during the computation. For data
hiding, mainly data distortion is used, then the data sanitation and then
Thus, a trade-off between privacy and accuracy has to be achieved, since
improving one of these usually comes at the cost of the other.
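The trade-off can be made concrete with a small experiment. In the sketch below, the width of the added noise stands in for the privacy level and the mean absolute error of the released values stands in for the loss of accuracy; both proxies, the uniform noise and the sample values are assumptions made for illustration.

```python
import random

def perturb(values, noise_scale, rng):
    """Add uniform noise in [-noise_scale, noise_scale] to every value."""
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]

def mean_abs_error(original, released):
    """Average distance between each original value and its released value."""
    return sum(abs(a - b) for a, b in zip(original, released)) / len(original)

def tradeoff(values, scales, seed=0):
    """Return the error for each noise scale, reusing the same random draws."""
    errors = {}
    for scale in scales:
        rng = random.Random(seed)  # same seed so only the scale differs
        errors[scale] = mean_abs_error(values, perturb(values, scale, rng))
    return errors

if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 47, 38, 31]  # hypothetical data
    for scale, err in tradeoff(ages, [0, 5, 20]).items():
        print(f"noise scale {scale}: mean absolute error {err:.2f}")
```

Wider noise hides individual values better but moves the released data further from the original, which is the cost the paragraph above refers to.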
Better methods should be developed to balance disclosure, computation and
communication costs. To keep confidential data private, more effective
algorithms are needed, as the privacy of data and information is one of the
major concerns in today's world.