I. INTRODUCTION
Data cleaning is the process of correcting or removing
information in a database that is incorrect, incomplete,
improperly formatted or duplicated. A business in a
data-intensive field such as banking, insurance, trade,
telecommunications or transportation might use a data
cleaning tool to systematically inspect data for errors
using a set of rules, algorithms and lookup tables. A
typical data cleaning tool consists of programs that can
correct a number of specific types of errors or detect
duplicate records. Using such algorithms can save a
database administrator a substantial amount of time and can
be less expensive than fixing errors manually.
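
As an illustration of the rule and lookup-table approach described above, the following is a minimal sketch only, not the method of any particular tool. The field names (`name`, `phone`, `state`), the 10-digit phone rule and the contents of `STATE_LOOKUP` are all hypothetical assumptions chosen for the example.

```python
import re

# Hypothetical lookup table: known spelling variants -> canonical code.
STATE_LOOKUP = {
    "calif.": "CA", "california": "CA", "ca": "CA",
    "n.y.": "NY", "new york": "NY", "ny": "NY",
}


def clean_phone(value):
    """Rule (assumed): strip non-digits and accept exactly 10 digits."""
    digits = re.sub(r"\D", "", value or "")
    return digits if len(digits) == 10 else None


def clean_state(value):
    """Lookup: map a known variant to its canonical code, else None."""
    return STATE_LOOKUP.get((value or "").strip().lower())


def clean_record(record):
    """Return (cleaned_copy, fields_that_failed) for one record."""
    cleaned = dict(record)
    errors = []

    # Formatting rule: collapse whitespace and normalise capitalisation.
    cleaned["name"] = " ".join((record.get("name") or "").split()).title()
    if not cleaned["name"]:
        errors.append("name")

    # Apply each field rule; keep the original value when a rule fails
    # so that the record can be reviewed manually.
    for field, rule in (("phone", clean_phone), ("state", clean_state)):
        result = rule(record.get(field))
        if result is None:
            errors.append(field)
        else:
            cleaned[field] = result
    return cleaned, errors
```

In this sketch a record that fails a rule is flagged rather than silently altered, which leaves the manual correction step for only the records that need it.
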
Data cleaning is a vital task for data warehouse
specialists, database administrators and developers alike.
Deduplication, validation and householding methods
can be applied whether you are populating data warehouse
components, incorporating new data into an existing
operational system or sustaining real-time deduplication
within an operational system. The objective is a high
level of data accuracy and reliability that translates into
better customer service, lower costs and peace of mind.
Data is a valuable organisational asset that should be
developed and refined to realise its full benefit. Deduplication
ensures that a single correct record is present for each
business entity represented in a transactional or
analytic database. Validation ensures that every
attribute stored for a given record is accurate.
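
The two operations just described can be sketched as follows. This is a hedged example under stated assumptions: the match key (normalised `name` plus `email`), the keep-first policy and the per-attribute checks in `VALIDATORS` are hypothetical choices for illustration, and real systems often use fuzzier matching.

```python
import re

# Deliberately simple email pattern, assumed for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical per-attribute validation rules.
VALIDATORS = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and EMAIL_RE.match(v) is not None,
}


def match_key(record):
    """Build a normalised key so trivially different copies compare equal."""
    name = " ".join(str(record.get("name", "")).lower().split())
    email = str(record.get("email", "")).strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first record seen for each match key (assumed policy)."""
    seen = {}
    for record in records:
        seen.setdefault(match_key(record), record)
    return list(seen.values())


def invalid_attributes(record):
    """Return the names of attributes that fail their validation rule."""
    return [field for field, ok in VALIDATORS.items()
            if not ok(record.get(field))]
```

Exact matching on a normalised key catches only copies that differ in case or spacing; it is used here because it keeps the example short, not because it is sufficient for every database.
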
Cleaning data before it is stored in a reporting database
is essential to provide value to users of business intelligence
applications. The cleaning procedure usually includes
processes that prevent duplicate records from being
reported by the system. Data analysis and data enrichment
services can help improve the quality of data. These
services include the aggregation, organisation and cleaning
of data, and can ensure that your database (part and
material files, product catalogue files, item information
and so on) is current, accurate and complete.
Rajashree Y. Patil et al, / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 3 (5) , 2012,5212 - 5214