
Big Data

"Big data" is a field that treats ways to analyze, systematically extract information from, or
otherwise deal with data sets that are too large or complex for traditional data-processing
application software. Data with many cases (rows) offer greater statistical power, while
data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.
Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer,
visualization, querying, updating, information privacy, and data source. Big data was originally
associated with three key concepts: volume, variety, and velocity. Other concepts later attributed
to big data are veracity (i.e., how much noise is in the data) and value.
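
The link between more columns and more spurious findings can be illustrated with a small calculation. The sketch below is not the formal false discovery rate; it assumes every attribute is truly unrelated to the outcome, that each attribute is tested independently, and that a conventional significance level of 0.05 is used. The function names are made up for this illustration.

```python
def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Expected count of false positives when no tested attribute has a real effect."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Compare a narrow data set with progressively wider ones.
for columns in (1, 10, 100):
    print(
        columns,
        expected_false_positives(columns),
        round(prob_at_least_one_false_positive(columns), 3),
    )
```

Under these assumptions, both numbers rise as columns are added, which is why wide data sets need extra care before a pattern is treated as real.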

Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous
information-sensing Internet of things devices such as mobile devices, aerial (remote sensing),
software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless
sensor networks. The world's technological per-capita capacity to store information has roughly
doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data
are generated every day. Based on an IDC report prediction, the global data volume will grow exponentially
from 4.4 zettabytes to 44 zettabytes between 2013 and 2020. By 2025, IDC predicts there will be
163 zettabytes of data. One question for large enterprises is determining who should own big-data
initiatives that affect the entire organization.
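
The growth figures above can be put on a common footing with a little arithmetic. The sketch below assumes steady compound growth, which the source does not state, and converts the quoted numbers into yearly growth factors and doubling times. The function names are illustrative only.

```python
import math


def annual_growth_factor(start: float, end: float, years: float) -> float:
    """Constant yearly multiplier that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years)


def doubling_time_months(annual_factor: float) -> float:
    """Months needed to double at a constant yearly multiplier."""
    return 12.0 * math.log(2.0) / math.log(annual_factor)


# IDC prediction quoted above: 4.4 ZB in 2013 to 44 ZB in 2020.
idc_factor = annual_growth_factor(4.4, 44.0, 2020 - 2013)

# Storage capacity quoted above: doubling every 40 months.
storage_factor = 2.0 ** (12.0 / 40.0)

print(f"IDC forecast: x{idc_factor:.3f} per year, "
      f"doubling every {doubling_time_months(idc_factor):.1f} months")
print(f"Storage capacity: x{storage_factor:.3f} per year")
```

Read this way, the IDC forecast implies data volume growing faster than the historical per-capita storage capacity.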

Characteristics

Big data can be described by the following characteristics:

1. Volume
The quantity of generated and stored data. The size of the data determines the value and
potential insight, and whether it can be considered big data or not.
2. Variety
The type and nature of the data. Knowing this helps the people who analyze the data to use the
resulting insight effectively. Big data draws from text, images, audio, and video; it also
completes missing pieces through data fusion.
3. Velocity
In this context, the speed at which the data is generated and processed to meet the demands
and challenges that lie in the path of growth and development. Big data is often available
in real-time. Compared to small data, big data are produced more continually. Two kinds
of velocity related to big data are the frequency of generation and the frequency of
handling, recording, and publishing.
4. Veracity
An extension of the definition of big data that refers to data quality and data value.
The quality of captured data can vary greatly, which affects the accuracy of analysis.
Data must be processed with advanced tools (analytics and algorithms) to reveal
meaningful information. For example, to manage a factory one must consider both visible
and invisible issues with various components. Information generation algorithms must
detect and address invisible issues such as machine degradation, component wear, etc. on
the factory floor.

Big Data Problems

1. Finding the Signal in the Noise


It’s difficult to get insights out of a huge lump of data. Maksim Tsvetovat, Big Data Scientist
at Intellectsoft and author of the book “Social Network Analysis for Startups”, said that in
order to use Big Data, “There has to be a discernible signal in the noise that you can detect, and
sometimes there just isn’t one. Once we’ve done our intelligence on the data, sometimes we
have to come back and say we just didn’t measure this right or measured the wrong variables
because there’s nothing we can detect here.” He went on to say that in its raw form, Big Data
looks like a hairball and that a scientific approach to the data is necessary. “You approach it carefully
and behave like a scientist which means if you fail at your hypothesis, you come up with a few
other hypotheses, and maybe one of them turns out to be correct.”
2. Data Silos
Data silos are basically Big Data’s kryptonite. They store all of that wonderful data
you’ve captured in separate, disparate units that have nothing to do with one another, so
no insights can be gathered from the data because it simply isn’t integrated on the
back end. Data silos are the reason you have to number crunch to produce a monthly sales
report. They’re the reason that C-level decisions are made at a snail’s pace. They’re the reason
your sales and marketing teams simply don’t get along. They’re the reason that your customers
are looking elsewhere to take their business, because they don’t feel their needs are being met
and a smaller, more nimble company is offering something better. The way to eliminate data
silos? Integrate your data.
3. Inaccurate Data
Not only are data silos ineffective on an operational level, they are also a fertile breeding ground
for the biggest data problem: inaccurate data. According to a recent report from Experian Data
Quality, 75% of businesses believe their customer contact information is incorrect. If you’ve
got a database full of inaccurate customer data, you might as well have no data at all. The best
way to combat inaccurate data? Eliminate data silos by integrating your data.
4. Technology Moves Too Fast
Larger corporations are more prone to data silos, both because they prefer to keep their
databases on-premises and because decision making about new technologies is often slow.
One example cited in the CapGemini report is that stalwarts like telcos and utilities “…are
noticing high levels of disruption from new competitors moving in from other sectors. This
issue was mentioned by over 35% of respondents in each of these industries, compared with
an overall average of under 25%.” In essence, traditional players are slower to move on
technological advances and are finding themselves faced with serious competition from
smaller companies because of this.
Big Data is also fast data. Paul Maritz, Chief Executive Officer of Pivotal (part of the EMC Federation),
wrote in a recent CapGemini report that “If you can obtain all the relevant data, analyze it
quickly, surface actionable insights, and drive them back into operational systems, then you
can affect events as they’re still unfolding. The ability to catch people or things ‘in the act’,
and affect the outcome, can be extraordinarily important, valuable and disruptive.”
The ability to make snap decisions and quickly move on Big Data insights is the advantage
SMEs have over large corporations.
5. Lack of Skilled Workers
CapGemini’s report found that 37% of companies have trouble finding skilled data analysts
to make use of their data. The best bet is to form one common data analyst team for the
company, either by re-skilling your current workers or by recruiting new workers who specialize
in big data.
You need to find employees who not only understand data from a scientific perspective, but
who also understand the business and its customers, and how their data findings apply directly
to them.

Data Integration

Data integration, or to be technical, data harmonization, is absolutely essential for getting the full
advantage out of your Big Data. Data integration addresses the backend need for getting data silos
to work together so you can obtain deeper insight from Big Data.
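
As a rough illustration of getting silos to work together, the sketch below merges records from two hypothetical silos (a sales list and a marketing list) into one record per customer. It assumes every record carries an `email` field that can serve as the shared key; real systems usually need a more careful matching rule. All names and sample values are invented for the example.

```python
def integrate(*silos):
    """Merge records from several silos into one record per customer email."""
    merged = {}
    for silo in silos:
        for record in silo:
            # Normalize the key so "Ana@Example.com" and "ana@example.com " match.
            key = record["email"].strip().lower()
            customer = merged.setdefault(key, {"email": key})
            for field, value in record.items():
                if field != "email":
                    # Earlier silos win; later silos only fill in missing fields.
                    customer.setdefault(field, value)
    return merged


sales = [{"email": "Ana@Example.com", "last_order": "2019-03-01"}]
marketing = [{"email": "ana@example.com ", "campaign": "spring"}]

customers = integrate(sales, marketing)
print(customers)
```

Once both silos feed a single record, a question such as "which campaign preceded this order?" can be answered from one place instead of two.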

The first step to integrating your data is to ensure you’ve got clean data. Big Data Consultant Ted
Clark, from the data consultancy company Adventag, said that “80% of the work Data Scientists
do is cleaning up the data before they can even look at it.”

Cleaning and Maintaining Data

1. Remove Duplicates

If you’re using multiple channels to capture data, such as your website, customer care
centre, and marketing leads, you’re running the risk of collecting duplicate information. There are
tools to help you remove duplicate data; for instance, if you work with Google Contacts, you can
merge your contacts.
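
For data held in your own systems, duplicate removal can be sketched in a few lines. The example below assumes each record is a dictionary with `name` and `email` fields, and treats two records as duplicates when both fields match after trimming spaces and ignoring case. It keeps the first copy it sees. The field names and sample records are assumptions made for illustration.

```python
def remove_duplicates(records):
    """Return records with repeated (name, email) pairs dropped, keeping the first."""
    seen = set()
    unique = []
    for record in records:
        key = (record["name"].strip().lower(), record["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


captured = [
    {"name": "Ana Lopez", "email": "ana@example.com", "channel": "website"},
    {"name": "ana lopez ", "email": "ANA@example.com", "channel": "customer care"},
    {"name": "Ben Ode", "email": "ben@example.com", "channel": "marketing"},
]

cleaned = remove_duplicates(captured)
print(len(captured), "captured,", len(cleaned), "after removing duplicates")
```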

2. Verify New Data

Set company-wide standards for verifying all newly captured data before it enters the central
database. Add checks to confirm that the customer is not already in the system, whether under a
different name or under their email address.
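
One way such a check might look is sketched below. Before a new record is saved, it is compared with existing records, and anything that shares either the email address or the name is flagged for review. The record layout and the matching rule are assumptions for this example; a production system would likely use fuzzier matching.

```python
def normalize_name(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def find_possible_matches(existing, candidate):
    """Return existing records that share an email or a name with the candidate."""
    email = candidate["email"].strip().lower()
    name = normalize_name(candidate["name"])
    matches = []
    for record in existing:
        same_email = record["email"].strip().lower() == email
        same_name = normalize_name(record["name"]) == name
        if same_email or same_name:
            matches.append(record)
    return matches


database = [
    {"name": "Ana Lopez", "email": "ana@example.com"},
    {"name": "Ben Ode", "email": "ben@example.com"},
]

# Same email address, but entered under a different name.
incoming = {"name": "A. Lopez", "email": "ANA@example.com"}
print(find_possible_matches(database, incoming))
```

A non-empty result means the new record should be reviewed or merged rather than inserted as a fresh customer.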

3. Update Data

Keep your data updated. You can do this by using parsing tools, which scan all incoming emails
and update contact information as it comes to hand.
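
The idea behind such a parsing tool can be sketched with Python's standard `email` package. The example below reads only the `From` header of an incoming message and, if the sender's address is already known, refreshes the stored display name. Commercial tools do far more, such as reading signatures. The contact store and the sample message are assumptions made for illustration.

```python
from email import message_from_string
from email.utils import parseaddr


def update_contact_from_email(contacts, raw_email):
    """Refresh a known contact's name from the From header; return True if updated."""
    message = message_from_string(raw_email)
    display_name, address = parseaddr(message.get("From", ""))
    address = address.lower()
    if address in contacts and display_name:
        contacts[address]["name"] = display_name
        return True
    return False


# Hypothetical contact store keyed by lowercase email address.
contacts = {"ana@example.com": {"name": "Ana Lopez"}}

raw_email = (
    "From: Ana Lopez-Smith <ana@example.com>\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

updated = update_contact_from_email(contacts, raw_email)
print(updated, contacts)
```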

4. Implement Consistent Data Entry

Ensure that all employees are aware of company-wide data entry standards. For instance, each
customer record must include both a first and a last name.
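
A data entry standard like this can also be enforced in software so that it does not depend on memory alone. The sketch below assumes records are dictionaries with `first_name` and `last_name` fields (hypothetical field names) and reports which required fields are missing or blank.

```python
REQUIRED_FIELDS = ("first_name", "last_name")


def validate_customer(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # A field fails if it is absent, not text, or only whitespace.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing {field}")
    return problems


print(validate_customer({"first_name": "Ana", "last_name": "Lopez"}))
print(validate_customer({"first_name": "Ana", "last_name": " "}))
```

Rejecting or flagging a record at entry time is usually cheaper than cleaning it up after it has reached the central database.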
