
ASSIGNMENT-01

SUBMITTED BY- ASHUTOSH KUMAR

Q1-Describe big data system with respect to its application.

ANS-

Big data is a term that describes the large volume of data – both structured and unstructured –
that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important.
It’s what organizations do with the data that matters. Big data can be analyzed for insights that
lead to better decisions and strategic business moves.

Put another way, Big Data is a collection of data that is already huge in size and keeps
growing exponentially with time. Such data is so large and complex that traditional data
management tools cannot store or process it efficiently.

Examples of Big Data

Social Media

A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time.

Q2. Explain 3V with respect to big data.

ANS-

• Volume – The name Big Data itself refers to enormous size. The size of data plays a
crucial role in determining the value that can be derived from it, and whether a particular
data set can be considered Big Data at all depends on its volume. Hence, 'Volume' is one
characteristic that must be considered when dealing with Big Data.
• Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous
sources and the nature of data, both structured and unstructured. In earlier days,
spreadsheets and databases were the only sources of data considered by most applications.
Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc.
is also considered in analysis applications. This variety of unstructured data poses
challenges for storing, mining, and analyzing data.

• Velocity – The term 'velocity' refers to the speed at which data is generated. How fast
data is generated and processed to meet demand determines the real potential in the data.
Big Data velocity deals with the speed at which data flows in from sources such as business
processes, application logs, networks, social media sites, sensors, and mobile devices.
The flow of data is massive and continuous.

Q3. Describe Healthcare Analysis for big data.

ANS-

The application of big data analytics in healthcare has a lot of positive and also life-saving
outcomes. Big data refers to the vast quantities of information created by the digitization of
everything that gets consolidated and analyzed by specific technologies. Applied to healthcare, it
will use specific health data of a population (or of a particular individual) and potentially help to
prevent epidemics, cure disease, cut down costs, etc.

Big Data Applications in Healthcare

• Patient Predictions for Improved Staffing

For our first example of big data in healthcare, we will look at one classic problem that any
shift manager faces: how many people do I put on staff at any given time period? If you put on
too many workers, you run the risk of unnecessary labor costs adding up. With too few workers,
you can have poor customer service outcomes, which can be fatal for patients in that industry.

Big data is helping to solve this problem, at least at a few hospitals in Paris. A Forbes
article details how four hospitals which are part of the Assistance Publique-Hôpitaux de Paris
have been using data from a variety of sources to come up with daily and hourly predictions of
how many patients are expected to be at each hospital.

• Electronic Health Records (EHRs)

EHRs are the most widespread application of big data in medicine. Every patient has their own
digital record, which includes demographics, medical history, allergies, laboratory test
results, etc. Records are shared via secure information systems and are available to providers
from both the public and private sectors. Every record consists of one modifiable file, which
means that doctors can make changes over time with no paperwork and no danger of data
replication.

• Real-Time Alerting

Other examples of big data analytics in healthcare share one crucial functionality – real-time
alerting. In hospitals, Clinical Decision Support (CDS) software analyzes medical data on the
spot, providing health practitioners with advice as they make prescriptive decisions.

However, doctors want patients to stay away from hospitals to avoid costly in-house treatments.
Analytics, already trending as one of the business intelligence buzzwords in 2019, has the
potential to become part of a new strategy. Wearables will collect patients’ health data
continuously and send this data to the cloud.

• Prevent Opioid Abuse in the US

Our fourth example of big data in healthcare tackles a serious problem in the US. Here’s a
sobering fact: as of this year, overdoses from misused opioids have caused more accidental
deaths in the U.S. than road accidents, which were previously the most common cause of
accidental death.

Q4. Difference between Structured, Unstructured and Siloed data.

ANS-

• Structured

Any data that can be stored, accessed, and processed in a fixed format is termed 'structured'
data. Over time, computer science has achieved great success in developing techniques for
working with such data (where the format is well known in advance) and in deriving value from
it. However, issues arise when the size of such data grows to a huge extent, with typical sizes
in the range of multiple zettabytes.

Examples of Structured Data

An 'Employee' table in a database is an example of Structured Data
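
As a rough illustration of why a fixed format makes data easy to process, the sketch below builds a small 'Employee' table with Python's built-in sqlite3 module and queries it. The column names and rows are invented for this example and are not taken from any real system.

```python
import sqlite3

# In-memory database; the schema and the three rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department TEXT NOT NULL,"
    " salary REAL NOT NULL)"
)
rows = [
    (1, "Asha", "Finance", 65000.0),
    (2, "Ravi", "Admin", 50000.0),
    (3, "Meera", "Finance", 72000.0),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", rows)


def average_salary_by_department(connection):
    """Return {department: average salary} using a plain SQL aggregate."""
    cursor = connection.execute(
        "SELECT department, AVG(salary) FROM employee "
        "GROUP BY department ORDER BY department"
    )
    return dict(cursor.fetchall())


averages = average_salary_by_department(conn)
print(averages)
```

Because every row has the same known columns, a one-line query is enough to aggregate the data; no parsing or guessing about the format is needed.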

• Unstructured

Any data whose form or structure is unknown is classified as unstructured data. In addition to
its huge size, unstructured data poses multiple challenges when it is processed to derive value
from it. A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc. Nowadays, organizations have a wealth of
data available to them, but unfortunately they don't know how to derive value from it because
the data is in its raw, unstructured form.

Examples of Un-structured Data

The output returned by 'Google Search'


• Semi-structured

Semi-structured data can contain both forms of data. It appears structured, but it is not
defined by a fixed schema such as a table definition in a relational DBMS. An example of
semi-structured data is data represented in an XML file.

Examples Of Semi-structured Data

Personal data stored in an XML file-
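
A minimal sketch of what such a file might look like, and how it could be read, is given below. The records, tag names, and the helper `parse_people` are hypothetical and chosen only for illustration. Note that the tags give each record some structure, yet nothing forces every record to carry the same fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical personal records. The second record has no <age>,
# and the third adds an <email> the others lack.
XML_DATA = """
<people>
  <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  <rec><name>Seema R.</name><sex>Female</sex></rec>
  <rec><name>Jeremiah J.</name><age>26</age><email>jj@example.com</email></rec>
</people>
"""


def parse_people(xml_text):
    """Turn each <rec> element into a dict of whatever fields it contains."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: child.text for child in rec} for rec in root.findall("rec")]


people = parse_people(XML_DATA)
for person in people:
    print(sorted(person.keys()))
```

The parser recovers field names from the tags, but the set of fields differs from record to record, which is what separates this from a relational table.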

Q5. Discuss what launched the Big Data Era.

ANS-

• Signal changing timing
• Storing all of the world's music
• 5 billion mobile users in 2010
• 74 billion MasterCard users per year
• 1 million customer transactions per year at Walmart
• Accurate weather prediction

Q6. Describe 6V of big data.

ANS- The 6V are:-

• Volume: The ability to ingest, process, and store very large datasets. The data can be
generated by machines, networks, human interaction with systems, etc. The emergence of
highly scalable, low-cost data processing platforms helps to support such huge volumes.
The data is measured in petabytes or even exabytes.
• Velocity: The speed of data generation and the frequency of delivery. The data flow is
massive and continuous, which is valuable to researchers as well as to businesses that
make decisions for strategic competitive advantage and ROI. To process high-velocity
data, tools known as streaming analytics were introduced. Sampling data helps in
dealing with issues of volume and velocity.
• Variety: This refers to data from different sources and of different types, which may be
structured or unstructured. Unstructured data creates problems for storage, data mining,
and analysis. As data has grown, the number of data types has grown quickly as well.

• Variability: This refers to establishing whether the contextualizing structure of the data
stream is regular and dependable even in conditions of extreme unpredictability. It defines
the need to get meaningful data considering all possible circumstances.

• Veracity: This refers to the biases, noise, and abnormality in data. This is where we need
to identify the relevance of data and make sure data cleansing is done so that only
valuable data is stored. Verify that the data is suitable for its intended purpose and
usable within the analytic model. The data is to be tested against a set of defined criteria.

• Value: This refers to the purpose, scenario, or business outcome that the analytical
solution has to address. Does the data have value? If not, is it worth storing or
collecting? The analysis must also be performed in a way that meets ethical considerations.
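
The Velocity point above mentions that sampling helps with volume and velocity. One common way to sample from a stream of unknown length is reservoir sampling; the sketch below is an illustration of that general technique, not a method named in the answer. The function name and the `seed` parameter are assumptions for this example.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of any length.

    Only k items are held in memory at a time, so the stream itself
    never has to be stored.
    """
    rng = random.Random(seed)
    sample = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing item with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Usage: sample 5 readings from a simulated stream of one million values.
readings = reservoir_sample(range(1_000_000), k=5, seed=42)
print(readings)
```

The design choice here is to trade exactness for a bounded memory footprint: the analysis runs on k items regardless of how fast or how long the stream flows.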

Q8. Examine Big Data Contributions to marketing

ANS- Big data is more than just a buzzword. In fact, the huge amounts of data that we're
gathering could well change all areas of our life, from improving healthcare outcomes to helping
to manage traffic levels in metropolitan areas and, of course, making our marketing campaigns
far more powerful.

• More targeted advertising

As publishers gather more and more data about their visitors, they will be able to serve up
more and more relevant advertising. In the same way that Google and Facebook already offer
detailed targeting options, third-party vendors will offer the same array of choices. Imagine
being able to target people based on the articles they've read or based on a lookalike audience
of your ideal reader.

• Semantic search

Semantic search is the process of searching in natural language terms instead of in the short burst
of keywords that we're more used to. Big data and machine learning make it easier for search
engines to fully understand what a user is searching for, and smart marketers are beginning to
incorporate this into their site search functionality to improve the user experience for their
visitors.

• More relevant content

In the same way that Netflix can serve up personalized recommendations, publishers will be able
to serve up more relevant content to their visitors by tapping into their wealth of data to
determine which content people are most likely to enjoy. Even content marketers will be able to
get in on the act, and digital marketers will need to stop thinking of their blog as a static
site. In the same way that you get different results when you Google the same phrase in
different locations, your blog should look different depending on who is looking at it.

Q9. Explain astronomical Scale of Data definition


ANS - Astronomy is the study of the universe, and when studying the universe, we often deal
with unbelievable sizes and unfathomable distances. To get a better understanding of these
sizes and distances, we can put them to scale. Scale is the ratio between the actual object
and a model of that object. Some common examples of scaled objects are maps, toy model kits,
and statues. Maps and toy model kits are usually much smaller than the objects they represent,
whereas statues are normally larger than their subjects.

Example Sheet:-
Determine the Ratio

1. Select the fitness ball as the object that the Sun will be scaled to.

2. The diameter of the fitness ball is 24”, and the diameter of the Sun is 1,400,000 km.

3. Use the metric system.

4. 2.54 cm = 1”, so: $24\ \text{in} \times \frac{2.54\ \text{cm}}{1\ \text{in}} \approx 61\ \text{cm}$

5. The fitness ball needs to be in meters, so: $61\ \text{cm} \times \frac{1\ \text{m}}{100\ \text{cm}} = 0.61\ \text{m}$
And the Sun needs to be in meters as well, so: $1{,}400{,}000\ \text{km} \times \frac{1{,}000\ \text{m}}{1\ \text{km}} = 1{,}400{,}000{,}000\ \text{m}$

6. To get the ratio, we divide the diameter of the fitness ball by the diameter of the Sun:

$\frac{0.61\ \text{m}}{1{,}400{,}000{,}000\ \text{m}} \approx 4.4 \times 10^{-10}$

Solar System Object | Actual Diameter (km) | Actual Diameter (m) | Scaled Diameter (cm)
Sun                 | 1,400,000            | 1,400,000,000       | 61
Mercury             | 4,900                | 4,900,000           | 0.21
Venus               | 12,000               | 12,000,000          | 0.52
Earth               | 13,000               | 13,000,000          | 0.57
Mars                | 6,800                | 6,800,000           | 0.30
Jupiter             | 140,000              | 140,000,000         | 6.1
Uranus              | 51,000               | 51,000,000          | 2.2
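
The scaling procedure in the example sheet can be sketched in a few lines of Python. The diameters are the ones listed in the table; the function and variable names are chosen for this illustration. The sketch keeps the unrounded ball diameter (60.96 cm), so the ratio may differ slightly in the last digit from a hand calculation that rounds to 61 cm.

```python
INCH_TO_CM = 2.54

# Step 4-5: convert the 24-inch fitness ball and the Sun to meters.
ball_diameter_m = 24 * INCH_TO_CM / 100
sun_diameter_m = 1_400_000 * 1000

# Step 6: the scale ratio is model size divided by actual size.
scale_ratio = ball_diameter_m / sun_diameter_m

# Actual diameters in kilometers, as listed in the table.
DIAMETERS_KM = {
    "Sun": 1_400_000,
    "Mercury": 4_900,
    "Venus": 12_000,
    "Earth": 13_000,
    "Mars": 6_800,
    "Jupiter": 140_000,
    "Uranus": 51_000,
}


def scaled_diameter_cm(diameter_km, ratio):
    """Convert km to m, apply the scale ratio, then express the result in cm."""
    return diameter_km * 1000 * ratio * 100


scaled = {name: scaled_diameter_cm(d, scale_ratio) for name, d in DIAMETERS_KM.items()}
for name, value in scaled.items():
    print(f"{name:8s} {value:8.2f} cm")
```

Applying one ratio to every object is what keeps the model consistent: each planet shrinks by the same factor as the Sun.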

Q10. Discuss the limitations of Hadoop.

ANS- Hadoop is an open-source software framework for distributed storage and distributed
processing of extremely large data sets.

Big Limitations of Hadoop for Big Data Analytics


Issue with Small Files

Hadoop is not suited for small data. Because of its high-capacity design, the system cannot
efficiently support random reads of small files.
Small files are a major problem in HDFS. A small file is one that is significantly smaller than
the HDFS block size (default 128 MB). HDFS cannot handle a huge number of such files
efficiently, because it was designed to store large data sets as a small number of large files
rather than as a large number of small files.
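
A back-of-the-envelope sketch can make the small-files problem concrete. HDFS tracks every file and every block as an object in NameNode memory, so metadata grows with the number of files rather than with the amount of data. The figure of about 150 bytes per object used below is a commonly quoted rule of thumb and is an assumption here, not a measured value; the function names are also hypothetical.

```python
import math

BLOCK_SIZE_MB = 128       # default HDFS block size quoted above
BYTES_PER_OBJECT = 150    # ASSUMPTION: rough rule of thumb, not a measurement


def namenode_objects(num_files, file_size_mb):
    """Count metadata objects: one entry per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size_mb / BLOCK_SIZE_MB))
    return num_files * (1 + blocks_per_file)


def estimated_metadata_mb(num_files, file_size_mb):
    """Estimate NameNode memory (in MB) needed for the metadata alone."""
    return namenode_objects(num_files, file_size_mb) * BYTES_PER_OBJECT / 1_000_000


# The same 10,000,000 MB of data stored two ways.
small = estimated_metadata_mb(num_files=10_000_000, file_size_mb=1)
large = estimated_metadata_mb(num_files=78_125, file_size_mb=128)
print(f"10,000,000 files of 1 MB : ~{small:.0f} MB of metadata")
print(f"78,125 files of 128 MB   : ~{large:.1f} MB of metadata")
```

Under these assumptions the data volume is identical in both cases, but the many-small-files layout needs over a hundred times more NameNode memory.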

Slow Processing Speed

In Hadoop, MapReduce processes large data sets with a parallel, distributed algorithm. Two
tasks need to be performed, Map and Reduce, and MapReduce requires a lot of time to perform
them, which increases latency. Because data is distributed and processed across the cluster,
processing takes longer and overall speed drops.
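
To show what the Map and Reduce tasks do, here is a minimal single-process sketch of the MapReduce pattern applied to word counting. It is plain Python, not the Hadoop API, and the function names are chosen for this illustration. In a real Hadoop job the three phases run on different machines and intermediate results are written to disk between them, which is where much of the latency comes from.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data flows fast"])
print(counts)
```

Even this toy version has to finish the whole map step before any reduce can start, which hints at why the pattern suits batch jobs better than low-latency queries.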

No Real-time Data Processing

Apache Hadoop is designed for batch processing: it takes a huge amount of data as input,
processes it, and produces the result. Although batch processing is very efficient for high
volumes of data, the output can be delayed significantly depending on the size of the data
being processed and the computational power of the system. Hadoop is therefore not suitable
for real-time data processing.

No Delta Iteration

Hadoop is not efficient for iterative processing because it does not support cyclic data flow,
that is, a chain of stages in which the output of each stage is the input to the next stage.