
Data in motion:

Data in motion is data that is analyzed on the fly, as it streams in, without first being stored. Some big data sources feed data unceasingly in real time; systems that analyze this kind of data include IBM Streams. Data at rest, by contrast, is a snapshot of information that has been collected and stored, ready to be analyzed for decision-making.
Data pours into the organization from every conceivable direction: from operational and
transactional systems; from scanners, sensors and smart meters; from inbound and outbound
customer contact points; from mobile media and the Web.
Those streams of data contain a wealth of potentially valuable insight, if you can capture and analyze them.
But how do you manage such a torrent of data? Where would you store it? How long would it take to make sense of it? Traditional approaches, which apply analytics after data is stored, may provide insights too late for many purposes, and most real-time-enabled applications can't deal with this much constantly flowing data.
Here's an idea: Analyze it on the fly. Find what's meaningful, grab only what you need, and get instant insights so you can react immediately and make the best decisions as data is flowing in.
That's the promise of event stream processing. Event stream processing continuously analyzes data as it flows into the organization, and then triggers an action based on the information flow. It is a form of complex event processing that empowers you (or an automated system) to spot patterns and make decisions faster than ever.

Three steps for streaming data


Managing data in motion is different from managing data at rest. Event stream processing relies on three principal capabilities (aggregation, correlation and temporal analysis) to deal with data in motion.
1. Aggregation. Let's say you wanted to detect gift card fraud: Tell me when the value of gift card redemptions at any point-of-sale (POS) machine is more than $2,000 in an hour. Event stream processing can continuously calculate metrics across sliding time windows of moving data to understand real-time trends. This kind of continuous aggregation would be difficult with traditional tools. With the SAS Event Stream Processing Engine, it's built in.

2. Correlation. Connect to multiple streams of data in motion and, over a period of time that could be seconds or days, identify that condition A was followed by B, then C. For example, if we connect to streams of gift card redemptions from 1,000 POS terminals, event stream processing could continuously identify conditions that compare POS terminals to each other, such as: Generate an alert if gift card redemptions in one store are more than 150 percent of the average of other stores.

3. Temporal analysis. Event stream processing is designed to treat time as a primary computing element, which is critical for scenarios where the rate and momentum of change matter. For example, sudden surges of activity can be clues to potential fraud. Event stream processing could detect such surges as they occur, such as: If the number of gift card sales and card activations within four hours is greater than the average number of daily activations at that store in the previous week, stop approving activations. Unlike computing models designed to summarize and roll up historical data, event stream processing asks and answers these questions on data as it changes.

These three capabilities set event stream processing apart from other approaches by revealing what's happening now, not just what happened in the past, so you can take action immediately. A minimal code sketch illustrating all three follows.
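As a concrete illustration, here is a minimal, engine-agnostic sketch of the three checks in plain Python. It is not the SAS Event Stream Processing API; the event fields (ts, kind, terminal, store, amount) and the alert callback are assumptions made for this example. Each event is evaluated against in-memory sliding windows the moment it arrives, so every rule runs on data in motion rather than on a stored table.

```python
# Sketch of the three stream-processing capabilities: aggregation, correlation,
# temporal analysis. Plain Python stand-in, not a specific engine's API.
from collections import defaultdict, deque

HOUR = 3600
DAY = 24 * HOUR

terminal_window = defaultdict(deque)   # terminal -> (ts, amount) pairs from the last hour
store_window = defaultdict(deque)      # store -> (ts, amount) pairs from the last hour
activations_4h = defaultdict(deque)    # store -> activation events from the last 4 hours
activations_7d = defaultdict(deque)    # store -> activation events from the last 7 days

def _trim(window, now, span):
    """Evict events that have slid out of the time window."""
    while window and now - window[0][0] > span:
        window.popleft()

def on_event(ev, alert):
    """Analyze one event the moment it arrives and trigger alerts immediately."""
    now = ev["ts"]

    if ev["kind"] == "redemption":
        # 1. Aggregation: more than $2,000 redeemed at one POS terminal in an hour.
        w = terminal_window[ev["terminal"]]
        w.append((now, ev["amount"]))
        _trim(w, now, HOUR)
        if sum(a for _, a in w) > 2000:
            alert(f"terminal {ev['terminal']}: over $2,000 redeemed within one hour")

        # 2. Correlation: this store's hourly redemptions vs. the average of the others.
        sw = store_window[ev["store"]]
        sw.append((now, ev["amount"]))
        for win in store_window.values():
            _trim(win, now, HOUR)
        totals = {s: sum(a for _, a in win) for s, win in store_window.items()}
        others = [t for s, t in totals.items() if s != ev["store"]]
        if others and totals[ev["store"]] > 1.5 * (sum(others) / len(others)):
            alert(f"store {ev['store']}: redemptions exceed 150% of the peer average")

    elif ev["kind"] == "activation":
        # 3. Temporal analysis: activations in the last 4 hours vs. the store's
        #    average daily activations over the previous week.
        recent, week = activations_4h[ev["store"]], activations_7d[ev["store"]]
        recent.append((now, 1))
        week.append((now, 1))
        _trim(recent, now, 4 * HOUR)
        _trim(week, now, 7 * DAY)
        if len(recent) > len(week) / 7:
            alert(f"store {ev['store']}: activation surge, stop approving activations")
```

In a real engine the windows and patterns would be declared rather than hand-coded, but the per-event evaluation model is the same: every rule fires as the data flows in.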

Big Data Infrastructure Decisions: Data in Motion vs. Data at Rest


Big Data is playing an increasingly significant role in business success as organizations
strive to generate, process and analyze massive amounts of information in order to make
better business decisions. But how do organizations ensure that they derive value from all
this data? To extract meaningful insights, data must be approached in entirely different ways
at different points in its lifecycle -- from creation to ingest, to comparative analysis based on
multiple sources of data, to decision making and, finally, the action taken from that decision.
The "stage" of Big Data is critical for determining the right infrastructure to host the
applications that are collecting, managing and analyzing the data.
In general, data can be broken down into two basic categories - data at rest and data in
motion - each with different infrastructure requirements based on availability, processing
power and performance. The optimal type of infrastructure depends on the category and the
business objectives for the data.
Data at rest refers to information collected from various sources and analyzed after the data-creating events have occurred. The data analysis occurs separately and distinctly from any action taken on the conclusions of that analysis.
For example, a retailer analyzes a previous month's sales data and then uses it to make
strategic decisions about the present month's business activities. The action takes place well
after the data-creating event. The data scrutinized may be spread across multiple collection points covering inventory, sales prices, sales made, regions and other pertinent
information. This data may drive the retailer to create marketing campaigns, send
customized coupons, increase/decrease/move inventory, or adjust pricing. The data provides
value in enticing customers to return and makes a long-term positive impact on the retailer's
ability to meet the needs of customers based on region.
For data at rest, a batch processing method is typically utilized. In this case, there's no
pressing need for "always on" infrastructure, but there is a need for flexibility to support
extremely large, and often unstructured, datasets. From a cost standpoint, public cloud can
be an ideal infrastructure choice in this scenario because virtual machines can easily be
spun up as needed to analyze the data and spun down when finished.
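To make the contrast concrete, here is a minimal sketch of that batch style of analysis applied to the retailer example: the data is already at rest in a file, and the job can run on short-lived cloud machines and shut down when finished. The file name and column names are illustrative assumptions.

```python
# Batch analysis of data at rest: aggregate last month's stored sales records.
import csv
from collections import defaultdict

def summarize_month(path="sales_last_month.csv"):
    """Summarize stored sales by region and product for after-the-fact decisions."""
    totals = defaultdict(float)   # (region, product) -> revenue
    units = defaultdict(int)      # (region, product) -> units sold
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["region"], row["product"])
            totals[key] += float(row["price"]) * int(row["quantity"])
            units[key] += int(row["quantity"])
    # The summary feeds later decisions (campaigns, coupons, pricing, inventory
    # moves); nothing has to happen while the sale itself is in progress.
    for (region, product), revenue in sorted(totals.items()):
        print(f"{region:10} {product:15} units={units[(region, product)]:6} revenue=${revenue:,.2f}")

if __name__ == "__main__":
    summarize_month()
```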
The collection process for data in motion is similar to that of data at rest; however, the
difference lies in the analytics. In this case, the analytics occur in real time as the event
takes place. An example here would be a theme park that uses wristbands to collect data about its guests. These wristbands would constantly record data about guest activities,
and the park could use this information to personalize guest visits with special surprises or
suggested activities based on guest behavior. The business is able to customize the guest
experience, in real time, during the visit. Organizations have a tremendous opportunity to
improve business results in these scenarios.
For data in motion, a bare-metal cloud environment may be a preferable infrastructure
choice. Bare-metal cloud involves the use of dedicated servers that offer cloud-like features
without the use of virtualization. In this scenario, organizations can utilize a real-time
processing method in which high-performance compute power is always online and
available, and is also capable of scaling out at a moment's notice.
Until recently, many organizations may have assumed public cloud to be the natural choice
for this type of workload. However, as more companies host Big Data applications in the
public cloud, they're confronting its performance limitations, particularly at scale.
Bare-metal technologies can enable the same self-service, on-demand scalability and pay-as-you-go pricing as a traditional virtualized public cloud. Bare-metal cloud, however,
eliminates the resource constraints of multi-tenancy, delivering the performance levels of
dedicated servers, making it a better choice for processing large volumes of high-velocity
data in real time.
Latency is also a key consideration for data-in-motion workloads, because a lag in
processing can quickly result in a missed business opportunity. As a result, the integrity of
network connectivity should go hand-in-hand with infrastructure decisions. A fast data
application can only move as quickly as the network architecture that's supporting it, so
organizations should look for multi-homed or route-optimized IP connectivity that's able to
navigate around Internet latency and outages, ultimately improving the availability and
performance of real-time data workloads.
Viewing data through the lens of one of these two general categories - at rest or in motion
- can help organizations determine the ideal data processing method and optimal
infrastructure required to gain actionable insights and extract real value from Big Data.

Real-Time Analytics
Real-time analytics refers to analytics that can be accessed as soon as the underlying data comes into a system. In general, the term analytics describes data patterns that provide meaning to a business or other entity; analysts collect valuable information by sorting through and analyzing that data.
While the term real-time analytics implies practically instant access to and use of analytical data, some experts give a more concrete time frame for what constitutes real-time analytics, such as suggesting that it involves data used within one minute of being entered into the system. A common example of real-time analytics is a system where managers or others can remotely view order information that's updated as soon as an order is placed or processed. By staying connected to an IT architecture, these users can see the orders represented as they happen, tracking orders in real time.
Other examples of real-time analytics would be any continually updated or refreshed results
about user events by customer, such as page views, website navigation, shopping cart use,
or any other kind of online or digital activity. These kinds of data can be extremely important
to businesses that want to conduct dynamic analysis and reporting in order to quickly
respond to trends in user behaviour.
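A hedged sketch of what such continuously refreshed metrics can look like: each incoming user event immediately updates in-memory counters that a report or dashboard can read at any moment. The event fields (customer, event_type, page) are assumptions made for illustration, not any particular product's schema.

```python
# Continuously refreshed user-event metrics: updated per event, readable any time.
from collections import Counter, defaultdict

page_views = Counter()                     # page -> running view count
events_by_customer = defaultdict(Counter)  # customer -> event_type -> count

def on_user_event(event):
    """Update live metrics the moment an event arrives; nothing is batched."""
    events_by_customer[event["customer"]][event["event_type"]] += 1
    if event["event_type"] == "page_view":
        page_views[event["page"]] += 1

def dashboard_snapshot(top_n=5):
    """What a real-time report would show right now."""
    return {
        "top_pages": page_views.most_common(top_n),
        "active_customers": len(events_by_customer),
    }

# Usage: feed events as they occur, then read the snapshot whenever needed.
on_user_event({"customer": "c42", "event_type": "page_view", "page": "/pricing"})
on_user_event({"customer": "c42", "event_type": "add_to_cart", "page": "/pricing"})
print(dashboard_snapshot())
```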
The ability to extract information from operational data in real time is critical for a modern, agile enterprise. The faster you can harness insights from data, the greater your advantage in driving revenue, reducing costs, and increasing efficiency. A modern architecture for real-time big data combines Hadoop and NoSQL. Hadoop is engineered for big data analytics, but it's not real time. NoSQL is engineered for real-time big data, but it's operational rather than analytical. NoSQL together with Hadoop is the key to real-time big data.
http://www.infoq.com/articles/stream-processing-hadoop
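A rough sketch of that combination, using stand-in components: each event is written to an operational store for real-time reads and also appended to a log destined for batch analytics. The in-memory dict stands in for a NoSQL store and the local append-only file stands in for HDFS; neither reflects a specific product's API.

```python
# Dual-path write: operational (NoSQL-style) for real time, log (Hadoop-style) for batch.
import json
import time

operational_store = {}                  # stand-in for a NoSQL store (real-time reads/writes)
BATCH_LOG = "events_for_batch.jsonl"    # stand-in for an HDFS-bound event log

def record_event(key, event):
    """Serve the event operationally now, and keep it for analytics later."""
    event = {**event, "ts": time.time()}
    operational_store[key] = event        # real-time path: immediately queryable
    with open(BATCH_LOG, "a") as log:     # analytical path: crunched offline later
        log.write(json.dumps({"key": key, **event}) + "\n")

def lookup(key):
    """Operational, low-latency read (the NoSQL side of the architecture)."""
    return operational_store.get(key)

# Usage: writes are visible immediately; the log is processed by the
# batch/analytics layer on its own schedule.
record_event("order-1001", {"customer": "c42", "status": "placed"})
print(lookup("order-1001"))
```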
