Hadoop is a platform for storing and processing large amounts of data, and it has widespread applications. In this ecosystem, you store data in one of the storage managers and then use a processing framework to run computations over the stored data. In the early days, MapReduce was the only processing framework; at present, there are many open-source tools in the Hadoop ecosystem that help process data in Hadoop.

Categories for processing frameworks


Hadoop processing frameworks can be divided into the following six categories:

1. Abstraction frameworks
These frameworks let users process data at a higher level of abstraction. They are either API based, as with Crunch and Cascading, or based on a custom DSL, such as Pig. An abstraction framework is generally built on top of a general-purpose processing framework.
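As a rough illustration of the style of such frameworks (a minimal sketch in plain Python; the `Pipeline` class below is hypothetical and is not the real Crunch, Cascading, or Pig API), a higher-level abstraction lets you chain operations instead of wiring mappers and reducers by hand:

```python
# Hypothetical mini "pipeline" abstraction, illustrating the flavor of
# frameworks like Crunch or Cascading (not their real APIs).
class Pipeline:
    def __init__(self, data):
        self.data = list(data)

    def map(self, fn):
        # Apply fn to every element, returning a new pipeline stage.
        return Pipeline(fn(x) for x in self.data)

    def filter(self, pred):
        # Keep only elements matching the predicate.
        return Pipeline(x for x in self.data if pred(x))

    def group_count(self):
        # Terminal step: count occurrences of each element.
        counts = {}
        for x in self.data:
            counts[x] = counts.get(x, 0) + 1
        return counts

words = Pipeline(["hadoop", "spark", "hadoop", "pig"])
counts = words.map(str.upper).group_count()
print(counts)  # {'HADOOP': 2, 'SPARK': 1, 'PIG': 1}
```

The point is that the user describes *what* to do as a chain of high-level operations; the underlying general-purpose framework decides how to execute it.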

2. General-purpose processing frameworks

These frameworks allow users to process data in Hadoop using a low-level API. Although they are all batch frameworks, they follow different programming models; examples are MapReduce and Spark.
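To make the MapReduce programming model concrete, here is the classic word-count job simulated in plain single-machine Python (a real job would distribute the map, shuffle, and reduce phases across a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts for each word.
    return {word: sum(ones) for word, ones in grouped.items()}

docs = ["big data on hadoop", "hadoop stores big data"]
result = reduce_phase(shuffle(map_phase(docs)))
print(result)  # {'big': 2, 'data': 2, 'on': 1, 'hadoop': 2, 'stores': 1}
```

Even this tiny example shows why the low-level API is verbose: three explicit phases for what an abstraction or SQL framework expresses in one line.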

3. SQL frameworks
These frameworks enable querying data in Hadoop using SQL. They can be built on top of a general-purpose framework, such as Hive, or as standalone, special-purpose frameworks, such as Impala. Strictly speaking, SQL frameworks are also abstraction frameworks, but they are common enough to deserve their own category.
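To illustrate the declarative idea behind these frameworks (using Python's built-in sqlite3 as a single-machine stand-in; Hive and Impala are distributed engines with their own SQL dialects, not shown here), one SQL statement replaces a hand-written map/shuffle/reduce job:

```python
import sqlite3

# A small in-memory table queried with SQL; Hive and Impala apply the
# same declarative idea to data stored across a Hadoop cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("alice", "home"), ("bob", "home"), ("alice", "cart")],
)

# One declarative GROUP BY instead of explicit map, shuffle, and reduce.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM clicks GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('cart', 1), ('home', 2)]
```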

Benefits of using Abstraction or SQL frameworks


You save time by not having to implement common processing tasks using the low-level APIs of general-purpose frameworks.

Coding directly against a general-purpose framework means you would have to rewrite your jobs if you decided to change frameworks. An abstraction or SQL framework built on a generic framework abstracts the underlying engine away.

Running a query on a special-purpose processing framework can be much faster than running the equivalent MapReduce job, because such frameworks use a completely different execution model, built for executing fast SQL queries.

4. Machine learning frameworks
These frameworks enable machine learning analysis of data in Hadoop. They can be built on top of a general-purpose framework, as with MLlib (a machine learning library for Spark), or as standalone, special-purpose frameworks, such as Oryx. Commonly used machine learning frameworks are Mahout, MLlib, Oryx, and H2O.
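As a toy example of the kind of computation these libraries distribute (a single-machine ordinary-least-squares fit in plain Python; this is not MLlib's API, which runs the same class of fit over data partitioned across a cluster):

```python
# Toy single-machine linear regression: fit y = slope*x + intercept
# by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept)  # 2.0 0.0
```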

5. Graph processing frameworks

These frameworks enable graph processing capabilities on Hadoop. They can be built on top of a general-purpose framework, such as Giraph, or as special-purpose frameworks, such as GraphLab.
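Giraph follows a vertex-centric ("think like a vertex") model: in each superstep, every vertex processes its incoming messages and sends messages to its neighbors. A minimal single-machine sketch of that model (plain Python, not Giraph's API), computing hop distances from vertex "A" on a tiny graph:

```python
# Vertex-centric superstep loop in the style of Pregel/Giraph:
# each superstep delivers messages, updates vertex state, and
# emits messages for the next superstep until none remain.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
dist = {v: None for v in graph}   # vertex state: best-known distance
messages = {"A": [0]}             # initial message to the source vertex

while messages:
    next_messages = {}
    for vertex, incoming in messages.items():
        best = min(incoming)
        if dist[vertex] is None or best < dist[vertex]:
            dist[vertex] = best
            # Propagate an improved distance to all neighbors.
            for neighbor in graph[vertex]:
                next_messages.setdefault(neighbor, []).append(best + 1)
    messages = next_messages

print(dist)  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```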

6. Real-time/streaming frameworks
These frameworks provide near-real-time processing for data in the Hadoop ecosystem. They can be built on top of a generic framework, as with Spark Streaming, or as standalone, special-purpose frameworks, such as Storm. Spark Streaming is a library built on top of Spark for micro-batch streaming analysis. Apache Storm is a distributed, real-time computation engine, with Trident serving as an abstraction layer on top of it.
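The micro-batch idea behind Spark Streaming can be sketched in a few lines of plain Python (a conceptual sketch, not Spark's API): incoming events are grouped into small batches, each batch is processed as an ordinary batch computation, and the results are merged into running state.

```python
def micro_batches(stream, batch_size):
    # Group a stream of events into fixed-size micro-batches.
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

running_counts = {}
events = ["click", "view", "click", "click", "view"]
for batch in micro_batches(events, batch_size=2):
    # Each micro-batch is processed like a small batch job,
    # and its results are merged into the running state.
    for event in batch:
        running_counts[event] = running_counts.get(event, 0) + 1

print(running_counts)  # {'click': 3, 'view': 2}
```

A true streaming engine like Storm instead processes each event as it arrives, which is why it can achieve lower latency than the micro-batch approach.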
