
Batch processing engine (Framework design)

Our batch engine can read data from different sources, e.g. CSV, XML, flat files, JSON, databases, etc.
Batch Engine is all about processing a data source in batch mode.
There are three main steps to execute a batch job:
a) Configure a job.
b) Run the job.
c) Get the execution report and log file.
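The three steps can be sketched as follows. This is a minimal illustration in Python; the names (`Job`, `BatchEngine`, `Report`) are assumptions for the sake of the example, not the framework's real API.

```python
# Hypothetical sketch of configure -> run -> report. All class names
# are illustrative, not the actual Batch Engine API.

class Report:
    def __init__(self, status, processed):
        self.status = status        # final status of the job
        self.processed = processed  # number of records processed

class Job:
    def __init__(self, name, records):
        self.name = name
        self.records = records      # the configured data source

class BatchEngine:
    def run(self, job):
        processed = 0
        for record in job.records:  # process each record in sequence
            processed += 1
        return Report("COMPLETED", processed)

# a) configure a job
job = Job("import-csv", records=["r1", "r2", "r3"])
# b) run the job
report = BatchEngine().run(job)
# c) inspect the execution report
print(report.status, report.processed)
```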

Batch Abstraction

Batch Engine streams data record by record from the data source. Depending on the data source type, a record can be a line in a flat file, an XML file, a row in a database table, a file in a folder, etc.
Each record contains a header and a payload.

[Diagram: a record = Header (No, Source, etc.) + Payload (raw data)]

Header: The header contains metadata about the record, such as the data source from which the record has been read, its physical number, its creation date, etc.
Payload: The payload is the raw content of the record. It is generic, since it depends on the data source type; the payload can be of any type, so it can represent any kind of input data.

A set of records in Batch Engine is represented by the Batch class. Batches are processed as a unit.
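The record abstraction above can be sketched like this. The field names (`number`, `source`) are assumptions chosen to match the metadata mentioned in the text:

```python
from dataclasses import dataclass
from typing import Any, List

# Sketch of the record abstraction: a header carrying metadata and a
# generic payload. Field names are illustrative assumptions.

@dataclass
class Header:
    number: int          # physical number of the record in the source
    source: str          # data source the record was read from

@dataclass
class Record:
    header: Header
    payload: Any         # raw content; its type depends on the data source

@dataclass
class Batch:
    records: List[Record]  # a batch is processed as a unit

batch = Batch([Record(Header(1, "input.csv"), "id,name"),
               Record(Header(2, "input.csv"), "1,alice")])
print(len(batch.records))
```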

Processing workflow

Batch Engine submits each record (or batch of records) to a processing pipeline composed
of a chain of processors.
The engine reads records (or batches) one by one, in sequence, from the data source. Once the whole processing pipeline has been applied to a record, the engine moves on to the next one.
When all records have been processed, the engine generates the execution report and stops
the job.
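The workflow above can be condensed into a small loop. This is a simplified sketch, assuming the reader is exhausted when it yields nothing more and that each processor is a plain function:

```python
# Sketch of the engine's main loop: read records one by one, apply the
# whole chain of processors to each, then produce an execution report.

def run_engine(reader, processors):
    processed, items = 0, []
    while True:
        record = next(reader, None)   # read the next record from the source
        if record is None:            # no more records: stop the job
            break
        for process in processors:    # apply the full pipeline to the record
            record = process(record)
        processed += 1
        items.append(record)
    # all records processed: generate the execution report
    return {"status": "COMPLETED", "count": processed, "items": items}

reader = iter(["a", "b", "c"])
report = run_engine(reader, [str.upper])
print(report["count"], report["items"])
```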

In batch processing we have ItemReaders reading items, one after the other, always delivering the next item. When there are no more items, the reader delivers null. Then we have optional ItemProcessors, each taking one item and delivering one item that may be of another type. Finally we have ItemWriters taking a list of items and writing them somewhere.
The batch is separated into chunks, and each chunk runs in its own transaction. The chunk size is determined by a CompletionPolicy: when the CompletionPolicy is fulfilled, Batch Engine stops reading items and starts processing.

If a RuntimeException is thrown in one of the participating components, the transaction for the chunk is rolled back and the batch fails.
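The chunk-oriented flow can be sketched as below. A fixed chunk size stands in for the CompletionPolicy, and the transaction is simulated: a failure while writing fails the whole batch. All names are illustrative:

```python
# Sketch of chunk-oriented processing: the reader delivers items until it
# returns None, a size-based completion policy closes each chunk, and each
# chunk is written in its own (simulated) transaction. An exception in a
# participating component rolls the chunk back and fails the batch.

def run_chunked(read, process, write, chunk_size):
    while True:
        chunk = []
        while len(chunk) < chunk_size:   # simple CompletionPolicy: fixed size
            item = read()
            if item is None:             # reader exhausted
                break
            chunk.append(process(item))  # optional per-item processor
        if not chunk:
            return "COMPLETED"           # nothing left to write: job done
        try:
            write(chunk)                 # one transaction per chunk
        except RuntimeError:
            return "FAILED"              # chunk rolled back, batch fails

items = iter([1, 2, 3, 4, 5])
written = []
status = run_chunked(lambda: next(items, None),
                     lambda x: x * 10,
                     written.append,
                     chunk_size=2)
print(status, written)
```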

Batch Engine has the ability to restart a failed batch. A batch job instance is identified by its JobParameters, so starting a job with parameters that have already been used in a prior execution automatically triggers a restart if that first execution failed. If the first execution completed successfully, the second job execution is rejected.
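The restart rule can be modelled in a few lines. The `JobLauncher` class and its bookkeeping are hypothetical; only the decision logic (restart a failed instance, reject a completed one) comes from the text:

```python
# Sketch of restart semantics: a job instance is identified by its
# parameters. Re-launching a failed instance restarts it; re-launching
# a completed instance is rejected. The API here is an assumption.

class JobLauncher:
    def __init__(self):
        self.history = {}   # job parameters -> status of the last execution

    def launch(self, params, succeeds):
        key = tuple(sorted(params.items()))   # JobParameters identify the instance
        last = self.history.get(key)
        if last == "COMPLETED":
            return "REJECTED"                 # same instance already completed
        action = "RESTART" if last == "FAILED" else "START"
        self.history[key] = "COMPLETED" if succeeds else "FAILED"
        return action

launcher = JobLauncher()
a1 = launcher.launch({"file": "a.csv"}, succeeds=False)  # first run fails
a2 = launcher.launch({"file": "a.csv"}, succeeds=True)   # same params: restart
a3 = launcher.launch({"file": "a.csv"}, succeeds=True)   # already completed
print(a1, a2, a3)
```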
Besides ItemReaders, ItemProcessors and ItemWriters, listeners are a second way to add our business logic to the batch processing. Listeners listen for certain events and are executed when the appropriate event fires. There are several listener types in Batch Engine; the most important ones are the following:
The JobExecutionListener has two methods, beforeJob and afterJob. Both of them are executed outside of the chunk's transaction.
The StepExecutionListener has two methods, beforeStep and afterStep. Both of them are executed outside of the chunk's transaction.
The ChunkListener has two methods, beforeChunk and afterChunk. The first one is executed inside the chunk's transaction, the second one outside of it.
The ItemReadListener has three methods, beforeRead, afterRead and onReadError. All of them are executed inside the chunk's transaction.
The ItemProcessListener has three methods, beforeProcess, afterProcess and onProcessError. All of them are executed inside the chunk's transaction.
The ItemWriteListener has three methods, beforeWrite, afterWrite and onWriteError. All of them are executed inside the chunk's transaction.
The SkipListener has three methods, onSkipInRead, onSkipInProcess and onSkipInWrite. All of them are executed inside the chunk's transaction.
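The callback order for a job- and chunk-level listener can be illustrated as follows. The wiring is a simplification; the callback names follow the text (rendered in Python's snake_case):

```python
# Sketch of the listener mechanism: callbacks fire around the job as a
# whole and around each chunk. The run_job driver is illustrative only.

class LoggingListener:
    def __init__(self):
        self.events = []
    def before_job(self):   self.events.append("beforeJob")
    def after_job(self):    self.events.append("afterJob")
    def before_chunk(self): self.events.append("beforeChunk")
    def after_chunk(self):  self.events.append("afterChunk")

def run_job(chunks, listener):
    listener.before_job()        # outside any chunk transaction
    for chunk in chunks:
        listener.before_chunk()  # inside the chunk's transaction
        # ... process and write the chunk here ...
        listener.after_chunk()   # outside the chunk's transaction
    listener.after_job()         # outside any chunk transaction

listener = LoggingListener()
run_job([[1, 2], [3]], listener)
print(listener.events)
```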
