
DataStage is a powerful ETL tool with a lot of built-in stages and routines that provide most of the functionality you need. For the things DataStage EE cannot do out of the box, you can write parallel routines in C++.
This primer shows you how to create a parallel routine in a few minutes, whether or not you are a C/C++ programmer. To write really good code, though, you may need to learn some C++ programming.
"Starting C programming with Linux" is a good resource to start with. Before we begin, a few points to note:
Parallel routines are C++ components built and compiled outside DataStage. Note: they must be compiled as C++ components, not C.
The C++ program must not contain main(). Compile it with the compiler options specified in APT_COMPILEOPT (found among the Administrator's parameter options) to create an object file (*.o).
This produces runtime library code: compiled code without main(), i.e. not a self-contained executable file.
The compiler and compiler options can be found in
DataStage Administrator > Properties > Environment > Parallel > Compiler.
Ex: compiler = g++, compiler options = -O -fPIC -Wno-deprecated -c
Compile command syntax: {compiler} {compiler options} {filename with extension}
Ex: g++ -O -fPIC -Wno-deprecated -c {filename with extension}
Here's the typical sequence of steps for creating a DataStage parallel routine:

Create -> Compile -> Link -> Execute

1) Create

Create a C++ program with main() and test it. If it works, remove main().
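Instead of deleting main() by hand, one option is to wrap the test driver in a preprocessor guard, so the same file builds either as a standalone test program or as a main-less object for DataStage. This is a swapped-in convenience rather than part of the procedure above, and the routine DaysInMonth and the macro ROUTINE_STANDALONE_TEST are hypothetical names used only for illustration.

```cpp
#include <cstdio>

// Hypothetical routine: number of days in a month (month 1-12), or 0 for an invalid month.
int DaysInMonth(int year, int month)
{
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    if (month < 1 || month > 12)
        return 0;
    // Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400
    bool leap = (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    if (month == 2 && leap)
        return 29;
    return days[month - 1];
}

#ifdef ROUTINE_STANDALONE_TEST
// Test driver (hypothetical macro name): compiled only when ROUTINE_STANDALONE_TEST
// is defined. Without the macro, the object file contains no main().
int main()
{
    std::printf("2024-02: %d\n", DaysInMonth(2024, 2));
    std::printf("1900-02: %d\n", DaysInMonth(1900, 2));
    std::printf("2000-02: %d\n", DaysInMonth(2000, 2));
    return 0;
}
#endif
```

To test, you might build with g++ -DROUTINE_STANDALONE_TEST -o daystest days.cpp and run ./daystest; for DataStage, compile the same file with the usual APT_COMPILEOPT options (including -c) and no -D flag, so main() is left out.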





2) Compile
Compile the program using the compiler options specified in APT_COMPILEOPT to create an object (*.o) file, and put the object file in a directory of your choice; you will need its full path in the Link step.
Note: the compiler and compiler options can be found in
"DataStage Administrator > Properties > Environment > Parallel > Compiler".
3) Link

Link the object (*.o) file to a DataStage parallel routine by making these entries
in the General tab:
Routine Name: {Parallel Routine Name}
Type: External Function
Object Type: Object / Library
External subroutine name: {Function name specified inside your C++ program}
Library Path: {Directory chosen in the Compile step}/{object (*.o) file name}
Also specify the Return Type, and if the routine takes input parameters, specify them
in the Arguments tab.
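To make the General and Arguments tab entries concrete, here is a sketch of a routine that takes input parameters. SafeDivide is a hypothetical function; with it, the External subroutine name would be SafeDivide, the Return Type a double-precision floating-point type, and the Arguments tab would list numerator, denominator and fallback with matching types. The exact DataStage type names, and the assumption that arguments are passed in the order they are listed, should be checked against your version.

```cpp
#include <cmath>

// Hypothetical parallel routine with three input parameters.
// Returns numerator / denominator, or fallback when the division is not meaningful.
// Assumption: DataStage passes arguments in the order listed in the Arguments tab.
double SafeDivide(double numerator, double denominator, double fallback)
{
    // Treat a zero or non-finite denominator as "no result"
    if (denominator == 0.0 || !std::isfinite(denominator))
        return fallback;
    return numerator / denominator;
}
```

In a Transformer derivation you would then call the routine with three numeric column values, much as the example later calls a routine with no arguments.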
4) Execute
Your parallel routine is now available inside your jobs. Call it in a job, then compile
and run the job.

Step-by-step Example: Creating a shared object

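1) Create the program
Create a text file with a .cpp extension (Ex: OBJTEST.cpp) containing a C++ program with main().
Test it, then replace main() with the routine function, as in this example:
Ex: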
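#include <stdlib.h>
#include <stdio.h>

// Routine called by DataStage; its name must match the
// "External subroutine name" entered in the routine definition.
char* ObjTestOne()
{
    // A static array outlives the call, so returning a pointer to it is safe.
    // (Assigning a string literal to a plain char* is invalid since C++11.)
    static char OutStr[] = "Hello World - Object Testing";
    return OutStr;
}
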
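The example routine takes no arguments. As a hedged sketch of the next step, here is a variant that takes a string argument, which you would declare in the Arguments tab (with Return Type char*). ObjTestTwo is a hypothetical name, and returning a pointer to a static buffer is an assumption: it avoids allocating memory on every row, but each call overwrites the previous result, so check how your DataStage version expects char* return values to be owned before relying on it.

```cpp
#include <cstdio>

// Hypothetical routine: returns "Hello <name> - Object Testing".
// The parameter is char* rather than const char*: the routine is compiled as C++,
// so parameter types become part of the linked symbol name and should match
// the routine definition (assumed here to declare char*).
char* ObjTestTwo(char* name)
{
    // Static buffer: outlives the call, but every call overwrites it,
    // and output longer than 255 characters is truncated by snprintf.
    static char OutStr[256];
    std::snprintf(OutStr, sizeof OutStr, "Hello %s - Object Testing",
                  name ? name : "");  // treat a null input as an empty name
    return OutStr;
}
```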
2) Compile the program
Get the compiler and compiler options from:
DataStage Administrator > Properties > Environment > Parallel > Compiler
Ex: compiler = g++, compiler options = -O -fPIC -Wno-deprecated -c
Compile command syntax: {compiler} {compiler options} {filename with extension}
Ex: g++ -O -fPIC -Wno-deprecated -c {filename with extension}
Run the following command:
g++ -O -fPIC -Wno-deprecated -c OBJTEST.cpp
This creates an object file with a .o extension (Ex: OBJTEST.o). Move this object file to any
directory on the library path you prefer, Ex: /datastage/Ascential/DataStage/PXEngine/lib.
I usually put it in the "lib" directory, which you can locate from the library path (LD_LIBRARY_PATH).
3) Link
Link the object (*.o) file to a DataStage parallel routine.
In the repository palette, right-click, choose New Parallel Routine, and make these
entries in the General tab:
Routine Name: {Parallel Routine Name}
Ex: OBJECTTEST
Type: External Function
Object Type: Object
External subroutine name: {Function name specified inside your C++ program}
Ex: ObjTestOne (remember, this is the function that replaced main(), i.e. char* ObjTestOne())
Library Path: {Directory chosen in the Compile step}/{object (*.o) file name}
Ex: /datastage/Ascential/DataStage/PXEngine/lib/OBJTEST.o
Return Type: char*
Note: as this routine has no input parameters, we make no entries in the Arguments tab.
Now save and close the window.
4) Execute
Create a test job and call this parallel routine inside it.
Ex: Row Generator -> Transformer -> Sequential File. In the Transformer, call this routine
in your output column derivation. Compile and run the job.