Advanced Concepts of dfPower Studio

Advanced Concepts of dfPower Studio:
DataFlux Tips and Tricks
Chris Martin, Client Services Manager
Gary Townsend, Solutions Consultant
Consuming a directory of input files with a
single text file input node
How can I consume 500 text files that sit in a single folder and all share the same field
structure? Using 500 different input nodes would be tedious and time-consuming.
Creating one job with a macro variable would be more efficient, but that job would still
have to be run 500 times.
With the DataFlux delimited and fixed-width input nodes, we can point a single input
node at an entire directory of files instead of at one file. DataFlux then appends the
contents of each file together (a data union), leaving the end user with a single
usable data set consisting of all records across all input files. See the example below for
a description of how to use this feature.
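For readers who want to see the data-union idea outside of DataFlux, the following is a minimal Python sketch, not the DataFlux implementation. It assumes comma-delimited files that each start with the same header row; the function name read_directory and its parameters are hypothetical.

```python
import csv
from pathlib import Path


def read_directory(folder, pattern="*.txt", delimiter=","):
    """Yield every data row from all matching files as one combined stream."""
    header = None
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            file_header = next(reader, None)
            if file_header is None:
                continue  # empty file: nothing to append
            if header is None:
                header = file_header  # first file defines the field structure
            elif file_header != header:
                raise ValueError(f"{path.name} has a different field structure")
            for row in reader:
                yield dict(zip(header, row))
```

Calling list(read_directory("some_folder")) would return the rows of every file in name order, which mirrors the appended output described above.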
Output of directory read:
Using macro variables to create dynamic file names
Have you ever wanted to create file names dynamically or pass table values between
pages in an Architect job? The example below shows how to create macro variables
and use their values across pages of an Architect job.
Defining Data: PAGE 1 of JOB
EXPRESSION NODE:
Pre‐Expression
Integer Execution_Number
Execution_Number = 2239
String DateToday
DateToday = today()
DateToday = FormatDate(DateToday,"MMDDYYYY")
Pushrow()
Expression
Seteof()
Post Expression
setvar("Ex_Num",Execution_Number)
setvar("TodaysDT",DateToday)
Retrieving Data: PAGE 2 of JOB
EXPRESSION NODE:
Pre‐Expression
string Execution_Number
String TodayDate
Execution_Number = getvar("EX_NUM")
TodayDate = getvar("TodaysDT")
pushrow()
Expression
Seteof()
Text File Output:
FileName Property Value: C:\%%Execution_Number%%_%%TodayDate%%.txt
Expected Results:
One row of data will be written to a file called 2239_TODAYS_DATE.txt, where TODAYS_DATE is the run date in MMDDYYYY format.
Inside the file you will see the value of Execution_Number as well as the value stored in the TodayDate macro
variable. Macros can be used this way in many other situations, so experiment with other uses.
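The file-name pattern above can be illustrated outside of DataFlux with a short Python sketch. It only shows how an execution number and a date formatted as MMDDYYYY combine into a name; the function dynamic_file_name and its folder parameter are hypothetical.

```python
from datetime import date


def dynamic_file_name(execution_number, run_date, folder="C:\\"):
    """Build <folder><execution_number>_<MMDDYYYY>.txt from the two values."""
    date_part = run_date.strftime("%m%d%Y")  # same layout as "MMDDYYYY"
    return f"{folder}{execution_number}_{date_part}.txt"


if __name__ == "__main__":
    print(dynamic_file_name(2239, date.today()))
```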
Passing Macro Variable Values as a Command Line Argument
DataFlux can use macro variables within Architect and Profile jobs, but
what if you do not want to declare them as static values in the architect.cfg file?
When invoking DataFlux from the command line, we can declare macro variable values
dynamically at the time of execution. Below are some
examples of the syntax required on Unix and Windows platforms (each command is entered on a single line and is wrapped here only to fit):
UNIX/LINUX DIS PLATFORMS:
INPUT_FILE=/dataflux/input/audit1.txt OUTPUT_FILE=/dataflux/output/audit1_out.txt
/dfpower/bin/dfexec -log /dataflux/DISjoblob/joblogname.log
../var/dis_arch_job/jobname.dmc
WINDOWS DIS PLATFORM:
Set INPUT_FILE=C:\dataflux\input\audit1.txt &
Set OUTPUT_FILE=C:\dataflux\output\audit1_out.txt &
"C:\Program Files\DataFlux\DIS\8.2\bin\dfexec.bat" -log c:\dataflux\DISjoblob\joblogname.log
c:\dataflux\jobs\jobname.dmc
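If the job is launched from a script rather than typed by hand, the same idea (macro values supplied as environment variables at launch time) might be sketched in Python as follows. The helper names build_invocation and run_job are hypothetical; only the -log flag and the paths shown in the examples above come from this deck.

```python
import os
import subprocess


def build_invocation(dfexec, job, log, macros, base_env=None):
    """Return (argv, env) for one job run with macro values set as env vars."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(macros)  # e.g. {"INPUT_FILE": "...", "OUTPUT_FILE": "..."}
    argv = [dfexec, "-log", log, job]
    return argv, env


def run_job(dfexec, job, log, macros):
    """Launch the job; raises CalledProcessError on a non-zero exit code."""
    argv, env = build_invocation(dfexec, job, log, macros)
    return subprocess.run(argv, env=env, check=True)
```

A loop over a list of input files could call run_job once per file with a different INPUT_FILE and OUTPUT_FILE each time.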
Architect Node ‐ Advanced Properties
Using the advanced properties of nodes within dfPower Architect can save
considerable time and effort when managing a large number of fields.
The following examples show two very practical uses of the advanced properties:
• Copy and paste fields from an external data provider into a job-specific data node
• Standardize all fields into the same field name
Alternate Date/Time Extraction Methods
Counting Records in Text File
We all know how simple it is to extract a count of records from a database table, but
what if you want to determine how many records exist in a text file so that you can
increment a counter accordingly?
•Option 1: Open the file in Notepad and count the records one by one
•Option 2: Take a wild guess and hope you are close
•Option 3: Let DataFlux do the counting!
We will stick to Option 3: let DataFlux count your records.
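As a point of comparison outside of DataFlux, counting the records of a text file takes only a few lines of Python. This sketch assumes one record per line (quoted fields that span several lines are not handled); the function name count_records and the has_header switch are hypothetical.

```python
def count_records(path, has_header=False):
    """Count the lines in a text file, optionally excluding a header row."""
    with open(path, "r", newline="") as handle:
        total = sum(1 for _ in handle)  # streams the file, no full load
    return max(total - 1, 0) if has_header else total
```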
Remove control characters within your data:
Before: After:
ASCII Control Characters
Char  Oct  Dec  Hex  Control-Key  Control Action
NUL   0    0    0    ^@           NULl character
SOH   1    1    1    ^A           Start Of Heading
STX   2    2    2    ^B           Start of TeXt
ETX   3    3    3    ^C           End of TeXt
EOT   4    4    4    ^D           End Of Transmission
ENQ   5    5    5    ^E           ENQuiry
ACK   6    6    6    ^F           ACKnowledge
BEL   7    7    7    ^G           BELl, rings terminal bell
BS    10   8    8    ^H           BackSpace (non-destructive)
HT    11   9    9    ^I           Horizontal Tab (move to next tab position)
LF    12   10   a    ^J           Line Feed
VT    13   11   b    ^K           Vertical Tab
FF    14   12   c    ^L           Form Feed
CR    15   13   d    ^M           Carriage Return
SO    16   14   e    ^N           Shift Out
SI    17   15   f    ^O           Shift In
DLE   20   16   10   ^P           Data Link Escape
DC1   21   17   11   ^Q           Device Control 1, normally XON
DC2   22   18   12   ^R           Device Control 2
DC3   23   19   13   ^S           Device Control 3, normally XOFF
DC4   24   20   14   ^T           Device Control 4
NAK   25   21   15   ^U           Negative AcKnowledge
SYN   26   22   16   ^V           SYNchronous idle
ETB   27   23   17   ^W           End Transmission Block
CAN   30   24   18   ^X           CANcel line
EM    31   25   19   ^Y           End of Medium
SUB   32   26   1a   ^Z           SUBstitute
ESC   33   27   1b   ^[           ESCape
FS    34   28   1c   ^\           File Separator
GS    35   29   1d   ^]           Group Separator
RS    36   30   1e   ^^           Record Separator
US    37   31   1f   ^_           Unit Separator
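To make the table concrete, here is a small Python sketch that strips the control characters listed above (decimal 0 through 31) from a string. It is an illustration only, not the DataFlux expression; the function name and the choice to keep tab, line feed, and carriage return by default are assumptions.

```python
def remove_control_characters(text, keep="\t\n\r"):
    """Drop characters with code points 0-31, except those listed in keep."""
    return "".join(ch for ch in text if ord(ch) >= 32 or ch in keep)
```

Passing keep="" removes every character in the table, including tabs and line breaks.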
Remove non‐ASCII Latin‐1 characters within your data:
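The same kind of cleanup can be sketched in Python for characters above the ASCII range. Two variants are shown under stated assumptions: one simply removes every character with a code point of 128 or higher, the other first decomposes accented letters so that the base letter survives. Both function names are hypothetical, and neither reproduces the DataFlux node configuration.

```python
import unicodedata


def remove_non_ascii(text):
    """Delete every character outside the 7-bit ASCII range."""
    return "".join(ch for ch in text if ord(ch) < 128)


def fold_to_ascii(text):
    """Decompose accented letters first, then delete what is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ord(ch) < 128)
```

Removal is the stricter reading of the slide title; folding is often preferable when the cleaned value is later used for matching names.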
Edit Distance for matching
• Edit_Distance Function:
– Through its EEL, DataFlux exposes a function called
Edit_Distance, which compares two strings and
returns the number of characters that would
need to be changed, added, deleted, or rearranged
to make the two strings equal. Edit
distance is often described as a measure of how
many "edits" are required to turn one text string
into another, and it is commonly used to suggest spelling corrections.
• Input Data:
Name 1         Name 2        DOB 1       DOB 2       SSN 1      SSN 2
Isabell Smith  Isabel Smith  02/22/1924  02/24/1923  123456789  234422352

Name1 & Name2 Diff  DOB1 & DOB2 Diff  SSN1 & SSN2 Diff
1                   2                 7
Why use Edit_Distance()?
• Introduce an additional layer of “fuzzy” matching
• Determine the difference between two strings or words
• Set “likeness” thresholds for matching
• How to use Edit_Distance()
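The numbers in the input data example can be reproduced with a standard Levenshtein distance, sketched below in Python. This is an assumption about the kind of measure involved, not the DataFlux implementation: the function counts single-character insertions, deletions, and substitutions only, and the helper is_fuzzy_match with its max_edits threshold is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def is_fuzzy_match(a, b, max_edits=1):
    """Treat two values as a match when they are within max_edits of each other."""
    return edit_distance(a, b) <= max_edits
```

With the sample row, the names differ by 1 edit, the dates of birth by 2, and the SSNs by 7, so a threshold of 1 would accept the names while rejecting the other two fields.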
Need to match on portions
of a field?
Have you ever needed to match on subcomponents of a field? Suppose you want to find all
customers at a specific location code, but positions 8 and 13 of the
code mean nothing while the rest of the value must match exactly.
MatchCodes won’t help, and Edit Distance gives the proper
results only in some cases. Instead, use the Expression node to create a
new field built with the left/right/mid functions.
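The substring field described above can be sketched in Python to show the intent. The function name location_substring and the sample codes are hypothetical; positions are counted from 1, as in the text, and positions 8 and 13 are dropped before comparison.

```python
def location_substring(code, ignored_positions=(8, 13)):
    """Return the code with the ignored 1-based positions removed."""
    ignored = set(ignored_positions)
    return "".join(ch for pos, ch in enumerate(code, start=1) if pos not in ignored)


# Two hypothetical codes that differ only at positions 8 and 13
first = location_substring("1234567A9012B456")
second = location_substring("1234567Z9012Q456")
```

Both calls yield the same value, so an exact match on the new field groups the two records while still requiring every other position to agree.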
Expression node
Clustering node
Without Location_Substring With Location_Substring
Data ‐ Examples
Original Data
Cluster Results (Location Code or Address/City/State/Postal)
Cluster Results (Location Code or Address/City/State/Postal or Location_Substring)
Sort_Words for Matching
• sort_words Function:
‐ Through its EEL, DataFlux exposes a function called
sort_words that sorts the words within a field (ascending or
descending).
‐ The function can also eliminate a word if it is
duplicated in the field.
‐ This function becomes valuable when a business
requirement calls for matching on a free-form field
(for example, material or parts descriptions).
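The behavior described in the list above can be approximated in Python as follows. This is a sketch of the idea, not the EEL function: the parameter names ascending and remove_duplicates are assumptions, and words are split on whitespace and compared case-sensitively.

```python
def sort_words(text, ascending=True, remove_duplicates=False):
    """Sort the whitespace-separated words of a field, optionally de-duplicated."""
    words = text.split()
    if remove_duplicates:
        words = list(dict.fromkeys(words))  # keeps the first occurrence of each word
    words.sort(reverse=not ascending)
    return " ".join(words)
```

Two part descriptions such as "bolt steel hex" and "hex bolt steel" produce the same sorted value, which is what makes the result usable as a match key.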
Sort_Words (Expression Node)
Move_File after job completion
• move_file Function:
‐ DataFlux through its EEL exposes a function called
move_file. The function performs a move of a file from
one directory to another.
‐ This functionality is important when input files are
processed and should be moved to a secondary location
so that they are not processed again.
‐ This could be viable as a last page in a job if it
consistently runs ‘listening’ for a file to arrive at an input
location.
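Outside of DataFlux, the same housekeeping step might look like the Python sketch below, which moves a processed input file into a secondary folder so that a later run does not pick it up again. The function name move_processed_file and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def move_processed_file(source, processed_dir):
    """Move source into processed_dir (created if missing) and return the new path."""
    target_dir = Path(processed_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(source).name
    shutil.move(str(source), str(target))
    return target
```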
move_file function
Before After
Cluster Flagging
At times it may be necessary to identify if a record was part of a multi‐row
cluster or if it was just a single, non‐matched record.
Data after this node
We sort on the cluster ID and the sequence field (I). The sequence field is sorted in
descending order because the flagging logic needs to know whether a cluster has a
sequence higher than 1 in order to identify it as a multi‐row or single‐row cluster.
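The sort-then-flag logic can be sketched in Python as follows. It assumes each row carries a cluster ID and a sequence number that starts at 1 within its cluster; the field names cluster_id, sequence, and multi_row are hypothetical. Because the sequence is sorted in descending order, the first row of each cluster holds the highest sequence, and a value above 1 marks the whole cluster as multi-row.

```python
from itertools import groupby
from operator import itemgetter


def flag_clusters(rows):
    """Add multi_row=True/False to every row based on its cluster's size."""
    ordered = sorted(rows, key=lambda r: (r["cluster_id"], -r["sequence"]))
    flagged = []
    for _, group in groupby(ordered, key=itemgetter("cluster_id")):
        members = list(group)
        is_multi = members[0]["sequence"] > 1  # highest sequence comes first
        for row in members:
            flagged.append({**row, "multi_row": is_multi})
    return flagged
```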
Questions
Details of any of the presented topics and workflows can be provided.
Please see the instructors after the session to obtain this information.
