————————————————————————————————————————————————————————————————————————————————

User Manual: TextHarvest


————————————————————————————————————————————————————————————————————————————————

For additional help files, click here.


————————————————————————————————————————————————————————————————————————————————

Table of Contents
You can click on any section title below to jump directly to that section.

Introduction
First-Time User Questions
General Overview
A Simple Example
Another Example
TextHarvest Basics
Word Lists
Specifying a List of Words
Combining Keep and Delete
The Controls Input Box
Text Case
Null Lines
File Name Annotation
Regular Expressions
Other Controls
Advanced Techniques
Shortcut Keys
Wildcards
Multiple Wildcards
Processing the Windows Clipboard
Boolean Filtering (Anding)
Command Line Parameters
Batch File Considerations
The Error Reporting File
The Log File
Usage Notes
Matching Problems
False Matches
Finding Slashes
File Formats
Regular Expression Syntax
Overview
Basic Regular Expressions
Using the Asterisk
Advanced Regular Expressions
Scripting
A Simple Example
Scripting User Manual
Sample Scripts
Uninstalling TextHarvest
Custom Conversion
Legal Notices
About TextHarvest

Additional documentation (such as version history) can be found in the
ReadMe.txt file.

————————————————————————————————————————————————————————————————————————————————
Introduction
————————————————————————————————————————————————————————————————————————————————

————————————————————————————————————————————————————————————————————————————————

First-Time User Questions


What does TextHarvest do?

It reads files and copies the information you want, altering it in various
ways you specify.

What kinds of files can it process?

Text (Windows, DOS, Unix, Macintosh), fixed-record-length, and
character-terminated.

How much does TextHarvest cost?

For most people: nothing. Certain exceptionally powerful capabilities
require that you purchase a special license, but the average user will
not need these.

Can I give copies of TextHarvest to other people?

Yes.

Can I sell copies of TextHarvest to other people?

No. You can charge a small distribution fee, though.

I'm a programmer, so why would I need TextHarvest?

Many operations that would take 100 lines of code in a traditional
programming language can be performed with a single script command.

————————————————————————————————————————————————————————————————————————————————

General Overview
In its simplest form, TextHarvest is a utility that copies a text file. As it
does so, it can:

• Retain lines that contain specific text
• Skip lines that contain specific text

Thus, you can use TextHarvest to filter a text file, preserving only those
lines that interest you.

You can also use powerful scripts to:

• Process files other than plain text
• Modify the data sent to the output file
• Perform further filtering and analysis

Here are some typical scripted operations:

• Change one word to another one
• Change uppercase to lowercase
• Rearrange columns of text
• Look up values in a table
• Convert data to CSV (Comma Separated Value) format

But for now, let's start with the basics ...

————————————————————————————————————————————————————————————————————————————————

A Simple Example
TextHarvest comes with a demonstration file, named ThingsToDo.txt, which
contains a simple "To Do" list. You can view this file by entering the name in
the "Input File" box and then clicking the corresponding View button.

The first column contains a category, such as "Car", "Home" or "Work", while
the second column describes the task to be done.

Let us say you only wanted to see the lines that contained the word "Work".
Here is the way to do this:

• Specify the input file name (ThingsToDo.txt)
• Specify an output file name (Output.txt)
• Make sure Autoview is checked and Append is not checked
• Make sure the "Script file" input box contains the word "None" (no quotes)
• Put the word "Work" in the "/Keep list" like this: /Work
• Click the Start button (shortcut key: F9)

TextHarvest will then read ThingsToDo.txt and copy only those lines that
contain the word "Work" (or variations such as "WORK", or "work"). Then,
because you checked "Autoview", the output file (Output.txt) will be displayed.

Note: The reason we put a slash ("/") character in front of the word "Work" is
explained later, in the section "Specifying a List of Words".

————————————————————————————————————————————————————————————————————————————————

Another Example
Now let us suppose that you want to do the opposite of what you did in the
previous example: you want to see every line except those that contain the
word "Work". Remove the word "Work" from the "/Keep list" input box and put it
in the "/Delete list" box like this:

/Work

When you click the Start button, TextHarvest will copy the file (ThingsToDo.txt)
to the output file but will remove any lines that contain the word "Work".

————————————————————————————————————————————————————————————————————————————————

TextHarvest Basics
————————————————————————————————————————————————————————————————————————————————

————————————————————————————————————————————————————————————————————————————————

Word Lists
¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Specifying a List of Words
Once again using ThingsToDo.txt, let us copy only those lines that contain the
word "Home", or "Work", or both.

Make sure that the "/Delete list" input box is empty, then enter the following
in the "/Keep list" input box:

/home/work

When you click the Start button, the file will be copied, but only those lines
that contain "Home" or "Work" (with variations, such as "HOME", "work" and so
on).

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Combining Keep and Delete
You can specify both Keep and Delete lists. For example, let us say you used
the following criteria:

/Keep list: /work/home
/Delete list: /inventory

This would copy any lines with the words "work" or "home", but which do not
contain the word "inventory".

When Keep and Delete lists are both specified, a line is first checked to see
if it passes the "Keep" test. If so, it is then compared to the "Delete" list.
If a match is found, the line is not copied.
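The Keep-then-Delete order described above can be sketched in Python. This is
only an illustration, not TextHarvest's actual code: the function names and the
sample lines are made up, and it assumes that an empty Keep list lets every
line through.

```python
def parse_word_list(spec):
    """Split a list such as "/work/home" into ["work", "home"].

    The first character of the list is treated as the delimiter.
    An empty spec yields an empty list.
    """
    if not spec:
        return []
    delimiter = spec[0]
    return [word for word in spec[1:].split(delimiter) if word]


def line_is_copied(line, keep_spec="", delete_spec=""):
    """Apply the Keep test first, then the Delete test (case ignored)."""
    text = line.lower()
    keep_words = [w.lower() for w in parse_word_list(keep_spec)]
    delete_words = [w.lower() for w in parse_word_list(delete_spec)]
    # Assumption: an empty Keep list lets every line through.
    if keep_words and not any(w in text for w in keep_words):
        return False
    # A line that passed the Keep test is dropped on any Delete match.
    return not any(w in text for w in delete_words)


# Hypothetical sample lines, not the real contents of ThingsToDo.txt.
sample = [
    "WORK     Check inventory levels",
    "WORK     Call the supplier",
    "HOME     Fix the fence",
    "CAR      Change the oil",
]
copied = [s for s in sample if line_is_copied(s, "/work/home", "/inventory")]
```

Here only the "Call the supplier" and "Fix the fence" lines end up in copied:
the inventory line passes the Keep test but is then removed by the Delete test.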

————————————————————————————————————————————————————————————————————————————————

The Controls Input Box


¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Text Case
By default, TextHarvest will ignore text case when looking at the "/Keep list"
and the "/Delete list". You can override this behaviour, though, using the
"/Controls" input box. Here are the various settings:

/KI = Ignore case on Keep (default)
/KM = Match case on Keep
/DI = Ignore case on Delete (default)
/DM = Match case on Delete

Try using the sample input file ThingsToDo.txt to test this out:

• Make sure the "Script file" input box contains the word "None" (no quotes)
• Set your "/Keep list" to "/CAR/work"
• Make sure your "/Delete list" is empty
• Set your "/Controls" input box to "/KM"
• Click the Start button

The output will contain references to "CAR", but will ignore the lines that
start with "WORK" because "WORK" (which is in uppercase) does not match "work"
(which is in lowercase).

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Null Lines
By default, TextHarvest ignores all null (zero-length) lines in the input file.
However, you can set the "/Controls" input box to deal with this. Here are the
settings:

/NI = Ignore null lines (default)
/NK = Keep null lines
/NS = Keep null lines, but never output more than two in a row

Try using the sample input file ThingsToDo.txt to test this out:

• Make sure the "Script file" input box contains the word "None" (no quotes)
• Clear the "/Keep list" input box
• Set "/Delete list" to "/car/work"
• Set "/Controls" to "/NK".
• Click the Start button

The output will not contain any lines containing "car" or "work", but it will
contain any null lines found in the input file.

Try the experiment again, first with "/NS" and then with "/NI". (Since "/NI" is
the default, you could also simply leave the "/Controls" input box blank.)
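The three null-line settings can be pictured with a short Python sketch. It is
an illustration only: the function name is made up, and it assumes that the
"two in a row" limit of /NS is counted on the lines as they are written out.

```python
def filter_null_lines(lines, mode="NI"):
    """Return lines after applying a null-line control.

    NI drops every zero-length line, NK keeps them all, and NS keeps
    them but never lets more than two appear in a row.
    """
    if mode not in ("NI", "NK", "NS"):
        raise ValueError("mode must be NI, NK or NS")
    result = []
    run = 0  # number of consecutive null lines already written
    for line in lines:
        if line != "":
            run = 0
            result.append(line)
        elif mode == "NK" or (mode == "NS" and run < 2):
            run += 1
            result.append(line)
    return result
```

With the input ["a", "", "", "", "b"], the NI setting leaves only "a" and "b",
NK leaves all five lines, and NS drops just the third null line.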

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
File Name Annotation
If you are processing multiple files using wildcards, you may wish to know
which output lines came from which files. TextHarvest can annotate the output
such that the file name precedes lines extracted from a particular file:

/FN = No, do not output the file name (default)
/FY = Yes, output the file name
/FS = Yes, output the file name, and put separator lines above and below

The separator line (control /FS) makes it easier to spot the file names in a
long output file.

Only the file names of files that actually generate output lines are included.
If a file does not generate any output lines, its name is not mentioned.

File name annotation lets you use TextHarvest as a "Find Text" utility. For
example, if you wanted to search a folder for the word "inventory", you could
do this:

• Set the "Input file" box to the wildcard "Things*.txt" (without the quotes)
• Make sure the "Script file" input box contains the word "None"
• Set the "/Keep list" input box to "/inventory"
• Make sure your "/Delete list" is empty
• Set the "/Controls" input box to "/FS" or "/FY"
• Click the Start button

The example given above would search all files matching the wildcard pattern
Things*.txt for the word "inventory".
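The "Find Text" idea can be sketched in Python as follows. This is a rough
model only: the function name, the in-memory file dictionary and the look of
the separator line are all assumptions, since the manual does not show the
exact output layout.

```python
def find_text(files, word, control="FY"):
    """Collect matching lines from several files, annotated by file name.

    files maps a file name to its list of lines. control is "FN", "FY"
    or "FS". The separator string is a made-up stand-in.
    """
    separator = "-" * 40
    output = []
    for name, lines in files.items():
        matches = [ln for ln in lines if word.lower() in ln.lower()]
        if not matches:
            continue  # files without output lines are not mentioned
        if control == "FS":
            output.extend([separator, name, separator])
        elif control == "FY":
            output.append(name)
        output.extend(matches)
    return output
```

A file that contains no matching line contributes nothing to the result, not
even its name, which mirrors the rule stated above.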

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Regular Expressions
By default, TextHarvest will search for the precise text fragments you specify
in the /Keep and /Delete lists. However, you can enable "regular expressions",
which let you match patterns rather than specific sequences of characters:

/KR = Enable regular expressions for the /Keep list
/DR = Enable regular expressions for the /Delete list

Consider the following /Keep list:

/D.g/C[aou]t

With /KR specified in the "/Controls" input box, this would match any line that
contained "Dog", "Cat", "Cot" or "Cut". It would also match lines containing
"Dig" and "D3g", so when you are using regular expressions you must ensure that
you are indicating precisely what you want.

If you have never used regular expressions before, you may find them a bit
confusing at first, but with a bit of practice you will come to appreciate just
how much power they put at your fingertips.

Please see "Regular Expression Syntax" for additional examples of regular
expressions.
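To experiment with patterns like these outside TextHarvest, a few lines of
Python will do. Python's re module is a different engine, so treat this as a
sketch: the function name is made up, matching is case-sensitive here, and only
the simple constructs used in this manual should be expected to behave alike.

```python
import re


def keep_line_regex(line, keep_spec):
    """Return True if any pattern in a list like "/D.g/C[aou]t" is found."""
    delimiter = keep_spec[0]
    patterns = [p for p in keep_spec[1:].split(delimiter) if p]
    # re.search looks anywhere in the line, like a "contains" test.
    return any(re.search(p, line) for p in patterns)


dog_found = keep_line_regex("My Dog barks", "/D.g/C[aou]t")
dig_found = keep_line_regex("Dig a hole", "/D.g/C[aou]t")
cet_found = keep_line_regex("Cet", "/D.g/C[aou]t")
```

The first two tests succeed, which shows how "D.g" also lets "Dig" through,
while "Cet" is rejected because "e" is not in the set [aou].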

————————————————————————————————————————————————————————————————————————————————

Other Controls
Autoview, if checked, displays the output file after processing (if there is
anything to display). If it is not checked, you have to click the View button
to see the output.

Append, if checked, places the output at the end of the specified output file.
If it is not checked, the original copy of the output file (if it exists) is
renamed with a .BAK extension and a new version is created.

————————————————————————————————————————————————————————————————————————————————

Advanced Techniques
————————————————————————————————————————————————————————————————————————————————

————————————————————————————————————————————————————————————————————————————————

Shortcut Keys
In addition to the standard Windows shortcut key conventions (i.e. pressing Alt
plus a letter that is underlined), the following shortcut keys are defined:

——— ————————————————————— ——— —————————————————————
Key Action                Key Action
——— ————————————————————— ——— —————————————————————
F1  Show help             F6  Browse script file
F2  Browse input file     F7  Browse support file
F3  Browse output file    F9  Start processing
——— ————————————————————— ——— —————————————————————

The Esc (Escape) key will close most windows opened by TextHarvest.

————————————————————————————————————————————————————————————————————————————————

Wildcards
You can process more than one input file at a time by using wildcards. For
example, if you set the input file box to *.txt then all files with a .txt
extension will be processed. Here are some more examples:

————————————— ———————————————————————————————————————————————————————
Wildcard Mask Interpretation
————————————— ———————————————————————————————————————————————————————
report??.txt  "report" followed by any two characters, .txt extension
my*.csv       "my" followed by one or more characters, .csv extension
xyz.???       "xyz" with any three-character extension
————————————— ———————————————————————————————————————————————————————

Note that the asterisk (*) is interpreted differently in regular expressions
than it is in file name wildcards.

You cannot specify wildcards for the output file. All output goes to a single
output file.
¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Multiple Wildcards
You can specify multiple wildcards by using semicolons, as in this example:

*.txt;*.me

This would process input files with the .txt extension (example: xyz.txt) and
the .me extension (example: read.me).

There is no limit to the number of wildcards you specify, but bear in mind that
TextHarvest lets you process the same file more than once. Consider this
example:

*.txt;my*.txt

This would process all files with a .txt extension, then all files with a .txt
extension where the file name starts with "my". Thus, a file named "myfile.txt"
would be processed twice.

You cannot specify multiple file names for the output file. All output goes to
a single output file.
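The double-processing effect can be reproduced with a small Python sketch. The
function name and file names are hypothetical, and Python's fnmatch rules are
only an approximation of Windows wildcard matching.

```python
from fnmatch import fnmatchcase


def expand_masks(mask_spec, file_names):
    """Return the files to process for a spec such as "*.txt;my*.txt".

    Each mask is expanded in turn, so a file that matches two masks
    appears twice in the result.
    """
    to_process = []
    for mask in mask_spec.split(";"):
        for name in file_names:
            # Lowercase both sides to mimic case-insensitive Windows names.
            if fnmatchcase(name.lower(), mask.lower()):
                to_process.append(name)
    return to_process


names = ["myfile.txt", "notes.txt", "read.me"]
twice = expand_masks("*.txt;my*.txt", names)
```

In this run myfile.txt appears in the result two times, once for each mask it
matches.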

————————————————————————————————————————————————————————————————————————————————

Processing the Windows Clipboard


TextHarvest can read from and write to the Windows text clipboard as if it were
a regular text file. To read from the clipboard, enter CLIPBOARD in the "Input
File" box. To write to the clipboard, enter CLIPBOARD in the "Output File" box.

It is possible to do both at once. Of course, after processing, the original
contents of the clipboard will have been overwritten.

Tip: Most Windows programs let you copy selected text with Ctrl-C and paste
with Ctrl-V.

————————————————————————————————————————————————————————————————————————————————

Boolean Filtering (Anding)


Note: You can use the sample file ThingsToDo.txt to try out the examples given
below. The examples should be entered in your /Keep list. Make sure that the
"/Delete" and "/Controls" input boxes are empty, and that the "Script file"
input box is set to "None".

The lists of words (see "Word Lists") you enter in the "/Keep list" and
"/Delete list" input boxes are typically a sequence of alternatives. For
example, if your /Keep list is "/Cat/Dog/Cow" it means you want to keep lines
that contain "Cat" or "Dog" or "Cow". This is called an "OR-list".

However, sometimes you want to keep lines that contain all of the words you
listed. That is to say, if even one of the words is missing, you don't want to
keep the line. For this you need an "AND-list".

TextHarvest's AND function is represented by two ampersands. Here is an example
of ANDing:

/Cat&&/Dog&&/Cow

This will match any line that contains all three (Cat, Dog and Cow).

You can combine ANDing and ORing, as in this example:

/Cat/Dog/Cow&&/Moose

This will match any line that contains any one of the first three items (Cat or
Dog or Cow) AND also contains the word Moose.

Now consider this example:

/Cat/Dog/Cow&&/Moose/Antelope

This will match any line that contains one of the first three items (Cat or Dog
or Cow) AND also contains one of the next two items (Moose or Antelope).

If any of the AND conditions is not met, the line does not match. For example,
consider this list:

/North/South&&/Up/Down&&/Back/Forth

A line that contains North, Up and Back would match. A line that contains South,
Down and Back would match. But a line that is missing both North and South
would not match.
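One way to picture AND-lists is as a series of OR-groups that must all succeed.
The Python sketch below models that idea; the function names are made up and
the matching ignores case, as TextHarvest does by default.

```python
def parse_and_list(spec):
    """Turn "/Cat/Dog&&/Moose" into [["cat", "dog"], ["moose"]]."""
    groups = []
    for part in spec.split("&&"):
        if not part:
            continue  # skip an empty piece such as a trailing "&&"
        delimiter = part[0]
        words = [w.lower() for w in part[1:].split(delimiter) if w]
        groups.append(words)
    return groups


def line_matches(line, spec):
    """True when every AND group has at least one word in the line."""
    text = line.lower()
    return all(any(w in text for w in group) for group in parse_and_list(spec))


spec = "/North/South&&/Up/Down&&/Back/Forth"
first = line_matches("North, Up and Back", spec)
second = line_matches("South, Down and Back", spec)
third = line_matches("Up and Back only", spec)
```

The first two lines match, while the third fails because the North/South group
finds nothing.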

————————————————————————————————————————————————————————————————————————————————

Command Line Parameters


To call TextHarvest from the command line (e.g. from a batch file or in a
Windows shortcut), the following format is used:

TextHarvest /i"Input.txt" /o"Output.txt"

You can also specify the /Keep, /Delete and /Controls lists:

/X"/keep/list"
/Y"/delete/list"
/Z"/control/list"

To specify a script file, use /S as in this example:

/S"ScriptSample01.txt"

If you are not using a script, you should specify /S"None" to override whatever
value TextHarvest had previously saved for that input box.

For a general overview of command line parameters, start up TextHarvest as
follows:

TextHarvest /?

This displays a window which summarizes the command-line options. The window is
also displayed if your command line contains an option that TextHarvest does
not recognize.

————————————————————————————————————————————————————————————————————————————————

Batch File Considerations


When calling TextHarvest from a batch file, you must use the Windows START
command with the /WAIT option to allow TextHarvest to complete processing
before moving to the next line in the batch file.

If the batch file is running unattended, you should also feed TextHarvest the
following parameters:

————————————————————————————————————————————————————————————————————
/R   Run (i.e. start) processing immediately
/CA  Close the program after execution, even if there is an error
————————————————————————————————————————————————————————————————————

Thus, a batch file line that calls TextHarvest would contain the items
exemplified below:

————————————————————————————————————————————————————————————————————————
START
/WAIT                       <—— Await completion
"C:\Program Files\PinnSoft\TextHarvest\TextHarvest.exe"
/I"C:\My Input\Input*.dat"  <—— Input file or wildcard mask
/O"C:\My Output\Output.txt" <—— Output file
/S"None"                    <—— Script file
/R                          <—— Start processing
/CA                         <—— End afterwards
————————————————————————————————————————————————————————————————————————

Note the use of quotes — these are necessary if the parameter contains a space.

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
The Error Reporting File
If a serious error occurs during processing, TextHarvest creates a file named
TextHarvest-Error.txt in its program directory. The file is plain text and
contains information about the error. You can view the Error Reporting File
using the "Support Files" input box of the Parsing Parameters window; it will
be listed in the drop-down list.

If no error occurs, the file is not present after processing is complete.

If you are using TextHarvest in a batch file, you can check to see if
processing worked by using the IF EXIST test, as in this example:

————————————————————————————————————————————————————
@ECHO OFF
C:
CD "\Program Files\Pinnsoft\TextHarvest"
START /WAIT TextHarvest.exe /I"C:\MyInput\XYZ.TXT" /R /CA
IF EXIST TextHarvest-Error.txt GOTO ERROR
GOTO OKAY
:ERROR
ECHO An error occurred!
GOTO DONE
:OKAY
ECHO Everything was fine!
:DONE
ECHO Processing completed
————————————————————————————————————————————————————

Note that the /CA parameter suppresses pop-up error messages, so if you use it
in your batch file, it is up to your batch file to watch for the error file and
then determine what to do if an error (such as "File not found") occurs.
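If your automation is written in Python rather than as a batch file, the same
check might look like the sketch below. Only the file name comes from this
manual; the function names are made up, and the temporary folder simply stands
in for the TextHarvest program directory.

```python
import os
import tempfile

ERROR_FILE_NAME = "TextHarvest-Error.txt"  # name given in the manual


def run_had_error(program_dir):
    """Return True if the Error Reporting File exists in program_dir."""
    return os.path.isfile(os.path.join(program_dir, ERROR_FILE_NAME))


def read_error_report(program_dir):
    """Return the text of the error file, or None when no error occurred."""
    path = os.path.join(program_dir, ERROR_FILE_NAME)
    if not os.path.isfile(path):
        return None
    with open(path, "r", errors="replace") as handle:
        return handle.read()


# Demonstration with a temporary folder standing in for the program directory.
with tempfile.TemporaryDirectory() as demo_dir:
    before = run_had_error(demo_dir)
    with open(os.path.join(demo_dir, ERROR_FILE_NAME), "w") as handle:
        handle.write("File not found")
    after = run_had_error(demo_dir)
    report = read_error_report(demo_dir)
```

As in the batch file, the check is only meaningful after TextHarvest has
finished, so it belongs after the call that waits for the program to close.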

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
The Log File
In addition to the Error Reporting File, TextHarvest also creates a log file
(named TextHarvest-Log.txt). TextHarvest uses the log file to record the date
and time when processing started and ended. It also uses the log file to report
anything that is slightly unusual but not a serious problem.

You can view the Log File using the "Support Files" input box of the Parsing
Parameters window; it will be listed in the drop-down list.

————————————————————————————————————————————————————————————————————————————————

Usage Notes
————————————————————————————————————————————————————————————————————————————————

————————————————————————————————————————————————————————————————————————————————

Matching Problems
¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
False Matches
Sometimes TextHarvest matches on strings of characters that you do not want
matched. For example, if you set your /Keep list to /home/car while copying the
sample file ThingsToDo.txt you will find that an additional line is included:

WORK Buy toner cartridge for laser printer

This was included because the characters "car" appear in the word "cartridge".
You can get around this by explicitly indicating the space after "car":

/home/car /

An alternative solution in this particular case would be to set the /Keep list
to "/HOME/CAR" and the /Controls setting to "/KM" (Keep: match case).
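The false match is easy to reproduce with plain substring tests. The following
Python sketch uses the toner line quoted above; the second sample line and the
helper name are made up for illustration.

```python
def contains(line, word):
    """Case-insensitive substring test, the default matching behaviour."""
    return word.lower() in line.lower()


line = "WORK Buy toner cartridge for laser printer"

plain_match = contains(line, "car")    # "car" is inside "cartridge"
spaced_match = contains(line, "car ")  # the trailing space rules that out
car_line_match = contains("CAR Change the oil", "car ")  # hypothetical line
case_match = "CAR" in line             # exact-case test, as with /KM
```

Both remedies reject the toner line: "car " needs a space after the word, and
the exact-case test needs uppercase "CAR".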

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Finding Slashes
You will normally separate the words in your /Keep and /Delete lists with the
slash ("/") character (e.g. "/home/work"). But what if you are looking for a
slash? All you need to do is begin your word list with a different character,
such as the "backslash" character ("\").

You can try processing the sample input file ThingsToDo.txt with the following
"/Keep list" to see that this works as it should:

\home\work

In other words, the first character in the list becomes the delimiter which
separates the words.
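The "first character is the delimiter" rule can be expressed in a few lines of
Python. This is a sketch with a made-up function name, not TextHarvest's own
parser.

```python
def split_word_list(spec):
    """Split a word list using its first character as the delimiter."""
    if not spec:
        return []
    delimiter, body = spec[0], spec[1:]
    return [word for word in body.split(delimiter) if word]


slash_list = split_word_list("/home/work")
backslash_list = split_word_list(r"\home\work")
with_slashes = split_word_list(r"\and/or\1/2")
```

The first two calls give the same two words. In the third, the backslash is
the delimiter, so the slashes inside "and/or" and "1/2" are ordinary text to
search for.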

————————————————————————————————————————————————————————————————————————————————

File Formats
If you do not use scripts, TextHarvest can read either Windows-style (CRLF-
terminated) text files or Unix-style (LF-terminated) text files, and output is
always a Windows style (CRLF-terminated) text file.

If you do use scripts, TextHarvest can read all standard text files (including
the Macintosh variety), fixed-record-length files, and character-terminated
records, while output can be whatever you want it to be.
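For the no-script case, the effect on line endings can be modelled in Python as
shown below. This is only a sketch of the conversion described above, with a
made-up function name; it does not reproduce any filtering, and it assumes the
input uses CRLF or LF terminators only.

```python
def to_windows_text(data):
    """Convert CRLF- or LF-terminated bytes into CRLF-terminated bytes."""
    lines = data.replace(b"\r\n", b"\n").split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after a final terminator
    return b"".join(line + b"\r\n" for line in lines)


unix_result = to_windows_text(b"first\nsecond\n")
windows_result = to_windows_text(b"first\r\nsecond\r\n")
```

Both inputs produce the same CRLF-terminated bytes.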

————————————————————————————————————————————————————————————————————————————————
Regular Expression Syntax
————————————————————————————————————————————————————————————————————————————————

————————————————————————————————————————————————————————————————————————————————

Overview
TextHarvest supports most of the regular expression conventions. In the
following list, the letters x, y and z stand in for any character.

^xxx    Matches a sequence of characters at the start of a line
xxx$    Matches a sequence of characters at the end of a line
x.x     The dot matches any single character (here, between two x's)
[xz]    Matches one character from a set ("x" or "z" in this example)
[x-z]   Matches one character from a range (this example covers "x" to "z")
x*      Matches zero or more occurrences of the preceding character
[xyz]*  Matches zero or more occurrences from the preceding set
[x-z]*  Matches zero or more occurrences from the preceding range
[^xyz]  Matches any character but the ones specified
[^x-z]  Matches any character but the ones in the specified range

The backslash (\) character has a special meaning in regular expressions:

\x  Means "take the next character literally"
    For example: \[ means the actual [ character
    rather than the start of a set or range
\t  Means "a tab character" (ASCII character 9)

————————————————————————————————————————————————————————————————————————————————

Basic Regular Expressions


Note: In the following examples, we assume that case sensitivity has been
turned on, using the /KM or /DM setting in the "/Controls" input box.

Here are some examples of matches:

C.t      Matches Cat, Cot, Cut, Cxt, C3t etc.
C[aou]t  Matches Cat, Cot, Cut only
B..d     Matches Bird, Bred, Bead etc.
^Dog     Matches Dog only if it is at the beginning of a line
Moose$   Matches Moose only if it is at the end of a line
Pa*d     Matches Pd, Pad, Paad, Paaad etc.

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Using the Asterisk
The last example given above uses the * character to indicate zero, one or more
occurrences of a particular character — in this case, the letter "a". Unlike
the * wildcard character used in file names, it does not match "any" character
but is specific. That is why "Pa*d" would not match "Parsed"; the asterisk
means "match zero or more of the preceding character specification".

If you actually want to search for "Pa" followed by one or more letters and
then "d", the correct syntax is:

Pa[a-z][a-z]*d

This means that we want to match "Pa", then a letter in the range from "a" to
"z", then some number (including zero) of characters in the "a" to "z" range,
and finally the letter "d". The character string "Parsed" would meet these
criteria, as would "Paid" and "Packed". "Pad" would not, because the pattern
requires at least one letter between "Pa" and "d".
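The two patterns can be compared side by side in Python. Python's re module is
a different engine from the one in TextHarvest, so this is only a sketch, but
these simple constructs should behave the same way. The helper name is made up.

```python
import re


def found(pattern, text):
    """True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None


star_words = ["Pd", "Pad", "Paad", "Parsed"]
star_results = {w: found("Pa*d", w) for w in star_words}

letter_words = ["Parsed", "Paid", "Packed"]
letter_results = {w: found("Pa[a-z][a-z]*d", w) for w in letter_words}
```

In star_results every word except "Parsed" is found, because "a*" can only
repeat the letter "a". In letter_results all three words are found.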

————————————————————————————————————————————————————————————————————————————————

Advanced Regular Expressions


Note: In the following examples, we assume that case sensitivity has been
turned on, using the /KM or /DM setting in the "/Controls" input box.

Here are some more complicated examples of regular expressions:

C[^ou]t       Matches Cat, Cxt and so on, but not Cot or Cut
C[ao]*t       Matches Ct, Cat, Caat, Cot, Coot, Cooot, Coat, Coaoat etc.
[0-9][0-9]*   Matches numbers such as 0, 1, 01, 10, 25, 0990, 9999 etc.
-[0-9][0-9]*  Matches negative numbers such as -0, -1, -19, -12345 etc.

In the last example, [0-9] is specified twice to ensure that at least one digit
is found. Bear in mind that the * character means "zero or more occurrences".
If you had specified "-[0-9]*" you would get a match within the sequence "Hello
- there", since the "-" character is indeed found, followed by zero occurrences
of the digits 0 through 9.

You can create fairly complex patterns using regular expressions. Consider this
example:

\$[0-9][0-9]*\.[0-9][0-9]

This would match dollar amounts with two decimal places, such as $0.00, $03.23,
$3.14, $9.99, $1234.56 and so on.
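These number patterns can be tried out in Python before you put them in a /Keep
list. As before, this is a sketch with made-up names, and it relies on Python's
re module agreeing with TextHarvest for these simple constructs.

```python
import re

AMOUNT = r"\$[0-9][0-9]*\.[0-9][0-9]"
NEGATIVE = r"-[0-9][0-9]*"
LOOSE_NEGATIVE = r"-[0-9]*"


def first_match(pattern, text):
    """Return the first matching piece of text, or None if there is none."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


amount = first_match(AMOUNT, "Total: $1234.56 due")
strict = first_match(NEGATIVE, "Hello - there")
loose = first_match(LOOSE_NEGATIVE, "Hello - there")
```

The strict pattern finds nothing in "Hello - there", while the loose one
settles for the lone "-", which is the pitfall described above.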

————————————————————————————————————————————————————————————————————————————————

Scripting
————————————————————————————————————————————————————————————————————————————————

Parse-O-Matic Scripting lets you modify the results generated by TextHarvest.

Scripting can examine the text lines that are retained after TextHarvest's
/Keep and /Delete settings are taken into account. You could, for example:

• Replace one string of text with another one
• Convert some of the line to uppercase
• Eliminate certain lines on the basis of multiple criteria
• Rearrange the order of data items in a line
• Add up numbers and include totals at the end of the output

All this — and much, much more — is possible with Parse-O-Matic Scripting.

When using a script you will generally leave the /Keep and /Delete input boxes
empty, since the script can do this kind of selection. The /Controls input box
can be set to /NK to keep null lines or /NI (default) to ignore null lines.

————————————————————————————————————————————————————————————————————————————————

A Simple Example
Here is a very simple example, using the sample ThingsToDo.txt file. Let us say
you wanted to convert the "category" (CAT, CAR, HOME, WORK, LEISURE) to
lowercase. To do this, you would use a text editor program to write a script
file (let's call it ScrExperiment.txt) that looks like this:

Category = $OutData[1 9]
Description = $OutData[10 999]
Category = ChangeCase Category 'Lowercase'
OutEnd Category Description

The first two lines extract the two parts of the output data from the variable
named $OutData, which contains the line of text from TextHarvest. The third
line converts the category to lowercase, while the final line sends the
modified line to the output file. (Whenever you run TextHarvest's results
through a script, it is up to the script to actually send the lines to the
output file.)
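For readers who know Python, the four script lines can be read roughly as the
function below. This is an analogy rather than a description of the scripting
engine: it assumes that $OutData[1 9] means character positions 1 to 9
counted from 1, and that OutEnd writes its arguments joined together as one
line. The function name and sample line are made up.

```python
def run_script_on_line(out_data):
    """Mimic the four script lines for one line of text."""
    category = out_data[0:9]       # $OutData[1 9], assumed 1-based inclusive
    description = out_data[9:999]  # $OutData[10 999]
    category = category.lower()    # ChangeCase Category 'Lowercase'
    return category + description  # OutEnd Category Description


# Hypothetical input line: a 9-character category column, then the task.
converted = run_script_on_line("WORK     Buy toner")
```

Only the first nine characters are lowercased; the description is passed
through unchanged.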

To run this script, you would enter its name — we called it ScrExperiment.txt —
in the "Script File" input box of the Parsing Parameters window, then click the
Start button.

If you do not want to run a script — i.e. you simply want to use TextHarvest as
a basic filter — enter "None" (without the quotes) in the "Script File" input
box.

————————————————————————————————————————————————————————————————————————————————

Scripting User Manual


A complete user manual for Parse-O-Matic Scripting is included with TextHarvest.

Click here to access the "Parse-O-Matic Scripts" user manual.

————————————————————————————————————————————————————————————————————————————————

Sample Scripts
Here is a list of the sample scripts included with TextHarvest:

—————————————————— ————————————————— ——— —————————————————————————
Script File Name   Input File to Use Adv Comments
—————————————————— ————————————————— ——— —————————————————————————
ScriptSample01.txt ThingsToDo.txt    -
ScriptSample02.txt ThingsToDo.txt    -
ScriptSample03.txt InputSample01.txt -
ScriptSample04.txt ToDoListFixed.dat -   Fixed-record-length input
ScriptSample05.txt ToDoListDelim.dat -   Character-delimited input
ScrSampleAdv01.txt ThingsToDo.txt    Y
ScrSampleAdv02.txt Scr*.txt          Y   Input file uses wildcard
ScrExercise.txt    ThingsToDo.txt    Y   Demonstrates all commands
—————————————————— ————————————————— ——— —————————————————————————

Adv = Uses Advanced Scripting commands (see the Scripting user manual).

It is best to study these scripts in the order they are listed above. To view a
script, click on the button with the folder icon next to the "Script File"
input box. You can then select a script and view it by clicking the View button.

To try out the sample script ScriptSample01.txt:

• Set your "Input File" box to ThingsToDo.txt
• Set the "Output File" box to an appropriate file name (e.g. Output.txt)
• Make sure Autoview is checked and Append is not checked
• Clear your "/Keep list", "/Delete list" and "/Controls" input boxes
• Set the "Script File" input box to ScriptSample01.txt
• Click the Start button

Once the output file is displayed, you may find it helpful to also view the
input file, so you can understand how the output data was transformed.
————————————————————————————————————————————————————————————————————————————————

Uninstalling TextHarvest
————————————————————————————————————————————————————————————————————————————————

If you should need to uninstall TextHarvest, start up the Windows Control Panel,
then click on Add/Remove Programs. Find TextHarvest on the list, and proceed
with removal.

————————————————————————————————————————————————————————————————————————————————

Custom Conversion
————————————————————————————————————————————————————————————————————————————————

TextHarvest is handy and simple to use, but it has its limitations. That is a
perennial problem with utilities: there always seems to be one feature missing
— one that you urgently need!

We invite you to visit our web site if you need a custom conversion application.
Our company has been doing data conversion since 1985.

————————————————————————————————————————————————————————————————————————————————

Legal Notices
————————————————————————————————————————————————————————————————————————————————

TextHarvest™ and Parse-O-Matic™ are trademarks of Pinnacle Software.

This document is Copyright © 2003 by Pinnacle Software. You may not
distribute copies of this document without explicit permission from Pinnacle
Software, except in conjunction with the complete and unaltered TextHarvest
installation package. Please write to us if you would like to adapt this
product or any of our other products to your own distributed application.

The entire product (comprising software, documentation and supporting
provisions) is presented as-is; we make no claim about (and disavow liability
for) its suitability, accuracy, reliability, performance etc. If you should
encounter a problem with the product, please write to us to find out if a
solution is available.

————————————————————————————————————————————————————————————————————————————————

About TextHarvest
————————————————————————————————————————————————————————————————————————————————

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Project Information
Project Name     TextHarvest
Copyright        Copyright © 2003, 2005 by Pinnacle Software
Lead Programmer  Timothy Campbell
Email            TextHarvest@parse-o-matic.com

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Program Information
Program Name           TextHarvest.exe
Program Time Stamp     05.03.08 13:09 (This program is 2136 days old)
Program Serial Number  TEXTHARV-041104
Program Status         Freeware
Program Version        3.00.00
Parse-O-Matic Engine   4.51.00
Engine Extensions      1.00.06
Scripting Engine       4.40.00 (2005)
User Interface         3.21.01 (PI)
File Viewer            4.10.01

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Customer Information
Our Customer Code None (general distribution)

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Licensing Information
TextHarvest is freeware. You may give away (but not sell) complete and
unaltered copies in the form of the original installation package. Certain
features (such as Advanced Scripting) require a registration code, which can
be purchased from Pinnacle Software.

¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨¨
Data Format
Input File   Text
Output File  Text
ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ END OF FILE ŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻŻ
