
How to Use Tesseract OCR and ImageMagick (both open source) to read values from an X-Window screen such as ICC or SMDH:
Installation of tools:
Both programs have minimal impact on the system. Tesseract only updates the PATH, TEMP, and TMP environment variables and creates TESSDATA_PREFIX. It creates two registry entries under HKEY_CURRENT_USER\Software\Tesseract-OCR: Version (3.00) and Install Dir (C:\Program Files\Tesseract-OCR).

Configuration folders:
doc
tessdata
configs
tessconfigs
training
Files in top level folder:
uninstall.exe
tesseract.exe
leptonlib.dll
gzip.exe
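
To confirm what the installer left behind, a short script can inspect the environment and the install folder. The following is a minimal sketch in Python and not part of the original procedure: the function name check_tesseract_install is hypothetical, and the file and folder names it looks for are the ones listed above.

```python
import os
from pathlib import Path

# Names taken from the folder listing above.
EXPECTED_FILES = ["tesseract.exe", "leptonlib.dll"]
EXPECTED_DIRS = ["tessdata"]


def check_tesseract_install(install_dir, env=None):
    """Return a list of problems found; an empty list means none were found."""
    env = os.environ if env is None else env
    root = Path(install_dir)
    problems = []
    # The installer is said to create this environment variable.
    if "TESSDATA_PREFIX" not in env:
        problems.append("TESSDATA_PREFIX is not set")
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")
    return problems
```

On the Windows machine, check_tesseract_install(r"C:\Program Files\Tesseract-OCR") should return an empty list if the install matches the description above.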

ImageMagick can be installed anywhere. The relocatable Windows version doesn't rely on any DLLs or registry settings. Just use the full path when invoking convert.exe.
Windows Vista, XP, and NT Install:
In order to keep full path names reasonably short, I have installed it at C:\ImageMagick-6.6.7-4.
To keep it even shorter, and independent of the release number, it should probably just be C:\ImageMagick\.
Most of what gets installed (i.e. copied) is not needed. In fact, we could probably survive with just convert.exe copied into an existing folder in the Windows PATH.
Download ftp.imagemagick.org/pub/ImageMagick/binaries/ImageMagick-6.6.7-0-Q16-windows-static.exe
Execute it (or "open" it from your browser) to start the installation.
Once ImageMagick is installed, select Start->Programs->Command Prompt.
In the Command Prompt window, type:
convert logo: logo.gif
imdisplay logo.gif
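
Because convert.exe is invoked by its full path, a wrapper script only needs to know the install folder. The sketch below, in Python, assumes the C:\ImageMagick-6.6.7-4 location mentioned above; convert_command is a hypothetical helper, not something ImageMagick provides. It only builds the argument list and does not run anything.

```python
from pathlib import PureWindowsPath

# Assumption: ImageMagick was copied to this folder, as in the notes above.
IMAGEMAGICK_DIR = r"C:\ImageMagick-6.6.7-4"


def convert_command(*args, install_dir=IMAGEMAGICK_DIR):
    """Build an argument list that calls convert.exe by its full path."""
    exe = PureWindowsPath(install_dir) / "convert.exe"
    return [str(exe), *args]


# Same check as the command prompt test: render the built-in logo to a GIF.
print(convert_command("logo:", "logo.gif"))
```

To actually run the command on the Windows machine, the list can be passed to subprocess.run(..., check=True). Changing install_dir to C:\ImageMagick\ is all that is needed if the shorter, release-independent folder is used.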

// Use ImageMagick to translate blue and dark blue to black
C:\Program Files\Tesseract-OCR>\ImageMagick-6.6.7-4\convert IOM\COMPOUNDS.tif -fill black -opaque blue -opaque "rgb(0,0,153)" IOM\COMPOUNDS_BWY.tif {ENTER}
// If necessary, use ImageMagick to translate yellow to white, but so far recognition
// has been 100% with yellow or white
C:\Program Files\Tesseract-OCR>\ImageMagick-6.6.7-4\convert IOM\COMPOUNDS_BWY.tif -fill white -opaque "rgb(255,252,0)" IOM\COMPOUNDS_BW.tif {ENTER}
// Use Tesseract to find text in the enhanced image. Tesseract adds the .txt suffix.
// You have to run in the Tesseract directory in order for the Leptonica DLL (leptonlib.dll) to work.
// But it seems to handle data in subfolders, so...
C:\Program Files\Tesseract-OCR>tesseract IOM/COMPOUNDS_BW.tif IOM/COMPOUNDS {ENTER}
Tesseract Open Source OCR Engine with Leptonica
C:\Program Files\Tesseract-OCR>type IOM\COMPOUNDS.txt {ENTER}
H21FCP_STA
H21FCP_ECB
H21_FBM232
C:\Program Files\Tesseract-OCR>
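
The session above can be scripted. The following is a rough Python sketch, not the author's tooling: the helper names (opaque_step, tesseract_step, read_values) are hypothetical, the paths are the ones used in the session, and the location of tesseract.exe is assumed from the install folder. The commands are only built and printed here; running them, for example with subprocess.run(cmd, cwd=TESSERACT_DIR, check=True), assumes a Windows machine with both tools installed.

```python
from pathlib import Path

# Assumed locations, taken from the session above.
TESSERACT_DIR = r"C:\Program Files\Tesseract-OCR"   # working directory for every step
TESSERACT = TESSERACT_DIR + r"\tesseract.exe"
CONVERT = r"C:\ImageMagick-6.6.7-4\convert.exe"


def opaque_step(src, dst, fill, colors):
    """Build: convert SRC -fill FILL -opaque C1 [-opaque C2 ...] DST"""
    cmd = [CONVERT, src, "-fill", fill]
    for color in colors:
        cmd += ["-opaque", color]
    return cmd + [dst]


def tesseract_step(image, output_base):
    """Build: tesseract IMAGE OUTPUT_BASE (Tesseract appends .txt to OUTPUT_BASE)."""
    return [TESSERACT, image, output_base]


def read_values(txt_path):
    """Return the non-empty lines of Tesseract's text output, stripped of whitespace."""
    text = Path(txt_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


steps = [
    # blue and dark blue -> black
    opaque_step(r"IOM\COMPOUNDS.tif", r"IOM\COMPOUNDS_BWY.tif", "black",
                ["blue", "rgb(0,0,153)"]),
    # yellow -> white (optional, per the note above)
    opaque_step(r"IOM\COMPOUNDS_BWY.tif", r"IOM\COMPOUNDS_BW.tif", "white",
                ["rgb(255,252,0)"]),
    # OCR the cleaned image; output lands in IOM/COMPOUNDS.txt
    tesseract_step("IOM/COMPOUNDS_BW.tif", "IOM/COMPOUNDS"),
]
for cmd in steps:
    print(cmd)
```

After the three steps have run, read_values(r"IOM\COMPOUNDS.txt"), called from the Tesseract directory, would return the recognized names as a list of strings.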

-----------------------------------------------------------
Convert any box with Cyan text to white on black:
// convert cyan to white
convert FOO.tif -fill white -opaque "rgb(0,160,158)" FOO_W.tif
// convert menu gray and dark blue to black
convert FOO_W.tif -fill black -opaque "rgb(0,0,153)" -opaque "rgb(114,114,114)" FOO_BW.tif
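
As a rough mental model of the two passes: -fill F -opaque C repaints every pixel whose color is C with F and leaves all other pixels alone. The pure-Python sketch below mimics that on a tiny grid of RGB tuples. It is an illustration only: it does not call ImageMagick, it matches colors exactly (no fuzz tolerance), and the function name replace_color is hypothetical.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
CYAN_TEXT = (0, 160, 158)    # rgb(0,160,158) from the recipe above
DARK_BLUE = (0, 0, 153)      # rgb(0,0,153)
MENU_GRAY = (114, 114, 114)  # rgb(114,114,114)


def replace_color(pixels, targets, fill):
    """Return a copy of pixels with every exact match of a target color set to fill."""
    targets = set(targets)
    return [[fill if p in targets else p for p in row] for row in pixels]


# A 2x3 toy "screen": cyan text on a dark blue box next to a gray menu.
screen = [
    [DARK_BLUE, CYAN_TEXT, MENU_GRAY],
    [CYAN_TEXT, DARK_BLUE, MENU_GRAY],
]

step1 = replace_color(screen, [CYAN_TEXT], WHITE)            # FOO.tif   -> FOO_W.tif
step2 = replace_color(step1, [DARK_BLUE, MENU_GRAY], BLACK)  # FOO_W.tif -> FOO_BW.tif

for row in step2:
    print(row)
```

In this model a near miss such as (0, 160, 157) is left untouched, which is why each screen color in the recipe is listed with its exact RGB value.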

Screen capture using the AutoIt _ScreenCapture_Capture function requires GDI+. GDI+ requires a redistributable for applications that run on the Microsoft Windows NT 4.0 SP6, Windows 2000, Windows 98, and Windows Me operating systems.
