Anda di halaman 1dari 2

Python Wrapper Class for Tesseract

(Linux & Mac OS X & Windows)


Python-tesseract is a wrapper class for Tesseract OCR that allows any conventional image files (JPG, GIF ,PNG , TIFF and etc) to be read and decoded into readable languages. No temporary file will be created during the OCR processing.

Windows version compiled by VS2008 is available now! remember to 1. set PATH: e.g. PATH=%PATH%;C:\PYTHON27 Details 2. set c:\python27\python.exe to be compatible to Windows 7 even though you are using windows 7. Otherwise the program might crash during runtime Details 3. Download and install all of them python-tesseract-win32 python-opencv numpy 4. unzip the sample code and keep your fingers crossed Sample Codes 5. python -u test.py it is always safer to run python in unbuffered mode especially for windows XP Example 1:
import tesseract api = tesseract.TessBaseAPI() api.Init(".","eng",tesseract.OEM_DEFAULT) api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz") api.SetPageSegMode(tesseract.PSM_AUTO) mImgFile = "eurotext.jpg" mBuffer=open(mImgFile,"rb").read() result = tesseract.ProcessPagesBuffer(mBuffer,len(mBuffer),api) print "result(ProcessPagesBuffer)=",result

Example 2:
import cv2.cv as cv import tesseract api = tesseract.TessBaseAPI() api.Init(".","eng",tesseract.OEM_DEFAULT) api.SetPageSegMode(tesseract.PSM_AUTO) image=cv.LoadImage("eurotext.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE) tesseract.SetCvImage(image,api) text=api.GetUTF8Text() conf=api.MeanTextConf() print text

Example 3:
import tesseract import cv2 import cv2.cv as cv image0=cv2.imread("p.bmp") #### you may need to thicken the border in order to make tesseract feel happy to ocr your image ##### offset=20 height,width,channel = image0.shape image1=cv2.copyMakeBorder(image0,offset,offset,offset,offset,cv2.BORDER_CONSTANT,value=(255,255,255)) #cv2.namedWindow("Test") #cv2.imshow("Test", image1) #cv2.waitKey(0) #cv2.destroyWindow("Test") ##################################################################################################### api = tesseract.TessBaseAPI() api.Init(".","eng",tesseract.OEM_DEFAULT) api.SetPageSegMode(tesseract.PSM_AUTO) height1,width1,channel1=image1.shape print image1.shape print image1.dtype.itemsize width_step = width*image1.dtype.itemsize print width_step #method 1 iplimage = cv.CreateImageHeader((width1,height1), cv.IPL_DEPTH_8U, channel1) cv.SetData(iplimage, image1.tostring(),image1.dtype.itemsize * channel1 * (width1)) tesseract.SetCvImage(iplimage,api) text=api.GetUTF8Text() conf=api.MeanTextConf() image=None print "..............." print "Ocred Text: %s"%text print "Cofidence Level: %d %%"%conf

#method 2 cvmat_image=cv.fromarray(image1)

1 of 2

2/4/2014 5:20 PM

python-tesseract - python wrapper class for tesseract OCR (Linux & M...

http://code.google.com/p/python-tesseract/

iplimage =cv.GetImage(cvmat_image) print iplimage tesseract.SetCvImage(iplimage,api) #api.SetImage(m_any,width,height,channel1) text=api.GetUTF8Text() conf=api.MeanTextConf() image=None print "..............." print "Ocred Text: %s"%text print "Cofidence Level: %d %%"%conf

2 of 2

2/4/2014 5:20 PM

Anda mungkin juga menyukai