
2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication

Automatic Voice Generation System after Street Board Identification for Visually Impaired
Pravin A. Dhulekar1, Niharika Prajapati2, Tejal A. Tribhuvan3, Karishma S. Godse4
1 Asst. Prof., 2,3,4 Students, Department of Electronics and Telecommunication Engineering,
Sandip Institute of Technology and Research Center, Sandip Foundation, Nashik, India

Abstract— This project is based on the design and implementation of a smart hybrid system for street sign board recognition and text and speech conversion through character extraction and symbol matching. The default language used to pronounce the signs on street boards is English. Here we propose a novel method to convert the identified character or symbol into multiple languages such as Hindi, Marathi, Gujarati, etc. The project is helpful to everyone, starting from the visually impaired, tourists, the illiterate and all people who travel. The system is realized on an embedded platform for real-time conversion, speech pronunciation and display on the intended devices. The project has a multidisciplinary approach as it belongs to the domains of computer vision, speech processing and embedded systems. Computer vision is used for character and symbol extraction from sign boards. Speech processing is used for text-to-speech conversion and then for conversion of the original speech signal into multiple languages. The embedded platform is used for real-time pronunciation and display of the desired output.

Keywords— Street Sign Boards Recognition, Character Extraction, Symbol Matching, Computer Vision.

I. INTRODUCTION

Communication is a very important tool in human life and an essential requirement for survival in this world. Looking back at the times when no language had yet been developed, communication still existed in the form of signs and other means. It is impossible for any educational institute, organization or domestic life to exist without it.

To overcome the difficulties faced in communication, this paper implements a real-time system with feature/character/symbol extraction, text-to-speech conversion and then conversion of the speech into different languages.

The system starts by capturing an image, which may be color, gray scale or monochrome, via a video acquisition device; the image is stored in one of various file formats (such as bmp, jpeg, tiff, etc.). Once stored, the image is read by the software and feature extraction is performed. After character recognition is done, the result is displayed and a speech signal is generated which can be heard via speakers. A Raspberry Pi is used as the platform to deploy the MATLAB files onto the controller and to interface it with the speakers and the display device. This complete system is very helpful for the visually impaired, the illiterate, tourists and everyone who travels.

II. LITERATURE REVIEW

A recent survey estimates that 285 million people are visually impaired worldwide; 39 million are blind and 246 million have low vision. Out of every 100 people living with blindness, 82 are aged 50 and above, and about 20% of all such impairments still cannot be prevented or cured.

In today's scenario, the overall literacy rate for people aged 15 and above is 86.3%, which means 13.7% of people around the world are still illiterate. From the least literate countries such as South Sudan and Afghanistan to the most literate countries such as Norway and Cuba, everywhere there is a need for the local language so that the illiterate or rural people can communicate.

Over the past years, tourism has been growing rapidly and has been an underlying contributor to economic recovery. Around 1.1 billion people travelled abroad in 2014, and Europe is the most visited area in the world. In 2015, 244 million people, or 3.3% of the world's population, lived outside their country of origin. It is predicted that migration rates will continue to increase over time, and every time these people travel they have to face different languages at different places.

Hence, for all such people across the globe, a common solution, a speech generation system using language translation software, is implemented which will be useful to everyone.

In [3] the authors presented work on converting multilingual text into speech for the visually impaired. The basic working principle used is u-law companding. The system is useful for blind and mute people.

In [4] a paper was presented on natural prosody generation in text-to-voice conversion for English using phonetic integration. The system is complicated in the way it records and accumulates words, but it uses less memory space.

In [5] the authors worked on a system for converting text into speech. The paper covers recognizing the text and converting it into a proper sound signal. OCR techniques are implemented to obtain the output of the system.

In [6] the authors worked on text-to-voice conversion for an application in the Android environment. The basic objective of the paper is to extend offline OCR technologies to the Android platform. They proposed a system to detect text, process it at multiple resolutions and convert it into different languages using a language translator.



In [7] the basic idea of interfacing speech with a computer was presented. Automatic speech recognition (ASR) has played the key role in taking speech technology to the people. The system is classified further into feature extraction and feature recognition. The aim was to study and explore how a neural network can be implemented to recognize a particular spoken word as an alternative to the traditional methods.

In [8] a voice recognition system was presented; recognition performed automatically by a machine offers an easy mode of communication between human and machine. The area of voice processing has been applied to applications such as text-to-speech processing, speech-to-text processing, telephonic call routing, etc. Various feature extraction techniques are discussed, such as relative spectral processing (RASTA), linear predictive coding (LPC) analysis and many more.

In [9] the idea of an innovative, real-time, low-cost technique was presented that lets the user comfortably hear the contents of images or any kind of abstracted document instead of reading through them. It uses a combination of OCR (Optical Character Recognition) and a TTS (text-to-speech) synthesizer. The system helps a visually impaired person to interact easily with a machine such as a computer through a vocal interface.

In [10] a system was proposed that is mainly designed for blind users who have an interest in the field of education; these people study with the help of audio recordings provided by NGOs or with Braille books. The proposed system provides them with audio material of their own choice from any printed material or object. The whole system mainly includes OCR (Optical Character Recognition), which performs various operations such as thresholding, filtering, segmentation and many more.

Figure 1: Street Sign Boards

III. PROPOSED TECHNIQUE

To make a portable hybrid system, the methodology used here is Optical Character Recognition together with a hardware implementation using the Raspberry Pi 3. First, images are captured via a web camera, then text is extracted from the image and converted into speech. After that, the original speech signal is further translated into other languages using software. Finally, the whole system is made portable by deploying it on the Raspberry Pi 3 and interfacing the required components.

A. Sequence Diagram

[Block diagram: Video Acquisition Device → Personal Computer → Serial Communication → Raspberry Pi → Speakers/Amplifiers and Graphical LCD]

Figure 2: Block Diagram

B. Explanation

1) Acquisition Device:
The first step in the system is to capture the image or video using an acquisition device, i.e. a camera or web camera. Using a segmentation process, the region containing the text is used to extract the character or symbol. The captured image or video contains background clutter, complex layout, blur, perspective distortion, sensor noise, etc., which cannot be handled by simple thresholding. The text is recorded from a facet of the video or scene captured by the camera or web camera (e.g. road signs, posters and symbols).

2) Personal Computer:
At this stage the personal computer is used only for the MATLAB programming. After the image is captured by the camera, the remaining processing, i.e. character recognition, feature extraction, segmentation, symbol/template matching, speech synthesis and language conversion, is done via MATLAB programming.

3) Serial Communication:
An Adafruit serial cable is connected to the Raspberry Pi 3. Serial communication is used to send data between the Raspberry Pi 3 and the personal computer. The Raspberry Pi 3 provides two serial lines, TxD and RxD, to send and receive data respectively. Connecting the Raspberry Pi to the PC is done through a USB port; the easiest option is to use a USB-to-serial cable which uses 3.3 V logic levels.
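As a sketch of how this link can be exercised from MATLAB, the snippet below sends the recognized text from the PC to the Pi; the COM port, baud rate and device path are assumptions that must match the actual USB-to-serial cable, and the Pi side uses the serialdev interface of the MATLAB Support Package for Raspberry Pi Hardware, not necessarily the exact code of our system.

% PC side: open the USB-to-serial port and send the recognized text.
% 'COM3' and 115200 baud are assumed settings, not taken from the paper.
s = serial('COM3', 'BaudRate', 115200, 'Terminator', 'LF');
fopen(s);
fprintf(s, '%s\n', 'SPEED LIMIT 50');      % text produced by the OCR stage
fclose(s);
delete(s);

% Raspberry Pi side, accessed through the support package:
mypi = raspi();                            % previously saved connection
ser  = serialdev(mypi, '/dev/ttyAMA0', 115200);
data = read(ser, 20);                      % read up to 20 bytes of the message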
4) Raspberry Pi 3:
The hardware support package for Raspberry Pi is a feature of recent MATLAB releases which enables us to deploy MATLAB files onto the Raspberry Pi easily. With the help of the Raspberry Pi 3, the interfacing of the speakers and the graphical liquid crystal display is done.

5) Speakers:
For the output of the system, speakers are interfaced with the Raspberry Pi. A text-to-speech synthesizer, which is able to read any text aloud, is used for this process. The text information is converted into speech signals, and the speaker lets the user listen to the text in voice form.

6) Display Device:
After the text is converted into speech in the different languages, the system also displays that text on a display device; a graphical LCD (GLCD) is used to show the text that has been converted into speech.

IV. ALGORITHM

The text reading system has two main parts: image-to-text conversion and text-to-voice conversion. Converting the image into text and then that text into speech is achieved through programming on the MATLAB platform.

A. Work Flow

[Flowchart: Image Acquisition Device → OCR → Text to Speech Conversion → Language Translator → Embedded Platform → Speakers and Display Device]

Figure 3: Flowchart

1) Image Acquisition Device: A computer vision system processes images acquired from an electronic camera, much like the human vision system, where the brain processes images derived from the eyes [8]. The camera or webcam is used as the acquisition device; it captures the image, symbol or sign, which ultimately gives us the sample image for further processing and the symbol for recognition. We also obtain video from this device, from which we can extract the required area of interest and process it further through various operations to get the final results.

a) USB Web Camera:

Figure 4: Logitech USB 2.0 QuickCam Web Camera

General Features:
• CIF-quality (352 x 288) CMOS sensor
• Video capture of 640 x 480 pixels with SW enhancement
• Still image capture of 640 x 480 pixels with SW enhancement
• Frame rate up to 30 frames per second
• USB 2.0 certified

2) Optical Character Recognition: This block is used for character extraction and symbol matching, which is included in the algorithm and explained in detail. This is the stage where all the input images or videos captured by the camera or webcam are processed and feature extraction is done by various techniques. Template matching is also performed here; if the character or symbol is matched, the subsequent operations perform character/symbol-to-text conversion [5].

Syntax:
txt = ocr(I)
txt = ocr(I, roi)
[ __ ] = ocr( __ , Name, Value)

Example:
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard);
recognizedText = ocrResults.Text;
figure;
imshow(businessCard);
text(600, 150, recognizedText, 'BackgroundColor', [1 1 1]);

Steps:
Step 1: Image acquisition: Different types of images, such as monochrome, gray scale and color, can be used as input from any video acquisition device, such as a web camera, whose primary operation is to sense and capture.

Step 2: Preprocessing: Preprocessing is not mandatory, but it is done when the input image is not clear and requires deblurring, denoising, resizing or reshaping.

Step 3: Detect MSER regions: Maximally Stable Extremal Regions (MSER) is used as a method of blob detection in images. Mathematically, let Q1, ..., Qi-1, Qi, ... be a sequence of nested extremal regions (Qi ⊂ Qi+1). An extremal region Qi* is maximally stable if and only if q(i) = |Qi+Δ \ Qi−Δ| / |Qi| has a local minimum at i* (here |·| denotes cardinality); Δ ∈ S is a parameter of the method. The criterion checks for regions that remain stable over a certain number of thresholds: if a region Qi+Δ is not significantly larger than a region Qi−Δ, region Qi is taken as a maximally stable region.
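As an illustration of this step, a minimal MATLAB sketch using detectMSERFeatures from the Computer Vision System Toolbox is shown below; the file name, threshold delta and region area range are assumed values rather than the parameters used in our experiments.

% Detect candidate text regions with MSER; parameter values are assumptions.
I    = rgb2gray(imread('streetBoard.jpg'));          % hypothetical input image
mser = detectMSERFeatures(I, 'ThresholdDelta', 2, ...
                             'RegionAreaRange', [200 8000]);
figure; imshow(I); hold on;
plot(mser, 'showPixelList', true, 'showEllipses', false);
title('Candidate text regions detected by MSER');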
3) Text to Speech Conversion: This is basically called "speech synthesis". In this block the output obtained from OCR, i.e. the matched image, sign or symbol, is in the form of text, produced by the symbol-to-text conversion technique; here it is converted into speech using a text-to-voice conversion technique. This speech is processed further into various languages, which helps the user obtain information about the whole area of interest.
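One possible way to realize this block on a Windows PC is to call the .NET System.Speech synthesizer from MATLAB, as sketched below; this is only an assumed implementation of the text-to-voice step, not necessarily the synthesizer used in the final system.

% Speak the recognized text through the default audio device (Windows only).
recognizedText = 'Speed Limit 50';                 % text returned by ocr()
NET.addAssembly('System.Speech');
tts = System.Speech.Synthesis.SpeechSynthesizer;
tts.Volume = 100;                                  % volume range is 0-100
Speak(tts, recognizedText);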
4) Language Translation: The speech obtained as the output of the text-to-speech converter is in the default language, i.e. English, and can be translated into different languages such as Hindi, Marathi, Gujarati, etc. This helps the user to get the information in his or her own language, so that he can travel easily without any identification problem. The user is able to get the information from this system easily and correctly without any loss of time.
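The translation itself is delegated to external language translation software; purely as a toy illustration of the underlying lookup idea, the sketch below maps a recognized word through a containers.Map dictionary. The tiny English-Hindi entries are illustrative only; a real system would call a full translation service or lexicon.

% Toy word-level English-to-Hindi lookup; dictionary entries are illustrative.
english = {'stop', 'school', 'hospital'};
hindi   = {'रुकिए', 'विद्यालय', 'अस्पताल'};
dict    = containers.Map(english, hindi);
word    = lower('STOP');                 % word recognized from the sign board
if isKey(dict, word)
    translated = dict(word);
else
    translated = word;                   % fall back to the English text
end
disp(translated);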
5) Embedded Platform: The Raspberry Pi 3 is used as the embedded platform, which allows the whole system to be made portable by interfacing the required components. First, we implement the complete system on a PC and then make it portable by means of this embedded platform using the Raspberry Pi.

6) Speakers and Display Device: The speaker is the device used for producing the voice output in the different languages such as Hindi, English, Marathi, etc., which helps the user get the complete information in his own language; it is able to pronounce the different languages so that the user can easily obtain the basic information. This information helps the visually impaired person, which is ultimately the goal of the system. The LCD display is used for showing the outputs in the different languages, which helps deaf people, who can get the complete message by reading the display in their own language; the display gives good results and is a basic need of the system.

B. Optical Character Recognition

[Flowchart: Input Image → Pre-Processing → Feature Extraction → Template Matching (repeated if not matched) → Character/Symbol Recognition → Recognized Character → Symbol to Text Conversion → Display]

Figure 5: Algorithm of OCR

1) Input Image: Capturing the image, whether on the street while travelling, in a classroom while teaching, or for information reading while browsing, is done by a video acquisition device such as a web camera. The image is then read with the help of the 'imread' command; 'imread' reads an image from a graphics file.

2) Pre-Processing: Pre-processing is not a mandatory step. It is used only when the image is corrupted. It consists of a number of steps that make the raw data usable by the recognizer. It is mainly used for removal of noise, deblurring, resizing and reshaping of the image. The MSER technique is also used for blob detection.
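A minimal pre-processing sketch is given below, assuming an RGB input image and that median filtering followed by Otsu thresholding is an acceptable clean-up; the paper does not fix the exact filters, so these choices are illustrative.

% Clean up a captured image before feature extraction; parameters are assumptions.
rgb  = imread('streetBoard.jpg');        % hypothetical captured image
gray = rgb2gray(rgb);                    % colour to grey scale
gray = medfilt2(gray, [3 3]);            % suppress salt-and-pepper noise
bw   = im2bw(gray, graythresh(gray));    % Otsu global threshold
bw   = bwareaopen(bw, 30);               % drop very small blobs
bw   = imresize(bw, [300 NaN]);          % normalise height, keep aspect ratio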
3) Feature Extraction: The objective of feature extraction is to capture the essential characteristics of the symbols. The most precise way of representing a character is by its raster image. Another way is to extract certain features that still characterize the symbols but leave out the unimportant attributes [5].

4) Template Matching/Symbol Matching: A simple form of template matching uses a convolution mask (also called a template) tailored to a particular feature of the search image which we want to detect. This technique can easily be performed on grey-scale images or edge images. We match the extracted images against the collected data, i.e. the database. Here the information is already stored by the program and hence the database is created.
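The sketch below illustrates this idea with normalized cross-correlation (normxcorr2) between a stored symbol template and a grey-scale scene; the template file name and the 0.7 match threshold are assumptions.

% Locate a stored symbol template in the scene by normalised cross-correlation.
scene    = rgb2gray(imread('streetBoard.jpg'));
template = rgb2gray(imread('arrowTemplate.png'));    % stored symbol template
c        = normxcorr2(template, scene);
[peak, idx]    = max(c(:));
[ypeak, xpeak] = ind2sub(size(c), idx);
if peak > 0.7                                        % assumed acceptance threshold
    xoff = xpeak - size(template, 2) + 1;            % top-left corner of the match
    yoff = ypeak - size(template, 1) + 1;
    fprintf('Symbol matched at (%d, %d), score %.2f\n', xoff, yoff, peak);
end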
5) Character/Symbol Recognition: Character recognition deals with cropping each line and letter, performing correlation and writing the result to a file. Before performing the correlation we load the already stored model templates so that we can match the letters against them. After character-by-character segmentation we store each character image in a structure.
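A compact sketch of this crop-and-correlate step is shown below; it assumes the binary image bw from the pre-processing sketch and a pre-loaded structure array templates with image and label fields, both of which are hypothetical names.

% Segment characters with connected components and match each crop by correlation.
stats      = regionprops(bw, 'BoundingBox');
recognized = '';
for k = 1:numel(stats)
    glyph  = imcrop(bw, stats(k).BoundingBox);
    glyph  = imresize(glyph, [42 24]);               % same size as the stored templates
    scores = zeros(1, numel(templates));
    for t = 1:numel(templates)
        scores(t) = corr2(double(glyph), double(templates(t).image));
    end
    [~, best]  = max(scores);
    recognized(end + 1) = templates(best).label;     % append best-matching letter
end
disp(recognized);                                    % boxes are not yet sorted left-to-right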

6) Character/Symbol to Speech Conversion: First, the text, including symbols such as numbers or abbreviations, is transformed into its equivalent written-out words. This process is often called text normalization, pre-processing or tokenization [9].

7) Display: The display is used to show the text after an image has been captured by the camera or any type of scanner, processed, and its symbols and characters converted into text; the text is then displayed on the GLCD. This GLCD display is interfaced with the Raspberry Pi.

C. Hardware Description

Figure 6: Raspberry Pi 3 Kit

The Raspberry Pi 3 is the third-generation Raspberry Pi. It replaced the Raspberry Pi 2 Model B in February 2016. The following features are used in the project:
• A 1.2 GHz 64-bit quad-core ARMv8 CPU
• USB ports
• GPIO pins
• Ethernet port
• Camera interface (CSI)
• Display interface (DSI)

TABLE I. CAMERA BOARD

Camera Board Functions
Function          Meaning
raspi             Create connection to Raspberry Pi hardware
cameraboard       Create connection to Raspberry Pi Camera Board Module
snapshot          Capture RGB image from Camera
record            Record video from Camera Board
stop              Stop video recording from Camera Board

TABLE II. WEB CAMERA

Web Camera Functions
Function          Meaning
raspi             Create connection to Raspberry Pi hardware
webcam            Create connection to Raspberry Pi Web Camera
snapshot          Capture RGB image from Camera
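A minimal acquisition sketch built only from the functions listed in Tables I and II is given below; the IP address, user name and password are placeholders for the actual board credentials.

% Connect to the board and grab one frame from the camera board and a USB webcam.
mypi = raspi('192.168.1.10', 'pi', 'raspberry');      % Tables I-III: raspi
cam  = cameraboard(mypi, 'Resolution', '640x480');    % Table I: cameraboard
img  = snapshot(cam);                                 % Table I: snapshot

wcam = webcam(mypi);                                  % Table II: webcam
img2 = snapshot(wcam);                                % Table II: snapshot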


TABLE III. GPIO

GPIO Functions
Function          Meaning
raspi             Create connection to Raspberry Pi hardware
configurePin      Configure GPIO pin as digital input, digital output, or PWM output
readDigitalPin    Read logical value from GPIO input pin
writeDigitalPin   Write logical value to GPIO output pin
showPins          Show diagram of GPIO pins
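As a sketch of how the Table III functions are used, the snippet below drives an indicator pin; choosing pin 17 as an output and pin 18 as an input is an assumption made for illustration, not a wiring detail taken from the paper.

% Exercise the GPIO functions listed in Table III.
mypi = raspi();                            % reuse the saved connection
showPins(mypi);                            % diagram of the GPIO header
configurePin(mypi, 17, 'DigitalOutput');
writeDigitalPin(mypi, 17, 1);              % e.g. switch a "speech ready" LED on
configurePin(mypi, 18, 'DigitalInput');
buttonState = readDigitalPin(mypi, 18);    % e.g. read a push button
writeDigitalPin(mypi, 17, 0);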

V. RESULTS AND DISCUSSIONS

Case 1: Using OCR, character extraction is performed on two different images, as shown below in Figure 7 and Figure 8.

Figure 7: Result Image I (Case I)

Figure 8: Result Image II (Case I)

The results show that each and every character in the image is extracted successfully irrespective of its font.

Case 2: The Region of Interest (ROI) can be found easily in different images, as shown below in Figure 9 and Figure 10.

Figure 9: Result Image I (Case II)

Figure 10: Result Image II (Case II)

We have successfully achieved the region of interest in different images.
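For reference, restricting ocr() to a region of interest, as done in Case 2, follows the roi form of the syntax given earlier; the file name and rectangle below are illustrative values and would normally come from the MSER/segmentation stage.

% Run OCR only inside a region of interest, as in Case 2.
I   = imread('resultImage2.jpg');           % hypothetical test image
roi = [60 40 320 120];                      % assumed [x y width height] around the text
res = ocr(I, roi);
disp(res.Text);
figure; imshow(I);
rectangle('Position', roi, 'EdgeColor', 'r');
text(roi(1), roi(2) - 15, res.Text, 'Color', 'r');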
CONCLUSION

This paper is an effort to implement an innovative, robust approach for character extraction and text-to-voice conversion of different images using optical character recognition and text-to-speech synthesis technology. A user-friendly, cost-effective system, reliable and applicable in real time, is achieved. Using this methodology, we can read text from a document, street sign boards, a web page or even an e-Book, and generate synthesized speech through any portable system, i.e. a computer's or a phone's speakers. In the developed software, the use of computer vision covers all the rules for the characters of each alphabet: extraction, matching, pronunciation, and the way each character is used in grammar and in the dictionary. Speech processing has given robust output for various images. The embedded platform provided a real-time platform to work with. This saves the user's time by allowing him to listen to background material while performing other tasks. Other applications of this system include making information browsing possible for people who do not have the ability to read or write. The approach can also be used in part: if the requirement is only text conversion, that is possible, or else text-to-speech conversion is also done easily. People with vision impairment, visual dyslexia or complete blindness can use this approach for reading documents and books and also while travelling. People with vocal impairment or complete muteness can utilize this approach to turn typed words into vocalization. Tourists facing a language barrier can also use this approach for understanding different languages. People travelling in cars and buses can save time using this feature. Experiments have been performed to test the text and speech generation system and good results have been achieved. Work is still in progress for symbol extraction.

REFERENCES
[1] Preeti Kale, S. T. Gandhe, G. M. Phade, Pravin A. Dhulekar, "Enhancement of Old Images and Documents by Digital Image Processing Techniques," presented at the IEEE International Conference on Communication, Information & Computing Technology (ICCICT), Mumbai, India, 16-17 January 2015.
[2] Harshada H. Chitte, Pravin A. Dhulekar, "Human Action Recognition for Video Surveillance," presented at the International Conference on Engineering Confluence, Equinox, Mumbai, Oct 2014.
[3] Suraj Mallik, Rajesh Mehra, "Speech to Text Conversion for Visually Impaired Person Using µ-Law Companding," IOSR Journal of Electronics and Communication Engineering (IOSR-JECE), e-ISSN: 2278-2834, p-ISSN: 2278-8735, Volume 10, Issue 6, Ver. II, Nov-Dec 2015.
[4] S. D. Suryawanshi, R. R. Itkarkar, D. T. Mane, "High Quality Text to Speech Synthesizer Using Phonetic Integration," International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE), Volume 3, Issue 2, February 2014.
[5] Pooja Chandran, Aravind S, Jisha Gopinath, Saranya S S, "Design and Implementation of Speech Generation System Using MATLAB," International Journal of Engineering and Innovative Technology (IJEIT), Volume 4, Issue 6, December 2014.
[6] Devika Sharma, Ranju Kanwar, "Text to Speech Conversion with Language Translator under Android Environment," International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359, Volume 4, Issue 6.
[7] Siddhant C. Joshi, A. N. Cheeran, "MATLAB Based Back-Propagation Neural Network for Automatic Speech Recognition," IJAREEIE (An ISO 3297:2007 Certified Organization), Vol. 3, Issue 7, July 2014.
[8] Siddhant C. Joshi, A. N. Cheeran, "A Comparative Study of Feature Extraction Techniques for Speech Recognition System," IJIRSET, ISSN: 2319-8753, Vol. 3, Issue 12, December 2014.
[9] K. Kalaivani, R. Praveena, V. Anjalipriya, R. Srimeena, "Real Time Implementation of Image Recognition and Text to Speech Conversion," IJAERT, Volume 2, Issue 6, September 2014, ISSN: 2348-8190.
[10] Kalyani Mangale, Hemangi Mhaske, Priyanka Wankhade, Vivek Niwane, "Printed Text to Audio Converter Using OCR," International Journal of Computer Applications (0975-8887), NCETACT-2015.
[11] N. Swetha, K. Anuradha, "Text-to-Speech Conversion," International Journal of Advanced Trends in Computer Science and Engineering, ISSN: 2278-3091, Vol. 2, No. 6, pp. 269-278, 2013.
[12] Mark Nixon, Alberto Aguado, Feature Extraction and Image Processing, 2nd Edition.
[13] Sunil Jadhav, P. A. Dhulekar, G. M. Phade, "Hybrid Optical Character Recognition Method for Recognition of Text in Images," IEEE Fifth International Conference on Computing of Power, Energy & Communication (ICCPEIC-2016), April 20-21, 2016.

