DEPARTMENT OF ELECTRONIC ENGINEERING UNIVERSITY OF ENGINEERING AND TECHNOLOGY TAXILA MARCH, 2011
ABSTRACT
Teaching a machine to recognize the human voice is a great need of the day. Voice recognition is the process of making a computer intelligent enough to distinguish a voice spoken by a human from all other voices. In voice recognition, the voice is first recorded, sampled, framed, and windowed, and then the features that make it distinguishable are extracted. These features include short time energy, zero crossing rate, spectral roll-off, spectral flux, and spectral centroid. Once these features are extracted, pattern recognition is applied, whose output moves the robotic arm according to the voice command of the user. Neural networks are used for pattern recognition, with five neurons at the input layer. This voice recognition system is 80 percent accurate when tested with a single user. Our project covers all the commands needed to control a robotic arm.
UNDERTAKING
We certify that the research work titled VOICE CONTROLLED ROBOTIC ARM is our own work. The work has not, in whole or in part, been presented elsewhere for assessment. Where material has been used from other sources, it has been properly acknowledged/referenced.
Signature of Student
ZESHAN ANWAR   07-ECT-02
BASIT AHMAD    07-ECT-18
JIBRAN SIDDIQI 07-ECT-19
RAMEEZ JAVED   07-ECT-31
DEDICATION
We dedicate this project to our beloved, caring, loving, and respected parents and family members.
ACKNOWLEDGEMENTS
By the Grace of the Almighty ALLAH, WHO bestowed many blessings upon us, we were able to complete our project; we are thankful to ALLAH, the CREATOR of the whole Universe. We acknowledge the help and support of our parents and teachers, through which we reached our destination. We are thankful to our supervisor, Prof. Dr. Umar Farooq, and give special thanks to Engr. Ahmad Nauman for encouraging us during our project.
TABLE OF CONTENTS
Abstract ................ ii
Undertaking ................ iii
Dedication ................ iv
Acknowledgement ................ v
Table of Contents ................ vi
List of Figures ................ x
List of Tables ................ xi
Chapter 1: Introduction
1.1 Problem Statement ................ 12
1.2 Objective ................ 12
1.3 Organization of Project Report ................ 13
Chapter 3: Feature Extraction
3.1 Introduction ................ 20
3.2 Features ................ 21
3.2.1 Zero Crossing Rate ................ 21
3.2.2 Short Time Energy ................ 23
3.2.3 Spectral Roll Off ................ 24
3.2.4 Spectral Flux ................ 25
3.2.5 Spectral Centroid ................ 25
Chapter 4: Introduction To Machine Learning
4.1 Artificial Intelligence ................ 26
4.2 Machine Learning ................ 26
4.2.1 Definition ................ 27
4.2.2 Generalization ................ 27
4.3 Human Interaction ................ 27
4.4 Algorithmic Types ................ 27
4.4.1 Supervised Learning ................ 28
4.4.2 Statistical Classification ................ 29
5.1 Introduction ................ 31
5.2 ANN in Matlab ................ 31
5.3 How does an ANN Work? ................ 32
5.3.1 Transfer Function ................ 34
5.3.2 Feed-Forward Back Propagation ................ 37
5.3.3 Epoch ................ 39
5.3.4 Training of ANN ................ 39
6.1 Introduction ................ 40
6.2 GUI in Matlab ................ 40
6.2.1 How does GUI Work ................ 41
6.2.2 Ways to Build MATLAB GUIs ................ 42
6.3 Creating GUI with GUIDE ................ 42
6.3.1 What is GUIDE? ................ 42
6.3.2 Opening GUIDE ................ 43
6.3.3 Laying Out a GUI in GUIDE ................ 43
6.3.4 Adding Component to GUI ................ 44
6.4 GUI of Our Project ................ 45
6.4.1 Start ................ 46
6.4.2 Trouble Shoot ................ 47
6.4.3 Training ................ 48
6.4.4 Simulation Mode ................ 49
Chapter 7: Robotic Arm
7.1 Introduction ................ 50
7.2 Types of Industrial Robotics ................ 50
7.3 Mechanical Components of Robotic Arm ................ 51
7.3.1 Power Source ................ 51
7.3.2 Actuator ................ 51
7.3.3 Platform ................ 51
7.3.4 End Effector ................ 51
7.4 Characteristics of Robotic Arm ................ 52
7.4.1 Joints ................ 52
7.4.2 Payload ................ 52
7.4.3 Reach ................ 52
7.4.4 Repeatability ................ 53
7.4.5 Axis of Rotation ................ 53
Chapter 8: Electronic Components Of Robotic Arm
8.1 Electronic Components ................ 54
8.1.1 AC and DC Supply ................ 54
8.1.2 H-Bridge ................ 54
8.1.3 7805 Voltage Regulator ................ 55
8.1.4 Actuator ................ 56
8.1.5 7404N IC ................ 57
8.1.6 DB25 Connector ................ 57
8.2 Motor Driving Circuit ................ 60
Problems and Troubleshooting ................ 61
Conclusion ................ 62
Future Suggestions ................ 63
Appendix A ................ 64
Appendix B ................ 66
Appendix C ................ 67
References ................ 69
List of Figures
Figure Number Page
Fig 2.1 Sampling Rate ................ 15
Fig 2.2 Retrieving ................ 15
Fig 2.3 Hamming Window ................ 18
Fig 2.4 Result of Hamming Window ................ 19
Fig 3.1 Block Diagram of Voiced/Unvoiced Classification ................ 21
Fig 3.2 Definition of Zero Crossing Rate ................ 22
Fig 3.3 Distribution of Zero Crossings for Voiced/Unvoiced Speech ................ 22
Fig 3.4 Computation of Short Time Energy ................ 24
Fig 5.1 Style of Neural Computation ................ 32
Fig 5.2 Network/Data Manager ................ 33
Fig 5.3 Create the Input Data ................ 33
Fig 5.4 Create the Network ................ 34
Fig 5.5 Hard Limit Transfer Function ................ 35
Fig 5.6 Linear (Purelin) Transfer Function ................ 35
Fig 5.7 Log Sigmoid Transfer Function ................ 36
Fig 5.8 View of Our Project Network ................ 37
Fig 5.9 Feed Forward Back Propagation ................ 38
Fig 5.10 Result of Training ................ 38
Fig 6.1 Adding Component to GUI ................ 44
Fig 6.2 Our Project GUI ................ 45
Fig 6.3 Start Button ................ 46
Fig 6.4 Trouble Shoot Button ................ 47
Fig 6.5 Training Button ................ 48
Fig 6.6 Simulation Button ................ 49
Fig 8.1 L298 ................ 55
Fig 8.2 Regulator Circuit ................ 55
Fig 8.3 DC Motor ................ 56
Fig 8.4 7404N IC ................ 57
Fig 8.5 DB25 Connector ................ 58
Fig 8.6 Motor Driving Circuit ................ 60
List of Tables
Table 8.1 Pin configuration of DB25 ................ 59
CHAPTER 1:
INTRODUCTION
1.2 OBJECTIVE:
The prime objective of our project is to develop a voice recognition system that can be used for command and control of any machine. For that, we have chosen a robotic arm, since it has an extensive set of commands which can be used to train our voice recognition system. Most voice recognition systems are developed for speech-to-text conversion [2]. Since these systems operate over a large vocabulary, they are less accurate and require more computation. Our aim was to develop voice recognition for the command and control of a
machine, while using less computational power through a non-conventional approach to voice recognition: instead of using conventional voice features (e.g. MFCC and LPC), we have chosen a set of five features that make a voice distinguishable enough to control a robotic arm. In our project, we achieved an accuracy of 80% for a single user. Voice command and control has many applications, from the military to the air force, and from telephony to people with disabilities. Voice command and control is used in military ground vehicles, where the commander can rotate and aim the gun at a target through his voice while sitting inside the safe environment of the tank [3]. This application of voice command and control was our major source of inspiration for this project, since a robotic arm and a machine gun differ only in the end effector.
interfacing circuitry and different electronic components used in the making of the robotic arm. At the end of the report, we have mentioned some problems and troubleshooting associated with our project. The report ends by giving the conclusion of the project and suggesting some future work.
CHAPTER 2:
2.1 INTRODUCTION
Front end and back end are generalized terms that refer to the initial and final stages of a process. The front end is responsible for collecting input in various forms from the user and processing it to conform to a specification the back end can use; it is thus an interface between the user and the back end. In our project, front-end processing has two phases. In the first phase, we perform sampling and recording, retrieving and normalizing, and finally framing and windowing. The second phase involves feature extraction [4].
Fig 2.1: Sampling rate [4]

The sampling rate in our project is 44100 samples per second.
2.2.2 Recording
The voice is recorded with the help of a microphone and saved on the hard drive at the sampling rate of 44100 samples per second.
2.2.3 Retrieving
Retrieving is done at the sample rate of 44100 samples per second through the wavread function of MATLAB.
Fig 2.2: Retrieving

After retrieving, a copy of the voice command signal is sent to each feature extraction block.
2.2.4 Normalizing
The process of normalizing is called pre-emphasis. In this process, every sample in the voice signal is divided by the maximum sample value. It is done to reduce the dynamic range of the signal and to make it spectrally flat [4].
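The project performs this step in MATLAB; as an illustration only, a minimal Python/NumPy sketch of the same peak-normalization step (function name and sample values are made up):

```python
import numpy as np

def normalize(signal):
    """Peak-normalize a voice signal: divide every sample by the
    maximum absolute sample value, mapping the signal into [-1, 1]."""
    peak = np.max(np.abs(signal))
    if peak == 0:          # silent signal: nothing to scale
        return signal
    return signal / peak

# Example with a small synthetic signal
x = np.array([0.2, -0.5, 0.1, 0.4])
y = normalize(x)
```

After this step the loudest sample has magnitude 1, so later feature thresholds do not depend on how loudly the command was spoken.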
2.2.5 Framing
Framing is also called frame blocking. In this step, the voice command signal is divided into a number of blocks [4]. Each frame (block) contains an equal number of voice samples.
2.2.6 Windowing
Windowing is done to remove the discontinuities at the start and at the end of each frame. We have employed the Hamming window for this purpose.
Fig 2.3: Hamming window [4]

These are specific examples from a general family of curves of the form [4]:

W(i) = a + (1 − a) cos(2πi / N)
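The framing and windowing steps described above can be sketched as follows; this is an illustrative Python/NumPy version (the project's actual implementation is in MATLAB, and the frame length here is arbitrary):

```python
import numpy as np

def frame_and_window(signal, frame_len):
    """Split a signal into equal, non-overlapping frames and apply a
    Hamming window to each frame to suppress edge discontinuities."""
    n_frames = len(signal) // frame_len          # drop any leftover samples
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return frames * np.hamming(frame_len)        # broadcast window over rows

x = np.ones(8)                  # toy "signal" of constant amplitude
windowed = frame_and_window(x, 4)
```

Because the window tapers toward zero at both ends of each frame, the windowed frames start and end smoothly, which is exactly the discontinuity removal the text describes.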
CHAPTER 3:
Feature Extraction
3.1 Introduction:
The second phase of front-end processing in our project is feature extraction. After the first phase, the voice samples are sent to each feature extraction block, and the voice features are extracted per frame of the voice signal. Five features of two types are extracted per frame: time-domain and spectral features. The five features we have used in our project are short time energy, zero crossing rate, spectral roll-off, spectral flux, and spectral centroid. The first two are time-domain features and the last three are spectral features. A statistic is applied to each feature block to generalize the feature's value over the entire signal, because the pattern recognition technique requires that each feature give a single value over the entire signal; the statistic therefore represents a general trend in the voice command signal. Since we use five features and every feature gives a single value, a vector of five values is given to the input of the neural network. For example, the zero crossing rate is first calculated per frame, and then its mean value over all frames is taken; this value is then ready for the input of pattern recognition. The statistics of the other features are calculated similarly.
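The collapse from per-frame feature tracks to a single five-element ANN input vector can be sketched as below. This is an illustrative Python/NumPy version; the feature names and toy values are made up, and the mean is used as the statistic for every feature, as in the zero-crossing-rate example above:

```python
import numpy as np

def feature_vector(per_frame_features):
    """Collapse per-frame feature tracks into one value per feature.
    `per_frame_features` maps a feature name to its per-frame values;
    the mean of each track gives one entry of the ANN input vector."""
    names = ["ste", "zcr", "rolloff", "flux", "centroid"]
    return np.array([np.mean(per_frame_features[n]) for n in names])

# Toy per-frame tracks for a 3-frame utterance
frames = {
    "ste":      [0.2, 0.4, 0.6],
    "zcr":      [10, 20, 30],
    "rolloff":  [0.8, 0.9, 1.0],
    "flux":     [0.1, 0.1, 0.1],
    "centroid": [5.0, 7.0, 9.0],
}
v = feature_vector(frames)   # 5-element vector fed to the neural network
```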
3.2 Features:
3.2.1 Zero-Crossing Rate
The zero-crossing rate is an important parameter for voiced/unvoiced classification. It is also often used as part of the front-end processing in automatic speech recognition systems. The zero crossing count is an indicator of the frequency at which the energy is concentrated in the signal spectrum [5].
The analysis for classifying the voiced/unvoiced parts of speech is illustrated in the block diagram in Fig 3.1.
In the context of discrete-time signals, a zero crossing is said to occur if successive samples have different algebraic signs. The rate at which zero crossings occur is a simple measure of the frequency content of a signal. The zero-crossing rate is a measure of the number of times, in a given time interval/frame, that the amplitude of the speech signal passes through a value of zero. Speech signals are broadband signals, and the interpretation of the average zero-crossing rate is therefore much less precise.
However, rough estimates of spectral properties can be obtained using a representation based on the short- time average zero-crossing rate.
Zn = (1/2) Σm |sgn[x(m)] − sgn[x(m − 1)]| w(n − m)   [5]

where sgn[x] is 1 for x ≥ 0 and −1 for x < 0, and w is the analysis window.
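A per-frame zero-crossing-rate computation can be sketched as follows; this is an illustrative Python/NumPy version (the test signals are made up), not the project's MATLAB code:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Count sign changes between successive samples, normalized by
    the frame length: a rough indicator of dominant frequency content."""
    signs = np.sign(frame)
    signs[signs == 0] = 1                      # treat exact zeros as positive
    return np.sum(signs[1:] != signs[:-1]) / len(frame)

frame = np.array([1.0, -1.0, 1.0, -1.0])       # alternates every sample
z = zero_crossing_rate(frame)
```

A rapidly alternating (high-frequency) frame yields a rate near 1, while a slowly varying (voiced, low-frequency) frame yields a rate near 0, matching the voiced/unvoiced behavior described above.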
3.2.2 Short Time Energy
The short time energy measurement of a speech signal can be used to distinguish voiced from unvoiced speech. Short time energy can also be used to detect the transition from unvoiced to voiced speech and vice versa [5]. The energy of voiced speech is much greater than the energy of unvoiced speech. Short-time energy can be defined as:

En = Σm [x(m) h(n − m)]²   [5]

The choice of the window determines the nature of the short-time energy representation. In our model, we used the Hamming window. The Hamming window gives much greater attenuation outside the passband than a comparable rectangular window.
h(n) = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1
h(n) = 0, otherwise
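The short-time energy computation described above can be sketched as follows; this is an illustrative Python/NumPy version (frame length and the loud/quiet stand-in signals are made up), not the project's MATLAB code:

```python
import numpy as np

def short_time_energy(signal, frame_len):
    """E_n: sum of squared, Hamming-windowed samples per frame.
    Voiced frames show much higher energy than unvoiced ones."""
    h = np.hamming(frame_len)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum((frames * h) ** 2, axis=1)

loud = 0.9 * np.ones(64)     # stand-in for a voiced segment
quiet = 0.05 * np.ones(64)   # stand-in for an unvoiced segment
E = short_time_energy(np.concatenate([loud, quiet]), 64)
```

Because energy is quadratic in amplitude, even a modest amplitude gap between voiced and unvoiced segments produces a large energy gap, which is what makes this feature a good voiced/unvoiced discriminator.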
Fig 3.4: Computation of short time energy [5]

The attenuation of this window is independent of the window duration. Increasing the length, N, decreases the bandwidth (Fig 5). If N is too small, En will fluctuate very rapidly and reflect the fine details of the waveform. If N is too large, En will change very slowly and thus will not adequately reflect the changing properties of the speech signal.

3.2.3 Spectral Roll Off
Description: The spectral roll-off point R determines where 85% of the window's energy is achieved [6]. It is used to distinguish voiced from unvoiced speech and music.
Resolution: window.
Parameters: SOUND file, start time, end time. Based on the sub-band RMS.
Formula:
Σ(k = 1..R) |X(k)| = 0.85 · Σ(k = 1..N) |X(k)|   [5]

where |X(k)| is the magnitude spectrum of the windowed frame.
3.2.4 Spectral Flux
Description: Determines the change in spectral energy distribution between two successive windows.
Resolution: window.
Parameters: SOUND file, start time, end time. Based on the sub-band RMS.

3.2.5 Spectral Centroid
Description: The spectral centroid is the balancing point of the sub-band energy distribution. It determines the frequency area around which most of the signal energy concentrates, and is thus closely related to the time-domain zero crossing rate feature. It is also frequently used as an approximation of a perceptual brightness measure.
Resolution: window.
Parameters: SOUND file, start time, end time, start sub-band number, end sub-band number. Based on the sub-band RMS.
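The three spectral features can be sketched together on the magnitude spectrum of a frame. The following is an illustrative Python/NumPy version (a simplified full-spectrum variant rather than the sub-band RMS formulation the report uses; the test tones are made up):

```python
import numpy as np

def spectral_features(frame, rolloff_frac=0.85):
    """Roll-off (bin below which `rolloff_frac` of the energy lies)
    and centroid (energy-weighted mean bin) of one frame's spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    energy = mag ** 2
    cum = np.cumsum(energy)
    rolloff = np.searchsorted(cum, rolloff_frac * cum[-1])
    centroid = np.sum(np.arange(len(mag)) * energy) / np.sum(energy)
    return rolloff, centroid

def spectral_flux(frame_a, frame_b):
    """Squared change between the normalized magnitude spectra of two
    successive frames."""
    a = np.abs(np.fft.rfft(frame_a)); a /= np.sum(a)
    b = np.abs(np.fft.rfft(frame_b)); b /= np.sum(b)
    return np.sum((a - b) ** 2)

t = np.arange(64) / 64.0
low = np.sin(2 * np.pi * 4 * t)     # tone concentrated at bin 4
high = np.sin(2 * np.pi * 20 * t)   # tone concentrated at bin 20
r_low, c_low = spectral_features(low)
r_high, c_high = spectral_features(high)
```

As expected, the higher-frequency tone has both a higher roll-off bin and a higher centroid, and the flux between two identical frames is zero while it is large between spectrally different frames.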
CHAPTER 4:
INTRODUCTION TO MACHINE LEARNING
4.2.1 Definition
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
4.2.2 Generalization
The core objective of a learner is to generalize from its experience. The training examples from its experience come from some generally unknown probability distribution, and the learner has to extract from them something more general, something about that distribution, which allows it to produce useful answers in new cases.
Supervised Learning generates a function that maps inputs to desired outputs. For example, in a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function.
Unsupervised Learning models a set of inputs, like clustering. Semi-Supervised Learning combines both labeled and unlabeled examples to generate an appropriate function or classifier.
Reinforcement Learning learns how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.
Transduction tries to predict new outputs based on training inputs, training outputs, and test inputs.
Learning to Learn learns its own inductive bias based on previous experience.
Back propagation
Bayesian statistics
Case-based reasoning
Decision trees
Inductive logic programming
Gaussian process regression
Group method of data handling (GMDH)
Learning automata
Minimum message length (decision trees, decision graphs, etc.)
Lazy learning
Instance-based learning
Probably approximately correct (PAC) learning
Ripple down rules, a knowledge acquisition methodology
Symbolic machine learning algorithms
Sub-symbolic machine learning algorithms
Support vector machines
Random forests
Ensembles of classifiers
Linear classifiers
CHAPTER 5:
5.1 Introduction
An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process [6].
necessary information to "discover" the optimal operating point. There is a style in neural computation that is worth describing.
Fig 5.1: Style of Neural computation [8]

An input is presented to the neural network and a corresponding desired or target response is set at the output. An error is composed from the difference between the desired response and the system output. This error information is fed back to the system to adjust the system parameters in a systematic fashion (the learning rule). The process is repeated until the performance is acceptable [8]. It is clear from this description that the performance hinges heavily on the data. In artificial neural networks, the designer chooses the network topology, the performance function, the learning rule, and the criterion to stop the training phase, but the system automatically adjusts the parameters. At present, artificial neural networks are emerging as the technology of choice for many applications, such as pattern recognition, prediction, system identification, and control.
In the ANN toolbox described above, the user opens the tool by typing nntool at the MATLAB command line.
Fig 5.2: Network/data manager in nntool

The user selects data and target values via New, or, if the data is in a file, imports it via Import. The target is the value against which we match our output value.
Then the user creates a network, in which the training function, transfer function, number of layers, and number of neurons in each layer are selected.
5.3.1 Transfer Functions: Three of the most commonly used functions are shown below.
The hard-limit transfer function shown above limits the output of the neuron to either 0, if the net input argument n is less than 0, or 1, if n is greater than or equal to 0.
The linear transfer function (purelin) simply returns the value passed to it: the output of the neuron equals its net input.
The sigmoid transfer function shown below takes the input, which may have any value between plus and minus infinity, and squashes the output into the range 0 to 1.
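These three transfer functions are easy to state in code. As an illustration, a Python/NumPy sketch of the behavior of MATLAB's hardlim, purelin, and logsig (the function names mirror MATLAB's; this is not toolbox code):

```python
import numpy as np

def hardlim(n):
    """Hard-limit: 0 for n < 0, else 1."""
    return (n >= 0).astype(float)

def purelin(n):
    """Linear: output equals the net input."""
    return n

def logsig(n):
    """Log-sigmoid: squashes any input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

n = np.array([-2.0, 0.0, 3.0])   # sample net-input values
```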
In our project we use three layers of neurons: in the first layer we select five neurons for the five features of voice, in the second (hidden) layer we select twenty neurons, and in the third layer one neuron is selected for the output [11].
In our project, we used feed-forward back propagation. The view of our network is given below.
5.3.2 Feed-Forward Back Propagation: A feed forward neural network is an artificial neural network where connections between the units do not form a directed cycle. This is different from recurrent neural networks. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) and to the output nodes. There are no cycles or loops in the network [10]. In Back-propagation output values are compared with the correct answer to compute the value of some predefined error-function. By various techniques, the error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small. In this case, one would say that the network has learned a certain target function.
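The feed-forward pass and error back-propagation described above can be sketched end to end. The following is an illustrative Python/NumPy version with the project's 5-20-1 layer sizes; the toy training task (output 1 when the first input feature is positive) and all hyperparameters are made up for demonstration, not taken from the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes mirror the project's network: 5 inputs, 20 hidden, 1 output.
W1 = rng.normal(0.0, 0.5, (20, 5)); b1 = np.zeros((20, 1))
W2 = rng.normal(0.0, 0.5, (1, 20)); b2 = np.zeros((1, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(X):
    """Feed-forward pass: input -> hidden -> output, no cycles."""
    A1 = sigmoid(W1 @ X + b1)
    return A1, sigmoid(W2 @ A1 + b2)

# Toy training set: target is 1 when the first "feature" is positive.
X = rng.normal(0.0, 1.0, (5, 40))
T = (X[0:1, :] > 0).astype(float)

lr = 1.0
for epoch in range(3000):                 # one epoch = one pass over the set
    A1, Y = forward(X)
    E = Y - T                             # error fed back through the net
    dZ2 = E * Y * (1.0 - Y)               # output-layer delta (sigmoid deriv)
    dZ1 = (W2.T @ dZ2) * A1 * (1.0 - A1)  # hidden-layer delta
    W2 -= lr * (dZ2 @ A1.T) / X.shape[1]
    b2 -= lr * dZ2.mean(axis=1, keepdims=True)
    W1 -= lr * (dZ1 @ X.T) / X.shape[1]
    b1 -= lr * dZ1.mean(axis=1, keepdims=True)

_, Y = forward(X)
accuracy = float(np.mean((Y > 0.5) == (T > 0.5)))
```

Each pass through the loop is one epoch: the error is computed at the output, propagated backward through the weights, and each weight is nudged to reduce the error, exactly the cycle the text describes.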
5.3.3 Epoch:
An entire pass through all of the input training vectors is called an epoch: one iteration through the process of providing the network with an input and updating the network's weights. When such an entire pass of the training set has occurred without error, training is complete [11]. Typically, many epochs are required to train the neural network.
5.3.4 Training of ANN:
If sim and learnp are used repeatedly to present inputs to a perceptron, and to change the perceptron's weights and biases according to the error, the perceptron will eventually find weight and bias values that solve the problem, given that the perceptron can solve it. Each traverse through all of the training input and target vectors is called a pass [12]. The function train carries out such a loop of calculation. In each pass, train proceeds through the specified sequence of inputs, calculating the output, error, and network adjustment for each input vector in the sequence as the inputs are presented.
CHAPTER 6:
6.1 INTRODUCTION
In computing a graphical user interface (GUI, sometimes pronounced gooey) is a type of user interface that allows users to interact with electronic devices with images rather than text commands. GUIs can be used in computers, hand-held devices such as MP3 players, portable media players or gaming devices, household appliances and office equipment. A GUI represents the information and actions available to a user through graphical icons and visual indicators such as secondary notation, as opposed to text-based interfaces, typed command labels or text navigation. The actions are usually performed through direct manipulation of the graphical elements [13].
6.2.1 How Does a GUI Work? In the GUI described above, the user selects a data set from the pop-up menu, and then clicks one of the plot type buttons. The mouse click invokes a function that plots the selected data in the axes. Most GUIs wait for their user to manipulate a control, and then respond to each action in turn. Each control, and the GUI itself, has one or more user-written routines (executable MATLAB code) known as callbacks, named for the fact that they "call back" to MATLAB to ask it to do things. The execution of each callback is triggered by a particular user action such as pressing a screen button, clicking a mouse button, selecting a menu item, typing a string or a numeric value, or passing the cursor over a component. The GUI then responds to these events. You, as the creator of the GUI, provide callbacks, which define what the components do to handle events. This kind of programming is often referred to as event-driven programming. In the example, a button click is one such event. In event-driven programming, callback execution is asynchronous, that is, it is triggered by events external to the software. In the case of MATLAB GUIs, most events are user interactions with the GUI, but the GUI can respond to other kinds of events as well, for example, the creation of a file or connecting a device to the computer. You can code callbacks in two distinct ways:
As MATLAB language functions stored in files
As strings containing MATLAB expressions or commands (such as 'c = sqrt(a*a + b*b);' or 'print')
Using functions stored in code files as callbacks is preferable to using strings, as functions have access to arguments and are more powerful and flexible. MATLAB scripts (sequences of statements stored in code files that do not define functions) cannot be used as callbacks. Although you can provide a callback with certain data and make it do anything you want, you cannot control when callbacks will execute. That is, when your GUI is being used, you have no control over the sequence of events that trigger particular callbacks or what other callbacks might still be running at those times. This distinguishes event-driven programming from other types of control flow, for example, processing sequential data files. 6.2.2 Ways to Build MATLAB GUIs: A MATLAB GUI is a figure window to which you add user-operated controls. You can select, size, and position these components as you like. Using callbacks you can make the components do what you want when the user clicks or manipulates them with keystrokes. You can build MATLAB GUIs in two ways: Use GUIDE (GUI Development Environment), an interactive GUI construction kit. Create code files that generate GUIs as functions or scripts (programmatic GUI construction).
6.3.2 Opening GUIDE: There are several ways to open GUIDE from the MATLAB Command line. You can also right-click a FIG-file in the Current Folder Browser and select Open in GUIDE from the context menu. When you right-click a FIG-file in this way, the figure opens in the GUIDE Layout Editor, where you can work on it.
All the tools in the palette have tool tips. Setting a GUIDE preference lets you display the palette in GUIDE with tool names or just their icons. See GUIDE Preferences for more information.
Create toolbars
Modify the appearance of components
Set tab order
View a hierarchical list of the component objects
Set GUI options
This is the GUI of our project. It consists of the following push buttons:
1. Start
2. Troubleshoot
3. Help
4. Training
5. Simulation mode
6.4.1 Start:
Pressing the Start button opens a dialogue box containing a Speak push button, which allows the user to speak within a one-second window.
6.4.2 Troubleshoot:
Pressing the Troubleshoot button opens a dialogue box consisting of the following buttons:
1. Rotary motor
2. Gripper open
3. Gripper close
4. Back to main menu
6.4.3 Training:
Pressing the Training button opens a dialogue box consisting of the following buttons:
1. Enter samples
2. Calculate feature vectors
3. Train ANN
4. Back to main menu
6.4.4 Simulation:
Pressing the Simulation button opens a dialogue box consisting of the following buttons:
1. Back to main menu
2. Simulate
CHAPTER 7:
Robotic Arm
7.1 INTRODUCTION
A robotic arm is a robotic manipulator, usually programmable, with functions similar to a human arm. The links of such a manipulator are connected by joints allowing either rotational motion (as in an articulated robot) or translational (linear) displacement. The links of the manipulator can be considered to form a kinematic chain. The business end of the kinematic chain of the manipulator is called the end effector, and it is analogous to the human hand. The end effector can be designed to perform any desired task, such as welding, gripping, or spinning, depending on the application [14].
Cartesian Robot / Gantry Robot: Used for pick and place work, application of sealant, assembly operations, handling machine tools, and arc welding. It's a robot whose arm has three prismatic joints, whose axes are coincident with a Cartesian coordinate system.
Cylindrical Robot: Used for assembly operations, handling at machine tools, spot welding, and handling at die-casting machines. It's a robot whose axes form a cylindrical coordinate system.
Spherical/Polar Robot: Used for handling at machine tools, spot welding, die-casting, fettling machines, gas welding and arc welding. It's a robot whose axes form a polar coordinate system.
SCARA Robot: Used for pick and place work, application of sealant, assembly operations and handling machine tools. It's a robot which has two parallel rotary joints to provide compliance in a plane [15].
Articulated Robot: Used for assembly operations, die-casting, fettling machines, gas welding, arc welding and spray painting. It's a robot whose arm has at least three rotary joints.
Parallel Robot: One of the uses of parallel Robot is mobile platform handling cockpit flight simulators. It's a robot whose arms have concurrent prismatic or rotary joints.
Our voice controlled robotic arm is a fixed-sequence cylindrical robot: a robot whose motion along each axis is limited and whose motion between these axes cannot be stopped. This robot performs operations according to preset information that cannot be easily changed [14].
In our project, the end effector is a mechanical gripper. It consists of just two fingers that can be opened and closed to pick up and release a range of small objects. We have used a thread and two small springs. When the motor rotates in one direction, the thread wraps around its shaft; as a result, the springs stretch and the gripper fingers open. When the motor rotates in the other direction, the thread unwraps; as a result, the springs compress and the gripper closes.
7.4.4 Repeatability: A measurement may be said to be repeatable when its variation is smaller than some agreed limit. The repeatability of this robotic arm is 1.5. 7.4.5 Axis of Rotation: Our robotic arm can move in cylindrical coordinates with a constant z parameter. It can rotate 270 degrees in circular motion, and its end effector can move 12 inches linearly.
CHAPTER 8:
ELECTRONIC COMPONENTS OF ROBOTIC ARM
8.1.3 7805 Voltage Regulator
The 7805 voltage regulator employs built-in current limiting, thermal shutdown, and safe-operating-area protection, which make it virtually immune to damage from output overloads.
8.1.4 Actuators
Solenoid-actuated valves and a DC motor are used as actuators. The output voltage of the first relay is 12 volts, with 5 volts applied to its coil. The relays connected on the relay board have an output voltage of 220 volts, and they are energized by the first group of relays, which have an output voltage of 12 volts. The output of the second group of relays is used to actuate the solenoid-actuated valves [16].
Fig 8.4: Diagram of the 7404N IC [16]

We have used the 7404 IC as a current buffer. It is a NOT-gate (inverter) IC, which takes an input (1 or 0) and provides its complement at the output; it contains the inverters shown in the figure above.
8.1.6 DB25 Connector:
The DB25 connector is an analog 25-pin plug of the D-subminiature connector family (D-sub or Sub-D). The D-subminiature, or D-sub, is a common type of electrical connector used particularly in computers. The part containing pin contacts is called the male connector or plug, while the part containing socket contacts is called the female connector or socket [18].
The DB25 is also used for parallel port connections; it was originally used to connect printers and, as a result, is sometimes known as a "printer port". DB25 serial ports on computers generally have male connectors, while parallel port connectors are generally female.
Problems
Selection of optimal software.
Selection of mathematical techniques and codes regarding the software's toolbox.
Designing the layout of the physical model.
Design of appropriate electronic hardware.
Unavailability of components while designing the hardware.
Selection of an optimal design tool for modeling and simulating the hardware components.
Troubleshooting
We explored various software packages by searching the web and consulting our teachers, and finally decided to work in MATLAB.
Among several techniques for converting voice into the desired action, we chose to extract voice features and apply a neural network.
We considered several factors, such as weight, and then chose a suitable material for the fabrication of our physical model.
Among several design packages, we adopted PROTEUS for designing the integration of the electronic components.
Conclusion
This report explains the implementation of a voice-controlled robotic arm. The three phases of voice recognition, along with the basics of machine learning relevant to the project, were described. The GUI designed for the project was explained, and the robotic arm designed and developed for the project was described in detail.

Voice recognition consists of three phases: front-end processing, feature extraction, and pattern recognition. Front-end processing consists of recording, sampling, retrieving, framing, and windowing. The second phase, feature extraction, extracts the distinguishing features of the voice. The third phase, pattern recognition, is a major concern of machine learning; the approach implemented in our project is an artificial neural network (ANN), specifically a feed-forward network trained with back-propagation.

After voice recognition, the report explains the design and development of the robotic arm. The arm designed for our project is a fixed-sequence robotic arm with a gripper as its end effector. To establish coordination between the robotic arm and the computer, and between the human and the computer, a GUI was designed.

Our implementation of voice recognition is 80% accurate when tested 100 times in a given environment. Accuracy degrades with changes in environment due to noise, and also with changes in microphone and training samples.
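The front-end processing steps summarized above (framing a normalized, sampled signal and applying a window before feature extraction) can be sketched in Python with NumPy. This is an illustrative re-implementation, not the project's MATLAB code; function names are assumptions:

```python
import numpy as np

def frame_and_window(signal, frame_len, step):
    """Split a 1-D sampled signal into overlapping frames and apply a Hamming window."""
    signal = np.asarray(signal, dtype=float)
    signal = signal / np.max(np.abs(signal))  # normalize, as in the project code
    num_frames = (len(signal) - frame_len) // step + 1
    window = np.hamming(frame_len)
    frames = np.empty((num_frames, frame_len))
    for i in range(num_frames):
        start = i * step
        frames[i] = signal[start:start + frame_len] * window
    return frames

def short_time_energy(frames):
    """Mean squared amplitude per frame -- the short-time energy feature."""
    return np.mean(frames ** 2, axis=1)
```

Each feature (short-time energy, ZCR, spectral roll-off, flux, centroid) is then computed per frame over this matrix.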
Future suggestions
We have implemented the voice recognition on a PC. It could also be implemented on a convenient microcontroller. The most suitable microcontroller for this purpose is the dsPIC33, since it has a built-in ADC and a three-pin port for voice recording and playback; moreover, it has enough processing power for front-end processing and pattern recognition.

Our voice recognition system somewhat depends on the environment because of noise (voices other than voice commands). Although we have performed noise removal in feature extraction, where features like the ZCR detect the major voice-activity region in a given sample, the system could be made less dependent on noise by implementing noise-removal algorithms in front-end processing.

We are also interested in implementing our voice recognition system in a voice-controlled wheelchair. This would require a battery source along with the implementation of voice recognition on the microcontroller.
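A simple energy-threshold voice-activity gate of the kind suggested above could look like the following Python sketch; the threshold value and function name are illustrative assumptions, not part of the project:

```python
import numpy as np

def voice_activity_mask(signal, frame_len, step, threshold=0.02):
    """Mark frames whose mean squared amplitude exceeds a fixed threshold.

    Frames below the threshold are treated as background noise and can be
    dropped before feature extraction.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal / np.max(np.abs(signal))
    num_frames = (len(signal) - frame_len) // step + 1
    mask = np.zeros(num_frames, dtype=bool)
    for i in range(num_frames):
        frame = signal[i * step:i * step + frame_len]
        mask[i] = np.mean(frame ** 2) > threshold
    return mask
```

A more robust front end would adapt the threshold to an estimate of the ambient noise floor rather than fixing it in advance.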
Appendix A
MATLAB CODE FOR FEATURE EXTRACTION
CODE OF SHORT TIME ENERGY:
%variable E contains the value of short-time energy
function E = ShortTimeEnergy(signal, windowLength, step)
%normalizing; the vector signal contains the samples of the voice command
signal = signal / max(max(signal));
curPos = 1; %start computation on voice samples from index 1
L = length(signal);
numOfFrames = floor((L - windowLength)/step) + 1;
%calculating E per frame
E = zeros(numOfFrames, 1);
for i = 1:numOfFrames
    window = signal(curPos:curPos + windowLength - 1);
    E(i) = (1/windowLength) * sum(abs(window.^2));
    curPos = curPos + step;
end
CODE OF SPECTRAL FLUX:
%variable F contains the value of spectral flux
function F = SpectralFlux(signal, windowLength, step, fs)
%normalizing; the vector signal contains the samples of the voice command
signal = signal / max(abs(signal));
curPos = 1; %start computation on voice samples from index 1
L = length(signal);
numOfFrames = floor((L - windowLength)/step) + 1;
H = hamming(windowLength);
%calculating F per frame
F = zeros(numOfFrames, 1);
for i = 1:numOfFrames
    window = H .* signal(curPos:curPos + windowLength - 1);
    FFT = abs(fft(window, 2*windowLength));
    FFT = FFT(1:windowLength);
    FFT = FFT / max(FFT);
    if i > 1
        F(i) = sum((FFT - FFTprev).^2);
    else
        F(i) = 0;
    end
    curPos = curPos + step;
    FFTprev = FFT;
end
CODE OF SPECTRAL ENTROPY:
%variable En contains the value of spectral entropy
function En = SpectralEntropy(signal, windowLength, windowStep, fftLength, numOfBins)
%normalizing; the vector signal contains the samples of the voice command
signal = signal / max(abs(signal));
curPos = 1; %start computation on voice samples from index 1
L = length(signal);
numOfFrames = floor((L - windowLength)/windowStep) + 1;
H = hamming(windowLength);
%calculating En per frame
En = zeros(numOfFrames, 1);
h_step = fftLength / numOfBins;
for i = 1:numOfFrames
    window = H .* signal(curPos:curPos + windowLength - 1);
    fftTemp = abs(fft(window, 2*fftLength));
    fftTemp = fftTemp(1:fftLength);
    S = sum(fftTemp);
    for j = 1:numOfBins
        x(j) = sum(fftTemp((j-1)*h_step + 1 : j*h_step)) / S;
    end
    En(i) = -sum(x .* log2(x));
    curPos = curPos + windowStep;
end
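For reference, the per-frame spectral-entropy computation above can be expressed compactly in Python/NumPy. Note the small epsilon added to avoid log2(0) when a band is empty -- an assumption not present in the MATLAB listing, where an all-zero band would produce NaN:

```python
import numpy as np

def spectral_entropy(frame, fft_length=256, num_bins=8, eps=1e-12):
    """Spectral entropy of one windowed frame.

    The magnitude spectrum is grouped into num_bins equal bands; the
    normalized band energies form a probability distribution whose
    Shannon entropy (base 2) is returned.
    """
    spectrum = np.abs(np.fft.fft(frame, 2 * fft_length))[:fft_length]
    h_step = fft_length // num_bins
    bands = spectrum[:h_step * num_bins].reshape(num_bins, h_step).sum(axis=1)
    p = bands / (bands.sum() + eps)
    return float(-np.sum(p * np.log2(p + eps)))
```

A flat spectrum spreads energy evenly over the bands and yields the maximum entropy, log2(num_bins); a single dominant band drives the entropy toward zero.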
Appendix B
MATLAB CODE FOR COMPUTING STATISTICS:
%FF vector contains the statistical values of the features of the voice command
function FF = computeAllStatistics(fileName, win, step)
%win is the window size and step is the frame step (both in seconds)
[x, fs] = wavread(fileName);
E = ShortTimeEnergy(x, win*fs, step*fs);
Z = zcr(x, win*fs, step*fs, fs);
R = SpectralRollOff(x, win*fs, step*fs, 0.80, fs);
C = SpectralCentroid(x, win*fs, step*fs, fs);
F = SpectralFlux(x, win*fs, step*fs, fs);
FF(1) = statistic(Z, 1, length(Z), 'std'); %Z is the zero-crossing-rate feature
FF(2) = statistic(R, 1, length(R), 'std');
FF(3) = statistic(C, 1, length(C), 'std');
FF(4) = statistic(F, 1, length(F), 'std');
FF(5) = statistic(E, 1, length(E), 'stdbymean');
Appendix C
for i = 1:1
    file = sprintf('%s%d.wav', 'c', i);
    input('You have 1 second to say your command. Press enter when ready to record--> ');
    y = wavrecord(44100, 44100); %record 1 second at 44.1 kHz
    sound(y, 44100);
    wavwrite(y, 44100, file);
end
load net1 %loading the net1 neural network object trained for pattern recognition
%file is the string containing the name of the file in which the given voice
%command is saved
file = sprintf('%s%u.wav', 'c', 1);
%ff1 is the feature vector containing the feature values of the voice command
ff1 = computeAllStatistics(file, win, step);
ff1 = ff1';
%simulating net1 for the given voice command; the result is saved in vector a
a = sim(net1, ff1);
%variable b contains the result from the ANN for the given voice command
b = a(1,1)
if (b > 1)
    %function call for movement of the robotic arm
    rotaryfornechay
end
if (b < 1)
    %function call for movement of the robotic arm
    rotaryforuper
end
%hwlines are the hardware pins of the parallel port, added to the object dio
dio = digitalio('parallel', 'LPT1');
hwlines = addline(dio, 0:7, 'out');
%bvdata is the logical 1/0 pattern written to the port
bvdata = logical([0 1 0 0 0 0 0 0]);
putvalue(dio, bvdata);
pause(4);
%reset all lines to 0
bvdata = logical([0 0 0 0 0 0 0 0]);
putvalue(dio, bvdata);
References
[1] C. Bishop, Neural Networks for Pattern Recognition, 3rd Edition, 1996.
[2] P. D. Polur, R. Zhou, J. Yang, F. Adnani, R. S. Hobson, "Speech Recognition Using Artificial Neural Networks," Proceedings of the 23rd Annual EMBS International Conference, Istanbul, Turkey, October 25-28, 2001.
[3] R. Zainuddin, O. O. Khalifa, "Neural Networks Used for Speech Recognition," Nineteenth National Radio Science Conference, Alexandria, March 19-21, 2002.
[4] L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition.
[5] S. McAdams, "Perspectives on the Contribution of Timbre to Musical Structure," Computer Music Journal, 23(3):85-102, 1999.
[6] E. Davalo, P. Naim, Neural Networks, 3rd Edition, 1989.
[7] Feed-forward neural network. http://en.wikipedia.org/wiki/Feedforward_neural_network
[8] I. Aleksander, H. Morton, An Introduction to Neural Computing, 2nd Edition, 1992.
[9] Neural networks. http://www.emsl.pnl.gov:2080/docs/cie/neural/neural.homepage.html
[10] C. Bishop, Neural Networks for Pattern Recognition, 3rd Edition, 1996.
[11] H. Demuth, M. Beale, Neural Network Toolbox, 4th Edition, July 2002.
[12] M. T. Hagan, H. B. Demuth, "Neural Networks for Control," Proceedings of the 1999 American Control Conference, San Diego, CA, 1999, pp. 1642-1656.
[13] Graphical user interface. http://en.wikipedia.org/wiki/Graphical_user_interface
[14] S. B. Niku, Introduction to Robotics, 2nd Edition, 2001.
[15] Robotic arm. http://en.wikipedia.org/wiki/Robotic_arm
[16] B. L. Theraja, Electrical Machines, 8th Edition; 4-volume set by Tony Burns, Stephen...
[17] H bridge. http://en.wikipedia.org/wiki/H_bridge
[18] DB25 connector. http://www.nullmodem.com/DB-25.htm