
Onkar Singh* et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 7, Issue No. 1, 098 - 102

Optimization of Feedforward Neural Network for Audio Classification Systems


Onkar Singh
Department of Electronics and Communication Engineering, Rayat Institute of Engineering & Information Technology, SBS Nagar, India
aujlaonkarsingh@gmailmail.com

Neeru Singla
Department of Electronics and Communication Engineering, Rayat Institute of Engineering & Information Technology, SBS Nagar, India
neerusingla99@gmail.com

Manish Dev Sharma
Department of Physics, Panjab University, Chandigarh, India
mds@pu.ac.in

Abstract: In this paper, we classify audio using a feedforward neural network and measure its suitability in terms of classification accuracy and the time taken to classify. We investigate and analyze the system to determine how many layers and neurons are most suitable for classifying audio wave files. An accuracy above 99% is reported.

I. INTRODUCTION

Classification of audio signals according to their content has been a major concern in recent years. There have been many studies on audio content analysis, using different features and different methods. Audio signals are baseband, one-dimensional signals. General audio consists of a wide range of sound phenomena such as music, sound effects, environmental sounds, and speech and non-speech signals. As a first step, audio classification requires extracting certain features from the input sound sample, which may include the root-mean-square amplitude envelope, constant-Q transform frequency spectrum, multidimensional scaling analysis trajectories, cepstral coefficients, spectral centroid, and the presence of vibrato [1]. There are two main approaches to content-based classification from previously extracted features [2]: one uses deterministic methods and the other uses probabilistic techniques. Despite many research efforts, high-accuracy audio classification has been achieved only for simple cases such as speech/music discrimination. Previous works have presented a theoretical framework and application of automatic audio content analysis using perceptual features, and an audio classifier for radio broadcast based on simple features such as zero-crossing rate and short-time energy [3].

Researchers have conducted many experiments with different classification models, including GMM (Gaussian Mixture Model) [4], BP-ANN (Back Propagation Artificial Neural Network), and KNN (K-Nearest Neighbour) [5]. Other work has enhanced audio classification algorithms, for example by pre-classifying audio recordings into speech, silence, laughter, and non-speech sounds in order to segment discussion recordings in meetings [6]. Taxonomic structures also help to enhance classification performance. Pitch tracking methods have also been introduced to discriminate audio recordings into more classes, such as songs and speech over music, with a heuristic-based model.

Figure 1 is a block diagram of the classification system. An audio file stored in WAV format is passed to a feature extraction function, which calculates numerical features that characterize the sample. When training the system, this feature extraction process is performed on many different input WAV files to create a matrix of column feature vectors [7]. This matrix is preprocessed to reduce the number of inputs to the neural network and then sent to the neural network for training. After training, single column vectors can be fed to the preprocessing block, which processes them in the same manner as the training vectors, and are then classified by the neural network.

ISSN: 2230-7818
@ 2011 http://www.ijaest.iserp.org. All rights Reserved.


Figure 1. Block diagram of audio classification system

II. SYSTEM SETUP

This section describes the setup of the digital audio classification system. The system is composed primarily of the blocks shown in Figure 1 and was developed in the Matlab environment. Matlab code can be provided upon request. Data for training and testing the system was taken from ten compact discs. The tracks on each of these CDs were extracted, converted to WAV format, and divided into segments. WAV files are taken as the input files. MP3 files can also be used as input if desired: the system presented in this paper can easily be adapted by prepending an MP3-to-WAV converter.

A. Feature extraction

Discriminative features contribute greatly to the audio classification task. To improve the accuracy of classification and segmentation of an audio sequence, it is important to choose features that properly represent its temporal and spectral characteristics. In our system, we select the mel-frequency cepstral coefficients (MFCC), which have proved effective for speech and music discrimination [7].

B. Data preprocessing

The feature vectors returned by the feature extraction block were preprocessed before being input to the neural network. Two types of preprocessing were performed: one to scale the data to the range -1 to 1, and one to reduce the length of the input vector. The data was divided into three sets, one for training, one for validation, and one for testing. The preprocessing parameters were determined using the matrix containing all feature vectors used for training and validation. For testing, these same parameters were used to preprocess the test feature vectors before passing them to the trained neural network.

The first preprocessing function used was premnmx, which scales the data so that the minimum and maximum of each feature across all training and validation feature vectors are -1 and 1. Premnmx returns two parameters, minp and maxp, which were used with the function tramnmx to preprocess the test feature vectors.

The second preprocessing function used was prepca, which performs principal component analysis on the training and validation feature vectors. Principal component analysis is used to reduce the dimensionality of the feature vectors from a length of 124 to a length more manageable by the neural network. It does this by orthogonalizing the features across all feature vectors, ordering them so that those with the most variation come first, and then removing those that contribute least to the variation. Prepca was used with a value of 0.001, so that only those features that contribute to 99.9% of the variation were used. This procedure reduced the length of the feature vectors by one half. Prepca returns the matrix transMat, which is used with the function trapca to perform the same principal component analysis procedure on the test feature vectors as was performed on the training and validation feature vectors. This was done before passing the test feature vectors to the trained neural network.

C. Neural Network

A three-layer feedforward backpropagation neural network, shown in the figure, was used for classifying the feature vectors [6]. By trial and error, an architecture consisting of 20 adalines in the input layer, 10 adalines in the middle layer, and 3 adalines in the output layer was found to provide good performance. The transfer function used for all adalines was a tangent sigmoid, tansig. The Levenberg-Marquardt backpropagation algorithm, trainlm, was used to train the neural network.
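The 20-10-3 architecture with a tangent sigmoid transfer function, described under "C. Neural Network", can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' Matlab code: the weights are random placeholders, training with Levenberg-Marquardt is not shown, and the input length of 62 is an assumption derived from the statement that the 124-element vectors were reduced by one half. All function names are hypothetical.

```python
import math
import random


def tansig(n):
    """Tangent sigmoid: 2 / (1 + exp(-2n)) - 1, which equals tanh(n)."""
    return math.tanh(n)


def init_layer(n_inputs, n_units, rng):
    """Return (weights, biases) filled with small random placeholder values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_units)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_units)]
    return weights, biases


def build_network(n_inputs, layer_sizes=(20, 10, 3), seed=0):
    """Build an untrained network with the layer sizes quoted in the text."""
    rng = random.Random(seed)
    layers = []
    for n_units in layer_sizes:
        layers.append(init_layer(n_inputs, n_units, rng))
        n_inputs = n_units  # output of this layer feeds the next one
    return layers


def network_forward(x, layers):
    """Propagate one feature vector through every tansig layer in turn."""
    for weights, biases in layers:
        x = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


# Assumption: 62 inputs (124 features halved by the dimensionality reduction).
layers = build_network(n_inputs=62)
features = [0.0] * 62  # hypothetical preprocessed feature vector
scores = network_forward(features, layers)

# The class is taken as the index of the largest of the three outputs.
predicted_class = max(range(len(scores)), key=lambda i: scores[i])
```

Because every unit uses tansig, each of the three outputs lies between -1 and 1, which matches the scaling applied to the inputs.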

III. EXPERIMENTAL RESULTS

This section discusses the results of training and testing the classification system.


A. Classification

Here we build a classifier that predicts the correct classification of an audio WAV file based on features extracted from the file. Neural networks have proved themselves proficient classifiers and are particularly well suited to non-linear problems. Given the non-linear nature of real-world phenomena such as audio classification, a neural network is a good candidate for solving the problem [6]. The parameters act as inputs to the neural network, and the predicted classification is the target. Given an input consisting of the measured parameter values of a WAV file, the neural network is expected to identify whether the audio classification is correct or not. This is achieved by presenting previously recorded parameters of WAV files to the neural network and then tuning it to produce the desired target outputs. This process is called neural network training.

The samples are divided into three sets:
1. Training
2. Validation
3. Test

The training set is used to teach the network. Training continues as long as the network continues improving on the validation set. The test set provides a completely independent measure of network accuracy. The trained neural network is tested with the test samples, and the network response is compared against the desired target response to build the classification matrix, which provides a comprehensive picture of the system performance.

The next step was to create the neural network discussed above in the system setup section. The training function used was the Levenberg-Marquardt backpropagation algorithm, trainlm. Using these parameters, we can classify the audio files given as inputs. Plots for the classification of some audio files are shown in the figures, which plot performance against epochs. After training, the system was tested using the data set reserved for testing.
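The classification matrix mentioned above, built by comparing network responses with target responses, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function name is hypothetical and the six labels are made-up toy values, not data from the experiments.

```python
def classification_matrix(targets, predictions, n_classes):
    """Count outcomes: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_class, predicted in zip(targets, predictions):
        matrix[true_class][predicted] += 1
    return matrix


# Toy labels for three classes (illustrative only).
targets = [0, 0, 1, 1, 2, 2]
predictions = [0, 0, 1, 2, 2, 2]

matrix = classification_matrix(targets, predictions, n_classes=3)
for row in matrix:
    print(row)
```

Entries on the diagonal are correct classifications. Off-diagonal entries show which classes are confused with each other, which a single accuracy figure hides.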
Before passing the test feature vectors to the trained neural network, data preprocessing was performed using the saved parameters from the preprocessing of the training data. Figure 3 shows the plot of performance against epochs with 97.34% efficiency. The performance reached 0.0040275 before a validation stop occurred. Figures 4, 5, 6, and 7 also show the plot of performance against epochs, with efficiencies of 95.5%, 97%, 98.2%, and 99.15%, respectively. In Figures 4, 5, 6, and 7, the performance reached 0.000785268, 5.51221e-005, 0.0032053, and 0.00202006, respectively.
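Reusing saved preprocessing parameters on test data, as described above for the min-max scaling step, can be sketched as follows. This plain-Python illustration mimics the idea of computing per-feature minima and maxima once and applying them later. The function names and toy numbers are hypothetical, and mapping a constant feature to 0 is an assumption of this sketch.

```python
def fit_minmax(vectors):
    """Return the per-feature minima and maxima over a list of feature vectors."""
    n_features = len(vectors[0])
    mins = [min(v[i] for v in vectors) for i in range(n_features)]
    maxs = [max(v[i] for v in vectors) for i in range(n_features)]
    return mins, maxs


def apply_minmax(vector, mins, maxs):
    """Scale one vector with saved parameters so the fitted range maps to [-1, 1]."""
    scaled = []
    for x, lo, hi in zip(vector, mins, maxs):
        if hi == lo:
            scaled.append(0.0)  # assumption: a constant feature maps to 0
        else:
            scaled.append(2.0 * (x - lo) / (hi - lo) - 1.0)
    return scaled


# Toy training-and-validation vectors with two features each (illustrative only).
train_val = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
mins, maxs = fit_minmax(train_val)
scaled_train = [apply_minmax(v, mins, maxs) for v in train_val]

# The test vector is scaled with the saved parameters, never with its own range.
test_vector = [2.5, 35.0]
scaled_test = apply_minmax(test_vector, mins, maxs)
```

Note that a test value outside the range seen during fitting, such as 35.0 here, is mapped outside [-1, 1]. This is expected when the parameters come only from the training and validation data.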


B. Final result

For the final result, we plotted accuracy against the number of neurons. Accuracy is the ratio of correct detections to all detections [5].

The figure shows the percentage accuracy against the number of neurons. Accuracy is above 99% with 3 neurons and with 6 neurons, and the maximum accuracy is obtained with 3 neurons. This result was obtained by varying the number of neurons to find the count at which the classification is most accurate. It shows that varying the number of neurons, rather than using one fixed network, gives a more accurate result.
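The accuracy measure defined above, and the idea of picking the neuron count that maximizes it, can be sketched in plain Python. The labels and predictions below are made-up toy values used only to exercise the functions and are not results from the paper. The function names, and breaking ties in favour of the smaller network, are assumptions of this sketch.

```python
def accuracy(targets, predictions):
    """Ratio of correct detections to all detections."""
    correct = sum(1 for t, p in zip(targets, predictions) if t == p)
    return correct / len(targets)


def best_neuron_count(accuracy_by_neurons):
    """Pick the neuron count with the highest accuracy; ties go to the smaller count."""
    return min(accuracy_by_neurons,
               key=lambda n: (-accuracy_by_neurons[n], n))


# Toy targets and per-configuration predictions (illustrative only).
targets = [0, 1, 2, 1, 0]
predictions_by_neurons = {
    2: [0, 1, 1, 1, 2],
    3: [0, 1, 2, 1, 0],
    4: [0, 1, 2, 0, 0],
}

accuracy_by_neurons = {n: accuracy(targets, preds)
                       for n, preds in predictions_by_neurons.items()}
best = best_neuron_count(accuracy_by_neurons)
```

In practice each entry would come from training and testing a separate network, so the comparison is only as reliable as the test set used for every configuration.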

IV. CONCLUSIONS AND FUTURE WORK

The classifier we have built provides excellent and robust discrimination among speech signals. We extracted features from the audio content and built the feature vectors, then applied the neural network to classify the audio, using the feedforward training procedure. We reported 99% accuracy. Only a particular number of neurons makes the classification result most accurate. There are many interesting directions that can be explored in the future. The first is to explore more audio features that can be used to characterize the audio system. The second is to improve the computational efficiency of the neural network.

REFERENCES

[1] Yu Song; Wen-Hong Wang; Feng-Juan Guo, "Feature extraction and classification for audio information in news video," International Conference on Wavelet Analysis and Pattern Recognition, 2009. ICWAPR 2009, pp. 43-46, 12-15 July 2009.
[2] Mitra, V.; Wang, C.J., "A Neural Network based Audio Content Classification," International Joint Conference on Neural Networks, 2007. IJCNN 2007, vol. 23, no. 8, pp. 1494-1499, 12-17 Aug. 2007.
[3] Xi Shao; Changsheng Xu; Kankanhalli, M.S., "Applying neural network on the content-based audio classification," Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia, vol. 3, no. 4, pp. 1821-1825, 15-18 Dec. 2003.
[4] Jae-Young Kim; Dong-Chul Park, "Application of Bhattacharyya kernel-based Centroid Neural Network to the classification of audio signals," International Joint Conference on Neural Networks, 2009. IJCNN 2009, pp. 1606-1610, 14-19 June 2009.
[5] Yu Song; Wen-Hong Wang; Feng-Juan Guo, "Feature extraction and classification for audio information in news video," International Conference on Wavelet Analysis and Pattern Recognition, 2009. ICWAPR 2009, pp. 43-46, 12-15 July 2009.
[6] Ballan, L.; Bazzica, A.; Bertini, M.; Del Bimbo, A.; Serra, G., "Deep networks for audio event classification in soccer videos," IEEE International Conference on Multimedia and Expo, 2009. ICME 2009, pp. 474-477, June 28 2009-July 2009.
[7] Xin He; Yingchun Shi; Fuming Peng; Xianzhong Zhou, "A Method Based on General Model and Rough Set for Audio Classification," Chinese Conference on Pattern Recognition, 2009. CCPR 2009, pp. 1-5, 4-6 Nov. 2009.
