
JURNAL SIMETRIK

ISSN : 2302-9579
VOLUME 6, NUMBER 1, June 2016

Person in Charge
Dr. Sammy Saptenno, SE., M.Si

Editor-in-Chief
Vicky Salamena, SST., MT

Editor
Aleksander A Patty, ST., MT

Managing Editors
Luwis H. Laisina, ST., MT
Paulus F. Picauly, ST., M.Eng
Graciadiana I. Huka, ST., MT
Reynold P. J. V. Nikijuluw, S.Pd., M.Ed

Graphic Design
Ridolf Kermite, ST

Administration
Wa Hauli

Editorial and Administrative Address:

Pusat Penelitian dan Pengabdian kepada Masyarakat Politeknik Negeri Ambon
Jln. Ir. M. Puttuhena Wailela Rumah Tiga Kota Ambon 97234.
Website: www.uppm.polnam.ac.id. E-mail: jurnalsimetrik@gmail.com
TABLE OF CONTENTS

OPTIMUM DATA LENGTH TO TRAIN ISOLATED SPEAKER DEPENDENT INDONESIAN DIGIT RECOGNIZER  1 - 4
(ZULKARNAEN HATALA, ARI PERMANA L)

PENERAPAN STRATEGI SQ4R UNTUK MENINGKATKAN PEMAHAMAN MEMBACA DALAM PEMBELAJARAN BAHASA INGGRIS TEKNIK  5 - 11
(MEYKE MARANTIKA)

RANCANGAN ALAT PENGIRIS BAWANG DAN PEMBUATAN KERIPIK OLAHAN BAHAN MAKANAN SKALA RUMAH TANGGA  12 - 15
(ARTHUR LEIWAKABESSY, NUR HAYATI NAHUMARURY)

EVALUASI KAPASITAS BALOK STRUKTUR RANGKA PEMIKUL MOMEN GEDUNG BPJN WILAYAH IX MALUKU DAN MALUKU UTARA  16 - 22
(VECTOR R. R. HUTUBESSY)

ANALISIS KAPASITAS SISTEM PENGAMAN DAN PENGHANTAR PADA INSTALASI GEDUNG BENGKEL DAN LABORATORIUM JURUSAN ELEKTRO POLNAM  23 - 30
(LORY PARERA, ARI PERMANA L)

ANALISIS PAPARAN LOGAM Pb PADA IKAN ASAP YANG DIJUAL DI KOTA AMBON  31 - 38
(MUHAMMAD SAID KARYANI)

PERANCANGAN SISTEM KONTROL MENGGUNAKAN PLC Omron CP1E UNTUK MENGGERAKAN MESIN AC  39 - 43
(RINA LUCIANE MANUHUTU, SAMY JUNUS LITILOLY)

JURNAL SIMETRIK VOL 6, NO. 1 JUNI 2016, ISSN : 2302-9579

OPTIMUM DATA LENGTH TO TRAIN ISOLATED SPEAKER DEPENDENT INDONESIAN DIGIT RECOGNIZER

Zulkarnaen Hatala 1), Ari Permana L 2)
1,2) Electrical Engineering Department, Politeknik Negeri Ambon
e-mail: dzulqarnaenhatala@gmail.com

Abstract
This paper measures the performance of isolated digit recognition for the Indonesian language spoken with a local accent.
The software used is the Hidden Markov Model Toolkit (HTK). The quantity of interest is the minimum duration of training
speech required. The results relate the length of the training data to the word error rate.

Keywords: isolated speech recognition, speaker dependent, optimum training set

1. INTRODUCTION

Background
Many speech recognition systems [1] require a very long training set to reach a given level of performance. But what about a system that needs to be trained quickly? What is the optimum, in the sense of minimum, length of spoken sentences or words actually needed to build a simple system such as a speaker dependent isolated digit recognizer? This paper addresses that question experimentally, using the Indonesian language with an Ambonese accent as the target system. The software used is the Hidden Markov Model Toolkit (HTK).

2. SPEECH RECOGNITION

Isolated Word Speaker Dependent
There are many kinds of automatic speech recognition (ASR) systems. One that identifies only a single spoken word that begins and ends with silence is called isolated word recognition; that is, there is always a pause between words. A speaker dependent system is built for a single person using the application in a single environment. This user is expected to speak in a consistent, distinctive manner, such as a local accent. Even for the same user, a different way of pronouncing the words will degrade performance.

Digit Recognition
Digit recognition is an ASR system that identifies a digit or number from spoken sound. Such a system can be integrated into other systems such as a phone dialer or a number dictation application.

Hidden Markov Model Toolkit (HTK)
The Hidden Markov Model Toolkit [2] from Cambridge University is a set of tools for training and testing hidden Markov models with Gaussian mixture densities (HMM-GMM) [2]. In practice, HTK is used intensively in speech recognition research across the world.

3. METHODOLOGY

Hardware Setup
An ASR system needs only common hardware. Here we use an inexpensive Bluetooth microphone for input and a laptop with a 2.65 GHz processor and 4 GB of RAM.

ASR Steps
The ASR process itself consists of two distinct phases:
1. Training process, which constructs the HMM-GMM model. In this phase, sample sounds are recorded from a human user through the microphone input, and the mathematical model is estimated from those sounds. At this stage, annotation (labeling) is also performed to


mark the boundaries of the phoneme sequences, as shown in Figure 1. Phonemes are the subword units used by HTK and can be modeled by HMM-GMM. The digit or word can then be reconstructed by analyzing the sequence of phonemes that occurs in an utterance.

Figure 1: Labelling sound files

2. Recognition process, or testing process, which examines the system performance. In this phase, the total duration of the sample sounds is measured for a given level of recognition performance. The performance criterion is the word accuracy, which HTK calculates as:

$$\text{Accuracy} = \frac{H - I}{N} \times 100\%$$

H: number of correct words
I: number of insertions
N: total number of words in the reference transcription

HTK Configuration
HTK is a set of ready-to-use command-line tools and a programming library for training and testing ASR systems. For it to work, a few configuration files must be written explicitly. These files specify the format or method of feature extraction, the HMM-GMM parameters, the language grammar and the dictionary.

HTK Grammar
This is the language grammar used to perform isolated digit recognition, as shown in Figure 2:

    $NUMS=(NUM_0|NUM_1|NUM_3|NUM_4|NUM_5|NUM_6|NUM_7|NUM_8|NUM_9|NUM_2);
    ( SIL {$NUMS SIL } )

Figure 2: HTK Grammar

HTK Dictionary
HTK needs a file called a dictionary that maps each word to its subword phonemes. For the Indonesian digit recognizer we use the dictionary in Figure 3:

    NUM_0 k o s o ng
    NUM_1 s a t u
    NUM_3 t i g a
    NUM_4 a m p a t
    NUM_5 l i m a
    NUM_6 a n a m
    NUM_7 t u j u h
    NUM_8 l a p a n
    NUM_9 s e m b i l a n
    NUM_2 d u w a
    SIL sil

Figure 3: HTK Dictionary Entries

4. EXPERIMENTAL RESULT

Single Digit Recognition
For the first system tested, we only need to recognize the single Indonesian digit “kosong” between its silence boundaries. The setup is listed below and the result is shown in Figure 4.
1. Data format: PCM 16 bit, mono, 16000 Hz
2. Speech feature model: MFCC_E_D_A
3. Phone context: monophone
4. Phones: \k\, \o\, \s\, \ng\, \sil\
5. HMM-GMM: diagonal covariances, 5 states with 3 emitting states plus an entry state and an exit state.
6. Words: silence and 0 (“kosong”)
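
The feature kind and the HMM topology listed above determine the shape of the prototype model that training starts from. The following Python sketch writes such a prototype in HTK's text definition format. It is an illustration under assumptions, not the authors' actual file: the vector size of 39 (12 cepstral coefficients plus energy, each with delta and acceleration terms), the zero means, the unit variances and the initial transition probabilities are placeholder values that the paper does not give, and the function name `make_proto` is hypothetical.

```python
def make_proto(name="proto", vec_size=39, num_states=5, kind="MFCC_E_D_A"):
    """Return an HTK-style prototype HMM definition as text.

    State 1 and state num_states are the non-emitting entry and exit
    states; every state in between gets one diagonal-covariance Gaussian.
    All numeric values are placeholders (assumptions) that training
    would re-estimate.
    """
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [
        f"~o <VecSize> {vec_size} <{kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    # Emitting states are numbered 2 .. num_states - 1.
    for state in range(2, num_states):
        lines += [
            f"<State> {state}",
            f"<Mean> {vec_size}",
            zeros,
            f"<Variance> {vec_size}",
            ones,
        ]
    # Left-to-right transition matrix: entry state jumps to state 2,
    # each emitting state either stays or moves on, exit row is all zero.
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0
        elif i < num_states - 1:
            row[i] = 0.6
            row[i + 1] = 0.4
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_proto())
```

With the default arguments the output contains three emitting states (2, 3 and 4), which makes the "5 states with 3 emitting states" topology concrete; one such model would be trained per phone.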


LENGTH          Accuracy
7.4 seconds     100%

Figure 4: Single Digit Recognition Results

Two Digits Recognition
For the next system tested, we recognize two Indonesian digits, “kosong” and “satu”, plus the silence boundary.
1. Data format: PCM 16 bit, mono, 16000 Hz
2. Speech feature model: MFCC_E_D_A
3. Phone context: monophone
4. Phones: \k\, \o\, \s\, \ng\, \sil\, \a\, \t\, \u\
5. HMM-GMM: diagonal covariances, 5 states with 3 emitting states plus an entry state and an exit state.
6. Words: silence, 1 (“satu”) and 0 (“kosong”)

The result is shown in Figure 5:

LENGTH          Accuracy
13.8 seconds    100%

Figure 5: Two Digits Recognition Results

Figure 6 shows the result for three digits under these criteria:
1. Phones: \k\, \o\, \s\, \ng\, \sil\, \a\, \t\, \u\, \i\, \g\
2. Words: silence, 0, 1 and 3.

LENGTH          Accuracy
21.0 seconds    97.59%
29.0 seconds    98.80%
38.1 seconds    100%

Figure 6: Three Digits Recognition Results

Figure 7 shows the result for four digits under these criteria:
1. Phones: \k\, \o\, \s\, \ng\, \sil\, \a\, \t\, \u\, \i\, \g\, \m\, \p\
2. Words: silence, 0, 1, 3 and 4.

LENGTH          Accuracy
28.15 seconds   100%

Figure 7: Four Digits Recognition Results

Figure 8 shows the result for five digits under these criteria:
1. Phones: \k\, \o\, \s\, \ng\, \sil\, \a\, \t\, \u\, \i\, \g\, \m\, \p\, \l\
2. Words: silence, 0, 1, 3, 5 and 4.

LENGTH          Accuracy
35.69 seconds   100%

Figure 8: Five Digits Recognition Results

Figure 9 shows the result for six digits under these criteria:
1. Phones: \k\, \o\, \s\, \ng\, \sil\, \a\, \t\, \u\, \i\, \g\, \m\, \p\, \l\, \n\
2. Words: silence, 0, 1, 3, 5, 6 and 4.

LENGTH          Accuracy
41.8 seconds    100%

Figure 9: Six Digits Recognition Results

Figure 10 shows the result for all Indonesian digits under these constraints:
1. Phones: \a\, \b\, \d\, \e\, \g\, \h\, \i\, \j\, \k\, \l\, \m\, \n\, \ng\, \o\, \p\, \s\, \sil\, \t\, \u\, \w\
2. Words: silence, 0, 1, 3, 5, 6, 7, 8, 9, 2 and 4.

LENGTH          Accuracy
82.35 seconds   95.59%

Figure 10: Indonesian Digits Recognition Results

If we summarize the number of digits to recognize against the data length required for a given level of accuracy, we obtain the table in Figure 11 and the plot in Figure 12.
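
The measurements reported in Figures 4 to 10 can be gathered into one structure to make the trend easier to inspect. The Python sketch below does this and also derives the training seconds per digit, a quantity the paper itself does not report and which is added here only for illustration. For three digits the 100% row is used, and note that the ten-digit entry reached 95.59% accuracy rather than 100%. The names `RESULTS` and `seconds_per_digit` are hypothetical.

```python
# number of digits -> (training data length in seconds, accuracy in %)
# Values copied from Figures 4 to 10; for three digits the 100% row is used.
RESULTS = {
    1: (7.4, 100.0),
    2: (13.8, 100.0),
    3: (38.1, 100.0),
    4: (28.15, 100.0),
    5: (35.69, 100.0),
    6: (41.8, 100.0),
    10: (82.35, 95.59),
}


def seconds_per_digit(results):
    """Divide each total data length by the number of digits it covers."""
    return {n: length / n for n, (length, _accuracy) in results.items()}


if __name__ == "__main__":
    per_digit = seconds_per_digit(RESULTS)
    for n in sorted(RESULTS):
        length, accuracy = RESULTS[n]
        print(f"{n:2d} digits: {length:6.2f} s total, "
              f"{per_digit[n]:5.2f} s per digit, {accuracy:.2f}% accuracy")
```

On these numbers the cost stays between roughly 6.9 and 8.2 seconds per digit, apart from the three-digit case at 12.7 seconds, and every total is below two minutes.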

Number of Digits    Data Length Required (seconds)
1                   7.4
2                   13.8
3                   38.1
4                   28.15
5                   35.69
6                   41.8
10                  82.35

Figure 11: Table of number of digits versus data length

[Plot: data length (s), axis 0 to 90, against number of digits, axis 1 to 10]

Figure 12: Number of digits versus data length

5. CONCLUSION AND FUTURE WORK

Conclusion
In general, not many samples are needed to train the specific ASR systems considered here. For single digit recognition, i.e. one digit model plus silence, the experiment above shows that only about 7 seconds of training speech is needed to achieve a nearly zero-error system. Single digit recognition is admittedly trivial and of little practical use, but it suggests that, with a suitable training strategy and supervised selection of the training data, a high level of accuracy can be achieved with very short training data when building an isolated word, speaker dependent ASR system.

Finally, we conclude that training an isolated word, speaker dependent Indonesian digit ASR system requires less than 2 minutes of data.

Future Work
Future research will address less trivial systems such as a continuous digit recognizer, speaker independence, and end-user applications such as a phone dialer or price dictation for accounting and business applications.

6. REFERENCES

[1] Rabiner, Lawrence R., “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

[2] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev and P. Woodland, “The HTK Book”, 2001-2005, Cambridge University Engineering Department. Website: http://htk.eng.cam.ac.uk/docs/docs.shtml.
