Implementation of speech
modification on hardware
Author: Marco Gloeckler (40050956)
Supervisor:
German Supervisor:
Abstract
The main objective of this dissertation was to implement an algorithm called Phase
Vocoder onto a hardware platform. This algorithm is used to time compress or
expand audio or speech. For this purpose a Rapid Prototyping Workflow was used.
The whole range of developing a product is covered. This includes choosing suitable hardware to implement the Phase Vocoder. Furthermore the software Matlab/Simulink was evaluated and chosen because the tool allows Rapid Prototyping.
The engineered workflow makes it possible to develop a program at an abstract level and build
an executable program with one click.
The Phase Vocoder algorithm itself was evaluated and compared to another time
stretching method. It was then implemented onto the hardware platform, which shows
the differences between simulation and an executable version for hardware.
Acknowledgement
This thesis has benefited greatly from the support of many people, some of whom I
would sincerely like to thank here.
To begin with, I am really grateful for the help of my supervisor Mr. Jay Hoy. I also
want to thank my second supervisor and German supervisor Mr. James MCWhinnie
and Prof. Dr. D. Pross.
Furthermore I want to thank the technicians of Edinburgh Napier University who
helped me to set up a computer I could work with.
Finally I want to thank my family and my friends who supported me and gave me the
opportunity to write the thesis in Edinburgh.
Affirmation
Hereby I, Marco Gloeckler, affirm that I wrote the present thesis without any
inadmissible help by a third party and without using any other means than indicated.
Thoughts that were taken directly or indirectly from other sources are indicated as
such. This thesis has not been presented to any other examination board in this or a
similar form.
I have written this dissertation at Edinburgh Napier University under the scientific
supervision of Mr. Jay Hoy.
Table of contents
Abstract
Acknowledgement
Affirmation
Table of contents
1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Approach
1.4 Rapid Prototyping Workflow
1.5 General Information about DSP
1.5.1 What is DSP?
1.5.3.3 Comparison
2 Preparation
2.1 Software
2.1.1 Introduction of Tools
2.1.1.3 LabVIEW
2.3 Hardware
3 Theory
3.2.2 Detail
Result of Simulation
Delay
Conclusion
References
Appendix
1 Introduction
1.1 Motivation
Nowadays everything in the field of audio, video and picture processing, industrial
control and communication systems is using digital signal processing (DSP).
Therefore it is important for students and engineers in this field to know the basics
and how to work with it.
In the past digital signal processing was described as very complex and
mathematical. Today, DSP can also be described on an abstract level, for example with block
diagrams or state flows.
When developing algorithms in the field of DSP, Rapid Prototyping is often used nowadays.
The goal is to get quickly from a simulation to a prototype. This type of development
allows the developed DSP algorithm to be transferred from a high-level description, such as state flows,
onto hardware for testing. This process helps to prevent costly production errors.
This procedure should be examined and documented for later developments.
This is useful because studies have shown that people can read or listen and understand
faster than they can talk. It is important, however, that the pitch itself is not changed (1).
This knowledge can be used to play speech recordings faster, for example on an
answering machine or when studying from audio CDs.
Changing the speed of speech is also used for other applications, like speech recognition or fitting a 32-second radio advertisement into the available frame of
30 seconds.
Slowing down speech can also be useful to generate effects in movies.
The range of use is not limited to speech: DJs and producers use this technique to
generate special effects, or to bring two different sound tracks to the same speed to
combine them.
To sum up, there is Rapid Prototyping which includes hardware and software. The
other field of interest is audio/speech time stretching which needs a mathematical
algorithm.
1.2 Objectives
The goal in this project was to develop or use an algorithm to slow down or speed up
speech and implement it on a board with a Digital Signal Processor (DSP). The idea
of the algorithm should be based on the Phase Vocoder method.
An important issue is the timbre of the voice, which should sound as natural as
possible after modification.
A Rapid Prototyping Process should be used to allow fast changes and a
readable program.
The project started from scratch, so the whole development environment had to be set
up. For this purpose different hardware platforms and suitable
software had to be considered, evaluated and finally obtained.
Once suitable hardware and software were found, a workflow had to be tested with
some simple examples.
The theory of speech shifting methods had to be analyzed. The goal was to use the
Phase Vocoder method but other possibilities had to be read and understood, too.
At the end of the project a running version should be on a hardware board and be
ready for a demonstration. The algorithm on board should be modifiable by switches
on hardware or by computer software.
1.3 Approach
As there was no former project or development environment available for this kind of
task, there were a lot of different aspects to consider.
After a first overview of what this project includes, three main topics can be defined.
1. Gathering information in fields like DSP vs. FPGA, fixed-point vs. floating-point, processing power and the theory of the Phase Vocoder
2. A software/hardware combination which allows a high-level approach had to be obtained or bought
3. A Phase Vocoder algorithm had to be found or developed and adapted to the hardware needs
First research about the Phase Vocoder theory had to be done to get ideas that had
to be considered. As it was also part of the project that the whole development
environment had to be set up other aspects like hardware or software tools had to be
considered as well.
So really basic topics like DSP vs. FPGA and fixed-point vs. floating-point had to be
analyzed. As there was the possibility that the University would not have suitable
hardware, not only processing power and architecture played a key role but also
budget and possible ways of ordering.
But even if suitable hardware was found, this would not automatically be the solution, because the objective of using a Rapid Prototyping Workflow needs a matching software/hardware combination.
This leads to main topic 2 where suitable hardware had to be correlated with
software. In this field there were not just technical aspects in demand but also
available licences and costs.
The result of topics one and two had to be a hardware/software combination permitting a
Rapid Prototyping Workflow which allows hardware to be programmed easily and quickly.
Besides, it had to be suitable for an algorithm like the Phase Vocoder.
The last step would be the implementation of the Phase Vocoder. For this,
thought had to be given to peripherals like microphone and speakers and to how
the user can interact with the program.
Figure 1: Rapid Prototyping process (stages: algorithm development and design, software coding, hardware implementation)
Those are the key stages which have to be considered and they will need some
iteration till the final product can be released.
There are tools available which help engineers to develop products
as quickly and cost-effectively as possible.
In the first step the tools allow algorithms to be developed in a high-level language (HLL).
This means that after designing in state flows or function blocks the tools translate
these into code, often C or Ada.
This translation can often be specified for different hardware platforms, so the code
will be more efficient and flexible.
Once coding is finished, the code has to be downloaded onto the hardware.
Sometimes the coding is done in C or, for even faster applications, in Assembler.
Testing and verification can then take place, and if errors occur the whole workflow has to
be repeated. But because the tools do most of it automatically, errors can be fixed
quickly compared to chips, which must be produced and tested. This makes the
development process much cheaper and faster.
A change of hardware platform can also be done easily, as the adaptation can be done
in the tool which generates the HLL code (2)(3)(4).
The main advantage in DSP systems is that very complex algorithms and filters can
be implemented, even in real-time.
The hardware platforms used for signal processing are mostly digital signal
processors or Field Programmable Gate Arrays (FPGA).
Such chips are optimised for digital signal processing, which means that they can do
complex calculations extremely fast.
DSPs have an operation specialised for fast signal processing called MAC
(multiply-accumulate). This operation can be performed in one clock cycle on a DSP,
whereas an ordinary processor would need 70 clock cycles (5).
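The role of the MAC operation can be illustrated with a FIR filter, whose output sample is nothing more than a chain of multiply-accumulate steps. The following Python sketch is only an illustration; the function name `fir_sample` and the example values are assumptions and are not taken from the project.

```python
def fir_sample(coefficients, history):
    """Compute one FIR output sample as a chain of multiply-accumulate steps.

    coefficients: filter taps (hypothetical example values below)
    history: the most recent input samples, same length as coefficients
    """
    accumulator = 0.0
    for coefficient, sample in zip(coefficients, history):
        # One multiply-accumulate (MAC) step: multiply, then add to the accumulator.
        accumulator += coefficient * sample
    return accumulator


# Example: a 4-tap moving-average filter applied to four input samples.
example_output = fir_sample([0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 4.0])
```

On a DSP each pass through the loop body would ideally correspond to a single MAC instruction, which is why filters of this kind benefit so much from the specialised hardware.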
Because the two kinds of chip are built with completely different approaches, both have
advantages and disadvantages.
If the sampling rate exceeds a few MHz, it is difficult for a DSP to process data
without loss. This is due to the access to shared resources like memory or busses.
An FPGA, however, with its fixed internal structure allows high I/O data rates.
A DSP is designed so that its entities can be reused. Thus, the multipliers used for
the FFT can be used for filters or other things afterwards. In an FPGA such reuse is
hard to achieve and is normally not done.
Therefore, a DSP is capable of working with large and varied programs. It can
perform a big context switch by loading another part of the program.
An FPGA needs a routine to reconfigure itself, which can take a long time,
but this is necessary for large programs because they cannot fit on one FPGA due to
its limited number of gates.
Costs are also a major factor in industry: a DSP is cheaper than its
counterpart in FPGA logic.
To summarize, a DSP should be used when the sampling rate is low and the
complexity of the program is high, but other factors like the available tools and
the background of the engineer are important and must be considered in every project
(6)(7).
For the project of the "Phase Vocoder" both hardware platforms are viable because
the complexity is not a problem for current DSP or FPGA (2)(3)(4).
Due to the fixed position of the radix point, fewer calculations are required than with
floating-point numbers. Furthermore, the conversion and correction necessary for
multiplications and divisions can be replaced by fast shift operations. So the calculation takes less
processing power and is simpler to carry out.
The main problems with this number representation are rounding errors and overflows.
It is possible that in a multiplication the range of numbers is insufficient, and a
large number becomes negative because it runs out of the range (arithmetic
overflow). Therefore, the developer has to take care of this and scale the numbers during
development, which is time-consuming and error-prone.
To minimize the rounding errors, today's processors with 32 or even 64 bits normally
have double the number of bits for intermediate values within the accumulators (3).
Fixed-point operations simplify numerical operations and save space, but require a
compromise between accuracy and dynamic range (4).
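The overflow and shift behaviour described above can be made concrete with a small simulation of 16-bit arithmetic. This is a minimal sketch that assumes a 16-bit word in Q15 format (15 fractional bits) without saturation; the helper names `wrap_int16` and `q15_mul` are hypothetical and chosen only for this illustration.

```python
def wrap_int16(value):
    """Wrap an integer into the 16-bit two's-complement range [-32768, 32767],
    as hardware without saturation would do."""
    return ((value + 0x8000) & 0xFFFF) - 0x8000


def q15_mul(a, b):
    """Multiply two Q15 numbers (assumed format: value = integer / 32768).

    The intermediate product needs twice the bits; a shift by 15 bits
    brings the result back to Q15 instead of a costly division.
    """
    return wrap_int16((a * b) >> 15)


# Arithmetic overflow: two large positive numbers add up to a negative one.
overflowed_sum = wrap_int16(30000 + 10000)

# 0.5 * 0.5 in Q15 gives 0.25 (16384 -> 8192) using only a multiply and a shift.
quarter = q15_mul(16384, 16384)

# Corner case: (-1.0) * (-1.0) cannot be represented and wraps back to -1.0.
corner_case = q15_mul(-32768, -32768)
```

The wide intermediate product inside `q15_mul` corresponds to the double-width accumulator mentioned in the text; scaling the inputs so that cases like `corner_case` cannot occur is the manual work the developer has to do.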
1.5.3.3 Comparison
The floating-point processors of today give a high dynamic range and a good
resolution. Thus, in most of the cases the limitation of the range and the accuracy
can be ignored, which makes the development easier.
This is in contrast to fixed-point designs, where the engineer has to implement
scaling factors to protect against arithmetic overflow. This is very time-consuming, and
therefore the use of a floating-point processor can sometimes be justified, especially
where development costs are high and production volumes are low.
To sum up, the advantages of fixed-point are that the hardware is cheaper and sometimes
faster, whereas floating-point processors are more flexible, easier to handle and
numerically more precise. Therefore a mix of both platforms is often used to combine
the advantages of both (3).
For the Phase Vocoder both representations would be suitable, but other aspects
had to be taken into account, as described in section 2.2.
2 Preparation
Before the development of the algorithm could start, a suitable development
environment had to be found. For this, software and compatible hardware had to be
chosen.
As known from the introduction, the kind of processor plays a minor role. So
it is not important whether floating-point or fixed-point numbers, or whether an FPGA or
a DSP, is used, although a floating-point processor is preferred because the
development requires less thought about data types and normalization.
Important factors were the availability, costs and sufficient performance for the
required algorithm. However the major factor was the compatibility of hardware and
software, which was a difficult part.
At Edinburgh Napier University the TMS320C6711 DSP Starter Kit (DSK) from Texas
Instruments was available. Therefore the board and the software required to use it
were evaluated first.
2.1 Software
When working with any processor from Texas Instruments, Code Composer Studio
(CCS) is needed. So software compatible with CCS had to be found and evaluated.
An overview of possible tools for a Rapid Prototyping process with a Texas
Instruments processor had to be worked out.
The tools are introduced briefly in the following.
device families, source code editor, project build environment, debugger, profiler,
simulators, real-time operating system and many other features.(9)
With this tool it is easy to program hardware on a low level. It allows C code to be
developed and loaded onto the DSK.
Since writing programs in C or C++ is laborious and complex, some programs which
simplify the development are presented in the following. One tool could be
Matlab/Simulink, another one LabVIEW, as both generate the C code
automatically.
The combination of Matlab and Simulink is very popular and very well documented. It
has a lot of extensions and can also be used with third-party products which are
directly integrated in the software. With this tool it is possible to develop a program
at an abstract level and write it directly onto hardware.
It supports third-party products such as CCS, and also some hardware directly, for example
some processors from Texas Instruments.
2.1.1.3 LabVIEW
LabVIEW is a graphical programming environment used to develop sophisticated
measurement, test, and control systems using intuitive graphical icons and wires that
resemble a flowchart. It offers unrivalled integration with thousands of hardware
devices and provides hundreds of built-in libraries for advanced analysis and data
visualization all for creating virtual instrumentation.(11)
Initial research with Matlab/Simulink and CCS showed that there are ways to use a
DSP board with Simulink. To make this possible, however, extensions for
Matlab/Simulink such as Target Support library,
Embedded Coder, Embedded Target for TI C6000, Real-Time Workshop (RTW), IDE-Link, Developer's Kit for Texas Instruments etc. are necessary.
The extension names vary with the versions of MATLAB and are sometimes
combined into suites. This makes it very difficult and time-consuming to get an
overview of the extensions really needed. Since these versions must also be
compatible with CCS, it was difficult to find the appropriate version and obtain it.
The software versions finally used are listed in 9.2.
The extensions are necessary to generate optimized C code for the DSK. This allows
real-time programs to be implemented on a DSP board. They also provide hardware support
for processors from various manufacturers in Simulink. To work with the processors,
special settings for code development must be applied, and special Simulink
function blocks are required.
These blocks are contained in Embedded Coder, but the supported processors depend
on the version of Embedded Coder. The blocks are optimized and include
functions such as multiplication, FFT, and filtering, as well as specially adapted
blocks to read data from the A/D converter or to control the LEDs on the DSK.
Unfortunately there were no function blocks for the C6711 DSK in the existing
Matlab/Simulink version. In addition, the CCS version available for the C6711 was 2.1,
which does not support the Matlab extension "IDE-Link".
"IDE-Link", also called Link for Code Composer Studio, is important to link the
two tools, CCS and Matlab, ensuring automatic code generation. This makes it
possible to get from the abstract Simulink model onto the DSK without further
interaction, see 2.4.
Without IDE-Link it is still possible to download the model onto the board. However,
this involves more effort because the C code generated by Simulink must
be loaded into a CCS project together with several other files. An explanation of the file
types is attached in 9.4.
But without the appropriate library in Simulink it is really difficult to develop a model
because there are no blocks which give access to data like the audio stream or the LEDs.
To make those things possible the functions would have to be written by hand, which would be
a huge effort and would have delayed the project.
Thus, it made more sense to look for a new board. The alternative would have been a
suitable Matlab and CCS version for the C6711, but it was discarded because of the costs.
A DSK board fully compatible with Matlab and CCS cost 330 GBP. In contrast,
Matlab with the required extensions would have cost several thousand pounds.
Therefore it was decided to buy the successor to the C6711 DSK, the C6713 DSK.
This DSK is compatible with the existing Matlab license and, with its USB support,
can be used with all PCs. Another advantage of the C6713 DSK was
that training had already been done in Matlab and CCS. Thus, this knowledge could be
used later in the project (12)(13).
Furthermore the software/hardware combination can be used for other projects. With
this high-performance C6713 DSK (further information in 2.3.1) it is possible to
develop complex tasks like a DSL-modem.
2.3 Hardware
As already mentioned, the TMS320C6711 DSP Starter Kit could not be used because
of software incompatibility. Therefore this board is not described.
The TMS320C6713 DSP Starter Kit is the newer version of the TMS320C6711 DSP
Starter Kit.
This DSK, with up to 1800 MIPS of processing power, allows the development of
algorithms in fields like networking, communications, imaging and other applications.
Important for the project were the USB support and sufficient processing power
(15)(16).
An AIC23 stereo codec with 8-96 kHz sample rates (8-32 Bit word length)
16 MB of synchronous DRAM
The CPU works with very long instruction words (VLIW, 256 bits wide).
The DSP 6713 interfaces with on-board peripherals through a 32-bit wide EMIF bus (External Memory Interface). The SDRAM, Flash and CPLD (Complex Programmable Logic Device) are all connected to this bus, see Figure 3.
Third parties use this expansion of the EMIF bus for video support, memory
extension, other sound codec, etc.
Analogue audio signals are accessed via an on-board AIC23 codec and four 3.5 mm
audio jacks (microphone input, line input, line output and headphone output). The
analogue input can be microphone (fixed gain) or line (boost), the output line-out
(fixed gain) or headphone (adjustable gain).
The CPLD is a programmable logic device used to tie board components together
and has a register-based interface to configure the DSK.
The DSK has 4 LEDs and DIP switches which allow the user to interact with the
board. For this interaction the CPLD registers are read and written.
Code Composer Studio communicates with the DSK via the integrated on-board JTAG
emulator. They are connected via a USB interface.
Programs can be downloaded to the board into the SDRAM or Flash. The advantage
of the flash memory is that it will keep the program after a restart of the board.
In practice a lot of steps and tools are needed to get this workflow running.
As can be seen in Figure 5, different extensions for Simulink are needed.
First of all there are limitations on the development of the Simulink model because of
memory management (further described in chapter 5.2). Another difference is the
approach of running the program on hardware rather than in simulation. Thus, it is not
possible to halt and restart the program as is done in simulation. This
different approach requires problems like different tasks or memory
management to be considered (further information in the RTW user guide within the Matlab help).
Testing is different too, because there is no convenient way to see what
happens once the software is downloaded to the board.
Furthermore, not all blocks of the Simulink libraries can be used because some are
not supported for code generation. Some Matlab commands are not supported
either, so it can be necessary to write some of the functions manually.
If all limitations are considered and adhered to when developing the model, and the
workflow components are set up as described in 9.1, the code generation can be done
without further manual interaction. This is possible because the different pieces of
software are chained together, each with its own task.
The RTW automatically generates ANSI C/C++ code from the Simulink model. It
also adds the I/O device driver as an inline S-function to the code.
The Embedded Target for TI C6000 provides RTW with the APIs (application
programming interfaces) which are needed to build code for the C6000 platform. The
generated data types are listed and explained in 9.3.
With the C code available, the Link for Code Composer Studio invokes CCS and
builds the executable automatically. For this, a project is generated with different
data types and functions, described in 9.4. The link also downloads the program
onto the target and runs it.
So with one click all of this is done and the program can be tested on the hardware
within the Code Composer Studio.
This workflow can be easily changed to other targets by changing the driver as long
as there are no essential differences in the memory management.
3 Theory
Definition of Time Stretching
Time Stretching, also known as Time Compression/Expansion, means
slowing down or speeding up an audio or speech signal without altering its pitch.
Pitch shifting is to some extent the counterpart, i.e. changing the pitch without
changing the tempo.
An arbitrary choice of the sections can cause phase hits; therefore the
signal must first be examined for its period. This information is determined by the
Autocorrelation Function (ACF) and is used for the section length.
If the input signal is periodic, it can be shortened by integer factors without altering
the pitch. In natural signals (music, speech) additional difficulties arise because
no two sections are completely identical. Thus, there are phase hits again and
triangulation has to be used to achieve better results.
Triangulation is a method which avoids phase hits by multiplying every section by a
triangle function. In other words, Section A is multiplied by a falling triangle
function and Section B by a rising one, so that phase hits are avoided.
To slow down a periodic signal the periodic section is simply repeated. For natural
signals triangulation is used again.
The quality of the output signal depends strongly on the determination of the section
length. Signals which have a periodic pattern can be manipulated very well by this
method.
Sound elements of short duration, such as clicks, drums and percussion, are difficult
to process because they have a pulse-like character and are not periodic. With
blocks of up to 40 ms length, the pulse-like sound is heard twice in a
row. This can be avoided if the maximum length of a section is shortened. As a
result, however, the processed signal loses much of its bass, which argues against short
sections. Therefore the optimum section length (cut-off) has to be determined (20)(21)(22).
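The time-domain method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the period is taken as the lag with the largest autocorrelation inside a given search range, and triangulation is interpreted as adding the falling-weighted Section A to the rising-weighted Section B. The function names and the test signal are hypothetical.

```python
import math


def estimate_period(samples, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    usable = len(samples) - max_lag

    def autocorrelation(lag):
        return sum(samples[n] * samples[n + lag] for n in range(usable))

    return max(range(min_lag, max_lag + 1), key=autocorrelation)


def triangulate(section_a, section_b):
    """Blend two equally long sections: A gets a falling triangle weight,
    B a rising one (assumed interpretation of the triangulation step)."""
    length = len(section_a)
    return [a * (1.0 - n / length) + b * (n / length)
            for n, (a, b) in enumerate(zip(section_a, section_b))]


def shorten_by_one_period(samples, start, period):
    """Replace two consecutive sections of one period each by their blend,
    which shortens the signal by exactly one period."""
    section_a = samples[start:start + period]
    section_b = samples[start + period:start + 2 * period]
    return (samples[:start] + triangulate(section_a, section_b)
            + samples[start + 2 * period:])


# Test signal (assumption): a sine with a period of exactly 100 samples.
signal = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
period = estimate_period(signal, 50, 150)
shorter = shorten_by_one_period(signal, 200, period)
```

For this perfectly periodic signal the two sections are identical, so the blend changes nothing and the shortened signal continues without a phase hit. For speech or music the sections differ, and the weighted blend hides the transition instead.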
Fourier Transformation (FFT).
3.2.2 Detail
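The principle without the spectral manipulation can be found in the literature under
STFT, and it has its limitation in the FFT. The resolution of the FFT is:
$$\Delta f = \frac{f_s}{N} \qquad (3.1)$$
where $f_s$ is the sampling rate and $N$ is the window length in samples.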
The window length is normally between 512 and 4096 samples. One might assume
that a long window should be taken to get a good frequency resolution. Unfortunately it is not
that simple, because a long window misses changes of frequency: the FFT assumes
that everything within one frame happens at once.
Therefore a trade-off between frequency resolution and the accuracy with which frequency changes are captured must be
made.
Assuming a medium window length of 2048 samples and an audio signal sampled
at 44.1 kHz, the resolution will therefore be 21.5 Hz.
For some speech signals this might be acceptable, but for audio with a piano, for
example, the resolution is not good enough. If the fundamental of the piano note is at
80 Hz, there is an error of about 25%. The piano, however, has just 6% between
consecutive notes (27).
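The numbers in this paragraph can be checked with a few lines of Python. The sketch assumes that the FFT resolution is the sampling rate divided by the window length, which reproduces the 21.5 Hz quoted above; the function name is hypothetical.

```python
# Relative spacing of two neighbouring notes in equal temperament, about 6 %.
SEMITONE_STEP = 2.0 ** (1.0 / 12.0) - 1.0


def fft_resolution(sampling_rate, window_length):
    """Spacing of the FFT bins in Hz (assumed form: sampling rate / window length)."""
    return sampling_rate / window_length


# Resolution for the typical window lengths mentioned in the text.
for window_length in (512, 1024, 2048, 4096):
    resolution = fft_resolution(44100.0, window_length)
    print(window_length, "samples:", round(resolution, 1), "Hz")
```

Even the longest window listed here leaves a bin spacing that is larger than the spacing between two low piano notes, which motivates the phase-based refinement described next.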
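To get a better frequency resolution without harming the time resolution too much, the
Phase Vocoder method is used. This is achieved with the spectral manipulation,
which uses information in the signal that the STFT ignores: the phase $\varphi_1$ of a frequency component at time $t_1$ and its phase $\varphi_2$ at time $t_2$. From $t_1$ to $t_2$ the phase advances by the measured difference plus an unknown number of full cycles.
With this information an equation can be defined:
$$f_n = \frac{(\varphi_2 - \varphi_1) + 2\pi n}{2\pi (t_2 - t_1)} \qquad (3.2)$$
This equation is not yet solvable because the integer n is unknown.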
But there is a way to get a good estimation of the
The Phase Vocoder analyses a peak in magnitude within two different frames. Then
the closest
and
3.3
The OverlapFactor describes how many samples of two consecutive
windows overlap. If the OverlapFactor is 2, half of the samples of the first window will be used in
the next window.
Another way to describe the overlap is the HopSize, which is the temporal shift of
the window. For example, with WindowLength = 256 samples and
HopSize = 64 samples, the windows overlap by 256 - 64 = 192 samples.
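The relation between WindowLength, HopSize and the overlap can be written down directly. The following sketch assumes that the OverlapFactor equals WindowLength divided by HopSize, which matches the statement that a factor of 2 reuses half of the samples; the function names are hypothetical.

```python
def split_into_frames(samples, window_length, hop_size):
    """Cut a signal into overlapping frames; each frame starts hop_size
    samples after the previous one."""
    last_start = len(samples) - window_length
    return [samples[start:start + window_length]
            for start in range(0, last_start + 1, hop_size)]


def overlap_in_samples(window_length, hop_size):
    """Number of samples shared by two consecutive frames."""
    return window_length - hop_size


def overlap_factor(window_length, hop_size):
    """Assumed definition: how many hops fit into one window."""
    return window_length // hop_size


# Small example: 10 samples, frames of 4 samples, shifted by 2 samples.
frames = split_into_frames(list(range(10)), 4, 2)
```

With the values from the text, `overlap_in_samples(256, 64)` gives the 192 overlapping samples, and the same setting corresponds to an overlap factor of 4 under the assumption above.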
With the time information of equation 3.3, equation 3.2 can be solved for f_n. There
will be not one result but many. Thus, the value nearest to the magnitude peak
found by the FFT is taken.
To illustrate this with an example, a 220 Hz sinusoidal signal with a sampling rate of
44.1 kHz and an overlap factor of 2 was chosen.
Using just the FFT, the result would be 215.3 Hz instead of the 220 Hz of the signal.
With the spectral manipulation there will be a more accurate result, as shown with
the values of the example. The phases corresponding to the magnitude peak are
and
is
Until now the explanation was restricted to a simple sinusoidal signal. If the signal is
more complex and contains more frequencies, the algorithm stays the same, with the
difference that the operation is repeated for every magnitude peak in the spectrum.
This is reasonable as long as the peaks in magnitude are adequately separated by
the FFT.
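The 220 Hz example can be reproduced numerically. The sketch below rests on several assumptions that are not stated explicitly in the text: equation 3.2 is taken in the usual form f_n = (Δφ + 2πn) / (2πΔt), the time between two frames is Δt = HopSize / sampling rate with HopSize = WindowLength / OverlapFactor, the window length is 2048 samples (which yields the 215.3 Hz bin), and a Hann window is applied. All names are hypothetical.

```python
import cmath
import math

SAMPLING_RATE = 44100.0      # Hz, from the example in the text
WINDOW_LENGTH = 2048         # samples, assumed (gives the 215.3 Hz bin)
OVERLAP_FACTOR = 2           # from the example in the text
HOP_SIZE = WINDOW_LENGTH // OVERLAP_FACTOR
TRUE_FREQUENCY = 220.0       # Hz


def windowed_dft_bin(frame, bin_index):
    """One DFT bin of a Hann-windowed frame (naive O(N) sum, no FFT library)."""
    length = len(frame)
    total = 0j
    for n, sample in enumerate(frame):
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
        total += window * sample * cmath.exp(-2j * math.pi * bin_index * n / length)
    return total


signal = [math.sin(2.0 * math.pi * TRUE_FREQUENCY * n / SAMPLING_RATE)
          for n in range(WINDOW_LENGTH + HOP_SIZE)]
frame_1 = signal[:WINDOW_LENGTH]
frame_2 = signal[HOP_SIZE:HOP_SIZE + WINDOW_LENGTH]

# Step 1: the plain FFT estimate is the centre frequency of the strongest bin.
peak_bin = max(range(1, 40), key=lambda k: abs(windowed_dft_bin(frame_1, k)))
fft_estimate = peak_bin * SAMPLING_RATE / WINDOW_LENGTH

# Step 2: phase of the peak bin in two consecutive frames.
phase_1 = cmath.phase(windowed_dft_bin(frame_1, peak_bin))
phase_2 = cmath.phase(windowed_dft_bin(frame_2, peak_bin))
time_step = HOP_SIZE / SAMPLING_RATE

# Step 3: every integer n gives one candidate; take the one nearest to the FFT peak.
candidates = [((phase_2 - phase_1) + 2.0 * math.pi * n) / (2.0 * math.pi * time_step)
              for n in range(20)]
refined_estimate = min(candidates, key=lambda f: abs(f - fft_estimate))
```

The candidates are spaced by 1 / Δt, so only one of them falls close to the FFT peak. That candidate is the refined frequency estimate, and it lies much closer to the true 220 Hz than the bin centre.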
With this result of the spectral manipulation, where a good estimate of the actual
frequency is available, it is possible to make different changes to the signal, like reading
direction inversion, frame shuffling, pitch changes or, as in this project, time stretching.
In the synthesis part (illustrated in Figure 7) the IFFT is used to transform the changed
spectrum back into pieces of the time signal, which are combined into
one time signal using the window function.
To change the tempo of the signal, the OverlapFactor or HopSize of the window is
changed, which makes the resulting output file longer or shorter.
If the file is played with the same sampling rate as the input file, the speed is changed.
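How a changed HopSize alters the length of the output can be estimated with simple arithmetic. The sketch assumes that the signal is analysed with one hop size and that the same frames are put back together with a different synthesis hop size; the function names and example values are hypothetical.

```python
def number_of_frames(num_samples, window_length, analysis_hop):
    """Number of complete frames that fit into the input signal."""
    return 1 + (num_samples - window_length) // analysis_hop


def stretched_length(num_samples, window_length, analysis_hop, synthesis_hop):
    """Length of the output when the frames are placed synthesis_hop samples apart."""
    frames = number_of_frames(num_samples, window_length, analysis_hop)
    return (frames - 1) * synthesis_hop + window_length


# Example: one second at 44.1 kHz, window of 512 samples, hop doubled from 64 to 128.
output_length = stretched_length(44100, 512, 64, 128)
stretch_ratio = output_length / 44100
```

The ratio of synthesis hop to analysis hop is approximately the time-stretch factor, so doubling the hop roughly doubles the duration when the result is played at the original sampling rate.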
The algorithm described until now is the simplest one to understand and was chosen
for that reason. In the literature it is referred to as spectral peak following.
However, the algorithm used in Matlab/Simulink in chapter 4 works slightly differently;
the theory is explained in the following.
Another implementation
The implementation is basically the same; the difference is that not just the angles of
the magnitude peak are considered, but every angle.
This means phases are not chosen corresponding to a peak but to a bin. A bin is an
amplitude/phase pair of data for each channel/band.
A channel or band is used within the FFT. So, for example, a window length of 512
gives 256 channels. This is because the spectrum of the FFT is two-sided.
To sum up, windowing and transforming 512 samples results in 256 bins.
Those bins are used for the phase estimation.
This algorithm calculates the angle for every bin and compares it with the angle of
the same bin one frame before. So instead of searching for maxima in the
magnitude and comparing the corresponding phases, the algorithm checks every bin.
This algorithm has another challenge not mentioned so far, which is called phase
unwrapping. The phases after the FFT are only known modulo 2π.
In the spectral peak following method the n of equation 3.2 could be guessed with
the knowledge of the closest FFT result.
In this implementation, however, the phase gets unwrapped, which means that 360
degrees are added for every completed cycle, as Figure 10 illustrates.
The unwrapping recovers the precise phase values for each bin and is therefore an
important part of getting a good result.
Except for the estimation of the phase/frequency the algorithm stays the same. This
method is implemented in the algorithm used in chapter 4.
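Phase unwrapping itself is easy to demonstrate. The following Python sketch is a generic unwrapping routine, not the block used in the model; it assumes that the true phase changes by less than half a cycle between consecutive values, and the test data (2 rad per frame) are made up for the example.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps so that consecutive values differ by less than pi."""
    unwrapped = [phases[0]]
    offset = 0.0
    for previous, current in zip(phases, phases[1:]):
        step = current - previous
        if step > math.pi:
            offset -= 2 * math.pi  # wrapped downwards: take one cycle off
        elif step < -math.pi:
            offset += 2 * math.pi  # wrapped upwards: add one cycle
        unwrapped.append(current + offset)
    return unwrapped


# A phase that really grows by 2 rad per frame, as it looks modulo 2*pi.
true_phase = [2.0 * i for i in range(8)]
wrapped = [(p + math.pi) % (2 * math.pi) - math.pi for p in true_phase]
recovered = unwrap(wrapped)
print([round(p, 3) for p in recovered])
```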
allows just these numbers. Furthermore the HopSizes must be smaller than the
window length.
This subsystem transforms the time signal into the frequency domain using the FFT and the
Hanning window function. It also adds an overlap, for which the Overlap buffer is
used. The numbers at the signal paths describe the dimensions of the signal. This
means that the input is a frame with 64 samples and at the output there are 512
samples. The 512 results from the WindowLength. The overlap of the frames in
samples is WindowLength - AnalysisHopSize = 512 - 64 = 448.
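The buffering and windowing step can be sketched in Python as follows. This is a simplified stand-in for the Simulink blocks under stated assumptions: the buffer starts filled with zeros, the window is a periodic Hann window, the FFT that follows is left out, and all names are chosen for the example.

```python
import math

WINDOW_LENGTH = 512
ANALYSIS_HOP = 64


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def windowed_frames(signal, window_length, hop):
    """Shift `hop` new samples into a buffer of `window_length` samples
    and return one windowed frame per input block."""
    window = hann(window_length)
    buffer = [0.0] * window_length  # assumption: buffer starts with zeros
    frames = []
    for start in range(0, len(signal) - hop + 1, hop):
        buffer = buffer[hop:] + signal[start:start + hop]
        frames.append([b * w for b, w in zip(buffer, window)])
    return frames


signal = [1.0] * 1024
frames = windowed_frames(signal, WINDOW_LENGTH, ANALYSIS_HOP)
overlap = WINDOW_LENGTH - ANALYSIS_HOP
print(len(frames), len(frames[0]), overlap)
```

Each block of 64 input samples produces one frame of 512 samples, of which 448 are shared with the previous frame.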
Other windows, such as the Hamming window, can also be used. Further
information on the Hanning window and why it is a good window function can be
found in (31).
After splitting the signal into magnitude and phase the phase manipulation takes
place (see Figure 13). The phases at the input are normalized between -π and π.
This is the complex part and needs some focus. The basic idea is to get a good
frequency estimate by comparing the phases within each bin.
For this, the addition block takes the current phases of the frame and subtracts the
phases of the previous frame; the result is the phase difference of each bin.
This block computes the principal argument of the nominal initial phase of each
frame.
After this subsystem the expected phase value for the bin is added again because it was
subtracted before. This is done with the constant block shown as number 3 in
Figure 13.
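The sequence of steps described for this subsystem (subtract the previous phase, subtract the expected phase advance, take the principal argument, add the expected advance again) can be sketched in Python. The formula for the expected advance of bin k, 2πk·hop/N, is the usual phase vocoder convention and is assumed here; the function names and the example numbers are not taken from the model.

```python
import math


def principal_argument(phase):
    """Map a phase to the interval [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def true_phase_increment(phase_now, phase_before, k, window_length, analysis_hop):
    """Phase advance of bin k over one analysis hop, including full cycles."""
    expected = 2 * math.pi * k * analysis_hop / window_length
    deviation = principal_argument(phase_now - phase_before - expected)
    return expected + deviation


# Bin 8 of a 512-point frame with hop 64 advances exactly one cycle per hop.
# A component slightly above the bin centre advances 0.3 rad more, but the
# measured (wrapped) phases only show the 0.3 rad.
phase_before = 1.0
phase_now = principal_argument(1.0 + 2 * math.pi + 0.3)
increment = true_phase_increment(phase_now, phase_before, 8, 512, 64)
print(round(increment, 4))
```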
This rescaling is needed because, when the time scale is changed, the phase changes
occur over a longer time. In other words, if there is a 45° change between consecutive
frames of a bin and the time scale gets changed, the frequency would be altered. This happens
because after the IFFT the frames are placed further apart, so the same phase change now
occurs over a longer time interval.
To prevent this, the phase is rescaled with the time stretching factor.
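The 45° example can be worked through numerically. The Python sketch below assumes that the stretch factor equals the ratio of synthesis hop to analysis hop and uses illustrative hop sizes of 64 and 128 at 8000 Hz; it shows that the rescaled phase step represents the same frequency over the longer hop.

```python
import math


def rescale_increment(phase_increment, analysis_hop, synthesis_hop):
    """Scale a per-frame phase increment to the synthesis hop size."""
    return phase_increment * synthesis_hop / analysis_hop


def frequency_hz(phase_increment, hop, fs):
    """Frequency represented by a phase increment over `hop` samples."""
    return phase_increment * fs / (2 * math.pi * hop)


fs = 8000.0
increment = math.radians(45.0)            # 45 degrees per analysis hop
scaled = rescale_increment(increment, 64, 128)

before = frequency_hz(increment, 64, fs)  # frequency seen by the analysis
after = frequency_hz(scaled, 128, fs)     # frequency produced by the synthesis
unscaled = frequency_hz(increment, 128, fs)  # what happens without rescaling
print(math.degrees(scaled), before, after, unscaled)
```

Without the rescaling the same 45° step spread over 128 samples would correspond to half the original frequency.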
As shown, the phase increment of the current frame is added to the accumulated phase
of the previous frame, so the phase follows a continuous slope.
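The accumulation can be written as a running sum. This Python sketch handles a single bin, and the function name and the start phase of zero are assumptions for the example.

```python
def accumulate_phase(increments, start_phase=0.0):
    """Add each frame's phase increment to the phase of the previous frame."""
    phases = []
    phase = start_phase
    for increment in increments:
        phase += increment
        phases.append(phase)
    return phases


# Equal increments give a straight, continuous phase slope.
slope = accumulate_phase([0.5, 0.5, 0.5, 0.5])
print(slope)
```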
Now that the optimized phase is available, it is combined with the magnitude again.
The signal is transformed back into the time domain and multiplied with the window
function as illustrated in Figure 18.
In the last step of the subsystem the overlapping frames are added with the OverlapAndAdd
block. The output is now 90 samples per frame, as defined by the SynthesisesHopSize.
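An overlap-add stage that emits one hop of samples per frame can be sketched in Python as below. It is a minimal illustration, not the OverlapAndAdd block itself; the tiny frame length of 4 and hop of 2 are chosen so that the result can be checked by hand.

```python
def overlap_add(frames, hop):
    """Add overlapping frames and emit `hop` finished samples per frame."""
    length = len(frames[0])
    tail = [0.0] * length  # partial sums that still wait for later frames
    output = []
    for frame in frames:
        tail = [t + f for t, f in zip(tail, frame)]
        output.extend(tail[:hop])           # these samples are complete
        tail = tail[hop:] + [0.0] * hop     # shift the remainder forward
    return output


frames = [[1.0, 1.0, 1.0, 1.0]] * 3
result = overlap_add(frames, hop=2)
print(result)
```

With the hop set to the synthesis hop size, each processed frame contributes exactly that many output samples, which is how the output frame size in the model comes about.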
The output signal is shown in Figure 20 with a stretching factor of 2 and is therefore
6 seconds long.
The scopes of both signals show that they are not exactly the same, but when listening
to them there is no recognisable loss.
5 Implementation on Hardware
The implementation on the hardware was tricky because the Phase Vocoder is not
a real-time application.
This lies in the nature of the applied processing.
Consider talking into a microphone and slowing the speech down by a factor of 2.
The algorithm would only ever have played back half of the input: after 1
minute of talking just 30 seconds could have been heard. The remaining values must be
stored in memory and would cause a buffer overflow when talking for a long time.
Time compression would be even worse because the algorithm would have to
process values which were not even spoken yet. After one minute of talking it should
already have consumed 2 minutes of input, which is obviously not possible.
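The argument can be put into numbers with a short Python sketch. The function name is made up for the example, and the sampling rate of 8000 Hz is taken from the speech sample used later in this chapter.

```python
def backlog_seconds(talk_seconds, stretch_factor):
    """Seconds of recorded input not yet played back after `talk_seconds`
    of continuous real-time output. A negative result means the output
    would need input that has not been spoken yet."""
    consumed = talk_seconds / stretch_factor
    return talk_seconds - consumed


FS = 8000  # samples per second

slow_backlog = backlog_seconds(60.0, 2.0)   # slowing down by a factor of 2
fast_backlog = backlog_seconds(60.0, 0.5)   # compressing by a factor of 2
print(slow_backlog, int(slow_backlog * FS), fast_backlog)
```

For the slow-down case the backlog grows by half a second per second of talking, so any fixed buffer eventually overflows; for compression the backlog is negative from the start.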
Thus another implementation had to be chosen. The general idea was to implement
a Processing and a Play block. When the input signal is recorded it gets
processed and saved into memory. Afterwards the processed file in memory
is played and the user can hear it.
The input was not a microphone signal but a sample voice signal which was
loaded into Simulink as a variable. Using a microphone would just need another
subsystem and is not a real change to the design.
The top level design of the Simulink model is shown in Figure 21.
At the top left is the C6713DSK block where parameters for the code generation are
set.
The other blocks are used for controlling the algorithm. As shown, the dip switch is
used as input for the Embedded Control Unit so that the system can be operated
interactively. This block controls the other 3 blocks, which are used for flashing the
LEDs to show the user what is happening, for starting the processing of the signal and
for enabling the Play block.
To get a good design much time was spent in the Simulink help to read about
the pros and cons of different subsystems.
The result was the enabled subsystem because this block executes the subsystem
as long as there is a 1 at the enable input. This was considered a good solution
because generating a 1 is easy and can be done with many different blocks. It
also allows working with different sample times within one system, which was
important as the control of the subsystems shouldn't run at a high sample rate.
Using a high sample rate in the control would use a lot of processing power and is
unnecessary because the user won't change the configuration a few thousand times
per second.
However, the Processing and Play blocks must work with a sample time of
Ts=1/8000 because the input file was sampled at this rate.
To avoid wasting processing power the control block does not work with Ts but with Tdip,
which is 100ms. This sample time is fast enough to control the enabled subsystems.
Because of code generation there were limitations using Simulink blocks and Matlab
commands. This had to be considered while designing the control and led to the final
design.
The management of variables was also difficult and is described in 5.2.
5.1.1 FindEndOfFile
Working with a voice example it would be possible to use a fixed processing time for
that file as the file length is known. To make the control more flexible and to make it
possible to load every audio or voice signal it was necessary to find the end of the
input file.
To achieve this the elements of the subsystem FindEndOfFile shown in Figure 22
are used.
The Overlap Buffer1 changes the frame-based signal into a sample-based one.
After that it is integrated over a number of samples. 64 samples were chosen and
tested with different examples with satisfying results.
The integrated values are then compared with zero. If the sum over 64 values is not 0
there will be a 1 at the output (see Figure 23).
The Rate Transition1 is needed so that this subsystem works together with the slower
control unit. The Data type Conversion changes the data type into double as
the Embedded Control Unit needs a double as input.
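The detection principle can be sketched in Python. The sketch follows the description literally (sum over 64 samples, compare with zero, output as a floating-point flag); the function name and the test signal are assumptions, and the rate transition to the slower control unit is not modelled.

```python
def find_end_of_file(samples, block_size=64):
    """Return one flag per block: 1.0 while the summed block is nonzero,
    0.0 once only zeros arrive."""
    flags = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        # Note: a block whose samples cancel exactly would also read as 0.
        flags.append(1.0 if sum(block) != 0 else 0.0)
    return flags


# Two blocks of signal followed by two blocks of silence.
signal = [0.25] * 128 + [0.0] * 128
flags = find_end_of_file(signal)
print(flags)
```

The change from 1 to 0 in the flag sequence is what the control unit later detects as the end of the file.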
The Input Signal block reads a variable from the Model Workspace which stores
the speech sample Speech8KHz and transmits it to the Phase Vocoder, which
does the calculations as described in chapter 4.
The signal gets rescaled to the normalized input of 1 and gets written to an output
variable y_pnt which is stored in the Matlab Workspace. Working with this
workspace is not a good solution as this block does not work properly with the
Embedded Target for TI C6000 DSP support package (further described in 5.2).
The other path is the previously described FindEndOfFile block, used to find the end of
the input variable and to terminate the enable when processing of the file is done.
As shown in Figure 25 the previously calculated y_pnt signal gets read from the
workspace and rescaled. This is necessary because the DAC block takes a
32 bit integer value as input. As the input file is normalized to 1 it has to be rescaled
to the whole scale.
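Such a rescaling can be sketched in Python as follows. The exact target range is not stated in the text, so the sketch assumes a signed integer of the given width with a symmetric full scale of 2^(bits-1) - 1; the function name, the clipping and the default width are assumptions for the example.

```python
def to_full_scale(sample, bits=32):
    """Map a sample normalized to [-1, 1] onto a signed integer range.

    Assumption: symmetric full scale of 2**(bits - 1) - 1.
    """
    full_scale = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))  # guard against values outside [-1, 1]
    return int(round(clipped * full_scale))


print(to_full_scale(1.0), to_full_scale(-1.0), to_full_scale(0.0))
```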
The DAC block outputs the signal to the line out port of the DSK where it can be
heard with speakers.
The FindEndOfFileDAC is needed to terminate the enable signal of its own
subsystem after playing the file.
The Embedded Matlab block was chosen because it can take normal
Matlab commands and is therefore flexible. In the end the Matlab commands were not
needed because the ones that would have been useful couldn't be used because of
limitations of RTW (see 5.2), but this only became clear during development. Another
possibility would have been a Stateflow Chart, but it wouldn't be as flexible with
regard to using Matlab commands.
The complete code is shown in 9.5. Some parts will be described here to give an
understanding of the working principle.
This block operates with Tdip. This means that this block is executed every 100ms.
First the inputs and outputs are defined as shown in Figure 21.
function [enable,enableplay,ledFlashOut] = fcn(processing,dip, playing)
The explanation will focus on the Processing block as the Play block is quite
similar.
So first there is the definition of some variables and allocation of them.
enablevar=0;
persistent enableFOld;
persistent enableFNew;
enableFNew=processing;
As there is no explicit definition of a type the standard type is used which is double.
The first if detects a falling edge of the processing input which comes from the
FindEndOfFile block.
The second if checks if the dip switches represent an integer 1 and if the processed
file is not at the end. As long as this is true the enablevar is 1 and keeps the
processing alive.
if ((enableFNew==0) && (enableFOld==1))
    fileend = 1;
    startplay = 1;
    % enables the start of the "Play" block
end
if ((dip==1) && (fileend==0))
    enablevar=1;
else
    enablevar=0;
end
This can also be seen in the previously shown Figure 23: after the signal
changes from 1 to 0 the subsystem stops processing, so no values are displayed after 5
seconds.
The intermediate variable enablevar is required when working with ifs and direct output
variables, or the compiler reports errors. The output variable, in this case enable,
can't be defined within if/else statements; therefore a new variable enablevar is used
within the if/else and its final state is assigned to the output enable.
The other functions of the Embedded Control Unit besides enabling the processing
are
1. Enabling the Play block
2. Controlling the flashing of the LEDs
3. Resetting the control
Integer number of dip switches    Function
0                                 Start the processing of the file with adjacent playing of the file
The actual implemented way the system works is shown in Figure 26.
The first signal is the enable signal at the Processing block, the second one is the
enable of the Play block.
So if the dip switch is 1 the processing starts. As soon as this is finished the Play
block gets enabled and the file can be heard directly.
To restart the system, the dip switch has to be set to 0 to reset the variables and then
set to 1 again.
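The sequence described above can be replayed with a small Python model of the control logic. It is a sketch only: the class and attribute names are invented, the handling of the Play block is assumed to mirror that of the Processing block (the text only says it is "quite similar"), and the reset on dip switch 0 is modelled as clearing all flags. The actual Embedded Matlab code is the one listed in 9.5.

```python
class ControlUnit:
    """Hypothetical step-by-step model of the enable logic."""

    def __init__(self):
        self.reset()
        self.processing_old = 0
        self.playing_old = 0

    def reset(self):
        self.file_end = 0
        self.start_play = 0
        self.play_end = 0

    def step(self, processing, dip, playing):
        """One execution of the control, i.e. one Tdip period."""
        if dip == 0:
            self.reset()  # assumption: 0 on the dip switches clears all flags
        # Falling edge of the end-of-file signal: processing has finished.
        if processing == 0 and self.processing_old == 1:
            self.file_end = 1
            self.start_play = 1
        # Assumed to work the same way for the Play block.
        if playing == 0 and self.playing_old == 1:
            self.play_end = 1
        enable = 1 if (dip == 1 and self.file_end == 0) else 0
        enable_play = 1 if (dip == 1 and self.start_play == 1
                            and self.play_end == 0) else 0
        self.processing_old = processing
        self.playing_old = playing
        return enable, enable_play


unit = ControlUnit()
inputs = [  # (processing, dip, playing) per control step
    (0, 1, 0), (1, 1, 0), (1, 1, 0),  # dip = 1: processing runs
    (0, 1, 0), (0, 1, 1),             # file ended: Play is enabled
    (0, 1, 0),                        # playback ended
    (0, 0, 0), (0, 1, 0),             # dip 0 resets, dip 1 restarts
]
outputs = [unit.step(*values) for values in inputs]
print(outputs)
```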
The Enable port is used to make the LEDs flash with the period Tdip.
If there is no Enable, the switch connects the Dip port to the LED so the user can
see and check the setting of the dip switches.
The first signal in Figure 28 is the Enable, the second one is the Dip and the last
is the signal at the LED block. As shown, if there is no Enable the LEDs
represent the number set on the dip switches, in this case 1.
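The switching behaviour can be sketched in Python. The flash pattern is not specified in the text, so the sketch assumes that the LEDs alternate between all on and all off on every control step; the value 15 for "all on" assumes four LEDs, and the function name is invented for the example.

```python
def led_output(enable, dip, step, all_on=15):
    """Value sent to the LED block at a given control step.

    Assumption: while enabled, the LEDs toggle between all on and all off
    once per step (one step corresponds to Tdip).
    """
    if enable:
        return all_on if step % 2 == 0 else 0
    return dip  # no Enable: the LEDs mirror the dip switch setting


flashing = [led_output(1, 1, step) for step in range(4)]
idle = [led_output(0, 1, step) for step in range(4)]
print(flashing, idle)
```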
5.1.6 Delay
By now all the blocks have been described except for the memory blocks in the top level.
Those blocks are necessary for Simulink to solve the model. Without these blocks
Matlab reports an error that there is an algebraic loop which can't be solved.
This leads to a small delay when finding the end of the file: the Embedded
Control Unit receives the end of the signal with a delay of Tdip.
However, this is not a problem because even if the end of the file weren't
recognised for some seconds, the additional samples would just be zeros. When playing
the file there would be a longer period of zeros, but it wouldn't be noticed as it
isn't audible.
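The effect of such a memory block can be shown with a one-step delay in Python. The class name, the initial value of 0 and the flag sequence are assumptions for the example; one step stands for one period Tdip.

```python
class Memory:
    """Unit delay: outputs the value received in the previous step."""

    def __init__(self, initial=0.0):
        self.state = initial

    def step(self, value):
        previous = self.state
        self.state = value
        return previous


# End-of-file flag as produced by the subsystem and as seen by the control.
flag = [1, 1, 1, 0, 0]
memory = Memory(initial=0)
delayed = [memory.step(value) for value in flag]
print(delayed)
```

The change from 1 to 0 reaches the control one step later than it occurs, which is the delay of Tdip mentioned above.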
so it gets written directly into the variable. As this is in real time the values can be
read from there then.
The other feature that was not implemented is changing the Phase Vocoder parameters at
runtime.
The reason also has to do with the storage of data.
In the simulation this is mostly done manually. This means that before
starting the simulation the variables are read from an m-file or from a mask. In this
case a mask is used.
The mask opens when clicking on the Phase Vocoder block. There are fields
where values can be typed in and assigned to a variable name defined in the mask.
Using masked parameters has the benefit that the values can be read or written within a
simulation. With commands like get_param or set_param these values can
be read and changed while the simulation is running. Unfortunately this is not
possible when developing for hardware.
The solution would be similar to the one before, just using different data types.
Another aspect not mentioned so far is Tunable Parameters. These are the values
which can be changed at runtime. As some parameters in the Phase
Vocoder are not tunable, because Simulink reports internal errors, this
needs some further research. From today's point of view, however, there shouldn't be
a problem as the Phase Vocoder is controlled by the Embedded Control Unit and
those values would only be changed while processing is not running because the
subsystem would be disabled. Therefore the errors should be switched off in the RTW
configuration.
6 Conclusion
The project covered different fields of development.
It was possible to set up a new development environment from scratch. Thanks to this
project, upcoming projects can use the implemented workflow and the procured
hardware to develop with a Rapid Prototyping approach.
Thus, there is no need to spend a lot of time investigating hardware requirements and
tool workflows, as the Simulink model can be downloaded onto the hardware with one
click.
A very interesting aspect of this project was that there was nothing to build on. This
meant that many things had to be considered and evaluated: not just
technical aspects but also the financial side of the project.
So this project gives an insight into the whole development process from hardware and
software to simulation, implementation on hardware and finally testing.
7 References
1. Arons, Barry. SpeechSkimmer: A System for Interactively Skimming Recorded
Speech. s.l. : ACM Transactions on Computer-Human Interaction, 1997.
2. Bateman, Andy and Paterson-Stephens, Iain. The DSP Handbook: Algorithms,
Applications and Design Techniques. s.l. : Prentice Hall, 2002. 978-0201398519.
3. Kuo, Sen M., Lee, Bob H. and Tian, Wenshun. Real-Time Digital Signal
Processing: Implementations and Applications. s.l. : Wiley, 2003. 978-0470014950.
4. Proakis, John G. and Manolakis, Dimitris K. Digital Signal Processing (4th
Edition). s.l. : Prentice Hall, 2006. 978-0131873742.
5. Akhan, Mehmet and Larson, Keith. DSP Intro Slides. s.l. : University of
Hertfordshire; Texas Instruments, 1998.
6. Hunt Engineering. [Online] 2011 06 09. [Cited: 22 12 2011.]
http://www.hunteng.co.uk/info/fpga-or-dsp.htm.
7. Poole, Ian. FPGAs for DSP Hardware. Radio-electronics.com. [Online] [Cited: 11 1 2012.]
http://www.radio-electronics.com/info/rf-technology-design/digital-signal-processing/fpga-dsp.php.
8. IEEE. IEEE Standard for Floating-Point Arithmetic Std. 754-2008. 2008. 978-07381-5753-5.
9. Texas Instruments. [Online] [Cited: 18 11 2011.] http://www.ti.com/tool/ccstudio.
10. MathWorks. [Online] [Cited: 18 11 2011.] http://www.mathworks.co.uk.
11. Instruments, National. National Instruments. [Online] [Cited: 21 11 2011.]
http://www.ni.com/labview/whatis/.
12. MathWorks, Inc. Matlab R2009b Producthelp.
13. Mathworks, Inc. Embedded Target for TI C6000 DSP Release Notes.
14. Texas Instruments Inc. TMS320C6713 DSP Starter Kit. Product Information.
15. Inc., Spectrum Digital. TMS320C6713 DSK Module Technical Reference. 2003.
16. Texas Instruments Inc. Datasheet - TMS320c6711.
17. MathWorks. Developing Embedded Targets using Real-Time Workshop
Embedded Coder. 2010.
18. Rabiner, Lawrence R. and Schafer, Ronald W. Digital Processing of Speech
Signals. New Jersey : Prentice-Hall, Inc., 1978.
19. Ostrop, Dennis and Buhr, Daniel de. Time Domain Harmonic Scaling. Köln :
FH Köln, 2007.
20. Bühler, Christian and Liechti, Christian. Veränderung der [...]
TheDSPDimension. [Online] 1999. [Cited: 18 11 2011.] http://www.dspdimension.com.
25. Dolson, Mark. The Phase Vocoder: A Tutorial. s.l. : Computer Music Journal,
1986.
26. Laroche, Jean and Dolson, Mark. New Phase-Vocoder Techniques for Pitch-Shifting,
Harmonizing and Other Exotic Effects. New York : IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics, 1999.
27. Sethares, William A. A Phase Vocoder in Matlab. [Online] [Cited: 2011 11 18.]
http://sethares.engr.wisc.edu/vocoders/phasevocoder.html.
28. Portnoff, Michael R. Implementation of the Digital Phase Vocoder Using the
Fast Fourier Transform. s.l. : IEEE Trans. Acoustics, Speech, and Signal Processing,
1976.
29. Sethares, William A. Rhythm and transforms. s.l. : Springer, 2007. 9781846286391.
30. Puckette, Miller S. and Brown, Judith C. Accuracy of Frequency Estimates
Using the Phase Vocoder. s.l. : IEEE TRANSACTIONS ON SPEECH AND AUDIO
PROCESSING, 1998.
31. Götzen, Amalia De, Bernardini, Nicola and Arfib, Daniel. Traditional (?)
Implementations of a Phase Vocoder: The tricks of the trade. Verona : Proceedings
of the COST G-6 Conference on Digital Audio Effects (DAFX-00), 2000.
32. Thesis, S. Ganapathis M.Sc. Introduction to Simulink, Link for CCS. 2006.
33. The MathWorks, Inc. Target for TI C6000. [Online] [Cited: 12 12 2011.]
http://www.kxcad.net/cae_MATLAB/toolbox/tic6000/f3-108524.html.
34. Murmu, Manas. Application of Digital Signal Processing on TMS320C6713 DSK.
Department of Electronics and Communication Engineering, National Institute Of
Technology, Rourkela. 2008. Bachelor Thesis.
8 Table of figures
Figure 1: Rapid Prototyping process
Figure 2: Layout DSK C6713 (14)
Figure 3: Functional Block Diagram of the DSK C6713 (14)
Figure 4: Workflow Simulink (17)
Figure 5: Software pieces used in workflow
Figure 6: Signal before and after modification (20)
Figure 7: "Phase Vocoder" overview (30)
Figure 8: Phase of 2 samples (29)
Figure 9: "Spectral Manipulation" (26)
Figure 10: Phase unwrapping (25)
Figure 11: "Phase Vocoder" Simulink
Figure 12: "Overlap ST-FFT" detail
Figure 13: "Synthesis Phase Calculation" detail
Figure 14: Signal 1,2,3,4
Figure 15: "Principal Argument" detail
Figure 16: Signal 4,5,6
Figure 17: Signal 7,8
Figure 18: "Overlap IST-FFT" detail
Figure 19: Input signal
Figure 20: Output signal
Figure 21: Top-Level Simulink
Figure 22: "FindEndOfFile" subsystem
Figure 23: "FindEndOfFile" signal
Figure 24: "Processing" subsystem
Figure 25: "Play" subsystem
Figure 26: Enable signals
Figure 27: "LedFlash" subsystem
Figure 28: "LedFlash" signals
9 Appendix
9.1 Configure MATLAB/Simulink and CCS 3.3
9.1.1 CCS
This tutorial describes how to set up the C6713 DSK within CCS 3.3.
After the installation, the package with the specific data such as drivers, examples
etc. has to be downloaded from the Spectrum Digital homepage and copied to the
installation path.
CCS should now initialise the DSK while starting. Sometimes there are problems with the
emulation of the USB. To fix this, start the C6713 DSK Diagnostic Utility and unplug
the board. Plug it in again and it should work.
Problems can also occur because of wrong linking.
If you have built a project (or MATLAB has done it automatically), a wrong link could be
registered and the project will not compile.
To change this setting, right-click on your *.pjt and click on Build Options. You
will see something similar to the following figure. There you have to change the
include search path to the installation path with your DSK-specific files.
The environment is now ready to work with. There are some nice tutorials for the first
steps, such as (32), or the help file of CCS, which provides further useful information
about the whole program.
To get CCS 3.3 and MATLAB connected you have to choose connect from the
debug menu.
As we now know that our hardware is addressable and the Simulink libraries are
available, we can set up the MATLAB/Simulink environment.
8. Choose the TI C6000 compiler. Set Symbolic debugging
9. In the Select tree, choose the Debug category. Select Verbose build
10. In the Select tree, choose the Solver category. Ensure that Solver is set to
Fixed type / discrete
11. Set the following Real-Time Workshop run-time options:
- Build action: Build_and_execute
- Interrupt overrun notification method: Print_message
In the model itself you need to add the targetc6713 preferences block. This block
represents your driver and will be included when generating C code. The default
parameters should be fine for most programs. However, if you want to change memory
settings you can do it there.
Version
Matlab
7.9
Simulink
7.4
4.0
3.4
Real-Time Workshop
7.4
5.4
6.10
6.12
4.0
3.3.81.6
CCSPlatinum_v30330
File                             Description
model.c or .cpp
model_private.h                  Contains local macros and local data that are required by the model
                                 and subsystems. This file is included by the generated source files in
                                 the model. You do not need to include model_private.h when
                                 interfacing hand-written code to a model.
model.h
model_data.c (conditional)
model_types.h
rtwtypes.h
ert_main.c or .cpp (optional)
autobuild.h (optional)
points.
See Static Main Program Module for further information.
model_capi.c
model_capi.h
(optional)
%----------------------------------------------------------------------
% Control to enable the "Processing" block
% ***
if ((enableFNew==0) && (enableFOld==1))
    fileend = 1;
    startplay = 1;
    % enables the start of the "Play" block
end
if ((dip==1) && (fileend==0))
    enablevar=1;
else
    enablevar=0;
end
%
% ***
%----------------------------------------------------------------------
% Writing output and set variable for next step
% ***
enable=enablevar;
enableplay=enableplayvar;
ledFlashOut=ledFlash;
enableFOld=enableFNew;
enablePOld=enablePNew;
% ***
%#eml
end