EE5700 DSP Lab Report Matrix Operations and Program Flow Control

EE5700 DSP Application Lab
Lab Report
Jamesh K Johny, Sanket Seth
Contents
Module 1: Assembly Coding Familiarization
1. Working on Load-Store-Move-Loops-Circular Buffers
2. Working on Arithmetic Logical & Shift Instructions
3. Working on Program Flow -CC management -Bit operations
Module 2: FIR filter using input and output side based convolutions
1. Low pass filter using the input side based algorithm
2. High pass filter using the output side based algorithm
Module 3: A-Law companding (encoding and decoding) on the recorded speech
1. A-Law companding (encoding)
2. A-Law companding (decoding)
Module 4: DSBSC & SSBSC (both upper & lower sideband) modulation
Module 5: Echo Cancellation using the LMS algorithm
1. Discrete-time channel model with FIR impulse response
2. Continuous-time R-C low pass filter model for echo impulse response
Module 6: 8-point Fast Fourier Transform
Module 7: Image Restoration
1. Optical Blur Degradation on an image
2. Restore an image that has undergone an optical blur degradation
Module 8: Discrete Cosine Transform
DCT & IDCT Implementation
Module 1: Assembly Coding Familiarization

1. Working on Load-Store-Move-Loops-Circular Buffers
1. Write a program to store 16-bit LSBs of the input array (containing five 32-bit
numbers) with sign extension in an output array (containing five 32-bit numbers)
Example:
Input array = {0xF0F02828, 0xAA67F444, 0x1DB31234, 0xABCD6DF0, 0x45988777}
Output array = {0x00002828, 0xFFFFF444, 0x00001234, 0x00006DF0, 0xFFFF8777}
Repeat the experiment for storing the result with zero extension.
.section .data.l1 //Declare variables in this section
.align 4
in: .byte4 0xF0F02828, 0xAA67F444, 0x1DB31234, 0xABCD6DF0, 0x45988777;
.align 4
out: .byte4 0,0,0,0,0;
.section .text //Write code in this section
.global _main; //main function is always declared as global; function names always start with an underscore in assembly
_main: //Start of main function
nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+out@GOT17M4]; //loading p1 with the address of out variable
p4=5;
loop start LC0=p4;
loop_begin start;
r1=[p0++];
r2=r1.l(x);
[p1++]=r2;
loop_end start;
nop;
nop;
p0=[p3+in@GOT17M4];
p1=[p3+out@GOT17M4];
nop;
nop;
_main.end: //End of main function
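As a cross-check of the example values, the same operation can be sketched in C. This is only an illustrative host-side model, not the Blackfin code; the function names low16_sign_extend and low16_zero_extend are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep the low 16 bits of each word and sign-extend them to 32 bits. */
void low16_sign_extend(const uint32_t *in, uint32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t low = in[i] & 0xFFFFu;
        /* If bit 15 is set, fill the upper half with ones. */
        out[i] = (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
    }
}

/* Keep the low 16 bits of each word and zero-extend them to 32 bits. */
void low16_zero_extend(const uint32_t *in, uint32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] & 0xFFFFu;
}
```

With the input array of the example, the sign-extended variant reproduces the output array shown above, while the zero-extended variant yields 0x0000F444 and 0x00008777 for the second and fifth elements.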
2. Write a program to fetch every 2nd element of an input array (8 elements) and
store them in an output array twice (use Circular addressing).
Example:
Input array = {0x2828, 0x4444, 0x1234, 0x6DF0, 0x7777, 0x0EEEE, 0x1B11, 0x5111}
Output array = {0x2828, 0x1234, 0x7777, 0x1B11, 0x2828, 0x1234, 0x7777, 0x1B11}
.align 2
in: .byte2 0x2828, 0x4444, 0x1234, 0x6DF0, 0x7777, 0x0EEEE, 0x1B11, 0x5111;
.align 2
out: .byte2 0,0,0,0,0,0,0,0;
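nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+out@GOT17M4]; //loading p1 with the address of out variable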
p4=16;
i0=p0;b0=i0;l0=p4;
i1=p1;b1=i1;
m0=4;
m1=2;
p4=8;
loop start LC0=p4;
loop_begin start;
r1.l=w[i0];
w[i1]=r1.l;
i0+=m0;
i1+=m1;
loop_end start;
nop;
nop;
p0=[p3+in@GOT17M4];
p1=[p3+out@GOT17M4];
nop;
nop;
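The effect of the circular read pointer can be modelled in C with a modulo on the read index. This is a sketch under the assumption that the index is counted in elements rather than bytes; circular_fetch is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy every stride-th element of a circular input buffer of in_len
   elements into out, producing out_len values. The modulo plays the
   role of the base/length wrap-around of a circular index register. */
void circular_fetch(const uint16_t *in, size_t in_len, size_t stride,
                    uint16_t *out, size_t out_len)
{
    assert(in_len > 0);
    size_t rd = 0;
    for (size_t i = 0; i < out_len; i++) {
        out[i] = in[rd];
        rd = (rd + stride) % in_len;   /* wrap back to the start */
    }
}
```

Called with in_len = 8, stride = 2 and out_len = 8, the four even-indexed elements are written twice, as in the example.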
2. Working on Arithmetic Logical & Shift Instructions

3. There are two arrays of size 9 (each element 16-bits wide). Write a program to
perform the following:
1) AND the first element of both the arrays
2) OR the second element of both the arrays
3) NOT the third element of first array
4) Find absolute value of the third element of second array .( Use abs after sign
extending to 32 bit value)
5) ADD the fourth element of both the arrays
6) SUBTRACT the fifth element of the second array from the fifth element of first
array.
7) Multiply the sixth element of both arrays in signed integer format and saturate the
result to 16.0 precision in destination register half. (Use (IS) option).
8) Multiply the seventh element of both arrays in signed fraction format and saturate
the result to 1.15 precision in destination register half.
9) Add 0x25 to the eighth element of first array. (Use Add Immediate instruction
+=).
10) Find 2s complement of the eighth element of the second array.(Use Negate
instruction - after sign extending it to 32 bits).
11) Left Shift the 9th element of first array by 3 bits. (use shift instruction <<=)
12) Right Shift the 9th element of second array by 2 bits. (use shift instruction >>=)
Store the results in an array of size 12 (each element 16-bits wide).
Example:
Input array1 [9] = {0x2828, 0xA167, 0x1DB3, 0x80F0, 0x45AB, 0xFFF1, 0x56DE,
0x1111, 0x0020}
Input array2 [9] = {0x7777, 0x5000, 0x8678, 0x4598, 0x432A, 0xFFF0, 0x0680,
0xD444, 0x0040}
Output array [12] = {0x2020, 0xF167, 0xE24C, 0x7988, 0xC688, 0x0281,
0x00F0,0x0469, 0x1136, 0x2BBC, 0x0100, 0x0010}
.align 2
in1: .byte2 0x2828, 0xA167, 0x1DB3, 0x80F0, 0x45AB, 0xFFF1, 0x56DE, 0x1111,
0x0020;
.align 2
in2: .byte2 0x7777, 0x5000, 0x8678, 0x4598, 0x432A, 0xFFF0, 0x0680, 0xD444,
0x0040;
.align 2
out: .byte2 0,0,0,0,0,0,0,0,0,0,0,0;
.global _main; //main function is always declared as global
p0=[p3+in1@GOT17M4]; //loading p0 with the address of in1 variable
p1=[p3+in2@GOT17M4]; //loading p1 with the address of in2 variable
p2=[p3+out@GOT17M4]; //loading p2 with the address of out variable
//and
r1=w[p0++];
r2=w[p1++];
r1=r1&r2;
w[p2++]=r1;
//or
r1=w[p0++];
r2=w[p1++];
r1=r1|r2;
w[p2++]=r1;
//not
r1=w[p0++];
r2=w[p1++](x); //sign-extend so that abs below sees the sign of the 16-bit value
r1=~r1;
w[p2++]=r1;
//abs
r1=abs r2;
w[p2++]=r1;
//add
r1=w[p0++];
r2=w[p1++];
r1.l=r1.l+r2.l;
w[p2++]=r1;
//subtract
r1=w[p0++];
r2=w[p1++];
r1.l=r1.l-r2.l;
w[p2++]=r1;
//Multiply (signed integer, 16.0)
r1=w[p0++];
r2=w[p1++];
r1.l=r1.l*r2.l(is);
w[p2++]=r1;
//Multiply (signed fraction, 1.15)
r1=w[p0++];
r2=w[p1++];
r1.l=r1.l*r2.l;
w[p2++]=r1;
//add 0x25
r1=w[p0++](x);
r1+=0x25;
w[p2++]=r1;
//2's complement
r2=w[p1++](x);
r1=- r2;
w[p2++]=r1;
//left shift 3
r1=w[p0++](x);
r1<<=3;
w[p2++]=r1;
//right shift 2
r2=w[p1++];
r2>>=2;
w[p2++]=r2;
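The difference between the two multiplies in steps 7 and 8 can be illustrated in C. This is a sketch, not a bit-exact model of the processor: round-to-nearest in the fractional case and an arithmetic right shift of negative values are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Signed integer multiply (16.0 format), saturated to 16 bits. */
int16_t mul_int16_sat(int16_t a, int16_t b)
{
    return sat16((int32_t)a * (int32_t)b);
}

/* Signed fractional multiply (1.15 format), rounded and saturated. */
int16_t mul_q15_sat(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* product in 2.30 format */
    return sat16((p + 0x4000) >> 15);     /* round, rescale to 1.15 */
}
```

For the example inputs, 0xFFF1 and 0xFFF0 are -15 and -16, so the integer product is 240 (0x00F0), and 0x56DE times 0x0680 in 1.15 format gives 0x0469, matching the seventh and eighth output elements.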
4. Write a program to perform 3x3 Matrix multiplications between two arrays and
store the output in an array of 9 elements each 16 bits wide. Use Accumulator to
perform the multiplication. Write the program in generic form so that it can be easily
modified for matrices of different dimensions.
Note: You can only declare a single dimension array. The first 3 elements of the array
belong to first row, the next 3 elements to the second row and the last 3 elements to
the third row.
.align 2
in1: .byte2 1,2,3,2,3,1,3,1,2;
.align 2
in2: .byte2 1,2,3,2,3,1,3,1,2;
.align 2
out: .byte2 0,0,0,0,0,0,0,0,0;
.global _main; //main function is always declared as global
p0=[p3+in1@GOT17M4]; //loading p0 with the address of in1
p1=[p3+in2@GOT17M4]; //loading p1 with the address of in2
p2=[p3+out@GOT17M4]; //loading p2 with the address of out
p4=18;
i0=p0;b0=i0;
i1=p1;b1=i1;l1=p4;
m0=2;
m1=6;
p4=3;
loop outer LC0=p4;
loop_begin outer;
loop inner LC1=p4;
loop_begin inner;
r0.l=w[i0];
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0=r0.l*r1.l(is);
r0.l=w[i0];
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l(is);
r0.l=w[i0];
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l(is);
r0=a0;
w[p2++]=r0;
i1+=m0;
i0=b0;
loop_end inner;
i0+=m1;
b0=i0;
i1=b1;
loop_end outer;
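For comparison, a dimension-independent version of the same row-by-column product can be sketched in C. The flat row-major layout follows the note in the task; matmul_flat is a hypothetical name and the result is simply truncated to 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c = a * b for n x n matrices stored row-major in flat arrays. */
void matmul_flat(const int16_t *a, const int16_t *b, int16_t *c, size_t n)
{
    assert(n > 0);
    for (size_t row = 0; row < n; row++) {
        for (size_t col = 0; col < n; col++) {
            int32_t acc = 0;  /* plays the role of the accumulator */
            for (size_t k = 0; k < n; k++)
                acc += (int32_t)a[row * n + k] * (int32_t)b[k * n + col];
            c[row * n + col] = (int16_t)acc;  /* keep the low 16 bits */
        }
    }
}
```

With both inputs set to {1,2,3,2,3,1,3,1,2}, the product is {14,11,11,11,14,11,11,11,14}.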
3. Working on Program Flow -CC management -Bit operations

5. Write a program to separate the positive and negative numbers of an array
(containing eight 16-bit numbers) and perform the following operations.
For positive numbers, CLEAR bit 7 and bit 11.
For negative numbers, SET bit 4 and TOGGLE bit 8.
Store the results in two different arrays.
Example:
Input array = {0x2828, 0xC444, 0x1234, 0x2F02, 0x7777, 0xEFFE, 0xABCD, 0x5D9F}
Output array1 = {0xC554, 0xEEFE, 0xAADD}
Output array2 = {0x2028, 0x1234, 0x2702, 0x7777, 0x551F}
.align 2
in: .byte2 0x2828, 0xC444, 0x1234, 0x2F02, 0x7777, 0xEFFE, 0xABCD, 0x5D9F;
.align 2
out1: .byte2 0,0,0,0,0,0,0,0;
.align 2
out2: .byte2 0,0,0,0,0,0,0,0;
nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+out1@GOT17M4]; //loading p1 with the address of out1 variable
p2=[p3+out2@GOT17M4]; //loading p2 with the address of out2 variable
p4=8;
loop start LC0=p4;
loop_begin start;
r0=w[p0++](x);
cc = r0 <= 0 ;
if cc jump dest_label1 ;
bitclr (r0,7);
bitclr (r0,11);
w[p1++]=r0;
jump dest_label2;
dest_label1:
bitset (r0,4);
bittgl (r0,8);
w[p2++]=r0;
jump dest_label2;
dest_label2: nop;
loop_end start;
nop;
nop;
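The bit manipulations can be checked against the example with a small C model. This is a sketch in which negative numbers are identified by bit 15 and, following the example, go to the first output array; split_and_modify is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split in[] by sign. Negative values get bit 4 set and bit 8 toggled,
   non-negative values get bits 7 and 11 cleared. */
void split_and_modify(const uint16_t *in, size_t n,
                      uint16_t *neg, size_t *n_neg,
                      uint16_t *pos, size_t *n_pos)
{
    assert(n_neg != NULL && n_pos != NULL);
    *n_neg = 0;
    *n_pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t v = in[i];
        if (v & 0x8000u) {                 /* sign bit set: negative */
            v |= (uint16_t)(1u << 4);      /* SET bit 4 */
            v ^= (uint16_t)(1u << 8);      /* TOGGLE bit 8 */
            neg[(*n_neg)++] = v;
        } else {
            v &= (uint16_t)~((1u << 7) | (1u << 11));  /* CLEAR bits 7, 11 */
            pos[(*n_pos)++] = v;
        }
    }
}
```

Applied to the input array of the example, this yields three negative results and five positive results with the values listed above.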
Module 2: FIR filter using input and output side based convolutions
For the given input signal
Input signal[35]: {0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x1e5a, 0x2ccc, 0x24c0, 0x0ccc,
0xf4d8, 0xeccc, 0xfb3f, 0x1999, 0x37f3, 0x4666, 0x3e5a, 0x2666, 0x0e72, 0x0666, 0x14d8, 0x3333,
0x518d, 0x6000, 0x57f3, 0x4000, 0x280c, 0x1fff, 0x2e72, 0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc,
0x4ccc}
1. Low pass filter using the input side based algorithm

System Response[21]={ 0x0000, 0x0166, 0x02fe, 0x04b5, 0x0675, 0x0826, 0x09af, 0x0afc, 0x0bf9,
0x0c97, 0x0ccc, 0x0c97, 0x0bf9, 0x0afc, 0x09af, 0x0826, 0x0675, 0x04b5, 0x02fe, 0x0166, 0x0000 }
.align 2
in: .byte2 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x1e5a, 0x2ccc, 0x24c0,
0x0ccc, 0xf4d8, 0xeccc, 0xfb3f, 0x1999, 0x37f3, 0x4666, 0x3e5a, 0x2666, 0x0e72,
0x0666, 0x14d8, 0x3333, 0x518d, 0x6000, 0x57f3, 0x4000, 0x280c, 0x1fff, 0x2e72,
0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc;
.align 2
h: .byte2 0x0000, 0x0166, 0x02fe, 0x04b5, 0x0675, 0x0826, 0x09af, 0x0afc, 0x0bf9,
0x0c97, 0x0ccc, 0x0c97, 0x0bf9, 0x0afc, 0x09af, 0x0826, 0x0675, 0x04b5, 0x02fe,
0x0166, 0x0000;
.align 2
out: .byte2
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0;
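nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+h@GOT17M4]; //loading p1 with the address of h variable
p2=[p3+out@GOT17M4]; //loading p2 with the address of out variable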
p4=21*2;
i0=p0;b0=i0;
i1=p1;b1=i1;l1=p4;
i2=p2;b2=i2;
m1=2;
p4=35;
p5=21;
loop outer LC0=p4;
loop_begin outer;
r0.l=w[i0++];
loop inner LC1=p5;
loop_begin inner;
r1.l=w[i1++];
r2.l=r0.l*r1.l;
r3.l=w[i2];
r3.l=r3.l+r2.l;
w[i2++]=r3.l;
loop_end inner;
i2=b2;
i2+=m1;
b2=i2;
loop_end outer;
nop;
nop;
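The input side algorithm used above can be summarised in C as follows. This is a minimal sketch on plain integers: the 1.15 fractional scaling of the assembly version is left out, and conv_input_side is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Input side convolution: every input sample x[n] scatters its
   contribution x[n]*h[k] into y[n+k]. y must hold nx + nh - 1 values. */
void conv_input_side(const int32_t *x, size_t nx,
                     const int32_t *h, size_t nh, int32_t *y)
{
    assert(nx > 0 && nh > 0);
    for (size_t i = 0; i < nx + nh - 1; i++)
        y[i] = 0;
    for (size_t n = 0; n < nx; n++)
        for (size_t k = 0; k < nh; k++)
            y[n + k] += x[n] * h[k];
}
```

For example, x = {1, 2, 3} and h = {1, 1} give y = {1, 3, 5, 3}.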
Results
[Plots: Input, LPF Coefficients, Output]
Output memory dump, little-endian:
00000000 00000000 00000000 00005500 33019102
2E04BA05 00071208 2F09A60A B10C3D0F 0C12CA14
41178519 D31B771E A6214D25 2D29352D 38311735
C138EF3B 953EDF40 24439E45 82478348 40486946
00434C3E C538DF32 DD2CCB26 9B204B1A 09143A0E
56097605 A302D700
Output memory dump, big-endian:
00000000 00000000 00000000
00550000 02910133 05BA042E
08120700 0AA6092F 0F3D0CB1
14CA120C 19851741 1E771BD3
254D21A6 2D35292D 35173138
3BEF38C1 40DF3E95 459E4324
48834782 46694840 3E4C4300
32DF38C5 26CB2CDD 1A4B209B
0E3A1409 05760956 00D702A3
2. High pass filter using the output side based algorithm:

System Response: {0x9999, 0x6666}
All the signals are of the type 16 bit signed fraction. Plot and observe the graph
for input and output signal.
.align 2 //Appended n-1 = 1 zero at the end
in: .byte2 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x1e5a, 0x2ccc, 0x24c0,
0x0ccc, 0xf4d8, 0xeccc, 0xfb3f, 0x1999, 0x37f3, 0x4666, 0x3e5a, 0x2666, 0x0e72,
0x0666, 0x14d8, 0x3333, 0x518d, 0x6000, 0x57f3, 0x4000, 0x280c, 0x1fff, 0x2e72,
0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc, 0x4ccc,0;
.align 2
h: .byte2 0x9999, 0x6666;
.align 2
temp: .byte2 0, 0;
.align 2
out: .byte2
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
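nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+h@GOT17M4]; //loading p1 with the address of h variable
p2=[p3+out@GOT17M4]; //loading p2 with the address of out variable
p3=[p3+temp@GOT17M4]; //loading p3 with the address of temp variable (p3 is overwritten here, so this load must come last)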
p4=2*2;
i0=p0;b0=i0;
i1=p1;b1=i1;l1=p4;
i2=p2;b2=i2;
i3=p3;b3=i3;l3=p4;
m1=2;
p4=35+2-1;
p5=2;
loop outer LC0=p4;
loop_begin outer;
r1.l=w[i0++];
w[i3]=r1.l;
a0=0;
loop inner LC1=p5;
loop_begin inner;
r1.l=w[i3++];
r2.l=w[i1++];
a0+=r1.l*r2.l;
loop_end inner;
r0.l=a0;
w[p2++]=r0;
i3-=m1;
loop_end outer;
nop;
nop;
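The output side algorithm can be sketched in C in the same way. Again this works on plain integers without the 1.15 scaling, and conv_output_side is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Output side convolution: every output sample y[n] gathers the
   products h[k]*x[n-k] over all taps. y must hold nx + nh - 1 values. */
void conv_output_side(const int32_t *x, size_t nx,
                      const int32_t *h, size_t nh, int32_t *y)
{
    assert(nx > 0 && nh > 0);
    for (size_t n = 0; n < nx + nh - 1; n++) {
        int32_t acc = 0;
        for (size_t k = 0; k < nh; k++) {
            /* skip taps that would read outside the input */
            if (n >= k && n - k < nx)
                acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

For example, x = {1, 2, 3} with the two-tap difference filter h = {-1, 1} gives y = {-1, -1, -1, 3}.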
Output
Little-endian
00000000 00000000 00000000 B8E771F4 70062913
2A137006 71F4B8E7 B8E770F4 6F062913 29137006
71F4B7E7 B7E770F4 70062813 29137106 70F4B7E7
FFFFFFFF FFFFFFFF FFFF703D
Big-endian
00000000 00000000 00000000 F471E7B8 13290670 0670132A
E7B8F471 F470E7B8 1329066F 06701329 E7B7F471 F470E7B7
13280670 06711329 E7B7F470 FFFFFFFF FFFFFFFF 3D70FFFF
[Plot: Output]
Module 3: A-Law companding (encoding and decoding) on the recorded speech
1. A-Law companding (encoding)
/*
0 0 0 0 0 0 0 A B C D X => S 0 0 0 A B C D
0 0 0 0 0 0 1 A B C D X => S 0 0 1 A B C D
0 0 0 0 0 1 A B C D X X => S 0 1 0 A B C D
0 0 0 0 1 A B C D X X X => S 0 1 1 A B C D
0 0 0 1 A B C D X X X X => S 1 0 0 A B C D
0 0 1 A B C D X X X X X => S 1 0 1 A B C D
0 1 A B C D X X X X X X => S 1 1 0 A B C D
1 A B C D X X X X X X X => S 1 1 1 A B C D
*/
.align 2
in: .byte2 0x001F,0x003F,0x007f,0x00ff,0x01ff,0x03ff,0x07ff,0x0fff,0x1fff;
.align 2
out: .byte2 0,0,0,0,0,0,0,0,0,0,0;
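nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+out@GOT17M4]; //loading p1 with the address of out variable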
r0=0;r1=0;r2=0;r3=0;r4=0;r5=0;r6=0;r7=0;
r1.h=0x1000;
r5=7;
p4=9;
loop start LC0=p4;
loop_begin start;
r0=w[p0++];//0x008b;
r0=r0<<16;
r2.h=r0.h<<3;
r3=r0 & r1;
cc=r3==0;
if cc jump next1;
r2=-r2;
r3=r3>>5;
next1:nop;
r4.l=signbits r2;
cc=r4<7(IU);
if cc jump next2;
r4.l=7;
next2:r4=r5-r4;
r6.l=r4.l<<4;
r6.l=r6.l+r3.h;
r0.l=4;
r6.h=19;
r6.h=r6.h+r4.l;
r6.h=r6.h<<8;
r0.l=r0.l+r6.h;
r7=extract(r2,r0.l)(z);
r7.l=r7.l+r6.l;
w[p1++]=r7;
loop_end start;
nop;
nop;
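The encoding table in the comment can be expressed compactly in C. This sketch follows that table only: the sign bit S is assumed to be 1 for negative samples, no alternate-bit inversion is applied, and alaw_encode is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a signed sample with a 12-bit magnitude into S, 3 chord bits
   and 4 mantissa bits (ABCD), following the table above. */
uint8_t alaw_encode(int sample)
{
    assert(sample >= -4095 && sample <= 4095);
    unsigned sign = 0;
    unsigned mag = (unsigned)sample;
    if (sample < 0) {
        sign = 0x80u;
        mag = (unsigned)(-sample);
    }

    /* The chord is set by the leading 1 among bits 11..5; chord 0 means
       that none of these bits is set. */
    unsigned chord = 0;
    for (unsigned bit = 11; bit >= 5; bit--) {
        if (mag & (1u << bit)) {
            chord = bit - 4;
            break;
        }
    }
    /* ABCD sits just below the leading 1 (or at bits 4..1 for chord 0). */
    unsigned shift = (chord == 0) ? 1u : chord;
    unsigned mantissa = (mag >> shift) & 0xFu;
    return (uint8_t)(sign | (chord << 4) | mantissa);
}
```

Under these assumptions, the test inputs 0x001F, 0x003F, 0x007F and 0x0FFF map to 0x0F, 0x1F, 0x2F and 0x7F respectively.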
2. A-Law companding (decoding)

/*
S 0 0 0 A B C D => 0 0 0 0 0 0 0 A B C D 1
S 0 0 1 A B C D => 0 0 0 0 0 0 1 A B C D 1
S 0 1 0 A B C D => 0 0 0 0 0 1 A B C D 1 0
S 0 1 1 A B C D => 0 0 0 0 1 A B C D 1 0 0
S 1 0 0 A B C D => 0 0 0 1 A B C D 1 0 0 0
S 1 0 1 A B C D => 0 0 1 A B C D 1 0 0 0 0
S 1 1 0 A B C D => 0 1 A B C D 1 0 0 0 0 0
S 1 1 1 A B C D => 1 A B C D 1 0 0 0 0 0 0
*/
.align 2
in: .byte2 0x0F,0x1F,0x2f,0x3f,0x4f,0x5f,0x6f,0x7f,0x81;
.align 2
out: .byte2 0,0,0,0,0,0,0,0,0,0,0;
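nop;
p0=[p3+in@GOT17M4]; //loading p0 with the address of in variable
p1=[p3+out@GOT17M4]; //loading p1 with the address of out variable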
r0=0;r1=0;r2=0;r3=0;r4=0;r5=0;r6=0;r7=0;
r6=0xf;
r7.l=33;
r7.h=1;
r5.l=0x403;
p4=9;
loop start LC0=p4;
loop_begin start;
r0=w[p0++];
r1=r0 & r6;
r1=r1<<1;
r1.l=r1.l + r7.l;
r2=extract(r0,r5.l)(z);
cc=r2==0;
if !cc jump next1;
r2=1;
bittgl(r1,5);
next1:
r2.l=r2.l-r7.h;
r3.l=lshift r1.l by r2.l;
cc = bittst(r0, 7);
if !cc jump next2;
r3=-r3;
next2:
w[p1++]=r3;
loop_end start;
nop;
nop;
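The decoding table can likewise be sketched in C. As in the assembly version, bit 7 set is taken to mean a negative sample; alaw_decode is a hypothetical name and the result is returned as a plain signed integer.

```c
#include <assert.h>
#include <stdint.h>

/* Expand an 8-bit code (S, 3 chord bits, ABCD) to a signed value with a
   12-bit magnitude, following the table above. */
int alaw_decode(uint8_t code)
{
    unsigned chord = (code >> 4) & 0x7u;
    unsigned mantissa = code & 0xFu;
    unsigned mag;

    if (chord == 0) {
        /* 0000000ABCD1: no leading 1, half-step bit appended */
        mag = (mantissa << 1) | 1u;
    } else {
        /* 1ABCD1 shifted left by chord-1 places */
        mag = (((0x10u | mantissa) << 1) | 1u) << (chord - 1);
    }
    assert(mag <= 4095u);
    return (code & 0x80u) ? -(int)mag : (int)mag;
}
```

For the test inputs of the listing, 0x0F decodes to 31, 0x1F to 63, 0x7F to 4032 and 0x81 to -3.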
Module 4: DSBSC & SSBSC (both upper & lower sideband) modulation
Message signal is a sine wave with unit amplitude at a frequency of 10 Hz
m(t) = sin(20πt)
Carrier signal is a sine wave with unit amplitude at a frequency of 50 Hz
c(t) = sin(100πt)
.align 2
sinesamples: .include "Sine.dat";
.align 2
sinewave: .skip 800; //Allocate 800 bytes in memory
.align 2
message: .skip 6400; //Allocate 6400 bytes in memory
.align 2
carrier: .skip 1280; //Allocate 1280 bytes in memory
.align 2
dsbsc: .skip 6400; //Allocate 6400 bytes in memory
.align 2
SSBSCupper: .skip 6400; //Allocate 6400 bytes in memory
.align 2
SSBSClower: .skip 6400; //Allocate 6400 bytes in memory
nop;
p0=[p3+sinesamples@GOT17M4];
p1=[p3+sinewave@GOT17M4];
r0=0;r1=0;r2=0;r3=0;r4=0;r5=0;r6=0;r7=0;
i0=p0;b0=i0;l0=2000*2;
i1=p1;
//single cycle of sine wave of unit amplitude and 10Hz frequency.
m0=10;
p4=400;
loop loop1 LC0=p4;
loop_begin loop1;
r0.l=w[i0];
w[i1++]=r0.l;
i0+=m0;
loop_end loop1;
//8 cycles of sine wave of unit amplitude and 10Hz frequency.
p1=[p3+message@GOT17M4];
i0=b0;
i1=p1;
m0=10;//10Hz
p4=3200;
loop loop2 LC0=p4;
loop_begin loop2;
r0.l=w[i0];
w[i1++]=r0.l;
i0+=m0;
loop_end loop2;
//Create the carrier signal which is a sine wave of unit amplitude and 50Hz frequency.
p1=[p3+carrier@GOT17M4];
i0=b0;
i1=p1;
m0=50;//50Hz
p4=640;
loop loop3 LC0=p4;
loop_begin loop3;
r0.l=w[i0];
w[i1++]=r0.l;
i0+=m0;
loop_end loop3;
//DSBSC
p2=[p3+message@GOT17M4]; //p2 holds the address of message, p1 still holds carrier
p5=[p3+dsbsc@GOT17M4];
i1=p1;b1=i1;l1=640*2;
i2=p2;b2=i2;l2=3200*2;
i3=p5;
p4=3200;
loop loop4 LC0=p4;
loop_begin loop4;
r0.l=w[i1++];
r1.l=w[i2++];
r2.l=r0.l*r1.l;
w[i3++]=r2.l;
loop_end loop4;
//SSBSC (upper side-band) output
p0=[p3+SSBSCupper@GOT17M4];
i0=p2;b0=i0;l0=3200*2; //sin ( 20 pi t )
i1=p1;b1=i1;l1=640*2;
m1=20*3*2; //3/4 cycle offset in bytes; 50Hz has 2000/25=80 samples per cycle
i1+=m1; //-cos ( 100 pi t )
i2=p2;b2=i2;l2=3200*2;
m2=100*3*2; //3/4 cycle offset in bytes; 10Hz has 2000/5=400 samples per cycle
i2+=m2; //-cos ( 20 pi t )
i3=p1;b3=i3;l3=640*2; //sin ( 100 pi t )
p4=3200;
loop loop5 LC0=p4;
loop_begin loop5;
r0.l=w[i0++];
r1.l=w[i1++];
a0=r0.l*r1.l; //Product A
r0.l=w[i2++];
r1.l=w[i3++];
r2.l=(a0+=r0.l*r1.l); //Product B and sum of product A & B
w[p0++]=r2;
//
loop_end loop5;
//SSBSC (lower side-band) output
p0=[p3+SSBSClower@GOT17M4];
i0=p2;b0=i0;l0=3200*2; //sin ( 20 pi t )
i1=p1;b1=i1;l1=640*2;
m1=20*3*2; //3/4 cycle offset in bytes
i1+=m1; //-cos ( 100 pi t )
i2=p2;b2=i2;l2=3200*2;
m2=100*3*2; //3/4 cycle offset in bytes
i2+=m2; //-cos ( 20 pi t )
i3=p1;b3=i3;l3=640*2; //sin ( 100 pi t )
p4=3200;
loop loop6 LC0=p4;
loop_begin loop6;
r0.l=w[i0++];
r1.l=w[i1++];
a0=r0.l*r1.l; //Product A
r0.l=w[i2++];
r1.l=w[i3++];
r2.l=(a0-=r0.l*r1.l); //Product B and difference of product A & B
w[p0++]=r2;
//
loop_end loop6;
nop;
nop;
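The products formed in the three loops can be written out with the usual trigonometric identities. This is a sketch derived from the comments in the listing, with \omega_m = 20\pi and \omega_c = 100\pi taken from the signal definitions; any fixed-point scaling is ignored.

```latex
\begin{align*}
s_{\mathrm{DSBSC}}(t) &= \sin(\omega_m t)\,\sin(\omega_c t)
  = \tfrac{1}{2}\bigl[\cos((\omega_c-\omega_m)t) - \cos((\omega_c+\omega_m)t)\bigr] \\
s_{\mathrm{upper}}(t) &= \sin(\omega_m t)\,[-\cos(\omega_c t)] + [-\cos(\omega_m t)]\,\sin(\omega_c t)
  = -\sin((\omega_c+\omega_m)t) \\
s_{\mathrm{lower}}(t) &= \sin(\omega_m t)\,[-\cos(\omega_c t)] - [-\cos(\omega_m t)]\,\sin(\omega_c t)
  = \sin((\omega_c-\omega_m)t)
\end{align*}
```

With the stated frequencies, the DSBSC output therefore contains components at 40 Hz and 60 Hz, the upper sideband output only 60 Hz and the lower sideband output only 40 Hz.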
[Plots: message, carrier, DSBSC, SSBSC upper, SSBSC lower]
Module 5: Echo Cancellation using the LMS algorithm

1. Discrete-time channel model with FIR impulse response
/*
* Least Mean Squares Algorithm
*/
#include <stdlib.h>
#define TAPS 4
#define SIZE 1000
int d[SIZE];
float e[SIZE],r,y;
float w[TAPS],h[TAPS]={0.4,1,0.7,0.2};
float u=0.2;//gain constant which influences the convergence and smoothness of the error plot and the steady state error
int main()
{
unsigned int i,j;
//Near-end speech samples are a pseudo-random sequence
for (i=0;i<SIZE;i++)
{
d[i]=rand();
d[i]=(d[i]%2==0)?-1:1;
}
for(i=TAPS;i<SIZE;i++)
{
r=0; y=0;
//discrete-time equivalent baseband model for the received sample
for(j=0;j<TAPS;j++)
r=r+(h[j]*d[i-j]);
//Step 1: Get the echo estimate i.e.
for(j=0;j<TAPS;j++)
y=y+(w[j]*d[i-j]);
//Step 2: Compute error
e[i]=r-y;
//Step 3: Update the weight vector
for(j=0;j<TAPS;j++)
w[j]=w[j]+(u*e[i]*d[i-j]);
}
}
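The three LMS steps can also be packaged as a function whose result is easy to check. The sketch below makes some assumptions that differ from the listing: a small linear congruential generator (the hypothetical helper next_symbol) replaces rand() so that the run is reproducible, and the symbol history is kept in a short shift register instead of a full array.

```c
#include <assert.h>
#include <stddef.h>

#define LMS_TAPS 4

/* Deterministic +1/-1 symbol source (stands in for rand()). */
static int next_symbol(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0xFFFFFFFFUL;
    return ((*state >> 16) & 1UL) ? 1 : -1;
}

/* Run n LMS iterations against the FIR channel h. The weight estimate
   is written to w and the last error sample is returned. */
float lms_identify(const float *h, float *w, float mu, size_t n)
{
    int d[LMS_TAPS] = {0};   /* d[0] is the newest symbol */
    unsigned long state = 1UL;
    float err = 0.0f;
    int j;

    assert(mu > 0.0f);
    for (j = 0; j < LMS_TAPS; j++)
        w[j] = 0.0f;

    for (size_t i = 0; i < n; i++) {
        float r = 0.0f, y = 0.0f;
        for (j = LMS_TAPS - 1; j > 0; j--)
            d[j] = d[j - 1];
        d[0] = next_symbol(&state);
        for (j = 0; j < LMS_TAPS; j++) {
            r += h[j] * (float)d[j];   /* received echo sample */
            y += w[j] * (float)d[j];   /* Step 1: echo estimate */
        }
        err = r - y;                   /* Step 2: error */
        for (j = 0; j < LMS_TAPS; j++)
            w[j] += mu * err * (float)d[j];  /* Step 3: weight update */
    }
    return err;
}
```

Because the model is noise-free, the weights should settle on the channel coefficients {0.4, 1, 0.7, 0.2} and the error should decay towards zero within the 1000 iterations used in the listing.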
Convergence curve
2. Continuous-time R-C low pass filter model for echo impulse response
/*
* Least Mean Squares Algorithm
*/
#define TAPS 16
#define SIZE 1024
#define OFFSET 72
int d[]={
#include "ts2.dat"
};
int r[]={
#include "ts2_rx.dat"
};
float fD[SIZE],fR[SIZE];
float e[SIZE],y;
float w[TAPS];
float mu=0.2;//gain constant which influences the convergence and smoothness of the error plot and the steady state error
int main()
{
unsigned int i,j;
for (i=0;i<SIZE;i++)
{
fD[i]=d[i]*1.52587890625e-05;//1/2^16
fR[i]=r[i+OFFSET+TAPS]* 0.00390625;//1/2^8
}
for(i=0;i<SIZE-TAPS;i++)
{
y=0;
//Step 1: Get the echo estimate i.e.
for(j=0;j<TAPS;j++)
y += w[j] * fD[j+i];//-j+TAPS-1
//Step 2: Compute error
e[i]=fR[i]-y;
//Step 3: Update the weight vector
for(j=0;j<TAPS;j++)
w[j] += mu * e[i] * fD[i+j];
}
}
Convergence curve
Module 6: 8-point Fast Fourier Transform

#include <math.h>
#define NFFT 8
#define STAGE 3//log2(8)
#define PI 3.1415926535897932384626433832795
float fData[NFFT];
float W[NFFT/2][2];
float fOutput[NFFT][2];
void complex_mult(float *Input1, float * Input2, float * Output);
void complex_add(float *Input1, float * Input2, float * Output);
void complex_sub(float *Input1, float * Input2, float * Output);
int main()
{
int i,j,k;
int BL,BF;
int iBitrev[NFFT];
float fTemp[2];
for (i=0;i<NFFT;i++){
fData[i]=(i+1.0)*0.1;
iBitrev[i]=0;
}
for (i=0;i<STAGE;i++)
for (j=0;j<NFFT;j++)
iBitrev[j]=(iBitrev[j]<<1)+((j>>i)&1);
for (i=0;i<NFFT/2;i++){
W[i][0] = cos(-2.0*PI/NFFT*i);
W[i][1] = sin(-2.0*PI/NFFT*i);
}
for (i=0;i<NFFT;i++)
fOutput[i][0]=fData[iBitrev[i]];
BL = NFFT/2;
BF = 1;
for (i=0;i<STAGE;i++){
for (j=0;j<BL;j++)
for (k=0;k<BF;k++){
complex_mult(fOutput[(j*2+1)*BF+k], W[k*BL], fTemp);
fOutput[(j*2+1)*BF+k][0]=fTemp[0];
fOutput[(j*2+1)*BF+k][1]=fTemp[1];
complex_add(fOutput[j*2*BF+k], fOutput[(j*2+1)*BF+k],
fTemp);
complex_sub(fOutput[j*2*BF+k], fOutput[(j*2+1)*BF+k],
fOutput[(j*2+1)*BF+k]);
fOutput[j*2*BF+k][0]=fTemp[0];
fOutput[j*2*BF+k][1]=fTemp[1];
}
BL=BL>>1;
BF*=2;
}
}
/*******************************************
* (a + bi)(c + di) = (ac - bd) + i (ad + bc)

*/
void complex_mult(float *Input1, float * Input2, float * Output)
{
Output[0]=Input1[0]*Input2[0]-Input1[1]*Input2[1];
Output[1]=Input1[0]*Input2[1]+Input1[1]*Input2[0];
}
/*******************************************
* (a + bi)+(c + di) = (a + c) + i (b + d)
*/
void complex_add(float *Input1, float * Input2, float * Output)
{
Output[0]=Input1[0]+Input2[0];
Output[1]=Input1[1]+Input2[1];
}
/*******************************************
* (a + bi)-(c + di) = (a - c) + i (b - d)
*/
void complex_sub(float *Input1, float * Input2, float * Output)
{
Output[0]=Input1[0]-Input2[0];
Output[1]=Input1[1]-Input2[1];
}
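Expected Result (for the input sequence 1, 2, ..., 8; the listing above scales its input by 0.1, so its output is scaled by the same factor)
1. 36.0000
2. -4.0000 - 9.6569i
3. -4.0000 - 4.0000i
4. -4.0000 - 1.6569i
5. -4.0000
6. -4.0000 + 1.6569i
7. -4.0000 + 4.0000i
8. -4.0000 + 9.6569i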
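A compact way to check the magnitudes in the list above is an 8-point radix-2 decimation-in-time FFT with hard-coded twiddle factors. The sketch below is an illustration rather than the report's implementation; the names cpx, W8 and fft8 are hypothetical. Note that with the forward kernel exp(-j*2*pi*k*n/8) used here, bin 1 comes out as -4 + 9.6569i, which is the complex conjugate of the corresponding entry in the list.

```c
#include <assert.h>

#define N8 8

typedef struct { float re, im; } cpx;

/* Twiddle factors W8^k = exp(-j*2*pi*k/8) for k = 0..3, hard-coded so
   that the sketch does not depend on the math library. */
static const cpx W8[4] = {
    { 1.0f, 0.0f }, { 0.70710678f, -0.70710678f },
    { 0.0f, -1.0f }, { -0.70710678f, -0.70710678f }
};

/* In-place 8-point radix-2 decimation-in-time FFT. */
void fft8(cpx x[N8])
{
    static const int rev[N8] = {0, 4, 2, 6, 1, 5, 3, 7};
    assert(x != 0);

    /* bit-reversal permutation of the 3-bit indices */
    for (int i = 0; i < N8; i++) {
        if (rev[i] > i) {
            cpx t = x[i];
            x[i] = x[rev[i]];
            x[rev[i]] = t;
        }
    }
    /* three butterfly stages with half-sizes 1, 2 and 4 */
    for (int half = 1; half < N8; half *= 2) {
        int step = N8 / (2 * half);   /* stride into the twiddle table */
        for (int base = 0; base < N8; base += 2 * half) {
            for (int k = 0; k < half; k++) {
                cpx w = W8[k * step];
                cpx a = x[base + k];
                cpx b = x[base + k + half];
                cpx t = { b.re * w.re - b.im * w.im,
                          b.re * w.im + b.im * w.re };
                x[base + k].re = a.re + t.re;
                x[base + k].im = a.im + t.im;
                x[base + k + half].re = a.re - t.re;
                x[base + k + half].im = a.im - t.im;
            }
        }
    }
}
```

Feeding the real sequence 1, 2, ..., 8 with zero imaginary parts gives 36 in bin 0 and -4 in bin 4, in agreement with the list.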
Module 7: Image Restoration

1. Optical Blur Degradation on an image
#include <math.h>
#include <stdlib.h>
#include <assert.h>
#define NFFT 256
#define SIGMA 1.0
#define KNL_SZ ceil(6*SIGMA + 1)
#define KNL_MID ((KNL_SZ-1)*0.5)
typedef struct { float Re; float Im; } complex; /* for complex number */
#define PI 3.1415926535897932384626433832795
int iImage[NFFT*NFFT]={
#include "Image256.dat"
};
void complex_mult(complex *Input1, complex * Input2, complex * Output);
void complex_add(complex *Input1, complex * Input2, complex * Output);
void complex_sub(complex *Input1, complex * Input2, complex * Output);
void fft2D(complex x[], int n1, int n2, int flag);
complex imageIn[NFFT*NFFT],kernelIn[NFFT*NFFT],imageOut[NFFT*NFFT];
int iImageOut[NFFT*NFFT];
int main()
{
int i,j,k;
int BL,BF;
float fSum=0;
//load image
for (i=0;i<NFFT*NFFT;i++)
imageIn[i].Re=iImage[i];
//Make kernel
for (i=0;i<KNL_SZ;i++)
for (j=0;j<KNL_SZ;j++){
kernelIn[i*NFFT+j].Re=exp(((i-KNL_MID)*(i-KNL_MID)+(j-KNL_MID)*(j-KNL_MID))/(-2*SIGMA*SIGMA));
fSum+=kernelIn[i*NFFT+j].Re;
//kernelIn[i*NFFT+j]/=2*PI*SIGMA*SIGMA;
}
//Normalize the kernel so that its coefficients sum to 1
for (i=0;i<KNL_SZ;i++)
for (j=0;j<KNL_SZ;j++)
kernelIn[i*NFFT+j].Re/=fSum;
fft2D(imageIn,NFFT,NFFT,1);
fft2D(kernelIn,NFFT,NFFT,1);
//Multiply the two spectra point by point
for (i=0;i<NFFT*NFFT;i++)
complex_mult(&imageIn[i],&kernelIn[i],&imageOut[i]);
fft2D(imageOut,NFFT,NFFT,-1);
//store image
for (i=0;i<NFFT*NFFT;i++)
iImageOut[i]=(int)(imageOut[i].Re/NFFT/NFFT);
}
/*******************************************
* (a + bi)(c + di) = (ac - bd) + i (ad + bc)
*/
void complex_mult(complex *Input1, complex * Input2, complex * Output)
{
Output->Re=Input1->Re*Input2->Re-Input1->Im*Input2->Im;
Output->Im=Input1->Re*Input2->Im+Input1->Im*Input2->Re;
}
/*******************************************
* (a + bi)+(c + di) = (a + c) + i (b + d)
*/
void complex_add(complex *Input1, complex * Input2, complex * Output)
{
Output->Re=Input1->Re+Input2->Re;
Output->Im=Input1->Im+Input2->Im;
}
/*******************************************
* (a + bi)-(c + di) = (a - c) + i (b - d)
*/
void complex_sub(complex *Input1, complex * Input2, complex * Output)
{
Output->Re=Input1->Re-Input2->Re;
Output->Im=Input1->Im-Input2->Im;
}
/*----------------------------------------------------------------------*/
/* Truncated Stockham algorithm for multi-column vector,
X(n1,n2) <- F_n1 X(n1,n2)
x[] input data of size n, viewed as n1 = n/n2 by n2 dimensional
array, flag =+1 forward transform, -1 for backward transform, y[] is
working space, which must be n in size. This function is supposed
to be internal (static), not used by application. Note that the
terminology of column or row respect to algorithms in the Loan's
book is reversed, because we use row major convention of C.
*/
static void stockham(complex x[], int n, int flag, int n2, complex y[])
{
  complex *y_orig, *tmp;
  int i, j, k, k2, Ls, r, jrs;
  int half, m, m2;
  float wr, wi, tr, ti;
  y_orig = y;
  r = half = n >> 1;
  Ls = 1; /* Ls=L/2 is the L star */
  while(r >= n2) { /* loops log2(n/n2) times */
    tmp = x; /* swap pointers, y is always old */
    x = y; /* x is always for new data */
    y = tmp;
    m = 0; /* m runs over first half of the array */
    m2 = half; /* m2 for second half, n2=n/2 */
    for(j = 0; j < Ls; ++j) {
      wr = cos(M_PI*j/Ls); /* real and imaginary part */
      wi = -flag * sin(M_PI*j/Ls); /* of the omega */
      jrs = j*(r+r);
      for(k = jrs; k < jrs+r; ++k) { /* "butterfly" operation */
        k2 = k + r;
        tr = wr*y[k2].Re - wi*y[k2].Im; /* complex multiply, w*y */
        ti = wr*y[k2].Im + wi*y[k2].Re;
        x[m].Re = y[k].Re + tr;
        x[m].Im = y[k].Im + ti;
        x[m2].Re = y[k].Re - tr;
        x[m2].Im = y[k].Im - ti;
        ++m;
        ++m2;
      }
    }
    r >>= 1;
    Ls <<= 1;
  };
  if (y != y_orig) { /* copy back to permanent memory */
    for(i = 0; i < n; ++i) { /* if it is not already there */
      y[i] = x[i]; /* performed only if log2(n/n2) is odd */
    }
  }
  assert(Ls == n/n2); /* ensure n is a power of 2 */
  assert(1 == n || m2 == n); /* check array index within bound */
}
/* The Cooley-Tukey multiple column algorithm, see page 124 of Loan.
x[] is input data, overwritten by output, viewed as n/n2 by n2
array. flag = 1 for forward and -1 for backward transform.
*/
void cooley_tukey(complex x[], int n, int flag, int n2)
{
  complex c;
  int i, j, k, m, p, n1;
  int Ls, ks, ms, jm, dk;
  float wr, wi, tr, ti;
  n1 = n/n2; /* do bit reversal permutation */
  for(k = 0; k < n1; ++k) { /* This is algorithms 1.5.1 and 1.5.2. */
    j = 0;
    m = k;
    p = 1; /* p = 2^q, q used in the book */
    while(p < n1) {
      j = 2*j + (m&1);
      m >>= 1;
      p <<= 1;
    }
    assert(p == n1); /* make sure n1 is a power of two */
    if(j > k) {
      for(i = 0; i < n2; ++i) { /* swap k <-> j row */
        c = x[k*n2+i]; /* for all columns */
        x[k*n2+i] = x[j*n2+i];
        x[j*n2+i] = c;
      }
    }
  }
  /* This is (3.1.7), page 124 */
  p = 1;
  while(p < n1) {
    Ls = p;
    p <<= 1;
    jm = 0; /* jm is j*n2 */
    dk = p*n2;
    for(j = 0; j < Ls; ++j) {
      wr = cos(M_PI*j/Ls); /* real and imaginary part */
      wi = -flag * sin(M_PI*j/Ls); /* of the omega */
      for(k = jm; k < n; k += dk) { /* "butterfly" */
        ks = k + Ls*n2;
        for(i = 0; i < n2; ++i) { /* for each row */
          m = k + i;
          ms = ks + i;
          tr = wr*x[ms].Re - wi*x[ms].Im;
          ti = wr*x[ms].Im + wi*x[ms].Re;
          x[ms].Re = x[m].Re - tr;
          x[ms].Im = x[m].Im - ti;
          x[m].Re += tr;
          x[m].Im += ti;
        }
      }
      jm += n2;
    }
  }
}
/* 3D Fourier transform:
The index for x[m] is mapped to (i,j,k) by
m = k + n3*j + n3*n2*i, i.e. the row major convention of C.
All indices start from 0.
This algorithm requires working space of n2*n3.
Stockham is efficient, good stride feature, but takes extra
memory same size as input data; Cooley-Tukey is in place,
so we take a compromise of the two.
*/
void fft2D(complex x[], int n1, int n2, int flag)
{
  complex *y;
  int i, n;
  assert(1 == flag || -1 == flag);
  n = n1*n2;
  y = (complex *) malloc( n2*sizeof(complex) );
  //assert(NULL != y);
  for(i=0; i < n; i += n2) { /* FFT in y */
    stockham(x+i, n2, flag, 1, y);
  }
  free(y);
  cooley_tukey(x, n, flag, n2); /* FFT in x */
}
[Figures: Original Image, Blurred Image]
2. Restore an image that has undergone an optical blur degradation

#include <math.h>
#include <stdlib.h>
#include <assert.h>
#define NFFT 256
#define SIGMA 1.0
#define LOW_THRESHOLD 0.01
#define KNL_SZ ceil(6*SIGMA + 1)
#define KNL_MID ((KNL_SZ-1)*0.5)
typedef struct { float Re; float Im; } complex; /* for complex number */
#define PI 3.1415926535897932384626433832795
int iImage[NFFT*NFFT]={
#include "Image256.dat"
};
void complex_mult(complex *Input1, complex * Input2, complex * Output);
void complex_add(complex *Input1, complex * Input2, complex * Output);
void complex_sub(complex *Input1, complex * Input2, complex * Output);
void fft2D(complex x[], int n1, int n2, int flag);
void Inverse_Filtering(complex *Input1, complex * kernel, complex * Output, float
threshold);
complex imageIn[NFFT*NFFT],kernelIn[NFFT*NFFT],imageOut[NFFT*NFFT];
int iImageOut[NFFT*NFFT];
complex RestoreImageIn[NFFT*NFFT],RestoreImageOut[NFFT*NFFT];
int Final[NFFT*NFFT];
int main()
{
int i,j,k;
int temp;
int BL,BF;
float fSum=0;
//load image
for (i=0;i<NFFT*NFFT;i++)
imageIn[i].Re=iImage[i];
//Make kernel
for (i=0;i<KNL_SZ;i++)
for (j=0;j<KNL_SZ;j++){
kernelIn[i*NFFT+j].Re=exp(((i-KNL_MID)*(i-KNL_MID)+(j-KNL_MID)*(j-KNL_MID))/(-2*SIGMA*SIGMA));
fSum+=kernelIn[i*NFFT+j].Re;
//kernelIn[i*NFFT+j]/=2*PI*SIGMA*SIGMA;
}
//Normalize the kernel so that its coefficients sum to 1
for (i=0;i<KNL_SZ;i++)
for (j=0;j<KNL_SZ;j++)
kernelIn[i*NFFT+j].Re/=fSum;
fft2D(imageIn,NFFT,NFFT,1);
//kernel transform
fft2D(kernelIn,NFFT,NFFT,1);
for (i=0;i<NFFT*NFFT;i++)
complex_mult(&imageIn[i],&kernelIn[i],&imageOut[i]);
fft2D(imageOut,NFFT,NFFT,-1);
//store image
for (i=0;i<NFFT*NFFT;i++)
iImageOut[i]=(int)(imageOut[i].Re/NFFT/NFFT);
//Image restoration
//load image
for (i=0;i<NFFT*NFFT;i++){
RestoreImageIn[i].Re=iImageOut[i];
RestoreImageIn[i].Im=0;
}
//Take FFT
fft2D(RestoreImageIn,NFFT,NFFT,1);
//Inverse filter every frequency bin
for (i=0;i<NFFT*NFFT;i++)
Inverse_Filtering(&RestoreImageIn[i],&kernelIn[i],&RestoreImageOut[i],LOW_THRESHOLD);
fft2D(RestoreImageOut,NFFT,NFFT,-1);
//store image
for (i=0;i<NFFT*NFFT;i++){
if(RestoreImageOut[i].Re>0)
Final[i]=(int)(RestoreImageOut[i].Re/NFFT/NFFT);
else
Final[i]=0;
}
}
void Inverse_Filtering(complex *Input1, complex * kernel, complex * Output, float
threshold)
{
float mag;
mag=kernel->Re*kernel->Re+kernel->Im*kernel->Im;
if(mag<threshold*threshold){
Output->Re=0;
Output->Im=0;
}else
{
Output->Re=Input1->Re*kernel->Re+Input1->Im*kernel->Im;
Output->Im=-Input1->Re*kernel->Im+Input1->Im*kernel->Re;
Output->Re/=mag;
Output->Im/=mag;
}
}
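The per-bin operation performed by Inverse_Filtering can be tested in isolation. The following sketch uses a by-value variant with hypothetical names (cplx, inverse_filter_bin) and adds a guard against a zero magnitude, which the listing above does not have.

```c
#include <assert.h>

typedef struct { float re, im; } cplx;

/* Thresholded inverse filter for one frequency bin:
   out = in * conj(k) / |k|^2 when |k| >= threshold, otherwise 0. */
cplx inverse_filter_bin(cplx in, cplx k, float threshold)
{
    assert(threshold >= 0.0f);
    cplx out = { 0.0f, 0.0f };
    float mag2 = k.re * k.re + k.im * k.im;   /* |k|^2 */
    if (mag2 > 0.0f && mag2 >= threshold * threshold) {
        out.re = (in.re * k.re + in.im * k.im) / mag2;
        out.im = (in.im * k.re - in.re * k.im) / mag2;
    }
    return out;
}
```

For instance, the bin value 3 - 2i blurred by the kernel value 0.5 + 0.25i becomes 2 - 0.25i, and inverse filtering with a threshold of 0.01 recovers 3 - 2i; a kernel value of magnitude 0.001 falls below the threshold and the bin is set to zero.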
/*******************************************
*/
void complex_mult(complex *Input1, complex * Input2, complex * Output)
{
Output->Re=Input1->Re*Input2->Re-Input1->Im*Input2->Im;
Output->Im=Input1->Re*Input2->Im+Input1->Im*Input2->Re;
}
/*******************************************
* (a + bi)+(c + di) = (a + c) + i (b + d)
*/
void complex_add(complex *Input1, complex * Input2, complex * Output)
{
Output->Re=Input1->Re+Input2->Re;
Output->Im=Input1->Im+Input2->Im;
}
/*******************************************
* (a + bi)-(c + di) = (a - c) + i (b - d)
*/
void complex_sub(complex *Input1, complex * Input2, complex * Output)
{
Output->Re=Input1->Re-Input2->Re;
Output->Im=Input1->Im-Input2->Im;
}
/*----------------------------------------------------------------------*/
/* Truncated Stockham algorithm for multi-column vector,
X(n1,n2) <- F_n1 X(n1,n2)
x[] input data of size n, viewed as n1 = n/n2 by n2 dimensional
array, flag =+1 forward transform, -1 for backward transform, y[] is
working space, which must be n in size. This function is supposed
to be internal (static), not used by application. Note that the
terminology of column or row respect to algorithms in the Loan's
book is reversed, because we use row major convention of C.
*/
static void stockham(complex x[], int n, int flag, int n2, complex y[])
{
complex *y_orig, *tmp;
int i, j, k, k2, Ls, r, jrs;
int half, m, m2;
y_orig = y;
r = half = n >> 1;
Ls = 1;
/* Ls=L/2 is the L star */
while(r >= n2) {

/* loops log2(n/n2) times
tmp = x;
/* swap pointers, y is always old
x = y;
/* x is always for new data
y = tmp;
m = 0;
/* m runs over first half of the array
m2 = half;
/* m2 for second half, n2=n/2
for(j = 0; j < Ls; ++j) {
/* of the omega
jrs = j*(r+r);
for(k = jrs; k < jrs+r; ++k) {
/* "butterfly" operation
k2 = k + r;
tr = wr*y[k2].Re - wi*y[k2].Im;
/* complex multiply, w*y
ti = wr*y[k2].Im + wi*y[k2].Re;
x[m].Re = y[k].Re + tr;
x[m].Im = y[k].Im + ti;
x[m2].Re = y[k].Re - tr;
x[m2].Im = y[k].Im - ti;
++m;
++m2;
}
}
r >>= 1;
Ls <<= 1;
};
if (y != y_orig) {
*/
*/
*/
*/
*/
*/
*/
*/
*/
/* copy back to permanent memory */
for(i = 0; i < n; ++i) {

y[i] = x[i];
}
/* if it is not already there */

/* performed only if log2(n/n2) is odd */
}
assert(Ls == n/n2);
assert(1 == n || m2 == n);
/* ensure n is a power of 2
/* check array index within bound
*/
*/
}
/* The Cooley-Tukey multiple column algorithm, see page 124 of Van Loan.
   x[] is input data, overwritten by output, viewed as an n/n2 by n2
   array. flag = 1 for forward and -1 for backward transform.
*/
void cooley_tukey(complex x[], int n, int flag, int n2)
{
   complex c;
   int i, j, k, m, p, n1;
   int Ls, ks, ms, jm, dk;
   double wr, wi, tr, ti;                /* not declared in the listing; double is assumed */

   n1 = n/n2;                            /* do bit reversal permutation */
   for(k = 0; k < n1; ++k) {             /* This is algorithms 1.5.1 and 1.5.2. */
      j = 0;
      m = k;
      p = 1;                             /* p = 2^q, q used in the book */
      while(p < n1) {
         j = 2*j + (m&1);
         m >>= 1;
         p <<= 1;
      }
      assert(p == n1);                   /* make sure n1 is a power of two */
      if(j > k) {
         for(i = 0; i < n2; ++i) {       /* swap k <-> j row */
            c = x[k*n2+i];               /* for all columns */
            x[k*n2+i] = x[j*n2+i];
            x[j*n2+i] = c;
         }
      }
   }
                                         /* This is (3.1.7), page 124 */
   p = 1;
   while(p < n1) {
      Ls = p;
      p <<= 1;
      jm = 0;                            /* jm is j*n2 */
      dk = p*n2;
      for(j = 0; j < Ls; ++j) {
         wr = cos(M_PI*j/Ls);            /* real and imaginary part of the omega */
         wi = -flag*sin(M_PI*j/Ls);      /* (both lines assumed: absent from the listing, need <math.h>) */
         for(k = jm; k < n; k += dk) {   /* "butterfly" */
            ks = k + Ls*n2;
            for(i = 0; i < n2; ++i) {    /* for each row */
               m = k + i;
               ms = ks + i;
               tr = wr*x[ms].Re - wi*x[ms].Im;
               ti = wr*x[ms].Im + wi*x[ms].Re;
               x[ms].Re = x[m].Re - tr;
               x[ms].Im = x[m].Im - ti;
               x[m].Re += tr;
               x[m].Im += ti;
            }
         }
         jm += n2;
      }
   }
}
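The first loop of `cooley_tukey` computes, for each row index k, the index j obtained by reversing the low log2(n1) bits of k, and swaps the two rows only when j > k so that each pair is exchanged once. A minimal sketch of that index computation in isolation follows; the function names `bit_reverse` and `bit_reverse_table` are hypothetical and do not appear in the lab code.

```c
#include <assert.h>

/* Reverse the low log2(n1) bits of k. n1 must be a power of two.
   Mirrors the j/m/p loop at the top of cooley_tukey. */
unsigned bit_reverse(unsigned k, unsigned n1)
{
    unsigned j = 0, p = 1;

    while (p < n1) {
        j = 2 * j + (k & 1u);   /* shift the lowest bit of k into j */
        k >>= 1;
        p <<= 1;
    }
    assert(p == n1);            /* fails if n1 is not a power of two */
    return j;
}

/* Fill perm[k] with the row index that row k is exchanged with. */
void bit_reverse_table(unsigned perm[], unsigned n1)
{
    unsigned k;

    for (k = 0; k < n1; ++k)
        perm[k] = bit_reverse(k, n1);
}
```

For n1 = 8 the table is 0, 4, 2, 6, 1, 5, 3, 7, so only the pairs (1,4) and (3,6) are actually swapped.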
/* 2D Fourier transform:
   The index for x[m] is mapped to (i,j) by
   m = j + n2*i, i.e. the row-major convention of C.
   All indices start from 0.
   This algorithm requires working space of n2.
   Stockham is efficient, with a good stride feature, but takes extra
   memory of the same size as its input data; Cooley-Tukey is in place,
   so we take a compromise of the two.
*/
void fft2D(complex x[], int n1, int n2, int flag)
{
   complex *y;
   int i, n;

   assert(1 == flag || -1 == flag);
   n = n1*n2;
   y = (complex *) malloc( n2*sizeof(complex) );
   assert(NULL != y);
   for(i=0; i < n; i += n2) {            /* FFT in y */
      stockham(x+i, n2, flag, 1, y);
   }
   free(y);
   cooley_tukey(x, n, flag, n2);         /* FFT in x */
}
[Figure: Restored image]
Module 8: Discrete Cosine Transform

DCT & IDCT Implementation
.align 2
c: .include "C1.dat";
.align 2
pixel: .include "pixel.dat";
.align 2
ct: .include "Ct1.dat";
.align 2
op1: .skip 128; //Allocate 8*8*2 bytes in memory
.align 2
DCT: .skip 128; //Allocate 8*8*2 bytes in memory
.align 2
op2: .skip 128; //Allocate 8*8*2 bytes in memory
.align 2
IDCT: .skip 128; //Allocate 8*8*2 bytes in memory
nop;
p0=[p3+c@GOT17M4]; //loading p0 with the address of C variable
p1=[p3+pixel@GOT17M4]; //loading p1 with the address of pixel variable
p2=[p3+op1@GOT17M4]; //loading p2 with the address of op1 variable
r0=64;
r0=r0<<1;
i0=p0;b0=i0;l0=r0;
i1=p1;b1=i1;l1=r0;
m0=2;
r0=8*2;
m1=r0;
p4=8;
r0=0;
r1=0;
loop outer LC0=p4;
loop_begin outer;
loop inner LC1=p4;
loop_begin inner;
r0.l=w[i0]; //1
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0=r0.l*r1.l;
r0.l=w[i0]; //2
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //3
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //4
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //5
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //6
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //7
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //8
i0+=m0;
r1.l=w[i1];
i1+=m1;
r0.l=(a0+=r0.l*r1.l);
//r0=a0;
w[p2++]=r0;
i1+=m0;
i0=b0;
loop_end inner;
i0+=m1;
b0=i0;
i1=b1;
loop_end outer;
p0=[p3+op1@GOT17M4]; //loading p0 with the address of op1 variable (left operand of op1*Ct)
p1=[p3+ct@GOT17M4]; //loading p1 with the address of ct variable
p2=[p3+DCT@GOT17M4]; //loading p2 with the address of DCT variable
r0=64;
r0=r0<<1;
i0=p0;b0=i0;l0=r0;
i1=p1;b1=i1;l1=r0;
m0=2;
r0=8*2;
m1=r0;
p4=8;
r0=0;
r1=0;
loop outer2 LC0=p4;
loop_begin outer2;
loop inner2 LC1=p4;
loop_begin inner2;
r0.l=w[i0]; //1
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0=r0.l*r1.l;
r0.l=w[i0]; //2
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //3
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //4
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //5
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //6
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //7
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //8
i0+=m0;
r1.l=w[i1];
i1+=m1;
r0.l=(a0+=r0.l*r1.l);
//r0=a0;
w[p2++]=r0;
i1+=m0;
i0=b0;
loop_end inner2;
i0+=m1;
b0=i0;
i1=b1;
loop_end outer2;
nop;
nop;
p0=[p3+ct@GOT17M4]; //loading p0 with the address of ct variable
p1=[p3+DCT@GOT17M4]; //loading p1 with the address of DCT variable
p2=[p3+op2@GOT17M4]; //loading p2 with the address of op2 variable
r0=64;
r0=r0<<1;
i0=p0;b0=i0;l0=r0;
i1=p1;b1=i1;l1=r0;
m0=2;
r0=8*2;
m1=r0;
p4=8;
r0=0;
r1=0;
loop outer3 LC0=p4;
loop_begin outer3;
loop inner3 LC1=p4;

loop_begin inner3;
r0.l=w[i0]; //1
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0=r0.l*r1.l;
r0.l=w[i0]; //2
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //3
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //4
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //5
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //6
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //7
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //8
i0+=m0;
r1.l=w[i1];
i1+=m1;
r0.l=(a0+=r0.l*r1.l);
//r0=a0;
w[p2++]=r0;
i1+=m0;
i0=b0;
loop_end inner3;
i0+=m1;
b0=i0;
i1=b1;
loop_end outer3;
nop;
nop;
p0=[p3+op2@GOT17M4]; //loading p0 with the address of op2 variable (left operand of op2*C)
p1=[p3+c@GOT17M4]; //loading p1 with the address of C variable
p2=[p3+IDCT@GOT17M4]; //loading p2 with the address of IDCT variable
r0=64;
r0=r0<<1;
i0=p0;b0=i0;l0=r0;
i1=p1;b1=i1;l1=r0;
m0=2;
r0=8*2;
m1=r0;
p4=8;
r0=0;
r1=0;
loop outer4 LC0=p4;
loop_begin outer4;
loop inner4 LC1=p4;
loop_begin inner4;
r0.l=w[i0]; //1
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0=r0.l*r1.l;
r0.l=w[i0]; //2
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //3
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //4
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //5
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //6
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //7
i0+=m0;
r1.l=w[i1];
i1+=m1;
a0+=r0.l*r1.l;
r0.l=w[i0]; //8
i0+=m0;
r1.l=w[i1];
i1+=m1;
r0.l=(a0+=r0.l*r1.l);
//r0=a0;
w[p2++]=r0;
i1+=m0;
i0=b0;
loop_end inner4;
i0+=m1;
b0=i0;
i1=b1;
loop_end outer4;
nop;
nop;
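Each of the four loop nests above is an 8x8 row-by-column product of 16-bit words: eight multiply-accumulates into `a0`, then a rounded extraction of the upper half into `r0.l`. A host-side C model of one such product can help when checking the memory dumps. The sketch below is an approximation under stated assumptions: operands are treated as Q15 signed fractions, rounding is round-half-up, and saturation is applied only at the final extraction. The hardware rounding mode depends on the processor configuration, and the special case of -1 times -1 is not modelled. `q15_matmul8` and `sat16` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8

/* Saturate a wide value to the signed 16-bit range. */
int16_t sat16(int64_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* out = a * b for 8x8 row-major Q15 matrices.
   Each product is doubled (Q15 x Q15 -> Q31), summed in a wide
   accumulator, then rounded to the upper 16 bits and saturated. */
void q15_matmul8(const int16_t a[N * N], const int16_t b[N * N], int16_t out[N * N])
{
    int r, c, k;

    assert(a != NULL && b != NULL && out != NULL);
    for (r = 0; r < N; ++r) {
        for (c = 0; c < N; ++c) {
            int64_t acc = 0;
            for (k = 0; k < N; ++k)       /* row r of a times column c of b */
                acc += 2 * (int64_t)a[r * N + k] * b[k * N + c];
            acc += 0x8000;                /* round half up (assumed mode) */
            /* floor division by 65536 without shifting a negative value */
            acc = (acc >= 0) ? acc / 65536 : -((-acc + 65535) / 65536);
            out[r * N + c] = sat16(acc);
        }
    }
}
```

Calling it twice, first with the coefficient matrix and the pixel block and then with that result and the transposed coefficients, models the forward transform; the inverse follows the same pattern with the operands exchanged.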
[Figure: DCT result]
[Figure: IDCT result]
