2, June 2016
AbstractThe purpose of this study is to propose two kinds of thread-level parallelization computing Zernike
moments. Our proposed parallel algorithms are based on the combination of the q-recursive method and the
Prata's method with symmetry by a certain dihedral group. The experiments results show that our proposed
method is fast and accurate. The synchronized parallelization is applicable to greater than 250 order Zernike
moments calculation. The reductive parallelization is suitable for computing Zernike moments order between
10 and 250. For computing all Zernike moments up to order 500, it only requires 3.499 sec. on a quad-core
personal computer for a 512 512 test image. By our proposed method, the normalized mean square error with
less than 500 orders Zernike moments is 0.001435 for image reconstruction, whereas the error rate by qrecursion method is 0.001866.
KeywordsImage Processing; Parallel Computing; Prata's Method; Q-Recursive Method; Zernike Moments.
AbbreviationsNormalized Mean Square Error (NMSE).
I.
INTRODUCTION
ISSN: 2321-2381
R nm r
R nm r
can be expressed as
k
R nmk r ,
k m
k m :even
R nmk 1
where
n k
!
2
nk
2
n k k m k m
!
!
2
2
2
f z
Zernike moments
Z nm
(2)
Z nm
are defined as
f z V nm z
z D
dz d z
2i
dx dy rdr d
(3)
V nm z
(4)
11
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
x s , y t s , t
2 s N 1 2t N 1
N 2
N 2
(5)
Z nm
f z f x, y
f x , y f s , t P s , t .
x, y
M
lim
Z nm V nm
M n0 m n
lim
M
R nm
m 0
r Re Z nm cos m
II.
Im Z nm
sin
(7)
Our proposed method is based on the combination of qrecursive method and the Prata's method. Let us recall the
both methods in this part. The q-recursive method is
introduced by Chong et al., [6] which is based the following
recurrence:
R nm r K 1 R n , m 4 r K
nM
Z nm V nm x , y
(8)
f x , y f x , y dxdy
f x , y dxdy
2
(9)
s , t A Z |
s , t N 1, t s
2
(11)
K3
R n,m 2 ,
2
r
(13)
m n 4 , n 6 ,..., 1 or 0
with the initial conditions.
n
R nn r r , n 0 and
R n , n 2 r nr
n 1 r
n2
K2
K3
n m
4 m 3
4 m 2 m 1
n m 2 n m
K3 n m 4
(14)
, n 2
H Z
PROPOSED ALGORITHMS
n m even
f x , y
(12)
n m even
r
Z
R
n0 n0
n0
cos m
P s , t R nm r
i
sin
m
s , t H Z
2n 1 s t
N
cos m / 4
P s , t R nm r
s , t H Z V 4
i sin m / 4
st
m 4
(15)
for
,
. The time complexity of q-recursive
method is O(N2 M2). For a fixed order n and a fixed r,
computing all Zernike radial polynomial Rnm(r) for all
repetition m takes a time complexity of O(M).
The other method, namely the Pratas method, originally
introduced in [Prata & Rusch, 4], uses the recurrence given
by
R nm r rC 1 R n 1 , m 1 r C 2 R n 2 , m r
(16)
where the constants can computed as follows:
C1
2n
m n
, C2
m n
m n
1 C1
for
m n , n 2 ,..., 1
or 0
(17)
12
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
C1, C 2
x*x+ y*y
Figure 1: The plots of radial polynomial R200,0(r) using the qrecursive method and the modified Prata's method
for m, k=0,1,...,M
2b. Compute
Z[0][0]=Z[0][0]+2/ *N*N*f(x,y)
2c. For n=1 to 3 step 1
Using q-recursive method to compute
Z[n][m]=Z[n][m]+2/ *N*N*f(x,y)*R_nm*(
cos m i* sin m ) for m=n, n-2,..., 1( or 0)
Next n
2d. For n=4 to M step 1
2d1. Using q-recursive method to compute
Z[n][m]=Z[n][m]+2/ *N*N*f(x,y)*R_nm*(
cos m i* sin m ) for m=n, n-2,..., 7( or 6)
2d2. If n is even then
a=4
Else
a=5
End if
2d3. Using Prata's method to compute
Z[n][m]=Z[n][m]+2/ *N*N*f(x,y)*
R_nm*( cos m i* sin m )
for m=a, a-2,..,1( or 0)
Next n
End for
Algorithm A takes a time complexity of O(N2M2). The
total memory need has a complexity of O(N2+ M2), where the
array Z[0:M][0:M] storing the computational values for
Zernike moments contributes a complexity of O(M2), the
array storing the values f(x,y) for each pixel in a gray-level
image contributes a complexity of O(N2), and other arrays
and variables storing the temporary values for cos m , sin
m , rk and Rnm(r) contribute a complexity of O(M).
Algorithm A can be sped up by using D4 symmetry. Such
a modification can speed up to 8 times which we denote by
Algorithm A+.
2.3. Thread-Level Parallelism using Synchronization
The control variables s, t in the outer loop computing Zernike
moments have dependencies in the index numbers n, m for
the array storing the values for Zernike moments Znm. A
division of this array into disjoint subsets can be done, but not
indexed by the loop control variables s, t in outer loop. A
ISSN: 2321-2381
13
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
C1, C 2
as in Equation (17)
ISSN: 2321-2381
cos m i* sin m
)
for m=n, n-2,..., 1( or 0)
Next n
2d. For n=4 to M step 1
2d1. Using q-recursive method to compute
z[p][n][m]=z[p][n][m]+2/
*N*N*f(x,y)*R_nm*( cos m i* sin m )
for m=n, n-2,..., 7( or 6)
2d2. If n is even then
a=4
Else
a=5
End if
2d3. Using Prata's method to compute
z[p][n][m]=z[p][n][m]+2/ *N*N*f(x,y)*R_nm*(
cos m i* sin m
)
for m=a, a-2,..., 1( or 0)
Next n
End Parallel for
Parallel for p=0 to P-1 in P threads
For n=0 to M step 1
For m=n to 0 step -2
Z[n][m]= Z[n][m]+z[p][n][m]
Next m
Next n
End Parallel for
The memory need for any kind of parallel algorithms can
be computed as follows:
(total memory)
= (shared
memory)
(19)
III.
for a thread
EXPERIMENTAL RESULTS
x*x+ y*y
2b. Compute
z[p][0][0]=z[p][0][0]+2/ *N*N*f(x,y)
For n=1 to 3 step 1
Using q-recursive method to compute
z[p][n][m]=z[p][n][m]+2/ *N*N*f(x,y)*R_nm*(
for m, k=0,1,...,M
14
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
Figure 2: Ten standard test images are: Pirate, cameraman, house, lake, splash, Tiffany, Lena, tank, peppers and fighter F-16
ISSN: 2321-2381
(a)
(b)
(c)
Figure 3: The test image (a) 'Lena' of 512512 pixels for
experiments and its reconstruction images from computed values for
Zernike moments up to order 500 (b) using proposed method
Algorithm A+ and (c) using q-recursive method
15
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
The NMSE for those ten test images using the proposed
method and the q-recursive method at maximal order 500 are
all listed in Table 1.
Figure 4: The error rate for the reconstructed images using different
recursive methods
Table 1: The NMSE for test images using the proposed method Algorithm A+ and the q-recursive method at maximal order 500
NMSE (
Algorithms
Camera
Man
House
Tiffany
F-16
fighter
Lena
Tank
Pirate
Splash
Pepper
Lake
Proposed
Method
2.537
1.811
1.397
2.220
1.435
3.301
4.957
1.872
3.486
3.272
Q-Recursive
Method
2.822
2.064
1.733
2.492
1.866
3.509
5.234
2.516
3.923
3.666
ISSN: 2321-2381
16
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
90
80
70
60
50
40
30
20
10
0
10
50
100
150
200
250
300
350
400
450
500
q-recursive
0.047
0.827
3.23
7.176
12.807
20.92
29.734 39.983 51.823 65.754 82.742
Algorithm A+ 0.0139999 0.111
0.456
1.032
1.931
2.866
3.945
5.409
7.071
9.003
11.663
Figure 6: Elapsed time for Algorithm A+ and q-recursive method applied to the test image 'Lena' of 512512 pixels
Table 2: Elapsed time in sec. for Algorithm A, Algorithm A+ and Algorithm B+, and the associated speedup factors
t0
t1
s0
t2
p2
s2
t4
p4
s4
t8
p8
s8
14.385
1.931
7.45
1.019
1.89
14.12
0.713
2.71
20.18
0.582
3.32
24.72
21.755
2.866
7.59
1.53
1.87
14.22
1.028
2.79
21.16
0.804
3.56
27.06
30.629
3.945
7.76
2.115
1.87
14.48
1.514
2.61
20.23
1.085
3.64
28.23
41.275
5.409
7.63
2.920
1.85
14.14
1.85
2.92
22.31
1.542
3.51
26.77
53.776
7.071
7.61
3.811
1.86
14.11
2.523
2.80
21.31
1.942
3.64
27.69
68.934
9.003
7.66
4.947
1.82
13.93
3.277
2.75
21.04
2.664
3.38
25.88
88.428
11.663
7.58
6.471
1.80
13.67
4.625
2.52
19.12
3.499
3.33
25.27
Unit in time: second
ISSN: 2321-2381
17
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
(a)
(b)
Figure 7: (a) the elapsed time for Algorithm A+ and Algorithm B+ with number of threads p=2, 4 and 8 (b) The surface given by the parallel
factor function s P f ( M , P ) using Algorithm B+
Table 3: Elapsed time in sec. for Algorithm A, Algorithm A+ and Algorithm C+, and the associated speedup factors
max. order
10
50
100
150
200
250
300
t0
0.045
0.764
3.23
7.488
14.385
21.755
30.629
t1
0.014
0.111
0.456
1.032
1.931
2.866
3.945
s0
3.21
6.88
7.08
7.26
7.45
7.59
7.76
t2
0.008
0.059
0.244
0.608
1.015
1.544
2.224
p2
1.75
1.88
1.87
1.70
1.90
1.86
1.77
IV.
CONCLUSION
ISSN: 2321-2381
s2
5.63
12.95
13.24
12.32
14.17
14.09
13.77
t4
0.005
0.034
0.178
0.366
0.632
0.959
1.462
p4
2.80
3.26
2.56
2.82
3.06
2.99
2.70
s4
9.00
22.47
18.15
20.46
22.76
22.69
20.95
t8
p8
s8
0.005
2.80
9.00
0.035
3.17
21.83
0.137
3.33
23.58
0.303
3.41
24.71
0.522
3.70
27.56
0.812
3.53
26.79
1.647
2.40
18.60
Unit in time: second
takes 3.499 seconds to compute the top 500-order Zernike
moments of an image with 512512 pixels. With the use of
eight multithreads to compute high-order Zernike moments,
the parallel speedup factor lies between 3.32 and 3.64, a good
parallelism performance for a quad-core computer.
For medium maximal Zernike order, between 50 and
250, the parallelism by reduction method can be aptly used.
The parallel speedup factor lies between 3.17 and 3.70 with
eight multithreads, and the parallel speedup factor nears 3 just
by using four multithreads.
At maximal order 500, the normalized mean square error
is 0.001435 by using Algorithm A on the 512512 pixels test
image 'Lena', whereas the error rate for the image
reconstruction by q-recursion method is 0.001866. When
computing high-order Zernike moments, the proposed
method outperforms other compared methods in terms of
accuracy.
REFERENCES
[1]
[2]
18
The SIJ Transactions on Computer Science Engineering & its Applications (CSEA), Vol. 4, No. 2, June 2016
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
ISSN: 2321-2381
[15]
[16]
[17]
19