
Final thesis for S-15.

Michal Dobroczynski
Spring
2006
S-no. 15
2D FFT in Image Processing: measurements,
implementation, parallelization and computer
architecture
Keywords: Fourier Programming Linux
Synopsis:
The report describes an implementation of the two-dimensional Fast Fourier
Transform in the PACO project. Apart from mathematical equations and
sophisticated algorithms, the report tries to introduce and explain in detail
all the other factors that are usually skipped and may influence the overall
efficiency of the implementation. Since the PACO project is developed under
Linux, the thesis contains important notes about the Linux kernel and
describes in detail the project organization, including notes about the GNU
build process and tools. From the hardware point of view, this paper provides
a comprehensive description of currently available technology and the possible
pitfalls associated with it, mostly in terms of real-time, large-throughput,
Linux-oriented software.
As the final conclusion, the report states that the best solution is not to
implement 2D FFT routines from scratch, since the complexity of the task and
the number of possible hardware combinations will overwhelm every programmer.
I accept that the report is available at the library of the department.
Student: Michal Dobroczynski sign.
Supervisor: Norbert Krüger sign.
Company: Aalborg University Copenhagen
Co-ordinator: Henning Haugaard
Ext. examiner: Stig Skelboe sign.

Department of Electrical
Engineering and
Information Technology
Lautrupvang 15
2750 Ballerup
Denmark
Tel.: +45 4480 5130
Fax: +45 4480 5140
www.ihk.dk
ENGINEERING COLLEGE OF COPENHAGEN
Table of Contents
Preface
Part I - "In mathematics you don't understand things. You just get used to them."
Théorie analytique de la chaleur - where everything begins
Jean Baptiste Joseph Fourier
What Fourier Transform does - human language definition
Fourier transform: mathematical approach
Conditions
Dirac delta: impulse function
Periodic function
Exponential signal
Unit step
Constant signal
Gate function
Properties of Fourier transformation
DFT - Discrete Fourier Transform
Sampling theory
Quantization
DFT: special case of Fourier Transform
Properties of DFT
Convolution
Two-dimensional Fourier Transform
Introduction
Definition of two-dimensional Fourier Transform
Inverse two-dimensional Fourier Transform
Discrete two-dimensional Fourier Transform
DFT - computational bottleneck, FFT - efficiency winner
Fast Fourier Transform algorithm
Radix-2 Butterfly
DIT
DIF
Bit reversed
Implementing radix-2 DIF
Twiddle factors
Radix-2 DIF FFT
Testing radix-2 DIF FFT
Radix-2 in numbers: cost
Two-dimensional FFT: adapting 1D FFT
2D - computational madness
Programming considerations and overview of the method
Transposition
Parallelization and 2D FFT
Implementation of 2D FFT: reusing 1D FFT
2D FFT: test
Computations vs. methods
Algorithms for real data
Part II - "Not everything that can be counted counts, and not everything that counts can be counted."
Linux as destination operating system
Introduction to Linux environment
Operating system - organization
Linux means kernel
Kernel - organization
Scheduler and task management
Scheduler types
I/O and CPU bound processes
Priorities - influencing scheduler's queue
Quick note about I/O schedulers
Available I/O schedulers
Multi-threaded applications
Classic vs. modern processes
Why threads are so important?
SMP - more CPUs, more problems
Dual-core technology
Northbridge problem
64 bits to happiness
Why 64 is better than 32?
64 bits of problems
64-bit computational research box
Inside and close to CPU
Importance of this chapter
Stage 1: pipelining
Number of pipeline stages
Floating-point unit
Many processors - one floating point standard (IEEE 754)
x87
(Marketing) Extensions
Inside the memory hierarchy
Method
Sizes, levels, organization
One bus, different banks
Alternative solution - GPGPU
Introduction to GPU world
Streams
Specialization vs. optimization - datapath
FFT on GPU
Part III - "If you torture the data enough, it will confess."
Introduction
Project formulation
Milestone plan
Requirements
Problem solution
BenchFFT
Florian
mic
qtux
volks
Results from FFTW's webpage
Project structure
How does FFTW work
Portability and adaptivity - let's plan!
Calculating DFT: overview
Implementation notes for multi-threaded environments
Wisdom: using memoization
Conclusions
MFFT
Development method
Configure.in
Makefile.am
Iterations and milestones
Source code documentation
Idea of mfft and key features
Wisdom manager
Time engines
Configuration file and runtime settings
In action: how to use the library
Running mfftest: mfft output
Integration into MoInS
Measurements
MoInS
MFFTEST
Florian
Mic
Qtux
Volks
In-place algorithms
In-place, out-of-place with fully preemptible kernel
2D FFT
SciMark
Florian
mic
qtux
volks
natalie
Cache benchmark
Florian
Mic
Qtux
Volks
FFT on GPU
Final conclusions
Appendices
Acknowledgments
References
Preface
Two years ago, real-time filtering in the frequency domain of images larger
than 1 megapixel was, from the practical point of view, impossible. Today it
is possible, but that does not mean it is neat and easy. Especially now, when
technology is reaching its physical boundaries, programmers have to focus even
more on solving tasks in an optimal and efficient way. The other thing is that
accessing all the power lying in contemporary machines becomes more and more
complex.
This document is divided into three sections; each one corresponds to a
different field of science that is relevant to the "FFT problem", as it will
be called in later chapters. The first part introduces the Fourier transform,
its definition and applications, and slowly derives other forms of the
transform. The last chapters of the first part concentrate on the FFT
algorithm. "Example isn't another way to teach, it is the only way to teach"
(A. Einstein), so the discussion about FFT calculations ends up in
implementing both one- and two-dimensional routines in C. But a good
mathematical approach is just one of the ways to success - one out of many.
If programming were only about implementing algorithms or mathematical
formulas as they are, it would be a really nice and easy job for programmers.
Of course, it is not like that. The second part of the report tries to answer
several questions that are usually not even mentioned in most books about FFT
algorithms, but are key factors in reaching maximum performance. Please keep
in mind that the second part does not provide ready answers, as each problem
requires detailed analysis and several tests. Instead, it gives a clue about
the things that happen inside CPUs, memories and so on. It also pinpoints
important facts about hardware and Linux organization, and should definitely
help in creating further development plans and organizing test cases. It also
touches on an alternative solution, FFT on a GPU, which is a brand new
approach that should be carefully tested.
The third part tries to make use of all the knowledge introduced in the
previous parts. It shows step by step how the "FFT problem" was solved,
including details about the FFTW library, which was the one chosen after the
benchmarking process.
Why, then, write your own FFT routines? Why bother about all these facts,
since anyway "we are not going to write/use even a single piece of our own
FFT code"? Knowledge is power, and power can be used only by those who know
how to master it.
[Figure: The TUX and TUX's magnitude spectrum]
Part I - In mathematics you don't understand things.
You just get used to them.
Johann von Neumann
Théorie analytique de la chaleur - where
everything begins
Jean Baptiste Joseph Fourier
Fourier was born in Auxerre, France, on March 21, 1768 and is known for
establishing the so-called Fourier analysis. His early observations contained
errors [1], but they were nevertheless a big breakthrough. In 1822 he
published "Théorie analytique de la chaleur", in which he stated that every
function of a variable, whether continuous or discontinuous, can be expanded
into an infinite series of sines and cosines. From the moment his theory was
published, one question remained unanswered: when can one say that a function
can be expanded into a series of sines and cosines? Fourier's work was
continued first by Joseph Lagrange, who acknowledged the mentioned theorem,
but his work was still not satisfactory. Finally, Johann Dirichlet was the
first one to give a satisfactory presentation of Fourier's transformation.
The biggest achievement of Fourier's work was the fact that functions in the
frequency domain contain exactly the same information as the originals: this
means that one can analyze a function from a different point of view. This of
course resulted in an enormous number of applications of Fourier's theorem.
What Fourier Transform does - human language definition
The Fourier transform is defined as an integral (in terms of Riemann's
integral):

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$$
The integral above defines an operation that assigns to the function x(t) a
function X(ω) of a real variable. The inverse Fourier transform is defined
analogously:

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega$$

[1] He did not establish the conditions that have to be fulfilled by a function.

[Illustration 1: Joseph Fourier]
This time the function X(ω) is assigned the function x(t). If we have a
function x(t) and its Fourier transform, then we have a transform pair, which
is written symbolically as follows:

$$x(t) \Leftrightarrow X(\omega)$$
Before the Fourier transform theorem can be used, a set of conditions has to
be established. They can be divided into two categories: the first one sets
the conditions that the function x(t) has to meet in order to have a
transform pair, and the other one concerns the mutual properties of the
transform, that is, it tries to define the conditions under which it is
possible to apply the inverse transform and obtain the original function x(t).
Mathematicians and people already familiar with the Fourier transform may
consider the definitions above to be "written in human language", but what
does the Fourier transform in fact do? What is the result of applying it, and
what does it really mean that we can analyze a function from a different
point of view? Without tedious calculations, the definitions above do not say
anything about the possible outcome. Frankly speaking, the Fourier transform
decomposes the original function x(t) into its basic components: as
mentioned earlier in this text, into a sum of sines and cosines. An even
simpler approach is to say that by applying the Fourier transform to the
function (signal) x(t) we are able to see the different frequencies present
in x(t). Each basic component carries two pieces of information: amplitude
(gain) and frequency. In most applications we will be interested in altering
the amplitude of the elements having a certain frequency or frequencies. Of
course, the previously mentioned feature still holds: all the time we are
working on the same function. x(t) is the same; only the way of presenting it
is different.
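The decomposition described above can be illustrated with a minimal sketch. It is not code from the thesis: the test signal, the sample count `N` and the helper `amplitude_at` are assumptions made for illustration. The sketch projects a sampled signal onto sines and cosines of a few candidate frequencies and reports which of them are present, together with their amplitudes.

```python
import math

N = 64  # number of samples in the record (assumed test setup)

# Assumed test signal: amplitude 1.0 at 3 cycles per record
# plus amplitude 0.5 at 7 cycles per record.
signal = [1.0 * math.sin(2 * math.pi * 3 * n / N)
          + 0.5 * math.cos(2 * math.pi * 7 * n / N)
          for n in range(N)]


def amplitude_at(samples, freq):
    """Amplitude of the sinusoidal component with `freq` cycles per record."""
    count = len(samples)
    # Project the signal onto a cosine and a sine of the given frequency.
    a = sum(x * math.cos(2 * math.pi * freq * n / count)
            for n, x in enumerate(samples))
    b = sum(x * math.sin(2 * math.pi * freq * n / count)
            for n, x in enumerate(samples))
    return 2.0 * math.hypot(a, b) / count


amplitudes = {freq: amplitude_at(signal, freq) for freq in range(1, 11)}
present = [freq for freq, amp in amplitudes.items() if amp > 1e-6]
print(present, amplitudes[3], amplitudes[7])
```

Only the two frequencies that were put into the signal show a non-negligible amplitude; all other candidates project to (numerically) zero.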
Our function becomes a sum of infinitely many terms composed of sine and
cosine functions, each of them having a different frequency that is a
multiple of the so-called fundamental frequency. In music, the fundamental
frequency is the lowest frequency present in a signal, and it determines the
pitch. All higher frequencies that are multiples of the fundamental frequency
ω (2ω, 3ω, 4ω, ..., where ω = 2πf, so that ω is proportional to f) are called
harmonics. These higher-frequency components are responsible for giving the
so-called "color" to the instrument. That is why different instruments
playing the same tone have a different musical texture: they simply have
different acoustic spectra.
Why are we talking about sines and cosines as expansion functions? In fact,
any set of functions can be used, as long as all functions in that set are
orthogonal. This property can be defined in the following way:

$$(f_n, f_m) = 0, \quad n \neq m$$

$$(f_n, f_m) = \int_{t_1}^{t_2} f_n(t)\, f_m(t)\, dt$$
Orthogonality assures that the elements of a sequence will not interfere with
each other. Additionally, it brings in a lot of other applications and
possibilities, such as Parseval's theorem, which says that the energy of a
signal composed of orthogonal elements is equal to the sum of the energies of
the signal components.
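As a rough numerical illustration of the two statements above, the following sketch approximates the inner product with the midpoint rule. The helper names, the chosen harmonics and the step count are assumptions, not part of the thesis. It checks that two different harmonics are orthogonal over one period and that the energy of their weighted sum equals the sum of the component energies.

```python
import math


def inner_product(f, g, t1, t2, steps=10000):
    """Approximate the integral of f(t) * g(t) over [t1, t2] (midpoint rule)."""
    dt = (t2 - t1) / steps
    total = 0.0
    for i in range(steps):
        t = t1 + (i + 0.5) * dt
        total += f(t) * g(t)
    return total * dt


def harmonic(n):
    """Return the function t -> sin(n * t)."""
    return lambda t: math.sin(n * t)


period = 2 * math.pi

# Orthogonality: different harmonics give (numerically) zero.
cross = inner_product(harmonic(2), harmonic(3), 0.0, period)

# Energy of a single harmonic over one period is pi.
energy_2 = inner_product(harmonic(2), harmonic(2), 0.0, period)
energy_3 = inner_product(harmonic(3), harmonic(3), 0.0, period)


# Parseval-style check for a signal built from orthogonal components.
def mixed(t):
    return 2.0 * math.sin(2 * t) + 0.5 * math.sin(3 * t)


energy_mixed = inner_product(mixed, mixed, 0.0, period)
energy_sum = 2.0 ** 2 * energy_2 + 0.5 ** 2 * energy_3
print(cross, energy_mixed, energy_sum)
```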
The exponential series is orthogonal, and definitions presented with the use
of the exponential series are believed to be the most readable ones. Of
course, as stated above, all these definitions could also be presented by
means of:
- trigonometric series
- Legendre polynomials
- Walsh functions
- Haar functions
Fourier transform: mathematical approach
Conditions
To obtain a Fourier transform pair, the function x(t) has to fulfill the
following conditions:
- x(t) has to be absolutely integrable, meaning that

$$\int_{-\infty}^{\infty} |x(t)|\, dt < \infty$$

  This condition is a sufficient condition.
- If x(t) = β(t) sin(2πft + α), where f = const and α = const,
  β(t + k) < β(t), and for |t| > λ > 0 the function x(t)/t is absolutely
  integrable, then X(ω) exists and satisfies the inverse Fourier transform.
  Please keep in mind that ω = 2πf. This condition applies to the so-called
  sampling function, very often denoted as Sa(t), with Sa(at) = sin(at)/(at).
  This function is not absolutely integrable. Nevertheless, the Fourier
  transform of

$$h(t) = 2Af_0\,\mathrm{Sa}(2\pi f_0 t) = 2Af_0\,\frac{\sin(2\pi f_0 t)}{2\pi f_0 t}$$

  exists and is given by the following formula:
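$$H(f) = \int_{-\infty}^{\infty} 2Af_0\,\frac{\sin(2\pi f_0 t)}{2\pi f_0 t}\, e^{-j2\pi ft}\, dt = \dots = \frac{A}{\pi} \int_{-\infty}^{\infty} \frac{\sin(2\pi f_0 t)\cos(2\pi ft)}{t}\, dt$$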
The integrand of the imaginary part is odd, thus the imaginary part is equal
to zero. Substituting sin(x) cos(y) = 1/2 [sin(x + y) + sin(x - y)] into the
obtained integral gives:

$$H(f) = A(f_0 + f) \int_{-\infty}^{\infty} \left[ \frac{\sin(2\pi t(f_0 + f))}{2\pi t(f_0 + f)} \right] dt + A(f_0 - f) \int_{-\infty}^{\infty} \left[ \frac{\sin(2\pi t(f_0 - f))}{2\pi t(f_0 - f)} \right] dt$$
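From the table of standard integrals we know that

$$\int_{-\infty}^{\infty} \frac{\sin(2\pi ax)}{2\pi ax}\, dx = \frac{1}{2|a|},$$

so that our solution is as follows:

$$H(f) = A, \quad |f| < f_0$$
$$H(f) = \frac{A}{2}, \quad f = \pm f_0$$
$$H(f) = 0, \quad |f| > f_0$$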
For this condition to hold, it is now enough to show that the inverse Fourier
transform of the given example can be calculated. Then, of course, we obtain
a transform pair.

$$h(t) = \int_{-f_0}^{f_0} A\, e^{j2\pi ft}\, df = A \int_{-f_0}^{f_0} \cos(2\pi ft)\, df = \dots = 2Af_0\,\frac{\sin(2\pi f_0 t)}{2\pi f_0 t}$$

Finally, we can write that h(t) has the transform H(f) for |f| < f_0.
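The inverse transform of this example can be cross-checked numerically. The sketch below is only an illustration under assumed values: the amplitude `A`, the cut-off frequency `F0`, the sample times and the step count are arbitrary. It integrates A·exp(j2πft) over [-f_0, f_0] with the midpoint rule and compares the result with the closed form 2Af_0·Sa(2πf_0t).

```python
import cmath
import math

A = 1.5    # assumed amplitude of the rectangular spectrum
F0 = 2.0   # assumed cut-off frequency f_0


def h_closed_form(t):
    """2*A*f0*Sa(2*pi*f0*t), with Sa(0) defined as 1."""
    x = 2 * math.pi * F0 * t
    return 2 * A * F0 * (1.0 if x == 0 else math.sin(x) / x)


def h_numeric(t, steps=20000):
    """Midpoint-rule integral of A*exp(j*2*pi*f*t) for f in [-F0, F0]."""
    df = 2 * F0 / steps
    total = 0j
    for i in range(steps):
        f = -F0 + (i + 0.5) * df
        total += A * cmath.exp(2j * math.pi * f * t)
    return total * df


for t in (0.0, 0.1, 0.37):
    print(t, h_numeric(t), h_closed_form(t))
```

The imaginary part of the numerical result vanishes (up to rounding), as expected for an even, real spectrum.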
These two conditions are sufficient conditions for the existence of the
Fourier transform. By means of them we can establish a set of functions that
can be represented by a curve of finite height in any finite time interval.
One can easily spot that this definition lacks singular (impulse) functions.
It is possible to obtain Fourier transforms of periodic or impulse functions,
but only if distribution theory is applied. Why are they so important?
Transforms of singular functions can be used to develop other transform pairs
and can significantly simplify further calculations.
Dirac delta: impulse function
The Dirac delta function, sometimes referred to as the unit impulse function,
is defined by the following set of equations:

$$\delta(x) = +\infty, \quad x = 0$$
$$\delta(x) = 0, \quad x \neq 0$$
Its integral from minus to plus infinity is equal to 1. The Dirac delta is a
very useful approximation of a tall and narrow spike function, which is
referred to as an impulse. As mentioned above, a precise treatment of the
Dirac delta function requires distribution theory.
The Fourier transform of the Dirac delta function is as follows:

$$\int_{-\infty}^{\infty} A\,\delta(t)\, e^{-j\omega t}\, dt = A e^{0} = A$$
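And the inverse transform:

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} A\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\infty}^{\infty} [A\cos(\omega t)]\, d\omega + \frac{j}{2\pi} \int_{-\infty}^{\infty} [A\sin(\omega t)]\, d\omega = \dots = \frac{A}{2\pi} \int_{-\infty}^{\infty} \cos(\omega t)\, d\omega = A\,\delta(t)$$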
This shows that we have established a transform pair for the Dirac delta
function.
Periodic function
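It would also be very helpful to have a transform pair defined for periodic
functions. Let us consider the following example:

$$h(t) = \sin(\omega_0 t) \Leftrightarrow H(j\omega) = j\pi\,[\delta(\omega + \omega_0) - \delta(\omega - \omega_0)]$$

An analogous result, a pair of impulses at ±ω_0, is obtained for the cosine
function: cos(ω_0 t) ⇔ π[δ(ω + ω_0) + δ(ω - ω_0)].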
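Exponential signal
Let us consider an exponential signal of the following form:

$$h(t) = e^{-\alpha t}\, 1(t) \Leftrightarrow H(j\omega) = \frac{1}{\sqrt{\alpha^2 + \omega^2}}\, e^{-j\,\mathrm{arctg}\left(\frac{\omega}{\alpha}\right)}$$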
Unit step
This function is used very often, especially in electrical engineering and
automatic control.

$$1(t) \Leftrightarrow \pi\,\delta(\omega) + \frac{1}{j\omega}$$
Constant signal
$$A \Leftrightarrow 2\pi A\,\delta(\omega)$$
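Gate function

$$\Pi_{\tau}(t) = A, \quad |t| < \frac{\tau}{2}$$
$$\Pi_{\tau}(t) = 0, \quad |t| > \frac{\tau}{2}$$
$$\Pi_{\tau}(t) \Leftrightarrow A\tau\,\mathrm{Sa}\left(\frac{\omega\tau}{2}\right)$$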
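Two of the pairs listed above (the gate function and the exponential signal) lend themselves to a quick numerical sanity check. The following is a sketch under assumed parameters: `A`, `TAU`, `ALPHA`, the test frequency `w` and the integration limits are arbitrary choices, and the infinite upper limit of the exponential signal is truncated where the signal has decayed to a negligible value.

```python
import cmath
import math


def transform(h, w, t_min, t_max, steps=20000):
    """Midpoint-rule approximation of the integral of h(t)*exp(-j*w*t)."""
    dt = (t_max - t_min) / steps
    total = 0j
    for i in range(steps):
        t = t_min + (i + 0.5) * dt
        total += h(t) * cmath.exp(-1j * w * t)
    return total * dt


def sa(x):
    """Sampling function Sa(x) = sin(x)/x, with Sa(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


A, TAU, ALPHA = 2.0, 3.0, 0.8   # assumed parameters
w = 1.7                          # assumed test frequency in rad/s

# Gate of height A and width TAU  <->  A*TAU*Sa(w*TAU/2)
gate_numeric = transform(lambda t: A, w, -TAU / 2, TAU / 2)
gate_closed = A * TAU * sa(w * TAU / 2)

# exp(-ALPHA*t)*1(t)  <->  exp(-j*arctg(w/ALPHA)) / sqrt(ALPHA^2 + w^2)
exp_numeric = transform(lambda t: math.exp(-ALPHA * t), w, 0.0, 60.0)
exp_closed = (cmath.exp(-1j * math.atan(w / ALPHA))
              / math.sqrt(ALPHA ** 2 + w ** 2))

print(gate_numeric, gate_closed)
print(exp_numeric, exp_closed)
```

The magnitude and phase form of the exponential pair is simply the polar form of 1/(α + jω), which the numerical integral reproduces.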
Properties of Fourier transformation
The Fourier transform comes with a set of basic properties. These basics can
be further applied in more complicated calculations and derivations. The
properties below are stated in terms of the frequency f (ω = 2πf).
- linearity: $x(t) + y(t) \Leftrightarrow X(f) + Y(f)$
- symmetry: $H(t) \Leftrightarrow h(-f)$
- time scaling: $h(kt) \Leftrightarrow \frac{1}{|k|} H\left(\frac{f}{k}\right)$.
  Time scale expansion corresponds directly to frequency scale compression.
  In other words, while the time scale is expanding, the frequency scale
  contracts. In order for the equation to hold true, the amplitude must
  increase, so that the area under the function remains unchanged. This
  property is used in radar and antenna theory.
- frequency scaling: $\frac{1}{|k|} h\left(\frac{t}{k}\right) \Leftrightarrow H(kf)$.
  Frequency scaling is analogous to the time scaling property, that is, when
  the frequency scale is expanded, the amplitude of the time function
  increases (so that the area under both functions is equal).
- time shifting: $h(t - t_0) \Leftrightarrow H(f)\, e^{-j2\pi f t_0}$.
  The crucial thing to note is that time shifting affects only the phase; the
  amplitude remains unchanged.
- frequency shifting: $h(t)\, e^{j2\pi f_0 t} \Leftrightarrow H(f - f_0)$.
  The function (signal) is shifted by f_0 Hz in frequency. This property is
  fundamental to the modulation of signals.
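The time shifting and time scaling properties can be tried out numerically. The sketch below is illustrative only: the Gaussian test signal exp(-πt²) (whose transform is exp(-πf²)), the values of `f`, `t0` and `k`, and the helper `fourier` are assumptions. It approximates the transform integral with the midpoint rule on a window wide enough that the truncated tails are negligible.

```python
import cmath
import math


def fourier(h, f, t_min=-10.0, t_max=10.0, steps=4000):
    """Midpoint-rule approximation of the integral of h(t)*exp(-j*2*pi*f*t)."""
    dt = (t_max - t_min) / steps
    total = 0j
    for i in range(steps):
        t = t_min + (i + 0.5) * dt
        total += h(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


def gauss(t):
    """Assumed test signal; its transform is exp(-pi * f^2)."""
    return math.exp(-math.pi * t * t)


f = 0.4    # assumed test frequency
t0 = 1.5   # assumed time shift
k = 2.0    # assumed time scale factor

H = fourier(gauss, f)

# Time shifting: h(t - t0) <-> H(f) * exp(-j*2*pi*f*t0)
shifted = fourier(lambda t: gauss(t - t0), f)

# Time scaling: h(k*t) <-> (1/|k|) * H(f/k)
scaled = fourier(lambda t: gauss(k * t), f)

print(abs(H), abs(shifted))                 # equal magnitudes
print(scaled, fourier(gauss, f / k) / abs(k))
```

The shifted signal has the same magnitude as the original at the tested frequency and differs only in phase, which is what the time shifting property states.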
As is well known, each signal x(t) can be represented as a sum of two signals

$$x(t) = x_o(t) + x_e(t)$$

which are the odd and even parts of x(t). The Fourier transform of x(t) is as
follows:

$$X(\omega) = \int_{-\infty}^{\infty} x_e(t)\cos(\omega t)\, dt - j \int_{-\infty}^{\infty} x_o(t)\sin(\omega t)\, dt = P(\omega) + jQ(\omega)$$
The equation presented above yields several features of spectral
characteristics. If x(t) is a real-valued function, then the real part of its
spectrum is an even function and the imaginary part is odd:

$$P(\omega) = P(-\omega)$$
$$Q(\omega) = -Q(-\omega)$$
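The even/odd decomposition can be reproduced with a short numerical sketch. The real, asymmetric test signal, the integration window and the helper names below are assumptions made for illustration; the integrals are approximated with the midpoint rule.

```python
import math


def integrate(g, t_min=-12.0, t_max=12.0, steps=6000):
    """Midpoint-rule approximation of the integral of g(t)."""
    dt = (t_max - t_min) / steps
    return sum(g(t_min + (i + 0.5) * dt) for i in range(steps)) * dt


def x(t):
    """Assumed real, asymmetric test signal."""
    return math.exp(-(t - 0.5) ** 2)


def x_even(t):
    return 0.5 * (x(t) + x(-t))


def x_odd(t):
    return 0.5 * (x(t) - x(-t))


def P(w):
    """Real part of the spectrum, built from the even part only."""
    return integrate(lambda t: x_even(t) * math.cos(w * t))


def Q(w):
    """Imaginary part of the spectrum, built from the odd part only."""
    return -integrate(lambda t: x_odd(t) * math.sin(w * t))


def X(w):
    """Full transform of x(t), computed directly."""
    re = integrate(lambda t: x(t) * math.cos(w * t))
    im = -integrate(lambda t: x(t) * math.sin(w * t))
    return complex(re, im)


w = 1.3  # assumed test frequency
print(X(w), complex(P(w), Q(w)))
print(P(w) - P(-w), Q(w) + Q(-w))  # both close to zero
```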
For complex functions the situation is a bit different. To simplify the
analysis, we will establish the so-called general representation of a complex
function (which can also be referred to as a signal):

$$x(t) = \Re[x_e(t)] + j\,\Im[x_e(t)] + \Re[x_o(t)] + j\,\Im[x_o(t)]$$
The equation presented above yields the following properties:

Function (signal)       Spectrum
even                    even
odd                     odd
real and even           real and even
real and odd            imaginary and odd
imaginary and even      imaginary and even
imaginary and odd       real and odd
hermitian               real
anti-hermitian          imaginary
real                    hermitian
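$$x(t) = \Re[x_e(t)] + j\,\Im[x_e(t)] + \Re[x_o(t)] + j\,\Im[x_o(t)]$$
$$X(\omega) = \Re[X_e(\omega)] + j\,\Im[X_e(\omega)] + \Re[X_o(\omega)] + j\,\Im[X_o(\omega)]$$

[Diagram: arrows connect each component of x(t) in the upper row with the
component of X(ω) it produces in the bottom row. The even components map to
the same kind of component (real to real, imaginary to imaginary), whereas
the odd components are swapped (real odd to imaginary odd, imaginary odd to
real odd).]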
The matrix above offers an easy and fast way of determining the spectrum of a
signal. The upper row presents the function, whereas the bottom row is the
spectrum [2].
DFT - Discrete Fourier Transform
Sampling theory
The result of sampling a continuous signal is a discrete signal, which is a
representation of the continuous signal in the discrete domain. Usually
sampling is performed at regular time intervals, which simplifies further
operations on the discrete signal. One should imagine a process of taking a
sample of the input signal at regular time intervals; these samples,
considered as a sequence, form the discrete signal. It is important to notice
that the discrete signal is a sequence, because when a signal is sampled by
an ideal sampler, that is, when the continuous signal is multiplied by a
Dirac comb, the result is again a continuous signal!
One has to be aware of the fact that sampling introduces errors, as the
discrete sequence does not contain any information about the signal between
the sampling instants. Of course, the size of the error can be influenced and
minimized so that it will not affect further analysis of the signal. This
leads to another fundamental principle of signal processing: the
Nyquist-Shannon theorem. The more points there are in the discrete sequence,
the higher the sampling frequency and, as a result, the smaller the error
that sneaks into the data. According to the theory, the sampling frequency
has to be more than twice as high as the highest frequency present in the
signal.

$$\text{theory:} \quad \omega_{sample} > 2\,\omega_{max}$$
$$\text{practice:} \quad \omega_{sample} > 2.5\,\omega_{max}$$
In practice, this ratio should be equal to 2.5 or more. Why is it so
important to sample signals with a proper sampling frequency? If the signal
is sampled with a frequency that is less than twice the highest frequency
present in the signal, then of course one will obtain false results. The
phenomenon responsible for this is called aliasing. In the frequency domain
it means that the copies of the signal's spectrum overlap (the spectrum of a
sampled signal is periodic!) and there is no way to distinguish some of the
frequencies present in the signal. In the time domain it means that one would
think the signal has a much lower frequency.
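Aliasing is easy to reproduce with a few lines of code. In the sketch below, the sampling rate and the two signal frequencies are assumed values chosen for illustration: a 9 Hz sine sampled at only 10 Hz produces exactly the same samples as a 1 Hz sine with inverted sign, so after sampling the two cannot be told apart.

```python
import math


def sample(freq_hz, rate_hz, count):
    """Take `count` samples of sin(2*pi*freq*t) at the given sampling rate."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(count)]


RATE = 10.0                      # assumed sampling rate in Hz (too low for 9 Hz)
fast = sample(9.0, RATE, 20)     # under-sampled 9 Hz sine
slow = sample(1.0, RATE, 20)     # properly sampled 1 Hz sine

# sin(2*pi*9*n/10) = sin(2*pi*n - 2*pi*n/10) = -sin(2*pi*n/10),
# so the 9 Hz samples are the negated 1 Hz samples.
max_diff = max(abs(a + b) for a, b in zip(fast, slow))
print(max_diff)
```

In the time domain the under-sampled 9 Hz signal therefore looks like a 1 Hz signal, which is the "much lower frequency" effect described above.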
[2] The idea of presenting this property was taken from "Podstawy teorii sygnałów", Jerzy Szabatin.
In fact, it is possible to sample signals that contain frequencies higher
than the sampling frequency, but only when one knows the signal's bandwidth
and band limits.
Quantization
$%e main idea o# samp"ing is t%at we wou"d "i,e ontinuous signa"s to !e
ana"y.ed and proessed !y disrete-time ma%ines 3 omputers+ 4SPs and
ot%ers. 4isrete signa" is not a digita" signa". $%ere is one more step re*uired
w%i% is a""ed *uanti.ation. & se*uene o# disrete -a"ues %as to !e *uanti.ed
and t%en in turn+ it !eomes a digita" signa". 5n ot%er words it means+ t%at
*uanti.ation is a met%od o# approximating -a"ues. &pproximation "e-e"s are
said to !e #ixed and re"ati-e"y sma"" I'44& diss an !e used as an examp"e:
audio data is samp"ed at FF.1,8.+ !ut is *uanti.ed wit% 16 !its+ meaning+ t%at
t%ere are 2
16
"e-e"s #or representing t%e outputJ. Samp"ing and *uanti.ation
are #undamenta" prinip"es used !y 4&'s and &4's Idigita"-to-ana"og and
ana"og-to-digita" on-ertersJ.
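A minimal sketch of a 16-bit quantizer follows. The names quantize16 and
dequantize16, the input range [-1, 1] and the symmetric scaling by 32767 are
assumptions made for this illustration; real converters and audio formats may
use a different mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Map a sample from [-1.0, 1.0] onto a signed 16-bit code.
 * Symmetric scaling is used, so codes -32767 .. 32767 are produced
 * (65535 of the 2^16 available levels). Out-of-range input is clipped. */
int16_t quantize16(double x)
{
    if (x > 1.0)  x = 1.0;
    if (x < -1.0) x = -1.0;
    double scaled = x * 32767.0;
    /* round to the nearest level, away from zero on ties */
    long q = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    assert(q >= -32767 && q <= 32767);
    return (int16_t)q;
}

/* Approximate reconstruction of the original sample value. */
double dequantize16(int16_t q)
{
    return (double)q / 32767.0;
}
```

The difference between x and dequantize16(quantize16(x)) is the quantization
error; for in-range input it is at most half a level.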
DFT: special case of Fourier Transform
As stated in the headline, the DFT can be treated as a special case of the
Fourier transform. It is possible to derive the DFT on its own, but in most
cases it is related to its continuous equivalent.
As stated before, this time we are working on a sampled signal. The idea of
the DFT is to analyze the frequencies contained in these samples. Of course,
once we have digital data there are thousands of other possibilities and
applications, like convolutions, solving partial differential equations and
similar.
Illustration 2: Sampling: properly and improperly sampled signal
Let us consider a series of samples that are complex numbers (note: data can
have different forms, i.e. samples can be purely real, purely imaginary or a
combination of both). The DFT can be obtained from the following formula:
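\[ X_k = \sum_{n=0}^{N-1} x_n\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0, \ldots, N-1 \]
The result of the computation is a sequence of N complex values. The inverse
discrete Fourier transform is defined as follows:
\[ x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k\, e^{j\frac{2\pi}{N}kn}, \qquad n = 0, \ldots, N-1 \]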
The difference between the forward and inverse transforms is mainly in the
sign of the exponent (signs are very often also treated as conventions; the
most important thing is to keep the relation that the sign of the exponent in
the inverse transform is opposite to the one in the forward transform). The
scale factor [3] is mostly a convention and might differ among other
definitions. It is very important to realize that the first element of the
transformed series is the so-called DC component, which is proportional to the
average of the input series (with the definition above it is the plain sum of
the samples):
\[ X_0 = \sum_{n=0}^{N-1} x_n\, e^{-j\frac{2\pi}{N}\cdot 0\cdot n} = \sum_{n=0}^{N-1} x_n\, e^{0} = \sum_{n=0}^{N-1} x_n \]
The scale factor mentioned above is a convention, thus the definitions could
also be rewritten in the following form:
\[ X_k = \frac{1}{N}\sum_{n=0}^{N-1} x_n\, e^{-j\frac{2\pi}{N}kn}, \qquad k = 0, \ldots, N-1 \]
and the inverse:
\[ x_n = \sum_{k=0}^{N-1} X_k\, e^{j\frac{2\pi}{N}kn}, \qquad n = 0, \ldots, N-1 \]
so that now the first element of the transformed sequence is the average of
the input:
\[ X_0 = \frac{1}{N}\sum_{n=0}^{N-1} x_n \]
As mentioned above, the DFT approximates the continuous Fourier Transform.
The quality of this approximation depends on the function being analyzed and
on the number of samples present in the sequence. Once again it should be
stressed that the DFT will not look exactly like the continuous FT, as the data
present in the DFT is truncated (that is why the DFT is sometimes called the
Finite Fourier Transform). There are different ways of truncating data
samples; the most logical one is when we have a periodic signal s(t) that is
sampled (with a sampling frequency high enough so that aliasing does not occur)
and the samples cover exactly one full period. Of course, DFTs calculated from
the same signal but truncated in a different way, where for example the samples
do not cover an integer number of periods, will result in a spectrum looking
considerably different. The DFT then sees a periodic function with visible
discontinuities. Below there are two sets of images: the first set presents a
proper interval and the second one depicts what happens when the interval is
too short.
Below we have an aperiodic function (part of a sine):
[3] In most implementations the scale factor is the length of the input vector
NOTE: Twice the highest frequency present in a signal is called the Nyquist
rate; it determines the minimum sampling frequency according to the
Nyquist–Shannon (or Shannon–Kotelnikov) theorem.
Properties of DFT
Because the DFT is a special case of the Fourier Transform, it has the same
properties as the continuous form.
Convolution
Whenever the Fourier Transform is discussed, convolution is usually included
as well. Why? This will be revealed in a moment. In the time domain the
convolution of two functions is defined as:
\[ (f*g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t-\tau)\, d\tau \]
and in the discrete domain:
\[ (f*g)(m) = \sum_{n} f(n)\, g(m-n) \]
Please note that in both cases the second function is reversed and shifted.
The outcome of convolving two functions is a third function, which is a
measure of the amount of overlap between the two convolved functions. It is
clearly visible that in either the time or the discrete domain convolving two
functions requires lots of complex computations (in the time domain numerical
methods have to be applied as well). With the Fourier Transform this tedious
operation becomes as easy as multiplication. The convolution theorem says:
\[ \mathcal{F}(f*g) = \mathcal{F}(f)\cdot\mathcal{F}(g) \]
This way of convolving signals, which is applied in signal processing
filtering tasks, can be very fast and efficient. The process of transforming
the data into the frequency domain is handled by the FFT algorithms, which
will be discussed later on. The difference between both approaches is rather
huge: direct discrete convolution requires O(n^2) operations, whereas with the
FFT it requires only O(n log n) computations.
Two-dimensional Fourier Transform
Introduction
We are already acquainted with all the advantages that come with the
one-dimensional Fourier Transform: much easier filtering and less time needed
for completing it. Why should this not be applied to more dimensions? In
fact, the Fourier transform can be extended to an arbitrary number of
dimensions. The following formula presents this possibility:
\[ f(x) = (\mathcal{F}_n^{-1} F)(x) = \frac{1}{(2\pi)^{n/2}} \int F(\omega)\, e^{i\langle \omega, x \rangle}\, d\omega \]
Vectors ω and x are n-dimensional, and ⟨ω, x⟩ is their inner product.
Of course, integration is performed over all dimensions. This significantly
increases the required number of computations (this applies especially to
computer aided analysis), but on the other hand it enables us to reuse the
knowledge we already possess about the one-dimensional transform (this will be
discussed later on).
The more dimensions, the harder the analysis is. In the case of the
two-dimensional transform it is very hard to interpret phase spectrum images
("We generally do not display PHASE images because most people who see them
shortly thereafter succumb to hallucinogenics or end up in a Tibetan
monastery", from John M. Brayer's website). Magnitude spectrum images are of
the most importance.
As mentioned previously, the two-dimensional transform will speed up
filtering and other analysis of two-dimensional waveforms. Nowadays the 2D
Fourier Transform is computed by means of the 2D FFT algorithm and is applied
to images, geophysical arrays, gravity and magnetic data, and is helpful in
antenna analysis.
Definition of two-dimensional Fourier Transform
Let us assume that we have a two-dimensional function h(x,y). The Fourier
Transform of this function is defined by the following integral:
\[ H(u,v) = \iint h(x,y)\, e^{-j2\pi(ux+vy)}\, dx\, dy \]
Because the kernel of the transform is separable, i.e.
\( e^{-j2\pi(ux+vy)} = e^{-j2\pi ux}\, e^{-j2\pi vy} \), the integration can be
performed in two steps, which considerably simplifies the whole case:
\[ H(u,y) = \int h(x,y)\, e^{-j2\pi ux}\, dx \]
\[ H(u,v) = \int H(u,y)\, e^{-j2\pi vy}\, dy \]
Similarly to the one-dimensional case, the function h(x,y) is decomposed into
components of the form \( \cos[2\pi(ux+vy)] \) and \( \sin[2\pi(ux+vy)] \).
Since it is possible to calculate the two-dimensional transform in two steps,
it follows that the two-dimensional transform can be viewed as two successive
one-dimensional Fourier transforms. For example, when calculating the
two-dimensional FT of an image (a spatial data set, as opposed to
time-referenced data, which is called temporal), first a one-dimensional FT is
calculated over all rows and the results replace the rows, and in the second
step a one-dimensional transform is calculated over all columns (notice that
the results of transforming the rows are already in the input array!).
\[ H(u,v) = \int e^{-j2\pi vy} \left[ \int h(x,y)\, e^{-j2\pi ux}\, dx \right] dy \]
The possibility of applying two successive one-dimensional transforms will be
a crucial feature for calculating the 2D FT. Details of the FFT will be
discussed later on.
Inverse two-dimensional Fourier Transform
The inverse transform exists and is defined by the following equation:
\[ h(x,y) = \iint H(u,v)\, e^{j2\pi(ux+vy)}\, du\, dv \]
The principle is still exactly the same as in the case of the one-dimensional
transform: it is enough to apply the transform once again with the opposite
sign in the exponent.
Please note that the function h(x,y) has to satisfy all the previously stated
rules, extended to two dimensions, in order for the forward and inverse
transforms to exist.
Discrete two-dimensional Fourier Transform
Again it is possible to reuse all statements derived previously; thus we can
simply rewrite the continuous definition into a discrete one:
\[ H(u,v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} h(x,y)\, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \]
And similarly for the inverse two-dimensional transform:
\[ h(x,y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} H(u,v)\, e^{j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \]
It is clearly visible that in order to compute a single output value H(u,v) of
the discrete Fourier Transform of an x*y image it is necessary to perform x*y
complex multiplications and x*y-1 additions. For a picture that has a size of
1024x768 pixels this yields 786432 complex multiplications and 786431
additions for each output value. Imagine that the picture was taken with a
5 Mpx digital camera: then the image size would be 2592x1944 pixels, which
yields 5038848 complex multiplications per output value. Since there are as
many output values as input pixels, the DFT is said to be of complexity N^2
(in the case of an image, (N*M)^2 according to the previous definitions).
DFT - computational bottleneck, FFT - efficiency winner
As mentioned previously, the DFT of N samples requires N*N complex
multiplications and N*(N-1) complex additions. To be more precise, it is said
that the DFT is of O(N^2) complexity. The big O, also referred to as the "big
O notation", is used to describe the asymptotic behavior of functions in terms
of simpler functions. In other words, by means of the O notation it is
possible to establish an upper bound (asymptote) on the number of required
computations.
The big O notation is very useful when it comes to describing an algorithm, as
it immediately gives "the cost" of execution. Recalling, the DFT is of O(N^2)
complexity, which is usually referred to as quadratic complexity. Sequences
that contain a large number of samples will take a lot of time to compute even
on very fast computers. That is why nowadays the FFT algorithm is used for
calculating the DFT. It is a very efficient algorithm that exploits properties
of the Fourier Transform and gives the same result as the direct calculation
of the DFT in only O(N*log2(N)) operations. The FFT and its properties will be
discussed in detail in the next chapter.
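To put the two growth rates side by side, here is a worked example for
N = 1024 (a value chosen only for illustration). The figures are operation
counts up to constant factors, not measured run times:

\[
N^2 = 1024^2 = 1\,048\,576, \qquad
N\log_2 N = 1024 \cdot 10 = 10\,240, \qquad
\frac{N^2}{N\log_2 N} = \frac{1024}{10} = 102.4
\]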
Fast Fourier Transform algorithm
The FFT is a fast and efficient way of calculating the Discrete Fourier
Transform, which reduces the number of arithmetical computations from O(N^2)
to O(N*log2 N). The key of the algorithm is data reorganization and further
operations on it.
In 1965 a first paper about an efficient way of calculating the DFT was
published by IBM researcher James Cooley and Princeton faculty member John
Tukey. Surprisingly, they were not the first ones to invent the FFT. It has
been shown that the FFT was first invented by Carl Friedrich Gauss around
1805. It was not publicly known just because Gauss did not publish it. Cooley
and Tukey reinvented exactly the same thing, which is being used till today.
The great rediscovery made by Cooley and Tukey also put some new features into
the algorithm, which make it more flexible.
Radix-2 Butterfly
The so-called "Cooley-Tukey" algorithm is by far the most common FFT
algorithm. The main principle is that the DFT of size N is split into smaller
DFTs of sizes \( N_1 \) and \( N_2 \), where the relation between both is
\( N = N_1 N_2 \).
The DFT is split recursively, but of course it is possible to create an
iterative version of the algorithm. Depending on which factor is the radix,
the algorithm can be:
- DIT, which is called "Decimation in Time" and is used when \( N_1 \) is the radix
- DIF, which is called "Decimation in Frequency" and is used when \( N_2 \) is the radix
DIT
Decimation in time is the simplest version of the Cooley-Tukey algorithm.
Because of that it lacks performance, but it is often used as an example. The
decimation in time divides the problem into two subproblems:
\[
\begin{aligned}
X_r &= \sum_{k=0}^{N/2-1} x_{2k}\, e^{-j\frac{2\pi}{N}(2k)r} + \sum_{k=0}^{N/2-1} x_{2k+1}\, e^{-j\frac{2\pi}{N}(2k+1)r} \\
    &= \sum_{k=0}^{N/2-1} x_{2k}\, e^{-j\frac{2\pi}{N}(2k)r} + e^{-j\frac{2\pi}{N}r} \sum_{k=0}^{N/2-1} x_{2k+1}\, e^{-j\frac{2\pi}{N}(2k)r},
\qquad r = 0, 1, \ldots, N-1
\end{aligned}
\]
These two subproblems form sets of even- and odd-indexed data, each of size
N/2, and are solved recursively. Both summations resemble two DFTs and thus
can be rewritten into the following form (this will simplify further
derivations):
even set
\[ E_r = \sum_{k=0}^{N/2-1} x_{2k}\, e^{-j\frac{2\pi}{N}(2k)r}, \qquad r = 0, 1, \ldots, \frac{N}{2}-1 \]
odd set
\[ O_r = \sum_{k=0}^{N/2-1} x_{2k+1}\, e^{-j\frac{2\pi}{N}(2k)r}, \qquad r = 0, 1, \ldots, \frac{N}{2}-1 \]
The first N/2 terms of our solution are obtained from:
\[ X_r = E_r + e^{-j\frac{2\pi}{N}r}\, O_r, \qquad r = 0, 1, \ldots, \frac{N}{2}-1 \]
The remaining terms can be obtained by taking into account the following
identities:
\[ \left(e^{-j\frac{2\pi}{N}}\right)^{\frac{N}{2}+r} = -e^{-j\frac{2\pi}{N}r} \qquad \text{and} \qquad \left(e^{-j\frac{4\pi}{N}}\right)^{\frac{N}{2}} = 1 \]
Having the above identities in mind, the solutions for elements r + N/2 can be
obtained from:
\[ X_{r+\frac{N}{2}} = E_r - e^{-j\frac{2\pi}{N}r}\, O_r, \qquad r = 0, 1, \ldots, \frac{N}{2}-1 \]
Together, the equations for \( X_r \) and \( X_{r+N/2} \) are the so-called
Cooley-Tukey butterfly, as the flow diagram resembles a butterfly (image from
Wikipedia):
The factor \( e^{-j\frac{2\pi}{N}r} \), r = 0, 1, ..., N/2-1, present in the
butterflies is the so-called twiddle factor (a root of unity). Twiddle factors
can be precomputed and stored in memory for later usage; their calculation can
be further optimized either by using trigonometric identities or by using ASM
level optimizations (subsequent calls to fsin and fcos can be substituted by
one fsincos call; this instruction is available in the x87 floating-point
extension).
It should also be noted that the length of the input vector should be a power
of two. It is a general rule that for a radix-r FFT the input vector should be
of length r^n. For example, the radix-4 version is very attractive because of
the twiddle factors that have the values 1, -1, j or -j, which simplify
further multiplications a lot; but of course the cost is that the length of
the input vector has to be a power of 4.
As mentioned previously, Cooley and Tukey made the algorithm more flexible.
This means that for a non-prime length of a sequence we can calculate a
composite FFT (this is the so-called "mixed radix"). Let's say that we have a
sequence consisting of 200 samples. It can be calculated by means of radix-2
and radix-10. How? The length of the sequence is 200 samples and this can be
split into 200 = 2*10*10, so that we can use radix-10 twice and
radix-2 once. Cooley and Tukey in their paper showed that it is possible to
provide algorithms for an arbitrary radix r. Exactly the same things were
derived by Gauss, who left in his work derivations of radix-2 and radix-6
algorithms.
The mixed radix method cannot be applied to prime sizes. It is possible to
calculate the FFT of a prime-size sequence, but these methods are much less
efficient than the ones for non-prime sizes. Besides, there are several ways
of avoiding prime sizes in sequences (in general, the first step should be to
do everything that is possible to get a non-prime size of the sequence; this
operation is almost always successful).
DIF
Decimation in frequency, which is also referred to as the Sande-Tukey
algorithm, decimates the output frequency series into even- and odd-indexed
sets. Once again there are two problems to be solved:
\[
\begin{aligned}
X_r &= \sum_{k=0}^{N/2-1} x_k\, e^{-j\frac{2\pi}{N}kr} + \sum_{k=N/2}^{N-1} x_k\, e^{-j\frac{2\pi}{N}kr} \\
    &= \sum_{k=0}^{N/2-1} x_k\, e^{-j\frac{2\pi}{N}kr} + \sum_{k=0}^{N/2-1} x_{k+\frac{N}{2}}\, e^{-j\frac{2\pi}{N}\left(k+\frac{N}{2}\right)r}
\end{aligned}
\]
\[ X_r = \sum_{k=0}^{N/2-1} \left( x_k + x_{k+\frac{N}{2}}\, e^{-j\frac{2\pi}{N}\cdot\frac{rN}{2}} \right) e^{-j\frac{2\pi}{N}rk}, \qquad r = 0, 1, \ldots, N-1 \]
Similarly to the DIT case it is possible to define the subproblems as:
even set
\[ E_r = X_{2r} = \sum_{k=0}^{N/2-1} \left[ \left( x_k + x_{k+\frac{N}{2}} \right) e^{-j\frac{4\pi}{N}kr} \right], \qquad r = 0, 1, \ldots, \frac{N}{2}-1 \]
odd set
\[ O_r = X_{2r+1} = \sum_{k=0}^{N/2-1} \left[ \left( x_k - x_{k+\frac{N}{2}} \right) e^{-j\frac{2\pi}{N}k} \right] e^{-j\frac{4\pi}{N}kr}, \qquad r = 0, 1, \ldots, \frac{N}{2}-1 \]
Terms in "( )" brackets are calculated first and complete the subdivision part
of the calculations. After the mentioned subproblems are solved there are no
other calculations that have to be performed. Most of the work is done during
the subdivision phase, when the subproblems are prepared. The computations in
the subdivision part are called the Gentleman-Sande butterfly (it can be
presented the same way as the Cooley-Tukey butterfly).
Summarizing, during the subdivision part it is necessary to calculate the
following values:
\[ e_k = x_k + x_{k+\frac{N}{2}} \]
for even elements and
\[ o_k = \left( x_k - x_{k+\frac{N}{2}} \right) e^{-j\frac{2\pi}{N}k} \]
for odd elements, where k in both cases is k = 0, 1, ..., N/2-1.
Bit reversed
Bit reversal is a permutation that reorders elements in the vector in the
following way:
- the index of an element is converted into a binary number (3 dec = 011 bin)
- the binary representation is reversed (mirror reflection) (011 bin becomes 110 bin, which is 6 dec)
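The two steps above can be sketched as follows. This is an illustrative
helper, not the routine used later in the thesis: the names reverse_bits and
bit_reverse_permute are hypothetical, and plain double elements are used to
keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the lowest `bits` bits of `index`
 * (for example 011 becomes 110 when bits = 3). */
size_t reverse_bits(size_t index, unsigned bits)
{
    size_t r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (index & 1u);  /* append the lowest bit of index */
        index >>= 1;
    }
    return r;
}

/* In-place bit-reversal permutation of a vector with 2^bits elements. */
void bit_reverse_permute(double *v, unsigned bits)
{
    assert(bits < 8 * sizeof(size_t));
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        size_t j = reverse_bits(i, bits);
        if (j > i) {                  /* swap each pair only once */
            double t = v[i];
            v[i] = v[j];
            v[j] = t;
        }
    }
}
```

This version costs O(N log N) because every index is reversed bit by bit; the
linear-time algorithms mentioned in the text avoid that inner loop.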
In the case of radix-2 DIT the scrambling is on the input side, meaning that
the algorithm should take bit-reversed data in order to return in-order data.
The opposite situation holds for the DIF, where the algorithm takes in-order
data and produces a bit-reversed result.
Usually the bit-reversal process is a distinct part of the FFT routine, as
most users prefer the natural order of data. Of course, from the machine's
point of view it does not matter. Convolution works equally well on both
in-order and bit-reversed data.
The bit reversal problem used to be a field of active research. Nowadays the
complexity of bit reversal algorithms is O(N) (linear). It is very important
to remember to either pre- or post-process the data. It is of great importance
when talking about in-place transforms, meaning transforms whose results
overwrite the input data. A completely different situation occurs when we talk
about out-of-place transforms (the result is written to a separate array and
is usually in natural order).
Implementing radix-2 DIF
The best way of presenting theory is to show its application, that is, to
learn by example. In this paragraph a radix-2 decimation-in-frequency
algorithm will be implemented, as well as a routine for precomputing twiddle
factors, which will use Singleton's algorithm. The functions will be
implemented in C according to the pseudo-code examples in the book "Inside the
FFT Black Box".
Twiddle factors
Twiddle factors are roots of unity. They are equally spaced on the unit
circle, thus when such a root of unity is raised to the N-th power the result
is one. The fact that roots of unity are placed on the unit circle yields the
following conclusion:
\[ e^{jr\theta} = \cos(r\theta) + j\sin(r\theta), \qquad \theta = \frac{2\pi}{N}, \qquad N = 2^n \]
Having that in mind, we can say that if a+jb is a root of unity, then ±a±jb
and ±b±ja are also roots of unity. According to that we can calculate only the
first N/8 - 1 values. The easiest way is to calculate the first values of the
cosine and sine functions using standard library functions or ASM optimized
calls (fsincos, which was mentioned before). The other solution exploits
trigonometric identities. An algorithm based on them was proposed by
Singleton, and we are going to develop our twiddle factor routine according to
Singleton's algorithm.
Below is a C implementation. Pointers wcos and wsin should point to already
allocated memory, each of size (N/2 + 1) * sizeof(double). N of course should
be equal to the number of samples in the input vector. The resulting complex
number is in algebraic form, where wcos is the real part and wsin the
imaginary part.
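#include <math.h>

#ifndef TWO_PI
#define TWO_PI 6.28318530717958647692
#endif

/* Fills wcos[k] + j*wsin[k] = exp(+j*2*pi*k/N) for k = 0 .. N/2.
 * N has to be a power of two, N >= 8. */
void twiddle_factors(double *wcos, double *wsin, int N) {
    double alpha, S, C;
    int K, L;
    alpha = (double) TWO_PI/N;
    S = sin(alpha);
    C = 1 - 2*pow(sin(alpha/2), 2);   /* equals cos(alpha) */
    wcos[0] = 1;
    wsin[0] = 0;
    /* first octant: rotate the previous factor by alpha */
    for(K=0; K<=(N/8)-2; K++) {
        wcos[K+1] = C*wcos[K] - S*wsin[K];
        wsin[K+1] = S*wcos[K] + C*wsin[K];
    }
    L = N/8;
    wcos[L] = sqrt(2)/2;
    wsin[L] = sqrt(2)/2;
    /* second octant: mirror about 45 degrees (swap cos and sin) */
    for(K=1; K<=(N/8)-1; K++) {
        wcos[L+K] = wsin[L-K];
        wsin[L+K] = wcos[L-K];
    }
    L = N/4;
    wcos[L] = 0;
    wsin[L] = 1;
    /* second quadrant: mirror about 90 degrees (negate cos) */
    for(K = 1; K<=(N/4); K++) {
        wcos[L+K] = -wcos[L-K];
        wsin[L+K] = wsin[L-K];
    }
}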
All twiddle factors should be computed only once, at the beginning. They can
also be stored in a file for further retrieval (in almost all cases this is
not necessary).
Radix-2 DIF FFT
Since we have precomputed twiddle factors, it is now time to implement the
FFT routine. Please keep in mind that for the radix-2 DIF FFT we insert
in-order data and as a result we get a bit-reversed DFT (or in other words a
scrambled result).
The native complex format is not used. Instead, each element of the input
vector is a two-element array of double (which is binary compatible with the
C99 revision, thus with the native complex format):
typedef double cplx[2];
The same "trick" was used by the developers of the FFTW library
(fftw_complex), which makes the development process much easier, as the
implementation does not require special index calculation (as opposed to the
"Numerical Recipes" version, which stores data as consecutive elements, where
the first one is the real part and the second one is the imaginary part).
"REAL" and "IMAG" are preprocessor macros (REAL is 0 and IMAG is 1). The
function relies on previously computed twiddle factors.
void fft_r2_dif(cplx *a, double *wcos, double *wsin, int N) {
int NumOfProblems = 1;
int ProblemSize = N;
int HalfSize;
int K, J; int JFirst, JLast, Jtwiddle;
cplx W, Temp,Tmp;
while(ProblemSize > 1) {
HalfSize = ProblemSize/2;
for(K=0; K<=NumOfProblems-1; K++) {
JFirst = K*ProblemSize;
JLast = JFirst + HalfSize -1;
Jtwiddle = 0;
for(J = JFirst; J<=JLast; J++) {
W[REAL] = wcos[Jtwiddle];
W[IMAG] = wsin[Jtwiddle];
Temp[REAL] = a[J][REAL];
Temp[IMAG] = a[J][IMAG];
a[J][REAL] = Temp[REAL] + a[J + HalfSize][REAL];
a[J][IMAG] = Temp[IMAG] + a[J + HalfSize][IMAG];
Tmp[REAL] = Temp[REAL] - a[J+HalfSize][REAL];
Tmp[IMAG] = Temp[IMAG] - a[J+HalfSize][IMAG];
a[J+HalfSize][REAL] = W[REAL]*Tmp[REAL] -
W[IMAG]*Tmp[IMAG];
a[J+HalfSize][IMAG] = W[REAL]*Tmp[IMAG] +
W[IMAG]*Tmp[REAL];
Jtwiddle = Jtwiddle + NumOfProblems;
}
}
NumOfProblems *=2;
ProblemSize = HalfSize;
}
}
The transform is performed in-place, so that inputs are overwritten with
outputs. The last operation on the data unscrambles the result, so that it is
in natural order:
/*
* BIT-REVERSE
* Based on numerical recipies. Contains Joo Martins modifications.
* Data has to be post-processed, as the radix-2 DIF takes in-order data and
* produces bit-reversed result.
*/
j=0;
for (i=0;i<(N/2);i++) {
if (j > i) {
// Swap Re and Im parts
SWAPR(inp[j],inp[i]);
SWAPI(inp[j],inp[i]);
// if both indices lie in the first half, the mirrored pair
// (N-1-i, N-1-j) in the second half has to be swapped as well
if((j/2)<(N/4)){
// Swap Re and Im parts
SWAPR(inp[(N-(i+1))],inp[(N-(j+1))]);
SWAPI(inp[(N-(i+1))],inp[(N-(j+1))]);
}
}
m=N/2;
while (m >= 2 && j >= m) {
j -= m;
m = m/2;
}
j += m;
}
Macros "SWAPR" and "SWAPI" have to be defined before the code and they look as
follows:
#define SWAPR(a,b) tempr[0]=a[0];a[0]=b[0];b[0]=tempr[0];
#define SWAPI(a,b) tempr[1]=a[1];a[1]=b[1];b[1]=tempr[1];
The routine does not perform any other actions on the data (the DFT is not
centered).
Testing radix-2 DIF FFT
The easiest way of testing is to take a sequence of values, transform it in
two different pieces of software and then compare the results. In all cases
MATLAB was used for comparing the obtained results.
test for a N = 8 input vector
MATLAB
>> input = [ 1 1 5 5 5 5 1 1 ];
>> fft(input)'
ans =
24.0000
-9.6569 + 4.0000i
0
1.6569 - 4.0000i
0
1.6569 + 4.0000i
0
-9.6569 - 4.0000i
radix-2.c
Twiddle factors (first N/2 = 4 only):
[0] cos = 1.000000, sin = 0.000000
[1] cos = 0.707107, sin = 0.707107
[2] cos = 0.000000, sin = 1.000000
[3] cos = -0.707107, sin = 0.707107
[4] cos = -1.000000, sin = 0.000000
Input data (N = 8 samples):
[0] Re(1.000000) Im(0.000000)
[1] Re(1.000000) Im(0.000000)
[2] Re(5.000000) Im(0.000000)
[3] Re(5.000000) Im(0.000000)
[4] Re(5.000000) Im(0.000000)
[5] Re(5.000000) Im(0.000000)
[6] Re(1.000000) Im(0.000000)
[7] Re(1.000000) Im(0.000000)
radix-2 DIF FFT: 0 ms (* see the text below)
Bit-reversed result:
[0] Re(24.000000) Im(0.000000)
[1] Re(-9.656854) Im(4.000000)
[2] Re(0.000000) Im(0.000000)
[3] Re(1.656854) Im(-4.000000)
[4] Re(0.000000) Im(0.000000)
[5] Re(1.656854) Im(4.000000)
[6] Re(0.000000) Im(0.000000)
[7] Re(-9.656854) Im(-4.000000)
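Assessment: radix-2.c reproduces the MATLAB listing. Note that MATLAB's '
operator is the complex conjugate transpose, so the listed MATLAB values are
the conjugates of fft(input); the routine matches them because its twiddle
factors are stored with a positive sine.
(*) "0 ms" is because the counter's resolution is too coarse. This function
will be used to evaluate this algorithm's speed later on.
Source code for the radix-2.c program can be found in the Appendices.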
Radix-2 in numbers: cost
The DIF uses the Gentleman-Sande butterfly, which for a sequence of length N
requires, in each stage, N complex additions and N/2 complex multiplications.
Of course, one has to realize that a complex addition consists of two real
additions and a complex multiplication consists of three real additions and
three real multiplications (assuming precomputed twiddle factors). In terms of
floating point operations (FLOPs), the cost of one stage can be established
as:
- N complex additions, which is 2*N flops
- N/2 complex multiplications, which yields 3N flops (applied only to half of the sequence, that is why there are only 3N)
With log2(N) stages, the total cost of the algorithm is then
\[ T(N) = 5N\log_2(N) \]
flops, i.e. O(N log2 N).
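As a worked instance of the count 5N log2(N), assuming N = 1024 (a value
picked only for illustration):

\[
\log_2 1024 = 10, \qquad 5N\log_2 N = 5 \cdot 1024 \cdot 10 = 51\,200 \ \text{flops}
\]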
Two-dimensional FFT: adapting 1D FFT
As mentioned before, when calculating the two-dimensional DFT it is possible
to divide the process into two parts in which only one-dimensional DFTs are
involved. Thus, for implementing the 2D FFT we are going to use 1D routines.
2D - computational madness
For the two-dimensional case the number of arithmetical calculations grows
rapidly and requires much more storage space (memory). For an input vector
that has 1024 samples the situation is rather easy and the amount of occupied
space is not critical (assuming an 8-byte double precision type it would be
1024*8 = 8192 bytes). In the case of an image whose size is 1024x1024 pixels
(gray scale, 8 bpp [4]) we have an array of the same size, which gives 1048576
values to be stored in memory. For the double precision [5] format it would be
8192 kB, and if there is no distinction between purely real and mixed input
for the FFT routine, then it will be 16384 kB just for one gray scale image!
This of course can incur several other problems: it is harder to distribute
the computations [6] and in most cases cache memory will not be used
efficiently (cache misses occur). Making the routine and the software itself
cache friendly can be beneficial in terms of speed. Cache strategies, design
and the general impact of the input array size will be discussed later on. For
now it should be assumed that we are going to use the C-style way of storing
arrays, i.e. one contiguous vector, which can be in either row-major or
column-major format.
[4] Bpp means "bits per pixel": the number of bits available for representing the pixel's color
[5] IEEE 754 Double Precision
Let us assume we have an N*M matrix. The row-major format simply puts all N
rows of the array (M elements each) one after another, so that as a result we
have a vector of N*M elements. The situation is similar for the column-major
format, but this time columns are put one after another (column-major is more
popular among Fortran programmers).
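The two layouts differ only in how a (row, column) pair is turned into a
position in the flat vector. A minimal sketch, with hypothetical helper names
idx_row_major and idx_col_major:

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (i, j) of a rows x cols matrix stored row by row. */
size_t idx_row_major(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return i * cols + j;
}

/* Position of the same element when the matrix is stored column by column. */
size_t idx_col_major(size_t i, size_t j, size_t rows, size_t cols)
{
    assert(i < rows && j < cols);
    return j * rows + i;
}
```

Walking along a row touches consecutive addresses in the row-major layout and
addresses `rows` elements apart in the column-major one, which is the reason
the access pattern matters for the cache.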
Programming considerations and overview of the method
We are going to have our data in the row-major format, which is somewhat more
popular among C programmers. The vector has N*M entries (NxM matrix) and we
are going to apply 1D transforms first to all rows and then to all columns.
This will incur:
- N 1D transforms over rows, each row having M samples
- M 1D transforms over columns, each column having N samples
Illustration 3: Row- and column-major ways of storing arrays (also called "C style way")
The first question that arises is how the second step of the 2D FFT method
will be performed: in the first step the FFT routine takes rows as they are
placed in memory, but what about the other step? It would be very inefficient
to read column data from exactly the same vector, as it would generate lots of
cache misses. This requires an intermediate stage that transposes the matrix,
so that rows will be in places of columns. After the second step the matrix
can be transposed once again, if necessary, to obtain data in the same format
as the input. This in fact means that input data can be in either row- or
column-major format; it makes no difference at all for the FFT routine. For
better performance, for example if one wants the row-major format on the
output, the input vector can be in column-major format and then the last
transposition can be omitted.
[6] The case when data has to be transferred to another machine (CPU) and the system is supposed to work in real-time
Transposition
In-place matrix transposition can be tricky and difficult, but nevertheless
it is even possible to parallelize this process. The most common sequential
solution uses the divide and conquer method, which by definition involves
recursion. The input matrix is divided into smaller submatrices. Most general
algorithms work well on both square and rectangular matrices.
For the purposes of this example we are going to use the simplest method,
which is of course out-of-place transposition (it requires a copy of the
transposed matrix).
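One common refinement of the out-of-place method is to copy the matrix tile by
tile, so that both the source and the destination stay within a few cache
lines at a time. The sketch below is an assumption-laden illustration and not
the version used in this thesis: the name transpose_blocked, the tile edge
TR_BLOCK and the use of plain double elements are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TR_BLOCK 16   /* tile edge; an assumed value, to be tuned per CPU */

/* Out-of-place transpose of a rows x cols row-major matrix.
 * dst receives the cols x rows result; src and dst must not overlap. */
void transpose_blocked(const double *src, double *dst,
                       size_t rows, size_t cols)
{
    assert(src != dst);
    for (size_t ib = 0; ib < rows; ib += TR_BLOCK) {
        for (size_t jb = 0; jb < cols; jb += TR_BLOCK) {
            /* clip the tile at the matrix border */
            size_t imax = (ib + TR_BLOCK < rows) ? ib + TR_BLOCK : rows;
            size_t jmax = (jb + TR_BLOCK < cols) ? jb + TR_BLOCK : cols;
            for (size_t i = ib; i < imax; i++)
                for (size_t j = jb; j < jmax; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Whether tiling pays off depends on the matrix size and the cache hierarchy, so
it is worth measuring against the plain double loop.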
Parallelization and 2D FFT
In the current situation, for an N*M matrix we have to perform first N 1D
FFTs on distinct rows and afterwards M 1D FFTs on distinct columns, so it is
of course relatively easy to parallelize this process. We set up an
appropriate number of threads, which should be equal to the number of
processors present in the system [7]. Threads have to be synchronized only
between FFT stages, when the matrix has to be transposed. When performing an
out-of-place FFT it is even easier, as the intermediate result (FFT of rows)
can be written to the new matrix in transposed order and then used again by
the same routine (the second step can use the input vector to store data;
afterwards it would be enough just to swap pointers). It is worth noticing
that precomputed twiddle factors [8] can and should be used regardless of
whether the FFT is going to be computed fully sequentially or in parallel.
Calculating them on the fly is a bad idea, as for two-dimensional input arrays
the total number of computations grows very fast (twiddle factor calculations
= N*M).
[7] This is a general rule, but of course it is worth experimenting with this number (having in mind that too big a number of threads will decrease the performance significantly).
[8] Threads give you shared memory space and twiddle factors will only be read, thus no additional work and synchronization is needed in this case.
Illustration 4: 2D FFT: operation flow
Further analysis of parallel algorithms for the two-dimensional FFT involves
different flavors of the one-dimensional transform. The 2D FFT is also very
attractive when it comes to radix-4, for which the twiddle factors are just 1,
-1, j and -j. This can additionally give some speed-up, but at the cost of
input size limitations (power of 4).
Implementation of 2D FFT: reusing 1D FFT
For implementing the 2D FFT it is possible to reuse the previously developed
functions, both for the twiddle factor calculation and the FFT routine. This
time it is not enough just to have the FFT routine, as the number of
additional operations is much higher and demands much more attention than in
the previous cases. The 2D implementation can be divided into the following
steps (pseudo-code):
input := N*M matrix
twiddle_factors(length M)
from 0 to N do
fft_transform(slice of length M)
bit_reverse(slice of length M)
done
transpose_matrix(N*M becomes M*N)
twiddle_factors(length N)
from 0 to M do
fft_transform(slice of length N)
bit_reverse(slice of length N)
done
transpose_matrix(M*N becomes N*M)
Routines fft_r2_dif() and twiddle_factors() are exactly the same as in the
previous 1D example. The bit-reverse part of the code was put into a function
named bit_reverse(). Knowing this, it makes much more sense to present the
body of the program to give a global overview of what is happening (debugging
routines were cut):
/*
 * ROWS
 */
twiddle_factors(tc, ts, cols);
print_twiddles(tc, ts, cols);
TIME_START
for(i=0; i<rows; i++) {
fft_r2_dif(inp+i*cols, tc, ts, cols);
bit_reverse(inp+i*cols,cols);
}
TIME_STOP("FFT over rows")
/*
* Matrix transposition: quick and dirty
*/
TIME_START
for(i=0; i<rows; i++) {
for(j=0; j<cols; j++) {
transp[j*rows + i][REAL] = inp[i*cols+j][REAL];
transp[j*rows + i][IMAG] = inp[i*cols+j][IMAG];
}
}
// Rewrite results
for(i=0; i<rows*cols; i++) {
inp[i][REAL] = transp[i][REAL];
inp[i][IMAG] = transp[i][IMAG];
}
TIME_STOP("Transposition")
/*
* COLUMNS
*/
TIME_START
twiddle_factors(tc, ts, rows);
TIME_STOP("Twiddle factors for FFT over columns")
print_twiddles(tc, ts, rows);
TIME_START
for(i=0; i<cols; i++) {
fft_r2_dif(inp+i*rows, tc, ts, rows);
bit_reverse(inp+i*rows, rows);
}
TIME_STOP("FFT over columns")
printf("T R A N S P O S E...\n");
TIME_START
for(i=0; i<rows; i++) {
for(j=0; j<cols; j++) {
transp[i*cols+j][REAL] = inp[j*rows + i][REAL];
transp[i*cols+j][IMAG] = inp[j*rows + i][IMAG];
}
}
for(i=0; i<rows*cols; i++) {
inp[i][REAL] = transp[i][REAL];
inp[i][IMAG] = transp[i][IMAG];
}
TIME_STOP("Transposition")
Matrix transposition could of course be solved in a more elegant way. The
version implemented in the example is the fastest one in terms of the time
needed to write it, not to execute it. Matrix transposition was mentioned
before: it seems to be a very simple matrix operation, and in fact it is very
easy to do on a piece of paper. When one has to do it in-place, and at the
same time maintain the row-major or column-major way of storing data, it
becomes really tricky.
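For the special case of a square matrix the in-place variant is still simple, and can be sketched as below. This is only an illustration and not part of the thesis program: the `cplx_t` type and the function name are assumptions, chosen to match the `[REAL]`/`[IMAG]` indexing used in the listing above. The tricky case mentioned in the text is the non-square matrix, which this sketch does not handle.

```c
#include <stddef.h>

#define REAL 0
#define IMAG 1

/* Assumed sample layout: one complex value as a {real, imaginary} pair. */
typedef double cplx_t[2];

/* Transpose an n x n row-major matrix in place by swapping every element
 * above the diagonal with its mirror image below the diagonal. */
void transpose_square_inplace(cplx_t *m, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double re = m[i * n + j][REAL];
            double im = m[i * n + j][IMAG];
            m[i * n + j][REAL] = m[j * n + i][REAL];
            m[i * n + j][IMAG] = m[j * n + i][IMAG];
            m[j * n + i][REAL] = re;
            m[j * n + i][IMAG] = im;
        }
    }
}
```

No second buffer and no copy-back loop are needed here, which removes both the extra memory and the second pass over the data.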
Another thing is that, since we know the matrix size from the very beginning,
it is also possible to calculate the twiddle factors in advance and store them
in two different memory locations (this is needed only for non-square
matrices, and thus should be applied only when the first transform has to be
calculated as fast as all the others).
2D FFT: test
The test method was exactly the same as in the previous case: the same input
array was computed in MATLAB and by the 2d.c program. Results are below:
MATLAB results
a =
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8
>> fft2(a)'
% Result is transposed! (clarity purposes)
ans =
72.0000 0
-8.0000 -19.3137i 0
-8.0000 - 8.0000i 0
-8.0000 - 3.3137i 0
-8.0000 0
-8.0000 + 3.3137i 0
-8.0000 + 8.0000i 0
-8.0000 +19.3137i 0
2d.c results
Input data (N = 16 samples):
==== Input data ====
| 1.0 0.0 | 2.0 0.0 | 3.0 0.0 | 4.0 0.0 | 5.0 0.0 | 6.0 0.0 | 7.0 0.0 | 8.0 0.0
| 1.0 0.0 | 2.0 0.0 | 3.0 0.0 | 4.0 0.0 | 5.0 0.0 | 6.0 0.0 | 7.0 0.0 | 8.0 0.0
==========
Processing rows: twiddle factors, ffts, bit-reversal, transposition.
Twiddle factors (first N/2 = 4 only):
[0] cos = 1.000000, sin = 0.000000
[1] cos = 0.707107, sin = 0.707107
[2] cos = 0.000000, sin = 1.000000
[3] cos = -0.707107, sin = 0.707107
[4] cos = -1.000000, sin = 0.000000
R_FFT: 2 FFTs of 8 samples
* * * * * FFT over rows: 0 ms
T R A N S P O S E...
* * * * * Transposition: 0 ms
Processing columns: twiddle factors, ffts, bit-reversal.
* * * * * Twiddle factors for FFT over columns: 0 ms
Twiddle factors (first N/2 = 1 only):
[0] cos = 0.000000, sin = 1.000000
[1] cos = 0.707107, sin = 0.707107
C_FFT: 8 FFTs of 2 samples
* * * * * FFT over columns: 0 ms
T R A N S P O S E...
* * * * * Transposition: 0 ms
==== Result (transposed) ====
| 72.0 0.0 | -8.0 -19.3 | -8.0 -8.0 | -8.0 -3.3 | -8.0 0.0 | -8.0 3.3 | -8.0 8.0 | -8.0 19.3
| 0.0 0.0 | 0.0 0.0 | 0.0 0.0 | 0.0 0.0 | 0.0 0.0 | 0.0 0.0 | 0.0 0.0 | 0.0 0.0
==========
Total time: 0
Computations vs. methods
If one plans to create a general purpose FFT package, then it should include
lots of error handling mechanisms, data padding methods and similar. In the
2D FFT implementation, where the 1D FFT routine was reused, the main program
that was calling fft_r2_dif() became longer and more complex. Adding further
logic and data correction routines (for transform sizes that are not powers
of the radix) will result in many more lines and an even more complex
structure. As was mentioned before, changing the radix parameter (especially
increasing it) can decrease both the complexity and the number of
computations. The price that one has to pay for doing that is the set of
limitations put on the input data (number of samples). The perfect solution
in this case would be to have a set of implemented algorithms that can be
tested before the first run, put into a tournament or a similar
time-challenge, so that it would be possible to pick the fastest solution.
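Such a tournament can be sketched in a few lines of C. Everything named here is hypothetical: the common candidate signature, the function names and the two stand-in candidates exist only to show the selection logic, and `clock()` measures CPU time, which is only one of several possible metrics.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical common signature shared by all competing routines. */
typedef void (*fft_candidate_fn)(double *data, size_t n);

/* Run every candidate `repeats` times on the same buffer and return the
 * index of the one with the smallest accumulated CPU time. */
size_t pick_fastest(const fft_candidate_fn *candidates, size_t count,
                    double *data, size_t n, int repeats)
{
    size_t best = 0;
    clock_t best_ticks = 0;

    for (size_t k = 0; k < count; k++) {
        clock_t start = clock();
        for (int r = 0; r < repeats; r++)
            candidates[k](data, n);
        clock_t ticks = clock() - start;

        if (k == 0 || ticks < best_ticks) {
            best = k;
            best_ticks = ticks;
        }
    }
    return best;
}

/* Stand-in candidates, used only to exercise the selection logic. */
void demo_single_pass(double *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] += 1.0;
}

void demo_triple_pass(double *data, size_t n)
{
    for (int pass = 0; pass < 3; pass++)
        for (size_t i = 0; i < n; i++)
            data[i] += 1.0;
}
```

In a real package the winner would be chosen once per transform size and then remembered, so that the cost of the tournament is paid before the time-critical part starts.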
Algorithm    Real multiplications    Real additions
Radix-2      81924                   139266
Radix-4      57348                   126768
Radix-8      49156                   126978
Radix-16     48132                   125442
Table 1: Operations needed for computing the FFT for N = 4096 (table taken
from "Fast Fourier Transform and its applications", E. Brigham)
It was also mentioned a few times that for radix-4 the twiddle factors have
a very attractive form. The situation is similar for radix-8, where the
twiddles are just $\pm 1,\ \pm j,\ \pm e^{j\pi/4},\ \pm e^{-j\pi/4}$.
Runtime checks might be too expensive, but plan-based computations, which
include preparations that can take much more time than the actual execution,
can be beneficial. The FFTW library was designed according to this
principle. The library itself will be covered in later chapters.
Algorithms for real data
Especially for real-time systems it might be worth distinguishing between
input types, i.e. whether the input is purely real or complex. Why? When we
know that the input data is purely real, we also know that the imaginary part
of each number is equal to zero. Complex multiplication, which normally looks
as follows:
$(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$
becomes
$a(c + jd) = ac + jad$.
The input vector will also take only half of the memory space, thus it will
be more cache friendly.
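The saving can be made explicit by writing both products out. The sketch below uses an illustrative `complex_sample` type and function names that do not come from the thesis code; it only counts the operations implied by the two formulas above.

```c
/* Illustrative type: one complex sample as separate real/imaginary parts. */
typedef struct {
    double re;
    double im;
} complex_sample;

/* General case (a + jb)(c + jd): 4 real multiplications, 2 real additions. */
complex_sample mul_complex(complex_sample x, complex_sample w)
{
    complex_sample y = { x.re * w.re - x.im * w.im,
                         x.re * w.im + x.im * w.re };
    return y;
}

/* Purely real input a(c + jd): 2 real multiplications, no additions. */
complex_sample mul_real(double a, complex_sample w)
{
    complex_sample y = { a * w.re, a * w.im };
    return y;
}
```

For a sample with a zero imaginary part both functions return the same value, so the cheaper one can be used wherever the input is known to be real.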
For real-time systems each millisecond is very important, because usually
the same steps are performed several times per second. "One millisecond
here, one millisecond there..." and finally the improvement might be larger
than we would expect.
Part II - Not everything that can be counted counts,
and not everything that counts can be counted.
Albert Einstein
Linux as destination operating system
The whole story began much earlier, before Linux was launched in 1991 by
Linus Torvalds. Unix is believed to be the best planned operating system for
multiuser environments. Being the successor of Multics, Unix grew and became
a model for many other operating systems, which nowadays are called "*nix
family" operating systems.
Linux, not being a direct derivative of Unix [9], implements the POSIX and
SUS standards, so that it is *nix compatible. But the biggest breakthrough is
the fact that Linux is completely free software, released under the General
Public License. The experience gathered with Unix and the ability of
programmers from the whole world to develop an operating system together
became the basis of Linux's success. Good design, lightweight architecture
and the possibility of changing every single piece of the system made it a
perfect tool for research environments, and it also became a perfect way of
decreasing TCO [10] in companies which run Linux on their servers or
desktops.
Introduction to Linux environment
The paragraphs below try to provide only the most important pieces of
information from operating systems theory, followed by examples from Linux,
and only those that are relevant for understanding the next chapters, where
there will be a word about setting up a Linux research box.
One should ask: why Linux and not another operating system? The idea is
quite simple: with Linux one gets a robust and open operating system, which
gives a lot of power, but only to those who can embrace it and know enough
to make use of every single piece of it. In the case of fully commercial
operating systems, what one gets is a closed-source piece of software. In
these cases the possibilities of making changes are of course much smaller.
The idea is not to criticize closed-source commercial products, because they
are also good, but in this particular case they might not be the best
choice. Many closed-source applications are business-oriented, thus they are
universal and user friendly, and this means that lots of useful CPU power is
used for running things that are useless from the research point of view.
[9] "Not direct derivative": in terms of the source code and other aspects of
the operating system; some of the early Unix assumptions are different in
Linux, like swapping.
[10] TCO: total cost of ownership; TCO includes the costs of purchase,
maintenance and others that are needed to run a piece of hardware/software.
TCO is mainly used to estimate hardware/software investments, but it is not
limited to them.
Operating system - organization
Nowadays operating systems have different structures depending on the target
group of users. This can also be viewed from a perspective where the system
either has or does not have a graphical interface, and supports or does not
support certain groups of hardware or software. Besides these "salesman"
arguments and features, there are other, more important concerns that lie at
the bottom of the implementation. An operating system has to:
- be an abstract layer between the user and the hardware; this means that it
  has to become an abstract machine that creates other abstract elements
  that can be used at arbitrary times (including concurrently). This
  principle is the foundation of multiprogramming environments (and further
  of time-shared, multi-tasking environments)
- coordinate the use of all elements present in the system; this of course
  has to be bound to policies defined by the administrator (it can also be
  called resource management).
A well designed operating system provides a clean and simple abstraction for
accessing its resources. Abstraction is a way of designing a homogeneous
interface for accessing and controlling hardware, but also other
abstractions. Usually an abstraction consists of two halves: one being a part
of the homogeneous interface system, and the other being more or less a
pluggable module that provides device-dependent routines. Without this, it
would be almost impossible to cope with thousands of different hardware
implementations.
In Linux the most important abstraction is a file, which can be everything.
This abstraction is used for all devices, so that operating on them is as
easy as reading from or writing to a file. For this purpose the operating
system offers a set of calls that it executes on behalf of the application.
This is due to the fact that Linux has a memory manager with protection (a
direct example of a policy applied to one of the resources). Going even
deeper we will find out that the kernel and user applications run in
different address spaces and there is no direct access from user space to
kernel space. The memory protection mentioned earlier could not be achieved
without hardware support. The protection bit is called the supervisor bit
(so that, for example, an assembly program cannot interfere with the
operating system and gain access to all its resources), and the instruction
that switches from unprivileged to privileged mode is called a trap
instruction. Of course, all I/O operations are privileged instructions and
thus they are available only through system calls.
A process itself is an abstraction, which in Linux is again represented by a
file. Handling sets of processes will of course result in another
abstraction, which will provide a set of routines for modifying its
behavior. Full separation of processes is a hard task and requires lots of
code and sophisticated algorithms to minimize the overhead, but the outcome
is that one can create processes on an abstract machine that has, for their
own purposes, a CPU, memory and access to other devices. This abstract
machine is sometimes called "a virtual CPU".
There should be no doubt about the fact that it is much easier to build new
abstractions out of other abstractions. The basics of operating systems
theory mentioned above will be quite important when the time comes to
prepare a Linux research box.
Abstractions also make it easier to port the solution, as whatever is behind
"the call" should be transparent to the programmer. A well designed operating
system should provide a compromise between security, the ability to configure
certain things and efficiency.
A great example of abstraction is the gettimeofday() routine, which gives
access to the high resolution clock. It might be used for calculating the
execution time of functions and will work exactly the same on all platforms
(where it is implemented, of course). The behavior is the same; the
difference might be in accuracy. But the idea is that in Linux there is a
gettimeofday that will work exactly the same on x86, x86_64, SPARC, ARM and
so on.
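A minimal sketch of such a measurement is shown below. The helper names are assumptions made for this example; timing macros such as TIME_START and TIME_STOP from the earlier listing could be built on the same call, but their actual definition is not shown in the text.

```c
#include <stddef.h>
#include <sys/time.h>

/* Difference between two gettimeofday() readings, in milliseconds. */
double elapsed_ms(const struct timeval *start, const struct timeval *stop)
{
    return (stop->tv_sec - start->tv_sec) * 1000.0 +
           (stop->tv_usec - start->tv_usec) / 1000.0;
}

/* Measure the wall-clock time of one call of `fn`, a hypothetical
 * routine under test that takes no arguments. */
double time_call_ms(void (*fn)(void))
{
    struct timeval start, stop;

    gettimeofday(&start, NULL);
    fn();
    gettimeofday(&stop, NULL);
    return elapsed_ms(&start, &stop);
}
```

Since gettimeofday() reports wall-clock time, the result includes any time during which the process was not running, which matters when the schedulers discussed later are compared.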
Linux means kernel
Nothing less, nothing more. The main idea was to provide a set of
abstractions that would implement only the most important
operations/actions. All others should be placed outside the kernel and use
library functions stored on the disk (which will finally use the kernel's
system calls). The simplest case is the printf() C function. It provides a
set of formatting rules and other options and is a part of the stdio
library. All string operations are made in user space by stdio; the last
stage consists of calling write(), which is a system call, and printing the
string on the selected device (screen, file etc.). This makes it possible to
keep the kernel small and also gives many more possibilities to other
applications (as it is these other applications that give shape to a bare
operating system, which is just a kernel).
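The two stages can be separated by hand, as in the sketch below. The function name is made up for this example; snprintf() does the formatting entirely in user space, and only the finished bytes are passed to the kernel through write().

```c
#include <stdio.h>
#include <unistd.h>

/* Format "label = value" in user space, then hand the bytes to the kernel.
 * Returns the number of bytes written, or -1 on error. */
long print_value(int fd, const char *label, int value)
{
    char buf[128];
    int len = snprintf(buf, sizeof buf, "%s = %d\n", label, value);

    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;                  /* formatting failed or was truncated */
    return (long)write(fd, buf, (size_t)len);   /* the only system call */
}
```

Calling `print_value(STDOUT_FILENO, "rows", 2)` sends the line `rows = 2` to the standard output. The real printf() additionally buffers its output, so several calls may end up in a single write().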
Kernel - organization
The kernel, also referred to as the core, is monolithic software. Despite
this fact, it is possible to add and remove pieces of code called modules at
runtime. This pluggable architecture is based on system calls responsible
for registering/removing modules. A full schematic of the kernel is way too
complex and contains too many details that are mostly not needed for our
purposes. As was mentioned before, the kernel provides several services
like [11]:
- task/process manager (scheduler)
- device manager (I/O scheduler)
- memory manager (full separation)
- interrupt manager (all computers use interrupts for communicating with
  devices)
[11] Behind these nicely named services there are thousands of lines that
create lower level abstractions and that are responsible for other important
aspects (in contrast to user space, the kernel has a limited stack, no memory
manager, and pieces of code should be SMP-aware, thus full synchronization
is needed when working in multi-CPU mode).
Memory, and in fact the operating mode, is further divided into two pieces:
user and kernel mode. The only way of crossing between these two spaces is
through system calls. As for now, in the 2.6.x kernel there are around 300
system calls. Only necessary system calls get implemented, to keep the
interface simple [12].
There are also other mechanisms and features present in the kernel which
will be especially interesting for us, like symmetric multiprocessing (which
was not present in the original Unix implementation), thread handling and
preemption mechanisms. The last one is very important, as it determines the
way the scheduler is going to work, which in turn will affect the behavior
of the whole system.
The next few paragraphs describe in more detail the idea of the scheduler
and the way it can affect the system's behavior.
Scheduler and task management
The scheduler is the part of the kernel that is supposed to choose the task
that will be executed next. This means that the scheduler is responsible for
granting CPU time and thus for managing CPU resources in an optimal way. The
outcome of this work is an illusion of parallelism.
The way the scheduler deals with task execution can have a big impact on the
system's performance.
[12] One of the ideas behind system calls is that they are never meant to do
something specific, or in other words "one should never know what the system
call can be used for".
Please keep in mind the fact that it is relatively easy to create a parallel
version of the 2D FFT routine, which is then called from the main program.
The fact that we can influence the way in which tasks will be treated is
very important, as it can help us build a system that will have the best
performance either as a low-latency real-time system or as a power computer,
with bigger latency but high throughput. Of course, a combination of both is
also possible and should be carefully tested. One could ask: "why should it
be tested?". Generally speaking, a "low-latency kernel" will for sure be the
best solution for a low-latency system, but it is worth testing all
possibilities (currently only 2), because sometimes it might be possible to
get more out of the hardware, completely for free.
Scheduler types
Before discussing schedulers and their features it is necessary to cover the
basics of multitasking. Currently most, if not all, operating systems
support multitasking. There are several ways of designing multitasking
systems, but there is no golden mean that will always provide a system with
maximum throughput and minimum latency.
Generally speaking, multitasking can be implemented using different
scheduling strategies. The main ideas behind these implementations are
strictly connected with the time slices that are granted to processes in the
task queue and with the possible methods of interrupting tasks that are
currently being executed. Some of these approaches are no longer used, like
for example multiprogramming, which was replaced with time-sharing systems.
Nowadays we can say that multitasking systems can be divided into two
categories: multitasking with cooperation and multitasking with preemption.
In the case of the cooperative scheduler the system relies on processes, as
they are responsible for regularly giving up CPU time to other processes.
This means that a wrongly designed program can even halt the whole system.
This kind of solution can maximize throughput on stations that are supposed
to run a particular piece of software, e.g. servers or scientific machines.
This approach was slightly revised and implemented as one of the available
schedulers in 2.6 kernels. It is called "No Forced Preemption". Of course,
when configuring a system with this scheduler one has to be aware of the
fact that occasional longer delays are possible and there are no other
guarantees; in the worst case a process can simply hang the whole system.
Despite this fact, for scientific purposes this scheduler seems to be the
one worth looking at.
The other option is preemptive scheduling. In this approach the scheduler
decides whether to stop executing a process and grant CPU time to another
one waiting in the queue [13]. The time that is used exclusively by the
process being executed is called a time slice. Its value is determined
dynamically.
In Linux kernel 2.5 the scheduler was revised and rewritten, so that it is
now called the O(1) scheduler, which means that the running time of the
algorithm is independent of the number of elements in the input set
(processes in this case) [14].
Currently the Linux kernel offers two different scheduling algorithms for
preemptive systems:
- Voluntary Kernel Preemption (desktop)
- Preemptible Kernel (low-latency desktop)
[13] As long as execution takes place in kernel space it is possible to stop
execution (preempt the process).
Voluntary Kernel Preemption
In this mode programmers focused their attention on reducing the latency of
the kernel. In order to achieve this aim several "explicit preemption
points" were added to the kernel code, which reduce the maximum latency of
rescheduling. Frankly speaking, we get a faster response at the cost of
slightly lower throughput. Users observe the illusion that all applications
run "smoothly" even if the system is heavily loaded. In this mode it is
possible to preempt a task even if it is executing code on behalf of the
operating system during a system call.
Preemptible Kernel
In this case it is possible to preempt the kernel at every point (this does
not apply to critical sections). This means that it is possible to preempt a
process that is executing, for example, a system call before the natural
point of preemption. This scheduler reduces overall throughput, but offers a
low response time, thus it might be used on both desktop computers and
real-time embedded systems, where latency requirements are measured in
milliseconds. This scheduler is definitely worth testing, as the obtained
results might be quite interesting.
For systems that are supposed to work in real time and that will work with
huge amounts of data it should be carefully tested which of the schedulers
gives maximum performance, but it should be kept in mind all the time that
throughput means no preemption and low latency means a smaller amount of
data to be processed.
As was mentioned previously, the scheduler is responsible for managing time
slices. The current strategy must involve time slice changes, as they might
influence overall performance. Of course, the length of the time slice
depends on the type of process and on its priority.
Summarizing: it is good to know both the type of the software we are going
to use/develop (in terms of latency) and the OS internals in the case of
Linux.
I/O and CPU bound processes
I/O bound processes spend most of their time on initializing and serving I/O
operations, thus they require less CPU time, as I/O devices are considered
to be much slower than the CPU. It also implies that I/O bound processes are
not interactive, thus there is no response boundary: it is enough to satisfy
the I/O devices. In UNIX-like operating systems the scheduler usually favors
I/O bound processes.
[14] The scheduler in 2.6 kernels is more efficient than the one present in
the 2.4 series.
The situation is slightly different in the case of CPU bound processes,
which require just CPU time, either to finish computations or because the
process is highly interactive. This time the scheduler tries to execute
these processes less frequently, but the time slice is enlarged, as the
process is not interrupted by any other events until it is preempted by the
kernel.
The text above leads to the following conclusions:
- non-interactive and I/O bound (oriented) processes are executed more
  often, but the time slice is much shorter (e.g. 10 ms)
- highly interactive processes and CPU bound (oriented) processes are
  executed less frequently, but the time slice is much larger (e.g. 200 ms)
The values above should be compared with a standard time slice length, which
in this case could be 100 ms.
If we are supposed to design a real-time system that performs filtering in
the frequency domain, we have to take into account the following
considerations:
- images will likely be of 1.5 megapixels and more
- hardware will have to provide a high-bandwidth link between CPU and memory
- scheduler tests should be started from the "No Forced Preemption" kernel
  (and this one will be used for further measurements)
Priorities - influencing scheduler's queue
One of the most popular methods of scheduling tasks is to use a prioritized
queue. It simply means that processes that have a higher priority will
always be at the beginning of the executable queue and, what is very
important, in Linux systems they will also be granted a longer time slice.
Linux also uses dynamic priorities, thus they can be modified at runtime.
The parameter that carries information about the priority is the "nice"
level. Its range is from -20 to 19; the default is 0. The lower the nice
level is, the higher the priority of the process. It is possible to modify
the "nice" level from both the operating system and user space.
#include <unistd.h>
int nice(int inc);
Currently there are two implementations of nice: one that returns 0 when the
priority change was successful, and another one that returns the current
priority (after the change, no matter if it succeeded or not). The first
case applies to glibc in versions <= 2.2.4 (there is a call getpriority()
that returns the current nice value).
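Because -1 can be both a legitimate nice level and an error indicator, a careful caller clears errno before the call and inspects it afterwards. The wrapper below is a sketch with a made-up name; it assumes the variant of nice() that returns the new nice value, and on an implementation that returns 0 on success the stored result would have to be fetched with getpriority() instead.

```c
#include <errno.h>
#include <unistd.h>

/* Add `inc` to the nice level of the calling process.
 * On success stores the value returned by nice() in *result and returns 0;
 * on failure returns -1 and leaves errno as set by nice(). */
int change_nice(int inc, int *result)
{
    errno = 0;
    int value = nice(inc);

    if (value == -1 && errno != 0)
        return -1;          /* a real error, not a nice level of -1 */
    *result = value;
    return 0;
}
```

Lowering the priority (a positive increment) is allowed for every process, whereas a negative increment normally requires administrator privileges.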
System call                 Description
nice()                      Sets the "nice" level
sched_setscheduler()        Sets scheduling strategy
sched_getscheduler()        Returns current scheduling strategy
sched_setparam()            Sets the real-time priority of a process
sched_getparam()            Returns current real-time priority of a process
sched_get_priority_max()    Returns maximum real-time priority
sched_get_priority_min()    Returns minimum real-time priority
sched_rr_get_interval()     Returns length of the time slice
sched_setaffinity()         Sets the CPU affinity mask of a process
sched_getaffinity()         Gets the CPU affinity mask of a process
sched_yield()               Releases CPU time
Table 2: Scheduling: system calls
Most of the system calls presented in the table above are connected with
so-called real-time scheduling. The only drawback of using them is that
there is no guarantee that requests will be fulfilled within the real-time
boundaries.
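A few of these calls only read information and can be tried without any special privileges. The sketch below uses a made-up function name and queries the current policy together with the priority range of the SCHED_FIFO real-time policy; changing the policy with sched_setscheduler() normally requires administrator rights and is not shown.

```c
#include <sched.h>

/* Read the scheduling policy of the calling process and the valid
 * priority range for SCHED_FIFO. Returns 0 on success, -1 on error. */
int query_scheduling(int *policy, int *fifo_min, int *fifo_max)
{
    *policy = sched_getscheduler(0);        /* 0 means the calling process */
    *fifo_min = sched_get_priority_min(SCHED_FIFO);
    *fifo_max = sched_get_priority_max(SCHED_FIFO);

    if (*policy == -1 || *fifo_min == -1 || *fifo_max == -1)
        return -1;
    return 0;
}
```

Asking for the range at runtime, instead of hard-coding it, keeps the program portable between systems whose real-time priority limits differ.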
At first glance priorities might seem to be just a "cosmetic" tool, but they
should not be considered this way. It is worth experimenting with
priorities, as they can give better real-time results. Of course, the best
results will be obtained for applications that are supposed to run for a
longer time, as for short periods of time that are close to the time slot
the impact of the scheduling priority will not be visible (or only partly).
Quick note about I/O schedulers
Similarly to task management, there are also I/O schedulers that can
influence the way I/O operations are serviced on a system. This topic is
less relevant in the case of the FFT problem, but still it is worth knowing
which one should be picked when configuring the kernel.
Available I/O schedulers
Anticipatory
It is the default disk scheduler. This scheduler is suitable for most
environments, but its implementation is complex, its code size is large and
it might not be a good solution for database systems.
Deadline
Simple and compact: the best choice for heavily loaded database systems. In
some cases its behavior is almost identical to the anticipatory scheduler,
thus it is a good choice.
CFQ
The best choice for desktop systems: it tries to distribute bandwidth
equally among all processes in the system.
The Deadline I/O scheduler seems to be the best choice in this case, as it
works similarly to the anticipatory scheduler and its implementation is less
complex. This might be interesting when some results, i.e. images, have to
be written back to the disk (or some initial images have to be read from the
disk). Of course, when there is enough RAM, designers should always consider
the possibility of using memory mapped files, which offer much better
read/write performance when compared with the fastest hard disks. Exactly
the same procedure should be followed with all filters that will be used in
convolutions.
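Mapping an input file into memory takes only a few calls. The sketch below is an illustration under stated assumptions: the function name is made up, the file is mapped read-only as a whole, and how the bytes are interpreted (image header, pixel format) is left to the caller.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and return a pointer to its bytes;
 * *length receives the file size. Returns NULL on any failure,
 * including an empty file, which cannot be mapped. */
const unsigned char *map_file_readonly(const char *path, size_t *length)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE,
                      fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (addr == MAP_FAILED)
        return NULL;

    *length = (size_t)st.st_size;
    return addr;
}
```

When the data is no longer needed the mapping should be released with munmap(), passing the same pointer and length.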
Multi-threaded applications
Linux treats threads in a very special way, but still they are just
processes and obey exactly the same rules as other tasks present in the
scheduler's queue. The fact that they share a common address space is not of
any interest to the scheduler, because it does not affect scheduling in any
way. There are several design issues connected with threads which are
directly related to the specifics of an operating system. Details and
differences will be explained in the next paragraphs.
Classic vs. modern processes
Using threads can be presented as "getting two things done at the same time"
(this is exactly the way Java advertises its threaded API). So far we are
acquainted with a process as an entity having access to the abstract machine
(which is an abstract CPU and memory). With threads there comes a new
entity, called a modern process. We can have several threads that execute
within one framework offered by the modern process (the biggest advantage of
using threads over multi-process solutions is that they have a common
address space and there is no need to implement complex IPC routines). On
uniprocessor machines threads are treated exactly the same as normal
processes (on 2.4.x kernels threaded applications had a higher priority in
the scheduler's queue by default), but the best results can be achieved when
threads run in a multiprocessor environment, as then concurrent execution
turns into parallelism (remember that concurrent does not mean parallel: the
first only says that something can be executed in arbitrary order and
piece-by-piece, whereas parallelism means execution of entities literally at
the same time).
There are two main implementations of threads: user and kernel threads. The
first one resembles a situation where we have a modern process (time
multiplexed) with an abstract machine that is again time multiplexed by the
physical machine. This simply means that abstract machines include their own
multiprogramming environment. User space libraries make use of this
principle (Mach C, POSIX threads), and this applies to operating systems
that implement classic processes. There is one big drawback of this
solution: when the master thread is blocked, all other underlying threads
are blocked, too.
Kernel threads offer the opposite behavior: when one of the threads is
blocked, the others can still execute.
Under Linux parallelization can be implemented in two or more ways: either
by using the Pthread library or clone(), which in turn calls the sys_clone
system call.
sys_clone
The clone() call works similarly to the fork() system call, which creates a
child process. The difference between them is that with clone() it is
possible to create a child process that has a common address space with the
parent process, so in fact it will be a thread. The biggest disadvantage of
clone() is the fact that it is a Linux specific call and should not be used
when the application is supposed to be portable.
Pthread
LinuxThreads, a library that offers a portable threaded API, seems to be the
best available solution in this case (when working in a multiprocessor
environment). It implements the POSIX 1003.1 API, which makes it possible to
run on several other *nix family operating systems [15]. What actually
happens in Linux when threads are created is that each thread is mapped to a
single LWP. "Lightweight process" is another name for kernel threads. Linux
implements a one-to-one strategy, so that each thread is bound to one LWP.
The situation is different under Solaris, where the process of creating LWPs
is rather expensive, so a many-to-many strategy is used. In other words, an
LWP is nothing else than an abstract machine (and as was mentioned before,
since threads share the address space, file descriptors and so on, LWPs do
not need all the pieces of information that are associated with a process;
that is why the name is "lightweight process"). This is a huge advantage
over a standard fork() call: firstly because communication between parent
and child processes is not a problem, and secondly because creating a child
process is much more expensive than creating a thread, as when sys_fork is
invoked the kernel has to create an independent copy of the address space,
file descriptors and so on. This takes time and of course resources.
[15] There exists a Windows implementation of Pthread, so it is possible to
run Pthreaded applications on a Windows OS, which natively implements modern
processes.
Another advantage of Pthread over clone() might be the fact that Pthread
comes with a set of additional tools that are meant to simplify
synchronization and further operations on threads.
There should be more or less no doubt that for most applications Pthread is
the best available solution. It provides a clean and understandable
interface that makes multi-threaded design both easier and nicer (for
example, setting up Windows threads is much more complex, especially the
syntax and the number of parameters passed to the function). It also comes
with the set of tools needed for synchronizing threads (mutexes etc.).
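As an illustration of how compact the interface is, the sketch below splits the rows of a matrix among several threads, which is the natural way of parallelizing the row pass of a 2D transform. All names are assumptions made for this example, and the per-row work is only a placeholder (every element is doubled) standing in for a 1D transform. Because the bands of rows do not overlap, no mutex is needed here.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Work description for one thread: a contiguous band of matrix rows. */
struct row_band {
    double *data;       /* row-major matrix shared by all threads */
    size_t cols;
    size_t first_row;
    size_t last_row;    /* exclusive */
};

/* Placeholder for the per-row work: every element is simply doubled. */
static void *process_band(void *arg)
{
    struct row_band *band = arg;

    for (size_t r = band->first_row; r < band->last_row; r++)
        for (size_t c = 0; c < band->cols; c++)
            band->data[r * band->cols + c] *= 2.0;
    return NULL;
}

/* Split `rows` rows among `nthreads` threads and wait for all of them.
 * Returns 0 on success, -1 on invalid arguments or a failed creation. */
int process_rows_parallel(double *data, size_t rows, size_t cols,
                          size_t nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct row_band band[MAX_THREADS];
    size_t started = 0;
    int status = 0;

    if (nthreads == 0 || nthreads > MAX_THREADS)
        return -1;

    for (size_t t = 0; t < nthreads; t++) {
        band[t].data = data;
        band[t].cols = cols;
        band[t].first_row = rows * t / nthreads;
        band[t].last_row = rows * (t + 1) / nthreads;
        if (pthread_create(&tid[t], NULL, process_band, &band[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (size_t t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return status;
}
```

With GCC such a program is usually built with the `-pthread` option.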
Why threads are so important?
Not long ago multi-CPU machines were a luxury. Nowadays they are slowly
becoming a standard. Generally speaking, whenever we have more than one CPU
on the board, it is highly advisable to consider using threads. The word
consider was used on purpose, as it is not always possible to implement
threads, and some problems can be solved even better without them. Of
course, with or without threads, most applications will work much better in
a multi-CPU environment. The reason for that is quite simple: a process can
be present in only one queue (the number of scheduler queues is equal to the
number of CPUs). For example, a CPU-consuming application that we are using
will be bound to CPU A, whereas CPU B will take all other activities that
happen in the operating system and will serve other applications (of course,
the scheduler is responsible for determining this situation).
From the example presented above we can draw one important conclusion: even
with just two CPUs (A and B), our single-process application will be able to
use only 50% of the available power, because it will be bound to only one
CPU! The way of getting 100% of the power is either to implement threads
(the modern solution) or processes [16]. This also clearly sets the rule
that the number of active threads should be set to the number of CPUs
available in the system.
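The number of CPUs does not have to be hard-coded. A minimal sketch, assuming a glibc-based Linux system where the `_SC_NPROCESSORS_ONLN` parameter of sysconf() is available (it is a common extension and not part of the base POSIX standard); the function name is made up for this example:

```c
#include <unistd.h>

/* Number of worker threads to start: one per online CPU, at least 1. */
long default_thread_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return (n < 1) ? 1 : n;
}
```

The value counts logical processors, so on a machine with Hyper-Threading enabled it is higher than the number of physical CPUs.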
SMP - more CPUs, more problems
SMP itself is a computer architecture that allows more than one CPU on the
board. The processors are connected to a shared memory and each of them can
execute arbitrary code regardless of data location. The worst limitation is
that only one CPU can access memory at a time. On modern systems with very
fast CPUs the bottleneck usually is memory access, as the CPU speed is much
higher than the memory speed [17] (on SMP systems it is of course even
worse). A partial solution for this problem can be installing more fast
cache memory, but one has to take into account that:
- cache memory can't be infinitely large, as then cache management would
  take more time than ordinary memory access
- cache memories are usually static RAMs which offer access times in the
  range of 0.5-5 ns; they are too expensive to be used as main memories.
[16] It is worth looking at OpenMPI, an open source implementation of the
MPI protocol.
[17] Before 1970 it was exactly the opposite: memories were much faster than
CPUs.
Apart from making memories faster and faster, other techniques are being
used, like Hyper-Threading Technology (HT), introduced by Intel in the
Pentium 4 processor. From the OS point of view, an HT-capable processor offers
an SMP interface, so it is visible as two processors. What HT actually does is
this: when the CPU is stalled (the needed data is not ready), it tries to
execute another thread/process that is next in the queue. In other words, it
duplicates resources from the logical point of view, but there is still only
one CPU to do the job. Intel claims that HT can give up to 30% in speed. There
is no doubt that HT speeds up the machine; the problem lies in the speed-up
factor. New approaches leave HT behind and concentrate on dual-core
architecture.
There are certain advantages of using HT under Linux, but not always. For
some tasks it might be worth checking whether better results can be obtained
with HT disabled (and this should always be carefully tested).
Dual-core technology
The main idea is to incorporate two processor cores on one die. From the
business point of view, the manufacturing process is much more expensive [18]
and requires more transistors than standard one-core dies. From the operating
system's point of view both cores can be recognized as physical processors
(SMP architecture) or can work in the so-called partitioned mode, where each
CPU operates independently on its own physical memory. Under Linux dual-core
CPUs work in shared memory mode, thus in SMP architecture. In most cases both
cores have individual L1 caches and are further connected to one L2 cache
(but this differs among processors).
Both Intel and AMD manufacture dual-core CPUs now. In the first case, Intel
Core Duo is a 32-bit processor with low power consumption and is a perfect
solution for a portable computer, whereas AMD manufactures dual-core Opterons,
which are 64-bit CPUs and have HyperTransport technology.
Northbridge problem
On some AMD processors this problem does not exist, as they have a memory
controller built into the processor die. Intel processors connect to the
Northbridge via the FSB (Front Side Bus), and the Northbridge in turn connects
to the memory, PCI Express or AGP. There is one more link, to the Southbridge,
which connects to slower peripherals (via a slow connection). Of course, the
FSB and the connection from the Northbridge to the memory should operate at
the same frequency. It is possible to have a lower frequency on the link from
the Northbridge to memory, but this exists only for backward compatibility and
can become (and usually becomes) another bottleneck. The operating frequency
of the FSB is usually derived from the CPU speed (by modifying the FSB
frequency it is possible to overclock the CPU; currently most CPUs prevent
that). Of course, the higher the frequency, the more data can be transferred
back and forth.

[18] A higher number of transistors means more checks (transistors), higher
quality of the materials used and a different manufacturing process (more
complex, which means more expensive).
The fact that memory is connected through the Northbridge is one of the
shortcomings of the Intel Core Duo. This additional stage, which is in fact a
connection node for other elements as well, causes high memory latency and
limits the available bandwidth (usually it is a shared link). Another problem
is compatibility: it is impossible to connect a Pentium CPU directly to the
PCI bus. FSB expansion needs additional adapters, which in this case are the
Northbridge and the Southbridge.
This problem was solved on some AMD processors, which have the memory
controller on the CPU die and use HyperTransport [19] as the FSB replacement.
With HyperTransport we gain two things: we can simply connect together all
HyperTransport-aware hardware [20], and there is a significant difference in
throughput, also because HyperTransport is a dedicated link, whereas the
available bandwidth on the Northbridge is shared. It is a fast, scalable link
that reduces the total number of buses, which of course also has a positive
impact on latency. A very important feature is that HyperTransport is
compatible with legacy PCI, PCI-E and PCI-X technologies.
Max. clock speed (MHz) | Theoretical speed (MB/s) | Comments
133                    | 4266                     | --
266                    | 8533                     | --
333                    | 10666                    | --
800                    | 12800                    | HyperTransport 1.x
1000                   | 14400                    | Athlon64, FX and Opteron; HT
1400                   | 22400                    | HyperTransport 2.0
2600                   | 41600                    | HyperTransport 3.0

[19] More about HyperTransport and the HyperTransport Consortium can be found at
http://www.hypertransport.org/
[20] The so-called "glueless" solution.
It is clearly visible that HyperTransport links offer a much higher frequency,
which yields higher transfer rates. Once again, please note that a
HyperTransport link is a direct link!
What might be very interesting from the research point of view is the fact
that Intel Core Duo is a 32-bit processor, whereas AMD offers dual-core CPUs
supporting the x86-64 architecture, which was also adopted by Intel and named
EM64T (Xeon, 5x6 Prescott P4 series). Intel Itanium [21], which itself forms a
new IA-64 architecture, unfortunately cannot be taken into account, both
because of high prices and because of poor support [22] for x86 (32-bit)
legacy code. More about architectures will be covered in the next chapter.
HyperTransport provides better performance, without any doubt. In the case of
image processing or other processing that requires large streams of data, this
kind of design should be considered first when buying a machine.
64 bits to happiness
The 64-bit market seems to be growing day by day, but 32-bit applications
still form the vast majority and will not be forgotten for a long time [23].
Now that prices of x86-64 processors are dropping, 64-bit programs should be
the future for most research applications. What is very important: it is up to
the operating system to decide whether legacy 32-bit code should be supported
or not, so there is no way we can lose our favorite 32-bit applications.
Gentoo Linux gives the possibility of running in both modes: long or mixed.
Other distributions usually come in multi-lib mode.
Why 64 is better than 32?
Going back in time to the 16-bit 80286 processor: with its successor, the
80386, Intel added a flat 32-bit addressing mode with 32-bit registers to what
had been a 16-bit processor. The situation is similar in the case of x86-64:
what is new in AMD64 is that it supports 64-bit (flat) addressing and has
64-bit registers while still being able to execute 32-bit applications. The
AMD64 architecture enlarged the width of all registers (GPR, PC), added 8 new
GPRs and doubled the number of SSE2 registers (XMM registers; from 8 to 16).
When the compiler is aware of these additional registers it can significantly
speed up execution, as the need for saving and restoring data is much smaller
(register starvation occurs less frequently). Of course, this is still far
away from most RISC implementations, which offer 32 GPRs, and even further
from IA-64, which has 128 GPRs.
[21] "Itanic" is another name given to Intel Itanium by "The Register". It is a
direct reference to RMS Titanic, which sank in 1912. What "The Register"
wanted to say is that work on Itanium does not give the expected results and
costs enormous amounts of money.
[22] Itanium 2 should overcome this problem (according to Intel and HP).
[23] Or until Microsoft decides to do so.
Of course, x86-64 has a bigger physical ($2^{40}$ bytes = 1024 GB) and virtual
($2^{48}$ bytes = 262144 GB) address space. Compared to 32-bit mode, which
allowed 32-bit ($2^{32}$ bytes = 4 GB) virtual and 36-bit ($2^{36}$ bytes =
64 GB) physical addresses, it is a big difference. This makes it possible to
process large amounts of data and to reference them directly in memory (like
memory mapped files: we omit disk I/O, which is measured in milliseconds,
whereas memory access is measured in nanoseconds, that is six orders of
magnitude faster!).
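The address space figures above are simple powers of two. As a small sketch (the function name `address_space_gb` is hypothetical and only serves the illustration), the size in GB follows from the number of address bits like this:

```c
#include <stdint.h>

/* Size, in gigabytes (1 GB = 2^30 bytes), of an address space that is
 * addressed with `bits` address bits. Valid for 30 <= bits <= 93 in
 * theory; here limited by the 64-bit result to 30 <= bits < 94, and in
 * practice used only for 32, 36, 40 and 48 bits. */
uint64_t address_space_gb(unsigned bits)
{
    return (uint64_t)1 << (bits - 30);
}
```

Called with 32, 36, 40 and 48 bits it yields 4, 64, 1024 and 262144 GB, the same numbers as in the paragraph above.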
There is also another feature, introduced by AMD, that should increase
security: the NX bit (no-execute). Currently on IA-32 platforms, when a buffer
overflow occurs it is possible under some circumstances to execute malicious
code (which can also come from a remote site). That is because there is no
low-level control over memory pages. The NX bit determines whether a page
contains executable code, so in case of the mentioned attack the processor
should simply generate a memory violation error. This feature was also
implemented by Intel in some Pentium 4 processors which have PAE [24].
From the architectural point of view, x86-64 seems to be an immediate
solution for some of the IA-32 problems (IA-32 itself was difficult to modify
because of the backward compatibility it had to offer; this is also called the
"golden handcuffs problem", because IA-32 is a marketing success, but the
design has to obey some old-fashioned rules that simply make it bad).
64 bits of problems
To be able to run in the so-called long mode (64-bit mode), 32-bit
applications have to be recompiled. Some of them of course will not compile,
and this might be a big problem, especially for business oriented people.
Linux of course supports the x86-64 architecture (especially AMD64).
The biggest players on the 64-bit Linux market are:
Name | Platforms | Website | Notes
Gentoo | Intel compatible, PPC (32 & 64), Alpha, 64-bit, Sparc | http://www.gentoo.org | For advanced users.
SuSE Linux | Intel compatible, PPC, Alpha, Sparc, Itanium, Mainframe, Other, 64-bit | http://www.suse.com/ | Requires a license. Support available.
Open SuSE | Intel compatible, 64-bit | http://www.opensuse.org/ | Open Source version of SuSE.
Fedora Core | Intel compatible, 64-bit, PPC | http://fedora.redhat.com/ | Open Source version of Red Hat.
PLD | Intel compatible, PPC, Alpha, Sparc, 64-bit | http://www.pld-linux.org/ | For power users.
Ubuntu Linux | Intel compatible, PPC, 64-bit | http://www.ubuntulinux.org/ | Debian-based.
Mandriva | Intel compatible, PPC, 64-bit | http://wwwnew.mandriva.com/ | One of the first distributions fully supporting EM64T processors.
RHEL | Intel compatible, 64-bit, others | http://www.redhat.com/ | Red Hat Enterprise Server. Requires a subscription. Support available.
Slamd64 | 64-bit | http://slamd64.com | Unofficial Slackware port.
Debian | Alpha, ARM, MIPS, PPC, Sparc, IA-64, Intel compatible, 64-bit | http://www.debian.org/ | Testing branch available.

[24] Physical Address Extension: enables up to 64 GB of memory to be used on x86 computers.
x86-64 is a new architecture and it will still take some time for companies
and development communities to write or port their software to support long
mode. Despite this fact, it is highly visible that 64-bit is the future and
that all applications will sooner or later be forced to support it.
There are also some difficulties and limitations when working on x86-64
Linux. This might be the case with sound drivers, for which additional
emulation libraries have to be installed. If the main application is going to
work with video streams, then it might also be difficult to work with certain
file formats, as some CODECs are still not available in a 64-bit version.
There might also be some difficulties with the newest graphics boards, but in
this case there is a huge community of commercial developers working on
improving them.
The general rule should be: if one plans to buy a Linux with support, or a
machine from a certain vendor, it would be good to check whether one or both
sides provide some kind of certificate (that "this Linux can run on that
box"). If the box will be assembled manually, then each piece of hardware
should be carefully checked to see whether it is fully supported. It might be
too late to discover at some point that the hardware that was ordered will not
work as we wanted. A good approach is also to check the Bugzillas of the
distributions that offer 64-bit Linux versions and search them for both the
hardware and the software pieces one plans to use.
64-bit computational research box
All the previously mentioned features and possibilities that come with the
x86-64 architecture make it a good choice and target platform for applications
that are CPU and memory consuming. For the 2D FFT, where the transform size
grows very fast and requires a lot of processing power and memory, this
platform with all its advantages is currently the best solution that can be
obtained for a decent price [25]. It is also a perfect solution for a
dedicated computational machine or a server, as most of the software available
for them uses threads (if applicable).

[25] At the time of writing: 2006.
Generally speaking, the biggest computational power is inside Intel
processors, especially when using the Intel Compiler. It is a perfect solution
for processes that have to perform many operations on a data set that does not
change frequently over time; thus, it might not be the best solution for
real-time systems. AMD with HyperTransport can be considered a very good
solution for real-time systems, where large amounts of data have to be
transferred back and forth between CPU and memory. Even though AMD's
computational capabilities are smaller, the speedup will be visible.
Inside and close to CPU
This chapter will try to provide some background about the most important
enhancements. Most of them, like pipelining or cache memory, were invented
some time ago and are now implemented in almost every computer (previously
they were considered luxury enhancements). Additionally, every CPU comes with
extensions: magic abbreviations of a kind that, when properly used, can give
"golden" results. The survey will also include the corresponding flags that
can be used with, for example, the gcc [26] compiler.
Importance of this chapter
In most cases our feeling about the processor market is as follows: "there
are many CPUs, some of them come from the same series, thus they must be the
same". No! They are not. Several code names are assigned to processors, and
these try to reflect the core's capabilities and possible other enhancements.
Many people might be surprised to learn that, for example, Pentium 4 has 16
different versions [27]. All these versions differ a lot, starting from a P4
that is a plain 32-bit CPU and ending with the version that supports EM64T
with NX bit and SSE3. It is good to know what to buy and how it will affect
further development, at least in theory.
Stage 1: pipelining
The idea of pipelining in its basic, virgin form is quite easy and can be
presented on the example of a production line or the very famous "laundry
example". In most cases the concept of the pipeline is known, but this is just
the tip of the iceberg. Several other things can have an impact on the
pipeline and its behavior, and this of course will be visible in the
performance gain (or loss) that the pipeline gives. An ideal pipeline
consisting of n stages should give a performance gain close to n, but of
course, the more stages, the more problems and the more resources that have to
be used (each stage needs its exclusive set of elements).
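The statement that an ideal n-stage pipeline approaches a gain of n can be made concrete with the usual textbook model, which is an assumption of this sketch and not a formula taken from this report: k instructions need n*k stage times without pipelining and n + k - 1 stage times with it, as long as the pipeline never stalls. The function name `ideal_pipeline_speedup` is hypothetical.

```c
/* Ideal speedup of an n-stage pipeline over a non-pipelined unit for a
 * stream of `instructions` instructions (must be > 0), assuming equal
 * stage times and no stalls, flushes or hazards:
 *     speedup = (n * k) / (n + k - 1)
 * which tends to n as k grows. */
double ideal_pipeline_speedup(unsigned stages, unsigned long instructions)
{
    double n = (double)stages;
    double k = (double)instructions;
    return (n * k) / (n + k - 1.0);
}
```

For a single instruction the model gives no gain at all, while for long instruction streams the value approaches the number of stages; every stall or flush in a real pipeline pushes the result below this bound.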
[26] GNU Compiler Collection. Learn more at http://gcc.gnu.org/.
[27] At the time of writing: 2006. The number is just an estimate; there might be more versions.
Number of pipeline stages
This of course depends on the implementation. As mentioned, the more stages,
the better the performance gain, but also the more difficult the
implementation and the higher the penalty for stalling the pipeline. This is
especially important in the case of branch instructions and data loads, which
can break the whole pipeline (causing a pipeline flush). Another problem
arises when instructions depend on each other; this hazard also has to be
solved efficiently by the designers.
There are several methods to minimize the latency. A good example is the
Pentium 4 processor, which implements a speculative pipeline with around 21
stages. But this number changes: for example, all Pentium 4 processors code
named Prescott have a 31-stage pipeline [28]. The opposite situation is found
in RISC processors; MIPS, for example, offers a 5-stage pipeline. The natural
question is then: where is the limit? What happens in the P4 processor is
high-level engineering which tries to get the most power out of the available
hardware. Intel engineers put in much more hardware to implement this
aggressive queue; otherwise a branch instruction could simply flush the
pipeline every time it was mispredicted. But this is also a result of the way
Intel treats instructions, which are of variable size (contrary to MIPS, where
everything is aligned). An instruction is decoded and split into several
micro-operations (it is said that Intel works like a RISC while being a CISC),
which are further organized (and can be reordered by the OoO mechanism) into
this multi-stage pipeline. Additionally, Intel uses a trace cache and register
renaming to minimize the penalty when the pipeline has to be recovered.
Compilers can also influence the way the pipeline will be executed by
scheduling instructions in a pipeline-friendly way; this also includes branch
prediction, which nowadays is present on CPU dies as well. Both AMD (the new
x86-64 ones) and Intel processors have register renaming and OoO
(out-of-order execution) implemented in their pipelines.
Current CPUs usually have distinct pipelines for floating-point units (or a
different number of stages). The number of stages also differs among
manufacturers and CPU types: Intel (largest, up to 31 stages), AMD (12 stages
for integer and 17 stages for floating-point) and MIPS (5 stages).
A pipeline increases throughput, and this increase is proportional to the
number of stages. The latency involved, the additional hardware and the
complex structure of the pipeline are worth the effort, because the overall
result is still much faster than a non-pipelined solution.
The general rule is: the more stages, the more problems. But honestly, the
number of stages in the pipeline is of secondary importance. For sure, it
would be immediately visible if a CPU was running without a pipeline.
Summarizing, the purpose of this paragraph was just to provide a few pieces of
information about pipelining and its current forms.
Floating-point unit
Early CPUs did not contain floating-point units. These calculations were
usually handled either by a co-processor (which itself was an add-on) or were
emulated (either in software or in microcode). Currently the tendency is to
have more than one FPU with shared or separate pipelines (integer and
floating-point pipelines). Early superscalar processors that did not implement
OoO (out-of-order execution) had to have separate pipelines. Now it is not a
problem, since instructions can be easily reordered (but there might be a
different number of pipeline stages).
This chapter is extremely important, as these extensions solve the biggest
problem that exists on x86 and x86-64 architectures: the "register starvation"
problem. FPU extensions usually provide a set of registers and additional
functions for performing specialized tasks on them (which usually try to
exploit a higher level of parallelism). It is important because these
extensions can be used directly, as the gcc compiler offers a set of built-in
functions for accessing them (and can also generate FP code automatically,
similarly to the icc compiler).

[28] That is why the pipeline is covered: this parameter changes over time for
the same processor name!
Many processors, one floating point standard (IEEE 754)
The IEEE 754 floating-point standard is the one that makes it possible to
port floating-point applications to different processors and architectures.
Currently a new revision is being developed (IEEE 754r) which has some very
interesting functions and adds a quad precision format, but this will be
covered later on.
Many details of the floating-point standard are out of the scope of this
text, but some of them belong to the basics of the floating-point definition,
as they can cause one of two things to happen: overflow or underflow. Overflow
is the simple case, when a number is too large to fit into the data format
(the same holds for integer operations), whereas underflow exists only in the
floating-point world. It means that the number is too small (the negative
exponent is too big) to fit into the data type. This is very important from
the computational point of view. Why? Mainly because the current FP standard
offers two data types, known as float (single precision, 4 bytes) and double
(double precision, 8 bytes). For medium and large two-dimensional FFTs, the
proper data type can be beneficial, as more data can then fit into the cache,
and this of course will minimize the total number of cache miss events, which
will result in faster execution (as mentioned before, from the processor's
point of view memory is slow). It is not always possible to predict the range
of numbers that will form the input, and there are different demands
concerning the precision of calculations.
Parameter | float | double
Size | 4 bytes (32 bits) | 8 bytes (64 bits)
Range | $\pm 10^{-38}$ to $\pm 3 \cdot 10^{38}$ ($\pm 1.17 \cdot 10^{-38}$ to $\pm 3.40 \cdot 10^{38}$) | $\pm 2 \cdot 10^{-308}$ to $\pm 2 \cdot 10^{308}$ ($\pm 2.23 \cdot 10^{-308}$ to $\pm 1.79 \cdot 10^{308}$)
Internals | 1 bit for the sign, 8 bits for the exponent and 23 bits for the fraction; the implicit 1 added to the fraction gives in fact 24 bits | 1 bit for the sign, 11 bits for the exponent and 52 bits for the fraction; the implicit 1 added to the fraction gives in fact 53 bits
Representation | $(-1)^{S} \times (1 + \text{fraction}) \times 2^{\text{exponent} - \text{bias}}$ (both types)
Exponent | Biased (127) | Biased (1023)
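Both single and double precision formats support rounding schemes and special
events like NaN or inf. These events, as well as zero and denormal numbers
(subnormal [29]), are encoded in special ways, which makes the standard much
more flexible.

[29] Subnormal numbers are non-zero numbers that are smaller than the smallest
normal numbers (numbers close to 0). Subnormal numbers were first implemented
by Intel and then became part of the IEEE standard. On some hardware,
operations on subnormal numbers are not implemented in the hardware and rely
on software solutions (or others), which makes computations much slower than
on normal (normalized) numbers.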
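The overflow and underflow limits of the float type can be checked directly against the constants from the standard C header `<float.h>`. The two helper functions below are hypothetical names chosen for this sketch; they only classify a double value and do not perform any conversion themselves.

```c
#include <float.h>

/* Returns 1 if the magnitude of v is too large to be represented as a
 * finite float (converting it would overflow), 0 otherwise. */
int overflows_float(double v)
{
    return v > FLT_MAX || v < -FLT_MAX;
}

/* Returns 1 if v is non-zero but smaller in magnitude than the smallest
 * normalized float, i.e. it would underflow to a subnormal value or to
 * zero when stored as a float; 0 otherwise. */
int underflows_float(double v)
{
    double magnitude = v < 0.0 ? -v : v;
    return magnitude != 0.0 && magnitude < FLT_MIN;
}
```

A check of this kind on the expected input range is one way to decide whether a transform can safely use float arrays or has to fall back to double.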
As mentioned before, the revision of the current floating-point standard will
involve a quad precision type (128 bits) and one very important instruction
that is currently present in PowerPC and Itanium processors: FMA. Fused
multiply-add (or fused multiply-accumulate) is an instruction that performs
both operations much faster than a multiplication followed by an addition.
$\mathrm{FMA}(a, b, c) = a \cdot b + c$
Whenever possible this instruction should be used, as it greatly simplifies
and speeds up computations on complex numbers, which of course are mandatory
for an FFT implementation (and it can be used almost directly for calculating
the so-called Sande-Gentleman butterfly).
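To show where multiply-add patterns appear, here is a minimal sketch of a decimation-in-frequency (Gentleman-Sande style) radix-2 butterfly in C. It is an illustration only, not the implementation used in this project: the type `cplx` and the function `butterfly_dif` are hypothetical names, and the code uses plain multiplications and additions. Whether a compiler turns the `x * y + z` expressions into real FMA instructions depends on the target CPU and the compiler options.

```c
/* Complex number stored as a pair of doubles. */
typedef struct {
    double re;
    double im;
} cplx;

/* In-place decimation-in-frequency butterfly with twiddle factor w:
 *     a' = a + b
 *     b' = (a - b) * w
 * The two lines computing b' have the multiply-add shape that a fused
 * multiply-add instruction can evaluate in one step. */
void butterfly_dif(cplx *a, cplx *b, cplx w)
{
    double sum_re = a->re + b->re;
    double sum_im = a->im + b->im;
    double diff_re = a->re - b->re;
    double diff_im = a->im - b->im;

    a->re = sum_re;
    a->im = sum_im;
    b->re = diff_re * w.re - diff_im * w.im;
    b->im = diff_re * w.im + diff_im * w.re;
}
```

For example, with a = 1 + 2i, b = 3 + 4i and w = -i the butterfly produces a' = 4 + 6i and b' = -2 + 2i.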
From the design point of view, all FPUs are of course optimized for the
double precision type, so theoretically there should be no difference between
the time needed to compute the same thing on double or float. On the other
hand, there will be a substantial difference in memory access time. A 1 Mpx
picture in the frequency domain will occupy 16 megabytes of RAM for double
(and just half of that for single precision). There is no way for such an
array to fit into cache memory, so everything has to be fetched from the
"slow" memory. Good designs require good compromises: precision vs. speed.
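The 16 megabyte figure follows from storing one complex value (real and imaginary part) per pixel. A small sketch of that arithmetic, with the hypothetical helper name `spectrum_bytes`:

```c
#include <stddef.h>

/* Memory needed for the complex spectrum of a width x height image when
 * each of the two components (real, imaginary) takes `scalar_size`
 * bytes, e.g. sizeof(double) or sizeof(float). */
size_t spectrum_bytes(size_t width, size_t height, size_t scalar_size)
{
    return width * height * 2 * scalar_size;
}
```

For a 1024 x 1024 image this gives 16777216 bytes (16 MB) with 8-byte doubles and 8388608 bytes (8 MB) with 4-byte floats.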
Limitations within the standard are more important in terms of the design, as
they determine which type will be used etc. For large transforms it might be
beneficial to consider the float type, as it occupies half of the space that
double arrays occupy, and this of course will result in faster access.
x87 [30]
In the beginning floating-point math was available in the form of a
coprocessor. In current 64-bit implementations x87 is available to support
execution of legacy 32-bit floating-point code. Of course there are also some
advantages of running x87 code in long mode, and in the case of AMD64 these
are:
- access to the 64-bit address space
- the RIP addressing mode is available.
RIP-relative addressing is a new mode offered by AMD64 processors, which
makes loading of PIC [31] code more efficient, as instructions can reference
data relative to the instruction pointer.
Quick note from gcc manual:
-mfpmath=unit
Where unit can be:
387 (default, will run everywhere)
sse (utilize SSE extensions; needs additional options!)
sse,387 (utilize both at the same time; experimental, should not be used for
production code)
On x86-64 compiler SSE/SSE2 are enabled by default. On i386 compiler 'sse' option
requires also -msse or -msse2 switches and -march=CPU has to be stated.

[30] The name comes from Intel's FP chips, whose names ended with 87.
[31] Position Independent Code.
x87 contains 8 stack registers with a maximum precision of 80 bits.
As long as the sources of the project are available, it should not be a
problem to compile a binary version that will run on a certain piece of
hardware. Problems arise when one is supposed to distribute a binary version
of the software. This of course means that there should be several binary
versions available or, at the cost of performance, the floating-point code
should use only the x87 FPU.
To provide a real example: lame, an mp3 file encoder, was used to encode
files containing random data on two machines: one with an Intel Xeon EM64T
3.4 GHz and the other with an Intel Pentium 4 HT 3.0 GHz (both had 1024 kB of
cache memory). The second one performed a bit better, just because it used the
MMX/SSE/SSE2 extensions.
(Marketing) Extensions
All these names have two purposes: to sell more processors and to give some
overview of the CPU's capabilities [32]. Currently each processor contains so
many extensions that new "marketing" names are advertised only for
revolutionary technologies (like HT, for example). Since almost all CPUs have
these extensions, why bother? Knowledge is power, and power can directly turn
into performance of our code, even without writing a single line in assembly.
The trick is to know how to utilize all the power that comes with the
compiler, for example with gcc.
Most of the extensions that will be presented belong to the group of SIMD
instructions, which simply means Single Instruction, Multiple Data. The main
purpose is to be able to process large amounts of data (applicable in DSP and
graphics), which in practice should more or less resemble what happens in a
vector processor [33]. The outcome is a much higher level of parallelism than
in the case of an ordinary superscalar processor. Of course, SIMD requires
more registers, but they were not always present (to lower the overall cost).
That was the case with the MMX extension (integer arithmetic), which added 8
new registers that were in fact aliases of the existing x87 registers. Thus,
performing floating-point operations at the same time will simply result in a
"register fight". The other difficulty (maybe not a disadvantage) that comes
with SIMD is the fact that data has to be perfectly aligned, and this is
usually troublesome [34].
[32] Sometimes the extensions' names are totally meaningless, like MMX.
[33] A processor that operates on several data items at one time.
[34] To make it even more complicated, on Pentium and Pentium PRO double and
long double have to be aligned to an 8-byte boundary, whereas on P3 they
should be aligned to a 16-byte boundary.
SSE
Streaming SIMD Extensions were a major enhancement compared to MMX, which
offered only integer operations within a limited scope (MMX registers were in
fact FPU registers). SSE added 8 new 128-bit registers (XMM; each one packs 4
floats) that could be operated on independently of the MMX set. In order to
use SSE it has to be explicitly enabled. Of course, as the name states, XMM
registers are SIMD-aware. SSE support was added later to AMD processors
(starting from Athlon XP).
SSE2
At some point Intel enhanced SSE and created SSE2, which was a major step
forward. Below is a summary of the differences:
- reuses the XMM registers, thus from the binary point of view the operating
  vector is the same
- adds double precision floating-point operations
- makes it possible to work with almost any integer type: 8/16/32/64 bits
Support for integer types made it possible to avoid switching to MMX mode, so
floating-point and integer operations can be mixed together.
As of now it is possible to use a set of built-in calls (offered by gcc) and
make use of both SSE and SSE2. It is also possible to generate code for the
SSE/SSE2 extensions automatically (supported by both gcc and icc).
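As an illustration of such built-in calls, the sketch below adds two arrays of doubles two elements at a time using SSE2 intrinsics from `<emmintrin.h>` (`_mm_loadu_pd`, `_mm_add_pd`, `_mm_storeu_pd`). The function name `add_arrays` is hypothetical. The sketch assumes a compiler that defines `__SSE2__` when SSE2 code generation is enabled (gcc does, by default on x86-64 and with -msse2 on i386); when the macro is not defined, only the scalar loop is compiled.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* out[i] = a[i] + b[i] for i in [0, n). With SSE2 available, two
 * doubles are processed per instruction; unaligned loads and stores
 * are used so the arrays need no special alignment. */
void add_arrays(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything without SSE2. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The unaligned variants are chosen here for simplicity; with data aligned to 16 bytes the aligned load and store intrinsics could be used instead.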
SSE3
The newest set of SIMD instructions adds functions that can be used
exclusively for DSP applications. Operating on the register file is also more
flexible: with SSE3 it is possible to access a register horizontally
(SSE/SSE2: only vertically). There is also one function that improves the
efficiency of the pipeline: it makes it possible to convert floating-point
numbers to integers without an explicit change of the rounding mode. SSE3 was
first introduced by Intel; currently it is also present in AMD CPUs (all AMD64
ones) and VIA C7 (some models only).
SSE instructions are the ones that can give a big speed-up for floating-point
and integer oriented calculations. Whenever possible, they should be used.
3DNow!
Originally 3DNow! was meant as an enhancement of MMX (AMD's counterpart to
the MMX technology), as it added support for floating-point calculations,
similar to the SSE instructions that Intel later added to the Pentium 3
processor. 3DNow! also aliased the FPU registers (like MMX), but could pack
only 2 floating-point numbers instead of four (as in SSE). Another very
interesting thing was that 3DNow! supported horizontal operations on a
register file, which were only added to SSE with SSE3.
Currently, on AMD processors that support both SSE/SSE2 and 3DNow!, it should
theoretically be possible to execute SSE and 3DNow! code at the same time. As
mentioned before (notes about gcc), this is very hard to accomplish.
Later on AMD created enhanced 3DNow!, which was supposed to perform tasks
like the SSE extension. This was achieved later, when 3DNow! Professional was
created (starting from Athlon XP).
A similar situation applies to 3DNow!: in both cases (integer and
floating-point math), if possible, it should be used.
Inside the memory hierarchy
It often happens that people become "gigahertz slaves": they measure the
performance of a computer by means of the clock frequency. Usually the higher
the clock frequency, the better the performance, but not always to the extent
people expect. One should not think that there is a linear dependency between
the two. Many things also depend on the most important link, which is the
connection to the main memory (RAM). Since CPUs are much faster than RAM, it
is obvious that even the fastest processor will not perform well if it has to
wait every time for data to arrive from the slow memory. There were many
attempts to solve this problem, but the only solution for minimizing traffic
on the bus is to add an intermediate stage that will prevent the CPU from
occupying the bus and accessing memory directly. This is the purpose of the
cache memory. Its presence in the system is transparent to the programmer
(from French, "caché" means "hidden, good place to hide"). Currently cache
memory is put everywhere the cost of obtaining data is high. The capacity of
cache memory is rather small, as it is very expensive (SRAM technology; access
time around 0.5-5 ns).
Below there is the so-called "memory pyramid", which presents the different
memory hierarchies and defines the differences between them, which are: access
time, capacity and price.
There are also other features that have an impact on the cache's performance,
like organization and write or replacement modes.
Method
Cache is put in between the CPU and memory to minimize access time. This
works because cache exploits two very important phenomena: spatial and
temporal locality. If the CPU wants some data from memory, then it is likely
that it would also like the pieces of data near the referenced location; this
is spatial locality. Temporal locality rests on the fact that once a piece of
data has been referenced, it is likely to be referenced again soon. To use
locality efficiently, each block (or line) of cache contains more data than
was requested. All traffic between the CPU and memory goes through the cache
memory. If an item is present in the cache memory, then we have a cache hit
event; if it is not, then it is a cache miss, which unfortunately causes the
following actions to be taken:
- since the entry was not present in the cache, it has to be fetched from
  memory
- the access procedure has to be started once again from the beginning
A bigger cache or a different organization can significantly decrease the
number of cache misses. The programmer can help as well by following the
so-called "cache friendly" programming rules, which are very strict but can be
applied in many cases:
1. Loops should be small. If the code of a loop takes a large amount of
memory, the loop will not fit into the cache. The amount of code that can be
kept in the cache is really small, around 2-4KB.
2. Calls to functions that reside outside the current area of code should be
performed less frequently than calls to functions that are close to the
current area of code.
3. Memory should be accessed in the same areas. This is of great interest
when designing an object-oriented application: code is cache friendly if it
accesses the fields of a class together, at one time (spatial locality).
The same applies to arrays: when they are small and accessed sequentially,
a performance increase can be noticed.
Regarding point 3 and arrays: there is also another way of storing
two-dimensional arrays, the so-called set of pointers to pointers (in C:
my_type ** variable). Referencing items through such a pointer is as easy as
accessing a standard two-dimensional array, that is: variable[x][y]. It is a
very bad design for time-critical applications, because:
- for an array of size NxM we have N vectors of size M plus N row pointers;
  in total "N*sizeof(my_type *) + N*M*sizeof(my_type)"
- deallocation requires first freeing the N vectors of size M and then the
  array of pointers to the vectors
- the rows of the array are stored in separate chunks, so spatial locality
  will not work as it should
- it is much more difficult to align data in this form (if one wants to use
  SIMD extensions).
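As a contrast to the pointer-to-pointer layout, the following is a minimal
sketch of a single contiguous, row-major allocation. The names matrix,
matrix_alloc, matrix_at and matrix_free are hypothetical and chosen only for
this illustration, and my_type is assumed to be double.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type, standing in for my_type from the text. */
typedef double my_type;

/* One contiguous block holding a rows x cols matrix in row-major order. */
typedef struct {
    size_t rows;
    size_t cols;
    my_type *data;   /* rows * cols elements, a single allocation */
} matrix;

/* Allocate a zero-initialised matrix; returns 0 on success, -1 on failure. */
int matrix_alloc(matrix *m, size_t rows, size_t cols)
{
    assert(m != NULL && rows > 0 && cols > 0);
    m->data = calloc(rows * cols, sizeof *m->data);
    if (m->data == NULL)
        return -1;
    m->rows = rows;
    m->cols = cols;
    return 0;
}

/* Element (x, y) lives at offset x * cols + y, so each row is contiguous
   and the last element of one row is adjacent to the first of the next. */
my_type *matrix_at(matrix *m, size_t x, size_t y)
{
    assert(x < m->rows && y < m->cols);
    return &m->data[x * m->cols + y];
}

/* A single free() releases everything, unlike the pointer-to-pointer form. */
void matrix_free(matrix *m)
{
    free(m->data);
    m->data = NULL;
}
```

With this layout there are no per-row pointers to store or free, and the
whole buffer can be aligned once if SIMD extensions are to be used.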
There are also bad habits which can generate a lot of cache misses and thus
slow down the execution of the program:
1. Large arrays that cannot fit into the cache and that are accessed
sequentially several times will of course force the CPU to access memory
more often. A possible solution is to split the array into smaller parts
and process them one by one, but this cannot always be done.
2. Linked lists in which one piece of data is accessed in each node for
each record; this can increase the miss ratio. It is much better to
process all fields of a record at once and then move to the next node.
3. FIFO queues: the first-in element is usually the oldest entry in memory,
thus according to LRU it will be the first entry to be discarded.
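The first point (splitting a large array into smaller parts) can be sketched
as follows. The chunk size CHUNK and both function names are assumptions made
for this example; a realistic chunk size would have to be tuned to the cache
of the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk size in elements; it should be tuned to the real cache. */
#define CHUNK 1024

/* Naive version: two full sweeps, so a large array is streamed from
   memory twice. */
void two_pass_naive(float *a, size_t n, float scale, float offset)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++) a[i] *= scale;
    for (size_t i = 0; i < n; i++) a[i] += offset;
}

/* Blocked version: both passes run on one chunk while it is still cached. */
void two_pass_blocked(float *a, size_t n, float scale, float offset)
{
    assert(a != NULL);
    for (size_t start = 0; start < n; start += CHUNK) {
        size_t end = (n - start < CHUNK) ? n : start + CHUNK;
        for (size_t i = start; i < end; i++) a[i] *= scale;
        for (size_t i = start; i < end; i++) a[i] += offset;
    }
}
```

Both functions compute the same result; only the order in which memory is
touched differs.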
The first point is the most important one. In our FFT problem there is no way
to avoid using large amounts of data. There is, however, another solution to
this problem, which was mentioned in the FFT algorithm section: row- or
column-major formats. The whole matrix is packed into a one-dimensional
vector, which implies sequential access and makes better use of both kinds
of locality.
There is still one more problem: what about writing to memory? In general
there are two strategies:
- write-back
- write-through
Write-through tries to keep the cache and memory consistent and updates the
block in both places. With write-through, memory write operations should be
collected in a write buffer, and the data should be written to memory when
the buffer is full or other conditions are fulfilled. Write-back, on the
contrary, updates only the block in the cache, which is written to the main
memory later, when the block is replaced.
The write-back scheme is harder to implement than write-through, but it can
speed up write operations, especially when the CPU generates them faster
than the memory can process them.
Processors deliver the best performance for highly complicated tasks, but
they are not optimized for dealing with huge bandwidths or streams of data.
Nowadays it should not be possible to buy a PC with a CPU that has no cache
memory. Such CPUs would have very attractive prices, but in turn would offer
really poor performance.
Sizes, levels, organization
Due to the high price of SRAM technology, cache memories are usually small.
Some current CPUs (especially the Intel Celeron series) are shipped with much
smaller cache memories, which of course has a direct impact on the overall
performance of a computer. Nowadays the size is usually between 256kB and 1MB
(most cases), but there are caches as small as 128kB (Intel Celeron, P4
based) and as large as 2MB (some laptops). In many cases cache memories are
placed on the die together with the CPU and are organized into levels
(usually 2). The second-level (L2, or other higher-level) cache is used every
time a cache miss is encountered in the first-level cache (L1, or the
corresponding lower level). The miss penalty is then equal only to the access
time of the L2 (or other level) cache, which is of course much smaller than
the DRAM access time.
Too many cache levels can sometimes increase latency, as was the case with
the Itanium processor. Its third-level (L3) cache had such a high latency
that there was no big difference between accessing main memory and the
third-level cache (L2 was just 256KB and L3 was 9MB).
Usually the L1 cache is split into two parts: one caches data and the other
is dedicated to instruction caching (Intel calls it the "micro op cache",
from the term micro operations). On Intel processors the L1 cache is usually
from 20 to 30KB (this number is further split into data and instruction
caches; the proportions between them differ), whereas on most AMD processors
(Athlon, Duron, Opteron series) the L1 cache is rather big: 128KB (64KB for
data and 64KB for the instruction cache). For dual-core CPUs the 128KB is per
core, and both L1 caches are further connected to one L2 cache.
Apart from the cache size, there is one more thing that can influence the
performance of the cache subsystem: organization. A cache memory is said to
be n-way set associative if a block can be mapped to n different places. The
case in which a given memory location can be placed in only one place of the
cache is called a direct-mapped cache. The opposite situation, in which a
memory location can be placed anywhere in the cache, is a fully associative
cache. Usually the best solution is a compromise, which in this case is some
kind of n-way set associative structure. The more flexible the placement of
blocks, the more complicated the search becomes. For example, for a 2-way set
associative cache each memory location has to be checked in two different
places, for a 4-way cache in 4 places, and so on. The best result can be
achieved only if an n-way set associative cache has n comparators that can
check for the referenced memory location simultaneously. This means more
hardware, and more hardware usually implies higher latency.
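How an address is mapped to a set can be sketched with a little arithmetic.
The structure cache_geometry and the numbers used with it are illustrative
assumptions, not a description of any particular processor.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative cache geometry (assumed values, not a real CPU). */
typedef struct {
    uint32_t line_size;   /* bytes per block (line) */
    uint32_t num_sets;    /* number of sets */
    uint32_t ways;        /* the n in "n-way set associative" */
} cache_geometry;

/* Set in which the block containing addr must be stored. The block may
   occupy any of the `ways` lines of that set. */
uint32_t cache_set_index(const cache_geometry *g, uint64_t addr)
{
    assert(g->line_size > 0 && g->num_sets > 0);
    return (uint32_t)((addr / g->line_size) % g->num_sets);
}

/* Tag that has to be compared against the n candidate lines of the set. */
uint64_t cache_tag(const cache_geometry *g, uint64_t addr)
{
    assert(g->line_size > 0 && g->num_sets > 0);
    return (addr / g->line_size) / g->num_sets;
}
```

In this model a direct-mapped cache is the special case ways = 1, and a
fully associative cache is the special case num_sets = 1.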
One question still remains open: which block should be replaced on a cache
miss? It could be a random block, but that might be the wrong choice; a cache
is usually small enough to implement the LRU (least recently used) algorithm,
which replaces the least recently used block on a miss. A random mechanism
might be interesting for a TLB, which is usually fully associative, but in
any case the newest Intel and AMD processors use either pseudo-LRU
replacement or LRU combined with some round-robin mechanism.
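A minimal software model of true LRU replacement within one set is sketched
below. The associativity WAYS and the names lru_set, lru_init and lru_access
are assumptions made for the illustration; real hardware usually uses cheaper
pseudo-LRU approximations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 4   /* assumed associativity of the modelled set */

typedef struct {
    uint64_t tag[WAYS];
    unsigned long last_used[WAYS];  /* 0 means the way is still empty */
    unsigned long clock;            /* incremented on every access */
} lru_set;

/* Mark every way of the set as empty. */
void lru_init(lru_set *s)
{
    assert(s != NULL);
    for (int w = 0; w < WAYS; w++) {
        s->tag[w] = 0;
        s->last_used[w] = 0;
    }
    s->clock = 0;
}

/* Access a tag; returns 1 on a hit, 0 on a miss (after installing the tag
   in the least recently used way). */
int lru_access(lru_set *s, uint64_t tag)
{
    int victim = 0;
    assert(s != NULL);
    s->clock++;
    for (int w = 0; w < WAYS; w++) {
        if (s->last_used[w] != 0 && s->tag[w] == tag) {
            s->last_used[w] = s->clock;   /* refresh the age on a hit */
            return 1;
        }
        if (s->last_used[w] < s->last_used[victim])
            victim = w;                   /* oldest (or empty) way so far */
    }
    s->tag[victim] = tag;
    s->last_used[victim] = s->clock;
    return 0;
}
```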
For a machine that is supposed to deal with big bandwidths, and this is
definitely our case, cache memory should be as big as possible. A CPU that
has less than 1MB of cache should not be considered at all. 1MB or 2MB are
the values that should offer the best performance for a decent price. There
are examples of big multilevel caches that unfortunately did not perform as
well as they should. This is the case of Itanium processors, which have 256KB
of L2 cache and 9MB (!) of L3 cache, but the latency introduced by the third
level is so high that it makes the difference in access time between cache
and memory very small.
One bus, different banks
There is another technique that can speed up memory fetch operations, called
memory interleaving. Instead of having one big memory we have several memory
banks. Of course, it is not enough just to split the memory (after splitting
it could still be used as segmented memory). Let us say that the memory has
size N and there are four banks available. Memory locations 1, 2, 3, 4, 5, 6
will belong to the bank given by addr modulo 4, thus:
Address: 1 --> 1 mod 4 = bank 1
Address: 2 --> 2 mod 4 = bank 2
Address: 3 --> 3 mod 4 = bank 3
Address: 4 --> 4 mod 4 = bank 0
Address: 5 --> 5 mod 4 = bank 1
Address: 6 --> 6 mod 4 = bank 2
This enables the CPU to issue more commands than a single memory bank can
process at a time. It is another example of how throughput can be increased
by implementing some kind of pseudo-parallel operation. Memory interleaving
can be viewed as similar to pipelining: a single operation still needs a
certain amount of time to finish.
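The address-to-bank mapping used in the example above can be written down
directly. The function names bank_of and offset_in_bank are hypothetical; the
number of banks is the four assumed in the text.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BANKS 4   /* four banks, as in the example in the text */

/* Consecutive addresses rotate through the banks: addr mod NUM_BANKS. */
unsigned bank_of(size_t addr)
{
    return (unsigned)(addr % NUM_BANKS);
}

/* Position of the address inside its bank. */
size_t offset_in_bank(size_t addr)
{
    return addr / NUM_BANKS;
}
```

Because neighbouring addresses fall into different banks, a sequential read
can keep all banks busy at the same time.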
Cache memory minimizes traffic on the bus between the CPU and memory, but of
course the best results are achieved with memories that have the highest
clock frequency. It is possible to connect memories with a lower frequency
than the FSB to a Northbridge, but this might not be a good idea for a fast
research box. Big and fast memories are expensive, but in most cases they
are worth the investment.
Additionally, for such a purpose one should consider HyperTransport-aware
hardware, which is a very good, "glueless" replacement for the FSB in the
Northbridge.
Alternative solution - GPGPU
The popularity of gaming and the constantly increasing requirements of both
customers and the graphics industry mean that today we can buy a decent
graphics board equipped with a GPU capable of several hundred billion[35]
floating-point operations per second for just a few hundred dollars. This
sounds promising, especially as GPU performance increases much faster than
that of normal CPUs (Moore's law speaks of a ratio equal to 2, whereas for
GPUs it is around 2.4). Currently almost all GPUs are programmable, so it
should be a bit easier to write general-purpose programs for them.
Nevertheless, GPU programming is not easy and the environment is highly
constrained. Despite this it is worth trying, especially as most image
processing algorithms can be implemented on GPUs.
[35] To be clear, this is not a typo: several hundred billion floating-point
operations per second.
Introduction to GPU world
A GPU is not a general-purpose processor, so it was possible to use fewer
control units and more computational logic. As a result there are many
functional units that introduce a very high level of parallelism: not only
are they able to perform the same operation on multiple sets of data (SIMD -
vector), but they can also perform different operations on different data at
the same time (MIMD - vertex). The level of parallelism present on a GPU is
much higher than the one present on contemporary CPUs (SIMD is the highest
level of parallelism achievable on contemporary Intel and AMD CPUs).
Streams
The different architecture of GPUs also puts some requirements on the input
data, which is why GPUs are said to be stream processors. A stream is nothing
more than a set of data of the same type. The larger the stream, the higher
the probability that all functional units can be used in parallel. To
support this type of computing, all GPUs are optimized for high throughput.
As mentioned before, contemporary GPUs have fully programmable functional
units. Before that, the units were specialized in performing certain tasks.
Specialization, not optimization, made it possible to achieve outstanding
results, but of course at the cost of flexibility.
During computations streams are organized into a pipeline. Each
computational stage is performed by a kernel (a functional unit performing
some kind of operation). These pipelines are usually deep and consist of
several stages.
Specialization vs. optimization - datapath
As mentioned before, much better results can be obtained when a computational
unit is specialized in performing certain computations rather than merely
optimized. Optimization should therefore be treated as a subset of
specialization.
The conclusion is quite straightforward: a standard CPU is a purely
sequential machine that is in most cases supposed to deal with one data item
at a time and is mostly optimized for low-latency (not high-throughput)
operations. Of course, CPUs contain mostly general-purpose hardware that is
not specialized in performing certain tasks, as CPUs have to provide a
certain level of flexibility.
GPUs stand on the other side of the barricade and are both specialized and
optimized for parallel execution and maximum throughput. In other words,
task- and data-level parallelism are the key features of the GPU
datapath[36].
[36] Standard CPUs can, under special conditions, sustain two floating-point
operations at a time. This is easily achieved by GPUs, and was further beaten
by the GF 5800 (2.66) and finally by the GF 6800, which offered 6 sustained
floating-point operations at a time.
FFT on GPU
There are ready-made implementations of FFT on GPU which rely on the
available GPGPU SDKs. The biggest advantage of FFT on GPU is that in the
meantime the CPUs can do other work. Nevertheless, it should be stressed that
FFT on GPU can be performed only in single precision (fully compatible with
the IEEE single-precision floating-point standard).
The last part of the report presents the results of measurements, which also
include FFT on GPU. The measurements were performed under Windows with a demo
program written by Kenneth Moreland and Edward Angel. Movies from these tests
are present on the attached CD.
GPGPU is not a brand new topic. It was introduced some time ago, but it is
becoming more and more popular now that it is possible to perform
general-purpose computations on GPUs. The biggest source of information about
general-purpose computations on graphics hardware is the GPGPU project
webpage (http://www.gpgpu.org/). The site also contains slides from GPGPU
conferences, code samples and other useful resources (news, forum etc.).
Due to the high complexity of the problem, no other work was done on GPGPU
and FFT on GPU in this project (except testing ready-made FFT on GPU code).
All code samples and pieces of information come from the GPGPU project's
website.
Part III - "If you torture the data enough, it will
confess."
Ronald Coase
Introduction
This report is a summary of the whole project. It relies heavily on the
knowledge introduced in the two previous parts. The last chapters provide
results from measurements of different algorithms and memory bandwidth tests.
Project formulation
"FFT in Image Processing: measurement,
implementation, parallelization and computer
architecture"
The goal of the project was to speed up 2D FFT calculations in the MoInS
software, which is part of the PACO Project and is also developed at Aalborg
University Copenhagen. The changes should be transparent to programmers and
should not interfere with already existing code, so the task was mainly to
create the bodies of the fft() and invfft() routines. One of the requirements
was also to create portable code which could be run on most platforms,
including x86_64.
Milestone plan
1. Introduction to theoretical aspects of the solution
   1. Mathematical background behind FFT
   2. Software issues (introduction to MoInS)
   3. Hardware issues
2. Analysis of possible solutions: either DIY or reusing available FFT
   libraries
   1. Performance vs. portability - complexity of creating optimized and
      portable code - SIMD memory alignment and other problems
   2. Implementing 2D FFT algorithms; creating algorithm
      tournaments/other solutions
   3. Benchmarking available libraries and focusing on implementing a "neat
      and easy" interface for one of them
3. Measurements
   1. Methods, precision and implementation
   2. Current FFT implementation in MoInS
   3. Test case benchmarks
4. Demo implementations - non-configurable FFT routines
5. Library implementation
   1. Testing
   2. x86_64 version
   3. Measurements
   4. Bug fixing and management
6. Final conclusions and results
The software development process was very similar to the Unified Process,
which is common practice when developing large, object-oriented applications.
It is also suitable for smaller projects, as it allows requirements to change
during the development of the project. It also includes extensive testing
after each milestone. To summarize, each development stage (milestone)
consisted of:
- requirements phase (what should be implemented)
- analysis and design (possible solutions)
- implementation (demo and milestone implementations)
- testing
- deployment
This means that the development process of the project was incremental
(successive parts of code were added to the main project) and iterative (many
iterations within each milestone).
Requirements
The programming environment is constrained, as the MoInS developers tend to
create monolithic software with its own set of libraries, so that external
dependencies are as few as possible.
The other requirements are as follows:
- portable and fast code that utilizes available CPU extensions (SIMD)
- 2D transforms of arbitrary sizes (power-of-two sizes)
- thread support (for SMP environments)
- a run-time interface, so that it is possible to change the library's
  behavior without recompiling the software
- the previously mentioned minimum number of external dependencies
To provide maximum portability and an easy interface for compiling the
library, it should utilize the GNU build tools/process. This should also
simplify the later merging of mfft with the MoInS software.
An important fact about MoInS is that it is supposed to be a real-time
system, so the code should use minimum resources. Since MoInS is also image
processing software, the code has to be optimized in terms of throughput
(filtering in the frequency domain, grouping etc., where each image has to be
represented as a set of complex values).
This project is also one of the parts that should help in achieving the first
MoInS milestone, which is "1Hz".
Problem solution
Analysis of the complexity and time needed to implement fast and portable FFT
routines showed that a more feasible solution would be to reuse one of the
existing libraries. The selection method was rather easy and consisted of two
steps:
- analyzing data available on the Internet (manuals, benchmarks, licenses
  etc.) - preselecting high-performance libraries
- detailed analysis of the preselected libraries: test implementations,
  benchmarks
As will be shown in a moment, both steps can be merged into one, as there is
FFT benchmarking software available that includes several FFT libraries. The
software project is called benchFFT[37] and was created by Matteo Frigo and
Steven G. Johnson (who also both wrote FFTW).
To be precise, only libraries that meet the following criteria can be taken
into account (which does not mean that the others cannot be benchmarked!):
- are licensed under the GPL
- have a multi-threaded API
- are as vendor independent as possible
Below is a list, taken from benchFFT, of additional libraries that have to be
installed by the user (if one wants to benchmark them).
Free software:
FFTW 2.x and/or FFTW 3.x (double and/or single precision): www.fftw.org
GNU Scientific Library: sources.redhat.com/gsl
Hardware/vendor-specific:
Intel Math Kernel Library: www.intel.com/software/products/mkl
Intel IPPS
AMD Core Math Library
Apple VDSP (Macintosh G4 and higher only)
IBM ESSL (AIX only)
sgimath (SGI/MIPS only)
SUNPERF (SPARC only)
DXML/CXML (Alpha only)
Proprietary software:
Numerical Recipes: copy .c and .f files into benchees/nr
NAG (Numerical Algorithms Group) Fortran Library
IMSL (International Mathematical and Statistical Library)
[37] Project's website: http://www.fftw.org/benchfft/
After analysis of the benchmark data, it was clear that FFTW is the library
that should be used for further development. Similar results are presented on
FFTW's website[38]. The next chapter contains the results of running
benchFFT on different machines (and the results that are present on FFTW's
website).
Of the libraries listed above, only FFTW 3.x was benchmarked.
BenchFFT
Method: benchFFT was run on four different computers that were considered to
be the most popular ones among people using MoInS. These were both 32- and
64-bit machines with Pentium and AMD processors. Because a 2D FFT can also be
viewed as performing 1D FFTs over both dimensions, results from the
one-dimensional tests are presented as well. Vectors whose sizes are not
powers of 2 are omitted, as they do not fit the performance requirements
(besides, it is almost always possible to avoid using them).
Please note that on SMP systems only one processor was used at a time!
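The remark that a 2D FFT can be viewed as 1D FFTs over both dimensions can be
sketched as follows. The 1D transform used here is a hard-coded 4-point DFT,
chosen only because its twiddle factors (1, -i, -1, i) need no trigonometry;
the names cplx, transform_1d, dft4 and transform_2d are assumptions made for
this illustration and are unrelated to the FFTW or mfft interfaces.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx;

/* A 1D transform over n samples spaced `stride` apart, done in place. */
typedef void (*transform_1d)(cplx *x, size_t n, size_t stride);

/* 4-point DFT, X[k] = sum_j x[j] * exp(-2*pi*i*j*k/4). */
void dft4(cplx *x, size_t n, size_t stride)
{
    assert(n == 4);
    cplx a = x[0], b = x[stride], c = x[2 * stride], d = x[3 * stride];
    /* X0 = a + b + c + d */
    x[0] = (cplx){ a.re + b.re + c.re + d.re, a.im + b.im + c.im + d.im };
    /* X1 = a - i*b - c + i*d */
    x[stride] = (cplx){ a.re + b.im - c.re - d.im, a.im - b.re - c.im + d.re };
    /* X2 = a - b + c - d */
    x[2 * stride] = (cplx){ a.re - b.re + c.re - d.re, a.im - b.im + c.im - d.im };
    /* X3 = a + i*b - c - i*d */
    x[3 * stride] = (cplx){ a.re - b.im - c.re + d.im, a.im + b.re - c.im - d.re };
}

/* 2D transform of a rows x cols row-major matrix: first every row
   (stride 1), then every column (stride cols). */
void transform_2d(cplx *a, size_t rows, size_t cols, transform_1d f)
{
    for (size_t r = 0; r < rows; r++)
        f(a + r * cols, cols, 1);
    for (size_t c = 0; c < cols; c++)
        f(a + c, rows, cols);
}
```

For a 4x4 matrix filled with ones, all the energy ends up in the DC
coefficient (16 in element [0][0], zero elsewhere). Note that the column pass
walks memory with a stride of cols elements, which is exactly the access
pattern that stresses the cache.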
florian
uname -a: Linux florian-2 2.6.11.4-21.11-bigsmp #1 SMP Thu Feb 2 20:54:26 UTC 2006 i686
i686 i386 GNU/Linux
arch: i686
Intel(R) Xeon(TM) CPU 3.40GHz
cache size : 1024 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm pni monitor ds_cpl est cid cx16
xtpr
bogomips : 6733.82
(the same for processor number 1)
MemTotal: 4148604 kB
gcc --version: gcc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)
gcc -v: Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.5/specs
g++ --version: g++ (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)
g++ -v: Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.5/specs
cc --version: cc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)
cc -v: Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.5/specs
c++ --version: c++ (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux)
c++ -v: Reading specs from /usr/lib/gcc-lib/i586-suse-linux/3.3.5/specs
[38] http://www.fftw.org/
mic
uname -a: Linux mic 2.6.14-gentoo-r5 #1 SMP Fri Jan 20 17:06:25 CET 2006 i686 Intel(R)
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
arch: i686
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
cache size : 1024 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips : 5991.37
(the same for processor 1)
MemTotal: 498368 kB
gcc --version: gcc (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
gcc -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
gcc --version: gcc (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
gcc -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
g77 --version: GNU Fortran (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
g77 -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
g++ --version: g++ (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
g++ -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
cc --version: gcc (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
cc -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
c++ --version: c++ (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
c++ -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
f77 --version: GNU Fortran (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
f77 -v: Reading specs from /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/specs
qtux
uname -a: Linux qtux 2.6.16-gentoo-r7 #2 SMP Fri May 19 18:48:58 CEST 2006 x86_64 Dual
Core AMD Opteron(tm) Processor 270 GNU/Linux
arch: x86_64
Dual Core AMD Opteron(tm) Processor 270, 2GHz
cache size : 1024 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni
lahf_lm cmp_legacy
bogomips : 4024.89
(the same for processors number 1, 2 and 3)
MemTotal: 4039724 kB
gcc --version: gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
gcc -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
g77 --version: GNU Fortran (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
g77 -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
g++ --version: g++ (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
g++ -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
cc --version: gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
cc -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
c++ --version: c++ (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
c++ -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
f77 --version: GNU Fortran (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
f77 -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
volks
uname -a: Linux volks 2.6.16-gentoo-r7 #3 SMP PREEMPT Fri May 19 16:13:27 CEST 2006
x86_64 Dual Core AMD Opteron(tm) Processor 285 GNU/Linux
arch: x86_64
model name : Dual Core AMD Opteron(tm) Processor 285, 2.6GHz
cache size : 1024 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni
lahf_lm cmp_legacy
bogomips : 5232.45
(the same for processors number 1, 2 and 3)
MemTotal: 4039648 kB
gcc --version: gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
gcc -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
g77 --version: GNU Fortran (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
g77 -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
g++ --version: g++ (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
g++ -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
cc --version: gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
cc -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
c++ --version: c++ (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
c++ -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
f77 --version: GNU Fortran (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)
f77 -v: Reading specs from /usr/lib/gcc/x86_64-pc-linux-gnu/3.4.5/specs
Results from FFTW's webpage
There are several different benchmarks available on the benchFFT website. The
graphs are available on-line, so most of them will not be presented here. We
will focus only on results coming from the same machine, where the source
was compiled with two different compilers: gcc (GNU compiler) and icc (Intel
Compiler). Results are given only for 1D and 2D transforms with power-of-two
sizes.
Illustration 5: gcc, P4 2.4GHz.
Illustration 6: icc, P4 2.4GHz.
Illustration 7: gcc, P4 2.4GHz.
Illustration 8: icc, P4 2.4GHz.
There are no big differences between the two compilers, especially when they
deal with high-quality source code. Even so, there should be no doubt that
the Intel Compiler seems to be better at optimizing the code. In any case, if
possible, it is advisable to measure the execution time of binaries generated
by several different compilers.
Project structure
The main goal was to integrate FFTW with MoInS. Since FFTW offers a lot of
possibilities, the other aim was to design a fast and user-friendly interface
for configuring FFTW without recompiling even a single piece of code. The
created library was named mfft (there is also another library named mfft, but
that is just a coincidence), which should be read as "my fft", as its purpose
was to give full control over the FFT calculation process.
The diagram below presents the structure of and the relations between the
different software parts. The dashed circle contains definitions that can be
changed between program runs; solid lines represent objects that have to be
recompiled if modified.
MFFT does not encapsulate FFTW's types. This was done on purpose, as mfft is
only an interface. It should also be relatively easy to modify mfft's source
code in order to invoke a different planner or add new options to the
configuration file.
Since one of the requirements was to have minimum (or even no) external
dependencies, some pieces of software were integrated with mfft:
Software   Purpose                      License   Authors
CCL        Configuration file parsing   GNU/GPL   Stephen F. Booth
libAVL     BST for wisdom manager       GNU/GPL   Ben Pfaff
FFTW was not integrated. That would be a big step backwards: since FFTW is
still being developed, integrating it would substantially block later updates
(and with dynamic linking it is enough to update just the library, without
recompiling the software that depends on it).
In the case of the CCL library it is assumed that no further development is
planned (I tried to contact the author by e-mail; a new configuration feature
was added to CCL just before it was integrated with mfft).
LibAVL does not require any further modifications and can be used "as is".
How does FFTW work
FFTW provides extreme portability while still maintaining extraordinary speed
for all types of transforms, regardless of the rank (number of dimensions)
and the size (it can work even with prime sizes). It is very important to get
acquainted with FFTW's internals and way of working, because this greatly
simplifies understanding mfft's documentation.
Portability and adaptivity - let's plan!
The calculation of FFTs in the FFTW library is divided into two parts:
planning and executing. The execution part only uses the reference returned
by the planner and in fact "calculates" the FFT, but as will be revealed in a
moment, it is not the hardest task.
As mentioned in the previous parts, one good attempt to solve the general DFT
problem would be to run a tournament and benchmark the available algorithms
(and their flavors). For this process to be effective (the more algorithms,
the bigger the chance of a faster solution), the space of available
algorithms has to be big enough (this requires quite a large set of
algorithms). To solve this problem the FFTW developers used a dynamic
programming approach, which strongly relies on how the problem is expressed.
In FFTW's language the available algorithms are called plans. To be able to
use the dynamic programming approach, each problem has to be broken down into
subproblems which are reused several times; these are the so-called
"overlapping subproblems". The expression of the problem is therefore very
important, also because it puts an upper bound on the space of available
plans. In the dynamic programming approach, the space of available plans
should be big enough to contain "good" algorithms, but of course it cannot be
enormous, because then planning would take too much time.
Calculating DFT: overview
As mentioned before, planning is the most complex and time-consuming task.
What in fact happens when a DFT is calculated is:
- the planner is fed with information about the transform size and type
- the problem is split into several sub-problems until these become simple
  enough to use ready-made optimized pieces of code called codelets
- the planner measures the execution time of different solutions (which are
  instantiated as solvers); a solver can return either a pointer to a plan
  or a NULL pointer when it could not create a plan for the given input; when
  plans for sub-problems are needed, solvers call planners recursively
- the fastest algorithm is picked and returned
- the FFT is calculated according to the returned plan
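The steps above can be illustrated with a toy "tournament". This is only a
sketch of the idea (solvers that may decline a problem, timing of the
candidates, picking the fastest); it is not FFTW's planner, and every name in
it (plan_fn, solver_fn, choose_plan and the two scaling routines) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* A "plan" here is just a routine that can process n items. */
typedef void (*plan_fn)(double *data, size_t n);

/* A solver either offers a plan for size n or returns NULL if it cannot. */
typedef plan_fn (*solver_fn)(size_t n);

void scale_simple(double *d, size_t n)
{
    for (size_t i = 0; i < n; i++) d[i] *= 0.5;
}

/* Works only for even n, so its solver declines odd sizes. */
void scale_unrolled(double *d, size_t n)
{
    for (size_t i = 0; i < n; i += 2) { d[i] *= 0.5; d[i + 1] *= 0.5; }
}

plan_fn solver_simple(size_t n) { (void)n; return scale_simple; }
plan_fn solver_unrolled(size_t n) { return (n % 2 == 0) ? scale_unrolled : NULL; }

/* Time every applicable candidate on scratch data and keep the fastest. */
plan_fn choose_plan(const solver_fn *solvers, size_t count,
                    double *scratch, size_t n)
{
    plan_fn best = NULL;
    clock_t best_time = 0;
    assert(scratch != NULL);
    for (size_t s = 0; s < count; s++) {
        plan_fn p = solvers[s](n);
        if (p == NULL)
            continue;                     /* solver cannot handle this size */
        clock_t t0 = clock();
        for (int rep = 0; rep < 100; rep++)
            p(scratch, n);
        clock_t elapsed = clock() - t0;
        if (best == NULL || elapsed < best_time) {
            best = p;
            best_time = elapsed;
        }
    }
    return best;
}
```

The measurement is deliberately crude; the point is only that the choice is
made by timing on the target machine rather than by a fixed rule.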
The current release of FFTW (FFTW3) comes with about 150 ready-made codelets,
which should suit most needs. If one needs codelets that deal with special
cases, there is a special tool shipped with FFTW called genfft, a
special-purpose "FFT compiler" which can generate optimized code from a
high-level mathematical description of the DFT algorithm.
Implementation notes for multi-threaded environments
FFTW provides a multi-threaded API, which is the only optimal solution for SMP
systems. Of course, it might happen that one includes FFTW inside
his/her own parallel program. Then it is very important to remember to
properly synchronize threads, especially around the planning procedure. This is
because planners exchange and share a lot of data. In a multi-threaded
environment, the planning process should be treated as a critical section and
be performed by only one thread at a time. Afterwards, other instances will use
the available planner data (the so-called accumulated wisdom).
There is one additional requirement coming from mfft: the mfft_shutdown()
function has to be called from one thread only, too.
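The critical-section rule above can be sketched as follows. This is only an illustration under stated assumptions: toy_plan and toy_plan_create() are hypothetical stand-ins for a real planner call, and a POSIX mutex is one possible way of serializing it. It is not the code used in mfft.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a plan produced by a non-thread-safe planner. */
typedef struct {
    int rows;
    int cols;
} toy_plan;

/* One global lock guards all planner state shared between threads. */
static pthread_mutex_t plan_lock = PTHREAD_MUTEX_INITIALIZER;
static int planner_calls = 0; /* shared planner data, protected by plan_lock */

/* Create a plan inside a critical section. Returns 0 on success, -1 on bad input. */
int toy_plan_create(toy_plan *out, int rows, int cols)
{
    assert(out != NULL);
    if (rows <= 0 || cols <= 0)
        return -1;

    pthread_mutex_lock(&plan_lock);   /* only one thread plans at a time */
    planner_calls++;
    out->rows = rows;
    out->cols = cols;
    pthread_mutex_unlock(&plan_lock);
    return 0;
}

/* Read the shared counter under the same lock. */
int toy_planner_calls(void)
{
    pthread_mutex_lock(&plan_lock);
    int n = planner_calls;
    pthread_mutex_unlock(&plan_lock);
    return n;
}
```

Each worker thread would call toy_plan_create() before its first transform; executing an already created plan does not need the lock in this sketch.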
Wisdom: using memoization
FFTW heavily uses dynamic programming for finding the fastest algorithm.
Recalculating the same thing every time would be inefficient and totally
unacceptable, thus planners make use of memoization at runtime. It is a
technique used in dynamic programming applications for speeding up
computations. It simply means that results of functions are stored for later
retrieval. In the case of FFTW, the planner does not hold the full result, as
it might be big and thus memory consuming. Instead it stores only a hash
(strong MD5) of the problem and a pointer to the solver that created the plan.
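Memoization itself can be shown with a small sketch. It is a simplification and not FFTW's planner: toy_cost() is a hypothetical recursive cost function, and results are cached in a plain array indexed by problem size instead of a hash of the problem.

```c
#include <assert.h>

#define TOY_MAX_N 4096

static long memo[TOY_MAX_N + 1];                /* cached results */
static unsigned char memo_valid[TOY_MAX_N + 1]; /* 1 if memo[n] is filled in */
static int evaluations = 0;                     /* how many sizes were really computed */

/* Toy cost of a size-n problem: split in two halves, pay n for combining. */
long toy_cost(int n)
{
    assert(n >= 1 && n <= TOY_MAX_N);
    if (memo_valid[n])
        return memo[n];            /* stored result, no recomputation */

    evaluations++;
    long cost = 0;
    if (n > 1) {
        int half = n / 2;
        cost = n + toy_cost(half) + toy_cost(n - half);
    }
    memo[n] = cost;
    memo_valid[n] = 1;
    return cost;
}

/* Number of sub-problems that were actually evaluated so far. */
int toy_cost_evaluations(void)
{
    return evaluations;
}
```

For n = 1024 the recursion touches the sizes 1024, 512, ..., 1, so only 11 sub-problems are evaluated; without the cache the same recursion would evaluate 2047.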
At some point the developers of FFTW introduced the so-called "wisdom", which
is nothing else than a precomputed set of plans that can be exported and
imported (FFTW provides a set of universal calls, which look and act similarly
to system calls). There is also an additional utility called fftw-wisdom, a
wisdom generator which can be used to pre-compute plans. These can be stored
either as system wisdom (available under /etc for all system users) or as
single files.
Wisdom data is cumulative, so it is possible to export/import wisdom for
several different transforms. It should be stressed that wisdom files
should always be generated on the machine that will be used for calculations!
Conclusions
As is visible on the graphs coming from benchFFT, the FFTW library offers fast
execution and extreme portability. The interface with the integrated
multi-threaded API (configurable at compilation time) offers several different
options that can affect the overall result, plus some additional features, as
mentioned in the documentation, "for academic fun". The mfft library is
nothing else than an interface to FFTW (for 2D transforms only), which
makes it possible to configure the "FFT-subsystem" without recompiling
either the library or the software.
MFFT
MFFT is written in plain C and, except for some parts of the code (assembly
snippets), it should be possible to use it on different architectures (as for
now only x86 and x86_64 were tested). The library has a built-in wisdom
manager, which helps organize accumulated wisdom, automatically imports it and
stores human-readable wisdom information in a separate file. It also allows
configuring threads, measuring time with two different methods and much
more.
Development method
To provide maximum portability and simplify the building process of the
library, for development and debugging I used Anjuta DevStudio [39], which is
a GPL C/C++ IDE natively designed to run on the Gnome Window Manager. Apart
from a nice interface, Anjuta also offers a set of project maintenance
functions that can substantially simplify project management tasks. In this
matter Anjuta is fully compatible with GNU standard tools like automake,
autoconf and libtool.
Autoconf is a set of m4 macros that generate the configure shell script, which
in turn configures the software on a given system/platform. It works on most
UNIX-like systems and can also be used for cross compilation (although usually
with some problems [40]).
Automake generates Makefile.in files and requires autoconf as a prerequisite.
The last one, libtool, is a set of commands that simplify building libraries
and can be used in all kinds of makefiles: Makefile, Makefile.in and
Makefile.am.
Configure.in
This file is used to produce the configure script. It contains several macro
definitions that run several checks and try to adapt the source to the system.
For mfft the following commands were added to the generic configure.in
created by Anjuta:
AM_PROG_AS
Enables support for assembly sources. Needed for the rdtsc.S file, which
contains the routine mfft_tim which accesses the TSC cycle counter of IA-32
processors (this could also be solved with in-line assembly).
AC_CHECK_LIB(fftw3, fftw_execute,[LIBS="$LIBS -lfftw3"], AC_MSG_ERROR([Please install fftw3 (http://www.fftw.org/)]))
Checks if the fftw3 library is available by compiling a simple program which
calls the fftw_execute function.
[39] For more information please visit http://www.anjuta.org/
[40] This depends on how exotic the target platform is.
AC_CHECK_LIB(m, sin,[LIBS="$LIBS -lm"])
Checks for the standard math library; further comments are unnecessary.
AC_ARG_ENABLE(silent, [ --enable-silent supress all debugging messages
(advanced)],
[
AC_DEFINE([HAVE_MFFT_SILENT], 1, [Get rid of debugging messages])
])
Compiles mfft without debugging messages. Should be used only by advanced
users that already have the wanted wisdom files. This option works on the
preprocessor level and thus decreases the number of branch instructions,
which can slightly improve overall performance.
AC_ARG_WITH(fftw3_threads, [ --with-fftw3-threads enable multi-threaded API for
fftw3 (pthread only)],
[
AC_CHECK_LIB(fftw3_threads,fftw_init_threads,
[
AC_DEFINE([HAVE_FFTW3_THREADS],1,[This means that
pthread is present])
LIBS="$LIBS -lfftw3_threads -lpthread"
],
[
AC_MSG_ERROR([fftw3 was not compiled with --enable-
threads. You can't use multi-threaded API.])
]
)
]
)
If fftw3 was compiled with thread support, then mfft should be compiled with
it, too. Otherwise, the library will have a hard-coded number of threads and
will not react to changes in the configuration file.
The entries presented above were created by hand.
Summary of custom compilation options for mfft:
--with-fftw3-threads
Enables the multi-threaded API in mfft. To use it, fftw3 has to be compiled
with the --enable-threads option.
--enable-silent
Suppresses all debugging messages. The library will not print anything to
stdout.
Makefile.am
The first version of this file was created by Anjuta. It is used by automake
to generate Makefile.in files. It contains information about subdirectories in
the project, documentation files and similar.
Each subdirectory in the project (in this case src/ and include/) contains its
own Makefile.am which is further processed by automake to generate
Makefile.in files. Below are the contents of the Makefile.am from the src/
directory (comments in bold; -- example):
-- defines include directory
INCLUDES = -I../include
-- CFLAGS C Compiler FLAGS
AM_CFLAGS =\
-Wall\
-g
-- name of the target object
lib_LTLIBRARIES = libmfft.la
-- list of source files
libmfft_la_SOURCES = \
mfft.c\
rdtsc.S\
ccl_get.c\
ccl_iterate.c\
ccl_parse.c\
ccl_release.c\
ccl_reset.c\
bst.c
-- linker flags
libmfft_la_LDFLAGS =
-- external libraries that should be linked with target object; this entry is empty
-- as $LIBS variable is setup during ./configure process (when particular library is
-- detected, then proper information for the linker is appended to the $LIBS variable)
libmfft_la_LIBADD =
Iterations and milestones
Before mfft was released as it is, there were several other intermediate
patches and small applications:
- first demo implementation calculating a 1D FFT and measuring the time of
  execution (FFTW)
- second demo calculating a 2D FFT; input was read from a file; this demo was
  further used as a foundation of the mfftest utility (reads data from a file,
  calculates forward and inverse transform, uses of course mfft) (FFTW)
- third demo implementation for MoInS; available in two versions (FFTW)
- mfft
There is also one more piece of software that uses mfft: FFTtoolbox. It was
created just for academic purposes, to explore details of the Fourier Transform
and filtering in the frequency domain. The program can:
- write an image containing the magnitude spectrum of the input image
- perform simple filtering on the input image; in this mode a filter image has
  to be specified as well (it is assumed to be in the frequency domain;
  unfortunately, this puts a certain limitation on the filter, which has only
  real values)
- adjust the brightness of output images in several different ways
Source code documentation
The include file contains all prototypes of functions and is well documented.
Code is documented only in places where it was absolutely necessary. Each mfft
release also contains other files, and these are:
- AUTHORS (information about authors)
- COPYING (GPL2 License text)
- ChangeLog (changes among versions)
- INSTALL (general install instructions)
- NEWS (unmaintained)
- TODO (list of things to be done)
- README (detailed instructions about installation procedure)
AUTHORS, README, TODO and ChangeLog are maintained by the author of mfft.
Below only ChangeLog is presented (the other files are too big, please refer to
the source code available on the CD):
0.2.4
* Fixed transform plans for non-symmetric images.
* x86_64 version available (excludes tsc ASM call)
0.2.3
* Fixed trailing slash in wisdom_directory (you can have a trailing slash now).
* Initialization debug information depends on MFFT_DEBUG environmental variable.
If set, then init debug is active. After the library is configured configuration
directives take over.
0.2.2 and lower
autoconf fixes
The program was developed to be as compliant as possible with the GNU
Coding Standards.
Each distribution also contains a well documented sample configuration file
which will be covered in detail in the next chapters.
Idea of mfft and key features
The idea of mfft came after implementing several demo applications. It was a
mixture of my own observations and feedback received from demo testers.
The so-called "wish list" is presented below, and should be considered as the
starting point of mfft development:
- easy wisdom generation and maintenance
- configurable multi-threaded interface
- built-in precise time measurement
- configuration file
All these features are currently present in mfft. Generally speaking, most of
them were merged together and now form a monolithic configuration interface.
Wisdom manager
The wisdom manager is responsible for exporting and importing wisdom files. Its
key feature is the possibility of maintaining a list of the wisdoms available
within the wisdom file. These lists have the .info extension and are in human
readable form. Mfft parses the .info file during the configuration phase and,
if wisdom generation is enabled, can inform the user for example that wisdom
for the specified transform is not present, and thus it might take some time
before the plan is generated (when running in exhaustive planning modes). This
might be a useful piece of information, as during planning the application
appears to be frozen.
The wisdom manager also saves other useful pieces of information about wisdoms
like: number of threads, type (forward or inverse) and level (method of
planning). The wisdom manager configures itself only once, in the beginning,
when the mfft_config function is called. During the program run it maintains a
set of available wisdoms, kept in the Berkeley BST implementation (for fast,
O(log n) search).
-- Contents of a sample wisdom.info file
1024 1024 32 2 -1
1024 1024 32 2 1
1024 1024 64 2 -1
1024 1024 64 2 1
The behavior of the wisdom manager can be controlled from the configuration file.
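A record of the sample wisdom.info file above could be read as sketched below. The field order (two sizes, level, threads, type) is inferred from the debug output of the wisdom manager; the names wisdom_entry, wisdom_entry_parse and wisdom_entry_cmp are hypothetical, and which of the two sizes is columns or rows is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* One line of a wisdom.info file (hypothetical layout, see lead-in). */
typedef struct {
    int cols;     /* first size field (assumed to be columns) */
    int rows;     /* second size field (assumed to be rows) */
    int level;    /* planning level, e.g. 32 or 64 */
    int threads;  /* number of threads the wisdom was generated for */
    int type;     /* -1 or 1: transform direction */
} wisdom_entry;

/* Parse "cols rows level threads type". Returns 0 on success, -1 otherwise. */
int wisdom_entry_parse(const char *line, wisdom_entry *e)
{
    assert(line != NULL && e != NULL);
    int n = sscanf(line, "%d %d %d %d %d",
                   &e->cols, &e->rows, &e->level, &e->threads, &e->type);
    return n == 5 ? 0 : -1;
}

/* Total ordering over entries, usable as the comparison key of a search tree. */
int wisdom_entry_cmp(const wisdom_entry *a, const wisdom_entry *b)
{
    const int ka[5] = { a->cols, a->rows, a->level, a->threads, a->type };
    const int kb[5] = { b->cols, b->rows, b->level, b->threads, b->type };
    for (int i = 0; i < 5; i++) {
        if (ka[i] != kb[i])
            return ka[i] < kb[i] ? -1 : 1;
    }
    return 0;
}
```

With such a comparison function a balanced tree finds a matching wisdom in a logarithmic number of steps.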
Time engines
The 32-bit version of mfft comes with two time engines: one uses an available
system call (and thus is fully portable among other platforms), whereas the
second one offers higher precision but is limited to the x86 architecture.
Gtod, or in other words gettimeofday, is a time engine that uses the
gettimeofday system call to measure the time of execution of the FFTW execute
call. This is the default time engine in mfft because, contrary to the tsc time
engine, it is SMP-safe, which means that it can be freely used on SMP
systems and the measured time will always be correct; details are discussed
in the tsc part below. On the other hand, gettimeofday is said to be of low
accuracy, because it is a system call, which adds overhead and can therefore
result in inaccurate measurements.
gettimeofday() is portable.
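The gtod idea can be sketched as follows. This is a minimal sketch using the standard gettimeofday() call; the helper names gtod_elapsed_ms and gtod_time_ms are hypothetical and are not the functions found in mfft.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock time between two gettimeofday() samples, in milliseconds. */
double gtod_elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (end->tv_sec - start->tv_sec) * 1000.0
         + (end->tv_usec - start->tv_usec) / 1000.0;
}

/* Time one call of fn(arg); the two system calls are part of the measured overhead. */
double gtod_time_ms(void (*fn)(void *), void *arg)
{
    struct timeval t0, t1;
    assert(fn != NULL);
    gettimeofday(&t0, NULL);
    fn(arg);
    gettimeofday(&t1, NULL);
    return gtod_elapsed_ms(&t0, &t1);
}
```

Because both samples come from the system clock and not from a per-CPU register, the result does not depend on which processor the task runs on.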
Tsc, or in other words rdtsc, is a mnemonic from the x86 assembly language
which means ReaD Time Stamp Counter. It returns the registers EDX:EAX, which
contain the number of ticks since processor reset. It is a very precise method
of time measurement, but can be used only on x86 platforms. Another
disadvantage of the tsc engine is that it is not SMP-safe. To measure the time,
the function has to be called twice. There is no guarantee that both tsc values
will come from exactly the same processor, because one never knows if the task
was switched to another processor.
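For illustration, the counter can also be read with in-line assembly instead of a separate .S file. The sketch below assumes GCC or Clang; read_tsc and tsc_ticks_to_ms are hypothetical names, and returning 0 on non-x86 targets is only a placeholder that keeps the sketch compilable there.

```c
#include <assert.h>
#include <stdint.h>

/* Read the time stamp counter (EDX:EAX) on x86; returns 0 elsewhere. */
uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0; /* placeholder: no TSC on this architecture */
#endif
}

/* Convert a tick difference to milliseconds for a CPU clock given in MHz. */
double tsc_ticks_to_ms(uint64_t ticks, unsigned cpu_mhz)
{
    assert(cpu_mhz > 0);
    /* 1 MHz = 1000 ticks per millisecond */
    return (double)ticks / ((double)cpu_mhz * 1000.0);
}
```

A measurement takes two reads, one before and one after the timed code, and converts the difference; on an SMP system both reads should come from the same CPU for the difference to be meaningful.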
It is very important to know how the timer functions are implemented,
because this knowledge usually tells us more about the accuracy of our
measurements. Generally speaking, in the case of gettimeofday() the overhead
is rather large, because it uses a system call to get access to the timer.
This overhead can be compensated either by measuring large parts of code or by
repeating the experiment several times (the measured time will then be smaller
than under normal conditions, as running the same part of code repeatedly
results in fewer cache misses). As mentioned before, gettimeofday() is
SMP-safe, so we can be sure that the obtained result is correct. This might
not be the case for tsc, although there are lots of other tricks that can be
used. The tsc method is suitable for measuring short sequences of code
(executing faster than 30ms), but in this case we are on our own, because we
never know if the task was switched to another CPU on an SMP system (thus, it
is very important to have a proper scheduling policy!).
The default gtod time engine for mfft is a compromise between portability and
accuracy.
Configuration file and runtime settings
The library checks the environment variable MFFT_CONFIG, which should
contain the path to the configuration file. No command line arguments, no
hard-coded values: everything is dynamic. In one of the versions, mfft was
expanded and now also checks another environment variable: MFFT_DEBUG. If this
variable is set, then mfft prints initialization information on the screen
(before the configuration file is opened and parsed).
The library configures itself only once, in the beginning (function
mfft_config). Then it parses the configuration file. All configuration values
are kept in a structure which is used by almost every function within the
library. If the configuration file is not set, there are hard-coded default
values to prevent the library from crashing. Configuration management functions
were designed to be as simple as possible and thus can be easily expanded.
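Looking up a configuration path in the environment, with a fallback when the variable is not set, can be sketched like this. The function name config_path_from_env is hypothetical and the fallback handling is an assumption for illustration; only the variable name MFFT_CONFIG comes from the text.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return the value of the environment variable var_name, or fallback when
 * the variable is unset or empty. The returned pointer is not owned by the caller.
 */
const char *config_path_from_env(const char *var_name, const char *fallback)
{
    assert(var_name != NULL);
    const char *value = getenv(var_name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}
```

A caller would use it as config_path_from_env("MFFT_CONFIG", NULL) and fall back to built-in defaults when NULL is returned.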
Default configuration values are set as macro definitions in the file mfft.h
(prefix MFFT_CFG_).
Structure that keeps runtime values:
struct mfft_conf_struct {
char platform[MFFT_MAX];
char wisdom_directory[MFFT_MAX];
char hostname[MFFT_MAX];
unsigned int generate_wisdoms;
unsigned int wisdom_level;
unsigned int fft_sign;
unsigned int enable_debug;
unsigned int enable_time;
unsigned int time_engine;
unsigned int enable_threads;
unsigned int threads_number;
unsigned int cpu_freq;
struct bst_table *mfft_w;
unsigned int bst_created;
unsigned int enable_wisdom_manager;
unsigned int enable_plan_info;
};
To add a new entry it is enough to:
- create a macro definition of the default value
- add an entry to the mfft_conf structure
- add a default value assignment to the function mfft_set_defaults()
- add a parsing entry to the function mfft_parse_entry()
After that, the new entry is ready to use and can be referenced from the mfft
configuration structure (which of course should be initialized).
The configuration file has a 'key = value' structure and supports comments
after '#' signs.
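The 'key = value' format with '#' comments can be parsed along the following lines. This is a sketch of the file format only; parse_config_line is a hypothetical helper and is not the mfft_parse_entry() function of the library.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the trimmed start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Split one line into key and value (each buffer holds cap bytes).
 * Returns 1 for a key/value pair, 0 for a blank or comment-only line,
 * -1 for a malformed line.
 */
int parse_config_line(const char *line, char *key, char *value, size_t cap)
{
    char buf[256];
    assert(line != NULL && key != NULL && value != NULL && cap > 0);
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    char *hash = strchr(buf, '#');   /* everything after '#' is a comment */
    if (hash != NULL)
        *hash = '\0';

    char *eq = strchr(buf, '=');
    if (eq == NULL)
        return *trim(buf) == '\0' ? 0 : -1;

    *eq = '\0';
    char *k = trim(buf);
    char *v = trim(eq + 1);
    if (*k == '\0' || strlen(k) >= cap || strlen(v) >= cap)
        return -1;
    strcpy(key, k);
    strcpy(value, v);
    return 1;
}
```

A parsing entry for a new directive then reduces to comparing key against the directive name and converting value to the field type.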
Below are the contents of the sample configuration file distributed with mfft:
#
# Configuration file for mfft
# '#' starts a comment
#
# Syntax of the file is as follows:
# key = value
# Please, read it carefully. If you have questions - contact me (see AUTHORS in mfft).
#
# MFFT CONFIG STARTS HERE
#
# Platform you are running now. Please, fill it in.
# Hints:
# - i686 - generic IA32
# - i586 - _why_ do you run it on such slow machine? :)
# - pentium3
# - pentium4
# - nocona - for x86_64 (Intel EM64T in 64bit mode)
# - k8 - for AMD64 in 64bit mode
# You can put whatever you want (even 'crap' if you work on Florian's computer :) - but it should
# correspond to the CPU/ARCH you have (please, no spaces in it). I suggest using values that can be used
# as arguments for '-march=' option in gcc/g++.
platform = pentium4
# Where do you keep your wisdom files?
# Remember, that you should be able to read and write inside this directory!
# Default: your $HOME
wisdom_directory = /nfs/staff/mic/Projects/wisdom
# Should I generate necessary wisdom files?
# This will take some time, but only once. Afterwards the engine
# will use built-in wisdom manager to load them.
# Default: no
# If you say no, then FFTs will be calculated with level = 1.
# You can always generate wisdoms - they are internally accumulated, so that once you generate
# a wisdom it will not be forgotten (if you delete both files, then it will :).
# Remember - generation involves I/O operations and additional checks, system calls and so on -
# it might have impact on the overall performance of mfft.
generate_wisdoms = no # yes / no
# Wisdom manager is a tool that is responsible for managing information
# about wisdoms you have in your wisdom file. It keeps the data in human readable form
# in the file with .info extension. You can have dozens of wisdoms if you want -
# wisdom managaer uses O(log n) search algorithm (BST).
# If you don't need it - disable it.
# If you enabled 'generate_wisdoms', then wisdom manager will be enabled, too.
enable_wisdom_manager = yes
# If you want to run immediately - no problem - but FFT calculation
# will take more time.
# 64 - heuristic (fast plan, slow execution)
# 32 - patient (~2-5 mins. for plan, fast execution)
# 8 - exhaustive (I got 30 mins for plan, very fast execution)
wisdom_level = 8
# Exponent sign when calculating the transform (for inverse (-1*fft_sign) is assumed)
# Default: 1
fft_sign = 1
# Show debug information (number of mul/div/add).
# Notice: this will slow down execution, it's just for fun.
# Default: no
# Does not depend on 'enable_debug'.
enable_plan_info = yes
# Enable debugging. When it's set to 'yes' then mfft prints a lot to stdout.
# It might have huge impact if your graphics card/drivers are not friends with
# 2D acceleration. Generally speaking - it's useful to see it, because you get
# a general idea about internal processes, but then it's completely useless.
enable_debug = no
#
# Enable time measurements.
# This enables time measurements for mfft. It will give you exactly the time spent
# on calculating FFT. Does not depend on 'enable_debug'.
enable_time = yes
# Time engine: you can select one of available engines, they are listed below:
# - tsc
# - gtod
# Default: gtod (compatibility and SMP safe)
# If you use tsc on SMP system:
# it might happen that you will receive bogus results. The idea of tsc engine is that
# it uses rdtsc command to access TSC on CPU. If you have more CPUs... and if your task
# is rescheduled to another CPU, then you might read TSC from a different CPU, which of course
# makes no sense.
# 'tsc' can be considered as high precision time engine.
# 'gtod' is safe, but involves some overhead (it's a system call)
time_engine = gtod
# Enable threads (library has to be linked against fftw3_threads and pthread)
# Default: no
# Remember: plans depend on this feature (please read description below).
enable_threads = yes
# Number of threads (wisdom files depend on it)
# If you enabled threads, then it should be equal to number of CPUs you have.
# Remember, that plans (wisdoms) are generated for a given number of threads.
# For example, you have a wisdom file generated for 2 threads, then you change
# number of threads to 4 - and of course - you need a plan for 4 threads!
threads_number = 2
The interface is clean and flexible. Some of the directives depend on
compilation options, like enable_threads and threads_number: these options
have effect if and only if the library was compiled with the
--with-fftw3-threads option.
The configuration file is well documented and presents all features available
in mfft, thus further comments are needless.
In action: how to use the library
The implementation is similar to the steps described in the fftw3 manual. The
standard procedure is expanded with two additional actions that are mfft
specific: setup and shutdown of the library. Below are details of the implementation:
Used types:
fftw_complex *i, *o;
fftw_plan plan;
struct mfft_conf_struct cnf;
Structure of the body:
i = (fftw_complex *) fftw_malloc(N*sizeof(fftw_complex));
o = (fftw_complex *) fftw_malloc(N*sizeof(fftw_complex));
mfft_config(&cnf); // configure the library
plan = mfft_fft_prepare(&cnf, i,o,1024,1024,1); // prepare plan
[ initialize i with some data ]
mfft_fft_exec(&cnf, plan); // execute plan (calculate FFT)
[ repeat as many times as needed; put new data to i ]
mfft_shutdown(&cnf); // free memory taken by mfft
fftw_free(i); fftw_free(o);
It is very important to remember to use fftw_malloc and fftw_free instead of
the standard malloc and free. The first set of functions aligns the data
properly, so that SIMD extensions can be used.
For more details about the implementation, please read the sources of the
mfftest program and the MoInS patches (available on the CD).
Running mfftest: mfft output
Mfftest is a simple test program for mfft which can be used for
measuring the time of execution for both inverse and forward transforms. It
takes two arguments:
- path to a file containing data (each point equals one line); if we denote
  the value as A, then mfftest creates the complex input as follows:
  (A) + i(A/2)
- optional argument with the number of repetitions
fftw3 returns an unscaled array of values; to avoid possible slowdowns coming
from floating point exceptions, after the forward transform the result is
scaled (by the size of the array). Mfftest is not suitable for making
measurements for real-time systems, as it operates on the same set of data and
thus exploits cache properties. Below is sample output from mfftest:
mfftest: Going to execute [2] pairs.
mfftest: scaling between fwd and inverse transform (by 1048576)
--------[1 of 2]--------
MFFT_TIME(gtod): execution time = 115 ms
MFFT_TIME(gtod): execution time = 99 ms
--------[2 of 2]--------
MFFT_TIME(gtod): execution time = 106 ms
MFFT_TIME(gtod): execution time = 100 ms
----------------
The output above does not contain any debugging messages. Below is the output
with all debugging messages present:
mic@mic ~/Projects/mfftest $ ./mfftest data.txt
hostname: mic
platform: pentium4
cpu: 2993 MHz
wisdom_directory: /nfs/staff/mic/Projects/wisdom
generate_wisdoms: no (0)
wisdom_level: 8
fft_sign: 1
enable_debug: yes (1)
enable_time: yes (1)
time_engine: 1 (gtod)
enable_threads: yes (1)
threads_number: 2
enable_wisdom_manager: yes (1)
bst_created: no
enable_plan_info: yes
MFFT[mfft.c:630] mfft_wisdom_holder: bst_create
MFFT[mfft.c:636] mfft_wisdom_holder: bst_create success
MFFT[mfft.c:639] Looking for wisdom info file...
MFFT[mfft.c:329] mfft_wisdom_name: Wisdom name
MFFT[mfft.c:336] /nfs/staff/mic/Projects/wisdom/mic-pentium4.info
MFFT[mfft.c:656] mfft_wisdom_holder: wisdom info finished.
- size: 1024x1024, level: 32, threads: 2, type: -1
- size: 1024x1024, level: 32, threads: 2, type: 1
- size: 1024x1024, level: 64, threads: 2, type: -1
- size: 1024x1024, level: 64, threads: 2, type: 1
MFFT[mfft.c:403] mfft_fft_prepare: input size
MFFT[mfft.c:404] - cols = 1024
MFFT[mfft.c:405] - rows = 1024
MFFT[mfft.c:408] mfft_fft_prepare: threaded, running plan
MFFT[mfft.c:410] mfft_fft_prepare: initializing threads...
MFFT[mfft.c:412] mfft_fft_prepare: fftw_init_threads() = 1
MFFT[mfft.c:414] mfft_fft_prepare: number of threads = 2
MFFT[mfft.c:288] mfft_wisdom_loaded = 0
MFFT[mfft.c:312] mfft_wisdom_name: Wisdom name
MFFT[mfft.c:319] /nfs/staff/mic/Projects/wisdom/mic-pentium4.fftw
MFFT[mfft.c:296] mfft_load_wisdom: file found, importing
MFFT[mfft.c:302] mfft_load_wisdom: finished
MFFT[mfft.c:429] mfft_fft_prepare: INFO BODY!
MFFT[mfft.c:431] mfft_fft_prepare: WISDOM MANAGER INFO ->
MFFT[mfft.c:437] mfft_fft_prepare: plan not present, adding to current tree
MFFT[mfft.c:439] mfft_fft_prepare: it might take some time to create new wisdom unless
you run on ESTIMATE
MFFT[mfft.c:445] mfft_fft_prepare: plan...
MFFT[mfft.c:370] mfft_plan_info: add (additions), mul (multiplications), fma (fused mul-
add operations)
add = 26214400, mul = 9437184, fma = 524288
flops = 36175872 (hw support*) or 36700160 (w/o support*), * - fused mul-add
operations
MFFT[mfft.c:403] mfft_fft_prepare: input size
MFFT[mfft.c:404] - cols = 1024
MFFT[mfft.c:405] - rows = 1024
MFFT[mfft.c:408] mfft_fft_prepare: threaded, running plan
MFFT[mfft.c:288] mfft_wisdom_loaded = 1
MFFT[mfft.c:429] mfft_fft_prepare: INFO BODY!
MFFT[mfft.c:431] mfft_fft_prepare: WISDOM MANAGER INFO ->
MFFT[mfft.c:437] mfft_fft_prepare: plan not present, adding to current tree
MFFT[mfft.c:439] mfft_fft_prepare: it might take some time to create new wisdom unless
you run on ESTIMATE
MFFT[mfft.c:445] mfft_fft_prepare: plan...
MFFT[mfft.c:370] mfft_plan_info: add (additions), mul (multiplications), fma (fused mul-
add operations)
add = 26214400, mul = 9437184, fma = 524288
flops = 36175872 (hw support*) or 36700160 (w/o support*), * - fused mul-add
operations
mfftest: Reading data
[...]
mfftest: Going to execute [1] pairs.
mfftest: scaling between fwd and inverse transform (by 1048576)
--------[1 of 1]--------
MFFT[mfft.c:479] mfft_exec: go!
MFFT_TIME(gtod): execution time = 114 ms
MFFT[mfft.c:509] mfft_exec: done!
MFFT[mfft.c:479] mfft_exec: go!
MFFT_TIME(gtod): execution time = 100 ms
MFFT[mfft.c:509] mfft_exec: done!
----------------
[...]
MFFT[mfft.c:514] -- MFFT SHUTDOWN --
MFFT[mfft.c:516] Freeing resources... [multi-threaded]
MFFT[mfft.c:521] Freeing resources... [single API]
MFFT[mfft.c:620] mfft_wisdom_holder: checking if bst_table has to be freed...
MFFT[mfft.c:622] mfft_wisdom_holder: bst_destroy
MFFT[mfft.c:526] Bye bye!
At the highest verbosity level (output above) the library is very talkative;
this was done on purpose. Good debugging output can simplify both development
and further usage (and it is still possible to disable it totally, see the
--enable-silent option).
Disabling debugging might be beneficial in terms of speed because:
- nothing is sent to stdout/stderr
- the following macro simply disappears from the resulting binary code
#if HAVE_MFFT_SILENT
#define MFFT(msg);
#define MFFT_INT(msg,val);
#else
#define MFFT(msg) if(mfft->enable_debug) { printf("MFFT[%s:%.3d] %s\n", __FILE__,__LINE__ ,msg); fflush(stdout); }
#define MFFT_INT(msg,val) if(mfft->enable_debug) { printf("MFFT[%s:%.3d] %s = %d\n", __FILE__,__LINE__ ,msg,val); }
#endif
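A common alternative way of writing such a macro wraps the body in do { ... } while (0), so that it behaves like a single statement even inside an un-braced if/else. The sketch below is not the mfft code: SKETCH_SILENT, SKETCH_DEBUG and sketch_is_even are hypothetical names, and the debug flag is passed explicitly instead of being read from a configuration structure.

```c
#include <assert.h>
#include <stdio.h>

#ifdef SKETCH_SILENT
/* Silent build: the macro expands to an empty statement with no output code. */
#define SKETCH_DEBUG(enabled, msg) do { (void)(enabled); (void)(msg); } while (0)
#else
/* Verbose build: print file, line and message when the runtime flag is set. */
#define SKETCH_DEBUG(enabled, msg)                                        \
    do {                                                                  \
        if (enabled) {                                                    \
            printf("MFFT[%s:%.3d] %s\n", __FILE__, __LINE__, (msg));      \
            fflush(stdout);                                               \
        }                                                                 \
    } while (0)
#endif

/* Returns 1 for even n; the macro is used in an un-braced if/else on purpose. */
int sketch_is_even(int n, int debug)
{
    assert(n >= 0);
    if (n % 2 == 0)
        SKETCH_DEBUG(debug, "even");
    else
        SKETCH_DEBUG(debug, "odd");
    return n % 2 == 0;
}
```

Compiling with -DSKETCH_SILENT removes the printf calls at the preprocessor level, which is the effect described for --enable-silent.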
Integration into MoInS
Use of mfft is optional and has to be explicitly enabled at compilation
time. The following macro was added to the MoInS configure.in file:
AC_ARG_ENABLE(mfft,[ --enable-mfft enable MFFT library (FFT)],
[
AC_CHECK_LIB(mfft,mfft_config,[
AC_DEFINE([HAVE_MFFT],1,[MFFT and FFTW are present])
MFFTLIB="-lmfft"]
)])
When mfft is present and enabled, the macro sets the value of the MFFTLIB
variable, which is further referenced in the appropriate Makefile.am file:
libImageFkt_la_LIBADD = $(MFFTLIB)
It also defines the preprocessor variable HAVE_MFFT, so that the patch code is
included.
Below is the patch (only for the forward transform; the two differ only in the
sign) that fills the body of the fft() routine:
#if HAVE_MFFT
fftw_complex *fout,*fin;
double * inputArr,*outputArr;
int _cols, _rows, _loop;
fftw_plan p;
inputArr = (double *) inp.pData();
_cols = inp.cols();
_rows = inp.rows();
_loop = 2*_cols*_rows;
fout = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*_cols*_rows);
fin = (fftw_complex *) fftw_malloc(sizeof(fftw_complex)*_cols*_rows);
mfft_config(&mfft_cnf);
p = mfft_fft_prepare(&mfft_cnf, fin, fout, _cols, _rows,1);
int cnt = 0;
for(int i = 0; i<_loop; i+=2) {
fin[cnt][0] = inputArr[i];
fin[cnt++][1] = inputArr[i+1];
}
mfft_fft_exec(&mfft_cnf, p);
out = inp;
outputArr = (double *)out.pData();
cnt = 0;
for(int i = 0; i<_loop; i+=2) {
outputArr[i] = fout[cnt][0];
outputArr[i+1] = fout[cnt++][1];
}
fftw_free(fout);
fftw_free(fin);
#else
Since for large transforms (like 1024x1024) there is no difference between
in-place and out-of-place calculations, why not try the in-place algorithm?
With the in-place algorithm we might save about 16MB of memory (for a
1024x1024 image) and some overhead, because a memory allocation means a
system call. Another point is that such allocated memory has to be
properly aligned.
One could say "why bother, when it changes so little". That little can be
everything on real-time systems, where everything counts.
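The 16MB figure follows from the size of one complex buffer: a double-precision complex value (fftw_complex) consists of two doubles. A small sketch of the arithmetic, with the hypothetical helper complex_image_bytes and assuming 8-byte doubles:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a rows x cols array of complex values stored as two doubles each. */
size_t complex_image_bytes(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    return rows * cols * 2 * sizeof(double);
}
```

For a 1024x1024 image this gives 1024 * 1024 * 16 bytes = 16 MB, which is the size of the second (output) buffer that the in-place variant does not need.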
Measurements
All source files coming from the different benchmarks, including cache
benchmarks, can be found on the attached CD.
The most important are of course the measurements of the FFT routines, before
and after the project began.
MoInS
All measurements were performed on:
CPU                 Intel(R) Pentium(R) 4 CPU 3.00GHz HT
RAM                 512MB
CPU FLAGS           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                    cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
                    pni monitor ds_cpl cid xtpr
Kernel              2.6.14-gentoo-r5 #1 SMP
Linux distribution  Gentoo; optimized for current hardware
In both cases MoInS was compiled with the following CFLAGS:
-Wno-deprecated -march=pentium4 -O2 -funroll-loops -mfpmath=sse -msse2
-msse -mmmx -ffast-math
Scheduling policy: No Forced Preemption
Source computations available on the CD (odt spreadsheet).
If not stated then out-of-place algorithm
Impl.     Notes                    Forward [avg]  Inverse [avg]  Forward [stdev]  Inverse [stdev]
Original  In-place algorithm       1229ms         1167ms         76.87            14.1
MFFT      Exhaustive + 2 threads   105ms          104ms          0.46             0.82
MFFT      Estimate + 2 threads     360ms          359ms          1.6              0.82
MFFT      Estimate, no threading   367ms          365ms          2.23             3.51
Best configuration - original and MFFT, ratio: 11.70 (fw), 11.22 (inv)
Worst configuration - original and MFFT, ratio: 3.34 (fw), 3.19 (inv)
In-place algorithm
Impl.     Notes                    Forward [avg]  Inverse [avg]  Forward [stdev]  Inverse [stdev]
MFFT      Exhaustive + 2 threads   93ms           92ms           1.28             0.5
MFFT      Estimate + 2 threads     366ms          367ms          4.5              5.44
MFFT      Estimate, no threading   382ms          377ms          7.18             8.5
Best configuration - original and MFFT, ratio: 13.21 (fw), 12.68 (inv)
Worst configuration - original and MFFT, ratio: 3.21 (fw), 3.09 (inv)
Best conf. - In-place vs. out-of-place, MFFT, ratio: 0.88 (fw), 0.88 (inv)
Worst conf. - In-place vs. out-of-place, MFFT, ratio: 1.04 (fw), 1.03 (inv)
Differences between out-of-place and in-place calculations in terms of
'SpatialMonogenicTrafo' measurement point.
Tests performed on 5DTest demo program (Author: Morten Frogh Skov).
Impl.                             In-place              Out-of-place
                                  Left [s]   Right [s]  Left [s]   Right [s]
Original                          15.1742    15.2217    -          -
MFFT (estimate, 2 threads)        8.24145    8.1763     8.00428    7.94074
MFFT (exhaustive, 2 threads)      5.5306/    5.50112    5.3032/    5.3336.
MFFT (estimate, no threading)     8.17425    7.98708    8.04559    8.03491
Original (*)                      17.7245    17.681     --         --
MFFT (exhaustive, 2 threads) (*)  5.67941    5.63868    --         --
(*) - fully preemptible kernel
It is clearly visible that the in-place algorithm performs much better and
offers higher "calculation stability" (the calculation time of successive
transforms does not differ much).
The in-place patch is currently present in MoInS.
Best configurations for in-place algorithms: fully preemptible kernel
Impl.     Notes                    Forward [avg]  Inverse [avg]  Forward [stdev]  Inverse [stdev]
Original  In-place algorithm       1409ms         1483ms         94.11            148.42
MFFT      Exhaustive + 2 threads   89ms           89ms           1.6421           0.81
Time for 'SpatialMonogenicTrafo' points was measured with the tictoc
utility written by Morten Frogh Skov (uses TSC).
MFFT timing - gtod time engine.
MFFTEST
Mfftest program for testing the mfft library. 50 measurements per test. No
distinction between forward and inverse transforms. A high standard deviation
means that there was a substantial difference in the time of execution between
forward and inverse transforms. Unless stated otherwise, tests were performed
for out-of-place algorithms.
Florian
CPU                 Intel(R) Xeon(TM) CPU 3.40GHz HT EM64T, 1024KB cache
RAM                 4GB
CPU FLAGS           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                    cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
                    lm pni monitor ds_cpl est cid cx16 xtpr
Kernel              2.6.11.4-21.11-bigsmp #1 SMP
Linux distribution  SuSE Linux 9.3 (i586) (32-bit)
Model                   Time engine  Average  StDev
Exhaustive + 2 threads  gtod         107ms    13.01
                        tsc          108.5ms  12.9
Estimate + 2 threads    gtod         420ms    11.81
                        tsc          421ms    12
Estimate, no threads    gtod         420ms    11.78
                        tsc          421ms    13.9
mic
CPU                  Intel(R) Pentium(R) 4 CPU 3.00GHz HT, 1024KB cache
RAM                  512MB
CPU FLAGS            fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                     cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
                     pni monitor ds_cpl cid xtpr
Kernel               2.6.14-gentoo-r5 #1 SMP
Linux distribution   Gentoo; optimized for current hardware

Model                    Time engine   Average   StDev
Exhaustive + 2 threads   gtod          103ms     4
                         tsc           103ms     9.3
Estimate + 2 threads     gtod          378ms     9.37
                         tsc           377ms     2
Estimate, no threads     gtod          381ms     2.98
                         tsc           381ms     3.3
qtux
CPU                  Dual Core AMD Opteron(tm) Processor 270 2.0GHz (2x),
                     1024KB cache
RAM                  4GB
CPU FLAGS            fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                     cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
                     mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
Kernel               2.6.16-gentoo-r7 #2 SMP
Linux distribution   Gentoo; optimized for current hardware

Model                    Time engine   Average   StDev
Exhaustive + 4 threads   gtod          48ms      2.92
Estimate + 4 threads     gtod          114ms     38.39
Estimate, no threads     gtod          239ms     73.5
volks
CPU                  Dual Core AMD Opteron(tm) Processor 285 2.6GHz (x2),
                     1024KB cache
RAM                  4GB
CPU FLAGS            fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                     cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx
                     mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
Kernel               2.6.16-gentoo-r7 #3 SMP PREEMPT
Linux distribution   Gentoo; optimized for current hardware

Model                    Time engine   Average   StDev
Exhaustive + 4 threads   gtod          54ms      4.2
Estimate + 4 threads     gtod          110ms     34.62
Estimate + 2 threads     gtod          133ms     32.2
Estimate, no threads     gtod          220ms     62.92
In-place algorithms
Scheduling policy: no forced preemption.

Host    Model                          Average   StDev
volks   MFFT (exhaustive, 4 threads)   49ms      2.69
volks   MFFT (estimate, 4 threads)     111ms     30.31
volks   MFFT (estimate, no threads)    221ms     62.31
mic     MFFT (exhaustive, 2 threads)   92ms      2.10
mic     MFFT (estimate, 2 threads)     372ms     0.85
mic     MFFT (estimate, no threads)    380ms     0.90
In-place, out-of-place with fully preemptible kernel
Tests were performed only on machine "mic".

Model                          Algorithm      Average   StDev
MFFT (exhaustive, 2 threads)   out-of-place   99ms      3.32
                               in-place       89ms      0.62
MFFT (estimate, 2 threads)     out-of-place   392ms     18.21
                               in-place       380ms     3.88
MFFT (estimate, no threads)    out-of-place   401ms     19.15
                               in-place       380ms     0.69

2D FFT
This is the implementation that was introduced in the first part. Please keep
in mind that the transposition is done out-of-place (and is thus inefficient)!
50 measurements per test.

Hostname   Average   StDev
Florian    371ms     1.07
mic        427ms     12.48
qtux       320ms     1.03
volks      280ms     2.03

On each machine, the file 2d.c was compiled with the following CFLAGS:
-O2 -mfpmath=sse -msse -msse2 -funroll-loops -march=<ARCH>
SciMark
Please compare with the results obtained on host natalie (only 128KB of cache).
On each machine SciMark was compiled with the same CFLAGS as 2d.c.
Natalie's configuration:
CPU                  Intel(R) Celeron(R) CPU 2.80GHz, 128KB cache
RAM                  512MB
CPU FLAGS            fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
                     cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
                     cid xtpr
Kernel               2.6.11.10 #7 SMP
Linux distribution   Debian testing/unstable
Florian
Using 2.00 seconds min time per kenel.
Composite Score: 602.36
FFT Mflops: 498.35 (N=1024)
SOR Mflops: 481.74 (100 x 100)
MonteCarlo: Mflops: 185.13
Sparse matmult Mflops: 718.20 (N=1000, nz=5000)
LU Mflops: 1128.37 (M=100, N=100)
mic
Using 2.00 seconds min time per kenel.
Composite Score: 472.40
FFT Mflops: 350.25 (N=1024)
SOR Mflops: 398.13 (100 x 100)
MonteCarlo: Mflops: 114.23
Sparse matmult Mflops: 579.96 (N=1000, nz=5000)
LU Mflops: 919.42 (M=100, N=100)
qtux
Using 2.00 seconds min time per kenel.
Composite Score: 580.05
FFT Mflops: 571.88 (N=1024)
SOR Mflops: 467.71 (100 x 100)
MonteCarlo: Mflops: 242.93
Sparse matmult Mflops: 642.51 (N=1000, nz=5000)
LU Mflops: 975.24 (M=100, N=100)
volks
Using 2.00 seconds min time per kenel.
Composite Score: 756.56
FFT Mflops: 745.40 (N=1024)
SOR Mflops: 609.80 (100 x 100)
MonteCarlo: Mflops: 316.74
Sparse matmult Mflops: 834.85 (N=1000, nz=5000)
LU Mflops: 1276.01 (M=100, N=100)
natalie
Using 2.00 seconds min time per kenel.
Composite Score: 357.17
FFT Mflops: 289.74 (N=1024)
SOR Mflops: 555.32 (100 x 100)
MonteCarlo: Mflops: 98.33
Sparse matmult Mflops: 564.97 (N=1000, nz=5000)
LU Mflops: 277.51 (M=100, N=100)
Cache Benchmark
Benchmarking program that generates 8 memory bandwidth curves. Project
name: "cachebench".
[Figures: cachebench memory bandwidth curves for Florian, mic, qtux and volks]
FFT on GPU
The test program was run on host "volks" under Windows XP Professional x64.
Graphics board: (-- check exact model --)
The sources can be found on the attached CD (they come from the publication
"The FFT on GPU").
The test included 512x512 low-pass filtering, 1024x1024 low-pass filtering
and plain rendering without FFT filtering, plus the original results obtained
by K. Moreland and E. Angel.
The FFT is performed on all four color channels of the "Utah teapot".
Movies from the tests are present on the attached CD.
Host    Vector size   Frame time [avg]   FPS     Notes
volks   512x512       0.0334s            29.94   Filtering
volks   1024x1024     0.1166s            8.57    Filtering
volks   512x512       0.0166s            60.24   No filtering
volks   1024x1024     0.0166s            60.24   No filtering
(*)     1024x1024     2.7027s            0.37    Filtering
(*)     512x512       0.625s             1.6     Filtering
(*) results come from "The FFT on GPU" publication (K. Moreland, E. Angel)
Performance increase in terms of FPS for 512x512: 18.71
Performance increase in terms of FPS for 1024x1024: 23.16
The results in the table show how fast the FFT on a GPU is and how fast GPUs
themselves are evolving (the publication was released in 2003).
Final conclusions
There are clear-cut cases in which it is possible to draw conclusions that will
always be true. Unfortunately, they are a minority. Technology is currently so
advanced that it is practically impossible to predict an outcome without
performing any tests.
It is definitely worth using all rules and tricks that might affect performance:
cache-friendly rules, optimal algorithms and so on. They will always pay back
with better or at least the same performance (in most cases there is no risk).
But there are situations where the programmer should go even closer to the
hardware, because that is the only way of getting the power out of contemporary
units. That is certainly the case for the "FFT problem", where a mixture of SIMD
extensions and proper algorithms can give very good results, unachievable for
standard implementations.
Nowadays we work with machines that have gigabytes of RAM, gigahertz
processors and so on. For real-time programming that additionally involves
operations on large streams of data, it is very important to remember at each
stage of the development process that every single operation will be repeated
billions of times, and thus even a single memory allocation/deallocation, system
call or similar action might affect the overall result (please recall the
situation when the new path for MoInS was created: it got rid of one memory
allocation/deallocation, and the algorithm was changed from out-of-place to
in-place; the outcome was of course faster execution and higher stability of the
FFT routine). Another point is that even 1 millisecond is worth the effort,
because the same millisecond gained in several places can speed up the
execution a lot.
The final project and engineering practice also made it possible to draw other
conclusions, which are listed below:
1. Even when using ready-made libraries one should possess enough
   knowledge, so that other people's work will not be wasted.
   This is especially important when compiling different parts of
   software. Hardware and software knowledge is very important
   and will certainly be beneficial when using a compiler that
   supports different extensions and architectures, and gcc is
   definitely such a compiler. Many software packages try to be as
   universal and portable as possible and thus offer many compile-time
   options, which can adapt the library to the hardware one
   possesses. Without knowledge about the hardware and its
   capabilities, even the best code will not perform well, simply
   because it will not utilize the full power that lies in the hardware.
2. One should not design one's own FFT routines; there are several ready-made
   libraries, which in many cases are the result of years of development and
   will provide much better results. What is very important, in most cases
   portability will not interfere with performance (best example: FFTW).
   This is the most controversial conclusion. Of course, everything
   depends on the project background, but in many cases writing one's own
   routines is not the best idea, because it might end up as "reinventing
   the wheel", with better or worse results (in many cases, with worse).
   "Reuse of code" should be considered whenever possible (if the code has
   satisfactory quality and the license allows it). In my project, the
   selection of FFTW was a direct hit, and in similar cases, whenever
   an FFT is involved, I will always encourage testing it first.
3. Each project has its own hardware and software demands; there is no
   golden mean, but there are hints on how to test which configuration is the
   best one.
   Benchmarks, reviews, results, test cases: everything run on the
   computers that are going to use the software, and of course an
   analysis of what can be found on the Internet. The more
   information you have, the higher the chances of success.
4. Never trust your code, even if it works reliably. You never know...
   Segmentation faults will always appear when you do not expect
   them. In the case of C, always think twice before you write a
   line. You do not have to be an expert in gdb or other debugging tools
   (like Electric Fence); it is enough if you know how to obtain
   a backtrace, which will point you to the place where something
   went wrong. And always remember: pointers do not lie, and
   memory leaks and buffer overflows are very difficult to spot!
5. Do not be afraid of experimenting; you will not waste your time. Failed
   experiments are also important, because afterwards you know that the
   approach fails (I spent more than a week on implementing and testing a
   network client for solving the FFT problem on several machines; it failed,
   but I learned a lot during that time and now I know that this is not a good
   solution, especially not for real-time systems and not for today's
   networking).
   Statistics lie, but you cannot change the numbers. Testing and
   experimenting will help a lot in determining the optimal solution to
   your problem. Besides, it very rarely happens that you can apply
   some changes to existing code without requiring any other
   modifications or data conversions. This, of course, will again affect
   overall performance.
I am aware of the fact that this thesis might not provide all the pieces of
information necessary to solve the "FFT problem" in the best way.
Nevertheless, it can definitely be treated as a good starting point for further
research and even better, faster implementations.
Appendices
A CD is enclosed with the report. It contains:
- source code of all programs, routines and projects (except MoInS)
  mentioned in this paper
- FFT image gallery (in PGM format)
- video footage from the "FFT on a GPU" experiment
- results of tests and benchmarks plus their source code
Acknowledgments
Without these people my work would have been much harder.
Supervisors:
Henning Haugaard, Engineering College of Copenhagen
Norbert Krueger, Aalborg University Copenhagen
Access to x86_64 machines:
Lars Frokjær Knudsen, AAUK IT (ideas and C/C++ hints)
Morten Krogh Skov, AAUK
Access to EM64T:
Florian Pilz, AAUK (introduction to MoInS)
Last, but not least:
Søren Filtenborg Jensen, AAUK IT (hardware)
Mikael Krøyer, AAUK IT (hardware, blackboard and "happy quotes")
Daria Barna (more than support)
Rolf Nordahl (feedback and personal advice)
Stefania Serafin (words of wisdom about experiments)
Tomasz Dudzisz (general support)
Tomasz Sterna (CPU-related consultations)
Janne Anderson (patience)
Employees at Aalborg University Copenhagen, who let me spend the last
semester of my BSc studies in a nice and friendly atmosphere.
References
1. Brigham E. Oran, "The Fast Fourier Transform and its Applications",
   Englewood Cliffs, New Jersey; Prentice Hall. ISBN 0-13-307505-2
2. Eleanor Chu, Alan George, "Inside the FFT Black Box"; CRC Press
   (2000). ISBN 0-8493-0270-6
3. Matteo Frigo, Steven G. Johnson, "The Design and Implementation
   of FFTW3", Invited Paper, The FFTW web page; http://www.fftw.org/
4. D. A. Patterson, J. L. Hennessy, "Computer Organization and Design",
   Morgan Kaufmann (2005). ISBN 1-55860-604-1. [chapters 3-7]
5. W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery,
   "Numerical Recipes in C"; Cambridge University Press. ISBN 0-521-43108-5.
   [chapter 12]
6. Eli Maor, "Trigonometric Delights", Princeton, New Jersey; Princeton
   University Press (1998). ISBN 0-691-05754-0. [chapter 15]
7. Paul Bourke, "2 Dimensional FFT" and "DFT and FFT", Paul
   Bourke's web page; http://astronomy.swin.edu.au/~pbourke/other/
8. Matteo Frigo, Steven G. Johnson, "FFTW Manual for version 3.1",
   The FFTW web page; http://www.fftw.org/
9. Multiple authors, "gcc manual pages" (either 'man gcc' or available
   at http://gcc.gnu.org/)
10. Robert Love, "Linux Kernel Development" (Polish version); SAMS
    Publishing (Polish translation by HELION, 2004). ISBN 83-7361-439-7.
    [chapters 1-4]
11. K. Moreland, E. Angel, "The FFT on a GPU"; Graphics Hardware
    (2003)
12. Multiple authors, "GPU Gems 2", Pharr/Addison Wesley [chapters
    available in the "Full Course Notes" document available on
    http://www.gpgpu.org/]
13. Multiple authors, "Wikipedia, the Free Encyclopedia", Wikipedia
    web page; http://www.wikipedia.org/
14. Jerzy Szabatin, "Podstawy teorii sygnałów", Warsaw (2003); WKŁ.
    ISBN 83-206-1331-0