JANUARY 1993
I. INTRODUCTION
Neural networks are a class of computational architectures composed of interconnected, simple processing nodes with weighted interconnections. The term neural reflects the fact that the initial inspiration for such networks was derived from the observed structure of biological neural processing systems. Feedforward neural networks define a significant subclass within the class of neural network architectures. Feedforward neural networks are usually static networks with a well-defined direction of signal flow and no feedback loops. Applications of feedforward neural networks have been to the task of learning maps from discrete data. Examples of such map learning problems can be found in areas such as speech recognition [15], control and identification of dynamical systems [20], and robot motion control [13], [14], to name a few. In most of these applications, feedforward neural
Manuscript received April 22, 1991; revised April 10, 1992. This work was supported in part by the National Science Foundation's Engineering Research Centers Program NSFD CDR 8803012, by the Air Force Office of Scientific Research under Contract AFOSR-88-0204, and by the Naval Research Laboratory.
Y. C. Pati is with the Department of Electrical Engineering and Systems Research Center, University of Maryland, College Park, MD 20742, and also with the Nanoelectronics Processing Facility, Code 0804, Naval Research Laboratory, Washington, DC 20016.
P. S. Krishnaprasad is with the Department of Electrical Engineering and Systems Research Center, University of Maryland, College Park, MD 20742.
IEEE Log Number 9201252.
Fig. 1.
Fig. 2.
Fig. 3.
$$\Delta w_{ij} = -\varepsilon \frac{\partial E}{\partial w_{ij}} \qquad \text{and} \qquad \Delta I_i = -\varepsilon \frac{\partial E}{\partial I_i}.$$
We will use $w_{ij}$ to denote the weight applied to the output $o_j$ of the $j$th neuron when connecting it to the input of the $i$th neuron. $I_i$ is the bias input to the $i$th neuron.
(Figure labels: input layer, hidden layers, output layer.)
$$A \|f\|^2 \;\le\; \sum_{n} |\langle f, h_n \rangle|^2 \;\le\; B \|f\|^2 \tag{2}$$
for every $f \in \mathcal{H}$.
Remarks:
(a) A frame $\{h_n\}$ with frame bounds $A = B$ is called a tight frame.
(b) Every orthonormal basis is a tight frame with $A = B = 1$.
(c) A tight frame of unit-norm vectors for which $A = B = 1$ is an orthonormal basis.
Given a frame $\{h_n\}$ in the Hilbert space $\mathcal{H}$, with frame bounds $A$ and $B$, we can define the frame operator $S : \mathcal{H} \to \mathcal{H}$ by
$$S f = \sum_{n} \langle f, h_n \rangle\, h_n. \tag{3}$$
Fig. 5.
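In finite dimensions the frame inequality (2) and the frame operator (3) are easy to verify numerically. The sketch below is our own illustration, not part of the paper; it uses the three-vector "Mercedes-Benz" frame in $\mathbb{R}^2$:

```python
import numpy as np

# Toy check of the frame inequality (2) and the frame operator (3), using the
# three-vector "Mercedes-Benz" frame in R^2 (an illustrative choice of ours).
thetas = [np.pi / 2 + 2 * np.pi * k / 3 for k in range(3)]
H = np.array([[np.cos(t), np.sin(t)] for t in thetas])   # rows are the h_n

S = H.T @ H                   # frame operator: S f = sum_n <f, h_n> h_n
A, B = np.linalg.eigvalsh(S)[[0, -1]]                    # frame bounds
print(A, B)                   # both approximately 1.5: a tight frame

f = np.array([0.3, -1.2])
coef_energy = np.sum((H @ f) ** 2)                       # sum_n |<f, h_n>|^2
assert A * (f @ f) - 1e-12 <= coef_energy <= B * (f @ f) + 1e-12
```

Because this frame is tight, $S$ is simply $\tfrac{3}{2} I$, and the frame inequality holds with equality for every $f$.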
$$\omega_c(f) = \frac{1}{\|f\|^2} \int_{[0,\infty)} \omega\, |\hat f(\omega)|^2\, d\omega,$$
where $\bar g$ denotes the complex conjugate of $g$, and the norm $\|\cdot\|$ on $L^2(\mathbb{R})$ is defined by $\|f\|^2 = \langle f, f \rangle$.
Remark: The center of concentration $x_c(f)$ can be thought of as the location parameter (in the sense of statistics) of the density $|f|^2 / \|f\|^2$ on $\mathbb{R}$.
Definition 3.3: The support of a function $f$, denoted supp$(f)$, is the closure of the set $\{x : |f(x)| > 0\}$.
Definition 3.4: Given $f \in L^2(\mathbb{R})$, $f : \mathbb{R} \to \mathbb{R}$, with Fourier transform $\hat f$, and centers of concentration $x_c(f)$ and $\omega_c(\hat f)$, define $P(f; \epsilon)$ to be the set of intervals $[x_0, x_1]$ containing the center $x_c(f)$ for which $\int_{x \in \mathbb{R} \setminus [x_0, x_1]} |f(x)|^2\, dx \le \epsilon\, \|f\|^2$.
Given $f \in \mathcal{H}$, if there exists another sequence of coefficients $\{a_n\}$ (other than the sequence $\{\langle f, S^{-1} h_n \rangle\}$) such that $f = \sum_n a_n h_n$, then the $a_n$'s are related to the coefficients given in (3.1) by the formula
$$\sum_n |a_n|^2 = \sum_n |\langle f, S^{-1} h_n \rangle|^2 + \sum_n |a_n - \langle f, S^{-1} h_n \rangle|^2.$$
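This identity shows that the canonical coefficients $\langle f, S^{-1} h_n \rangle$ have minimal $\ell^2$-norm among all representations of $f$. It can be checked numerically; the small random frame below is our own illustration:

```python
import numpy as np

# Numerical check of the norm identity above for a redundant frame in R^2.
# The frame (three random vectors) is an illustrative choice of ours.
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2))           # rows are frame vectors h_n
S = H.T @ H                               # frame operator
f = np.array([1.0, -2.0])

c = H @ np.linalg.solve(S, f)             # canonical coefficients <f, S^-1 h_n>
assert np.allclose(H.T @ c, f)            # they reconstruct f

# Any other coefficient sequence differs by an element of null(H^T).
null = np.linalg.svd(H.T)[2][-1]          # vector with H^T @ null = 0
a = c + 0.7 * null
assert np.allclose(H.T @ a, f)            # same f, different coefficients

lhs = np.sum(a ** 2)
rhs = np.sum(c ** 2) + np.sum((a - c) ** 2)
assert np.isclose(lhs, rhs)               # the identity holds
```

The key step is that $a - c$ lies in the null space of the synthesis map while $c$ lies in its row space, so the cross term vanishes.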
(1) The epsilon support (or time concentration) of $f$, denoted $\epsilon$-supp$(f, \epsilon)$, is the set $[x_0(f), x_1(f)] \in P(f; \epsilon)$ of minimal length.
$$f = \sum_{m,n} c_{mn}(f)\, \psi_{mn}. \tag{8}$$

$$\int_{\mathbb{R}} \frac{|\hat\psi(\omega)|^2}{|\omega|}\, d\omega < \infty \tag{9}$$

For a function $\psi$ with adequate decay at infinity, (9) is equivalent to the requirement $\int_{\mathbb{R}} \psi(x)\, dx = 0$ (see [6]). Since
²In this case we say that the triplet $(\psi, a, b)$ generates an affine frame for $L^2(\mathbb{R})$.
³Also referred to as the fiducial vector or analyzing waveform.
IV. DILATIONS AND TRANSLATIONS IN SISO NEURAL NETWORKS
$$y_{N+1} = \sum_{j=1}^{N} w_{N+1,j}\, \sigma(w_{j0}\, x - I_j) \tag{10}$$
where we have labeled the input node 0 and the output node $N+1$. It is clear that (10) is of the form in (8) with two key differences: (i) the summation in (10) is finite, and (ii) even if we permit infinitely many hidden layer nodes and let $g_j = \sigma(w_{j0}\, x - I_j)$, the infinite sequence $\{g_j\}$ will not necessarily be a frame. Since it is our intent to stay within the general framework of feedforward neural networks, let us first consider the sigmoidal function, $s(x) = (1 + e^{-x})^{-1}$, shown in Fig. 3, as a possible mother wavelet candidate. Since $s \notin L^2(\mathbb{R})$, it is impossible to construct a frame for $L^2(\mathbb{R})$ using individual translated and dilated sigmoids as frame elements. However, we note that the difference of two translated sigmoids is in $L^2(\mathbb{R})$ for finite translations and
⁴Throughout the rest of this paper we will use the term wavelet transform to mean discrete affine wavelet transform unless otherwise indicated.
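The square-integrability claim above is easy to confirm numerically. In the sketch below, which is our own illustration, the translation offsets $\pm 1$ are an arbitrary choice:

```python
import numpy as np

# Numerical check: the sigmoid s is not square-integrable (its energy grows
# linearly with the window length), while the difference of two translated
# sigmoids has exponentially small tails and so lies in L^2(R).
s = lambda u: 1.0 / (1.0 + np.exp(-u))

x = np.linspace(-200.0, 200.0, 400001)
dx = x[1] - x[0]

# Energy of s over growing windows grows roughly like the half-length L.
for L in (50.0, 100.0):
    mask = np.abs(x) <= L
    print(np.sum(s(x[mask]) ** 2) * dx)   # roughly L, so s is not in L^2

psi = s(x + 1.0) - s(x - 1.0)
total = np.sum(psi ** 2) * dx             # converges: psi is in L^2
tails = np.sum(psi[np.abs(x) > 50.0] ** 2) * dx
print(total, tails)                       # finite energy, negligible tails
```

For large $|x|$ the difference behaves like $e^{-|x|}$ up to a constant, which is why the tail energy is negligible.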
(Figure labels: weights $w$'s, biases $I$'s, weights $w_{N+1}$'s.)
$$\sum_{n=1}^{M} s_{a,b_n}(x) \;-\; \sum_{n=1}^{M} s_{c,d_n}(x) \tag{11}$$
Fig. 7.
TABLE I: TIME-FREQUENCY LOCALIZATION PROPERTIES OF $\psi$.
$$\psi(x) = \tfrac{1}{2}\left[\varphi(x + \beta) - \varphi(x - \beta)\right] \tag{13}$$
(Table entries include: 0.1, 0.1, 0.9420, 0.0, $[-2.15, 2.15]$, $[0.2920, 1.5920]$.)
Remarks:
(a) In this section we have concentrated on wavelets constructed from sigmoids. We would, however, like to point out that nonsigmoidal activation functions are also
⁵Here we used $(p, d, q) = (1, 1, 2)$.
V. SYNTHESIS OF FEEDFORWARD NEURAL NETWORKS USING WAVELETS
(Figure: region of the time-frequency plane, from $x_{\min}$ to $x_{\max}$.)
SYNTHESIS ALGORITHM
Step I: Our first step is to perform a frequency analysis of the training data. In this step we wish to obtain an estimate of the bandwidth $\epsilon$-supp$(|\hat f|^2, \hat\epsilon)$ of $f$ based on the samples of $f$ provided in $\Theta$. A number of techniques can be considered for performing this estimate. We will not elaborate on such techniques here. Let $W_{\min}$ be our estimate of $\omega_{\min}$, and $W_{\max}$ be our estimate of $\omega_{\max}$.
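The paper deliberately leaves the estimation technique open. One simple possibility, sketched below under the added assumption of uniformly spaced samples, is to take the smallest band of the periodogram containing most of the energy; the function and parameter names here are our own, not the paper's:

```python
import numpy as np

# Sketch of Step I for uniformly spaced samples: estimate [W_min, W_max] as
# the smallest band containing a fraction (1 - eps) of the periodogram energy.
def estimate_band(samples, dt, eps=0.05):
    spectrum = np.abs(np.fft.rfft(samples - np.mean(samples))) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=dt)   # in Hz
    order = np.argsort(spectrum)[::-1]            # strongest bins first
    cum, keep = 0.0, []
    for k in order:
        keep.append(k)
        cum += spectrum[k]
        if cum >= (1.0 - eps) * spectrum.sum():
            break
    return freqs[min(keep)], freqs[max(keep)]

# Example: the two-sine test signal used later in the paper's simulations.
dt = 0.002
t = np.arange(0.0, 0.3, dt)
y = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 10 * t)
w_min, w_max = estimate_band(y, dt)
print(w_min, w_max)   # the estimated band covers the strong 10 Hz component
```

With such a short record the frequency resolution is coarse (about 3.3 Hz here), so the band is conservative; any spectral estimator with better leakage behavior could be substituted.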
Step II: We now use the knowledge of $W_{\min}$, $W_{\max}$, $x_{\min}$, and $x_{\max}$ to choose the particular frame elements to be used in the approximation. The main idea in this step is to choose only those elements of the frame $\{\psi_{mn}\}$ which cover the region $Q_f$ of the time-frequency plane defined by
$$Q_f(\epsilon, \hat\epsilon) = [x_{\min}, x_{\max}] \times \left([-W_{\max}, -W_{\min}] \cup [W_{\min}, W_{\max}]\right),$$
that is, those elements $\psi_{mn}$ for which $Q_{mn}(\epsilon, \hat\epsilon) \cap Q_f \ne \emptyset$, for $(m, n) \in \mathbb{Z}^2$. Denoting this index set by $\mathcal{Z}$, the approximation of $f$ can be written as
$$f(x) = \sum_{(m,n) \in \mathcal{Z}} c_{mn}(f)\, \psi_{mn}(x). \tag{17}$$
Fig. 11.
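The selection rule of Step II reduces to intersecting boxes on a lattice. The sketch below is our own rendering of this step; the localization constants $(x_c, \omega_c, \Delta x, \Delta\omega)$ stand in for the $\epsilon$-support values of Table I and are placeholders, not the paper's numbers:

```python
import numpy as np

# Sketch of Step II. We assume the lattice psi_mn(x) = a^{m/2} psi(a^m x - n b),
# whose time-frequency box is centered near (a^-m (x_c + n b), a^m w_c) with
# half-widths (a^-m dx, a^m dw). All concrete numbers are placeholders.
def select_frame_elements(x_rng, w_rng, a=2.0, b=1.0,
                          x_c=0.0, w_c=1.0, dx=2.0, dw=0.7,
                          m_range=range(-4, 8)):
    x_min, x_max = x_rng
    w_min, w_max = w_rng
    chosen = []
    for m in m_range:
        scale = a ** m
        # frequency box at level m: [scale*(w_c - dw), scale*(w_c + dw)]
        if scale * (w_c + dw) < w_min or scale * (w_c - dw) > w_max:
            continue
        # translations whose time box intersects [x_min, x_max]
        n_lo = int(np.floor((x_min * scale - x_c - dx) / b))
        n_hi = int(np.ceil((x_max * scale - x_c + dx) / b))
        chosen += [(m, n) for n in range(n_lo, n_hi + 1)]
    return chosen

Z = select_frame_elements(x_rng=(0.0, 0.3), w_rng=(2.0, 12.0))
print(len(Z))   # a modest, finite set of (m, n) indices
```

Only finitely many dilation levels survive the frequency test, and each surviving level contributes finitely many translations, so $\mathcal{Z}$ is always finite.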
$$c_{mn}(f) = \langle f, S^{-1} \psi_{mn} \rangle \tag{19}$$
where $O^i$ is the output of the network when $x^i$ is the input, as in Section II-A. We choose the wavelet coefficients as those which minimize $E$. As a result of the wavelet formulation, the weights to be determined appear linearly in the output equation of the network. Thus $E$ is a convex function of the coefficients $\{c_{mn}\}$ and therefore any minimizer $c^* = \{c^*_{mn}\}_{(m,n) \in \mathcal{Z}}$ of $E$ is a global minimizer. Simple iterative optimization algorithms such as gradient descent can be used to minimize $E$.
2) Normal Equations: There exists, however, an alternative formulation of the above optimization problem which provides a noniterative solution. Minimization of $E$ as defined in (20) defines a "least squares" problem. Therefore solutions can be determined by solving the system of linear equations constructed via the first-order optimality condition (which is both necessary and sufficient in this case) $\partial E / \partial c_{kj} = 0$, $(k, j) \in \mathcal{Z}$, at any minimizer $c^*$. By choosing an ordering of the wavelet terms $\{\psi_{mn}, (m, n) \in \mathcal{Z}\}$, the normal equations can be written as
$$P \mathbf{c} = \mathbf{w} \tag{21}$$
where $P$ is the $\#(\mathcal{Z}) \times \#(\mathcal{Z})$ matrix defined by
$$P = [P_{kj}] = \Big[ \sum_{(x^i, y^i) \in \Theta} \psi_k(x^i)\, \psi_j(x^i) \Big] \tag{22}$$
and $\mathbf{w}$ is the $\#(\mathcal{Z})$-vector with entries $w_k = \sum_{(x^i, y^i) \in \Theta} y^i\, \psi_k(x^i)$.
Fig. 12. Original bandlimited function $f(x) = \sin(2\pi 5 x) + \sin(2\pi 10 x)$ (solid curve), and finite wavelet approximation (dashed curve).
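The normal-equations route of (21) and (22) is a few lines of linear algebra in practice. In the sketch below the derivative-of-Gaussian atom is an illustrative stand-in for the paper's sigmoid-built wavelet, and all grid parameters are our own:

```python
import numpy as np

# The normal equations (21)-(22) in code: P and w are plain sums over the
# training pairs, and the optimal coefficients solve P c = w.
rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 1.0, 60)          # training inputs x^i
ys = np.sin(2 * np.pi * 3 * xs)         # training outputs y^i

def psi(m, n, x, a=2.0, b=0.5):
    # derivative-of-Gaussian atom on a dilation/translation lattice
    u = a ** m * x - n * b
    return a ** (m / 2.0) * (-u) * np.exp(-0.5 * u ** 2)

terms = [(m, n) for m in range(0, 4) for n in range(-2, 18)]
Phi = np.column_stack([psi(m, n, xs) for (m, n) in terms])

P = Phi.T @ Phi          # P_kj = sum_i psi_k(x^i) psi_j(x^i), as in (22)
w = Phi.T @ ys           # w_k  = sum_i y^i psi_k(x^i)
c = np.linalg.lstsq(P, w, rcond=None)[0]   # solve P c = w (SVD-based)

mse = np.mean((Phi @ c - ys) ** 2)
print(mse)               # training error of the least-squares fit
```

Because $\mathbf{w}$ always lies in the range of $P$, an SVD-based solve returns an exact solution of the normal equations even when the dictionary is redundant and $P$ is singular.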
D. Simulations
As a test of the neural network synthesis procedure described above, we simulated a few simple examples (some more complicated examples will be presented in [23]). As a first test we chose the bandlimited function comprised of two sinusoids at different frequencies, specifically $f(x) = \sin(2\pi 5 x) + \sin(2\pi 10 x)$, which is shown in Fig. 12. Taking $x_{\min} = 0.0$ and $x_{\max} = 0.3$, 50 randomly spaced samples of the function were included in the training set $\Theta$. A single dilation of the mother wavelet was chosen ($m = 6$) which covered
APPENDIX A
Given an admissible mother wavelet $\psi \in L^2(\mathbb{R})$, the following theorem by Daubechies [6] can be used to numerically determine values of the parameters $a$ and $b$ for which $(\psi, a, b)$ generates an affine frame for $L^2(\mathbb{R})$.
Theorem A.1 (Daubechies [6]): Let $\psi \in L^2(\mathbb{R})$ and $a > 1$ be such that:
1) $m(\psi; a) \triangleq \operatorname{ess\,inf}_{1 \le |\omega| \le a} \sum_{m} |\hat\psi(a^m \omega)|^2 > 0$ and $M(\psi; a) \triangleq \operatorname{ess\,sup}_{1 \le |\omega| \le a} \sum_{m} |\hat\psi(a^m \omega)|^2 < \infty$; (24)
where
Then the frame bounds $A$ and $B$ satisfy
$$A \ge b^{-1} \Big( m(\psi; a) - 2 \sum_{k \ne 0} \big[ \beta(2\pi k / b)\, \beta(-2\pi k / b) \big]^{1/2} \Big)$$
$$B \le b^{-1} \Big( M(\psi; a) + 2 \sum_{k \ne 0} \big[ \beta(2\pi k / b)\, \beta(-2\pi k / b) \big]^{1/2} \Big)$$
Remarks:
The conditions in Theorem A.1, and subsequently those in Corollary A.1, are in general very conservative, since the theorem relies on the Cauchy-Schwarz inequality to establish bounds.
In some applications it may be desirable to use a sparsely distributed frame, to cover a given time interval and frequency band using a small number of frame elements. As can be seen from Fig. 17, sparsity can be achieved to some extent at the cost of tightness of the frame.
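Theorem A.1 can be evaluated numerically along the lines of Fig. 17. The sketch below uses the Mexican-hat wavelet, whose Fourier transform has a convenient closed form, in place of the sigmoid-built wavelet; the grid sizes and truncation limits are our own choices:

```python
import numpy as np

# Numerical sketch of Theorem A.1 with the Mexican-hat wavelet, whose
# Fourier transform is proportional to w^2 exp(-w^2 / 2).
def frame_bound_estimates(a=2.0, b=1.0, m_lim=15, k_max=20):
    psi_hat = lambda w: (w ** 2) * np.exp(-0.5 * w ** 2)
    # sample |omega| in [1, a], both signs, as the theorem requires
    w = np.concatenate([np.linspace(1.0, a, 2000), -np.linspace(1.0, a, 2000)])
    ms = range(-m_lim, m_lim + 1)

    series = sum(psi_hat(a ** m * w) ** 2 for m in ms)
    m_low, M_high = series.min(), series.max()   # ess inf / ess sup estimates

    def beta(s):  # sup_w sum_m |psi_hat(a^m w)| |psi_hat(a^m w + s)|
        return sum(np.abs(psi_hat(a ** m * w) * psi_hat(a ** m * w + s))
                   for m in ms).max()

    corr = sum(np.sqrt(beta(2 * np.pi * k / b) * beta(-2 * np.pi * k / b))
               for k in range(1, k_max + 1))
    return (m_low - 2 * corr) / b, (M_high + 2 * corr) / b

A_est, B_est = frame_bound_estimates()
print(A_est, B_est)   # positive A confirms a frame; B/A > 1 reflects slack
```

Sweeping $b$ in this routine reproduces the qualitative behavior of Fig. 17: as the translation stepsize grows, the estimated ratio $B/A$ degrades before the lower bound eventually becomes useless.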
ACKNOWLEDGMENT
The authors are grateful to Prof. Hans Feichtinger of the
University of Vienna, Austria, for many helpful discussions
and numerous suggestions regarding this paper and to Dr.
J. Gillis for discussions on network synthesis techniques.
They also wish to thank Prof. H. White of the University
of California, San Diego for helpful comments, and Prof. J.
Benedetto of the University of Maryland, College Park for
discussions and the many references he provided on the subject
of wavelet transforms.
Fig. 17. Ratio ($B/A$) of estimated frame bounds using the mother wavelet $\psi$ constructed from sigmoids, with dilation stepsize $a = 2$, as translation stepsize $b$ is varied. The solid curve represents $B/A$, and the dashed line indicates the level where $B/A = 1$.
REFERENCES
[1] G. Cybenko, Tech. Rep., Department of Computer Science, Tufts University, Medford, MA, Mar. 1988.
[2] G. Cybenko, "Approximations by superpositions of a sigmoidal function," Tech. Rep. CSRD 856, Center for Supercomputing Research and Development, University of Illinois, Urbana, Feb. 1989.
[3] I. Daubechies, A. Grossmann, and Y. Meyer, "Painless nonorthogonal expansions," J. Math. Phys., vol. 27, no. 5, pp. 1271-1283, May 1986.
[4] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Commun. Pure Appl. Math., vol. 41, pp. 909-996, 1988.
[5] I. Daubechies, "Time-frequency localization operators: A geometric phase space approach," IEEE Trans. Informat. Theory, vol. 34, pp. 605-612, July 1988.
[6] I. Daubechies, "The wavelet transform, time-frequency localization and signal analysis," IEEE Trans. Informat. Theory, vol. 36, pp. 961-1005, Sept. 1990.
[7] J. Daugman, "Six formal properties of two-dimensional anisotropic visual filters: Structural principles and frequency/orientation selectivity," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 882-887, Sept./Oct. 1983.
[8] R. J. Duffin and A. C. Schaeffer, "A class of nonharmonic Fourier series," Trans. Amer. Math. Soc., vol. 72, pp. 341-366, 1952.
[9] C. E. Heil and D. F. Walnut, "Continuous and discrete wavelet transforms," SIAM Review, vol. 31, pp. 628-666, Dec. 1989.