All content following this page was uploaded by M. Barbaro on 01 September 2014.
Abstract— A smart, low-power CMOS imager with pre-processing capabilities, suitable for embedded systems, was realized and successfully tested. The chip hosts an array of 64x64 active pixels for image acquisition and frame storage, a programmable and reconfigurable analog row-processor for parallel spatial and temporal filtering of one image row at a time, and a fully digital communication block for chip configuration and frame grabbing. The row processor is capable of implementing programmable and tunable 2D spatial IIR filters and programmable temporal FIR filters with up to 8 taps. The two computations may be cascaded on-chip in order to extract motion information in real time. The same row-processor can be reconfigured into a parallel set of 64 8-bit A/D converters to achieve fully digital read-out. The chip was fabricated in a 0.35 µm process by AMI Semiconductors, has an area of 6 mm², hosts 120,000 transistors with a static power consumption of 4.7 mW, and is capable of a frame rate of 50 frames/s.

I. INTRODUCTION
A growing interest in embedded systems integrating image processing capabilities (e.g. face/gesture recognition, motion detection, target tracking) is being shown in different application fields such as surveillance, automotive, biometrics and robotics. In such systems, the traditional approach based on acquisition with CCD cameras and hardware/software processing on digital platforms may be ineffective in meeting the constraints of low power, real-time operation and integration. A number of different CMOS imagers have thus been proposed in the last decade, since CMOS technology allows image acquisition capabilities and low-level processing circuits, either digital or analog, to be incorporated on the same device [1]-[3].
In this work we present a reconfigurable CMOS imager for real-time image processing, intended to be part of a portable system and capable of performing different kinds of spatio-temporal computations on the acquired image. Spatial processing is based on the convolution of the image with a programmable Gabor-like function (a decaying exponential modulated by a cosine), useful for the low-level tasks needed for stereo depth estimation [4], texture analysis [5] and segmentation [6]. The temporal filter, a programmable FIR filter with up to 8 taps (high-pass, band-pass, low-pass), coupled to the Gabor filter, is suited for implementing motion detection [7] and estimation of motion-in-depth [8]. The chip architecture is flexible enough to allow computation of other algorithms, provided they can be described in terms of weighted sums. In all these algorithms, the key point is the possibility of interactively changing the parameters of the kernel (frequency, envelope, gain, phase). A very fast output rate is required in order to perform multi-scale and multi-frequency filtering of the same image.

II. CHIP ARCHITECTURE
The chip architecture shown in Figure 1 reflects the choice of a semi-parallel processing approach, where a full portion of the image (an entire row) is processed simultaneously, making it possible to meet both real-time and low-area constraints. Acquisition of the image is implemented in the Array, made up of 64x64 active pixel sensors (Figure 1a), each incorporating a photodiode, 2 long-term memories (LTM) needed to store 2 complete frames or 8 reduced frames, 2 output buffers (LineCol and LineRow) and 1 input line (LineIn). Frame storage is needed to compute temporal correlation. All the processing is done in the analog row processor (AP), a vector structure made up of 64 identical elements (APU, Analog Processing Units, Figure 1b). A row of the new image or of any previously stored frame can be written into a bank of 64 S&H circuits (short-term memories, STM) and fed to a set of 64 programmable, switched-capacitor weighted adders which implement a feed-forward FIR filter. The weighted adders can also be interconnected with one another to implement a recursive spatial 1D IIR filter. The result of any computation can be directly A/D converted (by reconfiguring the adders into a successive approximation A/D converter) or stored back in the S&H or even in the frame memories.
Notice that all the pixels of the same column, as well as all those of the same row, share the same output channel, so that the Array is addressable by rows and by columns. This is needed when 2D spatial filters are required: such filters are implemented by cascading a 1D horizontal filter and a 1D vertical filter as follows: a) an image row is read through the LineCol bus, b) the row is stored in the STMs, c) a 1D Gabor-like filter is applied, d) the result is stored back in the LTMs of
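
As a rough software illustration of the spatial processing described in Sections I and II, the following Python sketch builds a sampled Gabor-like kernel (a decaying exponential modulated by a cosine) and applies it as a horizontal pass followed by a vertical pass. It is only an analogy under stated assumptions: the chip realizes the spatial filter as a recursive (IIR) analog network, whereas the sketch truncates the kernel to a few taps and pads the borders with zeros. The function names and the parameters decay, freq, phase and gain are illustrative and are not taken from the chip.

```python
import math


def gabor_like_kernel(half_width, decay, freq, phase=0.0, gain=1.0):
    """Sampled kernel g[n] = gain * exp(-decay*|n|) * cos(2*pi*freq*n + phase).

    Assumption: a truncated FIR approximation with 2*half_width + 1 taps.
    """
    return [
        gain * math.exp(-decay * abs(n)) * math.cos(2 * math.pi * freq * n + phase)
        for n in range(-half_width, half_width + 1)
    ]


def filter_1d(signal, kernel):
    """Convolve one row (or column) with a centred kernel, zero-padded borders."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - (k - half)  # index of the neighbour weighted by kernel tap k
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def filter_2d_separable(image, kernel_h, kernel_v):
    """Cascade a 1D horizontal pass and a 1D vertical pass."""
    rows = [filter_1d(row, kernel_h) for row in image]              # horizontal pass
    cols = [filter_1d(list(col), kernel_v) for col in zip(*rows)]   # vertical pass
    return [list(r) for r in zip(*cols)]                            # transpose back
```

For example, `filter_2d_separable(image, k, k)` with `k = gabor_like_kernel(2, 0.5, 0.1)` uses five taps per pass, the same number of inputs that a single weighted adder sums; changing decay, freq, phase or gain corresponds to retuning the envelope, frequency, phase and gain of the kernel.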
Fig. 3. Column buffer and feedback signal
Fig. 4. Switch Matrix

X and Y in order to reduce the time needed for each reading.

C. Switch Matrix
The reconfigurability of the device, i.e. the capability of performing different algorithms, is delegated to the Switch Matrix, whose schematic is depicted in Figure 4. A number of signal distribution lines allow interaction among neighbouring columns, while a network of switches (Topology selection) provides the right connection scheme depending on the desired algorithm. Thanks to the fully differential architecture, the sign of each input can be digitally programmed simply by swapping the inputs via a multiplexer (Sign selection). Moreover, the Switch Matrix must correctly route the data between the Array and the Analog Processor, in both directions.

D. Programmable Adder
In order to save area and power consumption, the programmable adder and the ADC are integrated into the same switched-capacitor circuit, whose schematic is depicted in Figure 5. The adoption of a fully differential architecture is motivated by power-supply noise rejection. The operational amplifier is a two-stage Miller OpAmp. The gain-bandwidth product of the OpAmp is GBW = 10 MHz, while the DC open-loop gain is A0 = 70 dB. The operating mode of this module is set by the signals selSUM (adder mode) and selADC (converter mode).
When the circuit works in adder mode, two phases are needed to compute the weighted sum of five inputs simultaneously. During the reset phase (F1 and F1a high, F2 low) the OpAmp is connected in unity-gain feedback and the capacitors are put in parallel. During the amplification phase (F1 and F1a low, F2 high) the weighted sum of the inputs is computed, with the weights set by the ratios $C_i/C_f$ ($i = 0, \dots, 4$). Since each capacitor is a 7-bit Programmable Capacitor Array (PCA), the weighted sum is programmable too and the user can digitally set the shape of the kernel.
When the circuit works in converter mode (signal selADC), two phases are needed to produce the whole 8-bit digital word. During the first phase the OpAmp is connected in unity-gain feedback (signal F1a high) and the total feedback capacitor (bits bf[6:0] all equal to one) is used to sample (signal sample) and hold (signal hold) the input signal that comes from the STM. During the second phase the well-known bit-cycling algorithm is implemented. In this phase the OpAmp is in the open-loop configuration, working as a comparator; the Miller capacitor is disabled and only the first stage is used, to speed up settling of the output. The bits produced at successive approximations are stored in a bank of dynamic flip-flops and are used to digitally program the feedback PCA. Once the whole digital word has been produced, it remains stored in the bank of registers and can be read out while the circuit processes a new row/column.
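
The two operating modes of the programmable adder can be mimicked in software. The sketch below is an idealized behavioural model, not the circuit. It assumes that each 7-bit PCA code is proportional to capacitance (identical unit capacitors, no parasitics, no sign inversion from the amplifier), it models sign selection as a factor of +1 or -1, and it runs a generic unipolar bit-cycling loop against an ideal DAC, because the text does not state how the 8 output bits map onto the 7-bit feedback PCA. All function and parameter names are hypothetical.

```python
def pca_weight(code, feedback_code=127):
    """Ideal weight C_i / C_f for 7-bit PCA codes (assumed proportional to capacitance)."""
    if not 0 <= code <= 127:
        raise ValueError("input PCA code must fit in 7 bits")
    if not 1 <= feedback_code <= 127:
        raise ValueError("feedback PCA code must be in 1..127")
    return code / feedback_code


def weighted_sum(inputs, codes, signs, feedback_code=127):
    """Adder mode: sum of sign_i * (C_i / C_f) * v_i over the five inputs."""
    if not (len(inputs) == len(codes) == len(signs) == 5):
        raise ValueError("the adder takes exactly five inputs")
    return sum(
        s * pca_weight(c, feedback_code) * v
        for v, c, s in zip(inputs, codes, signs)
    )


def sar_convert(vin, vref=1.0, bits=8):
    """Converter mode: ideal bit cycling for a unipolar input 0 <= vin < vref.

    Each step tentatively sets one bit, from MSB to LSB, and keeps it only if
    the comparator finds the input at or above the trial DAC level.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if vin >= trial * vref / (1 << bits):  # comparator decision
            code = trial
    return code
```

For instance, `weighted_sum([1.0] * 5, [127, 64, 0, 64, 127], [1, -1, 1, -1, 1])` evaluates a five-tap kernel with alternating signs, and `sar_convert(0.5)` resolves a mid-scale input in eight comparator decisions.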
IV. EXPERIMENTAL RESULTS
The chip was designed and realized in a 0.35 µm process from AMI Semiconductors and its layout is shown in Figure 6 (due to metal fill, the microphotograph does not show many details). The pitch of the columns is given by the size of the pixel, which is 24.7 µm, and determines the size of each unit (APU) in the Analog Processor. Figure 7 shows the outputs

(a) Test image (b) Readout and A/D conversion (c) Spatial filtering (d) Spatial filtering

still under test since a number of parameters can be optimized (and digitally set) to reduce the effect of such imperfections.