Bora
Analog image: an electrical signal, for example the output of a video camera, that gives the electric voltage at locations in an image.
Digital image: a 2D array of numbers representing the sampled version of an image. The image is defined over a grid, each grid location being called a pixel. It is represented by a finite grid, and each intensity value is represented by a finite number of bits. A binary image is represented by 1 bit per pixel; a gray-level image by 8 bits per pixel.
Mathematically
We can think of an image as a function f from R^2 to R: f(x, y) gives the intensity at position (x, y). Realistically, we expect the image to be defined only over a rectangle, with a finite range: $f: [a, b] \times [c, d] \to [0, 1]$.
A color image is just a three component function. We can write this as a vector-valued function:
$f(x, y) = \begin{bmatrix} r(x, y) \\ g(x, y) \\ b(x, y) \end{bmatrix}$
Photographic
Examples
Ultrasound
Mammogram
Image processing
Digital image processing deals with the manipulation and analysis of digital images by digital hardware, usually a computer. Its goals include:
Emphasizing certain pictorial information for better clarity (human interpretation)
Automatic machine processing of scene data
Compressing image data for efficient utilization of storage space and transmission bandwidth
Image Processing
An image processing operation
Image Processing
A point operation transforms the intensity values: g(x) = h(f(x)).
A geometric operation transforms the coordinates: g(x) = f(h(x)).
Example
Image Restoration
Processing Restored Image
Degraded Image
Image Acquisition
An analog image is obtained by scanning the sensor output. Modern scanning devices such as CCD cameras contain an array of photodetectors, a set of electronic switches and control circuitry, all on a single chip.
Image Acquisition
Image Sensor → Sample and Hold → A/D Converter
The sample-and-hold circuit takes a measurement and holds it for conversion to digital; the converter then converts the measurement to digital form.
Digital Image
A digital image is obtained by sampling and quantizing an analog image. The analog image signal is sampled at a rate determined by the application concerned:
Still image: 512×512, 256×256
Video: 720×480, 360×240, 1024×768 (HDTV)
The intensity is quantized into a fixed number of levels determined by human perceptual limitations:
8 bits is sufficient for all but the best applications
10 bits: television production, printing
12-16 bits: medical imagery
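As an illustrative sketch of the quantization step described above (the function name `quantize` and the `v_max` parameter are my own, not from the text), a uniform quantizer maps an intensity in a continuous range onto one of 2^B levels:

```python
def quantize(value, bits, v_max=1.0):
    """Uniformly quantize an intensity in [0, v_max] to 2**bits levels.

    Returns the level index and the reconstructed intensity of that level.
    """
    levels = 2 ** bits
    # Scale to [0, levels - 1] and round to the nearest level index
    idx = min(levels - 1, int(round(value / v_max * (levels - 1))))
    # Reconstruction value corresponding to that level
    return idx, idx * v_max / (levels - 1)

# An 8-bit quantizer has 256 levels, as used for ordinary gray-level images
idx, rec = quantize(0.5, 8)
```

With more bits the reconstruction error per pixel shrinks, which is why medical imagery (12-16 bits) tolerates far less quantization than consumer video.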
Image Enhancement
Improves the quality of an image by enhancing the contrast, sharpening the edges, removing noise, etc. As an example, let us explain the image filtering operation to remove noise.
Original Image
Filtered Image
Histogram Equalization
Enhances the contrast of an image by transforming the intensity values so that the histogram of the output image is approximately uniformly distributed; the contrast is improved.
Feature Extraction
Extracting features like edges. Very important for detecting the boundaries of objects. Done through a digital differentiation operation.
Segmentation
Partitioning of an image into connected homogenous regions. Homogeneity may be defined in terms of: Gray value Colour Texture Shape Motion
Segmented Image
Object Recognition
An object recognition system finds objects in the real world from an image of the world, using object models that are known a priori. It is a labelling problem based on models of known objects.
Object or model representation Feature extraction Feature-model matching Hypotheses formation Object verification
Image Understanding
Inferring about the scene on the basis of the recognized objects. Supervision is required. Normally considered part of artificial intelligence.
Books
1. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Education, 2001. (Main Text)
2. A. K. Jain, Fundamentals of Digital Image Processing, Pearson Education, 1989.
3. R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing Using MATLAB, Pearson Education, 2004. (Lab Ref)
Evaluation Scheme
End Sem: 50, Mid Sem: 25, Quiz: 5, Matlab Assignment: 10, Mini Project: 10, Total: 100
1. MINI PROJECT
Matlab implementation, report preparation and demonstration of an advanced topic such as:
Video compression Video mosaicing Video-based tracking Medical Image Compression Video Watermarking Medical Image Segmentation Image and Video Restoration Biometric recognition
$X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$

and

$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega)\, e^{j\omega n}\, d\omega$

Note that $X(\omega + 2\pi) = X(\omega)$.

$X(\omega)$ exists if $\sum_{n=-\infty}^{\infty} |x[n]| < \infty$.

$x_s(t) = x_a(t) \sum_{n=-\infty}^{\infty} \delta(t - nT) = \sum_{n=-\infty}^{\infty} x_a(nT)\, \delta(t - nT)$
2D DSFT
Consider the signal $\{ f[m,n],\; m = -\infty,\dots,\infty,\; n = -\infty,\dots,\infty \}$ defined over the two-dimensional space. Also assume

$\sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} |f[m,n]| < \infty.$
Then the two-dimensional discrete-space Fourier transform (2D DSFT) and its inverse are defined by the following relations:
$F(u, v) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} f[m,n]\, e^{-j(um + vn)}$

and

$f[m,n] = \frac{1}{4\pi^2} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} F(u,v)\, e^{+j(um + vn)}\, du\, dv$
Note that $F(u,v)$ is doubly periodic in u and v. The following properties of $F(u,v)$ are easily verified:
Linearity
Separability
Shifting theorem: $f[m - m_0, n - n_0] \xleftrightarrow{\text{2D DSFT}} e^{-j(um_0 + vn_0)}\, F(u,v)$
Convolution theorem: if $f_1[m,n] \xleftrightarrow{\text{2D DSFT}} F_1(u,v)$ and $f_2[m,n] \xleftrightarrow{\text{2D DSFT}} F_2(u,v)$, then $f_1[m,n] * f_2[m,n] \xleftrightarrow{\text{2D DSFT}} F_1(u,v)\, F_2(u,v)$
2D DFT
Motivation
Consider the 1D DTFT $X(\omega) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$, $\omega \in [0, 2\pi]$.
Numerical evaluation of $X(\omega)$ involves a very large (infinite) amount of data and has to be done for each $\omega$. An easier way is the Discrete Fourier Transform (DFT), which is obtained by sampling $X(\omega)$ at a regular interval. Sampling periodically in the frequency domain at a rate $\frac{1}{N}$ means that the data sequence will be periodic with a period N. The relation between the Fourier transform of an analog signal $x_a(t)$ and the DFT of the sampled version is illustrated in the figure below.
2D DFT
The 2D DFT of a 2D sequence is defined as

$F[k_1, k_2] = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} f[m,n]\, e^{-j2\pi\left(\frac{mk_1}{M} + \frac{nk_2}{N}\right)}, \quad k_1 = 0,1,\dots,M-1,\; k_2 = 0,1,\dots,N-1$

$f[m,n] = \frac{1}{MN} \sum_{k_2=0}^{N-1} \sum_{k_1=0}^{M-1} F[k_1, k_2]\, e^{j2\pi\left(\frac{mk_1}{M} + \frac{nk_2}{N}\right)}, \quad m = 0,1,\dots,M-1,\; n = 0,1,\dots,N-1$
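The definition above can be checked numerically. The sketch below (assuming NumPy is available; the helper name `dft2` is illustrative) evaluates the double sum directly and compares it with NumPy's FFT, which uses the same sign and normalization convention:

```python
import numpy as np

def dft2(f):
    """Direct evaluation of the 2D DFT definition: O(M^2 N^2) operations."""
    M, N = f.shape  # f[m, n], m = 0..M-1, n = 0..N-1
    F = np.zeros((M, N), dtype=complex)
    for k1 in range(M):
        for k2 in range(N):
            for m in range(M):
                for n in range(N):
                    F[k1, k2] += f[m, n] * np.exp(-2j * np.pi * (m * k1 / M + n * k2 / N))
    return F

rng = np.random.default_rng(0)
f = rng.random((4, 4))
F_direct = dft2(f)  # agrees with np.fft.fft2(f)
```

The direct sum is only practical for tiny arrays; the FFT computes the same quantity in O(MN log MN).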
Properties of 2D DFT
Shifting property:

$f[m - m_0, n - n_0] \xleftrightarrow{\text{2D DFT}} e^{-j2\pi\left(\frac{m_0 k_1}{M} + \frac{n_0 k_2}{N}\right)}\, F[k_1, k_2]$
Properties of 2D DFT
Separability property. Since

$e^{-j2\pi\left(\frac{mk_1}{M} + \frac{nk_2}{N}\right)} = e^{-j2\pi \frac{mk_1}{M}}\, e^{-j2\pi \frac{nk_2}{N}}$

we can write

$F[k_1, k_2] = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} f[m,n]\, e^{-j2\pi\left(\frac{mk_1}{M} + \frac{nk_2}{N}\right)} = \sum_{n=0}^{N-1} \left( \sum_{m=0}^{M-1} f[m,n]\, e^{-j2\pi \frac{mk_1}{M}} \right) e^{-j2\pi \frac{nk_2}{N}} = \sum_{n=0}^{N-1} F_1[k_1, n]\, e^{-j2\pi \frac{nk_2}{N}}$

where

$F_1[k_1, n] = \sum_{m=0}^{M-1} f[m,n]\, e^{-j2\pi \frac{mk_1}{M}}$

Thus the 2D DFT can be computed as M-point 1D DFTs along one index followed by N-point 1D DFTs along the other.
2D Fourier Transform
Frequency domain representation of a 2D signal: consider a two-dimensional signal $f(x, y)$. The signal $f(x, y)$ and its two-dimensional Fourier transform $F(u, v)$ are related by $f(x,y) \xleftrightarrow{\text{2D FT}} F(u,v)$:

$F(u, v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, e^{-j(xu + yv)}\, dx\, dy$

$f(x, y) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(u, v)\, e^{j(xu + yv)}\, du\, dv$

u and v represent the spatial frequency in radians/length. F(u,v) represents the component of f(x,y) with frequencies u and v. A sufficient condition for the existence of F(u,v) is that f(x,y) is absolutely integrable:

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |f(x, y)|\, dx\, dy < \infty$
3. Shifting property:

$f(x - x_0, y - y_0) \xleftrightarrow{\text{2D FT}} e^{-j(x_0 u + y_0 v)}\, F(u, v)$

4. Modulation:

$f(x, y)\, e^{j(u_0 x + v_0 y)} \xleftrightarrow{\text{2D FT}} F(u - u_0, v - v_0)$

5. Complex exponentials are the eigenfunctions of linear shift-invariant systems. For an imaging system, h(x, y) is called the point spread function and H(u, v) is called the optical transfer function.

6. Separability property:

$F(u, v) = \int \left( \int f(x, y)\, e^{-jux}\, dx \right) e^{-jvy}\, dy = \int F_1(u, y)\, e^{-jvy}\, dy$

If $f(x, y) = f_1(x)\, f_2(y)$, then $F(u, v) = F_1(u)\, F_2(v)$.
7. 2D Convolution: if $g(x,y) = f(x,y) * h(x,y)$, then $G(u,v) = F(u,v)\, H(u,v)$. Similarly, if $g(x,y) = f(x,y)\, h(x,y)$, then $G(u,v) = \frac{1}{4\pi^2} F(u,v) * H(u,v)$.

Thus the convolution of two functions is equivalent to the product of the corresponding Fourier transforms.
8. Preservation of inner product: recall that the inner product of two functions f(x, y) and h(x, y) is defined by

$\langle f(x,y), h(x,y) \rangle = \int \int f(x,y)\, h^*(x,y)\, dx\, dy$

Then $\langle f(x,y), h(x,y) \rangle = \frac{1}{4\pi^2} \langle F(u,v), H(u,v) \rangle$, where

$\langle F(u,v), H(u,v) \rangle = \int \int F(u,v)\, H^*(u,v)\, du\, dv$

In particular (Parseval's theorem),

$\int \int |f(x,y)|^2\, dx\, dy = \frac{1}{4\pi^2} \int \int |F(u,v)|^2\, du\, dv$
Colour Fundamentals
Visible spectrum: approximately 400-700 nm. The frequency, or mix of frequencies, of the light determines the colour. Visible colours: VIBGYOR, with UV and IR just beyond the two extremes (excluded).
HVS review
Cones are the sensors in the eye responsible for colour vision. Humans perceive colour using three types of cones. The primary colours are RGB because the cones of our eyes basically absorb these three colours; the sensation of a certain colour is produced by the mixed response of the three types of cones in a certain proportion. Experiments show that the 6-7 million cones in the human eye can be divided into red, green and blue vision: 65% of cones are sensitive to red, 33% to green and only 2% to blue (blue cones are the most sensitive).
Absorption of light by red, green and blue cones in the human eye as a function of wavelength
The colour produced by mixing RGB is not a natural colour. A natural colour has a single wavelength, say λ. The same colour is artificially produced by combining weighted R, G and B components, each having a different wavelength.
The idea is that these three colours together produce the same response as the wavelength λ alone would (the proportions of R, G and B are chosen accordingly), thereby giving, to some extent, the sensation of the colour with wavelength λ.
A colour is then specified by its tri-chromatic coefficients, defined as
x = X/(X+Y+Z), y = Y/(X+Y+Z), z = Z/(X+Y+Z)
so that x + y + z = 1. For any wavelength of light in the visible spectrum, these values can be obtained directly from curves or tables compiled from experimental results.
Chromaticity Diagram
Shows colour composition as a function of x and y (only two of x, y and z are independent, since z = 1 − (x + y)). The triangle in the diagram below shows the colour gamut for a typical RGB system plotted in the xy plane.
The axes extend from 0 to 1. The origin corresponds to BLUE; the extreme points on the axes correspond to RED and GREEN. The point x = y = 1/3 (marked by the white spot) corresponds to WHITE.
Any colour in the interior of the "horseshoe" can be achieved through a linear combination of pure spectral colours. A straight line joining any two points shows all the colours that may be produced by mixing the two colours corresponding to those points. The straight line connecting red and blue is referred to as the line of purples.
RGB primaries form a triangular color gamut. The white colour falls in the center of the diagram
$\begin{bmatrix} C \\ M \\ Y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} R \\ G \\ B \end{bmatrix}$

Equal amounts of C, M and Y should produce black, but in practical printing devices an additional black pigment is needed. This gives the CMYK colour space.
Decoupling the intensity from the colour components has several advantages:
Human eyes are more sensitive to intensity than to hue, so we can distribute the bits for encoding more effectively.
We can drop the colour part altogether if we want gray-scale images; in this way, black-and-white TVs can pick up the same signal as colour ones.
We can process the intensity and colour parts separately. Example: histogram equalization on the intensity part to enhance contrast while leaving the relative colours the same.
The HSI model can be obtained from the RGB model. The diagonal joining Black and White in the RGB cube is the intensity axis.

HSI Model

Conversion back to RGB proceeds sector by sector: in the GB sector, H is replaced by H − 120° and R = I(1 − S); in the BR sector, H is replaced by H − 240° and G = I(1 − S).
YIQ model
The YIQ colour model is the NTSC standard for analog video transmission. Y stands for intensity; I is the in-phase component (orange-cyan axis); Q is the quadrature component (magenta-green axis). The Y component is decoupled because the signal has to be compatible with both monochrome and colour television. The relationship between the YIQ and RGB models is

$\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$
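The matrix above is easy to exercise in code. A minimal sketch (NumPy assumed; the standard NTSC coefficients are used, with R, G, B normalized to [0, 1]):

```python
import numpy as np

# NTSC RGB -> YIQ conversion matrix (standard coefficient values)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def rgb_to_yiq(rgb):
    """Convert one RGB triple (components in [0, 1]) to YIQ."""
    return RGB2YIQ @ np.asarray(rgb, dtype=float)

# A neutral colour such as white carries full luminance and zero chrominance,
# which is exactly the decoupling the slide describes
y, i, q = rgb_to_yiq([1.0, 1.0, 1.0])
```

Note how the I and Q rows each sum to zero: any gray input (R = G = B) yields I = Q = 0, so a monochrome receiver can use Y alone.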
Colour balancing
Refers to the adjustment of the relative amounts of the red, green and blue primaries in an image so that neutral colours are reproduced correctly. Colour imbalance is a serious problem in colour image reproduction.
Let $X_W = \{x_1, x_2, \dots, x_N\}$ be the set of vector pixels inside the filter window.
(1) For each pixel $x_i$, compute the sum of distances $SOD_i = \sum_{j=1}^{N} d(x_i, x_j)$, where $d(x_i, x_j)$ represents an appropriate distance measure between the ith and jth neighbouring vector pixels.
(2) Arrange the $SOD_i$ in ascending order and assign each vector pixel the rank of its SOD. Thus an ordering $SOD_{(1)} \le SOD_{(2)} \le \dots \le SOD_{(N)}$ implies the same ordering of the corresponding vectors, $x_{(1)}, x_{(2)}, \dots, x_{(N)}$, where the subscript in parentheses denotes rank.
(3) Take the vector median as $x_{VMF} = x_{(1)}$. The vector median is the vector pixel with minimum SOD to all other vector pixels.
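The three steps above can be sketched directly (NumPy assumed; Euclidean distance is used as the distance measure, which is one common choice):

```python
import numpy as np

def vector_median(window):
    """Vector median filter output for a window of colour (vector) pixels.

    Implements the steps from the text: compute each pixel's sum of
    distances (SOD) to all others, then return the pixel of minimum SOD.
    """
    pixels = np.asarray(window, dtype=float)          # shape (N, 3)
    # Pairwise Euclidean distances between vector pixels
    diffs = pixels[:, None, :] - pixels[None, :, :]
    sod = np.sqrt((diffs ** 2).sum(axis=2)).sum(axis=1)
    return pixels[np.argmin(sod)]

# An impulsive colour outlier does not drag the output the way a mean would
window = [(10, 10, 10), (12, 11, 10), (11, 10, 12), (255, 0, 0), (10, 12, 11)]
vm = vector_median(window)
```

Because the output is always one of the input pixels, no artificial colours are introduced, unlike componentwise filtering.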
Considering the vector pixels as feature vectors we can apply clustering technique to segment the colour image
EDGE DETECTION
Edge detection is one of the most important and difficult operations in image processing. It is an important step in image segmentation, the process of partitioning an image into constituent objects. An edge indicates a boundary between object(s) and background.
Edge
When pixel intensity is plotted along a particular spatial dimension, an edge appears as a sudden jump or step. At an edge located at $x_0$, the first derivative $\frac{df}{dx}$ has an extremum and the second derivative $\frac{d^2 f}{dx^2}$ has a zero crossing.

All edge detection methods are based on these two principles. In two-dimensional spatial coordinates the intensity function is a two-dimensional surface, and we have to consider the maximum of the magnitude of the gradient.
For simplicity of implementation, the gradient magnitude is approximated by $|\nabla f| \approx |f_x| + |f_y|$. The direction of the normal to the edge is obtained from $\theta = \tan^{-1}(f_y / f_x)$.
Differentiation is highly prone to high-frequency noise. Ideal differentiation corresponds to a frequency response with a zero at the origin, so the gain increases by 20 dB per decade; high-frequency noise is therefore amplified. To circumvent this problem, low-pass filtering has to be performed. Differentiation is implemented as a finite-difference operation.
The most common kernels used for the gradient edge detector are the Roberts, Sobel and Prewitt edge operators.
Prewitt operator: performs some averaging to reduce the effect of noise. May be considered as forward-difference operations in all 2-pixel blocks in a 3×3 window.
Sobel operator: performs some averaging to reduce the effect of noise, like the Prewitt operator. May be considered as forward-difference operations in all 2×2 blocks in a 3×3 window.
Find $f_x$ and $f_y$ using a suitable operator. Compute the gradient magnitude. Edge pixels are those for which $|\nabla f| > T$, where T is a suitable threshold.
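The gradient edge detector just described can be sketched as follows (NumPy assumed; the standard Sobel kernels are used, with the |fx| + |fy| approximation and a threshold T from the text):

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical derivatives
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def sobel_edges(img, T):
    """Gradient-magnitude edge map using |fx| + |fy| and threshold T."""
    H, W = img.shape
    fx = np.zeros((H, W))
    fy = np.zeros((H, W))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            fx[i, j] = (patch * SOBEL_X).sum()
            fy[i, j] = (patch * SOBEL_Y).sum()
    grad = np.abs(fx) + np.abs(fy)
    return grad > T   # edge pixels exceed the threshold

# A vertical step edge: left half 0, right half 100
img = np.zeros((8, 8))
img[:, 4:] = 100.0
edges = sobel_edges(img, T=100)
```

Only the two columns straddling the step respond; flat regions stay below T, which is the whole point of thresholding the gradient.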
Example
Laplacian Operator
Advantages: No thresholding symmetric operation Disadvantages: Noise is more amplified It does not give information about edge orientation
When a change in intensity (an edge) occurs, there is an extremum in the first derivative of intensity. This corresponds to a zero crossing in the second derivative. The orientation-independent differential operator of lowest order is the Laplacian.
LOG Operation
Convolving the image with the Gaussian and the Laplacian operator can be combined into a single convolution with the Laplacian of Gaussian (LoG) operator (the inverted Mexican hat). Continuous function and discrete approximation.
The detector is designed around three criteria: minimizing the probability of detection error; good localization, i.e. the edge should be detected where it is actually present in the image; and a single response corresponding to one edge.
Canny Algorithm
Example
Canny
Edge linking
After labelling the edges, we have to link similar edge points to get the object boundary. Two neighbouring points $(x_1, y_1)$ and $(x_2, y_2)$ are linked if their gradient magnitudes and directions are sufficiently close:

$\big|\, |\nabla f(x_1, y_1)| - |\nabla f(x_2, y_2)| \,\big| \le T_m \quad \text{and} \quad |\theta(x_1, y_1) - \theta(x_2, y_2)| \le T_\theta$
For n edge points there are $\frac{n(n-1)}{2}$ possible lines. To find whether each point is close to a given line requires about n comparisons per line, giving a total of $O(n^3)$ comparisons.
Hough transform uses parametric representation of a straight line for line detection.
In the image plane a straight line is y = mx + c. A point (x, y) in the image plane maps to the straight line $c = -mx + y$ in the (m, c) parameter plane. The points (x, y) and $(x_1, y_1)$ are mapped to lines $l_1$ and $l_2$ respectively in m-c space; $l_1$ and $l_2$ intersect at a point P representing the (m, c) values of the line joining (x, y) and $(x_1, y_1)$.
The straight line map of another point collinear with these two points will also intersect at P The intersection of multiple lines in the mc plane will give the (m,c) values of lines in the edge image plane.
The transformation is implemented by an accumulator array A, each cell corresponding to a quantized value of (m, c), with m quantized into M values over $[m_{min}, m_{max}]$ and c into N values over $[c_{min}, c_{max}]$. The array A is initialized to zero. For each edge point (x, y) and for each $m_i$ in the range $[m_{min}, m_{max}]$, find

$c_j = y - m_i x$

and increment A(i, j) by 1.

Since the slope m is unbounded for near-vertical lines, the normal (polar) parameterization is preferred:

$x \cos\theta + y \sin\theta = \rho$

with $\theta$ varying over $[-90°, 90°]$ and $|\rho| \le \sqrt{M^2 + N^2}$ for an M×N image.
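The (m, c) accumulator scheme above can be sketched as follows (NumPy assumed; the parameter ranges and grid sizes are arbitrary illustrative choices):

```python
import numpy as np

def hough_lines_mc(edge_points, m_range=(-2.0, 2.0), n_m=81,
                   c_range=(-20.0, 20.0), n_c=81):
    """Vote in a quantized (m, c) plane: for each edge point (x, y) and each
    slope m_i, increment the cell for c_j = y - m_i * x."""
    ms = np.linspace(m_range[0], m_range[1], n_m)
    cs = np.linspace(c_range[0], c_range[1], n_c)
    A = np.zeros((n_m, n_c), dtype=int)
    for (x, y) in edge_points:
        for i, m in enumerate(ms):
            c = y - m * x
            j = int(round((c - c_range[0]) / (c_range[1] - c_range[0]) * (n_c - 1)))
            if 0 <= j < n_c:
                A[i, j] += 1
    i, j = np.unravel_index(A.argmax(), A.shape)   # peak = detected line
    return ms[i], cs[j], A

# Ten collinear points on y = 0.5 x + 3 should all vote for (m, c) = (0.5, 3)
pts = [(x, 0.5 * x + 3) for x in range(10)]
m_hat, c_hat, A = hough_lines_mc(pts)
```

Every collinear point contributes one vote to the same cell, so the accumulator peak height equals the number of points on the line.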
Example
Circle detection
Other parametric curves like circle ellipse etc. can be detected by Hough transform technique.
$(x - x_o)^2 + (y - y_o)^2 = r^2 = \text{constant}$. For circles of undetermined radius, use a 3-D Hough transform for the parameters $(x_o, y_o, r)$.
Example
Compression Basics
Today's world depends on a lot of data, either stored in a computer or transmitted through a communication system. Compression involves reducing the number of bits needed to represent the data for storage and transmission. In particular, image compression deals with reducing the bits needed to represent images.
Storage requirement examples:
One second of digital video without compression requires 720×480×24×25 bits ≈ 24.8 MB.
One 4-minute song: 44100 samples per second × 16 bits per sample × (4 × 60) s ≈ 20 MB.
How can these data be stored efficiently?
Band-width requirement
The large data rate also means a larger bandwidth requirement for transmission. For an available bandwidth of B, the maximum allowable rate is 2B: 2B symbols/s can be resolved without ambiguity.
How do we send a large amount of data in real time through a limited-bandwidth channel, say a telephone channel?
Lossy
Perfect reconstruction is not possible, but visually useful information is retained. Provides large compression. Examples: video broadcasting, video conferencing, progressive transmission of images, digital libraries and image databases.
Types of Redundancy
Coding Redundancy
Some symbols are used more often than others; in English text, the letter E is far more common than the letter Z. More common symbols are given shorter code-lengths and less common symbols longer code-lengths. Coding redundancy is exploited in lossless coding such as Huffman coding.
Spatial Redundancy
Neighbouring data samples are correlated. Given samples $x(n-1), x(n-2), \dots$, a part of $x(n)$ can be predicted.
Temporal Redundancy
In video, same objects may be present in consecutive frames so that objects may be predicted
Frame k
Frame k+1
Perceptual Redundancy
Humans are sensitive to only limited changes in the amplitude of the signal. This fact may be considered while choosing the levels of quantization. Visually lossless means that the degradation is not visible to the human eye.
64 levels
32 levels
The average information content of a source measures the uncertainty associated with the source and is called entropy.
Entropy
The concept was introduced in thermodynamics by Ludwig Boltzmann; his epitaph reads

$S = k \log W$
Properties of Entropy
1. $0 \le H(X) \le \log_2(n)$
2. $H(X) = \log_2(n)$ when all n symbols are equally likely.
If X is a binary source with symbols 0 and 1 emitted with probabilities p and (1 − p) respectively, then

$H(X) = p \log_2\frac{1}{p} + (1-p) \log_2\frac{1}{1-p}$
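The entropy formula is easy to compute directly; a minimal sketch (the function name is illustrative):

```python
import math

def entropy(probs):
    """H(X) = sum_i p_i log2(1/p_i); zero-probability symbols contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Property 2: equally likely symbols attain the maximum H = log2(n)
h_uniform = entropy([0.25] * 4)     # log2(4) = 2 bits/symbol
# Binary source with p = 0.5: H = 1 bit/symbol
h_binary = entropy([0.5, 0.5])
```

Skewed distributions give lower entropy, which is exactly the room that variable-length codes exploit.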
Properties of a Code
Codes should be uniquely decodable and instantaneous (we can decode by reading from left to right, as soon as each codeword is received). Instantaneous codes satisfy the prefix property: no codeword is a prefix of any other. The average codeword length is

$L_{avg} = \sum_{i=1}^{n} l_i\, p_i$
Kraft's Inequality

There is an instantaneous binary code with codewords of lengths $l_1, l_2, \dots, l_I$ if and only if

$\sum_{i=1}^{I} 2^{-l_i} \le 1$

For example, there is an instantaneous binary code with lengths 1, 2, 3, 3, since

$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} = 1$

An example of such a code is 0, 10, 110, 111. There is no instantaneous binary code with lengths 1, 2, 2, 3, since

$\frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} = 1.125 > 1$
Example
Symbols $x_1, x_2, x_3, x_4$ with probabilities 0.125, 0.125, 0.25, 0.5:

$H(X) = 0.125 \log_2\frac{1}{0.125} + 0.125 \log_2\frac{1}{0.125} + 0.25 \log_2\frac{1}{0.25} + 0.5 \log_2\frac{1}{0.5} = 1.75$ bits/symbol
Example (Contd..)
Symbols $x_1, x_2, x_3, x_4$ with probabilities 0.125, 0.125, 0.25, 0.5 and codewords 000, 001, 01, 1:

$L_{avg} = 0.125 \times 3 + 0.125 \times 3 + 0.25 \times 2 + 0.5 \times 1 = 1.75$ bits/symbol
Huffman coding
Based on a lossless statistical method of the 1950s. Builds a probability tree by repeatedly combining the two lowest probabilities; the code is read off the tree.
The most common data value (highest frequency) has the shortest code. The Huffman table of data values versus codes must be sent to the decoder. Coding and decoding times can be long. Typical compression ratios: 2:1 to 3:1.
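The merge-the-two-smallest procedure can be sketched directly. The sketch below (function name illustrative) uses the probabilities from the entropy example, so the resulting average length can be checked against the 1.75 bits/symbol computed there:

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code by repeatedly merging the two lowest-probability nodes.

    Each heap entry is (probability, tiebreak_index, {symbol: partial codeword}).
    """
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prepend a bit distinguishing the two merged subtrees
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

probs = {"x1": 0.125, "x2": 0.125, "x3": 0.25, "x4": 0.5}
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
```

For these dyadic probabilities the Huffman code meets the entropy exactly: the average length equals H(X) = 1.75 bits/symbol.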
Run-length coding
Looks for runs of identical pixel values. Example: an 18-pixel row such as

10 10 10 10 10 10 10 10 10 0 0 0 0 0 40 40 40 40

is coded as (value, run-length) pairs (10, 9), (0, 5), (40, 4), reducing the size from 18 bytes to 6. Higher compression ratios result when the image contains predominantly low-frequency information. Typical compression ratios are 4:1 to 10:1. Used in fax machines and for coding the quantized transform coefficients in a lossy coder.
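A minimal run-length encoder/decoder pair illustrating the scheme (function names are my own):

```python
def rle_encode(row):
    """Encode a row of pixel values as (value, run-length) pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run-length) pairs back into the pixel row."""
    return [v for v, n in runs for _ in range(n)]

# 18 pixels in 3 runs -> 6 stored numbers, a 3:1 reduction
row = [10] * 9 + [0] * 5 + [40] * 4
runs = rle_encode(row)
```

Note the failure mode: a row with no repeats doubles in size, which is why RLE is applied after transforms that produce long zero runs.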
Arithmetic coding
Codes a sequence of symbols rather than a single symbol at a time
Figure: the unit interval is successively subdivided according to the symbol probabilities; the interval endpoints after each symbol (0.49, 0.7; 0.539, 0.56; 0.5446, 0.546; ...) narrow down to the final tag interval.

Choose the interval corresponding to the first symbol; the tag will lie in that interval. Go on subdividing the subintervals according to the symbol probabilities. The code is the arithmetic mean (midpoint) of the final subinterval. The tag is sent to the decoder, which has to know the symbol probabilities. The decoder repeats the same procedure to decode the symbols.
Disadvantage
Assumes the data to be stationary; does not consider the dynamics of the data.
Example
Let aabbbaa be the sequence to be encoded; the dictionary is built up as coding proceeds.
The output for the given sequence is 11253, which decodes to aabbbaa according to the dictionary.
Lossy Compression
Throws away both non-relevant information and a part of the relevant information to achieve the required compression. Usually involves a series of algorithm-specific transformations of the data, possibly from one domain to another (e.g. to the frequency domain via a Fourier transform), without storing all the resulting transform terms, thus losing some of the information contained.
Example
Differential Encoding: Stores the difference between consecutive data samples using a limited number of bits. Discrete Cosine Transform (DCT): Applied to image data. Vector Quantization JPEG (Joint Photographic Experts Group)
Fig. Original Lena image, and Reconstructed image from lossy Compression
Lossy Coder
A lossy coder maps the source X to the reconstruction Y. The average distortion for squared error is

$D = E(X - Y)^2 = \sum_x \sum_y (x - y)^2\, p(x)\, p(y|x)$

The mutual information between source and reconstruction is

$I(X; Y) = H(X) - H(X|Y)$

and the rate-distortion function R(D) gives the minimum rate needed to code the source with average distortion not exceeding D.
Rate Distortion
The Gaussian source presents the worst case for coding: for a non-Gaussian source, the achievable distortion at a given rate is lower than for the Gaussian. If we know nothing about the distribution of X, the Gaussian case gives us a pessimistic bound. An increase of 1 bit improves the SNR by about 6 dB.
Lossy Encoder
Fig. A Typical Lossy Signal/Image Encoder
Input Data
Prediction/ Transformation
Quantization
Entropy Coding
Compressed Data
Differential Encoding
Given samples $x[n-1], x[n-2], \dots, x[n-p]$, a part of $x[n]$ can be predicted if the data are correlated. A simple prediction scheme expresses the predicted value as a linear combination of the past p samples:

$\hat{x}[n] = \sum_{i=1}^{p} a_i\, x[n-i]$
LPC (contd..)
Variants of LPC-10 are used for coding speech in mobile communication. Speech is sampled at 8000 samples per second. Frames of 240 samples (30 ms of data) are considered for LPC. For each frame, quantized versions of the 10 prediction parameters and the approximate prediction errors are transmitted.
Transform coding
Transform coding applies an invertible linear coordinate transformation to the image: correlated data → transform → less correlated data.
Most of the energy is packed into a few transform coefficients. Examples: Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT).
Transform selection
Transform | Merits | Demerits
KLT | Theoretically optimal | Data dependent; not fast
DFT | Very fast | Assumes periodicity of data; more high-frequency distortion because of the Gibbs phenomenon
DCT | Less high-frequency distortion; high energy compaction |
DWT | High energy compaction; scalability |

Also, the DCT is theoretically close to the KLT and implementation-wise close to the DFT.
For $f[m,n],\; m = 0,1,\dots,N-1,\; n = 0,1,\dots,N-1$, the DCT is given by

$F_c(u,v) = \alpha(u)\,\alpha(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} f[m,n] \cos\frac{(2m+1)u\pi}{2N} \cos\frac{(2n+1)v\pi}{2N}, \quad u, v = 0,1,\dots,N-1$

with $\alpha(0) = \sqrt{1/N}$ and $\alpha(u) = \sqrt{2/N}$ for $u = 1, 2, \dots, N-1$ (and similarly for $\alpha(v)$).
DCT (contd..)
x → DCT → Round → Threshold → IDCT

In the worked example (figure), a short signal with sample values around 50 (50, 54, 49, 55, 52, 53, ...) is transformed; after rounding and thresholding, only two significant coefficients (147 and -4) survive, and the IDCT reconstructs values (53, 51, 54, 50, ...) close to the original.

We see that only two DCT coefficients contain most of the information about the original signal. The DCT can be easily extended to 2D.
Block DCT
The DCT can be efficiently implemented in blocks using the FFT and other fast methods; an FFT-based transform is more computationally efficient when applied in blocks rather than on the entire data. For a data length N and an N-point FFT, the computational complexity is of order $N \log_2 N$. If the data are divided into sub-blocks of length n, the number of sub-blocks is $\frac{N}{n}$ and the computational complexity is

$\frac{N}{n} \times n \log_2 n = N \log_2 n$

Typical block sizes are 2×2, 4×4 and 8×8.
Quantization
Replaces the transform coefficients with lower-precision approximations which can be coded in a more compact form
A many-to-one function.
Precision is limited by the number of bits available.
X = (147.07, -0.32, -2.54, 1.54, -1.41)
Quant(X) = (147, 0, -3, 2, -1)
Quantization (contd..)
Information-theoretic significance: the larger the variance of a coefficient, the more information it carries. Estimate the variance of each transform coefficient from the given image, or determine it from an assumed model. In the DCT, the DC coefficient follows a Rayleigh distribution and the AC coefficients follow a generalized Gaussian distribution model.
Two methods for quantization are zonal coding and threshold coding
Zonal coding
The co-efficients with more information content (more variance) are retained
Threshold coding
The coefficients with higher energy are retained; the rest are set to zero. More adaptive, but computationally expensive.
Zonal Coding mask and the number of bits allotted for each coefficient
JPEG
Joint Photographic Experts Group. A widely used lossy image coding format. Allows a trade-off between compression ratio and image quality. Can achieve high compression ratios (20+) with almost invisible differences.
JPEG (contd..)
Quantization Table
Image
8x8 DCT
quantization
Baseline JPEG
Divide the image into blocks of size 8×8. Level-shift all 64 pixel values in each block by subtracting $2^{n-1}$, where $2^n$ is the maximum number of gray levels. Compute the 2D DCT of each block. Quantize the DCT coefficients using a quantization table. Zig-zag scan the quantized DCT coefficients to form a 1-D sequence. Code the 1-D sequence (AC and DC) using JPEG Huffman variable-length codes.
Zig-zag scanning
Zigzag scanning
Figure: two-stage wavelet decomposition; the first stage produces the HL1, LH1 and HH1 subbands plus a low-pass band, and the second stage decomposes the low-pass band further.
EZW
EZW scans wavelet coefficients subband by subband. Parents are scanned before any of their children, but only after all neighboring parents have been scanned.
EZW coding
Each coefficient is compared against the current threshold T. A coefficient is significant if its amplitude is greater than T; such a coefficient is encoded as positive significant (PS) or negative significant (NS). A zerotree root (ZTR) signifies a coefficient below T with all its children also below T. An isolated zero (IZ) signifies a coefficient below T with at least one child not below T. Two bits are needed to code these four symbols.
Sequentially applies a sequence of thresholds $T_0, \dots, T_{N-1}$ to determine significance. Significance is coded with a three-level mid-tread quantizer and refined using a 2-level quantizer.
Example
Initial threshold: $T_0 = 2^{\lfloor \log_2 C_{max} \rfloor}$. With $C_{max} = 52$, $T_0 = 2^{\lfloor \log_2 52 \rfloor} = 32$.
In the refinement (second) pass, significant coefficients are refined by $\pm T/4$, here $\pm 8$.
JPEG 2000
Not only better efficiency, but also more functionality Superior low bit-rate performance Lossless and lossy compression Multiple resolution Region of interest(ROI)
JPEG vs JPEG 2000 (J2K):
Transform: JPEG uses the DCT; J2K uses the Discrete Wavelet Transform (DWT).
Entropy coding: JPEG uses Huffman coding; J2K uses arithmetic coding.
Video Compression
A video sequence consists of a number of pictures containing a lot of temporal redundancy. This is exploited to reduce the data rate of a video sequence, leading to video compression.
Motion-compensated frame differencing can be used very effectively to reduce redundant information between frames. Finding corresponding points between frames (i.e., motion estimation) can be difficult because of occlusion, noise, illumination changes, etc. The motion vectors (x, y displacements) are sent to the decoder.
Motion-compensated Prediction
Reference frame
Current frame
Predicted frame
Error frame
Search procedure
Reference frame Current frame
Best match
Search region
Current block
Search Algorithms
Exhaustive Search Three-step search Hierarchical Block Matching
First iteration
Minimum at first iteration Second iteration Minimum at second iteration Third iteration
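Of the search algorithms listed above, exhaustive search is the simplest to sketch (NumPy assumed; the sum of absolute differences is used as the matching cost, one common choice):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def block_match(cur_block, ref, top, left, search=4):
    """Exhaustive-search block matching: try every displacement in
    [-search, search]^2 around (top, left) and keep the minimum-SAD match."""
    B = cur_block.shape[0]
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and y + B <= ref.shape[0] and 0 <= x and x + B <= ref.shape[1]:
                cost = sad(cur_block, ref[y:y + B, x:x + B])
                if best is None or cost < best:
                    best, best_mv = cost, (dy, dx)
    return best_mv, best

# Reference frame with a bright patch; the current frame sees it shifted by (2, 1)
ref = np.zeros((16, 16))
ref[6:10, 5:9] = 200
cur = np.roll(np.roll(ref, 2, axis=0), 1, axis=1)
mv, cost = block_match(cur[8:12, 6:10], ref, top=8, left=6)
```

The three-step and hierarchical methods mentioned above reduce the number of candidate displacements tested, trading a small risk of missing the global minimum for much lower cost.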
Target bit-rates of the video coding standards: p×64 kbps; 1.5 Mbps; 15-30 Mbps; up to 64 kbps; 5 kbps-50 Mbps; 64 kbps to 240 Mbps.
Image Enhancement
Aimed at improving the quality of an image for better human perception or better interpretation by machines.
Includes both spatial- and frequency-domain techniques: Basic gray level transformations Histogram Modification Average and Median Filtering Frequency domain operations Edge enhancement
Image enhancement
Input image Better image Enhancement technique
Simplest case: the output g[x, y] depends only on the value of f at [x, y], not on the position of the pixel in the image. This is called a brightness transform or point processing.
Contrast stretching: s = T(r)
Image negative: s = 255 − r
Thresholding with threshold Th = 120
Log transformation
Compresses the dynamic range.

$s = c \log(1 + r)$

where c is a scaling constant. Example: c = 1.
Monitor output
Gamma Correction
Sample Input
Monitor Output
Original Image
Corrected by γ = 1.5
Bit-plane slicing
Highlights the contribution of specific bits to the image intensity. Analyses the relative importance of each bit, which aids in determining the number of quantization levels needed per pixel.
MSB plane
Original
Histogram Processing
Histogram
For gray levels $r_k \in \{0, 1, \dots, L-1\}$, the histogram is

$p(r_k) = \frac{n_k}{n}$

where $n_k$ is the number of pixels with gray level $r_k$ and n is the total number of pixels.

Computing the histogram: for a B-bit image, initialize $2^B$ bins to 0; for each pixel (x, y), if f(x, y) = i, increment bin i.
Histogram
Low-contrast image
Histogram
Improved-contrast image
Histogram
Histogram Equalisation
Suppose r represents continuous gray levels, $0 \le r \le 1$. Consider a transformation s = T(r) that satisfies the following conditions:
(1) T(r) is single-valued and monotonically increasing in r.
(2) $0 \le T(r) \le 1$ for $0 \le r \le 1$.
Thus $T: [0,1] \to [0,1]$ and the inverse transformation $T^{-1}(s) = r,\; 0 \le s \le 1$, exists.

Suppose

$s = T(r) = \int_0^r p_r(u)\, du, \quad 0 \le r \le 1$

Then

$ds = p_r(r)\, dr$

and

$p_s(s) = p_r(r) \left| \frac{dr}{ds} \right| = \frac{p_r(r)}{p_r(r)} = 1, \quad 0 \le s \le 1$

so the output gray levels are uniformly distributed.
Histogram Equalisation
In the discrete case, for $r_k \in \{0, 1, \dots, L-1\}$,

$p(r_k) = \frac{n_k}{n}$

and the equalizing transformation is

$s_k = \sum_{i=0}^{k} p(r_i)$
Histogram
Histogram-equalized Image
Example
The following table summarizes histogram equalization for a 128×128-pixel, 3-bit (8-level) image. Each gray level rk, with pixel count nk, is mapped to sk = round(7 · Σᵢ₌₀ᵏ nᵢ/n):
rk: 0 1 2 3 4 5 6 7
sk: 0 2 5 6 6 7 7 7
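The mapping sk = round((L-1) · Σ nᵢ/n) can be computed from any histogram. A sketch (the example's nk counts are not reproduced here, so the demo uses a made-up 3-bit histogram of 100 pixels):

```python
def equalize_map(hist, L):
    """Return the equalized level s_k = round((L-1) * cumulative fraction)."""
    n = sum(hist)
    out, cum = [], 0
    for nk in hist:
        cum += nk
        out.append(round((L - 1) * cum / n))
    return out

# Hypothetical 3-bit histogram (8 levels, 100 pixels total)
print(equalize_map([20, 20, 20, 10, 10, 10, 5, 5], L=8))  # [1, 3, 4, 5, 6, 6, 7, 7]
```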
Histogram specification
Given an image with a particular histogram, another image which has a specified histogram can be generated and this process is called histogram specification or histogram matching.
pr(r): original histogram; pz(z): desired histogram.
s = T(r) = ∫₀ʳ pr(u) du
s = G(z) = ∫₀ᶻ pz(w) dw
z = G⁻¹(s) = G⁻¹(T(r))
Image filtering
Image filtering involves a neighbourhood operation: a filter mask is taken from point to point in the image and an operation is performed on the pixels inside the mask.
Linear Filtering
In the case of linear filtering, the mask is placed over the pixel; the gray values of the image are multiplied by the corresponding mask weights and then added up to give the new value of the pixel. Thus the filtered image g[m, n] is given by
g[m, n] = Σ_{m′,n′} w[m′, n′] f[m − m′, n − n′]
where the summation is performed over the window. The filtering window is usually symmetric about the origin, so that we can write
g[m, n] = Σ_{m′,n′} w[m′, n′] f[m + m′, n + n′]
An example of a linear filter is the averaging low-pass filter. The output of an averaging filter at any pixel is the average of the neighbouring pixels inside the filter mask:
f_avg[m, n] = Σ_{i,j} w[i, j] f[m + i, n + j]
where the summation runs over the mask, and f and w are the image pixel values and filter weights respectively. The averaging filter can be used for blurring and noise reduction.
Exercise: show that the averaging low-pass filter reduces noise. A larger filtering window means more blurring.
Averaging filter
Original Image
Noisy Image
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Filtered Image
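The windowed sum g[m, n] = Σ w[m′, n′] f[m + m′, n + n′] can be sketched directly (zero padding at the border is one common choice, assumed here):

```python
def filter2d(img, mask):
    """Linear filtering with a square mask; pixels outside the image count as 0."""
    M, N = len(img), len(img[0])
    a = len(mask) // 2
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for i in range(-a, a + 1):
                for j in range(-a, a + 1):
                    if 0 <= m + i < M and 0 <= n + j < N:
                        acc += mask[i + a][j + a] * img[m + i][n + j]
            out[m][n] = acc
    return out

avg = [[1 / 9] * 3 for _ in range(3)]   # 3x3 averaging mask
flat = [[9] * 3 for _ in range(3)]      # constant test image
print(filter2d(flat, avg)[1][1])        # ~9.0 at the centre: a flat image is unchanged
```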
High-pass filter
A highpass filtered image can be computed as the difference between the original and a lowpass filtered version: Highpass = Original − Lowpass. Subtracting the 3×3 averaging mask from the identity mask
0 0 0
0 1 0
0 0 0
gives the highpass mask
-1/9 -1/9 -1/9
-1/9  8/9 -1/9
-1/9 -1/9 -1/9
Unsharp Masking
fs[m, n] = A f[m, n] − f_av[m, n],  A > 1
         = (A − 1) f[m, n] + (f[m, n] − f_av[m, n])
         = (A − 1) f[m, n] + f_high[m, n]
Median filtering
The median filter is a nonlinear filter that outputs the median of the data inside a moving window of pre-determined length. It is easily implemented and has some attractive properties: useful for eliminating intensity spikes (salt & pepper noise); better at preserving edges; works with up to 50% noise corruption. Exercise: verify that the median filter is a nonlinear filter.
Example: the values inside a 3×3 window are 18, 20, 20, 15, 255, 17, 20, 20, …; the median output is 20, so the spike value 255 is removed.
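A 3×3 median-filter sketch (borders are copied through unchanged; the test image uses the window values from the example above, with the last value assumed to be 20):

```python
def median_filter3(img):
    """3x3 median filter; border pixels are left unchanged."""
    M, N = len(img), len(img[0])
    out = [row[:] for row in img]
    for m in range(1, M - 1):
        for n in range(1, N - 1):
            win = sorted(img[m + i][n + j] for i in (-1, 0, 1) for j in (-1, 0, 1))
            out[m][n] = win[4]  # middle of the 9 sorted values
    return out

img = [[18, 20, 20],
       [15, 255, 17],
       [20, 20, 20]]
print(median_filter3(img)[1][1])  # 20 -- the 255 spike is removed
```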
Median Filtering
IMAGE TRANSFORMS
Image transform
Signal data are represented as vectors. The transform changes the basis of the signal space. The transform is usually linear but not shift-invariant. Useful for: compact representation of data; separation of noise from salient image features; efficient compression.
A transform may be: orthonormal/unitary or non-orthonormal; complete, overcomplete, or undercomplete; applied to image blocks or to the whole image.
1D TRANSFORM
DATA
UNITARY TRANSFORM
F = T f, where the N×N transform matrix is
T = | t(0,0)     t(0,1)     …  t(0,N-1)     |
    | t(1,0)     t(1,1)     …  t(1,N-1)     |
    | ⋮                                      |
    | t(N-1,0)   t(N-1,1)   …  t(N-1,N-1)   |
For a unitary transform the inverse is the conjugate transpose, T⁻¹ = T*ᵀ, so the data are recovered as f = T*ᵀ F.
2D DFT
Other examples: DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), DHT (Discrete Hadamard Transform), KLT (Karhunen-Loeve Transform).
Properties:
1. The rows of T form an orthonormal basis for the N-dimensional complex space.
2. ||T f|| = ||f||.
3. Parseval's theorem: T is an energy-preserving transformation, because a unitary transform preserves energy:
F*ᵀ F = [T f]*ᵀ T f = f*ᵀ T*ᵀ T f = f*ᵀ I f = f*ᵀ f
so the energy in the transform domain equals the energy in the data domain.
4. A unitary transform is a length- and distance-preserving transform.
5. Energy is conserved, but often unevenly distributed among the coefficients.
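Parseval's relation can be checked numerically with the unitary DFT, whose matrix entries are e^(−j2πnk/N)/√N (a sketch):

```python
import cmath

def unitary_dft(f):
    """Unitary DFT: F[k] = (1/sqrt(N)) * sum_n f[n] * exp(-j*2*pi*n*k/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            / cmath.sqrt(N) for k in range(N)]

f = [1.0, 2.0, 3.0, 4.0]
F = unitary_dft(f)
energy_data = sum(x * x for x in f)          # f*'f
energy_coef = sum(abs(Fk) ** 2 for Fk in F)  # F*'F
print(energy_data, round(energy_coef, 6))    # both 30.0
```

The two energies agree to floating-point precision, as the derivation above predicts.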
Decorrelating property
Let f = [f[0], …, f[N-1]]ᵀ be the data vector, C_f its covariance matrix, and C_F the covariance matrix of the transform coefficients F = T f. The diagonal elements of C_F are the variances and the off-diagonal elements the covariances. Perfect decorrelation means all off-diagonal elements are zero.
2D CASE
Separable Property
Matrix representation: for a separable 2D transform, F = T f T. Thus we can write
F[k₁, k₂] = Σ_{n₂=0}^{N-1} Σ_{n₁=0}^{N-1} f[n₁, n₂] t[k₁, n₁] t[n₂, k₂]
Energy preserving Distance preserving Energy compaction Other properties specific to particular transforms.
KL Transform
The KL transform of the data vector f is F_KLT = T_KLT (f − μ_f), where μ_f = E(f), so that E(F_KLT) = 0. The rows of T_KLT are the eigenvectors of the covariance matrix of f. The eigenvalues are arranged in descending order and the transformation matrix is formed from the corresponding eigenvectors in that order. The reconstruction is f = T_KLTᵀ F_KLT + μ_f. If we want to retain only k transform coefficients, we form the transformation matrix from the eigenvectors of the k largest eigenvalues.
Principal Component Analysis (PCA) Linear combination of largest principal eigen vectors.
KLT Illustrated
F2
F1
KL transform
The mean square error of a k-coefficient reconstruction equals the sum of the discarded eigenvalues, Σᵢ₌ₖᴺ⁻¹ λᵢ, and is minimum over all linear transforms. However, the transform matrix is data-dependent, so the KLT is computationally expensive.
1D-DCT
Let f = [f(0), …, f(N-1)]ᵀ be the data vector. The 1D DCT of f and its inverse are
F(k) = α(k) Σₙ₌₀ᴺ⁻¹ f(n) cos(π(2n+1)k / 2N),  k = 0, 1, …, N-1
f(n) = Σₖ₌₀ᴺ⁻¹ α(k) F(k) cos(π(2n+1)k / 2N)
where α(0) = √(1/N) and α(k) = √(2/N) for k ≠ 0.
Another interpretation: extend f(n) to a 2N-point sequence (zero outside the original support, with a mirrored copy appended) and take its 2N-point DFT F′(k); the DCT coefficients are obtained from e^{−jπk/2N} F′(k), up to the scaling α(k). The DCT is thus computable via the FFT.
DCT
2D-DCT
The 2D DCT of f(n₁, n₂) is separable: apply the 1D DCT along the rows and then along the columns.
Secondly, a first-order Markov process with correlation coefficient ρ has a covariance matrix whose (i, j) entry is ρ^|i−j|. If ρ is close to 1, the eigenvectors of this matrix approach the DCT basis vectors. Therefore, for a first-order Markov process with ρ close to 1, the DCT is close to the KLT. Because of this closeness to the KLT, its energy compaction, data decorrelation, and ease of computation, the DCT is used in many applications.
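An orthonormal 1D DCT sketch illustrating energy preservation and energy compaction on a smooth ramp signal:

```python
import math

def dct1d(f):
    """Orthonormal DCT-II: F(k) = a(k) * sum_n f(n) cos(pi*(2n+1)k / 2N)."""
    N = len(f)
    out = []
    for k in range(N):
        a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(a * sum(f[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

f = [1.0, 2.0, 3.0, 4.0]
F = dct1d(f)
print(sum(x * x for x in f), round(sum(c * c for c in F), 6))  # energies match
print(F[0] ** 2 / sum(c * c for c in F))  # most energy sits in the DC coefficient
```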
Matrices Many image processing operations are efficiently implemented in terms of matrices Particularly, many linear transforms are used
Simple example: colour transformation (RGB to YIQ)
Y = 0.299 R + 0.587 G + 0.114 B
I = 0.596 R − 0.275 G − 0.321 B
Q = 0.212 R − 0.523 G + 0.311 B
Denoting y = [y[0], y[1], …, y[N-1]]ᵀ, we get y = A x.
For example, the DFT
X[k] = Σₙ₌₀ᴺ⁻¹ x[n] e^{−j2πnk/N},  k = 0, 1, …, N-1
is written in matrix form as
| X[0]   |   | 1  1                …  1                   | | x[0]   |
| X[1]   | = | 1  e^{−j2π/N}       …  e^{−j2π(N-1)/N}     | | x[1]   |
| ⋮      |   | ⋮                                          | | ⋮      |
| X[N-1] |   | 1  e^{−j2π(N-1)/N}  …  e^{−j2π(N-1)²/N}    | | x[N-1] |
Example: rotating the point (x, y) by an angle θ gives the point (x′, y′) with
x′ = x cos θ − y sin θ
y′ = x sin θ + y cos θ
Transpose: (Aᵀ)ᵀ = A. The conjugate transpose of A is denoted A*ᵀ.
Example: A = | i  1 |
             | 1 −i |
Inverse of a Matrix:
A 1A = I
Unitary Matrix
A matrix A of complex elements is called unitary if A⁻¹ = A*ᵀ (the conjugate transpose).
Example: the 3-point unitary DFT matrix
A = (1/√3) | 1  1   1  |
           | 1  ω   ω² |
           | 1  ω²  ω  |,  ω = e^{−j2π/3} = −1/2 − j√3/2
Orthogonal Matrix
For an orthogonal matrix A, A⁻¹ = Aᵀ.
Real-valued unitary matrices are orthogonal Example: Rotation operation
x′ = x cos θ − y sin θ
y′ = x sin θ + y cos θ
A = | cos θ  −sin θ |
    | sin θ   cos θ |
A⁻¹ = | cos θ   sin θ | = Aᵀ
      | −sin θ  cos θ |
Example
Is the following matrix orthogonal?
| 1/√2   1/√2 |
| 1/√2  −1/√2 |
Toeplitz Matrix
A matrix is called Toeplitz if the diagonal contains the same element, each of the sub-diagonals contains the same element and each of the super-diagonals contains the same element. The system matrix corresponding to linear convolution of two sequences is a Toeplitz matrix. The autocorrelation matrix of a wide-sense stationary process is Toeplitz.
Example:
| a₀  a₋₁  a₋₂ |
| a₁  a₀   a₋₁ |
| a₂  a₁   a₀  |
Circulant Matrix
A matrix is called a Circulant Matrix if each row is obtained by the circular shift of the previous row.
A = | a₀,₀     a₀,₁     a₀,₂   …  a₀,N-1 |
    | a₀,N-1   a₀,₀     a₀,₁   …  a₀,N-2 |
    | a₀,N-2   a₀,N-1   a₀,₀   …  a₀,N-3 |
    | ⋮                                   |
    | a₀,₁     a₀,₂     …         a₀,₀   |
The system matrix corresponding to circular convolution of two sequences is a circulant matrix.
For θ = 90°:
| x′ |   | 0  −1 | | x |
| y′ | = | 1   0 | | y |
A = | 0  −1 |
    | 1   0 |
has no real eigenvalues and eigenvectors. Now consider the rotation with θ = 180°:
A = | −1   0 |
    |  0  −1 |
In Gray-scale Morphology,
A = {( x, y , z ) | ( x, y ) Z 2 , I ( x, y ) = z}
The value of z gives the gray value and (x, y) gives the image point.
Structuring element
It is similar to mask in convolution It is used to operate on the object image A
Structuring element
A
Entire image
A = {(x, y) | (x, y) belongs to the object}
The translation of A by an amount x, denoted (A)ₓ, is given by (A)ₓ = {a + x | a ∈ A}
Dilation operation
Given a set A and the structuring element B, we define the dilation of A with B as
A ⊕ B = { x | (B̂)ₓ ∩ A ≠ ∅ }
where B̂ is the reflection of B.
Why dilation?
If there is a very small object, say (hole) inside the object A, then this unfilled hole inside the object is filled up. Small disconnected regions outside the boundary may be connected by dilation. Irregular boundary may be smoothened out.
Properties of Dilation
1. A ⊕ B = B ⊕ A (commutative)
2. A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C (associative)
3. (A ⊕ B)ₓ = Aₓ ⊕ B (translation invariance)
Erosion operation
Erosion of A with B is given by
A ⊖ B = { x | Bₓ ⊆ A }
Why Erosion?
1. Two nearly connected regions will be separated by the erosion operation.
2. Shrinks the size of objects.
3. Removes peninsulas and small objects.
4. The boundary may be smoothened.
Properties of Erosion
1. Erosion is translation invariant: (A ⊖ B)ₓ = Aₓ ⊖ B
2. Erosion is not commutative: A ⊖ B ≠ B ⊖ A
3. Erosion is not associative: (A ⊖ B) ⊖ C = A ⊖ (B ⊕ C)
Duality: (A ⊕ B)ᶜ = Aᶜ ⊖ B̂ and (A ⊖ B)ᶜ = Aᶜ ⊕ B̂
Original
Dilated
Closing operation
Dilation, followed by Erosion
A • B = (A ⊕ B) ⊖ B
After the dilation operation the size of the object is increased; the erosion operation brings it back towards the original size.
Closing
Object A
Structuring element B
By closing operation, irregular boundaries may be smoothened depending upon the structural element
Example
Original
Dilated
Closed
Opening operation
Erosion, followed by Dilation
A ∘ B = (A ⊖ B) ⊕ B
Opens the weak links between the nearby objects Smoothens the irregular boundary
In all these operations, performance depends upon the structuring element; here the operations are applied to binary images. Example: edge detection.
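Binary dilation and erosion on point sets can be sketched directly from the definitions A ⊕ B = union of translates of A, and A ⊖ B = {x | Bₓ ⊆ A} (a small toy set is used here):

```python
def dilate(A, B):
    """A + B: translate A by every b in B and take the union."""
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    """A - B: points x whose translated structuring element B_x fits inside A."""
    candidates = {(ax - bx, ay - by) for (ax, ay) in A for (bx, by) in B}
    return {x for x in candidates
            if all((x[0] + bx, x[1] + by) in A for (bx, by) in B)}

A = {(1, 0), (1, 1), (1, 2), (1, 3), (0, 3)}
B = {(0, 0), (1, 0)}
print(sorted(dilate(A, B)))
print(sorted(erode(A, B)))   # [(0, 3)] -- the only point where B fits inside A
```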
Example
A = {(1,0), (1,1), (1,2), (1,3), (0,3)}, B = {(0,0), (1,0)}
Hit-or-miss transform:
A ⊛ B = (A ⊖ B₁) ∩ (Aᶜ ⊖ B₂)
where B₂ = W − B₁ and W is a window around the structuring element B₁ (object A, foreground element B₁, background element B₂ = W − B₁).
Example
In a text, how many F's are present? A structuring element shaped like F alone will match both E and F; in such cases the background must also be considered for pattern matching. The hit-or-miss transform will match F only.
Example
Boundary extraction: β(A) = A − (A ⊖ B)
Region filling: Xₙ = (Xₙ₋₁ ⊕ B) ∩ Aᶜ; iterate until Xₙ = Xₙ₋₁; the filled region is A ∪ Xₙ.
3. Thinning operation
This gives the image as single pixel width line
object
Thinning
object
skeletonizing
3. Thinning operation
Thinning operator
A ⊗ B = A − (A ⊛ B)
B1
1s 0s Original image A
B2
A ⊗ {B} = (A ⊗ B₁) ⊗ B₂ ⊗ … : thin A with B₁, then thin the result with B₂, and so on.
In this manner, we can do hit or miss with structuring elements to get ultimately thinned object
Skeletonizing
The skeleton is given by the operation
S(A) = ∪ₖ₌₀ᴷ Sₖ(A)
Gray-scale morphology
Generalization of binary morphology to gray-level images. Max and min operations are used in place of OR and AND. The operations are nonlinear. The simple generalization applies to flat structuring elements.
f(x, y): image, with domain D_f; b(x, y): structuring element, with domain D_b.
Here the mask is rotated by 180 degrees and placed over the object, and the maximum over the overlapping pixels is taken. Since it is a maximum operation, darker regions become brighter.
Applications
1. Pepper noise can be removed 2. Size of the image is also changed.
Dilation Illustrated
Dilation Result
Erosion operation
Erosion Result
Closing operation
f • b = (f ⊕ b) ⊖ b
Removes pepper noise Keeps intensity approximately constant Keeps brightness features
Opening operation
f ∘ b = (f ⊖ b) ⊕ b
Removes salt noise Brightness level is maintained Dark features are preserved
Original image
closing
opening
Duality
Gray-scale dilation and erosion are duals with respect to function complementation and reflection:
(f ⊖ b)ᶜ(s, t) = (fᶜ ⊕ b̂)(s, t)
Gray-scale opening and closing are duals with respect to function complementation and reflection:
(f • b)ᶜ = fᶜ ∘ b̂
Smoothing
Opening followed by closing Removes bright and dark artifacts, noise
Morphological gradient
g = (f ⊕ b) − (f ⊖ b)
Subtract the Eroded image from the dilated image Similar to boundary detection in the case of binary image Direction Independent
Two ways of applications: (a) The intensity levels can be considered as the values of a discrete random variable with a probability mass function. (b) Image intensity as two-dimensional random process
1400 1200 1000 800 600 400 200 0 0 50 100 150 200 250
We can use this distribution of grey levels to extract meaningful information about the image.
Example: Coding application and segmentation application
Probability concepts
1. Random Experiment: An experiment is a random experiment if its outcome cannot be predicted precisely. One out of a number of outcomes is possible in a random experiment. A single performance of the random experiment is called a trial.
2. Sample Space: The sample space S is the collection of all possible outcomes of a random experiment. The elements of S are called sample points.
3. Event: An event A is a subset of the sample space such that a probability can be assigned to it.
Probability Definitions
Classical definition of probability: Consider a random experiment with a finite number N of equally likely outcomes. The probability of an event A is defined by
P(A) = N_A / N
where N_A is the number of outcomes favourable to A.
Example: A fair die is rolled once. What is the probability of getting a 6? Here S = {'1','2','3','4','5','6'} and A = {'6'}, so P(A) = 1/6.
Relative-frequency definition: P(A) = lim_{n→∞} n_A / n, where n_A is the number of occurrences of A in n trials.
Example: Suppose a die is rolled 500 times, with the following frequency for each face:
Face:      1   2   3   4   5   6
Frequency: 82  81  88  81  90  78
Then P('6') ≈ 78/500 ≈ 1/6.
For each gray level i, h[i] = fᵢ / N, where fᵢ is the number of pixels with intensity i and N is the total number of pixels. The probability mass function p₀, p₁, …, p_{L-1} of the gray levels is estimated from the histogram of the image.
P ( AUB ) = P ( A) + P ( B ) P ( A B)
P( S ) = 1
If A and B are mutually exclusive or disjoint, then
P( A B) = 0
Conditional Probability
Independent events
Two events are called independent if the probability of occurrence of one event does not affect the probability of occurrence of the other. Thus the events A and B are independent if and only if or
P ( B / A) = P ( B )
P ( A / B ) = P ( A) and hence P ( A B ) = P ( A) P ( B )
Random variable
A random variable associates the points in the sample space with real numbers. Consider the probability space and function mapping the sample space into the real line. Real line
Range of X
The cumulative distribution function (CDF) of X is F_X(x) = P({X ≤ x}).
Random variable
F_X(x) is a non-decreasing function of x. Thus
F_X(−∞) = 0,  F_X(∞) = 1
P({x₁ < X ≤ x₂}) = F_X(x₂) − F_X(x₁)
Example
Suppose S = {H, T} and X : S → ℝ is defined by X(H) = 1 and X(T) = −1. Then X is a random variable that takes the value 1 with probability 1/2 and −1 with probability 1/2.
The probability density function (pdf) of X, denoted f_X(x), is
f_X(x) = d F_X(x) / dx
F_X(x) = ∫_{−∞}^{x} f_X(u) du is a non-decreasing function, ∫_{−∞}^{∞} f_X(x) dx = 1, and
P(x₁ < X ≤ x₂) = ∫_{x₁}^{x₂} f_X(x) dx
Example
Uniform random variable:
f_X(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise
Gaussian random variable:
f_X(x) = (1/(√(2π) σ_X)) exp(−(x − μ_X)² / (2σ_X²)),  −∞ < x < ∞
For a monotonic function Y = g(X),
f_Y(y) = f_X(x) / |dy/dx|, evaluated at x = g⁻¹(y)
Example: probability density function of a linear function of a random variable. Suppose Y = aX + b, a > 0. Then x = (y − b)/a and dy/dx = a, so
f_Y(y) = f_X((y − b)/a) / a
The expectation operation extracts a few parameters of a random variable and provides a summary description of the random variable in terms of these parameters. The expected value or mean of a continuous random variable
μ_X = EX = ∫_{−∞}^{∞} x f_X(x) dx for a continuous RV, and EX = Σᵢ₌₁ᴺ xᵢ p_X(xᵢ) for a discrete RV.
The expected value of a function g(X) of the RV X is
EY = E g(X) = ∫ g(x) f_X(x) dx
In particular, the mean-square value is EX² = ∫ x² f_X(x) dx, and the variance is
σ_X² = E(X − μ_X)² = ∫ (x − μ_X)² f_X(x) dx
The CDF of the random vector X is defined as
F_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = P({X₁ ≤ x₁, X₂ ≤ x₂, …, Xₙ ≤ xₙ})
and is continuous in each of its arguments.
We also define the following important parameters. The mean vector of X, denoted μ_X, is
μ_X = E(X) = [E(X₁), E(X₂), …, E(Xₙ)]ᵀ = [μ_{X₁}, μ_{X₂}, …, μ_{Xₙ}]ᵀ
Similarly, for each (i, j), i = 1, …, n, j = 1, …, n, we can define the covariance cov(Xᵢ, Xⱼ). All the variances and covariances can be represented in a matrix called the covariance matrix C_X, defined by
C_X = E(X − μ_X)(X − μ_X)ᵀ
    = | var(X₁)       cov(X₁, X₂)  …  cov(X₁, Xₙ) |
      | cov(X₂, X₁)   var(X₂)      …  cov(X₂, Xₙ) |
      | ⋮                                          |
      | cov(Xₙ, X₁)   cov(Xₙ, X₂)  …  var(Xₙ)     |
Multi-dimensional Gaussian
Suppose, for any positive integer n, X₁, X₂, …, Xₙ represent n jointly distributed random variables. These random variables are called jointly Gaussian if their joint probability density function is
f_{X₁,X₂,…,Xₙ}(x₁, x₂, …, xₙ) = (1 / ((2π)^{n/2} √det(C_X))) exp(−½ (x − μ_X)ᵀ C_X⁻¹ (x − μ_X))
where μ_X is the mean vector and C_X = E(X − μ_X)(X − μ_X)ᵀ is the covariance matrix of the random variables.
Example
2D Gaussian
For two jointly Gaussian random variables X and Y,
f_{X,Y}(x, y) = (1 / (2π σ_X σ_Y √(1 − ρ²_{X,Y}))) exp{ −(1 / (2(1 − ρ²_{X,Y}))) [ (x − μ_X)²/σ_X² − 2ρ_{X,Y}(x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y² ] }
where ρ_{X,Y} is the correlation coefficient of X and Y.
The auto-covariance matrix C_X = E(X − μ_X)(X − μ_X)ᵀ is a positive definite matrix, so it can be diagonalized as
C_X = Φ Λ Φᵀ,  Λ = diag(λ₁, λ₂, …, λ_N)
where λ₁, λ₂, …, λ_N are the eigenvalues of C_X and Φ is the matrix formed with the corresponding eigenvectors as its columns. Consider the transformation Y = Φᵀ(X − μ_X). Then C_Y = Λ is a diagonal matrix, and the transformation is called the Karhunen-Loeve transform (KLT).
Random Process
X (t , s3 )
s3
X (t , s2 )
s2
s1
X (t , s1 )
Recall that a random variable maps each sample point in the sample space to a point in the real line. A random process maps each sample point to a waveform. A random process is thus a function of t and s. The random process X(t,s) is usually denoted by X(t). Discrete time random process X[n].
The discrete random process {X[n]} is a function of the one-dimensional variable n. Some important parameters of a random process are:
Mean: μ[n] = E X[n]
Variance: σ²[n] = E(X[n] − μ[n])²
Autocorrelation: R_X[n, m] = E(X[n] X[m])
Autocovariance: C_X[n, m] = E(X[n] − μ[n])(X[m] − μ[m])
For a wide-sense stationary (WSS) process X[n], the mean E X[n] = μ_X is constant and the autocorrelation function R_X[m, n] = E X[m] X[n] is a function of the lag m − n only. We denote the autocorrelation function of a WSS process X[n] at lag k by R_X[k]. The autocorrelation function R_X[k] is even symmetric with a maximum at k = 0.
Matrix Representation
We can represent N samples by an N-dimensional random vector X = [X[1], X[2], …, X[N]]ᵀ.
Mean vector: μ_X = E X = [E X[1], E X[2], …, E X[N]]ᵀ
Autocorrelation matrix: R_X = E X Xᵀ
Autocovariance matrix: C_X = E(X − μ_X)(X − μ_X)ᵀ
For a WSS process, both are symmetric Toeplitz matrices.
Frequency-domain Representation
A host of tools are available to study a WSS process. Particularly, we may have the frequency domain representation of a WSS process in terms of the power spectral density (PSD) given by
S_X(ω) = Σ_{k=−∞}^{∞} R_X[k] e^{−jωk}
with the inverse relation
R_X[k] = (1/2π) ∫_{−π}^{π} S_X(ω) e^{jωk} dω
A random process { X [ n]} is called a Gaussian process if for any N, the joint density function is given by,
f_{X[1],X[2],…,X[N]}(x₁, x₂, …, x_N) = (1 / ((2π)^{N/2} √det(C_X))) exp(−½ (x − μ_X)ᵀ C_X⁻¹ (x − μ_X))
Markov process
{X[n]} is a random process with discrete states, i.e. X[n] can take one of L discrete values x₀, x₁, …, x_{L-1} with probabilities p₀, p₁, …, p_{L-1}.
{ X [ n]} is called first-order Markov if
P ({ X [ n ] = xn | X [ n 1] = xn 1 , X [ n 2] = xn 2 ,...}) = P ({ X [ n ] = xn | X [ n 1] = xn 1})
Thus for a first-order Markov process, the current state depends on the immediate past. Similarly, { X [n]} is called p th-order Markov if
P({X[n] = xₙ | X[n−1] = xₙ₋₁, X[n−2] = xₙ₋₂, …}) = P({X[n] = xₙ | X[n−1] = xₙ₋₁, …, X[n−p] = xₙ₋ₚ})
Random field A two dimensional random sequence{ X [m, n]} is called a random field. For a random field { X [m, n]}, we can define the mean and the autocorrelation functions as follows: Mean: EX [ m, n] = [m, n] Autocorrelation:
R_X[m, n, m′, n′] = E X[m, n] X[m′, n′]
A random field {X[m, n]} is called a wide-sense stationary (WSS) or homogeneous random field if R_X[m, n, m′, n′] is a function of the lags (m − m′, n − n′). Thus, for a WSS random field {X[m, n]}, the autocorrelation function R_X[k, l] can be defined by
R_X[k, l] = E X[m, n] X[m + k, n + l] = E X[m + k, n + l] X[m, n]
A random field {X[m, n]} is called a separable random field if R_X[m, n] can be separated as R_X[m, n] = R_{X₁}[m] R_{X₂}[n].
Two-dimensional power spectral density
We have the frequency-domain representation of a WSS random field in terms of the two-dimensional power spectral density
S_X(u, v) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} R_X[k, l] e^{−j(uk + vl)}
with the inverse relation
R_X[k, l] = (1/4π²) ∫_{−π}^{π} ∫_{−π}^{π} S_X(u, v) e^{j(uk + vl)} du dv
A random field{ f [m, n]} is called a Markov random field if the current state at a location depends only on the states of the neighboring locations.
Segmentation
Divide the image into homogenous segments. Homogeneity may be in terms of
(1) Gray values (within a region the gray values don't vary much). Example: the gray level of characters differs from the gray level of the background.
(2) Texture: some type of repetitive statistical uniformity
(3) Shape
(4) Motion (used in video segmentation)
Example
Application
Optical character recognition Industrial inspection Robotics Determining the microstructure of biological , metallurgical specimens Remote sensing Astronomical applications Medical image segmentation Object based compression techniques (MPEG 4) Related area in object representation
Main approaches
Histogram-based segmentation Region-based segmentation
Edge detection Region growing Region splitting and merging.
Clustering
K-means Mean shift
Motion segmentation
Example
Histogram-based Threshold
Assumption
Regions are distinct in terms of gray level range
Histogram-based threshold
Compute the gray level histogram of the image. Find two clusters: black and white. Minimizing the L2 error:
Select an initial estimate T. Segment the image using T. Compute the average gray level of each segment, m_b and m_w. Compute a new threshold value T = (m_b + m_w)/2. Continue until convergence.
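The iteration above, sketched in Python for a flat list of gray values (the initial T of 128 and the 0.5 convergence tolerance are arbitrary choices):

```python
def iterative_threshold(pixels, t=128.0, eps=0.5):
    """Repeat T = (mean(below T) + mean(above T)) / 2 until T stops moving."""
    while True:
        lo = [p for p in pixels if p <= t]
        hi = [p for p in pixels if p > t]
        mb = sum(lo) / len(lo) if lo else t
        mw = sum(hi) / len(hi) if hi else t
        t_new = (mb + mw) / 2.0
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# Bimodal toy image: dark background at 10, bright objects at 200
print(iterative_threshold([10] * 60 + [200] * 40))  # 105.0
```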
Adaptive Thresholding
Divide the image into sub-images. Assume that the illumination in each sub-image is constant. Use a different threshold for each sub-image. Alternatively, use a running window (and use the window's threshold only for the central pixel). Problems: rapid illumination changes; regions without text (we can try to recognize that the histogram of such a region is unimodal).
Optimal Thresholding
We may use probability based approach. Intensity histogram is a mixture of Gaussian distribution.
Normalized histogram = sum of two Gaussian with different mean and variance.
Region-based segmentation
We would like to use spatial information. We assume that neighboring pixels tend to belong to the same segment (not always true) Edge detection: Looking for the boundaries of the segments. Problem: Edges usually do not determine close contours. We can try to do it with edge linking
Region-based segmentation
Basic formulation: Let R represent the entire image region. Segmentation partitions R into n subregions Rᵢ such that:
a) ∪ᵢ Rᵢ = R
b) Rᵢ is a connected region
c) Rᵢ ∩ Rⱼ = ∅ for i ≠ j
d) P(Rᵢ ∪ Rⱼ) = FALSE for adjacent regions Rᵢ, Rⱼ
e) P(Rᵢ) = TRUE
where P is the partition predicate.
Example of a Predicate
A predicate P has the value TRUE or FALSE. "The intensity variation within the region is not much" is not a valid predicate. "The intensity difference between two pixels is less than 5" is a valid predicate. "The distance between two (R, G, B) vectors is less than 10" is a valid predicate.
Region growing
Choose a group of points as initial regions. Expand the regions to neighboring pixels using a predicate:
Color distance from the neighbors. The total error in the region (till a certain threshold):
Variance Sum of the differences between neighbors. Maximal difference from a central pixel.
In some cases, we can also use structural information: the region size and shape.
In this way we can handle regions with a smoothly varying gray level or color. Question: How do we choose the starting points ? It is less important if we also can merge regions.
(Histogram with three modes S1, S2, S3, plotted as frequency versus intensity.)
QuadTree
R3
R21
R22
R23
R24
With quadtree, one can use a variation of the split & merge scheme: Start with splitting regions. Only at the final stage: merge regions.
Segmentation as clustering
Address the image as a set of points in the n-dimensional space:
Gray level images: p=(x,y,I(x,y)) in R3 Color images: p =(x,y,R(x,y),G(x,y),B(x,y)) in R5 Texture: p= (x,y,vector_of_fetures) Color Histograms: p=(R(x,y),G(x,y),B(x,y)) in R3. we ignore the spatial information.
From this stage, we forget the meaning of each coordinate. We deal with arbitrary set of points. Therefore, we first need to normalize the features (For
example - convert a color image to the appropriate linear space representation)
Similarity Measure
Given two vectors Xi and X j , we can use measures like Euclidean distance Weighted Euclidean distance Normalized correlation
K-means
Idea:
Determine the number of clusters Find the cluster centers and point-cluster correspondences to minimize error Problem: Exhaustive search is too expensive. Solution: We will use instead an iterative search [Recall the ideal quantization procedure.]
Algorithm
Fix cluster centers 1, 2 ,..., k Allocate points to closest cluster Fix allocation; compute best cluster centers
Error function =
Illustration of K-means
Data set: (72,180) (65,120) (59,119) (64,150) (65,162) (57,88) (72,175) (44,41) (62,114) (60,110) (56,91) (70,72) (80,180)
Initial cluster centres: (45,50) (75,117) (45,117) (80,180)
Iteration 1:
(45,50): {(44,41)} → new mean (44,41)
(75,117): {(62,114), (65,120)} → new mean (63,117)
(45,117): {(57,88), (59,119), (56,91), (60,110)} → new mean (58,102)
(80,180): {(72,180), (64,150), (65,162), (72,175), (70,172)} → new mean (39,170)
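A minimal 2-D K-means sketch of the two alternating steps (assign each point to the nearest centre, then recompute centres as cluster means); the data and initial centres here are small toy values, not the table above:

```python
def kmeans(points, centers, iters=10):
    """Alternate assignment and mean update for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            dists = [(p[0] - cx) ** 2 + (p[1] - cy) ** 2 for cx, cy in centers]
            clusters[dists.index(min(dists))].append(p)
        centers = [(sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
                   if cl else c for cl, c in zip(clusters, centers)]
    return centers

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(kmeans(pts, [(0, 0), (10, 10)]))  # two tight cluster centres
```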
Mean Shift
K-means is a powerful and popular method for clustering. However:
It assumes a pre-determined number of clusters It likes compact clusters. Sometimes, we are looking for long but continues clusters.
Mean Shift:
Determine a window size (usually small). For each point p:
Compute a weighted mean of the shift in the window:
Δp = Σᵢ wᵢ (pᵢ − p) / Σᵢ wᵢ, with weights wᵢ = w(d(p, pᵢ))
and move p by Δp, repeating until convergence.
This method is based on the assumption that points are more and more dense as we are getting near the cluster central mass.
Motion segmentation
Background subtraction: Assumes the existence of a dominant background. Optical flow (use the motion vectors as features) Multi model motion: Divide the image to layers such that in each layer, there exist a parametric motion model.
Texture
Texture may be informally defined as a structure composed of a large number of more or less ordered similar patterns or structures Textures provide the idea about the perceived smoothness, coarseness or regularity of the surface. Texture has played increasingly important role in diverse application of image processing
Computer vision Pattern recognition Remote sensing Industrial inspection and Medical diagnosis.
Texture analysis: how to represent and model texture Texture synthesis: construct large regions of texture from small example images Shape from texture: recovering surface orientation or surface shape from image texture.
In image processing texture analysis is aimed at two main issues: Segmentation of the scene in an image into different homogeneously textured regions without a priori knowing the textures. Classification of the textures present in an image into a finite number of known texture classes. A closely related field is the image data retrieval on the basis of texture. Thus a speedy classification can help in browsing images in a database. Texture classification methods can be broadly grouped into one of the two approaches: Non-filtering approach and Filtering approach.
Co-occurrence Matrix
Objective: Capture spatial relations A co-occurrence matrix is a 2D array Cd in which Both the rows and columns represent a set of possible image values Cd (i, j ) indicates how many times gray value i co-occurs with value j in a particular spatial relationship d. The spatial relationship is specified by a vector d = (dr,dc). From Cd we can compute P , the normalized gray-level co-occurrence d
matrix, where each value is divided by the sum of all the values.
Example
1 1 0 0 1 1 0 0
d = 1 pixel right
C_d = | 16  12 |
      | 12  16 |
P_d = C_d / 56 = | 16/56  12/56 |
                 | 12/56  16/56 |
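A sketch of the count; the 8×8 test image below alternates the rows 1 1 0 0 1 1 0 0 and 0 0 1 1 0 0 1 1 (an assumed image, chosen to be consistent with the counts shown above):

```python
def cooccurrence(img, d, levels):
    """C[i][j] counts pairs (pixel = i, pixel at offset d = j) inside the image."""
    dr, dc = d
    C = [[0] * levels for _ in range(levels)]
    M, N = len(img), len(img[0])
    for r in range(M):
        for c in range(N):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < M and 0 <= c2 < N:
                C[img[r][c]][img[r2][c2]] += 1
    return C

rowA = [1, 1, 0, 0, 1, 1, 0, 0]
rowB = [0, 0, 1, 1, 0, 0, 1, 1]
img = [rowA if i % 2 == 0 else rowB for i in range(8)]
C = cooccurrence(img, (0, 1), 2)   # d = 1 pixel to the right
print(C)  # [[16, 12], [12, 16]]
```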
Features derived from C_d include the k-th order element-difference moment Σ_{i,j} C_d(i, j)(i − j)ᵏ and the uniformity (energy) Σ_{i,j} C_d²(i, j).
Disadvantages
Computationally expensive Sensitive to gray scale distortion (co-occurrence matrices depend on gray values) May be useful for fine-grain texture. Not suitable for spatially large textures.
Structural texture analysis methods that consider texture as a composition of primitive elements arranged according to some placement rule. These primitives are called texels. Extracting the texels from the natural image is a difficult task. Therefore these methods have limited applications. Statistical methods that are based on the various joint probabilities of gray values. Co-occurrence matrices estimate the second order statistics by counting the frequencies for all the pairs of gray values and all displacements in the input image. Several texture features can be extracted from the co-occurrence matrices such as uniformity of energy, entropy, maximum probability, contrast, inverse difference moments, and correlation and probability run-lengths. Model based methods that include fitting of model like Markov random field, autoregressive, fractal and others. The estimated model parameters are used to segment and classify textures.
Filtering approach
In the filtering approach, the input image is passed through a linear filter followed by some energy measure. Feature vectors are extracted based on these energy outputs. Texture classification is based on these feature vectors. The following figure shows the basic filtering approach for texture classification.
The filtering approach includes Laws' masks, ring/wedge filters, dyadic Gabor filter banks, wavelet transforms, quadrature mirror filters, the DCT, eigenfilters, etc.
Gabor filters
Gabor Filters Fourier coefficients depend on the entire image (Global): We lose spatial information. Objective: Local Spatial Frequency Analysis Gabor kernels: Fourier basis multiplied by a Gaussian
g(x) = (1/(2πσ²)) exp(iω(x − x₀)) exp(−(x − x₀)²/(2σ²))
Gabor filter
Gabor filters come in pairs, symmetric and antisymmetric; each pair recovers the symmetric and antisymmetric components in a particular direction. ω is the spatial frequency to which the filter responds strongly; σ is the scale of the filter (as σ → ∞, the filter approaches the Fourier basis function). We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies.
The intensity at (x, y) is a two-dimensional function, denoted f(x, y). Video is modeled as a three-dimensional function f(x, y, t). The digital image is defined over a grid, each grid location being called a pixel. We denote this 2D discrete-space signal as f[m, n].
1. Dirac delta (impulse):
δ(x) = 0 for x ≠ 0, with ∫_{−∞}^{∞} δ(x) dx = 1
Sifting property: ∫ f(x′) δ(x − x′) dx′ = f(x)
Scaling: δ(ax) = δ(x)/|a|
Relation to the unit step: ∫_{−∞}^{x} δ(u) du = u(x)
2. Kronecker delta:
δ[n] = 1 for n = 0, and 0 otherwise
Sifting property: Σ_{m=−∞}^{∞} f[m] δ[n − m] = f[n]
3. Rectangle function ::
rect(x) = 1 for |x| ≤ 1/2, and 0 otherwise
4. Sinc function ::
sinc(x) = sin(x)/x
5. Complex exponential: e^{jωx}
These functions are defined in two or more dimensions through the separability property: f(x, y) is separable if f(x, y) = f₁(x) f₂(y). For example, the complex exponential function is separable:
e^{j(ω₁x + ω₂y)} = e^{jω₁x} e^{jω₂y}
and so are the delta functions:
δ(x, y) = δ(x) δ(y),  δ[m, n] = δ[m] δ[n]
A system T is called linear if
T[a f₁[m, n] + b f₂[m, n]] = a T f₁[m, n] + b T f₂[m, n]
Writing the input as f[m, n] = Σ_{m′,n′} f[m′, n′] δ[m − m′, n − n′], the output is
g[m, n] = T f[m, n] = Σ_{m′,n′} f[m′, n′] T δ[m − m′, n − n′]
For a 2D linear shift-invariant system with input f[m, n] and impulse response h[m, n], the output g[m, n] is given by
g[m, n] = Σ_{m′,n′} f[m′, n′] h[m − m′, n − n′] = h[m, n] * f[m, n]
If h[m, n] is defined for m = 0, 1, …, M₁ − 1, n = 0, 1, …, N₁ − 1 and f[m, n] is defined for m = 0, 1, …, M₂ − 1, n = 0, 1, …, N₂ − 1, then g[m, n] is defined for m = 0, 1, …, M₁ + M₂ − 2 and n = 0, 1, …, N₁ + N₂ − 2.
2-D convolution involves:
1. Rotate h[m, n] by 180° to get h[−m, −n].
2. Shift the origin of the rotated kernel to [m, n].
3. Multiply the overlapping elements and sum.
In the continuous domain,

f(x, y) * h(x, y) = ∫∫ f(x', y') h(x − x', y − y') dx' dy'
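The three discrete steps can be sketched directly. This is a slow, purely illustrative implementation (the function name is mine) using zero padding for the boundaries:

```python
import numpy as np

def conv2d(f, h):
    """Full 2-D convolution: rotate h by 180 degrees, slide it over f,
    multiply the overlapping samples and sum."""
    M1, N1 = h.shape
    M2, N2 = f.shape
    hr = h[::-1, ::-1]                           # step 1: 180-degree rotation
    out = np.zeros((M1 + M2 - 1, N1 + N2 - 1))
    fp = np.pad(f, ((M1 - 1, M1 - 1), (N1 - 1, N1 - 1)))
    for m in range(out.shape[0]):                # step 2: shift the kernel
        for n in range(out.shape[1]):
            # step 3: multiply overlapping elements and sum
            out[m, n] = np.sum(hr * fp[m:m + M1, n:n + N1])
    return out

g = conv2d(np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 2)))
# g is the full 3x3 convolution output
```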
Example
Causality
For a causal system, the present output depends only on present and past inputs; otherwise, the system is non-causal.

The concept of causality is also extended to two dimensions. Particularly important is the non-symmetrical half-plane (NSHP) model.
OUTLINE
FT, STFT, WS & DWT
Multi-Resolution Analysis (MRA)
Perfect Reconstruction Filter Banks
Filter Bank Implementation of DWT
Extension to the 2D Case (Images)
Applications in Denoising, Compression, etc.
Fourier Transform
F(ω) = ∫ f(t) e^{−jωt} dt

Fourier analysis breaks a signal down into constituent sinusoids of different frequencies. It has a serious drawback: in transforming to the frequency domain, time information is lost. Looking at the Fourier transform of a signal, it is impossible to tell when a particular event took place.
Short-Time Fourier Transform

F_STFT(τ, ω) = ∫ f(t) w(t − τ) e^{−jωt} dt

Take the FT of consecutive windowed segments of the signal. Each FT then provides the spectral content of that time segment. The difficulty lies in selecting the time window.
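A short sketch of the idea (the Hann window, segment length, and hop size are my own illustrative choices; selecting them is exactly the difficulty noted above). The test signal switches frequency half-way through: the plain FT shows both frequencies but not when the switch happened, while the STFT frames do:

```python
import numpy as np

def stft(f, win_len=64, hop=32):
    """FT of consecutive windowed segments of the signal."""
    w = np.hanning(win_len)
    frames = [f[i:i + win_len] * w
              for i in range(0, len(f) - win_len + 1, hop)]
    return np.array([np.fft.rfft(fr) for fr in frames])

t = np.arange(4096) / 4096.0
sig = np.where(t < 0.5, np.sin(2 * np.pi * 50 * t),
                        np.sin(2 * np.pi * 200 * t))
S = stft(sig)   # early frames peak at a lower bin than late frames
```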
NOTE: Low-frequency signals are better resolved in the frequency domain; high-frequency signals are better resolved in the time domain.
Uncertainty Theorem
Uncertainty theorem: we cannot determine the frequency and the time location of a signal event with absolute certainty (similar to Heisenberg's uncertainty principle involving the position and momentum of a particle). In the FT we use basis functions with infinite support and infinite energy. In the wavelet transform we localize both in time (through translation of the basis function) and in frequency (through scaling).
Example
Mother Wavelet
In wavelet analysis, we have a mother wavelet as the basic unit.
Daubechies
Haar
Shannon Wavelet
ψ(x) = (sin(2πx) − sin(πx)) / (πx)
Starting from the mother wavelet ψ(x), a basis is generated by translating to b and scaling by a:

ψ_{a,b}(x) = (1 / √|a|) ψ((x − b) / a)

Translation: f(x) → f(x − b)
Scaling: f(x) → (1 / √|a|) f(ax)
The signal is reconstructed from the wavelet coefficients W(a, b) by

f(x) = (1 / C_ψ) ∫∫ W(a, b) ψ_{a,b}(x) (da db / a²)

where the admissibility constant is

C_ψ = ∫ (|Ψ(ω)|² / |ω|) dω

and Ψ(ω) is the FT of ψ(x).
Admissibility Criterion
The requirement C_ψ < ∞ implies:
1) Ψ(0) = 0: the wavelet has zero DC component.
2) ψ(x) should be of finite energy, i.e., ψ(x) should have finite support (an asymptotically decaying signal).
3) ∫ (|Ψ(ω)|² / |ω|) dω < ∞: the wavelet is narrow-band.
Dyadic wavelet
With s0 = 2 and τ0 = 1 we obtain the dyadic wavelet family. The wavelet coefficients are inner products with the signal,

w_{m,n} = ⟨f(t), ψ_{m,n}(t)⟩

and the signal is expanded as

f(t) = Σ_m Σ_n w_{m,n} ψ_{m,n}(t)

where

ψ_{m,n}(t) = 2^{m/2} ψ(2^m t − n)
The Haar scaling function is

φ(t) = 1 for 0 ≤ t < 1, and 0 elsewhere

This scaling function is also scaled and translated to generate a family of scaling functions:

φ_{j,k}(t) = 2^{j/2} φ(2^j t − k)

A function f(t) can be generated using the basis set of translated scaling functions:

f(t) = Σ_k a_k φ_{j,k}(t)

In the case of the Haar basis, f(t) comprises all functions that are piecewise constant on dyadic intervals of width 2^{−j}. This set of functions is the span of {φ_{j,k}(t), k ∈ Z}; let this space be denoted V_j.
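A quick numeric sketch of these definitions (the sampling grid and the coefficient values 3 and 5 are arbitrary). A function built from the j = 1 Haar scaling basis is constant on each dyadic interval of width 1/2, i.e. it lies in V_1:

```python
import numpy as np

def haar_phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

def phi_jk(t, j, k):
    """Family phi_{j,k}(t) = 2^{j/2} phi(2^j t - k)."""
    return 2.0 ** (j / 2.0) * haar_phi(2.0 ** j * t - k)

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
s2 = np.sqrt(2)
f = (3.0 / s2) * phi_jk(t, 1, 0) + (5.0 / s2) * phi_jk(t, 1, 1)
# f equals 3 on [0, 0.5) and 5 on [0.5, 1)
```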
Requirements of MRA
Requirement 1: The scaling functions should be orthogonal with respect to their integer translates:

⟨φ_{j,k}(t), φ_{j,l}(t)⟩ = ∫ φ_{j,k}(t) φ_{j,l}(t) dt = 0 for l ≠ k
Requirement 2: The subspaces are nested,

… ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ V_2 ⊂ …

i.e., the subspace spanned by the scaling functions at a low scale is nested within the subspace spanned by those at a higher scale. In general, we can write the refinement equation

φ(t) = Σ_k h_k √2 φ(2t − k)

and, for the wavelet,

ψ(t) = Σ_k b_k √2 φ(2t − k)

For the Haar scaling function: h_k = 1/√2 for k = 0, 1, and 0 otherwise.
For the triangular (hat) scaling function: h_0 = h_2 = 1/(2√2), h_1 = 1/√2, and 0 otherwise.
Requirement 3
V_{−∞} ⊂ … ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ … ⊂ V_∞

All square-integrable functions can be represented with arbitrary precision; in particular,

V_∞ = L²(R)
Requirement 4
Each V_{j+1} is the direct sum of V_j and its orthogonal complement W_j, the wavelet subspace:

V_1 = V_0 ⊕ W_0
V_2 = V_1 ⊕ W_1 = V_0 ⊕ W_0 ⊕ W_1

and so on; here ⊕ denotes the direct sum. Thus at any scale, a function can be represented by a scaling (approximation) part and a number of wavelet (detail) parts.
The scaling function satisfies

φ(t) = Σ_k h_k √2 φ(2t − k)

and, for the wavelet bases,

ψ(t) = Σ_k b_k √2 φ(2t − k)

For Haar, ψ(t) is +1 on [0, 1/2) and −1 on [1/2, 1); ψ spans W_0, φ spans V_0, and both lie in V_1.
A function f_2(t) ∈ V_2 can be written as

f_2(t) = Σ_k c¹_k φ_{1,k}(t) + Σ_k d¹_k ψ_{1,k}(t)
       = Σ_k c⁰_k φ_{0,k}(t) + Σ_k d⁰_k ψ_{0,k}(t) + Σ_k d¹_k ψ_{1,k}(t)

and so on. How do we find the c and d coefficients? We have to learn a bit of filter-bank theory to answer that.
Two-Channel Filter Bank

Analysis side: the input f[n] is passed through a low-pass filter h0[n] and a high-pass filter h1[n], and each branch is downsampled by 2. Synthesis side: each branch is upsampled by 2, filtered by g0[n] (low-pass) and g1[n] (high-pass), and summed to give the reconstruction f̂[n].

Note that downsampling the input X(z) introduces aliasing. On the synthesis side, to cancel this aliasing, g0[n] and g1[n] can be selected through a simple relationship with h0[n] and h1[n].
Orthonormal filters

A class of perfect-reconstruction filters is needed for the filter-bank implementation of the discrete wavelet transform (DWT). These filters satisfy the relation

h1[n] = (−1)ⁿ h0[N − 1 − n]

where N is the tap length, required to be even. The synthesis filters are given by

g_i[n] = h_i[−n], i ∈ {0, 1}
The filter bank uses h0[n] and h1[n] with downsampling on the analysis side, and their time-reversed versions h0[−n] and h1[−n] with upsampling on the synthesis side, to recover f̂[n] = f[n].
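A compact numeric check of this perfect-reconstruction relationship, using circular (periodic) filtering and the Haar low-pass filter to keep the sketch short; the helper names are mine, not from the slides:

```python
import numpy as np

def circ_corr(x, h):
    """Circular correlation: y[m] = sum_k h[k] x[(m + k) mod N]."""
    return sum(c * np.roll(x, -k) for k, c in enumerate(h))

def circ_conv(x, h):
    """Circular convolution: y[m] = sum_k h[k] x[(m - k) mod N]."""
    return sum(c * np.roll(x, k) for k, c in enumerate(h))

def pr_filter_bank(f, h0):
    """Analyse with h0 and h1[n] = (-1)^n h0[N-1-n], downsample by 2,
    upsample, then synthesise with g_i[n] = h_i[-n] -- implemented by
    swapping correlation for convolution -- and sum the branches."""
    n = np.arange(len(h0))
    h1 = ((-1) ** n) * h0[::-1]
    a = circ_corr(f, h0)[::2]            # low-pass branch
    d = circ_corr(f, h1)[::2]            # high-pass branch
    ua = np.zeros_like(f); ua[::2] = a   # upsample by 2
    ud = np.zeros_like(f); ud[::2] = d
    return circ_conv(ua, h0) + circ_conv(ud, h1)

f = np.array([1.0, 4.0, -2.0, 3.0, 0.0, 5.0, 1.0, 2.0])
rec = pr_filter_bank(f, np.array([1.0, 1.0]) / np.sqrt(2))  # Haar h0
# rec equals f up to floating-point error
```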
With a slight change of notation, the scaling and wavelet equations become

φ(t) = Σ_k h_φ[k] √2 φ(2t − k)
ψ(t) = Σ_k h_ψ[k] √2 φ(2t − k)

and a function is expanded at scale j as

f(t) = Σ_k c_{j,k} φ_{j,k}(t) + Σ_k d_{j,k} ψ_{j,k}(t)

where φ_{j,k}(t) and ψ_{j,k}(t) are orthogonal to each other.
Contd.

Using the orthogonality of the scaling and wavelet bases, the coarser coefficients follow from c_{j+1,l} by correlating with h_φ[k] and keeping alternate points (downsampling by 2):

c_{j,k} = Σ_l h_φ[l − 2k] c_{j+1,l}

Similarly, correlating with h_ψ[k]:

d_{j,k} = Σ_l h_ψ[l − 2k] c_{j+1,l}
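One analysis step of this recursion can be sketched directly. Boundary samples are simply truncated, purely for illustration, and the test data are arbitrary; with the Haar filters the step reduces to pairwise averaging and differencing:

```python
import numpy as np

def analysis_step(c_fine, h_phi, h_psi):
    """One DWT analysis step: c_j[k] = sum_l h_phi[l - 2k] c_{j+1}[l]
    and d_j[k] = sum_l h_psi[l - 2k] c_{j+1}[l].  The shift by 2k is
    the 'alternate points' downsampling."""
    L = len(h_phi)
    n_out = (len(c_fine) - L) // 2 + 1
    c = np.array([sum(h_phi[l - 2 * k] * c_fine[l]
                      for l in range(2 * k, 2 * k + L))
                  for k in range(n_out)])
    d = np.array([sum(h_psi[l - 2 * k] * c_fine[l]
                      for l in range(2 * k, 2 * k + L))
                  for k in range(n_out)])
    return c, d

s2 = np.sqrt(2)
c, d = analysis_step(np.array([4.0, 6.0, 5.0, 1.0]),
                     [1 / s2, 1 / s2],       # Haar h_phi
                     [1 / s2, -1 / s2])      # Haar h_psi
```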
Properties of h_φ[k]: integrating the scaling equation with the substitution u = 2t − k (so dt = du/2) and using ∫ φ(u) du = 1 gives

1 = Σ_k h_φ[k] (√2 / 2) ∫ φ(u) du  ⇒  Σ_k h_φ[k] = √2      (1)

Similarly, using the unit energy ∫ φ²(x) dx = 1,

Σ_k h_φ²[k] = 1      (2)
Due to the orthogonality of the scaling function and its integer translates,

∫ φ(x) φ(x − m) dx = δ[m]      (3)

which translates to the filter condition

Σ_k h_φ[k] h_φ[k − 2m] = δ[m]

Considering the orthogonality of the scaling and wavelet functions at a particular scale,

h_ψ[k] = (−1)ᵏ h_φ[N − 1 − k]

Hence, h_φ[k] and h_ψ[k] form a perfect-reconstruction orthonormal filter bank.
Solving conditions (1)-(3) for a 4-tap filter (Σ from k = 0 to 3 of h_φ[k] = √2 and Σ from k = 0 to 3 of h_φ²[k] = 1) yields the Daubechies D4 coefficients:

h_φ[0] = (1 + √3) / (4√2)
h_φ[1] = (3 + √3) / (4√2)
h_φ[2] = (3 − √3) / (4√2)
h_φ[3] = (1 − √3) / (4√2)
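These coefficients can be checked numerically against conditions (1)-(3):

```python
import numpy as np

# Daubechies 4-tap (D4) scaling filter.
r3 = np.sqrt(3)
h = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * np.sqrt(2))

sum_ok    = np.isclose(h.sum(), np.sqrt(2))          # condition (1)
energy_ok = np.isclose((h ** 2).sum(), 1.0)          # condition (2)
shift_ok  = np.isclose(h[0]*h[2] + h[1]*h[3], 0.0)   # condition (3), m = 1
```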
The nesting V_2 ⊃ V_1 ⊃ V_0 with detail spaces W_1, W_0 gives a filter cascade. The process of approximation from the highest resolution can be shown as follows: the input f[k] at the highest resolution is filtered by h_φ[k] and downsampled by 2 to give an approximation, and filtered by h_ψ[k] and downsampled by 2 to give the first detail signal (Detail 2); the approximation is then split again into a coarser approximation and the next detail (Detail 1), and so on down to the lowest resolution.
Reconstruction

Synthesis filter banks can be applied to get back the original signal: the lowest-resolution approximation and each detail branch are (after any processing, e.g. for denoising or compression) upsampled by 2, filtered by the synthesis filters g_φ[k] and g_ψ[k], and summed level by level to produce the reconstructed signal.
2D Case
For the 2-dimensional case, the separability property enables the use of 1-D filters:

ψ(t1, t2) = ψ1(t1) ψ2(t2)

The corresponding filters are applied first along one dimension and then along the other: the LPF and HPF operations are done row-wise first, and then column-wise. This can be explained with the following figure.
The input f[m, n] is low-pass and high-pass filtered and downsampled by 2 along the rows; each branch is then low-pass and high-pass filtered and downsampled by 2 along the columns, producing four subbands:

LL: scaling (approximation) coefficients
LH, HL: wavelet (detail) coefficients, each sensitive to one orientation
HH: wavelet (detail) coefficients for diagonal detail
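A compact sketch of one decomposition level, using Haar filters and circular (periodic) extension to keep the code short; the helper names and the row/column order are illustrative choices:

```python
import numpy as np

def dwt2_level(f, h0, h1):
    """One separable 2-D DWT level: filter and downsample the rows,
    then the columns, giving the LL, LH, HL, HH subbands."""
    def fd(x, h):
        # circular filtering along the last axis, keep even samples
        y = sum(c * np.roll(x, -k, axis=-1) for k, c in enumerate(h))
        return y[..., ::2]
    lo, hi = fd(f, h0), fd(f, h1)               # row-wise LP / HP
    ll, lh = fd(lo.T, h0).T, fd(lo.T, h1).T     # column-wise on LP branch
    hl, hh = fd(hi.T, h0).T, fd(hi.T, h1).T     # column-wise on HP branch
    return ll, lh, hl, hh

s2 = np.sqrt(2)
img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = dwt2_level(img, [1 / s2, 1 / s2], [1 / s2, -1 / s2])
# with orthonormal filters, the total energy of the four subbands
# equals the energy of the image
```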
Original image
Decomposition at level 1
Decomposition at level 2