A PROJECT REPORT Submitted in partial fulfillment of the requirements for the award of the degree of
This is to certify that this project work entitled IMAGE COMPRESSION AND
DECOMPRESSION USING NEURAL NETWORKS is being submitted in
partial fulfillment for the award of the degree of BACHELOR OF TECHNOLOGY in ELECTRONICS AND COMMUNICATION ENGINEERING, Jawaharlal Nehru Technological University, and is a record of bonafide work carried out by
INDEX
1. ABSTRACT
2. INTRODUCTION
3. THEORY
   3.1 NEURAL NETWORKS
       Artificial Neural Networks
       The Analogy to the Brain
       The Biological Neuron
       The Artificial Neuron
       Design
       Layers
       Communication and Types of Connections
       Learning Laws
       Applications of Neural Networks
   3.2 IMAGE PROCESSING
       Image Compression
       Principles of Image Compression
       Performance Measurement of Image Compression
       Compression Standards
4. IMAGE COMPRESSION WITH NEURAL NETWORKS
   Back-Propagation Image Compression
   Hierarchical Back-Propagation Neural Network
   Adaptive Back-Propagation Neural Network
   Hebbian Learning Based Image Compression
   Vector Quantization Neural Networks
   Predictive Coding Neural Networks
5. PROPOSED IMAGE COMPRESSION USING NEURAL NETWORKS
   Levenberg-Marquardt Algorithm
   Training Procedure
6. IMPLEMENTATION OF IMAGE COMPRESSION AND DE-COMPRESSION USING MATLAB
   MATLAB Code
   Functions Used in the MATLAB Program
   Results
7. CONCLUSION
8. FUTURE SCOPE
9. BIBLIOGRAPHY
ABSTRACT
Uncompressed multimedia (graphics, audio and video) data requires considerable storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor speeds, and digital communication system performance, demand for data storage capacity and data-transmission bandwidth continues to outstrip the capabilities of available technologies. The recent growth of data-intensive multimedia-based web applications has only sustained the need for more efficient storage and communication technology. Apart from the existing image compression technology represented by the series of JPEG, MPEG and H.26x standards, new technologies such as neural networks and genetic algorithms are being developed to explore the future of image coding. Successful applications of neural networks based on the basic back-propagation algorithm have now become well established, as have other aspects of neural network involvement in this technology. Here we present an extensive survey on the development of neural networks for image compression. One of the most promising ways to utilize the power of neural networks for image compression lies in (a) the selection of an efficient multi-layered network, (b) the selection of training methods, and (c) the test vector. Based on these criteria, networks are trained and implemented. In this project a literature survey has been carried out to find an efficient multilayered neural network, which has been trained and tested using MATLAB for a test case of an image of size 64x64; the trained weights and biases have been recorded.
INTRODUCTION:-
Neural networks are inherently adaptive systems; they are suitable for handling non-stationarities in image data. Artificial neural networks can be employed with success for image compression. The advantages of realizing a neural network in digital hardware are: fast multiplication, leading to fast updates of the neural network; flexibility, because different network architectures are possible; and scalability, as the proposed hardware architecture can be used for arbitrarily large networks, constrained by the number of neurons in one layer. The greatest potential of neural networks is the high-speed processing that is provided through massively parallel VLSI implementations. The choice to build a neural network in digital hardware comes from several advantages that are typical for digital systems:
1. Low sensitivity to electric noise and temperature.
2. Weight storage is no problem.
3. The availability of user-configurable, digital field programmable gate arrays.
4. Well-understood design principles that have led to new, powerful design tools.
The crucial problems of neural network hardware are fast multiplication, building a large number of connections between neurons, and fast memory access of weight storage or nonlinear function look up tables.
The most important part of a neuron is the multiplier, which performs high-speed pipelined multiplication of synaptic signals with weights. As each neuron has only one multiplier, the degree of parallelism is node parallelism. Each neuron has a local weight ROM (as it performs the feed-forward phase of the back-propagation algorithm) that stores as many values as there are connections to the previous layer. An accumulator is used to add signals from the pipeline to the neuron's bias value, which is stored in its own register.
The aim is to design and implement image compression using a neural network to achieve a better SNR and better compression levels. The compression is first obtained by modeling the neural network in MATLAB. This is for obtaining offline training.
Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
Real Time Operation: ANN computations may be carried out in parallel, and special
hardware devices are being designed and manufactured which take advantage of this capability.
biological counterpart in the brain. However, neural networks have a strong similarity to the brain and therefore a great deal of the terminology is borrowed from neuroscience.
The various inputs to the network are represented by the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight; the weights are represented by w(n). In the simplest case, these products are simply summed, fed through a transfer function to generate a result, and then output. Even though all artificial neural networks are constructed from this basic building block, the details of these building blocks vary from network to network.
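As a minimal illustration of this building block, the following MATLAB sketch computes the output of one artificial neuron; the input values, weights and bias are arbitrary illustrative numbers, and the logistic sigmoid stands in for whatever transfer function a particular network uses.

% One artificial neuron: weighted sum of inputs plus bias,
% passed through a transfer function (here the logistic sigmoid).
x = [0.5; 0.2; 0.9];           % example inputs x(n)
w = [0.4, -0.7, 0.1];          % example connection weights w(n)
b = 0.05;                      % bias term
net_input = w*x + b;           % sum of the weighted inputs
y = 1/(1 + exp(-net_input));   % transfer function generates the result
disp(y)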
3.1.5 Design
The developer goes through a period of trial and error in the design decisions before coming up with a satisfactory design. The design issues in neural networks are complex and are the major concern of system developers. Designing a neural network consists of:
- Arranging neurons in various layers.
- Deciding the type of connections among neurons of different layers, as well as among the neurons within a layer.
- Deciding the way a neuron receives input and produces output.
- Determining the strength of connections within the network, by allowing the network to learn the appropriate values of the connection weights using a training data set.
The process of designing a neural network is an iterative process.
3.1.6 Layers
Biologically, neural networks are constructed in a three-dimensional way from microscopic components. These neurons seem capable of nearly unrestricted interconnections. This is not true of any man-made network. Artificial neural networks are simple clusterings of primitive artificial neurons. This clustering occurs by creating layers, which are then connected to one another. How these layers connect may also vary. Basically, all artificial neural networks have a similar structure of topology. Some of the neurons interface with the real world to receive their inputs, and other neurons provide the real world with the network's outputs. All the rest of the neurons are hidden from view.
Figure 3.3 THREE-LAYER PERCEPTRON FOR IMAGE COMPRESSION
As the figure above shows, the neurons are grouped into layers. The input layer consists of neurons that receive input from the external environment. The output layer consists of neurons that communicate the output of the system to the user or external environment. There are usually a number of hidden layers between these two layers; the figure above shows a simple structure with only one hidden layer. When the input layer receives the input, its neurons produce output, which becomes input to the other layers of the system. The process continues until a certain condition is satisfied or until the output layer is invoked and fires its output to the external environment. To determine the number of hidden neurons the network should have to perform its best, one is often left with the method of trial and error. If the number of hidden neurons is increased too much, an overfit occurs; that is, the net will have problems generalizing. The training set of data will be memorized, making the network useless on new data sets.
Resonance. The layers have bi-directional connections, and they can continue sending messages across the connections a number of times until a certain condition is achieved.
On-center/off-surround. A neuron within a layer has excitatory connections to itself and its immediate neighbors, and has inhibitory connections to other neurons. One can imagine this type of connection as a competitive gang of neurons. Each gang excites itself and its gang members and inhibits all members of other gangs. After a few rounds of signal interchange, the neurons with an active output value will win, and are allowed to update their own and their gang members' weights. (There are two types of connections between two neurons, excitatory or inhibitory. In the excitatory connection, the output of one neuron increases the action potential of the neuron to which it is connected. When the connection type between two neurons is inhibitory, the output of the neuron sending a message reduces the activity or action potential of the receiving neuron. One causes the summing mechanism of the next neuron to add while the other causes it to subtract. One excites while the other inhibits.)
3.1.8 Learning.
The brain basically learns from experience. Neural networks are sometimes called machine-learning algorithms, because changing their connection weights (training) causes the network to learn the solution to a problem. The strength of a connection between the neurons is stored as a weight-value for the specific connection. The system learns new knowledge by adjusting these connection weights. The learning ability of a neural network is determined by its architecture and by the algorithmic method chosen for training. The training method usually consists of one of three schemes:
1. Unsupervised learning. Uses no external teacher and is based only upon local information. It is also referred to as self-organization, in the sense that it self-organizes data presented to the network and detects their emergent collective properties. Paradigms of unsupervised learning are Hebbian learning and competitive learning. Another aspect of learning concerns the distinction, or not, of a separate phase during which the network is trained, and a subsequent operation phase. We say that a neural network learns off-line if the learning phase and the operation phase are distinct. A neural network learns on-line if it learns and operates at the same time. Usually, supervised learning is performed off-line, whereas unsupervised learning is performed on-line. The hidden neurons must find a way to organize themselves without help from the outside. In this approach, no sample outputs are provided to the network against which it can measure its predictive performance for a given vector of inputs. This is learning by doing.
2. Reinforcement learning. This incorporates an external teacher, so that each output unit is told what its desired response to input signals ought to be. During the learning process global information may be required. Paradigms of supervised learning include error-correction learning, reinforcement learning and stochastic learning. An important issue concerning supervised learning is the problem of error convergence, i.e. the minimization of error between the desired and computed unit values. The aim is to determine a set of weights which minimizes the error.
One well-known method, which is common to many learning paradigms, is least mean square (LMS) convergence. This method works on reinforcement from the outside. The connections among the neurons in the hidden layer are randomly arranged, then reshuffled as the network is told how close it is to solving the problem. Reinforcement learning is also called supervised learning, because it requires a teacher. The teacher may be a training set of data or an observer who grades the performance of the network results. Both unsupervised and reinforcement learning suffer from relative slowness and inefficiency, relying on random shuffling to find the proper connection weights.
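As an illustrative sketch of the LMS idea (not code from this project), one weight update for a single linear neuron can be written in MATLAB as follows; the data and learning rate are arbitrary.

% One least-mean-square (LMS) update for a single linear neuron:
% move the weights in the direction that reduces the squared error.
x  = [0.3; 0.8];       % input vector
t  = 0.5;              % desired output supplied by the "teacher"
w  = [0.1, -0.2];      % current weights
lr = 0.05;             % learning rate
y  = w*x;              % computed output
e  = t - y;            % error between desired and computed values
w  = w + 2*lr*e*x';    % LMS rule: step along the negative error gradient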
3. Back propagation. This method has proven highly successful in the training of multilayered neural nets. The network is not just given reinforcement for how it is doing on a task; information about errors is also filtered back through the system and is used to adjust the connections between the layers, thus improving performance. It is a form of supervised learning.
Off-line or On-line
One can categorize the learning methods into yet another group: off-line or on-line. When the system uses input data to change its weights to learn the domain knowledge, the system is in training mode or learning mode. When the system is being used as a decision aid to make recommendations, it is in operation mode; this is also sometimes called recall.
Off-line: In the off-line learning methods, once the system enters the operation mode, its weights are fixed and do not change any more. Most of the networks are of the off-line learning type.
On-line: In on-line or real-time learning, when the system is in operation mode (recall), it continues to learn while being used as a decision tool. This type of learning has a more complex design structure.
layer is reached. The network type called feed-forward back-propagation derives its name from this method of computing the error term. This rule is also referred to as the Widrow-Hoff Learning Rule and the Least Mean Square Learning Rule.
Kohonen's Learning Law. This procedure, developed by Teuvo Kohonen, was inspired by learning in biological systems. In this procedure, the neurons compete for the opportunity to learn, or to update their weights. The processing neuron with the largest output is declared the winner and has the capability of inhibiting its competitors as well as exciting its neighbors. Only the winner is permitted output, and only the winner plus its neighbors are allowed to update their connection weights. The Kohonen rule does not require a desired output; therefore it is implemented in the unsupervised methods of learning. Kohonen has used this rule, combined with the on-center/off-surround intra-layer connection, to create the self-organizing neural network, which
chemical compound, a word, or the financial profile of a customer) as one of numerous possible categories that, in return, may trigger the recommendation of an action (such as a treatment plan or a financial plan). A company called Nestor has used neural networks for financial risk assessment for mortgage insurance decisions, categorizing the risk of loans as good or bad. Neural networks have also been applied to convert text to speech; NETtalk is one of the systems developed for this purpose. Image processing and pattern recognition form an important area of neural networks, probably one of the most actively researched areas of neural networks. Another area of research for application of neural networks is character recognition and handwriting recognition. This area has use in banking, credit card processing and other financial services, where reading and correctly recognizing handwriting on documents is of crucial significance. The pattern recognition capability of neural networks has been used to read handwriting in processing checks, where a human must normally enter the amount into the system; a system that could automate this task would expedite check processing and reduce errors. One of the best-known applications is the bomb detector installed in some U.S. airports. This device, called SNOOPE, determines the presence of certain compounds from the chemical configurations of their components. In documents from the International Joint Conference on Neural Networks, one can find reports on using neural networks in areas ranging from robotics, speech, signal processing, vision, character recognition to musical composition, detection of heart malfunction and epilepsy, fish detection and classification, optimization, and scheduling. Basically, most applications of neural networks fall into the following five categories:
Prediction: Uses input values to predict some output, e.g. pick the best stocks in the market, predict weather, identify people with cancer risk.
Classification: Uses input values to determine the classification, e.g. is the input the letter A; is a blob of video data a plane, and what kind of plane is it.
Data association: Like classification, but also recognizes data that contains errors, e.g. not only identify the characters that were scanned, but identify when the scanner is not working properly.
Data conceptualization: Analyze the inputs so that grouping relationships can be inferred, e.g. extract from a database the names of those most likely to buy a particular product.
Data filtering: Smooth an input signal, e.g. take the noise out of a telephone signal.
3.2 IMAGE PROCESSING
The importance of visual communication has increased tremendously in the last few decades. The progress in microelectronics and computer technology, together with the creation of networks operating with various channel capacities, is the basis of an infrastructure for a new era of telecommunications. New applications are preparing a revolution in the everyday life of our modern society. Communication-based applications include ISDN surveillance. Storage-based audiovisual applications include training, education, entertainment, advertising, video mail and document annotation. Essential for the introduction of new communication services is low cost. Visual information is one of the richest and most bandwidth-consuming modes of communication. The digital representation of raw video requires a large amount of data, and the transmission of this raw video data requires a large transmission bandwidth. To reduce the transmission and storage requirements, the video must be handled in compressed formats. To meet the requirements of the new applications, powerful data compression techniques are needed to reduce the global bit rate drastically, even in the presence of growing communication channels offering increased bandwidth. The issue of quality is of prime importance in most applications of compression. In fact, although most applications require high compression ratios, this requirement is in general in contradiction with the desire for high quality in the resulting pictures. The standardization of video coding techniques has become a high priority because only a standard can reduce the high cost of video compression codecs and resolve the critical problem of interoperability of equipment from different manufacturers. The existence of standards is often the trigger to the volume production of the integrated circuits (VLSI) necessary for significant cost reductions. Bodies such as the International Standards Organization (ISO) and the International Telecommunication Union (ITU-T) today recommend the video compression standards in practice.
Image enhancement is the use of image processing algorithms to remove certain types of distortion in an image. Removing noise, making the edge structures in the image stand out, or any other operation that makes the image look better constitutes enhancement. The most widely used algorithms for enhancement are based on pixel functions that are known as window operations. A window operation performed on an image is nothing more than the process of examining the pixels in a certain region of the image, called the window region, and computing some type of mathematical function derived from the pixels in the window.
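A minimal sketch of such a window operation in MATLAB, assuming a standard grayscale test image and a 3x3 mean filter as the mathematical function (a simple smoothing operation; the image borders are left untouched for brevity):

% 3x3 window (mean-filter) operation over a grayscale image I.
I = double(imread('cameraman.tif'));   % any grayscale image will do
[R, C] = size(I);
J = I;                                 % output image
for y = 2:R-1
    for x = 2:C-1
        win = I(y-1:y+1, x-1:x+1);     % the 3x3 window region
        J(y, x) = mean(win(:));        % function of the window pixels
    end
end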
of the image. Successful development of the systems in this area involve not only signal processing but also other disciplines such as Artificial intelligence.
source information or the amount of time necessary to transmit it for a given channel capacity. The mapping from the source symbols into fewer target symbols is referred to as compression, and the reverse as decompression. Image compression refers to the task of reducing the amount of data required to store or transmit an image. At the system input, the image is encoded into its compressed form by the image coder. The compressed image may then be subjected to further digital processing, such as error control coding, encryption or multiplexing with other data sources, before being used to modulate the analog signal that is actually transmitted through the channel or stored in a storage medium. At the system output, the image is processed step by step to undo each of the operations that were performed on it at the system input. At the final step, the image is decoded into its original uncompressed form by the image decoder. If the reconstructed image is identical to the original image, the compression is said to be lossless; otherwise, it is lossy.
operations include additions, subtractions, multiplications, divisions and shift operations.
3. Distortion measurement (DM). For a lossy compression algorithm, DM is used to measure how much information has been lost when a reconstructed version of a digital image is produced from the compressed data. The common distortion measure is the mean square error between the original data and the reconstructed data. The signal-to-noise ratio is also used to measure the performance of lossy compression algorithms.
Wavelet
Wavelets are functions used in representing data or other functions. They analyze the signal at different frequencies with different resolutions, and are well suited to images containing sharp discontinuities. Wavelet compression transforms the entire image, differently from JPEG, and is more natural as it follows the shape of the objects in the picture. Special software is necessary for viewing, since this is a non-standardized compression method.
JPEG2000
Based on Wavelet technology. Rarely used.
GIF
A graphic format used widely with Web images. It is limited to 256 colors and is a good standard for images which are not too complex. It is not recommended for network cameras because the compression ratio is too limited.
$$X(u) = \frac{C(u)}{2} \sum_{x=0}^{7} f(x)\,\cos\!\left[\frac{(2x+1)u\pi}{16}\right]$$

where C(u) = 0.707 for u = 0 and C(u) = 1 otherwise. In 1992 JPEG established the first international standard for still image compression, where the encoders and decoders are DCT-based. The JPEG standard specifies three modes, namely sequential, progressive, and hierarchical, for lossy encoding, and one mode of lossless encoding. The baseline JPEG coder, which is the sequential encoding in its simplest form, is briefly discussed here. Figs. 3.1 and 3.2 show the key processing steps in such an encoder and decoder for grayscale images. Color image compression can be approximately regarded as compression of multiple grayscale images, which are either compressed entirely one at a time, or are compressed by alternately interleaving 8 x 8 sample blocks from each in turn. In this article, we focus on grayscale images only.
The DCT-based encoder is essentially compression of a stream of 8 x 8 blocks of image samples. Each 8 x 8 block makes its way through each processing step, and yields output in compressed form into the data stream. Because adjacent image pixels are highly correlated, the forward DCT (FDCT) processing step lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies; for a typical sample block, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded. In principle, the DCT introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded. After output from the FDCT, each of the 64 DCT coefficients is uniformly quantized in conjunction with a carefully designed 64-element quantization table (QT). At the decoder, the quantized values are multiplied by the corresponding QT elements to recover the original unquantized values. After quantization, all of the quantized coefficients are ordered into the zigzag sequence. This ordering helps to facilitate entropy encoding by placing low-frequency non-zero coefficients before high-frequency coefficients. The DC coefficient, which contains a significant fraction of the total image energy, is differentially encoded.
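A hedged sketch of the FDCT and quantization steps on a single 8 x 8 block, using MATLAB's dct2/idct2 from the Image Processing Toolbox; the uniform quantization table here is a placeholder, not the JPEG luminance table.

% Forward DCT and uniform quantization of one 8x8 image block.
B   = double(imread('cameraman.tif'));
blk = B(1:8, 1:8) - 128;      % level-shift one 8x8 block of samples
C   = dct2(blk);              % FDCT concentrates energy at low frequencies
QT  = 16*ones(8);             % placeholder quantization table
Cq  = round(C ./ QT);         % quantized coefficients, mostly zero
Cr  = Cq .* QT;               % decoder multiplies back by the QT elements
blk_rec = idct2(Cr) + 128;    % inverse DCT reconstructs the block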
Entropy coding (EC) achieves additional compression losslessly by encoding the quantized DCT coefficients more compactly based on their statistical characteristics. The JPEG proposal specifies both Huffman coding and arithmetic coding. The baseline sequential codec uses Huffman coding, but codecs with both methods are specified for all modes of operation. Arithmetic coding, though more complex, normally achieves 5-10% better compression than Huffman coding.
4.1 Basic Back Propagation Neural Network:-
The neural network structure is illustrated in Fig. 4.1. Three layers, one input layer, one output layer and one hidden layer, are designed. Both the input layer and the output layer are fully connected to the hidden layer. Compression is achieved by making the value of K, the number of neurons at the hidden layer, less than that of the neurons at both the input and output layers. The input image is split up into blocks or vectors of 8x8, 4x4 or 16x16 pixels. The input vector is N-dimensional; the connections from the input layer to each neuron at the hidden layer can be represented by {w_ji : j = 1, 2, ..., K and i = 1, 2, ..., N}, which can also be described by a matrix of K x N. From the hidden layer to the output layer, the connections can be represented by {w'_ij : 1 <= i <= N, 1 <= j <= K}, which is another weight matrix of N x K. Image compression is achieved by training the network in such a way that the coupling weights {w_ji} scale the input vector of N dimensions into a narrow channel of K dimensions (K < N) at the hidden layer and produce the optimum output value which makes the quadratic error between input and output minimum. In accordance with the neural network structure, the operation can be described as follows:
For encoding:

$$h_j = \sum_{i=1}^{N} w_{ji}\, x_i, \qquad 1 \le j \le K$$

and for decoding:

$$\bar{x}_i = \sum_{j=1}^{K} w'_{ij}\, h_j, \qquad 1 \le i \le N$$

where $x_i \in [0,1]$ denotes the normalized pixel values for grey-scale images with grey levels [0,255]. The reason for using normalized pixel values is that
neural networks can operate more efficiently when both their inputs and outputs are limited to a range of [0,1].
The above linear networks can also be made non-linear if a transfer function such as the sigmoid is added to the hidden layer and the output layer to scale down the summations in the above equations. With this basic back-propagation neural network, compression is conducted in two phases: training and encoding. In the first phase, a set of image samples is used to train the network via the back-propagation learning rule, using each input vector as the desired output. This is equivalent to compressing the input into the narrow channel represented by the hidden layer and then reconstructing the input from the hidden layer to the output layer. The second phase simply involves the entropy coding of the state vector $h_j$ at the hidden layer. In cases where adaptive training is conducted, the entropy coding of the coupling weights is also required in order to catch up with some input characteristics that are not encountered at the training stage. The entropy coding is normally designed as simple fixed-length binary coding, although many advanced variable-length entropy coding algorithms are available.
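A minimal sketch of this training phase, consistent with the newff/train/sim functions documented later in this report; the block size (4 x 4, so N = 16), the hidden size K = 4, and the random training data are illustrative assumptions.

% Basic back-propagation compression network: N inputs -> K hidden -> N outputs.
% Each input vector doubles as its own desired output.
N = 16;  K = 4;                        % 4x4 pixel blocks, 4:1 compression
X = rand(N, 200);                      % 200 normalized training blocks in [0,1]
net = newff(minmax(X), [K N], {'logsig','logsig'}, 'trainlm');
net.trainParam.goal = 1e-5;            % train until the MSE is small enough
net = train(net, X, X);                % input used as the desired output
Y = sim(net, X);                       % reconstructed blocks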
This neural network development, in fact, is in the direction of K-L transform technology, which actually provides the optimum solution for all linear narrow channel type of image compression neural networks. Equations (1) and (2) are represented in matrix form:
$$[h] = [W][x] \qquad\text{and}\qquad [\bar{x}] = [W'][h]$$

for encoding and decoding respectively.
The K-L transform maps input images into a new vector space where all the coefficients in the new space are de-correlated. This means that the covariance matrix of the new vectors is a diagonal matrix whose elements along the diagonal are eigen-values of the covariance matrix of the original input vectors. Let $e_i$ and $\lambda_i$, $i = 1, 2, \ldots, n$, be the eigen-vectors and eigen-values of $c_x$, the covariance matrix for the input vector x, with the eigen-values arranged in descending order so that $\lambda_i \ge \lambda_{i+1}$ for $i = 1, 2, \ldots, n-1$. To extract the principal components, the K eigen-vectors corresponding to the K largest eigen-values of $c_x$ are used to form the transform matrix $[A_K]$. All eigen-vectors in $[A_K]$ are ordered in such a way that the first row of $[A_K]$ is the eigen-vector corresponding to the largest eigen-value. Hence, the forward K-L transform or encoding can be defined as:
$$[y] = [A_K]\left([x] - [m_x]\right)$$

and the inverse K-L transform or decoding can be defined as:

$$[\bar{x}] = [A_K]^T [y] + [m_x]$$
where $[m_x]$ is the mean value of [x] and $[\bar{x}]$ represents the reconstructed vectors or image blocks. Thus the mean square error between x and $\bar{x}$ is given by the following equation:

$$e_m = E\left\{(x - \bar{x})^2\right\} = \sum_{j=K+1}^{n} \lambda_j$$

where the statistical mean value E{.} is approximated by the average value over all the input vector samples which, in image coding, are all the non-overlapping blocks of 4x4 or 8x8 pixels. Therefore, by selecting the K eigen-vectors associated with the largest eigen-values to run the K-L transform over the input pixels, the resulting errors between the reconstructed image and the original one can be minimized, due to the fact that the values of $\lambda_j$ decrease monotonically. From the comparison between the equation pair (3)-(4) and the equation pair (5)-(6), it can be concluded that the linear neural network reaches the optimum solution whenever the following condition is satisfied:
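A sketch of the K-L transform over block vectors in MATLAB; the random data stand in for the non-overlapping image blocks mentioned above.

% K-L transform: keep the K eigenvectors of the covariance matrix with the
% largest eigenvalues, then encode and decode the block vectors.
X   = rand(16, 500);                    % 500 sample vectors (4x4 blocks)
mx  = mean(X, 2);                       % mean vector [m_x]
Xc  = X - repmat(mx, 1, 500);           % mean-removed data
Cx  = cov(Xc');                         % covariance matrix c_x
[V, D] = eig(Cx);
[lam, idx] = sort(diag(D), 'descend');  % eigenvalues in descending order
K   = 4;
AK  = V(:, idx(1:K))';                  % rows of [A_K] = principal eigenvectors
Y   = AK * Xc;                          % forward K-L transform (encoding)
Xr  = AK' * Y + repmat(mx, 1, 500);     % inverse K-L transform (decoding)
err = mean(sum((X - Xr).^2, 1));        % approximately sum(lam(K+1:end))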
$$[W'][W] = [A_K]^T [A_K]$$
Under this circumstance, the neuron weights from input to hidden and from hidden to output can be described respectively as follows:
The basic back-propagation network can be further extended to construct a hierarchical neural network by adding two more hidden layers to the existing network, in which the three hidden layers are termed the combiner layer, the compressor layer and the decombiner layer. The structure is shown in Figure 4.2. The idea is to exploit the correlation between pixels with the inner hidden layer and the correlation between blocks of pixels with the outer hidden layers. From the input layer to the combiner layer, and from the decombiner layer to the output layer, local connections are designed, which has the same effect as M fully connected neural sub-networks.
Figure 4.2 HIERARCHICAL NEURAL NETWORK STRUCTURE
Training such a neural network can be conducted in terms of: (i) Outer Loop Neural Network (OLNN) Training; (ii) Inner Loop Neural Network (ILNN) Training; and (iii) Coupling weight allocation for the Overall Neural Network.
The adaptive back-propagation neural network is designed to make the neural network compression adaptive to the content of the input image. The general structure for a typical adaptive scheme is illustrated in Fig. 4.3, in which a group of neural networks with an increasing number of hidden neurons (from h_min to h_max) is designed. The basic idea is to classify the input image blocks into a few sub-sets with different features according to their complexity measurement. A fine-tuned neural network then compresses each sub-set.
Figure 4.3 ADAPTIVE NEURAL NETWORK STRUCTURE
Training of such a neural network can be designed as: (a) parallel training; (b) serial training; and (c) activity-based training. The parallel training scheme applies the complete training set simultaneously to all neural networks and uses the S/N (signal-to-noise) ratio to roughly classify the image blocks into the same number of sub-sets as there are neural networks. After this initial coarse classification is completed, each neural network is then further trained by its corresponding refined sub-set of training blocks.
Serial training involves an adaptive searching process to build up the necessary number of neural networks to accommodate the different patterns embedded inside the training images. Starting with a neural network with a predefined minimum number of hidden neurons, h_min, the neural network is roughly trained by all the image blocks. Based on the S/N ratio, further training is started on the next neural network, with the number of hidden neurons increased and the corresponding threshold readjusted for further classification. This process is repeated until the whole training set is classified into a maximum number of sub-sets corresponding to the same number of neural networks established. In the next two training schemes, two extra parameters, the activity A(P_j) and four directions, are defined to classify the training set rather than using the neural networks. Hence the back-propagation training of each neural network can be completed in one phase by its appropriate sub-set. The so-called activity of the block is defined as:
$$A(P_j) = \sum_{i,j} AP\left(P_j(i,j)\right)$$

and

$$AP\left(P_j(i,j)\right) = \sum_{r=-1}^{1} \sum_{s=-1}^{1} \left[P_j(i,j) - P_j(i+r,\, j+s)\right]^2 \qquad (11)$$
where AP(P_j(i,j)) is the activity of each pixel, which concerns its neighbouring 8 pixels as r and s vary from -1 to +1 in equation (11). Prior to training, all image blocks are classified into four classes according to their activity values, which are identified as very low, low, high and very high activities. Hence four neural networks are designed with an increasing number of hidden neurons to compress the four different sub-sets of input images after the training phase is completed. On top of the high activity parameter, a further feature extraction technique is applied by considering the four main directions presented in image details, i.e., horizontal, vertical and the two diagonal directions. These preferential direction features can be evaluated by calculating the values of the mean squared differences among neighbouring pixels along the four directions. For the image patterns classified as high activity, four further neural networks corresponding to the above directions are added to refine their structures and tune their learning processes to the preferential orientations of the input. Hence the overall neural network system is designed to have six neural networks, among which two correspond to the low activity and medium activity sub-sets and the other four networks correspond to the high activity and four direction classifications.
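A sketch of the activity measurement for one block, following equation (11) as described in the text; the block data are random and the class thresholds, which the text does not specify, are left out.

% Activity of an image block: for each interior pixel, sum the squared
% differences with its 8 neighbours (r and s running from -1 to +1).
P = rand(8);                 % one 8x8 image block
A = 0;
for i = 2:7
    for j = 2:7
        for r = -1:1
            for s = -1:1
                A = A + (P(i,j) - P(i+r, j+s))^2;
            end
        end
    end
end
% A is then compared against thresholds to label the block as
% very low, low, high or very high activity.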
$$W_i(t+1) = \frac{W_i(t) + \eta\, h_i(t)\, X(t)}{\left\| W_i(t) + \eta\, h_i(t)\, X(t) \right\|}$$

where $W_i(t+1) = \{W_{i1}, W_{i2}, \ldots, W_{iN}\}$ is the i-th new coupling weight vector in the next cycle (t+1); $1 \le i \le M$ and M is the number of output neurons; $\eta$ is the learning rate; $h_i(t)$ is the i-th output value; and X(t) is the input vector, corresponding to each individual image block.
$\|\cdot\|$ is the Euclidean norm, used to normalize the updated weights and make the learning stable. From this basic learning rule, a number of variations have been developed in the existing research.
$$z_i = \begin{cases} 1 & \text{if } d(x, W_i(t)) = \min_j d(x, W_j(t)) \\ 0 & \text{otherwise} \end{cases}$$
$$W_i(t+1) = W_i(t) + \eta\,\left(x - W_i(t)\right) z_i$$
where d(x, W_i(t)) is the distance in the L2 metric between the input vector x and the coupling weight vector $W_i(t) = \{w_{i1}, w_{i2}, \ldots, w_{iK}\}$; K = p x p; $\eta$ is the learning rate; and $z_i$ is its output. A so-called under-utilization problem occurs in competitive learning, which means some of the neurons are left out of the learning process and never win the competition. Various schemes have been developed to tackle this problem. The Kohonen self-organising neural network overcomes the problem by updating the winning neuron as well as those in its neighborhood. The frequency-sensitive competitive learning algorithm addresses the problem by keeping a record of how frequently each neuron wins, to ensure that all neurons in the network are updated an approximately equal number of times. To implement this scheme, the distance is modified to include the total number of times that the neuron i is the winner. The modified distance measurement is defined as:
$$d'\left(x, W_i(t)\right) = d\left(x, W_i(t)\right) \times u_i(t)$$
where $u_i(t)$ is the total number of winning times for neuron i up to the t-th training cycle. Hence, the more the i-th neuron wins the competition, the greater its distance from the next input vector; thus, its chance of winning the competition diminishes. This way of tackling the under-utilization problem does not provide interactive solutions in optimizing the code-book. Around the competitive learning scheme, fuzzy membership functions are introduced to control the transition from soft to crisp decisions during the code-book design process. The essential idea is that one input vector is assigned to a cluster only to a certain extent, rather than being either in or out. The fuzzy assignment is useful particularly at earlier training stages, which guarantees that all input vectors are included in the formation of the new code-book represented by all the neuron coupling weights. Representative examples include direct fuzzy competitive learning, fuzzy algorithms for learning vector quantization, and the distortion equalized fuzzy competitive learning algorithm, etc.
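An illustrative sketch of frequency-sensitive competitive learning: the winner is chosen using the distance scaled by its winning count, and only the winner's weight vector moves toward the input. The code-book size, block dimension, learning rate and random data are arbitrary assumptions.

% Frequency-sensitive competitive learning for a small code-book.
M  = 8;                          % number of output neurons (code-words)
K  = 16;                         % dimension of each input block vector
W  = rand(M, K);                 % coupling weight vectors W_i
u  = ones(M, 1);                 % winning counts u_i(t)
lr = 0.1;                        % learning rate
for t = 1:1000
    x = rand(1, K);                         % next training vector
    d = sum((W - repmat(x, M, 1)).^2, 2);   % L2 distances d(x, W_i)
    [dmin, i] = min(d .* u);                % modified distance d * u_i
    W(i,:) = W(i,:) + lr*(x - W(i,:));      % update only the winner
    u(i) = u(i) + 1;                        % neuron i won once more
end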
$$X_n = \sum_{i=1}^{N} a_i X_{n-i} + v_n = p + v_n$$
where p represents the predictive value for the pixel $X_n$ which is to be encoded in the next step. Its neighbouring pixels, $X_{n-1}, X_{n-2}, \ldots, X_{n-N}$, are used by the linear model to produce the predictive value. $v_n$ stands for the error between the input pixel and its predictive value; $v_n$ can also be modeled by a set of zero-mean independent and identically distributed random variables. Based on the above linear AR model, a multi-layer perceptron neural network can be constructed to achieve the design of its corresponding non-linear predictor, as shown in Fig. 4.4. For the pixel $X_n$ which is to be predicted, its N neighbouring pixels obtained from its predictive pattern are arranged into a one-dimensional input vector $x = \{X_{n-1}, X_{n-2}, \ldots, X_{n-N}\}$ for the neural network. A hidden layer is designed to carry out back-propagation learning for training the
neural network. The output of each neuron, say the jth neuron, can be derived from the equation given below:
$$h_j = f\!\left( \sum_{i=1}^{N} W_{ji}\, X_{n-i} \right)$$

where

$$f(v) = \frac{1}{1 + e^{-v}}$$
Figure 4.4 PREDICTIVE NEURAL NETWORK I
To predict drastically changing features inside images, such as edges, contours, etc., high-order terms are added to improve the predictive performance. This corresponds to a non-linear AR model expressed as follows:
$$X_n = \sum_i a_i X_{n-i} + \sum_i \sum_j a_{ij} X_{n-i} X_{n-j} + \sum_i \sum_j \sum_k a_{ijk} X_{n-i} X_{n-j} X_{n-k} + \cdots$$
Hence, another so-called functional-link type neural network can be designed to implement this type of non-linear AR model with high-order terms. The structure of the network is illustrated in Fig. 4.5. It contains only two layers of neurons, one for input and the
other for output. Coupling weights, {w_i}, between the input layer and the output layer are trained towards minimizing the residual energy, which is defined as:

$$RE = \sum_n e_n^2 = \sum_n \left(X_n - \bar{X}_n\right)^2$$
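As a hedged sketch of this idea, a purely linear predictor can be trained toward minimum residual energy with a simple gradient step on each squared prediction error; the scan-line data, predictor order and learning rate are illustrative, and the full network of Fig. 4.4 is replaced by its linear core.

% Train a linear predictor X_n ~ sum_i a_i*X_(n-i) by gradient descent
% on the residual energy RE = sum_n e_n^2.
s  = rand(1, 500);              % a scan line of normalized pixels
N  = 4;                         % predictor order
a  = zeros(1, N);               % predictor coefficients
lr = 0.01;
for pass = 1:50
    for n = N+1:length(s)
        xp = s(n-1:-1:n-N);     % the N neighbouring (previous) pixels
        e  = s(n) - a*xp';      % prediction error e_n
        a  = a + 2*lr*e*xp;     % step that reduces e_n^2
    end
end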
The mean squared error, MSE, between an M x N original image I and its reconstruction I' is

$$MSE = \frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ I(x,y) - I'(x,y) \right]^2 \qquad (5.1)$$
The quality of image coding is typically assessed by the peak signal-to-noise ratio (PSNR), defined as

$$PSNR = 20 \log_{10}\!\left(\frac{255}{\sqrt{MSE}}\right) \qquad (5.2)$$

Training is conducted for a representative class of images using the Levenberg-Marquardt algorithm. Once the weight matrices have been appropriately selected, any image can be quickly encoded using the $W_h$ matrix, and then decoded (reconstructed) using the $W_y$ matrix.
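Equations (5.1) and (5.2) translate directly into MATLAB; the noisy stand-in reconstruction below is only there to make the sketch self-contained.

% MSE and PSNR between an original image I and a reconstruction Ir.
I  = double(imread('cameraman.tif'));
Ir = I + 5*randn(size(I));              % stand-in reconstruction
[M, N] = size(I);
MSE  = sum(sum((I - Ir).^2)) / (M*N);   % equation (5.1)
PSNR = 20*log10(255/sqrt(MSE));         % equation (5.2)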
Basic Algorithm:
Consider the form of Newton's method where the performance index is a sum of squares. Newton's method for optimizing a performance index F(x) is

$$x_{k+1} = x_k - A_k^{-1} g_k \qquad (5.3)$$

where $A_k = \nabla^2 F(x_k)$ and $g_k = \nabla F(x_k)$. It is assumed that F(x) is a sum-of-squares function:
$$F(x) = \sum_{i=1}^{N} v_i^2(x) = v^T(x)\, v(x) \qquad (5.4)$$

Then the j-th element of the gradient is
$$\left[\nabla F(x)\right]_j = \frac{\partial F(x)}{\partial x_j} = 2 \sum_{i=1}^{N} v_i(x)\, \frac{\partial v_i(x)}{\partial x_j} \qquad (5.5)$$
The gradient can be written in matrix form:

$$\nabla F(x) = 2 J^T(x)\, v(x) \qquad (5.6)$$

where J(x) is the Jacobian matrix. Next the Hessian matrix is considered. The (k,j) element of the Hessian matrix would be
$$\left[\nabla^2 F(x)\right]_{kj} = \frac{\partial^2 F(x)}{\partial x_k\, \partial x_j}$$

The Hessian matrix can then be expressed as $\nabla^2 F(x) = 2 J^T(x) J(x) + 2 S(x)$, where

$$S(x) = \sum_{i=1}^{N} v_i(x)\, \nabla^2 v_i(x)$$
Assuming that S(x) is small, the Hessian matrix is approximated as

$$\nabla^2 F(x) \approx 2 J^T(x)\, J(x)$$

Substituting the values of $\nabla^2 F(x)$ and $\nabla F(x)$, we obtain the Gauss-Newton method:

$$x_{k+1} = x_k - \left[J^T(x_k)\, J(x_k)\right]^{-1} J^T(x_k)\, v(x_k)$$

One problem with the Gauss-Newton method over the standard Newton's method is that the matrix $H = J^T J$ may not be invertible. This can be overcome by using the following modification to the approximate Hessian matrix: $G = H + \mu I$. This leads to the Levenberg-Marquardt algorithm:

$$x_{k+1} = x_k - \left[J^T(x_k)\, J(x_k) + \mu_k I\right]^{-1} J^T(x_k)\, v(x_k)$$
or

$$\Delta x_k = -\left[J^T(x_k)\, J(x_k) + \mu_k I\right]^{-1} J^T(x_k)\, v(x_k)$$

This algorithm has the very useful feature that as $\mu_k$ is increased it approaches the steepest descent algorithm with a small learning rate. The iterations of the Levenberg-Marquardt back-propagation algorithm (LMBP) can be summarized as follows:
1. Present all inputs to the network, compute the corresponding network outputs and the errors $e_q = t_q - a_q^M$, and compute the sum of squared errors over all inputs, F(x).
2. Compute the Jacobian matrix J(x).
3. Obtain $\Delta x_k$.
4. Recompute the sum of squared errors using $x_k + \Delta x_k$. If this new sum of squares is smaller than that computed in step 1, then divide $\mu$ by v, let $x_{k+1} = x_k + \Delta x_k$, and go back to step 1. If the sum of squares is not reduced, then multiply $\mu$ by v and go back to step 3.
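A compact sketch of one Levenberg-Marquardt step as summarized above; in the real algorithm the Jacobian J and error vector v come from back-propagating through the network, so random placeholders are used here.

% One Levenberg-Marquardt step: dx = -(J'J + mu*I)^(-1) * J' * v.
J  = rand(20, 5);                    % placeholder Jacobian of the errors
v  = rand(20, 1);                    % placeholder error vector v(x_k)
mu = 0.01;                           % damping term mu_k
dx = -(J'*J + mu*eye(5)) \ (J'*v);   % solve instead of inverting explicitly
% If the new sum of squares decreases, accept dx and reduce mu;
% otherwise increase mu and recompute dx.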
Training Procedure
During the training procedure, data from a representative image or a class of images is encoded into the structure of the hidden and output weight matrices. It is assumed that an image F used in training is of size R x C and consists of r x c blocks.
1. The first step is to convert the block matrix F into a matrix X of size P x N containing the training vectors, x(n), formed from the image blocks. That is:
P = r·c and P·N = R·C
2. The target data is made equal to the input data, that is: D = X.
3. The network is then trained until the mean squared error, MSE, is sufficiently small. The matrices W_h and W_y will subsequently be used in the image encoding and decoding steps.
Image Encoding
The hidden-half of the two-layer network is used to encode images. The Encoding procedure can be described as follows:
F → X,  H = f(W_h · X)

where H is the encoded version of the image F and f(·) denotes the layer's transfer function.
Image Decoding
The image is decoded (reconstructed) using the output-half of the two-layer network. The decoding procedure is described as follows:

Y = f(W_y · H),  Y → F'

where F' is the reconstructed image.
These steps were performed using MATLAB (Matrix Laboratory). The compression so obtained was through offline learning. In the off-line learning methods, once the system enters the operation mode, its weights are fixed and do not change any more.
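Putting the steps together, a hedged end-to-end sketch using the report's own helper functions (blkM2vc.m and vc2blkM.m, listed in the next section, must be on the MATLAB path); the test image, the 4 x 4 block size and the K = 4 hidden neurons are illustrative choices.

% End-to-end training, encoding and decoding of one image.
F = double(imread('cameraman.tif'))/255;   % normalize pixels to [0,1]
X = blkM2vc(F, [4 4]);                     % step 1: blocks -> training vectors
net = newff(minmax(X), [4 16], {'logsig','logsig'}, 'trainlm');
net.trainParam.goal = 1e-5;                % step 3: train until MSE is small
net = train(net, X, X);                    % step 2: target D = X
Y  = sim(net, X);                          % encode and decode in one pass
Fr = vc2blkM(Y, 4, size(F,1));             % vectors -> reconstructed image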
LEVENBERG-MARQUARDT ALGORITHM
Rearranging of the matrix was done using the function vc2blkM, followed by scaling up.
blkM2vc.m
function vc = blkM2vc(M, blkS)
%blkM2vc Convert a block-matrix M into a matrix vc of column vectors.
%   Each r by c block of M becomes one rc-element column of vc.
[rr cc] = size(M);
r = blkS(1);            % block height
c = blkS(2);            % block width
if (rem(rr, r) ~= 0) | (rem(cc, c) ~= 0)
    error('blocks do not fit into matrix')
end
nr = rr/r;              % number of block-rows
nc = cc/c;              % number of block-columns
rc = r*c;               % pixels per block
vc = zeros(rc, nr*nc);
for ii = 0:nr-1
    % reshape one row of blocks into rc-element columns
    vc(:,(1:nc)+ii*nc) = reshape(M((1:r)+ii*r,:), rc, nc);
end
vc2blkM.m
function M = vc2blkM(vc, r, rM)
%vc2blkM Reshape a matrix vc of rc by 1 vectors into a block-matrix M of rM by cM size.
%   Each rc-element column of vc is converted into an r by c block of the
%   matrix M and placed as a block-row element.
[rc nb] = size(vc);
pxls = rc*nb;                    % total number of pixels
if ( (rem(pxls, rM) ~= 0) | (rem(rM, r) ~= 0) )
    error('incorrect number of rows of the matrix')
end
cM = pxls/rM;                    % number of columns of the output matrix
if ( (rem(rc, r) ~= 0) | (rem(nb*r, rM) ~= 0) )
    error('incorrect block size')
end
c = rc/r;                        % block width
xM = zeros(r, nb*c);
xM(:) = vc;                      % unpack the columns into r-row strips
nrb = rM/r;                      % number of block-rows
M = zeros(rM, cM);
for ii = 0:nrb-1
    M((1:r)+ii*r, :) = xM(:, (1:cM)+ii*cM);
end
newff
Create a feed-forward back-propagation network.

Syntax
net = newff
net = newff(PR, [S1 S2...SNl], {TF1 TF2...TFNl}, BTF, BLF, PF)

Description
net = newff creates a new network with a dialog box.
newff(PR, [S1 S2...SNl], {TF1 TF2...TFNl}, BTF, BLF, PF) takes,
PR -- R x 2 matrix of min and max values for R input elements
Si -- Size of ith layer, for Nl layers
TFi -- Transfer function of ith layer, default = 'tansig'
BTF -- Backpropagation network training function, default = 'traingdx'
BLF -- Backpropagation weight/bias learning function, default = 'learngdm'
PF -- Performance function, default = 'mse'
and returns an Nl-layer feed-forward backprop network.

The transfer functions TFi can be any differentiable transfer function such as tansig, logsig, or purelin. The training function BTF can be any of the backprop training functions such as trainlm, trainbfg, trainrp, traingd, etc.

Caution: trainlm is the default training function because it is very fast, but it requires a lot of memory to run. If you get an "out-of-memory" error when training, try one of the following:
- Slow trainlm training, but reduce memory requirements, by setting net.trainParam.mem_reduc to 2 or more. (See help trainlm.)
- Use trainbfg, which is slower but more memory-efficient than trainlm.
- Use trainrp, which is slower but more memory-efficient than trainbfg.

The learning function BLF can be either of the backpropagation learning functions learngd or learngdm. The performance function can be any of the differentiable performance functions such as mse or msereg.

Algorithm
Feed-forward networks consist of Nl layers using the dotprod weight function, netsum net input function, and the specified transfer functions. The first layer has weights coming from the input. Each subsequent layer has a weight coming from the previous layer. All layers have biases. The last layer is the network output. Each layer's weights and biases are initialized with initnw. Adaption is done with trains, which updates weights with the specified learning function. Training is done with the specified training function. Performance is measured according to the specified performance function.
trainParam
This property defines the parameters and values of the current training function. net.trainParam The fields of this property depend on the current training function (net.trainFcn). Evaluate the above reference to see the fields of the current training function.
train
Train a neural network.

Syntax
[net,tr,Y,E,Pf,Af] = train(net,P,T,Pi,Ai,VV,TV)

Description
train trains a network net according to net.trainFcn and net.trainParam.
train(NET,P,T,Pi,Ai,VV,TV) takes,
net -- Neural network
P -- Network inputs
T -- Network targets, default = zeros
Pi -- Initial input delay conditions, default = zeros
Ai -- Initial layer delay conditions, default = zeros
VV -- Structure of validation vectors, default = []
TV -- Structure of test vectors, default = []
and returns,
net -- New network
TR -- Training record (epoch and perf)
Y -- Network outputs
E -- Network errors
Pf -- Final input delay conditions
Af -- Final layer delay conditions
sim
Simulate a neural network.

Syntax
[Y,Pf,Af,E,perf] = sim(net,P,Pi,Ai,T)

Description
sim simulates a neural network: the inputs are propagated through the layers and the network outputs are returned.
sim(net,P,Pi,Ai,T) takes,
net -- Network
P -- Network inputs
Pi -- Initial input delay conditions, default = zeros
Ai -- Initial layer delay conditions, default = zeros
T -- Network targets, default = zeros
and returns,
Y -- Network outputs
Pf -- Final input delay conditions
Af -- Final layer delay conditions
E -- Network errors
perf -- Network performance
uint8
Convert to an unsigned 8-bit integer.

Syntax
I = uint8(X)

Description
I = uint8(X) converts the elements of the array X into unsigned 8-bit integers. The elements of a uint8 array range from 0 to 255; values outside this range saturate on conversion, and fractional values are rounded. Image pixel data is commonly stored and displayed in this type.
MATLAB Results:
6.1 Training procedure till the MSE becomes less than 1e-5
ORIGINAL IMAGE
COMPRESSED IMAGE
DECOMPRESSED IMAGE
7. CONCLUSION
The project IMAGE COMPRESSION AND DECOMPRESSION USING NEURAL NETWORKS has been successfully programmed using MATLAB and tested. The computing world has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful. Furthermore, there is no need to devise an algorithm in order to perform a specific task; i.e. there is no need to understand the internal mechanisms of that task. They are also very well suited for real-time systems because of their fast response and computational times, which are due to their parallel architecture. Neural networks also contribute to other areas of research such as neurology and psychology. They are regularly used to model parts of living organisms and to investigate the internal mechanisms of the brain. Perhaps the most exciting aspect of neural networks is the possibility that some day 'conscious' networks might be produced. A number of scientists argue that consciousness is a 'mechanical' property and that 'conscious' neural networks are a realistic possibility. Even though neural networks have a huge potential, we will only get the best out of them when they are integrated with computing, AI, fuzzy logic and related subjects. Neural networks perform successfully where other methods do not, recognizing and matching complicated, vague, or incomplete patterns.
FUTURE SCOPE:
Artificial neural networks are currently a hot research area in image processing, and it is believed that they will receive extensive application in various fields in the next few years. In contrast with other technologies, neural networks can be used in every field, such as medicine, marketing, industrial process control, etc. This makes our application flexible and extensible to any field of interest. Integrated with other fields like artificial intelligence and fuzzy logic, neural networks have a huge potential to perform. Neural networks have been applied in solving a wide variety of problems. It is an emerging and fast-growing field and there is huge scope for research and development.
BIBLIOGRAPHY
1. H. Demuth and M. Beale, Neural Network Toolbox User's Guide, for use with MATLAB, The MathWorks Inc., 1998.
2. Henriette Ossoinig, Erwin Reisinger and Reinhold Weiss, "Design and FPGA-Implementation of a Neural Network".
3. Kiamal Z. Pekmestzi, "Multiplexer-Based Array Multipliers", IEEE Transactions on Computers, Vol. 48, January 1999.
4. J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1990.
5. J. Jiang, "Image compression with neural networks - a survey", Signal Processing: Image Communication 14 (1999) 737-760.