For
Computer Science
&
Information Technology
By
www.thegateacademy.com
Contents
Topic
#1. Mathematics
1.1 Linear Algebra
1.2 Probability and Distribution
1.3 Numerical Methods
1.4 Calculus
#4. Operating System
4.1 Introduction to Operating System
4.2 Process Management
4.3 Threads
4.4 CPU Scheduling
4.5 Deadlocks
4.6 Memory Management & Virtual Memory
4.7 File System
4.8 I/O Systems
4.9 Protection and Security
THE GATE ACADEMY PVT.LTD. H.O.: #74, KeshavaKrupa (third Floor), 30th Cross, 10th Main, Jayanagar 4th Block, Bangalore-11
: 080-65700750, info@thegateacademy.com Copyright reserved. Web: www.thegateacademy.com
5.4 SQL
5.5 Transactions and Concurrency Control
5.6 File Structures (Sequential files, indexing, B & B+ trees)
#6. Theory of Computation
6.1 Introduction
6.2 Finite Automata
6.3 Regular Expression
6.4 Context free grammar
6.5 Turing Machines
#7. Computer Organization
7.1 Introduction of Computer Organization
7.2 Memory Hierarchy
7.3 Pipelining
7.4 Instruction Types
7.5 Addressing Modes
7.6 I/O Data Transfer
#8. Digital Logic
8.1 Number Systems & Code Conversions
8.2 Boolean Algebra & Karnaugh Maps
8.3 Logic Gates
8.4 Combinational Digital Circuits
8.5 Semiconductor Memory
# Reference Book
Mathematics
Part - 1: Mathematics
1.1 Linear Algebra
1.1.1
Matrix
Definition: A system of $m \times n$ numbers arranged along m rows and n columns.
Conventionally, a single capital letter is used to denote a matrix.
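Thus,
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$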
Column Matrix
Number of columns
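Symmetric Matrix: $A^T = A$
Skew-symmetric Matrix: $A^T = -A$
Note: All the diagonal elements of a skew-symmetric matrix must be zero.
Symmetric: $\begin{bmatrix} a & h & g \\ h & b & f \\ g & f & c \end{bmatrix}$
Skew symmetric: $\begin{bmatrix} 0 & h & g \\ -h & 0 & f \\ -g & -f & 0 \end{bmatrix}$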
1.1.1.11
1.1.1.12
1.1.1.13
Hermitian Matrix: It is a square matrix with complex entries which is equal to its own conjugate transpose.
$A = \overline{A}^{\,T}$ or $a_{ij} = \overline{a_{ji}}$
1.1.1.14
always real
Idempotent Matrix
If $A^2 = A$, then the matrix A is called an idempotent matrix.
1.1.1.17
Determinant:
n square matrix.
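$$D = \det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$
Determinant of order n
$$D = |A| = \det A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$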
1.1.1.19
b
]
a
Important Points:
1. $AI = IA = A$ (here A is a square matrix of the same order as I)
2. $0A = A0 = 0$ (here 0 is the null matrix)
3. If $AB = 0$, then it is not necessarily true that A or B is a null matrix. Also, it doesn't mean $BA = 0$.
4. If the product of two non-zero square matrices A & B is a zero matrix, then A & B are singular matrices.
5. If A is a non-singular matrix and $AB = 0$, then B is a null matrix.
6. $AB \neq BA$ (in general): commutative property does not hold
7. $A(BC) = (AB)C$: associative property holds
8. $A(B+C) = AB + AC$: distributive property holds
9. $AC = AD$ doesn't imply $C = D$ [even when $A \neq 0$].
10. If A, C, D are $n \times n$ matrices, and if rank(A) = n & AC = AD, then C = D.
11. $(A+B)^T = A^T + B^T$
12. $(AB)^T = B^T A^T$
13. $(AB)^{-1} = B^{-1} A^{-1}$
14. $AA^{-1} = A^{-1}A = I$
15. $(kA)^T = kA^T$ (k is a scalar, A is a matrix)
16. $(kA)^{-1} = \frac{1}{k}A^{-1}$ (k is a non-zero scalar, A is a matrix)
17. $(A^{-1})^T = (A^T)^{-1}$
18. $\overline{(A^T)} = (\overline{A})^T$ (Conjugate of a transpose of matrix = Transpose of conjugate of matrix)
19. If a non-singular matrix A is symmetric, then $A^{-1}$ is also symmetric.
20. If A is an orthogonal matrix, then $A^T$ and $A^{-1}$ are also orthogonal.
Note:
Elementary transformations don't change the rank of the matrix.
However, they change the Eigen values of the matrix.
1.1.1.23
Rank of Matrix
If we select any r rows and r columns from any matrix A, deleting all other rows and columns, then the determinant formed by these $r \times r$ elements is called a minor of A of order r.
Definition: A matrix is said to be of rank r when,
i) It has at least one non-zero minor of order r.
ii) Every minor of order higher than r vanishes.
Other definition: The rank is also defined as the maximum number of linearly independent row vectors.
Special case: Rank of Square matrix
Rank = Number of non-zero rows in the upper triangular matrix obtained using elementary transformations.
Note:
1. $r(AB) \le \min\{r(A), r(B)\}$
2. $r(A+B) \le r(A) + r(B)$
3. $r(A-B) \ge r(A) - r(B)$
4. The rank of a diagonal matrix is simply the number of non-zero elements in the principal diagonal.
5. A system of homogeneous equations such that the number of unknown variable exceeds
the number of equations, necessarily has non-zero solutions.
6. If A is a non-singular matrix, then all the row/column vectors are independent.
7. If A is a singular matrix, then vectors of A are linearly dependent.
a a
x
x
a
a
Where, A =
,
[
,
[x ]
B =
[
Inconsistent means:
No solution
Cramer's Rule
Let the following two equations be there
$a_{11}x_1 + a_{12}x_2 = b_1$ ---------------------------------------(i)
$a_{21}x_1 + a_{22}x_2 = b_2$ ---------------------------------------(ii)
$$D = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad D_1 = \begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}, \quad D_2 = \begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}$$
Then $x_1 = \frac{D_1}{D}$ and $x_2 = \frac{D_2}{D}$ (provided $D \neq 0$).
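Cramer's rule for two equations can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `det2` and `cramer_2x2` and the sample coefficients are assumptions made for this example, not part of the text.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """Determinant of the 2x2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cramer_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1 and a21*x1 + a22*x2 = b2 by Cramer's rule."""
    D = det2(a11, a12, a21, a22)
    if D == 0:
        raise ValueError("D = 0: the system has no unique solution")
    D1 = det2(b1, a12, b2, a22)  # first column replaced by (b1, b2)
    D2 = det2(a11, b1, a21, b2)  # second column replaced by (b1, b2)
    return Fraction(D1, D), Fraction(D2, D)

# Hypothetical system: 2*x1 + x2 = 5 and x1 + 3*x2 = 10
x1, x2 = cramer_2x2(2, 1, 1, 3, 5, 10)
print(x1, x2)
```

Exact fractions are used so that the result does not depend on floating-point rounding.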
Characteristic equation: $|A - \lambda I| = 0$. The roots of this equation are called the characteristic roots / latent roots / Eigen values of the matrix A.
Eigen vectors: $[A - \lambda I]X = 0$
For each Eigen value $\lambda$, solving for X gives the corresponding Eigen vector.
Note: For a given Eigen value, there can be different Eigen vectors, but for the same Eigen vector, there can't be different Eigen values.
Properties of Eigen values
1. The sum of the Eigen values of a matrix is equal to the sum of its principal diagonal elements.
2. The product of the Eigen values of a matrix is equal to its determinant.
3. The largest Eigen value of a real symmetric matrix is always greater than or equal to any of the diagonal elements of the matrix.
4. If $\lambda$ is an Eigen value of an orthogonal matrix, then $1/\lambda$ is also its Eigen value.
5. If A is real, then its Eigen values are real or occur in complex conjugate pairs.
6. Matrix A and its transpose $A^T$ have the same characteristic roots (Eigen values).
7. The Eigen values of a triangular matrix are just the diagonal elements of the matrix.
8. Zero is an Eigen value of the matrix if and only if the matrix is singular.
9. Eigen values of a unitary matrix or orthogonal matrix have absolute value 1.
10. Eigen values of a Hermitian or symmetric matrix are purely real.
11. Eigen values of a skew Hermitian or skew symmetric matrix are zero or purely imaginary.
12. $\frac{|A|}{\lambda}$ is an Eigen value of adj A (because adj A = $|A| \cdot A^{-1}$).
13. If $\lambda$ is an Eigen value of the matrix A, then
i) Eigen value of $A^{-1}$ is $1/\lambda$
ii) Eigen value of $A^m$ is $\lambda^m$
iii) Eigen values of kA are $k\lambda$ (k is scalar)
iv) Eigen values of $A + kI$ are $\lambda + k$
v) Eigen values of $(A - kI)^2$ are $(\lambda - k)^2$
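A few of the properties above (sum of Eigen values = trace, product = determinant, Eigen values of $A^2$ are $\lambda^2$) can be checked numerically on a small example. The sketch below assumes a hypothetical 2 x 2 matrix and solves the characteristic equation $\lambda^2 - (a+d)\lambda + (ad - bc) = 0$ directly; the helper name `eigenvalues_2x2` is an assumption of this example.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigen values of [[a, b], [c, d]] from its characteristic equation."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

# Hypothetical matrix A = [[4, 1], [2, 3]]
l1, l2 = eigenvalues_2x2(4, 1, 2, 3)
trace_A, det_A = 4 + 3, 4 * 3 - 1 * 2

# A squared, multiplied out by hand: [[18, 7], [14, 11]]
m1, m2 = eigenvalues_2x2(18, 7, 14, 11)

print(l1, l2)           # Eigen values of A
print(l1 + l2, l1 * l2) # compare with trace_A and det_A
print(m1, m2)           # compare with l1**2 and l2**2
```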
Vector:
Probability
$A \cap B = \phi$, $P(A \cap B) = 0$
Equally Likely Events: If one of the events cannot happen in preference to other, then such events
are said to be equally likely.
Odds in Favour of an Event =
Where m
n
.
.
$P(A) + P(\bar{A}) = 1$
Important points:
$P(A \cup B)$: Probability of happening of at least one of the events A & B
$P(A \cap B)$: Probability of happening of both the events A & B
If the events are certain to happen, then the probability is unity.
If the events are impossible to happen, then the probability is zero.
Addition Law of Probability:
a. For any events A, B and C that are not mutually exclusive
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$
b. For events A, B and C which are mutually exclusive
$P(A \cup B \cup C) = P(A) + P(B) + P(C)$
Independent Events: Two events are said to be independent, if the occurrence of one does not affect the occurrence of the other.
If A and B are independent, then $P(A \cap B) = P(A)\,P(B)$
Conditional Probability: If A and B are dependent events, then $P(B/A)$ denotes the probability of occurrence of B when A has already occurred. This is known as conditional probability.
$P(B/A) = \frac{P(A \cap B)}{P(A)}$
P(B/A) = P(B)
[$P(A) \neq 0$]
[$P(B) \neq 0$]
Bayes' theorem:
An event A corresponds to a number of exhaustive events $B_1, B_2, \ldots, B_n$.
If $P(B_i)$ and $P(A/B_i)$ are given then,
$$P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{\sum_{i} P(B_i)\,P(A/B_i)}$$
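Bayes' theorem can be sketched as a short Python function. The priors and likelihoods below are made-up numbers used purely for illustration, and the name `bayes_posteriors` is an assumption of this example.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(B_i / A) for every i, given P(B_i) and P(A / B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) * P(A / B_i)
    total = sum(joint)                                    # P(A), the denominator
    return [j / total for j in joint]

# Hypothetical data: three exhaustive events B1, B2, B3
priors = [0.5, 0.3, 0.2]          # P(B_i), must sum to 1
likelihoods = [0.02, 0.03, 0.05]  # P(A / B_i)
posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)
```

The posteriors always sum to 1, because the denominator is exactly the sum of the numerators.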
Distribution
P(x )=
(x )
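$\mathrm{Var}(X) = E[(x - \mu)^2]$
$\mathrm{Var}(X) = \sum (x - \mu)^2\,P(x)$ (discrete case)
$\mathrm{Var}(X) = \int (x - \mu)^2 f(x)\,dx$ (continuous case)
$\mathrm{Var}(X) = E(x^2) - [E(x)]^2$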
Properties of Variance
1. Var(constant) = 0
2. $\mathrm{Var}(Cx) = C^2\,\mathrm{Var}(x)$; variance is non-linear [here C is a constant]
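The identity $\mathrm{Var}(X) = E(x^2) - [E(x)]^2$ and the two properties above can be checked on a small discrete distribution. The sketch below assumes a fair six-sided die as the example distribution; the function names are illustrative only.

```python
from fractions import Fraction

def expectation(values, probs):
    """E(x) for a discrete distribution."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(x) = E(x^2) - [E(x)]^2."""
    mean = expectation(values, probs)
    mean_of_squares = expectation([x * x for x in values], probs)
    return mean_of_squares - mean * mean

# Assumed example: a fair die
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

var_x = variance(faces, probs)
var_3x = variance([3 * x for x in faces], probs)  # should equal 3^2 * Var(x)
var_const = variance([5], [Fraction(1)])          # variance of a constant
print(var_x, var_3x, var_const)
```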
covariance=0,
, here
,x
Mean:
For a set of n values of a variate $X = (x_1, x_2, \ldots, x_n)$
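For grouped data, if $x_1, x_2, \ldots, x_n$ are mid values of the class intervals having frequencies $f_1, f_2, \ldots, f_n$, then $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
If $\bar{x}_1$ is the mean for $n_1$ data and $\bar{x}_2$ is the mean for $n_2$ data, then the combined mean of $n_1 + n_2$ data is
$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$
If $m_1, \sigma_1$ be the mean and SD of a sample of size $n_1$ and $m_2, \sigma_2$ be those for a sample of size $n_2$, then the SD $\sigma$ of the combined sample of size $n_1 + n_2$ is given by,
$(n_1 + n_2)\,\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 D_1^2 + n_2 D_2^2$
where $D_i = m_i - m$ and m is the combined mean, i.e.
$\left(\sum n_i\right)\sigma^2 = \sum (n_i \sigma_i^2) + \sum (n_i D_i^2)$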
Median: When the values in a data sample are arranged in descending order or ascending order
of magnitude the median is the middle term if the no. of sample is odd and is the mean of two
middle terms if the number is even.
Mode: It is defined as the value in the sampled data that occurs most frequently.
Important Points:
Mean is the best measurement (all observations are taken into consideration).
Mode is the worst measurement (only the maximum frequency is taken).
In median, 50% of the observations are taken.
Sum of the deviations about the mean is zero.
Sum of the absolute deviations about the median is minimum.
Sum of the squares of the deviations about the mean is minimum.
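Co-efficient of variation = $\frac{\sigma}{\mu} \times 100$
Correlation coefficient: $\rho(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$
$-1 \le \rho(x, y) \le 1$
$\rho(x, y) = \rho(y, x)$
$|\rho(x, y)| = 1$ when P(x=0)=1; or P(x=ay)=1 [for some a]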
If the correlation coefficient is -ve, then two events are negatively correlated.
If the correlation coefficient is zero, then two events are uncorrelated.
If the correlation coefficient is +ve, then two events are positively correlated.
Line of Regression:
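The equation of the line of regression of y on x is $y - \bar{y} = \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$
The equation of the line of regression of x on y is $x - \bar{x} = \rho\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$
where $\rho$ is the correlation coefficient of x and y.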
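Joint Probability Distribution: If X & Y are two random variables then the joint distribution is defined as $F_{XY}(x, y) = P(X \le x;\ Y \le y)$
Properties of Joint Distribution Function / Cumulative Distribution Function:
1. $F_{XY}(-\infty, -\infty) = 0$
2. $F_{XY}(\infty, \infty) = 1$
3. $F_{XY}(-\infty, y) = 0$ {since $P(X \le -\infty;\ Y \le y) = 0$}
4. $F_{XY}(x, \infty) = P(X \le x;\ Y \le \infty) = F_X(x) \cdot 1 = F_X(x)$
5. $F_{XY}(\infty, y) = F_Y(y)$
Joint Probability Density Function:
Defined as $f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{XY}(x, y)$
Property: $f(x, y) \ge 0$ and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$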
1. Bisection method
This method finds the root between points a and b.
If f(x) is continuous between a and b and f(a) and f(b) are of opposite sign, then there is a root between a & b (Intermediate Value Theorem).
First approximation to the root is $x_1 = \frac{a + b}{2}$
If $f(x_1) = 0$, then $x_1$ is the root of f(x) = 0; otherwise the root lies between a and $x_1$ or $x_1$ and b.
Similarly $x_2$, $x_3$, ... are determined.
Simplest iterative method
Bisection method always converges, but often slowly.
This method can't be used for finding the complex roots.
Rate of convergence is linear
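The bisection steps can be sketched as follows. This is a minimal sketch: the tolerance, the iteration cap and the test function $x^2 - 2$ are assumptions chosen for illustration.

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Find a root of f in [a, b], assuming f is continuous and f(a), f(b) differ in sign."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must be of opposite sign")
    for _ in range(max_iter):
        mid = (a + b) / 2            # next approximation: midpoint of the interval
        fm = f(mid)
        if fm == 0 or (b - a) / 2 < tol:
            return mid
        if fa * fm < 0:              # root lies between a and mid
            b, fb = mid, fm
        else:                        # root lies between mid and b
            a, fa = mid, fm
    return (a + b) / 2

# Hypothetical example: root of x^2 - 2 between 1 and 2
root = bisection(lambda x: x * x - 2, 1.0, 2.0)
print(root)
```

Each pass halves the interval, which is why the error shrinks only linearly.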
2. Newton-Raphson Method (or Successive Substitution Method or Tangent Method)
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
This method is commonly used for its simplicity and greater speed.
Here f(x) is assumed to have a continuous derivative $f'(x)$.
This method fails if $f'(x) = 0$.
It has second order of convergence or quadratic convergence, i.e. the subsequent error at each step is proportional to the square of the error at the previous step.
Sensitive to starting value, i.e. Newton's method converges provided the initial approximation is chosen sufficiently close to the root.
Rate of convergence is quadratic.
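A minimal sketch of the Newton-Raphson iteration is given below. The derivative is passed in explicitly; the stopping tolerance, the iteration cap and the example function are assumptions of this sketch.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) starting from x0."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0:
            # the tangent is horizontal, so the method fails at this point
            raise ZeroDivisionError("f'(x) = 0: Newton-Raphson fails")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence: try a starting value closer to the root")

# Hypothetical example: root of x^2 - 2 starting from x0 = 1
root = newton_raphson(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```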
3. Secant Method
$$x_{n+1} = x_n - \frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}\,f(x_n)$$
, (x ) =
( )
)
2.
+n -
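L U Decomposition
It is a modification of the Gauss elimination method.
Also used for finding the inverse of the matrix.
$[A]_{n \times n} = [L]_{n \times n}\,[U]_{n \times n}$
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{bmatrix}$$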
Numerical Integration
Trapezoidal rule: $\int_a^b f(x)\,dx \approx \frac{h}{2}\{(\text{first term} + \text{last term}) + 2\,(\text{remaining terms})\}$
a) max | ( )|
(b
[ , ]
(b
a) max |
( )
[ , ]
( )|
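Simpson's 3/8 rule: $\int_a^b f(x)\,dx \approx \frac{3h}{8}\{(\text{first term} + \text{last term}) + 2\,(\text{all multiple of 3 terms}) + 3\,(\text{all remaining terms})\}$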
a)
max |
[ , ]
( )
( )|
1.4 Calculus
1.4.1
Limit of a Function
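Let y = f(x)
Then $\lim_{x \to a} f(x) = l$
i.e., $f(x) \to l$ as $x \to a$ implies for any $\varepsilon > 0$ there exists $\delta > 0$ such that $0 < |x - a| < \delta \Rightarrow |f(x) - l| < \varepsilon$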
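Some standard expansions:
$(x + a)^n = x^n + n\,x^{n-1}a + \frac{n(n-1)}{2!}\,x^{n-2}a^2 + \cdots + a^n$
$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$
$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots$
$\log(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots$
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots$
$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$
$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots$
$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots$
Some standard limits:
$\lim_{x \to 0} \frac{\sin x}{x} = 1$
$\lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x = e$
$\lim_{x \to 0} (1 + x)^{1/x} = e$
$\lim_{x \to 0} \frac{a^x - 1}{x} = \log a$
$\lim_{x \to 0} \frac{e^x - 1}{x} = 1$
$\lim_{x \to 0} \frac{\log(1 + x)}{x} = 1$
$\lim_{x \to a} \frac{x^n - a^n}{x - a} = n\,a^{n-1}$
$\lim_{x \to 0} \log|x| = -\infty$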
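L'Hospital's Rule
When a function is of $\frac{0}{0}$ or $\frac{\infty}{\infty}$ form, differentiate the numerator and the denominator and then apply the limit.
$\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f'(x)}{g'(x)}$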
Properties of Continuity
If f and g are two continuous functions at a; then
a. (f+g), (f.g), (f-g) are continuous at a
b. $\frac{f}{g}$ is continuous at a, provided $g(a) \neq 0$
c. $|f|$ or $|g|$ is continuous at a
Rolle's theorem
If (i) f(x) is continuous in closed interval [a,b]
(ii) $f'(x)$ exists for every value of x in open interval (a,b)
(iii) f(a) = f(b)
Then there exists at least one point c between (a, b) such that
$f'(c) = 0$
Geometrically: There exists at least one point c between (a, b) such that the tangent at c is parallel to the x axis
[Figure: curve with points C1 and C2 at which the tangent is parallel to the x axis]
( )
= (c).
1.4.2
( )
=
( )
( )
( )
Derivative:
$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
.g
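Homogeneous Function
Any function f(x, y) which can be expressed in the form $x^n \phi\left(\frac{y}{x}\right)$ is called a homogeneous function of order n in x and y. (Every term is of nth degree.)
$f(x, y) = a_0 x^n + a_1 x^{n-1} y + a_2 x^{n-2} y^2 + \cdots + a_n y^n$
$f(x, y) = x^n \phi\left(\frac{y}{x}\right)$
If u is a homogeneous function of order n in x and y, then
$x^2 \frac{\partial^2 u}{\partial x^2} + 2xy \frac{\partial^2 u}{\partial x\,\partial y} + y^2 \frac{\partial^2 u}{\partial y^2} = n(n-1)\,u$
1.4.3 Total Derivative
$du = \frac{\partial u}{\partial x}\,dx + \frac{\partial u}{\partial y}\,dy$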
Maxima-Minima
a) Global
b) Local
Find
at x = ,
If
If
If
If
But If
If
If
If
at x = .
= , proceed further
Note: Greatest / least value exists either at critical point or at the end point of interval.
Point of Inflexion
If at a point, the following conditions are met, then such point is called point of inflexion
Point of
inflexion
i)
ii)
=0,
iii)
Taylor Series:
$f(a + h) = f(a) + h\,f'(a) + \frac{h^2}{2!}\,f''(a) + \cdots$
Maclaurin Series:
$f(x) = f(0) + x\,f'(0) + \frac{x^2}{2!}\,f''(0) + \frac{x^3}{3!}\,f'''(0) + \cdots$
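Let $r = \frac{\partial^2 f}{\partial x^2}$, $s = \frac{\partial^2 f}{\partial x\,\partial y}$, $t = \frac{\partial^2 f}{\partial y^2}$
1. Solve $\frac{\partial f}{\partial x} = 0$, $\frac{\partial f}{\partial y} = 0$ to get the points (a, b).
2. (i) if $rt - s^2 > 0$ and $r < 0$: maximum at (a, b)
(ii) if $rt - s^2 > 0$ and $r > 0$: minimum at (a, b)
(iii) if $rt - s^2 < 0$ at (a, b), f(a, b) is not an extreme value, i.e. (a, b) is a saddle point.
(iv) if $rt - s^2 = 0$ at (a, b), it is doubtful, need further investigation.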
1.4.4
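1. $\int x^n\,dx = \frac{x^{n+1}}{n+1}$, $n \neq -1$
2. $\int \frac{1}{x}\,dx = \log x$
3. $\int e^x\,dx = e^x$
4. $\int a^x\,dx = \frac{a^x}{\log a}$ (prove it)
5. $\int \cos x\,dx = \sin x$
6. $\int \sin x\,dx = -\cos x$
7. $\int \sec^2 x\,dx = \tan x$
8. $\int \mathrm{cosec}^2 x\,dx = -\cot x$
9. $\int \sec x \tan x\,dx = \sec x$
10. $\int \mathrm{cosec}\,x \cot x\,dx = -\mathrm{cosec}\,x$
11. $\int \frac{1}{\sqrt{1 - x^2}}\,dx = \sin^{-1} x$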
12.
dx =
sec
13.
dx = sec x
x )
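24. $\int \frac{1}{\sqrt{x^2 - a^2}}\,dx = \log\left(x + \sqrt{x^2 - a^2}\right) = \cosh^{-1}\left(\frac{x}{a}\right)$
25. $\int \frac{1}{\sqrt{x^2 + a^2}}\,dx = \log\left(x + \sqrt{x^2 + a^2}\right) = \sinh^{-1}\left(\frac{x}{a}\right)$
26. $\int \sqrt{a^2 - x^2}\,dx = \frac{x}{2}\sqrt{a^2 - x^2} + \frac{a^2}{2}\sin^{-1}\left(\frac{x}{a}\right)$
27. $\int \sqrt{a^2 + x^2}\,dx = \frac{x}{2}\sqrt{x^2 + a^2} + \frac{a^2}{2}\log\left(x + \sqrt{x^2 + a^2}\right)$
28. $\int \sqrt{x^2 - a^2}\,dx = \frac{x}{2}\sqrt{x^2 - a^2} - \frac{a^2}{2}\log\left(x + \sqrt{x^2 - a^2}\right)$
29. $\int \frac{1}{a^2 + x^2}\,dx = \frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right)$
30. $\int \frac{1}{a^2 - x^2}\,dx = \frac{1}{2a}\log\left(\frac{a + x}{a - x}\right)$ where x < a
31. $\int \frac{1}{x^2 - a^2}\,dx = \frac{1}{2a}\log\left(\frac{x - a}{x + a}\right)$ where x > a
32. $\int \sin^2 x\,dx = \frac{x}{2} - \frac{\sin 2x}{4}$
33. $\int \cos^2 x\,dx = \frac{x}{2} + \frac{\sin 2x}{4}$
34. $\int \tan^2 x\,dx = \tan x - x$
35. $\int \cot^2 x\,dx = -\cot x - x$
36. $\int \ln x\,dx = x \ln x - x$
37. $\int e^{ax} \sin bx\,dx = \frac{e^{ax}}{a^2 + b^2}\,(a \sin bx - b \cos bx)$
38. $\int e^{ax} \cos bx\,dx = \frac{e^{ax}}{a^2 + b^2}\,(a \cos bx + b \sin bx)$
39. $\int e^x\,[f(x) + f'(x)]\,dx = e^x f(x)$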
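Integration by parts:
$\int u\,v\,dx = u \int v\,dx - \int \left(\frac{du}{dx} \int v\,dx\right) dx$
Selection of U & V: I L A T E
I: Inverse circular (e.g. $\tan^{-1} x$)
L: Logarithmic
A: Algebraic
T: Trigonometric
E: Exponential
1. $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$, a<c<b
2. $\int_0^a f(x)\,dx = \int_0^a f(a - x)\,dx$
3. $\int_0^a f(x)\,dx = 2\int_0^{a/2} f(x)\,dx$ if f(a-x) = f(x)
= 0 if f(a-x) = -f(x)
4. $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$ if f(-x) = f(x), even function
= 0 if f(-x) = -f(x), odd function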
(x)dx
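Improper Integral
Those integrals for which the limit is infinite, or the integrand is infinite in $a \le x \le b$ in case of $\int_a^b f(x)\,dx$, are called improper integrals.
1.4.6 Convergence:
$\int_a^b f(x)\,dx$ is said to be convergent if the value of the integral is finite.
If (i) $0 \le f(x) \le g(x)$ for all x and (ii) $\int g(x)\,dx$ converges, then $\int f(x)\,dx$ also converges.
If (i) $f(x) \ge g(x) \ge 0$ for all x and (ii) $\int g(x)\,dx$ diverges, then $\int f(x)\,dx$ also diverges.
If $\lim_{x \to \infty} \frac{f(x)}{g(x)} = c$ where $c \neq 0$, then both integrals $\int f(x)\,dx$ and $\int g(x)\,dx$ converge or both diverge.
The integral $\int_1^{\infty} \frac{dx}{x^p}$ converges when $p > 1$.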
1.4.7 Vector Calculus:
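$\nabla = \left(\hat{i}\,\frac{\partial}{\partial x} + \hat{j}\,\frac{\partial}{\partial y} + \hat{k}\,\frac{\partial}{\partial z}\right)$
Directional Derivative:
The directional derivative of f in a direction $\hat{N}$ is the resolved part of $\nabla f$ in direction $\hat{N}$.
$\nabla f \cdot \hat{N} = |\nabla f| \cos\alpha$
Where $\hat{N}$ is a unit vector in a particular direction.
Direction cosine: $l^2 + m^2 + n^2 = 1$
Gradient:
$\nabla f$ is defined as the gradient of the scalar point function f(x,y,z) and written
grad f = $\nabla f = \hat{i}\,\frac{\partial f}{\partial x} + \hat{j}\,\frac{\partial f}{\partial y} + \hat{k}\,\frac{\partial f}{\partial z}$
$\nabla f$ is a vector function
If f(x,y,z) = 0 is any surface, then $\nabla f$ is a vector normal to the surface f and has a magnitude equal to the rate of change of f along this normal.
Directional derivative of f(x,y,z) is maximum along $\nabla f$ and the magnitude of this maximum is $|\nabla f|$.
1.4.9 Divergence: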
The divergence of a continuously differentiable vector point function F is denoted by div F and is defined by the equation
div F = $\nabla \cdot F$
If $F = f\,\hat{i} + \phi\,\hat{j} + \psi\,\hat{k}$, then
div F = $\nabla \cdot F = \left(\hat{i}\,\frac{\partial}{\partial x} + \hat{j}\,\frac{\partial}{\partial y} + \hat{k}\,\frac{\partial}{\partial z}\right) \cdot \left(f\,\hat{i} + \phi\,\hat{j} + \psi\,\hat{k}\right) = \frac{\partial f}{\partial x} + \frac{\partial \phi}{\partial y} + \frac{\partial \psi}{\partial z}$
$\nabla \cdot F$ is scalar
$\nabla \cdot \nabla = \nabla^2$ is the Laplacian operator
1.4.10 Curl:
The curl of a continuously differentiable vector point function F is denoted by curl F and is
defined by the equation.
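Curl F = $\nabla \times F = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ f & \phi & \psi \end{vmatrix}$, where $F = f\,\hat{i} + \phi\,\hat{j} + \psi\,\hat{k}$
$\nabla \times F$ is a vector function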
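1.4.11 Solenoidal Vector Function
If $\nabla \cdot A = 0$, then A is called a solenoidal vector function.
1.4.12 Irrotational Vector Function
If $\nabla \times A = 0$, then A is called an irrotational vector function.
curl grad f = $\nabla \times \nabla f = 0$
div curl F = $\nabla \cdot (\nabla \times F) = 0$
curl curl F = $\nabla \times (\nabla \times F) = \nabla(\nabla \cdot F) - \nabla^2 F$
grad div F = $\nabla(\nabla \cdot F) = \nabla \times (\nabla \times F) + \nabla^2 F$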
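Also note:
1. $\nabla(f/g) = (g\,\nabla f - f\,\nabla g)/g^2$
2. $(F \cdot G)' = F' \cdot G + F \cdot G'$
3. $(F \times G)' = F' \times G + F \times G'$
4. $\nabla^2(fg) = g\,\nabla^2 f + 2\,\nabla f \cdot \nabla g + f\,\nabla^2 g$
$(A \times B) \times C = (C \cdot A)\,B - (C \cdot B)\,A$
$A \times (B \times C) = (A \cdot C)\,B - (A \cdot B)\,C$
$(A \times B) \times C \neq A \times (B \times C)$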
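1.4.16 Line Integral, Surface Integral & Volume Integral
Line integral = $\int_C F(R) \cdot dR$
$F(R) = f(x,y,z)\,\hat{i} + \phi(x,y,z)\,\hat{j} + \psi(x,y,z)\,\hat{k}$
$dR = \hat{i}\,dx + \hat{j}\,dy + \hat{k}\,dz$
$\int_C F(R) \cdot dR = \int_C (f\,dx + \phi\,dy + \psi\,dz)$
Surface integral: $\int_S F \cdot ds$ or $\int_S F \cdot N\,ds$, where N is the unit outward normal to the surface.
Volume integral: $\int_V F\,dv$
If $F(R) = f(x,y,z)\,\hat{i} + \phi(x,y,z)\,\hat{j} + \psi(x,y,z)\,\hat{k}$ and dv = dx dy dz, then
$\int_V F\,dv = \hat{i} \iiint f\,dx\,dy\,dz + \hat{j} \iiint \phi\,dx\,dy\,dz + \hat{k} \iiint \psi\,dx\,dy\,dz$
1.4.17 Green's Theorem
If R be a closed region in the xy plane bounded by a simple closed curve c, and if P and Q are continuous functions of x and y having continuous derivatives in R, then according to Green's theorem
$\oint_c (P\,dx + Q\,dy) = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy$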
.N ds
$\iint_S F \cdot N\,ds = \iiint_V \mathrm{div}\,F\,dv$
DMGT
Logic is a formal language. It has a syntax, a semantics, and a way of manipulating expressions in the language.
Syntax is a description of what you are allowed to write down (i.e., what expressions are legal).
Semantics give meanings to legal expressions.
A language is used to describe a set.
Logic usually comes with a proof system, which is a way of manipulating syntactic expressions to give new syntactic expressions.
The new syntactic expressions will have semantics which tell us new information about sets.
In the next 2 topics we will discuss 2 forms of logic
1. Propositional logic
2. First order logic
Propositional logic
Sentences are usually classified as declarative, exclamatory, interrogative, or imperative.
A proposition is a declarative sentence to which we can assign one and only one of the truth values true or false, i.e. a statement which is either true or false.
Assumptions about propositions
For every proposition p, if p is not true, then it is false & vice-versa.
For every proposition p, it cannot be simultaneously true and false.
p
T
p q
p q
p q
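For the implication $p \to q$: $q \to p$ is the converse, $\sim p \to \sim q$ is the inverse or opposite, and $\sim q \to \sim p$ is the contrapositive.
We also write $p \to q$ as "if p then q" or "q if p".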
p q
Two compound propositions are said to be equivalent if they have the same truth tables.
Given below are some of the equivalence propositions.
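$\sim(\sim p) \equiv p$
$\sim(p \vee q) \equiv \sim p \wedge \sim q$
$\sim(p \wedge q) \equiv \sim p \vee \sim q$
$p \to q \equiv \sim p \vee q$
$p \leftrightarrow q \equiv (p \to q) \wedge (q \to p)$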
A propositional function is a function whose variables are propositions.
A propositional function p is called a tautology if the truth table of p contains all the entries as true, e.g. $p \vee \sim p$.
A propositional function p is called a contradiction if the truth table of p contains all the entries as false, e.g. $p \wedge \sim p$.
A propositional function p which is neither a tautology nor a contradiction is called a contingency (i.e. the statement should be true for at least one case and false for at least one case), e.g. $p \to q$.
Whenever $p \to q$ is a tautology, we can replace the symbol $\to$ with $\Rightarrow$, i.e. $p \Rightarrow q$.
Ex. If $p \to q$ is a tautology, then $p \Rightarrow q$ is a tautological implication.
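The three classes above can be checked mechanically by building the full truth table. The following is a small sketch; the helper names `implies` and `classify` are assumptions of this example.

```python
from itertools import product

def implies(p, q):
    """Truth value of the implication p -> q."""
    return (not p) or q

def classify(formula, num_vars):
    """Evaluate `formula` on every row of its truth table and classify it."""
    table = [formula(*row) for row in product([True, False], repeat=num_vars)]
    if all(table):
        return "tautology"
    if not any(table):
        return "contradiction"
    return "contingency"

print(classify(lambda p: p or not p, 1))    # p v ~p
print(classify(lambda p: p and not p, 1))   # p ^ ~p
print(classify(implies, 2))                 # p -> q
# (p ^ (p -> q)) -> q, i.e. the inference "from p and p -> q conclude q"
print(classify(lambda p, q: implies(p and implies(p, q), q), 2))
```

The last call shows how a tautological implication can be confirmed: the implication from the premises to the conclusion is true in every row.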
Inference will be used to designate a set of premises accompanied by a suggested conclusion.
$(p_1 \wedge p_2 \wedge \cdots \wedge p_n) \Rightarrow Q$
Here each $p_i$ is called a premise and Q is called the conclusion.
If $(p_1 \wedge p_2 \wedge \cdots \wedge p_n) \to Q$ is a tautology then we say Q is logically derived from $p_1 \wedge p_2 \wedge \cdots \wedge p_n$, i.e., from the set of premises $\{p_1, p_2, \ldots, p_n\}$; i.e. whenever the premises $p_1, p_2, \ldots, p_n$ are true, then Q is true.
The inference is valid if the conclusion Q is obtained from $p_1, p_2, \ldots, p_n$ by the rules of inference.
Otherwise it is called an invalid inference.
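Some rules of inference are:
I1 $P \wedge Q \Rightarrow P$ (simplification)
I2 $P \wedge Q \Rightarrow Q$ (simplification)
I3 $P \Rightarrow P \vee Q$ (addition)
I4 $Q \Rightarrow P \vee Q$ (addition)
I5 $\sim P \Rightarrow (P \to Q)$
I6 $Q \Rightarrow (P \to Q)$
I7 $\sim(P \to Q) \Rightarrow P$
I8 $\sim(P \to Q) \Rightarrow \sim Q$
I9 $P, Q \Rightarrow P \wedge Q$
I10 $\sim P, P \vee Q \Rightarrow Q$
I11 $P, P \to Q \Rightarrow Q$
I12 $\sim Q, P \to Q \Rightarrow \sim P$ (modus tollens)
I13 $P \to Q, Q \to R \Rightarrow P \to R$ (hypothetical syllogism)
I14 $P \vee Q, P \to R, Q \to R \Rightarrow R$ (dilemma)
I15 $P \vee Q, P \to R, Q \to S \Rightarrow R \vee S$ (constructive dilemma)
I16 $\sim R \vee \sim S, P \to R, Q \to S \Rightarrow \sim P \vee \sim Q$ (destructive dilemma)
I17 $\sim(P \wedge Q), Q \Rightarrow \sim P$ (conjunctive syllogism)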
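EQUIVALENCES
E1 $\sim\sim P \Leftrightarrow P$ (double negation)
E2 $P \wedge Q \Leftrightarrow Q \wedge P$ (commutative laws)
E3 $P \vee Q \Leftrightarrow Q \vee P$ (commutative laws)
E4 $(P \wedge Q) \wedge R \Leftrightarrow P \wedge (Q \wedge R)$ (associative laws)
E5 $(P \vee Q) \vee R \Leftrightarrow P \vee (Q \vee R)$ (associative laws)
E6 $P \wedge (Q \vee R) \Leftrightarrow (P \wedge Q) \vee (P \wedge R)$ (distributive laws)
E7 $P \vee (Q \wedge R) \Leftrightarrow (P \vee Q) \wedge (P \vee R)$ (distributive laws)
E8 $\sim(P \wedge Q) \Leftrightarrow \sim P \vee \sim Q$ (De Morgan's law)
E9 $\sim(P \vee Q) \Leftrightarrow \sim P \wedge \sim Q$ (De Morgan's law)
E10 $P \vee P \Leftrightarrow P$
E11 $P \wedge P \Leftrightarrow P$
E12 $R \vee (P \wedge \sim P) \Leftrightarrow R$
E13 $R \wedge (P \vee \sim P) \Leftrightarrow R$
E14 $R \vee (P \vee \sim P) \Leftrightarrow T$
E15 $R \wedge (P \wedge \sim P) \Leftrightarrow F$
E16 $P \to Q \Leftrightarrow \sim P \vee Q$
E17 $\sim(P \to Q) \Leftrightarrow P \wedge \sim Q$
E18 $P \to Q \Leftrightarrow \sim Q \to \sim P$ (contrapositive)
E19 $P \to (Q \to R) \Leftrightarrow (P \wedge Q) \to R$
E20 $\sim(P \leftrightarrow Q) \Leftrightarrow P \leftrightarrow \sim Q$
E21 $P \leftrightarrow Q \Leftrightarrow (P \to Q) \wedge (Q \to P)$
E22 $(P \leftrightarrow Q) \Leftrightarrow (P \wedge Q) \vee (\sim P \wedge \sim Q)$
E23 $P \vee (P \wedge Q) \Leftrightarrow P$ (absorption laws)
E24 $P \wedge (P \vee Q) \Leftrightarrow P$ (absorption laws)
E25 $P \vee \sim P \Leftrightarrow T$ (trivial laws)
E26 $P \wedge \sim P \Leftrightarrow F$ (trivial laws)
E27 $P \vee F \Leftrightarrow P$ (trivial laws)
E28 $P \vee T \Leftrightarrow T$ (trivial laws)
E29 $P \wedge T \Leftrightarrow P$ (trivial laws)
E30 $P \wedge F \Leftrightarrow F$ (trivial laws)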
2. The set A is called the domain of p(x), and the set $T_p$ of all elements a of A for which p(a) is true is called the truth set of p(x).
4. Frequently, when A is some set of numbers, the condition p(x) has the form of an equation or inequality involving the variable x.
5. The symbol $\forall$, which reads "for all" or "for every", is called the universal quantifier.
6. The expression $(\forall x \in A)\,p(x)$ or $\forall x\ p(x)$ is true if and only if p(x) is true for all x in A.
7. The symbol $\exists$, which reads "there exists" or "for some" or "for at least one", is called the existential quantifier.
8. The expression $(\exists x \in A)\,p(x)$ or $\exists x,\ p(x)$ is true if and only if p(a) is true for at least one element a in A.
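Sentence | Abbreviated Meaning
$\forall x, F(x)$ | all true
$\exists x, F(x)$ | at least one true
$\sim[\exists x, F(x)]$ | none true
$\forall x, [\sim F(x)]$ | all false
$\exists x, [\sim F(x)]$ | at least one false
$\sim(\exists x, [\sim F(x)])$ | none false
$\sim(\forall x, [F(x)])$ | not all true

Statement | Negation
$\forall x, F(x)$ (all true) | $\exists x, [\sim F(x)]$ (at least one false)
$\exists x, F(x)$ (at least one true) | $\forall x, [\sim F(x)]$ (all false)
$\forall x, [\sim F(x)]$ (all false) | $\exists x, F(x)$ (at least one true)
$\exists x, [\sim F(x)]$ (at least one false) | $\forall x, F(x)$ (all true)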
We see that to form the negation of a statement involving one quantifier, we need only change the quantifier from universal to existential, or from existential to universal, and negate the statement which it quantifies.
i.e. $\sim(\forall x, F(x)) \equiv \exists x\,[\sim F(x)]$,
$\sim(\exists x, F(x)) \equiv \forall x\,[\sim F(x)]$
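Over a finite universe the two negation rules can be checked directly with Python's `all` (universal quantifier) and `any` (existential quantifier). The domain and the predicate `F` below are hypothetical choices made only for this sketch.

```python
domain = range(1, 11)   # assumed finite universe {1, ..., 10}

def F(x):
    """Assumed example predicate: x is even."""
    return x % 2 == 0

# ~(for all x, F(x))  versus  there exists x with ~F(x)
not_forall = not all(F(x) for x in domain)
exists_not = any(not F(x) for x in domain)

# ~(there exists x, F(x))  versus  for all x, ~F(x)
not_exists = not any(F(x) for x in domain)
forall_not = all(not F(x) for x in domain)

print(not_forall, exists_not)
print(not_exists, forall_not)
```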
Sentences with Multiple quantifiers:
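In general if P(x, y) is any predicate involving the two variables x and y, then the following possibilities exist:
$(\forall x)(\forall y)\,P(x,y)$
$(\forall x)(\exists y)\,P(x,y)$
$(\exists x)(\forall y)\,P(x,y)$
$(\exists x)(\exists y)\,P(x,y)$
$(\forall y)(\forall x)\,P(x,y)$
$(\forall y)(\exists x)\,P(x,y)$
$(\exists y)(\forall x)\,P(x,y)$
$(\exists y)(\exists x)\,P(x,y)$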
( )( )
( )( )
( )( )
( )( )
P(x,y)
( )( )
( )( )
( )
( )( )
( )( )
, ( )
This rule holds provided we know P(c) is true for each element c in the universe.
Existential specification: If $\exists x, P(x)$ is assumed to be true, then there is an element c in the universe such that P(c) is true. This rule takes the form
$\exists x, P(x) \ \therefore\ P(c)$
2.2: Combinatorics
Counting:
Let x be a set. Let us use |x| to denote the number of elements in x.
Two elementary principles act as building blocks for all counting problems.
Sum rule:
If $E_1, E_2, \ldots, E_n$ are mutually exclusive events, and $E_1$ can happen in $e_1$ ways, $E_2$ can happen in $e_2$ ways, ..., $E_n$ can happen in $e_n$ ways, then $E_1$ or $E_2$ or ... or $E_n$ can happen in $e_1 + e_2 + \cdots + e_n$ ways.
Product rule:
If $s_1, s_2, \ldots, s_n$ are non-empty sets, then the number of elements in the Cartesian product $s_1 \times s_2 \times \cdots \times s_n$ is the product $\prod_{i=1}^{n} |s_i|$.
If events $E_1, E_2, \ldots, E_n$ can happen in $e_1, e_2, \ldots, e_n$ ways respectively, then the sequence of events $E_1$ first, followed by $E_2$, ..., followed by $E_n$ can happen in $e_1 \times e_2 \times \cdots \times e_n$ ways.
Permutations:
$P(n, r) = \frac{n!}{(n-r)!}$ = the number of permutations of n objects taken r at a time (without any repetitions) = $n(n-1)(n-2)\cdots(n-r+1)$
NOTE:
1.
2.
3.
4.
5.
6.
z).
Combinations
A combination of n objects taken r at a time (called an r-combination of n objects) is an unordered selection of r of the n objects.
C(n, r) = The number of combinations of n objects taken r at a time (without repetition)
$C(n, r) = \frac{n!}{r!\,(n-r)!}$
NOTE
1. P(n, r) = r! C(n, r)
2. C(n, n) = 1
3. V(n, r) = The number of combinations of n distinct objects taken r at a time with unlimited repetitions.
4. V(n, r) = C(n - 1 + r, r) = C(n - 1 + r, n - 1)
V(n, r) = The number of ways of distributing r similar balls into n numbered boxes.
V(n, r) = The number of non-negative integral solutions to $x_1 + x_2 + \cdots + x_n = r$. Generating functions can be used to compute V(n, r).
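The relations above can be verified on small numbers with Python's `math` module. In this sketch `V` follows the formula in the notes, while `count_solutions` is a brute-force helper introduced only for this example.

```python
from itertools import product
from math import comb, factorial, perm

def V(n, r):
    """r-combinations of n distinct objects with unlimited repetition."""
    return comb(n - 1 + r, r)

def count_solutions(n, r):
    """Brute-force count of non-negative integer solutions of x1 + ... + xn = r."""
    return sum(1 for xs in product(range(r + 1), repeat=n) if sum(xs) == r)

print(perm(5, 3), factorial(3) * comb(5, 3))  # P(n, r) = r! C(n, r)
print(V(3, 4), count_solutions(3, 4))         # both count x1 + x2 + x3 = 4
print(comb(6, 4), comb(6, 2))                 # C(n-1+r, r) = C(n-1+r, n-1)
```

The brute-force count is exponential in n, so it is only useful as a check on small cases.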
Enumeration of permutations and combinations
The number of r-permutations of n-elements without repetitions = n
= P(n, r)
.....
times.
= n, then
.....
)=
Pigeonhole Principle:
If A is the average number of pigeons per hole, then some pigeon hole contains at least $\lceil A \rceil$
pigeons and some pigeon hole contains at most $\lfloor A \rfloor$ pigeons.
Example: If n + 1 pigeons are distributed among n pigeon holes, then some pigeon hole
contains at least 2 pigeons.
Example: If 2n + 1 pigeons are distributed among n pigeon holes, then some pigeon hole
contains at least 3 pigeons.
Example: In general, if k is a positive integer and kn + 1 pigeons are distributed among n
pigeon holes, then some hole contains at least k + 1 pigeons.
Example: In a group of 61 people, at least 6 people were born in the same month.
Example: If 401 letters were delivered to 50 apartments, then some apartment received at most
8 letters.
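The two bounds in the pigeonhole principle are the ceiling and the floor of the average number of pigeons per hole. A minimal sketch, with pigeonhole_bounds as a hypothetical helper name, reproduces the examples using integer arithmetic only:

```python
def pigeonhole_bounds(pigeons: int, holes: int):
    """Return (L, M): some hole holds at least L pigeons, some hole at most M."""
    at_least = -(-pigeons // holes)  # ceiling of pigeons / holes
    at_most = pigeons // holes       # floor of pigeons / holes
    return at_least, at_most
```

With 61 people and 12 months the first bound is 6, and with 401 letters and 50 apartments the second bound is 8.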
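Euler's $\phi$-Function:
If n is a positive integer then $\phi(n)$ = the number of integers x such that $1 \le x \le n$ and such that n
and x are relatively prime.
$\phi(n) = n \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_k}\right)$,
where $p_1, p_2, \ldots, p_k$ are the distinct prime divisors of n.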
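Euler's function can be computed either directly from its definition (count the integers from 1 to n that are relatively prime to n) or from the product over the distinct prime divisors of n. The sketch below compares the two; the function names are illustrative assumptions, and the product is evaluated with integer arithmetic to avoid rounding.

```python
from math import gcd


def phi_by_count(n: int) -> int:
    """Count the integers x with 1 <= x <= n and gcd(x, n) = 1."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)


def phi_by_formula(n: int) -> int:
    """Evaluate n * (1 - 1/p1) * ... * (1 - 1/pk) over the distinct primes of n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p  # multiply result by (1 - 1/p)
        p += 1
    if m > 1:  # one prime factor larger than sqrt(n) may remain
        result -= result // m
    return result
```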
Derangements:
Among the permutations of 1, 2, ..., n there are some, called derangements, in which none of
the n integers appears in its natural place.
Dn = The number of derangements of n elements
$D_n = n! \left[1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}\right]$
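The number of derangements of n elements is given by the standard alternating sum n! (1 - 1/1! + 1/2! - ... + (-1)^n / n!). The sketch below evaluates that sum with exact integers and compares it with a brute-force count over all permutations; both function names are assumptions made for this example.

```python
from itertools import permutations
from math import factorial


def derangements_formula(n: int) -> int:
    """n! * sum_{k=0..n} (-1)^k / k!, computed with exact integer division."""
    return sum((-1) ** k * (factorial(n) // factorial(k)) for k in range(n + 1))


def derangements_brute(n: int) -> int:
    """Count permutations of 0..n-1 in which no element stays in its own place."""
    return sum(
        1
        for p in permutations(range(n))
        if all(p[i] != i for i in range(n))
    )
```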
Summation:
1.
2.
3.
4.
5.
C (n, r) C (r, k) = C (n, k) C (n - k, r - k), for integers $n \ge r \ge k \ge 0$ (Newton's identity)
C (n, r) . r = n . C (n - 1, r - 1)
P (n, r) = n . P (n - 1, r - 1)
Pascal's identity: C (n, r) = C (n - 1, r) + C (n - 1, r - 1)
C (n, 0) = 1 = C (n, n)
C (n, 1) = n = C (n, n - 1)
Notation:
C (n, 0) = C0,
C (n, 2) = C2
C (n, r) = Cr
C (n, n) = Cn.
Row Summation:
$C_0 + C_1 + C_2 + \cdots + C_n = 2^n$
$C_0 + C_2 + C_4 + \cdots = C_1 + C_3 + C_5 + \cdots = 2^{n-1}$
Diagonal Summation:
11. C (n, 0) + C (n + 1, 1) + C (n + 2, 2) + ... + C (n + r, r) = C (n + r + 1, r)
12. C (m, 0) C (n, 0) + C (m, 1) C (n, 1) + ... + C (m, n) C (n, n) = C (m + n, n), for integers
$m \ge n \ge 0$
Column Summation:
13. C (r, r) + C (r + 1, r) + ... + C (n, r) = C (n + 1, r + 1), for any positive integer $n \ge r$
14. $2 \cdot 1\, C_2 + 3 \cdot 2\, C_3 + \cdots + n(n-1)\, C_n = 2^{n-2}\, n(n-1)$
15. $1^2 C_1 + 2^2 C_2 + \cdots + n^2 C_n = 2^{n-2}\, n(n+1)$
16. $1^3 C_1 + 2^3 C_2 + \cdots + n^3 C_n = 2^{n-3}\, n^2 (n+3)$
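The row, diagonal and column summation identities for binomial coefficients are easy to spot-check by direct computation. The sketch below is one way to do so, assuming Python 3.8+ for math.comb; the helper names are illustrative only.

```python
from math import comb


def row_sum(n: int) -> int:
    """C(n,0) + C(n,1) + ... + C(n,n)."""
    return sum(comb(n, k) for k in range(n + 1))


def diagonal_sum(n: int, r: int) -> int:
    """C(n,0) + C(n+1,1) + ... + C(n+r,r)."""
    return sum(comb(n + i, i) for i in range(r + 1))


def column_sum(n: int, r: int) -> int:
    """C(r,r) + C(r+1,r) + ... + C(n,r)."""
    return sum(comb(i, r) for i in range(r, n + 1))


def weighted_sum(n: int, power: int) -> int:
    """1^p C(n,1) + 2^p C(n,2) + ... + n^p C(n,n)."""
    return sum(k ** power * comb(n, k) for k in range(1, n + 1))
```

For n = 5 these give 2^5 for the row sum, 2^3 x 5 x 6 for the squares and 2^2 x 25 x 8 for the cubes.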
Recurrence Relations
Definition: A recurrence relation is a formula that relates, for any integer $n \ge 1$, the n-th term of a
sequence $a_n$ to one or more of the terms $a_0, a_1, \ldots, a_{n-1}$.
Examples of recurrence relations: If $S_n$ denotes the sum of the first n positive integers, then
1. $S_n = n + S_{n-1}$.
Similarly, if d is a real number, then the nth term of an arithmetic progression with common
difference d satisfies the relation
2. $a_n = a_{n-1} + d$.
Likewise, if $P_n$ denotes the nth term of a geometric progression with common ratio r, then
3. $P_n = r P_{n-1}$.
We list other examples as:
Fibonacci relation: The relation $F_n = F_{n-1} + F_{n-2}$ with the initial
conditions $F_0 = 0$ and $F_1 = 1$.
The numbers generated by the Fibonacci relation with these initial conditions are called Fibonacci
numbers:
0, 1, 1, 2, 3, 5, 8, 13, ...
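A relation of the form $c_0(n)\, a_n + c_1(n)\, a_{n-1} + \cdots + c_k(n)\, a_{n-k} = f(n)$, for $n \ge k$,
is a linear recurrence relation. If the $c_i(n)$ are constants, then it is a Linear Recurrence Relation
with constant coefficients.
Homogeneous: f(n) = 0
Non-homogeneous: $f(n) \ne 0$
Degree: the number of previous terms of the sequence used to define $a_n$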
Ex:
1. $f_n = f_{n-1} + f_{n-2}$ (degree 2, linear)
2. $a_n = a_{n-5}$ (degree 5, linear)
3. $a_n = a_{n-1} + a_{n-2}^2$ (non-linear)
4. $a_n = a_{n-1} + 2a_{n-2}$ (linear homogeneous, degree 2)
5. $a_n = a_{n-1} + 11a_{n-2} + a_{n-3}$ (linear homogeneous, degree 3)
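1. Write the relation as $y_{n+k} + a_1 y_{n+k-1} + a_2 y_{n+k-2} + \cdots + a_k y_n = 0$.
2. In operator form this is $(E^k + a_1 E^{k-1} + \cdots + a_k)\, y_n = 0$, where $E y_n = y_{n+1}$.
3. Form the auxiliary equation $E^k + a_1 E^{k-1} + \cdots + a_k = 0$ and solve it for E.
4. Write the solution, i.e. the C.F., according to the roots of the auxiliary equation:
Roots $\lambda_1, \lambda_2, \lambda_3, \ldots$ (real and distinct): $c_1 (\lambda_1)^n + c_2 (\lambda_2)^n + c_3 (\lambda_3)^n + \cdots$
Roots $\alpha + i\beta,\ \alpha - i\beta$ (a pair of imaginary roots): $r^n (c_1 \cos n\theta + c_2 \sin n\theta)$,
where $r = \sqrt{\alpha^2 + \beta^2}$ and $\theta = \tan^{-1}(\beta / \alpha)$
Roots $\lambda_1, \lambda_1, \lambda_1, \ldots$ (a root repeated three times): $(c_1 + c_2 n + c_3 n^2)(\lambda_1)^n + \cdots$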
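For a second-degree linear homogeneous relation with constant coefficients, the root-based solution can be compared with direct iteration. The sketch below assumes a relation of the form a_n = p a_(n-1) + q a_(n-2) whose auxiliary equation r^2 - p r - q = 0 has two distinct real roots; the function names, and the restriction to this case, are assumptions of the example rather than part of the notes.

```python
from math import sqrt


def iterate(p, q, a0, a1, n):
    """Compute a_n for a_n = p*a_(n-1) + q*a_(n-2) directly from the recurrence."""
    a, b = a0, a1
    for _ in range(n):
        a, b = b, p * b + q * a
    return a


def closed_form(p, q, a0, a1, n):
    """Evaluate c1*r1**n + c2*r2**n, where r1, r2 solve r^2 - p*r - q = 0."""
    disc = p * p + 4 * q
    if disc <= 0:
        raise ValueError("this sketch only handles two distinct real roots")
    r1 = (p + sqrt(disc)) / 2
    r2 = (p - sqrt(disc)) / 2
    # The constants follow from c1 + c2 = a0 and c1*r1 + c2*r2 = a1.
    c1 = (a1 - a0 * r2) / (r1 - r2)
    c2 = a0 - c1
    return c1 * r1 ** n + c2 * r2 ** n
```

For a_n = a_(n-1) + 2 a_(n-2) with a_0 = 2 and a_1 = 7 the roots are 2 and -1, so the closed form is 3 x 2^n - (-1)^n. Because the closed form uses floating point, results should be rounded before comparing with the exact iteration.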
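For the non-homogeneous relation $y_{n+k} + a_1 y_{n+k-1} + \cdots + a_k y_n = f(n)$, i.e. $\phi(E)\, y_n = f(n)$
with $\phi(E) = E^k + a_1 E^{k-1} + \cdots + a_k$, the particular integral is P.I. = $\frac{1}{\phi(E)} f(n)$.
Case(i): When f(n) = $a^n$
P.I. = $\frac{1}{\phi(E)} a^n = \frac{a^n}{\phi(a)}$, provided $\phi(a) \ne 0$.
If $\phi(a) = 0$, then
(i) for $(E - a)\, y_n = a^n$, P.I. = $\frac{1}{E - a} a^n = n a^{n-1}$
(ii) for $(E - a)^2\, y_n = a^n$, P.I. = $\frac{1}{(E - a)^2} a^n = \frac{n(n-1)}{2!} a^{n-2}$
(iii) for $(E - a)^3\, y_n = a^n$, P.I. = $\frac{1}{(E - a)^3} a^n = \frac{n(n-1)(n-2)}{3!} a^{n-3}$
and so on.
Case(ii): When f(n) = $n^p$
P.I. = $\frac{1}{\phi(E)} n^p = \frac{1}{\phi(1 + \Delta)} n^p$.
(1) Expand $[\phi(1 + \Delta)]^{-1}$ in ascending powers of $\Delta$ by the Binomial theorem, as far as the
term in $\Delta^p$.
(2) Express $n^p$ in the factorial form and operate on it with each term of the expansion.
Case(iii): When f(n) = $a^n F(n)$, F(n) being a polynomial of finite degree in n.
P.I. = $\frac{1}{\phi(E)} a^n F(n) = a^n \cdot \frac{1}{\phi(aE)} F(n)$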
a, b, c,
( )
is odd}
0 = 1, , , 7,
Sub Set: Let A and B be any two sets. If every element of A is an element of B, then A is called a
subset of B. Symbolically this relation is denoted by $A \subseteq B$.
Note: For any set A, $A \subseteq A$ and $\emptyset \subseteq A$; i.e., for any set A, $\emptyset$ and A are called trivial subsets.
Some examples of sets are
N = the set of natural numbers or positive integers: 1, 2, 3, . . .
Z = the set of all integers: . . . , -2, -1, 0, 1, 2, . . .
Q = the set of rational numbers
R = the set of real numbers
C = the set of complex numbers
Observe that $N \subseteq Z \subseteq Q \subseteq R \subseteq C$.
Proper Sub Set: If $A \subseteq B$ and $A \ne B$, then A is called a proper subset of B. This relationship is
denoted by $A \subset B$.
Equal Sets: Two sets A and B are equal [A = B] iff $A \subseteq B$ and $B \subseteq A$. Sets are equal if they have
the same elements.
Equivalent Sets: Two sets A and B are equivalent iff they have the same cardinality, |A| = |B|. This is
denoted by A ~ B.
Empty Set: A set with no elements is called the empty set and it is denoted by $\emptyset$ or { }.
Universal Set: The set of all objects under consideration or discussion; it is denoted by U.
Singleton: A set with only one element is called a singleton. Ex: {0}, {1}
Set Difference: $A - B = \{x \mid x \in A \text{ and } x \notin B\}$
Set Union: $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$
Set Intersection: $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
Set Complement: $A^c = \{x \mid x \in U \text{ and } x \notin A\}$
Symmetric Difference: $A \oplus B = (A - B) \cup (B - A) = (A \cup B) - (A \cap B)$
Venn diagram: It is a pictorial representation of sets in which sets are represented by enclosed
areas in the plane. The universal set U is represented by the interior of a rectangle, and the other
sets are represented by disks lying within the rectangle.
[Figure: Venn diagrams of sets A and B inside the universal set U]
Cardinality of a Set: The number of elements in a set is called the cardinality of that set. It is
denoted by |A|.
+
=
Power Set: Let A be any set. Then the set of all subsets of A is called the power set of A. It is
denoted by P(A).
Ex: A = {1, 2}
P(A) = {{ }, {1}, {2}, {1, 2}}
Note: If a set A contains n elements, then its power set P(A) contains $2^n$ elements.
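The set operations satisfy the following laws:
Idempotent Laws: (1a) $A \cup A = A$ (1b) $A \cap A = A$
Associative Laws: (2a) $(A \cup B) \cup C = A \cup (B \cup C)$ (2b) $(A \cap B) \cap C = A \cap (B \cap C)$
Commutative Laws: (3a) $A \cup B = B \cup A$ (3b) $A \cap B = B \cap A$
Distributive Laws: (4a) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ (4b) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
Identity Laws: (5a) $A \cup \emptyset = A$ (5b) $A \cap U = A$ (6a) $A \cup U = U$ (6b) $A \cap \emptyset = \emptyset$
Involution Laws: (7) $(A^c)^c = A$
Complement Laws: (8a) $A \cup A^c = U$ (8b) $A \cap A^c = \emptyset$ (9a) $U^c = \emptyset$ (9b) $\emptyset^c = U$
DeMorgan's Laws: (10a) $(A \cup B)^c = A^c \cap B^c$ (10b) $(A \cap B)^c = A^c \cup B^c$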
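The laws of the algebra of sets (associative, distributive, identity, involution, complement and De Morgan) can be spot-checked with Python's built-in set type. This is only an illustration on one choice of sets, not a proof; the universal set U and the sets A, B, C below are arbitrary assumptions.

```python
U = set(range(10))            # assumed universal set for the example
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8}


def complement(S):
    """Complement relative to the universal set U."""
    return U - S


laws = {
    "idempotent": ((A | A) == A) and ((A & A) == A),
    "associative": ((A | B) | C == A | (B | C)) and ((A & B) & C == A & (B & C)),
    "commutative": ((A | B) == (B | A)) and ((A & B) == (B & A)),
    "distributive": (A | (B & C) == (A | B) & (A | C))
    and (A & (B | C) == (A & B) | (A & C)),
    "identity": ((A | set()) == A) and ((A & U) == A),
    "involution": complement(complement(A)) == A,
    "complement": ((A | complement(A)) == U) and ((A & complement(A)) == set()),
    "de_morgan": (complement(A | B) == complement(A) & complement(B))
    and (complement(A & B) == complement(A) | complement(B)),
}
```

Every entry of laws evaluates to True for these sets.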
= ( , , )
Ex: {1, 2}
Note:
1.
2.
3.
4.
5.
6.
7.
In general,
If
=
then either = or (
If
= ,
= and
= then
( )=(
)(
)
(
)=(
) (
)
(
)=(
) (
)
(
)
(
)=(
) (
)
)
=
and
)].
$x R x\ \forall x \in A$ (R is reflexive)
If x R y then y R x (R is symmetric)
If x R y and y R z then x R z (R is transitive)
Antisymmetric relation: A relation R on a set A is antisymmetric if $(a, b) \in R$ and $(b, a) \in R$
imply a = b.
Note:
1) The properties of being symmetric and being antisymmetric are not negatives of each other.
2) For sets A and B, $A \times B$ is called the universal relation from A to B and $\emptyset$ is called the empty
relation from A to B.
3) A relation can be both symmetric and antisymmetric, or it can be neither.
Irreflexive relation: A relation R on a set A is irreflexive if $(x, x) \notin R$ for all $x \in A$.
Note: A relation that is not reflexive is not necessarily irreflexive, and vice versa.
Asymmetric relation: A relation R is asymmetric if, for all x, y, $(x, y) \in R$ implies $(y, x) \notin R$.
Diagonal Relation: Let A be any set. The diagonal relation consists of all ordered pairs (a, b) such
that a = b, i.e. $\Delta_A = \{(a, a) \mid a \in A\}$.
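The properties of a relation on a finite set can be tested mechanically when the relation is stored as a set of ordered pairs. The sketch below uses that representation; the function names are illustrative assumptions.

```python
def is_reflexive(R, A):
    return all((x, x) in R for x in A)


def is_irreflexive(R, A):
    return all((x, x) not in R for x in A)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_antisymmetric(R):
    # (x, y) and (y, x) both in R must force x == y
    return all(x == y for (x, y) in R if (y, x) in R)


def is_asymmetric(R):
    return all((y, x) not in R for (x, y) in R)


def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)
```

For example, the diagonal relation on {1, 2, 3} is both symmetric and antisymmetric, and the relation {(1, 1)} on the same set is neither reflexive nor irreflexive.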
Composition of Relations:
Let A, B and C be sets, and let R be a relation from A to B and let S be a relation from B to C.
That is, R is a subset of $A \times B$ and S is a subset of $B \times C$. Then R and S give rise to a relation
from A to C denoted by $R \circ S$ and defined by: $a (R \circ S) c$ if for some $b \in B$ we have a R b and b S c.
Equivalently, $R \circ S = \{(a, c) \mid \text{there exists } b \in B \text{ for which } (a, b) \in R \text{ and } (b, c) \in S\}$.
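The composition of a relation R from A to B with a relation S from B to C pairs a with c whenever some b links them. A minimal sketch, with relations stored as sets of ordered pairs and compose as an assumed helper name:

```python
def compose(R, S):
    """Return {(a, c) : there is b with (a, b) in R and (b, c) in S}."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}
```

With R = {(1, 'x'), (2, 'y')} and S = {('x', 'p'), ('y', 'q'), ('y', 'r')}, the composition is {(1, 'p'), (2, 'q'), (2, 'r')}.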
Note:
i) A relation R* is the transitive ( symmetric, reflexive) closure of R , if R* is the smallest relation
containing R which is transitive ( symmetric, reflexive).
ii) $R \cup R^{-1}$ is the symmetric closure of R and $R \cup \Delta_A$ is the reflexive closure of R, where
$\Delta_A$ is the diagonal relation on A.
Partition: A partition of S is a subdivision of S into non-overlapping, non-empty subsets
$A_1, A_2, \ldots, A_n$ such that
$A_1 \cup A_2 \cup \cdots \cup A_n = S$ and $A_i \cap A_j = \emptyset$ if $i \ne j$.
Equivalence Classes: Given any set A and an equivalence relation R on A, the equivalence class [x]
of each element x of A is defined by
$[x] = \{ y \in A \mid x R y \}$
Note: i) We can have [x] = [y] even if $x \ne y$, provided x R y.
ii) Given any set A and any equivalence relation R on A, $S = \{ [x] \mid x \in A \}$ is a partition of A.
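The fact that the equivalence classes of an equivalence relation partition the set can be illustrated on a small case. The sketch below groups elements into classes given a predicate related(x, y); it assumes the predicate really is an equivalence relation, and the names are chosen for this example only.

```python
def equivalence_classes(A, related):
    """Group the elements of A into classes of the equivalence `related`."""
    classes = []
    for x in A:
        for cls in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes
```

For congruence modulo 3 on {0, 1, ..., 6} the classes are {0, 3, 6}, {1, 4} and {2, 5}; they are non-empty, pairwise disjoint, and their union is the whole set.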
Comparability:
Two elements a and b in a poset A are said to be comparable under the relation $\le$ if either $a \le b$
or $b \le a$. Otherwise, they are not comparable.
If every pair of elements of A is comparable, then we say [A; $\le$] is a totally ordered set (or)
linearly ordered set (or) chain.
Upper bound: The upper bounds of a and b are all the elements c such that $a \le c$ and $b \le c$.
If $c = a \vee b$ then c satisfies
i) $a \le c$ and $b \le c$
ii) If $a \le d$ and $b \le d$ then $c \le d$,
i.e. c is the lub of (a, b).
Similarly, $a \wedge b$ is the glb of (a, b).
Hasse diagram
[Figure: Hasse diagrams for A = {1, 2, 3, 4}]
Hasse diagram
i)
ii)
iii)
y cover
Join Semi Lattice: A poset [A; $\le$] in which each pair of elements a and b of A has a unique least
upper bound (join) is called a join semi lattice.
Meet Semi Lattice: A poset [A; $\le$] in which each pair of elements a and b of A has a unique glb
(meet) is called a meet semi lattice.
Lattice: A lattice is a poset in which each pair of elements has a unique lub and a unique glb. In other
words, a lattice is both a join semi lattice and a meet semi lattice.
A lattice is often denoted by [L, $\vee$, $\wedge$].
The following laws hold in L:
i) $a \vee b = b \vee a$ and $a \wedge b = b \wedge a$
ii) $(a \vee b) \vee c = a \vee (b \vee c)$ and $(a \wedge b) \wedge c = a \wedge (b \wedge c)$
iii) $a \vee (a \wedge b) = a$ and $a \wedge (a \vee b) = a$
iv) $a \vee a = a$ and $a \wedge a = a$
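A concrete lattice on which the commutative, associative, absorption and idempotent laws can be checked is the set of divisors of 12 ordered by divisibility, where the join is the least common multiple and the meet is the greatest common divisor. This choice of example, and the names below, are assumptions made for illustration.

```python
from itertools import product
from math import gcd


def join(a, b):
    """Least upper bound under divisibility: lcm(a, b)."""
    return a * b // gcd(a, b)


def meet(a, b):
    """Greatest lower bound under divisibility: gcd(a, b)."""
    return gcd(a, b)


def lattice_laws_hold(elements):
    """Check laws i) to iv) for every triple of elements."""
    for a, b, c in product(elements, repeat=3):
        if join(a, b) != join(b, a) or meet(a, b) != meet(b, a):
            return False
        if join(join(a, b), c) != join(a, join(b, c)):
            return False
        if meet(meet(a, b), c) != meet(a, meet(b, c)):
            return False
        if join(a, meet(a, b)) != a or meet(a, join(a, b)) != a:
            return False
        if join(a, a) != a or meet(a, a) != a:
            return False
    return True


divisors_of_12 = [d for d in range(1, 13) if 12 % d == 0]
```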
Ex:
[Figure: three Hasse diagrams: two lattices, and a poset that is not a lattice because there is no lub for (b, c)]
In a bounded lattice with lower bound 0 and upper bound 1: $a \vee 1 = 1$, $a \wedge 1 = a$, $a \vee 0 = a$, $a \wedge 0 = 0$.
In a distributive lattice:
$a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)$
$a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c)$
Complemented Lattice: Let L be a bounded lattice with lower bound 0 and upper bound 1. Let a
be an element of L. An element x in L is called a complement of a if $a \vee x = 1$ and $a \wedge x = 0$.
Note: In a lattice, complements need not exist and need not be unique.
Def: A lattice L is said to be a complemented lattice if L is bounded and every element in L has a
complement.
Note: Let L be a bounded distributive lattice. Then complements are unique if they exist.
Lattices And Boolean Algebra
Maximal Element: An element of a poset that is not less than any other element of the poset.
Denoted as 1
Minimal Element:An element of a poset that is not greater than any other element of the poset.
Denoted as 0
1)
2)
3)
4)
Note: The lub and glb of any pair (a, b) in a poset, if exists, are unique.
Boolean Algebras (Boolean Lattice)
A Lattice that contains the element 0 and 1, and which is both distributive and complemented is
called a Boolean Algebra.
Ex: The set of propositions is a Boolean algebra under the operations $\vee$ and $\wedge$, with
negation being the complement, a contradiction 0 as the zero element and a tautology 1 as the
unit element.
Boolean algebra can be represented by the system (B, + , . , - , 0, 1), where B is a set. The
following properties hold:
2. There exist at least two elements a, b $\in$ B such that $a \ne b$
3. $\forall a, b \in B$ (closure property)
a) $a + b \in B$
b) $a \cdot b \in B$
4. $\forall a, b \in B$ (commutative laws)
a) $a + b = b + a$
b) $a \cdot b = b \cdot a$
6. $\forall a, b, c \in B$ (distributive laws)
a) $a + (b \cdot c) = (a + b) \cdot (a + c)$
b) $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$
7. $\forall a, b, c \in B$ (associative laws)
a) $a + (b + c) = (a + b) + c$
b) $a \cdot (b \cdot c) = (a \cdot b) \cdot c$
8. $\forall a \in B$, $\exists\, \bar{a} \in B$ such that (inverse element)
a) $a + \bar{a} = 1$ and
b) $a \cdot \bar{a} = 0$
Note: $\bar{1} = 0$, $\bar{0} = 1$,
1 + 0 = 1, 0 + 1 = 1, 1 + 1 = 1, 0 + 0 = 0, $0 \cdot 0 = 0$, $1 \cdot 0 = 0$
Note:
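i) $\forall a \in B$: $a + a = a$ and $a \cdot a = a$
ii) The elements 0 and 1 are unique.
iii) $\forall a \in B$: $a + 1 = 1$ and $a \cdot 0 = 0$
iv) $\bar{a}$ is unique
v) Let a, b $\in$ B; then $a + (a \cdot b) = a$ and $a \cdot (a + b) = a$ (Absorption Laws)
vi) $\forall a, b \in B$ (De Morgan's Laws): $\overline{a + b} = \bar{a} \cdot \bar{b}$ and $\overline{a \cdot b} = \bar{a} + \bar{b}$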
vii) ,
+
,
=
( + ) =
is a Boolean product
The sum of min terms that represents a given Boolean function is called the sum of products
expansion or Disjunctive Normal Form (DNF).
Max terms: A Boolean expression of the form of a disjunction (sum) of n literals is called a max
term.
The product of max terms that represents a Boolean function is called the product of sums
expansion or Conjunctive Normal Form (CNF).
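The sum of products and product of sums expansions can be read off a truth table: each row where the function is 1 contributes a min term, and each row where it is 0 contributes a max term. The sketch below builds both as strings. The notation is an assumption of this example: a trailing apostrophe marks a complemented literal, "." is the Boolean product and "+" the Boolean sum.

```python
from itertools import product


def dnf(f, names):
    """Sum of the min terms on which the Boolean function f equals 1."""
    terms = []
    for values in product((0, 1), repeat=len(names)):
        if f(*values):
            # A variable appears plain when it is 1 and complemented when it is 0.
            terms.append(".".join(n if v else n + "'" for n, v in zip(names, values)))
    return " + ".join(terms)


def cnf(f, names):
    """Product of the max terms on which the Boolean function f equals 0."""
    terms = []
    for values in product((0, 1), repeat=len(names)):
        if not f(*values):
            # A variable appears complemented when it is 1 and plain when it is 0.
            literals = " + ".join(n + "'" if v else n for n, v in zip(names, values))
            terms.append("(" + literals + ")")
    return ".".join(terms)
```

For the exclusive-or of x and y, dnf returns x'.y + x.y' and cnf returns (x + y).(x' + y').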
Function:
Let A and B be non-empty sets. A function f from A to B is an assignment of exactly one
element of B to each element of A.
f(a) = b
b is the image of a and a is a preimage of b
A = Domain of f
B = Co-domain of f
The set of all image values is called the range of f.
One-To-One Function (Injection): A function $f: A \to B$ is one-to-one if
different elements in the domain A have distinct images:
$a \ne b \Rightarrow f(a) \ne f(b)$, or equivalently $f(a) = f(b) \Rightarrow a = b$
Inverse Of a Function:
:
NOTE: A function :
is a function from B to A.
Constant Function:
( )= ,
Identity Function:
A mapping :
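NOTE: Let m and n be positive integers with $m \ge n$; then there are
$n^m - C(n, 1)(n-1)^m + C(n, 2)(n-2)^m - \cdots + (-1)^{n-1} C(n, n-1)\, 1^m$
onto functions from a set with m elements to a set with n elements.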
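The number of onto functions from an m-element set to an n-element set is given by the inclusion-exclusion sum n^m - C(n, 1)(n - 1)^m + C(n, 2)(n - 2)^m - ... . The sketch below compares that sum with a brute-force enumeration of all functions; the function names are illustrative assumptions.

```python
from itertools import product
from math import comb


def onto_formula(m: int, n: int) -> int:
    """Inclusion-exclusion count of onto functions from m elements to n elements."""
    return sum((-1) ** k * comb(n, k) * (n - k) ** m for k in range(n))


def onto_brute(m: int, n: int) -> int:
    """Enumerate all n**m functions and keep those whose range is the whole co-domain."""
    target = set(range(n))
    return sum(1 for f in product(range(n), repeat=m) if set(f) == target)
```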
Groups
Binary Operation: Let A and B be two sets. A function from $A \times A$ to B is called a binary operation
on set A.
Algebraic System: A set A with one or more binary operations defined on it is called an algebraic
system.
Ex: ( , ), ( , , ), ( , +, )
Associativity: Let * be a binary operation on a set A. The operation * is said to be associative if
(a*b)*c=a*(b*c) for all, a, b, c in A
Identity: For an algebraic system (A,*), an element e in A is said to be an identity element of A if
a * e = e * a = a for all $a \in A$
Inverse: Let (A, *) be an algebraic system with identity e. Let a be an element in A. An element b
is said to be an inverse of a if
a * b = b * a = e
Closed operation: A binary operation * is said to be a closed operation on A if $a * b \in A$ for all $a, b \in A$.
Group: An algebraic system (A, *) is called a group if
* is a closed operation
* is an associative operation
There is an identity in A
Every element in A has an inverse
Abelian group: a group in which a * b = b * a for all $a, b \in A$ [commutative property]
Order of a group is equal to the number of elements in that group. Denoted as O(G)
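For a small finite algebraic system, the four group conditions (closure, associativity, identity, inverses) can be checked exhaustively. The sketch below does this for an explicit list of elements and an operation passed as a Python function; is_group is a name assumed for this example.

```python
from itertools import product


def is_group(elements, op):
    """Exhaustively test closure, associativity, identity and inverses."""
    elements = list(elements)
    members = set(elements)
    if any(op(a, b) not in members for a, b in product(elements, repeat=2)):
        return False  # not closed
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False  # not associative
    identities = [e for e in elements
                  if all(op(a, e) == a == op(e, a) for a in elements)]
    if not identities:
        return False  # no identity
    e = identities[0]
    # Every element needs some b with a*b = b*a = e.
    return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)
```

The integers 0 to 5 under addition modulo 6 pass the test, while 1 to 5 under multiplication modulo 6 fail because the operation is not closed (2 x 3 = 0 modulo 6).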
iii. e
2. A necessary and sufficient condition for a non-empty subset H of a group (G, *) to be a sub
group is that $a * b^{-1} \in H$ for all $a, b \in H$.
3. A necessary and sufficient condition for a non-empty finite subset H of a group (G, *) to be a
sub group is that $a * b \in H$ for all $a, b \in H$.
4. For any group (G, *), {e} and G are trivial sub groups.
Cosets
If H is a sub group of G and $a \in G$ then the set
$Ha = \{ha \mid h \in H\}$ is called a right coset of H in G.
Similarly,
$aH = \{ah \mid h \in H\}$ is called a left coset of H in G.
Note: 1. Any two left (right) cosets of H in G are either identical or disjoint.
2. Let H be a sub group of G. Then the right cosets of H form a partition of G,
i.e., the union of all right cosets of a sub group H is equal to G.
3. Lagrange's theorem: The order of each sub group of a finite group is a divisor of the order of
the group, i.e. O(H) divides O(G).
4. The order of every element of a finite group is a divisor of the order of the group.
5. The converse of Lagrange's theorem need not be true.
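The partition of a group into right cosets, and with it Lagrange's theorem, can be seen on a small example. The sketch below lists the distinct right cosets Ha of a subgroup H; the example group (integers modulo 12 under addition) and the name cosets are assumptions made for illustration.

```python
def cosets(G, H, op):
    """Return the distinct right cosets Ha = {h*a : h in H} for a in G."""
    found = []
    for a in G:
        coset = frozenset(op(h, a) for h in H)
        if coset not in found:
            found.append(coset)
    return found
```

For G = {0, ..., 11} under addition modulo 12 and H = {0, 4, 8} there are 4 cosets of 3 elements each, so O(H) = 3 divides O(G) = 12.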
Cyclic Groups
A group G is called cyclic if, for some $a \in G$, every element of G is an integral power of a, i.e.
$G = \{a^n \mid n \in \mathbb{Z}\}$.
If f is a bijection then f is called an isomorphism between G and G' and we write $G \cong G'$.
(v)) of node V
For an undirected edge e = {a, b}:
a, b are adjacent
e is said to be incident with the vertices a, b
a, b are end points of e
Directed graph: for a directed edge e = (a, b):
e is incident from a and incident to b
a is said to be adjacent to b
b is said to be adjacent from a
a = initial vertex
b = terminal or end vertex
Degree:
In an undirected graph, the degree of a node V, denoted as deg(V), is the number of edges
incident with it, except that a loop at a vertex contributes twice to the degree.
In degree ($\deg^-(V)$): In a directed graph, the number of edges incident to a vertex is called
the in degree of the vertex.
Complement of a graph:
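Let G be a graph with n vertices; then the complement of G, denoted by $\bar{G}$, is a simple
graph with the same vertices as G so that an edge is present in $\bar{G}$ iff it is not present in G.
If G has n vertices, then $G \cup \bar{G} = K_n$ and $|E(\bar{G})| = |E(K_n)| - |E(G)|$.
[Figure: a graph on the vertices A, B, C, D and its complement]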
Note: In a simple graph with monotonically non-increasing degree sequence $d_1 \ge d_2 \ge \cdots \ge d_n$, this
inequality should hold:
$\sum_{i=1}^{k} d_i \le k(k-1) + \sum_{i=k+1}^{n} \min(k, d_i)$, for $1 \le k \le n$ (k = 1, 2, ..., n),
where n = number of vertices.
i.e. if each vertex of G has degree k, then G is said to be a regular graph of degree k (k-regular).
Example: A polygon is a 2-regular graph.
Complete Graph: A simple graph with n mutually adjacent vertices is called a complete
graph on n vertices and may be represented by $K_n$.
Note: A complete graph on n vertices has n(n-1)/2 edges, and each of its vertices has
degree n-1.
Cycle Graph: A cycle graph of order n is a connected graph whose edges form a cycle of
length n. It is denoted by $C_n$.
Null Graph: A null graph of order n is a graph with n vertices and no edges.
Wheel Graph: A wheel graph of order n is obtained by joining a single new vertex (the
hub) to each vertex of a cycle graph of order n.
Example: the wheel graph of order 4 has 5 vertices and 8 edges.
NOTE: A wheel graph $W_n$ has n+1 vertices and 2n edges.
Bipartite Graph: A Bipartite graph is a non directed graph whose set of vertices can be
partitioned into two sets M and N in such a way that each edge joins a vertex in M to a
vertex in N.
Complete Bipartite Graph: A complete Bipartite graph is a Bipartite graph in which every
vertex of M is adjacent to every vertex of N.
If |M| = m and |N| = n, then the complete Bipartite graph is denoted by $K_{m,n}$. It has mn
edges.
Degree Sequence: If $v_1, v_2, \ldots, v_n$ are the vertices of a graph G, then the sequence
$(d_1, d_2, \ldots, d_n)$, where $d_i$ = degree of $v_i$, is called the degree sequence of G. Usually we order the degree
sequence so that it is monotonically non-increasing.
Sum Of Degrees Theorem: If $V = \{v_1, v_2, \ldots, v_n\}$ is the vertex set of a non directed graph G,
then $\sum_{i=1}^{n} \deg(v_i) = 2|E|$.
For Digraph:
$\sum_{i=1}^{n} \deg^+(v_i) = \sum_{i=1}^{n} \deg^-(v_i) = |E|$
and $\delta(G) \le 2|E| / |V| \le \Delta(G)$
Note:
1) An undirected graph has an even number of vertices of odd degree.
2) If $k = \delta(G)$ is the minimum degree of all vertices in an undirected graph G, then
$k|V| \le \sum \deg(v) = 2|E|$; in particular, if G is k-regular, then $k|V| = \sum \deg(v) = 2|E|$.
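The sum of degrees theorem and its consequences can be checked on any small graph stored as an edge list. The sketch below is one such check; the graph and the name degrees are assumptions made for this example.

```python
def degrees(vertices, edges):
    """Degree of every vertex of an undirected (multi)graph given as an edge list."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a loop (u == v) this adds a second count to the same vertex
    return deg


vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
deg = degrees(vertices, edges)
```

Here the degrees are 2, 2, 3 and 1, which sum to 8 = 2|E|; there are two vertices of odd degree, and the average degree 2|E|/|V| = 2 lies between the minimum degree 1 and the maximum degree 3.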
Connectivity
Path: In a non directed graph G, a sequence p of zero or more edges of the form {v0, v1},
{v1, v2}, ..., {vn-1, vn} (or v0 - v1 - v2 - ... - vn) is called a path from v0 to vn.
v0 is called the initial vertex and vn is called the terminal vertex.
v0 and vn are called end points of the path.
We denote p as a v0 - vn path.
If $v_0 \ne v_n$, then p is called an open path.
If v0 = vn then p is called a closed path.
Simple Path: A path p is simple if all edges and vertices on the path are distinct except
possibly the end points.
An open simple path of length n has n + 1 distinct vertices and n distinct edges.
A closed simple path of length n has n distinct vertices and n distinct edges.
Example: a - b - c - a is a cycle; a - b - d - e - b - c - a is a circuit.
A cycle is a circuit with no other repeated vertices except its end points.
A loop is a cycle of length 1.
In a graph, a cycle that is not a loop must have length at least 3 edges.
In a multi graph, there may be a cycle of length 2.
Two paths in a graph are edge disjoint if they don't share a common edge.
Two paths are vertex disjoint if they do not share a common vertex.
An undirected graph is called connected if there is a path between every pair of
distinct vertices.
A graph that is not connected is the union of two or more connected sub graphs each
pair of which has no vertex in common. These disjoint connected sub graphs are
called connected components of G.
If a graph G is connected and e is an edge such that G - e is not connected, then e is
said to be a bridge or cut edge.
If v is a vertex such that G - v is not connected, then v is a cut vertex.
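Connected components, bridges and cut vertices can all be found directly from their definitions on a small graph: count the components, delete one edge or one vertex, and count again. The sketch below takes this brute-force route for clarity; it is not an efficient algorithm, and all names are assumptions made for the example.

```python
def components(vertices, edges):
    """Connected components of an undirected graph, found by depth-first search."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def bridges(vertices, edges):
    """Edges whose removal increases the number of components."""
    base = len(components(vertices, edges))
    return [e for i, e in enumerate(edges)
            if len(components(vertices, edges[:i] + edges[i + 1:])) > base]


def cut_vertices(vertices, edges):
    """Vertices whose removal increases the number of components."""
    base = len(components(vertices, edges))
    result = []
    for v in vertices:
        rest = [x for x in vertices if x != v]
        kept = [(a, b) for a, b in edges if v not in (a, b)]
        if len(components(rest, kept)) > base:
            result.append(v)
    return result
```

For a triangle a, b, c with an extra edge from c to d, the only bridge is the edge {c, d} and the only cut vertex is c.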
A digraph is weakly connected if there is a non directed path between each pair of
vertices.
A digraph is strongly connected if there is a directed path from a to b and from b to
a, for all vertices a, b in the graph.
Two vertices u and v of a directed graph G are said to be quasi strongly connected if
there is a vertex w from which there is a directed path to u and a directed path to v.
[Figure: examples of weakly connected, strongly connected and quasi-strongly connected digraphs]
Note:
1) The graph G is said to be quasi strongly connected if each pair of vertices in G is quasi
strongly connected.
2) G is quasi strongly connected $\Leftrightarrow$ there is a vertex r in G such that there is a directed
path from r to all the remaining vertices of G.
3) If G is quasi strongly connected and 'G - e' is not quasi strongly connected for each
edge e of G, then G is a directed tree.
4) G is a directed tree $\Leftrightarrow$ G is quasi strongly connected and contains a vertex r such that
the in degree of r is zero and the in degree of all other vertices is 1.
5) G is a directed tree $\Leftrightarrow$ G is quasi strongly connected without circuits.
6) A digraph G has a directed spanning tree iff G is quasi strongly connected.
7) Any strongly connected graph is also weakly connected.
8) A connected graph with n vertices has at least n - 1 edges.
9) In any simple graph there is a path from any vertex of odd degree to some other
vertex of odd degree.
Isomorphic Graphs: Two graphs G and G1 are isomorphic if there is a function
$f: V(G) \to V(G_1)$ such that
i) f is a bijection and
ii) for each pair of vertices u and v of G, $\{u, v\} \in E(G)$ iff $\{f(u), f(v)\} \in E(G_1)$.
[Figure: example graphs]
Euler Path: An Euler path in a multi graph is a path that includes each edge of the
multi graph exactly once and intersects each vertex of the multi graph at least once.
A multi graph is traversable if it has an Euler path.
A non directed multi graph has an Euler path iff it is connected and has zero or
exactly two vertices of odd degree.
A connected multi graph has an Euler circuit if and only if all its vertices have even
degree.
A directed multi graph G has an Euler circuit iff G is unilaterally connected and the in
degree of every vertex of G is equal to its out degree.
A directed graph that contains an Euler circuit is strongly connected.
Hamiltonian Graph: A Hamiltonian Graph is a graph with a closed path that includes
every vertex exactly once. Such a path is a cycle and is called a Hamiltonian cycle.
An Eulerian circuit uses every edge exactly once but may repeat vertices, while a
Hamiltonian cycle uses each vertex exactly once (except for the first and last) but may
skip edges.
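The degree conditions for Euler paths and Euler circuits in a non directed multi graph translate directly into a test: check connectivity, then count the vertices of odd degree. The sketch below follows that reading; the function name and the returned labels are assumptions made for this example, and the vertex list is assumed to be non-empty.

```python
from collections import Counter


def euler_status(vertices, edges):
    """Return 'circuit', 'path' or 'none' for an undirected multigraph."""
    deg = Counter()
    adj = {v: set() for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    # Connectivity check by depth-first search from the first vertex.
    start = vertices[0]
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    if len(seen) != len(vertices):
        return "none"
    odd = sum(1 for v in vertices if deg[v] % 2 == 1)
    if odd == 0:
        return "circuit"  # an Euler circuit, which is also an Euler path
    if odd == 2:
        return "path"     # an Euler path between the two odd vertices
    return "none"
```

The seven-bridge multi graph of Königsberg has four vertices of odd degree, so the function reports that it has neither an Euler path nor an Euler circuit.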
Planarity
A graph or a multi graph that can be drawn in a plane or on a sphere so that its edges do
not cross is called a planar graph.
Example: A complete graph on 4 vertices, K4, is a planar graph.
Example: A tree is a planar graph.
[Figure: planar drawings with their regions labelled]
Degree of a Region: The boundary of each region of a map consists of a sequence of edges
forming a closed path. The degree of region r denoted by deg (r) is the length of the
closed path bordering r.
deg (Region 1) = 3
deg (Region 2) = 3
deg (Region 3) = 4
The sum of the degrees of the regions of a map M is equal to twice the number of
edges in M:
$\sum_{r} \deg(r) = 2|E|$
Let G be a connected planar graph with e edges and v vertices. Let r be the number of
regions in a planar representation of G. Then r + v = e + 2 (Euler's formula).
In a planar graph G, if the degree of each region is $\ge k$, then $k|R| \le 2|E|$.
In particular, for a simple connected planar graph, $3|R| \le 2|E|$.
If G is a connected simple planar graph with |E| > 1 then,
(A) $|E| \le 3|V| - 6$
(B) There exists at least one vertex v of G such that $\deg(v) \le 5$
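Euler's formula and the edge bound |E| <= 3|V| - 6 give a quick numerical test. The sketch below computes the number of regions a connected planar graph must have and checks the edge bound; both helper names are assumptions made for this example. The bound is only a necessary condition: a graph that violates it is not planar, but a graph that satisfies it (for example K3,3 with 6 vertices and 9 edges) may still be non-planar.

```python
def euler_regions(v: int, e: int) -> int:
    """Regions of a connected planar graph, from r + v = e + 2."""
    return e - v + 2


def passes_edge_bound(v: int, e: int) -> bool:
    """Necessary condition |E| <= 3|V| - 6 for connected simple planar graphs."""
    return e <= 3 * v - 6
```

K4 has 4 vertices and 6 edges, so any planar drawing has 4 regions and the bound holds; K5 has 5 vertices and 10 edges and fails the bound, so it cannot be planar.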
for all
)
,
Kuratowski's Theorem: A graph is planar if and only if it contains no subgraph homeomorphic to
$K_5$ or $K_{3,3}$.
Coloring: A coloring of a simple graph is the assignment of a color to each vertex of the
graph so that no two adjacent vertices are assigned the same color (vertex coloring).
Chromatic Number: The minimum number of colors needed to paint a graph G is called
the chromatic number of G, denoted by $\chi(G)$.
Ex: $\chi(K_n) = n$; $\chi(C_n)$ = 2 if n is even and 3 if n is odd; $\chi(\text{Bipartite}) = 2$
Adjacent Regions: An assignment of colors to the regions of a map such that adjacent
regions have different colors.
A map 'M' is n-colorable if there exists a coloring of M which uses n colors.
A planar graph is 5-colorable, for both vertex coloring and map coloring.
Four color Theorem: If the regions of a map M are colored so that adjacent regions
have different colors, then no more than 4 colors are required.
Every planar graph is 4- colorable (vertex coloring).
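For very small graphs the chromatic number can be found by trying every assignment of k colors for k = 1, 2, 3, ... until a proper vertex coloring appears. The sketch below does exactly that; it is exponential in the number of vertices and is meant only as an illustration, with chromatic_number as an assumed name.

```python
from itertools import product


def chromatic_number(vertices, edges):
    """Smallest k for which a proper vertex coloring with k colors exists."""
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            color = dict(zip(vertices, colors))
            if all(color[u] != color[v] for u, v in edges):
                return k
    return 0  # only reached for a graph with no vertices
```

The complete graph K4 needs 4 colors, an odd cycle such as C5 needs 3, and an even cycle such as C6 needs 2.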
Matching
Matching: Given an undirected graph G = (V, E), a matching is a subset of edges ($M \subseteq E$)
such that for all vertices $v \in V$, at most one edge of M is incident on v, i.e. $\deg_M(v) \le 1\ \forall v \in V$.
$\deg_M(v) = 1$: the vertex is matched; $\deg_M(v) = 0$: the vertex is unmatched.
A perfect matching is a matching in which every vertex is matched, i.e. $\deg_M(v) = 1\ \forall v \in V$.
[Figure: a perfect matching on the vertices A, B, C, D]
A matching in a graph is a sub set of edges in which no two edges are adjacent.
A single edge in a graph is obviously a matching.
A maximal matching is a matching to which no edge in the graph can be added.
A graph may have many different maximal matchings and of different sizes. Among
these, the maximal matchings, with the largest number of edges are called the largest
maximal matchings. The number of edges in largest maximal matching is called the
matching number of the graph.
If a graph has a perfect matching, then the number of vertices in the graph must be
even, but the converse is not true. If n is even, the number of perfect matchings in
$K_n$ is $(n-1)(n-3)(n-5) \cdots 3 \cdot 1$.
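In a complete graph on an even number n of vertices, the number of perfect matchings is the product (n - 1)(n - 3) ... 3 x 1: the first vertex can be paired with any of the other n - 1 vertices, and the remaining n - 2 vertices are then matched in the same way. The sketch below counts the matchings recursively and compares the count with that product; both function names are assumptions made for this example.

```python
def count_perfect_matchings(vertices):
    """Perfect matchings of the complete graph on the given vertices."""
    vertices = list(vertices)
    if not vertices:
        return 1  # the empty matching
    rest = vertices[1:]
    # Pair the first vertex with each other vertex, then match what is left.
    return sum(count_perfect_matchings(rest[:i] + rest[i + 1:])
               for i in range(len(rest)))


def odd_product(n: int) -> int:
    """(n-1)(n-3)...3*1 for even n."""
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return result
```

For an odd number of vertices the recursive count is 0, in line with the requirement that a graph with a perfect matching has an even number of vertices.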
Complete Matching
In a bipartite graph having a vertex partition V1 and V2, a complete matching of
vertices in set V1 into those in V2 is a matching in which there is one edge incident
with every vertex in V1. In other words, every vertex in V1 is matched against some
vertex in V2.
A complete matching (if it exists) is a largest maximal matching, whereas the
converse is not necessarily true.
For the existence of a complete matching of set V1 into set V2, first we must have
at least as many vertices in V2 as there are in V1. This condition, however, is not
sufficient.
A complete matching of V1 into V2 in a bipartite graph exists if and only if every
subset of r vertices in V1 is collectively adjacent to r or more vertices in V2, for all
values of r (Hall's Theorem).
In a bipartite graph a complete matching of V1 into V2 exists if there is a positive
integer m for which degree of every vertex in V1 $\ge m \ge$ degree of every vertex in V2.
NOTE: This condition is a sufficient condition and not necessary for the existence of a
complete matching.
The maximal number of vertices in set V1 that can be mapped in to V2 is equal to
number of vertices in V1 (G)
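Hall's condition for a complete matching of V1 into V2 can be tested by brute force on a small bipartite graph: every subset of r vertices of V1 must be collectively adjacent to at least r vertices of V2. The sketch below checks all subsets; it is exponential in |V1| and meant only as an illustration. The representation (a dictionary mapping each vertex of V1 to its set of neighbours in V2) and the function name are assumptions of this example.

```python
from itertools import combinations


def hall_condition(V1, neighbors):
    """True iff every subset S of V1 has at least |S| neighbours in V2."""
    V1 = list(V1)
    for r in range(1, len(V1) + 1):
        for subset in combinations(V1, r):
            joint = set().union(*(neighbors[v] for v in subset))
            if len(joint) < r:
                return False
    return True
```

With neighbours a: {1, 2}, b: {1}, c: {2, 3} the condition holds (a-2, b-1, c-3 is a complete matching), while with a: {1}, b: {1}, c: {1, 2} it fails because a and b together are adjacent to only one vertex.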
Edge Covering
In a graph G, a set of edges is said to cover G if every vertex in G is incident on at least
one edge in the set.
A set of edges that covers a graph G is said to be an edge covering (a covering sub
graph or simply a covering) of G.
[Figure: a graph on the vertices A, B, C, D illustrating a covering and a minimal covering]
Minimal Edge Covering: A covering from which no edge can be removed without
destroying its ability to cover the graph.
A covering exists for a graph iff the graph has no isolated vertex.
A covering of an n-vertex graph has at least $\lceil n/2 \rceil$ edges.
Every covering contains a minimal covering.
No minimal covering can contain a circuit.
A graph, in general, has many coverings, and they may be of different sizes (i.e.
consisting of different numbers of edges). The number of edges in a minimal covering
of smallest size is called the edge covering number of the graph.
If G is a connected graph and has a cut edge, then the edge connectivity of G is 1.
Vertex Connectivity: Let G be a connected graph. The minimum number of vertices whose
removal results in a disconnected graph is called the vertex connectivity of G and is denoted by
K(G).
If G has a cut vertex, then K(G) = 1.
If G is the complete graph Kn, then K(G) = n-1.
If Cn (n >= 4) is a cycle graph, then K(Cn) = 2.
If a graph G has a bridge, then K(G) = 1.
The edge connectivity of a graph G cannot exceed the minimum degree δ(G) of G.
For any connected graph G, K(G) <= λ(G) <= δ(G), where λ(G) is the edge connectivity of G.
If G is a planar graph with k connected components, each component having at least 3 vertices,
then |E| <= 3|V| - 6k.
If G is a planar graph with k components, then |V| - |E| + |R| = k + 1 (for k = 1, we get Euler's
formula).
Isomorphism
Suppose G and G1 are two graphs and that f: V(G) -> V(G1) is a bijection. Let A be the
adjacency matrix for the vertex ordering v1, v2, ..., vn of the vertices of G. Let A1 be the
adjacency matrix for the vertex ordering f(v1), f(v2), ..., f(vn). Then f is an isomorphism from
V(G) to V(G1) iff the adjacency matrices A and A1 are equal.
If A != A1, then it may still be the case that the graphs G and G1 are isomorphic under some other
function.
Two simple graphs are isomorphic iff their complements are isomorphic.
If two graphs are isomorphic, then their corresponding subgraphs are isomorphic.
Induced Subgraph: If W is a subset of V(G), then the subgraph induced by W is the subgraph H
of G obtained by taking V(H) = W and E(H) to be those edges of G that join pairs of vertices in
W.
If G is isomorphic to its complement, then G is said to be self complementary.
If G is self complementary, then the number of vertices of G is of the form 4k or 4k + 1 for some integer k.
Spanning Trees
A sub graph H of a graph G is called a spanning tree of G if
i) H is a tree and
ii) H contains all vertices of G
In general, if G is a connected graph with n vertices and m edges, a spanning tree of G must
have (n-1) edges. Therefore, the number of edges that must be removed before a spanning tree is
obtained must be m - (n-1). This number is called the circuit rank of G.
A non directed graph G is connected iff G contains a spanning tree.
The complete graph Kn has n^(n-2) different spanning trees (Cayley's formula).
Tree traversal
* Preorder (Root Left Right)
* Inorder (Left Root Right)
* Postorder (Left Right Root)
Tree: A connected undirected graph with no simple circuit is a tree.
Forest: A set of disjoint trees.
Kirchhoff's Theorem: Let A be the adjacency matrix of a connected graph G and M be the matrix
obtained from A by changing all 1s into -1 and each diagonal element 0 to the degree of the
corresponding vertex. Then the number of spanning trees of G is equal to the value of any cofactor of
M.
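The theorem can be tried out on small graphs. The sketch below builds M from an adjacency matrix, deletes the last row and column, and evaluates the determinant with exact fractions. The helper names `determinant` and `spanning_tree_count` are assumptions for this illustration, and the graph is assumed to be simple (no loops or parallel edges).

```python
from fractions import Fraction

def determinant(a):
    """Determinant by Gaussian elimination on exact fractions."""
    a = [row[:] for row in a]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def spanning_tree_count(adj):
    n = len(adj)
    # M: vertex degree on the diagonal, -1 wherever the adjacency matrix has a 1
    m = [[Fraction(sum(adj[i]) if i == j else -adj[i][j]) for j in range(n)]
         for i in range(n)]
    # Cofactor obtained by deleting the last row and the last column
    minor = [row[:-1] for row in m[:-1]]
    return int(determinant(minor))

k4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
c4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(spanning_tree_count(k4), spanning_tree_count(c4))
```

For K4 the count agrees with Cayley's formula, 4^(4-2) = 16, and for the 4-cycle it is 4, since removing any one of the four edges leaves a spanning tree.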
*The number of different trees with vertex set {v1, v2, ..., vn} in which the vertices v1, v2, ..., vn
have degrees d1, d2, ..., dn respectively is
(n-2)! / ((d1-1)! (d2-1)! ... (dn-1)!)
Minimal Spanning Tree: Let G be a connected graph where each edge of G is labeled with a non
negative cost. A spanning tree T where the total cost C(T) is minimum is called a minimal
spanning tree.
Kruskal's Algorithm: (For finding a minimal spanning tree of a connected weighted graph)
O(e log e)
Input: A connected graph G with non negative values assigned to each edge.
Output: A minimal spanning tree for G.
Method:
1) Select any edge of minimal value that is not a loop. This is the first edge of T (if there is more
than one edge of minimal value, arbitrarily choose one of these edges).
2) Select any remaining edge of G having minimal value that does not form a circuit with the
edges already included in T.
3) Continue step 2 until T contains (n-1) edges, where n = |V(G)|.
Ex: [Figure: step-by-step construction of a minimal spanning tree by Kruskal's algorithm on a weighted graph with vertices A, B, C and D; an edge that would form a cycle is skipped.]
Both Prim's and Kruskal's algorithms give a minimum spanning tree of the same weight [the tree
structure may differ, but the weight of both MSTs will be the same].
DSA
Algorithm Analysis
Mathematical Foundations of Algorithm Analysis
Solving Recurrences
Sorting a billion numbers with the Bubblesort algorithm (running time O(n^2)) on ASCI
White, the world's fastest computer, takes about a week.
Sorting a billion numbers with Heapsort (running time O(n log n)) on a 100 MHz Pentium PC
takes about 10 minutes.
Most of the time it does not matter if an algorithm is 2, 3 or 10 times faster than another algorithm.
Constant factors are dwarfed by the effect of the input size.
The real question to ask when we analyze an algorithm's efficiency is: If we double the input size,
how does this affect the running time of the program? Does it run twice as long? Or four times as
long? Or even longer?
O-notation
For asymptotic upper bounds.
O(g(n)) = {f(n): there exist positive constants c and n0 such that 0 <= f(n) <= c*g(n) for all
n>=n0 }
2n = O(3n)
Omega-notation
For asymptotic lower bounds.
Omega(g(n)) = { f(n): there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n)
for all n>=n0 }
n^2 log n = Omega(n^2)
Theta-notation
For asymptotically tight bounds.
Theta(g(n)) = { f(n): there exist positive constants c1, c2, n0 such that 0 <= c1*g(n) <=
f(n) <= c2*g(n) for all n>=n0 }
Examples
3n^2 + 6n + 17 = O(n^9)
2^(n+1) = Theta(2^n)
2^(2n) = Omega(2^n)
Recurrences
When an algorithm contains a recursive call, its running time can often be described by a
recurrence.
Examples
T(n) = n + T(n-1) (Maxsort)
T(n) = n + 2 T(n/2) (Mergesort)
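For the Mergesort recurrence T(n) = n + 2 T(n/2), prove that T(n) <= c(n log n) for some constant c>0.
Assume T(n/2) <= c(n/2) log(n/2), with logarithms taken to base 2. Then
T(n) <= 2c(n/2) log(n/2) + n
= cn log n/2 + n
= cn (log n - log 2) + n
= cn log n - cn + n
<= cn log n (for any c >= 1)
QED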
T(n) = n + 2 T(n/3)
T(n) = 3n
T(n) = Theta(n)
If f(n) and
log n).
This is a rather sloppy definition, but it is good enough for most functions that we will
encounter. There are some more technical formalities that should be understood, however. They
are explained in "Cormen, Leiserson, Rivest: Introduction to Algorithms", pages 62-63.
Sets
Relations
Functions
Graphs
Trees
Formally, a function f from A to B is a binary relation on A x B such that for every element a of A
there is exactly one element b of B such that (a,b) is in f.
Graphs
Formally, a graph is a pair (V,E) where V is a finite set and E is a binary relation on V.
Intuitively, a graph is a network consisting of vertices (V) and edges (E) that connect the
vertices.
The edges can be directed or undirected. Depending on that, the resulting graph is called
a directed graph or an undirected graph.
By convention, edges from a vertex to itself (self-loops) are allowed in directed graphs,
but forbidden in undirected graphs.
If (u,v) is an edge of a directed graph, we say that (u,v) leaves u and enters v.
If (u,v) is an edge of a graph, we say that u and v are adjacent to each other.
In case of a directed graph, the in-degree of a vertex is the number of edges entering it,
the out-degree is the number of edges leaving it, and its degree is the sum of its in-degree
and out-degree.
A sequence of vertices (v0, v1, ..., vk-1, vk) where all (vi, vi+1) are edges of the graph is called a
path of length k that connects the vertices v0 and vk.
A path (v0,v1,..., vk-1,vk) forms a cycle, if v0 = vk and the path contains at least one edge.
A directed graph is strongly connected, if every two vertices are reachable from each
other.
Graph Problems
Euler Circuit: find a cycle in a graph that visits each edge exactly once.
Hamiltonian Circuit: find a cycle in a graph that visits each vertex exactly once.
Trees
If any edge is removed from a tree, the resulting graph is a forest of two trees.
Linked Lists
List elements consist of a value and a pointer to the next list element.
Examples of ADT
Arrays, lists and trees are concrete datatypes. They are basic data structures typically
provided by the computer language.
Stacks, queues and heaps are abstract datatypes. They are just ideas, i.e. "black boxes"
with a defined behavior. To implement them, you have to choose a suitable concrete
datatype.
3.4 Stacks
A stack is a container of objects that are inserted and removed according to the last-in-first-out (LIFO) principle.
Objects can be inserted at any time, but only the most recently inserted object can be removed.
Inserting an object is known as pushing onto the stack, and removing an item from the
stack is known as popping.
IsEmpty(S) Returns true if the number of elements in the stack is 0, false otherwise
... at the top of stack S
The stack is one of the most important data structures and is used to implement function calls
efficiently. When a function is called, some housekeeping (saving register values, the return address,
etc.) is required before the program counter is assigned the address of the called function. For
this purpose an activation record is allocated to store the required state, and
this record is then pushed onto the system stack.
A stack is also used to check whether the brackets of a program or expression are balanced. This can
be done at a cost of O(n).
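The balancing check mentioned above can be sketched with a Python list used as a stack. The function name `is_balanced` and the choice of three bracket kinds are assumptions for this illustration.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in LIFO order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)            # push an opening bracket
        elif ch in pairs:
            # pop and compare with the expected opening bracket
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                    # leftover openers mean imbalance

print(is_balanced("{a[b](c)}"), is_balanced("(]"), is_balanced("(("))
```

Each character is pushed or popped at most once, which gives the O(n) cost stated above.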
3.5 Queue
A queue differs from a stack in that its insertion and removal routines follow the first-in-first-out (FIFO) principle.
Objects can be inserted at any time, but only the object which has been in the queue the
longest may be removed.
Objects are inserted at the rear and removed from the front.
A double-ended queue, or deque, supports insertion and deletion at both the front and the
back.
IsEmpty(Q) Returns true if the number of elements in the queue is 0, false otherwise
... not possible in constant time in this ...
3.6 Trees
Binary Trees
Binary tree is a tree in which a node can have at most two children.
Perfect Balanced Binary Tree
In a perfectly balanced binary tree, each node has the same number of nodes in both of its subtrees.
Height Definition
Height is in general measured as the number of edges from the root to the deepest node in the tree. Height
can also be measured in terms of the number of nodes on the longest path from the root to the deepest node. Please
use the first height definition unless specified otherwise.
Leaf nodes Vs Internal Nodes
Leaf nodes do not have any child.
Tree Traversals
Traversal is used to visit each node in the tree exactly once. A full traversal of a binary tree gives
a linear ordering of the data in the tree.
Inorder Traversal
1. Traverse the left sub tree of the root node R in inorder
2. Visit the root node R
3. Traverse the right subtree of the root node R in inorder
for n nodes inorder takes O(n).
Postorder Traversal
1. Traverse the left subtree of the root R in post order.
2. Traverse the right sub tree of the root R in post order
3. Visit the root node R.
Running time is O(n) which can be derived as discussed for inorder running time.
Binary Tree Construction Using Postorder
A unique binary tree cannot be constructed from a postorder sequence alone, as many
trees can exist for the given sequence.
Preorder Traversal
1. Visit the root node R
2. Traverse the left subtree of R in preorder
3. Traverse the right subtree of R in preorder
Running time is O(n) which can be derived as discussed for inorder running time.
Binary Tree Construction Using Preorder
A unique binary tree cannot be constructed from a preorder sequence alone, as many trees
can exist for the given sequence.
Levelorder Traversal
This traverses the tree nodes in level by level order.
Binary Tree Construction Using Levelorder
A unique binary tree cannot be constructed from a level order sequence alone, as many
trees can exist for the given sequence.
The nodes within a level can be visited in any order; however, nodes from a deeper
level cannot be visited until the nodes from the higher levels have been visited.
Queue data structure can be used to implement level order traversal efficiently.
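The four traversals described in this section can be sketched as follows. The `Node` class and the small sample tree are assumptions made for this illustration; each function touches every node once, which matches the O(n) running time.

```python
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def inorder(node, out=None):
    out = [] if out is None else out
    if node:
        inorder(node.left, out)
        out.append(node.val)      # visit root between the subtrees
        inorder(node.right, out)
    return out

def preorder(node, out=None):
    out = [] if out is None else out
    if node:
        out.append(node.val)      # visit root first
        preorder(node.left, out)
        preorder(node.right, out)
    return out

def postorder(node, out=None):
    out = [] if out is None else out
    if node:
        postorder(node.left, out)
        postorder(node.right, out)
        out.append(node.val)      # visit root last
    return out

def levelorder(root):
    out, queue = [], deque([root] if root else [])
    while queue:
        node = queue.popleft()    # FIFO: shallower nodes leave the queue first
        out.append(node.val)
        queue.extend(child for child in (node.left, node.right) if child)
    return out

# Hypothetical tree: A has children B and C, and B has a left child D
tree = Node("A", Node("B", Node("D")), Node("C"))
print(inorder(tree), preorder(tree), postorder(tree), levelorder(tree))
```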
Insert a value if it does not already exist.
Algorithm
The new value is inserted into either the left subtree or the right subtree of the root node. By
comparing the root with the new value we identify the correct subtree. The same steps
are applied to each subtree until a null subtree is found.
Running Time Analysis
The running time can be measured in terms of the height of the tree, which can vary from log n to n.
Thus in the worst case it is O(n). However, if the insertion is done into a perfectly balanced tree,
then it is O(log n).
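A minimal sketch of the insertion steps, together with a height function that shows how the insertion order affects the height. The names `BSTNode`, `bst_insert`, `height` and `build` are assumptions for this illustration; height is counted in edges, so an empty tree has height -1.

```python
class BSTNode:
    def __init__(self, val):
        self.val, self.left, self.right = val, None, None

def bst_insert(root, val):
    if root is None:                 # null subtree found: place the value here
        return BSTNode(val)
    if val < root.val:
        root.left = bst_insert(root.left, val)
    elif val > root.val:
        root.right = bst_insert(root.right, val)
    return root                      # an existing value is not inserted again

def height(root):
    return -1 if root is None else 1 + max(height(root.left), height(root.right))

def build(values):
    root = None
    for v in values:
        root = bst_insert(root, v)
    return root

print(height(build([1, 2, 3, 4, 5, 6, 7])))   # sorted input degenerates into a chain
print(height(build([4, 2, 6, 1, 3, 5, 7])))   # this order yields a perfectly balanced tree
```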
Delete
This operation deletes a node from a given BST.
Algorithm
M i y 3 c ses sh u d be c
de the ;
OR
The least key in right subtree is the required immediate successor and can be found as follows:
Tree *minNode = findMin(X->right);
Then copy
minNode->val/maxNode->val into X->val;
Thereafter call either case1 or case2, as minNode/maxNode will have either one child or none.
(Running time: O(n))
Running Time Analysis
Only one of the cases (case1, case2 or case3) is executed for one deletion. Therefore the running
time is max(case1 RT, case2 RT, case3 RT), which is O(n). If the tree were a perfectly balanced
tree, then it would be O(log n).
The Other Operations
The running time of the following operations is O(n) in worst case since a single path from root
to leaf level is being traversed by all these operations.
3.7
Tree structures support various basic dynamic set operations including Search, Predecessor,
Successor, Minimum, Maximum, Insert, and Delete in time proportional to the height of the tree.
To ensure that the height of the tree is as small as possible and therefore provide the best
running time, a balanced tree structure like an AVL tree, or B-tree, or B+ tree must be used.
AVL Trees
AVL tree is a self-balancing binary search tree with height-balanced condition. That means for
every node the height of left subtree and right subtree can differ by at most 1.
The following tree is a bad binary tree. Requiring balance at the root is not enough.
Minimum Number of Nodes in AVL Tree of Height h
Let n(h) be the number of nodes in the smallest AVL tree of height h. Since it is an AVL tree,
the heights of its subtrees will be either (h-1 and h-2) or (h-1 and h-1). To have the
smallest AVL tree for the given height, its two subtrees must also be the smallest possible AVL trees.
That is possible only if the subtrees are of height h-1 and h-2 (a second subtree of height h-1 would
have at least one more node than a subtree of height h-2). The following
recurrence therefore gives the size of the smallest AVL tree.
n(h) = n(h-1) + n(h-2) + 1
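The recurrence can be evaluated iteratively. The base cases are not stated in the text, so this sketch assumes height is counted in edges (the first height definition), giving n(-1) = 0 for the empty tree and n(0) = 1 for a single node; the function name is also an assumption.

```python
def min_avl_nodes(h):
    """Smallest number of nodes in an AVL tree of height h (height in edges)."""
    if h < 0:
        return 0
    prev2, prev1 = 0, 1            # assumed base cases: n(-1) = 0, n(0) = 1
    for _ in range(h):
        # n(h) = n(h-1) + n(h-2) + 1
        prev2, prev1 = prev1, prev1 + prev2 + 1
    return prev1

print([min_avl_nodes(h) for h in range(6)])
```

The values grow like the Fibonacci numbers, i.e. exponentially in h, which is why the height of an AVL tree with n nodes is O(log n).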
Maximum Number of Nodes in AVL Tree of Height h
The largest AVL tree of height h will have 2^(h+1) - 1 nodes.
The derivation follows as discussed before in the binary tree section.
Height Analysis
Since it is a balanced tree, the height is O(log n).
Insertion
After an insertion, only the nodes on the path from the insertion point
to the root can have their balance altered, because only those nodes have their subtrees altered.
We therefore follow the path up to the root and update the balancing information; we
may find a node whose new balance violates the AVL condition. At that point we restore the
AVL condition for this node using rotations. That rebalancing is
sufficient for the entire tree, which means no rebalancing is needed for the remaining nodes
on the path to the root.
A violation might occur in four cases:
An insertion into the left subtree of the left child of node X (where X violates the AVL
condition).
An insertion into the right subtree of the right child of node X.
An insertion into the right subtree of the left child of node X.
An insertion into the left subtree of the right child of node X.
The first two cases are symmetric and are called outside insertions (left-left or right-right); they
can be fixed by a simple single rotation.
The last two are also symmetric and are called inside insertions (left-right or right-left); they
can be fixed by a double rotation (two single rotations).
Single Rotation
The following rotation is called a right rotation. Here the tree on the left side becomes imbalanced
because the insertion of a new node in subtree X causes k2 to violate the AVL condition. This
falls into the left-left case.
NOTE:
After the rotation, k2 and k1 not only satisfy the AVL condition but also have subtrees of
exactly the same height. Furthermore, the new height of the entire subtree is exactly the
same as the height of the original subtree prior to the insertion that caused X (or Z in the
symmetric case) to grow. Hence no further rebalancing is required for the remaining nodes on the path to the root.
A double rotation also restores the height that the entire affected subtree had
prior to the insertion. Therefore the height of the entire tree is unchanged in these
cases.
The main idea of a rotation is to level the heights of the left and right subtrees of the imbalanced
subtree by transferring nodes from the deeper subtree to the other subtree. After the rotation,
the height of the deeper subtree is restored to its original value and the height of the other subtree is
increased by one, which levels the heights as required.
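A single right rotation for the left-left case can be sketched as below. The names k1 and k2 follow the text; the `AVLNode` class, the cached `height` field and the helper names are assumptions for this illustration.

```python
def height(node):
    return node.height if node else -1     # height in edges; empty tree is -1

class AVLNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(k2):
    """Fix a left-left imbalance at k2; returns the new subtree root k1."""
    k1 = k2.left
    k2.left = k1.right      # the middle subtree moves under k2
    k1.right = k2           # k2 becomes the right child of k1
    update(k2)              # k2 is now lower, so recompute it first
    update(k1)
    return k1

# Hypothetical chain 3 -> 2 -> 1: node 3 violates the AVL condition
unbalanced = AVLNode(3, AVLNode(2, AVLNode(1)))
balanced = rotate_right(unbalanced)
print(balanced.key, balanced.left.key, balanced.right.key, balanced.height)
```

The left rotation is the mirror image, and a double rotation applies one of each, as the following subsections describe.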
Double Rotation
This rotation is required when an inside insertion happens. As mentioned above, it can be
implemented with appropriate left and right single rotations.
Right-Left Double Rotation
This case occurs when insertion happens into the left subtree of the right child of node X.
A right-left rotation requires a single right rotation followed by a single left rotation on the
appropriate keys.
Step1: single right rotation on right child of X, where X node violates the property.
A left-right rotation requires a single left rotation followed by a single right rotation on the
appropriate keys.
Step1: single left rotation on left child of X, where X node violates the property.
B - Tree
In computer science, a B-tree is a tree data structure that keeps data sorted and allows
searches, insertions, deletions, and sequential access in logarithmic amortized time.
The B-tree is a generalization of a binary search tree in that more than two paths diverge
from a single node.
Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and
write large blocks of data in secondary storage such as a magnetic disk. Since disk accesses
are expensive (time consuming) operations, a B-tree tries to minimize the number of disk
accesses. It is most commonly used in databases and filesystems.
Overview
In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some
pre-defined range. When data is inserted or removed from a node, its number of child nodes
changes. In order to maintain the pre-defined range, internal nodes may be joined or split.
Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently
as other self-balancing search trees, but may waste some space, since nodes are not entirely
full. The lower and upper bounds on the number of child nodes are typically fixed for a
particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3
tree), each internal node may have only 2 or 3 child nodes.
A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth
will increase slowly as elements are added to the tree, but an increase in the overall depth is
infrequent.
Advantages
B-trees have substantial advantages over alternative implementations when node access times
far exceed access times within nodes. This usually occurs when the nodes are in secondary
storage such as disk drives. By maximizing the number of child nodes within each internal node,
the height of the tree decreases and the number of expensive node accesses is reduced. In
addition, rebalancing the tree occurs less often.
Definition
The terminology used for B-trees is not uniform in the literature.
Maximum number of nodes per level: (2m)^0, (2m)^1, (2m)^2, ..., (2m)^h at the last level.
Minimum number of nodes at the last level: 2 * m^(h-1)
Thus Nmin = 1 + 2 * (m^h - 1) / (m - 1)
From this we also get the minimum number of records, which is 1 + (Nmin - 1) * (m - 1). Notice that not all
nodes counted in Nmin have m - 1 keys: the root node has only one key.
NOTE:
The best case height of a B-Tree is: O(log_{2m} N)
The worst case height of a B-Tree is: O(log_m N)
where m is the B-Tree degree.
Maximizing B-Tree Degree
A bad choice of degree leads to the worst case height of the B-Tree, and hence its operations
become expensive. It is always good to choose a degree that minimizes the height: this not
only utilizes the disk space well but also reduces the access cost, since the height is reduced.
The following derivation helps in choosing the best degree.
Fig 4.12
As the diagram shows, each B-Tree node contains fields for child pointers, data pointers and
keys.
Let the disk block size be B bytes and the degree be d (child pointers per node),
let each pointer (to data or to a child) be b bytes,
and let the key length be k bytes.
Then the best choice of d is the largest value for which the left side of the equation below
does not exceed the right side (B), so that the result is as close to B as possible.
(d - 1) * b + d * b + (d - 1) * k = B ----------- (1)
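Solving equation (1) for d gives d <= (B + b + k) / (2b + k), so the best degree is the integer part of that quotient. The sketch below computes it; the function name and the sample sizes (1024-byte block, 8-byte pointers, 16-byte keys) are hypothetical values chosen only for illustration.

```python
def btree_degree(block_size, pointer_size, key_size):
    """Largest d with (d-1)*b + d*b + (d-1)*k <= B."""
    return (block_size + pointer_size + key_size) // (2 * pointer_size + key_size)

def btree_node_bytes(d, pointer_size, key_size):
    # (d-1) data pointers + d child pointers + (d-1) keys
    return (d - 1) * pointer_size + d * pointer_size + (d - 1) * key_size

d = btree_degree(1024, 8, 16)
print(d, btree_node_bytes(d, 8, 16), btree_node_bytes(d + 1, 8, 16))
```

With these sample sizes a node of degree 32 uses 1000 bytes, while degree 33 would need 1032 bytes and no longer fit in the block.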
Insertion
All insertions start at a leaf node. To insert a new element,
search the tree to find the leaf node where the new element should be added. Insert the new
element into that node with the following steps:
1. If the node contains fewer than the maximum legal number of elements, then there is room
for the new element. Insert the new element in the node, keeping the node's elements
ordered.
2. Otherwise, the node is full, so evenly split it into two nodes.
A single median is chosen from among the leaf's elements and the new element.
Values less than the median are put in the new left node and values greater than the median
are put in the new right node, with the median acting as a separation value.
Insert the separation value in the node's parent, which may cause it to be split, and so on. If
the node has no parent (i.e., the node was the root), create a new root above this node
(increasing the height of the tree).
If the splitting goes all the way up to the root, it creates a new root with a single separator value
and two children, which is why the lower bound on the size of internal nodes does not apply to
the root.
Deletion
The logic to delete a record is to locate and delete the item, then restructure the tree to regain its
invariants.
There are two special cases to consider when deleting an element:
The element in an internal node may be a separator for its child nodes.
Deleting an element may put its node under the minimum number of elements and children.
If the value is in an internal node, choose a new separator (either the largest element in the
left subtree or the smallest element in the right subtree), remove it from the leaf node it is in,
and replace the element to be deleted with the new separator.
This has deleted an element from a leaf node, and so is now equivalent to the previous case.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented
storage context. This is primarily because, unlike binary search trees, B+
trees have very high fanout (typically on the order of 100 or more), which reduces
the number of I/O operations required to find an element in the tree.
B+ trees are designed to make range queries run faster. The nodes at the leaf
level are connected in linked list fashion, which makes this possible.
Fig 4.18
As the diagram shows, each B+ tree internal node contains fields for child pointers and keys.
Let the disk block size be B bytes and the degree be d (child pointers per node),
let each pointer to a child be b bytes,
and let the key length be k bytes.
Then the best choice of d is the largest value for which the left side of the equation below
does not exceed the right side (B), so that the result is as close to B as possible.
d * b + (d - 1) * k = B ----------- (1)
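Solving this equation for d gives d <= (B + k) / (b + k). The sketch below uses the same hypothetical sizes as a typical worked example (1024-byte block, 8-byte child pointers, 16-byte keys); the function names are assumptions for this illustration.

```python
def bplus_internal_degree(block_size, pointer_size, key_size):
    """Largest d with d*b + (d-1)*k <= B for a B+ tree internal node."""
    return (block_size + key_size) // (pointer_size + key_size)

def bplus_node_bytes(d, pointer_size, key_size):
    # d child pointers + (d-1) keys; internal nodes hold no data pointers
    return d * pointer_size + (d - 1) * key_size

d_plus = bplus_internal_degree(1024, 8, 16)
print(d_plus, bplus_node_bytes(d_plus, 8, 16))
```

Because internal nodes carry no data pointers, the same block holds more children than a B-tree node would with these sizes, which is one source of the high fanout mentioned above.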
Insertion in B+ Tree
1. Do a search to determine what bucket the new record should go in.
2. If the bucket is not full, add the record.
3. Otherwise, split the bucket.
4. Allocate a new leaf and move half the bucket's elements to the new bucket.
5. Insert the new leaf's smallest key and address into the parent.
6. If the parent is full, split it also.
7. Now add the middle key to the parent node.
8. Repeat until a parent is found that need not split.
9. If the root splits, create a new root which has one key and two pointers.
3. 8 HASHING
Hashtables
If the universe U is much larger than the number of keys actually stored (here: N), then a
hashtable might be better than a standard array.
h is a function that maps the universe U of possible keys to the indices of the hashtable.
h : U -> {0,1,2,...,M-2,M-1}
In a room with 35 people, how likely is it that two people have the same birthday?
|U| = 365
N = 35
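The birthday question can be answered by computing the probability that all N keys land in distinct slots and subtracting it from 1. The sketch assumes all 365 days are equally likely; the function name is an assumption for this illustration.

```python
def collision_probability(universe_size, n):
    """Probability that at least two of n uniformly random keys coincide."""
    p_distinct = 1.0
    for i in range(n):
        # the (i+1)-th key must avoid the i slots already taken
        p_distinct *= (universe_size - i) / universe_size
    return 1.0 - p_distinct

p = collision_probability(365, 35)
print(round(p, 3))
```

For 35 people the probability comes out at roughly 0.81, so collisions are likely even when the table is far larger than the number of keys.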
Open Hashing
Linear Probing
Quadratic Probing
Double Hashing
Open Hashing
Each element of the hashtable is a linked list of elements that got mapped to this
location.
If you store N keys in a table of size M, the lists will have an average length of N/M.
Linear Probing
hi(key) = (h(key) + i) mod M
Problem: Clustering
Quadratic Probing
If cell x is already taken, try x+1, x+4, x+9, x+16,..., until you find an empty cell.
But if many keys are mapped to the same cell x, each key runs through the same
sequence of cells until an empty cell is found.
Double Hashing
On the ith try to find an empty cell for key, use the hash function:
hi(key) = (h(key) + i*h'(key)) mod m.
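The three probing strategies can be compared by listing the first few cells each one tries. The primary hash h(key) = key mod m and the secondary hash h'(key) = 1 + (key mod (m-1)) are assumptions chosen for this illustration; the text does not fix either function.

```python
def probe_sequence(key, m, strategy, tries=4):
    """First `tries` cells examined for `key` in a table of size m."""
    h = key % m                  # assumed primary hash
    h2 = 1 + key % (m - 1)       # assumed secondary hash, never zero
    cells = []
    for i in range(tries):
        if strategy == "linear":
            cells.append((h + i) % m)
        elif strategy == "quadratic":
            cells.append((h + i * i) % m)
        elif strategy == "double":
            cells.append((h + i * h2) % m)
        else:
            raise ValueError("unknown strategy")
    return cells

for name in ("linear", "quadratic", "double"):
    print(name, probe_sequence(25, 11, name))
```

Two keys with the same h(key) follow identical linear and quadratic sequences, whereas double hashing usually separates them because their step sizes h'(key) differ.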
Universal Hashing
One problem remains: No matter which hash function you choose, there will always be a
worst-case input where each key is hashed to the same slot.
Have a set of hash functions ready and choose one of them at random after the program has started.
A collection H of hash functions is called universal, if for each pair of distinct keys x,y, the
number of hash functions for which h(x)=h(y) is exactly |H|/m.
... n(n-1)/2 edges.
= 2^(n(n-1)/2)
In case of a directed graph without self-loops, Nmax = 2^(n(n-1)).
Representation of Graphs
Adjacency matrix: A two dimensional array is used. For each edge
(u,v), a[u,v] = true; otherwise it is false.
The space complexity of representing a graph this way is O(|V|^2).
Adjacency list: An array of lists is used, where each list stores the vertices adjacent to the corresponding
vertex.
Dijkstra's Algorithm
Dijkstra's algorithm proceeds in stages. At each stage, the algorithm selects a vertex v which has
the smallest dv among all the unknown vertices, and declares that the shortest path from s to v is
known. The remainder of a stage consists of updating the values of dw.
RT = |V|log|V| + |E|log|V| = O((|E| + |V|)log|V|).
However, if the graph is represented using an adjacency matrix, then
RT = |V|log|V| + |V|^2 log|V| = O(|V|^2 log|V|).
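The stages described above can be sketched with a binary heap over an adjacency list. The graph format (a dict mapping each vertex to a list of (neighbour, cost) pairs), the function name and the sample graph are assumptions for this illustration; edge costs are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, source):
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    known = set()
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)      # unknown vertex with the smallest dv
        if v in known:
            continue                    # stale heap entry, already finalised
        known.add(v)
        for w, cost in graph[v]:
            if d + cost < dist[w]:      # update rule: dw = min(dw, dv + c(v,w))
                dist[w] = d + cost
                heapq.heappush(heap, (dist[w], w))
    return dist

# Hypothetical directed graph
graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 3)],
    "c": [],
}
print(dijkstra(graph, "s"))
```

Each edge causes at most one push, and each heap operation costs O(log|V|) up to a constant factor, which matches the adjacency-list bound given above.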
Prim's algorithm is essentially identical to Dijkstra's algorithm, except for the update rule.
The new update rule is as follows:
dw = min(dw, c(w,v)).
Thus, the RT analysis of Dijkstra's algorithm remains applicable here too.
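A heap-based sketch of Prim's algorithm follows. It pushes candidate edges lazily rather than decreasing dw in place, which is a common variation and not necessarily the formulation the text has in mind. The graph format, function name and sample graph are assumptions for this illustration; the graph is assumed to be connected, undirected and free of self-loops.

```python
import heapq

def prim(graph, start):
    known = set()
    total = 0
    tree_edges = []
    heap = [(0, start, start)]          # (edge cost, vertex, vertex it is attached to)
    while heap:
        cost, v, parent = heapq.heappop(heap)
        if v in known:
            continue
        known.add(v)
        total += cost
        if parent != v:                 # skip the dummy entry for the start vertex
            tree_edges.append((parent, v, cost))
        for w, c in graph[v]:
            if w not in known:
                # only the edge cost matters, not the distance from the start
                heapq.heappush(heap, (c, w, v))
    return total, tree_edges

# Hypothetical undirected graph, each edge listed in both directions
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2), ("D", 5)],
    "C": [("A", 4), ("B", 2), ("D", 3)],
    "D": [("B", 5), ("C", 3)],
}
print(prim(graph, "A"))
```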
Kruskal's Algorithm
Initially, consider a forest consisting of all the vertices of the graph G without any edges.
1. Sort all the edges in ascending order according to their costs.
2. Include the edge with minimum cost into the forest if it does not form a cycle in the partially
constructed tree.
Repeat step (2) until no edge can be added to the tree.
Running Time Analysis
Total number of edges considered by this algorithm in worst case is |E|.
That means, the maximum number of stages is |E|. Now we need to compute the RT of each
stage. In each stage an edge with lowest cost is selected and being checked for not causing cycle.
Thus, if minHeap of edges is used then finding next edge takes up O(log|E|). Therefore, total RT
is O(|E|log|E|). Notice that if E=O(|V|2), then RT is O(|E|log|V|).
Remarks:
If the weights of all the edges of a connected graph G are distinct, then exactly one minimum spanning tree
exists for that graph.
If the edge weights of G are not all distinct, then the graph may have only one
minimum spanning tree or several structurally different ones.
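The two steps of Kruskal's algorithm can be sketched as follows. The cycle test uses a union-find (disjoint-set) structure, which is an assumption of this sketch since the text does not say how the cycle check is done, and the edges are sorted up front instead of being kept in a heap. The sample edges are hypothetical.

```python
def kruskal(n, edges):
    """edges: list of (cost, u, v) with vertices numbered 0..n-1.
    Returns (total cost, list of chosen edges)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for cost, u, v in sorted(edges):        # step 1: ascending cost
        ru, rv = find(u), find(v)
        if ru != rv:                        # step 2: different trees, so no cycle
            parent[ru] = rv
            total += cost
            chosen.append((u, v))
    return total, chosen

sample = [(3, 0, 2), (1, 0, 1), (4, 2, 3), (2, 1, 2)]
```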
Insertion Sort
Insertion sort is a simple comparison sorting algorithm. Every iteration of insertion sort
removes an element from the input data and inserts it into the correct position in the already-sorted list, until no input elements remain. The choice of which element to remove from the
input is arbitrary, and can be made using almost any choice algorithm.
Performance
The worst-case input is an array sorted in reverse order. In this case every iteration of the inner
loop scans and shifts the entire sorted subsection of the array before inserting the next
element. For this case insertion sort has a quadratic running time, i.e., O(n^2). The running time
can be bounded by the total number of comparisons made during the entire execution of the
algorithm. In the worst case, pass k (k = 1, 2, ..., n-1) makes k comparisons, so the total is 1 + 2 + ... + (n-1) = n(n-1)/2.
Simple implementation.
Adaptive, i.e. efficient for data sets that are already substantially sorted: the time complexity
is O(n + d), where d is the number of inversions.
Stable, i.e. does not change the relative order of elements with equal keys.
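The following sketch counts key comparisons so that the worst-case and adaptive behaviour can be checked directly. The comparison counter is an addition made for illustration only.

```python
def insertion_sort(a):
    """Sort list a in place; return the number of key comparisons made."""
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:      # <= leaves equal keys in place (stable)
                break
            a[j + 1] = a[j]      # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return comparisons

worst = [6, 5, 4, 3, 2, 1]
worst_count = insertion_sort(worst)               # n(n-1)/2 = 15 for n = 6
best_count = insertion_sort([1, 2, 3, 4, 5, 6])   # n - 1 = 5 on sorted input
```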
Selection Sort
Selection sort is also a simple comparison sorting algorithm. The algorithm works as follows:
1. Find the minimum value in the list
2. Swap it with the value in the first position
3. Repeat the steps above for the remainder of the list (starting at the second position and
advancing each time)
Effectively, the list is divided into two parts: the sublist of items already sorted, which is built up
from left to right and is found at the beginning, and the sublist of items remaining to be sorted,
occupying the remainder of the array.
Performance
Every input is a worst-case input for selection sort, because each current element has to be compared
with the rest of the unsorted array. The running time can be bounded by the total number of
comparisons made during the entire execution of the algorithm. The successive passes make
(n-1), (n-2), ..., 1 comparisons respectively.
Total comparisons = n(n-1)/2, which implies O(n^2) time complexity.
Remarks:
It is much less efficient on large lists than more advanced algorithms such as quicksort,
heapsort, or merge sort.
Simple implementation.
In-place, i.e. only requires a constant amount O(1) of additional memory space.
Insertion sort is very similar in that after the kth iteration, the first k elements in the array
are in sorted order. Insertion sort's advantage is that it only scans as many elements as it
needs in order to place the (k+1)st element, while selection sort must scan all remaining
elements to find the (k+1)st element.
Selection sort always performs Θ(n) swaps.
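A minimal sketch that counts both comparisons and swaps is shown below. The counters are illustrative additions, and the swap is performed unconditionally in every pass so that the swap count is exactly n-1.

```python
def selection_sort(a):
    """Sort list a in place; return (comparisons, swaps)."""
    n = len(a)
    comparisons = swaps = 0
    for i in range(n - 1):
        m = i                          # index of the minimum seen so far
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]        # one swap per pass
        swaps += 1
    return comparisons, swaps

data = [3, 1, 5, 2, 4]
counts = selection_sort(data)
```

The comparison count is n(n-1)/2 regardless of the input order, which matches the observation that every input is a worst case.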
Merge Sort
Merge sort is an O(n log n) comparison-based divide and conquer sorting algorithm.
Conceptually, a merge sort works as follows:
1. If the list is of length 0 or 1, then it is already sorted; otherwise:
2. Divide the unsorted list into two sublists of about half the size.
3. Sort each sublist recursively by re-applying merge sort algorithm.
4. Merge the two sublists back into one sorted list.
Performance
In sorting n objects, merge sort has an average and worst-case performance of O(n log n). If the
running time of merge sort for a list of length n is T(n), then the recurrence T(n) = 2T(n/2) + n
follows from the definition of the algorithm (apply the algorithm to two lists of half the size of
the original list, and add the n units taken to merge the resulting two sorted lists).
Solving the recurrence (there are about log n levels of recursion, each costing n units of merging) gives
T(n) = O(n log n).
Remarks:
Merge sort is a stable sort as long as the merge operation is implemented properly.
Not an in-place sorting method: it requires O(n) additional memory space. The additional n
locations are needed because two sorted sets cannot reasonably be merged in place.
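The four steps can be sketched as below. The optional key parameter is an assumption added here so that stability can be observed on records with equal keys.

```python
def merge_sort(a, key=lambda x: x):
    """Return a new sorted list; stable because ties are taken from the left half."""
    if len(a) <= 1:
        return a[:]
    mid = len(a) // 2
    left = merge_sort(a[:mid], key)
    right = merge_sort(a[mid:], key)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if key(left[i]) <= key(right[j]):   # <= keeps equal keys in input order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
by_key = merge_sort(records, key=lambda r: r[0])
```

The merged list is the O(n) of extra space mentioned in the remarks.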
Heap Sort
Heap sort is a comparison-based sorting algorithm which is a much more efficient version of
selection sort. It also works by determining the largest (or smallest) element of the list, placing
that at the end (or beginning) of the list, then continuing with the rest of the list, but
accomplishes this task efficiently by using a data structure called a heap. Once the data list has
been made into a heap, the root node is guaranteed to be the largest (or smallest) element. When
it is removed (using deleteMin/deleteMax) and placed at the end of the list, the heap is
rearranged so that the largest remaining element moves to the root. Using the heap, finding the
next largest element takes O(log n) time, instead of O(n) for a linear scan as in simple selection
sort. This allows heap sort to run in O(n log n) time.
Remarks:
It is an in-place sorting method, since the sorted subarray is stored in the same input array.
It is not a stable sorting method, because deleteMin/deleteMax does not preserve the order of
equal key values. Consider an input in which all key values are the same. deleteMin
moves the last heap element to the root location. The order is therefore
changed, because later values appear earlier in the sorted output.
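A compact in-place sketch using a max-heap stored in the input array itself is shown below. The helper name sift_down is an assumption of this example.

```python
def heap_sort(a):
    """In-place heap sort: build a max-heap, then repeatedly move the max to the end."""
    n = len(a)

    def sift_down(root, end):
        # Restore the heap property for the subtree at root within a[:end].
        while 2 * root + 1 < end:
            child = 2 * root + 1
            if child + 1 < end and a[child] < a[child + 1]:
                child += 1                   # pick the larger child
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    for i in range(n // 2 - 1, -1, -1):      # build the heap bottom-up
        sift_down(i, n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]          # deleteMax: max goes to the end
        sift_down(0, end)                    # O(log n) repair of the heap

values = [5, 3, 8, 1, 9, 2, 8]
heap_sort(values)
```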
Quick Sort
Quick sort sorts by employing a divide and conquer strategy to divide a list into two sub-lists.
The steps are:
1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements which are less than the pivot come before the pivot and
so that all elements greater than the pivot come after it (equal values can go either way).
After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base cases of the recursion are lists of size zero or one, which are always sorted.
Performance
If the running time of quick sort for a list of length n is T(n), then the recurrence T(n) = T(size
of s1) + T(size of s2) + n follows from the definition of the algorithm (apply the algorithm to
the two sets s1 and s2 that result from partitioning, and add the n units taken to partition the given
list). Balanced partitions give T(n) = 2T(n/2) + n = O(n log n); in the worst case one set is always empty, giving T(n) = T(n-1) + n = O(n^2).
Each call finalizes the correct position of the pivot element, which is never changed by any
subsequent call.
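The partition step and the two recursive calls can be sketched as follows. Choosing the last element as the pivot is an assumption of this sketch; the text leaves the choice of pivot open.

```python
def quick_sort(a, lo=0, hi=None):
    """In-place quick sort of a[lo..hi]; the last element is used as the pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return                       # size 0 or 1: already sorted
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):          # partition: smaller elements go left
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]        # pivot is now in its final position i
    quick_sort(a, lo, i - 1)         # sub-list of lesser elements
    quick_sort(a, i + 1, hi)         # sub-list of greater-or-equal elements

items = [7, 2, 9, 4, 7, 1]
quick_sort(items)
```

With this pivot choice an already sorted input triggers the O(n^2) worst case, since one side of every partition is empty.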
Operating System
One set of operating system services provides functions that are helpful to the user.
(i) User interface
(ii) Program execution
(iii) I/O operations
(iv) File system manipulation
(v) Communications
(vi) Error detection
(vii) Resource allocation
(viii) Protection and security
6. User Operating System Interface
There are two fundamental approaches for users to interface with the operating system.
One technique is to provide a command line interface or command interpreter that allows
operations to be performed by the operating system.
The second approach allows the user to interface with the operating system via a Graphical
User Interface (GUI).
7. System Calls
System calls provide the interface between a user process and the services made available by the
operating system.
System calls can be grouped roughly into five major categories:
(i) Process control
(ii) File manipulation
(iii) Device manipulation
(iv) Information maintenance
(v) Communications
[Figure: process state diagram. States: New/Start, Ready, Running, Waiting (for I/O), Swapped out, Stop. Labels: job scheduler (long-term scheduling), short-term scheduler, medium-term scheduling, I/O completion, exit. Note in figure: indefinite waiting is deadlock.]
A process that has halted its execution, but whose record is still maintained by the
operating system (on UNIX), is referred to as a zombie process.
Schedulers:
On multiprogramming systems, scheduling is used to determine which process is given
control of the CPU. Scheduling may be divided into three phases: long-term, medium-term, and
short-term.
Long-term scheduler: Controls the degree of multiprogramming, i.e. the number of processes in
memory.
Short-term scheduler: Decides which process in the ready state should be allocated the CPU. It
can also preempt the process in the running state.
Medium-term scheduler: Swaps blocked processes out of main memory, moving them to the swapped-out state.
The long-term scheduler must make a careful selection. In general, most processes can be
described as either I/O bound or CPU bound.
An I/O bound process spends more of its time doing I/O operations than it spends doing
computations.
A CPU bound process spends more of its time doing computations.
The long-term scheduler should select a good process mix of I/O bound and CPU bound
processes.
If all processes are I/O bound, the ready queue will almost always be empty, and the short-term
scheduler will have little to do.
If all processes are CPU bound, the I/O waiting queue will almost always be empty, devices
will go unused, and again the system will be unbalanced.
The system with the best performance will have a combination of CPU bound and I/O bound
processes.
Context Switching
Switching the CPU to another process requires saving the context of the old process and loading
the saved context of the new process. This task is known as a context switch. The context of a
process is represented in the PCB of the process; it includes the values of the CPU registers and the
process state.
a. The context-switching part of the scheduler ordinarily uses conventional load and store
operations to save the register contents. This means that each context switch requires
(n + m) × b × K time units
to save the state of a processor with n general registers and m status registers, assuming b
store operations are required to save a single register and each store instruction requires K
instruction time units.
b. Total time in context switching = time required to save the state of the old process in its PCB + time
required to load the saved state of the new process scheduled to run.
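A small worked example of the formula in (a) and the sum in (b) is given below. The register counts and timings are hypothetical values chosen only to show the arithmetic, and the load phase is assumed to cost the same as the save phase.

```python
def save_time(n, m, b, k):
    """Time units to save n general and m status registers:
    b store operations per register, k time units per store."""
    return (n + m) * b * k

# Hypothetical machine: 16 general registers, 2 status registers,
# 1 store per register, 2 time units per store instruction.
save = save_time(n=16, m=2, b=1, k=2)    # (16 + 2) * 1 * 2 = 36
total = save + save                      # save old state + load new state
```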
The context of a process is represented in the PCB of a process; it includes the value of the CPU
registers, the process state (figure 4.2.2) and memory management information.
[Figure: process control block (PCB). Fields: pointer, process state, process identifier (PID), program counter, registers, memory limits, list of open files.]
It contains many pieces of information associated with a specific process, including these:
Each subsequent selection involves one less choice. The total number of possibilities is
computed by multiplying all the possibilities at each point, making the answer n!
Inter-process communication, concurrency and synchronization.
The concurrent processes executing in the operating system may be either independent
processes or cooperating processes. A process is independent if it cannot affect or be affected by
the other processes executing in the system.
A process is cooperating if it can affect or be affected by the other processes executing in the
system. Clearly, any process that shares data with other processes is a cooperating process.
Process cooperation is used for information sharing, computation speedup, modularity and
convenience.
Concurrent execution of cooperating processes requires mechanisms that allow processes to
communicate with one another and to synchronize their actions.
Inter-process communication:
Cooperating processes can communicate in a shared-memory environment. Another way to
achieve the same effect is for the operating system to provide the means for cooperating
processes to communicate with each other via an interprocess communication (IPC) facility.
IPC provides a mechanism to allow processes to communicate and to synchronize their actions
without sharing the same address space.
IPC is particularly useful in a distributed environment where the communicating processes may
reside on different computers connected with a network. An example is a chat program used on
the World Wide Web.
IPC is best provided by a message-passing system, and message passing systems can be defined
in many ways. Different issues when designing message passing systems are as follows:
3. Synchronization
Communication between processes takes place through calls to the send( ) and receive( ) primitives.
Message passing may be either blocking or non-blocking, also known as synchronous and
asynchronous.
I. Blocking send: The sending process is blocked until the message is received by the
receiving process or by the mailbox.
II. Non-blocking receive: The receiver retrieves either a valid message or a null.
III. Non-blocking send: The sending process sends the message and resumes operation.
IV. Blocking receive: The receiver blocks until a message is available.
4. Buffering:
Whether the communication is direct or indirect, messages exchanged by communicating
processes reside in a temporary queue. Basically, such a queue can be implemented in three
ways:
I. Zero capacity: The queue has a maximum length of 0; thus the link cannot have any
message waiting in it.
II. Bounded capacity: The queue has finite length n; thus, at most n messages can reside in
it.
III. Unbounded capacity: Any number of messages can wait in it.
Examples of IPC systems: POSIX shared memory, Mach and Windows XP.
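As a rough illustration of bounded capacity and of non-blocking send and receive, the sketch below uses Python's standard queue.Queue as a stand-in for a mailbox. This is an analogy inside a single program, not one of the IPC systems named above, and the message names are hypothetical.

```python
import queue

mailbox = queue.Queue(maxsize=2)    # bounded capacity, n = 2

mailbox.put_nowait("m1")            # non-blocking send succeeds
mailbox.put_nowait("m2")
try:
    mailbox.put_nowait("m3")        # queue is full: a non-blocking send fails
    sent = True
except queue.Full:
    sent = False

first = mailbox.get_nowait()        # non-blocking receive returns a message
second = mailbox.get_nowait()
try:
    third = mailbox.get_nowait()
except queue.Empty:
    third = None                    # the "null" result of a non-blocking receive
```

The blocking variants correspond to put() and get() without the _nowait suffix, and a Queue created with the default maxsize behaves as an unbounded queue.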
Process Synchronization
A situation where several processes access and manipulate the same data concurrently and the
outcome of the execution depends on the particular order in which the access takes place, is
called a race condition.
To guard against the race condition, we need to ensure that only one process at a time can
manipulate the shared data (for example, a shared variable counter).
To make such a guarantee, we require some form of synchronization of the processes.
The critical section may be followed by an exit section. The remaining code is the remainder
section.
The general structure of a typical process Pi is shown in the following code.
do {
    entry section
    critical section
    exit section
    remainder section
} while (TRUE);
A solution to the critical section problem must satisfy the following 3 requirements.
I. Mutual exclusion:
If process Pi is executing in its critical section, then no other process can be executing in
its critical section.
II. Progress:
If no process is executing in its critical section and some processes wish to enter their critical
sections, then only those processes that are not executing in their remainder sections can
participate in the decision on which will enter its critical section next, and this selection cannot
be postponed indefinitely.
III. Bounded waiting:
There exists a bound, or limit, on the number of times that other processes are allowed to
enter their critical section after a process has made a request to enter its critical section and
before that request is granted.
Peterson's solution
A classic software-based solution to the critical-section problem is known as Peterson's solution.
We restrict our attention to algorithms that are applicable to only two processes at a time. The
processes are numbered P0 and P1. For convenience, when presenting Pi we use Pj to denote the
other process, i.e. j = 1 - i.
Note: Below is an algorithm for two processes. It can be generalized for n processes.
Algorithm 1
Shared variables:
    int turn;
    initially turn = 0
    turn = i means Pi can enter its critical section
Process Pi
do {
    while (turn != i) ;    /* busy wait */
    critical section
    turn = j;
    remainder section
} while (1);
Satisfies mutual exclusion, but not progress, because the processes are forced to alternate strictly.
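Algorithm 2
Shared variables:
    boolean flag[2];
    initially flag[0] = flag[1] = false
    flag[i] = true means Pi is ready to enter its critical section
Process Pi
do {
    flag[i] = true;
    while (flag[j]) ;      /* busy wait */
    critical section
    flag[i] = false;
    remainder section
} while (1);
Satisfies mutual exclusion, but not progress: if both processes set their flags at the same time, both wait forever.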
Algorithm 3
Combines the shared variables of Algorithms 1 and 2.
Process Pi
do {
    flag[i] = true;
    turn = j;
    while (flag[j] && turn == j) ;    /* busy wait */
    critical section
    flag[i] = false;
    remainder section
} while (1);
Meets all three requirements; solves the critical-section problem for two processes.
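One way to check these claims is to enumerate every interleaving of the two processes. The sketch below models each statement as one atomic step and explores all reachable states; the program-counter encoding (0 to 3) and the function names are assumptions of this example. With use_turn=False the model degenerates to Algorithm 2, the flag-only version.

```python
def step(state, i, use_turn=True):
    """Return the state after process i executes one atomic statement."""
    pc, flag, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    if pc[i] == 0:                 # flag[i] = true
        flag[i] = True
        pc[i] = 1
    elif pc[i] == 1:               # turn = j (skipped in the flag-only variant)
        if use_turn:
            turn = j
        pc[i] = 2
    elif pc[i] == 2:               # while (flag[j] && turn == j) ;
        blocked = flag[j] and (turn == j if use_turn else True)
        if not blocked:
            pc[i] = 3              # enter the critical section
    else:                          # leave the critical section: flag[i] = false
        flag[i] = False
        pc[i] = 0
    return (tuple(pc), tuple(flag), turn)

def explore(use_turn=True):
    """Visit every reachable state; return (mutual exclusion violated, stuck forever)."""
    start = ((0, 0), (False, False), 0)
    seen, frontier = {start}, [start]
    violation = stuck = False
    while frontier:
        s = frontier.pop()
        if s[0] == (3, 3):
            violation = True       # both processes inside the critical section
        nexts = [step(s, i, use_turn) for i in (0, 1)]
        if all(t == s for t in nexts):
            stuck = True           # neither process can ever move again
        for t in nexts:
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return violation, stuck
```

Under this model explore(True) finds neither a violation nor a stuck state, while explore(False) finds a state in which both processes wait forever.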
2. Synchronization hardware
Many machines provide special hardware instructions that allow us either to test and modify the
content of a word or to swap the contents of two words atomically, that is, as one
uninterruptible unit.
The TestAndSet instruction can be defined as follows:
boolean TestAndSet(boolean &lock)
{
    boolean r;
    r = lock;        /* remember the old value */
    lock = true;
    return r;
}
If lock is already true, the process cannot enter its critical section.
Initially lock = false.
while (TestAndSet(lock)) ;    /* busy wait */
critical section
lock = false;
The problem with this method is busy waiting: while lock is true, the waiting process keeps
executing the TestAndSet instruction in a loop.
(i) Swap instruction
boolean swap(boolean &a, boolean b)
{
    boolean c;
    c = a;
    a = b;
    b = c;
    return b;        /* the old value of a */
}
A global Boolean variable lock is declared and is initialized to false.
lock = false;
while (swap(lock, true)) ;    /* busy wait */
critical section
lock = false;
This method also suffers from busy waiting.
3. Semaphores
A semaphore S is an integer variable that, apart from initialization, is accessed only through two
standard atomic operations: wait( ) and signal( ). These operations were originally termed P
(for wait) and V (for signal).
The classical definition of wait in pseudocode is:
wait(s) {
    while (s <= 0)
        ;    // keep waiting
    s = s - 1;
}
The classical definition of signal in pseudocode is:
signal(s) {
    s = s + 1;
}
A binary semaphore is a semaphore whose count may only take the value 0 or 1.
A semaphore that can take any non-negative value is referred to as a general
semaphore or counting semaphore.
Let S be a counting semaphore. To implement it in terms of binary semaphores we need the
following data structure:
binary semaphore S1, S2;
int C;
Initially S1 = 1, S2 = 0, and C is set to the initial value of the counting semaphore S.
The wait operation on the counting semaphore S can be implemented as follows:
Wait(S1);
C--;
if (C < 0) {
    Signal(S1);
    Wait(S2);
}
Signal(S1);
The signal operation on the counting semaphore S can be implemented as follows:
Wait(S1);
C++;
if (C <= 0)
    Signal(S2);
else
    Signal(S1);
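The construction can be tried out with Python threads. In this sketch a threading.Lock plays the role of a binary semaphore (a Lock may be released by a thread other than the one that acquired it), S1 protects C, and S2 starts at 0. The class name, the demo function and its parameters are assumptions made for illustration.

```python
import threading
import time

class CountingSemaphore:
    def __init__(self, value):
        self.c = value
        self.s1 = threading.Lock()   # binary semaphore S1, initially 1
        self.s2 = threading.Lock()   # binary semaphore S2, initially 0
        self.s2.acquire()

    def wait(self):
        self.s1.acquire()
        self.c -= 1
        if self.c < 0:
            self.s1.release()
            self.s2.acquire()        # block until some signal() wakes us
        self.s1.release()            # also releases S1 handed over by signal()

    def signal(self):
        self.s1.acquire()
        self.c += 1
        if self.c <= 0:
            self.s2.release()        # wake one waiter; it releases S1 for us
        else:
            self.s1.release()

def demo(workers=6, permits=2):
    """Return the largest number of workers observed inside at the same time."""
    sem = CountingSemaphore(permits)
    guard = threading.Lock()
    inside = peak = 0

    def work():
        nonlocal inside, peak
        sem.wait()
        with guard:
            inside += 1
            peak = max(peak, inside)
        time.sleep(0.01)
        with guard:
            inside -= 1
        sem.signal()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

Unlike the spinning definition of wait, a thread blocked on S2 here sleeps inside acquire() rather than looping.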
The main disadvantage of the mutual-exclusion solutions and the semaphore definition given
here is that they all require busy waiting.
Busy waiting: While a process is in its critical section, any other process that tries to enter its
critical section must loop continuously in the entry code. This continual looping is clearly a
problem in a real multiprogramming system, where a single CPU is shared among many processes.
Busy waiting wastes CPU cycles that some other process might be able to use productively. This
type of semaphore is also called a spinlock. Spinlocks are useful in multiprocessor systems.
When two or more processes are waiting indefinitely for an event that can be caused only by one
of the waiting processes, these processes are said to be deadlocked.
Consumer
while (1) {
    wait(can_consume);
    wait(mutex);
    ----------------
    Consume data
    ----------------
    signal(mutex);
    signal(can_produce);
}
For an unbounded buffer, remove wait(can_produce) from the producer code.
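A runnable sketch of the bounded-buffer scheme with both sides is shown below, using threading.Semaphore for mutex, can_produce and can_consume. The producer loop is the usual mirror image of the consumer and is an assumption here, as are the buffer capacity and item count.

```python
import threading
from collections import deque

def run(items=20, capacity=3):
    buffer = deque()
    mutex = threading.Semaphore(1)
    can_produce = threading.Semaphore(capacity)   # number of free slots
    can_consume = threading.Semaphore(0)          # number of filled slots
    consumed = []

    def producer():
        for x in range(items):
            can_produce.acquire()     # wait(can_produce)
            mutex.acquire()           # wait(mutex)
            buffer.append(x)          # produce data
            mutex.release()           # signal(mutex)
            can_consume.release()     # signal(can_consume)

    def consumer():
        for _ in range(items):
            can_consume.acquire()     # wait(can_consume)
            mutex.acquire()           # wait(mutex)
            consumed.append(buffer.popleft())   # consume data
            mutex.release()           # signal(mutex)
            can_produce.release()     # signal(can_produce)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```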
Writer
wait(s);
Perform writing
signal(s);
Reader
wait(mutex);
reader_count++;
if (reader_count == 1)
    wait(s);
signal(mutex);
Perform reading
wait(mutex);
reader_count--;
if (reader_count == 0)
    signal(s);
signal(mutex);
II. The dining-philosophers problem
When a philosopher thinks, she does not interact with her colleagues.
From time to time, a philosopher gets hungry and tries to pick up the two chopsticks that are
closest to her.
When a hungry philosopher has both her chopsticks at the same time, she eats without releasing her
chopsticks.
When she finishes eating, she puts down both of her chopsticks and starts thinking again.
One simple solution is to represent each chopstick by a semaphore. A philosopher tries to grab
a chopstick by executing a wait operation on that semaphore; she releases her chopsticks by
executing the signal operation on the appropriate semaphores.
Thus the shared data are
semaphore chopstick[5];
where all the elements of chopstick are initialized to 1.
do {
    wait(chopstick[i]);
    wait(chopstick[(i+1) % 5]);
    eat
    signal(chopstick[i]);
    signal(chopstick[(i+1) % 5]);
    think
} while (1);
Although this solution guarantees that no two neighbors are eating simultaneously, it
nevertheless must be rejected because it has the possibility of creating a deadlock.
Suppose that all five philosophers become hungry simultaneously, and each grabs her left
chopstick. All the elements of chopstick will now be equal to 0. When each philosopher tries to
grab her right chopstick, she will be delayed forever.
Solutions that ensure freedom from deadlocks:
(i) Allow at most four philosophers to be sitting simultaneously at the table.
(ii) Allow a philosopher to pick up her chopsticks only if both chopsticks are available.
(iii) An asymmetric solution: An odd philosopher picks up first her left chopstick and then her
right chopstick, whereas an even philosopher picks up her right chopstick and then her left
chopstick.
(iv) One philosopher should be right-handed (picks up the right chopstick first) and all others left-handed, or vice
versa.
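Solution (iii) can be sketched with one lock per chopstick. The function name, the meal count, and the convention that philosopher i has chopstick i on her left and chopstick (i+1) % n on her right are assumptions of this example.

```python
import threading

def dine(n=5, meals=50):
    """Run n philosophers; return how many times each one ate."""
    chopstick = [threading.Lock() for _ in range(n)]
    eaten = [0] * n

    def philosopher(i):
        left, right = i, (i + 1) % n
        # Odd philosophers take the left chopstick first, even ones the right.
        first, second = (left, right) if i % 2 == 1 else (right, left)
        for _ in range(meals):
            with chopstick[first]:
                with chopstick[second]:
                    eaten[i] += 1        # eat
            # think (both chopsticks released)

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten
```

Because neighbours disagree about which chopstick to take first, the circular chain of "holding one, waiting for the next" cannot close, so every philosopher finishes all her meals.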
Monitors
A high-level abstraction that provides a convenient and effective mechanism for process
synchronization.
Only one process may be active within the monitor at a time.
monitor monitor-name
{
    // shared variable declarations
    procedure P1 (...) { ... }
    ...
    procedure Pn (...) { ... }
    initialization code (...) { ... }
}
Condition Variables
condition x, y;
Two operations on condition variables:
x.wait( ): the process that invokes the operation is suspended.
x.signal( ): resumes one of the processes (if any) that invoked x.wait( ).
[Figure: monitor with condition variables]
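Python has no monitor construct, but the idea can be approximated by a class whose methods all run under a single lock, with threading.Condition objects standing in for the condition variables. The bounded-buffer example and all names below are assumptions chosen for illustration.

```python
import threading

class BoundedBuffer:
    """Monitor-style buffer: one lock guards every method."""

    def __init__(self, capacity):
        self.items = []
        self.capacity = capacity
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)    # condition x
        self.not_empty = threading.Condition(self.lock)   # condition y

    def put(self, item):
        with self.lock:                          # only one thread is active inside
            while len(self.items) == self.capacity:
                self.not_full.wait()             # x.wait(): suspend the caller
            self.items.append(item)
            self.not_empty.notify()              # y.signal(): resume one waiter

    def get(self):
        with self.lock:
            while not self.items:
                self.not_empty.wait()
            item = self.items.pop(0)
            self.not_full.notify()
            return item

def transfer(n=10, capacity=2):
    """Move n integers through the buffer from a producer thread to the caller."""
    buf = BoundedBuffer(capacity)
    producer = threading.Thread(target=lambda: [buf.put(i) for i in range(n)])
    producer.start()
    received = [buf.get() for _ in range(n)]
    producer.join()
    return received
```

The wait calls sit inside while loops because a resumed thread must re-check its condition before proceeding.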
4.3: Threads
A thread is a single sequential stream of execution within a process. Because it has some of the properties of a process,
it is sometimes termed a lightweight process (LWP); it is a basic unit of CPU
utilization.
It comprises a thread ID, a program counter, a register set, and a stack.
It shares with other threads belonging to the same process its code section, data section, and
other operating-system resources, such as open files and signals.
If the process has multiple threads of control, it can do more than one task at a time.
[Figure: single-threaded and multithreaded processes. Sharable resources: code, data, files. Non-sharable resources (one set per thread): registers, stack.]
Libraries:
Multithreading Models:
1. Many-to-one model: The many-to-one model maps many user-level threads to one kernel
thread. Thread management is done in user space, so it is efficient, but the entire process
will block if a thread makes a blocking system call.
Green threads, a thread library available for Solaris 2, uses this model.
Fig. 4.3.2: Many-to-one model (user threads mapped to a kernel thread)
2. One-to-one model: The one-to-one model maps each user thread to a kernel thread. It
provides more concurrency than the many-to-one model by allowing another thread to run
when a thread makes a blocking system call.
Fig. 4.3.3: One-to-one model (user threads mapped to kernel threads)
3. Many-to-many model: This model multiplexes many user-level threads onto a smaller or equal
number of kernel threads.
Fig. 4.3.4: Many-to-many model (user threads mapped to kernel threads)
Scheduling algorithms:
CPU scheduling deals with the problem of deciding which of the processes in the ready queue
is to be allocated the CPU. The following scheduling algorithms are used:
1. First-Come, First-Served (FCFS) scheduling: Processes are scheduled in the order they are
received.
Advantage: Easy to implement.
Disadvantage: If a big process comes first, the small processes behind it suffer. This is called the convoy
effect.
Process   Execution time   Arrival time
P1        20               0
P2        25               15
P3        10               30
P4        15               45
Gantt chart:
| P1 | P2 | P3 | P4 |
0    20   45   55   70
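The Gantt chart above can be reproduced with a few lines of code. The sketch also derives each waiting time (start time minus arrival time), a quantity the table itself does not list; the function and variable names are assumptions.

```python
def fcfs(processes):
    """processes: list of (name, arrival, burst) sorted by arrival time.
    Returns a list of (name, start, finish, waiting time)."""
    clock, schedule = 0, []
    for name, arrival, burst in processes:
        start = max(clock, arrival)      # CPU may sit idle until the arrival
        clock = start + burst
        schedule.append((name, start, clock, start - arrival))
    return schedule

table = [("P1", 0, 20), ("P2", 15, 25), ("P3", 30, 10), ("P4", 45, 15)]
result = fcfs(table)
average_wait = sum(row[3] for row in result) / len(result)
```

For this table the waiting times are 0, 5, 15 and 10 time units, an average of 7.5.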
2. Shortest-Job-First (SJF) scheduling: Selects the process with the shortest expected processing
time, and does not preempt the running process.
Advantage: Minimum average waiting time.
Disadvantages: (1) There is no way to know the length of the next CPU burst.
(2) The constant arrival of small jobs can lead to indefinite blocking
(starvation) of a long job.
3. Shortest-Remaining-Time-First (SRTF) scheduling: Selects the process with the shortest
expected remaining process time. A process may be preempted when another process
becomes ready.
4. Priority scheduling: Priority scheduling requires each process to be assigned a priority
value. The CPU is allocated to the process with the highest priority. FCFS can be used in case
of a tie.
Priority scheduling can be either preemptive or non-preemptive.
A major problem with the priority scheduling algorithm is indefinite blocking (or starvation).
A solution to the problem of indefinite blocking of low-priority processes is aging. Aging is a
technique of gradually increasing the priority of processes that wait in the system for a long
time.
5. Round-Robin (RR) scheduling: This algorithm is designed especially for time-sharing
systems. It is similar to FCFS scheduling, but preemption is added to switch between
processes.
A small unit of time, called a time quantum (or time slice), is defined. A time quantum is
generally from 10 to 100 milliseconds.
To implement RR scheduling, we keep the ready queue as a First In First Out (FIFO) queue
of processes. New processes are added to the tail of the ready queue.
The RR scheduling algorithm is preemptive.
If there are n processes in the ready queue and the time quantum is q, then each process
gets 1/n of the CPU time in chunks of at most q time units.
Each process must wait no longer than (n-1)q time units until its next time quantum.
The performance of the RR algorithm depends heavily on the size of the time quantum.
At one extreme, if the time quantum is very large, the RR policy is the same as the FCFS
policy.
If the time quantum is very small (say 1 microsecond), the RR approach is called
processor sharing.
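The effect of the quantum size can be seen in a small simulation. The sketch assumes that all processes arrive at time 0 and ignores context-switch overhead; the burst times are hypothetical.

```python
from collections import deque

def round_robin(bursts, q):
    """bursts: dict mapping process name to CPU burst; returns finish times."""
    ready = deque(bursts.items())        # FIFO ready queue
    clock, finish = 0, {}
    while ready:
        name, remaining = ready.popleft()
        run = min(q, remaining)          # run for at most one quantum
        clock += run
        if remaining > run:
            ready.append((name, remaining - run))   # preempted: back to the tail
        else:
            finish[name] = clock
    return finish

jobs = {"P1": 24, "P2": 3, "P3": 3}
small_quantum = round_robin(jobs, q=4)
large_quantum = round_robin(jobs, q=100)
```

With q = 4 the short jobs finish at times 7 and 10; with a quantum larger than every burst the schedule is the same as FCFS and they finish at 27 and 30.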
6. Multilevel queue (MLQ) scheduling: A multilevel queue scheduling algorithm partitions the
ready queue into several separate queues.
The processes are permanently assigned to one queue, generally based on some property of
the process, such as memory size, process priority, or process type.
Each queue has its own scheduling algorithm, for example:
Foreground: RR
Background: FCFS
[Fig. 4.4.1: Multilevel queue scheduling. Queues from highest priority to lowest priority: system processes, interactive processes, batch processes, student processes.]
7. Multilevel Feedback Queue Scheduling: In the MLQ (Multilevel Queue) algorithm, processes are
permanently assigned to a queue on entry to the system and do not move between
queues. Multilevel feedback queue scheduling, however, allows a process to move
between queues. The idea is to separate processes with different CPU-burst characteristics.
If a process uses too much CPU time, it is moved to a lower-priority queue. This scheme
leaves I/O-bound and interactive processes in the higher-priority queues. Similarly, a process
that waits too long in a lower-priority queue may be moved to a higher-priority queue. This
form of aging prevents starvation.
[Fig. 4.4.2: Multilevel feedback queues. Three queues: quantum = 8, quantum = 16, and FCFS.]
4.5: Deadlocks
A process requests resources; if the resources are not available at that time, the process enters a
wait state. Waiting processes may never again change state, because the resources they have
requested are held by other waiting processes. This situation is called a deadlock.
A deadlock situation can arise if the following four conditions hold simultaneously in a system:
(i) Mutual exclusion: Only one process at a time can use the resource.
(ii) Hold and wait: A process must hold at least one resource while waiting to acquire
additional resources that are currently held by other processes.
(iii) No preemption: A resource can be released only voluntarily by the process holding it.
(iv) Circular wait: The processes in the system form a circular list or chain
in which each process is waiting for a resource held by the next process in the list.
[Figure: resource-allocation graphs with processes P1 to P4 and resources R1 to R4, including a graph that has a cycle but no deadlock.]
The request for R by P may now be granted, and all the arcs pointing towards P may be erased.
Similarly, the arcs for P can be erased, and the reduced graph has no arcs.
Methods for handling deadlocks:
(i) Deadlock prevention
(ii) Deadlock avoidance
(iii) Deadlock detection
(iv) Recovery from deadlock
(ii) Deadlock avoidance: This requires additional information about how resources will be
requested by a process during its lifetime. Using this information, the system decides whether
a resource should be allocated to the process or whether the process should wait. The decision
takes into account the resources currently available, the resources currently allocated, and
the resources that will be requested in the future.
(1) Safe state: A state is safe if the system can allocate resources to each process (up to its
maximum) in some order and still avoid a deadlock. More formally, a system is in a safe state
only if there exists a safe sequence.
A sequence of processes <P1, P2, ..., Pn> is a safe sequence for the current allocation
state if, for each Pi, the resources that Pi can still request can be satisfied by the currently
available resources plus the resources held by all the Pj with j < i. In this situation, if the
resources that process Pi needs are not immediately available, then Pi can wait until all
Pj have finished. Then Pi can obtain all its needed resources, complete its designated task,
return its allocated resources, and terminate. Now Pi+1 can obtain its needed resources, and
so on.
If no such sequence exists, then the system state is said to be unsafe.
[Fig. 4.5.3: Safe, unsafe, and deadlock state spaces. The deadlock states lie inside the unsafe states.]
Note:
A safe state is not a deadlock state. Conversely, a deadlock state is an unsafe state.
Not all unsafe states are deadlocks; however, an unsafe state may lead to a deadlock.
(2) Banker's Algorithm
The assumptions of the Banker's algorithm are as follows:
(a) Every process declares in advance the number of resources of each type it may require.
(b) No process asks for more resources than the system has.
(c) Every process releases its resources when it terminates.
Example: Consider a system with 5 processes (P0 to P4) and 4 resource types (R0 to R3).

Process   Max            Allocation     Need
          R0 R1 R2 R3    R0 R1 R2 R3    R0 R1 R2 R3
P0        3  2  1  1     2  0  1  1     1  2  0  0
P1        2 0            1 1 0          1 0
P2        1 2            1 0            0 2
P3        2 1            0 1            2 0
P4        2 1 0          0 1 0          2 0 0

Available
R0 R1 R2 R3
1  1  2  0
Q: Suppose process P3 requests one instance each of R0 and R1. How do you determine whether the
system is still in a safe state? (Use the Banker's algorithm.)
So Request[P3] = (1, 1, 0, 0).
Check whether Request[P3] ≤ Available: (1, 1, 0, 0) ≤ (1, 1, 2, 0), so yes.
After the request is provisionally granted, the state is:

Process   Max            Allocation     Need
          R0 R1 R2 R3    R0 R1 R2 R3    R0 R1 R2 R3
P0        3  2  1  1     2  0  1  1     1  2  0  0
P1        2 0            1 1 0          1 0
P2        1 2            1 0            0 2
P3        2 1            1 1            1 0
P4        2 1 0          0 1 0          2 0 0

Available
R0 R1 R2 R3
0  0  2  0

Now, <P , P , P , P , P > is a safe sequence.
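The safety check and the request check used by the Banker's algorithm can be sketched as follows. This is an illustrative sketch only: the function names `find_safe_sequence` and `request_is_safe` are invented here, and the state below (5 processes, 3 resource types) is a hypothetical one, not the example above.

```python
def find_safe_sequence(available, allocation, need):
    """Return a safe sequence of process indices, or None if the state is unsafe."""
    work = list(available)                  # resources free right now
    finished = [False] * len(allocation)
    sequence = []
    progressed = True
    while progressed:
        progressed = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            # Pi can finish if its remaining need fits in what is free.
            if not finished[i] and all(n <= w for n, w in zip(nd, work)):
                work = [w + a for w, a in zip(work, alloc)]  # Pi releases all
                finished[i] = True
                sequence.append(i)
                progressed = True
    return sequence if all(finished) else None


def request_is_safe(pid, request, available, allocation, need):
    """Pretend to grant `request` to process `pid` and test the resulting state."""
    if any(r > n for r, n in zip(request, need[pid])):
        return False                        # exceeds the declared maximum
    if any(r > a for r, a in zip(request, available)):
        return False                        # not enough free resources
    new_available = [a - r for a, r in zip(available, request)]
    new_allocation = [list(row) for row in allocation]
    new_need = [list(row) for row in need]
    new_allocation[pid] = [a + r for a, r in zip(allocation[pid], request)]
    new_need[pid] = [n - r for n, r in zip(need[pid], request)]
    return find_safe_sequence(new_available, new_allocation, new_need) is not None


# Hypothetical state: 5 processes, 3 resource types.
available = [3, 3, 2]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
need = [[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]]

print(find_safe_sequence(available, allocation, need))             # [1, 3, 4, 0, 2]
print(request_is_safe(1, [1, 0, 2], available, allocation, need))  # True
```

A request is granted only when both checks pass and the provisional state still has a safe sequence; otherwise the process waits.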
[Fig. 4.6.1: Address protection with a relocation register. The logical address from the CPU is checked (<); on yes, the physical address is sent to memory; on no, a trap (addressing error) occurs.]
Advantages:
(a) Simple memory management scheme.
(b) Memory is allocated entirely to one job.
(c) All the memory is available again after the job finishes.
(d) No special hardware support is needed, other than protection of the operating system from
user programs.
Disadvantages:
(a) Poor utilization of memory.
(b) Poor utilization of the processor.
(c) Inefficient for multiprogramming.
(d) A user job is limited to the size of available main memory.
(ii) Multiple-partition
(a) Multiple-partition with fixed size.
(b) Multiple-partition with variable-size.
(a) Multiple-partition with fixed size: One of the simplest methods for memory allocation is to
divide memory into several fixed-sized partitions. Each partition may contain exactly one
process. Thus, the degree of multiprogramming is bounded by the number of partitions. In
this multiple-partition method, when a partition is free, a process is selected from the input
queue and is loaded into the free partition.
(b) Multiple-partition with variable size: Initially all memory is available for user processes,
and is considered as one large block of available memory, a hole.
When a process arrives and needs memory, we search for a hole large enough for this
process. If we find one, we allocate only as much memory as is needed, keeping the rest
available to satisfy future requests.
The set of holes is searched to determine which hole is best to allocate. The first-fit, best-fit,
and worst-fit strategies are the most common ones used to select a free hole from the set of
available holes.
First fit: Allocate the first hole that is big enough.
Advantage: Less searching time.
Disadvantage: More internal fragmentation.
For the final request, the first hole not smaller than 5K starts at location 60K.
Internal fragmentation: Memory that is internal to a partition but is not being used.
Best fit: Allocate the smallest hole that is big enough.
Advantage: Less internal fragmentation.
Disadvantages:
(i) It takes more searching time.
(ii) The program cannot grow dynamically.
Worst fit: Allocate the largest hole.
Advantage: The program can grow dynamically.
Disadvantage: Suffers from more internal fragmentation.
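The three hole-selection strategies differ only in which candidate they pick, as the following sketch suggests. The function name `choose_hole`, the `(start, size)` tuple layout, and the hole list are assumptions made for this illustration; holes are assumed to be listed in address order.

```python
def choose_hole(holes, request, strategy):
    """Return the index of the hole chosen for `request`, or None if none fits.

    holes: list of (start, size) pairs in address order.
    strategy: "first", "best", or "worst".
    """
    candidates = [i for i, (_, size) in enumerate(holes) if size >= request]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                                # first hole big enough
    if strategy == "best":
        return min(candidates, key=lambda i: holes[i][1])   # smallest that fits
    if strategy == "worst":
        return max(candidates, key=lambda i: holes[i][1])   # largest hole
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical free list: (start address, size), both in KB.
holes = [(0, 10), (30, 4), (60, 20), (100, 6)]
for strategy in ("first", "best", "worst"):
    print(strategy, holes[choose_hole(holes, 5, strategy)])
```

First fit stops at the first match, while best fit and worst fit must examine every hole unless the list is kept sorted by size, which is the source of their longer search time.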
Logical address space is divided into equal-size pages. Page size is generally a power of 2.
No. of pages (N) = $\dfrac{\text{logical address space size}}{\text{page size}}$
No. of bits for the page number (b) = $\log_2 N$
The page offset depends on the page size: number of bits for the page offset (d) = $\log_2(\text{page size})$
Similarly, physical address space is divided into equal-size frames.
Frame size = page size
Example: The logical address is 32 bits, the physical address space is 128 MB, and the page size is
8 KB. What is the size of the page table in bytes?
Number of pages = $2^{32} / 2^{13} = 2^{19}$ (page number = 19 bits, i.e., $2^{19}$ page table entries)
Number of frames = $2^{27} / 2^{13} = 2^{14}$ (frame number = 14 bits)
Each page table entry size = 14 bits, rounded up to 2 bytes
Page table size = $2^{19} \times 2$ bytes = 1 MB
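The arithmetic for the example (32-bit logical address, 128 MB of physical memory, 8 KB pages) can be checked with a short sketch. The function name `page_table_size` is invented here, and the sketch assumes that each entry holds only a frame number rounded up to whole bytes and that all sizes are powers of 2.

```python
def page_table_size(logical_bits, physical_bytes, page_bytes):
    """Return (pages, frames, entry_bytes, table_bytes) for a single-level page table.

    Assumes power-of-2 sizes and entries that store only a frame number.
    """
    offset_bits = page_bytes.bit_length() - 1        # log2(page size)
    num_pages = 2 ** (logical_bits - offset_bits)
    num_frames = physical_bytes // page_bytes
    frame_bits = (num_frames - 1).bit_length()       # bits to number every frame
    entry_bytes = (frame_bits + 7) // 8              # round up to whole bytes
    return num_pages, num_frames, entry_bytes, num_pages * entry_bytes


KB, MB = 2 ** 10, 2 ** 20
print(page_table_size(32, 128 * MB, 8 * KB))   # (524288, 16384, 2, 1048576)
```

Under these assumptions the table has 2^19 entries of 2 bytes each, or 1 MB in total; a design that stores extra bits per entry (protection, valid-invalid) could need a wider entry.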
[Fig 4.6.2: Paging hardware. The page table supplies a frame number f, which is combined with the offset to form the physical address in physical memory.]
The page size (like the frame size) is defined by the hardware. The size of a page is typically a
power of 2.
When we use a paging scheme, we have no external fragmentation: Any free frame can be
allocated to a process that needs it. However, we may have some internal fragmentation.
The hardware implementation of the page table can be done in several ways:
(i) The page table is implemented as a set of dedicated registers.
(ii) If the page table is very large, it is kept in main memory and a page table base register
(PTBR) points to the page table.
(iii) The standard way is to use a special, small, fast-lookup hardware cache called the Translation
Look-aside Buffer (TLB). The TLB is an associative, high-speed memory. Each entry in the TLB
consists of two parts: a key (or tag) and a value.
If the page number is in the TLB, it is known as a TLB hit; if not, it is a TLB miss.
The percentage of times that a particular page number is found in the TLB is called the hit ratio.
TLB hit ratio h = (number of TLB hits) / (total number of page references)
Effective memory access time with TLB = h(t1 + t2) + (1 - h)(t1 + 2t2)
where t1 = TLB access time
t2 = memory access time.
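As a quick numeric check of the formula above, the sketch below evaluates it for one set of timings. The function name and the values (80% hit ratio, 20 ns TLB access, 100 ns memory access) are hypothetical and chosen only for illustration.

```python
def effective_access_time(h, t1, t2):
    """Effective memory access time with a TLB.

    h: TLB hit ratio (0..1), t1: TLB access time, t2: memory access time.
    A hit costs t1 + t2; a miss costs t1 + 2*t2 (page table, then the data).
    """
    return h * (t1 + t2) + (1 - h) * (t1 + 2 * t2)


# Hypothetical timings in nanoseconds.
eat = effective_access_time(h=0.8, t1=20, t2=100)
print(f"{eat:.1f} ns")   # about 140 ns
```

Raising the hit ratio moves the result towards t1 + t2, which is why even a small TLB pays off when references show locality.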
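[Fig. 4.6.3: Paging hardware with TLB. The CPU issues a logical address with page number p; on a TLB hit the frame number f comes from the TLB, and on a TLB miss it comes from the page table; f and the offset form the physical address sent to physical memory.]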
Memory protection in a paged environment is accomplished by protection bits that are
associated with each frame. Normally, these bits are kept in the page table. One bit can define a
page to be read-write or read-only.
One more bit is generally attached to each entry in the page table: a valid-invalid bit.
When the bit is set to valid, the associated page is in the process's logical address space and is
thus a legal (or valid) page.
When the bit is set to invalid, the page is not in the process's logical address
space.
Hierarchical Paging:
For a large logical address space, we can divide the page table into smaller pieces, giving a
two-, three-, or four-level paging scheme.
Hashed Page Table:
A common approach for handling address spaces larger than 32 bits is to use a hashed page
table, with the hash value being the virtual page number. Each entry in the hash table contains a
linked list of elements, each consisting of three fields:
(a) The virtual page number
(b) The value of the mapped page frame
(c) A pointer to the next element in the linked list
Inverted Page Table:
An inverted page table has one entry for each real page of memory. Each entry consists of the virtual
address of the page stored in that real memory location, with information about the process that
owns that page.
Each virtual address in the system consists of the triple <process-id, page-number,
offset>.
Each inverted page table entry is a pair <process-id, page-number>.
(iv) Segmentation
It is a memory-management scheme that supports the user's view of memory. A logical address space
is a collection of segments. Each segment has a name and a length. An address specifies both
the segment name and the offset within the segment. The user therefore specifies each address by two
quantities: a segment name and an offset. For simplicity, segments are numbered and are
referred to by a segment number rather than a segment name.
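[Fig. 4.6.4: Segmentation hardware. The CPU issues a logical address (s, d); the segment table entry for s gives a limit and a base; d is compared with the limit (<); on yes, the access goes to physical memory; on no, it is rejected.]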
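The translation of a (segment number, offset) pair can be sketched as below. The names `translate` and `AddressingError`, and the contents of the segment table, are hypothetical and serve only to illustrate the limit check followed by the base addition.

```python
class AddressingError(Exception):
    """Raised when an offset falls outside its segment (the hardware trap)."""


def translate(segment_table, s, d):
    """Map logical address (s, d) to a physical address.

    segment_table: list of (limit, base) pairs indexed by segment number.
    """
    limit, base = segment_table[s]
    if not 0 <= d < limit:                 # offset must lie inside the segment
        raise AddressingError(f"offset {d} outside segment {s} (limit {limit})")
    return base + d


# Hypothetical segment table: (limit, base) per segment.
segment_table = [(1000, 1400), (400, 6300), (400, 4300), (1100, 3200), (1000, 4700)]

print(translate(segment_table, 2, 53))     # 4353
print(translate(segment_table, 3, 852))    # 4052
```

A reference such as (0, 1222) raises `AddressingError` here because segment 0 is only 1000 bytes long.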
Advantages of segmentation
(a) Allows each segment to grow and shrink independently.
(b) Allows procedures and data to be shared between several processes.
(c) If a procedure is placed in its own segment, it is easy to modify and recompile that procedure
without disturbing the others.
Disadvantage: External fragmentation
(v) Segmentation with paging
If segmentation is combined with paging then this combination is very useful in many situations.
Virtual Memory:
Virtual memory is a technique that allows the execution of processes that may not be completely in
memory.
The instructions being executed must be in physical memory. The first approach to meeting this
requirement is to place the entire logical address space in physical memory. Overlays and
dynamic loading can help to ease this restriction, but they generally require special precautions
and extra work by the programmer.
Overlays: The idea of overlays is to keep in memory only those instructions and data that are
needed at the given time. When other instructions are needed, they are loaded into space
occupied previously by instructions that are no longer needed.
Dynamic loading and linking:
The concept of dynamic linking is similar to that of dynamic loading. Rather than loading being
postponed until execution time, linking is postponed. This feature is usually used with system
libraries such as language subroutine libraries.
Virtual memory is the separation of user logical memory from physical memory. This separation
allows extremely large virtual memory to be provided for programmer when only a smaller
physical memory is available.
Virtual memory is commonly implemented by demand paging. It can also be implemented in a
segmentation system. Several systems provide a paged segmentation scheme, where segments
are broken into pages.
(i) Demand paging
A demand-paging system is similar to a paging system with swapping. Processes reside on
secondary memory. When we want to execute a process, we swap it into memory. Rather than
swapping the entire process into memory, however, we use a lazy swapper.
A lazy swapper never swaps a page into memory unless that page is needed. A swapper
manipulates entire processes, whereas a pager is concerned with the individual pages of a
process. We thus use the term pager, rather than swapper, in connection with demand paging.
With this scheme, we need some form of hardware support to distinguish between those pages
that are in memory and those that are on the disk.
The valid-invalid bit scheme can be used for this purpose. Valid means the page is legal and in
memory. Invalid means the page either is not valid or is valid but is currently on the disk.
[Fig. 4.6.5: Page table when some pages are not in main memory. Logical memory holds pages A to H (numbered 0 to 7); each page table entry has a frame number and a valid-invalid bit (pages 0, 2 and 5 are marked v, the rest i); physical memory holds the valid pages.]
Pure demand paging: Never bring a page into memory until it is required.
Effective access time for a demand-paged memory is:
Effective access time = (1 - P) × ma + P × page fault time
where P is the probability of a page fault and ma is the memory access time.
[Table: page-replacement traces, with rows for the oldest page and the total page faults.]
[Fig. 4.6.6: Thrashing. CPU utilization plotted against the degree of multiprogramming.]
We can limit the effects of thrashing by using a local replacement algorithm (or priority
replacement algorithm). With local replacement, if one process starts thrashing, it cannot steal
frames from another process and cause the latter to thrash also. Pages are replaced with regard
to the process of which they are a part.
Working Set model:
The working-set model is based on the assumption of locality. This model uses a parameter, Δ, to
define the working-set window. The idea is to examine the most recent Δ page references.
The set of pages in the most recent Δ page references is the working set.
If a page is in active use, it will be in the working set. If it is no longer being used, it will drop
from the working set Δ time units after its last reference. Thus, the working set is an
approximation of the program's locality.
Example
Δ = 10 memory references
Reference string
6 1 5 7 7 7 7 5 1 6 6 4 1 2 3 4 4 4 3 4 3 4 4 4 1 3 2 3 4 4 3 4 4 4
WS(t1) = {1, 2, 5, 6, 7}
WS(t2) = {3, 4}
Fig. 4.6.7
The accuracy of the working set depends on the selection of Δ. If Δ is too small, it will not
encompass the entire locality.
If Δ is too large, it may overlap several localities. In the extreme, if Δ is infinite, the working set
is the set of pages touched during the process execution.
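A working set is simply the set of distinct pages in a sliding window over the reference string, as the sketch below illustrates. The function name `working_set` and the choice of time points (given as 0-based positions in the string) are assumptions made here; the positions of t1 and t2 in the original figure are not known, so the indices used are illustrative only.

```python
def working_set(refs, t, delta):
    """Pages referenced in the `delta` most recent references ending at index t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])


# The reference string as printed in the example above.
refs = [int(x) for x in
        "6 1 5 7 7 7 7 5 1 6 6 4 1 2 3 4 4 4 3 4 3 4 4 4 1 3 2 3 4 4 3 4 4 4".split()]

print(working_set(refs, 23, 10))          # {3, 4}
print(len(working_set(refs, 33, 10)))     # size of the working set at the end
```

Making the window as long as the whole string returns every page the process has touched, which corresponds to the extreme case of a very large window.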
[Fig 4.7.1: Sequential-access file, showing the beginning of the file, rewind, and read or write.]
(ii) Direct / Random / Relative Access method: A direct-access file is made up of a collection of
records, each of which is identified by a number called the record number.
In a direct-access file, one can access any record at any time for reading or writing.
Databases are often of this type.
(iii) Index Sequential Access Method (ISAM): This method is a slight modification of
the direct access method. The index contains pointers to the various blocks or records. To
find an entry in the file, we first search the index and then use the pointer to access
the file entry directly.
(2) Directory Structure:
One-level Directory or Single-level-Directory: If system supports only a single directory and
all files are stored in the same directory, then it is known as one-level directory.
It is the simplest directory structure, but it has its own limitations. As the number of files
increases, it becomes difficult to remember the names of the files, and conflicts can occur in file
names, since each file must have a unique name.
[Fig. 4.7.2: One-level directory. The root directory points directly to the files (A, B, ...).]
Two-level Directory: In this type of directory system, each user is given a separate
directory, which holds that user's group of files.
Tree-Structured Directory:
This is the most common directory structure in use. In this type of structure, there is a root
directory, and all intermediate directories are known as subdirectories. Each subdirectory
contains information about the files and subdirectories within it.
Acyclic-Graph Directory Structure:
This type of directory structure allows files or directories to be shared among users. It is a
graph-like structure in which no cycle exists.
The problems with the acyclic-graph directory system are as follows. First, a file may have more
than one absolute path name. Second is the problem of deletion: if a user deletes a
shared file, the other users are left with dangling pointers to a file that no longer
exists.
General-Graph Directory Structure:
The directory system supports graph like structure. In graph, any node can connect to any
other node. Similarly, in graph-directory system, directories and files can be connected in
any form. The prerequisite of acyclic-graph directory system is that there should not be any
cycle. But general-graph directory system supports cycles also.
(3) Allocation methods: An allocation method refers to how disk blocks are allocated for files.
(i) Contiguous Allocation method: It stores each file as a contiguous run of blocks on the disk.
Thus, if the file size is n KB and the block size on disk is 1 KB, the file uses n consecutive
blocks of the disk in contiguous allocation, as shown below:
[Figure: Contiguous allocation of disk space. The directory records, for each file (Count, New, List), its start block and length.]
Advantages:
a) There is no wastage of disk space. Every block can be used.
b) There is no external fragmentation. Only internal fragmentation is possible, in the last block.
c) The directory needs only two values per file: the starting block address and the end
block address.
Disadvantages:
a) A block cannot be accessed directly. Only sequential reading is possible, since every block
contains the address of the next block of the file.
b) Pointers use some disk space. This space is overhead, since it is not part
of the file's information, and the larger the file, the more space its pointers
consume.
c) Access time, seek time, and latency time are higher than with contiguous allocation.
(iii) Index Allocation / Linked Allocation Using Index:
In this allocation scheme, a table is maintained that contains a pointer to the next block on disk.
This table is known as the index table.
The index table is used to speed up access to the file. It is also known as the FAT (File
Allocation Table). Index allocation supports direct access without suffering from external
fragmentation.
Advantages:
a) The entire block is available for data.
b) Random access is much easier.
c) Like linked-list allocation, it requires only the starting block number to find the entire file.
d) The index table is used to find the next block entry. Since the index table always remains in
memory, no disk reference is necessary for each access.
Disadvantage:
Extra space is required to keep the index table in memory all the time.
Free space management
a) Bit vector or bit map: Each block is represented by 1 bit. If the block is free, the bit is 1; if
the block is allocated, the bit is 0.
Advantage: It is simple and efficient for finding the first free block, or n consecutive free
blocks, on the disk.
Disadvantage: It is costly to keep the 0/1 bits up to date.
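Both searches mentioned above (first free block, and n consecutive free blocks) are single scans over the bit vector, as the sketch below shows. It follows the convention stated above (1 = free, 0 = allocated); the function names and the sample bitmap are invented for this illustration, and a real file system would scan machine words rather than a Python list.

```python
def first_free_block(bitmap):
    """Return the number of the first free block (bit == 1), or None."""
    for block, bit in enumerate(bitmap):
        if bit == 1:
            return block
    return None


def first_free_run(bitmap, n):
    """Return the first block of the first run of n consecutive free blocks, or None."""
    run_start, run_len = None, 0
    for block, bit in enumerate(bitmap):
        if bit == 1:
            if run_len == 0:
                run_start = block          # a new run of free blocks begins here
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0                    # an allocated block breaks the run
    return None


# Hypothetical bitmap: blocks 2-5 and 8-13 are free.
bitmap = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0]
print(first_free_block(bitmap))    # 2
print(first_free_run(bitmap, 5))   # 8
```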
b) Linked free space management: Only the entry of the first free block is stored; the remaining
free blocks are connected to each other through a linked list.
Advantage: Less information needs to be stored.
Disadvantages:
I.
II.
c) Index free space management: The OS maintains a block that holds the entries of the free blocks.
This block is called the index block.
If the entries do not fit into one block, two-level indexing is used.
(i) I/O scheduling: To schedule a set of I/O requests means to determine a good order in which
to execute them. The order in which applications issue system calls is rarely the best choice.
Scheduling can improve overall system performance, share device access fairly among
processes, and reduce the average waiting time for I/O to complete.
(ii) Buffering: A Buffer is a memory area that stores data while they are transferred between
two devices or between a device and an application.
Buffering is done for three reasons:
One reason is to cope with a speed mismatch between the producer and consumer of a data
stream.
A second use of buffering is to adapt between devices that have different data-transfer sizes.
Such disparities are especially common in computer networking, where buffers are used
widely for fragmentation and reassembly of messages.
A third use of buffering is to support copy semantics for application I/O.
(iii)Caching:
A cache is a region of fast memory that holds copies of data. Access to the cached copy is
more efficient than access to the original.
Caching and buffering are distinct functions, but sometimes a region of memory can be used
for both purposes.
(iv) Spooling and device Reservation:
A spool is a buffer that holds output for a device, such as a printer, that cannot accept
interleaved data streams. Although a printer can serve only one job at a time, several
applications may wish to print their output concurrently, without having their output mixed
together.
The operating system solves this problem by intercepting all output to the printer. Each
application's output is spooled to a separate disk file.
When an application finishes printing, the spooling system queues the corresponding spool
file for output to the printer.
2. Disk structure
Modern disk drives are addressed as large one-dimensional arrays of logical blocks, where
the logical block is the smallest unit of transfer.
The size of a logical block is usually 512 bytes, although some disks can be low-level
formatted to choose a different logical block size, such as 1024 bytes.
The one-dimensional array of logical blocks is mapped onto the sectors of the disk
sequentially. Sector 0 is the first sector of the first track on the outermost cylinder.
The number of sectors per track is not constant on some drives.
Constant linear velocity (CLV): The density of bits per track is uniform. The farther a track is
from the center of the disk, the greater its length, so the more sectors it can hold.
Alternatively, the disk rotation speed can stay constant, and the density of bits decreases from
inner tracks to outer tracks to keep the data rate constant. This method is used in hard disks and is
known as constant angular velocity (CAV).
Disk Scheduling
The Seek time is the time for the disk arm to move the heads to the cylinder containing the
desired sector.
The Rotational latency is the additional time waiting for the disk to rotate the desired sector to
the disk head.
The Disk bandwidth is the total number of bytes transferred divided by the total time between
the first request for service and the completion of the last transfer.
We can improve both the access time and the bandwidth by scheduling the servicing of disk I/O
requests in good order.
(i) FCFS (First-Come, First-Served) Scheduling
This is the simplest disk-scheduling algorithm. As its name suggests, the request that arrives first
is serviced first.
For example, suppose the I/O requests are for blocks on cylinders 4, 34, 10, 7, 19, 73, 2, 15,
6, 20, in that order, and the disk head is initially at cylinder 50.
[Figure: FCFS head movement across cylinders 0 to 100, starting at cylinder 50.]
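The total head movement for this request queue can be worked out with a short sketch. The function name `total_head_movement` is invented here; the request list and starting cylinder are the ones given in the example.

```python
def total_head_movement(start, order):
    """Sum of cylinder distances travelled when requests are served in `order`."""
    total, position = 0, start
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total


requests = [4, 34, 10, 7, 19, 73, 2, 15, 6, 20]   # arrival order
print(total_head_movement(50, requests))           # 276 cylinders under FCFS
```

The large swings (for instance from cylinder 73 back to cylinder 2) are what the other disk-scheduling algorithms try to avoid by reordering the queue.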
In practice, neither algorithm is implemented this way. More commonly, the arm goes only
as far as the final request in each direction. Then, it reverses direction immediately, without
going all the way to the end of the disk.
These versions of SCAN and C-SCAN are called LOOK and C-LOOK scheduling, because they
look for a request before continuing to move in a given direction.
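The service order under LOOK and C-LOOK can be sketched as follows. The function names, the `direction` parameter, and the choice of an initially upward-moving head are assumptions for this illustration; the request queue and the starting cylinder (50) are borrowed from the FCFS example in this section.

```python
def look_order(start, requests, direction="up"):
    """Service order for LOOK: sweep one way to the last request, then reverse."""
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c >= start)
    if direction == "up":
        return upper + lower[::-1]
    return lower[::-1] + upper


def c_look_order(start, requests):
    """Service order for C-LOOK (upward sweep): jump back to the lowest request."""
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c >= start)
    return upper + lower


def head_movement(start, order):
    """Total cylinders travelled when serving `order` from `start`."""
    total, position = 0, start
    for cylinder in order:
        total += abs(cylinder - position)
        position = cylinder
    return total


requests = [4, 34, 10, 7, 19, 73, 2, 15, 6, 20]
for name, order in (("LOOK", look_order(50, requests)),
                    ("C-LOOK", c_look_order(50, requests))):
    print(name, order, head_movement(50, order))
```

Here the return jump of C-LOOK is counted as ordinary head movement; some treatments exclude it, which changes the total but not the service order.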
Disk Management
(i) Disk Formatting:
Before a disk can store data, it must be divided into sectors that the disk controller can read
and write. This process is called low level formatting (or physical formatting).
Low-level formatting fills the disk with a special data structure for each sector. The data
structure for a sector typically consists of a header, a data area (usually 512 bytes in size),
and a trailer.
Logical formatting (or creation of a file system):
In this step, the operating system stores the initial file-system data onto the disk.
These data structures may include maps of free and allocated space (a FAT or inodes) and an
initial empty directory.
Boot Block
It contains code required to boot the operating system.
For example, MS DOS uses one 512-byte block for its boot program.
[Fig. 4.8.4: MS-DOS disk layout. Sector 0: boot block; sector 1: FAT; then the root directory; then the data blocks (subdirectories).]
FAT: File allocation table, which stores the position of each file in the directory tree.
Root directory: Every file within the directory hierarchy can be specified by giving its path
name from the top of the directory hierarchy, the root directory. Such absolute path names
consist of the list of directories that must be traversed from the root directory to get the file,
with slashes separating the components. The leading slash indicates that the path is absolute
that is, starting at the root directory.
[Fig. 4.9.1: Protection domains D1 and D2 with the access rights <O1, {execute}>, <O2, {write}>, <O3, {read}>, and <O4, {print}>.]
The access right <O4, {print}> is shared by both D1 and D2, implying that a process executing in
either of these two domains can print object O4.
A domain may be a user, a process, or a procedure.
[Fig. 4.9.2: Ring structure of protection domains, out to ring n-1.]
Let Di and Dj be any two domain rings. If j<i, then Di is a subset of Dj. That is, a process executing
in domain Dj has more privileges than does a process executing in domain Di.
(2) Access Matrix
The rows of the access matrix represent domains, and the columns represent objects. Each entry
in the matrix consists of a set of access rights. Because the column defines objects explicitly, we
can omit the object name from the access right.
Domain \ Object   F1            F2      F3            Printer
D1                Read                  Read
D2                                                    Print
D3                              Read    Execute
D4                Read, write           Read, write
Table 4.9.1
The entry access(i, j) defines the set of operations that a process executing in domain Di can
invoke on object Oj.
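As a rough illustration of how an access matrix can be consulted, the sketch below stores the matrix as a nested mapping and answers "may a process in domain D invoke this right on object O?". The cell contents and the helper name `allowed` are assumptions made for the example, not part of the source table.

```python
# Access matrix as a nested mapping: domain -> object -> set of rights.
# The cell placement is an assumed example layout (hypothetical data).
ACCESS = {
    "D1": {"F1": {"read"}, "F3": {"read"}},
    "D2": {"Printer": {"print"}},
    "D3": {"F2": {"read"}, "F3": {"execute"}},
    "D4": {"F1": {"read", "write"}, "F3": {"read", "write"}},
}


def allowed(domain, obj, right):
    """Return True if a process in `domain` may invoke `right` on `obj`."""
    # A missing row or an empty cell means the right is not granted.
    return right in ACCESS.get(domain, {}).get(obj, set())


print(allowed("D1", "F1", "read"))   # True
print(allowed("D1", "F1", "write"))  # False
```

An empty cell is simply a missing key here, which keeps a sparse matrix compact.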
The Security Problem
Security violations (or misuse) of the system can be categorized as intentional (malicious) or
accidental.
It is easier to protect against accidental misuse than against malicious misuse.
Malicious access can take the following forms:
(i) Unauthorized reading/modification/destruction of data.
(ii) Preventing legitimate use of system (or denial of service).
To protect the system, we must take security measures at physical, human, network and OS
level.
User Authentication
Generally, authentication is based on user possession (a key or card), user knowledge (a user
identifier and password), and/or a user attribute (fingerprint, retina pattern, or signature).
(A) Passwords:
If the user-supplied password matches the password stored in the system, the system
assumes that the user is legitimate.
(B) Encrypted Passwords:
It is difficult to keep the password secret within the computer. UNIX system uses encryption
to avoid the necessity of keeping its password list secret.
(C) One Time Password:
When a session begins, the system randomly selects and presents one part of a password
pair; the user must supply the other part.
(D) Biometrics:
Palm or hand readers are commonly used to secure physical access. Fingerprint readers have
become accurate and cost-effective, and should become more common in the future. These
devices read the ridge pattern of a finger and convert it into a sequence of numbers.
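The idea behind encrypted passwords (item B above) is that the system stores only the output of a one-way function, so the password list itself need not be kept secret. The sketch below is a minimal illustration that uses a salted PBKDF2 hash from Python's standard library; it is an assumed stand-in, not the actual UNIX scheme, and the function names and iteration count are hypothetical choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def make_record(password, salt=None):
    """Return (salt, digest); only this pair is stored, never the password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password, record):
    """Re-hash the supplied password with the stored salt and compare."""
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, digest)


record = make_record("s3cret")
print(verify("s3cret", record))  # True
print(verify("guess", record))   # False
```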
Program Threats:
When a program written by one user may be used by another user, misuse and unexpected
behavior may ensue. Some common methods by which such behavior may occur are: Trojan
horse, trap doors, and stack and buffer overflow.
(i) Trojan Horse:
Many systems have mechanisms for allowing programs written by users to be executed by
other users. If these programs are executed in a domain that provides the access rights of
the executing users, the other users may misuse these rights. A code segment that misuses
its environment is called a Trojan Horse.
(ii) Trap Door:
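The designer of a program or system might leave a hole in the software that only the designer
can use. This type of security breach (or trap door) was shown in the movie War Games.
(iii) Stack and Buffer Overflow:
An attacker exploits a bug in a program, typically a missing bounds check on an input, to
overwrite memory beyond the end of a buffer or stack frame.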
System Threats
System threats create a situation in which operating-system resources and user files are
misused.
(i) Worms
A worm is a process that uses the spawn mechanism to clobber system performance. The
worm spawns copies of itself, using up system resources and perhaps locking out system
use by all other processes.
(ii) Viruses
Viruses are designed to spread into other programs and can wreak havoc in a system,
including modifying or destroying files and causing system crashes and program
malfunctions.
Denial of service
It does not involve gaining information or stealing resources, but rather disabling legitimate use
of a system or facility.
This check would be expensive and needs to be performed every time the object is accessed.
DBMS
What To Model?
Static Information
Data -- Entities
Associations -- Relationships among entities
Dynamic Information
Processes -- Operations/transactions
Integrity constraints -- Business rules/regulations and data meanings
What is a Data Model?
A collection of tools for describing data, data relationships, data semantics, and data constraints.
Data Model: A data model is a collection of concepts that can be used to describe the structure of
a database.
Schema: The description of a database is called the database schema.
System model tools:
Data Flow Diagram (DFD)
Hierarchical Input Process and Output (HIPO)
State Transition Diagrams (STD)
Entity Relationship (ER) Diagrams
Entity-Relationship Model (ER Model)
A data model in which the information stored in the database is viewed as sets of entities and sets
of relationships among entities. It is a diagram-based representation of domain knowledge, data
properties, and so on, which makes it intuitive rather than mechanical. Entity-Relationship is a
popular high-level conceptual data model.
Components of E R model
a. Entity
b. Relationship
c. Attributes
Entity:- The basic object that the ER model represents is an entity, which is a thing in the real
world with an independent existence.
Entity set:-A set of entities of the same type.
Entity sets need not be disjoint. Example: A person entity could be in both the customer and
employee sets
Types of Entities
Entities with Physical existence
Example: Student, Customer, Book etc
Entities with Conceptual existence
Example: Sale, University course etc
Relationship: An association among two or more entities.
Example: The relationship between a Faculty member and a Student, i.e. a faculty member teaches a course to students.
Relationship Set
A set of relationships of the same type.
Attribute: A particular property of an entity that describes it.
Example: A student entity might have attributes such as roll number, name, age, address, etc.
As all entities in an entity set have the same attributes, entity sets also have attributes - the
attributes of the contained entities. The value of the attribute can be different for each entity
in the set.
Types of Attributes:
i) Composite attributes
ii) Simple attributes
iii) Single-valued attributes
iv) Multivalued attributes
v) Stored attributes
vi) Derived attributes
Relationship Degree: The number of entity types associated with that relationship (fig. 5.1.1).
Fig. 5.1.1: Relationship degree. Binary: Employee, Work, Department. Employee, Supervise (Employee related to Employee). Ternary (n-ary): Supply relating Supplier, Part, and Project.
One-to-one: Department (1) is managed by Employee (1).
The one-to-one relationship has a cardinality of one and only one in both directions.
One-to-many: Department (1) has Employee (N).
Employee works-on Project, with relationship attributes task-assignment and start-date.
Fig. 5.1.2
Multiplicity: Multiplicity constrains the way that entities are related - it is a representation of the
policies (or business rules) established by the user or enterprise. Multiplicity actually consists of
two separate constraints.
Cardinality: Cardinality describes the maximum number of possible relationship occurrences for
an entity participating in a given relationship type, i.e. how many relationship instances an
entity is permitted to be linked to.
Participation: Participation determines whether all or only some entity occurrences participate
in a relationship, i.e. how an entity is linked to the relationship.
Total participation (indicated by a double line): Every entity in the entity set participates in at least
one relationship in the relationship set.
Partial participation: Some entities may not participate in any relationship in the relationship set.
Note: Cardinality limits can also express participation constraints.
Weak and Strong Entity Set
A Strong Entity set has a primary key. All tuples in the set are distinguishable by that key.
A Weak entity set has no primary key unless attributes of the strong entity set on which it
depends are included.
Tuples in a weak entity set are partitioned according to their relationship with tuples in a
strong entity set.
Tuples within each partition are distinguishable by a discriminator, which is a set of
attributes.
The discriminator (or partial key) of a weak entity set is the set of attributes that
distinguishes among all the entities of a weak entity set that depend on the same strong entity.
Weak Entity set is represented by double rectangles.
Underline the discriminator of a weak entity set with a dashed line.
The primary key of the Strong entity set is not explicitly stored with the Weak entity set,
since it is implicit in the identifying relationship.
We want to avoid the data duplication and consequent possible inconsistencies caused by
duplicating the key of the strong entity.
Weak entities reflect the logical structure of an entity being dependent on another entity.
Weak entities can be deleted automatically when their strong entity is deleted.
Weak entities can be stored physically with their strong entities.
Existence Dependencies
If the existence of entity x depends on the existence of entity y, then x is said to be existence
dependent on y. y is a dominant entity, x is a subordinate entity. If y entity is deleted, then all its
associated x entities must be deleted.
Fig. 5.1.3: Summary of ER diagram symbols: entity set with attributes A1, A2, A3; relationship set R; identifying relationship set for weak entity set; one-to-one relationship; many-to-one relationship; many-to-many relationship; attribute; multivalued attribute; derived attribute; primary key; total participation of entity set in relationship; discriminating attribute of weak entity set.
Keys
A super key of an entity set is a set of one or more attributes whose values uniquely
determine each entity.
A candidate key of an entity set is a minimal super key.
Although several candidate keys may exist, one of the candidate keys is selected to be the
primary key.
Aggregation
Aggregation is an abstraction through which relationships are treated as higher-level entities.
Thus the relationship between entities A and B is treated as if it were an entity C.
Utility of E-R Model
It maps well to the relational model. The constructs used in the ER model can easily be
transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore, the model can be
used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement
a data model in specific database management software.
Structural modeling: Model data entities, their attributes and association among entities.
Constraint specification: Model security and integrity-constraints.
Operations: Specify Meaningful operations associated with entities/objects and their
associations.
Use the data model (relational, object oriented etc) of a DBMS to specify the structural
properties and constraints.
Determine the structural properties, constraints, and operations which are not captured
by the data model and are to be implemented in application programs.
Has a name that is distinct from all other relation names in the relational schema.
Each cell contains exactly one atomic (single) value.
Each attribute has a distinct name.
The values of an attribute are all from the same domain.
Each tuple is distinct; there are no duplicate tuples.
The order of attributes has no significance.
The order of tuples has no significance, theoretically. (however, in practice, the order may
affect the efficiency of accessing tuples.)
A relation is defined as a set of tuples
Normalization of data can be looked upon as a process of analyzing the given relation schemas
based on their FDs and primary keys to achieve the desirable properties of (1) minimizing
redundancy and (2) minimizing the insertion, deletion, and update anomalies.
The Concept of Functional Dependency:
Functional dependency describes the relationship between attributes in a relation.
If A and B are attributes of relation R, B is functionally dependent on A (denoted by $A \rightarrow B$)
if each value of A in R is associated with exactly one value of B in R.
Given a table T with at least two attributes A and B, we say that $A \rightarrow B$ (A functionally
determines B, or B is functionally dependent on A) if it is the intent of the designer that, for
any set of rows that might exist in the table, two rows in T cannot agree on A and disagree on
B. More formally, given two rows $r_1$ and $r_2$ in T, if $r_1(A) = r_2(A)$ then $r_1(B) = r_2(B)$.
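The "cannot agree on A and disagree on B" condition can be tested mechanically against a given set of rows. The sketch below is a minimal illustration with hypothetical names (`holds`, the sample rows). Note that a single table instance can only refute a functional dependency; it cannot prove the designer's intent.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but disagree on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # setdefault stores the first rhs value seen for this lhs value;
        # any later row with the same lhs must carry the same rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample data.
students = [
    {"roll": 1, "name": "Asha", "city": "Mysore"},
    {"roll": 2, "name": "Ravi", "city": "Chennai"},
    {"roll": 3, "name": "Asha", "city": "Chennai"},
]
print(holds(students, ["roll"], ["name"]))  # True
print(holds(students, ["name"], ["city"]))  # False
```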
Inference Rules for Functional Dependencies
The following six rules IR1 through IR6 are well-known inference rules for functional
dependencies.
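Algorithm: determining $X^+$, the closure of X under F
$X^+ := X$;
repeat
    $oldX^+ := X^+$;
    for each functional dependency $Y \rightarrow Z$ in F do
        if $Y \subseteq X^+$ then $X^+ := X^+ \cup Z$;
until ($X^+ = oldX^+$)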
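The closure of an attribute set X under a set F of functional dependencies is the set of all attributes that X functionally determines; it can be computed by a repeat-until loop that stops when nothing new is added. The sketch below is one possible rendering, assuming single-letter attribute names so that a string such as "CD" stands for the set {C, D}; the function name and the sample dependencies are hypothetical.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", fds)))   # ['A', 'B', 'C']
print(sorted(closure("AD", fds)))  # ['A', 'B', 'C', 'D', 'E']
```

An attribute set whose closure contains every attribute of the relation is a super key, so the same routine can be used to test candidate keys.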
Essence of Normalization
When a relation has more than one theme, break it into two (or more) relations, each having a
single theme.
Functional dependency and the process of normalization
Three normal forms were initially proposed, which are called first (1NF), second (2NF), and
third (3NF) normal form. Subsequently, a stronger definition of third normal form was
introduced and is referred to as Boyce-Codd normal form (BCNF). All these normal forms are
based on the functional dependencies among the attributes of a relation.
Steps in Normalization
Unnormalized relation (non-NF)
    -> Remove repeating groups
Normalized relation (1NF)
    -> Remove partial dependencies
Second normal form (2NF)
    -> Remove transitive dependencies
Third normal form (3NF)
    -> Remove overlapping candidate keys
Boyce-Codd normal form (BCNF)
    -> Remove multi-valued dependencies
Fourth normal form (4NF)
    -> Remove non-implied join dependencies
Fifth normal form (5NF)
Join dependencies
A join dependency (JD), denoted by $JD(R_1, R_2, \ldots, R_n)$, specified on relation schema R, specifies a
constraint on the states r of R. The constraint states that every legal state r of R should have a
non-additive (lossless) join decomposition into $R_1, R_2, \ldots, R_n$; that is, for every such r we have
$*(\pi_{R_1}(r), \pi_{R_2}(r), \ldots, \pi_{R_n}(r)) = r$.
A join dependency $JD(R_1, R_2, \ldots, R_n)$, specified on relation schema R, is a trivial JD if one of the
relation schemas $R_i$ in $JD(R_1, R_2, \ldots, R_n)$ is equal to R.
Fifth Normal Form
A relation schema R is in 5NF with respect to a set F of functional, multivalued, and join
dependencies if, for every non-trivial join dependency $JD(R_1, R_2, \ldots, R_n)$ in $F^+$ (that is, implied by F),
every $R_i$ is a super key of R.
                3NF     BCNF    4NF
                Yes     Yes     Yes
                No      No      Yes
Preserve FDs    Yes     Maybe   Maybe
Relational Operators
Relational operators are a set of operators that allow us to manipulate the tables of the
database. Relational operators are said to satisfy the closure property, since they operate on
relations to produce new relations. When a relational operator is used to manipulate a relation,
we say that the operator is applied to the relation.
The Selection Operator
When applied to a relation r, this operator produces another relation whose rows are a subset of
the rows of r that satisfy a specified condition. The resulting relation and r have the same
attributes.
Definition: Let r be a relation, A an attribute of r, and a an element of Domain(A). The
Selection of r on attribute A is the set of tuples t of r such that t(A) = a. The Selection of r on A is
denoted $\sigma_{A=a}(r)$. The selection operator is a unary operator; that is, it operates on one relation
at a time.
The Projection Operator
The projection operator is also a unary operator. The selection operator chooses a subset of the
rows of the relation, whereas the projection operator chooses a subset of the columns.
Definition: The projection of relation r onto a set X of its attributes, denoted by $\pi_X(r)$, is a new
relation that we can obtain by first eliminating the columns of r not indicated in X and then
removing any duplicate tuples. The columns of the projection relation are the attributes of X.
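Selection and projection are easy to mimic on a relation held as a list of dictionaries, which makes the "rows versus columns" distinction concrete. The following is only a sketch; the function names and the sample relation are hypothetical.

```python
def select(r, attr, value):
    """Selection: keep the tuples t of r with t[attr] == value."""
    return [t for t in r if t[attr] == value]


def project(r, attrs):
    """Projection: keep only the columns in attrs, then drop duplicate tuples."""
    seen, out = set(), []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # duplicate elimination
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Hypothetical sample relation.
employee = [
    {"name": "Asha", "dept": "CS", "age": 31},
    {"name": "Ravi", "dept": "CS", "age": 28},
    {"name": "Meena", "dept": "EE", "age": 35},
]
print(select(employee, "dept", "CS"))  # the two CS tuples, all columns kept
print(project(employee, ["dept"]))     # [{'dept': 'CS'}, {'dept': 'EE'}]
```

Both functions return a new relation with the same representation, which reflects the closure property mentioned above.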
The Equijoin Operator
The equijoin operator is a binary operator for combining two relations on all their common
attributes. That is, the join consists of all the tuples resulting from concatenating the tuples of
the first relation with the tuples of the second relation that have identical values for the common
attributes X.
Definition: Let r be a relation with a set of attributes R and let s be another relation with a set of
attributes S. In addition, let us assume that R and S have some common attributes, and let X be
that set of common attributes. That is, $R \cap S = X$.
The join of r and s, denoted by r Join s, is a new relation whose attributes are the elements of
$R \cup S$. In addition, for every tuple t of the r Join s relation, the following three conditions need to be
satisfied simultaneously: (1) $t(R) = t_r$ for some tuple $t_r$ of the relation r, (2) $t(S) = t_s$ for some
tuple $t_s$ of the relation s, and (3) $t_r(X) = t_s(X)$.
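The three join conditions translate directly into a nested loop: take a tuple from r, a tuple from s, and keep their concatenation only if they agree on the common attributes X. The sketch below assumes relations stored as lists of dictionaries; the function name and sample data are hypothetical.

```python
def join(r, s):
    """Join r and s on all attributes they have in common."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]  # X = attributes shared by R and S
    out = []
    for tr in r:
        for ts in s:
            # Condition (3): both tuples carry the same values on X.
            if all(tr[a] == ts[a] for a in common):
                out.append({**tr, **ts})  # attributes of R union S
    return out


# Hypothetical sample relations.
loan = [{"loanNumber": "L-170", "amount": 3000},
        {"loanNumber": "L-260", "amount": 1700}]
borrower = [{"customer": "Jones", "loanNumber": "L-170"},
            {"customer": "Hayes", "loanNumber": "L-155"}]
print(join(loan, borrower))
# [{'loanNumber': 'L-170', 'amount': 3000, 'customer': 'Jones'}]
```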
Set Operators on Relations
Union: The result of this operation, denoted by $R \cup S$, is a relation that includes all tuples that are
either in R or in S or in both R and S. Duplicate tuples are eliminated.
Intersection: The result of this operation, denoted by $R \cap S$, is a relation that includes all tuples
that are in both R and S.
Set difference (or MINUS): The result of this operation, denoted by $R - S$, is a relation that includes all tuples that are in R
but not in S.
Cartesian Product (or Cross Product): This operation is used to combine tuples from two relations in a combinatorial fashion. In
general, the result of $R(A_1, A_2, \ldots, A_n) \times S(B_1, B_2, \ldots, B_m)$ is a relation Q with degree n + m
attributes $Q(A_1, A_2, \ldots, A_n, B_1, B_2, \ldots, B_m)$, in that order. If R has $n_R$ tuples and S has $n_S$ tuples,
then $R \times S$ has $n_R \cdot n_S$ tuples.
Natural Joins
The simplest sort of match is the natural join of two relations R and S, denoted $R \bowtie S$, in which
we match only those pairs of tuples from R and S that agree in whatever attributes are common to the
schemas of R and S.
A Complete Set of Relational Algebra Operations: The set of relational algebra operations $\{\sigma, \pi, \cup, -, \times\}$ is a complete set; that is, any of the other
original relational algebra operations can be expressed as a sequence of operations from this set.
Tuple Relational Calculus
The tuple relational calculus is based on specifying a number of tuple variables. Each tuple
variable usually ranges over a particular database relation, meaning that the variable may take as
its value any individual tuple from that relation. A simple tuple relational calculus query is of
the form
{t | COND(t)}
where t is a tuple variable and COND(t) or FORMULA is a conditional expression involving t. The
result of such a query is the set of all tuples t that satisfy COND(t).
For example, to find all employees whose age is above 30, we can write the following tuple
calculus expression:
{t | EMPLOYEE(t) and t.age > 30}
The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is EMPLOYEE.
Each EMPLOYEE tuple t that satisfies the condition t.age > 30 will be retrieved.
A (well-formed) condition or formula is made out of one or more atoms, where an atom has one of
the following forms:
We recursively build up condition or formulae from atoms using the following rules:
An atom is a formula;
If F1 and F2 are formulae, then so are their conjunction $F_1 \wedge F_2$, their disjunction $F_1 \vee F_2$, and the
negation $\neg F_1$;
If F is a formula with free variable X, then $(\exists X)(F)$ and $(\forall X)(F)$ are also formulae.
For example:
F1 : d.DNAME = 'Research'
F2 : $(\exists t)$(d.DNUMBER = t.DNO)
F3 : $(\forall d)$(d.MGRSSN = '333445555')
The tuple variable d is free in F1, whereas it is bound to the universal quantifier ($\forall$) in F3.
Variable t is bound to the ($\exists$) quantifier in F2.
{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, xn+2, ..., xn+m)}
where x1, x2, ..., xn, xn+1, xn+2, ..., xn+m are domain variables that range over domains (of attributes)
and COND is a condition or formula of the domain relational calculus.
A formula is made up of atoms. The atoms of a formula are slightly different from those for the
tuple calculus and can be one of the following:
1. An atom of the form R(x1, x2, ..., xj), where R is the name of a relation of degree j and
each xi, $1 \le i \le j$, is a domain variable. This atom states that a list of values <x1, x2, ..., xj>
must be a tuple in the relation whose name is R, where xi is the value of the ith
attribute of the tuple. To make a domain calculus expression more concise, we
drop the commas in a list of variables; thus we write
{x1, x2, ..., xn | R(x1 x2 x3) and ...}
instead of:
{x1, x2, ..., xn | R(x1, x2, x3) and ...}
2. An atom of the form xi op xj, where op is one of the comparison operators in the set
$\{=, <, \le, >, \ge, \ne\}$ and xi and xj are domain variables.
3. An atom of the form xi op c, where op is one of the comparison operators in the set
$\{=, <, \le, >, \ge, \ne\}$, xi is a domain variable, and c is a constant value.
The INSERT statement allows the user to insert a single record or multiple records into a table.
INSERT INTO table-name
(Column-1, Column-2, ... Column-n)
VALUES (Value-1, Value-2, ... Value-n);
The DELETE statement is used to delete rows in a table.
DELETE FROM table-name
WHERE some-column = some-value;
The UPDATE statement is used to update existing records in a table.
UPDATE table-name
SET Column-1 = Value-1, Column-2 = Value-2, ...
WHERE some-column = some-value;
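To see the three statements run end to end, the sketch below executes them against an in-memory SQLite database through Python's standard `sqlite3` module. The table, column names, and rows are hypothetical, and SQLite is used only as a convenient engine for illustration.

```python
import sqlite3

dml_db = sqlite3.connect(":memory:")
dml_db.execute(
    "CREATE TABLE student (stuid INTEGER PRIMARY KEY, stuname TEXT, credits INTEGER)"
)

# INSERT: add several records in one call.
dml_db.executemany(
    "INSERT INTO student (stuid, stuname, credits) VALUES (?, ?, ?)",
    [(1, "Asha", 30), (2, "Ravi", 45), (3, "Meena", 12)],
)

# UPDATE: change existing rows that match the WHERE condition.
dml_db.execute("UPDATE student SET credits = 50 WHERE stuid = 2")

# DELETE: remove rows that match the WHERE condition.
dml_db.execute("DELETE FROM student WHERE credits < 20")

dml_rows = dml_db.execute(
    "SELECT stuid, stuname, credits FROM student ORDER BY stuid"
).fetchall()
print(dml_rows)  # [(1, 'Asha', 30), (2, 'Ravi', 50)]
```

Omitting the WHERE clause from UPDATE or DELETE would affect every row in the table.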
DCL is Data Control Language statements. Some examples:
[WHERE condition]
[HAVING condition]
Simple SELECT
SELECT attributes (or calculations: +, -, /, *)
FROM relation
SELECT DISTINCT attributes
FROM relation
SELECT attributes (or * wild card)
FROM relation
WHERE condition
SELECT - WHERE condition
AND
OR
NOT
IN
NOT IN
BETWEEN
IS NULL
IS NOT NULL
SOME
ALL
NOT BETWEEN
LIKE % multiple characters
LIKE _ single character
Evaluation rule: left to right, brackets first, NOT before AND and OR, AND before OR
SELECT - aggregate functions
COUNT
SUM
AVG
MIN
MAX
SELECT - ORDER BY
ORDER BY
ORDER BY ... DESC
SELECT - JOIN Tables
Multiple tables in the FROM clause must have join conditions; without them the result is the
Cartesian product of the tables.
Compatible Operations
UNION (combines the results of two queries)
EXCEPT (returns any distinct rows from the left query that are not found in the right
query)
INTERSECT (returns only rows that appear in both result sets)
Union compatible operator [ALL] [CORRESPONDING] [BY column...] (ALL includes
duplicated rows in the result)
COLUMN Alias
SELECT prodid, prodname, (salesprice - goodofcost) profit
FROM product
ORDER BY prodid;
SELECT prodid, prodname, (salesprice - goodofcost) AS profit
FROM product
ORDER BY prodid;
EXISTS: EXISTS simply tests whether the inner query returns any row. If it does, then the outer
query proceeds.
NOT EXISTS: A NOT EXISTS subquery is used to display cases where a selected column does not
appear in another table.
SOME: Compares a scalar value with a single-column set of values.
ANY: Compares a scalar value with a single-column set of values; the condition is true if the
comparison holds for at least one value in the set (SOME and ANY are synonyms).
Find the stuid, stuname, major, and credits of the students whose credits are greater than any MIS
student's credits:
SELECT stuid, stuname, major, credits
FROM student
WHERE credits > ANY (SELECT credits FROM student WHERE major='mis');
ALL: The ALL keyword specifies that the search condition is true if the comparison is true for
every value that the subquery returns.
Grouping
Partition the set of tuples in a relation into groups based on certain criteria and compute
aggregate functions for each group.
All tuples that agree on a set of attributes (i.e., have the same value for each of these
attributes), called grouping attributes, are put into a group.
Example: Determine the maximum of the GATE CS marks obtained by students in each city, for
all cities.
SELECT City, MAX(Marks) AS Max_Marks FROM Gate_Marks
WHERE Branch = 'CS'
GROUP BY City;
Result:
City        Max Marks
Hyderabad   87
Chennai     84
Mysore      90
Bangalore   82
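The grouping query can be tried on a small in-memory SQLite table through Python's `sqlite3` module. The rows below are made-up sample data (they are not the data behind the result table above), and the column layout of the table is an assumption for the example.

```python
import sqlite3

marks_db = sqlite3.connect(":memory:")
marks_db.execute(
    "CREATE TABLE Gate_Marks (Student TEXT, City TEXT, Branch TEXT, Marks INTEGER)"
)
marks_db.executemany(
    "INSERT INTO Gate_Marks VALUES (?, ?, ?, ?)",
    [
        ("s1", "Mysore", "CS", 90),
        ("s2", "Mysore", "CS", 71),
        ("s3", "Chennai", "CS", 84),
        ("s4", "Chennai", "EC", 95),  # filtered out by WHERE before grouping
    ],
)

# WHERE filters rows first; GROUP BY then forms one group per city,
# and MAX is computed once per group.
max_by_city = marks_db.execute(
    "SELECT City, MAX(Marks) AS Max_Marks FROM Gate_Marks "
    "WHERE Branch = 'CS' GROUP BY City ORDER BY City"
).fetchall()
print(max_by_city)  # [('Chennai', 84), ('Mysore', 90)]
```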
Join operation
The join operation takes two relations and returns another relation as a result.
Join Types
1. Inner Join (default)
(r1 inner join r2 on <predicate>)
Use of just JOIN is equivalent to INNER JOIN.
Example: loan INNER JOIN borrower ON loan.loanNumber = borrower.loanNumber
2. Left Outer Join
(r1 left outer join r2 on <predicate>)
Example: loan LEFT OUTER JOIN borrower ON loan.loanNumber = borrower.loanNumber
Tuples from the left-hand-side relation that do not match any tuple in the right-hand-side
relation are padded with nulls and are added to the result of the left outer join.
3. Right Outer Join
(r1 right outer join r2 on <predicate>)
Example: loan RIGHT OUTER JOIN borrower ON loan.loanNumber = borrower.loanNumber
The right outer join is symmetric to the left outer join. Tuples from the right-hand-side
relation that do not match any tuples in the left-hand-side relation are padded with nulls and
are added to the result of the right outer join.
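The difference between an inner and an outer join shows up as soon as one relation has an unmatched tuple. The sketch below runs both on hypothetical loan and borrower rows in an in-memory SQLite database (via Python's `sqlite3`); only the left outer join is shown, since the right outer join is its mirror image.

```python
import sqlite3

join_db = sqlite3.connect(":memory:")
join_db.executescript("""
CREATE TABLE loan (loanNumber TEXT, amount INTEGER);
CREATE TABLE borrower (customerName TEXT, loanNumber TEXT);
INSERT INTO loan VALUES ('L-170', 3000), ('L-230', 4000), ('L-260', 1700);
INSERT INTO borrower VALUES ('Jones', 'L-170'), ('Smith', 'L-230'), ('Hayes', 'L-155');
""")

inner_rows = join_db.execute(
    "SELECT loan.loanNumber, customerName FROM loan "
    "INNER JOIN borrower ON loan.loanNumber = borrower.loanNumber "
    "ORDER BY loan.loanNumber"
).fetchall()

left_rows = join_db.execute(
    "SELECT loan.loanNumber, customerName FROM loan "
    "LEFT OUTER JOIN borrower ON loan.loanNumber = borrower.loanNumber "
    "ORDER BY loan.loanNumber"
).fetchall()

print(inner_rows)  # [('L-170', 'Jones'), ('L-230', 'Smith')]
print(left_rows)   # [('L-170', 'Jones'), ('L-230', 'Smith'), ('L-260', None)]
```

Loan L-260 has no borrower, so it appears only in the left outer join, padded with a null (`None` in Python).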
Transactions
A transaction T is a logical unit of database processing that includes one or more database access
operations
ACID properties
Atomicity:
Either all operations of the transaction are reflected properly in the database, or none. This
property is maintained by transaction management component of DBMS.
Consistency:
Execution of a transaction in isolation (that is, with no other transaction executing concurrently)
preserves the consistency of the database. This is typically the responsibility of the application
programmer who codes the transactions.
Isolation:
When multiple transactions execute concurrently, it should be the case that, for every pair of
transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started or Tj
started execution after Ti finished. Thus, each transaction is unaware of other transactions
executing concurrently with it. The user view of a transaction system requires the isolation
property and the property that concurrent schedules take the system from one consistent state
to another. It is enforced by the concurrency control component of the DBMS.
Durability:
After a transaction completes successfully, the changes it has made to the database persist, even
if there are system failures. It is enforced by recovery management component of the DBMS.
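Atomicity in particular is easy to observe in practice. The sketch below, using Python's `sqlite3` module with a hypothetical `account` table, attempts a funds transfer as one transaction: if the second update violates a constraint, the first update is rolled back as well. The table, names, and amounts are assumptions for illustration.

```python
import sqlite3

bank = sqlite3.connect(":memory:")
bank.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
bank.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
bank.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # CHECK constraint failed: overdraft
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(bank, "A", "B", 30))   # True
print(transfer(bank, "A", "B", 500))  # False, the credit to B is undone too
print(balances(bank))                 # {'A': 70, 'B': 80}
```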
Transaction States (fig. 5.5.1)
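Fig. 5.5.1: Transaction state diagram. BEGIN TRANSACTION enters the Active state, where READ and WRITE operations are performed; END TRANSACTION leads to Partially committed; COMMIT leads to Committed; ABORT from Active or from Partially committed leads to Failed; Committed and Failed both lead to Terminated.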
Schedule
When transactions are executing concurrently in an interleaved fashion, then the order of
execution of operations from the various transactions is known as Schedule.
Types of schedules: serial, non-serial, recoverable, and non-recoverable schedules.
Serial schedule: A schedule where the operations of each transaction are executed consecutively
without any interleaved operations from other transactions.
Non-serial schedule: A schedule where the operations from a set of concurrent transactions are
interleaved.
Recoverable schedule: A schedule in which no transaction T commits until every transaction that
wrote a data item read by T has committed, so a committed transaction never has to be rolled back.
Non-recoverable schedule: A schedule that is not recoverable.
Serializability
A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the
same n transactions, i.e. if an interleaved schedule produces the same result as a serial
schedule, then the interleaved schedule is said to be serializable.
For two schedules to be equivalent, the operations applied to each data item affected by the
schedules should be applied to that item in both schedules in the same order. There are two
types of equivalence, conflict equivalence and view equivalence, and they lead to
(a) Conflict Serializability
(b) View Serializability.
Conflict Serializability
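One standard way to test conflict serializability, offered here as a sketch rather than as the method this text prescribes, is the precedence graph: draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj (same data item, different transactions, at least one of the two a write), and check that the graph has no cycle. The schedule encoding and function name below are hypothetical.

```python
def conflict_serializable(schedule):
    """schedule: list of (transaction, op, item) with op 'r' or 'w', in time order."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove transactions with no incoming edge from the remaining
    # ones; if none can be removed while nodes remain, there is a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False
        nodes -= free
    return True


serial_like = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A")]
interleaved = [("T1", "r", "A"), ("T2", "w", "A"), ("T1", "w", "A")]
print(conflict_serializable(serial_like))  # True
print(conflict_serializable(interleaved))  # False (T1 -> T2 and T2 -> T1)
```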
View Serializability
Consider two schedules S and S1, they are said to be view equivalent if the following conditions
are met.
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then
transaction Ti must also read the initial value of Q in schedule S1.
2. For each data item Q, if transaction Ti executes read(Q) in schedule S and that value was
produced by transaction Tj (if any), then transaction Ti must, in schedule S1, also read the
value of Q that was produced by transaction Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation
in schedule S must perform the final write(Q) operation in schedule S1.
Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and
therefore performs the same computation. Condition 3 ensures that both schedules result in the
same final system state. The concept of view equivalence leads to the concept of view
serializability.
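The three conditions can be checked mechanically. The following is a minimal sketch, assuming a schedule is written as a list of (transaction, operation, item) triples with operation "R" or "W"; the function names and this encoding are illustrative choices, not notation from the text.

```python
def view_profile(schedule):
    """Return (reads_from, final_writer) for a schedule.

    reads_from[T] lists, in order, (item, writer) for every read by T, where
    writer is None when T reads the initial value (conditions 1 and 2).
    final_writer[item] is the transaction doing the last write (condition 3).
    """
    last_writer = {}
    reads_from = {}
    for txn, op, item in schedule:
        if op == "R":
            reads_from.setdefault(txn, []).append((item, last_writer.get(item)))
        elif op == "W":
            last_writer[item] = txn
    return reads_from, last_writer


def view_equivalent(s, s1):
    # Both schedules must contain the same operations and the same profile.
    return sorted(s) == sorted(s1) and view_profile(s) == view_profile(s1)


# Illustrative schedule: T1 reads Q, then T2, T1 and T3 write Q.
S = [("T1", "R", "Q"), ("T2", "W", "Q"), ("T1", "W", "Q"), ("T3", "W", "Q")]
serial_123 = [("T1", "R", "Q"), ("T1", "W", "Q"), ("T2", "W", "Q"), ("T3", "W", "Q")]
serial_213 = [("T2", "W", "Q"), ("T1", "R", "Q"), ("T1", "W", "Q"), ("T3", "W", "Q")]

print(view_equivalent(S, serial_123))  # S matches the serial order T1, T2, T3
print(view_equivalent(S, serial_213))  # here T1 would read Q from T2 instead
```

In this example S is view equivalent to the serial schedule T1, T2, T3, so S is view serializable.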
Concurrency control Protocols
Two-phase-locking protocol
Basic 2PL: A transaction is said to follow the two-phase locking protocol if all its locking
operations precede its first unlock operation.
Two phases:
Expanding or growing: New locks on items can be acquired but none can be released.
Shrinking: Existing locks can be released but no new locks can be acquired.
Conservative (static) 2PL: The transaction locks all the items it needs BEFORE execution begins,
by predeclaring its read set and write set.
If any item in the read or write set is already locked (by another transaction), the
transaction waits and does not acquire any locks.
Deadlock free but not very realistic.
Strict 2PL: The transaction does not release its write locks until AFTER it commits or aborts.
Not deadlock free, but guarantees recoverable schedules.
Most popular variation of 2PL.
Guarantees strict schedules (strict schedule: a transaction can neither read nor write X until
the last transaction that wrote X has committed or aborted).
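The basic two-phase condition (no lock request after the first unlock) is easy to test on the lock trace of a single transaction. A minimal sketch, assuming the trace is a list of ("lock" or "unlock", item) pairs; this encoding and the function name are hypothetical.

```python
def is_two_phase(trace):
    """True if every lock operation precedes the first unlock operation."""
    shrinking = False  # becomes True once the first unlock is seen
    for action, _item in trace:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False  # a lock was requested in the shrinking phase
    return True


growing_then_shrinking = [("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]
lock_after_unlock = [("lock", "A"), ("unlock", "A"), ("lock", "B"), ("unlock", "B")]

print(is_two_phase(growing_then_shrinking))
print(is_two_phase(lock_after_unlock))
```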
A schedule with a set of transactions that uses the tree locking protocol can be shown to be
serializable. The transactions need not be two phase.
Advantage of the tree locking protocol:
a. Compared to the two-phase locking protocol, data items can be unlocked earlier. This
leads to shorter waiting times and an increase in concurrency.
Disadvantage of the tree locking protocol:
a. A transaction may have to lock data items that it does not access, because to access a
descendant it has to lock the parent as well. So the number of locks and the associated
locking overhead is high.
Note on deadlock: One problem that is not solved by two-phase locking is the potential for
deadlocks, where several transactions are forced by the scheduler to wait forever for a lock
held by another transaction. The tree locking protocol, by contrast, is free from deadlock.
Timestamp based protocols
The use of locks, combined with the two phase locking protocol, guarantees serializability of
schedules. The order of transactions in the equivalent serial schedule is based on the order in
which executing transactions lock the items they require. If a transaction needs an item that is
already locked, it may be forced to wait until the item is released. A different approach that
guarantees serializability involves using transaction timestamps to order transaction execution
for an equivalent serial schedule.
Time Stamps
A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamp
values are assigned in the order in which the transactions are submitted to the system, so a
timestamp can be thought of as the transaction start time. Each transaction Ti in the system is
assigned a unique timestamp, denoted by TS(Ti). If a new transaction Tj enters the system after
Ti, then TS(Ti) < TS(Tj); this is known as the timestamp ordering scheme. To implement this
scheme, each data item (Q) is associated with two timestamp values.
Multiple Granularity:
Allow data items to be of various sizes and define a hierarchy of data granularities, where
the small granularities are nested within larger ones.
The hierarchy can be represented graphically as a tree.
When a transaction locks a node in the tree explicitly, it implicitly locks all of that node's
descendants in the same lock mode.
Granularity of locking (level in tree where locking is done):
Fine granularity (lower in tree): High concurrency, high locking overhead.
Coarse granularity (higher in tree): Low locking overhead, low concurrency.
Example of granularity hierarchy:
[Figure: tree with the database DB at the root, areas A1 and A2 below it, and files Fa, Fb and Fc below the areas.]
The highest level in the example hierarchy is the entire database. The levels below are of
type area, file and record, in that order.
Intention Lock Modes:
In addition to shared (S) and exclusive (X) lock modes, there are three additional lock modes
with multiple granularity.
Intention shared (IS) indicates explicit locking at a lower level of the tree, but only
with shared locks.
Intention exclusive (IX) indicates explicit locking at a lower level with exclusive or
shared locks.
Shared and intention exclusive (SIX): the subtree rooted at that node is locked
explicitly in shared mode, and explicit locking is being done at a lower level with
exclusive mode locks.
Compatibility matrix with intention lock modes (Yes = the two modes are compatible)
       IS    IX    S     SIX   X
IS     Yes   Yes   Yes   Yes   No
IX     Yes   Yes   No    No    No
S      Yes   No    Yes   No    No
SIX    Yes   No    No    No    No
X      No    No    No    No    No
B-Trees and B+-Trees
B-trees and B+-trees are special cases of the well known tree data structure. We introduce very
briefly the terminology used in discussing tree data structures. A tree is formed of nodes. Each
node in the tree, except for a special node called the root, has one parent node and zero
or more child nodes. The root node has no parent. A node that does not have any child nodes is
called a leaf node; a nonleaf node is called an internal node. The level of a node is always one
more than the level of its parent, with the level of the root node being zero. A subtree of a node
consists of that node and all its descendant nodes: its child nodes, the child nodes of its child
nodes, and so on. A precise recursive definition of a subtree is that it consists of a node n and the
subtrees of all the child nodes of n. In Figure 5.6.1 the root node is A, and its child nodes are B, C,
and D. Nodes E, J, C, G, H, and K are leaf nodes.
Usually, we display a tree with the root node at the top, as shown in Figure 5.6.1. One way to
implement a tree is to have as many pointers in each node as there are child nodes.
[Figure 5.6.1: a tree data structure, with the root node at level 0 and nodes at levels 1, 2 and 3.]
Figure 5.6.2 A node in a search tree with pointers to subtrees below it.
2. For all values X in the subtree pointed at by $P_i$, we have $K_{i-1} < X < K_i$ for $1 < i < q$;
$X < K_i$ for $i = 1$; and $K_{i-1} < X$ for $i = q$ (see Figure 5.6.2).
Whenever we search for a value X, we follow the appropriate pointer $P_i$ according to the
formulas in condition 2 above.
We can use a search tree as a mechanism to search for records stored in a disk file. The values in
the tree can be the values of one of the fields of the file, called the search field (which is the same
as the index field if a multilevel index guides the search). Each key value in the tree is associated
with a pointer to the record in the data file having that value. Alternatively, the pointer could be
to the disk block containing that record. The search tree itself can be stored on disk by assigning
each tree node to a disk block. When a new record is inserted, we must update the search tree by
inserting an entry in the tree containing the search field value of the new record and a pointer to
the new record.
Algorithms are necessary for inserting and deleting search values into and from the search tree
while maintaining the preceding two constraints. In general, these algorithms do not guarantee
that a search tree is balanced, meaning that all of its leaf nodes are at the same level. The tree in
figure 5.6.1 is not balanced because it has leaf nodes at levels 1, 2, and 3. Keeping a search tree
balanced is important because it guarantees that no nodes will be at very high levels and hence
require many block accesses during a tree search. Keeping the tree balanced yields a uniform
search speed regardless of the value of the search key. Another problem with search trees is that
record deletion may leave some nodes in the tree nearly empty, thus wasting storage space and
increasing the number of levels. The B-tree addresses both of these problems by specifying
additional constraints on the search tree.
B-Trees: The B-tree has additional constraints that ensure that the tree is always balanced and
that the space wasted by deletion, if any, never becomes excessive. The algorithms for insertion
and deletion, though, become more complex in order to maintain these constraints. Nonetheless,
most insertions and deletions are simple processes; they become complicated only under special
circumstances, namely, whenever we attempt an insertion into a node that is already full or a
deletion from a node that makes it less than half full. More formally, a B-tree of order p, when
used as an access structure on a key field to search for records in a data file, can be defined as
follows:
1. Each internal node in the B-tree (Figure 5.6.3a) is of the form
$\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$
where $q \le p$. Each $P_i$ is a tree pointer, that is, a pointer to another node in the B-tree. Each
$Pr_i$ is a data pointer, that is, a pointer to the record whose search key field value is equal to
$K_i$ (or to the data file block containing the record).
2. Within each node, $K_1 < K_2 < \ldots < K_{q-1}$.
3. For all search key field values X in the subtree pointed at by $P_i$ (the ith subtree, see Figure
5.6.3a), we have:
$K_{i-1} < X < K_i$ for $1 < i < q$; $X < K_i$ for $i = 1$; and $K_{i-1} < X$ for $i = q$.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least $\lceil p/2 \rceil$ tree pointers. The root
node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, $q \le p$, has $q - 1$ search key field values (and hence has $q - 1$
data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes
except that all of their tree pointers $P_i$ are null.
A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero). Once the root
node is full with $p - 1$ search key values and we attempt to insert another entry in the tree, the
root node splits into two nodes at level 1. Only the middle value is kept in the root node, and the
rest of the values are split evenly between the other two nodes. When a nonroot node is full and
a new entry is inserted into it, that node is split into two nodes at the same level, and the middle
entry is moved to the parent node along with two pointers to the new split nodes. If the parent
node is full, it is also split. Splitting can propagate all the way to the root node, creating a new
level if the root is split.
Figure 5.6.3 B-tree structures. (a) A node in a B-tree with $q - 1$ search values. (b) A B-tree
of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.
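The split-on-overflow insertion described above can be sketched in a few lines. This is an illustrative sketch only: the `Node` class and the function names are assumptions, data pointers are omitted, and a node is split right after it overflows (it briefly holds p keys), which yields the same tree as splitting a full node before inserting.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # empty for a leaf


def _insert(node, key, p):
    """Insert key below node; return (middle_key, new_right_node) if node was split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                      # leaf: place the key in order
        node.keys.insert(i, key)
    else:                                      # internal: descend into the ith subtree
        split = _insert(node.children[i], key, p)
        if split:
            middle, right = split
            node.keys.insert(i, middle)        # middle entry moves up to the parent
            node.children.insert(i + 1, right)
    if len(node.keys) <= p - 1:                # at most p - 1 search values per node
        return None
    m = len(node.keys) // 2                    # overflow: split around the middle value
    middle = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys, node.children = node.keys[:m], node.children[:m + 1]
    return middle, right


def insert(root, key, p):
    split = _insert(root, key, p)
    if split:                                  # root split: the tree grows by one level
        middle, right = split
        return Node([middle], [root, right])
    return root


def search(node, key):
    while node is not None:
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        node = node.children[i] if node.children else None
    return False


root = Node()
for value in [8, 5, 1, 7, 3, 12, 9, 6]:       # insertion order from Figure 5.6.3(b)
    root = insert(root, value, 3)
print(root.keys, [child.keys for child in root.children])
```

With order p = 3 and the insertion order given in the figure caption, the sketch ends with the root holding 5 and 8 and three leaves holding (1, 3), (6, 7) and (9, 12).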
If deletion of a value causes a node to be less than half full, it is combined with its neighboring
nodes, and this can also propagate all the way to the root. Hence, deletion can reduce the
number of tree levels. It has been shown by analysis and simulation that, after numerous
random insertions and deletions on a B-tree, the nodes are approximately 69 percent full when
the number of values in the tree stabilizes. This is also true of B+-trees. If this happens, node
splitting and combining will occur only rarely, so insertion and deletion become quite efficient. If
the number of values grows, the tree will expand without a problem, although splitting of
nodes may occur, so some insertions will take more time.
B-trees are sometimes used as primary file organizations. In this case, whole records are stored
within the B-tree nodes rather than just the <search key, record pointer> entries. This works
well for files with a relatively small number of records, and a small record size. Otherwise, the
fan-out and the number of levels become too great to permit efficient access.
In summary, B-trees provide a multilevel access structure that is a balanced tree structure in
which each node is at least half full. Each node in a B-tree of order p can have at most $p - 1$
search values.
B+-Trees
Most implementations of a dynamic multilevel index use a variation of the B-tree data structure
called a B+-tree. In a B-tree, every value of the search field appears once at some level in the
tree, along with a data pointer. In a B+-tree, data pointers are stored only at the leaf nodes of the
tree; hence, the structure of leaf nodes differs from the structure of internal nodes. The leaf
nodes have an entry for every value of the search field, along with a data pointer to the record
(or to the block that contains this record) if the search field is a key field. For a nonkey search
field, the pointer points to a block containing pointers to the data file records, creating an extra
level of indirection.
The leaf nodes of the B+-tree are usually linked together to provide ordered access on the search
field to the records. These leaf nodes are similar to the first (base) level of an index. Internal
nodes of the B+-tree correspond to the other levels of a multilevel index. Some search field
values from the leaf nodes are repeated in the internal nodes of the B+-tree to guide the search.
The structure of the internal nodes of a B+-tree of order p (Figure 5.6.4a) is as follows:
1. Each internal node is of the form
$\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$
where $q \le p$ and each $P_i$ is a tree pointer.
2. Within each internal node, $K_1 < K_2 < \ldots < K_{q-1}$.
3. For all search field values X in the subtree pointed at by $P_i$, we have $K_{i-1} < X \le K_i$ for
$1 < i < q$; $X \le K_i$ for $i = 1$; and $K_{i-1} < X$ for $i = q$ (see Figure 5.6.4a).
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least $\lceil p/2 \rceil$ tree pointers. The root node has
at least two tree pointers if it is an internal node.
6. An internal node with q pointers, $q \le p$, has $q - 1$ search field values.
The structure of the leaf nodes of a B+-tree of order p (Figure 5.6.4b) is as follows:
1. Each leaf node is of the form
$\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_{next} \rangle$
where each $Pr_i$ is a data pointer and $P_{next}$ points to the next leaf node of the B+-tree.
[Figure 5.6.4: the nodes of a B+-tree. (a) Internal node, holding search values and tree pointers. (b) Leaf node, holding search values with data pointers and a pointer to the next leaf node in the tree.]
The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas the
pointers in leaf nodes are data pointers to the data file records or blocks, except for the
$P_{next}$ pointer, which is a tree pointer to the next leaf node. By starting at the leftmost leaf
node, it is possible to traverse leaf nodes as a linked list, using the $P_{next}$ pointers. This
provides ordered access to the data records on the indexing field. A $P_{previous}$ pointer can also
be included. For a B+-tree on a nonkey field, an extra level of indirection is needed, so the
$Pr$ pointers are block pointers to blocks that contain a set of record pointers to the actual
records in the data file.
Because entries in the internal nodes of a B+-tree include search values and tree pointers
without any data pointers, more entries can be packed into an internal node of a B+-tree than for
a similar B-tree. Thus, for the same block (node) size, the order p will be larger for the B+-tree
than for the B-tree. This can lead to fewer B+-tree levels, improving search time. Because the
structures for internal and for leaf nodes of a B+-tree are different, the order p can be different.
We will use p to denote the order for internal nodes and $p_{leaf}$ to denote the order for leaf
nodes, which we define as being the maximum number of data pointers in a leaf node.
As with the B-tree, we may need additional information in each node to implement the insertion
and deletion algorithms. This information can include the type of node (internal or
leaf), the number of current entries q in the node, and pointers to the parent and sibling nodes.
Hence, before we do the above calculations for p and $p_{leaf}$, we should reduce the block size by
the amount of space needed for all such information.
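As a rough illustration of why the B+-tree order comes out larger, the sketch below computes the largest orders that fit in one block. It assumes the usual capacity constraints that follow from the node layouts given earlier: a B-tree node stores p tree pointers plus p - 1 (key, data pointer) pairs, a B+-tree internal node stores p tree pointers plus p - 1 keys, and a B+-tree leaf stores the (key, data pointer) pairs plus one next-leaf pointer. The byte sizes are hypothetical sample values, and the function name is an assumption.

```python
def max_orders(block, key, tree_ptr, data_ptr):
    """Largest orders that fit in one block (all sizes in bytes)."""
    # B-tree node: p*tree_ptr + (p - 1)*(key + data_ptr) <= block
    b_tree = (block + key + data_ptr) // (tree_ptr + key + data_ptr)
    # B+-tree internal node: p*tree_ptr + (p - 1)*key <= block
    bplus_internal = (block + key) // (tree_ptr + key)
    # B+-tree leaf node: p_leaf*(key + data_ptr) + tree_ptr <= block
    bplus_leaf = (block - tree_ptr) // (key + data_ptr)
    return b_tree, bplus_internal, bplus_leaf


# Hypothetical sizes: 512-byte block, 9-byte key, 6-byte tree pointer, 7-byte data pointer.
print(max_orders(block=512, key=9, tree_ptr=6, data_ptr=7))
```

With these sample sizes an internal B+-tree node holds more pointers than a B-tree node of the same block size. Any per-node bookkeeping space would be subtracted from `block` before calling the function.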
Theory of Computation
Acceptance by an Automaton:
A string x is said to be accepted by a finite automaton $M = (Q, \Sigma, \delta, q_0, F)$ if $\delta(q_0, x) = p$ for
some p in F. The language accepted by M, designated L(M), is the set $\{x \mid \delta(q_0, x) \text{ is in } F\}$.
A language is a regular set (or just regular) if it is the set accepted by some finite automaton.
There are two preferred notations for describing Automata
1. Transition diagram
2. Transition table
1. Give a DFA accepting the set of all strings containing 111 as a substring.
Transition diagram:
[Figure: states q0 (start), q1, q2 and q3 (accepting). Each 1 moves one state forward from q0 towards q3; a 0 in q0, q1 or q2 leads back to q0; q3 loops on both 0 and 1.]
Transition table:
          0     1
-> q0     q0    q1
   q1     q0    q2
   q2     q0    q3
  *q3     q3    q3
Extending the transition function from a single symbol to a string: To describe the behavior of a
finite automaton on a string, we must extend the transition function $\delta$ to apply to a state
and a string rather than a state and a symbol. We define a function $\hat{\delta}$ from
$Q \times \Sigma^*$ to Q as follows:
1. $\hat{\delta}(q, \epsilon) = q$, and
2. for all strings w and input symbols a, $\hat{\delta}(q, wa) = \delta(\hat{\delta}(q, w), a)$.
In any DFA, for a given input string and state the transition path will always be unique.
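The extended transition function can be computed by applying the single-symbol transition function once per input symbol. A minimal sketch, using the transition table of the DFA for strings containing 111; the dictionary encoding and the function names are illustrative assumptions.

```python
# Transition function of the DFA accepting strings that contain 111.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q2",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q3", ("q3", "1"): "q3",
}
START, FINAL = "q0", {"q3"}


def delta_hat(q, w):
    """Extended transition function: the state reached from q after reading w."""
    for a in w:            # delta_hat(q, wa) = delta(delta_hat(q, w), a)
        q = DELTA[(q, a)]
    return q               # for the empty string the state is unchanged


def accepts(w):
    return delta_hat(START, w) in FINAL


for w in ["0111", "11101", "110110", ""]:
    print(repr(w), accepts(w))
```

Because each (state, symbol) pair maps to exactly one state, the loop follows a single path, which is the uniqueness property stated above.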
(C) Then $\hat{\delta}(q, w) = \bigcup_{j=1}^{m} \epsilon\text{-closure}(r_j)$. This additional closure step includes
all the paths from q labeled w, by considering the possibility that there are additional
$\epsilon$-labeled arcs that we can follow after making a transition on the final real symbol a.
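Eliminating $\epsilon$-Transitions (Construction of a DFA from an $\epsilon$-NFA):
Let $E = (Q_E, \Sigma, \delta_E, q_0, F_E)$ be the given $\epsilon$-NFA. Then the equivalent DFA
$D = (Q_D, \Sigma, \delta_D, q_D, F_D)$ is defined as follows:
1. $q_D = \epsilon\text{-closure}(q_0)$.
2. $Q_D$ is the set of all subsets of $Q_E$. More precisely, we shall find that the only accessible
states of D are the $\epsilon$-closed subsets of $Q_E$, that is, those sets $S \subseteq Q_E$ such that
$S = \epsilon\text{-closure}(S)$.
3. $F_D$ is those sets of states that contain at least one accepting state of E, i.e.,
$F_D = \{S \mid S \text{ is in } Q_D \text{ and } S \cap F_E \ne \emptyset\}$.
4. $\delta_D(S, a)$ is computed, for all a in $\Sigma$ and sets S in $Q_D$, by:
(A) Let $S = \{p_1, p_2, \ldots, p_k\}$.
(B) Compute $\bigcup_{i=1}^{k} \delta_E(p_i, a)$; let this set be $\{r_1, r_2, \ldots, r_m\}$.
(C) Then $\delta_D(S, a) = \bigcup_{j=1}^{m} \epsilon\text{-closure}(r_j)$.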
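The construction above can be sketched as follows. The sketch assumes the $\epsilon$-NFA transition function is a dictionary from (state, symbol) to a set of states, with the empty string "" standing for $\epsilon$; the encoding, the function names and the example automaton (for a*b*) are illustrative assumptions. Only the accessible subset states are generated.

```python
def eclose(states, delta):
    """Epsilon-closure of a set of states."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def to_dfa(alphabet, delta, q0, finals):
    start = eclose({q0}, delta)                  # step 1: closure of the start state
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = set()
            for p in S:                          # step 4(B): union of delta_E(p, a)
                moved |= set(delta.get((p, a), ()))
            T = eclose(moved, delta)             # step 4(C): close the result
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_finals = {S for S in seen if S & set(finals)}   # step 3
    return start, dfa_delta, dfa_finals


def dfa_accepts(dfa, w):
    state, dfa_delta, dfa_finals = dfa
    for a in w:
        state = dfa_delta[(state, a)]
    return state in dfa_finals


# Example epsilon-NFA for a*b*: q0 loops on a, q0 --epsilon--> q1, q1 loops on b.
nfa_delta = {("q0", "a"): {"q0"}, ("q0", ""): {"q1"}, ("q1", "b"): {"q1"}}
dfa = to_dfa("ab", nfa_delta, "q0", {"q1"})
for w in ["aabb", "", "ba"]:
    print(repr(w), dfa_accepts(dfa, w))
```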
3. $\epsilon^* = \epsilon$
4. $r^+ = r \cdot r^* = r^* \cdot r$, i.e. $r^+ = rr^* = r^*r$
5. $r^* = r^+ + \epsilon$
6. $r? = \epsilon + r$ (the unary postfix operator ? means zero or one instance)
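Converting a Regular Expression to an Automaton ($\epsilon$-NFA):
Basis: Automata for $\epsilon$, $\emptyset$ and a are (a), (b) and (c) respectively.
[Figure: (a) $r = \epsilon$; (b) $r = \emptyset$; (c) $r = a$, each drawn with start state q0 and final state qf.]
Induction:
Automata for r + s, rs and r* are (p), (q) and (s) respectively.
[Figure: (p) r + s, built from M1 (states q1 to f1) and M2 (states q2 to f2) placed in parallel between a new start state q0 and a new final state f0; (q) rs, built from M1 followed by M2; (s) r*, built from M1 between a new start state q0 and a new final state f0.]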
The states of the Moore machine are [q0,y], [q0,n], [p0,y], [p0,n], [p1,y] and [p1,n].
As q0 is the start state of the Mealy machine, either [q0,y] or [q0,n] may be chosen.
The start state of the Moore machine is taken to be [q0,n].
Regular Grammars
Definition of a Grammar:
A phrase-structure grammar (or simply a grammar) is (V, T, P, S), where
i. V is a finite nonempty set, whose elements are called variables.
ii. T is a finite nonempty set, whose elements are called terminals.
iii. $V \cap T = \emptyset$,
iv. S is a special variable (i.e. an element of V) called the start symbol, and
v. P is a finite set whose elements are $\alpha \to \beta$, where $\alpha$ and $\beta$ are strings on $V \cup T$ and
$\alpha$ has at least one symbol from V. Elements of P are called productions or production
rules or rewriting rules.
Right-Linear Grammar:
A grammar G = (V, T, P, S) is said to be right-linear if all productions are of the form
$A \to xB$ or
$A \to x$,
where $A, B \in V$ and $x \in T^*$.
Left-Linear Grammar:
A grammar G = (V, T, P, S) is said to be left-linear if all productions are of the form
$A \to Bx$ or $A \to x$.
A grammar that is either right-linear or left-linear is a regular grammar.
Example:
The grammar G1 = ({S}, {a, b}, P1, S), with P1 given as $S \to abS \mid a$, is a right-linear grammar.
The grammar G2 = ({S, S1, S2}, {a, b}, P2, S) with productions
$S \to S_1 ab$,
$S_1 \to S_1 ab \mid S_2$,
$S_2 \to a$,
is a left-linear grammar.
Both G1 and G2 are regular grammars.
A language L is regular if and only if there exists a left-linear grammar G such that L = L(G).
A language L is regular if and only if there exists a right-linear grammar G such that L = L(G).
Construction of an $\epsilon$-NFA from a right-linear grammar:
Let G = (V, T, P, S) be a right-linear grammar. We construct an NFA with $\epsilon$-moves,
$M = (Q, T, \delta, [S], \{[\epsilon]\})$, that simulates derivations in G.
Q consists of the symbols $[\alpha]$ such that $\alpha$ is S or a (not necessarily proper) suffix of some right-hand side of a production in P.
We define $\delta$ by:
1. If A is a variable, then $\delta([A], \epsilon) = \{[\alpha] \mid A \to \alpha \text{ is a production}\}$.
2. If a is in T and $\alpha$ is in $T^* \cup T^*V$, then $\delta([a\alpha], a) = \{[\alpha]\}$.
3. Regular languages are closed under intersection. That is, if $L_1$ and $L_2$ are regular
languages then $L_1 \cup L_2$ and $L_1 \cap L_2$ are also regular languages.
4. Regular languages are closed under difference. That is, if L and M are regular languages,
then so is $L - M$.
5. Regular languages are closed under string reversal.
The reversal of a string $a_1 a_2 \ldots a_n$ is the string written backwards, that is
$a_n a_{n-1} \ldots a_1$. We use $w^R$ for the reversal of a string w. The reversal of a language L,
written $L^R$, is the language consisting of the reversals of all its strings.
Given a language L that is L(M) for some deterministic finite automaton M, we may
construct an automaton for $L^R$ as follows:
1. Reverse all the arcs in the transition diagram for M.
2. Make the start state of M the only accepting state of the new automaton.
3. Create a new start state $p_0$ with transitions on $\epsilon$ to all the accepting states of M. The
result is an automaton that simulates M in reverse and therefore accepts a string w if and
only if M accepts $w^R$.
6. Regular languages are closed under substitution.
Let $R \subseteq \Sigma^*$ be a regular set and, for each a in $\Sigma$, let $R_a \subseteq \Delta^*$ be a regular set.
Let $f: \Sigma \to \Delta^*$ be the substitution defined by $f(a) = R_a$.
Select regular expressions denoting R and each $R_a$.
Replace each occurrence of the symbol a in the regular expression for R by the regular
expression for $R_a$.
Example:
Let f(0) = a and f(1) = b*. That is, f(0) is the language {a} and f(1) is the language of all
strings of b's. Then f(010) is the regular set ab*a. If L is the language 0*(0+1)1*, then f(L)
is a*(a + b*)(b*)* = a*b*.
7. Regular languages are closed under homomorphism and inverse homomorphism.
Homomorphism: A homomorphism h is a substitution such that h(a) contains a single string
for each a. We generally take h(a) to be the string itself, rather than the set containing
that string.
Suppose $\Sigma$ and $\Gamma$ are alphabets; then a function $h: \Sigma \to \Gamma^*$ is called a homomorphism. The
domain of the function h is extended to strings in an obvious fashion: if
$w = a_1 a_2 \ldots a_n$ then $h(w) = h(a_1) h(a_2) \ldots h(a_n)$.
If L is a language on $\Sigma$, then its homomorphic image is defined as $h(L) = \{h(w) : w \in L\}$.
8. Inverse Homomorphism:
The inverse homomorphic image of a language L is
$h^{-1}(L) = \{x \mid h(x) \text{ is in } L\}$
and for a string w, $h^{-1}(w) = \{x \mid h(x) = w\}$.
9. Regular languages are closed under quotient with arbitrary sets.
Definition: The quotient of languages $L_1$ and $L_2$, written $L_1 / L_2$,
is $\{x \mid \text{there exists } y \text{ in } L_2 \text{ such that } xy \text{ is in } L_1\}$.
10. Regular languages are closed under the INIT operation.
Definition:
Let L be a language. Then INIT(L) = $\{x \mid \text{for some } y,\ xy \text{ is in } L\}$.
11. Regular languages are closed under Kleene closure, i.e., if L is a regular set then L* is also a
regular set.
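The homomorphism and inverse homomorphism defined in the closure properties above can be tried out on small finite languages. A minimal sketch: the mapping h(0) = ab, h(1) = $\epsilon$ is a made-up example, the function names are assumptions, and the inverse image is restricted to a finite pool of candidate strings because $h^{-1}(L)$ is infinite in general.

```python
from itertools import product


def apply_h(h, w):
    """h(a1 a2 ... an) = h(a1) h(a2) ... h(an)."""
    return "".join(h[a] for a in w)


def image(h, language):
    """Homomorphic image h(L) of a finite language."""
    return {apply_h(h, w) for w in language}


def inverse_image(h, language, candidates):
    """Strings x among the candidates with h(x) in L."""
    return {x for x in candidates if apply_h(h, x) in language}


h = {"0": "ab", "1": ""}                     # hypothetical homomorphism
short_strings = {"".join(t) for n in range(4) for t in product("01", repeat=n)}

print(apply_h(h, "010"))
print(image(h, {"0", "01", "00"}))
print(sorted(inverse_image(h, {"abab"}, short_strings)))
```

Here every string with exactly two 0s maps to abab, which shows how one string in L can have many preimages.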
Decision Algorithms for Regular languages:
The set of sentences accepted by a finite automaton M with n states is:
1. Nonempty if and only if the finite automaton accepts a sentence of length less than n.
2. Infinite if and only if the finite automaton accepts some sentence of length l, where
$n \le l < 2n$.
Note:
To test whether a DFA accepts the empty set, take its transition diagram and delete all states
that are not reachable on any input from the start state. If one or more final states remain, the
language is non empty. Then without changing the language accepted, we may delete all states
that are not final and from which one cannot reach a final state. The DFA accepts an infinite
language if and only if the resulting transition diagram has a cycle. The same method works for
NFAs also.
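The procedure in the note can be sketched directly on a DFA's transition table. The dictionary encoding, the function names and the two example automata are illustrative assumptions; the cycle check is a simple quadratic one, chosen for brevity rather than speed.

```python
def reach(sources, edges):
    """All states reachable from the given states along the edges."""
    seen, stack = set(sources), list(sources)
    while stack:
        q = stack.pop()
        for r in edges.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def classify(delta, q0, finals):
    """Return 'empty', 'finite' or 'infinite' for the language of the DFA."""
    forward, backward = {}, {}
    for (q, _a), r in delta.items():
        forward.setdefault(q, set()).add(r)
        backward.setdefault(r, set()).add(q)
    from_start = reach({q0}, forward)            # delete states unreachable from start
    live_finals = from_start & set(finals)
    if not live_finals:
        return "empty"                           # no final state remains
    useful = from_start & reach(live_finals, backward)   # states that can reach a final
    sub = {q: forward.get(q, set()) & useful for q in useful}
    on_cycle = any(q in reach({r}, sub) for q in useful for r in sub[q])
    return "infinite" if on_cycle else "finite"


# DFA for strings containing 111 (infinite language).
has_111 = {("q0", "0"): "q0", ("q0", "1"): "q1", ("q1", "0"): "q0", ("q1", "1"): "q2",
           ("q2", "0"): "q0", ("q2", "1"): "q3", ("q3", "0"): "q3", ("q3", "1"): "q3"}
# DFA accepting only the string ab; d is a dead state.
only_ab = {("s0", "a"): "s1", ("s0", "b"): "d", ("s1", "a"): "d", ("s1", "b"): "s2",
           ("s2", "a"): "d", ("s2", "b"): "d", ("d", "a"): "d", ("d", "b"): "d"}

print(classify(has_111, "q0", {"q3"}))
print(classify(only_ab, "s0", {"s2"}))
print(classify(only_ab, "s0", set()))
```

The dead state d lies on a cycle but cannot reach a final state, so it is removed before the cycle test, exactly as the note requires.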
Equivalence of Regular languages:
There is an algorithm to determine if two finite automata are equivalent (i.e., if they accept the
same language).
Let $M_1$ and $M_2$ be FA accepting $L_1$ and $L_2$ respectively. The language
$(L_1 \cap \overline{L_2}) \cup (\overline{L_1} \cap L_2)$ is accepted by
some finite automaton $M_3$. It is easy to see that $M_3$ accepts a word if and only if $L_1 \ne L_2$. Hence
we can find whether $L_1 = L_2$ or not.
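One way to carry out this test without building the combined automaton explicitly is to explore the reachable pairs of states of the two DFAs and look for a pair in which exactly one state is accepting; such a pair means some word lies in one language but not the other. A minimal sketch under assumed encodings (each DFA is a (delta, start, finals) triple with a complete transition dictionary; all names and the example automata are hypothetical).

```python
def equivalent(dfa1, dfa2, alphabet):
    """True if the two complete DFAs accept the same language."""
    (delta1, start1, finals1), (delta2, start2, finals2) = dfa1, dfa2
    seen, stack = {(start1, start2)}, [(start1, start2)]
    while stack:
        p, q = stack.pop()
        if (p in finals1) != (q in finals2):
            return False        # some word is accepted by exactly one automaton
        for a in alphabet:
            pair = (delta1[(p, a)], delta2[(q, a)])
            if pair not in seen:
                seen.add(pair)
                stack.append(pair)
    return True


# Even number of a's, written with 2 states and with 4 states.
even2 = ({("e", "a"): "o", ("o", "a"): "e"}, "e", {"e"})
mod4 = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0}
even4 = (mod4, 0, {0, 2})
multiple_of_4 = (mod4, 0, {0})

print(equivalent(even2, even4, "a"))
print(equivalent(even2, multiple_of_4, "a"))
```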
Right invariant relation:
A relation R such that xRy implies xzRyz is said to be right invariant (with respect to
concatenation)
Myhill-Nerode Theorem:
The following three statements are equivalent.
1. The set $L \subseteq \Sigma^*$ is accepted by some finite automaton.
2. L is the union of some of the equivalence classes of a right invariant equivalence relation
of finite index.
3. Let the equivalence relation $R_L$ be defined by: $x R_L y$ if and only if for all z in $\Sigma^*$, xz is in L
exactly when yz is in L. Then $R_L$ is of finite index.
Pumping Lemma for Regular languages:
Pigeon Hole Principle:
If we put n objects into m boxes (pigeon holes), and if n > m, then at least one box must have
more than one item in it.
Pumping Lemma for Regular languages:
The Pumping Lemma uses the pigeon hole principle to show that certain languages are not regular.
This theorem states that all regular languages have a special property. There are three forms of
the Pumping Lemma.
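[Flowchart: a language L is tested in turn against the weak form, the standard form and the strong form of the Pumping Lemma. If L fails any of these tests (no), L is not regular. If L satisfies all three forms (yes), we cannot say anything about the regularity of L.]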
If a language satisfies the Pumping Lemma, it may or may not be regular. But a language which
does not satisfy the Pumping Lemma is not regular.
} is not regular
5. L = {a b | n 1} is not regular
6.
7.
8.
9.
L = {a | n 1} is not regular.
L = {0 | n 1} is not regular
L = { | , in (0 + 1) } is not regular
L = { | , in (0 + 1) } is not regular
($A \Rightarrow^* B$ means B can be derived from A after zero or more intermediate steps.)
Lemma 1:
Given a CFG G = (V, T, P, S) with $L(G) \ne \emptyset$, we can effectively find an equivalent CFG
G' = (V', T, P', S) such that for each A in V' there is some w in $T^*$ for which $A \Rightarrow^* w$.
Lemma 2:
Given a CFG G = (V, T, P, S) with $L(G) \ne \emptyset$, we can effectively find an equivalent CFG
G' = (V', T', P', S) such that for each X in $V' \cup T'$ there exist $\alpha$ and $\beta$ in
$(V' \cup T')^*$ for which $S \Rightarrow^* \alpha X \beta$.
Procedure for Eliminating Useless Symbols: Let G = (V, T, P, S) be the given CFG. Then, by
applying Lemma 1, G' = (V', T, P', S) can be found as follows.
Calculation of V':
begin
  OLDV := $\emptyset$;
  NEWV := {A | A $\to$ w for some w in T*};
  while OLDV $\ne$ NEWV do
  begin
    OLDV := NEWV;
    NEWV := OLDV $\cup$ {A | A $\to$ $\alpha$ for some $\alpha$ in (T $\cup$ OLDV)*}
  end;
  V' := NEWV
end
P' is the set of all productions whose symbols are in $V' \cup T$.
Now, by applying Lemma 2 on G', we can find G'' = (V'', T'', P'', S) as follows:
Place S in V''. If A is placed in V'' and $A \to \alpha_1 \mid \alpha_2 \mid \ldots \mid \alpha_n$,
then add all variables of $\alpha_1, \ldots, \alpha_n$ to the set V'' and all terminals of $\alpha_1, \ldots, \alpha_n$ to T''.
P'' is the set of productions of P' containing only symbols of $V'' \cup T''$.
Notes:
Every Non empty CFL is generated by CFG with no useless symbols
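The two-step procedure for eliminating useless symbols can be sketched as below. The sketch assumes a production is stored as a (head, body) pair with the body a tuple of symbols, and that any symbol not listed in `variables` is a terminal; the function name and the example grammar are illustrative assumptions.

```python
def remove_useless(variables, productions, start):
    """Return the productions that survive both steps of the procedure."""
    def body_ok(body, good):
        # every symbol is a terminal or a variable already known to be good
        return all(s in good or s not in variables for s in body)

    # Step 1 (Lemma 1): keep variables that derive some terminal string.
    generating, changed = set(), True
    while changed:                               # the OLDV / NEWV iteration
        changed = False
        for head, body in productions:
            if head not in generating and body_ok(body, generating):
                generating.add(head)
                changed = True
    p1 = [(h, b) for h, b in productions if h in generating and body_ok(b, generating)]

    # Step 2 (Lemma 2): keep symbols reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for head, body in p1:
            if head == a:
                for s in body:
                    if s not in reachable:
                        reachable.add(s)
                        stack.append(s)
    return [(h, b) for h, b in p1 if h in reachable]


# Example grammar: S -> AB | a, A -> b. B derives nothing, so S -> AB is dropped,
# after which A is no longer reachable from S.
grammar = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(remove_useless({"S", "A", "B"}, grammar, "S"))
```

The order of the two steps matters: removing unreachable symbols first would keep A in this example, because A is still reachable through S -> AB before that production is discarded.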
Normal Forms
Chomsky Normal Form
Any CFL without $\epsilon$ is generated by a grammar in which all productions are of the form $A \to BC$ or
$A \to a$. Here A, B and C are variables and a is a terminal.
Example:
Consider the grammar ({S, A, B}, {a, b}, P, S) that has the productions
$S \to bA \mid aB$
$A \to bAA \mid aS \mid a$
$B \to aBB \mid bS \mid b$
and find an equivalent grammar in CNF.
Solution:
The only productions already in proper form are $A \to a$ and $B \to b$.
There are no unit productions, so we may begin by replacing terminals on the right by variables,
except in the case of the productions $A \to a$ and $B \to b$.
$S \to bA$ is replaced by $S \to C_b A$ and $C_b \to b$.
Similarly, $A \to aS$ is replaced by $A \to C_a S$ and $C_a \to a$.
$A \to bAA$ is replaced by $A \to C_b AA$.
$S \to aB$ is replaced by $S \to C_a B$.
$B \to bS$ is replaced by $B \to C_b S$, and $B \to aBB$ is replaced by $B \to C_a BB$.
In the next stage
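One practical use of a CNF grammar is table-driven membership testing. The sketch below runs the CYK algorithm on a CNF version of the example grammar. It assumes the remaining long productions are split with helper variables, here named `D1` and `D2` for illustration; the dictionary encoding and the function name `cyk` are likewise assumptions.

```python
# CNF form of the example grammar; D1 and D2 are illustrative helper variables.
CNF = {
    "S": [("Cb", "A"), ("Ca", "B")],
    "A": [("Ca", "S"), ("Cb", "D1"), ("a",)],
    "B": [("Cb", "S"), ("Ca", "D2"), ("b",)],
    "D1": [("A", "A")],
    "D2": [("B", "B")],
    "Ca": [("a",)],
    "Cb": [("b",)],
}


def cyk(word, grammar=CNF, start="S"):
    """Return True if the CNF grammar derives word."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar of this form never derives the empty string
    # table[i][l - 1] holds the variables that derive word[i:i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, bodies in grammar.items():
            if (ch,) in bodies:
                table[i][0].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(head)
    return start in table[0][n - 1]
```

With this grammar, `cyk("abba")` and `cyk("aabb")` succeed, while `cyk("aab")` fails, consistent with the grammar generating strings with equally many a's and b's.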
Pushdown Automata
Just as the regular sets have an equivalent automaton, the finite automaton, the context-free grammars have their machine counterpart, the pushdown automaton (PDA).
The deterministic version of the PDA accepts only a subset of all CFLs, whereas the non-deterministic version accepts all CFLs. The PDA has an input tape, a finite control, and a stack. Initially the finite control is in state $q_0$ and the stack holds $Z_0$, where $q_0$ is the initial state and $Z_0$ is the bottom-of-stack symbol.
The language accepted by a PDA can be defined in two ways.
1. The first is to define the language accepted to be the set of all inputs for which some sequence of moves causes the pushdown automaton to empty its stack.
2. The second is to define the language accepted to be the set of all inputs for which some sequence of moves causes the pushdown automaton to enter a final state.
Definition of PDA:
A PDA $M$ is a system $(Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$, where
1. $Q$ is a finite set of states;
2. $\Sigma$ is an alphabet called the input alphabet;
3. $\Gamma$ is an alphabet called the stack alphabet;
4. $q_0$ in $Q$ is the initial state;
5. $Z_0$ in $\Gamma$ is a particular stack symbol called the start symbol;
6. $F \subseteq Q$ is the set of final states;
7. $\delta$ is a mapping from $Q \times (\Sigma \cup \{\varepsilon\}) \times \Gamma$ to finite subsets of $Q \times \Gamma^*$.
Instantaneous Descriptions:
To formally describe the configuration of a PDA at a given instant, we define an instantaneous description (ID) to be a triple $(q, w, \gamma)$, where $q$ is a state, $w$ is a string of input symbols, and $\gamma$ is a string of stack symbols.
If $M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ is a PDA, we say $(q, aw, Z\alpha) \vdash_M (p, w, \beta\alpha)$ if $\delta(q, a, Z)$ contains $(p, \beta)$. Note that $a$ may be $\varepsilon$ or an input symbol.
We use $\vdash^*_M$ for the reflexive and transitive closure of $\vdash_M$. That is, $I \vdash^*_M I$ for each ID $I$, and $I \vdash^*_M J$ and $J \vdash^*_M K$ imply $I \vdash^*_M K$. We write $I \vdash^i_M K$ if ID $I$ can become $K$ after exactly $i$ moves. The subscript is dropped from $\vdash_M$, $\vdash^i_M$ and $\vdash^*_M$ whenever the particular PDA $M$ is understood.
We define $L(M)$, the language accepted by a PDA $M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ by final state, to be
$\{w \mid (q_0, w, Z_0) \vdash^* (p, \varepsilon, \gamma) \text{ for some } p \text{ in } F \text{ and } \gamma \text{ in } \Gamma^*\}$.
We define $N(M)$, the language accepted by empty stack (or null stack), to be
$\{w \mid (q_0, w, Z_0) \vdash^* (p, \varepsilon, \varepsilon) \text{ for some } p \text{ in } Q\}$.
When acceptance is by empty stack, the set of final states is irrelevant, and in this case we usually let the set of final states be the empty set.
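To make IDs and acceptance by empty stack concrete, here is a small breadth-first search over IDs. It is a sketch under assumptions: the transition encoding, the names `accepts_by_empty_stack` and `ANBN_DELTA`, and the `max_ids` bound (a guard against PDAs whose ε-moves grow the stack forever) are all introduced here for illustration.

```python
from collections import deque

# Illustrative PDA for {a^n b^n | n >= 1}, accepting by empty stack.
# Key: (state, input symbol or "" for an epsilon move, top of stack).
# Value: set of (next state, symbols pushed in place of the top, leftmost on top).
ANBN_DELTA = {
    ("q0", "a", "Z"): {("q0", ("X",))},
    ("q0", "a", "X"): {("q0", ("X", "X"))},
    ("q0", "b", "X"): {("q1", ())},
    ("q1", "b", "X"): {("q1", ())},
}


def accepts_by_empty_stack(delta, start_state, start_symbol, word, max_ids=10_000):
    """Search the IDs (state, input position, stack) reachable from the start ID."""
    start = (start_state, 0, (start_symbol,))
    seen = {start}
    queue = deque([start])
    while queue and len(seen) <= max_ids:
        state, pos, stack = queue.popleft()
        if pos == len(word) and not stack:
            return True  # input consumed and stack empty
        if not stack:
            continue  # no move is possible on an empty stack
        top, rest = stack[0], stack[1:]
        options = [("", pos)]  # epsilon move
        if pos < len(word):
            options.append((word[pos], pos + 1))  # move that reads one symbol
        for symbol, next_pos in options:
            for next_state, push in delta.get((state, symbol, top), ()):
                nxt = (next_state, next_pos, tuple(push) + rest)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

Because every reachable ID is explored, the same function works for nondeterministic transition tables; the example table happens to be deterministic.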
Deterministic PDAs
A PDA $M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ is deterministic if:
1. For each $q$ in $Q$ and $Z$ in $\Gamma$, whenever $\delta(q, \varepsilon, Z)$ is nonempty, then $\delta(q, a, Z)$ is empty for all $a$ in $\Sigma$;
2. For no $q$ in $Q$, $Z$ in $\Gamma$ and $a$ in $\Sigma \cup \{\varepsilon\}$ does $\delta(q, a, Z)$ contain more than one element.
Note: For finite automata, the deterministic and non-deterministic models are equivalent with respect to the languages accepted. The same is not true for PDAs: DPDAs accept only a subset of the languages accepted by NPDAs. That is, the NPDA is more powerful than the DPDA.
If L is a CFL, then there exists a PDA M that accepts L.
4. All the regular languages are accepted (by final state) by DPDAs, and there are non-regular languages accepted by DPDAs. The DPDA languages are context-free languages, and in fact are languages that have unambiguous CFGs. The DPDA languages lie strictly between the regular languages and the context-free languages.
5. Deterministic CFLs are closed under complement, inverse homomorphism, intersection with regular sets and regular difference (DCFL − regular).
6. Deterministic CFLs are not closed under union, concatenation, Kleene closure, homomorphism or intersection.
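[Figure: PDA construction for inverse homomorphism. Each input symbol a is mapped to h(a), which is held in a buffer and fed to the state of the simulated PDA, which then accepts or rejects.]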
The key idea in this diagram is that after input a is read, h (a) is placed in a buffer. The
symbols of h(a) are used one at a time and fed to the PDA being simulated. Only when the buffer
is empty does the constructed PDA read another of its input symbols and apply the
homomorphism to it.
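If L is a CFL and R is a regular language, then $L \cap R$ is a CFL:
A PDA and an FA can run in parallel to create a new PDA, as shown below.
[Figure: the FA state and the PDA state and stack process the same input in parallel; their results are combined by AND to give accept/reject.]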
Non-Context-Free Languages
Important points:
1. Let G be a CFG in CNF. Let us call the productions of the form
Nonterminal → Nonterminal Nonterminal live,
and the productions of the form
Nonterminal → terminal dead.
2. If G is a CFG in CNF that has p live productions and q dead productions, and if w is a word generated by G that has more than $2^p$ letters in it, then somewhere in every derivation tree for w there is some nonterminal (call it Z) being used twice, where the second Z is descended from the first Z.
3. In a given derivation of a word in a given CFG, a nonterminal is said to be self-embedded if it occurs as a tree descendant of itself.
[Figure: basic Turing machine model, a finite control whose head scans a tape holding $a_1 \dots a_n$]
Definition:
A Turing machine M is defined by $M = (Q, \Sigma, \Gamma, \delta, q_0, B, F)$, where
$Q$ is the finite set of internal states,
$\Gamma$ is the finite set of allowable tape symbols,
$B$, a symbol of $\Gamma$, is the blank,
$\Sigma$, a subset of $\Gamma$ not including $B$, is the set of input symbols,
$\delta$ is the next-move function, a mapping from $Q \times \Gamma$ to $Q \times \Gamma \times \{L, R\}$ ($\delta$ may, however, be undefined for some arguments),
$q_0$ in $Q$ is the start state,
$F \subseteq Q$ is the set of final states.
We denote an instantaneous description (ID) of the Turing machine M by $\alpha_1 q \alpha_2$. Here $q$, the current state of M, is in $Q$, and $\alpha_1 \alpha_2$ is the string in $\Gamma^*$ that is the current content of the tape. We assume that $Q$ and $\Gamma$ are disjoint to avoid confusion. Finally, the tape head is assumed to be scanning the leftmost symbol of $\alpha_2$ or, if $\alpha_2 = \varepsilon$, the head is scanning a blank.
We define a move of M as follows. Let $X_1 X_2 \dots X_{i-1}\, q\, X_i \dots X_n$ be an ID. Suppose $\delta(q, X_i) = (p, Y, L)$, where if $i - 1 = n$ then $X_i$ is taken to be $B$. If $i = 1$, then there is no next ID, as the tape head is not allowed to fall off the left end of the tape. If $i > 1$, then we write
$X_1 X_2 \dots X_{i-1}\, q\, X_i \dots X_n \vdash_M X_1 X_2 \dots X_{i-2}\, p\, X_{i-1} Y X_{i+1} \dots X_n$   (1)
However, if any suffix of $X_{i-1} Y X_{i+1} \dots X_n$ is completely blank, that suffix is deleted in (1).
Alternatively, suppose $\delta(q, X_i) = (p, Y, R)$. Then we write
$X_1 X_2 \dots X_{i-1}\, q\, X_i \dots X_n \vdash_M X_1 X_2 \dots X_{i-1} Y\, p\, X_{i+1} \dots X_n$   (2)
Note that in the case $i - 1 = n$, the string $X_i \dots X_n$ is empty and the right side of (2) is longer than the left side.
If two IDs are related by $\vdash_M$, we say that the second results from the first by one move. If one ID results from another by some finite number of moves, including zero moves, they are related by the symbol $\vdash^*_M$.
The language accepted by M, denoted $L(M)$, is the set of those words in $\Sigma^*$ that cause M to enter a final state when placed, justified at the left, on the tape of M, with M in state $q_0$ and the tape head of M at the leftmost cell. Given a TM recognizing a language L, we assume without loss of generality that the TM halts, i.e., has no next move, whenever the input is accepted. However, for words not accepted, it is possible that the TM will never halt.
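A short simulator helps in checking moves by hand. The sketch below assumes a deterministic machine on a one-way infinite tape, with the transition function stored as a dictionary. The names `run_tm` and `ZERO_N_ONE_N`, the example machine for $\{0^n 1^n \mid n \ge 1\}$, and the `max_steps` bound (needed because a TM may never halt) are illustrative assumptions.

```python
# Illustrative transition table for {0^n 1^n | n >= 1}:
# (state, scanned symbol) -> (next state, symbol written, head move).
ZERO_N_ONE_N = {
    ("q0", "0"): ("q1", "X", "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q0", "Y"): ("q3", "Y", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", "B"): ("q4", "B", "R"),
}


def run_tm(delta, start, finals, word, blank="B", max_steps=10_000):
    """Return True on acceptance, False on a rejecting halt, None if undecided."""
    tape = list(word) or [blank]
    state, head = start, 0
    for _ in range(max_steps):
        if state in finals:
            return True  # the machine has entered a final state
        key = (state, tape[head])
        if key not in delta:
            return False  # no next move in a non-final state
        state, write, move = delta[key]
        tape[head] = write
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(blank)  # extend the tape with a blank on demand
        else:
            if head == 0:
                return False  # the head may not fall off the left end
            head -= 1
    return None  # step bound reached without a verdict
```

Returning `None` when the step bound is hit reflects the last point above: for words that are not accepted, the simulation cannot in general tell a long computation from one that never halts.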
1. Two-way infinite TM: L is recognized by a Turing machine with a two-way infinite tape if and only if it is recognized by a TM with a one-way infinite tape.
[Figure: a two-way infinite tape with cells ..., a1, a2, ... attached to a finite control]
2. Multitape TM: If a language L is accepted by a multitape Turing machine, it is accepted by a single-tape Turing machine.
Fig: Multitape TM
3. Nondeterministic TM: If L is accepted by a nondeterministic Turing machine M1, then L is accepted by some deterministic Turing machine M2.
4. Multidimensional TM: In a K-dimensional TM, the tape consists of a K-dimensional array of cells, infinite in all 2K directions, for some fixed K.
If L is accepted by a K-dimensional Turing machine M1, then L is accepted by some single-tape Turing machine M2.
5. Multihead Turing machine: A K-head TM has some fixed number K of heads. The heads are numbered 1 through K, and a move of the TM depends on the state and on the symbol scanned by each head. In one move, the heads may each move independently left or right, or remain stationary.
If L is accepted by some K-head TM M1, it is accepted by a one-head TM M2.
Fig: K-head TM
6. Multitrack Turing machine: We can imagine that the tape of the TM is divided into k tracks, for any finite k.
[Figure: a tape divided into k tracks, scanned by one finite control]
7. Turing machine with stay option: In these TMs the read-write head can stay at the current position upon reading an input symbol (possibly changing it), without moving left or right.
8. Off-line Turing machine: An off-line TM is a multitape TM whose input tape is read-only. Usually we surround the input by end markers, ¢ on the left and \$ on the right. The Turing machine is not allowed to move the input tape head off the region between ¢ and \$. It should be obvious that the off-line TM is just a special case of the multitape TM. An off-line TM can simulate any TM M by using one more tape than M. The first thing the off-line TM does is copy its own input onto the extra tape; it then simulates M as if the extra tape were M's input.
None of these modifications adds any language-accepting power; all of them are equivalent to the basic model.
POST MACHINE: A Post machine, denoted PM, is a collection of five things:
1. The alphabet $\Sigma$ of input letters plus the special symbol #.
2. A linear storage location (a place where a string of symbols is kept) called the STORE or QUEUE, which initially contains the input string. We allow for the possibility that characters not in $\Sigma$ can be used in the STORE, characters from an alphabet $\Gamma$ called the store alphabet.
3. READ states, which remove the leftmost character from the STORE and branch accordingly. The only branching in the machine takes place at the READ states.
[Figure: a READ state]
4. ADD states, which concatenate a character onto the right end of the string in the STORE. This is different from the PUSH states of a PDA. No branching can take place at an ADD state. It is possible to have an ADD state for every letter in $\Sigma$ and $\Gamma$.
[Figure: ADD states]
5. A START state (unenterable) and some halt states called ACCEPT and REJECT.
[Figure: START, ACCEPT and REJECT states]
We could also define a nondeterministic Post machine, NPM. This would allow for more than one edge with the same label to come from a READ state. In their strength, NPM = PM.
Two - Stack Machine:
A two-pushdown-stack machine, a 2PDA, is like a PDA except that it has two pushdown stacks, STACK1 and STACK2. When we wish to push a character x onto a stack, we have to specify which stack, either PUSH1 x or PUSH2 x. When we pop a stack for the purpose of branching, we must specify which stack, either POP1 or POP2. READ reads a character from the read-only input tape. The functions of START, READ, ACCEPT and REJECT are the same as in the Post machine.
Counter-Machines:
A counter machine may be thought of in one of two ways:
1. The counter machine has the same structure as the multistack machine, but in place of each stack is a counter. Counters hold any nonnegative integer, but we can only distinguish between zero and nonzero counters. That is, the move of the counter machine depends on its state, the input symbol, and which, if any, of the counters are zero. In one move, the counter machine can
a. Change state.
b. Add or subtract 1 from any of its counters, independently. However, a counter is not allowed to become negative, so the machine cannot subtract from a counter that is currently 0.
2. A counter machine may also be regarded as a restricted multistack machine. The restrictions are as follows.
a. There are only two stack symbols, which we shall refer to as $Z_0$ (the bottom-of-stack marker) and $X$.
b. $Z_0$ is initially on each stack.
c. We may replace $Z_0$ only by a string of the form $X^i Z_0$ for some $i \ge 0$.
d. We may replace $X$ only by $X^i$ for some $i \ge 0$. That is, $Z_0$ appears only on the bottom of each stack, and all other stack symbols, if any, are $X$.
The two definitions clearly define machines of equivalent power.
1. Every recursively enumerable language is accepted by a three-counter machine (two stacks can be simulated using only three counters).
2. Every recursively enumerable language is accepted by a two-counter machine (two stacks can be simulated using only two counters).
3. Every recursively enumerable language is accepted by a 3-pebble machine (three pebbles are sufficient to simulate two counters).
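The first view of a counter machine can be illustrated with a single counter. The sketch below recognizes $\{a^n b^n \mid n \ge 1\}$ and, as in the definition, each move depends only on the state, the input symbol and whether the counter is zero. The function name and state names are assumptions made for the example.

```python
def counter_accepts(word):
    """One-counter recognizer for {a^n b^n | n >= 1} (illustrative sketch)."""
    count = 0              # the counter: a nonnegative integer
    state = "reading_a"
    for ch in word:
        if state == "reading_a" and ch == "a":
            count += 1     # add 1 for every leading a
        elif ch == "b" and count > 0:
            state = "reading_b"
            count -= 1     # subtract 1 for every b; never below zero
        else:
            return False   # an a after a b, or a b with the counter at zero
    return state == "reading_b" and count == 0
```

The only information read from the counter is the zero test `count > 0`, which is exactly the restriction that a counter machine can distinguish only zero from nonzero.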
[Figure: 3-pebble machine]
[Figure: storage in the finite control; the finite control holds a state q together with a stored symbol A]
Multiple Tracks:
Another useful trick is to think of the tape of a TM as composed of several tracks. Each track can hold one symbol, and the tape alphabet of the TM consists of tuples, with one component for each track. Like the technique of storage in the finite control, using multiple tracks does not extend what the TM can do. A common use of multiple tracks is to treat one track as holding the data and a second track as holding a mark. We can check off each symbol as we use it, or we can keep track of a small number of positions within the data by marking those positions.
[Figure: a finite control scanning a tape with Track 1, Track 2 and Track 3]
[Figure: universal Turing machine with three tapes: description of M, tape contents of M, internal state of M]
For any input M and w, tape 1 will keep an encoded definition of M, tape 2 will contain the tape contents of M, and tape 3 the internal state of M. Mu looks first at the contents of tapes 2 and 3 to determine the configuration of M. It then consults tape 1 to see what M would do in this configuration. Finally, tapes 2 and 3 are modified to reflect the result of the move.
This implementation clearly can be done using some programming language. Therefore, we expect that it can also be done by a standard Turing machine.
Context sensitive grammar:
A grammar $G = (V, T, P, S)$ is said to be context-sensitive if all productions are of the form $\alpha \to \beta$, where $\alpha, \beta \in (V \cup T)^+$ and $|\alpha| \le |\beta|$.
This definition shows clearly one aspect of this type of grammar: it is noncontracting, in the sense that the length of successive sentential forms can never decrease.
Context sensitive language:
A language L is said to be context-sensitive if there exists a context-sensitive grammar G such that $L = L(G)$ or $L = L(G) \cup \{\varepsilon\}$.
A context-sensitive grammar does not contain productions of the form $\alpha \to \varepsilon$, so a context-sensitive grammar can never generate a language containing the empty string. By including the empty string in the definition of a context-sensitive language, we can claim that the family of context-free languages is a subset of the family of context-sensitive languages.
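The language accepted by an LBA M is $L(M) = \{w \mid w \in (\Sigma - \{¢, \$\})^* \text{ and } q_0 ¢ w \$ \vdash^*_M \alpha q \beta \text{ for some } q \text{ in } F\}$.
1. If L is a CSL, then L is accepted by some LBA.
2. If $L = L(M)$ for an LBA $M = (Q, \Sigma, \Gamma, \delta, q_0, ¢, \$, F)$, then $L - \{\varepsilon\}$ is a CSL.
3. Every CSL is recursive, but the converse is not true.
A string w is accepted by an LBA if there is a possible sequence of moves $q_0 ¢ w \$ \vdash^* ¢ \alpha q \beta \$$ for some $q \in F$ and $\alpha, \beta \in \Gamma^*$. The language accepted by the LBA is the set of all such accepted strings.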
Hierarchy of formal languages (Chomsky Hierarchy )
1. Unrestricted grammars (or Type 0 grammars):
A grammar $G = (V, T, P, S)$ is called unrestricted if all the productions are of the form $\alpha \to \beta$, where $\alpha$ is in $(V \cup T)^+$ and $\beta$ is in $(V \cup T)^*$.
Any language generated by an unrestricted grammar is recursively enumerable.
[Figure: Chomsky hierarchy; surviving labels: Type 2, unrestricted grammar]
UNDECIDABILITY:
Recursive language:
A language L over the alphabet $\Sigma$ is called recursive if there is a TM T that accepts every word in L and rejects every word in L', the complement of L.
Recursively enumerable language:
A language L over the alphabet $\Sigma$ is called recursively enumerable if there is a TM T that accepts every word in L and either rejects or loops for every word in L', the complement of L.
Non recursively enumerable language:
For any nonempty $\Sigma$ there exist languages that are not recursively enumerable.
Suppose we have a list of $(0+1)^*$ in canonical order (if $\Sigma = \{0, 1\}$, the canonical order is $\varepsilon$, 0, 1, 00, 01, 10, 11, 000, 001, ...), where $w_i$ is the i-th word and $M_j$ is the TM whose code is the integer j written in binary. Imagine an infinite table that tells, for all i and j, whether $w_i$ is in $L(M_j)$.
[Figure: infinite table with rows i and columns j whose (i, j) entry is 1 if $w_i$ is in $L(M_j)$ and 0 otherwise; the diagonal is marked]
We construct a language $L_d$ by using the diagonal entries of the table to determine membership in $L_d$.
To guarantee that no TM accepts $L_d$, we insist that $w_i$ is in $L_d$ if and only if the (i, i) entry is 0, that is, if $M_i$ does not accept $w_i$. Suppose that some TM $M_j$ accepted $L_d$. Then we are faced with the following contradiction. If $w_j$ is in $L_d$, then the (j, j) entry is 0, implying that $w_j$ is not in $L(M_j)$ and contradicting $L_d = L(M_j)$. On the other hand, if $w_j$ is not in $L_d$, then the (j, j) entry is 1, implying that $w_j$ is in $L(M_j)$, which again contradicts $L_d = L(M_j)$. As $w_j$ is either in or not in $L_d$, we conclude that our assumption $L_d = L(M_j)$ is false. Thus no TM in the list accepts $L_d$.
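The diagonal argument can be tried on a finite excerpt of the table. The sketch below is only an illustration: a real table is infinite, and the function name and the list-of-lists encoding (`table[j][i]` is True when machine j accepts word i) are assumptions made here.

```python
def diagonal_row(table):
    """Flip the diagonal: entry i is True exactly when machine i rejects word i."""
    return [not table[i][i] for i in range(len(table))]


# A made-up 3 x 3 excerpt of the acceptance table.
EXCERPT = [
    [True, False, True],
    [False, False, True],
    [True, True, True],
]
```

By construction, the flipped diagonal disagrees with row j at position j for every j, so it cannot equal any row of the table. That is the finite shadow of the statement that no TM in the list accepts $L_d$.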
The universal language:
Define $L_u$, the universal language, to be $\{\langle M, w \rangle \mid M \text{ accepts } w\}$. We call $L_u$ universal since the question of whether any particular string w in $(0+1)^*$ is accepted by any particular TM M is equivalent to the question of whether $\langle M', w \rangle$ is in $L_u$, where M' is the one-tape TM with tape alphabet {0, 1, B} equivalent to M.
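$L_u$ is recursively enumerable but not recursive.
$L_d$ is not recursively enumerable; its complement $\overline{L_d}$ is recursively enumerable but not recursive.
Relationship between recursive, recursively enumerable and non-recursively-enumerable languages:
[Figure: the recursive languages lie inside the recursively enumerable languages; all remaining languages are non-recursively enumerable]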
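Reductions
If there is a reduction from P1 to P2, then:
a) If P1 is undecidable, then so is P2.
b) If P1 is non-RE, then so is P2.
[Figure: a reduction maps yes-instances of P1 to yes-instances of P2 and no-instances of P1 to no-instances of P2]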
M' ignores its own input x and instead simulates M on input w, accepting if M accepts w. Note that M' is not B. Rather, B is like a compiler that takes $\langle M, w \rangle$ as source program and produces M' as object program. The construction of B is simple. It takes $\langle M, w \rangle$ and isolates w. Say $w = a_1 a_2 \dots a_n$ is of length n. B creates n + 3 states $q_1, q_2, \dots, q_{n+3}$, with moves
$\delta(q_1, X) = (q_2, \$, R)$ for any X (print marker),
$\delta(q_i, X) = (q_{i+1}, a_{i-1}, R)$ for any X and $2 \le i \le n + 1$ (print w),
$\delta(q_{n+2}, X) = (q_{n+2}, B, R)$ for $X \ne B$ (erase tape),
$\delta(q_{n+2}, B) = (q_{n+3}, B, L)$,
$\delta(q_{n+3}, X) = (q_{n+3}, X, L)$ for $X \ne \$$ (find marker).
Having produced the code for these moves, B then adds n + 3 to the indices of the states of M and includes the move
$\delta(q_{n+3}, \$) = (q_{n+4}, \$, R)$ /* start up M */
and all the moves of M in the TM it is generating.
Now suppose an algorithm A accepting L exists, as shown below.
Fig: TM
As in the previous example, we have described the output of A. We leave the construction of A to the reader.
Given A and M, we could construct a TM accepting L, as shown below.
[Figure: (a) the TM M' produced from $\langle M, w \rangle$; (b) the TM accepting L, built from B and M]
Thus M' accepts a recursive language iff M accepts w. M', which B must produce, is shown in fig (a), and the TM to accept L, given B and M, is shown in fig (b).
The TM of fig (b) accepts $\langle M, w \rangle$ iff L(M') is not recursive or, equivalently, if and only if M does not accept w, i.e., the TM accepts $\langle M, w \rangle$ iff $\langle M, w \rangle$ is in L, since we have already shown that
Question | Regular sets | DCFL's | CFL's | CSL's
1. Is w in L? (membership problem)
2. Is L = $\emptyset$? (emptiness problem)
3. Is L = $\Sigma^*$? (completeness problem)
4. Is L1 = L2? (equality problem)
5. Is L1 $\subseteq$ L2? (subset problem)
6. Is L1 $\cap$ L2 = $\emptyset$?
7. Is L = R, where R is a given regular set?
8. Is L regular?
Operation | DCFL's | CFL's | CSL's | Recursive sets | R.E. sets
1. Union
2. Concatenation
3. Kleene closure
4. Intersection
5. Complementation
6. Homomorphism
7. Inverse homomorphism
8. Substitution
9. Reversal
11. Quotient with regular sets
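12. The edge cover problem: Given a graph G and an integer k, does G have an "edge cover" of k edges, that is, a set of k edges such that each node of G is an end of at least one edge in the edge cover?
The edge cover problem is solvable in polynomial time (a minimum edge cover can be obtained from a maximum matching), so it is not known to be NP-complete.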
13. The linear integer programming problem is NP-complete.
14. The dominating set problem: Given a graph G and an integer k, does there exist a subset S of k nodes of G such that each node is either in S or adjacent to a node of S? This problem is NP-complete.
15. The half-clique problem: Given a graph G with an even number of vertices, does there exist a clique of G consisting of exactly half the nodes of G?
The half-clique problem is NP-complete.
16. The unit-execution-time scheduling problem is NP-complete.
17. Exact cover problem: Given a set S and a set of subsets S1, S2, ..., Sn of S, is there a set of subsets $T \subseteq \{S_1, S_2, \dots, S_n\}$ such that each element x of S is in exactly one member of T? The exact cover problem is NP-complete.
18. The knapsack problem is NP-complete.
19. Given a graph G and an integer k, does G have a spanning tree with at most k leaf vertices?
20. Given a graph G and an integer d, does G have a spanning tree with no node of degree greater than d? (The degree of a node n in the spanning tree is the number of edges of the tree that have n as an end.) This problem is NP-complete.
21. The problem of whether two FA's with the same input alphabet recognize different languages is NP-complete.
22. The problem of whether two regular expressions E1 and E2 over the operators (+, ., *) represent different languages is NP-complete.
23. The problem of whether two regular grammars G1 and G2 generate different languages is NP-complete.
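24. Whether a given CFG generates a given string x can be decided in polynomial time (for example, by the CYK algorithm on an equivalent CNF grammar), so this problem is in P.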
25. Satisfiability, CNF-satisfiability problems are NP- complete.
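What places problems such as these in NP is that a proposed solution (a certificate) can be checked in polynomial time, even when finding one appears hard. As a sketch, the function below checks a certificate for the dominating set problem stated above. The adjacency-dict encoding and the function name are assumptions made for illustration.

```python
def is_dominating_set(adjacency, candidate, k):
    """Check a certificate: is candidate a set of exactly k nodes dominating the graph?

    adjacency maps each node to an iterable of its neighbours (illustrative encoding).
    """
    chosen = set(candidate)
    if len(chosen) != k or not chosen <= set(adjacency):
        return False
    # Every node must be in the set or adjacent to some node of the set.
    return all(v in chosen or chosen & set(adjacency[v]) for v in adjacency)


# Path graph 1 - 2 - 3 - 4, used as a small example.
PATH = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

The check visits each node and its neighbour list once, so it runs in time polynomial in the size of the graph; the difficulty of the problem lies in choosing the k nodes, not in verifying them.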
Some of the NP-Hard Problems:
1. The halting problem is to determine, for an arbitrary deterministic algorithm A and an input I, whether algorithm A with input I ever terminates. It is well known that this problem is undecidable. Hence, there exists no algorithm (of any complexity) to solve this problem, so it clearly cannot be in NP.
Two problems L1 and L2 are said to be polynomially equivalent if and only if L1 is polynomially reducible to L2 and vice versa.
Only a decision problem can be NP-complete. However, an optimization problem may be NP-hard. Furthermore, if L1 is a decision problem and L2 an optimization problem, it is quite possible that $L_1 \propto L_2$ (L1 is polynomially reducible to L2).
Knapsack decision $\propto$ Knapsack optimization
Clique decision $\propto$ Clique optimization
Yet optimization problems cannot be NP-complete, whereas decision problems can. There also exist NP-hard decision problems that are not NP-complete.
To show that a problem L2 is NP-hard, it is adequate to show $L_1 \propto L_2$, where L1 is some problem already known to be NP-hard.
Intractable Problems:
* The problems solvable in polynomial time on a deterministic TM are tractable.
* The problems which require more than polynomial time on a deterministic TM are intractable.
The Class of languages Co-NP
P is closed under complementation, but it is not known whether NP is closed under complementation. A suspected relationship between Co-NP and other classes of languages is shown below.
[Figure: suspected relationship among P, NP, Co-NP and the NP-complete problems]
Computer Organization
Computer architecture deals with the structure and behavior of the computer system.
It includes the information formats, the instruction set and the hardware units that implement the instructions, along with the techniques for addressing memory.
Computer organization deals with the way the various hardware components operate and
the way they are connected together to form the computer system.
It also deals with the units of the computer that receive information from external sources
and send computed results to external destinations.
Computer design is concerned with the hardware design of the computer. This aspect of
computer hardware is sometimes referred to as computer implementation.
Basic blocks of a Computer System
Input Unit: It is a medium of communication between the user and the computer. With the
help of input unit, it is possible to give programs and data to the computer.
Examples: Keyboard, floppy disk drive, hard disk drive, mouse, Magnetic Ink Character
Recognition (MICR), Optical Character Recognition (OCR), paper tape reader, Magnetic tape
reader, Scanner etc.
Output Unit: It is a medium of communication between the computer and the user. Only with the help of the output unit is it possible to take results from the computer.
Examples: Printers, Video Display Unit (VDU), floppy disk drive, hard disk drive, magnetic tape drive, punched cards, paper tape, plotter, digitizer etc.
Memory: The memory unit is responsible for storing the user programs and data as well as
system programs. The digital computer memory unit consists of two types of memories:
Read Only Memory (ROM) and Read Write Memory (R/WM) or Random Access Memory
(RAM).
ALU: All arithmetic and logical operations are performed within this unit.
Control unit: It is used to generate necessary timing and control signals to activate different
blocks in the computer to perform the given task.
Central Processing Unit (CPU): The ALU and Control Unit together are called the CPU. It is the heart of any digital computer.
Byte Ordering or Endianness
When a computer reads or stores a value that occupies multiple bytes, where does the most significant byte appear?
Big endian machine: Stores data big-end (most significant byte, MSB) first. When looking at multiple bytes, the first byte (lowest address) is the most significant.
Little endian machine: Stores data little-end (least significant byte, LSB) first. When looking at multiple bytes, the first byte (lowest address) is the least significant.
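The two byte orders can be compared directly in Python. This is a small illustration with an arbitrarily chosen 32-bit value; `int.to_bytes`, `int.from_bytes` and `struct.pack` are standard-library calls, and the variable names are only for the example.

```python
import struct

value = 0x12345678  # example 32-bit value; 0x12 is its most significant byte

# Byte sequences as they would be laid out from the lowest address upward.
big = value.to_bytes(4, byteorder="big")        # 12 34 56 78
little = value.to_bytes(4, byteorder="little")  # 78 56 34 12

# struct uses ">" for big-endian and "<" for little-endian layouts.
packed_big = struct.pack(">I", value)
packed_little = struct.pack("<I", value)

# Reading the bytes back with the matching order recovers the value.
restored = int.from_bytes(little, byteorder="little")
```

Reading the little-endian bytes with the big-endian convention would instead give 0x78563412, which is the typical symptom of a byte-order mismatch between two machines.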
Memory Unit
ROM: ROM is used to store permanent programs or system programs. It does not have write capability.
Types: PROM, EPROM, EEPROM
RAM: It is also called user memory because the user programs or application programs are
stored in this memory. The CPU is able to write or read information into or from this type of
memory.
Types: static, dynamic, scratch pad etc.
When the binary number is positive, the sign bit is represented by 0. When the binary
number is negative, the sign bit is represented by 1.
The representation of the decimal point (or binary point) in a register is complicated by the
fact that it is characterized by a position between two flip- flops in the register. There are
two ways of specifying the position of the decimal point in a register.
1. Fixed Point
2. Floating Point
The fixed point method assumes that the decimal point (or binary point) is always fixed in
one position. The two positions most widely used are (1) a decimal point in the extreme left
of the register to make the stored number a fraction, and (2) a decimal point in the extreme
right of the register to make the stored number an integer.
Negative numbers can be represented in one of three possible ways:
Signed-magnitude representation
Signed 1's complement representation
Signed 2's complement representation
The 2's complement of a given binary number can be formed by leaving all least significant
zeros and the first non-zero digit unchanged, and then replacing 1s by 0s and 0s by 1s in
all other higher significant digits.
Subtraction using 2's complement: Represent the negative number in signed 2's complement
form. Add the two numbers, including their sign bits, and discard any carry out of the most
significant bit.
Since negative numbers are represented in 2's complement form, negative results are also
obtained in signed 2's complement form.
2's complement form is usually chosen over 1's complement to avoid the occurrence of a
negative zero.
The 1's complement of the 1's complement of a given number is the same number.
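The 2's complement shortcut and the subtraction rule can be sketched in Python as follows. The 8-bit width and the function names are assumptions made for illustration only.

```python
def twos_complement_shortcut(bits: str) -> str:
    """Keep the trailing zeros and the first 1 from the right; invert the rest."""
    idx = bits.rfind("1")
    if idx == -1:
        return bits  # all zeros: the 2's complement of zero is zero
    inverted = "".join("1" if b == "0" else "0" for b in bits[:idx])
    return inverted + bits[idx:]


def subtract(a: int, b: int, width: int = 8) -> int:
    """Compute a - b by adding the 2's complement of b and discarding the carry."""
    mask = (1 << width) - 1
    b_bits = format(b & mask, f"0{width}b")
    b_comp = int(twos_complement_shortcut(b_bits), 2)
    return (a + b_comp) & mask  # masking discards any carry out of the MSB


def to_signed(x: int, width: int = 8) -> int:
    """Interpret a width-bit pattern as a signed 2's complement number."""
    return x - (1 << width) if x & (1 << (width - 1)) else x


print(twos_complement_shortcut("00010100"))  # 11101100
print(to_signed(subtract(20, 5)))            # 15
print(to_signed(subtract(5, 20)))            # -15
```

In the last call the result pattern has its sign bit set, so the negative result is itself in signed 2's complement form, as the text states.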
The general form of a floating point number is $\pm M \times r^{e}$, where the sign is given by the
sign bit S, M = mantissa, r = base, e = exponent.
The mantissa can be a fixed point fraction or a fixed point integer.
Normalization: Getting a non-zero digit in the most significant bit or digit position of the
mantissa is called normalization.
If the floating point number is normalized, more significant digits can be stored and,
as a result, accuracy is improved.
A zero cannot be normalized because it does not contain a non-zero digit.
The hexadecimal code is widely used in digital systems because it is very convenient to enter
binary data in a digital system using hexcode.
There are mainly two types of numbering systems:
a) Non positional number systems
This number system is known as the decimal number system. The decimal number system of counting
has ten different digits or symbols, namely 0 to 9. This number system is said to have a
base of ten because it has ten different digits.
The commonly used number systems with their symbols and bases:
Number system    Radix    Essential digits
Binary           2        0, 1
Octal            8        0, 1, 2, 3, 4, 5, 6, 7
Decimal          10       0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Hexadecimal      16       0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
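As an illustration of how one value is written in each radix, here is a small sketch using repeated division by the radix. The function name `to_base` and the example value 255 are assumptions chosen for this example.

```python
DIGITS = "0123456789ABCDEF"  # essential digits up to radix 16


def to_base(n: int, radix: int) -> str:
    """Convert a non-negative integer to its digit string in the given radix."""
    if not 2 <= radix <= 16:
        raise ValueError("radix must be between 2 and 16")
    if n < 0:
        raise ValueError("only non-negative integers are handled")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, radix)  # remainder is the next digit, LSD first
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


for radix in (2, 8, 10, 16):
    print(radix, to_base(255, radix))
# 2 11111111
# 8 377
# 10 255
# 16 FF
```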
Data Representation
In a digital computer system, information is represented by means of binary sequences, which
are organized in words. A word is a unit of information of a fixed length. The binary information
in digital computers is stored in memory or in processor registers. This binary information may
be in the form of either data or control information.
Types of Information:
Information
o Instructions
o Data
   o Non numerical
   o Numerical
      o Fixed point
         o Binary
         o Decimal
      o Floating point
         o Binary
         o Decimal
[Figure: memory hierarchy. Moving away from the processor, size increases, access time increases, cost per bit decreases, and frequency of access decreases.]
Capacity: The capacity of a cache is simply the amount of data that can be stored in the cache, so
a cache with a capacity of 32KB can store 32 Kilobytes of data.
What is Cache?
High speed memory module connected to the processor for its private use.
Contains copies of recently referenced material.
Copies between cache and memory in lines or blocks.
By using cache memory the speed of operation will be increased and execution time will be
reduced.
Cache memory is also called high speed buffer memory
[Figure: Processor, Cache, Main Memory]
Line length: The line length of a cache is the cache's block size.
Associativity: The associativity of a cache determines how many locations within the cache may
pertain to a given memory address.
The speed of the main memory is very low in comparison with the speed of modern processors.
An efficient solution is to use a fast cache memory, which essentially makes the main memory
appear to the processor to be faster than it really is. The effectiveness of the cache mechanism is based
on a property of computer programs called locality of reference.
It manifests itself in two ways. i) temporal and ii) spatial.
Temporal:
Definition: Recently accessed items are likely to be accessed again in the near future.
The temporal aspect of the locality of reference suggests that whenever an information item
(instruction or data) is first needed, this item should be brought into the cache, where it will
hopefully remain until it is needed again.
Spatial:
Definition: Items whose addresses are near one another tend to be referenced close together
in time.
The spatial aspect suggests that instead of fetching just one item from the main memory to the
cache, it is useful to fetch several items that reside at adjacent addresses as well. We will use the
term block to refer to a set of contiguous address locations of some size. Another term that is
often used to refer to a cache block is cache line.
The correspondence between the main memory blocks and those in the cache is specified by a
mapping function. When the cache is full and a memory word (instruction or data) that is not in
the cache is referenced, the cache control hardware must decide which block should be removed
to create space for the new block that contains the referenced word. The collection of rules for
making this decision constitutes the replacement algorithm.
When a Read request hits in the cache, the main memory is not involved. For a Write operation, the system can
proceed in two ways. In the first technique, called the write-through protocol, the cache location
and the main memory location are updated simultaneously. The second technique is to update
only the cache location and to mark it as updated with an associated flag bit, often called the
dirty or modified bit. The main memory location of the word is updated later, when the block
containing this marked word is to be removed from the cache to make room for a new block.
This technique is known as the write-back, or copy-back, protocol. When the addressed word in a
Read operation is not in the cache, a read miss occurs. The block of words that contains the
requested word is copied from the main memory into the cache. After the entire block is loaded
into the cache, the particular word requested is forwarded to the processor. Alternatively, this
word may be sent to the processor as soon as it is read from the main memory. The latter
approach, which is called load-through or early restart, reduces the processor's waiting period.
The performance of virtual memory or cache memory is measured with the hit ratio.
The hit ratio is defined as the number of hits divided by the total number of CPU references to the
memory (hits plus misses).
$T_{avg} = H \cdot C + (1 - H) \cdot M$
where H = hit ratio of the cache memory,
C = time to access information in the cache memory,
M = miss penalty + main memory access time + cache memory access time.
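A short worked example of the hit ratio and the average access time formula. The counts and timings below (900 hits, 100 misses, 10 ns and 110 ns) are made-up values for illustration, not figures from the text.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Hits divided by the total number of references (hits plus misses)."""
    return hits / (hits + misses)


def average_access_time(h: float, c: float, m: float) -> float:
    """T_avg = H*C + (1 - H)*M, with C and M in the same time unit."""
    return h * c + (1 - h) * m


# Hypothetical numbers: 900 hits, 100 misses, C = 10 ns, M = 110 ns.
h = hit_ratio(900, 100)
t_avg = average_access_time(h, c=10.0, m=110.0)
print(f"H = {h:.2f}, T_avg is about {t_avg:.1f} ns")
```

With these assumed numbers the average works out to roughly 20 ns, much closer to the cache time than to the miss time, which is the point of the formula: a high hit ratio hides most of the main memory latency.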
The average memory access time of a computer system can be improved considerably by use of
a cache.
Cache Coherence Problem:
The transformation of data from main memory to cache memory is referred to as a mapping
process.
Multilevel Cache Hierarchy
One of the fundamental issues is the tradeoff between cache latency and hit rate. Larger caches have
better hit rates but longer latency. To address this tradeoff, many computers use multiple levels
of cache, with small fast caches backed up by larger slower caches.
Multi-level caches generally operate by checking the smallest Level 1 (L1) cache first; if it hits,
the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is
checked, and so on, before external memory is checked.
Multi-level caches introduce new design decisions. For instance, in some processors, all data in
the L1 cache must also be somewhere in the L2 cache. These caches are called strictly inclusive.
Other processors have exclusive caches: data is guaranteed to be in at most one of the L1 and
L2 caches, never in both.
Address Mapping:
a) Associative Mapping: The tag bits of an address received from the processor are compared to
the tag bits of each block of the cache to see if the desired block is present. This is called the
associative-mapping technique.
An associative cache employs a tag, that is, a block address, as the key. At the start of a
memory access, the incoming tag is compared simultaneously to all the tags stored in the
cache's tag memory. If a match (cache hit) occurs, a match-indicating signal triggers the
cache to service the requested memory access. A no-match signal identifies a cache miss, and
the requested memory access is forwarded to the main memory for service.
Fig. 7.2.3: Associative-mapped cache (tagged cache blocks 0 to 127; main memory blocks 0 to 4095)
b) Direct Mapping: An alternative, and simpler, address mapping technique for cache is
known as Direct Mapping.
[Figure: Direct-mapped cache. Tagged cache blocks 0 to 127; main memory blocks 0, 1, ..., 127, 128, 129, ..., 255, 256, ...; main memory address fields: Tag, Block, Word.]
The direct-mapping technique is easy to implement, but it is not very flexible.
The main drawback of direct mapping is that the cache's hit ratio drops sharply if two or
more frequently used blocks happen to map onto the same region in the cache.
c) Set Associative Mapping: Here, a combination of the direct and associative mapping techniques
is used. Blocks of the cache are grouped into sets, and the mapping allows a block of the
main memory to reside in any block of a specific set. At the same time, the hardware cost is
reduced by decreasing the size of the associative search. With two blocks per set, the tag field of the
address must be associatively compared to the tags of the two blocks of the set to check if the desired
block is present. This two-way associative search is simple to implement.
The number of blocks per set is a parameter that can be selected to suit the requirements of a
particular computer. A cache that has k blocks per set is referred to as a k-way set-associative cache.
One more control bit, called the valid bit, must be provided for each block. This bit indicates
whether the block contains valid data or not.
Fig. 7.2.5: Set-associative-mapped cache with two blocks per set (cache blocks 0 to 127 grouped into Set 0 to Set 63, each block with a tag; main memory blocks 0 to 4095). Main memory address: Tag (6 bits), Set (6 bits), Word (4 bits).
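The 6/6/4 split of the main memory address in Fig. 7.2.5 can be sketched as bit-field extraction. This assumes a 16-bit address (4096 blocks of 16 words each); the function name and the example address are illustrative only.

```python
WORD_BITS = 4  # 16 words per block
SET_BITS = 6   # 64 sets
TAG_BITS = 6   # remaining high-order bits of a 16-bit address


def split_address(addr: int):
    """Split a 16-bit main memory address into (tag, set index, word)."""
    if not 0 <= addr < (1 << (TAG_BITS + SET_BITS + WORD_BITS)):
        raise ValueError("address does not fit in 16 bits")
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


# Example: word 5 of main memory block 129.
addr = 129 * 16 + 5
print(split_address(addr))  # (2, 1, 5): block 129 maps to set 129 mod 64 = 1
```

Blocks 1, 65, 129, ... all map to set 1; the tag distinguishes which of them currently occupies a block of that set.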
The time required to find an item stored in memory can be reduced considerably if stored data
can be identified for access by the content of the data itself rather than by an address. Match
logic is used in the associative memory to identify the data item.
This nature of look-up explains why this scheme is called Translation Look-aside Buffer (TLB)
scheme.
The basic TLB buffering scheme is shown in Figure.
Suppose we wish to access page frame p. The following three possibilities may arise:
1. Cache presence: There is a copy of the page frame p in the look-aside buffer, which acts as the cache. In this case it is procured from the look-aside buffer.
2. Page table presence: The cache does not have a copy of the page frame p, but the page table
access results in a page hit. The page is accessed from the main memory.
3. Not in page table: In this case the page frame has neither a copy in the cache buffer
nor an entry in the page table. Clearly, this is a page fault. It is handled
exactly as a page fault is normally handled.
Fig. 7.2.6: Basic TLB buffering scheme (CPU, TLB, Page Table, Main Memory, Offset)
With 32-bit addresses and 1 kB pages, the VPN and PPN are 22 bits each. With 128 entries and 4-way
set associativity, there are 32 sets in the TLB, so 5 bits of the VPN are used to select a set.
Therefore, we only have to store 17 bits of the VPN in order to determine if a hit has occurred,
but we need all 22 bits of the PPN to determine the physical address of a virtual address. This
gives a total of 41 bits per TLB entry (17 + 22 = 39 bits; the remaining 2 bits are presumably status bits
such as valid and dirty). 41 × 128 = 5,248 bits = 5.125 Kbits.
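The TLB sizing arithmetic above can be reproduced with a short sketch. The `status_bits=2` parameter is an assumption (for example one valid and one dirty bit) introduced so that the entry size comes to the stated 41 bits; the function name is hypothetical.

```python
def tlb_size(addr_bits=32, page_bytes=1024, entries=128, ways=4, status_bits=2):
    """Return the field widths and total storage (in bits) of a set-associative TLB.

    status_bits=2 is an assumption, not stated in the text.
    page_bytes and entries // ways are assumed to be powers of two.
    """
    offset_bits = page_bytes.bit_length() - 1   # log2 of the page size
    vpn_bits = addr_bits - offset_bits          # virtual page number
    ppn_bits = addr_bits - offset_bits          # physical page number
    sets = entries // ways
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    tag_bits = vpn_bits - index_bits            # VPN bits that must be stored
    entry_bits = tag_bits + ppn_bits + status_bits
    return {
        "vpn_bits": vpn_bits,
        "sets": sets,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "entry_bits": entry_bits,
        "total_bits": entry_bits * entries,
    }


size = tlb_size()
print(size)
print(size["total_bits"] / 1024, "Kbits")  # 5.125 Kbits
```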
A memory mapping table is used to translate a virtual address to a physical address.
The virtual memory is divided into pages and the main memory is divided into blocks or frames.
The size of a page must be equal to the size of a block or frame.
Random access memory page table techniques or associative memory page table techniques can
be used to translate a virtual address into a main memory address.
In the associative memory page table technique, the number of memory locations required to store
the memory mapping table is equal to the number of blocks available in the main memory.
The wastage of memory is minimum in the case of the associative memory page table technique. The
most commonly used page replacement algorithms in virtual memory are (a) first-in-first-out (FIFO)
and (b) least recently used (LRU).
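The two replacement policies can be compared with a minimal simulation that counts page faults. The reference string and the choice of 3 frames are a common textbook-style example chosen here for illustration; they are not taken from this text.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, frames):
    """Count page faults when the oldest loaded page is replaced first."""
    resident, order, faults = set(), deque(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(order.popleft())  # evict the earliest arrival
            resident.add(page)
            order.append(page)
    return faults


def lru_faults(refs, frames):
    """Count page faults when the least recently used page is replaced."""
    resident, faults = OrderedDict(), 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) == frames:
                resident.popitem(last=False)  # evict the least recently used
            resident[page] = True
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print("FIFO faults:", fifo_faults(refs, 3))  # 15
print("LRU faults :", lru_faults(refs, 3))   # 12
```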
7.3: Pipeline
The speed of execution of programs is influenced by many factors. One way to improve
performance is to use faster circuit technology to build the processor and the main memory.
Another possibility is to arrange the hardware so that more than one operation can
be performed at the same time.
Pipelining is a particularly effective way of organizing concurrent activity in a computer system.
A pipeline is commonly compared to an assembly-line operation.
There are two areas of computer design where the pipeline organization is applicable.
1. An Arithmetic Pipeline: It divides an arithmetic operation into sub-operations for execution
in the pipeline segments.
Pipelined arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
2. An Instruction Pipeline: It operates on a stream of instructions by overlapping the fetch,
decode and execute phases of the Instruction Cycle.
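Example of an instruction pipeline with five stages: IF (instruction fetch), ID (instruction decode), OF (operand fetch), EX (execute), WB (write back).
Total time with pipeline $= [k + (n - 1)]\,T$, where k is the number of pipeline stages, n is the number of instructions, and T is the clock cycle time.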
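Speed Up: The speed up of pipelined processing over equivalent non-pipelined processing is
defined by the ratio of the time taken without the pipeline to the time taken with the pipeline:
$S = \frac{T_{\text{non-pipeline}}}{T_{\text{pipeline}}} = \frac{n\,k\,T}{[k + (n - 1)]\,T}$
where k is the number of pipeline stages, n is the number of instructions, and T is the clock cycle time (assuming each instruction takes k cycles without pipelining).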
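A small numeric sketch of pipeline timing and speed up under the usual idealized assumptions: k stages of equal duration T, no stalls, and k·T per instruction without pipelining. The function names and the example values (k = 5, n = 100) are illustrative.

```python
def pipelined_time(k: int, n: int, t: float) -> float:
    """First instruction takes k cycles; each later one completes 1 cycle after."""
    return (k + (n - 1)) * t


def non_pipelined_time(k: int, n: int, t: float) -> float:
    """Assumes every instruction needs all k steps, one after another."""
    return n * k * t


def speedup(k: int, n: int) -> float:
    """Ratio of non-pipelined to pipelined time; T cancels out."""
    return (n * k) / (k + (n - 1))


k, n, t = 5, 100, 1.0  # hypothetical: 5 stages, 100 instructions, 1 time unit
print(pipelined_time(k, n, t))      # 104.0
print(non_pipelined_time(k, n, t))  # 500.0
print(round(speedup(k, n), 3))      # 4.808
```

As n grows, the speed up approaches k, the number of stages, which is the theoretical maximum for an ideal pipeline.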
Fig. 7.3.1: (a) Sequential execution of instructions I1, I2, I3, each as a fetch step followed by an execute step (F1 E1, F2 E2, F3 E3). (b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit. A timing diagram (clock cycles against instructions I1 to I3) shows the pipelined execution of the fetch and execute steps.
[Figure: four-stage pipeline. Timing of instructions I1 to I4 through steps F, D, E, W in successive clock cycles, and the corresponding hardware organization with interstage buffers B1, B2, B3. F: fetch instruction; D: decode instruction and fetch operands; E: execute operation; W: write results.]
The instruction fetched by the fetch unit is deposited in an intermediate storage buffer. This
buffer is needed to enable the execution unit to execute the instruction while the fetch unit is
fetching the next instruction.
The computer is controlled by a clock whose period is such that the fetch and execute steps of
any instruction can each be completed in one clock cycle.
An interstage storage buffer, B1, is needed to hold the information being passed from one stage
to the next. New information is loaded into this buffer at the end of each clock cycle.
A pipelined processor may process each instruction in 4 steps:
F (Fetch), D (Decode), E (Execute), W (Write)
The sequence of events for this case is shown in Figure 7.3.3. Four instructions are in progress at
any given time. This means that four distinct hardware units are needed, as shown in Figure
7.3.4. These units must be capable of performing their tasks simultaneously and without
interfering with one another. Information is passed from one unit to the next through a storage
buffer. As an instruction progresses through the pipeline, all the information needed by the
stages downstream must be passed along. For example, during clock cycle 4, the information in
the buffers is as follows:
Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction-decoding unit.
Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in
cycle 3. The buffer also holds the information needed for the write step of instruction I2 (step
W2). Even though it is not needed by stage E, this information must be passed on to stage W
in the following clock cycle to enable that stage to perform the required Write operation.
Buffer B3 holds the results produced by the execution unit and the destination information
for instruction I1.
Pipeline Performance:
In the four-stage pipeline above, the processor completes the processing of one instruction in each clock cycle, which
means that the rate of instruction processing is four times that of sequential operation.
For a variety of reasons, one of the pipeline stages may not be able to complete its processing task
for a given instruction in the time allotted. Some operations, such as divide, may require more
time to complete. Idle periods are called stalls. They are also referred to as bubbles in the
pipeline. Once created as a result of a delay in one of the pipeline stages, a bubble moves
downstream until it reaches the last unit. If the delay lasts for two clock cycles, for example, the
pipeline is said to have been stalled for two clock cycles. Any condition that causes the pipeline to stall is called a hazard.
A data hazard is any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result some operation
has to be delayed, and the pipeline stalls.
The pipeline may also be stalled because of a delay in the availability of an instruction. For
example, this may be a result of a miss in the cache, requiring the instruction to be fetched from
the main memory. Such hazards are often called control hazards (or instruction hazards).
A third type of hazard that may be encountered in pipelined operation is known as a structural
hazard. This is the situation when two instructions require the use of a given hardware resource
at the same time. The most common case in which this hazard may arise is in access to memory.
More generally, a structural hazard occurs when the processor's hardware is not capable of executing all
the instructions in the pipeline simultaneously.
If instructions and data reside in the same cache unit, only one instruction can proceed and the
other instruction is delayed. Many processors use separate instruction and data caches to avoid
this delay.
Data Hazards:
A data hazard is a situation in which the pipeline is stalled because the data to be operated
on are delayed for some reason. We must ensure that the results obtained when instructions are
executed in a pipelined processor are identical to those obtained when the same instructions are
executed sequentially.
Pipeline Conflicts: In general, there are three major difficulties that cause the instruction pipeline
to deviate from its normal operation.
1. Resource conflicts are caused by access to memory by two segments at the same time. Most
of these conflicts can be resolved by using separate instruction and data memories.
2. Data dependency conflicts arise when an instruction depends on the result of a previous
instruction, but this result is not yet available.
3. Branch difficulties arise from branch and other instructions that change the value of the PC.
Data Hazards avoiding techniques:
1. Hardware Interlock or delayed load or bubble: The hardware interlock preserves the
correct execution pattern of an instruction sequence. In general a hardware interlock
detects the data hazard and stalls the pipeline until the hazard is cleared.
2. Operand Forwarding: Operand forwarding is a hardware technique to minimize the
stalls in the pipelined execution of a sequence of instructions. The key insight of forwarding
is that the result of a previous instruction is fed directly to the next instruction through
the pipeline registers, without waiting for it to be written to the register file in the WB stage.
Branch Hazards avoiding techniques
1. Pipeline flushing: The simplest solution for handling branches is to freeze or flush the
pipeline, deleting any instructions after the branch until the target of the branch is known.
2. Branch prediction: Continuing with the sequence of instructions as if the branch were taken
or not taken.
3. Delayed Branching: A branch delay slot is introduced after the branch instruction and
filled with an instruction that will be executed irrespective of whether the branch is taken or not.
There are two types of Branch Prediction techniques.
They are
done by compiler
is branch penalty.
Anti Dependence or Write After Read (WAR) Hazard: Dependence resulting from reuse of a name.
Data Dependence, Read After Write (RAW) Hazard or True Dependence: Dependence
resulting from use of a data value produced by an earlier statement.
Output Dependence or Write After Write (WAW) Hazard: Dependence resulting from writing a value
before a preceding write has completed.
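The three dependence types can be detected mechanically by comparing the registers that two instructions read and write. The sketch below assumes a simplified, hypothetical instruction representation `(destination, sources)`; it is not tied to any particular instruction set.

```python
def classify_dependences(first, second):
    """Return the dependences of `second` on the earlier instruction `first`.

    Each instruction is a (destination, sources) pair, where destination is a
    register name (or None) and sources is a set of register names.
    """
    dest1, src1 = first
    dest2, src2 = second
    found = []
    if dest1 is not None and dest1 in src2:
        found.append("RAW")  # second reads what first writes (true dependence)
    if dest2 is not None and dest2 in src1:
        found.append("WAR")  # second overwrites what first reads (anti dependence)
    if dest1 is not None and dest1 == dest2:
        found.append("WAW")  # both write the same register (output dependence)
    return found


add = ("R1", {"R2", "R3"})  # R1 <- R2 + R3
sub = ("R4", {"R1", "R5"})  # R4 <- R1 - R5
mov = ("R2", {"R6"})        # R2 <- R6
inc = ("R1", {"R7"})        # R1 <- R7 + 1

print(classify_dependences(add, sub))  # ['RAW']
print(classify_dependences(add, mov))  # ['WAR']
print(classify_dependences(add, inc))  # ['WAW']
```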
certain pieces of code are only executed if a condition is met (this involves conditional branches),
and
loops are possible, so that pieces of code can be executed over and over again.
Micro-operation: It is the elementary operation performed on the binary data stored in the
internal registers of a digital computer during one clock period. The result of the operation
may replace the previous binary information of a register or may be transferred to another
register.
Macro-operation or operation: A set of micro-operations is called a macro-operation.
Instruction Code: It is a group of bits that tell the computer to perform a specific operation. It
is divided into parts called fields, each having its own particular interpretation. The most
basic part of an instruction code is its operation part.
The basic instruction code format consists of two parts or two fields:
Opcode | Address
OpCode: It is also called Operation Code field. It is a group of bits that define the operation
performed by the instruction.
The number of bits required for the operation part of an instruction code is a function of the
total number of operations used. It must consist of at least n bits for $2^n$ (or fewer)
distinct operations.
The address part of the instruction tells the control, where to find an operand in the
memory.
Operand: It is the data on which the given operation is to be performed.
Stored Program Concept: It is the most important concept used in all digital computers.
In this concept the instructions are stored in one section of the memory and data is stored in
another section of the memory. The CPU fetches the instructions from the memory one by one,
and they are then decoded and executed.
Computers that have a single processor register usually assign to it the name accumulator and label
it AC.
In general, the length of the accumulator register or processor register must be equal to the
length of each memory location.
In general, the basic registers available in the digital computer are the Program Counter (PC),
Memory Buffer Register (MBR), Memory Address Register (MAR), Operation Register (OPR),
Accumulator register (AC), Mode bit flip-flop (I), Extension (E) flip-flop, etc.
PC: The Program Counter always holds the address of the next instruction to be fetched from
the memory location.
The length of the PC always depends on the addressing capacity of the CPU or the number of
memory locations available in the memory. If the memory consists of $2^n$ memory locations,
then an n-bit PC is required.
OPR: The operation register is used to hold the operation part of the instruction. The length
of this register depends on the length of the operation part of the instruction.
The length of MAR must be equal to the length of PC.
The length of MBR is equal to the length of AC and length of each memory location.
The mode flip-flop (I) is used to hold the mode bit. With the help of this mode bit the CPU can
distinguish a direct address instruction from an indirect address instruction.
The E flip-flop is an extension of the AC. It is used during shift operations. It receives the end
carry during addition, etc.
The basic instruction code formats can be of three types: Immediate operand instructions,
Direct address instructions, and Indirect address instructions.
Immediate operand instructions: If the second field of an instruction code specifies an
operand, the instruction is said to have an immediate operand.
Op Code | Operand
Direct address instructions: If the second field of an instruction code specifies the address of
an operand, then it is called a direct address instruction.
Op Code | Address of operand
Indirect address instructions: If the second field of an instruction code designates the address
of a memory word in which the address of the operand is found, it is called an
indirect address instruction.
Op Code | Address of the memory word that holds the operand address
To distinguish direct and indirect address instructions, a third field (a one-bit field) called the Mode
field is used in the instruction code format.
I | Op Code | Address of operand
The most common fields found in the instruction code formats are:
(a) mode bit field
(b) Op Code field and
(c) address field.
The CPU requires zero memory references to complete the execution of the immediate type
of instruction, once the instruction code is transferred from memory into the CPU.
The CPU requires one memory reference to complete the execution of the direct address
type of instruction, once the instruction code is transferred from memory into the CPU.
The CPU requires two memory references to complete the execution of the indirect address
type of instruction, once the instruction code is transferred into the CPU.
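The memory reference counts for the three instruction types can be illustrated with a toy operand fetch. The dictionary used as memory, the addresses, and the function name are assumptions made for this sketch; the instruction itself is taken to be already in the CPU.

```python
def fetch_operand(mode: str, field: int, memory: dict):
    """Return (operand, number of memory references) for the second field."""
    if mode == "immediate":
        return field, 0                 # the field is the operand itself
    if mode == "direct":
        return memory[field], 1         # the field is the operand's address
    if mode == "indirect":
        operand_address = memory[field]     # first reference: get the address
        return memory[operand_address], 2   # second reference: get the operand
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical memory contents: location 100 holds 200, location 200 holds 42.
memory = {100: 200, 200: 42}

print(fetch_operand("immediate", 100, memory))  # (100, 0)
print(fetch_operand("direct", 100, memory))     # (200, 1)
print(fetch_operand("indirect", 100, memory))   # (42, 2)
```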
Depending on the way the CPU refers to the memory, input-output and registers, the
instructions of the basic computer can be of three types: Memory reference instructions,
Register reference instructions and Input-Output reference instructions.
Memory reference instructions: The CPU is supposed to refer to the memory to get the
operand for the completion of the execution of the instruction.
Register reference instructions: The CPU is supposed to refer to the internal registers of the
CPU to get the operand for the execution of the instruction.
Input-Output reference instructions: The CPU is supposed to refer to the input or output devices
to complete the execution of the instruction.
Most computers fall into one of three types of CPU organization: Single Accumulator Organization,
General Register Organization and Stack Organization.
Based on the number of address fields available in the address part of the instruction, there
are four different types of instructions.
1. Three address instructions
2. Two address instructions
3. One address instructions
4. Zero address instructions
Three address instructions: The address field of the instruction code is divided into three sub
fields. Example: ADD R1, R2, R3
(R1) ← (R2) + (R3)
The advantage of the three-address format is that it results in short programs when
evaluating arithmetic expressions.
Two address instructions: In this type of instruction the address field of the instruction code
is divided into two sub fields. Example: ADD R1, R2 (R1) ← (R1) + (R2)
An example of a computer which supports two address instructions is the PDP-11.
One address instructions: In this type of instruction, an implied accumulator (AC) register
is used for all data manipulation.
Example: LOAD A, AC ← M[A]
ADD B, AC ← M[B] + AC
An example of a computer which supports one address instructions is the PDP-8.
Zero address instructions: Some operational instructions do not require an address field; such
instructions are called zero address instructions.
Example: A computer which supports zero address instructions is the Burroughs 6700.
Single accumulator organized computers use one address instructions.
General register organized computers use two and three address instructions.
Stack organized computers use zero address instructions.
Stack: It is a set of memory locations of the RAM which are used to store information in such
a manner that the item stored last is the first item retrieved.
A very useful feature that is included in the CPU of many computers is a stack or Last In
First Out (LIFO) list.
Stack Pointer (SP): It is a register which always holds the address of the top of the stack. The
length of the stack pointer register is equal to the length of the PC register.
PUSH and POP instructions are used to communicate with the stack memory.
FULL and EMPTY flip-flops are used to indicate the status of the stack memory.
Arithmetic expressions can be written in one of three ways:
1. Infix notation
2. Prefix notation
3. Postfix notation
If the operator is placed between the operands, it is called infix notation.
Ex: A+B
The prefix notation is also called Polish notation. In this notation the operator is placed
before the operands. Ex: +AB
The postfix notation is also called reverse Polish notation. In this notation the operator is
placed after the operands. Ex: AB+
The reverse Polish notation is suitable for stack organized computers.
The expression A*B + C*D can be represented in reverse Polish notation as AB*CD*+
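The way a stack organized computer would evaluate AB*CD*+ can be sketched in a few lines of Python. This is only an illustration: the function name evaluate_rpn and the sample operand values are assumptions, not part of the text. Operands are pushed, and each operator pops two items and pushes the result.

```python
def evaluate_rpn(tokens, values):
    """Evaluate a reverse Polish expression with an explicit LIFO stack."""
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*", "/"):
            right = stack.pop()          # item pushed last is retrieved first
            left = stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            elif tok == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)
        else:
            stack.append(values[tok])    # PUSH the operand's value
    return stack.pop()                   # POP the final result


# AB*CD*+ with illustrative values A=2, B=3, C=4, D=5, i.e. 2*3 + 4*5
print(evaluate_rpn(list("AB*CD*+"), {"A": 2, "B": 3, "C": 4, "D": 5}))  # 26
```

No address field is needed for the operators themselves, which is why zero address instructions fit this organization.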
Features of CISC and RISC
CISC: Complex instructions; variable format; memory operands; multiple clocks are required to
complete an instruction; difficult to take advantage of pipelining.
RISC: Simple operations; load/store architecture; an instruction usually takes a single clock.
Name                         Assembler Syntax   Addressing function
Implied                      NIL                Implied
Immediate                    #value             Operand = Value
Register                     Ri                 EA = Ri
Absolute (Direct)            LOC                EA = LOC
Indirect                     (Ri)               EA = [Ri]
Index                        X(Ri)              EA = [Ri] + X
Base with index              (Ri, Rj)           EA = [Ri] + [Rj]
Base with index and offset   X(Ri, Rj)          EA = [Ri] + [Rj] + X
Relative                     X(PC)              EA = [PC] + X
Auto increment               (Ri)+              EA = [Ri]; Increment Ri
Auto decrement               -(Ri)              Decrement Ri; EA = [Ri]
Effective Address: The effective address is defined to be the memory address obtained from
the computation dictated by the given addressing mode. The effective address is the address
of the operand in a computational type instruction.
Implied mode: In this mode the operands are specified implicitly in the definition of the
instruction. Zero address instructions in a stack organized computer are implied-mode
instructions.
Ex: PUSH
Immediate Mode: In this mode the operand is specified in the instruction itself.
Register Mode: In this mode the operands are in registers that reside within the CPU.
Register Indirect Mode: In this mode the instruction specifies a register in the CPU whose
contents give the address of the operand in memory.
Auto increment or Auto decrement: This is similar to the register indirect mode except that
the register is incremented or decremented after (or before) its value is used to access
memory.
Direct Address Mode: In this mode the effective address is equal to the address part of the
instruction.
Indirect Address mode: In this mode the address field of the instruction gives the address
where the effective address is stored in the memory.
Relative Address Mode: In this mode the content of the program counter (PC) is added to the
address part of the instruction in order to obtain the effective address. Relative addressing is
often used with branch type instruction when the branch address is in the area surrounding
the instruction word itself.
Indexed Addressing Mode: In this mode the content of an index register is added to the
address part of the instruction to obtain the effective address. The index register is a special
CPU register that contains an index value.
Base Register Addressing Mode: In this mode the content of a base register is added to the
address part of the instruction to obtain the effective address.
The addressing modes supported by one processor may differ from the addressing modes
supported by the other processors.
(A) Direct = 400
(B) Immediate = 301
(C) Relative = 302+400=702
(D) Register indirect = 200
(E) Indexed = 200+400 = 600
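The figures above can be reproduced with a short Python sketch. The setup is an assumption inferred from the listed values, since it is not stated here: a two-word instruction stored at address 300, its address field (400) stored in the second word at 301, the PC holding 302 after the fetch, and both the register R1 and the index register holding 200. The function name effective_addresses is hypothetical.

```python
def effective_addresses(instr_addr, address_field, r1, index_reg):
    """Effective address for each mode (assumed two-word instruction layout)."""
    operand_word = instr_addr + 1        # second word holds the address field
    pc_after_fetch = instr_addr + 2      # PC already points to the next instruction
    return {
        "direct": address_field,                     # EA = address part
        "immediate": operand_word,                   # operand is inside the instruction
        "relative": pc_after_fetch + address_field,  # EA = PC + address part
        "register indirect": r1,                     # EA = content of the register
        "indexed": index_reg + address_field,        # EA = index register + address part
    }


for mode, ea in effective_addresses(300, 400, 200, 200).items():
    print(mode, ea)
```

With these assumed values the sketch prints 400, 301, 702, 200 and 600, matching (A) to (E).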
Computer cycle: Digital computers provide a timing sequence of 8 to 16 repetitive timing
signals. The time of one repetitive sequence is called a computer cycle.
Types of computer cycles are Fetch cycle, Indirect cycle, Execute cycle and Interrupt cycle.
Fetch cycle: When an instruction is read from memory the computer is said to be in an
instruction fetch cycle. The first cycle of any instruction cycle must be always a fetch cycle.
Indirect cycle: When the word read from the memory is an address of operand, the computer
is said to be in an indirect cycle.
Execute cycle: When the word read from memory is an operand, the computer is said to be in
an execute cycle.
The execute cycle can come after fetch cycle or indirect cycle.
I/O Mapping
Memory mapped I/O.
Devices and memory share an address space.
I/O looks just like memory read/write.
No special commands for I/O.
Large selection of memory access commands available.
Isolated I/O
Separate address space.
Need I/O or memory select lines.
Special commands for I/O.
Limited set.
Basic Operation
CPU issues read command.
I/O module gets data from peripheral devices while CPU does other work.
I/O module interrupts CPU.
CPU requests data.
I/O module transfers data.
In the memory mapped I/O technique, the I/O devices are also treated as memory locations,
under the assumption that they will be given addresses. The same control lines are used to
activate memory locations as well as I/O devices.
In the I/O mapped (isolated) I/O technique, the I/O devices are given separate addresses and
separate control signals are used to activate memory locations and I/O devices.
Data transfer between CPU and peripherals is handled in one of three possible modes:
Program controlled data transfer.
Interrupt initiated data transfer.
Direct Memory Access (DMA) transfer.
Program controlled operations are the result of I/O instructions written in the computer
program. Each data item transfer is initiated by an instruction in the program.
The disadvantage of program controlled data transfer is that the processor stays in a
program loop until the I/O unit indicates that it is ready. This is a time-consuming process
since it keeps the processor busy needlessly.
In interrupt initiated data transfer, when the peripheral is ready for data transfer, it
generates an interrupt request to the processor. Then the processor stops momentarily the
task it is doing, branches to a service routine to process the data transfer and then returns to
the task it was performing.
In Direct Memory Access (DMA), the interface transfers data into and out of the memory unit
through the memory bus. The DMA technique is generally used to transfer bulk amounts of data
from memory to peripheral or from peripheral to memory.
There are basically two formats of data transfer: parallel and serial.
In parallel mode, data bits (usually a byte) are transferred in parallel over the communication
lines referred to as buses. Thus all the bits of a byte are transferred simultaneously within
the time frame allotted for the transmission.
In serial data transfer, each data bit is sent sequentially over a single data line.
In order to implement serial data transfer, the sender and receiver must divide the
timeframe allotted for the transmission of a byte into subintervals during which each bit is
sent and received.
In serial transmission the information is transferred in the form of frames. The frame
consists of three parts. Start bit, character code and stop bits.
Start bit is always logic 0 and stop bits are always at logic 1 level.
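The frame layout described above can be sketched in Python as follows. The function name build_frame, the 8-bit character code, and sending the least significant bit first are assumptions made for illustration; the number of stop bits is left as a parameter.

```python
def build_frame(byte, stop_bits=1):
    """Return the bits of one serial frame in the order they are transmitted."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("character code must fit in 8 bits")
    start = [0]                                        # start bit is always logic 0
    data = [(byte >> i) & 1 for i in range(8)]         # assumed order: LSB first
    stop = [1] * stop_bits                             # stop bits are always logic 1
    return start + data + stop


frame = build_frame(0x41)   # character code 0100 0001
print(frame)                # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

Each list element occupies one subinterval of the time frame allotted to the byte, so sender and receiver must agree on the subinterval length.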
The data transfer in both the parallel and serial modes of operation can be either
synchronous or asynchronous.
In synchronous mode, both the source unit and the destination unit work in synchronism with
the same control signal.
In asynchronous mode, the source unit and destination unit have their own independent
control signals.
Asynchronous parallel data transfer can be of three types: (a) Strobe control (b) Two wire
handshaking method and (c) Three wire handshaking method.
Asynchronous parallel transfer
o Strobe control
Source initiated
Destination initiated
o Two wire handshaking
Source initiate
Destination initiate
o Three wire handshaking
Asynchronous data transfer between two independent units requires control signals to be
transmitted between the communicating units to indicate the time at which data is being
transmitted. One way of achieving this is by means of a strobe pulse supplied by one of the
units to indicate to the other unit when the transfer has to occur.
The exchange of control signals between source and destination units during the data transfer
operation is called handshaking.
The disadvantage of the two-wire handshaking method is that it is not possible to connect more
than one destination unit to a single source unit.
A UART (universal asynchronous receiver-transmitter) is used to interface a serial I/O device
to a parallel bus structure.
In DMA transfer, the CPU initializes the DMA by sending a memory address and the number
of words to be transferred.
During DMA transfer, the CPU is idle and has no control of the memory buses.
A DMA controller takes over the buses to manage the transfer directly between the I/O
device and memory.
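[Figure: CPU bus signals for DMA transfer. The CPU receives bus request (BR) and issues bus
grant (BG); when BG is enabled, the address bus (ABUS), data bus (DBUS), read (RD) and write
(WR) lines are placed in the high-impedance (disabled) state.]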
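[Figure: Daisy-chain priority interrupt. Devices 1 to 3, each with a vector address (VAD 1 to
VAD 3) and priority-in (PI) and priority-out (PO) lines, share a common interrupt request line
(INT) to the CPU; the interrupt acknowledge (INTACK) from the CPU is passed from device to
device, and the PO of Device 3 goes to the next device.]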
The parallel priority interrupt method uses a register whose bits are set separately by the
interrupt request from each device. Priority is established according to the position of the
bits in the register.
Interrupts
1 Introduction
An interrupt is a signal, either sent by an external device or generated internally by the
software, indicating the need for attention or a change in execution.
Interrupts arising from external events are called asynchronous interrupts.
Interrupts arising from internal events are generated by the software and are generally called
synchronous interrupts.
The following are examples of both types of interrupts:
Externalities: Interrupts typically come from I/O devices which have completed a task, have
run out of some resource or have run into some problem, such as an attempt to write to a
write-disabled disk.
Page faults: Running under virtual memory implies that many parts of a program will reside
on disk. If references to those disk-resident pages cannot be answered in reasonable time, a
fault interrupt is issued which calls for a context switch to a task that can make use of the
CPU while the missing pages are brought in from disk.
Address translation errors: Address translation errors occur on events such as trying to write
to read-only memory or doing any operation in a space that is not open to the running
program.
Illegal instruction faults: This type of fault includes undefined opcodes and instructions
reserved for a higher level of privilege.
Arithmetic errors: This class includes divide-by-zero, word and half-word overflow.
Direct Memory Access (DMA)
1. Overview
Direct Memory Access (DMA) is an operational transfer mode which allows data transfer within
memory or between memory and an I/O device without the processor's intervention. A special DMA
controller manages the data transfer. A DMA controller can be implemented as a controller
separate from the processor or as a controller integrated in the processor. In the following,
an integrated DMA controller is considered.
The DMA mechanism provides two unique methods for performing DMA transfers:
Demand-mode transfer (synchronized to external hardware): Typically used for transfers
between an external device and memory. In this mode, external hardware signals are
provided to synchronize DMA transfers with external requesting devices.
Block-mode transfer (non-synchronized): Typically used to move blocks of data within
memory.
To perform a DMA operation, the DMA controller uses microcode, the core's multi-process
resources, the bus controller and internal hardware dedicated to the DMA controller. Loads and
stores are executed in DMA microcode to perform each DMA transfer. Multi-process resources
are used to enable DMA operations to be executed concurrently with the user's program. The bus
controller, directed by the DMA microcode, handles data transactions in external memory.
External bus access is shared equally between the user process and the DMA process. The bus
controller executes bus requests from each process in alternating fashion. The DMA controller
hardware synchronizes transfers with external devices or memory, provides the programmer's
interface to the DMA controller itself, and manages the priority for servicing DMA requests.
2. Data transfers
Different DMA transfer modes are explained in the following paragraphs.
Multi-cycle Transfer
A multi-cycle transfer comprises two or more bus requests: loads from a source address are
followed by stores to a destination address. To execute the transfer, DMA microcode issues the
proper combination of bus requests. The processor effectively buffers the data for each transfer.
When the DMA is configured for destination synchronization, the DMA controller buffers source
data, waiting for the request from the destination requestor. The initial DMA request still
requires the source data to be loaded before the request is acknowledged.
[Figure: Multi-cycle DMA transfers over the external system bus. The integrated DMA controller,
handshaking with the requesting device through DREQ and DACK, loads data from the data source
into its buffer and then stores it to the data destination; one side of each transfer is 32-bit
memory.]
Throughput describes how fast data is moved by a DMA operation. It is defined as the number of
controller clock cycles per DMA request; this value is denoted here as N. The established
measure of throughput, in units of bytes/second, is derived by the following equation:
Throughput (Bytes/Second) = (B / N) × f
where B is the number of bytes transferred per DMA request and f is the controller clock
frequency in Hz.
[Figure: Block diagram of a DMA controller: data bus buffer, address bus buffer, address
register, control register and control logic, with DMA select (DS), register select (RS),
read (RD), write (WR), bus request (BR), bus grant (BG), interrupt and DMA request lines.]
INPUT/OUTPUT:
The devices that are connected to the periphery of the CPU are called peripheral devices.
Example: input and output devices.
Input devices: An input device is a medium of communication from the user to the computer.
Example: keyboard, floppy disk drive, hard disk drive, mouse, magnetic tape drive, paper
tape reader, card reader, VDU etc.
Output devices: An output device is a medium of communication from the computer to the user.
Example: VDU, printers, floppy disk drive, punched cards, plotters etc.
On line devices: Devices that are under the direct control of the processor are said to be
connected on line.
Off line devices: When a device is off line, it is operated independently of the computer.
All peripheral devices are electromechanical and electromagnetic devices.
The I/O organization of a computer is a function of the size of the computer and the devices
connected to it.
Auxiliary Memory: The devices that provide backup storage are called auxiliary or secondary
memory. The secondary memory is not directly accessible to the CPU.
Example: Hard disk, floppy disk, magnetic tape etc.
The important characteristics of any device are its access mode, access time, transfer rate,
capacity, and cost.
Seek Time: It is the time required, to move the read/write head to the proper track. This depends
on the initial position of the head relative to the track specified in the address.
Rotational delay (latency time): This is the amount of time that elapses after the head is
positioned over the correct track until the starting position of the addressed sector passes under
the read/write head. On average, this is the time for half a rotation of the disk.
Access Time: The sum of these two delays is called the disk access time.
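As a worked illustration of these definitions, the Python sketch below adds a given seek time to the average rotational delay (half a rotation). The function names and the sample figures (8 ms seek, 6000 rpm) are assumptions chosen only to make the arithmetic easy to follow.

```python
def average_rotational_delay_ms(rpm):
    """Average latency: the time for half a rotation, in milliseconds."""
    rotation_ms = 60_000 / rpm        # one full rotation in ms
    return rotation_ms / 2


def access_time_ms(seek_ms, rpm):
    """Disk access time = seek time + rotational delay."""
    return seek_ms + average_rotational_delay_ms(rpm)


# 6000 rpm gives 10 ms per rotation, so 5 ms average latency
print(access_time_ms(8, 6000))  # 13.0
```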
Disk Controller:
Operation of a disk drive is controlled by a disk controller, which also provides an interface
between the disk drive and the bus that connects it to the rest of the computer system. The disk
controller may be used to control more than one drive. The disk controller keeps track of
defective sectors and substitutes other sectors instead.
Main memory Address: The address of the first main memory location of the block of words
involved in the transfer.
Disk Address: The location of the sector containing the beginning of the desired block of words.
Word Count: The number of words in the block to be transferred.
Magnetic Hard Disks
[Figure: Disk surface organized into tracks and sectors.]
RAID Disk Arrays: RAID stands for Redundant Array of Inexpensive Disks. Using multiple disks
also makes it possible to improve the reliability of the overall system. Six different
configurations were proposed. They are known as RAID levels even though there is no hierarchy
involved.
RAID 0 is the basic configuration intended to enhance performance. A single large file is stored
in a several separate disk units by breaking the file up into a number of smaller pieces and
storing these pieces on different disks. This is called data striping.
RAID 1 is intended to provide better reliability by storing identical copies of data on two disks
rather than just one. The two disks are said to be the mirrors of each other.
RAID 2, RAID 3 and RAID 4 levels achieve increased reliability through various parity checking
schemes without requiring a full duplication of disks. All of the parity information is kept on one
disk.
RAID 5 also makes use of parity based error recovery scheme. However, the parity information
is distributed among all disks, rather than being stored on one disk. Indeed, the term RAID has
been redefined by the industry to refer to independent disks.
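The idea behind parity-based recovery can be sketched in Python. This is a simplified illustration that assumes simple XOR parity over equal-sized blocks; it does not model any particular RAID level's layout, and the names parity_block and recover_block are hypothetical.

```python
from functools import reduce


def parity_block(blocks):
    """XOR equal-length blocks byte by byte to form the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the one missing block from the survivors and the parity."""
    return parity_block(list(surviving_blocks) + [parity])


data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # blocks striped over three disks
parity = parity_block(data)                       # kept on a further disk
rebuilt = recover_block([data[0], data[2]], parity)
print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, any single lost block can be rebuilt without keeping a full duplicate of the data, which is the saving over mirroring.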
ATA/EIDE Disks: This interface is known as EIDE (Enhanced Integrated Drive Electronics) or as
ATA (Advanced Technology Attachment). Many disk manufacturers have a range of disks that have
EIDE/ATA interfaces. In fact, Intel's Pentium chip sets include a controller that allows
EIDE/ATA disks to be connected to the motherboard. One of the main drawbacks is that a separate
controller is needed for each drive if two drives are to be used concurrently to improve
performance.
RAID Disks: RAID disks offer excellent performance and provide large and reliable storage.
They are used in high-performance computers.
Optical Disks & CD Technology:
The optical technology that is used for CD systems is based on a laser light source. A laser beam
is directed onto the surface of the spinning disk. Physical indentations in the surface are
arranged along the tracks of the disk. They reflect the focused beam toward a photo detector,
which detects the stored binary patterns.
The laser emits a coherent light beam that is sharply focused on the surface of the disk. Coherent
light consists of synchronized waves that have the same wavelength. If a coherent light beam
is combined with another beam of the same kind, and the two beams are in phase, then the result
will be a brighter beam. But if the waves of the two beams are 180 degrees out of phase, they
will cancel each other. Thus, if a photo detector is used to detect the beams, it will detect a
bright spot in the first case and a dark spot in the second case.
The bottom layer is polycarbonate plastic, which functions as a clear glass base. The surface of
this plastic is programmed to store data by indenting it with pits. The unindented parts are
called lands.
The laser source and the photo detector are positioned below the polycarbonate plastic. The
emitted beam travels through this plastic, reflects off the aluminum layer, and travels back
toward the photo detector.
CD-ROM:
Since information is stored in binary form in CDs, they are suitable for use as a storage
medium in computer systems. The biggest problem is to ensure the integrity of stored data.
Because pits are very small, it is difficult to implement all of the pits perfectly.
Stored data are organized on CD-ROM tracks in the form of blocks that are called sectors.
Error handling
Possible errors on a disk subsystem are:
Programming error: For example, the driver requests the controller to seek to a nonexistent
sector. Most disk controllers check the parameters given to them and complain if they are invalid.
Transient checksum error: These errors are caused, for example, by dust on the head. Most of the
time they are eliminated by just repeating the operation a few times. If the error persists, the
block has to be marked as a bad block and avoided.
Permanent checksum error: In this case, the disk blocks are assumed to be physically damaged.
These errors are unrecoverable, and these blocks are marked as bad blocks and avoided.
Seek error: For example, the arm was sent to cylinder 6, but it went to cylinder 7. Normally,
the controller keeps track of the arm position internally. To perform a seek, it issues a series
of pulses to the arm motor, one pulse per cylinder, to move the arm to the destination cylinder.
Then the controller reads the actual cylinder number to check whether the seek operation is
correct or not. If a seek error occurs, the controller moves the arm as far as it will go out,
resets the internal current cylinder to 0 and tries again. If this does not help, the drive must
be repaired.
Disk controller error: The controller refuses to accept commands from the connected computer. It
has to be replaced.
Disk Structure:
Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block
is the smallest unit of transfer.
The 1-dimensional array of logical blocks is mapped into the sectors of the disk
sequentially.
Sector 0 is the first sector of the first track on the outermost cylinder.
Mapping proceeds in order through that track, then the rest of the tracks in that cylinder,
and then through the rest of the cylinders from outermost to innermost.
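The sequential mapping described above can be sketched in Python. The sketch assumes the same number of sectors on every track (no zoned recording) and numbers cylinders from the outermost (0) inward; the function name block_to_chs and the geometry used in the example are illustrative only.

```python
def block_to_chs(block, tracks_per_cylinder, sectors_per_track):
    """Map a logical block number to (cylinder, track, sector)."""
    sectors_per_cylinder = tracks_per_cylinder * sectors_per_track
    # Fill one track, then the other tracks of the cylinder, then move inward.
    cylinder, rest = divmod(block, sectors_per_cylinder)
    track, sector = divmod(rest, sectors_per_track)
    return cylinder, track, sector


# Assumed geometry: 4 tracks per cylinder, 16 sectors per track
print(block_to_chs(0, 4, 16))    # (0, 0, 0)
print(block_to_chs(83, 4, 16))   # (1, 1, 3)
```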
Digital Logic
Don't care values or unused states in BCD code are 1010, 1011, 1100, 1101, 1110, 1111.
Don't care values or unused states in excess-3 code are 0000, 0001, 0010, 1101, 1110,
1111.
The binary equivalent of a given decimal number is not equivalent to its BCD value. E.g., the
binary equivalent of (25)10 is (11001)2 while its BCD equivalent is 0010 0101.
In signed binary numbers, the MSB is always the sign bit and the remaining bits are used for
magnitude.
A7          A6 A5 A4 A3 A2 A1 A0
Sign bit    Magnitude
For positive and negative binary numbers, the sign bit is 0 and 1 respectively.
Negative numbers can be represented in one of three possible ways.
1. Signed magnitude representation.
2. Signed 1's complement representation.
3. Signed 2's complement representation.
Example:
+9    0 000 1001 (the same in all three representations)
-9    (a) 1 000 1001 signed magnitude
      (b) 1 111 0110 signed 1's complement
      (c) 1 111 0111 signed 2's complement
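The three bit patterns for -9 can be derived mechanically, as in the Python sketch below. The function name signed_representations and the 8-bit default width are assumptions for illustration.

```python
def signed_representations(value, bits=8):
    """Return the three signed encodings of value as bit strings."""
    magnitude = abs(value)
    # Conservative range check: the magnitude must fit in bits-1 positions.
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit in the available bits")
    if value >= 0:
        pattern = format(magnitude, f"0{bits}b")   # sign bit 0, same in all three
        return {"signed magnitude": pattern,
                "1's complement": pattern,
                "2's complement": pattern}
    mask = (1 << bits) - 1
    sign_magnitude = (1 << (bits - 1)) | magnitude  # set sign bit, keep magnitude
    ones = ~magnitude & mask                        # invert every bit of +value
    twos = (ones + 1) & mask                        # 1's complement plus one
    return {"signed magnitude": format(sign_magnitude, f"0{bits}b"),
            "1's complement": format(ones, f"0{bits}b"),
            "2's complement": format(twos, f"0{bits}b")}


for name, pattern in signed_representations(-9).items():
    print(name, pattern)
```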
2. 0 . X = 0
3. X . 1 = X
4. 1 . X = X
b) Properties of OR function
5. X + 0 = X
6. 0 + X = X
7. X + 1 = 1
8. 1 + X = 1
10. X . X = X
11. X + X = X
12. X + X' = 1
13. (X')' = X
(X' denotes the complement of X.)
d) Commutative laws:    14. x.y = y.x                 15. x + y = y + x
e) Distributive laws:   16. x(y + z) = x.y + x.z      17. x + y.z = (x + y)(x + z)
f) Associative laws:    18. x(y.z) = (x.y)z           19. x + (y + z) = (x + y) + z
g) Absorption laws:     20. x + xy = x                21. x(x + y) = x
                        22. x + x'y = x + y           23. x(x' + y) = xy
h) De Morgan's laws:    24. (x + y)' = x'.y'          25. (x.y)' = x' + y'
Duality principle: It states that every algebraic expression deducible from the theorems of
Boolean algebra remains valid if the operators and identity elements are interchanged.
To get the dual of an algebraic function, we simply exchange AND with OR and exchange 1
with 0.
The dual of the exclusive OR is equal to its complement.
To find the complement of a function, take the dual of the function and complement
each literal.
A maxterm is the complement of its corresponding minterm and vice versa.
The sum of all 2^n minterms of an n-variable Boolean function is equal to 1.
The product of all 2^n maxterms of an n-variable Boolean function is equal to 0.
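Boolean Algebraic Theorems
Theorem No.   Theorem
1.            (A + B).(A + B') = A
2.            AB + A'C = (A + C)(A' + B)
3.            (A + B)(A' + C) = AC + A'B
4.            AB + A'C + BC = AB + A'C
5.            (A + B)(A' + C)(B + C) = (A + B)(A' + C)
6.            (A.B.C. ...)' = A' + B' + C' + ...
7.            (A + B + C + ...)' = A'.B'.C'. ...
(A' denotes the complement of A.)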
[Figure: NOT gate (inverter) circuit and symbol; input A, output Y = A'.]
AND gate: Y = AB
Truth Table
A  B  Y
0  0  0
0  1  0
1  0  0
1  1  1
[Figure: AND gate circuit and symbol.]
OR gate: Y = A + B
A  B  Y
0  0  0
0  1  1
1  0  1
1  1  1
[Figure: OR gate circuit and symbol.]
NAND gate: Y = (AB)'
A  B  Y
0  0  1
0  1  1
1  0  1
1  1  0
NOR gate: Y = (A + B)'
A  B  Y
0  0  1
0  1  0
1  0  0
1  1  0
The circuit, which is working as AND gate with positive level logic system, will work as OR
gate with negative level logic system and vice-versa.
The circuit which is behaving as NAND gate with positive level logic system will behave as
NOR gate with negative level logic system and vice versa.
Exclusive OR gate (XOR): The output of an XOR gate is high for an odd number of high
inputs.
A  B  Y
0  0  0
0  1  1
1  0  1
1  1  0
Y = A ⊕ B = A'B + AB'
Exclusive NOR gate (XNOR): The output is high for an even number of high inputs; for a
two-input gate, the output is high when both inputs are equal.
A  B  Y
0  0  1
0  1  0
1  0  0
1  1  1
Y = A ⊙ B = A'B' + AB
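Realization of Basic gates using NAND and NOR gates:
NOT gate
  NAND: Y = (A.A)' = A'  or  Y = (A.1)' = A'
  NOR:  Y = (A + A)' = A'  or  Y = (A + 0)' = A'
AND gate
  NAND: Y = ((AB)')' = AB       (two NAND gates)
  NOR:  Y = (A' + B')' = AB     (three NOR gates)
OR gate:
  NAND: Y = (A'.B')' = A + B    (three NAND gates)
  NOR:  Y = ((A + B)')' = A + B (two NOR gates)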
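[Figure: NAND gate realized using NOR gates, Y = (AB)'.]
Realization of NOR gate using NAND gates:
[Figure: NOR gate realized using NAND gates, Y = (A + B)'.]
[Figure: XOR gate, Y = A'B + AB', realized using NAND gates and using NOR gates.]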
The minimum number of NAND gates required to realize an XOR gate is four.
The minimum number of NOR gates required to realize an XOR gate is five.
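A four-NAND realization of XOR can be checked exhaustively with a short Python sketch. The wiring used here is the commonly taught one, in which the first NAND output is shared by two further gates; the function names are illustrative.

```python
def nand(a, b):
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)


def xor_from_nand(a, b):
    """XOR built from exactly four NAND gates."""
    n1 = nand(a, b)       # gate 1: shared intermediate (AB)'
    n2 = nand(a, n1)      # gate 2
    n3 = nand(b, n1)      # gate 3
    return nand(n2, n3)   # gate 4: A'B + AB'


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_from_nand(a, b))
```

The sketch shows that four gates are sufficient; it does not by itself prove that fewer cannot work.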
Equivalence Properties:
1. (X ⊕ Y)' = X'Y' + XY = X ⊙ Y
2. X ⊙ 0 = X'
3. X ⊙ 1 = X
4. X ⊙ X = 1
5. X ⊙ X' = 0
6. X ⊙ Y = Y ⊙ X
7. (X ⊙ Y)' = X ⊕ Y
[Figure: Equivalent gate forms based on De Morgan's laws:
Y = (A'B')' = A + B;  Y = (A' + B')' = AB;  Y = A'B' = (A + B)';  Y = A' + B' = (AB)'.]
Carry = XY
Half Subtractor: It is a combinational circuit that subtracts two bits and produces their
difference.
Diff. = X ⊕ Y = X'Y + XY'    Borrow = X'Y
A half adder can be converted into a half subtractor with an additional inverter.
Full Adder: It performs the sum of three bits (two significant bits and a previous carry) and
generates sum and carry.
Sum = X ⊕ Y ⊕ Z
Carry = XY + YZ + ZX
Full adder can be implemented by using two half adders and an OR gate.
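[Figure: Full adder built from two half adders (H.A.) and an OR gate; inputs X, Y and the
previous carry, outputs Sum and Carry.]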
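The construction of a full adder from two half adders and an OR gate can be sketched and checked in Python. The function names are illustrative; Z stands for the previous carry, as in the equations above.

```python
def half_adder(x, y):
    """Return (sum, carry) of two bits."""
    return x ^ y, x & y


def full_adder(x, y, z):
    """Full adder from two half adders and an OR gate; z is the previous carry."""
    s1, c1 = half_adder(x, y)      # first half adder adds X and Y
    s, c2 = half_adder(s1, z)      # second half adder adds in the previous carry
    return s, c1 | c2              # OR gate combines the two carries


for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            print(x, y, z, full_adder(x, y, z))
```

The two half-adder carries can never both be 1, so the OR gate gives the same result as Carry = XY + YZ + ZX.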
Full subtractor: It subtracts one bit from the other by taking the previous borrow into account
and generates difference and borrow.
Diff. = X ⊕ Y ⊕ Z
Borrow = X'Y + YZ + ZX'
A full subtractor can be implemented by using two half subtractors and an OR gate.
[Figure: Full subtractor built from two half subtractors (H.S.) and an OR gate; inputs X, Y, Z,
outputs Diff. and Borrow.]
Multiplexers (MUX)
It selects binary information from one of many input lines and directs it to a single output
line.
The selection of a particular input line is controlled by a set of selection lines.
There are 2^n input lines, where n is the number of select lines; that is, n = log2 (number
of input lines).
2 : 1 MUX
Inputs I0, I1; select line S; output Y = S'I0 + S I1
4 : 1 MUX
Inputs I0, I1, I2, I3; select lines S1, S0
S1  S0  Y
0   0   I0
0   1   I1
1   0   I2
1   1   I3
Y = S1'S0'I0 + S1'S0 I1 + S1 S0'I2 + S1 S0 I3
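The selection behaviour can be sketched in Python in two ways: by indexing the input list with the select bits, and by evaluating the sum-of-products form for the 4:1 case. The function names and the convention that select bits are given most significant first, as (S1, S0), are assumptions.

```python
def mux(inputs, select_bits):
    """2^n-to-1 multiplexer: select_bits are given MSB first, e.g. (S1, S0)."""
    if len(inputs) != 2 ** len(select_bits):
        raise ValueError("need 2^n inputs for n select lines")
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit   # build the binary select value
    return inputs[index]


def mux4_sop(i0, i1, i2, i3, s1, s0):
    """4:1 MUX from Y = S1'S0'I0 + S1'S0 I1 + S1 S0'I2 + S1 S0 I3."""
    n1, n0 = 1 - s1, 1 - s0          # complemented select lines
    return (n1 & n0 & i0) | (n1 & s0 & i1) | (s1 & n0 & i2) | (s1 & s0 & i3)


print(mux([0, 1, 1, 0], (1, 0)))     # selects I2 -> 1
print(mux4_sop(0, 1, 1, 0, 1, 0))    # same selection via the equation -> 1
```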
Decoder:
Decoder is a combinational circuit that converts binary information from n input lines to a
maximum of 2^n unique output lines.
[Table: Truth table of an active-high-output decoder.]
Encoder
An encoder is a combinational circuit that performs the reverse operation of a decoder: it has up to $2^n$ input lines and n output lines.
It is used to convert other codes to binary, such as octal to binary, hexadecimal to binary, etc.
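Clocked S-R Flip-flop: It is called a set-reset flip-flop.
S   R   $Q_{n+1}$
0   0   $Q_n$ (no change)
0   1   0 (reset)
1   0   1 (set)
1   1   Forbidden
$Q_{n+1} = S + \bar{R} Q_n$
[Figure: clocked S-R flip-flop with synchronous inputs S and R, clock Clk, and direct inputs PRESET (Pr) and CLEAR (Cr)]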
S and R inputs are called synchronous inputs. Preset (Pr) and Clear (Cr) inputs are called direct inputs or asynchronous inputs.
The output of the flip-flop changes only during the clock pulse. In between clock pulses the output of the flip-flop does not change.
During normal operation of the flip-flop, the preset and clear inputs must always be high.
The disadvantage of the S-R flip-flop is that for S = 1, R = 1 the output cannot be determined. This can be eliminated in the J-K flip-flop.
An S-R flip-flop can be converted to a J-K flip-flop by using the two equations $S = J\bar{Q}$ and $R = KQ$.
[Figure: J-K flip-flop built from a clocked S-R flip-flop with the outputs Q and $\bar{Q}$ fed back to the inputs; direct inputs Pr and Cr]
$Q_{n+1} = J\bar{Q_n} + \bar{K} Q_n$
Truth table
J   K   $Q_{n+1}$
0   0   $Q_n$ (no change)
0   1   0 (reset)
1   0   1 (set)
1   1   $\bar{Q_n}$ (toggle)
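The conversion from S-R to J-K can be verified by substituting $S = J\bar{Q}$ and $R = KQ$ into the S-R characteristic equation. The sketch below does this for every input combination; sr_next and jk_next are illustrative names.

```python
from itertools import product

def sr_next(q, s, r):
    """S-R characteristic equation: Q(n+1) = S + R'Q (S = R = 1 is forbidden)."""
    return s | ((1 - r) & q)

def jk_next(q, j, k):
    """J-K characteristic equation: Q(n+1) = JQ' + K'Q."""
    return (j & (1 - q)) | ((1 - k) & q)

for q, j, k in product((0, 1), repeat=3):
    s, r = j & (1 - q), k & q        # S = JQ', R = KQ
    assert not (s and r)             # the forbidden input never occurs
    assert sr_next(q, s, r) == jk_next(q, j, k)
```

With J = K = 1 the next state is the complement of the present state, which is the toggle row of the truth table.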
The race-around problem is present in the J-K flip-flop when J = K = 1.
Toggling of the output more than once during a single clock pulse is called the race-around problem.
The race-around problem in the J-K flip-flop can be eliminated by using an edge-triggered flip-flop or a master-slave J-K flip-flop, or by using a clock signal whose pulse width is less than or equal to the propagation delay of the flip-flop.
A master-slave flip-flop is a cascade of two J-K flip-flops. Positive (direct) clock pulses are applied to the master, and these are inverted and applied to the slave flip-flop.
D-Flip-Flop: It is also called a delay flip-flop. A D flip-flop is obtained by connecting an inverter between the J and K input terminals.
Truth table
D   $Q_{n+1}$
0   0
1   1
T Flip-flop: A J-K flip-flop can be converted into a T flip-flop by connecting the J and K input terminals to a common point. If T = 1, then $Q_{n+1} = \bar{Q_n}$. This unit changes the state of the output with each clock pulse and hence it acts as a toggle switch.
Truth table
T   $Q_{n+1}$
0   $Q_n$
1   $\bar{Q_n}$
Ring Counter: A shift register can be used as a ring counter when the Q0 output terminal is connected to the serial input terminal.
An n-bit ring counter can have n different output states. It can count n clock pulses.
Twisted Ring Counter: It is also called a Johnson ring counter. It is formed when the complemented output terminal $\bar{Q_0}$ is connected to the serial input terminal of the shift register.
An n-bit twisted ring counter can have a maximum of 2n different output states.
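The state counts n and 2n can be reproduced by simulating the shift register. This is a rough sketch that assumes the feedback is taken from the last stage of the register and that the ring counter starts with a single 1; both choices are illustrative.

```python
def ring_states(n):
    """States of an n-bit ring counter started with a single 1."""
    state, seen = [1] + [0] * (n - 1), []
    while state not in seen:
        seen.append(state)
        state = [state[-1]] + state[:-1]       # output fed back to serial input
    return seen

def johnson_states(n):
    """States of an n-bit twisted ring (Johnson) counter started at all zeros."""
    state, seen = [0] * n, []
    while state not in seen:
        seen.append(state)
        state = [1 - state[-1]] + state[:-1]   # complemented output fed back
    return seen

print(len(ring_states(4)), len(johnson_states(4)))
```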
Counters: A counter is driven by a clock signal and can be used to count the number of clock cycles. A counter also acts as a frequency divider circuit.
There are two types of counters:
(i) Synchronous
(ii) Asynchronous
Synchronous counters are also called parallel counters. In this type, clock pulses are applied simultaneously to all the flip-flops.
Asynchronous counters are also called ripple or serial counters. In this type, the output of one flip-flop is connected to the clock input of the next flip-flop, and so on.
Memories
Semiconductor Memories
   ROM: PROM, EPROM, EEPROM
   RAM: Static RAM, Dynamic RAM
Magnetic Memories
   Drum, Disk, Bubble, Core, Tape
Volatile Memory: The stored information is dependent on the power supply, i.e., the stored information will remain only as long as power is applied. E.g., RAM.
Non-Volatile Memory: The stored information is independent of the power supply, i.e., the stored information will remain even if the power fails. E.g., ROM, PROM, EPROM, EEPROM, etc.
Static RAM (SRAM): The binary information is stored in terms of voltage. SRAMs store ones and zeros using conventional flip-flops.
Dynamic RAM (DRAM): The binary information is stored in terms of charge on the capacitor.
The memory cells of DRAMs are basically charge storage capacitors with driver transistors.
Because of the leakage property of the capacitor, DRAMs require periodic charge refreshing
to maintain data storage.
The package density is more in the case of DRAMs. But additional hardware is required for
memory refresh operation.
SRAMs consume more power when compared to DRAMs. SRAMS are faster than DRAMs.
Compiler Design
COMPILERS
A compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language.
As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.
[Figure: source code enters the compiler, which outputs machine code and reports errors]
Applications:
Design of interfaces
Design of language migration tools
Design of re-engineering tools
Two-Pass Assembly:
The simplest form of assembler makes two passes over the input, where a pass consists of
reading an input file once. In the first pass, all the identifiers that denote storage locations are
found and stored in a symbol table.
In the second pass, the assembler scans the input again. This time, it translates each operation
code into the sequence of bits representing that operation in machine language, and it translates
each identifier representing a location into the address given for that identifier in the symbol
table.
The output of the second pass is usually relocatable machine code, meaning that it can be loaded starting at any location L in memory; i.e., if L is added to all addresses in the code, then all references will be correct. Thus, the output of the assembler must distinguish those portions of instructions that refer to addresses that can be relocated.
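The two passes can be sketched for a toy assembly language. Everything about the language here is hypothetical: the opcode numbers in OPCODES, the "label: OP operand" line syntax and the one-word-per-instruction layout are assumptions chosen only to show why the symbol table must be complete before translation starts.

```python
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "JMP": 4}   # hypothetical encoding

def assemble(lines):
    # Pass 1: record the address of every label in the symbol table.
    symtab, parsed, addr = {}, [], 0
    for line in lines:
        label, _, rest = line.rpartition(":")
        if label:
            symtab[label.strip()] = addr
        parts = rest.split()
        if parts:
            parsed.append(parts)
            addr += 1
    # Pass 2: translate opcodes and replace identifiers by their addresses.
    code = []
    for parts in parsed:
        operand = symtab[parts[1]] if len(parts) > 1 else 0
        code.append((OPCODES[parts[0]], operand))
    return symtab, code

program = ["loop: LOAD x", "ADD x", "JMP loop", "x: HALT"]
symtab, code = assemble(program)
print(symtab, code)
```

The first instruction refers to x before x is defined; the forward reference is resolved only because pass 1 has already filled the symbol table.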
Loaders and Link-Editors:
A program called loader performs the two functions of loading and link-editing.
The process of loading consists of taking relocatable machine code, altering the relocatable
addresses and placing the altered instructions and data in memory at the proper locations.
The link-editor makes a single program from several files of relocatable machine code.
Lexical analysis
Syntax Analysis
Semantic analysis
Intermediate code generation
Code optimization
Target code generation
[Figure: source code passes through the front end to an intermediate language and then through the back end to machine code; errors are reported along the way]
[Figure: phases of a compiler. Lexical Analyzer (stream of tokens), Syntax Analyzer (parse tree), Semantic Analyzer (annotated parse tree), intermediate code generation (intermediate form), Code Optimization (optimized intermediate form), Code Generation (assembly program). Symbol Table Management and Error Handling interact with all phases.]
Fig. 9.1.2. Compiler structure
Symbol-Table Management
A symbol table is a data structure containing a record for each identifier, with fields for the
attributes of the identifier.
The compiler uses the symbol table for managing information about variables and their attributes.
The syntax and semantic analysis phases usually handle a large fraction of the errors
detectable by the compiler.
The lexical phase can detect errors where the characters remaining in the input do not form
any token of the language.
Errors where the token stream violates the structure rules (syntax) of the language are
determined by the syntax analysis phase.
The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax
analysis.
Sometimes, lexical analyzers are divided into a cascade of two phases, the first called
scanning and the second "lexical analysis."
The scanner is responsible for doing simple tasks, while the lexical analyzer does the more
complex operations.
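Consider an assignment expression in which t1, t2 and t3 are floats and 12 is an integer constant.
The lexical analyzer will generate the tokens id1, id2 and id3 for the three identifiers.
Syntax Analysis:
[Figure: parse tree of the expression, rooted at = with the leaves id1, id2, 12 and id3]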
Semantic Analysis:
The semantic analysis phase checks the source program for semantic errors and gathers
type information for the subsequent code-generation phase.
It uses the hierarchical structure determined by the syntax-analysis phase to identify the
operators and operands of expressions and statements.
An important component of semantic analysis is type checking.
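Ex.: Since the identifiers in the expression are floats, the integer 12 is also converted to float.
[Figure: parse tree rooted at = with the leaves id1, id2 and id3 and an int-to-float node above 12]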
Code Optimization:
The code optimization phase attempts to improve the intermediate code, so that faster-running
machine code will result. Some optimizations are trivial.
Advantages of Code Optimization:
Improves efficiency
Occupies less memory
Executes fast
Code Generation:
The final phase of the compiler is the generation of target code, consisting normally of
relocatable machine code or assembly code. Memory locations are selected for each of the
variables used by the program. Then, intermediate instructions are each translated into a
sequence of machine instructions that perform the same task. A crucial aspect is the assignment
of variables to registers.
The machine code will consist of MUL, ADD and MOV instructions, with registers holding the values of the identifiers.
Lexical Analysis
The process of forming tokens from an input stream of characters is called tokenization
and the lexer categorizes them according to a symbol type.
int number ;
The substring "number" is a lexeme for the token identifier (ID), "int" is a lexeme for the token keyword, and ";" is a lexeme for the token ";".
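Tokenization of such a declaration can be sketched with regular expressions. The token names, the patterns in TOKEN_SPEC and the keyword set below are assumptions chosen for illustration, not a definition taken from the text.

```python
import re

TOKEN_SPEC = [                      # (token name, pattern); illustrative only
    ("NUM", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[;=+*()]"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"int", "float", "if", "const"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs; white space is discarded."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:               # no pattern matches the remaining input
            raise SyntaxError(f"lexical error at position {pos}: {text[pos]!r}")
        kind, lexeme, pos = m.lastgroup, m.group(), m.end()
        if kind == "SKIP":
            continue
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("int number ;"))
```

A character that matches no pattern raises a lexical error, which corresponds to the case where the remaining input does not form any token of the language.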
The main purpose of a lexical analyzer in a compiler is to translate the input stream into a form that is more manageable by the parser. The tasks of a lexical analyzer can be divided into two phases: scanning and lexical analysis. The lexical analyzer can also detect some lexical errors.
a) Scanning:
In the scanning phase, the lexical analyzer scans the input file and eliminates comments and white space in the form of blank, tab and new-line characters, so that the parser does not have to consider them. The alternative is to incorporate white space into the syntax, which is not nearly as easy to implement. This is why most compilers do such tasks in the scanning phase.
b) Lexical Analysis:
In the second phase, it matches the pattern for each lexeme to generate a token. In some compilers, the lexical analyzer is in charge of making a copy of the source program with the error messages marked in it. It may also implement preprocessor functions if necessary.
Issues in Lexical Analysis :
There are several reasons for separating the analysis phase of compiling into lexical analysis and
parsing.
1. Simpler design is perhaps the most important consideration.
2. Compiler efficiency is improved. A separate lexical analyzer allows us to construct a
specialized and potentially more efficient processor for the task.
3. Compiler portability is enhanced. Input alphabet peculiarities and other device-specific
anomalies can be restricted to the lexical analyzer.
Tokens, Patterns, Lexemes (Important Point)
When talking about lexical analysis, we use the terms "token," "pattern," and "lexeme" with
specific meanings.
There is a set of strings in the input for which the same token is produced as output. This set
of strings is described by a rule called a pattern associated with the token.
A lexeme is a sequence of characters in the source program that is matched by the pattern
for a token.
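TOKEN      SAMPLE LEXEMES           INFORMAL DESCRIPTION OF PATTERN
const      const                    const
if         if                       if
relation   <, <=, =, <>, >, >=      < or <= or = or <> or > or >=
id         pi, count, D2            letter followed by letters and digits
num        3.1416, 0, 6.02E23       any numeric constant
literal    "core dumped"            any characters between " and " except "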
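9.2 Syntax Analysis
[Figure: position of the parser in the compiler. The parser requests the next token from the lexical analyzer ("get next token"), builds the parse tree and passes it to the rest of the front end, which produces the intermediate representation; the lexical analyzer and the parser both use the symbol table.]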
Context-Free Grammars
Grammars: A grammar is a finite set of rules that may define an infinite set of sentences.
A context- free grammar (grammar for short) consists of terminals, non terminals, a start
symbol, and productions.
Derivations:
The central idea here is that a production is treated as a rewriting rule in which the nonterminal on the left is replaced by the string on the right side of the production. We can take a single nonterminal and repeatedly apply productions in any order to obtain a sequence of replacements. We call such a sequence of replacements a derivation. We write $\Rightarrow$ to mean "derives in one step". Likewise, we use $\stackrel{+}{\Rightarrow}$ to mean "derives in one or more steps" and $\stackrel{*}{\Rightarrow}$ to mean "derives in zero or more steps".
Given a grammar G with start symbol S, we can use the $\stackrel{+}{\Rightarrow}$ relation to define L(G), the language generated by G. Strings in L(G) can contain only terminal symbols of G. We say a string of terminals w is in L(G) if and only if $S \stackrel{+}{\Rightarrow} w$. The string w is called a sentence of G. A language that can be generated by a CFG is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent.
If $S \stackrel{*}{\Rightarrow} \alpha$, where $\alpha$ may contain nonterminals, then we say that $\alpha$ is a sentential form of G.
Parsing means constructing a parse tree. Using this parse tree we determine whether a string can be generated by a grammar. We can construct a parse tree in the following two ways:
Top-down parsing: We construct the parse tree starting from the root and expand the nonterminals until we reach the leaves.
Bottom-up parsing: We construct the parse tree starting from the leaves and work up to the root.
A parse tree ignores variations in the order in which symbols in sentential forms are replaced. These variations in the order in which productions are applied can also be eliminated by considering only leftmost (or rightmost) derivations. It is not hard to see that every parse tree has associated with it a unique leftmost and a unique rightmost derivation.
A grammar can be ambiguous or unambiguous.
Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Example: S → S+S | S*S | a
Expression: a+a*a
[Figure: two parse trees for a+a*a. In the first, the root applies S → S+S and the right operand is expanded by S → S*S; in the second, the root applies S → S*S and the left operand is expanded by S → S+S.]
Eliminating Ambiguity
S → S+T | T
T → T*F | F
F → a
This grammar is equivalent to
S → S+S | S*S | a
but it is unambiguous.
A grammar is left recursive if it has a nonterminal A such that there is a derivation $A \stackrel{+}{\Rightarrow} A\alpha$ for some string $\alpha$. To eliminate immediate left recursion, group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A. Then we replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
Left recursion may appear in the grammar as either immediate or indirect left recursion.
If the productions contain immediate left recursion, then the above rule can be applied individually to the A-productions.
If the productions contain indirect left recursion, then a substitution procedure is applied to the grammar first.
S → Aa | b
A → Ac | Sd | ε
After removing the immediate left recursion for A, we get
S → Aa | b
A → SdA' | A'
A' → cA' | ε
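The replacement rule for immediate left recursion is mechanical. A minimal sketch follows; it assumes that each alternative is a list of symbols, that the empty list stands for ε, and that the new nonterminal is named by appending a prime. The function name is illustrative.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite the productions of nt so that none begins with nt itself."""
    new = nt + "'"
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [nt]]   # A -> A alpha
    betas = [alt for alt in alternatives if alt[:1] != [nt]]        # A -> beta
    if not alphas:
        return {nt: alternatives}        # nothing to do
    return {
        nt: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],            # [] is epsilon
    }

# A -> Ac | Sd | epsilon
print(remove_immediate_left_recursion("A", [["A", "c"], ["S", "d"], []]))
```

The sketch handles only immediate left recursion; the indirect recursion through S in the example is left untouched.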
Left Factoring:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing. Left factoring is useful to avoid backtracking nature for parsers. The basic
idea is that when it is not clear which of two alternative productions to use to expand a non
terminal A, we may be able to rewrite the A-productions to defer the decision until we have seen
enough of the input to make the right choice. For example, if we have the two productions
stmt → if expr then stmt else stmt | if expr then stmt
on seeing the input token if, we cannot immediately tell which production to choose to expand stmt.
In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a nonempty string derived from α, we do not know whether to expand A to αβ1 or to αβ2 by seeing α. However, we may defer the decision by expanding A to αA'. Then, after seeing the input derived from α, we expand A' to β1 or to β2. That is, left-factored, the original productions become
A → αA'
A' → β1 | β2
Example: S → abc | abd | ae | f
Left factoring the common prefix ab gives
S → abS' | ae | f
S' → c | d
Repeating the same procedure once again for the common prefix a gives
S → aS'' | f
S' → c | d
S'' → bS' | e
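Left factoring can also be automated. The sketch below is one possible formulation, not the only one: it groups the alternatives of a nonterminal by their first symbol and factors out the longest common prefix of each group. Alternatives are assumed to be tuples of symbols, and fresh nonterminals are named by appending primes, so the names may come out in a different order than in a derivation done by hand.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol tuples."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)

def left_factor(grammar):
    """grammar: dict mapping a nonterminal to a list of alternatives (tuples)."""
    result, work = {}, list(grammar.items())
    while work:
        nt, alts = work.pop(0)
        groups = {}
        for alt in alts:                       # group by first symbol
            groups.setdefault(alt[:1], []).append(alt)
        new_alts = []
        for first, group in groups.items():
            if len(group) == 1 or not first:   # nothing to factor
                new_alts.extend(group)
                continue
            prefix = common_prefix(group)
            fresh = nt + "'"
            while fresh in grammar or fresh in result or any(fresh == n for n, _ in work):
                fresh += "'"
            new_alts.append(prefix + (fresh,))
            work.append((fresh, [alt[len(prefix):] for alt in group]))
        result[nt] = new_alts
    return result

g = {"S": [("a", "b", "c"), ("a", "b", "d"), ("a", "e"), ("f",)]}
for nt, alts in left_factor(g).items():
    print(nt, "->", " | ".join("".join(a) or "ε" for a in alts))
```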
Top-Down Parsing
Recursive-Descent Parsing / Predictive Parsing:
A general form of top-down parsing, called recursive descent, may involve backtracking, that is, making repeated scans of the input. A predictive parser is a recursive-descent parser that needs no backtracking.
The construction of a predictive parser is aided by two functions associated with a grammar
G. These functions, FIRST and FOLLOW, allow us to fill in the entries of a predictive parsing
table for G, whenever possible.
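If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α. If $\alpha \stackrel{*}{\Rightarrow} \varepsilon$, then ε is also in FIRST(α).
Define FOLLOW(A), for nonterminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form $S \stackrel{*}{\Rightarrow} \alpha A a \beta$ for some α and β. Note that there may, at some time during the derivation, have been symbols between A and a, but if so, they derived ε and disappeared. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).
To build FIRST(X):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X → Y1Y2...Yk is a production, then add FIRST(Y1) except ε to FIRST(X); if ε is in FIRST(Y1), also add FIRST(Y2) except ε, and so on.
4. If ε is in FIRST(Yi) for all i = 1, ..., k, then add ε to FIRST(X).
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set.
To build FOLLOW(X):
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, then everything in FOLLOW(A) is in FOLLOW(B).
4. If there is a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).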
Example:
S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
Nonterminal   FOLLOW
S             {$}
B             {g, f, h}
C             {g, f, h}
D             {h}
E             {h, f}
F             {h}
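FIRST and FOLLOW sets are usually computed by iterating until nothing changes. The sketch below does this for a grammar of the form S → aBDh, B → cC, C → bC | ε, D → EF, E → g | ε, F → f | ε; the ε alternatives are an assumption made where the printed productions are incomplete, and the dictionary encoding (an empty list for ε) is illustrative.

```python
EPS = "ε"
GRAMMAR = {
    "S": [["a", "B", "D", "h"]], "B": [["c", "C"]], "C": [["b", "C"], []],
    "D": [["E", "F"]], "E": [["g"], []], "F": [["f"], []],
}

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the nonterminals."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                       # every symbol can vanish
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue       # FOLLOW is defined for nonterminals only
                    tail = first_of_string(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:    # the rest can vanish: inherit FOLLOW(nt)
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "S", FIRST)
print(FOLLOW)
```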
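(ii) Predictive parsing table for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id (a dash marks an error entry):
      id         +            *            (          )         $
E     E → TE'    -            -            E → TE'    -         -
E'    -          E' → +TE'    -            -          E' → ε    E' → ε
T     T → FT'    -            -            T → FT'    -         -
T'    -          T' → ε       T' → *FT'    -          T' → ε    T' → ε
F     F → id     -            -            F → (E)    -         -
Actions taken while parsing the input id + id:
E → TE'
T → FT'
F → id
Pop id
T' → ε
E' → +TE'
Pop +
T → FT'
F → id
Pop id
T' → ε
E' → ε
Success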
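LL(1) Grammars
A grammar is LL(1) if, for every nonterminal A with productions A → α1 | α2 | ... | αn, the sets FIRST(α1), FIRST(α2), ..., FIRST(αn) are mutually disjoint and, whenever some αi derives ε, the FIRST sets of the other alternatives are also disjoint from FOLLOW(A).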
Bottom-up parsing
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the
leaves (the bottom) and working up towards the root (the top).
Handles
Informally, a "handle" of a string is a substring that matches the right side of a production,
and whose reduction to the nonterminal on the left side of the production represents one
step along the reverse of a rightmost derivation.
In many cases the leftmost substring β that matches the right side of some production A → β is not a handle, because a reduction by the production A → β yields a string that cannot be reduced to the start symbol.
A handle of a right-sentential form γ is a production A → β and a position in γ where β may be found.
If (A → β, k) is a handle, then replacing the β in γ at position k with A produces the previous right-sentential form in a rightmost derivation of γ.
If G is unambiguous, then every right-sentential form has a unique handle.
The process we use to construct a bottom-up parse is called handle pruning.
Handle pruning constructs a rightmost derivation in reverse:
$S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \dots \Rightarrow \gamma_n = w$
While the primary operations of the parser are shift and reduce, there are actually four possible actions a shift-reduce parser can make: (1) shift, (2) reduce, (3) accept and (4) error.
1. In a shift action, the next input symbol is shifted onto the top of the stack.
2. In a reduce action, the parser knows the right end of the handle is at the top of the stack. It must then locate the left end of the handle within the stack and decide which nonterminal will be used to replace the handle.
3. In an accept action, the parser announces successful completion of parsing.
4. In an error action, the parser discovers that a syntax error has occurred and calls an error recovery routine.
LR parsers:
The technique is called LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" is for constructing a rightmost derivation in reverse, and the k is for the number of input symbols of lookahead that are used in making parsing decisions. When (k) is omitted, k is assumed to be 1. LR parsing is attractive for a variety of reasons.
LR parsers can be constructed to recognize virtually all programming- language constructs
for which context-free grammars can be written.
The LR parsing method is the most general non-backtracking shift-reduce parsing method
known, yet it can be implemented as efficiently as other shift-reduce methods.
The class of grammars that can be parsed using LR methods is a proper superset of the class
of grammars that can be parsed with predictive parsers.
An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right
scan of the input.
The principal drawback of the method is that it is too much work to construct an LR parser
by hand for a typical programming-language grammar.
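LR parsing algorithm
stack: S0 (the initial state)
input: w$
loop {
   let S be the state on top of the stack and a the current input symbol
   if action[S, a] = shift S'
      then push(a); push(S'); advance the input
   else if action[S, a] = reduce A → β
      then pop 2*|β| symbols;
           push(A); push(goto[S'', A])
           (S'' is the state on top of the stack after popping the symbols)
   else if action[S, a] = accept
      then exit
   else error
}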
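Parsing table for the grammar (1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F, (5) F → (E), (6) F → id (si means shift and go to state i, rj means reduce by production j, acc means accept, a dash means error):
State   id    +     *     (     )     $     |  E    T    F
0       s5    -     -     s4    -     -     |  1    2    3
1       -     s6    -     -     -     acc   |  -    -    -
2       -     r2    s7    -     r2    r2    |  -    -    -
3       -     r4    r4    -     r4    r4    |  -    -    -
4       s5    -     -     s4    -     -     |  8    2    3
5       -     r6    r6    -     r6    r6    |  -    -    -
6       s5    -     -     s4    -     -     |  -    9    3
7       s5    -     -     s4    -     -     |  -    -    10
8       -     s6    -     -     s11   -     |  -    -    -
9       -     r1    s7    -     r1    r1    |  -    -    -
10      -     r3    r3    -     r3    r3    |  -    -    -
11      -     r5    r5    -     r5    r5    |  -    -    -
Moves of the parser on the input id+id*id:
Stack                        Input        Action
0                            id+id*id$    Shift 5
0 id 5                       +id*id$      Reduce by F → id
0 F 3                        +id*id$      Reduce by T → F
0 T 2                        +id*id$      Reduce by E → T
0 E 1                        +id*id$      Shift 6
0 E 1 + 6                    id*id$       Shift 5
0 E 1 + 6 id 5               *id$         Reduce by F → id
0 E 1 + 6 F 3                *id$         Reduce by T → F
0 E 1 + 6 T 9                *id$         Shift 7
0 E 1 + 6 T 9 * 7            id$          Shift 5
0 E 1 + 6 T 9 * 7 id 5       $            Reduce by F → id
0 E 1 + 6 T 9 * 7 F 10       $            Reduce by T → T*F
0 E 1 + 6 T 9                $            Reduce by E → E+T
0 E 1                        $            Accept
[Figure: model of an LR parser. The LR parsing program reads the INPUT, maintains the STACK, consults the action and goto parts of the parsing table and produces the OUTPUT.]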
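A table-driven LR driver is short once the action and goto tables are given. The sketch below encodes the standard table for the expression grammar (1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F, (5) F → (E), (6) F → id as Python dictionaries; this encoding is an assumption made for illustration. Unlike the algorithm stated in the text, the stack here holds states only, so a reduction pops |β| entries rather than 2*|β|.

```python
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}   # number -> (lhs, |rhs|)
SHIFT_ROW = {"id": "s5", "(": "s4"}

def reduce_row(n):
    return {"+": f"r{n}", "*": f"r{n}", ")": f"r{n}", "$": f"r{n}"}

ACTION = {
    0: SHIFT_ROW, 4: SHIFT_ROW, 6: SHIFT_ROW, 7: SHIFT_ROW,
    1: {"+": "s6", "$": "acc"},
    2: {**reduce_row(2), "*": "s7"},
    3: reduce_row(4),
    5: reduce_row(6),
    8: {"+": "s6", ")": "s11"},
    9: {**reduce_row(1), "*": "s7"},
    10: reduce_row(3),
    11: reduce_row(5),
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the production numbers used, in order of reduction."""
    stack, pos, reductions = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        if act == "acc":
            return reductions
        if act[0] == "s":                       # shift: push the new state
            stack.append(int(act[1:]))
            pos += 1
        else:                                   # reduce by production number
            number = int(act[1:])
            lhs, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]     # pop one state per rhs symbol
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(number)

print(lr_parse(["id", "+", "id", "*", "id"]))
```

The reductions come out in the order of a rightmost derivation in reverse.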
LR Grammars :
A grammar for which we can construct a parsing table is said to be an LR grammar. Intuitively, in order for a grammar to be LR it is sufficient that a left-to-right shift-reduce parser be able to recognize handles when they appear on top of the stack. An LR parser does not have to scan the entire stack to know when the handle appears on top. Rather, the state symbol on top of the stack contains all the information it needs.
Another source of information that an LR parser can use to help make its shift-reduce decisions is the next k input symbols. The cases k = 0 or k = 1 are of practical interest, and we shall only consider LR parsers with k ≤ 1 here. A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar.
There is a significant difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production, having seen all of what is derived from that right side with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars, where we must be able to recognize the use of a production seeing only the first k symbols of what its right side derives. Thus, LR grammars can describe more languages than LL grammars.
C = {closure({[S' → .S]})}
repeat
   for each set of items I in C and each grammar symbol X
      such that goto(I, X) is not empty and not in C do
         add goto(I, X) to C
until no more sets of items can be added to C
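The sets-of-items construction can be sketched for the augmented expression grammar E' → E, E → E+T | T, T → T*F | F, F → (E) | id. An item is represented as a pair (production index, dot position); this representation and the function names are assumptions made for illustration.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        prod, dot = todo.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    todo.append((k, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over symbol in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else frozenset()

def canonical_collection():
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    collection = [closure({(0, 0)})]
    for state in collection:            # the list grows while it is traversed
        for symbol in symbols:
            target = goto(state, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection

print(len(canonical_collection()))
```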
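Example
Grammar (augmented):
E' → E
E → E+T | T
T → T*F | F
F → (E) | id
I0 = closure({E' → .E}): E' → .E; E → .E+T; E → .T; T → .T*F; T → .F; F → .(E); F → .id
I1: E' → E.; E → E.+T
I2: E → T.; T → T.*F
I3: T → F.
I4: F → (.E); E → .E+T; E → .T; T → .T*F; T → .F; F → .(E); F → .id
I5: F → id.
I6: E → E+.T; T → .T*F; T → .F; F → .(E); F → .id
I7: T → T*.F; F → .(E); F → .id
I8: F → (E.); E → E.+T
I9: E → E+T.; T → T.*F
I10: T → T*F.
I11: F → (E).
goto(I0, E) is I1, goto(I0, T) is I2, goto(I0, F) is I3, goto(I0, ( ) is I4, goto(I0, id) is I5
goto(I1, +) is I6
goto(I2, *) is I7
goto(I4, E) is I8, goto(I4, T) is I2, goto(I4, F) is I3, goto(I4, ( ) is I4, goto(I4, id) is I5
goto(I6, T) is I9, goto(I6, F) is I3, goto(I6, ( ) is I4, goto(I6, id) is I5
goto(I7, F) is I10, goto(I7, ( ) is I4, goto(I7, id) is I5
goto(I8, ) ) is I11, goto(I8, +) is I6
goto(I9, *) is I7
Fig. 9.2.10 The set of items construction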
Checking whether the given grammar is an LR(0) grammar or not:
1. If the parsing table has multiple entries in any cell, the grammar is not LR(0).
2. If an RR conflict or an SR conflict is present, the grammar is not LR(0).
Shift-reduce parsing is a type of bottom-up parsing that constructs a parse tree for an input beginning at the leaves and working towards the root.
A reduce action is performed when there is a handle on the top of the stack.
There are two problems that this parser faces while parsing a string:
Shift-Reduce (SR) conflict: Whether to shift or to reduce, when both actions are valid?
Reduce-Reduce (RR) conflict: Which rule to use for the reduction, if reduction is possible by more than one rule?
These conflicts arise either because the grammar is ambiguous or because the parsing method is not powerful enough.
3. Do not check the augmented production for SR and RR conflicts, because it is a dummy production. (The augmented production is not part of the original grammar.)
4. A state with a conflict is referred to as an inadequate state or error state.
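LR(0) conflicts:
SR conflict: a state contains both a shift item and a completed item.
   A → α.aβ   Shift (S)
   B → γ.     Reduce (R)
RR conflict: a state contains two completed items.
   A → α.     Reduce (R)
   B → γ.     Reduce (R)
If A → α. is in I, where α is a string of terminals and nonterminals, then action[I, b] = reduce A → α for all b in FOLLOW(A).
If S' → S. is in I, where S' is the symbol introduced for augmenting the grammar, then action[I, $] = accept.
If goto(I, A) = J for a nonterminal A, then the goto entry of state I on A is J.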
NOTE: If two completed (reduce) items are in the same state, check the FOLLOW sets of both left-hand sides. The two FOLLOW sets should not contain any common symbol.
Constructing Canonical LR Parsing Tables
Canonical LR parsing
Carry extra information in the state so that wrong reductions by A → α will be ruled out.
Redefine LR items to include a terminal symbol as a second component (the lookahead symbol).
The general form of the item becomes [A → α.β, a], which is called an LR(1) item.
The item [A → α., a] calls for a reduction only if the next input symbol is a. The set of such symbols a is a subset of FOLLOW(A).
The canonical LR parser solves this problem by storing extra information in the state itself. The problem with the SLR parser is that it reduces even on those symbols of FOLLOW(A) for which the reduction is invalid. So LR items are redefined to store one terminal (the lookahead symbol) along with the item, and thus the items now are LR(1) items.
An LR(1) item has the form [A → α.β, a], and a reduction by A → α is done only for an item [A → α., a] when the next input symbol is a. Clearly, the symbols a form a subset of FOLLOW(A).
To find the closure for the canonical LR parser:
repeat
   for each item [A → α.Bβ, a] in I,
       each production B → γ in G',
       and each terminal b in FIRST(βa)
      add the item [B → .γ, b] to I
until no more items can be added to I
For the given grammar:
S' → S
S → CC
C → cC | d
I0 = closure([S' → .S, $]):
   S' → .S,  $
   S → .CC,  $       as FIRST(ε$) = {$}
   C → .cC,  c/d     as FIRST(C$) = FIRST(C) = {c, d}
   C → .d,   c/d
Similarly, FIRST(Cc) = FIRST(C) = {c, d} and FIRST(Cd) = FIRST(C) = {c, d}, while FIRST(εc) = {c} and FIRST(εd) = {d}.
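The LR(1) closure can be traced in code for the grammar S' → S, S → CC, C → cC | d. In this sketch an item is a triple (production index, dot position, lookahead). The FIRST table is written out by hand, and first_of relies on the fact that this particular grammar has no ε-productions; both are simplifying assumptions.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}     # hand-computed for this grammar

def first_of(symbols):
    """FIRST of a non-empty symbol string (valid only without epsilon rules)."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}

def closure1(items):
    """For [A -> alpha.B beta, a] add [B -> .gamma, b] for b in FIRST(beta a)."""
    result, todo = set(items), list(items)
    while todo:
        prod, dot, look = todo.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:] + (look,)):
                for k, (lhs, _) in enumerate(GRAMMAR):
                    item = (k, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        todo.append(item)
    return result

I0 = closure1({(0, 0, "$")})
for prod, dot, look in sorted(I0):
    lhs, rhs = GRAMMAR[prod]
    print(lhs, "->", "".join(rhs[:dot]) + "." + "".join(rhs[dot:]), ",", look)
```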
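Algorithm: Construction of the canonical LR parsing table.
Input: An augmented grammar G'.
Construction of the canonical LR parse table:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.
2. State i of the parser is constructed from Ii. If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij, then action[i, a] = shift j. If [A → α., a] is in Ii and A is not S', then action[i, a] = reduce A → α. If [S' → S., $] is in Ii, then action[i, $] = accept.
For the grammar (1) S → CC, (2) C → cC, (3) C → d the table is:
State   c     d     $     |  S    C
0       s3    s4    -     |  1    2
1       -     -     acc   |  -    -
2       s6    s7    -     |  -    5
3       s3    s4    -     |  -    8
4       r3    r3    -     |  -    -
5       -     -     r1    |  -    -
6       s6    s7    -     |  -    9
7       -     -     r3    |  -    -
8       r2    r2    -     |  -    -
9       -     -     r2    |  -    -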
An LR parse will not make any wrong shift/reduce unless there is an error. But the number of
states in LR parse table is too large. To reduce number of states we will combine all states which
have same core and different look ahead symbol.
If a conflict results from the above rules, the grammar is said not to be LR(1), and the algorithm
is said to fail.
1. The goto transitions for state i are determined as follows: If goto(IRRiRR, A) = IRRjRR, then
goto[I, A] = j.
2. All entries not defined by rules (2) and (3) are made "error."
3. The initial state of the arser is the one constructed fro the set containing ite [S S, $]
The table formed from the parsing action and goto functions produced by Algorithm is called the
canonical LR(1) parsing table. An LR parser using this table is called a canonical LR(1) parser. If
the parsing action function has no multiply-defined entries, then the given grammar is called as
LR(1) grammar. As before, we omit the "(1)" if it is understood.
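To see how such a table drives a parser, the following minimal sketch hard-codes the canonical LR(1) table for the grammar S → CC, C → cC | d (productions numbered 1 to 3) and runs the usual shift-reduce loop. The dictionary layout and the function name lr_parse are assumptions of this sketch; only states are kept on the stack.

```python
# Productions: 1: S -> CC, 2: C -> cC, 3: C -> d  (head, length of body)
PRODUCTIONS = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}

ACTION = {
    (0, "c"): "s3", (0, "d"): "s4",
    (1, "$"): "acc",
    (2, "c"): "s6", (2, "d"): "s7",
    (3, "c"): "s3", (3, "d"): "s4",
    (4, "c"): "r3", (4, "d"): "r3",
    (5, "$"): "r1",
    (6, "c"): "s6", (6, "d"): "s7",
    (7, "$"): "r3",
    (8, "c"): "r2", (8, "d"): "r2",
    (9, "$"): "r2",
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}


def lr_parse(tokens):
    """Return the production numbers used in reductions, in order.
    A missing table entry is an "error" entry and raises SyntaxError."""
    stack = [0]
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if entry == "acc":
            return reductions
        number = int(entry[1:])
        if entry[0] == "s":                 # shift: push the new state
            stack.append(number)
            pos += 1
        else:                               # reduce by production `number`
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(number)


print(lr_parse("cdd"))   # reductions for the input c d d
```

For the input cdd the reductions come out in the order 3, 2, 3, 1, which is a rightmost derivation in reverse. For the input cd the parser stops in state 4 on $, because that entry is blank.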
Constructing LALR Parsing Tables
The tables obtained by the LALR (lookahead-LR) method are considerably smaller than the
canonical LR tables, yet most common syntactic constructs of programming languages can be
expressed conveniently by an LALR grammar. For a comparison of parser size, the SLR and LALR
tables for a grammar always have the same number of states. Thus, it is much easier and more
economical to construct SLR and LALR tables than the canonical LR tables.
Consider a pair of similar looking states (same kernel and different lookaheads) in the set of
LR(1) items:
I4: C → d., c/d
I7: C → d., $
Replace I4 and I7 by a new state I47 consisting of (C → d., c/d/$).
Similarly, I3 and I6, and I8 and I9, form pairs.
Merge the sets of LR(1) items having the same core.
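Algorithm: An easy, but space-consuming LALR table construction.
Input: An augmented grammar G'.
Output: The LALR parsing table functions action and goto for G'.
Method:
Construct the LALR parse table as follows.
Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items.
For each core present among the sets of LR(1) items, find all sets having that core and replace
these sets by their union.
Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items.
Construct the action table from C' as was done earlier for the canonical LR table.
To construct the goto table, let J = I1 ∪ I2 ∪ ... ∪ Ik. Since I1, I2, ..., Ik all have the same core,
the sets goto(I1, X), goto(I2, X), ..., goto(Ik, X) also have the same core. Let K be the union of all
sets of items having the same core as goto(I1, X); then goto(J, X) = K.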
Merging states never produces shift/reduce conflicts but may produce reduce/reduce
conflicts, so some grammars that are canonical LR(1) are not LALR(1).
To see this, suppose that in the union there is a conflict on lookahead a because there is an
item [A → α., a] calling for a reduction by A → α, and there is another item [B → β.aγ, b] calling
for a shift. Then some set of items from which the union was formed has the item [A → α., a], and
since the cores of all these states are the same, it must also have an item [B → β.aγ, c] for some c.
But then this state has the same shift/reduce conflict on a, and the grammar was not LR(1) as we
assumed. Thus the merging of states with a common core can never produce a shift/reduce conflict
that was not already present in one of the original states.
Merging can, however, produce reduce-reduce conflicts. Assume the states {[X → α., a], [Y → β., b]}
and {[X → α., b], [Y → β., a]}. Merging the two states produces {[X → α., a/b], [Y → β., a/b]}, which
generates a reduce-reduce conflict, since reductions by both X → α and Y → β are called for on
inputs a and b.
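The merging step and the resulting reduce-reduce conflict can be reproduced in a few lines. This is a sketch under assumptions: items are encoded as (head, body, dot position, lookahead), the function names are made up for illustration, and α and β are both taken to be the single symbol "e" only so that the items are concrete.

```python
from collections import defaultdict


def core(state):
    """The core of a set of LR(1) items: the items without their lookaheads."""
    return frozenset((head, body, dot) for head, body, dot, _ in state)


def merge_by_core(states):
    """Union all LR(1) item sets that share the same core."""
    merged = defaultdict(set)
    for state in states:
        merged[core(state)] |= set(state)
    return list(merged.values())


def reduce_reduce_conflicts(state):
    """Lookaheads on which two different completed items call for a reduction."""
    by_lookahead = defaultdict(set)
    for head, body, dot, look in state:
        if dot == len(body):
            by_lookahead[look].add((head, body))
    return {look for look, prods in by_lookahead.items() if len(prods) > 1}


# {[X -> e., a], [Y -> e., b]} and {[X -> e., b], [Y -> e., a]}
state_1 = {("X", ("e",), 1, "a"), ("Y", ("e",), 1, "b")}
state_2 = {("X", ("e",), 1, "b"), ("Y", ("e",), 1, "a")}

(merged_state,) = merge_by_core([state_1, state_2])
print(sorted(reduce_reduce_conflicts(state_1)))       # no conflict before merging
print(sorted(reduce_reduce_conflicts(merged_state)))  # conflicts after merging
```

Neither original state has a conflict, while the merged state calls for both reductions on a and on b.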
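Summary
[Figure: hierarchy of grammar classes, showing unambiguous and ambiguous grammars and the classes
LR(1), LALR(1), SLR(1), LR(0), operator precedence, LL(k), LL(2), LL(1) and LL(0).]
Example grammars, classified against LL(1), LR(0), SLR(1), LALR(1), LR(1) and LR(k):
1) S → a
2) E → E + T | T
   T → i
3) E → E + T | T
   T → T F | F
   F → F * | a | b
4) S → A | a
   A → a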
Operator-Precedence Parsing :
RELATION     MEANING
a <. b       a yields precedence to b
a = b        a has the same precedence as b
a .> b       a takes precedence over b
Fig. 9.2.2
There are two common ways of determining what precedence relations should hold between a
pair of terminals. The first method we discuss is intuitive and is based on the traditional notions
of associativity and precedence of operators.
The second method of selecting operator-precedence relations is first to construct an
unambiguous grammar for the language, a grammar that reflects the correct associativity and
precedence in its parse trees.
[Table: operator-precedence relations between terminals, including id and *.]
Method: Initially, the stack contains $ and the input buffer the string w$.
To parse, we execute the following program.
set ip (input pointer) to point to the first symbol of w$;
repeat forever
    if $ is on top of the stack and ip points to $ then return
    else begin
        let a be the topmost terminal symbol on the stack
        and let b be the symbol pointed to by ip;
        if a <. b or a = b then begin
            push b onto the stack;
            advance ip to the next input symbol;
        end
        else if a .> b then /* reduce */
            repeat
                pop the stack
            until the top stack terminal is related by <. to the terminal most recently popped
        else error
    end
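The parsing loop above translates almost line by line into a program. In the following sketch the relation table REL is an assumption: it is the usual table for id, + and * in which * binds tighter than + and both are left-associative, with "<", "=" and ">" standing for <., = and .>. Only terminals are kept on the stack.

```python
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}


def op_precedence_parse(tokens):
    """Return the handles (lists of terminals) in the order they are reduced."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    ip = 0
    handles = []
    while True:
        a, b = stack[-1], tokens[ip]
        if a == "$" and b == "$":
            return handles
        relation = REL.get((a, b))
        if relation in ("<", "="):          # shift
            stack.append(b)
            ip += 1
        elif relation == ">":               # reduce
            handle = []
            while True:
                popped = stack.pop()
                handle.insert(0, popped)
                # stop when the new top is related by <. to the popped terminal
                if REL.get((stack[-1], popped)) == "<":
                    break
            handles.append(handle)
        else:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")


print(op_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id + id * id the three identifiers are reduced first, then *, then +, which reflects the higher precedence of *.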
Operator-Precedence Relations from Associativity and Precedence :
1. If operator θ1 has higher precedence than operator θ2, make θ1 .> θ2 and θ2 <. θ1. For
example, if * has higher precedence than +, make * .> + and + <. *.
2. If θ1 and θ2 are operators of equal precedence (they may in fact be the same operator), then
make θ1 .> θ2 and θ2 .> θ1 if the operators are left-associative, or make θ1 <. θ2 and θ2 <. θ1 if
they are right-associative.
3. Make θ <. id, id .> θ, θ <. (, ( <. θ, ) .> θ, θ .> ), θ .> $, and $ <. θ for all operators θ.
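Rules 1 to 3 can be applied mechanically. The sketch below builds the relations from a table of precedence levels and associativities; the function name and the input format are assumptions, parentheses are left out to keep it short, and the two entries between $ and id are added by the sketch because rule 3 only covers pairs that involve an operator.

```python
def build_relations(operators):
    """operators maps each operator to (precedence level, associativity);
    a larger level binds tighter, associativity is "left" or "right".
    Returns a dict (a, b) -> "<" or ">" standing for <. and .>."""
    rel = {}
    for t1, (p1, assoc1) in operators.items():
        for t2, (p2, _) in operators.items():
            if p1 > p2:                      # rule 1: t1 has higher precedence
                rel[(t1, t2)] = ">"
            elif p1 < p2:                    # rule 1, seen from the other side
                rel[(t1, t2)] = "<"
            else:                            # rule 2: equal precedence
                rel[(t1, t2)] = ">" if assoc1 == "left" else "<"
    for t in operators:                      # rule 3, restricted to id and $
        rel[(t, "id")] = "<"
        rel[("id", t)] = ">"
        rel[(t, "$")] = ">"
        rel[("$", t)] = "<"
    rel[("$", "id")] = "<"                   # assumption: not listed in rule 3
    rel[("id", "$")] = ">"                   # assumption: not listed in rule 3
    return rel


relations = build_relations({"+": (1, "left"), "*": (2, "left"), "^": (3, "right")})
print(relations[("+", "*")], relations[("*", "+")], relations[("^", "^")])
```

Here ^ is a hypothetical right-associative operator added only to exercise the second half of rule 2.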
9.3
There are two notations for associating semantic rules with productions, syntax-directed
definitions and translation schemes. Syntax-directed definitions are high-level specifications for
translations. Translation schemes indicate the order in which semantic rules are to be evaluated,
so they allow some implementation details to be shown.
Conceptually, with both syntax-directed definitions and translation schemes, we parse the input
token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic
rules at the parse-tree nodes. Evaluation of the semantic rules may generate code, save
information in a symbol table, issue error messages, or perform any other activities. The
translation of the token stream is the result obtained by evaluating the semantic rules.
Synthesized Attributes:
A syntax-directed definition that uses synthesized attributes exclusively is said to be an S-attributed definition. A parse tree for an S-attributed definition can always be annotated by
evaluating the semantic rules for the attributes at each node bottom up, from the leaves to the
root.
Inherited Attributes:
An inherited attribute is one whose value at a node in a parse tree is defined in terms of
attributes at the parent and/or siblings of that node. Inherited attributes are convenient for
expressing the dependence of a programming language construct on the context in which it
appears.
Dependency Graphs:
If an attribute b at a node in a parse tree depends on an attribute c, then the semantic rule for b
at that node must be evaluated after the semantic rule that defines c. The interdependences
among the inherited and synthesized attributes at the nodes in a parse tree can be depicted by a
directed graph called a dependency graph.
Evaluation Order :
A topological sort of a directed acyclic graph is any ordering m1, m2, ..., mk of the nodes of the
graph such that edges go from nodes earlier in the ordering to later nodes; that is, if mi → mj is an
edge from mi to mj, then mi appears before mj in the ordering.
Any topological sort of a dependency graph gives a valid order in which the semantic rules
associated with the nodes in a parse tree can be evaluated. That is, in the topological sort, the
dependent attributes c1, c2, ..., ck in a semantic rule b := f(c1, c2, ..., ck) are available at a node
before f is evaluated.
A syntax-directed definition is said to be circular if the dependency graph for some parse tree
generated by its grammar has a cycle.
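An evaluation order can be computed directly from the dependency graph. The sketch below uses Kahn's algorithm, which is one of several ways to produce a topological sort and is a choice made here, not something the text prescribes; the attribute names in the example are hypothetical. A cycle, which corresponds to a circular definition, is reported as an error.

```python
from collections import deque


def topological_order(nodes, edges):
    """An edge (c, b) means attribute b depends on c, so c is evaluated first.
    Raises ValueError if the dependency graph has a cycle."""
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("circular definition: dependency graph has a cycle")
    return order


# T.val depends on F1.val and F2.val (as for a production T -> F1 * F2).
nodes = ["T.val", "F1.val", "F2.val"]
edges = [("F1.val", "T.val"), ("F2.val", "T.val")]
print(topological_order(nodes, edges))
```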
Syntax Trees
An (abstract) syntax tree is a condensed form of parse tree useful for representing language
constructs. The production S → if B then S1 else S2 might appear in a syntax tree as
        if-then-else
       /     |      \
      B      S1      S2
Fig. 9.3.1
In a syntax tree, operators and keywords do not appear as leaves, but rather are associated with
the interior node that would be the parent of those leaves in the parse tree. Another
simplification found in syntax trees is that chains of single productions may be collapsed.
Syntax-directed translation can be based on syntax trees as well as parse trees.
Constructing Syntax Trees for Expressions
The construction of a syntax tree for an expression is similar to the translation of the expression
into postfix form. We construct sub trees for the sub expressions by creating a node for each
operator and operand. The children of an operator node are the roots of the nodes representing
the sub expressions constituting the operands of that operator.
Each node in a syntax tree can be implemented as a record with several fields. In the node for an
operator, one field identifies the operator and the remaining fields contain pointers to the nodes
for the operands. The operator is often called the label of the node. When it is used for
translation, the nodes in a syntax tree may have additional fields to hold the values (or pointers
to values) of attributes attached to the node. In this section, we use the following functions to
create the nodes of syntax trees for expressions with binary operators. Each function returns a
pointer to a newly created node.
1. mknode(op, left, right) creates an operator node with label op and two fields containing
pointers to left and right,
2. mkleaf (id, entry) creates an identifier node with label id and a field containing entry, a
pointer to the symbol-table entry for the identifier.
3. mkleaf (num, val) creates a number node with label num and a field containing val, the value
of the number.
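A minimal sketch of these three functions follows. The Node record and its field names are assumptions; the strings "a" and "c" stand in for pointers to symbol-table entries. The calls at the end build the tree for a - 4 + c bottom-up.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    label: str                           # operator, "id" or "num"
    left: Optional["Node"] = None        # operand pointers (operator nodes only)
    right: Optional["Node"] = None
    value: Union[str, int, None] = None  # symbol-table entry or number (leaves)


def mknode(op, left, right):
    """Create an operator node with label op and two operand pointers."""
    return Node(label=op, left=left, right=right)


def mkleaf(label, value):
    """Create an identifier leaf (label "id") or a number leaf (label "num")."""
    return Node(label=label, value=value)


p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)     # a - 4
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)     # (a - 4) + c, the root of the tree
```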
A Syntax-Directed Definition for Constructing Syntax Trees
Figure 9.3.2 contains an S-attributed definition for constructing a syntax tree for an expression
containing the operators + and -. It uses the underlying productions of the grammar to schedule
the calls of the functions mknode and mkleaf to construct the tree. The synthesized attribute
nptr for E and T keeps track of the pointers returned by the function calls.
PRODUCTION        SEMANTIC RULES
E → E1 + T        E.nptr := mknode('+', E1.nptr, T.nptr)
E → E1 - T        E.nptr := mknode('-', E1.nptr, T.nptr)
E → T             E.nptr := T.nptr
T → (E)           T.nptr := E.nptr
T → id            T.nptr := mkleaf(id, id.entry)
T → num           T.nptr := mkleaf(num, num.val)
Fig. 9.3.2 Syntax-directed definition for constructing a syntax tree for an expression.
Each cell in a linked list represents a node. The bucket headers, consisting of pointers to the first
cell in a list, are stored in an array. The bucket number returned by h(op, l, r) is an index into this
array of bucket headers.
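The bucket scheme can be sketched as follows. Python lists stand in for the linked lists, the hash function h and the bucket count are arbitrary choices for illustration, and each node is stored as the triple (op, l, r), where l and r are the numbers of the child nodes. Looking up a node before creating it is what makes common subexpressions share a node.

```python
NUM_BUCKETS = 8                               # size chosen arbitrarily
buckets = [[] for _ in range(NUM_BUCKETS)]    # array of bucket headers
nodes = []                                    # nodes[k] is the node numbered k


def h(op, l, r):
    """Hypothetical hash function: maps a node signature to a bucket number."""
    return hash((op, l, r)) % NUM_BUCKETS


def find_or_create(op, l=None, r=None):
    """Return the number of the node <op, l, r>, creating it only if needed."""
    bucket = buckets[h(op, l, r)]
    for number in bucket:                     # search the list for this bucket
        if nodes[number] == (op, l, r):
            return number
    nodes.append((op, l, r))
    bucket.append(len(nodes) - 1)
    return len(nodes) - 1


# Nodes for b * -c + b * -c: the second b * -c reuses the first.
b = find_or_create("id", "b")
c = find_or_create("id", "c")
first = find_or_create("*", b, find_or_create("uminus", c))
second = find_or_create("*", b, find_or_create("uminus", c))
root = find_or_create("+", first, second)
```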
[Figure: array of bucket headers, indexed by hash value, each pointing to a list of elements
representing nodes.]
[Figure: parser stack with the fields state and val; the val entries hold X.x, Y.y and Z.z, and top
points to the entry for Z.]
L-Attributed Definitions
When translation takes place during parsing, the order of evaluation of attributes is linked to the
order in which nodes of a parse tree are "created" by the parsing method. A natural order that
characterizes many top-down and bottom-up translation methods is the one obtained by
applying the procedure dfvisit in Fig. 9.3.5 to the root of a parse tree. We call this evaluation order the
depth-first order. Even if the parse tree is not actually constructed, it is useful to study
translation during parsing by considering depth-first evaluation of attributes at the nodes of a
parse tree.
procedure dfvisit(n: node);
begin
    for each child m of n, from left to right do begin
        evaluate inherited attributes of m;
        dfvisit(m)
    end;
    evaluate synthesized attributes of n
end
Fig. 9.3.5 Depth-first evaluation order for attributes in a parse tree.
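The same procedure in runnable form makes the evaluation order visible. In this sketch the ParseNode class and the log of attribute evaluations are assumptions introduced for illustration; no real attributes are computed.

```python
class ParseNode:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def dfvisit(n, log):
    """Append to log the order in which attributes would be evaluated."""
    for m in n.children:                       # children from left to right
        log.append(f"inherited({m.name})")     # inherited attributes of m
        dfvisit(m, log)
    log.append(f"synthesized({n.name})")       # synthesized attributes of n


tree = ParseNode("A", [ParseNode("L"), ParseNode("M")])
order = []
dfvisit(tree, order)
print(order)
```

For the tree of a production A → L M, the inherited attributes of L are evaluated before anything at M, and the synthesized attributes of A come last.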
We now introduce a class of syntax-directed definitions, called L-attributed definitions, whose
attributes can always be evaluated in depth-first order. (The L is for left because attribute
information appears to flow from left to right.) L-attributed definitions include all syntax-directed definitions based on LL(1) grammars.
L-Attributed Definitions
A syntax-directed definition is L-attributed if each inherited attribute of Xj, 1 ≤ j ≤ n, on the right
side of A → X1 X2 ... Xn, depends only on
1. The attributes of the symbols X1, X2, ..., Xj-1 to the left of Xj in the production and
2. The inherited attributes of A.
Note that every S-attributed definition is L-attributed, because the restrictions (1) and (2)
apply only to inherited attributes.
Translation Schemes
A Translation scheme is a context-free grammar in which attributes are associated with the
grammar symbols and semantic actions enclosed between braces {} are inserted within the right
sides of productions.
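PRODUCTION        SEMANTIC RULES
A → L M           L.i := l(A.i)
                  M.i := m(L.s)
                  A.s := f(M.s)
A → Q R           R.i := r(A.i)
                  Q.i := q(R.s)
                  A.s := f(Q.s)
This definition is not L-attributed, because the inherited attribute Q.i depends on R.s, an
attribute of the symbol R to its right.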
PRODUCTION        SEMANTIC RULE
T → T1 * F        T.val := T1.val × F.val
Written as a translation scheme, the production and its semantic action become
T → T1 * F { T.val := T1.val × F.val }
Top-Down Translation
In this section, L-attributed definitions will be implemented during predictive parsing. We work
with translation schemes rather than syntax-directed definitions so we can be explicit about the
order in which actions and attribute evaluations take place. We also extend the algorithm for
left-recursion elimination to translation schemes with synthesized attributes.
Eliminating Left Recursion from a Translation Scheme
Since most arithmetic operators associate to the left, it is natural to use left- recursive grammars
for expressions. We now extend the algorithm for eliminating left recursion to allow for
attributes when the underlying grammar of a translation scheme is transformed; the
transformation applies to translation schemes with synthesized attributes. The next example
motivates the transformation.
Suppose we have the following translation scheme
A → A1 Y { A.a := g(A1.a, Y.y) }
A → X { A.a := f(X.x) }
Each grammar symbol has a synthesized attribute written using the corresponding lower case
letter, and f and g are arbitrary functions. The generalization to additional A-productions and to
productions with strings in place of symbols X and Y can be done as below.
The algorithm for eliminating left recursion constructs the following grammar
A → X R
R → Y R | ε
Taking the semantic actions into account, the transformed scheme becomes
A → X { R.i := f(X.x) }
    R { A.a := R.s }
R → Y { R1.i := g(R.i, Y.y) }
    R1 { R.s := R1.s }
R → ε { R.s := R.i }
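The transformed scheme maps directly onto a recursive-descent routine in which the inherited attribute R.i is a parameter and the synthesized attribute R.s is the return value. The sketch below is illustrative only: the token list stands for the input X Y Y ..., and f and g are supplied by the caller.

```python
def translate(tokens, f, g):
    """Evaluate A.a for the input X Y Y ... using the transformed scheme:
       A -> X {R.i := f(X.x)} R {A.a := R.s}
       R -> Y {R1.i := g(R.i, Y.y)} R1 {R.s := R1.s}
       R -> epsilon {R.s := R.i}"""
    pos = 0

    def parse_R(i):
        nonlocal pos
        if pos < len(tokens):            # R -> Y R1
            y = tokens[pos]
            pos += 1
            return parse_R(g(i, y))      # R1.i := g(R.i, Y.y); R.s := R1.s
        return i                         # R -> epsilon; R.s := R.i

    x = tokens[pos]                      # A -> X R
    pos += 1
    return parse_R(f(x))                 # R.i := f(X.x); A.a := R.s


# With g as subtraction, 9 5 2 is evaluated as (9 - 5) - 2.
print(translate([9, 5, 2], f=lambda x: x, g=lambda a, y: a - y))
```

Although the grammar is now right-recursive, the value is accumulated in R.i from left to right, so the left-associative meaning of the original scheme is preserved.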
9.4
The scope rules of a language determine which declaration of a name applies when the name
appears in the text of a program.
The portion of the program to which a declaration applies is called the scope of that
declaration.
An occurrence of a name in a procedure is said to be local to the procedure if it is in the
scope of a declaration within the procedure; otherwise, the occurrence is said to be nonlocal.
At compile time, the symbol table can be used to find the declaration that applies to an
occurrence of a name. When a declaration is seen, a symbol- table entry is created for it. As
long as we are in the scope of the declaration, its entry is returned when the name in it is
looked up.
Activation Records
Information needed by a single execution of a procedure is managed using a contiguous block of
storage called an activation record, consisting of the collection of fields shown in Fig. Not all
languages or all compilers use all of these fields; often registers can take the place of one or more
of them. For languages like Pascal and C it is customary to push the activation record of a
procedure on the run-time stack when the procedure is called and to pop the activation record
off the stack when control returns to the caller. The purpose of the fields of an activation record is
as follows, starting from the field for temporaries.
1. Temporary values, such as those arising in the evaluation of expressions, are stored in the
field for temporaries.
2. The field for local data holds data that is local to an execution of a procedure.
3. The field for saved machine status holds information about the state of the machine just
before the procedure is called. This information includes the values of the program counter
and machine registers that have to be restored when control returns from the procedure.
4. The optional access link is used to refer to nonlocal data held in other activation records.
5. The optional control link points to the activation record of the caller.
returned value
actual parameters
optional control link
optional access link
saved machine status
local data
temporaries
9.5
Although a source program can be translated directly into the target language, some benefits of
using a machine-independent intermediate form are:
1. Retargeting is facilitated; a compiler for a different machine can be created by attaching a
back end for the new machine to an existing front end,
2. A machine-independent code optimizer can be applied to the intermediate representation.
For simplicity, we assume that the source program has already been parsed and statically
checked, as in the organization of Fig.9.5.1
parser → static checker → intermediate code generator → (intermediate code) → code generator
Intermediate languages
Syntax trees and postfix notation are two kinds of intermediate representations. A
third, called three-address code, will be discussed here. The semantic rules for generating three-address code from common programming language constructs are similar to those for
constructing syntax trees or for generating postfix notation.
Graphical Representations
A syntax tree depicts the natural hierarchical structure of a source program. A DAG gives the same
information but in a more compact way because common sub expressions are identified. A
syntax tree and DAG for the assignment statement a := b * -c + b * -c appear in Fig. 9.5.2.
(a) Syntax tree: assign(a, +(*(b, uminus(c)), *(b, uminus(c))))
(b) DAG: assign(a, +(p, p)), where p = *(b, uminus(c)) is a single shared node
Fig. 9.5.2 Graphical representations of a := b * -c + b * -c
Three-Address Statements
Three-address statements are similar to assembly code. Statements can have symbolic labels
and there are statements for flow of control. A symbolic label represents the index of a three-address statement in the array holding intermediate code. Actual indices can be substituted for
the labels either by making a separate pass, or by using "backpatching".
Here are the common three-address statements used in the remainder of this book:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical
operation.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential
unary operations include unary minus, logical negation, shift operators, and conversion
operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be
executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational
operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x
stands in relation relop to y. If not, the three-address statement following if x relop y
goto L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y representing a returned
value is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the
number of actual parameters in "call p, n" is not redundant because calls can be nested.
7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the
value in the location i memory units beyond location y. The statement x[i] := y sets the
contents of the location i units beyond x to the value of y. In both these instructions, x, y,
and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y, and *x := y. The first of
these sets the value of x to be the location of y. Presumably y is a name, perhaps a
temporary, that denotes an expression with an l-value such as A[i, j], and x is a pointer
name or temporary. That is, the r-value of x is the l-value (location) of some object. In the
statement x := *y, presumably y is a pointer or a temporary whose r-value is a location.
The r-value of x is made equal to the contents of that location. Finally, *x := y
sets the r-value of the object pointed to by x to the r-value of y.
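PRODUCTION        SEMANTIC RULES
S → id := E       S.code := E.code || gen(id.place ':=' E.place)
E → E1 + E2       E.place := newtemp;
                  E.code := E1.code || E2.code ||
                            gen(E.place ':=' E1.place '+' E2.place)
E → E1 * E2       E.place := newtemp;
                  E.code := E1.code || E2.code ||
                            gen(E.place ':=' E1.place '*' E2.place)
E → -E1           E.place := newtemp;
                  E.code := E1.code || gen(E.place ':=' 'uminus' E1.place)
E → (E1)          E.place := E1.place;
                  E.code := E1.code
E → id            E.place := id.place;
                  E.code := ''
Fig. 9.5.3 Syntax-directed definition to produce three-address code for assignments.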
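The attributes E.place and E.code can be computed by a small recursive function. The sketch below is illustrative: expressions are given as nested tuples instead of being parsed, and the tuple encoding and function names (gen_expr, gen_assign) are assumptions. newtemp follows the text in returning a fresh temporary name on each call.

```python
import itertools

_temps = itertools.count(1)


def newtemp():
    """Return a new temporary name t1, t2, ... on each call."""
    return f"t{next(_temps)}"


def gen_expr(e):
    """e is a name (str) or a tuple ("+", e1, e2), ("*", e1, e2), ("uminus", e1).
    Returns (place, code), mirroring the attributes E.place and E.code."""
    if isinstance(e, str):                       # E -> id
        return e, []
    if e[0] == "uminus":                         # E -> -E1
        place1, code1 = gen_expr(e[1])
        place = newtemp()
        return place, code1 + [f"{place} := uminus {place1}"]
    op, e1, e2 = e                               # E -> E1 + E2 | E1 * E2
    place1, code1 = gen_expr(e1)
    place2, code2 = gen_expr(e2)
    place = newtemp()
    return place, code1 + code2 + [f"{place} := {place1} {op} {place2}"]


def gen_assign(name, e):                         # S -> id := E
    place, code = gen_expr(e)
    return code + [f"{name} := {place}"]


product = ("*", "b", ("uminus", "c"))
code = gen_assign("a", ("+", product, product))  # a := b * -c + b * -c
print("\n".join(code))
```

The output is the six-statement sequence t1 := uminus c through a := t5. The common subexpression b * -c is computed twice because the input is a tree, not a DAG.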
and the statement following the code for S, respectively. These attributes represent labels
created by a function newlabel that returns a new label every time it is called. Note that S.after
becomes the label of the statement that comes after the code for the while statement. We assume
that a non-zero expression represents true; that is, when the value of E becomes zero, control
leaves the while statement.
Expressions that govern the flow of control may in general be boolean expressions containing
relational and logical operators.
Postfix notation can be obtained by adapting the semantic rules in Fig. 9.5.3. The postfix notation
for an identifier is the identifier itself. The rules for the other productions concatenate only the
operator after the code for the operands. For example, associated with the production E → -E1 is
the semantic rule
E.code := E1.code || 'uminus'
In general, the intermediate form produced by the syntax-directed translations in this chapter
can be changed by making similar modifications to the semantic rules.
1) Quadruples
A quadruple is a record structure with four fields, which we call op, arg1, arg2, and result. The op
field contains an internal code for the operator. The three-address statement x := y op z is
represented by placing y in arg1, z in arg2, and x in result. Statements with unary operators like
x := -y or x := y do not use arg2. Operators like param use neither arg2 nor result. Conditional
and unconditional jumps put the target label in result. The quadruples in Fig. 9.5.4(a) are for
the assignment a := b * -c + b * -c.
The contents of fields arg1, arg2, and result are normally pointers to the symbol-table entries
for the names represented by these fields; if so, temporary names must be entered into the
symbol table as they are created.
2) Triples
To avoid entering temporary names into the symbol table, we might refer to a temporary value
by the position of the statement that computes it. If we do so, three-address statements can be
represented by records with only three fields: op, arg1 and arg2. The fields arg1 and arg2, for the
arguments of op, are either pointers to the symbol table (for programmer-defined names or
constants) or pointers into the triple structure (for temporary values). Since three fields are used,
this intermediate code format is known as triples. Except for the treatment of
programmer-defined names, triples correspond to the representation of a syntax tree or dag by an
array of nodes.
       op       arg1    arg2    result
(0)    uminus   c               t1
(1)    *        b       t1      t2
(2)    uminus   c               t3
(3)    *        b       t3      t4
(4)    +        t2      t4      t5
(5)    :=       t5              a
(a) Quadruples
       op       arg1    arg2
(0)    uminus   c
(1)    *        b       (0)
(2)    uminus   c
(3)    *        b       (2)
(4)    +        (1)     (3)
(5)    assign   a       (4)
(b) Triples
Fig. 9.5.4 Quadruple and triple representations of three-address statements.
Parenthesized numbers represent pointers into the triple structure, while symbol-table pointers
are represented by the names themselves. In practice, the information needed to interpret the
different kinds of entries in the arg1 and arg2 fields can be encoded into the op field or some
additional fields.
The triples in Fig. 9.5.4(b) correspond to the quadruples in Fig. 9.5.4(a). Note that the copy
statement a := t5 is encoded in the triple representation by placing a in the arg1 field and using
the operator assign.
A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig.
9.5.5(a), while x := y[i] is naturally represented as two operations in Fig. 9.5.5(b).
       op       arg1    arg2
(0)    []=      x       i
(1)    assign   (0)     y
(a) x[i] := y
       op       arg1    arg2
(0)    =[]      y       i
(1)    assign   x       (0)
(b) x := y[i]
Fig. 9.5.5 More triple representations
3) Indirect Triples
Another implementation of three-address code that has been considered is that of listing
pointers to triples, rather than listing the triples themselves. This implementation is naturally
called indirect triples.
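The relationship between the three representations can be sketched in a few lines. The code below converts the quadruples for a := b * -c + b * -c into triples and then sets up a statement list for indirect triples. The tuple layouts are assumptions, and the sketch treats every non-copy result as a temporary, which holds for this example but not in general.

```python
# Quadruples (op, arg1, arg2, result) for a := b * -c + b * -c.
quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]


def to_triples(quads):
    """Replace each temporary by the position of the triple that computes it.
    Positions are stored as ints; programmer-defined names stay strings."""
    position = {}                # temporary name -> index of defining triple
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)
        a2 = position.get(arg2, arg2)
        if op == ":=":           # copy statement: (assign, target, value)
            triples.append(("assign", result, a1))
        else:
            triples.append((op, a1, a2))
            position[result] = len(triples) - 1
    return triples


triples = to_triples(quads)

# Indirect triples: a separate list of pointers gives the execution order,
# so statements can be reordered without renumbering the triples themselves.
statement_order = list(range(len(triples)))
for index in statement_order:
    print(index, triples[index])
```

With plain triples, moving a statement changes the positions that other triples refer to. With the extra statement list, only the list is permuted.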
9.6
Code Optimization
Loop Optimizations
Three techniques are important for loop optimization:
Code motion, which moves code outside a loop.
Induction-variable elimination, which removes induction variables (such as loop counters i and j)
whose values change in lock step with another variable.
Reduction in strength, which replaces an expensive operation by a cheaper one, such as a
multiplication by an addition.
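The effect of two of these techniques can be shown at source level. In the hypothetical example below, the function names and the loop itself are made up for illustration: the second function hoists the loop-invariant expression limit - 2 (code motion) and replaces the multiplication i * 4 by a running addition (reduction in strength). A compiler would perform these steps on intermediate code, not on source text.

```python
def before(a, n, limit):
    """Original loop: limit - 2 and i * 4 are recomputed on every iteration."""
    total, i = 0, 0
    while i <= limit - 2 and i < n:
        total += a[i] * (i * 4)
        i += 1
    return total


def after(a, n, limit):
    """The same loop after code motion and reduction in strength."""
    total, i = 0, 0
    t = limit - 2          # code motion: loop-invariant expression hoisted
    i4 = 0                 # strength reduction: i4 always equals i * 4 ...
    while i <= t and i < n:
        total += a[i] * i4
        i += 1
        i4 += 4            # ... and is updated by an addition
    return total


data = list(range(10))
print(before(data, 10, 7), after(data, 10, 7))
```

Both functions return the same value for every input; only the work done per iteration differs.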
9.7
Code Generation
The code generator takes as input an intermediate representation of the source program and
produces as output an equivalent target program.
The output code must be correct and of high quality, meaning that it should make effective use of
the resources of the target machine. Moreover, the code generator itself should run efficiently.
Issues in code generation:
Memory Management
Mapping names in the source program to addresses of data objects in run-time memory is done
cooperatively by the front end and the code generator.
Instruction Selection
The nature of the instruction set of the target machine determines the difficulty of instruction
selection. The uniformity and completeness of the instruction set are important factors. If the
target machine does not support each data type in a uniform manner, then each exception to the
general rule requires special handling.
Instruction speeds and machine idioms are other important factors. If we do not care about the
efficiency of the target program, instruction selection is straightforward. For each type of three-address statement, we can design a code skeleton that outlines the target code to be generated
for that construct.
Register Allocation
Instructions involving register operands are usually shorter and faster than those involving
operands in memory. Therefore, efficient utilization of registers is particularly important in
generating good code. The use of registers is often subdivided into two sub problems:
1. During register allocation, we select the set of variables that will reside in registers at a
point in the program.
2. During a subsequent register assignment phase, we pick the specific register that a variable
will reside in.
Choice of Evaluation Order
The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
The Target Machine
Familiarity with the target machine and its instruction set is a prerequisite for designing a good
code generator. Unfortunately, in a general discussion of code generation it is not possible to
describe the nuances of any target machine in sufficient detail to be able to generate good code
for a complete language on that machine.
Flow Graphs
We can add the flow-of-control information to the set of basic blocks making up a program by
constructing a directed graph called a flow graph. The nodes of the flow graph are the basic
blocks. One node is distinguished as initial; it is the block whose leader is the first statement.
There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some
execution sequence; that is, if
1. there is a conditional or unconditional jump from the last statement of B1 to the first
statement of B2, or
2. B2 immediately follows B1 in the order of the program, and B1 does not end in an
unconditional jump.
We say that B1 is a predecessor of B2, and B2 is a successor of B1.
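The two conditions for an edge can be checked block by block. In the following sketch the description of a block (a name, an optional jump target, and a flag saying whether the jump is conditional) is an assumed encoding; a real compiler would read this information from the last three-address statement of each block.

```python
def flow_graph_edges(blocks):
    """blocks is a list in program order. Each block is a dict with "name",
    an optional "jump" target (a block name) and a "conditional" flag that
    describes its last statement. Returns the set of edges (B1, B2)."""
    edges = set()
    for index, block in enumerate(blocks):
        target = block.get("jump")
        if target is not None:                               # condition 1
            edges.add((block["name"], target))
        unconditional = target is not None and not block.get("conditional", False)
        if not unconditional and index + 1 < len(blocks):    # condition 2
            edges.add((block["name"], blocks[index + 1]["name"]))
    return edges


# B2 ends with a conditional jump back to its own first statement.
blocks = [
    {"name": "B1"},
    {"name": "B2", "jump": "B2", "conditional": True},
    {"name": "B3"},
]
edges = flow_graph_edges(blocks)
print(sorted(edges))
```

The predecessors of a block are the sources of its incoming edges, so here B2 has the predecessors B1 and B2.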
Representation of Basic Blocks
Basic blocks can be represented by a variety of data structures. For example, after partitioning
the three-address statements by Algorithm, each basic block can be represented by a record
consisting of a count of the number of quadruples in the block, followed by a pointer to the
leader (first quadruple) of the block, and by the lists of predecessors and successors of the block.
An alternative is to make a linked list of the quadruples in each block. Explicit references to
quadruple numbers in jump statements at the end of basic blocks can cause problems if
quadruples are moved during code optimization.
Loops
In a flow graph, what is a loop, and how does one find all loops? Most of the time, it is easy to
answer these questions. For example, in Fig. there is one loop, consisting of block B2. The general
answers to these questions, however, are a bit subtle, and we shall examine them in detail in the
next chapter. For the present, it is sufficient to say that a loop is a collection of nodes in a flow
graph such that
1. All nodes in the collection are strongly connected; that is, from any node in the loop to any
other, there is a path of length one or more, wholly within the loop, and
2. The collection of nodes has a unique entry, that is, a node in the loop such that the only way
to reach a node of the loop from a node outside the loop is to first go through the entry.
A loop that contains no other loops is called an inner loop.
Overview of Phases of a Compiler:
A compiler takes a source program as input and produces an equivalent sequence of
machine instructions as output. This process is so complex that it is divided into a series of
subprocesses called phases.
The following gives a brief explanation of the phases of a compiler, with an example.
1. Lexical Analysis:
It is the first phase of a compiler. The lexical analyzer, or scanner, reads the characters in the source
program and groups them into a stream of tokens. The usual tokens are identifiers, keywords,
constants, operators and punctuation symbols such as commas and parentheses. Each token is a
substring of the source program that is to be treated as a single unit. Tokens are of two types:
1. Specific strings, e.g. if, semicolon
2. Classes of strings, e.g. identifier, constant, label.
A token is treated as a pair consisting of two parts:
1. Token type
2. Token value.
The character sequence forming a token is called the lexeme for the token. Certain tokens are
augmented by a lexical value. The lexical analyzer not only generates a token but also enters the
lexeme into the symbol table.
Symbol table
1. a
2. b
3. c
Token values are represented by pairs in square brackets. The second component of the pair is
an index into the symbol table, where the information about the token is kept. For example, consider the expression
a = b + c * 20
After lexical analysis it becomes
id1 = id2 + id3 * 20
The lexical phase can detect errors where the characters remaining in the input do not form any
token of the language, e.g. an unrecognized keyword.
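A minimal sketch of such a scanner in Python is given below. The token classes and the regular expressions are assumptions made for this example (they cover only identifiers, integer constants and a few operators); the symbol table is a simple list whose 1-based positions play the role of the index in id1, id2, id3.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a one-line expression."""
    symbol_table = []  # identifier lexemes, in order of first appearance
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # The remaining characters do not form any token of the language.
            raise SyntaxError(f"unrecognized character {source[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # The token value is the 1-based index into the symbol table.
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table


def render(tokens):
    """Show tokens in the id1 = id2 + ... style used in the text."""
    return " ".join(f"id{value}" if kind == "id" else str(value) for kind, value in tokens)


tokens, table = tokenize("a = b + c * 20")
print(render(tokens))
```

A character that matches none of the token classes, such as `$`, makes `tokenize` raise an error, which corresponds to the lexical errors described above.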
2. Syntax Analysis:
It groups tokens together into syntactic structures such as expressions. Expressions might
further be combined to form statements. Often the syntactic structure can be regarded as a tree
whose leaves are the tokens; such trees are called parse trees. The parser has two functions. First, it checks that the
tokens occur in patterns that are permitted by the specification of the source language, i.e.
syntax checking.
For example, consider the expression A+/B. After lexical analysis this reaches the syntax analyzer as the
token sequence id+/id. On seeing /, the syntax analyzer should detect an
error, because the presence of two adjacent binary operators violates the formation rules. The
second function is to make explicit the hierarchical structure of the incoming token stream by
identifying which parts of the token stream should be grouped together. The syntax analysis phase
detects syntax errors.
3. Semantic Analysis:
An important role of semantic analysis is type checking. Here the compiler checks that each
operator has operands that are permitted by the source language specification. Consider, e.g.:
x = a + b
The language specification may permit some operand coercions, for example when a binary
arithmetic operator is applied to an integer and a real. In this case, the compiler may need to
convert the integer to a real. In this phase, the compiler detects type mismatch errors.
4. Intermediate Code generation:
It uses the structure produced by the syntax analyzer to create a stream of simple
instructions. Many styles are possible. One common style uses instructions with one operator and
a small number of operands. The output of the previous phase is some representation of a parse
tree. This phase transforms this parse tree into an intermediate language. One popular type of
intermediate language is called three-address code.
A typical three-address code statement is A = B op C, where A, B and C are operands and op is a binary
operator.
E.g.: A = B + C * 20
Here, T1, T2, T3 are temporary variables, and id1, id2, id3 are the identifiers corresponding to A, B,
C.
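The translation from a tree to three-address code can be sketched as below. This is an illustration under stated assumptions: Python's own `ast` module stands in for the output of the syntax analyzer, only the operators + - * / are handled, and the integer-to-real conversion step is omitted, so the sketch needs only two temporaries for this statement.

```python
import ast


def three_address_code(statement):
    """Translate one assignment using + - * / into three-address statements."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    code = []
    counter = 0

    def emit(node):
        """Emit code for a subtree and return the name holding its value."""
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left = emit(node.left)
            right = emit(node.right)
            counter += 1
            temp = f"T{counter}"  # a fresh temporary for each operator
            code.append(f"{temp} = {left} {ops[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported construct in this sketch")

    assign = ast.parse(statement).body[0]
    if not isinstance(assign, ast.Assign) or len(assign.targets) != 1:
        raise ValueError("expected a single assignment")
    result = emit(assign.value)
    code.append(f"{assign.targets[0].id} = {result}")
    return code


for line in three_address_code("A = B + C * 20"):
    print(line)
```

Each emitted statement has at most one operator on the right-hand side, which is the defining property of three-address code described above.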
5. Code Optimization:
It is designed to improve the intermediate code so that the object program runs faster or takes less space.
Optimization may involve:
1. Detection & removal of dead code.
2. Calculation of constant expressions & terms.
3. Collapsing of repeated expressions into temporary storage.
4. Loop unrolling.
5. Moving code outside the loops.
6. Removal of unnecessary temporary-variables.
For example, A := B + C * 20 becomes
T1 = id3 * 20.0
id1 = id2 + T1
6. Code generation:
Once optimizations are completed, the intermediate code is mapped into the target language.
This involves allocation of registers and memory, generation of correct references, generation of
correct types, and generation of machine code.
E.g.: MOVF id3, R2
MULF #20.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
A few tips for checking which parsing class a grammar belongs to:
To check for LL(1), follow the method explained with the example in the top-down parsing section.
Remember that a predictive parser, or top-down parser, is by default taken to mean an LL(1) parser.
To check bottom-up parsing grammars, you can check in the following order or in the reverse
of the following order:
CLR(1), LALR(1), SLR(1), LR(0) grammar
To check CLR(1), construct the transition diagram with lookaheads and check whether all the
states are free of conflicts.
To check LALR(1), take the CLR(1) transition diagram and combine those states that
have a common core but different lookaheads in their productions. Then check that none of the states
contains a conflict (apply the rules given in the class).
To check SLR(1), take the same LALR(1) transition diagram, use the FOLLOW set of each
production's left-hand side as its reduce lookahead, and then check that none of the states
contains a conflict (apply the rules given in the class).
To check LR(0), take the same SLR(1) transition diagram without any lookahead for each
production and then check for conflicts (apply the rules given in the class).
Computer Networks
Components of Network
Computer networks consist of the following fundamental components:
1. Server
2. Workstations
3. Network Interface Cards
4. Cabling system
5. Shared Resources and Peripherals.
Classification of Networks
Networks are classified by the geographical area they span and fall into the following
categories:
1. Local Area Network (LAN)
2. Metropolitan Area Network (MAN)
3. Wide Area Network (WAN)
LAN
The number of computers in a LAN varies from small LANs that connect 2 to 25 computers to
large LANs that may connect more than 10,000 computers. Normally a LAN is owned by a single
organization. The speed of data transfer ranges from several thousand bits per second to several
Mbps (megabits per second).
LAN Topologies:
All network topologies are derived from two basic types, the bus and the point-to-point. A LAN
can be built in one of two different topologies:
o Bus topology
o Ring topology
Bus Topology
A bus network consists of a single medium that all the stations share.
Advantages of Bus Network
(a) Short cable length and simple wiring layout
(b) Resilient Architecture
(c) Easy to extend
Disadvantages of the Bus Network
(a) Fault diagnosis is difficult
(b) Fault isolation is difficult
(c) Repeater configuration
(d) Nodes must be intelligent
Ring Network
In a ring network, several devices or computers are connected to each other in a closed
loop by a single communication cable.
A ring network is also called loop network.
A ring can be unidirectional or bi-directional.
In a unidirectional ring, special software is needed if one computer should break down,
whereas in a bi-directional ring, messages can be sent in the opposite direction to reach
their destination.
A MAN is basically a bigger version of a LAN and normally uses similar technology.
A special standard adopted for MANs is called DQDB (Distributed Queue Dual
Bus). DQDB consists of two unidirectional buses (cables) to which all computers are
connected.
A key aspect of a MAN is that there is a broadcast medium to which all the computers are
attached.
A network can be built in one of the following topologies:
1. Star Network
2. Tree Network
3. Mesh Network
1. Star Network
In a star network, devices or computers are connected to one centralized computer.
Advantages of the Star Network:
(a) Ease of service
(b) One device per connection
(c) Centralized control / problem diagnosis
Disadvantages of the Star Network:
(a) Long cable Length
(b) Difficult to expand
(c) Central node dependency: If the central node in a star network fails, the entire network will
go down.
2. Tree Network
In a tree network, several devices or computers are linked in a hierarchical fashion. A tree
network is also known as a hierarchical network.
This type of distribution system is commonly used in organizations where
headquarters communicate with regional offices and regional offices communicate with
district offices and so on.
Advantages of Tree Network:
(a) Easy to extend
(b) Fault isolation
Disadvantages of Tree Network:
(a) Dependent on the root
3. Mesh Network
A mesh network has point-to-point connections between every pair of devices in the network.
Because each device requires an interface for every other device on the network, mesh topologies
are not usually considered practical.
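A short calculation shows why the cost grows quickly. The sketch below assumes a fully connected mesh of n devices with one dedicated link per pair; the function name is hypothetical.

```python
def mesh_requirements(n):
    """Links and per-device interfaces for a fully connected mesh of n devices."""
    links = n * (n - 1) // 2        # one point-to-point link per pair of devices
    interfaces_per_device = n - 1   # one interface for every other device
    return links, interfaces_per_device


for n in (5, 50):
    links, interfaces = mesh_requirements(n)
    print(f"{n} devices: {links} links, {interfaces} interfaces per device")
```

The number of links grows with the square of the number of devices, whereas a star or bus needs roughly one connection per device.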
Layered Architecture
The OSI model is built of seven ordered layers: physical (layer 1), data link (layer 2),
network (layer 3), transport (layer 4), session (layer 5), presentation (layer 6), and
application (layer 7).
Each layer defines a family of functions distinct from those of the other layers. By
defining and localizing functionality in this fashion, the designers created an architecture
that is both comprehensive and flexible.
The interfaces between the layers facilitate the passing of data downwards and upwards
between adjacent layers.
Framing: The data link layer divides the stream of bits received from the network layer into
manageable data units called frames.
Physical addressing: If frames are to be distributed to different systems on the network, the
data link layer adds a header to the frame to define the physical address of the sender
(source address) and/or receiver (destination address) of the frame. If the frame is intended
for a system outside the sender's network, the receiver address is the address of the device
that connects one network to the next.
Flow control: If the rate at which the data are absorbed by the receiver is less than the rate
produced in the sender, the data link layer imposes a flow control mechanism to prevent
overwhelming the receiver.
Error control: The data link layer adds reliability to the physical layer by adding mechanisms
to detect and retransmit damaged or lost frames. It also uses a mechanism to prevent
duplication of frames. Error control is normally achieved through a trailer added to the end
of the frame.
Access control: When two or more devices are connected to the same link, data link layer
protocols are necessary to determine which device has control over the link at any given
time.
Network Layer
Logical addressing: The physical addressing implemented by the data link layer handles the
addressing problem locally. If a packet passes the network boundary, we need another
addressing system to help distinguish the source and destination systems. The network layer
adds a header to the packet coming from the upper layer that, among other things, includes
the logical addresses of the sender and receiver.
Transport Layer
Service-point addressing: Computers often run several programs at the same time. For this
reason, source-to-destination delivery means delivery not only from one computer to the
next but also from a specific process (running program) on one computer to a specific
process (running program) on the other. The transport layer header therefore must include
a type of address called a service-point address (or port address). The network layer gets
each packet to the correct computer; the transport layer gets the entire message to
the correct process on that computer.
Flow control: Like the data link layer, the transport layer is responsible for flow control.
However, flow control at this layer is performed end to end rather than across a single link.
Error control: Like the data link layer, the transport layer is responsible for error control.
However, error control at this layer is performed end-to-end rather than across a single link.
The sending transport layer makes sure that the entire message arrives at the receiving
transport layer without error (damage, loss, or duplication). Error correction is usually
achieved through retransmission.
Session Layer
Dialog control
Synchronization
Presentation Layer
Translation: Different host computers use different formats to represent data. The
presentation layer at the source host converts the source-specific format of the data to a
well-known common format, and the receiving host converts the common format to its own format.
Application Layer
Mail services
Directory services
Amplifiers simply amplify the entire incoming signal, i.e. they amplify both the signal and the
noise.
Signal-regenerating repeaters differentiate between data and noise and retransmit only
the desired information. This reduces the noise. The original signal is duplicated, boosted to
its original strength, and sent.
Bridges:
Bridges connect two networks that use the same technology. The use of a bridge increases the
maximum possible size of your network. A bridge selectively determines the appropriate
segment to which it should pass a signal. It does this by reading the address of every frame it
receives. The bridge reads the physical location of the source and destination computers from
this address. There are two basic types of bridges:
Transparent bridges: These keep a table of addresses in memory to determine where to send
data.
Source-routing bridges: These require the entire route to be included in the transmission and
do not route packets intelligently. IBM Token Ring networks use this type of bridge.
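The address table of a transparent bridge can be sketched as follows. The text only says that the bridge keeps a table of addresses; the learn-from-source and flood-when-unknown behaviour shown here is the usual textbook scheme and is an assumption of this sketch, as are the class and method names.

```python
class TransparentBridge:
    """Toy model of a transparent bridge with a learned address table."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # physical address -> port on which it was last seen

    def receive(self, src, dst, in_port):
        """Process one frame and return the list of ports it is forwarded to."""
        self.table[src] = in_port  # learn which segment the sender is on
        out_port = self.table.get(dst)
        if out_port is None:
            # Destination not yet known: send to every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the same segment, so do not forward
        return [out_port]


bridge = TransparentBridge(ports=[1, 2])
print(bridge.receive("A", "B", in_port=1))  # B unknown, frame is flooded
print(bridge.receive("B", "A", in_port=2))  # A was learned on port 1
```

A frame whose source and destination lie on the same segment is not passed on, which is the selective forwarding described above.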
Routers:
A router routes data between networks. Routers use logical and physical addressing to connect
two or more logically separate networks.
B-Routers:
Many routers may be appropriately called brouters. A brouter is a router that can also bridge.
Gateways:
A gateway is a device that can interpret and translate the different protocols that are used on
two distinct networks. The gateway can actually convert data so that it works with an
application on a computer on the other side of the gateway.
Hubs:
All networks require a central location to bring media segments (i.e. computers) together. These
central locations are called hubs. There are three categories of hubs:
Passive hubs: A passive hub simply combines the signals of network segments. There is no
signal processing or regeneration. With a passive hub, each computer receives the signals
sent from all other computers connected to the hub.
Active hubs: Active hubs are like passive hubs except that they have electronic components
that regenerate or amplify signals. Because of this, the distance between devices can be
increased.
Intelligent hubs: In addition to signal regeneration, intelligent hubs perform some network
management and intelligent path selection. Many switching hubs can choose which
alternative path will be the quickest and send the signal that way.
In a connection-oriented network service, the service user first establishes a connection,
uses the connection, and then releases the connection.
In a connectionless service, each message carries the full destination address, and each
one is routed through the system independently.
Repeater delay: bit delay
Bridge delay: transmission delay T.
A node transmits whenever it has data to send. There will be collisions, and the colliding
frames will be destroyed. If the channel supports the feedback property, the sender can find out
whether or not its frame was destroyed by listening to the channel.
If the frame was destroyed, the sender just waits a random interval of time and sends
again.
The throughput of an ALOHA system is maximized by having a uniform frame size rather
than allowing variable-length frames.
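The throughput of ALOHA can be explored numerically. The formulas used below, S = G e^(-2G) for pure ALOHA and S = G e^(-G) for slotted ALOHA, are the standard textbook results for fixed-size frames and Poisson traffic; they are not stated in this text and are brought in here as an assumption. G is the offered load in frames per frame time.

```python
import math


def pure_aloha_throughput(g):
    """Throughput per frame time of pure ALOHA at offered load g (standard model)."""
    return g * math.exp(-2 * g)


def slotted_aloha_throughput(g):
    """Throughput per frame time of slotted ALOHA at offered load g (standard model)."""
    return g * math.exp(-g)


for g in (0.25, 0.5, 1.0, 2.0):
    print(f"G={g}: pure={pure_aloha_throughput(g):.3f}, "
          f"slotted={slotted_aloha_throughput(g):.3f}")
```

Under this model pure ALOHA peaks at G = 0.5 with a throughput of 1/(2e), and slotted ALOHA peaks at G = 1 with a throughput of 1/e, twice as high.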
Slotted ALOHA
When a station has data to send, it first listens to the channel to see if anyone else is
transmitting at the moment.
If the channel is busy, the station waits until it becomes idle.
If the channel is free, it transmits the frame.
If a collision occurs, the station waits a random interval of time and starts all over again.
The protocol is called 1-persistent if the station transmits with probability 1 whenever
it finds the channel free.
The protocol is called p-persistent if the station transmits with probability p whenever
it finds the channel free.
Non-persistent CSMA
In this protocol, a conscious attempt is made to be less greedy than in 1-persistent CSMA.
If no one else is sending, the station begins doing so itself.
If the channel is already in use, it waits for a random period of time and then repeats the
algorithm.
This algorithm should lead to better channel utilization but longer delays than 1-persistent CSMA.
If two stations sense the channel to be idle and begin transmitting simultaneously, they
will both detect the collision almost immediately.
Rather than finish transmitting their frames, they should abruptly stop as soon as the collision
is detected. Quickly terminating damaged frames saves time and bandwidth.
This protocol, known as CSMA/CD (Carrier Sense Multiple Access with Collision Detection), is
widely used in Ethernet.
Collision detection puts a restriction on the minimum size of the frame a source can
transmit. In order to detect a collision correctly, the transmission duration of a frame
should be greater than the round-trip delay.
With CSMA/CD, collisions are still possible.
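The minimum frame size condition can be turned into a small calculation. The sketch assumes that the round-trip delay is twice the one-way propagation delay over the full cable length and ignores repeater and interface delays; the function and parameter names are hypothetical.

```python
def min_frame_bits(bandwidth_bps, cable_length_m, signal_speed_mps):
    """Frame size whose transmission time equals the round-trip delay.

    transmission time = frame_bits / bandwidth
    round-trip delay  = 2 * cable_length / signal_speed
    A frame must be at least this long for its sender to still be
    transmitting when news of a collision returns.
    """
    return 2 * bandwidth_bps * cable_length_m / signal_speed_mps


# Example values chosen for illustration: 10 Mbps, 2500 m, 2 * 10^8 m/s.
print(min_frame_bits(10_000_000, 2500, 200_000_000))
```

Doubling either the bandwidth or the cable length doubles the minimum frame size, which is why faster networks of this type need either longer minimum frames or shorter cables.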
Unlike the token bus case, a ring really consists of a collection of ring interfaces physically
connected by point-to-point lines.
Each bit arriving at an interface is copied into a 1-bit buffer and then copied out onto the
ring again. This copying step introduces a 1-bit delay at each interface.
A special bit pattern, called a token, circulates around the ring whenever all stations are
idle. When a station wants to transmit a frame, it is required to seize the token and
remove it from the ring before transmitting.
Ring latency can be defined as the time required for one bit to travel around the complete ring.
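Ring latency can be estimated from the two delays mentioned above: the propagation delay around the cable and the 1-bit delay at each interface. The sketch below is a simplified model under that assumption; the function and parameter names are hypothetical.

```python
def ring_latency_seconds(ring_length_m, signal_speed_mps, stations, data_rate_bps):
    """Time for one bit to travel once around the ring."""
    propagation = ring_length_m / signal_speed_mps
    interface_delay = stations / data_rate_bps  # one bit time per interface
    return propagation + interface_delay


def ring_latency_bits(ring_length_m, signal_speed_mps, stations, data_rate_bps):
    """The same latency expressed in bit times, i.e. bits the ring can hold."""
    seconds = ring_latency_seconds(ring_length_m, signal_speed_mps, stations, data_rate_bps)
    return seconds * data_rate_bps


# Illustrative values: 1000 m ring, 2 * 10^8 m/s, 100 stations, 1 Mbps.
print(ring_latency_bits(1000, 200_000_000, 100, 1_000_000))
```

With these example values the cable itself holds only 5 bits and the interfaces contribute 100 bits, so on short, slow rings the interface delays dominate the latency.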
Framing: determining how the bits of the physical layer are grouped into frames.
Error control: detection and correction of transmission errors.
Flow control: regulating the flow of frames so that slow receivers are not swamped by fast
senders, and general link management.
Framing
Commonly used methods:
Character count: The first framing method uses a field in the header to specify the
number of characters in the frame. When the data link layer at the destination sees the
character count, it knows how many characters follow, and hence where the frame ends.
The trouble with this algorithm is that the count can be garbled by a transmission error.
Character-based start and stop: Each frame starts with the ASCII character sequence
DLE STX and ends with the sequence DLE ETX (DLE is Data Link Escape, STX is Start of Text,
ETX is End of Text). A serious problem occurs with this method when binary data, such
as object programs or floating point numbers, are being transmitted. It may easily
happen that the characters for DLE STX or DLE ETX occur in the data, which will interfere
with the framing. One possible solution to this problem is character stuffing. The sender's
data link layer inserts one ASCII DLE character just before each accidental DLE
character in the data; the data link layer at the receiving end removes the stuffed DLE before the
data are given to the network layer.
Bit-pattern-based start and stop: This technique allows frames to contain an
arbitrary number of bits and allows character codes with an arbitrary number of bits per
character. Each frame begins and ends with a specific bit pattern, 01111110, called the flag
byte. Whenever the sender's data link layer encounters five consecutive 1s in the data,
it automatically stuffs a 0 bit into the outgoing bit stream. This bit stuffing is analogous
to character stuffing, in which a DLE is stuffed into the outgoing character stream
before each DLE in the data. Whenever the receiver sees five consecutive 1 bits followed by
a 0 bit, it automatically de-stuffs (i.e. deletes) the 0 bit. If the user data contain the flag
pattern, 01111110, it is transmitted as 011111010 but stored in the receiver's
memory as 01111110.
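Bit stuffing and de-stuffing can be sketched in a few lines of Python. Bits are held as strings of '0' and '1' purely for readability, and the function names are hypothetical; the de-stuffing routine assumes it is given a correctly stuffed payload with the flags already removed.

```python
FLAG = "01111110"


def bit_stuff(bits):
    """Insert a 0 after every run of five consecutive 1s."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")  # the stuffed bit
            run = 0
    return "".join(out)


def bit_destuff(bits):
    """Delete the bit that follows every run of five consecutive 1s."""
    out = []
    run = 0
    skip_next = False
    for b in bits:
        if skip_next:
            # This is the stuffed 0 that follows five 1s: drop it.
            skip_next = False
            run = 0
            continue
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            skip_next = True
    return "".join(out)


def make_frame(payload_bits):
    """Delimit the stuffed payload with the flag pattern at both ends."""
    return FLAG + bit_stuff(payload_bits) + FLAG


print(bit_stuff("01111110"))
```

Because the stuffed payload never contains six consecutive 1s, the flag pattern can appear only at the frame boundaries.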
Error Detection
Parity bit: Most errors change a bit from 0 to 1 or from 1 to 0. One of the simplest and most
commonly used error detection codes is the parity bit. It is computed by
counting the number of 1s in the message string. There are two types of parity:
Even parity: If the number of 1s in a message string is even, the string has even parity,
e.g. 101000 has even parity since there are two 1s.
Odd parity: If the number of 1s in a message string is odd, the string has odd parity,
e.g. 101100 has odd parity since there are three 1s.
A single parity bit can detect all errors that change an odd number of bits; errors that change
an even number of bits go undetected.
Multiple parity bit schemes can also be used instead of one bit, to improve the error
detection efficiency.
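The behaviour of a single parity bit can be checked with a short sketch. It assumes an even-parity scheme in which the sender appends one bit so that the transmitted string always contains an even number of 1s; the function names are hypothetical.

```python
def add_even_parity(bits):
    """Append a parity bit so that the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def check_even_parity(codeword):
    """Return True if the received string still has an even number of 1s."""
    return codeword.count("1") % 2 == 0


def flip(bits, index):
    """Return a copy of bits with the bit at the given index inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


sent = add_even_parity("101100")       # three 1s, so the parity bit is 1
one_error = flip(sent, 0)
two_errors = flip(one_error, 1)
print(check_even_parity(sent), check_even_parity(one_error), check_even_parity(two_errors))
```

The single-bit error is caught, while the double-bit error restores even parity and passes the check unnoticed.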
A sequence of redundant bits, called the CRC or the CRC remainder, is appended to the
end of a data unit so that the resulting data unit becomes exactly divisible by a second,
predetermined binary number.
At its destination, the incoming data unit is divided by the same number. If at this step
there is no remainder, the data unit is assumed to be intact and is therefore accepted.
A remainder indicates that the data unit has been damaged in transit and therefore must
be rejected.
CRC Mechanism
First, a string of n 0s is appended to the data unit. The number n is one less than the
number of bits in the predetermined divisor, which has n + 1 bits.
Second, the newly appended data unit is divided by the divisor using a process called
binary division. The remainder resulting from this division is the CRC.
Third, the CRC of n bits derived in step 2 replaces the appended 0s at the end of the data
unit. Note that the CRC may consist of all 0s.
The receiver treats the whole received string as a unit and divides it by the same
divisor that was used to find the CRC remainder.
If the string arrives without error, the CRC checker yields a remainder of zero and the
data unit passes. If the string has been changed in transit, the division yields a non-zero
remainder and the data unit does not pass.
The CRC generator (the divisor) is most often represented not as a string of 1s and 0s,
but as an algebraic polynomial.
The polynomial format is useful for two reasons: it is short, and it can be used to prove
the concept mathematically.
The selected polynomial should at least not be divisible by x and be divisible by x + 1. The first
condition guarantees that all burst errors of length equal to the degree of the polynomial are
detected. The second condition guarantees that all burst errors affecting an odd number
of bits are detected.
CRC can detect all burst errors that affect an odd number of bits.
CRC can detect all burst errors of length less than or equal to the degree of the
polynomial.
CRC can detect with a very high probability burst errors of length greater than the
degree of the polynomial.
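The CRC mechanism described above can be sketched with modulo-2 (XOR) division on bit strings. The data unit 100100 and divisor 1101 are example values chosen for illustration, and the function names are hypothetical; the divisor is assumed to start with a 1 and to have at least two bits.

```python
def mod2_remainder(bits, divisor):
    """Remainder of modulo-2 division; both arguments are '0'/'1' strings."""
    work = [int(b) for b in bits]
    div = [int(b) for b in divisor]
    n = len(div) - 1  # the remainder has n bits
    for i in range(len(work) - n):
        if work[i] == 1:
            # Subtract (XOR) the divisor aligned at position i.
            for j, d in enumerate(div):
                work[i + j] ^= d
    return "".join(str(b) for b in work[-n:])


def crc_encode(data, divisor):
    """Append n zeros, divide, and replace the zeros with the remainder."""
    n = len(divisor) - 1
    crc = mod2_remainder(data + "0" * n, divisor)
    return data + crc


def crc_check(received, divisor):
    """Accept the data unit only if the division leaves no remainder."""
    return set(mod2_remainder(received, divisor)) == {"0"}


codeword = crc_encode("100100", "1101")
print(codeword, crc_check(codeword, "1101"))
```

Inverting any single bit of the codeword leaves a non-zero remainder, so the receiver rejects the damaged data unit.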
Flow Control
Flow control refers to a set of procedures used to restrict the amount of data the sender
can send before waiting for an acknowledgement.
Two methods have been developed to control the flow of data across communications
links: stop-and-wait and sliding window.
In the stop-and-wait method of flow control, the sender sends one frame and waits for
an acknowledgement before sending the next frame.
The advantage of stop-and-wait is simplicity: each frame is checked and acknowledged
before the next frame is sent. The disadvantage is inefficiency: stop-and-wait is slow.
In the sliding window method of flow control, the sender can transmit several frames
before needing an acknowledgement. Frames can be sent one right after another. The
link can carry several frames at once and its capacity can be used efficiently. The receiver
acknowledges only some of the frames, using a single ACK to confirm the receipt of
multiple data frames.
The advantage of sliding window is its higher efficiency. The disadvantage is its
complexity.
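For stop and wait:
$$T = \frac{L}{B}, \qquad S = (1 - P)\,\frac{T}{T + RTT}$$
where
P = probability of frame error
RTT = round trip time
S = throughput (as a fraction of the bandwidth)
L = packet length
B = bandwidth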
The sliding window refers to imaginary windows at both the sender and receiver.
At the sender side, this window holds frames that have been transmitted but not yet
acknowledged, and frames that can be transmitted before an acknowledgement is
required.
Frames may be transmitted as long as the window is not yet full.
To keep track of which frames have been transmitted and which received, sliding
window introduces an identification scheme based on the size of the window. The
frames are numbered modulo n, which means they are numbered from 0 to n - 1.
When the receiver sends an ACK, it includes the number of the next frame it expects to receive.
When the sender sees an ACK with the number 5, it knows that all frames up through
number 4 have been received.
A maximum of n - 1 frames may be sent before an acknowledgement is required.
Sender window size for full utilization:
$$N \ge \frac{T + RTT}{T} = 1 + \frac{RTT}{T}$$
Sliding window protocol channel utilization:
$$\frac{N\,T}{T + RTT}$$
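The effect of the window size on channel utilization can be illustrated numerically. The sketch assumes the usual textbook model: a frame takes T = L/B to transmit, the acknowledgement returns one round trip time after transmission ends, acknowledgement transmission time is neglected, and utilization cannot exceed 1. The function names and example values are hypothetical.

```python
def stop_and_wait_utilization(packet_bits, bandwidth_bps, rtt_s, p_error=0.0):
    """Fraction of time the link carries useful frames: one frame per T + RTT."""
    t = packet_bits / bandwidth_bps
    return (1 - p_error) * t / (t + rtt_s)


def sliding_window_utilization(window, packet_bits, bandwidth_bps, rtt_s):
    """Error-free utilization when up to `window` frames may be outstanding."""
    t = packet_bits / bandwidth_bps
    return min(1.0, window * t / (t + rtt_s))


def window_for_full_utilization(packet_bits, bandwidth_bps, rtt_s):
    """Smallest (possibly fractional) window that keeps the link busy: 1 + RTT/T."""
    t = packet_bits / bandwidth_bps
    return 1 + rtt_s / t


# Illustrative values: 1000-bit frames, 1 Mbps link, 4 ms round trip time.
for n in (1, 3, 5):
    print(n, sliding_window_utilization(n, 1000, 1_000_000, 0.004))
```

With these values a window of 1 (stop and wait) uses about 20% of the link, and a window of 5 frames is enough to keep it fully busy.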
Error Control
In the data link layer, the term error control refers primarily to methods of detection and
retransmission.
Anytime an error is detected in an exchange, a negative acknowledgement (NAK) is
returned and the specified frames are retransmitted. This process is called automatic
repeat request (ARQ).
Error control in the data link layer is based on automatic repeat request (ARQ), which
means retransmission of data in three cases: damaged frame, lost frame and lost
acknowledgement.
ARQ error control is implemented in the data link layer as an adjunct to flow control.
Stop-and-wait flow control is usually implemented as stop-and-wait ARQ, and
sliding window is usually implemented as one of two variants of sliding window ARQ,
called Go-back-N and selective-reject.
Stop-and-wait ARQ is a form of stop-and-wait flow control extended to include
retransmission of data in case of lost or damaged frames.
For retransmission to work, four features are added to the basic flow control
mechanism:
The sending device keeps a copy of the last frame transmitted until it receives an
acknowledgement for that frame.
For identification purposes, both data frames and ACK frames are numbered alternately 0 and 1.
A data 0 frame is acknowledged by an ACK 1 frame, indicating that the receiver has
received data 0 and is now expecting data 1. This numbering allows for identification of
data frames in case of duplicate transmission.
If an error is discovered in a data frame, indicating that it has been corrupted in
transmission, a NAK frame is returned. NAK frames, which are not numbered, tell the
sender to retransmit the last frame sent. Stop-and-wait ARQ requires that the sender
wait until it receives an acknowledgement for the last frame transmitted before it
transmits the next one. When the sending device receives a NAK, it resends the frame
transmitted after the last acknowledgement, regardless of number.
The sending device is equipped with a timer. If an expected acknowledgement is not
received within an allotted time period, the sender assumes that the last data frame was
lost in transit and sends it again.
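The four features above (kept copy, 0/1 numbering, NAK, timer) can be traced with a small simulation. This is a sketch under simplifying assumptions, not a real link-layer implementation: the channel is scripted by a hypothetical `events` list, a timeout is modelled as an immediate retry, and the function name is illustrative.

```python
def stop_and_wait(frames, events):
    """Deliver `frames` over a scripted channel using stop-and-wait ARQ.

    `events` is consumed one entry per transmission attempt:
      "ok"       frame and ACK both arrive
      "damaged"  frame arrives corrupted, receiver returns a NAK
      "lost"     frame never arrives, sender's timer expires
      "ack_lost" frame arrives, ACK never returns, sender's timer expires
    Once the script is exhausted, every attempt is "ok".
    """
    events = iter(events)
    delivered, log = [], []
    expected = 0      # receiver: sequence number it is waiting for
    seq, i = 0, 0     # sender: current sequence number and frame index
    while i < len(frames):
        event = next(events, "ok")
        log.append((seq, event))
        if event in ("lost", "damaged"):
            continue  # timeout or NAK: resend the kept copy
        # The frame arrived intact.
        if seq == expected:
            delivered.append(frames[i])
            expected ^= 1
        # Otherwise it is a duplicate: discard it, but acknowledge again.
        if event == "ack_lost":
            continue  # timer expires, so a duplicate will be sent
        seq ^= 1      # ACK received: move on to the next frame
        i += 1
    return delivered, log


delivered, log = stop_and_wait(["A", "B", "C"],
                               ["ok", "damaged", "ok", "ack_lost", "ok"])
print(delivered)
print(log)
```

The lost-ACK case is where the 0/1 numbering matters: the receiver sees sequence number 0 again while expecting 1, recognises the duplicate and discards it.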
Go-back-N ARQ and selective reject ARQ both are based on sliding window flow control.
To extend sliding window to cover retransmission of lost or damaged frames, three
features are added to the basic flow control mechanism:
The sending device keeps copies of all transmitted frames, until they have been
acknowledged.
In addition to ACK frames, the receiver has the option of returning a NAK frame if the data
have been received damaged. A NAK frame tells the sender to retransmit a damaged frame.
In the sliding window Go-back-N ARQ method, if one frame is lost or damaged, all frames
sent since the last frame acknowledged are retransmitted.
In selective-reject ARQ, only the specific damaged or lost frame is retransmitted.
The receiving device must be able to sort the frames it has and insert the retransmitted
frame into its proper place in the sequence. To make such selectivity possible, a selective
reject ARQ system differs from a Go-back-N ARQ system in the following ways:
The receiving device must contain sorting logic to enable it to reorder frames received
out of sequence. It must also be able to store frames received after a NAK has been sent
until the damaged frame has been repaired.
The sending device must contain a searching mechanism that allows it to find and select
only the requested frame for retransmission.
A buffer in the receiver must keep all previously received frames on hold until all
retransmissions have been sorted and any duplicate frames have been identified and
discarded.
To aid selectivity, ACK numbers, like NAK numbers, must refer to the frame received (or
lost) instead of the next frame expected.
This complexity requires a smaller window size than is needed by the Go-back-N method
if it is to work efficiently. It is recommended that the window size be less than or equal
to n/2, where n - 1 is the Go-back-N window size.
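The difference between the two sliding window ARQ variants can be made concrete with a short sketch. It assumes sequence numbers are taken modulo n = 2^m for an m-bit sequence field and that a NAK for a frame implicitly acknowledges the frames before it. The function names are illustrative.

```python
def frames_to_retransmit(outstanding, bad_frame, selective):
    """Frames the sender resends after a NAK (or timeout) for `bad_frame`.

    `outstanding` lists the frame numbers sent since the last acknowledged
    frame, in transmission order.
    """
    if selective:
        return [bad_frame]                       # selective-reject: only that frame
    start = outstanding.index(bad_frame)
    return outstanding[start:]                   # Go-back-N: that frame and all after it


def max_window(seq_bits, selective):
    """Largest safe sender window for an m-bit sequence number field."""
    n = 2 ** seq_bits
    return n // 2 if selective else n - 1


print(frames_to_retransmit([3, 4, 5, 6], 4, selective=False))
print(frames_to_retransmit([3, 4, 5, 6], 4, selective=True))
print(max_window(3, selective=False), max_window(3, selective=True))
```

Go-back-N trades extra retransmissions for a simple receiver, while selective-reject saves bandwidth at the cost of buffering and sorting logic.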
The main function of the network layer is routing packets from the source machine to the
destination machine.
The routing algorithm is that part of the network layer software responsible for deciding
which output line an incoming packet should be transmitted on.
In the case of datagrams (connectionless packet switching), the routing decision must be made
anew for every arriving data packet, since the best route may have changed since last time.
In the case of virtual circuits (connection-oriented packet switching), routing decisions are made only when
a new virtual circuit is being set up. Thereafter, data packets just follow the previously
established route.
Routing algorithms can be grouped into two major classes: non-adaptive and adaptive.
Non-adaptive (static) algorithms do not base their routing decisions on measurements or
estimates of the current traffic and topology.
Adaptive algorithms, in contrast, change their routing decisions to reflect changes in the
topology, and usually the traffic as well.
Flooding
Another static algorithm is flooding, in which every incoming packet is sent out on every
outgoing line except the one it arrived on.
Flooding generates vast numbers of duplicate packets, in fact, an infinite number unless
some measures are taken to damp the process.
A variation of flooding that is slightly more practical is selective flooding. In this algorithm
the routers do not send every incoming packet out on every line, only on those lines that are
going approximately in the right direction.
Distance vector routing algorithms operate by having each router maintain a table (i.e., a
vector) giving the best known distance to each destination and which line to use to get
there. These tables are updated by exchanging information with the neighbours.
In distance vector routing, each router maintains a routing table indexed by, and containing one
entry for, each router in the subnet. This entry contains two parts: the preferred outgoing line
to use for that destination and an estimate of the time or distance to that destination.
The metric used might be number of hops, time delay in milliseconds, total number of
packets queued along the path, or something similar.
The router is assumed to know the ''distance'' to each of its neighbours. If the metric is
hops, the distance is just one hop. If the metric is queue length, the router simply
examines each queue. If the metric is delay, the router can measure it directly with
special ECHO packets that the receiver just timestamps and sends back as fast as it can.
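The table update in distance vector routing can be sketched as follows. The update rule used here (add the cost of the line to a neighbour to that neighbour's advertised distance, then keep the minimum per destination) is the standard one and is assumed rather than quoted from the text above. The router names and delays are made up for illustration.

```python
def update_table(line_cost, neighbour_vectors):
    """Build a routing table from the vectors received from the neighbours.

    line_cost:          {neighbour: measured distance to that neighbour}
    neighbour_vectors:  {neighbour: {destination: neighbour's estimate}}
    Returns             {destination: (estimated distance, outgoing line)}
    """
    table = {}
    for neighbour, cost in line_cost.items():
        for destination, estimate in neighbour_vectors[neighbour].items():
            total = cost + estimate
            if destination not in table or total < table[destination][0]:
                table[destination] = (total, neighbour)
    return table


# Hypothetical router with two neighbours, A (delay 8) and I (delay 10).
line_cost = {"A": 8, "I": 10}
vectors = {
    "A": {"A": 0, "B": 12, "G": 18},
    "I": {"A": 24, "B": 36, "G": 31, "I": 0},
}
for destination, entry in sorted(update_table(line_cost, vectors).items()):
    print(destination, entry)
```

Each entry holds exactly the two parts described above: the estimate and the preferred outgoing line.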
In effect, the complete topology and all delays are experimentally measured and distributed to
every router. Then Dijkstra's algorithm can be run to find the shortest path to every other
router. Below we will consider each of these five steps in more detail.
Learning about the Neighbours
The router on the other end is expected to send back a reply telling who it is.
The most direct way to determine this delay is to send over the line a special ECHO packet
that the other side is required to send back immediately. By measuring the round-trip time
and dividing it by two, the sending router can get a reasonable estimate of the delay.
If the sequence numbers wrap around, confusion will arise. The solution here is to use a 32-bit
sequence number. With one link state packet per second, it would take 137 years to wrap
around, so this possibility can be ignored.
If a router ever crashes, it will lose track of its sequence number. If it starts again at 0, the
next packet will be rejected as a duplicate.
Third, if a sequence number is ever corrupted and 65,540 is received instead of 4 (a 1-bit
error), packets 5 through 65,540 will be rejected as obsolete, since the current sequence
number is thought to be 65,540.
The solution to all these problems is to include the age of each packet after the sequence
number and decrement it once per second. When the age hits zero, the information from that
router is discarded. Normally, a new packet comes in, say, every 10 sec, so router
information times out when a router is down (or six consecutive packets have been lost, an
unlikely event). The Age field is also decremented by each router during the initial flooding
process, to make sure no packet can get lost and live for an indefinite period of time (a
packet whose age is zero is discarded).
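The numbers in this discussion are easy to verify. The sketch below checks the wrap-around time of a 32-bit counter, shows that 65,540 is the value 4 with a single bit flipped, and gives a minimal acceptance test that combines sequence number and age. The function names are illustrative, and the acceptance rule is a simplification of what a real router would do.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_wrap(bits: int, packets_per_second: float = 1.0) -> float:
    """Time for a `bits`-wide sequence counter to wrap around."""
    return 2 ** bits / packets_per_second / SECONDS_PER_YEAR


def flip_bit(value: int, bit: int) -> int:
    """Simulate a 1-bit transmission error."""
    return value ^ (1 << bit)


def should_accept(stored_seq: int, incoming_seq: int, age: int) -> bool:
    """Use a link state packet only if it is still alive and newer."""
    return age > 0 and incoming_seq > stored_seq


print(round(years_until_wrap(32), 1))   # on the order of the figure quoted above
print(flip_bit(4, 16))                  # the corrupted sequence number
print(should_accept(65540, 5, age=60))  # packet 5 is rejected as obsolete
```

Once the corrupted entry's age reaches zero it is discarded, and packets 5 onward are accepted again.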
Some refinements to this algorithm make it more robust. When a link state packet comes in
to a router for flooding, it is not queued for transmission immediately. Instead it is first put
in a holding area to wait a short while. If another link state packet from the same source
comes in before the first packet is transmitted, their sequence numbers are compared. If
they are equal, the duplicate is discarded. If they are different, the older one is thrown out.
To guard against errors on the router-router lines, all link state packets are acknowledged.
When a line goes idle, the holding area is scanned in round-robin order to select a packet or
acknowledgement to send.
Construct the entire subnet graph after collecting link state for every link.
Use Dijkstra's algorithm to compute shortest path to all possible destinations. The results of
this algorithm can be installed in the routing tables, and normal operation resumed.
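Once the subnet graph is known, the shortest path computation can be sketched with a priority queue. The graph representation (a dictionary of neighbour costs) and the router names are assumptions for illustration. The function returns both the distances and the first hop to use for each destination, which is what a routing table needs.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm over {node: {neighbour: cost}}.

    Returns (distance, first_hop): the cost of the best path to every
    reachable node and the neighbour of `source` that the path starts with.
    """
    distance = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue                      # stale queue entry
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for neighbour, cost in graph[node].items():
            candidate = d + cost
            if neighbour not in done and candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                next_hop = neighbour if node == source else hop
                heapq.heappush(heap, (candidate, neighbour, next_hop))
    return distance, first_hop


# Hypothetical four-router subnet with symmetric link costs.
graph = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1},
    "C": {"A": 5, "B": 1, "D": 3},
    "D": {"C": 3},
}
print(shortest_paths(graph, "A"))
```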
Hierarchical Routing
When hierarchical routing is used, the routers are divided into what we will call
regions, with each router knowing all the details about how to route packets to destinations
within its own region, but knowing nothing about the internal structure of other regions.
When different networks are interconnected, it is natural to regard each one as a separate
region in order to free the routers in one network from having to know the topological
structure of the other ones.
Broadcast Routing
One broadcasting method that requires no special features from the subnet is for the source
to simply send a distinct packet to each destination. Not only is the method wasteful of
bandwidth, but it also requires the source to have a complete list of all destinations. In
practice this may be the only possibility, but it is the least desirable of the methods.
A third algorithm is multi-destination routing. If this method is used, each packet contains
either a list of destinations or a bit map indicating the desired destinations. When a packet
arrives at a router, the router checks all the destinations to determine the set of output lines
that will be needed.
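The per-router step of multi-destination routing can be sketched as a grouping operation. The routing table contents and names are hypothetical. Forwarding one copy per output line, carrying only the destinations reached via that line, is the usual continuation of the scheme and is assumed here.

```python
def split_by_output_line(destinations, routing_table):
    """Group a packet's destination list by the output line used to reach each one."""
    lines = {}
    for destination in destinations:
        line = routing_table[destination]
        lines.setdefault(line, []).append(destination)
    return lines


# Hypothetical routing table: destination -> output line.
routing_table = {"D1": "east", "D2": "east", "D3": "north", "D4": "south"}
copies = split_by_output_line(["D1", "D2", "D3"], routing_table)
print(copies)   # one packet copy per key, carrying only that key's destinations
```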
Network Layer
At the network layer, TCP/IP supports the internetwork protocol (IP). IP, in turn, contains four
supporting protocols: ARP, RARP, ICMP and IGMP.
Internetwork Protocol (IP)
IP is the transmission mechanism used by the TCP/IP protocols. It is an unreliable and
connectionless datagram protocol: a best-effort delivery service.
IP transports data in packets called datagrams (described below), each of which is
transported separately.
Datagrams may travel along different routes and may arrive out of sequence or
duplicated.
Datagram
Packets in the IP layer are called datagrams. A datagram is a variable-length packet (up to
65,535 bytes) consisting of two parts: header and data. The header can be from 20 to 60 bytes
and contains information essential to routing and delivery. A brief description of each field is in order.
Version: The first field defines the version number of the IP. The current version is 4 (IPv4),
with a binary value of 0100.
Header length (HLEN): The HLEN field defines the length of the header in multiples of four
bytes. The four bits can represent a number between 0 and 15, which when multiplied by 4,
gives a maximum of 60 bytes.
Service type: The service type field defines how the datagram should be handled. It includes
bits that define the priority of the datagram. It also contains bits that specify the type of
service the sender desires, such as the level of throughput, reliability and delay.
Total length: The total length defines the total length of the IP datagram. It is a two-byte field
(16 bits) and can define up to 65,535 bytes.
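The four fields described so far occupy the first four bytes of the header, so they can be decoded with a few bit operations. The sketch assumes the standard IPv4 layout (version and HLEN sharing the first byte, then service type, then a 16-bit total length in network byte order). The function name and sample bytes are illustrative.

```python
import struct


def parse_ipv4_prefix(header: bytes) -> dict:
    """Decode version, header length, service type and total length."""
    first_byte, service_type, total_length = struct.unpack("!BBH", header[:4])
    return {
        "version": first_byte >> 4,               # upper four bits
        "header_bytes": (first_byte & 0x0F) * 4,  # HLEN counts 4-byte words
        "service_type": service_type,
        "total_length": total_length,
    }


# 0x45 = version 4, HLEN 5 (20-byte header); total length 0x0054 = 84 bytes.
print(parse_ipv4_prefix(bytes([0x45, 0x00, 0x00, 0x54])))
```

With HLEN at its largest value of 15, the same arithmetic gives the 60-byte maximum header mentioned above.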
Addressing
In addition to the physical addresses (contained on NICs) that identify individual devices, the
Internet requires an additional addressing convention: an address that identifies the connection
of a host to its network. Each Internet address consists of four bytes (32 bits) defining three
fields: class type, netid, and hostid. These parts are of varying lengths, depending on the class of
the address.
Classes
There are currently five different field-length patterns in use, each defining a class of address.
The different classes are designed to cover the needs of different types of organizations.
Class A addresses are numerically the lowest. They use only one byte to identify class type
and netid, and leave three bytes available for hostid numbers.
Class B uses two bytes to identify the netid and leaves two bytes available for hostid numbers.
Class C uses three bytes to identify the netid and leaves one byte available for hostid
numbers.
Class D is reserved for multicast addresses. Multicasting allows copies of a datagram to be
passed to a select group of hosts rather than to an individual host. It is similar to
broadcasting, but where broadcasting requires that a packet be passed to all possible
destinations, multicasting allows transmission to a selected subset.
Class E addresses are reserved for future use.
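The class of an address is determined by the leading bits of its first byte, which makes classification a simple range check. The sketch below assumes the standard classful bit patterns (0, 10, 110, 1110, 1111) and dotted-decimal input. The function name is illustrative.

```python
def address_class(address: str) -> str:
    """Return the class (A to E) of a dotted-decimal IPv4 address."""
    first_byte = int(address.split(".")[0])
    if first_byte < 128:      # leading bit  0
        return "A"
    if first_byte < 192:      # leading bits 10
        return "B"
    if first_byte < 224:      # leading bits 110
        return "C"
    if first_byte < 240:      # leading bits 1110 (multicast)
        return "D"
    return "E"                # leading bits 1111 (reserved)


for sample in ("10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.1", "250.0.0.1"):
    print(sample, address_class(sample))
```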
Subnetting
As we previously discussed, an IP address is 32 bits long. One portion of the address indicates a
network (netid) and the other portion indicates the host (or router) on the network (hostid).
This means that there is a sense of hierarchy in IP addressing. To reach a host on the Internet,
we must first reach the network using the first portion of the address (netid). Then we must reach the
host itself using the second portion (hostid). In other words, classes A, B and C in IP addressing
are designed with two levels of hierarchy.
However, in many cases, these two levels of hierarchy are not enough. For example, imagine an
organization with a class B address. The organization has two-level hierarchical addressing, but
it cannot have more than one physical network.
With this scheme, the organization is limited to two levels of hierarchy. The hosts cannot be
organized into groups, and all of the hosts are at the same level. The organization has one
network with many hosts.
One solution to this problem is subnetting, the further division of a network into smaller
networks called subnetworks.
Three Levels of Hierarchy
Adding subnetworks creates an intermediate level of hierarchy in the IP addressing system.
Now we have three levels: netid, subnetid and hostid. The netid is the first level; it defines the
site. The second level is the subnetid; it defines the physical subnetwork.
The routing of an IP datagram now involves three steps: delivery to the site, delivery to the
subnetwork and delivery to the host.
Masking
Masking is a process that extracts the address of the physical network from an IP address.
Masking can be done whether we have subnetting or not. If we have not subnetted the network,
masking extracts the network address from an IP address. If we have subnetted, masking
extracts the subnetwork address from an IP address.
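Masking amounts to a bitwise AND between the 32-bit address and a 32-bit mask. The sketch below uses Python's standard `ipaddress` module only to convert between dotted-decimal text and integers. The sample address and masks are made up for illustration.

```python
import ipaddress


def apply_mask(address: str, mask: str) -> str:
    """Return the network (or subnetwork) address selected by `mask`."""
    address_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(address_bits & mask_bits))


# Class B address without subnetting: the default mask keeps the two netid bytes.
print(apply_mask("141.14.72.24", "255.255.0.0"))
# Same address with a hypothetical subnet mask that borrows two hostid bits.
print(apply_mask("141.14.72.24", "255.255.192.0"))
```

The same operation serves both cases. Only the mask changes, which is why masking works whether or not the network is subnetted.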
The address resolution protocol (ARP) associates an IP address with the physical address.
ARP is used to find the physical address of the node when its Internet address is known.
When a host or a router needs to find the physical address of another host on its network, it
formats an ARP query packet that includes the IP address and broadcasts it over the
network.
Every host on the network receives and processes the ARP packet, but only the intended
recipient recognizes its internet address and sends back its physical address.
The host holding the datagram adds the address of the target host both to its cache memory
and to the datagram header, then sends the datagram on its way.
The reverse address resolution protocol (RARP) allows a host to discover its internet
address when it knows only its physical address.
This is required for a diskless computer or for a computer that is being connected to the network for the
first time.
RARP works much like ARP. The host wishing to retrieve its internet address broadcasts an
RARP query packet that contains its physical address to every host on its physical network. A
server on the network recognizes the RARP packet and returns the host's internet address.
The internet control message protocol (ICMP) is a mechanism used by hosts and routers to
send notifications of network problems back to the sender.
IP is essentially an unreliable and connectionless protocol. ICMP, however, allows IP to
inform a sender if a datagram is undeliverable.
A datagram travels from router to router until it reaches one that can deliver it to its final
destination. If a router is unable to route or deliver the datagram because of unusual
conditions or because of network congestion, ICMP allows it to inform the original source.
ICMP uses echo test/reply to test whether a destination is reachable and responding. It also
handles both control and error messages, but its sole function is to report problems, not
correct them. Responsibility for correction lies with the sender.
Note that a datagram carries only the addresses of the original sender and the final
destination. It does not know the addresses of the previous routers that passed it along. For
this reason, ICMP can send messages only to the source, not to an intermediate router.
The IP protocol can be involved in two types of communication: unicasting and multicasting.
Unicasting is the communication between one sender and one receiver. It is a one-to-one
communication. However, some processes sometimes need to send the same message to a
large number of receivers simultaneously. This is called multicasting, which is a one-to-many communication.
IP addressing supports multicasting. All 32-bit IP addresses that start with 1110 (class D)
are multicast addresses. With 28 bits remaining for the group address, more than 250
million addresses are available for assignment.
The Internet Group Management Protocol (IGMP) has been designed to help a multicast router
identify the hosts in a LAN that are members of a multicast group. It is a companion to the IP
protocol.
Transport Layer
The transport layer is represented in TCP/IP by two protocols: TCP and UDP. Of these, UDP is the
simpler; it provides non-sequenced transport functionality when reliability and security are less
important than size and speed. Most applications, however, require reliable end-to-end delivery
and so make use of TCP.
The IP delivers a datagram from a source host to a destination host, making it a host-to-host
protocol. A host receiving a datagram may be running several different concurrent processes,
any one of which is a possible destination for the transmission. In fact, although we have been
talking about hosts sending messages to other hosts over a network, it is actually a source
process that is sending a message to a destination process.
TCP/IP transport level protocols are port-to-port protocols that work on top of the IP protocols
to deliver the packet from the originating port to the IP services at the start of a transmission
and from the IP services to the destination port at the end. Each port is defined by a positive
integer address carried in the header of a transport layer packet. An IP datagram uses the host's
32-bit internet address. A frame at the transport level uses the process port address of 16 bits,
enough to allow the support of up to 65,536 (0 to 65,535) ports.
Source port address: The source port address is the address of the application program that
has created the message.
Destination port address: The destination port address is the address of the application
program that will receive the message.
Total length: The total length field defines the total length of the user datagram in bytes.
Checksum: The checksum is a 16-bit field used in error detection.
UDP provides only the basic functions needed for end-to-end delivery of a transmission. It does
not provide any sequencing or reordering functions and cannot specify the damaged packet
when reporting an error (for which it must be paired with ICMP). UDP can discover that an error
has occurred; ICMP can then inform the sender that a user datagram has been damaged and
discarded. Neither, however, has the ability to specify which packet has been lost. UDP contains
only a checksum; it does not contain an ID or sequencing number for a particular data segment.
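The four UDP header fields listed above are each 16 bits wide, so the whole header is eight bytes and can be packed and unpacked in one call. This is a sketch with illustrative function names. The checksum is left at zero rather than computed, which is an assumption made to keep the example short.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, destination port, total length, checksum


def build_udp_header(source_port: int, destination_port: int, payload: bytes,
                     checksum: int = 0) -> bytes:
    """Pack a UDP header; total length covers the header plus the payload."""
    total_length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(source_port, destination_port, total_length, checksum)


def parse_udp_header(header: bytes) -> dict:
    """Unpack the first eight bytes of a user datagram."""
    source, destination, total_length, checksum = UDP_HEADER.unpack(header[:UDP_HEADER.size])
    return {
        "source_port": source,
        "destination_port": destination,
        "total_length": total_length,
        "checksum": checksum,
    }


header = build_udp_header(50000, 53, b"hello")
print(len(header), parse_udp_header(header))
```

Nothing in these eight bytes identifies an individual datagram, which is why UDP cannot say which packet was lost.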
Transmission Control Protocol (TCP)
The Transmission Control Protocol (TCP) provides full transport layer service to applications. TCP is a
reliable stream transport port-to-port protocol. The term stream, in this context, means
connection-oriented: a connection must be established between both ends of a transmission
before either may transmit data. By creating this connection, TCP generates a virtual circuit
between sender and receiver that is active for the duration of a transmission. (Connections for
the duration of an entire exchange are different and are handled by session functions in
individual applications.) TCP begins each transmission by alerting the receiver that datagrams are
on their way (connection establishment) and ends each transmission with a connection
termination. In this way the receiver knows to expect the entire transmission rather than a
single packet.
IP and UDP treat multiple datagrams belonging to a single transmission as entirely separate units,
unrelated to each other. The arrival of each datagram at the destination is therefore a separate
event, unexpected by the receiver. TCP, on the other hand, as a connection-oriented service, is
responsible for the reliable delivery of the entire stream of bits contained in the message
originally generated by the sending application. Reliability is ensured by provision for error
detection and retransmission of damaged frames; all segments must be received and
acknowledged before the transmission is considered complete and the virtual circuit is
discarded. At the sending end of each transmission, TCP divides long transmissions into smaller
data units and packages each into a frame called a segment. Each segment includes a sequencing
number for reordering after receipt, together with an acknowledgement ID number and a
window-size field for sliding window ARQ. Segments are carried across network links inside of IP
datagrams. At the receiving end, TCP collects each datagram as it comes in and reorders the
transmission based on sequence numbers.
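Segmentation and reordering by sequence number can be illustrated in a few lines. The sketch uses the byte offset of each chunk as its sequence number and ignores headers, acknowledgements and windows. The function names and segment size are illustrative.

```python
def segment(data: bytes, size: int):
    """Split a byte stream into (sequence number, chunk) pairs."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def reassemble(segments) -> bytes:
    """Rebuild the stream from segments that may be out of order or duplicated."""
    by_sequence = dict(segments)                  # duplicates collapse onto one key
    return b"".join(by_sequence[seq] for seq in sorted(by_sequence))


segments = segment(b"hello world", 4)
arrived = [segments[2], segments[0], segments[1], segments[0]]   # shuffled, one duplicate
print(segments)
print(reassemble(arrived))
```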
The TCP Segment
The scope of the services provided by TCP requires that the segment header be extensive. A
comparison of the TCP segment format with that of a UDP datagram shows the differences
between the two protocols. TCP provides a comprehensive range of reliability functions but
sacrifices speed (connections must be established, acknowledgements waited for, etc.). Because
of its smaller frame size, UDP is much faster than TCP, but at the expense of reliability. A brief
description of each field is in order.
Source port address: The source port address defines the application program in the source
computer.
Destination port address: The destination port address defines the application program in
the destination computer.
Sequence number: A stream of data from the application program may be divided into two or
more TCP segments. The sequence number field shows the position of the data in the
original data stream.
Acknowledgement number: The 32-bit acknowledgement number is used to acknowledge
the receipt of data from the other communicating device. This number is valid only if the
ACK bit in the control field (explained later) is set. In this case it defines the byte sequence
number that is next expected.
Header length (HLEN): The four-bit HLEN field indicates the number of 32-bit (four-byte)
words in the TCP header. The four bits can define a number up to 15. This is multiplied by
4 to give the total number of bytes in the header. Therefore, the header can be at most 60
bytes. Since the minimum required size of the header is 20 bytes, 40 bytes are thus available
for the options section.
Reserved: A six-bit field is reserved for future use.
Control: Each bit of the six-bit control field functions individually and independently. A bit
can either define the use of a segment or serve as a validity check for other fields. The urgent
(URG) bit, when set, validates the urgent pointer field. Both this bit and the pointer indicate that the
data in the segment are urgent. The ACK bit, when set, validates the acknowledgement
number field. Both are used together and have different functions, depending on the
segment type. The PSH bit asks the receiving TCP to pass the data to the application
promptly instead of waiting for more data to fill its buffer.
The reset (RST) bit is used to reset the connection when there is confusion in the sequence
numbers. The SYN bit is used for sequence number synchronization in three types of
segments: connection request, connection confirmation (with the ACK bit set), and
confirmation acknowledgement (with the ACK bit set). The FIN bit is used in connection
termination in three types of segments: termination request, termination confirmation (with
the ACK bit set), and acknowledgement of termination confirmation.
Window size: The window is a 16-bit field that defines the size of the sliding window.
Checksum: The checksum is a 16-bit field used in error detection.
Urgent pointer: This is the last required field in the header. Its value is valid only if the URG
bit in the control field is set. In this case, the sender is informing the receiver that there are
urgent data in the data portion of the segment. This pointer defines the end of urgent data
and start of the normal data.
Options and padding: The remainder of the TCP header defines the optional fields. They are
used to convey additional information to the receiver or for alignment purposes.
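The fixed 20-byte part of the TCP header can be decoded with one `struct` format string. The sketch assumes the standard field order (ports, sequence number, acknowledgement number, a 16-bit word holding HLEN, reserved bits and the six control bits, then window, checksum and urgent pointer). The function name and sample values are illustrative.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 up to bit 5


def parse_tcp_header(header: bytes) -> dict:
    """Decode the required fields of a TCP segment header."""
    (source, destination, sequence, acknowledgement,
     hlen_and_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", header[:20])
    header_bytes = (hlen_and_flags >> 12) * 4            # HLEN counts 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES) if hlen_and_flags & (1 << bit)}
    return {
        "source_port": source,
        "destination_port": destination,
        "sequence": sequence,
        "acknowledgement": acknowledgement if "ACK" in flags else None,
        "header_bytes": header_bytes,
        "option_bytes": header_bytes - 20,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent if "URG" in flags else None,
    }


# Hypothetical connection confirmation: HLEN = 5 words, SYN and ACK set (0x5012).
sample = struct.pack("!HHIIHHHH", 80, 50000, 1000, 2001, 0x5012, 65535, 0, 0)
print(parse_tcp_header(sample))
```

The acknowledgement number and urgent pointer are reported only when their validating bits are set, mirroring the validity rules described above.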
Instead of using the full 32-bit IP address, many systems adopt meaningful names for their
devices and networks.
To solve the problem of network names, the Network Information Centre (NIC) maintains a
list of network names and the corresponding network gateway addresses.
DNS uses a hierarchical architecture, much like the UNIX file system. The first level of
hierarchy divides networks into the category of sub-networks, such as com for commercial,
mil for military, edu for education, and so on. Below each of these is another division that
identifies the individual sub network, usually one for each organization. This is called the
domain name.
The organization's system manager can further divide the company's sub-networks as
desired, with each network called a sub-domain.
FTP is an Internet service that transfers a data file from the disk on one computer to the disk
on another. File transfer services, which can copy a large volume of data efficiently, only
require a single person to manage the transfer.
The TELNET protocol specifies exactly how a remote login client and a remote login server
interact.
A URL identifies a particular Internet resource; for example a web page, a Gopher server, a
library catalog, an image, or a text file. URLs represent a standardized addressing scheme for
Internet resources, and help the users to locate these resources by indicating exactly where
they are. Every resource available via the World Wide Web has a unique URL.
URLs consist of letters, numbers, and punctuation. The basic structure of a URL is
hierarchical, and the hierarchy moves from left to right:
protocol://server-name.domain-name.top-level-domain:port/directory/filename
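The parts of a URL can be pulled apart with Python's standard `urllib.parse` module. The URL below is a made-up example, and the port is given explicitly only so that every component is visible.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.example.com:8080/docs/index.html")

print(parts.scheme)     # the protocol
print(parts.hostname)   # server name, domain name and top-level domain
print(parts.port)       # the port number, as an integer
print(parts.path)       # directory and filename
```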
The SMTP is the defined Internet method for transferring electronic mail.
SMTP is similar to FTP in many ways, including the same simplicity of operation. SMTP uses
TCP port number 25.
When a message is sent to SMTP, it places it in a queue.
SMTP attempts to forward the message from the queue whenever it connects to remote
machines.
If it cannot forward the message within a specified time limit, the message is returned to the
sender or removed.
When a connection is established, the two SMTP systems exchange authentication codes.
Following this, one system sends a MAIL command to the other to identify the sender and
provide information about the message.
The receiving SMTP returns an acknowledgment, after which a RCPT command is sent to identify the
recipient. If more than one recipient at the receiver location is identified, several RCPT
messages are sent, but the message itself is transmitted only once.
After each RCPT there is an acknowledgment. A DATA command is followed by the message
lines, until a single period on a line by itself indicates the end of the message. The connection
is closed with a QUIT command.
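The order of commands in this exchange can be written out as a small generator of the client's side of the dialogue. This is a sketch only: it omits the opening greeting, the server's acknowledgements and other details of the real protocol, and the addresses are made up.

```python
def smtp_client_commands(sender, recipients, message_lines):
    """List the client commands for one message, in the order they are sent."""
    commands = [f"MAIL FROM:<{sender}>"]
    commands += [f"RCPT TO:<{recipient}>" for recipient in recipients]  # one per recipient
    commands.append("DATA")
    commands += list(message_lines)      # the message itself is sent only once
    commands.append(".")                 # a single period on a line ends the message
    commands.append("QUIT")
    return commands


for line in smtp_client_commands("alice@example.com",
                                 ["bob@example.org", "carol@example.org"],
                                 ["Subject: test", "", "Hello."]):
    print(line)
```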
HTTP is short for Hyper Text Transfer Protocol. It is the set of rules, or protocol that governs
the transfer of hypertext between two or more computers.
Hypertext is text that is specially coded using a standard system called Hypertext Markup
Language (HTML).
The HTML codes are used to create links. These links can be textual or graphic, and when
clicked on can link the user to another resource such as other HTML documents, text files,
graphics, animation and sound.
HTTP is based on the client/server principle. HTTP allows computer A (the client) to
establish a connection with computer B (the server) and make a request. The server
accepts the connection initiated by the client and sends back a response. An HTTP request
identifies the resource that the client is interested in and tells the server what
action to take on the resource.
When a user selects a hypertext link, the client program on their computer uses HTTP to
contact the server, identify a resource, and ask the server to respond with an action. The
server accepts the request, and then uses HTTP to respond to or perform the action.
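A request of this kind is plain text: a line naming the action and the resource, followed by header lines and a blank line. The sketch below only builds the text of a minimal HTTP/1.1 request and does not open a connection. The host and path are made up, and the choice of headers is an assumption.

```python
def build_request(action: str, resource: str, host: str) -> str:
    """Compose the text a client would send to ask for `resource`."""
    lines = [
        f"{action} {resource} HTTP/1.1",   # what to do, and to which resource
        f"Host: {host}",                   # which server the request is for
        "Connection: close",
        "",                                # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


print(build_request("GET", "/index.html", "www.example.com"))
```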
Counter mode
m mod n
Receiver side
Digital Signatures
Requirements:
1. The receiver can verify the claimed identity of the sender.
2. The sender cannot later repudiate the contents of the message.
3. The receiver cannot possibly have concocted the message himself.
1. Digital signatures using public-key cryptography.
2. Message Digests (MD5)
3. SHA-1
4. Mutual authentication using public-key cryptography.
Firewalls
Aims:
Firewall Characteristics
Design goals:
All traffic from inside to outside, and vice versa, must pass through the firewall (achieved by
physically blocking all access to the local network except via the firewall).
Only authorized traffic (as defined by the local security policy) will be allowed to pass.
The firewall itself is immune to penetration (use of a trusted system with a secure
operating system).
Types of Firewalls
Three common types of firewalls:
1. Packet-filtering routers
2. Application-level gateways
3. Circuit-level gateways
A bastion host is the hardened system on which an application-level or circuit-level gateway
typically runs.
1. Packet-filtering Router
IP address spoofing
2. Application-level Gateway
3. Circuit-level Gateway
4. Bastion Host
Firewall Configurations
SE & WT
Software Development
The Waterfall Model
The Waterfall Model (WM) is an early lifecycle model. WM is based on engineering practice; it
works well if the requirements are well understood and do not change, but this rarely happens in
practice. The Waterfall Model is important in the same sense as Newton's Theory of Gravity: it's
wrong, but you can't understand relativistic gravitation if you do not understand Newtonian
gravitation.
A software project is divided into phases. There is feedback from each phase to the previous
phase, but no further.
1. Requirements Analysis.
2. Design and Specification.
3. Coding and Module Testing.
4. Integration and System Testing.
5. Delivery and Maintenance.
WM is document driven. Requirements analysis yields a document that is given to the designers;
design yields a document that is given to the implementers; implementation yields documented
code.
Requirements Analysis (SRS)
Write a System Requirements Document (SRD) that describes in precise detail what the
customer wants.
Find out what the client wants. This should include what the software should do and also:
likely and possible enhancements;
platforms (machines, OS, programming language, etc);
cost;
delivery schedule;
terms of warranty and maintenance;
user training;
Note: The SRD does not say how the software works.
Major deliverable: SRD.
Design
Design a software system that satisfies the requirements. Design documentation has three parts:
Architecture Document (AD): An overall plan for the components of the system. The AD is
sometimes called High-level Design Document (HDD).
Module Interface Specifications (MIS): Description of the services provided by each software
module.
Internal Module Design (IMD): Description of how the module implements the services that it
provides.
In the AD, each module is a black box. The MIS describes each module as a black box. The IMD
describes each module as a clear box.
Major deliverable: AD, MIS, and IMD.
Implementation
Implement and test the software, using the design documents. Testing requires the development
of test plans, based on SRD, which must be followed precisely. Roughly: for each requirement,
there should be a test.
Major deliverable: Source code and Test results.
Delivery and Maintenance
The product consists of all the documentation generated and well-commented source code.
Maintenance includes:
Correcting: removing errors;
Adapting: For a new processor or OS or to new client requirements;
Perfecting: Improving performance in speed or space.
Maintenance is 60% to 80% of total budget for typical industrial software. This implies the need
for high quality work in the early stages. Good documentation and good coding practice make
maintenance easier, cheaper, and faster.
Reverse engineering is a rapidly growing field. Many companies have millions of lines (MLOC) of
legacy code developed 20 or 30 years ago in old languages (e.g. COBOL, FORTRAN, PL/I) with little
supporting documentation. Tools are used to determine how this code works.
Delivery also includes customer assistance in the form of manuals, tutorials, training sessions,
and responses to complaints.
The requirements of high quality are the same as the requirements for maintainability.
Maintenance is the solution, not the problem.
Software Tools
Software tools are an important part of software development. The larger the project, the more
important it is to use tools in its development.
Editor.
Compiler and Linker.
Version control system (RCS).
Software measurement (DATRIX).
Specification checkers (OBJ3, Larch Prover).
Test generators.
Graph editors for DFDs and other diagrams.
CASE tools for integrated development.
Browsers, library managers, etc.
The SRD must be complete. It must contain all of the significant requirements related to
functionality (what the software does), performance (space/time requirements), design
constraints (e.g. "must run in 640 KB"), and external interfaces.
The SRD must define the response of the program to all inputs.
The SRD must be verifiable. A requirement is verifiable if there is an effective procedure that
allows the product to be checked against the SRD. Examples:
Not verifiable: The program must not loop.
Not verifiable: The program must have a nice user interface.
Verifiable: The response time must be less than 5 seconds for at least 90% of queries.
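The SRD must be consistent. A requirement must not conflict with another requirement.
Example of a conflict: When the cursor is in the text area, it appears as an I-beam. During a
search, the cursor appears as an hour-glass. (The two statements disagree when a search runs
while the cursor is in the text area.)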
The SRD must be modifiable. It should be easy to revise requirements safely without
the danger of introducing inconsistency. This requires:
Good organization;
Table of contents, index, extensive cross-referencing;
Minimal redundancy.
The SRD must be traceable.
The origin of each requirement must be clear. (Implicitly, a requirement comes from the
client; other sources should be noted.)
The SRD may refer to previous documents, perhaps generated during negotiations
between client and supplier.
The SRD must have a detailed numbering scheme.
The SRD must be usable during the later phases of the project. It is not written to be
thrown away! A good SRD should be of use to maintenance programmers.
The SRD is prepared by the supplier with help and feedback from the client.
The client (probably) does not understand software development.
The supplier (probably) does not understand the application.
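The response-time requirement listed under verifiability (less than 5 seconds for at least 90% of queries) is checkable because an effective procedure exists for it. A minimal sketch of such a procedure follows; the function name and the idea of passing in a list of measured times are assumptions made for illustration.

```python
def meets_response_requirement(times_s, limit_s=5.0, fraction=0.90):
    """Return True if at least `fraction` of the measured times are below `limit_s`."""
    if not times_s:
        raise ValueError("no measurements supplied")
    fast = sum(1 for t in times_s if t < limit_s)   # queries that met the limit
    return fast / len(times_s) >= fraction
```

No comparable procedure can be written for "a nice user interface", which is why that requirement is not verifiable as stated.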
Writing Requirements
Anticipate change. Distinguish what should not change, what may change, and what will
probably change.
The FSM diagram will contain at least nodes and arcs.
The software may be eventually required to run on machines with the EBCDIC character
set.
Capability Maturity Model (CMM): The CMM defines the key activities required at different levels
of process maturity.
Level 1:
Initial: The software process is characterized as ad hoc and occasionally even chaotic. Few
processes are defined, and success depends on individual effort.
Level 2:
Repeatable: Basic project management processes are established to track cost, schedule and
functionality.
Level 3:
Defined: The software process for both management and engineering activities is documented,
standardized, and integrated into the organization's standard software process.
Level 4:
Managed: Detailed measures of the software process and product quality are collected.
Level 5:
Optimizing: Continuous process improvement is enabled by quantitative feedback from the
process and from testing innovative ideas and technologies. This level includes all characteristics
defined for level 4.
Formal: Reasoning based on a mechanical set of rules (formal system). Example: programming
language, predicate calculus.
Software Qualities
Good software is:
Correct: The software performs according to the SRD. The SRD may be too vague (although it
should not be); in this case, conformance to a specification is needed.
Reliable: This is a weaker requirement than "correct". E-mail is reliable (messages usually
arrive) but probably incorrect.
Robust: The software behaves well when exercised outside the requirements. For example,
software designed for 10 users should not fall apart with 11 users.
Performance: The software should have good space/time utilization, fast response times, and the
worst response time should not be too different from the average response time.
Friendly: The software should be easy to use, should not irritate the user, and should be
consistent.
The screen always mirrors the state.
One key, one effect. E.g. F1 for help.
Verifiable: A common term that is not easily defined; it is easier to verify a compiler than a word processor.
Maintainable:
Easy to correct or upgrade.
Code traceable to design; design traceable to requirements.
Clear, simple code; no hacker's tricks.
Good documentation.
Simple interfaces between modules.
More later.
Reusable: Programmers tend to re-invent wheels. We need abstract modules that can be used in
many situations. Sometimes, we can produce a sequence of products, each using code from the
previous one.
Example: accounting systems.
OO techniques aid reuse.
Portable: The software should be easy to move to different platforms. Recent developments in
platform standards (PCs, UNIX, X, . . .) have aided portability.
Interoperable: The software should be able to cooperate with other software (word-processors,
spread-sheets, graphics packages, . . .).
Visibility: All steps must be documented.
These emerged during the 70s: Ada, Modula-n and provide true
Object oriented languages: The pure OOLs provide classes, which are a good basis for
modularity.
Abstraction: It is sometimes best to concentrate on general aspects of the problem while
carefully removing detailed aspects.
Waterfall Model
The waterfall model is document-driven;
Good features:
Simple to understand;
Phases are important even if their sequence is not;
Works for well-understood problems;
Keeps managers happy.
Bad features:
Does not allow for change;
Does not work for novel or poorly understood problems;
Produces inaccurate estimates;
Does not allow for changing requirements;
A plethora of documents leads to bureaucratic project management with more concern for the
existence/size of documents than their meaning.
Evolutionary Model
The evolutionary model is increment driven and cyclical:
1. Deliver something (this is the increment);
2. Measure added value to customer (may be positive or negative);
3. Adjust design and objectives as required.
Evolution often requires prototypes.
Prototypes
A prototype is a preliminary version that serves as a model of the final product.
Examples:
Software Prototypes:
Emulate the user interface (UI) and see if people like it.
Develop application code without UI to assess feasibility
There are several kinds of prototypes.
Throwaway Prototype: A throwaway prototype is not part of the final product.
Throwaway prototypes should:
Be fast to build;
Help to clarify requirements and prevent misunderstanding;
Warn implementers of possible difficulties.
Some languages are suited to prototyping: APL, LISP, SML, Smalltalk. Others are not: FORTRAN,
COBOL, C.
Design
Design is conveniently split into three parts: the architecture of the system, the module
interfaces, and the module implementations. We give an overview of these and then discuss
each part in more detail.
Design Documentation
The design documentation consists of:
AD Architectural Design
MIS Module Interface Specifications
IMD Internal Module Design
The document names also provide a useful framework for describing the design process.
Architectural Design
The AD provides a module diagram and a brief description of the role of each module.
Module Interface Specifications
Each module provides a set of services. A module interface describes each service provided by
the module.
Services are usually functions (used generically: includes procedure). A module may also
provide constants, types, and variables.
To specify a function, give:
name;
argument types;
a requires clause: a condition that must be true on entry to the function;
an ensures clause: a condition that will be true on exit from the function;
further comments as necessary.
The requires clause is a constraint on the caller. If the caller passes arguments that do not satisfy
the requires clause, the effect of the function is unpredictable.
The ensures clause is a constraint on the implementer. The caller can safely assume that, when
the function returns, the ensures clause is true.
The requires and ensures clauses constitute a contract between the user and the implementer of the
function. The caller guarantees to satisfy the requires clause; in return, the implementer
guarantees to satisfy the ensures clause.
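A contract of this kind can be written down next to the code. The sketch below is an illustration only: integer_sqrt is a hypothetical function chosen for the example, and the assertions stand in for the requires and ensures clauses (a real MIS would state them in the interface document).

```python
import math


def integer_sqrt(n: int) -> int:
    """Return the largest integer r with r*r <= n.

    requires: n >= 0
    ensures:  r*r <= n < (r+1)*(r+1)
    """
    # Checking the requires clause: the caller's obligation.
    assert n >= 0, "requires clause violated by caller"
    r = math.isqrt(n)
    # Checking the ensures clause: the implementer's obligation.
    assert r * r <= n < (r + 1) * (r + 1), "ensures clause violated by implementer"
    return r
```

Calling integer_sqrt(17) returns 4. Calling it with a negative argument breaks the caller's side of the contract; here the sketch chooses to fail loudly, although the contract itself promises nothing in that case.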
Internal Module Design
The IMD has the same structure as the MIS, but adds:
data descriptions (e.g. binary search tree for NameTable);
Remarks on Design
What designers actually do:
Construct a mental model of a proposed solution.
Mentally execute the model to see if it actually solves the problem.
Examine failures of the model and enhance the parts that fail.
Repeat these steps until the model solves the problem.
Design involves:
Understanding the problem;
Decomposing the problem into goals and objects;
Selecting and composing plans to solve the problem;
Implementing the plans;
Reflecting on the product and the process.
But when teams work on design:
The teams create a shared mental model;
Team members, individually or in groups, run simulations of the shared model;
Teams evaluate the simulations and prepare the next version of the model.
Conflict is an inevitable component of team design: it must be managed, not avoided.
Communication is vital.
Issues may fall through the cracks because no one person takes responsibility for them.
Varieties of Architecture
The AD is a ground plan of the implementation, showing the major modules and their
interconnections.
An arrow from module A to module B means "A needs B" or, more precisely, a function of A calls
one or more of the functions of B.
The AD diagram is sometimes called a Structure Chart.
The AD is constructed in parallel with the MIS. A good approach is to draft an AD, work on the
MIS, and then revise the AD to improve the interfaces and interconnections.
Hierarchical Architecture
The Structure Diagram is a tree.
Top-down design tends to produce hierarchical architectures.
Hierarchical architectures are easy to do.
May be suitable for simple applications.
Do not scale well to large applications.
Leaves of the tree tend to be over-specialized and not reusable.
Layered Architecture
The structure diagram has layers. A module may use only modules in its own layer and the layer
immediately below (closed architecture) or its own layer and all lower layers (open
architecture).
Layers were introduced by the THE system (Dijkstra 1968) and Multics (MIT, Bell Labs, General
Electric) (Corbato et al. 1965). (UNIX was designed in opposition to Multics.)
Programs with utility functions are (informal) layered systems.
Requires a combination of top-down and bottom-up design. Top-down ensures that
overall goals are met. Bottom-up ensures that lower layers perform useful and general
functions.
High layers perform high-level general tasks. Lower layers perform specialized (but not
too specialized!) tasks.
Modules in lower layers should be reusable.
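The closed and open layering rules described above can be checked mechanically once each module is assigned a layer. The following is a sketch under assumed inputs: the function name, the layer numbering (0 is the lowest layer) and the pair representation of "uses" are all choices made for this example.

```python
def layering_violations(layer_of, uses, closed=True):
    """Return the (user, used) pairs that break the layering rule.

    layer_of: dict mapping module name -> layer number (0 is the lowest layer)
    uses:     iterable of (user, used) module pairs
    closed:   True  -> own layer or the layer immediately below only
              False -> own layer or any lower layer (open architecture)
    """
    bad = []
    for user, used in uses:
        gap = layer_of[user] - layer_of[used]   # positive when "used" is lower
        allowed = gap in (0, 1) if closed else gap >= 0
        if not allowed:
            bad.append((user, used))
    return bad
```

With layers ui = 2, logic = 1 and storage = 0, a call from ui to storage is a violation in a closed architecture but not in an open one, while a call from storage up to logic is a violation in both.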
General Architecture
Arbitrary connections are allowed between modules.
Not recommended: cf. spaghetti code.
May be an indication of poor design.
Avoid cycles. Parnas: "nothing works until everything works."
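Cycles in the "uses" relation can be found with a depth-first search over the module graph. The sketch below assumes the graph is given as a dictionary from each module to the modules it calls; the function name find_cycle is hypothetical.

```python
def find_cycle(uses):
    """Return one cycle as a list of modules, or None if the graph is acyclic.

    uses: dict mapping each module to the list of modules it calls.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(m):
        colour[m] = GREY
        path.append(m)
        for nxt in uses.get(m, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:         # nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[m] = BLACK
        return None

    for module in list(uses):
        if colour.get(module, WHITE) == WHITE:
            found = visit(module)
            if found:
                return found
    return None
```

If A calls B, B calls C and C calls A, the function reports the cycle A, B, C, A; none of the three modules can be tested until all of them exist, which is the situation Parnas warns against.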
Event-Driven Architecture
In older systems, the program controlled the user by offering a limited choice of options at any
time.
In a modern, event-driven system, the user controls the program. User actions are abstracted as
events, where an event may be a keystroke, a mouse movement, or a mouse button change.
The architecture consists of a module that responds to events and knows which application
module to invoke for each event. For example, there may be modules related to different
windows.
This is sometimes called the Hollywood approach: "Don't call us, we'll call you." Calling
sequences are determined by external events rather than internal control flow.
Modules in an event-driven system must be somewhat independent of one another, because the
sequence of calls is unknown. The architecture may be almost inverted with respect to a
hierarchical or layered architecture.
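The module that "responds to events and knows which application module to invoke" can be sketched as a small dispatcher. The class and method names below are assumptions for illustration; real GUI toolkits provide their own event loops and registration calls.

```python
class EventDispatcher:
    """Maps event names to handler functions and calls them when events arrive."""

    def __init__(self):
        self._handlers = {}

    def register(self, event_name, handler):
        """Remember that `handler` wants to be called for `event_name`."""
        self._handlers.setdefault(event_name, []).append(handler)

    def dispatch(self, event_name, payload=None):
        """Call every handler registered for the event and return their results."""
        return [h(payload) for h in self._handlers.get(event_name, [])]
```

The application modules only register handlers; the order in which they are called is decided by the events that arrive, not by the application's own control flow.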
Subsumption Architecture
A subsumption architecture, sometimes used in robotics, is an extension of a layered
architecture. The lower layers are autonomous, and can perform simple tasks by themselves.
Higher layers provide more advanced functionality that subsumes the lower layers.
Designing for Change
What might change?
1. Users typically want more:
commands;
reports;
options;
fields in a record.
Solutions include:
Abstraction. Example: abstract all common features of commands so that it is easy to add a
new command. The idea would be to add the name of a new command to a table somewhere
and add a function to implement the command to the appropriate module.
Constant Definitions. There should be no "magic numbers" in the code.
Parameterization. If the programming language allows parameterization of modules, use it.
C++ provides templates. Ada packages can be parameterized.
2. Unanticipated errors may occur and must be processed.
Incorporate general-purpose error detection and reporting mechanisms. It may be a good
idea to put all error message text in one place, because this makes it easier to change the
language of the application.
3. Algorithm changes might be required.
Usually a faster algorithm is needed. As far as possible, an algorithm should be confined to a
single module, so that installing a better algorithm requires changes to only one module.
4. Data may change.
Usually a faster or smaller representation is needed. It is easy to change data representations
if they are secrets known to only a small number of modules.
5. Change of platform (processor, operating system, peripherals, etc)
Keep system dependencies localized as much as possible.
6. Large systems exist in multiple versions:
different releases
different platforms
different devices
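The command-table idea mentioned under abstraction above can be sketched as follows. Adding a command then means writing one function and one table entry. All names here (COMMANDS, command, run, and the two sample commands) are hypothetical.

```python
COMMANDS = {}          # command name -> function implementing it


def command(name):
    """Decorator that adds a function to the command table under `name`."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


@command("echo")
def echo(args):
    return " ".join(args)


@command("count")
def count(args):
    return str(len(args))


def run(line):
    """Look the command up in the table and call it with its arguments."""
    words = line.split()
    if not words:
        return ""
    name, *args = words
    if name not in COMMANDS:
        return f"unknown command: {name}"
    return COMMANDS[name](args)
```

The dispatching code in run never changes when a command is added, which is the point of designing for this kind of change.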
Module Design
Ideas about module design are important for both AD and MIS.
Language Support
The programming language has a strong influence on the way that modules can be designed.
Turbo-Pascal provides units which can be used as modules. A unit has an interface part
and an implementation part that provide separation of concerns.
C does not provide much for modularization. Conventional practice is to write a .h file for
the interface of a module and a .c file for its implementation. Since these files are used by
both programmers and compilers, the interfaces contain information about the
implementation. For example, we can (and should) use typedefs to define types, but the
typedef declarations must be visible to clients.
Ada provides packages that are specifically intended for writing modular programs.
Unfortunately, package interfaces and package bodies are separate.
The modern approach is to have one physical file for the module and let the compiler
extract the interfaces. This is the approach used in Eiffel and Dee.
A Recipe for Module Design
1. Decide on a secret.
2. Review implementation strategies to ensure feasibility.
3.
4.
5.
6.
Design Notations
A design can be described by diagrams, by text, or (preferably) by both.
Diagrams are useful to provide an overview of the design, showing how the parts relate to one
another. Text (Textual Design Notation, TDN) can be more precise, and as detailed as
necessary, but it may not be easy to see the overall plan from a textual design.
Text and graphics are complementary, with graphics working at a higher level of abstraction.
Design Strategies
We need a strategy, or plan, to develop the design. Strategies and architectures are related, in
that a particular strategy will tend to lead to a particular architecture, but there is not a tight
correspondence. For example, functional design tends to give a hierarchical architecture, but
does not have to.
Functional Design
Base the design on the functions of the system.
Similar to writing a program by considering the procedures needed.
A functional design is usually a hierarchy (perhaps with some layering) with main at the
root.
Compatible with top-down design and stepwise refinement.
Good feature: Functional design works well for small problems with clearly-defined
functionality.
Weaknesses:
Leads to over-specialized leaf modules.
Does not lead to reusable modules.
Emphasis on functionality leads to poor handling of data. For example, data structures
may be accessible throughout the system.
Poor information hiding.
Control flow decisions are introduced early in design and are hard to change later.
Structured Analysis/Structured Design (SA/SD)
Structured Analysis/Structured Design (SA/SD) is a methodology for creating functional designs
that is popular in the industry.
1. Formalize the design as a Data Flow Diagram (DFD). A DFD has terminators for input and
output, data stores for local data, and transformers that operate on data. Usually,
terminators are squares, data stores are parallel lines, and transformers are round. These
components are linked by labelled arrows showing the data flows.
2. Transform the DFD into a Structure Chart (SC). The SC is a hierarchical diagram that shows
the modular structure of the program.
A class:
is a collection of objects that satisfy the same protocol (or provide the same services);
may have many instances (which are the objects).
All of this can be done with conventional techniques. OOP adds some new features.
Inheritance: Suppose we have a class Window and we need a class ScrollingWindow.
We could:
rewrite the class Window from scratch;
copy some code from Window and reuse it;
inherit Window and implement only the new functionality.
class ScrollingWindow
inherits Window
new code to handle scroll bars etc.
redefine methods of Window that no longer work
Inheritance is important because it is an abstraction mechanism that enables us to develop a
specialized class from a general class.
Inheritance introduces two new relationships between classes.
The first relationship is called is-a. For example: a scrolling window is a window.
If X is-a Y, it should be possible to replace Y by X in any sentence without losing the meaning of
the sentence. Similarly, in a program, we should be able to replace an instance of Y by an
instance of X, or perform an assignment Y := X.
For example, a dog is an animal. We can replace "animal" by "dog" in any sentence, but not the
other way round. In a program, we could write A := D where A: Animal and D: Dog.
The second relationship is called inherits-from. This simply means that we borrow some
code from a class without specializing it. For example, Stack inherits from Array: it is not
the case that a stack is an array, but it is possible to use array operations to implement
stacks.
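The two relationships can be illustrated in Python. This is a sketch with assumptions: the method names are invented, and because Python has no private inheritance, the Stack below borrows list operations by holding a list internally (composition) rather than by inheriting from it. That is a swapped-in technique, not the inherits-from mechanism itself.

```python
class Window:
    def draw(self):
        return "window frame"


class ScrollingWindow(Window):           # is-a: a scrolling window is a window
    def draw(self):
        return super().draw() + " + scroll bars"


def repaint(w: Window) -> str:
    """Works for any Window, including specialised ones."""
    return w.draw()


class Stack:                             # borrows list code; a stack is not a list
    def __init__(self):
        self._items = []                 # array operations kept private

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()
```

A ScrollingWindow can be passed wherever a Window is expected, as in repaint. A Stack cannot be passed where a list is expected, even though it is implemented with one.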
Organization of Object Oriented Programs: Consider a compiler. The main data structure in a
compiler is the abstract syntax tree; it contains all of the relevant information about the source
program in a DAG. The DAG has various kinds of node and, for each kind of node, there are
several operations to be performed.
Using object oriented techniques, we can reverse the relationship between the library and the
main program. Instead of the application calling the library, the library calls the application. A
library of this kind is referred to as a framework. A framework is a coherent collection of classes
that contain deferred methods. The deferred methods are holes that we fill in with methods
that are specific to the application. The organization of a program constructed from a framework
is shown in Fig. 11.2.1. The structure of the system is determined by the framework; the
application is a collection of individual components.
The best-known framework is the MVC triad of Smalltalk. It is useful for simulations and other
applications. The components of the triad are a model, a view and a controller (see Fig. 11.2.2).
Figure 11.2.1: A Program Using a Framework from a Library (boxes: Framework, Library, Application)
Figure 11.2.2: The Model-View-Controller Framework (boxes: Controller, Model, View)
The Model is the entity being simulated. It must respond to messages such as step (perform one
step of the simulation) and show (reveal various aspects of its internal data).
The View provides one or more ways of presenting the data to the user. View classes might
include DigitalView, GraphicalView, BarChartView, etc.
The Controller coordinates the actions of the model and the view by sending appropriate
messages to them. Roughly, it will update the model and then display the new view.
Smalltalk provides many classes in each category and gives default implementations of their
methods. To write a simulation, all you have to do is fill in the gaps.
A framework is a kind of upside-down library. The framework calls the application classes,
rather than the application calling library functions. Frameworks are important because they
provide a mechanism for design re-use.
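The "upside-down library" idea can be shown with a small sketch in the spirit of the model, view and controller roles above. It is not Smalltalk's MVC: the class names, the use of a plain callable as the view, and the Counter application are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Framework class whose deferred methods are filled in by the application."""

    @abstractmethod
    def step(self): ...

    @abstractmethod
    def show(self): ...


class Controller:
    """Framework code: it calls the application, not the other way round."""

    def __init__(self, model, view):
        self.model, self.view = model, view

    def run(self, steps):
        for _ in range(steps):
            self.model.step()                 # update the model
            self.view(self.model.show())      # then display the new state


class Counter(Model):                         # the application-specific part
    def __init__(self):
        self.value = 0

    def step(self):
        self.value += 1

    def show(self):
        return self.value
```

Running Controller(Counter(), print).run(3) lets the framework drive the application: the only code the application author wrote is the Counter class.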
Functional or Object Oriented?
Functional Design (FD)
FD is essentially top-down. It emphasizes control flow and tends to neglect data.
Information hiding is essentially bottom-up. It emphasizes encapsulation (secrets) in
low-level modules.
In practice, FD must use both top-down and bottom-up techniques.
Risk
A risk is the possibility that an undesirable event could happen.
Risk Estimation: Risk estimation involves two tasks in rating a risk. The first task is estimating
the probability of the occurrence of the risk, called the risk probability. The second is estimating
the risk impact, the cost of the risk event happening.
Risk Exposure: Risk exposure is the expected value of the risk event.
$\text{Risk exposure} = \text{Risk probability} \times \text{Risk impact}$
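As a worked illustration of risk exposure as probability times impact, the sketch below computes the exposure of two invented risks and ranks them. The risk names, probabilities and costs are made-up example values, and the function name is hypothetical.

```python
def risk_exposure(probability, impact):
    """Expected cost of a risk event: probability of occurrence times its cost."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return probability * impact


# (description, risk probability, risk impact in currency units): example values
risks = [
    ("server failure", 0.5, 10000),
    ("key developer leaves", 0.25, 40000),
]

# Rank the risks so the largest exposure comes first.
ranked = sorted(risks, key=lambda r: risk_exposure(r[1], r[2]), reverse=True)
```

Here the less likely risk ranks first, because its exposure (0.25 times 40000) is larger than that of the more likely one (0.5 times 10000).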
Risk Decision Tree:
A decision tree gives a graphic view of the processing logic involved in decision making and the
corresponding actions taken.
A technique that can be used to visualize the risks of alternatives is to build a risk decision tree.
The top level branch splits based on alternatives available.
Reliability Metrics:
1. Mean Time To Failure (MTTF):
MTTF is the average time between two successive failures, observed over a large number of
failures.
To measure MTTF we can record the failure data for n failures. Let the failures occur at the
time instants $t_1, t_2, \ldots, t_n$.
Then $\text{MTTF} = \sum_{i=1}^{n-1} \frac{t_{i+1} - t_i}{n - 1}$
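Taking MTTF as the average gap between successive failure instants, as defined above, the calculation can be sketched as follows. The function name and the sample instants in the usage note are assumptions for illustration.

```python
def mttf(failure_times):
    """Average time between successive failures, given sorted failure instants."""
    if len(failure_times) < 2:
        raise ValueError("at least two failure instants are needed")
    # Gaps between each failure and the next one: n instants give n - 1 gaps.
    gaps = [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)
```

For failures observed at hours 10, 30, 70 and 100, the gaps are 20, 40 and 30 hours, so the function returns 30.0.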
Varieties of Testing
We can perform verification and validation by testing. A single test can provide both validation
and verification. A failure (e.g. system crashes) reveals an internal fault in the system
(verification). An incorrect result indicates that the software does not meet requirements
(validation).
The main problem with testing is that testing can never be complete.
1. Goal-driven testing:
Requirements-driven testing: Develop a test-case matrix (requirements vs. tests) to ensure
that each requirement undergoes at least one test. Tools are available to help build the
matrix.
Structure-driven testing: Construct tests to cover as much of the logical structure of the
program as possible. A test coverage analyzer is a tool that helps to ensure full coverage.
Statistics-driven testing: These tests are run to convince the client that the software is
working by running typical applications. Results are often statistical.
Risk-driven testing: These tests check worst case scenarios and boundary conditions. They
ensure robustness.
2. Phase-driven testing:
1. Unit testing. Test individual components before integration.
2. Integration testing. Assemble the units and ensure that they work together.
3. System testing. Test the entire product in a realistic environment.
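The test-case matrix used in requirements-driven testing above can be kept as a simple mapping from tests to the requirements they exercise. The sketch below reports requirements that no test covers; the identifiers R1, T1 and so on, and the function name, are hypothetical.

```python
def untested_requirements(requirements, tests):
    """Return the requirement ids that are not covered by any test.

    requirements: iterable of requirement ids
    tests:        dict mapping test name -> set of requirement ids it covers
    """
    covered = set().union(*tests.values())   # every requirement touched by a test
    return sorted(set(requirements) - covered)
```

With requirements R1, R2, R3 and tests T1 covering R1 and T2 covering R1 and R3, the function returns ["R2"], showing that one more test is needed.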
Designing Tests
A test has two parts:
A procedure for executing the test. This may include instructions for getting the system
into a particular state, input data, etc.
An expected result, or permitted range of results.
Stages of Testing
Unit Testing: Test an individual unit or basic component of the system. Example: test a function
such as sqrt.
Module Testing: Test a module that consists of several units, to validate the interaction of the
units and the module interface.
Subsystem Testing: Test a subsystem that consists of several modules, to validate module
interaction and module interfaces.
Integration Testing: Test the entire system.
Acceptance Testing: Test the entire system thoroughly with real data to satisfy the customer that
the system meets the requirements.
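For the unit testing stage, a test of a function such as sqrt might look like the following sketch. It assumes Python's standard `math.sqrt` as the unit under test; the helper name, the test values, and the tolerance are hypothetical. The tolerance plays the role of a permitted range of results.

```python
import math

def check_sqrt(value, expected, tolerance=1e-9):
    """Run one unit test of math.sqrt; True if the result is within tolerance."""
    return math.isclose(math.sqrt(value), expected, abs_tol=tolerance)

# Each case pairs an input with its expected result.
cases = [(0.0, 0.0), (4.0, 2.0), (2.0, 1.41421356237)]
results = [check_sqrt(value, expected) for value, expected in cases]
print(results)  # [True, True, True]
```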
Testing Strategies
The strategies described here apply to development testing, not acceptance testing.
Top-down Testing
Goes from system to subsystem to module to unit;
Requires that we write stubs for parts of the system that are not yet completed.
A stub:
Replaces a module or unit for the purposes of testing;
Must return with a valid response;
May do nothing useful;
May always do the same thing;
May return random values;
May handle specific cases.
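A stub of the kind described above can be sketched as follows. All names here (`tax_rate_stub`, `invoice_total`) are hypothetical; the stub always does the same thing and returns a valid response so that the higher-level unit can be tested before the real component exists.

```python
def tax_rate_stub(region):
    """Stub for an unfinished tax-lookup unit: always returns the same valid rate."""
    return 0.10

def invoice_total(net_amount, region, tax_rate=tax_rate_stub):
    """Higher-level unit under test; the real tax lookup is substituted later."""
    return round(net_amount * (1 + tax_rate(region)), 2)

print(invoice_total(200.0, "anywhere"))  # 220.0
```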
Advantages (+) and disadvantages (-) of top-down testing:
+ Catches design errors (but these should have been caught in design reviews).
+ Enables testing to start early in the implementation phase.
- It is hard to write effective stubs.
- Stubs take time to design and write.
Bottom-up Testing
Test units, then modules, then subsystems, then system. We require drivers to exercise each unit
or module because its environment does not exist yet. Drivers must
provide environment and simulated input
check outputs
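A driver for bottom-up testing can be sketched in the same way. The unit `clamp` and the simulated cases are hypothetical; the driver provides the inputs that the missing environment would supply and checks the outputs.

```python
def clamp(value, low, high):
    """Low-level unit under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))

def run_driver():
    """Driver: supplies simulated inputs to clamp and checks its outputs."""
    simulated_cases = [((5, 0, 10), 5), ((-3, 0, 10), 0), ((42, 0, 10), 10)]
    failures = []
    for args, expected in simulated_cases:
        actual = clamp(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures

print(run_driver())  # [] means every check passed
```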
Advantages (+) and disadvantages (-) of bottom-up testing:
+ Each component is tested before it is integrated into a larger component.
+ Debugging is simplified because we are always working with reliable components.
- Drivers must be written; drivers are usually more complex than stubs and they may contain
errors of their own.
- Important errors (e.g. design errors) may be caught late in testing, perhaps not until
integration.
Mixed strategies are also possible. We can aim at gradual refinement of the entire system, doing
mostly top-down testing, but with some bottom-up testing.
The order of testing and the order of implementation must be chosen together. Clearly, top-down testing requires top-down coding.
Alpha test: It is conducted at the developer's site by a customer, in a controlled environment.
Beta test: It is a live application of the software in an environment that cannot be controlled by
the developer. Unlike alpha testing, the developer is generally not present; the customer records all the
problems encountered during beta testing.
When do we stop?
Ideal: Stop when all tests succeed.
Practice: Stop when the cost of testing exceeds the cost of shipping with errors.
Preparing Test Cases
A Test Set consists of a list of tests. Each test should include the following three components:
Purpose: For an acceptance test, the purpose is an SRD item. For a unit or subsystem test, the
purpose is an internal requirement based on the design.
Data: The environment (i.e. state of the system when the test is conducted), inputs to functions,
etc.
Expected Result: The effect of conducting the test, the value returned by a function, the effect of a
procedure, etc.
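The three components of a test can be captured in a small record, and a test set is then a list of such records. The sketch below is illustrative: the class name `PlannedTest`, the runner function, and the sample tests (which exercise Python's built-in `abs`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class PlannedTest:
    purpose: str    # the requirement the test addresses
    data: tuple     # inputs passed to the function under test
    expected: Any   # expected result

def run_test_set(function, test_set):
    """Return the purposes of the tests whose actual result differs from the expected one."""
    return [t.purpose for t in test_set if function(*t.data) != t.expected]

test_set = [
    PlannedTest("absolute value of a negative number", (-7,), 7),
    PlannedTest("absolute value of zero", (0,), 0),
]
print(run_test_set(abs, test_set))  # []
```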
A Test Plan has the following components:
A description of the phases of testing. For example: unit, system, module.
The objectives of the testing phase (verify module, validate subsystem, etc.).
A schedule that specifies who does what to which and when.
The relationship between implementation and testing schedules (don't schedule a test
before the component is written).
Tracing from tests to requirements.
How test data are generated.
How test results are recorded.
Reviews, Walkthroughs, and Inspections
The basic idea of reviews, walkthroughs, and inspections is the same: A team examines a
software document during a meeting. Studies have shown that errors are found more effectively
when a group of people work together than when people work individually.
Common features include:
A small group of people;
The person responsible for the document (analyst, designer, programmer, etc) should
attend;
One person is responsible for recording the discussion;
Managers must not be present, because they inhibit discussion;
Errors are recorded, but are not corrected.
During a review:
The author of the document presents the main themes;
Others criticize, discuss, look for omissions, inconsistencies, redundancies, etc.
Faults and potential faults are recorded.
During a walkthrough:
Each statement or sentence is read by the author;
Others ask for explanation or justification if necessary.
Example:
"Since n > 0, we can divide ..."
"How do you know that n > 0?"
During an inspection:
Code is carefully examined, with everyone looking for common errors. As usual, there is
variation in usage: an IBM inspection is close to what we have called a walkthrough.
Some general rules:
Teams prepare in advance, e.g. by reading the documentation.
Meetings are not long (at most 3 hours), so that concentration can be maintained.
A moderator is advisable to prevent discussions from rambling.
The author may be required to keep silent except to respond to questions. If the author
explains what she/he thought she/he was doing, others may be distracted from what is
actually written.
All members must avoid possessiveness and egotism, cooperating on finding errors, not
defending their own contributions.
McCabe's CYCLOMATIC NUMBER (Product metric)
McCabe's cyclomatic complexity is based on the fact that complexity is related to the control flow of
the program. It defines an upper bound on the number of independent paths in a program.
Method I
Given a control flow graph G of a program, the cyclomatic complexity V(G) can be computed as
$V(G) = E - N + 2$
where E is the number of edges and N is the number of nodes in the control flow graph.
Method II
Cyclomatic complexity = Total No. of bounded areas +1 (for planar graphs)
Method III
If N is the number of decision statements of a program, then McCabe's metric is equal to N + 1.
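The following sketch compares two of the methods on a small hypothetical control flow graph: edges minus nodes plus 2, and number of decision statements plus 1. The graph models an if-else followed by a while loop and is assumed to be a single connected component with one entry and one exit. The function names are illustrative.

```python
def cyclomatic_from_graph(edges, nodes):
    """Method I: edges - nodes + 2 for a single connected control flow graph."""
    return len(edges) - len(nodes) + 2

def cyclomatic_from_decisions(decision_count):
    """Method III: number of decision statements + 1."""
    return decision_count + 1

# Hypothetical graph: node 1 is an if condition, 2 and 3 are its branches,
# node 4 is a while condition, 5 is the loop body, 6 is the exit.
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 4), (4, 6)]

print(cyclomatic_from_graph(edges, nodes))  # 7 - 6 + 2 = 3
print(cyclomatic_from_decisions(2))         # one if + one while -> 3
```

The same graph has two bounded areas (the if-else diamond and the loop), so Method II also gives 2 + 1 = 3.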
Software Maturity Index
Software maturity index (SMI) provides an indication of the stability of a software product
based on changes that occur for each release of the product.
$M_T$ = Number of modules in the current release.
$F_c$ = Number of modules in the current release that have been changed.
$F_a$ = Number of modules in the current release that have been added.
$F_d$ = Number of modules from the preceding release that were deleted in the current release.
Software Maturity Index is computed in the following manner:
$\mathrm{SMI} = \dfrac{M_T - (F_a + F_c + F_d)}{M_T}$
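As a worked sketch, assuming the usual definition of SMI as the fraction of modules in the current release that were not added, changed, or deleted, the index can be computed as follows. The function name and the release figures are hypothetical.

```python
def software_maturity_index(total, changed, added, deleted):
    """SMI = (total - (added + changed + deleted)) / total."""
    if total <= 0:
        raise ValueError("the current release must contain at least one module")
    return (total - (added + changed + deleted)) / total

# Hypothetical release: 100 modules, 10 changed, 5 added, 5 deleted.
print(software_maturity_index(total=100, changed=10, added=5, deleted=5))  # 0.8
```

As SMI approaches 1.0, the product is beginning to stabilize.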
Coupling Metric:
Module coupling provides an indication of the connectedness of a module to other modules,
global data, and the outside environment.
11.4 HTML
HTML Structure
HTML uses tags that are encased in brackets like the following:
<>
HTML documents consist of elements which are constructed with tags. For instance, a paragraph
is considered to be an html element constructed with the tags <P> and </P>. The <P> tag
begins the paragraph element and the </P> tag ends the element.
Not all elements have an ending tag; the line break tag, <br>, is one example. The HTML
document begins with the <html> tag and ends with the </html> tag. Elements of an HTML
document include the HEAD, BODY, paragraphs, lists, tables, and more. Some elements have
attributes embedded in the tag that define characteristics of the element such as the placing of
text, size of text, source of an image, and other characteristics depending on the element.
An HTML document is structured with two main elements:
1. HEAD
2. BODY
An Example HTML File
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Arachnophilia 3.9">
<meta name="description" content="Comprehensive Documentation and information about HTML.">
<meta name="keywords" content="HTML, tags, commands">
<title>The CTDP HTML Guide</title>
<link href="style.css" rel="stylesheet" type="text/css">
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
</head>
<body>
<center><h1>HTML Document Structure</h1></center>
<p>
This is a sample HTML file.
</p>
</body>
</html>
Comments begin with <!-- and end with -->. The tags "HTML", "BODY", and all others
may be in capital or small letters; however, the new XHTML standard requires small letters, so
small letters are recommended.
In the above file, there is a header and a body. Normally you can copy this file and use it as a
template to build your new file while being sure to make some modifications. You can edit HTML
using a standard editor, but it is easier to use an HTML editor like Arachnophilia since it displays
the tags in a different color from the text. Also note the LINK element above
which specifies a style sheet to be used. This is a file with a name "style.css". This is a text file
that may be created with a text editor but must be saved in plain text format.
HTML Header
The HTML header contains several notable items which include:
1. doctype - This gives a description of the type of HTML document this is.
2. meta name="description" - This gives a description of the page for search engines.
3. meta name="keywords" - This line sets keywords which search engines may use to find your
page.
4. title - Defines the name of your document for your browser.
The <!DOCTYPE> Line
The <!DOCTYPE> line is used to define the type of HTML document or page. It has no ending
tag. The three document types that are recommended by the World Wide Web Consortium
(W3C) are:
1. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">. This implies strict adherence
with HTML 4 standards.
2. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">. This supports
frameset tags.
3. <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">. This is used to
support deprecated HTML 3.2 features. It does not support frames.
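The three document types can be told apart by the public identifier in the <!DOCTYPE> line. The sketch below is a simple string check written for illustration only; the function name is hypothetical, and it does not validate the document against the DTD.

```python
def html401_variant(doctype_line):
    """Classify an HTML 4.01 <!DOCTYPE> line as strict, frameset, or transitional."""
    text = doctype_line.lower()
    if "html 4.01 frameset//en" in text:
        return "frameset"
    if "html 4.01 transitional//en" in text:
        return "transitional"
    if "html 4.01//en" in text:
        return "strict"
    return "unknown"

print(html401_variant('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
# strict
print(html401_variant('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'))
# transitional
```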
Elements in the Header
Elements allowed in the HTML 4.0 strict HEAD element are:
BASE - Defines the base location for resources in the current HTML document. Supports
the TARGET attribute in frame and transitional document type definitions.
LINK - Used to set relationships of other documents with this document.
META - Used to set specific characteristics of the web page and provide information to
readers and search engines.
SCRIPT - Used to embed script in the header of an HTML document.
STYLE - Used to embed a style sheet in the HTML document.
TITLE - Sets the document title.
The additional element allowed by the HTML 4.0 transitional standard is:
ISINDEX (Deprecated) - Allows a single line of text input. Use the INPUT element rather
than ISINDEX.
The <META> Element
The <META> element is used to set specific characteristics of the web page and provide
information to readers and search engines. It has no ending tag.
Attributes
http-equiv - Possible values are:
o refresh - The browser will reload the document after the number of seconds specified by the
CONTENT value has elapsed. Ex: <META HTTP-EQUIV=refresh CONTENT=45>
o expires - Gives the date after which content in the document is considered unreliable.
o reply-to - An email address of the responsible party for the web page. This attribute is
not commonly used. Ex: <META HTTP-EQUIV=reply-to CONTENT="ctdp@tripod.com">
Name - Provides non-critical information about the document possibly useful to someone
looking at it. Possible values are:
o Author - The person who made the page or the HTML editor name. Ex: <META NAME=author
CONTENT="Mark Allen">
o description - An explanation of the page or its use, used by search engines at times to
provide a page summary. Ex: <META NAME=description CONTENT="The CTDP Home Page">
o copyright - A copyright notice for the page. Ex: <META NAME=copyright
CONTENT="Copyright 2000, Mark Allen">
o keywords - A list of keywords which are separated by commas. These keywords are used
by search engines. Ex: <META name="keywords" CONTENT="computer
documentation, computers, documentation, computer help">
This section is very important if you want your web page to be found by search engines.
Please note that keywords are separated by commas, not spaces and that the words
"computer documentation" are treated by search engines as one word. If someone enters the
phrase "computer documentation" when doing a search, it gives the web page a much
greater chance of being found than just having the separate keywords "computer" and
"documentation".
o date - <META name="date" CONTENT="2000-05-07T09:10:56+00:00">
CONTENT - Specifies a property's value; for example, that the content of this document is text/html.
scheme - Names a scheme to be used to interpret the property's value.
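To see how the NAME, HTTP-EQUIV, and CONTENT attributes fit together, the following sketch collects them from a page using Python's standard `html.parser` module. The class name `MetaCollector` and the sample markup are assumptions made for illustration; the sample reuses META lines quoted in this section.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect name or http-equiv -> content pairs from META elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("http-equiv")
        if key:
            self.meta[key.lower()] = attributes.get("content")

sample = """<html><head>
<META HTTP-EQUIV=refresh CONTENT=45>
<META NAME=author CONTENT="Mark Allen">
<meta name="keywords" content="HTML, tags, commands">
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(sample)
print(collector.meta)
# {'refresh': '45', 'author': 'Mark Allen', 'keywords': 'HTML, tags, commands'}
```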
Description
Supplies contact information for the document.
Used to quote in block form.
Deprecated.
A container allowing specific style to be added to a block of text.
A container allowing specific style to be added to a block of text.
A container allowing multiple frames (HTML documents) to be placed on a web browser.
Comment
-
Headings
HR - Horizontal rule
ISINDEX - Deprecated
DIR
FRAMESET
NOFRAMES
NOSCRIPT
Name
P
Description
Comment
PRE
TABLE
FORM
Includes table sub elements
-
(email | letter)
(header, subject?, text+)
(header, recipient*, date?)
(sender, recipient*, date)