1

Fault-tolerant design
Verification
Testing
Design for testability
Built-in-self-test
Concurrent checking
2
Types of testing
When is testing performed?
On-line (concurrent) testing, off-line testing
Where is the source of stimuli?
Self-testing, external testing (tester)
What do we test for?
Design verification, acceptance testing for fabrication errors, etc.
How are the stimuli applied?
Fixed order, adaptive testing
What are the observed results?
Entire output patterns, some functions of the output (compact testing / signature)
What lines are accessible for testing?
Only I/O, I/O and internal lines
Who checks the results?
Self-checking, external testing
3
Nature of faults
Permanent
Always present
Intermittent
Occurs in regular intervals
Transient
One time and gone
fault
error
4
Self-checking circuits - Overview
Fault model (model describes a nature of faults)
Error detecting codes
Totally Self-checking property
Self-testing property
Fault-secure property
Self-checking checker
SOM-based checker
circuit
checker
5
Error Detecting Codes

During information processing or storage, data can be corrupted by physical defects in the system, so the system should include provisions for detecting erroneous bits in data.

This typically requires additional (redundant) bits to be appended to the data for error detection. The length (number of bits) of the encoded data, also known as a code word, is greater than that of the original data.

The process of appending check bits to the information bits is called encoding; the opposite process, extracting the original information bits from a code word, is known as decoding.

The ratio of the number of information bits to the number of code word bits
is known as the code rate.
An n-bit code obtained by encoding information bits of length k has $2^k$ valid code words, $(2^n - 2^k)$ invalid (non-code) words, and the code rate $k/n$.
6
Error Detecting Codes
The primary requirements of the code are as follows:
It detects all likely errors.
It achieves the desired degree of error detection using minimum redundancy.
The encoding and decoding processes are fast and simple.

Codes can be classified as either separable or nonseparable.
The information bits in a separable code can be separately
identified from the check bits;
The information bits in a nonseparable code are embedded in a
code word and can only be extracted by using a specific decoding.

A separable code with k information bits is said to be systematic if all $2^k$ patterns of information bits occur in code words.


7
Parity Code
The parity code is obtained by counting the number of 1s in the information bits and appending a 0 or a 1 to make the count odd or even.
Odd parity is generally preferred because it ensures at least a single 1 in any code word.

A parity check can detect only an odd number of bit errors.

The error detecting capability of the parity checker can be expanded by including a parity bit for each byte of information bits.

More sophisticated parity-oriented solutions are based on partitioning the information bits into several blocks, with each bit appearing in more than one block, and computing the parity for each block. Such overlapping not only detects errors of more than 1 bit; in the case of a single erroneous bit, the location of this bit can also be identified.
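
A minimal Python sketch of the single parity bit described above. The function names (`parity_bit`, `encode`, `check`) and the sample data are illustrative assumptions, not part of the slides. It shows that one flipped bit is detected while two flipped bits pass unnoticed.

```python
def parity_bit(bits, odd=True):
    """Return the check bit that makes the total count of 1s odd (or even)."""
    ones = sum(bits)
    if odd:
        return 0 if ones % 2 == 1 else 1
    return ones % 2


def encode(bits, odd=True):
    """Append the parity bit to the information bits."""
    return bits + [parity_bit(bits, odd)]


def check(word, odd=True):
    """True if the received word has the expected parity."""
    return sum(word) % 2 == (1 if odd else 0)


word = encode([1, 0, 1, 1, 0, 0, 1, 0])   # odd parity code word
single = word.copy()
single[2] ^= 1                            # one bit in error
double = single.copy()
double[5] ^= 1                            # a second bit in error

print(check(word), check(single), check(double))
```

The double error restores the parity of the word, which is why a plain parity check only detects an odd number of bit errors.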
8
Multiple Error Detecting Codes

A single-bit error detecting code can handle only random single errors. However, in many cases the errors that occur in logic circuits and memory systems are of a multiple nature. Multiple errors belong to one of the following classes: symmetric, asymmetric, and unidirectional.

Symmetric errors: Both 0->1 and 1->0 errors can occur with equal probability in a code word.

Asymmetric errors: Only one type of error, 0->1 or 1->0 but not both, can occur in code words.

Unidirectional errors: Both 0->1 and 1->0 errors can occur, but they do not occur simultaneously in any code word.


9
Unidirectional Error Detecting Codes - Definitions
Let X and Y be two binary k-tuples. Denote by N(X,Y) the number of 1->0 crossovers from X to Y.

Example: for X = 101010 and Y = 110101, N(X,Y) = 2 and N(Y,X) = 3.

The number of bits in which two distinct binary vectors differ is known as the Hamming distance between them: d(X,Y) = N(X,Y) + N(Y,X).

A word $X = (x_1, \dots, x_k)$ covers another word $Y = (y_1, \dots, y_k)$, written $X \geq Y$, if $y_i = 1$ implies $x_i = 1$ for $i = 1, 2, \dots, k$. In other words, the positions of 1s in Y are a subset of the positions of 1s in X.

Example: X = 101010 covers Y = 101000.

If X does not cover Y, and Y does not cover X, then X and Y are unordered.
A code in which no code word is covered by any other code word is called an unordered code.
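
The definitions above translate almost directly into code. The following is a small sketch with hypothetical helper names (`crossovers`, `hamming`, `covers`, `is_unordered`), with words represented as strings of '0' and '1'.

```python
def crossovers(x, y):
    """N(X, Y): number of positions where X has a 1 and Y has a 0."""
    return sum(1 for a, b in zip(x, y) if a == "1" and b == "0")


def hamming(x, y):
    """d(X, Y) = N(X, Y) + N(Y, X)."""
    return crossovers(x, y) + crossovers(y, x)


def covers(x, y):
    """X covers Y if every 1 of Y is also a 1 of X, i.e. N(Y, X) = 0."""
    return crossovers(y, x) == 0


def is_unordered(code):
    """True if no code word is covered by a different code word."""
    return not any(covers(x, y) for x in code for y in code if x != y)


X, Y = "101010", "110101"
print(crossovers(X, Y), crossovers(Y, X), hamming(X, Y))
print(covers("101010", "101000"))
print(is_unordered(["0011", "0110", "1001", "1100"]))
```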
10
Unordered Codes for Unidirectional Error Detection

Many faults in VLSI circuits have been found to cause unidirectional errors. This has led to the development of several unidirectional error detecting codes.

An unordered code is capable of detecting all unidirectional errors. This is because in such a code, a unidirectional error cannot transform one code word into another code word.

Unordered codes can be separable or nonseparable. For example, both the m-out-of-n code and the Berger code are unordered, but the former is nonseparable and the latter is separable. Both codes detect single and unidirectional multiple errors.
11
m-out-of-n Codes
In an m-out-of-n code, all valid code words have exactly m 1s and (n-m) 0s. The total number of code words is $\binom{n}{m} = \frac{n!}{(n-m)!\,m!}$.

If m=k and n=2k, we have the popular k-out-of-2k code. A special case of the k-out-of-2k code, consisting of only $2^k$ code words out of the possible $(2k)!/(k!\,k!)$ code words, is known as the k-pair two-rail code. Each code word of this code has k information bits and k check bits, which are bit-by-bit complements of the information bits. The 2-pair two-rail code consists of the following code words: 0011, 1001, 0110, 1100.

If $m = \lfloor n/2 \rfloor$ in an m-out-of-n code, then the code is optimal. In other words, no other unordered code of length n has more code words than the $\lfloor n/2 \rfloor$-out-of-n code.

An important subset of m-out-of-n codes is the 1-out-of-n code, in which exactly 1 bit of an n-bit code word is 1 and the remaining bits are all 0s.
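
As an illustration, the code words of an m-out-of-n code and of the k-pair two-rail code can be enumerated as follows. The function names `m_out_of_n` and `two_rail` are assumptions chosen for this sketch.

```python
from itertools import combinations
from math import comb


def m_out_of_n(m, n):
    """All n-bit words with exactly m ones."""
    words = []
    for ones in combinations(range(n), m):
        words.append("".join("1" if i in ones else "0" for i in range(n)))
    return words


def two_rail(k):
    """k information bits followed by their bit-by-bit complement."""
    words = []
    for value in range(2 ** k):
        info = format(value, f"0{k}b")
        check = "".join("1" if b == "0" else "0" for b in info)
        words.append(info + check)
    return words


code = m_out_of_n(2, 4)
print(len(code), comb(4, 2))          # number of code words, n!/((n-m)! m!)
print(two_rail(2))                    # a subset of the 2-out-of-4 code
```

Every two-rail word has exactly k ones, so the k-pair two-rail code is contained in the k-out-of-2k code.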
12
Berger Code
A Berger code of length n has k information bits and r check bits, where $r = \lceil \log_2 (k+1) \rceil$ and n=k+r.

It is the least redundant separable unordered code for detecting single and unidirectional multiple errors.

A code word is constructed by forming a binary number corresponding to the number of 1s in the information bits, and appending the bit-by-bit complement of the binary number as check bits to the information bits. For example, if the information bits are 0101000 (k = 7), then $r = \lceil \log_2 (7+1) \rceil = 3$ and the Berger code must have a length of 10 (=7+3). The r check bits are derived as follows:

Number of 1s in the information bits = 2 (010). The bit-by-bit complement of 010 is 101, which gives the r check bits. Thus, the code word is

0101000 101
(k information bits, then r check bits).
13
Berger Code (cont)

Alternatively, the r check bits may be the binary number representing the number of 0s in the k information bits. Thus, the check bits for the Berger code can be generated by using two different schemes:

The scheme that uses the bit-to-bit complement of the
binary representation of the number of 1s in the
information bits is known as the B1 encoding scheme.

The other scheme, which uses the binary representation
of the number of 0s in the information bits as check
bits, is known as the B0 scheme.
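
A short sketch of both schemes, assuming information bits given as a string. The helper names (`check_length`, `berger_b1`, `berger_b0`) are hypothetical. The check length uses `int.bit_length()`, which equals $\lceil \log_2 (k+1) \rceil$ for k >= 1 and avoids floating-point rounding.

```python
def check_length(k):
    """r = ceil(log2(k + 1)), computed exactly with integer arithmetic."""
    return k.bit_length()


def berger_b1(info):
    """B1: bit-by-bit complement of the binary count of 1s."""
    r = check_length(len(info))
    count = format(info.count("1"), f"0{r}b")
    return "".join("1" if b == "0" else "0" for b in count)


def berger_b0(info):
    """B0: binary count of 0s."""
    r = check_length(len(info))
    return format(info.count("0"), f"0{r}b")


info = "0101000"
print(check_length(len(info)))                 # 3 check bits for k = 7
print(info, berger_b1(info))                   # 0101000 101
print(berger_b1("11100"), berger_b0("11100"))  # the schemes differ for k = 5
```

For k = 7 the two schemes happen to give the same check bits, because the number of 0s is 7 minus the number of 1s, which is the 3-bit complement. For other lengths, such as k = 5, they differ.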
14
Smith code

When some subset of the code words is unordered and another subset is ordered, the Smith code can be applied.

Example: Let the set of code words be
{11111, 11100, 00101, 00110, 10001, 11000, 00100, 00000}.

The idea of Smith encoding is to make unordered only those pairs of vectors that are ordered, rather than every pair of vectors.

In our example there are 4 ordered chains in the Hasse diagram:
$A_1$ = {11111, 11100, 11000, 00000}
$A_2$ = {11111, 00101, 00100, 00000}
$A_3$ = {11111, 00110, 00100, 00000}
$A_4$ = {11111, 10001, 00000}

Each chain can be encoded independently. Consequently, it is possible to encode each level of the Hasse diagram as follows:

15
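Hasse diagram for the Smith code
[Figure: Hasse diagram of the code words with the chains $A_1$ to $A_4$; each level is labelled with its check symbol]
Level 00: 11111
Level 01: 11100, 00101, 00110, 10001
Level 10: 11000, 00100
Level 11: 00000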
16
Berger vs. Smith encoding
           INFORMATION BITS   Berger     Smith
x1 x2 x3   y1 y2 y3 y4 y5     H1 H2 H3   s1 s2
0  0  0    1  1  1  1  1      0  0  0    0  0
0  0  1    1  1  1  0  0      0  1  0    0  1
0  1  0    0  0  1  0  1      0  1  1    0  1
0  1  1    0  0  1  1  0      0  1  1    0  1
1  0  0    1  0  0  0  1      0  1  1    0  1
1  0  1    1  1  0  0  0      0  1  1    1  0
1  1  0    0  0  1  0  0      1  0  0    1  0
1  1  1    0  0  0  0  0      1  0  1    1  1
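
The comparison can be checked mechanically. The sketch below takes the eight information words with their Berger and Smith check bits as read from the table, and confirms that both encodings give an unordered code, with Smith needing 2 check bits instead of 3. The `covers` helper and variable names are illustrative assumptions.

```python
# (information bits, Berger check bits, Smith check bits), as read from the table
ROWS = [
    ("11111", "000", "00"),
    ("11100", "010", "01"),
    ("00101", "011", "01"),
    ("00110", "011", "01"),
    ("10001", "011", "01"),
    ("11000", "011", "10"),
    ("00100", "100", "10"),
    ("00000", "101", "11"),
]


def covers(x, y):
    """X covers Y if every 1 of Y is also a 1 of X."""
    return all(a == "1" or b == "0" for a, b in zip(x, y))


def is_unordered(code):
    return not any(covers(x, y) for x in code for y in code if x != y)


info_only = [y for y, _, _ in ROWS]
berger = [y + h for y, h, _ in ROWS]
smith = [y + s for y, _, s in ROWS]

print(is_unordered(info_only))   # the raw words contain ordered pairs
print(is_unordered(berger))
print(is_unordered(smith))
```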
17
Self-Checking Combinational Circuits Design

Self-checking can be defined as the ability to verify automatically whether there is any fault in the logic, without the need for externally applied testing.

Self-checking circuits allow on-line error detection; that is, faults can be detected during the normal operation of the circuit.

One of the ways to achieve the self-checking design is through the use of
error detecting codes.
18
Principles of Self-checking

Let a circuit have m primary input lines and n primary output lines. Then the $2^m$ binary vectors of length m form the input space X of the circuit.

The output space Z is similarly defined to be the set of $2^n$ binary vectors of length n.

During normal (fault-free) operation the circuit receives only a subset of X, called the input code space, and produces a subset of Z, called the output code space. Members of a code space are called code words.

A non-code word at the output indicates the presence of a fault in the circuit. However, a fault may also result in an incorrect code word at the output, rather than a non-code word, in which case the fault is undetectable.
19
Principles of Self-checking
A circuit may be designed to be self-checking only for an assumed set of
faults. Such a set usually includes single stuck-at faults and
unidirectional multiple faults.

A single stuck-at fault assumes that a physical defect in a logic circuit
results in one of the signal lines in the circuit being fixed to either a logic 0
(stuck-at-0) or logic 1 (stuck-at-1).

If more than one signal line in the circuit is stuck-at-1 or stuck-at-0 at the same time, the circuit is said to have a multiple stuck-at fault.

A variation of the multiple fault is the unidirectional fault. A multiple fault is unidirectional if all its constituent faults are either stuck-at-0 or stuck-at-1, but not both simultaneously.

Self-checking circuits must satisfy the following properties:
Self-testing
Fault-secure

20
Principles of Self-checking (cont)
[Figure: self-checking circuit. The inputs drive the circuit, the coded output of the circuit drives the checker, and the checker produces the error signal. The diagram appears twice, with the annotations: no need for redundancy; no need to be fault-secure; must be fault-secure.]
21
Self-checking property: fault-secure
I is the input code space;
S is the output code space;
$Y(X, \varnothing)$ is the output as a function of an input vector X in the fault-free case;
$Y(X, f)$ is the output as a function of an input vector X and a fault f in the circuit.

Definition 1

A circuit is fault-secure for an input set I and a fault set F if, for any input X in I and for any fault f in F, $Y(X, \varnothing) \in S$, and $Y(X, f) \in S$ implies $Y(X, f) = Y(X, \varnothing)$.

A circuit is fault-secure if, for every fault from a prescribed set, the circuit never produces an incorrect code space output for code space inputs.


22
Self-checking property: self-testing
Definition 2

A circuit is self-testing for an input set N and a fault set F if for every fault f in F there is an input X in N such that $Y(X, f) \notin S$.

A circuit is self-testing if, for every fault from a prescribed set, the circuit produces a non-code space output for at least one code input.

Definition 3

A totally self-checking circuit is a circuit that is self-testing for a normal input set N and a fault set F, and fault-secure for N and F.
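
Both definitions can be checked by brute force on a small circuit. The sketch below is an illustration under stated assumptions, not the method of the slides: the circuit is a hypothetical one-bit buffer with a two-rail output (x, not x), the fault set is a single stuck-at fault on either output line, and a second unencoded circuit serves as a counter-example.

```python
def two_rail_buffer(x, fault=None):
    """Hypothetical circuit: output (x, not x); fault = (output line, stuck value)."""
    out = [x, 1 - x]
    if fault is not None:
        line, value = fault
        out[line] = value
    return tuple(out)


def plain_buffer(x, fault=None):
    """Unencoded counter-example: a single output line."""
    return (fault[1],) if fault is not None else (x,)


def is_fault_secure(circuit, inputs, faults, code):
    """A faulty output is either the correct code word or a non-code word."""
    for x in inputs:
        good = circuit(x)
        for f in faults:
            bad = circuit(x, f)
            if bad in code and bad != good:
                return False
    return True


def is_self_testing(circuit, inputs, faults, code):
    """Every fault produces a non-code output for at least one normal input."""
    return all(any(circuit(x, f) not in code for x in inputs) for f in faults)


N = [0, 1]
S = {(0, 1), (1, 0)}
F = [(line, value) for line in (0, 1) for value in (0, 1)]

print(is_fault_secure(two_rail_buffer, N, F, S), is_self_testing(two_rail_buffer, N, F, S))
print(is_fault_secure(plain_buffer, N, [(0, 0), (0, 1)], {(0,), (1,)}))
```

The two-rail buffer satisfies both properties for this fault set, so it is totally self-checking in the sense of Definition 3. The plain buffer is not fault-secure, because a stuck-at fault turns one valid output into the other valid output.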


23
Example: PLA checking
Problem: develop a method for synthesizing a self-checking PLA with minimal overhead (redundant area).

Input space vs. output space (code words).

The totally self-checking (TSC) property of the circuit has to be proven.

The checker has to be totally self-checking (TSC).

Using XOR gates for checking is forbidden.

Fault coverage has to be high.
24
Example: PLA checking (cont)
Three kinds of faults can normally occur in PLAs: stuck-at faults, bridging faults, and cross-point faults.
An important assumption is the non-concurrent property: any normal input pattern selects exactly one product term in the PLA during fault-free operation.
All single faults in a PLA can cause only unidirectional errors in the outputs of the PLA.

x1 x2 x3   y1 y2 y3
1  -  0    1  1  0
1  0  1    1  0  0
0  0  -    0  1  1
0  1  -    1  0  0
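
The non-concurrent property amounts to the product terms (cubes) being pairwise disjoint. A small sketch, with hypothetical function names and cubes written as strings over '0', '1' and '-':

```python
from itertools import combinations


def disjoint(c1, c2):
    """Two cubes are disjoint if some input is fixed to 0 in one and 1 in the other."""
    return any(a != b and "-" not in (a, b) for a, b in zip(c1, c2))


def is_non_concurrent(cubes):
    """True if no input pattern can select two product terms at once."""
    return all(disjoint(a, b) for a, b in combinations(cubes, 2))


pla_cubes = ["1-0", "101", "00-", "01-"]      # input part of the PLA table above
print(is_non_concurrent(pla_cubes))
print(is_non_concurrent(["10-", "--1"]))      # input 101 selects both terms
```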
25
Example: PLA checking (concurrency property)
[Figure: PLA with inputs $x_1$ to $x_8$ and outputs $y_1$ to $y_{14}$. Annotations: non-disjoint cubes; missing device, stuck at 1; missing device, stuck at 0.]
26
Example: PLA checking (non-concurrency property)
[Figure: PLA with a memory block, inputs $x_1$ to $x_8$, outputs $y_1$ to $y_{14}$ and lines $d_3$, $d_2$, $d_1$. Annotation: disjoint.]
27
PLA checking - SOM checker solution

The Sum-of-Minterms (SOM) checker implements the following logical function:

$f_{err} = \bigvee_{t=1}^{Q} Y_t,$

where $Y_t$ is a certain code word and Q is the number of possible code words.
28
PLA checking

Why do we need the non-concurrent property for a SOM checker?

Example:

expression              x1 x2 x3   y1 y2
$x_1 \bar{x}_2$         1  0  -    1  0
$x_3$                   -  -  1    0  1
$x_1 \bar{x}_2 x_3$     1  0  1    1  1

An erroneous output due to a missing device in the $x_2$ column cannot be detected.

29
Example: PLA checking
(SOM checker)
PLA:
x1 x2 x3   y1 y2 y3 y4
1  0  *    1  1  0  0
0  0  *    0  0  1  1
*  1  1    0  1  0  1
0  1  0    0  1  1  1
1  1  0    1  0  0  0

SOM checker (output $f_{err}$):
y1 y2 y3 y4   Error
1  1  0  0    1
0  0  1  1    1
0  1  0  1    1
0  1  1  1    1
1  0  0  0    1

Non fault-secure: a missing (or an additional) device at the location marked in the figure cannot be detected. The SOM checker (by itself) cannot help in this case.
Non TSC: a fault in the checker cannot be detected.

30
SOM-based checker on PLA
Berger encoding

           INFORMATION BITS   CHECK BITS
x1 x2 x3   y1 y2 y3 y4        b1 b2
1  0  *    1  1  0  0         1  0
0  0  *    0  0  1  1         1  0
*  1  1    0  1  0  1         1  0
0  1  0    0  1  1  1         0  1
1  1  0    1  0  0  0         1  1

SOM checker with two-rail Error output:
y1 y2 y3 y4   b1 b2   Error
1  1  0  0    1  0    1  0
0  0  1  1    1  0    1  0
0  1  0  1    1  0    1  0
0  1  1  1    0  1    0  1
1  0  0  0    1  1    0  1
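
The effect of adding the check bits can be sketched as follows. The code treats the SOM checker simply as a membership test over the stored code words, which matches the Error column being 1 for every listed code word. The function names and the chosen fault (output y2 dropping from 1 to 0) are illustrative assumptions.

```python
CODE_WORDS = ["1100", "0011", "0101", "0111", "1000"]   # PLA outputs y1..y4


def som_checker(word, code_words):
    """Return 1 if the word matches one of the stored minterms (code words)."""
    return int(word in code_words)


def b0_check(y, r=2):
    """Check bits b1 b2: binary count of 0s in y (two bits suffice here)."""
    return format(y.count("0"), f"0{r}b")


ENCODED = [y + b0_check(y) for y in CODE_WORDS]
print(ENCODED)

good = "1100"
bad = "1000"    # unidirectional 1->0 error on y2; still a valid unencoded word

print(som_checker(bad, CODE_WORDS))                  # accepted without check bits
print(som_checker(bad + b0_check(good), ENCODED))    # rejected with check bits
```

Without check bits the erroneous output 1000 is itself a code word, so the checker accepts it. With the check bits appended, the word 1000 10 is no longer a code word and the error is flagged.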
31
Self-checking Checkers
Overview:

Self-dual Parity Checking
Two-rail checker
TSC checkers for m-out-of-n codes
TSC Berger checkers
TSC Smith checkers

32
Parity Checking (Sellers, Hsiao, Bearnson, 1968)
In conventional parity checking, the parity bit p computed from the output bits of the combinational circuit is compared with the parity bit generated independently by the parity prediction circuit.

Parity prediction function: $y_p = y_1 \oplus y_2 \oplus \dots \oplus y_n$

[Figure: the inputs $(x_1, \dots, x_n)$ feed the combinational circuit, with outputs $(y_1, \dots, y_n)$, and the parity prediction circuit; the checker contains a comparator for the two parity bits.]
33
Self-dual Parity Checking (Saposhnikov, Dmitriev, Goessel, 1996)
In general, a separate implementation of the parity prediction checker results in an average area overhead of 33%.
To reduce the overhead, self-dual parity checking was developed. In this checking approach the parity prediction function is replaced by a circuit that generates a self-dual complement of the combinational circuit function.

[Figure: the inputs $(x_1, \dots, x_n)$ feed the combinational circuit f(x), with outputs $(y_1, \dots, y_n)$, and the self-dual complement circuit $\delta(x)$; a comparator combines their outputs.]
34
Self-dual Parity Checking - single output
Let the self-dual complement function $\delta(x)$ of the function f(x), with respect to a self-dual function h(x), be:

$\delta(x) = f(x) \oplus h(x)$

or

$h(x) = f(x) \oplus \delta(x)$

Therefore,

$f(x) \oplus \delta(x) = 1 \oplus f(\bar{x}) \oplus \delta(\bar{x})$
35
Self-dual Parity Checking - example

[Equation: $f(x_0, x_1, x_2, x_3)$, a sum of three product terms in $x_0$, $x_2$, $x_3$]

Take an arbitrary self-dual function:

[Equation: $h(x_0, x_1, x_2, x_3)$, a sum of five product terms]

It can be rewritten:

$h(x_0, x_1, x_2, x_3) = f(x_0, x_1, x_2, x_3) \oplus \delta(x_0, x_1, x_2, x_3)$

Therefore the self-dual complement of f(x) is:

[Equation: $\delta(x_0, x_1, x_2, x_3)$, a sum of two product terms in $x_1$ and $x_3$]
36
Implementation of the function and its self-dual complement
[Figure: gate-level implementation of f, with inputs $x_0$, $x_2$, $x_3$, and of $\delta$, with inputs $x_1$, $x_3$]
37
Self-dual parity checking - multiple outputs
For a circuit with multiple outputs $y_1, y_2, y_3, \dots, y_n$, the parity of the output bits,

$f_p = y_1 \oplus y_2 \oplus \dots \oplus y_n$,

is compared with the self-dual complement of the parity function.

The self-dual complement $\delta_p$ of $f_p$ is chosen such that the function

$h(x_1, \dots, x_n) = f_p(x_1, \dots, x_n) \oplus \delta_p(x_1, \dots, x_n)$

is self-dual.

[Figure: the inputs $(x_1, \dots, x_n)$ feed the combinational circuit and the self-dual complement of $f_p$; the circuit outputs are XORed into $f_p$, which is XORed with $\delta_p$ to give h(x).]
38
Self-dual parity checking

During normal operation, complementary input patterns are applied to the composite circuit implementing h(x) at time units t and t+1. If there is no fault, the output responses of $h = f \oplus \delta$ at t and t+1 are complementary.

Self-dual parity checking in conjunction with a time redundancy scheme allows on-line error detection through the function h(x).

The drawback of the self-dual parity approach is 100% time redundancy in addition to the hardware overhead.
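
The alternating-input scheme can be illustrated with a toy example. Everything in this sketch is an assumption made for illustration: f is a hypothetical two-input AND, h is chosen as the three-input majority function (which is self-dual), and the fault is the output of f stuck at 0.

```python
from itertools import product


def f(x0, x1, x2):
    """Hypothetical circuit function (not self-dual)."""
    return x0 & x1


def h(x0, x1, x2):
    """Majority of three inputs, a self-dual function."""
    return (x0 & x1) | (x1 & x2) | (x0 & x2)


def delta(x0, x1, x2):
    """Self-dual complement of f with respect to h."""
    return f(x0, x1, x2) ^ h(x0, x1, x2)


def complement(x):
    return tuple(1 - b for b in x)


def responses_alternate(x, f_stuck_at_0=False):
    """Apply x at time t and its complement at t+1; True if outputs are complementary."""
    def out(v):
        fv = 0 if f_stuck_at_0 else f(*v)
        return fv ^ delta(*v)
    return out(x) != out(complement(x))


print(all(responses_alternate(x) for x in product((0, 1), repeat=3)))
print(responses_alternate((1, 1, 0), f_stuck_at_0=True))
```

In the fault-free case every complementary input pair produces complementary outputs. With the fault injected, the pair starting at (1, 1, 0) produces equal outputs, which signals the error.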
39
Self-checking Checker

A totally self-checking checker must have two outputs, and, hence, four
output combinations.

Two of these combinations (for example 01 and 10) are considered as
valid.

A non-valid combination indicates either a non-code word at the input of
the checker, or a fault in the checker itself.

A checker does not need to be fault-secure :

We are interested only in whether the input to the checker circuit is a code word or not.
It is not important whether 01 has changed to 10 or vice versa, because the output of the checker will be 00 or 11 in the presence of a fault (self-testing).

40
Two-rail checker
The two-rail checker has two groups of inputs $(x_1, \dots, x_n)$ and $(y_1, \dots, y_n)$ and two outputs f and g.
f and g are complementary (1-out-of-2) if and only if the pair $x_j, y_j$ is also complementary for all j.

[Figure: totally self-checking two-rail checker with input pairs $(x_0, y_0), (x_1, y_1), \dots, (x_{n-1}, y_{n-1})$ and a 1-out-of-2 output]
41
Truth Table of the Two-rail checker
The circuit has the normal input set
N={<0101>,<0110>,<1001>,<1010>}

The circuit is totally self-checking for all unidirectional multiple faults.

x1 y1 x0 y0   f g
0  0  0  0    0 0
0  0  0  1    0 0
0  0  1  0    0 0
0  0  1  1    0 0
0  1  0  0    0 0
0  1  0  1    0 1
0  1  1  0    1 0
0  1  1  1    1 1
1  0  0  0    0 0
1  0  0  1    0 1
1  0  1  0    1 0
1  0  1  1    1 1
1  1  0  0    0 0
1  1  0  1    1 1
1  1  1  0    1 1
1  1  1  1    1 1

[Figure: gate-level two-rail checker with inputs $x_0$, $y_0$, $x_1$, $y_1$ and outputs f, g. One internal line is marked with the question: can we detect a stuck-at-1 at this point?]
42
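TSC two-rail checker with six input pairs
[Figure: tree of five two-rail checker cells (marked *) that reduces the input pairs $(x_1, y_1)$ to $(x_6, y_6)$, through the intermediate pairs $(f_1, g_1)$ to $(f_4, g_4)$, to the output pair (f, g)]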
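
A tree of two-pair cells can be sketched as below. The cell equations used here, f = x0·y1 + y0·x1 and g = x0·x1 + y0·y1, are the commonly cited two-rail checker cell and are an assumption of this sketch; they are not taken from the figures, and their output polarity may differ from the truth table shown earlier. The function names are hypothetical.

```python
from itertools import product


def two_rail_cell(p, q):
    """Combine two input pairs into one output pair (commonly cited cell equations)."""
    (x0, y0), (x1, y1) = p, q
    f = (x0 & y1) | (y0 & x1)
    g = (x0 & x1) | (y0 & y1)
    return f, g


def two_rail_checker(pairs):
    """Reduce any number of input pairs to one output pair with a tree of cells."""
    level = list(pairs)
    while len(level) > 1:
        nxt = [two_rail_cell(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])          # odd pair passes to the next level
        level = nxt
    return level[0]


# Exhaustive check for six pairs: output is 1-out-of-2 iff every pair is complementary.
ok = True
for bits in product((0, 1), repeat=12):
    pairs = [(bits[2 * i], bits[2 * i + 1]) for i in range(6)]
    f, g = two_rail_checker(pairs)
    ok &= (f != g) == all(x != y for x, y in pairs)
print(ok)
```

For six pairs this reduction uses five cells, and a non-code pair anywhere in the tree propagates to a 00 or 11 output.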
43
Totally Self-checking checkers for m-out-of-n codes
The m-out-of-n checker consists of two independent subcircuits, each subcircuit having a single output.

$S^n_k = M^n_k \,\&\, \overline{M^n_{k+1}}$

The k-out-of-2k checker is fault-secure for a single fault because it has two subcircuits; a single fault can affect the output of only one of them.
If the checker is implemented with AND-OR logic, it is also TSC for unidirectional multiple faults.
44
Design of the k-out-of-2k checker (cont)
Each monotonic symmetric function $M^n_k$ of n variables can be represented as a composition of elementary monotonic symmetric functions of m and n-m variables:

$M^n_k = \bigvee_{j=0}^{k} M^m_j \,\&\, M^{n-m}_{k-j}$

Example:

$M^4_2 = M^2_0 \,\&\, M^2_2 + M^2_1 \,\&\, M^2_1 + M^2_2 \,\&\, M^2_0 = xy + (x + y)(z + t) + zt$
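
The decomposition can be verified exhaustively for small n. The sketch assumes, consistent with the example, that a monotonic symmetric function of index k equals 1 when at least k of its inputs are 1 (so index 0 is the constant 1). The function names are illustrative.

```python
from itertools import product


def M(k, bits):
    """Monotonic symmetric (threshold) function: 1 if at least k bits are 1."""
    return int(sum(bits) >= k)


def M_decomposed(k, bits, m):
    """OR over j = 0..k of M_j on the first m bits AND M_(k-j) on the rest."""
    a, b = bits[:m], bits[m:]
    return int(any(M(j, a) and M(k - j, b) for j in range(k + 1)))


same = all(
    M(k, bits) == M_decomposed(k, bits, 2)
    for k in range(5)
    for bits in product((0, 1), repeat=4)
)
print(same)

# The worked example: M_2 of (x, y, z, t) equals xy + (x + y)(z + t) + zt
example = all(
    M(2, (x, y, z, t)) == ((x & y) | ((x | y) & (z | t)) | (z & t))
    for x, y, z, t in product((0, 1), repeat=4)
)
print(example)
```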
45
Design of the k-out-of-2k checker (cont)

The 2k bits are partitioned into two disjoint subsets:

$A = (x_1, \dots, x_k)$ and $B = (x_{k+1}, \dots, x_{2k})$.

The outputs of the checker can be expressed as:

$Z_1 = \bigvee_{i} M_i(k_A) \,\&\, M_{k-i}(k_B), \quad i = 1, 3, 5, \dots$ (i an odd number)

$Z_2 = \bigvee_{i} M_i(k_A) \,\&\, M_{k-i}(k_B), \quad i = 0, 2, 4, 6, \dots$ (i an even number)

where $k_A$ and $k_B$ are the numbers of 1s occurring in subsets A and B, respectively, and i does not exceed k.
46
Example 1: design of TSC 2-out-of-4 checker
k=2; $A = (x_1, x_2)$; $B = (x_3, x_4)$.

$Z_1 = M_1(k_A) \,\&\, M_1(k_B) = (x_1 + x_2)(x_3 + x_4)$

$Z_2 = M_0(k_A) \,\&\, M_2(k_B) + M_2(k_A) \,\&\, M_0(k_B) = M_2(k_B) + M_2(k_A) = x_3 x_4 + x_1 x_2$

[Figure: AND-OR implementation of $Z_1$ and $Z_2$ from the inputs $x_1$, $x_2$, $x_3$, $x_4$]
47
Example: Design of the 2-out-of-4 checker (cont)
[Figure: the 16 input vectors $(x_1, x_2, x_3, x_4)$ grouped by the functions $M_0$, $M_1$, $M_2$ that they activate and by the resulting outputs $Z_1$ and $Z_2$]
48
Design of the k-out-of-2k checker (cont)
Why does it work?
Consider a code word with $w_H(A) = a$ and $w_H(B) = k - a$.

$Z_1 = \bigvee_{i\ \mathrm{odd}} M_i(k_A) \,\&\, M_{k-i}(k_B)$

$Z_2 = \bigvee_{i\ \mathrm{even}} M_i(k_A) \,\&\, M_{k-i}(k_B)$
49
Design of the k-out-of-2k checker (cont)
Why does it work?
In fault-free operation:

$Z_1 = \bigvee_{i\ \mathrm{odd}} M_i(k_A) \,\&\, M_{k-i}(k_B) = 0$ if a is even, $1$ if a is odd

$Z_2 = \bigvee_{i\ \mathrm{even}} M_i(k_A) \,\&\, M_{k-i}(k_B) = 1$ if a is even, $0$ if a is odd

In the presence of 0->1 errors, $Z_1 = Z_2 = 1$.
In the presence of 1->0 errors, $Z_1 = Z_2 = 0$.
50
Design of the k-out-of-2k checker (cont)
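Example with k=8
Consider a code word with $w_H(A) = 3$ and $w_H(B) = k - 3 = 5$.
In fault-free operation, a = 3 is odd, so $Z_1 = 1$ and $Z_2 = 0$.

[Figure: the functions $M_0$ to $M_8$ on subset A paired with $M_8$ to $M_0$ on subset B; pairs with odd index on A feed $Z_1$, pairs with even index feed $Z_2$]

In the presence of 0->1 errors, $Z_1 = Z_2 = 1$.
In the presence of 1->0 errors, $Z_1 = Z_2 = 0$.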
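
The behaviour described on the last slides can be checked by simulation. The sketch below assumes the threshold interpretation of the M functions (1 when at least i bits of the subset are 1) and uses hypothetical function names; it evaluates the two checker outputs for every input word with k = 2 and k = 3, then for the k = 8 example.

```python
from itertools import product


def M(i, bits):
    """Threshold function: 1 if at least i of the bits are 1."""
    return int(sum(bits) >= i)


def checker(word):
    """Outputs (Z1, Z2) of the k-out-of-2k checker for a 2k-bit word."""
    k = len(word) // 2
    a, b = word[:k], word[k:]
    z1 = int(any(M(i, a) and M(k - i, b) for i in range(1, k + 1, 2)))  # odd i
    z2 = int(any(M(i, a) and M(k - i, b) for i in range(0, k + 1, 2)))  # even i
    return z1, z2


def expected(word):
    """Behaviour stated on the slides, by total weight of the word."""
    k, w = len(word) // 2, sum(word)
    if w > k:
        return (1, 1)                      # 0->1 errors
    if w < k:
        return (0, 0)                      # 1->0 errors
    a = sum(word[:k])
    return (1, 0) if a % 2 else (0, 1)     # code word: depends on parity of a


all_ok = all(
    checker(w) == expected(w)
    for k in (2, 3)
    for w in product((0, 1), repeat=2 * k)
)
print(all_ok)

# k = 8 with three 1s in A and five 1s in B
word = (1, 1, 1, 0, 0, 0, 0, 0) + (1, 1, 1, 1, 1, 0, 0, 0)
print(checker(word))
```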