
1072

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 7, JULY 2011

BIST-Based Fault Diagnosis for Read-Only Memories


Nilanjan Mukherjee, Member, IEEE, Artur Pogiel, Member, IEEE, Janusz Rajski, Senior Member, IEEE, and Jerzy Tyszer, Senior Member, IEEE

Abstract: This paper presents a built-in self-test (BIST)-based scheme for fault diagnosis that can be used to identify permanent failures in embedded read-only memories. The proposed approach offers a simple test flow and does not require intensive interactions between a BIST controller and a tester. The scheme rests on partitioning of rows and columns of the memory array by employing low-cost test logic. It is designed to meet the requirements of at-speed test, thus enabling detection of timing defects. Experimental results confirm the high diagnostic accuracy of the proposed scheme and its time efficiency.

Index Terms: Built-in self-test (BIST), deterministic partitioning, discrete logarithms, embedded read-only memory, fault diagnosis.

I. Introduction

The International Technology Roadmap for Semiconductors [13] predicts that memories will occupy more than 90% of the chip silicon area in the foreseeable future. Due to their ultralarge scale of integration and vastly complex structures, memory arrays are far more vulnerable to defects than the remaining parts of integrated circuits. Embedded memories have already started introducing new yield loss mechanisms at a rate, magnitude, and complexity large enough to demand major changes in test procedures. Many types of failures, often not seen earlier, originate in the highest density areas of semiconductor devices, where diffusions, polysilicon, metallization, and fabricated structures are in extremely tight proximity to each other. Failing to properly test all architectural features of embedded memories can eventually deteriorate the quality of test and ultimately hinder yield.

Embedded memories are more challenging to test and diagnose than their stand-alone counterparts. This is because their complex structures are paired with a reduced bandwidth of test channels, resulting in limited accessibility and controllability. Consequently, memory built-in self-test (MBIST) has established itself as one of the mainstream design-for-test (DFT) methodologies. It allows one to generate, compress, and store on chip very regular test patterns and expected responses by using relatively simple test logic. Moreover, the available input/output channels suffice to control built-in self-test (BIST) operations, including at-speed testing and detection of timing defects.

Non-volatile memories are among the oldest programmable devices, but they continue to have many critical uses. ROM, PROM, EPROM, EEPROM, and flash memories have proved to be very useful in a variety of applications. Traditionally, they were primarily used for long-term data storage, such as look-up tables in multimedia processors or permanent code storage in microprocessors. Due to their high area density and new submicrometer technologies involving multiple metal layers, ROMs have also gained popularity as a storage solution for low-voltage/low-power designs. Moreover, methods such as selective pre-charging, minimization of non-zero items, row(s) inversion, sign-magnitude encoding, and difference encoding are being employed to reduce the capacitance and/or the switching activity of bit and word lines. Such design, technology, and process changes have increased the number of ROM instances typically seen in a design. New non-volatile memories, such as ferroelectric, magnetoresistive, and phase-change RAMs, retain data when powered off but are not restricted in the number of operation cycles [1], [12]. They may soon replace other forms of non-volatile memory, as their advantages (e.g., reduced standby power and improved density) are tremendous. It has therefore become imperative to deploy effective means for testing and diagnosing non-volatile memory failures.

Manuscript received September 7, 2010; revised December 15, 2010; accepted February 4, 2011. Date of current version June 17, 2011. A preliminary version of this paper appeared as "Fault diagnosis for embedded read-only memories" in the Proceedings of the IEEE International Test Conference, 2009, paper 7.1. This paper was recommended by Associate Editor D. M. Walker.
N. Mukherjee and J. Rajski are with Mentor Graphics Corporation, Wilsonville, OR 97070 USA (e-mail: nilanjan_mukherjee@mentor.com; janusz_rajski@mentor.com).
A. Pogiel is with Mentor Graphics Polska, Poznań 61-131, Poland (e-mail: artur_pogiel@mentor.com).
J. Tyszer is with the Faculty of Electronics and Telecommunications, Poznań University of Technology, Poznań 60-965, Poland (e-mail: tyszer@et.put.poznan.pl).
Digital Object Identifier 10.1109/TCAD.2011.2127030
0278-0070/$26.00 © 2011 IEEE

The functional model used for these memories remains similar to that of RAMs. Relevant fault types, such as stuck-at faults and bridges, are tackled through functional test algorithms [25]. Also, all addressing malfunctions are covered by memory cell stuck-at fault tests, as there are no writes in the mission mode. Typically, the basic test reads successive memory cells and processes output responses by performing a polynomial division to compute a cyclic redundancy code (signature). The same procedure can be used to detect certain classes of dynamic faults, provided memory cells are designed with additional DFT features [14]. No longer, however, is it sufficient to determine whether a memory failed or not [3], [27]. In ROM defect analysis and fine-tuning of a fabrication process, the ability to diagnose the cause of failure is of paramount importance. In particular, new defect types need to be accurately identified and well understood. It is also commonly desired to verify whether the programming device that writes the ROM is working correctly. The method and accuracy of the diagnostic technique are therefore critical factors in identifying failing sites of a memory array. Diagnosis can be performed either on chip or off-line after downloading compressed test results.

Until recently, the main strategy for ROM diagnosis was to have users provide an initialization file that describes the content of the ROM. As far as the test is concerned, the initialization sequence can be random. During the MBIST session, the content of the ROM is read multiple times using different addressing schemes and compressed into a signature. The signature is downloaded at the very end of the algorithm. Although the amount of data transferred to the memory is minimal, the diagnostic procedures employed for ROMs are cumbersome. Current techniques rely either on downloading the signature at certain intervals (based on binary search) so that one can corner the test step at which the multiple-input signature register (MISR) gets corrupted, or on downloading the content of the entire ROM when a failure occurs. Such techniques can get to the failing address and data, but they are complex, time consuming, and often prohibitive in practice. In particular, downloading the entire ROM requires additional hardware. Because the ROM has to be stopped after every read operation, the time needed to diagnose ROM failures also increases significantly.

Susceptibility to different forms of failure has given rise to various memory diagnostic algorithms. Typically, they target RAMs by modifying the MBIST controller [5], [20] to carry out extra tests aimed at localizing all single-cell faults.
A syndrome compression scheme, which requires a content addressable memory to perform data accumulation, is presented in [15]. Various techniques [2], [7], [9] are, in essence, off-line reasoning procedures geared toward accurate reconstruction of error bitmaps. The recent method of [17] achieves similar goals by employing flexible test logic that records test responses at system speed without interrupting the BIST session. Several diagnostic schemes have been patented. Solutions presented in [6] and [26] use dedicated circuits to compress diagnostic data at high speed and download them to a slow memory tester. Similarly, the scheme of [7]–[9] compresses memory test responses using combinational logic and scans them out of the chip using a reduced bandwidth. Reference [28] proposed a fault syndrome compression scheme that identifies failing patterns by means of coordinate compression. A technique similar to that of [11] is deployed in [29], where repetition of the same test is required. A dynamic switching between BIST and built-in self-diagnosis modes is introduced in [24]. It allows some failing patterns to be recognized and encoded into bit-strings.

In this paper, we propose a low-cost test and diagnostic scheme. It collects test responses without interruption and accurately identifies failing rows, columns, and cells in read-only memories [18], [23]. The method uses the concept of partitioning, originally introduced in [21] for scan-based fault diagnosis in a BIST environment and further refined in [4]. The proposed scheme partitions rows and columns of a ROM array deterministically and records signatures corresponding to the array segments currently being read (observed). Each signature narrows down the possible error locations until the failing rows and columns are determined. Such an approach neither requires interactions between BIST and automatic test equipment (ATE) nor interrupts the test flow.

This paper is organized as follows. In Section II, the overall architecture of the diagnostic environment is presented. Section III details the foundations of row and column partitioning, while Section IV introduces its hardware implementation. Section V demonstrates how to locate single erroneous cells within failing rows and columns. In Section VI, we report experiments performed using the proposed approach. Section VII discusses the area cost of the scheme. Finally, Section VIII concludes this paper.

Compared to the earlier version of this paper [18], we have added:
1) a comparison of the diagnostic time offered by the proposed method and by other techniques based on complete ROM dumping or a binary search across the address space;
2) results of logic synthesis (in terms of area overhead) for the proposed diagnostic logic;
3) detailed discussions of new diagnostic algorithms and the adopted experimentation procedures.
Moreover, the presentation of this paper has been significantly revised, with several additional comments, new figures, and illustrative examples.

TABLE I
Basic Parameters of a ROM Array
R    The number of rows
B    The word size (the number of bits)
M    The number of words in a row (mux factor)
C    The number of columns ($C = B \cdot M$)

II. Test Logic Architecture

A. Memory Array Organization

Fig. 1 shows the salient architectural features of a ROM. Every row consists of M words, each B bits long. Bits belonging to one word can be either placed one after another or interleaved to form segments, as illustrated in the figure. Decoders guarantee proper access to memory cells in either a fast-row or a fast-column addressing mode, i.e., with row numbers changing faster than word numbers or vice versa. Table I gives the main memory parameters used in the remainder of this paper. The proposed algorithms impose no constraints on the addressing scheme, so the memory array can be read in either increasing or decreasing address order.

B. Collection of Diagnostic Data

Fig. 1 also summarizes the architecture of the test environment used to collect diagnostic data from ROM arrays. In addition to a BIST controller, it consists of two modules and gating logic that allow selective observation of rows and columns, respectively. Assuming permanent failures, the BIST controller sweeps through all ROM addresses repeatedly. Meanwhile, the row and column selectors decide which data arriving from the memory rows and/or columns is actually observed by the signature register. Depending on the test scenario, test responses are collected in one of the following test modes.
1) Row disable = 0 and column disable = 1: the row selector may enable all bits of the currently received word, thereby selecting a given row. This mode is used to diagnose row failures and, in some cases, single-cell faults.
2) Row disable = 1 and column disable = 0: asserting the row disable signal effectively gates the row selector off. The column selector takes over and picks a subset of bit lines to be observed. This corresponds to selecting the desired columns and is recommended for diagnosing column and single-cell failures.
3) Row disable = 0 and column disable = 0: de-asserting both control lines allows observation of the memory cells located where the selected rows and columns intersect. This mode is discussed in Section IV-D.

Fig. 1. Memory array architecture and diagnostic environment.

Fault diagnosis has a simple flow. It proceeds iteratively. Each step determines a signature corresponding to the selected rows or columns and transfers this test response to the ATE through an optional shadow register. If the obtained signature matches the reference (golden) signature, the selected rows and/or columns are declared fault-free. The time required to filter out failing sites accurately depends on how observable rows and columns are selected. Our scheme employs an enhanced version of the deterministic partitioning originally proposed for scan-based diagnosis [4]. It assures the fastest possible identification of fault sources, down to the array nodes that cannot be recognized as fault-free. Details of the partitioning procedure are presented in Sections III and IV.

C. Signature Register

A signature register collects all test responses arriving from the selected memory cells. The register is reset at the beginning of every run (test step) over the address space. Similarly, its content is downloaded once per run. The signature register is implemented as a multiple-input ring generator (MIRG) [16] driven by the outputs of the gating logic. The design of Fig. 2 features an injector network that handles the increasing number of input channels. Each input is connected to uniquely selected stages of the compactor, which makes it possible to recognize errors arriving from different input channels. This technique visibly improves diagnostic resolution, as demonstrated in the following sections.

Fig. 2. MIRG-based signature register.

III. Deterministic Partitioning

In principle, the rows and columns observed during a single diagnostic test run are selected according to the deterministic scheme sketched in [4]. For completeness, this scheme is briefly summarized in Section IV. The set of memory rows or columns is decomposed several times into groups of $2^n$ disjoint partitions of approximately the same size. To reduce test time, the number of partitions within each group, and hence the value of n, should be small. On the other hand, successive groups of partitions must be formed so that each partition of a given group shares at most one item with every partition of the remaining groups. The items of any one partition must therefore fall into distinct partitions of every other group. Consequently, the partition size must not exceed the number $2^n$ of partitions. Hence, if the total number v of memory rows or columns is an even power of 2, then $n = 0.5 \log_2 v$; otherwise, $n = \lceil 0.5 \log_2 v \rceil$. As a result, the size of partitions may vary from $2^{n-1}$ to $2^n$. This rule guarantees the most time-efficient tracking down of faulty rows or columns. Indeed, if the array has x failing elements, then running tests for x + 1 groups suffices to determine the faulty items [4].

Example: Consider a 16-row memory array. Two groups, each comprising four unique partitions, are shown on the left-hand side of Fig. 3(a). Suppose row 7 is faulty. Four signatures are produced according to the first group (0). The signatures representing partitions 0, 1, and 2 are error-free, so the rows belonging to these partitions can be cleared [the right-hand side of Fig. 3(a)]. The signature obtained by processing data from rows 3, 7, 11, and 15 (partition 3) is erroneous, so these rows now become suspects [as marked in Fig. 3(a)]. The suspect rows belong to different partitions in the subsequent group (1). After running four tests for this group, it becomes evident that only the signature representing partition 2 is erroneous.
Since rows 2, 8, and 13 were identified earlier as fault-free, we can easily isolate row 7 as the failing one.

Example: Assume now that x = 3 rows (5, 10, and 11) of the same memory are faulty. As can be seen in Fig. 3(b), after collecting four signatures for group 0, only rows 0, 4, 8, and 12 can be declared fault-free. Running tests for group 1 yields erroneous signatures for partitions 0 and 1. Hence, the number of suspects drops to six candidate rows: 1, 5, 10, 11, 14, and 15. The next round of tests (group 2) produces three erroneous signatures. Owing to the new contents of partitions 0, 1, and 3, however, the suspects can be confined to rows 1, 5, 10, 11, and 14. Finally, tests for group 3 produce two erroneous signatures, for partitions 2 and 3, whose only remaining suspects are rows 5, 10, and 11. As a result, these rows are identified as the faulty ones. Clearly, reading x + 1 = 4 groups of row partitions suffices to uniquely determine the failing rows.

As with many other schemes, ROM diagnosis can be performed in either of two ways. In a non-adaptive mode, tests are selected prior to the actual diagnostic experiment. In an adaptive mode, the selection of tests is based on the outcomes of previous runs. The non-adaptive process targets a prespecified number x of failing items and requires no interaction with a tester, as only signatures for x + 1 partition groups have to be collected. In the adaptive approach, once the number of suspect rows or columns stops decreasing, the failing items are assumed to be determined and the test stops.
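The elimination procedure used in both examples can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the authors' tool:
- The memory has $v = 4^n$ rows, so the number of partitions and the partition size are both $2^n$.
- Partitions are generated with formula (1) of Section IV.
- The diffractor is a Galois-type LFSR with the primitive polynomial $x^2 + x + 1$. This assumed choice reproduces the state cycle 1 → 2 → 3 → 1 used in Section IV.
- Signature comparison is idealized: a partition is treated as erroneous if and only if it contains a faulty row, so aliasing and fault-masking data values are ignored.

All function names are hypothetical.

```python
def lfsr_step(state: int, poly: int, n: int) -> int:
    """One step of a Galois LFSR diffractor: multiply the state by alpha in GF(2^n)."""
    state <<= 1
    if (state >> n) & 1:
        state ^= poly  # reduce modulo the characteristic polynomial
    return state


def diffractor(g: int, k: int, poly: int, n: int) -> int:
    """g (x) k from (1): 0 for k == 0, else the state reached after k - 1 steps from g."""
    if k == 0:
        return 0  # all-0 state supplied by the AND gates
    state = g
    for _ in range(k - 1):
        state = lfsr_step(state, poly, n)
    return state


def partition(p: int, g: int, n: int, poly: int) -> list[int]:
    """Rows r = S*k + (p XOR (g (x) k)) of partition p in group g, assuming S = P = 2^n."""
    size = 1 << n
    return [size * k + (p ^ diffractor(g, k, poly, n)) for k in range(size)]


def diagnose(faulty: set[int], groups: list[int], n: int, poly: int) -> set[int]:
    """Non-adaptive elimination: clear every row of every error-free partition."""
    suspects = set(range(1 << (2 * n)))  # v = 4^n rows
    for g in groups:
        for p in range(1 << n):
            members = set(partition(p, g, n, poly))
            if not members & faulty:  # idealized: error-free iff no faulty row is observed
                suspects -= members
    return suspects


POLY = 0b111  # x^2 + x + 1 for n = 2: the diffractor cycles 1 -> 2 -> 3 -> 1

print(sorted(diagnose({7}, [0, 1], 2, POLY)))                # [7]
print(sorted(diagnose({5, 10, 11}, [0, 1, 2, 3], 2, POLY)))  # [5, 10, 11]
```

Calling `diagnose` with the prefixes [0], [0, 1], [0, 1, 2], and so on reproduces the intermediate suspect lists of the second example. Stopping as soon as the suspect set no longer shrinks mimics the adaptive mode.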

IV. Row and Column Selection

In this section, we introduce several hardware solutions for row and column selection. We first present separate row and column selectors that implement deterministic partitioning of a ROM array. We then introduce a scheme that partitions rows and columns simultaneously.

A. Row Selection

We start with the general structure of the row selector shown in Fig. 4. Essentially, it comprises four registers. The up counters partition and group, each of size $n = \lceil 0.5 \log_2 R \rceil$, keep the indexes of the current partition and the current group, respectively. They act as an extension of the row address register that belongs to the BIST controller (the leftmost part of the counter in Fig. 4). A linear feedback shift register (LFSR) with a primitive characteristic polynomial implements a diffractor. The diffractor provides successive powers of a generating element of $GF(2^n)$, which are used to selectively invert data arriving from the partition register. The diffractor is initialized when its input load is activated. Similarly, a down counter called offset is initialized by asserting its input load. In principle, the circuit of Fig. 4 implements the following formula, which determines the members r of partition p within group g:

$$r = S \cdot k + \bigl(p \oplus (g \otimes k)\bigr), \qquad k = 0, 1, \ldots, P - 1 \qquad (1)$$

In (1), S is the size of a partition and P is the number of partitions. The operator $\oplus$ denotes bit-wise addition modulo 2. The term $g \otimes k$ is the state that the diffractor reaches after $k - 1$ steps, assuming that its initial state was g; if $k = 0$, then $g \otimes k = 0$. For example, (1) yields the successive partitions of Fig. 3 for S = 4 and k = 0, 1, 2, 3, assuming that the diffractor cycles through the states 1 → 2 → 3 → 1. Let g = 3 and p = 2. Then we have

$$\begin{aligned}
k = 0:&\quad r = 4 \cdot 0 + \bigl(2 \oplus (3 \otimes 0)\bigr) = 0 + (2 \oplus 0) = 2\\
k = 1:&\quad r = 4 \cdot 1 + \bigl(2 \oplus (3 \otimes 1)\bigr) = 4 + (2 \oplus 3) = 5\\
k = 2:&\quad r = 4 \cdot 2 + \bigl(2 \oplus (3 \otimes 2)\bigr) = 8 + (2 \oplus 1) = 11\\
k = 3:&\quad r = 4 \cdot 3 + \bigl(2 \oplus (3 \otimes 3)\bigr) = 12 + (2 \oplus 2) = 12.
\end{aligned}$$

Fig. 3. Partition groups for 16-word memory. (a) Single faulty row. (b) Three faulty rows.

Fig. 4. Row selector.

With the ascending row address order, rows within a partition, a group, and finally the whole test are selected as follows. The offset counter is reloaded every time the n least significant bits of the row address register become zero, as detected by the NOR gate N1. Once loaded, the counter is decremented and reaches the all-0 state after $p \oplus (g \otimes k)$ cycles. This is detected by the NOR gate N2 associated with the counter. Hence, its asserted output enables observation of a single row within every S successive cycles. As indicated by (1), the initial values of the offset counter are obtained by adding (modulo 2) the actual partition number to the current state of the diffractor. The diffractor is initialized with the group number at the beginning of every test run, i.e., when the row address is reset. Subsequently, the diffractor changes its state every time the offset register is reloaded. The period of the LFSR-based diffractor is $2^n - 1$, whereas the offset counter is reloaded $2^n$ times. The missing all-0 state is therefore generated at the beginning of each test run by the AND gates placed at the outputs of the diffractor.

Example: Fig. 5 illustrates the operation of the row selector for the memory of Fig. 3, i.e., for 16 rows forming four partitions, so n = 2. Two state diagrams in the figure show the contents of the diffractor and the offset register when handling partition 2 ($10_2$) of group 3 ($11_2$). The last diagram depicts the output of gate N2. The diffractor is loaded with the group number 3 ($11_2$) at the beginning of a test run (row address = 0). It then changes its state every four cycles, following its cyclic trajectory 1 → 2 → 3 → 1 (here visiting 3, 1, and 2). At the same cycles, i.e., 0, 4, 8, and 12, the offset counter is reloaded with the sum of the partition number 2 ($10_2$) and the previous state of the diffractor. The first load is an exception: only the partition number goes to the offset counter, as the outputs of the AND gates are set to 0. After initialization, the offset register counts down and reaches zero at cycles 2, 5, 11, and 12. Each time, this activates line observe row, so that data are observed from the memory rows with addresses 2, 5, 11, and 12, respectively.

B. Column Selection

Fig. 6 shows the column selector, which decides in a deterministic fashion which columns should be observed. Its architecture resembles that of the row selector, as both circuits adopt the same selection principles. The main differences are the use of a BIST column address register and a different diffractor clocking scheme. Moreover, the offset counter is replaced with a combinational column decoder, which selects one out of B outputs of the word decoder (see Fig. 1). The diffractor advances every time the column address increments. Its content added to the partition number yields the required column address, in a manner similar to row selection. If the size B of the memory word equals M (the number of words per row), selecting one out of B columns at a time covers all columns of the memory array for one partition group. Typically, however, B > M. In that case, more than one column of each word must be selected at a time for every partition within a single test run. The number t of columns observed simultaneously is obtained by dividing the maximal number of columns in a partition, $2^n$, by the number M of memory words per row:

$$t = 2^n / M. \qquad (2)$$
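Returning briefly to the row selector of Section IV-A, the Fig. 5 example can be replayed cycle by cycle. The Python sketch below is a behavioral model, not RTL, and rests on these assumptions:
- The diffractor is a Galois-type LFSR with polynomial $x^2 + x + 1$, so it cycles 1 → 2 → 3 → 1.
- The memory has $R = 4^n$ rows.
- The NOR gates N1 and N2 are modeled as zero tests.
- The offset register is an n-bit down counter.
- Each reload after the first uses the diffractor's previous state, and the diffractor then advances. This timing was chosen to match Fig. 5.

Function names are hypothetical.

```python
def lfsr_step(state: int, poly: int, n: int) -> int:
    """Galois LFSR step: multiply the state by alpha in GF(2^n)."""
    state <<= 1
    if (state >> n) & 1:
        state ^= poly
    return state


def observed_rows(partition: int, group: int, n: int, poly: int) -> list[int]:
    """Behavioral row selector; returns the addresses at which 'observe row' is high."""
    mask = (1 << n) - 1
    diffractor = group  # loaded with the group number when the row address is reset
    offset = 0
    observed = []
    for address in range(1 << (2 * n)):  # assumes R = 4^n rows, e.g., 16 rows for n = 2
        if (address & mask) == 0:  # N1: the n least significant address bits are all 0
            if address == 0:
                offset = partition  # AND gates block the diffractor on the first load
            else:
                offset = partition ^ diffractor  # XOR with the previous diffractor state
                diffractor = lfsr_step(diffractor, poly, n)
        else:
            offset = (offset - 1) & mask  # n-bit down counter
        if offset == 0:  # N2: the offset counter reached the all-0 state
            observed.append(address)
    return observed


print(observed_rows(2, 3, 2, 0b111))  # [2, 5, 11, 12]
```

For every partition and group, the observed addresses coincide with the members given by (1). The reason is that block k of $2^n$ addresses is loaded with $p \oplus (g \otimes k)$.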

The columns observed in parallel cannot be handled by a single t-out-of-B selector. In such a case, certain columns would always be observed together, which would preclude effective partitioning. Consequently, the output column decoder is divided into t smaller decoders, each selecting one out of B/t bit lines. These decoders are fed by the diffractor, either directly or through phase shifters (PS), as shown in Fig. 7. The phase shifters transform a given input combination so that the resultant output values are spread at regular intervals over the diffractor state trajectory. Fig. 8 demonstrates this scenario for a 3-bit diffractor that uses the primitive polynomial $x^3 + x + 1$ and drives three phase shifters. Let the diffractor be initialized to 1. The phase shifters PS1, PS2, and PS3 then output states of the original trajectory, starting with the values 4, 6, and 5, respectively. When various partition groups are examined, the diffractor traverses the corresponding parts of its state space. Meanwhile, the phase shifters produce values that ensure generation of all $2^n - 1$ nonzero combinations. The missing all-0 state is again obtained by means of AND gates. Synthesis of phase shifters is thoroughly discussed in [22].

Fig. 5. Row selector operation.

Fig. 6. Column selector.

Fig. 7. Enhanced column selector.

Fig. 8. Use of three phase shifters.
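A phase shifter can be built from XOR gates alone. To see why, model the diffractor as a Galois LFSR that multiplies its state s by a primitive element α of $GF(2^3)$. A phase shifter that delivers the trajectory shifted by j steps then computes $\alpha^j s$. This is a linear map over GF(2), and hence an XOR network. The Python sketch below assumes this Galois implementation of $x^3 + x + 1$; the paper does not specify the LFSR type. Under that assumption, the diffractor trajectory from 1 is 1, 2, 4, 3, 6, 7, 5. The starting values 4, 6, and 5 of PS1, PS2, and PS3 then correspond to shifts of 2, 4, and 6 steps. Function names are hypothetical, and real phase-shifter synthesis follows [22].

```python
N, POLY = 3, 0b1011  # x^3 + x + 1, assumed Galois-type diffractor


def lfsr_step(state: int) -> int:
    """Multiply the state by alpha in GF(2^3)."""
    state <<= 1
    if state & (1 << N):
        state ^= POLY
    return state


def trajectory(seed: int, length: int) -> list[int]:
    """States visited by the diffractor starting from seed."""
    states = [seed]
    for _ in range(length - 1):
        states.append(lfsr_step(states[-1]))
    return states


def phase_shifter_matrix(shift: int) -> list[int]:
    """Column j is the image of basis vector 2^j after `shift` LFSR steps."""
    columns = []
    for j in range(N):
        v = 1 << j
        for _ in range(shift):
            v = lfsr_step(v)
        columns.append(v)
    return columns


def apply_phase_shifter(columns: list[int], state: int) -> int:
    """XOR network: combine the matrix columns selected by the 1-bits of the state."""
    out = 0
    for j, col in enumerate(columns):
        if (state >> j) & 1:
            out ^= col
    return out


diffractor = trajectory(1, 7)
print(diffractor)  # [1, 2, 4, 3, 6, 7, 5]
for name, shift in (("PS1", 2), ("PS2", 4), ("PS3", 6)):
    ps = phase_shifter_matrix(shift)
    print(name, [apply_phase_shifter(ps, s) for s in diffractor])
```

Each phase shifter output visits all seven nonzero states, just offset along the same trajectory. Combinations that share a trajectory therefore never collapse onto the same value at the same time.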

Example: Let a memory row consist of M = 2 interleaved 8-bit words arranged as shown in Fig. 9. From (2), t = 4/2 = 2. We therefore need two 1-out-of-4 column decoders and one phase shifter, connected to the decoder that selects bits b4 to b7. Table II illustrates how columns are selected for partition groups 1 ($01_2$) and 2 ($10_2$). The first two rows of the table contain the values generated by the diffractor (initialized with the group number 1) and by the phase shifter for partition 0. Despite the diffractor's initial value, address 0 is observed first at the input of column decoder 0, because a logic 0 drives the AND gates. The next state provided to this decoder is 2, the second state produced by the diffractor. These two addresses at column decoder 0 select column 0 of word 0 and column 5, which is, in fact, column 2 of word 1. Column decoder 1 receives states 3 and 1 produced by the phase shifter (see the corresponding diffractor trajectories). They select columns 14 and 11, respectively. For the remaining partitions of group 1, the same states occur at the outputs of the AND gates and the phase shifter. They are further modified by adding successive partition numbers, which selects the remaining columns. Column selection for the next partition groups proceeds in the same way; only the diffractor is initialized differently. The diffractor trajectory and the selection of columns for partition group 2 are presented in Fig. 10.

Fig. 9. Example of column selector.

TABLE II
Column Partitioning
Group  Partition  Word address  Column decoder 0  Column decoder 1  Observed columns
01     00         0             00                11                0, 14
01     00         1             10                01                5, 11
01     01         0             01                10                2, 12
01     01         1             11                00                7, 9
01     10         0             10                01                4, 10
01     10         1             00                11                1, 15
01     11         0             11                00                6, 8
01     11         1             01                10                3, 13
10     00         0             00                01                0, 10
10     00         1             11                10                7, 13
10     01         0             01                00                2, 8
10     01         1             10                11                5, 15
10     10         0             10                11                4, 14
10     10         1             01                00                3, 9
10     11         0             11                10                6, 12
10     11         1             00                01                1, 11

Fig. 10. Phase shifters for column partitioning.

C. Combined Row and Column Selection

To reduce the area overhead, some components of the row selector and the column selector can be shared. Fig. 11 shows a circuit implementing this concept, in which the partition and group registers feed both selectors. Since the word address increments before the row address, the memory array is read in the fast-column addressing mode. No interaction is needed between the control signals arriving from the word and row address registers. Hence, after the row and column address registers are exchanged, the scheme can also read the memory array in the fast-row mode. Furthermore, no component of the combined selector requires a clock faster than the one that increments the word or row address register. As a result, the proposed scheme can read the memory at speed and thus detect timing defects. Finally, the combined selector can collect the row and column signatures in parallel, which halves the diagnostic time. In this mode, however, two signature registers are required.

Fig. 11. Combined row and column selector.

D. Trellis Selection

Given x + 1 groups of signatures, the selection schemes presented earlier can correctly identify up to either x failing rows or x failing columns, but not both. The actual failure may, however, comprise faults in rows and columns at once. Fig. 12(a) illustrates a failure that consists of a single stuck-at column and a single stuck-at row. The black dots indicate failing cells, assuming a random fill. Some cells of the faulty row and column store the same logic values as those forced by the fault. With separate selection of rows and columns, such a fault would affect most signatures. Cells of the failing column make almost all row signatures erroneous, and cells of the failing row make almost all column signatures erroneous. Collecting signatures in the so-called trellis mode solves this problem by partitioning rows and columns simultaneously. Selecting rows and columns in parallel substantially reduces the number of observed cells. This increases the chance of recording fault-free signatures and of successfully sieving out failing rows and columns. Fig. 12(b) and (c) are examples of trellis compaction in the presence of a single-row-single-column failure. Only memory cells at the intersections of the selected rows and columns are observed. The resultant signatures are therefore likely to be error-free, as in Fig. 12(b), and the selected rows and columns can then be declared fault-free. When the selected cells come across the failing row or column, at least one error is expected to be captured, as in Fig. 12(c).

Fig. 12. Trellis selection. (a) Single stuck-at column and single stuck-at row failure. (b) Error-free response. (c) Erroneous response.

There is an intrinsic row-to-column correlation in the trellis selection mode. Suppose both diffractors of the combined selector use the same characteristic polynomial and are initialized with the same group number. Many row-column pairs then always end up in the same partitions. The diagnostic algorithm cannot distinguish such fault-free rows and columns from defective ones, since the selection scheme permanently pairs them. The upper part of Table III illustrates the possible impact of this phenomenon on diagnostic quality. The results were obtained for a memory array with 1024 rows and 1024 columns. The row and column selectors employ identical diffractors with the primitive polynomial $x^5 + x^2 + 1$. Each table entry gives the number of row-column pairs (out of $1024^2$) that fall into the same partition in exactly k of the examined groups, for arbitrarily chosen 3, 4, 5, and 32 partition groups. In the upper part, 1024 row-column pairs always end up in the same partition, regardless of the number of partition groups. A thorough analysis revealed that this selection mechanism permanently couples every row with a certain column. However, a simple n-bit arithmetic incrementer (the +1 module in Fig. 11) placed between the group register and one of the diffractors alters this row-column relationship. The resultant correlation is significantly lower. This is confirmed by the lower part of Table III, where the column diffractor is initialized with the group number increased arithmetically by 1. The enhanced selection technique clearly reduces the number of row-column pairs that always end up in the same partitions. Interestingly, the number of such pairs equals the number of partitions in a group (32). This is due to the zero states that the AND gates always contribute at the beginning of each partition.

TABLE III
Correlation in the Trellis Mode
(number of row-column pairs sharing a partition in exactly k of the examined groups; "-" where k exceeds the number of groups)

Column diffractor initialized with the group number
k     3 groups   4 groups   5 groups   32 groups
0     952 320    920 576    888 832    31 744
1     95 232     126 976    158 720    1 015 808
2     0          0          0          0
3     1024       0          0          0
4     -          1024       0          0
5     -          -          1024       0
8     -          -          -          0
16    -          -          -          0
32    -          -          -          1024

Column diffractor initialized with the group number + 1
k     3 groups   4 groups   5 groups   32 groups
0     953 312    922 560    892 800    297 600
1     92 256     122 016    149 792    527 744
2     2976       2976       3968       180 544
3     32         992        1984       31 744
4     -          32         0          4960
5     -          -          32         3968
8     -          -          -          992
16    -          -          -          992
32    -          -          -          32
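The permanent row-column coupling, and the effect of the +1 incrementer, can be reproduced qualitatively in a much smaller model. This Python sketch is a simplification, not the experiment behind Table III. It rests on these assumptions:
- Columns are partitioned with the same formula (1) as rows. The real column selector of Fig. 7 also uses phase shifters.
- The array has 64 rows and 64 columns, with n = 3.
- The diffractor is a Galois-type LFSR with the primitive polynomial $x^3 + x + 1$.

The sketch counts row-column pairs that share a partition in all $2^n$ groups. With identical diffractors, every row stays coupled to one column (64 pairs). With the incrementer, only the $2^n = 8$ pairs from the first address block remain. Those pairs always see the all-0 AND-gate state, which mirrors the 1024 versus 32 pattern in Table III. Function names are hypothetical.

```python
from collections import Counter


def lfsr_step(state: int, poly: int, n: int) -> int:
    """Galois LFSR step: multiply the state by alpha in GF(2^n)."""
    state <<= 1
    if (state >> n) & 1:
        state ^= poly
    return state


def partition_index(item: int, group_init: int, poly: int, n: int) -> int:
    """Invert (1): item = 2^n * k + (p XOR (g (x) k)), hence p = low bits XOR (g (x) k)."""
    k, low = divmod(item, 1 << n)
    if k == 0:
        return low  # the AND gates force the all-0 diffractor contribution
    state = group_init
    for _ in range(k - 1):
        state = lfsr_step(state, poly, n)
    return low ^ state


def always_paired(n: int, poly: int, increment: int) -> int:
    """Count row-column pairs whose partitions coincide in every one of the 2^n groups."""
    size = 1 << n
    items = range(size * size)  # 4^n rows and 4^n columns in this simplified model
    groups = range(size)
    rows = Counter(tuple(partition_index(r, g, poly, n) for g in groups) for r in items)
    # The column diffractor is loaded through an n-bit incrementer (wraps modulo 2^n).
    cols = Counter(tuple(partition_index(c, (g + increment) % size, poly, n) for g in groups)
                   for c in items)
    # Pairs with identical per-group partition vectors are permanently coupled.
    return sum(count * cols[sig] for sig, count in rows.items())


print(always_paired(3, 0b1011, 0))  # 64: same group number for both diffractors
print(always_paired(3, 0b1011, 1))  # 8: column diffractor fed through the +1 incrementer
```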

V. Single Cell Failures

The methods presented in the previous section allow identification of failing sites with single-row and/or single-column accuracy. It is also possible to take diagnosis a step further and determine the location of a single faulty cell within a row or a column. This section summarizes this approach.



Fig. 13. Single cell failure diagnosis.

Since the compactor (signature register) is a linear circuit, we work with the so-called error signature E, which replaces the actual signature A and can be obtained by adding modulo 2 a golden (fault-free) signature G to A, i.e., $E = A \oplus G$. In terms of error signatures, the compactor remains in the all-0 state (Fig. 13) until a fault injection moves it to a certain state x determined by the compactor injector network. Subsequently, the compactor advances by an additional d steps to reach state y. Typically, d is the number of steps required to complete a given memory run. This value, i.e., the distance between states x and y as recorded by the compactor, also gives the actual fault location. The value of d, and hence the fault site, can be found by using discrete logarithm-based counting [10], [17]. It solves the following problem: given an LFSR and one of its states, determine the number of cycles needed to reach that state, assuming that the compactor is initially set to 0…001. When working with a failing row signature (most likely representing a single cell failure), the fault injection site (the compactor input) is unknown. Thus, d must be computed B times by repeatedly applying the following formula:

$$d = d_y - d_x \tag{3}$$

Fig. 14. Simple compactor and its state trajectory.

where $d_y$ and $d_x$ are the distances between state 0…01 and states y and x, respectively. Recall that state x depends on where a fault is injected, and so does $d_x$. Finally, only $d < M \cdot R$ is considered an acceptable solution. It is worth noting that, once accepted, the corresponding state x uniquely identifies the memory segment from which a fault arrives. The following steps summarize the above procedure:

    compute d_y using discrete logarithm counting
    for i = 0 to B − 1 repeat
        recall d_x(i) from LUT
        compute d = d_y − d_x(i)
        if d < 0, then d ← d + LFSR period
        if d < M·R, then stop; the failing cell belongs to segment i,
            and its distance to the end of this segment is d
    end for

Example: Suppose we use a simple 4-bit compactor with two inputs as shown in Fig. 14. The same figure illustrates its state trajectory. Let the compactor work with a memory having the following parameters: B = 2, M = 2, and R = 4. As can be seen, if an error is injected through input a, then the compactor will initially move to state 0110 (6). Similarly, an error reaching input b takes the compactor to state 1001 (9). From the figure, the distance $d_6$ between states 0001 (1) and 0110 (6) is 2. Also, $d_9 = 11$. Assume that the compactor has reached state 1110 (14). Its distance from state 0001 (1) is 7, and thus $d = 7 - 2 = 5$ for input a. Since $5 < M \cdot R = 2 \cdot 4 = 8$, we get an acceptable result, which indicates that the failing cell belongs to the segment connected with input a, and its distance from the end of the segment is 5. If one examined the results obtained for input b, one would get $d = 7 - 11 = -4$. After adjusting this number by the compactor period, we would have $d = -4 + 15 = 11$. Clearly, $11 > 8$, and thus the result can be discarded.

When column signatures are available, the fault injection site can be determined in a straightforward manner. Consequently, the above diagnostic procedure simplifies and becomes more reliable, as there is no need to repeat all diagnostic steps for successive inputs of the compactor. Moreover, information related to failing rows (or columns), obtained as shown in the earlier sections, can be used to further improve the accuracy of diagnosis. Given distance d, one can easily determine the row r to which the suspect cell belongs. If r does not match a row indicated by the selection mechanism, the algorithm continues with the following memory segments. The same technique allows scaling down the size of the compactor itself. In fact, the compactor period can be shortened even below the size of a single memory segment: failing row information, used to eliminate inconsistent results, effectively counterbalances a possible wrap-around.
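The basic procedure and the worked example above can be replayed with a short script. This is a minimal sketch under stated assumptions: since Fig. 14 is not reproduced here, the compactor is modeled as a hypothetical 4-bit right-shifting Galois LFSR with tap mask 0b1100, chosen only because it reproduces the distances quoted in the example (2 to state 0110, 7 to state 1110, 11 to state 1001, period 15). The discrete logarithm is obtained from a brute-force lookup table rather than the counting methods of [10], and all function names are illustrative.

```python
from typing import Optional

# Hypothetical compactor model (assumption, not taken from Fig. 14): a 4-bit
# Galois LFSR that shifts right and XORs TAPS into the state whenever a 1
# leaves the least significant bit.
TAPS = 0b1100
ONE = 0b0001  # reference state 0...01


def step(state: int) -> int:
    """Advance the error-free compactor by one clock cycle."""
    out = state & 1
    state >>= 1
    return state ^ TAPS if out else state


def log_table() -> dict[int, int]:
    """Brute-force discrete logarithm: distance of each state from 0...01."""
    table: dict[int, int] = {}
    state, k = ONE, 0
    while state not in table:
        table[state] = k
        state = step(state)
        k += 1
    return table


LOG = log_table()
PERIOD = len(LOG)  # 15 for this maximum-length 4-bit register


def error_signature(actual: int, golden: int) -> int:
    """E = A xor G; the compactor is linear, so E reflects only the errors."""
    return actual ^ golden


def locate_basic(y: int, injector_states: list[int],
                 M: int, R: int) -> Optional[tuple[int, int]]:
    """Return (segment i, distance d to the end of segment i), or None.

    y: error-signature state at the end of the memory run.
    injector_states[i]: state x reached from all-0 when an error enters input i.
    """
    d_y = LOG[y]
    for i, x in enumerate(injector_states):
        d = d_y - LOG[x]      # equation (3)
        if d < 0:
            d += PERIOD       # wrap around the LFSR period
        if d < M * R:         # the fault must lie within one segment of M*R cells
            return i, d
    return None


if __name__ == "__main__":
    # Fig. 14 example: input a -> 0110, input b -> 1001, with M = 2 and R = 4.
    print(locate_basic(0b1110, [0b0110, 0b1001], M=2, R=4))
```

Running the script prints (0, 5): input a, five cells from the end of its segment. Trying input b alone gives d = 11, which is not below M·R = 8, so it is rejected, as in the example. A lookup table is adequate only for a toy register; for 20- to 32-bit compactors an efficient discrete-logarithm method is needed.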
The following pseudocode summarizes the basic steps of the improved diagnostic procedure:

    compute d_y using discrete logarithm counting
    for i = 0 to B − 1 repeat
        recall d_x,i from LUT
        compute d = d_y − d_x,i
        if d < 0, then d ← d + LFSR period
        if d > segment size, start a new iteration
        j ← 0
        while (d + j·LFSR period < segment size) repeat
            r ← R − 1 − ⌊(d + j·LFSR period) / M⌋
            c ← M − 1 + i·M − (d + j·LFSR period) mod M
            if r is a suspect row and c is the failing column (if known),
                then stop; the failing cell is (r, c)
            j ← j + 1
        end while
    end for
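A Python rendering of the improved procedure is sketched below. It reads the index formulas of the listing as r = R − 1 − ⌊(d + jP)/M⌋ and c = M − 1 + iM − ((d + jP) mod M), where P is the LFSR period; in other words, it assumes the cells of segment i are read row by row, ending at row R − 1 and the last column of that segment. The reading order, argument names, and example numbers are assumptions for illustration, not the authors' implementation.

```python
from typing import Optional


def locate_improved(d_y: int, d_x: list[int], period: int, M: int, R: int,
                    suspect_rows: set[int],
                    failing_column: Optional[int] = None) -> Optional[tuple[int, int]]:
    """Search every compactor input i for a cell consistent with the signature.

    d_y: discrete log of the final error-signature state.
    d_x[i]: discrete log of the state reached when an error enters input i.
    period: LFSR period, which may be shorter than a segment of M*R cells.
    suspect_rows / failing_column: results of row and column partitioning.
    """
    segment_size = M * R
    for i, dx in enumerate(d_x):
        d = (d_y - dx) % period  # d_y - dx, plus the period if negative
        j = 0
        # Candidate distances from the end of segment i are d, d + P, d + 2P, ...
        # If d already reaches past the segment, the loop body never runs and
        # the search moves on to the next input.
        while d + j * period < segment_size:
            offset = d + j * period
            r = R - 1 - offset // M          # row, counted back from the segment end
            c = M - 1 + i * M - offset % M   # column inside segment i
            if r in suspect_rows and (failing_column is None or c == failing_column):
                return r, c
            j += 1
    return None


if __name__ == "__main__":
    # Hypothetical numbers: a 15-cycle period is shorter than a 32-cell segment.
    print(locate_improved(d_y=1, d_x=[2, 11], period=15, M=4, R=8, suspect_rows={2}))
```

Here each input admits two candidate offsets because the 32-cell segment (M·R = 4·8) is longer than the 15-cycle period. With row 2 flagged by row partitioning the search returns (2, 7); had row 6 been flagged instead, it would return (6, 6), which illustrates how the failing-row information resolves the wrap-around ambiguity.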

MUKHERJEE et al.: BIST-BASED FAULT DIAGNOSIS FOR READ-ONLY MEMORIES


VI. Experimental Results

This section reports the results of experiments carried out to characterize the performance of the proposed diagnostic scheme. In particular, diagnostic coverage is used as the primary figure of merit. Assuming that we target up to x failing rows or columns, all numbers presented in this section have been obtained by adopting the following procedure.
1) Run tests for x + 1 column partition groups. Let $x_c$ be the resultant number of failing columns.
2) Repeat the same tests for x + 1 row partition groups. Let $x_r$ be the resultant number of failing rows.
3) If neither $x_c$ nor $x_r$ is less than or equal to x, then carry out the trellis selection and stop. Otherwise:
4) If $x_c \le x$, then examine the signatures (one per failing column) collected in step 1) against single cell faults (by using the discrete logarithm-based counting).
5) If $x_r \le x$, then examine the signatures (one per failing row) collected in step 2) against single cell faults (again by using the discrete logarithm-based counting).
The order of actions proposed above plays a key role in optimizing the diagnostic performance. For example, if there are single cell faults only, then both $x_c$ and $x_r$ are less than or equal to x, and the discrete logarithm-based counting can be applied to examine both column and row signatures (steps 4 and 5) and subsequently to cross-check their results. Typically, the columns are examined prior to the rows. This is because the failing column number is known when launching the diagnostic procedure for columns, so the discrete logarithm-based reasoning needs to be run only once, for a given (known) injector polynomial. This considerably reduces the likelihood of choosing a wrong segment number, which might happen in step 5), as further illustrated by the results of Table IV.
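Steps 1) to 5) amount to a small decision procedure. The sketch below (hypothetical function name) returns the follow-up actions in the recommended order, given the target x and the counts x_c and x_r obtained in steps 1) and 2); it encodes only the control flow and does not model the signature analysis itself.

```python
def plan_diagnosis(x: int, x_c: int, x_r: int) -> list[str]:
    """Follow-up actions after steps 1) and 2) when targeting up to x failures.

    x_c, x_r: numbers of failing columns / rows found with x + 1 partition groups.
    """
    # Step 3): too many failing rows *and* columns, so only trellis selection helps.
    if x_c > x and x_r > x:
        return ["trellis selection"]
    actions = []
    # Step 4) first: the failing column fixes the injection site, so the
    # discrete-log search runs once per signature.
    if x_c <= x:
        actions.append("single-cell check of failing-column signatures")
    # Step 5): row signatures need the search repeated for every compactor input.
    if x_r <= x:
        actions.append("single-cell check of failing-row signatures")
    return actions


if __name__ == "__main__":
    print(plan_diagnosis(x=2, x_c=1, x_r=1))  # single cell faults only
```

For single cell faults both checks run, columns first, so their results can be cross-checked; when both counts exceed x, the function stops at trellis selection, mirroring step 3).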
On the other hand, in the rare case of multiple row and column failures [19], the trellis selection is the only feasible diagnostic approach, and thus the condition of step 3) must be checked before launching the remaining techniques proposed in this paper.

The first group of experiments examines the relationship between the compactor size and the diagnostic coverage when attempting to identify single cell failures. Table IV presents the diagnostic coverage numbers as a function of the memory and compactor sizes. Each entry of the table indicates the fraction of failures that were correctly diagnosed out of 100 K randomly generated single cell failures. In order to increase the statistical significance of the experiments, the compactor injector network was changed every 1000 failures. It is worth noting that only failing row signatures were considered as starting points to trace faulty cells. This experiment can therefore be regarded as a worst-case analysis as far as the discrete logarithm-based counting is concerned. Typically, one may expect substantial improvement once failing column signatures are also available. As shown in the table, the size of the signature register can be crucial in achieving adequate diagnostic resolution and coverage. Interestingly, there is a coverage drop when comparing memories of the same capacity but with a different number of segments. Apparently, an increasing number of segments adversely impacts the diagnostic coverage. Fortunately, this phenomenon gradually diminishes as the size of the compactor itself increases. It also appears that the discrete logarithm-based counting works fine even for memories larger than the compactor period. As an example, consider a 2 MB array comprising over 16.8M memory cells. They may potentially produce 16.8M erroneous patterns. Nevertheless, a 32-input 20-bit compactor with a period of 1 048 575 guarantees almost 96% diagnostic coverage. This is because the diagnostic algorithm targets only memory cells belonging to the indicated failing rows, as presented in Section V.

TABLE IV
Diagnostic Coverage [%] Versus Compactor Size

B      Segment   Memory       Compactor size
       size      size [kB]    20      24      28      32
32     256       8            98.49   99.22   99.43   99.74
32     1K        32           97.43   98.50   98.86   99.44
32     4K        128          96.67   97.90   98.41   99.13
32     16K       512          96.19   97.57   98.15   98.97
32     64K       2048         95.81   97.42   98.04   98.88
128    64        8            99.01   99.49   99.83   99.85
128    256       32           96.76   98.59   99.27   99.47
128    1K        128          92.45   97.01   97.95   98.75
128    4K        512          88.02   94.79   96.20   97.67
128    16K       2048         85.27   93.01   94.60   96.77

The schemes proposed in this paper were further tested on 128 kB and 2 MB memories working with 16-bit and 32-bit compactors, respectively. This group of experiments was aimed at determining the overall diagnostic coverage for faults commonly exhibited by memories. They are listed in the first column of Table V. In principle, each entry of the table consists of two numbers. The first one is the percentage of faults of a given type that were correctly identified. The second number provides the percentage of test cases in which at least the rows and/or columns that host the actual failure were part of the solution. Clearly, if the first number is 100%, the second assumes the same value; it is therefore omitted from the table whenever complete coverage was reached for all cases in a row. Note that each entry of the table was obtained by injecting 10 K and 5 K randomly generated failures into the 128 kB and 2 MB devices, respectively. In each case, the number of failures was chosen arbitrarily. As can be seen, the table presents a predictable tradeoff between accuracy of diagnosis and test application time (measured in memory runs, as listed in the table header).
In particular, increasing the number of memory runs increases the diagnostic coverage as well. The best results are achieved for the largest partition groups. Predominantly, the coverage is complete. The proposed scheme always yields a solution that includes all columns and rows that host failing cells. This was meticulously verified during the experiments and is confirmed in Table V by the overwhelming presence of 100% numbers.

TABLE V
Diagnostic Coverage [%] for Two Memory Arrays
(a/b: a = faults correctly identified, b = test cases whose solution includes the rows and columns hosting the failure; a single value means a = b = 100)

Memory size                    128 kB                           2 MB
Architecture                   1024 (32 × 32)                   4096 (32 × 128)
Compactor size                 16                               32
Partition groups               3          4          5          3          4          5
Memory runs                    96         128        160        192        256        320
Single cell                    100        100        100        100        100        100
Two cells                      100        100        100        100        100        100
Three cells                    70.48/100  100/100    100/100    83.08/100  100/100    100/100
Single row                     100        100        100        100        100        100
Two rows                       100        100        100        100        100        100
Three rows                     83.77/100  99.98/100  99.98/100  91.16/100  100/100    100/100
Single column                  100        100        100        100        100        100
Two columns                    99.98/100  99.98/100  99.98/100  100/100    100/100    100/100
Three columns                  83.98/100  99.99/100  99.99/100  91.42/100  100/100    100/100
Single row and single cell     99.99/100  99.99/100  99.99/100  98.02/100  98.02/100  98.02/100
Single row and two cells       84.42/100  99.98/100  99.98/100  87.16/100  95.26/100  95.26/100
Two rows and single cell       83.85/100  99.99/100  99.99/100  89.46/100  97.26/100  97.26/100
Single column and single cell  99.99/100  99.99/100  99.99/100  100/100    100/100    100/100
Single column and two cells    84.31/100  100/100    100/100    91.36/100  100/100    100/100
Two columns and single cell    84.11/100  100/100    100/100    91.46/100  100/100    100/100
Single row and single column   77.55/100  87.91/100  93.83/100  88.18/100  93.66/100  96.58/100
Two rows and single column     35.93/100  73.23/100  88.91/100  60.70/100  87.24/100  95.24/100
Single row and two columns     35.81/100  73.25/100  89.18/100  59.16/100  87.14/100  95.04/100
Two rows and two columns       7.43/100   54.01/100  83.81/100  24.24/100  74.38/100  92.42/100

A detailed analysis of the remaining test cases reveals that some diagnostic misses can be attributed to one of the following reasons.
1) Insufficient number of partition groups (mostly the columns labeled 3 in the table). One may alleviate this drawback by simply collecting more signatures at the price of a longer test session.
2) Low diagnostic resolution due to a small compactor. This becomes apparent when looking for single cell failures: the diagnostic algorithm returns incorrect faulty sites that still belong to the same failing row/column as the actual faulty cell. As demonstrated earlier, a larger compactor can easily alleviate this problem.
3) Correlation between rows and columns (observed mainly when using the trellis selection). For larger memories this effect is negligible. For instance, there are 32 out of 1024 (3.1%) correlated rows and columns in the 128 kB memory, whereas in the 2 MB array this percentage drops to 1.6% (64 out of 4096).

It is also instructive to compare diagnostic times when employing the method proposed in this paper and some conventional techniques. In the rest of this section, we will delve into two of these techniques. According to the first one, each ROM address location is read, and its content is dumped into an s-bit register, which is then immediately shifted out. This approach allows diagnosing any number of memory failures. The second method follows a binary search scheme. The ROM address space is divided in half, and the MBIST is run for both halves separately to collect two s-bit signatures. Once the failing half is determined, one can continue running MBIST for the corresponding sub-halves to match signatures again. Clearly, this technique allows for correct identification of single memory failures only. Diagnostic time can be derived from the cycle time, the memory size n (in terms of its words), and the number of cycles required to perform both read operations and serial download of the resultant signatures. The memory dump technique requires n read cycles and $ns$ shift cycles to download the content of successive memory words. The binary search-based method proceeds as follows. First, it reads all n memory words and dumps two s-bit signatures. Next, it reads n/2 memory locations and again produces two signatures. This is repeated roughly $\log_2 n$ times. Hence, it takes $n + n/2 + n/4 + \dots \approx 2n$ cycles to carry out the read operations, and an additional $2s \log_2 n$ cycles to download all relevant signatures. The approach presented in this paper reads all memory locations g times, assuming that one targets at most $g - 1$ faults. Since test time in this case is memory-architecture dependent, we will assume the worst-case scenario where the number of rows is equal to the number of memory words n. Therefore, it requires $ng$ cycles. In reality, as shown earlier in this paper, the presence of multiple-word rows may actually accelerate the diagnostic process.
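To make the comparison concrete, the sketch below turns the read and shift cycle counts into total diagnostic times, expressed in units of the MBIST clock period c. It assumes, as the following paragraph does, that the tester's shift clock is r times slower than the MBIST clock and that the new method shifts out about $g\sqrt{n}$ s-bit signatures (worst case of one word per row); the function names are illustrative only.

```python
import math


def memory_dump_time(n: int, s: int, r: int) -> float:
    """n reads at MBIST speed plus n*s shift cycles on an r-times-slower clock."""
    return n + r * n * s


def binary_search_time(n: int, s: int, r: int) -> float:
    """About 2n reads plus 2*log2(n) signatures of s bits shifted out."""
    return 2 * (n + r * s * math.log2(n))


def partitioning_time(n: int, s: int, r: int, g: int) -> float:
    """g full memory runs plus about g*sqrt(n) signatures of s bits shifted out."""
    return g * (n + r * s * math.sqrt(n))


if __name__ == "__main__":
    n, s, r = 1024, 32, 10
    g = 4  # locate up to g - 1 = 3 faults
    dump = memory_dump_time(n, s, r)
    new = partitioning_time(n, s, r, g)
    print(f"memory dump:   {dump:,.0f} c")
    print(f"binary search: {binary_search_time(n, s, r):,.0f} c")
    print(f"partitioning:  {new:,.0f} c (speed-up {dump / new:.2f}x)")
```

With n = 1024, s = 32, r = 10, and g = 4, this yields 328 704 c for the dump (the text's 327 680 c keeps only the dominant rcns term) and 45 056 c for the proposed method. Raising g to 29 brings the two methods within about 1% of each other, consistent with the break-even point at 28 faults.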
Moreover, since the number of partitions for each group is roughly equal to $2^h$, where $h = 0.5 \log_2 n$ (see Section III), i.e., about $\sqrt{n}$, this scheme produces approximately $g\sqrt{n}$ signatures, and it takes $sg\sqrt{n}$ cycles to shift them out. Let c denote the MBIST clock cycle, and assume that the shift clock used by a tester is r times slower than the MBIST clock. The three schemes discussed here would then have approximately the following diagnostic times:
1) the memory dump: $cn + rcns \approx rcns$;
2) the binary search: $2c(n + rs \log_2 n)$;
3) the new method: $cg(n + rs\sqrt{n})$.
Clearly, the binary search offers the shortest test time. Unfortunately, it only works for single failures. Let us now assume that n = 1024, s = 32 (the signature size), and r = 10 (the ratio of BIST and ATE clocks).¹ Then, in order to locate faults of multiplicity 3 (which implies g = 4), it will take 327 680 cycles c to diagnose the ROM by using the dump-based approach, whereas the method proposed in this paper requires only 45 056 such cycles, i.e., it is more than seven times faster. Interestingly, both techniques need a similar test time if one wants to locate up to 28 different faults. For a 4096-word ROM and otherwise the same conditions, the new method would be almost 23 times faster.

¹ It is worth noting that typically the ratio of the speed at which BIST is run versus the speed at which data is shifted out to the ATE is very high. So interrupting BIST to dump the memory contents and to shift them out results in a much bigger overhead than that of calculating signatures for the entire algorithm and shifting them out at the very end of test. Even if one needs to run BIST multiple times (as proposed in this paper), the resultant overhead remains feasible, as BIST clock speeds are orders of magnitude higher than shift speeds.

VII. Hardware Overhead

The silicon area of test logic amounts to a certain number of gates and flip-flops. The number of gates depends on the memory word size, which in turn affects the number of inputs of the signature register and the size of its XOR injection network. Furthermore, the number of gates is determined by the mux factor M, as the ratio of M and B determines the number c of columns to be observed at a time, and thereby the number of phase shifters, and thus XOR gates. Table VI provides the actual area costs computed with a commercial synthesis tool for four memory arrays of different capacity and architecture. All components of our test logic were synthesized using a 90 nm CMOS standard cell library under a 3.5 ns timing constraint. For the indicated memory sizes and the relevant parameters (n and the parameter of (2)), the table reports the resultant silicon area of combinational and noncombinational devices, as well as of their interconnection network. The total area taken by the proposed test logic is subsequently compared with the corresponding ROM array area (determined based on independent data provided by two silicon manufacturers). The area occupied by test logic, expressed as a percentage of the ROM area, is reported in the last row of the table. Clearly, the area overhead of the proposed diagnostic circuitry is an insignificant part of the entire real estate designated to host ROM arrays, their controllers, and a BIST infrastructure. In particular, only a small amount of sequential circuitry is required in each test case. Consequently, the numbers of Table VI make the proposed diagnostic scheme very attractive as far as its silicon cost is concerned.

TABLE VI
Test Logic Area Overhead

ROM size [kB]              8               64               128              2048
Architecture R (M × B)     256 (4 × 64)    1024 (4 × 128)   1024 (32 × 32)   4096 (32 × 128)
n (see Fig. 4)             4               5                5                6
Parameter of (2)           4               8                1                2
Area [µm²]
  Combinational            1084            2899             605              2708
  Noncombinational         159             446              159              521
  Interconnections         1473            4030             836              3837
  Total                    2716            7375             1600             7066
  ROM array                82 575          660 602          1 321 205        21 139 292
Percentage                 3.29            1.12             0.12             0.033

VIII. Conclusion

In this paper, we proposed a new fault diagnosis scheme for embedded read-only memories. It reduces the diagnostic data that needs to be scanned out during ROM test such that the minimum information needed to recover the failure data is preserved, and the time to unload the data is minimized. The presented approach allows an uninterrupted collection and processing of test responses at the system speed. This has been achieved by using low-cost on-chip selection mechanisms, which are instrumental in very accurate and time-efficient identification of failing rows, columns, and single memory cells. In particular, the scheme employs the original designs of row and column selectors with phase shifters controlling the way the address space is traversed. Furthermore, the new combined selection logic allows the scheme to collect test results in parallel (leading to shorter test time) without compromising the quality of diagnosis. Results of experiments performed on several memory arrays for randomly generated failures clearly confirm the high diagnostic accuracy of the scheme, provided the signature registers and the proposed selection logic are properly tuned to guarantee a desired diagnostic resolution.

Acknowledgment

The authors would like to acknowledge a private communication with F. Poehl of Infineon Technologies AG, Munich, Germany, concerning logic synthesis of semiconductor memories.

References

[1] R. D. Adams, High Performance Memory Testing: Design Principles, Fault Modeling and Self-Test. New York: Kluwer, 2003.
[2] D. Appello, V. Tancorre, P. Bernardi, M. Grosso, M. Rebaudengo, and M. Sonza Reorda, "Embedded memory diagnosis: An industrial workflow," in Proc. ITC, 2006, paper 26.2.
[3] S. Barbagallo, A. Burri, D. Medina, P. Camurati, P. Prinetto, and M. Sonza Reorda, "An experimental comparison of different approaches to ROM BIST," in Proc. Eur. Comput. Conf., 1991, pp. 567–571.
[4] I. Bayraktaroglu and A. Orailoglu, "The construction of optimal deterministic partitioning in scan-based BIST fault diagnosis: Mathematical foundations and cost-effective implementations," IEEE Trans. Comput., vol. 54, no. 1, pp. 61–75, Jan. 2005.
[5] T. J. Bergfeld, D. Niggemeyer, and E. M. Rudnick, "Diagnostic testing of embedded memories using BIST," in Proc. DATE, 2000, pp. 305–309.
[6] T. Boehler and G. Lehmann, "Using data compression for faster testing of embedded memory," U.S. Patent 6 950 971, Sep. 27, 2005.
[7] J. T. Chen and J. Rajski, "Method and apparatus for diagnosing memory using self-testing circuits," U.S. Patent 6 421 794, Jul. 16, 2002.
[8] J. T. Chen, J. Rajski, J. Khare, O. Kebichi, and W. Maly, "Enabling embedded memory diagnosis via test response compression," in Proc. VTS, 2001, pp. 292–298.
[9] J. T. Chen, J. Khare, K. Walker, S. Shaikh, J. Rajski, and W. Maly, "Test response compression and bitmap encoding for embedded memories in manufacturing process monitoring," in Proc. ITC, 2001, pp. 258–267.
[10] D. W. Clark and L.-J. Weng, "Maximal and near-maximal shift register sequences: Efficient event counters and easy discrete logarithms," IEEE Trans. Comput., vol. 43, no. 5, pp. 560–568, May 1994.
[11] X. Du, N. Mukherjee, W.-T. Cheng, and S. M. Reddy, "Full-speed field-programmable memory BIST architecture," in Proc. ITC, 2005, paper 45.3.
[12] D. Gizopoulos, Ed., Advances in Electronic Testing: Challenges and Methodologies. Dordrecht, The Netherlands: Springer, 2006.
[13] International Technology Roadmap for Semiconductors. (2009) [Online]. Available: www.itrs.net
[14] Y.-H. Lee, Y.-G. Jan, J.-J. Shen, S.-W. Tzeng, M.-H. Chuang, and J.-Y. Lin, "A DFT architecture for a dynamic fault model of the embedded mask ROM of SoC," in Proc. Int. Workshop Memory Technol. Design Testing, 2005, pp. 78–82.
[15] J.-F. Li and C.-W. Wu, "Memory fault diagnosis by syndrome compression," in Proc. DATE, 2001, pp. 97–101.
[16] G. Mrugalski, J. Rajski, and J. Tyszer, "Ring generators: New devices for embedded deterministic test," IEEE Trans. Comput.-Aided Design, vol. 23, no. 9, pp. 1306–1320, Sep. 2004.
[17] N. Mukherjee, A. Pogiel, J. Rajski, and J. Tyszer, "High throughput diagnosis via compression of failure data in embedded memory BIST," in Proc. ITC, 2008, paper 3.1.



[18] N. Mukherjee, A. Pogiel, J. Rajski, and J. Tyszer, "Fault diagnosis for embedded read-only memories," in Proc. ITC, 2009, paper 7.1.
[19] P. Nagvajara and M. G. Karpovsky, "Built-in self-diagnostic read-only memories," in Proc. ITC, 1991, pp. 695–703.
[20] D. Niggemeyer and E. M. Rudnick, "Automatic generation of diagnostic memory tests based on fault decomposition and output tracing," IEEE Trans. Comput., vol. 53, no. 9, pp. 1134–1146, Sep. 2004.
[21] J. Rajski and J. Tyszer, "Diagnosis of scan cells in BIST environment," IEEE Trans. Comput., vol. 48, no. 7, pp. 724–731, Jul. 1999.
[22] J. Rajski, N. Tamarapalli, and J. Tyszer, "Automated synthesis of phase shifters for built-in self-test applications," IEEE Trans. Comput.-Aided Design, vol. 19, no. 10, pp. 1175–1188, Oct. 2000.
[23] J. Rajski, N. Mukherjee, J. Tyszer, and A. Pogiel, "Fault diagnosis in memory BIST environment," U.S. Patent Applicat. 20110055646, Sep. 18, 2008.
[24] C. Selva, R. Zappa, D. Rimondi, C. Torelli, and G. Mastrodomenico, "Built-in self diagnosis device for a random access memory and method of diagnosing a random access memory," U.S. Patent 7 571 367, Aug. 4, 2009.
[25] A. K. Sharma, Semiconductor Memories: Technology, Testing and Reliability. New York: Wiley, 2002.
[26] J. Volrath, K. White, and M. Eubanks, "On-chip circuits for high speed memory testing with a slow memory tester," U.S. Patent 6 404 250, Jun. 11, 2002.
[27] L.-T. Wang, C.-W. Wu, and X. Wen, VLSI Test Principles and Architectures: Design for Testability. New York: Morgan Kaufmann Publishers, 2006.
[28] C.-W. Wu, R.-F. Huang, C.-L. Su, W.-C. Wu, Y.-J. Chang, K.-L. Luo, and S.-T. Lin, "Method and apparatus of build-in self-diagnosis and repair in a memory with syndrome identification," U.S. Patent 7 228 468, Jun. 5, 2007.
[29] H. Yamauchi, "Semiconductor memory device for build-in fault diagnosis," U.S. Patent Applicat. 20 050 262 422, Nov. 24, 2005.

Artur Pogiel (M'09) received the M.S. degree in electrical engineering and the Ph.D. degree in telecommunications from the Poznań University of Technology, Poznań, Poland, in 2002 and 2008, respectively. He was a Teaching Assistant with the Faculty of Electronics and Telecommunications, Poznań University of Technology, until 2008. He is currently a Software Development Engineer with Mentor Graphics Polska, Poznań. He has published 12 technical papers in various IEEE journals and conferences. He is a co-inventor on five U.S. patents. His main research interests include design for testability, built-in self-testing, embedded testing, and fault diagnosis. Dr. Pogiel was the co-recipient of the Best Paper Award at the 2009 VLSI Design Conference.

Nilanjan Mukherjee (S'87–M'89) received the B.Tech. (Hons.) degree in electronics and electrical communication engineering from the Indian Institute of Technology Kharagpur, Kharagpur, India, in 1989, and the Ph.D. degree from McGill University, Montreal, QC, Canada, in 1996. He is currently a Software Development Director of the Design-to-Silicon Division, Mentor Graphics Corporation, Wilsonville, OR. With Mentor Graphics Corporation, he was a co-inventor of the embedded deterministic test (EDT) technology and was a Lead Developer for the leading test compression tool in the industry, TestKompress. Prior to joining Mentor Graphics Corporation, he was with Lucent Bell, Holmdel, NJ, where he primarily contributed to the areas of logic built-in self-test, RTL testability analysis, path-delay testing, and online testing. He has published more than 45 technical articles in various IEEE journals and conferences. He is a co-inventor on 27 U.S. patents. He was an invited author for the special issue of the IEEE Communications Magazine in June 1999. His current research interests include developing next generation test methodologies for deep submicrometer designs, test data compression, test synthesis, memory testing, and fault diagnosis. Dr. Mukherjee was the co-recipient of the Best Paper Award at the 2009 VLSI Design Conference, the Best Paper Award at the 1995 IEEE VLSI Test Symposium, and the Best Student Paper Award at the Asian Test Symposium in 2001. His paper on EDT at the International Test Conference in 2002 was recognized as one of the most significant papers of ITC published in the last 35 years. He received the prestigious 2006 IEEE Circuits and Systems Society Donald O. Pederson Outstanding Paper Award recognizing the paper on EDT published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
He served on the program committees of several IEEE conferences, including the Asian Test Symposium, the International Test Synthesis Workshop, the VLSI Design and Test Symposium, the Symposium on Design and Diagnostics of Electronics Circuits and Systems, and VLSI Design.

Janusz Rajski (A'87–SM'10) received the M.Eng. degree in electrical engineering from the Technical University of Gdańsk, Gdańsk, Poland, in 1973, and the Ph.D. degree in electrical engineering from the Poznań University of Technology, Poznań, Poland, in 1982. From 1973 to 1984, he was a Faculty Member with the Poznań University of Technology. In June 1984, he joined McGill University, Montreal, QC, Canada, where he became an Associate Professor in 1989. In January 1995, he became the Chief Scientist with Mentor Graphics Corporation, Wilsonville, OR. His main research interests include design automation and testing of very large scale integration systems, design for testability, built-in self-test, and logic synthesis. He has published more than 150 research papers in these areas and is a co-inventor on 73 U.S. and international patents. He is the principal inventor of the embedded deterministic test technology used in the first commercial test compression product, TestKompress. He is the co-author of Arithmetic Built-In Self-Test for Embedded Systems (Englewood Cliffs, NJ: Prentice-Hall, 1997). Dr. Rajski was the co-recipient of the 1993 Best Paper Award for the paper on logic synthesis published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the co-recipient of the 1995 and 1998 Best Paper Awards at the IEEE VLSI Test Symposium, the co-recipient of the 1999 and 2003 Honorable Mention Awards at the IEEE International Test Conference, as well as the co-recipient of the 2006 IEEE Circuits and Systems Society Donald O. Pederson Outstanding Paper Award recognizing the paper on embedded deterministic test published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, and the 2009 Best Paper Award at the VLSI Design Conference.
He was the Guest Co-Editor of the June 1990 and January 1992 special issues of the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems devoted to the 1987 and 1989 International Test Conferences, respectively. In 1999, he was a Guest Co-Editor of the special issue of the IEEE Communications Magazine devoted to testing of telecommunication hardware. He was the Associate Editor for the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the IEEE Transactions on Computers, and the IEEE Design and Test of Computers Magazine. He has served on technical program committees of various conferences, including the IEEE International Test Conference and the IEEE VLSI Test Symposium.

Jerzy Tyszer (M'91–SM'96) received the M.Eng. (Hons.) degree in electrical engineering from the Poznań University of Technology, Poznań, Poland, in 1981, the Ph.D. degree in electrical engineering from the Poznań University of Technology in 1987, and the Dr.Hab. degree in telecommunications from the Technical University of Gdańsk, Gdańsk, Poland, in 1994. From 1982 to 1990, he was a Faculty Member with the Poznań University of Technology. In January 1990, he joined McGill University, Montreal, QC, Canada, where he was a Research Associate and Adjunct Professor. In 1996, he became a Professor with the Faculty of Electronics and Telecommunications, Poznań University of Technology. He has published eight books, more than 100 research papers in his areas of expertise, and is a co-inventor on 53 U.S. and international patents. He is the co-author of Arithmetic Built-In Self-Test for Embedded Systems (Englewood Cliffs, NJ: Prentice-Hall, 1997), and is the author of Object-Oriented Computer Simulation of Discrete Event Systems (Boston, MA: Kluwer, 1999). His main research interests include design automation and testing of very large scale integration (VLSI) systems, design for testability, built-in self-testing, embedded testing, and computer simulation of discrete event systems. Dr. Tyszer was the co-recipient of the 1995 and 1998 Best Paper Awards at the IEEE VLSI Test Symposium, the 2003 Honorable Mention Award at the IEEE International Test Conference, the 2006 IEEE Circuits and Systems Society Donald O. Pederson Outstanding Paper Award recognizing the paper on embedded deterministic test published in the IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, and the 2009 Best Paper Award at the VLSI Design Conference. In 1999, he was a Guest Co-Editor of the special issue of the IEEE Communications Magazine devoted to testing of telecommunication hardware. He has served on technical program committees of various conferences, including the IEEE International Test Conference, the IEEE VLSI Test Symposium, and the IEEE European Test Symposium.