Testing Parity-Based Error Detection and Correction Circuits Charles E. Stroud

Outline of Presentation

- Testing Exclusive-OR (XOR) Gates
  - Pin faults vs. gate-level faults
- Parity Circuits
  - Basic operation and design
  - Testing parity trees
    - C-testable for known connections
    - Pseudo-exhaustive test set for unknown connections
      - Stuck-at & bridging fault simulation results using AUSIM
- Hamming Circuits
  - Basic operation and design
    - Use in FPGAs
  - Pseudo-exhaustive test set for unknown connections
    - Stuck-at & bridging fault simulation results using AUSIM
- Summary and Conclusions

C. Stroud 9/6/06

VLSI Design & Test Seminar


Some Test Definitions

- Exhaustive Test
  - Apply all possible test patterns to circuit under test (CUT)
- Pseudo-exhaustive Test
  - Apply all possible test patterns to every subcircuit in CUT
    - McCluskey, Trans. on Comp. '84
- C-testable
  - A circuit is C-testable if it can be tested with a constant number of test patterns, regardless of size of CUT
    - Friedman, Trans. on Comp. '73
- N-detect Test Set
  - Every fault in CUT detected ≥ N times by N different vectors
    - Elementary logic gate, at-speed N-detect test sets (N ≥ 3 to 5) are effective in detecting delay, bridging, and transistor faults
      - McCluskey & Tseng, ITC '00

XOR Gates

- Elementary logic gates:
  - AND, OR, NOT, NAND, NOR
- XOR not considered an elementary logic gate by most designers
  - Implementation requires multiple elementary logic gates
- Note that: S⊕T=R, T⊕R=S, and R⊕S=T
  - Linear function

Truth Table
S T R
0 0 0
1 0 1
1 1 0
0 1 1

[Figures: XOR implementations with inputs S, T and output R: SOP implementation (2-level AND-OR); CMOS implementation (AOI21 & NOR); NAND implementation; transistor implementation]

Testing XOR Gates

- Stuck-at faults
  - Stuck-at-0 (sa0) and stuck-at-1 (sa1)
- To detect all pin faults:
  - Need 3 vectors {01, 10, and 00 or 11}
    - Mourad & McCluskey, Trans. on IE '89
  - But note that any 3 vectors will work
- To detect all gate-level faults:
  - Need all 4 vectors

Output R under each single pin fault:

Fault-free |  S fault   |  T fault   |  R fault
S  T  R    |  sa0  sa1  |  sa0  sa1  |  sa0  sa1
0  0  0    |   0    1   |   0    1   |   0    1
0  1  1    |   1    0   |   0    1   |   0    1
1  0  1    |   0    1   |   1    0   |   0    1
1  1  0    |   1    0   |   1    0   |   0    1

[Figures: gate-level XOR implementations (inputs S, T; output R) annotated with the single vector that detects each internal stuck-at fault, e.g. sa1 {00}, sa0 {01}, sa0 {10}, sa1 {11}]

Note: # nets doubles to triples for bridging faults
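The pin-fault claim above can be checked by brute force. The following is a minimal sketch, not taken from the slides: the function and variable names are illustrative, and the XOR is modeled purely at its pins (S, T, R) with one stuck-at fault injected at a time.

```python
from itertools import combinations

VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]
# Six single pin faults: each of S, T, R stuck at 0 or stuck at 1.
PIN_FAULTS = [(pin, value) for pin in "STR" for value in (0, 1)]

def xor_with_pin_fault(s, t, fault=None):
    """Output R of a 2-input XOR with an optional single stuck-at pin fault."""
    if fault is not None:
        pin, value = fault
        if pin == "S":
            s = value
        elif pin == "T":
            t = value
        else:  # fault on the output pin R
            return value
    return s ^ t

def detects(vector, fault):
    """A vector detects a fault if the faulty output differs from the fault-free one."""
    s, t = vector
    return xor_with_pin_fault(s, t) != xor_with_pin_fault(s, t, fault)

def covers_all_pin_faults(vectors):
    return all(any(detects(v, f) for v in vectors) for f in PIN_FAULTS)

complete_triples = [c for c in combinations(VECTORS, 3) if covers_all_pin_faults(c)]
complete_pairs = [c for c in combinations(VECTORS, 2) if covers_all_pin_faults(c)]
print(len(complete_triples), "of 4 triples are complete;",
      len(complete_pairs), "of 6 pairs are complete")
```

Every 3-vector subset covers all six pin faults and no 2-vector subset does, which is consistent with "any 3 vectors will work".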

Alternate View

- Theorem 1: A set of test vectors that detects all single stuck-at faults on all primary inputs of a fanout-free combinational logic circuit will detect all single stuck-at faults in that circuit.
  - No fanout in XOR when considering pin faults
- Theorem 2: A set of test vectors that detects all single stuck-at faults on all primary inputs and all fanout branches of a combinational logic circuit will detect all single stuck-at faults in that circuit.
  - 2 or 3 fanout stems in any gate-level implementation of XOR
  - All 4 vectors needed to detect faults on fanout branches
    - Need exhaustive testing

[Figures: XOR symbol and gate-level XOR implementations with inputs S, T and output R]
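To illustrate why fanout branches force exhaustive testing, here is a small sketch that simulates one possible gate-level XOR: the common four-NAND form. The netlist, the site names (for example "S>g1" for the branch of S feeding gate g1), and the helper functions are assumptions chosen for illustration; the slides show several implementations without a netlist.

```python
VECTORS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Assumed netlist: n1 = NAND(S, T); n2 = NAND(S, n1); n3 = NAND(T, n1); R = NAND(n2, n3)
# Fault sites: fanout stems (S, T, n1), their branches, and the remaining nets.
SITES = ["S", "T", "n1", "S>g1", "S>g2", "T>g1", "T>g3",
         "n1>g2", "n1>g3", "n2", "n3", "R"]
FAULTS = [(site, value) for site in SITES for value in (0, 1)]

def nand(a, b):
    return 1 - (a & b)

def simulate(s, t, fault=None):
    """Evaluate the four-NAND XOR with an optional single stuck-at fault."""
    site, value = fault if fault else (None, None)

    def line(name, v):
        # Force the named line to the stuck value if it is the fault site.
        return value if name == site else v

    s, t = line("S", s), line("T", t)
    n1 = line("n1", nand(line("S>g1", s), line("T>g1", t)))
    n2 = line("n2", nand(line("S>g2", s), line("n1>g2", n1)))
    n3 = line("n3", nand(line("T>g3", t), line("n1>g3", n1)))
    return line("R", nand(n2, n3))

# For each fault, list the vectors whose output differs from the fault-free output.
detecting = {f: [v for v in VECTORS if simulate(*v) != simulate(*v, fault=f)]
             for f in FAULTS}

# A vector is essential if some fault is detected by that vector alone.
essential = {vs[0] for vs in detecting.values() if len(vs) == 1}
print(sorted(essential))
```

In this netlist each of the four vectors is the only test for at least one branch fault, so no vector can be dropped.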

Hamming Distance

- Distance, d = # bits different between 2 words
  - Example, d=3
    - 00110100
    - 01100101
- Used in error detection & correction codes
  - d = minimum distance between 2 valid code words
    - Invalid code words represent error conditions
  - d = E + C + 1, where E ≥ C
    - E = # detectable bit errors
    - C = # correctable bit errors
  - Examples
    - d=1: no detection or correction (regular data)
    - d=2: 1-bit detection, no correction (parity)
    - d=3: 1-bit detection & correction or 2-bit detection
    - d=4: 2-bit detection & 1-bit correction

d | E | C
1 | 0 | 0
2 | 1 | 0
3 | 1 | 1
3 | 2 | 0
4 | 3 | 0
4 | 2 | 1
5 | 4 | 0
5 | 3 | 1
5 | 2 | 2
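The two definitions above are easy to compute directly. A minimal sketch follows (function names are illustrative): one helper counts differing bit positions, the other enumerates every (E, C) pair that satisfies d = E + C + 1 with E ≥ C.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))

def detect_correct_options(d: int) -> list[tuple[int, int]]:
    """All (E, C) pairs with d = E + C + 1 and E >= C >= 0."""
    return [(d - 1 - c, c) for c in range(d) if d - 1 - c >= c]

print(hamming_distance("00110100", "01100101"))
for d in range(1, 6):
    print(d, detect_correct_options(d))
```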

Parity Error Detection

- Add 1 bit to create valid code words with d=2
- Detects single bit errors
  - Also detects all odd number bit errors
- Even parity has even # 1s in code word
  - Code word = data + parity bit
- Odd parity has odd # 1s

Data Word | Code Word (even parity) | Code Word (odd parity)
00        | 00 0                    | 00 1
01        | 01 1                    | 01 0
10        | 10 1                    | 10 0
11        | 11 0                    | 11 1

[Figure: 3-cube for even parity with vertices 000 through 111, marking valid code words and invalid code words (= error)]
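As a small illustration of the code-word table, the sketch below generates a parity bit and checks a received code word. The function names and the list-of-bits representation are assumptions made for the example.

```python
def parity_bit(data_bits, odd=False):
    """Parity bit that makes the number of 1s in the code word even (or odd)."""
    p = 0
    for bit in data_bits:
        p ^= bit
    return p ^ 1 if odd else p

def has_parity_error(code_word, odd=False):
    """True if the count of 1s in the code word has the wrong parity."""
    total = 0
    for bit in code_word:
        total ^= bit
    return total != (1 if odd else 0)

data = [1, 0]
code_word = data + [parity_bit(data)]          # even parity: 10 1
one_flip = [code_word[0] ^ 1] + code_word[1:]  # single-bit error
two_flips = [one_flip[0], one_flip[1] ^ 1, one_flip[2]]  # double-bit error
print(has_parity_error(code_word), has_parity_error(one_flip), has_parity_error(two_flips))
```

The double-bit error goes unnoticed, which is the d=2 limitation: only an odd number of bit errors changes the parity.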

Parity Generator

- XOR tree to generate parity bit for N data bits
  - #XOR gates = N-1
    - Balanced tree (#levels = ⌈log2 N⌉)
    - Linear tree (#levels = N-1)
    - There are other types
- C-testable with 4 vectors
  - All gate-level stuck-at faults
    - Mourad & McCluskey, IEEE Trans. on IE 1989
  - Recall: S⊕T=R, T⊕R=S, & R⊕S=T
  - Algorithm:
    - Assign one vector set to output
    - Assign other 2 sets to inputs
    - Repeat to primary inputs
  - Pseudo-exhaustive testing
  - Assumes connections are known

Vector Sets
S T R
0 0 0
0 1 1
1 0 1
1 1 0

[Figures: balanced and linear 8-input XOR trees (inputs D0 to D7, output P) with each net labeled by its assigned vector set S, T, or R]
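The assignment algorithm can be sketched for a balanced tree as follows. This is an illustrative reconstruction under stated assumptions: the tree is split as evenly as possible at each gate, the output is arbitrarily given set R, and all names (`SEQ`, `assign_sets`, `simulate`, `c_test_vectors`) are hypothetical.

```python
# The three vector sets from the slide, read as 4-step sequences (one bit per test vector).
SEQ = {"S": (0, 0, 1, 1), "T": (0, 1, 0, 1), "R": (0, 1, 1, 0)}

def assign_sets(label, leaves, plan):
    """Give a net its set, then give the two inputs of its driving gate the other two sets."""
    if len(leaves) == 1:
        plan[leaves[0]] = label  # reached a primary input
        return
    first, second = [name for name in "STR" if name != label]
    mid = (len(leaves) + 1) // 2
    assign_sets(first, leaves[:mid], plan)
    assign_sets(second, leaves[mid:], plan)

def simulate(leaves, vector, seen):
    """Evaluate the same balanced tree, recording the input pair seen by every gate."""
    if len(leaves) == 1:
        return vector[leaves[0]]
    mid = (len(leaves) + 1) // 2
    left = simulate(leaves[:mid], vector, seen)
    right = simulate(leaves[mid:], vector, seen)
    seen.setdefault(tuple(leaves), set()).add((left, right))
    return left ^ right

def c_test_vectors(n):
    """Four test vectors for an n-input balanced parity tree."""
    plan = {}
    assign_sets("R", list(range(n)), plan)
    return [[SEQ[plan[i]][k] for i in range(n)] for k in range(4)]

vectors = c_test_vectors(8)
seen = {}
for v in vectors:
    simulate(list(range(8)), v, seen)
print(vectors)
print(all(len(pairs) == 4 for pairs in seen.values()))
```

Because any two of the three sequences XOR to the third and together cover 00, 01, 10, and 11, every gate receives all four input combinations from only four vectors, independent of N.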

Parity Generator

- What if connections are not known?
  - All 0s
    - Applies {00} to all gates
      - Detects any sa1 pin fault
  - Walking 1 thru 0s
    - Applies {01, 10} to all gates
      - Cumulatively detects all pin faults
        - But not all gate-level faults
      - Detects all bridging faults
        - Mourad & McCluskey, TIE '89
  - All 1s
    - Applies {11} to input gates
      - Cumulatively detects gate-level stuck-at faults in first level gates
  - All combinations of two 1s in field of 0s
    - Applies {00, 01, 10, 11} to all gates except output XOR gate
      - Detects all gate-level stuck-at faults in all gates except output
- Pseudo-exhaustive test set: # vectors = $\binom{N}{2} + N = \frac{N^2 + N}{2}$
  - Walking 1s and all combinations of two 1s in field of 0s

[Figure: 8-input parity tree (D0 to D7, output P) annotated with the values that example test patterns apply to each gate]
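A short sketch of the pseudo-exhaustive test set follows. It builds the walking-1 and two-1s patterns, checks the vector count against (N² + N)/2, and applies them to a randomly connected XOR tree to confirm that every gate other than the output gate sees all four input combinations. The random tree builder and all names are assumptions for illustration; it stands in for "unknown connections".

```python
from itertools import combinations
import random

def pseudo_exhaustive_vectors(n):
    """Walking 1s plus all combinations of two 1s in a field of 0s."""
    walking = [tuple(int(i == k) for i in range(n)) for k in range(n)]
    pairs = [tuple(int(i in (a, b)) for i in range(n))
             for a, b in combinations(range(n), 2)]
    return walking + pairs

def random_xor_tree(n, rng):
    """Random binary XOR tree as nested tuples; leaves are input indices."""
    nodes = list(range(n))
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]

def evaluate(node, vector, seen):
    """Evaluate the tree and record the input pair applied to every gate."""
    if isinstance(node, int):
        return vector[node]
    left = evaluate(node[0], vector, seen)
    right = evaluate(node[1], vector, seen)
    seen.setdefault(node, set()).add((left, right))
    return left ^ right

n = 16
tree = random_xor_tree(n, random.Random(1))
seen = {}
for vector in pseudo_exhaustive_vectors(n):
    evaluate(tree, vector, seen)

inner_gates_ok = all(len(pairs) == 4 for gate, pairs in seen.items() if gate != tree)
print(len(pseudo_exhaustive_vectors(64)), inner_gates_ok)
```

For N = 64 the count is 64 + 2016 = 2080 vectors, matching the formula.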

Parity Check

- Regenerate parity over data
- Compare regenerated parity with incoming parity
  - Mismatch indicates bit error
  - Match assumed to indicate no error
- Complete pseudo-exhaustive test set
  - Same as for generator with extra input(s) for incoming parity (and parity control)
  - N = # data bits + 2
  - # vectors = $\binom{N}{2} + N = \frac{N^2 + N}{2}$

[Figure: parity check tree with data inputs D0 to D7, incoming parity P, and a control pin, producing output Perror]

Gate-Level Fault Simulation Results

- Example 64-bit parity tree
    - 64-bit generator, 63-bit parity check, or 62-bit check w/control
  - 254 collapsed pin stuck-at faults
  - 504 collapsed gate-level stuck-at faults (CMOS standard cell XOR)
- 100% fault coverage with walking 1s plus all combinations of two 1s in field of 0s
  - N-detectability: N=37 (pin faults)
    - Gate faults: N=1 (32 flts), N=4 (16 flts), N=18 (8 flts), N=32 (4 flts), N=37

FC_IND = fault coverage of the pattern alone; FC_CUM = cumulative fault coverage.

Test Pattern | # vectors | Pin faults FC_IND | Pin faults FC_CUM | Gate faults FC_IND | Gate faults FC_CUM
All 0s       | 1         | 50%               | 50%               | 25%                | 25%
Walking 1s   | 64        | 99.6%             | 100%              | 87.1%              | 87.5%
All 1s       | 1         | 50%               | 100%              | 25%                | 93.8%
Two 1s in 0s | 2016      | 99.6%             | 100%              | 99%                | 100%

Bridging Faults

- Wired-AND/Wired-OR fault model (bipolar tech)
  - 1 vector {01 or 10} observing 2 outputs (A' and B'), or
  - Observe 1 output (A' or B') with 2 vectors {01, 10}
- Dominant fault model (more accurate for CMOS)
  - 1 vector {01 or 10} observing 2 outputs (A' and B')
    - harder to detect than wired-AND/OR (less observable)
    - detecting all dominant BFs ⇒ detects all wired-AND/OR BFs

Values of A'B' under each bridging fault model:

A B | WAND | WOR | AdomB | BdomA
0 0 | 00   | 00  | 00    | 00
0 1 | 00   | 11  | 00    | 11
1 0 | 00   | 11  | 11    | 00
1 1 | 11   | 11  | 11    | 11

[Figures: nets A, B and their bridged values A', B' under the wired-AND, wired-OR, A dominates B, and B dominates A fault models]
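The four bridging models in the table can be expressed in a few lines. This is a minimal sketch with illustrative names; it also reports which observed net (A' or B') actually carries an error, which is what makes dominant bridges less observable.

```python
MODELS = ("WAND", "WOR", "AdomB", "BdomA")

def bridge(a, b, model):
    """Return (A', B'), the values on two bridged nets under a given fault model."""
    if model == "WAND":
        v = a & b
        return v, v
    if model == "WOR":
        v = a | b
        return v, v
    if model == "AdomB":  # A overrides B
        return a, a
    if model == "BdomA":  # B overrides A
        return b, b
    raise ValueError(f"unknown model: {model}")

def erroneous_outputs(a, b, model):
    """Names of the nets whose bridged value differs from the fault-free value."""
    a_out, b_out = bridge(a, b, model)
    return {name for name, good, bad in (("A'", a, a_out), ("B'", b, b_out)) if good != bad}

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    row = ["%d%d" % bridge(a, b, m) for m in MODELS]
    print(a, b, row)

# Under dominance only the dominated net ever shows an error.
print(erroneous_outputs(0, 1, "AdomB"), erroneous_outputs(1, 0, "AdomB"))
```

With wired-AND/OR, observing a single net with both vectors {01, 10} catches either fault; with a dominant bridge the error only ever appears on the dominated net, so both nets must be observed.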

Bridging Fault Simulation Results

- Dominant bridging fault model using ordered list of nets with bridging faults on adjacent nets in list
  - 125 BFs for pin faults
  - 377 BFs for gate-level faults
  - Includes feedback BFs
    - Faults causing oscillations assumed to be detected
- Pairing every net with every other net would give:
  - 127 pin fault nets: 2 x (N-choose-2) = 16,002 BFs
  - 253 gate-level fault nets: 2 x (N-choose-2) = 63,756 BFs
- 100% FC with all combinations of two 1s in field of 0s
  - 100% gate-level BF not obtained for walking 1s
    - Mourad & McCluskey, Trans. on IE '89 considered only pin faults

Test Pattern | # vectors | Pin faults FC_IND | Pin faults FC_CUM | Gate faults FC_IND | Gate faults FC_CUM
All 0s       | 1         | 0.8%              | 0.8%              | 48.5%              | 48.5%
Walking 1s   | 64        | 100%              | 100%              | 82.5%              | 83.3%
All 1s       | 1         | 0.8%              | 100%              | 49.1%              | 91.8%
Two 1s in 0s | 2016      | 98.4%             | 100%              | 100%               | 100%

Calculating Hamming Code

- H = # Hamming bits
  - $D + H + 1 \le 2^H$
    - D = # data bits
    - Hamming, BSTJ '50
- D=8 example
  - H1=D1⊕D2⊕D4⊕D5⊕D7
  - H2=D1⊕D3⊕D4⊕D6⊕D7
  - H3=D2⊕D3⊕D4⊕D8
  - H4=D5⊕D6⊕D7⊕D8
- Hamming distance, d=3=E+C+1
  - Single bit error detection & correction (SEC)
- Additional parity bit, d=4=E+C+1
  - Parity over data & Hamming bits
  - Double bit error detection (DED) & single bit error correction (SEC)
    - E=2, C=1

Position  | 1  2  3  4  5  6  7  8  9  10 11 12
Bit       | H1 H2 D1 H3 D2 D3 D4 H4 D5 D6 D7 D8
Parity H1 | 1  0  1  0  1  0  1  0  1  0  1  0
Parity H2 | 0  1  1  0  0  1  1  0  0  1  1  0
Parity H3 | 0  0  0  1  1  1  1  0  0  0  0  1
Parity H4 | 0  0  0  0  0  0  0  1  1  1  1  1

Syndrome (rows: upper 4 bits; columns: lower 3 bits)

Syndrome | 000    | 001 | 010 | 011 | 100 | 101 | 110 | 111
0000     | no err | H1  | H2  | D1  | H3  | D2  | D3  | D4
0001     | H4     | D5  | D6  | D7  | D8  | D9  | D10 | D11
0010     | H5     | D12 | D13 | D14 | D15 | D16 | D17 | D18
0011     | D19    | D20 | D21 | D22 | D23 | D24 | D25 | D26
0100     | H6     | D27 | D28 | D29 | D30 | D31 | D32 | D33
0101     | D34    | D35 | D36 | D37 | D38 | D39 | D40 | D41
0110     | D42    | D43 | D44 | D45 | D46 | D47 | D48 | D49
0111     | D50    | D51 | D52 | D53 | D54 | D55 | D56 | D57
1000     | H7     | D58 | D59 | D60 | D61 | D62 | D63 | D64

Error Type              | Condition
No bit error            | Hamming match, no parity error
1-bit correctable error | Hamming mismatch, parity error
2-bit error detection   | Hamming mismatch, no parity error
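The sizing rule and the generator equations can be derived mechanically from the bit positions. The sketch below is illustrative (all function names are assumptions): Hamming bits sit at power-of-two positions, and Hamming bit at position p covers every other position whose binary index has bit p set.

```python
def hamming_bits_needed(d: int) -> int:
    """Smallest H satisfying D + H + 1 <= 2**H."""
    h = 1
    while d + h + 1 > 2 ** h:
        h += 1
    return h

def layout(d: int) -> dict[int, str]:
    """Map 1-based position -> bit label, with Hamming bits at powers of two."""
    h = hamming_bits_needed(d)
    labels, next_d, next_h = {}, 1, 1
    for pos in range(1, d + h + 1):
        if pos & (pos - 1) == 0:  # power of two
            labels[pos] = f"H{next_h}"
            next_h += 1
        else:
            labels[pos] = f"D{next_d}"
            next_d += 1
    return labels

def generator_equations(d: int) -> dict[str, list[str]]:
    """For each Hamming bit, the data bits XORed together to generate it."""
    labels = layout(d)
    return {name: [labels[p] for p in labels if p & pos and p != pos]
            for pos, name in labels.items() if name.startswith("H")}

print(hamming_bits_needed(8), hamming_bits_needed(64))
for name, terms in generator_equations(8).items():
    print(name, "=", " ^ ".join(terms))
```

For D=8 this reproduces the four equations on the slide, and for D=64 it gives H=7 as used in the larger syndrome table.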

Hamming Code Operation

- Example: a RAM or a hard drive
- Input (Generate Circuit):
  - Generate Hamming code for data
  - Store data and Hamming bits
- Output (Detect/Correct Circuit):
  - Regenerate Hamming code for data
  - Bit-wise XOR with stored Hamming bits
    - Non-zero syndrome indicates
      - Error detection
      - Bit position of error bit
        - Flip that bit to correct
  - Use extra parity to determine non-correctable double bit error
    - Can disable correction circuit

[Figure: stored and regenerated Hamming bits (Hstored, Hregenerated) are XORed bit-wise to form the syndrome, which selects the data bit Di to flip, producing corrected bit Di']

Error Detection and Correction

- Single bit error examples
  - D3 is erroneous
    - Changes H2 and H3
      - Syndrome = 0000 110 = bit 6
  - D6 is erroneous
    - Changes H2 and H4
      - Syndrome = 0001 010 = bit 10
  - Odd number of bits change
    - Overall parity bit error (SEC)
- Double bit error example
  - D3 and D6 are erroneous
    - Changes H3 and H4 (not H2)
      - Syndrome = 0001 100 = bit 12
        - Indicates error in D8
  - Even number of bits change
    - No overall parity error (DED)

(Bit positions and syndrome table as given under Calculating Hamming Code.)
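The three examples above can be reproduced with a small SEC-DED model of the D=8 code. This is a sketch under stated assumptions: the code word is held in a list indexed by position 1 to 12, index 0 is used (arbitrarily) for the overall parity bit, and the function names are illustrative.

```python
HAMMING_POSITIONS = (1, 2, 4, 8)
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)  # D1..D8

def encode(data):
    """Build a 13-entry word: index 0 = overall parity, 1..12 = Hamming positions."""
    word = [0] * 13
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for h in HAMMING_POSITIONS:
        for pos in DATA_POSITIONS:
            if pos & h:
                word[h] ^= word[pos]
    word[0] = sum(word[1:]) % 2  # even parity over data and Hamming bits
    return word

def check(word):
    """Return (syndrome, status); the syndrome is the XOR of positions holding a 1."""
    syndrome = 0
    for pos in range(1, 13):
        if word[pos]:
            syndrome ^= pos
    parity_error = sum(word) % 2 == 1
    if parity_error:
        # Odd number of flipped bits: treated as a single, correctable error
        # (syndrome 0 here would mean the overall parity bit itself flipped).
        return syndrome, "SEC"
    if syndrome:
        return syndrome, "DED"  # even number of flips: detect only
    return 0, "no error"

def flip(word, *positions):
    bad = list(word)
    for pos in positions:
        bad[pos] ^= 1
    return bad

good = encode([1, 0, 1, 1, 0, 0, 1, 0])
print(check(good))
print(check(flip(good, 6)))       # D3 is at position 6
print(check(flip(good, 10)))      # D6 is at position 10
print(check(flip(good, 6, 10)))   # D3 and D6 together
```

In the double-error case the syndrome points at position 12 (D8), so correcting on the syndrome alone would corrupt a third bit; the overall parity bit is what flags the error as non-correctable.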

Xilinx Virtex 4 FPGAs

- Contain 48 to 552 18K-bit dual-port RAMs
  - Program from 16Kx1-bit RAM to 512x36-bit RAM
  - Can operate as 24 to 276 36K-bit RAMs with ECC
    - 512x72-bit RAMs
    - Hamming code
    - 64-bit data
    - 7-bit Hamming
      - Single error correction
    - 1-bit overall parity
      - Double error detection
  - Can also operate as FIFOs
    - With ECC mode
    - Or without ECC

[Figure: Virtex-4 floorplan showing PPC cores, with a legend for DSPs, PLBs, Block RAMs/FIFOs, and I/O Buffers]

Xilinx Virtex-4 ECC RAM

- Separate Hamming code generators
  - Separate write & read ports
- Reconvergent fanout in Generate circuit
- No direct observability or control of Hamming or parity bits to detect faults in FT circuit

[Figure: ECC RAM block diagram. Generate: input data (D=64) feeds a Hamming code generator (H=7) and a parity bit generator. RAM core: 512 words, 64+7+1 bits/word, with separate write and read addresses. Detect/Correct: a Hamming code generator, Hamming check, parity bit generator, and parity check drive the bit error correction circuit (output data) and the SEC and DED error indicators. The 64-bit syndrome table is repeated alongside.]

Testing ECC Circuitry

- Init: initialize RAM with vectors with Hamming bit errors
  - Then read out to test Detect/Correct circuit
    - Note: 72 inputs to Detect/Correct circuit vs. 64 inputs to Generate circuit
- Collapsed pin stuck-at faults
  - Generate circuit = 1076
  - Detect/Correct circuit = 2035
- Collapsed gate-level stuck-at faults (CMOS standard cell XOR)
  - Generate circuit = 2112
  - Detect/Correct circuit = 3359

Circuit          | Vectors                                                                  | # Vectors | Pin fault detection | Gate fault detection | Cum. FC
Generate         | all 0s; walk 1-thru-0s                                                   | 65        | 100%                | 87.7%                | 87.7%
Generate         | all 1s                                                                   | 1         | 50%                 | 26.5%                | 93.9%
Generate         | walk two 1s-thru-0s                                                      | 2016      | 99.9%               | 99.6%                | 100%
Detect & Correct | Output of ECC generate vectors                                           | 2082      | 56%                 | 58.4%                | 58.4%
Detect & Correct | Init: all 0s; walk 1-thru-0s; all 1s; all Hamming values w/ data=0s      | 321       | 100%                | 95.2%                | 98.1%
Detect & Correct | Init: walk two 1s-thru-0s                                                | 2556      | 73.5%               | 71.9%                | 100%

Note: 51 configurations of FPGA block RAM contents

Testing ECC Circuitry

- Assuming dominant bridging fault model
  - Recall: detecting all dominant BFs detects all wired-AND/OR BFs
- Number of pin bridging faults
  - Generate circuit = 527
  - Detect/Correct circuit = 821
- Number of gate-level bridging faults
  - Generate circuit = 1583
  - Detect/Correct circuit = 2165

Note: using ordered list of nets with pair-wise faulting of nets in order of list and applying both combinations of each net dominating the other

Circuit          | Vectors                                                                  | # Vectors | Pin BF detection | Gate BF detection | Cum. FC
Generate         | all 0s; walk 1-thru-0s                                                   | 65        | 100%             | 83.7%             | 83.7%
Generate         | all 1s                                                                   | 1         | 17.5%            | 46.2%             | 91.9%
Generate         | walk two 1s-thru-0s                                                      | 2016      | 99.6%            | 99.9%             | 100%
Detect & Correct | Output of ECC generate vectors                                           | 2082      | 78.9%            | 79.1%             | 79.1%
Detect & Correct | Init: all 0s; walk 1-thru-0s; all 1s; all Hamming values w/ data=0s      | 321       | 100%             | 92.5%             | 97%
Detect & Correct | Init: walk two 1s-thru-0s                                                | 2556      | 95.6%            | 85.7%             | 100%

Test Pattern Generator

- Use TPG for reprogrammable PLAs
  - From Designer's Guide to BIST
  - Two N-bit shift registers with reset implemented in PLBs
  - Generates $(N+1)^2$ vectors as shown
    - All 0s
    - Walking 1 through field of 0s
    - All combinations of two 1s in field of 0s
  - Total unique vectors = $\binom{N}{2} + N + 1 = \frac{N^2 + N + 2}{2}$

[Figure: two N-bit shift registers (data in, data out, enable) driving outputs D0 through DN-1, with a Done output]
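The vector counts above can be reproduced with a simple behavioral model. How the two shift registers are combined is not recoverable from the extracted slide, so this sketch assumes each register holds either all 0s (after reset) or a single walking 1, and that the outputs are the bitwise OR of the two registers. The function names are illustrative.

```python
def register_states(n):
    """States of one N-bit shift register: all 0s after reset, then a 1 walking through."""
    return [0] + [1 << k for k in range(n)]

def tpg_vectors(n):
    """All (N+1)^2 output vectors, assuming the two registers are ORed bit-wise."""
    return [a | b for a in register_states(n) for b in register_states(n)]

n = 8
vectors = tpg_vectors(n)
unique = set(vectors)
print(len(vectors), len(unique), (n * n + n + 2) // 2)
```

Under that assumption the model yields (N+1)² vectors in total, of which C(N,2) + N + 1 are unique: the all-0s vector, the N walking-1 vectors, and every combination of two 1s.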

Summary and Conclusions

- Parity error detection circuits
  - Previous algorithms for 100% fault detection
    - But only for known XOR connections in parity tree
    - And only detect all pin-level bridging faults
      - not gate-level bridging faults
  - Pseudo-exhaustive test set:
    - Walk a 1 through a field of 0s, and
    - All combinations of two 1s in a field of 0s
      - Detects all gate-level stuck-at and bridging faults in parity tree
      - Independent of XOR connections

Summary and Conclusions

- Hamming code error correction circuits
  - Problem: detecting faults in circuit designed to tolerate faults
  - Solution: initialize RAM with Hamming error conditions
    - Same pseudo-exhaustive test set as for parity tree
      - Detects all gate & bridging faults in Hamming code generator circuit
    - Add all Hamming bit values with data bits = all 0s
      - Cumulatively detects all gate & bridging faults in error detect/correct circuit
  - Question: Is there a formal proof?