GENERAL  ARTICLE

Error Detection in Numeric Codes A Siddharth

This article investigates the efficiency of four commonly used methods for detecting the most frequent types of errors committed by individuals while entering numeric codes.

1. Introduction

Numeric codes are used in various situations. The UPC (Universal Product Code), UID (Unique Identification Number), credit card numbers, ISBN (International Standard Book Number), airline ticket identification numbers, bank account numbers and phone numbers, to name a few, are all numeric codes. Quite often, these codes must be entered into a database manually by data entry operators, or by users to obtain some information. While doing so, errors are often made, and this may lead to many undesirable consequences. For example, entering a wrong ISBN may result in the purchase of a different book, and entering a wrong debit card number may result in a debit from a wrong bank account. So, it is very important that these errors are detected during data entry so that the codes are rejected and the user cautioned. It is for this purpose that many error detection methods were designed. However, it is not possible to design a method which detects all possible errors. The error detection methods are designed in such a way that they detect the most common errors which occur while entering the numeric codes. There are several coding methods but the most popular ones are:

A Siddharth is currently pursuing BTech in computer science and engineering at IIT Patna. His interests include watching and playing cricket, listening to music and playing sitar. His research interests include cryptography and pattern recognition.

• The Modulus 10 Method (Luhn Algorithm),
• The EAN-13 Scheme (European Article Number; currently International Article Number),

RESONANCE  July 2012

Keywords Decimal codes, error detection, Verhoeff scheme.


• The Modulus 11 Method, and
• The Verhoeff Scheme.

In this article, we will see how each of these methods helps in detecting the most common errors and also compare their efficiencies.

2. Common Errors

The most common errors committed while entering numeric codes are found after extensively studying the errors made by data entry clerks [1]. They are:

Single Digit Error (Transcription Error): This error occurs when only a single digit in the whole code is misread or typed wrongly. For example, 18367521 typed as 18367541. Here the digit 2 is incorrectly typed as 4.

Single Transposition Error: It occurs when the ith and the (i + 1)th digits are replaced by the (i + 1)th and the ith respectively. For example, 12367521 typed as 12365721. Here the digits 7 and 5 are transposed.

Twin Error: This error occurs when two consecutive identical digits are incorrectly typed as two other identical digits. For example, 122345 typed as 177345. Here the digits 22 are incorrectly typed as 77.

Jump Transposition Error: It occurs when the ith and the (i + 2)th digits are replaced by the (i + 2)th and the ith respectively. For example, 12367521 typed as 12357621. Here the digits 6 and 5 are jump transposed.

Jump Twin Error: This error occurs when the ith and (i + 2)th digits are identical and are mistyped as two other identical digits. For example, 123245 mistyped as 173745. As 232 is mistaken to be 737, there is a jump twin error.


Table 1. Common error types and their statistical probabilities of occurrence.

Error Type              Probability of Occurrence
Single Error            79.1%
Single Transposition    10.2%
Twin Error              0.5%
Jump Transposition      0.8%
Jump Twin Error         0.3%
Phonetic Error          0.5%

Phonetic Error: In many languages around the world, 1a and a0 (where a is any digit from 2 to 9) sound similar. This causes a lot of errors. Thus in a code, when 1a is heard as a0 and vice-versa, there is a phonetic error. For example, 13546 (thirteen thousand five hundred and forty six) may be heard as 30546 (thirty thousand five hundred and forty six) and incorrectly entered.

It may also be noted that more than one type of error can occur at the same time. A large number of experiments have been carried out to find the frequency of occurrence of these errors and the results are given in Table 1 [1].

3. Error Detection Methods

There are two parts to every error detection method. The first part is assigning a check digit as per the checking algorithm of a particular method and appending it to the end of the code. The second part is checking if a particular code is correct. The four most popular error detection methods are the Modulus 10 method, the EAN-13 scheme, the Modulus 11 method and the Verhoeff scheme. All of these methods append a single check digit at the end of the code. The check digit is different for each of the four methods mentioned above as they use different algorithms to compute it.
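Before the individual methods are described, the six error types of Section 2 can be made concrete with a small classifier. This is only an illustrative sketch: the function names `classify_error` and `is_phonetic` are hypothetical, and the code assumes that the correct and the typed codes have the same length and that at most one error of the listed types has occurred.

```python
def is_phonetic(pair_a: str, pair_b: str) -> bool:
    """True if one pair reads '1a' and the other 'a0' for a digit a in 2..9."""
    def one_way(x: str, y: str) -> bool:
        return x[0] == "1" and x[1] in "23456789" and y == x[1] + "0"
    return one_way(pair_a, pair_b) or one_way(pair_b, pair_a)


def classify_error(correct: str, typed: str) -> str:
    """Name the error type of Section 2 that turns `correct` into `typed`."""
    if len(correct) != len(typed):
        return "other"
    # Positions at which the two codes disagree.
    diff = [i for i, (c, t) in enumerate(zip(correct, typed)) if c != t]
    if not diff:
        return "no error"
    if len(diff) == 1:
        return "single digit error"
    if len(diff) == 2:
        i, j = diff
        swapped = correct[i] == typed[j] and correct[j] == typed[i]
        twin = correct[i] == correct[j] and typed[i] == typed[j]
        if j == i + 1:
            if is_phonetic(correct[i:j + 1], typed[i:j + 1]):
                return "phonetic error"
            if swapped:
                return "single transposition error"
            if twin:
                return "twin error"
        elif j == i + 2:
            if swapped:
                return "jump transposition error"
            if twin:
                return "jump twin error"
    return "other"


# The examples used in the text.
for correct, typed in [("18367521", "18367541"), ("12367521", "12365721"),
                       ("122345", "177345"), ("12367521", "12357621"),
                       ("123245", "173745"), ("13546", "30546")]:
    print(correct, "->", typed, ":", classify_error(correct, typed))
```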


3.1 The Modulus 10 Method

The modulus 10 method is used to detect errors in the postal code called the zip (zoning improvement plan) code in the USA [1]. As the name suggests, the modulus 10 method uses modulo arithmetic with its modulus as 10. The method is as follows:

Part 1: Assigning check digit.

Let the number to which the check digit is to be assigned be $a_n a_{n-1} \ldots a_2$. Here, $0 \le a_i \le 9$, where $2 \le i \le n$ and $n$ is an integer greater than 1. The check digit, $a_1$, is found such that

$$\left( \sum_{i=1}^{n} a_i \right) \pmod{10} = 0.$$

The check digit is appended as the last digit of the number.

Part 2: Checking if the code has been interpreted correctly.

Let the number which has to be checked be $a_n a_{n-1} \ldots a_1$ (with $a_1$ being the check digit). Here, $0 \le a_i \le 9$, where $1 \le i \le n$ and $n$ is any integer greater than 1. The code is assumed to be correct if and only if

$$\left( \sum_{i=1}^{n} a_i \right) \pmod{10} = 0.$$

(Note: x (mod z) gives the remainder when x is divided by z.)
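The two parts above can be sketched in a few lines of Python. This is a minimal illustration of the unweighted digit sum described here, not a reference implementation; the function names `mod10_check_digit` and `mod10_is_valid` are hypothetical, and codes are assumed to be strings of decimal digits.

```python
def mod10_check_digit(payload: str) -> str:
    """Part 1: digit a_1 that makes the sum of all digits a multiple of 10."""
    total = sum(int(ch) for ch in payload)
    return str((10 - total % 10) % 10)


def mod10_is_valid(code: str) -> bool:
    """Part 2: accept the code only if its digit sum is 0 (mod 10)."""
    return sum(int(ch) for ch in code) % 10 == 0


payload = "823451"
code = payload + mod10_check_digit(payload)
print(code, mod10_is_valid(code))
print(mod10_is_valid("8234547"))  # one digit mistyped (1 typed as 4)
print(mod10_is_valid("2834517"))  # first two digits transposed
```

Because only the digit sum is used, the mistyped digit changes the result while the transposition does not, which is the behaviour analysed in the text.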


Example: Let the numeric code which is to be assigned the check digit be 823451. The check digit is found from the equation

$$\left( \sum_{i=1}^{n} a_i \right) \pmod{10} = 0. \qquad \text{Here, } \sum_{i=2}^{n} a_i = 23.$$

So, $(a_1 + 23) \pmod{10} = 0$, i.e., $a_1 = 7$. Thus, the numeric code is 8234517 with the check digit as 7.

Analysis

It can be seen that the modulus 10 method depends only on the sum of digits constituting the numeric code.

• If there is a single digit error (say $a$ replaced by $b$), the total sum of the digits of the faulty code differs from the sum of digits of the actual code, which is 0 (mod 10), by $(b - a)$. The error will not be detected by the modulus 10 method only if $(b - a) \pmod{10}$ equals 0. But as $b$ and $a$ are two unequal single digit numbers, $(b - a) \pmod{10}$ will never be 0. Hence all single digit errors will be detected using the modulus 10 method.
• In case of phonetic errors, the total sum of the digits of the faulty code always differs from the sum of digits of the actual code by $-1$ or $+1$. Hence the sum of the digits of a code with the phonetic error is always 1 (mod 10) or 9 (mod 10), that is, the remainder is never 0. Hence, all phonetic errors will also be detected by this method.
• In case of single and jump transposition errors, the total sum of the digits of the faulty code and the sum of the digits of the actual code are the same. Consequently, the sum of the digits of the faulty code is equal to 0 (mod 10). Hence these two types of errors will not be detected by this method.
• In case of twin and jump twin errors, the total sum of the digits of the faulty code differs from the sum of digits of the actual code by $2(b - a)$. Hence the error is not detected by the modulus 10 method if $2(b - a) \pmod{10}$ equals 0. This is possible only if $a$ and $b$ differ by 5. This implies that 10 out of the 90 twin and jump twin errors are not detected by this method.

We can see that the modulus 10 method detects not only all the single digit and phonetic errors but also 88.89% of all the twin and jump twin errors. However, it will not detect single transposition and jump transposition errors. So, it will detect 80.31% of all errors. Also note that this method is independent of the number of digits present in the code.

3.2 The EAN-13 Scheme

EAN-13 [2,8] is a standard coding scheme currently used to identify products uniquely all over the world. It is a numeric code consisting of thirteen digits of which the first twelve digits represent the product code. The thirteenth digit is the check digit. The check digit is computed as shown below:

Part 1: Assigning check digit.

Let the number to which the check digit is to be assigned be $a_n a_{n-1} \ldots a_2$. Here, $0 \le a_i \le 9$, where $2 \le i \le n$ and $n$ is equal to 13. Then, the check digit, $a_1$, is determined such that

$$\left( \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i+1} + 3 \sum_{i=1}^{\lfloor n/2 \rfloor} a_{2i} \right) \pmod{10} = 0.$$

The check digit is appended as the last digit of the number.

Part 2: Checking if the code has been interpreted correctly.

Let the number which has to be checked be $a_n a_{n-1} \ldots a_1$ (with $a_1$ being the check digit). Here, $0 \le a_i \le 9$, where $1 \le i \le n$ and $n$ is equal to 13. The code is assumed to be correct if and only if

$$\left( \sum_{i=0}^{\lfloor n/2 \rfloor} a_{2i+1} + 3 \sum_{i=1}^{\lfloor n/2 \rfloor} a_{2i} \right) \pmod{10} = 0.$$


If the Part 2 condition is not satisfied, the code definitely has an error.

Example: Let the numeric code which is to be assigned the check digit be 137142823451. The check digit is found from the equation

$$\left( a_1 + \left( \sum_{i=1}^{\lfloor n/2 \rfloor} a_{2i+1} + 3 \sum_{i=1}^{\lfloor n/2 \rfloor} a_{2i} \right) \right) \pmod{10} = 0,$$

i.e., $(a_1 + 67) \pmod{10} = 0$. So, $a_1$ is equal to 3. Thus, the numeric code is 1371428234513.

Analysis

Errors detected by this scheme are:

• The EAN-13 scheme detects all the single transcription errors. This is because the difference in the sums of the faulty and the actual numeric code is equal to either $(b - a_i)$ (if the subscript of the erroneous digit is odd) or $3(b - a_i)$ (if the subscript of the erroneous digit is even). And this can never be 0 (mod 10) as $a_i$ and $b$ are two unequal single digit numbers.
• It detects 88.89% of all single transposition errors as the difference in the weighted sum of the faulty and the actual numeric code is equal to double the difference between the transposed digits, and this difference is equal to 0 (mod 10) only if the transposed digits differ by 5. And this happens only in 10 of the total possible 90 cases.
• The EAN-13 scheme also detects 88.89% of all twin errors and jump twin errors besides all phonetic errors.
• However, this method does not detect any jump transposition errors.

Therefore, this method can detect 89.37% of all errors. Also note that this method is independent of the number of digits present in the code and can be extended to numeric codes with any number of digits. Hence it is better than the modulus 10 method since the total error detection percentage is higher.

3.3 The Modulus 11 Method

This method [3,4,7] was commonly used to provide an ISBN-10 code to each book published. The ISBN-10 code is a ten digit number which provides information about a book's publisher and the language in which it is written. Thus this code helped the bookstores in sorting and identifying books. The modulus 11 method, by adding a check digit to the number, enhances the usage of ISBN-10 codes by detecting most of the errors which are committed while entering the ISBN code. Since 2007, books are given an ISBN-13 code which uses the EAN-13 scheme to calculate its check digit. The modulus 11 method is as follows:

Part 1: Assigning check digit.

Let the number to which the check digit is to be assigned be $a_n a_{n-1} \ldots a_2$, where $a_2$ is the last digit of the number. Here, $0 \le a_i \le 9$, where $2 \le i \le n$ and $2 \le n \le 10$. The check digit $a_1$ is found such that

$$\left( \sum_{i=1}^{n} i \cdot a_i \right) \pmod{11} = 0.$$

The check digit is appended to the end of this number. If the calculation shows the check digit to be 10, the character 'X' is used instead.

Part 2: Checking if the code has been interpreted correctly.

Let the number which has to be checked be $a_n a_{n-1} \ldots a_1$. Here, $0 \le a_i \le 9$, where $1 \le i \le n$ and $2 \le n \le 10$. The number is assumed to be correct if and only if

$$\left( \sum_{i=1}^{n} i \cdot a_i \right) \pmod{11} = 0.$$

Otherwise, there is definitely an error.

Example: Let the numeric code which is to be assigned the check digit be 823451. The check digit is found from the equation

$$\left( a_1 + \sum_{i=2}^{n} i \cdot a_i \right) \pmod{11} = 0.$$

Here, $\sum_{i=2}^{n} i \cdot a_i = 116$. So, $(a_1 + 116) \pmod{11} = 0$, i.e., $a_1 = 5$. Thus, the numeric code is 8234515.

Analysis

Normally, $n$ is less than 11 and the weights used for $a_{10}, a_9, \ldots, a_1$ are 10, 9, ..., 1 respectively. But if $n$ exceeds 10, the weights are repeated: $a_{11}$ gets the weight 1, $a_{12}$ gets the weight 2, $a_{20}$ gets the weight 10, $a_{21}$ gets the weight 1, i.e., the weight of $a_i$ is $((i - 1) \pmod{10} + 1)$. The reason why the weights are repeated after every 10 digits is that if the weight 11 (or any multiple of 11) is used, then the transcription error is not detected if a digit whose weight is 11 (or any multiple of 11) is mis-interpreted to be another digit. Further, there is no change in the weighted sum (mod 11) if weights which are congruent (mod 11) are used. (Note: Two numbers $a$ and $b$ are said to be congruent (mod $z$) iff $a \pmod{z} = b \pmod{z}$.)

It can be seen that the modulus 11 method depends only on the weighted sum of the digits of the number.
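The assignment and checking rules, together with the weight-repetition rule just described, can be sketched as follows. This is an illustrative sketch rather than an official ISBN routine; the names `weight`, `mod11_check_digit` and `mod11_is_valid` are hypothetical, and 'X' is accepted only in the check digit position.

```python
def weight(i: int) -> int:
    """Weight of a_i: 1, 2, ..., 10, then repeating, as described above."""
    return (i - 1) % 10 + 1


def mod11_check_digit(payload: str) -> str:
    """Part 1: check digit a_1 for the digits a_n ... a_2 ('X' stands for 10)."""
    s = sum(weight(i) * int(ch)
            for i, ch in enumerate(reversed(payload), start=2))
    a1 = (-s) % 11
    return "X" if a1 == 10 else str(a1)


def mod11_is_valid(code: str) -> bool:
    """Part 2: the weighted sum of a_n ... a_1 must be 0 (mod 11)."""
    total = 0
    for i, ch in enumerate(reversed(code), start=1):
        value = 10 if (i == 1 and ch == "X") else int(ch)
        total += weight(i) * value
    return total % 11 == 0


payload = "823451"
code = payload + mod11_check_digit(payload)
print(code, mod11_is_valid(code))
print(mod11_is_valid("2834515"))   # first two digits transposed
print(mod11_check_digit("6"))      # a payload whose check digit is 10
```

Unlike the modulus 10 sketch, the transposed code is rejected here, because the two swapped digits carry different weights.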


The error is not detected only if the remainder obtained by dividing the difference of the weighted sums of the actual and the mis-interpreted number by 11 is zero.

• The modulus 11 method detects all transcription errors (single digit). This is because the difference between the weighted sums of the correct and the mis-interpreted number is equal to $(b - a) \cdot ((i - 1) \pmod{10} + 1)$ (considering the error occurs at the $i$th digit and that $a$ is mistaken for $b$). Since 11 is prime and both factors are non-zero and smaller than 11 in magnitude, this can never be equal to 0 (mod 11). Hence all single digit errors are detected by this method.
• This method also detects all single and jump transposition errors because the difference between the weighted sums of the correct and the mis-interpreted number is equal to the difference between the transposed digits for single transposition errors and double the difference between the transposed digits for jump transposition errors, and these differences can never be 0 (mod 11).
• This method also detects all jump twin errors.

Thus, 91.3% of all the errors are detected for the 10 digit ISBN code. The drawback of the modulus 11 scheme is that the check digit can also be a character 'X', which is used to represent the number 10.


Modulus 11 codes can be constructed without the need to use 'X' as check digit by eliminating all the codes which require 'X' as the check digit. But this will reduce the number of usable codes by 9.1%. This is because 9.1% of all the codes have their check digit as 'X' and hence only 90.9% of the codes are available for use.

3.4 The Verhoeff Scheme

Single digit (transcription) and transposition errors are the most common errors committed in entering numeric codes, their combined probability of occurrence being 89.3%. The modulus 10 method and the EAN-13 scheme fail to detect all single transposition errors. Although the modulus 11 method is successful in detecting all single digit and transposition errors, it has a disadvantage: it uses the character 'X' to represent 10. The Verhoeff scheme not only detects all single digit transcription and transposition errors but also works for codes of any length and, unlike the modulus 11 method, uses only decimal digits. Unlike the methods described so far, which use ordinary arithmetic, the Verhoeff scheme uses permutations and non-commutative dihedral group operations to evaluate the check digit. This coding scheme is being used by 'AADHAAR' [6] to give a UID (Unique Identification Number) to each citizen of India (see Box 1).

Box 1. Aadhaar: Unique Identification Number

Aadhaar is a 12 digit unique number which will be issued to all residents in India. It is a random number generated devoid of any classification based on caste, creed, religion and geography. The number will be stored in a centralized database and linked to basic demographic and biometric information of each individual. The biometric information is photograph, ten fingerprints and iris scan. It is meant to eliminate duplicate and false identities in current Government and private databases. It is expected to benefit below poverty line population who are given jobs and pensions by numerous Government schemes and prevent leakage in public distribution system.

Permutations: A permutation of a set of digits and/or characters is a rule for symmetrically replacing one character with another from the same set. A permutation is always a one-one and onto function.

For example, the permutation f = (0,1)(2,3,4,5)(6,7,8,9) means that f(0) = 1, f(1) = 0, f(2) = 3, f(3) = 4, f(4) = 5, f(5) = 2, f(6) = 7, f(7) = 8, f(8) = 9, f(9) = 6. Thus the permutation follows a cyclic order inside each bracket.

Fact: All reflective and rotational symmetries applied on an object/figure can be represented as permutations.

Figure 1. Reflective and rotational symmetry of a pentagon.

From Figure 1, we can see that a reflective symmetry can be represented as (a b c d e) → (b a e d c), and a rotational symmetry can be represented as (a b c d e) → (e a b c d).

Dihedral group: For every n-gon, there are n reflective symmetries, n−1 rotational symmetries and 1 identity symmetry. The collection of all permutations of the reflective and rotational symmetries along with the identity symmetry is called the dihedral group and is represented as $D_{2n}$. As there are 10 single digit numbers, we use the dihedral group $D_{10}$. Each permutation in the group is assigned a single digit number. The operation '*' applied to decimal digits is obtained from the Cayley table (Table 2). In Box 2 we explain how the Cayley table is constructed. The values of a and b are given in the first column and the first row respectively in Table 2. Given a and b, (a*b) is found from this table. For example, 2*5 = 7. Observe that the operation * is not commutative: (b*a) = 5*2 = 8. (Out of 100 entries in Table 2, 60 are not commutative.) We now give the method.

Table 2. The Cayley table used by the Verhoeff Scheme for the operation '*' (rows give a, columns give b, and each entry is a*b).

a \ b   0  1  2  3  4  5  6  7  8  9
0       0  1  2  3  4  5  6  7  8  9
1       1  2  3  4  0  6  7  8  9  5
2       2  3  4  0  1  7  8  9  5  6
3       3  4  0  1  2  8  9  5  6  7
4       4  0  1  2  3  9  5  6  7  8
5       5  9  8  7  6  0  4  3  2  1
6       6  5  9  8  7  1  0  4  3  2
7       7  6  5  9  8  2  1  0  4  3
8       8  7  6  5  9  3  2  1  0  4
9       9  8  7  6  5  4  3  2  1  0


Table 3. Permutation used by Verhoeff.

Number             0  1  2  3  4  5  6  7  8  9
Permutation (f)    1  5  7  6  2  8  3  0  9  4

Table 4. Inverse of each number obtained from the Cayley table.

Number             0  1  2  3  4  5  6  7  8  9
Inverse (inv)      0  4  3  2  1  5  6  7  8  9

Part 1: Assigning check digit.

Let the number to which the check digit is to be assigned be $a_n a_{n-1} \ldots a_2$. Here, $0 \le a_i \le 9$, where $2 \le i \le n$ and $n$ is any integer greater than 1. Define $f^i(x) = f(f^{i-1}(x))$, where $f$ is the permutation used in the Verhoeff scheme. Evaluate $f(a_2) * f^2(a_3) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n)$ using the binary operation given in Table 2 and the permutation given in Table 3 or any other suitable permutation. (As '*' is associative, the grouping of the operations does not matter, although the order of the operands does.) Let the result obtained be $z$ (it is a single digit number). From Table 4, find the inverse of $z$, i.e., the number $a_1$ for which $a_1 * z = 0$. The inverse of $z$ is the check digit ($a_1$) which is to be appended to the end of the number.

Part 2: Checking if the code has been interpreted correctly.

Let the number which has to be checked be $a_n a_{n-1} \ldots a_2 a_1$. Here, $0 \le a_i \le 9$, where $1 \le i \le n$ and $n$ is any integer greater than 1. Evaluate $a_1 * f(a_2) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n)$. If the result is zero, then the number is assumed to be correctly interpreted. If not, there is an error.

Example: Let the numeric code which is to be assigned the check digit be 823451. The check digit, $a_1$, must be such that $a_1 * f(a_2) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n) = 0$.


Box 2. Constructing the Cayley Table (with binary operation '*') for the Dihedral Group $D_{10}$ Without Using the Group Members

The construction of the Cayley table can be explained using two pentagons.

Figure A. Pentagons used to construct the Cayley table.

To find the number a*b, follow the steps below. Here a and b are integers with $0 \le a, b \le 9$.

Case 1: a < 5, b < 5.
1. Go to the point 'a' in the first pentagon.
2. Advance 'b' points in the clockwise direction to find the number representing a*b.
3. Mathematically, a*b = (a + b) (mod 5).

Case 2: a < 5, b ≥ 5.
1. Go to the point (a + 5) in the second pentagon.
2. Advance 'b' points in the clockwise direction to find the number representing a*b.
3. Mathematically, a*b = (a + b) (mod 5) + 5.

Case 3: a ≥ 5, b < 5.
1. Go to the point 'a' in the second pentagon.
2. Advance 'b' points in the anti-clockwise direction to find the number representing a*b.
3. Mathematically, a*b = (a − b) (mod 5) + 5.

Case 4: a ≥ 5, b ≥ 5.
1. Go to the point (a − 5) in the first pentagon.
2. Advance 'b' points in the anti-clockwise direction to find the number representing a*b.
3. Mathematically, a*b = (a − b) (mod 5).
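The four cases of Box 2 translate directly into a short function. The sketch below (the name `star` is hypothetical) builds the whole table from the formulas and checks the two products quoted in the text, the count of non-commuting entries, and associativity.

```python
def star(a: int, b: int) -> int:
    """a*b in the dihedral group D10, following the four cases of Box 2."""
    if a < 5 and b < 5:
        return (a + b) % 5          # Case 1
    if a < 5:
        return (a + b) % 5 + 5      # Case 2: b >= 5
    if b < 5:
        return (a - b) % 5 + 5      # Case 3: a >= 5
    return (a - b) % 5              # Case 4: both >= 5


# Row a, column b, as the text describes Table 2.
table = [[star(a, b) for b in range(10)] for a in range(10)]
for a, row in enumerate(table):
    print(a, "|", " ".join(str(x) for x in row))

non_commuting = sum(1 for a in range(10) for b in range(10)
                    if star(a, b) != star(b, a))
associative = all(star(star(a, b), c) == star(a, star(b, c))
                  for a in range(10) for b in range(10) for c in range(10))
print(star(2, 5), star(5, 2), non_commuting, associative)
```

Python's `%` operator returns a non-negative remainder for a positive modulus, so `(a - b) % 5` already matches the "(mod 5)" of Cases 3 and 4 without further adjustment.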


Assume that this scheme uses the permutation of Table 3, namely, f = (1,5,8,9,4,2,7,0)(3,6).

So, $f(1) = 5$; $f^2(5) = f(f(5)) = f(8) = 9$; $f^3(4) = f(f(f(4))) = 0$. Similarly, $f^4(3) = 3$; $f^5(2) = 8$; $f^6(8) = 1$.

So, $f(a_2) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n) = (5*9)*(0*3)*(8*1) = (1*3)*7 = 4*7 = 6$.

So, the check digit is such that $a_1 * 6 = 0$. This is possible only if $a_1 = 6$. So, the numeric code is 8234516.

Now, if there is a single transposition error, say 8 and 2 are transposed, then $a_1 * f(a_2) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n) = 6*(5*9)*(0*3)*(0*9) = (6*1)*(3*9) = 5*7 = 3$. Hence, $a_1 * f(a_2) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n) \neq 0$, and this transposition error is detected.

Analysis

• The Verhoeff scheme detects all single digit errors and this does not depend on the permutation used in the scheme. The proof is as follows: For the correct number, $a_1 * f(a_2) * \cdots * f^{i-1}(a_i) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n) = 0$. If $a_i$ is replaced by $b$ and the error has to go undetected, $a_1 * f(a_2) * \cdots * f^{i-1}(b) * \cdots * f^{n-2}(a_{n-1}) * f^{n-1}(a_n) = 0$. Cancelling the common factors on either side, which is allowed in a group, this is possible only if $f^{i-1}(a_i) = f^{i-1}(b)$, which cannot be true as $f$ is a one-one and onto function.
• Only some permutations can detect all transposition errors. With the help of the same argument as in the previous case, we can say that the error is detected if $f^{i-1}(a_i) * f^i(a_{i+1}) \neq f^{i-1}(a_{i+1}) * f^i(a_i)$. Without loss of generality, we can take $f^{i-1}(a_i) = a$ and $f^{i-1}(a_{i+1}) = b$. So, the error is detected if $a * f(b) \neq b * f(a)$.
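The two parts of the scheme, in the indexing used in this article (digits counted from the right, check digit $a_1$, digit $a_i$ permuted $i-1$ times), can be sketched as below. This is an illustration under those assumptions and not a drop-in replacement for other published variants of the algorithm, which may index positions differently. All function names are hypothetical; `star` follows the formulas of Box 2 and the permutation is built from its cycle notation.

```python
def star(a: int, b: int) -> int:
    """a*b in D10, using the formulas of Box 2."""
    if a < 5:
        return (a + b) % 5 + (5 if b >= 5 else 0)
    return (a - b) % 5 + (5 if b < 5 else 0)


def permutation_from_cycles(cycles):
    """Turn cycle notation such as (1,5,8,9,4,2,7,0)(3,6) into a lookup list."""
    f = list(range(10))
    for cycle in cycles:
        for k, x in enumerate(cycle):
            f[x] = cycle[(k + 1) % len(cycle)]
    return f


F = permutation_from_cycles([(1, 5, 8, 9, 4, 2, 7, 0), (3, 6)])


def f_power(x: int, i: int) -> int:
    """Apply the permutation i times."""
    for _ in range(i):
        x = F[x]
    return x


def verhoeff_check_digit(payload: str) -> str:
    """Part 1: a_1 is the inverse of z = f(a_2) * f^2(a_3) * ... * f^(n-1)(a_n)."""
    z = 0  # 0 is the identity element
    for i, ch in enumerate(reversed(payload), start=1):
        z = star(z, f_power(int(ch), i))
    return str(next(a for a in range(10) if star(a, z) == 0))


def verhoeff_is_valid(code: str) -> bool:
    """Part 2: a_1 * f(a_2) * ... * f^(n-1)(a_n) must equal 0."""
    c = int(code[-1])
    for i, ch in enumerate(reversed(code[:-1]), start=1):
        c = star(c, f_power(int(ch), i))
    return c == 0


code = "823451" + verhoeff_check_digit("823451")
print(code, verhoeff_is_valid(code), verhoeff_is_valid("2834516"))

# Condition from the analysis: a * f(b) != b * f(a) for every pair a != b.
print(all(star(a, F[b]) != star(b, F[a])
          for a in range(10) for b in range(10) if a != b))
```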


Permutations that detect all single transposition errors can be found as follows:

Consider $f(x) = f_1(x)$ for $x \le 4$ and $f(x) = f_2(x)$ for $x \ge 5$. $f$ is a permutation defined by $f_1$ and $f_2$.

Let us take 2 cases:

Case 1: $f_1(a) \le 4$, $f_2(a) > 4$.
Case 2: $f_1(a) > 4$, $f_2(a) \le 4$.

There may be even other types of permutations which detect all the single transposition errors but it is easier to find permutations using the above conditions due to the structure of the Cayley table.

Assume that the digits a and b have been transposed.

Case 1.1: a < 5 and b < 5. If the permutation detects all the single transposition errors, $a * f_1(b) \neq b * f_1(a)$. Using Box 2: $(a + f_1(b)) \pmod 5$ is not equal to $(b + f_1(a)) \pmod 5$, i.e., $(f_1(b) - b) \pmod 5$ is not equal to $(f_1(a) - a) \pmod 5$.

Case 1.2: a ≥ 5 and b ≥ 5. If the permutation detects all the single transposition errors, $a * f_2(b) \neq b * f_2(a)$. Using Box 2: $(a - f_2(b)) \pmod 5$ is not equal to $(b - f_2(a)) \pmod 5$, i.e., $(f_2(a) + a) \pmod 5$ is not equal to $(f_2(b) + b) \pmod 5$.


Case 1.3: a < 5 and b ≥ 5. If the permutation detects all the single transposition errors, $a * f_2(b) \neq b * f_1(a)$. Using Box 2: $(a + f_2(b)) \pmod 5 + 5$ is not equal to $(b - f_1(a)) \pmod 5 + 5$, i.e., $(a + f_1(a)) \pmod 5$ is not equal to $(b - f_2(b)) \pmod 5$.

Taking $f_1(a) = (k_1 a + x_1) \pmod 5$ and $f_2(a) = (k_2 a + x_2) \pmod 5 + 5$, we find that all the three conditions are satisfied if $k_1 \equiv 4 \pmod 5$ and $k_2 \equiv 1 \pmod 5$ and $x_1$ is not congruent to $-x_2$ with modulus 5. We see that the pair $(x_1, x_2)$ can occur in 20 ways. Thus, there are 20 such permutations for which all transposition errors are detected. Similarly, we can find 20 other permutations from Case 2. Hence there are at least 40 permutations which can successfully help the Verhoeff scheme detect all transcription and transposition errors. All these permutations can thus detect at least 89.3% of all errors.

Although all these permutations detect all single digit and single transposition errors, they have different success rates for detecting other types of errors. For example, if the Verhoeff scheme uses the permutation (1,5,8,9,4,2,7,0)(3,6), then it can successfully detect all single digit and single transposition errors, 95.555% of twin errors, 94.222% of jump twin and jump transposition errors and 95.833% of phonetic errors [9, 10]. Hence, the permutation suggested by Verhoeff successfully detects 91.3% of all possible errors.
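The count of 20 permutations obtained from Case 1 can be checked by brute force. The sketch below enumerates all 25 pairs $(x_1, x_2)$ with $k_1 = 4$ and $k_2 = 1$ and tests the condition $a * f(b) \neq b * f(a)$ directly. The function names are hypothetical and `star` again follows the formulas of Box 2.

```python
def star(a: int, b: int) -> int:
    """a*b in D10, using the formulas of Box 2."""
    if a < 5:
        return (a + b) % 5 + (5 if b >= 5 else 0)
    return (a - b) % 5 + (5 if b < 5 else 0)


def case1_permutation(x1: int, x2: int) -> list:
    """f1(a) = (4a + x1) mod 5 on 0..4 and f2(a) = (a + x2) mod 5 + 5 on 5..9."""
    lower = [(4 * a + x1) % 5 for a in range(5)]
    upper = [(a + x2) % 5 + 5 for a in range(5, 10)]
    return lower + upper


def detects_all_transpositions(f: list) -> bool:
    """True if a * f(b) != b * f(a) for every pair of distinct digits."""
    return all(star(a, f[b]) != star(b, f[a])
               for a in range(10) for b in range(10) if a != b)


good = [(x1, x2) for x1 in range(5) for x2 in range(5)
        if detects_all_transpositions(case1_permutation(x1, x2))]
print(len(good))
# Every accepted pair satisfies x1 != -x2 (mod 5), as derived above.
print(all((x1 + x2) % 5 != 0 for x1, x2 in good))
```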


Conclusion

In this article, we have described four methods of appending a check digit to numeric codes which use decimal digits, with the aim of detecting transcription and transposition errors. While the modulus 10, EAN-13 and modulus 11 methods use simple commutative arithmetic to evaluate the check digit and validate codes, the Verhoeff scheme, devised by J Verhoeff, uses non-commutative dihedral group operations and permutations to do so.

The modulus 11 method detects all transcription and single transposition errors. In addition, it also detects all jump twin and jump transposition errors, 88.89% of twin errors and 90% of phonetic errors for codes ten digits long (Table 5). However, it uses a character 'X' to represent the check digit 10, which is the method's greatest drawback. Nevertheless, all the codes which have their check digit as 'X' can be deleted, thus overcoming the drawback. But this reduces the total number of codes that can be generated using this method.

The Verhoeff scheme presents a better alternative. It detects all single transcription and single transposition errors. The success rates of detection of other errors depend on the permutation used by the Verhoeff scheme. Also, unlike the modulus 11 method, this method uses only decimal digits as its check digits, thus overcoming this drawback of the modulus 11 method. However, the number of operations required to compute the check digit and to check if a particular code is correct is very large compared to the modulus 11 method.

Table 5. Error detection rates of numeric codes.

Errors                  Modulus 10   Modulus 11 (for codes   EAN-13    Verhoeff
                        Method       10 digits long)         Scheme    Scheme
Single Error            100%         100%                    100%      100%
Single Transposition    0%           100%                    88.89%    100%
Jump Transposition      0%           100%                    0%        94.222%
Twin Error              88.89%       88.89%                  88.89%    95.555%
Phonetic Error          100%         90%                     100%      95.833%
Jump Twin               88.89%       100%                    88.89%    94.222%
Total Accuracy          80.31%       91.3%                   89.37%    91.3%
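The "Total Accuracy" row appears to be the per-type detection rates of Table 5 weighted by the frequencies of Table 1. The sketch below reproduces the four totals to within rounding under that assumption, and under the further assumption that the errors not covered by the six listed types (the remaining 8.6%) are counted as undetected. The dictionary keys and the function name `total_accuracy` are hypothetical.

```python
# Frequencies of occurrence from Table 1 (percent of all errors).
FREQ = {"single": 79.1, "transposition": 10.2, "jump_transposition": 0.8,
        "twin": 0.5, "phonetic": 0.5, "jump_twin": 0.3}

# Detection rates from Table 5 (percent of errors of each type).
RATES = {
    "Modulus 10": {"single": 100, "transposition": 0, "jump_transposition": 0,
                   "twin": 88.89, "phonetic": 100, "jump_twin": 88.89},
    "Modulus 11": {"single": 100, "transposition": 100, "jump_transposition": 100,
                   "twin": 88.89, "phonetic": 90, "jump_twin": 100},
    "EAN-13": {"single": 100, "transposition": 88.89, "jump_transposition": 0,
               "twin": 88.89, "phonetic": 100, "jump_twin": 88.89},
    "Verhoeff": {"single": 100, "transposition": 100, "jump_transposition": 94.222,
                 "twin": 95.555, "phonetic": 95.833, "jump_twin": 94.222},
}


def total_accuracy(rates: dict) -> float:
    """Percentage of all errors detected: sum of frequency x detection rate."""
    return sum(FREQ[kind] * rates[kind] / 100 for kind in FREQ)


for method, rates in RATES.items():
    print(f"{method}: {total_accuracy(rates):.2f}%")
```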

Other simpler methods like the modulus 10 method and the EAN-13 scheme are inferior to the modulus 11 and the Verhoeff scheme as these methods are not successful in detecting all single transposition errors.

Table 5 shows the success rates of the four error detecting methods described in this article in detecting the most common errors in numeric codes.

Acknowledgements

I would like to express my deepest gratitude to Prof. V Rajaraman for his valuable guidance and support which immensely helped me in writing this article.

Suggested Reading

[1] Joseph Kirtland, Identification numbers and check digit schemes, The Mathematical Association of America, (ISBN 0-88385-720-0), 2001.
[2] David Savir, George J Laurer, The characteristics and decodability of universal product code, IBM Systems Journal, Vol.14, No.1, pp.16–34, 1975.
[3] www.isbn-international.org/faqs
[4] V Rajaraman, Analysis and design of information systems, 3rd Edition, PHI Learning, pp.222–227, 2010.
[5] J Verhoeff, Error detecting decimal codes, Mathematical Centre Tracts, Vol.29, The Mathematical Centre, Amsterdam, 1969.
[6] Hemant Kanankia, Srikant Nathamuni, Sanjay Sarma, A UID numbering scheme, May 2010. (www.uidai.gov.in)
[7] www.wikipedia.org/wiki/ISBN
[8] www.wikipedia.org/wiki/International_Article_Number(EAN)
[9] http://www.cs.utsa.edu/~wagner/laws/verhoeff.html
[10] http://en.wikipedia.org/wiki/Verhoeff_algorithm


Address for Correspondence: A Siddharth, Department of Computer Science and Engineering, Indian Institute of Technology, Pataliputra Colony, New Government Polytechnic, Patna 800 013, Bihar, India. Email: [email protected]
