Fundamentals of Biometric System Design

Fundamentals of Biometric System Design by S. N. Yanushkevich

Chapter 3 Biometric Methods and Techniques: Part I – Statistics

• Computing type I errors (FRR)
• Computing type II errors (FAR)
• System performance evaluation

[Figure: FRR and FAR curves versus the decision threshold; their crossing point is the equal error rate (EER).]

S.N. Yanushkevich, Fundamentals of Biometric System Design

Preface

The key methodology of measurement in biometric technology is engineering statistics. The key methodology of biometric data processing is signal processing and pattern recognition. The key performance metrics in biometrics are related to matching rates (false match rate, false non-match rate, and failure-to-enroll rate). The crucial point of biometric system design is the measurement of biometric data; this is mandatory knowledge for all biometric design teams. In the engineering environment, the data is always a sample¹ selected from some population. For example, the calculation of reliability parameters of biometric data, such as the confidence interval and the sample size, is a typical problem of experimental study, reliability design, and quality control. Engineering statistics provides various tools for assessing biometric data, in particular:
◮ Techniques for estimation of mean, variance, and correlation,
◮ Techniques for computing confidence intervals,
◮ Techniques for hypothesis testing, and
◮ Techniques for computing type I and type II errors.

In a biometric system, decision making is based on statistical criteria. This is because biometric data is characterized by high variability. Every time a user presents biometric data, a unique template² is generated. Depending on the type of biometric, even two immediately successive samples of data from the same user generate entirely different templates. Statistical techniques are used to recognize that these templates belong to the same person. For example, a user may place the same finger on a biometric device several times, and all generated templates will be different. To deal with this variability, statistical tools must be used. For processing the raw biometric data, various techniques of signal or image processing, pattern recognition, and decision making are used, in particular,
◮ 2D discrete Fourier transform,
◮ Filtering in spatial and frequency domains using the Fourier transform,
◮ Classifiers, and
◮ Pattern recognition module design.

¹ In a biometric system, a sample is a biometric measure submitted by the user and captured by the data acquisition tool.
² A template is a small file derived from the distinctive features of a user's biometric data. A template is used in a system to perform biometric matches. Biometric systems store and compare templates derived from biometric data. Biometric data cannot be reconstructed from a biometric template.

In the design of a biometric system, these techniques should be considered with respect to software or hardware implementation. This lecture brings these techniques together in the context of implementation. Finally, this lecture introduces the notion of biometric system performance. Because a biometric system is an application-specific computer system, the performance is defined:
◮ In terms of the specific application, such as operational accuracy, and
◮ In terms of the computer platform, such as operational time.
In this lecture, performance in terms of operational accuracy is introduced (false reject and accept rates, false match and non-match rates, failure to enroll).

Essentials of this lecture

• Statistical thinking. A statistical approach should be applied at all phases of the life cycle of a biometric system, including experimental study, design techniques, testing, reliability estimation, and quality control. Biometric data must be represented in a form acceptable for decision making in verification and identification procedures. For this, classic signal processing and pattern recognition methods are adopted.
• Statistical performance evaluation. Performance parameters of a biometric system, in terms of operational (system) accuracy and operational time (computational speed), cannot be measured exactly; they can only be estimated using statistical techniques.
• Statistical decision-making. The variability of biometric data is propagated into the templates and decision making. Decision making at various levels of the biometric system hierarchy is a statistical procedure by nature, that is, decision making under uncertainty.


Biometric Methods and Techniques

Biometrics is a multidisciplinary area. Various advanced mathematical and engineering methods and techniques are used in biometric system design. In this lecture, methods from the following areas are briefly introduced:
◮ Statistical methods,
◮ Methods of signal processing, and
◮ Methods of pattern recognition.

1 Basic statistics for biometric system design

Biometric systems begin with the measurement of a behavioral or physiological characteristic. Key to all systems is the underlying assumption that the measured biometric characteristic is both distinctive between individuals and repeatable over time for the same individual. Statistical methods provide the techniques for measuring the biometric characteristic. In the implementation, the problems of measuring and controlling these random variations begin in the data acquisition module. The user's characteristic must be presented to a sensor. The presentation of any biometric to the sensor introduces a behavioral (random) component into every biometric method. The output of the sensor forms the input data upon which the system is built. It is a combination of (a) the biometric measure, (b) the way the measure is presented, and (c) the technical characteristics of the sensor. Both the repeatability and distinctiveness of the measurement are negatively impacted by changes in any of these factors.

The engineering method and statistical thinking

An engineering approach to formulating and solving problems is applicable to the design of biometric devices and systems.
Step 1: Develop a clear and concise description of the problem.
Step 2: Identify, at least tentatively, the important factors that affect this problem or that may play a role in its solution.
Step 3: Propose a model for the problem, using knowledge of the biometric phenomenon being used in the biometric system. State any limitations or assumptions of the model.
Step 4: Conduct appropriate experiments and collect data to test or validate the tentative model or conclusions made in Steps 2 and 3.
Step 5: Refine the model on the basis of the observed data.
Step 6: Manipulate the model to assist in developing an algorithm, program, and hardware platform.


Step 7: Conduct an appropriate experiment to confirm that the proposed design solutions are both effective and efficient with respect to the given criteria.
Step 8: Draw conclusions or make recommendations based on the design solutions.

The field of statistics deals with the collection, presentation, analysis, and use of data to make decisions. Statistical techniques are used in all phases of biometric system design, their comparison and testing, and improving existing designs. Statistical methods are used to help us describe and understand variability. By variability, we mean that successive observations of a biometric system or biometric phenomenon do not produce identical results. Because the measurements exhibit variability, we say that the measured parameter is a random variable. A convenient way to think of a random variable, say X, which represents a measured quantity, is by using an appropriate model, for example,

Random variable X = Constant µ + Noise ε.

In the engineering environment, the data is almost always a sample that has been selected from some population. In biometric system design, data is collected in three ways:
◮ A retrospective study based on historical data; the engineer uses either all or one sample of the historical process data from some period of time; for example, biometric data from databases, data from a previous experimental study, etc.
◮ An observational study; the engineer observes the process during a period of routine operation; for example, facial expressions, signatures, etc.
◮ A designed experiment; the engineer makes deliberate or purposeful changes in controllable variables, called factors, of the system, observes the system output, and makes a decision or an inference about which variables are responsible for the changes that he/she observes in the system output; for example, feature extraction from biometric data using an appropriate algorithm.
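The model X = µ + ε can be illustrated with a short stdlib-only Python sketch. The values of µ, σ, and the sample count below are hypothetical, chosen only to show that repeated measurements scatter around the constant part and that the sample mean is close to, but not exactly, µ:

```python
import random

# Model: Random variable X = Constant mu + Noise epsilon.
# MU and SIGMA are illustrative values, not from any real sensor.
random.seed(42)
MU, SIGMA = 50.0, 2.5

def measure():
    """One noisy observation: constant part plus Gaussian noise."""
    return MU + random.gauss(0.0, SIGMA)

samples = [measure() for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 2))  # close to MU, but not exactly MU
```

Any two runs with different seeds give different sample means, which is exactly the variability the lecture describes.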

Distinction between a designed experiment and an observational/retrospective study

An important distinction between a designed experiment and either an observational or a retrospective study is that in the former the different combinations of the factors of interest are applied randomly to a set of experimental units. This allows cause-and-effect relationships to be established, which cannot be done with observational or retrospective studies. A designed experiment is based on two statistical techniques: hypothesis testing and confidence intervals.


Example 1: (Designed experiment.) Assume that a company introduces a new biometric device. How should an experiment be designed to test its effectiveness? The basic method would be to perform a comparison between the control devices and the new device. Any comparison is based on a measurement. If the same thing is measured several times, in an ideal world, the same result would be obtained each time. In practice, there are differences. Each result is thrown off by chance error, and the error changes from measurement to measurement. No matter how carefully it was made, a measurement could have turned out a bit differently from the way it did.

Statistical hypothesis

Many problems in biometric system design require that we decide whether to accept or reject a statement about some parameter. The statement is called a hypothesis, and the decision-making procedure about the hypothesis is called hypothesis testing.

A statistical hypothesis is an assertion or conjecture concerning one or more populations. The truth or falsity of a statistical hypothesis is never known with absolute certainty unless we examine the entire population, which is impractical. Instead, we take a random sample from the population of interest and use the data contained in this sample to provide evidence that either supports or does not support the hypothesis (leads to rejection of the hypothesis). The decision procedure must be carried out with awareness of the probability of a wrong conclusion. The rejection of a hypothesis implies that the sample evidence refutes it. In other words: rejection means that there is a small probability of obtaining the observed sample information when, in fact, the hypothesis is true.

The structure of hypothesis testing is formulated using the term null hypothesis. This refers to any hypothesis we wish to test and is denoted by H0 . The rejection of H0 leads to the acceptance of an alternative hypothesis, denoted by H1 .

Null and alternative hypothesis

The alternative hypothesis H1 represents the question to be answered; its specification is crucial. The null hypothesis H0 nullifies or opposes H1 and is often the logical complement to H1. This results in one of the two following conclusions:
Reject H0: in favor of H1, because of sufficient evidence in the data.
Fail to reject H0: because of insufficient evidence in the data.


Example 2: (Null and alternative hypothesis.) Suppose that we are interested in deciding whether or not the mean µ is equal to 50. Formally this is expressed as
Null hypothesis H0: µ = 50 and Alternative hypothesis H1: µ ≠ 50.
That is, the conclusion is that we reject the hypothesis H0 in favor of hypothesis H1 if µ ≠ 50. Because in Example 2 the alternative hypothesis specifies values of µ that could be either greater or less than 50, it is called a two-sided alternative hypothesis. In some situations, we may wish to formulate a one-sided alternative hypothesis:
Null hypothesis H0: µ = 50
One-sided alternative hypothesis H1: µ < 50 or One-sided alternative hypothesis H1: µ > 50

Testing a statistical hypothesis

Let the null hypothesis be that the mean is µ = a, and the alternative hypothesis be that µ ≠ a. That is, we wish to test:
Null hypothesis H0: µ = a
Two-sided alternative hypothesis H1: µ ≠ a
Suppose that a data sample of size n is tested, and that the sample mean x is observed. The sample mean is an estimate of the true population mean µ. A value of the sample mean x that falls close to the hypothesized value of µ is evidence that the true mean really is a; that is, such evidence supports the null hypothesis H0. On the other hand, a sample mean x that is considerably different from a is evidence in support of the alternative hypothesis H1. Thus, the sample mean represents the test statistic.

Example 3: (Critical region and values.) The sample mean x can take on many different values. Suppose that if 48.5 ≤ x ≤ 51.5, we will not reject the null hypothesis H0: µ = 50. If either x < 48.5 or x > 51.5, we will reject the null hypothesis in favor of the alternative hypothesis H1: µ ≠ 50. The values of x that are less than 48.5 or greater than 51.5 constitute the critical region for the test. The boundaries that define the critical region (48.5 and 51.5) are called critical values.


Therefore, we reject H0 in favor of H1 if the test statistic falls in the critical region, and fail to reject H0 otherwise. This decision procedure can lead to either of two wrong conclusions:

Type I error or false reject rate (FRR): rejecting the null hypothesis H0 when it is true. The probability of a type I error is also called the significance level of the test:
α = P(Type I error) = P(Reject H0 when H0 is true)

Type II error or false accept rate (FAR): failing to reject the null hypothesis when it is false. The probability of making a type II error is
β = P(Type II error) = P(Fail to reject H0 when H0 is false)

Properties of type I (FRR) and type II (FAR) errors
Property 1: Type I and type II errors are related. A decrease in the probability of one generally results in an increase in the probability of the other.
Property 2: The size of the critical region, and therefore the probability of committing a type I error, can always be reduced by adjusting the critical value(s).
Property 3: An increase in the sample size n will reduce α and β simultaneously.
Property 4: If H0 is false, β is maximum when the true value of the parameter approaches the hypothesized value. The greater the distance between the true value and the hypothesized value, the smaller β will be.
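Property 1 can be illustrated numerically. The sketch below assumes the lecture's running setup (H0: µ = 50, standard error of the sample mean 0.79) and a hypothetical true mean of 52 under H1; the helper `phi` and the loop are ours, not the lecture's code. Widening the acceptance region lowers α but raises β:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the stdlib error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

SE = 0.79                # standard error of the sample mean (lecture value)
MU0, MU1 = 50.0, 52.0    # hypothesized mean, and assumed true mean under H1

results = []
for half_width in (1.5, 2.0):        # acceptance region is MU0 +/- half_width
    lo, hi = MU0 - half_width, MU0 + half_width
    # alpha: reject H0 (sample mean outside [lo, hi]) although H0 is true
    alpha = phi((lo - MU0) / SE) + 1.0 - phi((hi - MU0) / SE)
    # beta: fail to reject H0 (sample mean inside [lo, hi]) although mu = MU1
    beta = phi((hi - MU1) / SE) - phi((lo - MU1) / SE)
    results.append((alpha, beta))
    print(f"region [{lo:.1f}, {hi:.1f}]: alpha={alpha:.4f}, beta={beta:.4f}")
```

As the region widens from ±1.5 to ±2.0, α falls while β grows, which is exactly the trade-off stated in Property 1.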

Recommendations for computing type I and II errors

Type I error. Generally, the designer controls the type I error probability α, called the significance level, when the critical values (the boundaries that define the critical region, see Example 3) are selected. Thus, it is usually easy for the designer to set the type I error probability at (or near) any desired value. Because the designer can directly control the probability of wrongly rejecting H0, we always think of rejection of the null hypothesis H0 as a strong conclusion. Because we can control the probability of making a type I error, α, the problem is what value should be used. The type I error probability is a measure of risk, specifically, the risk of concluding that the null hypothesis is false when it really is not. So, the value of α should be chosen to reflect the consequences (for the biometric data, device, system, etc.) of incorrectly rejecting H0:
◮ Smaller values of α reflect more serious consequences, and


◮ Larger values of α are consistent with less severe consequences.
This is often hard to do, and what has evolved in much of biometric system design is to use the value α = 0.05 in most situations, unless there is information available indicating that this is an inappropriate choice.

Type II error. The probability of a type II error, β, is not a constant. It depends on both the true value of the parameter and the sample size that we have selected. Because the type II error probability β is a function of both the sample size and the extent to which the null hypothesis H0 is false, it is customary to think of the decision not to reject H0 as a weak conclusion, unless we know that β is acceptably small. Therefore, rather than saying we "accept H0", we prefer the terminology "fail to reject H0". Failing to reject H0 implies that we have not found sufficient evidence to reject H0, that is, to make a strong statement. Failing to reject H0 does not necessarily mean there is a high probability that H0 is true. It may simply mean that more data are required to reach a strong conclusion. This can have important implications for the formulation of hypotheses.

The power of a statistical test is the probability of rejecting the null hypothesis H0 when the alternative hypothesis is true. The power is computed as
Power of a statistical test = 1 − β
The power can be interpreted as the probability of correctly rejecting a false null hypothesis. It is a very descriptive and concise measure of the sensitivity of a statistical test, where by sensitivity we mean the ability of the test to detect differences.

Example 4: (Type I and II errors.) The techniques for computing type I and II errors for a given data sample are shown in Fig. 1.
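The dependence of power on how far the true mean lies from the hypothesized one can be sketched as follows. The critical values 48.5 and 51.5, σ = 2.5, and n = 10 anticipate the design example below; the true means passed in are assumed values, and the function name is ours:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power(true_mu, lo=48.5, hi=51.5, sigma=2.5, n=10):
    """Power = 1 - beta = P(reject H0 | true mean is true_mu)."""
    se = sigma / sqrt(n)
    beta = phi((hi - true_mu) / se) - phi((lo - true_mu) / se)
    return 1.0 - beta

p_far = power(52.0)    # true mean far from the hypothesized 50
p_near = power(50.5)   # true mean close to the hypothesized 50
print(round(p_far, 4), round(p_near, 4))
```

The test is much more likely to detect a mean of 52 than a mean of 50.5, in line with Property 4 above.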

Estimating the mean

Even the most efficient estimator is unlikely to estimate a population parameter θ exactly. It is true that accuracy increases with large samples, but there is still no reason to expect a point estimate from a given sample to be exactly equal to the population parameter it is supposed to estimate. It is preferable to determine an interval within which we would expect to find the value of the parameter. Such an interval is called an interval estimate.


Design example: Computing type I and II errors

Problem formulation: Let face features, such as the regions of the lips, mouth, nose, ears, eyes, and eyebrows, be detected. Let the biometric data corresponding to the lip topology be represented by a sample of size n = 10, with mean µ = 50 and standard deviation σ = 2.5. This biometric data has a distribution for which the conditions of the central limit theorem apply, so the distribution of the sample mean is approximately normal with mean µ = 50 and standard error σ/√n = 2.5/√10 = 0.79. Find the probability of a type I error.

Step 1: The probability of type I error
The probability of making a type I error (the significance level of our test),
α = P(Type I error) = P(Reject H0 when H0 is true),
is equal to the sum of the areas shaded in the tails of the normal distribution:
α = P(X < 48.5 when µ = 50) + P(X > 51.5 when µ = 50)
The z-values that correspond to the critical values x1 = 48.5 and x2 = 51.5 are calculated as follows:
z1 = (x1 − µ)/(σ/√n) = (48.5 − 50)/0.79 = −1.90 and z2 = (x2 − µ)/(σ/√n) = (51.5 − 50)/0.79 = 1.90
Therefore
α = P(Z < −1.90) + P(Z > 1.90) = P(Z < −1.90) + (1 − P(Z < 1.90)) = 0.0287 + (1 − 0.9713) = 0.0574
Conclusion: This implies that 5.74% of all random samples would lead to rejection of the hypothesis H0: µ = 50 when the true mean is really 50.

[Figure: normal distribution centered at µ = 50 with critical values 48.5 and 51.5; each shaded tail has area α/2 = 0.0287.]

Step 2: Reducing the type I error by decreasing the critical region
From inspection of the critical region for H0: µ = 50 versus H1: µ ≠ 50 and n = 10, note that we can reduce α by pushing the critical values further into the tails of the distribution. For example, if we make the critical values 48 and 52, then
z1 = (48 − 50)/0.79 = −2.53 and z2 = (52 − 50)/0.79 = 2.53
and
α = P(Z < −2.53) + P(Z > 2.53) = 0.0057 + 0.0057 = 0.0114

Fig. 1: Techniques for computing type I and II errors (Example 4).
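The three α values of Steps 1–3 can be checked with a short stdlib-only sketch (scipy is not assumed; the helper names are ours). Small deviations from the lecture's numbers come from its rounding of the z-values to two decimals:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def type1_error(lo, hi, mu0, sigma, n):
    """alpha = P(reject H0 | H0 true) for critical values lo and hi."""
    se = sigma / sqrt(n)
    return phi((lo - mu0) / se) + (1.0 - phi((hi - mu0) / se))

# Step 1: original critical region, n = 10
a1 = type1_error(48.5, 51.5, mu0=50.0, sigma=2.5, n=10)
# Step 2: wider critical values 48 and 52
a2 = type1_error(48.0, 52.0, mu0=50.0, sigma=2.5, n=10)
# Step 3: original critical values, larger sample n = 16
a3 = type1_error(48.5, 51.5, mu0=50.0, sigma=2.5, n=16)
print(round(a1, 4), round(a2, 4), round(a3, 4))  # near 0.0574, 0.0114, 0.0164
```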


Design example: Computing type I and II errors (continuation)

Step 3: Reducing the type I error by increasing the sample size
We could also reduce α by increasing the sample size, assuming that the critical values 48.5 and 51.5 do not change. If n = 16, then σ/√n = 2.5/√16 = 0.625, and using the original critical region we find
z1 = (48.5 − 50)/0.625 = −2.40 and z2 = (51.5 − 50)/0.625 = 2.40
Therefore
α = P(Z < −2.40) + P(Z > 2.40) = 0.0082 + 0.0082 = 0.0164

Step 4: Design decision on type I error
An acceptable type I error can be chosen from the following possibilities:
◮ Type I error from the original critical region Z < −1.90, Z > 1.90: α = 0.0574
◮ Type I error reduced by decreasing the critical region to Z < −2.53, Z > 2.53: α = 0.0114
◮ Type I error reduced by increasing the sample size from n = 10 to n = 16: α = 0.0164

Step 5: Specification of the probability of type II error
The probability of making a type II error is
β = P(Type II error) = P(Fail to reject H0 when H0 is false)
To calculate β, we must have a specific alternative hypothesis; that is, we must have a particular value of µ. For example, suppose we want to reject the null hypothesis H0: µ = 50 whenever the mean µ is greater than 52 or less than 48. We could calculate the probability of a type II error β for the values µ = 52 and µ = 48, and use this result to tell us something about how the test procedure would perform. Because of the symmetry of the normal distribution, it is only necessary to evaluate one of the two cases, say, to find the probability of not rejecting the null hypothesis H0: µ = 50 when the true mean is µ = 52.

[Figure: two normal distributions of the test statistic X, one centered at µ = 50 (under H0) and one at µ = 52 (under H1), over the range 48 to 54.]
◮ The normal distribution on the left is the distribution of the test statistic X when the null hypothesis H0: µ = 50 is true (this is what is meant by the expression "under H0: µ = 50").
◮ The normal distribution on the right is the distribution of the test statistic X when the alternative hypothesis is true and the value of the mean is 52 (or "under H1: µ = 52").

Fig. 2: Techniques for computing type I and II errors (continuation of Example 4).


Design example: Computing type I and II errors (continuation)

Step 5: (continuation) The type II error will be committed if the sample mean x falls between 48.5 and 51.5 (the critical region boundaries) when µ = 52. This is the probability that 48.5 ≤ X ≤ 51.5 when the true mean is µ = 52, or the shaded area under the normal distribution on the right, that is,
β = P(Type II error) = P(48.5 ≤ X ≤ 51.5 when µ = 52)

Step 6: Computing the probability of type II error
The z-values corresponding to 48.5 and 51.5 when µ = 52 are
z1 = (48.5 − 52)/0.79 = −4.43 and z2 = (51.5 − 52)/0.79 = −0.63
Therefore,
β = P(−4.43 ≤ Z ≤ −0.63) = P(Z ≤ −0.63) − P(Z ≤ −4.43) = 0.2643 − 0.0000 = 0.2643

Conclusion: If we are testing H0: µ = 50 against H1: µ ≠ 50 with n = 10, and the true value of the mean is µ = 52, the probability that we will fail to reject the false null hypothesis is 0.2643. By symmetry (see the graphical representation in Fig. 2), if the true value of the mean is µ = 48, the value of β will also be 0.2643.

Step 7: Analysis of a type II error
The probability of making a type II error β increases rapidly as the true value of µ approaches the hypothesized value. For example, consider the case when the true value of the mean is µ = 50.5 and the hypothesized value is H0: µ = 50. The true value of µ is very close to 50, and the probability of a type II error is
β = P(48.5 ≤ X ≤ 51.5 when µ = 50.5)
The z-values corresponding to 48.5 and 51.5 when µ = 50.5 are
z1 = (48.5 − 50.5)/0.79 = −2.53 and z2 = (51.5 − 50.5)/0.79 = 1.27
Therefore
β = P(−2.53 ≤ Z ≤ 1.27) = P(Z ≤ 1.27) − P(Z ≤ −2.53) = 0.8980 − 0.0057 = 0.8923
This is higher than in the case µ = 52; that is, we are more likely to accept the faulty hypothesis µ = 50 (fail to reject H0: µ = 50).

Fig. 3: Techniques for computing type I and type II errors (continuation of Example 4).
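The β computations of Steps 6 and 7 can be verified with a short sketch (stdlib-only normal CDF; the function name `type2_error` is ours, and the lecture's slightly different last digits come from rounding z to two decimals):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def type2_error(true_mu, lo=48.5, hi=51.5, sigma=2.5, n=10):
    """beta = P(lo <= sample mean <= hi | true mean is true_mu)."""
    se = sigma / sqrt(n)
    return phi((hi - true_mu) / se) - phi((lo - true_mu) / se)

b_52 = type2_error(52.0)    # Step 6: near 0.264
b_505 = type2_error(50.5)   # Step 7: near 0.892, much larger close to H0
print(round(b_52, 4), round(b_505, 4))
```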


Design example: Computing type I and II errors (Continuation) Step 7: (Continuation) Conclusion: The type II error probability is much higher for the case in which the true mean is 50.5 than for the case in which the mean is 52. Of course, in many practical situations, we would not be as concerned with making a type II error if the mean were “close” to the hypothesized value. We would be much more interested in identifying the large differences between the true mean and the value specified in the null hypothesis.

Step 8: Reducing a type II error by increasing the sample size
[Figure: two normal distributions of X, under H0: µ = 50 and under H1: µ = 52, over the range 48 to 54.]
The type II error probability also depends on the sample size n. Suppose that the null hypothesis is H0: µ = 50 and that the true value of the mean is µ = 52. By letting the sample size increase from n = 10 to n = 16, we can compare the two distributions shown in the figure. The normal distribution on the left is the distribution of X when µ = 50, and the normal distribution on the right is the distribution of X when µ = 52. As shown in the figure, the type II error probability is
β = P(48.5 ≤ X ≤ 51.5 when µ = 52)
When n = 16, the standard deviation of X is σ/√n = 2.5/√16 = 0.625, and the z-values corresponding to 48.5 and 51.5 when µ = 52 are
z1 = (48.5 − 52)/0.625 = −5.60 and z2 = (51.5 − 52)/0.625 = −0.80
Therefore
β = P(−5.60 ≤ Z ≤ −0.80) = P(Z ≤ −0.80) − P(Z ≤ −5.60) = 0.2119 − 0.0000 = 0.2119
This β = 0.2119 is smaller than β = 0.2643, so we decrease the probability of accepting the false hypothesis H0 by increasing the sample size.

Step 9: Design decision on type II error
An acceptable type II error can be chosen from the following possibilities:
◮ Type II error for the original sample size n = 10 and −4.43 ≤ Z ≤ −0.63: β = 0.2643
◮ Type II error reduced by increasing the sample size from n = 10 to n = 16: β = 0.2119

Fig. 4: Techniques for computing type I and type II errors (continuation of Example 4).
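Step 8's effect of the sample size on β can be checked in the same way (a sketch with our own helper names, not the lecture's code):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def type2_error(n, true_mu=52.0, lo=48.5, hi=51.5, sigma=2.5):
    """beta as a function of the sample size, other settings fixed."""
    se = sigma / sqrt(n)
    return phi((hi - true_mu) / se) - phi((lo - true_mu) / se)

beta_10 = type2_error(n=10)   # about 0.264
beta_16 = type2_error(n=16)   # about 0.212
print(round(beta_10, 4), round(beta_16, 4))
```

A larger sample shrinks the standard error, so less of the H1 distribution falls inside the acceptance region.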


Design example: Computing type I and II errors (continuation)

Step 10: Computing the power of a test
Suppose that the true value of the mean is µ = 52. When n = 10, we found that β = 0.2643, so the power of this test is
Power of the test = 1 − β = 1 − 0.2643 = 0.7357
Conclusion: The sensitivity of the test for detecting the difference between a mean of 50 and a mean of 52 is 0.7357. That is, if the true mean is really 52, this test will correctly reject H0: µ = 50 and "detect" this difference 73.57% of the time. If this value of power is judged to be too low, the designer can increase either α or the sample size n.

Fig. 5: Techniques for computing type I and type II errors (continuation of Example 4).

Let 0 < α < 1; then the interval a < θ < b, computed from the selected sample, is called a 100(1 − α)% confidence interval, the fraction 1 − α is called the degree of confidence, and the endpoints a and b are called the lower and upper confidence limits. If x is the mean of a random sample of size n from a population with known variance σ², a 100(1 − α)% confidence interval for µ is given by

x − z_{α/2} σ/√n < µ < x + z_{α/2} σ/√n    (1)

where z_{α/2} is the z-value leaving an area of α/2 to the right.

Practice recommendation. In experiments, σ is often unknown, and normality cannot always be assumed. If n ≥ 30, s can replace σ, and the confidence interval

Confidence interval = x ± z_{α/2} s/√n

may be used. This is often referred to as a large-sample confidence interval. The justification lies in the presumption that, with a sample as large as 30 and a population distribution that is not too skewed, s (the standard deviation of the sample) will be very close to the true σ, and thus the central limit theorem prevails. It should be emphasized that this is only an approximation, and the quality of the approximation improves as the sample size grows.


The 100(1 − α)% confidence interval provides an estimate of the accuracy of our point estimate. If µ is actually the center value of the interval, then x estimates µ without error. In most cases, however, x will not be exactly equal to µ, and the point estimate is in error.

Theorem 1: If x is used as an estimate of µ, we can be 100(1 − α)% confident that the error will not exceed the value

Error = z_{α/2} × σ/√n    (2)

Example 5: (Errors of the confidence intervals.) Hand geometry is defined as the surface area of the hand or fingers and the corresponding measures (length, width, and thickness). The average distance between two points on a hand in 36 different measurements is found to be 2.6 mm. Calculate: (a) the 95% and 99% confidence intervals for the mean distance between these hand points, and (b) the accuracy of the point estimate using Theorem 1. Assume that the population standard deviation is σ = 0.3. The solution is given in Fig. 6.

Often in experimental studies, we are interested in how large a sample of biometric data is necessary to ensure that the error in estimating µ will be less than a specified amount e.

Theorem 2: If x is used as an estimate of µ, we can be 100(1 − α)% confident that the error will not exceed a specified amount e when the sample size is

Sample size n = (z_{α/2} × σ / e)²    (3)

Theorem 2 is applicable only if we know the variance of the population from which we are to select our sample. Lacking this information, we could take a preliminary sample of size n ≥ 30 to provide an estimate of σ. Then, using this estimate as an approximation for σ in Theorem 2, we could determine approximately how many observations are needed to provide the desired degree of accuracy.


Design example: Errors of the confidence intervals

Problem formulation: Let the hand geometry measurement result in a sample of size n = 36, with sample mean x = 2.6 and population standard deviation σ = 0.3. Calculate:
(a) the 95% and 99% confidence intervals for the mean distance between the hand points;
(b) the accuracy of a point estimate using Theorem 1;
(c) the sample size, if we want to be 95% confident that our estimate of µ is off by less than 0.05 (Theorem 2).

Step 1: If x is the mean of a random sample of size n from a population with known variance σ², a 100(1 − α)% confidence interval for µ is given by
x − z_{α/2} σ/√n < µ < x + z_{α/2} σ/√n
where z_{α/2} is the z-value leaving an area of α/2 to the right.

Step 2: For α = 0.05, n = 36, x = 2.6, and σ = 0.3, the 95% confidence interval is
2.6 − (1.96)(0.3)/√36 < µ < 2.6 + (1.96)(0.3)/√36, that is, 2.50 < µ < 2.70
Note that the z-value leaving an area of 0.025 to the right, and therefore an area of 0.975 to the left, is z_{0.05/2} = z_{0.025} = 1.96 (see the table).

Step 3:

For

α = 0.01 , n = 36, x = 2.6, and σ = 0.3, the 99% confidence interval is

0.3 0.3 < µ < 2.6 + (2.575) √ , that is, 2.47 < µ < 2.6 − (2.575) √ | {z } 36 | {z } 36 z0.01/2

2.73

z0.01/2

Note that z−value, leaving an area of 0.005 to the right, and, therefore, an area of 0.995 to the = z0.005 = 2.575 (see the table) left, is z 0.01 2 Observation: A longer interval is required to estimate µ with a higher degree of confidence. Decision: Based on Theorem 1, we are 95% confident that the sample mean x = 2.6 differs from the true mean µ by an amount that is less than σ 0.3 z α2 × √ = (1.96) √ = 0.98 n 36 By analogy, we are 99% confident that the sample mean x = 2.6 differs from the true mean µ by an amount that is less than 0.3 σ z α2 × √ = (2.575) √ = 0.13 n 36

Fig. 6: The error of estimating the mean (Example 5).


Example 6: (Sample size.) (Continuation of Example 5.) How large a sample is required if we want to be 95% confident that our estimate of µ is off by less than 0.05? Using Theorem 2,

n = (z_{α/2} × σ / e)² = (1.96 × 0.3 / 0.05)² = 138.3 ≈ 139

Therefore, we can be 95% confident that a random sample of size 139 will provide an estimate x̄ that differs from µ by an amount less than 0.05.
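The interval of Example 5 and the sample size of Example 6 can be reproduced with a short script. This is a minimal illustrative sketch (the function names and the z-value dictionary are ours, not from the text):

```python
from math import ceil, sqrt

# z-values for common confidence levels, matching the table values
# used in Examples 5 and 6.
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.575}

def confidence_interval(xbar, sigma, n, level=0.95):
    """100*level% confidence interval for the mean, with known sigma (Equation 1)."""
    half_width = Z[level] * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def sample_size(sigma, e, level=0.95):
    """Smallest n so the estimate of mu is off by less than e (Theorem 2)."""
    return ceil((Z[level] * sigma / e) ** 2)

lo, hi = confidence_interval(2.6, 0.3, 36, 0.95)
print(round(lo, 2), round(hi, 2))    # 2.5 2.7
print(sample_size(0.3, 0.05, 0.95))  # 139
```

Rounding the sample size up with `ceil` matches the convention of the text (138.3 ≈ 139).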

2 Biometric system performance evaluation

Fig. 7 contains the basic definitions and terminology used in the design and testing of biometric systems. In this design, terms such as a sample of biometric data, user template, matching score, decision making, decision rule, and decision error rates are used with application-specific meanings.

2.1 Matching score

A broad category of variables impacts the way in which the user's inherent biometric characteristics are displayed to the sensor. In many cases, the distinction between changes in the fundamental biometric characteristics and the presentation effects may not be clear. Two samples of the same biometric characteristic from the same person are not identical due to imperfect imaging conditions, changes in the user's physiological or behavioral characteristics, ambient conditions, and the user's interaction with the sensor. Therefore, the response of a biometric matching system is the matching score

Response = Matching score S(X_Q, X_I)

that quantifies the similarity between the input X_Q and the template X_I representations. This similarity can be encoded by a single number.


Basic definitions and terminology
Sample: A biometric measure submitted by the user.
Template: A user's reference measure based on features extracted from the enrolment samples.
Matching score: A measure of the similarity between features derived from a presented sample and a stored template. A match/nonmatch decision may be made according to whether this score exceeds a decision threshold.
System decision: A determination of the probable validity of a user's claim to identity/non-identity in the system.
Transaction: An attempt by a user to validate a claim of identity or non-identity by consecutively submitting one or more samples, as allowed by the system's decision policy.
Verification: The user makes a positive claim to an identity, requiring a one-to-one comparison of the submitted sample to the enrolled template for the claimed identity.
Identification: The user makes either no claim or an implicit negative claim to an enrolled identity, and a one-to-many search of the entire enrolled database is required.
Positive claim of identity: The user claims to be enrolled in or known to the system. An explicit claim might be accompanied by a claimed identity in the form of a name or personal identification number (PIN). Common access control systems are an example.
Negative claim of identity: The user claims not to be known to or enrolled in the system. For example, enrolment in social service systems open only to those not already enrolled.
Genuine claim of identity: A user making a truthful positive claim about identity in the system. The user truthfully claims to be him/herself, leading to a comparison of a sample with a truly matching template.
Impostor claim of identity: A user making a false positive claim about identity in the system. The user falsely claims to be someone else, leading to the comparison of a sample with a non-matching template.

Fig. 7: Basic definitions and terminology that are used in biometric system design.



Example 7: (Response.) The similarity, encoded by YES (1) or NO (0), between the input X_Q, given its number 11101, and the template X_I, given its number 10011, can be represented by the binary number

0 11101 10011

in which the leading bit 0 is the response (NO), followed by the input number #X_Q and the template number #X_I.

2.2 Decision rule

If the stored biometric template of a user I is represented by X_I and the acquired input for recognition is represented by X_Q, then the null hypothesis H0 and alternate hypothesis H1 are:
Null hypothesis H0: The input X_Q does not come from the same person as the template X_I; the associated decision is: "Person I is not who he/she claims to be."
Alternate hypothesis H1: The input X_Q comes from the same person as the template X_I; the associated decision is: "Person I is who he/she claims to be."
That is, we wish to test
Null hypothesis H0: D = D0
Alternative hypothesis H1: D ≠ D0
The decision rule is as follows: if the matching score S(X_Q, X_I) is less than the system threshold t, then decide H0; else decide H1.

Controlled decision making in a biometric system
The higher the score, the more certain the system is that the two biometric measurements come from the same person. The system decision is regulated by the threshold t:

[Figure: probability distributions of the matching score for nonmate pairs (different persons) and mate pairs (the same person), separated by the threshold t.]

Decision 1: Pairs of biometric samples generating scores higher than or equal to t are inferred as mate pairs, that is, the pairs belong to the same person.
Decision 2: Pairs of biometric samples generating scores lower than t are inferred as nonmate pairs, that is, the pairs belong to different persons.


2.3 Decision error rates

Decision errors are due to matching errors and image acquisition errors. These errors are summed up and drive the decision process at various levels of the system, in particular in situations where (a) one-to-one or one-to-many matching is required; (b) there is a positive or negative claim of identity; and (c) the system allows multiple attempts (the decision policy). Biometric performance has traditionally been stated in terms of decision error rates.

2.4 FRR computing

The FRR (type I error) is defined as the probability that a user making a true claim about his/her identity will be rejected as him/herself. That is, the FRR is the expected proportion of transactions with truthful claims of identity (in a positive ID system) or non-identity (in a negative ID system) that are incorrectly denied. A transaction may consist of one or more truthful attempts, depending upon the decision policy. Note that rejection always refers to the claim of the user.

Example 8: (False reject.) If person A1 types his/her correct user ID into the biometric login for a given terminal, A1 has just made a true claim that he/she is A1. Person A1 then presents his/her biometric measurement for verification. If the biometric system does not match the template of A1 to A1's presented measurement, then there is a false reject. This could happen because the matching threshold is too high, or because the biometric features presented by person A1 are not close enough to the biometric template.

Suppose a person A1 was denied authentication (unsuccessfully authenticated) as A1 n times, while the total number of attempts was N; then FRR = n/N. Statistically, the more times something is done, the greater the confidence in the result. The result is the mean (average) FRR for K users of the system:

FRR = (1/K) Σ_{i=1}^{K} FRR_i

FRR and matching algorithm The FRR reflects the robustness of the matching algorithm. The more accurate the matching algorithm, the less likely a false rejection will happen.


2.5 FAR computing

The FAR (type II error) is defined as the probability that a user making a false claim about his/her identity will be verified as that false identity. That is, the FAR is the expected proportion of transactions with wrongful claims of identity (in a positive ID system) or non-identity (in a negative ID system) that are incorrectly confirmed. A transaction may consist of one or more wrongful attempts, depending upon the decision policy. Note that acceptance always refers to the claim of the user³.

Example 9: (False accept.) If a person A1 types the user ID of another person A2 into the biometric login for a given terminal, A1 has just made a false claim that he or she is A2. Person A1 then presents his/her biometric measurement for verification. If the biometric system matches A1 to A2, then there is a false acceptance. This could happen because the matching threshold is set too low, or it could be that the biometric features of A1 and A2 are very similar.

Suppose the person A1 was successfully authenticated as A2 n times in the total number of attempts N; then FAR = n/N. The FAR is the mean (average) for K users of a system:

FAR = (1/K) Σ_{i=1}^{K} FAR_i

FAR and matching algorithm The FAR characterizes the strength of the matching algorithm. The stronger the algorithm, the less likely that a false authentication will happen.
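The per-user FRR and FAR estimates defined above, and their averages over K users, can be sketched as follows. The attempt records are invented data for illustration:

```python
# Each list records attempt outcomes for one user:
# 1 = erroneous decision, 0 = correct decision.
def rate(outcomes):
    """Error rate n/N: n erroneous attempts out of N attempts."""
    return sum(outcomes) / len(outcomes)

genuine_rejected = {"user1": [0, 1, 0, 0], "user2": [0, 0, 0, 0]}        # FRR data
impostor_accepted = {"user1": [0, 0, 0, 0, 1], "user2": [0, 0, 0, 0, 0]}  # FAR data

# Mean rate over the K users of the system.
FRR = sum(rate(a) for a in genuine_rejected.values()) / len(genuine_rejected)
FAR = sum(rate(a) for a in impostor_accepted.values()) / len(impostor_accepted)
print(FRR, FAR)  # 0.125 0.1
```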

2.6 Matching errors

Matching algorithm errors, occurring while performing a single comparison of a submitted sample against a single enrolled template/model, are defined to avoid ambiguity within systems allowing multiple attempts or having multiple templates.

³ It should be noted that conflicting definitions are implicit in the literature. In the access control literature, a false acceptance is said to have occurred when a submitted sample is incorrectly matched to a template enrolled by another user.


False match rate (FMR) is the expected probability that a sample will be falsely declared to match a single randomly selected non-self template; that is, measurements from two different persons are interpreted as if they were from the same person.
False non-match rate (FNMR) is the expected probability that a sample will be falsely declared not to match a template of the same measure from the same user supplying the sample; that is, measurements from the same person are treated as if they were from two different persons.
Equal error rate (EER) is the value defined by EER = FMR = FNMR; that is, the point where the false match and false non-match curves cross is called the equal error rate or crossover rate. The EER provides an indicator of the system's performance: a lower EER indicates a system with a good level of sensitivity and performance.
The difference between false match/non-match rates and false accept/reject rates is illustrated in Fig. 8.

Example 10: (FMR and FNMR.) Let us assume that a certain commercial biometric verification system wishes to operate at 0.001% FMR. At this setting, several biometric systems, such as state-of-the-art fingerprint and iris recognition systems, can deliver less than 1% FNMR. An FMR of 0.001% indicates that, if a hacker launches a brute force attack with a large number of different fingerprints, 1 out of 100,000 attempts will succeed on average. To attack a biometric-based system, one needs to generate (or acquire) a large number of samples of that biometric, which is much more difficult than generating a large number of PINs/passwords. The FMR of a biometric system can be arbitrarily reduced for higher security at the cost of increased inconvenience to the users that results from a higher FNMR. Note that a longer PIN or password also increases security while causing more inconvenience in remembering and correctly typing it.
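The FMR, FNMR, and their crossover (the EER) can be estimated empirically from score samples. This is a minimal sketch on invented score lists; a real evaluation would use many more scores:

```python
# FMR(t): fraction of impostor (nonmate) scores >= t.
# FNMR(t): fraction of genuine (mate) scores < t.
def fmr(impostor_scores, t):
    return sum(s >= t for s in impostor_scores) / len(impostor_scores)

def fnmr(genuine_scores, t):
    return sum(s < t for s in genuine_scores) / len(genuine_scores)

def eer(genuine_scores, impostor_scores, thresholds):
    """Threshold where |FMR - FNMR| is smallest: the crossover point."""
    best = min(thresholds,
               key=lambda t: abs(fmr(impostor_scores, t) - fnmr(genuine_scores, t)))
    return best, fmr(impostor_scores, best), fnmr(genuine_scores, best)

genuine = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6]   # invented mate-pair scores
impostor = [0.2, 0.4, 0.3, 0.65, 0.1, 0.5]   # invented nonmate-pair scores
t, fm, fn = eer(genuine, impostor, [i / 100 for i in range(101)])
print(t, fm, fn)
```

At the reported threshold the two rates coincide, which is exactly the crossover-rate definition above.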


Difference between false match/non-match rates and false accept/reject rates
False match rate (FMR) and false non-match rate (FNMR) are not generally synonymous with false accept rate (FAR) and false reject rate (FRR), respectively:
◮ False match/non-match rates are calculated over the number of comparisons made by the biometric system during verification.
◮ False accept/reject rates are calculated over transactions and refer to the acceptance or rejection of the stated hypothesis, whether positive or negative.

Fig. 8: Difference between false match/non-match rates and false accept/reject rates.

Example 11: (FMR and FNMR.) Consider that airport authorities are looking for 100 criminals. (a) Consider a verification system. A state-of-the-art fingerprint verification system operates at 1% FNMR and 0.001% FMR; that is, this system would fail to match the correct users 1% of the time and erroneously verify wrong users 0.001% of the time. (b) Consider an identification system searching against the 100-person watchlist. Assume that the identification FNMR is still 1%, while the effective FMR grows to 0.1%. That is, while the system has a 99% chance of catching a criminal, it will produce a large number of false alarms. For example, if 10,000 people use an airport in a day, the system will produce 10 false alarms.
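The false-alarm arithmetic of Example 11(b) is a one-line expected-value computation. The numbers below are the ones assumed in the example:

```python
# Expected false alarms per day = identification FMR * number of travellers.
identification_fmr = 0.001   # 0.1%
travellers_per_day = 10_000
false_alarms = identification_fmr * travellers_per_day
print(false_alarms)  # 10.0
```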


In fact, the tradeoff between the FMR and FNMR rates in a biometric system is no different from that in any detection system, including the metal detectors already in use at all the airports. Other negative recognition applications such as background checks and forensic criminal identification are also expected to operate in semi-automatic mode and their use follows a similar cost-benefit analysis.

2.7 FTE computing

The FTE (failure to enroll) is defined as the probability that a user attempting to biometrically enroll will be unable to do so. The FTE is usually defined over a minimum of three attempts. The FTE can be calculated as follows. An unsuccessful enrollment event occurs if a person A1, on his/her third attempt, is still unsuccessful. Let n be the number of unsuccessful enrollment events, and N be the total number of enrollment events. Then FTE = n/N. The mean (average) FTE for K users of a system is

FTE = (1/K) Σ_{i=1}^{K} FTE_i

The EER (equal error rate) is defined as the crossover point on a graph that has both the FAR and FRR curves plotted.

[Figure: genuine and impostor matching-score distributions plotted against the matching score, with the threshold t marked; the overlap regions under the curves define the FMR and FNMR.]

The distribution of scores generated from pairs of samples taken from the same person is called the genuine distribution. The distribution of scores generated from pairs of samples taken from different persons is called the impostor distribution. The FMR and FNMR for a given threshold t are displayed over the genuine and impostor score distributions: the FMR is the percentage of nonmate pairs whose matching scores are greater than or equal to t, and the FNMR is the percentage of mate pairs whose matching scores are less than t.

The FMR (FAR) and FNMR (FRR) are related and must be balanced (Fig. 9). For example, in access control, perfect security would require denying access to everyone. Conversely, granting access to everyone would mean no security. Obviously, neither extreme is reasonable, and a biometric system must operate somewhere between the two.


False match rate (FMR) or False accept rate (FAR)
◮ An FMR/FAR error occurs when a system incorrectly matches an identity; the FMR (FAR) is the probability of individuals being wrongly matched.
◮ False matches may occur because there is a high degree of similarity between two individuals' characteristics.
◮ In a verification or positive identification system, unauthorized people can be granted access to facilities or resources as a result of an incorrect match.
◮ In a negative identification system, the result of a false match may be to deny access.


False non-match rate (FNMR) or False reject rate (FRR)
◮ An FNMR/FRR error occurs when a system rejects a valid identity; the FNMR (FRR) is the probability of valid individuals being wrongly not matched.
◮ False non-matches occur because there is not a sufficiently strong similarity between an individual's enrollment and trial templates, which could be caused by any number of conditions. For example, an individual's biometric data may have changed as a result of aging or injury.
◮ In a verification or positive identification system, people can be denied access to some facility or resource as a result of a system's failure to make a correct match.
◮ In a negative identification system, the result of a false non-match may be that a person is granted access to resources to which he/she should be denied.

Balance of FMR (FAR) and FNMR (FRR) FMR (FAR) and FNMR (FRR) are related and must, therefore, always be assessed in tandem, and acceptable risk levels must be balanced with the disadvantages of inconvenience.

Fig. 9: Relations between the FMR (FAR) and FNMR (FRR).

3 Receiver operating characteristic (ROC) curves

The standard method for expressing the technical performance of a biometric device for a specific population in a specific application is the Receiver Operating Characteristic (ROC) curve.

3.1 Applications of biometric systems in terms of the ROC

The system performance at all operating points (thresholds) can be depicted in the form given in Fig. 10.

[Figure: ROC curve of false accept rate (FAR) versus false reject rate (FRR), with typical operating points marked for forensic applications, civilian applications, and high-security applications.]

Fig. 10: Typical operating points of different biometric applications displayed on the ROC curve.

An ROC curve plots, parametrically as a function of the decision threshold t = T, the rate of "false positives" (i.e., impostor attempts accepted) on the X-axis against the corresponding rate of "true positives" (i.e., genuine attempts accepted) on the Y-axis.
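Tracing an ROC curve amounts to sweeping the threshold T and recording the two acceptance rates at each setting. This is an illustrative sketch on invented score samples:

```python
# For each threshold T: false-positive rate = fraction of impostor attempts
# accepted; true-positive rate = fraction of genuine attempts accepted.
def roc_points(genuine, impostor, thresholds):
    points = []
    for t in thresholds:
        fpr = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        tpr = sum(s >= t for s in genuine) / len(genuine)    # genuine accepted
        points.append((fpr, tpr))
    return points

genuine = [0.9, 0.8, 0.7, 0.95]   # invented genuine-attempt scores
impostor = [0.2, 0.4, 0.6, 0.1]   # invented impostor-attempt scores
for fpr, tpr in roc_points(genuine, impostor, [0.0, 0.5, 1.0]):
    print(fpr, tpr)
```

A low threshold accepts everyone (top-right corner of the ROC); a high threshold accepts no one (bottom-left corner).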

3.2 Equal error rate (EER) in terms of the ROC

Graphical interpretation of the EER is given in Fig. 11. The FMR, FNMR, and EER behavior is expressed in terms of the ROC. The FMR and FNMR can be considered as functions of the threshold t = T. These functions give the error rates when the match decision is made at some threshold T.

3.3 Comparing the performance of biometric systems

ROC curves allow one to compare the performance of different systems under similar conditions, or of a single system under differing conditions.


[Figure: ROC curve of FRR versus FAR; the equal error rate (EER) lies at the point where FRR = FAR.]

◮ When the threshold T is set low, the FMR is high and the FNMR is low; when T is set high, the FMR is low and the FNMR is high.
◮ For a given matcher, the operating point (a point on the ROC) is often given by specifying the threshold T.
◮ In biometric system design, when specifying an application or a performance target, or when comparing two matchers, the operating point is specified by choosing the FMR or FNMR.
◮ The equal error operating point is defined as the EER. A matcher can operate with highly unequal FMR and FNMR; in this case, the EER is an unreliable summary of system accuracy.

Fig. 11: The relationship between FRR, FAR, and EER.

Example 12: (Comparing two matchers.) Various approaches can be used in matcher design. The matchers must be compared using criteria of operational accuracy (method and algorithm) and operational time (computing platform). In Fig. 12, the technique for comparing two matchers is introduced using the criterion of operational accuracy.

3.4 Confidence intervals for the ROC

Each point on the ROC curve is calculated by integrating the "genuine" and "impostor" score distributions between zero and some threshold, t = T. Confidence intervals for the ROC at each threshold, t, have been found through a summation of the binomial distribution, under the assumption that each comparison represents a Bernoulli trial⁴. The confidence, β, given a non-varying probability p, of k sample/template comparison scores, or fewer, out of n independent comparison scores being in the region of

⁴ An experiment can be represented by n repeated Bernoulli trials, each with two outcomes that can be labeled success, with probability p, or failure, with probability 1 − p. The probability distribution of the binomial random variable X, that is, the number of successes in n independent trials, is

b(x; n, p) = [n!/(x!(n − x)!)] p^x q^(n−x), x = 0, 1, …, n.

For example, for n = 3 and p = 0.25, the probability distribution of X can be calculated as b(x; 3, 0.25) = [3!/(x!(3 − x)!)] (0.25)^x (0.75)^(3−x), x = 0, 1, …, 3.


Design example: Comparing two matchers using the ROC curves

Problem formulation: In biometric system design, two types of matchers are specified, a type A and a type B matcher. These matchers are described by their ROC curves. Fig. 12 shows the corresponding ROCs for these matchers and their operating points, a and b, for some specified target FNMR. The problem is to choose the better matcher.

[Figure: FRR-versus-FAR ROC curves for matcher A and matcher B; at the target FNMR, matcher A operates at point a and matcher B at point b.]

Step 1: Understanding the initial data. The ROCs of the two matchers are plotted in a form suitable for comparison (the same type of ROC and scaling factors). The ROC shows the trade-off between FMR and FNMR with respect to the threshold T. For a given operational matcher, the operating point is specified by the particular threshold T.

Step 2: Comparison of the two matchers. It follows from the ROC characteristics of the matchers that matcher A dominates matcher B:
◮ For every specified FMR, matcher A has a lower FNMR;
◮ For every specified FNMR, matcher A has a lower FMR.

Conclusion: Matcher A is better than matcher B for all possible thresholds T.

Fig. 12: Technique for comparing two matchers using the ROC curves (Example 12).

integration would be

Confidence intervals: 1 − β = P(i ≤ k) = Σ_{i=0}^{k} b(i; n, p)   (4)

where the binomial sums b(i; n, p) are tabulated for different values of n and p.


Example 13: (Binomial distribution.) Examples of manipulating the binomial distribution, given n = 15 and p = 0.4, are as follows (the cumulative sums are taken from the table of binomial sums):

(a) P(i ≥ 10) = 1 − P(i < 10) = 1 − Σ_{i=0}^{9} b(i; 15, 0.4) = 1 − 0.9662 = 0.0338

(b) P(3 ≤ i ≤ 8) = Σ_{i=3}^{8} b(i; 15, 0.4) = Σ_{i=0}^{8} b(i; 15, 0.4) − Σ_{i=0}^{2} b(i; 15, 0.4) = 0.9050 − 0.0271 = 0.8779

(c) P(i = 5) = b(5; 15, 0.4) = Σ_{i=0}^{5} b(i; 15, 0.4) − Σ_{i=0}^{4} b(i; 15, 0.4) = 0.4032 − 0.2173 = 0.1859
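The table lookups of Example 13 can be replaced by direct computation of the binomial pmf and its cumulative sums (the sums on the right of Equation 4). A minimal sketch using only the standard library:

```python
from math import comb

def b(x, n, p):
    """Binomial pmf b(x; n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cdf(k, n, p):
    """P(i <= k) = sum_{i=0}^{k} b(i; n, p), as in Equation 4."""
    return sum(b(i, n, p) for i in range(k + 1))

n, p = 15, 0.4
print(round(1 - cdf(9, n, p), 4))             # (a) P(i >= 10) = 0.0338
print(round(cdf(8, n, p) - cdf(2, n, p), 4))  # (b) P(3 <= i <= 8) = 0.8778
print(round(b(5, n, p), 4))                   # (c) P(i = 5) = 0.1859
```

Case (b) comes out as 0.8778 when computed exactly; the table value 0.8779 differs only by rounding of the tabulated sums.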

Equation 4 might be inverted to determine the required size, n, of a biometric test for a given level of confidence, β, if the error probability, p, is known in advance.

3.5 The number of comparison scores

The required number of comparison scores (and test subjects) cannot be predicted prior to testing. To deal with this, Doddington's Law is to test until 30 errors have been observed.

Example 14: (Doddington's law.) If the test is large enough to produce 30 errors, we will be about 95% sure that the true value of the error rate for this test lies within about 40% of that measured, provided that Equation 4 is applicable. The comparison of biometric measures will not be Bernoulli trials, and Equation 4 will not be applicable, if: (a) the trials are not independent, or (b) the error probability varies across the population.

Example 15: (Equation 4 is not applicable.) Trials will not be independent if users stop after a successful use and continue after a non-successful use.


Example 16: (Failure to enroll (FTE) rate.) A fingerprint biometric system may be unable to extract features from the fingerprints of certain individuals, due to the poor quality of the ridges. Thus, there is a failure to enroll (FTE) rate associated with using a single biometric trait. It has been empirically estimated that as much as 4% of the population may have poor quality fingerprint ridges that are difficult to image with the currently available fingerprint sensors. This fact results in FTE errors.

3.6 Test size

The size of an evaluation, in terms of the number of volunteers and the number of attempts made (and, if applicable, the number of fingers/hands/eyes used per person), will affect how accurately we can measure error rates. The larger the test, the more accurate the results are likely to be. Rules such as the Rule of 3 and the Rule of 30, detailed below, give lower bounds on the number of attempts needed for a given level of accuracy. However, these rules are overoptimistic, as they assume that error rates are due to a single source of variability, which is not generally the case with biometrics. Ten enrolment-test sample pairs from each of a hundred people are not statistically equivalent to a single enrolment-test sample pair from each of a thousand people, and will not deliver the same level of certainty in the results.

The Rule of 3 addresses the question: What is the lowest error rate that can be statistically established with a given number N of (independent identically distributed) comparisons? This value is the error rate p for which the probability of observing zero errors in N trials, purely by chance, equals a chosen small significance level; it can be, for example, 5%.

The Rule of 3

Error rate p ≤ 3/N for a 95% confidence level   (5)
Error rate p ≤ 2/N for a 90% confidence level   (6)

Example 17: (Rule of 3.) A test of 300 independent samples can be said with 95% confidence to have an error rate of 3/300 = 1% or less.
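The Rule of 3 bound in Example 17 can be checked numerically: at p = 3/N, the chance of observing zero errors in N independent trials, (1 − p)^N, is close to e⁻³ ≈ 0.05. A short sketch:

```python
# Rule of 3: with zero errors in N trials, p <= 3/N at ~95% confidence.
N = 300
p_bound = 3 / N
print(p_bound)                        # 0.01, i.e. 1%
# Probability of seeing zero errors in N trials if the true rate were p_bound:
print(round((1 - p_bound) ** N, 3))   # 0.049
```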


The "Rule of 30"
Doddington⁵ proposes the Rule of 30 to help determine the test size: To be 90% confident that the true error rate is within ±30% of the observed value, we need at least 30 errors. The rule below generalizes this to different proportional error bands:

The Rule of 30
To be 90% confident that the true error rate is within
±10% of the observed value, we need at least 260 errors;
±30% of the observed value, we need at least 30 errors;
±50% of the observed value, we need at least 11 errors.

Example 18: (Rule of 30.) If we have 30 false non-match errors in 3,000 independent genuine trials, we can say with 90% confidence that the true error rate is 30/3000 = 1% ± 30%, that is,

1% − 0.3% ≤ True error rate ≤ 1% + 0.3%
0.7% ≤ True error rate ≤ 1.3%
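A rough check of Example 18 via the normal approximation to the binomial: with 30 errors in 3,000 trials, the 90% half-width (z = 1.645) comes out near ±30% of the observed 1% rate. An illustrative sketch:

```python
from math import sqrt

errors, trials = 30, 3000
p_hat = errors / trials                               # observed error rate
half_width = 1.645 * sqrt(p_hat * (1 - p_hat) / trials)  # 90% half-width
print(p_hat)                         # 0.01
print(round(half_width / p_hat, 2))  # 0.3, i.e. about +/-30%
```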

3.7 Estimating confidence intervals

With sufficiently large samples, the central limit theorem implies that the observed error rates should follow an approximately normal distribution. However, because we are dealing with proportions near 0%, and the variance of the measures is not uniform over the population, some skewness is likely to remain until the sample size is quite large. Confidence intervals under the assumption of normality are considered in Section 1. Often, when Equation 1 is applied, the confidence interval reaches into negative values for the observed error rate. However, negative error rates are impossible. This is due to the non-normality of the distribution of observed error rates. In these cases, special approaches are required, such as non-parametric methods. The latter reduce the need to make assumptions about the underlying distribution of the observed error rates and the dependencies between attempts.
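One such non-parametric approach is the bootstrap: resample the observed attempt outcomes with replacement and take percentiles of the resampled error rates, which can never dip below zero. This is a minimal sketch on invented data (5 errors in 200 attempts); the function name and parameters are ours:

```python
import random

random.seed(1)
outcomes = [1] * 5 + [0] * 195   # invented: 1 = error, 0 = correct decision

def bootstrap_ci(data, n_resamples=2000, alpha=0.10):
    """Percentile bootstrap (1 - alpha) confidence interval for an error rate."""
    rates = sorted(
        sum(random.choices(data, k=len(data))) / len(data)
        for _ in range(n_resamples)
    )
    lo = rates[int(alpha / 2 * n_resamples)]
    hi = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_ci(outcomes)
print(lo, hi)   # a 90% interval that cannot go negative
```

Unlike the normal-approximation interval, the bootstrap interval is bounded by the observed rates themselves, so it stays within [0, 1].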

⁵ Doddington, G.R., Przybocki, M.A., Martin, A.F., and Reynolds, D.A. The NIST speaker recognition evaluation: Overview, methodology, systems, results, perspective. Speech Communication, 2000, 31(2-3), 225-254.




4 Problems

Problem 1: The distances Di between feature points measured in a sample of signatures are represented by a normally distributed random variable d with mean µ and standard deviation σ, n(d; µ, σ) (Fig. 13a):

(a) If µ = 40 and σ = 1.5, calculate the probability P(39 < d < 42).
Solution:
Step 1: Transform the bounds to standard normal values: z1 = (d1 − µ)/σ and z2 = (d2 − µ)/σ.
Step 2: P(39 < d < 42) = P(z1 < z < z2).
Step 3: Evaluate the probability from the standard normal table.

(b) Calculate the probability P(d > 2.5).
Solution:
Step 1: z = (d − µ)/0.44, so that d > 2.5 corresponds to z > 1.07.
Step 2: P(d > 2.5) = P(z > 1.07).
Step 3: P(z > 1.07) = 1 − P(z < 1.07) = 0.1423.
Answer: P(d > 2.5) = 0.1423.

[Figure: standard normal distribution n(z; 0, 1) with the area P(z > 1.07) to the right of z = 1.07 shaded.]

(c) If µ = 5 and σ = 1.58, calculate the probability P(d = 4).
Solution: Let d1 = 3.5 and d2 = 4.5; then
Step 1: z1 = (d1 − µ)/σ and z2 = (d2 − µ)/σ.
Step 2: P(d = 4) ≈ P(z1 < z < z2).
Step 3: Evaluate the probability from the standard normal table.

Problem 2: For a normally distributed random variable d, n(d; µ, σ), calculate:
(a) P(d > 11)
(b) P(d = 10), P(d = 9), and P(d = 11)
(c) P(8 < d < 10) and P(10 < d < 12)


Problem 3: The sample of distances Di between feature points measured on a retina image is represented by a normally distributed, n(d; µ, σ), random variable d (Fig. 14a). The sample size is n = 36 and the sample mean is d̄ = 2.6. The standard deviation, σ, of the population is assumed to be σ = 0.3. Calculate:

(a) a 90% confidence interval for µ.
Solution: Using Equation 1,

d̄ − z_{α/2} σ/√n < µ < d̄ + z_{α/2} σ/√n

For α = 0.1, α/2 = 0.05, and z_{0.05} = 1.645:

2.6 − 1.645 (0.3/√36) < µ < 2.6 + 1.645 (0.3/√36), that is, 2.52 < µ < 2.68

Answer: With 90% confidence, the true mean lies within the interval 2.52 < µ < 2.68 around the observed sample mean d̄ = 2.6.

(b) a 95% confidence interval for µ.
Solution: Using Equation 1, for α = 0.05, α/2 = 0.025, and z_{0.025} = 1.96:

2.6 − 1.96 (0.3/√36) < µ < 2.6 + 1.96 (0.3/√36), that is, 2.50 < µ < 2.70

Answer: With 95% confidence, the true mean lies within the interval 2.50 < µ < 2.70 around the observed sample mean d̄ = 2.6.

(c) a 99% confidence interval for µ.
Solution: Using Equation 1, for α = 0.01, α/2 = 0.005, and z_{0.005} = 2.575:

2.6 − 2.575 (0.3/√36) < µ < 2.6 + 2.575 (0.3/√36), that is, 2.47 < µ < 2.73

Answer: With 99% confidence, the true mean lies within the interval 2.47 < µ < 2.73 around the observed sample mean d̄ = 2.6.

Observation: The larger the value we choose for z_{α/2}, the wider we make all the intervals and the more confident we can be that the selected sample will produce an interval containing the unknown parameter µ.

Fig. 14: The distances Di between feature points measured in a retina (a) and gait (b) are represented by a normally distributed, n(d; µ, σ), random variable (Problems 3 and 4).

Problem 4: The sample of distances Di between feature points measured in a gait sample is represented by a normally distributed, n(d; µ, σ), random variable d (Fig. 14b). The sample size is n = 49 and the sample mean is d̄ = 4.0. The standard deviation, σ, of the population is assumed to be σ = 0.2. Calculate:
(a) an 85% confidence interval for µ
(b) a 90% confidence interval for µ


(c) The 95% confidence interval for µ
(d) The 98% confidence interval for µ
Compare the confidence intervals.

Problem 5: How large must the sample considered in Problem 3 be, if we want to be:

(a) 90% confident that our estimate of µ is off by less than 0.05?
Solution: Using Equation 3, the sample size is

    n = (z_{α/2}·σ/e)² = (1.645 × 0.3 / 0.05)² = 97.4 ≈ 98

(b) 95% confident that our estimate of µ is off by less than 0.05?
Solution: Using Equation 3, the sample size is

    n = (z_{α/2}·σ/e)² = (1.96 × 0.3 / 0.05)² = 138.3 ≈ 139

(c) 99% confident that our estimate of µ is off by less than 0.05?
Solution: Using Equation 3, the sample size is

    n = (z_{α/2}·σ/e)² = (2.575 × 0.3 / 0.05)² = 238.7 ≈ 239

(Since a sample size must be an integer, the result is rounded up to the next whole number.)
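The sample-size formula n = (z_{α/2}·σ/e)² can be checked numerically. A small sketch follows (the function name sample_size is my own); it rounds up, since a sample size must be an integer, so its results may differ slightly from looser hand rounding:

```python
import math
from statistics import NormalDist

def sample_size(sigma, e, confidence):
    """Smallest n such that the estimate of mu is off by less
    than e with the given confidence (sigma known)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    return math.ceil((z * sigma / e) ** 2)

# Problem 5: sigma = 0.3, error bound e = 0.05
print(sample_size(0.3, 0.05, 0.90))  # -> 98
print(sample_size(0.3, 0.05, 0.95))  # -> 139
print(sample_size(0.3, 0.05, 0.99))  # -> 239
```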

Problem 6: How large must the sample in Problem 4 be, if we want to be:

(a) 85% confident that our estimate of µ is off by less than 0.5?
(b) 90% confident that our estimate of µ is off by less than 0.5?
(c) 95% confident that our estimate of µ is off by less than 0.5?
(d) 99% confident that our estimate of µ is off by less than 0.5?

Problem 7: Estimate the lowest error rate that can be statistically established with the following number N of (independent, identically distributed) comparisons:

(a) With 90% confidence, the lowest error rate p for which zero errors in 30 trials could occur purely by chance.
Solution: Using Rule 5, the lowest error rate is p = 2/30 ≈ 0.07, or 7%.

(b) With 90% confidence, the lowest error rate p for which zero errors in 100 trials could occur purely by chance.
Solution: Using Rule 5, the lowest error rate is p = 2/100 = 0.02, or 2%.


(c) With 95% confidence, the lowest error rate p for which zero errors in 30 trials could occur purely by chance.
Solution: Using Rule 5, the lowest error rate is p = 3/30 = 0.1, or 10%.

(d) With 95% confidence, the lowest error rate p for which zero errors in 100 trials could occur purely by chance.
Solution: Using Rule 5, the lowest error rate is p = 3/100 = 0.03, or 3%.

Problem 8: Using the Rule of 30, determine the true error rate in the following experiments:

(a) 1 error is observed in 30 independent trials
(b) 1 error is observed in 100 independent trials
(c) 10 errors are observed in 500 independent trials
(d) 50 errors are observed in 1000 independent trials
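The rule-of-thumb estimates in Problem 7 (roughly 2/N at 90% confidence and 3/N at 95%) follow from the exact binomial statement: if the true error rate is p, the probability of observing zero errors in N independent trials is (1 − p)^N. A sketch solving (1 − p)^N = 1 − confidence for p (the function name is my own):

```python
def lowest_error_rate(n_trials, confidence):
    """Smallest error rate p that would make zero errors in
    n_trials occur with probability at most 1 - confidence:
    solves (1 - p)**n_trials == 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

print(round(lowest_error_rate(30, 0.90), 2))   # -> 0.07
print(round(lowest_error_rate(100, 0.90), 2))  # -> 0.02
print(round(lowest_error_rate(30, 0.95), 2))   # -> 0.1
print(round(lowest_error_rate(100, 0.95), 2))  # -> 0.03
```

The exact values agree with the 2/N and 3/N approximations used in Problem 7.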

Problem 9: Suppose that a device's performance goal is to reach a 1% false non-match rate and a 0.1% false match rate. Using the Rule of 30, estimate the number of genuine attempt trials and impostor attempt trials.
Solution: 30 errors at a 1% false non-match rate implies a total of 3,000 genuine attempt trials, and 30 errors at a 0.1% false match rate implies a total of 30,000 impostor attempt trials. Note that the key assumption is that these trials are independent.

Problem 10: The distances Di between feature points measured in 100 fingerprints are represented by a normally distributed, n(x; µ, σ), random variable x with sample mean x = 71.8 (Fig. 15a). Assuming a population standard deviation of σ = 8.9, does this indicate that the mean distance is greater than 70? Use a 0.05 level of significance.
Solution:
Input data: x = 71.8, σ = 8.9, n = 100, µ = 70, and α = 0.05
Step 1: Formulate the hypotheses: H0: µ = 70, H1: µ > 70
Step 2: The critical point for α = 0.05 is z_{0.05} = 1.645 (from the table); the critical region is z > 1.645
Step 3: The test statistic for the input data (x = 71.8, σ = 8.9, n = 100, µ = 70) is

    z = (x − µ)/(σ/√n) = (71.8 − 70)/(8.9/√100) = 2.02

Step 4: Decision: Since z = 2.02 lies in the critical region z > 1.645 of the standard normal distribution n(z; 0, 1), reject H0 and conclude that the mean is greater than 70.
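The four steps of the test above can be collapsed into a short computation. A sketch (the names are my own), assuming the one-sided z-test with known σ:

```python
from statistics import NormalDist

def one_sided_z_test(xbar, mu0, sigma, n, alpha):
    """One-sided z-test of H0: mu = mu0 against H1: mu > mu0,
    with known population standard deviation sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)         # test statistic
    z_crit = NormalDist().inv_cdf(1.0 - alpha)    # critical point
    return z, z > z_crit                          # (z, reject H0?)

# Problem 10: xbar = 71.8, mu0 = 70, sigma = 8.9, n = 100
z, reject = one_sided_z_test(71.8, 70, 8.9, 100, 0.05)
print(round(z, 2), reject)  # -> 2.02 True
```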


Fig. 15: The distances Di between feature points measured in a fingerprint (a) and a face (b) are represented by a normally distributed random variable, n(d; µ, σ) (Problems 10 and 11).

Problem 11: The distances Di between feature points measured in 50 facial images are represented by a normally distributed, n(x; µ, σ), random variable x with sample mean x = 7.8 (Fig. 15b). Assuming a population standard deviation of σ = 0.5, does this indicate that the mean distance is greater or less than 8? Use a 0.01 level of significance.
Solution:


Input data: x = 7.8, σ = 0.5, n = 50, µ = 8, and α = 0.01
Step 1: Formulate the hypotheses: H0: µ = 8, H1: µ ≠ 8
Step 2: The critical points for α/2 = 0.01/2 = 0.005 are z_{0.005} = ±2.575 (from the table); the critical region is z < −2.575 or z > 2.575
Step 3: The test statistic for the input data (x = 7.8, σ = 0.5, n = 50, µ = 8) is

    z = (x − µ)/(σ/√n) = (7.8 − 8)/(0.5/√50) = −2.83

Step 4: Decision: Since z = −2.83 lies in the critical region z < −2.575 of the standard normal distribution n(z; 0, 1), reject H0 in favor of the alternative hypothesis H1: µ ≠ 8.

Problem 12: Evaluate the performance of a system that accepts at least 5 facial images of impostors as belonging to the database of 100 enrolled persons, and rejects 10 faces of persons enrolled in the database.
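The two-sided test of Problem 11 can be checked the same way as the one-sided case. A sketch (the names are my own), with the critical point taken at α/2 because both tails are rejection regions:

```python
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0,
    with known population standard deviation sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)              # test statistic
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    return z, abs(z) > z_crit                          # (z, reject H0?)

# Problem 11: xbar = 7.8, mu0 = 8, sigma = 0.5, n = 50
z, reject = two_sided_z_test(7.8, 8.0, 0.5, 50, 0.01)
print(round(z, 2), reject)  # -> -2.83 True
```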