A Mathematical Programming Approach to Stratified Random Sampling


Author: Hubert Sherman Norris


Cairo University Faculty of Economics and Political Science Department of Statistics

A Mathematical Programming Approach to Stratified Random Sampling

Prepared by Dina Mohsen Mohamed Sabry

Supervised by

Prof. Ramadan Hamed Mohamed, Professor of Statistics, Department of Statistics

Prof. Reda Ibrahim Mazloum, Professor of Statistics, Department of Statistics

Dr. Mahmoud Mostafa Rashwan, Assistant Professor of Statistics, Department of Statistics

A Thesis Submitted to the Department of Statistics, Faculty of Economics and Political Science in Partial Fulfilment of the Requirements for the M.Sc. Degree in Statistics

2012


Abstract

When applying stratified sampling, the problem of allocating the sample to the different strata arises. Many classical methods are available for this allocation. Mathematical programming methods, however, have several advantages and can handle the allocation problem while overcoming the limitations of the classical methods. Thus, there have been many attempts by researchers to apply mathematical programming in the field of sampling. Most of these attempts concentrate on minimizing the variances of the overall estimators when optimally allocating the sample to the different strata. However, none of the models focuses on minimizing the variances of the estimators within the strata, and this is what this study aims to address. In many practical situations, the purpose of the study could be to obtain overall estimators in addition to separate estimators within each stratum. Hence, the present study targets minimizing the coefficients of variation of the overall estimators in addition to the coefficients of variation of the estimators within the strata when optimally allocating the sample. This creates a multiple objective problem that must be dealt with using an appropriate approach. As a result, this study adopts a goal programming approach that tackles this problem in multivariate surveys by maximizing the precision of the overall estimators in addition to the precision of the estimators within each stratum under a fixed cost. Integer programming is used to guarantee integer values for the optimal allocation. The proposed approach is compared, through a simulation study, with three of the classical methods of allocation and five mathematical programming models suggested in the literature. Based on the criteria used for comparison, the suggested models are shown to have the highest efficiency in obtaining the estimators within the strata in certain cases.

Keywords: Multivariate Stratified Sampling; Optimum Allocation; Goal Programming.

Name: Dina Mohsen Mohamed Sabry Youssef
Nationality: Egyptian
Date and Place of Birth: 9/12/1985, Giza, Egypt
Degree: Master of Science
Specialization: Statistics
Supervisors:
Prof. Ramadan Hamed Mohamed, Professor of Statistics, Department of Statistics
Prof. Reda Ibrahim Mazloum, Professor of Statistics, Department of Statistics
Dr. Mahmoud Mostafa Rashwan, Assistant Professor of Statistics, Department of Statistics

Title of the Thesis: A Mathematical Programming Approach to Stratified Random Sampling

Summary of the Thesis: The main objective of this study is to introduce goal programming models that tackle the problem of sample allocation in stratified random sampling by taking into account the precision of the overall estimators in addition to the precision of the estimators within the strata under a fixed budget. Hence, the thesis focuses on the formulation of the proposed models. Moreover, the proposed models are compared with other models presented in the literature through a simulation study. The performance of the models is evaluated using three criteria that measure the efficiency of the models in obtaining the overall estimators as well as the estimators within the strata.

The thesis is divided into five chapters, organized as follows:

Chapter 1: Introduces the main objectives of this study and outlines the contents of the thesis.

Chapter 2: Reviews stratified random sampling and some of the classical methods of sample allocation. This chapter also introduces the notation used throughout the thesis.


Chapter 3: Reviews the various mathematical programming approaches suggested in the literature for the problem of sample allocation in stratified random sampling.

Chapter 4: Introduces the proposed goal programming approach, the criteria used for comparison, the simulation study conducted and the conclusions drawn from the simulation.

Chapter 5: Discusses the main conclusions and presents some points for future work.


Acknowledgments

I would like to express my most profound gratitude and appreciation to Prof. Ramadan Hamed for his patience, guidance and continuous help during the preparation of this thesis.

Also, my deepest gratitude goes to Prof. Reda Mazloum for her support, care and co-operation in providing me with her knowledge and expertise whenever needed.

I would also like to sincerely thank Dr. Mahmoud Rashwan, who never hesitated to help and assist me. Dr. Mahmoud was very supportive and encouraging, and always provided me with positive energy that motivated me during the tough times of my research.

I owe warm and heartfelt thanks to my family, especially my parents, who were always there for me, for their unconditional love and support throughout my whole life.

Last but not least, I would like to extend very special thanks to my professors, colleagues and friends at the Faculty of Economics and Political Science for their continuous support.


Table of Contents

Chapter 1: Introduction
1.1 Research Objective
1.2 Thesis Outline
Chapter 2: Review on Stratified Random Sampling
2.1 Stratified Random Sampling
2.2 Types of Sample Allocation
2.3 Sample Allocation with More than One Variable
Chapter 3: Review on Mathematical Programming Approaches to Sample Allocation in Stratified Random Sampling
3.1 Univariate Case
3.2 Multivariate Case (correlation is not taken into account)
3.2.1 Cost As An Objective
3.2.2 Precision As An Objective
3.3 Multivariate Case (correlation is taken into account)
3.4 Precision of Stratum Estimators
Chapter 4: The Suggested Mathematical Programming Approach
4.1 The Suggested Mathematical Programming Approach
4.1.1 The Suggested Objectives
4.1.2 The Proposed Models
4.1.3 The Criteria for Comparison
4.2 Simulation Study
4.2.1 The Design of the Simulation Study
4.2.2 Data Generation
4.2.3 Software Packages
4.3 Simulation Results
4.3.1 Mean of Relative Efficiencies (MRE)
4.3.2 Total Sample Size
4.3.3 Mean of Coefficients of Variation (MCV)
4.3.4 Relative Mean Index (RMI)
4.3.5 The Effect of Varying the Budget on the Models’ Performance
Chapter 5: Conclusions and Further Research
References


List of Tables

Table 4.1: Summary of the Models under Comparison with the Proposed Approach
Table 4.2: Simulation Design
Table 4.3: Combination 1: 2x2 (2 strata and 2 variables)
Table 4.4: Combination 2: 3x2 (3 strata and 2 variables)
Table 4.5: Combination 3: 4x2 (4 strata and 2 variables)
Table 4.6: Combination 4: 2x3 (2 strata and 3 variables)
Table 4.7: Combination 5: 3x3 (3 strata and 3 variables)
Table 4.8: Combination 6: 4x3 (4 strata and 3 variables)
Table 4.9: Mean of Relative Efficiencies (MRE)
Table 4.10: Total Sample Size “n”
Table 4.11: Mean of Coefficients of Variation in the 2 Strata Case
Table 4.12: Mean of Coefficients of Variation in the 3 Strata Case
Table 4.13: Mean of Coefficients of Variation in the 4 Strata Case
Table 4.14: Relative Mean Index (RMI)
Table 4.15: Mean of Relative Efficiencies (MRE) Under Different Budgets
Table 4.16: Total Sample Size “n” Under Different Budgets


Glossary of Notation

$N_h$: Total number of units in stratum $h$
$N$: Total population size
$n_h$: Number of units in the sample drawn from stratum $h$
$n$: Total sample size
$y_{hi}$: Value obtained for the $i$th unit in the $h$th stratum
$W_h$: Stratum weight
$\bar{Y}_h$: True population mean in stratum $h$
$\bar{y}_h$: Sample mean in stratum $h$
$S_h^2$: True population variance in stratum $h$
$s_h^2$: Sample variance in stratum $h$
$\bar{Y}$: Overall population mean
$\bar{y}_{st}$: Overall sample mean
$V(\bar{y}_h)$: Variance of the sample mean in stratum $h$
$V(\bar{y}_{st})$: Variance of the overall sample mean
$n_{hj}$: Sample size in the $h$th stratum for the $j$th variable
$y_{hij}$: Value obtained for the $i$th unit in the $h$th stratum for the $j$th variable
$\bar{Y}_{hj}$: True population mean of the $j$th variable in stratum $h$
$\bar{y}_{hj}$: Sample mean of the $j$th variable in stratum $h$
$S_{hj}^2$: True population variance of the $j$th variable in stratum $h$
$s_{hj}^2$: Sample variance of the $j$th variable in stratum $h$
$V(\bar{y}_{hj})$: Variance of the sample mean of the $j$th variable in stratum $h$
$V(\bar{y}_{st(j)})$: Variance of the overall sample mean of the $j$th variable
$C$: Total budget
$c_0$: Fixed cost
$c_h$: Cost per sampling unit in the $h$th stratum
Weights representing the importance of the $j$th variable
Individual desired variance of the overall sample mean of the $j$th variable
Compromise variance of the overall sample mean of the $j$th variable under optimum compromise strata sample sizes
Compromise variance of the sample mean of the $j$th variable in the $h$th stratum under optimum compromise strata sample sizes
Positive deviation: the amount by which a given goal exceeds the aspired level (target)
Negative deviation: the amount by which a given goal falls short of the aspired level (target)
The lower bound on the sample size to be drawn from the $h$th stratum


Chapter 1
Introduction

“Sampling is the process by which inference is made to the whole by examining only a part”. Sample surveys are conducted on many different cultural and scientific subjects [18]. The use of sample surveys arose from the need to reduce the time and effort consumed by complete enumeration. Moreover, although the cost per observation in a sample survey is higher than in complete enumeration, the overall cost of the sample survey is much lower.

Furthermore, obtaining data by complete enumeration is sometimes impossible, as in destructive tests such as testing the life of electric bulbs and haematological testing [18]. In addition, more comprehensive (and more frequent) data can be obtained using sample surveys, since it becomes possible to make use of highly trained personnel or specialized equipment that are limited in availability. Hence, sample surveys offer more scope and flexibility regarding the types of information that can be collected, information that would be impractical to obtain using complete enumeration [5]. Sample surveys can also produce more accurate results than complete enumeration, because the volume of work is much smaller: staff of higher quality can be employed and the processing of the results can be supervised more carefully [5].

Nevertheless, there are situations where complete enumeration appears to be essential, for example when basic information is needed for every unit, as in counting the population for census purposes or compiling a voters' list [18]. In addition, sampling may not be useful if the population is small or the variance of the variable being measured is high [1]. In practice, post-enumeration sample surveys are usually conducted to evaluate and supplement censuses by assessing their coverage and the errors that inevitably occur. Hence, sample surveys are often used in conjunction with censuses, and as a result sampling and complete enumeration are “complementary and, in general, not competitive” [18].

Many sampling designs are available when conducting surveys. One of the most frequently used designs is stratified sampling, in which the population is divided into separate sub-populations called strata. The main problem that researchers face when applying this design is to determine the sample size to be selected from each stratum. This is known as the sample allocation problem.

This allocation problem has been dealt with by many classical methods, such as equal share allocation, proportional allocation and optimum allocation. In the optimum allocation method, the allocation to the different strata is determined by minimizing the variance of the overall estimator for a given total cost, or by minimizing the cost for a given level of precision (measured by the variance of the overall estimator). However, classical methods suffer from limitations such as the inability to optimize several objectives simultaneously, producing non-integer values for the sample sizes and, in some cases, producing a sample size larger than the corresponding stratum size. Mathematical programming offers tools that can overcome these limitations, and thus many researchers have tried to tackle this problem using mathematical programming approaches.

Most of the mathematical programming models available in the literature deal with the allocation problem in the multivariate case. In these models, the allocation is considered optimum if it minimizes the variances of the overall estimators subject to a fixed cost, or if it minimizes the total cost subject to a given level of precision. However, none of the models concentrates on minimizing the variances of the estimators within the strata. In many surveys, the objective is to obtain overall estimators in addition to separate estimators within the strata. Hence, the precision of both the overall estimators and the estimators within the strata should be taken into account when finding the optimal allocation.

Section 1.1 introduces the main research objective, and Section 1.2 outlines the main contents of the thesis.

1.1 Research Objective:
This study aims to develop a goal programming approach that tackles the allocation problem in multivariate surveys by maximizing the precision of the overall estimators in addition to the precision of the estimators within each stratum under a fixed cost. Integer programming is applied to guarantee integer values for the sample sizes. The performance of the proposed approach is compared, using a simulation study, with three of the classical methods of allocation and five mathematical programming models available in the literature.

1.2 Thesis Outline:

Chapter 1: Presents an introduction to the thesis.

Chapter 2: Reviews stratified random sampling, stating the main reasons for using it, the properties of the estimators and the main notation used throughout the study. Some of the classical methods of sample allocation are also presented in this chapter.

Chapter 3: Reviews previous research that applies mathematical programming to the allocation problem in stratified random sampling. The literature is divided into models developed for the univariate case, for the multivariate case without taking the correlation between the variables into account, and for the multivariate case taking the correlation into consideration. The chapter ends with a brief review of some attempts that take the precision of the estimators within the strata into account.

Chapter 4: Introduces the suggested goal programming approach, discussing the suggested objectives, the proposed models and the criteria used for comparison. This chapter also describes the design of the simulation study, the procedures used for data generation and the software packages used to conduct the simulation. The chapter ends with an analysis of the main results obtained from the simulation.

Chapter 5: Summarizes the main conclusions reached from the simulation study and recommends some points for further research.

Chapter 2
Review on Stratified Random Sampling

This chapter first reviews stratified random sampling, indicating the reasons that may lead to dividing a population into distinct sub-divisions (strata), and introduces the notation used throughout this study. It then presents the general properties of the estimators used. Finally, it considers the different ways of allocating the total sample to the sub-populations and illustrates an allocation method used when there is more than one important variable.

2.1 Stratified Random Sampling:
There are different sampling designs available when conducting surveys. The simplest design, considered the basic sampling technique, is simple random sampling, in which each unit in the population has the same chance of selection. Simple random sampling forms the basis of most other designs [5], [18]. Another technique, and the most frequently used, is stratified sampling, in which the population is divided into suitable sub-populations that are internally homogeneous but heterogeneous with respect to each other. There are many reasons for dividing the population into distinct sub-populations [2], [5], [16], [18]:

1- When the variability in the population is very large, stratified sampling appears to be advantageous. Moreover, stratified sampling is useful when a larger weight must be given to units that occur rarely in the population (such as respondents with very high income).
2- Stratified sampling can produce estimates for each stratum of the population separately, such as estimates for each geographical sub-population.
3- Stratified sampling offers the flexibility of using different sampling techniques in the different strata. For example, simple random sampling or systematic random sampling could be applied in the different strata.
4- Stratified sampling can produce more precise estimates than simple random sampling of the same size, especially when the measurements within the strata are homogeneous.
5- The cost per observation (which includes the cost of the interviewer, time and travel) may be reduced when using stratified sampling.
6- Administrative convenience may dictate the use of stratified sampling. For instance, the agency conducting the survey may have field offices, each of which can supervise the survey for a part of the population.

In stratified sampling, the population consists of $N$ units, and it is divided into $L$ non-overlapping sub-populations (called strata) of sizes $N_1, N_2, \ldots, N_L$ units. The values of $N_h$ are known in advance. Once the strata have been determined, a sample is drawn independently from each stratum, and the sample sizes are denoted by $n_1, n_2, \ldots, n_L$, respectively.

Throughout this study, it is assumed that the strata have already been determined, that simple random sampling is used within each stratum, and that sampling is done without replacement. Furthermore, this study is concerned only with the estimation of the mean.

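Notation and Properties of the Estimators:

Throughout this study, the notation of Cochran (1977) [5] is adopted, where the subscript $h$ denotes the stratum and $i$ denotes the unit within the stratum:

$N_h$: total number of units in stratum $h$, $h = 1, 2, \ldots, L$,
$N = \sum_{h=1}^{L} N_h$: total population size,
$n_h$: number of units in the sample drawn from stratum $h$,
$n = \sum_{h=1}^{L} n_h$: total sample size,
$y_{hi}$: value obtained for the $i$th unit in the $h$th stratum, $i = 1, 2, \ldots, N_h$,
$W_h = N_h / N$: stratum weight,
$\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$: true population mean in stratum $h$,
$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$: sample mean in stratum $h$,
$S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} (y_{hi} - \bar{Y}_h)^2$: true population variance in stratum $h$.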

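In stratified sampling, the population mean is denoted by $\bar{Y}$ and is given by:

$$\bar{Y} = \frac{\sum_{h=1}^{L}\sum_{i=1}^{N_h} y_{hi}}{N} = \frac{\sum_{h=1}^{L} N_h \bar{Y}_h}{N} = \sum_{h=1}^{L} W_h \bar{Y}_h \qquad (2.1)$$

An unbiased estimator of the population mean is $\bar{y}_{st}$ (where $st$ stands for stratified):

$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h \qquad (2.2)$$

Since, as previously mentioned, sampling is done independently in the different strata:

$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \, V(\bar{y}_h) \qquad (2.3)$$

And since simple random sampling without replacement is applied within the strata (which is the case in this study):

$$V(\bar{y}_h) = \left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h} \qquad (2.4)$$

As a result, the variance of the estimator $\bar{y}_{st}$ in stratified random sampling is:

$$V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h} \qquad (2.5)$$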
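To make (2.2) and (2.5) concrete, the following is a minimal Python sketch. It assumes the stratum variances $S_h^2$ are known (in practice they would be taken from previous studies or a pilot investigation); the function names and the numbers in the example are hypothetical and purely illustrative.

```python
from statistics import mean


def stratified_mean(samples, N):
    """Estimator (2.2): ybar_st = sum_h W_h * ybar_h, with W_h = N_h / N."""
    total = sum(N)
    return sum((N_h / total) * mean(y_h) for N_h, y_h in zip(N, samples))


def variance_stratified_mean(N, n, S2):
    """Variance (2.5) under simple random sampling without replacement
    within each stratum:
    V(ybar_st) = sum_h W_h^2 * (1 - n_h / N_h) * S_h^2 / n_h.
    S2 holds the stratum variances S_h^2, assumed known here."""
    total = sum(N)
    return sum(
        (N_h / total) ** 2 * (1 - n_h / N_h) * S2_h / n_h
        for N_h, n_h, S2_h in zip(N, n, S2)
    )


# Hypothetical two-stratum example (all numbers are made up for illustration).
N = [100, 300]                   # stratum sizes N_h
samples = [[2, 4, 6], [10, 12]]  # observed y_hi in each stratum
print(stratified_mean(samples, N))                        # 9.25 = 0.25*4 + 0.75*11
print(variance_stratified_mean(N, [10, 30], [4.0, 9.0]))  # about 0.174
```

Sampling every unit of a stratum ($n_h = N_h$) makes its finite population correction $1 - n_h/N_h$ vanish, so that stratum contributes nothing to (2.5).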

2.2 Types of Sample Allocation:
In stratified random sampling, the problem of finding the sample sizes $n_h$ in the respective strata (i.e. allocating the sample) arises. There are several methods of allocation, such as optimum allocation, Neyman allocation, equal share allocation, proportional allocation and predetermined allocation. In this section the different types of allocation are briefly discussed.

1- Optimum Allocation:
The allocation of the sample to the different strata is determined either by minimizing the variance of the estimator $V(\bar{y}_{st})$ for a given total cost $C$, or by minimizing the cost for a given level of precision (i.e. for a fixed value $V$ of $V(\bar{y}_{st})$). The simplest form of the cost function is:

$$C = c_0 + \sum_{h=1}^{L} c_h n_h \qquad (2.6)$$

where $c_h$ is the cost per sampling unit in the $h$th stratum, $C$ is the total budget available and $c_0$ is the overhead (fixed) cost. There are other forms of the cost function; however, only the linear form is considered in this study.

The optimum allocation formula (in terms of the total sample size $n$) has the following form:

$$n_h = n \, \frac{N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} N_h S_h / \sqrt{c_h}} \qquad (2.7)$$

Hence, the sample size in a given stratum increases as the stratum size $N_h$ increases, as the variability $S_h$ within the stratum increases, and as the cost per unit $c_h$ in the stratum decreases.

The previous formula is in terms of the total sample size $n$, which may not be known in advance. Thus, if the cost is fixed, the optimum values of $n_h$ can be substituted in the cost function (2.6), giving:

$$n = \frac{(C - c_0)\sum_{h=1}^{L} N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} N_h S_h \sqrt{c_h}} \qquad (2.8)$$

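On the other hand, if the variance of the estimator is fixed (say $V(\bar{y}_{st}) = V$), then the optimum values of $n_h$ can be substituted in $V(\bar{y}_{st})$, giving:

$$n = \frac{\left(\sum_{h=1}^{L} W_h S_h \sqrt{c_h}\right)\sum_{h=1}^{L} W_h S_h / \sqrt{c_h}}{V + (1/N)\sum_{h=1}^{L} W_h S_h^2} \qquad (2.9)$$

It should be noted that the values of $S_h$ are unknown. Hence, they are either obtained from previous studies or estimated from a pilot investigation.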

2- Neyman Allocation:
If the cost per unit is assumed to be equal for all the strata (i.e. $c_h = c$), then the cost function reduces to:

$$C = c_0 + c\,n \qquad (2.10)$$

Hence, for a given total cost, the total sample size is:

$$n = \frac{C - c_0}{c} \qquad (2.11)$$

and the optimum allocation formula becomes [from equation (2.7)]:

$$n_h = n \, \frac{N_h S_h}{\sum_{h=1}^{L} N_h S_h} \qquad (2.12)$$

This type of allocation is known as “Neyman allocation” [5].

3- Equal Share Allocation:

This type of allocation divides the total sample into equal shares for the $L$ strata in the population:

$$n_h = \frac{n}{L} \qquad (2.13)$$

Given that the total cost is fixed and takes the linear form (2.6), the total sample size is:

$$n = \frac{L\,(C - c_0)}{\sum_{h=1}^{L} c_h} \qquad (2.14)$$

4- Proportional Allocation:

Here, the total sample is allocated to the different strata in proportion to the number of units in the sub-populations (i.e. $n_h$ is proportional to $N_h$):

$$n_h = n \, \frac{N_h}{N} = n W_h \qquad (2.15)$$

In this type of allocation, the same proportion of units is selected from each stratum. For a given total cost, the linear cost function (2.6) gives the total sample size in proportional allocation as follows [18]:

$$n = \frac{C - c_0}{\sum_{h=1}^{L} W_h c_h}, \quad \text{where } W_h = \frac{N_h}{N} \qquad (2.16)$$

If, on the other hand, the cost per observation is equal for all the strata, yielding the cost function (2.10), then the total sample size is given by formula (2.11).

5- Predetermined Allocation:
Predetermined allocation divides the total sample size (which could be determined subjectively) among the different strata according to the researcher’s judgement.
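The allocation rules above can be sketched in Python as follows. This is an illustrative sketch rather than code from this study: it assumes the $S_h$ are known, uses hypothetical function names, and returns real-valued (unrounded) sample sizes exactly as the formulas do.

```python
from math import sqrt


def optimum_allocation(n, N, S, c):
    """Optimum allocation (2.7): n_h proportional to N_h * S_h / sqrt(c_h)."""
    a = [N_h * S_h / sqrt(c_h) for N_h, S_h, c_h in zip(N, S, c)]
    return [n * a_h / sum(a) for a_h in a]


def n_for_fixed_cost(C, c0, N, S, c):
    """Total sample size (2.8) when the linear cost (2.6) is fixed at C."""
    num = sum(N_h * S_h / sqrt(c_h) for N_h, S_h, c_h in zip(N, S, c))
    den = sum(N_h * S_h * sqrt(c_h) for N_h, S_h, c_h in zip(N, S, c))
    return (C - c0) * num / den


def n_for_fixed_variance(V, N, S, c):
    """Total sample size (2.9) when V(ybar_st) is fixed at V."""
    total = sum(N)
    W = [N_h / total for N_h in N]
    a = sum(W_h * S_h * sqrt(c_h) for W_h, S_h, c_h in zip(W, S, c))
    b = sum(W_h * S_h / sqrt(c_h) for W_h, S_h, c_h in zip(W, S, c))
    fpc_term = sum(W_h * S_h ** 2 for W_h, S_h in zip(W, S)) / total
    return a * b / (V + fpc_term)


def neyman_allocation(n, N, S):
    """Neyman allocation (2.12): equal unit costs, n_h proportional to N_h * S_h."""
    return optimum_allocation(n, N, S, [1.0] * len(N))


def equal_share(C, c0, c):
    """Equal share (2.13)-(2.14): n_h = n / L with n = L (C - c0) / sum_h c_h."""
    L = len(c)
    n = L * (C - c0) / sum(c)
    return [n / L] * L


def proportional(C, c0, N, c):
    """Proportional allocation (2.15)-(2.16): n_h = n W_h, n = (C - c0) / sum_h W_h c_h."""
    total = sum(N)
    W = [N_h / total for N_h in N]
    n = (C - c0) / sum(W_h * c_h for W_h, c_h in zip(W, c))
    return [n * W_h for W_h in W]


# Hypothetical three-stratum population (illustrative numbers only).
N, S, c = [1000, 2000, 500], [10.0, 20.0, 40.0], [4.0, 9.0, 16.0]
C, c0 = 1000.0, 100.0
n = n_for_fixed_cost(C, c0, N, S, c)
print(round(n, 2), [round(x, 2) for x in optimum_allocation(n, N, S, c)])
# 95.45 [20.45, 54.55, 20.45]
```

Each fixed-cost rule spends exactly $C - c_0$ on sampling, but none of the resulting sizes is an integer, which is the rounding issue raised later in this chapter.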

2.3 Sample Allocation with More than One Variable:
All the previously presented types of allocation assume that there is only one important variable on which the allocation is based. However, sample surveys usually include more than one important variable, and an allocation that is optimum for one variable will not necessarily be optimum for another [5]. Several researchers, such as Chatterjee and Yates, suggested solutions to this problem (see [5]). However, only one method is presented in this study, namely “Cochran’s average”.

Cochran’s Average (i.e. compromise optimal allocation):
A few of the most important variables are chosen to optimally allocate the sample. Let the subscript $j$ denote the variable, where $j = 1, 2, \ldots, p$. As mentioned earlier, equation (2.7) gives the optimum allocation in terms of the total sample size $n$, and equation (2.8) gives the total sample size in the case of a fixed total cost. Substituting (2.8) in (2.7), we get

$$n_h = \frac{(C - c_0)\, N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{L} N_h S_h \sqrt{c_h}} \qquad (2.17)$$

which represents the optimum allocation under a fixed total cost. Applying this formula to each variable separately gives the optimum individual strata sample sizes:

$$n_{hj} = \frac{(C - c_0)\, N_h S_{hj} / \sqrt{c_h}}{\sum_{h=1}^{L} N_h S_{hj} \sqrt{c_h}} \qquad (2.18)$$

where
$y_{hij}$: value obtained for the $i$th unit in the $h$th stratum for the $j$th variable,
$\bar{Y}_{hj} = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hij}$: true population mean of the $j$th variable in stratum $h$,
$\bar{y}_{hj} = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hij}$: sample mean of the $j$th variable in stratum $h$,
$S_{hj}^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h} (y_{hij} - \bar{Y}_{hj})^2$: true population variance of the $j$th variable in stratum $h$.

The individual strata sample sizes given by (2.18) are averaged over all the variables, giving an optimum compromise allocation that takes all the variables into account:

$$n_h = \frac{1}{p}\sum_{j=1}^{p} n_{hj} \qquad (2.19)$$
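A minimal sketch of Cochran's average in Python, assuming the $S_{hj}$ are known for each of the $p$ variables; the function names and data are hypothetical. Because each individual allocation (2.18) spends exactly $C - c_0$ on sampling and the cost function (2.6) is linear, their average (2.19) also meets the budget, which the example checks.

```python
from math import sqrt


def individual_allocation(C, c0, N, S_j, c):
    """(2.18) for one variable j:
    n_hj = (C - c0) (N_h S_hj / sqrt(c_h)) / sum_h N_h S_hj sqrt(c_h)."""
    den = sum(N_h * S_h * sqrt(c_h) for N_h, S_h, c_h in zip(N, S_j, c))
    return [(C - c0) * N_h * S_h / sqrt(c_h) / den
            for N_h, S_h, c_h in zip(N, S_j, c)]


def cochran_average(C, c0, N, S, c):
    """(2.19): average the individual allocations over the p variables.

    S[j][h] is the (assumed known) standard deviation S_hj of variable j in stratum h.
    """
    per_variable = [individual_allocation(C, c0, N, S_j, c) for S_j in S]
    p = len(S)
    return [sum(alloc[h] for alloc in per_variable) / p for h in range(len(N))]


# Hypothetical data: 3 strata, p = 2 variables.
N, c = [1000, 2000, 500], [4.0, 9.0, 16.0]
S = [[10.0, 20.0, 40.0],   # S_h1 for variable 1
     [30.0, 15.0, 5.0]]    # S_h2 for variable 2
C, c0 = 1000.0, 100.0
compromise = cochran_average(C, c0, N, S, c)
print([round(x, 2) for x in compromise])
print(round(c0 + sum(c_h * n_h for c_h, n_h in zip(c, compromise)), 6))  # 1000.0
```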

In all the previous methods of allocation, there is no guarantee that the resulting allocation will consist of integers. This requires rounding the sample sizes in the different strata, which could yield a total cost that exceeds the specified budget (in the case of a fixed total cost), hence producing infeasible solutions. Moreover, the previous allocation methods can suffer from oversampling, which happens when the sample size in one or more strata is larger than the stratum size [6]. As noted by [5], the optimum allocation formula can produce values of $n_h$ in some strata that are larger than the corresponding number of units $N_h$ in the stratum, and this problem has happened in practice on several occasions. It arises only when the overall sampling fraction (i.e. $n/N$) is large and the variability in some strata is greater than in others [5]. Therefore, alternatives to the classical methods that can overcome these problems have been sought, and there have been many attempts by researchers to apply mathematical programming in the field of sampling. These attempts are discussed in the next chapter.
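The two problems described above can be illustrated with a small, entirely hypothetical example: an allocation that spends the budget exactly becomes infeasible once it is rounded, and it also exceeds a small stratum's size. The helper names are illustrative only.

```python
def total_cost(c0, c, n):
    """Linear cost function (2.6): C = c0 + sum_h c_h n_h."""
    return c0 + sum(c_h * n_h for c_h, n_h in zip(c, n))


def check_allocation(C, c0, c, N, n):
    """List the budget and oversampling problems of a (possibly rounded) allocation."""
    problems = []
    if total_cost(c0, c, n) > C:
        problems.append("budget exceeded")
    for h, (n_h, N_h) in enumerate(zip(n, N), start=1):
        if n_h > N_h:
            problems.append(f"oversampling in stratum {h}")
    return problems


# Hypothetical allocation that uses the whole budget C = 200 exactly.
C, c0, c, N = 200.0, 0.0, [5.0, 8.0], [12, 500]
exact = [12.6, 17.125]                   # 5*12.6 + 8*17.125 = 200
rounded = [int(x + 0.5) for x in exact]  # conventional rounding -> [13, 17]
print(total_cost(c0, c, rounded))        # 201.0
print(check_allocation(C, c0, c, N, rounded))
# ['budget exceeded', 'oversampling in stratum 1']
```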


Chapter 3
Review on Mathematical Programming Approaches to Sample Allocation in Stratified Random Sampling

From the previous chapter, it can be seen that classical methods of sample allocation optimize only one objective subject to one constraint, which can be viewed as a limitation. Also, as stated before, classical methods can produce non-integer sample sizes for the different strata, and rounding them can lead to infeasible solutions [i.e. a total cost that exceeds the specified budget (in the case of a fixed cost)]. Moreover, oversampling can occur when using the classical methods of allocation. Hence, mathematical programming appears to be advantageous, as it can overcome these limitations. First, it offers the ability to optimize several objectives simultaneously, with the benefit of assigning priorities to the different objectives, and several constraints can be imposed. Second, mathematical programming can guarantee that the optimal allocation has integer values through the use of integer programming. Third, it can ensure that oversampling does not occur. The ability to optimize more than one objective at the same time is one of the main benefits that has been utilised by many authors in the field of sampling. Accordingly, this chapter reviews the different mathematical programming approaches to sample allocation suggested in the literature, presenting some of the models proposed to determine the optimal sampling scheme.

In most cases, parameters are to be estimated for more than one variable; therefore, all those variables should be taken into consideration as key variables when determining the optimal strata sample sizes. The review begins with the models developed for the univariate case, followed by the multivariate case without taking the correlation between the variables into account, and then the multivariate case where the correlation is taken into consideration. The chapter ends with a brief review of some attempts that take the precision of the estimators within the strata into account.

Thus, the classification of the literature is as follows:

Mathematical Programming Models
    Univariate
    Multivariate (no correlation)
        Cost as an Objective
        Precision as an Objective
            Approach A
            Approach B
            Approach C
    Multivariate (with correlation)

All these cases will be presented in the following sections.

3.1 Univariate Case:
This section presents different mathematical programming models that deal with the allocation problem when only one variable is of interest.

Arthanari and Dodge [2] presented a review of the use of mathematical programming for the optimal allocation of sample sizes in stratified random sampling. They formulated the problem of obtaining statistical information on population characteristics from sample data as an optimization problem. In the univariate case, the authors considered the problem of having $L$ strata, where it was assumed that the samples were drawn independently from the different strata. The problem of choosing the optimal $n_h$'s is known as the “optimal allocation problem”. In such a problem, the $n_h$'s are the decision variables, and the objective can be to minimize the variance of the estimator of the variable under study (in this case the estimator is $\bar{y}_{st}$) subject to a fixed total sample size $n$. Hence, the problem is formulated as:

$$\text{Minimize } V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h} \qquad (3.1)$$

subject to

$$\sum_{h=1}^{L} n_h = n, \qquad (3.2)$$

$$1 \le n_h \le N_h, \quad n_h \text{ integer}, \quad h = 1, 2, \ldots, L, \qquad (3.3)$$

where $S_h^2$ is the true population variance in stratum $h$ and, as mentioned before, its value is either known from prior studies of the same kind or estimated from pilot investigations.
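For small problems, (3.1)-(3.3) can be solved exactly without a general-purpose solver. The sketch below uses greedy marginal allocation: starting from $n_h = 1$, it repeatedly gives one more unit to the stratum whose variance drops the most, never exceeding $N_h$. This is a swapped-in technique, not the method of [2]; it is exact here because the objective is a sum of convex, decreasing functions of the individual $n_h$. Function names and data are hypothetical, and the $S_h^2$ are assumed known.

```python
import heapq


def integer_allocation(n, N, S2):
    """Solve (3.1)-(3.3): minimize sum_h W_h^2 (1 - n_h/N_h) S_h^2 / n_h
    subject to sum_h n_h = n, 1 <= n_h <= N_h, n_h integer,
    by greedy marginal allocation (exact for this separable convex objective).
    S2 holds the stratum variances S_h^2, assumed known."""
    L = len(N)
    if not L <= n <= sum(N):
        raise ValueError("a feasible allocation needs L <= n <= N")
    total = sum(N)
    k = [(N_h / total) ** 2 * S2_h for N_h, S2_h in zip(N, S2)]  # W_h^2 S_h^2

    def gain(h, m):
        # Variance reduction from raising n_h from m to m + 1.
        return k[h] * (1 / m - 1 / (m + 1))

    alloc = [1] * L
    # heapq is a min-heap, so gains are stored negated to pop the largest first.
    heap = [(-gain(h, 1), h) for h in range(L) if N[h] > 1]
    heapq.heapify(heap)
    for _ in range(n - L):
        _, h = heapq.heappop(heap)
        alloc[h] += 1
        if alloc[h] < N[h]:  # stratum h can still take another unit
            heapq.heappush(heap, (-gain(h, alloc[h]), h))
    return alloc


def variance(alloc, N, S2):
    """Objective (3.1), identical in form to (2.5)."""
    total = sum(N)
    return sum((N_h / total) ** 2 * (1 - n_h / N_h) * S2_h / n_h
               for n_h, N_h, S2_h in zip(alloc, N, S2))


# Hypothetical example: 3 strata, total sample size n = 10.
N, S2 = [5, 8, 6], [4.0, 25.0, 9.0]
best = integer_allocation(10, N, S2)
print(best)  # [1, 6, 3]
```

The upper bound $n_h \le N_h$ is enforced by dropping a stratum from the heap once it is fully sampled, so this sketch never oversamples.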