INTRODUCTION TO SURVEY SAMPLING February 14, 2018 Linda Owens www.srl.uic.edu
General information
Please hold questions until the end of the presentation
Slides available at http://www.srl.uic.edu/seminars.htm
Please raise your hand so that I can see that you can hear me
2
Outline
Introduction
Target Populations
Sample Frames
Sample Designs
Determining Sample Sizes
Modes of Data Collection
Questions
Introduction
Census: Gathering information about every individual in a population
Sample: Selection of a small subset of a population
[Figure: Census vs. Sample]
Why sample instead of taking a census?
Less expensive
Less time-consuming
Often more accurate (resources can be concentrated on data quality)
Samples can lead to statistical inference about the entire population
Probability vs. non-probability
Probability Sample
Generalize to the entire population
Unbiased results
Known, non-zero probability of selection
Non-probability Sample
Exploratory research
Convenience
Probability of selection is unknown
Probability vs Non-Probability Sample
Probability sample: p = n/N = 10/30 = .3333
Non-probability sample: p = n/? = ?
Steve Mays, YouTube video on sampling: https://youtu.be/yx5KZi5QArQ
Rahul Patwari, YouTube video on non-probability sampling: https://youtu.be/-kwdXEXC7yE
Target population Definition:
The population to which we want to generalize our findings
Unit of analysis: Individual/Household/City
Geography: State of Illinois/Champaign County/City of Urbana
Age/Gender
Other variables
Examples of target populations
Population of adults in Champaign County
Faculty, staff, or students at the University of Illinois
Youth age 5 to 18 in Champaign County
Registered Voters
Sampling frame
Before you can ask people to answer your questions, you have to make contact with them. How will you do that?
The sampling frame is the mechanism that makes that possible
Information on the sampling frame has a bearing on the mode of data collection
Sampling frame
A complete list of all units, at the first stage of sampling, from which a sample is drawn
For example, lists of . . .
addresses
landline phone numbers in specific area codes
blocks or census tracts in specified geographic areas
members of a professional organization
schools
cell phone numbers
Target populations, sample frames, and coverage
Example 1:
Population: Adults in Champaign County, IL
Frames: List of landline numbers, list of census blocks, list of addresses
Example 2:
Population: Youth age 5 to 18 in Cook County
Frame: List of schools
Example 3:
Population: Adults age 18-34 in United States
Frame: ??
Coverage: How well does the sample frame represent the target population?
Coverage Error
[Figure: Target Population and Sample Frame]
Sample designs for probability samples
Simple random samples
Systematic samples
Stratified samples
Cluster samples
Multi-stage samples
Combination (e.g. stratified cluster sample)
Simple random sampling (SRS)
Definition: Every element has the same probability of selection and every combination of elements has the same probability of selection.
Probability of selection: n/N, where n = sample size; N = population size
Use random number tables or software packages to generate random numbers
Most precision estimates assume SRS
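As a rough illustration of the SRS definition above, the sketch below draws 6 units out of 30 with Python's standard library. The frame of units numbered 1 to 30, the function name, and the seed are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: units numbered 1..30
population = list(range(1, 31))
sample = simple_random_sample(population, n=6, seed=2018)

# Probability of selection for any one unit: n/N
p_selection = len(sample) / len(population)
print(sorted(sample), p_selection)
```

Fixing the seed only makes the draw reproducible; leaving it as None gives a fresh sample each run.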
Simple Random (6 out of 30)
Systematic sampling
Definition: Every element has the same probability of selection, but not every combination can be selected.
Use when drawing SRS is difficult
List of elements is long & not computerized
Procedure
Determine population size N and sample size n
Calculate sampling interval i = N/n
Pick random start between 1 & sampling interval
Take every ith case
Problem of periodicity (a cyclical pattern in the list that lines up with the sampling interval can bias the sample)
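The procedure above can be sketched in a few lines of Python. This is a minimal version that assumes N is an exact multiple of n, so the interval is a whole number; the frame and function name are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every i-th element after a random start, where i = N // n."""
    N = len(population)
    interval = N // n                     # sampling interval (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randint(1, interval)      # random start between 1 and the interval, inclusive
    # Positions are 1-based: start, start + interval, start + 2*interval, ...
    return [population[pos - 1] for pos in range(start, N + 1, interval)]

# Hypothetical frame of 30 units; n = 6 gives an interval of 5
population = list(range(1, 31))
sample = systematic_sample(population, n=6, seed=1)
print(sample)
```

Only as many distinct samples exist as there are possible starts (5 here), which is why not every combination of elements can be selected.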
Systematic Sample (every 5th)
Stratified sampling: Proportionate
To ensure sample resembles some aspect of population
Population is divided into subgroups (strata)
Students by year in school
Faculty by gender
Simple Random Sample (with same probability of selection) taken from each stratum. Sampling fraction is the same for all strata, regardless of population in each stratum. Larger strata will have larger sample
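A minimal sketch of proportionate stratification, assuming three hypothetical strata of 25, 10, and 15 units and a sampling fraction of 1/5. The stratum labels and the function name are invented for illustration.

```python
import random

def proportionate_stratified_sample(strata, fraction, seed=None):
    """Take an SRS from each stratum using the same sampling fraction."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n_h = round(len(units) * fraction)   # stratum sample size grows with stratum size
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical strata (labels are placeholders)
strata = {
    "A": [f"A{i}" for i in range(25)],
    "B": [f"B{i}" for i in range(10)],
    "C": [f"C{i}" for i in range(15)],
}
sample = proportionate_stratified_sample(strata, fraction=1 / 5, seed=7)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)
```

Because the fraction is constant, the sample mirrors the population's split across strata and no weighting is needed for this step.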
Proportionate Stratified Sample (sampling fraction=1/5)
Stratum 1: N=25 (n=5)
Stratum 2: N=10 (n=2)
Stratum 3: N=15 (n=3)
Stratified sampling: Disproportionate
Major use is comparison of subgroups
Population is divided into subgroups (strata)
Compare girls & boys who play Little League
Compare seniors & freshmen who live in dorms
Probability of selection needs to be higher for smaller stratum (girls & seniors) to be able to compare subgroups.
Requires weighting to adjust for different probabilities of selection
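One common way to handle the weighting is to give each sampled case a base weight equal to the inverse of its probability of selection. The sketch below assumes three hypothetical strata of 25, 10, and 15 units with 4 cases drawn from each; names and sizes are illustrative only.

```python
import random

def disproportionate_sample(strata, n_per_stratum, seed=None):
    """Draw the same number of cases per stratum and return inverse-probability weights."""
    rng = random.Random(seed)
    sample, weights = {}, {}
    for name, units in strata.items():
        sample[name] = rng.sample(units, n_per_stratum)
        # Selection probability is n_h / N_h, so the base weight is N_h / n_h
        weights[name] = len(units) / n_per_stratum
    return sample, weights

# Hypothetical strata (labels are placeholders)
strata = {
    "A": list(range(25)),
    "B": list(range(10)),
    "C": list(range(15)),
}
sample, weights = disproportionate_sample(strata, n_per_stratum=4, seed=3)

# Weighted sample counts should add back up to the population size
weighted_total = sum(weights[name] * len(units) for name, units in sample.items())
print(weights, weighted_total)
```

Cases from the smallest stratum get the smallest weight because they were oversampled relative to their share of the population.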
Disproportionate Stratified Sample (n=12, 4 from each stratum; overall p=.24)
Stratum 1: p=4/25=.16
Stratum 2: p=4/10=.40
Stratum 3: p=4/15=.267
Cluster/Multistage sampling
Typically used in face-to-face surveys
Population divided into clusters
Schools (earlier example)
Blocks
Draw a sample of clusters
Include every member of cluster (=cluster sample)
Select random sample of cluster members (=multistage sample)
Reasons for cluster sampling
Reduction in cost
No satisfactory sampling frame available
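The difference between a cluster sample and a multistage sample can be sketched as follows. The schools, their rosters, and both function names are hypothetical; the point is only that the first keeps every member of a sampled cluster while the second subsamples within it.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Sample clusters, then include every member of each sampled cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Sample clusters, then draw a random sample of members within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical frame: 10 schools with 20 students each
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(20)] for i in range(10)}

full = cluster_sample(schools, n_clusters=3, seed=11)
partial = two_stage_sample(schools, n_clusters=3, n_per_cluster=5, seed=11)
print({c: len(m) for c, m in full.items()})
print({c: len(m) for c, m in partial.items()})
```

Only a list of schools is needed up front; student rosters are required just for the schools that end up in the sample.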
Cluster Sample
Complex Sample Designs
Combination of sample strategies
Example: multistage, stratified sample of adults in Chicago
1. Stratify census blocks into groups based on predominant racial/ethnic group
2. Draw a sample of census blocks from each stratum
3. Draw a sample of housing units from each sampled census block
4. Sample one respondent from all eligible adults in the household
5. Each sampling stage has its own probability of selection
6. Final probability of selection of eligible adult is product of all stages
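The last point, that the final probability of selection is the product of the stage probabilities, can be checked with a short calculation. All of the counts below are made-up numbers for one hypothetical respondent, not figures from the Chicago example.

```python
from math import prod

def final_selection_probability(stage_probs):
    """Multiply the selection probabilities from every sampling stage."""
    return prod(stage_probs)

# Hypothetical stage probabilities for one respondent
p_block = 10 / 200    # 10 of 200 census blocks sampled in the stratum
p_housing = 5 / 50    # 5 of 50 housing units sampled in the block
p_adult = 1 / 2       # 1 of 2 eligible adults sampled in the household

p_final = final_selection_probability([p_block, p_housing, p_adult])
base_weight = 1 / p_final   # inverse-probability weight for analysis
print(p_final, base_weight)
```

Note that the last stage depends on household size, so adults in larger households end up with lower selection probabilities and larger weights.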
Determining sample size: SRS
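Need to consider
Precision
Variation in subject of interest
Formula
Sample size
$n_o = \dfrac{CI^2 \cdot (pq)}{\text{Precision}^2}$, where CI is the z-value for the chosen confidence level, p is the expected proportion, and q = 1 - p
For example:
$n_o = \dfrac{1.96^2 \cdot (.5 \cdot .5)}{.05^2}$
Sample size not dependent on population size (except finite population correction)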
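The worked example can be evaluated directly. The sketch below reads the formula with the precision term squared in the denominator, as the numbers in the example suggest, and uses 1.96 as the z-value for 95% confidence; the function and parameter names are illustrative.

```python
import math

def srs_sample_size(z=1.96, p=0.5, precision=0.05):
    """Initial SRS sample size: z^2 * p * q / precision^2, with q = 1 - p."""
    q = 1 - p
    return (z ** 2) * (p * q) / precision ** 2

n0 = srs_sample_size()
# Round up, since a fraction of a respondent cannot be interviewed
print(round(n0, 2), math.ceil(n0))
```

Using p = .5 maximizes p*q, so it gives the most conservative (largest) sample size when the true proportion is unknown.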
Sample size: Other issues
Finite Population Correction (FPC)
Use when sample >5% of pop
$n = \dfrac{n'}{1 + n'/N}$ (n' = sample size before correction, N = population size)
Design effects
Analysis of subgroups
Increase size to accommodate nonresponse
Cost
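A sketch of how these adjustments might be chained, assuming the correction n = n'/(1 + n'/N) and two common conventions that the slides do not spell out: multiplying by a design effect and dividing by the expected response rate. The population size, design effect, and response rate below are hypothetical.

```python
import math

def apply_fpc(n_prime, N):
    """Finite population correction: n = n' / (1 + n'/N)."""
    return n_prime / (1 + n_prime / N)

def apply_design_effect(n, deff):
    """Scale the sample size by an assumed design effect."""
    return n * deff

def inflate_for_nonresponse(n, response_rate):
    """Number of cases to field so that about n respond."""
    return math.ceil(n / response_rate)

n_prime = 384.16     # uncorrected SRS size (95% confidence, +/- .05, p = .5)
N = 2000             # hypothetical population size
n_fpc = apply_fpc(n_prime, N)
n_deff = apply_design_effect(n_fpc, deff=1.5)              # hypothetical design effect
n_field = inflate_for_nonresponse(n_deff, response_rate=0.4)  # hypothetical response rate
print(round(n_fpc, 2), round(n_deff, 2), n_field)
```

The correction only lowers the target noticeably when the sample is a sizable share of the population, while design effects and nonresponse push the fielded sample back up.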
Modes of data collection
Face to face
Phone
Web
Mail
Target population/frame/mode correspondence
Mode needs to be consistent with information in sample frame
Mode needs to be consistent with target population
Cell phone and landline frames
An increasing proportion of US households are cell phone only (52.5% in 2017, 5.9% landline only)
https://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless201712.pdf (Blumberg & Luke)
Cell phone only households tend to be
• Unrelated adults
• Hispanic adults
• Younger
• Lower SES
• But……
Landline sample frames can lead to bias
Cell phone and landline frames, cont.
Cell phone frames harder to target geographically than landline frames
Survey researchers are combining landline and cell phone frames
Address-based sampling
Sampling addresses from a near-universal listing of residential mail delivery locations
Post Office Delivery Sequence Files (DSF)
Address-based sampling: advantages
Coverage of households is very high
Can be matched to name and listed telephone numbers
Includes non-telephone households
More efficient than traditional block-listing
Address-based sampling: disadvantages
Incomplete in rural areas (although improving with 9-1-1 address conversion)
Difficulties with “multidrop” addresses
Best used with mail or face to face surveys.
Can be used for web surveys with some additional effort/cost
Thank you!
Future noontime webinars
Introduction to Questionnaire Design, Wednesday, February 21
Survey Response Rates: Uses and Misuses, Wednesday, February 28
Evaluation
Questions
Resources
Books on Sampling: the Classics
Leslie Kish, Survey Sampling, 1965
William Cochran, Sampling Techniques, 3rd Ed. 2007
Seymour Sudman, Applied Sampling, 1976
Sharon Lohr, Sampling: Design and Analysis, 2009
https://www.cdc.gov/nchs/nhis/releases.htm#wireless
Rahul Patwari, YouTube video on non-probability sampling: https://youtu.be/-kwdXEXC7yE
Steve Mays, YouTube video on sampling: https://youtu.be/yx5KZi5QArQ