Statistics Denmark Maja Fromseier Petersen
[email protected]
Surveys using a probability-based web panel or an “E-boks panel” 2016
Presentation
Maja Fromseier Petersen, chief consultant and team leader for the survey consultants
Master's degree in Cultural Geography from Copenhagen University
2003-2007: Statistics Denmark, statistics on entrepreneurs and globalization
2007-2013: Deputy manager at SFI Survey (the data collection department) at the National Institute of Social Research
2014-: Merged with Statistics Denmark into the "Statistics Denmark Survey department"
Contact info:
[email protected]
The Danish Statistical system
Person register (CPR), id: Person-No
Building and Housing Register (BBR), id: Address
Business register (CVR), id: CVR-No
Tax, Cadastral, Employment, Social stat, Health, Education, and more…
Interview, Web forms
SD Survey’s business model
SD Survey is an externally selling office within Statistics Denmark
90% of the turnover of 5 mill. € comes from external customers
Positive result in 2015: 0.6 mill. €. Overhead: 2 mill. €
Contacts 300,000 people annually in 80 surveys
29 employees, 25-50 interviewers at the central CATI unit and 200 nationwide interviewers
No product or private marketing
Covers the entire process: sample, questionnaire, weighting and report
Three principles:
1. No loss
2. No (or not much) profit
3. Take care of the reputation of Statistics Denmark
Mission and products
Mission: Make SD methods and registers valuable for others
Products: Optimized sampling and data collection with:
• telephone interviews
• personal interviews
• web forms
• paper interviews
• experiments and tests on the PC
Overview of the presentation
A new and well-documented web panel
• Representative surveys
• Recruiting to the web panel
• Selection from the web panel
• Non-response in the web panels
• Calibration in the web panels
E-boks "panel"
• What is E-boks and what is the E-boks population?
• The E-boks test
• Who's answering?
A new and good web panel
Researchers at universities, government and research institutions also use cheap web panels
Idea: Using Statistics Denmark's registers of the entire population, it must be possible to establish a web panel that comes as close as possible to the classic representative surveys by controlling the bias
A new and good web panel
Accept the bias that follows from requiring Internet access.
Requirements for the new web panel:
1. Recruited from representative surveys (simple random sampling of the total population)
2. Within the 95% confidence interval for all other variables not used for selection
Representative surveys (1)
A ‘sample’ can be selected in many ways, for example all who are here today
Only a sample without bias can really be generalized
Three requirements for probability sampling:
1. Everyone in the target population can be selected
2. Selection with known probability
3. Weighting with the inverse selection probability
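The third requirement, weighting with the inverse selection probability, can be illustrated with a minimal sketch. The four respondents, their selection probabilities and their answers are invented for illustration; the function names are hypothetical and not part of any Statistics Denmark tool.

```python
# Hypothetical sample: (known selection probability, answer) per respondent.
# All values are invented for illustration.
sample = [
    (0.001, 1),  # selected with probability 1/1000, answered "yes" (1)
    (0.001, 0),  # same probability, answered "no" (0)
    (0.002, 1),  # this group was sampled at twice the rate
    (0.002, 1),
]


def design_weights(probabilities):
    """Weight each unit by the inverse of its known selection probability."""
    return [1.0 / p for p in probabilities]


def weighted_share(values, weights):
    """Weighted share of 'yes' answers (a ratio-type estimate)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


probabilities = [p for p, _ in sample]
answers = [a for _, a in sample]

weights = design_weights(probabilities)
estimate = weighted_share(answers, weights)
unweighted = sum(answers) / len(answers)

print(f"unweighted share: {unweighted:.3f}")
print(f"weighted share:   {estimate:.3f}")
```

The over-sampled group counts for less per respondent, so the weighted share moves away from the raw share of the sample.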
Representative surveys (2)
Probability sampling: The right answer may just as well be higher as lower than what we measure
The sample has a statistical uncertainty
The larger the sample, the smaller the confidence interval
But for all background/register variables: 19 out of 20 samples are within the classic 95% confidence interval
Most important: If a sample is not representative, a larger sample may just make the estimate more wrong
Representative surveys (3)
One of many monthly omnibus surveys
The sample is drawn from the central person register (CPR), has a size of around 1,600 and a random error of up to 2.5%

                                    Population (%)   Sample (%)
Elementary school                             34.9         34.0
High school and vocational edu.               39.6         39.3
Short education                                4.2          4.0
Medium education                              12.4         13.2
Long education                                 8.9          9.5
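The stated random error of up to 2.5% for a sample of about 1,600 can be reproduced with the usual normal approximation for a share. This is a sketch that assumes simple random sampling and a 95% level (z = 1.96); the function name is hypothetical. It also checks each education group in the table above against its own margin.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% confidence interval for a share p,
    assuming simple random sampling (normal approximation)."""
    return z * math.sqrt(p * (1.0 - p) / n)


n = 1600  # approximate omnibus sample size from the slide

# The margin is largest at p = 0.5, which gives the "up to" figure.
worst_case = margin_of_error(0.5, n)
print(f"largest random error: {worst_case * 100:.2f} percentage points")

# (population %, sample %) from the education table
education = {
    "Elementary school": (34.9, 34.0),
    "High school and vocational edu.": (39.6, 39.3),
    "Short education": (4.2, 4.0),
    "Medium education": (12.4, 13.2),
    "Long education": (8.9, 9.5),
}

within_interval = {}
for label, (population, sample) in education.items():
    deviation = abs(sample - population) / 100.0
    within_interval[label] = deviation <= margin_of_error(population / 100.0, n)
    print(f"{label:<34} within 95% interval: {within_interval[label]}")
```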
Representative surveys (4)
For any variable: 19 out of 20 samples are within the classic 95% confidence interval

Ethnicity        Present population (%)   Sample (%)
Danish origin                      88.4         89.2
Immigrant                          10.2          9.2
Descendants                         1.5          1.6
Recruiting to Web panel (1)
For more than three years, DST has asked for e-mail addresses in representative samples. About 1,500 respond every month and about 600 say yes.
There are currently about 30,000 in the database.
There are three sources of bias in this web panel:
1. The non-response in the initial representative sample
2. Those who do not have Internet access and do not have an e-mail address
3. Those who will not give their e-mail address
Recruiting to Web panel (2)
Examples of bias:

                      Population   Web panel
18-34 years                29.0%       23.1%
50-64 years                26.8%       30.6%
Elementary school          32.3%       22.2%
Long education              7.5%       10.4%
[label missing]        [missing]       12.3%
600,000 DKK                15.8%       20.9%

Random error with 30,000 observations: around 0.5 percent
Recruiting to Web panel (3)
More examples of bias:

                      Population   Web panel
Men                        50.1%       49.0%
South Denmark              21.1%       21.6%
Danes                      87.8%       94.3%
Western countries           4.7%        2.5%
Non western                 7.5%        3.2%
One plan housing           67.9%       74.3%
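To see how large these deviations are relative to random error, each one can be compared with the 95% margin for a sample of about 30,000. This sketch treats the panel as if it were a simple random sample of that size, which is an assumption made only to get a yardstick; the function and variable names are hypothetical.

```python
import math

PANEL_SIZE = 30_000  # approximate panel size stated in the slides


def margin_of_error(p, n, z=1.96):
    """Half-width of a 95% confidence interval for a share p,
    assuming simple random sampling (an assumption of this sketch)."""
    return z * math.sqrt(p * (1.0 - p) / n)


# (population %, web panel %) from the table above
shares = {
    "Men": (50.1, 49.0),
    "South Denmark": (21.1, 21.6),
    "Danes": (87.8, 94.3),
    "Western countries": (4.7, 2.5),
    "Non western": (7.5, 3.2),
    "One plan housing": (67.9, 74.3),
}

# Ratio of the observed deviation to the random error; above 1 means
# the deviation is larger than sampling variation alone would explain.
bias_ratio = {}
for label, (population, panel) in shares.items():
    deviation = abs(panel - population) / 100.0
    bias_ratio[label] = deviation / margin_of_error(population / 100.0, PANEL_SIZE)

for label, ratio in sorted(bias_ratio.items(), key=lambda item: item[1]):
    print(f"{label:<20} deviation is {ratio:5.1f} x the random error")
```

Under this yardstick, gender and region sit close to the random error while ethnic background and housing lie far outside it.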
Recruiting to Web panel (4)
More examples of bias:

                             Population   Web panel
Singles without children          28.1%       20.6%
Single parents                    28.1%       20.6%
Unmarried                         37.7%       30.8%
Married                           48.3%       56.5%
Unemployed                        20.2%       14.9%
Employed                          61.4%       68.3%
Recruiting to Web panel (5)
Conclusion for recruitment
The three sources of bias worsen the picture beyond the normal first-step non-response on a number of variables
Gender and geography, however, are less affected
The web panel shows a too socially positive picture of the population
Selection from the web panel (1)
Is it possible to correct the bias by selecting proportionally to register variables?

                       Population     GAR   EFGAR
G: Men                      50.1%   50.1%   50.1%
A: 18-34 years              29.0%   29.1%   29.1%
R: South DK                 21.1%   21.1%   21.0%
E: Elementary edu.          32.3%   21.6%   31.6%
F: Single parents           28.1%   20.8%   28.1%

GAR (Gender, Age and Region) cannot correct for Education and Family
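Selecting proportionally to register variables can be sketched as a quota draw: split the panel into cells, give each cell a quota equal to its population share, and draw at random within the cell. The panel below is invented, the `cell` field and both function names are hypothetical, and only one variable (education) is used to keep the example short. This is an illustration of the idea, not the procedure used by SD Survey.

```python
import random


def proportional_quotas(pop_shares, n):
    """Allocate n selections across cells in proportion to population
    shares, using largest-remainder rounding so the quotas sum to n."""
    raw = {cell: share * n for cell, share in pop_shares.items()}
    quotas = {cell: int(value) for cell, value in raw.items()}
    leftover = n - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda c: raw[c] - quotas[c], reverse=True)
    for cell in by_remainder[:leftover]:
        quotas[cell] += 1
    return quotas


def select_from_panel(panel, pop_shares, n, seed=0):
    """Draw a simple random sample of the quota size within each cell."""
    rng = random.Random(seed)
    quotas = proportional_quotas(pop_shares, n)
    selected = []
    for cell, quota in quotas.items():
        members = [m for m in panel if m["cell"] == cell]
        # random.sample raises ValueError if the cell has too few members
        selected.extend(rng.sample(members, quota))
    return selected


# Hypothetical panel of 1,000 where elementary education is under-represented.
panel = [
    {"id": i, "cell": "elementary" if i < 222 else "other"}
    for i in range(1000)
]
# Population share for elementary education taken from the table above.
pop_shares = {"elementary": 0.323, "other": 0.677}

selected = select_from_panel(panel, pop_shares, n=100)
elementary = sum(1 for m in selected if m["cell"] == "elementary")
print(f"elementary share in selection: {elementary / len(selected):.2f}")
```

The selection matches the population only on the variables that define the cells, which is why GAR leaves education and family type uncorrected.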
Selection from the web panel (2)
Examples of other factors that are not selected proportionally
EFGAR: Gender, Age, Region, Education and Family

                  Population     GAR   EFGAR
Danes                  87.8%   95.3%   93.6%
< 100,000 DKK          18.1%   13.3%   16.9%
Employed               61.4%   68.6%   67.0%

Income, ethnic background and employment are still biased with EFGAR
Income is somewhat better with EFGAR, but still biased.
Non-response in web panels
3,000 selected proportionally to EFGAR (Education, Family, Gender, Age and Region)

                   Selected   Response   Population
18-34 years           28.8%      22.7%        29.0%
Elementary            31.6%      23.1%        32.3%
Men                   50.1%      53.7%        50.1%
Single parents        27.9%      27.5%        28.1%
South DK              20.9%      18.9%        21.1%

Men respond more in the web panel, unlike in other collection methods
Education and age are wrong again, even though they were corrected in the selection
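One simple way to read the table is to divide each group's share among respondents by its share among those selected. A ratio below 1 means the group is under-represented after non-response. The figures are taken from the slide; the ratio itself is an illustrative summary and not a measure reported in the presentation.

```python
# (selected %, response %) from the non-response table
groups = {
    "18-34 years": (28.8, 22.7),
    "Elementary": (31.6, 23.1),
    "Men": (50.1, 53.7),
    "Single parents": (27.9, 27.5),
    "South DK": (20.9, 18.9),
}

# Share among respondents relative to share among the selected.
representation = {
    label: response / selected
    for label, (selected, response) in groups.items()
}

least_represented = min(representation, key=representation.get)

for label, ratio in sorted(representation.items(), key=lambda item: item[1]):
    print(f"{label:<16} {ratio:.2f}")
```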
Calibration in web panels
GREG calibration on gender, age and region, as well as education and family type
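The idea of calibrating weights to known register totals can be sketched with raking (iterative proportional fitting). Raking is a related calibration technique swapped in here because it fits in a few lines; it is not the GREG estimator named on the slide. The respondents are invented, the "35+" age group is a hypothetical complement to 18-34, and the targets reuse the population shares for men and 18-34 year olds from the slides. The last function applies Kish's approximation to show how much uneven weights reduce the effective sample size.

```python
def weighted_share(respondents, weights, variable, category):
    """Weighted share of respondents in a given category."""
    hit = sum(w for r, w in zip(respondents, weights) if r[variable] == category)
    return hit / sum(weights)


def rake(respondents, margins, iterations=50):
    """Scale weights variable by variable until the weighted shares
    match every target margin (iterative proportional fitting)."""
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for variable, targets in margins.items():
            total = sum(weights)
            current = {}
            for r, w in zip(respondents, weights):
                current[r[variable]] = current.get(r[variable], 0.0) + w
            weights = [
                w * targets[r[variable]] * total / current[r[variable]]
                for r, w in zip(respondents, weights)
            ]
    return weights


def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical respondents: too many men, too few young people.
respondents = (
    [{"gender": "men", "age": "18-34"}] * 10
    + [{"gender": "men", "age": "35+"}] * 44
    + [{"gender": "women", "age": "18-34"}] * 13
    + [{"gender": "women", "age": "35+"}] * 33
)
margins = {
    "gender": {"men": 0.501, "women": 0.499},
    "age": {"18-34": 0.290, "35+": 0.710},
}

weights = rake(respondents, margins)
n_eff = kish_effective_sample_size(weights)

print(f"men after raking:   {weighted_share(respondents, weights, 'gender', 'men'):.3f}")
print(f"18-34 after raking: {weighted_share(respondents, weights, 'age', '18-34'):.3f}")
print(f"effective sample size: {n_eff:.1f} of {len(respondents)}")
```

The more the respondents differ from the population, the more the weights spread out and the smaller the effective sample size becomes.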
18-34 year Single parents Elementary more tablet / phone
E-boks omnibus with and without smileys

                                                   Extra Omnibus    Extra Omnibus + Smileys
Daycares can demand parents to take out the
children for 3 weeks summer holiday                Agree: 28 pct    Agree: 36 pct
0-class obligatory or not                          Agree: 79 pct    Agree: 67 pct
• Fully agree • Agree • Both agree and disagree • Disagree • Fully disagree
Web panel and E-boks panel
Conclusions and your questions
Even when a web panel is recruited from representative surveys, there is great bias
The non-response in the web collection introduces new bias
Weighting restores the population, but there is big variation in the weights
One could call this web panel quasi-representative
E-boks solves the problem of the first sources of bias, but it is still a problem that some respondents are hard to get to answer. We still have to weight the data to restore the population.
What do you think is the future for these ways of collecting data? Where should we focus and what should we be aware of?