Statistics Denmark Maja Fromseier Petersen Surveys using a probability-based web panel or an E-boks panel 2016


Presentation

• Maja Fromseier Petersen, chief consultant and team leader for the survey consultants
• Master's degree in Cultural Geography, University of Copenhagen
• 2003-2007: Statistics Denmark, statistics on entrepreneurs and globalization
• 2007-2013: Deputy manager at SFI Survey (the data collection department) at the Danish National Institute of Social Research (SFI)
• 2014-: SFI Survey merged with Statistics Denmark to form the Statistics Denmark Survey department
• Contact info: [email protected]


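The Danish statistical system

[Figure: The Danish statistical system. Three base registers, each with its own key: the Central Person Register (CPR, id: person number), the Building and Housing Register (BBR, id: address) and the Business Register (CVR, id: CVR number). Linked areas include tax, cadastral data, employment, interviews and web forms, social statistics, health, education, and more.]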
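Linking registers through a shared identifier amounts to a keyed join. The sketch below is a minimal illustration, assuming toy in-memory register extracts; the register contents, field names and IDs are hypothetical and do not reflect Statistics Denmark's actual data model.

```python
def link_registers(sample_ids, person_register, tax_register, education_register):
    """Attach register variables to sampled persons via the shared person ID.

    All registers are hypothetical dicts keyed by person ID. A person missing
    from a register gets None for that register's variable.
    """
    linked = []
    for pid in sample_ids:
        person = person_register[pid]  # the sampling frame: every sampled ID must exist here
        linked.append({
            "person_id": pid,
            "address": person["address"],  # the key into an address-based register such as BBR
            "income": tax_register.get(pid),
            "education": education_register.get(pid),
        })
    return linked


if __name__ == "__main__":
    persons = {"P1": {"address": "Street 1"}, "P2": {"address": "Street 2"}}
    tax = {"P1": 350_000}  # P2 is deliberately missing from the tax extract
    education = {"P1": "Long", "P2": "Elementary"}
    for row in link_registers(["P1", "P2"], persons, tax, education):
        print(row)
```

Because every register shares the same key, register variables are known for the whole sample, respondents and non-respondents alike, which is what makes the sample-versus-population comparisons later in the presentation possible.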

SD Survey’s business model

• SD Survey is a unit within Statistics Denmark that sells its services externally
• 90% of the EUR 5 million turnover comes from external customers
• 2015 result: a surplus of EUR 0.6 million; overhead of EUR 2 million
• Contacts 300,000 people annually in 80 surveys
• 29 employees, 25-50 interviewers at the central CATI (computer-assisted telephone interviewing) unit and 200 field interviewers nationwide
• No product or private-sector market research
• Covers the entire process: sampling, questionnaire, weighting and reporting

Three principles:
1. No losses
2. No (significant) profit
3. Protect the reputation of Statistics Denmark

Mission and products

Mission: make Statistics Denmark's methods and registers valuable to others.

Products: optimized sampling and data collection using
• telephone interviews
• personal interviews
• web forms
• paper questionnaires
• experiments and tests on the PC

Overview of the presentation

• A new and well-documented web panel
  • Representative surveys
  • Recruiting to the web panel
  • Selection from the web panel
  • Non-response in the web panel
  • Calibration in the web panel
• E-boks “panel”
  • What is E-boks, and what is the E-boks population?
  • The E-boks test
  • Who is answering?

A new and good web panel

Researchers at universities, in government and at research institutions also use cheap web panels.

Idea: using Statistics Denmark's register of the entire population, it should be possible to establish a web panel that comes as close as possible to classic representative surveys by controlling the bias.

A new and good web panel

We accept the bias that follows from requiring Internet access. Requirements for the new web panel:
1. Recruited from representative surveys (simple random samples of the total population)
2. Within the 95% confidence interval for all other variables not used for selection


Representative surveys (1)

A ‘sample’ can be selected in many ways, for example everyone who is here today. Only a sample without bias can really be generalized.

Three requirements for probability sampling:
1. Everyone in the target population can be selected
2. Selection with a known probability
3. Weighting with the inverse of the selection probability
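Requirements 2 and 3 work together: once each unit's selection probability is known, weighting by its inverse undoes any deliberate over- or undersampling. A minimal sketch, assuming a hypothetical two-stratum population in which stratum B is oversampled; with a single simple random sample all weights would be equal, so unequal rates are used here to make the effect visible.

```python
import random


def stratified_sample(strata, rates, seed=0):
    """Draw a simple random sample within each stratum at a known rate.

    Returns (value, inclusion_probability) pairs, so every sampled unit
    carries its known selection probability (requirement 2).
    """
    rng = random.Random(seed)
    sample = []
    for name, units in strata.items():
        n_h = round(rates[name] * len(units))
        pi = n_h / len(units)  # exact inclusion probability of SRS within the stratum
        sample.extend((y, pi) for y in rng.sample(units, n_h))
    return sample


def weighted_mean(sample):
    """Mean with design weights d_i = 1 / pi_i (requirement 3)."""
    total_weight = sum(1 / pi for _, pi in sample)
    return sum(y / pi for y, pi in sample) / total_weight


def unweighted_mean(sample):
    return sum(y for y, _ in sample) / len(sample)


# Hypothetical population: 900 units with y = 1 and 100 units with y = 0 (true mean 0.9).
strata = {"A": [1.0] * 900, "B": [0.0] * 100}
rates = {"A": 0.1, "B": 0.5}  # stratum B is deliberately oversampled
sample = stratified_sample(strata, rates)
print(unweighted_mean(sample))  # 90 / 140, about 0.643: biased towards B
print(weighted_mean(sample))    # 0.9, the population mean
```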


Representative surveys (2)

With probability sampling, the true value is as likely to be higher than our estimate as lower.

The sample has statistical uncertainty: the larger the sample, the smaller the confidence interval. For all background/register variables, 19 out of 20 samples fall within the classic 95% confidence interval.

Most important: if a sample is not representative, a larger sample does not reduce the bias. It only narrows the confidence interval around the wrong value, so the estimate looks more precise while remaining just as wrong.
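Why a larger sample cannot rescue a biased one can be checked with simple arithmetic: the confidence interval shrinks with 1/√n while the bias stays fixed. A minimal sketch, assuming a hypothetical true share of 50% and a selection bias of +3 percentage points, with random error ignored so only the bias remains.

```python
import math

Z95 = 1.96  # standard normal quantile for a two-sided 95% interval


def ci_half_width(p_hat, n):
    """Half-width of the normal-approximation 95% confidence interval for a share."""
    return Z95 * math.sqrt(p_hat * (1 - p_hat) / n)


def ci_covers_truth(p_true, bias, n):
    """Does the 95% interval around a systematically biased estimate contain the truth?

    For clarity the estimate is set to exactly p_true + bias (random error ignored).
    """
    p_hat = p_true + bias
    return abs(p_hat - p_true) <= ci_half_width(p_hat, n)


# Hypothetical: true share 50%, selection bias of +3 percentage points.
for n in (100, 400, 1_600, 30_000):
    print(n, round(ci_half_width(0.53, n), 4), ci_covers_truth(0.50, 0.03, n))
```

With these numbers the interval still contains the true 50% at n = 100 and n = 400, but not at n = 1,600 or n = 30,000: the bigger the biased sample, the more confidently it points at the wrong value.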


Representative surveys (3)

One of many monthly omnibus surveys. The sample is drawn from the Central Person Register (CPR); its size is around 1,600, with a random error of up to 2.5 percentage points.

Education                              Population (%)   Sample (%)
Elementary school                      34.9             34.0
High school and vocational education   39.6             39.3
Short education                        4.2              4.0
Medium education                       12.4             13.2
Long education                         8.9              9.5
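The "up to 2.5%" figure and the claim that the sample matches the population can both be checked against the 95% margin of error for a share. A minimal sketch, assuming a sample of exactly 1,600 (the slide says "around 1,600") and using the education shares from the table.

```python
import math

N_SAMPLE = 1_600  # the slide says "around 1,600"; taken as exactly 1,600 here

# category: (population share, sample share), from the education table
EDUCATION = {
    "Elementary school": (0.349, 0.340),
    "High school and vocational education": (0.396, 0.393),
    "Short education": (0.042, 0.040),
    "Medium education": (0.124, 0.132),
    "Long education": (0.089, 0.095),
}


def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% interval for a share p estimated from n persons."""
    return z * math.sqrt(p * (1 - p) / n)


def within_ci(p_population, p_sample, n):
    return abs(p_sample - p_population) <= margin_of_error(p_population, n)


for category, (pop, smp) in EDUCATION.items():
    moe = margin_of_error(pop, N_SAMPLE)
    print(f"{category:38} diff {100 * (smp - pop):+.1f} pp, "
          f"margin ±{100 * moe:.1f} pp, inside: {within_ci(pop, smp, N_SAMPLE)}")

# The largest possible random error occurs at p = 0.5.
print(f"maximum margin: ±{100 * margin_of_error(0.5, N_SAMPLE):.2f} pp")
```

At p = 0.5 the margin is 1.96 × √(0.25/1600) = 2.45 percentage points, which is the "up to 2.5%" quoted above; every education group falls inside its own (narrower) margin.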


Representative surveys (4)

For any variable, 19 out of 20 samples are within the classic 95% confidence interval.

Ethnicity        Present population (%)   Sample (%)
Danish origin    88.4                     89.2
Immigrants       10.2                     9.2
Descendants      1.5                      1.6
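The "19 out of 20" statement is a claim about repeated sampling, and a small simulation makes it concrete. A minimal sketch, assuming a true share of 88.4% (Danish origin, from the table) and a sample size of 1,600; persons are drawn independently, which approximates simple random sampling when the population is far larger than the sample.

```python
import random


def coverage_rate(p, n, reps, seed=1):
    """Fraction of repeated random samples whose estimated share lies within
    the 95% margin of error around the true population share p.

    Each person is drawn independently (sampling with replacement), which
    approximates simple random sampling when the population is much larger
    than the sample.
    """
    rng = random.Random(seed)
    moe = 1.96 * (p * (1 - p) / n) ** 0.5
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(p_hat - p) <= moe
    return hits / reps


# Share of Danish origin in the population (88.4%) and an assumed sample size of 1,600.
print(coverage_rate(p=0.884, n=1_600, reps=2_000))  # close to 0.95
```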


Recruiting to the web panel (1)

For more than three years, Statistics Denmark (DST) has asked respondents in representative samples for their e-mail addresses. About 1,500 people answer every month, and about 600 of them agree.

There are currently about 30,000 people in the database. There are three sources of bias in this web panel:
1. Non-response in the initial representative sample
2. People who have neither Internet access nor an e-mail address
3. People who will not give their e-mail address


Recruiting to the web panel (2)

Examples of bias:

                         Population (%)   Web panel (%)
18-34 years              29.0             23.1
50-64 years              26.8             30.6
Elementary school        32.3             22.2
Long education           7.5              10.4
Income < DKK 100,000     18.1             12.3
Income > DKK 600,000     15.8             20.9

Random error with 30,000 observations: around 0.5 percentage points.
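Comparing each difference with the random error shows that these gaps cannot be chance. A minimal sketch, assuming the panel is treated as a random sample of about 30,000 and using the age and education rows from the table.

```python
import math

N_PANEL = 30_000  # approximate size of the web panel database

# variable: (population %, web panel %), from the table
ROWS = {
    "18-34 years": (29.0, 23.1),
    "50-64 years": (26.8, 30.6),
    "Elementary school": (32.3, 22.2),
    "Long education": (7.5, 10.4),
}


def margin_of_error_pp(pct, n, z=1.96):
    """95% random error, in percentage points, for a share of pct % among n persons."""
    p = pct / 100
    return 100 * z * math.sqrt(p * (1 - p) / n)


for name, (pop, panel) in ROWS.items():
    diff = panel - pop
    moe = margin_of_error_pp(pop, N_PANEL)
    verdict = "bias" if abs(diff) > moe else "within random error"
    print(f"{name:20} {diff:+5.1f} pp vs ±{moe:.2f} pp -> {verdict}")
```

The margins are all below 0.6 percentage points, while the differences run from about 3 to 10 points, so every row is flagged as bias.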

Recruiting to the web panel (3)

More examples of bias:

                         Population (%)   Web panel (%)
Men                      50.1             49.0
South Denmark            21.1             21.6
Danes                    87.8             94.3
Western countries        4.7              2.5
Non-western countries    7.5              3.2
One-plan housing         67.9             74.3


Recruiting to the web panel (4)

More examples of bias:

                            Population (%)   Web panel (%)
Singles without children    28.1             20.6
Single parents              28.1             20.6
Unmarried                   37.7             30.8
Married                     48.3             56.5
Unemployed                  20.2             14.9
Employed                    61.4             68.3


Recruiting to the web panel (5)

Conclusion on recruitment: on a number of variables, the three sources of bias make the picture worse than the usual first-step non-response does, although gender and geography are less affected. The web panel shows an overly socially favorable picture of the population.


Selection from the web panel (1)

Is it possible to correct the bias by selecting proportionally to register variables?

                           Population (%)   GAR selection (%)   EFGAR selection (%)
G: Men                     50.1             50.1                50.1
A: 18-34 years             29.0             29.1                29.1
R: South Denmark           21.1             21.1                21.0
E: Elementary education    32.3             21.6                31.6
F: Single parents          28.1             20.8                28.1

Selecting on GAR (gender, age and region) cannot correct for education and family.
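Selecting "proportionally to register variables" is stratified sampling with proportional allocation: each stratum gets its population share of the selections. A minimal sketch, assuming the panel is a list of records and the stratum shares come from the registers; the panel, strata and shares below are hypothetical. Only the variables that define the strata are balanced, which is why GAR leaves education and family biased.

```python
import math
import random
from collections import Counter, defaultdict


def proportional_allocation(pop_shares, n):
    """Split n selections across strata in proportion to population shares.

    Uses largest-remainder rounding so the allocation sums exactly to n.
    pop_shares must sum to 1.
    """
    raw = {s: share * n for s, share in pop_shares.items()}
    alloc = {s: math.floor(v) for s, v in raw.items()}
    shortfall = n - sum(alloc.values())
    # Give the remaining selections to the strata with the largest fractional parts.
    for s in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:shortfall]:
        alloc[s] += 1
    return alloc


def select_proportionally(panel, pop_shares, n, stratum_of, seed=0):
    """Draw n panel members so that each stratum gets its population share."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for member in panel:
        by_stratum[stratum_of(member)].append(member)
    selected = []
    for s, k in proportional_allocation(pop_shares, n).items():
        pool = by_stratum[s]
        if len(pool) < k:
            raise ValueError(f"stratum {s!r} needs {k} members, panel has {len(pool)}")
        selected.extend(rng.sample(pool, k))
    return selected


# Hypothetical panel of 10,000 members in which elementary education is
# under-represented (about 22%), loosely mirroring the web panel.
gen = random.Random(42)
panel = [
    {"id": i,
     "gender": gen.choice(["men", "women"]),
     "education": "elementary" if gen.random() < 0.22 else "other"}
    for i in range(10_000)
]

# Hypothetical population shares for the gender x education strata (sum to 1).
shares = {("men", "elementary"): 0.16, ("men", "other"): 0.34,
          ("women", "elementary"): 0.16, ("women", "other"): 0.34}

chosen = select_proportionally(panel, shares, 1_000,
                               stratum_of=lambda m: (m["gender"], m["education"]))
print(Counter((m["gender"], m["education"]) for m in chosen))
```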


Selection from the web panel (2)

Examples of other variables that are not matched proportionally, even with EFGAR (gender, age, region, education and family) selection:

                        Population (%)   GAR selection (%)   EFGAR selection (%)
Danes                   87.8             95.3                93.6
Income < DKK 100,000    18.1             13.3                16.9
Employed                61.4             68.6                67.0

Income, ethnic background and employment remain biased with EFGAR. Income is somewhat better with EFGAR, but still biased.


Non-response in web panels

3,000 panel members were selected proportionally to EFGAR (education, family, gender, age and region).

                   Selected (%)   Respondents (%)   Population (%)
18-34 years        28.8           22.7              29.0
Elementary         31.6           23.1              32.3
Men                50.1           53.7              50.1
Single parents     27.9           27.5              28.1
South Denmark      20.9           18.9              21.1

Men answer more often in the web panel, unlike with other collection methods. Education and age are skewed again, even though they were corrected in the selection.
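The shift from "Selected" to "Respondents" can be turned into relative response rates per group. By Bayes' rule, a group's share among respondents divided by its share among the selected equals the group's response rate divided by the overall response rate. A minimal sketch using the shares from the table; nothing beyond the table is assumed.

```python
# Shares (%) among the 3,000 selected and among respondents, from the table.
SELECTED = {"18-34 years": 28.8, "Elementary": 31.6, "Men": 50.1,
            "Single parents": 27.9, "South Denmark": 20.9}
RESPONDENTS = {"18-34 years": 22.7, "Elementary": 23.1, "Men": 53.7,
               "Single parents": 27.5, "South Denmark": 18.9}


def relative_response_rate(selected_pct, respondent_pct):
    """Group response rate divided by the overall response rate.

    By Bayes' rule, P(respond | group) / P(respond)
    = P(group | respond) / P(group | selected).
    """
    return respondent_pct / selected_pct


for group in SELECTED:
    r = relative_response_rate(SELECTED[group], RESPONDENTS[group])
    print(f"{group:15} {r:.2f}")
```

For example, 18-34-year-olds respond at about 0.79 times the overall rate and people with elementary education at about 0.73 times, while men respond at about 1.07 times.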

Calibration in web panels

GREG calibration on gender, age and region, as well as education and family type
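GREG calibration adjusts the design weights so that weighted totals of auxiliary register variables hit known population totals. A minimal sketch of linear (GREG-type) calibration, assuming equal design weights, dummy-coded auxiliaries and made-up population totals; the slide calibrates on gender, age, region, education and family type, while this toy uses only gender and one age group. A small Gaussian elimination keeps it dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A is k x k)."""
    k = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: auxiliary variables are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x


def greg_weights(d, X, totals):
    """Linear (GREG) calibration: w_i = d_i * (1 + x_i . lam), with lam chosen
    so that the weighted auxiliary totals equal the known population totals."""
    k = len(totals)
    # T = sum_i d_i x_i x_i^T and current = sum_i d_i x_i
    T = [[sum(di * x[a] * x[b] for di, x in zip(d, X)) for b in range(k)] for a in range(k)]
    current = [sum(di * x[a] for di, x in zip(d, X)) for a in range(k)]
    lam = solve(T, [t - c for t, c in zip(totals, current)])
    return [di * (1 + sum(xa * la for xa, la in zip(x, lam))) for di, x in zip(d, X)]


# Hypothetical respondents: auxiliaries are (intercept, man, aged 18-34).
X = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
d = [1000 / 6] * 6          # equal design weights from a simple random sample
totals = [1000, 501, 290]   # hypothetical population: size, men, aged 18-34

w = greg_weights(d, X, totals)
print([round(wi, 1) for wi in w])
print([round(sum(wi * x[a] for wi, x in zip(w, X)), 6) for a in range(3)])  # matches totals
```

Linear calibration can in principle produce negative or very large weights; production systems typically bound them, which this sketch does not.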

[Figure: 18-34-year-olds, single parents and people with elementary education; more tablet/phone use]

E-boks omnibus with and without smileys

Share who agree (%)                                                                   Extra Omnibus   Extra Omnibus + Smileys
Daycares can demand that parents take the children out for 3 weeks' summer holiday    28              36
0-class (kindergarten class) obligatory or not                                         79              67

Response scale: • Fully agree • Agree • Both agree and disagree • Disagree • Fully disagree


Web panel and E-boks panel: conclusions and your questions

• Even when the web panel is recruited from representative surveys, there is considerable bias
• Non-response in the web collection introduces new bias
• Weighting restores the population distribution, but the weights vary considerably
• This web panel could be called quasi-representative
• E-boks solves the problem of the first sources of bias, but some respondents are still hard to get to answer
• We still have to weight the data to restore the population distribution
• What do you think is the future for these ways of collecting data?
• Where should we focus, and what should we be aware of?
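The cost of "big variation in weights" is commonly summarized by Kish's effective sample size, (Σw)² / Σw², and the matching design effect n / n_eff = 1 + CV² of the weights. A minimal sketch, assuming hypothetical weight vectors rather than the actual panel weights.

```python
def kish_effective_sample_size(weights):
    """Kish's effective sample size: (sum w)^2 / sum w^2.

    Equals n when all weights are equal and shrinks as the weights vary more.
    """
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def kish_design_effect(weights):
    """Approximate design effect from unequal weighting: n / n_eff = 1 + CV^2 of the weights."""
    return len(weights) / kish_effective_sample_size(weights)


equal = [1.0] * 1_000
uneven = [0.5] * 500 + [1.5] * 500  # hypothetical weights after calibration
print(kish_effective_sample_size(equal))   # 1000.0
print(kish_effective_sample_size(uneven))  # 800.0
print(kish_design_effect(uneven))          # 1.25
```

In this toy case the 1,000 weighted respondents carry roughly the precision of 800 unweighted ones, which is the price of restoring the population distribution through weighting.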
