DOES SAMPLE SIZE MATTER IN QUALITATIVE RESEARCH?: A REVIEW OF QUALITATIVE INTERVIEWS IN IS RESEARCH

Bryan Marshall, Georgia College & State University, United States

Peter Cardon, University of Southern California, United States
Amit Poddar, Georgia College & State University, United States
Renee Fontenot, Georgia College & State University, United States

ABSTRACT

This study examines 83 IS qualitative studies in leading IS journals for the following purposes: (a) identifying the extent to which IS qualitative studies employ best practices of justifying sample size; (b) identifying optimal ranges of interviews for various types of qualitative research; and (c) identifying the extent to which cultural factors (such as journal of publication, number of authors, and world region) impact sample size of interviews. Little or no rigor in justifying sample size was shown in virtually all of the IS studies in this dataset. Furthermore, the number of interviews conducted for qualitative studies is correlated with cultural factors, implying the subjective nature of sample size in qualitative IS studies. Recommendations are provided for minimally acceptable practices of justifying sample size of interviews in qualitative IS studies.

KEYWORDS: qualitative methodology, qualitative interviews, data saturation, sample size

INTRODUCTION

Other than selecting a research topic and an appropriate research design, no research task is more fundamental to creating credible research than obtaining an adequate sample. Ensuring that there is enough data is a precursor to credible analysis and reporting. Yet qualitative researchers rarely, if ever, justify the sample sizes of their qualitative interviews. Furthermore, leading qualitative research methodologists provide few concrete guidelines for estimating sample size.

In this study, we explore how well researchers have justified their sample sizes for qualitative interviews in leading IS journals. We began the study with the hope of establishing evidence-based guidelines for researchers who intend to use qualitative interviews in their research, and with the hope that these guidelines would lead to publishable and impactful IS research. In the Findings and Discussion sections of the paper, we demonstrate the poor compliance with basic procedures for rigor in sample size. We conclude with guidelines for standard practices related to justifying sample size that should be employed in all qualitative-interview-based studies. We also discuss what we consider a crisis of rigor related to sample sizes in qualitative IS research.

LITERATURE REVIEW

Abundant qualitative research exists in the IS field, including a variety of reviews and recommendations tailored specifically to the use of qualitative methodology in the IS field [1, 9, 16, 17, 18, 19, 26, 40].



There have even been conferences devoted to qualitative IS research (i.e., the Qualitative Research in IT conference in New Zealand), and webpages sponsored by IS academic organizations supply qualitative methodological resources (i.e., the Association for Information Systems' linked webpage http://www.qual.auckland.ac.nz/). Yet, as in many disciplines, scant attention is paid to estimating sample size for qualitative interviews. In part, this may be because qualitative research emerges from a paradigm of emergent design, with a hesitation to estimate sample size at the often fluid and undefined initial stages of research. In this literature review, we focus first on theoretical arguments for estimating sample size. Then, we discuss applied or practical reasons for estimating sample size. Finally, we discuss best practices in justifying sample size and data saturation.

The concept of data saturation (developed originally for grounded theory studies but applicable to all qualitative research that employs interviews as the primary data source) "entails bringing new participants continually into the study until the data set is complete, as indicated by data replication or redundancy. In other words, saturation is reached when the researcher gathers data to the point of diminishing returns, when nothing new is being added" [2, 22]. Thus, estimating adequate sample size is directly related to the concept of saturation. However, data saturation is an elusive concept and standard in qualitative research, since few concrete guidelines exist. As Morse [23] stated, "Saturation is the key to excellent qualitative work . . . [but] there are no published guidelines or tests of adequacy for estimating the sample size required to reach saturation."

Furthermore, many qualitative methodologists recognize the sloppy fashion in which sample size is described in qualitative studies. Onwuegbuzie and Leech [29] stated, "Many qualitative researchers seemingly select the size of their samples in an arbitrary fashion" (pp. 115-116). Charmaz [4] further stated:

Often, researchers invoke the criterion of saturation to justify small samples — very small samples with thin data. Such justifications diminish the credibility of grounded theory. . . . Claims of saturation often reflect rationalization more than reason, and these claims raise questions. What stands as a category? Is it conceptual? Is it useful? Developed by whose criteria? All these questions add up to the big question: What stands as adequate research? (p. 528)

Most qualitative methodologists openly recognize the lack of standards for sample size.


At the same time, some qualitative methodologists are not troubled by the lack of guidelines, even considering the vague nature of sample size guidelines as a reflection of the qualitative orientation to research. This orientation is not only theoretical but also psychologically fitting for people with more tolerance for ambiguity. For example, in the Sample Size section of his work on qualitative research methodology, Patton [31] explains:

Qualitative inquiry is rife with ambiguities. There are purposeful strategies instead of methodological rules. There are inquiry approaches instead of statistical formulas. Qualitative inquiry seems to work best for people with a high tolerance for ambiguity. . . . Nowhere is this ambiguity clearer than in the matter of sample size. . . . There are no rules for sample size in qualitative inquiry. Sample size depends on what you want to know, the purpose of the inquiry, what's at stake, what will be useful, what will have credibility, and what can be done with available time and resources. (pp. 242-243)

While qualitative methodologists are unlikely to agree on exact sample sizes needed for qualitative studies, they generally agree that a number of factors can affect the number of interviews needed to achieve saturation. In addition to the nature and scope of the study, other factors that can influence the sample size needed to reach saturation include quality of interviews, number of interviews per participant, sampling procedures, and researcher experience [21, 24, 31, 32, 34, 35].

Estimating and justifying sample size of interviews has more than theoretical significance. Qualitative researchers face a number of unique challenges when proposing, conducting, and publishing their research. Fundamentally, there is still strong resistance among many scholars to qualitative research. Although the standing of qualitative work has improved over the past few decades, qualitative work is still viewed as less rigorous by many non-qualitative researchers [8]. Furthermore, many qualitative research techniques are misunderstood by non-qualitative researchers. This leads to a variety of political problems, including difficulty in getting qualitative proposals approved and getting qualitative work published. Many scholars also believe that qualitative research is more time consuming and mentally challenging. It can therefore be a high-resource, high-risk, high-time-commitment research activity in an academic environment that is often driven by short-term productivity demands. For these many reasons, it is essential that qualitative researchers identify ways of accomplishing their research objectives most efficiently. Getting the right amount of data through interviews can accomplish this objective.

Politically, qualitative researchers are generally in a minority position. Doctoral committees and editorial review boards typically contain a majority of quantitatively oriented researchers. Therefore, qualitative researchers need to mold research design, analysis, and reporting to gain political support. Many reviews of qualitative research proposals are conducted by scholars with little or no knowledge of proper qualitative procedures. In some cases, antagonism towards qualitative work exists [5, 25, 30]. Regarding the many political and resource constraints facing qualitative researchers, Patton [31] stated: "Sampling to the point of redundancy is an ideal, one that works best for basic research, unlimited timelines, and unconstrained resources. The solution is judgment and negotiation.
I recommend that qualitative sampling designs specify minimum samples based on expected reasonable coverage of the phenomenon given the purpose of the study and stakeholder interests." (p. 246)

Patton [31] further stated:

At the beginning, for planning and budgetary purposes, one specifies a minimum expected sample size and builds rationale for that minimum, as well as criteria that would alert the researcher to inadequacies in the original sampling approach and/or size. In the end, sample size adequacy, like all aspects of research, is subject to peer review, consensual validation, and judgment. (p. 246)

Many qualitative methodologists note the time-consuming and mentally exhausting nature of qualitative work compared to other forms of research. Yin [43] stated regarding case study research:

In actuality, the demands of a case study on your intellect, ego, and emotions are far greater than those of any other research method. This is because the data collection procedures are not routinized. . . . During data collection, only a more experienced investigator will be able to take advantage of unexpected opportunities rather than being trapped by them — and also will exercise sufficient care against potentially biased procedures. (p. 68)

Guest, Bunce, and Johnson [15] further emphasized the need for numerical targets for sample sizes of interviews:

Our experience, however, tells us that it is precisely a general, numerical guideline that is most needed, particularly in the applied research sector. Individuals designing research — lay and experts alike — need to know how many interviews they should budget for and write into their protocol, before they enter the field. This article is in response to this need, and we hope it provides an evidence-based foundation on which subsequent researchers can expand. (p. 60)

We have cited a number of scholars who describe the importance of estimating and justifying sample size for practical reasons. We would add that the lack of more concrete guidelines acts as a barrier to many scholars considering qualitative research. When Yin [43] states that "procedures are not routinized . . . [and that] only a more experienced investigator will be able to take advantage of unexpected opportunities rather than being trapped by them" (p. 68), this deters scholars who, despite interest in qualitative techniques, prefer some routine and less ambiguity. Furthermore, when Patton [31] states that "qualitative inquiry seems to work best for people with a high tolerance for ambiguity . . . [and that] nowhere is this ambiguity clearer than in the matter of sample size" (pp. 242-243), he may be unintentionally excluding a large set of scholars who are sympathetic to qualitative research but who need more structured guidelines for rigor, including for estimating sample size. These barriers to entry for researchers with tighter expectations of rigor are a practical problem for all qualitative researchers: by inadvertently discouraging other scholars from participating in qualitative research, they perpetuate the minority position of qualitative researchers, and thus the political and consensus-gaining activities that qualitative researchers must engage in more than quantitative researchers.

In this section, we describe three methods that can be used to justify sample size of interviews in qualitative research.


The first and second are external justifications — they depend on other scholars. The first method is to cite recommendations by qualitative methodologists. The second method is to act on precedent by citing sample sizes used in studies with similar research problems and designs [38]. The third and final method is internal justification: statistical demonstration of saturation within a dataset.

Some qualitative research methodologists present general guidelines for sample size of interviews [36]. These guidelines vary from methodologist to methodologist, and sometimes the same methodologist has provided different ranges at different points in time; however, there is substantial overlap in the various recommended ranges. For example, in grounded theory studies, Creswell [6] recommends at least 20 to 30 interviewees. Denzin and Lincoln [8] recommend 30 to 50 interviews. Morse [24] recommends 20 to 30 interviewees with 2 to 3 interviews per person; in 1994, she had recommended 30 to 50 interviews and/or observations. For phenomenological studies, ranges include approximately 6 [7], 6-8 [20], and 6-10 [24]. Case studies are among the most difficult types of qualitative research to classify. Yin [42] recommends at least six sources of evidence. Creswell [6] recommends no more than 4 or 5 cases and 3 to 5 interviewees per case study. One problem is that when qualitative methodologists do present number ranges for appropriate sample size, they fail to explain any rationale [29].

Figure 1 — Data Saturation in a Phenomenological Study (Note: based on Guest, Bunce, and Johnson [15])


The second method of defending sample size is by precedent. Onwuegbuzie and Leech [29] stated, "We recommend that before deciding on an appropriate sample size, qualitative researchers should consider identifying a corpus of interpretive studies that used the same design as in the proposed study (e.g., grounded theory, ethnography) and wherein the data saturation was reached" (p. 118). In this situation, researchers estimate and/or justify sample size by citing similar studies that claimed data saturation at certain points.

The third method of justifying sample size is through statistical demonstration of saturation within a dataset. We are aware of few research efforts to identify appropriate sample size ranges. One exception is the work of Guest, Bunce, and Johnson [15] regarding the issue of theoretical saturation. Based on their review of recommendations of qualitative methodologists, they found that nearly all recommend achieving theoretical saturation, yet stated the following:

They [qualitative methodologists] did a poor job of operationalizing the concept of saturation, providing no description of how saturation might be determined and no practical guidelines for estimating sample sizes for purposively sampled interviews. This dearth led us to carry out another search through the social science and behavioral science literature to see if, in fact, any generalizable recommendations exist regarding nonprobabilistic sample sizes. After reviewing twenty-four research methods books and seven databases, our suspicions were confirmed; very little headway has been made in this regard. (p. 60)

Thus, they set out to develop evidence-based guidelines from their dataset of interviews of sixty women in Nigeria and Ghana in a phenomenological study of social desirability bias and self-reported sexual behavior. They followed the coding procedures of Glaser and Strauss [13]. Based on themes developed in their codebooks, they concluded that of 109 content-driven codes, 80 (73%) were identified within the first six interview transcripts, 100 (92%) were identified within the next six interview transcripts, and the final 8 codes (to reach 100% of all codes) were identified by completion of the thirtieth interview transcript (see Figure 1).



In terms of thematic code prevalence, they calculated Cronbach's alpha to measure the reliability of the code frequency distribution as the analysis progressed. Generally, .70 is considered an acceptable level. They showed that within 12 interviews the Cronbach's alpha was .70; after 18 interviews, .79; after 24 interviews, .85; after 30 interviews, .88; after 36 interviews, .88; after 42 interviews, .89; after 48 interviews, .90; after 54 interviews, .91; and after 60 interviews, .93. They concluded that this measure was reliable early in the process, that it improved at ever-decreasing rates, and that most of the data saturation had occurred by 12 interviews. They stated the following about the questions researchers should ask when planning sample size: "The question we pose, however, frames the discussion differently and asks, 'Given x analyst(s) qualities, y analytic strategy, and z objective(s), what is the fewest number of interviews needed to have a solid understanding of a given phenomenon?'" (p. 77)
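This kind of results-driven check can be run directly from a coding matrix. The sketch below is our own illustration of the general approach rather than Guest, Bunce, and Johnson's actual procedure; the code-by-transcript matrix, the batch size of six transcripts, the treatment of transcripts as scale "items," and the .70 threshold are all assumptions made for demonstration.

```python
import numpy as np

def cumulative_codes(code_matrix):
    """Distinct codes identified after each added transcript.

    code_matrix: rows are transcripts (in the order analyzed), columns are
    codes, values are how often a code was applied in that transcript.
    """
    seen = np.zeros(code_matrix.shape[1], dtype=bool)
    counts = []
    for row in code_matrix:
        seen |= row > 0
        counts.append(int(seen.sum()))
    return counts

def cronbach_alpha(items):
    """Standard Cronbach's alpha; rows are observations, columns are items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def saturation_report(code_matrix, batch=6, alpha_target=0.70):
    """Print cumulative code discovery and alpha after each batch of transcripts."""
    discovered = cumulative_codes(code_matrix)
    for n in range(batch, code_matrix.shape[0] + 1, batch):
        # One plausible operationalization: treat the n transcripts analyzed
        # so far as "items," each scored against every code in the codebook.
        alpha = cronbach_alpha(code_matrix[:n].T)
        flag = "  <- threshold reached" if alpha >= alpha_target else ""
        print(f"{n:3d} transcripts: {discovered[n - 1]:3d} codes identified, alpha = {alpha:.2f}{flag}")

# Toy data: 30 transcripts coded against a 109-code codebook (values are invented).
rng = np.random.default_rng(7)
toy_matrix = (rng.random((30, 109)) < 0.25).astype(int)
saturation_report(toy_matrix)
```

Reporting output of this kind alongside the interview count would give reviewers the internal, statistical justification of sample size that the three methods above call for.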

Based on our review of the literature about justifying sample size of qualitative interviews, we would hypothesize that several relationships would emerge when qualitative researchers follow best practices for justifying sample size, as depicted in Figure 2. In the figure, we have assumed that there is a point where data is most often saturated — in this case, about 30 interviews (a rough midpoint of the ranges suggested by grounded theory methodologists). This assumption is obviously variable and dependent on the type and scope of the study as well as the skill set of the researcher(s). The larger point is that we would expect certain types of qualitative research (i.e., grounded theory, phenomenology, case study) to cluster around an optimal range of interviews for several reasons. First, for theoretical reasons and assuming that researchers verify data saturation during the coding process with statistical techniques, we would expect most studies to reach theoretical saturation in roughly the same range. Generally, justifying data saturation requires a researcher to conduct several interviews past that point (to show that the dataset was indeed becoming redundant). Furthermore, we would expect this range to be influenced by the recommendations of qualitative methodologists.

We would also expect study quality to be highest at about the point of data saturation. In other words, the quality of the study would increase until data saturation is reached but diminish afterwards. As illustrated in Figure 1, there are rapidly diminishing returns once the data is saturated. However, exceeding the point of data saturation is not necessarily a harmless activity. In fact, gathering too much data can keep researchers from the deep, rich analysis that is a hallmark of, and central to the purpose of, qualitative research [31]. Furthermore, for most researchers, moving far past the point of saturation would devour limited time and resources. Therefore, we would expect the quality imperative to rein in efforts to gather too much data.

Figure 2 — Hypothesized Relationships with Sample Size of Qualitative Interviews


METHODS

The primary purposes of our research were the following: (a) identify the extent to which IS qualitative studies employ best practices of justifying sample size; (b) identify optimal ranges of interviews for various types of qualitative research; (c) identify the extent to which cultural variables (i.e., journal, author, region) impact sample size of interviews; and (d) identify the relationship between sample size of interviews and impact of the research. Thus, we had the following broad research questions:

• Do IS research studies conform to best practices for justification of sample size?
• How many qualitative interviews are enough for various types of qualitative research?
• Do cultural variables (i.e., journal, author, region) influence the sample size of interviews?

We are aware of few efforts to examine qualitative studies as an evidence-based guide to appropriate sample size in qualitative studies. One example of such an effort is Thomson's [39] review of fifty grounded theory articles from the 2002-2004 time period. Sample size ranged from 5 to 350 with an average of 31; after removing the 350-interview study from the sample, the average number of interviews was 24. Based on his study, he made several conclusions: (a) small sample size studies generally involved more contact time with each interviewee (longer interviews and/or repeated interviews); (b) theoretical saturation generally occurs between 10 and 30 interviews; and (c) once a researcher believes saturation has occurred, he/she should conduct several additional interviews to test whether existing themes and categories are sufficient. Based on the dilemmas of estimating needed interviews beforehand, he stated: "Thus, it would be wise to anticipate 30 interviews in order to facilitate pattern, category, and dimension growth and saturation. It is only through the quality of the data that meaningful and valid results are developed, so it is essential that the researcher ensure that saturation has occurred."

A study that focused specifically on rigor in the IS field was Dubé and Paré's [9] study of 183 IS case studies in seven IS journals from 1990 to 1999. They examined the degree to which these studies met standards of rigor as posited by case study methodologists. They found that "a large portion of them have actually ignored the state of the art of case research methods that have been readily available to them" (p. 599). They found that semi-structured interviews were the primary data collection method in 88% of the articles. Only 13% of articles described the sampling strategy. Fewer than 38% included the number of interviewees, and only 24% described the number of interviews conducted. "In short, the apparent lack of information about the sampling strategy in positivist case studies might prohibit the reader from understanding the limits of the conclusions that are drawn from such research" (p. 614). They found that in addition to qualitative interviews, other major data sources used in the case studies were existing documentation (64% of cases), direct observation (32% of cases), and quantitative surveys (27% of cases).

We patterned our study primarily on Thomson's [39] study of grounded theory articles, yet we wanted to expand our sample to studies that employed qualitative interviews (not just grounded theory studies) while restricting the sample to IS studies. While it is true that many archival documents, observations, and other sources of data are included in qualitative studies, particularly case studies, qualitative interviews are the primary data source in the vast majority of IS positivist case studies, as confirmed by Dubé and Paré [9]. This is no different for the interpretive approach to case studies.


As Walsham [40] stated, "With respect to interpretive case studies as an outside observer, it can be argued that interviews are the primary data source, since it is through this method that the researcher can best access the interpretations that participants have regarding the actions and events which have or are taking place" (p. 78).

We identified articles that employed interviews as the primary data source in the following journals: MIS Quarterly, Information Systems Journal, Information Systems Research, Communications of the ACM, and Journal of Management Information Systems. We selected these five journals due to their high quality and their accessibility through our library systems. Four of the five selected journals are top-five IS journals according to average rankings on the Association for Information Systems website, which averages the rankings of nine studies of IS journal quality.

We gathered several pieces of information about the interviews. After all, "Sampling involves more than just the number of participants included in the study; sampling is a process that incorporates the number of participants, the number of contacts with each participant, and the length of each contact. It is important for researchers to consider how much contact will be required in order to reach saturation" [29]. Therefore, we gathered the number of interviewees, the number of interviews, and the length of interviews. By multiplying the number of interviews by the length of interviews, we established total contact time. In cases where authors simply stated a range of interview lengths, we averaged the range (e.g., 60 to 90 minutes was averaged to 75 minutes).

We also wanted to examine the relationship between rigor of sample size justification and study quality. We decided to use impact factor as a rough estimate of study quality. This is by no means a perfect measure of study quality, but it is a decent measure of how well accepted a study is. To determine impact, we used several methods. First, we used citations and impact factors developed by Web of Science®. For each article in the database (available for all articles in our sample with the exception of Journal of Management Information Systems), we were able to obtain the number of articles in journals in the ISI Web of Knowledge index that cited the article. We calculated five-year impact factors by averaging the number of citations during the five years immediately following publication. Second, we used Google Scholar to identify the number of citations for each article. For Google impacts, we calculated average annual citations for all articles that had been published for at least five years.
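To make the contact-time and impact calculations concrete, the following sketch shows how they could be computed. The record fields and sample values are our own illustrative assumptions, not the actual coding sheet used in this study.

```python
from typing import List, Optional

class Study:
    """Illustrative record for one reviewed article; field names are assumptions."""
    def __init__(self, interviews: int, min_length_min: float, max_length_min: float,
                 citations_by_year: List[int]):
        self.interviews = interviews                # number of interviews conducted
        self.min_length_min = min_length_min        # reported lower bound of interview length (minutes)
        self.max_length_min = max_length_min        # reported upper bound (same as lower if a single value)
        self.citations_by_year = citations_by_year  # citations in years 1..n after publication

def contact_time_hours(study: Study) -> float:
    # Average a reported length range (e.g., 60-90 minutes -> 75) and multiply by interview count.
    avg_minutes = (study.min_length_min + study.max_length_min) / 2
    return study.interviews * avg_minutes / 60

def five_year_impact(study: Study) -> Optional[float]:
    # Average annual citations over the five years following publication;
    # undefined for articles younger than five years.
    if len(study.citations_by_year) < 5:
        return None
    return sum(study.citations_by_year[:5]) / 5

example = Study(interviews=30, min_length_min=60, max_length_min=90,
                citations_by_year=[2, 5, 9, 12, 14, 18])
print(contact_time_hours(example))  # 37.5 hours of total contact time
print(five_year_impact(example))    # 8.4 citations per year on average
```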
FINDINGS

As far as justifying sample size for interviews, there was little if any defense based on the three best practices for doing so. First, no studies cited qualitative methodologists for an appropriate sample size. Several single case studies cited Yin [42] in defense of using a single case but went no further in describing why they employed the number of interviews they conducted. One multiple-case study cited Eisenhardt [10] in the selection of ten cases, yet provided no additional detail about reasons for selecting the number of interviews for each case [3]. Many of the studies included references to theoretical saturation as described by Glaser and Strauss [13]. Surprisingly, even when mentioning the importance of saturation, none of these studies provided a defense of the size of their sample. As examples, Nissen [28] stated: "[We conducted] more than 200 [interviews] conducted over a three-month period. The number, time, and scope of interviews continue until theoretical saturation is reached" (p. 235); yet, the study never explained how or when saturation was achieved. Garud and Kumaraswamy [12] cited Glaser and Strauss [13] and explained the importance of "iterating between theory until a stage of theoretical saturation is reached" (p. 15); yet, similarly never mentioned at what point this was reached in their study. Watson-Manheim and Bélanger [41] advocated "reaching theoretical saturation when possible" (p. 271); yet, never discussed how their sample reached saturation. In short, none of the studies cited qualitative methodologists in defense of their sample size, and many of the studies invoked the concept of saturation but never described a point at which it was expected to or did happen.

For the second method of defending sample size, only one study out of the 83 studies cited a prior qualitative work as the basis for sample size when addressing a similar problem. Ryan and Harrison [33] stated, "Other IT researchers have used a similar number of interviews to explore complex issues and build theory" (p. 17) and then cited the work of Niederman, Beise, and Beranek [27] as a basis for their sample size.

For the third method of defending sample size, none of the studies cited statistical evidence of saturation. A few studies made vague statements about reaching saturation near the end of the interviews. Tallon and Kraemer [37] stated, "We conducted interviews until we reached theoretical saturation — that is, the last few interviews did not provide new insights — making the sample size appropriate for this study" (p. 145). Forte, Larco, and Bruckman [11] stated, "When themes began arising over and over in these interviews, we followed up by recruiting participants who have been involved in particular issues and policies" (p. 54).

Goulielmos [14] stated, "No new concepts were developed with the completion of data collection" (p. 366). These types of statements were rare and, in each instance, were presented in one or two sentences with no statistical demonstration of sample size.

We provide various descriptive statistics that show ranges of the number of interviews for various types of research methodologies. In Table 1, we group these statistics by the following types of research studies: grounded theory, single case, multiple case, other types of qualitative study, and mixed design (qualitative and quantitative) studies. The final two columns show statistics for the top 25th percentile in impact for articles that were published in 2005 or before, thus focusing on impacts that have extended over a minimum of five years. Google top performers refers to average annual citation counts for articles older than five years. Top performers refers to five-year impact factors of articles older than five years according to the Social Sciences Index. Institution region refers to the location of the first author's institution; the regions are North America (United States and Canada), Europe (including Australia and New Zealand), and Asia. Impact factors are displayed in quartiles since, with relatively small sample sizes, averages are easily skewed by extreme points in the dataset. Since grounded theory, single case, and multiple case research designs are the dominant forms of qualitative research, we report only on those types of research throughout the remainder of the paper.

Table 1. Descriptive Statistics of Included Studies

                    Grounded     Single       Multiple                                            Google Top   Top
                    Theory       Case         Case         Other        Mixed        All          Performers   Performers
Time Period
  1980s             0 (0%)       0 (0%)       0 (0%)       3 (18%)      1 (6%)       3 (4%)       0 (0%)       0 (0%)
  1990-1994         0 (0%)       2 (6%)       0 (0%)       1 (6%)       0 (0%)       3 (4%)       1 (8%)       0 (0%)
  1995-1999         3 (15%)      4 (12%)      3 (13%)      1 (6%)       2 (11%)      9 (11%)      4 (33%)      1 (13%)
  2000-2004         8 (40%)      12 (35%)     9 (39%)      6 (35%)      7 (39%)      31 (37%)     7 (58%)      7 (88%)
  2005-2009         9 (45%)      16 (47%)     11 (48%)     6 (35%)      8 (44%)      37 (45%)     0 (—)        0 (—)
Journal
  CACM              4 (20%)      2 (6%)       1 (4%)       7 (41%)      3 (17%)      13 (16%)     0 (0%)       1 (13%)
  ISJ               2 (10%)      11 (32%)     7 (30%)      2 (12%)      6 (33%)      22 (27%)     0 (0%)       0 (0%)
  ISR               3 (15%)      1 (3%)       4 (17%)      0 (0%)       1 (6%)       6 (7%)       2 (17%)      3 (38%)
  JMIS              8 (40%)      16 (47%)     5 (22%)      3 (18%)      5 (28%)      26 (31%)     5 (42%)      0 (0%)
  MISQ              3 (15%)      4 (12%)      6 (26%)      5 (29%)      3 (17%)      16 (19%)     5 (42%)      4 (50%)
Institution Region
  USA/CA            14 (70%)     17 (50%)     14 (61%)     13 (76%)     8 (44%)      49 (59%)     9 (75%)      6 (75%)
  Europe            5 (25%)      13 (38%)     6 (26%)      4 (24%)      7 (39%)      26 (31%)     2 (17%)      1 (13%)
  Asia              1 (5%)       4 (12%)      3 (13%)      0 (0%)       3 (17%)      8 (10%)      1 (8%)       1 (13%)
Number of Authors
  1                 5 (25%)      6 (18%)      4 (17%)      6 (35%)      6 (33%)      17 (20%)     2 (17%)      1 (13%)
  2                 11 (55%)     16 (47%)     14 (61%)     4 (24%)      6 (33%)      39 (47%)     8 (67%)      6 (75%)
  3 or more         4 (20%)      12 (35%)     5 (22%)      7 (41%)      6 (33%)      27 (33%)     2 (17%)      1 (13%)
Impact Factor
  Minimum           0.0          2.0          0.0          0.0          0.0          0.0          0.0          20.0
  Quartile 1        6.8          6.5          5.0          3.0          3.5          4.5          10.5         26.8
  Quartile 2        17.5         9.0          16.0         5.0          5.0          9.0          27.0         28.5
  Quartile 3        27.5         24.5         20.0         11.0         8.5          18.5         29.5         30.3
  Maximum           48.0         48.0         30.0         28.0         27.0         48.0         48.0         48.0
Total               20           34           23           17           18           83           12           8

In Table 2, we present the number of interviews per study broken down by journal, institution region, and number of authors. Generally speaking, some journals contained articles with far more interviews per study. For example, 50% of articles in MIS Quarterly contained more than 40 interviews, compared to 34% in Journal of Management Information Systems, 23% in Information Systems Journal, 17% in Information Systems Research, and 15% in Communications of the ACM. On the other hand, some journals contained articles with far fewer interviews. In Communications of the ACM and Information Systems Journal, 69% of the articles contained 30 or fewer interviews, compared to 66% in Information Systems Research, 56% in Journal of Management Information Systems, and 38% in MIS Quarterly. European and Asian authors were far more likely to include fewer interviews in their studies, with 77% of European authors including fewer than 30 interviews, compared to 71% for Asian authors and 50% for American authors. Studies that included more authors similarly contained fewer interviews: approximately 76% of studies authored by three or more authors contained fewer than 30 interviews, compared to 52% and 47% in double-authored and single-authored studies, respectively.

Table 2. Number of Interviews per Study by Journal, Region, and Authors

                    Under 10     11 to 20     21 to 30     31 to 40     41 to 50     51 to 100    Over 100
Journal
  CACM              2 (15%)      0 (0%)       7 (54%)      2 (15%)      0 (0%)       2 (15%)      0 (0%)
  ISJ               5 (23%)      7 (32%)      3 (14%)      2 (9%)       4 (18%)      0 (0%)       1 (5%)
  ISR               0 (0%)       2 (33%)      2 (33%)      1 (17%)      0 (0%)       1 (17%)      0 (0%)
  JMIS              3 (13%)      7 (30%)      3 (13%)      2 (9%)       3 (13%)      4 (17%)      1 (4%)
  MISQ              1 (6%)       3 (19%)      2 (13%)      2 (13%)      2 (13%)      5 (31%)      1 (6%)
Institution Region
  USA/CA            4 (8%)       10 (20%)     11 (22%)     8 (16%)      3 (6%)       11 (22%)     2 (4%)
  Europe            6 (27%)      8 (36%)      3 (14%)      0 (0%)       3 (14%)      1 (5%)       1 (5%)
  Asia              1 (14%)      1 (14%)      3 (43%)      0 (0%)       2 (29%)      0 (0%)       0 (0%)
Number of Authors
  1                 1 (6%)       6 (35%)      1 (6%)       3 (18%)      3 (18%)      2 (12%)      1 (6%)
  2                 5 (13%)      5 (13%)      10 (26%)     3 (8%)       5 (13%)      8 (21%)      2 (5%)
  3 or more         5 (20%)      8 (32%)      6 (24%)      3 (12%)      1 (4%)       2 (8%)       0 (0%)

Note. Percentages are in rows.

Figure 3. Number of Interviews Based on Number of Authors


We depict this author relationship in Figure 3 to show that whereas there is little pattern to the number of interviews in single-authored and double-authored studies, studies with three or more authors tend to crest around twenty interviews and gradually fall off toward fifty interviews, with studies of more than fifty interviews being quite uncommon.

In Table 3, we present the number of interviewees, interviews, and contact time (in total hours per study) for grounded theory, single case, multiple case, Google top performer, and top performer studies. The differential between minimums and maximums is extremely large; for example, in terms of total contact time, the differential is on the order of 17 to 40 times. Even when limiting analysis to the middle fifty percentile, the range for grounded theory, single case, and multiple case studies is on the order of 3 to 4.5 times: grounded theory ranges from 16 to 70 contact hours for interviews; single case ranges from 10 to 46 hours; and multiple case ranges from 21 to 69 contact hours. Ranges tighten for those studies with the highest impacts: Google top performers range from 31 to 71 contact hours and Social Science Index top performers range from 23 to 39 contact hours.

Table 3. Interviewees, Interviews, and Contact Time by Research Design

                      Grounded     Single       Multiple     Google Top   Top
                      Theory       Case         Case         Performers   Performers
Interviewees
  Minimum             6            4            10           7            12
  Quartile 1          15           13           22           19           20
  Quartile 2          27           24           39           29           24
  Quartile 3          59           43           45           68           40
  Maximum             200          200          74           105          105
Interviews
  Minimum             6            4            10           7            12
  Quartile 1          17           13           22           19           20
  Quartile 2          29           23           40           29           24
  Quartile 3          59           45           49           68           40
  Maximum             200          200          74           127          127
Contact Time (hrs)
  Minimum             5.5          4.0          6.0          5.5          12.0
  Quartile 1          15.8         10.0         21.2         31.2         23.0
  Quartile 2          37.3         28.0         38.8         39.0         37.3
  Quartile 3          70.0         46.0         68.8         71.0         38.8
  Maximum             222.3        222.3        100.5        222.3        222.3

In Figure 4, we depict trends in sample size of interviews for grounded theory, single case, and multiple case studies. The trend line for grounded theory studies is particularly surprising since there are more concrete suggestions for grounded theory studies in terms of sample size (generally in the 20 to 40 range). The IS grounded theory studies we examined tended to be on either side of this range, with 35% of these studies containing fewer than 20 interviews and 40% containing more than 40 interviews. Single case studies exhibit a similar trend, with either small sample sizes (45% containing fewer than 20 interviews) or large sample sizes (30% containing more than 40 interviews). The majority (57%) of multiple case studies contain 20 to 50 interviews; however, there is major variability.

Figure 4. Interviews per Study by Research Design


Table 4. Number of Cases in Multiple Case Research Designs

  Min # Cases          2
  Average # Cases      7
  Median # Cases       5
  Max # Cases          34

  # Cases per Study    n      %
  2                    7      30.4%
  3                    3      13.0%
  4                    1      4.3%
  5                    2      8.7%
  6                    0      0.0%
  7                    0      0.0%
  8                    2      8.7%
  9                    2      8.7%
  10 or more           6      26.1%

In Table 4, we present the number of cases in multiple case studies. Generally, most of these studies contain just 2 or 3 cases (43%) or 8 or more cases (43%); few contain between 4 and 7 cases (13%). This pattern is similar to grounded theory and single case designs in that sample sizes tend to cluster around small or large sample sizes. Surprisingly, this is counter to the recommendations of some qualitative methodologists, such as Creswell, who recommend not exceeding five or six cases in multiple case designs.

Finally, in Figures 5 and 6 we show scatter plots with trend lines for the relationships between number of interviews and average annual Google Scholar citations for articles that were published five or more years ago. We present these figures cautiously due to the small sample sizes. Nonetheless, we find it interesting that there are trends showing maximum impact around 25-30 interviews for grounded theory studies and 15-25 interviews for single case studies.

Figure 5. Number of Interviews and Google Scholar Annual Citations for Grounded Theory Studies

Figure 6. Number of Interviews and Google Scholar Annual Citations for Single Case Studies




DISCUSSION AND RECOMMENDATIONS

Without justification of sample size, the implication is that sample size is arbitrary and thus inconsequential. In the studies we examined, there was no apparent effort to justify sample size by citing the recommendations of qualitative methodologists, by acting on the precedent of other studies with similar designs and research problems, or by demonstrating statistically that the dataset collected had become saturated. Rather, IS researchers made superficial references to data saturation or omitted justification for sample size altogether. Since justifying sample size is evidence that the dataset is sufficient to address the research problems, this indicates an indifference to rigor that would be unacceptable in quantitative research and should likewise be unacceptable in qualitative research.

We anticipated that we would be able to provide evidence-based guidelines for the number of interviews needed for various types of research designs. Ultimately, we showed the extreme variation in sample size in all research designs. Rather than clustering in a mid-range of ideal numbers of interviews, grounded theory and single case studies tended to have many studies with small sample sizes and many studies with large sample sizes. Furthermore, there were few studies in the ranges (mid-ranges) suggested by qualitative methodologists. For studies with small sample sizes (for example, grounded theory studies of 20 or fewer interviews) and without justification of sample size, it would be easy to question whether theoretical saturation had been reached. For studies with large sample sizes (for example, grounded theory studies of 40 or more interviews), it would be easy to question whether the researchers had been able to devote sufficient attention to analyzing and reporting in-depth, rich content with such a voluminous dataset.

As far as cultural influence (the values and norms of groups and institutions), it is evident that culture does play a role. Some journals tend to publish articles with much larger sample sizes (such as MISQ), some regions (based on the institution of the first author) likewise include more interviews (such as North America), and larger groups of researchers (three or more) tend to conduct fewer interviews. The fact that there is such variation highlights the subjective nature of determining sample size. Given this variation, evidence-based guidelines for sample size of interviews are all the more needed.

We consider the lack of rigor in justifying sample size a major crisis for qualitative IS researchers. At a minimum, we recommend that IS researchers use each of the three best practices for justifying sample size. The most critical best practice is statistical demonstration of data saturation, since this provides internal support for the value of the dataset and the analysis and reporting built on it. We consider the second most important best practice that of citing other studies that have adopted similar designs with similar research problems. There are many benefits to adopting these best practices. First, they add to the credibility of the research. Second, they have the potential to save significant amounts of time. Since each interview creates enormous time commitments (such as establishing relationships of trust, transcribing, and coding), focusing on getting the right amount of data can save hundreds of hours. Third, we believe that adopting more rigorous standards of qualitative research would enhance the reputation of qualitative research generally and even make this type of research more appealing to quantitative researchers. None of the three best practices described in this article is new, nor are they controversial among most qualitative researchers — yet few if any IS qualitative researchers are employing them.
As we reviewed the many qualitative articles for this study, we concluded that there is a heavy emphasis on process in methods sections: qualitative researchers carefully detail the many steps they took in the data collection and analysis process.


This process-heavy orientation, however, omits a results-oriented justification for sample size. After all, the best and most rigorous justification for sample size of interviews does not emerge from the steps a researcher takes in collecting the data (process-driven); it emerges from statistical demonstration of redundancy in codes (results-driven).

Although there are major limitations in each study in terms of justifying sample size, we do think we can infer some collective wisdom from the studies that provides guidelines for qualitative IS studies. We recommend the following:

Grounded theory qualitative studies should generally include between 20 and 30 interviews. We based this recommendation on several pieces of information from the study. Assuming that top performers are the articles that are most highly respected, we think the minimum number of interviews, based on precedent, should be the minimum found in the majority of top performers (in this case, we view this majority as the Quartile 1 figure displayed in Table 3 for top performers and Google top performers). At the same time, qualitative researchers should assume that too many interviews can be counterproductive. As a result, we think the maximum should be the point where additional interviews fail to produce substantial new insight. We see no evidence that studies with over 30 interviews yielded significantly more impact; for example, Figure 5 shows that the average number of annual Google citations essentially reached a plateau around 30 interviews. We also think exceeding 30 interviews generally defies the collective wisdom evident when multiple authors work together: when three or more authors work together, they view saturation as occurring with 30 or fewer interviews in the vast majority (76%) of studies (see Figure 3). We particularly caution against grounded theory studies of over 40 interviews. The collective judgment in our study seems to indicate this as well, since 60 percent of all grounded theory studies contained fewer than 40 interviews (see Figure 4). We expect further refinement of our recommended range in years to come as IS researchers report the techniques they use to reach data saturation (as outlined in the prior recommendation).

Single case studies should generally contain 15 to 30 interviews. The extreme variation in practices for single case studies makes a recommendation challenging. What is clear is that 69 percent of all qualitative IS studies sampled for this study employed fewer than 30 interviews; we think it would be rare that additional interviews would be a wise time investment. Further, among top Google performers, the highest average impacts fell in the 15 to 30 interview range (see Figure 6).

Qualitative researchers should examine the expectations of their intended journal outlets based on history and culture. It appears that some journals may have expectations for higher numbers of interviews. For example, qualitative studies in MIS Quarterly have far higher numbers of qualitative interviews per study. Furthermore, it appears that American researchers tend to conduct more interviews; we do not know the reason for this. Aside from the theoretical and statistical elements of justifying sample size, qualitative researchers should recognize these apparent cultural judgments of data saturation and adjust their sample sizes accordingly.

Replication studies should further examine the impacts of culture and study design. We think this study shows clearly that culture and study design affect judgments about data saturation.
Future studies that specifically address to what extent, how, and why cultural factors affect judgments of data saturation would be particularly helpful.


We also think future studies are needed to examine ideal ranges of qualitative interviews, particularly in multiple case studies. We recommend expanding such studies to include other top IS journals, including Association of Information Systems, Journal of Information Technology, Journal of Strategic Information Systems, European Journal of Information Systems, and Journal of AIS.

SUMMARY

In this paper, we have addressed the problem of estimating and justifying sample size of qualitative interviews. By examining 83 IS qualitative studies in leading IS journals, we have shown that there is little rigor in justifying sample size. Furthermore, the number of interviews conducted for qualitative studies is correlated with cultural factors (such as journal of publication, number of authors, and world region), implying the subjective nature of sample size in qualitative IS studies. We intended to develop evidence-based guidelines for estimating sample size; however, the vast range of sample sizes for all research designs makes this problematic, which further underscores the need for rigorous methods of justifying sample size of interviews in qualitative studies. Indeed, for qualitative IS research to gain wider acceptance relative to quantitative research, rigor in sample size determination is critical. Based on our examination of qualitative interviews in IS studies, we make the following recommendations: (a) grounded theory qualitative studies should generally include between 20 and 30 interviews; (b) single case studies should generally contain 15 to 30 interviews; (c) qualitative researchers should examine the expectations of their intended journal outlets based on history and culture; and (d) replication studies should further examine the impacts of culture and study design.

REFERENCES

[1] Amin, M., & Mabe, M. "Impact factors: Use and abuse," Perspectives in Publishing, (1), 2000, 1-6.
[2] Bowen, G. A. "Naturalistic inquiry and the saturation concept: A research note," Qualitative Research, (8:1), 2008, 137-152.
[3] Chan, S. C. H., & Ngai, E. W. T. "A qualitative study of information technology adoption: How ten organizations adopted Web-based training," Information Systems Journal, (17:3), 2007, 289-315.
[4] Charmaz, K. "Grounded theory for the 21st century: Applications for advancing social justice studies," In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage Handbook of Qualitative Research (3rd ed.), Sage, Thousand Oaks, CA, 2005, 507-535.
[5] Cheek, J. "The practice and politics of funded qualitative research," In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage Handbook of Qualitative Research (3rd ed.), Sage, Thousand Oaks, CA, 2005, 387-409.
[6] Creswell, J. W. Qualitative Inquiry & Research Design: Choosing Among Five Approaches (2nd ed.), Sage, Thousand Oaks, CA, 2007.
[7] Denzin, N. K., & Lincoln, Y. S. Handbook of Qualitative Research, Sage, Thousand Oaks, CA, 1994.
[8] Denzin, N. K., & Lincoln, Y. S. "The discipline and practice of qualitative research," In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage Handbook of Qualitative Research (3rd ed.), Sage, Thousand Oaks, CA, 2005, 1-32.


[9] Dubé, L., & Paré, G. "Rigor in information systems positivist case research: Current practices, trends, and recommendations," MIS Quarterly, (27:4), 2003, 597-635.
[10] Eisenhardt, K. M. "Building theories from case study research," Academy of Management Review, (14:4), 1989, 532-550.
[11] Forte, A., Larco, V., & Bruckman, A. "Decentralization in Wikipedia governance," Journal of Management Information Systems, (26:1), 2009, 49-72.
[12] Garud, R., & Kumaraswamy, A. "Vicious and virtuous circles in the management of knowledge: The case of Infosys Technologies," MIS Quarterly, (29:1), 2005, 9-33.
[13] Glaser, B. G., & Strauss, A. L. The Discovery of Grounded Theory, Aldine, New York, NY, 1967.
[14] Goulielmos, M. "Systems development approach: transcending methodology," Information Systems Journal, (14:4), 2004, 363-386.
[15] Guest, G., Bunce, A., & Johnson, L. "How many interviews are enough? An experiment with data saturation and variability," Field Methods, (18), 2006, 59-82.
[16] Havelka, D., & Merhout, J. W. "Toward a theory of information technology professional competence," Journal of Computer Information Systems, (50:2), 2009, 106-116.
[17] Huang, E. Y., & Lin, S. W. "Do knowledge workers use email wisely," Journal of Computer Information Systems, (50:1), 2009, 65-73.
[18] Kaplan, B., & Duchon, D. "Combining qualitative and quantitative methods in information systems research: A case study," MIS Quarterly, (12:4), 1988, 571-586.
[19] Kaplan, B., & Maxwell, J. A. "Qualitative research methods for evaluating computer information systems," In J. G. Anderson & C. E. Aydin (Eds.), Evaluating the Organizational Impact of Healthcare Information Systems (2nd ed.), Springer, New York, NY, 2006, 30-55.
[20] Kuzel, A. J. "Sampling in qualitative inquiry," In B. F. Crabtree & W. L. Miller (Eds.), Doing Qualitative Research (2nd ed.), Sage, Thousand Oaks, CA, 1999, 33-45.
[21] Lincoln, Y. S., & Guba, E. G. Naturalistic Inquiry, Sage, Beverly Hills, CA, 1985.
[22] Miles, M. B., & Huberman, A. M. Qualitative Data Analysis (2nd ed.), Sage, Thousand Oaks, CA, 1994.
[23] Morse, J. M. "The significance of saturation," Qualitative Health Research, (5), 1995, 147-149.
[24] Morse, J. M. "Determining sample size," Qualitative Health Research, (10:1), 2000, 3-5.
[25] Morse, J. M. "Myth #53: Qualitative research is cheap," Qualitative Health Research, (12:10), 2002, 1307-1308.
[26] Myers, M. D. "Qualitative research in information systems," MIS Quarterly, (21:2), 1997, 241-242.
[27] Niederman, F., Beise, C., & Beranek, P. "Issues and concerns about computer-supported meetings: The facilitator's perspective," MIS Quarterly, (20:1), 1996, 1-22.
[28] Nissen, M. E. "Dynamic knowledge patterns to inform design: A field study of knowledge stocks and flows in an extreme organization," Journal of Management Information Systems, (22:3), 2005, 225-263.
[29] Onwuegbuzie, A. J., & Leech, N. L. "A call for qualitative power analyses," Quality & Quantity, (41), 2007, 105-121.
[30] Parahoo, K. "Square pegs in round holes: Reviewing qualitative research proposals," Journal of Clinical Nursing, (12), 2003, 155-157.
[31] Patton, M. Q. Qualitative Research & Evaluation Methods, Sage, Thousand Oaks, CA, 2002.


[32] Richardson, L., & St. Pierre, E. A. "Writing: A method of inquiry," In N. K. Denzin & Y. S. Lincoln (Eds.), The Sage Handbook of Qualitative Research (3rd ed.), Sage, Thousand Oaks, CA, 2005, 959-978.
[33] Ryan, S. D., & Harrison, D. A. "Considering social subsystem costs and benefits in information technology investment decisions: A view from the field on anticipated payoffs," Journal of Management Information Systems, (16:4), 2000, 11-40.
[34] Sandelowski, M. "Sample size in qualitative research," Research in Nursing and Health, (18), 1995, 179-183.
[35] Strauss, A., & Corbin, J. Basics of Qualitative Research: Grounded Theory Procedures and Techniques, Sage, Thousand Oaks, CA, 1990.
[36] Subramanian, G. H., & Peslak, A. R. "User perception differences in enterprise resource planning implementations," Journal of Computer Information Systems, (50:3), 2010, 130-138.




[37] Tallon, P. P., & Kraemer, K. L. "Fact or fiction? A sensemaking perspective on the reality behind executives' perceptions of IT business value," Journal of Management Information Systems, (24:1), 2007, 13-54.
[38] Tan, W., Cater-Steel, A., & Toleman, M. "Implementing IT service management: A case study focusing on critical success factors," Journal of Computer Information Systems, (50:2), 2009, 1-12.
[39] Thomson, S. B. "Grounded theory — sample size and validity," 2002. Retrieved January 29, 2010, from http://www.buseco.monash.edu.au/research/studentdocs/mgt.pdf.
[40] Walsham, G. "Interpretive case studies in IS research: Nature and method," European Journal of Information Systems, (4), 1995, 74-81.
[41] Watson-Manheim, M. B., & Bélanger, F. "Communication media repertoires: Dealing with the multiplicity of media choices," MIS Quarterly, (31:2), 2007, 267-293.
[42] Yin, R. Case Study Research: Design and Methods (2nd ed.), Sage, Beverly Hills, CA, 1994.
[43] Yin, R. Case Study Research: Design and Methods (4th ed.), Sage, Thousand Oaks, CA, 2009.
