Using SPSS

1. What (who?) is SPSS?

SPSS (Statistical Package for the Social Sciences) is a powerful set of computer programs for calculating statistics (and thus for saving you innumerable hours of calculating with pencil and paper). The program is available on many personal computers (PCs) around campus. (These PCs can be located at the following web site: http://www.it.iastate.edu/labsdb/) In these few pages you will learn only a few commands for running SPSS. You will learn more commands as you are given SPSS programs to run during the course of the semester.

2. Entering data into SPSS

Like any program on a PC, SPSS is started by double-clicking on the SPSS icon on the computer’s desktop. This opens the “Data Editor” in SPSS. Here you can enter data manually (not advised), import data, or type in (and run) a batch program that contains your data. Below are instructions on how to enter data into SPSS using the latter two methods.

a. Importing data from an SPSS portable (with *.por extension) or SPSS systems (with *.sav extension) file.

During the semester you will be provided with SPSS portable files that contain data for you to analyze as part of your lab problems. You can import these files into SPSS by (1) selecting “Open” then “Data” from the “File” menu, (2) changing “Files of type” to “SPSS Portable (*.por)”, (3) changing “Look in” to the folder into which you have copied the file, and (4) double-clicking on the file name. (Directions are the same for a systems file, except that “Files of type” should be changed to “SPSS (*.sav).”)

b. Using SPSS in “Batch Mode”

Around the time of the first exam, you will start getting SPSS “batch programs” with your lab assignments. To complete the assignments you will need to run these programs and print the output (which appears in an SPSS Viewer window).

WARNING: SPSS will only print the parts of your output that you have highlighted in the left tree-pane of the SPSS Viewer window. Pressing the button with the printer on it will not do anything if nothing has been highlighted.

Batch programs must be typed using the SPSS Syntax Editor. Open the editor by selecting “File” then “New” then “Syntax.” Next, type in your batch program exactly as it appears in the lab. Then select “Edit,” “Select All,” and push the “Run Current” button (i.e., the button with the right-pointing black arrowhead on it). This will execute your batch program and send the output to the SPSS Viewer window (where you may also find error messages if you mistyped parts of your program).

WARNING: It is strongly recommended that you use a fixed (i.e., NOT proportional) font when typing your batch programs. Any Courier font should do the trick. Fixed fonts align your numbers in straight columns, so that it is easier to tell if you have left too many spaces between them.

3. Why do batch programs?

Maybe you have used SPSS on a PC before, or maybe you have had a chance to play around with the program a bit. In either case, you have probably discovered that once you have imported data into SPSS, you can analyze these data entirely with mouse movements and clicks. What could be easier? Batch programs just slow you down. Who needs them, right?

The answer lies in the fact that you can save (and rerun, if necessary) your batch programs, whereas it is not as easy to recreate the mouse activities that generated your output. Just imagine in a few years when you proudly take your results to your major professor and he exclaims, “These results are hard to believe! How did you get these numbers?” Now imagine going back to your computer and discovering that no matter how much you and your mouse try, you cannot recreate the numbers on your output. On the other hand, if the numbers appeared in Table 23 and you search your PC for a file called “table23.sps,” just a quick run of this batch program will (assuming that no one has manipulated your data or program file) generate an exact replica of the output that your major professor seeks.

In a sentence, batch programs are simply part of a good record keeping strategy. In reviewing the literature, competent researchers always keep careful records of their sources, right? Well, when performing a statistical analysis, only the incompetent fail to keep just as meticulous records of their programs. Enough said.

4. Writing a program

a. When data are included in an SPSS batch program, the first line of the program will be a “data list” statement. You will find one at the beginning of the following illustration:

data list records=1 / attend 1-2 prejud 4-5.
compute newx = (attend - 28)**2.
compute sstotal = (prejud - 37)**2.
compute ysq = prejud**2.
compute xy = attend * prejud.
compute xsq = attend**2.
compute yhat1 = 39.94926 + (-.10533 * attend).
compute resid1 = prejud - yhat1.
compute sspred1 = (yhat1 - 37)**2.
compute sserror1 = (prejud - yhat1)**2.
compute newxy = newx * prejud.
compute ssnewx = (newx - 259.5)**2.
compute yhat2 = 59.19815 + (-.085542 * newx).
compute resid2 = prejud - yhat2.
compute sspred2 = (yhat2 - 37)**2.
compute sserror2 = (prejud - yhat2)**2.
begin data.
11 36
46 33
 3  6
16 42
41 49
21 51
23 61
10 23
34 57
48 18
28 65
55  3
end data.
plot plot=prejud with attend,newx.
frequencies vars=prejud ysq attend xsq xy / statistics=mean.
frequencies vars=yhat1 to sserror1 / statistics=mean.
frequencies vars=prejud ysq newx ssnewx newxy / statistics=mean.
frequencies vars=yhat2 to sserror2 / statistics=mean.
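
The arithmetic in the program above can be checked outside SPSS. The following is a minimal Python sketch, not part of the lab materials: it assumes the twelve data lines hold “attend” in columns 1-2 and “prejud” in columns 4-5, as the “data list” statement specifies, and it mirrors only three of the “compute” statements. The helper names `parse_line` and `mean` are made up for this illustration.

```python
# Data lines as they appear between "begin data." and "end data.":
# attend is right-justified in columns 1-2, prejud in columns 4-5.
DATA_LINES = [
    "11 36", "46 33", " 3  6", "16 42", "41 49", "21 51",
    "23 61", "10 23", "34 57", "48 18", "28 65", "55  3",
]


def parse_line(line):
    """Read one fixed-column record, mimicking 'attend 1-2 prejud 4-5'."""
    attend = int(line[0:2])  # columns 1-2
    prejud = int(line[3:5])  # columns 4-5
    return attend, prejud


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


records = [parse_line(line) for line in DATA_LINES]
attend = [a for a, _ in records]
prejud = [p for _, p in records]

# compute newx = (attend - 28)**2.
newx = [(a - 28) ** 2 for a in attend]
# compute yhat1 = 39.94926 + (-.10533 * attend).
yhat1 = [39.94926 + (-0.10533 * a) for a in attend]
# compute resid1 = prejud - yhat1.
resid1 = [p - y for p, y in zip(prejud, yhat1)]

print(mean(attend), mean(prejud), mean(newx))  # prints: 28.0 37.0 259.5
```

The three means match the constants 28, 37, and 259.5 that appear in the “compute” statements, which suggests those constants are the sample means of “attend”, “prejud”, and “newx”.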

In writing batch programs, be sure that each line (except ones with data) ends with a period. Also notice how the “data list” statement indicates that there is one line of data (records=1) for each unit of analysis, that data on the variable “attend” appear right-justified in the first two columns, and that data on the variable “prejud” appear right-justified in columns 4 and 5. When listed in a batch program, data are placed between a “begin data” and an “end data” statement, and may be preceded by data transformation statements (e.g., compute, recode, if, etc.) and followed by commands (e.g., plot, frequencies, etc.).

Note: “Commands” generate output; “statements” merely create or modify variables.

b. You will be given programs such as this when they are required in your lab assignments. The following common SPSS commands and statements are included to help you understand the various parts of these programs:

1) Occasionally you may wish to get a scatter plot of your data. To plot “prejud” on the vertical axis and “attend” on the horizontal axis, you would use the following command:

plot plot=prejud with attend.

2) The “compute” statement is used to create a new variable as a mathematical function of other variables. For example, imagine that you found a constant (or intercept) of 39.94926 and a slope of -.10533 in the regression of “prejud” on “attend”. You could compute a new variable, “prejuhat”, that gives the estimated values of “prejud” (according to this regression) for each of the 12 observations listed in the above program. This would be done as follows:

compute prejuhat = 39.94926 + (-.10533 * attend).

3) The “recode” statement is used to change values on a variable. For example, consider an income measure (let’s call it “rincome”) that takes a value of 1 for incomes less than $1000, 2 for incomes between $1000 and $2999, 3 for $3000 to $3999, 4 for $4000 to $4999, 5 for $5000 to $5999, 6 for $6000 to $6999, 7 for $7000 to $7999, 8 for $8000 to $9999, 9 for $10000 to $14999, 10 for $15000 to $19999, 11 for $20000 to $24999, 12 for $25000 or more, and 13 for refused to respond. These values could be recoded (approximately) into dollar units with the following “recode” statement:

recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
  (7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99).

Note that values are changed to the midpoints (in dollars) of their corresponding intervals. The choice of $35,000 as the midpoint of the highest income category is, admittedly, somewhat arbitrary. Also, since 99 is the missing data code for “rincome”, the last parentheses in the recode set the incomes of refusers to missing (rather than leaving the value 13 unrecoded and in so doing assuming their average income to equal $13.00).

4) The “if” statement can be used to combine information from different variables. This is particularly useful when one has contingency items. For example, consider the two contingency items “hit” (“Have you ever been punched or beaten by another person?”) and “hitage” (“Did you experience this beating (or these beatings) as a child, as an adult, or in both childhood and adulthood?”). Imagine that a score of 1 on “hit” means “yes” and a score of 2 means “no” and that a score of 1 on “hitage” means “as a child,” of 2 means “as an adult,” and of 3 means “both.” If you wanted a variable that measured whether respondents were beaten as children, you might wish to change “hit” such that a score of 1 would mean “hit as a child” and 2 would mean “not hit as a child.” This can be done by changing scores on “hit” from 1 to 2 among respondents who were beaten as an adult but not as a child, using the following “if” statement:

if ( hitage eq 2 ) hit = 2.

Note: Logical operators other than “eq” (equals) are “ne” (does not equal), “lt” (less than), “gt” (greater than), “le” (less than or equal to), and “ge” (greater than or equal to). These relations can also be combined with “and” and “or”. Consider the following illustration:

if ( ( ( var1 ne 0 ) and ( var2 eq 1 ) ) or ( var3 ge 20 ) ) var4 = 0.

5) The “select if” statement allows one to restrict an analysis to part of one’s data set. Thus, the following statement would restrict an analysis to women only:

select if ( sex eq 2 ).

6) All statements (e.g., “compute”, “recode”, “if”, “select if”, etc.) that follow a “temporary” statement apply only to the next command. For example, if you wished to find the mean and variance on “age” separately for males and females, this would be done as follows:

temporary.
select if ( sex eq 1 ).
frequencies general = age / statistics = mean, variance.
temporary.
select if ( sex eq 2 ).
frequencies general = age / statistics = mean, variance.
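
For readers who find it easier to follow the logic in a general-purpose language, the sketch below restates the “recode”, “if”, and “temporary” plus “select if” examples in plain Python. It is an illustration only, not SPSS output: the four respondents are made-up values, and the function names (`recode_rincome`, `apply_if`, `mean_age`) are hypothetical helpers that do not correspond to anything in SPSS.

```python
# recode rincome (1=500)(2=2000) ... (12=35000)(13=99).
RINCOME_MIDPOINTS = {
    1: 500, 2: 2000, 3: 3500, 4: 4500, 5: 5500, 6: 6500, 7: 7500,
    8: 9000, 9: 12500, 10: 17500, 11: 22500, 12: 35000, 13: 99,
}


def recode_rincome(code):
    """Map an income category to its dollar midpoint (13 becomes 99, the missing code).

    Codes not listed in the recode are returned unchanged.
    """
    return RINCOME_MIDPOINTS.get(code, code)


def apply_if(row):
    """Mirror 'if ( hitage eq 2 ) hit = 2.' on one respondent (returns a copy)."""
    if row["hitage"] == 2:
        return dict(row, hit=2)
    return row


def mean_age(rows, sex):
    """Mirror 'temporary. select if ( sex eq ... ).' followed by a mean of age.

    The selection is used for this one calculation only; rows is not modified.
    """
    ages = [r["age"] for r in rows if r["sex"] == sex]
    return sum(ages) / len(ages)


# Made-up respondents for illustration (hitage is None when hit is 2, "no").
respondents = [
    {"hit": 1, "hitage": 1, "sex": 1, "age": 30},
    {"hit": 1, "hitage": 2, "sex": 2, "age": 40},
    {"hit": 1, "hitage": 3, "sex": 2, "age": 50},
    {"hit": 2, "hitage": None, "sex": 1, "age": 20},
]

recoded = [apply_if(r) for r in respondents]
print([r["hit"] for r in recoded])                   # prints: [1, 2, 1, 2]
print(mean_age(respondents, 1), mean_age(respondents, 2))  # prints: 25.0 45.0
print(recode_rincome(9), recode_rincome(13))         # prints: 12500 99
```

Because `mean_age` filters a copy of the data for one calculation, all four respondents remain available afterward, which is the behavior the “temporary” statement provides in the SPSS example.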

5. And in closing . . .

You are strongly encouraged to use SPSS to do your homework problems instead of doing them via hand calculations. If you do, however, you must hand in your complete computer output in lieu of your hand calculations.
