Introduction to SPSS

Center for Teaching, Research & Learning Social Science Research Lab American University, Washington, D.C. http://www.american.edu/provost/ctrl/ 202-885-3862

SPSS is one of the most widely used programs for statistical analysis in social science.

Course Objective

This course is designed to give a basic understanding of how SPSS works and how to run simple statistical analyses of data.

Learning Outcomes

1. Understanding the layout and interface of SPSS
2. Introducing the main menus
3. Opening and creating new datasets
4. Analyzing data using descriptive statistics

1. Layout and Interface: Data Editor, Syntax Editor, and Output Viewer

SPSS consists of three parts:

- The Data Editor
- The Syntax Editor
- The Output Viewer

When you start SPSS, the Data Editor window opens by default.

The Data and Variable Editors

The Data Editor allows you to create your data set and perform statistical operations interactively, using pull-down menus. The Data Editor window has two sheets: the Data View and the Variable View.

By default, the Data View opens whenever you open the Data Editor. It contains your actual data set. The variable names are displayed in the grey row directly above row 1. Each white row represents a case, and each column represents a variable.

[Figure: the SPSS Data View]

The Variable View allows you to name your variables, identify missing values, and assign variable and value labels, among other settings.

Tip: You can move between the Data View and the Variable View by using the tabs at the bottom of the screen.

What these columns mean

Name: The name of the variable. Names appear in the column headers in the Data View. In SPSS, variable names may not contain spaces.

Type: The type of data stored in the variable. String refers to data stored as text, usually proper names. Numeric variables store data as numbers. Other useful options are Date and Dollar.

Tip: SPSS cannot compute numeric statistics, such as a mean, on data stored as strings.

Width: Tells the computer how much space each value may take up, measured in characters. Thus a width of 12 for country means that country names can be no more than 12 characters long.

Decimals: Tells the computer how many decimals to display. If you do not want to see a decimal point at all, enter a zero here.

Label: This column is useful for explaining what the variable is measuring. You may use spaces here.

Values: These allow you to display certain labels depending on the data in each case. In the example, all countries with a 1 in this variable will display OECD in the Data View. These labels will also appear in the tables and graphs that you create.

Tip: Value labels can be hidden or revealed in the Data View by clicking the Value Labels button in the toolbar at the top of the SPSS window.
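The relationship between stored codes and displayed value labels can be sketched outside SPSS. The following is a minimal Python illustration, not SPSS syntax. The label for code 0 ("Non-OECD") and the function name `display_value` are assumptions made for the example; only the 1 = OECD pairing comes from the text.

```python
# Value labels map stored codes to display text.
# 1 = "OECD" follows the tutorial; the label for 0 is a hypothetical addition.
VALUE_LABELS = {1: "OECD", 0: "Non-OECD"}


def display_value(code, labels, show_labels=True):
    """Return what a cell would show: the label when labels are switched on
    and one is defined for the code, otherwise the raw code as text."""
    if show_labels and code in labels:
        return labels[code]
    return str(code)


codes = [1, 0, 1, 2]  # the code 2 has no label defined

with_labels = [display_value(c, VALUE_LABELS) for c in codes]
raw_codes = [display_value(c, VALUE_LABELS, show_labels=False) for c in codes]

print(with_labels)
print(raw_codes)
```

The stored data never change when labels are toggled; only the display does, which is why a labeled numeric variable can still be used in calculations.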


Tip: String variables are automatically set as Nominal.

Tip: Data sets downloaded from the internet frequently do not have their variables set correctly, so check the Variable View before running an analysis.

The Syntax Editor and Output Viewer

The Syntax Editor lets you carry out the same operations as the Data Editor's menus, but by writing commands instead of using dialog boxes. There are a couple of reasons to be aware of SPSS syntax even if you plan to use the dialog boxes most of the time. First, not all procedures are available through the dialog boxes. Second, you can save procedures as syntax and rerun them at a later date. The dialog boxes available through the pull-down menus have a button labeled Paste, which writes the syntax for the procedure you have set up in the dialog box to the Syntax Editor.


To open the Syntax Editor, go to File > New > Syntax.

The Output Viewer displays the results of the statistical operations you perform on your data. It opens automatically once you run a statistical procedure.

2. Important Menu Commands

Take a look at the menu bar. There are several pull-down menus. The most important ones are the following:

The Data Menu

The Data menu provides techniques for defining variables, inserting variables or cases, sorting files, splitting files, merging data sets, aggregating data, or using a select command to look at a subgroup within the data file. For more about this, see the SPSS Data Management Tutorial.

The Transform Menu

The Transform menu allows you to transform your data set on the basis of existing variables. Among other things, you can recode your variables and compute new variables from existing ones. For more about this, see the SPSS Data Management Tutorial.

The Analyze Menu

With the Analyze menu you perform statistical operations on your data set, the output of which will be displayed in the Output Viewer. In this tutorial we will explore descriptive statistics using this menu. For more information about other statistical functions, see the SPSS Bivariate Statistics and Regression tutorials.

The Graphs Menu

The Graphs menu contains a number of graph options that allow you to visually display descriptive statistics in the Output Viewer. For more information about this, see the SPSS Graphing Tutorial.


3. Opening and Creating a Dataset

If you already have an SPSS dataset, you can open it in the following way:

1. Select the File pull-down menu > Open > Data. A dialog box pops up.
2. Browse for your dataset and open it (use world95.sav).

If you have a dataset in Excel, it is easy to open it in SPSS:

1. Select the File pull-down menu > Open > Data. A dialog box pops up. In the line that specifies “Files of Type,” change the file type from SPSS to Excel.
2. Browse for your dataset and open it (use demo.xls).

You can also create a dataset from scratch in the Data Editor. Now, let’s enter the data for this survey.

1. Go to the Variable View sheet and specify your variables. In row one (the first variable):
   a. Name: type “age”
   b. Type: choose numeric (this will usually be either numeric or string/text)
   c. Width: 2 characters (people in this class are probably all less than 100 years old)
   d. Decimals: 0 (age is reported in whole years)
   e. Label: “respondent age” (the variable name is limited in that you cannot use spaces; the variable label allows you to give more information about the variable)
   f. Values: select “none”. You would only include values if each number stands for something other than a quantity. We will talk about this more momentarily.
   g. Measure: select “scale”. Age is a scale variable: its values are ordered, and there is a potentially unlimited number of possible values. Other common scale variables include income, GDP, and adult literacy rate.
2. In row two (the second variable):
   a. Name: college
   b. Type: string (“string” means that text is being used, not numbers)
   c. Width: 5 (KOGOD, the longest college name, has 5 letters)
   d. Label: “College of respondent”
   e. Measure: Nominal
3. In row three, we are going to save the college information in a different way. This time, we will assign numbers to the various colleges (1=SIS, 2=SPA, 3=CAS, 4=KOGOD, 5=SOC):
   a. Name: collegenum (two variables cannot share the same name, and names cannot contain spaces)
   b. Type: Numeric
   c. Width: 1
   d. Decimals: 0
   e. Label: “College of respondent, coded”
   f. Click on “Values”:
      o In the “value” box, enter “1”. In the “label” box, enter “SIS”. Click “add”.
      o Continue with SPA, CAS, KOGOD, and SOC.
      o Click OK.
   g. Measure: Nominal. A nominal variable is a variable in which the data values stand for categories rather than quantities. While SIS=1 and SPA=2, this does not mean that SIS is better than SPA or vice versa. That is, the numbers mean something, but their order isn’t important.
4. Return to the Data View sheet and enter your data, one case per line.
5. Save your data by selecting the File pull-down menu and using the Save option.
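The variable definitions entered in the Variable View amount to a small schema. The sketch below models that schema in Python purely for illustration; it is not how SPSS stores variables. The class name `VariableSpec`, the helper `fits_width`, and the two sample cases are hypothetical, and variable names are written in lowercase.

```python
from dataclasses import dataclass, field


@dataclass
class VariableSpec:
    """One row of the Variable View (hypothetical model, not an SPSS API)."""
    name: str
    var_type: str   # "numeric" or "string"
    width: int      # maximum number of characters per value
    label: str      # free text, spaces allowed
    measure: str    # e.g. "scale" or "nominal"
    decimals: int = 0
    values: dict = field(default_factory=dict)  # value labels, if any

    def __post_init__(self):
        # Variable names may not contain spaces.
        if " " in self.name:
            raise ValueError(f"variable name {self.name!r} contains a space")


variables = [
    VariableSpec("age", "numeric", 2, "respondent age", "scale"),
    VariableSpec("college", "string", 5, "College of respondent", "nominal"),
    VariableSpec(
        "collegenum", "numeric", 1, "College of respondent, coded", "nominal",
        values={1: "SIS", 2: "SPA", 3: "CAS", 4: "KOGOD", 5: "SOC"},
    ),
]


def fits_width(spec, value):
    """Check that a value needs no more characters than the declared width."""
    return len(str(value)) <= spec.width


# Two made-up survey responses, one case per row.
cases = [
    {"age": 19, "college": "SIS", "collegenum": 1},
    {"age": 22, "college": "KOGOD", "collegenum": 4},
]

by_name = {spec.name: spec for spec in variables}
all_fit = all(
    fits_width(by_name[name], value)
    for case in cases
    for name, value in case.items()
)
print("All values fit their declared widths:", all_fit)
```

Storing the college both as text and as a labeled code shows the trade-off from the steps above: the string is readable as entered, while the coded version stays numeric and gets its readable form from the value labels.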


4. Running descriptive statistics and frequencies

Descriptive statistics

To run descriptive statistics, go to Analyze > Descriptive Statistics > Descriptives…

- Select the variables for which you want the descriptives.
- To specify the kind of descriptives you want, click on the Options button.
- Then click OK.
- The results will be displayed in the Output Viewer.
- The Descriptives procedure does not report the mode; you can get the mode in the Frequencies section.

Let’s do this for the age and collegenum variables.
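To make the output of a descriptives run concrete, the following Python sketch computes the same kind of summary (N, minimum, maximum, mean, and standard deviation) using only the standard library. It is an illustration rather than SPSS itself: the `ages` list is made-up data and `describe` is a hypothetical helper name.

```python
import statistics

# Hypothetical ages for eight respondents.
ages = [18, 19, 19, 20, 21, 22, 24, 25]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1).
        "Std. Deviation": statistics.stdev(values),
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name:<15}{value:>8.2f}")
```

The sketch uses the sample standard deviation, which divides by n - 1 rather than n; for small samples the two versions differ noticeably.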

Standard Deviation

The standard deviation is a measure of dispersion that is calculated from the values of the data. It shows how widely the data are dispersed around the mean. The standard deviation has the desirable property that, when the data are normally distributed, 68.3% of the observations lie within +/- 1 standard deviation of the mean, 95.4% within +/- 2 standard deviations of the mean, and 99.7% within +/- 3 standard deviations of the mean.

Skewness and Kurtosis

To state that the data are normally distributed simply means that the distribution of the data resembles a bell-shaped curve; in such a case, most of the observations are clustered around the mean. In reality, it is rare to find data that are perfectly normally distributed, but they may be close to a normal distribution. Two statistics help us determine whether this is the case.

Skewness

Skewness is a measure of whether the peak is centered in the middle of the distribution. A positive value means that the peak is off to the left (with a longer tail to the right), and a negative value means that it is off to the right (with a longer tail to the left).

Kurtosis

Kurtosis is a measure of the extent to which data are concentrated in the peak and tails of the distribution, relative to a normal distribution. SPSS reports kurtosis so that a normal distribution has a value of 0. A positive value indicates a sharper peak and heavier tails than a normal distribution; a negative value indicates a flatter peak and lighter tails.

Frequencies

To run frequencies, go to Analyze > Descriptive Statistics > Frequencies…

- Select the variables for which you want the frequencies.
- To specify more options, click on the Charts and Format buttons.
- To get descriptive statistics along with the frequencies, click on the Statistics button.
- Then click OK. The results will be displayed in the Output Viewer.

Let’s do this for the age, college, and collegenum variables.
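A frequency table is a count of how often each value occurs, usually with percentages alongside. The Python sketch below builds a simplified version of such a table. The `collegenum` data are made up, `frequency_table` is a hypothetical helper, and the sketch ignores missing values, so it has no separate valid-percent column.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent),
    sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows


# Hypothetical coded colleges for ten respondents (1=SIS ... 5=SOC).
collegenum = [1, 1, 2, 3, 3, 3, 4, 5, 1, 2]

table = frequency_table(collegenum)
print(f"{'Value':>5} {'Freq':>5} {'Pct':>6} {'Cum Pct':>8}")
for value, count, percent, cumulative in table:
    print(f"{value:>5} {count:>5} {percent:>6.1f} {cumulative:>8.1f}")
```

The cumulative percent is only meaningful when the values have an order; for a nominal variable such as the coded college, the count and percent columns are the ones to read.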


The Mean

The mean is defined as the sum of a series of observations divided by the number of observations in the series. It is commonly used to describe the central tendency of variables.

The Median

A limitation of the mean as an indicator of central tendency is that its value is greatly affected when a few observations have very large or very small values. The median is the middle value in an ordered series of values. It is the observation that divides the sample into two sub-samples of the same size. The median is usually the better choice when your sample contains a relatively small number of observations, or when a few very large or very small values affect estimates of the mean.

The Mode

The mode is defined as the most frequent value of a variable. This indicator may convey more information about the central tendency of a series when certain values are much more frequent than the others.
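The difference between these three measures is easiest to see on a small sample with one extreme value. The following Python sketch uses made-up incomes (in thousands) and a made-up list of colleges; both are assumptions for illustration only.

```python
import statistics

# Hypothetical incomes in thousands; the last value is an extreme observation.
incomes = [30, 32, 35, 35, 38, 40, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the value 400
median_income = statistics.median(incomes)  # middle value of the sorted list
mode_income = statistics.mode(incomes)      # most frequent value

print(f"Mean:   {mean_income:.1f}")
print(f"Median: {median_income}")
print(f"Mode:   {mode_income}")

# For a nominal variable only the mode is meaningful.
colleges = ["SIS", "SPA", "SIS", "CAS", "SIS", "KOGOD"]
print("Most common college:", statistics.mode(colleges))
```

Here the single extreme income drags the mean far above every other observation, while the median and the mode stay with the bulk of the data.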

Tip: For nominal variables, the mode is more meaningful than the mean.

You can also create histograms through the Frequencies window.

Histograms

Histograms show the number of observations in each category. They are very useful because they give a quick visual impression of the central tendency, the extent of dispersion, and whether any unusually large or small observations are present.
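The link between a histogram's shape and the skewness statistic can be sketched in a few lines of Python. The ages below are made up, and `text_histogram` and `skewness` are hypothetical helpers. The skewness here is the simple moment-based version, not the bias-adjusted formula that statistical packages typically report, so the value may differ slightly while the sign agrees.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Group values into equal-width bins; return sorted (bin_start, count)
    pairs. Bins with no observations are omitted."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())


def skewness(values):
    """Moment-based skewness: third central moment divided by the
    second central moment raised to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical ages: most respondents are young, a few are older.
ages = [18, 18, 19, 19, 19, 20, 20, 21, 23, 27, 34]

bins = text_histogram(ages, bin_width=5)
for start, count in bins:
    print(f"{start:>2}-{start + 4:<2} {'#' * count}")

skew = skewness(ages)
print(f"Skewness: {skew:.2f}")
```

The bars pile up on the left and thin out to the right, and the skewness comes out positive, matching the rule that a positive value means the peak sits to the left.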

Exporting your output

To export output, go to File > Export. A dialog box pops up. Specify the type of output you want to export from the Export drop-down menu. Specify your file destination and name in the File Name box. Specify your file type from the File Type drop-down menu. You can export the output in HTML, Text, Excel, Word, or PowerPoint format. Then click OK.

If you only want one or two tables or charts, you can select them in the Output Viewer and press CTRL+C. You can then paste these into a word processor. If you’re doing this in MS Office, you will want to use “Paste Special”. In Office 2007 this can be found in the Home tab by clicking on the arrow under the Paste button. You will want to paste your table as an Enhanced Metafile. This will paste your tables into your document in an aesthetically pleasing and readable manner.
