STAT 408/608 Guided Exercise 7 ANSWERS

Key Topics
• Single Factor ANOVA
• Understanding the ANOVA Table

For On-Line Students, be sure to:
• Submit your answers in a Word file to Sakai at the same place you downloaded the file
• Remember you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
• Put your name and the Assignment # on the file name: e.g. Ilvento Guided7.doc

Answer as completely as you can and show your work. Upload your file via Sakai.

1. This problem looks at the salary differences of male and female mid-level managers at 220 firms. We will compare a Difference of Means Test and the ANOVA approach using this data. The data are in an Excel file, with salary given in $1,000s. We want to look at the female sample (n=75) and compare its mean to the male mean level (144) to see if it is lower than that of males (or, equivalently, if the male mean is higher). Use an alpha level of .05. Here is the Excel output with descriptive statistics for females, males, and the total sample, as well as the Difference of Means Test assuming equal variances.

t-Test: Two-Sample Assuming Equal Variances

Descriptives
Statistic                    Females      Males     Salary (total sample)
Mean                         140.467    144.110       142.868
Standard Error                 1.443      1.029         0.844
Median                       139.000    145.000       143.500
Mode                         146.000    145.000       145.000
Standard Deviation            12.496     12.394        12.521
Sample Variance              156.144    153.613       156.763
Kurtosis                      -0.104     -0.275        -0.397
Skewness                       0.350     -0.299        -0.078
Range                         54.000     61.000        62.000
Minimum                          118        110       110.000
Maximum                          172        171       172.000
Sum                            10535      20896     31431.000
Count                             75        145           220
Confidence Level(95.0%)        2.875      2.034         1.664
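The equal-variance t statistic can be recomputed from the descriptive statistics alone. The following is a minimal Python sketch under that assumption: the helper name `pooled_t` is illustrative and not part of the Excel output, and the one-tailed critical value of about -1.65 for 218 d.f. is taken from a standard t table rather than from the assignment.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic assuming equal variances (hypothetical helper)."""
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, pooled_var, df

# Females first, males second, using the values in the descriptive table.
t_stat, pooled_var, df = pooled_t(140.467, 156.144, 75, 144.110, 153.613, 145)
print(f"pooled variance = {pooled_var:.3f}, df = {df}, t = {t_stat:.3f}")

# Assumption: approximate lower-tail critical value for alpha = .05, 218 d.f.
T_CRIT = -1.65
print("Reject Ho" if t_stat < T_CRIT else "Fail to reject Ho")
```

With these inputs the statistic comes out near -2.06, which falls below the assumed critical value, so the sketch rejects equal means in favor of a lower female mean.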
Mean Variance Observations Pooled Variance Hypothesized Mean Difference df t Stat P(T 6.01 Reject Ho: µ 1 = µ 2 = µ 3
c. R-square is a measure of the explanatory ability of the model. It is calculated as the proportion of the total sum of squares (SSTotal) accounted for by the treatment sum of squares (SST):

R2 = SST / SSTotal

Calculate and interpret R-square for this model.

R2 = 14.149/19.826 = .71366
71.4% of the variability in cost is explained by the Car Type.

The interpretation of R-square is how much of the variability in the dependent variable is "explained" by the independent variable (the car models). It ranges from 0 to 1.0, with 1.0 meaning that all the variability of the dependent variable is explained by the independent variable.
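As a quick check on the arithmetic, R-square can be computed directly from the two sums of squares. This is a small sketch; the function name `r_squared` and its range check are illustrative assumptions, not part of the assignment.

```python
def r_squared(ss_treatment, ss_total):
    """Proportion of total variability explained by the factor."""
    if ss_total <= 0 or not 0 <= ss_treatment <= ss_total:
        raise ValueError("need SSTotal > 0 and 0 <= SSTreatment <= SSTotal")
    return ss_treatment / ss_total

# Sums of squares quoted in the answer above.
r2 = r_squared(14.149, 19.826)
print(f"R2 = {r2:.5f} ({r2:.1%} of the variability explained)")
```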
3. This is a study to see the effects of three different pesticides. It is actually a block design, but we will ignore the block effect and save that for later. The researcher hypothesized that the three insecticides would have different impacts on the number of seedlings in a row.

Factor = INSECTICIDE: the levels = 1, 2, 3 (Note: even though these are numbers, this is a nominal level variable. I could have labeled them A, B, C or some other name.)

She measured the number of seedlings in a row. Response Variable = SEEDLINGS.

The following are the descriptive statistics for the Response Variable, along with a box plot and means for each insecticide.

Seedlings
Quantiles
100.0%   maximum    94
99.5%               94
97.5%               94
90.0%               93.7
75.0%    quartile   84.5
50.0%    median     79
25.0%    quartile   63
10.0%               50.4
2.5%                48
0.5%                48
0.0%     minimum    48

[Plot of the Seedlings distribution; axis ticks at 45, 60, 75, 90]

Summary Statistics
Mean                    75
Std Dev                 14.447397
Std Err Mean            4.1706042
Upper 95% Mean          84.179438
Lower 95% Mean          65.820562
N                       12
Sum                     900
Variance                208.72727
Skewness                -0.529182
Kurtosis                -0.593126
CV                      19.263196
N Missing               0
Median                  79
Mode                    83
Range                   46
Interquartile Range     21.5

Stem and Leaf
Stem   Leaf   Count
9      34     2
8      5      1
8      033    3
7      8      1
7      2      1
6      6      1
6      2      1
5      6      1
5
4      8      1
4|8 represents 48
Oneway Analysis of Seedlings By Insecticide
[Box plots of Seedlings (vertical axis, 50 to 90) for Insecticide levels 1, 2, 3]
Means and Std Deviations
Level   Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
1       4        58.0000   7.83156   3.9158         45.538      70.462
2       4        87.0000   7.78888   3.8944         74.606      99.394
3       4        80.0000   5.71548   2.8577         70.905      89.095
a. Look at the data and graphs and briefly summarize the average seedlings for the different insecticides. Note: the Box Plots show the spread of the data around the median.

Insecticide 1 has a much lower average number of seedlings than insecticides 2 and 3 (58.0 compared with 87.0 and 80.0). The spreads of the three groups are fairly similar (standard deviations of 7.8, 7.8, and 5.7).
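The same group summaries are enough to work out the between-group and within-group sums of squares that an ANOVA table uses. The sketch below assumes only the n, mean, and standard deviation reported for each level; the variable names are illustrative.

```python
# (n, mean, std dev) per insecticide, from the Means and Std Deviations table.
groups = {1: (4, 58.0, 7.83156), 2: (4, 87.0, 7.78888), 3: (4, 80.0, 5.71548)}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between groups: squared distance of each group mean from the grand mean.
ss_treatment = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within groups: each sample variance times its degrees of freedom.
ss_error = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

print(f"grand mean = {grand_mean:.1f}")
print(f"SSTreatment = {ss_treatment:.2f}, SSError = {ss_error:.2f}")
print(f"SSTotal = {ss_treatment + ss_error:.2f}")
```

Up to rounding in the reported standard deviations, these agree with the sums of squares in the JMP Analysis of Variance table.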
b. The following is the output from a JMP ANOVA. Fill in the blanks in the Analysis of Variance Table. There are 5 numbers to calculate: Error SS, Insecticide d.f., MSE, F*, and R2.

Summary of Fit
Rsquare                        0.798
Adj Rsquare                    0.753
Root Mean Square Error         7.180
Mean of Response               75.000
Observations (or Sum Wgts)     12.000

Analysis of Variance
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Insecticide    2          1832.00       916.000   17.7672    0.0007*
Error          9           464.00        51.556
C. Total      11          2296.00

Error SS: SSTotal – SSTreatment = 2296.00 – 1832.00 = 464.00
Insecticide d.f.: k = 3 groups, so k – 1 = 2 d.f.
MSE: 464/9 = 51.556
F Ratio (F*): MSTreatment/MSE = 916.000/51.556 = 17.7672
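The fill-in-the-blank steps can be chained together in a few lines. This is a sketch that starts from the two sums of squares, the number of groups, and the sample size given in the output; variable names are illustrative.

```python
# Known quantities from the JMP table.
ss_total, ss_treatment = 2296.00, 1832.00
k, n = 3, 12  # number of insecticides, total observations

ss_error = ss_total - ss_treatment      # Error SS
df_treatment = k - 1                    # Insecticide d.f.
df_error = n - k                        # Error d.f.
ms_treatment = ss_treatment / df_treatment
mse = ss_error / df_error               # Mean Square Error
f_ratio = ms_treatment / mse            # F*
r2 = ss_treatment / ss_total            # R-square

print(f"Error SS = {ss_error:.2f}, Insecticide d.f. = {df_treatment}")
print(f"MSE = {mse:.3f}, F* = {f_ratio:.4f}, R2 = {r2:.3f}")
```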
Also, calculate R2 for this model: SSTreatment/SSTotal = 1832.00/2296.00 = .798

c. Conduct a test to see if there is a mean difference among the three insecticides (1, 2, 3). Use an F-test with α = .01. You will need to look up the critical value of F for 2 and 9 d.f. at alpha = .01.
Null Hypothesis: Ho: µ1 = µ2 = µ3
Alternative Hypothesis: Ha: at least two of the means differ
Assumptions of Test: Small sample, so we assume a normal distribution; equal variances
Test Statistic: F* = 17.7672
Rejection Region: Reject Ho if F* > F.01, 2, 9 d.f. = 8.02
Comparison of Test Statistic with Rejection Region: F* > F.01, 2, 9 d.f. because 17.7672 > 8.02, so we reject Ho: µ1 = µ2 = µ3
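The table lookup can be cross-checked without statistical software. When the numerator has exactly 2 d.f., the upper-tail probability of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2). The sketch below relies on that special case only; for other numerator d.f. a table or a statistics library would be needed. The function names are illustrative.

```python
def f_upper_tail_2df(f, df2):
    """P(F > f) when the numerator has 2 d.f. and the denominator has df2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_2df(alpha, df2):
    """Value of f with upper-tail area alpha (numerator d.f. = 2 only)."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)

f_star, df_error, alpha = 17.7672, 9, 0.01

f_crit = f_critical_2df(alpha, df_error)
p_value = f_upper_tail_2df(f_star, df_error)

print(f"critical F = {f_crit:.2f}, p-value = {p_value:.4f}")
print("Reject Ho" if f_star > f_crit else "Fail to reject Ho")
```

The critical value agrees with the tabled 8.02, and the p-value is consistent with the Prob > F of 0.0007 reported by JMP.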
This is the full JMP output for the same ANOVA, including some summary statistics and differences of means. Once we establish there is something going on in the model (at least one mean is different), we should ask which means are different. Based on the results, we can see that insecticides 2 and 3 are both different from insecticide 1, but they are not significantly different from each other. The last test, using Tukey-Kramer's HSD, is given at the bottom of the output.

Oneway Analysis of Seedlings By Insecticide
[Plot of Seedlings (vertical axis, 50 to 90) by Insecticide (levels 1, 2, 3), with the All Pairs Tukey-Kramer 0.05 panel]
Oneway Anova

Summary of Fit
Rsquare                        0.798
Adj Rsquare                    0.753
Root Mean Square Error         7.180
Mean of Response               75.000
Observations (or Sum Wgts)     12.000

Analysis of Variance
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Insecticide    2          1832.00       916.000   17.7672    0.0007*
Error          9           464.00        51.556
C. Total      11          2296.00
Means for Oneway Anova
Level   Number   Mean      Std Error   Lower 95%   Upper 95%
1       4        58.0000   3.5901      49.879      66.121
2       4        87.0000   3.5901      78.879      95.121
3       4        80.0000   3.5901      71.879      88.121
Std Error uses a pooled estimate of error variance
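The identical standard errors in this table follow from pooling: each one is the square root of MSE divided by the group size. A sketch of that calculation is below. The t value of 2.262 (two-sided 95%, 9 d.f.) comes from a standard t table and is an assumption here, since the JMP output does not print it.

```python
import math

MSE = 51.556          # Mean Square Error from the ANOVA table
N_PER_GROUP = 4
T_CRIT = 2.262        # assumption: t(.025, 9 d.f.) from a standard t table
means = {1: 58.0, 2: 87.0, 3: 80.0}

# Pooled standard error is the same for every level with equal group sizes.
std_err = math.sqrt(MSE / N_PER_GROUP)
margin = T_CRIT * std_err
intervals = {level: (m - margin, m + margin) for level, m in means.items()}

print(f"pooled Std Error = {std_err:.4f}")
for level, (lower, upper) in intervals.items():
    print(f"Level {level}: {lower:.3f} to {upper:.3f}")
```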
Means and Std Deviations
Level   Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
1       4        58.0000   7.83156   3.9158         45.538      70.462
2       4        87.0000   7.78888   3.8944         74.606      99.394
3       4        80.0000   5.71548   2.8577         70.905      89.095
Means Comparisons
Comparisons for all pairs using Tukey-Kramer HSD

Confidence Quantile
q*        Alpha
2.79201   0.05

LSD Threshold Matrix
Abs(Dif)-HSD         2         3         1
2              -14.176    -7.176    14.824
3               -7.176   -14.176     7.824
1               14.824     7.824   -14.176
Positive values show pairs of means that are significantly different.
Connecting Letters Report
Level          Mean
2       A      87.000000
3       A      80.000000
1       B      58.000000
Levels not connected by same letter are significantly different.
Ordered Differences Report
Level   - Level   Difference   Std Err Dif   Lower CL   Upper CL   p-Value
2       1         29.00000     5.077182      14.8245    43.17554   0.0008*
3       1         22.00000     5.077182       7.8245    36.17554   0.0048*
2       3          7.00000     5.077182      -7.1755    21.17554   0.3911
The results of the multiple comparisons (3 comparisons: insecticide 1 to 2, 1 to 3, and 2 to 3) indicate there is a significant difference between insecticides 1 and 2, as well as 1 and 3. Insecticides 2 and 3 both yield a significantly higher number of seedlings compared with insecticide 1. However, there is no significant difference between insecticides 2 and 3. We can tell this from either of these two aspects of the report:
• Using the connecting letters, levels 2 and 3 have an A, indicating no difference between these two levels, but level 1 has a single B, indicating it is different from the others.
• If we look at the confidence intervals, the intervals for 1 and 2 and for 1 and 3 do not contain zero, while the interval for 2 and 3 does contain zero.
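The confidence-interval reasoning can be reproduced from the MSE and the q* value printed in the output. In the sketch below, q* = 2.79201 is taken directly from the JMP report rather than computed, and the interval for each pair is the difference plus or minus q* times the standard error of the difference. The variable names are illustrative.

```python
import math
from itertools import combinations

MSE = 51.556        # from the ANOVA table
Q_STAR = 2.79201    # Tukey-Kramer quantile as printed by JMP (not computed here)
N = 4               # observations per insecticide
means = {2: 87.0, 3: 80.0, 1: 58.0}   # ordered from highest to lowest mean

se_dif = math.sqrt(MSE * (1 / N + 1 / N))   # Std Err Dif
hsd = Q_STAR * se_dif                       # half-width of each interval

results = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    lower, upper = diff - hsd, diff + hsd
    significant = not (lower <= 0 <= upper)  # interval excludes zero
    results[(a, b)] = (diff, lower, upper, significant)
    print(f"{a} vs {b}: diff = {diff:.1f}, CI = ({lower:.4f}, {upper:.4f}), "
          f"significant = {significant}")
```

Pairs whose interval excludes zero are the ones JMP flags as significantly different, which matches the connecting letters report.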