Stat 5421, Fall 2006: Blowdown data, part 2

Stat 5421, Fall 2006: Blowdown data, part 2 Here is the plot of y = blowdown (1) or not (0) versus x = log(D). options(width = 60) bddata m1 class(...
Author: Madlyn Carson
1 downloads 2 Views 115KB Size
Stat 5421, Fall 2006: Blowdown data, part 2 Here is the plot of y = blowdown (1) or not (0) versus x = log(D).

options(width = 60) bddata m1 class(m1) [1] "glm" "lm" > summary(m1) Call: glm(formula = y ~ x, family = binomial(), data = bddata) Deviance Residuals: Min 1Q Median -2.4606 -0.9947 -0.5093

3Q 1.0631

Max 2.0527

Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.79181 0.20786 -23.05 range(bddata$x) [1] 1.609438 4.442651

0.6 0.4 0.2 0.0

Blowdown

0.8

1.0

> plot(bddata$x, bddata$y, xlab = "Log(D)", ylab = "Blowdown") > newx lines(newx, predict(m1, newdata = data.frame(x = newx), + type = "response"))

1.5

2.0

2.5

3.0

3.5

4.0

Log(D)

3

4.5

Grouped data So far, this handout has treated the n = 3666 trees as 3666 independent trials, so that the probability of blowdown π(log(D)i) is different for each tree. In reality many trees were the same size, and all trees with the same value of log(D) will have the same value of π(log(D)): > table(bddata$D) 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 10.5 167 3 199 1 203 2 191 2 179 5 201 8 11 11.5 12 12.5 13 13.5 14 14.5 15 15.5 16 16.5 154 11 197 2 172 7 210 6 184 3 141 5 17 17.5 18 18.5 19 19.5 20 20.5 21 21.5 22 22.5 147 5 120 4 93 4 113 8 85 1 90 7 23 23.5 24 24.5 25 25.5 26 26.5 27 28 28.5 29 71 1 68 3 86 8 48 1 52 46 4 38 29.5 30 30.5 31 31.5 32 32.5 33 33.5 34 34.5 35 1 55 4 29 1 45 2 23 1 23 4 21 35.5 36 37 37.5 38 39 39.5 40 41 42 42.5 43 2 15 9 3 12 9 1 11 3 6 1 1 44 45 46 47 48 48.5 49 49.5 51 51.5 52 54 4 3 4 2 3 1 1 1 2 1 2 1 56 58 85 1 1 1 The grouping should have no effect on the estimates of the regression parameters, as the program should take care of this automatically, but it will have an effect on the computed value of the deviance because the saturated model will be different. Under the each-tree-is-a unit model, which Agresti (p. 140), called ungrouped data, 3666 parameters are estimated, each equal to either zero or one. Under the grouped model, there are only 87 parameters, one for each unique value of D (and hence of log(D)), and the estimate of one of the 87 π(log(D)) is now the fraction of trees of this size the blowdown, generally between zero and one. Since the saturated models change, so do the deviances (See Section 4.5.3). To demonstrate this, we refit the blowdown data by grouping it into 87 rows, one for each of the 87 unique values of D. I have not found an easy way to do this, but fortunately you will almost never need to do it anyway. Here is how I did it: > bddata$m names(bddata) [1] "D"

"S"

"y"

"SPP" "x"

"m"

> t1 t1[1:5, ]

1

x 1.6094379124341

y m 7 167 4

2 3 4 5

1.70474809223843 1 3 1.79175946922805 18 199 1.87180217690159 1 1 1.94591014905531 33 203

> t1$x summary(m2 |z|) (Intercept) -4.79181 0.20786 -23.05