Stat 5421, Fall 2006: Blowdown data, part 2

Stat 5421, Fall 2006: Blowdown data, part 2 Here is the plot of y = blowdown (1) or not (0) versus x = log(D). options(width = 60) bddata m1 class(...

Author: Madlyn Carson

1 downloads 2 Views 115KB Size

Report

Download PDF

Recommend Documents

STAT 248: Exploratory Data Analysis Handout 2

STAT Introductory Statistics Fall URI

2 STAT-PAK ASSAY

Volume 3, Number 2, Fall 2006

Stat 110 Final Review, Fall 2011

STAT-GB.2302, STAT-UB.0018: FORECASTING TIME SERIES DATA

5421. der Bundesregierung

BDV60 Blowdown Vessel

Suitability Analysis With Raster Data Part 2

Tribological Design Data. Part 2: Lubrication

BT1000 Boiler Blowdown Timer

Data Hazards. Pipelining Part 2. Data Hazard Example. Forwarding

CS 277, Data Mining Web Data Analysis: Part 2, Advertising

STAT 341: PROBABILITY: FALL 2016 COMMENTS ON HW PROBLEMS

STAT 1220, Sections 002 and 004, Fall, 2014

Topic 2 Data Communications and Networking (Part 2)

I ntegrity. Research. volume 9 No. 2 Fall Spring 2006

Getting Started with SPSS Stat E100 Fall 2013

STAT 200. Basic Statistical Practice On-Line Course, Fall 2003

Biol 213 Genetics (Fall 2000) Mendelian Genetics (Part 2)

Polish Law Review Fall 2006

CSE 3 Final Fall 2006

Stat 2610 Test 2 Review Names

famos Dezember 2006 Hakenkreuz-Fall

Stat 5421, Fall 2006: Blowdown data, part 2 Here is the plot of y = blowdown (1) or not (0) versus x = log(D).

options(width = 60) bddata m1 class(m1) [1] "glm" "lm" > summary(m1) Call: glm(formula = y ~ x, family = binomial(), data = bddata) Deviance Residuals: Min 1Q Median -2.4606 -0.9947 -0.5093

3Q 1.0631

Max 2.0527

Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -4.79181 0.20786 -23.05 range(bddata$x) [1] 1.609438 4.442651

0.6 0.4 0.2 0.0

Blowdown

0.8

1.0

> plot(bddata$x, bddata$y, xlab = "Log(D)", ylab = "Blowdown") > newx lines(newx, predict(m1, newdata = data.frame(x = newx), + type = "response"))

1.5

2.0

2.5

3.0

3.5

4.0

Log(D)

3

4.5

Grouped data So far, this handout has treated the n = 3666 trees as 3666 independent trials, so that the probability of blowdown π(log(D)i) is different for each tree. In reality many trees were the same size, and all trees with the same value of log(D) will have the same value of π(log(D)): > table(bddata$D) 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 10.5 167 3 199 1 203 2 191 2 179 5 201 8 11 11.5 12 12.5 13 13.5 14 14.5 15 15.5 16 16.5 154 11 197 2 172 7 210 6 184 3 141 5 17 17.5 18 18.5 19 19.5 20 20.5 21 21.5 22 22.5 147 5 120 4 93 4 113 8 85 1 90 7 23 23.5 24 24.5 25 25.5 26 26.5 27 28 28.5 29 71 1 68 3 86 8 48 1 52 46 4 38 29.5 30 30.5 31 31.5 32 32.5 33 33.5 34 34.5 35 1 55 4 29 1 45 2 23 1 23 4 21 35.5 36 37 37.5 38 39 39.5 40 41 42 42.5 43 2 15 9 3 12 9 1 11 3 6 1 1 44 45 46 47 48 48.5 49 49.5 51 51.5 52 54 4 3 4 2 3 1 1 1 2 1 2 1 56 58 85 1 1 1 The grouping should have no effect on the estimates of the regression parameters, as the program should take care of this automatically, but it will have an effect on the computed value of the deviance because the saturated model will be different. Under the each-tree-is-a unit model, which Agresti (p. 140), called ungrouped data, 3666 parameters are estimated, each equal to either zero or one. Under the grouped model, there are only 87 parameters, one for each unique value of D (and hence of log(D)), and the estimate of one of the 87 π(log(D)) is now the fraction of trees of this size the blowdown, generally between zero and one. Since the saturated models change, so do the deviances (See Section 4.5.3). To demonstrate this, we refit the blowdown data by grouping it into 87 rows, one for each of the 87 unique values of D. I have not found an easy way to do this, but fortunately you will almost never need to do it anyway. Here is how I did it: > bddata$m names(bddata) [1] "D"

"S"

"y"

"SPP" "x"

"m"

> t1 t1[1:5, ]

1

x 1.6094379124341

y m 7 167 4

2 3 4 5

1.70474809223843 1 3 1.79175946922805 18 199 1.87180217690159 1 1 1.94591014905531 33 203

> t1$x summary(m2 |z|) (Intercept) -4.79181 0.20786 -23.05