Author: Mervin Lucas
Spatial Statistics
Spatial Point Processes, Part II¹

Oct 25, 2012

¹ These notes are largely based on Prof. Fuentes’ SPP notes from Warwick University, 2007.

Point Processes II

Spatial Statistics

Spatial Point Patterns - 4 Classifications

[Figure: four simulated point patterns on the unit square, in panels titled “Clusters”, “Completely Random”, “Heterogeneous” and “Regular”; axes spatpts[,1] (xc) and spatpts[,2] (yc).]
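The four classes in the figure can be mimicked with a few lines of Python. This is a minimal sketch, not the code behind the figure: the constructions (Gaussian clusters around uniform parents, thinning for an intensity proportional to x, simple sequential inhibition for regularity), the parameter values and the function names `csr`, `clustered`, `heterogeneous` and `regular` are all illustrative assumptions.

```python
import math
import random


def csr(n, rng):
    """Completely random: n i.i.d. uniform events on the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def clustered(n_parents, per_parent, sigma, rng):
    """Clusters: Gaussian offspring around uniform parent locations
    (Thomas-type, with a fixed number of offspring per parent)."""
    pts = []
    for _ in range(n_parents):
        px, py = rng.random(), rng.random()
        for _ in range(per_parent):
            x, y = rng.gauss(px, sigma), rng.gauss(py, sigma)
            if 0 <= x <= 1 and 0 <= y <= 1:  # drop offspring outside the window
                pts.append((x, y))
    return pts


def heterogeneous(n, rng):
    """Heterogeneous: intensity proportional to x, by thinning uniform
    proposals (a proposal at (x, y) is kept with probability x)."""
    pts = []
    while len(pts) < n:
        x, y = rng.random(), rng.random()
        if rng.random() < x:
            pts.append((x, y))
    return pts


def regular(n, r, rng, max_tries=100_000):
    """Regular: simple sequential inhibition; a proposal closer than r
    to an already accepted event is rejected."""
    pts = []
    for _ in range(max_tries):
        if len(pts) == n:
            break
        x, y = rng.random(), rng.random()
        if all(math.hypot(x - a, y - b) >= r for a, b in pts):
            pts.append((x, y))
    return pts


rng = random.Random(1)
patterns = {
    "Completely Random": csr(100, rng),
    "Clusters": clustered(10, 10, 0.03, rng),
    "Heterogeneous": heterogeneous(100, rng),
    "Regular": regular(100, 0.06, rng),
}
for name, pts in patterns.items():
    print(f"{name}: {len(pts)} events")
```

Any of the tests that follow can be tried on these four patterns; the regular one should push the statistics toward regularity and the clustered one toward aggregation.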

Spatial Point Process - Methods

1. Quadrat methods:
  * Based on a reduction of the SPP to counts of events within nonoverlapping subregions, i.e. quadrats, of equal size.
  * Quadrats, usually rectangular, may or may not constitute an exhaustive partition of A.

2. Distance methods:
  * Based on a reduction of the SPP to distances to events.
  * May utilize interevent distances (e.g. the distance of an event to its nearest neighbor), point-to-event distances, or both.
  * May utilize distances only to nearest events, to events beyond the nearest, or both.
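Both reductions are easy to compute directly. A minimal sketch, assuming events are stored as (x, y) tuples in a rectangular window with its lower-left corner at the origin; the helper names are hypothetical, and the brute-force O(N²) nearest-neighbor search is only meant for small patterns.

```python
import math


def quadrat_counts(points, k, width=1.0, height=1.0):
    """Counts of events in a k x k grid of equal rectangular quadrats
    covering [0, width] x [0, height]; rows are indexed by y."""
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        col = min(int(x / width * k), k - 1)   # events on the right/top edge go in the last cell
        row = min(int(y / height * k), k - 1)
        counts[row][col] += 1
    return counts


def nn_distances(points):
    """Interevent distances: Y_i = distance from event i to its nearest other event."""
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]


def point_to_event_distances(points, samples):
    """Point-to-event distances: X_j = distance from sample point j to its nearest event."""
    return [min(math.hypot(sx - x, sy - y) for x, y in points) for sx, sy in samples]


events = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8), (0.5, 0.5)]
print(quadrat_counts(events, 2))   # [[2, 0], [0, 2]]
print(nn_distances(events))
```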


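Spatial Point Process - Methods

Each of these types of methods has its advantages and disadvantages.
  * Quadrat methods emphasize “global” information at the expense of “local” information; the reverse holds for distance methods.
  * The size and shape of quadrats are arbitrary, and different choices can give different answers.
  * Two problems with distance methods are edge effects and overlap effects:
    * edge effects: an event near the boundary of A may have its true nearest neighbor outside A, where it is not observed, so observed NN distances tend to be too large;
    * overlap effects: the discs that define nearby NN or point-to-event distances overlap, so those distances are not independent.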


Testing for Randomness

Quadrat methods: Let $n_1, \ldots, n_m$ denote the counts from a partitioning of A into m equally sized quadrats. Write $\bar n = \sum_i n_i / m$ for the sample mean of the $n_i$’s. Then compute the “index of dispersion”

$$X^2 = \sum_{i=1}^{m} \frac{(n_i - \bar n)^2}{\bar n}.$$

If the pattern is completely random, then the distribution of $X^2$ is, to a good approximation, chi-square with m − 1 df.
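As a worked example of the index of dispersion, the sketch below uses hypothetical counts for a 3 × 3 grid (they are not the Japanese pines counts) together with the central 90% range of $\chi^2_8$, (2.733, 15.51), that the Japanese pines example uses. For these counts $\bar n = 4$ and $\sum_i (n_i - \bar n)^2 = 12$, so $X^2 = 3.0$ on 8 df. The function name is hypothetical.

```python
def index_of_dispersion(counts):
    """Return (X^2, df) with X^2 = sum_i (n_i - nbar)^2 / nbar and df = m - 1."""
    m = len(counts)
    nbar = sum(counts) / m
    x2 = sum((n - nbar) ** 2 for n in counts) / nbar
    return x2, m - 1


# Hypothetical counts from a 3 x 3 grid of quadrats (m = 9), for illustration only.
counts = [5, 3, 4, 6, 2, 4, 5, 3, 4]
x2, df = index_of_dispersion(counts)
print(x2, df)  # 3.0 8

# 2.733 and 15.51 bound the central 90% of chi-square with 8 df.
print("CSR not rejected" if 2.733 < x2 < 15.51 else "CSR rejected")
```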


Testing for Randomness

Quadrat methods: Two interpretations of $X^2$:
1. Pearson’s chi-square statistic, since $E(n_i \mid \sum_i n_i) = \bar n$ under the uniformity implied by CSR.
2. $X^2 = (m-1)s^2/\bar n$, where $s^2$ is the sample variance of the $n_i$’s, i.e. (m − 1) times the sample variance-to-mean ratio, which makes sense since the mean and variance of a Poisson distribution are equal.

The test is two-sided:
  * $X^2$ too large ⇒ aggregation or heterogeneity
  * $X^2$ too small ⇒ regularity
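The sketch below computes $X^2$ through the second interpretation, $(m-1)s^2/\bar n$, and a two-sided p-value that reports which tail was hit. To stay within the Python standard library it evaluates the chi-square cdf with the power series for the regularized lower incomplete gamma function; that series is a swapped-in numerical method (in R one would simply call `pchisq`), and `chi2_cdf` and `quadrat_test` are hypothetical helper names. Both sets of counts are made up.

```python
import math


def chi2_cdf(x, df):
    """P(chi^2_df <= x) = P(df/2, x/2), using the series
    P(a, z) = z^a e^{-z} / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    value = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(max(value, 0.0), 1.0)  # guard against rounding just outside [0, 1]


def quadrat_test(counts):
    """Two-sided index-of-dispersion test of CSR from quadrat counts."""
    m = len(counts)
    nbar = sum(counts) / m
    s2 = sum((n - nbar) ** 2 for n in counts) / (m - 1)  # sample variance
    x2 = (m - 1) * s2 / nbar                             # (m - 1) * variance/mean
    cdf = chi2_cdf(x2, m - 1)
    p_value = min(1.0, 2 * min(cdf, 1 - cdf))
    side = "too large: aggregation or heterogeneity" if cdf > 0.5 else "too small: regularity"
    return x2, p_value, side


print(quadrat_test([5, 3, 4, 6, 2, 4, 5, 3, 4]))   # fairly even counts
print(quadrat_test([0, 0, 0, 0, 36, 0, 0, 0, 0]))  # all events in one quadrat
```

Doubling the smaller tail probability is one common way to make the test two-sided; reporting the side alongside the p-value keeps the regularity vs. aggregation interpretation visible.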


Testing for Randomness - Quadrat Methods

[Figure: point patterns of the Japanese Pines (japanesepines) and Longleaf (longleaf) data.]

EXAMPLE 1: Analysis of Japanese black pines. Divide the study area into a 3 × 3 square grid of quadrats. Counts and result of test are as follows:

EXAMPLE 2: Analysis of Longleaf data. Divide the study area into a 5 × 5 square grid of quadrats. Counts and result of test are as follows:


Testing for Randomness - Quadrat Methods

[Figure: Japanese Pines (datj) on a 3 × 3 grid and Longleaf (datl) on a 5 × 5 grid, with the quadrat counts overlaid.]

EXAMPLE 1: Analysis of Japanese black pines. $\bar n = 7.22$, $X^2_8 = 8.80$, $P_{CSR}(2.733 < \chi^2_8 < 15.51) = .90$; CSR not rejected.

EXAMPLE 2: Analysis of Longleaf data. $\bar n = 23.36$, $X^2_{24} = 152.6438$, $P_{CSR}(\chi^2_{24} < 51.18) = .999$; p-value < 2.2e-16. CSR rejected.


Testing for Randomness - Quadrat Methods

Criticisms of quadrat methods:
  * Insensitive to regular departures from CSR.
  * The conclusion can depend on quadrat size and shape, the choice of which is quite arbitrary. For example, if we repeat the procedure for the cells data using a 4 × 4 grid rather than a 3 × 3 grid, we obtain $X^2_{15} = 2.95$, and we reject CSR (P < .001). Likewise, if we repeat the procedure for the redwood data using a 2 × 2 grid, we obtain $X^2_3 = 5.56$, and we do not reject CSR (P = .14).
  * Too much information is lost by reducing the pattern to quadrat counts.


Testing for Randomness

Distance methods:
  * Clark-Evans: based on the mean nearest-neighbor (NN) distance.
  * Diggle’s: CE will perform poorly when there are more large and small, but fewer intermediate, NN distances than expected under CSR, but $\bar Y$ is still about $(2\sqrt{\lambda})^{-1}$. This suggests that a test based on the entire empirical distribution function (EDF) of the NN distances may be more sensitive.
  * Ripley’s K-function: the “K-function” (second-moment cumulative function) is defined as $K(t) = \frac{1}{\lambda} E(N_t)$, where $N_t$ = # of additional events within distance t of a randomly chosen event. K(t) combines distance measurement with quadrat counting, so we might expect it to contain more information than the NN distances and thus provide a more sensitive analysis.


Testing for Randomness

Distance Methods: Clark-Evans test: based on the mean nearest-neighbor (NN) distance $\bar Y$, where $Y_i$ = distance from event i to its nearest neighbor.
  * $\bar Y$ too small ⇒ aggregation (small-scale); $\bar Y$ too large ⇒ regularity (small-scale)
  * Test statistic

$$CE = \frac{\bar Y - \dfrac{1}{2\sqrt{\lambda}}}{\sqrt{\dfrac{4-\pi}{4\lambda\pi N}}},$$

where $\lambda = N/|A|$.
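The Clark-Evans statistic translates directly into code. A minimal sketch that ignores edge and overlap effects (no edge correction) and uses the normal approximation for a two-sided p-value. It also returns the ratio $R = \bar Y / E(Y)$, the quantity reported by spatstat’s `clarkevans` in the examples that follow. The helper name `clark_evans` and the test patterns are hypothetical.

```python
import math
import random


def clark_evans(points, area):
    """Return (R, CE, two-sided p) for events in a window of the given
    area, ignoring edge and overlap effects."""
    n = len(points)
    lam = n / area                                   # lambda = N / |A|
    nn = [min(math.hypot(xi - xj, yi - yj)
              for j, (xj, yj) in enumerate(points) if j != i)
          for i, (xi, yi) in enumerate(points)]      # Y_i, i = 1..N
    ybar = sum(nn) / n
    expected = 1 / (2 * math.sqrt(lam))              # E(Y) under CSR
    se = math.sqrt((4 - math.pi) / (4 * lam * math.pi * n))
    ce = (ybar - expected) / se
    p_value = math.erfc(abs(ce) / math.sqrt(2))      # = 2 * (1 - Phi(|CE|))
    return ybar / expected, ce, p_value


# A 10 x 10 lattice on the unit square: every NN distance is 0.1, E(Y) = 0.05.
lattice = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
r_lattice = clark_evans(lattice, 1.0)
print(r_lattice)   # R = 2 and CE large and positive: regularity

rng = random.Random(0)
uniform = [(rng.random(), rng.random()) for _ in range(200)]
r_uniform = clark_evans(uniform, 1.0)
print(r_uniform)   # R should be near 1 for a CSR pattern
```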


Testing for Randomness - Clark-Evans Method

  * Under CSR, and if edge and overlap effects are ignored, the distribution of CE is, to a fairly good approximation, N(0, 1). Here
    * $\frac{1}{2\sqrt{\lambda}} = E(Y)$, ignoring edge and overlap effects;
    * $\sqrt{\frac{4-\pi}{4\lambda\pi N}}$ is the SE of $\bar Y$;
    * so CE has the form $Z = \frac{\bar Y - E(Y)}{\sqrt{V(\bar Y)}} \sim N(0, 1)$.
  * Powerful for detecting aggregation and regularity, weak at detecting heterogeneity.
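A quick simulation illustrates why the approximation is stated “ignoring edge effects”. This sketch is an illustrative assumption, not part of the original analysis: it averages the mean NN distance of uniform patterns (N = 100 on the unit square) computed in two ways, once with ordinary distances and once on a torus, where distances wrap around the edges so that no event is near a boundary, and compares both with $E(Y) = 1/(2\sqrt{\lambda})$.

```python
import math
import random


def mean_nn(points, torus=False):
    """Mean NN distance on the unit square; torus=True wraps distances
    around the edges, which removes edge effects."""
    def dist(p, q):
        dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
        if torus:
            dx, dy = min(dx, 1 - dx), min(dy, 1 - dy)
        return math.hypot(dx, dy)
    return sum(min(dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)) / len(points)


rng = random.Random(42)
n, reps = 100, 50
theory = 1 / (2 * math.sqrt(n / 1.0))   # E(Y) with lambda = N/|A| = 100, i.e. 0.05
torus_avg = square_avg = 0.0
for _ in range(reps):
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    torus_avg += mean_nn(pts, torus=True) / reps
    square_avg += mean_nn(pts, torus=False) / reps
print(f"theory {theory:.4f}  torus {torus_avg:.4f}  square {square_avg:.4f}")
```

On the torus the average should sit close to 0.05. On the square it is larger, because events near the boundary lose neighbors that would lie outside the window; this bias is what edge corrections such as Donnelly’s try to remove.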


Testing for Randomness - Clark-Evans Method

[Figure: point patterns of the japanesepines and longleaf data.]

EXAMPLE 1: Japanese black pines. clarkevans(datj) reports the ratio R of the observed mean NN distance to its CSR expectation $1/(2\sqrt{\lambda})$ under different edge corrections: naive = 1.064002, Donnelly = 1.007507, cdf = 1.056241. All are close to 1. Compare with the quadrat test, where we could not reject. clarkevans.test(datj): p-value = 0.3236. Do not reject H0.

EXAMPLE 2: Longleaf data. clarkevans(datl): naive = 0.8320547, Donnelly = 0.8176799, cdf = 0.8194595. All are less than 1, which suggests clustering, as seen in the quadrat test. clarkevans.test(datl): p-value = 8.216e-15. Reject H0: CSR.


Testing for Randomness

Diggle’s Refined NN analysis
  * CE will perform poorly when there are more large and small, but fewer intermediate, NN distances than expected under CSR, but $\bar Y$ is still about $(2\sqrt{\lambda})^{-1}$. This suggests that a test based on the entire empirical distribution function (EDF) of the NN distances may be more sensitive.
  * Let

$$\hat G(y) = \frac{1}{N}\#(Y_i \le y).$$

If CSR holds, $\hat G(y)$ should be “close” to $G(y) = 1 - \exp(-\lambda\pi y^2)$ for all y > 0, and a plot of $\hat G(y)$ vs. $G(y)$ should be nearly a straight line.
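The EDF $\hat G$ and its CSR counterpart $G$ can be computed side by side. A minimal sketch on a simulated uniform pattern, assuming a unit-square window and no edge correction; the helper names are hypothetical.

```python
import math
import random


def nn_distances(points):
    """Y_i: distance from each event to its nearest other event (brute force)."""
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]


def g_hat(nn, y):
    """Empirical NN-distance EDF: #(Y_i <= y) / N."""
    return sum(d <= y for d in nn) / len(nn)


def g_csr(y, lam):
    """NN-distance cdf under CSR: 1 - exp(-lambda * pi * y^2)."""
    return 1 - math.exp(-lam * math.pi * y * y)


rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(200)]
lam = len(pts) / 1.0           # lambda = N / |A| with |A| = 1
nn = nn_distances(pts)
for y in (0.01, 0.02, 0.04, 0.06):
    print(f"y = {y:.2f}   G_hat = {g_hat(nn, y):.3f}   G = {g_csr(y, lam):.3f}")
```

Plotting the pairs (G, G_hat) over a fine grid of y gives the straight-line check described above.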


Testing for Randomness

Diggle’s Refined NN analysis
  * $\hat G(y) > G(y)$ for small y ⇒ aggregation (at small scale); $\hat G(y) < G(y)$ for small y ⇒ regularity (at small scale).
  * Measures of discrepancy between $\hat G(\cdot)$ and $G(\cdot)$:
    1. $\Delta_G = \max_y |\hat G(y) - G(y)|$ (Kolmogorov-Smirnov type)
    2. $\int \{\hat G(y) - G(y)\}^2\,dy$ (Cramér-von Mises type)
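Both discrepancy measures can be evaluated numerically. This sketch assumes G is the CSR cdf with a known λ, takes the supremum at the jumps of $\hat G$, and approximates the Cramér-von Mises integral with the trapezoid rule on a finite range beyond which G is essentially 1; the helper names and the two test inputs are hypothetical.

```python
import math


def g_csr(y, lam):
    return 1 - math.exp(-lam * math.pi * y * y)


def delta_g(nn, lam):
    """Kolmogorov-Smirnov type: max_y |G_hat(y) - G(y)|.  The supremum of a
    step function minus a continuous increasing cdf occurs at a jump of
    G_hat, so the left limit k/N and the value (k+1)/N are checked there."""
    d = sorted(nn)
    n = len(d)
    return max(max(abs((k + 1) / n - g_csr(y, lam)), abs(k / n - g_csr(y, lam)))
               for k, y in enumerate(d))


def cramer_von_mises(nn, lam, steps=2000):
    """Cramer-von Mises type: integral of {G_hat(y) - G(y)}^2 dy (trapezoid rule)."""
    n = len(nn)
    upper = max(max(nn), 4 / math.sqrt(lam))   # G(4/sqrt(lam)) = 1 - exp(-16 pi), essentially 1
    h = upper / steps
    vals = []
    for k in range(steps + 1):
        y = k * h
        diff = sum(v <= y for v in nn) / n - g_csr(y, lam)
        vals.append(diff * diff)
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


# "Ideal" distances at the (k - 0.5)/N quantiles of G give Delta_G = 0.5/N.
lam, n = 1.0, 200
ideal = [math.sqrt(-math.log(1 - (k - 0.5) / n) / (lam * math.pi)) for k in range(1, n + 1)]
print(delta_g(ideal, lam), cramer_von_mises(ideal, lam))
print(delta_g([0.01] * n, lam), cramer_von_mises([0.01] * n, lam))   # strongly aggregated
```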


Testing for Randomness - Diggle’s
  * How do we judge significance? Because the distribution theory for these measures is too difficult, we use Monte Carlo testing. That is, we compare the measure’s value for our data to the measure’s values for simulations of an HPP.
  * Because we don’t know the true cdf G (due to edge and overlap effects), the use of

$$\bar G_i(y) = \frac{1}{s-1}\sum_{j \ne i} \hat G_j(y)$$

in place of G(y) is recommended, where $\hat G_1$ is the EDF for the data and $\hat G_2, \ldots, \hat G_s$ are the EDFs for s − 1 simulated HPP patterns. That is, take

$$u_i = \max_y |\hat G_i(y) - \bar G_i(y)|,$$

and compare $u_1$ with $u_2, \ldots, u_s$.
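A sketch of this Monte Carlo test with the leave-one-out average $\bar G_i$, under these assumptions: the window is the unit square; the HPP simulations are conditioned on the observed N (binomial patterns); the maximum over y is taken on a finite grid of 40 values in (0, 2/√N], chosen here for illustration; and the p-value is the proportion of $u_1, \ldots, u_s$ that are at least $u_1$. Function names and the clustered data are hypothetical.

```python
import math
import random


def nn_distances(points):
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]


def g_hat_on_grid(points, grid):
    nn = nn_distances(points)
    return [sum(d <= y for d in nn) / len(nn) for y in grid]


def diggle_mc_test(data, s=100, seed=0):
    """Pattern 1 is the data; patterns 2..s are s - 1 CSR simulations with
    the same number of events.  Returns (u_1, Monte Carlo p-value)."""
    rng = random.Random(seed)
    n = len(data)
    grid = [k * 2 / math.sqrt(n) / 40 for k in range(1, 41)]   # y in (0, 2/sqrt(N)]
    patterns = [data] + [[(rng.random(), rng.random()) for _ in range(n)]
                         for _ in range(s - 1)]
    G = [g_hat_on_grid(p, grid) for p in patterns]
    totals = [sum(col) for col in zip(*G)]        # sum over all s patterns at each y
    u = []
    for gi in G:
        gbar = [(t - g) / (s - 1) for t, g in zip(totals, gi)]   # G_bar_i(y), average over j != i
        u.append(max(abs(a - b) for a, b in zip(gi, gbar)))
    p_value = sum(ui >= u[0] for ui in u) / s
    return u[0], p_value


# Hypothetical tightly clustered data: 20 clusters of 5 events each.
rng = random.Random(3)
data = []
for _ in range(20):
    px, py = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    data += [(px + rng.gauss(0, 0.005), py + rng.gauss(0, 0.005)) for _ in range(5)]
u1, p = diggle_mc_test(data)
print(u1, p)
```

With s = 100 the smallest attainable p-value is 1/100, reached when the data give the largest $u_i$.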


Testing for Randomness - Diggle’s

  * Rather than reducing the EDF to a single summary statistic, it may be more informative to look at a plot of the EDF. If the SPP is consistent with CSR, then a plot of $\hat G(y)$ vs. $G(y)$ should be nearly a straight line from (0, 0) to (1, 1).


Testing for Randomness - Diggle’s

Departures from CSR can be detected by means of simulation envelopes, whose upper and lower endpoints are defined as

$$U(y) = \max_{i=1,\ldots,s}\{\hat G_i(y)\}, \qquad L(y) = \min_{i=1,\ldots,s}\{\hat G_i(y)\},$$

where s is the number of simulated HPP patterns having the same number of events (s is usually taken to be 99), and $\hat G_i(\cdot)$ is the NN-distance EDF for the ith simulation. For each fixed y > 0, under CSR,

$$P[\hat G(y) > U(y)] = P[\hat G(y) < L(y)] = \frac{1}{s+1}.$$

  * Simulation envelopes also indicate the distance at which a deviation, if any, from CSR occurs.
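Pointwise envelopes follow the same recipe. A minimal sketch, assuming a unit-square window, binomial simulations with the observed N, s = 99 and an illustrative grid of y values; the helper names are hypothetical. For a regular 10 × 10 lattice (every NN distance 0.1) the observed $\hat G$ should fall below the lower envelope at small y, which signals regularity.

```python
import math
import random


def nn_distances(points):
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]


def g_hat_on_grid(points, grid):
    nn = nn_distances(points)
    return [sum(d <= y for d in nn) / len(nn) for y in grid]


def g_envelopes(n, grid, s=99, seed=1):
    """U(y) and L(y): pointwise max and min of G_hat over s simulated
    CSR patterns with n events on the unit square."""
    rng = random.Random(seed)
    sims = [g_hat_on_grid([(rng.random(), rng.random()) for _ in range(n)], grid)
            for _ in range(s)]
    return [max(col) for col in zip(*sims)], [min(col) for col in zip(*sims)]


grid = [0.01 * k for k in range(1, 21)]
lattice = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
observed = g_hat_on_grid(lattice, grid)
upper, lower = g_envelopes(len(lattice), grid)
for y, g, hi, lo in zip(grid, observed, upper, lower):
    where = "above" if g > hi else "below" if g < lo else "inside"
    print(f"y = {y:.2f}  G_hat = {g:.2f}  envelope [{lo:.2f}, {hi:.2f}]  {where}")
```

Because the envelopes are pointwise, scanning many y values and flagging any exit does not keep the overall error rate at 2/(s + 1).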


Testing for Randomness - Diggle’s Method

[Figure: G-function simulation envelopes in two panels, “Diggle Japanese Pine” (r in units of 5.7 metres) and “Diggle Longleaf” (r in metres), each showing Gobs(r), Gtheo(r), Ghi(r) and Glo(r) against r.]

EXAMPLE 1: Japanese pines. plot(envelope(datj, Gest)). The observed curve lies inside the envelope. Do not reject H0.

EXAMPLE 2: Longleaf data. plot(envelope(datl, Gest)). The observed curve lies above the envelope, which suggests clustering. Reject H0: CSR.


Testing for Randomness - More Diggle’s

We can do precisely the same kinds of tests using the EDF of the point-to-nearest-event distances $X_1, \ldots, X_m$ from m random or systematically placed sample points.
  * Let

$$\hat F(x) = \frac{1}{m}\#(X_i \le x).$$

If CSR holds, $\hat F(x)$ should be close to $F(x) = 1 - \exp(-\lambda\pi x^2)$ for all x > 0, and a plot of $\hat F(x)$ vs. $F(x)$ should be nearly a straight line.
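Unlike $\hat G$, $\hat F$ is computed from sample points rather than from events. A minimal sketch, assuming a systematic 20 × 20 grid of sample points on the unit square and no edge correction; the helper names and the clustered pattern are hypothetical. For a tightly clustered pattern, most sample points are far from every event, so $\hat F(x) < F(x)$.

```python
import math
import random


def empty_space_distances(events, samples):
    """X_j: distance from each sample point to its nearest event."""
    return [min(math.hypot(sx - ex, sy - ey) for ex, ey in events)
            for sx, sy in samples]


def f_hat(dists, x):
    """#(X_j <= x) / m."""
    return sum(d <= x for d in dists) / len(dists)


def f_csr(x, lam):
    """Point-to-nearest-event cdf under CSR: 1 - exp(-lambda * pi * x^2)."""
    return 1 - math.exp(-lam * math.pi * x * x)


samples = [((i + 0.5) / 20, (j + 0.5) / 20) for i in range(20) for j in range(20)]
rng = random.Random(4)
clustered = []
for _ in range(20):
    px, py = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    clustered += [(px + rng.gauss(0, 0.005), py + rng.gauss(0, 0.005)) for _ in range(5)]
lam = len(clustered) / 1.0
dists = empty_space_distances(clustered, samples)
for x in (0.02, 0.05, 0.1):
    print(f"x = {x:.2f}   F_hat = {f_hat(dists, x):.3f}   F = {f_csr(x, lam):.3f}")
```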


Testing for Randomness - More Diggle’s

  * $\hat F(x) > F(x)$ for small x ⇒ regularity (small-scale)
  * $\hat F(x) < F(x)$ for small x ⇒ aggregation (small-scale)
  * A Monte Carlo test for significance is necessary, as no tables from simulation studies have been published.

The use of both $\hat G(y)$ and $\hat F(x)$ is what Diggle calls refined NN analysis. Refined NN analysis can be done in the R package splancs using the functions Ghat and Fhat.


Testing for Randomness

Ripley’s K-function approach
  * The “K-function” (second-moment cumulative function) is defined as

$$K(t) = \frac{1}{\lambda} E(N_t),$$

where $N_t$ = # of additional events within distance t of a randomly chosen event. Under CSR, $E(N_t) = \lambda\pi t^2$, so $K(t) = \pi t^2$.
  * K(t) combines distance measurement with quadrat counting, so we might expect it to contain more information than the NN distances and thus provide a more sensitive analysis.
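A naive estimator replaces $E(N_t)$ by the average number of other events within t of each event and λ by N/|A|, giving $\hat K(t) = |A| N^{-2} \sum_{i \ne j} 1(d_{ij} \le t)$. This is a minimal sketch under stated assumptions: the window is the unit square, and distances are measured on a torus as a simple stand-in for Ripley’s edge correction (his estimator instead weights each pair). Under CSR, $\hat K(t)$ should then be close to $\pi t^2$. The helper names are hypothetical.

```python
import bisect
import math
import random


def torus_dist(p, q):
    """Distance on the unit square with opposite edges identified."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return math.hypot(min(dx, 1 - dx), min(dy, 1 - dy))


def k_hat(points, ts):
    """K_hat(t) = |A| / N^2 * #{ordered pairs i != j with d_ij <= t}, |A| = 1."""
    n = len(points)
    d = sorted(torus_dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))
    return [2 * bisect.bisect_right(d, t) / (n * n) for t in ts]   # 2: each pair counted twice


rng = random.Random(8)
pts = [(rng.random(), rng.random()) for _ in range(300)]
ts = [0.02, 0.05, 0.1]
for t, k in zip(ts, k_hat(pts, ts)):
    print(f"t = {t:.2f}   K_hat = {k:.5f}   pi t^2 = {math.pi * t * t:.5f}")
```

Sorting the pairwise distances once and using `bisect` lets the same pass answer every t.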


Testing for Randomness - Ripley’s

Ripley proposes a nonparametric estimator $\hat K(t)$ of K(t). He then suggests looking at the plot of $\hat L(t) \equiv t - \{\hat K(t)/\pi\}^{1/2}$ vs. t and computing a test statistic $L_{max} = \max_t |\hat L(t)|$.
  * $L(t) < 0$ ($K(t) > \pi t^2$) for small t ⇒ aggregation (small-scale)
  * $L(t) > 0$ ($K(t) < \pi t^2$) for small t ⇒ regularity (small-scale)
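$\hat L(t) = t - \{\hat K(t)/\pi\}^{1/2}$ and $L_{max}$ follow directly from $\hat K$. The sketch below repeats a torus-based $\hat K$ so that it runs on its own; using the torus is an assumption standing in for Ripley’s edge-corrected estimator, and the window is the unit square. It compares a uniform pattern with a hypothetical clustered one, for which $\hat L(t)$ should be negative at every t.

```python
import bisect
import math
import random


def torus_dist(p, q):
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return math.hypot(min(dx, 1 - dx), min(dy, 1 - dy))


def k_hat(points, ts):
    n = len(points)
    d = sorted(torus_dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))
    return [2 * bisect.bisect_right(d, t) / (n * n) for t in ts]


def l_hat(points, ts):
    """L_hat(t) = t - sqrt(K_hat(t) / pi): < 0 aggregation, > 0 regularity."""
    return [t - math.sqrt(k / math.pi) for t, k in zip(ts, k_hat(points, ts))]


ts = [0.01 * k for k in range(1, 11)]
rng = random.Random(5)
uniform = [(rng.random(), rng.random()) for _ in range(100)]
clustered = []
for _ in range(20):
    px, py = rng.random(), rng.random()
    clustered += [((px + rng.gauss(0, 0.01)) % 1, (py + rng.gauss(0, 0.01)) % 1)
                  for _ in range(5)]
for name, pts in (("uniform", uniform), ("clustered", clustered)):
    L = l_hat(pts, ts)
    print(name, [round(v, 4) for v in L], "L_max =", round(max(abs(v) for v in L), 4))
```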


Testing for Randomness - Ripley’s

  * Ripley gives approximate 5% and 1% cutoff values for $L_{max}$:

$$1.42\,\frac{\sqrt{|A|}}{N} \quad\text{and}\quad 1.68\,\frac{\sqrt{|A|}}{N},$$

respectively.
  * Preferably, a Monte Carlo approach can be used to assess significance.
  * A Ripley’s K-function analysis can be done in the R package splancs using the Khat and Lhat functions.
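The sketch below contrasts Ripley’s approximate cutoffs with a Monte Carlo p-value for $L_{max}$. Assumptions as before: unit-square window (|A| = 1), torus distances in place of Ripley’s edge correction, binomial simulations with the observed N, s = 99, and an illustrative grid of t values; function names and the clustered pattern are hypothetical. Because the simulated patterns go through exactly the same estimator, the Monte Carlo p-value does not depend on the approximations behind the cutoffs.

```python
import bisect
import math
import random


def torus_dist(p, q):
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return math.hypot(min(dx, 1 - dx), min(dy, 1 - dy))


def l_max(points, ts):
    """max_t |t - sqrt(K_hat(t)/pi)| with the torus-based K_hat on the unit square."""
    n = len(points)
    d = sorted(torus_dist(points[i], points[j])
               for i in range(n) for j in range(i + 1, n))
    return max(abs(t - math.sqrt(2 * bisect.bisect_right(d, t) / (n * n) / math.pi))
               for t in ts)


def ripley_test(points, ts, s=99, seed=11):
    n, area = len(points), 1.0
    observed = l_max(points, ts)
    cut5 = 1.42 * math.sqrt(area) / n        # approximate 5% cutoff
    cut1 = 1.68 * math.sqrt(area) / n        # approximate 1% cutoff
    rng = random.Random(seed)
    sims = [l_max([(rng.random(), rng.random()) for _ in range(n)], ts) for _ in range(s)]
    p_mc = (1 + sum(v >= observed for v in sims)) / (s + 1)
    return observed, cut5, cut1, p_mc


ts = [0.01 * k for k in range(1, 11)]
rng = random.Random(5)
clustered = []
for _ in range(20):
    px, py = rng.random(), rng.random()
    clustered += [((px + rng.gauss(0, 0.01)) % 1, (py + rng.gauss(0, 0.01)) % 1)
                  for _ in range(5)]
result = ripley_test(clustered, ts)
print(result)
```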


Testing for Randomness - Ripley’s K

[Figure: K-function simulation envelopes in two panels, “Ripley's K Japanese Pine” (r in units of 5.7 metres) and “Ripley's K Longleaf” (r in metres), each showing Kobs(r), Ktheo(r), Khi(r) and Klo(r) against r.]

EXAMPLE 1: Japanese pines. plot(envelope(datj, Kest)). The observed curve lies inside the envelope. Do not reject H0.

EXAMPLE 2: Longleaf data. plot(envelope(datl, Kest)). The observed curve lies above the envelope, which suggests clustering. Reject H0: CSR.


Testing for Randomness - Ripley’s L

[Figure: L-function simulation envelopes in two panels, “Ripley's L Japanese Pine” (r in units of 5.7 metres) and “Ripley's L Longleaf” (r in metres), each showing Lobs(r), Ltheo(r), Lhi(r) and Llo(r) against r.]

EXAMPLE 1: Japanese pines. plot(envelope(datj, Lest)). The observed curve lies inside the envelope. Do not reject H0.

EXAMPLE 2: Longleaf data. plot(envelope(datl, Lest)). The observed curve lies above the envelope, which suggests clustering. Reject H0: CSR.

Note that spatstat’s Lest plots $L(r) = \sqrt{K(r)/\pi}$ rather than $r - \sqrt{K(r)/\pi}$, so with this convention clustering shows up above the envelope.


Testing for Randomness

Comparisons of Tests

Some tests for CSR are more powerful than others against specific alternatives. How powerful a test is against a specific alternative is affected by whether the test statistic is primarily a function of “local” or “global” characteristics.
  * Tests based on distances to the nearest event emphasize local characteristics and thus do well against aggregation and regularity, but not against intensity trends (heterogeneity). The Clark-Evans test and Diggle’s refined NN analysis are examples.
  * Quadrat count-based tests emphasize global characteristics and are thus more powerful against large-scale heterogeneity but weaker against aggregation and regularity.
  * Tests that combine distance measurement with quadrat counting, like Ripley’s K-function, give some weight to both local and global characteristics, and thus might be regarded as good all-purpose tests.


References
  * Baddeley and Turner, Modelling Spatial Point Patterns in R, 2005. http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-28311-1
  * Fuentes, M. ST810 Lecture notes, 2000 & 2007.
  * Gelfand et al., Handbook of Spatial Statistics, 2010.
  * Reich, B. ST733 Lecture notes, 2012.
  * spatstat and splancs R packages.

R code to generate point processes can be found on Prof. Reich’s webpage: http://www4.stat.ncsu.edu/~reich/st733/PPdatasets.R, and on Prof. Fuentes’ webpage: http://www.stat.ncsu.edu/people/fuentes/courses/stwarwick/lab/lab07.html

Upcoming: More SPP! Sparse data, Intensity Estimation, Multivariate
