Modeling Forest Fire Occurrences in Riau Province, Indonesia using Data Mining Method

Author: Steven Bryant

Imas Sukaesih Sitanggang
Lecturer at Computer Science Department, Bogor Agricultural University
PhD student at Universiti Putra Malaysia

Supervisors:
Dr. Razali Yaakob
Assoc. Prof. Dr. Norwati Mustapha
Assoc. Prof. Dr. Ahmad Ainuddin B Nuruddin

Presented at the SEARCA Agriculture & Development Seminar Series (ADSS), Los Baños, Philippines, 17 April 2012


Introduction
• In the last 25 years, Riau province has lost more than 65% of its forest (about 4 million hectares (ha)); forest cover decreased from 78% in 1982 to 27% in 2007 (Uryu et al., 2008).
• Riau had about 4.044 million hectares (56.19%) of peatland in 2002 (Wahyunto and Suryadiputra, 2008).
• Most of the deforestation in Riau occurred on peatland.
• Between 1997 and 2007, Riau recorded (Uryu et al., 2008):
– more than 72,000 active fires (hotspots)
– an estimated total emission of 3.66 Gt CO2


Introduction
• Why peatland?
– Important for biodiversity conservation
– Provides important support for human welfare
– Holds a large store of carbon
– Reduces flooding risk
• When fires occur in peatland, their effects are more dangerous because the fires produce not only CO2 emissions but also smoke haze.
• Smoke haze from peatland fires affects city traffic, sea transportation, flights and human health, and causes other economic losses (Herawati et al., 2006).
• An early warning system plays an important role in minimizing the damage due to forest fires, which calls for a wildfire risk model.


Related works in Modeling Wildfire Risks

Reference / Location | Method / Data
Darmawan et al., 2000; East Kalimantan, Indonesia | Geographical Information System (GIS) & remote sensing (RS). Data: vegetation fuel type derived from land use/cover map, terrain, road, and bare soil.
Boonyanuphap, 2001; Sasamba, East Kalimantan, Indonesia | GIS & Complete Mapping Analysis (CMA). Data: physical-environmental & human activity factors.
Hadi, 2006; Bengkalis, Riau, Indonesia | GIS & CMA for peat swamp. Data: environmental & infrastructure aspects.
Setiawan, 2007; Pekan, Pahang, Malaysia | Analytical Hierarchy Process (AHP) & GIS for peat swamp. Data: fuel type, road proximity, elevation, slope and aspect.
Danan, 2008; West Kutai District, East Kalimantan, Indonesia | GIS & RS, binary logistic regression, AHP. Data: weather data, land use, road, river, peatland depth.
Razali, 2010; Kuantan, Pahang, Malaysia | GIS & RS for peat swamp. Data: fuel types, roads and canal.


Methods in Wildfire Risks Model
• Forest fire modeling using non-data mining methods:
– uses weights and criteria for variables that involve subjective and qualitative judgment
– is based on expert knowledge or the previous experience of the developers, which may result in overly subjective models
– is mostly applied to small problems containing few criteria
• Forest fire modeling involves many human and natural factors (geographic, weather, social and economic data).
• With subjective and qualitative methods, a forest fire risk model is not easy to develop from large spatial data.
• This leads to the application of data mining methods to forest fire data.


What is a hotspot?
• A pixel in a digital satellite image that has a temperature higher than a particular threshold value
• Threshold: 315 - 330 K (daytime capture), 303 - 320 K (night-time capture)
• Each detected fire represents the centre of an (approximately) 1 km pixel that contains one or more fire hotspots

http://www.weather.gov.sg/wip/web/ASMC
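The thresholding idea behind a hotspot can be sketched in a few lines of Python. This is a minimal illustration, not the detection algorithm used by any satellite product: the slide gives threshold ranges rather than a single cut-off, so the two constants below (the lower bound of each range) are assumptions, and the function names `is_hotspot` and `find_hotspots` are hypothetical.

```python
# Illustrative only: the slide quotes threshold ranges, not a single cut-off.
DAY_THRESHOLD_K = 315.0    # assumed: lower bound of the daytime range (315-330 K)
NIGHT_THRESHOLD_K = 303.0  # assumed: lower bound of the night-time range (303-320 K)


def is_hotspot(brightness_temp_k, daytime):
    """Return True if a pixel's temperature exceeds the chosen threshold."""
    threshold = DAY_THRESHOLD_K if daytime else NIGHT_THRESHOLD_K
    return brightness_temp_k > threshold


def find_hotspots(pixels, daytime):
    """Return (row, col) positions of pixels flagged as hotspots.

    `pixels` is a 2-D list of temperatures in kelvin, one value per pixel.
    """
    return [
        (r, c)
        for r, row in enumerate(pixels)
        for c, temp in enumerate(row)
        if is_hotspot(temp, daytime)
    ]
```

For example, `find_hotspots([[300.0, 316.0], [330.0, 290.0]], daytime=True)` flags the two pixels above 315 K.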


What is data mining?
• Extraction of interesting (nontrivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data (Han & Kamber, 2006).

[Figure: data mining drawing on database technology, algorithms, pattern recognition, statistics and other disciplines. Photo credit: http://courses.essex.ac.uk/ce/ce802/]


Data Mining Tasks
Association rules, classification and prediction, clustering, outlier analysis, …
One of the methods: Decision Tree Algorithm

Data mining in forest fires:

Reference | Method / Data | Results
Stojanova et al., 2006 | Logistic regression & decision tree algorithms. Data: meteorological ALADIN data & MODIS satellite data | Predictive models of fire occurrence in Slovenia
Prasad and Ramakrishna, 2008 | Clustering (K-Means), fuzzy logic. Data: digital satellite images | Fuzzy rule base for detection of forest fires
Angayarkkani and Radhakrishnan, 2009 | Spatial data mining, image processing & artificial intelligence techniques. Data: digital satellite images | Fuzzy rules for fire detection


Objectives
1. Develop a spatial decision tree algorithm, based on the ID3 algorithm, that constructs trees from spatial data.
2. Apply the spatial decision tree algorithm to historical forest fire data for Rokan Hilir district, Riau province, Indonesia to develop a model for predicting hotspot occurrences.
3. Compare the accuracy of the classification model built by the spatial decision tree algorithm with that of models constructed by non-spatial decision tree algorithms and by logistic regression.

Output
1. A new spatial decision tree algorithm
2. Hotspot occurrence models


Study area
• Rokan Hilir District, Riau Province, Indonesia; the total area is 896,142.93 ha (about 10% of the total area of Riau Province).
• It is situated between 100° 17' - 101° 21' East Longitude and 1° 14' - 2° 45' North Latitude.

[Figure: maps locating Riau Province within Indonesia and Rokan Hilir District within Riau Province]


Spatial and non-spatial data

No | Data | Source
1 | Spread and coordinates of hotspots, for the year 2008 | FIRMS MODIS Fire/Hotspot, NASA/University of Maryland
2 | Weather data: maximum daily temperature, daily rainfall, and wind speed, for the period 2005-2009 | Meteorological Climatological and Geophysical Agency (BMKG)
3 | Digital maps for vegetation/types of forest, road, rivers, village, administrative border, land cover, and water area | National Coordinating Agency for Survey and Mapping (BAKOSURTANAL)
4 | Digital maps for peatland depth and peatland types | Wetland International
5 | Social and economic data from different regions of Riau | BPS-Statistics Indonesia
6 | Landsat TM, resolution: 30 m x 30 m | U.S. Geological Survey


Open source tools
• Quantum GIS 1.7.2 for spatial data analysis and visualization (http://www.qgis.org)
• PostgreSQL 9.1 for the spatial database management system (http://www.postgresql.org)
• PostGIS 1.5 for spatial data analysis (http://www.postgis.org)
• Python 2.7.2 for programming (http://www.python.org/)
• R for statistical computing (http://www.r-project.org/)
• Ilwis 3.7 for burn area processing (http://www.ilwis.org)
• Weka 3.6.6 for non-spatial data mining (http://www.cs.waikato.ac.nz/ml/weka/)


Research Methodology
1. Forest Fires Data Preprocessing
a) Burn Image Processing
b) Physical Data Processing
c) Social and Economic Data Processing
d) Spatial Interpolation for Weather Data
2. Develop an Extended ID3 Decision Tree Algorithm for Spatial Data *)
3. Apply the Extended ID3 Algorithm to Forest Fires Data (Rokan Hilir District, Riau Province)
4. Compare hotspot occurrence models
5. Calculate the Keetch-Byram Drought Index (KBDI)

*) Discussed in the paper “An Extended ID3 Decision Tree Algorithm for Spatial Data”, published at the First IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM 2011), June 29 - July 1, 2011, Fuzhou, China


Burn area processing

[Figure: Landsat TM image, band combination 7, 4, 2 *), acquisition date: 2006-07-24]
[Figure: the image after clustering on subset images and 4 steps of majority filter]

Purpose: to generate random points as false alarms near hotspots (true alarms)

*) Source: USGS (http://edcsns17.cr.usgs.gov/NewEarthExplorer/)


Radius of buffer from hotspot: 0.907374 km

[Figure: map of true alarms and false alarms around the hotspot buffers]

True alarm: hotspot
False alarm: randomly generated near hotspot


Hotspots as target objects

[Figure: hotspots (true alarms) with 0.907374 km buffers; true alarms lie inside the buffers, false alarms outside them]

Target objects:
• True alarm data (positive examples): hotspots in 2008
• False alarm data (negative examples): randomly generated points located at least 0.907374 km away from any true alarm data
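The generation of false alarm points can be sketched with simple rejection sampling. This is only an illustration under stated assumptions: coordinates are taken to be planar and expressed in km, candidate points are drawn uniformly from a bounding box, and the function name `generate_false_alarms` and its parameters are hypothetical. The slides do not describe how the random points were actually drawn.

```python
import math
import random

BUFFER_KM = 0.907374  # buffer radius quoted on the slide


def generate_false_alarms(hotspots, bbox, count, buffer_km=BUFFER_KM,
                          seed=0, max_tries=100000):
    """Draw up to `count` random points at least `buffer_km` from every hotspot.

    hotspots: list of (x, y) tuples in km (projected coordinates assumed).
    bbox: (min_x, min_y, max_x, max_y) in km, the area to sample from.
    """
    rng = random.Random(seed)
    min_x, min_y, max_x, max_y = bbox
    points = []
    tries = 0
    while len(points) < count and tries < max_tries:
        tries += 1
        candidate = (rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        # Reject candidates that fall inside the buffer of any true alarm.
        if all(math.dist(candidate, h) >= buffer_km for h in hotspots):
            points.append(candidate)
    return points
```

In a spatial database the same filter would normally be expressed with a buffer or distance predicate; the pure-Python version is used here only to keep the example self-contained.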


Data preprocessing
Calculate distance to nearest river, road, city center

[Figure: a target object with nearby road segments and river segments]

Operations applied:
ST_Distance(target.the_geom, river.the_geom)
ST_Distance(target.the_geom, road.the_geom)
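PostGIS `ST_Distance` returns the minimum distance between two geometries. To show what that means for a target point and a set of road or river segments, here is a minimal pure-Python sketch. It assumes planar coordinates, and the helper names `point_segment_distance` and `nearest_distance` are hypothetical; it is not the PostGIS implementation.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p on the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_distance(target, segments):
    """Smallest distance from a target point to any segment in the list."""
    return min(point_segment_distance(target, a, b) for a, b in segments)
```

Calling `nearest_distance` once per target object against the river segments and once against the road segments yields the two distance attributes described above.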


Problems in Data Preprocessing
• Spatial operations on polygon features can result in non-polygon features.
• Computations on valid geometries can result in invalid geometries.

[Figure: three examples of invalid geometry]


Problems in Data Preprocessing (continued)

[Figure: example polygons illustrating the following cases]
• No target object in a small polygon
• No target object in a polygon
• Invalid part in the yellow polygon
• Some target objects outside the polygon


Weather data interpolation
Weather variables for the year 2008 (in NetCDF format):

Field name | Unit | Role
Precipitation | mm/day | Primary variable
Screen temperature | K | Primary variable
10m wind speed | m/s | Primary variable
Surface height | m | Secondary variable

Tool used: ArcMap 9.3
Method: Cokriging

[Figure: interpolated maps of screen temperature, precipitation and 10m wind speed]


Decision Tree Algorithm
• Constructs decision trees from a dataset
• Examples: Quinlan’s ID3, C4.5 (a successor of ID3) and CART (Classification and Regression Tree)

A decision tree contains three types of nodes:
1. a root node,
2. internal (non-leaf) nodes; the root node and each internal node contain attribute test conditions that separate records with different characteristics,
3. leaf or terminal nodes; each leaf node is assigned a class label.
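ID3 chooses the attribute test at each node by information gain, that is, the reduction in class entropy obtained by splitting on an attribute. The sketch below shows that criterion for the standard, non-spatial ID3 only; it does not reproduce the spatial extension presented in this talk. The function names are hypothetical, and the records in the usage note are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy reduction obtained by splitting `records` on `attribute`."""
    labels = [r[target] for r in records]
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(records) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(records, attributes, target):
    """Attribute with the highest information gain (the ID3 split choice)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))
```

With four toy records in which `land_cover` separates the classes perfectly and `dist_road` does not, `best_attribute` selects `land_cover`; the attribute chosen at the top of the tree becomes the test in the root node.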

Decision Tree

• The task of classification aims to discover classification rules that determine the class label of any object (Y) from the values of its attributes (X).
• A decision tree is a model expressing classification rules.

[Figure: diagram linking the spatial dataset (explanatory attributes X: land cover, road, river, etc.; target attribute Y: hotspot), the spatial join index, the improved ID3 and the resulting spatial decision tree]


Spatial Decision Tree

[Figure: explanatory layers (river, road, land cover, income source, peatland depth, temperature) and the target layer (hotspot, false alarm), related by topological & metric relationships, feed the spatial decision tree algorithm, which outputs a spatial decision tree]

Layers and distinct values in spatial dataset

Group | Layer | Number of features | Distinct values
Physical | Distance to nearest river (dist_river), 'l1' | 1030 points | 3 (low, medium, high)
Physical | Distance to nearest road (dist_road), 'l2' | 1030 points | 3 (low, medium, high)
Physical | Distance to nearest city center (dist_city), 'l0' | 1030 points | 3 (low, medium, high)
Physical | Land cover (land_cover), 'l4' | 3058 polygons | 12 (Dryland_forest, plantation, Water_body and so on)
Social-economy | Income source (income_source), 'l3' | 117 polygons | 7 (Forestry, Agriculture, Trading_restaurant and so on)
Weather | Precipitation in mm/day (precipitation), 'l6' | 7 polygons | 2, 3
Weather | Screen temperature in K (screen_temp), 'l8' | 7 polygons | 297, 298, 299
Weather | 10m wind speed in m/s (wind_speed), 'l9' | 7 polygons | 0, 1, 2
Peatland | Peatland type (peatland_type), 'l5' | 58 polygons | Hemists/Saprists, and so on
Peatland | Peatland depth (peatland_depth), 'l7' | 68 polygons | D1 (Shallow/Thin 50-100 cm), D2 (Moderate 100-200 cm), D3 (Deep/Thick 200-400 cm), D4 (Very deep/Very thick > 400 cm)
Target | target | 1030 points |


Applying Spatial ID3 Algorithm
• Result: a spatial decision tree whose first test attribute is income source.

Accuracy of spatial decision tree

Dataset used to calculate accuracy | Accuracy
Training data | 76.51%
Testing data | 71.12%, without pruning
Testing data | 71.66%, after pruning (4 iterations)

Size of tree and number of rules

Spatial decision tree | Number of rules generated | Size of tree
Without pruning | 134 | 613
After pruning (4 iterations) | 122 | 553


Applying Spatial ID3 Algorithm
Spatial vs non-spatial algorithms

Algorithm | Accuracy
Spatial: Extended ID3 Spatial Decision Tree (the proposed algorithm) | 71.66%
Non-spatial (available in Weka 3.6.6): ID3 Decision Tree | 49.02%
Non-spatial (available in Weka 3.6.6): J48 Decision Tree (with pruned tree) *) | 65.24%

*) J48 is a Java implementation of the C4.5 Decision Tree Algorithm


Pruning tree
• Pruning is used to overcome overfitting when creating a decision tree.
• Error on testing data increases because leaves in large trees reflect noise or outliers.
• The method used: postpruning
– The tree is fully grown first, and then the subtree at a given node is pruned by removing its branches and replacing it with a leaf.

[Figure: an unpruned tree and its pruned version; test nodes L1, L2, L3, L5, L6, L7 with branch values v11-v72 and leaves labeled Yes/No]
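Postpruning can be sketched as follows. The slides do not state which criterion decides whether a subtree is replaced, so this example assumes reduced-error pruning: a subtree becomes a leaf when that does not lower accuracy on a held-out set of records. The tree representation (nested dicts with `attribute`, `branches` and a `default` majority label) and the function names are hypothetical.

```python
def classify(node, record):
    """Follow attribute tests until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        value = record[node["attribute"]]
        node = node["branches"].get(value, node["default"])
    return node


def prune(node, records, target):
    """Return the tree with subtrees replaced by leaves where that is safe.

    Assumed criterion (reduced-error pruning): replace a subtree by its
    majority label `default` if that classifies the held-out `records`
    reaching the node at least as well as the subtree does.
    """
    if not isinstance(node, dict) or not records:
        return node  # a leaf, or no held-out evidence: leave unchanged
    # Prune the children first, each with the records routed to its branch.
    for value, child in list(node["branches"].items()):
        subset = [r for r in records if r[node["attribute"]] == value]
        node["branches"][value] = prune(child, subset, target)
    leaf = node["default"]
    leaf_hits = sum(r[target] == leaf for r in records)
    tree_hits = sum(classify(node, r) == r[target] for r in records)
    return leaf if leaf_hits >= tree_hits else node
```

Note that `prune` modifies the branches of the tree it is given, so a copy should be kept if the unpruned tree is still needed for comparison.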


Spatial Decision Tree Subtree

[Figure: subtree of the spatial decision tree, annotated with the following rules]
Rule 4: IF income_source = Trading_restaurant THEN Hotspot Occurrence = F
Rule 2: IF income_source = Forestry AND land_cover = Bare_land AND 1 ≤ wind_speed (m/s) < 2 THEN Hotspot Occurrence = T


Sample rules
1. IF income_source = Forestry AND land_cover = Bare_land AND 0 ≤ wind_speed (m/s) < 1 AND 297 ≤ screen_temp (K) < 298 AND peatland_depth = D4 (Very deep/Very thick > 400 cm) THEN Hotspot Occurrence = F
2. IF income_source = Forestry AND land_cover = Bare_land AND 1 ≤ wind_speed (m/s) < 2 THEN Hotspot Occurrence = T
3. IF income_source = Forestry AND land_cover = Paddy_field AND 0 ≤ wind_speed (m/s) < 1 THEN Hotspot Occurrence = F
4. IF income_source = Trading_restaurant THEN Hotspot Occurrence = F
5. IF income_source = Plantation AND dist_road (m)
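Rules of this form can be applied to a record mechanically. The sketch below encodes sample rules 1 to 4 as condition tables and returns the outcome of the first rule that matches. Rule 5 is incomplete in the source and is therefore left out. The helper names (`in_range`, `equals`, `predict`) and the record layout are hypothetical and are not taken from the author's implementation.

```python
def in_range(low, high):
    """Build a test for low <= value < high."""
    return lambda value: low <= value < high


def equals(expected):
    """Build a test for value == expected."""
    return lambda value: value == expected


# Sample rules 1-4; each entry is (conditions, hotspot occurrence).
RULES = [
    ({"income_source": equals("Forestry"),
      "land_cover": equals("Bare_land"),
      "wind_speed": in_range(0, 1),
      "screen_temp": in_range(297, 298),
      "peatland_depth": equals("D4")}, "F"),
    ({"income_source": equals("Forestry"),
      "land_cover": equals("Bare_land"),
      "wind_speed": in_range(1, 2)}, "T"),
    ({"income_source": equals("Forestry"),
      "land_cover": equals("Paddy_field"),
      "wind_speed": in_range(0, 1)}, "F"),
    ({"income_source": equals("Trading_restaurant")}, "F"),
]


def predict(record):
    """Return the outcome of the first matching rule, or None if none match."""
    for conditions, outcome in RULES:
        if all(attr in record and test(record[attr])
               for attr, test in conditions.items()):
            return outcome
    return None
```

A record with `income_source = "Forestry"`, `land_cover = "Bare_land"` and `wind_speed = 1.5` matches rule 2 and is predicted as a hotspot occurrence (`"T"`); a record that matches none of the four encoded rules returns `None`.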
