4/18/2012
Photo credit: http://www.gowallpaper.net/
Modeling Forest Fire Occurrences in Riau Province, Indonesia using Data Mining Method
Imas Sukaesih Sitanggang
Lecturer at Computer Science Department, Bogor Agricultural University
PhD student at Universiti Putra Malaysia
Supervisors: Dr. Razali Yaakob, Assoc. Prof. Dr. Norwati Mustapha, Assoc. Prof. Dr. Ahmad Ainuddin B Nuruddin
Presented at the SEARCA Agriculture & Development Seminar Series (ADSS), Los Baños, Philippines, 17 April 2012
Introduction
• In the last 25 years, Riau Province has lost more than 65% of its forest (about 4 million hectares (ha)): forest cover decreased from 78% in 1982 to 27% in 2007 (Uryu et al., 2008).
• Riau had about 4.044 million hectares (56.19%) of peatland in 2002 (Wahyunto and Suryadiputra, 2008).
• Most of the deforestation in Riau occurred on peatland.
• Between 1997 and 2007 in Riau (Uryu et al., 2008):
– more than 72,000 active fires (hotspots)
– estimated total emissions reached 3.66 Gt CO2
Photo credit: www.antarafoto.com
Introduction
• Why peatland?
– Important for biodiversity conservation
– Provides important support for human welfare
– Holds a large store of carbon
– Reduces flooding risk
• When fires occur on peatland, their effects are more severe because they produce not only CO2 emissions but also smoke haze.
• Smoke haze from peatland fires affects city traffic, sea transportation and flights, and human health, and causes other economic losses (Herawati et al., 2006).
• An early warning system plays an important role in minimizing the damage caused by forest fires.
[Figure: Wildfire Risks Model]
Related works in Modeling Wildfire Risks
Reference | Location | Method / Data
Darmawan et al., 2000 | East Kalimantan, Indonesia | Geographical Information System (GIS) & remote sensing (RS). Data: vegetation fuel type derived from land use/cover map, terrain, road, and bare soil.
Boonyanuphap, 2001 | Sasamba, East Kalimantan, Indonesia | GIS & Complete Mapping Analysis (CMA). Data: physical-environmental & human activity factors.
Hadi, 2006 | Bengkalis, Riau, Indonesia | GIS & CMA for peat swamp. Data: environmental & infrastructure aspects.
Setiawan, 2007 | Pekan, Pahang, Malaysia | Analytical Hierarchy Process (AHP) & GIS for peat swamp. Data: fuel type, road proximity, elevation, slope, and aspect.
Danan, 2008 | West Kutai District, East Kalimantan, Indonesia | GIS & RS, binary logistic regression, AHP. Data: weather data, land use, road, river, peatland depth.
Razali, 2010 | Kuantan, Pahang, Malaysia | GIS & RS for peat swamp. Data: fuel types, roads, and canals.
Methods in Wildfire Risks Model
• Forest fire modeling using non-data mining methods:
– uses weights and criteria for variables that involve subjective and qualitative judgment
– is based on expert knowledge or the previous experience of the developers, which may result in overly subjective models
– is mostly applied to evaluate small problems containing few criteria
• Forest fire modeling involves many human and natural factors (geographic, weather, social, and economic data).
• With subjective and qualitative methods, it is not easy to develop a forest fire risk model from large spatial data.
• This leads to the application of data mining methods to forest fire data.
What is a hotspot?
• A pixel in a digital satellite image whose temperature is higher than a particular threshold value
• Threshold: 315-330 K (daytime capture), 303-320 K (nighttime capture)
• Each detected fire represents the centre of an (approximately) 1 km pixel that contains one or more fires
http://www.weather.gov.sg/wip/web/ASMC
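The threshold rule above can be sketched in Python. This is a simplified illustration that assumes a grid of brightness temperatures in kelvin and one fixed threshold chosen from the day or night range. The function names and sample image are hypothetical, and operational MODIS fire detection uses a more elaborate contextual algorithm.

```python
# Threshold ranges from the slide (kelvin).
DAY_RANGE_K = (315.0, 330.0)
NIGHT_RANGE_K = (303.0, 320.0)


def validate_threshold(threshold_k, daytime):
    """Check that a user-chosen threshold lies inside the day or night range."""
    low, high = DAY_RANGE_K if daytime else NIGHT_RANGE_K
    if not low <= threshold_k <= high:
        raise ValueError(f"threshold {threshold_k} K outside {low}-{high} K")
    return threshold_k


def hotspot_pixels(grid, threshold_k):
    """Return (row, col) indices of pixels hotter than threshold_k."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, temp in enumerate(row)
        if temp > threshold_k
    ]


# Hypothetical 2 x 2 image of brightness temperatures (K).
image = [[300.0, 318.0],
         [325.0, 310.0]]
day_hits = hotspot_pixels(image, validate_threshold(315.0, daytime=True))
night_hits = hotspot_pixels(image, validate_threshold(305.0, daytime=False))
```

The same image yields more hotspot pixels at night because the night threshold range is lower.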
What is data mining?
• Extraction of interesting (nontrivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data (Han & Kamber, 2006).
[Figure: data mining at the intersection of database technology, algorithms, pattern recognition, statistics, and other disciplines]
Photo credit: http://courses.essex.ac.uk/ce/ce802/
Data Mining Tasks
Association rules, classification and prediction, clustering, outlier analysis, …
One of the methods: Decision Tree Algorithm
Data mining in forest fires:
Reference | Method / Data | Results
Stojanova et al., 2006 | Logistic regression & decision tree algorithms. Data: meteorological ALADIN data & MODIS satellite data | Predictive models of fire occurrence in Slovenia
Prasad and Ramakrishna, 2008 | Clustering (K-Means), fuzzy logic. Data: digital satellite images | Fuzzy rule base for detection of forest fires
Angayarkkani and Radhakrishnan, 2009 | Spatial data mining, image processing & artificial intelligence techniques. Data: digital satellite images | Fuzzy rules for fire detection
Objectives
1. Develop a spatial decision tree algorithm, based on the ID3 algorithm, that constructs trees from spatial data.
2. Apply the spatial decision tree algorithm to historical forest fire data for Rokan Hilir District, Riau Province, Indonesia to develop a model for predicting hotspot occurrences.
3. Compare, in terms of accuracy, the classification model built by the spatial decision tree algorithm with those built by non-spatial decision tree algorithms and by logistic regression.
Output
1. A new spatial decision tree algorithm
2. Hotspot occurrence models
Study area
• Rokan Hilir District, Riau Province, Indonesia; its total area is 896,142.93 ha (about 10% of the total area of Riau Province).
• It lies between 100° 17' and 101° 21' East Longitude and between 1° 14' and 2° 45' North Latitude.
[Map: location of Rokan Hilir District in Riau Province, Indonesia]
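A quick first filter for hotspots is a bounding-box test against the coordinates above. The sketch below assumes coordinates in decimal degrees of longitude and latitude. The names and sample points are hypothetical. A box only approximates the district, so a real workflow would still clip against the district boundary polygon.

```python
def dms_to_degrees(degrees, minutes, seconds=0.0):
    """Convert degrees-minutes-seconds to decimal degrees (east/north only)."""
    return degrees + minutes / 60.0 + seconds / 3600.0


# Bounding box of Rokan Hilir District from the slide.
LON_MIN, LON_MAX = dms_to_degrees(100, 17), dms_to_degrees(101, 21)
LAT_MIN, LAT_MAX = dms_to_degrees(1, 14), dms_to_degrees(2, 45)


def in_study_bbox(lon, lat):
    """True if a (lon, lat) point in decimal degrees falls inside the box."""
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX


# Hypothetical hotspot coordinates (lon, lat).
candidates = [(100.8, 2.0), (101.5, 2.0), (100.5, 1.1)]
inside = [p for p in candidates if in_study_bbox(*p)]
```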
Spatial and non-spatial data
No | Data | Source
1 | Distribution and coordinates of hotspots for the year 2008 | FIRMS MODIS Fire/Hotspot, NASA/University of Maryland
2 | Weather data: maximum daily temperature, daily rainfall, and wind speed for the period 2005-2009 | Meteorological, Climatological and Geophysical Agency (BMKG)
3 | Digital maps of vegetation/forest types, roads, rivers, villages, administrative borders, land cover, and water areas | National Coordinating Agency for Survey and Mapping (BAKOSURTANAL)
4 | Digital maps of peatland depth and peatland types | Wetlands International
5 | Social and economic data from different regions of Riau | BPS-Statistics Indonesia
6 | Landsat TM, resolution 30 m x 30 m | U.S. Geological Survey
Open source tools
• Quantum GIS 1.7.2 for spatial data analysis and visualization (http://www.qgis.org)
• PostgreSQL 9.1 for the spatial database management system (http://www.postgresql.org)
• PostGIS 1.5 for spatial data analysis (http://www.postgis.org)
• Python 2.7.2 for programming (http://www.python.org/)
• R for statistical computing (http://www.r-project.org/)
• Ilwis 3.7 for burn area processing (http://www.ilwis.org)
• Weka 3.6.6 for non-spatial data mining (http://www.cs.waikato.ac.nz/ml/weka/)
Research Methodology
1. Forest Fire Data Preprocessing
a) Burn Image Processing
b) Physical Data Processing
c) Social and Economic Data Processing
d) Spatial Interpolation for Weather Data
2. Develop an Extended ID3 Decision Tree Algorithm for Spatial Data *)
3. Apply the Extended ID3 Algorithm to Forest Fire Data (Rokan Hilir District, Riau Province)
4. Compare Hotspot Occurrence Models
5. Calculate the Keetch-Byram Drought Index (KBDI)
*) Discussed in the paper "An Extended ID3 Decision Tree Algorithm for Spatial Data", published in the First IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM 2011), June 29 - July 1, 2011, Fuzhou, China
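Step 5 names the Keetch-Byram Drought Index, which accumulates a daily drought increment from maximum temperature and is reduced by rain. This sketch follows the commonly cited Keetch-Byram (1968) drought-factor formula in its original imperial units: KBDI in hundredths of an inch on a 0-800 scale, maximum temperature in °F, and mean annual rainfall in inches. Two simplifications are assumptions and not necessarily what this study did. First, net rainfall is taken as the day's rain above 0.20 inch, without tracking consecutive rainy days. Second, rain is subtracted before the drought increment is added. The function name and station values are hypothetical.

```python
import math


def kbdi_step(q_prev, tmax_c, rain_mm, annual_rain_mm):
    """One daily KBDI update (simplified sketch).

    q_prev: previous index (hundredths of an inch, 0-800)
    tmax_c: maximum temperature of the day (deg C)
    rain_mm: rainfall of the day (mm)
    annual_rain_mm: mean annual rainfall at the station (mm)
    """
    tmax_f = tmax_c * 9.0 / 5.0 + 32.0
    rain_in = rain_mm / 25.4
    annual_in = annual_rain_mm / 25.4

    # Simplified net rainfall: only rain above the first 0.20 inch counts.
    net_rain_in = max(0.0, rain_in - 0.20)
    q = max(0.0, q_prev - 100.0 * net_rain_in)

    # Drought factor:
    # (800 - Q)(0.968 e^(0.0486 T) - 8.30) / (1 + 10.88 e^(-0.0441 R)) x 1e-3
    drought_factor = (
        (800.0 - q) * (0.968 * math.exp(0.0486 * tmax_f) - 8.30)
        / (1.0 + 10.88 * math.exp(-0.0441 * annual_in))
        * 1e-3
    )
    q += max(0.0, drought_factor)  # cool days never raise or lower the index
    return min(800.0, q)


# Hypothetical station with 2,500 mm mean annual rainfall: two dry days, then rain.
day1 = kbdi_step(0.0, 33.0, 0.0, 2500.0)
day2 = kbdi_step(day1, 34.0, 0.0, 2500.0)
day3 = kbdi_step(day2, 32.0, 40.0, 2500.0)
```

The factor (800 - Q) makes the daily increment shrink as the index approaches its upper limit.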
Burn area processing
• Landsat TM, band combination 7, 4, 2*; acquisition date: 2006-07-24
• Result after clustering on subset images and 4 passes of a majority filter
• Purpose: to generate random points near hotspots (true alarms) to serve as false alarms
* Source: USGS (http://edcsns17.cr.usgs.gov/NewEarthExplorer/)
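A majority filter replaces each cell of a classified image with the most frequent class in its neighborhood. This removes isolated noisy pixels left after clustering. The sketch assumes a 3 x 3 window, a classified image stored as a list of lists of class codes, and a rule that a cell keeps its class on ties. The window size and tie rule used in the actual processing are not stated here, and the names are hypothetical.

```python
from collections import Counter


def majority_filter(grid, passes=1):
    """Apply a 3 x 3 majority filter `passes` times to a 2-D list of class codes."""
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]  # never mutate the input
    for _ in range(passes):
        result = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                # Neighbors inside the image (edges use a truncated window).
                window = [
                    current[i][j]
                    for i in range(max(0, r - 1), min(rows, r + 2))
                    for j in range(max(0, c - 1), min(cols, c + 2))
                ]
                counts = Counter(window)
                top_class, top_count = counts.most_common(1)[0]
                # Change the cell only if another class strictly outnumbers its own.
                if top_count > counts[current[r][c]]:
                    result[r][c] = top_class
        current = result
    return current


# Hypothetical clustered image: one isolated "0" pixel inside class 1.
clustered = [[1, 1, 1],
             [1, 0, 1],
             [1, 1, 1]]
cleaned = majority_filter(clustered, passes=4)
```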
Radius of buffer around each hotspot: 0.907374 km
• True alarm: hotspot
• False alarm: randomly generated point near a hotspot
Hotspots as target objects
[Figure: true alarms (hotspots) with 0.907374 km buffers and false alarms outside the buffers]
Target objects:
• True alarm data (positive examples): hotspots in 2008
• False alarm data (negative examples): randomly generated points located within the area, at least 0.907374 km away from any true alarm
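One way to generate false alarms that respect the 0.907374 km rule is rejection sampling. The method draws uniform random points and keeps only those at least that far from every hotspot. This sketch assumes hotspot coordinates already projected to a planar system in kilometres and a rectangular sampling extent. The function name, extent, seed, and hotspot positions are hypothetical.

```python
import math
import random

BUFFER_KM = 0.907374  # minimum distance from any hotspot (from the slide)


def generate_false_alarms(hotspots, n, bbox, min_dist_km=BUFFER_KM,
                          seed=0, max_tries=100_000):
    """Rejection-sample up to n points in bbox at least min_dist_km from every hotspot.

    hotspots: list of (x, y) in projected kilometres
    bbox: (xmin, ymin, xmax, ymax) sampling extent in the same units
    May return fewer than n points if the extent is too crowded.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    xmin, ymin, xmax, ymax = bbox
    points = []
    for _ in range(max_tries):
        if len(points) == n:
            break
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        if all(math.hypot(x - hx, y - hy) >= min_dist_km for hx, hy in hotspots):
            points.append((x, y))
    return points


hotspots = [(5.0, 5.0), (2.0, 8.0)]  # hypothetical hotspot positions (km)
false_alarms = generate_false_alarms(hotspots, n=50, bbox=(0.0, 0.0, 10.0, 10.0), seed=42)
```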
Data preprocessing
Calculate the distance to the nearest river, road, and city center.
[Figure: distance from a target object to the nearest road segment and the nearest river segment]
Operations applied: ST_Distance(target.the_geom, river.the_geom) and ST_Distance(target.the_geom, road.the_geom)
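For geometry types, PostGIS ST_Distance returns the minimum planar distance between two geometries in the units of their spatial reference system. The pure-Python sketch below reproduces that idea for one point and a set of polylines, which is what "distance to nearest river/road" requires. It assumes projected coordinates, and the function names and sample roads are hypothetical. In PostGIS itself, the nearest value would typically come from taking the MIN of ST_Distance over all road or river rows for each target.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Project p onto the infinite line, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_line_distance(p, lines):
    """Minimum distance from p to any segment of any polyline in lines."""
    return min(
        point_segment_distance(p, line[i], line[i + 1])
        for line in lines
        for i in range(len(line) - 1)
    )


# Hypothetical road network: two parallel polylines.
roads = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 5.0), (10.0, 5.0)]]
dist_road = nearest_line_distance((0.0, 3.0), roads)
```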
Problems in Data Preprocessing
• Spatial operations on polygon features can produce non-polygon features.
• Computations on valid geometries can produce invalid geometries.
[Figure: three examples of invalid geometries]
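A frequent cause of invalid polygons after overlay operations is a self-intersecting ring, often called a "bow-tie". The sketch below detects only that case, using orientation tests on non-adjacent edges. It is not a full OGC validity test such as PostGIS ST_IsValid, which also checks further conditions (for example, that holes lie inside the outer ring). The function names and rings are hypothetical.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    """For collinear c, True if c lies within the bounding box of a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True  # proper crossing
    # Collinear touching cases.
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def ring_self_intersects(ring):
    """ring: list of vertices without a repeated closing vertex."""
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share a vertex by construction
            if segments_intersect(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2)]
```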
Problems in Data Preprocessing (continued)
• No target object in a small polygon
• No target object in a polygon
• Invalid part in the yellow polygon
• Some target objects lie outside the polygon
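Several of the problems listed above can be flagged before mining with a point-in-polygon audit: polygons that contain no target object, and target objects that fall outside every polygon. This sketch assumes simple polygons without holes, given as vertex lists in projected coordinates, and uses the even-odd ray-casting test. The names and data are hypothetical.

```python
def point_in_polygon(pt, ring):
    """Even-odd ray casting: True if pt lies inside the simple polygon ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def audit_coverage(points, polygons):
    """Return (ids of polygons with no points, points outside all polygons)."""
    counts = {pid: 0 for pid in polygons}
    outside = []
    for p in points:
        hit = False
        for pid, ring in polygons.items():
            if point_in_polygon(p, ring):
                counts[pid] += 1
                hit = True
        if not hit:
            outside.append(p)
    empty = [pid for pid, c in counts.items() if c == 0]
    return empty, outside


polygons = {  # hypothetical land-cover polygons
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(10, 10), (11, 10), (11, 11), (10, 11)],
}
targets = [(1, 1), (2, 3), (20, 20)]
empty_polygons, stray_targets = audit_coverage(targets, polygons)
```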
Weather data interpolation
Weather variables (in NetCDF format) for the year 2008:
Field name | Unit
Precipitation | mm/day
Screen temperature | K
10m wind speed | m/s
Surface height | m
Primary variables: screen temperature, precipitation, 10m wind speed
Secondary variable: surface height
Tool used: ArcMap 9.3
Method: Cokriging
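Cokriging requires fitted variograms and cross-variograms, which is beyond a short example. To show the basic idea of estimating a weather value at an unsampled location from nearby stations, the sketch below uses a simpler and different technique, inverse distance weighting (IDW). It is not the cokriging performed in ArcMap 9.3. The power of 2, the function name, and the station values are assumptions.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate at target.

    stations: list of ((x, y), value) in projected coordinates
    Returns the station value directly if target coincides with a station.
    """
    num = den = 0.0
    for (x, y), value in stations:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical screen temperatures (K) at two stations.
stations = [((0.0, 0.0), 298.0), ((2.0, 0.0), 299.0)]
midpoint_temp = idw(stations, (1.0, 0.0))
```

Unlike cokriging, IDW cannot exploit a correlated secondary variable such as surface height.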
Decision Tree Algorithm
• Used to construct decision trees from a dataset
• Examples: Quinlan's ID3, C4.5 (the successor of ID3), and CART (Classification and Regression Tree)
A decision tree contains three types of nodes:
1. a root node;
2. internal (non-leaf) nodes; the root node and each internal node contain attribute test conditions that separate records with different characteristics;
3. leaf (terminal) nodes, each of which is assigned a class label.
Decision Tree
• The classification task aims to discover classification rules that determine the class label (Y) of any object from the values of its attributes (X).
• A decision tree is a model that expresses classification rules.
[Figure: a spatial dataset with explanatory attributes X (land cover, road, river, etc.) and target attribute Y (hotspot) is linked through a spatial join index and processed by the improved ID3 algorithm into a spatial decision tree]
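The entropy-based attribute selection that ID3 performs can be sketched in a few lines. This is the standard non-spatial ID3 on categorical attributes, not the extended spatial algorithm presented here, which works with spatial relationships between layers. The toy rows are hypothetical, and ties in information gain go to the first attribute listed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Return a leaf label or a tuple (attribute, {value: subtree})."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class leaf
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return (best, {
        value: id3([r for r in rows if r[best] == value], rest, target)
        for value in {r[best] for r in rows}
    })


rows = [  # hypothetical training objects
    {"income_source": "Forestry", "land_cover": "Bare_land", "hotspot": "T"},
    {"income_source": "Forestry", "land_cover": "Bare_land", "hotspot": "T"},
    {"income_source": "Trading_restaurant", "land_cover": "Bare_land", "hotspot": "F"},
    {"income_source": "Trading_restaurant", "land_cover": "Paddy_field", "hotspot": "F"},
]
tree = id3(rows, ["income_source", "land_cover"], "hotspot")
```

On these rows income_source separates the classes perfectly (gain 1 bit), so it becomes the root test, mirroring the first test attribute reported for the spatial tree.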
Spatial Decision Tree
[Figure: explanatory layers (river, road, land cover, income source, peatland depth, temperature) and the target layer (hotspots and false alarms) are linked through topological and metric relationships, which the spatial decision tree algorithm uses to build a spatial decision tree]
Layers and distinct values in the spatial dataset
Layer | Number of features | Distinct values
Physical:
Distance to nearest river (dist_river), 'l1' | 1030 points | 3 (low, medium, high)
Distance to nearest road (dist_road), 'l2' | 1030 points | 3 (low, medium, high)
Distance to nearest city center (dist_city), 'l0' | 1030 points | 3 (low, medium, high)
Land cover (land_cover), 'l4' | 3058 polygons | 12 (Dryland_forest, plantation, Water_body, and so on)
Social-economy:
Income source (income_source), 'l3' | 117 polygons | 7 (Forestry, Agriculture, Trading_restaurant, and so on)
Weather:
Precipitation in mm/day (precipitation), 'l6' | 7 polygons | 2, 3
Screen temperature in K (screen_temp), 'l8' | 7 polygons | 297, 298, 299
10m wind speed in m/s (wind_speed), 'l9' | 7 polygons | 0, 1, 2
Peatland:
Peatland type (peatland_type), 'l5' | 58 polygons | Hemists/Saprists, and so on
Peatland depth (peatland_depth), 'l7' | 68 polygons | D1 (Shallow/Thin 50-100 cm), D2 (Moderate 100-200 cm), D3 (Deep/Thick 200-400 cm), D4 (Very deep/Very thick > 400 cm)
Target:
Target (target) | 1030 points |
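The table shows the distance layers reduced to three classes (low, medium, high), but the cut points are not given here. Purely as an assumption, the sketch below uses equal-frequency (tertile) binning, which is one common way to obtain such classes. The function name and sample distances are hypothetical.

```python
def tertile_labels(values):
    """Label each value low/medium/high using equal-frequency cut points."""
    if not values:
        return []
    ranked = sorted(values)
    n = len(ranked)
    cut1 = ranked[n // 3]        # lower bound of "medium"
    cut2 = ranked[(2 * n) // 3]  # lower bound of "high"

    def label(v):
        if v < cut1:
            return "low"
        if v < cut2:
            return "medium"
        return "high"

    return [label(v) for v in values]


# Hypothetical distances to the nearest river (m) for six target objects.
dist_river = [120.0, 850.0, 40.0, 3000.0, 560.0, 1500.0]
dist_river_class = tertile_labels(dist_river)
```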
Applying Spatial ID3 Algorithm
• Result: a spatial decision tree whose first test attribute is income source.
Accuracy of the spatial decision tree:
Dataset used to calculate accuracy | Accuracy
Training data | 76.51%
Testing data (without pruning) | 71.12%
Testing data (after pruning, 4 iterations) | 71.66%
Size of tree and number of rules:
Spatial decision tree | Without pruning | After pruning (4 iterations)
Number of rules generated | 134 | 122
Size of tree | 613 | 553
Applying Spatial ID3 Algorithm
Spatial vs non-spatial algorithms
Algorithm | Accuracy
Spatial algorithm: Extended ID3 spatial decision tree (the proposed algorithm) | 71.66%
Non-spatial algorithm (available in Weka 3.6.6): ID3 decision tree algorithm | 49.02%
Non-spatial algorithm (available in Weka 3.6.6): J48 decision tree (with pruned tree) *) | 65.24%
*) J48 is a Java implementation of the C4.5 decision tree algorithm.
Pruning tree
• Purpose: to overcome overfitting when creating a decision tree.
• Error on testing data increases because leaves in large trees reflect noise or outliers.
• Method used: postpruning
– The tree is first fully grown; then the subtree at a given node is pruned by removing its branches and replacing it with a leaf.
[Figure: an unpruned tree with test nodes L1, L2, L3, L5, L6, L7, branch values v11 to v72, and Yes/No leaves, next to its pruned version]
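The slide does not spell out how a subtree is chosen for replacement. Reduced-error pruning is one common postpruning technique and is shown here only as an assumed example. Working bottom-up, it replaces a subtree with a leaf carrying the majority class of its training examples whenever that does not lower accuracy on a separate validation set. The tree format (a leaf label, or a tuple of attribute and branches), the data, and the function names are hypothetical.

```python
from collections import Counter


def classify(tree, row, default):
    """Follow branches until a leaf; unseen attribute values get `default`."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches.get(row[attr], default)
    return tree


def accuracy(tree, rows, target, default):
    return sum(classify(tree, r, default) == r[target] for r in rows) / len(rows)


def reduced_error_prune(tree, train, valid, target, default="F"):
    """Bottom-up: replace a subtree by a leaf if validation accuracy does not drop."""
    if not isinstance(tree, tuple) or not valid:
        return tree  # leaf, or no validation evidence at this node
    attr, branches = tree
    pruned = {
        value: reduced_error_prune(
            sub,
            [r for r in train if r[attr] == value],
            [r for r in valid if r[attr] == value],
            target,
            default,
        )
        for value, sub in branches.items()
    }
    tree = (attr, pruned)
    labels = [r[target] for r in train] or [r[target] for r in valid]
    leaf = Counter(labels).most_common(1)[0][0]  # majority class at this node
    if accuracy(leaf, valid, target, default) >= accuracy(tree, valid, target, default):
        return leaf
    return tree


def obj(income, cover, label):
    return {"income_source": income, "land_cover": cover, "hotspot": label}


full_tree = ("income_source", {
    "Forestry": ("land_cover", {"Bare_land": "T", "Paddy_field": "F"}),
    "Trading_restaurant": "F",
})
train = [obj("Forestry", "Bare_land", "T"), obj("Forestry", "Bare_land", "T"),
         obj("Forestry", "Paddy_field", "F"), obj("Trading_restaurant", "Bare_land", "F")]
valid = [obj("Forestry", "Bare_land", "T"), obj("Forestry", "Paddy_field", "T"),
         obj("Trading_restaurant", "Bare_land", "F")]
pruned_tree = reduced_error_prune(full_tree, train, valid, "hotspot")
```

Here the land_cover subtree under Forestry is replaced by the leaf "T" because the leaf does at least as well on the validation objects, while the root test is kept.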
Spatial Decision Tree: subtree
Rule 4: IF income_source = Trading_restaurant THEN Hotspot Occurrence = F
Rule 2: IF income_source = Forestry AND land_cover = Bare_land AND 1 ≤ wind_speed (m/s) < 2 THEN Hotspot Occurrence = T
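Each path from the root to a leaf is an IF-THEN rule, so the two rules above translate directly into code. The sketch encodes only Rule 4 and Rule 2 (the other rules of the tree are not reproduced) and returns None for objects neither rule covers. It computes accuracy as the share of objects whose predicted class equals the true class. The test objects are hypothetical.

```python
def predict(obj):
    """Apply Rule 4 and Rule 2; return 'T', 'F', or None if neither rule fires."""
    # Rule 4: IF income_source = Trading_restaurant THEN F
    if obj["income_source"] == "Trading_restaurant":
        return "F"
    # Rule 2: IF income_source = Forestry AND land_cover = Bare_land
    #         AND 1 <= wind_speed (m/s) < 2 THEN T
    if (obj["income_source"] == "Forestry"
            and obj["land_cover"] == "Bare_land"
            and 1 <= obj["wind_speed"] < 2):
        return "T"
    return None


def accuracy(objects, labels):
    """Share of objects whose prediction equals the true label (uncovered counts as wrong)."""
    correct = sum(predict(o) == y for o, y in zip(objects, labels))
    return correct / len(objects)


objects = [  # hypothetical test objects
    {"income_source": "Trading_restaurant", "land_cover": "Bare_land", "wind_speed": 1.5},
    {"income_source": "Forestry", "land_cover": "Bare_land", "wind_speed": 1.2},
    {"income_source": "Forestry", "land_cover": "Bare_land", "wind_speed": 0.5},
]
labels = ["F", "T", "F"]
acc = accuracy(objects, labels)
```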
Sample rules
1. IF income_source = Forestry AND land_cover = Bare_land AND 0 ≤ wind_speed (m/s) < 1 AND 297 ≤ screen_temp (K) < 298 AND peatland_depth = D4 (Very deep/Very thick > 400 cm) THEN Hotspot Occurrence = F
2. IF income_source = Forestry AND land_cover = Bare_land AND 1 ≤ wind_speed (m/s) < 2 THEN Hotspot Occurrence = T
3. IF income_source = Forestry AND land_cover = Paddy_field AND 0 ≤ wind_speed (m/s) < 1 THEN Hotspot Occurrence = F
4. IF income_source = Trading_restaurant THEN Hotspot Occurrence = F
5. IF income_source = Plantation AND dist_road (m)