Knowledge Discovery from Sensor Data (Sensor-KDD)


Ranga Raju Vatsavai∗
Oak Ridge National Laboratory, TN, USA

Olufemi A. Omitaomu
Oak Ridge National Laboratory, TN, USA

Joao Gama
University of Porto, Portugal

Nitesh V. Chawla
University of Notre Dame, IN, USA

Mohamed Medhat Gaber
Monash University, Australia
mohamed.m.gaber@gmail.com

Auroop R. Ganguly
Oak Ridge National Laboratory, TN, USA

ABSTRACT

Extracting knowledge and emerging patterns from sensor data is a nontrivial task. The challenges for the knowledge discovery community are expected to be immense. On one hand, dynamic data streams or events require real-time analysis methodologies and systems, while on the other hand centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. Online and real-time knowledge discovery imply immediate opportunities as well as intriguing short- and long-term challenges for practitioners and researchers in knowledge discovery. The opportunities would be to develop new data mining approaches and to adapt traditional and emerging knowledge discovery methodologies to the requirements of the emerging problems. In addition, emerging societal problems require knowledge discovery solutions that are designed to investigate anomalies, changes, extremes and nonlinear processes, and departures from the normal. Keeping in view the requirements of the emerging field of knowledge discovery from sensor data, we took the initiative to develop a community of researchers with common interests and scientific goals, which culminated in the organization of the Sensor-KDD series of workshops in conjunction with the prestigious ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. In this report, we summarize the events of the Second ACM SIGKDD International Workshop on Knowledge Discovery from Sensor Data (Sensor-KDD 2008).

1. INTRODUCTION

Wide-area sensor infrastructures, remote sensors, RFIDs, and wireless sensor networks yield massive volumes of disparate, dynamic, and geographically distributed data. As such sensors become ubiquitous, a set of broad requirements is beginning to emerge across high-priority applications, including disaster preparedness and management, adaptability to climate change, national or homeland security, and the management of critical infrastructures. The raw data from sensors need to be efficiently managed and transformed into usable information through data fusion, which in turn must be converted into predictive insights via knowledge discovery, ultimately facilitating automated or human-induced tactical decisions or strategic policy based on decision sciences and decision support systems.

The challenges for the knowledge discovery community are expected to be immense. On one hand, dynamic data streams or events require real-time analysis methodologies and systems, while on the other hand centralized processing through high-end computing is also required for generating offline predictive insights, which in turn can facilitate real-time analysis. In addition, emerging societal problems require knowledge discovery solutions that are designed to investigate anomalies, changes, extremes and nonlinear processes, and departures from the normal. The Sensor-KDD workshop seeks to bring together researchers from academia, government, and industry working on various aspects of knowledge discovery from sensor data.

1.1 Motivation

The expected ubiquity of sensors in the near future, combined with the critical roles they are expected to play in high-priority application solutions, points to an era of unprecedented growth and opportunities. The online knowledge discovery requirements described earlier imply immediate opportunities as well as intriguing short- and long-term challenges for practitioners and researchers in knowledge discovery. In addition, the knowledge discovery and data mining (KDD) community will be called upon, again and again, to partner with domain experts in solving critical application problems in business and government, as well as in the domain sciences and engineering. The main motivation for the Sensor-KDD series of workshops stems from the increasing need for a forum to exchange ideas and recent research results, and to facilitate collaboration and dialog between academia, government, and industrial stakeholders. This is clearly reflected in the successful organization of the first workshop [3] along with KDD-2007, which was attended by more than seventy registered participants. The high quality of submissions allowed us to put together an edited book [2] and a special issue of the Intelligent Data Analysis journal [1].

∗Corresponding author.

2. SUMMARY OF THE WORKSHOP

Based on the positive feedback from the previous workshop attendees and our own experiences and interactions with

SIGKDD Explorations

Volume 10, Issue 2

Page 68

the government agencies such as DHS and DOD, and our involvement with numerous projects on knowledge discovery from sensor data, we organized the 2nd Sensor-KDD workshop along with the KDD-2008 conference. As expected, we received very high quality paper submissions, which were thoroughly reviewed by a panel of international program committee members [4]. Based on a minimum of two reviews per paper, we selected seven full papers and six short papers. In addition to the oral presentations of accepted papers, the workshop featured two invited speakers: Dr. Kendra E. Moore, Program Manager, DARPA/IPTO, and Prof. Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign. We now briefly summarize each of these presentations; for full details about each of the presented papers, please refer to the workshop proceedings [5].

2.1 Session 1

The first session was moderated by Dr. Auroop Ganguly. This session featured our first invited speaker, Dr. Kendra Moore, followed by two paper presentations. Dr. Moore presented the challenges of knowledge discovery from sensor data in defense applications. She touched upon a wide variety of topics, including heterogeneity, distributed sensors, real-time requirements, privacy, and applications of national importance.

The paper “Anomaly Detection from Sensor Data for Real-time Decisions” by Olufemi A. Omitaomu, Yi Fang, and Auroop R. Ganguly was presented by Olufemi. This paper is concerned with the detection of unusual profiles or anomalous behavioral characteristics from multiple types of sensor data. The authors presented a two-stage knowledge discovery process, in which offline approaches are used to design online solutions that can support real-time decisions. They illustrated this innovative solution in the context of detecting anomalous behavior of trucks using sensor-based measurements collected at truck weigh stations. This is a fine example of a knowledge discovery application in the context of national security.
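
The two-stage idea (offline learning that supports online decisions) can be illustrated with a minimal sketch. It assumes the offline stage simply learns a per-feature mean and standard deviation from historical weigh-station records, and the online stage flags a new record whose largest absolute z-score exceeds a threshold. The features (gross weight, axle count), function names, data, and threshold of 3 are hypothetical; this is not the authors' method.

```python
from statistics import mean, pstdev

def fit_offline(history):
    """Offline stage: learn a (mean, std) profile per feature from historical records."""
    columns = list(zip(*history))
    # Guard against zero variance so the online z-score is always defined.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]

def online_score(profile, record):
    """Online stage: largest absolute z-score of a new record against the profile."""
    return max(abs(x - mu) / sd for x, (mu, sd) in zip(record, profile))

def is_anomalous(profile, record, threshold=3.0):
    return online_score(profile, record) > threshold

# Hypothetical historical records: (gross weight in kg, axle count).
history = [(29000, 5), (30000, 5), (31000, 5), (30000, 6), (30000, 4)]
profile = fit_offline(history)
print(is_anomalous(profile, (30300, 5)))
print(is_anomalous(profile, (45000, 5)))
```

The expensive part (building the profile) runs offline, so the online check is a constant-time comparison per record.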

The paper “Network Service Disruption upon Natural Disaster: Inference Using Sensory Measurements and Human Inputs” by Supaporn Erjongmanee and Chuanyi Ji shows the important role of data mining and machine learning in inferring large-scale network service disruption. Natural disasters, like the recent Hurricane Katrina, can cause large-scale network service interruptions, which lead to unreachability of networks. The authors presented a joint use of large-scale sensory measurements from the Internet and a small number of human inputs for effective network inference through a clustering and semi-supervised learning algorithm. This approach was evaluated on network service disruption induced by Hurricane Katrina at the subnet level. The results showed that clustering reduces the spatial dimensionality by 81%, and the subnet statuses inferred by the semi-supervised classifier revealed interesting facts about network resilience.
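
The general cluster-then-label pattern behind combining many unlabeled measurements with a few human inputs can be sketched as follows. It assumes each subnet is described by a numeric feature vector (the two-dimensional features here are hypothetical) and that only a handful of subnets carry a human-provided status. It runs plain k-means with deterministic farthest-point initialization and gives every cluster the majority label of its labeled members; the authors' actual clustering and semi-supervised classifier are not reproduced.

```python
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign

def cluster_then_label(points, labels, k):
    """Propagate the few known labels (None = unlabeled) to whole clusters."""
    assign = kmeans(points, k)
    votes = {j: Counter() for j in range(k)}
    for a, lab in zip(assign, labels):
        if lab is not None:
            votes[a][lab] += 1
    cluster_label = {j: (v.most_common(1)[0][0] if v else None) for j, v in votes.items()}
    return [cluster_label[a] for a in assign]

# Hypothetical subnet features and two human-provided statuses.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["reachable", None, None, None, "unreachable", None]
print(cluster_then_label(points, labels, k=2))
```

Clustering first reduces the number of distinct units that need labels, which is why a small number of human inputs can go a long way.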

2.2 Session 2

The second session was moderated by Dr. Olufemi Omitaomu. This session featured five paper presentations covering a variety of topics.

The paper “Spatio-Temporal Outlier Detection in Precipitation Data” by Elizabeth Wu, Wei Liu, and Sanjay Chawla was presented by Elizabeth. This paper addresses one of the core data mining techniques, outlier detection, on large volumes of spatio-temporal data. Current data mining techniques have several limitations in handling spatio-temporal data; therefore, it is very important to develop new techniques or extend existing ones to handle such data. The authors presented a spatio-temporal outlier detection algorithm called Outstretch, which discovers the outlier movement patterns of the top-k spatial outliers over several time periods. The top-k spatial outliers are found using the Exact-Grid Top-k and Approx-Grid Top-k algorithms, which are extensions of algorithms developed by Agarwal et al. [2]. These algorithms use the Kulldorff spatial scan statistic, which is designed to discover all outliers, unaffected by neighbouring regions that may contain missing values. After generating the outlier sequences, the authors showed how these sequences can be interpreted by comparing them to the phases of the El Niño Southern Oscillation (ENSO) weather phenomenon.
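
The scoring step behind grid-based top-k spatial outliers can be illustrated with a brute-force sketch. It assumes the Kulldorff discrepancy in the form d(m, b) = m log(m/b) + (1 - m) log((1 - m)/(1 - b)), scored only where the region's measurement fraction m exceeds its baseline fraction b, and it takes the baseline to be the region's share of grid cells (an assumption). Every axis-aligned rectangle is enumerated exhaustively, so this is far less efficient than Exact-Grid Top-k and is not the authors' implementation; the grid values are hypothetical.

```python
import math

def kulldorff(m, b):
    """Kulldorff discrepancy; regions without an excess (m <= b) score 0."""
    if m <= b or b <= 0 or b >= 1:
        return 0.0
    score = m * math.log(m / b)
    if m < 1:  # the (1 - m) log(...) term tends to 0 as m -> 1
        score += (1 - m) * math.log((1 - m) / (1 - b))
    return score

def top_k_rectangles(grid, k):
    """Score every rectangle (r1, c1, r2, c2) of the grid and return the k best."""
    rows, cols = len(grid), len(grid[0])
    total = sum(map(sum, grid))
    n_cells = rows * cols
    scored = []
    for r1 in range(rows):
        for r2 in range(r1, rows):
            for c1 in range(cols):
                for c2 in range(c1, cols):
                    s = sum(grid[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1))
                    cells = (r2 - r1 + 1) * (c2 - c1 + 1)
                    scored.append((kulldorff(s / total, cells / n_cells), (r1, c1, r2, c2)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]

# Hypothetical 4x4 precipitation grid with one unusually wet cell.
grid = [[1, 1, 1, 1], [1, 1, 10, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
print(top_k_rectangles(grid, 1))
```

Repeating this per time period and linking the top-k regions across periods is the kind of sequence that Outstretch then analyzes.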

The paper “Probabilistic Analysis of a Large-Scale Urban Traffic Sensor Data Set” by Jon Hutchins, Alexander Ihler, and Padhraic Smyth was presented by Jon. Detecting underlying patterns in large volumes of spatio-temporal data is very important, as it enables, for example, human behavior modeling and traffic planning. However, real-world sensor time series are often significantly noisy and more difficult to work with than the relatively clean data sets that tend to be used as the basis for experiments in many research papers. In contrast, the authors reported on a large case study involving statistical data mining of over 100 million measurements from 1700 freeway traffic sensors over a period of seven months in Southern California. They discussed the challenges posed by the wide variety of sensor failures and anomalies present in the data. The volume and complexity of the data preclude the use of manual visualization or simple thresholding techniques to identify these anomalies. The authors described the application of probabilistic modeling and unsupervised learning techniques to this data set and illustrated how these approaches can successfully detect underlying systematic patterns even in the presence of substantial noise and missing data.
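
As a minimal illustration of probabilistic (rather than fixed-threshold) flagging in noisy count data with gaps, the sketch below assumes that the counts in each time-of-day slot follow a Poisson distribution whose rate is the mean of the non-missing historical counts, and flags an observation when either Poisson tail probability falls below a small alpha. The slot structure, alpha, and data are hypothetical, and the authors' probabilistic models are considerably richer.

```python
import math

def slot_rate(history):
    """Mean of the observed counts for one time slot, skipping missing (None) readings."""
    observed = [c for c in history if c is not None]
    return sum(observed) / len(observed)

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Poisson(lam), summed term by term (fine for moderate lam)."""
    if x < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, x + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)

def is_suspicious(count, lam, alpha=1e-3):
    """Flag counts that are improbably low or improbably high under Poisson(lam)."""
    low_tail = poisson_cdf(count, lam)              # P(X <= count)
    high_tail = 1.0 - poisson_cdf(count - 1, lam)   # P(X >= count)
    return low_tail < alpha or high_tail < alpha

# Hypothetical vehicle counts for one 5-minute slot on past days; None = sensor dropout.
lam = slot_rate([98, 102, None, 100, 101, 99])
print(is_suspicious(95, lam), is_suspicious(0, lam))
```

A stuck-at-zero sensor is flagged because zero vehicles is astronomically unlikely at that rate, while ordinary fluctuations are not, which is exactly what a single global threshold struggles to express.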

The paper “WiFi Miner: An Online Apriori-Infrequent Based Wireless Intrusion Detection System” by Ahmedur Rahman, C.I. Ezeife, and A.K. Aggarwal deals with intrusion detection in wireless networks. Their system, WiFi Miner, is capable of finding frequent and infrequent patterns from preprocessed wireless connection records using an infrequent-pattern-finding Apriori algorithm. This online Apriori-Infrequent algorithm improves the join and prune step of the traditional Apriori algorithm with a rule that avoids joining itemsets not likely to produce frequent itemsets. An anomaly score is then assigned to each packet (record) based on whether the record has more frequent or infrequent patterns. Connection records with positive anomaly scores have more infrequent patterns than frequent patterns and are considered anomalous packets. The authors described a solution that eliminates the need for hard-to-obtain training data in wireless network environments, increases the intrusion detection rate, and reduces false alarms.
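
The anomaly-scoring rule can be sketched as follows, assuming each connection record is a set of attribute=value items, that every itemset of up to two items contained in a record counts as a pattern, and that a pattern is frequent when its support in a reference window reaches `min_support`. The score is the number of infrequent patterns minus the number of frequent ones, so a positive score marks the record as anomalous. Support is counted by brute force rather than with the authors' Apriori-Infrequent join/prune rule, and all item names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def patterns(record, max_len=2):
    """All itemsets of size 1..max_len in a record (items sorted for a canonical form)."""
    items = sorted(record)
    return [c for n in range(1, max_len + 1) for c in combinations(items, n)]

def support_counts(window, max_len=2):
    """Support of every pattern over a reference window of records."""
    counts = Counter()
    for record in window:
        counts.update(patterns(record, max_len))
    return counts

def anomaly_score(record, counts, min_support, max_len=2):
    """(# infrequent patterns) - (# frequent patterns); positive means anomalous."""
    score = 0
    for p in patterns(record, max_len):
        score += -1 if counts[p] >= min_support else 1
    return score

# Hypothetical reference window of preprocessed wireless connection records.
window = [{"proto=tcp", "port=80", "auth=wpa2"}] * 8 + [{"proto=udp", "port=53", "auth=wpa2"}] * 2
counts = support_counts(window)
print(anomaly_score({"proto=tcp", "port=80", "auth=wpa2"}, counts, min_support=3))
print(anomaly_score({"proto=udp", "port=31337", "auth=none"}, counts, min_support=3))
```

No labeled attack data is needed: the reference window itself defines what is frequent, which mirrors the paper's point about avoiding hard-to-obtain training data.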

The paper “Mobile Visualization for Sensory Data Stream Mining” by Pari Delir Haghighi, Brett Gillick, Shonali Krishnaswamy, Mohamed Medhat Gaber, and Arkady Zaslavsky introduces an integrated architecture of situation-aware adaptive data mining and mobile visualization techniques for ubiquitous computing environments. With the emergence of ubiquitous data mining and recent advances in mobile communications, there is a need for visualization techniques that enhance user interaction, real-time decision making, and comprehension of the results of mining algorithms. To address this important problem, the authors proposed a novel architecture for situation-aware adaptive visualization that applies intelligent visualization techniques to data stream mining of sensory data. Their architecture incorporates fuzzy logic principles for modeling and reasoning about contexts/situations and performs gradual adaptation of data mining and visualization parameters according to the occurring situations. A prototype of the architecture was implemented using J2ME and tested in the area of health-care monitoring.
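
To make "fuzzy, gradual adaptation" concrete, here is a minimal sketch assuming a single context variable (normalized battery level in [0, 1]), three hypothetical membership functions, and a zero-order Sugeno-style rule base that maps each fuzzy situation to a visualization refresh interval in seconds. The variable, membership shapes, and rule outputs are illustrative assumptions, not the authors' architecture.

```python
def ramp_down(x, a, b):
    """Membership that is 1 below a, 0 above b, and linear in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def triangle(x, a, peak, c):
    """Triangular membership rising from a to peak and falling back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (peak - a) if x <= peak else (c - x) / (c - peak)

def refresh_interval(battery):
    """Membership-weighted average of the rule outputs (seconds between refreshes)."""
    rules = [
        (ramp_down(battery, 0.0, 0.5), 10.0),        # battery LOW    -> refresh rarely
        (triangle(battery, 0.2, 0.5, 0.8), 5.0),     # battery MEDIUM -> moderate rate
        (1.0 - ramp_down(battery, 0.5, 1.0), 1.0),   # battery HIGH   -> refresh often
    ]
    weight = sum(w for w, _ in rules)
    return sum(w * out for w, out in rules) / weight

for level in (0.0, 0.3, 0.5, 0.7, 1.0):
    print(level, round(refresh_interval(level), 2))
```

Because neighbouring fuzzy sets overlap, the refresh interval changes smoothly as the battery drains instead of jumping at a crisp threshold.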

The paper “Dense Pixel Visualization for Mobile Sensor Data Mining” by Pedro Pereira Rodrigues and João Gama describes dense pixel visualization techniques for visualizing sensor data as well as absolute errors resulting from predictive models. Sensor data is usually represented by streaming time series. Current state-of-the-art visualization systems include line plots and three-dimensional representations, which most of the time require screen resolutions that are not available on small, transient mobile devices. Moreover, when data presents cyclic behaviors, as in the electricity domain, predictive models may tend to give higher errors at certain recurrent points in time, but the human eye is not trained to notice these cycles in a long stream. To overcome some of these limitations, the authors proposed a simple dense pixel display visualization system, exploiting the benefits it may offer for detecting and correcting recurrent faulty predictions. A case study was also presented, in which a simple corrective strategy is studied in the context of global electrical load demand, exemplifying the utility of the new visualization method when compared with automatic detection of recurrent errors.
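
The core layout of a dense pixel display for cyclic data can be sketched as follows, assuming a known period (for example, 24 hourly readings per day): the stream of absolute prediction errors is folded into a matrix with one row per cycle and one column per position in the cycle, so recurrent errors line up as a dark column. The character shading and the rule for flagging a column (mean error above `factor` times the overall mean) are hypothetical stand-ins for visual inspection, not the authors' system.

```python
def fold(errors, period):
    """Fold a stream of absolute errors into rows of one full cycle each (partial cycle dropped)."""
    n_rows = len(errors) // period
    return [errors[r * period:(r + 1) * period] for r in range(n_rows)]

def render(matrix, shades=" .:*#"):
    """Map each error to one character: darker character = larger error."""
    top = max(max(row) for row in matrix) or 1.0
    return "\n".join(
        "".join(shades[min(len(shades) - 1, int(v / top * (len(shades) - 1)))] for v in row)
        for row in matrix
    )

def recurrent_columns(matrix, factor=1.5):
    """Cycle positions whose mean error exceeds factor x the overall mean error."""
    cols = list(zip(*matrix))
    overall = sum(map(sum, matrix)) / (len(matrix) * len(cols))
    return [j for j, col in enumerate(cols) if sum(col) / len(col) > factor * overall]

# Hypothetical errors: the third reading of every cycle is badly predicted.
errors = [1, 1, 5, 1] * 3
m = fold(errors, 4)
print(render(m))
print(recurrent_columns(m))
```

One character (or pixel) per value keeps the display usable on small screens, and the column alignment is what makes the recurrent error visible at a glance.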

2.3 Session 3

This session, which was moderated by Dr. João Gama, featured our second invited speaker, Prof. Jiawei Han, and two paper presentations. Prof. Han's talk on “Data Mining in Sensor Network Systems: Trouble-Shooting and Shooting Trouble” captures the dual application of data mining and touches on recent advances made in the field. The abstract of Prof. Han's talk is given below.

“Data mining will play an essential role in the development of robust sensor network systems. The talk will discuss our recent work in two research frontiers: (1) how to develop effective data mining methods for troubleshooting in the development of robust sensor network systems; and (2) how to develop new data mining methods for shooting troubles (i.e., anomalies) in data streams, which should be an essential function in sensor network systems.

Handling the first task leads to a tool, called DustMiner, for uncovering bugs due to interactive complexity in networked sensing applications. Such bugs are not localized to one component that is faulty, but rather result from complex and unexpected interactions between multiple, often individually non-faulty, components. Moreover, the manifestations of these bugs are often not repeatable, making them particularly hard to find, as the particular sequence of events that invokes the bug may not be easy to reconstruct. Because of the distributed nature of failure scenarios, our tool looks for sequences of events that may be responsible for faulty behavior, as opposed to localized bugs such as a bad pointer in a module. An extensible framework is developed where a front-end collects runtime data logs of the system being debugged and an offline back-end uses frequent discriminative pattern mining to uncover likely causes of failure. The tool helped uncover event sequences that lead to a highly degraded mode of operation. Fixing the problem significantly improved the performance of the protocol.
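
The back-end idea of discriminative pattern mining can be illustrated with a minimal sketch, assuming logs are lists of event names and that candidate patterns are contiguous event pairs. Each pair is ranked by its support (the fraction of logs containing it) in failing runs minus its support in healthy runs. DustMiner's actual pattern language and mining algorithm are more general; the event names here are hypothetical.

```python
from collections import Counter

def ngram_support(logs, n=2):
    """Fraction of logs in which each contiguous n-event sequence occurs at least once."""
    counts = Counter()
    for log in logs:
        counts.update({tuple(log[i:i + n]) for i in range(len(log) - n + 1)})
    return {pattern: c / len(logs) for pattern, c in counts.items()}

def discriminative_patterns(bad_logs, good_logs, n=2, top=3):
    """Rank n-grams by support in failing logs minus support in healthy logs."""
    bad, good = ngram_support(bad_logs, n), ngram_support(good_logs, n)
    scores = {p: s - good.get(p, 0.0) for p, s in bad.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Hypothetical runtime logs collected by a front-end.
good = [["boot", "send", "ack"], ["boot", "send", "ack", "send", "ack"]]
bad = [["boot", "send", "reboot"], ["boot", "send", "reboot", "send"]]
print(discriminative_patterns(bad, good))
```

Patterns common to both healthy and failing runs (such as `boot -> send`) score zero, so the ranking surfaces sequences that actually distinguish failures rather than merely frequent ones.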


For shooting troubles (i.e., anomalies) in sensor network systems, we propose a new approach to build predictive models for rare events in sensor data streams. The method estimates reliable posterior probabilities using an ensemble of models to match the distribution over under-samples of negatives and repeated samples of positives (i.e., anomalies). We formally show some interesting and important properties of the proposed framework, e.g., reliability of estimated probabilities on the skewed positive class, accuracy of estimated probabilities, efficiency, and scalability. Experiments are performed on several synthetic as well as real-world datasets with skewed distributions, and they demonstrate that our framework has substantial advantages over existing approaches in estimation reliability and prediction accuracy.”
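
The ensemble idea for rare events can be sketched under simplifying assumptions: a one-dimensional feature, a Gaussian class-conditional model as the base learner, and members that each reuse every positive (anomaly) example together with an equal-sized random under-sample of negatives. The ensemble's posterior is the average of the members' posteriors. The base learner, ensemble size, and data are hypothetical; because each member is trained on a balanced sample, its probabilities reflect that balanced prior, and no prior correction is applied here.

```python
import math
import random
from statistics import mean, pstdev

def fit_gaussian(xs, ys):
    """Per-class mean/std plus the positive-class prior of this (balanced) training sample."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos), pstdev(pos) or 1e-3, mean(neg), pstdev(neg) or 1e-3, len(pos) / len(xs))

def density(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior(model, x):
    mu_p, sd_p, mu_n, sd_n, prior = model
    a = prior * density(x, mu_p, sd_p)
    b = (1 - prior) * density(x, mu_n, sd_n)
    return a / (a + b) if a + b > 0 else prior  # fall back to the prior on underflow

def train_ensemble(xs, ys, n_models=10, seed=0):
    rng = random.Random(seed)
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    models = []
    for _ in range(n_models):
        neg_sample = rng.sample(neg, min(len(neg), len(pos)))  # under-sample negatives
        sample_x = pos + neg_sample                            # every positive is reused
        sample_y = [1] * len(pos) + [0] * len(neg_sample)
        models.append(fit_gaussian(sample_x, sample_y))
    return models

def ensemble_posterior(models, x):
    return mean(posterior(m, x) for m in models)

# Hypothetical skewed data: 1000 normal readings around 0, 20 anomalies around 4.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(1000)] + [rng.gauss(4, 0.5) for _ in range(20)]
ys = [0] * 1000 + [1] * 20
models = train_ensemble(xs, ys)
print(round(ensemble_posterior(models, 4.0), 3), round(ensemble_posterior(models, 0.0), 3))
```

Averaging over several different negative under-samples keeps most of the negative data in play while each member still sees a balanced class ratio.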
The paper, “Monitoring Incremental Histogram Distribution for Change Detection in Data Streams” by Raquel Sebastião, João Gama, Pedro Pereira Rodrigues, and João Bernardes, addresses the important problem of detecting changes when constructing histograms from time-changing, high-speed data streams. Histograms are a common technique for density estimation and have been widely used as a tool in exploratory data analysis. Learning histograms from static and stationary data is a well-known topic. In this paper, the authors present algorithms to detect changes in high-speed, time-changing data streams. They studied strategies to detect changes in the distribution generating the examples and to adapt the histogram to the most recent data by forgetting outdated data, using the Partition Incremental Discretization algorithm for this task. The authors compared distributions using the Kullback-Leibler divergence, defined a threshold for the change-detection decision based on the asymmetry of this measure, and evaluated their algorithm on controlled artificial data.

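A minimal Python sketch of KL-based change detection between two histograms is shown below. It illustrates the general idea under stated assumptions and is not the authors' algorithm: it uses fixed equal-width bins instead of the Partition Incremental Discretization algorithm, a fixed reference window, a pseudocount to keep the divergence finite, and a hypothetical decision rule (`detect_change`) that flags a change when KL in either direction exceeds a fixed threshold, whereas the paper derives its threshold from the asymmetry of the measure.

```python
import math
import random

# Hypothetical sketch: fixed equal-width bins stand in for the Partition
# Incremental Discretization algorithm used in the paper.


def histogram(values, lo, hi, n_bins, pseudocount=0.5):
    """Normalized equal-width histogram over [lo, hi).

    The pseudocount keeps every bin strictly positive so KL stays finite.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)  # clamp outliers to edge bins
        counts[idx] += 1
    total = len(values) + pseudocount * n_bins
    return [(c + pseudocount) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def detect_change(reference, window, lo, hi, n_bins=10, threshold=0.1):
    p = histogram(reference, lo, hi, n_bins)
    q = histogram(window, lo, hi, n_bins)
    # KL divergence is asymmetric, so both directions are computed; flagging a
    # change when either exceeds a fixed threshold is an assumption of this sketch.
    score = max(kl_divergence(p, q), kl_divergence(q, p))
    return score > threshold, score


rng = random.Random(7)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(2000)]
stable_flag, stable_score = detect_change(reference, same, -4.0, 6.0)
drift_flag, drift_score = detect_change(reference, shifted, -4.0, 6.0)
print(stable_flag, round(stable_score, 4), drift_flag, round(drift_score, 4))
```

In a streaming setting the reference histogram would be rebuilt (or decayed) after each detected change, which is where the forgetting of outdated data described above comes in.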
2.4 Session 4

The final session was moderated by Dr. Nitesh Chawla and featured four paper presentations.

The paper, “Unsupervised Plan Detection with Factor Graphs” by George B. Davis, Jamie Olson, and Kathleen M. Carley, describes synchronous and asynchronous expectation maximization (EM) algorithms for unsupervised learning in factor graphs. Recognizing the plans of moving agents is a natural goal for many sensor systems, with applications including robotic path finding, traffic control, and detection of anomalous behavior. However, plan recognition becomes complicated in the absence of contextual information such as labeled plans and relevant locations. The authors introduced two unsupervised methods to simultaneously estimate model parameters and hidden values within a factor graph representing agent transitions over time. These algorithms were evaluated on a GPS tracking dataset of 1074 ships over 5 days in the English Channel. Initial results indicated that a reasonable model may be inferred on this difficult problem.
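Unsupervised plan detection can be loosely illustrated with a much simpler model than the authors' factor graphs: a mixture of first-order Markov chains over discretized locations, fit by batch EM. Each trajectory has one hidden plan; the E-step computes plan responsibilities and the M-step re-estimates the plan priors and per-plan transition matrices. The model, the function names (`em_markov_mixture`, `sample_path`), and the synthetic trajectories are assumptions made for illustration; neither of the paper's EM variants is reproduced here.

```python
import math
import random

# Hypothetical sketch: a mixture of first-order Markov chains, i.e. a very
# small factor graph with one hidden "plan" variable per trajectory, fit by
# batch EM. It is not the synchronous or asynchronous EM from the paper.
NEG_INF = float("-inf")


def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(xs):
    m = max(xs)
    if m == NEG_INF:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def seq_loglik(seq, trans):
    """Log-probability of a location sequence under one transition matrix."""
    return sum(safe_log(trans[a][b]) for a, b in zip(seq, seq[1:]))


def em_markov_mixture(seqs, n_states, n_plans, n_iter=30, seed=0):
    rng = random.Random(seed)
    priors = [1.0 / n_plans] * n_plans
    trans = []  # one random row-stochastic matrix per plan
    for _ in range(n_plans):
        rows = []
        for _ in range(n_states):
            w = [rng.random() + 0.1 for _ in range(n_states)]
            s = sum(w)
            rows.append([x / s for x in w])
        trans.append(rows)
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of each plan for each trajectory.
        resp, total = [], 0.0
        for seq in seqs:
            scores = [safe_log(priors[k]) + seq_loglik(seq, trans[k]) for k in range(n_plans)]
            norm = logsumexp(scores)
            total += norm
            resp.append([math.exp(s - norm) for s in scores])
        history.append(total)  # data log-likelihood under the current parameters
        # M-step: re-estimate plan priors and weighted transition counts.
        priors = [sum(r[k] for r in resp) / len(seqs) for k in range(n_plans)]
        for k in range(n_plans):
            counts = [[0.0] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                for a, b in zip(seq, seq[1:]):
                    counts[a][b] += r[k]
            for a in range(n_states):
                row_total = sum(counts[a])
                # Rows with no weighted evidence do not affect the objective.
                trans[k][a] = ([c / row_total for c in counts[a]] if row_total > 0
                               else [1.0 / n_states] * n_states)
    return priors, trans, history


def sample_path(matrix, start, length, rng):
    path = [start]
    for _ in range(length - 1):
        path.append(rng.choices(range(len(matrix)), weights=matrix[path[-1]])[0])
    return path


# Two synthetic "plans" over three discretized locations.
rng = random.Random(3)
clockwise = [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]]
counter = [[0.1, 0.0, 0.9], [0.9, 0.1, 0.0], [0.0, 0.9, 0.1]]
seqs = [sample_path(clockwise if i % 2 else counter, 0, 20, rng) for i in range(40)]
priors, trans, history = em_markov_mixture(seqs, n_states=3, n_plans=2)
print([round(p, 2) for p in priors], round(history[0], 1), round(history[-1], 1))
```

The returned `history` records the data log-likelihood at each E-step; EM never decreases it, so a drop is a useful signal of a bug when adapting the sketch.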

Volume 10, Issue 2

Page 70

The paper, “An Adaptive Sensor Mining Framework for Pervasive Computing Applications” by Parisa Rashidi and Diane J. Cook, presents an adaptive data mining framework for detecting patterns in sensor data. Mining sequences of sensor events poses unique challenges to the KDD community, especially when the underlying data source is dynamic and the patterns change. In this paper, the authors introduced an adaptive data mining framework that detects patterns in sensor data and, more importantly, adapts to changes in the underlying model. Frequent and periodic patterns are first discovered by the Frequent and Periodic Pattern Miner (FPPM) algorithm; any changes in the discovered patterns over the lifetime of the system are then detected by the Pattern Adaptation Miner (PAM) algorithm, so that the framework adapts to the changing environment. The framework also captures vital context information present in pervasive computing applications, such as startup triggers and temporal information. It was evaluated using data collected in the CASAS smart home testbed.

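The first stage described above (discovering frequent and periodic patterns in a sensor event stream) can be illustrated with a minimal Python sketch. It is not FPPM: it counts contiguous windows of a fixed `length` in a time-stamped event stream, keeps those occurring at least `min_support` times, and calls a pattern periodic when the gaps between its start times stay within `max_jitter` of their median. The function name, parameters, and sensor labels are hypothetical, and the adaptation step performed by PAM is not shown.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sketch: not the FPPM or PAM algorithms from the paper.


def frequent_periodic_patterns(events, length, min_support, max_jitter):
    """Find frequent event windows and test whether they recur periodically.

    events: list of (timestamp, sensor_event) pairs sorted by timestamp.
    """
    labels = [label for _, label in events]
    starts_by_pattern = defaultdict(list)
    for i in range(len(events) - length + 1):
        pattern = tuple(labels[i:i + length])
        starts_by_pattern[pattern].append(events[i][0])  # start time of occurrence
    result = {}
    for pattern, starts in starts_by_pattern.items():
        if len(starts) < min_support:
            continue  # not frequent enough
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        # Periodic if every gap stays within max_jitter of the median gap.
        if gaps and all(abs(g - median(gaps)) <= max_jitter for g in gaps):
            period = median(gaps)
        else:
            period = None
        result[pattern] = {"support": len(starts), "period": period}
    return result


# Five synthetic "days" from a smart-home-like stream (labels are made up).
events = []
for day in range(5):
    base = day * 24
    events += [(base + 7, "bed_exit"), (base + 8, "kitchen_motion"), (base + 9, "stove_on")]
    events.append((base + 15 + day % 3, "door_open"))  # irregular timing
patterns = frequent_periodic_patterns(events, length=3, min_support=4, max_jitter=1)
for pattern, info in sorted(patterns.items()):
    print(pattern, info)
```

One simple way to approximate an adaptation step would be to rerun the function on a sliding window of recent events and compare the returned pattern sets; this is an assumption, not the paper's method.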
The paper, “Spatiotemporal Neighborhood Discovery for Sensor Data” by Michael P. McGuire, Vandana P. Janeja, and Aryya Gangopadhyay, describes a framework for discovering spatiotemporal neighborhoods in sensor datasets where a time series of data is collected at many spatial locations. The purpose of the spatiotemporal neighborhoods is to provide regions in the data on which knowledge discovery tasks, such as outlier detection, can be focused. As building blocks for the spatiotemporal neighborhoods, the authors developed a method to generate spatial neighborhoods and a method to discretize temporal intervals. These methods were tested on real-life datasets, including (a) sea surface temperature data from the Tropical Atmospheric Ocean Project (TAO) array in the Equatorial Pacific Ocean and (b) a highway sensor network data archive, and the initial results were encouraging.

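The temporal-interval building block can be illustrated with a small Python sketch that merges adjacent fixed-size windows whose means are close. This is a stand-in chosen for illustration rather than the authors' discretization method; `discretize_intervals`, the window size, the tolerance, and the synthetic readings are hypothetical, and the spatial-neighborhood step is not shown.

```python
from statistics import mean

# Hypothetical sketch of temporal-interval discretization; not the method
# developed in the paper.


def discretize_intervals(series, window, tol):
    """Split a series into variable-length intervals of similar behavior.

    Starts from fixed-size windows and greedily merges each window into the
    current interval while its mean stays within tol of the interval mean.
    Returns (start, end) index pairs with end exclusive.
    """
    intervals = []
    start = 0
    current = list(series[:window])
    for i in range(window, len(series), window):
        chunk = series[i:i + window]
        if abs(mean(chunk) - mean(current)) <= tol:
            current.extend(chunk)
        else:
            intervals.append((start, i))
            start, current = i, list(chunk)
    intervals.append((start, len(series)))
    return intervals


# Synthetic temperature-like readings with a warm spell in the middle.
series = [20.0] * 12 + [25.0] * 8 + [20.5] * 8
print(discretize_intervals(series, window=4, tol=1.0))  # [(0, 12), (12, 20), (20, 28)]
```

Intervals produced this way could then be intersected with spatial neighborhoods to form candidate spatiotemporal regions, though how the paper combines the two is not reproduced here.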
The paper, “Exploiting Spatial and Data Correlations for Approximate Data Collection in Wireless Sensor Networks” by Chih-Chieh Hung, Wen-Chih Peng, York Shang-Hua Tsai, and Wang-Chien Lee, describes algorithms for finding representative sensor nodes. Finding sensor nodes with similar readings is an important task because it allows data reduction. However, efficiently identifying the sensor groups and their representative nodes is very challenging. The authors proposed an algorithm, DCglobal, to determine a set of representative sensor nodes that have high energy levels and wide data coverage ranges, where the data coverage range of a sensor node is the set of sensor nodes whose reading vectors are very close to that node's readings. Furthermore, they propose a maintenance mechanism to dynamically select alternative representative nodes when the current representatives run low on energy or can no longer capture the spatial correlation within their data coverage ranges. Experimental studies on both synthetic and real datasets showed that DCglobal was able to effectively and efficiently provide approximate data collection while prolonging the network lifetime.

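A minimal Python sketch of the representative-selection idea follows. It is a greedy set-cover heuristic chosen for illustration, not DCglobal itself: each node's data coverage range is taken to be the nodes whose reading vectors lie within Euclidean distance `eps` of its own, and nodes are picked by the number of newly covered nodes weighted by residual energy. The distance measure, `eps`, the scoring rule, the function names, and the readings are assumptions; the maintenance mechanism is not shown.

```python
import math

# Hypothetical greedy heuristic in the spirit of the description above; it is
# not the DCglobal algorithm and omits the maintenance mechanism.


def coverage_ranges(readings, eps):
    """Map each node to the nodes whose reading vectors lie within eps of its own."""
    return {
        i: {j for j in readings if math.dist(readings[i], readings[j]) <= eps}
        for i in readings
    }


def select_representatives(readings, energy, eps):
    cover = coverage_ranges(readings, eps)
    uncovered = set(readings)
    reps = []
    while uncovered:
        def score(node):
            gain = len(cover[node] & uncovered)
            # Prefer many newly covered nodes and high residual energy; the raw
            # gain and node id break ties, so progress is always made.
            return (gain * energy[node], gain, node)
        best = max(cover, key=score)
        reps.append(best)
        uncovered -= cover[best]
    return reps


readings = {  # synthetic (temperature, humidity) reading vectors per node
    "a": (20.0, 21.0), "b": (20.2, 21.1), "c": (20.1, 20.9),
    "d": (30.0, 31.0), "e": (30.3, 31.2),
}
energy = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.2, "e": 0.8}
print(select_representatives(readings, energy, eps=0.5))  # ['a', 'e']
```

Only the representatives would then report readings, with the remaining nodes approximated by the representative that covers them; `math.dist` requires Python 3.8 or later.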
Closing remarks were provided by Dr. Raju Vatsavai. He summarized the day's proceedings and thanked the invited speakers, sponsors, program committee members, and participants.

3. CONCLUSIONS

Extracting knowledge and emerging patterns from sensor data is a nontrivial task, and the challenges for the knowledge discovery community are expected to be immense. As evidenced by the participation in, and the quality of submissions to, the first and second Sensor-KDD workshops, Knowledge Discovery from Sensor Data (Sensor-KDD) is a growing area and an important specialty (sub-area) within knowledge discovery. The Sensor-KDD workshop has proven to be an attractive forum for researchers from academia, industry, and government to exchange ideas, initiate collaborations, and lay the foundation for the future of this important and growing area. The workshop witnessed lively participation from all quarters and generated interesting discussions immediately after each presentation as well as at the end of the workshop. All participants agreed to continue supporting the Sensor-KDD workshop. In addition to the ACM workshop proceedings, extended papers will be published as post-workshop proceedings in Springer’s well-known ‘Lecture Notes in Computer Science’ series.

4. ACKNOWLEDGMENTS

We would like to thank the authors of all submitted papers and the presenters. Their innovation and creativity resulted in a strong technical program. We are highly indebted to the program committee members, whose reviewing efforts ensured the selection of a competitive and strong technical program. The program committee included: Michaela Black, Andre Carvalho, Sanjay Chawla, Francisco Ferrer, Ray Hickey, Ralf Klinkenberg, Miroslav Kubat, Mark Last, Chang-Tien Lu, Elaine Parros Machado de Sousa, Sameep Mehta, Laurent Mignet, S. Muthu Muthukrishnan, Pedro Rodrigues, Josep Roure, Bernhard Seeger, Cyrus Shahabi, Mallikarjun Shankar, Alexandre Sorokine, Eiko Yoneki, Philip S. Yu, Nithya Vijayakumar, and Guangzhi Qu. We would like to thank our invited speakers, Dr. Kendra E. Moore, Program Manager, DARPA/IPTO, and Prof. Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign, who, despite their busy schedules, readily agreed to speak and delivered highly motivating and informative talks. We would like to thank Dr. Brian Worley, Director, Computational Sciences and Engineering Division (CSED), Oak Ridge National Laboratory (ORNL), for his encouragement, support, and continued patronage of the Sensor-KDD workshop series, and Dr. Budhendra Bhaduri, Group Leader of Geographic Information Science and Technology, CSED, ORNL, for his enthusiastic support and best paper award sponsorship. We would also like to thank the SensorNet program (url: http://www.sensornet.gov), managed by the Computational Sciences and Engineering Division at the Oak Ridge National Laboratory, and other collaborators.

This workshop report has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

5. WORKSHOP ORGANIZERS

Dr. Ranga Raju Vatsavai has been conducting research in spatiotemporal databases and data mining for the past 15 years. Before joining the Oak Ridge National Laboratory (ORNL) as a Research Scientist, he worked at IBM Research (2004-06; IIT-Delhi campus), the University of Minnesota (1999-2004; Twin Cities campus, MN), AT&T Labs (1998; Middletown, NJ), the Center for Development of Advanced Computing (1995-98; C-DAC, University of Pune campus, India), and the National Forest Data Management Center (1990-95; FRI Campus, Dehradun, India). He has published over thirty peer-reviewed articles and served on the program committees of several international conferences (KDD, ICTAI, SSTDM). He was also involved in the design and development of several highly successful software systems: UMN-MapServer, a world-leading open source WebGIS; *Miner, a spatiotemporal data mining workbench; the EASI/PACE classification modules; and the first parallel softcopy photogrammetry system for the IRS-1C/1D satellites. His broad research interests center on spatial and spatiotemporal databases, data mining, and computational geoinformatics; in particular, he is interested in statistical pattern recognition, semi-supervised learning, multiple classifier systems, time series analysis and forecasting, information retrieval, and uncertainty and error handling.

Dr. Olufemi A. Omitaomu is a research associate in the Computational Sciences and Engineering Division at the Oak Ridge National Laboratory. His research interests include data mining and knowledge discovery from distributed sensor data, infrastructure modeling and interdependencies, machine learning, and uncertainty analysis. He received his Ph.D. in information engineering from the University of Tennessee. He has published in top peer-reviewed journals and conferences, and has co-organized and co-chaired workshops and sessions at professional conferences, including the ACM Workshop on Knowledge Discovery from Sensor Data held in conjunction with ACM SIGKDD 2007 and ACM SIGKDD 2008. He previously worked as a data analyst with Mobil Exploration and Production Company for more than five years.

Dr. Joao Gama is a researcher at LIAAD-INESC Porto LA, the Laboratory of Artificial Intelligence and Decision Support of the University of Porto. His main research interest is learning from data streams. He has published several articles on change detection, learning decision trees from data streams, hierarchical clustering from streams, and related topics. He has edited special issues on data streams in Intelligent Data Analysis, the Journal of Universal Computer Science, and New Generation Computing. He was co-chair of ECML 2005 (Porto, Portugal) and is conference chair of Discovery Science 2009. He has also chaired a series of Workshops on Knowledge Discovery in Data Streams (ECML 2004, Pisa, Italy; ECML 2005, Porto, Portugal; ICML 2006, Pittsburgh, USA; ECML 2006, Berlin, Germany; SAC 2007, Korea) and the ACM Workshop on Knowledge Discovery from Sensor Data held in conjunction with ACM SIGKDD 2007 and ACM SIGKDD 2008. Together with M. Gaber, he edited the book Learning from Data Streams: Processing Techniques in Sensor Networks, published by Springer.

Dr. Nitesh V. Chawla is an assistant professor at the University of Notre Dame. Dr. Chawla’s research interests are broadly in the areas of data mining, machine learning, pattern recognition, and their applications. More specifically, his research has focused on learning from massive datasets,
distributed data mining/machine learning, ensemble techniques, cost/distribution-sensitive learning, feature selection, and semi-supervised learning. His research has also focused on interdisciplinary applications such as intelligent scientific visualization, biometrics, bioinformatics, natural language processing, and customer analytics.

Dr. Mohamed Medhat Gaber is a Research Fellow at Monash University, Australia. He has published more than 60 papers. Mohamed is the co-editor of the books Learning from Data Streams: Processing Techniques in Sensor Networks (Springer, 2007) and Knowledge Discovery from Sensor Data (CRC), which is due to appear by the end of 2008. His research interests include data stream mining, wireless sensor networks, and context-aware computing. Mohamed has served on the program committees of several international and local conferences and workshops in the areas of data mining and context-aware computing. He was co-chair of the IEEE International Workshop on Mining Evolving and Streaming Data held in conjunction with ICDM 2006, the International Workshop on Knowledge Discovery from Ubiquitous Data Streams held in conjunction with ECML/PKDD 2007, and the First and Second International Workshops on Knowledge Discovery from Sensor Data held in conjunction with ACM SIGKDD 2007/2008.

Dr. Auroop R. Ganguly has been a research scientist in the Computational Sciences and Engineering Division of the Oak Ridge National Laboratory since 2004. His research interests are climate change impacts, geoscience informatics, computational civil and environmental engineering, data sciences, and knowledge discovery. Prior to ORNL, he spent more than five years in the software industry, specifically at Oracle Corporation and at a best-of-breed company subsequently acquired by Oracle, and about a year in academia, at the University of South Florida in Tampa. He has a PhD from the Civil and Environmental Engineering department of the Massachusetts Institute of Technology, several years of research experience with a group at the MIT Sloan School of Management, experience in private consulting, and a wide range of peer-reviewed publications spanning multiple disciplines. Currently, he is also an adjunct professor at the University of Tennessee in Knoxville.

6. INVITED SPEAKERS

Dr. Kendra Moore’s research interests include automatic pattern learning and change detection in complex spatiotemporal data streams. This spans learning data representations, activity and movement models, and adapting to changes as they occur. Dr. Moore is also interested in developing technology to understand, support, and assess peer production-based information fusion systems. Dr. Moore currently manages the Predictive Analysis for Naval Deployment Activities (PANDA) program. She also managed the Fast Connectivity for Coalition Agents Program (Fast C2AP), which transitioned to the US Navy’s GCCS-M program in October 2007. Dr. Moore joined DARPA in 2005. Prior to joining DARPA, she was president and founder of Advestan, Inc., where she provided R&D consulting services to DoD customers and contractors in advanced estimation, analysis, and exploitation for large-scale information fusion applications. Before starting Advestan, Dr. Moore was the Director of Information Fusion
at ALPHATECH, Inc. (now BAE Systems). She also served on the Problem-Centered Intelligence, Surveillance, and Reconnaissance (PCISR) study panel to develop recommendations for new all-source ISR architectures for a national intelligence agency. Prior to that, she developed, extended, at Inc. systems (now BAE Systems). She also served andALPHATECH, applied large-scale analysis techniques to a wide on the Problem-Centered Intelligence, Surveillance, and Rerange of military command and control systems. connaissance (PCISR) study panel develop recommenJiawei Han is a Professor in thetoDepartment of ComDr. dations for new all-source ISRofarchitectures for a national puter Science at the University Illinois at Urbana-Champaign. intelligence agency. Prior to that, she developed, extended, His research expertise include data mining, data warehousand applied large-scale analysisfrom techniques to a wide ing, database systems,systems data mining spatiotemporal range of military command and and control systems. data, multimedia data, stream RFID data, social netDr. Jiawei Hanbiological is a Professor in the Department of Comwork data, and data. He has written over 350 jourputer Science at thepublications. University of He Illinois at Urbana-Champaign. nal and conference has chaired or served in His expertise include data mining, dataconferences warehousoverresearch 100 program committees of international ing, workshops, database systems, data spatiotemporal and including PC mining cochairfrom of 2005 (IEEE) Indata, multimedia data, on stream RFID data, social netternational Conference Dataand Mining (ICDM), Americas work data, and biological data. HeConference has writtenon over 350Large jourCoordinator of 2006 International Very nal and conference publications. has chaired or served in Data Bases (VLDB). 
He is also serving as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is an ACM Fellow and has received the 2004 ACM SIGKDD Innovations Award and the 2005 IEEE Computer Society Technical Achievement Award. His book “Data Mining: Concepts and Techniques” (2nd ed., Morgan Kaufmann, 2006) has been widely used as a textbook worldwide.

7. REFERENCES

[1] A. Ganguly, J. Gama, O. Omitaomu, M. Gaber, and R. R. Vatsavai, editors. Intelligent Data Analysis, An International Journal. IOS Press, Nieuwe Hemweg 6B, 1013 BG Amsterdam, The Netherlands, 2008.

[2] A. Ganguly, J. Gama, O. Omitaomu, M. Gaber, and R. R. Vatsavai, editors. Knowledge Discovery from Sensor Data. CRC Press, Boca Raton, FL 33487, 2008. http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=82329&isbn=9781420082326&parent_id=&pc=.

[3] Sensor-KDD 2007. First international workshop on knowledge discovery from sensor data (Sensor-KDD 2007). http://www.ornl.gov/sci/knowledgediscovery/KDD2007-Workshop/.

[4] Sensor-KDD Program Committee. Second international workshop on knowledge discovery from sensor data (Sensor-KDD 2008). http://www.ornl.gov/sci/knowledgediscovery/SensorKDD2008/organizers.htm.

[5] R. R. Vatsavai, O. Omitaomu, J. Gama, M. Gaber, and A. Ganguly. Second international workshop on knowledge discovery from sensor data (Sensor-KDD 2008). http://www.ornl.gov/sci/knowledgediscovery/SensorKDD2008/index.htm, 2008. ACM.

SIGKDD Explorations

Volume 10, Issue 2

Page 73
