Department of Electronic Engineering, Tsinghua University
Streaming Similarity Search on FPGA based on Dynamic Time Warping
Yu WANG
Associate Prof., Head, Research Institution of Circuits and Systems, E.E. Dept., Tsinghua University, Beijing, China
http://nics.ee.tsinghua.edu.cn/people/wangyu/
Joint work by Tsinghua University and IBM China Research Lab
Based on a paper submitted to FPGA 2013
Nano-scale Integrated Circuit and System Lab.
Outline
Background and Motivation
  Why we need streaming similarity search
  Recent achievements and problems to solve
Subsequence Similarity Search on FPGA
  Algorithms
  Hardware Architectures
  Results
Conclusion and future work
Alberto Sangiovanni-Vincentelli, “ICCAD at 30 Years: Where We Have Been, Where We Are Going” (Tuesday noon @ ICCAD 2012)
Internet of Things
Nowadays:
  Independent applications
  Traditional database techniques
  Small scale (“small IoTs”)
  Monitoring only
Future:
  Fully connected and correlated applications
  BIG DATA (temporally and spatially correlated streaming data): volume, variety, velocity
  Collection, publishing, processing, storage, and query for big data
  Advanced IT techniques
  Large-scale, large-volume data (“big IoTs”)
  Various real-time or non-real-time applications
  IoT data management system (IBM RODB©)
RODB: Realtime Oriented DataBase
Application-specific data management middleware (collection, publishing, processing, storage, and query) serving different applications
Data formats from IoT (CPS, SoS, etc.)
Industries in Smarter Planet, grouped by format of data:
RFID: Retail, Logistics
Numerical data streams from various sensors (time series): Mineral, Steel, Manufacturing, E&U, Petro, Chemistry, Smart building
Multimedia data and sensor data: Smart City, Healthcare, Environment monitoring, Transportation
Mining Task Dependency (not complete)
No history data involved; may have real-time requirements
Similarity Search
Correlation Discovery
Data Privacy
History data analysis
Segmentation
Motif Discovery
Visualization
Classification
Clustering
Novelty/Anomaly detection
Rule Discovery
Prediction
Burst Detection
“Similarity” Search
Finite-field subsequence exact search
Object: strings, e.g. find “pattern” in “we have a pattern here” with KMP (Knuth–Morris–Pratt)
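For the exact-search case, the KMP algorithm named above can be sketched in Python as follows. This is a textbook version for illustration, not code from the paper; the function names `build_failure` and `kmp_search` are hypothetical. The failure table lets the scan fall back within the pattern instead of re-reading the text, giving O(len(text) + len(pattern)) time.

```python
def build_failure(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every start index where pattern occurs in text."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back in the pattern; the text is never re-scanned
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping matches are also reported
    return hits


print(kmp_search("we have a pattern here", "pattern"))  # [10]
```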
Finite-field subsequence similarity search
Object: DNA chains and protein sequences, e.g. find subsequences similar to “ATGAG” in the DNA chain “ATGACTGAG…” with Smith-Waterman
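The Smith-Waterman idea can be sketched as a local-alignment score computation. The scoring values (match +2, mismatch −1, linear gap −1) and the function name `smith_waterman` are illustrative assumptions, not parameters from the paper; real aligners usually use substitution matrices and affine gap penalties. Clamping each cell at 0 is what makes the alignment local: an alignment may start anywhere in the chain.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> tuple[int, int]:
    """Best local-alignment score of query a against chain b, and the index in b
    where that best alignment ends. Scoring parameters are illustrative assumptions."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    best, best_end = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 restarts the alignment, so only locally similar regions accumulate score
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_end = H[i][j], j - 1
    return best, best_end


print(smith_waterman("ATGAG", "ATGACTGAG"))  # (9, 8)
```

With these assumed scores, the best hit aligns “ATGAG” against the tail “ACTGAG” of the chain with one gap (5 matches × 2 − 1 = 9).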
Infinite-field subsequence similarity search
Object: time series data, e.g. the next slide
Simple data representation: tuple [Sensor, Time, Value]
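A minimal sketch of the [Sensor, Time, Value] tuple and of splitting an interleaved stream into one time series per sensor. The `Reading` type, the `split_by_sensor` helper, and the sensor names are hypothetical, and the sketch assumes tuples arrive in time order.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Reading(NamedTuple):
    """One streaming sample in [Sensor, Time, Value] form (hypothetical type)."""
    sensor: str
    time: float
    value: float


def split_by_sensor(stream: Iterable[Reading]) -> dict[str, list[float]]:
    """Group values per sensor; assumes readings already arrive in time order."""
    series = defaultdict(list)
    for r in stream:
        series[r.sensor].append(r.value)
    return dict(series)


stream = [Reading("ecg-1", 0.0, 0.12), Reading("temp-3", 0.0, 21.5),
          Reading("ecg-1", 0.004, 0.15)]
print(split_by_sensor(stream))  # {'ecg-1': [0.12, 0.15], 'temp-3': [21.5]}
```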
Streaming Subsequence Similarity Search
[Figure: electrocardiogram time series (samples 0 to 400) with the query pattern; distance labelled d]
Time series (electrocardiogram) and pattern (query)
Pick out subsequences with a sliding window (N subsequences in total)
Compare each subsequence with the pattern under a chosen distance measure to judge whether they are similar
Time complexity: O(N · C_dist), where C_dist is the cost of one distance computation
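The brute-force procedure above can be sketched over a finite buffer of the stream. Euclidean distance is only a placeholder for “a chosen distance measure” (DTW could be passed in instead), and the function names and the threshold-based similarity test are assumptions, not the paper's method. The loop makes the O(N · C_dist) cost explicit: N windows, one distance computation each.

```python
import math
from typing import Callable, Sequence

Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(s: Sequence[float], p: Sequence[float]) -> float:
    """Placeholder distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s, p)))


def subsequence_search(series: Sequence[float], pattern: Sequence[float],
                       threshold: float, dist: Distance = euclidean) -> list[tuple[int, float]]:
    """Slide a window of len(pattern) over series; return (start, distance)
    for every window whose distance to the pattern is within threshold."""
    m = len(pattern)
    hits = []
    for start in range(len(series) - m + 1):  # N = len(series) - m + 1 windows
        d = dist(series[start:start + m], pattern)  # one distance per window
        if d <= threshold:
            hits.append((start, d))
    return hits


series = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0]
print(subsequence_search(series, [1.0, 2.0, 1.0], threshold=0.5))  # [(1, 0.0), (5, 0.0)]
```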
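Distance Measure: Dynamic Time Warping
$P = p_1, p_2, p_3, \dots, p_M$; $S = s_1, s_2, s_3, \dots, s_M$
Step 1: calculate the distance $\mathrm{dist}(s_i, p_j)$ between every pair of points
Step 2: accumulate the distances with the recurrence
$$D(i, j) = \mathrm{dist}(s_i, p_j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$$
with $D(0, 0) = 0$ and $D(i, 0) = D(0, j) = \infty$ for $1 \le i, j \le M$
$$\mathrm{DTW}(S, P) = D(M, M)$$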
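A direct transcription of the DTW recurrence into Python, assuming dist(s_i, p_j) = |s_i − p_j| for scalar samples (the slide does not fix the point distance). The function name `dtw` is hypothetical. The sketch also accepts S and P of different lengths; with both of length M it returns D(M, M) after filling an M × M table, so each window costs O(M²).

```python
import math
from typing import Sequence


def dtw(s: Sequence[float], p: Sequence[float]) -> float:
    """DTW(S, P) via D(i,j) = dist(s_i, p_j) + min(D(i-1,j), D(i,j-1), D(i-1,j-1))."""
    n, m = len(s), len(p)
    # Boundary: D(0,0) = 0, D(i,0) = D(0,j) = infinity
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - p[j - 1])  # dist(s_i, p_j); absolute difference is an assumption
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


print(dtw([0, 1, 1, 2], [0, 1, 2, 2]))  # 0.0
```

Here DTW warps the repeated 1 in S onto the single 1 in P and the trailing 2 in S onto both 2s in P, so the distance is 0.0, whereas a point-by-point Euclidean comparison of the same sequences gives 1.0.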
$\mathrm{DTW}(S_{s,e}, P) = D(e, M)$
i-R< sp(i-1,j)+j