Streaming Similarity Search on FPGA based on Dynamic Time Warping

Author: Beatrice Powers
Department of Electronic Engineering, Tsinghua University

Streaming Similarity Search on FPGA based on Dynamic Time Warping
Yu WANG, Associate Prof., Head, Research Institution of Circuits and Systems, E.E. Dept, Tsinghua University, Beijing, China

http://nics.ee.tsinghua.edu.cn/people/wangyu/
Joint work by Tsinghua Univ. and IBM China Research Lab
Based on a paper submitted to FPGA 2013
Nano-scale Integrated Circuit and System Lab.


Outline
- Background and Motivation
  - Why we need streaming similarity search
  - Recent achievements and problems to solve
- Subsequence Similarity Search on FPGA
  - Algorithms
  - Hardware Architectures
  - Results
- Conclusion and future work


Alberto Sangiovanni-Vincentelli (Tuesday noon @ ICCAD 2012)
"ICCAD at 30 Years: Where We Have Been, Where We Are Going"


Internet of Things

Nowadays
- Independent applications
- Traditional database techniques
- Small scale ("small IoTs")
- Monitoring only

Future
- Fully connected and correlated applications
- Advanced IT techniques
- Large scale and large-volume data ("big IoTs")
- Different real-time or non-real-time applications

BIG DATA (time- and spatially-correlated streaming data): Volume, Variety, Velocity
Collection, Publish, Processing, Storage, and Query for BIG DATA

IoT Data Management System (IBM RODB©)


RODB: Realtime Oriented DataBase
Different Applications
Application-Specific Data Management Middleware (Collection, Publish, Processing, Storage, and Query)


Data format from IoT (CPS, SoS, etc.)

Format of Data

Industries in Smarter Planet

RFID

Retail

Logistic

Numerical data streams from various sensors (time series)

Mineral

Steel

Manufactory

E&U

Petro

Chemistry

Smart building

Multi-media data and sensor data

Smart City

Healthcare

Environment monitoring

Transportation

Mining Task Dependency (Not Complete)

No history data involved

May have real-time requirements

Similarity Search

Correlation Discovery

Data Privacy

History data analyses

Segmentation

Motif Discovery

Visualization

Classification

Clustering

Novelty/Anomaly detection

Rule Discovery

Prediction

Burst Detection

"Similarity" Search
- Finite-field subsequence exact search
  - Object: string
  - e.g. find "pattern" in "we have a pattern here" with KMP (Knuth-Morris-Pratt)
- Finite-field subsequence similarity search
  - Object: DNA chain, protein sequence
  - e.g. find a subsequence similar to "ATGAG" in a DNA chain "ATGACTGAG…" with Smith-Waterman
- Infinite-field subsequence similarity search
  - Object: time series data
  - e.g. next slide
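To make the first case concrete, here is a minimal Python sketch of exact subsequence search with the Knuth-Morris-Pratt algorithm named in the list. The function name `kmp_search` and its return convention (index of the first match, or -1) are assumptions chosen for illustration; they do not come from the slides.

```python
def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # Failure table: fail[i] is the length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # Scan the text once, reusing the table instead of backtracking.
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(kmp_search("we have a pattern here", "pattern"))
```

Exact matching like this only works when symbols can be compared for equality. For real-valued time series the comparison has to be replaced by a distance measure, which is the case the following slides address.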

Simple data representation: Tuple [Sensor, Time, Value]

Streaming Subsequence Similarity Search

[Figure: time series and query pattern; horizontal axis 0 to 400, vertical axis -3 to 2, distance d]

- Time series (electrocardiogram) and pattern (query)
- Pick out subsequences with a sliding window (N subsequences in total)
- Compare each subsequence with the pattern, under a certain distance measure, to judge whether they are similar

Time complexity: O(N * O(distance))
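The sliding-window procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the names `sliding_window_search`, `euclidean`, and `threshold` are assumptions, and Euclidean distance stands in for the unspecified "certain distance measure".

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sliding_window_search(series, pattern, distance, threshold):
    """Return (start, distance) for every window within threshold of pattern."""
    m = len(pattern)
    matches = []
    # One window per start position: N = len(series) - m + 1 subsequences.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = distance(window, pattern)  # one distance evaluation per window
        if d <= threshold:
            matches.append((start, d))
    return matches


series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
print(sliding_window_search(series, [1, 2, 3], euclidean, 0.0))
```

The loop body runs once per window and calls the distance function once each time, which is where the O(N * O(distance)) cost comes from.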

Distance Measure: Dynamic Time Warping

$P = p_1, p_2, p_3, \ldots, p_M$; $S = s_1, s_2, s_3, \ldots, s_M$

Step 1: Calculate the distance between each pair of points.

$$\mathrm{DTW}(S, P) = D(M, M)$$

$$D(i, j) = \mathrm{dist}(s_i, p_j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$$

$$D(0, 0) = 0; \quad D(i, 0) = D(0, j) = \infty, \quad 1 \le i, j \le M$$

$$\mathrm{DTW}(S_{s,e}, P) = D(e, M)$$

i-R< sp(i-1,j)+j
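The basic recurrence can be checked with a short Python sketch. It covers only the classic full-matrix DTW between two sequences, with D(0,0) = 0 and infinite borders; the function name `dtw` and the default point distance `abs(a - b)` are assumptions, and the subsequence variant with the window parameter R is not covered because its definition is incomplete here.

```python
import math


def dtw(s, p, dist=lambda a, b: abs(a - b)):
    """Classic DTW distance D(len(s), len(p)); dist is an assumed point metric."""
    m, n = len(s), len(p)
    # D has one extra row and column for the boundary conditions.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # D(i, j) = dist(s_i, p_j) + min of the three neighbouring cells.
            D[i][j] = dist(s[i - 1], p[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]
            )
    return D[m][n]


print(dtw([1, 2, 3, 3], [1, 2, 2, 3]))
```

The two sequences in the usage line differ point by point but have DTW distance zero, because warping lets one sample of a sequence align with several samples of the other. Filling the whole matrix costs O(M^2) per comparison, which is the O(distance) term in the sliding-window complexity.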
