EXPERIMENTAL METHODS AND TECHNIQUES IN ENGINEERING

The Data Management Perspective
Fabio A. Schreiber
Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria

THE DATA MANAGEMENT PERSPECTIVE 1







- Experiments on Databases and DBMSs
- Data organization and management as a service to the experiments of the scientific community
- Experimenting with the Database content itself

F. A. Schreiber

Experimental Methods ... Data Perspective

EXPERIMENTS & DATA MANAGEMENT 2

- Experiments for optimizing data structures and management
  - Database Management System (DBMS)
  - Data Structures
    - Conceptual/Logical Schema optimization and evolution
    - Physical structures design
- Data organization and management for collecting experimental results (e-Science)
- Exploring Database content (data mining)
- Assessing Data Quality

EXPERIMENTS & DATA MANAGEMENT 3

Goals
- Systems Performance Evaluation and Tuning
  - How well does a system perform?
  - How can I improve its performance?
- Benchmarking
  - Comparison among different systems under similar workload
- System Effectiveness
  - How well a system conforms to the user's needs w.r.t. a defined metric

WORKLOAD AND FACTORS 4

- Synthetic vs. Real
  - A synthetic workload allows for controlled, repeatable experiments. Useful in systems comparison
  - A real workload can be highly variable and can be used to assess the overall performance of a single system in its real environment
- Single-user (to test specific algorithms) vs. Multi-user (to test system procedures)
  - Multiprogramming level
  - Query mix
  - Degree of data sharing (buffer and cache sizes)

FACTORS IN DBMS PERFORMANCE EVALUATION (Boral & DeWitt 84)[2] 5

- Multiprogramming level (MPL)
  - Number of concurrent queries in any phase of execution
  - Use precompiled queries and minimize the data volume of the results, so as to expose the true «execution» time as far as possible
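The two recommendations on this slide can be sketched as a small timing harness. This is only an illustration under stated assumptions, not the methodology of [2]: SQLite stands in for the DBMS under test, a fixed parameterised statement stands in for a precompiled query, `COUNT(*)` keeps the result volume to a single row, and the table `readings` and the helpers `build_database`, `run_query` and `measure` are hypothetical names introduced here.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

# Fixed, parameterised statement: a stand-in for a precompiled query.
# COUNT(*) returns one row, so result transfer does not dominate the timing.
QUERY = "SELECT COUNT(*) FROM readings WHERE sensor_id = ?"


def build_database(path, sensors=8, rows_per_sensor=2000):
    """Create a small synthetic table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE readings (sensor_id INTEGER, value REAL)")
        conn.executemany(
            "INSERT INTO readings VALUES (?, ?)",
            ((s, float(i)) for s in range(sensors) for i in range(rows_per_sensor)),
        )
        conn.execute("CREATE INDEX idx_sensor ON readings (sensor_id)")
        conn.commit()
    finally:
        conn.close()


def run_query(path, sensor_id):
    """Run one query on its own connection; return (row count, seconds)."""
    conn = sqlite3.connect(path)
    try:
        start = time.perf_counter()
        (count,) = conn.execute(QUERY, (sensor_id,)).fetchone()
        return count, time.perf_counter() - start
    finally:
        conn.close()


def measure(path, mpl, sensors=8):
    """Submit `mpl` queries to `mpl` worker threads; return counts and mean time."""
    with ThreadPoolExecutor(max_workers=mpl) as pool:
        results = list(pool.map(lambda i: run_query(path, i % sensors), range(mpl)))
    counts = [count for count, _ in results]
    mean_time = sum(elapsed for _, elapsed in results) / mpl
    return counts, mean_time


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "bench.db")
        build_database(db_path)
        for mpl in (1, 2, 4, 8):
            _, mean_time = measure(db_path, mpl)
            print(f"MPL={mpl}: mean query time {mean_time * 1e6:.0f} us")
```

The connection is opened before the timer starts, so only statement execution and the one-row fetch are measured. A real experiment would repeat each MPL setting many times and would use the target DBMS's own prepared-statement facility.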


FACTORS IN DBMS PERFORMANCE EVALUATION 6

- Degree of Data Sharing (DDS)
  - Concurrent access affects both data pages (rare) and index pages (frequent)
  - Expressed as a percentage of the multiprogramming level:
    - 0%: each query references only its partition
    - 100%: all queries reference the same partition

0% O = < o1,t1 >; < o2,t2 >; … < oj,tj >

ALGORITHM FOOTPRINT: 11 KB RAM; 1 KB ROM

EXPERIMENTAL SET UP 42

- 100 Ω < ZR(t) < 1000 Ω (measured)
- R = 1 Ω, hence R + ZR ≈ ZR
- 0 mA < i < 30 mA
- 0 mV < ΔV < 30 mV
- ΔV = v2

[Figure: measurement circuit with source v1 (+ -), impedance Z(t), series resistor R and current i; label: (Data sheet)]
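The ranges on the experimental set-up slide are mutually consistent if ΔV is read as the voltage drop across the 1 Ω series resistor, i.e. ΔV = R · i. That reading is an assumption made here, not something the slide states. The sketch below checks it and also quantifies the approximation R + ZR ≈ ZR; the function names are hypothetical.

```python
R = 1.0                        # series resistor in ohm (value from the slide)
Z_MIN, Z_MAX = 100.0, 1000.0   # range of ZR(t) in ohm (values from the slide)
I_MIN, I_MAX = 0.0, 30e-3      # current range in ampere (values from the slide)


def delta_v(current, resistance=R):
    """Voltage drop across the series resistor, assuming delta_V = R * i."""
    return resistance * current


def approximation_error(z, resistance=R):
    """Relative error made by replacing R + ZR with ZR."""
    return resistance / (resistance + z)


if __name__ == "__main__":
    low_mv, high_mv = delta_v(I_MIN) * 1e3, delta_v(I_MAX) * 1e3
    print(f"delta_V range: {low_mv:.0f} mV to {high_mv:.0f} mV")
    print(f"largest error of R + ZR ~ ZR:  {approximation_error(Z_MIN):.2%}")
    print(f"smallest error of R + ZR ~ ZR: {approximation_error(Z_MAX):.2%}")
```

Under this assumption the 0 to 30 mA current range maps directly onto the 0 to 30 mV range quoted for ΔV, and neglecting R changes the total impedance by less than 1% over the whole range of ZR.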

ALGORITHM BEHAVIOUR 43

WITHOUT AGGREGATION (BYPASS): 60 TRANSMITTED VALUES, 120 mJ
WITH AGGREGATION: 7 TRANSMITTED VALUES, 30 mJ
70% ENERGY SAVINGS

COMPARISON CRITERIA (1/2) 44

Two real-world data sets have been processed by using the proposed algorithm and two other aggregation algorithms:
- I. Lazaridis, S. Mehrotra, Capturing Sensor-Generated Time Series with Quality Guarantees, in: ICDE, 2003, pp. 429–439.
- T. Schoellhammer, E. Osterweil, B. Greenstein, M. Wimbrow, D. Estrin, Lightweight Temporal Compression of Microclimate Datasets, in: LCN, 2004, pp. 516–524.

COMPARISON CRITERIA (2/2) 45

The comparison among algorithms has been based on three main criteria:
- Compression rate: the degree to which data have been aggregated.
- Energy savings: the degree to which the aggregation allows sensors to save energy with respect to the case in which all the original values are sent to the base station.
- Correctness: the degree to which the aggregated data allow the base station to retrieve the original trend. Correctness has been evaluated by using the Mean Absolute Error (MAE) and the related Mean Absolute Percentage Error (MA%E).
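The three criteria can be written down as short functions. The slide does not give formulas, so the definitions below are assumptions: compression rate and energy savings are taken as one minus the ratio between the aggregated case and the send-everything case, and MAE and MA%E use their usual textbook definitions. The sample values in the usage part are made up for illustration and are not results from the data sets.

```python
def compression_rate(n_original, n_transmitted):
    """Fraction of the original values that did not need to be transmitted."""
    return 1.0 - n_transmitted / n_original


def energy_savings(energy_all_values, energy_aggregated):
    """Fraction of energy saved with respect to sending every original value."""
    return 1.0 - energy_aggregated / energy_all_values


def mean_absolute_error(original, reconstructed):
    """MAE between the original series and the one rebuilt at the base station."""
    if len(original) != len(reconstructed):
        raise ValueError("series must have the same length")
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)


def mean_absolute_percentage_error(original, reconstructed):
    """MA%E in percent; not defined when an original value is zero."""
    if len(original) != len(reconstructed):
        raise ValueError("series must have the same length")
    if any(o == 0 for o in original):
        raise ValueError("MA%E is undefined for zero-valued samples")
    total = sum(abs((o - r) / o) for o, r in zip(original, reconstructed))
    return 100.0 * total / len(original)


if __name__ == "__main__":
    # Made-up samples, only to exercise the functions.
    original = [0.150, 0.160, 0.170, 0.180]
    reconstructed = [0.150, 0.165, 0.170, 0.175]
    print(f"compression rate: {compression_rate(len(original), 2):.1%}")
    print(f"MAE:  {mean_absolute_error(original, reconstructed):.4f}")
    print(f"MA%E: {mean_absolute_percentage_error(original, reconstructed):.2f}%")
```

MA%E is only meaningful for signals that stay away from zero, which is why the function refuses zero-valued samples rather than returning an arbitrary number.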


DATA SET (A) RESULTS 46

[Figure: data set (A), C2N2 absorption spectrum, signal in [V] versus [t], with markers a, b, c; curve labelled Cappiello and Schreiber]

DATA SET (A) RESULTS 47

[Figure: data set (A), signal in [V] versus [t], with markers a, b, c; curve labelled Lazaridis et al.]

DATA SET (A) RESULTS 48

[Figure: data set (A), signal in [V] versus [t], with markers a, b, c; curve labelled Schoellhammer et al.]

DATA SET (A) RESULTS 49

[Figure: bar charts of Compression rate and Energy Reduction for [Authors], [Lazaridis et al.] and [Schoellhammer et al.]; annotation: MAE in case of non linear trends]

DATA SET (B) RESULTS 50

[Figure: data set (B), C2N2 absorption spectrum FM versus [t]; curves labelled Cappiello and Schreiber and Input data set; annotation: Systematic error due to the processing time shift]

DATA SET (B) RESULTS 51

[Figure: data set (B) versus [t]; curves labelled Lazaridis et al. and Input Data Set]

DATA SET (B) RESULTS 52

[Figure: data set (B) versus [t]; curves labelled Schoellhammer et al. and Input data set]

DATA SET (B) RESULTS 53

[Figure: bar charts of Compression rate and Energy Savings for [Authors], [Lazaridis et al.] and [Schoellhammer et al.]]

SUMMARY COMPARISON AND COMMENTS 54

- No single algorithm is «the best»
- Transmission procedures with packet-based protocols can affect the analysis
  - Higher packing factors improve energy efficiency
  - Higher transmission delays negatively affect timeliness
- Adaptable procedures should be used on the basis of
  - The peculiar features of the signals to be processed
  - The quality requirements of the applications
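The trade-off between packing factor and timeliness can be made concrete with a toy model. Everything in it is an assumption introduced for illustration: each packet is charged a fixed overhead plus a per-value cost, values are buffered until a packet is full, and the cost figures in the usage part are invented rather than measured.

```python
import math


def packets_needed(n_values, packing_factor):
    """Number of packets required to ship n_values at the given packing factor."""
    return math.ceil(n_values / packing_factor)


def transmission_energy(n_values, packing_factor, per_packet_mj, per_value_mj):
    """Toy energy model: fixed cost per packet plus a cost per transmitted value."""
    return packets_needed(n_values, packing_factor) * per_packet_mj + n_values * per_value_mj


def worst_case_delay(packing_factor, sample_period_s):
    """Time the first value of a packet waits until the packet is full."""
    return (packing_factor - 1) * sample_period_s


if __name__ == "__main__":
    # Invented figures: 60 values, 1.0 mJ per packet, 0.5 mJ per value, 2 s sampling.
    for packing_factor in (1, 5, 10, 20):
        energy = transmission_energy(60, packing_factor, 1.0, 0.5)
        delay = worst_case_delay(packing_factor, 2.0)
        print(f"packing={packing_factor:2d}: energy={energy:5.1f} mJ, delay={delay:4.1f} s")
```

In this model the energy falls as the packing factor grows because fewer packet overheads are paid, while the buffering delay grows linearly, which is the tension the slide points to when it calls for procedures adapted to the application's quality requirements.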


Programs and Data 55

Philosophy without Science is empty, Science without Philosophy is blind
I. Kant

PARAPHRASE

Programs without Data are empty, Data without Programs are blind
F. A. Schreiber

SUMMARY AND CONCLUSIONS 56

- Experiments on Databases and DBMSs for optimizing data structures and management, including Data Quality
- Data organization and management as a service to the experiments of the scientific community
- Experimenting with the Database content itself (data mining)

Experimentation is both:
- a science, because it requires formal and rigorous methodologies, languages, and instruments
- an art, because it requires intuition, fantasy, and … it gives emotions

BIBLIOGRAPHICAL REFERENCES 57

1. Babu S. et al. – Automated Experiment-Driven Management of (Database) Systems – Proc. 12th HotOS, pp. 1–5, 2009
2. Boral H., DeWitt D. J. – A Methodology for Database Systems Performance Evaluation – SIGMOD Record, Vol. 14, n. 2, pp. 176–185, 1984
3. Brown D. et al. – High energy nuclear database: a testbed for nuclear data information technology – Int. Conf. on Nuclear Data for Science and Technology, art. 250, 2007
4. Cappiello C., Schreiber F. A. – Experiments and analysis of quality and Energy-aware data aggregation approaches in WSNs – 10th Int. Workshop on Quality in Databases QDB 2012, Istanbul, Aug. 26, 2012, pp. 1–8 http://www.purdue.edu/discoverypark/cyber/qdb2012/papers/7data%20aggregation.pdf
5. Curino C. et al. – Schema Evolution in Wikipedia: Toward a Web Information System Benchmark – Proc. ICEIS, pp. 323–332, 2008
6. Curino C. et al. – Graceful Database Schema Evolution: the PRISM Workbench – Proc. VLDB'08, pp. 761–772, 2008
7. Davcev D. et al. – Experiments in Data Management for Wireless Sensor Networks – Proc. 2nd Int. Conf. on Sensor Technologies and Applications, pp. 198–202, 2008
8. Manolescu I. et al. – The Repeatability Experiment of SIGMOD 2008 – SIGMOD Record, Vol. 37, n. 1, pp. 39–45, 2008

BIBLIOGRAPHICAL REFERENCES 58

9. Marche S. – Measuring the stability of data models – European Journal of Information Systems, Vol. 2, n. 1, pp. 37–47, 1993
10. Masseroli M. – Management and Analysis of Genomic Functional and Phenotypic Controlled Annotations to Support Biomedical Investigation and Practice – IEEE Transactions on Information Technology in Biomedicine, Vol. 11, n. 4, pp. 376–385, 2007
11. Sjoberg D. I. – Quantifying schema evolution – Information and Software Technology, Vol. 35, n. 1, pp. 35–44, 1993
12. Stoeckert C. et al. – Microarray databases: standards and ontologies – Nature Genetics, Vol. 32, pp. 469–473, 2002
13. Szalay A., Gray J. – The world-wide telescope – Science, Vol. 293, pp. 2037–2040, 2001
14. Vanschoren J., Blockeel H. – Experiment Databases – In: Dzeroski S., Goethals B., Panov P. (Eds.), Inductive Databases and Queries: Constraint-based Data Mining, Chapt. 14, Springer, pp. 335–360, 2010
15. Schwartz B. – The four fundamental performance metrics – PERCONA, 2011 http://www.mysqlperformanceblog.com/2011/04/27/the-four-fundamental-performance-metrics/
16. http://www.tpc.org/information/benchmarks.asp
