EXPERIMENTAL METHODS AND TECHNIQUES IN ENGINEERING The Data Management Perspective Fabio A. Schreiber
Politecnico di Milano Dipartimento di Elettronica, Informazione e Bioingegneria
THE DATA MANAGEMENT PERSPECTIVE 1
Experiments on Databases and DBMSs
Data organization and management as a service to the experiments of the scientific community
Experimenting with the Database content itself
F. A. Schreiber
Experimental Methods ... Data Perspective
EXPERIMENTS & DATA MANAGEMENT 2
Experiments for optimizing data structures and management
  Database Management System (DBMS)
  Data Structures
    Conceptual/Logical Schema optimization and evolution
    Physical structures design
Data organization and management for collecting experimental results (e-Science)
Exploring Database content (data mining)
Assessing Data Quality
EXPERIMENTS & DATA MANAGEMENT 3
Goals
Systems Performance Evaluation and Tuning
  How well does a system perform? How can I improve its performance?
Benchmarking
  Comparison among different systems under similar workloads
System Effectiveness
  How much a system conforms to the user’s needs w.r.t. a defined metric
WORKLOAD AND FACTORS 4
Synthetic vs. Real
  A synthetic workload allows for controlled, repeatable experiments. Useful in comparing systems
  A real workload can be highly variable and can be used to assess the overall performance of a single system in its real environment
Single-user (to test specific algorithms) vs. Multi-user (to test system procedures)
Multiprogramming level
Query mix
Degree of data sharing (buffer and cache sizes)
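The repeatability of a synthetic workload can be illustrated with a short sketch. It is not taken from any benchmark: the query type names, the mix weights, and the `make_workload` function are hypothetical, chosen only to show that a fixed seed yields the same query sequence on every run.

```python
import random

# Hypothetical query types and mix weights (assumptions, not from any benchmark).
QUERY_MIX = {
    "point_select": 0.6,
    "range_scan": 0.3,
    "update": 0.1,
}

def make_workload(n_queries, seed, mix=QUERY_MIX):
    """Return a reproducible list of query-type names drawn from the given mix."""
    rng = random.Random(seed)      # private generator, independent of global state
    kinds = list(mix)
    weights = [mix[k] for k in kinds]
    return rng.choices(kinds, weights=weights, k=n_queries)

if __name__ == "__main__":
    # The same seed always produces the same sequence of query types.
    print(make_workload(10, seed=42))
```

Changing the seed gives a different but equally reproducible sequence, so several systems can be compared on identical inputs.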
FACTORS IN DBMS PERFORMANCE EVALUATION (Boral & DeWitt 84)[2] 5
Multiprogramming level (MPL)
  Number of concurrent queries in any phase of execution
  Use precompiled queries and minimize the data volume of the results in order to expose, as much as possible, the true «execution» time
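A measurement loop over different multiprogramming levels might look like the following sketch. It is only an illustration: `run_query` is a hypothetical stand-in that sleeps instead of calling a DBMS, and `measure` is an assumed helper name. A real experiment would execute precompiled queries and discard or minimize the result sets.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(query_id):
    """Hypothetical stand-in for executing one precompiled query."""
    time.sleep(0.001)              # placeholder for the DBMS execution time
    return query_id

def measure(mpl, n_queries, query_fn=run_query):
    """Run n_queries with at most `mpl` in flight; return (completed, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=mpl) as pool:
        results = list(pool.map(query_fn, range(n_queries)))
    elapsed = time.perf_counter() - start
    return len(results), elapsed

if __name__ == "__main__":
    for mpl in (1, 2, 4, 8):
        done, elapsed = measure(mpl, 40)
        print(f"MPL={mpl}: {done} queries in {elapsed:.3f} s")
```

The worker-pool size plays the role of the MPL here; with a real DBMS each worker would hold its own connection.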
FACTORS IN DBMS PERFORMANCE EVALUATION 6
Degree of Data Sharing (DDS)
  Concurrent access affects both data pages (rare) and index pages (frequent)
  Expressed as a percentage of the multiprogramming level:
    0%: each query references only its partition
    100%: all queries reference the same partition
0%
O = <o1, t1>; <o2, t2>; … <oj, tj>
ALGORITHM FOOTPRINT: 11 KB RAM; 1 KB ROM
EXPERIMENTAL SET UP 42
[Circuit diagram; labels: v1 (+ −), Z(t), R, i, ΔV = v2]
100 Ω < ZR(t) < 1000 Ω (measured)
R = 1 Ω
R + ZR ≈ ZR
0 mA < i < 30 mA (data sheet)
0 mV < ΔV < 30 mV
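With R = 1 Ω, a ΔV reading in millivolts corresponds numerically to the current in milliamperes, which is why both ranges above run from 0 to 30. The sketch below shows that conversion. The energy estimate is an added assumption, not something stated on the slide: it treats v1 as the supply voltage and, since R + ZR ≈ ZR, approximates the consumed energy as the sum of v1 · i · Δt over the samples. The function names and sample values are hypothetical.

```python
R_OHM = 1.0   # from the slide: R = 1 Ω

def shunt_current(delta_v, r=R_OHM):
    """Ohm's law: current (A) through R from the voltage drop across it (V)."""
    return delta_v / r

def energy_joules(delta_v_samples, v_supply, dt):
    """Approximate energy (J) as the sum of v_supply * i * dt.

    Assumption: v_supply is the source voltage and R + ZR ≈ ZR, so the
    voltage drop across R itself is neglected.
    """
    return sum(v_supply * shunt_current(dv) * dt for dv in delta_v_samples)

if __name__ == "__main__":
    # Hypothetical readings: 10 mV and 20 mV, 3 V supply, 0.5 s sampling period.
    print(shunt_current(0.030))                    # 30 mV across 1 Ω
    print(energy_joules([0.010, 0.020], 3.0, 0.5))
```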
ALGORITHM BEHAVIOUR 43
WITHOUT AGGREGATION (BYPASS): 60 TRANSMITTED VALUES, 120 mJ
WITH AGGREGATION: 7 TRANSMITTED VALUES, 30 mJ
70% ENERGY SAVINGS
COMPARISON CRITERIA (1/2) 44
Two real-world data sets have been processed by using the proposed algorithm and two other aggregation algorithms:
I. Lazaridis, S. Mehrotra, Capturing Sensor-Generated Time Series with Quality Guarantees, in: ICDE, 2003, pp. 429–439.
T. Schoellhammer, E. Osterweil, B. Greenstein, M. Wimbrow, D. Estrin, Lightweight Temporal Compression of Microclimate Datasets, in: LCN, 2004, pp. 516–524.
COMPARISON CRITERIA (2/2) 45
The comparison among algorithms has been based on three main criteria:
Compression rate: the degree to which data have been aggregated.
Energy savings: the degree to which the aggregation allows sensors to save energy with respect to the case in which all the original values are sent to the base station.
Correctness: the degree to which the aggregated data allow the base station to retrieve the original trend. Correctness has been evaluated by using the Mean Absolute Error (MAE) and the related Mean Absolute Percentage Error (MA%E).
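The three criteria can be sketched in code as follows. The slides do not give formulas, so the definitions below are assumptions: compression rate and energy savings are taken as one minus the ratio of aggregated to original, and MAE and MA%E use their standard textbook forms. All function names and sample values are hypothetical.

```python
def compression_rate(n_original, n_transmitted):
    """Fraction of values not transmitted (assumed definition)."""
    return 1.0 - n_transmitted / n_original

def energy_savings(e_all_values, e_aggregated):
    """Fraction of energy saved versus sending every original value (assumed definition)."""
    return 1.0 - e_aggregated / e_all_values

def mae(original, reconstructed):
    """Mean Absolute Error between the original and the reconstructed series."""
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)

def mape(original, reconstructed):
    """Mean Absolute Percentage Error; undefined if any original value is zero."""
    total = sum(abs((o - r) / o) for o, r in zip(original, reconstructed))
    return 100.0 * total / len(original)

if __name__ == "__main__":
    # Hypothetical series, not taken from the experimental data sets.
    original = [0.16, 0.17, 0.18, 0.17]
    reconstructed = [0.16, 0.18, 0.18, 0.16]
    print(compression_rate(100, 20))
    print(energy_savings(200.0, 80.0))
    print(mae(original, reconstructed))
    print(mape(original, reconstructed))
```

Because MA%E divides by the original values, it is only meaningful for series that stay away from zero.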
DATA SET (A) RESULTS 46
[Figure: C2N2 absorption spectrum, Cappiello and Schreiber algorithm; x axis [t] from 0 to 160, y axis [V] from 0.13 to 0.19; regions a, b, c marked]
DATA SET (A) RESULTS 47
[Figure: data set (A), Lazaridis et al. algorithm; x axis [t] from 0 to 160, y axis [V] from 0.13 to 0.19; regions a, b, c marked]
DATA SET (A) RESULTS 48
[Figure: data set (A), Schoellhammer et al. algorithm; x axis [t] from 0 to 160, y axis [V] from 0.13 to 0.19; regions a, b, c marked]
DATA SET (A) RESULTS 49
[Figure: two bar charts, Compression rate and Energy Reduction, in percent, comparing [Authors], [Lazaridis et al.] and [Schoellhammer et al.]. Annotation: MAE in case of non linear trends]
DATA SET (B) RESULTS 50
[Figure: C2N2 absorption spectrum FM; Cappiello and Schreiber algorithm vs. input data set; x axis [t] from 0 to 160, y axis from -8 to 6. Annotation: Systematic error due to the processing time shift]
DATA SET (B) RESULTS 51
[Figure: data set (B), Lazaridis et al. algorithm vs. input data set; x axis [t] from 0 to 160, y axis from -8 to 6]
DATA SET (B) RESULTS 52
[Figure: data set (B), Schoellhammer et al. algorithm vs. input data set; x axis [t] from 0 to 160, y axis from -8 to 6]
DATA SET (B) RESULTS 53
[Figure: two bar charts, Compression rate and Energy Savings, in percent, comparing [Authors], [Lazaridis et al.] and [Schoellhammer et al.]]
SUMMARY COMPARISON AND COMMENTS 54
No single algorithm is «the best»
Transmission procedures with packet-based protocols can affect the analysis
  Higher packing factors improve energy efficiency
  Higher transmission delays negatively affect timeliness
Adaptable procedures should be used on the basis of
  The peculiar features of the signals to be processed
  The quality requirements of the applications
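The trade-off between packing factor and timeliness can be made concrete with a small sketch. It rests on simplifying assumptions that are not in the slides: one value becomes ready every `period` seconds, and a packet is sent only when it is full. The function names are hypothetical.

```python
import math

def packets_needed(n_values, packing_factor):
    """Number of packets required to send n_values, packing_factor values per packet."""
    return math.ceil(n_values / packing_factor)

def worst_case_buffering_delay(packing_factor, period):
    """Time the first value in a packet waits until the packet is full.

    Assumption: one value becomes ready every `period` seconds.
    """
    return (packing_factor - 1) * period

if __name__ == "__main__":
    for k in (1, 4, 8):
        print(k, packets_needed(60, k), worst_case_buffering_delay(k, 2.0))
```

Fewer packets mean fewer radio transmissions, while the buffering delay grows linearly with the packing factor.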
Programs and Data 55
Philosophy without Science is empty, Science without Philosophy is blind
I. Kant
PARAPHRASE
Programs without Data are empty, Data without Programs are blind
F. A. Schreiber
SUMMARY AND CONCLUSIONS 56
Experiments on Databases and DBMSs for optimizing data structures and management, including Data Quality
Data organization and management as a service to the experiments of the scientific community
Experimenting with the Database content itself (data mining)
Experimentation is both:
  a science, because it requires formal and rigorous methodologies, languages, and instruments
  an art, because it requires intuition, imagination, and … it gives emotions
BIBLIOGRAPHICAL REFERENCES 57
1. Babu S. et al. – Automated Experiment-Driven Management of (Database) Systems – Proc. 12th HotOS, pp. 1–5, 2009
2. Boral H., DeWitt D. J. – A Methodology for Database Systems Performance Evaluation – SIGMOD Record, Vol. 14, n. 2, pp. 176–185, 1984
3. Brown D. et al. – High energy nuclear database: a testbed for nuclear data information technology – Int. Conf. on Nuclear Data for Science and Technology, art. 250, 2007
4. Cappiello C., Schreiber F. A. – Experiments and analysis of quality and Energy-aware data aggregation approaches in WSNs – 10th Int. Workshop on Quality in Databases QDB 2012, Istanbul, Aug. 26, 2012, pp. 1–8 http://www.purdue.edu/discoverypark/cyber/qdb2012/papers/7data%20aggregation.pdf
5. Curino C. et al. – Schema Evolution in Wikipedia: Toward a Web Information System Benchmark – Proc. ICEIS, pp. 323–332, 2008
6. Curino C. et al. – Graceful Database Schema Evolution: the PRISM Workbench – Proc. VLDB’08, pp. 761–772, 2008
7. Davcev D. et al. – Experiments in Data Management for Wireless Sensor Networks – Proc. 2nd Int. Conf. on Sensor Technologies and Applications, pp. 198–202, 2008
8. Manolescu I. et al. – The Repeatability Experiment of SIGMOD 2008 – SIGMOD Record, Vol. 37, n. 1, pp. 39–45, 2008
BIBLIOGRAPHICAL REFERENCES 58
9. Marche S. – Measuring the stability of data models – European Journal of Information Systems, Vol. 2, n. 1, pp. 37–47, 1993
10. Masseroli M. – Management and Analysis of Genomic Functional and Phenotypic Controlled Annotations to Support Biomedical Investigation and Practice – IEEE Transactions on Information Technology in Biomedicine, Vol. 11, n. 4, pp. 376–385, 2007
11. Sjoberg D. I. – Quantifying schema evolution – Information and Software Technology, Vol. 35, n. 1, pp. 35–44, 1993
12. Stoeckert C. et al. – Microarray databases: standards and ontologies – Nature Genetics, Vol. 32, pp. 469–473, 2002
13. Szalay A., Gray J. – The world-wide telescope – Science, Vol. 293, pp. 2037–2040, 2001
14. Vanschoren J., Blockeel H. – Experiment Databases – In: Dzeroski S., Goethals B., Panov P. (Eds.), Inductive Databases and Queries: Constraint-based Data Mining, Chapt. 14, Springer, pp. 335–360, 2010
15. Schwartz B. – The four fundamental performance metrics – PERCONA, 2011 http://www.mysqlperformanceblog.com/2011/04/27/the-four-fundamental-performance-metrics/
16. http://www.tpc.org/information/benchmarks.asp