A database extension for ORACLE, INFORMIX and PostgreSQL Compact, fast loads, quick queries Barrodale Computing Services Ltd.
DBXten
The topics addressed in this presentation are: ‣
What is DBXten?
‣
Where can DBXten be used?
‣
How does DBXten improve performance?
‣
How do you use DBXten?
‣
How effective is DBXten?
DBXten
DBXten is a database extension that: ‣
Speeds up data loading and querying
‣
Shrinks data space requirements
‣
Works with Oracle, Informix, PostgreSQL
‣
Comes with C, Java & SQL APIs
‣
Installs quickly
DBXten
Ideal DBXten application characteristics are: ‣
Data generated by instruments, sensors
‣
High volume feeds
‣
Write Once, Read Many times
DBXten OCEAN SENSOR ARRAY LATITUDE
LONGITUDE
DEPTH
TIME
SALINITY
TEMPERATURE
CONDUCTIVITY
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:42
4.23
6.21931
43675
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:43
4.23
6.21847
44532
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:44
4.23
6.21527
43658
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:45
4.23
6.21092
44085
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:46
4.23
6.21654
43969
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:47
4.23
6.21416
43862
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:48
4.23
6.21702
44498
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:49
4.24
6.21911
44114
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:50
4.24
6.21763
44049
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:51
4.24
6.21263
44525
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:52
4.24
6.21048
43635
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:53
4.28
6.21737
44346
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:54
4.29
6.21329
44355
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:55
4.31
6.21633
44412
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:56
4.33
6.21757
43710
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:57
4.33
6.21992
43600
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:58
4.33
6.21366
44273
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 12:59
4.33
6.21248
44453
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 13:00
4.33
6.21983
44214
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 13:01
4.33
6.21723
44321
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 13:02
4.33
6.21754
44310
48° 3’ 22” N
124° 21’ 08” W
628.41 m
2009.07.09 13:03
4.33
6.21652
44584
...
DBXten NASA FLIGHT R
Θ
Φ
TIME
ΔR
ΔΘ
ΔΦ
TEMPERATURE
PRESSURE ...
4711.75
79.12
24.79
91373
0.73
0.00
0.00
-27.683
0.4478
4712.50
79.12
24.79
91374
0.75
0.00
0.00
-27.705
0.4476
0.71
4713.21
79.12
24.79
91375
0.00
0.00
-27.727
0.4474
4713.98
79.12
24.79
91376
0.77
0.00
0.00
-27.736
0.4472
4714.73
79.12
24.79
91377
0.75
0.00
0.00
-27.775
0.4470
0.76
4715.49
79.12
24.79
91378
0.00
0.00
-27.871
0.4468
4716.26
79.12
24.79
91379
0.77
0.00
0.00
-27.966
0.4466
4716.98
79.12
24.79
91380
0.72
0.00
0.00
-28.005
0.4464
0.79
4717.77
79.12
24.79
91381
0.00
0.00
-28.032
0.4462
4718.56
79.12
24.79
91382
0.79
0.00
0.00
-28.102
0.4460
4719.35
79.12
24.79
91383
0.79
0.00
0.00
-28.131
0.4458
0.75
4720.10
79.12
24.79
91384
0.00
0.00
-28.209
0.4456
4720.82
79.12
24.79
91385
0.72
0.00
0.00
-28.288
0.4454
4721.56
79.12
24.79
91386
0.74
0.00
0.00
-28.331
0.4452
0.80
4722.36
79.12
24.79
91387
0.00
0.00
-28.360
0.4450
4723.08
79.12
24.79
91388
0.72
0.00
0.00
-28.380
0.4448
4723.83
79.12
24.79
91389
0.75
0.00
0.00
-28.382
0.4446
4724.56
79.12
24.79
91390
0.73
0.00
0.00
-28.402
0.4444
4725.31
79.12
24.79
91391
0.75
0.00
0.00
-28.501
0.4442
4726.10
79.12
24.79
91392
0.79
0.00
0.00
-28.526
0.4440
0.00
0.00
-28.608
0.4438
0.00
0.00
-28.622
0.4436
4726.84
79.12
24.79
91393
0.74
4727.59
79.12
24.79
91394
0.75
DBXten NUCLEAR REACTOR Time
Valve
1986.04.26 03:22:04
Open
Open
1986.04.26 03:22:05
Open
Open
1
. . . Valve
q
P1
...
...
Tn
Neutron Density 1
...
Neutron Density p
Pm
T1
4.27
3.62
727
662
7351
4178
4.28
3.62
732
665
7364
4178
721
666
7336
4178
1986.04.26 03:22:06
Open
Open
4.28
3.62
1986.04.26 03:22:07
Open
Open
4.29
3.62
728
667
7379
4178
1986.04.26 03:22:08
Open
Open
4.28
3.62
724
672
7335
4178
727
679
7390
4178
1986.04.26 03:22:09
Open
Open
4.26
3.62
1986.04.26 03:22:10
Open
Open
4.24
3.62
726
681
7341
4178
1986.04.26 03:22:11
Open
Open
4.27
3.62
730
682
7340
4178
720
683
7331
4178
1986.04.26 03:22:12
Open
Open
4.26
3.62
1986.04.26 03:22:13
Open
Open
4.31
3.62
723
685
7339
4178
1986.04.26 03:22:14
Open
Closed
4.31
3.62
728
692
7366
4178
727
728
7370
4186
1986.04.26 03:22:15
Open
Closed
4.29
4.16
1986.04.26 03:22:16
Open
Closed
4.32
4.53
731
754
7377
4237
1986.04.26 03:22:17
Open
Closed
4.32
5.02
723
769
7370
4272
732
795
7374
4309
1986.04.26 03:22:18
Open
Closed
4.35
5.23
1986.04.26 03:22:19
Open
Closed
4.28
5.32
723
837
7369
4348
1986.04.26 03:22:20
Open
Closed
4.35
5.67
730
881
7366
4370
729
928
7328
4417
1986.04.26 03:22:21
Open
Closed
4.27
5.88
1986.04.26 03:22:22
Open
Closed
4.28
6.27
725
964
7364
4427
1986.04.26 03:22:23
Open
Closed
4.29
6.62
730
983
7339
4438 4471 4528
1986.04.26 03:22:24
Open
Closed
4.31
6.91
720
1037
7346
1986.04.26 03:22:25
Open
Closed
4.28
7.09
722
1062
7371
...
DBXten NATURE OF THE DATA ‣
Sampling rate chosen not to miss “interesting” events
‣
Repetitive or simply patterned
‣
Low information density
Great candidate for compression
DBXten BENEFITS OF COMPRESSION
E K E WE
S D N
Less Happier Less tuning, tuning, Happier administration DBAs administration DBAs Faster Faster loads loads Smaller Smaller tables tables
Happier Happier users users
Faster Faster queries queries
Smaller Smaller indexes indexes
Less Less I/O I/O
DBXten COST OF COMPRESSION
CPU CPU cycles cycles
DBXten STANDARD INFORMIX COMPRESSION
Row
Name
Dept
Salary
City
State
1
Fred Smith 500
10000 Raleigh
NC
2
John Smith 500
20000 Raleigh
NC
Symbol Row
Data stored on disk
1
01
02
03
10000
04
2
05
02
03
20000
04
Pattern
01
Fred
02
Smith
03
500
04
Raleigh NC
05
John
DBXten STANDARD ORACLE COMPRESSION Row
Name
Dept
Salary
City
State
1
Fred Smith 500
10000 Raleigh
NC
2
John Smith 500
20000 Raleigh
NC
Symbol
Row
Data stored on disk
1
01
02
03
10000
04
05
2
06
02
03
20000
04
05
Pattern
01
Fred
02
Smith
03
500
04
Raleigh
05
NC
06
John
DBXten DBXten COLUMN COMPRESSION SALINITY 4.23 4.23
SALINITY
7
×
4.23
4
×
4.24
4.23 4.23 4.23 4.23
4.28
4.23 4.24 4.24
4.29
4.24 4.24 4.28
4.31
4.29 4.31 4.33 4.33 4.33 4.33 4.33 4.33 4.33 4.33
8
×
4.33
DBXten SEQUENCES & PATTERNS 1, 3, 5, 7, 9, 11, 13 ...
Arithmetic Sequence Vi = V0 + Δ × i
1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3, 5, ...
Arithmetic Cycle Vi = V0 + Δ × (i + offset) mod len
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...
Run Length V = n rep V0
9, 16, 7, 8, 9, 16, 7, 8, 9, 16, 7, 8, ...
Repeated Sequence Vi = Si mod len
DBXten DIFFERENCES
7, 8, 11, 16, 17, 20, 25, 26, 29 ... 7 7 7 7 7 7 ...
+ + + + +
1 1 1 1 1
Repeated Sequence of differences Δ Vi = Si mod len
+ + + +
3 3 + 5 3 + 5 + 1 3 + 5 + 1 + 3
Very useful for expressing timestamps (usually floating point values representing the number of seconds since Jan 1, 1970, UTC). For example, 4pm every Tuesday and 6pm every Thursday.
DBXten FLOATING POINT Sign
Exponent
Mantissa
‣ Very wide range of values ‣ Storing excessive precision is wasteful ‣ Rounding to a user-declared precision gives new opportunities for compression
DBXten FLOATING POINT USER DECLARED PRECISION
64 bit
16 bit
DBXten FLOATING POINT USER DECLARED PRECISION 64 bit doubles 4711.75452 4711.75503 4711.75764 4711.76013 4711.76335 4711.76435 4711.76626 4711.76782 4711.76864 4711.76876 4711.77237 4711.77408 4711.77465 4711.77844 4711.78009 4711.78062 4711.78417 4711.78454
Precision .01 471175 471176 471176 471176 471176 471176 471177 471177 471177 471177 471177 471177 471177 471178 471178 471178 471178 471178
Run length encoded 471175 5
471176
7
471177
5
471178
DBXten
US PATENT APPLICATION 11/779,791 JULY 21, 2006
DBXten LOADING
‣ One column of “chips” ‣ Self-describing, includes schema ‣ Portable chips ‣ Table in table functionality
DBXten IN OTHER WORDS… Instrument Id Block 1:
Block 2:
Block 3:
1 1 1 1 1 1 1 1
Datetime 2008-02-01 00:12:23 2008-02-01 00:12:25 2008-02-01 00:12:27 2008-02-01 00:12:29 2008-02-01 00:12:31 2008-02-01 00:12:33 … 2008-02-01 00:13:43
Latitude 46.343 46.344 46.345 46.346 46.347 46.349 … 46.392
Longitude -127.386 -127.385 -127.383 -127.382 -127.381 -127.379 … -127.335
Depth 14.34 16.82 18.85 21.22 23.66 26.05 … 104.82
1 1 1 1 1 1 1 1
2008-02-01 00:13:45 2008-02-01 00:13:47 2008-02-01 00:13:49 2008-02-01 00:13:51 2008-02-01 00:13:53 2008-02-01 00:13:55 … 2008-02-01 00:14:59
46.394 46.395 46.396 46.398 46.399 46.400 … 46.439
-127.334 -127.332 -127.331 -127.330 -127.328 -127.327 … -127.289
106.83 108.90 110.99 113.32 115.38 117.65 … 188.40
1 1 1 1 1 1 1 1
2008-02-01 00:15:01 2008-02-01 00:15:03 2008-02-01 00:15:05 2008-02-01 00:15:07 2008-02-01 00:15:09 2008-02-01 00:15:11 … 2008-02-01 00:16:15
46.441 46.442 46.443 46.444 46.446 46.447 … 46.486
-127.287 -127.286 -127.285 -127.283 -127.282 -127.281 … -127.241
190.88 193.22 195.65 197.86 200.02 202.38 … 274.68
Measurement 10.2 11.9 10.7 11.7 10.9 11.4 … 10.7
10.7 13.1 10.7 11.4 10.2 10.9 … 10.3
9.7 11.9 11.5 10.5 11.2 10.7 … 11.3
Instrument Id 1 1 1
Chip Chip 1 Chip 2 Chip 3
DBXten SUPPLIED LOADERS
‣
CSV Loader
‣
NetCDF Loader
‣
Draw and Load (DaL)
DBXten INDEXING
Bounds Bounds extraction extraction
Min
Max
Min
Max
Min
Max
DBXten INDEXING
CREATE TABLE measurements (id serial primary key, chip DSChip);
CREATE INDEX measurements_idx ON measurements(bounds(chip) dsbox_ops) USING rtree;
DBXten RETRIEVAL
Chip Chip selection selection
Tuple Tuple extraction extraction
DBXten RETRIEVAL
EXECUTE FUNCTION DSQueryToStrings( "SELECT id::integer, DSChipExtract(chip, 'dt 2008-02-01 00:13:00 2008-02-01 00:14:00, lat * *, lon * *', 'dt,value') FROM measurements WHERE overlap( DSAsBoxString(chip,'datetime,lat,lon')::DSBox, DSRangeToBox('datetime 2008-02-01 00:13:00 2008-02-01 00:14:00, lat * *,lon * *')::DSBox)" );
DBXten INFORMIX VIRTUAL TABLE INTERFACE create table measurements_VTI ( id integer, dt datetime year to fraction (4), lat double precision, lon double precision, value integer) using DSChipAccess( basetable='measurements', keylist='bounds(chip):(dt,lat,long)'); select id, dt, lat, lon, value from measurements_VTI where lat .... and lon .... and dt ....;
DBXten BENCHMARK
‣
3 GHz Intel Pentium D, 2 GB RAM OS: Fedora Core 7
‣
Loaded 25 million CSV records prepared from NOAA Sea Surface Temperature data, supplied via a UNIX pipe
‣
Four queries run against 50 million records
DBXten BENCHMARK
Conventional
With DBXten
ORACLE
INFORMIX
SQL*Loader
High Performance Loader
Direct mode with compression
IDS compression
CSV Chip Loader tool
CSV Chip Loader tool
DBXten BENCHMARK Unindexed
Indexed
Space
Load
Space
MB
secs
MB
1,894
2,836
848
Load
Query
secs
secs
3,172
3,574
205
1,410
860
1,576
85
2,024
3,036
2,556
4,069
176
733
1,318
736
1,367
23
ORACLE Conventional with DBXten INFORMIX Conventional with DBXten
DBXten PERFORMANCE IMPROVEMENT WITH DBXten Unindexed
Indexed
Space
Load
Space
Load
ORACLE
2.2
2.0
3.7
2.3
2.4
INFORMIX
2.8
2.3
3.5
3.0
7.7
Query
DBXten SUMMARY
‣ ‣
2 or 3 times improvement over vendor compression techniques C, Java & SQL APIs
‣
Fully documented
A database extension for INFORMIX, ORACLE and PostgreSQL www.barrodale.com
[email protected]