CS448B :: 23 Sep 2009
Data and Image Models
Jeffrey Heer, Stanford University

Last Time: Value of Visualization
Three functions of visualizations
Record: store information
- Photographs, blueprints, …
Analyze: support reasoning about information
- Process and calculate
- Reason about data
- Feedback and interaction
Communicate: convey information to others
- Share and persuade
- Collaborate and revise
- Emphasize important aspects of data

Other recording instruments
Marey’s sphygmograph [from Braun 83]
Make a decision: Challenger
Visualizations drawn by Tufte show how low temperatures damage O-rings [Tufte 97]

“to affect thro’ the Eyes what we fail to convey to the public through their word-proof ears”
1856 “Coxcomb” of Crimean War Deaths, Florence Nightingale
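Info-Vis vs. Sci-Vis?

Visualization Reference Model
[Figure: Raw Data → Data Tables → Visual Structures → Views, linked by Data Transformations, Visual Encodings, and View Transformations. Raw Data and Data Tables are grouped as Data; Visual Structures and Views are grouped as Visual Form. Task is shown alongside the pipeline.]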
The Big Picture
task
data: physical type (int, float, etc.), abstract type (nominal, ordinal, etc.)
domain: metadata, semantics, conceptual model
processing: algorithms
mapping: visual encoding, visual metaphor
image: visual channel, retinal variables

Data and Image Models
Topics
- Properties of data or information
- Properties of the image
- Mapping data to images
Data
Data models vs. Conceptual models
Data models are low level descriptions of the data
- Math: Sets with operations on them
- Example: integers with + and × operators
Conceptual models are mental constructions
- Include semantics and support reasoning
Examples (data vs. conceptual)
- (1D floats) vs. Temperature
- (3D vector of floats) vs. Space

Taxonomy (?)
- 1D (sets and sequences)
- Temporal
- 2D (maps)
- 3D (shapes)
- nD (relational)
- Trees (hierarchies)
- Networks (graphs)
- Are there others?
The eyes have it: A task by data type taxonomy for information visualization [Shneiderman 96]

Types of variables
Physical types
- Characterized by storage format
- Characterized by machine operations
Example: bool, short, int32, float, double, string, …
Abstract types
- Provide descriptions of the data
- May be characterized by methods/attributes
- May be organized into a hierarchy
Example: plants, animals, metazoans, …
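The split between physical type, abstract type, and conceptual model can be made concrete with a small record per variable. This is only a sketch: the `Variable` class, its field names, and the two example variables are assumptions made for illustration, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical description of one data variable."""
    name: str            # column name (illustrative)
    physical_type: type  # storage format, e.g. float
    abstract_type: str   # scale type, e.g. "Q-interval"
    concept: str         # what the values mean (conceptual model)


# Two variables with the same data model (1D floats) but different meanings.
temperature = Variable("temp_c", float, "Q-interval", "Temperature (°C)")
latitude = Variable("lat", float, "Q-interval", "Location")


def same_storage(a: Variable, b: Variable) -> bool:
    """True when two variables share a physical type."""
    return a.physical_type is b.physical_type


print(same_storage(temperature, latitude))      # same data model
print(temperature.concept == latitude.concept)  # different conceptual model
```

The physical type alone cannot tell the two variables apart; only the conceptual model says which operations and comparisons make sense.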
Nominal, Ordinal and Quantitative
N - Nominal (labels)
- Fruits: Apples, oranges, …
O - Ordered
- Quality of meat: Grade A, AA, AAA
Q - Interval (Location of zero arbitrary)
- Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
- Like a geometric point. Cannot compare directly
- Only differences (i.e. intervals) may be compared
Q - Ratio (zero fixed)
- Physical measurement: Length, Mass, Temp, …
- Counts and amounts
- Like a geometric vector, origin is meaningful
S. S. Stevens, On the theory of scales of measurements, 1946
Nominal, Ordinal and Quantitative
N - Nominal (labels)
- Operations: =, ≠
O - Ordered
- Operations: =, ≠, <, >
Q - Interval (Location of zero arbitrary)
- Operations: =, ≠, <, >, -
- Can measure distances or spans
Q - Ratio (zero fixed)
- Operations: =, ≠, <, >, -, ÷
- Can measure ratios or proportions
S. S. Stevens, On the theory of scales of measurements, 1946

From data model to N,O,Q data type
Data model
- 32.5, 54.0, -17.3, …
- floats
Conceptual model
- Temperature (°C)
Data type
- Burned vs. Not burned (N)
- Hot, warm, cold (O)
- Continuous range of values (Q)
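The temperature example can be sketched in code: the same floats are read as N, O, or Q depending on the question asked. The thresholds (`burn_at`, `warm_at`, `hot_at`) and function names are assumptions chosen for illustration; the operation sets follow Stevens's scale types.

```python
# Operations supported by each scale type (Stevens, 1946).
OPERATIONS = {
    "N": {"=", "≠"},
    "O": {"=", "≠", "<", ">"},
    "Q-interval": {"=", "≠", "<", ">", "-"},
    "Q-ratio": {"=", "≠", "<", ">", "-", "÷"},
}

# Data model: plain floats. Conceptual model: temperature in °C.
temps_c = [32.5, 54.0, -17.3]


def as_nominal(t, burn_at=50.0):
    """N: two unordered labels. The 50 °C threshold is hypothetical."""
    return "burned" if t >= burn_at else "not burned"


def as_ordinal(t, warm_at=10.0, hot_at=40.0):
    """O: ordered categories. Both cut points are hypothetical."""
    if t >= hot_at:
        return "hot"
    if t >= warm_at:
        return "warm"
    return "cold"


def supports(scale, op):
    """True if the scale type allows the given operation."""
    return op in OPERATIONS[scale]


print([as_nominal(t) for t in temps_c])
print([as_ordinal(t) for t in temps_c])
print(supports("Q-interval", "÷"))  # ratios of °C values are not meaningful
```

Each scale type adds operations to the one before it, which is why a Q variable can always be reduced to O or N but not the reverse.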
[Figure: sepal and petal lengths and widths for three species of iris [Fisher 1936], with labels Q, O, N.]
Relational data model
- Records are fixed-length tuples
- Each column (attribute) of a tuple has a domain (type)
- Relation is a schema and a table of tuples
- Database is a collection of relations

Relational Algebra [Codd]
Data transformations (sql)
- Selection (where)
- Projection (select)
- Sorting (order by)
- Aggregation (group by, sum, min, …)
- Set operations (union, …)
- Join (inner join)
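A minimal sketch of these transformations, using Python's built-in `sqlite3` module and an in-memory database. The table names, columns, and counts are toy values invented for illustration; they are not real census figures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE census (year INTEGER, sex TEXT, people INTEGER)")
con.execute("CREATE TABLE labels (sex TEXT, label TEXT)")
con.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [(1990, "M", 10), (1990, "F", 12), (2000, "M", 11), (2000, "F", 13)],
)
con.executemany("INSERT INTO labels VALUES (?, ?)", [("M", "Male"), ("F", "Female")])

# Selection: keep only the rows that satisfy a predicate (WHERE).
selection = con.execute(
    "SELECT * FROM census WHERE year = 2000 ORDER BY sex"
).fetchall()

# Projection: keep only some columns (the SELECT list).
projection = con.execute(
    "SELECT DISTINCT year FROM census ORDER BY year"
).fetchall()

# Aggregation: collapse rows that share a key (GROUP BY with SUM).
aggregation = con.execute(
    "SELECT year, SUM(people) FROM census GROUP BY year ORDER BY year"
).fetchall()

# Join: combine two relations on a shared attribute (INNER JOIN).
join = con.execute(
    "SELECT c.year, l.label, c.people FROM census c "
    "INNER JOIN labels l ON c.sex = l.sex "
    "WHERE c.year = 1990 ORDER BY l.label"
).fetchall()

print(selection, projection, aggregation, join, sep="\n")
```

The `ORDER BY` clauses are there only to make the output order deterministic.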
Statistical data model
- Variables or measurements
- Categories or factors or dimensions
- Observations or cases

Blood Pressure Study (4 treatments, 6 months)
Month    Control  Placebo  300 mg  450 mg
March    165      163      166     168
April    162      159      161     163
May      164      158      161     153
June     162      161      158     160
July     166      158      160     148
August   163      158      157     150
Dimensions and Measures
Dimensions: Discrete variables describing data
- Dates, categories of values (independent vars)
Measures: Data values that can be aggregated
- Numbers to be analyzed (dependent vars)
- Aggregate as sum, count, average, std. deviation

Independent vs. dependent variables
- Example: y = f(x,a)
- Dimensions: Domain(x) × Domain(a)
- Measures: Range(y)
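As a sketch of y = f(x,a), the blood pressure study from the statistical data model slide can be treated as two dimensions (treatment, month) and one measure (the reading). The helper name `aggregate` and the long-form layout are assumptions for illustration.

```python
from statistics import mean

months = ["March", "April", "May", "June", "July", "August"]
study = {
    "Control": [165, 162, 164, 162, 166, 163],
    "Placebo": [163, 159, 158, 161, 158, 158],
    "300 mg": [166, 161, 161, 158, 160, 157],
    "450 mg": [168, 163, 153, 160, 148, 150],
}

# Long form: one observation per (treatment, month) pair.
# Dimensions are Domain(treatment) × Domain(month); the measure is the reading.
observations = [
    (treatment, month, value)
    for treatment, values in study.items()
    for month, value in zip(months, values)
]


def aggregate(rows, key_index, agg=mean):
    """Group rows by one dimension and aggregate the measure (last field)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_index], []).append(row[2])
    return {key: agg(values) for key, values in groups.items()}


by_treatment = aggregate(observations, 0)       # average over months
max_by_month = aggregate(observations, 1, max)  # maximum over treatments
print(by_treatment)
print(max_by_month)
```

Dimensions decide how the rows are grouped; the measure is the only field that gets aggregated.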
Example: U.S. Census Data
People: # of people in group
Year: 1850 – 2000 (every decade)
Age: 0 – 90+
Sex: Male, Female
Marital Status: Single, Married, Divorced, …

Example: U.S. Census
People, Year, Age, Sex, Marital Status
2348 data points
Census: Dimension or Measure?
People Count: Measure
Year: Dimension
Age: Depends!
Sex (M/F): Dimension
Marital Status: Dimension

Roll-Up and Drill-Down
Want to examine marital status in each decade?
Roll-up the data along the desired dimensions
SELECT year, marst, sum(people) FROM census GROUP BY year, marst;
(year, marst: Dimensions; sum(people): Measure)

Roll-Up and Drill-Down
Need more detailed information?
Drill-down into additional dimensions
SELECT year, age, marst, sum(people) FROM census GROUP BY year, age, marst;

[Figure: data cube with axes Year (1970, 1980, 1990, 2000, All Years), Age (0-19, 20-39, 40-59, 60+, All Ages), and Marital Status (Single, Married, Divorced, Widowed, All Marital Status); faces labeled Sum along Year, Sum along Age, and Sum along Marital Status.]
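The two queries can be tried against a toy table with Python's built-in `sqlite3` module. The rows below are invented for illustration and are not real census counts; the `ORDER BY` clauses are added only to make the output order deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE census "
    "(year INTEGER, age INTEGER, marst TEXT, sex TEXT, people INTEGER)"
)
con.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?, ?)",
    [
        (1990, 20, "Single", "M", 5),
        (1990, 20, "Single", "F", 4),
        (1990, 40, "Married", "M", 7),
        (1990, 40, "Married", "F", 8),
        (2000, 20, "Single", "M", 6),
        (2000, 20, "Married", "F", 2),
        (2000, 40, "Married", "F", 9),
    ],
)

# Roll-up: sum over age and sex, keeping year and marital status.
roll_up = con.execute(
    "SELECT year, marst, sum(people) FROM census "
    "GROUP BY year, marst ORDER BY year, marst"
).fetchall()

# Drill-down: add age back as a grouping dimension.
drill_down = con.execute(
    "SELECT year, age, marst, sum(people) FROM census "
    "GROUP BY year, age, marst ORDER BY year, age, marst"
).fetchall()

print(roll_up)
print(drill_down)
```

Every dimension left out of `GROUP BY` is summed away, so rolling up and drilling down differ only in which columns are listed there.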
[Figure: data cube over Year, Age, and Marital Status, annotated with Roll-Up and Drill-Down.]

Row vs. Column-Oriented Databases
Relational Data Organizations
Row-oriented (Transactions) vs. Column-oriented (Analysis)

Relational Data Organizations: Column-oriented
Speed-up Analysis
- Reduce data transfer
- Improved locality
- Better data compression
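A rough sketch of the two layouts in plain Python. The records are toy values, and run-length encoding is used here only as one example of a compression scheme that benefits from columnar storage; the lecture does not name a specific scheme.

```python
from itertools import groupby

# Row-oriented: each record stored together (good for transactions).
rows = [
    {"year": 1990, "marst": "Married", "people": 15},
    {"year": 1990, "marst": "Single", "people": 9},
    {"year": 2000, "marst": "Married", "people": 11},
    {"year": 2000, "marst": "Single", "people": 6},
]

# Column-oriented: each attribute stored together (good for analysis).
columns = {name: [record[name] for record in rows] for name in rows[0]}

# An aggregate over one attribute visits every record in the row layout...
total_from_rows = sum(record["people"] for record in rows)
# ...but reads a single contiguous list in the column layout.
total_from_columns = sum(columns["people"])


def run_length_encode(values):
    """Compress runs of repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


print(total_from_rows, total_from_columns)
print(run_length_encode(columns["year"]))
```

Values within a column share a type and often repeat, which is what makes per-column compression effective.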
Administrivia

Announcements
Auditors
- Requirements: Come to class and participate (online as well)
Class participation requirements
- Complete readings before class
- In-class discussion
- Post at least 1 substantive discussion comment/question on wiki within a week of each lecture
Class wiki: http://cs448b.stanford.edu

Assignment 1: Visualization Design
Design a static visualization for a given data set.
Deliverables (post to the course wiki)
- Image of your visualization
- Short description and design rationale (≤ 4 para.)
Due by 7:00am on Monday 9/28.
Image
Visual language is a sign system
Images perceived as a set of signs
Sender encodes information in signs
Receiver decodes information from signs
Sémiologie Graphique, 1967
Jacques Bertin

Bertin’s Semiology of Graphics
[Figure: marks A, B, and C]
1. A, B, C are distinguishable
2. B is between A and C.
3. BC is twice as long as AB.
∴ Encode quantitative variables
“Resemblance, order and proportion are the three signifieds in graphics.” - Bertin
Visual encoding variables
Position (x 2), Size, Value, Texture, Color, Orientation, Shape

Visual encoding variables
Position, Length, Area, Volume, Value, Texture, Color, Orientation, Shape, Transparency, Blur / Focus, …

Information in color and value
Value is perceived as ordered
∴ Encode ordinal variables (O)
∴ Encode continuous variables (Q) [not as well]
Hue is normally perceived as unordered
∴ Encode nominal variables (N) using color
Bertin’s “Levels of Organization”
Position     N  O  Q
Size         N  O  Q
Value        N  O  Q
Texture      N  O
Color        N
Orientation  N
Shape        N
N = Nominal, O = Ordered, Q = Quantitative
Note: Q < O < N
Note: Bertin actually breaks visual variables down into differentiating (≠) and associating (≡)

Design Space of Visual Encodings
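Bertin's levels of organization can be read as a lookup table from visual variable to the data types it can convey. A minimal sketch, where the dictionary name `LEVELS` and the helper `channels_for` are illustrative assumptions:

```python
# Which data types each visual variable can convey, per Bertin's table.
LEVELS = {
    "position": "NOQ",
    "size": "NOQ",
    "value": "NOQ",
    "texture": "NO",
    "color": "N",
    "orientation": "N",
    "shape": "N",
}


def channels_for(data_type):
    """List the visual variables able to encode 'N', 'O', or 'Q' data."""
    return [channel for channel, levels in LEVELS.items() if data_type in levels]


print(channels_for("Q"))  # the fewest options
print(channels_for("O"))
print(channels_for("N"))  # every variable can at least distinguish labels
```

The shrinking lists show the note Q < O < N: quantitative data has the fewest usable variables and nominal data the most.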
Univariate data
[Figure: data table with factors A, B, C and one variable, followed by example plots of a single variable, including a Tukey box plot marking low, middle 50%, high, and mean.]
Bivariate data
[Figure: data table with factors A, B, C and two variables.]
Scatter plot is common

Trivariate data
[Figure: data table with factors A, B, C and three variables.]
3D scatter plot is possible

Three variables
Two variables [x,y] can map to points
- Scatterplots, maps, …
Third variable [z] must use
- Color, size, shape, …

Large design space (visual metaphors)
[Bertin, Graphics and Graphic Info. Processing, 1981]
Multidimensional data
How many variables can be depicted in an image?
[Figure: data table with factors A, B, C and variables 1 to 8.]
“With up to three rows, a data table can be constructed directly as a single image … However, an image has only three dimensions. And this barrier is impassible.” Bertin

Deconstructions
Playfair 1786
x-axis: year (Q)
y-axis: currency (Q)
color: imports/exports (N, O)

Wattenberg 1998
http://www.smartmoney.com/marketmap/
rectangle size: market cap (Q)
rectangle position: market sector (N), market cap (Q)
color hue: loss vs. gain (N, O)
color value: magnitude of loss or gain (Q)

Minard 1869: Napoleon’s march
Single axis composition
[Figure: + = composition diagram] [based on slide from Mackinlay]

Mark composition
y-axis: temperature (Q)
+ x-axis: longitude (Q) / time (O)
= temp over space/time (Q x Q)
[based on slide from Mackinlay]

Mark composition
y-axis: longitude (Q)
+ x-axis: latitude (Q)
+ width: army size (Q)
= army position (Q x Q) and army size (Q)
[based on slide from Mackinlay]

[Figure: Minard’s map annotated with longitude (Q), latitude (Q), army size (Q), temperature (Q), and latitude (Q) / time (O).] [based on slide from Mackinlay]
Minard 1869: Napoleon’s march
Depicts at least 5 quantitative variables. Any others?

Automated design
Jock Mackinlay’s APT 86

Combinatorics of Encodings
Challenge: Pick the best encoding from the exponential number of possibilities (n+1)^8
Principle of Consistency: The properties of the image (visual variables) should match the properties of the data.
Principle of Importance Ordering: Encode the most important information in the most effective way.

Design Criteria (Mackinlay)
Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.
Cannot express the facts
A one-to-many (1 → N) relation cannot be expressed in a single horizontal dot plot because multiple tuples are mapped to the same position

Expresses facts not in the data
A length is interpreted as a quantitative value;
∴ Length of bar says something untrue about N data
[Mackinlay, APT, 1986]
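The first failure can be sketched as a simple check: if an encoding sends several tuples to the same position, the plot cannot express all the facts. This is a simplified model, not Mackinlay's formal test; the function names and the placeholder tuples are assumptions for illustration.

```python
from collections import defaultdict


def marks_by_position(tuples, position_of):
    """Group tuples by the position the encoding assigns to them."""
    marks = defaultdict(list)
    for t in tuples:
        marks[position_of(t)].append(t)
    return dict(marks)


def is_expressible(tuples, position_of):
    """True when no two tuples collapse onto the same position."""
    return all(len(ts) == 1 for ts in marks_by_position(tuples, position_of).values())


def first(t):
    """Hypothetical encoding: position is determined by the first field."""
    return t[0]


# Placeholder relations: "A" is related to two values in the first one.
one_to_many = [("A", "a1"), ("A", "a2"), ("B", "b1")]
one_to_one = [("A", "a1"), ("B", "b1")]

print(is_expressible(one_to_many, first))  # tuples for "A" overlap
print(is_expressible(one_to_one, first))
```

A check like this covers only the "all the facts" half of expressiveness; the bar length example fails the other half, "only the facts".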
Design Criteria (Mackinlay)
Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.
Effectiveness
A visualization is more effective than another visualization if the information conveyed by one visualization is more readily perceived than the information in the other visualization.
(Effectiveness is the subject of the Graphical Perception lecture)

Mackinlay’s Ranking
Conjectured effectiveness of the encoding
Mackinlay’s Design Algorithm
User formally specifies data model and type
- Additional input: ordered list of data variables to show
APT searches over design space
- Tests expressiveness of each visual encoding
- Generates image for encodings that pass test
- Tests perceptual effectiveness of resulting image
Outputs the “most effective” visualization

Limitations
Does not cover many visualization techniques
- Bertin and others discuss networks, maps, diagrams
- Does not consider 3D, animation, illustration, photography, …
Does not model interaction
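The flavor of the design algorithm can be sketched as a greedy assignment: take the variables in importance order and give each one the best-ranked free channel that is expressive for its type. This is a simplification, not APT itself, which searches a design space and generates images. The per-type rankings below are assumed for illustration (they are not Mackinlay's published ranking), and reading (n+1)^8 as "each of 8 channels is unused or bound to one of n variables" is also an assumption.

```python
# ASSUMED rankings, most effective first. A channel missing from a list is
# treated as not expressive for that type (following Bertin's table).
RANKING = {
    "Q": ["position_x", "position_y", "size", "value"],
    "O": ["position_x", "position_y", "value", "size", "texture"],
    "N": ["position_x", "position_y", "color", "texture",
          "shape", "orientation", "value", "size"],
}
NUM_CHANNELS = 8  # position (x 2), size, value, texture, color, orientation, shape


def design_space_size(n):
    """Assumed reading of (n+1)^8: each channel is unused or bound to one of n variables."""
    return (n + 1) ** NUM_CHANNELS


def design(variables):
    """Greedy sketch. variables: list of (name, type), most important first."""
    encoding, used = {}, set()
    for name, data_type in variables:
        channel = next((c for c in RANKING[data_type] if c not in used), None)
        if channel is None:
            raise ValueError(f"no expressive channel left for {name!r}")
        encoding[name] = channel
        used.add(channel)
    return encoding


# Variables from the Playfair deconstruction earlier in the lecture.
playfair = design([("year", "Q"), ("currency", "Q"), ("imports/exports", "N")])
print(playfair)
print(design_space_size(3))
```

Because earlier variables pick first, the ordered input list is what implements the Principle of Importance Ordering in this sketch.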
Summary
Formal specification
- Data model
- Image model
- Encodings mapping data to image
Choose expressive and effective encodings
- Formal test of expressiveness
- Experimental tests of perceptual effectiveness

Assignment 1: Visualization Design
Design a static visualization for a given data set.
Deliverables (post to the course wiki)
- Image of your visualization
- Short description and design rationale (≤ 4 para.)
Due by 7:00am on Monday 9/28.