Data and Image Models

CS448B :: 23 Sep 2009

Data and Image Models

Jeffrey Heer
Stanford University

Last Time: Value of Visualization

Three functions of visualizations

Record: store information
- Photographs, blueprints, …

Analyze: support reasoning about information
- Process and calculate
- Reason about data
- Feedback and interaction

Communicate: convey information to others
- Share and persuade
- Collaborate and revise
- Emphasize important aspects of data

Other recording instruments

Marey’s sphygmograph [from Braun 83]

Make a decision: Challenger

Visualizations drawn by Tufte show how low temperatures damage O-rings [Tufte 97]

“to affect thro’ the Eyes what we fail to convey to the public through their word-proof ears”
1856 “Coxcomb” of Crimean War Deaths, Florence Nightingale

Info-Vis vs. Sci-Vis?

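Visualization Reference Model

[Figure: Raw Data → Data Tables → Visual Structures → Views. The arrows are labelled Data Transformations, Visual Encodings and View Transformations. Raw Data and Data Tables are grouped as "Data"; Visual Structures and Views are grouped as "Visual Form"; "Task" appears alongside the pipeline.]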

The Big Picture

[Figure labels: task; data (physical type: int, float, etc.; abstract type: nominal, ordinal, etc.); domain (metadata, semantics, conceptual model); processing algorithms; mapping (visual encoding, visual metaphor); image (visual channel, retinal variables)]

Data and Image Models

Topics
- Properties of data or information
- Properties of the image
- Mapping data to images

Data

Data models vs. Conceptual models

Data models are low level descriptions of the data
- Math: Sets with operations on them
- Example: integers with + and × operators

Conceptual models are mental constructions
- Include semantics and support reasoning

Examples (data vs. conceptual)
- (1D floats) vs. Temperature
- (3D vector of floats) vs. Space

Taxonomy (?)
- 1D (sets and sequences)
- Temporal
- 2D (maps)
- 3D (shapes)
- nD (relational)
- Trees (hierarchies)
- Networks (graphs)
- Are there others?

The eyes have it: A task by data type taxonomy for information visualization [Shneiderman 96]

Types of variables

Physical types
- Characterized by storage format
- Characterized by machine operations
Example: bool, short, int32, float, double, string, …

Abstract types
- Provide descriptions of the data
- May be characterized by methods/attributes
- May be organized into a hierarchy
Example: plants, animals, metazoans, …

Nominal, Ordinal and Quantitative

N - Nominal (labels)
- Fruits: Apples, oranges, …

O - Ordered
- Quality of meat: Grade A, AA, AAA

Q - Interval (Location of zero arbitrary)
- Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
- Like a geometric point. Cannot compare directly
- Only differences (i.e. intervals) may be compared

Q - Ratio (zero fixed)
- Physical measurement: Length, Mass, Temp, …
- Counts and amounts
- Like a geometric vector, origin is meaningful

S. S. Stevens, On the theory of scales of measurement, 1946
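The interval/ratio distinction can be checked with a small sketch. It assumes Celsius as the interval-scale example (arbitrary zero) and Kelvin as its ratio-scale counterpart; the helper `celsius_to_kelvin` and the sample values are illustrative and not from the lecture.

```python
from datetime import date

def celsius_to_kelvin(c: float) -> float:
    """Shift to a scale whose zero is fixed (absolute zero)."""
    return c + 273.15

# Interval scale: differences are meaningful, ratios are not.
a_c, b_c = 10.0, 20.0
diff_c = b_c - a_c            # a valid comparison: a span of 10 degrees
naive_ratio = b_c / a_c       # 2.0, but 20 °C is not "twice as hot" as 10 °C

# Ratio scale: the same two temperatures measured from a fixed zero.
true_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

# Dates are interval data as well: subtraction yields a span.
span = date(2006, 1, 19) - date(2006, 1, 1)

print(diff_c, naive_ratio, round(true_ratio, 3), span.days)
```

The ratio only becomes meaningful once the origin is fixed, which is why the slide compares ratio data to a geometric vector and interval data to a geometric point.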

Nominal, Ordinal and Quantitative

N - Nominal (labels)
- Operations: =, ≠

O - Ordered
- Operations: =, ≠, <, >

Q - Interval (Location of zero arbitrary)
- Operations: =, ≠, <, >, -
- Can measure distances or spans

Q - Ratio (zero fixed)
- Operations: =, ≠, <, >, -, ÷
- Can measure ratios or proportions

S. S. Stevens, On the theory of scales of measurement, 1946

From data model to N,O,Q data type

Data model
- 32.5, 54.0, -17.3, …
- floats

Conceptual model
- Temperature (°C)

Data type
- Burned vs. Not burned (N)
- Hot, warm, cold (O)
- Continuous range of values (Q)
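The step from data model to data type can be sketched as follows. The three floats are the ones on the slide; the cut-off temperatures and function names are hypothetical, chosen only to show how one conceptual model (temperature in °C) yields N, O or Q data depending on the question asked.

```python
readings = [32.5, 54.0, -17.3]   # data model: plain floats (values from the slide)

# Conceptual model: the floats are temperatures in °C.
# All thresholds below are assumptions for illustration.
BURN_THRESHOLD_C = 50.0
COLD_BELOW_C, HOT_FROM_C = 10.0, 30.0

def as_nominal(t: float) -> str:
    """N: two labels with no order between them."""
    return "burned" if t >= BURN_THRESHOLD_C else "not burned"

def as_ordinal(t: float) -> str:
    """O: labels with an order, cold < warm < hot."""
    if t < COLD_BELOW_C:
        return "cold"
    return "warm" if t < HOT_FROM_C else "hot"

def as_quantitative(t: float) -> float:
    """Q: keep the continuous value."""
    return t

for t in readings:
    print(t, as_nominal(t), as_ordinal(t), as_quantitative(t))
```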

Sepal and petal lengths and widths for three species of iris [Fisher 1936]. Columns labelled by data type: Q, O, N.

Relational data model

- Records are fixed-length tuples
- Each column (attribute) of tuple has a domain (type)
- Relation is schema and a table of tuples
- Database is a collection of relations

Relational Algebra [Codd]

Data transformations (sql)
- Selection (where)
- Projection (select)
- Sorting (order by)
- Aggregation (group by, sum, min, …)
- Set operations (union, …)
- Join (inner join)
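The transformations listed above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `person` and `dept` tables and their rows are hypothetical. In relational-algebra terms, selection filters rows (SQL `WHERE`) and projection picks columns (the `SELECT` list).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (name TEXT, dept_id INTEGER, salary REAL);
CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
INSERT INTO person VALUES ('Ann', 1, 50.0), ('Bob', 1, 40.0), ('Cy', 2, 60.0);
INSERT INTO dept VALUES (1, 'Design'), (2, 'Research');
""")

def q(sql):
    """Run a query and return all result tuples."""
    return conn.execute(sql).fetchall()

# Selection: keep only the rows that satisfy a predicate.
selection = q("SELECT * FROM person WHERE salary > 45 ORDER BY name")
# Projection: keep only some columns.
projection = q("SELECT name FROM person ORDER BY name")
# Sorting.
sorting = q("SELECT name FROM person ORDER BY salary DESC")
# Aggregation: one summary row per group.
aggregation = q("SELECT dept_id, SUM(salary) FROM person "
                "GROUP BY dept_id ORDER BY dept_id")
# Set operation: union removes duplicates.
union = q("SELECT dept_id FROM person UNION SELECT dept_id FROM dept "
          "ORDER BY dept_id")
# Join: combine tuples from two relations on a shared attribute.
join = q("SELECT name, dept_name FROM person "
         "INNER JOIN dept ON person.dept_id = dept.dept_id ORDER BY name")

for label, result in [("selection", selection), ("projection", projection),
                      ("sorting", sorting), ("aggregation", aggregation),
                      ("union", union), ("join", join)]:
    print(label, result)
```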

Statistical data model

- Variables or measurements
- Categories or factors or dimensions
- Observations or cases

Month     March  April  May  June  July  August
Control   165    162    164  162   166   163
Placebo   163    159    158  161   158   158
300 mg    166    161    161  158   160   157
450 mg    168    163    153  160   148   150

Blood Pressure Study (4 treatments, 6 months)
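As a sketch of how the blood pressure table relates to the three terms above, the snippet below unpivots it into one observation per month/treatment pair. The numbers are those on the slide; the field names (`month`, `treatment`, `blood_pressure`) are assumptions made for illustration.

```python
months = ["March", "April", "May", "June", "July", "August"]
treatments = {
    "Control": [165, 162, 164, 162, 166, 163],
    "Placebo": [163, 159, 158, 161, 158, 158],
    "300 mg":  [166, 161, 161, 158, 160, 157],
    "450 mg":  [168, 163, 153, 160, 148, 150],
}

# One observation (case) per combination of the two factors;
# blood pressure is the measured variable.
observations = [
    {"month": m, "treatment": t, "blood_pressure": values[i]}
    for t, values in treatments.items()
    for i, m in enumerate(months)
]

# Aggregate the measurement over the 'month' factor.
mean_by_treatment = {t: sum(v) / len(v) for t, v in treatments.items()}

print(len(observations))       # 4 treatments x 6 months
print(observations[0])
print(mean_by_treatment)
```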

Dimensions and Measures

Dimensions: Discrete variables describing data
Dates, categories of values (independent vars)

Measures: Data values that can be aggregated
Numbers to be analyzed (dependent vars)
Aggregate as sum, count, average, std. deviation

Independent vs. dependent variables
- Example: y = f(x,a)
- Dimensions: Domain(x) × Domain(a)
- Measures: Range(y)

Example: U.S. Census Data

People: # of people in group
Year: 1850 – 2000 (every decade)
Age: 0 – 90+
Sex: Male, Female
Marital Status: Single, Married, Divorced, …

[Table columns: People, Year, Age, Sex, Marital Status. 2348 data points]

Census: Dimension or Measure?

People Count      Measure
Year              Dimension
Age               Depends!
Sex (M/F)         Dimension
Marital Status    Dimension

Roll-Up and Drill-Down

Want to examine marital status in each decade?
Roll-up the data along the desired dimensions

SELECT year, marst, sum(people)
FROM census
GROUP BY year, marst;

Dimensions: year, marst. Measure: sum(people).

Roll-Up and Drill-Down

Need more detailed information?
Drill-down into additional dimensions

SELECT year, age, marst, sum(people)
FROM census
GROUP BY year, age, marst;

[Figure: data cube with axes Year (1970, 1980, 1990, 2000, All Years), Age (0-19, 20-39, 40-59, 60+, All Ages) and Marital Status (Single, Married, Divorced, Widowed, All Marital Status); faces labelled "Sum along Year", "Sum along Age" and "Sum along Marital Status"]
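The roll-up and drill-down queries can be run against a toy table with Python's `sqlite3` module. The column names `year`, `age`, `marst` and `people` follow the queries on the slides; the `sex` column name and every row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census "
             "(year INTEGER, age INTEGER, sex TEXT, marst TEXT, people INTEGER)")
rows = [  # made-up counts, for illustration only
    (1990, 20, "M", "Single", 10), (1990, 20, "F", "Single", 12),
    (1990, 40, "M", "Married", 30), (1990, 40, "F", "Married", 28),
    (2000, 20, "M", "Single", 11), (2000, 40, "F", "Married", 31),
    (2000, 40, "M", "Single", 5),
]
conn.executemany("INSERT INTO census VALUES (?, ?, ?, ?, ?)", rows)

# Roll-up: sum the measure over every dimension not named in GROUP BY.
roll_up = conn.execute(
    "SELECT year, marst, sum(people) FROM census "
    "GROUP BY year, marst ORDER BY year, marst").fetchall()

# Drill-down: add 'age' to the grouping to get finer cells.
drill_down = conn.execute(
    "SELECT year, age, marst, sum(people) FROM census "
    "GROUP BY year, age, marst ORDER BY year, age, marst").fetchall()

print(roll_up)
print(drill_down)
```

Both results sum to the same total; drilling down only splits the same people into more cells.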

[Figure: data cube over Year, Age and Marital Status, with arrows marking Roll-Up and Drill-Down]

Row vs. Column-Oriented Databases

Relational Data Organizations

Transactions (row-oriented) vs. Analysis (column-oriented)

[Figure: the same relation stored row-oriented and column-oriented]

Column-oriented: Speed-up Analysis
- Reduce data transfer
- Improved locality
- Better data compression
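A rough sketch of the two layouts, using plain Python containers rather than a real database engine. The records are hypothetical, and `run_length_encode` is a helper written here to hint at why a column of repeated values compresses well.

```python
rows = [  # row-oriented: one tuple per record (made-up data)
    (1990, "Single", 22),
    (1990, "Married", 58),
    (2000, "Single", 16),
    (2000, "Married", 31),
]

# Column-oriented: one array per attribute.
columns = {
    "year":   [r[0] for r in rows],
    "marst":  [r[1] for r in rows],
    "people": [r[2] for r in rows],
}

# Transaction-style access: fetch one whole record.
record = rows[2]

# Analysis-style access: aggregate a single attribute.
total_row_store = sum(r[2] for r in rows)   # walks every full record
total_col_store = sum(columns["people"])    # reads only the 'people' array

def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

print(record, total_row_store, total_col_store)
print(run_length_encode(columns["year"]))
```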

Administrivia

Announcements

Auditors
- Requirements: Come to class and participate (online as well)

Class participation requirements
- Complete readings before class
- In-class discussion
- Post at least 1 substantive discussion comment/question on wiki within a week of each lecture

Class wiki: http://cs448b.stanford.edu

Assignment 1: Visualization Design

Design a static visualization for a given data set.

Deliverables (post to the course wiki)
- Image of your visualization
- Short description and design rationale (≤ 4 para.)

Due by 7:00am on Monday 9/28.

Image

Visual language is a sign system

Images perceived as a set of signs
Sender encodes information in signs
Receiver decodes information from signs

Sémiologie Graphique, 1967
Jacques Bertin

Bertin’s Semiology of Graphics

[Figure: three points A, B and C along a line]

1. A, B, C are distinguishable
2. B is between A and C.
3. BC is twice as long as AB.
∴ Encode quantitative variables

“Resemblance, order and proportion are the three signifieds in graphics.” - Bertin

Visual encoding variables
- Position (x 2)
- Size
- Value
- Texture
- Color
- Orientation
- Shape

Visual encoding variables
- Position
- Length
- Area
- Volume
- Value
- Texture
- Color
- Orientation
- Shape
- Transparency
- Blur / Focus
- …

Information in color and value

Value is perceived as ordered
∴ Encode ordinal variables (O)
∴ Encode continuous variables (Q) [not as well]

Hue is normally perceived as unordered
∴ Encode nominal variables (N) using color
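The hue/value guidance can be sketched with the standard-library `colorsys` module. The function names and the particular lightness and saturation constants are assumptions made for illustration: hue varies for nominal categories, while lightness (Bertin's "value") varies for ordered ones.

```python
import colorsys

def nominal_palette(n: int):
    """Evenly spaced hues at fixed lightness and saturation: no implied order."""
    return [colorsys.hls_to_rgb(i / n, 0.5, 0.8) for i in range(n)]

def ordinal_ramp(n: int, hue: float = 0.6):
    """A single hue whose lightness steps from light to dark: reads as ordered."""
    if n == 1:
        return [colorsys.hls_to_rgb(hue, 0.5, 0.8)]
    return [colorsys.hls_to_rgb(hue, 0.85 - 0.6 * i / (n - 1), 0.8)
            for i in range(n)]

def to_hex(rgb):
    """Format an (r, g, b) triple with components in [0, 1] as #rrggbb."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)

fruit_colors = dict(zip(["apples", "oranges", "pears"],
                        map(to_hex, nominal_palette(3))))
grade_colors = dict(zip(["A", "AA", "AAA"], map(to_hex, ordinal_ramp(3))))
print(fruit_colors)
print(grade_colors)
```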

Bertin’s “Levels of Organization”

Position      N  O  Q
Size          N  O  Q
Value         N  O  Q
Texture       N  O
Color         N
Orientation   N
Shape         N

N = Nominal, O = Ordered, Q = Quantitative

Note: Q < O < N

Note: Bertin actually breaks visual variables down into differentiating (≠) and associating (≡)

Design Space of Visual Encodings
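Bertin's levels of organization can be written down as a lookup table. This is a sketch: the entries are transcribed from the slide, and the function name `channels_for` is an assumption.

```python
# Which data types each visual variable can carry, per the slide.
LEVELS = {
    "position":    {"N", "O", "Q"},
    "size":        {"N", "O", "Q"},
    "value":       {"N", "O", "Q"},
    "texture":     {"N", "O"},
    "color":       {"N"},
    "orientation": {"N"},
    "shape":       {"N"},
}

def channels_for(data_type: str):
    """List the visual variables able to encode the given type (N, O or Q)."""
    return [name for name, types in LEVELS.items() if data_type in types]

for t in ("Q", "O", "N"):
    print(t, channels_for(t))
```

Under this table fewer variables support Q than O, and fewer O than N, which is one way to read the note Q < O < N.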

Univariate data

[Figure: data table with factors A, B, C and one variable, followed by example plots of the values, including a Tukey box plot that marks low, middle 50%, high and mean]
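The quantities marked on the box plot can be computed with the standard-library `statistics` module. A minimal sketch: the sample values are hypothetical, "low" and "high" are taken as the minimum and maximum, and the quartile method (`inclusive`) is one of several conventions, so other tools may report slightly different box edges.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a simple box plot draws for one variable."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "low": data[0],                  # end of the lower whisker
        "q1": q1,                        # box spans q1..q3: the middle 50%
        "median": median,
        "q3": q3,
        "high": data[-1],                # end of the upper whisker
        "mean": statistics.fmean(data),
    }

summary = box_plot_summary([9, 1, 5, 3, 7, 2, 8, 4, 6])
print(summary)
```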

Bivariate data

[Figure: data table with factors A, B, C and two variables, plotted as a scatter plot]
Scatter plot is common

Trivariate data

[Figure: data table with factors A, B, C and three variables, plotted in 3D]
3D scatter plot is possible

Three variables

Two variables [x,y] can map to points
- Scatterplots, maps, …

Third variable [z] must use
- Color, size, shape, …

Large design space (visual metaphors)

[Bertin, Graphics and Graphic Info. Processing, 1981]
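A small sketch of the three-variable case: x and y map to position and z maps to size. The records, pixel ranges and the helper `linear_scale` are all hypothetical. Circle area, rather than radius, is made proportional to z here, which is a common design choice but not something the slides prescribe.

```python
import math

def linear_scale(domain, pixel_range):
    """Return a function mapping [d0, d1] linearly onto [r0, r1]."""
    (d0, d1), (r0, r1) = domain, pixel_range
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)

# Made-up records: (x, y, z)
records = [(0.0, 10.0, 1.0), (5.0, 20.0, 4.0), (10.0, 30.0, 9.0)]

x_scale = linear_scale((0.0, 10.0), (0.0, 200.0))
y_scale = linear_scale((10.0, 30.0), (100.0, 0.0))    # screen y grows downward
area_scale = linear_scale((0.0, 9.0), (0.0, 900.0))   # z -> circle area in px^2

marks = [
    {"cx": x_scale(x), "cy": y_scale(y),
     "r": math.sqrt(area_scale(z) / math.pi)}
    for x, y, z in records
]

for m in marks:
    print(m)
```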

Multidimensional data

How many variables can be depicted in an image?

[Figure: data table with factors A, B, C and variables 1 to 8]

“With up to three rows, a data table can be constructed directly as a single image … However, an image has only three dimensions. And this barrier is impassible.”
Bertin

Deconstructions

Playfair 1786

x-axis: year (Q)
y-axis: currency (Q)
color: imports/exports (N, O)

Wattenberg 1998

http://www.smartmoney.com/marketmap/

rectangle size: market cap (Q)
rectangle position: market sector (N), market cap (Q)
color hue: loss vs. gain (N, O)
color value: magnitude of loss or gain (Q)

Minard 1869: Napoleon’s march

Single axis composition

[Figure: two single-axis charts combined with + and =] [based on slide from Mackinlay]

Mark composition

y-axis: temperature (Q)
+ x-axis: longitude (Q) / time (O)
= temp over space/time (Q x Q)
[based on slide from Mackinlay]

Mark composition

y-axis: latitude (Q)
+ x-axis: longitude (Q)
+ width: army size (Q)
= army position (Q x Q) and army size (Q)
[based on slide from Mackinlay]

[Figure: the composed chart, labelled longitude (Q), latitude (Q), army size (Q), temperature (Q) and longitude (Q) / time (O)] [based on slide from Mackinlay]

Minard 1869: Napoleon’s march

Depicts at least 5 quantitative variables. Any others?

Automated design

Jock Mackinlay’s APT 86

Combinatorics of Encodings

Challenge:
Pick the best encoding from the exponential number of possibilities (n+1)^8

Principle of Consistency:
The properties of the image (visual variables) should match the properties of the data.

Principle of Importance Ordering:
Encode the most important information in the most effective way.

Design Criteria (Mackinlay)

Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.
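One way to arrive at the count (n+1)^8, sketched under an assumption: each of the 8 visual variables on Bertin's list (position counted twice) either encodes one of the n data variables or is left unused. The channel names and example fields below are hypothetical.

```python
from itertools import product

VISUAL_VARIABLES = ["x", "y", "size", "value", "texture",
                    "color", "orientation", "shape"]

def count_encodings(n_fields: int) -> int:
    """Each visual variable takes one of n fields or stays unused: (n+1)^8."""
    return (n_fields + 1) ** len(VISUAL_VARIABLES)

def enumerate_encodings(fields):
    """Yield every assignment of fields (or None) to the visual variables."""
    choices = [None, *fields]
    for combo in product(choices, repeat=len(VISUAL_VARIABLES)):
        yield dict(zip(VISUAL_VARIABLES, combo))

fields = ["year", "people"]
enumerated = sum(1 for _ in enumerate_encodings(fields))
print(enumerated, count_encodings(2), count_encodings(5))
```

Even five data variables already give well over a million candidate encodings, which is why the search needs criteria to prune and rank them.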

Cannot express the facts

A one-to-many (1 → N) relation cannot be expressed in a single horizontal dot plot because multiple tuples are mapped to the same position

Expresses facts not in the data

A length is interpreted as a quantitative value;
∴ Length of bar says something untrue about N data

[Mackinlay, APT, 1986]
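The first failure case can be illustrated with a simplified check: if two tuples land on the same position, the plot cannot show all the facts. This is only a sketch of the idea, not Mackinlay's formal test; the relation, the function names and the choice of position key are hypothetical.

```python
from collections import defaultdict

def colliding_positions(tuples, position_of):
    """Group tuples by drawn position; return positions holding more than one."""
    by_position = defaultdict(list)
    for t in tuples:
        by_position[position_of(t)].append(t)
    return {p: ts for p, ts in by_position.items() if len(ts) > 1}

def dot_plot_is_expressive(tuples, position_of) -> bool:
    """True when every tuple gets its own position on the axis."""
    return not colliding_positions(tuples, position_of)

# Made-up relations of (owner, car) pairs.
one_to_many = [("Ann", "Accord"), ("Ann", "Civic"), ("Bob", "Saab")]
one_to_one = [("Ann", "Accord"), ("Bob", "Saab")]

def by_owner(t):
    """Place each tuple at the position of its first attribute."""
    return t[0]

print(dot_plot_is_expressive(one_to_many, by_owner))
print(dot_plot_is_expressive(one_to_one, by_owner))
print(colliding_positions(one_to_many, by_owner))
```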

Design Criteria (Mackinlay)

Expressiveness
A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

Effectiveness
A visualization is more effective than another visualization if the information conveyed by one visualization is more readily perceived than the information in the other visualization.

(Effectiveness subject of the Graphical Perception lecture)

Mackinlay’s Ranking

Conjectured effectiveness of the encoding

Mackinlay’s Design Algorithm

User formally specifies data model and type
- Additional input: ordered list of data variables to show

APT searches over design space
- Tests expressiveness of each visual encoding
- Generates image for encodings that pass test
- Tests perceptual effectiveness of resulting image

Outputs the “most effective” visualization

Limitations

Does not cover many visualization techniques
- Bertin and others discuss networks, maps, diagrams
- Does not consider 3D, animation, illustration, photography, …

Does not model interaction
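The design algorithm can be approximated by a greedy sketch: take the variables in order of importance and give each the most effective channel that is still free and expressive for its type. This is a simplification and not APT itself, which searches a larger design space. The `EXPRESSIVE` table is transcribed from the levels-of-organization slide; the `EFFECTIVENESS` orderings are illustrative assumptions, not Mackinlay's published ranking; the variable names are hypothetical.

```python
# Data types each channel can express (from Bertin's levels of organization).
EXPRESSIVE = {
    "position": {"N", "O", "Q"}, "size": {"N", "O", "Q"},
    "value": {"N", "O", "Q"}, "texture": {"N", "O"},
    "color": {"N"}, "orientation": {"N"}, "shape": {"N"},
}

# Assumed effectiveness orderings, best first (illustration only).
EFFECTIVENESS = {
    "Q": ["position", "size", "value"],
    "O": ["position", "value", "size", "texture"],
    "N": ["position", "color", "texture", "shape",
          "orientation", "value", "size"],
}

def design(variables, position_slots=2):
    """variables: list of (name, type) pairs in decreasing importance."""
    free = {channel: 1 for channel in EXPRESSIVE}
    free["position"] = position_slots            # x and y
    assignment = {}
    for name, dtype in variables:
        for channel in EFFECTIVENESS[dtype]:
            # Expressiveness test, then take the best channel still free.
            if dtype in EXPRESSIVE[channel] and free[channel] > 0:
                free[channel] -= 1
                assignment[name] = channel
                break
        else:
            raise ValueError(f"no expressive channel left for {name!r}")
    return assignment

spec = [("year", "Q"), ("people", "Q"), ("marst", "N"), ("age", "O")]
result = design(spec)
print(result)
```

Listing the variables in a different order changes the outcome, which mirrors the Principle of Importance Ordering: the most important variable gets first pick of the channels.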

Summary

Formal specification
- Data model
- Image model
- Encodings mapping data to image

Choose expressive and effective encodings
- Formal test of expressiveness
- Experimental tests of perceptual effectiveness

Assignment 1: Visualization Design

Design a static visualization for a given data set.

Deliverables (post to the course wiki)
- Image of your visualization
- Short description and design rationale (≤ 4 para.)

Due by 7:00am on Monday 9/28.
