Making bar plots

Plots and contingency tables

Plots are graphical representations of data. Plots of categorical data can be made on the basis of contingency tables.

First plot: pie chart

pie(x): pie is the function, x is the obligatory argument.

Make a pie chart for the "miles per gallon" (mpg) variable
1. Make a contingency table of the mpg variable
2. Make a pie chart with the function pie()
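The two steps can be sketched as follows. This assumes the built-in mtcars dataset, which the later slides use; the name mpg_table is only an illustrative choice.

```r
# Step 1: contingency (frequency) table of the mpg variable
mpg_table <- with(mtcars, table(mpg))

# Step 2: pie chart of that table
pie(mpg_table)
```

Because mpg takes many different values, the resulting chart has many thin slices.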

Although pie charts are very common, many statisticians criticize them and recommend bar or dot plots instead, because people judge length more accurately than angle or area. (No more pie charts!)

Bar plot

A barplot is useful to represent frequency distributions of categorical data. We can, for instance, make a barplot of the number of gears variable (gear).

First we make a frequency distribution with table(), then we pass it to the function barplot().
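A minimal sketch of these two steps, assuming the built-in mtcars dataset; the name gear_table is illustrative.

```r
# Frequency distribution of the number of gears
gear_table <- with(mtcars, table(gear))
gear_table

# Bar plot of the frequency distribution
barplot(gear_table)
```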

You can change the names of the bars with the names() function
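One way this could look, assuming the frequency table of gear from mtcars; the labels are illustrative choices.

```r
gear_table <- with(mtcars, table(gear))

# Replace the default bar names "3", "4", "5" with descriptive labels
names(gear_table) <- c("three gears", "four gears", "five gears")

barplot(gear_table)
```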

You can make your plot prettier by manipulating the colors, the font, etc. Here is an example for the colors, with the extra argument col. Color reference: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
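A small sketch of the col argument, assuming the gear table from mtcars. The three color names are the ones used for the two-way plots later in these slides; bar_colors is an illustrative name.

```r
gear_table <- with(mtcars, table(gear))

# One color per bar, given as a character vector of R color names
bar_colors <- c("aquamarine4", "cadetblue4", "chartreuse4")

barplot(gear_table, col = bar_colors)
```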

You can also add a title with the argument main.

Font size can be changed with the arguments cex.axis (for the frequency scale on the y-axis), cex.names (for the names we gave to the columns), and cex.main (for the title).
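A sketch combining a title with the font-size arguments, assuming the gear table from mtcars. The title text and the numeric values are arbitrary illustrations; values above 1 enlarge the text and values below 1 shrink it.

```r
gear_table <- with(mtcars, table(gear))
names(gear_table) <- c("three gears", "four gears", "five gears")

barplot(gear_table,
        main = "number of gears",  # title of the plot
        cex.axis = 0.8,            # smaller numbers on the y-axis
        cex.names = 1.2,           # larger bar names
        cex.main = 1.5)            # larger title
```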

Bar plots of two-way tables

Suppose you want to show that there is some kind of connection between the transmission system (automatic versus manual) and the number of gears.
1. Create a two-way table from the mtcars dataset, using the functions with() and table(); the relevant variables are gear and am. Assign it to some name.
2. Make a barplot of this table.
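The exercise could be solved along these lines. The name mytwowaytable is the one used in the commands on the following slides; putting gear first makes the gears the rows (stacked segments) and am the columns (bars).

```r
# Step 1: two-way table of number of gears (rows) by transmission (columns)
mytwowaytable <- with(mtcars, table(gear, am))
mytwowaytable

# Step 2: stacked bar plot, one bar per column of the table
barplot(mytwowaytable)
```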

Is this insightful? Why, or why not? What can we do to make it more insightful?

1. Adding names to the columns

> barplot(mytwowaytable, names=c("automatic", "manual"))

2. Adding a title to the graph

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears")

3. Changing the colors (color reference: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf)

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine4", "cadetblue4", "chartreuse4"))

4. Adding a legend with the argument legend.text

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"))

5. Alternative: grouped bar plot

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"), beside=TRUE)

Bar plots summary

A bar plot is a good way of visually representing categorical data.
Bar plots are based on contingency (frequency) tables, made with table().
Bar plots are made with the function barplot(), where the minimal argument is a contingency table.
Further graphical parameters can be added, like
• names= for adding names to the columns (vector); this is short for the full argument name names.arg=
• main= for a general title
• col= for colors of the bars (vector)
• legend.text= to add a legend with the names of the categories (vector)
• beside=TRUE for a grouped bar plot rather than a stacked one

> barplot(mytwowaytable, names=c("automatic", "manual"), main="transmission and number of gears", col=c("aquamarine", "cadetblue4", "chartreuse4"), legend.text=c("three gears", "four gears", "five gears"), beside=TRUE)
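For reference, the summary command can be run from a fresh R session as sketched below. This assumes mytwowaytable was built from mtcars with gear as rows and am as columns, which is what the column names and legend labels imply.

```r
# Contingency table: number of gears (rows) by transmission type (columns)
mytwowaytable <- with(mtcars, table(gear, am))

barplot(mytwowaytable,
        names.arg = c("automatic", "manual"),   # am = 0 and am = 1
        main = "transmission and number of gears",
        col = c("aquamarine", "cadetblue4", "chartreuse4"),
        legend.text = c("three gears", "four gears", "five gears"),
        beside = TRUE)                           # grouped instead of stacked
```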

Making bar plots 2: do it yourself

Getting a dataset #2

Kabacoff 2011

Appropriate formats: CSV

• You can't enter data directly from Excel, you have to save it in another format (which is, thankfully, easy)
• A comma-separated values (.csv) file stores tabular (=table) data in plain-text form
• A .csv file consists of any number of records (=rows) separated by line breaks of some kind
• Each record consists of fields (=columns) separated by some character, most commonly a comma or semicolon

Excel may complain; just 'yes' and 'OK' your way through it. Save the file, then close it (when Excel asks you to save again, say no).

Depending on your software system, you may have to replace the ; with , in the file, or you have to tell R what your separator is.
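Telling R what the separator is could look like the sketch below. The file and its contents are hypothetical: a small temporary file is written first so that the example runs on its own.

```r
# Hypothetical example file with semicolon-separated fields
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name;gears", "car_a;3", "car_b;5"), csv_path)

# Option 1: tell read.csv() that the separator is a semicolon
dat <- read.csv(csv_path, sep = ";")
dat

# Option 2: read.csv2() uses ";" as separator (and "," as decimal mark) by default
dat2 <- read.csv2(csv_path)
```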

...

Appropriate formats: Tab-delimited

• A tab-delimited file (.txt) stores tabular (=table) data in plain-text form
• A tab-delimited file consists of any number of records (=rows) separated by line breaks of some kind
• Each record consists of fields (=columns) separated by a tab character
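A tab-delimited file can be read with read.delim(), which expects a tab as separator by default. The file below is a hypothetical example written to a temporary location so that the sketch is self-contained.

```r
# Hypothetical tab-delimited file: "\t" is the tab character
txt_path <- tempfile(fileext = ".txt")
writeLines(c("name\tgears", "car_a\t3", "car_b\t5"), txt_path)

# read.delim() assumes a header row and tab-separated fields
dat <- read.delim(txt_path)
dat
```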

Getting datasets into R #1

Reading a csv file using the search path

On a Mac