VIEWS: 14 PAGES: 52 POSTED ON: 3/5/2010 Public Domain
More Complex Graphics in R Fish 507 F: Lecture 8 Recommended Readings • R Graphics – R Graphics: Chapter 3/4 (Paul Murrell, 2006) – http://www.statsnetbase.com/books/2884/c486x_fm.pdf • How to Display Data Badly (Howard Wainer, 1984) – http://www.jstor.org/stable/2683253 Outline • The aim of this lecture is to explore more of the options in par in order to create more informative plots and introduce other high-level plotting routines in R – Working with multiple layouts – Saving plots to files – 3-dimensional plots – Trellis Graphics Possum Data • The possum data frame consists of nine morphometric measurements on each of 104 mountain brushtail possums, trapped at seven sites from Southern Victoria to central Queensland Australia – Data comes from DAAG package in R – Available at: http://students.washington.edu/gtommy/FISH507F/data/ > possum <- read.csv("http://students.washington.edu/gtom . . . > head(possum, n=3) case site Pop sex age hdlngth skullw totlngth taill . . . C3 1 1 Vic m 8 94.1 60.4 89.0 36.0 . . . C5 2 1 Vic f 6 92.5 57.6 91.5 36.5 . . . C10 3 1 Vic f 6 94.0 60.0 95.5 39.0 . . . Possum Data • Case - observation number • Site - one of seven locations where possums were trapped • Pop - a factor which classifies the sites as Vic Victoria, other New South Wales or Queensland • Sex - a factor with levels f female, m male • Age - age • Hdlngth - head length • Skullw - skull width • Totlngth - total length • Taill - tail length • Footlgth - foot length • Earconch - ear conch length • Eye - distance from medial canthus to lateral canthus of right eye • Chest - chest girth (in cm) • Belly - belly girth (in cm) Opening the Graphics Device • windows() opens a new blank graphing device • This function has a variety of uses – Changing background color – Color pixels per inch – Specifying the size of the plotting window (default 7 x 7) • When two plots are placed in the same graphing device specifying windows(width=14,height=7) will prevent the plots from looking squished. – This can also be performed by hand by resizing the window Closing the graphing device • Since windows() will create a new graphing device, it’s possible to have multiple graphing devices open simultaneously in R – This can be useful in some situations • To close the current graphing device use dev.off() Multiple graphs • It’s often useful to put multiple graphs on the same active R Graphics device • This is achieved with mfrow / mfcol specifications in par – Takes as input a vector specifying the number of rows and number of columns: c(nr, nc) – mfrow: multiple plots on one graph filled by row – mfcol: same but by column • This command needs to be given prior to making the plots Default way to make 3 plots (only specify mfrow) 100 100 100 95 95 95 Head Length Head Length Head Length 90 90 90 85 85 85 75 80 85 90 95 32 34 36 38 40 42 50 55 60 65 Total Length Tail Length Skull Weight Changing the margins • When multiple graphs are laid out space can be optimized by modifying the default margins • par(mar=c(bottom, left, top, right)) – Default is c(5, 4, 4, 2) + 0.1 • Note that if these specifications are not chosen carefully each individual plot may not be the same size making plot comparisons on the same scale misleading Make use of windows() and mar 100 100 100 95 95 95 Head Length 90 90 90 85 85 85 75 80 85 90 95 32 34 36 38 40 42 50 55 60 65 Total Length Tail Length Skull Weight What’s sub-optimal about this plot? 95 95 possum$totlngth possum$totlngth 90 90 85 85 80 80 75 75 22 24 26 28 30 32 25 30 35 40 possum$chest possum$belly Histogram of possum$chest Histogram of possum$belly 30 20 25 20 15 Frequency Frequency 15 10 10 5 5 0 0 22 24 26 28 30 32 25 30 35 40 possum$chest possum$belly Hands-on Exercise 1 • As a plotting review, make changes to the previous plot based on our discussion and the contributed suggestions. Two of the changes I had in mind were: – Redundant labeling and spacing – Optimize space – Putting the plots on the same scale Making customized layouts • The layout() function provides an alternative to the mfrow and mfcol settings. • The primary difference is that the layout() function allows the creation of multiple figure regions of unequal sizes and is much more flexible • The first argument (and the only required argument) to the layout() function is a matrix. The number of rows and columns in the matrix determines the number of rows and columns in the layout. The contents of the matrix are integer values that determine which rows and columns each figure will occupy. layout() layout(mat, widths = rep(1, ncol(mat)), heights = rep(1, nrow(mat)), respect = FALSE) • mat a matrix object specifying the location of the next N figures on the output device. Each value in the matrix must be 0 or a positive integer. If N is the largest positive integer in the matrix, then the integers {1,...,N-1} must also appear at least once in the matrix. • widths a vector of values for the widths of columns on the device. Relative widths are specified with numeric values. Absolute widths (in centimetres) are specified with the lcm() function (see examples). • heights a vector of values for the heights of rows on the device. Relative and absolute heights can be specified, see widths above. • respect argument controls whether a unit column-width is the same physical measurement on the device as a unit row-height Understanding layout() • First create a matrix with numbers for each location to plot > ( lay1.mat <- matrix(c(1,1,0,2), 2, 2, byrow = TRUE) ) [,1] [,2] [1,] 1 1 [2,] 0 2 1 • Assign and show the layout > lay1 <- layout(mat=lay1.mat) > layout.show(lay1) 2 Understanding layout() > ( lay2.mat <- matrix(c(1,1,3,2), 2, 2, byrow = TRUE) ) [,1] [,2] [1,] 1 1 [2,] 3 2 > lay2 <- layout(mat=lay2.mat) > layout.show(lay2) 1 3 2 Understanding layout() > ( lay3.mat <- matrix(c(1,1,1,1,2,3), nrow=3, byrow = TRUE) ) [,1] [,2] [1,] 1 1 [2,] 1 1 [3,] 2 3 > lay3 <- layout(mat=lay3.mat) 1 > layout.show(lay3) plot(possum$hdlngth~possum$totlngth) hist(possum$hdlngth) hist(possum$totlngth) 2 3 100 95 possum$hdlngth 90 85 75 80 85 90 95 possum$totlngth Histogram of possum$hdlngth Histogram of possum$totlngth 25 20 20 15 Frequency Frequency 15 10 10 5 5 0 0 85 90 95 100 75 80 85 90 95 possum$hdlngth possum$totlngth Understanding layout() • In this example we will make a more custom layout > ( lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE) ) [,1] [,2] [1,] 2 0 2 [2,] 1 3 > lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE) > layout.show(lay4) Note the use of widths/heights to control the size of the regions 1 3 Putting it all together • Design the layout – Note that the width/heights argument is now specified lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE) lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE) layout.show(lay4) Don’t plot anything • Get histogram counts to use in barplot() hdlngth.hist <- hist(possum$hdlngth, plot = F) totlngth.hist <- hist(possum$totlngth, plot = F) Putting it all together • Specify margins and plot the figures – Note use of horizontal=T par(mar=c(3,2.5,1,1)) plot(possum$hdlngth~possum$totlngth, main = "") par(mar=c(0,3,1,1)) barplot(totlngth.hist$counts) par(mar=c(3,0,1,1)) barplot(hdlngth.hist$counts, horiz = T) possum$hdlngth 85 90 95 100 0 5 10 15 20 75 80 85 90 95 0 5 10 15 20 25 Saving plotting output to files • Since graphs can take a long time to produce or the analysis that went into a graph may have taken days to converge, it is a good idea to save important graphs to files • This can be done with a variety functions in R conveniently named after the common file name extension (*. jpeg – jpeg()) • My advice Program R Function Reason Microsoft Word/PowerPoint win.metafile() Image resizable without losing quality LaTex pdf() Native format of document Other png() All around high quality image Creating an image • All the functions in R to create images have 4 arguments in common – filename=“name.extension” – pointsize – size (resolution) of the points – width/height – controls size of image. Note that by default this 7 x 7 so if you use windows() to make a larger plotting device, the width/height should also match those dimensions. • The image statement should always precede the commands to make the plot. • When finished making the plot, use dev.off() to turn finish writing the file png() • Set your working directory or specify the path in the filename statement before saving an image png("possum.png", pointsize=12, width=480, height=480) lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE) lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE) Note that the width/height . . . . . argument is in pixels barplot(hdlngth.hist$counts, horiz = T) dev.off() pdf() pdf("possum.pdf", pointsize=12, width=7, height=7) lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE) lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE) Note that the width/height . . . . . argument is in inches barplot(hdlngth.hist$counts, horiz = T) dev.off() • Now look for the files in the folder where they were saved 3-dimensional graphs • There are number of different functions in R to create graphs that convey information about 3-dimensions – Think about which plots to use. A bubble plot (2-dimensional bubble sizes corresponding to z-values) in some cases can be more informative than 3-dimensional perspective plot • For most 3-dimensional plots in R and the x and y values are given as vectors in ascending orders and the z values are given as a matrix with the number of rows corresponding to x and number of columns corresponding to y – The data we will work with will be in this format. If it weren’t, what are some approaches to getting there ? • Think back to the data manipulation lectures . . . volcano data • Maunga Whau (Mt Eden) is one of about 50 volcanos in the Auckland, New Zealand volcanic field. The data consists of 87 rows and 61 columns, rows corresponding to grid lines running east to west and columns to grid lines running south to north. – volcano is the z-matrix – Take a look at the volcano data • Open the help file on filled.contour – ?filled.contour filled.contour x <- 10*1:nrow(volcano) Data is in hundreds of meters y <- 10*1:ncol(volcano) Also see rainbow, filled.contour(x, y, volcano, heat.colors, topo.colors color = terrain.colors, plot.title = title(main = "The Topography of Maunga Whau", xlab = "Meters North", ylab = "Meters West"), plot.axes = { axis(1, seq(100, 800, by = 100)) axis(2, seq(100, 600, by = 100)) }, key.title = title(main="Height"), key.axes = axis(4, seq(90, 190, by = 10))) Control key aspects Arguments to control axes and titles are slightly different because this function uses the title and axis statements The Topography of Maunga Whau Height 600 190 180 500 170 400 160 Meters West 150 300 140 130 200 120 110 100 100 90 100 200 300 400 500 600 700 800 Meters North Hands-on Exercise 2 • Figure out how to create a similar plot with contour lines overlaid. – Note that there are many ways to do this and using the filled.contour() function may not be the easiest. Explore the help files ! – A key may also not be necessary since the contour lines have the elevation embedded in them The Topography of Maunga Whau 600 110 120 500 110 400 160 10 Meters West 0 150 300 0 18 0 17 190 200 180 160 170 11 140 0 10 0 100 130 0 10 110 200 400 600 800 Meters North persp() function • The persp() function draws perspective plots of surfaces over an x–y plane • Look up help on persp() – ? persp() – Looking at examples are a good way to get started more complex functions like persp() • Note the function returns a viewing transformation matrix, a 4 x 4 matrix suitable for projecting 3D coordinates (x,y,z) into the 2D plane. The function trans3d() can be used in conjunction with this matrix to make the projection persp() • By default the function can return some pretty ugly plots volcano y x persp() • Save the transformation matrix to trans Shading direction as if being illuminated from the azimuth trans <- persp(x, y, volcano, theta = 135, phi = 30, ltheta = -120, shade=0.5, border=NA, box = FALSE) Control viewing direction. theta gives the azimuthal direction and phi the colatitude Shade the surface by this Pretty it up degree as if being iluminated by ltheta Labeling the summit • Find which matrix indices correspond to the max > which(volcano == max(volcano), arr.ind=T) row col [1,] 20 31 This is not a vector • Make the projection and add points for the summit summit <- trans3d(x[20], y[31], max(volcano), trans) points(summit[1], summit[2], pch=16) text(summit[1], summit[2], "Summit", pos=3) The transformation matrix Summit Other 3-d functions • contour() – contour lines • image() – image plot – These two are essentially used together in filled.contour() • contourplot() – lattice package, easier control • wireframe() – lattice package, similar to persp() • symbols() – 2-d, but uses a symbol of varying radius to represent the z- variable Trellis graphics • Trellis graphics were a new style of graphics presented in S-Plus that are particularly useful for displaying multivariate and especially grouped data. • These graphics can produce publication quality plots with a relative few lines of code. – However, they do lack flexibility • R can implement these graphics by using the lattice package install.packages("lattice") library(lattice) The formula argument • Just like specifying a model when fitting a linear model, trellis graphics are produced by specifying a model formula but slightly different • Common formula arguments formula Explanation y ~ x Plots variable y against variable x. ~ x One variable plots (boxplot) z ~ x * y Specify three continuous variables (perspective plots) y1 + y2 ~ x y1 variable and the y2 variable against x (multivariate) y ~ x | g y against the variable x for each level of the variable g. quakes data • The quakes data set gives the locations of 1000 seismic events of MB > 4.0 occurring in a cube near Fiji since 1964. > head(quakes) lat long depth mag stations 1 -20.42 181.62 562 4.8 41 2 -20.62 181.03 650 4.2 15 Number of stations 3 -26.00 184.10 42 5.4 43 reporting the event 4 -17.97 181.66 626 4.1 19 5 -20.42 181.96 649 4.0 11 6 -19.68 184.31 195 4.0 12 xyplot() • Go to help file ?xyplot • In the following slides we will extend a simple bivariate plot ( y vs. x) xyplot(lat ~ long, data=quakes, main="Earthquakes in the Pacific Ocean\n(since 1964)") formula statement \n moves text to the next line Earthquakes in the Pacific Ocean (since 1964) -10 -15 -20 lat -25 -30 -35 165 170 175 180 185 long Shingles • A shingle is a data structure used in Trellis, and is a generalization of factors to ‘continuous’ variables (recall cut() ) • equal.count() converts x to a shingle using an equal count algorithm that insures equal representation – See shingle() for finer control • Create a depth range categorical variable and plot it depthgroup <- equal.count(quakes$depth, number=3, overlap=0) The number of conditioning intervals • xyplot(lat ~ long | depthgroup, data=quakes) The fraction of overlap of the conditioning variables Earthquakes in the Pacific Ocean (since 1964) depthgroup -10 Level of conditioning -15 variable -20 -25 -30 -35 lat depthgroup depthgroup -10 -15 -20 -25 -30 -35 165 170 175 180 185 long xyplot() • The panels can be arranged with two additional arguments. xyplot(lat ~ long | depthgroup, data=quakes, layout=c(1, 3), aspect=1, index.cond=list(3:1)) Determines the order of panels within the conditioning variable Specify number of columns then rows ( just like mfcol ) depthgroup •The point is Trellis graphics are -10 -15 -20 customizable, but the learning curve -25 is a bit steeper. Before venturing into -30 these graphics, evaluate whether -35 you need their depthgroup -10 multivariate/categorical capabilities -15 and how much flexibility you need in -20 producing those plots. lat -25 -30 -35 depthgroup -10 -15 -20 -25 -30 -35 165 170 175 180 185 long xyplot() • So far this function has not generated anything that we would not be able to generate ourselves with a little coding. • When we start conditioning on two or more variables, the trellis graphics become more useful. xyplot(lat ~ long | depthgroup * magnitude, data=quakes, main="Fiji Earthquakes", ylab="latitude", xlab="longitude", scales=list(x=list(alternating=c(1, 1, 1))), between=list(y=1), Insert space between vertical panels par.strip.text=list(cex=0.7), par.settings=list(axis.text=list(cex=0.7))) Controls what panels have tick marks Controls text axis size Fiji Earthquakes magnitude magnitude magnitude depthgroup depthgroup depthgroup -10 -15 -20 -25 -30 -35 latitude magnitude magnitude magnitude depthgroup depthgroup depthgroup -10 -15 -20 -25 -30 -35 165 170 175 180 185 165 170 175 180 185 165 170 175 180 185 longitude Other trellis graphics Lattice Function Description Traditional Analogue barchart() Barcharts barplot() bwplot() Boxplots boxplot() Box-and-whisker plots densityplot() Conditional kernel density plots none Smoothed density estimate dotplot() Dotplots dotchart() Continuous versus categorical histogram() Histograms hist() qqmath() Quantile{quantile plots qqnorm() Data set versus theoretical distribution stripplot() Stripplots stripchart() One-dimensional scatterplot qq() Quantile{quantile plots qqplot() Data set versus data set xyplot() Scatterplots plot() levelplot() Level plots image() contourplot() Contour plots contour() cloud() 3-dimensional scatterplot none wireframe() 3-dimensional surfaces persp() splom() Scatterplot matrices pairs() parallel() Parallel coordinate plots none Hands-on Exercise 3 • Make use of the trellis graphics of your choice with the possum data. Be sure to use the categorical variable sex and create at least one shingle. Be creative. If you have extra time explore the help files to pretty-up your plots!