Lecture_8 by liaoxiuli4

VIEWS: 14 PAGES: 52

									More Complex Graphics in R

     Fish 507 F: Lecture 8
Recommended Readings
• R Graphics
   – R Graphics: Chapter 3/4 (Paul Murrell, 2006)
   – http://www.statsnetbase.com/books/2884/c486x_fm.pdf


• How to Display Data Badly (Howard Wainer, 1984)
   – http://www.jstor.org/stable/2683253
Outline
• The aim of this lecture is to explore more of the options in par in
  order to create more informative plots and introduce other high-level
  plotting routines in R


    – Working with multiple layouts


    – Saving plots to files


    – 3-dimensional plots


    – Trellis Graphics
Possum Data
•   The possum data frame consists of nine morphometric measurements on
    each of 104 mountain brushtail possums, trapped at seven sites from
    Southern Victoria to central Queensland Australia
     – Data comes from DAAG package in R
     – Available at: http://students.washington.edu/gtommy/FISH507F/data/


    > possum <- read.csv("http://students.washington.edu/gtom . . .
    > head(possum, n=3)
          case site Pop sex age hdlngth skullw totlngth taill . . .
    C3      1      1 Vic     m    8      94.1     60.4       89.0    36.0 . . .
    C5      2      1 Vic     f    6      92.5     57.6       91.5    36.5 . . .
    C10     3      1 Vic     f    6      94.0     60.0       95.5    39.0 . . .
Possum Data
•   Case - observation number
•   Site - one of seven locations where possums were trapped
•   Pop - a factor which classifies the sites as Vic Victoria, other New South Wales or
    Queensland
•   Sex - a factor with levels f female, m male
•   Age - age
•   Hdlngth - head length
•   Skullw - skull width
•   Totlngth - total length
•   Taill - tail length
•   Footlgth - foot length
•   Earconch - ear conch length
•   Eye - distance from medial canthus to lateral canthus of right eye
•   Chest - chest girth (in cm)
•   Belly - belly girth (in cm)
Opening the Graphics Device
• windows() opens a new blank graphing device


• This function has a variety of uses
    – Changing background color
    – Color pixels per inch
    – Specifying the size of the plotting window (default 7 x 7)


• When two plots are placed in the same graphing device specifying
   windows(width=14,height=7)
   will prevent the plots from looking squished.
    – This can also be performed by hand by resizing the window
Closing the graphing device
• Since windows() will create a new graphing device, it’s possible
  to have multiple graphing devices open simultaneously in R
   – This can be useful in some situations


• To close the current graphing device use dev.off()
Multiple graphs
• It’s often useful to put multiple graphs on the same active R
  Graphics device


• This is achieved with mfrow / mfcol specifications in par
     – Takes as input a vector specifying the number of rows and number of
       columns: c(nr, nc)
     – mfrow: multiple plots on one graph filled by row
     – mfcol: same but by column


•   This command needs to be given prior to making the plots
Default way to make 3 plots (only specify mfrow)




                         100




                                                                           100




                                                                                                                             100
                         95




                                                                           95




                                                                                                                             95
           Head Length




                                                             Head Length




                                                                                                               Head Length
                         90




                                                                           90




                                                                                                                             90
                         85




                                                                           85




                                                                                                                             85

                               75   80      85     90   95                       32   34   36   38   40   42                       50   55    60        65

                                         Total Length                                      Tail Length                                   Skull Weight
Changing the margins
• When multiple graphs are laid out space can be optimized by
  modifying the default margins


• par(mar=c(bottom, left, top, right))
   – Default is c(5, 4, 4, 2) + 0.1


• Note that if these specifications are not chosen carefully each
  individual plot may not be the same size making plot comparisons
  on the same scale misleading
Make use of windows() and mar
               100




                                                        100




                                                                                                     100
               95




                                                        95




                                                                                                     95
 Head Length

               90




                                                        90




                                                                                                     90
               85




                                                        85




                                                                                                     85
                     75   80     85           90   95         32   34   36          38     40   42         50   55          60      65

                               Total Length                                  Tail Length                             Skull Weight
What’s sub-optimal about this plot?



                         95




                                                                                   95
       possum$totlngth




                                                                 possum$totlngth
                         90




                                                                                   90
                         85




                                                                                   85
                         80




                                                                                   80
                         75




                                                                                   75
                              22   24    26     28     30   32                          25      30        35     40

                                        possum$chest                                            possum$belly



                               Histogram of possum$chest                                 Histogram of possum$belly




                                                                                   30
                         20




                                                                                   25
                                                                                   20
                         15
       Frequency




                                                                 Frequency

                                                                                   15
                         10




                                                                                   10
                         5




                                                                                   5
                         0




                                                                                   0




                              22   24    26     28     30   32                           25      30        35    40

                                        possum$chest                                            possum$belly
Hands-on Exercise 1
• As a plotting review, make changes to the previous plot based on
  our discussion and the contributed suggestions. Two of the
  changes I had in mind were:
   – Redundant labeling and spacing
   – Optimize space
   – Putting the plots on the same scale
Making customized layouts
• The layout() function provides an alternative to the mfrow and
  mfcol settings.


• The primary difference is that the layout() function allows the
  creation of multiple figure regions of unequal sizes and is much
  more flexible


• The first argument (and the only required argument) to the
  layout() function is a matrix. The number of rows and columns in
  the matrix determines the number of rows and columns in the layout.
  The contents of the matrix are integer values that determine which
  rows and columns each figure will occupy.
layout()
layout(mat, widths = rep(1, ncol(mat)), heights = rep(1,
   nrow(mat)), respect = FALSE)


•   mat a matrix object specifying the location of the next N figures on the output device.
    Each value in the matrix must be 0 or a positive integer. If N is the largest positive
    integer in the matrix, then the integers {1,...,N-1} must also appear at least once in
    the matrix.


•   widths a vector of values for the widths of columns on the device. Relative widths
    are specified with numeric values. Absolute widths (in centimetres) are specified with
    the lcm() function (see examples).


•   heights a vector of values for the heights of rows on the device. Relative and
    absolute heights can be specified, see widths above.


•   respect argument controls whether a unit column-width is the same physical
    measurement on the device as a unit row-height
Understanding layout()
• First create a matrix with numbers for each location to plot
   > ( lay1.mat <- matrix(c(1,1,0,2), 2, 2, byrow = TRUE) )
          [,1] [,2]
   [1,]     1      1
   [2,]     0      2
                                                        1




• Assign and show the layout
   > lay1 <- layout(mat=lay1.mat)
   > layout.show(lay1)
                                                                 2
Understanding layout()
 > ( lay2.mat <- matrix(c(1,1,3,2), 2, 2, byrow = TRUE) )
        [,1] [,2]
 [1,]     1    1
 [2,]     3    2
 > lay2 <- layout(mat=lay2.mat)
 > layout.show(lay2)                         1




                                       3           2
Understanding layout()
 > ( lay3.mat <- matrix(c(1,1,1,1,2,3), nrow=3, byrow = TRUE) )
        [,1] [,2]
 [1,]      1    1
 [2,]      1    1
 [3,]      2    3
 > lay3 <- layout(mat=lay3.mat)
                                                1


 > layout.show(lay3)


 plot(possum$hdlngth~possum$totlngth)
 hist(possum$hdlngth)
 hist(possum$totlngth)
                                         2              3
                 100
                 95
possum$hdlngth

                 90
                 85




                       75                   80              85                             90                     95

                                                          possum$totlngth

                            Histogram of possum$hdlngth                                Histogram of possum$totlngth
                 25




                                                                             20
                 20




                                                                             15
Frequency




                                                                 Frequency
                 15




                                                                             10
                 10




                                                                             5
                 5
                 0




                                                                             0




                            85     90       95      100                           75       80       85       90        95

                                   possum$hdlngth                                               possum$totlngth
Understanding layout()
• In this example we will make a more custom layout
  > ( lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE) )
          [,1] [,2]
  [1,]         2       0                        2



  [2,]         1       3
  > lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE)
  > layout.show(lay4)


   Note the use of widths/heights to control
   the size of the regions                      1          3
Putting it all together
• Design the layout
   – Note that the width/heights argument is now specified


  lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE)
  lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE)
  layout.show(lay4)
                                                        Don’t plot anything


• Get histogram counts to use in barplot()
   hdlngth.hist <- hist(possum$hdlngth, plot = F)
   totlngth.hist <- hist(possum$totlngth, plot = F)
Putting it all together
• Specify margins and plot the figures
   – Note use of horizontal=T


      par(mar=c(3,2.5,1,1))
      plot(possum$hdlngth~possum$totlngth, main = "")
      par(mar=c(0,3,1,1))
      barplot(totlngth.hist$counts)
      par(mar=c(3,0,1,1))
      barplot(hdlngth.hist$counts, horiz = T)
                  possum$hdlngth

        85   90               95   100
                                         0   5   10   15   20




75
80
85
90
95
0
5
10
15 20
25
 Saving plotting output to files
 • Since graphs can take a long time to produce or the analysis that
   went into a graph may have taken days to converge, it is a good
   idea to save important graphs to files


 • This can be done with a variety functions in R conveniently named
   after the common file name extension (*. jpeg – jpeg())


 • My advice
Program                     R Function         Reason

Microsoft Word/PowerPoint   win.metafile() Image resizable without losing quality

LaTex                       pdf()              Native format of document

Other                       png()              All around high quality image
Creating an image
• All the functions in R to create images have 4 arguments in common
   – filename=“name.extension”
   – pointsize – size (resolution) of the points
   – width/height – controls size of image. Note that by default this 7 x
     7 so if you use windows() to make a larger plotting device, the
     width/height should also match those dimensions.


• The image statement should always precede the commands to
  make the plot.


• When finished making the plot, use dev.off() to turn finish writing
  the file
png()
• Set your working directory or specify the path in the filename
  statement before saving an image


   png("possum.png", pointsize=12, width=480, height=480)
   lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE)
   lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE)

                                                  Note that the width/height
   . . . . .                                      argument is in pixels



   barplot(hdlngth.hist$counts, horiz = T)
   dev.off()
pdf()

    pdf("possum.pdf", pointsize=12, width=7, height=7)
    lay4.mat <- matrix(c(2,0,1,3),nrow=2,byrow=TRUE)
    lay4 <- layout(lay4.mat, c(3,1), c(1,3), respect=TRUE)

                                                      Note that the width/height
    . . . . .                                         argument is in inches


    barplot(hdlngth.hist$counts, horiz = T)
    dev.off()


•   Now look for the files in the folder where they were saved
3-dimensional graphs
• There are number of different functions in R to create graphs that
  convey information about 3-dimensions
    – Think about which plots to use. A bubble plot (2-dimensional bubble
      sizes corresponding to z-values) in some cases can be more
      informative than 3-dimensional perspective plot


• For most 3-dimensional plots in R and the x and y values are given
  as vectors in ascending orders and the z values are given as a
  matrix with the number of rows corresponding to x and number of
  columns corresponding to y
    – The data we will work with will be in this format. If it weren’t, what are
      some approaches to getting there ?
        • Think back to the data manipulation lectures . . .
volcano data
• Maunga Whau (Mt Eden) is one of about 50 volcanos in the
  Auckland, New Zealand volcanic field. The data consists of 87 rows
  and 61 columns, rows corresponding to grid lines running east to
  west and columns to grid lines running south to north.


   – volcano is the z-matrix


   – Take a look at the volcano data


• Open the help file on filled.contour
   – ?filled.contour
 filled.contour
      x <- 10*1:nrow(volcano)                Data is in hundreds of meters

      y <- 10*1:ncol(volcano)

                                                    Also see rainbow,
      filled.contour(x, y, volcano,                 heat.colors, topo.colors
            color = terrain.colors,
            plot.title = title(main = "The Topography of Maunga Whau",
            xlab = "Meters North", ylab = "Meters West"),
            plot.axes = { axis(1, seq(100, 800, by = 100))
                             axis(2, seq(100, 600, by = 100)) },
            key.title = title(main="Height"),
            key.axes = axis(4, seq(90, 190, by = 10)))


Control key aspects
                            Arguments to control axes and titles are slightly different because
                            this function uses the title and axis statements
                          The Topography of Maunga Whau                  Height

              600
                                                                                  190

                                                                                  180
              500
                                                                                  170


              400                                                                 160
Meters West




                                                                                  150

              300                                                                 140

                                                                                  130
              200
                                                                                  120

                                                                                  110
              100
                                                                                  100

                                                                                  90
                    100     200   300    400    500    600   700   800

                                        Meters North
Hands-on Exercise 2
• Figure out how to create a similar plot with contour lines overlaid.
    – Note that there are many ways to do this and using the
      filled.contour() function may not be the easiest. Explore the
      help files !
    – A key may also not be necessary since the contour lines have the
      elevation embedded in them
                                The Topography of Maunga Whau




              600
                          110
                                       120




              500
                                                                         110


              400


                                              160
                                                                                      10
Meters West




                                                                                        0
                                                            150
              300




                                                      0
                                                    18




                                                              0
                                                            17
                                 190
              200




                                       180          160

                                        170
                    11




                                                                  140
                      0




                                                                                            10
                                                                                              0
              100




                                                                          130




                                                                                        0
                                                                                      10
                                                    110




                                200                   400               600     800

                                                    Meters North
persp() function
• The persp() function draws perspective plots of surfaces over an
  x–y plane


• Look up help on persp()
    – ? persp()
    – Looking at examples are a good way to get started more complex
      functions like persp()


• Note the function returns a viewing transformation matrix, a 4 x 4
  matrix suitable for projecting 3D coordinates (x,y,z) into the 2D
  plane. The function trans3d() can be used in conjunction with
  this matrix to make the projection
persp()
• By default the function can return some pretty ugly plots


                   volcano




                             y




                                        x
   persp()
  • Save the transformation matrix to trans                        Shading direction as if being
                                                                   illuminated from the azimuth


       trans <- persp(x, y, volcano,
                             theta = 135, phi = 30, ltheta = -120,
                             shade=0.5, border=NA, box = FALSE)



Control viewing direction.
theta gives the azimuthal
direction and phi the
colatitude
                                   Shade the surface by this                      Pretty it up
                                   degree as if being iluminated
                                   by ltheta
Labeling the summit
• Find which matrix indices correspond to the max
  > which(volcano == max(volcano), arr.ind=T)
         row col
  [1,]   20   31
                                                    This is not a vector

• Make the projection and add points for the summit
  summit <- trans3d(x[20], y[31], max(volcano), trans)
  points(summit[1], summit[2], pch=16)
  text(summit[1], summit[2], "Summit", pos=3)


                                                     The transformation matrix
Summit
Other 3-d functions
• contour() – contour lines
• image() – image plot
   – These two are essentially used together in filled.contour()
• contourplot() – lattice package, easier control
• wireframe() – lattice package, similar to persp()
• symbols() – 2-d, but uses a symbol of varying radius to represent
  the z- variable
Trellis graphics
• Trellis graphics were a new style of graphics presented in S-Plus
  that are particularly useful for displaying multivariate and especially
  grouped data.


• These graphics can produce publication quality plots with a relative
  few lines of code.
    – However, they do lack flexibility


• R can implement these graphics by using the lattice package
      install.packages("lattice")
      library(lattice)
The formula argument
• Just like specifying a model when fitting a linear model, trellis
  graphics are produced by specifying a model formula but slightly
  different


• Common formula arguments

   formula         Explanation
   y ~ x           Plots variable y against variable x.
    ~ x            One variable plots (boxplot)
   z ~ x * y       Specify three continuous variables (perspective plots)
   y1 + y2 ~ x     y1 variable and the y2 variable against x (multivariate)
   y ~ x | g       y against the variable x for each level of the variable g.
quakes data
• The quakes data set gives the locations of 1000 seismic events of
  MB > 4.0 occurring in a cube near Fiji since 1964.


  > head(quakes)
         lat     long depth mag stations
  1 -20.42 181.62         562 4.8           41
  2 -20.62 181.03         650 4.2           15
                                                     Number of stations
  3 -26.00 184.10          42 5.4           43       reporting the event

  4 -17.97 181.66         626 4.1           19
  5 -20.42 181.96         649 4.0           11
  6 -19.68 184.31         195 4.0           12
xyplot()
• Go to help file ?xyplot
• In the following slides we will extend a simple bivariate plot ( y vs. x)


   xyplot(lat ~ long, data=quakes,
            main="Earthquakes in the Pacific Ocean\n(since 1964)")

formula statement

                                             \n moves text to the next line
                  Earthquakes in the Pacific Ocean
                           (since 1964)


      -10




      -15




      -20
lat




      -25




      -30




      -35




            165    170        175          180       185

                                    long
Shingles
•   A shingle is a data structure used in Trellis, and is a generalization of
    factors to ‘continuous’ variables (recall cut() )


• equal.count() converts x to a shingle using an equal count algorithm
  that insures equal representation
     – See shingle() for finer control


•   Create a depth range categorical variable and plot it
    depthgroup <- equal.count(quakes$depth, number=3, overlap=0)


             The number of conditioning intervals

•   xyplot(lat ~ long | depthgroup, data=quakes)
                                                            The fraction of overlap of the
                                                            conditioning variables
                         Earthquakes in the Pacific Ocean
                                  (since 1964)


                        depthgroup
                                                  -10
                                                                     Level of conditioning
                                                  -15                variable
                                                  -20

                                                  -25

                                                  -30

                                                  -35
lat




                        depthgroup                      depthgroup
      -10

      -15

      -20

      -25

      -30

      -35



            165   170    175   180   185

                                           long
xyplot()
• The panels can be arranged with two additional arguments.


     xyplot(lat ~ long | depthgroup, data=quakes,
            layout=c(1, 3), aspect=1,
     index.cond=list(3:1))
                                         Determines the order of panels within the conditioning variable




Specify number of columns then rows ( just like mfcol )
                 depthgroup
                                        •The point is Trellis graphics are
      -10
      -15
      -20
                                        customizable, but the learning curve
      -25                               is a bit steeper. Before venturing into
      -30                               these graphics, evaluate whether
      -35
                                        you need their
                 depthgroup
                                  -10
                                        multivariate/categorical capabilities
                                  -15   and how much flexibility you need in
                                  -20   producing those plots.
lat




                                  -25
                                  -30
                                  -35

                 depthgroup
      -10
      -15
      -20
      -25
      -30
      -35


            165 170 175 180 185

                     long
   xyplot()
   •   So far this function has not generated anything that we would not be able to
       generate ourselves with a little coding.


   •   When we start conditioning on two or more variables, the trellis graphics
       become more useful.
       xyplot(lat ~ long | depthgroup * magnitude, data=quakes,
              main="Fiji Earthquakes",
              ylab="latitude", xlab="longitude",
              scales=list(x=list(alternating=c(1, 1, 1))),
              between=list(y=1),       Insert space between vertical panels

              par.strip.text=list(cex=0.7),
              par.settings=list(axis.text=list(cex=0.7)))


Controls what panels have tick marks                                          Controls text axis size
                                                 Fiji Earthquakes

                              magnitude                      magnitude                      magnitude
                             depthgroup                     depthgroup                     depthgroup
                                                                                                              -10


                                                                                                              -15


                                                                                                              -20


                                                                                                              -25


                                                                                                              -30


                                                                                                              -35
latitude




                              magnitude                      magnitude                      magnitude
                             depthgroup                     depthgroup                     depthgroup
           -10


           -15


           -20


           -25


           -30


           -35




                 165   170   175    180   185   165   170    175   180   185   165   170   175    180   185

                                                            longitude
Other trellis graphics
    Lattice Function   Description                                Traditional Analogue
    barchart()         Barcharts                                  barplot()
    bwplot()           Boxplots                                   boxplot()
                       Box-and-whisker plots
    densityplot()      Conditional kernel density plots           none
                       Smoothed density estimate
    dotplot()          Dotplots                                   dotchart()
                       Continuous versus categorical
    histogram()        Histograms                                 hist()
    qqmath()           Quantile{quantile plots                    qqnorm()
                       Data set versus theoretical distribution
    stripplot()        Stripplots                                 stripchart()
                       One-dimensional scatterplot
    qq()               Quantile{quantile plots                    qqplot()
                       Data set versus data set
    xyplot()           Scatterplots                               plot()
    levelplot()        Level plots                                image()
    contourplot()      Contour plots                              contour()
    cloud()            3-dimensional scatterplot                  none
    wireframe()        3-dimensional surfaces                     persp()
    splom()            Scatterplot matrices                       pairs()
    parallel()         Parallel coordinate plots                  none
Hands-on Exercise 3
• Make use of the trellis graphics of your choice with the possum
  data. Be sure to use the categorical variable sex and create at least
  one shingle. Be creative. If you have extra time explore the help
  files to pretty-up your plots!

								
To top