Posted on: 3/18/2013. Public Domain.
An Introduction to R

Web resources
• R home page: http://www.r-project.org/
• R Archive: http://cran.r-project.org/
• R FAQ (frequently asked questions about R): http://cran.r-project.org/doc/FAQ/R-FAQ.html
• R manuals: http://cran.r-project.org/manuals.html

The R environment
• R command window (console) or Graphical User Interface (GUI)
  – Used for entering commands, data manipulations, analyses, graphing
  – Output: results of analyses, queries, etc. are written here
  – Toggle through previous commands by using the up and down arrow keys
• The R workspace
  – Current working environment
  – Composed primarily of variables, datasets, functions
• R scripts
  – A text file containing commands that you would enter on the command line of R
  – To place a comment in an R script, use a hash mark (#); R ignores everything from the # to the end of the line

Some tips for getting started in R
1. Create a new folder on your hard drive for your current R session
2. Open R and set the working directory to that folder
3. Save the workspace with a descriptive name and date
4. Open a new script and save the script with a descriptive name and date

Executing simple commands
• The assignment operator <-
• x <- 5 assigns the value 5 to the variable x
• y <- 2*x assigns the value of 2 times x (10 in this case) to the variable y
• r <- 4
• area.circle <- pi*r^2
• NOTE: R is case-sensitive (y ≠ Y)

R object types
• Vector
• Matrix
• Array
• Data frame
• Function
• List

Vectors and arrays
• Vector: a one-dimensional array; all elements of a vector must be of the same type (numeric, character, etc.)
• Matrix: a two-dimensional array with rows and columns
• Array: like a matrix, but of arbitrary dimension

Entering data into R
• Vectors: the "c" command combines or concatenates data
• Data can be character or numeric
    v1 <- c(12, 5, 6, 8, 24)
    v2 <- c("Yellow perch", "Largemouth bass", "Rainbow trout", "Lake whitefish")
• Vectors: sequences of numbers with c() and seq()
    years <- c(1990:2007)
    x <- seq(0, 100, 10)
    x <- seq(0, 200, length=100)
• Arrays: array() and matrix()
    m1 <- array(1:20, dim=c(4,5))
    m2 <- matrix(1:20, ncol=5, nrow=4)
• Arrays: combine vectors as columns or rows with cbind() and rbind()
    Matrix1 <- cbind(v1, v2)
    Matrix2 <- rbind(v1, v2)

Data frames
• A data frame is a list of variables of the same length with unique row names
• A collection of variables which share many of the properties of matrices and of lists
• Used as the fundamental data structure by most of R's modeling software
• Convert vectors or matrices into a data frame with data.frame()
    df1 <- data.frame(v1, v2)
    df2 <- data.frame(Matrix1)
  Note: data.frame() requires vectors of equal length. As entered above, v1 has five values and v2 has four, so add or drop a value before combining them.
• Editing data frames in a spreadsheet-like view with edit()
    df2 <- edit(df1)
• Let's go to R and enter some vectors, arrays, and data frames
Script: QFC R short course R object types_1_vectors and arrays.R

Placing variables in the R search path
• When variables in a data frame are used in R, the data frame name followed by a $ sign and then the variable name is required
    query1 <- df1$v3 > 20
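The object types introduced so far can be tried together in one short script. This is only a sketch: the values in v2 and v3 are hypothetical placeholders chosen so that all three vectors have the same length, and they are not the course data.

```r
# Hypothetical values for illustration only (not the course data)
v1 <- c(12, 5, 6, 8, 24)              # numeric vector
v2 <- c("A", "B", "C", "D", "E")      # character vector, same length as v1
v3 <- seq(10, 50, 10)                 # sequence 10, 20, 30, 40, 50

m1 <- matrix(1:20, nrow = 4, ncol = 5)  # matrix is filled column by column
m2 <- cbind(v1, v3)                     # two numeric vectors bound as columns

df1 <- data.frame(v1, v2, v3)           # all vectors must have equal length

# A variable inside a data frame is reached with the $ sign
query1 <- df1$v3 > 20                   # logical vector, one value per row

print(dim(m1))
print(query1)
print(df1[query1, ])                    # rows of df1 where v3 exceeds 20
```

Keeping every vector the same length is what lets data.frame() succeed here; cbind() on vectors of unequal length would instead recycle the shorter one with a warning.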
Placing variables in the R search path, cont
• Alternatively, the attach() function can be used
    attach(df1)
    query1 <- v3 > 20
    detach(df1)

Accessing data from an array, vector, or data frame
• Subscripts are used to extract data from objects in R
• Subscripts appear in square brackets and reference rows and columns, respectively

Subscripts
Example data frame df1:

         C1   C2      C3   C4    C5
    R1   25   Mon     56   45    Cat
    R2   2    Tues    84   2     Dog
    R3   24   Wed     7    15    Dog
    R4   15   Thurs   56   236   Cat
    R5   26   Fri     89   6     Cat
    R6   25   Sat     23   58    Dog
    R7   2    Sun     11   8     Dog

    df1[3,5]     row 3, column 5
    df1[,3]      all rows of column 3
    df1[5,]      all columns of row 5
    df1[2:5,]    rows 2 through 5, all columns

Queries in R: Common logical operators

    >        Greater than
    <        Less than
    ==       Equals
    !x       Logical negation (not x)
    x & y    Logical and (x and y)
    x | y    Logical or (x or y)

Queries in R
• The use of logical tests
    query1 <- df1$v3 > 20
    df1[query1,]
    query2 <- df1$v3 > 20 & df1$v4 < 30    # & = and
    query2 <- df1$v3 > 20 | df1$v4 < 30    # | = or
Script: QFC R short course R object types_2_query arrays data frames.R

Exercise 1

Importing data from Excel (or other database management programs)
• Export as text file (.txt)
• Tips
  – Avoid spaces in variable and character names; use a period instead (e.g., fish.weight, not fish weight, and Round.lake, not Round lake)
  – Replace missing data with "NA"
  – See Excel example (MI STORET data RAW.xls)
• read.table()
    data.frame.name <- read.table("file path", na.strings="NA", header=TRUE)
    df1 <- read.table("C:\\R\\Example\\datafile1.txt", na.strings="NA", header=TRUE)
  Note the use of \\ instead of \ in the path name
• If your working directory is set, R will automatically look for the data text file there.
• So, the read.table syntax can be simplified by excluding the file path name:
    read.table("data.txt", na.strings="NA", header=TRUE)

Exporting data from R
• write.table()
    write.table(df, file = "Path Name\\file_name.csv", sep = ",", col.names = NA)

Introduction to R functions
• R has many built-in functions and many more that can be downloaded from CRAN (Comprehensive R Archive Network) sites
• User-defined functions can also be created
• The R base package
• Common functions
    names(): obtain variable names of a df
    summary(): summary of all variables in a df
    mean(): mean
    var(): variance
    sd(): standard deviation
Script: QFC R short course R functions_1.R

Introduction to R functions, cont
    head(): print first few rows of a data frame
    sapply() and tapply(): column-wise and group-wise summaries
    levels(): obtain levels of a factor (categorical variable)
    by(): produce summaries by group

    tapply(variable, list(group1, group2), mean)
      Applies a function to each cell of a ragged array, that is, to each group defined by the grouping variables
    sapply(variable, FUN=)
      Applies a function to each element of a list or vector
    by(data, INDICES, FUN)
      Applies a function to subsets of a data frame split by INDICES

Introduction to R loops
Basic syntax:
    for (i in 1:n){
      some code
    }
*Excel example
Script: QFC R short course R functions_2.R

User-defined functions
    function.name <- function(argument){
      some code
    }
Script: QFC R short course R user defined functions_3.R

Exercise 2 (Part 1)

R functions part 2: subset data
• subset() function
    sub <- subset(data frame, criteria)
    sub1 <- subset(fish, no.fish > 50)
    sub2 <- subset(fish, no.fish > 50 & position == "Below")
• Select specific columns
    sub3 <- subset(fish, select=c(stream, site, no.fish))
Script: QFC R short course R subset_4.R

Exercise 2 (Part 2)

Introduction to basic graphing
http://addictedtor.free.fr/graphiques/

Graphing basics
Plotting commands
1. High-level functions: Create a new plot on the graphics device
2. Low-level functions: Add more information to an already existing plot, such as extra points, lines, and labels
3. Interactive graphing functions: Allow you to interactively add information to a graph

Common high-level functions
• plot(): A generic function that produces a type of plot that depends on the type of the first argument
• hist(): Creates a histogram of frequencies
• barplot(): Creates a bar plot of values
• boxplot(): Creates a boxplot
• pairs(): Creates a scatter plot matrix

    plot(x)
    plot(x,y)         scatter plot
    plot(y~x)         scatter plot
    plot(group, x)    box plot (when group is a factor)
    hist(x)
    boxplot(x~group)
    pairs(z)
    pairs(df1[,3:7])
Script: QFC R course Graphing Basics_1.R

Exercise 3

END OF DAY 1

Lower-level graphing functions
[Figure: four example plots. "Length (mm) histogram" (x-axis: Chinook salmon lengths; y-axis: Density). "Boxplot of length" (y-axis: Length (mm)). "Chinook triglyceride levels for three hatcheries" (x-axis: DWOR, MCCA, RAPH; y-axis: Triglycerides (mg/dL)). "Scatter plot of length-weight" (x-axis: Length (mm); y-axis: Weight (g); legend: DWOR, MCCA, RAPH; annotation: "Big Fish").]

• Axis scales and labels
    xlim=c(0,50)
    ylim=c(0,100)
    xlab="text"
    ylab="text"
    main="text"
    cex=    values < 1 make the font smaller than the default; values > 1 increase the font size
• Symbol shapes and colors
    pch = symbol types
    col = color types
  [Figure: reference chart of plotting symbols and colors, numbered 1 to 25]
• Adding lines, text, and points
    abline()
      abline(a,b)    a = intercept, b = slope
      abline(h=mean(x, na.rm=T))
    text()
      text(x,y, "text", options)
    points()
      points(x,y, options)
Scripts: QFC R course Graphing Basics_2.R
         QFC R course Graphing Basics_3.R

Exercise 4

Introduction to statistical analyses
• R provides many functions for statistical analyses
  – Descriptive
  – Univariate
  – Multivariate
  – Mixed models
  – Spatial
  – Bayesian
• Descriptive statistics
    Correlations: cor()
    cor(df1[,2:6])
    t-tests: t.test()
    t.test(y~group)
Script: QFC R short course correlations and t-test.R

Basic model structure in R
    response variable ~ predictor variable(s)
Symbols in model statements are used differently compared to arithmetic expressions

    Symbol   Meaning
    +        Indicates inclusion of a predictor variable, not addition
    -        Indicates the deletion of a predictor variable, not subtraction
    *        Indicates inclusion of predictor variables and their interaction, not multiplication
    /        Indicates nesting of predictor variables, not division
    |        Indicates conditioning
    :        Indicates an interaction (e.g., A:B is a two-way interaction between A and B)

Specifying models in R
• Linear regression example:

$$y_i = \beta_0 + \beta_1 x_i + e_i, \qquad i = 1, 2, \ldots, n$$

Linear regression

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} + \beta_1 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$

The n × 2 matrix of ones and x values is the design matrix X.

We can solve for β0 and β1 by

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

Script: QFC R short course simple linear regression example 1.R

Simple linear regression
lm()
Examples
    lm(y~x), with intercept and where x and y are continuous
    lm(y~1+x), with intercept
    lm(y~0+x), regression through the origin (no intercept)
    lm(y~A), where A is a categorical variable
    lm(y~x + A)
    lm(y~A*B) = lm(y~A+B+A:B)
Example:
    Model1 <- lm(length~weight, data=reg)
    Model1 <- lm(log(length)~log(weight), data=reg)

Linear regression
• Model diagnostics
  – summary()
  – residuals()
  – fitted()
  – plot()
Script: QFC R short course simple linear regression example_2.R
• Subset data for regression
    Model1 <- lm(length ~ wgt, subset=hatchery=="DWOR", data=reg1)
• Running models through a loop
Script: QFC R short course simple linear regression example_3.R

Analysis of variance
• aov()
• Categorical explanatory variables
• Compare the mean values of multiple groups
    anova <- aov(y~groups)
Script: QFC short course ANOVA 1.R

Exercise 5

Nonlinear regression
• Estimating parameters is trickier than for linear regression models
• Iterative search procedure required
• Must provide starting values of parameters
• Convergence issues

Differences in R between linear and nonlinear regression
1. For nonlinear regression models the user must specify the exact equation as part of the model statement
2. The user must specify initial guesses as to the value of the parameters that are being estimated

• Von Bertalanffy growth model

$$L_t = L_\infty \left(1 - e^{-k[t - t_0]}\right) + \text{error}$$

  L_t is length at age t
  L_∞ is the asymptotic average maximum length
  k is the growth rate coefficient that determines how quickly the maximum size is attained
  t_0 is the hypothetical age at which the species has zero length

[Figure: plot of Length (mm) against Age (yrs), shown with the von Bertalanffy equation]

nls()
  Least-squares estimates of the parameters of a nonlinear model
    vonB1 <- nls(length~Linf*(1-exp(-k*(age-t0))), data=length.age, start=list(Linf=1000, k=0.05, t0=-2))

Graphing fitted lines:
1. Generate a sequence of numbers that cover the range of the x-axis
2. Generate predicted values for the sequence of x-values
3. Plot original data
4. Overlay predicted values
Script: QFC R course Von Bertalanffy Nonlinear regression.R

Exercise 6
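The four graphing steps can be sketched end to end as follows. The data here are simulated, not the course data set: the "true" parameter values, the noise level, and the starting values are all assumptions chosen for illustration, with the starting values picked close enough to the truth that the iterative search should converge.

```r
# Simulated length-at-age data (hypothetical values, not the course data)
set.seed(1)
age <- rep(1:15, each = 10)
true.Linf <- 900; true.k <- 0.2; true.t0 <- -1
sim.length <- true.Linf * (1 - exp(-true.k * (age - true.t0))) +
  rnorm(length(age), sd = 15)
length.age <- data.frame(age = age, length = sim.length)

# Fit the von Bertalanffy model; nls() needs the full equation and starting values
vonB1 <- nls(length ~ Linf * (1 - exp(-k * (age - t0))),
             data = length.age,
             start = list(Linf = 1000, k = 0.15, t0 = -0.5))

# 1. Sequence of x-values covering the range of the x-axis
new.age <- seq(min(length.age$age), max(length.age$age), length.out = 100)

# 2. Predicted values for that sequence
pred <- predict(vonB1, newdata = data.frame(age = new.age))

# 3. Plot the original data
plot(length ~ age, data = length.age,
     xlab = "Age (yrs)", ylab = "Length (mm)")

# 4. Overlay the predicted values as a line
lines(new.age, pred)

print(coef(vonB1))
```

With real data, poor starting values are the usual cause of convergence errors, so it can help to read rough values of Linf and k off a plot of the data before calling nls().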