Variance Stabilization and Normalization using the vsn package in R Short description The vsn method builds upon the fact that the variance of microarray data depends on the signal intensity and that a transformation can be found after which the variance is approximately constant. It is like the logarithm at the upper end of the intensity scale, approximately linear at the lower end, and smoothly interpolates in between. The position of the cross-over point and the slope of the linear part depend on the error distribution of the data. It also incorporates the estimation of "normalization" parameters (shift and scale). Vsn assumes that less than half of the genes on the arrays is differentially transcribed across the experiment. An advantage of vsn-transformation over log- transformation is that vsn works also on values that are negative after background subtraction. More information Variance stabilization applied to microarray data calibration and to the quantification of differential expression. W. Huber, A. von Heydebreck, H. Sültmann, A. Poustka, M. Vingron. Bioinformatics 18 suppl. 1 (2002), S96-S104 (ISMB 2002). Example data: Four Affymetrix chip intensities In this example we will normalize four Affymetrix chips using vsn. The experimental set up is two replicates each for control (arrays 1 and 2) and treatment (arrays 3 and 4). The input file affy_vsn_start.txt is a tab-delimited text file containing a header row, the unique probe identifiers in the first column, and the Affymetrix signal intensities in the following four columns. Take a look at the input file with Excel or Notepad. 1. Open R and load the vsn package: library(vsn) 2. Set the working directory where R looks for files to import and where output files are saved. 3. Import the tab-delimited data set affy_vsn_start.txt from your directory into R as an object called afdat: afdat <- read.table(”affy_vsn_start.txt”, header=TRUE, row.names=1) 4. Check if you indeed created an R object: ls() 5. Check the number of data rows and columns in afdat: dim(afdat) 6. Check the first few entries by printing all columns and first 10 rows of afdat: afdat [1:10, ] 7. Perform vsn on afdat and write results to a new object afvsn (this will take a while!*): afvsn <- vsn(afdat) 8. In the afvsn object, exprs(afvsn) is a matrix of the same size as the input data containing the transformed intensities: dim(exprs(afvsn)) exprs(afvsn[1:10,]) 9. Now calculate for the first two arrays (control1 and control2) the ratios (R) and average intensities (I) for the raw and transformed data to generate Ratio-Intensity plots: Rraw <- afdat[,2] / afdat[,1] Iraw <- afdat[,2] + afdat[,1]/2 Rvsn <- exprs(afvsn)[,2] – exprs(afvsn)[,1] Ivsn <- exprs(afvsn)[,2] + exprs(afvsn)[,1]/2 10. Set R to make four simultaneous plots: on two lines and two columns: par(mfrow = c(2,2)) 11. Make two types of plots to compare the raw and the vsn-transformed intensities: scatter plots of control1 versus control2 and RI plots. Note the difference in variance between the raw data and the vsn-transformed data: plot(afdat[,1], afdat[,2], log="xy", main="raw", xlab="control1", ylab="control2", pch=".") plot(exprs(afvsn)[,1], exprs(afvsn)[,2], main="vsn", xlab="control1", ylab="control2", pch=".") plot(Iraw, Rraw, log="xy", pch=".") plot(Ivsn, Rvsn, pch=".") 12. Now calculate the average intensities for control (arrays 1 and 2) and treatment (arrays 3 and 4), the "generalized log-ratio" of treatment versus control, and generate an RI plot: Ic <- exprs(afvsn)[,2] + exprs(afvsn)[,1]/2 It <- exprs(afvsn)[,4] + exprs(afvsn)[,3]/2 Itc <- Ic + It/2 Rtc <- It – Ic plot(Itc, Rtc, pch=".") 13. You probably wondered about the 10 dots? The parameter estimation in vsn works in an iterative manner! To verify that the estimations of the calibration and variance stabilization parameters have reached a plateau well before the last iteration, you can call the function vsnPlotPar: par(mfrow = c(1,2)) vsnPlotPar(afvsn, "offsets") vsnPlotPar(afvsn, "factors") 14. You can access the transformation and calibration parameters: prep <- preproc(description(afvsn)) prep$vsnParams 15. The calibrated data is obtained through scaling the original data by the muliplicative factor and shifting by the additive offset. In this data set, the factors and offsets are very similar, indicating well-comparable arrays of good quality. Now suppose array 2 in the example data was not that well measured, and had a baseline that was shifted by 500 and a scale that differed by a factor of 0.25: bafdat <- afdat bafdat[, 2] <- 0.25 * (500 + bafdat[ ,2]) afdat [1:10, ] 16. We call again vsn on this data*: bafvsn <- vsn(bafdat) 17. Let's have a look at the offsets and factors again and make some plots of control1 versus control2: preproc(description(bafvsn))$vsnParams plot(bafdat[,1], bafdat[,2], log="xy", main="raw", xlab="control1", ylab="control2", pch=".") plot(exprs(bafvsn)[,1], exprs(bafvsn)[,2], main="vsn", xlab="control1", ylab="control2", pch=".") 18. Finally, export the vsn-transformed intensities to a text file in the working directory: write.table(cbind(rownames(exprs(afvsn)), exprs(afvsn)), quote=FALSE, sep="\t", file="af_vsn.txt", row.names=FALSE) * While you are waiting for vsn to complete the analysis, check out the Bioconductor website (www.bioconductor.org) and look up the Vignette for vsn.
Pages to are hidden for
"Variance Stabilization and Normalization using the vsn package in"Please download to view full document