Variance Stabilization and Normalization using the vsn package in by cmk16156


									             Variance Stabilization and Normalization
                         using the vsn package in R

Short description

The vsn method builds upon the fact that the variance of microarray data depends on the
signal intensity and that a transformation can be found after which the variance is
approximately constant. It is like the logarithm at the upper end of the intensity scale,
approximately linear at the lower end, and smoothly interpolates in between. The position
of the cross-over point and the slope of the linear part depend on the error distribution of
the data. It also incorporates the estimation of "normalization" parameters (shift and
scale). Vsn assumes that less than half of the genes on the arrays is differentially
transcribed across the experiment. An advantage of vsn-transformation over log-
transformation is that vsn works also on values that are negative after background

More information

Variance stabilization applied to microarray data calibration and to the quantification of
differential expression. W. Huber, A. von Heydebreck, H. Sültmann, A. Poustka, M.
Vingron. Bioinformatics 18 suppl. 1 (2002), S96-S104 (ISMB 2002).
Example data: Four Affymetrix chip intensities

In this example we will normalize four Affymetrix chips using vsn. The experimental set
up is two replicates each for control (arrays 1 and 2) and treatment (arrays 3 and 4). The
input file affy_vsn_start.txt is a tab-delimited text file containing a header row, the
unique probe identifiers in the first column, and the Affymetrix signal intensities in the
following four columns. Take a look at the input file with Excel or Notepad.

   1. Open R and load the vsn package:
   2. Set the working directory where R looks for files to import and where output files
       are saved.
   3. Import the tab-delimited data set affy_vsn_start.txt from your directory into R as

       an object called afdat:
                afdat <- read.table(”affy_vsn_start.txt”, header=TRUE,
   4. Check if you indeed created an R object:
   5. Check the number of data rows and columns in afdat:
   6. Check the first few entries by printing all columns and first 10 rows of afdat:
                afdat [1:10, ]
   7. Perform vsn on afdat and write results to a new object afvsn (this will take a

                afvsn <- vsn(afdat)
   8. In the afvsn object, exprs(afvsn) is a matrix of the same size as the input data
       containing the transformed intensities:
   9. Now calculate for the first two arrays (control1 and control2) the ratios (R) and
       average intensities (I) for the raw and transformed data to generate Ratio-Intensity
                Rraw <- afdat[,2] / afdat[,1]
                Iraw <- afdat[,2] + afdat[,1]/2
                Rvsn <- exprs(afvsn)[,2] – exprs(afvsn)[,1]
           Ivsn <- exprs(afvsn)[,2] + exprs(afvsn)[,1]/2
10. Set R to make four simultaneous plots: on two lines and two columns:
            par(mfrow = c(2,2))
11. Make two types of plots to compare the raw and the vsn-transformed intensities:
   scatter plots of control1 versus control2 and RI plots. Note the difference in
   variance between the raw data and the vsn-transformed data:
           plot(afdat[,1], afdat[,2], log="xy", main="raw",
           xlab="control1", ylab="control2", pch=".")
           plot(exprs(afvsn)[,1], exprs(afvsn)[,2], main="vsn",
           xlab="control1", ylab="control2", pch=".")
           plot(Iraw, Rraw, log="xy", pch=".")
           plot(Ivsn, Rvsn, pch=".")
12. Now calculate the average intensities for control (arrays 1 and 2) and treatment
   (arrays 3 and 4), the "generalized log-ratio" of treatment versus control, and
   generate an RI plot:
           Ic <- exprs(afvsn)[,2] + exprs(afvsn)[,1]/2
           It <- exprs(afvsn)[,4] + exprs(afvsn)[,3]/2
           Itc <- Ic + It/2
           Rtc <- It – Ic
           plot(Itc, Rtc, pch=".")

13. You probably wondered about the 10 dots? The parameter estimation in vsn
   works in an iterative manner! To verify that the estimations of the calibration and
   variance stabilization parameters have reached a plateau well before the last
   iteration, you can call the function vsnPlotPar:
           par(mfrow = c(1,2))
           vsnPlotPar(afvsn, "offsets")
           vsnPlotPar(afvsn, "factors")
14. You can access the transformation and calibration parameters:
           prep <- preproc(description(afvsn))
15. The calibrated data is obtained through scaling the original data by the
   muliplicative factor and shifting by the additive offset. In this data set, the factors
   and offsets are very similar, indicating well-comparable arrays of good quality.
      Now suppose array 2 in the example data was not that well measured, and had a
      baseline that was shifted by 500 and a scale that differed by a factor of 0.25:
              bafdat <- afdat
              bafdat[, 2] <- 0.25 * (500 + bafdat[ ,2])
              afdat [1:10, ]
   16. We call again vsn on this data*:
              bafvsn <- vsn(bafdat)
   17. Let's have a look at the offsets and factors again and make some plots of control1
      versus control2:
              plot(bafdat[,1], bafdat[,2], log="xy", main="raw",
              xlab="control1", ylab="control2", pch=".")
              plot(exprs(bafvsn)[,1], exprs(bafvsn)[,2], main="vsn",
              xlab="control1", ylab="control2", pch=".")
   18. Finally, export the vsn-transformed intensities to a text file in the working
              write.table(cbind(rownames(exprs(afvsn)), exprs(afvsn)),
              quote=FALSE, sep="\t", file="af_vsn.txt", row.names=FALSE)

* While you are waiting for vsn to complete the analysis, check out the Bioconductor
website ( and look up the Vignette for vsn.

To top