PLINK tutorial

1/20/2011 Erin Smith with John Kelsoe





Goals:

1. Run a GWAS on cleaned data for multiple phenotypes in PLINK

2. Visualize results with Manhattan and Q-Q plots.

3. Look at LD structure of regions of interest with Haploview

4. Make a regional plot of the P-values using SNAP plot





Websites of interest:

PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/

A Catalog of Published Genome-Wide Association Studies:

http://www.genome.gov/gwastudies/

UCSC Genome browser: http://genome.ucsc.edu/

dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/

dbGaP: http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap

Haploview: http://www.broadinstitute.org/mpg/haploview

SNAP plot: http://www.broadinstitute.org/mpg/snap/doc.php





Setting the PATH environment variable

After placing PLINK in a convenient location, add that folder to your PATH environment variable so that PLINK can be called from any directory.

This change is temporary and applies only to the current window.

“PLINK_location” is the folder where PLINK is located.

Windows in a command prompt:

echo %PATH% (see what is now in the path)

path = C:\PLINK_location;%PATH%



Mac in a terminal window:

echo $PATH

export PATH=/PLINK_location:$PATH
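The Mac step above can be wrapped in a small check, as a minimal sketch: it prepends the folder only if it is not already on the path, then reports whether the shell can find plink. The helper name prepend_path is hypothetical, and /PLINK_location is the placeholder from the text, not a real folder.

```shell
# Prepend a directory to PATH for this session only, unless it is already there.
# prepend_path is a hypothetical helper name used for illustration.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH: nothing to do
        *) PATH="$1:$PATH"; export PATH ;;    # otherwise put it at the front
    esac
}

prepend_path "/PLINK_location"                # placeholder folder from the text

# Report whether the shell can now find plink
if command -v plink >/dev/null 2>&1; then
    echo "plink found at: $(command -v plink)"
else
    echo "plink not found on PATH"
fi
```

As with the export command, the change lasts only for the current terminal window.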

Introduction to data formats in PLINK:

PED & MAP

BED, BIM, & FAM

Additional phenotype files

Exercise: Look at example.ped, example.map, example.bim, example.fam, and phenotypes.txt and work out what each file contains.

Exercise: Reading in files: use --bfile to read in the example and bipolar BED/BIM/FAM filesets. How many individuals are in each dataset? How many SNPs? What is the genotyping rate?

On Windows, use plink.exe; on Mac, use plink.

plink --bfile example

plink --bfile bipolar
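The counts PLINK reports can be cross-checked directly against the files: in a binary fileset the .fam file has one line per individual and the .bim file has one line per SNP. A minimal sketch for a Mac terminal follows; count_fileset is a hypothetical helper name, and it does not compute the genotyping rate, which is best read from PLINK's own output.

```shell
# Count individuals (.fam rows) and SNPs (.bim rows) for a fileset prefix.
# count_fileset is a hypothetical helper name used for illustration.
count_fileset() {
    prefix="$1"
    n_ind=$(wc -l < "${prefix}.fam" | tr -d '[:space:]')
    n_snp=$(wc -l < "${prefix}.bim" | tr -d '[:space:]')
    echo "${prefix}: ${n_ind} individuals, ${n_snp} SNPs"
}

# Run only for filesets that are present in the current directory
for prefix in example bipolar; do
    if [ -f "${prefix}.fam" ] && [ -f "${prefix}.bim" ]; then
        count_fileset "$prefix"
    fi
done
```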





Manipulating the data files

Get only the genotypes for a single chromosome or a region around a SNP

--chr 13

Exercise: Get data from chromosome 13 and write to a new BED file. If you are having trouble running the full dataset, you can use this fileset instead of bipolar.

plink --bfile bipolar --chr 13 --make-bed --out bipolar_chr13
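One quick way to confirm the extraction worked is to list the distinct chromosome codes in the new .bim file, whose first column is the chromosome. This is a minimal sketch for a Mac terminal; list_chromosomes is a hypothetical helper name.

```shell
# Print the distinct chromosome codes (first column) found in a .bim file.
# list_chromosomes is a hypothetical helper name used for illustration.
list_chromosomes() {
    awk '{ print $1 }' "$1" | sort -u
}

# Run only if the extracted fileset exists in the current directory
if [ -f bipolar_chr13.bim ]; then
    list_chromosomes bipolar_chr13.bim
fi
```

If the extraction worked, the only code listed should be 13.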





Performing association tests

--assoc allelic association (chi-squared test of allele frequencies)





Other examples of association-related commands

--linear linear regression for a quantitative phenotype

--logistic logistic regression for a case/control (binary) phenotype

--pheno pick a new phenotype file

--pheno-name choose a column from the phenotype file

Run a GWAS on the irritable mania phenotype

Use the --pheno and --pheno-name options to select an alternate phenotype. For later analyses, also add the --adjust and --qq-plot options.

plink --bfile bipolar --assoc --out bipolar_irritable --pheno phenotypes.txt --pheno-name irritable_elated --adjust --qq-plot
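Before plotting, it can help to glance at the strongest signals in the resulting bipolar_irritable.assoc file. The sketch below, for a Mac terminal, assumes the file has a header row containing columns named SNP and P, and finds them by name rather than by position; top_hits is a hypothetical helper name.

```shell
# Print the n SNPs with the smallest p-values from a PLINK .assoc file.
# top_hits is a hypothetical helper name; the SNP and P columns are located
# from the header row, and rows whose p-value is NA are skipped.
top_hits() {
    file="$1"
    n="${2:-10}"
    awk 'NR == 1 { for (i = 1; i <= NF; i++) { if ($i == "SNP") s = i; if ($i == "P") p = i }; next }
         s && p && $p != "NA" { print $s, $p }' "$file" |
        sort -g -k2,2 | head -n "$n"
}

# Run only if the association output exists in the current directory
if [ -f bipolar_irritable.assoc ]; then
    top_hits bipolar_irritable.assoc 10
fi
```

The -g option sorts values written in scientific notation (such as 1e-05) numerically.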





Interpret genome-wide output: Manhattan & Q-Q plots

Exercise: generate a Manhattan plot in Haploview

Load Haploview

Choose PLINK format and read in .assoc file

Note: these assoc files have integrated map information.

Plot chromosome position on the x-axis and -log10(p) on the y-axis.
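To see what the y-axis values look like, the -log10 transform can be computed by hand from the .assoc file. This is a minimal sketch for a Mac terminal, assuming a header row with columns named SNP and P; neglog10_p is a hypothetical helper name. A p-value of 0.001 becomes 3.00, so taller points on the plot are smaller p-values.

```shell
# Print each SNP with -log10 of its p-value, rounded to two decimals.
# neglog10_p is a hypothetical helper name; rows with NA or zero p-values
# are skipped because the logarithm is undefined for them.
neglog10_p() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) { if ($i == "SNP") s = i; if ($i == "P") p = i }; next }
         s && p && $p != "NA" && $p > 0 { printf "%s %.2f\n", $s, -log($p) / log(10) }' "$1"
}

# Run only if the association output exists; show the first few rows
if [ -f bipolar_irritable.assoc ]; then
    neglog10_p bipolar_irritable.assoc | head -n 5
fi
```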





Exercise: generate a Quantile-Quantile (Q-Q) plot in R

Use the results from --adjust and --qq-plot, which generated the expected null distribution of p-values in bipolar_irritable.assoc.adjusted.

Start R

Change the working directory to where the PLINK output is located (setwd(dir), or on Mac: Misc -> Change working directory, or on Windows: File -> Change dir…)

Read in data

data <- read.table(file = "bipolar_irritable.assoc.adjusted", header = TRUE)

look at first 10 lines of table:

data[1:10,]

plot the expected -log10 P-values on the x-axis and the observed -log10 P-values on the y-axis:

plot(-log(data$QQ, 10), -log(data$UNADJ, 10), xlab = "expected -log10 P-values", ylab = "observed -log10 P-values")

add a line corresponding to y = x

abline(a = 0, b = 1)

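Points rising above the line in the upper tail indicate more significant associations than you would expect by chance. Points that sit above the line along its whole length instead suggest systematic inflation of the test statistics, for example from population stratification.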





Interpreting regional associations

Exercise: Look at LD relationships near potential hits

Get the region +/- 250 kb around the peak SNP from PLINK, output as a PED file, using the --snp and --window options

plink --bfile bipolar --snp rs17079247 --window 250 --recodeHV --out rs17079247_250kb



Load into Haploview using the linkage format



Exercise: Look at zoomed-in P-values for the region with LD values (SNAP plot)



plink --bfile bipolar --chr 13 --from-kb 84500 --to-kb 84800 --pheno phenotypes.txt --pheno-name irritable_elated --assoc --out bipolar_irritable_rs17079247_zoom



Edit output file in a text editor: change the header P to PValue



Go to the SNAP plot website, choose “Plots” from the upper right menu and plot a “Regional Association Plot”
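The header edit in this exercise (changing P to PValue) can also be scripted instead of done by hand in a text editor. The sketch below, for a Mac terminal, renames only a header field that is exactly P and rewrites every row as tab-delimited. The helper name rename_p_header and the output file name are hypothetical, and tab-delimited output is an assumption about what is convenient to upload, not a documented SNAP requirement.

```shell
# Rename the header field "P" to "PValue" and print all rows tab-delimited.
# rename_p_header is a hypothetical helper name used for illustration.
rename_p_header() {
    awk 'BEGIN { OFS = "\t" }
         NR == 1 { for (i = 1; i <= NF; i++) if ($i == "P") $i = "PValue" }
         { $1 = $1; print }' "$1"
}

# Run only if the zoomed association output exists; the output name is arbitrary
if [ -f bipolar_irritable_rs17079247_zoom.assoc ]; then
    rename_p_header bipolar_irritable_rs17079247_zoom.assoc \
        > bipolar_irritable_rs17079247_zoom_snap.txt
fi
```

The original .assoc file is left unchanged, so the edit can be repeated if the upload needs a different layout.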





Get more info on genes in the region

UCSC Genome browser: http://genome.ucsc.edu/

Enter top SNP to find region and zoom out to find nearest genes


