Finding Sequence Motifs in Alu
Transposons that Enhance the
Expression of Nearby Genes
Kendra Baughman
York Marahrens’ Lab
UCLA
Overview
Goal
Background
Prior Studies
Strategy
Results
Remaining Tasks
Future Directions
Goal
Determine if there are motifs present among Alu
elements near highly expressed genes, and
missing from Alu elements near poorly
expressed genes, that might contribute to gene
expression
Background – Alu Elements
Repetitive sequence
Transposons (DNA sequences that make copies
of themselves and insert elsewhere in the genome)
Over 1 million in human genome
~50 subfamilies categorized by
sequence differences
Prior Studies
“Repetitive sequence environment
distinguishes housekeeping genes”
Eller, Daniel et al. submitted
“Alu abundance positively correlates
with gene expression level”
C.D. Eller et al., submitted
[Figure: "Alu"; y-axis: Percent (0 to 20); x-axis categories: HK, TS, RS; p = 2e-45]
Higher Alu concentration near widely expressed genes
Higher Alu concentration near highly expressed genes
Alu Subfamilies
[Figure: # Alu in the subfamily, plotted by subfamily]
Data
Human gene expression levels from
microarray data (Stan Nelson’s lab, UCLA)
Alu information from UCSC Genome
Browser, RepeatMasker tracks
Goal, reiterated
Determine if there are motifs present among Alu
elements near highly expressed genes, and
missing from Alu elements near poorly
expressed genes, that might contribute to gene
expression
Strategy
Find Alu “near” high and low expression
genes (within 20kb)
Perform multiple sequence alignment on
Alu sequences
Identify motifs preferentially conserved
around highly expressed genes (these
motifs could help the genes be highly
expressed)
Screening the genes…
Used Perl scripts to extract information from MySQL databases
Grouped genes by expression level in R
Chose genes in top and bottom 20%
[Figure: Expression Level (y-axis) vs. Genes (x-axis)]
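The top/bottom 20% selection can be sketched in a few lines. This is only an illustration in Python, not the Perl/R code used in the project; the function name `split_by_expression` and the gene names and values are made up for the example.

```python
def split_by_expression(expression, fraction=0.20):
    """Return (high, low): genes in the top and bottom `fraction` by expression."""
    # Rank gene names from lowest to highest expression level.
    ranked = sorted(expression, key=expression.get)
    k = int(len(ranked) * fraction)  # number of genes in each tail
    low = ranked[:k]
    high = ranked[len(ranked) - k:]  # safe even when k == 0
    return high, low


# Hypothetical expression values for ten made-up genes.
expression = {f"gene{i}": float(i) for i in range(1, 11)}
high, low = split_by_expression(expression)
print("high:", high)
print("low:", low)
```

Genes in the middle 60% are simply ignored, which keeps the two groups well separated in expression level.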
Screening the Alu…
Used MySQL queries to determine flanking region
Used Perl scripts to screen Alu located within 20kb of genes
Omitted Alu in overlapping flanking regions
[Figure: HI-gene and LO-gene flanking regions; Alu labelled HI-Alu, ??-Alu, LO-Alu]

PERCENTAGES OF ALU THROWN OUT
        Chrom1     Chrom10    Chrom19
        1st 20mb   1st 20mb
10kb    3%         6%         20%
20kb    7%         7%         28%
50kb    17%        11%        50%
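The screening rule above (keep Alu within 20kb of a gene, drop Alu that fall in overlapping HI and LO flanking regions) can be sketched as follows. This is a Python illustration under stated assumptions, not the project's MySQL/Perl code: coordinates are treated as half-open intervals, Alu inside the gene body count as "near", and all names and coordinates are hypothetical.

```python
FLANK = 20_000  # 20kb flank on each side of a gene


def flank(gene, size=FLANK):
    """Return (chrom, start, end) of a gene extended by `size` bp on each side."""
    chrom, start, end = gene
    return chrom, max(0, start - size), end + size


def overlaps(region, alu):
    """True if the Alu interval overlaps the region on the same chromosome."""
    r_chrom, r_start, r_end = region
    a_chrom, a_start, a_end = alu
    return r_chrom == a_chrom and a_start < r_end and a_end > r_start


def classify_alu(alu, hi_genes, lo_genes):
    """Label an Alu as 'HI', 'LO', 'ambiguous' (omitted) or 'far' (not near a gene)."""
    near_hi = any(overlaps(flank(g), alu) for g in hi_genes)
    near_lo = any(overlaps(flank(g), alu) for g in lo_genes)
    if near_hi and near_lo:
        return "ambiguous"  # the ??-Alu case: overlapping flanking regions
    if near_hi:
        return "HI"
    if near_lo:
        return "LO"
    return "far"


# Hypothetical genes: the two 20kb flanks overlap between 120,000 and 130,000.
hi_genes = [("chr1", 100_000, 110_000)]
lo_genes = [("chr1", 140_000, 150_000)]
alus = [
    ("chr1", 85_000, 85_300),
    ("chr1", 125_000, 125_300),
    ("chr1", 160_000, 160_300),
    ("chr1", 300_000, 300_300),
]
labels = [classify_alu(a, hi_genes, lo_genes) for a in alus]
print(labels)
```

Widening `FLANK` makes more flanks overlap, so more Alu end up ambiguous and are thrown out.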
Alignment Process…
First alignment tool: ClustalW
– Slow, inaccurate
Second alignment tool: T-COFFEE
– Can’t handle hundreds of sequences
Third alignment tool: MUSCLE
Aligning thousands of sequences = big gaps and
processing limitations
Chose to analyze by subfamily (S, Sp/q)
– Aligned elements around highly expressed genes
– Aligned elements around poorly expressed genes
– Profile high/low alignment
– Consensus sequence alignment
Alignment viewed in Jalview
Alignments of AluSp/q and AluS Elements
[Figure: alignment views; labels: High Alu, AluSp/q, AluS; legend: high conservation, low conservation]
AluS
Frequency of consensus base
  Alu w/ a base:  *5547666896759699995769699999999999*9989979
  All Alu:        0444762289674300448576809499545545409449808
Alu consensus sequence
  High Alu:       TATCCACGCCTGCAAAATCTCAGCCACTCCCAAAGTTGCTGCG
  Low Alu:        CANCC-CGCCT-CGTAATCCCAA--------AATGTT--TG-G
Frequency of consensus base
  All Alu:        76044 55899 37444989894        454045  98 8
  Alu w/ a base:  77488 66899 67444999995        455645  98 9

AluSp/q
Frequency of consensus base
  Alu w/ a base:  596**65559458765699999978999999966566******
  All Alu:        0860005458443600233333323333333345400000000
Alu consensus sequence
  High Alu:       TGCTCAGAAATTTCTCGGCTCACTGCAACCTCCGTATCACCCC
  Low Alu:        CG---A-AA--------------------CTCCGT--T---CT
Frequency of consensus base
  All Alu:        55   4 58                    444544  0   77
  Alu w/ a base:  56   5 69                    555655  6   99
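A per-column consensus with two frequencies, like the rows above, can be computed from an alignment roughly as follows. This Python sketch assumes that "Alu w/ a base" means the frequency among sequences that have a base (not a gap) in that column, and "All Alu" means the frequency among all aligned sequences; that reading, the function name, and the toy alignment are assumptions, not taken from the slides.

```python
from collections import Counter


def column_consensus(alignment, gap="-"):
    """Per column: (consensus base, freq among non-gap seqs, freq among all seqs)."""
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):  # iterate over alignment columns
        bases = [b for b in column if b != gap]
        if not bases:  # column made entirely of gaps
            result.append((gap, 0.0, 0.0))
            continue
        base, count = Counter(bases).most_common(1)[0]
        result.append((base, count / len(bases), count / n_seqs))
    return result


# Toy alignment of four equal-length rows (hypothetical sequences).
toy = ["ACG-", "ACT-", "A-GT", "TCG-"]
summary = column_consensus(toy)
consensus = "".join(base for base, _, _ in summary)
print(consensus)
for base, with_base, all_seqs in summary:
    print(base, with_base, all_seqs)
```

The two frequencies diverge most in gappy columns: a base can be unanimous among the few sequences that have one while still being rare across the whole group.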
Remaining Tasks
Analyze the remaining sub-families
Determine whether identified motifs agree
across subfamilies
BLAST motifs against all Alu sequences
and correlate alignment scores with
expression level
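The last task, correlating alignment scores with expression level, could be approached with a rank correlation. The slides do not say which statistic is intended; the Python sketch below uses Spearman's rho as one reasonable choice, and the scores and expression values are invented for illustration.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical motif alignment scores and expression levels for five genes.
scores = [52.0, 48.5, 30.1, 61.2, 25.0]
expression = [8.1, 7.4, 3.2, 9.0, 4.0]
rho = spearman(scores, expression)
print(round(rho, 3))
```

A rank-based statistic avoids assuming that alignment score and expression are linearly related, which seems a sensible default for microarray data.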
Future Directions
Cluster alignments into a relationship tree
to see if HI and LO Alu groups cluster
differently from each other
– Create a matrix of pairwise alignments and
cluster these into a tree using nearest
neighbour clustering
Use Hidden Markov Models or Gibbs
sampling to identify sequence motifs (non-
multiple sequence alignment method of
motif finding)
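The clustering idea in the first bullet above can be sketched as nearest neighbour (single-linkage) clustering over a pairwise distance matrix. In this Python illustration the distance is assumed to be the fraction of mismatched columns between two already-aligned sequences; the slides do not specify a distance, and the sequences and names are hypothetical.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of aligned columns at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_tree(names, dist):
    """Nearest neighbour clustering; returns the tree as nested tuples.

    `dist` maps frozenset({name_a, name_b}) to a distance.
    """
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is that of the closest members.
            d = min(dist[frozenset((a, b))]
                    for a in clusters[i][1] for b in clusters[j][1])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical aligned sequences from the HI and LO groups.
seqs = {
    "HI_1": "ACGTACGT",
    "HI_2": "ACGTACGA",
    "LO_1": "TTGTCCGT",
    "LO_2": "TTGTCCGA",
}
names = list(seqs)
dist = {frozenset((a, b)): mismatch_fraction(seqs[a], seqs[b])
        for a, b in combinations(names, 2)}
tree = single_linkage_tree(names, dist)
print(tree)
```

If HI and LO Alu really differ, the top split of the tree should separate the two groups, as it does in this toy example.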
Acknowledgements
Danny Eller
York Marahrens
Marc Suchard
Chiara Sabatti
SoCalBSI
NIH/NSF