Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes
Kendra Baughman
York Marahrens’ Lab
UCLA
Overview
Goal
Background
Prior Studies
Strategy
Results
Remaining Tasks
Future Directions
Goal

Determine whether there are motifs present in Alu elements near highly expressed genes, but missing from Alu elements near poorly expressed genes, that might contribute to gene expression
Background – Alu Elements
Repetitive sequence
Transposons (DNA sequences that make copies of themselves and insert elsewhere in the genome)
Over 1 million copies in the human genome
~50 subfamilies, categorized by sequence differences
Prior Studies
“Repetitive sequence environment distinguishes housekeeping genes”
Eller, Daniel et al., submitted

“Alu abundance positively correlates with gene expression level”
C.D. Eller et al., submitted
[Figure: Alu; y-axis: Percent; categories: HK, TS, RS; p = 2e-45]
Higher Alu concentration near widely expressed genes
Higher Alu concentration near highly expressed genes
[Figure: Alu Subfamilies; y-axis: # Alu in the subfamily; x-axis: subfamily]
Data
Human gene expression levels from microarray data (Stan Nelson’s lab, UCLA)
Alu information from the UCSC Genome Browser RepeatMasker tracks
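RepeatMasker annotations at UCSC can be pulled as the `rmsk` table. The Python sketch below filters such rows down to Alu elements and keeps each element’s subfamily name. It assumes tab-separated rows in the UCSC `rmsk` column order listed in the comment (worth checking against the schema of the assembly actually used); the example rows are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Column positions ASSUMED from the UCSC "rmsk" table layout:
# bin, swScore, milliDiv, milliDel, milliIns, genoName, genoStart, genoEnd,
# genoLeft, strand, repName, repClass, repFamily, repStart, repEnd, repLeft, id
GENO_NAME, GENO_START, GENO_END, STRAND = 5, 6, 7, 9
REP_NAME, REP_FAMILY = 10, 12


@dataclass
class AluElement:
    chrom: str
    start: int      # 0-based start, as stored in UCSC tables
    end: int        # exclusive end (half-open interval)
    strand: str
    subfamily: str  # repName, e.g. "AluSx" or "AluYa5"


def parse_alu(lines: Iterable[str]) -> List[AluElement]:
    """Keep only tab-separated rmsk rows whose repFamily is "Alu"."""
    alus = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REP_FAMILY or fields[REP_FAMILY] != "Alu":
            continue  # short/malformed rows and non-Alu repeats are skipped
        alus.append(AluElement(
            chrom=fields[GENO_NAME],
            start=int(fields[GENO_START]),
            end=int(fields[GENO_END]),
            strand=fields[STRAND],
            subfamily=fields[REP_NAME],
        ))
    return alus


# Made-up example rows: one Alu, one L1 (which is filtered out).
example_rows = [
    "585\t2000\t100\t5\t3\tchr1\t10000\t10300\t-249240321\t+\tAluSx\tSINE\tAlu\t1\t300\t-12\t1",
    "585\t1500\t90\t2\t1\tchr1\t20000\t26000\t-249224321\t+\tL1PA2\tLINE\tL1\t1\t6000\t0\t2",
]
print(parse_alu(example_rows))
# [AluElement(chrom='chr1', start=10000, end=10300, strand='+', subfamily='AluSx')]
```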
Strategy
Find Alu “near” high- and low-expression genes (within 20 kb)
Perform multiple sequence alignment on the Alu sequences
Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)
Screening the genes…
Used Perl scripts to extract information from MySQL databases
Grouped genes by expression level in R
Chose genes in the top and bottom 20%
[Figure: expression level (y-axis) across genes (x-axis)]
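The top/bottom 20% cut was done in R. A minimal Python sketch of the same selection, assuming one expression value per gene (for example, already averaged across arrays) and rounding 20% down to a whole number of genes; the function name, gene names, and values are hypothetical:

```python
from typing import Dict, List, Tuple


def split_by_expression(expression: Dict[str, float],
                        fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Return (high, low): gene IDs in the top and bottom `fraction` by expression."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the two groups cannot overlap")
    ranked = sorted(expression, key=lambda gene: expression[gene])  # lowest first
    k = int(len(ranked) * fraction)  # genes per tail, rounded down
    low = ranked[:k]
    high = ranked[len(ranked) - k:]  # not ranked[-k:], which is everything when k == 0
    return high, low


# Hypothetical expression values for 10 genes.
expr = {f"gene{i}": float(i) for i in range(1, 11)}
high, low = split_by_expression(expr)
print(high, low)  # ['gene9', 'gene10'] ['gene1', 'gene2']
```

Genes between the two cutoffs are simply not used, which keeps the HI and LO groups well separated.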
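Screening the Alu…
Used MySQL queries to determine the flanking region of each gene
Used Perl scripts to screen for Alu located within 20 kb of genes
Omitted Alu in overlapping flanking regions

PERCENTAGES OF ALU THROWN OUT
Window   Chrom1 (1st 20 Mb)   Chrom10   Chrom19 (1st 20 Mb)
10 kb    3%                   6%        20%
20 kb    7%                   7%        28%
50 kb    17%                  11%       50%

[Diagram: HI-gene and LO-gene with their flanking regions; an Alu near only the HI-gene is a HI-Alu, near only the LO-gene a LO-Alu, and one in the overlapping flanks is a ??-Alu (omitted)]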
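A minimal Python sketch of the window screen. It assumes the “within 20 kb” window is measured from the gene’s start and end coordinates (so the gene body is included) and that an Alu is omitted only when it falls near both a HI-gene and a LO-gene (the ??-Alu case). The thrown-out percentage uses all gene-proximal Alu as its denominator, which is also an assumption. The interval type, function names, and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

WINDOW = 20_000  # flank size in bp; the slides compare 10, 20 and 50 kb


@dataclass
class Interval:
    name: str
    chrom: str
    start: int  # half-open [start, end)
    end: int


def near(alu: Interval, gene: Interval, window: int = WINDOW) -> bool:
    """True if the Alu overlaps the gene extended by `window` bp on each side."""
    return (alu.chrom == gene.chrom
            and alu.start < gene.end + window
            and alu.end > gene.start - window)


def classify_alus(alus: List[Interval], hi_genes: List[Interval],
                  lo_genes: List[Interval], window: int = WINDOW) -> Dict[str, List[str]]:
    """Label each Alu HI, LO, or ambiguous ('??'); Alu near no selected gene are dropped."""
    groups: Dict[str, List[str]] = {"HI": [], "LO": [], "??": []}
    for alu in alus:
        near_hi = any(near(alu, g, window) for g in hi_genes)
        near_lo = any(near(alu, g, window) for g in lo_genes)
        if near_hi and near_lo:
            groups["??"].append(alu.name)  # overlapping flanks: omitted from analysis
        elif near_hi:
            groups["HI"].append(alu.name)
        elif near_lo:
            groups["LO"].append(alu.name)
    return groups


def percent_thrown_out(groups: Dict[str, List[str]]) -> float:
    """Share of gene-proximal Alu discarded as ambiguous, in percent."""
    total = sum(len(v) for v in groups.values())
    return 100.0 * len(groups["??"]) / total if total else 0.0


hi_genes = [Interval("HI-gene", "chr1", 100_000, 110_000)]
lo_genes = [Interval("LO-gene", "chr1", 140_000, 150_000)]
alus = [
    Interval("alu1", "chr1", 85_000, 85_300),    # near the HI-gene only
    Interval("alu2", "chr1", 125_000, 125_300),  # inside both flanks
    Interval("alu3", "chr1", 165_000, 165_300),  # near the LO-gene only
    Interval("alu4", "chr1", 500_000, 500_300),  # near neither
]
groups = classify_alus(alus, hi_genes, lo_genes)
print(groups)  # {'HI': ['alu1'], 'LO': ['alu3'], '??': ['alu2']}
print(round(percent_thrown_out(groups), 1))  # 33.3
```

Widening `window` makes more flanks overlap, which is consistent with the thrown-out percentages rising from 10 kb to 50 kb in the table.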
Alignment Process…
First alignment tool: ClustalW
 – Slow, inaccurate
Second alignment tool: T-Coffee
 – Can’t handle hundreds of sequences
Third alignment tool: MUSCLE

Aligning thousands of sequences produces big gaps and runs into processing limitations
Chose to analyze by subfamily (AluS, AluSp/q)
 – Aligned elements around highly expressed genes
 – Aligned elements around poorly expressed genes
 – Profile alignment of the high and low alignments
 – Consensus sequence alignment
Alignment viewed in Jalview
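All three aligners read FASTA input, so the per-subfamily runs need one FASTA file per subfamily and expression group. A Python sketch of that grouping step, assuming each Alu has already been labeled HI or LO and has its sequence attached; the record layout, the `ANALYSIS_GROUP` mapping, and the example sequences are hypothetical, and running the aligner itself is not shown:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL mapping from RepeatMasker subfamily names to the slide's two
# analysis groups; AluSp and AluSq are pooled as "AluSp/q".
ANALYSIS_GROUP = {"AluS": "AluS", "AluSp": "AluSp/q", "AluSq": "AluSp/q"}


def group_for_alignment(
    records: Iterable[Tuple[str, str, str, str]],
) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """records are (alu_id, subfamily, expression_class, sequence) tuples.

    Returns {(analysis_group, expression_class): [(alu_id, sequence), ...]};
    subfamilies not in ANALYSIS_GROUP are skipped.
    """
    groups: Dict[Tuple[str, str], List[Tuple[str, str]]] = defaultdict(list)
    for alu_id, subfamily, expr_class, seq in records:
        group = ANALYSIS_GROUP.get(subfamily)
        if group is not None:
            groups[(group, expr_class)].append((alu_id, seq))
    return dict(groups)


def to_fasta(entries: List[Tuple[str, str]], width: int = 60) -> str:
    """Format (id, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines: List[str] = []
    for alu_id, seq in entries:
        lines.append(f">{alu_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


records = [
    ("alu1", "AluSp", "HI", "GGCCGGGCGCGGTGGCTCA"),
    ("alu2", "AluSq", "HI", "GGCCGGGCGCGGTGGCTCAC"),
    ("alu3", "AluS", "LO", "GGCCGGGCGCGGTGGCTCA"),
    ("alu4", "AluY", "HI", "GGCCGGGCGCGGTGGCTCA"),  # not in either group: skipped
]
groups = group_for_alignment(records)
print(sorted(groups))  # [('AluS', 'LO'), ('AluSp/q', 'HI')]
print(to_fasta(groups[("AluSp/q", "HI")], width=10), end="")
```

The slash in “AluSp/q” would need replacing before the group name is used in a file name.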
[Figure: Alignments of AluSp/q and AluS elements near highly expressed genes (High Alu), shaded from high to low conservation; panels: AluSp/q, AluS]
Alu consensus sequences near highly (High) and poorly (Low) expressed genes, with the frequency of the consensus base at each position (rows above the pair: High Alu; rows below: Low Alu)

AluS
Freq. (Alu w/ a base):  *5547666896759699995769699999999999*9989979
Freq. (All Alu):        0444762289674300448576809499545545409449808
High Alu consensus:     TATCCACGCCTGCAAAATCTCAGCCACTCCCAAAGTTGCTGCG
Low Alu consensus:      CANCC-CGCCT-CGTAATCCCAA--------AATGTT--TG-G
Freq. (All Alu):        76044 55899 37444989894        454045  98 8
Freq. (Alu w/ a base):  77488 66899 67444999995        455645  98 9

AluSp/q
Freq. (Alu w/ a base):  596**65559458765699999978999999966566******
Freq. (All Alu):        0860005458443600233333323333333345400000000
High Alu consensus:     TGCTCAGAAATTTCTCGGCTCACTGCAACCTCCGTATCACCCC
Low Alu consensus:      CG---A-AA--------------------CTCCGT--T---CT
Freq. (All Alu):        55   4 58                    444544  0   77
Freq. (Alu w/ a base):  56   5 69                    555655  6   99
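One way the two frequency rows could be produced from each group’s alignment is sketched below in Python. Reading each digit as the share of sequences carrying the consensus base, rounded down to tenths with * for 100%, and treating a consensus gap as a column where every sequence in the group has a gap, are interpretations of the figure, not something the slides state. That reading fits the AluS High row, where position 1 is * among Alu with a base but 0 among all Alu. Function names and the toy alignment are made up.

```python
from collections import Counter
from typing import List, Tuple

GAP = "-"


def tenths(count: int, total: int) -> str:
    """One character per column: '*' for 100%, else the share rounded down to tenths."""
    return "*" if count == total else str(count * 10 // total)


def consensus_rows(alignment: List[str]) -> Tuple[str, str, str]:
    """Return (consensus, freq_all, freq_with_base) for one group of aligned sequences.

    consensus:      most common base in each column, ignoring gaps; '-' only
                    when every sequence in the group has a gap there
    freq_all:       share of ALL sequences carrying the consensus base
    freq_with_base: same share, counting only sequences with a base in that column
    All-gap columns are left blank in both frequency rows.
    """
    if not alignment or len({len(s) for s in alignment}) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    consensus, freq_all, freq_base = [], [], []
    for column in zip(*alignment):
        bases = [c.upper() for c in column if c != GAP]
        if not bases:
            consensus.append(GAP)
            freq_all.append(" ")
            freq_base.append(" ")
            continue
        base, count = Counter(bases).most_common(1)[0]  # ties: first base seen wins
        consensus.append(base)
        freq_all.append(tenths(count, len(column)))
        freq_base.append(tenths(count, len(bases)))
    return "".join(consensus), "".join(freq_all), "".join(freq_base)


# Toy alignment (made up): four sequences, five columns, column 3 all gaps.
for row in consensus_rows(["TA-CG", "TA-CG", "TG-CA", "-A-C-"]):
    print(row)
# TA-CG
# 77 *5
# *7 *6
```

Integer arithmetic in `tenths` avoids floating-point rounding pushing a share like 7/10 just below a digit boundary.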
Remaining Tasks
Analyze the remaining subfamilies
Determine whether the identified motifs agree across subfamilies
BLAST motifs against all Alu sequences and correlate alignment scores with expression level
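Once each Alu has a motif score (for example, a BLAST bit score against the motif) paired with the expression level of its nearby gene, the correlation step could look like the Python sketch below. Using Spearman’s rank correlation is our choice, not the slides’: it only assumes a monotone relationship and is not dominated by a few very highly expressed genes. Scores and expression values are made up, and the BLAST step itself is not shown.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank for positions i..j
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(scores: Sequence[float], expression: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the two rank vectors."""
    if len(scores) != len(expression) or len(scores) < 2:
        raise ValueError("need two equal-length inputs with at least two items")
    return pearson(ranks(scores), ranks(expression))


# Made-up motif scores for four Alu and the expression of their nearby genes.
rho = spearman([10, 20, 30, 40], [1.0, 2.5, 2.0, 8.0])
print(round(rho, 3))  # 0.8
```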
Future Directions
Cluster alignments into a relationship tree to see whether the HI and LO Alu groups cluster differently from each other
 – Create a matrix of pairwise alignments and cluster these into a tree using nearest-neighbour clustering
Use Hidden Markov Models or Gibbs sampling to identify sequence motifs (a motif-finding approach that does not rely on multiple sequence alignment)
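A Python sketch of the tree-building idea. It assumes the distance between two Alu is the p-distance (fraction of mismatched, gap-free columns) read from their rows in an existing multiple alignment, rather than from fresh pairwise alignments, and it reads “nearest neighbour clustering” as single-linkage agglomerative clustering; if neighbor-joining was meant instead, that is a different algorithm. Names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple, Union

# A tree is a leaf name or (left, right, merge_distance).
Tree = Union[str, Tuple["Tree", "Tree", float]]


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of mismatches over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 1.0  # no shared positions: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def single_linkage(names: List[str], seqs: List[str]) -> Tree:
    """Nearest-neighbour (single-linkage) agglomerative clustering into a binary tree."""
    if not names or len(names) != len(seqs):
        raise ValueError("need one aligned sequence per name, at least one")
    # Pairwise distance matrix, stored for i < j.
    dist = {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
    clusters: Dict[int, Tree] = dict(enumerate(names))
    members: Dict[int, List[int]] = {i: [i] for i in clusters}
    next_id = len(names)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: the distance between two clusters is the
            # distance between their closest pair of members.
            link = min(dist[min(x, y), max(x, y)]
                       for x in members[a] for y in members[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        link, a, b = best
        clusters[next_id] = (clusters.pop(a), clusters.pop(b), link)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return next(iter(clusters.values()))


names = ["hi1", "hi2", "lo1", "lo2"]
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TCGAACGGAC", "TCGAACGGAA"]
print(single_linkage(names, seqs))
# (('hi1', 'hi2', 0.1), ('lo1', 'lo2', 0.1), 0.3)
```

The nested loops are fine for a few dozen sequences but slow for thousands; SciPy’s `scipy.cluster.hierarchy.linkage` with `method="single"` builds the same kind of tree from a condensed distance matrix.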
Acknowledgements
Danny Eller
York Marahrens
Marc Suchard
Chiara Sabatti
SoCalBSI
NIH/NSF