Bioinformatics

                         Roadmap
The topics:
      Basic concepts of molecular biology
         Gene, protein
         Central dogma of molecular biology
         PCR, DNA sequencing
      Elements of Perl
      Overview of the field
      Biological databases and database searching
      Sequence alignments
      Phylogenetics
      Structure prediction
      Microarray data analysis
Programming and Perl for Bioinformatics
       Part I
     A Taste of Perl: print a message
   perltaste.pl: Greet the entire world.

#!/usr/bin/perl              # command interpretation header
# greet the entire world     # a comment
$x = 6e9;                    # variable assignment statement
print "Hello world!\n";      # function call (output statement)
print "All $x of you!\n";    # function call (output statement)
     Basic Syntax and Data Types
   Whitespace between tokens doesn’t matter to Perl. One can write
    several statements on one line
   Perl statements end in a semicolon ; just like C
   Comments begin with ‘#’ and Perl ignores everything
    after the # until end of line.
       Example: #this is a comment

   Perl has three basic data types:
       scalar
       array (list)
       associative array (hash)
                           Scalars
   Scalar variables begin with ‘$’ followed by an identifier
       Example: $this_is_a_scalar;

   An identifier is composed of upper or lower case
    letters, digits, and underscore '_'. User-defined names start
    with a letter or underscore. Identifiers are case
    sensitive (like all of Perl)
       $progname = "first_perl";
       $numOfStudents = 4;
        = sets the content of $progname to be the string
        "first_perl" and $numOfStudents to be the integer 4
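                  Scalar Values
   Numerical Values
     integer:       5, 0, -307 (a numeric string such as "3" is converted when used as a number)
     floating point: 6.2e9, -4022.33
     hexadecimal/octal: 0xd4f, 0477
     binary: 0b011011

    NOTE: Perl stores numbers internally as native integers or as
     floating-point numbers (“double” precision) and converts
     between them automatically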
                        Do the Math
   Arithmetic operators work pretty much as you would
    expect:
         4+7
         6*4
         43-27
         256/12
         2/(3-5)
   Example: what will be the output?
    #!/usr/bin/perl
    print "4+5\n";                 # 4+5
    print 4+5 , "\n";              # 9
    print "4+5=" , 4+5 , "\n";     # 4+5=9
    $myNumber = 88;
       Note: use commas to separate multiple items in a print statement
                   Scalar Values
   String values
   Example: what will be the output?
    $day = "Monday";
    print "Happy Monday!\n";          # Happy Monday!<newline>
    print "Happy $day!\n";            # Happy Monday!<newline>
    print 'Happy Monday!\n';          # Happy Monday!\n
    print 'Happy $day!\n';            # Happy $day!\n

   Double-quoted: interpolates (replaces each variable name or control
    character with its value)
   Single-quoted: no interpolation done (as-is)
               String Manipulation
Concatenation
         $dna1 = "ACTGCGTAGC";
         $dna2 = "CTTGCTAT";
   Juxtapose in a string assignment or print statement
         $new_dna = "$dna1$dna2";
   Use the concatenation operator ‘.’
         $new_dna = $dna1 . $dna2;

Substring: substr(STRING, OFFSET, LENGTH), offsets counted from 0
         $dna = "ACTGCGTAGC";
         $exon1 = substr($dna,2,5); # TGCGT (offset 2, length of the substring 5)
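
A minimal sketch combining substr and concatenation, assuming the same $dna string as above. The name $exon2 and the choice of offsets are made up for illustration.

```perl
use strict;
use warnings;

my $dna = "ACTGCGTAGC";

# substr(STRING, OFFSET, LENGTH): offsets start at 0
my $exon1 = substr($dna, 2, 5);   # "TGCGT"
my $exon2 = substr($dna, 7);      # LENGTH omitted: everything from offset 7, "AGC"

# two equivalent ways to join the pieces
my $spliced_a = "$exon1$exon2";   # juxtaposition inside a double-quoted string
my $spliced_b = $exon1 . $exon2;  # concatenation operator

print "$spliced_a\n";             # TGCGTAGC
```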
                   Substitution
DNA transcription: T -> U
Substitution operator s/// :
      $dna = "GATTACATACACTGTTCA";
      $rna = $dna;
      $rna =~ s/T/U/g; # "GAUUACAUACACUGUUCA"
=~ is the binding operator: it tells Perl to examine the contents of
  $rna for the pattern and apply the substitution there.
  The g modifier replaces every match, not just the first.

Ex: Start with $dna = "gaTtACataCACTgttca";
and do the same as above. What will be the output?
                           Example
   transcribe.pl:
    $dna ="gaTtACataCACTgttca";
    $rna = $dna;
    $rna =~ s/T/U/g;
    print "DNA: $dna\n";
    print "RNA: $rna\n";
   Does it do what you expect? If not, why not?
   Patterns in substitution are case-sensitive! What can we do?
   Convert all letters to upper/lower case (preferred when possible)
   If we want to retain mixed case, use transliteration/translation
    operator   tr///
        $rna =~ tr/tT/uU/; #replace all t by u, all T by U
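
The three options can be compared side by side. This is a small sketch using the mixed-case string from transcribe.pl; the variable names $rna_s, $rna_tr and $rna_uc are invented here.

```perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# 1. s/T/U/g only matches upper-case T, so lower-case t is left alone
my $rna_s = $dna;
$rna_s =~ s/T/U/g;        # "gaUtACataCACUgttca"

# 2. tr/tT/uU/ maps t to u and T to U, keeping the mixed case
my $rna_tr = $dna;
$rna_tr =~ tr/tT/uU/;     # "gaUuACauaCACUguuca"

# 3. convert to upper case first, then transcribe
my $rna_uc = uc($dna);
$rna_uc =~ tr/T/U/;       # "GAUUACAUACACUGUUCA"

print "s///:    $rna_s\n";
print "tr///:   $rna_tr\n";
print "uc + tr: $rna_uc\n";
```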
            Case conversion
$string = "acCGtGcaTGc";
Upper case:
      $dna = uc($string); # "ACCGTGCATGC"
        or $dna = uc $string;
        or $dna = "\U$string";

Lower case:
      $dna = lc($string); # "accgtgcatgc"
        or $dna = "\L$string";

Sentence case:
      $dna = ucfirst(lc($string)); # "Accgtgcatgc"
        or $dna = "\u\L$string";
      (ucfirst alone changes only the first letter: ucfirst($string) gives "AcCGtGcaTGc")
             Reverse Complement
5’- A C G T C T A G C . . . . G C A T -3’
3’- T G C A G A T C G . . . . C G T A -5’
5’- A T G C . . . . G C T A G A C G T -3’


   Reverse: reverses a string
    $string = "ACGTCTAGC";
    $string = reverse($string); # "CGATCTGCA"

   Complementation: use transliteration operator
    $string =~ tr/ACGT/TGCA/;
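
Putting the two steps together gives the reverse complement. The sketch below assumes the input contains only A, C, G and T; handling lower-case letters in the tr list is an addition made here, not part of the slide.

```perl
use strict;
use warnings;

my $strand = "ACGTCTAGC";

# reverse() reverses a string only in scalar context; "scalar" makes that explicit
my $revcomp = scalar reverse($strand);   # "CGATCTGCA"

# complement each base (upper and lower case)
$revcomp =~ tr/ACGTacgt/TGCAtgca/;       # "GCTAGACGT"

print "5'-${strand}-3'\n";
print "5'-${revcomp}-3' (reverse complement)\n";
```

Note that print reverse($strand) would print the string unchanged, because print supplies list context and reverse then reverses a one-element list.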
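   More on String Manipulation
String length:
    length($dna)

Index: offset of the first occurrence of SUBSTR in STR, or -1 if not found
    #index STR,SUBSTR,POSITION    (POSITION is optional: where to start searching)
    index($strand, $primer, 2)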
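
The optional POSITION argument makes it possible to find every occurrence of a primer, not just the first. A minimal sketch, with a made-up $strand and $primer:

```perl
use strict;
use warnings;

my $strand = "GATTACATTAGATTA";   # made-up sequence
my $primer = "TTA";

my @positions;
my $pos = index($strand, $primer);               # first search starts at offset 0
while ($pos != -1) {                             # index returns -1 when nothing is found
    push @positions, $pos;                       # remember this offset
    $pos = index($strand, $primer, $pos + 1);    # resume one base further on
}

print "Found at offsets: @positions\n";          # Found at offsets: 2 7 12
```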
                     Flow Control
Conditional Statements
      parts of code executed depending on truth value of a logical
       statement

“truth” (logical) values in Perl:
  false = {0, 0.0, 0e0, "", "0", undef}; comparisons return "" for false
  true = anything else; comparisons return 1 for true
  ($a, $b) = (75, 83);
  if ( $a < $b ) {
      $a = $b;
      print "Now a = b!\n";
  }
  if ( $a > $b ) { print "Yes, a > b!\n" } # Compact
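
The truth rules can be checked directly. The sketch below sorts a handful of test values (chosen here for illustration) into true and false. Note that the strings "0.0" and "00" count as true, although the numbers they represent are zero.

```perl
use strict;
use warnings;

my @true_ones;
my @false_ones;

foreach my $value (0, 0.0, "", "0", "0.0", "00", " ", "ACGT") {
    if ($value) { push @true_ones,  "'$value'"; }
    else        { push @false_ones, "'$value'"; }
}

print "false: @false_ones\n";   # the numbers 0 and 0.0, the empty string, the string "0"
print "true:  @true_ones\n";    # "0.0", "00", a single space, "ACGT"
```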
         Comparison Operators
Comparison                 String   Number   Returns
Equality                   eq       ==       1 or ""
Inequality                 ne       !=       1 or ""
Greater than               gt       >        1 or ""
Greater than or equal to   ge       >=       1 or ""
Less than                  lt       <        1 or ""
Less than or equal to      le       <=       1 or ""
Comparison                 cmp      <=>      -1, 0, or 1
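
Choosing the wrong column of the table gives a different answer for the same values. A short sketch with two made-up numbers:

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

my $num_result = $x <=> $y;   #  1: as numbers, 10 is greater than 9
my $str_result = $x cmp $y;   # -1: as strings, "10" sorts before "9" ("1" comes before "9")

if ($x >  $y) { print "As numbers, $x is greater than $y\n"; }
if ($x lt $y) { print "As strings, $x comes before $y\n"; }
```

Both messages are printed, so sequence names should be compared with the string operators and lengths, scores or positions with the numeric ones.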
        Logical Operators

Operation    Computerese   English version
  AND            &&              and
   OR            ||              or
  NOT             !              not
                    if/else/elsif
   allows for multiple branching/outcomes
    $randDNA = "";
    while ( length($randDNA) < 200 ) {
       $a = rand();
       if ( $a < 0.25 ) {
          $randDNA .= "A";
       }
       elsif ( $a < 0.50 ) {
          $randDNA .= "C";
       }
       elsif ( $a < 0.75 ) {
          $randDNA .= "G";
       }
       else {
          $randDNA .= "T";
       }
    }
    print $randDNA;
                  Conditional Loops
while ( statement ) { commands … }
        repeats commands as long as statement is true

do { commands } while ( statement );
       same as while, except commands are executed at least once
       NOTE the ‘;’ after the while statement!!
Loop-control (short-circuiting) commands: next and last
   next;        #skips the rest of this iteration, starts the next one
   last;        #jumps out of the loop completely
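
A sketch of next and last in a while loop: it walks through a made-up sequence codon by codon, skips codons containing the ambiguous base N, and stops at a stop codon. Treating only TGA as a stop codon is a simplification for this example.

```perl
use strict;
use warnings;

my $seq  = "ATGGCNAAATGACCC";   # made-up sequence
my $kept = "";
my $i    = 0;

while ($i < length($seq)) {
    my $codon = substr($seq, $i, 3);
    $i += 3;                                   # advance before any next, or the loop never ends

    if ($codon eq "TGA")            { last; }  # stop codon: leave the loop completely
    if (index($codon, "N") != -1)   { next; }  # ambiguous base: go on to the next codon

    $kept .= $codon;
}

print "$kept\n";   # ATGAAA
```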
                    while
Example:

    while ($alive) {
       if ($needs_nutrients) {
          print "Cell needs nutrients\n";
       }
    }

    Any problem?
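
The problem is that nothing inside the loop ever changes $alive, so the loop never ends. One possible repair is sketched below; the counter $nutrients and the rule for setting $alive to 0 are hypothetical and only serve to make the loop terminate.

```perl
use strict;
use warnings;

my $alive     = 1;
my $nutrients = 3;    # hypothetical supply counter, not in the original example
my $messages  = 0;

while ($alive) {
    if ($nutrients > 0) {
        print "Cell needs nutrients\n";
        $nutrients--;          # the state changes on every pass...
        $messages++;
    }
    else {
        $alive = 0;            # ...so the loop condition eventually becomes false
    }
}

print "Printed $messages messages\n";   # Printed 3 messages
```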
             for and foreach loops
   Execute a code loop a specified number of times, or for
    a specified list of values
   for and foreach are identical: use whichever you want
Incremental loop (“C style”):
        for ( $i=0 ; $i < 50 ; $i++ ) {
            $x = $i*$i;
            print "$i squared is $x.\n";
        }
Loop over list (“foreach” loop):
        foreach $name ( "Billy", "Bob", "Edwina" ) {
            print "$name is my friend.\n";
        }
             Basic Data Types
 Perl has three basic data types:
   scalar
   array (list)
   associative array (hash)
                                Arrays
   An array (list) is an ordered group of scalar values.
   ‘@’ is used to refer to the entire array
   Example:
       (1,2,3)                     # Array of three values 1, 2, and 3
       ("one","two","three")     # Array of 3 values "one", "two", "three"
       @names = ("mary", "tom", "mark", "john", "jane");
       $names[1];       # "tom": extract the 2nd item from @names (indices start at 0)
       @names[1..4];    # ("tom", "mark", "john", "jane"): extract a sublist from @names
                            More on Arrays
   @a = ();                                         # empty list
   @b = (1,2,3);                                    # three numbers
   @c = ("Jan","Joe","Marie");                      # three strings
   @d = ("Dirk",1.92,46,"20-03-1977");              # a mixed list

   Variables and sublists are interpolated in a list
        @b = ($a,$a+1,$a+2);                        # variable interpolation
        @c = ("Jan",("Joe","Marie"));               # list interpolation
        @d = ("Dirk",1.92,46,(),"20-03-1977");      # empty list interpolation
        @e = ( @b, @c );                            # same as (1,2,3,"Jan","Joe","Marie")


   Range operator ($x..$y) for constructing lists
        @x = (1..6);              # same as (1, 2, 3, 4, 5, 6)
        @y = (1.2..4.2);          # same as (1, 2, 3, 4): the endpoints are truncated to integers
        @z = (2..5,8,11..13);     # same as (2,3,4,5,8,11,12,13)
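
Ranges are also handy as array indices. A brief sketch of element access and slices, reusing the @names list from the earlier slide; the other variable names are invented for illustration.

```perl
use strict;
use warnings;

my @names = ("mary", "tom", "mark", "john", "jane");

my $second    = $names[1];        # "tom": a single element uses $ and one index
my $last      = $names[-1];       # "jane": negative indices count from the end
my @middle    = @names[1..3];     # ("tom", "mark", "john"): a slice uses @ and a list of indices
my @both_ends = @names[0, -1];    # ("mary", "jane")
my $count     = scalar @names;    # 5

print "middle: @middle\n";
print "ends:   @both_ends\n";
print "count:  $count\n";
```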
                   Array Manipulations
reverse Reverses the order of array elements
         @a = (1, 2, 3);
         @b = reverse @a; # @b = (3, 2, 1);
split    Splits a string into a list/array
         $line = "John Smith 28";
         ($first, $last, $age) = split /\s/, $line;

         $DNA = "ACGTTTGA";
         @DNA = split ('', $DNA);

join     Joins a list/array into a string
         $gene = join "", ($exon1, $exon3);
         $name = join "-", ("Zhong", "Hui");

scalar   Returns the number of elements in @array
         $n = scalar @array;

sort     Returns the sorted elements (the original array is unchanged)
         @sorted = sort { $a <=> $b } @not_sorted;  # numerical sort
         @sorted = sort { $a cmp $b } @not_sorted;  # ASCII-betical sort
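
These functions are often chained: split a line into items, sort them, and join the result back into a string. A small sketch with made-up sequences; sorting by length is an illustration added here, using length() inside the sort block.

```perl
use strict;
use warnings;

my $line = "TTGACA ATG GGCC A CG";          # made-up input line
my @seqs = split /\s+/, $line;              # ("TTGACA", "ATG", "GGCC", "A", "CG")

my @by_length    = sort { length($a) <=> length($b) } @seqs;   # A CG ATG GGCC TTGACA
my @alphabetical = sort { $a cmp $b } @seqs;                   # A ATG CG GGCC TTGACA

# reverse gives longest first; join builds one comma-separated string
my $report = join ", ", reverse @by_length;

print "$report\n";                          # TTGACA, GGCC, ATG, CG, A
```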
            Exercise
#Determine freq of nucleotides
$dna ="gaTtACataCACTgttca";

 ?
 Ex: Determine freq of nucleotides
$dna ="gaTtACataCACTgttca";
$dna = uc($dna); #GATTACATACACTGTTCA
$count_A = 0;
$count_C = 0;
$count_G = 0;
$count_T = 0;
@dna = split '', $dna;
foreach $base (@dna) {
    if ($base eq 'A') {$count_A++;}
    elsif ($base eq 'C') {$count_C++;}
    elsif ($base eq 'G') {$count_G++;}
    elsif ($base eq 'T') {$count_T++;}
    else {print "error!\n";}
}
print "count of A = $count_A \n";
print "count of C = $count_C \n";
print "count of G = $count_G \n";
print "count of T = $count_T \n";
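
An alternative sketch of the same exercise keeps all four counters in one hash (associative array), the third basic data type, instead of four separate scalars. The hash name %count is chosen here for illustration.

```perl
use strict;
use warnings;

my $dna = uc("gaTtACataCACTgttca");            # GATTACATACACTGTTCA

my %count = (A => 0, C => 0, G => 0, T => 0);  # one counter per nucleotide

foreach my $base (split '', $dna) {
    if (exists $count{$base}) { $count{$base}++; }
    else                      { print "error: unexpected character $base\n"; }
}

foreach my $base (sort keys %count) {
    print "count of $base = $count{$base}\n";
}
```

For this string the counts are A = 6, C = 4, G = 2, T = 6.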
                           Filehandles
File I/O (input/output): reading from/writing to files
 Files are represented in Perl by a filehandle
     (for clarity, usually written as a bare word in UPPERCASE)

   Open a file on a filehandle using the open function
        for reading (input):
                     open INFILE, "< datafile.txt";
              or open (INFILE, "< datafile.txt");
        for writing (output), overwriting the file:
                     open OUTFILE, "> output";
        for appending to the end of the file:
                     open OUTFILE, ">> output";

   Close a file on a filehandle
        close (OUTFILE);
              Special Filehandles

Special “files” that are always “open”

   STDIN (standard input)
      input from command window (read only)

   STDOUT (standard output)
      output to command window (write only)

    print STDOUT "Have fun with Perl!\n";

or just

    print "Have fun with Perl!\n";
               Input from Filehandles
“Angle Bracket” input operator < >
      reads one line of input (up to and including the newline)
   from STDIN:
         print     "Enter name of protein: ";
         $line     = <STDIN>;
         chomp     $line;   # removes \n from end of $line
         print     "\nYou entered $line.\n";
   from a file:
         open (INPUT, "aminos.txt");
         $amino1 = <INPUT>;
         $amino2 = <INPUT>;
         chomp ($amino1, $amino2);
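
A complete round trip might look like the sketch below: write a file, append to it, then read it back line by line. Several things here are assumptions rather than part of the slides: the scratch file comes from the core module File::Temp, the amino acid names are sample data, and each open is followed by "or die" because open returns false on failure.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() creates a scratch file and returns an open handle plus its name
my ($tmp, $file) = tempfile(UNLINK => 1);
close($tmp);

open(OUTFILE, "> $file") or die "Cannot write $file: $!";        # overwrite
print OUTFILE "alanine\n";
close(OUTFILE);

open(OUTFILE, ">> $file") or die "Cannot append to $file: $!";   # append
print OUTFILE "glycine\n";
close(OUTFILE);

my @aminos;
open(INPUT, "< $file") or die "Cannot read $file: $!";
while (my $line = <INPUT>) {     # one line per pass, until end of file
    chomp $line;                 # remove the trailing newline
    push @aminos, $line;
}
close(INPUT);

print "Read: @aminos\n";         # Read: alanine glycine
```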

				