Programming and Perl for Bioinformatics Part II by bzs12927

       Programming and Perl for Bioinformatics: Part II
             Basic Data Types
   Perl has three basic data types:
   scalar
   array (list)
   associative array (hash)
                                Arrays
   An array (list) is an ordered list of scalar values.
   '@' is used to refer to the entire array; '$' is used to refer to a single element.
   Example:
       (1,2,3)                     # Array of three values 1, 2, and 3
       ("one","two","three")       # Array of 3 values "one", "two", "three"
       @names = ("mary", "tom", "mark", "john", "jane");

       $names[1];                  # Extract the 2nd item from @names: "tom"
                                   # (indices start at 0)

       @names[1..4];               # Extract a sublist (slice) from @names:
                                   # ("tom", "mark", "john", "jane")
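
   The difference between taking one element and taking a sublist can be
   sketched as below. This is only an illustration using the @names array
   from the slide; the variable names $second and @sublist are made up for
   the example.

```perl
use strict;
use warnings;

my @names = ("mary", "tom", "mark", "john", "jane");

# A single element is a scalar, so it is written with the $ sigil.
my $second = $names[1];

# A sublist (slice) is a list, so it is written with the @ sigil.
my @sublist = @names[1..4];

print "second item: $second\n";     # second item: tom
print "sublist:     @sublist\n";    # sublist:     tom mark john jane
```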
                              More on Arrays
   @a = ( );                                        # empty list
   @b = (1,2,3);                                    # three numbers
   @c = ("Jan","Joe","Marie");                      # three strings
   @d = ("Dirk",1.92,46,"20-03-1977");              # a mixed list

   Variables and sublists are interpolated (flattened) in a list
        @b = ($a, $a+1, $a+2);                      # variable interpolation; (1,2,3) when $a is 1
        @c = ("Jan", ("Joe","Marie") );             # list interpolation; same as ("Jan","Joe","Marie")
        @d = ("Dirk", 1.92,46,( ), "20-03-1977");   # empty list interpolation; the empty list adds nothing
        @e = ( @b, @c );                            # same as (1,2,3,"Jan","Joe","Marie")


   Range operator ($x..$y) for constructing lists
        @x = (1..6);                    # same as (1, 2, 3, 4, 5, 6)
        @y = (2..5, 8, 11..13);         # same as (2,3,4,5,8,11,12,13)
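
   A small sketch of list flattening and ranges, for checking the element
   counts claimed above. The variable $start is a stand-in chosen for this
   example (the slide uses $a, assumed here to hold 1).

```perl
use strict;
use warnings;

my $start = 1;                               # assumed starting value
my @b = ($start, $start + 1, $start + 2);    # (1, 2, 3)
my @c = ("Jan", ("Joe", "Marie"));           # the inner list is flattened
my @e = (@b, @c);                            # both arrays are flattened into one list
my @y = (2..5, 8, 11..13);                   # ranges mixed with single values

print scalar(@c), " elements in \@c: @c\n";  # 3 elements in @c: Jan Joe Marie
print scalar(@e), " elements in \@e: @e\n";  # 6 elements in @e: 1 2 3 Jan Joe Marie
print scalar(@y), " elements in \@y: @y\n";  # 8 elements in @y: 2 3 4 5 8 11 12 13
```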
                       Array Example
# Here's one way to declare an array, initialized with a list of four
# scalar values.
@bases = ('A', 'C', 'G', 'T');

# Now we'll print each element of the array
print "Here are the array elements:";
print "\nFirst element: ";
print $bases[0];
print "\nSecond element: ";
print $bases[1];
print "\nThird element: ";
print $bases[2];
print "\nFourth element: ";
print $bases[3];

   This code snippet prints out:
Here are the array elements:
First element: A
Second element: C
Third element: G
Fourth element: T
                       Print Array
   You can print the elements one after another like
    this:
@bases = ('A', 'C', 'G', 'T');
print "\n\nHere are the array elements: ";
print @bases;

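   It produces the output:
Here are the array elements: ACGT
   Printing @bases without quotes runs the elements together; printing
    "@bases" inside double quotes separates them with spaces (A C G T).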
  Converting a string to an array
split splits a string into parts at each match of a pattern and
  puts the parts in an array. The matched separator itself is discarded.

$dnastring = "ACGTGCTA";

@dnaarray = split ( //, $dnastring ) ;

          #@dnaarray is now (A, C, G, T, G, C, T, A)

@dnaarray = split ( /T/, $dnastring ) ;

          #@dnaarray is now (ACG, GC, A)
     Converting an array to a string

   join combines the elements of an array into a single
    scalar variable (a string)
        $dnastring = join('', @dnaarray);
        # first argument:  spacer placed between the elements (empty here)
        # second argument: which array to join
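
   split and join undo each other when the same separator is used. The
   following is a minimal sketch of that round trip with the DNA string
   from the slides; the names @fragments, $spaced and $rejoined are
   invented for the example.

```perl
use strict;
use warnings;

my $dnastring = "ACGTGCTA";

# An empty pattern splits the string into single characters.
my @dnaarray = split(//, $dnastring);
my $spaced   = join(' ', @dnaarray);      # a space as the spacer

# Splitting on T removes the T's; joining with T puts them back.
my @fragments = split(/T/, $dnastring);
my $rejoined  = join('T', @fragments);

print "$spaced\n";        # A C G T G C T A
print "@fragments\n";     # ACG GC A
print "$rejoined\n";      # ACGTGCTA
```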
                 Array Manipulations
reverse Reverses the order of array elements
       @a = (1, 2, 3);
       @b = reverse @a; # @b = (3, 2, 1);
split   Splits a string into a list/array
        $line = "John Smith 28";
        ($first, $last, $age) = split (/\s/, $line);
                          # \s: any whitespace character [ \t\n\f\r]

        $DNA = "ACGTTTGA";
        @DNA = split ("", $DNA);

join    Joins a list/array into a string
        $gene = join ( "", ($exon1, $exon3) ) ;
        $name = join ( "-", ("Zhong", "Hui")) ;

scalar Returns the number of elements in @array
        scalar @array;
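
   The four functions above are often combined. As a rough sketch, the
   example below reverses a DNA string by splitting it, reversing the
   array, and joining it again. It uses /\s+/ rather than /\s/ for the
   name example so that runs of several spaces count as one separator;
   that choice, and the names $length and $reversed, are assumptions of
   this example.

```perl
use strict;
use warnings;

my $line = "John Smith 28";
my ($first, $last, $age) = split(/\s+/, $line);

my $DNA      = "ACGTTTGA";
my @DNA      = split("", $DNA);            # one character per element
my $length   = scalar @DNA;                # number of elements
my $reversed = join("", reverse @DNA);     # reverse the order, then glue back together

print "$last, $first ($age)\n";            # Smith, John (28)
print "length:   $length\n";               # length:   8
print "reversed: $reversed\n";             # reversed: AGTTTGCA
```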
         Array Manipulations - pop
  You can take an element off the end of an array with
   pop:
@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
print "Here's the element removed from the end: ";
print $base1, "\n\n";
print "Here's the remaining array of bases: ";
print "@bases";
   which produces the output:
Here's the element removed from the end: T

Here's the remaining array of bases: A C G
           Array Manipulations - shift
  You can take an element off the beginning of the array with
   shift:
@bases = ('A', 'C', 'G', 'T');
$base2 = shift @bases; # shift left
print "Here's an element removed from the beginning: ";
print $base2, "\n\n";
print "Here's the remaining array of bases: ";
print "@bases";

   which produces the output:
Here's an element removed from the beginning: A

Here's the remaining array of bases: C G T
            Array Manipulations - push
   You can put an element on the end of the array with
    push:
@bases = ('A', 'C', 'G', 'T');
$base2 = shift @bases;
push (@bases, $base2); # push returns the number of elements in the array after the push
print "Here's the element from the beginning put on the end: ";
print "@bases\n\n";

   It produces the output:
Here's the element from the beginning put on the end: C G T A
        Array Manipulations - unshift
   You can put an element at the beginning of the array
    with unshift:
@bases = ('A', 'C', 'G', 'T');
$base1 = pop @bases;
unshift (@bases, $base1);
print "Here's the element from the end put on the beginning: ";
print "@bases\n\n";

   It produces the output:
Here's the element from the end put on the beginning: T A C G
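
   pop, unshift, shift and push can be seen side by side in one short
   sketch. It moves the last base to the front and back again, so the
   array ends up in its original order. The names $end, $front and $count
   are made up for the example.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $end = pop @bases;                # removes T from the end:       A C G
unshift(@bases, $end);               # puts T on the beginning:      T A C G
my $after_unshift = "@bases";

my $front = shift @bases;            # removes T from the beginning: A C G
my $count = push(@bases, $front);    # puts T on the end:            A C G T

print "after pop + unshift: $after_unshift\n";   # after pop + unshift: T A C G
print "after shift + push:  @bases\n";           # after shift + push:  A C G T
print "push returned:       $count\n";           # push returned:       4
```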
            Exercise
# Determine the frequency of each nucleotide in $dna
$dna ="gaTtACataCACTgttca";

 ?
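
   One possible way to approach the exercise, offered as a sketch rather
   than the intended answer. It assumes that upper- and lower-case letters
   should count as the same base, so it converts the string to upper case
   with uc before counting. The counter names are chosen for this example.

```perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# Assumption: case does not matter, so normalise to upper case first.
my @bases = split('', uc($dna));

my ($count_A, $count_C, $count_G, $count_T, $count_other) = (0, 0, 0, 0, 0);

foreach my $base (@bases) {
    if    ($base eq 'A') { $count_A++; }
    elsif ($base eq 'C') { $count_C++; }
    elsif ($base eq 'G') { $count_G++; }
    elsif ($base eq 'T') { $count_T++; }
    else                 { $count_other++; }   # anything that is not A, C, G or T
}

print "A: $count_A\n";    # A: 6
print "C: $count_C\n";    # C: 4
print "G: $count_G\n";    # G: 2
print "T: $count_T\n";    # T: 6
```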
                        Filehandles
File I/O (input/output): reading from/writing to files
   Files are represented in Perl by a filehandle variable
     (for clarity, written as a bare word in UPPERCASE)
   Open a file on a filehandle using the open function
     for reading (input):
                 open INFILE, "< datafile.txt";
           or    open (INFILE, "< datafile.txt");
     for writing (output), overwriting the file:
                 open OUTFILE, "> output";
     for appending to the end of the file:
                 open OUTFILE, ">> output";
   Close a file on a filehandle
                 close (OUTFILE);
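
   The three open modes can be tried out with a throwaway file. The
   sketch below is only an illustration: the file name example_output.txt
   is hypothetical, and the "or die" check (which stops the program with
   the system error message in $! if open fails) is an addition that the
   slides do not show.

```perl
use strict;
use warnings;

my $file = "example_output.txt";    # hypothetical file name for this example

# ">" creates the file, or overwrites it if it already exists.
open(OUTFILE, "> $file") or die "Cannot open $file for writing: $!";
print OUTFILE "first line\n";
close(OUTFILE);

# ">>" adds to the end of the existing file.
open(OUTFILE, ">> $file") or die "Cannot open $file for appending: $!";
print OUTFILE "second line\n";
close(OUTFILE);

# "<" opens the file for reading.
open(INFILE, "< $file") or die "Cannot open $file for reading: $!";
my @lines = <INFILE>;
close(INFILE);

print "The file has ", scalar(@lines), " lines\n";    # The file has 2 lines

unlink $file;    # remove the example file again
```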
              Special Filehandles

Special “files” that are always “open”

   STDIN (standard input)
      input from the command window; read only

   STDOUT (standard output)
      output to the command window; write only

    print STDOUT "Have fun with Perl!\n";

or just

    print "Have fun with Perl!\n";
               Input from Filehandles
“Angle Bracket” input operator
      reads one line of input (up to and including the newline)
   from STDIN:
         print "Enter name of protein: ";
         $line = <STDIN>;
         chomp $line;   # removes \n from end of $line
         print "\nYou entered $line.\n";
   from a file:
         open (INPUTFILE, "prot1.seq");
         $line1 = <INPUTFILE>; # first line
         chomp $line1;
         $line2 = <INPUTFILE>; # second line
                   # Perl reads files one line at a time
                   # … etc
               sequences.fasta
>gi|145536|gb|L04574.1|Escherichia coli DNA polymerase III chi subunit gene, complete cds
TAACGGCGAAGAGTAATTGCGTCAGGCAAGGCTGTTATTGCCGGATGCGGCGTGAACGCCTTATCCGACC
TACACAGCACTGAACTCGTAGGCCTGATAAGACACAACAGCGTCGCATCAGGCGCTGCGGTGTATACCTG
ATGCGTATTTAAATCCACCACAAGAAGCCCCATTTATGAAAAACGCGACGTTCTACCTTCTGGACAATGA
CACCACCGTCGATGGCTTAAGCGCCGTTGAGCAACTGGTGTGTGAAATTGCCGCAGAACGTTGGCGCAGC
GGTAAGCGCGTGCTCATCGCCTGTGAAGATGAAAAGCAGGCTTACCGGCTGGATGAAGCCCTGTGGGCGC
GTCCGGCAGAAAGCTTTGTTCCGCATAATTTAGCGGGAGAAGGACCGCGCGGCGGTGCACCGGTGGAGAT
CGCCTGGCCGCAAAAGCGTAGCAGCAGCCGGCGCGATATATTGATTAGTCTGCGAACAAGCTTTGCAGAT
TTTGCCACCGCTTTCACAGAAGTGGTAGACTTCGTTCCTTATGAAGATTCTCTGAAACAACTGGCGCGCG
AACGCTATAAAGCCTACCGCGTGGCTGGTTTCAACCTGAATACGGCAACCTGGAAATAATGGAAAAGACA
TATAACCCACAAGATATCGAACAGCCGCTTTACGAGCACTGGGAAAAGCAGGGCTACTTTAAGCCTAATG
GCGATGAAAGCCAGGAAAGTTTCTGCATCATGATCCCGCCGCCGAA
    Determine frequency of nucleotides
  Input file: sequences.fasta
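open (INPUTFILE, "sequences.fasta"); # open file for sequence
$line1 = <INPUTFILE>;                # header line (starts with >), not used
$line2 = <INPUTFILE>;                # first line of sequence
$line3 = <INPUTFILE>;                # second line of sequence
close (INPUTFILE);
chomp ($line2, $line3);
$dna = $line2.$line3;                # only the first two sequence lines are counted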
$count_A = 0;
$count_C = 0;
$count_G = 0;
$count_T = 0;
@dna = split '', $dna;
foreach $base (@dna) {
    if ($base eq 'A') {$count_A++;}
    elsif ($base eq 'C') {$count_C++;}
    elsif ($base eq 'G') {$count_G++;}
    elsif ($base eq 'T') {$count_T++;}
    else {print "error!\n";}
}
print "count of A = $count_A \n";
print "count of C = $count_C \n";
print "count of G = $count_G \n";
print "count of T = $count_T \n";
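
   The program above counts only the first two sequence lines. A sketch
   of how the same counting could cover every line of a FASTA file is
   given below. To keep it self-contained it first writes a tiny example
   file; the file name example.fasta and its contents are made up for
   this illustration and are not the sequences.fasta shown earlier.

```perl
use strict;
use warnings;

my $file = "example.fasta";    # hypothetical file, created just for this example

open(OUTFILE, "> $file") or die "Cannot write $file: $!";
print OUTFILE ">example sequence\n";
print OUTFILE "ACGTAC\n";
print OUTFILE "GGTTAA\n";
close(OUTFILE);

open(INPUTFILE, "< $file") or die "Cannot read $file: $!";
my $header = <INPUTFILE>;              # the first line is the FASTA header
my $dna = '';
while (my $line = <INPUTFILE>) {       # every remaining line is sequence
    chomp $line;
    $dna = $dna . $line;
}
close(INPUTFILE);
unlink $file;                          # remove the example file again

my ($count_A, $count_C, $count_G, $count_T) = (0, 0, 0, 0);
foreach my $base (split '', $dna) {
    if    ($base eq 'A') { $count_A++; }
    elsif ($base eq 'C') { $count_C++; }
    elsif ($base eq 'G') { $count_G++; }
    elsif ($base eq 'T') { $count_T++; }
    else                 { print "error!\n"; }
}

print "count of A = $count_A\n";    # count of A = 4
print "count of C = $count_C\n";    # count of C = 2
print "count of G = $count_G\n";    # count of G = 3
print "count of T = $count_T\n";    # count of T = 3
```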
         Read a File: line by line
    my $my_sequence = '';
    open FILE1, "/u/doej01/prot1.seq";
    while ($line = <FILE1>){
      chomp($line);
      $my_sequence = $my_sequence . $line;
    }
    close ( FILE1 );

   Puts the whole file, with newlines removed, into the variable
    $my_sequence
         Using loops to read in a file
   The while loop keeps executing its block while its condition is true,
    so it keeps reading lines from the file until it runs out.

   The special variable $_ holds the line that was just read.
     my $longsequence = '';
     open FILE, 'exampleprotein.txt';
     while (<FILE>){
       chomp;                                  # removes \n from end of $_
       $longsequence = $longsequence . $_ ;
     }
     close FILE;
   This reads the whole file and appends each line, one at a time, to
    the variable $longsequence.
       Read a File into an Array

   Rather than read a file one line at a time into a
    scalar variable, it is often helpful to read the
    entire file into an array
    open FILE1, "prot1.seq";
    @DNA = <FILE1>; # array of strings, one element per line (each still ends in \n)
    close FILE1;
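
   Once the lines are in an array, the array functions from earlier can
   be applied to them. The following sketch assumes a small example file
   (the name example_prot.seq and its contents are invented here) and
   shows chomp applied to a whole array, followed by join.

```perl
use strict;
use warnings;

my $file = "example_prot.seq";    # hypothetical file, created just for this example

open(OUTFILE, "> $file") or die "Cannot write $file: $!";
print OUTFILE "MKNATF\n";
print OUTFILE "YLLDND\n";
print OUTFILE "TTVDGL\n";
close(OUTFILE);

open(FILE1, "< $file") or die "Cannot read $file: $!";
my @lines = <FILE1>;              # in list context, <FILE1> returns all lines at once
close(FILE1);
unlink $file;                     # remove the example file again

chomp @lines;                     # chomp on an array removes \n from every element
my $sequence = join('', @lines);  # glue the lines into one string

print "number of lines: ", scalar(@lines), "\n";    # number of lines: 3
print "sequence: $sequence\n";                      # sequence: MKNATFYLLDNDTTVDGL
```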
                Writing to a File
   Writing to a file is similar to reading from it
   Use the > operator to open a file for writing:
open OUTPUT, '>/home/achou/output.txt';

   This creates a new file with that name, or overwrites
    an existing file
   Use >> to append text to an existing file
   print to the file using the filehandle (no comma after the filehandle):
print OUTPUT $myoutputdata;
close OUTPUT;
								