Biocomputing Databases and Perl by ntz11397

VIEWS: 3 PAGES: 39

									                            Biocomputing
                          Databases and Perl
                             13th February 2006




MSc Bioinformatics 2006
                               HTTP              CGI



                    Client               Web           Application
                              Request   Server           Server

                             Response




MSc Bioinformatics 2006
                  HTTP                         CGI                      DB API


Client                                Web                 Application            DB Server
                                     Server                 Server                 +DB

                            e.g. PC + Apache +PERL (with CGI module) + MySQL




  MSc Bioinformatics 2006
                          The Database Tier
• stores and retrieves data.
• manages updates, allows simultaneous (concurrent)
  access from web servers, provides security, ensures the
  integrity of data
• requires complex software.
   – Database Management Systems (DBMSs)
       • searches and manages data that's stored in
         databases
       • a database is a collection of related data, and an
         application can have more than one database
   – interface is accessed using SQL (Standard Query
     Language) that's used to define and manipulate
     databases and data
MSc Bioinformatics 2006
     Why use a database server?
• more than one user who needs to access the data at the
  same time.
• a moderate amount of data
• relationships between the stored data items
• more than one kind of data object
• constraints that must be rigidly enforced on the data,
  such as field lengths, field types etc..
• If data must be queried to produce reports or results.
• large amount of data that must be searched quickly.
• If security is important
• When adding, deleting, or modifying data is a complex
  process.

MSc Bioinformatics 2006
                          Relational databases
• Entities and Attributes
         – An entity is a person, place, event, or thing about
           which data is collected
         – An entity set is a named collection of entities sharing
           common characteristics
         – Attributes are characteristics of the entity
         – Attributes are also called fields
• Tables
         – A table holds related entities or an entity set
         – Also called relations
         – Comprised of rows and columns
MSc Bioinformatics 2006
                          Tables
• Two-dimensional structure with rows and columns
• Each row (tuple) represents a single entity
• Columns represent attributes
• Row/column intersection represents single value
• Tables must have an attribute to uniquely identify each
  row, a unique key
• Column values all have same data format
• Each column has range of values called attribute domain
• Order of the rows and columns is immaterial to the
  DBMS


MSc Bioinformatics 2006
                            Fingerprint




Sequence




                            Motif



  MSc Bioinformatics 2006
             Entity-relationship model




MSc Bioinformatics 2006
                Relational logical model




MSc Bioinformatics 2006
MSc Bioinformatics 2006
                                       mySQL
• Command-line interpreter
                          – UNIX/Mac
/usr/local/bin/mysql -uhugh -pshhh
                          – Windows
Start > All Programs > mySQL > mySQL Server 5.0 > mySQL command line client




MSc Bioinformatics 2006
                          Managing databases
•      Creating databases
                             mysql> CREATE DATABASE phyloprints;
•      To use that database
                                       use phyloprints;
•      Creating tables

                  CREATE TABLE "fingerprint" (
                          "fprint_accn" varchar(15),
                          "identifier" varchar(15),
                          "motifs" int2,
                          "date" date,
                          "up_date" date,
                          "family_title" text,
                          "family_doc“ varchar(18),
                          "category" text,
                          "second_accn" varchar(15),
                          "pseudo" bool
                  );

                  SHOW CREATE TABLE wine;
MSc Bioinformatics 2006
           Table = fingerprint
           +----------------------------------+----------------------------------+-------+
           | Field                            | Type                              | Length|
           +----------------------------------+----------------------------------+-------+
           | fprint_accn                      | varchar()                         | 15    |
           | identifier                       | varchar()                         | 15    |
           | motifs                           | int2                              | 2     |
           | date                             | date                              | 4     |
           | up_date                          | date                              | 4     |
attributes
           | family_title                     | text                              | var   |
           | family_doc                       | varchar()                         | 18    |
           | category                         | text                              | var   |
           | second_accn                      | varchar()                         | 15    |
           | pseudo                           | bool                              | 1     |
           +----------------------------------+----------------------------------+-------+




           Table = motif
           +----------------------------------+----------------------------------+-------+
           |Field                             | Type                              |Length |
           +----------------------------------+----------------------------------+-------+
           |fprint_accn                       | varchar()                         | 18    |
           | motif                            | int2                              | 2     |
           | repeat                           | varchar()                         | 4     |
           | seqn_accn                        | varchar()                         | 15    |
           | seqn_fragment                    | varchar()                         | 35    |
           | start_position                   | int4                              | 4     |
           | inter_motif_dist                 | int4                              | 4     |
           | initial                          | bool                              | 1     |
           | final
     MSc Bioinformatics 2006                  | bool                              | 1     |

         +----------------------------------+----------------------------------+-------+
                            Modifiers
• Are applied to attributes
         – NOT NULL (row can’t exist without this
           attribute having a value)
         identifier varchar(15) NOT NULL,

         – DEFAULT (sets the data to the value that
           follows when no data is supplied)
                  pseudo" bool DEFAULT f,


MSc Bioinformatics 2006
                               Keys
• Primary key – one or more attributes that
  uniquely identify a row in a table
         – It is essential that every table has one

 PRIMARY KEY (fprint_accn), key names*fprint_accn,identifier) type=MyISAM;




MSc Bioinformatics 2006
   Removing databases and tables
• DROP statement (make sure you want to
  do this – it doesn’t check!)
                  DROP TABLE seqn;
                  DROP DATABASE phyloprints;




MSc Bioinformatics 2006
 Inserting/Updating/Deleting Data
• Inserting Data
INSERT INTO fingerprint VALUES („PR90008‟,
  „CRYSTALLIN‟,0,2000-10-11,2000-10-11,‟Crystallin pseudo-
  signature‟,‟PDOC90008‟,‟Others‟,t);

• Inserting lots of data rows
INSERT INTO fingerprint VALUES („PR90008‟,
   „CRYSTALLIN‟,0,2000-10-11,2000-10-11,‟Crystallin pseudo-
   signature‟,‟PDOC90008‟,‟Others‟,t),(„PR00240‟,‟ADRENRGCA
   1DR‟,7,1996-08-14,2000-02-18,‟Alpha-1D adrenergic
   receptor signature‟,‟PDOC00240‟, f);




MSc Bioinformatics 2006
• DELETE statement
         – All data
                          DELETE FROM fingerprint;
         – Deleting specific rows – use WHERE
         DELETE FROM fingerprint WHERE identifer =
           „CRYSTALLIN‟;
• UPDATE statement
       UPDATE fingerprint SET fprint_accn =
       „PR00001‟ WHERE identifier=„CRYSTALLIN‟;
MSc Bioinformatics 2006
                          Querying with SQL
• Key operators
         – SELECT
         – PROJECT
         – JOIN
• Other operators
         – INTERSECT
         – UNION (union compatible tables)
         – DIFFERENCE
         – PRODUCT
         – DIVIDE
MSc Bioinformatics 2006
 Querying databases with SQL
• SELECT statement
SELECT identifier, fprint_accn FROM
  fingerprint;

SELECT * FROM fingerprint;

• WHERE clause
SELECT * from fingerprint WHERE
  fprint_accn=„PR00001‟;
MSc Bioinformatics 2006
• select identifier,fprint_accn from
  fingerprint where pseudo='t' ORDER BY
  fprint_accn;

•  select fprint_accn,COUNT(*) from fingerprint
  where pseudo='t' GROUP BY fprint_accn;
• select DISTINCT seqn_accn from motif;
• select distinct seqn_accn from motif LIMIT
  10;




MSc Bioinformatics 2006
                          Join queries
• output data that's based on relationships
  between two or more tables




MSc Bioinformatics 2006
MSc Bioinformatics 2006


                          Diagram taken from Jane Mabey
MSc Bioinformatics 2006
                          Diagram taken from Jane Mabey
                            Biocomputing
                          Databases and Perl
                             15th February 2006




MSc Bioinformatics 2006
   Outline
• Database connectivity in Perl using DBI
  module
• A series of small exercises




MSc Bioinformatics 2006
      Querying databases using Perl
#! /usr/bin/perl -w
# intro6.pl - connect to MySQL, retrieve data, write plain text output
use strict;
use DBI;

my $user = „george‟;
my $pass = „george‟;


my ($dbh, $sth, $count); $dbh = DBI->connect
    ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError =>
    0, RaiseError => 1});
$sth = $dbh->prepare ("SELECT name, wins, losses FROM teams");
$sth->execute ();
$count = 0;
while (my @val = $sth->fetchrow_array ()) {
printf "name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2]; ++$count;
push(@allval,@val);
}
print "count rows total\n";
$sth->finish ();
$dbh->disconnect ();
exit (0);



MSc Bioinformatics 2006
%./intro6.pl
name = Fargo-Moorhead Twins, wins = 36, losses = 16
name = Winnipeg Maroons, wins = 24, losses = 26
name = Minot Why Nots, wins = 19, losses = 23
name = Warren Wanderers, wins = 16, losses = 30
4 rows total




MSc Bioinformatics 2006
                             Perl example…
use DBI;
my $dbh = DBI->connect('DBI:MySql:payroll') or die "Couldn't connect to database: " .
    DBI->errstr;
my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?') or die "Couldn't
    prepare statement: " . $dbh->errstr;
print "Enter name> ";
while ($lastname = <>) {
# Read input from the user my @data;
chomp $lastname;
$sth->execute($lastname)
# Execute the query or die "Couldn't execute statement: " . $sth->errstr;
# Read the matching records and print them out
while (@data = $sth->fetchrow_array()) {
         my $firstname = $data[1];
         my $id = $data[2];
         print "\t$id: $firstname $lastname\n";
}
if ($sth->rows == 0) {
         print "No names matched `$lastname'.\n\n";
 }
$sth->finish;
print "\n"; print "Enter name> "; }
$dbh->disconnect;




MSc Bioinformatics 2006
      Query databases over the Web
#! /usr/bin/perl -w # intro7.pl - connect to MySQL, retrieve data, write
   HTML output

use strict;
use DBI;
use CGI qw(:standard);
my ($dbh, $sth, $count);
$dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev",
   "webdevpass", {PrintError => 0, RaiseError => 1});
$sth = $dbh->prepare ("SELECT name, wins, losses FROM teams");
$sth->execute ();
print header(), start_html ("team data");
$count = 0;
while (my @val = $sth->fetchrow_array ()) {
   print p (sprintf ("name = %s, wins = %d, losses = %d\n", $val[0],
   $val[1], $val[2]));
   ++$count;
}
print p ("$count rows total"), end_html ();
$sth->finish ();
$dbh->disconnect ();
exit (0);
MSc Bioinformatics 2006
        More about data retrieval with
                  DBI.pm
• fetchrow_arrayref

$sth = $dbh->prepare ("SELECT name, wins, losses
  FROM teams");
$sth->execute ();
while (my $ref = $sth->fetchrow_arrayref ()) {
  printf "name = %s, wins = %d, losses = %d\n",
  $ref->[0], $ref->[1], $ref->[2];
}
$sth->finish ();


MSc Bioinformatics 2006
      #!/usr/bin/perl

      use DBI;

      my     $db = “sequence”;
      my     $server = „localhost‟;
      my     $user = „root‟;
      my     $passwd = „passwd‟;

      my $dbConnection = DBI-
         >connect(“dbi::mysql:$db:$server”,$user,$passwd)
      my $query = “show tables”;
      my $sql = $dbConnection->prepare($query);
      $sql ->execute();
      while(my $row = $sql->fetchrow_arrayref){
         print join(“\t”,@$row),”\n”;
      }
      $dbConnection->disconnect;
      exit;


MSc Bioinformatics 2006
              Higher retrieval methods
• selectrow_array()
         – to retrieve a single row or a single column
           value
• selectcol_arrayref()
         – returns the first column of a result set, as a
           reference to an array of values.
• selectall_arrayref()
         – retrieves the entire result set as a matrix and
           returns a reference to it

MSc Bioinformatics 2006
 Inserting data in the database
sub new_employee {
# Arguments: database handle; first and last names of new
   employee; # department ID number for new employee's work
   assignment
my ($dbh, $first, $last, $department) = @_;
my ($insert_handle, $update_handle);
my $insert_handle = $dbh->prepare_cached('INSERT INTO
   employees VALUES (?,?,?)');
my $update_handle = $dbh->prepare_cached('UPDATE
   departments SET num_members = num_members + 1 WHERE id =
   ?');
die "Couldn't prepare queries; aborting" unless defined
   $insert_handle && defined $update_handle;
$insert_handle->execute($first, $last, $department) or
   return 0; $update_handle->execute($department) or return
   0; return 1; # Success
}

MSc Bioinformatics 2006
 Inserting data in the database
sub new_employee {
# Arguments: database handle; first and last names of new
   employee; # department ID number for new employee's work
   assignment
my ($dbh, $first, $last, $department) = @_;
my ($insert_handle, $update_handle);
my $insert_handle = $dbh->prepare_cached('INSERT INTO employees
   VALUES (?,?,?)');
my $update_handle = $dbh->prepare_cached('UPDATE departments SET
   num_members = num_members + 1 WHERE id = ?');
die "Couldn't prepare queries; aborting" unless defined
   $insert_handle && defined $update_handle;
my $success = 1;
$success &&= $insert_handle->execute($first, $last, $department);
   $success &&= $update_handle->execute($department);
my $result = ($success ? $dbh->commit : $dbh->rollback); unless
   ($result) { die "Couldn't finish transaction: " . $dbh->errstr
   } return $success;
}

MSc Bioinformatics 2006
• A short cut for DELETING, UPDATING
  and INSERTING data use the do
  statement
$dbh->do('DELETE FROM people WHERE age > 65');




MSc Bioinformatics 2006
                    A little on more keys…
• A foreign key is a column or columns
  whose values are the same as the primary
  key of another table. You can think of a
  foreign key as a copy of primary key from
  another relational table. The relationship is
  made between two relational tables by
  matching the values of the foreign key in
  one table with the values of the primary
  key in another.
MSc Bioinformatics 2006
                          Reminders
• Executable CGI files
• Configuring Apache to read cgi files from
  cgi-bin
• In httpd.conf
• ScriptAlias /cgi-bin/ "C:/Program
  Files/Apache Group/Apache2/cgi-bin/"


MSc Bioinformatics 2006

								
To top