Name:______________________________________ Biol 402
Name: ______________________________________ Crowe
Each pair will turn in ONE Bioinformatics worksheet. Please write both of your
names at the top of this sheet.
You have cloned and sequenced a portion of an Arabidopsis gene. The sequence you
have obtained is:
gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat
(This sequence can be copied and pasted from the Word document
“BioinformaticsSequence” in the Assignments folder on the Biol 402 web page)
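If you would rather prepare the query yourself, the spaced blocks of ten bases printed above can be cleaned up with a few lines of Python. This is only a sketch: the function names and the FASTA label "my_clone" are made up for illustration and are not part of the assignment.

```python
# The sequence exactly as printed on the worksheet (blocks of 10 bases).
RAW_SEQUENCE = "gtgaacccgt caacccttga acctcggctg gcaagtctaa tcaaaggcag gcagttaaat"


def clean_sequence(raw):
    """Remove all whitespace and upper-case the bases."""
    return "".join(raw.split()).upper()


def to_fasta(name, sequence, width=60):
    """Return a FASTA record; `name` is a free-text label (hypothetical here)."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines)


query = clean_sequence(RAW_SEQUENCE)
# Sanity check: only A, C, G and T should remain after cleaning.
assert set(query) <= set("ACGT")
print(len(query))                    # number of bases in the query
print(to_fasta("my_clone", query))   # text that can be pasted into the query box
```

The BLAST query box accepts either the bare sequence or a FASTA record like the one printed here.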
You want to find out if anyone has already characterized this gene, so you decide to use
BLAST (Basic Local Alignment Search Tool).
Access BLAST through the NCBI home page (http://www.ncbi.nlm.nih.gov/) by
selecting BLAST from the menu along the top of the page.
OR by selecting NCBI Blast from the 402 home page.
You decide NOT to restrict your search to the Arabidopsis database since you
want to find out if your sequence is related to sequences in other species.
You want to search a nucleotide database with a nucleotide sequence, so you select
Now you can paste your sequence into the “Enter Query Sequence” Box.
Next you have to select a database to search. For the broadest search, use the
Nucleotide collection (nr/nt).
Next you have to decide which BLAST program you will use for your search.
To help you decide, click the ? button next to “choose a BLAST algorithm”
After reading this over, give an example of a research question that would require you
to use a Megablast search instead of a Discontiguous megablast search. Explain your
Now that you know about the different BLAST programs, you are ready to determine if
anyone has characterized the gene you are studying. Remember your decision above
about wanting to find out if your sequence is related to sequences in other species.
Which program do you select? ________________________
Click on the Blue BLAST button to start your search.
Your request is now being processed. The sequence you entered is being compared to
all sequences in the database you selected, so it may take a few minutes.
Your results should be presented in graphic format. You can scroll down to see the
pair-wise alignments below the graph (or click on a bar inside the graph and you will be
taken to that sequence alignment).
Can you figure out what the length of each line indicates?
What the color of each line indicates?________________________________
Can you find any homologs of this sequence in organisms other than Arabidopsis?
Note: if you didn’t identify any non-Arabidopsis sequences you can go back to the
BLAST search page and broaden your search.
Check the E-value and the percent coverage to see how highly conserved these other
Click on the blue accession number for the entry that represents the Arabidopsis gene
with an exact match over these 60 base pairs.
This will take you to the GenBank Record for this gene.
Look through the record to identify the Arabidopsis locus tag (Read over the information
on p. 21 of your lab manual if you have not done this already to refresh your memory on
what an Arabidopsis locus tag should look like).
Arabidopsis locus tag: ________________________
What is the gene’s three-letter acronym (usually followed by a number if the gene
belongs to a family of genes):____________________
This is the Arabidopsis gene that you will be studying in this class
Looking at the GenBank record, list one thing that you learned about this gene (hint:
how did you learn more about the AtSWI3B gene?):
Knowing how Arabidopsis locus tags are named (from p. 21), what chromosome is the
Arabidopsis gene At4g10000 located on? _____________
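The naming pattern for nuclear locus tags can also be checked mechanically. The sketch below assumes the usual form of "At", one chromosome digit (1 to 5), "g", and a five-digit gene number; the function name `parse_locus_tag` is hypothetical, and organellar genes are deliberately not handled.

```python
import re

# Assumed pattern: "At", chromosome number (1-5), "g", five-digit gene number.
LOCUS_PATTERN = re.compile(r"^At([1-5])g(\d{5})$", re.IGNORECASE)


def parse_locus_tag(tag):
    """Return (chromosome, gene_number), or raise ValueError if malformed."""
    match = LOCUS_PATTERN.match(tag.strip())
    if match is None:
        raise ValueError("not a nuclear Arabidopsis locus tag: " + repr(tag))
    return int(match.group(1)), int(match.group(2))


# Example with a locus tag other than the one asked about above.
chromosome, gene_number = parse_locus_tag("At1g01010")
print(chromosome, gene_number)
```

A tag that does not fit the pattern raises an error, which is a quick way to catch a typing mistake before searching a database.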
Check with an instructor/TA before proceeding to the next step.
Once you have identified your gene using BLAST, you want to learn more about it.
Since it is a protein that acetylates histones, you decide to go to the Plant Chromatin
In the Search Box at the top of the page, select Locus and enter the Arabidopsis gene
What is the ChromDB ID for your gene?____________
What protein group does your gene product belong to?__________________________
What plant tissues is your gene expressed in as determined by Northern analysis?
(Select “expression” from the menu at the top of the page)
Note: Slq = silique or seed pod; Sdlg = 2 week old seedling (entire plant)
Go back to the Summary page
You can find out a lot of information about your gene from this database, but we also
want you to check out The Arabidopsis Information Resource (TAIR) web page.
To do this, click on the Locus Link which will take you to the TAIR information page
for your gene.
Scroll down to the bottom of this page and notice there are several published papers
that reference your gene. You will be reading one of these (Plant orthologs of
p300/CBP: conservation of a core domain in metazoan p300/CBP acetyltransferase-
related proteins.) for class next week.
There is a lot more information on this page about your gene!
***You should come back to this page throughout the quarter to learn more about
your gene as you develop your research projects.
However, now that you know your gene of interest, you want to characterize its function.
One way to do this is to look at the phenotype of a plant in which this gene has been
“knocked-out”. Several years ago, an incredible resource of Arabidopsis “knock-out”
mutants was compiled at the Salk Institute in San Diego. Random insertion of the
Agrobacterium transfer DNA (T-DNA) into the Arabidopsis genome created a library of
seeds, each of which contains one or two insertions in its genome. The Salk Institute
has set up a searchable database to allow you to find a mutant in your gene of interest.
To do this, follow the link to the Salk T-DNA web page from the Biol 402 home
This page is very busy!
You can make it easier to read by using the arrows in the upper right corner of the
screen to scroll in to 10 bases/pixel. The dark blue bar running horizontally represents
the chromosome, with the “top” of the chromosome to the left. The first gene you see at
the top of Chromosome 1 is called At1g01010, and the next one is named At1g01020
etc. The positions of the genes are shown as broken green arrows above the
chromosome (the introns are shown as gaps in the arrow). Can you find a gene that
does not have any introns? (Note: you may have to zoom out.)
Underneath the diagrammed chromosome, there are indications of various insertions
localized to each of the genes. For example, insert mutants generated at the Salk
Institute lab are indicated in pink and are called “SALK_NNNNNN”. Lines shown in
green boxes, also named SALK_NNNNNN, are homozygous insertion mutants, which we
will talk more about in lab next week. The numbers assigned to these mutant lines
are unrelated to the gene that carries the insertion.
Find an insert mutant for the gene you are studying (from p. 2 of this
To search the database for a gene, scroll down to 1. SEARCH and type the Arabidopsis
gene name into “Query”. Click on “Search”. The display will now be centered on the
gene you searched for. Do you see any SALK insert lines in your gene?
Write down the Salk numbers of one or two T-DNA insertion mutants that you believe
will not express your gene. For each one, describe where in the gene the insertion lies
(e.g. exon 3) and what effect you predict it will have on gene expression and/or function.
The position of the insert determines in part how it affects gene function.
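Deciding whether an insertion lies in an exon or an intron is a matter of comparing its coordinate with the gene model. The sketch below shows one way to do that; the exon coordinates and insertion sites are invented for illustration and do not describe your gene or any real SALK line.

```python
def locate_insertion(position, exons):
    """Classify a T-DNA insertion relative to a gene model.

    `exons` is a list of (start, end) pairs, inclusive, listed from the
    5' end to the 3' end of a gene on the top strand.
    """
    for number, (start, end) in enumerate(exons, start=1):
        if start <= position <= end:
            return "exon %d" % number
    # An intron is the gap between two consecutive exons.
    for number in range(1, len(exons)):
        intron_start = exons[number - 1][1] + 1
        intron_end = exons[number][0] - 1
        if intron_start <= position <= intron_end:
            return "intron %d" % number
    return "outside the gene"


# Made-up gene model with three exons, and four made-up insertion sites.
HYPOTHETICAL_EXONS = [(100, 250), (400, 520), (700, 900)]
for insertion_site in (180, 300, 710, 950):
    print(insertion_site, locate_insertion(insertion_site, HYPOTHETICAL_EXONS))
```

For a gene on the bottom strand the exons would have to be numbered from the other end, so check the direction of the gene arrow before relying on a result like this.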
***Check with a TA/instructor to find out which Salk T-DNA line your group will be
using. Then proceed to “identifying PCR primers” on the next page.
Note: Once you have identified an insert that you think would knock out function of your
gene, the next step would be to order seeds from the ABRC (“Arabidopsis Biological
Resource Center”). Due to time limitations, we have already obtained these seeds, and
planted them so that you have sufficient material from which to isolate DNA and confirm
the location of T-DNA insertions.
Identifying PCR Primers
We will be using PCR to confirm that the plants we obtained from ABRC do indeed have
a T-DNA insertion. We will discuss the how’s and why’s of PCR next week, but right
now, we want you to see how you can design the primers necessary for the PCR
Go to http://signal.salk.edu/cgi-bin/tdnaexpress
On the right side of the page, under Tools, select - iSct primers (you may have to scroll
Scroll down to “2. Salk-TDNA verification Primer design”
In the white box, type in the name of your SALK line (using the indicated format) and
click on “submit”.
In the space below, write the sequences of your LP and RP primers:
Copy the information you obtain into a Word document and e-mail it to yourself in
order to print it and SAVE it. You will need this information to design your PCR
protocol in next week’s laboratory.
Note: LP = left primer; RP = right primer; TM = melting temperature; GC = % GC
content. The tool also gives you the predicted PCR product size if an insert is present.
By convention, DNA sequences are written in the 5’ to 3’ orientation.
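To see what the GC and TM values mean, both can be estimated by hand for a primer written 5’ to 3’. The sketch below computes % GC and a rough melting temperature using the Wallace rule (2 degrees for each A or T, 4 degrees for each G or C), a common approximation for short primers. This is an assumption made for illustration: the Salk tool may well use a different formula, so do not expect identical numbers. The example primer is simply the first 20 bases of the worksheet sequence, not a real LP or RP primer.

```python
def gc_percent(primer):
    """Percentage of bases in the primer that are G or C."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 100.0 * gc / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C (Wallace rule approximation)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Stand-in primer: first 20 bases of the worksheet sequence, written 5' to 3'.
example_primer = "gtgaacccgtcaacccttga"
print(len(example_primer), gc_percent(example_primer), wallace_tm(example_primer))
```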
Note: Since you are joining an on-going research project, students from previous
quarters have already ordered the LP and RP primers for the PCR reaction.
Double checking primers
Before you go to the expense of ordering primers, carrying out PCR reactions and
analyzing results, it’s always a good idea to double check the primer sequences to
see that they are specific for the sequence you’re interested in, and that they will amplify
the correct sequence.
To do this:
Go to the TAIR home page (http://www.arabidopsis.org/)
Under TOOLS select SeqViewer. In the text box, Paste (or type in) the LP primer
sequence, select search by sequence and hit Submit.
Look carefully at the screen. The green lines represent the 5 Arabidopsis
chromosomes. The position that matches your primer sequence will be indicated by a
red #1. You can click on the red #1 to look at the sequence, but it is easier to use the
select button and look at view 1. The button is located at the top of the page in the
middle. Click on “select” then scroll down to “1”. This will give you a view of the
chromosome sequence indicating what region your primer is complementary to. Look
along the right side of the screen to see if the gene name written vertically matches your
gene of interest. Is this the gene you want to study? Look at the sequence, and record
the sequence position of the primer. Is the sequence italicized or not? (If the sequence
is italicized it means that it is identical to the bottom strand of the DNA, if it is not, it
means that it is identical to the top strand of the DNA). Now repeat this process for the RP primer.
Which strand will the LP primer anneal to? _________________________
Which strand will the RP primer anneal to? _________________________
Do both primers fall within the correct gene?
What is the distance between the two primers?
Is this the same information that the SIGnAL verification tool gave you? Is one primer
going to anneal to the top strand and the other to the bottom strand (one of the two
primers should be italicized indicating it matches a sequence on the bottom strand).
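The same check can be sketched in code. A primer that is identical to the top strand anneals to the bottom strand, and a primer that is identical to the bottom strand appears in the top strand only as its reverse complement. The example below uses the 60-base worksheet sequence as a stand-in template and two made-up 20-base primers cut from its ends; they are not your LP and RP primers, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate_primer(top_strand, primer):
    """Return (strand, start) or None if the primer does not match.

    `strand` names the strand the primer is IDENTICAL to, so the primer
    anneals to the opposite strand. `start` is the 0-based position of the
    leftmost matched base, counted along the top strand.
    """
    top = top_strand.upper()
    primer = primer.upper()
    index = top.find(primer)
    if index != -1:
        return "top", index
    index = top.find(reverse_complement(primer))
    if index != -1:
        return "bottom", index
    return None


# Stand-in template: the worksheet sequence with the spaces removed.
TEMPLATE = "GTGAACCCGTCAACCCTTGAACCTCGGCTGGCAAGTCTAATCAAAGGCAGGCAGTTAAAT"
left_primer = TEMPLATE[:20]                        # identical to the top strand
right_primer = reverse_complement(TEMPLATE[-20:])  # identical to the bottom strand

left_hit = locate_primer(TEMPLATE, left_primer)
right_hit = locate_primer(TEMPLATE, right_primer)
# Span from the first base of the left primer to the last base of the right primer.
product_size = right_hit[1] + len(right_primer) - left_hit[1]
print(left_hit, right_hit, product_size)
```

If both primers came back as matching the same strand, the pair could not amplify anything, which is exactly the orientation mistake this section warns about.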
One of the most common mistakes in molecular biology is in designing primers that
either anneal to the wrong strand, or are inverted with respect to the 5’ and 3’ end of the
sequence. It’s worth your time to make sure you understand what the DNA polymerase
is doing and what your primer orientation must be! We will go over this in class when we
are designing our PCR reactions, so come prepared with questions if this is confusing
YOU WILL NEED TO UNDERSTAND HOW TO DESIGN PRIMERS FOR THE LAB
Once you have double-checked your primers, turn in your bioinformatics
worksheet so that a TA can check your answers.
Note: The third primer you need, LB, is the same for all of the SALK lines since they all
have the same DNA insert used as a mutagen. We will provide you with the LB Primer