
Eighth Rocky Mountain
Bioinformatics
Conference

December 9 to 11, 2010

Snowmass/Aspen
Colorado

Conference Chair
Lawrence Hunter, PhD
University of Colorado,
Denver School of Medicine
Dear Rocky 2010 Participant

Welcome to the eighth annual Rocky Mountain Regional Bioinformatics meeting.
The organizers hope that you enjoy the program, and find the meeting a
productive opportunity to meet researchers, students and industrial users of
bioinformatics technology in our area. We think we have the best program yet,
offering a remarkable cross-section of bioinformatics research.

There is clearly a lot of computational bioscience going on in the region.
Scientists from Colorado, New Mexico, California and Nevada make up the
bulk of the attendees, and are joined by substantial contingents from twelve
other states, ranging from Minnesota to Hawaii. Registrants from eleven other
countries, spanning North America, Europe, Asia, and Africa, give the meeting an
increasingly international flavor. The meeting is a chance to get to know your
colleagues, look for collaborative opportunities, and find synergies that can drive
our field forward.

We’ve listened carefully to the comments that were made about the previous
Rocky meetings, and tried to be responsive. Year after year, people have
requested more time for the poster sessions, so those are expanded. Also, we
are starting the meeting on Thursday and ending Saturday for those who said
they wanted time to get home on Sunday. We welcome any further suggestions
that you have to make next year’s meeting even better.

We should all be grateful for the support of our sponsors. This year, we welcome
two new sponsors: Boulder-based SomaLogic, and Ion Torrent, recently
purchased by Life Technologies. Of course, we also remain extremely grateful for
the ongoing support of the International Society for Computational Biology and
IBM. It is only with the help of these sponsors that we can make this meeting as
affordable as it is. Please seek out attendees from the sponsoring organizations,
and let them know that their participation is important to you!

We are also grateful for your continued interest in the meeting. Despite the
hard economic times, we are on track for a record-setting attendance of more
than 140 scientists. Finally, the meeting would simply not be possible without
organizational help from Stephanie Hagstrom, the ISCB team, and Kathy
Thomas.

We hope you enjoy the science, the company, and the spectacular scenery of the
Rocky Mountains. Welcome!

Larry Hunter




AGENDA AT-A-GLANCE

THURSDAY – DECEMBER 9, 2010
11:00 AM – 1:00 PM        Registration                          Silvertree Lower Level

1:00 PM – 1:45 PM         Keynote 1

Network as Biomarker for Ovarian Cancer Prognosis
Sol Efroni, PhD, Bar Ilan University, Ramat Gan, Israel
1:45 PM – 2:45 PM         Oral Presentations 1–6

2:45 PM – 3:00 PM         Break

3:00 PM – 4:10 PM         Oral Presentations 7–13

4:10 PM – 4:25 PM         Break

4:25 PM – 5:10 PM         Keynote 2

Data and Analytics Transforming Healthcare for Smarter Healthcare & Life
Science
Kirk E. Jordan, PhD, Emerging Solutions Executive & Assoc. Prog. Director,
Computational Science Center, IBM T.J. Watson Research
5:10 PM – 6:20 PM         Oral Presentations 14–20

7:00 PM – 9:00 PM         Banquet            Il Poggio Restaurant, Snowmass Village



FRIDAY – DECEMBER 10, 2010
9:00 AM – 9:45 AM         Keynote 3

Interactive Chromatin Modeling: Towards a Computational Karyotype
Thomas C. Bishop, PhD, Center for Computational Science, Tulane University
9:45 AM – 10:45 AM        Oral Presentations 21–26

10:45 AM – 11:00 AM       Break

11:00 AM – 12:00 PM       Oral Presentations 27–32

12:00 PM – 4:00 PM        Break

4:00 PM – 4:45 PM         Keynote 4

Supporting Research and Analysis across the Biomedical Literature Using
Visual Analytics
Carsten Görg, PhD, University of Colorado School of Medicine
4:45 PM – 5:45 PM         Oral Presentations 33–38

5:45 PM – 8:00 PM         Reception and Poster Session                  Sinclair Room *




Session Locations: All oral presentations will be held at the Silvertree Hotel lower level
* Poster Session Location: Snowmass Conference Center across the street from Silvertree Hotel


SATURDAY – DECEMBER 11, 2010
9:00 AM – 9:45 AM         Keynote 5

New Approaches for Comparing Biological Trees and Networks
Shawn Gomez, PhD, University of NC-Chapel Hill
9:45 AM – 10:35 AM        Oral Presentations 39–43

10:35 AM – 12:00 PM       Poster Session                                * Sinclair Room

12:00 PM – 4:00 PM        Break

4:00 PM – 5:00 PM         Oral Presentations 44–49

5:00 PM – 5:45 PM         Keynote 6

Thinking, Measuring, Calculating and Diagnosing: From Platform to New
Ideas about Biology
Larry Gold, MD, PhD, CEO and Chairman of the Board, SomaLogic, Inc.,
Professor, University of Colorado, Boulder
5:45 PM                   Rocky ‘10 Closing Comments




Session Locations: All oral presentations will be held at the Silvertree Hotel lower level
* Poster Session Location: Snowmass Conference Center across the street from Silvertree Hotel


AGENDA

THURSDAY – DECEMBER 9, 2010
11:00 AM – 1:00 PM        Registration               Silvertree Lower Level

1:00 PM – 1:45 PM         Keynote 1

Network as Biomarker for Ovarian Cancer Prognosis
Sol Efroni, PhD, Bar Ilan University, Ramat Gan, Israel
1:45 PM – 1:55 PM         Oral Presentation 1

Fun and Games with RDF – Moving Rat Data on to the Semantic Web
Presenter: Simon Twigger, Medical College of Wisconsin
Authors: Simon Twigger, Joey Geiger, Jennifer Smith
1:55 PM – 2:05 PM         Oral Presentation 2

Full-Text Biomedical Literature Processing: More than a Scaling Challenge
Presenter: Christophe Roeder, University of Colorado, Denver
Authors: Christophe Roeder, Tom Christiansen, Helen Johnson, Karin Verspoor,
Gully Burns, Lawrence Hunter
2:05 PM – 2:15 PM         Oral Presentation 3

PhiRAST: A Pipeline for Rapid Annotation of Phage Genomes Using Subsystems
Technology
Presenter: Ramy Aziz, San Diego State University
Authors: Ramy Aziz, Robert Olson, Ross Overbeek, Gordon Pusch, the PhAnToMe
team, Robert Edwards
2:15 PM – 2:25 PM         Oral Presentation 4

Polbase: A Repository of Biochemical, Genetic, and Structural Information
about DNA Polymerases
Presenter: Bradley Langhorst, New England Biolabs
Authors: Bradley Langhorst, Nicole Nichols
2:25 PM – 2:35 PM         Oral Presentation 5

Oro Miner, an Application for Investigating Cell Communication in
Multicellular Organisms
Presenter: Michael Rogers, University of Nevada Las Vegas
Authors: Prashant Singh, Michael Rogers, Patrick Gradie, Rinu Thomas,
Dharmistha Kaul, Brandon Roe, Shruti Patel, Briana Sugihara, Narineh Abadian,
Michael Gryk, Martin Schiller
2:35 PM – 2:45 PM         Oral Presentation 6

Early Detection and Dynamics of Rare Viral Variants by Ultradeep Sequencing
Presenter: Peter Hraber, Los Alamos National Lab
Authors: Peter Hraber, Will Fischer, Elena Giorgi, Thomas Leitner, Tanmoy
Bhattacharya, Bette Korber
2:45 PM – 3:00 PM         Break (15 minutes)

3:00 PM – 3:10 PM         Oral Presentation 7

A Process for Unifying Sources of Biomedical Information in an RDF-Based
Knowledge Base
Presenter: Kevin Livingston, University of Colorado Denver
Authors: Kevin Livingston, Michael Bada, Lawrence Hunter
3:10 PM – 3:20 PM         Oral Presentation 8

Testing for Joint Association of All SNP Pairs
Presenter: Ronald Schuyler, University of Colorado
Authors: Ronald Schuyler, Lawrence Hunter
3:20 PM – 3:30 PM         Oral Presentation 9

Cognitive Task Flows and Visual Analytics
Presenter: Barbara Mirel, University of Michigan
Authors: Barbara Mirel, Felix Eichinger
3:30 PM – 3:40 PM         Oral Presentation 10

CategoryCompare: High-Throughput Data Meta-Analysis Using Gene
Annotations
Presenter: Robert Flight, University of Louisville
Authors: Robert Flight, Jeffrey Petruska, Benjamin Harrison, Eric Rouchk
3:40 PM – 3:50 PM         Oral Presentation 11

Pairwise Agonist Scanning Predicts Cellular Signaling Responses to
Combinatorial Stimuli
Presenter: Scott Diamond, University of Pennsylvania
Authors: Scott Diamond, Manash Chatterjee
3:50 PM – 4:00 PM         Oral Presentation 12

A Bayesian Network Framework for Statistical Assessment of the Intent to
Stabilize Yersinia pestis
Presenter: Bobbie-Jo Webb-Robertson, Pacific Northwest National Laboratory
Authors: Bobbie-Jo Webb-Robertson, Lee Ann McCue, Craig McKinstry, Brian
Clowers, Heather Colburn, Christina Sorensen, David Wunschel, Karen Wahl
4:00 PM – 4:10 PM         Oral Presentation 13

Identifying Genes in the Drosophila Hh Pathway by Integrating TF Binding and
Gene Expression Data
Presenter: Daniel Dvorkin, University of Colorado Denver
Authors: Daniel Dvorkin, Brian Biehs, Katerina Kechris
4:10 PM – 4:25 PM         Break (15 minutes)

4:25 PM – 5:10 PM         Keynote 2

Data and Analytics Transforming Healthcare for Smarter Healthcare & Life
Science
Kirk E. Jordan, PhD, Emerging Solutions Executive & Assoc. Prog. Director,
Computational Science Center, IBM T.J. Watson Research
5:10 PM – 5:20 PM         Oral Presentation 14

Cancer Gene Expression Data Is Not Normally Distributed: Analysis of
Data Distributions and Their Effects on Gene Selection and Molecular
Classification
Presenter: Nicholas Marko, Cleveland Clinic Department of Neurosurgery
Authors: Nicholas Marko, Robert Weil
5:20 PM – 5:30 PM         Oral Presentation 15

Exploring Machine Learning Classifiers for the Prediction of AGO Bound
Transcripts with miRNA Seed Matches
Presenter: Abel Licon, Thermo Fisher Scientific
Authors: Abel Licon, Kevin Sullivan, Amanda Birmingham
5:30 PM – 5:40 PM         Oral Presentation 16

SpliceGrapher: Predicting Splice Graphs from Diverse Evidence
Presenter: Mark Rogers, Colorado State University
Authors: Mark Rogers, Asa Ben-Hur, Anireddy Reddy
5:40 PM – 5:50 PM         Oral Presentation 17

Plasma Metabolites in the Mammalian Hibernation Cycle
Presenter: Anis Karimpour-Fard, University of Colorado Denver
Authors: Anis Karimpour-Fard, L. Elaine Epperson, Lawrence Hunter, Sandra
Martin
5:50 PM – 6:00 PM         Oral Presentation 18

Detecting Genome-Wide Copy Number Variations in a Single Sample Using Next
Generation Sequencing Data
Presenter: Rajesh Gottimukkala, Life Technologies
Authors: Rajesh Gottimukkala, Fiona Hyland, Somalee Datta, Asim Siddiqui, Ryan
Koehler, Yutao Fu
6:00 PM – 6:10 PM         Oral Presentation 19

Interaction Sites in Models of Protein Interaction Network Evolution
Presenter: Todd Gibson, University of Colorado Denver
Authors: Todd Gibson, Debra Goldberg

6:10 PM – 6:20 PM         Oral Presentation 20

Investigation of Structural Basis of Oncogenesis Property of GTPase H-Ras
Protein: A Molecular Dynamics Simulation Study
Presenter: Gyana Satpathy, National Institute of Technology
Authors: Gyana Satpathy, Raghunath Satpathy, B.P. Nayak
7:00 PM – 9:00 PM         Banquet            Il Poggio Restaurant, Snowmass Village

FRIDAY – DECEMBER 10, 2010
9:00 AM – 9:45 AM         Keynote 3

Interactive Chromatin Modeling: Towards a Computational Karyotype
Thomas C. Bishop, PhD, Center for Computational Science, Tulane University
9:45 AM – 9:55 AM         Oral Presentation 21

Validation of Protein Functional Site Predictions Using Automated
Biomedical Literature Analysis
Presenter: Karin Verspoor, University of Colorado Denver
Authors: Karin Verspoor, Judith Cohn, Christophe Roeder, Michael Wall
9:55 AM – 10:05 AM        Oral Presentation 22

Modeling Gene-Species Data by Generalized Replicator Dynamics for Efficient
Phylogenetic Inference
Presenter and Author: Ying Liu, University of North Texas at Dallas
10:05 AM – 10:15 AM       Oral Presentation 23

Comparative Analysis of Apicomplexan Biological Processes
Presenter: Segun Fatumo, Center for Tropical & Emerging Global Diseases
Authors: Segun Fatumo, Jessica Kissinger
10:15 AM – 10:25 AM       Oral Presentation 24

Stop Using Just GO: A Multi-Ontology Enrichment Analysis Tool for Genes and
Proteins
Presenter: Emily Howe, The Buck Institute
Authors: Emily Howe, Uday Evani, Mathew Fleish, Nigam Shah, Sean Mooney
10:25 AM – 10:35 AM       Oral Presentation 25

Comparative Ontological and Network Analysis of Aging Associated Genes in
Humans and Model Organisms
Presenter: Ari Berman, Buck Institute for Age Research
Authors: Ari Berman, Tobias Wittkop, Emily Howe, Sean Mooney
10:35 AM – 10:45 AM       Oral Presentation 26

Flu and Drugs and Rocky10
Presenter and Author: Christian Forst, UT Southwestern Medical Center

10:45 AM – 11:00 AM       Break (15 minutes)

11:00 AM – 11:10 AM       Oral Presentation 27

Analysis Workflow of Methylation and Gene Expression Microarray in
Pediatric Crohn Disease
Presenter: Tzu Lip Phang, University of Colorado Denver
Authors: Tzu Phang, Anna Hunter, Ping Yao Zeng, Theresa Kerbowski, Edwin de
Zoeten
11:10 AM – 11:20 AM       Oral Presentation 28

Evolution of Cis-Regulatory Elements
Presenter and Author: Ken Yokoyama, University of Colorado Denver
11:20 AM – 11:30 AM       Oral Presentation 29

More Is Not Always Better. Considerations in the Use of Time Course
Microarray Data
Presenter: Elizabeth Siewert, Colorado School of Public Health
Authors: Elizabeth Siewert, Katerina Kechris
11:30 AM – 11:40 AM       Oral Presentation 30

Semantic Richness in the Colorado Richly Annotated Full-Text (CRAFT) Corpus
Presenter: Michael Bada, University of Colorado Anschutz Medical Campus
Authors: Michael Bada, Miriam Eckert, Arrick Lanfranchi, William A. Baumgartner,
Jr., Colin Warner, Amanda Howard, William Corvey, Nianwen Xue, K. Bretonnel
Cohen, Karin Verspoor, Judith A. Blake, Martha Palmer, Lawrence Hunter
11:40 AM – 11:50 AM       Oral Presentation 31

Functional Regulatory Circuits Induced by Transcription Factors and Small
RNAs
Presenter: Molly Megraw, Duke University
Authors: Molly Megraw, Uwe Ohler
11:50 AM – 12:00 PM       Oral Presentation 32

Rapid Comprehensive Selection of Sensitive and Specific Oligonucleotide
Signatures from Large Hierarchically Clustered Nucleic Acid Sequence
Datasets
Presenter: Harald Meier, Technical University of Munchen
Authors: Kai Bader, Christian Grothoff, Harald Meier
12:00 PM – 4:00 PM        Break

4:00 PM – 4:45 PM         Keynote 4

Supporting Research and Analysis across the Biomedical Literature Using
Visual Analytics
Carsten Görg, PhD, University of Colorado School of Medicine

4:45 PM – 4:55 PM         Oral Presentation 33

Classifying Parenthesized Material for Text Mining
Presenter: Kevin Bretonnel Cohen, University of Colorado School of Medicine
Authors: K. Bretonnel Cohen, Tom Christiansen, Lawrence Hunter
4:55 PM – 5:05 PM         Oral Presentation 34

Towards Integrative Gene Prioritization
Presenter: Graciela Gonzalez, Arizona State University
Authors: Jang Lee, Graciela Gonzalez
5:05 PM – 5:15 PM         Oral Presentation 35

Towards a Molecular Classification of Kidney Diseases Based on Network
Analysis
Presenter: Felix Eichinger, University of Michigan
Authors: Felix Eichinger, Ramakrishna Varadarajan, Jignesh Patel, Matthias
Kretzler
5:15 PM – 5:25 PM         Oral Presentation 36

Identifying the Dynamic States of the 3D Genome Organization
Presenter: Andrzej Kudlicki, University of Texas
Authors: Dirar Homouz, Gang Chen, Andrzej Kudlicki
5:25 PM – 5:35 PM         Oral Presentation 37

Providing Context to Genetic Associations with Gene Expression in Renal
Disease
Presenter: Benjamin Keller, Eastern Michigan University
Authors: Benjamin Keller, Sebastian Martini, Matthias Kretzler
5:35 PM – 5:45 PM         Oral Presentation 38

Gene and Promoter Discovery through High-Resolution Expression Profiling
Presenter and Author: Ian Davis, GrassRoots Biotechnology, Inc.
5:45 PM – 8:00 PM         Reception and Poster Session

                          Snowmass Conference Center                Sinclair Room
                          (across street from Silvertree)

SATURDAY – DECEMBER 11, 2010
9:00 AM – 9:45 AM         Keynote 5

New Approaches for Comparing Biological Trees and Networks
Shawn Gomez, PhD, University of NC-Chapel Hill

9:45 AM – 9:55 AM         Oral Presentation 39

Using a Low Dimensional Firing Rate Model to Study Interactions in
Biophysical Neural Networks
Presenter and Author: Anca Radulescu, University of Colorado at Boulder
9:55 AM – 10:05 AM        Oral Presentation 40

Investigating Relationships between Obesity and the Built Environment Using
Agent-Based Modeling
Presenter: Helmet Karim, University of Pittsburgh
Authors: Helmet Karim, Leming Zhou
10:05 AM – 10:15 AM       Oral Presentation 41

Characterization of Genomic Variability in Clinical Isolates of the Oral
Pathogen Aggregatibacter actinomycetemcomitans
Presenter: Weerayuth Kittichotirat, University of Washington
Authors: Weerayuth Kittichotirat, Casey Chen, Roger Bumgarner
10:15 AM – 10:25 AM       Oral Presentation 42

Extracting Adverse Drug Reactions from User Posts to Health-Related Social
Networks
Presenter: Laura Wojtulewicz, Arizona State University
Authors: Robert Leaman, Laura Wojtulewicz, Ryan Sullivan, Annie Skariah, Jian
Yang, Graciela Gonzalez
10:25 AM – 10:35 AM       Oral Presentation 43

Simple Local Assembly Program
Presenter: Adam Spargo, Wellcome Trust Sanger Institute
Authors: Adam Spargo, Zemin Ning
10:35 AM – 12:00 PM       Poster Session

                          Snowmass Conference Center         Sinclair Room
                          (across street from Silvertree)
12:00 PM – 4:00 PM        Break

4:00 PM – 4:10 PM         Oral Presentation 44

Modelling Gene Expression in Tumor Progression Using Binary States
Presenter: Juan Emmanuel Martinez-Ledesma, ITESM Campus Monterrey
Authors: Juan Emmanuel Martinez-Ledesma, Victor Trevino
4:10 PM – 4:20 PM         Oral Presentation 45

Finding Community Leaders in Social Networks
Presenter and Author: Xiaowei Xu, University of Arkansas at Little Rock

4:20 PM – 4:30 PM         Oral Presentation 46

Adaptive Learning Neural Networks for Binding Site Search in Genomic
Sequences
Presenter: Ivan Erill, University of Maryland Baltimore
Authors: Joseph Cornish, Sumeet Bagde, Elisabeth Hobbs, Ivan Erill
4:30 PM – 4:40 PM         Oral Presentation 47

Enriching Regulatory Networks with Other Functional Relationships
Presenter: Ronald Taylor, Pacific Northwest National Laboratory
Authors: Ronald Taylor, Antonio Sanfilippo, Jason McDermott, Bob Baddeley, Rick
Riensche, Russ Jenson, Marc Verhagen
4:40 PM – 4:50 PM         Oral Presentation 48

Genos: Segment-Based Representation of Genomics Data. Application to
Genotyping Data Management
Presenter: Hugues Sicotte, Mayo Clinic
Authors: Jean-Pierre Kocher, Hugues Sicotte, Yaxiong Lin, Eric Klee
4:50 PM – 5:00 PM         Oral Presentation 49

Bioinformatics Study of the Ferredoxin-Dependent Bilin Reductase Family
Presenter: Chanel Mejias-Rosario, Universidad Metropolitana
Authors: Chanel Mejias-Rosario, Hugh Nicholas, Alexander Ropelewski, Ricardo
Gonzalez-Mendez, Luis Vazquez-Quinones
5:00 PM – 5:45 PM         Keynote 6

Thinking, Measuring, Calculating, and Diagnosing: From Platform to New
Ideas about Biology
Larry Gold, MD, PhD, CEO and Chairman of the Board, SomaLogic, Inc.,
Professor, University of Colorado, Boulder
5:45 PM                   Rocky ‘10 Closing Comments

KEYNOTE SPEAKERS

Note: Keynote speaker CVs can be found on the Rocky website:
www.iscb.org/rocky10-program/keynote-speakers

THOMAS C. BISHOP, PhD

Research Associate Professor, Center for Computational Science,
Tulane University

Interactive Chromatin Modeling: Towards a Computational
Karyotype

Abstract: Given a set of nucleosome positions, either from experiment or
theory, it is possible to construct in near real time and interactively display 3D
models of entire chromosomes at base pair resolution. These models are a first
order approximation that assumes each nucleosome is a canonical octasome
and that the linker DNA assumes a sequence specific conformation similar to
free DNA. Our model allows thermal fluctuations in the nucleosome wrapping
(i.e. entry/exit angle) and linker conformation to be introduced. Our current
model is likely not an accurate representation of chromatin, but it provides
critical insights, such as: properly scaled distance metrics, an indication of
how intrinsic bends or other deformations in linker DNA may alter chromatin
structure, identification of potential nucleation sites for the folding of DNA into
chromatin, the effects of thermal fluctuations, and identification of nucleosome
positions that are sterically excluded. Based on available nucleosome positioning
data, a computational karyotype of the sixteen chromosomes in the yeast
genome is presented. A web-based version of these tools capable of folding and
displaying kilobase segments of DNA into chromatin in real time is available at
http://dna.ccs.tulane.edu/icm.
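
As a rough illustration only, and not the speaker's method, the starting point of such a model (a list of nucleosome positions) can be turned into linker lengths with a few lines of Python. The sketch assumes a fixed 147 bp footprint per nucleosome and treats overlapping footprints along the sequence as a simple one-dimensional proxy for positions that cannot coexist; the steric exclusion described in the abstract is determined in 3D and is more restrictive. The function name and the example positions are hypothetical.

```python
NUCLEOSOME_BP = 147  # assumed footprint of one nucleosome core, in base pairs


def linker_lengths(starts, footprint=NUCLEOSOME_BP):
    """Return (linker lengths, overlapping pairs) for nucleosome start positions.

    A negative linker length means two footprints overlap along the
    sequence, which is only a 1D stand-in for a steric clash.
    """
    ordered = sorted(starts)
    linkers, clashes = [], []
    for left, right in zip(ordered, ordered[1:]):
        gap = right - (left + footprint)
        linkers.append(gap)
        if gap < 0:
            clashes.append((left, right))
    return linkers, clashes


# Hypothetical start coordinates (bp) on one chromosome.
linkers, clashes = linker_lengths([1000, 1165, 1250, 1500])
```

Here the second and third positions overlap, so that pair is flagged while the remaining linkers come out as positive lengths.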

SOL EFRONI, PhD

Assistant Professor, Bar Ilan University, Ramat Gan, Israel

Network as Biomarker for Ovarian Cancer Prognosis

Abstract: Ovarian cancer causes more deaths than any other
gynecologic cancer. The research community continues to
invest extensive efforts in identifying disease etiology, with a current multicenter
effort in the form of The Cancer Genome Atlas.

We have used the molecular characteristics (genomic and epigenomic
information from more than 200 women), combined with clinical features,
to identify molecular networks most affiliated with prognosis. By quantifying
network modifications, we measure the complex, combined, co-dependent
behavior of network genes in a manner that is both extremely significant in
its affiliation with phenotype and highly robust: robust enough to
significantly stratify prognosis in other, independent datasets.


We show how gene components of the network themselves do not serve as
efficient prognostic biomarkers. Only a combined, co-dependent behavior may
serve as a biomarker. By affiliating process descriptions (signaling pathways)
with specific phenotypes, we expose these processes to further study and specific
intervention.

LARRY GOLD, MD, PhD

CEO and Chairman of the Board, SomaLogic, Inc.
Professor, University of Colorado, Boulder

Thinking, Measuring, Calculating, and Diagnosing: From
Platform to New Ideas about Biology

Abstract: Biology makes no sense to a person of logic,
the very people filling the seats at this conference. Biology evolved over a
very long time, under conditions that changed from time to time, making
reverse engineering difficult. Mathematics is logical, even to me, and thus
mathematicians can analyze and calculate beautifully, using awful data as their
input (a common problem). It is easier to reverse engineer car bumpers than
human biology.

We thought for a long time about what inputs should be the basis for deep
thinking. Unlike everyone else (or nearly everyone), we decided almost 15 years
ago that phenotype was better than genotype, in spite of the power of DNA
sequencing technology. During those 15 years, while our colleagues decreed
personalized medicine to be the study of genotypes, we worked to find an
unbiased measurement of molecular phenotype. The platform we developed
was broad proteomics, on a scale not possible through any other methodology.

For the scientists at SomaLogic the thrill has been overcoming difficult scientific
challenges. For people at this conference the thrill will be to access data that
may be logical, or at least coherent. I will provide examples of proteomic
similarities among people (and tumors) with broad genotypic differences. The
vast dimensionality of genomics can be reduced through proteomics, resulting in
(perhaps) insights about human biology.

SHAWN GOMEZ, PhD

Assistant Professor, University of NC-Chapel Hill

New Approaches for Comparing Biological Trees and
Networks

Abstract: Representation of biological relationships in
the form of trees or networks is a core aspect of numerous
biological analyses. Continued progress in this area requires
improvements in how we can analyze, compare and compute information on
these graphs.



We have recently developed a set of related computational methods for the
representation and comparison of these trees/networks. The approaches
are based on the alignment of representative high-dimensional embeddings
or structures that provide the ability to measure global similarity, as well as
differences, between graphs. Most recently, we have developed a generalized
spectral algorithm for the comparison of weighted graphs. Unlike other
methods, this approach takes into account edge information that is often
available and may be of significant importance in improving prediction accuracy.
We have applied these approaches to problems in phylogenetic tree comparison
including the detection of horizontal gene transfer events and the identification
of protein interaction specificity in coevolving multigene families. Together, these
approaches provide a useful set of tools for future application in the analysis of
these increasingly common high-dimensional data sets. Finally, such methods
have broader applications in computer vision, image analysis and computational
chemistry.

CARSTEN GÖRG, PhD

Instructor in the Computational Bioscience Program, University
of Colorado School of Medicine

Supporting Research and Analysis across the Biomedical
Literature Using Visual Analytics

Abstract: Visual analytics is an emerging academic
discipline. It has been defined as “the science of analytical
reasoning facilitated by interactive visualizations” and aims
at supporting people to analyze and understand data through the help of
computer visualizations, and ultimately make decisions based on that analysis
and understanding. Visual analytics includes three main components: (1)
computational techniques and algorithms for data manipulation, transformation,
and analysis; (2) interactive visualizations and user interfaces to present the
data; and (3) an analytical reasoning component for understanding how people
think, reason, and come to conclusions, in order to design software that best
leverages those abilities.

We have developed a visual analytics system, named Bio-Jigsaw, to support
biologists in the challenging task of finding relevant publications in the large and
rapidly growing body of biomedical literature. Search queries on PubMed often
return thousands of publications, and it can be tedious to filter out irrelevant
publications and choose a manageable set to read. Bio-Jigsaw acts like a
visual index on a document collection and supports biologists in investigating
and understanding connections between biological entities. We apply natural
language processing techniques to identify biological entities such as genes and
pathways and visualize connections among them via multiple representations.
Connections are based on co-occurrence in abstracts and are also drawn
from ontologies or annotations in digital libraries. Bio-Jigsaw’s interactive
visual representations help biologists more rapidly explore and understand
connections between biological entities and find relevant publications to read.
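
The co-occurrence idea mentioned above can be illustrated with a minimal sketch. This is not Bio-Jigsaw's code: the function name and the entity lists are hypothetical, and it assumes an upstream entity-recognition step has already produced one list of entity names per abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstract_entities):
    """Count how many abstracts mention each unordered pair of entities."""
    counts = Counter()
    for entities in abstract_entities:
        # Deduplicate and sort so each pair has a single canonical order.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical entity lists, one per abstract (illustrative only, not real data).
abstracts = [
    ["BRCA1", "TP53", "DNA repair"],
    ["BRCA1", "DNA repair"],
    ["TP53", "apoptosis"],
]
pair_counts = cooccurrence_counts(abstracts)

# Pairs seen in more abstracts would be drawn as stronger connections.
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Counting per abstract rather than per mention keeps a single long abstract from dominating the connection strengths.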

KIRK E. JORDAN, PhD

Emerging Solutions Executive & Assoc. Prog. Director, Computational
Science Center, IBM T.J. Watson Research

DATA AND ANALYTICS TRANSFORMING HEALTHCARE FOR SMARTER
HEALTHCARE & LIFE SCIENCE

ABSTRACT: In the life sciences and healthcare area there is a wealth of data.
Yet we need to make sense of this deluge of data. This is being done through
what is currently termed analytics. In this talk, I will describe some of the areas
in which IBM is working to transform healthcare and life sciences through the use
of analytics and information technology, including high performance computing
(HPC) systems infrastructure. While I will describe some projects underway
in these areas, I will also argue that to take full advantage of HPC to
accelerate the healthcare and life science transformation, we need to make
access easier. I will describe work being done to make HPC accessible to a wide
audience, eventually targeting the healthcare and life science practitioner
directly.

ORAL PRESENTATIONS

OP1: FUN AND GAMES WITH RDF — MOVING RAT DATA ONTO
THE SEMANTIC WEB

Presenter: Simon N Twigger, Medical College of Wisconsin
Authors: Simon Twigger, Joey Geiger, Jennifer Smith

ABSTRACT: We have been using the National Center for Biomedical
Ontology’s Annotator tool to annotate the text resources available for rat
expression datasets housed in the NCBI’s Gene Expression Omnibus database.
This has provided us with a large number of anatomical and rat strain
annotations for genes that we can combine with our existing Gene Ontology,
pathway, disease and phenotype annotations created by the Rat Genome
Database (RGD). We are now utilizing RDF, OWL and related technologies to
bring this data to bear on candidate gene discovery. As part of this process
we have been getting up to speed with RDF, exploring how to create additional
ontologies to classify RGD data, and developing a hybrid relational database/
triple store application using Ruby on Rails and AllegroGraph. We are now
wrestling with how best to provide this RDF in a way that makes it maximally
useful to us and to others. I will describe our progress to date and some
observations on the pros and cons of the use of RDF in this context.

OP2: FULL-TEXT BIOMEDICAL LITERATURE PROCESSING: MORE
THAN A SCALING CHALLENGE

Presenter: Christophe Roeder, University of Colorado, Denver
Authors: Christophe Roeder, Tom Christiansen, Helen Johnson, Karin Verspoor,
Gully Burns, Lawrence Hunter

ABSTRACT: Recent efforts in biomedical natural language processing (NLP)
are moving beyond abstracts and shared tasks to full-text and knowledge
generation. Such tasks involve more than desktop computing and require
parallelization of processing across more elaborate hardware to complete. We
describe our use of Oracle Grid Engine (OGE, Sun Grid Engine) to distribute
UIMA processes over the lab’s cluster of six 8-core machines. Thankfully,
processing each document of a large collection is an independent effort,
minimizing concurrency issues, and adapting our UIMA framework to OGE
required minimal effort. It allows us to distribute our systems in a form nearly
identical to what runs on desktops. In pursuing such scaling tasks for the
biomedical literature, we are finding further challenges. Obtaining a large
collection of full-text documents and preparing them for processing is not trivial.
Current projects in the lab involve thousands or tens of thousands of full-text
documents. These documents are identified in a retrieval step using a PubMed
search, and the full text is then accessed using a combination of sources
including PubMed Central Open Access, licensed publisher collections, and
some manual retrieval. Since those documents are in any of HTML, PDF, or XML,
the documents must be converted into plain text for the NLP tools. This process
is complicated by the use of Unicode characters that aren’t handled well by the
software and are used often enough to adversely affect results. We explore the
issues we have encountered, as well as solutions we have implemented.
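
The abstract does not say how the Unicode problem was solved. As one possible illustration, the sketch below folds text to plain ASCII using Python's standard unicodedata module. The function name, the small Greek-letter table, and the "?" placeholder are all assumptions made for this example, not the authors' solution.

```python
import unicodedata

# Hypothetical, deliberately tiny mapping; a real pipeline would need a fuller table.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma"}


def to_plain_ascii(text, placeholder="?"):
    """Fold Unicode text to ASCII for tools that only handle plain text."""
    # NFKD splits accented letters into base letter + combining mark
    # and expands compatibility characters such as the "ﬁ" ligature.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        if ch in GREEK_NAMES:
            out.append(GREEK_NAMES[ch])  # spell out letters that carry meaning
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append(placeholder)  # anything unmapped is visibly flagged
    return "".join(out)


print(to_plain_ascii("Görg studies β-cells and ﬁnds naïve cases"))
```

Spelling out Greek letters instead of dropping them matters for biomedical text, where a single symbol often distinguishes one entity from another.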

OP3: PHIRAST: A PIPELINE FOR RAPID ANNOTATION OF PHAGE
GENOMES USING SUBSYSTEMS TECHNOLOGY

Presenter: Ramy K Aziz, San Diego State University
Authors: Ramy Aziz, Robert Olson, Ross Overbeek, Gordon Pusch, the PhAnToMe
team, Robert Edwards

ABSTRACT: Phages are the most abundant nucleic acid-based biological
entities; however, their diversity is not properly sampled in sequence databases.
Although 20% of the biosphere’s nucleotides are estimated to be viral in origin,
less than 0.1% of nucleotides deposited in GenBank are from phages, and
publicly available phage genomes are poorly annotated. Advances in
high-throughput sequencing techniques have led to a surge in the number of
sequenced phage genomes and metagenomes, yet there is no automated pipeline
for de novo phage genome annotation. To address this need, we developed the Phage
Rapid Annotations using Subsystems Technology (PhiRAST) pipeline as a part
of the Phage Annotation Tools and Methods (PhAnToMe) project
(http://www.phantome.org) using the SEED’s subsystems technology for genome annotation.
This technology, which combines human expertise and computational tools,
was used to annotate and re-annotate all of the proteins in complete phage
genomes. The major differences between phage and bacterial genome
annotation are that the former lack a set of universal genes that can be used to
train ORF callers and often include overlapping open reading frames. Currently,
PhiRAST accepts FASTA or GenBank files, and performs ORF calling, annotation,
and subsystem reconstruction for an average phage genome (~100 kbp) in two
to three hours. Future development includes predicting a phage’s phylogenetic
neighbors and its lifestyle, and providing a primary reconstruction of phage
modules to understand its biology.

* Other members: Mya Breitbart, Bhakti Dwivedi, Julio Cesar Ignacio Espinosa,
Jeff Elhai, Bradley Hull, Matthew Knatz, JP Massar, Katelyn McNair, and
Matthew Sullivan

OP4: POLBASE: A REPOSITORY OF BIOCHEMICAL, GENETIC,
AND STRUCTURAL INFORMATION ABOUT DNA POLYMERASES

Presenter: Bradley W Langhorst, New England Biolabs
Authors: Bradley Langhorst, Nicole Nichols

ABSTRACT: Polbase is a freely accessible database of DNA polymerases
and references. It has been developed in a collaborative model with experts
whose contributions reflect their varied backgrounds in genetics, structural
biology, and biochemistry. Polbase is designed to compile detailed results of
polymerase experimentation, presenting them in a dynamic view to inform
further research. After validation, results from specific references are displayed
in context with relevant experimental details and are always traceable to their
source publication. Polbase is interconnected with other resources, including
PubMed, UniProt, and the RCSB Protein Data Bank, to provide multi-faceted
views of polymerase knowledge. In addition to a simple web interface, Polbase
data is exposed for custom analysis by external software. Our goal is to produce
a collaborative, dynamic, comprehensive research tool covering all important
aspects of polymerases, from sequence and structure to biochemistry. Polbase is
an open resource available at http://polbase.neb.com.

OP5: ORO MINER, AN APPLICATION FOR INVESTIGATING CELL
COMMUNICATION IN MULTICELLULAR ORGANISMS

Presenter: Michael D Rogers, University of Nevada Las Vegas
Authors: Prashant Singh, Michael Rogers, Patrick Gradie, Rinu Thomas,
Dharmistha Kaul, Brandon Roe, Shruti Patel, Briana Sugihara, Narineh Abadian,
Michael Gryk, Martin Schiller

ABSTRACT: Communication within an organism involves the utilization
of widely varied processes across multiple levels of granularity. Although we
can identify and classify relationships between an organism’s organ systems,
organs, cell types, etc. (components), they are often studied outside the
context of the system. The limitation is present because there is no model that
describes the luminal, spatial, and temporal relationships between the various
components. ORO (Organism Relational Ontology) Miner is designed to solve
this problem, identifying components that are capable of communicating with
each other either through direct contact or by diffusion of a ligand through a
contiguous lumen. The primary function of ORO Miner is to allow a user to
view relationships within the organism using a sophisticated visual graph
representation. The layout of relationships between different components in the
graph represents luminal and spatial properties, which are readily visible. The
secondary function of ORO Miner is to provide a query-based search mechanism
to the user. Basic questions can be presented to the application, which returns
appropriate answers. As an example, the user may issue a query regarding which
cell types are in direct contact with ?-cells located in the mammalian pancreas,
stomach, and small intestine. The application would return 38 cell types in direct
contact, as well as 3751 cell types that are capable of communication through
contiguous lumen and perivascular spaces. The database contains >180,000
entries for human and mouse, curated from textbooks, electronic databases, and
the primary literature.




OP6: EARLY DETECTION AND DYNAMICS OF RARE VIRAL
VARIANTS BY ULTRADEEP SEQUENCING

Presenter: Peter T Hraber, Los Alamos National Lab
Authors: Peter Hraber, Will Fischer, Elena Giorgi, Thomas Leitner, Tanmoy
Bhattacharya, Bette Korber

ABSTRACT: The ability to detect rare viral variants as they accumulate under
cytotoxic-T-lymphocyte selection illuminates the evolution of viral escape from
the immune system. While Sanger sequencing via single-genome amplification
(SGA) yields highly accurate sequences, detecting rare variants requires intensive
sampling. In contrast, next-generation sequencing technologies yield 3-4 orders
of magnitude more sequences, and provide sensitivity to detect rare variants
undetected by SGA sequencing. Previous sequencing results informed design
of ultradeep pyrosequencing strategies for two longitudinal studies: (1) the
SIV p199RY Nef epitope in experimentally infected macaques and (2)
whole-genome HIV-1 subtype B from the CHAVI 001 study participant designated as
subject 700010040 (CH40). The SIV study follows p199RY epitope evolution in
5 Mamu A*01+/A*02+ rhesus macaques intravenously infected with 60,000
copies per ml of SIVmac251 inoculum, to quantify frequencies of epitope
variants at 21, 35, and 84 days post-infection. The HIV-1 study represents the
viral genome with 35 overlapping regions that average 500 nt (median 501,
range 448-556 nt), from 5 longitudinal samples spanning acute to chronic
infection. With Sanger sequencing results guiding amplicon design, ultradeep
sequencing yielded 24,870-110,200 (median 48,719) SIV reads and 3,762-27,222
(median 13,801) HIV reads per amplicon region sampled. Ultradeep
sequencing identified early waves of escape variants that had previously been
undetected by limited conventional sequencing, and helps elucidate when and
how selection influences viral evolution.
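
To make the link between sampling depth and rare-variant detection concrete, here is a minimal sketch. It assumes reads are independent draws from the viral population and ignores sequencing error, which real pyrosequencing analyses must account for. The function names and the 0.1% example frequency are illustrative and do not come from the study.

```python
import math


def detection_probability(variant_frequency, reads):
    """Chance that at least one of `reads` independent reads carries the variant."""
    return 1.0 - (1.0 - variant_frequency) ** reads


def reads_needed(variant_frequency, confidence=0.95):
    """Smallest read count giving at least `confidence` chance of one variant read."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - variant_frequency))


# Illustrative frequency of 0.1%; 3,762 is the smallest per-amplicon HIV read
# count quoted in the abstract, 50 stands in for a small conventional sample.
freq = 0.001
print(detection_probability(freq, 50))
print(detection_probability(freq, 3762))
print(reads_needed(freq))
```

Under these assumptions, a few dozen sequences will usually miss a variant at this frequency, while several thousand reads will usually contain at least one copy.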

OP7: A PROCESS FOR UNIFYING SOURCES OF BIOMEDICAL
INFORMATION IN AN RDF-BASED KNOWLEDGE BASE

Presenter: Kevin M Livingston, University of Colorado Denver
Authors: Kevin Livingston, Michael Bada, Lawrence Hunter

ABSTRACT: Biomedical information is being produced at unprecedented rates,
curated by various entities, and stored in various formats, typically optimized
for one or a specific set of tasks. This is often a major stumbling block for
researchers who would like to use this information for a new task, or integrate
this information with other sources. Simultaneously, biomedical ontologies are
being produced to classify and organize the structure of biomedical concepts,
most notably the OBOs (Open Biomedical Ontologies). These ontologies can
be used as guides for how to integrate and organize the information contained
in the numerous independently curated projects. This work presents a method
and organization for unifying independently curated information using the
abstractions provided by the OBOs. We are building a knowledge base of biology
that allows multiple resources to be combined in a coherent fashion so that they
can be queried and reasoned over uniformly. Unifying biomedical knowledge
in this way means applications and users can interact with the information
from several sources without needing to know the exact structure of the
individual sources, by using the abstractions provided by the OBOs. The ability
to leverage these abstractions does not eliminate the provenance of the data,
which is preserved throughout the system. This knowledge base is being used
as the foundation for several projects, including statistical and pattern-based
NLP, visualization tools, and other large-scale analytical tools. The common
underlying knowledge base, structured in RDF, also functions as an integration
point for these different tools and methods.

OP8: TESTING FOR JOINT ASSOCIATION OF ALL SNP PAIRS

Presenter: Ronald P Schuyler, University of Colorado
Authors: Ronald Schuyler, Lawrence Hunter

ABSTRACT: In the five years since the publication of the first genome-wide
association study, nearly 700 publications have described more than 3000
single nucleotide polymorphisms associated with susceptibility to a disease
or trait. In some cases these associations have confirmed existing hypotheses,
while in others the implicated genes have provided surprising insights into
disease mechanisms. Despite this success, GWAS results do not account for
all of the expected heritability of most conditions studied. In complex traits
(those which do not exhibit simple Mendelian, i.e. single-gene, inheritance
patterns) it may be useful to look for interactions between pairs of loci. Using
the standard likelihood-ratio test with logistic regression models to test for
joint effects requires iterative methods for determining maximum likelihood
estimates, making tests of the 135 billion SNP pairs on a typical 520k chip
computationally intractable. By using a method based on log-linear models,
MLEs may be obtained in closed form, making it possible to test all pairs for
joint association in under two days on a 100 node computing cluster. Despite
the huge multiple testing burden, we detected significant hits where single-locus
analysis did not. These pairs can be added to the list of significant single-locus
associations to provide a more complete picture of the genetics and molecular
mechanisms contributing to complex diseases.
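
The figure of 135 billion pairs follows from counting unordered pairs, as the short check below shows. It assumes "520k" means exactly 520,000 SNPs. The throughput figure uses the upper bound of two full days on 100 nodes, so it is a lower bound on the rate implied by the abstract, not a reported benchmark.

```python
import math

N_SNPS = 520_000          # assumed exact size of a "520k" chip
NODES = 100               # cluster size stated in the abstract
SECONDS = 2 * 24 * 60 * 60  # two days, the stated upper bound on run time

# Number of unordered SNP pairs: n * (n - 1) / 2
n_pairs = math.comb(N_SNPS, 2)

# Minimum sustained rate each node needs to finish within two days.
tests_per_node_per_second = n_pairs / (NODES * SECONDS)

print(f"{n_pairs:,} pairs")
print(f"{tests_per_node_per_second:,.0f} tests per node per second")
```

A rate of several thousand tests per second per node is plausible only when each test avoids iterative fitting, which is the motivation the abstract gives for closed-form estimates.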

OP9: COGNITIVE TASK FLOWS AND VISUAL ANALYTICS

Presenter: Barbara Mirel, University of Michigan
Authors: Barbara Mirel, Felix Eichinger

ABSTRACT: User models of sensemaking are important for guiding the
development of interactive visualizations toward an effective fit with analysts’
domain-driven practices for discovery-based explorations. However, few models
today represent sensemaking as it relates to common domain-specific analysis
problems, associated sub-goals (i.e. “task chunks”), and mixes of cognitive
tasks performed for each sub-goal/chunk. To address this need, we conducted
a case-based, cognitive task analysis of a common biomedical workflow,
namely visually exploring molecular interaction networks to uncover possible
relationships relevant to a disease. Our findings reveal a biomedical analyst’s
ways of thinking and knowing and show that many tasks central to success
in this workflow, especially metacognitive tasks, often are inadequately
addressed by visualization tools. Our talk will briefly describe our cognitive task
analysis and then present the workflow as a set of cognitive task flow diagrams
structured by an analyst’s task chunks and analytical objectives. The diagrams
highlight that four multi-faceted reasoning modes interweave for almost
every task chunk: classification, comparison, validation, and metacognition
(i.e. wayfinding, managing and monitoring progress). Notably, metacognition
for fluency in analysis/knowledge-building is prominent in every task chunk,
e.g. constituting 55% of the cognitive tasks enacted in one simple flow,
grouping relationships by similarity. However, despite wayfinding/awareness
and metacognition being active areas of visual analytic research, these
analysis processes are often under-supported by mainstream bioinformatics
visualizations. As implications for development, our findings suggest greater
usefulness could come from incorporating provenance tracking and querying
into visualizations for insight wayfinding and metacognition.

OP10: CATEGORYCOMPARE: HIGH-THROUGHPUT DATA META-ANALYSIS
USING GENE ANNOTATIONS

Presenter: Robert M Flight, University of Louisville
Authors: Robert Flight, Jeffrey Petruska, Benjamin Harrison, Eric Rouchka

ABSTRACT: Motivation: Many current DNA microarray and other
high-throughput data meta-analysis studies concentrate on deriving a concordant
list of genes across many experiments to discover the “true” genes responsible
for a particular disease process or biological pathway or cellular response.
However, by concentrating on the genes in common, similarities or differences
that exist at a pathway or process level are ignored. Results: We describe a
meta-analysis approach that allows comparison and contrast of gene lists at the
level of categorical annotation (pathway or Gene Ontology annotations). This
categorical evaluation compares enriched annotations between gene lists, and
displays the results graphically to allow intuitive visualization and exploration
of the similarities and differences. A false discovery correction is implemented
to control for the effect of different sized gene lists as inputs. Conclusion: The
approach was tested using two gene lists, genes involved in the response to
denervation in muscle (a literature compendium), and in skin (experimentally
determined). Using the categorical comparison highlights known biological
processes that are common in the two cases, while also allowing one to easily
see areas of difference that are not apparent from examining the gene lists
alone. Availability: The categoryCompare software is available as a Bioconductor
package, and a web interface (using RApache) has also been developed to
facilitate use in the wider research community.

OP11: PAIRWISE AGONIST SCANNING PREDICTS CELLULAR
SIGNALING RESPONSES TO COMBINATORIAL STIMULI

Presenter: Scott L. Diamond, University of Pennsylvania
Authors: Scott Diamond, Manash Chatterjee

ABSTRACT: Patient-specific prediction of cellular response to multiple
stimuli is central to evaluating clinical risk, disease progression, or response
to therapy. To understand how human platelets integrate diverse signals
encountered during heart attack or stroke, a high throughput assay measured
intracellular calcium responses of EDTA-treated platelet-rich plasma to all
pairwise combinations of 6 major agonists (ADP, convulxin, U46619, SFLLRN,
AYPGKF, and PGE2). This allowed analysis of signalling through P2Y1, P2Y12,
GPVI, TP, PAR1, PAR4, EP and IP receptors. The calcium responses to single
agonists at 0.1, 1, 10 x EC50 and 135 pairwise combinations trained a neural
network (NN) model to predict the entire 6-dimensional platelet response
space. The NN model successfully predicted responses to sequential additions
and 54 ternary combinations of [ADP], [convulxin], and [SFLLRN] (r=0.85).
With 4077 NN simulations spanning the 6-dimensional space, 45 combinations
of 4-6 agonists (ranging from synergism to antagonism) were selected and
confirmed experimentally (r=0.88), revealing a highly synergistic condition of
high U46619/PGE2 ratio, consistent with the risk of COX-2 therapy. Furthermore,
pairwise agonist scanning (PAS) provided a direct measurement of 135 synergy
values, thus allowing a unique phenotypic scoring of 10 human donors.
Patient-specific training of NNs represents a compact and robust approach for prediction
of cellular integration of multiple signals in a complex disease milieu.
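
One reading that reproduces the stated 135 pairwise combinations is every unordered pair of the 6 agonists, with each agonist at each of the 3 dose levels. The sketch below enumerates that design. The abstract does not spell the design out in this form, so treat the enumeration as an assumption that happens to match the reported count.

```python
from itertools import combinations, product

agonists = ["ADP", "convulxin", "U46619", "SFLLRN", "AYPGKF", "PGE2"]
doses = [0.1, 1, 10]  # multiples of each agonist's EC50

# Single-agonist conditions: every agonist at every dose level.
single_conditions = list(product(agonists, doses))

# Pairwise conditions: every unordered agonist pair at every dose combination.
pairwise_conditions = [
    ((a, dose_a), (b, dose_b))
    for a, b in combinations(agonists, 2)
    for dose_a, dose_b in product(doses, repeat=2)
]

print(len(single_conditions))    # 6 agonists x 3 doses
print(len(pairwise_conditions))  # 15 pairs x 9 dose combinations
```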

OP12: A BAYESIAN NETWORK FRAMEWORK FOR STATISTICAL
ASSESSMENT OF THE INTENT TO STABILIZE YERSINIA PESTIS

Presenter: Bobbie-Jo M Webb-Robertson, Pacific Northwest National Laboratory
Authors: Bobbie-Jo Webb-Robertson, Lee Ann McCue, Craig McKinstry, Brian
Clowers, Heather Colburn, Christina Sorensen, David Wunschel, Karen Wahl

ABSTRACT: In a biothreat event, the genetic identity of a
microorganism is not adequate to track the origin and intent of an exposure.
However, information on the materials that were used to culture the organism
can give valuable knowledge for attribution purposes. A key forensic question
of interest is whether there was malicious intent; for example, identification of
components to stabilize the sample would imply intent to store and possibly
disperse the threat. Since many aspects of the microorganism can vary based
on culture medium, such as protein expression or medium-specific metabolic
products, no single analytical technique can by itself determine the growth
environment of a microorganism. We present a Bayesian network framework
for the identification of growth media and stabilization combinations
associated with Yersinia pestis. The Bayesian network integrates disparate
analytical measurements that capture various aspects of the organism’s growth
environment, such as the protein associated with the growth media and
the carbohydrates associated with stabilization. Protein components of the
growth media are identified using tandem mass spectrometry (MS/MS) and
carbohydrate stabilizers are identified from a combination of mass spectral
techniques, including matrix-assisted laser desorption/ionization (MALDI) and
gas chromatography/mass spectrometry (GC/MS). We demonstrate that the network
can be used to (1) assign probabilistic measures to defined culture-stabilizer recipes, (2)
characterize individual components through intermediate nodes when the
growth conditions are not in the defined library and (3) visually display the
results to the user. Supported by the Department of Homeland Security.

OP13: IDENTIFYING GENES IN THE DROSOPHILA HH PATHWAY
BY INTEGRATING TF BINDING AND GENE EXPRESSION DATA

Presenter: Daniel Dvorkin, University of Colorado Denver
Authors: Daniel Dvorkin, Brian Biehs, Katerina Kechris

ABSTRACT: The Hedgehog (Hh) signaling pathway is critical in Drosophila
embryonic development. Transcription factor (TF) activity by Cubitus interruptus
(Ci) is necessary and sufficient for upregulation of genes in the pathway. We
present here a graphical mixture model to integrate Ci binding data with
multivariate gene expression data for wild-type vs. mutant embryos which are
null for Hh, Ci, and two other signaling pathway components. We show that
genes in the pathway can be better identified with an integrative model than
with either data source alone.

OP14: CANCER GENE EXPRESSION DATA IS NOT NORMALLY
DISTRIBUTED: ANALYSIS OF DATA DISTRIBUTIONS AND
THEIR EFFECTS ON GENE SELECTION AND MOLECULAR
CLASSIFICATION

Presenter: Nicholas F Marko, Cleveland Clinic Department of Neurosurgery
Authors: Nicholas Marko, Robert Weil

                     aBstR ac t: introduction: The distribution of gene expression in cancer
                     transcriptomes is generally assumed to conform to a normal distribution, and
                     many algorithms for molecular classification and gene selection are predicated
                     upon this assumption. This assumption may not be valid and may contribute to
                     inconsistencies and inaccuracies in translational molecular oncology research.
                     methods: we analyzed the 2nd-4th central moments of gene expression data
                     distributions from each of five publicly-available cancer microarray datasets
                     and compared them to those of the normal distribution. we then used curve
                     fitting to identify which of 53 theoretical distributions best approximated the
                     actual distribution of each expression data set. Finally, we compared a Box-

                     24                                                                            roCky ‘10
                                      o R a L p R E s E n tat i o n s

Cox-normalized, sixth dataset to its untransformed counterpart to investigate
the potential effects of non-normal distributions on gene selection and
molecular classification. results: The 2nd-4th central moments of all datasets
demonstrated statistically-significant differences from those of the normal
distribution. Curve fitting suggested that modeling cancer gene expression
distributions requires multi-parameter, generalized distributions, including
the beta, gamma, and weibull. application of several, common molecular
classification algorithms before and after Box-Cox normalization yielded different
results, and expression profiles distinguishing identical subgroups of this data
differed by an average of 15% before and after transformation. Conclusions: The
distribution of cancer gene expression data is not normal and is best modeled
using multi-parameter, generalized distributions. This deviation affects the results
of many standard algorithms for gene selection and molecular classification.
Algorithms that do not assume normality may be necessary for accurate
genomic analysis of cancer.
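
The moment comparison and Box-Cox step described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the fixed lambda, and the sample values are assumptions made for the example.

```python
import math

def central_moments(values):
    """Return variance, skewness, and excess kurtosis of a sample.

    Skewness and excess kurtosis are the standardized 3rd and 4th central
    moments; both are 0 for a normal distribution.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return m2, skewness, excess_kurtosis

def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values and a fixed lambda."""
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1.0) / lam for x in values]

# Hypothetical right-skewed expression intensities (illustrative only).
expression = [1.0, 2.0, 2.0, 3.0, 4.0, 8.0, 16.0, 64.0]
_, skew_raw, _ = central_moments(expression)
_, skew_log, _ = central_moments(box_cox(expression, 0))
print(skew_raw, skew_log)
```

In practice lambda would be estimated from the data rather than fixed; here lambda = 0 (the log transform) is chosen only to show that the transform reduces the skewness of the sample.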

OP15: Exploring Machine Learning Classifiers for the Prediction of AGO Bound
Transcripts with miRNA Seed Matches

Presenter: Abel Licon, Thermo Fisher Scientific
Authors: Abel Licon, Kevin Sullivan, Amanda Birmingham

ABSTRACT: Elucidating interactions of miRNAs with their target mRNA
transcripts provides a more nuanced picture of gene modulation mechanisms.
It is often difficult and expensive to observe these interactions in the laboratory,
so in silico methods can be a fast and inexpensive alternative. Here we explore
several machine learning classifiers for predicting whether or not a transcript
with a miRNA seed match will be bound to an AGO/miRNA complex. Using
training data from a published experiment employing PAR-CLIP, a laboratory
technique for isolating protein-bound transcripts, we apply machine learning
classifiers on several sequence-dependent features to classify miRNA seed
matches that have bound AGO/miRNA complexes versus those that do not. We
describe the techniques for training data extraction from PAR-CLIP sequence
data, explain methods for using this data to train several classifiers in Weka,
including support vector machines and linear regression, and report the
performance of each classifier.

OP16: SpliceGrapher: Predicting Splice Graphs from Diverse Evidence

Presenter: Mark F. Rogers, Colorado State University
Authors: Mark Rogers, Asa Ben-Hur, Anireddy Reddy

ABSTRACT: Deep transcriptome sequencing (RNA-seq) with next-generation
sequencing technologies is providing unprecedented opportunities to
researchers for probing the transcriptomes of many species. An important
goal of these studies is to assess the extent of alternative splicing, a process
                     that increases transcriptome diversity and plays a key role in regulating gene
                     expression and protein function. Although it is inexpensive and easy to obtain
                     whole transcriptome data using RNA-seq, a major limitation is the lack of robust
                     methods to analyze these data. Consequently there is an increasing demand
                     for methods that can use the short reads produced in these studies to predict
                     alternative splicing patterns. There are significant challenges in using short read
                     data to predict alternative splicing, but as yet there are only a few methods
                     that address them. Whereas existing tools like TAU and Cufflinks predict splice
                     variants, our approach is to predict splice graphs that capture in a single
                     structure all the possible ways in which exons can be assembled, allowing us
                     to address ambiguities that inevitably arise when using short reads to predict
                     explicit splice forms. Furthermore, our method can integrate short read data with
                     existing genome annotations and available EST data, and provide visualization of
                     splice graphs along with the evidence used to construct them. We compare our
                     framework with TAU and Cufflinks on RNA-seq data from Arabidopsis and find
                     that our results agree more closely with existing evidence from curated gene
                     models.
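
To make the idea of a splice graph concrete, the sketch below merges a few exon chains into one graph and lists every path the graph permits. It is a toy illustration, not SpliceGrapher's algorithm: the exon labels, function names, and input format are all assumptions.

```python
def build_splice_graph(transcripts):
    """Merge exon chains into one graph: exon -> set of exons spliced directly after it."""
    graph = {}
    for exons in transcripts:
        for exon in exons:
            graph.setdefault(exon, set())
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph

def enumerate_paths(graph, start, end):
    """List every exon chain from start to end that the graph allows (graph must be acyclic)."""
    if start == end:
        return [[end]]
    paths = []
    for nxt in sorted(graph[start]):
        for tail in enumerate_paths(graph, nxt, end):
            paths.append([start] + tail)
    return paths

# Two hypothetical observed isoforms of one gene.
observed = [("E1", "E2", "E3", "E4", "E5"), ("E1", "E3", "E5")]
graph = build_splice_graph(observed)
isoforms = enumerate_paths(graph, "E1", "E5")
print(len(isoforms))
```

The two observed chains imply four possible exon chains, because skipping E2 and skipping E4 are independent events in the graph. A single structure therefore represents the ambiguity without committing to explicit splice forms.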

                     OP17: Plasma Metabolites in the Mammalian Hibernation Cycle

                     Presenter: Anis Karimpour-Fard, University of Colorado Denver
                     Authors: Anis Karimpour-Fard, L. Elaine Epperson, Lawrence Hunter, Sandra
                     Martin

                     ABSTRACT: Hibernation is a dynamic endogenous circannual rhythm in which
                     metabolism, heart rate and body temperature all decrease drastically through
                     most of the winter season in order to conserve energy. Hibernators periodically
                     and regularly rewarm, and these intermittent euthermic periods are referred to
                     as interbout arousals. Although the purpose of interbout arousals is not
                     yet known, it is hypothesized that they are essential for maintaining biochemical
                     homeostasis. We hypothesize that these physiological changes are reflected in
                     biochemical changes that provide mechanistic insights into, and biomarkers
                     for, hibernation states. In this study, we sought to identify compounds that are
                     significantly different in the plasma of a hibernator, the thirteen-lined ground
                     squirrel (Ictidomys tridecemlineatus), throughout the seasons of the year.
                     Quantities of more than 200 metabolites were determined using LC and GC
                     mass spectrometry, and quantitative differences in compounds among the
                     seasonal groups were determined by statistical analyses and several machine
                     learning classification tools. Twenty compounds were identified that distinguish
                     plasma among the eight different stages of hibernation. Our findings using
                     machine learning tools such as random forests are consistent with a proposed
                     two-switch model of hibernation in which setting the summer-winter switch to
                     winter enables expression of a distinct winter torpor-arousal switch.




OP18: Detecting Genome-Wide Copy Number Variations in a Single Sample
Using Next Generation Sequencing Data

Presenter: Rajesh K Gottimukkala, Life Technologies
Authors: Rajesh Gottimukkala, Fiona Hyland, Somalee Datta, Asim Siddiqui, Ryan
Koehler, Yutao Fu

ABSTRACT: We present a sensitive and specific algorithm for calculating
genome-wide copy number variations (CNVs) using next generation sequencing
data. CNVs encompass more nucleotide content per genome than SNPs and
have recently been recognized as an important source of genetic variation.
Detecting CNVs with microarrays is limited by low resolution, which
deep sequencing methods overcome, allowing detection of CNVs of
arbitrary lengths. Methods such as CNV-seq and SegSeq detect CNVs in tumor
samples using deep sequencing but are constrained by the requirement
of a matched normal sample. Our method is based on depth of coverage and
detects CNVs in a single sample compared to the reference (not requiring a
matched normal sample) by performing effective normalization based on GC
content and genome mappability. Given that coverage depth in any region is
proportional to the number of times it appears in the sample, we calculate
coverage in variable-sized genomic windows, normalize it, use a Hidden Markov
Model for segmentation and apply empirically derived filters to the segments
to call CNVs. In the HuRef sample sequenced using the SOLiD(TM) system, we
observe concordance of 89%-97% (using window size 2kb-5kb) with respect
to the Database of Genomic Variants. With simulated reads of coverage 1X-10X, we
observe overall sensitivity between 90% and 96%. Our method can not only accurately
detect CNVs of sizes ranging from a few hundred bases to regions spanning a full
chromosome (possible with cancer samples), but can also assign a precise copy
number and p-value to the regions.
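
The depth-of-coverage reasoning above can be illustrated with a small sketch. It covers only GC-based normalization and a naive per-window copy number estimate. Mappability correction, Hidden Markov Model segmentation, and filtering are omitted, and the window values, bin labels, and function names are assumptions, not the authors' implementation.

```python
from statistics import median

def normalize_by_gc(coverage, gc_bins):
    """Divide each window's coverage by the median coverage of windows in the same GC bin."""
    groups = {}
    for depth, gc in zip(coverage, gc_bins):
        groups.setdefault(gc, []).append(depth)
    expected = {gc: median(depths) for gc, depths in groups.items()}
    return [depth / expected[gc] for depth, gc in zip(coverage, gc_bins)]

def estimate_copy_number(ratios, baseline=2):
    """Scale normalized ratios by an assumed diploid baseline and round to an integer."""
    return [round(baseline * r) for r in ratios]

# Hypothetical per-window read depth; high-GC windows are sequenced more deeply.
coverage = [30, 29, 31, 15, 30, 20, 21, 30, 19, 20]
gc_bins = ["high"] * 5 + ["low"] * 5
ratios = normalize_by_gc(coverage, gc_bins)
copy_numbers = estimate_copy_number(ratios)
print(copy_numbers)
```

Note that the eighth window has the same raw depth (30) as a typical high-GC window, yet after normalization against its own low-GC bin it is estimated at three copies, while the fourth window is estimated at one. This is why normalization precedes any calling step.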

OP19: Interaction Sites in Models of Protein Interaction Network Evolution

Presenter: Todd A Gibson, University of Colorado Denver
Authors: Todd Gibson, Debra Goldberg

ABSTRACT: Theoretical models of biological networks are valuable tools in
evolutionary inference. Evolutionary network models featuring biologically
plausible evolutionary mechanics have shown the importance of gene
duplication and divergence in the evolution of protein interaction networks.
Though these duplication and divergence models are highly regarded,
they are not without shortcomings. Though both networks generated by these
models and empirical protein interaction networks are highly clustered, the
model-generated networks have substantially lower clustering than observed
in empirical data. We have enhanced the duplication and divergence model by
associating each protein’s interactions with one or more heritable interaction
sites. As genes duplicate, interaction sites are inherited by progeny proteins.
                     The loss of redundant interactions is resolved at the level of the interaction site,
                     modeling the effect of degenerative sequence mutations on interaction sites
                     on the surface of the protein. Heritable homomeric proteins and asymmetric
                     divergence are additional biological phenomena naturally captured by the
                     interaction site model. These model enhancements much more closely reflect
                     the clustering found in empirical networks.

                     OP20: Investigation of Structural Basis of Oncogenesis Property of
                     GTPase H-Ras Protein: A Molecular Dynamics Simulation Study

                     Presenter: Gyana R Satpathy, National Institute of Technology
                     Authors: Gyana Satpathy, Raghunath Satpathy, B.P. Nayak

                     ABSTRACT: In the present study, the human oncogenic protein GTPase H-Ras
                     is analysed to determine the structural basis of its cancer-causing nature. The
                     protein exists in several natural variant states, among which
                     mutation of amino acid 61 from glutamine to lysine or leucine results in two types of
                     cancer, i.e. follicular thyroid carcinoma and melanoma, respectively. The structure
                     of the protein was obtained from the PDB and the mutations were introduced at
                     position 61. The wild-type protein and the two mutants were subjected to molecular
                     dynamics simulation in water at 300 K and 350 K for 1000
                     picoseconds using the GROMOS 43a1 force field in the GROMACS tool. The computing
                     facility utilised is a high-performance cluster for biological applications, which
                     is based on Intel Xeon dual quad-core processors and Gluster HPC 1.3 x86-64 bit
                     edition, with a total of 16 nodes each having 4 GB of memory. The analysis after
                     simulation was performed to check the energy, RMSF, and RMSD values of the
                     proteins. The nucleotide-binding regions of the proteins, amino acid
                     residues 10-17, 57-61, and 116-119, were also analysed. The conformational differences
                     among the proteins obtained from the molecular dynamics simulations give a clear
                     indication of the oncogenic mutant forms and of their mode of binding to other
                     molecules during the process of oncogenesis.

                     OP21: Validation of Protein Functional Site Predictions Using
                     Automated Biomedical Literature Analysis

                     Presenter: Karin M Verspoor, University of Colorado Denver
                     Authors: Karin Verspoor, Judith Cohn, Christophe Roeder, Michael Wall

                     ABSTRACT: Prediction and validation of catalytic and allosteric binding sites in
                     proteins is a fundamental challenge in genomics and has practical applications
                     in rational drug design. Dynamic perturbation analysis (DPA) is a computational
                     method for predicting protein functional sites by analysis of protein dynamics.
                     We used DPA to predict 122,866 functional sites in a comprehensive set of
                     95,741 protein domains from 32,192 structures in the Protein Data Bank (PDB),
                     yielding 1,845,452 functional residue predictions. We are investigating an
                     approach to validating these predictions using automated search for supporting
evidence in the literature. The approach is based on the assumption that
mentions of functionally important residues are much more frequent than
mentions of unimportant residues in publications about protein structure. As an initial test
of our validation concept we developed a set of patterns for detecting residue
mentions in text. The patterns accommodate surface and linguistic variations
in references to specific residues in the amino acid sequence. They also aim
to distinguish mutations from other types of references to residues. We tested
the performance of our patterns in automated retrieval of residue mentions
by compiling a ground-truth corpus of full text publications in which residue
mentions were manually identified. Our patterns currently achieve approximately
90% F-score on this corpus. The results indicate that these patterns are highly
effective tools for automatically finding residue mentions in text. Provided our
assumption that these mentions constitute evidence of functional relevance
holds, they suggest we can use automated literature mining to increase
confidence in functional site predictions.
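
As a rough illustration of pattern-based residue detection and F-score evaluation, consider the sketch below. The two regular expressions are deliberately simple assumptions and are far narrower than the pattern set the abstract describes. The example sentence is made up.

```python
import re

THREE_LETTER = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
                "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"

# Point mutation such as "Q61K" or "Gln61Leu": residue, position, residue.
MUTATION = re.compile(
    rf"\b(?:[{ONE_LETTER}]\d+[{ONE_LETTER}]|(?:{THREE_LETTER})-?\d+(?:{THREE_LETTER}))\b"
)
# Plain residue mention such as "Ser195", "Asp-102" or "His 57".
# The trailing \b keeps it from matching the first half of "Gln61Leu".
RESIDUE = re.compile(rf"\b(?:{THREE_LETTER})[- ]?\d+\b")

def find_mentions(text):
    """Return (residue mentions, mutation mentions) found in the text."""
    return RESIDUE.findall(text), MUTATION.findall(text)

def f_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

sentence = ("The catalytic triad comprises Ser195, His 57 and Asp-102; "
            "the Q61K and Gln61Leu mutants were inactive.")
residues, mutations = find_mentions(sentence)
print(residues, mutations)
```

Keeping mutations in a separate pattern is one simple way to distinguish them from other residue references. A one-letter pattern this loose would also match unrelated tokens of the same shape, so a real pattern set would need context checks.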

OP22: Modeling Gene-Species Data by Generalized Replicator Dynamics for
Efficient Phylogenetic Inference

Presenter and Author: Ying Liu, University of North Texas at Dallas

ABSTRACT: In recent years, biclique methods have been proposed to
construct phylogenetic trees. One of the key steps of these methods is to find
complete sub-matrices (no missing entries) from a species-genes binary matrix.
Sanderson et al. [1] formulated it as the problem of enumerating all maximal
bicliques. As is widely accepted by phylogeneticists, bicliques that have both
a large number of species and a large number of genes yield more informative
phylogenetic trees. This leads to the conclusion that a balanced biclique is
preferred for phylogenetic inference. Exact algorithms for the maximal
biclique enumeration problem are not efficient in finding balanced bicliques,
and they are not able to reveal the relationships among these bicliques. In this paper,
we identified the distinct ladder-like overlapping structure of bicliques that exists
in the species-genes matrix for discovering balanced bicliques. Such a structure
can be easily used to select balanced bicliques. We approached the problem
of finding the ladder-like overlapping structure of bicliques by generalizing a
well-known evolutionary selection model, replicator dynamics, to a new discrete
dynamical system, called generalized replicator dynamics. An empirical study shows
that our method is effective and efficient for phylogenetic inference.
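
For readers unfamiliar with the underlying model, the sketch below shows the classic discrete replicator dynamics on a small symmetric adjacency matrix. It is not the generalized system introduced in the abstract, whose definition is not given here. The graph and the function name are assumptions chosen for illustration.

```python
def replicator_step(x, payoff):
    """One update x_i <- x_i * (A x)_i / (x^T A x) of discrete replicator dynamics."""
    fitness = [sum(a * xj for a, xj in zip(row, x)) for row in payoff]
    average = sum(xi * fi for xi, fi in zip(x, fitness))
    return [xi * fi / average for xi, fi in zip(x, fitness)]

# Hypothetical graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [0.25] * 4          # start from the uniform distribution
for _ in range(200):
    x = replicator_step(x, adjacency)
print([round(v, 3) for v in x])
```

In this toy case the weight concentrates on the three mutually connected nodes and the weight of the pendant node decays toward zero. This selection behaviour is what makes replicator dynamics attractive for finding dense substructures.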

OP23: Comparative Analysis of Apicomplexan Biological Processes

Presenter: Segun A Fatumo, Center for Tropical & Emerging Global Diseases
Authors: Segun Fatumo, Jessica Kissinger

ABSTRACT: Apicomplexans are early-branching, unicellular, parasitic
eukaryotes related to ciliates and dinoflagellates (Baldauf 2003). Included
                     in the phylum Apicomplexa are several agents of human and animal disease
                     such as Plasmodium spp. (the causative agent of malaria) and the AIDS-related
                     pathogens Cryptosporidium spp. and Toxoplasma gondii. The availability
                     of genome sequences for many apicomplexans provides an opportunity for
                     comparative analysis of biochemical pathways. In this work, we used OrthoMCL
                     (Li, Stoeckert and Roos 2003) to identify the orthologous genes that are
                     uniquely present across the entire phylum and orthologous genes across some
                     specific lineages within the phylum. We have compared twelve species within
                     the Apicomplexa and two ciliate outgroups and preliminarily mapped their
                     metabolic pathway reaction content. We mapped these data onto a tree of the
                     evolutionary relationships of these organisms (Kuo et al.) to determine
                     the lineage-specificity of the metabolic capacity of the organisms. In addition
                     to whole content comparisons, we discovered lineage-specific evolution of
                     individual proteins in terms of their protein domains as identified by Pfam.
                     Among the unique genes common to all species of Apicomplexa,
                     16.1% have no biological process assigned according
                     to Blast2GO, Pfam2GO and EuPathDB, while about 37% of the genes are still
                     hypothetical.

                     OP24: STOP Using Just GO: A Multi-Ontology Enrichment
                     Analysis Tool for Genes and Proteins

                     Presenter: Emily J Howe, The Buck Institute
                     Authors: Emily Howe, Uday Evani, Mathew Fleish, Nigam Shah, Sean Mooney

                     ABSTRACT: Enrichment analysis is a common technique among biologists,
                     used to reduce a large set of annotations to a smaller and more manageable
                     set of significantly represented concepts. Currently enrichment analysis is
                     done primarily using the Gene Ontology (GO). Because enrichment analysis is a
                     reduction technique, the quality of the results depends entirely on the data used
                     to create them. Although GO has been largely useful, there are entire domains
                     of research that are not considered as part of that ontology (such as diseases or
                     phenotypes). To solve this problem we have created STOP (Statistical Tracking of
                     Ontological Phrases), a multi-ontology automated enrichment analysis tool for
                     performing GO-like enrichment analysis on genes and/or proteins using other
                     ontologies. STOP gathers text related to genes from the NCBI Entrez database,
                     which is then automatically annotated using the Stanford NCBO Annotator. The
                     NCBO Annotator currently annotates with terms from over 200 ontologies.
                     STOP will perform enrichment analysis using anywhere from one to all of the
                     annotated ontologies. Users can select their own background dataset or STOP
                     will use a predefined background of the entire genome for a given species. STOP
                     is currently fully implemented for human genes. Support for human proteins and other
                     species is currently under development.
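
Enrichment of a single term against a background is commonly scored with a one-sided hypergeometric test. The sketch below assumes that statistic for illustration. The abstract does not state which test STOP uses, and the function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(background_size, background_hits, sample_size, sample_hits):
    """Probability of seeing at least sample_hits annotated genes in the sample
    if the sample were drawn at random from the background (hypergeometric tail)."""
    total = comb(background_size, sample_size)
    upper = min(background_hits, sample_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 200 of 20,000 background genes carry the term,
# and 10 of the 100 genes in the user's list carry it.
p = enrichment_p_value(20000, 200, 100, 10)
print(p)
```

Only one gene in the list would be expected to carry the term by chance, so ten hits yields a small p-value. When many terms from many ontologies are tested, a multiple-testing correction would normally follow.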




OP25: Comparative Ontological and Network Analysis of Aging Associated
Genes in Humans and Model Organisms

Presenter: Ari E Berman, Buck Institute for Age Research
Authors: Ari Berman, Tobias Wittkop, Emily Howe, Sean Mooney

ABSTRACT: A great deal of research over the past few decades has been
devoted to the study of aging in humans and model organisms. Despite the
steadily increasing foundation of research, the biological mechanisms of aging remain
an active area of study. Many genes have been implicated in the process of
aging, largely through the use of model organisms, such as C. elegans, D.
melanogaster, and M. musculus. Although these genes shed light on the aging
process, it is not clear how these genes translate to other model organisms
and humans. In this study, we compared two gene sets associated with aging,
GenAge (human-related genes) and AnAge (aging genes in animal models),
in order to determine if the gene sets carried any functional and conceptual
similarities. Comparisons were performed using Gene Ontology, Swiss-Prot,
GeneMANIA, sequence similarity, and the NCBO automated concept annotator.
Generalized comparisons were made in order to evaluate the conceptual
similarities in the datasets. The results show commonalities between the model
organisms and humans. Additionally, this process could lead to a mechanism for
inferring the appropriate aging genes in humans.

OP26: Flu and Drugs and Rocky10

Presenter and Author: Christian V Forst, UT Southwestern Medical Center

ABSTRACT: Viruses evade host defense mechanisms by targeting specific
host vulnerabilities, revealing critical points in host pathways that regulate
antiviral responses. Here, a chemical genetics approach was taken to identify
host factors required for the function of NS1, a major influenza A virus virulence
factor. A high-throughput screen identified naphthalimides that inhibited
replication of highly pathogenic H1N1 influenza virus, VSV, and vaccinia virus
in an interferon-independent manner. Gene-expression profiles were analysed
by a combined gene set enrichment/response network approach to identify
enriched biochemical host-response networks. One particular response network
was further investigated, indicating a mechanism of action through activation
of DDIT4 expression and concurrent inhibition of mTORC1. No antiviral activity
was detected in DDIT4 knock-out cells. On the other hand, viruses inhibit DDIT4
expression, resulting in activation of the mTOR pathway. Thus, DDIT4 is a novel
host defense factor and drug activation of DDIT4 expression suggests a potential
antiviral intervention strategy.




                     OP27: Analysis Workflow of Methylation and Gene
                     Expression Microarray in Pediatric Crohn Disease

                     Presenter: Tzu Lip L Phang, University of Colorado Denver
                     Authors: Tzu Phang, Anna Hunter, Ping Yao Zeng, Theresa Kerbowski, Edwin de
                     Zoeten

                     ABSTRACT: Crohn’s Disease (CD) is one of the major types of Inflammatory
                     Bowel Disease (IBD) and affects up to one million Americans. Previous
                     studies have shown discordance between monozygotic twins affected by IBD,
                     which provides evidence that epigenetic regulation plays an important role in the
                     etiology of this disease. In this study, we combined two microarray technologies
                     to evaluate DNA methylation, a tissue-specific genetic modulation that is
                     ‘heritable’, and its effects on gene expression. To compare the two groups of
                     pediatric patients (10 IBD vs. 19 controls), the Illumina platform was used to determine
                     the methylation status of over 27,578 CpG dinucleotides spanning 14,495
                     genes. We compared these results to gene expression profiles in the same
                     tissues determined by the Affymetrix Gene 1.0 ST microarray using 12,241
                     common genes between the microarray platforms. Here, we introduce the
                     data integration and statistical analysis workflow for the two datasets. We
                     divided the dataset by gender and performed the statistical analysis. When
                     comparing the female samples to controls, hypomethylation was noted in
                     85 genes corresponding to elevated expression profiles in IBD tissues, while
                     hypermethylation was found in 29 genes with decreased expression, whereas
                     the male samples reported 108 and 9 genes represented in the respective
                     comparisons. Functional analysis shows immune response and stress response
                     genes were upregulated and hypomethylated in patients with Crohn’s disease.
                     This abstract demonstrates a data analysis workflow and the importance of
                     integrating multiple high-throughput technologies in studying human disease.

                     OP28: Evolution of Cis-Regulatory Elements

                     Presenter and Author: Ken Yokoyama, University of Colorado Denver

                     ABSTRACT: Changes in gene expression can have major impacts upon
                     phenotype, yet little is known about the co-evolution of regulatory proteins
                     and their DNA binding sites. We show that subtle changes in the Sp1
                     transcription factor have had dramatic effects upon its binding elements,
                     inducing nucleotide-specific changes across ~800 eutherian binding sites
                     genome-wide. Cross-species comparisons of the Sp1 binding domain implicate
                     no more than four amino acid sites to be responsible for changes in binding
                     preferences. Independent changes at these sites in eutherians and birds have
                     produced nearly identical binding site modifications, with sites converting to the
                     modified consensus sequence within similar sets of target genes. This suggests
                     preservation of binding events in functionally important genes, highlighting
                     convergent evolution in both cis- and trans-regulatory elements.


OP29: More Is Not Always Better. Considerations in the Use of Time Course
Microarray Data

Presenter: Elizabeth A Siewert, Colorado School of Public Health
Authors: Elizabeth Siewert, Katerina Kechris

ABSTRACT: Time course microarray expression data have become more
common in the past 10 years. Among other applications using these types of
data, model-based methods have been developed to identify transcription factor
binding sites (TFBSs), de novo, in a single species. Using our TFBS identification
method, which incorporates information from multiple species, we explored
the use of time course data to predict TFBSs for a four species yeast expression
data set taken across multiple time points under two different stress conditions.
We found that, depending on which time course data points were used as the
response variable, there were large variations in the predictive accuracy of our
method. We present potential pitfalls of indiscriminately using time course data,
and propose three general criteria to consider when choosing data.

OP30: Semantic Richness in the Colorado Richly Annotated Full-Text (CRAFT)
Corpus

Presenter: Michael Bada, University of Colorado Anschutz Medical Campus
Authors: Michael Bada, Miriam Eckert, Arrick Lanfranchi, William A. Baumgartner,
Jr., Colin Warner, Amanda Howard, William Corvey, Nianwen Xue, K. Bretonnel
Cohen, Karin Verspoor, Judith A. Blake, Martha Palmer, Lawrence Hunter




ABSTRACT: We are in the midst of creating the Colorado Richly Annotated
Full-Text (CRAFT) Corpus, a collection of 97 full-text biomedical journal articles
that are being manually annotated both semantically and syntactically to
serve as a gold standard in the development of advanced biomedical
natural-language-processing applications. For the former, we are using the full sets of
terms of high-quality terminologies, primarily Open Biomedical Ontologies, as
well as other controlled biomedical terminologies to annotate every textual
mention of these concepts in the articles. We have preliminarily finished
annotation using the OBO Cell Type Ontology, the OBO GO biological-process,
cellular-component, and molecular-function subontologies, the OBO Chemical
Entities of Biological Interest ontology, the OBO Sequence Ontology, and
the NCBI Taxonomy, and we are continuing to annotate the corpus with the
unique identifiers of the Entrez Gene database. Furthermore, we intend to
relationally link these concept annotations to form assertions. We present an
initial analysis of the occurrence of these concepts in these articles, showing that
these mentions are both abundant and diverse, and hypothesize that they are
generally indicative of the wider biomedical literature.




                     OP31: Functional Regulatory Circuits Induced by
                     Transcription Factors and Small RNAs

                     Presenter: Molly Megraw, Duke University
                     Authors: Molly Megraw, Uwe Ohler

                     ABSTRACT: A program of tightly regulated gene expression is at the heart of
                     development for every living organism. Recent years have seen an increased
                     appreciation for the complexity of transcriptional control by gene Transcription
                     Factors (TFs) as well as post-transcriptional control by small RNAs known as
                     microRNAs (miRNAs). Several specific cases of small miRNA-TF regulatory
                     circuits have been painstakingly discovered link by link using traditional genetic
                     experiments. These examples all point to miRNA-TF circuits as crucial network
                     components with important system-wide regulatory characteristics. They also
                     highlight the need for systematic studies and methods to identify TF-miRNA
                     circuits and query their biological function. The fundamental idea behind
                     network motif discovery is that if a certain configuration (a 3-node cycle for
                     example) is contained within a given network a surprisingly high number of
                     times compared to many randomized networks, this configuration is likely
                     to have been preserved through evolutionary time because it benefited the
                     organism. However, motif identification is a useful concept only to the degree
                     that the set of randomized background networks used for comparison are
                     plausible as alternatives to the given network. Currently available background
                     models were developed for use in TF-only networks and therefore have a
                     number of shortcomings for use in TF-mirna-gene networks. Here we present
oral preSenTaTionS




                     an algorithm that assigns edges in a manner that accounts for the unique
                     biological constraints between each type of network entity, creating a more
                     flexible and realistic background randomization model. we discuss network
                     motifs identified in the arabidopsis thaliana model plant system.
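
The general motif-discovery idea described in this abstract (count a
configuration in the real network, then compare against many randomized
networks) can be illustrated with a minimal Python sketch. It is not the
authors' algorithm: it uses a generic degree-preserving edge swap as the
background model, which is exactly the kind of type-agnostic randomization the
abstract argues is insufficient for TF-miRNA-gene networks. All function names
and parameters are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

def count_3_cycles(edges):
    """Count directed 3-node cycles a->b->c->a, each cycle counted once."""
    edge_set = set(edges)
    succ = {}
    for u, v in edge_set:
        succ.setdefault(u, set()).add(v)
    hits = 0
    for a, b in edge_set:
        if a == b:
            continue
        for c in succ.get(b, ()):
            if c != a and c != b and a in succ.get(c, ()):
                hits += 1
    return hits // 3  # every cycle is found once per starting edge

def degree_preserving_shuffle(edges, n_swaps, rng):
    """Generic null model (an assumption, not the abstract's method):
    swap edge targets so every node keeps its in- and out-degree."""
    edges = list(dict.fromkeys(edges))  # drop duplicate edges, keep order
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges

def motif_z_score(edges, n_random=200, seed=0):
    """Return (observed count, mean background count, z-score)."""
    rng = random.Random(seed)
    observed = count_3_cycles(edges)
    background = [
        count_3_cycles(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    mu, sigma = mean(background), pstdev(background)
    if sigma > 0:
        z = (observed - mu) / sigma
    else:
        z = 0.0 if observed == mu else float("inf")
    return observed, mu, z
```

A type-aware background model of the kind the abstract proposes would replace
`degree_preserving_shuffle` with a randomization that only exchanges edges
between compatible entity types (for example, TF-to-gene with TF-to-gene).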

OP32: RAPID COMPREHENSIVE SELECTION OF SENSITIVE
AND SPECIFIC OLIGONUCLEOTIDE SIGNATURES FROM LARGE
HIERARCHICALLY CLUSTERED NUCLEIC ACID SEQUENCE
DATA SETS

Presenter: Harald Meier, Technical University of Munchen
Authors: Kai Bader, Christian Grothoff, Harald Meier

ABSTRACT: Organism- and taxon-specific oligonucleotide probes and primers
are key components for the detection and identification of microorganisms
using molecular techniques such as PCR, hybridization or DNA sequencing. The
design of valuable probes/primers depends on the identification of promising
sequence- or group-specific oligonucleotide signatures (OS) in the relevant
sequence data, such as whole genome sequences or marker gene sequence
collections. The identification of valuable signatures requires comprehensive
in silico sequence searches, which become computationally expensive when
large collections of nucleic acid sequences have to be analyzed. An example is
the phylogenetic rRNA sequence database SILVA, one of the largest curated gene
sequence databases worldwide. It contains more than 460,000 published, full-
length SSU-rRNA sequences, which are hierarchically clustered in a phylogenetic
tree, providing more than 920,000 nodes (www.arb-silva.de, SILVA release 102,
SSURef). We introduce the two-component software system Oligonucleotide
Signature Quest and Evaluation System (OS-Quest), and its usage for compiling
a comprehensive collection of sequence- and group-specific oligonucleotide
signatures from SILVA. The special features of OS-Quest, in particular those
for coping with large amounts of inexact sequence data, are mentioned.
Furthermore, we present an OS-Quest component which gives the user community
unlimited access to the pre-calculated signature collection via client/server
technology. Retrieving promising signatures from our pre-computed comprehensive
collection using OS-Quest offers a good starting point for designing sequence-
or group-specific primers and probes, in particular for users lacking the
computational power or the expertise to perform extensive signature searches in
huge sequence datasets on their own local systems.
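
To make the notion of a group-specific oligonucleotide signature concrete, the
following Python sketch finds k-mers that occur in every sequence of a target
group and in no sequence outside it. It is a brute-force illustration under
simplifying assumptions (exact matching only, no handling of ambiguous or
inexact sequence data, no tree structure) and is not the OS-Quest
implementation. The function names are hypothetical.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def group_signatures(target_seqs, other_seqs, k):
    """k-mers present in every target sequence and absent from all others.

    Assumes target_seqs is non-empty and sequences use a plain alphabet.
    """
    shared = set.intersection(*(kmers(s, k) for s in target_seqs))
    for s in other_seqs:
        shared -= kmers(s, k)
    return shared

# Toy usage: two target sequences, one non-target sequence, 3-mers.
signatures = group_signatures(["ACGTAC", "TACGTT"], ["GGGACG"], k=3)
```

In a hierarchically clustered collection, the target group would be the leaves
under one tree node and the non-target set everything else, which is why a
precomputed collection is attractive for large databases.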

OP33: CLASSIFYING PARENTHESIZED MATERIAL FOR TEXT
MINING

Presenter: Kevin Bretonnel Cohen, University of Colorado School of Medicine
Authors: K. Bretonnel Cohen, Tom Christiansen, Lawrence Hunter

ABSTRACT: Parenthesized text in biomedical documents often has the
characteristic of being confusing for lay readers but a potentially significant
source of extractable information for text mining. This talk describes current
progress on a system for extracting and classifying parenthesized material. A
typology has been constructed for classifying parenthesized material into one
or more of fourteen categories. Each is associated with at least one specific
use case. A Java class with a simple API has been developed for extracting
parenthesized text and returning one or more categories and the categorized
text. Current evaluations have been done by an extensive series of JUnit tests;
further plans include corpus annotation for evaluation on blind test data.
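
The extract-then-classify interface described above might look roughly like
the Python sketch below. The system in the abstract is a Java class with a
fourteen-category typology that the abstract does not enumerate, so the three
categories and their regular expressions here are purely hypothetical
stand-ins, and only innermost (non-nested) parentheses are handled.

```python
import re

# Matches innermost parenthesized spans only; nesting is not handled.
PAREN = re.compile(r"\(([^()]*)\)")

# Hypothetical categories; NOT the fourteen-way typology from the abstract.
RULES = [
    ("citation", re.compile(r"\b(19|20)\d{2}\b|et al\.")),
    ("statistic", re.compile(r"\b[pn]\s*[<=>]\s*[\d.]+", re.IGNORECASE)),
    ("abbreviation", re.compile(r"^[A-Z][A-Za-z0-9-]{1,9}$")),
]

def classify_parentheticals(text):
    """Return (content, [categories]) for each parenthesized span in text."""
    results = []
    for match in PAREN.finditer(text):
        content = match.group(1).strip()
        labels = [name for name, rule in RULES if rule.search(content)]
        results.append((content, labels or ["other"]))
    return results

example = ("Tumor necrosis factor (TNF) was elevated (p < 0.05) "
           "as reported (Smith et al., 2009).")
classified = classify_parentheticals(example)
```

Returning a list of categories per span mirrors the abstract's statement that
material can fall into one or more categories.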

OP34: TOWARDS INTEGRATIVE GENE PRIORITIZATION

Presenter: Graciela H Gonzalez, Arizona State University
Authors: Jang Lee, Graciela Gonzalez

ABSTRACT: Many methods have been proposed for facilitating the
uncovering of genes that underlie the pathology of different diseases. Some
are purely statistical, resulting in a (mostly) undifferentiated set of genes that
are differentially expressed (or co-expressed), while others seek to prioritize
the resulting set of genes through comparison against specific known targets.
Most of the recent approaches use either single data or knowledge sources,
or combine the independent predictions from each source. However, given
that multiple kinds of heterogeneous sources are potentially relevant for gene
prioritization, each subject to different levels of noise and of varying reliability,
each source bearing information not carried by another, we claim that an
ideal prioritization method should provide ways to discern amongst them
in a true integrative fashion that captures the subtleties of each, rather than
using a simple combination of sources. Integration of multiple data for gene
prioritization is thus more challenging than its single data type counterpart.
What we propose is a novel, general, and flexible formulation that enables
multi-source data integration for gene prioritization that maximizes the
complementary nature of different data and knowledge sources in order to
make the most use of the information content of aggregate data. Protein-protein
interactions and Gene Ontology annotations were used as knowledge sources,
together with assay-specific gene expression and genome-wide association data.
Leave-one-out testing was performed using a known set of Alzheimer’s disease
genes to validate our proposed method. We show that our proposed method
performs better than the best multi-source gene prioritization systems currently
published.

OP35: TOWARDS A MOLECULAR CLASSIFICATION OF KIDNEY
DISEASES BASED ON NETWORK ANALYSIS

Presenter: Felix Eichinger, University of Michigan
Authors: Felix Eichinger, Ramakrishna Varadarajan, Jignesh Patel,
Matthias Kretzler

ABSTRACT: Classification of patients based on conventional criteria such
as histology and laboratory values in comprehensive datasets often shows a
significant discrepancy with patterns in gene expression. While this disconnect
can easily be explained by the several layers of cellular machinery between gene
expression and phenotype, even a description of the discrepancies is missing. To
address this problem, we perform a molecular classification of a comprehensive
gene expression dataset of 226 patients with 11 kidney-related disease states
and link the results back to clinical and histological data. In detail, we select
the regulated genes from each patient by comparison to a pool of healthy
controls. To control for noise in the data and redundancy in gene function, we
extend the gene lists to networks by adding edges representing cocitations of
genes in PubMed abstracts and compare the resulting networks with an
approximate graph matching algorithm (TALE). Subsequently we cluster the
networks by similarity. Each cluster is analyzed for patterns either specific to the
cluster or shared across clusters, and tested for homogeneity by appearance of
patterns in the patients. Since patterns are hypothesized to indicate biological
processes active in a subset of patients, we investigate the genes for interactions
and strive to assign functional annotation to the patterns. Based on function
assignment and knowledge of connections between biological processes and
phenotype, we hypothesize about phenotypic effects and test on the clinical
data. Embedding the cluster-specific patterns in cross-cluster patterns enables
integration into the common biological context.





OP36: IDENTIFYING THE DYNAMIC STATES OF THE 3D GENOME
ORGANIZATION

Presenter: Andrzej S Kudlicki, University of Texas
Authors: Dirar Homouz, Gang Chen, Andrzej Kudlicki

ABSTRACT: Chromatin capture experiments (4C, Hi-C) allow genome-wide
mapping of physical interactions between chromosomal loci. We discuss the
impact that multiple or non-homogeneous subpopulations of cells contained
in an experimental sample may have on the results of a chromatin capture
experiment. We propose a method to identify this phenomenon
by analyzing statistical and geometrical properties of chromatin capture
measurements. By applying the algorithm to published experimental data,
we demonstrate that subpopulations with different chromatin conformations
are indeed present and their influence on the results is significant. Finally, we
present an algorithm that reconstructs the chromatin conformations in each
subpopulation by applying graph-theoretic considerations. We demonstrate
that the results are consistent with each subpopulation of cells executing a
significantly different transcriptional program.

OP37: PROVIDING CONTEXT TO GENETIC ASSOCIATIONS WITH
GENE EXPRESSION IN RENAL DISEASE

Presenter: Benjamin J Keller, Eastern Michigan University
Authors: Benjamin Keller, Sebastian Martini, Matthias Kretzler

ABSTRACT: Candidate SNPs from genetic association studies are often
without context in terms of a molecular interpretation. Historically, scientists
performing a GWAS will translate the SNPs to candidate genes using guilt-by-
proximity, explore gene annotation and perhaps pathways, but not go much
further. Recent studies may take a stronger step by employing small studies
treating expression as a quantitative trait to help interpret SNPs relative to gene
expression in cell lines frequently derived from peripheral blood. Our focus is
on regulatory systems affected by SNPs, and we will discuss our experience using
renal tissue expression to interpret renal disease GWAS candidates with two
approaches: one, linking across populations, correlating a quantitative clinical
trait with expression under guilt-by-proximity; and the other, within the same
population, associating renal tissue expression with genotype in an eQTL
analysis. We discuss our experience with these approaches, and the directions
the results allow us to take.





OP38: GENE AND PROMOTER DISCOVERY THROUGH HIGH-
RESOLUTION EXPRESSION PROFILING

Presenter and Author: Ian W Davis, GrassRoots Biotechnology, Inc.

ABSTRACT: Organisms alter their transcriptional programs in response to
environmental stimuli, but because such changes are often highly localized, they
may not be detected if gene expression is measured over a whole organism
or whole organ. By contrast, we measure gene expression under multiple
conditions over individual cell types and developmental stages in the roots of
the model plant Arabidopsis thaliana, and discover cell type-specific responses
to limiting nutrient conditions. Network- and systems-focused analysis of the
resulting spatial/temporal/environmental-specific microarray data identifies
novel putative mechanisms of adaptation to scarcity, as well as potentially
engineerable regulators of these processes. Furthermore, the high-resolution
profiling data distinguishes similar but distinct expression patterns, allowing
accurate clustering of patterns. In turn, bioinformatic analysis of the associated
promoter sequences leads to identification of putative cis-elements directing
these expression patterns. Based on this analysis, we have created several
synthetic promoter sequences that drive their intended expression pattern in vivo.

OP39: USING A LOW DIMENSIONAL FIRING RATE MODEL TO
STUDY INTERACTIONS IN BIOPHYSICAL NEURAL NETWORKS

Presenter and Author: Anca R Radulescu, University of Colorado at Boulder

ABSTRACT: Population bursting is defined as a period of high firing rate
followed by a period of quiescence. Bursting has been typically observed
experimentally in groups of neurons in certain brain regions (such as the
thalamus, the hippocampus, or the midbrain) during normal or pathological
animal behavior. Biophysical membrane-potential models of single cell bursting
involve at least three equations; extending such models to study the collective
behavior of neural populations in a network involves thousands of equations
and can be very expensive computationally. We construct a low dimensional
population model that captures biophysical aspects of the network using a firing-
rate mean-field approach. We study mechanisms that trigger and stop transitions
between tonic and phasic population firing; in our model, these mechanisms
are captured through a two-dimensional system. We then use this system as a
building block, and extend our study to synchronization within and interactions
between networks of neurons in different brain areas. This theoretical approach
may help contextualize and understand the factors involved in regulating burst
firing in populations and how it may modulate distinct aspects of behavior.





OP40: INVESTIGATING RELATIONSHIPS BETWEEN OBESITY AND
THE BUILT ENVIRONMENT USING AGENT-BASED MODELING

Presenter: Helmet T Karim, University of Pittsburgh
Authors: Helmet Karim, Leming Zhou

ABSTRACT: Obesity has become a world-wide epidemic. It is the result
of complex interactions among many different factors such as genetics,
environment, behavior, culture, and social networks. To date, extensive work
has been done on these factors in various fields, for instance, genome-wide
association studies for determining the genetic causes of obesity and statistical
investigations on the prevalence of obesity based on large-scale surveys. To
obtain a dynamic view of the complex interactions among various environmental
factors in the prevalence of obesity, in this work we propose to create an agent-
based model. In this model, agents are people and their direct environment
such as markets, restaurants, workplaces, homes, and gyms. Rules governing
the behavior of these agents are constructed based on extensive literature
review. This agent-based model can visually present the dynamic interaction
among various agents in the model. Users of this model can conveniently adjust
the parameters in this model in real time to observe the sensitivity of each
parameter and the behavioral changes of those agents. After the calibration
of parameters in this model, we have observed some reasonable results. For
instance, an increased number of fast-food restaurants in the area or a longer
distance from healthy groceries would have negative effects on weight and
BMI of the population. Further work on this model should provide us with more
meaningful results in the near future. The success of this model may open doors
to more comprehensive models and therefore a more accurate picture of factors
related to obesity.

OP41: CHARACTERIZATION OF GENOMIC VARIABILITY
IN CLINICAL ISOLATES OF THE ORAL PATHOGEN
AGGREGATIBACTER ACTINOMYCETEMCOMITANS

Presenter: Weerayuth Kittichotirat, University of Washington
Authors: Weerayuth Kittichotirat, Casey Chen, Roger Bumgarner

ABSTRACT: Bacteria are known to have the ability to exchange their genetic
materials with other evolutionarily distinct species. This dynamic gain and loss
of genetic elements causes bacterial genomes to be relatively plastic, and they
may even display significant variations between closely related strains of the
same species. With the throughput of today’s genome sequencing technologies,
a high coverage bacterial genome sequence can readily be obtained within
days. Sequencing of closely related bacterial genomes and comparison of their
genetic content is therefore a logical strategy for identifying genetic variability
that underlies their phenotypic differences. In this project, we are developing
a genome sequencing, annotation and comparison pipeline for studying the
genotype-phenotype correlations between closely related bacterial strains.
We have applied our pipeline to analyze a collection of 18 strains of an oral
bacterium known as Aggregatibacter actinomycetemcomitans (Aa). Aa is a
human pathogen that is heavily associated with aggressive periodontitis and
other systemic infections. Intriguingly, not all Aa exhibit virulent phenotypes
and we believe that by looking at the dynamic gain and loss of genetic contents
across different strains of Aa, we will be able to better understand the genotypic
and phenotypic relationship that gives rise to the variation in virulence of this
bacterium.

OP42: EXTRACTING ADVERSE DRUG REACTIONS FROM USER
POSTS TO HEALTH-RELATED SOCIAL NETWORKS

Presenter: Laura Wojtulewicz, Arizona State University
Authors: Robert Leaman, Laura Wojtulewicz, Ryan Sullivan, Annie Skariah, Jian
Yang, Graciela Gonzalez

ABSTRACT: Adverse reactions to drugs are among the most common causes
of death in industrialized nations. Expensive clinical trials are not sufficient to
uncover all of the adverse reactions a drug may cause, necessitating systems
for post-marketing surveillance, or pharmacovigilance. These systems have
typically relied on voluntary reporting by health care professionals. However,
self-reported patient data has become an increasingly important resource,
with efforts such as MedWatch from the FDA allowing reports directly from the
consumer. In this paper, we propose mining the relationships between drugs
and adverse reactions as reported by the patients themselves in user comments
to health-related websites. We evaluate our system on a manually-annotated
set of user comments, with promising performance. We also report encouraging
correlations between the frequency of adverse drug reactions found by our
system in unlabeled data and the frequency of documented adverse drug
reactions. We conclude that user comments pose a significant natural language
processing challenge, but do contain useful extractable information which merits
further exploration.

OP43: SIMPLE LOCAL ASSEMBLY PROGRAM

Presenter: Adam W Spargo, Wellcome Trust Sanger Institute
Authors: Adam Spargo, Zemin Ning

ABSTRACT: We present a simple local assembly program which will be
used in the contig assembly stage of the Phusion2 pipeline. Phusion [1]
clusters sequencing reads by shared long k-mer words; these clusters are
then assembled in parallel, currently using Phrap [2]. This pipeline was very
successful with Sanger sequencing technology; however, second-generation
sequencing technologies have presented several issues: (i) Phrap cannot
handle very high coverage data and so clusters must be small, (ii) Phrap cannot
make use of read-pairs, with contigs requiring extensive post-processing by
Phusion, both to join via read-pairs and to break at mis-assemblies, (iii) long-
running Phrap jobs destroy the previously effective parallelization of Phusion,
(iv) Phrap cannot handle all of the different second-generation technologies
effectively, making a hybrid approach to genome sequencing more difficult
than necessary. The local assembler has been implemented via the overlap-
layout-consensus methodology, using libraries from the SMALT alignment tool
[3] and the Boost Graph Library [4]. We detail this implementation and then
report on our investigations into algorithms for overlap-graph disambiguation
using read-pairs, defined nucleotide positions and read-depth. Re-use of robust/
multi-threaded libraries allows us to quickly implement new algorithms and
concentrate our research on developing new methods to make the best of the
available technologies. Results show the disambiguation of graphs generated
from carefully constructed simulation data for various classes of repeats as
well as real data.

[1] Mullikin JC and Ning Z. The Phusion assembler. Genome Research 2003;13(1):81-90.
[2] http://www.phrap.org/
[3] http://www.sanger.ac.uk/resources/software/smalt/
[4] http://www.boost.org/doc/libs/1_44_0/libs/graph/doc/index.html
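
The read-clustering step mentioned in the abstract (grouping reads that share
long k-mer words, so that each cluster can be assembled independently) can be
sketched in Python with a union-find structure. This is an illustrative toy
under stated assumptions, not the Phusion implementation: it ignores reverse
complements, sequencing errors, and k-mer frequency filtering, and the function
name is hypothetical.

```python
def cluster_reads_by_kmers(reads, k):
    """Group read indices so that reads sharing any k-mer land together."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            word = read[pos:pos + k]
            if word in first_seen:
                parent[find(idx)] = find(first_seen[word])
            else:
                first_seen[word] = idx

    clusters = {}
    for idx in range(len(reads)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())

# Toy usage: reads 0 and 1 share "TACG"; reads 2 and 3 share "CCCC".
toy_clusters = cluster_reads_by_kmers(
    ["ACGTACGT", "TACGTTTT", "GGGGCCCC", "CCCCAAAA"], k=4
)
```

Because clusters are disjoint, each one could be handed to a separate local
assembly job, which is the parallelization property the abstract refers to.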

OP44: MODELLING GENE EXPRESSION IN TUMOR PROGRESSION
USING BINARY STATES

Presenter: Juan Emmanuel Martinez-Ledesma, ITESM Campus Monterrey
Authors: Juan Emmanuel Martinez-Ledesma, Victor Trevino

ABSTRACT: Cancer is a complex disease characterized by the disrupted activity
of several cancer-related genes such as oncogenes and tumor-suppressor genes
(TSG). By definition, it is expected that the expression of cancer-related genes
changes during tumor progression. Despite the enormous efforts made for
biomarker and gene pattern discovery, few methods have been designed to
model the relationship between gene expression level and tumor stage during
malignancy progression. Such models could help us to understand the dynamics
and complexity of tumor progression. We have developed a methodology based on
the proportion of samples whose gene expression level was activated or
inactivated within a tumor stage to compose expression patterns associated with
tumor progression. Our preliminary results using a prostate cancer dataset show
that our method identifies the expected profile corresponding to oncogenes and
TSG in both cancer and non-cancer related genes. Ontology and pathway analysis
show that the significant genes found are associated with well-known
cancer-related terms. In addition, we show that a considerable proportion of
significant profiles are not found by other statistical tests commonly used to
detect differential expression between tumor stages.
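
The core quantity in this abstract, the proportion of samples in each tumor
stage whose gene is in an "activated" binary state, can be illustrated with a
short Python sketch. How the authors binarize expression and test significance
is not stated in the abstract, so the fixed threshold, the stage names, and the
monotonicity check below are all assumptions for illustration only.

```python
def activation_profile(expression_by_stage, threshold):
    """Fraction of samples per stage with expression above the threshold.

    The fixed threshold is a hypothetical binarization rule.
    """
    profile = {}
    for stage, values in expression_by_stage.items():
        active = sum(1 for v in values if v > threshold)
        profile[stage] = active / len(values)
    return profile

def is_non_decreasing(profile, stage_order):
    """True if the activated fraction never drops along the stage order."""
    fractions = [profile[stage] for stage in stage_order]
    return all(a <= b for a, b in zip(fractions, fractions[1:]))

# Toy data for one gene (made-up values, four samples per stage).
gene = {
    "normal": [1.0, 2.0, 1.5, 0.5],
    "primary": [3.0, 1.0, 4.0, 5.0],
    "metastatic": [6.0, 7.0, 5.5, 4.5],
}
profile = activation_profile(gene, threshold=2.5)
oncogene_like = is_non_decreasing(profile, ["normal", "primary", "metastatic"])
```

A profile that rises with stage would resemble the oncogene-like pattern the
abstract mentions, and a falling profile the tumor-suppressor-like pattern.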

OP45: FINDING COMMUNITY LEADERS IN SOCIAL NETWORKS

Presenter and Author: Xiaowei Xu, University of Arkansas at Little Rock

ABSTRACT: Identifying leaders in social networks is important to
epidemiology, viral marketing, systems biology and sociology. Leaders control
contact between individuals and influence ideas and opinions. They are the
nexus for the propagation of disease, information and ideas. We propose
an algorithm for identifying leaders and measuring their influence on their
community that is based on the structure of the network. We illustrate the
differences between our work and information spread maximization algorithms
both analytically and experimentally. We evaluate its performance on real social
networks, including the Enron email network and the Biblical social network. Our
algorithm is both fast and accurate. It is superior at identifying community
leaders and achieves comparable influence spread when compared to influence
spread maximization algorithms.

OP46: ADAPTIVE LEARNING NEURAL NETWORKS FOR BINDING
SITE SEARCH IN GENOMIC SEQUENCES

Presenter: Ivan Erill, University of Maryland Baltimore
Authors: Joseph Cornish, Sumeet Bagde, Elisabeth Hobbs, Ivan Erill

ABSTRACT: Artificial neural networks (ANN) and other machine learning
systems, like hidden Markov models, can be trained to become highly efficient
pattern recognition systems that are able to discern non-linear features on
complex backgrounds. For this reason, neural networks have been proposed
frequently as suitable search tools for the identification of transcription factor-
binding sites in genomic sequences. Here we show that neural networks
trained with the standard backpropagation algorithm perform significantly
worse at locating transcription factor binding sites in genome sequences than
standard weight-matrix search techniques. We observe that this is due to the
ill-balanced nature of the search problem, which requires the identification of
a small number of sites against a very large background. We propose a new
algorithm, adaptive learning, based on a targeted sampling of the background
during backpropagation learning. We validate this approach by cross-validation
on an up-to-date collection of CRP sites from Escherichia coli. A portion of these
sites is used to train ANN committees with adaptive learning. The remainder
is used for benchmarking search efficiency against the original E. coli genome,
a randomly generated genome and the genome of Paenibacillus sp. Our
results demonstrate that adaptive learning of neural networks improves search
efficiency dramatically against all backgrounds. We observe also that enhanced
learning algorithms are likely to be hindered by the presence of unknown
positives on the original genome. We discuss the general implications of these
findings for machine learning approaches to binding site search.
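
For readers unfamiliar with the baseline the abstract compares against, the
Python sketch below shows a standard weight-matrix (position-specific scoring
matrix) search: build log-odds scores from aligned sites, then slide the matrix
along a sequence. It illustrates only this baseline, not the adaptive learning
algorithm. The uniform background, the pseudocount, and the toy sites are
assumptions and are not CRP data.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds weight matrix from equal-length aligned sites.

    Assumes a uniform background and an A/C/G/T-only alphabet.
    """
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm

def scan(sequence, pssm):
    """Score every window of the sequence; returns (position, score) pairs."""
    width = len(pssm)
    return [
        (start, sum(pssm[j][sequence[start + j]] for j in range(width)))
        for start in range(len(sequence) - width + 1)
    ]

# Toy usage with made-up sites.
toy_sites = ["TGTGA", "TGTGA", "TGAGA", "TGTGC"]
hits = scan("CCTGTGACC", build_pssm(toy_sites))
best_position, best_score = max(hits, key=lambda hit: hit[1])
```

In a genome-scale search almost every window is background, which is the
imbalance the abstract identifies as the difficulty for backpropagation
training.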

OP47: ENRICHING REGULATORY NETWORKS WITH OTHER
FUNCTIONAL RELATIONSHIPS

Presenter: Ronald C Taylor, Pacific Northwest National Laboratory
Authors: Ronald Taylor, Antonio Sanfilippo, Jason McDermott, Bob Baddeley, Rick
Riensche, Russ Jenson, Marc Verhagen

ABSTRACT: Much of the current work on constructing biological networks
has focused on reverse-engineering regulatory connections between genes
from correlation patterns observed in gene expression data. Consequently, the
integration of the inferred networks with other background information from
sources such as the biomedical literature remains an open problem. This is an
important gap as such additional information is needed to (1) refine our
confidence in the inferred gene-to-gene regulatory connections and (2) expand
the inferred networks. Here, we report on one novel means of tying networks
derived from gene expression data to other information, using a bootstrapping
version of our Cross-Ontological Analytics (XOA) algorithm. XOA links genes into
networks using aggregated semantic similarities between GO annotations found for
those genes in the GO database. The resulting network formed by such edges
provides new information as to functional and possible regulatory relationships
between the genes. We use Context Likelihood of Relatedness (CLR) to infer edges
derived from mouse gene expression data gathered for study of neuroprotection
in stroke. We feed that set of genes and connections into our bootstrapped XOA
algorithm, and report on the expanded set of connections found, performing
topological analysis. Also, we compare those XOA results to XOA results that take
as the starting point for analysis the set of CLR connections combined with a set
of literature-based gene-to-gene connections found in PubMed abstracts by the
MedStract tool for those same CLR-reported genes.

OP48: GENOS: SEGMENT-BASED REPRESENTATION OF
GENOMICS DATA. APPLICATION TO GENOTYPING DATA
MANAGEMENT

Presenter: Hugues B Sicotte, Mayo Clinic
Authors: Jean-Pierre Kocher, Hugues Sicotte, Yaxiong Lin, Eric Klee




ABSTRACT: Motivation: The recent development of high-throughput
sequencing technologies provides new opportunities to characterize SNPs with
lower minor allele frequencies (MAF) in large sample populations. While
scientifically enticing, such genotyping studies will be technically challenging,
with escalating data storage requirements per sample. To address this need,
data compression methods have been proposed that leverage SNP information
to reference genotyping data. However, these representations do not efficiently
account for SNPs shared by a low number of individuals in a population. Results:
GenoS is a novel data model for the efficient storage of genotyping data. GenoS is
developed around a segment-based architecture designed to organize and share
biospecimen-related data produced by commonly used genomics technology
platforms. This architecture allows both explicit and referenced representations
of these data. This approach is particularly effective at storing genotypes.
Compared to the widely used PLINK format, GenoS achieves 1.4 times higher
compression on the storage of genotypes related to SNPs obtained from the
1000 Genomes Project and more than 8 times higher compression of genotypes
for SNPs with MAF <1%. On this dataset, data extraction times are comparable
to PLINK. As an increasing number of low-MAF SNPs are discovered, GenoS will
provide a method for efficient and economical genotype data storage while
maintaining good data retrieval performance. Availability: GenoS is a standalone
application written in Java. It includes functions for format conversion and data
extraction.

              OP49: BIOINFORMATICS STUDY OF THE FERREDOXIN-
              DEPENDENT BILIN REDUCTASE FAMILY

              Presenter: Chanel K. Mejias-Rosario, Universidad Metropolitana
              Authors: Chanel Mejias-Rosario, Hugh Nicholas, Alexander Ropelewski, Ricardo
              Gonzalez-Mendez, Luis Vazquez-Quinones

              ABSTRACT: Molecules that sense light in photosynthetic organisms,
              from bacteria to higher plants, are used for perception of and adaptation to
              fluctuations of light. The Ferredoxin-Dependent Bilin Reductase (FDBR) protein
              family includes enzymes that participate in the biosynthesis of the linear
              tetrapyrrole prosthetic groups of the light-harvesting phycobiliproteins and
              the photoreceptor phytochromes. Double bond reduction regiospecificity is
              responsible for the large diversity of their bilin products, which absorb light
              throughout the visible and near-IR spectral region. The FDBR protein family
              consists of three distinctly different but closely related subfamilies: PcyA
              (phycocyanobilin:ferredoxin oxidoreductase), PebA
              (15,16-dihydrobiliverdin:ferredoxin oxidoreductase) and PebB
              (phycoerythrobilin:ferredoxin oxidoreductase).
              We performed a bioinformatic characterization of the FDBR family using SSEARCH
              and BLAST searches to detect homologues, multiple sequence alignments with
              ClustalW, motif elicitation by maximum entropy, and characteristic amino acid
              determination for each subfamily by group entropy using a Kullback-Leibler
              divergence. Phylogenetic analysis for the FDBR family was done using MEGA4
              and the Neighbor-Joining algorithm with 100 bootstrap replicates. Structural
              alignments were performed between the known PcyA structure and the structures
              predicted for PebA and PebB by the Protein Homology/analogY Recognition
              Engine (PHYRE) server. The group entropy analysis identified sequence residues
              most likely to be responsible for the distinctive features of each subfamily:
              PcyA: 98-H/N, 241-G/Y & 173-I/Y; PebA: 242-S/G & 239-D/C; PebB: 242-R/G
              & 168-E/W (family identifier: amino acid alignment position-in family/in the
              other families). Results from these analyses can be used to elucidate the chemical
              mechanisms and molecular basis of their unique specificities.
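
To make the group entropy step concrete, the following is a minimal sketch that scores one alignment column by the Kullback-Leibler divergence between the residue frequencies inside a subfamily and those in the remaining subfamilies. The function names, the pseudocount of 1.0 and the toy columns are assumptions chosen for illustration; the abstract does not describe the authors' actual implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(column, pseudocount=1.0):
    """Smoothed amino acid frequencies for one alignment column.

    The pseudocount (an assumed value) keeps every frequency non-zero so
    that the divergence below is always defined.
    """
    counts = Counter(r for r in column if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def group_entropy(in_family_column, other_column):
    """Score how strongly a column separates one subfamily from the rest."""
    p = residue_frequencies(in_family_column)
    q = residue_frequencies(other_column)
    return kl_divergence(p, q)


# Toy columns (hypothetical): the subfamily conserves H where the others have N.
score_distinct = group_entropy("HHHHHHHH", "NNNNNNNN")
# Identical columns carry no subfamily-specific signal.
score_same = group_entropy("HHHHHHHH", "HHHHHHHH")
```

Columns with a high score are candidates for subfamily-defining positions; a column that looks the same inside and outside the subfamily scores zero.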

POSTER LIST

FRIDAY, DECEMBER 10
5:45PM – 8:00PM           SNOWMASS CONFERENCE CENTER             SINCLAIR ROOM
                          (ACROSS FROM SILVERTREE HOTEL)


INTEGRATING THE HYPERGLOSSARY WITH A QUESTION ANSWERING SYSTEM
Michael A Bauer, University of Arkansas Little Rock

COMPARATIVE ONTOLOGICAL AND NETWORK ANALYSIS OF AGING ASSOCIATED GENES IN
HUMANS AND MODEL ORGANISMS
Ari E Berman, Buck Institute for Age Research

BIOINFORMATIC ELUCIDATION OF CONSENSUS PHOSPHORYLATION MOTIFS UTILIZING
INTER-SPECIES FUNCTIONAL DATA
Leethaniel Brumfield, NC State University

eQTL ANALYSIS IN DIABETIC NEPHROPATHY FOR CANDIDATE GENE DISCOVERY
Allison E Burlock, University of Michigan

PANGENOME-BASED TAXONOMY
Nicholas P Celms, San Diego State University

MAPPING INTERNATIONAL PROTEIN INDEX/UNIPROTKB TO AFFYMETRIX PROBE-SET
IDENTIFIER(S) TO FACILITATE BIOMARKER IDENTIFICATION IN MULTIPLE MYELOMA
Shweta S Chavan, University of Arkansas Little Rock

BLOOD SYSTEMS BIOLOGY FOR MULTISCALE MODELING OF HEART ATTACKS.
Scott L. Diamond, University of Pennsylvania

COMPARATIVE ANALYSIS OF THE FRACTIONS OF SECRETED PROTEINS ENCODED BY
BACTERIAL GENOMES
Yasmine T Elshakry, San Diego State University and Cairo University

ADAPTIVE LEARNING NEURAL NETWORKS FOR BINDING SITE SEARCH IN GENOMIC
SEQUENCES
Ivan Erill, University of Maryland Baltimore

COMPARISON OF CODON USAGE INDICES AS PREDICTORS OF GENE EXPRESSION IN
MUTATIONALLY BIASED GENOMES
Ivan Erill, University of Maryland Baltimore

PREDICTING FLEXIBILITY IN PROTEIN STRUCTURES
Elizabeth A Eskow, University of Colorado, Boulder

COMPARATIVE ANALYSIS OF APICOMPLEXAN BIOLOGICAL PROCESSES
Segun A Fatumo, Center for Tropical & Emerging Global Diseases

CATEGORYCOMPARE: HIGH-THROUGHPUT DATA META-ANALYSIS USING GENE
ANNOTATIONS
Robert M Flight, University of Louisville

BUILDING A HIGH-DENSITY, HIGH-THROUGHPUT SCALABLE GENOTYPE STORAGE AND
COMPUTING FRAMEWORK FOR USE IN LIVESTOCK GENOMIC RESEARCH
Fernanda S Foertter, Genus plc

              UPIC + GO: ZEROING IN ON INFORMATIVE MARKERS
              Dorarean D Ford, Mississippi Valley State University

              BRINGING COMPUTATION INTO AP BIOLOGY CLASSES
              Suzanne R Gallagher, University of Colorado

              ELASTIN POLYMORPHISMS ASSOCIATED WITH INCREASED RISK OF CARDIOVASCULAR
              DISEASE
              Mahboubeh MG Ghoryshi, University of Toronto

              INTERACTION SITES IN MODELS OF PROTEIN INTERACTION NETWORK EVOLUTION
              Todd A Gibson, University of Colorado Denver

              QUANTIFYING FOCAL ADHESION SPATIOTEMPORAL DYNAMICS THROUGH COMPUTATIONAL
              IMAGE ANALYSIS
              Shawn M Gomez, University of NC-Chapel Hill

              DETECTING GENOME-WIDE COPY NUMBER VARIATIONS IN A SINGLE SAMPLE USING NEXT
              GENERATION SEQUENCING DATA
              Rajesh K Gottimukkala, Life Technologies

              INTERNAL DUPLICATIONS IN ALPHA-HELICAL MEMBRANE PROTEIN TOPOLOGIES ARE COMMON
              BUT THE NONDUPLICATED FORMS ARE RARE
              Aron Hennerdal, Stockholm Center for Biomembrane Research

              INVESTIGATING THE POTENTIAL OF VIRAL PROCAPSIDS IN METABOLIC CHANNELING
              Kris Hon, University of Toronto, Department of Biochemistry

              mRNA-SEQ WORKFLOW AT MAYO CLINIC
              Asif Hossain, Mayo Clinic

              STOP USING JUST GO: A MULTI-ONTOLOGY ENRICHMENT ANALYSIS TOOL FOR GENES AND
              PROTEINS
              Emily J Howe, The Buck Institute

              EARLY DETECTION AND DYNAMICS OF RARE VIRAL VARIANTS BY ULTRADEEP SEQUENCING
              Peter T Hraber, Los Alamos National Lab

              STRUCTURE ALIGNMENT OF PROTEINS WITH LOW SEQUENCE IDENTITY BASED ON
              ENCODED LOCAL STRUCTURE ALPHABETS
              Kenneth Hung, Institute of Biomedical Engineering, National Taiwan University

              INVESTIGATING RELATIONSHIPS BETWEEN OBESITY AND THE BUILT ENVIRONMENT USING
              AGENT-BASED MODELING
              Helmet T Karim, University of Pittsburgh

              PROVIDING CONTEXT TO GENETIC ASSOCIATIONS WITH GENE EXPRESSION IN RENAL
              DISEASE
              Benjamin J Keller, Eastern Michigan University

              ALGORITHM FOR PHYLOGENETIC TREE BUILDING AND TAXONOMIC CLASSIFICATION
              USING CURATED PHYLOGENETIC TREE
              David A Knox, University of Colorado Anschutz

              DETECTING CASE-SPECIFIC KEY-PATHWAYS USING OMICS EXPRESSION DATA
              Hande Kucuk, Max Planck Institute

PHYLOGENETIC ANALYSIS AND STRUCTURE-FUNCTION RELATIONSHIPS IN THE
OXACILLINASE ENZYME FAMILY
Kimberly R Lesnock, Pittsburgh Supercomputing Center

RE-SEQUENCING WORKFLOW AT MAYO CLINIC
Ying Li, Mayo Clinic

MODELING GENE-SPECIES DATA BY GENERALIZED REPLICATOR DYNAMICS FOR EFFICIENT
PHYLOGENETIC INFERENCE
Ying Liu, University of North Texas at Dallas

CANCER GENE EXPRESSION DATA IS NOT NORMALLY DISTRIBUTED: ANALYSIS OF
DATA DISTRIBUTIONS AND THEIR EFFECTS ON GENE SELECTION AND MOLECULAR
CLASSIFICATION
Nicholas F Marko, Cleveland Clinic Department of Neurosurgery

ARTIFICIAL NEURAL NETWORK APPROACH FOR PROMOTER PREDICTION IN PROKARYOTIC
ORGANISMS BASED ON STRUCTURAL PROPERTIES OF DNA
Aleksandra A Markovets, University of Arkansas at Little Rock

MODELLING GENE EXPRESSION IN TUMOR PROGRESSION USING BINARY STATES
Juan Emmanuel Martinez-Ledesma, ITESM Campus Monterrey

FUNCTIONAL REGULATORY CIRCUITS INDUCED BY TRANSCRIPTION FACTORS AND SMALL
RNAs
Molly Megraw, Duke University

LARGE SCALE ANALYSIS OF THE SOLVATION PROPERTIES OF FOLDED PROTEINS
Marcelo C Melo, UFRJ

A CROSS SPECIES IDENTIFICATION OF SHARED TRANSCRIPTIONAL NETWORK OF DIABETIC
NEPHROPATHY
Viji S Nair, University of Michigan

RADIANT: INTERACTIVE VISUALIZATION OF TAXONOMIC ABUNDANCE
Brian D Ondov, National Biodefense Analysis & Countermeasures Center

TOPIARY EXPLORER
Megan A Pirrung, University of Colorado




DISCOVERY OF NEW LIGANDS FOR PPAR GAMMA BASED ON THIAZOLIDINE-4-ONE:
VIRTUAL SCREENING, MOLECULAR DOCKING AND RECEPTOR BINDING STUDY
Sujatha Ramasamy, Sathyabama University

BIOINFORMATIC AND COMPUTATIONAL CHARACTERIZATION OF ORF6: A PUTATIVE
THIOESTERASE
Maria M Rodriguez-Guilbe, Department of Biochemistry, University of Puerto Rico
School of Medicine

SPLICEGRAPHER: PREDICTING SPLICE GRAPHS FROM DIVERSE EVIDENCE
Mark F. Rogers, Colorado State University

A SUPPORT VECTOR CLASSIFIER FOR KORARCHAEOTA-CONTAINING HOT SPRINGS
Christian A Ross, The University of Nevada Las Vegas


              EVOLUTION OF PROTEIN STRUCTURE IN METAPNEUMOVIRUS
              Sunando Roy, Pennsylvania State University

              HIVTOOLBOX, AN INTEGRATED WEB APPLICATION AND DATABASE FOR INVESTIGATING HIV
              David P Sargeant, University of Nevada Las Vegas

              MOLECULAR MODELING AND DOCKING STUDIES OF SOME NOVEL DERIVATIVES
              OF N-PHENYL-2-(PYRIMIDIN-2-YLSULFANYL) ACETAMIDE AS ANTI SARS PROTEASE
              INHIBITORS
              Gyana R Satpathy, National Institute of Technology

              MOLECULAR DYNAMICS SIMULATIONS OF SEQUENTIALLY VARIED HUMAN
              IMMUNODEFICIENCY VIRUS-1 TAT CONSENSUS PROTEIN STRUCTURES
              Gyana R Satpathy, National Institute of Technology

              PRINSEQ, TAGCLEANER AND DECONSEQ: TOOLS FOR QUALITY CONTROL AND
              PRE-PROCESSING OF METAGENOMIC DATASETS
              Robert Schmieder, San Diego State University

              IN SILICO INFERENCE OF IMMUNOLOGICAL RELATIONSHIPS BETWEEN PROTEINS BASED
              ON THEIR CYTOTOXIC T-LYMPHOCYTE EPITOPE REPERTOIRES
              Werner Smidt, University of Pretoria

              CHARACTERIZING APICOMPLEXAN PARASITE METABOLISM BY FLUX BALANCE ANALYSIS
              OF TOXOPLASMA GONDII
              Carl Song, University of Toronto

              SIMPLE LOCAL ASSEMBLY PROGRAM
              Adam W Spargo, Wellcome Trust Sanger Institute

              COMPARING GENOMES USING THE PROFILES PACKAGE IN R
              Chris J. Stubben, Los Alamos National Lab

              ENRICHING REGULATORY NETWORKS WITH OTHER FUNCTIONAL RELATIONSHIPS
              Ronald C Taylor, Pacific Northwest National Laboratory

              FUN AND GAMES WITH RDF: MOVING RAT DATA ON TO THE SEMANTIC WEB
              Simon N Twigger, Medical College of Wisconsin

              CLOSING THE GAP IN TIME: FROM RAW DATA TO REAL SCIENCE (SCIENCE AS A SERVICE,
              SCAAS)
              Anjana Varadarajan, EdgeBio

              VALIDATION OF PROTEIN FUNCTIONAL SITE PREDICTIONS USING AUTOMATED
              BIOMEDICAL LITERATURE ANALYSIS
              Karin M Verspoor, University of Colorado Denver

              IMPROVING THE ACCURACY OF COEVOLUTION-BASED METHODS TO PREDICT
              PROTEIN-PROTEIN INTERACTIONS
              Guisong Wang, University of Maryland Baltimore County

              FOLD RECOGNITION AND ALIGNMENT FOR TRANSMEMBRANE PROTEINS
              Han L Wang, University of Missouri

              CONCEPTS AT PLAY IN SCIENTIFIC ARGUMENTATION
              Elizabeth K White, University of Colorado, Denver


DEFOG: DISCRETE ENRICHMENT OF FUNCTIONALLY ORGANIZED GENES
Tobias Wittkop, Buck Institute for Age Research

EXTRACTING ADVERSE DRUG REACTIONS FROM USER POSTS TO HEALTH-RELATED SOCIAL
NETWORKS
Laura Wojtulewicz, Arizona State University

FINDING COMMUNITY LEADERS IN SOCIAL NETWORKS
Xiaowei Xu, University of Arkansas at Little Rock

INTEGRATIVE NETWORK ANALYSIS TO PREDICT ENDOCRINE RESISTANCE IN BREAST
CANCER
Jason Xuan, Virginia Tech

PURIFICATION OF BACTERIAL APOA-1 AND CHARACTERIZATION OF NOVEL ANTICANCER
DRUG DELIVERY SYSTEM
Thurman Young, North Carolina State University





                       POSTER PRESENTATIONS

                       INTEGRATING THE HYPERGLOSSARY WITH A QUESTION
                       ANSWERING SYSTEM

                       Presenter: Michael A Bauer, University of Arkansas Little Rock
                       Authors: Michael Bauer, Robert Belford, Roger Hall, Daniel Berleant

                       ABSTRACT: We live in an age in which we have access to more information
                       than ever before, which can be a double-edged sword. Access to information
                       allows for a more informed and empowered researcher. There is a need for
                       information tools that bring back relevant information from multiple reliable
                       sources. Question answering (QA) is a specialized type of information retrieval
                       with the aim of returning precise short answers to queries posed as natural
                       language questions. The HyperGlossary (www.hyperglossary.org) is a tool
                       that we developed to automate the insertion of hyperlinks into a digital text
                       document or web page; the links connect words or phrases to relevant
                       material such as definitions of words, multimedia content, or multidimensional
                       structures for molecules. When a user reads a word or phrase in a document
                       that is connected to a glossary term, the information associated with the term
                       can be viewed without leaving the original document. We plan to integrate the
                       HyperGlossary with a question answering system. Potential answers chosen by the
                       QA system will be piped through the HyperGlossary before being returned to
                       the user. The integration of the HyperGlossary will allow not only the answer
                       to be linked back to the original document, but also its keywords and phrases
                       to be linked to additional sources of information. The system brings together
                       traditional text information and dedicated biological databases to present a
                       concise answer to the user.

                       COMPARATIVE ONTOLOGICAL AND NETWORK ANALYSIS
                       OF AGING ASSOCIATED GENES IN HUMANS AND MODEL
                       ORGANISMS

                       Presenter: Ari E Berman, Buck Institute for Age Research
                       Authors: Ari Berman, Tobias Wittkop, Emily Howe, Sean Mooney

                       ABSTRACT: A great deal of research over the past few decades has been
                       devoted to the study of aging in humans and model organisms. Despite the
                       steadily increasing foundation of research, the biological mechanisms of aging
                       remain an active area of study. Many genes have been implicated in the process
                       of aging, largely through the use of model organisms such as C. elegans,
                       D. melanogaster, and M. musculus. Although these genes shed light on the aging
                       process, it is not clear how they translate to other model organisms
                       and humans. In this study, we compared two gene sets associated with aging,
                       GenAge (human-related genes) and AnAge (aging genes in animal models),
                       in order to determine whether the gene sets share any functional and conceptual
                       similarities. Comparisons were performed using Gene Ontology, Swiss-Prot,
                       GeneMANIA, sequence similarity, and the NCBO automated concept annotator.
                       Generalized comparisons were made in order to evaluate the conceptual
                       similarities in the datasets. The results show commonalities between the model
                       organisms and humans. Additionally, this process could lead to a mechanism for
                       inferring the appropriate aging genes in humans.

BIOINFORMATIC ELUCIDATION OF CONSENSUS
PHOSPHORYLATION MOTIFS UTILIZING INTER-SPECIES
FUNCTIONAL DATA

Presenter and Author: Leethaniel Brumfield, NC State University

ABSTRACT: A key to controlling the rice blast fungus Magnaporthe oryzae is a
better understanding of its pathogenic mechanisms, an important part of which
resides in the ability of cellular signaling molecules (kinases) to phosphorylate
a core set of transcription factors (TFs) in a direct and controlled manner.
Therefore, the more we know about the downstream targets of kinases, their
associated pathways, and TF-regulated genes, the more effective efforts to
control pathogenicity will be. Previous pathway and network structure research
in S. cerevisiae and H. sapiens may be utilized to better understand cellular
signaling in M. oryzae when investigating similar proteins. Large-scale protein
phosphorylation microarrays can be used to accurately identify functional TF
targets of homologous kinases across these species. Potentially phosphorylated
binding motifs were identified in these TFs using the Pratt algorithm, which
detects sequence patterns. These TF phosphorylation motifs were used to examine
the shared functionality between homologous kinases. Our findings showed that in
all three species slightly more kinases fell into the MAPK kinase family than
into other families; enriched MAPK TFs reached 75.64% in M. oryzae, and 77.54%
and 90.78% in H. sapiens and S. cerevisiae, respectively. In our continued
mission to fully understand the transcriptional control of each gene and the
targets of each TF involved in controlling infection-related development and
pathogenicity, future research includes comparing the data compiled from Pratt
with other motif-finding programs and eventually composing a publicly accessible
online M. oryzae TF database.

eQTL ANALYSIS IN DIABETIC NEPHROPATHY FOR CANDIDATE
GENE DISCOVERY

Presenter: Allison E Burlock, University of Michigan
Authors: Allison Burlock, Benjamin Keller, Matthias Kretzler




ABSTRACT: In contrast to traditional candidate gene discovery using
guilt-by-proximity, expression quantitative trait loci (eQTL) analysis provides
an opportunity to link genetic variants directly to regulatory mechanisms of
disease. An eQTL analysis integrates gene expression with genotypic information
and covariates to identify genomic loci where genotype significantly affects
gene expression. We present our approach for performing an eQTL study of
tissue-specific expression from renal biopsies in a small population with early
stage diabetic nephropathy (DN). Renal biopsies from early DN were analyzed
using the Affy 6.0 SNP chip and the H133A and H133Plus2 expression chips.
Statistical analysis was done using Merlin. Our analysis was based on the 12,888
RefSeq transcripts common to the H133A and H133Plus2 platforms. SNPs were
limited to those with minor allele frequency (MAF) greater than 0.05. We
distinguish between cis and trans associations. A cis association is defined for
a SNP and a gene where the SNP is within a given distance of the gene; any
association that does not fit this definition is considered trans. For the trans
analysis, we look for hotspots, that is, many associations mapping to the same
chromosomal locus. A permutation test is used to verify that our results are not
likely to be due to chance.
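
The MAF filter and the cis/trans definition above can be sketched in a few lines. This is an illustrative sketch only: the `Snp` and `Gene` record types, the function names and the 1 Mb window are assumptions (the abstract says only "a given distance"), and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    name: str
    chrom: str
    pos: int
    maf: float


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def passes_maf_filter(snp, min_maf=0.05):
    """Keep SNPs whose minor allele frequency is greater than the cutoff."""
    return snp.maf > min_maf


def classify_association(snp, gene, window=1_000_000):
    """Label a SNP-gene pair as 'cis' or 'trans'.

    A pair is cis when the SNP lies on the gene's chromosome and within
    `window` bases of the gene; the window size is an assumed value.
    Everything else is trans.
    """
    if snp.chrom != gene.chrom:
        return "trans"
    if gene.start - window <= snp.pos <= gene.end + window:
        return "cis"
    return "trans"


# Hypothetical records for illustration.
snp = Snp("snp_a", "chr1", 1_500_000, 0.12)
gene_near = Gene("gene_x", "chr1", 2_000_000, 2_050_000)
gene_far = Gene("gene_y", "chr1", 9_000_000, 9_050_000)
gene_other = Gene("gene_z", "chr2", 1_500_000, 1_550_000)
labels = [classify_association(snp, g) for g in (gene_near, gene_far, gene_other)]
```

Counting how many trans labels map to the same SNP locus would then give the hotspot candidates mentioned in the abstract.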

                       PANGENOME-BASED TAXONOMY

                       Presenter: Nicholas P Celms, San Diego State University
                       Authors: Nicholas Celms, James Nulton, Peter Salamon, Robert Edwards

                       ABSTRACT: The pangenome, the complement of all genes, has been
                       calculated for different taxonomic groups that have sufficient numbers (>2)
                       of sequenced genomes. An organism’s gene composition has been modeled
                       as individual pools of genes with a probability of selection. Some genes occur
                       in pools with a high probability of selection, as they are found in all of the
                       genomes that contribute to a group, whereas genes found uniquely in one or
                       a few genomes have a very low probability of selection. These pools then serve
                       as a predictive model of the expected total pangenome size, and lead to a
                       predicted saturation curve for new genes. This saturation curve helps estimate
                       the sequencing required to confidently claim that all genes of a pangenome
                       have been sequenced. A matrix defines the pangenome for a sample set with
                       genes as columns, and strains as rows. Properties of this matrix lead to a
                       novel approach to taxonomy. Each set of genes with identical column vectors
                       is defined as a clique. Cliques represent gene sets with direct implication for
                       determining taxonomic proximity between strains. A clan is defined for each
                       clique as the set of strains in which the genes of the clique exist. Cliques and
                       clans are used to generate splits networks, which reproduce taxonomy among
                       the strains. Taxonomy based on gene sets is broadly applicable, and has been
                       demonstrated with bacteriophages. Our application to bacteriophage datasets
                       will lead to new conclusions about the phage proteomic tree.
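
The clique and clan definitions translate directly into a grouping over the columns of the presence/absence matrix. The sketch below is a minimal illustration under assumed names (`cliques_and_clans`, the toy strains and genes); it covers only the grouping step, not the construction of splits networks.

```python
from collections import defaultdict


def cliques_and_clans(matrix, strains, genes):
    """Group genes into cliques and attach each clique's clan.

    matrix[i][j] is 1 if strain i (row) carries gene j (column), else 0.
    Genes whose column vectors are identical form one clique; the clan of
    a clique is the set of strains in which those genes are present.
    """
    groups = defaultdict(list)
    for j, gene in enumerate(genes):
        column = tuple(row[j] for row in matrix)
        groups[column].append(gene)

    pairs = []
    for column, clique in groups.items():
        clan = frozenset(s for s, present in zip(strains, column) if present)
        pairs.append((frozenset(clique), clan))
    return pairs


# Toy pangenome matrix (hypothetical): rows are strains, columns are genes.
strains = ["s1", "s2", "s3"]
genes = ["gA", "gB", "gC", "gD"]
matrix = [
    [1, 1, 0, 1],  # s1
    [1, 1, 1, 0],  # s2
    [1, 1, 0, 0],  # s3
]
pairs = cliques_and_clans(matrix, strains, genes)
clan_of = dict(pairs)
```

Here gA and gB share the column (1, 1, 1) and so form one clique whose clan is all three strains, while gC and gD each form a single-gene clique with a one-strain clan.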

                       MAPPING INTERNATIONAL PROTEIN INDEX/UNIPROTKB
                       TO AFFYMETRIX PROBE-SET IDENTIFIER(S) TO FACILITATE
                       BIOMARKER IDENTIFICATION IN MULTIPLE MYELOMA




                       Presenter: Shweta S Chavan, University of Arkansas Little Rock
                       Authors: Shweta Chavan, John Shaughnessy Jr., Bart Barlogie, Ricky Edmondson

                       ABSTRACT: Affymetrix microarrays are widely used in genomics. Likewise,
                       the International Protein Index (IPI) database and UniProtKB are commonly used
                       in proteomics. However, a complete mapping from IPI to Affymetrix IDs is
                       currently unavailable, resulting in loss of critical information. Our objective
                       is to maximize mapping of these protein IDs to their corresponding Affymetrix
                       ID(s) to enable correlation of proteomics and genomics expression profiles.
                       Mappings were obtained by parsing the annotation files provided by IPI,
                       UniProtKB, Affymetrix, Ensembl, and NCBI to establish a link using all possible
                       identifiers that are common to IPI and/or UniProtKB and Affymetrix ID(s). If a
                       common identifier was unavailable, indirect links were created by performing
                       sequence alignments. Direct mapping linked 83.3% of the human IPI IDs to
                       Affymetrix IDs, while the remaining 16.7% of IPI IDs were subjected to sequence
                       alignments. A web-based tool, ‘ipi2affy’
                       (http://binf-app.host.ualr.edu/~shweta/cgi-bin/ipi2affy.cgi), was created to
                       enable ID conversions. (The same procedure is in progress for converting
                       UniProtKB IDs to Affymetrix IDs.) A proteomics dataset of 100 myeloma patient
                       samples with 2243 proteins identified will be converted to Affymetrix IDs using
                       ipi2affy and compared to available Affymetrix gene expression data to find the
                       correlation (if any). ‘ipi2affy’ and ‘UniprotkB2affy’ will therefore be used as
                       an aid for finding biomarker patterns that may or may not be consistent across
                       both the ‘omic’ levels.
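
The direct-mapping step described above amounts to joining two annotation tables on any shared cross-reference. The sketch below illustrates that idea only; the function name, the dictionary layout and every identifier in the toy data are hypothetical, and it is not the code behind the ipi2affy tool.

```python
def map_protein_ids(protein_xrefs, probe_xrefs):
    """Link protein IDs to probe-set IDs through shared cross-references.

    protein_xrefs: protein ID -> set of cross-reference IDs from its annotation
    probe_xrefs:   probe-set ID -> set of cross-reference IDs from its annotation
    Returns (mapped, unmapped): mapped is protein ID -> set of probe-set IDs;
    unmapped lists protein IDs with no shared identifier, which would go on
    to the sequence-alignment step.
    """
    # Invert the probe annotation so each cross-reference points at its probes.
    xref_to_probes = {}
    for probe, xrefs in probe_xrefs.items():
        for xref in xrefs:
            xref_to_probes.setdefault(xref, set()).add(probe)

    mapped, unmapped = {}, []
    for protein, xrefs in protein_xrefs.items():
        probes = set()
        for xref in xrefs:
            probes |= xref_to_probes.get(xref, set())
        if probes:
            mapped[protein] = probes
        else:
            unmapped.append(protein)
    return mapped, unmapped


# Made-up identifiers for illustration only.
protein_xrefs = {
    "IPI_TOY_1": {"ENSG_TOY_1", "REFSEQ_TOY_1"},
    "IPI_TOY_2": {"ENSG_TOY_9"},
}
probe_xrefs = {
    "PROBE_TOY_A": {"ENSG_TOY_1"},
    "PROBE_TOY_B": {"REFSEQ_TOY_1"},
}
mapped, unmapped = map_protein_ids(protein_xrefs, probe_xrefs)
```

Using every available cross-reference rather than a single identifier type is what lets one protein map to several probe sets, matching the "ID(s)" wording in the abstract.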

BLOOD SYSTEMS BIOLOGY FOR MULTISCALE MODELING OF
HEART ATTACKS.

Presenter: Scott L. Diamond, University of Pennsylvania
Authors: Scott Diamond, Manash Chatterjee, Matt Flamm

ABSTRACT: We deployed distinct approaches of bottom-up and top-down
analyses to gain insight into platelet signal transduction and coagulation
protease cascades. The bottom-up approach (Purvis et al., Blood, 2008; PLoS
Comp. Biol., 2009) involved a computational platelet model, assembled from
24 peer-reviewed platelet studies to yield 132 measured kinetic rate constants,
that accurately predicts resting and stimulated levels of cytosolic calcium,
IP3, diacylglycerol, phosphatidic acid, phosphoinositol, PIP, and PIP2. Similarly,
a bottom-up approach was used to model blood coagulation (Chatterjee et
al., PLoS Comp. Biol., 2010) in resting and activated blood. The platelet-plasma
model accounts for thrombin production in resting or activated blood. The
kinetic ODE model (76 species, 57 reactions, 105 kinetic parameters) predicted
the clotting of resting and convulxin-activated human blood, as well as the
clotting times of human blood under 50 different initial conditions that titrated
increasing levels of TF, Xa, Va, XIa, IXa, and VIIa. These approaches are now
being assembled in multiscale simulations of clotting under flow where: (1) the
changing flow field is solved by CFD or lattice Boltzmann, (2) platelet aggregation
and deposition is solved by lattice kinetic Monte Carlo, (3) soluble species (ADP,
TXA2, thrombin) are solved by continuum models for convection-diffusion-
reaction, and (4) platelet activation is solved by ODE models or patient-specific
trained neural networks (Chatterjee et al., Nature Biotech., 2010). Simulations
predict occlusion times for stenosed arteries exposing collagen and tissue factor.



                       COMPARATIVE ANALYSIS OF THE FRACTIONS OF SECRETED
                       PROTEINS ENCODED BY BACTERIAL GENOMES

                       Presenter: Yasmine T Elshakry, San Diego State University and Cairo University
                       Authors: Ahmed Mahmoud, Yasmine Elshakry, Ramy Aziz

                       aBstR ac t: Bacteria secrete proteins for different purposes, including nutrient
                       acquisition, cell-cell communication, and adaptation to their environment.
                       Host-associated bacteria, in particular, use secreted proteins to evade their
                       hosts’ immune systems and invade host tissues, which sometimes leads to
                       pathogenesis. among the best-studied secretion systems in both Gram-positive
                       and Gram-negative bacteria is the Sec-dependent secretion. with no exception,
                       all bacterial genomes encode proteins with sec-dependent secretion signals;
                       however, the proportion of such proteins varies widely from one genome to
                       another. in this study, we sought to determine the fractions of secreted proteins
                       (fsp) in bacterial genomes and the different parameters associated with their
                       variations. we optimized the hidden markov models and the neural network-
                       based methods in Signalp (http://www.cbs.dtu.dk/services/Signalp) to predict
                       secreted proteins in all bacterial genomes sequenced until January 2009.
                       Having calculated the fsp for each bacterial genome, we performed multivariate
                       statistical analysis to determine the significant covariates that affect these
                       fractions. we also performed general linear model analysis on a subset of 10
                       Gram-positive and Gram-negative genera with pathogenic and non-pathogenic
                       species. not surprisingly, the type of cell wall (Gram-positive or negative) was
                       a significant factor affecting the fsp. in addition, genome size showed partial
                       correlation (correlation coefficients between 0.35-0.54). However, the most
                       significant determinant remained the genus to which bacteria belonged.
                       interestingly, endosymbionts with the smallest genomes (e.g. Buchnera) were
                       found to encode minimal sets of secreted proteins. moreover, in some bacterial
                       classes, e.g, Gram-positive cocci, pathogenic species encode significantly higher
                       fsp than non-pathogenic species belonging to the same genera.
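
The two basic quantities in this abstract, the fsp of a genome and its correlation with genome size, can be sketched as follows. This is only an illustration: it assumes the per-genome counts of predicted secreted proteins are already available (for example from a signal peptide predictor), it uses a plain Pearson coefficient rather than the authors' multivariate analysis, and all function names are hypothetical.

```python
from math import sqrt


def fraction_secreted(n_secreted, n_proteins):
    """fsp: predicted secreted proteins divided by all proteins encoded by a genome."""
    return n_secreted / n_proteins


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def fsp_vs_genome_size(genomes):
    """genomes: list of (genome_size, n_secreted, n_proteins) tuples (assumed input format)."""
    sizes = [size for size, _, _ in genomes]
    fsps = [fraction_secreted(sec, total) for _, sec, total in genomes]
    return pearson(sizes, fsps)
```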

Adaptive Learning Neural Networks for Binding Site
Search in Genomic Sequences

Presenter: Ivan Erill, University of Maryland Baltimore
Authors: Joseph Cornish, Sumeet Bagde, Elisabeth Hobbs, Ivan Erill

Abstract: Artificial neural networks (ANN) and other machine learning
systems, like hidden Markov models, can be trained to become highly efficient
pattern recognition systems that are able to discern non-linear features on
complex backgrounds. For this reason, neural networks have been proposed
frequently as suitable search tools for the identification of transcription
factor binding sites in genomic sequences. Here we show that neural networks
trained with the standard backpropagation algorithm perform significantly
worse at locating transcription factor binding sites in genome sequences than
standard weight-matrix search techniques. We observe that this is due to the
ill-balanced nature of the search problem, which requires the identification of
a small number of sites against a very large background. We propose a new
algorithm, adaptive learning, based on a targeted sampling of the background
during backpropagation learning. We validate this approach by cross-validation
on an up-to-date collection of CRP sites from Escherichia coli. A portion of these
sites is used to train ANN committees with adaptive learning. The remainder
is used for benchmarking search efficiency against the original E. coli genome,
a randomly generated genome and the genome of Paenibacillus sp. Our
results demonstrate that adaptive learning of neural networks improves search
efficiency dramatically against all backgrounds. We also observe that enhanced
learning algorithms are likely to be hindered by the presence of unknown
positives on the original genome. We discuss the general implications of these
findings for machine learning approaches to binding site search.
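
For reference, the weight-matrix baseline that the neural networks are compared against can be sketched as below. This is a generic log-odds position weight matrix search, not the authors' implementation; the pseudocount, the uniform background frequency and the function names are assumptions made for the example.

```python
from math import log2


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from aligned, equal-length binding sites (A/C/G/T only)."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Score = log2(observed frequency / assumed uniform background frequency).
        pwm.append({base: log2(counts[base] / total / background) for base in "ACGT"})
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns a list of (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[j][sequence[start + j]] for j in range(width)))
        for start in range(len(sequence) - width + 1)
    ]
```

In a genome-scale search the positive windows are vastly outnumbered by background windows, which is the imbalance the abstract identifies as the reason plain backpropagation underperforms.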

Comparison of Codon Usage Indices as Predictors of
Gene Expression in Mutationally Biased Genomes

Presenter: Ivan Erill, University of Maryland Baltimore
Authors: Mindy Or, Isaac Jensen, Ivan Erill

Abstract: The study of codon usage bias (CUB) patterns is an important tool
in genome analysis. Codon usage bias indices measure either the deviation from
uniform codon usage distribution or the distance in CUB from a given reference
set representing the major codon bias. Both approaches provide dubious results
when analyzing CUB in genomes with other significant underlying patterns, such
as %GC bias or %GC skew. Here we analyze the behavior of six CUB indices with
and without correction for mutational bias, and we benchmark their efficiency
at predicting gene expression values from microarray expression data. Reference
sets are automatically determined using an iterative deterministic algorithm for
the detection of self-consistent major codon bias sets in genome sequences.
Our results show that the Relative Codon Adaptation (RCA) index outperforms
all other CUB indices, including the Codon Adaptation Index (CAI), at predicting
gene expression in genomes with markedly different mutational biases. We
also analyze the behavior of indices not relying on a reference set and their
corrections to integrate mutational bias. In combination with these indices, RCA
allows the development of methods to detect lateral gene transfer in genomes
displaying strong mutational bias.
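
To make the idea of a reference-set index concrete, here is a minimal sketch of the classical Codon Adaptation Index: each codon receives a relative adaptiveness equal to its count in the reference set divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of those values. The codon table is deliberately truncated to three amino acids to keep the example short, and the 0.5 pseudocount for unseen codons is an assumption. The RCA index and the mutational-bias corrections from the abstract are not reproduced here.

```python
from collections import Counter
from math import exp, log

# Truncated synonymous-codon table (illustration only; a real table covers all amino acids).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}


def codons(seq):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """w(codon) = count in reference set / count of the most used synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    weights = {}
    for family in SYNONYMS.values():
        top = max(counts[c] for c in family)
        for c in family:
            # Assumed 0.5 pseudocount avoids log(0) for codons absent from the reference set.
            weights[c] = max(counts[c], 0.5) / max(top, 0.5)
    return weights


def cai(gene, weights):
    """Geometric mean of the relative adaptiveness of the gene's scored codons."""
    logs = [log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no codons covered by the weight table")
    return exp(sum(logs) / len(logs))
```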

Predicting Flexibility in Protein Structures

Presenter: Elizabeth A Eskow, University of Colorado, Boulder
Authors: Elizabeth Eskow, Asa Ben-Hur, Hubert Yin, Debra Goldberg, Deanne
Sammond

Abstract: Protein flexibility, evident in conformational changes, plays a key
role in protein function and is essential to many interactions with other proteins
or molecules. We are developing a machine learning approach to predicting
protein flexibility at the residue level. Input features to a support vector machine
(SVM) classifier are calculated from experimental structures, including the
number and types of non-covalent interactions a side-chain participates in.
Features from neighboring residues are included as attributes to encourage
the prediction of regions of flexibility. Preliminary classification results will be
presented showing the promise of the method. We also discuss challenges
encountered in labeling residues as flexible or rigid for the construction of
training and test data. We believe that incorporating our method into a scoring
function used in computational protein design or protein docking software
will lead to improvements in these algorithms without incurring prohibitive
additional computational costs.

Comparative Analysis of Apicomplexan Biological
Processes

Presenter: Segun A Fatumo, Center for Tropical & Emerging Global Diseases
Authors: Segun Fatumo, Jessica Kissinger

Abstract: Apicomplexans are early-branching, unicellular, parasitic
eukaryotes related to ciliates and dinoflagellates (Baldauf 2003). Included
in the phylum Apicomplexa are several agents of human and animal disease,
such as Plasmodium spp. (the causative agent of malaria) and the AIDS-related
pathogens Cryptosporidium spp. and Toxoplasma gondii. The availability
of genome sequence for many apicomplexans provides an opportunity for
comparative analysis of biochemical pathways. In this work, we used OrthoMCL
(Li, Stoeckert and Roos 2003) to identify the orthologous genes that are
uniquely present across the entire phylum and orthologous genes across some
specific lineages within the phylum. We have compared twelve species within
the Apicomplexa and two ciliate outgroups and preliminarily mapped their
metabolic pathway reaction content. We mapped these data onto a tree of the
evolutionary relationships of these organisms (Kuo et al.) to determine
the lineage-specificity of the metabolic capacity of the organisms. In addition
to whole content comparisons, we discovered lineage-specific evolution of
individual proteins in terms of their protein domains as identified by Pfam.
Among the unique genes common to all species of Apicomplexa, 16.1% have
no assigned biological process according to Blast2GO, Pfam2GO and
EuPathDB, while about 37% of the genes are still hypothetical.

CategoryCompare: High-Throughput Data Meta-Analysis
Using Gene Annotations

Presenter: Robert M Flight, University of Louisville
Authors: Robert Flight, Jeffrey Petruska, Benjamin Harrison, Eric Rouchka

Abstract: Motivation: Many current DNA microarray and other
high-throughput data meta-analysis studies concentrate on deriving a concordant
list of genes across many experiments to discover the “true” genes responsible
for a particular disease process or biological pathway or cellular response.
However, by concentrating on the genes in common, similarities or differences
that exist at a pathway or process level are ignored. Results: We describe a
meta-analysis approach that allows comparison and contrast of gene lists at the
level of categorical annotation (pathway or Gene Ontology annotations). This
categorical evaluation compares enriched annotations between gene lists, and
displays the results graphically to allow intuitive visualization and exploration
of the similarities and differences. A false discovery correction is implemented
to control for the effect of different sized gene lists as inputs. Conclusion: The
approach was tested using two gene lists: genes involved in the response to
denervation in muscle (a literature compendium) and in skin (experimentally
determined). The categorical comparison highlights known biological
processes that are common to the two cases, while also making it easy to
see areas of difference that are not apparent from examining the gene lists
alone. Availability: The categoryCompare software is available as a Bioconductor
package, and a web interface (using RApache) has also been developed to
facilitate use in the wider research community.
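
The comparison of enriched annotations between gene lists can be sketched in a few lines. The example below assumes a hypergeometric over-representation test and a Benjamini-Hochberg correction; the abstract does not state which test or correction categoryCompare uses, so both are stand-ins, and all names here are hypothetical rather than part of the package's API.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): k annotated genes in a list of n, with K annotated genes among N total."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_lists(enriched_a, enriched_b):
    """Split two sets of enriched categories into shared and list-specific groups."""
    return {
        "both": enriched_a & enriched_b,
        "only_a": enriched_a - enriched_b,
        "only_b": enriched_b - enriched_a,
    }
```

Working at the category level means two lists with few genes in common can still be recognized as sharing a process, which is the point the abstract makes about gene-level concordance.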

Building a High-Density, High-Throughput Scalable
Genotype Storage and Computing Framework for Use in
Livestock Genomic Research

Presenter: Fernanda S Foertter, Genus plc
Authors: Fernanda Foertter, Matthew Cleveland, Selma Forni, Nader Deeb, Nan
Yu, Scott Newman, Chad Cropper

Abstract: Genus plc uses genomic data to estimate breeding values (BV)
for commercially relevant traits of cattle and pigs. Traditionally, BV are estimated
from phenotypes and pedigree, yet with genotyping and sequencing becoming
easier, faster and cheaper, genomic BV promise to increase accuracy and
decrease generation interval, thereby increasing rates of genetic improvement.
Our genetic evaluation datasets are growing at an unprecedented
rate, necessitating a high-performance database able to store large amounts
of genomic data from livestock populations with complex, known pedigree
structure and deep phenotypes. A current project at Genus is to create a
high-density, scalable, distributed genotype database able to store phenotypes
with SNP chip data and, in the future, individual sequence data. Combined with
our high-performance cluster, this will give us an efficient framework in which
we can mine the data constantly to identify candidate individuals to be
genotyped, impute missing genotypes, select cohorts for specific studies,
perform sequence alignment, estimate BV and run other routine genomic analyses.

UPIC + GO: Zeroing In on Informative Markers
Presenter: Dorarean D Ford, Mississippi Valley State University
Authors: Dorarean Ford, Renee Arias, Linda Ballard, Brian Scheffler, Mary Duke,
Sheron Simpson, Abigail Newsome

Abstract: Microsatellites/SSRs (simple sequence repeats) have become a
powerful tool in genomic biology because of their broad range of applications
and availability. A recently developed, efficient method for generating
microsatellite-enriched libraries, used in combination with high-throughput
DNA pyrosequencing on the Roche 454 platform, allows the isolation of large
numbers of microsatellites. Although very effective, screening hundreds of
microsatellites on large numbers of samples can be expensive. We introduce
UPIC + GO as a cost-effective tool that will zero in on informative markers with
discrimination power. This approach is an extension of UPIC (Unique Pattern
Informative Combinations), which provides users with a more economical plan
for choosing which markers to run in an experiment based on the obtainable
information and UPIC scores. We used as a model system Macrophomina
phaseolina, a soil-borne fungus that causes charcoal rot in numerous plant
species. Sequences were assembled into contigs and primers were designed on
repeat regions. Blast2GO was used to annotate sequences, and primers were
screened on 24 isolates of M. phaseolina. DNA fingerprinting provided amplicon
distinction, which was validated with GeneMapper analysis. UPIC scores were
calculated and used in association with the annotation to make biological
inferences about the isolates. Incorporating a priori knowledge about the
function of a discriminative marker will enhance the selection process in
experiments in a cost-effective manner.

Bringing Computation into AP Biology Classes

Presenter: Suzanne R Gallagher, University of Colorado
Authors: Suzanne Gallagher, Debra Goldberg

Abstract: Computation plays an important role in modern biology but is
almost non-existent in high school biology classes. We sought to correct this
by introducing a computational biology unit into Advanced Placement Biology
classes at two Boulder Valley high schools as part of the eCSite GK-12 program.
Students were given a brief introduction to algorithms and taught the basics
of the BLAST algorithm. Students were then given genetic sequences from 8
different species and asked to use BLAST to compare the sequences and build
a phylogenetic tree based on the BLAST results. This activity allowed students
to learn about genetics and some of the genetic evidence for evolution,
essential learnings in the existing biology curriculum, while at the same time
gaining experience with computation and how it is used in modern biology.
We had three goals in this activity: introduce students to computer science
and computational thinking, give students a demonstration of computational
biology and how computer science is being used to make biological discoveries,
and demonstrate why biology students should have an understanding of
the algorithms that they use. By and large we were successful at these goals,
although there were some expected and unexpected challenges along the
way. We hope to expand on this work by presenting the lesson to other classes
and integrating other aspects of computational biology into the high school
curriculum. This work may also serve as a guide in finding ways to integrate
computational biology into a revised AP curriculum.


Elastin Polymorphisms Associated with Increased Risk
of Cardiovascular Disease

Presenter: Mahboubeh MG Ghoryshi, University of Toronto
Authors: M Ghoryshi, D He, S Lemaire, D Milewicz, F Keeley, J Parkinson

Abstract: Elastin is a polymeric structural protein that self-assembles into
fibers that are responsible for the elastic properties of many tissues, including
large arteries. It constitutes approximately 30% of the dry weight of arteries.
Elastin fibers are remarkably stable, with little or no normal turnover over the
life-span of an individual; therefore, they must be able to withstand millions of
cycles of extension and recoil in tissues such as arteries without mechanical
failure. We hypothesize that subtle variations in elastin sequence can impact
elastin durability in arteries and consequently increase susceptibility to
cardiovascular diseases. Initial searches of public single nucleotide
polymorphism (SNP) databases revealed the presence of 264 SNPs in the elastin
gene (ELN), of which 13 are non-synonymous. One of these (rs2071307,
Gly422Ser), which converts a glycine to a serine, was found to significantly
impact the self-assembly and elasticity properties of elastin-like polypeptides.
In a more focused study, we are examining the functional consequences of
sequence variants in elastin on cardiovascular disease. Applying the Solexa
next-generation sequencing platform, we have sequenced ELN from 800 subjects
diagnosed with thoracic aortic aneurysm and dissection (TAAD), in addition to
400 control samples from Ontario residents. Our initial analyses identified an
additional 50 SNPs present in TAAD samples, including 2 novel non-synonymous
SNPs in exons 14 (Ala239Gly) and 31 (Pro736Gln). After confirming the increased
prevalence of these SNPs in the disease samples, their impact on elastin
assembly and function will be assessed through the production of appropriate
recombinant elastin-like polypeptides.

Interaction Sites in Models of Protein Interaction
Network Evolution

Presenter: Todd A Gibson, University of Colorado Denver
Authors: Todd Gibson, Debra Goldberg

Abstract: Theoretical models of biological networks are valuable tools in
evolutionary inference. Evolutionary network models featuring biologically
plausible evolutionary mechanics have shown the importance of gene
duplication and divergence in the evolution of protein interaction networks.
Though these duplication and divergence models are highly regarded,
they are not without shortcomings. Though both networks generated by these
models and empirical protein interaction networks are highly clustered, the
model-generated networks have substantially lower clustering than observed
in empirical data. We have enhanced the duplication and divergence model by
associating each protein’s interactions with one or more heritable interaction
sites. As genes duplicate, interaction sites are inherited by progeny proteins.
The loss of redundant interactions is resolved at the level of the interaction site,
modeling the effect of degenerative sequence mutations on interaction sites
on the surface of the protein. Heritable homomeric proteins and asymmetric
divergence are additional biological phenomena naturally captured by the
interaction site model. These model enhancements much more closely reflect
the clustering found in empirical networks.
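
For orientation, the baseline duplication and divergence model, without interaction sites, can be sketched as follows, together with the mean clustering coefficient used to compare generated and empirical networks. The seed network, the loss probability `p_loss`, the parent-child link probability `p_link` and the function names are assumptions made for this example; the interaction site extension described in the abstract is not implemented.

```python
import random


def duplication_divergence(n_nodes, p_loss=0.6, p_link=0.3, seed=0):
    """Grow an undirected network by node duplication followed by random edge loss."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed network: a single interaction
    while len(adj) < n_nodes:
        parent = rng.choice(sorted(adj))
        child = len(adj)
        adj[child] = set()
        for neighbor in sorted(adj[parent]):
            # Divergence: each inherited interaction is lost with probability p_loss.
            if rng.random() >= p_loss:
                adj[child].add(neighbor)
                adj[neighbor].add(child)
        # Optional parent-child interaction; without it this seed network stays bipartite
        # and can never contain a triangle.
        if rng.random() < p_link:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


def mean_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than two neighbors count as 0."""
    coefficients = []
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            coefficients.append(0.0)
            continue
        links = sum(1 for a in neighbors for b in neighbors if a < b and b in adj[a])
        coefficients.append(2 * links / (k * (k - 1)))
    return sum(coefficients) / len(coefficients)
```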

Quantifying Focal Adhesion Spatiotemporal Dynamics
through Computational Image Analysis

Presenter: Shawn M Gomez, University of NC-Chapel Hill
Authors: Matthew Berginski, Eric Vitriol, Klaus Hahn, Shawn Gomez

Abstract: The success of Google’s PageRank algorithm can largely be
attributed to its capability to compute the unknown importance of each
webpage based upon relationships encoded in the form of a directed graph
representing billions of existing webpages. A related approach has been
successfully applied to the protein interaction network alignment problem
by Singh et al. (2008, PNAS), where mappings between genes/proteins and
their interactions are recovered from graphically represented PPI networks
across different species. These approaches, however, do not take into account
edge information that may be of significant importance and help improve
prediction accuracy. We have developed a novel, generalized spectral algorithm
and applied it to a phylogeny mapping problem where the goal is to predict
interaction specificity between two families of paralogous proteins. The premise
is that, for certain interacting protein families, proteins will coevolve so as to
maintain functional interactions. Conceptually related to the idea of comparing
the structure of two phylogenetic trees, the degree of coevolution in our method
is ascertained by comparing the neighborhood structure, as described by
evolutionary distances between members, between the two protein families.
In paralogous sequence space, proteins that are diverging from their ortholog
will form a neighborhood structure similar to that of their interaction partner(s).
Therefore, interacting paralogous pairs can form a correlated neighborhood
structure/distance relationship, which our algorithm uses to make predictions of
interaction specificity. This algorithm can also be more generally applied to the
problem of weighted graph matching, with numerous applications in computer
vision, image analysis, and computational chemistry.
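
As background for the spectral approach, the PageRank computation that the abstract starts from can be sketched by power iteration on a small directed graph. This is the textbook algorithm only, with an assumed damping factor of 0.85 and a fixed iteration count; it is not the authors' generalized algorithm and it ignores the edge weights their method incorporates.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each node to the list of nodes it points to; every target must be a key."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node without outgoing links spreads its rank uniformly over all nodes.
            targets = links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank
```

The ranks converge to the principal eigenvector of the damped transition matrix, which is why the method and its network alignment relatives are described as spectral.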

Detecting Genome-Wide Copy Number Variations in a
Single Sample Using Next Generation Sequencing Data
Presenter: Rajesh K Gottimukkala, Life Technologies
Authors: Rajesh Gottimukkala, Fiona Hyland, Somalee Datta, Asim Siddiqui, Ryan
Koehler, Yutao Fu

Abstract: We present a sensitive and specific algorithm for calculating
genome-wide copy number variations (CNVs) using next generation sequencing
data. CNVs encompass more nucleotide content per genome than SNPs and
have recently been recognized as an important source of genetic variation.
Detecting CNVs with microarrays is limited by low resolution, which
deep sequencing methods overcome, allowing detection of CNVs of
arbitrary length. Methods such as CNV-seq and SegSeq detect CNVs in tumor
samples using deep sequencing but are constrained by the requirement
of a matched normal sample. Our method is based on depth of coverage and
detects CNVs in a single sample compared to the reference (not requiring a
matched normal sample) by performing effective normalization based on GC
content and genome mappability. Given that coverage depth in any region is
proportional to the number of times it appears in the sample, we calculate
coverage in variable-sized genomic windows, normalize it, use a hidden Markov
model for segmentation and apply empirically derived filters to the segments
to call CNVs. In the HuRef sample sequenced using the SOLiD(TM) system, we
observe concordance of 89%-97% (using window sizes of 2 kb-5 kb) with respect
to the Database of Genomic Variants. With simulated reads of coverage 1X-10X,
we observe overall sensitivity between 90% and 96%. Our method can not only
accurately detect CNVs of sizes ranging from a few hundred bases to regions
spanning a full chromosome (possible with cancer samples), but can also assign
a precise copy number and p-value to the regions.
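
The normalization step of a depth-of-coverage approach can be sketched as follows. The example divides each window's read count by the median count of windows with similar GC content and converts the ratio to a copy number under an assumed diploid baseline. The 0.1-wide GC bins and the function names are assumptions; the mappability correction, hidden Markov model segmentation and empirical filters from the abstract are omitted.

```python
from statistics import median


def gc_normalize(windows):
    """windows: list of (gc_fraction, read_count). Returns count / median count of the GC bin."""
    bins = {}
    for gc, count in windows:
        bins.setdefault(round(gc, 1), []).append(count)  # assumed 0.1-wide GC bins
    medians = {gc_bin: median(counts) for gc_bin, counts in bins.items()}
    ratios = []
    for gc, count in windows:
        reference = medians[round(gc, 1)]
        ratios.append(count / reference if reference > 0 else 0.0)
    return ratios


def copy_number(ratio, ploidy=2):
    """Nearest integer copy number, assuming a ratio of 1.0 corresponds to the ploidy."""
    return round(ratio * ploidy)
```

Using the median of each GC bin as the reference is what removes the need for a matched normal sample in this sketch, at the cost of assuming that most windows in a bin are at normal copy number.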

Internal Duplications in α-Helical Membrane Protein
Topologies Are Common but the Nonduplicated Forms
Are Rare

Presenter: Aron Hennerdal, Stockholm Center for Biomembrane Research
Authors: Aron Hennerdal, Jenny Falk, Erik Lindahl, Arne Elofsson

Abstract: Many alpha-helical membrane proteins contain internal
symmetries, indicating that they might have evolved through a gene duplication
and fusion event. Here, we have characterized internal duplications among
membrane proteins of known structure and in three complete genomes. We
found that the majority of large transmembrane (TM) proteins contain an
internal duplication. The duplications found showed a large variability both
in the number of TM segments included and in their orientation. Surprisingly,
an approximately equal number of antiparallel duplications and parallel
duplications were found. However, of all 11 superfamilies with an internal
duplication, for only one, the AcrB multidrug efflux pump, could the duplicated
unit be found in its nonduplicated form. An evolutionary analysis of the AcrB
homologs indicates that several independent fusions had occurred, including the
fusion of the SecD and SecF proteins into the 12-TM protein SecDF in Brucella
and Staphylococcus aureus. In one additional case, the Vitamin B12
transporter-like ABC transporters, the protein had undergone an additional
fusion to form a protein with 20 TM helices in several bacterial genomes. Finally,
homologs to all human membrane proteins were used to detect the presence of
duplicated and nonduplicated proteins. This confirmed that only in rare cases
can homologs with different duplication status be found, although internal
symmetry is frequent among these proteins. One possible explanation is that
duplication and fusion events frequently happen simultaneously and that there
is almost always a strong selective advantage for the fused form.

                       i n v E s t i g at i n g t h E p o t E n t i a L o f v i R a L p R o c a p s i D s i n
                       M E ta B o L i c c h a n n E L i n g

                       Presenter: Kris Hon, University of Toronto, Department of Biochemistry
                       Authors: Kris Hon, Diane Bona, Alan Davidson, Karen Maxwell, John Parkinson

                       aBstR ac t: metabolic channeling is the phenomenon where intermediates
                       are transferred between the active sites of biochemically sequential enzymes
                       without release into the bulk cytosol. Some examples of metabolic channeling
                       in nature include direct tunnelling between active sites and bifunctionalization
                       of enzymes. it provides benefits such as decreased travel time to active sites,
                       limited metabolite inhibition of other enzymes, restricted release of toxic/
                       unstable intermediates and pathway flux control. Various attempts have been
                       made to emulate nature’s implementation of metabolic channeling for the
                       purposes of metabolic engineering. These include the use of protein scaffolds
                       and gene fusions. Despite their moderate success, these enzyme co-localization
                       techniques do not prevent metabolites from being released into the bulk
                       cytosol and diffusing away. To explore this problem, we used our cellular
                       simulation tool, Cell++, to compartmentalize enzymes of interest and observe the
                       biochemical effects. Our results indicated that compartmentalization of certain
                       enzymes, such as UDP-N-acetylglucosamine 1-carboxyvinyltransferase and
                       UDP-N-acetylmuramate dehydrogenase, greatly increases the catalytic efficiency
                       of their respective biochemical pathways. To this end, we are investigating
                       the potential of viral procapsids as a micro-compartment for confirming the
                       results from our Cell++ simulations. Viral proteases responsible for viral capsid
                       maturation prior to DNA packaging are believed to be targeted to the interior of
                       the procapsid through a peptide targeting sequence. This allows us to potentially
                       target enzymes of choice into the viral procapsid and investigate whether or not
                       metabolic channeling is occurring between the targeted enzymes.

                       mRNA-Seq Workflow at Mayo Clinic

                       Presenter: Asif Hossain, Mayo Clinic
                       Authors: Asif Hossain, Yan Asmann, Sumit Middha, Saurabh Baheti, Zhifu Sun,
                       High-Seng Chai, Xiao-Yu Liu, Ying Li, Asha Nair, Eric Klee, Jean-Pierre Kocher

                       Abstract: Next Generation Sequencing technology has become the most
                       powerful tool for transcriptome profiling. However, the value of this technology
                       has been constrained by the limited number of analytic tools available
                       for comprehensive interpretations of the mRNA-Seq data. We developed
                       SnowShoes-EX, an analytic pipeline, to integrate rapid sequence alignment with
                       a rich set of sequence annotation tools. The software integrates open source
tools for sequence alignment and variant calling, and novel algorithms for
fusion transcript detection, novel transcript identification, and alternative splicing
discovery. Sequencing reads from FASTQ files are aligned to the reference
genome and an in-house developed exhaustive one-directional exon-junction
database using the BWA aligner. RefSeq gene and exon read-counts are then
rapidly computed using an approach based on the UCSC genome binning
algorithm. Reads perfectly mapped to multiple locations on the reference
are assigned to features using the expectation-maximization algorithm. The open
source SNVMix tool is used for allele calling with a probabilistic model. Fusion
transcripts and novel transcripts are identified using our in-house developed
algorithms and peak-finding tools, respectively. The alternatively spliced exons
are identified using a multi-variant ANOVA model and subsequently validated by
corresponding junction reads. The SnowShoes-EX package is optimized to run
on the Sun Grid Engine platform and can produce basic results within minutes
after alignment. With decreasing sequencing cost, the utility of mRNA-Seq data
will be determined by the quality of analytic tools available to researchers. We
have developed a pipeline to rapidly provide investigators with interpretable
information from large mRNA-Seq datasets.
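
The read-counting step can be illustrated with a minimal Python sketch of the standard UCSC binning scheme. This is not the SnowShoes-EX code: the function names, the single-chromosome simplification, and the toy coordinates are assumptions made for illustration. Intervals are half-open, and the constants are those of the standard five-level scheme (128 kb finest bins, coordinates up to 512 Mb).

```python
from collections import defaultdict

# Constants of the standard five-level UCSC binning scheme.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # finest bins span 2**17 = 128 kb
BIN_NEXT_SHIFT = 3    # each coarser level spans 8 finer bins


def bin_from_range(start, end):
    """Return the smallest bin that fully contains the half-open interval [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("interval exceeds the 512 Mb range of the standard scheme")


def overlapping_bins(start, end):
    """Return every bin that may hold a feature overlapping [start, end)."""
    bins = []
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return bins


def count_reads(exons, reads):
    """Count reads overlapping each exon on one chromosome (hypothetical helper)."""
    index = defaultdict(list)
    for name, start, end in exons:
        index[bin_from_range(start, end)].append((name, start, end))
    counts = {name: 0 for name, _, _ in exons}
    for read_start, read_end in reads:
        for b in overlapping_bins(read_start, read_end):
            for name, start, end in index.get(b, ()):
                if read_start < end and start < read_end:  # half-open overlap test
                    counts[name] += 1
    return counts


# Toy data: two exons and four aligned reads (made-up coordinates).
exons = [("exon1", 1000, 1200), ("exon2", 150000, 150300)]
reads = [(1100, 1150), (1190, 1240), (149990, 150040), (5000, 5050)]
print(count_reads(exons, reads))
```

Binning lets each read be compared only against exons stored in a handful of candidate bins rather than against every exon, which is why the approach scales to large read sets.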

STOP Using Just GO: A Multi-Ontology Enrichment
Analysis Tool for Genes and Proteins

Presenter: Emily J Howe, The Buck Institute
Authors: Emily Howe, Uday Evani, Mathew Fleish, Nigam Shah, Sean Mooney

Abstract: Enrichment analysis is a common technique among biologists,
used to reduce a large set of annotations to a smaller and more manageable
set of significantly represented concepts. Currently, enrichment analysis is
done primarily using the Gene Ontology (GO). Because enrichment analysis is a
reduction technique, the quality of the results depends entirely on the data used
to create them. Although GO has been largely useful, there are entire domains
of research that are not considered as part of that ontology (such as diseases or
phenotypes). To solve this problem we have created STOP (Statistical Tracking of
Ontological Phrases), a multi-ontology automated enrichment analysis tool for
performing GO-like enrichment analysis on genes and/or proteins using other
ontologies. STOP gathers text related to genes from the NCBI Entrez database,
and this text is then automatically annotated using the Stanford NCBO Annotator. The
NCBO Annotator currently annotates with terms from over 200 ontologies.
STOP will perform enrichment analysis using anywhere from one to all of the
annotated ontologies. Users can select their own background dataset, or STOP
will use a predefined background of the entire genome for a given species. STOP
is currently fully implemented for human genes. Human proteins and other
species are currently under development.
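
The abstract does not say which statistical test is used to score a term. As an illustration only, the sketch below assumes the one-sided hypergeometric test that is common in GO-style enrichment analysis; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, background_hits, study_size, study_hits):
    """P(X >= study_hits) under a hypergeometric null (assumed test, not confirmed).

    background_size: genes in the background set
    background_hits: background genes annotated with the ontology term
    study_size:      genes in the user's gene list
    study_hits:      genes in the list annotated with the term
    """
    upper = min(background_hits, study_size)
    total = comb(background_size, study_size)
    tail = sum(
        comb(background_hits, i) * comb(background_size - background_hits, study_size - i)
        for i in range(study_hits, upper + 1)
    )
    return tail / total


# Made-up example: a term annotating 200 of 20,000 background genes is found
# on 10 genes of a 100-gene study list (only 1 would be expected by chance).
p = enrichment_p_value(20000, 200, 100, 10)
print(f"p = {p:.3g}")
```

When many terms from many ontologies are tested at once, a multiple-testing correction would normally be applied to the resulting p-values.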




                       Early Detection and Dynamics of Rare Viral Variants
                       by Ultradeep Sequencing

                       Presenter: Peter T Hraber, Los Alamos National Lab
                       Authors: Peter Hraber, Will Fischer, Elena Giorgi, Thomas Leitner, Tanmoy
                       Bhattacharya, Bette Korber

                       Abstract: The ability to detect rare viral variants as they accumulate under
                       cytotoxic-T-lymphocyte selection illuminates the evolution of viral escape from
                       the immune system. While Sanger sequencing via single-genome amplification
                       (SGA) yields highly accurate sequences, detecting rare variants requires intensive
                       sampling. In contrast, next-generation sequencing technologies yield 3-4 orders
                       of magnitude more sequences, and provide sensitivity to detect rare variants
                       undetected by SGA sequencing. Previous sequencing results informed design
                       of ultradeep pyrosequencing strategies for two longitudinal studies: (1) the
                       SIV p199RY Nef epitope in experimentally infected macaques and (2) whole-
                       genome HIV-1 subtype B from the CHAVI 001 study participant designated as
                       subject 700010040 (CH40). The SIV study follows p199RY epitope evolution in
                       5 Mamu A*01+/A*02+ rhesus macaques intravenously infected with 60,000
                       copies per ml of SIVmac251 inoculum, to quantify frequencies of epitope
                       variants at 21, 35, and 84 days post-infection. The HIV-1 study represents the
                       viral genome with 35 overlapping regions that average 500 nt (median 501,
                       range 448-556 nt), from 5 longitudinal samples spanning acute to chronic
                       infection. With Sanger sequencing results guiding amplicon design, ultradeep
                       sequencing yielded 24,870-110,200 (median 48,719) SIV reads and 3,762-
                       27,222 (median 13,801) HIV reads per amplicon region sampled. Ultradeep
                       sequencing identified early waves of escape variants that had previously been
                       undetected by limited conventional sequencing, and helps elucidate when and
                       how selection influences viral evolution.
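
Why read depth matters for rare variants can be seen from a simple sampling calculation. The sketch below assumes reads are independent, error-free draws from the viral population, which real pyrosequencing data are not, so it gives an optimistic bound. The function names and the depth of 50 used to stand in for a conventional sample are hypothetical; the two larger depths are the lowest per-amplicon read counts quoted in the abstract.

```python
import math


def detection_probability(freq, n_reads):
    """Probability of seeing a variant of frequency `freq` at least once in n_reads."""
    return 1.0 - (1.0 - freq) ** n_reads


def reads_needed(freq, confidence=0.95):
    """Smallest number of reads that detects the variant with the given confidence."""
    if not 0.0 < freq < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


# A variant present in 0.1% of genomes, sampled at three depths.
for depth in (50, 3762, 24870):
    print(depth, round(detection_probability(0.001, depth), 4))
print("reads for 95% detection of a 1% variant:", reads_needed(0.01))
```

In practice the sequencing error rate sets a floor on the variant frequencies that can be distinguished from noise, so a single observation would not be accepted as evidence of a variant.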

                       Structure Alignment of Proteins with Low Sequence
                       Identity Based on Encoded Local Structure Alphabets

                       Presenter: Kenneth Hung, Institute of Biomedical Engineering, National Taiwan
                       University
                       Authors: Kenneth Hung, Jui-Chih Wang, Cheng-Wei Chen, Cheng-Long Chuang,
                       Kun-Nan Tsai, Chung-Ming Chen

                       Abstract: Protein structure alignment is one of the major steps in
                       understanding the relationship between protein structures and evolution. The
                       basic framework of structure alignment algorithms is usually composed
                       of two major steps, i.e., finding a reasonably good estimate of the initial
                       solution and an iterative optimization process. Under this framework,
                       conventional algorithms may have two potential deficiencies, i.e., deterioration
                       of the final alignment due to a poor initial alignment and the long computation
                       time caused by the iterative optimization process. This paper proposed a new
vector-based algorithm with a hierarchical three-step framework, i.e., the vector-
based initial alignment offering potential pairings of SSE vectors of the two
proteins, the alphabet code-based local structure alignment, and the rigid-body
transformation. The test data, 600 protein pairs downloaded from SCOP with
less than 30% sequence identity, were employed in this study to assess the
alignment quality and the computational efficiency. The statistical analysis of
alignment quality based on match index (MI) demonstrated that the proposed
algorithm outperformed three other algorithms, i.e., CE, SSM and TM-align.
Moreover, the proposed algorithm was shown to be more computationally
efficient than the CE, SSM and TM-align algorithms. The improved alignment
quality and computational efficiency may be attributed to the one-dimensional
alphabet code-based local structure alignment, which not only yields a better
estimate of initial solution for further optimization, but also makes it possible to
employ a non-iterative rigid-body transformation to achieve a better alignment
than the conventional iterative optimization process.

Investigating Relationships Between Obesity and the
Built Environment Using Agent-Based Modeling

Presenter: Helmet T Karim, University of Pittsburgh
Authors: Helmet Karim, Leming Zhou

Abstract: Obesity has become a world-wide epidemic. It is the result
of complex interactions among many different factors such as genetics,
environment, behavior, culture, and social networks. Extensive work
has been done on these factors in various fields, for instance, genome-wide
association studies for determining the genetic causes of obesity and statistical
investigations on the prevalence of obesity based on large-scale surveys. To
obtain a dynamic view of the complex interactions among various environmental
factors in the prevalence of obesity, in this work we propose to create an agent-
based model. In this model, agents are people and their direct environment
such as markets, restaurants, workplaces, homes, and gyms. Rules governing
the behavior of these agents are constructed based on extensive literature
review. This agent-based model can visually present the dynamic interaction
among various agents in the model. Users of this model can conveniently adjust
the parameters in this model in real time to observe the sensitivity to each
parameter and the behavioral changes of those agents. After the calibration
of parameters in this model, we have observed some reasonable results. For
instance, an increased number of fast-food restaurants in the area or a longer
distance from healthy groceries would have negative effects on weight and
BMI of the population. Further work on this model should provide us more
meaningful results in the near future. The success of this model may open doors
to more comprehensive models and therefore a more accurate picture of factors
related to obesity.




                       Providing Context to Genetic Associations with Gene
                       Expression in Renal Disease

                       Presenter: Benjamin J Keller, Eastern Michigan University
                       Authors: Benjamin Keller, Sebastian Martini, Matthias Kretzler

                       Abstract: Candidate SNPs from genetic association studies are often
                       without context in terms of a molecular interpretation. Historically, scientists
                       performing a GWAS will translate the SNPs to candidate genes using guilt-by-
                       proximity, explore gene annotation and perhaps pathways, but not go much
                       further. Recent studies may take a stronger step by employing small studies
                       treating expression as a quantitative trait to help interpret SNPs relative to gene
                       expression in cell lines frequently derived from peripheral blood. Our focus is
                       on regulatory systems affected by SNPs, and we will discuss our experience using
                       renal tissue expression to interpret renal disease GWAS candidates with two
                       approaches: one, linking across populations, correlating a quantitative clinical
                       trait with expression under guilt-by-proximity; and the other, within the same
                       population, associating renal tissue expression with genotype in an eQTL
                       analysis. We discuss our experience with these approaches, and the directions
                       the results allow us to take.

                       Algorithm for Phylogenetic Tree Building and
                       Taxonomic Classification Using Curated Phylogenetic Tree

                       Presenter: David A Knox, University of Colorado Anschutz
                       Authors: David Knox, Robin Dowell

                       Abstract: There are bacterial communities all around us in the soil, oceans,
                       and even within the human body, which are vital to everyday human life. These
                       microbiomes contain diverse groups of bacteria working in unison to maintain
                       healthy environments. When the environments are altered by pollution or
                       disease, the community changes in both dramatic and subtle ways. Determining
                       the makeup of the community and identifying significant changes between
                       communities have only recently become possible using high-throughput DNA
                       sequencing technology. However, sequencing technology has advanced faster
                       than the analysis methods for interpreting the data. Phylogenetic trees are
                       used to determine the relatedness of organisms within the community, but
                       most tree building algorithms are exponential in execution time, which makes
                       them impractical on data sets with hundreds of thousands of sequences. This
                       work presents a new application, ParsInsert (Parsimonious Insertion), which
                       exploits the knowledge provided by publicly available curated phylogenetic
                       trees to efficiently produce both a phylogenetic tree and taxonomic classification
                       for sequences from a microbial community. Using a guide tree that curators
                       have established as having the best topology given current knowledge,
                       ParsInsert assigns a parsimonious sequence and a taxonomic classification
                       for common ancestors (internal nodes) of the tree. The ParsInsert O(n)
                       algorithm concurrently inserts all the unclassified sequences into the tree
while maintaining the original topology. It also infers the taxonomy for each
unclassified sequence by examining the classification at the insertion site.
Results show that 88% of the sequences are classified to the Family rank with
98%.

Detecting Case-Specific Key-Pathways Using Omics
Expression Data

Presenter: Hande Kucuk, Max Planck Institute
Authors: Hande Kucuk, Nicolas Millman, Mayank Kumar, Jan Baumbach

Abstract: Computational systems biology methods today help life scientists
explore the masses of available data. Different methods and tools have
been developed in an attempt to serve the need for extraction, exploration
and deeper analysis of the data. For instance, statistical and machine learning
methods have been successfully used in identifying pattern signatures. However,
these methods fail to give deeper insights into OMICS expression data. In this
work, we introduce aCoGea, a Cytoscape plug-in that allows extracting and
visualizing sub-pathways that may be of interest given the results of a series of
gene expression studies. We aim to detect “highly-connected” sub-networks
where most genes show “similar” expression behavior. In particular, given
a network and gene expression study data, those maximal sub-networks are
identified where all but n nodes of the network are expressed similarly in all
but m cases of the user-specified gene expression study data. As finding such
modules is computationally intense, we developed and implemented heuristic
algorithms based on Ant Colony Optimization. Here, we present initial results
and first evaluations for a PPI network and two cancer data sets.

Phylogenetic Analysis and Structure-Function
Relationships in the Oxacillinase Enzyme Family

Presenter: Kimberly R Lesnock, Pittsburgh Supercomputing Center
Authors: Kimberly Lesnock, Brian Chen, Agnieszka Szarecka, Troy Wymore

Abstract: Bacterial resistance to antibiotics is often facilitated through the
action of beta-lactamases and is an emergent and critically important health
challenge. Thus understanding the evolution and evolvability of these enzymes
with regards to the binding and hydrolysis of antibiotics can inform and guide
the development of new generations of these drugs. In this presentation, we focus on
analysis of class D beta-lactamases (oxacillinases) responsible for hydrolyzing
the last-resort antibiotic carbapenem. A refined multiple sequence alignment of
over 80 sequences was constructed using MEME and 3-dimensional structure
as a guide. Subsequent phylogenetic analysis revealed several distinct groups of
oxacillinases, and the residues that most distinguish these groups were
identified using Group Entropy. Of particular interest is the discovery of a disulfide
bond within one group and residues lying adjacent to the substrate-binding
region. Finally, through mutual information and other methods we identify a
network of residues that possibly co-evolve.

                       Re-Sequencing Workflow at Mayo Clinic

                       Presenter: Ying Li, Mayo Clinic
                       Authors: Ying Li, Yan Asmann, Sumit Middha, Asif Hossain, High Seng Chai, Asha
                       Nair, Saurabh Baheti, Jean-Pierre Kocher

                       Abstract: Re-sequencing of targeted regions in the human genome has been
                       rising rapidly among next generation sequencing applications due to its relatively
                       low cost compared to whole-genome sequencing and its capability for novel
                       variant discovery. Aiming to provide accurate and more biologically interpretable
                       results quickly for researchers at Mayo Clinic and beyond, our bioinformatics
                       group has developed a re-sequencing workflow to generate Single Nucleotide
                       Variant (SNV) and indel calls in parallel. Each resulting variant is associated
                       with a broad range of annotations and hyperlinked to an IGV visualization
                       of the region against selected UCSC tracks. After investigating several aligners
                       and variant callers, BWA-GATK and MAQ-MAQ alignment and variant calling
                       combinations were selected for identifying indels and SNVs, respectively.
                       Annotations from various public databases provide researchers immediate
                       access to dbSNP rsIDs, allele frequencies based on HapMap and 1000 Genomes
                       populations, together with SeattleSeq and/or SIFT functional predictions. Other
                       features of the workflow include distance of the variants to the closest exon/
                       intron boundary on each transcript, tissue-specific expression of transcript and
                       pathway enrichment analyses for the genes of interest after variant filtering.
                       Integrated web delivery of analysis results not only is user-friendly but also
                       allows researchers to perform dynamic filtering on key features to refine their
                       search. Parallelization and optimization of the workflow reduced the run time
                       significantly from 70+ hours to as low as 3 hours.

                       Modeling Gene-Species Data by Generalized Replicator
                       Dynamics for Efficient Phylogenetic Inference

                       Presenter and Author: Ying Liu, University of North Texas at Dallas

                       Abstract: In recent years, biclique methods have been proposed to
                       construct phylogenetic trees. One of the key steps of this method is to find
                       complete sub-matrices (no missing entries) from a species-genes binary matrix.
                       Sanderson et al. [1] formulated it as the problem of enumerating all maximal
                       bicliques. As widely accepted by phylogeneticists, bicliques that have both a
                       large number of species and a large number of genes yield more informative
                       phylogenetic trees. This leads to the conclusion that a balanced biclique is
                       preferred to help phylogenetic inference. Exact algorithms for the maximal
                       biclique enumeration problem are not efficient in finding balanced bicliques,
                       and they are not able to reveal the relationship among these bicliques. In this paper,
                       we identified the distinct ladder-like overlapping structure of bicliques that exists
                       in the species-genes matrix for discovering balanced bicliques. Such structure
                       can be easily used to select balanced bicliques. We approached the problem
                       of finding the ladder-like overlapping structure of bicliques by generalizing a
well-known evolutionary selection model, replicator dynamics, to a new discrete
dynamical system, called generalized replicator dynamics. Empirical study shows
our method is effective and efficient for phylogenetic inference.
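
For readers unfamiliar with the starting point, the following is a minimal sketch of the standard discrete-time replicator dynamics, x_i <- x_i (Ax)_i / (x . Ax), on a small symmetric similarity matrix. It is not the generalized system proposed in the abstract, whose details are not given here; the matrix, the function names and the iteration count are assumptions chosen for illustration.

```python
def replicator_step(x, A):
    """One step of standard discrete replicator dynamics on the simplex."""
    n = len(x)
    fitness = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(x[i] * fitness[i] for i in range(n))
    if mean_fitness == 0:
        return list(x)  # no interaction at all: leave the state unchanged
    return [x[i] * fitness[i] / mean_fitness for i in range(n)]


def run_replicator(A, steps=200):
    """Iterate from the uniform distribution and return the final weights."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = replicator_step(x, A)
    return x


# Toy graph: nodes 0, 1, 2 are mutually connected; node 3 hangs off node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
weights = run_replicator(A)
print([round(w, 4) for w in weights])
```

In this toy example the weight concentrates on the three mutually connected nodes and the weight of the loosely attached node decays toward zero, which is the selection behavior that makes replicator dynamics useful for extracting densely connected groups.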

Cancer Gene Expression Data Is Not Normally
Distributed: Analysis of Data Distributions and
Their Effects on Gene Selection and Molecular
Classification

Presenter: Nicholas F Marko, Cleveland Clinic Department of Neurosurgery
Authors: Nicholas Marko, Robert Weil

Abstract: Introduction: The distribution of gene expression in cancer
transcriptomes is generally assumed to conform to a normal distribution, and
many algorithms for molecular classification and gene selection are predicated
upon this assumption. This assumption may not be valid and may contribute to
inconsistencies and inaccuracies in translational molecular oncology research.
Methods: We analyzed the 2nd-4th central moments of gene expression data
distributions from each of five publicly-available cancer microarray datasets
and compared them to those of the normal distribution. We then used curve
fitting to identify which of 53 theoretical distributions best approximated the
actual distribution of each expression data set. Finally, we compared a Box-
Cox-normalized sixth dataset to its untransformed counterpart to investigate
the potential effects of non-normal distributions on gene selection and
molecular classification. Results: The 2nd-4th central moments of all datasets
demonstrated statistically-significant differences from those of the normal
distribution. Curve fitting suggested that modeling cancer gene expression
distributions requires multi-parameter, generalized distributions, including
the beta, gamma, and Weibull. Application of several common molecular
classification algorithms before and after Box-Cox normalization yielded different
results, and expression profiles distinguishing identical subgroups of this data
differed by an average of 15% before and after transformation. Conclusions: The
distribution of cancer gene expression data is not normal and is best modeled
using multi-parameter, generalized distributions. This deviation affects the results
of many standard algorithms for gene selection and molecular classification.
Algorithms that do not assume normality may be necessary for accurate
genomic analysis of cancer.
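
The quantities named in the Methods can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis: the helper names and the toy values are hypothetical, and the Box-Cox parameter is fixed by hand rather than estimated from the data. For a normal distribution the skewness and the excess kurtosis are both zero.

```python
import math


def central_moment(values, k):
    """k-th central moment (population form, divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Standardized 3rd central moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Standardized 4th central moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


def box_cox(values, lam):
    """Box-Cox power transform; defined for strictly positive values only."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Made-up right-skewed "expression" values.
raw = [1, 1, 2, 2, 3, 50]
transformed = box_cox(raw, 0)  # lambda = 0 is the log transform
print("skewness before:", round(skewness(raw), 3))
print("skewness after: ", round(skewness(transformed), 3))
```

In a real analysis the Box-Cox parameter would be chosen by maximum likelihood, and the moments would be compared to their normal-theory values with a formal test rather than by eye.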

Artificial Neural Network Approach for Promoter
Prediction in Prokaryotic Organisms Based on
Structural Properties of DNA

Presenter: Aleksandra A Markovets, University of Arkansas at Little Rock
Authors: Aleksandra Markovets, Abigail Newsome, Charles Bland

Abstract: One of the major challenges in biology is the correct identification
of promoter regions. Wet-lab methods provide accuracy but suffer from being
time-consuming. In order to facilitate faster processing, computational methods
                       are required. Although far from perfect, they do offer means for quickly
                       identifying potential targets for experimental validation. Computational methods
                       based on motif searching have been the traditional approach used for promoter
                       prediction. Recent studies have shown that DNA structural properties, such as
                       curvature, stability, and stress-induced duplex destabilization (SIDD) are useful,
                       as well. Some of the most impressive results to date have been reported for
                       E. coli K12 using DNA stability and SIDD. These were achieved simply using
                       minimum threshold values for distinguishing promoter from non-promoter
                       regions. In the current study, a more sophisticated approach is presented,
                       involving machine learning. Artificial neural networks (ANNs) were used to
                       predict promoter regions in E. coli K12 from DNA curvature, stability, and SIDD
                       profiles. In order to compare predictions using a one-dimensional performance
                       measure, the weighted harmonic mean of the precision and recall, known as the
                       F-score, was computed. F-scores of 0.36, 0.64, and 0.58 were achieved for curvature,
                       stability, and SIDD, respectively. The highest F-score of 0.66 was attained
                       when combining the three properties. These results are improvements over
                       those obtained using threshold-based methods, and represent some of the best
                       to date for both motif and structure-based promoter prediction.
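
As a reminder of how the performance measure is computed, the sketch below evaluates precision, recall and the F-score from prediction counts. The abstract does not state the weighting used, so the balanced case (beta = 1) is assumed by default; the function names and the counts are hypothetical and are not taken from the study.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from raw prediction counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta = 1 weights them equally)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Made-up example: 8 promoters correctly predicted, 2 false calls, 8 missed.
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=8)
print(f"precision={p:.2f} recall={r:.2f} F1={f_score(p, r):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a predictor cannot reach a high F-score by maximizing recall at the expense of precision or the reverse.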

                       Modelling Gene Expression in Tumor Progression
                       Using Binary States

                       Presenter: Juan Emmanuel Martinez-Ledesma, ITESM Campus Monterrey
                       Authors: Juan Emmanuel Martinez-Ledesma, Victor Trevino

                       Abstract: Cancer is a complex disease characterized by the disrupted activity
                       of several cancer-related genes such as oncogenes and tumor-suppressor genes
                       (TSG). By definition, it is expected that the expression of cancer-related genes
                       changes during tumor progression. Despite the enormous efforts made for
                       biomarker and gene pattern discovery, few methods have been designed to
                       relate gene expression level to tumor stage during malignancy progression.
                       Such models could help us to understand the dynamics and complexity of
                       tumor progression. We have developed a methodology based on the proportion
                       of samples whose gene expression level was activated or inactivated within a
                       tumor stage to compose expression patterns associated with tumor progression.
                       Our preliminary results using a prostate cancer dataset show that our method
                       identifies the expected profile corresponding to oncogenes and TSG in both
                       cancer and non-cancer related genes. Ontology and pathway analysis show that
                       the significant genes found are associated with well-known cancer-related terms. In
                       addition, we show that a considerable proportion of significant profiles are not
                       found by other statistical tests commonly used to detect differential expression
                       between tumor stages.




Functional Regulatory Circuits Induced by
Transcription Factors and Small RNAs

Presenter: Molly Megraw, Duke University
Authors: Molly Megraw, Uwe Ohler

Abstract: A program of tightly regulated gene expression is at the heart of
development for every living organism. Recent years have seen an increased
appreciation for the complexity of transcriptional control by gene Transcription
Factors (TFs) as well as post-transcriptional control by small RNAs known as
microRNAs (miRNAs). Several specific cases of small miRNA-TF regulatory
circuits have been painstakingly discovered link by link using traditional genetic
experiments. These examples all point to miRNA-TF circuits as crucial network
components with important system-wide regulatory characteristics. They also
highlight the need for systematic studies and methods to identify TF-miRNA
circuits and query their biological function. The fundamental idea behind
network motif discovery is that if a certain configuration (a 3-node cycle for
example) is contained within a given network a surprisingly high number of
times compared to many randomized networks, this configuration is likely
to have been preserved through evolutionary time because it benefited the
organism. However, motif identification is a useful concept only to the degree
that the set of randomized background networks used for comparison is
plausible as alternatives to the given network. Currently available background
models were developed for use in TF-only networks and therefore have a
number of shortcomings for use in TF-miRNA-gene networks. Here we present
an algorithm that assigns edges in a manner that accounts for the unique
biological constraints between each type of network entity, creating a more
flexible and realistic background randomization model. We discuss network
motifs identified in the Arabidopsis thaliana model plant system.
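
The general motif-discovery idea in this abstract (count a configuration, then compare against randomized networks) can be sketched as follows. This is not the authors' algorithm: restricting edge swaps to edges whose endpoints have the same entity types is only one simple, assumed way of respecting TF, miRNA and gene constraints, and every name and the toy network are hypothetical.

```python
import random

def count_3cycles(edges):
    """Count directed 3-node cycles a->b->c->a in a list of (source, target) edges."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    total = 0
    for a, b in edges:
        if a == b:
            continue
        for c in succ.get(b, ()):
            if c != a and c != b and a in succ.get(c, ()):
                total += 1
    return total // 3  # each cycle is found once from each of its three edges

def randomize(edges, node_type, swaps, rng):
    """Degree-preserving edge swaps, restricted to edges of the same type pair."""
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        (a, b), (c, d) = rng.sample(edges, 2)
        same_kind = (node_type[a], node_type[b]) == (node_type[c], node_type[d])
        new1, new2 = (a, d), (c, b)
        if (not same_kind or a == d or c == b
                or new1 in present or new2 in present):
            continue  # reject swaps that mix types or create loops/duplicates
        for old, new in (((a, b), new1), ((c, d), new2)):
            present.remove(old)
            present.add(new)
            edges[edges.index(old)] = new
    return edges

node_type = {"tf1": "TF", "tf2": "TF", "tf3": "TF",
             "mir1": "miRNA", "mir2": "miRNA", "g1": "gene", "g2": "gene"}
network = [("tf1", "mir1"), ("mir1", "tf2"), ("tf2", "tf1"), ("tf3", "mir2"),
           ("mir2", "tf1"), ("tf1", "g1"), ("tf2", "g2"), ("tf3", "g1")]

rng = random.Random(0)
observed = count_3cycles(network)
null = [count_3cycles(randomize(network, node_type, 100, rng)) for _ in range(200)]
# Empirical p-value: share of randomized networks with at least as many cycles.
p_value = sum(n >= observed for n in null) / len(null)
```

A configuration would be reported as a motif only when `p_value` is small, and the result is only as meaningful as the randomization scheme, which is the point the abstract makes about background models.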

LARGE SCALE ANALYSIS OF THE SOLVATION PROPERTIES OF
FOLDED PROTEINS

Presenter: Marcelo C Melo, UFRJ
Authors: Marcelo Melo, Pedro Pascutti

Abstract: The folding process creates a stable structure from internal
contacts between residues and from their interaction with the environment.
The objective of this work was to provide a novel approach to the acquisition of
biochemical data from known protein structures, seeking patterns in amino acid
usage and physicochemical characteristics of folded structures. The properties
of the solvent accessible surface (SAS) of over 13400 proteins from the Protein
Data Bank were analyzed, along with their individual amino acids, in a total of
                       3177215 residues. Using experimental values of solvation free energy (SFE) for
                       the residues, the SFE for all proteins was determined. It was observed that it
                       follows a power law as a function of the number of amino acids in the protein
                       chain, with an exponent of 0.88 and correlation coefficient of 0.98, in good
                       agreement with predicted values. Moreover, when the SFE is normalized by the
                       number of amino acids in the protein, it shows that, for proteins shorter than
                       200 residues, the SFE per amino acid is 30% higher than in larger
                       proteins, where it stabilizes at a fixed value of 1.5 kcal/mol. Analysing residues
                       with zero SAS, it was possible to study the composition of the protein core. It
                       can be seen that the amino acid composition is drastically different in structures
                       with more than 200 residues, primarily made up of Gly and Ala along with polar
                       residues, than in structures with fewer residues, where the core is composed of
                       several nonpolar amino acids.
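
A power law of the form SFE = a * N^b, with N the number of residues, becomes a straight line in log-log space, so the exponent and the correlation coefficient can be estimated by ordinary least squares on the logarithms. The sketch below shows that calculation on synthetic data only. The function name, the prefactor 2.5 and the chain lengths are assumptions for illustration, and the abstract does not state that the authors fitted their data this way.

```python
import math

def fit_power_law(sizes, energies):
    """Least-squares fit of energy = a * size**b in log-log space.

    Returns (a, b, r), where r is the Pearson correlation of the logs.
    Energies are taken as magnitudes so that the logarithm is defined.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(abs(e)) for e in energies]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Synthetic, noise-free values generated with an exponent of 0.88; the
# prefactor 2.5 is made up purely for illustration.
chain_lengths = [50, 100, 200, 400, 800]
sfe = [2.5 * n ** 0.88 for n in chain_lengths]
a, b, r = fit_power_law(chain_lengths, sfe)
```

On real data the points scatter around the line, and `r` measures how tightly they follow it.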

                       A CROSS SPECIES IDENTIFICATION OF SHARED
                       TRANSCRIPTIONAL NETWORK OF DIABETIC NEPHROPATHY

                       Presenter: Viji S Nair, University of Michigan
                       Authors: Viji Nair, Jeffrey Hodgin, Hongyu Zhang, Ann Randolph, Raymond
                       Harris, Robert Nelson, Frank Brosius, Matthias Kretzler

                       Abstract: Though mouse models of diabetic nephropathy (DN) have some
                       utility, none reliably mimics human disease and it has proven challenging to
                       identify specific factors that cause or predict human DN. This study aims to
                       define where mouse models recapitulate human DN at the functional level by
                       identifying shared transcriptional mechanisms at the network level. Our hypothesis
                       is that the major nodes of the shared network represent key mechanisms of
                       DN pathophysiology. Transcriptional profiles of glomerular mRNA of patients
                       with Type II diabetes (albuminuric (>30 mg/g alb/Cr) versus nonalbuminuric
                       (<30 mg/g alb/Cr)) and three AMDCC mouse models (streptozotocin treated
                       DBA/2 mice, db/db C57BLKS mice, and eNOS-deficient db/db C57BLKS mice,
                       each versus control) were generated using Affymetrix microarrays, transcriptional
                       pathway mapping, and promoter modeling tools. Integrating gene expression
                       alterations with biological knowledge using natural language processing resulted
                       in complex networks of 1000s of genes linked by multiple co-citations and
                       promoter binding sites (Genomatix Bibliosphere). TALE (Tool for Approximate
                       Large Graph Matching) aligned the human and mouse transcriptional networks
                       to derive the shared network structures for each human-mouse comparison.
                       Each shared network (~100 nodes) represents key nodes of conserved
                       regulatory events. Many of them reflect established pathogenetic mechanisms of
                       diabetic complications including JAK-STAT and VEGFR signaling pathways. Shared
                       top biological processes included endothelial cell differentiation, angiogenesis,
                       and phospholipase C activity. This approach can guide the selection of disease
                       pathways in mouse models that are the most relevant to the human disease
                       process and identify new pathways that are excellent targets for future study.





RADIANT: INTERACTIVE VISUALIZATION OF TAXONOMIC
ABUNDANCE

Presenter: Brian D Ondov, National Biodefense Analysis & Countermeasures
Center
Authors: Brian Ondov, Adam Phillippy, Nicholas Bergman

Abstract: Visualizing the taxonomic classification of metagenomic samples
is challenging due to the hierarchical nature of the data. Taxonomy trees do
not represent the abundance of taxa in the sample, while abundance charts
for specific ranks do not represent taxonomic relationships. Radiant, however,
overcomes this challenge by creating interactive, multi-level pie charts. These
charts depict both the abundance and the taxonomic relationships of several
ranks of taxa simultaneously using recursively subdivided wedges. Additionally,
the focus of a chart can be dynamically shifted to any taxon to show its
composition in more detail. Brief, animated transitions depict the change of
context, allowing simple, intuitive navigation of complex samples. Radiant is
implemented using HTML5 and JavaScript, allowing charts to be explored locally
or served over the internet, requiring only a current version of any major web
browser.
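
The recursive subdivision of wedges described in this abstract can be sketched as a small layout routine. Radiant itself is written in HTML5 and JavaScript; the Python below only illustrates the geometry, and the `(name, count, children)` tuple format, the function name `wedges` and the sample counts are assumptions, not Radiant's data model.

```python
def wedges(node, start=0.0, end=360.0, depth=0):
    """Yield (name, depth, start_angle, end_angle) for a taxon and its descendants.

    `node` is a hypothetical (name, count, children) tuple in which a taxon's
    count includes the counts of its children.
    """
    name, count, children = node
    yield name, depth, start, end
    angle = start
    for child in children:
        # Each child receives a share of the parent's arc proportional to its count.
        span = (end - start) * child[1] / count
        yield from wedges(child, angle, angle + span, depth + 1)
        angle += span

sample = ("Bacteria", 100, [
    ("Firmicutes", 60, [("Bacilli", 45, []), ("Clostridia", 15, [])]),
    ("Proteobacteria", 40, []),
])
layout = list(wedges(sample))
```

Each tuple gives the ring (`depth`) and the angular extent of one wedge. Shifting the focus to a taxon amounts to calling `wedges` on that subtree so that it fills the whole circle.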

TOPIARY EXPLORER

Presenter: Megan A Pirrung, University of Colorado
Authors: Megan Pirrung, Ryan Kennedy, Rob Knight

Abstract: Current technologies for high-throughput sequencing provide an
investigator with massive amounts of data that are most easily interpreted with
use of appropriate graphical models. In microbial ecology studies, the use of a
phylogenetic tree can provide powerful insight into the structure of microbial
hierarchy. Topiary Explorer is an innovative phylogenetic tree-viewing program
written in Java. Unlike most phylogenetic tree-viewing software currently
used, Topiary Explorer integrates the tree with its related metadata. This
tree-metadata relationship facilitates easy identification of interesting inferences by
way of automated tree coloring. Topiary Explorer handles tip-related metadata,
commonly in the form of OTU metadata, as well as more generalized group
metadata by way of a tip-to-group table and group metadata, commonly in the
form of sample metadata. In addition to automated tip coloring, tips can also
be automatically labeled with related metadata. Principal coordinates analysis
visualizations are available in the stand-alone offline version of Topiary Explorer,
and are also colored automatically based on supplied metadata. Topiary Explorer
also features database connectivity, allowing investigators to pull down and
combine, edit, search and save metadata from multiple studies. Multiple tree
layouts such as rectangular or polar views combined with automated coloring
of interesting conclusions can be exported as publication quality PDF images.
Topiary Explorer acts as an analysis pipeline, keeping all phylogenetic tree
related information in one place so that data analysis is more interactive and
easier for the researcher.

                       DISCOVERY OF NEW LIGANDS FOR PPAR GAMMA BASED ON
                       THIAZOLIDINE-4-ONE: VIRTUAL SCREENING, MOLECULAR
                       DOCKING AND RECEPTOR BINDING STUDY

                       Presenter: Sujatha Ramasamy, Sathyabama University
                       Authors: S. Ramasamy, U. Raj, A. Srivastava, R. Bhavsar, C. Lokesh, D. Tripathi,
                       S.A.H Naqvi

                       Abstract: Peroxisome proliferator-activated receptor gamma (PPAR-γ or
                       PPARG), also known as the glitazone receptor, or NR1C3 (nuclear receptor
                       subfamily 1, group C, member 3) is a type II nuclear receptor that in humans
                       is encoded by the PPARG gene. Peroxisome proliferator-activated receptor
                       (PPAR) belongs to the nuclear hormone receptor (NHR) superfamily. Three
                       subtypes, PPARα, PPARγ and PPARδ, for this receptor have been identified and
                       found to be important targets for the treatment of type 2 diabetes, dyslipidemia,
                       atherosclerosis, etc. A new series of PPAR-γ ligands based on thiazolidine-4-one
                       has been designed employing a virtual screening and molecular docking
                       approach. Lamarckian Genetic Algorithm based docking (implemented in
                       AutoDock 4) was used to dock 3000 derivatives of thiazolidine-4-one into the
                       active site of PPAR-γ (PDB code 2PRG). The docking results showed that
                       the binding energies ranged from -4.26 kcal/mol to -9.56 kcal/mol.
                       Sixteen molecules maintained the essential
                       H-bond interaction with the active site residue, i.e. His 323. The study provides
                       hints for the future design of new derivatives with higher potency and specificity.

                       BIOINFORMATIC AND COMPUTATIONAL CHARACTERIZATION OF
                       ORF6: A PUTATIVE THIOESTERASE

                       Presenter: Maria M Rodriguez-Guilbe, Department of Biochemistry, University of
                       Puerto Rico School of Medicine
                       Authors: Maria Rodriguez-Guilbe, Ricardo Gonzalez-Mendez, Troy Wymore, Eric
                       Schreiter, Abel Baerga

                       Abstract: Photobacterium profundum is a deep-sea bacterium living at
                       high pressures and known for the biosynthesis of polyunsaturated fatty
                       acids, PUFAs, by a polyketide synthase (PKS) system. When the polyketide
                       chain has reached the required length, a thioesterase (TE) domain is typically
                       responsible for the release of the reaction products. To date no TE domain
                       has been identified in any of the PKS clusters for PUFA. This work is part of the
                       search for a TE specific for PUFAs in P. profundum. A good candidate is the orf6
                       gene, which is conserved among other bacteria with PKS gene clusters and is
                       adjacent to the PKS gene cluster of this organism. We performed bioinformatics
                       analyses including BLAST searches, multiple sequence alignments with T-Coffee,
                       and motif discovery with Multiple EM for Motif Elicitation (MEME) that showed that the
                       orf6 protein belongs to the 4-hydroxybenzoyl-CoA (4HBT) TE family, which is
                       characterized by having a hot-dog fold with a conserved Asp17 in the active
                       site. This is consistent with the orf6 crystal structure we determined. Motif
                       analysis showed three large well-conserved motifs including the 4HBT motif.
Evolutionary analysis using PHYLIP resulted in a phylogenetic gene tree with
three clusters, suggesting the evolution of these genes into sub-families related
to the surrounding environments. Preliminary work using molecular mechanics
and quantum chemistry appears to show favorable configurations for a proposed
reaction mechanism. Structural and mechanistic knowledge of this novel enzyme
will be important for the design of lipid-based drugs or the development of
industrial applications like biofuels.

SPLICEGRAPHER: PREDICTING SPLICE GRAPHS FROM DIVERSE
EVIDENCE

Presenter: Mark F. Rogers, Colorado State University
Authors: Mark Rogers, Asa Ben-Hur, Anireddy Reddy

Abstract: Deep transcriptome sequencing (RNA-seq) with next-generation
sequencing technologies is providing unprecedented opportunities to
researchers for probing the transcriptomes of many species. An important
goal of these studies is to assess the extent of alternative splicing, a process
that increases transcriptome diversity and plays a key role in regulating gene
expression and protein function. Although it is inexpensive and easy to obtain
whole transcriptome data using RNA-seq, a major limitation is the lack of robust
methods to analyze these data. Consequently, there is an increasing demand
for methods that can use the short reads produced in these studies to predict
alternative splicing patterns. There are significant challenges in using short read
data to predict alternative splicing, but as yet there are only a few methods
that address them. Whereas existing tools like TAU and Cufflinks predict splice
variants, our approach is to predict splice graphs that capture in a single
structure all the possible ways in which exons can be assembled, allowing us
to address ambiguities that inevitably arise when using short reads to predict
explicit splice forms. Furthermore, our method can integrate short read data with
existing genome annotations and available EST data, and provide visualization of
splice graphs along with the evidence used to construct them. We compare our
framework with TAU and Cufflinks on RNA-seq data from Arabidopsis and find
that our results agree more closely with existing evidence from curated gene
models.
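
To make the notion of a splice graph concrete, the sketch below merges a few transcripts into one graph whose nodes are exons and whose edges are observed exon-to-exon junctions, then lists every path through it. It is a generic illustration under simplifying assumptions (exons as `(start, end)` tuples, no alternative 5' or 3' splice sites, hypothetical function names) and does not reproduce SpliceGrapher's own data structures or prediction method.

```python
from collections import defaultdict

def build_splice_graph(transcripts):
    """Merge transcripts (lists of (start, end) exons) into one exon graph."""
    graph = defaultdict(set)
    for exons in transcripts:
        for exon in exons:
            graph[exon]  # register the exon even if it has no outgoing edge
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph

def enumerate_paths(graph):
    """List every source-to-sink path, i.e. every isoform the graph can encode."""
    targets = {e for outs in graph.values() for e in outs}
    sources = sorted(e for e in graph if e not in targets)
    paths = []

    def walk(exon, path):
        path = path + [exon]
        if not graph[exon]:
            paths.append(path)
        for nxt in sorted(graph[exon]):
            walk(nxt, path)

    for source in sources:
        walk(source, [])
    return paths

isoforms = [
    [(100, 200), (300, 400), (500, 600)],  # all three exons
    [(100, 200), (500, 600)],              # middle exon skipped
]
graph = build_splice_graph(isoforms)
paths = enumerate_paths(graph)
```

The single graph holds both the exon-inclusion and the exon-skipping form, which is why a graph can represent ambiguous short-read evidence without committing to one explicit set of transcripts.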

A SUPPORT VECTOR CLASSIFIER FOR KORARCHAEOTA-CONTAINING
HOT SPRINGS

Presenter: Christian A Ross, The University of Nevada Las Vegas
Authors: Christian Ross, Brian Hedlund

Abstract: Korarchaeota constitute a deeply branching, uncultivated lineage
of the Archaea that are found in high-temperature terrestrial and marine
hydrothermal environments. Little is known about the factors influencing the
distribution of Korarchaeota outside of their requirement for high temperature.
More than 100 sediment samples from hot springs over a wide range of
temperatures and pH were screened for the presence and abundance
                       of Korarchaeota 16S rRNA genes. Using analytical water chemistry data,
                       classification support vector machines (C-SVMs) were constructed using the
                       popular libSVM library. Through a process of feature selection using a greedy,
                       combinatoric sets method and grid-search optimization, we have generated
                       and ranked hypotheses regarding those geochemical features that are likely
                       to support the presence of Korarchaeota. To our knowledge this is the first
                       application of support vector methods to determine microbial habitability in
                       nature.
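
The combination of greedy feature selection and grid search mentioned in this abstract can be sketched in a library-independent way. The code below is an assumption-laden illustration rather than the authors' procedure: `toy_score` is a made-up stand-in for what would in practice be the cross-validated accuracy of an SVM, the feature names and parameter grid are hypothetical, and no libSVM calls are shown.

```python
from itertools import product

def grid_search(score, features, grid):
    """Return (best score, best params) over every combination in `grid`."""
    names = sorted(grid)
    best = None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        result = score(features, params)
        if best is None or result > best[0]:
            best = (result, params)
    return best

def greedy_select(candidates, score, grid):
    """Forward selection: repeatedly add the feature that most improves the score."""
    chosen, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = [(grid_search(score, chosen + [f], grid)[0], f) for f in remaining]
        top_score, top_feature = max(trials)
        if top_score <= best_score:
            break  # no remaining feature helps, so stop
        chosen.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return chosen, best_score

def toy_score(features, params):
    """Hypothetical stand-in for a cross-validated classifier accuracy."""
    useful = {"temperature": 0.3, "pH": 0.2, "sulfide": 0.1}
    gain = sum(useful.get(f, -0.05) for f in features)
    return 0.4 + gain - 0.01 * abs(params["C"] - 10)

grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
selected, final_score = greedy_select(
    ["pH", "chloride", "temperature", "sulfide"], toy_score, grid)
```

The order in which features enter `selected` gives a ranking of the kind the abstract describes: features added first contribute most to separating samples with and without the organism.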

                       EVOLUTION OF PROTEIN STRUCTURE IN METAPNEUMOVIRUS

                       Presenter: Sunando Roy, Pennsylvania State University
                       Authors: Sunando Roy, Abinash Padhi, Francesca Chiaromonte, Mary Poss

                       Abstract: Viruses replicating in different host environments accumulate
                       mutations in their genome that reflect important adaptive changes. Studying
                       how these substitutions help viruses adapt to their host by changing viral protein
                       structure has been challenging in the absence of crystal structure data. Here
                       we developed a statistical approach that utilizes predicted secondary structural
                       data to identify differences between avian and human metapneumovirus, which
                       shared a common ancestor in the last century. The properties that made up the
                       dataset were hydrophobicity, accessibility, flexibility, alpha helix, beta sheet and
                       coils all predicted from the ExPASy server. We employed a multivariate linear
                       discriminant analysis at each amino acid position of an alignment to calculate
                       the distance between the two group means. Amino acid positions with high
                       distance between the avian and human metapneumovirus groups were called
                       structurally informative sites (SINS). We were successful in identifying sites that
                       differentiated between the avian and human metapneumovirus groups in all
                       viral proteins. High levels of sequence polymorphism did not correspond to
                       elevated proportions of SINS, suggesting that sequence diversity and structural
                       diversity were not correlated. Most SINS were not under any positive selection,
                       presumably because SINS accumulated synonymous substitutions at the same
                       rate as other positions of the protein, suggesting early fixation of these sites.
                       This method thus helps identify amino acid positions that change in predicted
                       structural properties between phylogenetically related groups of viruses that may
                       also be involved in the early adaptation of the virus to its host.

                       HIVTOOLBOX, AN INTEGRATED WEB APPLICATION AND
                       DATABASE FOR INVESTIGATING HIV

                       Presenter: David P Sargeant, University of Nevada Las Vegas
                       Authors: David Sargeant, Sandeep Deverasetty, Yang Luo, Angel Villahoz-Baleta,
                       Stephanie Zobrist, Viraj Rathnayake, Jacqueline Russo, Jay Vyas, Mark Muesing,
                       Martin Schiller

                       Abstract: Current bioinformatics databases and applications are generally
                       focused on a small domain of knowledge. In contrast, biological systems are
                       very complex and have many interdependencies. We propose that integrated
data management can greatly enhance the utility of bioinformatics applications
for hypothesis generation and experimental interpretation. To that end, we have
built HIVToolbox as an example of integrated data management for a relatively
simple biological system. HIVToolbox consists of a database and a web-based
user interface. The SQL data warehouse consists of a unified comprehensive
relational model, populated with data from isolated HIV databases containing
protein sequences, structures, functions, interactions, domains, etc. The web
interface allows users to select one of the 24 HIV proteins. They are then shown
a console consisting of four main integrated components: a sequence viewer,
an array of structural displays, two log windows, and a sequence alignment
section. Each window and menu has a number of interactive features that
trigger coupled events in other parts of the application. This graphics-driven
application facilitates data-driven mining and analysis in ways that were not
previously possible. To demonstrate its utility, HIV-1 integrase was analyzed
as a case study. Integrase is a well-studied multidomain and oligomeric viral
protein that is essential for viral infectivity. This approach revealed several new
hypotheses concerning integrase nuclear import, DNA binding, post-translational
modification, LEDGF binding, and a CK2 phosphorylation site in integrase.

MOLECULAR MODELING AND DOCKING STUDIES OF SOME
NOVEL DERIVATIVES OF N-PHENYL-2-(PYRIMIDIN-2-YLSULFANYL)ACETAMIDE
AS ANTI SARS PROTEASE INHIBITORS

Presenter: Gyana R Satpathy, National Institute of Technology
Authors: Gyana Satpathy, B. Jabes, S Murugesan, Sripad Patnaik

Abstract: The protein-ligand interaction plays a significant role in
structure-based drug design. Starting from a collection of 28 druggable compounds
collected from the literature [30], we performed a molecular docking study against
their protein receptor (SARS 3CL protease). The binding preferences as well as the
hydrogen bonds contributing to the interaction between ligand and receptor
were studied, on the basis of which eight compounds were identified as
promising candidates for further investigation. The main structural features
shared by these eight molecules were used to design analogues, followed
by building mimics for the best non-peptide analogues. The lead molecules
were screened based on the lowest energy with repeated conformation of
ligands, and passed through ADME/Tox filters to identify the toxic compounds.
The eleven best-performing ligands retrieved after the screenings, which are
non-toxic with improved binding efficiency and steric complementarity, are
reported. We have also identified a group that could be a better combination
in substituting the amide bonds for any derivatives of
N-phenyl-2-(pyrimidin-2-ylsulfanyl)acetamide targeted against SARS-CoV protease,
and further infer that some of the modified drugs are better than the original
drugs, which provides useful insight for the development of potential inhibitors
against SARS protease enzyme. Key words: SARS 3CL protease, molecular docking,
ADME/Tox (absorption, distribution, metabolism, elimination/toxicity), steric
complementarity.


                       MOLECULAR DYNAMICS SIMULATIONS OF SEQUENTIALLY
                       VARIED HUMAN IMMUNODEFICIENCY VIRUS-1 TAT CONSENSUS
                       PROTEIN STRUCTURES

                       Presenter: Gyana R Satpathy, National Institute of Technology
                       Authors: Gyana Satpathy, Sripad Patnaik

                       Abstract: Macromolecules undergo changes with time and condition,
                       thereby affecting the structural and functional properties. These sequential
                       and structural changes can be enumerated by comparative methods using the
                       sequences and structural models. Molecular dynamics simulations are used to
                       investigate dynamics and interactions of proteins in aqueous solution. We have
                       studied the sequential variations in HIV-1 Trans-activating regulatory protein (Tat)
                       among different strains and isolates taken from different geographical areas.
                       These variations are then modeled in consensus structures, so that each of the
                       disparities can be suitably studied. Comparative molecular dynamics simulation
                       (2 ns) is carried out on each of these models to study the residual motions
                       and interaction fluctuations. The results are compared and the functional
                       implications of each of these transforms are studied. We have identified
                       intramolecular interactions of importance for structure stabilization. The results
                       show that the functional characteristics of the protein, or of part of it, are
                       precisely reflected in its
                       structural interactions and molecular dynamics flexibility. Key words: modeling,
                       molecular dynamics simulation, NAMD, MODELLER, CHARMM, Tat, TAR RNA,
                       Hydrogen interaction, RMSD.

                       PRINSEQ, TAGCLEANER AND DECONSEQ: TOOLS FOR QUALITY
                       CONTROL AND PRE-PROCESSING OF METAGENOMIC DATASETS

                       Presenter: Robert Schmieder, San Diego State University
                       Authors: Robert Schmieder, Robert Edwards

                       Abstract: High-throughput sequencing has revolutionized microbiology
                       and accelerated genomic and metagenomic analyses; however, downstream
                       sequence analysis is compromised by low-quality sequences, sequence artifacts
                       and sequence contamination, eventually leading to misassembly and erroneous
                       conclusions. These problems necessitate better tools for quality control and
                       pre-processing of all sequence datasets. Here, we present three tools for
                       easy and rapid quality control and data pre-processing (PRINSEQ), automatic
                       identification and removal of sequence artifacts (TagCleaner), and contaminants
                       (DeconSeq) in metagenomic datasets. These tools incorporate different
                       algorithms suitable for each task and were evaluated on both artificial and real
                       datasets. They are publicly available through user-friendly web interfaces and
                       as standalone versions. The web interfaces allow online analysis of genomic
                       or metagenomic datasets and generation of the output using our computing
                       resources. The results can be exported for subsequent analysis, and the required
                       databases used for the web-based versions are automatically updated on a
                       regular basis. This set of tools allows scientists to efficiently check and prepare
their datasets prior to downstream analysis. The web interfaces for each tool are
simple and user-friendly and the standalone versions allow offline analysis and
integration into existing data processing pipelines. The results reveal whether
the sequencing experiment has succeeded, whether the correct sample was
sequenced, and whether the sample contains any contamination from DNA
preparation or host. All tools provide a computational resource able to handle
the amount of data that next-generation sequencers are capable of generating
and can place the process more within reach of the average research lab.

IN SILICO INFERENCE OF IMMUNOLOGICAL RELATIONSHIPS
BETWEEN PROTEINS BASED ON THEIR CYTOTOXIC
T-LYMPHOCYTE EPITOPE REPERTOIRES

Presenter and Author: Werner Smidt, University of Pretoria

Abstract: The importance of Cytotoxic T-Cell (CTL) responses during the
course of intracellular infections has received a lot of attention during the
past few decades. CTLs with the appropriate T-Cell receptor (TCR) respond
to epitopes originating from the cleavage of foreign intracellular proteins.
The CTL response is crucial for the control of intracellular pathogens such as
influenza, HIV and others by causing the destruction of the cell presenting
the offending epitope of the pathogen that resides within it. Classically, the
proteins are processed by the proteasome, transported to the endoplasmic
reticulum (ER) by the Transporter associated with Antigen Processing (TAP)
and loaded onto the Major Histocompatibility Complex (MHC) Class I molecule. Due to
the extreme polymorphism of MHC molecules and difficulty of experimentally
determining potential CTL epitopes, computational tools have been developed
to detect potential CTL epitopes by making qualitative or quantitative
predictions concerning the different steps in the pathway or a combination
thereof. In this study, a novel method was developed to detect epitopes by
combining the different steps in the pathway using available prediction tools
for quantitative proteasomal cleavage and MHC affinity estimations as well as
the construction of a novel quantitative TAP affinity predictor. Furthermore, by
using a BLOSUM-based comparison score developed by other authors in conjunction with
the aforementioned epitope predictions, a method was developed to cluster
mutational variants of protein sequences based on their CTL epitope repertoires
for various MHC allotypes. This was implemented as a web-based tool, called
Fortuna.

CHARACTERIZING APICOMPLEXAN PARASITE METABOLISM BY
FLUX BALANCE ANALYSIS OF TOXOPLASMA GONDII

Presenter: Carl Song, University of Toronto
Authors: Carl Song, Stacy Hung, John Parkinson

ABSTRACT: The increasing prevalence of infections involving apicomplexan
parasites such as Plasmodium, Toxoplasma, and Cryptosporidium (causative
agents of malaria, toxoplasmosis and cryptosporidiosis respectively) represents
a significant global health care burden. With the emergence of new resistant
strains of parasites, increasingly fewer treatments are available. We postulate
that parasites have evolved distinct metabolic strategies critical for growth and
survival during human infections. We further hypothesize that the enzymes
which undertake these critical functions represent potent virulence factors within
highly integrated metabolic networks. Unfortunately, current knowledge of
the metabolic potential of apicomplexan parasites throughout the course of
an infection is rudimentary at best. In order to fully understand the complex
parasite-host relationships and identify those enzymes that mediate critical
roles from a global “systems” perspective, a fully characterized metabolic
network of the experimentally amenable model apicomplexan, Toxoplasma
gondii, has been reconstructed through extensive curation of available genomic
and biochemical data. Using a sophisticated mathematical modeling framework,
we are currently applying flux balance analysis to explore the metabolic potential
of the parasite, and to identify enzymes that mediate critical roles for its
growth. Preliminary results show that Toxoplasma incorporates a novel pathway
for unsaturated fatty acid biosynthesis, in which the enzymes involved cannot be
identified by conventional in silico methods. This pathway is critical to parasite
survival in the host cell, since the composition of unsaturated fatty acids impacts
membrane fluidity and nutrient uptake. The lack of an orthologous pathway in
the host organism provides an additional source of interest from a therapeutic
perspective.
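
For readers unfamiliar with the technique, flux balance analysis is usually posed as a linear program. The formulation below is the generic textbook form, not the authors' specific Toxoplasma model; the symbols S (stoichiometric matrix), v (flux vector), c (objective weights) and the flux bounds are notation assumed here.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

Here S has one row per metabolite and one column per reaction, so S v = 0 expresses the steady-state assumption, and c typically selects a growth (biomass) reaction. Under this formulation, an enzyme would be flagged as critical when fixing the fluxes of its reactions to zero drives the optimal objective value to (near) zero.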

SIMPLE LOCAL ASSEMBLY PROGRAM

Presenter: Adam W Spargo, Wellcome Trust Sanger Institute
Authors: Adam Spargo, Zemin Ning

ABSTRACT: We present a simple local assembly program which will be
used in the contig assembly stage of the Phusion2 pipeline. Phusion [1]
clusters sequencing reads by shared long k-mer words; these clusters are
then assembled in parallel, currently using Phrap [2]. This pipeline was very
successful with Sanger sequencing technology; however, second-generation
sequencing technologies have presented several issues: (i) Phrap cannot
handle very high coverage data and so clusters must be small; (ii) Phrap cannot
make use of read-pairs, so contigs require extensive post-processing by
Phusion, both to join via read-pairs and to break at mis-assemblies;
(iii) long-running Phrap jobs destroy the previously effective parallelization
of Phusion; (iv) Phrap cannot handle all of the different second-generation
technologies effectively, making a hybrid approach to genome sequencing more
difficult than necessary. The local assembler has been implemented via the
overlap-layout-consensus methodology, using libraries from the SMALT alignment
tool [3] and the Boost Graph Library [4]. We detail this implementation and then
report on our investigations into algorithms for overlap-graph disambiguation
using read-pairs, defined nucleotide positions and read-depth. Re-use of robust,
multi-threaded libraries allows us to quickly implement new algorithms and
concentrate our research on developing new methods to make the best of the
available technologies. Results show the disambiguation of graphs generated
from carefully constructed simulation data for various classes of repeats as
well as real data.
[1] Mullikin JC, Ning Z. The Phusion assembler. Genome Research 2003;13(1):81-90.
[2] http://www.phrap.org/
[3] http://www.sanger.ac.uk/resources/software/smalt/
[4] http://www.boost.org/doc/libs/1_44_0/libs/graph/doc/index.html
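
The read-clustering idea described in the abstract (grouping reads that share long k-mer words) can be illustrated with a small sketch. This is not Phusion's implementation: the function names are hypothetical, and the sketch assumes exact k-mer matches on one strand only, with no frequency filtering or reverse complements.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def cluster_reads(reads, k):
    """Group reads linked, directly or transitively, by a shared k-mer."""
    parent = list(range(len(reads)))  # union-find forest over read indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> index of the first read that contained it
    for idx, read in enumerate(reads):
        for word in kmers(read, k):
            if word in first_seen:
                # Merge this read's cluster with the earlier read's cluster.
                parent[find(idx)] = find(first_seen[word])
            else:
                first_seen[word] = idx

    clusters = defaultdict(list)
    for idx, read in enumerate(reads):
        clusters[find(idx)].append(read)
    return sorted(clusters.values())
```

Calling `cluster_reads(reads, k=5)` on a list of read strings returns a list of clusters, each of which could then be handed to an independent assembly job, which is the parallelization the abstract refers to.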

COMPARING GENOMES USING THE PROFILES PACKAGE IN R

Presenter: Chris J Stubben, Los Alamos National Lab
Authors: Chris Stubben, Murray Wolinsky

ABSTRACT: The number of microbial genome projects submitted to public
sequence databases is growing rapidly. There are nearly 3,000 microbial genome
projects with complete or assembly sequences at NCBI, and 12 species are now
represented by 25 or more sequenced strains. The accurate and comprehensive
description of these strains is critical for grouping genomes in comparative
analyses. We have previously developed an R package called genomes to
organize genome project metadata from NCBI, GOLD, and other sources. Here
we describe a complementary package called profiles to compare sequence
features and genome annotations using phylogenetic profiles. The profiles
package provides access to database cross-references in UniProt from complete
and some assembly genomes. For each cross-linked database, we have created
tables to store the distribution of protein domains, families, orthologs and
other common identifiers across all annotated genomes. For example, the 6.3
million Pfam cross-references to 1830 annotated genomes are cross-tabulated
into a table with 9877 rows, each representing a unique Pfam identifier, 1830
columns, each representing a genome, and cells holding the number of hits
(similar to phylogenetic profiles). The package includes functions to quickly
access the profile tables and then cluster, visualize and compare genome
annotations using detailed information about habitat, host specificity, known
pathogenicity and many other organism descriptors.
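
The cross-tabulation described above can be sketched in a few lines. The package itself is written in R; the Python below is only an illustration of the table layout (identifiers as rows, genomes as columns, hit counts in the cells), and the function name and input format are assumptions rather than part of the profiles package.

```python
from collections import Counter


def cross_tabulate(cross_refs):
    """Build a profile table from (genome, identifier) cross-reference pairs.

    Returns (identifiers, genomes, table), where table[i][j] is the number
    of hits of identifiers[i] in genomes[j].
    """
    cross_refs = list(cross_refs)  # allow any iterable, including generators
    counts = Counter(cross_refs)
    genomes = sorted({genome for genome, _ in cross_refs})
    identifiers = sorted({ident for _, ident in cross_refs})
    # Counter returns 0 for pairs that never occur, which fills empty cells.
    table = [[counts[(g, ident)] for g in genomes] for ident in identifiers]
    return identifiers, genomes, table
```

Each row of the resulting table is a phylogenetic profile for one identifier, so rows (or columns) can be compared or clustered directly.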

ENRICHING REGULATORY NETWORKS WITH OTHER FUNCTIONAL
RELATIONSHIPS

Presenter: Ronald C Taylor, Pacific Northwest National Laboratory
Authors: Ronald Taylor, Antonio Sanfilippo, Jason McDermott, Bob Baddeley, Rick
Riensche, Russ Jenson, Marc Verhagen

ABSTRACT: Much of the current work on constructing biological networks
has focused on reverse-engineering regulatory connections between genes
from correlation patterns observed in gene expression data. Consequently, the
integration of the inferred networks with other background information from
sources such as the biomedical literature remains an open problem. This is an
important gap, as such additional information is needed to (1) refine our
confidence in the inferred gene-to-gene regulatory connections and (2) expand
the inferred networks. Here, we report on one novel means of tying networks
derived from
gene expression data to other information, using a bootstrapping version of
our Cross-Ontological Analytics (XOA) algorithm. XOA links genes into networks
using aggregated semantic similarities between GO annotations found for those
genes in the GO database. The resulting network formed by such edges provides
new information as to functional and possible regulatory relationships between
the genes. We use Context Likelihood of Relatedness (CLR) to infer edges
derived from mouse gene expression data gathered for a study of neuroprotection
in stroke. We feed that set of genes and connections into our bootstrapped XOA
algorithm, and report on the expanded set of connections found, performing
topological analysis. Also, we compare those XOA results to XOA results that take
as the starting point for analysis the set of CLR connections combined with a set
of literature-based gene-to-gene connections found in PubMed abstracts by the
MedStract tool for those same CLR-reported genes.

FUN AND GAMES WITH RDF: MOVING RAT DATA ONTO THE
SEMANTIC WEB

Presenter: Simon N Twigger, Medical College of Wisconsin
Authors: Simon Twigger, Joey Geiger, Jennifer Smith

ABSTRACT: We have been using the National Center for Biomedical
Ontology’s Annotator tool to annotate the text resources available for rat
expression datasets housed in the NCBI’s Gene Expression Omnibus database.
This has provided us with a large number of anatomical and rat strain
annotations for genes that we can combine with our existing Gene Ontology,
pathway, disease and phenotype annotations created by the Rat Genome
Database (RGD). We are now utilizing RDF, OWL and related technologies to
bring this data to bear on candidate gene discovery. As part of this process
we have been getting up to speed with RDF, exploring how to create additional
ontologies to classify RGD data, and developing a hybrid relational
database/triple store application using Ruby on Rails and AllegroGraph. We are
now wrestling with how best to provide this RDF in a way that makes it maximally
useful to us and to others. I will describe our progress to date and some
observations on the pros and cons of the use of RDF in this context.

CLOSING THE GAP IN TIME: FROM RAW DATA TO REAL SCIENCE
(SCIENCE AS A SERVICE – SCAAS)

Presenter: Anjana Varadarajan, EdgeBio
Authors: Anjana Varadarajan, Angelo Scorpio, David DeShazer

ABSTRACT: Next-generation sequencing has drastically changed the traditional
costs and infrastructure within the sequencing community. There are several
technologies and algorithms that show promise, but it is not always intuitive
where to start. This uncertainty is compounded by the fact that commonly used
bioinformatics tools are difficult to build and maintain and require vast
amounts of compute resources. We will present information, research and a
case study on how we provide Science as a Service (ScaaS) to the community
through a technology agnostic sequencing and bioinformatics approach.
Specifically, we will highlight a recent bacterial transcriptome project consisting
of 7 organisms and over 900 million reads. The analysis included spliced
mapping, differential gene expression analysis, profiling antisense transcription,
and comparative analysis of 6 important gene clusters.

VALIDATION OF PROTEIN FUNCTIONAL SITE PREDICTIONS
USING AUTOMATED BIOMEDICAL LITERATURE ANALYSIS

Presenter: Karin M Verspoor, University of Colorado Denver
Authors: Karin Verspoor, Judith Cohn, Christophe Roeder, Michael Wall

ABSTRACT: Prediction and validation of catalytic and allosteric binding sites in
proteins is a fundamental challenge in genomics and has practical applications
in rational drug design. Dynamic perturbation analysis (DPA) is a computational
method for predicting protein functional sites by analysis of protein dynamics.
We used DPA to predict 122,866 functional sites in a comprehensive set of
95,741 protein domains from 32,192 structures in the Protein Data Bank (PDB),
yielding 1,845,452 functional residue predictions. We are investigating an
approach to validating these predictions using automated search for supporting
evidence in the literature. The approach is based on the assumption that
mentions of functionally important residues are much more frequent than
mentions of unimportant residues in publications about protein structure. As an
initial test of our validation concept we developed a set of patterns for
detecting residue mentions in text. The patterns accommodate surface and
linguistic variations in references to specific residues in the amino acid
sequence. They also aim to distinguish mutations from other types of references
to residues. We tested the performance of our patterns in automated retrieval
of residue mentions by compiling a ground-truth corpus of full-text publications
in which residue mentions were manually identified. Our patterns currently
achieve approximately 90% F-score on this corpus. The results indicate that
these patterns are highly effective tools for automatically finding residue
mentions in text. Provided our assumption that these mentions constitute
evidence of functional relevance holds, they suggest we can use automated
literature mining to increase confidence in functional site predictions.
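
To make the idea of residue-mention patterns concrete, here is a minimal sketch using regular expressions. These are not the authors' patterns: the two expressions, the `find_mentions` helper and the handful of surface forms they cover ("Ser195", "Ser-195", "serine 195", and point mutations such as "S195A") are illustrative assumptions only.

```python
import re

# Three-letter codes are matched case-sensitively (to avoid hits such as
# "his 3"); full amino acid names are matched case-insensitively.
THREE = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
         "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")
FULL = ("alanine|arginine|asparagine|aspartate|cysteine|glutamine|glutamate|"
        "glycine|histidine|isoleucine|leucine|lysine|methionine|phenylalanine|"
        "proline|serine|threonine|tryptophan|tyrosine|valine")
ONE = "[ACDEFGHIKLMNPQRSTVWY]"

# Point mutation in one-letter notation, e.g. "S195A".
MUTATION = re.compile(rf"\b{ONE}\d+{ONE}\b")
# Plain residue mention, e.g. "Ser195", "Ser-195", "serine 195".
RESIDUE = re.compile(rf"\b(?:{THREE}|(?i:{FULL}))[- ]?\d+\b")


def find_mentions(text):
    """Return (kind, matched_text) pairs in order of appearance."""
    hits = [(m.start(), "mutation", m.group(0)) for m in MUTATION.finditer(text)]
    hits += [(m.start(), "residue", m.group(0)) for m in RESIDUE.finditer(text)]
    return [(kind, matched) for _, kind, matched in sorted(hits)]
```

Keeping mutations and plain residue mentions as separate patterns mirrors the distinction drawn in the abstract; a realistic system would need many more surface forms than this sketch covers.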

IMPROVING THE ACCURACY OF COEVOLUTION-BASED METHODS
TO PREDICT PROTEIN-PROTEIN INTERACTIONS

Presenter: Guisong Wang, University of Maryland Baltimore County
Authors: Guisong Wang, Mileidy Gonzalez, Maricel Kann

ABSTRACT: Protein interactions are the main mediators of metabolic and
signaling pathways and participate in the phenotypic expression of organisms’
healthy and diseased states. Computational approaches to infer protein
interactions can overcome the current limitations of resource-intensive,
time-consuming, and low-accuracy experimental techniques. The mirrortree
computational method relies on the observation that changes in one protein
are compensated by correlated changes in its interacting partner to preserve
function. This coevolution-based method quantifies the similarity of proteins’
evolutionary histories as a means to infer their interaction. Mirrortree has been
successfully implemented to confirm experimental interactions and to predict
new ones in several organisms. Yet, to date, the choice of parameters has been
informed mostly by pragmatic considerations and conjectures, rather than by
analytical findings. For instance, it is commonly assumed that larger sets of
orthologous species yield higher accuracy. To test this assumption we evaluated
the effect of common-species size on mirrortree accuracy in yeast. Our findings
show that the number of species does not affect performance; even a small
set can yield accurate results depending on the species used. Thus, it is not
the number, but the set of species in the distance matrix that ultimately directs
mirrortree performance. As protein characterization and studies of the molecular
basis of disease shift to the analysis of organisms’ interactomes, improving the
accuracy of interaction predictions will become a necessity. Our study indicates
that to increase the accuracy of coevolution-based interaction predictions,
researchers should be looking for biological indications to inform species
selection.
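
The core mirrortree computation, quantifying the similarity of two proteins' evolutionary histories, is commonly expressed as a correlation between their distance matrices. The sketch below assumes that form (Pearson correlation over the upper triangles of two matrices built over the same ordered set of common species); the function names are hypothetical and this is not the authors' code.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper-triangular entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined if either input is constant


def mirrortree_score(dist_a, dist_b):
    """Correlate two distance matrices over the same ordered species set."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

Because the score depends only on the species shared by both matrices, changing which species are included changes the vectors being correlated, which is where the choice of species set discussed in the abstract enters.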

FOLD RECOGNITION AND ALIGNMENT FOR TRANSMEMBRANE
PROTEINS

Presenter and Author: Han L Wang, University of Missouri

ABSTRACT: Although transmembrane proteins (TMPs) account for 20-40% of all
proteins in a typical organism, to date there is no widely used tool for general
prediction of TMP structures. TMPs have significantly different physicochemical
properties from soluble proteins, making their structures very difficult to solve
or predict. As more and more experimental TMP structures become available,
template-based methods for TMP structure prediction have the potential to
become broadly applicable. However, this approach has some challenges. The fold
recognition and alignment steps in the TMP structure prediction process require
different handling from soluble protein structure prediction. In particular,
TMPs are composed of both hydrophobic transmembrane segments inside the
membrane and hydrophilic domains outside the membrane. Hence, conventional
threading methods for globular proteins may not work well for TMPs. Towards
developing a reliable TMP structure prediction tool, we applied a threading
method to TMP alignment. Our study is focused on integrating the structural
features of TMPs into the scoring function of threading. We handle the
transmembrane and non-transmembrane parts, as well as α-helix and β-strand
transmembrane segments, separately. We integrate multiple features, such as
substitution
matrices, secondary structure, polarity, solvent/membrane accessibility, and
contact capacity, to increase the alignment accuracy. We use a dynamic
programming algorithm for the alignment. We then rank all TMP templates
based on the alignment scores. The final alignments for the top templates are
fed into our in-house model generation tool MUFold to build 3D structural models.
We will present preliminary results of our method on a TMP benchmark.
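
As an illustration of alignment by dynamic programming with a feature-based scoring function, the sketch below uses a standard global (Needleman-Wunsch style) recurrence with a linear gap penalty. It is not the authors' threading method: the `position_score` function, its two features (residue identity and membrane/outside region) and all weights are placeholder assumptions.

```python
def align_score(query, template, score, gap=-2.0):
    """Global alignment score by dynamic programming.

    query and template are sequences of positions; score(q, t) returns the
    score for aligning one query position with one template position.
    """
    rows, cols = len(query) + 1, len(template) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,  # query position aligned to a gap
                dp[i][j - 1] + gap,  # template position aligned to a gap
            )
    return dp[-1][-1]


def position_score(q, t):
    """Toy per-position score: residue identity plus a region-match term.

    Each position is (residue, region), where region is 'M' (membrane) or
    'O' (outside the membrane). The weights are placeholders.
    """
    residue = 1.0 if q[0] == t[0] else -1.0
    region = 1.0 if q[1] == t[1] else -1.0
    return residue + region
```

Templates could then be ranked by sorting on `align_score(query, template, position_score)`; additional features such as secondary structure or accessibility would enter as further terms in the per-position score.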

CONCEPTS AT PLAY IN SCIENTIFIC ARGUMENTATION

Presenter: Elizabeth K White, University of Colorado, Denver
Authors: Elizabeth White, Lawrence Hunter

ABSTRACT: Natural language processing techniques for scientific text have
produced a multitude of systems that can query a database of papers about
biological entities or processes and return a list of sentences relevant to that
query. More recently, though, interest has grown in recognizing more general
argumentative targets, like the hypotheses posed by an author, the main results
of his experiments, or the conclusions his results entail. Finding statements
like these requires identification of speculation, causal relations, and novel
assertions. Fortunately, argumentative assertions like these contain a number
of distinctive linguistic features that are typically absent from expository writing.
We have identified and reified these features into a set of 32 concepts that
act as hallmarks of argumentative statements. Including these concepts yields
striking performance benefits for a support vector machine classification of
argumentative statements, implying that they capture a critical part of the feature
space independent of words or bigrams. Finally, we demonstrate that these 32
concepts provide the basis for PARROT, a robust pattern-matching system that
can recover hypotheses, evidence for or against a theory, or explanations from
phrases, sentences, and multi-sentence spans. This system allows researchers
to “Google” scientific publications and summarize them in terms of emerging
theories, novel results, and evidence for or against a hypothesis.

DEFOG: DISCRETE ENRICHMENT OF FUNCTIONALLY
ORGANIZED GENES

Presenter: Tobias Wittkop, Buck Institute for Age Research
Authors: Tobias Wittkop, Ari Berman, Sean Mooney

ABSTRACT: Bioinformatics analyses typically include a gene set enrichment
analysis using the Gene Ontology (GO). These studies are done in order to gain
insight into the ‘functional content’ of a given set of genes. Typically, statistics are
applied to derive overrepresented GO terms for such a set of genes, resulting
in a ranked list of terms that describe their functions. As is often the case, the
input set can include genes of various pathways and functions, leading to
hundreds of terms that are significantly overrepresented in the results list. This
makes the results difficult to interpret and synthesize. Here, we
present DEFOG, a web-based tool designed to facilitate the discovery of related
groups of functional concepts from gene lists. This is accomplished by creating
a functional network of the genes based on experimental data, then clustering
genes into functionally related groups based on network similarity, and finally
performing an enrichment analysis on the gene clusters. This process reduces
the complexity of the enrichment analysis, thus increasing the likelihood that
enriched terms have meaning with regard to the subset of genes. Hence, the
use of DEFOG may result in a better understanding of the biological processes
that govern the submitted dataset. DEFOG utilizes three recently developed
tools in its data analysis pipeline: (1) similarity networks are constructed
using the GeneMANIA software, (2) the organization into groups is performed
by Transitivity Clustering, and (3) overrepresented terms for each group
are identified utilizing Ontologizer. DEFOG can be accessed at
http://www.mooneygroup.org/defog.
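
The per-cluster enrichment step can be illustrated with the one-sided hypergeometric test, a common way to score term overrepresentation. This is a generic sketch, not DEFOG's or Ontologizer's actual procedure (which may use other methods and multiple-testing corrections); the function and parameter names are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: number of genes in the background set
    annotated:  background genes annotated with the term
    cluster:    number of genes in the cluster being tested
    overlap:    cluster genes annotated with the term
    """
    upper = min(annotated, cluster)
    total = comb(population, cluster)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Running the test on each small, functionally coherent cluster rather than on the whole input list is what keeps the number of significant terms per group manageable in the approach the abstract describes.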

EXTRACTING ADVERSE DRUG REACTIONS FROM USER POSTS TO
HEALTH-RELATED SOCIAL NETWORKS

Presenter: Laura Wojtulewicz, Arizona State University
Authors: Robert Leaman, Laura Wojtulewicz, Ryan Sullivan, Annie Skariah, Jian
Yang, Graciela Gonzalez

ABSTRACT: Adverse reactions to drugs are among the most common causes
of death in industrialized nations. Expensive clinical trials are not sufficient to
uncover all of the adverse reactions a drug may cause, necessitating systems
for post-marketing surveillance, or pharmacovigilance. These systems have
typically relied on voluntary reporting by health care professionals. However,
self-reported patient data has become an increasingly important resource,
with efforts such as MedWatch from the FDA allowing reports directly from the
consumer. In this paper, we propose mining the relationships between drugs
and adverse reactions as reported by the patients themselves in user comments
on health-related websites. We evaluate our system on a manually annotated
set of user comments, with promising performance. We also report encouraging
correlations between the frequency of adverse drug reactions found by our
system in unlabeled data and the frequency of documented adverse drug
reactions. We conclude that user comments pose a significant natural language
processing challenge, but do contain useful extractable information which merits
further exploration.

FINDING COMMUNITY LEADERS IN SOCIAL NETWORKS

Presenter and Author: Xiaowei Xu, University of Arkansas at Little Rock

ABSTRACT: Identifying leaders in social networks is important to
epidemiology, viral marketing, systems biology and sociology. Leaders control
contact between individuals and influence ideas and opinions. They are the
nexus for the propagation of disease, information and ideas. We propose
an algorithm for identifying leaders and measuring their influence on their
community that is based on the structure of the network. We illustrate the
differences between our work and information spread maximization algorithms
both analytically and experimentally. We evaluate its performance on real social
networks, including the Enron email network and the Biblical Social Network. Our
algorithm is both fast and accurate. It is superior at identifying community
leaders and achieves comparable influence spread when compared to influence
spread maximization algorithms.


INTEGRATIVE NETWORK ANALYSIS TO PREDICT ENDOCRINE
RESISTANCE IN BREAST CANCER

Presenter: Jason Xuan, Virginia Tech
Authors: Jianhua Xuan, Li Chen, Chen Wang, Yue Wang, Rebecca Riggins, Robert
Clarke

ABSTRACT: Despite the great benefit of endocrine therapy for breast cancer
patients, its application is greatly limited by both de novo and acquired
resistance. Only 50% of all estrogen receptor-positive (ER+) tumors are
responsive at first presentation to antiestrogens such as tamoxifen, and
many initially responsive tumors eventually become resistant to endocrine
treatment, leading to tumor recurrence and death. Thus, it is imperative to
better understand the mechanisms associated with endocrine resistance so
as to improve the prediction of endocrine resistance. In this paper, we will
present a novel computational approach, integrative network analysis (INA), for
endocrine resistance prediction. The INA approach is designed to integrate gene
expression data and protein-protein interaction (PPI) data for novel PPI network
identification. Specifically, a Markov random field (MRF)-based approach is
developed to take into account the dependency among network member genes,
aiming to unravel many important hub genes participating in estrogen signaling
and action; a network-constrained support vector machine (netSVM) is then
developed to predict endocrine resistance using gene expression data. The
INA approach has been successfully applied to several gene expression
profiling data sets, both in-house and public, to predict antiestrogen resistance.
The experimental results have demonstrated that the INA approach can be used
to identify novel networks responsible for endocrine resistance, resulting in
improved performance for endocrine resistance prediction. In summary, we have
developed a novel computational approach, the INA approach, to help discover
new knowledge of estrogen signaling and identify novel mechanisms associated
with endocrine resistance.

PURIFICATION OF BACTERIAL APOA-1 AND CHARACTERIZATION
OF NOVEL ANTICANCER DRUG DELIVERY SYSTEM

Presenter: Thurman Young, North Carolina State University
Authors: Thurman Young, Andras Lacko

ABSTRACT: Several chemotherapy regimens have proved effective in killing
cancer cells. However, chemotherapy remains unable to differentiate between
the cancerous cells and the normal cells, and although the normal cells can
and will re-grow and be healthy, threatening side effects do occur. The use of
apolipoprotein A-1 (ApoA-1) as an anti-cancer drug delivery system has been
investigated in earlier studies; however, its role in the delivery of small
interfering ribonucleic acid (siRNA) has not been extensively studied. The use
of ApoA-1 in this regard has great potential due to its ability to specifically
target cancer cells and be transported through our water-based blood
stream. Objective: Our research concentrates on the purification of ApoA-1,
a major component of high-density lipoprotein (HDL), from E. coli and some
preliminary experiments on the preparation of nanoparticles utilizing siRNA and
their characterization. Methods: E. coli was grown at 37 °C until an optical
density of 0.6 was reached. The cells were then induced with 0.5 M isopropyl
β-D-1-thiogalactopyranoside (IPTG) and centrifuged. Then the pellets were
suspended in lysate and loaded onto a nickel-Sepharose column. Thereafter,
reconstituted high-density lipoprotein (rHDL) nanoparticles using siRNA were
prepared. Results: 160 mg per liter of ApoA-1 was purified. The exact particle
measurements are being investigated. Significance: Further investigation
regarding total incorporation of siRNA, physical and chemical characterization
as well as cytotoxicity of the particles could determine the efficiency of these
particles as a novel anti-cancer drug delivery system.

PLATINUM SPONSOR
IBM DEEP COMPUTING
www-03.ibm.com/systems/deepcomputing/index.html
1 Rogers Street, Cambridge, MA 02142. 617-693-4581
IBM’s Deep Computing organization is the high-performance computing
organization in the IBM Systems and Technology Group. This group is responsible
for the strategy, marketing and identification of areas that can benefit from
IBM’s high-end technology. The life sciences are one such area, and IBM brings,
and will continue to bring, valued solutions to the life sciences.
IBM’s Research division is a partner with IBM’s Deep Computing organization in
developing the next generation of high-performance computers. In addition, the
Research division has many groups investigating numerous application areas in
collaboration with IBM’s customers and partners. These include IBM’s
Computational Biology Center and IBM’s new Computational Science Center.

GOLD SPONSOR
SOMALOGIC, INC.

SomaLogic, Inc. is a privately held biomarker discovery and
clinical proteomics company based in Boulder, Colorado.
The company’s mission is to use its proprietary Slow Off-rate Modified Aptamer
(“SOMAmer”) technology to develop enhanced protein analysis tools and reagents
for the life sciences community, to facilitate target validation, and to develop and
commercialize clinical diagnostic products that will improve the delivery of healthcare
by offering timely and accurate diagnostic information to physicians and their
patients. Further information about SomaLogic can be found at www.somalogic.com.

SILVER SPONSOR
ION TORRENT
Ion Torrent has pioneered an entirely new approach to
sequencing that enables a direct connection between
chemical and digital information. Ion Torrent™ technology
doesn’t use light; it’s the first commercial PostLight™
sequencing technology.
Instead, Ion Torrent marries simple chemistry to incredibly powerful, proprietary
semiconductor technology: it’s Watson meets Moore. The result is a sequencing
system that is simpler, faster, more cost-effective and more scalable than any other
technology available. The company’s goal is to democratize sequencing and make
this critical technology available to every lab.
Ion Torrent sequencing technology requires no proprietary chemistries or optics
because it’s based on a well-characterized biochemical process. When a nucleotide
is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as
a byproduct. That hydrogen ion carries a charge which our proprietary ion sensor can
detect. If a nucleotide, for example a C, is added to a DNA template and a signal is
detected, you know that nucleotide was incorporated. Our sequencer, essentially
the world’s smallest solid-state pH meter, has called the base, going directly from
chemical information to digital information. Because this is direct detection, each
nucleotide incorporation is recorded in seconds and you can do an entire run in
about an hour.
The semiconductor has transformed every industry it’s touched. Just as the
microprocessor enabled desktop computing to displace the mainframe, Ion Torrent
semiconductor technology will inevitably democratize sequencing, putting it within
the reach of any lab or clinic.

Rocky ‘10 is an official conference of the
International Society for Computational Biology.
Rocky ‘10 is supported by the Computational
Bioscience Program at the University of
Colorado School of Medicine.

								