Opportunities for Text Mining in Bioinformatics

(CS591-CXZ Text Data Mining Seminar)
Dec. 8, 2004
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign

Why Biology Text Mining?
• Strong motivations from the biology side
  – Difficulty for biologists to access literature
    • No theory in biology, so we must keep all literature “alive”
    • Observations about the same biological mechanism may be described in different terms (e.g., due to different perspectives of study)
  – Many unanswered research questions
  – Text mining may help better organize and link biology literature, and answer simple questions (e.g., what do we know about this gene?)

Why Biology Text Mining? (cont.)
• Potentially high impact from the CS side
  – Any “discovery” from biology text could be potentially significant
  – Biology text is relatively “easy” for mining
    • Literature is cleaner (compared with web data)
    • Biology text often has many annotations
    • Many other kinds of biology data can be exploited (e.g., DNA/protein sequences, gene expression information, metabolic networks)
  – Simple techniques may work

Characteristics of Biology Text
• Large number of entities (e.g., genes, proteins) that have well-defined semantics
• No standard for terminology (inconsistencies)
• Ambiguities (e.g., many acronyms)
• Synonyms
• High complexity in phrases and sentence structures

Research Topics
• General goal: Applying known text mining techniques to help biology research
• Problem 1: Data/Information Integration
  – How can we integrate text information (discovering terminology linkages)?
  – How can we link text with databases (semantic interpretations of text on top of entities/relations in the DB, e.g., entity extraction)?
  – How can we integrate biology DBs (many fields are text)?
• Problem 2: Functional annotations
  – How can we annotate a biological entity (e.g., a gene) with functional information extracted from literature?
  – How can we annotate a set of related genes with functional information?
  – How can we exploit the ontologies/thesauri in biology?

Research Topics (cont.)
• Problem 3: Data/Information Cleanup & Curation
  – How can we detect suspicious data/information in existing databases?
  – How can we automate many manual tasks of database curation?
• Problem 4: Research question answering
  – How can we answer simple research questions (e.g., what functional connections are there between these two genes)?
  – How can we support exploratory access to and digestion of literature information (e.g., a biology research workbench)?
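
As a rough illustration of the entity-extraction idea under Problem 1, and of the claim that simple techniques may work, the sketch below tags gene mentions with a dictionary lookup and maps synonyms to one canonical symbol. The synonym table, the function names `tag_genes` and `index_by_gene`, and the example sentences are all hypothetical and are not taken from the slides. A real system would presumably load synonyms from a curated gene database and would need to handle ambiguous acronyms.

```python
import re

# Hypothetical synonym table: lowercase surface form -> canonical gene symbol.
# Keys must be lowercase because matches are lowercased before lookup.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "tumor protein p53": "TP53",
    "brca1": "BRCA1",
    "breast cancer 1": "BRCA1",
}


def build_pattern(synonyms):
    """Compile one case-insensitive regex that matches any known name."""
    # Longest names first so multi-word names win over their substrings.
    names = sorted(synonyms, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # Lookarounds keep a name from matching inside a longer token.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def tag_genes(text, synonyms=SYNONYMS):
    """Return (start, end, surface form, canonical symbol) for each mention."""
    pattern = build_pattern(synonyms)
    return [
        (m.start(), m.end(), m.group(0), synonyms[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def index_by_gene(sentences, synonyms=SYNONYMS):
    """Group sentences by the canonical symbol of each gene they mention."""
    index = {}
    for sentence in sentences:
        for _, _, _, symbol in tag_genes(sentence, synonyms):
            bucket = index.setdefault(symbol, [])
            if sentence not in bucket:
                bucket.append(sentence)
    return index


if __name__ == "__main__":
    sentences = [
        "Tumor protein p53 interacts with BRCA1.",
        "TP53 is frequently mutated.",
    ]
    for symbol, hits in index_by_gene(sentences).items():
        print(symbol, hits)
```

Grouping sentences under a canonical symbol is one crude way to approach the "what do we know about this gene?" question: all mentions, whatever term the authors used, end up in one place.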
