The Sociogenomics Project
A prototype environment for a scalable, extensible, portable and seamlessly accessible information gateway

Core B Proposal: Aim 1
A strategy for upload of data into the central server (SDSC/NCSA)
• laboratory information management system
• development of web interfaces (with background ftp scripts) for data upload
• data formats/tools for standardization

Core B Proposal: Aim 2
Development of the Glue Grant Website
• web design
• web development
• User Interfaces

Core B Proposal: Aim 3
Databasing and data storage (backup included)
• development of required ontologies
• development of schema and implementation of data tables (assume we will use Oracle DB)
• development of middle layer for query/input/output tools

Core B Proposal: Aim 4
Development of query tools
• User Interfaces
• query engines and background SQL
• display of query outputs/formats/GUIs

Core B Proposal: Aim 5
Development of dissemination/display tools for data and analysis
• integration of analysis tools developed in Core C
• integration of analysis outputs from Core C into the website
• Development of data and analysis download tools

Core B Proposal: Aim 6
Community pages for enrichment of the data
• Community input pages
• Wiki?
• integration of public data and analysis (who will curate?)

Core C Proposal: Aim 1
Proposed Informatics Pipeline (responding to the critique that the multiple omics were not well integrated)
A. Primary data:
1) Transcriptomics (RNA-seq) on specific brain regions or neuron populations
2) ChIP-seq for a set of 10 pre-determined (from literature) and 10 data-driven TFs
3) Promoter scans for cis-regulatory elements
4) Epigenomics (BS-seq)

Core C Proposal: Aim 2
B. Build Transcriptional Regulatory Networks from 1-4 of Aim 1

Core C Proposal: Aim 3
C. Test predictions with RNAi and relevant drugs, followed by:
- Targeted proteomics
- Targeted metabolomics
- Hormone measurements
- Behavioral analysis

Core C Proposal: Aim 4
D. Conduct within-species and cross-species comparisons of Networks with existing and new analytical tools.
Focus on "module" as the unit of analysis?
• What is conserved (universal)?
• What is species specific?
• What varies genotypically within species?

Core B Infrastructure for the SocioGenomics Project
Overall Mission: To create a comprehensive information infrastructure that will
- Enable persistent archiving of information pertaining to SGP
- Provide a seamless interactive infrastructure for SGP researchers
- Create a state-of-the-art SGP Workbench that will contain tools, databases and interfaces
- Develop an overarching resource that will enable total research administration of SGP initiatives
The infrastructure will be based on a Hub and Spoke Model.

Hub & Spoke Model of SGP-IT
[Diagram labels: SGP-IT HUB; Research Core Centers; Technology Centers; Individual Research Activities; Public Repositories; Annotations; Administration; Curations; Publications]

Information Flow in SGP-IT Project
[Diagram labels: LIMS; Experimental Data; Clinical data; Core Information System; Informatics Workbench; Research Centers; PI Laboratories; Curation, Annotation, Publication; Public-at-large]

Informatics System Architecture

The SGP-IT WebPortal
Create a Web Portal that will serve as the entry point to the SGP-IT infrastructure

Website Users
• Anonymous Public
• Registered Public
• SGP Personnel
• Content Contributors
• Website Developers

Website Developers
• Overall Development
• Cellular Data
• Phenotype Data
• Other Experimental Data
• Clinical Data
• Reports
• Database Search Utilities
Because the website developers are organized as independent groups, each developing a separate section of the website, the result is a diverse collection of pages and programs rather than a centralized program.

Website Content
Public Areas
• About CalSCI
• Data
• Home
• Login
• Membership
• Stem Cell Pages
• Protocols
• Reports
• Pathway Maps
• Sponsors
Restricted Areas
• Access Control Administration
• Data Administration
• E-mail Aliases
• Progress Reports
• View
• Upload
• System Committees
• Message Boards
• Barcode
• Data
• DataScope
• Review Committees

Access to SGP-IT
Who will have access and how will it be regulated?

Overview of Access Control Layers
[Diagram labels: HTTP request from user; HTTP response from server]

Access Control Layers
1. Web server layer (Apache mod_perl): General access control for all users of the website
2. Application layer (static HTML, Perl CGI, Java servlets): More specific access control and personalization for users of each application (username can be read from cookie set by Layer 1)
3. Database layer (Oracle database): Database gives different access privileges to each application

Access Control in SGP-IT Gateway
• Completely open: Home & Info Pages
• Login required
• Also some special sections for sole use of: Administrators; Review Panel; Access-Controlled
• Oracle Database: No direct human access (except for sysadmin)
[Diagram labels: News; Staff; Reviews; Reports; Experimental Data; Clinical Data]

Laboratory Information Management System
Creation of a LIMS for mandatory use: Importance of tracking and validating data

CalSCI-IT Laboratory Information Management System (LIMS)
[Diagram labels: Sample tracking; Barcode Scanner; Data Integrity/Storage; Oracle Database; IP; Barcode printer; Data Input GUIs; Sample Preparation; Sample Processing; Sample Handling; Barcode; Test tube/Sample vial]

A barcode printer & a barcode scanner
Zebra: Zebra Z4M barcode printer
- 300DPI
- 4MB Total
- 2MB Usable
Zebranet Print Server II, External, Parallel
Symbol: Cyclone Scanner USB & Synapse Adapter

SGP-IT Data Center
Automatic Pipelining of Data from Research Laboratories in the Stem Cell Initiative

Transfer of experimental data
• Experimental data files transferred on a periodic basis using UNIX wget utility under control of cron job
[Diagram labels: sites UIUC, UCSD, UCB, U. Penn; data types Cellular, Phenotype, microarray, Transfection, Other; destination SGP-IT]

Parsing log files / loading db
• Set of cron jobs run each night. Names of newly uploaded files passed as arguments to loader scripts.
• Log files created so that experiment schemas can be easily repopulated
[Diagram labels: auto-ftp log; Data master; Data loader; Data@CalSCI (the last three repeated for each data source)]

Graphics generation
• Scripts generate libraries of static images, html, tab-delimited files. Data pulled from experiment and barcode schemas
[Diagram labels: barcode@CalSCI; data@CalSCI; process data/ (repeated for each data source)]

Display
• Argument passed to CGI script to specify graphics, etc. to be displayed
[Diagram labels: data/Type; cal_data.cgi; Data/specific; caption: Text, graphics, links, etc. for a given data type with specific data]

Example Data Flow

The SGP Data and Knowledge System
Data Analysis Infrastructure

Biology: Genotype to Phenotype
[Diagram labels: COMPONENTS (Parts-List); INTERACTIONS AND NETWORKS; CELLULAR MODELS; CONTEXT DEPENDENCE; QUANTITATIVE EXPERIMENTAL BIOLOGY]

Analysis Infrastructure
Infrastructure for Data Integration and Analysis Workbenches
[Diagram labels: Functional Genomics Workbench; Systemic Annotation Infrastructure; Biochemical Pathways Workbench; Transcriptome & ChIP]

IT Applications To Research Biological Projects
[Diagram labels: Bioinformatics Data; Gene Data; Transcriptome; Phenotypic Data; Cellular Data; Metabolome Data; Query & Analysis; SGP-IT Database Infrastructure]

The Parts-List: SGP Strategy
[Diagram labels: Genome; Genes; Cell-Specific Transcriptome; Stem Cell Pages; Cell-Specific Gene Products; Signaling Proteins Invoked in Input-Specific Response; Cell-Specific Gene Products Invoked by Input]

Reconstructing Networks
Legacy Data
• Protein Interactions
• Biochemical Pathways
• Published Literature
CalSCI Data
• Microarray Data
• Yeast Two-Hybrid Data
• RNAi Data
• Protein Data
• Perturbation Data
• Microscopy Data

Example Annotation pipeline
[Diagram labels: Unique proteins with Swiss-Prot IDs; Protein annotation; Protein Class; GO; Other DB; Function; Genes; Enzymes; Domains; Locuslink; E.C. classification; Structures; Chr. Mapping; Reaction; Sub-cellular location; RefSeq; Substrates; PTM; Mutation data; Products; Disease association; Phenotype variation; Compound data]

The SGP-IT Workbench
[Diagram labels include: Molecule Pages; Sequence Annotation; Interaction Modules; Motif Libraries; Signaling Pathways; Organism Specificity; Cell & Tissue Phenotypes; Molecular Interactions; Data Structure; Pathway GUI; Expression Profiles; Proteomics Profiles; Interaction Profiles; Database; Temporary Storage; Editing Tool; Comparison Tool; Query Tools; Analysis Tools; Legacy Data Store; Networks; Annotations; Bulletin Board System; Other Analysis Tools]

A Model Computing Infrastructure for the Stem SGP IT Server Core
An Exemplar SGP-IT Computer Infrastructure
[Diagram labels: Internet; 100T public network; 1000T private network (inter-machine NFS); three Sun 420R machines (4 x 450 MHz, 4 GB RAM each); Sun E6500 (8 x 450 MHz, 8 GB RAM); compute cluster of Intel PCs (n x 32 processors, n x 64 GB); Sun V100, Sun V440, Sun 420 and Sun V480 machines (specifications shown include 4 x 450 MHz with 4 GB RAM and 4 x 900 MHz with 16 GB RAM); roles shown: web server, web application server, compute server, S+ compute server, WorkBench (seqtool), CVS server, SRB server, NFS server; SAN connection; RAID-5 storage (5 TeraBytes, 5 TeraBytes, 680 GB); T3; T3ES]

SGP Informatics and Systems Biology Proposal
Cores B/C

What should the experimentalists' interaction be?

Creation of a Communication and Networking Infrastructure across Core Institutions/Labs.
• A. Videoconferencing Infrastructure
• B. Networking Infrastructure: Fiber, other connections between institutions.
• C. The SGP Website
• D. Web Portal for instant messaging/conferencing/scheduling.

Creation and Maintenance of Samples/Reagents/Cells.
• cell samples – appropriately stored and indexed
• bar coding and tracking system
• database containing anonymized and access-controlled metadata
• annotations/notes on the cells and IDs to link to data

Creation of Laboratory Information Management Systems.
• Development of ontologies and metadata catalogues in consultation with experimentalists.
• Barcoding, printing and sensor systems and development of software interfaces.
• Pipelining and validation of bar codes and metadata.
• Creation and maintenance of LIMS databases and interfaces.
• Need detailed catalogued experimental protocols.

Data Management, Sustenance and Interoperability.
• Design of information system
– What types of data will be accrued?
– How will they be collected and formatted?
– What are the ontological definitions?
– What relational databases are suitable?
– How will data be entered into databases?
– What will be the type of queries and what will be good interfaces?
– How will the database be access controlled?
– What will be the links between different databases?
– How will interoperability be achieved?
– What tools are needed to process the data?
– What types of statistical tools will we need?
– How will the data be stored and made accessible for other repositories?
– What will be the links with national repositories?
– Who will authenticate the data?
• Development of interfaces and analysis tools
– What types of interfaces are necessary?
– How interactive will the objects in these interfaces be? Will a client-server model work?
– How will analysis tools be made available? How will these be tied to biology?
– How will we deal with image representations?
• Development of interoperability systems
– Design of agent-based systems
– Integration/interpretation of ontologies
– Legacy knowledge incorporation
– Scientific annotation and literature embedding.

From genes to pathways to phenotypes.
• Develop a parts list database for stem cells along with annotations.
• Link gene function and polymorphisms to gene function, pathways and networks involved in cell differentiation and development
• Identify cellular media conditions and their relationships to phenotypes
• Use network reconstruction as hypothesis engine for knockouts and perturbations
• Use statistical tools for analyzing stem cell differentiation pathways
• Develop links between transcription profiles, development and phenotypes.
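
One way to read the bullet above on using network reconstruction as a hypothesis engine for knockouts and perturbations: once a regulatory network has been reconstructed, the genes reachable from a knocked-out regulator are candidate genes to measure after the perturbation. The following is a minimal sketch under that assumption, not the project's method. The network, the gene names (TF_A, gene_1, and so on) and the function name predicted_downstream are hypothetical placeholders.

```python
from collections import deque

# Hypothetical regulator -> targets edges. These names are placeholders,
# not data from the project.
NETWORK = {
    "TF_A": ["TF_B", "gene_1"],
    "TF_B": ["gene_2", "gene_3"],
    "TF_C": ["gene_3"],
}


def predicted_downstream(network, knocked_out):
    """Return every gene reachable from the knocked-out regulator.

    Uses a breadth-first traversal of the directed network, so indirect
    targets (targets of targets) are included. The knocked-out gene
    itself is never reported, even if the network contains a feedback loop.
    """
    seen = set()
    queue = deque([knocked_out])
    while queue:
        node = queue.popleft()
        for target in network.get(node, []):
            if target not in seen and target != knocked_out:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    # Candidate genes to measure after knocking out TF_A in the toy network.
    print(sorted(predicted_downstream(NETWORK, "TF_A")))
```

In this toy network, knocking out TF_A flags TF_B and gene_1 as direct targets and gene_2 and gene_3 as indirect ones. A real analysis would also need edge signs and weights, which this sketch leaves out.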