ICR ASBP Analysis Service Parameters Proposal

Draft proposal
In alphabetical order: Brian Davis, Kiran Keshav, Ted Liefeld, Curt Lockshin, Patrick McConnell, Martin Morgan, Sal Mungal, Jared Nedzel, Baris Suzek, Claire Wolfe
Nov 8, 2007

Analytic Services are DIFFERENT from Data Services
• Data services (e.g. caArray)
  • Long lifetimes
    • Remain useful for many years
    • May be extended or grow, but seldom disappear
  • Grow and change slowly
  • Few in number: 10s to 100s of services
• Analytic services (e.g. Hierarchical Clustering)
  • Short(er) lifetimes
    • Frequently replaced by newer algorithms or variants
    • E.g. BLAST: 13 variants at http://www.ncbi.nlm.nih.gov/BLAST/
  • Change often
    • Some GenePattern algorithms have had >10 updates
    • Parameters added or removed, implementations improved
  • Many in number
    • GenePattern, Bioconductor and geWorkbench have >400 between them

Analysis Services are DIFFERENT from Data Services
• caGrid class registration of inputs and outputs supports (relatively stable) data services
  • Data models have long lives
  • Registration overhead is small compared to service implementation
  • Registered classes remain valid for a long period
  • Geared towards supporting new services, starting from a new data model to be put on the grid
• Analytic services: more of them, more variable, shorter lifespan
  • Class registration overhead is a significant portion of the development effort
  • (Many) analytic services are preexisting
    • GenePattern, Bioconductor and geWorkbench have >400; ~9 are on caGrid
    • Developers must "go back" and re-model the service parameters the caBIG way
  • Parameters change often; each version may have different parameters
Conclusion: the caBIG registration process needs to be modified to get more analytic services onto caGrid

Process for analytic services
• Re-annotating reused classes (Solution: Service Loader)
• Annotation of parameter classes
• Modeling reused classes (Solution: Service Loader)
• Modeling parameters
• SIW round trip only partially working for reused models (Solution: bug fixes in SIW 3.2.1 + Service Loader)
• Reloading reused classes (Solution: Service Loader)
• Loading parameter classes
Outstanding issues in RED

caGrid and analytical services: steps in the Introduce toolkit
Introduce steps: Import datatypes (from XSD file, XML file, caDSR, GME) -> Create operations -> Add Service Metadata and Domain Model (caDSR) -> Create Skeleton / Implement Methods
• XSD generation: wrong XSD using EA; caCORE SDK not used for analytical services
• Redefinition of interfaces/operations modeled in EA
• Annotation and tagging (CDE id) of parameter classes in EA (needs caDSR load)
• Loading classes to caDSR and schemas to GME before development
Outstanding issues in RED

Issues and Solutions
1) Use a specialized "Service Loader" and improvements to "Roundtrip"

Issue 1 - Model Reuse
• Significant time investment to reuse models
  • Hard to include in UML
  • Round trip did not work well
  • Required re-annotation and re-generation of XSDs

Solution 1 - Use New "Service Loader" to Register Reuse of Models
• Significantly reduces the time to register models that reuse other models (by ~2 developer FTE weeks)
• Replace with a Service Loader-based process
  • Reused models are not included in the UML unless modified or extended
  • Model reuse is registered in Introduce and recorded in the service metadata
  • Service metadata is submitted to the Service Loader to record the reuse in caDSR
  • Prevents mismatch problems with partially reused models (e.g. GP/caArray/caB2B)

NOTE: The new process still needs a "test drive" to ensure it works
• Created a demo service to test the Service Loader process for model reuse
  • Used as an example for new analysis service developers
  • Used to provide scaffolding for a white paper describing how to create analysis services

Exploring Further Solutions & Additional Time Savings: Parameters
Issue 2 – Modeling and registration of parameters
• Parameters change frequently, requiring model changes, re-annotation, and loading into caDSR (3 months?)
• Parameters, unlike input and output data classes, are not intended for semantic interoperability or reuse
• Parameters are not semantically rich or meaningfully annotatable
• Parameters are meaningful only within the context of the service

Solution 2 – Treat parameters differently
• Time savings estimated at ~1 developer FTE week of effort per service, spread over 2-3 calendar months
• Additional curator FTE savings due to a reduced model-loading workload

Proposal 1 - Generic Parameter Passing Model
• Use a generic parameter model to pass parameters to the services
  • Reuse the model and let the Service Loader register our model reuse
  • The model is registered once and reused often
• A simple, reusable metadata model facilitates auto-generation of parameter metadata and service implementations

Proposal 2 - Generic Parameters Metadata Model
• Extend caGrid Service Metadata (already supported) with a Parameter Metadata model (as discussed with the caGrid team)
• All metadata is handled at the caGrid level
Draft model: (diagram not reproduced)
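The generic passing model and its metadata can be illustrated with a small Java sketch. It assumes a hypothetical `GenericParameter` name/value class (standing in for the one class that would be registered once and reused by every service) and a hypothetical `ParameterMetadata` class carrying each parameter's name, type, description, default, required flag and position. A `ParameterBinder` checks the values a client sends against that metadata and fills in defaults. All class and field names are illustrative assumptions, not the caGrid Service Metadata schema.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-parameter metadata, as an extended service-metadata entry might carry it.
final class ParameterMetadata {
    final String name;         // e.g. "num.neighbors"
    final String type;         // Java type name, e.g. "java.lang.String"
    final String description;  // free text; deliberately not a caDSR CDE
    final String defaultValue; // null when the parameter has no default
    final boolean required;    // assumed meaning of a boolean flag; hypothetical
    final int position;        // order in which the operation expects the parameter

    ParameterMetadata(String name, String type, String description,
                      String defaultValue, boolean required, int position) {
        this.name = name;
        this.type = type;
        this.description = description;
        this.defaultValue = defaultValue;
        this.required = required;
        this.position = position;
    }
}

// Hypothetical generic name/value pair: registered once, reused by every analytic service.
final class GenericParameter {
    final String name;
    final String value;

    GenericParameter(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Binds the generic name/value pairs a client sends to one operation's declared parameters.
final class ParameterBinder {
    private final List<ParameterMetadata> declared;

    ParameterBinder(List<ParameterMetadata> declared) {
        List<ParameterMetadata> sorted = new ArrayList<>(declared);
        sorted.sort(Comparator.comparingInt(m -> m.position)); // declared order, not caller order
        this.declared = sorted;
    }

    // Returns name -> value in declared order, applying defaults; rejects missing or unknown names.
    Map<String, String> bind(List<GenericParameter> supplied) {
        Map<String, String> remaining = new LinkedHashMap<>();
        for (GenericParameter p : supplied) {
            remaining.put(p.name, p.value); // a repeated name keeps the last value
        }
        Map<String, String> bound = new LinkedHashMap<>();
        for (ParameterMetadata m : declared) {
            String value = remaining.remove(m.name);
            if (value == null) {
                value = m.defaultValue;
            }
            if (value == null) {
                if (m.required) {
                    throw new IllegalArgumentException("Missing required parameter: " + m.name);
                }
                continue; // optional with no default: leave it out
            }
            bound.put(m.name, value);
        }
        if (!remaining.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameter(s): " + remaining.keySet());
        }
        return bound;
    }
}
```

For example, a caller could pass only `gene.accession`; the binder would take `num.neighbors = 50` from the metadata and would reject an unknown name rather than silently ignoring it. Keeping every value as a `String` leaves type conversion to the service implementation, so the single registered class never has to change when a service adds or drops a parameter.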
Solution 2 – Treat parameters differently
• Generic Parameter Passing Model reused in the Domain Model
• Generic Parameter Metadata Model in the service metadata ONLY
  • Enhanced service metadata defines the parameters
• Parameters are NOT registered as CDEs in caDSR (not semantically annotated)
  • The parameters are found in the index service

Pros and Cons
Pros:
• SAVES TIME (~1 developer FTE week per service)
• More analytic services on caGrid / available to caBIG
• Actual parameters and their descriptions are still available at the Grid level
• No caDSR/GME registration dependency (if all classes are reused)
Cons:
• No parameter reuse
• No concept-based discovery of services
• No semantic interoperability based on parameters (is this likely, anyway?)
• No CDEs for parameters
• A different place to look for parameter metadata (not caDSR)
• Proposed model is not appropriate for non-caGrid services
  • Could be adapted to support non-caGrid silver services

Time Savings from Adoption of Proposals
Time to semantically annotate and grid-enable an analytic service:
Scenario | Total calendar time | Total developer time | Comments
No change to process | 6-9 months | 5 FTE weeks (200 hours) | Estimated time taken by the reference implementations in 2006-7 (GenePattern, geWorkbench, Bioconductor)
Adoption of Analytic Service Loader proposal | 3.5 months | 3 FTE weeks (120 hours) | Parameters still need to be registered in the domain model
Adoption of Analytic Service Loader AND Parameter proposals | 1.5 months | 2 FTE weeks (80 hours) | Time for using the Service Loader and the generic model proposal

Next Steps
• Suggested next steps
  • Modify the proposal based on input from NCI
  • Create a final draft of the generic parameters proposal
  • Meet with the caGrid team on the extension to the Service Metamodel
  • Present to the Arch and VCDE Workspaces
  • Develop proof-of-concept services
    • Test drive the Analytic Service Loader
    • Test use of generic parameter services
    • Register generic parameter models in caDSR

Appendix: additional slides
• Extra slides, not in any order

ASBP Meeting Logistics
• Meeting on the 2nd Friday of every month
• Next meeting: November 9th @ 2:00 pm EST
• Topics
  • Continued development of the demonstration service & white paper
  • Analysis of the CGEMs model for the demo service (Hrishi)
  • Follow-up about registration of parameters
• liefeld@broad.mit.edu

Parameter Modeling Comparison
• Modeling one parameter set (below)
  • Including all caDSR tags & stereotypes
  • Clean-up of tags, adding concept codes
• Generating extended metadata using the new model for GenePattern modules
  • EA -> Schema -> JAXB -> custom Java code
Example parameter set for operation executeAnalysis:
  1. gene.accession (java.lang.String; true): reference gene accession from data file to find neighbors for
  2. num.neighbors (java.lang.String; default 50; true): number of neighbors to find
  …

Modeling Time Comparison
One-time cost to create the metadata-generation code: <120 minutes
Parameter set | Time to model & first-pass annotation | Estimated registration & semantic annotation | # of parameters | # of value domains | # of modules
One parameter set | 95 min | ?? working days (2-3 calendar months) | 4 | 2 | 1
Generic parameter metadata generation | ~1 min | 0 min | | |
All modules (estimate to draft introductory stage) | ~3.5 person weeks + registration & XSD | | 270 | 125 | 82
Generic parameter metadata generation, all modules | ~1 min | | | |
Note: This does not include the semantic annotation or XSD editing, which are typically the most time-consuming portions of the process.

caGrid Service Metadata From caDSR Registration
(diagram not reproduced)

Current caGrid Parameter Modeling
A current caGrid parameter representation (diagram not reproduced)
Cons:
• Modeling, semantic annotation and caDSR registration: significant cost
Pros:
• CDE-based discovery
• Parameter information on caDSR
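The appendix describes generating extended parameter metadata automatically for GenePattern modules (EA->Schema->JAXB->custom Java code). The sketch below illustrates only the idea of the custom-code step, in simplified form: it reads a module's numbered parameter properties and emits one XML element per parameter. The property keys (`p1_name`, `p1_type`, `p1_description`, `p1_default_value`, `p1_optional`) are modeled loosely on GenePattern module manifests and should be treated as assumptions, as should the XML element and attribute names and the class name `ParameterMetadataGenerator`. A real implementation would more likely marshal JAXB-generated classes than concatenate strings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical generator: numbered module parameter properties -> parameter-metadata XML.
final class ParameterMetadataGenerator {

    // Escapes the five XML special characters so free-text descriptions cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Walks p1_*, p2_*, ... until a pN_name is absent and emits one <parameter> element per entry.
    static String generate(String operation, Properties module) {
        StringBuilder xml = new StringBuilder();
        xml.append("<operationParameters operation=\"").append(escape(operation)).append("\">\n");
        for (int i = 1; module.getProperty("p" + i + "_name") != null; i++) {
            String key = "p" + i + "_";
            // Assumption: any non-empty pN_optional value marks the parameter as optional.
            boolean required = module.getProperty(key + "optional", "").isEmpty();
            xml.append("  <parameter position=\"").append(i)
               .append("\" name=\"").append(escape(module.getProperty(key + "name")))
               .append("\" type=\"").append(escape(module.getProperty(key + "type", "java.lang.String")))
               .append("\" required=\"").append(required)
               .append("\" default=\"").append(escape(module.getProperty(key + "default_value", "")))
               .append("\">\n")
               .append("    <description>")
               .append(escape(module.getProperty(key + "description", "")))
               .append("</description>\n")
               .append("  </parameter>\n");
        }
        return xml.append("</operationParameters>\n").toString();
    }

    // Convenience overload: parses key=value text, e.g. the contents of a manifest-style file.
    static String generate(String operation, String propertiesText) {
        Properties module = new Properties();
        try {
            module.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory StringReader
        }
        return generate(operation, module);
    }
}
```

Because a generator like this is written once and then applied to every module, its cost is a one-time expense; producing metadata for another module, or for a new version with changed parameters, is a re-run rather than new modeling and registration work.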
