NCSA Annual Report
2005
Letter from the Director
Welcome to NCSA’s annual report for fiscal year 2005.
Computing is essential for advancing scientific discovery and the state-of-the-art in engineering, whether that means exploring the inner workings of the universe or of the economy, designing a new chemical process or a new material, gaining new insights into how cells function or into the delivery of personalized medicine, understanding the spawning of a tornado or urban development. The basic fact remains the same: without computing, science and engineering will not reach their full potential. But computational infrastructure, like any infrastructure, is expensive to build and maintain. Every dollar the nation spends on computing could go to other things. Health care. National defense. Social Security. Economic development. How does our investment in computing benefit the nation?
The teams, institutes, and centers that have led the deployment of the nation’s cyberinfrastructure for the past two decades, NCSA happily among them, can answer that question with confidence and pride. Many scientific discoveries and engineering feats were made possible with our help. Pharmaceutical companies used computer technology to usher in the era of rational drug design, and scores of manufacturers used computational modeling to improve their products. Visualization techniques that we pioneered spread throughout the scientific disciplines and also brought science to an appreciative public. Affordable cluster computing allows investigators to build their own computational resources and dedicate them to their individual efforts. These machines came of age and became standard on our machine room floor. And we must not forget that Mosaic, and the Web browsers that followed it, forever changed the way we live, work, and play. This is an impressive return on investment, and the above list describes these returns in only the broadest of strokes.
But the computing landscape is changing. Extraordinary computing resources are now spread all over the country. To continue advancing we must integrate these various elements (computers, data sources and data stores, networks, scientific and engineering applications, data mining, analysis and visualization tools, and so on) into a resource that scientists and engineers can tap as needed. The development of this cyberinfrastructure has enormous implications for science and engineering. In many fields it will lead to major advances in our understanding; in other fields, it will open new avenues of inquiry. Realizing the national cyberinfrastructure will require a new level of collaboration between scientists and engineers on the one hand and computer scientists, engineers, and technologists on the other.
In 2005, NCSA forged a new compact with the scientists and engineers that it serves, a promise that NCSA will be a place not only of technologies but of engagement. We pledge to partner with them to ensure that they realize the full benefit of the national cyberinfrastructure. We will help them harness the extraordinary computing and other resources and capabilities that will be available across the nation to solve their most pressing problems. Our goal is simple. We want to put researchers in control by giving them a comprehensive cyberinfrastructure environment—or cyberenvironment—for pursuing their work. Cyberenvironments will allow researchers to access the resources and tools that they need, to orchestrate and automate the work that must be done, and to collaborate with fellow researchers both near and far. This is a daunting task and will not be achieved in a day, a month, or even a year. But it promises a new age of scientific exploration.
This report highlights the work that we are undertaking in cyberenvironments and the progress that we have made in pursuit of this goal. It records not only what we’ve done but also what we think is important and what we believe is possible. We will also discuss the cyber-resources that are indispensable to progress in science and engineering, as well as research on innovative systems that will ensure that these resources continue to meet the rapidly increasing computing needs of the communities that we serve. Our contributions to science, engineering, and society are summarized by our four interlinked goals:
• Cyberenvironments: to realize the full potential of the national cyberinfrastructure.
• Cyber-resources: to advance scientific discovery and the state-of-the-art in engineering.
• Innovative systems: to define the path to petascale cyberscience and cyberengineering.
• Cybereducation: to bring cyberenvironments into the classroom.
The concepts behind these four thrusts are more important than any single tool or service we might deploy. Each of these concepts is fundamental to advancing science and engineering in the 21st century. None of these concepts will be realized overnight, but they are crucial to our success as a center as well as to the success of our nation in an increasingly technological world. We are dedicated to achieving these goals.
Thomas H. Dunning Jr.
Director, National Center for Supercomputing Applications
Professor and Distinguished Chair for Research Excellence, Department of Chemistry, University of Illinois at Urbana-Champaign
Introduction
In January 2003 the National Science Foundation (NSF) issued the report from the Blue-Ribbon Advisory Panel on Cyberinfrastructure, which was chaired by Professor Daniel E. Atkins of the University of Michigan. The report, entitled Revolutionizing Science and Engineering Through Cyberinfrastructure, noted that: “multiple accelerating trends are converging and crossing thresholds in ways that show extraordinary promise for an even more profound and rapid transformation—indeed a further revolution—in how we create, disseminate, and preserve scientific and engineering knowledge.” To realize the benefits of this revolution, the Blue-Ribbon Advisory Panel called for the creation of a national cyberinfrastructure for science and engineering through long-term, coordinated investments in:
• Fundamental research to create the technologies that are key to advancing cyberinfrastructure.
• Development activities to create and evolve an advanced operational cyberinfrastructure.
• Institutions to provide operational support and other services for the operational cyberinfrastructure.
• High-impact applications of advanced cyberinfrastructure in all areas of science and engineering research and education.
Following publication of the report from the Blue-Ribbon Advisory Panel, reports from all of NSF’s directorates have stressed the need for a national cyberinfrastructure for science and engineering with tools, services, and resources far beyond those currently available.
Since its creation by NSF and the state of Illinois in 1986, the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign (UIUC) has been a leader in working with and supporting scientific and engineering communities through the development and deployment of new computing and software technologies.
In 2005, NCSA undertook a strategic planning activity—NCSA 2010—to ensure that the center provides those cyberinfrastructure elements most needed to advance science and engineering research and education in the years ahead. Consultants from the Wharton School of the University of Pennsylvania and Strategic & Innovation Consulting assisted the NCSA Strategic Planning Committee. The committee held meetings in March, May, and June and commissioned a number of studies examining the trends and forces in cyberinfrastructure, as well as an assessment of NCSA’s internal capabilities and assets. The process included two days of meetings in Washington, D.C., with Congressional aides and program managers from NSF, the U.S. Department of Energy’s Office of Science, and the National Institutes of Health, as well as staff from selected non-governmental organizations headquartered in the D.C. area (Coalition for Academic Scientific Computation and the Council on Competitiveness). The “View of the World” that resulted from the strategic planning process is summarized in Table 1.
1. “Revolutionizing Science and Engineering Through Cyberinfrastructure,” Report of the Blue-Ribbon Advisory Panel on Cyberinfrastructure, January 2003; see: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=cise051203.
2. These reports are listed in Appendix A of “NSF’s Cyberinfrastructure Vision for 21st Century Discovery, Version 4.0,” NSF Cyberinfrastructure Council, National Science Foundation, 26 September 2005. See also: http://www.nsf.gov/od/oci/reports.jsp.
NCSA 2010
The NCSA strategic planning process concluded that new capabilities would indeed be needed to fully realize the benefits of a national cyberinfrastructure. NCSA 2010 is built on four pillars. First, NCSA is creating cyberinfrastructure environments—cyberenvironments. Cyberenvironments couple traditional desktop computing environments with the resources and capabilities in a national cyberinfrastructure to provide researchers with an unprecedented ability to access, integrate, automate, and manage complex, collaborative projects across disciplinary as well as geographical boundaries. This activity expands NCSA’s work on cybertools, cyberservices, and middleware to provide end-to-end solutions for addressing the most challenging problems in science and engineering. Second, NCSA will continue to provide the leading-edge computing, data, and visualization resources—cyber-resources—to power cyberenvironments for science and engineering. This will ensure that resources are available to solve the most demanding scientific and engineering problems and that the solutions are obtained in a timely manner. NCSA will work closely with existing and emerging communities to determine how best to deliver the needed cyber-resources to their communities. Third, to ensure that the cyber-resources provided by NCSA continue to meet future computational needs, NCSA has established a new effort in innovative systems. This effort will first focus on new computing technologies, as the technologies needed to realize petascale cyberscience and cyberengineering are unclear. Over time, the scope of this activity will broaden to include the entire system of computers, data stores, networks, and associated software needed to solve cutting-edge problems. Finally, NCSA is collaborating with faculty and staff at UIUC and other universities to develop a program to bring cyberenvironments into the classroom. 
This will ensure that the benefits of the national cyberinfrastructure are made available to educators and students throughout the country. Educational versions of cyberenvironments will bring the latest research concepts and technologies into the classroom, reducing the gap between coursework and practice. To better achieve the objectives listed above, NCSA was reorganized. There are three technology units in NCSA: the Persistent Infrastructure Directorate, which is responsible for the cyber-resources and services offered by NCSA, including NCSA’s local infrastructure; the Cyberenvironments and Technologies Directorate, which is responsible for creating and integrating the technologies needed to build cyberenvironments; and the Innovative Systems Laboratory, which is responsible for assessing the potential of new computing architectures for key scientific and engineering applications. Crosscutting these technology units is the Cyberapplications and Communities Directorate. This directorate is the bridge to the larger scientific and engineering communities that NCSA serves and is responsible for providing community input into the development of cyberenvironments, the acquisition of cyber-resources, and the evaluation of innovative systems. The organization structure of NCSA is illustrated in Figure 1.
Figure 1. Organizational overview of NCSA.
Funding
NCSA receives funding from several sources to support its technology development and deployment activities. The major source of support is the National Science Foundation, which provides funding for the acquisition, operation, and maintenance of supercomputing, data storage, and visualization resources for the nation’s academic scientists and engineers and for the development of the software infrastructure needed to efficiently and effectively use these resources. In a groundbreaking partnership between a state and the federal government, the state of Illinois provides additional funding to support NSF’s mission. Funding from NSF and the state of Illinois supports our activities in cyberenvironments, cyber-resources, and innovative systems. NCSA also receives major funding from the Office of Naval Research (ONR). ONR funds the National Center for Advanced Secure Systems Research (NCASSR), which is addressing the nation’s critical need for a dynamic, adaptive cybersecurity infrastructure. ONR also funds the Technology Research, Education and Commercialization Center (TRECC) at the DuPage Research Park in the western Chicago area. TRECC is a partnership among ONR, NCSA, and other University of Illinois academic and administrative units that supports innovative research in advanced information technologies and their application to the needs of the Navy R&D community. NCSA joint faculty and staff also carry out research under grants from a number of agencies. This includes research by the scientists and engineers in the Cyberapplications and Communities Directorate, which provides a direct link between NCSA’s cyberinfrastructure activities and the needs of working scientists and engineers. Similarly, research by computer scientists, engineers, and technologists in NCSA’s directorates allows us to participate in as well as track the development of new advances in computing and information technology. The funding that NCSA received in FY2005 is summarized in Figure 2.
Figure 2. Funding received by NCSA in FY2005.
View of the World
Cyberinfrastructure
• Cyberinfrastructure is increasingly essential for:
  • Achieving advances in science and engineering.
  • Solving complex, real-world, multidisciplinary problems.
  • Educating the next generation of scientists and engineers.
  • Maintaining the world’s leading economy.
• Software for analyzing scientific data is increasingly important as scientific data volumes are rapidly increasing; broad community access to data analysis and visualization tools, as well as the data sets, is needed.
• Scalability of scientific and engineering applications is a major barrier to realizing the promise of high-end computers based on massive parallelism.
• For cyberinfrastructure to be the basis for frontier science and engineering, it must:
  • Be robust and stable, providing a “gold-standard” hardware and software environment.
  • Be needs-driven and scientific domain-specific, but designed holistically, re-using elements of the cyberinfrastructure that are common across domains.
  • Provide scientists and engineers with the ability to manage, coordinate, and automate their increasingly complex research and educational projects based on an increasingly rich set of cyber-resources, services, and tools.
• Diverse computing and storage systems are required.
  • Performance of scientific and engineering applications depends on the computer architecture (microprocessor, memory subsystem, interconnect, etc.).
  • Local and national computing and storage systems serve different needs and both are required.
  • Both high-end capacity and capability computing are required to meet the needs of scientists and engineers; several units of capacity computing will be needed to support each unit of capability computing.
• Need for high-end computing will be driven by two factors:
  • Increasing sophistication and fidelity of computational models.
  • Management, analysis, and visualization of massive, heterogeneous data sets.
• Rapid rate of technology change, in both hardware and software, will continue, providing important (even critical) new capabilities, but posing new challenges for programming and performance and affecting design tradeoffs.
  • Computing technology is undergoing a major change, driven by physical constraints; the path to petascale cyberscience and cyberengineering is not clear.
  • Middleware is evolving rapidly, providing much-needed capabilities but at the cost of increased fragility; careful tradeoffs must be made in the design of cyberinfrastructure.
• Lack of understanding of the new cyberinfrastructure capabilities and resources will increasingly become the barrier to progress in research and education; new educational programs will need to fill this gap.
Hardware, Software, and Data
• New types of software will be required to fully realize the promise of a national cyberinfrastructure, to enable scientists and engineers to:
  • Access capabilities and resources distributed in multiple sites.
  • Orchestrate and automate the processes needed to solve leading-edge problems.
  • Collaborate with colleagues across campus and across the nation and world.
• As computing systems and scientific and engineering applications become more complex, simulation capabilities and at-scale testing facilities will be required to optimize the performance of the systems and applications.
Table 1. View of the World: a summary of some of the major external trends and forces that will act on NCSA in the near future.
Building a Robust, Reliable Cyberinfrastructure
A professional approach is required to build a robust, reliable cyberinfrastructure that will enable advances in science and engineering. At NCSA, we have created Integrated Project Teams (IPTs) to build cyberenvironments and associated elements of the cyberinfrastructure. These teams draw on expertise from across NCSA and use modern software engineering practices as well as standard project management techniques to ensure the development of high-quality, robust, and reliable software systems. An IPT may be co-led by a domain scientist and a computer scientist, may include both domain scientists and computer scientists, and may involve outside partners. A project manager is assigned to each IPT to ensure that the project is completed on schedule and meets clearly defined milestones. Project managers also ensure coordination across projects.
Large Synoptic Survey Telescope (LSST)
The Large Synoptic Survey Telescope project is now in its design and development phase. The telescope will image an area of the sky roughly 50 times that of the full moon every 15 seconds, opening a movie-like window on objects that change or move on rapid timescales. It will provide an image of the entire sky every three days, resulting in an unprecedented amount of data, on the order of 6 petabytes per year. NCSA leads two of the three LSST data management teams and is responsible for managing the integrated work plans, tracking progress, reporting to the LSST Corporation, and coordinating a series of data challenges. The LSST project represents a model for NCSA projects. NCSA forged partnerships with the scientists who will generate and use the LSST data; this ensures that the data management system will meet the needs of the astronomy community. We also forged close partnerships with computer scientists and engineers to ensure that all of the expertise needed to create the data management system was available.
This complex project provides a real test of NCSA’s new software engineering and project management capabilities. “With a dozen different research institutions at universities and laboratories across the country participating in the development of LSST’s data management system, the project management support that NCSA is providing is essential to keeping the team working smoothly,” stated Jeffrey Kantor, LSST’s data management project manager. “I rely on them for communications, work status, and technical integration of the many aspects of the project.”
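To give a feel for the scale involved, the figure of roughly 6 petabytes per year quoted above can be converted into an average sustained data rate. The short sketch below is illustrative only: it assumes decimal (SI) petabytes and a 365-day year, the function name is hypothetical, and it is not part of the LSST data management system.

```python
# Assumptions (not stated in the report): SI units and a non-leap year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
BYTES_PER_PETABYTE = 10**15
BYTES_PER_MEGABYTE = 10**6


def average_rate_mb_per_s(petabytes_per_year):
    """Convert a yearly data volume in PB into an average rate in MB/s."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / BYTES_PER_MEGABYTE


if __name__ == "__main__":
    # 6 PB/year is the order-of-magnitude figure quoted for LSST.
    print(f"{average_rate_mb_per_s(6):.0f} MB/s on average")
```

Under these assumptions the yearly volume corresponds to an average of roughly 190 MB/s, sustained around the clock.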
Sowing and Reaping
In the last several years, NCSA has made a number of key additions to its staff. We’ve also seen some move on to new challenges. We take pride in our ability to recruit the best and brightest to NCSA and to seed new endeavors. The center is now home to:
• Thom Dunning, who became director of NCSA in January 2005. Also a professor in the chemistry department at the University of Illinois at Urbana-Champaign, Thom previously led the Joint Institute for Computational Sciences in Oak Ridge, Tennessee.
• Jim Myers, who leads the center’s Cyberenvironments and Technologies Directorate. Jim joins us from Pacific Northwest National Laboratory.
• Barbara Minsker, who heads a newly formed group developing cyberenvironments for the environmental science and engineering, and hydrology communities. She is also an associate professor in the civil and environmental engineering department at UIUC.
• Robert Brunner, who leads NCSA’s Laboratory for Cosmological Data Mining, oversees the processing, storage, and dissemination of data from the Palomar-Quest Survey. He is also a member of UIUC’s astronomy faculty.
• Nosh Contractor, a professor of speech communications at UIUC, is the director of NCSA’s Science of Networks in Communities research group.
Other staff members have moved on to other leadership positions:
• Former director Dan Reed leads the Renaissance Computing Institute in North Carolina.
• Charlie Catlett leads the National Science Foundation’s TeraGrid project.
• Former deputy director Jim Bottum is now vice president for information technology at Purdue University.
Economic Development
NCSA contributes to the university’s economic development mission in a number of ways. Many of the technical innovations developed at NCSA have commercial applications. During FY2005, NCSA staff made 14 disclosures to UIUC’s Office of Technology Management, and four applications were made for patents. In addition, two companies were spun out of NCSA:
• RiverGlass, based on NCSA’s D2K data mining system, develops real-time analytics software. It merges multiple data streams from disparate sources and applies powerful data analysis and modeling techniques to help customers manage risks, solve critical problems, and make informed decisions.
• The HDF Group is a not-for-profit corporation whose mission is to sustain the HDF technologies developed at NCSA and to support worldwide HDF user communities with production-level software and services. HDF, software and file formats for the management of scientific data and other large datasets, has been under development at NCSA since 1987.
NCSA also works with a number of Illinois-based companies through its Private Sector Program. These companies include Boeing, Caterpillar, Deere & Company, and Motorola.
National Center for Supercomputing Applications Director’s Office
Thom Dunning, Director
Danny Powell, Executive Director
Rob Pennington, Chief Technology Officer
Bob Wilhelmson, Chief Science Officer
Lex Lane, Chief Information Officer
Mark Marikos, Project Manager
Cristina Beldica, Project Manager
Patty Kobel, Business Operations Specialist
Thomas Prudhomme, Visiting Associate Director
Beth McKown, Administrative Support
Judy Olson, Administrative Support
Campus and International Relations
Radha Nandkumar
Government Relations
Kirk Hard
Private Sector Program
Mark Nolan
Evelyn Hickman
Brian Kucic
ACCESS center (Washington, D.C.)
Janet Thot-Thompson
Tom Coffin
Dan Copher
Laura Winn
Shaundra Farrow
Cyber-resources
The world is changing, and NCSA is evolving to meet the challenges of the next decade. Scientists and engineers need supercomputing resources more than ever to advance scientific discovery and the state-of-the-art in engineering. However, supercomputing alone isn’t enough. Scientists and engineers are collecting unprecedented amounts of data. NCSA must also provide the resources and capabilities to store, manage, analyze, and visualize the data generated by these studies, as well as the network connectivity to transfer the data and make it available to the user community. NCSA is committed to providing the cyberinfrastructure resources—or cyber-resources—that are essential to progress for the communities that it serves.
Supercomputing in the 21st Century
NCSA provides leading-edge computing, data storage, visualization, and networking resources to the nation’s researchers and educators. This ensures that resources are available to solve the most demanding problems and that solutions are obtained in a timely manner. In fact, the need for supercomputing is rapidly expanding. Many new communities now or soon will require supercomputing resources and services to advance their research objectives. Their needs for cyber-resources, support services, and usage models are varied and can differ from those prevalent today. NCSA is working closely with the communities that it serves to determine how best to deliver the needed resources and services—in other words, to define the supercomputing center of the 21st century. The major high-end computing usage mode of the past 20 years—allocation of time to a single investigator, submission of jobs by the investigator and her students to a queue, progress through the queue, execution of the job, spooling of the output to a file to be analyzed later—is giving way to new usage models. Many researchers need rapid turnaround to fit the demands of their research workflow. Others need dedicated resources for extended periods to ensure the timely completion of their projects or to assure that data can be handled when it becomes available from the data sources (sensor arrays, instruments, and simulations). Some even need on-demand computing to react to unforeseeable events.
Table 2. Major computing systems in production at NCSA in FY2005.
Name       Vendor/Processor       Number of Processors  Peak Performance  Interconnect  Local Storage
Mercury    IBM / Itanium2         1774                  10 TF             Myrinet 2000  129 TB
Copper     IBM / Power4           384                   2 TF              Gig-E         35 TB
Tungsten   Dell / Xeon            2560                  16 TF             Myrinet 2000  140 TB
Tungsten2  Dell / Xeon (EM64T)    1024                  7 TF              Infiniband    4 TB
Cobalt     SGI / Itanium2         1088                  6 TF              NUMALINK      370 TB
Total Memory (values as listed; per-system assignment not recoverable): 1,936 GB, 3,072 GB, 3,072 GB, 4,572 GB
We are exploring new models for allocating our cyber-resources so that they are more conducive to real-world use. For example, in FY2005 about 40 percent of NCSA’s largest system, Tungsten, was allocated in blocks: specific processor partitions were allocated to specific researchers for specified periods of time. These “block allocations” give researchers the ability to complete crucial computations that must be done in a specific timeframe, that require a large number of processors, or that would otherwise be difficult to schedule. More recently, we have begun to support “rapid response” computing. Rapid response computing differs from on-demand computing by allowing a longer lag time between the time the need arises and when the resources become available. This dramatically decreases the penalty associated with switching from one usage mode to the other. We recently exercised this capability in simulations related to Katrina and Rita hurricane assessment with great success. While the process still requires significant human intervention—and probably should to provide assessment and priority decisions—we are working to put procedures in place to expedite the process of approving and arranging for rapid access to needed resources.
In addition to these systems, Radium, a Condor flock with 512 SGI MIPS processors and 440 Intel PIII processors, is used to support a number of applications with modest communications needs. This computing infrastructure represents a carefully constructed multi-architecture hardware strategy that allows NCSA to support a broad range of users and communities on the computer systems best suited to their applications; see Table 3 and Figure 3.
In FY2005, NCSA provided more than 672 million normalized units (NUs)³ to the NSF research and education community. Nearly 1,500 scientists and engineers and their students used the computing and data systems at NCSA to support research in more than 600 projects. Large projects, consuming more than 1 million NUs per year, represented 92 percent of the NUs used. These researchers primarily used Tungsten and Mercury, which, at 16 and 10 teraflops respectively, are the largest machines at NCSA and are targeted to this set of users. A few large users also computed on Cobalt, our 6-teraflop shared memory system. Moderate-sized projects, those consuming between 100,000 and 1 million NUs per year, represented 7 percent of the NUs used. These researchers were primarily supported on Tungsten and Mercury, but there is also a sizable segment of this class of users on Cobalt and Copper (Copper is also a shared memory system). Applications in this group are a mix, many of them third-party applications, that have only moderate scalability. Small users are typically in start-up mode. They run on a variety of systems and gain access through a development account.
This multi-architecture hardware strategy has been very effective in allowing NCSA to configure systems for classes of applications. Tungsten and Mercury are well suited for highly parallel, distributed memory applications; Cobalt serves very large memory, shared memory applications; and Copper provides resources for moderately parallel applications and many third-party codes.
NCSA’s Persistent Infrastructure
NCSA operates one of the largest academic computing facilities in the world, with more than 40 teraflops of computing capability available to support research. These resources are crucial to progress in science and engineering as well as in other endeavors; see Science & Engineering Successes in the next chapter. NCSA is committed to delivering leading-edge, yet reliable and robust, computing systems, data stores, and networks to science, engineering, and other communities, all backed by the user services needed to make efficient and effective use of these resources. In FY2005, NCSA provided more than 60 percent of all NSF-supported computational resources available for allocation. The divisions and staff in NCSA’s Persistent Infrastructure Directorate are listed at the end of the chapter. The major computing systems that were in production at NCSA in FY2005 are listed in Table 2.
3. A normalized unit (NU) is the equivalent of one hour of computing time on a single processor of a Cray X-MP supercomputer.
Table 3. Academic usage of NCSA computing systems in FY2005.
                             Number of Projects          Number of Users
System      NUs              >1,000  >100,000  >1 M      >1     >1,000
Tungsten    306,633,671      203     132       51        465    396
Mercury     264,314,726      115     80        34        282    217
Cobalt      58,583,761       54      27        6         127    106
Copper      24,632,427       253     57        3         684    394
Tungsten2*  17,831,924       4       2         2         7      6
Radium      28,818           2       0         0         6      3
Total       672,025,328      544     265       85        1,297  942
*The user community benefited with access to nearly 18 million additional NUs on Tungsten2, which was funded by NCSA’s Private Sector Program.
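The per-system NU figures in Table 3 can be turned into usage shares with a few lines of arithmetic. The sketch below is illustrative only: the numbers are copied from Table 3, while the variable and function names are hypothetical and not part of any NCSA accounting tool.

```python
# NUs delivered per system in FY2005, copied from Table 3.
NUS_BY_SYSTEM = {
    "Tungsten": 306_633_671,
    "Mercury": 264_314_726,
    "Cobalt": 58_583_761,
    "Copper": 24_632_427,
    "Tungsten2": 17_831_924,
    "Radium": 28_818,
}
REPORTED_TOTAL = 672_025_328  # "Total" row of Table 3


def nu_shares(nus_by_system):
    """Return each system's percentage share of all NUs delivered."""
    total = sum(nus_by_system.values())
    return {name: 100.0 * nus / total for name, nus in nus_by_system.items()}


if __name__ == "__main__":
    for name, share in nu_shares(NUS_BY_SYSTEM).items():
        print(f"{name:<10} {share:6.2f}%")
```

The per-system figures sum to within one NU of the reported total, and Tungsten and Mercury together account for about 85 percent of the NUs delivered.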
application codes on NCSA’s production and research systems and in developing methods and algorithms targeted to high-end platforms. Through the Strategic Application Program, NCSA has helped a number of users to port or optimize their code on NCSA systems. These activities are coordinated with the San Diego Supercomputer Center via the Cyberinfrastructure Partnership. NCSA’s involvement in several scientific and engineering projects with large data needs has required us to better integrate networking, computation, and storage hardware and software. Responding to the needs of these projects, NCSA has increased its emphasis on high-level data services with additional hires in FY2005. Building on our strength in storage enabling technologies, and recognizing the complexity of today’s filesystems and scientific data management needs, we are working to ensure that our users can take optimal advantage of our integrated storage, compute, and networking capabilities to address their research challenges. For individual users and the scientific and engineering community as a whole, NCSA provides both asynchronous and synchronous training covering a broad range of topics in the use of high-end resources, Access Grid technology, and cyberinfrastructure. This training is offered in a variety of formats: demonstrations, conference presentations, hands-on workshops, and remote seminars via the Access Grid. A number of offerings were provided in FY2005, attracting participants nationwide and abroad. NCSA delivered its first online tutorial in FY2000 and continues to provide a set of online tutorials via the WebCT learning management system. Currently, 15 courses are supported in this system with a total of 4,761 student logins created in FY2005. One new course, “Debugging Parallel and Serial Codes,” was completed and released to the user community in FY2005.
Figure 3. Use of NCSA supercomputing and data storage resources by discipline.
In addition, NCSA provides leading-edge mass storage and visualization resources:
• Mass Storage System: The mass storage system is an EMC/Legato DiskExtender (UniTree) system running on two SGI Origin 3900 servers, each with 16 processors, 12 GB of memory, and ten 1 GigE connections. The environment includes 35 TB of disk cache and has a total archival storage capacity of 5 PB. The aggregate throughput of the mass storage system is 750 MB/s.
• Visualization System: The high-end visualization system consists of eight SGI Prisms, each with eight Itanium2 processors and 8 GB of memory. The Prisms are connected to each other and to Cobalt via an Infiniband interconnect fabric and share the same CxFS filesystem as Cobalt.
Currently, 2.2 PB of data are stored in the NCSA mass storage system. Roughly half of the data was ingested during FY2005. This is as much as had been moved onto our storage systems in the previous 19 years combined! Between 40 and 60 TB of additional data was ingested each month in FY2005.
Accolades
The FY2005 NSF Cyberinfrastructure Survey reflected NCSA’s dedication to excellence. Academic scientists and engineers ranked NCSA’s computing and data systems and the support that they received at or near the top in all categories. NCSA ranked #1 among TeraGrid sites when users were asked to rate “the overall quality of the computing environment (computing systems, networks, consulting support, tools, and software, etc.).” NCSA’s Mercury computing system as well as its mass storage system ranked #1 (both tied with SDSC), as did its consulting and help desk support for TeraGrid. Three other NCSA systems also ranked highly. For example, Tungsten was ranked #2 (tied with SDSC’s IBM p655). Full results of the survey are available at: http://www.ci-partnership.org/survey/report2005.html.
TeraGrid
In FY2005, the National Science Foundation made a five-year award to operate and enhance the TeraGrid. The TeraGrid is a federation of computing, data storage, visualization, and experimental resources located at eight sites: Argonne National Laboratory, Pittsburgh Supercomputing Center, Indiana University, NCSA, Oak Ridge National Laboratory, Purdue University, San Diego Supercomputer Center, and the Texas Advanced Computing Center. NCSA is both a significant provider of TeraGrid resources and a source of project leadership.
User Services
NCSA’s leading-edge computing, data storage, and networking resources are backed by a number of invaluable user services. The Consulting Office and HelpDesk group provides assistance via email, telephone, and walk-in contact. Users may interact directly with specific consulting staff when working on ongoing or in-depth problems. The Performance Engineering and Computational Methods group has extensive experience in supporting scientific and engineering users and communities in porting and optimizing
The TeraGrid, built over the past four years, is the world’s largest, most comprehensive distributed cyberinfrastructure for open scientific research. Through high-performance network connections, TeraGrid integrates high-end computers, data resources and tools, and high-end experimental facilities, making these resources accessible to researchers across the country to accelerate advances in science and engineering research and education. NCSA offers TeraGrid users access to two high-performance computing systems, Mercury and Cobalt, that together provide more than 16 teraflops of computing power. By April 2006, all NCSA systems will be available for allocation as TeraGrid resources, providing more than 40 percent of the total computational and nearly 40 percent of the storage capabilities on the TeraGrid. NCSA also contributes personnel to coordinate TeraGrid development and production services, to maintain and support the TeraGrid resources provided at NCSA, to assist TeraGrid users, and to enhance and advance the usability and utility of the TeraGrid. A special focus of NCSA’s user support is a partnership with TeraGrid scientific communities to architect high-end database solutions on NCSA’s SGI database server, Charon, running Oracle 10g.
In FY2005, NCSA staff:
• Played a key role in converting the GPFS parallel file system into a production-ready, global file system known as GPFS-WAN.
• Reconfigured Mercury to include more data servers to respond to evolving user requirements.
• Provided dedicated reservations that set aside specific nodes for a period of time to ensure that multiresource grid jobs had contiguous access to resources, among other reasons.
• Expanded NCSA’s Application Software Repository to include information on third-party software installed at all TeraGrid sites.
Cyberinfrastructure Partnership
The Cyberinfrastructure Partnership (CIP) is a joint endeavor of NCSA and San Diego Supercomputer Center to stimulate the expansion of cyberinfrastructure activities at multiple levels within the national community. FY2005 was a year for building the CIP, but there were several successes such as the launch of the CTWatch Quarterly newsletter, the release of the CI-Partnership.org Website, and several initial collaborations supporting researchers using resources at both sites to accomplish their scientific goals. In addition, a very successful CIP-TeraGrid Joint All Hands Meeting was held in April 2005 that helped leverage activities across the core and TeraGrid programs. CIP activities in FY2005 included: • Cyberinfrastructure Technology Watch. CTWatch is a forum for ideas, opinion, and analysis of issues relevant to the cyberinfrastructure community. The CIP created CTWatch Quarterly, an online publication modeled on a more traditional academic journal, and, along a more experimental line, CTWatch Blog for frequent updates, commentary, and links to developments and ideas in cyberinfrastructure. In 2005, four issues of CTWatch Quarterly were published, each with guest editors of international renown.
Exploring new usage models: Block allocations
Many users and communities are taking advantage of block allocations on NCSA’s computing resources. Block allocations have been used by 17 research groups representing 20 projects. A few examples are summarized below.
Protein structure prediction and refinement
David Baker and his team at the University of Washington are in the business of predicting and refining protein structures. A tailored account on NCSA’s Tungsten cluster gave the team more than just the power they needed: “We’d never computed on this system when we got our special allocation,” he says, and they were still in search of the precise approach that they would use for their structure predictions and refinements. A tailored allocation is “very good for methods development. Having dedicated time over days allows you to make rapid progress. You try different things quickly and get daily feedback. That’s really, really helpful as you’re trying to get on your feet.” Currently, the team has an in-house server system dedicated to conducting these protein structure predictions and refinements. It serves an entire community of researchers and is overtaxed. NCSA is configuring a portion of its Radium cluster to provide additional capacity to these researchers.
Quantum chromodynamics
Quantum chromodynamics calculations by members of the MILC collaboration proceed in two steps. Ground state configurations are calculated with Monte Carlo simulations, then the group, along with many other physicists performing numerical studies of quantum chromodynamics, uses the data from these simulations to explore a wide variety of physical attributes of the subatomic world. The bottleneck is the Monte Carlo calculations. “Each ground state configuration is generated from the preceding one, so we cannot run jobs in parallel or start one job before the previous one ends,” explains Robert Sugar, the leader of the collaboration and a professor at the University of California at Santa Barbara.
“As a result, we are in a poor position to compete for time with many of the users of normal queues who can have several jobs in the queues at once,” Sugar notes. Without a block allocation on Tungsten, there would be a ripple effect throughout the field. “The Department of Energy and the National Science Foundation spend approximately $750 million per year on their experimental programs in high-energy physics. A significant fraction of that is devoted to the study of weak decays of strongly interacting particles, a primary focus of our research.” The calculations must keep pace with the experiments.
• Security. The CIP security team evaluated the security posture of both partners against the security categories outlined in the Defense-In-Depth strategy paper (see the CIP Website). The team prepared a network intrusion detection report, and SDSC staff visited NCSA to learn about NCSA’s active network intrusion detection system based on the widely used public domain bro utility. The team also developed a portal security policy to support the creation of portals that need authentication and access control. The policy was presented to the TeraGrid security and Science Gateways groups. • Resource evaluation and planning. The joint resource evaluation and planning team made significant progress on a common benchmark suite, developing an initial version that incorporates probes and applications benchmarks (see the CIP Website). Probes are very low-level microbenchmarks that test individual components of a compute system including CPU, memory, network, and I/O. Application benchmarks include actual application test runs that allow users to make more direct comparisons of performance characteristics. Baseline comparisons were produced from test runs on NCSA and SDSC resources. SDSC is working to incorporate their application test results into NCSA’s database, which will become the new CIP performance database. • User engagement. CIP user engagement activities are coordinated in five areas—strategic applications collaborations, coordinated user services, allocations, training, and documentation—to enhance services for users, leverage best practices, and maximize the services provided through core funding. Enzo, Rosetta, and CyberShake were the focus of FY2005’s strategic applications collaborations.
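The difference between a probe and an application benchmark, as described in the resource evaluation item above, can be illustrated with a small sketch. The Python below is not taken from the CIP benchmark suite. The function names (run_probe, cpu_probe, memory_probe, probe_report) and the workload sizes are assumptions chosen only to show what a very low-level probe of a single component might look like.

```python
import time


def run_probe(probe, repeats=5):
    """Return the best wall-clock time in seconds over several runs.

    Taking the minimum reduces noise from other activity on the machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        probe()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def cpu_probe(n=200_000):
    """Hypothetical CPU probe: a tight floating-point loop."""
    total = 0.0
    for i in range(1, n):
        total += 1.0 / i
    return total


def memory_probe(size=8 * 1024 * 1024):
    """Hypothetical memory probe: copy a buffer of `size` bytes."""
    src = bytearray(size)
    dst = bytes(src)  # forces a full copy of the buffer
    return len(dst)


def probe_report():
    """Time each probe and collect the results by name."""
    return {
        "cpu_seconds": run_probe(cpu_probe),
        "memory_copy_seconds": run_probe(memory_probe),
    }


if __name__ == "__main__":
    for name, seconds in probe_report().items():
        print(f"{name}: {seconds:.6f}")
```

An application benchmark, by contrast, would time a complete run of a real scientific code, so that users can compare systems on workloads resembling their own.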
Persistent Infrastructure Directorate Office
John Towns, Associate Director Tim Cockerill, Senior Project Manager Greg Pluta, Project Manager Amber Moore, Administrative Support Ronda Pellegrini, Administrative Support Susan Vinson, Administrative Support High-end Computing Division Mike Pflugmacher, Division Director Wayne Hoyenga Karen Fernsler Brian Kucic Chit Khin Mike Pingleton Henk Ten Have Galen Arnold Weddie Jackson Nahil Sobh Rick Kufrin Elaine Cler Steve Quinn Michael Shapiro Tim Bouvet Dan Lapine Nada Cagle Callie Montgomery Nancy Rudins Frank Wells Ramesh Balakrishnan Seid Koric Greg Bauer Mark Straka Vicki Halberstadt Chris Pond Ester Soriano Peter Enstrom Jim Long Stan Hicks Timothy Nickens Reta Sellers Susan John Dodi Heryadi Dave McWilliams Rooh-ul-Amin Khurram Ahmed Taha Barbara Jauhola Amy Schuele Sudhakar Pamidighantam
Data Knowledge Visualization Division Ruth Aydt, Division Director Darren Adams Dora Cai Emily Wu Michelle Butler Chris Cribbs Andy Loftus Tanweer Alam David Flemming Dave Bock Jason Alt Jim Glasgow Anthony Tong Federico Bassetti Elena Pourmal Alan Craig Brad Butler Chad Kerner
Astrophysics Every time a team of astrophysicists from Louisiana State University and Long Island University make a run on Tungsten, a star is born—a pair of them, in fact. Work by this team is altering scientists’ thinking on the mass ratio at which binary stars return to stability instead of merging in a spectacular and violent cosmic event. When they asked for their first block allocation on Tungsten, they had just received a referee report from a submission to The Astrophysical Journal. It said that “the conclusions we drew in the paper would be significantly strengthened if we could repeat one of our extended simulations using slightly different initial parameters. We knew from experience that, using the standard operating procedure, this simulation would require at least a month to complete,” says Joel Tohline, the professor at Louisiana State University who led the team. A week-long, 256-processor run on Tungsten was set up in short order, and the publication went to print shortly thereafter. “Our peers and funding agencies expect to see measurable progress on challenging and timely and relevant problems. If we invest our time performing a simulation that can be completed in a week’s time on 32 processors, it is not likely to be addressing one of the most challenging problems that presently confront us. NCSA’s commitment to dedicate major resources when they’re needed to a single problem is in synch with this overall philosophy,” Tohline says.
Internal Infrastructure Division Jackie Kern, Division Director Patrick Dorn Paul Wefel Jeff Carpenter Sara Archacki Pall St. John Matt Elliott Nicholas Merker Nick Buraglio Davey Wheeler Ken Jackson Alex Farthing Chris Lindsey Bill Glick Emilie Shoop George Estes Michael Miller Bruce Mather Mike Dopheide Eric Mosher
Production Cyberenvironments Division Doru Marcusiu, Division Director Kazi Anwar Tom Roney Scott Parker Joshi Fullop Michelle Gower John Quinn Byoung-Do Kim
Training and Documentation Division Sandi Kappes, Division Director Jennie File Leslie McNeil Herb Morgan
Security Division Jim Barlow, Division Director Tim Brooks Jeff Rosendale Aashish Sharma
Breakthroughs
There are more than 41 teraflops of computing power at NCSA, more than 1 petabyte of rotating disk, and 5 petabytes of archival storage. All of this is connected to TeraGrid partners around the country, offering their own impressive set of resources that can be shared. This provides enormous capabilities for advancing scientific discovery and the state-of-the-art in engineering. Below is a selection of researchers who made noteworthy breakthroughs in FY2005 thanks in part to NCSA’s resources and expertise.
one with a generic amino acid and observed what went awry. One substitution nearly derailed the reaction. Normally, the enzyme uses one molecule of glutamine to make one molecule of IGP, an efficient 1:1 substrate/product ratio. The mutation changed the ratio to an abysmal 122:1. In another test, when the ammonia moved near a lysine gate residue, the lysine actually bent, and ammonia slipped through this newly discovered side opening. Once the ammonia had passed inside, the simulation revealed, the lysine swung shut behind it. This work appeared on the cover of Biophysical Journal. For more information, see: http://access.ncsa.uiuc.edu/Stories/IGP/; see also R. E. Amaro, R. S. Myers, V. J. Davisson, and Z. A. Luthey-Schulten, Biophysical Journal 89, 475-487 (2005). Inteins Saroj Nayak, Rensselaer Polytechnic Institute Inteins are segments of proteins that remove themselves entirely from the proteins of which they are components and then splice the remaining parts together to form a whole molecule. However, little is known about the exact mechanism by which the intein excises itself from the protein. Which amino acids are responsible for splicing the protein remnants back together is also unclear. The team, along with collaborators at the Wadsworth Center of the N.Y. State Department of Health, is using molecular-scale modeling to simulate intein behavior and the environment that triggers cleaving and splicing.
The researchers run calculations on the entire protein and the solution surrounding it, with a typical run containing 6,000 to 7,000 atoms. From these simulations, the collaborators ascertain which key aspects (for example, temperature and acidity) might trigger the intein’s behavior, alter them, and run more simulations accordingly. After trying several, they have isolated one mechanism that is consistent with experimental data. It is now being tested in the laboratory. A paper, “Mechanism for C-terminal Intein Cleavage: Quantum Mechanical Simulations,” is in advanced preparation. For more information, see: http://access.ncsa.uiuc.edu/Stories/inteins/ Lac repressor Klaus Schulten, University of Illinois at Urbana-Champaign The Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign decided to examine a prototypical case of gene suppression by looking at the lac repressor in the bacterium E. coli. Simulating its activity in a realistic water/salt environment involved using the group’s NAMD parallel molecular
Biological sciences
Comparative genomics Harris A. Lewin, University of Illinois at Urbana-Champaign Comparative genomics, which roots out the finer points of genetic distinction and similarity among species, requires a large amount of data from disparate sources. Traditionally, it also requires a lot of manual work. Comparing two species is standard. Comparing eight is unheard of. But that’s the problem a team from the University of Illinois at Urbana-Champaign, along with collaborators, took on. Creating the map used to make cross-species comparisons is typically a daunting, laborious undertaking. The team instead relied on Evolution Highway, data visualization and analysis software built for them by NCSA and based on the center’s D2K framework. The team’s study broke with conventional wisdom, showing that evolution has been moving faster for the last 65 million years than it did for the 35 or so million years before that. They also found that “reuse” breakpoints, which show up in the same place across multiple species and are thought to mark fragile places in the genome where rearrangements are more likely to occur, are more common than previously thought. For more information, see: http://access.ncsa.uiuc.edu/Stories/break-point/; see also W. J. Murphy et al., Science 309, 613-617 (2005). IGP synthase Zaida Luthey-Schulten, University of Illinois at Urbana-Champaign, and V. Jo Davisson, Purdue University Chemists at the University of Illinois at Urbana-Champaign and Purdue University simulated and visualized the workings of the enzyme imidazole glycerol phosphate (IGP) synthase. To determine the role of four gate residues at the connection between IGP’s two constituent parts, the scientists replaced them one by
dynamics code to calculate the behavior of 300,000 atoms. The researchers employed resources at both NCSA and the Pittsburgh Supercomputing Center. The researchers’ simulation showed for the first time how the lac repressor is able to withstand the forces by which DNA attempts to wrench free. The lac repressor grabs onto the gene expression machinery with two arms that terminate in thin coils. These coils provide the flexibility that allows the lac repressor to maintain its grip on the DNA, even allowing it to withstand simulated forces 50 times stronger than those in a real-world system. For more information, see: http://access.ncsa.uiuc.edu/Stories/lac/ MIDAS Neil Ferguson, Imperial College London Members of the National Institutes of Health’s Models of Infectious Disease Agent Study (MIDAS) are building computer models of the spread of an influenza epidemic in human populations to estimate the effectiveness of different strategies to limit the spread of the disease. The models simulate every individual in the United States or other countries, together with every school and workplace and the journeys people make. The researchers are using NCSA’s computing resources to undertake large numbers of model runs to rigorously examine the wide range of different pandemic spread and control scenarios that need to be evaluated. NCSA staff are assisting in profiling the models’ performance and optimizing the code being used. Cobalt is being used for these simulations because of the large shared memory available on that system. The results are being used to assist policy makers at the highest levels of the U.S. and other governments. Myosin and kinesin Edward Pate, Washington State University Myosin controls the skeletal muscles responsible for every move you make, the cardiac muscle that pumps your blood, and the smooth muscles that control blood pressure, push nutrients and waste through the intestines, and drive uterine contractions. 
Myosin’s close relative, kinesin, carries out vital functions in non-muscle cells, shepherding chromosomes to their proper places during cell division, for example, and ferrying proteins, lipids, and other molecules from neural cell bodies to distant axons. Scientists understand the basic outlines of how myosin and kinesin do their jobs, but certain crucial details are still unclear. Researchers from Washington State University, the University of California at San Francisco, and the private sector are using a combination of experimentation and simulation on NCSA’s computational systems to understand how the proteins produce force and motion, with each technique filling in part of the picture. For more information, see: http://access.ncsa.uiuc.edu/Stories/myosin/
Protein folding Carlos Simmerling, State University of New York at Stony Brook Surprisingly little is known about the physical structures of proteins and how they are created from gene sequences. Scientists know that proteins undergo processes called folding and unfolding through which they are built and torn down, but they don’t know how or why naturally occurring proteins consistently display a particular shape, or native state, after the folding process. Researchers at the State University of New York at Stony Brook are working to understand why proteins fold and unfold and what happens during the process. Using NCSA’s supercomputing clusters, they are simulating the folding and unfolding processes of small proteins and comparing their findings to experimental results from other research groups. For more information, see: http://access.ncsa.uiuc.edu/Stories/proteins/ Protein structure server David Baker, University of Washington A team from the University of Washington is in the business of predicting and refining protein structures. These structures are traditionally derived using limited experimental data or by starting from first principles and simulating the structure from scratch. This group’s technique combines the two to produce much more accurate models. A tailored account on NCSA’s Tungsten cluster gave the team more than just the power they needed—this account was key to their methods-development process, affording them daily feedback and rapid progress. Currently, the team has an in-house server system dedicated to conducting these sorts of protein structure refinements. It serves an entire community of researchers and is overtaxed. NCSA is configuring a portion of its condor cluster (Radium) to provide additional capacity to those researchers. It will expand their back-end capacity without any front-end change; researchers will continue to interact with the server as they always have.
Engineering
Airflow modeling Yuanhui Zhang, University of Illinois at Urbana-Champaign Bioenvironmental engineers at the University of Illinois at Urbana-Champaign have developed a greatly improved technique to track the airflow in enclosed spaces: volumetric particle tracking velocimetry (VPTV). The algorithms behind VPTV were developed by the team over the last decade on a succession of NCSA supercomputers. With the help of NCSA experts under the auspices of the NCSA/UIUC Faculty Fellows Program, the team is now beginning to visualize the results in virtual environments. NCSA’s ShadowLight software has allowed them to visualize a mock airplane cabin in the CAVE virtual environment and on stereoscopic display systems. VPTV gives researchers the massive amounts of experimental data that they need. These data allow them to develop and validate computational fluid dynamics models. With the combination of experiment and simulation, engineers bring their work back to reality by modeling the conditions in ventilated airspaces. The technology can be applied to agricultural, industrial, or residential buildings or vehicles such as airplane cabins to explore methods of improving the design of bioenvironmental systems. Arterial tree simulations George Karniadakis, Brown University A team from Brown University, Argonne National Laboratory, and Imperial College in London completed the largest-ever simulation of blood flow through the human arterial tree. Their simulation included 55 arteries and 27 bifurcations, accounting for every artery in the human body larger than 2 millimeters in diameter. No one had ever attempted to model more than two arteries at the same time. The record-setting calculation took place in concert on the TeraGrid with machines at NCSA, the San Diego Supercomputer Center, the Pittsburgh Supercomputing Center, and the Texas Advanced Computing Center, as well as a computer from the United Kingdom’s Computer Services for Academic Research program.
A one-dimensional simulation of the entire tree was run at NCSA, tracking stream-wise blood flow throughout the body. This fed detailed, three-dimensional simulations of the bifurcations at the other centers and on other systems at NCSA. Environmental remediation Barbara Minsker, University of Illinois at Urbana-Champaign Using the Umatilla Chemical Depot in Oregon as a case study, researchers at the University of Illinois at Urbana-Champaign and NCSA designed a study to determine whether using an accurate cost function is a better way to choose a cleanup plan for environmental contamination than the standard practice of using rough cost estimates. NCSA staffers developed a computational framework that can
distribute the many fitness evaluations involved in a complete simulation across multiple computing clusters. Simulations were run on TeraGrid systems at NCSA and the San Diego Supercomputer Center. The simulations identified the tradeoffs between costs and benefits of various cleanup designs. The researchers concluded that ignoring cost complexities could increase the cost of the cleanup by up to 14 percent. Given that cleanups can cost millions of dollars, the time and expense of accurately identifying site-specific cost functions is worthwhile. Ill-posed problems Rebecca Hartman-Baker, Oak Ridge National Laboratory In what are known as ill-posed problems, there is no unique solution. A slight change in the data fed into the system of functions that rule a given ill-posed problem can produce a large, unpredictable change in the results. A computer scientist now at Oak Ridge National Laboratory focused on an ill-posed problem found in the field of geoprospecting, which is hunting for oil and other deposits using electromagnetic energy. In hundreds of runs on NCSA systems over two years, she tested the viability of a particular selection method and of the diffusion equation method for finding the global minimum for a system of mathematical functions. Ill-posed problems are found in medical imaging, financial modeling, environmental modeling, and astronomy. Though these studies focused on geoprospecting, the approach applies to any of those fields and many more. For more information, see: http://access.ncsa.uiuc.edu/Stories/ill-posed/ Transportation engineering Khalid El-Rayes, University of Illinois at Urbana-Champaign State and federal transportation departments want to make sure that their significant infrastructure investments are worthwhile. Accordingly, both the duration of a highway construction project and the quality and the durability of the end product are considerations, just as important as the cost of the project.
How to reach a comfortable tradeoff between these conflicting objectives? That’s the question that faces engineers at the University of Illinois at Urbana-Champaign, who are using NCSA machines to optimize the decision-making process. Their genetic algorithm-based model allows an engineer or construction manager to generate a large number of possible construction resource utilization plans that provide a wide range of tradeoffs among project cost, duration, and quality and to eliminate the vast majority of suboptimal plans quickly. With Tungsten and the help of NCSA’s Performance Engineering and Computational Methods group, they are exploring how to parallelize computations over a number of processors. In the future, they hope to add even more factors for consideration, including safety, service disruption, and environmental impact. For more information, see: http://access.ncsa.uiuc.edu/Stories/construction/
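The idea of discarding suboptimal plans while keeping a range of tradeoffs among cost, duration, and quality can be illustrated with a short sketch. The Python below is not the team's genetic algorithm. It shows only one common building block of multi-objective optimization, a filter that keeps the non-dominated (Pareto-optimal) plans. The Plan type, the function names, and the sample numbers are all hypothetical, and the sketch assumes that cost and duration are minimized while quality is maximized.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical construction plan scored on three objectives."""
    cost: float      # lower is better
    duration: float  # lower is better
    quality: float   # higher is better


def dominates(a: Plan, b: Plan) -> bool:
    """True if plan a is at least as good as b everywhere and better somewhere."""
    no_worse = (a.cost <= b.cost
                and a.duration <= b.duration
                and a.quality >= b.quality)
    strictly_better = (a.cost < b.cost
                       or a.duration < b.duration
                       or a.quality > b.quality)
    return no_worse and strictly_better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


if __name__ == "__main__":
    candidates = [
        Plan(cost=100, duration=30, quality=0.8),
        Plan(cost=120, duration=25, quality=0.8),
        Plan(cost=130, duration=30, quality=0.7),  # dominated by the first plan
        Plan(cost=90, duration=40, quality=0.9),
    ]
    for plan in pareto_front(candidates):
        print(plan)
```

Each dominance check is independent of the others, which is one reason this kind of evaluation lends itself to being spread over many processors.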
Geosciences
Atmospheric science visualization Donald Wuebbles, University of Illinois at Urbana-Champaign Many simulations produced by atmospheric scientists not only generate huge data sets; the calculated data must also be compared to vast stores of data gathered from climate monitoring. Collaborating through the NCSA/UIUC Faculty Fellows Program, researchers in UIUC’s Atmospheric Sciences Department and at NCSA developed sophisticated data visualization techniques in order to more readily extract meaning from these data and to more easily convey their significance to non-scientific audiences. They developed the code needed to create an interactive 3D visualization of the multi-variable output of a global climate model on a tiled display wall. The project relied on PartiView, a 3D visualization tool developed at NCSA. Individual variables, such as surface temperature and solar flux, are shown on a single node as their changes play out over time and across the continents. The display is interactive and can be embedded in Virtual Director, the virtual reality interface developed at NCSA. For more information, see: http://access.ncsa.uiuc.edu/Stories/PFOA/ Hurricane Katrina Richard Luettich and Brian Blanton, University of North Carolina at Chapel Hill Rapid-response computing at NCSA played an important role in the wake of Hurricane Katrina when floodwaters containing organic and chemical pollutants such as sewage and oil threatened to cover swaths of Mississippi and Louisiana. A group of researchers from the University of North Carolina at Chapel Hill used their 3D hydrodynamics code to simulate the 60-day forecast of water velocity and water surface elevation needed by NOAA’s Office of Coast Survey to provide near-shore, high-resolution input into hazard trajectory models. The required computational runs were completed in about 15 hours on NCSA resources, with little delay between the time the requests were made and the time the runs were initiated.
For more information, see: http://access.ncsa.uiuc.edu/Releases/09.26.05_NCSA_Provi.html Oil reservoir simulations Mary Wheeler, University of Texas at Austin; Joel Saltz, The Ohio State University; and Manish Parashar, Rutgers University Today’s oil companies demand intelligent ways to choose the best places to site their equipment and to surmise the geological features of the ground beneath it. Using TeraGrid resources, a multidisciplinary team from the University of Texas, The Ohio State University, and Rutgers University is at work on software tools that improve companies’ oil reservoir management. A massive amount of computing is needed to identify the best possible configurations in the shortest amount of time. NCSA helped port reservoir simulation and optimization codes to the TeraGrid machines and built a toolkit that simplifies execution across multiple systems. The project will allow oil companies to better exploit existing reservoirs, find new reservoirs, and minimize drilling’s adverse environmental impact. Ultimately, the team plans a system that allows those prospecting for oil to build a database of possible conditions for which reservoir optimizations have already been run. Companies will assess the geological features of the site they are interested in, query the database for the description that most closely resembles the site, and receive an already-completed optimization in return. For more information, see: http://access.ncsa.uiuc.edu/Stories/oil/ Stratified jets Kraig Winters, Scripps Institution of Oceanography and University of California at San Diego Motions in stably stratified environments, such as oceans, lakes, and some parts of the atmosphere, are often highly anisotropic. In other words, their properties vary depending upon the direction in which the properties are measured. This feature is due to the presence of buoyancy forces that act in only one dimension.
In laboratory experiments, for example, the fluid flow in the wake of a towed sphere evolves toward a highly stable configuration of vortices with nearly horizontal motions called pancake vortices. Using a pseudospectral numerical model, a cross-country team of researchers recently achieved high enough Reynolds numbers to resolve secondary instabilities in an idealized jet flow. The team included the Scripps Institution of Oceanography, the University of California at San Diego, the University of Washington, and the University of Massachusetts at Amherst. Calculations were completed at the Scripps Institution. Visualizations were created by NCSA. For more information, see: http://access.ncsa.uiuc.edu/Stories/S-Jet/
Mathematical and physical sciences
Binary stars
Joel Tohline, Louisiana State University
Just as the moon influences the earth with its gravity, causing the tides to rise and fall, stars in a binary system exert a pull on one another. These tidal interactions can cause material to transfer between the stars and distort the stars’ gravities, densities, sizes, and distance from one another. In some cases, the stars even merge in a spectacular and violent cosmic event. A team from Louisiana State University is altering scientists’ thinking on the mass ratio at which binary stars return to stability instead of coming to a catastrophic end, pushing it lower than expected.
For more information, see: http://access.ncsa.uiuc.edu/CoverStories/binary_stars/

Biochemistry and multiscale modeling
Gregory Voth, University of Utah
Chemists at the University of Utah used NCSA systems to address proton transport in biomolecular systems, multiscale modeling of the cytoskeleton, and membrane remodeling via membrane protein interactions. They were able to analyze these biophysical and biochemical processes with unprecedented accuracy and detail. Classical molecular dynamics simulations with full electrostatics for a biological system composed of 575,000 to 750,000 atoms can now be conducted for a time-scale of 100 nanoseconds. Their progress resulted in six published papers in 2005 alone, including works in the Proceedings of the National Academy of Sciences and Biophysical Journal.

Black hole visualization
Andrea Ghez, University of California at Los Angeles
A team led by the University of California at Los Angeles observed the orbital motion of 15 bright stars in the inner core of the Milky Way, yielding the best evidence yet that the center of our galaxy contains a massive black hole. Drawing on this data, the NCSA visualization team has created an animation that approaches the galactic center of the Milky Way, showing stellar orbits around the black hole. The visualization shows 150 years of the stars’ simulated motion along the reconstructed orbits, embedded in a 3D model (partly artistic, partly science-based) of the inner Milky Way. This footage is now part of a planetarium dome show, “Black Holes: The Other Side of Infinity,” which debuted at the Denver Museum of Nature and Science.
For more information, see: http://www.ncsa.uiuc.edu/News/Access/Releases/06.09.05_NCSA_Visua.html

Computational fluid dynamics
Danesh Tafti, Virginia Polytechnic Institute and State University
Researchers at Virginia Tech are working with engineers at General Electric and the South Carolina Institute for Energy Studies at Clemson University, which is supported by the Department of Energy, to model airflow around tiny ribs inside gas turbine blades. Using GenIDLEST and NCSA systems, they modeled a series of ribs in a channel and the air that flows past them. Currently, they are focusing on nine or 10 ribs, watching the air movement both as it develops and once it has settled into a stable, though still turbulent, flow.
For more information, see: http://access.ncsa.uiuc.edu/Stories/blades/

Evolution of the universe
Michael Norman, University of California at San Diego
A team of astrophysicists led by the University of California at San Diego completed the most highly defined spatial and temporal simulation of the universe ever reported. The calculation involved 2,000 simulated snapshots of a wide expanse of the universe approximately 250 million light years across. Each snapshot represents the passage of 6.8 million years, so that together they span the nearly 14 billion years from the Big Bang to the present. As a joint project of the National Science Foundation’s Cyberinfrastructure Partnership, 26 terabytes of data generated by the simulation conducted at the San Diego Supercomputer Center were mirrored to the TeraGrid system at NCSA. NCSA visualization experts used this flood of data to create a breathtaking visualization of the origin and evolution of the universe. The resulting visualization, “Evolution of the Universe: Galaxies Forming on a Filamentary Structure,” was accepted for inclusion in DomeFest 2005, a juried exhibition of the cutting edge in immersive dome visualization, and was shown at SIGGRAPH 2005, which has been called “the Academy Awards of Computer Animation.”
For more information, see: http://access.ncsa.uiuc.edu/Releases/06.09.05_NCSA_Visua.html

MILC collaboration
Robert Sugar, University of California at Santa Barbara
The MILC Collaboration is using NCSA’s Tungsten cluster and other computing resources to study quantum chromodynamics (QCD), the theory of the strong interactions of sub-atomic physics. The calculations proceed in two steps. First, one performs a Monte Carlo simulation to generate representative configurations of the QCD ground state. Then one uses these configurations to study a wide variety of physical phenomena. The MILC Collaboration makes its configurations publicly available, and they are used by many theoretical physicists in the U.S. and U.K. The bottleneck in QCD calculations is the generation of configurations. This process takes the bulk of the computer time, and it is sequential. Each configuration is generated from the previous one, so jobs cannot be run in parallel. A dedicated queue was set up on Tungsten for the MILC group, which they found extremely helpful in overcoming this bottleneck. QCD calculations are closely coupled to the large experimental programs in high-energy and nuclear physics. They are essential to fully capitalize on the investments that the DOE and NSF make in these experimental programs. One highlight in 2005 was the prediction, subsequently confirmed by experiment, of the leptonic decay constant of the D meson, which was published in Physical Review Letters.
For more information, see: The Fermilab Lattice and MILC Collaborations, Phys. Rev. Lett. 95, 122002 (2005).
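The two-step structure described for the MILC calculations, and the reason the first step is sequential, can be illustrated with a toy example. The sketch below is not lattice QCD and is not MILC code: it assumes a single real variable with a made-up action S(x) = x²/2 and a plain Metropolis update, and every function name in it is hypothetical. It only shows that each stored configuration is produced from the previous one, while measurements on the stored configurations are independent of one another.

```python
import math
import random


def metropolis_step(x, rng, step_size=1.0):
    """One Metropolis update for the toy action S(x) = x**2 / 2 (an assumption)."""
    proposal = x + rng.uniform(-step_size, step_size)
    delta_s = 0.5 * proposal**2 - 0.5 * x**2
    # Always accept moves that lower the action; otherwise accept with exp(-delta_s).
    if delta_s <= 0 or rng.random() < math.exp(-delta_s):
        return proposal
    return x


def generate_configurations(n_configs, seed=0, updates_between=10):
    """Step 1 (sequential): each saved configuration starts from the previous one."""
    rng = random.Random(seed)
    x = 0.0
    configs = []
    for _ in range(n_configs):
        for _ in range(updates_between):
            x = metropolis_step(x, rng)
        configs.append(x)
    return configs


def measure(config):
    """Step 2: a toy observable (x**2) evaluated on one stored configuration."""
    return config**2


configs = generate_configurations(2000)
# Each measurement needs only its own configuration, so this loop could be
# distributed across many processors, unlike the generation loop above.
estimate = sum(measure(c) for c in configs) / len(configs)
```

For this toy action the average of x² should come out close to 1. The point of the sketch is the loop structure: the generation loop carries state from one iteration to the next, which is why a dedicated queue helps more than additional parallel job slots would.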
Modeling sol-gels
Lev D. Gelb, Washington University
The sol-gel process is a versatile technique for making ceramic and glass materials. Researchers at Washington University simulated the polymerization of silicic acid in aqueous solution, modeling systems with differing water-to-silicon ratios, silicic acid concentrations, and temperatures. The simulations are moving researchers closer to engineering sol-gel materials with specified properties.
For more information, see: http://www.ncsa.uiuc.edu/News/Access/Stories/solgel/index.htm

Nanoparticle haloing
Erik Luijten, University of Illinois at Urbana-Champaign
Researchers in the Department of Materials Science and Engineering at the University of Illinois at Urbana-Champaign developed a new algorithm to simulate nanoparticle haloing, a self-organizing process that imparts stability to naturally attractive colloidal microspheres by decorating their surfaces with highly charged nanoparticles. Nanoparticle haloing can be used to control the phase behavior and structure of materials assembled from colloidal systems. The team has been running their simulations on NCSA’s Mercury cluster. NCSA’s Performance Engineering and Computational Methods group is helping them optimize the code.
For more information, see: http://www.ncsa.uiuc.edu/News/datalink/0503/pecm.html

Orbiting black holes
Edward Seidel, Louisiana State University
The Laser Interferometer Gravitational-Wave Observatory (LIGO) has just started its first year-long observation runs. To help identify the signals seen at LIGO, scientists are carrying out numerical relativity calculations to predict the gravitational waveforms that will be produced during the last couple of orbits, plunge, and merger of a binary black hole system. Through improvements to their evolution code, a team based at Louisiana State University’s Center for Computation and Technology has been able to extend the lifetime of their simulations, enabling them to evolve a binary black hole system for more than one orbit before merger. By using fixed mesh refinement techniques, they have pushed their simulations to unprecedented resolution levels (a factor of more than three compared to the highest-resolution uni-grid simulations) while using far fewer processors. This allowed a series of runs at different resolutions for a careful convergence test, in which they were able to extrapolate to infinite resolution and, for the first time, show that a full orbit was completed before the black holes merged.

Palomar-QUEST sky survey
The Palomar-QUEST sky survey (a collaboration of the California Institute of Technology, Yale University, Indiana University, the NASA Jet Propulsion Laboratory, Lawrence Berkeley National Laboratory, and University of Illinois at Urbana-Champaign) images large regions of the sky night after night, with an emphasis on astronomical objects whose appearances change over the course of weeks or years. A 48-inch telescope collects 112 images approximately every two minutes. Raw image data are transferred via the TeraGrid backbone network and stored on NCSA’s mass storage system. Those data are processed on an NCSA computing cluster, taking advantage of a parallel file system deployed to support the project. Collaborators throughout the country have easy access to both the raw and processed data. Further processing at NCSA is made easier with a custom data-loading framework designed by NCSA in collaboration with astronomers at UIUC. The framework has been used to load a large fraction of the processed data into a TeraGrid-hosted Oracle database that leverages a novel time domain indexing structure to facilitate rapid synoptic queries.
For more information, see: http://access.ncsa.uiuc.edu/Stories/quest2/

Quantum dots
Gerhard Klimeck, Purdue University
Purdue University engineers used NCSA’s Mercury cluster to run the largest quantum dot simulation ever, consisting of more than 21 million atoms. Quantum dots are nanostructures that, while consisting of a large number of atoms, behave like artificial atoms in their ability to confine a number of electrons to a small space. The simulation used NEMO3D, which calculates the eigenvalues and eigenvectors in a quantum dot’s closed system to reveal the orbits of individual electrons within the dot.
For more information, see: http://access.ncsa.uiuc.edu/Stories/NCN/

Solvation studies
Brent Krueger, Hope College
Solvation occurs when molecules of a solvent surround and stabilize those of a solute. The solvent molecules are always in random thermal motion: the higher the temperature, the faster they move around. As the solvent molecules move, the solute sees a constantly changing environment. These solvent-driven energy fluctuations have a crucial effect on the outcome of a chemical reaction. The problem is that there are limits to the physical methods used to study chemical interactions, such as optical spectroscopy, a technique that uses light to examine molecular interactions. Using NCSA’s Linux clusters, researchers at Michigan’s Hope College are perfecting a computational approach that uses a combination of molecular dynamics and quantum mechanics to calculate the movements of solvent molecules and their effect on the excitation energy of solute molecules.
For more information, see: http://access.ncsa.uiuc.edu/Stories/mdqm/index.htm
Supernova and star formation
Paul Ricker, University of Illinois at Urbana-Champaign
NCSA’s Performance Engineering and Computational Methods group worked with NCSA research scientists and members of the University of Illinois at Urbana-Champaign’s astronomy department to develop a physics code module for FLASH that incorporates the effects of star formation and supernova feedback in cosmological simulations. The FLASH code was developed at the University of Chicago for creating simulations of Type Ia supernovae. It includes gas dynamics, expansion of the universe, collisionless dynamics of dark matter particles, gravity, and other pieces of physics, but it doesn’t include a model for the subgrid and small-scale star formation processes. So the team, as part of NCSA’s Strategic Applications Program, is at work on a model and an extension to the FLASH framework that would allow models of star formation to be included in the code and let them begin testing the resulting predictions against observations of real galaxies.
Social, behavioral, and economic sciences and the humanities

Land-use planning
Bruce Hannon, University of Illinois at Urbana-Champaign
Many questions arise when government officials and urban planners try to make the best decisions for their communities. A group at the University of Illinois at Urbana-Champaign is able to offer decision makers a glimpse of the future with the Land-use Evolution and Impact Assessment Model (LEAM), a computational model that simulates land-use change. LEAM allows decision makers to test scenarios, helping them to consider what their preferred outcome is and how they can get there. It draws on large amounts of data and performs numerous calculations for each of millions of “cells” over decades of time steps. The model has been used by numerous state and federal agencies.
For more information, see: http://access.ncsa.uiuc.edu/Stories/future/

Music information retrieval
J. Stephen Downie, University of Illinois at Urbana-Champaign
Band directors looking for music to play, musicologists searching out a recurring melodic theme across different pieces, students looking for musical excerpts from a specific genre, or radio producers working to expand playlists: imagine if they could have audio recordings, written musical scores, and text files at their fingertips. Researchers in the areas of music information retrieval and the music digital library field seek to make searchable music data repositories of both public domain and copyrighted music. Researchers in the University of Illinois at Urbana-Champaign’s Graduate School of Library and Information Science are building the International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL). One of the fundamental challenges of creating IMIRSEL is finding a way to make many different types of information (lyrics and reviews in text form, MP3, WAV and MIDI files, and metadata such as bibliographic records) accessible and searchable. NCSA’s Automated Learning Group has provided the solution by adapting its D2K data mining application environment to suit the needs of music researchers.
For more information, see: http://access.ncsa.uiuc.edu/Stories/MiningMusic/

Political information processing
Sung-youn Kim, University of Iowa, and Milton Lodge and Charles Taber, State University of New York at Stony Brook
How do people assess political candidates? How do campaign events and new information change their views? Various theories address these questions, but researchers at the University of Iowa and Stony Brook University saw gaps between existing models and empirical findings. To overcome these gaps, the researchers integrated cognitive and affective information-processing theories into a computational model (dubbed John Q. Public) that simulates how voters’ political opinions fluctuate during a campaign. Because of the model’s complexity and the sheer computational intensity of the simulation, the researchers relied on the computational power of the TeraGrid at both NCSA and the San Diego Supercomputer Center.
For more information, see: http://access.ncsa.uiuc.edu/Stories/JohnQ/
Consumer marketing
David Goldberg, University of Illinois at Urbana-Champaign
When a group of people aims to develop something new (a product, a disaster-response plan, an advertising campaign), they need to draw on data, share information and ideas, and discuss scenarios. Researchers at NCSA and the Illinois Genetic Algorithms Laboratory (IlliGAL) developed a tool, called DISCUS, to facilitate creativity, innovation, and collaborative work in such complex situations. DISCUS was used to help the Hakuhodo Institute of Life and Living (the second-largest marketing and publicity firm in Japan) gather and analyze information about consumers’ cell phone preferences, providing insights that will help in the development of new products. DISCUS was used to develop and conduct focus groups in a fraction of the time and with fewer staff than are usually required. DISCUS is supported by the Technology Research, Education and Commercialization Center (TRECC) and the National Center for Advanced Secure Systems Research; both programs are funded by the Office of Naval Research and administered by NCSA.
For more information, see: http://access.ncsa.uiuc.edu/Stories/DISCUS/

End of Cinematics
Mikel Rouse, independent stage producer
NCSA visualization and multimedia experts consulted with the New York-based writer and producer of “The End of Cinematics” on different technologies to enable the unique staging of the opera. During its world premiere in September at the Krannert Center for the Performing Arts, the opera employed an enormous display screen as its virtual set, and NCSA created a digital animation showing how the featured performers, screens, projectors, and scrims would interact. This virtual storyboard was used to garner support for mounting the production. NCSA also edited and provided footage that was used in the production.
For more information, see: http://access.ncsa.uiuc.edu/Releases/09.13.05_NCSA_Helps.html
Cyberenvironments, Cyberservices, and Cybertechnologies
Numerous NSF reports have shown that a national cyberinfrastructure will allow researchers to answer critical scientific and engineering questions, many that could not have even been posed earlier. In addition, a national cyberinfrastructure will enable multidisciplinary efforts to understand the complex systems that make up the real world—with untold opportunities to improve the health, well-being, and economy of the nation. However, scientists and engineers also face enormous challenges in harnessing the rapidly changing cyberinfrastructure to drive frontier research. Researchers collect data; integrate data from multiple sources; model and simulate phenomena; mine, analyze, and visualize data; and collaborate with remote colleagues. This requires an extensive, robust, and reliable cyberinfrastructure. To realize the full benefits of a national cyberinfrastructure, we must make the cyberinfrastructure as accessible and usable as Web browsers made the Internet. NCSA is committed to developing the integrated, end-to-end software systems—cyberinfrastructure environments or cyberenvironments—required to meet this challenge.
Cyberenvironments
Cyberenvironments provide an easy-to-use interface to local and shared instruments, sensor arrays, data stores and data sets, computational systems, networks, scientific and engineering applications, data analysis and visualization tools and services, and collaboration capabilities, all within a secure framework. They help researchers organize and coordinate the appropriate subset of global resources available for a given problem and add capabilities that enhance researchers’ abilities to manage complex projects and automate processes within and across projects and disciplines as well as to collaborate effectively with colleagues near and far. Cyberenvironments are tailored to allow researchers and educators to interact with the cyberinfrastructure using concepts and approaches familiar to their specific discipline; they are a disciplinary lens over the cyberinfrastructure that focuses resources and capabilities on the solution of a specific problem. Cyberenvironments are built on technologies such as portals, workflow engines, and semantic data and service descriptions to enable continuing addition of new cyberapplications, cybertools, and cyberservices as researchers’ needs evolve and scientific understanding grows.
A broad range of expertise and technical capabilities is required to create community-scale cyberenvironments capable of serving as the base for next-generation research and education. With its partners, NCSA’s Cyberenvironments and Technologies (CET) and Cyberapplications and Communities (CAC) directorates are obtaining an in-depth understanding of the scientific and engineering communities that NCSA serves. They are engaging communities to analyze scientific processes and gather requirements; integrating and developing innovative cybertechnologies across many technical areas; and creating flexible, integrated, production-capable cyberenvironments. In addition to creating the set of technologies that underlie cyberenvironments, they are developing capabilities that enable the configuration, monitoring, operation, and analysis of cyberenvironments at the scale of entire scientific communities.
Cyberenvironments are frequently considered to be gateways to large-scale computational capabilities and community data stores or as collaboration spaces. They are also often assumed to be Web portal-based. NCSA is extending this definition to ensure that cyberenvironments satisfy the future needs of scientists and engineers. In particular, future cyberenvironments will:
• Allow researchers to manage large-scale and complicated scientific projects and processes.
• Allow researchers to manage the diverse and large-scale experimental, computational, and data resources needed to address challenging problems and complex phenomena.
• Bridge local, institutional, and national cyberinfrastructure to create a seamless environment that ensures the most efficient and effective resources and capabilities are brought to bear on the problem at hand.
• Assist in the bi-directional connection between raw or group research artifacts (data, notes, plans, etc.) and published artifacts (vetted data, annotations, best practices, reviews, and papers) to enhance the flow of information between basic research and application and between research and education.
To provide these capabilities, NCSA is developing middleware abstractions above those currently available in the cyberinfrastructure as well as developing additional capabilities for automating or semiautomating processes. The concept of visual knowledge discovery (using data analysis to categorize, cluster, and extract features from large data sets, coupled with interactive visualization) is a prime example of the new capabilities needed to allow users to quickly digest data and build understanding. Similarly, capabilities to manage semantic information about data and resources will enable higher-level capabilities such as provenance tracking, annotation, and collaborative data curation.
Figure 4. Inter-relationship among cyberinfrastructure’s parts.
Cyberenvironment design and development methodologies will need to change to support large-scale deployment of cyberenvironments and to significantly reduce the cost involved in creating, adapting, and evolving cyberenvironments. Cyberenvironments must not be considered the product of one-time development projects—they are living infrastructure that will evolve with technology and with scientific discoveries and understanding over decades. NCSA is focused on building cyberenvironments on the principles of sustainability, adaptability, and scalability using current and emerging technologies such as Web and grid services, translating or integrating middleware (for example, MyProxy), global unique identifiers and metadata, workflow and provenance, and semantic descriptions of resources and data. These technologies reduce the coupling between the cyberinfrastructure and cyberenvironments, while maintaining end-to-end functionality. Revamping current development methodologies and tools to support this approach will be critical to achieving the overall cyberenvironment vision.
To maintain interoperability among different cyberinfrastructure projects and to maximize the sharing of tools, services, and technologies among cyberenvironments, NCSA formed a Cyberarchitecture Working Group. This group, which draws on staff from across NCSA’s directorates, serves as a forum to facilitate regular information exchange among the cyberenvironment, cyberservices, and cybertechnologies projects; a technical sounding board, allowing designs and decisions to be exposed for critique by a friendly but astute audience; and an expert panel that can dive deeply into technical areas of immediate interest from a broad range of perspectives.
To begin the development of a cyberenvironment, CAC and CET staff initiate discussions with the pathfinders in a community or, if a community has begun to seriously explore the role of cyberinfrastructure in its research and education programs, with representatives of the community as a whole. These discussions provide an initial set of requirements, which are evaluated, prioritized, and presented to the NCSA Steering Committee. Upon approval of a cyberenvironment project by the Steering Committee, an Integrated Project Team (IPT) is assembled with staff, as needed, from across NCSA’s directorates. This team works with the Cyberarchitecture Working Group as well as the originating community to further refine the capabilities to be provided by the cyberenvironment and to produce a project plan. New cyberenvironments are designed to be integrated into NCSA’s existing cyberarchitecture and to be compatible with future directions. New cyberenvironments may be deployed as short-lived demonstrations or as prototypes that then move into the NCSA Testbed (Beta) for broad testing and evaluation. The overall process for building cyberenvironments is illustrated in Figure 5.
The NCSA Testbed, established in FY2005, provides the means for user communities to thoroughly evaluate prototype cyberenvironments, cyberservices, and cybertechnologies. The Testbed acts as a small-scale version of the NCSA persistent infrastructure. Once solutions have been judged to meet the requirements of the community and proven themselves as viable for long-term support, a transition plan is created, and the cyberenvironment and its underlying cyberservices and technologies are migrated from the Testbed to NCSA’s persistent infrastructure.
Development of Cyberenvironments
Shared cyberinfrastructure includes networks and compute and data resources plus the middleware that links those resources together and presents them in a standard way. The NCSA view of a cyberinfrastructure architecture or cyberarchitecture extends this picture to include higher-level components that provide complete, community-scale, end-to-end scientific and engineering solutions. This architecture incorporates the traditional shared cyberinfrastructure as a base, but includes additional components needed by a scientific and engineering community, e.g., visualization tools and services, as well as community-specific interfaces, cyberservices, and cyberapplications—in other words, everything needed to support scientific and engineering research. These interfaces and services incorporate knowledge of the domain and allow scientists and engineers to focus on research issues and not on accessing and using cyberinfrastructure. From the point of view of the researcher attempting to answer a scientific or engineering question, all of this is infrastructure. A schematic of the layered structure of cyberenvironments is shown in Figure 5.
Figure 5. Architecture of cyberenvironments.
Prototype Cyberenvironments
In FY2005 NCSA began the development of prototype cyberenvironments, or pieces of cyberenvironments, for several scientific and engineering projects and communities. Many of these projects are in collaboration with individuals at other institutions. Building cyberenvironments is a major undertaking, and we know NCSA cannot be successful alone. Thus we are establishing partnerships with the scientists and engineers who will use the cyberenvironments, and we are forging partnerships with computer scientists, engineers, and technologists to ensure that all of the expertise needed to create specific cyberenvironments is available for the projects that we undertake. Table 4 summarizes several major areas where NCSA is collaborating with researchers to develop cyberenvironments.
Astronomy
NCSA is involved in building cyberenvironments for both optical and radio astronomy. These projects build on a rich history of activities at NCSA, which have provided the astronomy community with the capabilities and resources needed to advance our understanding of the universe.
Combined Array for Millimeter-wavelength Astronomy (CARMA)
Other Participants: California Institute of Technology; University of California, Berkeley; University of Illinois at Urbana-Champaign; and University of Maryland
NCSA Team: Richard Crutcher, Athol Kemball, and Lisa Xu
The computational challenges in radio astronomy are being driven by exponentially increasing data rates, increasingly complex instruments, and the need for advanced, automated data reduction capabilities. Focus areas for CARMA are: (i) automated workflows and data pipelines for radio astronomy, (ii) broadening radio astronomy community access through development and deployment of a radio astronomy cyberenvironment, and (iii) development of high-end community data reduction codes. CARMA is expected to produce approximately 15 TB of data per year; the raw data will be transferred from the telescope site near Bishop, California, to NCSA for automated pipeline reduction. Both raw and processed image data will be archived at NCSA. CARMA is being commissioned and is expected to have nearly 200 principal investigators per year.
During FY2005, NCSA staff:
• Completed the initial components of the workflow for CARMA, including the on-site archive, metadata management, and GridFTP-based transport component for the data flow from the telescope to NCSA.
• Completed an infrastructure framework for high-performance data reduction in radio astronomy, using a Department of Energy common component architecture.
• Developed a specialized application for image fidelity assessment for use in pipelined data reduction and imaging (see: A. Kemball and A. Martinsek, 2005, Astronomical Journal 125, 1760).
For further information on the CARMA project, see: http://www.mmarray.org

Dark Energy Survey (DES)
Other Participants: Barcelona Consortium, Fermi National Accelerator Laboratory, Lawrence Berkeley National Laboratory, National Optical Astronomy Observatory, United Kingdom Consortium, University of Chicago, University of Illinois at Urbana-Champaign, and University of Michigan
NCSA Team: Tanweer Alam, Cristina Beldica, Dora Cai, Greg Daues, Joe Mohr, and Ray Plante
The Dark Energy Survey is a deep optical imaging survey of 10 percent of the sky that will allow astrophysicists to determine the nature of dark energy. The international DES collaboration is focused on two projects: (i) constructing a new wide-field optical imager for the Blanco four-meter telescope (led by Fermilab) and (ii) developing the data management system to handle the approximately 1 PB dataset that will be generated by the DES project (led by NCSA).
Table 4. Cyberenvironments in progress at NCSA.
Field: Astronomy
Project: CARMA (Combined Array for Millimeter-wavelength Astronomy); DES (Dark Energy Survey); LSST (Large Synoptic Survey Telescope)
Field: Environmental Science & Engineering
Project: CLEANER (Collaborative Large-scale Engineering Analysis Network for Environmental Research); LEAD (Linked Environments for Atmospheric Discovery); LTER (Long-term Ecological Research Network); MAE (Mid-America Earthquake) Center
Field: Nanoscale Science & Engineering
Project: CMCS (Collaboratory for Multi-scale Chemical Sciences)
Status*: DEVEL; DEVEL; DEVEL; DEVEL; DEVEL; DEVEL; DEVEL; PROTO
Number of Researchers**: 100; 200; 300; 1-3,000; 5,000; 100; 1,800
* Status as of September 30, 2005: in the planning stage (PLAN), under development (DEVEL), prototype available (PROTO), or in production (PROD). ** Educational versions of many of the cyberenvironments will be developed. The number of educational users will be at least an order of magnitude larger than the number of research users.
The NCSA team is building the cyberinfrastructure required to reduce, archive, and disseminate the optical imaging data that will be produced, starting in September 2009 and continuing through January 2015. NCSA will be the primary processing and archiving center for the DES project. The volume of data and long-term nature of the survey require a highly automated data management system that includes quality assurance throughout. The DES project is a precursor to the longer-term Large Synoptic Survey Telescope (LSST) project, where the data volumes and data rates will be an order of magnitude larger. During FY2005, NCSA staff: • Designed a grid-enabled, database-query-driven data processing and archiving system and established a four-year development plan. The data system will support remote search and retrieval of DES images and object catalogs as well as remote analyses of the DES data. This development plan is punctuated by yearly data challenges, where the data management system is tested using simulated Dark Energy Camera data from Fermilab along with real imaging data from the existing camera on the Blanco four-meter telescope at the Cerro Tololo Inter-American Observatory. • Managed the development of three additional astronomy modules at Fermilab and an image archiving tool at the National Optical Astronomy Observatory. 
For further information on the DES project, see: http://www.darkenergysurvey.org
Large Synoptic Survey Telescope (LSST)
Other Participants: National Optical Astronomy Observatory, Research Corporation, University of Arizona, University of Washington, Brookhaven National Laboratory, Harvard-Smithsonian Center for Astrophysics, Johns Hopkins University, Las Cumbres Observatory, Inc., Lawrence Livermore National Laboratory, Stanford Linear Accelerator Center, Stanford University Kavli Institute for Particle Physics, Pennsylvania State University, University of California, Davis, and University of Illinois at Urbana-Champaign
NCSA Team: Cristina Beldica, Chris Cribbs, Greg Daues, David Fleming, Stephen Pietrowicz, Ray Plante, and Ramon Williamson
The 8.4-meter Large Synoptic Survey Telescope (LSST) is a wide-field telescope facility under development with “first light” scheduled in 2013. The LSST will provide comprehensive, time-lapse imaging of the entire available sky in optical wavelengths that will provide: (i) an unprecedented census of the solar system, including potentially hazardous asteroids, (ii) the deepest three-dimensional maps of the mass distribution of the universe that can reveal the nature of “dark energy” that is driving the acceleration of the universe, and (iii) time-tracked light curves from a vast array of variable objects, including extra-galactic supernovae used as distance calibrators. This new science will be enabled by a large, 3-gigapixel camera capable of mapping the sky every few days. The LSST project will push new capabilities in both astronomy and cyberinfrastructure. The raw images (15 TB per night) must be transmitted from the telescope and processed in real time, generating a total of 130 TB per night. The initial analysis of the data will produce alerts about time-critical discoveries that can be followed up by other telescopes. 
The demanding data management requirements, driven by the high data rate and the need for real-time alerts, will require significant cyberinfrastructure advances. NCSA is partnering with a dozen other academic institutions and U.S. Department of Energy laboratories to design and develop an advanced, highly automated data management system for processing and disseminating LSST data for the astronomical community. The NCSA team is leading a collaboration to produce a prototype system based on grid technologies that enables high-performance, parallel I/O, and robust, automated processing workflows. In FY2005, NCSA staff made considerable progress on components that will be part of the first LSST prototype system. In particular, NCSA staff:
• Established an LSST “Precursor Archive,” an archive of existing astronomical data that are similar to data that will be produced by LSST and that can be used to develop processing algorithms and demonstrate a prototype pipeline processing system. This archive serves data to developers across the collaboration.
• Deployed an archive replication system built on SDSC’s Storage Resource Broker (SRB) and a transfer queuing system originally developed for the NCSA BIMA Data Archive (with the National Optical Astronomy Observatory). This system is being used to replicate the NOAO Science Archive at NCSA for distribution to the NOAO user community. This same system will be used to replicate portions of the LSST archive at partner sites.
• Built an early version of the pipeline processing framework using Open Grid Runtime Environment (OGRE), leveraging the work of the OGCE group at NCSA. This system was demonstrated by processing a full night’s worth of simulated data from the Dark Energy Camera on TeraGrid.
• Built the first prototypes of components of the LSST Data Access Framework, including a dynamic disk cache management system for an archive backed by a slow mass storage system and whose volume of products is larger than the available spinning disk. 
• Developed a security framework that will not only be the basis of LSST authentication and authorization mechanisms but will provide a community-wide security framework that will allow the LSST archive to interoperate with other astronomical archives (with collaborators from NMI GRIDS Center group at NCSA, Globus developers at Argonne National Laboratory, and developers from the National Virtual Observatory and NOAO). NCSA presented its first demonstration of this framework in July 2005 to the NVO community; the first fully working demonstration was presented at the American Astronomical Society Meeting in January 2006 as part of the NOAO Science Archive project. • Studied issues relevant to intercontinental data transfer through a collaboration with the Australian National University (ANU) Supercomputing Facility using the LSST “Precursor” data collections. For further information on the LSST project, see: http://www.lsst.org
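The dynamic disk cache mentioned in the Data Access Framework item above can be illustrated with a minimal sketch. The report does not describe the actual design, so the least-recently-used eviction policy, the class name `DiskCache`, and the `fetch_from_mass_storage` callback are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class DiskCache:
    """Size-bounded cache in front of a slow archive.

    LRU eviction is an assumption; the report does not state the policy.
    """

    def __init__(self, capacity_bytes: int,
                 fetch_from_mass_storage: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch_from_mass_storage  # hypothetical slow-tier reader
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used_bytes = 0

    def get(self, product_id: str) -> bytes:
        if product_id in self.entries:
            self.entries.move_to_end(product_id)  # mark as most recently used
            return self.entries[product_id]
        data = self.fetch(product_id)  # cache miss: stage from mass storage
        self._store(product_id, data)
        return data

    def _store(self, product_id: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # larger than the whole cache: serve without staging
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self.entries.popitem(last=False)  # drop least recently used
            self.used_bytes -= len(evicted)
        self.entries[product_id] = data
        self.used_bytes += len(data)
```

In this sketch a request for a product that is already on disk never touches the slow tier, and a miss evicts the least recently requested products until the new one fits.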
Environmental Science and Engineering
Environmental science and engineering efforts at NCSA include the development of cyberenvironments for several projects that include data collection, assimilation, management, mining, and visualization in addition to simulation, all under workflow control. They also include participation in environmental cyberinfrastructure planning for a variety of national efforts, including the CLEANER Project Office and the NEON and ORION projects.
Environmental CI Demonstration (ECID)
Other Participants: Arizona State University, Chesapeake Research Consortium, Inc., Columbia University, Drexel University, Howard University, Johns Hopkins University, Oregon State University, Rensselaer Polytechnic Institute, Texas Engineering Experiment Station, University of California, Merced, University of California, San Diego, University of Delaware, University of Illinois at Urbana-Champaign, University of Iowa, University of Maryland, University of Minnesota, Twin Cities, University of Massachusetts, Amherst, and Virginia Polytechnic Institute and State University.
NCSA Team: Peter Bajcsy, David Clutter, Steve Downey, Joe Futrelle, Rob Kooper, Mark Marikos, Luigi Marini, Barbara Minsker, Jim Myers, Andrew Shirk, Andrew Wadsworth, and Tim Wentling
University of Michigan Team: Tom Finholt, Il-Hwan Kim, and Katherine Lawrence
The goal of the NSF Collaborative Large-scale Engineering Analysis Network for Environmental Research (CLEANER) project is to transform and advance the scientific and engineering knowledge base in order to address the challenges of complex, large-scale, human-stressed environmental systems, such as managing and protecting our nation’s water supplies, restoring altered ecosystems, preserving endangered species, and tracking harmful agents. The infrastructure required to realize the CLEANER vision includes multiple distributed sites where sensors and instruments will gather data as well as cyberinfrastructure for sharing, storing, managing, analyzing, mining, visualizing, and drawing insights from that data.
Environmental Cyber Infrastructure Demonstration (ECID). The overall goal of this project is to create an end-to-end cyberinfrastructure and to provide it, through demonstrations, to specific environmental communities. 
The demonstrations are planned to show the integration of multiple heterogeneous workflows, models, and analytical and visualization tools using CyberIntegrator, an emerging “meta-workflow” technology, to support near-real-time prediction of hypoxia in Corpus Christi Bay, Texas. This technology will be integrated in a cyberenvironment with a collaborative technology, called the CyberCollaboratory, an event-based manager, knowledge networking (CI-KNOW), and a metadata system to provide provenance of all activities in the cyberenvironment. The cyberenvironment is being used by teams of researchers and educators to enable a better understanding of collaborative environments where scientists, educators and practitioners can find, share, analyze, and discuss data/information related to their research efforts. In FY2005, the start-up year for this project, NCSA staff: • Deployed a prototype cybercollaboratory to support planning activities of the national CLEANER community. • Identified initial requirements for the Texas Bays Eutrophication Scenario and the Illinois River Basin Observatory. • Developed an initial design for a data/metadata system for heterogeneous distributed data stores that will enable data provenance tracking and provide information to the planned social network analysis based “recommender system.” • Defined a concept of meta-workflow that meets the need for integrating multiple heterogeneous workflows, each of which performed a set of processing steps, into unified processes that span workflow engines, administrative domains, and subdisciplines. A prototype user interface for meta-workflow was also developed. For further information on the CLEANER project, see: http://cleaner.nacse.org/index.html
Linked Environments for Atmospheric Discovery (LEAD)
Other Participants: Colorado State University, Howard University, Indiana University, Millersville University, University of Alabama in Huntsville, University Corporation for Atmospheric Research, University of Illinois at Urbana-Champaign, University of Oklahoma, and University of North Carolina, Chapel Hill
NCSA Team: Jay Alameda, Greg Daues, Shawn Hampton, Chad Kerner, Albert Rossi, Mark Straka, and Bob Wilhelmson
LEAD was funded by NSF in response to the pressing need for a comprehensive national cyberinfrastructure in mesoscale meteorology, particularly one that can interoperate with those being developed in other relevant disciplines. The LEAD project is a multidisciplinary effort involving nine institutions and more than 100 scientists, students, and technical staff in meteorology, computer science, social science, and education. LEAD is addressing the fundamental information technology research and development challenges to create an integrated, scalable framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing a broad array of meteorological data and model output independent of format and physical location and in a dynamically adaptive manner (in contrast to today’s fixed time schedules and configurations). LEAD will provide advanced weather technologies for research and education, lowering the barrier to entry, empowering application in a distributed context, increasing the sophistication of problems that can be addressed, and facilitating rapid understanding, experiment design, and execution. NCSA is working with other institutions in the LEAD project on integration, hardening, and deployment efforts. NCSA is also participating in workflow and broker development related to LEAD’s use of production computational resources, including NCSA and TeraGrid systems. 
In support of these activities, NCSA developed the following technologies in FY2005:
• Trebuchet file management libraries and desktop user client. Trebuchet is a high-level library and user client that was developed to simplify the management of files, whether remote or local. A number of protocols, including GridFTP, ssh, and HTTP, are supported by these libraries. Work in FY2005 consisted of maintenance and enhancements of the libraries as well as a rebuild of the user client using the Eclipse platform.
• ELF remote job management environment. ELF is a streamlined workflow engine, intended to support computational jobs on remote computational clusters. It supersedes the more complicated OGRE workflow engine and features enhanced error management and propagation capabilities that were not possible with OGRE as well as simpler remote configuration of the workflow engine.
• Ensemble Broker prototype. The Ensemble Broker is a suite of services to support brokering of ensembles of workflows to manage large numbers of computational jobs. The prototype has a simplified set of supporting services in place that will be enhanced in the future to more fully support the requirements of LEAD.
For further information on the LEAD project, see: http://lead.ou.edu
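The idea of a file library that hides the transfer protocol from the caller, as described for Trebuchet above, can be sketched as dispatch on the URL scheme. This is not Trebuchet's API, which the report does not show. The handler functions and the `HANDLERS` registry are hypothetical, and each handler only returns a description of the transfer it would perform. The `gsiftp` scheme is the one conventionally used in GridFTP URLs.

```python
from urllib.parse import urlparse


# Hypothetical handlers: each returns a description instead of moving bytes.
def _copy_local(src: str, dst: str) -> str:
    return f"local copy {src} -> {dst}"


def _copy_gridftp(src: str, dst: str) -> str:
    return f"GridFTP transfer {src} -> {dst}"


def _copy_ssh(src: str, dst: str) -> str:
    return f"ssh copy {src} -> {dst}"


def _copy_http(src: str, dst: str) -> str:
    return f"HTTP download {src} -> {dst}"


# Registry mapping a URL scheme to the handler that understands it.
HANDLERS = {
    "": _copy_local,       # plain paths have no scheme
    "file": _copy_local,
    "gsiftp": _copy_gridftp,
    "ssh": _copy_ssh,
    "http": _copy_http,
}


def copy(src: str, dst: str) -> str:
    """Pick a handler from the source URL's scheme and delegate to it."""
    scheme = urlparse(src).scheme
    try:
        handler = HANDLERS[scheme]
    except KeyError:
        raise ValueError(f"unsupported protocol: {scheme!r}") from None
    return handler(src, dst)
```

A caller writes `copy("gsiftp://host/data/a.nc", "/tmp/a.nc")` or `copy("/data/a.nc", "/tmp/a.nc")` without caring which protocol is involved; supporting another protocol means adding one entry to the registry.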
Long-term Ecological Research (LTER) Network Other Participants: Arizona State University, College of William and Mary, Colorado State University, Florida International University, Harvard University, Institute of Ecosystem Studies, Kansas State University, Marine Biological Laboratory, Michigan State University, New Mexico State University, Ohio State University, Oregon State University, Syracuse University, University of Alaska, University of California, San Diego, University of California, Santa Barbara, University of Colorado, University of Georgia, University of Minnesota, University of New Mexico, University of Puerto Rico, University of Virginia, and University of Wisconsin, Madison. NCSA Team: Bill Baker, Jim Basney, Greg Bauer, Michael Bletzinger, Randy Butler, Patrick Duda, Terry Fleury, Mike Freemon, David Gehrig, Raheem Syed, and Von Welch The LTER Network is a collaborative effort involving more than 1,800 scientists and students investigating ecological processes over long temporal and broad spatial scales. There are 26 sites in the LTER Network, scattered across the U.S., Puerto Rico, and Antarctica. The LTER Network promotes synthesis and comparative research across sites and ecosystems and among related national and international research programs. Initial activities are focusing on the effort by S. Gage of Michigan State University (MSU) to record, store, and interpret an area’s soundscape. Sonograms provide researchers with a rich source of information on an ecosystem’s health, including identification of the species inhabiting the ecosystem as well as the changes in the ecosystem over time. The sonogram data are automatically labeled, transferred to a computer at MSU, converted to sonogram images, and stored. Working with MSU, the GRIDS Center, and the LTER Network Office, NCSA built a pilot cyberenvironment (Biophony Portal) to support Gage’s acoustic monitoring project. 
The Biophony Portal allows scientists to search metadata files and move data from a remote storage site to NCSA for analysis. This project collaboratively leveraged the metadata catalog (METACAT) developed by NSF’s SEEK project. In FY2005 NCSA staff:
• Expanded NCSA’s MyProxy software to seamlessly interact with the LTER Network Office LDAP security credentials. This was done by adding support for the Pluggable Authentication Module (PAM).
• Deployed and tested Gage’s analysis algorithms on NCSA’s high-performance computing systems.
• Integrated existing authentication, analysis, and metadata search applications into a one-stop portal interface.
• Demonstrated the Biophony Portal at an LTER meeting in September 2005 and at SC05 in November 2005.
The LTER cyberenvironment developed at NCSA effectively demonstrates the sharing of ecological data, high-end computing resources, sensors, instruments, models, and analytical tools. For further information on the LTER project, see: http://www.lternet.edu/
Mid-America Earthquake (MAE) Center
Other Participants: Georgia Institute of Technology, University of Michigan, University of Texas, Austin, University of Illinois at Urbana-Champaign, University of Memphis, University of Puerto Rico, Washington University, and Rice University
NCSA Team: Jim Myers and Chris Navarro
Earthquake engineering research can save lives and minimize the disruption caused by major temblors, but only if it is an integral part of disaster planning and response activities. The MAEviz
cyberenvironment is being developed to ensure that the latest research in earthquake engineering is quickly made available to decision makers. Under development by the University of Illinois at Urbana-Champaign’s Mid-America Earthquake Center, NCSA, and the University of Michigan, MAEviz will provide a single community interface that integrates distributed heterogeneous data sources and multiple layers of analysis to allow earthquake engineers, planners, policy makers, and disaster-recovery experts to understand the physical, social, and economic ramifications of major earthquakes. The open-source framework of MAEviz builds on state-of-the-art grid, collaboratory, workflow, data analysis, and metadata technologies developed by NCSA and its partners, providing a flexible and modular conduit through which information is delivered to end-users. MAEviz leverages the pioneering cyberinfrastructure provided by NEESgrid to deliver a tool that enables new modes of transparent, persistent, and spontaneous interaction among geographically distributed researchers, engineers, scientists, social scientists, and decision makers. Secure, intelligent, easy-to-use collaborative tools, delivered via a uniform portal interface, are integrated with discipline-specific tools. MAEviz is driving NCSA’s efforts in metadata-based data management and the adaptation of Sakai, an advanced multilingual collaboration portal, for use in community-scale research efforts. Work on MAEviz in FY2005 included:
• Release of a beta version of MAEviz demonstrating an initial integration of MAEviz visual analysis capabilities within the Sakai portal, the ability to save data to a secure metadata/data repository, and the ability to browse shared hazard scenarios based on their metadata descriptions. 
• Deployment of MAEviz as a pilot project to institutions such as Memphis Light, Gas, and Water (Memphis is close to the New Madrid earthquake fault, the site of the largest earthquake ever recorded in the continental U.S.). MAEviz is also being considered as the base for multinational integration of data and models that would serve researchers, engineers, and decision makers in the most earthquake-prone regions of the earth. For further information on the MAE project, see: http://mae.ce.uiuc.edu/
Nanoscience and Technology
NCSA is increasing its involvement in nanoscience and nanoengineering activities that require high-performance computing resources. Areas of interest include molecular science and engineering, materials science and engineering, molecular biology, and nanoscale science and engineering.
Collaboratory for Multi-scale Chemical Sciences (CMCS)
Other Participants: Sandia National Laboratories, Pacific Northwest National Laboratory, Argonne National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, National Institute of Standards and Technology, Massachusetts Institute of Technology, University of California, Berkeley, and University of Illinois at Urbana-Champaign
NCSA Team: Joe Futrelle, Jeff Gaynor, Bob McGrath, and Jim Myers
Complex, multiscale phenomena, like those found in combustion and environmental research, present a formidable challenge. Scientists and engineers must understand concepts at a variety of levels in order to properly model and understand them. This presents intellectual hurdles as these phenomena are being elucidated by a wide range of experimental and theoretical techniques, each with its own unique abilities, scientific models and approximations, and data formats. Information sharing and community-wide, system-level coordination are thus major challenges in these disciplines. The Collaboratory for Multi-scale Chemical Sciences (CMCS), funded by the Office of Science in the U.S. Department of Energy, is designed to address many of these challenges. It provides rich, group-level collaboration capabilities, facile data flow among communities, and community-level review and curation of data. NCSA has recently taken a role in CMCS and in the related Scientific Annotation Middleware (SAM) effort. NCSA staff are:
• Integrating CMCS into NCSA’s overall cyberenvironments initiative, as an early example of how a cyberenvironment can assist community interaction in data publication, community data curation, data translation, and integration.
• Incorporating semantic grid technologies into CMCS and assessing the resulting design changes.
• Developing a plan to integrate Sakai into CMCS.
• Contributing to the standardization of the Data Format Description Language (DFDL) within the Global Grid Forum. DFDL provides a means for describing the logical data model associated with binary or ASCII data formats and is being used within CMCS in conjunction with a DFDL parser to enable the extraction of metadata and translation among data formats.
The CMCS cyberenvironment has been adopted by two of the groups recently awarded grants by NSF’s Chemistry Division to pursue the development of cyber-enabled chemistry applications (“Process Informatics for Chemical Reaction Systems,” PI: W. Green, Massachusetts Institute of Technology, and “Developing Collaboratory Tools to Facilitate Multi-Disciplinary, Multi-Scale Research in Environmental Molecular Science,” PI: K. Mueller, Pennsylvania State University). 
For further information on the CMCS project, see: http://cmcs.ca.sandia.gov
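The role DFDL plays in CMCS, keeping the description of a binary layout separate from the parser that applies it, can be illustrated with a small sketch. This is not DFDL syntax and not the CMCS parser; the `LAYOUT` table, its field names, and the helper functions are hypothetical and use only Python's standard `struct` module.

```python
import struct

# Hypothetical record description: (field name, struct format code).
# The layout is data, so a new format needs a new table, not a new parser.
LAYOUT = [
    ("species_id", "H"),     # unsigned 16-bit integer
    ("temperature_k", "f"),  # 32-bit float
    ("pressure_pa", "d"),    # 64-bit float
]


def parse_record(layout, raw: bytes) -> dict:
    """Decode one big-endian binary record according to the layout table."""
    fmt = ">" + "".join(code for _, code in layout)
    values = struct.unpack(fmt, raw)
    return {name: value for (name, _), value in zip(layout, values)}


def to_text(record: dict) -> str:
    """Translate a decoded record into a simple ASCII key=value form."""
    return ",".join(f"{name}={value}" for name, value in record.items())
```

Given a description like this, the same generic code can both extract metadata (the field names and values) and translate the record into another format.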
Biological and Biomedical Science
Evolution Highway
Institute for Genomic Biology Team: Annelie Everts-van der Wind, Denis M. Larkin, and Harris A. Lewin
NCSA Team: Loretta Auvil, Colleen Bushell, Lisa Gatzke, Greg Pape, and Michael Welge
Because of the massive databases created by DNA sequencing studies, whole genome comparisons present a number of analysis challenges. Traditionally, enormous effort and specialized tools have been required to relate one species’ genome to another and to uncover the finer points of distinction and similarity. Evolution Highway, which was designed and built by NCSA and colleagues at UIUC’s Institute for Genomic Biology (IGB), is a cyberservice that addresses these challenges, integrating diverse genomic and proteomic data and offering visualization tools that make comparing and understanding the data easier. Evolution Highway, which is based on NCSA’s Data to Knowledge (D2K) framework, automates many time-consuming tasks and serves as a framework for integrating new analysis tools, algorithms, and approaches. It was placed into pre-production at NCSA during the summer of 2005 and is being used by researchers at IGB and other institutions. Evolution Highway was used to perform the most extensive comparison of mammalian genomes ever attempted, examining the chromosome organization of humans, cattle, rats, mice, cats, dogs, horses, and pigs, all at once. Results were presented in a Science article (see: Science 309, 613-617, 2005).
Protein Structure Prediction and Analysis
Rosetta Team: David Baker, Dylan Chivian, David Kim, and Jack Schonbrun
NCSA Team: Chit Khin, Michelle Gower, Doru Marcusiu, Jay Mashl, Sudhakar Padamighamton, and John Towns
This project, with David Baker at the University of Washington, is aimed at deploying and disseminating the protein structure prediction and analysis methods developed by the Rosetta Project, led by Baker. 
This group is a world leader in protein structure prediction, which has sometimes been referred to as a “Holy Grail” of biomolecular computational science. The group operates a Web service for protein structure prediction that currently has a three-month backlog. The proposed project will provide cyberinfrastructure to support Web services, education, and software dissemination, as well as further methods development. In the past year with NCSA support, Baker’s group obtained top scores in the CASP competition for protein structure prediction and successfully predicted the structure of a large number of other, small-to-intermediate soluble proteins.
Cyberservices
Cyberservices are one of the basic building blocks of cyberenvironments. Often the development of a cyberenvironment begins with development of one or more stand-alone cyberservices. Cyberservices developed this past year are described below.
Agricultural Science and Engineering
SoyFace Data Service Principal Investigators: Steve Clough NCSA Team: Dora Cai and John Towns The SoyFace project involves multiple researchers who perform experiments on small plots of land over several years. Each researcher measures soil, weather, and crop conditions under “normal” conditions and with controlled increases in carbon dioxide and ozone concentrations. The goal is to quantify the changes in soybean yield and quality that result from changes in atmospheric composition. NCSA is working with researchers from UIUC’s Crop Science Department to design and deploy a prototype database system to store experimental data and facilitate data sharing and research discovery within the SoyFace project. This service will allow sharing of information over a single experimental season as well as facilitate multiyear analysis.
Cybertechnologies
Although many of the cybertechnologies needed for the creation of cyberenvironments are available, others must be developed. In addition, existing technologies may need to be extended or modified in order to meet the needs of research communities. NCSA is actively involved in the development, extension, and modification of many of the basic technologies that underlie cyberenvironments. The goal is to develop cybertechnologies that serve many different communities. Related activities by some of NCSA’s strategic partners are described in the sidebar, Strategic partnerships. Secure Credential Management Services. In FY2005, NCSA continued to extend MyProxy and GSI-OpenSSH, two widely used open-source software packages for secure grid computing. MyProxy provides a credential management service that helps users maintain the security of their credentials while delivering convenient access to the credentials when and where they are needed during grid sessions. It is key to the deployment of secure grid portals. GSI-OpenSSH is a modified version of OpenSSH that adds support for X.509 proxy certificate authentication and delegation, providing a single sign-on remote login and file transfer service for grids. MyProxy and GSI-OpenSSH are now an integral part of the latest version of the Globus Toolkit. Secure Databases. In support of the LTER project, NCSA enhanced the Metacat XML database to support the X.509-based grid security infrastructure (GSI). This was accomplished by starting with HTTP libraries included with the Java Commodity Grid (CoG) Kit that were already partially GSI-enabled. NCSA further developed these libraries, adding features needed by Metacat. Metacat was then modified to use the CoG HTTP libraries, giving it advanced authentication capabilities. The changes to the CoG HTTP libraries were contributed back to the CoG project for broad dissemination. Tupelo Metadata and Data Archiving Software. 
Tupelo is a grid-archiving system for scientific data and metadata. Objects and files can be created, updated, organized into semantic networks, and secured. In addition, metadata objects can be searched and retrieved based on the values of arbitrary, user-defined attributes. The software supports RDF-OWL import and export, Oracle 10g and MySQL backends, file transfers using GridFTP and ssh, and grid services based on OGSI 3.2.1. Tupelo has been used to store MAEviz scenario data and metadata. Tupelo 1.0 was released on July 15, 2005, and Tupelo 1.1 was released on September 30, 2005.
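Searching metadata objects by arbitrary, user-defined attributes, as described for Tupelo above, might look like the following in-memory sketch. It does not reflect Tupelo's actual interface or storage backends; `MetadataObject`, `MetadataStore`, and the attribute names in the usage example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataObject:
    object_id: str
    # Free-form attributes: the store imposes no fixed schema.
    attributes: dict = field(default_factory=dict)


class MetadataStore:
    """Toy in-memory store; a real archive would sit on a database backend."""

    def __init__(self):
        self._objects = {}

    def put(self, obj: MetadataObject) -> None:
        self._objects[obj.object_id] = obj

    def search(self, **criteria) -> list:
        """Return sorted ids of objects matching every given attribute value."""
        return sorted(
            oid for oid, obj in self._objects.items()
            if all(obj.attributes.get(key) == value
                   for key, value in criteria.items())
        )
```

For example, after storing two hypothetical hazard scenarios tagged with `region` and `magnitude` attributes, `store.search(region="Memphis")` returns the ids of every scenario tagged with that region, and adding `magnitude=7.7` narrows the result further.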
Cyberenvironments and Technologies Directorate Office
James Myers, Associate Director Randy Butler, Deputy Associate Director Ken Sartain, Project Manager Deanna Spivey, Administrative Support Marquita Miller, Administrative Support Andrea Fierro, Administrative Support Automated Learning Group Michael Welge, Division Director Kjellrun Olson, Project Manager Loretta Auvil Vered Goren Bruce Mather Barry Sanders Luigi Marini Bernie A’cs Eugene Grois Robert McGrath Andrew Shirk David Clutter Xavier Llora Greg Pape David Tcheng
Security Research Division Von Welch, Division Director Mark Marikos, Project Manager Rakesh Bobba Neil Gorsuch Joe Muggli Thomas Scavo William Yurcik Patrick Flanigan Himanshu Khurana Meenal Pant Adam Slagell Terry Fleury Patricia Kobel Kevin Price Jun Wang
Collaborative Technologies Division James Myers, Acting Division Director Peter Bajcsy Jeff Gaynor Joel Plutchak Inna Zharnitsky Steven Downey Rob Kooper Andrew Wadsworth Joseph Futrelle Chris Navarro Tim Wentling
Visualization and Experimental Technologies Division Donna Cox, Division Director Alex Betts Lorne Leonard Paul Rajlich Matthew Hall Stuart Levy David Semeraro Vlodoymyr Kindratenko Robert Patterson
Middleware Division Jay Alameda, Division Director William Baker Gregory Daues David Gehrig Stephen Pietrowicz James Basney Patrick Duda Shawn Hampton Albert Rossi Michael Bletzinger Mike Freemon Weddie Jackson
HDF Group Michael Folk, Division Director Albert Cheng James Laird John Mainzer Songyu Lu Barbara Jones Matthew Needham Elena Pourmal Binh-Minh Ribler Muqun Yang Pedro Nunes Quincey Koziol John Blinka Xiangchi Cao Frank Baker Vailin Choi Leon Arber
National Laboratory for Advanced Data Research
In FY2005, SDSC’s and NCSA’s Cyberinfrastructure Partnership formed the National Laboratory for Advanced Data Research (NLADR) to, in part, standardize the steps in the data collection management process and to create data management and scientific workflow algorithms that support the integration, search, analysis, visualization, and collaborative exploration analysis of data collections. Since a formal kickoff in January 2005, NLADR has developed the Common Data Services Architecture (CDSA), integrating four development tasks: data management and integration services, scientific workflows, embedded cyberinfrastructure services, and data mining services. This year, NLADR:
• Integrated SDSC’s Storage Resource Broker (SRB) and NCSA’s Hierarchical Data Format (HDF) systems to support fast, partial access to objects from very large files in SRB. A working prototype of the SRB-HDF5 data system became available in fall 2005.
• Released version 1.1 of Tupelo, a grid-enabled archiving system for scientific data and metadata.
• Began integrating data management, analysis, and visualization services by leveraging two well-known workflow systems, the cross-institution Kepler workflow engine and NCSA’s D2K system. Both are used by several NSF-supported projects.
• Deployed a suite of instrument-management Web services and an associated administration portal.
• Designed an open system architecture for several supported observing system projects.
• Classified several prominent educational document collections and developed software for the statistical analysis of these collections.
Strategic partnerships
Accomplishing the goals set forth above will require a broad range of technical expertise and experience; NCSA cannot build cyberenvironments alone. We must forge close partnerships with computer scientists and engineers to ensure that all of the expertise needed to create cyberenvironments is available for the projects that we undertake. NCSA must create distributed, yet efficient and effective, teams to develop the integrated, comprehensive software environments needed by the various scientific and engineering communities. Below we summarize the intent of some of our strategic partnerships.
Dan Reed, University of North Carolina–Chapel Hill
In recognition of the need for advanced cyberenvironments that provide diverse capabilities to scientists and engineers, the Renaissance Computing Institute (RENCI) is partnering with NCSA to deploy a computational biology infrastructure for use by scientists, researchers, educators, and students. This infrastructure will evolve into a sophisticated cyberenvironment for executing, monitoring, and analyzing biological applications, with access to federated and searchable distributed databases. To ensure that this cyberenvironment enables researchers and educators to interact using concepts and approaches familiar to their specific discipline, requirements and iterative feedback are being derived from several collaborative biological and biomedical projects, including a biomedical study of the cell lifecycle and its mechanisms for DNA repair, the Carolina Center for Exploratory Genetic Analysis, the Bioinformatics TeraGrid Science Gateway, and the North Carolina Bioportal. This partnership will enable NCSA to serve the future needs of the nation’s science communities by creating a scalable cyberinfrastructure that meets specific needs through customized portal interfaces.
Ian Foster, Argonne National Laboratory
The collaboration with Argonne National Laboratory is focused on improving security in the LSST project, especially user authorization. There is a substantial amount of overlap of LSST needs with the Earth System Grid (ESG) project’s deployment model. The goal is to leverage the architectural security model as well as many of the technologies developed for ESG by ANL.
Tom Finholt, University of Michigan
The collaboration with the University of Michigan has two goals. The first is to enhance synchronous long-distance communication by improving the experience of participating in videoconferences. One way to do so is to improve the usability of existing applications, such as the Access Grid. Another is to dramatically increase the resolution of video images so that remote participants appear more lifelike. The second goal is to make it easier for collaborators to simultaneously view large, high-resolution data visualizations. For example, in both the mesoscale weather and the environmental engineering and hydrology communities, it is common to view phenomena that occur on a large geographic scale, but representing these views on a conventional desktop display is awkward.
Ewa Deelman, University of Southern California
The collaboration with USC’s Information Sciences Institute focuses on support for the data management capabilities within the LEAD environment. Specifically, ISI is participating in application requirement evaluation, deployment, and customization of data management services within LEAD. ISI is providing deployment and evaluation support for the use of the Globus data management tools, such as the Globus Replica Location Service, which is used for registration and location of data replicas in a distributed environment, and the Metadata Catalog Service, which associates descriptive metadata attributes with logical data items.
Miron Livny, University of Wisconsin
The goal of the collaboration with the University of Wisconsin is to develop a capability to target resources at NCSA to satisfy opportunistic computing needs. In particular, work is focused on support for “on the fly” deployment, removal, and eventually migration of all Condor services. With the newly developed capabilities of Condor-C, this will allow NCSA to automatically turn an allocated collection of resources into a well-managed computing environment governed by user-defined policies. Once these resources are revoked, NCSA should be able to migrate the environment to another collection of resources.
Innovative Systems for Science and Engineering
Petascale computing is now a realizable goal that will impact all of science and engineering, not just those applications requiring the highest capability. But the optimum pathway to petascale science and engineering—realizing the full potential of petascale computers to drive science and engineering—is unclear. Future computers cannot rely on continuing increases in clock speed to drive performance increases—heat dissipation problems will limit these increases. Instead, tomorrow’s computing systems will include processors with multiple “processor cores” on each chip, special application accelerators, or reprogrammable logic devices. In addition, all of these types of processors may be included in a single system, interconnected by a high-performance communications fabric. Individual processors may even have heterogeneous “processor cores” in the fashion of the new Cell processor from IBM, Sony, and Toshiba. These technologies have the potential to dramatically increase the fidelity and range of computational simulations as well as the scope and responsiveness of data mining, analysis, and visualization applications. However, they also pose significant technical problems that must be addressed before their full potential can be realized.
Emerging computing technologies
Reconfigurable computing systems are a highly cost- and power-effective computing technology for specific classes of applications. Reconfigurable systems are now riding the silicon technology curve, and it has been estimated that it may be possible to achieve a petaflop with as few as 4,000 FPGA devices [4]. Yet there remain many unanswered questions concerning both the architecture and the range of applicability of reconfigurable systems.
Field-programmable gate arrays (FPGAs)
FPGAs have proven to be very effective for integer and fixed-point intensive applications (such as digital signal processing). They will soon be effective for a wide range of single-precision floating point intensive applications, and they will be effective in the longer term for double-precision floating point intensive applications. The central difficulty in using this technology is that algorithms and applications need to be rewritten and molded to fit the FPGA system architecture. A number of projects were undertaken in FY2005:
• MATPHOT. With Kitt Peak National Observatory’s K. Mighell, ISL staff helped improve the performance of the MATPHOT code by a factor of 8½. Further improvements would allow images from orbiting telescopes to be filtered and cleaned before they are sent to earth-based stations. This would remove the need for costly intermediate image storage at the stations while images await processing.
• NAMD. With J. Phillips from UIUC’s Theoretical and Computational Biophysics Group, ISL staff successfully implemented the NAMD code, with its benchmark data set of 100,000 atoms, on an FPGA system from SRC Computers. This implementation achieved a performance improvement of a factor of 3 with a CPU-FPGA combination. This improvement is significant since the FPGA implementation is compared against a heavily optimized CPU-only code.
Working with technical staff from SRC Computers, ISL staff modeled an idea to improve NAMD performance by slightly modifying the SRC system architecture. The model indicates that a performance improvement of as much as 50x is possible for NAMD. The modified system will be tested in FY2006.
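As a rough check on the petaflop estimate quoted earlier in this section, the per-device throughput it implies follows from simple division. The sketch below assumes that a petaflop means 10^15 floating-point operations per second sustained across all devices and ignores communication overhead; the function name is illustrative only and does not come from the cited source.

```python
PETAFLOP = 1e15  # floating-point operations per second (assumed definition)


def required_gflops_per_device(total_flops: float, device_count: int) -> float:
    """Return the throughput each device must sustain, in gigaflops."""
    if device_count <= 0:
        raise ValueError("device_count must be positive")
    return total_flops / device_count / 1e9


# 4,000 devices is the figure from the estimate cited in the text.
per_device = required_gflops_per_device(PETAFLOP, 4_000)
print(f"{per_device:.0f} gigaflops per FPGA device")
```

Under these assumptions each device would need to sustain about 250 gigaflops, which gives a sense of the per-device performance the estimate presumes.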
Innovative Systems Laboratory
To address the issues confronting petascale computing, NCSA created the Innovative Systems Laboratory (ISL) in FY2005. The ISL will allow NCSA and its collaborators to thoroughly test and evaluate the performance of new computing technologies for key scientific and engineering applications and, thus, will help define the best path to petascale science and engineering. Initial activities are focused on high-end computing platforms, as the underlying technology will be rapidly evolving over the next few years. However, other elements of the cyberinfrastructure that limit the performance of scientific and engineering applications will also be considered. In addition to allowing the scientific and engineering community to thoroughly evaluate new elements of the national cyberinfrastructure before major investments are made, the ISL also allows the computer industry and university and laboratory research groups to expose new technologies to specific scientific and engineering application drivers. This will aid in quantifying the value of these new technologies. The knowledge gained from access to new computing technologies and systems will also greatly facilitate their integration into future computing systems and production computing environments. The ISL is testing and evaluating a set of highly promising technologies, with a focus in FY2005 on reconfigurable computing using field-programmable gate arrays (FPGAs) from several vendors. In FY2006, these activities will be expanded to include accelerator technologies from ClearSpeed and a Sony-Toshiba-IBM Cell processor-based system from IBM, as well as the development of network and interconnect performance monitoring and measurement software to support ultrascale MPI applications and multi-protocol use of high-performance interconnects.
Table 5. FPGA systems and tools under evaluation at NCSA.
Equipment | Tools | Status
Cray XD1 | DSPlogic | Evaluated, terminated
Nallatech | Mitrion-C, Dime-C | Reserved for future work
SGI Altix 350/Athena | Mitrion-C | Applications under way
SRC MAPstation | Native ANSI C, FORTRAN | Applications under way
Starbridge | Viva | Considered, not evaluated
Stone Ridge Technology | Frontier, SCC | Not released yet
[4] SRC Computers, Inc., “Petaflops computing using direct execution logic,” April 27, 2004. See: http://www.srcomputer.com
Networking, interconnects, and systems software
The complexity of computing systems and the sophistication of algorithms tuned for them continue to grow, a trend that is accelerated by the new systems being evaluated and by the requirements of the TeraGrid. Networks, and particularly lambdas, will need to be understood and integrated as allocated resources. The networking focus in ISL has been directed along two paths:
• Benchmarking, measurement, and instrumentation.
• Scheduling and allocation of networks as resources for the HPC community.
Some of these activities are described below.
• AMINA and Datalines. In the domain of measurement and instrumentation, the AMINA measurement project was implemented for TeraGrid to provide baseline throughput and round-trip-time data among a set of non-compute resource hosts in order to help identify and separate problems as either network- or system-centric. Datalines was created to help use measurement data more effectively by collecting and organizing it within a generalized framework.
• MPICH2-PUMA. Since June 2005, the ISL team has created a version of MPICH2-PUMA that fully incorporates MPI 1.2 and includes the most immediate features of MPI-2, such as Remote Memory Access operations. A highly tuned MPI-2 implementation for InfiniBand is under way within the MPICH2-PUMA project. This MPI research platform will be used for investigating a modular plug-and-play approach to network abstraction and internal MPI implementation components, performance tradeoffs for threaded MPI implementations, leveraging Intel hyperthreading technology within MPI stacks, instrumenting the MPI communication stack and applications to detect inefficiencies in their implementations, and simulating any application on any hypothetical network fabric.
• High-availability Linux. As part of a multi-year collaboration with IBM’s Linux Technology Center, members of the ISL enhanced High-availability Linux (Linux-HA) in several ways.
The team added a message layer to support multiple types of messages and compression. The team also improved the communication layer to support flow control and OpenAIS (an open-source implementation of the SA Forum Application Interface Specification). The ability to dynamically add and delete nodes on a cluster was also added to one of Linux-HA’s core components, the “heartbeat” program. A logging daemon was also developed and tested.
• SciDAC. A multi-institution, multidisciplinary group of experts from around the country, working as a single team, is developing an integrated suite of machine-independent, scalable systems software components for the U.S. Department of Energy’s Scientific Discovery through Advanced Computing (SciDAC) initiative. In 2005, members of NCSA’s ISL focused on scalability and functionality improvements to a system monitoring reference implementation called warehouse. Advanced protocols for communication minimization were incorporated, and event registration was added via the SSS infrastructure.
Outreach
• The ISL and its collaborators in the University of Illinois at Urbana-Champaign’s Theoretical and Computational Biophysics Group demonstrated an MPICH2-PUMA-enabled version of the NAMD biomolecular simulation code at SC05 in Seattle in November 2005.
• Given the challenges inherent in learning to program a new architecture, NCSA has been heavily involved in key efforts to disseminate techniques and best practices in reconfigurable computing, including co-hosting the first Reconfigurable Systems Summer Institute at NCSA in July 2005 (with the Ohio Supercomputer Center) and a very well attended tutorial at SC05.
Innovative Systems Laboratory Directorate Office
Robert Pennington, Associate Director Ken Sartain, Project Manager Andrea Fierro, Administrative Support
Emerging Technologies Division David Pointer, Division Director Craig Steffen David Raila Volodymyr Kindratenko
Networking & Interconnect Technologies Division Anthony Rimovsky, Joint Division Director Michael Showerman, Joint Division Director Jon Dugan Michael Haberman James Ferguson John Estabrook Michael Kutzko Avneesh Pant Jeremy Enos Hassan Jafri Guochun Shi David Norris Jason Brechin
Cyberapplications and Cybercommunities
The Cyberapplications and Communities (CAC) Directorate is NCSA’s window to the larger science and engineering communities, as well as a window to arts and humanities communities. Many of the staff in this directorate are active researchers who have firsthand knowledge of the challenges faced by their discipline. Many have joint appointments on the UIUC faculty. The primary responsibility of this directorate is to provide community input into:
• Development of cyberenvironments.
• Acquisition of cyber-resources.
• Assessment of innovative systems.
These activities were described in the chapters on cyberenvironments, cyber-resources, and innovative systems. Here, we focus on the other activities in this directorate that impact the mission of NCSA.
Astronomy
Laboratory for Cosmological Data Mining (LCDM)
Robert Brunner, NCSA and Department of Astronomy
The current theoretical picture is that the universe began in a Big Bang and, after an early period of exponential growth dubbed inflation, has settled down into the ongoing process of structure formation. This is not the whole picture, however, as the effects of gravity indicate that we do not see everything: there is “dark matter” that produces no visible light but exerts a gravitational force on ordinary (that is, visible) matter. In addition, recent measurements of distant supernovae indicate that the expansion of the universe is actually accelerating. This requires an extra energy component, known as “dark energy” because its exact composition, as is also the case with dark matter, is unknown. To tackle the challenges of cosmology, different approaches have been developed, including precise measurements of the cosmic background radiation, studying galaxy clusters, and understanding quasars (supermassive black holes) and their relation to large-scale structure in the universe. All of these approaches, however, require analysis of large quantities (terabytes and beyond) of data that are highly dimensional and complex. This type of analysis requires the development and application of new computational tools. The Laboratory for Cosmological Data Mining was created at NCSA in response to this challenge. The LCDM is collecting and archiving new and existing astronomical datasets, including data from the Sloan Digital Sky Survey and synoptic data from the Palomar-QUEST survey.
To answer these cosmological questions, LCDM is: developing, applying, and deploying data mining algorithms and technology to the wealth of available astronomical data in order to understand our universe; exploring new computational technologies in partnership with the Innovative Systems Laboratory at NCSA in order to overcome existing computational limitations; and educating others in the techniques we are developing to manage, move, analyze, and explore massive datasets on cutting-edge computational systems.
Community engagement
In developing cyberenvironments, we are also developing a process for engaging the scientific and engineering communities that they will serve. This process includes joint activities in the design, development, and deployment of cyberinfrastructure as well as the development of new concepts to integrate the various elements of the infrastructure. The goal is to transform communities’ requirements into explicit architecture specifications—to discover the details of their research workflows and to develop plans for building the technology that those workflows require. One size does not fit all, and one cyberenvironment cannot serve all science and engineering domains, even if many of the elements of the cyberenvironments are common. The community engagement process is iterative—a dialogue, not a monologue. We must obtain a clear understanding of the research that the communities need to accomplish, and they must obtain a clear understanding of what the technology can and cannot do. Help from our partners at the University of Michigan (T. Finholt, School of Information) and the University of Illinois at Urbana-Champaign (N. Contractor) is essential in formulating and implementing the engagement process. In choosing which communities to work with first, we consider issues like the degree to which the community is ready to engage in the process, the potential of the effort both for success and for applicability outside of the community, the significance of the research that the engagement will enable, and how well our expertise matches their needs. Cyberenvironments born of these collaborations will be more than just the integration of existing tools—powerful as that function alone might be. These cyberenvironments will provide new capabilities, some not even envisioned when the process began. Some elements of cyberenvironments that arise from this process will be specific to a particular community, satisfying needs that only they have. 
Other elements may be shared across communities and become part of a robust, general shared cyberinfrastructure. Identifying the latter elements is of great importance, for they need only be implemented and supported once and can be re-used many times thereafter.
Research and development activities
CAC staff who hold joint faculty appointments are often involved in basic research activities. Their participation in these activities helps link NCSA’s cyberinfrastructure efforts with the needs of working scientists and engineers.
FLASH physics modules and cybercommunity development
Paul Ricker, NCSA and Department of Astronomy
FLASH is an adaptive mesh refinement (AMR) hydrodynamics plus N-body code for astrophysical problems originally developed by the DOE’s ASC Center for Astrophysical Thermonuclear Flashes at the University of Chicago. It is freely available and has at least 200 users worldwide, not only in astrophysics but also in areas such as plasma physics and computational fluid dynamics. It was originally developed for simulating Type Ia supernovae. DOE and the University of Chicago have committed to providing staff to support FLASH as a community code after the Flash Center’s funding expires in 2007. The Flash Center’s resources are directed toward developing and maintaining framework and physics modules important to the Center’s scientific mission. However, much of the code’s general usefulness to the astrophysical community comes about through its N-body solver, Poisson solver, radiative cooling module, etc., which NCSA is developing. Ricker’s group also tests and tunes FLASH for the high-performance computing platforms at NCSA, a process that has led to the discovery and repair of bugs in HDF5, Lustre, and FLASH itself, among others. During the past year, NCSA staff:
• Verified and documented FLASH’s performance on cosmological test problems.
• Derived and documented refinement criteria for the N-body component of cosmological AMR simulations.
• Completed and documented the multimetallicity equilibrium radiative cooling module for FLASH.
• Completed the basic star formation and stellar feedback subgrid module for FLASH.
• Developed and validated a protocol for determining the protonation state of proteins.
• Investigated multiscale methods for simulating and analyzing domain formation in membranes.
Environmental science and engineering
Coalition for creation of a Collaborative Large-Scale Engineering Analysis Network for Environmental Research (CLEANER) Project Office
Barbara Minsker, NCSA and Department of Civil and Environmental Engineering
The goal of the CLEANER project is to advance the scientific and engineering knowledge base needed to address the challenges of complex, large-scale, human-stressed environmental systems, such as managing and protecting our nation’s water supplies, restoring altered ecosystems, preserving endangered species, and tracking harmful agents. The vision for CLEANER includes multiple distributed, networked sites where sensors and instruments will gather data, as well as cyberinfrastructure for sharing, storing, managing, analyzing, mining, visualizing, and drawing fresh insights from that data. The CLEANER Project Office will have a broad impact in three areas: education, community outreach, and forecasting infrastructure needs. Specifically, the CLEANER Project Office will (i) develop a more diverse undergraduate and graduate student population that is educated in environmental cyberengineering, (ii) offer outreach services to the communities in order to transfer the technology to decision makers, community leaders, and students, and (iii) provide powerful diagnostic tools for understanding changes in the environment. The Project Office is developing the research plan using six planning committees and an executive and advisory committee. In the past year, the CLEANER team worked on:
• Establishment of the project office and hiring of an executive director.
• Establishment of advisory and planning committee membership.
• The first All-Hands Meeting to set committee deliverables and due dates.
Computational materials science
David Ceperley, NCSA and Department of Physics, and Duane Johnson, Department of Materials Science and Engineering Computational materials scientists (CMS) are changing the way they analyze, understand, and predict the properties of materials. With computational materials science rapidly evolving, it is critical to develop cyberinfrastructure in conjunction with new theoretical developments, modern computer science approaches, and new experimental capabilities. To accomplish these goals, there must be a coordinated effort involving education of future computational materials scientists, knowledge-transfer activities associated with active research, networking of researchers and students with the active worldwide community, and creation and distribution of useful tools for research and applications to challenging problems. The CMS community is ripe for a large impact in the areas of community code development, the key bottleneck to progress in the field. Multiscale computations, in particular, require resources beyond what a single university group can muster. The computational materials science group at NCSA works closely with the Materials Computation Center (MCC) at the Materials Research Laboratory to foster code development and use of high-end computing facilities. This comes through a series of training events, schools, and workshops as well as work with other NCSA staff in performance and code optimization, grid utilization, portal development, and showcasing new hardware. The group fosters creation of a venue where the community gathers and discusses progress and algorithms, interacts with funding sources, and receives training and consultation. During the past year, the MCC has established itself as a national center with strong interdisciplinary links, a community code archive, ties to corresponding European centers, a national workshop series now in its 15th year, and a summer school series now in its sixth year. 
NSF-DMR picked the MCC to organize and host the last two bi-annual DMR-ITR program reviews (http://www.mcc.uiuc.edu/nsfitr04rev/), an international travel program in CMS, and a town hall meeting (http://www.mcc.uiuc.edu/nsfhpc/) to discuss future requirements for NSF computer acquisition. Ceperley’s group has developed simulation codes (ohmms and qmcPlusPlus) for classical and quantum materials problems. They have been applied to study properties of metal and semiconductor quantum dots. We developed interfaces for qmcPlusPlus to GEMSTONE for “grid-enabled nanoscale science through online networked environments” and developed other utilities to facilitate integration of qmcPlusPlus with workflow and portal services. We also released a maintained and extended version of dataspork, packaged as a Web application.
Nanoscale science and engineering
NCSA is increasing its involvement in nanoscience efforts that require new simulation and analysis approaches in the context of cyberenvironments and high-performance computing. Areas include molecular science and engineering, materials science and engineering, molecular biology, and nanoscale science and engineering.
Multiscale bio/nanoscience and engineering
Eric Jakobsson, NCSA and Department of Molecular and Integrative Physiology
This project is working toward efficient multiscale simulation and analysis of biological and biomimetic transport processes for biological discovery and nanoscale device design. The scales range from electronic structure to cellular-level biosystems and synthetic biomimetic systems. The potential community impact is very large, encompassing workers in computational molecular science, nanofabrication, and all aspects of nanotechnology and nanodevices and nanoscale device design for medicine and for energy production. Presently funded work involves 13 investigators from 10 institutions. A proposal in preparation, to employ computational methods in simulation and analysis of proton transport in fuel cells, would involve collaborations with an additional five investigators. During the past year Jakobsson’s group:
• Secured funding for the NIH National Center for the Design of Biomimetic Nanoconductors.
• Developed a prototype Web interface for the open-source molecular dynamics program Gromacs.
• Developed a more efficient and robust numerical algorithm for stochastic dynamics.
Biological, biomedical, and medical science
The first year of the program in medical informatics is focusing on three areas: infectious epidemiology, electronic health records (EHR), and proteomics.
• In infectious epidemiology, the long-term goal will be to build cyberinfrastructure for both research and the early detection of outbreaks. This year, NCSA began to form partnerships with all sectors of the healthcare market, including public and private hospitals, regional clinics, and public health organizations.
• The long-term goal in EHRs will be to provide the cyberinfrastructure to enable safe and efficient health care. Initial efforts are focused on understanding the existing commercial systems and providing cross-system extensions that enhance patient safety.
• In proteomics, NCSA will work toward developing the cyberinfrastructure to enable the research community to answer proteomic-related questions quickly and efficiently. Initially this will involve working with state-of-the-art research groups to enhance their current capabilities and to design the next-generation research environment.
Arts, humanities, and social sciences
Center for Computing in Humanities, Arts, and Social Science (CHASS)
Vernon Burton, NCSA and Department of History NCSA is involved in the creation of the Center for Computing in Humanities, Arts, and Social Science (CHASS) at the University of Illinois at Urbana-Champaign (UIUC). The mission of CHASS is to enable innovative solutions to long-standing problems in the arts, humanities, and social sciences by fostering sustained collaborations between humanists, artists, and social scientists and their colleagues in computer science, engineering, and high-performance computing and communications. The goal is to identify, adapt or create, and deploy computational tools that will accelerate research and education in the humanities, arts, and social sciences. The emphasis will be on leveraging innovative opportunities for technology transfer among and between the sciences and engineering on the one hand and humanities, arts, and social science on the other. The center will help to ensure that disciplines that have traditionally not embraced computing and information technologies and digital tools will not be left behind in the rapidly evolving digital world and to contribute to democracy in a technological world by freeing information from the dark corners of archives and opening it up for all citizens. FY2005 was a planning year for CHASS, with efforts focused on two areas: identifying the needs for digital tools and high-performance computing and introducing faculty to computing-related research opportunities and to the concept of the center and what can be achieved through it. To achieve the first, we established an informal working group of interested UIUC faculty who are providing advice on current specific technological and computing requirements of their disciplines, as well as the needs they anticipate for the future. Two events were held to provide an opportunity for faculty members to meet and discuss their research needs. 
The first was an invitation-only event that had many more attendees than were expected; the second was a “town-hall” meeting open to anyone from the UIUC community, which also had participation well beyond expectations. Input from these meetings is being analyzed along with results from an online survey available from the CHASS Website to help determine the CHASS areas of focus. A CHASS Virtual Institute has been established with EPIC and monthly meetings relating to CHASS are being held over the Access Grid with participants from across the UIUC community.
Science of Networks in Communities (SONIC)
Noshir Contractor, NCSA and Department of Speech Communication
The goal of the Science of Networks in Communities (SONIC) research group at NCSA is to identify and design tools that address the sociotechnical networking needs of diverse research, practitioner, and educational communities. It meets NCSA’s commitment to develop, deploy, and support robust cyberenvironments for the nation’s scientists and engineers by effectively empowering members to leverage the specific social and knowledge networking assets of their entire community.
SONIC is developing cutting-edge techniques to measure, mobilize, and modify the social and knowledge networks of communities. The target communities with which SONIC is engaged include the public health community, emergency response community, social network research community, group interaction research community, and the education and outreach community. In all of these cases, SONIC uses a comprehensive analytic methodology to computationally model, empirically assess, statistically validate, and iteratively influence the emergence of social and knowledge networks within communities. Understanding and assisting in the development of communities in this way will enable NCSA to design cyberenvironments that match each community’s unique attributes and network referral needs, facilitating collaborative work and propelling researchers to fresh insights and innovations. SONIC:
• Engages with communities that NCSA supports, providing them with network asset maps, metrics, and iterative design strategies that allow them to assess their “cyberinfrastructure readiness” and leverage cyberinfrastructure for their growth.
• Invests in the development of a computationally intensive methodology (based on semantic Web/grid services) that focuses on both engaging with communities and advancing the basic science of networks in communities.
• Designs, develops, and deploys a suite of Web-based network referral tools called CI-KNOW (CyberInfrastructure Knowledge Networks on the Web) to enable global communities.
In addition to advancing the science of networks and providing specific recommendations for the design of cyberinfrastructure and cyberenvironments, SONIC feeds into efforts to design collaboration tools that leverage the knowledge and social networks within cybercommunities. SONIC helps communities discover their existing communication and knowledge networks, diagnose the network’s health (identifying its absorptive capacity, robustness, connectedness, vulnerability, bottlenecks, etc.), and design the social incentives and technical cyberinfrastructure for networks to function with optimal effectiveness. SONIC has already made significant progress in the following focus areas:
• Development of the CI-KNOW core suite of tools.
• CI-KNOW implementation in CLEANER.
• CI-KNOW implementation in EPIC.
• CI-KNOW implementation in TSEEN (Tobacco Surveillance, Epidemiology, and Evaluation Network) and cyberinfrastructure specification for Tobacco Grid (toBIG).
• Cyberinfrastructure specification for SNAC (social network analysis community).
• Cyberinfrastructure specification for GroupScope.
• Advanced computational methodology for mapping Katrina’s emergency multi-organizational networks (EMONs) in real time.
Outreach and educational activities
Edee Norman Wiezecki, NCSA
Few of the advantages of the national cyberinfrastructure will be realized without scientists, engineers, humanists, and artists who understand the new capabilities that it provides. NCSA education and development staff are engaging with faculty and staff at UIUC and other universities to discuss how to bring cyberenvironments and cyberservices into the classroom. The emphasis is on undergraduate education, both through the development of educational cybertools and through curriculum improvement. This effort involves regional and national partners. These partners, along with UIUC and NCSA, will help ensure that the benefits of the national cyberinfrastructure are made available to educators and students throughout the country.
Educational cyberservices (http://education.ncsa.uiuc.edu/) developed at NCSA during the past few years include NCSA Databridge 2.0; NCSA EasyViz; a version of the open-source course management tool, Moodle in a Box, to better meet the needs of the education community; and the Biodiversity Workbench (http://revitalise.ncsa.uiuc.edu/biology), a bioinformatics tool for education environments that runs on top of the current version of the Biology Workbench. NCSA’s engagement with the national community is exemplified by its involvement in EPIC. NCSA developed and maintains the EPIC Website and the Virtual Institute Moodle course collaboration tool for all seven EPIC VIs. NCSA is helping to develop CI-KNOW with NCSA’s Science of Networks in Communities (SONIC) group for use in metrics gathering and to show the impact of collaborations among partners and people. It is also developing tutorials and lessons for the education community for NCSA Databridge and NCSA EasyViz and developing Web services for data analysis and visualization for education communities. Other education efforts at NCSA include involvement in the LEAD educational effort as well as in LSST education and public outreach. NCSA is involved in reviewing cyberinfrastructure for elementary, secondary, and informal education for NSF. NCSA is also working with several departments at UIUC to develop an IT minor integrated across campus. The education effort at NCSA supports under-represented groups in science, technology, engineering, and mathematics (STEM) fields through our projects and programs, including Research Experiences for Undergraduates, a program with high minority participation (in summer 2005, the six undergraduate interns were one East Indian man and five women: two African-American, two Hispanic, and one white). NCSA organized the NSF Broadening Participation in Computing informational meeting, April 12-14, 2005, in Baltimore, Maryland. 
There were 80 faculty members in attendance from all over the country. NCSA Education staff continue to address pipeline issues through the Girls Engaged in Mathematics and Science program (30 middle school girls; 43 percent minorities with African-American and Hispanic background) and to help reach rural school teachers through the REVITALISE program. Also, NCSA is actively participating in a program on retention of female students in engineering and computational science.
Administration
The Administration Directorate consists of the following divisions: human resources, finance, public affairs, management information systems, and facilities. Directorate leaders work closely with the Director and Executive Director to manage and improve day-to-day operations of the center. Recently this has included: the creation of centerwide software tools for implementing and tracking human resources transactions and personnel data; creation of a long-term staff development program; deployment of NCSA MIS tools in the College of Business, School of Chemical Sciences, Computer Science Department, and University Administration; a new SC booth structure; design, development, and launch of a new Website to market the center’s strategic plan and new technology developments; and moving staff into a new building. New projects in 2006 include: the development of a three-year staffing plan; full implementation of the campus accounting system; development and implementation of a centerwide marketing plan; and continued software development, including a robust projects database and quarterly status reporting tool. The directorate is also responsible for developing and implementing NCSA policy and ensures that the center adheres to University of Illinois’ policies and procedures. Directorate leaders work closely with various campus administrative offices including the Provost’s Office, Office for the Vice Chancellor of Research, Academic Human Resources, Personnel Services Office, Office of Equal Opportunity and Access, International Faculty and Staff Affairs, and academic departments.
Finance Division
The Finance Division is responsible for the following functions in support of NCSA: budgeting and financial modeling, accounting, purchasing, proposal submission, business records, auditing and reporting, and electronic business systems implementation.
Public Affairs Division
NCSA’s Public Affairs Division is responsible for the center’s communications materials, Website, and media relations efforts. It also supports several programs that NCSA administers or is a partner in, including TRECC, NCASSR, the NSF Middleware Initiative, and the TeraGrid.
Cyberapplications and Communities Directorate Office
Bob Wilhelmson, Associate Director
Jay Roloff, Senior Project Manager
Umesh Thakkar, Senior Research Scientist
Jean Soliday, Administrative Support
Arts, Humanities, and Social Science Division: Simon Appleford, Vernon Burton, Noshir Contractor, Roberto Dandi, Hank Green, Sean Mason
Biological and Biomedical Science Division: Ian Brooks, Dairui Chen, See-Wing Chiu, Eric Jakobsson, Robert Mashl
Education Division: Pam Joop, Edee Norman Wizieki
Astronomy Division: Nicholas Ball, Cristina Beldica, Robert Brunner, Richard Crutcher, Alessandro Gardini, Athol Kemball, Ray Plante, Harold Ravlin, Paul Ricker, Brian Wilhite, Lisa Xu, Ramon Williamson
Environmental Science and Engineering Division: Rick Kufrin, Yong Liu, Luigi Marini, Barbara Minsker, Mark Straka
Nanoscience and Engineering Division: David Ceperley, Jeongnim Kim
In FY2005 the Public Affairs Division produced: three issues of Access magazine; continuous content-related and technical updates to the NCSA Website; media relations with both specialty and general media, including coverage in the Chicago Tribune, CNN.com, Crain’s Chicago Business, Information Week, Discover, Federal Computing Week, MIT’s Technology Review, the News-Gazette, and the AP Wire; brochures, flyers, and other materials for NSF Middleware Initiative, Private Sector Program, National Archives and Records Administration, and Reconfigurable Systems Summer Institute; more than 25 one-page impact statements for NSF; internal communications pieces like Headline News, data link, and newsletters for the GRIDS Center and TRECC; twice monthly or monthly updates to the TRECC and NCASSR Websites, as well as day-to-day updates and Web design; construction and technical maintenance of new Websites for NCSA-related programs and groups; video and online multimedia projects; booth, signage, collateral, and demos for the annual SC conference; art and design for the new building; and support for NCSA-sponsored events and programs, including Website development, Web form development, logo development, signage, planning guidance, and printed collateral development.
Other Major Projects
NCSA received support for several additional major projects in FY2005. These projects are briefly described below.
Black Holes: The Other Side of Infinity
Other Participants: Harvard University, Sonoma State University, University of California, San Diego, University of Colorado, WGBH-TV, and Thomas Lucas Productions, Inc. NCSA Team: Alex Betts, Donna Cox, Matt Hall, Lorne Leonard, Stuart Levy, and Bob Patterson Funding Agency: NSF, NASA NCSA contributes significantly to science education and outreach by partnering with professionals and educators to develop high-quality science education video productions. NCSA’s visualization group creates high-fidelity, high-resolution data-driven visualizations of NSF peer-reviewed science in support of educational narratives for the general public. These educational productions are viewed by thousands of planetarium visitors and millions of television viewers. The latest project involves astrophysical simulations to explain the origin and development of supermassive black holes. The Informal Science Division of NSF funded NCSA and partners to produce a large-dome digital planetarium show and a one-hour HDTV episode of the PBS TV series NOVA. This project was initiated with seed money from NASA. In FY2005, the NCSA visualization team collaborated with Thomas Lucas Productions, Inc., WGBH-TV, and astrophysics experts from the University of California, San Diego (M. Norman), Sonoma State University (L. R. Cominsky), Harvard University (L. Hernquist), and the University of Colorado (A. J. S. Hamilton) to explore and visualize a recent chain of discoveries about the black hole at the center of our galaxy. “Black Holes: The Other Side of Infinity,” a digital planetarium show, debuted in February 2006 at the Denver Museum of Nature and Science (DMNS)
Human Resources Division
The Human Resources Division oversees the hiring of new employees, administration of NCSA staff, employee activities, campus committee work, special projects, employee relations, payroll, Banner reporting and all questions related to employment at NCSA and the University. Human Resources also handles internal training activities as well as the coordination of the salary merit program, performance evaluations, and maintenance of the Human Resources Management Information System tools.
Management Information System Division
In August 2003, NCSA formed a Management Information System (MIS) Division whose mission is to provide accurate and timely business information to managers to help them make informed decisions. MIS is working to create, develop, and deploy a software suite called NCSA Autonomy™ that focuses on human resources and finance applications that are used by managers and project managers to effectively operate the enterprise. The Autonomy decision support software is capable of feeding and representing Banner data and interacts with the UIUC Data Warehouse. The system helps NCSA managers make day-to-day decisions about their staffing and funding and provides critical information flow between NCSA managers, staff, and the Human Resources and Finance divisions. MIS provides staff at all levels of NCSA with current information about the state of the center. In addition, MIS will assist with tracking and rebuilding processes within NCSA to make them easier for employees and managers. Through analysis and testing, the MIS group improves the flow of business processes and provides an easier environment for NCSA managers and the administration. The MIS group also collects information on the matters most important to NCSA leaders, so they can better provide reporting for the center.
Administration Directorate Office
John R. Melchi, Associate Director
Debbie Shirley, Administrative Support
Facilities Division: Tedra Tuttle, Division Director; Thomas Hurst, Mark Washburn
Finance Division: Michael Rudzinski, Division Director; Karen Cromwell, Annette Felkner, Jeffrey Gaede, Cindy Garrett, Taryn Kelly, Laura Osterbur, Patty Roth, Edward Young, Cindy Wall
Facilities Management Division
The Facilities Management Division is responsible for maintaining essential day-to-day operations of NCSA facilities. This includes building maintenance, physical security, equipment inventory, shipping, receiving, mail distribution, office and building moves, key and access distribution, RMAs, receiving reports for vendor payment, special errands, center supplies, and van usage. The division responds to other needs of NCSA staff when requested.
Human Resources Division: Jonathan Howell, Division Director; Amy Dillman, Candy Edwards, Sheryl Reeder
Management Information Systems Division: Douglas Fein, Division Director; John Gabel, Mary Winters-Meyer, Douglas Yee
Public Affairs Division: Bill Bell, Division Director; Tricia Barker, Steven Kleinvehn, Timothy Dudek, Kathleen Ricker, Blake Harvey, Michael Harden Jr.
and will be distributed globally to other planetariums and theaters by Spitz, Inc. The state-of-the-art digital dome at DMNS has 11 footprints at 1,280 x 1,024 for each footprint. Each frame of the digital movie was 14.4 million pixels. NCSA created 11 scientific visualizations for the large-dome planetarium show. Michael Norman, UCSD, computed a simulation of the first star forming in the early universe, going supernova, and forming a black hole. This simulation was computed and stored on the NCSA Altix supercomputer, called Cobalt. The NCSA visualization team created new adaptive mesh refinement (AMR) visualization techniques to render Norman’s scientific simulations. These AMR visualizations reveal 23 levels of nested grid refinement in a seamless transition in ultra-high resolution. An important goal for this project is to develop software to scale the high-resolution visualizations for distribution to a variety of smaller venues. NCSA scaled the hemispherical digital images to 1,920 x 1,080 high-definition format. These visualizations were re-purposed for the NOVA program “Monster of the Milky Way.” The team is developing additional visualizations for this HDTV program that will air in fall 2006. These include the formation of the Milky Way as simulated by Harvard computational scientists.
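The frame size quoted above follows from the footprint figures. The short Python check below reproduces the arithmetic; the variable names are illustrative only, and the calculation assumes the 11 footprints are simply summed, with no allowance for overlap between them.

```python
# Illustrative arithmetic only; names are not taken from the report.
FOOTPRINTS = 11                 # footprints in the DMNS digital dome
WIDTH, HEIGHT = 1280, 1024      # pixels per footprint

pixels_per_footprint = WIDTH * HEIGHT
pixels_per_frame = FOOTPRINTS * pixels_per_footprint

# High-definition format used for the NOVA program
hd_pixels = 1920 * 1080

print(pixels_per_frame)                         # 14417920, about 14.4 million
print(round(pixels_per_frame / hd_pixels, 2))   # 6.95 HD frames per dome frame
```

On these figures a single dome frame carries roughly seven times the pixels of one high-definition frame, which gives a sense of why scaling software was a project goal.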
Disability Research Institute Projects Ticket to Work Project
Other Participants: College of Applied Life Studies, University of Illinois at Urbana-Champaign NCSA Team: Loretta Auvil, Bruce Mather, Greg Pape, Thomas Prudhomme, Barry Sanders, David Tcheng, and Michael Welge Funding Agency: Social Security Administration The Ticket to Work program is a key strategic element in the Social Security Administration’s Disability Insurance Program (SSDI). Participation rates in the Ticket Program have been low, and a reliable assessment of which beneficiaries currently in the system have a higher relative likelihood of participating in the program is essential to the management, long-term success, and evolution of the Ticket Program. This project was a partnership with the UIUC Disability Research Institute (Tanya Gallagher and Judee Richardson), and examined the behavior of beneficiaries who have initiated their participation in the Ticket Program by assigning the Ticket they received in the mail to one of the Employment Networks established to assist them in finding work. The overall objective was to statistically profile the attributes of beneficiaries who used the Ticket, then to model the relationships among beneficiary attributes and participation. Based on these predictive relationships, an assessment can be made of any individual’s likelihood of participation in the Ticket Program, and the beneficiary attribute profiles obtained can assist SSA in estimating the proportion of their entire beneficiary population that might use the Ticket Program as part of a return to work strategy. Predictive modeling was a primary focus of this project and was conducted using NCSA’s Data to Knowledge (D2K) environment. D2K integrates effective analytical data mining methods for prediction, discovery, and anomaly detection with data management, data transformation, and information visualization tools. 
Predictive modeling approaches used in the Ticket to Work project included decision trees, decision tree forests, naive Bayesian analysis, step-up naive Bayesian analysis, and stepwise multiple regression.
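As a rough illustration of one of the approaches listed above, the following sketch implements a small categorical naive Bayes classifier with add-one smoothing. It is not the D2K implementation, and the attribute names and the six records are synthetic placeholders invented for this example; they are not SSA data.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Fit a categorical naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)          # distinct values seen per feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict_proba(model, row):
    """Return normalized class probabilities using add-one smoothing."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total                  # class prior
        for i, value in enumerate(row):
            p *= (feature_counts[label][i][value] + 1) / (n + len(values[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Synthetic toy records: (age band, work history) -> used the Ticket?
rows = [
    ("under40", "worked_before"), ("under40", "worked_before"),
    ("over40", "never_worked"), ("over40", "worked_before"),
    ("under40", "never_worked"), ("over40", "never_worked"),
]
labels = ["yes", "yes", "no", "no", "yes", "no"]

model = train(rows, labels)
proba = predict_proba(model, ("under40", "worked_before"))
print(proba)
```

The returned dictionary plays the role of the per-individual likelihood of participation described above; a real study would use far more attributes and records and would validate the model on held-out data.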
The Job Demands project was funded by the Social Security Administration through the UIUC Disability Research Institute (Tanya Gallagher and Judee Richardson). It focused on testing an alternative strategy for determining a claimant’s ability to do prior work or other work in the national economy based on their predicted ability to meet the physical and cognitive demands of actual posted jobs. The potential value in this alternative approach is that it can be used both as an assessment tool in the disability determination process and in the return to work process as a placement tool for persons with disabilities. This project evaluated an information-intensive approach that involved gathering employment opportunities (job ads) from the Internet and matching these job ads to the Dictionary of Occupational Titles (DOT) entries, thereby allowing occupational characteristics to be inferred from the DOT. NCSA’s Data to Knowledge (D2K) environment was instrumental in this project for data gathering and text mining analysis. D2K was used alongside other crawlers to search and retrieve employment opportunities from the Internet. D2K’s T2K (Text to Knowledge) tools provided text mining and analysis capabilities that have been specially designed to operate within and capitalize upon the complexity of rich natural language domains inherent to very large stores of text and multimedia documents. These T2K components were used to process both the employment opportunities and the DOT entries. Information extraction and deep web techniques were also leveraged through partnerships with Dan Roth and Kevin Chang in the UIUC Department of Computer Science. Special modules written in D2K were used to match employment opportunities to the DOT entries, and then an Employment Opportunities Database (EOD) was created containing the job ads with data from the matched DOT entries. To make the database highly accessible, it was exposed through a Web-based application for searching.
GRIDS Center
Other Participants: San Diego Supercomputer Center, University of Chicago, University of Southern California, University of Wisconsin NCSA Team: Bill Baker, Randy Butler, Michael Bletzinger, Patrick Duda, Mike Freemon, David Gehrig, Weddie Jackson, Herb Morgan, Kathleen Ricker, and Von Welch Funding Agency: NSF The GRIDS Center creates integrated and tested software packages (NSF Middleware Initiative releases) for the broad community. These releases are important in two regards: first, they serve as the foundation for important software distributions created by third parties; and second, they help to identify and resolve interoperability problems between different software components. Part of the success of these releases can be attributed to the continued development and refinement of the GRIDS Center build-and-test facility. In addition to supporting NMI software releases, the build-and-test facility is now being used by other cyberinfrastructure projects, such as TeraGrid. The GRIDS Center has increased emphasis on community engagement and outreach in the past two years. GRIDS Center has substantive engagements with a number of large-scale cyberinfrastructure projects, such as LTER and LOOKING. GRIDS Center members have participated in planning for new projects and provided on-the-ground support for existing projects in terms of infrastructure deployment, system architecture, application development, and training. The GRIDS Center has backed up these community engagement activities by producing well-documented and packaged integrated solutions in targeted areas such as authentication and data movement. These packages have not only provided value to their targeted communities, but are starting to see adoption in other applications.
Job Demands Project
Other Participants: College of Applied Life Studies, University of Illinois at Urbana-Champaign NCSA Team: Loretta Auvil, Colleen Bushell, David Clutter, Lisa Gatzke, Thomas Prudhomme, Duane Searsmith, Vered Goren, Xiaolei Li, Tom Redman, Barry Sanders, Andrew Shirk, Anca Suvaiala, Michael Welge, and Bei Yu Funding Agency: Social Security Administration
GRIDS Center has served as an effective bridge between a broad range of activities that are relying on Grid-based cyberinfrastructure—from infrastructure projects such as TeraGrid and OSG to science-based virtual organizations such as the Southern California Earthquake Center and LOOKING, and ultimately to individual investigators.
National Archives and Records Administration Project
NCSA Team: Peter Bajcsy, Alan Craig, Michael Folk, Joe Futrelle, Dan Kauwell, and Rob Pennington Funding Agency: National Archives and Records Administration The work being done with NARA is an excellent vehicle for examining the boundaries of data collection, storage, access, and handling. In the area of data integration and information gathering about decision processes, Peter Bajcsy has conducted tradeoff studies about storage and retrieval efficiency using different data structures and has examined key computer science problems related to the use of computers in government decision making, the capability of documenting and reconstructing government decision making processes, and the assessment of information load and computer performance cost. Michael Folk has studied the scalability, suitability, and performance of HDF5 software as a format for storing federal geospatial data, including analyzing the I/O efficiency of handling this data on terascale facilities, as well as the performance implications involving access to HDF5 data when the data is stored in a Storage Resource Broker (SRB). Joe Futrelle, working in the area of distributed record processing with OAI, has found that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has reasonable scaling characteristics for a distributed record processing task. This suggests that other, similar service-oriented architectures such as Web services may also scale reasonably, provided they are applied correctly to tasks such as distributed record processing. In studies of automated data gathering and analysis by Alan Craig, NARA made available a prototype e-mail collection that allowed tracking of keywords through time and following messages through the e-mail network. It also provided a rich source of data to display the communication network using different visualization techniques. More recently, these same techniques have been applied to other large e-mail corpora. 
Dan Kauwell focused on finding better ways to search for and analyze results using visualization to enable archivists to understand the complexity of information more rapidly.
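The report does not describe the harvesting setup in detail. As a rough sketch of what an OAI-PMH harvest request looks like, the Python fragment below builds ListRecords request URLs. The verb and argument names (ListRecords, metadataPrefix, resumptionToken, oai_dc) come from the OAI-PMH protocol; the repository address and the token value are placeholders invented for illustration, and no network request is made.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In the protocol, resumptionToken is an exclusive argument: when it is
    present, only the verb and the token are sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

# Hypothetical repository endpoint, used only for illustration.
BASE = "https://archive.example.org/oai"

first_page = list_records_url(BASE)
next_page = list_records_url(BASE, resumption_token="page-2")

print(first_page)
print(next_page)
```

A harvester repeats the second form of request, using the token returned in each response, until the repository returns no further token. Because each request is independent, the work can be divided among several harvesters, which is one reason a protocol of this kind can suit distributed record processing.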
capabilities—analysis that requires only one pass over the data stream so that very large streams of data do not have to be stored. Resulting visualizations will be dynamic and near real-time, allowing exploration, discovery, prediction, and monitoring of event patterns. In 2005, other NCASSR collaborations included: • Integrated security enhancements that allow multiple sites to continue to operate in the face of an attack by raising security levels as necessary and gracefully reducing services. The work is a joint effort with Pacific Northwest National Laboratory and the Naval Research Laboratory Center for Computational Science. • Ways to visualize security information in network traces and tracking vulnerabilities in networked environments in real time. • Security and accountability services for communications systems like electronic mailing lists, which are often hindered by security concerns.
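The one-pass idea described above can be sketched in a few lines of Python. This is not the NCASSR system: it is a minimal illustration, with invented sample messages, of how a stream can be summarized while each item is read once and then discarded.

```python
from collections import Counter

def one_pass_counts(stream, keywords):
    """Count keyword occurrences while reading each message exactly once."""
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for message in stream:              # each message is seen once, never stored
        for token in message.lower().split():
            token = token.strip(".,;:!?\"'()")
            if token in wanted:
                counts[token] += 1
    return counts

def messages():
    # Stand-in for a live feed; a generator never holds the whole stream.
    yield "Port scan detected on host A."
    yield "Login failure on host B; second login failure follows."
    yield "Scan complete."

result = one_pass_counts(messages(), ["scan", "login", "failure"])
print(result)
```

Memory use depends on the number of keywords tracked rather than on the length of the stream, which is the property that lets very large streams be analyzed without being stored.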
Technology Research, Education, and Commercialization Center
Other Participants: University of Illinois at Chicago and University of Illinois at Urbana-Champaign Funding Agency: Office of Naval Research The Technology Research, Education and Commercialization Center (TRECC) links businesses, education, and the government with next-generation technologies and facilitates responsive, comprehensive exchanges of information. Its manifold mission is to: promote basic research, development, and transfer of emerging university-developed technologies to government and commercial applications; promote innovative education research; support small and medium enterprises by promoting the use of advanced technology testbeds and expertise; and encourage public- and private-sector commercialization of technologies that have potential use in defense and industry. In FY2005 TRECC: • Provided funding for the NCSA OptIPuter Cybercollaboratory to create a unique high-performance collaboratory research infrastructure. Networking bandwidth at the remote facility is being increased to support this effort. • Initiated and funded a Cross-Cluster Secure On-Demand Computing project focused on examining the efficacy of computing on multiple clusters in an on-demand environment. • Initiated and funded the NCSA Agent Sharing of Archived Provenance project, which will develop a proactive, automatic, personal knowledge agent capable of assisting individuals with the acquisition, analysis, and interpretation of ever-changing information. • Initiated the Technology Acceleration for Commercialization project, which will accelerate the development of selected University of Illinois technologies that show near-term promise for commercialization. 
These wide-ranging technologies include means of: integrating sensor inputs to detect objects in an enhanced security environment; integrating multiformat, wireless sensor data to detect and address hazardous conditions; treating bleeding caused by trauma using high-efficiency hemostatic agents; easily measuring very low-density electrolytic concentrations using sensors; determining the best use of computing resources in a hardware environment; and screening for general viral presence distributed host-cell DNA without false positives using rapid viral infectivity detection. • Expanded the TRECC Knowledge Center, which provides collaborative knowledge-sharing tools to public elementary, middle, and high school teachers and administrators in Illinois.
National Center for Advanced Secure Systems Research
Other Participants: University of Illinois at Urbana-Champaign, Pacific Northwest National Laboratory, Naval Postgraduate School NCSA Team: Loretta Auvil, Jim Basney, Rakesh Bobba, Randy Butler, Weiting Cao, Charilaos Ermopoulos, Patrick Flanigan, Mike Freemon, Michael Harden, Jin Heo, Himanshu Khurana, Nadir Kiyanclar, Patty Kobel, Kiran Lakkaraju, Bruce Mather, Dmitry Mogilevsky, Joe Muggli, Suvda Myagmar, Kjellrun Olsen, Meela Pant, Greg Pape, Danny Powell, Kathleen Ricker, Barry Sanders, Andrew Shirk, Adam Slagell, Ramona Su, Anca Suvaiala, Goren Vered, Xuanhui Wang, Von Welch, Xiaoxin Yin, and Bill Yurcik Funding Agency: Office of Naval Research The more complex, the more powerful, the more ingrained in our lives cyberinfrastructure becomes, the larger a target it becomes for attack. The National Center for Advanced Secure Systems Research (NCASSR) addresses our nation’s need for a reliable and secure cyberinfrastructure, safeguarding the computing and networking tools available to government and business. For example, NCSA is developing a text mining and visualization system that extracts entities, events, and relationships to find hidden patterns and predict relationships between patterns across large and heterogeneous text information flows. These algorithms approach “one look”
Major NCSA Programs
NCSA Faculty Fellows Program
Radha Nandkumar, NCSA The NCSA Faculty Fellows Program, which is jointly funded by NCSA and the Office of the Vice Chancellor for Research at the University of Illinois at Urbana-Champaign, provides resources and support that enable faculty to tackle research problems and projects that would otherwise be out of reach. Through the program, faculty can access and benefit from NCSA’s expertise, high-performance computing and storage environment, cutting-edge visualization and virtual reality technologies, data mining capabilities, and opportunities for multidisciplinary collaboration. In the 2004-2005 academic year, the NCSA Faculty Fellows Program supported the following faculty and projects: • Luc Anselin, Geography: to work with NCSA staff to extend his software for the display of geospatial data, called GeoDa, so it could be used to create interactive three-dimensional visualizations on high-resolution display walls. • Brian Bailey, Computer Science: to use the tiled display wall at NCSA to further develop and test a system that facilitates innovation and creativity by allowing designers to interact more naturally with large displays. • Stephen D’Arcy, Finance: to use NCSA’s D2K data mining system to analyze auto insurance records, developing predictive models to improve insurance claim investigation practices. • Douglas Kibbee, French: to work with NCSA staff to develop software, dubbed Invisibase™, that allows the average computer user to create and share databases without advanced technical knowledge. The shared workspace created by using Invisibase allowed Kibbee to better organize and analyze data related to his study of linguistic prescriptivism and made it possible for him to collaborate with researchers around the world. • Marianne Winslett, Computer Science: to collaborate with grid-computing experts at NCSA to explore the application of her trust-negotiation broker to the supply of grid services. 
This type of policy-based access control could be far simpler for both the users of grid services and the administrators. • Yuanhui Zhang, Agricultural and Biological Engineering: to work with visualization experts at NCSA to develop improved visual analysis techniques for data generated by experiments that track airflow in aircraft cabins. Improved understanding of airflow could lead to the design of healthier enclosed environments. Since its inception in 1999, 60 faculty from 10 colleges and 39 departments have participated in the NCSA Faculty Fellows Program. This program has greatly expanded the role of cyberinfrastructure in the arts, humanities, and social sciences as well as in science and engineering.
International Affiliates Program
Radha Nandkumar, NCSA The International Affiliates Program engages organizations and science and engineering communities outside the United States that share NCSA’s interests in cyberinfrastructure, high-performance computing, networking, storage, visualization, and data analysis. The program provides opportunities for sharing information, collaborative projects, and staff exchanges. NCSA’s international affiliates include: • Australia: Australian Partnership for Advanced Computing; Monash University, School of Computer Science and Software Engineering; Victorian Partnership for Advanced Computing • Brazil: National Laboratory for Scientific Computing; College of Engineering, Federal University of Rio de Janeiro; National Education and Research Network • China: Chinese Academy of Sciences • Europe: Council for the Central Laboratory of the Research Councils; Open Middleware Infrastructure Institute; UK e-Science Programme • India: C-DAC, The Center for the Development of Advanced Computers; Sri Sathya Sai Institute of Higher Learning, Department of Math and Computer Science; University of Hyderabad, Department of Computer Science • Korea: Korean Institute for Science and Technology Information • Singapore: Institute for High Performance Computing; Bioinformatics Institute • South Africa: Center for High Performance Computing, University of Cape Town • Taiwan: National Center for High-performance Computing NCSA is also a member of the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA). In 2005, NCSA hosted collaborators from Australia, Brazil, China, Korea, Singapore, Taiwan, and the U.K., while staff from NCSA traveled to Brazil, Costa Rica, China, Korea, India, Singapore, South Africa, and the U.K. NCSA brought together representatives from all of its international affiliates for a meeting in conjunction with SC05 in Seattle. 
NCSA also helped organize several international events, including the III Workshop on Computational Grids and Applications at the National Laboratory for Scientific Computing in Brazil, a Middleware for Grid Computing workshop in France, an e-Science Conference in Australia, and the Women in Information Technology Conference in India.
Private Sector Program
J. Mark Nolan, NCSA NCSA’s Private Sector Program connects industry with innovation. By partnering with leading-edge companies, it guarantees that the emerging technologies and innovations developed at the center are applied to real-world challenges. This provides novel challenges for NCSA’s researchers and gives partners a competitive edge. NCSA’s partners use visualization and virtual reality technologies to design new equipment and test prototypes, apply knowledge discovery and management techniques to vast quantities of data, and tackle formerly intractable problems with the power of supercomputers. In FY2005, five new organizations joined NCSA’s Private Sector Program: Abaqus, ACNielsen, John Deere, JPMorgan Chase & Co., and the Research Triangle Institute. Other current partners are Boeing Phantom Works, Caterpillar, Dell, Exxon Mobil, IBM, and Motorola Labs.
Technology Research, Education, and Commercialization Center
James Myers, Associate Director
Kirk Hard, Assistant Director
E.J. Grabert, Program Manager
Tom Brown, Shalini Dewan, Harry Hilton, Nancy Komlanc, Carlos Lonberger, Joe Reitzer, Jonas Talandis, Gail Tate
National Center for Supercomputing Applications University of Illinois at Urbana-Champaign 1205 West Clark Street Urbana, IL 61801
www.ncsa.uiuc.edu