TeraGrid Extension Proposal Straw Man
(20 Feb 2009)
This document represents a starting point for determining the strategy for developing our TeraGrid
extension document. It began as a straw man proposal from J. Towns, but has evolved quickly based
on feedback and active editing. It should nonetheless allow the TG collaboration to move forward in
determining our plan.
It seems the following principles should be followed. They are open for debate.
a. Science Case/User Impact: Whatever we propose must be supported by a strong science case.
While it is helpful to be able to state improved productivity for researchers, that is a supporting
argument and not a central argument.
b. Positioning for XD: We can reasonably assume that a strong science case will be made in any
XD/CMS-AUSS-TEOS award and we can reasonably assume such an award will be made. Our TG
Extension activity should clearly position important TG capabilities for transition to XD and
address any transition preparation issues.
We are really only constrained by the amount of funding available to us to propose a compelling body of
work for the TG Extension period. Some other considerations are:
a. This proposal will be for the period April 1, 2010 through March 31, 2011
b. Total budget is $30M
c. Budget must cover all GIG and RP activities in a single proposal; there will be no other proposals
to cover these costs separately.
d. Proposal will be a single proposal submitted via UChicago to leverage current sub-award
structures in place.
This document breaks up the potentially proposed work into a few “activity areas.” This organization is
not necessary for the proposal and likely should be re-mapped to the appropriate TG Areas under the
corresponding Area Directors for the proposal. Currently, the discussion in this document primarily
addresses how RP-funded activities will change going into the TG Extension period. This necessarily
involves some discussion of GIG funding changes along the way, but this document does not attempt a
full discussion of the scope of GIG-funded activities.
This primarily concerns the operation and support of allocated resources for the TG user community.
This focuses on computational resources, but data, visualization and other resources are relevant here.
The O&M for Track 2 awards is not part of this discussion or part of this proposal. Those are separately
awarded and have their own funding time lines.
This leaves the currently supported, non-Track 2 resources for consideration. Only resources that would
arguably have a discernible positive impact on the user community would be appropriate to propose
continuing. Providing pure capacity does not look like a strong argument given that (a) Ranger is well
into its production life and still appears to have available capacity, (b) Kraken is still growing (currently
166TF, adding 450TF any day and another 300TF later this year) to provide a total of ~1PF by year’s end,
and (c) the Track 2c system will come on line, presumably adding ~1PF more, in early 2010. These three
machines will represent more than 2.5PF of computing capability and capacity. Currently TeraGrid
provides a total of ~1.2PF across all resources; most of this is provided by Ranger and Kraken.
Resources that are continued should provide a clearly defined benefit to the user community, either
through direct provision of resource or by providing a platform for developing/enabling important new
capabilities.
Here is a draft of what we might do.
Retain one or perhaps two of the IA32-64 resources to provide a platform for supporting large scale
interactive and on-demand (including science gateway) use. We have been given clear indication from
the user community, the Science Advisory Board and review panels that we should put more effort into
this area. This resource might also be used to work on development of metascheduling, co-scheduling,
and advanced-reservation capabilities.
In addition, there are increasing interactions with OSG and a desire on the part of many for
interoperability and resource and technology sharing. While I believe we should move in this direction
in general, these particular resources provide an opportunity to run traditional OSG-style jobs (i.e.,
single-node execution) on TG resources. At a minimum such jobs could backfill the schedule, but we
might want to allow them “reasonable” priority, as opposed to the low priority typically given to
low-parallelism jobs by scheduling policies on large systems today.
Finally, these systems can provide a transition platform for those coming from a university- or
departmental-level resource and moving out into the larger national cyberinfrastructure resource suite.
Typically such researchers are accustomed to using an Intel cluster, and these systems provide
something familiar with which to expand their usage and to work on scalability and related issues.
These researchers would not be restricted to taking this path and could jump straight to the Track 2
systems, but many have asked for this type of capability. By making use of these platforms in this way,
we also alleviate the pressure of smaller jobs on the larger systems, which have been optimized in their
configuration and operational policies to favor highly scalable applications.
Options here include:
Steele: 66TF, 15.7 TB memory, 893 node Dell cluster w/ 130TB disk
QueenBee: 51TF, 5.3TB memory, 668 node Dell cluster w/ 192 TB disk
Abe: 90TF, 14.4TB memory, 1200 node Dell cluster w/ 400TB disk
Lonestar: 62TF, 11.6TB memory, 1460 node Dell cluster w/ 107TB disk
Unique Computing Resource: The Lincoln cluster provides a unique GPU-based computing resource at
scale: a 192 compute node (3TB memory) + 96 S1070 Tesla unit (1.5TB memory) Dell/NVIDIA cluster w/
Support for VMs: An emerging need and very interesting area for investigation and evaluation is the
use of VMs to support scientific calculations. There are some groups doing this now and Quarry at IU
already provides a VM hosting service that is increasingly widely used and unique within the TeraGrid.
(Currently Quarry supports more than 18 VMs for 16 distinct users, many of which host gateway front-
end services.) This also has connections to supporting OSG users and we should have an effort in this
area. I believe this is another viable usage modality for the four cluster resources noted above, along
with Quarry at IU (7.1 TF, 112 HS21 Blades in an IBM e1350 BladeCenter Cluster with 266 TB GPFS disk).
Track 2c Transition: Given that the Track 2c machine will likely come up very close to the end of funding
for TeraGrid, a 3-month transition period should be supported for Pople/Cobalt users. To support this,
we should provide 3 months of funding to operate Pople, a 384-socket SGI Altix 4700 w/ 150TB disk.
While there is a need for visualization resources, by this point in time the ANL visualization cluster will
have been phased out of service. In addition, while Spur at TACC is already operational, this resource is
partially (50%) associated with TACC’s Track 2a award, and staffing and O&M are covered from that
budget. Given that two XD/Remote Visualization and Data Analysis awards will be made prior to the TG
Extension period, there is no clear need to provide additional support for Spur from the TG Extension
budget.
Resources of this type will have potentially important connections to the development of a distributed
archival data replication service and also to the provision of wide-area filesystems. It is clear that a
general wide-area filesystem solution is desired. We currently have a couple of limited solutions with
GPFS-WAN and Lustre-WAN.
Data Archive Resources
The ongoing provision of these resources must clearly be tied to the development activities associated
with establishing a distributed archive data replication service. That project has yet to be clearly defined
and agreed upon, but the data holdings need to be retained until the solution is in place and at least
some period of transition has passed. This collective activity must position TeraGrid to offer the
presumed XD awardee a service that can be transitioned to them, along with assurance that the data
are in locations that will provide for any further transition of the data hosting as we enter the XD era.
The following resources should be supported:
SDSC Archive: 6 months support for HPSS/SAM-QFS w/ ~5PB of data
NCSA Archive: support for UniTree w/ ~3PB of data; assumes additional data ingestion from
Lincoln and/or Abe
IU HPSS: support for HPSS (~2.7PB capacity) primarily as a data resource for data replication
TACC archive: support for ~10PB capacity
PSC archive: support for 3 month transition period for Track 2c
During the TG Extension period, data from the SDSC archive holdings would be replicated to data service
provider sites after the archive data replication service is deployed. As appropriate, data from the NCSA
archive would also be replicated. In addition, the Blue Waters project has expressed interest in
ingesting the data into the Blue Waters archive from the UniTree archive when the Blue Waters archive
becomes operational. There is also interest in the Blue Waters archive participating as a data resource
for the archive data replication service.
There has been an ongoing call for general availability of a wide-area filesystem spanning all of the
TeraGrid resources. While this is something we may not have been able to provide before the TG
Extension period, by that time the number of resources to be supported by such a wide-area filesystem
becomes much more manageable.
There are advantages and disadvantages for using either GPFS-WAN or Lustre-WAN to address this
issue. Given the high value this has both for users implementing workflow systems within the TeraGrid
environment and also to those simply using multiple TeraGrid resources (who likely have a potential to
employ a workflow system!), we should move aggressively to deploy a wide area filesystem that at least
will be available on the known resources. This would include Ranger/Spur, Kraken, Track 2c and any
resources we otherwise opt to continue. In addition, this plan should look to make best efforts to
deploy to Track 2d and XD/Remote Visualization and Data Analysis resources. Since these are as yet
undefined, making commitments there is difficult. As such, a contingency plan should be defined and
funds allocated in a budget.
We will enhance Lustre-WAN filesystem capacity and functionality for the following sites: PSC,…??John,
fill out the list of sites being supported?? and extend availability of the Lustre-WAN client package to users.
This will galvanize development of multi-site computational workflows, and will allow users to mount
Lustre-WAN filesystems locally at their home institutions, to facilitate workflows involving both TeraGrid
and external resources.
Finally, there has also been ongoing hope for an emerging pNFS solution, which now would potentially
appear in the time period of the TG Extension. This will need to be taken into account in the scope of
this work.
These resources should have clear ties to providing a general-purpose wide-area filesystem, to the
archive data replication service, or to both.
Resources to support:
SDSC GPFS-WAN: 700TB capacity; continue to provide support for data collections and
participate in the archive data replication service project, potentially as a wide-area filesystem or high-speed
data cache for transfers. If appropriate, hardware resources could be re-directed to participate in the TG-
wide Lustre-WAN solution. [0.25 FTE + O&M @ SDSC]
IU Lustre-WAN: 984 TB capacity; serve as a data resource for the TG-wide Lustre-WAN filesystem,
and participate in the archive data replication service project, potentially as a wide-area filesystem
supporting that service. [2.0 FTE @ IU]
For these resources, pNFS integration would need to be investigated if the technology shows promise in
that time frame. We should also consider investigating other alternatives (e.g., PetaShare). Both of these
would likely fall under “project”-type activities traditionally funded by the GIG.
To support the above-mentioned activities, sites funded for those activities have some additional
costs to support their participation. NSF has indicated that no support will be provided to Track 2
awardee sites, since network connectivity was an explicit requirement of the Track 2 solicitation.
LONI: network connectivity
SDSC: network connectivity; LA Hub networking support
IU: network connectivity
Purdue: network connectivity
ORNL: network aggregation switch maintenance
PSC: network connectivity (3 months)
NOTE: A couple of thoughts for possible project activities in extension period:
a. Investigate Internet2 DCN testbed for point-to-point transfers.
b. Investigate queuing systems to provide better global services (archives, notifications, and
others) that take advantage of Web 2.0 capabilities.
TeraGrid-Wide Operational Activities
This area covers a number of items that clearly need to be continued. In some small cases, there are
activities currently being funded from RP budgets to perform GIG operational activities. This is done for
convenience in most cases. In addition, there are some funds now re-directed from RP budget via the
GIG to support activities in this area. This area does not include development projects (discussed
below). Here are activities in this area that clearly must continue to properly support the functioning of
the TeraGrid. Operational activities currently funded from the GIG will largely continue, though a
detailed review of these is appropriate due to impacts from a reduced number of resources or related
changes in the project during the TG Extension period.
Management and Project Management
In the context of this document, this will encompass what is currently funded within the GIG budget
proper for the operation of the GIG and appropriate funds allocated to provide management and
project management support at the RP sites. This includes support for the Area Directors currently
funded from the GIG budget. Project management support for the RPs is currently redirected from RP
budgets and delivered via the GIG; management support is still within the RP budget. In the context of
the TG Extension, appropriate management and project management funding will need to be provided
to sites consistent with the level of activity for which they will be funded during the TG Extension period.
Site: RP-funded effort Management Project Management
SDSC 1.30 FTE 0.50 FTE
TACC 0.70 FTE
NCSA 0.90 FTE
IU 0.10 FTE 0.50 FTE
Purdue 0.50 FTE
PSC 0.15 FTE
ORNL 0.37 FTE
Site: GIG funded effort Management Project Management
GIG 0.50 FTE (TGF Chair, NCSA); 0.50 FTE (TG PM, NCSA);
0.50 FTE (TG Events Coord, ANL) 1.75 FTE (GIG PM, UC)
SDSC 0.50 FTE
TACC 0.50 FTE
NCSA 0.50 FTE
LONI 0.10 FTE
IU 0.50 FTE
Purdue 0.25 FTE
PSC 0.50 FTE
ORNL 0.10 FTE
NICS 0.50 FTE
TGCoord - TeraGrid Forum Chair [0.50 FTE @ NCSA]
Continuing duties of the TGF Chair.
TGCoord - TeraGrid Project Events [0.50 FTE @ ANL]
Continuing into Extension (need description of this activity)
TGCoord - TeraGrid GIG Project Management [1.75 FTE @ UC]
GIG Project Management personnel.
PMWG – TG Project Management [3.45 FTE: 0.50 @ IU, 0.50 @ NCSA, 0.50 @ NICS, 0.10 @ ORNL,
0.50 @ PSC, 0.25 @ Purdue, 0.50 @ SDSC, 0.50 @ TACC, 0.10 FTE @ LONI]
The effort depends on how many RP resources continue into the Extension.
PMWG - Area Coordination [0.50 FTE @ NCSA]
PM Area Director leads the effort of the PM Working Group for developing the TG IPP, tracking the
project, reporting, and overseeing change management.
Advanced User Support
Currently all advanced user support funding has been redirected from RP budgets and delivered via the
GIG. As an area in which the TeraGrid can have major impact, the funding levels for this should be
retained, but a more aggressive strategy should be developed to proactively pursue users, and we
should be more willing to provide a higher level of support and service. This means a willingness to do
more “for” users as opposed to working “with” them on some of these activities. This is hard to capture
in text, but it can be explained.
The AUS area will continue to work, in a collaborative fashion, with the User Support (US) area, the
Science Gateways (SGW) area, the EOT/Broadening Participation (EOT) area, and the User Facing Projects
and Core Services (UFC) area to provide comprehensive user support for all TeraGrid users.
Following the established transparent process, AUS activities will focus on identifying requests and
potential opportunities for providing AUS to users, prioritizing those requests and opportunities, and
matching appropriate AUS staff to those requests and opportunities. In the Extension the total FTE
count for AUS includes FTEs provided by GIG for AUS, as well as FTEs provided by carry forward funding
by some of the RPs.
Site: GIG-funded effort AUS Effort Notes
GIG 0.75 FTE Area Coord: 0.75 FTE @ SDSC
SDSC 3.2 FTE 3.20 FTE Adv Proj
TACC 4.25 FTE [need to know distribution]
NCSA 4.5 FTE 1.50 FTE TG Apps, 1.50 FTE
Adv Proj, 1.50 FTE Adv EOT
LONI 0.50 FTE [need to know distribution]
Purdue 1.0 FTE 0.50 FTE TG Apps, 0.30 FTE
Adv Proj, 0.20 FTE Adv EOT
PSC 5.50 FTE 3.15 FTE TG Apps, 1.65 FTE
Adv Proj, 0.70 FTE Adv EOT
NIU 1.29 FTE MPIg support
AUS - MPIg Support [1.29 FTE @ NIU]
[[need text to support need for this support…
AUS - Area Coordination [0.75 FTE @ SDSC]
Amit Majumdar continues leading this effort into the Extension.
AUS - Advanced Support TG Applications [5.15 FTE: 1.50 @ NCSA, 3.15 @ PSC, 0.50 @ Purdue]
Under this subarea, one or more TeraGrid AUS staff members will provide targeted advanced support to
users to enhance the effectiveness and productivity of users’ applications utilizing TeraGrid resources.
This support will include porting applications to new resources, implementing algorithmic
enhancements, implementing parallel programming methods, incorporating mathematical libraries,
improving the scalability of codes to higher processor counts, optimizing codes to efficiently utilize
specific resources, enhancing scientific workflows, tackling visualization and data analysis tasks, and
implementing innovative use of resources. It should be noted that providing advanced support for the
scientific applications supported by the Science Gateways also falls under AUS.
AUS - Advanced Support Projects [6.65 FTE: 1.50 @ NCSA, 1.65 FTE @ PSC, 0.30 FTE @ Purdue,
3.20 FTE @ SDSC]
In addition to ASTA support, the complex and leading-edge nature of the TeraGrid infrastructure
necessitates two more categories of advanced projects that can benefit large numbers of TeraGrid users.
The first category consists of advanced projects, carried out by AUS staff at RP sites, that only AUS staff
with a higher level of expertise and experience can perform. These are necessary tasks that fall within
the AUS area and benefit a large number of TeraGrid users; in some cases they require an ongoing,
maintenance mode of operation. The second category includes proactive projects that AUS staff will
undertake, in consultation with the TeraGrid user community; these have the potential to benefit large
numbers of domain-science users, or users of specific algorithms, methodologies, or mathematical
libraries. In the following section we describe these two categories of advanced support projects.
AUS - Advanced Support EOT [2.40 FTE: 1.50 FTE @ NCSA, 0.70 FTE @ PSC, 0.20 FTE @ Purdue]
Under this subarea, in coordination with the TeraGrid EOT area, AUS staff will provide outreach to the
TeraGrid user community about the availability of and process for requesting ASTA and ASP. This
outreach will be done by (1) posting regular TeraGrid news items about AUS availability before every
quarterly TRAC deadline, (2) contacting NSF program directors who fund computational research
projects and making them aware of the availability of AUS, (3) having TeraGrid staff and leaders
advertise AUS availability at appropriate conferences and workshops, and (4) planning, organizing, and
attending TG09 and other workshops. ??don’t we also do training here??
Currently this is funded both from RP budgets and from the GIG budget. In general, the total funding for
activities in this area across all budgets should be maintained. This will require a careful review of the
currently funded effort at each individual RP to assess whether the specific effort should be continued.
Site: RP-funded effort EOT Effort Notes
SDSC 1.25 FTE 0.50 FTE Training, 0.25 FTE
ER/Comm, 0.50 FTE E&O
TACC 0.75 FTE 0.50 FTE Comm, 0.25 FTE
NCSA 1.35 FTE 0.50 FTE Training, 0.65 FTE
ER/Comm, 0.20 FTE E&O
PSC 0.66 FTE 0.10 FTE Training, 0.25 FTE
ER/Comm, 0.31 FTE E&O
Site: GIG-funded effort EOT Effort Notes
GIG 0.50 FTE 0.50 FTE (Area Dir @ UC)
SDSC 1.03 FTE 0.10 FTE Sci Highlights, 0.33
FTE Training, 0.40 FTE Edu,
0.20 FTE Outreach
TACC 0.50 FTE 0.50 FTE Training
UC 1.0 FTE 1.0 FTE Ext Relations
NCSA 0.70 FTE 0.50 FTE Training, 0.20 FTE
IU 0.40 FTE 0.20 FTE Training, 0.20 FTE
Purdue 0.75 FTE 0.75 FTE Edu
PSC 1.40 FTE 0.55 FTE Training, 0.35 FTE
Edu, 0.10 FTE Outreach,
0.40 FTE Sci Writing
NICS 0.50 FTE 0.50 FTE Training
[[NOTE: suggest add $1M to this area for future allocation. ScottL needs to provide text for proposal
describing high priority items likely to be funded and refer to process by which things will be decided
upon (annual planning process)]]
EOT - Area Coordination [0.50 FTE @ UC]
The University of Chicago will provide the Area Director, who will coordinate the TeraGrid Education,
Outreach, and Training working group and supervise the External Relations Coordinator, who in turn will
coordinate the External Relations working group. The Area Director will coordinate programs and
activities that are funded by the GIG among the RPs, coordinate information sharing among the RPs,
establish external partnerships and collaborations, continually seek community input and advice, and
coordinate evaluation of these activities.
Science Highlights [0.10 FTE @ SDSC]
Publish the Science Highlights.
External Relations [1.0 FTE @ UC]
To meet NSF, user, and public expectations, information about TeraGrid success stories, including
science highlights, news releases, and other news stories, should be readily accessible via the TeraGrid
website and distributed among news outlets that reach the scientific user community and the general
public. Such outlets include iSGTW and HPCwire (and NSF OIC). This work also involves design and
preparation of materials for the TeraGrid website, for conferences, and for other outreach activities.
Training [2.58 FTE: 0.20 @ IU, 0.50 @ NCSA, 0.50 @ NICS, 0.55 @ PSC, 0.33 @ SDSC, 0.50 @ TACC]
Training will focus on expanding the learning resources and opportunities for current and potential
members of the TeraGrid user community by providing a broad range of live, synchronous and
asynchronous training opportunities. The goal is to prepare users to make effective use of the TeraGrid
resources and services to advance scientific discovery. A key objective is to make the learning
opportunities known and accessible to all users.
Education [1.70 FTE: 0.20 @ NCSA, 0.35 @ PSC, 0.75 @ Purdue, 0.40 @ SDSC]
TeraGrid has established a strong foundation in learning and workforce development efforts focused
around computational thinking, computational science, and quantitative reasoning skills to prepare
larger and more diverse generations motivated to pursue advanced studies and professional careers in
science, technology, engineering and mathematics (STEM) fields. The RPs have led, supported, and
directly contributed to K-12, undergraduate and graduate education programs across the country.
Outreach [0.75 FTE: 0.20 @ IU, 0.10 @ PSC, 0.20 @ SDSC, 0.25 FTE @ Purdue]
TeraGrid has been conducting a very aggressive outreach program to engage new communities in using
TeraGrid resources and services. The impact of this can be seen in the number of new DAC (and now
Start-up and Education) accounts that have been established over the last few years. TeraGrid has been
proactive about meeting people “where they live” on their campuses, at their professional society
meetings, and through sharing examples of successes achieved by their peers in utilizing TeraGrid
resources. Programs include Campus Champions, Professional Society Outreach, EOT Highlights, and
Science Writing [0.40 FTE @ PSC]
External relations science writing, including science-impact stories and news releases; support of
distribution activities including flyer design and conference-presence materials; and TeraGrid web site
planning and design. PSC’s science writer’s experience and proven ability in translating complex
information into informative prose will contribute to the TeraGrid objective of making science-impact
stories accessible to the non-specialist public. PSC’s multi-media designer has contributed strongly to
TeraGrid ER communication activities, including design for the website (and implementation of a
content management system), design of outreach flyers, and multi-media design of video loops for SC
Other activities that should be supported in the TG Extension period that are currently funded from RP
budgets:
gx-map support: 0.25 FTE (SDSC)
This area covers project-type activities funded from RP-budget that should be considered for continued
funding in the TG Extension period.
As we look toward transitioning Gateway activities to the XD awardee, Gateway activities in the TG
Extension period should focus on gateway infrastructure that is transferable, and on ensuring that
gateways in operation and in development can function effectively on that infrastructure.
Currently there are some efforts in these areas funded from RP budgets, and those efforts should be
considered for continued support if they fit within the stated objectives.
Site: RP-funded effort Gateway Effort Notes
IU 0.50 FTE (?) Extension to data gateway
integration w/ gateways
(conversion from GridFTP
to Lustre-WAN use)
ORNL 0.72 FTE NSTG Gateway support
Site: GIG-funded effort Gateway Effort Notes
GIG 9.99 FTE Area Coordination: 0.49
FTE @ SDSC, 0.50 FTE @
ANL; 9.0 FTE GW Targeted Support
SDSC 0.55 FTE 0.10 FTE Comm Accts, 0.20
Targeted Supp Coord, 0.25
FTE GW Docs
UC 1.0 FTE Gateway Supp
NCSA 0.20 FTE 0.20 FTE HD, new comm
NICS 0.15 FTE 0.15 FTE Comm Accts
UNC 0.25 FTE 0.25 FTE Code Discovery
SGW Area Coordination [0.99 FTE: 0.49 FTE @ SDSC, 0.50 FTE @ ANL]
The SDSC area director for community engagement will work to address the high-level needs of the
Science Gateways and other new communities. The director will identify common needs across projects
and work with the other area directors to prioritize meeting these needs. The Gateways will be working
to extend prototype interfaces to TeraGrid resources so that broad science communities, where users
may number in the thousands, will benefit from access to the TeraGrid. The area director will work to
meet all objectives for science gateways. Argonne National Laboratory will provide Stuart Martin as a
source of expertise to the TeraGrid Science Gateways Area Director. Stuart will oversee the Gateway
Targeted support program, helping gateway developers overcome technical issues that delay them in
their use of TeraGrid as a production resource. Stuart will also provide advice on the gateway general
SGW - Helpdesk, new communities [0.20 FTE @ NCSA]
Provide helpdesk support for production science gateways by answering user questions, routing user
requests to appropriate gateway contacts, and tracking user responses. Provide input for the knowledge
base to improve gateway helpdesk services.
SGW - Community Accounts [0.25 FTE: 0.15 FTE @ NICS, 0.10 FTE @ SDSC]
Continuation of the Yr5 effort. Yr5 developed a standard implementation for community accounts with a
single implementation deployed as an exemplar. During the Extension there is a need to support
additional gateways with deployment of the standard implementation.
SGW - Gateways Code Discovery [0.25 FTE @ UNC]
Yr4/5 developed the Code Discovery implementation – a means for determining available software at a
SGW analogous to command-line tools on the TG. During the Extension there is a need to support
additional gateways with deployment of this implementation.
SGW - Targeted Support Coordination [0.20 FTE @ SDSC]
The gateway targeted support program provides assistance to gateways wishing to integrate TeraGrid
resources. The process for requesting support will be clearly defined, and staff members will know what
they are working on and when. Lessons learned will be included in general gateway documentation and
case studies. Outreach will be conducted to make sure that underrepresented communities are aware
of the targeted support program.
SGW - Gateway Documentation [0.25 FTE @ SDSC]
There are continuing needs for gateway documentation and tutorials.
SGW - Gateway Targeted Support [9.0 FTE: distribution TBD as part of planning process; need
prioritized list of projects]
Need to identify communities that will receive support during the Extension, or identify a
process/timeline to identify targeted support at some future date.
NancyW-D needs to provide text for proposal describing high priority items likely to be funded and refer
to process by which things will be decided upon (annual planning process)
[[earmark UC 1.0 FTE for gateway support]]
There are some activities that will need special handling due to complicating factors.
Quality Assurance/Common User Environment
While this is an important emerging area for the TeraGrid, how we will handle this in our extension
proposal is highly complicated by the decision by NSF to proceed with the XD/TAIS activity according to
the original XD schedule. This means that an XD/TAIS activity will start in April 2010. This will
potentially have significant overlap with QA/CUE (and the TG Software Integration Area and operations
verification and testing!).
Site: GIG-funded effort QA/CUE Effort Notes
GIG 0.45 FTE 0.20 FTE (PSC, CUE Lead);
0.25 FTE (NCSA, QA Lead)
ANL 0.50 FTE 0.25 FTE QA; 0.25 FTE CUE
SDSC 1.0 FTE 0.50 FTE CUE; 0.50 FTE QA
TACC 1.0 FTE 0.50 FTE CUE; 0.50 FTE QA
NCSA 0.76 FTE 0.38 FTE QA; 0.38 FTE CUE
LONI 0.50 FTE 0.25 FTE QA, 0.25 FTE CUE
IU 0.50 FTE 0.25 FTE QA; 0.25 FTE CUE
Purdue 0.50 FTE 0.25 FTE QA; 0.25 FTE CUE
PSC 1.0 FTE 0.50 FTE CUE; 0.50 FTE QA
NICS 1.0 FTE 0.50 FTE CUE; 0.50 FTE QA
QA/CUE - Group Leadership [0.45 FTE]
Working group leads Doru Marcusiu and Shawn Brown.
Common User Environment [3.38 FTE: 0.25 @ ANL, 0.25 @ IU, 0.25 @ LONI, 0.38 @ NCSA, 0.50
@ NICS, 0.50 @ PSC, 0.25 @ Purdue, 0.50 @ SDSC, 0.50 @ TACC]
Users should be able to move relatively easily among TG systems and remain productive; too little
coordination hampers user productivity by introducing unnecessary overhead and sources of error.
However, diversity of resources is a strength of TG and so excessive, unnecessary coordination is an
obstacle to scaling and to using each resource's specific abilities to the fullest, which will include learning
some resource-specific tools and (possibly) policies.
Quality Assurance [3.38 FTE: 0.25 @ ANL, 0.25 @ IU, 0.25 @ LONI, 0.38 @ NCSA, 0.50 @ NICS,
0.50 @ PSC, 0.25 @ Purdue, 0.50 @ SDSC, 0.50 @ TACC]
This activity will function for 3 months of the Extension to aid transition to XD-TAIS.
Software Integration and Information Services
Site     GIG-funded effort    SIIS Effort Notes
GIG      0.73 FTE             Area Coord: 0.35 FTE @ ANL, 0.38 FTE @ UC
ANL      0.57 FTE             0.02 FTE Operate Inf Svcs, 0.10 FTE Maint Kits, 0.15 FTE GIG-Pack, 0.30 FTE IS
SDSC     0.85 FTE             0.70 FTE Metascheduling, 0.15 FTE IS Enhance
TACC     1.45 FTE             0.10 FTE Operate Inf Svcs, 0.25 FTE Sched WG Coord, 0.10 FTE Maint Kits, 1.0 FTE Metascheduling
UC       3.41 FTE             0.02 FTE Operate Inf Svcs, 1.10 FTE Maint Kits, 1.19 FTE GIG-Pack, 0.80 FTE Metascheduling, 0.30 FTE IS Enhance
NCSA     0.40 FTE             0.05 FTE Operate Inf Svcs, 0.10 FTE Maint Kits, 0.15 FTE GIG-Pack, 0.10 FTE IS
NICS     0.30 FTE             0.30 FTE Metascheduling
SI - Area Coordination [0.73 FTE: 0.35 FTE @ ANL, 0.38 FTE @ UC]
This includes participation in the TeraGrid annual project review, ownership of Change Management
process, ownership of the CTSS process, participation in program plan and project implementation plan
development, participation in quarterly meetings, oversight and resolution of staffing issues in the
Software Integration area, coordination of sub-awards in the Software Integration area, chairmanship of
the TeraGrid Software working group, and coordinating the transition of Software Integration activities,
processes, products, and services to XD.
SI - Operate Infrastructure services [0.19 FTE: 0.02 FTE @ ANL, 0.05 FTE @ NCSA, 0.10 FTE @
TACC, 0.02 FTE @ UC]
Argonne National Laboratory, University of Chicago, TACC, and NCSA will provide the Software
Integration effort for operating TeraGrid’s infrastructure services. This work covers operating the
TeraGrid-wide component of TeraGrid’s Information Services (central index service and WebMDS service at
University of Chicago and a set of redundant services at a commercial hosting service), operating the
TeraGrid-wide component of TeraGrid’s Build and Test Service (a Metronome service at University of
Chicago), and operating centralized scheduling services for TeraGrid.
SI - Scheduling Working Group Coordination [0.25 FTE @ TACC]
The purpose of this work is to maintain TeraGrid's existing metascheduling and coscheduling capabilities
in support of TeraGrid users who require cross-site scheduling capabilities for their work.
SI - Maintain Seven Current CTSS Kits [1.30 FTE: 0.10 FTE @ ANL, 0.10 FTE @ NCSA, 0.10 FTE @
TACC, 0.80 FTE @ UC, 0.20 FTE @ UWisc]
Argonne National Laboratory, NCSA, TACC, and University of Chicago will provide the TeraGrid Software
Integration CTSS maintenance effort. This team will respond to help desk tickets concerning existing
CTSS capability kits, debug software issues (including but not limited to defects), and work with software
providers to resolve software defects.
SI - Package Software [1.49 FTE: 0.15 FTE @ ANL, 0.15 FTE @ NCSA, 0.89 FTE @ UC, 0.30 FTE @
Argonne National Laboratory, NCSA, and University of Chicago will provide the TeraGrid Software
Integration Packaging effort. The “GIG-Pack” team is collectively responsible for generating:
-- rebuilds of software components on TeraGrid resources to address security vulnerabilities;
-- new builds of software components across all TeraGrid resources to implement new CTSS kits;
-- new builds of software components to allow their deployment and use on new TeraGrid resources.
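As a concrete illustration of the rebuild responsibilities listed above, the sketch below shows a driver that rebuilds one component across a set of resources. Everything in it (the resource names, the `build_package` helper, the artifact naming) is hypothetical and stands in for whatever build tooling the GIG-Pack team actually uses.

```python
# Hypothetical sketch of a GIG-Pack-style rebuild driver. Resource names,
# package names, and the build helper are illustrative only; they do not
# reflect the actual GIG-Pack tooling.

def build_package(package, version, resource):
    """Stand-in for a per-resource build step; returns an artifact name."""
    return f"{package}-{version}.{resource}.tar.gz"

def rebuild_everywhere(package, version, resources):
    """Rebuild one component on every resource (e.g. for a security fix)."""
    artifacts = {}
    for resource in resources:
        artifacts[resource] = build_package(package, version, resource)
    return artifacts

resources = ["abe.ncsa", "ranger.tacc", "kraken.nics"]  # illustrative names
artifacts = rebuild_everywhere("gsi-openssh", "5.2p1", resources)
for resource, artifact in sorted(artifacts.items()):
    print(resource, "->", artifact)
```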
SI – Metascheduling [2.80 FTE: 0.30 FTE @ NICS, 0.70 FTE @ SDSC, 1.0 FTE @ TACC, 0.80 FTE @ UC]
What activities continue into the Extension? What hardware is continuing into the Extension? Is there a
large effort here for the interactive computing effort?
[jtowns: claim we will deploy on x86 architectures and further pursue OSG integration, might be
SI - Information Services Enhancements [0.85 FTE: 0.30 @ ANL, 0.10 @ NCSA, 0.15 FTE @ SDSC,
0.30 FTE @ UC]
Should the only effort here be to assist new resources with implementing the TG Information Services?
New resources in the Extension will include T2D and XD-Hardware.
Data and Visualization
Site     GIG-funded effort    DV Effort Notes
GIG      0.75 FTE             Area Coord: 0.75 FTE @ TACC
ANL      0.50 FTE             0.50 FTE Viz Supp
SDSC     0.50 FTE             0.50 FTE Lustre-WAN
NCAR     0.65 FTE             0.65 FTE Viz Supp
TACC     0.50 FTE             0.50 FTE Lustre-WAN
UC       1.0 FTE              1.0 FTE Viz Supp
NCSA     0.50 FTE             0.50 FTE Lustre-WAN
IU       0.50 FTE             0.50 FTE Lustre-WAN
PSC      1.0 FTE              0.50 FTE Data Move Perf, 0.50 FTE Lustre-WAN
NICS     0.50 FTE             0.50 FTE Lustre-WAN
DV- Area Coordination [0.75 FTE @ TACC]
DV Area Director.
DV- Archive Replication
What can we do here that complements the activity in the supplemental request?
Need to support the archive service.
Need to do transfer optimization.
[[NOTE: no budget allocated here as yet…]]
DV- Visualization [2.15 FTE: 0.50 @ ANL, 0.65 @ NCAR, 1.0 FTE @ UC]
Do we continue Viz Gateway efforts – what has usage been thus far and how long has the gateway been
available to users? Are we providing general Viz user support here, or is that being done in AUS?
DV- Data Movement Performance [0.50 FTE @ PSC]
We will implement additional DMOVER client utilities, performance instrumentation, and scheduler
interfaces; co-scheduling between DMOVER installations to incorporate DMOVER capabilities at both
source and target locations; and work with additional external partners (e.g., DEISA, NAREGI, UK E-
Science) to deploy DMOVER off-TeraGrid for improved data movement capability between remote
external sites and TeraGrid sites.
We will also enhance HPN-SSH to parallelize transfers with automatic load-balancing across streams;
implement additional instrumentation options in HPN-SSH to permit measurement and monitoring of
performance and transfer behaviors; and incorporate these enhancements into GSI-OpenSSH.
DV – Lustre-WAN Deployment:
Add 0.50 FTE and $100k @ PSC, NCSA, IU, NICS, TACC; +0.50 FTE @ SDSC
Networking, Operations, Security
Site     GIG-funded effort    NOS Effort Notes
GIG      0.70 FTE             Area Coord: 0.70 FTE @ NCSA
ANL      0.50 FTE             0.50 FTE Network Lead
SDSC     1.10 FTE             0.10 FTE Network LA Hub, 1.0 FTE INCA
NCSA     3.75 FTE             1.0 FTE Secure Access, 0.50 FTE Security Svcs, 0.25 FTE Ops Instrumentation, 2.0 FTE TOC Svcs
PSC      0.50 FTE             0.50 FTE Security Team
NOS - Area Coordination [0.70 FTE @ NCSA]
The NOS Area Director coordinates the working groups and teams in his area, including: networking-wg,
operations-wg, security-wg, TG incident response, TOC, CUE and QA teams, instrumentation and INCA.
The Area Director ensures these activities progress in a coordinated fashion, and that they are linked in
with groups under other areas. In PY5 the NOS AD will also prepare for the operational transition to the
new XD team.
NOS - Networking Lead [0.50 @ ANL, 0.10 @ SDSC]
Argonne National Laboratory will provide Linda Winkler, who will bring expertise to the TeraGrid
Networking, Operations and Security Area Director for the Networking Team effort. This will include coordination of
the networking working group on network maintenance, networking contracts and communication with
sites wishing to connect to TeraGrid. Linda will also be responsible for architectural planning for the
TeraGrid network and work with the NOS area director and project leadership on networking issues.
NOS - Expanding Secure TG Access [1.0 FTE @ NCSA]
How much work remains for this in the Extension? Yr5 said “We will produce a TeraGrid identity
management infrastructure that interoperates with campus cyberinfrastructure and other grids for a
transparent user experience.”
NOS - Security Services (kerb, myproxy, CA) [0.50 FTE @ NCSA]
TeraGrid depends on a set of centralized, core services including a mechanism for obtaining X.509
credentials for PKI authentication, single sign-on across resources provided by the MyProxy service,
and the Kerberos realm for sign-on access to the TGUP.
NOS - Operational Instrumentation (device tracking) [0.25 FTE @ NCSA]
Support existing tools and take care of reporting activities? There will be a need to incorporate the new
resources (T2D and XD-Hardware) into these tools.
NOS - INCA Improvements [1.0 FTE @ SDSC]
SDSC will continue to manage and maintain the Inca monitoring deployment on TeraGrid, including
writing and updating Inca reporters (test scripts), configuring and deploying reporters to resources,
archiving test results in a Postgres database, and displaying and analyzing reporter data in Web status
pages. We will work with RP administrators to troubleshoot detected failures on their resources and
make improvements to existing tests and/or their configuration on resources. In addition, we plan to
write or wrap any new tests identified by TeraGrid working groups or CTSS kit administrators and deploy
them to TeraGrid resources. We will modify Web status pages as CTSS and other working group
requirements change. SDSC will continue to upgrade the Inca deployment on TeraGrid with new
versions of Inca (as new features are often driven by TeraGrid) and optimize performance as needed.
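A reporter of the kind described above is essentially a small test script that probes one capability and records pass/fail. The following is a hedged sketch of that pattern only; real Inca reporters are written against Inca's own reporter libraries, and the reporter name and probe shown here are hypothetical.

```python
# Minimal sketch of a monitoring "reporter" in the style Inca uses:
# run one probe, record pass/fail plus a message and timing.
# Illustrative only; not the actual Inca reporter API.
import shutil
import time

def run_reporter(name, probe):
    """Execute one probe function and return a small result record."""
    start = time.time()
    try:
        detail = probe()
        status = "pass"
    except Exception as exc:  # a failing probe becomes a 'fail' result
        detail = str(exc)
        status = "fail"
    return {"reporter": name, "status": status,
            "detail": detail, "seconds": round(time.time() - start, 3)}

def client_present():
    """Example probe: is a given data-movement client on this resource?"""
    path = shutil.which("globus-url-copy")
    if path is None:
        raise RuntimeError("globus-url-copy not found on PATH")
    return path

result = run_reporter("ctss.data-movement.client", client_present)
print(result["reporter"], result["status"])
```

In the deployment described above, records like this would be archived to the Postgres database and surfaced on the Web status pages.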
NOS - TOC Services [2.0 FTE @ NCSA]
Continue the TOC and 800 number during the Extension. The TOC is heavily leveraging the NCSA 24x7
NOS - Security Team Lead [0.50 FTE @ PSC]
TeraGrid Incident Response Team Leadership (Marsteller): Coordination of TeraGrid weekly Incident
Response calls. Weekly call for all TeraGrid site incident response personnel to communicate current
security attacks, assess threat levels for newly discovered vulnerabilities and develop response plans to
User Support
Site     GIG-funded effort    User Supp Effort Notes
GIG      0.30 FTE             Area Coord: 0.30 FTE @ PSC
PSC      1.80 FTE             1.20 FTE User Engage, 0.60 FTE Best Practices
User Engagement [1.20 FTE @ PSC]
Coordinate the TeraGrid process for developing, administering, and reporting the TeraGrid user survey.
Extract, report, and act upon continuous feedback via the User Champions program, the TeraGrid’s
primary tool for proactive engagement with Research-allocation-level user groups, and via the Campus
Champions program, which is aimed at Startup and Education grants and at actively fostering diversity
in terms of both fields of science and demographics. Organize alpha and beta testing of new
Share and Maintain Best Practices for Ticket Resolution across all RPs [0.60 FTE @ PSC]
Focus on providing users with a substantive clarification of the nature of the problem and the way
forward, quickly and accurately, and on the coordinated resolution of complex problems spanning RPs
and systems.
US - Area Coordination [0.30 FTE @ PSC]
Sergiu Sanielevici, the GIG AD for frontline user support coordination, will continue to lead the User
Services Working Group, which is the general forum for addressing cross-site user issues and sharing
best practices and technical information among RP support staff. He will coordinate frontline user
support with the other GIG areas, and steer US participation in the User Champion, Campus Champion,
and Pathways efforts. He will work with the area directors for NOS, AUS, and UFC in the newly formed
User Interaction Council, to coordinate all aspects of user support, engagement, and information.
User Facing Projects
Site     GIG-funded effort    UFP Effort Notes
GIG      0.93 FTE             Area Coord: 0.93 FTE @ SDSC
SDSC     3.05 FTE             1.75 FTE Core Services, 1.30 FTE Docs
TACC     3.30 FTE             1.30 FTE Core Services, 2.0 FTE User Portal
UC       1.0 FTE              1.0 FTE Web Site
NCSA     2.0 FTE              2.0 FTE Core Services
IU       1.30 FTE             1.30 FTE Knowledgebase
UFP - Area Coordination [0.93 FTE @ SDSC]
The User-Facing Projects and Core Services area encompasses a wide range of efforts to support the
technological infrastructure through which users interact with TeraGrid. These efforts include
centralized mechanisms for user access, the allocations process and allocations management, RP
integration and information sharing, user information presentation, and user information production
and organization. Specific mechanisms for accomplishing these goals include the TeraGrid User Portal
and web site; TeraGrid-wide documentation and knowledgebase; the TGCDB and AMIE accounting
infrastructure; the current set of procedures and processes for bringing users into the TeraGrid
environment, establishing their identity, and making allocations and authorization decisions for use of
TeraGrid resources (also known as “Core Services”); the TeraGrid allocations and accounting monitoring
Web site; and development efforts to improve these mechanisms and meet the needs of users and
TeraGrid management. The Area Director is also responsible for reporting of TeraGrid-wide user and
resource utilization metrics.
Core Services [5.05 FTE: 1.75 FTE @ SDSC, 2.0 FTE @ NCSA, 1.30 FTE @ TACC]
Web Site [1.00 FTE @ UC]
Not in IPP???
User Portal [2.00 FTE @ TACC]
Not in IPP???
Knowledgebase [1.30 FTE @ IU]
Documentation [1.30 FTE @ SDSC]
This objective ensures that users are provided with current, accurate information from across the
TeraGrid in a dynamic environment of resources, software and services.
Management, Finance, Administration
Site     GIG-funded effort    MFA Effort Notes
GIG      1.0 FTE              GIG Director
ANL      0.50 FTE             0.50 FTE Admin Supp
UC       2.81 FTE             1.0 FTE GIG Dep Dir, 0.50 FTE Dir of Science, 1.25 FTE Fin Mgmt, 0.06 FTE Advisory
MFA - TeraGrid GIG Director [1.00 FTE @ UC]
MFA - Deputy Director to TeraGrid GIG Director [1.00 FTE @ UC]
MFA - GIG Director of Science [0.50 FTE @ UC]
Dan Katz, TG Director of Science.
MFA - GIG Directorate Admin Support [0.50 FTE @ ANL]
Carolyn Peters, administrative support.
MFA - Financial Management [1.25 FTE @ UC]
UC Business Office
MFA – Advisory [0.06 FTE @ UC]
Ian Foster, PI for TG GIG.