                 USING THE MIDAS IT INFRASTRUCTURE
     The National Institute of General Medical Sciences (NIGMS)
            Models of Infectious Disease Agent Study (MIDAS)
                            User Manual
                            Version 6.0


                       By the MIDAS Informatics Group:
                                             RTI, IBM, SAS




                                            March 19, 2007
Preface
 This User Manual documents the information technology infrastructure and support services
 available to participants in the MIDAS project from the MIDAS Informatics Group. It is designed
 to help MIDAS participants find the services they need. Because the variety and nature of this
 support will evolve with changing MIDAS partners' needs, and especially as MIDAS takes on
 new challenges, this will be a living document. We solicit feedback from MIDAS users to
 identify missing or confusing information so the document can be improved.
 If you have suggestions, problems, or concerns, we want to hear from you. Please e-mail your
 comments to Philip Cooley, Informatics Team Leader, RTI.

 If you need IT help with MIDAS, send e-mail to the MIDAS Help Desk.
TABLE OF CONTENTS
1 OVERVIEW OF THE MIDAS IT INFRASTRUCTURE ................................. 6
    1.1      SCOPE ......................................................................................................... 6
    1.2      ABOUT THIS MANUAL....................................................................................... 7
    1.3      PLANS AND SCHEDULE ...................................................................................... 8
    1.4      KEY PARTICIPANTS .......................................................................................... 8
2 KNOWLEDGE MANAGEMENT AND EDUCATION – THE PORTAL ................... 9
    2.1      PUBLIC SERVICES ............................................................................................ 9
     2.1.1      About Tab........................................................................................................ 9
     2.1.2      Calendar Tab ................................................................................................... 9
     2.1.3      Publications Tab ............................................................................................. 10
     2.1.4      Links to Public Sites, News and a MIDAS Contact .............................................. 10
     2.1.5      Login to the private site .................................................................................. 10
    2.2      PRIVATE SERVICES ........................................................................................ 11
     2.2.1 Home Tab...................................................................................................... 11
     2.2.2 My Profile Tab ................................................................................................ 11
     2.2.3 Calendar Tab ................................................................................................. 11
     2.2.4 Forum Tab ..................................................................................................... 11
     2.2.5 Documents Tab .............................................................................................. 11
     2.2.6 Publications Tab ............................................................................................. 11
     2.2.7 Models Tab .................................................................................................... 11
     2.2.8 Data Tab ....................................................................................................... 11
3    THE LINUX CLUSTER ENVIRONMENT ................................................ 15
    3.1      TECHNICAL SUMMARY .................................................................................... 15
     3.1.1      Hardware ...................................................................................................... 15
     3.1.2      List of available software ................................................................................. 16
    3.2      COMPUTER RESOURCES .................................................................................. 17
     3.2.1      Allocation of Computer Resources .................................................................... 17
     3.2.2      Standard Allocation......................................................................................... 17
     3.2.3      Applying for Additional Resources .................................................................... 18
     3.2.4      Moab Access Portal......................................................................................... 18
     3.2.5      Establish accounts and password management ................................................. 18
     3.2.6      Track allocations ............................................................................................ 19
    3.3      ACCESS THE CLUSTER ..................................................................................... 19
     3.3.1      Connecting to the cluster via VNC .................................................................... 19
     3.3.2      Running X-Windows Applications ..................................................................... 24
     3.3.3      Model Execution ............................................................................................. 27
     3.3.4      File Transfers ................................................................................................. 27
     3.3.5      Terminal Access ............................................................................................. 28
     3.3.6      Interactive Sessions Using qsub ....................................................................... 28
    3.4      FILE SYSTEMS AND STORAGE ............................................................................ 28
     3.4.1      Home Directories............................................................................................ 28
     3.4.2      Scratch Directories ......................................................................................... 29
     3.4.3      Local scratch space ........................................................................................ 29
     3.4.4      /tmp, /usr/tmp, and /var/tmp Directories ......................................................... 30



77b63587-bd68-4731-a5f7-ebca3c46b609.doc                                                                        2/1/2013
    3.5      FUTURE ENHANCEMENTS ................................................................................. 30
     3.5.1      Reserved Scratch Directories ........................................................................... 30
     3.5.2      Permanent File Storage ................................................................................... 30
    3.6      RUNNING JOBS ............................................................................................. 30
     3.6.1      Running MPI programs ................................................................................... 30
     3.6.2      Running Intel C++ programs .......................................................................... 30
     3.6.3      Running Java Programs .................................................................................. 30
     3.6.4      Running R programs ....................................................................................... 31
     3.6.5      Running SAS programs ................................................................................... 31
     3.6.6      Running ArcExplorer ....................................................................................... 31
     3.6.7      Job Queues.................................................................................................... 32
     3.6.8      Disk space for batch jobs ................................................................................ 34
    3.7      DEVELOPMENT UTILITIES AND COMPILERS ........................................................... 34
     3.7.1      Editors........................................................................................................... 34
     3.7.2      Default Compilers ........................................................................................... 34
     3.7.3      Libraries and Application Software ................................................................... 35
    3.8      CVS .......................................................................................................... 36
     3.8.1 Initial check out ............................................................................................. 36
     3.8.2 Get updates ................................................................................................... 37
     3.8.3 Look at change log ......................................................................................... 37
     3.8.4 Look at Differences in Versions ........................................................................ 37
     3.8.5 Get CVS Information about Local Files.............................................................. 38
     3.8.6 Add New Files and Directories ......................................................................... 38
     3.8.7 Commit Changes ............................................................................................ 38
     3.8.8 Other CVS Tools and Concepts ........................................................................ 38
4    MODEL DEVELOPMENT .................................................................. 39
    4.1      MOVING TO LINUX ........................................................................................ 39
    4.2      PARALLELIZING CODE ..................................................................................... 39
    4.3      FIXING CODE/PERFORMANCE TUNING ................................................................. 39
     4.3.1      Timers ........................................................................................................... 39
     4.3.2      Profilers ......................................................................................................... 40
     4.3.3      Debugging ..................................................................................................... 41
     4.3.4      Fatal Errors .................................................................................................... 41
     4.3.5      Java Optimization ........................................................................................... 42
    4.4      MODEL VALIDATION ...................................................................................... 42
    4.5      SETTING UP PRODUCTION RUNS – SCRIPTS........................................................... 42
    4.6      MODEL ENHANCEMENT ................................................................................... 43
     4.6.1 Other Diseases ............................................................................................... 43
     4.6.2 Developing a general model ............................................................................ 43
     4.6.3 Documenting Models ...................................................................................... 43
5    MODEL AND DATA REPOSITORIES ................................................... 44
    5.1      COMPUTE SERVERS........................................................................................ 44
     5.1.1      Cluster and job queuing system (combined) ..................................................... 44
     5.1.2      Cluster management system ........................................................................... 44
     5.1.3      Server for large serial applications ................................................................... 44
     5.1.4      Spillover capacity ........................................................................................... 45
    5.2      LARGE FILE SYSTEM ...................................................................................... 45


    5.3      DATABASE SYSTEM ........................................................................................ 45
     5.3.1      Metadata used to query results........................................................................ 45
     5.3.2      Metadata ....................................................................................................... 45
    5.4      VERSION CONTROL SYSTEM ............................................................................. 47
    5.5      BUG REPORT AND CHANGE TRACKING ................................................................ 47
    5.6      MIDAS USER INTERFACE (UI) ......................................................................... 47
    5.7      GEODATABASE METADATA EXPLORER ................................................................. 48
     5.7.1 Accessing the GMS ......................................................................................... 48
     5.7.2 Using the GME ............................................................................................... 48
     5.7.3 Using GME Query Results ................................................................................ 49
     5.7.4 Retrieving Geospatial Data from the MIDAS Geospatial Database ....................... 52
6    VISUALIZATION TOOL................................................................... 53
    6.1      INPUT ........................................................................................................ 53
     6.1.1      Scope ............................................................................................................ 53
     6.1.2      Conformance ................................................................................................. 53
     6.1.3      Normative References ..................................................................................... 54
     6.1.4      UML Schema .................................................................................................. 54
     6.1.5      XML Schema .................................................................................................. 56
    6.2      OUTPUT ..................................................................................................... 58
    6.3      GETTING STARTED ........................................................................................ 58
    6.4      RUNNING OUTBREAK ..................................................................................... 58
    6.5      LEGEND ...................................................................................................... 59
7 GEOGRAPHIC INFORMATION SYSTEM (GIS) TOOLS ............................ 61
    7.1      GEOSPATIAL DATA AND DATA DEVELOPMENT ....................................................... 61
      7.1.1      Geospatial Data Concepts ............................................................... 61
      7.1.2      Geospatial Metadata ....................................................................... 61
      7.1.3      Geospatial Data Acquisitions ............................................................ 62
      7.1.4      Geospatial Data Acquisition Goals and Objectives .............................. 62
      7.1.5      Geospatial Data Acquisition Scope ................................................... 62
      7.1.6      Geospatial Data Formats Available to MIDAS .................................... 62
    7.2      GEOSPATIAL DATA STORAGE AND MAINTENANCE PLAN............................................ 63
     7.2.1      Geospatial Data on the MIDAS Portal ............................................................... 63
     7.2.2      Geospatial Data on the MIDAS Linux Cluster ..................................................... 64
    7.3      CUSTOM GEODATABASE DEVELOPMENT, INTEGRATION AND PROCESSING ..................... 64
    7.4      GIS APPLICATIONS DEVELOPMENT .................................................................... 64
    7.5      MODEL PARAMETERIZATION ............................................................................. 65
     7.5.1      Creating Synthetic Agent Populations ............................................................... 65
    7.6      GIS TECHNICAL SUPPORT/TRAINING ................................................................. 66
     7.6.1      Informal Support/Consulting............................................................................ 66
     7.6.2      Formal Training .............................................................................................. 66
    7.7      WEB-BASED GIS MAPPING .............................................................................. 66
     7.7.1      Model Output Visualization .............................................................................. 67
     7.7.2      Geospatial Data Selection ................................................................................ 67
     7.7.3      Geospatial Exploratory Data Analysis ................................................................ 67
    7.8      GIS APPLICATIONS SOFTWARE ......................................................................... 67
     7.8.1      ArcExplorer .................................................................................................... 68



     7.8.2      GRASS........................................................................................................... 68
     7.8.3      SPRING ......................................................................................................... 68
    7.9      PRELIMINARY LIST OF GIS TECHNOLOGIES AVAILABLE ............................................ 68
     7.9.1 Displaying, Visualizing, Analyzing GIS Data Outside Models ................................ 68
8    APPENDICES............................................................................... 70
    8.1      OUTBREAK TERMS AND DEFINITIONS ................................................................. 70
    8.2      OUTBREAK CONVENTIONS ............................................................................... 72
    8.3      STATE, REGIONAL, LOCAL GIS DATA SOURCES ..................................................... 73
9 PROJECT CONTACTS ..................................................................... 85
    9.1      ANYLOGIC™ SOFTWARE ................................................................................. 85
    9.2      CLUSTER .................................................................................................... 85
    9.3      HIGH PERFORMANCE COMPUTING ...................................................................... 85
    9.4      INFORMATICS GROUP ..................................................................................... 85
     9.4.1      IG Contacts.................................................................................................... 85
    9.5      METADATA SERVER ....................................................................................... 85
    9.6      MOAB AND MAP.......................................................................................... 85
    9.7      MODEL COMPARISON ..................................................................................... 85
    9.8      MIDAS MODEL REPOSITORY ........................................................................... 85
    9.9      OUTBREAK VISUALIZATION TOOL ...................................................................... 86
    9.10     MIDAS PORTAL ........................................................................................... 86
    9.11     SPILLOVER COMPUTING CAPACITY ..................................................................... 86
    9.12     STATE PREPAREDNESS ASSESSMENT ................................................................... 86
    9.13     SYNTHETIC POPULATIONS ............................................................................... 86
    9.14     USER MANUAL ............................................................................................. 86
    9.15     VALIDATION ................................................................................................ 86
    9.16     RESEARCH GROUP CONTACTS........................................................................... 86
10        JAVA OPTIMIZATION ................................................................. 87
    10.1     JAVA RUNTIME OPTIONS ON THE MIDAS CLUSTER ................................................. 87
    10.2     LESSONS LEARNED FROM THIS EXERCISE:............................................................. 88
    10.3     RESULTS (EACH STOPPED AFTER 2 BILLION INVOCATIONS) ....................................... 89




TABLE OF FIGURES
Figure   1.1 Life cycle of model from model developer point-of-view......................................... 7
Figure   1.2 Possible model use perspective by future research participants .............................. 7
Figure   2.1 The MIDAS Public Portal...................................................................................... 9
Figure   2.2 The MIDAS Private Portal .................................................................................. 10
Figure   3.1 One of the two MIDAS Linux Cluster Installations (with IBM Part Numbers) ........... 16
Figure   5.2 The GME query interface ................................................................................... 48
Figure   5.3 Results of a U.S. search for “Admin & Political Bounds” data ................................ 49
Figure   5.4 “View Details” results for the Midas.DBO.NPTS_Tracts layer ................................. 50
Figure   5.5 Bottom of the Details page ................................................................................ 51
Figure   5.6 Coverage Area for the Midas_DBO.NPTS_Tracts layer. Because the green
             box covers Alaska and Hawaii, the user can surmise that the data layer
             contains census tracts for those states. .............................................................. 51
Figure   5.7 A portion of a full metadata record for the Midas.DBO.NPTS_Tracts layer ............. 52
Figure   6.1 OutBreak main map screen ............................................................................... 59
Figure   7.1 PUMAs for a portion of North Carolina. PUMA 02900 covers two counties,
             whereas PUMAs 02801 and 02802 each cover a portion of a county. .................... 66
Figure   7.2 Example Web-based geospatial application that allows users to enter new
             locational data by pointing and clicking in an Internet browser window ................. 67

TABLE OF TABLES
Table   3.1   Summary of software available on the MIDAS cluster............................................ 17
Table   7.1     Geospatial formats available for data in the MIDAS repository ............................. 63
Table   7.2   Spatial Database Products. ................................................................................. 68
Table   7.3   Internet Map Servers. ........................................................................................ 68
Table   7.4   Desktop GIS Products. ....................................................................................... 69
Table   7.5   Tools for Spatial Statistics................................................................................... 69




1 Overview of the MIDAS IT Infrastructure
   The Models of Infectious Disease Agent Study (MIDAS) is funded by the National Institute of
   General Medical Sciences (NIGMS) of the National Institutes of Health (NIH) for the purpose
   of improving the nation’s ability to respond to biological threats promptly and effectively. The
   two main objectives are to develop computational simulation tools and to establish a
   centralized informatics resource to store, display and make publicly available the tools and
   information developed, for the practical use of policymakers, public health professionals and
   researchers.

   MIDAS is a network of collaborative groups composed of several Research Groups and one
   Informatics Group. RTI International leads the Informatics Group that supports the
   organizational, management, and technical computational infrastructure for MIDAS. IBM and
   SAS support RTI in this endeavor.

   MIDAS is heavily dependent on Information Technology (IT) because of the nature of its high-
   powered data analysis and simulation models. NIGMS foresaw the advantage of shared IT
   resources that would allow a more efficient and cost-effective accomplishment of the MIDAS
   goals. The MIDAS Informatics Group works closely with the MIDAS Research Groups to make
   sure that appropriate computational support is developed for the project. The development of
   this computational infrastructure will be a continuous process based on requests, lessons
   learned, new needs, and budget realities. A MIDAS IT Infrastructure Governance Process has
   been established to make sure that a fair, open and rational decision-making mechanism is in
   place.

   Given that MIDAS is an organization composed of many participants with diverse levels of
   computational sophistication, we hope this User Manual can ease and speed up access to the
   valuable resources and services made available because of NIGMS’s investment in a shared
   IT infrastructure. Our view is that time spent looking for support is time not spent on the
   primary function of MIDAS, namely that of improving our ability to understand epidemics. This
   document is intended to be easy to use, even for those who are new to MIDAS or who are
   only part-time or temporary participants.

   The MIDAS Informatics Team continues to develop other communications methods, including
   e-mail notices, a help line, online forums and specialized user guides. Additional suggestions
   for information dissemination methods are welcome.

     1.1   Scope
   The MIDAS IT infrastructure provides three overlapping services, although it is expected to
   evolve and serve additional functions over the course of the project lifetime. Current services
   are to:

   Explore and enhance model performance. By providing a significant amount of
   homogeneous hardware that can be used in parallel, MIDAS decreases the average time per
   run, making systematic exploration of the parameter space more feasible. By helping the
   Research Groups improve the performance of their models, the Informatics Group enables
   faster analyses. By providing a supportive infrastructure around these resources, the
   Informatics Group simplifies the use of these models.
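   As an illustration of the kind of parameter-space exploration the parallel hardware enables,
   a sweep over several values of a model parameter can be submitted as a set of independent
   batch jobs. The script name (run_model.sh), the queue name (batch), and the parameter (R0)
   below are hypothetical placeholders; consult Chapter 3 (Job Queues, qsub) for the actual
   MIDAS queue names and submission options.

   ```shell
   #!/bin/sh
   # Sketch of a parameter sweep: one independent batch job per parameter value.
   # "run_model.sh" and the "batch" queue are illustrative placeholders.
   for r0 in 1.4 1.6 1.8 2.0; do
       # -v passes the parameter to the job script as an environment variable.
       # echo makes this a dry run so the commands can be reviewed before use.
       echo qsub -q batch -v "R0=$r0" run_model.sh
   done
   ```

   Removing the echo would actually submit the four jobs. Because each job is independent,
   the whole sweep completes in roughly the time of a single run when enough nodes are free.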
   Support model development and results analysis. MIDAS resources are used to support:
                Maintaining, analyzing and sharing empirical epidemiological data and social
                 model data (Chapter 5).
                Enhancing model systems to have more scientific value, to be more
                 computationally efficient, or to be easier to use (Chapter 4).
                Calibrating these models for new social networks or disease models
                 (Chapter 4).
                Analyzing, managing, manipulating, and distributing simulation output
                 (Chapters 6 and 7).
   These fit within the typical simulation model life cycle, as shown in Figure 1.1. Over time, we
   expect the MIDAS team to discover where common resources provide the most benefit.




   Figure 1.1    Life cycle of model from model developer point-of-view

   Provide training and help support capabilities. Eventually, as participants outside the
   original MIDAS Network begin taking advantage of the MIDAS models, other valuable services
   in support of less sophisticated or specific models might become available. MIDAS might be
   called on to make the models, and the analytic and data management tools, even easier to use
   and to provide additional education and support services. Figure 1.2 is a schematic of the
   various usage requirements for these non-MIDAS users.




   Figure 1.2    Possible model use perspective by future research participants

     1.2   About This Manual
   The chapters in this manual deal with the support areas available and how to access them.
   We expect the number of support areas to expand as the MIDAS Informatics Group and the
   MIDAS Research Groups learn from experience. The following information is currently
   available:
                Chapter 2: Knowledge management and education – the portal: MIDAS
                 communications support, knowledge sharing among MIDAS subgroups,
                 online education, etc.
                Chapter 3: Hosting and running models – importing, version control, running
                 models, retrieving results, etc.
                Chapter 4: Model development – enhancing and calibrating models, dealing
                 with performance issues, porting to new platforms, etc.
                Chapter 5: Data management – for model intermediate and final results, and
                 empirical epidemiological data.
                Chapter 6: Analytic and visualization facilities – including cluster-based
                 utilities.
                Chapter 7: Geographic Information Systems (GIS) facilities – data (human
                 and natural object spatial and descriptive) and tools for managing data and
                 mapping.
                Chapter 8: Appendices – procedures that guide the MIDAS project; glossaries.

     1.3   Plans and schedule
   Through Spring 2006, the MIDAS Informatics Group plans to:
              Double the cluster hardware capacity and provide dual head node login
              capabilities;
                Develop analytic tools, such as visualization tools and synthetic population
                generation tools;
                Evaluate and expand model development tools;
                Provide training;
                 Provide flexible access to other supercomputing facilities;
                Develop and populate a GIS data repository;
                Acquire additional data to support disease models of the US; and
                 Capture the production disease models, their results, inputs, and
                 documentation to establish a repository of models and their results.

     1.4   Key Participants
   Charter MIDAS study participants include:
              NIGMS – the study sponsor [Irene Eckstrand, Scientific Director, and James
              Anderson, Program Director]
                MIDAS Steering Committee – the group of advisors that establishes the goals
                and milestones that all participants should achieve [Bryan Grenfell, Chairman]
                Informatics Group – the group primarily responsible for computational support,
                data repositories, model enhancements, and user-driven models and tools [Diane
                Wagener, Principal Investigator]
                Research Groups
                o Virginia Polytechnic Institute [formerly Los Alamos National Laboratory] – the
                   social network modeling research group [Stephen Eubank, Principal
                   Investigator]
                 o Johns Hopkins University – the agent-based modeling research group [Don
                   Burke, Principal Investigator]
                o Emory University – the stochastic process modeling research group [Ira
                   Longini, Principal Investigator]




2 Knowledge Management and Education – The Portal
   This section describes the MIDAS portal, which serves as the interface for electronic
   information exchange (see Figure 2.1). The MIDAS portal is accessible to the public, but only
   registered users can access and provide information to the private section of the portal.

   Currently, there are two types of users for the private section: MIDAS researchers/staff and
   invited participants of MIDAS meetings. The registered users have been given user IDs and
   passwords to access the private area.




      Figure 2.1 The MIDAS Public Portal



     2.1   Public services
   The public component of the site is a public relations vehicle to advertise what MIDAS is and
   does. It also provides an entry point to the private site.

        2.1.1    About Tab
        Clicking on this tab (tabs run from left to right on the home page) displays a statement
        that defines the objectives of the MIDAS study.

        2.1.2    Calendar Tab
        A calendar of meetings and other events of importance displays under this tab on the
        public home page. The calendar is maintained by the MIDAS Portal Administrator.




        2.1.3    Publications Tab
        Published articles and book chapters relevant to MIDAS participants display under the
        Publications Tab. The page provides an alphabetic navigation bar to search by first
        author’s last name and subject of the abstract.
        For journal articles, participants may download PDF files attached to the citation or
        upload PDF files from remote sites directly into MIDAS. The “bibliography” can be
        updated with abstracts that are accessible through the National Library of Medicine’s
        Medline using a PUBMED ID.

        2.1.4    Links to Public Sites, News and a MIDAS Contact
         The links provided in the leftmost column of the home page identify useful public
         sites (Related Links), news items (News), and a point of contact (Contact) for the
         MIDAS project.

        2.1.5    Login to the private site
         The private portion of the MIDAS Portal is accessed by clicking the Login link and
         providing a username and password. Password management is the responsibility of the
         MIDAS user. User authentication is strictly maintained by the portal security
         software. The portal and the cluster use the same password for both systems. After a
         successful login, the user should see the screen displayed in Figure 2.2.
        For more information or to apply for a Portal account, e-mail the MIDAS Help Desk.




      Figure 2.2 The MIDAS Private Portal




     2.2   Private Services
    The private site is organized into nine tabs: Home, My Profile, Calendar, Forum,
    Documents, Publications, Models, Data and Statistics. The services, which are available
    to approved MIDAS researchers and staff, are described in this section.

        2.2.1    Home Tab
        This tab returns the user to the Private Home page, which includes links to other sites of
        interest and industry news. The MIDAS User Manual also displays from this page.

        2.2.2    My Profile Tab
        My Profile Tab allows the user to change his/her password and contact information. This
        information is used to generate the MIDAS roster.

        2.2.3    Calendar Tab
        See Section 2.1.2.

        2.2.4    Forum Tab
        The Forum or discussion board for cross-modeler communication provides a place for
        regular visitors to discuss topics relevant to MIDAS. Features include message posting,
        private messaging, and initiating new topic categories.

        2.2.5    Documents Tab
         The Documents page provides links to the investigator proposals, bi-yearly Steering
         Committee meeting minutes, monthly Network conference call minutes, and materials
         related to quarterly Network and Consultation meetings. Click on the links to view
         or download agendas, rosters, PowerPoint presentations, or materials distributed at
         the meetings.

        2.2.6    Publications Tab
        See Section 2.1.3.

        2.2.7    Models Tab
        This tab identifies models in the model repository that have been received from the
        modeling groups.

        2.2.8    Data Tab
        The Data tab leads to the page where users can store and annotate datasets, URLs and
        documents. These data sources are used to describe model parameters and the source
        of the data that generated the parameters. The Data page enumerates the contents of
        the sources of data currently collected. The four column headings presented on the
        screen are:
                Dataset name links to the metadata summary page. The purpose of this column
                is to present a common data view that can be queried. The dataset name is
                assigned by the individual uploading the dataset. The name should be chosen so
                that the reader can identify the content in the data from the name.
                Description displays the subject matter, year, source, and other essential
                information.
                Dataset uploaded file links to the raw data (Excel format) upon which the values
                viewed in the Dataset Name files are based.




                 URL links to a Web site that contains the raw data. Note that each dataset
                  will have either a Dataset Upload file entry or a URL entry.

        On the left side of the first Data page, there are three action options:
        Search all datasets allows a user to search the contents of the parameters represented
        in the Dataset Name list. The search can be used to identify parameters with different
        characteristics, for example, parameters that capture information on population studies
        that stratified Avian flu information by gender.

        View all datasets link returns the user to the full listing of datasets.

        Enter new datasets links to the metadata form to enter parameter data for a new dataset
        and upload the actual dataset to the server. This form, which feeds the information into a
        relational database, also allows the user to assign additional useful attributes to the data,
        which can be used as search keywords.

        The metadata are categorized in five main groups:
             Population information, which represents Census files or other specialized data
             sources that aggregate survey data for public use;
             Social Network information, including Daycare, School, Work, among others;
             Transportation information;
             Individual-level information, which comes from clinical trials, outbreak
             investigations, case-control studies or cross-sectional surveys. Classification is
             needed to characterize features of disease history, transmission, and treatment;
             Public health parameters, which include information on ways to contain the spread
             of disease both inside and outside the home and hospital.

        Each variable captured in the metadata form adheres to one of the following variable
        types: Yes, No, Not specified. This type is specified by clicking on a radio button.
        Population and Transportation have standard variable types that are defined from
        Census files or other specialized data sources. These data sources include Census
        statistics or demographic information from specific journal articles.

         Individual-level data most often involve categorizing the features of a published
         article. Examples of individual-level data come from clinical trials, outbreak
         investigations, case-control studies, or cross-sectional surveys. Classification is
         needed to characterize features of disease history, transmission, and treatment.

         Public health parameters include information on ways to contain the spread of
         disease both inside and outside the home and hospital.

        Specific fields captured in the metadata form are:

        General
             Dataset Name and Primary Contact (required): Dataset name formats the name
                as it will appear in the master listing of information on the portal page.
                Primary contact is the name of the person who completed the metadata form.
                This person can delete the file and modify the description content.
                 URL: Public datasets, typically government-sponsored data, are often
                  available for download from the sponsors' Websites. The URL links the
                  MIDAS user directly to the Website containing the data, so the user can
                  determine whether a download is needed.




                Citation: For some sources of data, such as parameter estimates, the entries are
                 available in peer-reviewed journals, typically in a table within a published article.
                 The citation identifies where, when and by whom these original data are
                 published.
                Countries
                Duration of Study

        Population Parameters
            Age
                Gender
                Race/Ethnicity
                Household Size
                Hospitalization
                Other Parameters
                Daycare
                Playgroup
                School
                Work
                Other

        Transportation Parameters
            Air
                Surface

        Individual Level Parameters
             Analytical Design
                Multi-center
                Simulation

         Other Fields:
         It is important to note whether the data are available electronically. If so, the
         data can be uploaded as a PDF file or as the actual dataset; these files then
         appear on the main Dataset page.

        Social Networks Parameters
            Number of Participants
                Transmission Probabilities
                Natural History Parameters
                Outcome Measures
                Disease
                Biological Specimens Collected
                Household Members Included
                Treatment/Intervention
                Efficacy Parameters


        Public Health Control Parameters
            Wearing Masks
                Isolation/Quarantine
                Closure of Public Places
                In-hospital Infection Control
                Targeted Antiviral Prophylaxis
                Surveillance/Containment
                Vaccination

        Other Parameters
            Comments




3 The Linux Cluster Environment
   The initial priority for the MIDAS Informatics Group is to provide resources for helping the
   Research Groups run their models on a powerful shared infrastructure. A significant amount of
    computational capability is being provided for running these models.

     3.1    Technical Summary
   The initial computational facility is a Linux cluster with significant expansion capability. The
   following is a technical overview of the initial cluster.

        3.1.1    Hardware
         The MIDAS model hosting system is deployed at RTI in Research Triangle Park, North
         Carolina. The initial architecture for running the MIDAS models is composed of twin
         clusters, each of which comprises:
              64 processors – 32 nodes, 25 with 5 GB of memory and 6 (nodes 025–030, 034-
                 039) with 12 GB of memory each. The application node (node 031) has 6 GB of
                 memory. Each node has two 2.4 GHz Opteron 64-bit addressable processors.
                 Shortly, 12 of the 5 GB nodes will be upgraded to 6 GB.
               1 management node – A dual-processor Intel 3.2 GHz Xeon with 8 GB Memory.
               1 storage node – 2.05 TB of hot-swappable SCSI disks operated with dual
                processor Intel 3.2 GHz Xeons with 8 GB memory.
               High speed interconnection for all nodes with Myrinet, which runs within each
                cluster but not across clusters.
              SUSE Linux Enterprise Server 9.3.
        The MIDAS Linux cluster architecture was selected after careful analysis of
        benchmarking studies using EpiSims, the largest and most complex of the original
        MIDAS Research Group models. Although EpiSims was designed to take advantage of
        the Linux architecture, the other two models (from Johns Hopkins and Emory) can be
        ported relatively easily to the Linux environment.
        The cluster uses high performance AMD Opteron processors as compute nodes because
        they can manage heavy memory usage through 64-bit addressing, which is attractive for
        large models. In addition, Opteron processors provide flexible access to a significant
        amount of relatively inexpensive computational resources to satisfy the growing needs of
        these three models.
   The layout of the two racks within each twin installation is shown in Figure 3.1.




Figure 3.1 One of the two MIDAS Linux Cluster Installations (with IBM Part Numbers)

        3.1.2    List of available software
        Table 3.1 lists the software available on the cluster. We will update this list as software is
        added. Software marked with an asterisk (*) is licensed and may have some use
        restrictions, so refer to the section in this document that is listed in the last column. Use
        the URL to get documentation on this feature.





Table 3.1 Summary of software available on the MIDAS cluster

   ArcExplorer 9.1 – Performs a variety of basic GIS functions, including display,
      query, and data retrieval.
      http://www.esri.com/software/arcexplorer/download.html [Manual: Chapter 7; Section 3.5.6]
   BOOST 1.32 – Free, peer-reviewed C++ source libraries.
      http://www.boost.org/libs/libraries.htm
   Breeze V4 – Video conferencing.
      http://www.macromedia.com/support/documentation/en/breeze/
   CVS 1.12.11 – Version control software.
      http://www.hpcc.ecs.soton.ac.uk/hpci/tools/cvs/
   GCC 3.4.3 – C++ compiler.
      http://gcc.gnu.org [Manual: Section 3.6.2]
   Intel 9.0 – C++ compiler.
      http://www.intel.com/software/products/compilers/clin/ [Manual: Section 3.5.2]
   Java 1.4.2 – Java compiler.
      http://java.sun.com/j2se/1.4.2/download.html [Manual: Section 3.5.3]
   MPICH 1.2.6 – Freely available, portable implementation of MPI, the standard for
      message passing libraries.
      http://www-unix.mcs.anl.gov/mpi/mpich/ [Manual: Sections 3.6.2 and 3.6.3]
   PathScale 2.2.1 – C++ compiler.
      http://www.pathscale.com/ekopath.html [Manual: Section 3.5.9]
   R 2.1.0 – Statistical software.
      http://www.r-project.org [Manual: Section 3.6.4]
   SAS 9.1 – SAS Basic, Stat, Graph.
      http://www.sas.com [Manual: Section 3.5.5]
   SUSE Linux 9 Service Pack 1 – Operating system (IA-32 compatible), kernel version 2.6.5.
   Visualization software – R, SAS, ArcExplorer. [Manual: Chapter 6]

      3.2       Computer Resources

          3.2.1      Allocation of Computer Resources
          The two multiple-processor computing clusters described in this chapter have been
          designed and developed for use by MIDAS researchers. In addition, access to several
          national computing centers has been arranged as secondary computing resources for
          use by MIDAS researchers on an as-needed basis.
          A Research Group’s access to the cluster is managed by a designated Account Manager,
          who will match individuals in the group with access to appropriate resources. For
          questions on allocations, passwords and related administrative issues, contact the
          MIDAS Help Desk.

          3.2.2      Standard Allocation

          Each of the Research Groups is given a standard allocation of CPU resources. Fifty
          percent of the total CPU resources available in the RTI cluster is set aside for “special




        allocation” requests. If the amount of special allocation requests is less than the amount
        available, the remaining CPU cycles are distributed equally across the groups.

        If a group does not use its allocated resources, they are shared equally among the other
        groups. Resources are allocated and available in such a manner as to maximize total
        usage. For example, if only one job is running on the system, then that job uses all
        available CPU cycles. Once another job is submitted, the first job decreases in priority in
        relation to resource allocations across the groups.

        Standard user allocation is 7,500 CPU hours per month. Disk space quotas are allocated
        based on a survey of user needs and amount of storage purchased.

        3.2.3    Applying for Additional Resources
        When the standard allocation does not meet user needs, the Research Group’s principal
        investigator can apply for extra resources for specific projects. A committee grants
        allocations on the basis of available resources and review of the priority proposed in the
        request. The committee consists of the principal investigators of the research groups, the
        NIGMS scientific director, and any other members selected by the MIDAS program
        officer. The application must be sent via email to the MIDAS Help Desk in the name of
        the principal investigator who is responsible for using the special allocation.

        Based on the committee recommendations, extra CPU time, disk resource, use of
        national computing center resources and, possibly, newly installed applications are made
        available to the research group applying for these extra resources.

        The committee assesses this request and responds within three working days. In
        general, allocations can be made for up to three months. If a special allocation of
        resources is needed after the three months, the investigator is asked to submit a new
        request.

        3.2.4    Moab Access Portal
        Members of the MIDAS community have access to the processor and storage resources
        in the MIDAS Linux cluster through a Moab Access Portal (MAP), available at
        http://moab.rti.org/map
        MAP, which provides the user interface to the MIDAS cluster, is an end-user job
        submission portal that integrates with the Moab Cluster Workload Manager/Scheduler. It
        provides large-scale submission to the Moab Cluster Scheduler, and associated resource
        managers, from any location where a Web browser is available; there is no need to install
        additional client software. Learn more about Moab from
        http://www.clusterresources.com/products/map/.

        3.2.5    Establish accounts and password management
         Each Research Group, through its Account Manager, will identify eligible members of
         its team by sending e-mail to the MIDAS Help Desk with name, e-mail address, phone
         number, and client station IP address. The Account Manager is also responsible for
         identifying, in a timely fashion, accounts that should be terminated after a team
         member leaves the Research Group.
         Each identified member will have a Moab account established with an initial,
         temporary password, which will be conveyed by phone. The user will be required to
         change the temporary password at first login.
        MIDAS passwords must be changed every 180 days. Once a password has expired, the
        only way to get it reset will be to contact the MIDAS Help Desk.



        3.2.6    Track allocations
        On the Linux cluster, service units (SUs) are based on the number of nodes used and the
        wall clock time according to the following formula:

                  number of SUs = 2 * number of nodes used * wall clock hours.



        Currently several methods for monitoring usage statistics are being explored. Monitoring
        data will be used to define priority levels and guide allocation policies for the user
        population.
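         As a quick sanity check, the formula can be evaluated with a one-line awk
         calculation; the node count and duration below are hypothetical:

```shell
# Hypothetical job: 4 nodes used for 2.5 wall-clock hours.
# SUs = 2 * number of nodes used * wall-clock hours, per the formula above.
nodes=4
hours=2.5
awk -v n="$nodes" -v h="$hours" 'BEGIN { printf "%.1f SUs\n", 2 * n * h }'
# prints "20.0 SUs"
```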

     3.3   Access the cluster
    The IT infrastructure provides a firewall on the connections to the Internet and offers
    restricted access to the MIDAS hosting system based on the registered client IP address
    and required service port.
    Four types of access to the MIDAS hosting system are supported:
              Secure VNC-based access.
                Secure FTP connections for bidirectional file transfers using a Secure File
                 Transfer Protocol (SFTP) client.
                Secure shell (SSH) connections in place of Telnet for Terminal Emulation access
                 to the cluster using the login node only (node 032).
                X-Windows for access to applications requiring graphical output. Use of X-
                 Windows software for cluster access is only supported when the X-Windows
                 protocol is tunneled over the SSH port using PuTTY as the SSH client. Further
                 information for setting up X11 forwarding in SSH with PuTTY is available at
                 http://the.earth.li/~sgtatham/putty/0.55/htmldoc/Chapter3.html#S3.4.
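         As an illustration for users of a command-line OpenSSH client (the X-Windows
         bullet above describes PuTTY for the same purpose on Windows), the host and
         options can be collected in a client-side ~/.ssh/config entry. The host name
         rtihpc.rti.org is taken from Section 3.3.1; the alias "midas" and the username
         are placeholders:

```
Host midas
    # Host name from Section 3.3.1; replace User with your own cluster account name
    HostName rtihpc.rti.org
    User your_midas_username
    # Tunnel X11 over the SSH port, per the access policy above
    ForwardX11 yes
```

         With such an entry in place, "ssh midas" opens a terminal session on the login
         node and "sftp midas" opens a secure file transfer session.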

        3.3.1    Connecting to the cluster via VNC
        RTI firewall security policies disallow X11 applications from being streamed to X-window
        servers outside of the RTI infrastructure. As an alternative, the RTI MIDAS Cluster offers
        the ability to connect to an encrypted VNC Linux desktop to run X11 applications.

        The RTI MIDAS Cluster utilizes the software package Enterprise RealVNC
        (http://www.realvnc.com) to create and manage the VNC Linux desktop sessions.
        RealVNC supports data encryption during user authentication and for sending/receiving
        session traffic.

        To use VNC on the RTI MIDAS Cluster, it is necessary to have a java-enabled web
        browser. Any modern web browser (Internet Explorer 6, Firefox 1.5, Opera 8, etc.) will
        work.

        To create a VNC session, log in to the RTI MIDAS Cluster via SSH and perform the
        following command:




                node032> vncserver

                VNC Server Enterprise Edition E4.2.5 - built May 16 2006 17:37:05
                Copyright (C) 2002-2006 RealVNC Ltd.
                See http://www.realvnc.com for information on VNC.
                Running applications in /MIDAS/home/gmcconnell/.vnc/xstartup
                Log file is /MIDAS/home/gmcconnell/.vnc/node032:2.log
                New desktop is node032:2

                node032>

         Look for the last line printed, 'New desktop is node032:x', where x is a number.
         This number is the 'display' for your session. Take note of it; it will be used
         throughout your VNC session.

        Open a web browser and type the following into the address bar and hit enter:
               http://rtihpc.rti.org:58xx

        Replace the xx in 58xx with the number of the display that was printed when you started
        the vncserver. For example, display 2 would be 5802; display 10 would be 5810.
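         The display-to-port mapping described above (port = 5800 + display number) can
         be sketched as a small shell computation; the display number below is
         hypothetical:

```shell
# Derive the Java-viewer URL from a vncserver display number.
# Port = 5800 + display, so display 2 -> 5802 and display 10 -> 5810.
display=2
printf 'http://rtihpc.rti.org:%d\n' $((5800 + display))
# prints "http://rtihpc.rti.org:5802"
```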

         The web browser will now request a VNC session with the RTI MIDAS Cluster. A 'Sun
         Java' graphic will appear as your browser loads and initializes a Java applet.
         There may be a pause of several seconds to a few minutes while the applet is
         downloaded and initialized:




        A ‘Warning – Security’ dialog will appear. It will ask if you want to trust the signed applet
        distributed by RealVNC Ltd. It is safe to say ‘Yes’ or ‘Always’ (which remembers your
        approval and will not prompt you with another security warning when you reconnect):




        Next to appear will be a ‘VNC Viewer: Connection Details’ dialog. Change the
        ‘Encryption’ drop down box from the default ‘Let Server Choose’ to ‘Always On’:




        Click on ‘Options’ and then on ‘Security’. Change the selection from 512 bits to 1024 bits
        or 2048 bits:






        Click ‘OK’ > ‘OK’ and you will be presented with a window for your Username and
        Password. This is the same username and password you use to connect to the RTI
        MIDAS Cluster:




        After successfully authenticating, a new window with a Linux KDE desktop will appear.
        On this desktop, you can run any X11 apps that are installed on the RTI MIDAS Cluster
        such as Matlab or SAS.




        It is possible to have specific applications execute when you start your VNC session so
        they are already running and displayed on your VNC desktop when you connect. To have
        an application start automatically, insert the command name into $HOME/.vnc/xstartup.
        For example, to start an xterm window automatically, insert this line into
        $HOME/.vnc/xstartup:


                    /usr/bin/xterm&



         Upon connection to your VNC desktop session, an xterm window will be displayed,
         waiting for input.
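         The edit above can be tried safely as a sketch against a throwaway directory
         (on the cluster, the real target is $HOME/.vnc/xstartup):

```shell
# Demo in a temporary directory; on the cluster, edit "$HOME/.vnc/xstartup" instead.
vncdir="$(mktemp -d)"
echo '/usr/bin/xterm&' >> "$vncdir/xstartup"   # append the autostart command
cat "$vncdir/xstartup"
# prints "/usr/bin/xterm&"
```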

        To disconnect from your VNC Linux desktop, click on the red ‘X’ button in the top right
        window corner. This will not terminate your VNC Linux desktop; it will persist even though
        you are not connected. This will allow long jobs to continue running. You can reconnect
        to the VNC Linux desktop at another time to monitor and interact with your X11
        application.

        When your X11 application jobs have completed, you should terminate your VNC Linux
        desktop to free resource memory. Connect to the RTI MIDAS Cluster using SSH. On the
        command line, type the command:




                  node032> vncserver -kill :x

                  VNC Server Enterprise Edition E4.2.5 - built May 16 2006 17:37:05
                  Copyright (C) 2002-2006 RealVNC Ltd.
                  See http://www.realvnc.com for information on VNC.

                  node032>



        Replace :x with the display number that was provided when you started the vncserver
        process.

        There are many features and options to the VNC protocol. This document is intended to
        provide an overview to assist you in creating and connecting to a VNC Linux desktop
        session. For more information, view the manpages for Xvnc and vncserver.

         Additional documentation about the RealVNC viewer can be downloaded from the
         RealVNC website (http://www.realvnc.com).

        3.3.2    Running X-Windows Applications
        In order to run X11 Application, first connect to a VNC desktop session and then open an
        Xterm window to enter a command line session:




         Once the Xterm command line window is open, enter an interactive session on a
         compute node by using the 'qsub -I' command. For example, the command below starts
         a job on the cluster that requests two nodes with 2 processors per node (the
         default for the RTI cluster) and an expected duration of 2 hours, running on the
         new cluster (node033–node063):



                    qsub -I -l nodes=2:ppn=2,walltime=2:00:00 -q rti_new_cluster



        When the qsub command is successfully entered, you will be allocated a node or set of
        nodes. You will then be logged into one of the requested nodes and will be on the
        command line of that node. The output will be similar to:




                       gmcconnell@node032:~>qsub -I -l nodes=2:ppn=2,walltime=2:00:00 -q
                       rti_new_cluster
                       qsub: waiting for job 321341.node032 to start
                       qsub: job 321341.node032 ready
                       --------------------------------------------------
                       Begin PBS Prologue Mon Oct 30 15:36:26 EST 2006

                       Job ID:   321341.node032
                       Username:      gmcconnell
                       Group:     rti
                       Job Name:     STDIN
                       Limits:   nodes=2:ppn=2,walltime=02:00:00
                       Queue:     rti_new_cluster

                       PBS Prologue:       Enabling access for gmcconnell on node064
                       PBS Prologue:       Enabling access for gmcconnell on node064
                       PBS Prologue:       Enabling access for gmcconnell on node063
                       PBS Prologue:       Enabling access for gmcconnell on node063

                       End PBS Prologue Mon Oct 30 15:36:26 EST 2006
                       --------------------------------------------------
                       gmcconnell@node064:~>





        You are now logged into one of the requested nodes that were allocated by the resource
        manager for your use.

        One of the key components to running X11 applications is the $DISPLAY variable. This
        generally gets set automatically during normal login. However, when logged into a node
        using interactive qsub, it is not set correctly. This makes it necessary to open a new
        Xterm terminal login window and connect via SSH to the compute node that was
        allocated by the resource manager.

        In this example, node064 was allocated and the user was automatically logged in. In the
        VNC desktop session, the user opened a new Xterm terminal window and executed the
        following command:

                    gmcconnell@node032:~>ssh -X node064
                    Last login: Mon Oct 30 13:28:01 2006 from node032
                    gmcconnell@node064:~>




        Once on node064, you can check the $DISPLAY variable to ensure that it is set:

                  gmcconnell@node064:~>echo $DISPLAY
                  localhost:10.0
                  gmcconnell@node064:~>




        It is now possible to execute an X11 application. In this example, SAS is executed.


                    gmcconnell@node064:~>sas




        This should bring up the SAS window:




        When the X11 application has completed running, exit the application and type ‘exit’ in
        the Xterm command-line window. Once the interactive qsub session ends, you will be
        logged out of the compute node and returned to the command line of node032.

        3.3.3    Model Execution
        It is not necessary for remote users to stay connected to the cluster while the models are
        running. Users can disconnect at any time and the job they have initiated will continue to
        run on the cluster. Users can reconnect to the cluster at any time to check the status of
        the previously started job.
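        For example, a model run can be submitted as a non-interactive batch job, which keeps
        running after the user disconnects. The script name, executable, queue, and resource
        limits below are illustrative only; adjust them to match your allocation:

```shell
#!/bin/sh
# Hypothetical batch script (run_model.sh) -- queue name, resource
# limits, and executable are examples, not cluster defaults.
#PBS -l nodes=2:ppn=2,walltime=04:00:00
#PBS -q rti_new_cluster
cd $PBS_O_WORKDIR            # start in the directory the job was submitted from
./my_model > run.log 2>&1    # capture output so it can be checked later
```

        Submit the script with qsub run_model.sh; the job continues on the cluster even after
        you log out, and you can inspect run.log on a later connection.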

        3.3.4    File Transfers
        Once connected, users can transfer files into and out of any of the /MIDAS/* file systems
        to which they have been granted access. Examples of files that MIDAS users might
        frequently need to move to and from the Linux cluster include configuration files and run
        logs, which can be very large (hundreds of gigabytes). SFTP access to the Cluster
        Control Node is provided for this purpose. For files larger than 5 GB, an alternative
        method of file transfer, such as tape media, should be arranged.
        Users do not have direct read/write access from their Windows workstations to the
        prototype cluster’s file system through partition sharing protocols such as NFS or Samba.
        Only SFTP file transfer functionality is provided.
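        A typical SFTP session might look like the following (the hostname and file paths are
        illustrative; use the Cluster Control Node address provided by the MIDAS help desk):

```
sftp username@cluster.example.org
sftp> put model_config.txt                 # upload a configuration file
sftp> get /MIDAS/home/username/run.log     # download a run log
sftp> quit
```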


        3.3.5    Terminal Access
        Telnet access to the cluster is not permitted because of the protocol’s inherent security
        limitations. Instead, users are able to connect using SSH protocol. SSH clients for various
        platforms are widely available. The recommended freeware is PuTTY, which can be
        downloaded from:
        http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
        Users can have multiple PuTTY windows open at the same time. Once connected, users
        have the same level of access as when using Telnet. In other words, they are able to
        execute any Linux shell command permitted for non-root users and to run their own
        scripts and binaries.

        3.3.6    Interactive Sessions Using qsub
        When a cluster user invokes an interactive qsub session, such as

                  > qsub -I -l nodes=10,walltime=04:00:00



        X11 forwarding is not enabled. This makes it impossible to run X11-based applications,
        such as TotalView.

        The reason that X11 forwarding is not enabled is a known limitation of qsub: the
        cluster access session is run by qsub as the Linux superuser (root), not under the
        user's own ID, so X11 forwarding does not work.

        The solution to this problem is as follows: once the interactive qsub session has started,
        the user is allowed to ssh directly to any of the nodes (ten in the example above)
        allocated by the qsub session. To gain X11 access, the user simply opens another X
        terminal and connects via ssh to one of the allocated nodes:


                  > ssh -X nodeXXX

        X11 forwarding is now enabled from that X terminal.

     3.4   File systems and storage
   The MIDAS Linux cluster has 2.05 TB of shared storage and 80 GB of local storage on each
   node. This is divided into categories of storage defined in this section.

        3.4.1    Home Directories
        Home directories are for long-term storage and are backed up daily.
        The home directory is the default directory at login. Use this space for storing files such
        as source code, scripts, input data sets, etc. A disk quota of 10 GB exists for the home
        directory. All user home directories are contained in the /MIDAS/home file system.



        The command to see individual disk usage and limits is quota.

        3.4.2    Scratch Directories
        Scratch file directories are intended for short-term use and should be considered volatile.
        Four shared scratch file systems (/MIDAS/storage1, /MIDAS/storage2, /MIDAS/storage3,
        and /MIDAS/storage4) are available from all the nodes. The size of scratch file systems
        varies across the four storage areas.

        Please note that backups are not performed on the scratch directories. In the event of a
        disk crash or file purge, files on the scratch directories cannot be recovered. Therefore,
        users should make sure to back up these files to permanent storage as often as
        significant changes are made (at least daily).
        Files in these scratch file systems are removed by the system administrator on the basis
        of size and time since the last access. The following is the removal schedule for the initial
        startup phase:

         File Size    Removed after
         > 25 MB      3 days
         < 25 MB      14 days

        This schedule will be reviewed at least quarterly.
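        As a sketch only (the administrators' actual cleanup tooling may differ), the removal
        policy corresponds to find commands such as the following; /MIDAS/storage1 stands in
        for any of the four scratch areas:

```shell
# Sketch of the removal policy using find. SCRATCH defaults to one of
# the scratch areas; -atime measures days since a file was last accessed.
SCRATCH="${SCRATCH:-/MIDAS/storage1}"
find "$SCRATCH" -type f -size +25M -atime +3 2>/dev/null    # large files unused 3+ days
find "$SCRATCH" -type f -size -25M -atime +14 2>/dev/null   # small files unused 14+ days
```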

        3.4.3    Local scratch space
        Each node has available 80 GB of local scratch space. The space is available only while
        a user is assigned the node through a batch job and is available at /scr (on IA-64).

        The scratch space is local to each compute node, and writing and reading to disk is faster
        for the local scratch than for a home directory or shared scratch directories. Therefore,
        the local scratch is ideal for writing temporary files during execution.

        Files in the local scratch space are not available to any other nodes and hence are not
        directly accessible to processes running on other nodes as part of a job. Only processes
        running on the two CPUs that make up a node have direct access to files in the local
        scratch space.

        Users should consider the local scratch space as volatile. All files are automatically
        deleted after a batch job completes and the node is unallocated. All files to be saved
        must be copied from local scratch as part of a job. Users will not be able to access files in
        local scratch after a job has completed. The following example snippets can be used in a
        batch script to copy files:


             In Bourne shell:
                           for host in `cat $PBS_NODEFILE`
                           do
                                ssh -q -x $host '/bin/cp....' &
                           done
                           wait
             In C-shell:
                           foreach host (`cat $PBS_NODEFILE`)
                               ssh -q -x $host '/bin/cp....' &
                           end
                           wait
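        Filled out with illustrative paths (the file names and destination directory below are
        assumptions, not cluster conventions), the Bourne-shell version of the copy-back step
        might look like:

```shell
#!/bin/sh
# Hypothetical copy-back step for the end of a PBS batch script.
# Source and destination paths are illustrative only.
for host in `cat $PBS_NODEFILE`        # one allocated host per line
do
    ssh -q -x "$host" "cp /scr/$USER/output_*.dat $HOME/results/" &
done
wait    # block until every background copy has finished
```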



        3.4.4    /tmp, /usr/tmp, and /var/tmp Directories
        The /tmp, /usr/tmp, and /var/tmp directories are intended for temporary files that are used
        during the execution of a process or job. Do not use these directories for storage of user
        files. Files placed in /tmp, /usr/tmp or /var/tmp may be purged at any time.

     3.5   Future Enhancements

        3.5.1    Reserved Scratch Directories
        The ability to manage an area for large, but short-term storage needs will be added. This
        section will be updated with information on this facility when it is installed.

        3.5.2    Permanent File Storage
        Storage of important long-term project files, with annotation and back up, will be
        available. This resource will be developed for very long-term file storage. It will provide
        facilities grouping, version control, annotating, sharing, and searching research project
        files such as source code and data. This section will be updated with information on this
        facility when it is installed.

     3.6   Running jobs

        3.6.1    Running MPI programs
        MPI programs can be executed with mpirun using the statement:

                  > mpirun [options][-np n] mpi_program args...
        where n can be replaced by any number of processors. For a list of options, type:


                  > mpirun –help


        Some of the most frequently used options are:
            -dbg=<debugger>   Runs under the chosen debugger (e.g., -dbg=gdb for gdb)
            -np n             Specifies the number of processors n to run on
            -verbose          Displays errors and warnings

        3.6.2    Running Intel C++ programs
        The Intel compiler and debugger are located on node031 in /opt/intel_cc_80 and
        /opt/intel_idb_80, respectively. The following command to set up your environment
        should be invoked prior to first use.

                  > source /opt/intel_cc_80/bin/iccvars.sh


        3.6.3    Running Java Programs

        Several versions of the Java compiler (javac) and run-time virtual machine (java) are
        available once the following command to set up your environment is invoked.



              source setJava --version 1.3.1
              source setJava --version 1.4.1
              source setJava --version 1.4.2

        3.6.4    Running R programs
        R supports X-windows, which allows the user to view the graphs on screen. To allow X-
        windows to work on a workstation, the user needs an X-windows client such as xwin32. R
        will also operate out of the shell, but the user will not be able to see the graphs on the
        screen. The first thing to do is to log in to the application node:

                  > ssh node031

        The location of the R executable must be made accessible to the Linux shell. You can
        either invoke R directly by specifying the absolute path of its location within the RTI
        Cluster file
        invoke R directly by specifying the absolute path of its location within the RTI Cluster file
        system:

                  > /opt/R/bin/R


        or by first adding /opt/R/bin to your shell PATH variable and then running the program
        directly from the command line:


                  >R


        If you submit R programs to the RTI Cluster via Moab, you should use the rti_app queue.
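        As an example of the second approach (bash/sh syntax), you can extend PATH in your
        shell session or login script:

```shell
# Add R's directory to the shell PATH (bash/sh syntax); afterwards R
# can be started by name from any directory.
export PATH="$PATH:/opt/R/bin"
```

        After this, typing R at the prompt starts the interpreter.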

        3.6.5    Running SAS programs

        SAS can be used either interactively with X-windows (using an X-windows client such as
        xwin32) or for batch submittals. Use the sequence below to invoke SAS.


                  > ssh node031
                  > sas


        3.6.6    Running ArcExplorer

        ArcExplorer should be invoked from xwin32. It can be started on node031 by the
        following script:



                  export AEJHOME=/opt/aej90exe
                  export LD_LIBRARY_PATH=$AEJHOME/lib
                  export PATH=$PATH:$AEJHOME/bin
                  aejava




        3.6.7    Job Queues
        MIDAS has installed the RPM (a Linux software packaging format) version of the latest
        sysstat tool on nodes001-030 of the MIDAS cluster. The sysstat
        package contains the iostat, mpstat and sar utilities. These utilities are used to gather
        information on system load and I/O utilization. We have also installed the Moab Workload
        Manager, a workload scheduler for clusters. It supports dynamic management of
        compute resources, i.e., nodes, CPUs, RAM, and disk space. Moab also supports
        identity-based usage tracking and reporting.

        The combination of the two systems enables a comprehensive reporting system to be
        developed for MIDAS, including resource utilization by user, group, and type of request.

        Standard Queues: The use of the Moab Workload Manager allows MIDAS to define
        two types of job queues on the RTI cluster, depending on the intended use of the
        resources. The low-priority queue is to be used for non-production runs and program
        development. The high-priority queue is to be used for production runs. Real-time
        examination of queue loads, as well as job submission, can be performed via the MAP
        (Moab Access Portal) tool.

        Spillover Queues: MIDAS has two spillover queues to handle situations where the users
        may need more computing space.

        NCSA/Moab spillover queue: This Moab peer-to-peer functionality can be used for
        spillover situations when the job submitted to the RTI cluster exceeds its computational
        capacity. The remote peer is the 1024-node Itanium-based Mercury Cluster at the
        National Center for Supercomputing Applications (NCSA). Users requiring NCSA’s
        computational resources should submit their jobs to the rti_ncsa Moab queue either via
        the Moab Access Portal or the Linux msub command.
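        A hypothetical msub submission to the spillover queue (the script name and resource
        limits are illustrative, not recommendations) might look like:

```shell
# Illustrative submission to the NCSA spillover queue via Moab's msub.
msub -q rti_ncsa -l nodes=4:ppn=2,walltime=08:00:00 run_model.sh
```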

        TACC/Gridshell spillover queue: The GridShell software allows transparent user
        submission of large jobs to the Texas Advanced Computing Center (TACC)’s Lonestar
        cluster. MIDAS cluster users require an account on Lonestar in order to use the TACC
        spillover queue. Contact the MIDAS help desk for more information on the proper
        procedure for gaining access to Lonestar. Once a Lonestar account has been created,
        the user should use the following procedure:

        First, test whether jobs can be submitted to Lonestar while connected directly to that
        machine, as follows (for more information on Condor, see
        http://www.cs.wisc.edu/condor):
                 1. Login to Lonestar:


                       > ssh -l <username> lonestar.tacc.utexas.edu


                 2. Edit $HOME/.soft file and add the line:

                       +gridshell

                 3. Type "resoft"
                 4. Start, for example, a pool with 1 job of 10 processors that will run for 60 minutes:


                       > pool_starter -n 1:10 -W 60



                 5. Check that the LSF job proxies were submitted:


                       > bjobs

                 6. Edit a sample condor job submit file and submit it. For example, edit
                    sub.cmd to include:

                       executable = /bin/sleep
                       arguments = 150
                       universe = vanilla
                       # Pick a job proxy with sufficient lifetime to
                       # execute my job
                       Requirements = (TARGET.TimeToLive > 200)
                       should_transfer_files = true
                       when_to_transfer_output = on_exit
                       Error = output/errfile.$(CLUSTER).$(PROCESS)
                       Output = output/outfile.$(CLUSTER).$(PROCESS)
                       queue 10


                         > mkdir output # - this is for the output files
                         > condor_submit sub.cmd
                 7. Check your jobs are in the condor queue:

                        > condor_q

        Perform job submission tests while connected directly to Lonestar. When they have been
        completed, follow the procedure below to submit jobs to Lonestar from RTI cluster:
                 1. Set up your remote ssh login (optional):
                         a. Create a new public-private key pair:

                                 > ssh-keygen -t rsa


                         b. Copy the id_rsa.pub to lonestar:$HOME/.ssh/authorized_keys file.
                 2. Add the following entry into the $HOME/vo.conf file:

                                 ssh://<lonestarUsername>@lonestar.tacc.utexas.edu



                 3. Create a condor submit file (as in Step 6 above), and the executor script
                    (myexec.sh), which should include a reference to the actual executable:

                                 #!/bin/sh
                                 # Here insert the path to the local executable on Lonestar
                                 /UnixPath/binary




                 4. Examine and edit the submit file and the myexec.sh and make sure you
                    stage your executable onto Lonestar first. Staging means to transfer and
                    recompile your program under the Lonestar environment and place it
                    somewhere in your Lonestar account directories along with whichever input
                    and/or other datafiles required to perform a run.
                 5. Create the same condor pool now with the vosub.pl command:


                               > vosub.pl -n 1:10 -W 60



                 6. Check that your LSF job proxies have been submitted:

                                 > agent_jobs


                 7. Submit your jobs:

                               > condor_submit sub.cmd


                 8. Check your jobs:

                               > condor_q



        3.6.8    Disk space for batch jobs
        See Section 3.4 for an overview of storage options.

     3.7   Development Utilities and Compilers
   Chapter 4 provides greater depth about the MIDAS resources and support for model
   development. This section provides a description of utilities and compilers.

        3.7.1    Editors
        Standard Linux editors, vim and emacs, are supported.

        3.7.2    Default Compilers
        PathScale EKOpath Compiler Suite is a set of high-performance C, C++ and Fortran
        compilers along with associated libraries and debugger that are optimized for the
        cluster’s 64-bit Linux architecture. The suite of tools is compatible with the standard
        GNU/gcc compiler, also available on the cluster.

        The tools are available on all cluster nodes except the login nodes. The number of
        concurrent users of the compiler is controlled by a subscription service running on the
        cluster.

        GCC (version 3.3.2) The standard GNU compiler suite is free, open-source software
        distributed under the GNU General Public License.




        The GNU compiler suite includes:

         Name    Description
         gcc     GNU C compiler
         g++     GNU C++ compiler
         g77     GNU Fortran 77 compiler

        The Linux Manual provides the documentation available for gcc, g++, and g77.

        3.7.3    Libraries and Application Software
        Information on location of the libraries and how to link to them will be provided once all
        the software is installed.

        GLUT (Version 3.6), the OpenGL Utility Toolkit, allows writing window-system-
        independent OpenGL applications without having to learn X-windows or each platform's
        native window system. Documentation is available at
        http://www.opengl.org/documentation/.

        MPICH (Version 1.2.5) is a freely available, portable implementation of MPI, the standard
        for message-passing libraries. Documentation is available at http://www-
        unix.mcs.anl.gov/mpi/mpich/.

        Boost (Version 1.30.2) provides free peer-reviewed portable C++ source libraries.
        Documentation is available at http://www.boost.org/index.htm .

        Spring (Version 2.0) is GIS software designed with the following objectives:
              Operate as a seamless geographical database, with a large volume of data,
                 without being limited by tiling schemes, scale and projection. Object identity
                 should be maintained on the whole database.
               Support both raster and vector data geometries and integration of remote
                 sensing data into a GIS, with functions for image processing, digital terrain
                 modeling, spatial analysis and database query and manipulation.
               Achieve full scalability, that is, be capable of working with full functionality from
                 desktop PCs running Windows or OS/2 to high-performance UNIX workstations.
               Provide an easy-to-use, powerful environment with a combination of menu-driven
                 applications and a spatial algebra language.
        Documentation is available at http://www.dpi.inpe.br/spring/program/chap1/intro.html.

        Metis (Version 4.0) is a family of programs for partitioning unstructured graphs and
        hypergraphs and computing fill-reducing orderings of sparse matrices.

        SAS
               SAS Basic – SAS Basic includes commonly used statistical tools for data
                 management and univariate/bivariate analyses.
               SAS Stat – SAS Stat includes software statistical capabilities with tools for both
                 specialized and enterprise-wide analytical needs. Ready-to-use procedures
                 handle a wide range of statistical analyses, including analysis of variance,
                 regression, categorical data analysis, multivariate analysis, survival analysis,
                 psychometric analysis, cluster analysis, and nonparametric analysis.
               SAS Graph – SAS Graph software is designed for use by analysts who need to
                 explore, examine and present data in an easily understandable way and then
                 distribute their findings to decision-makers in a variety of formats.



        R is a language and environment for statistical computing and graphics. It is a GNU
        project similar to the S language and environment developed at Bell Laboratories
        (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be
        considered a different implementation of S. There are some important differences, but
        much code written for S runs unaltered under R.

        R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical
        tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and
        is highly extensible. The S language is often the vehicle of choice for research in
        statistical methodology, and R provides an open source route to participation in that
        activity.

        One of R's strengths is the ease with which well-designed, publication-quality plots can
        be produced, including mathematical symbols and formulae where needed. Great care
        has been taken over the defaults for the minor design choices in graphics, but the user
        retains full control.

        R is available as free software under the terms of the Free Software Foundation's GNU
        General Public License in source code form. It compiles and runs on a wide variety of
        UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and
        MacOS.

     3.8   CVS
   To establish a new CVS repository for a project send an email to the MIDAS Help Desk. CVS
   is currently loaded on nodes 31 and 32 on the MIDAS cluster. Log on to one of these nodes to
   use standard CVS commands to manage project files under version control. Visit the following
   link for documentation and help on using CVS: https://www.cvshome.org/docs/manual/cvs-
   1.11.19/cvs.html#SEC_Top

   Following is a brief discussion of a few typical CVS scenarios. The actual command-lines are
   presented in bold and italic fonts on separate lines. The standard cycle when working with
   revision control is:
            1. initial checkout
            2. make local changes
            3. get updates from repository
            4. repeat steps 2 and 3
            5. commit local changes to repository

   You can get help on any CVS command by typing

                  > cvs --help command

        3.8.1    Initial check out
        The following command may be run to create a local copy of a project. The command
        creates the My_MIDAS_project directory and copies all the files in the repository under
        CVS control:

                  > cvs checkout My_MIDAS_project




        3.8.2    Get updates

        Once you have a local copy of the code, you can work in your sandbox. When you're
        ready to get the latest code, you'll need to update your local working copy. Change
        directory to the top-level folder, such as My_MIDAS_Project, and run the command:


                  > cvs update -d -P



        If you have local changes, the update will try to merge changes from the repository into
        your local files. If update cannot merge the file, it will leave conflicts in your files. Resolve
        those conflicts before proceeding. It should be easy to spot conflicts in your files since
        CVS marks these regions with: "<<<<<<<", "=======", and ">>>>>>>".
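        A conflict region in a file looks like the following (the revision number and contents
        are illustrative):

```
<<<<<<< My_MIDAS_Model.txt
your local change
=======
the change committed to the repository
>>>>>>> 1.6
```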

        3.8.3    Look at change log
        CVS provides you commands to investigate the history of a file. The command “log”
        helps to see the history of changes made to a file:


                  > cvs log filename

        CVS command “diff” helps to look at the differences between two revisions of a file:


                  > cvs diff -r rev1 -r rev2 filename

        There are many options for the diff command.

        3.8.4    Look at Differences in Versions
        If the file is My_MIDAS_Model.txt, then you can obtain the difference between two
        versions by running a command that looks like this:


                  > cvs diff -r 1.3 -r 1.6 My_MIDAS_Model.txt

        In place of revision numbers, you may also use tag names. You'll probably want to use
        tag names if you need to diff multiple files at once.

        You may also specify special date arguments to some commands, including the diff
        command. So, for example, to see the diff between the local version of
        My_MIDAS_Model.txt and the version of the file from two days ago, you could use:


                  > cvs diff -D "2 days ago" My_MIDAS_Model.txt



        The -D option accepts dates in many formats including yyyy-mm-dd.
        You can also diff a file against the current revision of the file in the repository:


                  > cvs diff filename


        3.8.5    Get CVS Information about Local Files
        If you need to see CVS information about your local copy of the files (not often needed),
        use CVS status like this:

                  > cvs status My_MIDAS_Model.txt


        3.8.6    Add New Files and Directories
        If you need to add new files and directories to the CVS repository, you'll need the add
        command. You can use a simple version like this:

                  > cvs add -m "Creating a folder for holding model documents" docs

        You may add multiple files or directories at once, but you must add a directory before
        attempting to add any files or subdirectories that it contains.

        3.8.7    Commit Changes

        It is a good practice to run an update before committing changes, to make sure that your
        code works with the head of the repository. Although CVS does not guarantee atomic
        check-ins, it is recommended to attempt to commit all files that logically contribute to the
        same change all at once. You commit files with the commit command (sometimes
        shortened to ci):


                  > cvs commit -m "Describe the purpose of your changes" file1 file2


        At this point the code has been updated and any changes have been included in the
        repository for future use.

        3.8.8    Other CVS Tools and Concepts
        This is by no means an exhaustive tutorial on CVS. Creating branches to the main trunk
        and tracking different releases can be complicated activities but they can also be an
        efficient use of the tool. In the future, we may support direct CVS sessions on users’
        PCs through the use of encrypted keys. There are no specific targets for doing this right
        now.




4 Model Development
   The purpose of this section is to describe (1) the different components of the cluster that
   support development of the disease models and (2) how to use these components to support
   the running and development of models.

     4.1   Moving to Linux
   Moving files and data can be accomplished by one of the following options:
               E-mail code to MIDAS Help Desk, including a precise description of the
                  attachments, and an indication of where in the cluster the files should reside.
                 SFTP code to FTP site (ftp.rti.org) and send an e-mail notification to MIDAS Help
                  Desk indicating that a transfer of data has been performed, listing the locations
                  where the new data reside.
                 Send fixed media (CD) to MIDAS Help: 800 Park Drive, 4th Floor, 4B01, RTP,
                  NC, 27709, ATTN. Bill Savage (919) 316-3485 along with precise descriptions of
                  the included files and an indication of where in the cluster such files should
                  reside.
                 Copy directly from source to Linux. This is not recommended if the size of the
                  compressed file exceeds 3 GB. Files larger than 3 GB but smaller than 10 GB
                   should be moved after business hours (EST/EDT).

     4.2   Parallelizing code
   Each of our two Linux clusters currently consists of 32 dual processor nodes. Each of these
   processors can operate independently. The trick in parallelizing code is to design it so the
   different components of a complete run can capitalize on the independence of the processors.
   There are a number of strategies that can be used to develop parallel code. The best strategy
   will depend on a variety of application-specific elements. The first step is to establish the need
   for parallelizing the code. Usually, the most important criterion is that the problem size exceed
   the available memory in a single node. Once the need is established, a strategy can be
   determined that fits the specific features of the program and the environment.

   For more information on methods for parallelizing code contact Diglio Simoni at
   dsimoni@rti.org.

     4.3   Fixing code/Performance tuning
    The parallel efficiency of a job is assessed by measuring the wall clock time using one
    processor, measuring the wall clock time with n processors, and computing the ratio of the
    two (the speedup). If the speedup is close to n, the code has a high degree of parallelism.

        4.3.1     Timers
        Timers are used to measure code performance and to determine bottlenecks in
        applications. A timer is a function, subroutine or program that can be used to return the
        amount of time spent in a section of code.
        Different types of time quantities are returned from the timers:
               user time – the amount of CPU time used by the user's program;
                 system time – the amount of CPU time used by the system in support of the
                   user's program;
                 CPU time – sum of user time and system time;
                 wall clock time – lapsed time.




        Some of the most often used routines/programs are:

            Time (/usr/bin/time)
            The quickest way to time code is to run the code within the command /usr/bin/time.
            This Linux command will return user time, system time, and the total wall clock time.
            The Linux manual page for time provides detailed documentation of the command. The
            details that identify how to format the output are particularly pertinent. Note also that
            csh and tcsh have built-in time commands that perform a similar task.

            An example is

                       > /usr/bin/time -p a.out

            where -p requests the simple POSIX output format (real, user, and sys times,
            one per line).

            Gettimeofday Function
            A second method of obtaining timings is to use a C library function named gettimeofday.
            The gettimeofday() function returns four different values, two of which are important
            for timing. These two items are the number of seconds since Jan. 1, 1970, and the
            number of microseconds elapsed within the current second.
            It can be used to measure elapsed time, as is shown in the following C code fragment.

                       #include <stddef.h> /* definition of NULL */
                       #include <sys/time.h> /* definition of timeval struct and prototyping
                       of gettimeofday */

                       double t1,t2,elapsed;
                       struct timeval tp;
                       int rtn;
                       ....
                       ....
                       rtn=gettimeofday(&tp, NULL);
                       t1=(double)tp.tv_sec+(1.e-6)*tp.tv_usec;
                            ....
                       /* do some work */
                            ....
                       rtn=gettimeofday(&tp, NULL);
                       t2=(double)tp.tv_sec+(1.e-6)*tp.tv_usec;
                       elapsed=t2-t1;


        4.3.2    Profilers
        Manually inserting calls to a timer is a practical method to estimate performance only if
        you are working with a small piece of code and know that the code is important for the
        performance of your application. A profiler automatically inserts timing calls into
        large applications to determine the critical areas to optimize. Specifically, calls are
        inserted into applications to generate timings for subroutines, functions, or even loops.
        When using a profiler, optimizing code becomes an iterative process that is defined by
        the following four steps:
           1. Check for correct answers.
           2. Profile to find the most time-consuming routines.
           3. Optimize these routines using compiler options, directives/pragmas, and source
                code modifications.
           4. Repeat steps 1–3 until the most important routines have been optimized.


        Using profilers in conjunction with independent timers is a powerful technique. The
        profiler can narrow the field of routines to optimize. Timers allow these routines to be
        finely tuned.

            gprof
            A quick way to get more detailed information on functions and routines is to use the
            profiling tool gprof. The first step is to compile the source code with the compiler
            flags for profiling. For the GNU compiler, the flag is -pg. After compiling the code,
            the second step is to execute the code, which will then generate a gmon.out file. To
            analyze the gmon.out file, use gprof. The results of the analysis will be written to
            stdout. For example, the sequence:

                       > gcc -O -pg foo.c
                       > ./a.out
                       > gprof a.out gmon.out
            will generate a 'flat' profile that contains a useful breakdown of time spent in functions
            and subroutines. The 'call graph' profile contains inclusive and exclusive time spent in
            subroutines and functions. Section 3.6.2, which describes the GNU compiler,
            identifies information about the compiler flags for profiling.

        4.3.3    Debugging
        There are two types of debuggers to consider: gdb and TotalView.

            gdb
            The GNU gdb debugger is available on the cluster. For serial debugging, just start up
            gdb.
            Debugging parallel programs is much more difficult than debugging serial programs.
            Parallel programs are subject not only to the usual kinds of bugs, but also to those
            having to do with timing and synchronization errors. The following examples illustrate
            some ways to use gdb.
             Using gdb with MPI Code – the Printf approach
                If printf is being used to aid in code debugging, it is important to include the rank
                (i.e., the MPI process ID) of the process that is being traced in each message
                printed during the debugging
                Starting MPI jobs with a debugger
                 The -dbg=<name of debugger> option to mpirun causes processes to be run
                 under the control of the chosen debugger. For example, enter
                 mpirun -dbg=gdb or mpirun -dbg=gdb a.out
                 to invoke the mpirun_dbg.gdb script located in the mpich/bin directory. This script
                 captures the correct arguments, invokes the gdb debugger, and starts the first
                 process under gdb.

            TotalView
            TotalView is a fast and efficient application debugger supporting the Fortran, C, and
            C++ languages as well as most parallel computing paradigms. TotalView enables
            developers of Linux applications to graphically debug their code. TotalView supports
            multiprocess and multithreaded applications. This tool is currently installed on nodes
            25-29 of the MIDAS cluster.

        4.3.4    Fatal Errors
        Errors that require restarts or discussion with RTI personnel should be initiated through
        the MIDAS Help Desk.



        4.3.5    Java Optimization
        The IBM Java2 Diagnostic Manual covers diagnosing and optimizing the performance of
        Java code. See Chapter 8, Appendix XX, for details on how MIDAS performance can
        be optimized.

     4.4   Model Validation
   Model validation verifies that the simulation is a good model of the target. Validity can be
   ascertained by comparing the output of the simulation with data collected from the target.
   However, keep in mind that:

                 Both model and target processes are stochastic (i.e., based partly on random
                  factors). Do not expect them to correspond exactly on every occasion. Whether
                 the difference between simulation and data from the target is so large as to cast
                 doubt on the model depends partly on the expected statistical distribution of the
                 output measures. Unfortunately, with simulations, these distributions are rarely
                 known and not easy to estimate.
                Many simulations are path-dependent; the outcomes depend on the precise
                 initial conditions chosen because these affect the “history” of the simulation. In
                 other words, the outcomes may be very sensitive to the precise values of some
                 of the assumptions in the model.
                Even if the results obtained from the simulation match those from the target,
                 there may be some aspects of the target that the model cannot reproduce.
                Do not forget the possibility that the model is correct, but the data about the
                 target are incorrect, or more often, result from making assumptions and
                 estimates. Data accuracy issues frequently arise when the model is highly
                 abstract. It may be hard to relate the conclusions drawn from the model to any
                 particular data from the target. In a highly abstract model, it is not clear what data
                 could be used to validate it directly. This issue arises with models of artificial
                 societies, where the target is either intentionally remote from the simulation or
                 does not exist at all.
   Read more about model validation in the white paper, Disease Model Validation.
    The Informatics Group can also provide an independent assessment of model performance.
    This can be done within a model or across multiple models using a variety of standard
    validation approaches. Users interested in this type of service should contact Phil Cooley
    (pcc@rti.org) for more information.

     4.5   Setting up production runs – scripts
   The RTI Informatics Group will help the modelers develop Linux scripts to run their models in
   batch mode. Because all of the models are stochastic, a random number seed will initiate each
   model replicate. It is possible to exploit the parallel nature of the cluster without parallelizing
   the internal code. This can be accomplished in two ways.
    1. Run a different replicate on each node (or processor). Up to 64 (128) replicates
         can be run simultaneously using this approach.
   2. Allocate the compute cycles according to a specific parameter that is being varied in a
        parameter sweep investigation. Each of the 64 (128) nodes (processors) would run all of
        the variations made against a single parameter.




     4.6   Model Enhancement

        4.6.1    Other Diseases
        As new models that focus on different diseases and/or pathogens are developed, the
        Informatics Group will acquire data for estimating the appropriate parameters and identify
        the other system resources that are needed to support and expedite model development.

        4.6.2    Developing a general model
        The model development platforms to date have involved variations of the third-generation
        computer language C++.
        To develop a general model that supports the simulation of different diseases, operates
        on different population settings, and investigates the effectiveness of different control
        strategies, it will be necessary to analyze the disease modeling process. This analysis
        will identify the generic components of the disease modeling environment. Once the
        components are identified, the models can be reassembled and/or reprogrammed around
        these components to enable a new disease to be accommodated more easily.

        4.6.3    Documenting Models
        The documentation for a given model should consist of a manuscript (either published or
        submitted for publication). The documentation also should include the source code.
        The Informatics Group continues to work with the developers to compile a user’s manual
        for each model.




5 Model and Data Repositories
The purpose of the MIDAS Model Repository is to provide the various collaborators of the
MIDAS project with the ability to maintain a reference library of previously developed models
and experiments with their results, and to link their model description, code, results and data.
Four primary component systems comprise the repository:
   1. Compute Servers;
   2. Large File System;
   3. Database System; and
   4. Version Control.
These components are described in subsequent portions of this section. In its final production
version, the MIDAS Model Repository will provide a Graphical User Interface (GUI) that will
allow members of the MIDAS community to access a suite of simulation codes, check them out
of a repository, run them, perform data pre- and post-processing tasks and view results, all via
the MIDAS portal.

Figure 5.1 MIDAS Model Repository overview.

     5.1   Compute Servers

        5.1.1    Cluster and job queuing system (combined)
        The job queuing system provides access to the compute servers for MIDAS users. It is
        the primary and preferred access method for analyst users. Additionally, code maintainers
        and developers can access all of the Linux system's features as needed to make
        changes to the code.
        Through the system, users can check availability of compute nodes, submit and manage
        a distributed or batch job queuing script to the cluster, or request an interactive session to
        debug a parallel code on multiple nodes.

        5.1.2    Cluster management system
        The cluster systems administrator uses this system to perform a variety of tasks,
        including to:
                 Remove a faulty compute node from the cluster configuration.
                 Take blocks of nodes off line to perform system maintenance.
                 Patch system software.
                 Change the default walltime limit for the high-memory nodes queue from four
                      hours to three.

        5.1.3    Server for large serial applications
        The use cases for this are covered above, since this server is simply one or more cluster
        compute nodes with extra memory.




        5.1.4    Spillover capacity
        See Section 3.6.7.

     5.2    Large File System
   Large data files as well as simulation data files will be stored in an archival file system that can
   be accessed by the users for data retrieval and to perform extensions to existing studies. All
   data manipulations can be performed via the GUI, although experienced users can access the
   file system through standard Linux facilities. Access level controls will be enforced according
   to the permissions granted by the owning research groups.

     5.3    Database System

    The database system associates metadata describing simulation runs with the input and
    output datasets generated by those runs. The database system allows users to perform
    keyword searches on the system.

        5.3.1    Metadata used to query results
        The database maintains tables of metadata that describe the following entities:
                    Studies – a collection of experiments;
                    Experiments – a collection of runs;
                    Runs – a set of output data generated by a single unique parameter set;
                    Models – the code that generates the runs.

        The runs and models categories provide information about the model details and the
        results generated by those models. The studies and experiments categories provide
        information about the collective set of run results. Applications have been developed that
        display models, metadata views and results.
        Studies are collections of experiments whose results have been published and can be
        repeated by other researchers; an example is the Emory influenza study published
        in Science, August 12, 2005.
        An experiment in the Emory study might entail the set of runs used to investigate the
        impact of closing schools on the infection curve. Run 1 would be a simulation case where
        students continued to go to school during a disease outbreak. Run 2 would be a
        simulation case in which the students were kept home, starting at the point in simulated
        time when the first symptomatic people appear in the simulation. The experiment
        results would show the differences in infection rates between the two cases.
        A run is the unit of analysis. Each run constitutes a single replicate of a single set of
        parameters.

        5.3.2    Metadata
        The fields identified in the searchable metadata tables maintained by the database
        system include two types of metadata: searchable attributes and annotation attributes.
        The searchable attributes sections – i.e. attributes at the study, experiment, run and
        model levels – identify the metadata captured by the system to support the query
        process. Result-level and model-level annotation attributes are also maintained as
        metadata to provide additional information about the models and the results. Annotation
        attributes are not used to query records.

        The following are lists of the various attributes:
           Searchable Attributes at the Study level



                    Study ID (Primary ID)
                    Diseases
                    Geographic regions
                    Developers (PI, institution)
                    Time frames of runs (present, 1918)
                    Publication describing experiment of study

           Searchable Attributes at the Experiment level
                Study ID
                Experiment ID
                Manuscript describing experiment
                Containment strategies

           Searchable Attributes at the run level
                Study ID
                Experiment ID
                Replicate ID - note the summary run will be assigned a replicate value

           Searchable Attributes at the model level
                Version ID (primary key)
                Diseases
                Geographic regions
                Containment strategies
                Developers (PI, institution)
                Time frames of runs (present, 1918)

           Result-level Annotation Attributes
                Version ID
                Replicate
                Containment code
                Start random seed
                Compute time to produce results
                Social network measures average links per node at iteration 0
                Social network measures average links per node at iteration 50
                R0 value
                Summary description of run

           Model-level Annotation Attributes
                Version ID
                Urban, rural, mixed
                Location and description of regions
                Developer contact information
                Computer resources used to produce results
                Programmers
                Platform details
                Compiler details
                Support libraries
                Support databases
                Location of script running jobs
                Location of manuscript describing results
                Location of model source code
                Location of model executable


                    Other attributes re input and output files or data types or type of model (in
                     terms of computational or statistical characteristics)

     5.4   Version Control System
   The MIDAS model repository uses Subversion as its version control system. A user of the
   MREP may check out an existing executable version of a model to make additional simulation
   runs. This allows the researcher to go back and replicate a series of runs, or extend a study
    from a version of a code that was used previously. The Subversion module also allows a
   researcher to check in new versions of code and studies. All interactions with the version
   control system occur through the Graphical User Interface (GUI) of the model repository. The
   user does not interact directly with the configuration control system.

     5.5   Bug Report and Change Tracking

   The MIDAS model repository uses Bugzilla as its change/enhancement request tool. This tool
   allows users to create bug reports and feature requests, and submit feedback to developers to
   provide status information on active tasks.

     5.6   MIDAS User Interface (UI)
    Users interact with the model repository through the MIDAS User Interface (UI). Through
    the UI, users can:
              browse the repository, using key words to search for studies of interest;
                request and receive run-time access to the latest production version of a
                 simulation code (EpiSims, for example), or any other specific version of that
                 code;
                request and receive specific input data sets, such as for the Chicago pneumonic
                 plague case entitled "Subway Contamination Scenario #32B";
                view available output data graphs from a specific named simulation run;
                perform a data pre-processing task, such as create a new initialized health file for
                 a population;
                generate a parameter sweep of input data sets for a new series of simulation
                 runs;
                launch a set of simulation runs that have been previously set up;
                check in a new production version of a simulation code (source and executable);
                check in a new experiment (a collection of runs);
                request a report on the differences between the metadata for specific versions,
                 such as the 12/2/2004 and the 4/8/2005 versions of the production version of a
                 specific code;
                request a specific high-memory compute node upon which to run a large
                 memory-intensive serial job;
                specify a series of jobs to run in sequence (i.e. second one starts after first one
                 finishes successfully). User is notified if any job in the sequence fails, and the
                 location of any log and error messages;
                specify a test case for regression testing prior to checking in a new version of the
                 code;
                request a report of all job activity on the cluster;
                use the Bug Tracking interface to report the new bug or request a new feature.




     5.7   Geodatabase Metadata Explorer
   The Geodatabase Metadata Explorer (GME) is a web-based interface to MIDAS geospatial
   data available from RTI. The GME allows users to search geospatial metadata by location and
    keyword. Once data layers of interest have been identified, complete Federal Geographic
    Data Committee (FGDC) metadata can be viewed. Users can therefore find out whether data
    of interest are available and, if so, what the specific contents and formats are.

        5.7.1    Accessing the GME
        The GME is accessible via the MIDAS Portal under the Data tab.

        5.7.2    Using the GME
        The GME interface allows the user to specify a geographic and keyword search for
        spatial data.




        Figure 5.2 The GME query interface

        The main interface shown in Figure 5.2 includes a search tab and a browse tab along the
        top. The browse tab lists all of the geospatial data available.
        To use the search tab, follow the three-step process outlined on the query interface page
        to initiate a query.
              In Step 1, use the tools and map in the upper left to define a geographic search
                  area. Use the default tool (the red box) to click and drag an area for the search.
                  Any geospatial data that touches or is within the box you draw will be considered
                  for the search.
                 In Step 2, use the content type dropdown list to specify the types of data you
                  are looking for. Currently it is acceptable to leave this dropdown at the default,
                  <All Content Types>.
                  Also in Step 2, specify a content theme from the dropdown list and/or use an
                  optional keyword to look for data having a particular theme (e.g. ‘population’,
                  ‘administrative boundaries’, etc.).
                In Step 3, click the yellow ‘Start Search’ button to initiate the search. If you click
                 the Search NSDI Clearinghouse button the search will be expanded beyond the
                 data housed for the MIDAS project and additional geospatial data published on
                 the web will be queried.




        5.7.3    Using GME Query Results
        After clicking the ‘Start Search’ button, the GME will report the results of the search. In
        the example below, the geographic search area was centered on the United States and
        ‘Admin & Political Bounds’ was selected from the content theme dropdown. Eight data
        layers were returned that meet those criteria.




     Figure 5.3 Results of a U.S. search for “Admin & Political Bounds” data
        The scrollbar on the right side of the window allows you to scroll through all the results.
        Each result includes the name of the publisher of the data, the content title, and the
        coverage area. In addition, a button called ‘View Details’ is included for each resulting
        data layer.

        Click the ‘View Details’ button to retrieve more detailed metadata about a particular layer.
        Figure 5.4 below illustrates the results of clicking the ‘View Details’ button for the
        Midas_DBO.NPTS_Tracts data shown in Figure 5.3 above.




Figure 5.4 “View Details” results for the Midas.DBO.NPTS_Tracts layer
        The information shown in the Details page provides additional information about the
        layer. At the bottom of the Details page are two additional buttons (Figure 5.5).




Figure 5.5 Bottom of the Details page
        Clicking the ‘View Coverage Area’ button produces a popup window illustrating the
        spatial extent of the layer, as shown in Figure 5.6.




Figure 5.6 Coverage Area for the Midas_DBO.NPTS_Tracts layer. Because the green box
covers Alaska and Hawaii, the user can surmise that the data layer contains census tracts
for those states.
        The other button at the bottom of the Details page is the ‘View Full Metadata’ button.
        Clicking it causes the GME to produce the complete metadata record for the layer.
        The complete metadata includes information on the attribute fields



        contained in the layer as well as additional information concerning who to contact with
        questions about the layer and how it was processed.
        Figure 5.7 below illustrates a portion of a full metadata record.




Figure 5.7 A portion of a full metadata record for the Midas.DBO.NPTS_Tracts layer

        5.7.4    Retrieving Geospatial Data from the MIDAS Geospatial Database
        Once you have located a layer in GME that you are interested in using, contact Bill
        Wheaton (wdw@rti.org), Jeannie Game (game@rti.org), or David Chrest (davidc@rti.org)
        at RTI to request access to the layer.




6 Visualization Tool
   OutBreak is provided to MIDAS users to visualize model runs. OutBreak allows points
   representing the condition of a particular individual (sick, well, etc) to be rendered on an
   appropriate map. Each day can be viewed for new occurrences or every occurrence up to that
    date. Users can run the OutBreak application from within their user accounts on the Model
    Repository and use it to access data residing there (see Chapter 5).

    OutBreak is a two-part Java application. The first part reads a text file and loads the data
    into a relational database. The second part queries the database to create a series of
    overlays (one for each day). The open source Geotools library (www.geotools.org) is used
    to render the maps.

   Sections 8.2 and 8.3 provide definitions and conventions related to the OutBreak Visualization
   Tool.

     6.1   Input
   OutBreak accepts input files in a subset of Agent-Based Modeling Markup Language (AMML).
   AMML is an XML grammar written in XML Schema for the storage and transport of information
   related to the construction of agent-based models.
   AMML provides a variety of objects for describing agent-based simulation features including
   agent identifiers, agent states, geographical locations, simulation timestamps and other
   generalized values.

        6.1.1    Scope
        In general, AMML is an XML encoding (with no current associated ISO or other standard)
        for the transport and storage of information related to the conceptual framework used in
        agent-based simulations, and includes both spatial and non-spatial properties of
        simulation features.
        The AMML specification defines the XML schema syntax, mechanisms and conventions
        that:
              Provide an open, vendor-neutral framework for the definition of particular agent-
                  based simulation schemas and objects;
              Allow profiles that support proper subsets of AMML framework descriptive
                  capabilities;
              Support the description of particular agent-based simulation schemas for
                  specialized domains and information communities;
              Enable the creation and maintenance of various linked agent-based simulation
                  schemas and datasets;
              Support the storage and transport of simulation schemas and datasets;
              Increase the ability of organizations to share simulation schemas and the
                  information they describe.
        In general, implementers may decide to store simulation schemas and information in
        AMML, or they may decide to convert from some other storage format on demand and
        use AMML only for schema and data transport. OutBreak users, however, should comply
        with the specification for the subset of AMML described in the following sections.

        6.1.2    Conformance
        The framework, concepts and methodology for testing, and the criteria to be achieved to
        claim conformance are defined elsewhere (Work in Progress AMML – Conformance and
        Testing).




        6.1.3    Normative References
        The following referenced documents are indispensable for the application of the AMML
        specification. For dated references, only the edition cited applies. For undated
        references, the latest edition of the referenced document (including any amendments)
        applies.

           ISO 8601:2000, Data elements and interchange formats – Information interchange
           Representation of dates and times
           ISO/TS 19103:—1, Geographic Information – Conceptual Schema Language
           ISO 19105:2000, Geographic information – Conformance and testing
           ISO 19107:2003, Geographic Information – Spatial Schema
           ISO 19108:2002, Geographic Information – Temporal Schema
           ISO 19109:—1, Geographic Information – Rules for Application Schemas
           ISO 19115:2003, Geographic Information – Metadata
           ISO 19117:—1, Geographic Information – Portrayal
           ISO 19118:—1, Geographic Information – Encoding
           ISO 19123:—1, Geographic Information – Coverages
           ISO/TS 19139:—1, Geographic Information – Metadata – Implementation Specification
           OpenGIS® Abstract Specification Topic 0, Overview, OGC document 99-100r1
           OpenGIS® Abstract Specification Topic 1, Feature Geometry, OGC document 01-101
           OpenGIS® Abstract Specification Topic 2, Spatial referencing by coordinates, OGC
           document 03-071r1
           OpenGIS® Abstract Specification Topic 5, The OpenGIS Feature, OGC document 99-
           105r2
           OpenGIS® Abstract Specification Topic 8, Relations between Features, OGC
           document 99-108r2
           OpenGIS® Abstract Specification Topic 10, Feature Collections, OGC document 99-
           110
           IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax. (August 1998)
           IETF RFC 2732, Format for Literal IPv6 Addresses in URLs. (December 1999)
           W3C XLink, XML Linking Language (XLink) Version 1.0. W3C Recommendation (27
           June 2001)
           W3C XMLName, Namespaces in XML. W3C Recommendation (14 January 1999)
           W3C XMLSchema-1, XML Schema Part 1: Structures. W3C Recommendation (2 May
           2001)
           W3C XMLSchema-2, XML Schema Part 2: Datatypes. W3C Recommendation (2 May
           2001)
           W3C Xpointer, XML Pointer Language (XPointer) Version 1.0. W3C Working Draft (16
           August 2002)
           W3C XML Base, XML Base, W3C Recommendation (27 June 2001)
           W3C XML, Extensible Markup Language (XML) 1.0 (Second Edition), W3C
           Recommendation 6 October 2000
           W3C SVG, Scalable Vector Graphics (SVG) 1.0 Specification. W3C Recommendation
           (04 September 2001)
           W3C SMIL, Synchronized Multimedia Integration Language (SMIL 2.0). W3C
           Recommendation (07 August 2001)
           The Schematron Assertion Language 1.5. Rick Jelliffe 2002-10-01

        6.1.4    UML Schema
        Many diagrams that appear in this standard are presented using the Unified Modeling
        Language (UML) static structure diagram. The UML notations used in this standard are
        described in the diagram below.




        In this standard, the following stereotypes of UML classes are used:

         Class                  Description
         <<DataType>>           A descriptor of a set of values that lack identity (independent
                                existence and the possibility of side effects). A DataType is a class
                                with no operations whose primary purpose is to hold the
                                information.
         <<CodeList>>           A flexible enumeration that uses string values for expressing a list
                                of potential values.
         <<Enumeration>>        A fixed list of valid identifiers of named literal values. Attributes of
                                an enumerated type can only take values from this list.
         <<Union>>              A list of attributes. The semantics is that only one of the attributes
                                can be present at any time.

         In this standard, the following standard data types are used:
                CharacterString – A sequence of characters (in general this data type is
                mapped to “string” in XML Schema)
                Integer – An integer number (in general this data type is mapped to
                “integer” in XML Schema)
                Real – A floating point number (in general this data type is mapped to
                “double” in XML Schema)
                Boolean – A value specifying TRUE or FALSE (in general this data type is
                mapped to “boolean” in XML Schema)
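         The type mappings listed above can be captured as a simple lookup table. This is an
         illustrative sketch only, not part of the AMML specification:

```python
# Mapping of the AMML/UML standard data types to XML Schema
# built-in types, as listed in the table above.
UML_TO_XSD = {
    "CharacterString": "string",
    "Integer": "integer",
    "Real": "double",
    "Boolean": "boolean",
}

print(UML_TO_XSD["Real"])  # double
```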




        6.1.5    XML Schema
        The normative parts of the specification use the W3C XML Schema language to describe
        the grammar of the conformant subset of AMML data instances used by OutBreak. XML
        Schema is a rich language with many capabilities and subtleties. While a reader who is
        unfamiliar with XML Schema may be able to follow the description in a general fashion,
        this specification is not intended to serve as an introduction to XML Schema. In order to
        have a full understanding of this specification it is necessary for the reader to have a
        reasonable knowledge of XML Schema.
        The following is the DTD of the subset of AMML used by OutBreak:

                  <!ELEMENT outbreakinput ( parameters, columndata ) >
                  <!ELEMENT parameters ( general, display ) >
                  <!ELEMENT general EMPTY >
                  <!ATTLIST general
                        id ID #REQUIRED
                        source CDATA #REQUIRED
                        timestamp CDATA #REQUIRED >
                  <!ELEMENT display ( colors? ) >
                  <!ELEMENT colors ( color+ ) >
                  <!ELEMENT color EMPTY >
                  <!ATTLIST color
                        condition ID #REQUIRED
                        R CDATA #REQUIRED
                        G CDATA #REQUIRED
                        B CDATA #REQUIRED >
                  <!ELEMENT columndata ( column+, data ) >
                  <!ATTLIST columndata
                        columns CDATA #REQUIRED
                        rows CDATA #REQUIRED
                        format ( CSV | TAB | POS ) "CSV" >
                  <!ELEMENT column EMPTY >
                  <!ATTLIST column
                        id CDATA #REQUIRED
                        position CDATA #IMPLIED
                        name CDATA #REQUIRED
                        description CDATA #IMPLIED
                        datatype ( Integer | Real | String ) #REQUIRED >
                  <!ELEMENT data ( #PCDATA )* >




        The following is a sample XML file that uses the DTD above:

                <?xml version="1.0" encoding="UTF-8" standalone="no" ?>
                <!DOCTYPE outbreakinput SYSTEM "OutBreakInput.dtd" >
                <outbreakinput>
                    <parameters>
                        <general
                            id="SampleOutputFile"
                            source="RTI Influenza Model V1.0"
                            timestamp="10/10/05" />
                        <display>
                            <colors>
                                <color condition="II" R="255" G="0" B="0" />
                                <color condition="RR" R="0" G="255" B="0" />
                                <color condition="DD" R="0" G="0" B="0" />
                            </colors>
                        </display>
                    </parameters>
                    <columndata columns="8" rows="100" format="CSV" >
                        <column id="1" position="1" datatype="Integer"
                            name="RESERVED" description="Not Used" />
                        <column id="2" position="2" datatype="Integer"
                            name="AgentID" description="Unique ID" />
                        <column id="3" position="3" datatype="String"
                            name="Condition" description="Agent State" />
                        <column id="4" position="4" datatype="Integer"
                            name="Day" description="Simulation Day" />
                        <column id="5" position="5" datatype="Real"
                            name="Latitude" description="Latitude" />
                        <column id="6" position="6" datatype="Real"
                            name="Longitude" description="Longitude" />
                        <column id="7" position="7" datatype="Integer"
                            name="Other1" description="Other 1" />
                        <column id="8" position="8" datatype="String"
                            name="Other2" description="Other 2" />
                        <data>
                            <![CDATA[
                                212250,91206,II,1,2215494.39,704368.64,25,Firefighter
                                .
                                . 98 other entries in this example
                                .
                                322340,43251,RR,321,3415334.84,714458.21,42,Teacher
                            ]]>
                        </data>
                    </columndata>
                </outbreakinput>
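        A reader for this input format can be sketched as follows. This is a minimal Python
        illustration using the standard library’s non-validating XML parser; the inline document
        is a two-row reduction of the sample above (DOCTYPE omitted, since the parser does
        not validate against the DTD):

```python
import xml.etree.ElementTree as ET

SAMPLE = """<outbreakinput>
  <parameters>
    <general id="SampleOutputFile" source="RTI Influenza Model V1.0" timestamp="10/10/05"/>
    <display>
      <colors><color condition="II" R="255" G="0" B="0"/></colors>
    </display>
  </parameters>
  <columndata columns="8" rows="2" format="CSV">
    <column id="1" position="1" name="RESERVED" datatype="Integer"/>
    <column id="2" position="2" name="AgentID" datatype="Integer"/>
    <column id="3" position="3" name="Condition" datatype="String"/>
    <column id="4" position="4" name="Day" datatype="Integer"/>
    <column id="5" position="5" name="Latitude" datatype="Real"/>
    <column id="6" position="6" name="Longitude" datatype="Real"/>
    <column id="7" position="7" name="Other1" datatype="Integer"/>
    <column id="8" position="8" name="Other2" datatype="String"/>
    <data><![CDATA[
212250,91206,II,1,2215494.39,704368.64,25,Firefighter
322340,43251,RR,321,3415334.84,714458.21,42,Teacher
]]></data>
  </columndata>
</outbreakinput>"""

def parse_outbreak(xml_text):
    """Map CSV data lines onto the column names declared in <columndata>."""
    root = ET.fromstring(xml_text)
    cd = root.find("columndata")
    names = [c.get("name") for c in cd.findall("column")]
    lines = [l.strip() for l in cd.findtext("data").splitlines() if l.strip()]
    return [dict(zip(names, line.split(","))) for line in lines]

rows = parse_outbreak(SAMPLE)
print(rows[0]["Condition"])  # II
```

        A production reader would also honor the declared `format` (CSV, TAB or POS) and
        coerce each field according to its declared `datatype`.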




     6.2   Output
   OutBreak includes two output capabilities. Both functions use the Java Advanced Imaging
   (JAI) API from Sun. These libraries include the capability to output Java screens to a variety
   of image formats.
   1. A button at the bottom of the screen exports a JPG of the screen and graph. A file
       browser opens to select the desired location of the output file.
   2. A button at the bottom of the screen allows a movie to be exported in AVI format. The
       movie output corresponds to the settings on the map display screen. A file browser
       opens to select the desired location of the output file.

     6.3   Getting Started
   To run OutBreak you need the Java Runtime Environment (JRE) version 1.4.2. To download
   it, go to http://java.sun.com/j2ee/1.4/download.html#sdk and install according to the setup
   instructions for your operating system.

     6.4   Running OutBreak
   OutBreak is run from the Model Repository query screen. Once a model run of interest is
   found, click the “Launch Outbreak” link to its left.

   Once OutBreak is launched, parsing of the file begins on the cluster and can take several
   seconds depending on the number of events in the file. A progress bar at the bottom of the
   screen counts the lines loaded until the full data file is ready.

   After converting the text file to a data store, the main Movie Panel will load. From there you
   can zoom and move around the map, turn layers on and off, start/stop the day progression,
   choose a particular day to look at, increase/decrease the speed of day progression, or choose
   another file. Figure 6.1 shows the Movie Panel interface; Section 6.5 describes the
   functionality.

   Note: Use the left mouse button and wheel to zoom in and out. Right clicking will bring up a
   context-sensitive menu including the “Show Magnifier” option. The magnifier can be moved
   around and zoomed independently of the main map.

   You can manipulate the display by toggling layers. To turn layers on and off, click the Layers
   tab at the top of the Movie Panel. After making your selection, click the Movie tab to return to
   the main display area.




Figure 6.1 OutBreak main map screen

     6.5    Legend
   The following legend describes the functionality shown in Figure 6.1:
   “A” renders the map panel. Display is based on the shape file chosen in the File Selection
   Panel.
   “B” displays the coordinates of the current cursor location at any point in the map.
   “C” is a list of all conditions found in the original text file. The colors to the left indicate how the
   points will be displayed. A scroll bar will appear if the number of conditions exceeds the
   amount of viewable space in the box.
   “D” displays video controls. You may Rewind, Play and End the video progression. When Play
   is in progress, a Pause button displays. Play will be stopped automatically when the last day is
   reached.
   “E” displays what day the movie is on.
   “F” allows you to choose the number of days between frames. Default value is one day.
   “G” contains a slider bar that represents all the days included in the simulation. Clicking and
   dragging displays the chosen day; the days passed over while dragging are not displayed.


   “H” controls how quickly the animation runs. The delay between each frame is in milliseconds.
   If the delay is set to “0” then the program will run as quickly as the local machine will allow.
   “I” allows the user to select different display modes for both the maps and the graphs
   generated by the simulation.
   “J” allows the user to set map style to a point or group display.
   “K” provides an area for the display of simple line graphs of the epidemic curve. The program
   displays a line, with counts, for each condition.
   “L” provides a means to preserve the map displayed in the OutBreak run. Export is to JPG or
   AVI format.
   “M” provides a means to conduct another OutBreak session. Clicking “Do Another” allows the
   user to return to the File Selection panel and preserves the settings from the previous run.
   Note: Using the browser Back arrow will return you to the File Selection panel but will not
   preserve the previous settings.
   “N” allows the user to close the OutBreak session and the browser at the same time by
   clicking the Exit button.




State, Regional & Local GIS Data Sources




7 Geographic Information System (GIS) Tools
     7.1     Geospatial Data and Data Development
   Epidemiologists have recognized the clear importance of distance and spatial relationships in
   disease transmission for more than 100 years. As geospatial technologies have evolved and
   improved over the past 30 years, these technologies and data have become an integral
   component of analyzing, illustrating, modeling, and understanding disease transmission. To
   support the MIDAS system in the use of state-of-the-art geospatial technologies, GIS is being
   incorporated into MIDAS in a number of ways as described in this chapter.

   MIDAS will provide geospatial data from other sources and will modify, process, or analyze
   existing geospatial data to make it conform to the requirements of MIDAS modelers.

   There has been an explosion in the sources and quantity of geospatial data over the past 30
   years. Federal, state and local government agencies are the largest creators and consumers
   of geospatial data. Therefore, much of the data that will be available in MIDAS will come from
   governments (United States and international). Usually, these data will be available free of
   charge, although license restrictions related to use and distribution must be followed.
   However, for certain specialized data that are not available publicly or whose source is a
   foreign government that does not provide data publicly, MIDAS will purchase data that is
   deemed to be essential to model development, testing and production.

         7.1.1    Geospatial Data Concepts
         Geospatial data generally falls into three categories:

        Vector data, represented by points, lines, and areas. These data (sometimes generically
        referred to as ‘shapefiles’ or ‘coverages’) may be stored in disk-based structures or within
        relational database management systems (RDBMS) software that has been enabled for
        spatial data handling. Vector data are referred to as discrete data because they exist only
        in space where the point, line or polygon boundaries exist.

        Raster or ‘grid’ data, represented by regularly spaced rows and columns of data (‘pixels’
        or grid cells). Examples of raster data include satellite imagery, aerial photos and digital
        elevation models (DEMs). Raster data are an appropriate format for imagery or for other
        continuous data where each pixel may have a different value than its neighbor.

        Geocoded data, represented in flat files, spreadsheets or any other format. These data
        must have a geographic identifier that allows the data to be linked to spatial layers.
        Geocodes can be as simple as addresses, ZIP codes or county codes. A wide variety of
        geocode identifiers are available. Some that may be of use in MIDAS include three-digit
        airport codes, which allow airport locations to be linked to detailed transportation data
        available from non-GIS sources.
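        Linking a geocoded flat file to a spatial layer is essentially a keyed join. A minimal
        Python sketch follows; the airport codes are real IATA codes, but the coordinates and
        passenger counts are made up for illustration:

```python
# Spatial layer: airport locations keyed by three-letter airport code.
airport_points = {
    "RDU": (35.88, -78.79),   # illustrative coordinates only
    "ATL": (33.64, -84.43),
}

# Non-GIS flat file: passenger volumes keyed by the same geocode.
passenger_counts = [
    {"airport": "RDU", "passengers": 9_500_000},
    {"airport": "ATL", "passengers": 93_000_000},
]

# Join on the geocode to produce mappable records.
linked = [
    {**rec,
     "lat": airport_points[rec["airport"]][0],
     "lon": airport_points[rec["airport"]][1]}
    for rec in passenger_counts
    if rec["airport"] in airport_points
]
print(linked[0]["lat"])  # 35.88
```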

        The geospatial data available on the MIDAS Web portal currently reflects priorities for
        research and modeling as understood by RTI. Researchers will be able to request that
        certain types of geospatial data be sought, discovered, acquired and processed to meet
        particular needs.

         7.1.2    Geospatial Metadata
         The MIDAS portal houses geospatial metadata that allows researchers to query and
         quickly find geospatial data of interest to them.






         Geospatial metadata conforms to Federal Geographic Data Committee (FGDC)
         standards to ensure that users understand the source, content and use constraints of
         every geospatial dataset available.
         Metadata will be searchable and provided with all geospatial datasets downloaded from
         the MIDAS portal.

         7.1.3    Geospatial Data Acquisitions
         Data acquisition and identification are an important part of the IG support role. As such,
         activities involving data acquisition and development have been organized into a single
         management structure under a data acquisition group leader.

         7.1.4    Geospatial Data Acquisition Goals and Objectives
       Geospatial data acquisition fits within the larger MIDAS data acquisition context. The
       Data Identification and Acquisition Group supports the objectives of the external
       Research Groups and internal RTI research being conducted within the larger context of
       RTI’s Informatics role in MIDAS. The key activities are to:
   Determine/anticipate the data needs of the modelers to directly support data needs of their
    models
   Acquire the data requested by the RGs and place it in a format that enables the data to be
    easily used by the RGs to prepare for an experiment.
   Embark on custom data development/configuration projects based on requests from RGs
   Document the source, content, and structure of all data collected and prepared for use in
    MIDAS
   Make data accessible to the MIDAS network in general

         7.1.5    Geospatial Data Acquisition Scope
         The Data Identification and Acquisition Group will be involved in structured data (i.e.,
         data that are already managed, or can be managed, in a database); non-structured data
         (i.e., single data values such as R-naught estimates from past epidemics and other
         model estimates); geospatial and non-geospatial data; and historical data.

         Data acquisitions will continue for the life of the project. Section 8.3 contains a table
         summarizing geospatial data currently available.

         7.1.6    Geospatial Data Formats Available to MIDAS
         There are a number of proprietary and open standard formats for geospatial data. MIDAS
         geospatial data are usually stored in their native format for grid/raster data and in a
         geodatabase for vector data. Most geocoded descriptive files are stored in RDBMS
         tables. Because the MIDAS models usually require custom formats for geospatial data,
         users can reformat all geospatial data acquired and available through the MIDAS portal
         to meet the needs of the models. Upon request, modelers can have geospatial data
         reformatted and configured to meet their particular needs. Conversions from one
         geospatial format to another (for example, from a shapefile to a geodatabase) are
         possible, as well as conversions from any existing geospatial format to new ASCII text
         format. There are a number of standards for tagging and outputting geospatial data in
         ASCII text format. Where appropriate, these standards should be employed. For
         example, the Open Geospatial Consortium (OGC) publishes standards for geospatial
         data that would allow modelers’ data to be easily ingested and understood by other
         geospatial applications.
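         As an example of such a standard, point and polygon geometries can be tagged in
         OGC Well-Known Text with a few lines of code. This is an illustrative sketch covering
         only the simplest cases (a single point and a single-ring polygon):

```python
def wkt_point(x, y):
    """Render a single coordinate pair as OGC Well-Known Text."""
    return "POINT ({} {})".format(x, y)

def wkt_polygon(ring):
    """Render one ring as a WKT POLYGON (first point repeated last to close it)."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    coords = ", ".join("{} {}".format(x, y) for x, y in ring)
    return "POLYGON (({}))".format(coords)

print(wkt_point(-78.79, 35.88))  # POINT (-78.79 35.88)
print(wkt_polygon([(0, 0), (1, 0), (1, 1), (0, 1)]))
```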






         Table 7.1 lists the geospatial formats available for data in the MIDAS repository. MIDAS
         geospatial staff can convert data from these various formats or output data to the ASCII
         specifications of MIDAS model developers.


         Table 7.1             Geospatial formats available for data in the MIDAS repository
           Format                Publisher/    Description
           Standard Name         Source
           Shapefile             ESRI          Vector data format that is relatively ubiquitous in the geospatial industry.
                                               Although it is proprietary, the format has been published to allow developers
                                               to read and write the format.
           Coverage              ESRI          Coverage format is the oldest ESRI format and involves a complex, multi-
                                               directory structure. Coverage format is not open. ESRI software is generally
                                               required to read and write coverage-formatted files.
           Ungenerate            ESRI          Ungenerate format is an ASCII format that was standardized by ESRI. It
                                               allows geospatial data to be output in a simple ASCII structure and is useful
                                               when converting geospatial data from widely different platforms. Application
                                               programs to read ungenerate format are easy to develop.
           Well Known Text       OGC           Open standard text format for handling geospatial data. Commonly used as a
           (WKT) format                        format in open source geospatial databases.
           Well Known Binary     OGC           WKB is used to exchange geometry data as binary streams represented by
           (WKB) format                        BLOB values containing geometric WKB information.
           GRID                  ESRI          ArcGRID format is similar to ESRI Coverage format except that GRID format
                                               stores raster data.
           Other Imagery         Various       There are too many raster/image data formats to list here. The main ones that
           Formats                             might be used on MIDAS include .TIFF, .JPG, .GIF, GeoTIFF, .gis, .bil, and
                                               .bip.
           Spatial Database      ESRI          SDE is a software product, not a format. However, when geospatial data are
           Engine (SDE)                        stored in an SDE-based geodatabase, it is said to be in SDE format.
           PostGIS               Open Source   PostGIS is an open source geodatabase product. Geospatial data stored in a
                                               PostGIS-enabled RDBMS is in ‘PostGIS’ format.
           .MIF                  MapInfo       .MIF format is a proprietary geospatial format produced by MapInfo
                                               corporation and is used in MapInfo geospatial databases.
           GRASS                 Open Source   GRASS format is used by the GRASS open source GIS product. GRASS
                                               format is a raster format.
           Other                 Various       Other geospatial data formats and translators exist. Inquire with RTI GIS staff
                                               for information if you don’t see the format you are interested in listed here.
           Geocoded              Various       Any data file containing a geographic link (address, zipcode, administrative
                                               code, etc.) can be linked to geospatial data.
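         The table’s note that programs to read ungenerate format are easy to develop can be
         illustrated with a short sketch. This Python example assumes the common point-file
         variant, where each line is `id, x, y` and the file ends with `END` (ungenerate has
         several layouts; line and polygon files differ):

```python
def read_ungenerate_points(lines):
    """Parse ESRI ungenerate point records: 'id, x, y' per line, 'END' terminator."""
    points = {}
    for line in lines:
        tok = line.strip()
        if tok.upper() == "END":
            break
        fid, x, y = tok.replace(",", " ").split()[:3]
        points[fid] = (float(x), float(y))
    return points

pts = read_ungenerate_points([
    "1, -78.79, 35.88",
    "2, -84.43, 33.64",
    "END",
])
print(len(pts))  # 2
```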

     7.2     Geospatial Data Storage and Maintenance Plan
   Geospatial data for MIDAS is acquired, processed, managed and maintained in different
   places depending on its processing status.

         7.2.1     Geospatial Data on the MIDAS Portal
         Geospatial data is acquired and processed on RTI internal servers. Once the data are in
         suitable formats and projections, and have been properly documented, they will be
         moved to the MIDAS portal where they will be documented and available for use by
         MIDAS users through a Web page. Modelers will be able to download data directly, if
         required.

         Some of the geospatial data will be stored in geodatabases, making it accessible from
         client-based GIS products located either on the portal, on the Linux cluster, or on





         networked desktops. Other data will be stored in file-based structures in
         directories/subdirectories on the portal.
         A full list and metadata for geospatial data located on the MIDAS portal will be available
         through a link on the portal.

         7.2.2    Geospatial Data on the MIDAS Linux Cluster
         For model runs, geospatial data is stored on the Linux cluster disks. This is necessary to
         enable the CPU power of the cluster to be less constrained by I/O and network
         bottlenecks. Geospatial data located on the cluster is generally in ASCII text files and
         formatted to custom requirements specified by model developers. However, for modelers
         who wish to use native geospatial data formats for their models, RTI geospatial staff will
         provide those data in the format requested by the modeler.

         A complete list of geospatial data available to modelers on the cluster will be presented
         as a link on the MIDAS portal.

     7.3     Custom Geodatabase Development, Integration and Processing
   MIDAS geospatial staff can perform the following tasks to process secondary geospatial data:

   Projecting geospatial data from one coordinate system to another to ensure that all
   geospatial data for an analysis is in a common geographic system and uses a projection that
   is appropriate for the geographic size and location of the dataset.

   Merging and/or appending geospatial data to reassemble large geospatial datasets that
   were tiled or otherwise divided into portions to allow for easier management and transmission.

   Overlaying/integrating geospatial data to create geospatial input data useful to modelers,
   e.g. point-in-polygon overlays used to assign point features (such as persons, hospitals, or
   other facilities) to regions or polygon overlays that are used to vertically merge polygon layers;
   for example, population data with natural resource data to determine population counts and
   characteristics within non-demographic boundaries.

   Distance analyses to calculate distances between two sets of points, find the nearest
   locations (e.g., schools) to a source dataset (e.g., school-aged children), calculate the
   spatial interaction between two sets of points, perform network analysis, and apply many
   other distance-related algorithms.

   Selections, queries, subsets and aggregationsto reduce dataset size and provide data that
   are streamlined for models or aggregated to coarser levels of resolution.
   \
   Data acquisition and processing to search for, assess, and acquire geospatial data needed
   by modelers. When data are found and acquired, the data will be processed, projected,
   documented, and otherwise managed in such a way that the data can be used by any of the
   MIDAS researchers.
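The point-in-polygon overlay mentioned above can be sketched in a few lines. The pure-Python ray-casting test below is a minimal illustration only, not the production workflow (MIDAS staff would use GIS software for real overlays); the region and hospital coordinates are invented.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a ray to the right and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    `polygon` is a list of (x, y) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through (x, y)?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical region (a unit square) and point features to assign to it
region = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hospitals = {"H1": (0.5, 0.5), "H2": (1.5, 0.2)}
assigned = {hid: point_in_polygon(x, y, region)
            for hid, (x, y) in hospitals.items()}
```

In a real overlay the same membership test is run for every point feature against every region polygon (with spatial indexing to avoid the quadratic cost).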

     7.4     GIS Applications Development
   GIS application development can be requested through the MIDAS portal to enable a
   particular research project. This includes custom applications programming or scripting to
   enhance, automate, refine or otherwise standardize GIS data transfer mechanisms,
   input/output, and GIS processing for models. An example of such an application is the
   OutBreak visualization tool (Chapter 6).
   A variety of programming environments are employed, depending on the needs of the
   researchers. The underlying GIS functions and methods are programmed in Visual Basic,
   Java, or other languages available for the desktop GIS.


77b63587-bd68-4731-a5f7-ebca3c46b609.doc         64                                           2/1/2013
State, Regional & Local GIs Data Sources


     7.5     Model Parameterization
   GIS can be used to calculate spatially referenced parameter values for models. Examples
   include population, subpopulation, socioeconomic variables, environmental variables (water,
   elevation, soil, vegetation, land cover, etc.), health/disease variables, vaccine locations,
   and over-the-counter (OTC) drug purchases (GI relief products, etc.). Section 7.5.1 provides
   a step-by-step example.
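As one illustration of deriving a spatially referenced parameter, the sketch below sums population within a given radius of a point, using the haversine great-circle distance between lat/lon coordinates. The block-group centroids and counts are invented for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def population_within(lat, lon, radius_km, block_groups):
    """Sum the population of block-group centroids within radius_km."""
    return sum(pop for (blat, blon, pop) in block_groups
               if haversine_km(lat, lon, blat, blon) <= radius_km)

# Hypothetical block-group centroids: (lat, lon, population)
block_groups = [
    (35.78, -78.64, 1200),   # ~1-2 km from the query point
    (35.80, -78.66, 900),
    (36.50, -79.50, 2500),   # roughly 100 km away, excluded below
]
total = population_within(35.79, -78.65, 10.0, block_groups)
```

The same pattern generalizes to any of the variables listed above: replace the population count with the attribute of interest and the radius test with whatever spatial predicate the model requires.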

         7.5.1    Creating Synthetic Agent Populations
         Overview: Using the TranSims population generator, RTI is generating a nationwide
         synthetic population from Census SF3 counts by block group and Public Use Microdata
         Sample (PUMS) individual long-form records. The resulting database will be a
         microdata version of the whole U.S. population, tied to individual household locations.

         Data sources: U.S. Census Bureau TIGER data: The TIGER data forms the spatial
         context for decennial Census data collection. TIGER data are geospatial data defining,
         among many other things, the boundaries of states, counties, Census tracts, Census
         block groups, and Census blocks. Census tabulation data are aggregated into these
         various geographic boundaries. The Census block group is the smallest Census
         geographic boundary for which the full suite of Census variables (including
         socioeconomic variables) is available.

         Summary File 3 (SF3): SF3 data contain the demographic variables from the Census,
         organized and aggregated to many different geographies. Data variables on population
         and housing are available in these files.

         Public Use Microdata Sample (PUMS): These files contain records representing 5% and
         1% samples of the occupied and vacant housing units in the United States and the
         people in the occupied units. These data are actual responses to Census long-form
         questionnaires and therefore retain family structure information. Data on households
         (number of persons in the household, number of bedrooms, age of building, access to
         telephone service, type of heating, mortgage data, and many other variables) are
         present. Data on individuals within each household (age, sex, ethnicity, language spoken,
         school enrollment, occupation, travel time to work, military service, and many other
         variables) are also present. In addition, linkages between individuals and households that
         allow the household population structure to be brought forward through further analyses
         are maintained. The 5% PUMS data are available for predefined Census areas known as
         Public Use Microdata Areas (PUMAs). This means that the sample is directly related to a
         specific and fairly small geographic area.

         The production process for the synthetic population began in November 2005.
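The generation step can be caricatured as drawing weighted PUMS household records for each block group until the block group's household count is met. The toy version below uses simple weighted sampling with invented records and counts; the actual generator additionally fits the draws to SF3 marginal distributions (a step omitted here).

```python
import random

# Hypothetical PUMS household records for one PUMA: (household_id, size, weight)
pums_households = [
    ("hh1", 2, 50.0),
    ("hh2", 4, 30.0),
    ("hh3", 1, 20.0),
]

# Hypothetical household counts by block group within that PUMA
sf3_household_counts = {"bg_370630001001": 5, "bg_370630001002": 3}

def synthesize(pums, counts, seed=0):
    """Draw weighted PUMS records to fill each block group's household count.

    Each draw clones a PUMS household (with its person structure) into the
    block group, so the synthetic population retains family structure.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    weights = [w for (_, _, w) in pums]
    synthetic = {}
    for bg, n in counts.items():
        synthetic[bg] = [rng.choices(pums, weights=weights, k=1)[0]
                         for _ in range(n)]
    return synthetic

pop = synthesize(pums_households, sf3_household_counts)
```

A final step (not shown) would assign each cloned household to a specific location within its block group, producing the household-level geography described above.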








                            Figure 7.1     PUMAs for a portion of North Carolina. PUMA
                            02900 covers two counties, whereas PUMAs 02801 and 02802 each
                            cover a portion of a county.


     7.6     GIS Technical Support/Training
   The MIDAS geospatial staff can provide technical assistance, support and consultation in the
   proper use of MIDAS geospatial data or in the use of most of the software technologies being
   made available.

         7.6.1    Informal Support/Consulting
         MIDAS geospatial staff can:
          - help researchers understand the content and structure of MIDAS geospatial
            data;
          - help researchers make proper use of geospatial data; and
          - provide advice on methods of processing or manipulating geospatial data to
            suit modelers’ needs.
         This kind of support can be offered informally through e-mail or telephone
         communications.

         7.6.2    Formal Training
         Upon request, MIDAS geospatial staff can develop more formal and defined training
         materials that address particular needs of modelers. Formal training can be provided at
         RTI’s location or at the location of MIDAS researchers.

     7.7     Web-based GIS Mapping
   Web-based GIS mapping provides a mechanism for the query and display of geospatial data
   within standard Web browsers such as Firefox and Internet Explorer. Web mapping does not
   require that data be downloaded, and there is no requirement for specialized GIS software on
   the end users’ desktop. In addition, existing national Web-based map services such as
   Terraserver and many others from USGS can be automatically loaded into the mapping Web
   sites, enhancing the visual context of the model data.






         7.7.1    Model Output Visualization
         Geospatial data illustrating the output results of model runs can be loaded into a Web-
         based mapping system so that the information can be easily and widely disseminated to
         other researchers. Visualization of the data involves the ability to turn various layers on
         and off (for example, to visualize different model runs or the model outputs at different
         points in time), zoom and pan across the dataset, and see more detailed data as the user
         zooms in. (See also Chapter 6.)

         7.7.2    Geospatial Data Selection
         Web-based mapping applications can be developed that allow researchers to
         interactively select data for their models or define selection areas for data to be
         downloaded. For example, in the case of the agent population data layers discussed in
         Section 7.5.1, if a researcher wished to run a model for a particular area of the United
         States on very short notice, a Web-based front end interface would allow the researcher
         to quickly define the area of interest on a map. This would initiate a process on the server
         that would extract and format the data for that area for use in the model.
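The server-side extraction step can be sketched as a bounding-box filter over point records followed by formatting for the model's input. The record layout and field order below are hypothetical.

```python
# Hypothetical point records: (id, lat, lon)
agents = [
    ("a1", 35.10, -80.90),
    ("a2", 35.50, -80.50),
    ("a3", 40.00, -75.00),   # outside the area of interest below
]

def extract_bbox(records, min_lat, min_lon, max_lat, max_lon):
    """Return the records whose coordinates fall inside the bounding box."""
    return [r for r in records
            if min_lat <= r[1] <= max_lat and min_lon <= r[2] <= max_lon]

def to_csv(records):
    """Format extracted records as CSV lines for transfer to the model."""
    return "\n".join(f"{i},{lat},{lon}" for (i, lat, lon) in records)

# Area of interest as drawn by the researcher on the Web map
subset = extract_bbox(agents, 35.0, -81.0, 36.0, -80.0)
```

A production service would apply the same filter via a spatial index or a spatially enabled database rather than a linear scan.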




                  Figure 7.2      Example Web-based geospatial application that allows users
                  to enter new locational data by pointing and clicking in an Internet browser
                  window


         7.7.3    Geospatial Exploratory Data Analysis
         Simply looking at a geospatial data layer, especially in the context of other relevant
         layers, provides researchers with valuable information and may provide insight into how
         their models behave under different scenarios. Web-based mapping applications also
         provide the capability to run queries against the data, so that researchers can ask
         questions about the data (i.e., supply selection criteria) and then see which data
         elements meet those criteria.

     7.8     GIS Applications Software
   Several GIS applications packages are available on the MIDAS cluster for displaying and
   manipulating geospatial data. These packages include the following programs.





         7.8.1     ArcExplorer
         A free geospatial data viewer from ESRI, ArcExplorer allows the user to display
         geospatial data from a number of sources and also to connect to publicly available
         geospatial data servers. ArcExplorer is limited in its capabilities and does not provide
         tools for data processing, updating or maintenance. It also has no capabilities for
         performing table-based joins, which severely limits its usefulness in displaying data that
         have not already been joined to geospatial features.

         7.8.2     GRASS
         GRASS is a very powerful open-source geospatial application that allows for the display,
         analysis and manipulation of geospatial data. Its power lies in its analysis capabilities,
         especially with raster data. Its limitations are in two areas: (1) the user interface is difficult
         to learn and (2) it does not provide very solid cartographic capabilities.

         7.8.3     SPRING
         SPRING is a free (but not open-source) GIS application developed in Brazil.

     7.9     Preliminary List of GIS Technologies Available

         7.9.1     Displaying, Visualizing, Analyzing GIS Data Outside Models
         Spatial Database Products: Although GIS data have traditionally been stored in file
         system/disk formats, newer technology stores and manipulates geospatial data in the
         RDBMS itself. These spatial database products are called “spatially enabled”
         RDBMSs.
         Table 7.2 Spatial Database Products.
           Product Name           Vendor            Type       Platform        Distribution              Notes
         ArcSDE                 ESRI              COTS       WIN/LINUX       pre-         SUSE Linux 8.0 w/ Oracle 9i; Red
                                                                             compiled     Hat AS ES 3.0 w/ Oracle 10g.
         PostGIS                Refractions.net   OSS        Multiplatform   C code       Sits on the OSS postgresql
                                                                                          RDBMS
         Oracle Spatial         Oracle            COTS       WIN/LINUX       pre-
                                                                             compiled
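As an illustration of the "spatially enabled" idea, a query against PostGIS (listed above) can use spatial functions directly in SQL. The snippet below only assembles such a query string in Python; the table and column names are invented, while ST_DWithin, ST_MakePoint, and ST_SetSRID are standard PostGIS functions.

```python
def within_query(table, lon, lat, meters):
    """Build a PostGIS-style query selecting features within a distance
    of a lon/lat point (SRID 4326, i.e. WGS84). Table/column names are
    hypothetical; a real application would use a DB driver with bound
    parameters rather than string formatting.
    """
    return (f"SELECT id FROM {table} "
            f"WHERE ST_DWithin(geom::geography, "
            f"ST_SetSRID(ST_MakePoint({lon}, {lat}), 4326)::geography, "
            f"{meters});")

q = within_query("hospitals", -78.6, 35.8, 5000)
```

The point is that the distance search runs inside the RDBMS, against indexed geometry columns, instead of requiring the data to be exported to a GIS first.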

         Internet Map Servers: These products allow for the presentation and manipulation of
         geospatial data in Web browsers. They are server-based products that allow for the
         display of multiple layers of geospatial data. The user has buttons/tools to turn layers on
         and off, perform queries, zoom/pan, and otherwise manipulate the map dynamically.
         These products use geospatial data located on a server and are dynamic in the sense
         that (1) new maps are created in the browser based on user manipulation and (2)
         changes in the data on the server will show up in the Web map immediately.


         Table 7.3 Internet Map Servers.

              Product Name             Vendor       Type       Platform        Distribution          Notes
         ArcIMS                    ESRI            COTS      WIN/LINUX       pre-       Red Hat
                                                                             compiled
         MapServer                 U of Minn.      OSS       multiplatform   C code
          GRASS                                     OSS       multiplatform   C code






         Desktop GIS: These are desktop (non-Web-enabled) products for using GIS. However,
         some of these can access geospatial data across the Internet.


         Table 7.4 Desktop GIS Products.
          Product Name              Vendor           Type   Platform        Distribution Notes
         ArcView                   ESRI             COTS   WIN                          no editing to backend geodatabase
         ArcGIS                    ESRI             COTS   WIN                          editing to backend geodatabase
         MapInfo                   MapInfo Corp.    COTS   WIN
         GRASS                                      OSS    multiplatform   C code       Mainly raster based
         Other                                                                          Several OSS projects in
                                                                                        development, none very mature


         Table 7.5 Tools for Spatial Statistics
            Product Name            Vendor    Type      Platform       Distribution                      Notes
         GRASS                               OSS     multiplatform    C code.       Includes hooks to the 'R' statistics
                                                                                    package for geospatial statistics.
         R                                   OSS     multiplatform    C code        statistics
         ArcServer                ESRI       COTS    WIN/LINUX        pre-          ArcGIS objects enabled for access via
                                                                      compiled      the Web. ArcGIS Server .NET
                                                                                    Application Development Framework
                                                                                    (ADF) runs on a Microsoft Windows
                                                                                    Server (2003 and 2000) and supports
                                                                                    IIS. ArcGIS Server ADF for Java runs
                                                                                    on a Microsoft Windows Server, as well
                                                                                    as a variety of UNIX platforms, and
                                                                                    supports numerous Web servers. The
                                                                                    user’s Java Web applications and Web
                                                                                    services will fit within the user’s
                                                                                    standard Web server environments
                                                                                    without the need to change the user’s
                                                                                    present Web server environments.

                                                                                    ArcGIS Server is designed to run
                                                                                    across a single server or a distributed
                                                                                    system of servers with any number of
                                                                                    CPUs. The initial release of ArcGIS
                                                                                    Server lets the GIS server portion of
                                                                                    ArcGIS Server run on a Microsoft
                                                                                    Windows Server (2003 and 2000). After
                                                                                    the initial release, ESRI plans to enable
                                                                                    the GIS server portion of the product to
                                                                                    also run on Sun Solaris and Linux
                                                                                    platforms. The ArcGIS Server ADF is
                                                                                    supported on Windows, in addition to
                                                                                    Sun Solaris, HP, HP-UX, and Red Hat
                                                                                    Linux (Intel).
          SAS                      SAS        COTS    WIN/LINUX        pre-          Includes some geostatistical
                                                                      compiled      capabilities.
          Geostatistical Analyst   ESRI       COTS                     pre-          Geostatistics; availability via
                                                                       compiled      ArcServer is unconfirmed.
          Other                               OSS                                    Many small research-oriented
                                                                                     packages for cluster analysis
                                                                                     and other geospatial statistics.






8 Appendices
     8.1      OutBreak Terms and Definitions

   The following terms apply to the OutBreak input:

     Term               Definition
      AMML               An XML Schema, written according to the AMML rules for simulation schemas, that
      application        defines a vocabulary of agent-based simulation objects for a particular domain of
      schema             discourse
     association        semantic relationship between two or more classifiers that specifies connections among
                        their instances [ISO 19103]
     attribute <XML>    An information item in the XML Information Set [Infoset]
                        NOTE 1 In this document an attribute is an XML attribute unless otherwise specified.
                        The syntax of an XML attribute is “Attribute::= Name = AttValue”. An attribute typically
                         acts as an XML element modifier (e.g. in <Road gml:id = “r1” />, gml:id is an attribute).
                        NOTE 2 ISO 19130 defines the term “attribute” as: property which describes a
                        geometrical, topological, thematic, or other characteristic of an entity. This definition is
                        wider than the term “attribute” as it is used in this International Standard and
                        corresponds more to a “GML property”.
     boundary           set that represents the limit of an entity [ISO 19107]
     category           One of a set of classes in a classification scheme.
     child <XML>        An XML element c that is in the content of another element p, its parent, but is not in the
                        content of any other element in the content of p
     codelist           Value domain including a code for each permissible value
     codespace          Rule or authority for a code, name, term or category
                        EXAMPLE Examples of a codespace include dictionaries, authorities, codelists, etc.
     color (child of    Defines an RGB specification for a color to be associated with an agent
     colors)            state. Attribute condition (String) refers to the name of the associated
                         agent state. Attributes R, G, and B (String) contain the decimal (range 0-
                        255) representation of color values.
     colors             Contains a set of color child elements.
     column (child of   Describes the data contained in a single column in the data element. Attribute id (String)
     columndata)        is a unique identifier for this column of data. Attribute position (String) contains either
                         the ordinal of the described column within a row of data if the columndata object
                        specifies format CSV or TAB, or a tuple indicating the starting and ending character
                        within a row of data. Attribute type (String) contains the datatype of this column of data.
                        Attribute name (String) contains a human-readable name associated with this column of
                        data. Attribute description (String) is a free-form description of this column of data.
     columndata         Defines the data contained in the data object. Contains a set of column objects.
                        Attributes columns (String) and rows (String) contain the number of columns and rows
                         of data present in the data object. Attribute Format (CSV|TAB|POS) specifies whether
                        the columns are separated by commas (CSV), tabs (TAB), or are specified in absolute
                        character positions by their respective column descriptors.
     coordinate         one of a sequence of numbers designating the position of a point in n-dimensional space
                        [ISO 19111]
                        NOTE In a coordinate reference system, the n numbers shall be qualified by units.
     coordinate         coordinate system that is related to the real world by a datum [ISO 19111]
     reference
     system
     coordinate         set of mathematical rules for specifying how coordinates are to be assigned to points
     system             [ISO 19111]
     coordinate tuple   tuple composed of coordinates






     Term                Definition
     coverage            feature that acts as a function to return values from its range for any direct position
                         within its spatiotemporal domain [ISO 19123]
     data                Contains raw data
     data type           specification of a value domain with operations allowed on values in this domain [ISO
                          19103]. EXAMPLE Integer, Real, Boolean, String, Date (conversion of a date into a
                         series of codes). NOTE Data types include primitive predefined types and user-definable
                         types.
     datum               parameter or set of parameters that serve as a reference or basis for the calculation of
                         other parameters [ISO 19111]. NOTE 1 A datum defines the position of the origin, the
                         scale, and the orientation of the axes of a coordinate system. NOTE 2 A datum may be a
                         geodetic datum, a vertical datum or an engineering datum.
     display (child of   Defines a set of colors to be used in the display of simulation result data
     parameters)
     direct position     position described by a single set of coordinates within a coordinate reference system
                         [ISO 19107]
     domain              well-defined set [ISO 19103]. NOTE 1 A mathematical function can be defined on this
                          set, i.e. in a function f: A → B, A is the domain of the function f.
                         NOTE 2 A domain as in domain of discourse refers to a subject or area of interest.
     element <XML>       an information item in the XML information set. NOTE From the XML Information Set
                         Specification: Each XML document contains one or more elements, the boundaries of
                         which are either delimited by start-tags and end-tags, or, for empty elements, by an
                         empty-element tag. Each element has a type, identified by name, sometimes called its
                         "generic identifier" (GI), and may have a set of attribute specifications. Each attribute
                         specification has a name and a value.
     feature             abstraction of real world phenomena [ISO 19101]. NOTE A feature may occur as a type
                          or an instance. Feature type or feature instance should be used when only one is meant.
     feature             association between features [ISO 19103]
     relationship
     function            rule that associates each element from a domain (source, or domain of the function) to a
                         unique element in another domain (target, co-domain, or range) [ISO 19107]
      general (child of   Describes the dataset. Attribute id (String) specifies some unique name
     parameters)         associated with the file. Attribute source (String) specifies the name of the simulation
                         that generated the data. Attribute timestamp (String) records the XML file creation
                         timestamp.
      namespace           Collection of names, identified by a URI reference, that are used in XML documents
     <XML>               as element names and attribute names [XML]
     node                0-d topological primitive [ISO 19107]
     object              entity with a well defined boundary and identity that encapsulates state and behavior
                         [ISO 19107]. NOTE A GML object is an XML element of a type derived from
                         AbstractGMLType.
     observable          Phenomenon or property that is subject to observation
     parameters          There are two types of parameters: general and display
     point               0-dimensional geometric primitive, representing a position [ISO 19107]. NOTE The
                         boundary of a point is the empty set.
     polygon             a planar surface defined by 1 exterior boundary and 0 or more interior boundaries
     property            a child element of a GML object. NOTE It corresponds to feature attribute and feature
     <GML>               association in ISO 19109. If a GML property of a feature has an xlink:href attribute that
                         references a feature, the property represents a feature association.
     quantity            property ascribed to phenomena, bodies or substances that can be used to specify the
                         size, amount, or extent of a particular phenomenon, body or substance [ISO 19126].
                         NOTE In GML a quantity is always a value described using a numeric amount with a
                         scale or using a scalar reference system. Quantity is a synonym for measure when the
                         latter is used as a noun.
      range               the set B of a mathematical function (f: A → B) is called the range of the function.





     Term                  Definition
     schema                formal description of a model [ISO 19101]. NOTE In general, a schema is an abstract
                           representation of an object's characteristics and relationship to other objects. An XML
                           schema represents the relationship between the attributes and elements of an XML
                           object (for example, a document or a portion of a document)
     semantic type         A category of objects that share some common characteristics and are thus given an
                           identifying type name in a particular domain of discourse.
     sequence              finite, ordered collection of related items (objects or values) that may be repeated [ISO
                           19107]
     set                   unordered collection of related items (objects or values) with no repetition [ISO 19107]
     simulation            Conceptual schema for data required by one or more applications [ISO 19101]
     schema
     spatial object        object used for representing a spatial characteristic of a feature [ISO 19107]
     tag <XML>             the text in an XML document bounded by angle brackets
                           EXAMPLE <Color>. NOTE A tag with no forward slash (e.g. <Color> ) is called a start
                           tag (also opening tag), and one with a forward slash (e.g. </Color> is called an end tag
                           (also closing tag).
     topological           spatial object representing spatial characteristics that are invariant under continuous
     object                transformations [ISO 19107]
     topology              A branch of geometry describing the properties of a figure that are unaffected by
                            continuous distortion [Collins Concise Dictionary]. NOTE Topology is mostly concerned
                           with identifying the connectivity of networks and the adjacency of surfaces.
     tuple                 An ordered list of values
     type                  A class in a classification system. NOTE See also data type.
      Uniform               A unique identifier, usually taking the form of a short string or address, that is used
      Resource              to identify the location of a resource.
      Identifier (URI)      NOTE The general syntax is <scheme>:<scheme-specific-part>. The hierarchical
                            syntax with a namespace is
                            <scheme>://<authority><path>?<query> - see [RFC 2396].


     8.2      OutBreak Conventions

   The following conventions apply to the OutBreak input:

   Deprecated parts of previous versions of AMML
   The verb "deprecate" provides notice that the referenced portion of the specification is being
   retained for backwards compatibility with earlier versions but may be removed from a future
   version of the specification without further notice. There are currently no deprecated terms in
   this initial version of the specification.

   Symbols and Abbreviations
   The following symbols and abbreviations are used in this chapter:

   Abbreviation         Description
   AMML                 Agent-Based Modeling Markup Language
   CRS                  Coordinate Reference System
   CSV                  Comma Separated Values
   CT                   Coordinate Transformation
   DTD                  Document Type Definition
   GIS                  Geographic Information System
   GML                  Geography Markup Language
   HTTP                 Hypertext Transfer Protocol
   IETF                 Internet Engineering Task Force




   iff             if and only if
   ISO             International Organization for Standardization
   OGC             Open GIS Consortium
   RDF             Resource Description Framework
   RFC             Request for Comments
   SMIL            Synchronized Multimedia Integration Language
   SOAP            Simple Object Access Protocol
   SVG             Scalable Vector Graphics
   UML             Unified Modeling Language
   URL             Uniform Resource Locator
   WFS             Web Feature Service
   WKT             Well-Known Text
   XML             Extensible Markup Language
   XSLT            Extensible Stylesheet Language Transformations
   0D              Zero Dimensional
   1D              One Dimensional
   2D              Two Dimensional
   3D              Three Dimensional

     8.3     State, Regional, Local GIS Data Sources

MPO                                                             SITE
Alabama
Birmingham RPC - Birmingham                                     http://www.bhammpo.org/
                                                                http://www.dothan.org/depts/commdev/commdev4g.ht
Dothan MPO - Dothan                                             ml
East Alabama Regional Planning and Development
Commission - Anniston                                           http://www.earpdc.org/
Gadsden-Etowah MPO - Gadsden                                    http://www.cityofgadsden.com/Default.asp?ID=255
Huntsville MPO - Huntsville                                     http://www.ci.huntsville.al.us/Planning/mpo.htm
Lee-Russell COG - Opelika                                  http://www.alarc.org/lrcog/
Montgomery Division of PPT - Montgomery                         http://www.montgomerympo.org/

North-Central Alabama Regional COG - Decatur                    http://www.narcog.org/

Northwest Alabama COLG - Muscle Shoals                          http://www.nacolg.com/
South Alabama RPC - Mobile                                      http://www.sarpc.org/
West Alabama PDC - Northport                                    http://www.warc.info/index.php
Alaska
Anchorage MATS - Anchorage                                      http://www.muni.org/transplan/amats.cfm
Fairbanks MPO - Fairbanks                                       http://www.ci.fairbanks.ak.us/
Arizona
Central Yavapai - Prescott                                      http://www.cityofprescott.net/
Flagstaff MPO - Flagstaff                                       http://az-flagstaff.civicplus.com/index.asp
Maricopa AOG - Phoenix                                          http://www.mag.maricopa.gov/display.cms
Pima AOG - Tucson                                               http://www.pagnet.org/default.htm
Yuma MPO - Yuma                                                 http://www.ympo.org/
Arkansas
Arkhoma RPC (Bi-State) - Fort Smith                             http://www.wapdd.org/bi_aboutus.html
                                                                http://www.ci.hot-springs.ar.us/business-
Hot Springs MPO                                                 transportation-planning.html




Jonesboro Area Transportation Study - Jonesboro          http://www.jonesboro.org
Metroplan - Little Rock                                  http://www.metroplan.org/home.php
Northwest Arkansas RPC - Springdale                      http://www.nwarpc.com/
Southeast Arkansas RPC - Pine Bluff                      http://www.cityofpinebluff.com/
West Memphis MPO - West Memphis                          http://www.memphisregion.com/transportation.asp
California
AMBAG - Marina                                           http://www.ambag.org/

Butte County Association of Governments - Chico          http://www.bcag.org/

Council of Fresno County Governments - Fresno            http://www.fresnocog.org/
Kern COG - Bakersfield                                   http://www.kerncog.org/
Madera MPO - Madera                                      http://www.maderactc.org/

Merced County Association of Governments - Merced        http://www.mcag.cog.ca.us/
MTC - Oakland                                            http://www.mtc.ca.gov/
Sacramento Area COG - Sacramento                         http://www.sacog.org/
San Diego AOG - San Diego                                http://www.sandag.org/
San Joaquin County COG - Stockton                        http://www.sjcog.org/
San Luis Obispo COG - San Luis Obispo                    http://www.slocog.org/
Santa Barbara Cnty AOG - Santa Barbara                   http://www.sbcag.org/
SCAG - Los Angeles                                       http://www.scag.ca.gov/
Shasta County Regional TPA - Redding                     http://www.scrtpa.org/
Stanislaus AOG - Modesto                                 http://www.stancog.org/
Tulare County AOG - Visalia                              http://www.tularecog.org/
Colorado
DRCOG - Denver                                           http://www.drcog.org/

Mesa County Regional TPO - Grand Junction                http://www.mesacounty.us/rtpo/
North Front Range Transportation & Air Quality
Planning Council - Fort Collins                          http://www.nfrmpo.org/
Pikes Peak Area COG - Colorado Springs                   http://www.ppacg.org/
Pueblo Area COG - Pueblo                                 http://www.pacog.net/
Connecticut
Capital Region COG - Hartford                            http://www.crcog.org/
Central Connecticut RPA - Bristol                        http://www.ccrpa.org/

COG of Central Naugatuck Valley - Waterbury              http://www.cogcnv.org/
Greater Bridgeport RPA/MPO - Bridgeport                  http://www.gbrpa.org/

Housatonic Valley Council of Elected Officials - Brookfield   http://hvceo.org/index.php
South Central Regional COG- North Haven                  http://www.scrcog.org/
South Western RPA- Stamford                              http://www.swrpa.org/
Southeastern Connecticut COG - Norwich                   http://www.norwichct.org/
Delaware
Dover/Kent MPO - Dover                                   http://www.doverkentmpo.org/indexmpo.html

Wilmington Area Planning Council - Newark                http://www.wilmapco.org/



District of Columbia

Metropolitan Washington COG - Washington             http://www.mwcog.org/
Florida
Brevard MPO - Melbourne                              http://www.brevardmpo.com/
Broward County MPO - Fort Lauderdale                 http://www.broward.org/mpo/

Charlotte County-Punta Gorda MPO - Punta Gorda       http://www.ccmpo.com/
Collier County MPO - Naples                          http://www.colliermpo.com/
First Coast MPO - Jacksonville                       http://www.firstcoastmpo.com/

Florida MPO Advisory Council - Tallahassee           http://www.mpoac.org/
West Florida MPO - Pensacola                         http://www.wfrpc.dst.fl.us/
Gainesville MPO - Gainesville                        http://www.ncfrpc.org/
Hernando County MPO - Brooksville                    http://www.co.hernando.fl.us/mpo/
Hillsborough County MPO - Tampa                      http://www.hillsboroughmpo.org/
Indian River County MPO - Vero Beach                 http://www.ircgov.com/boards/mpo/
Lee County MPO - North Fort Myers                    http://www.swfrpc.org/mpo.htm
Martin County MPO - Stuart                           http://www.martincountympo.com/
METROPLAN Orlando - Orlando                          http://www.metroplanorlando.com/home/
Miami Urbanized Area MPO - Miami                     http://miamidade.gov/wps/portal
Ocala/Marion County MPO - Ocala                      http://www.ocalamariontpo.org/
Palm Beach MPO - West Palm Beach                     http://www.pbcgov.com/
Panama City MPO - Pensacola                          http://www.wfrpc.dst.fl.us/bctpo/default.htm
                                                     http://www.pascocountyfl.net/menu/index/mpoindex.ht
Pasco County MPO - New Port Richey                   m
Pinellas County MPO - Clearwater                     http://www.pinellascounty.org/mpo/

Polk Transportation Planning Organization - Bartow   http://www.polk-county.net/county_offices/tpo/
Sarasota/Manatee MPO - Sarasota                      http://www.sarasota-manateempo.org/
St. Lucie MPO - Fort Pierce                          http://www.stluciempo.org/

Tallahassee-Leon County MPO - Tallahassee            http://www.talgov.com/planning/index.cfm
Volusia County MPO - Daytona Beach                   http://www.volusiacountympo.com/
Georgia
                                                     http://www.athensclarkecounty.com/~planningdept/ac
Athens-Clarke County MPO - Athens                    orts/

Albany Planning and Community Dev. - Albany          http://production.albany.ga.us/
Atlanta Regional Commission - Atlanta                http://www.atlantaregional.com/
                                                     http://www.augustaga.gov/departments/planning_zoni
Augusta-Richmond County PC - Augusta                 ng/

Chatham County-Savannah MPC - Savannah               http://www.thempc.org/
Columbus MPO - Columbus                              http://www.columbusga.com/mpo/
Glynn County Dept of Community Dev (Brunswick
ATS) - Brunswick                                     http://www.glynncounty.org/
Hinesville MPO - Hinesville                          http://www.cityofhinesville.org/

Macon-Bibb County Planning & Zoning - Macon          http://www.mbpz.org/



North Georgia Regional Development Center -
Dalton                                                http://www.ngrdc.org/
                                                      http://www.albany.ga.us/Plannning_Community%20D
Planning & Dev Services- Albany                       ev/plan_dev_services.htm
Rome/Floyd CPC - Rome                                 http://www.romega.us/departments/planningcom.asp
Valdosta MPO - Valdosta                               http://www.sgrdc.com/
Warner Robins MPO - Warner Robins                     http://warner-robins.org/
Hawaii
Oahu MPO - Honolulu                                   http://oahumpo.org/
Idaho
Bannock Planning Org. - Pocatello                     http://www.bannockplanning.org/
Bonneville MPO - Idaho Falls                          http://www.bmpo.org
Community Planning Association of Southwest Idaho -
Boise                                                 http://www.compassidaho.org/
Kootenai MPO - Coeur d'Alene                          http://www.kcgov.us/
Illinois

Bi-State Regional Commission - Rock Island            http://www.bistateonline.org/index_ie.shtml
CATS - Chicago                                        http://www.catsmpo.com/
Champaign County Regional Planning Commission -
Urbana                                                http://www.ccrpc.org/
                                                      http://www.cityofdanville.org/COD/Maps/DATS%20ho
Danville Area Transportation Study - Danville         me.htm

Dekalb-Sycamore Area Trans. Study - Dekalb            http://www.cityofdekalb.com/
Kankakee County RPC - Kankakee                        http://www.k3county.net/
Macon County RPC - Decatur                            http://www.ci.decatur.il.us/
McLean County RPC - Bloomington                       http://www.mcplan.org/
                                                      http://www.ci.rockford.il.us/government/works/index.cf
Rockford Area Transportation Study MPO - Rockford     m?section=planning&id=977

Springfield-Sangamon Cty RPC - Springfield            http://www.co.sangamon.il.us/transportation/
Tri-County RPC - Peoria                               http://www.tricountyrpc.org/
Indiana

Bloomington City Planning Comm - Bloomington          http://bloomington.in.gov/planning/
Columbus MPO - Columbus                               http://www.columbus.in.gov/
                                                      http://www.co.delaware.in.us/departments/plancommi
Delaware-Muncie MPO - Muncie                          ssion2/
Evansville Urban Transp. Study - Evansville           http://www.eutsmpo.com/home.htm
Greater Lafayette Area Transportation and
Development Study - Lafayette                         http://www.county.tippecanoe.in.us/
Indianapolis MPO - Indianapolis                       http://www6.indygov.org/indympo/
Kokomo/Howard County Governmental Coordinating
Council - Kokomo                                      http://www.kokomompo.com/
Madison County COG - Anderson                         http://www.mccog.net/
Michiana Area COG - South Bend                        http://macog.com/
Northeast Indiana RCC - Fort Wayne                    http://www.acdps.org/
Northwestern Indiana RPC - Portage                    http://www.nirpc.org/



West Central Indiana EDD, Inc - Terre Haute          http://www.westcentralin.com/
Iowa
Ames MPO - Ames                                      http://www.city.ames.ia.us/
Des Moines Area MPO - Des Moines                     http://www.dmampo.org/
Dubuque Metropolitan Area Transportation Study -
Dubuque                                              http://www.ecia.org/
Iowa Northland Regional COG - Waterloo               http://www.inrcog.org/
Johnson County COG - Iowa City                       http://www.jccog.org/
Linn County RPC - Cedar Rapids                       http://www.cedar-rapids.org/rpc/
Siouxland Interstate MPC - Sioux City                http://www.simpco.org/
Kansas

Lawrence-Douglas County Plng Office - Lawrence       http://www.lawrenceplanning.org/
Topeka-Shawnee County MPD - Topeka                   http://topeka.org/
Wichita-Sedgwick Cnty MAPD - Wichita                 http://www.wichita.gov/CityOffices/Planning/
Kentucky
Bowling Green MPO - Bowling Green                    http://www.bradd.org/MPO/index.asp
Green River ADD - Owensboro                          http://www.gradd.com/
Kentuckiana RP&DA - Louisville                       http://www.kipda.org/Home/Default.asp
Lexington Area MPO - Lexington                       http://www.lfucg.com/PlanDiv/TransportPlan.asp
Lincoln Trail Area Dev. District (LTADD) -
Elizabethtown                                        http://ltadd.org/
Louisiana

Capital Region Planning Commission - Baton Rouge     http://www.crpc-la.org/

Imperial Calcasieu Regional P & DC - Lake Charles    http://www.imcal.org/

Lafayette Consolidated Government - Lafayette        http://www.lafayettelinc.net/default.asp

New Orleans Regional Planning Commission                   http://www.norpc.org/
North Delta RP&DD - Monroe                           http://www.northdelta.org/
Northwest Louisiana COG - Shreveport                 http://www.nwlainfo.com/

Rapides Area Planning Commission - Alexandria        http://www.rapidesplanning.com/
South Central Planning and Development
Commission - Gray                                    http://www.scpdc.org/scpdc/
Maine
Bangor ACTS - Bangor                                 http://www.emdc.org/
Kittery Area Comprehensive Transp. Study -
Springvale                                           http://www.smrpc.org/
Lewiston-Auburn Comprehensive Trans. Study -
Auburn                                               http://www.atrcmpo.org/
PACTS - Portland                                     http://www.pactsplan.org/
Maryland
Allegany County Dept. of Planning & Zoning -
Cumberland                                           http://gov.allconet.org/plan/index.htm




Baltimore Regional Transportation Board - Baltimore    http://www.baltometro.org/index.asp

Hagerstown/Eastern Panhandle MPO - Hagerstown          N/A
St. Charles MD MPO - St. Charles                       N/A
Salisbury MD/DE MPO - Salisbury                        http://www.wicomicocounty.org/
Massachusetts

Berkshire Regional Planning Commission - Pittsfield    http://www.berkshireplanning.org/
Boston MPO - Boston                                    http://ctps.org/bostonmpo/
Cape Cod Commission - Barnstable                       http://www.gocapecod.org/
Central Massachusetts RPC - Worcester                  http://www.cmrpc.org/
Merrimack Valley PC - Haverhill                        http://mvpc.org/
Montachusett RPC - Fitchburg                           http://www.mrpc.org/
Northern Middlesex COG - Lowell                        http://www.nmcog.org/
Old Colony Planning Council - Brockton                 http://www.ocpcrpa.org/
Pioneer Valley PC - West Springfield                   http://www.pvpc.org/
Southeastern RP & EDD - Taunton                        http://www.srpedd.org/

Michigan
Battle Creek ATS - Battle Creek                        http://members.aol.com/bcatsmpo/bcathome.htm

Bay County Board of Commissioners - Bay City           http://www.co.bay.mi.us/
Genesee County Metropolitan Planning Commission -
Flint                                                  http://www.co.genesee.mi.us/gcmpc-plan/

Grand Valley Metropolitan Council - Grand Rapids       http://www.gvmc.org/
Kalamazoo ATS - Kalamazoo                              http://www.katsmpo.org/

Macatawa Area Coordinating Council - Holland           http://134.215.205.97/
Region 2 Planning Commission- Jackson                  http://www.region2planning.com/
Saginaw County MPC - Saginaw                           http://www.saginawcounty.com/planning_dep/
Southeast Michigan COG - Detroit                       http://www.semcog.org/

Southwestern Michigan Commission - Benton Harbor       http://swmicomm.org/
Tri County RPC - Lansing                               http://www.tri-co.org/
West Michigan Shoreline RDC - Muskegon                 http://www.wmsrdc.org/
Minnesota
Arrowhead RDC - Duluth                                 http://www.ardc.org/

Metropolitan Council of the Twin Cities - Saint Paul   http://www.metrocouncil.org/
Rochester-Olmsted COG - Rochester                      http://www.co.olmsted.mn.us/
St. Cloud Area Planning Org. - Saint Cloud             http://www.stcloudapo.org/
Mississippi
Central Mississippi P&DD - Jackson                     http://www.cmpdd.org/

Gulf Regional Planning Commission - Gulfport           http://www.grpc.com/

Hattiesburg-Petal-Forest-Lamar MPO - Hattiesburg       http://hattiesburgms.com/u.html



Missouri
Columbia ATS - Columbia                                  http://www.gocolumbiamo.com/Planning/
East-West Gateway Coordinating Council - Saint
Louis                                                    http://www.ewgateway.org/
Jefferson City MPO - Jefferson City                     http://www.jeffcitymo.org/
Joplin ATS - Joplin                                      http://www.fhwa.dot.gov/modiv/joplin.htm
Mid-America RC - Kansas City                             http://www.marc.org/
Springfield ATS - Springfield                            http://www.ozarkstransportation.org/
St. Joseph ATS - Saint Joseph                            http://www.ci.st-joseph.mo.us/publicworks/mpo.cfm
Montana
Great Falls City-County PB - Great Falls                 http://www.ci.great-falls.mt.us/people_offices/planning/

Missoula Policy Coordinating Comm - Missoula             http://www.co.missoula.mt.us/
Yellowstone County Planning Dept (Billings MPO) -
Billings                                                 http://ci.billings.mt.us/
Nebraska

Lincoln/Lancaster Planning Department - Lincoln          http://www.ci.lincoln.ne.us/city/plan/
Omaha-Council Bluffs MAPA - Omaha                        http://www.mapacog.org/
Nevada
Carson City MPO                                          http://www.carson-city.nv.us/
Regional Transportation Commission of Southern
Nevada - Las Vegas                                       http://www.rtcsouthernnevada.com/
Tahoe MPO - Zephyr Cove                                  http://www.trpa.org/
Washoe County RTC - Reno                                 http://www.rtcwashoe.com/
New Hampshire
Nashua RPC - Nashua                                      http://www.nashuarpc.org/
Salem-Plaistow-Windham MPO - Exeter                      http://www.rpc-nh.org/
Seacoast MPO - Dover                                     http://www.strafford.org/
Southern New Hampshire PC - Manchester                   http://www.snhpc.org/
New Jersey
North Jersey Transportation Planning Authority, Inc. -
Newark                                                   http://www.njtpa.org/welcome.html
South Jersey TPO - Vineland                              http://www.sjtpo.org/
New Mexico
Farmington MPO - Farmington                              http://www.farmingtonmpo.org/
Las Cruces MPO - Las Cruces                              http://lcmpoweb.las-cruces.org/
Mid-Region COG - Albuquerque                             http://www.mrcog-nm.gov/index.htm
Santa Fe MPO - Santa Fe                                  http://www.santafenm.gov/
New York
Adirondack-Glens Falls TC - Fort Edward                  http://www.agftc.org/
Binghamton Metropolitan Transportation Study -
Binghamton                                               http://www.gobroomecounty.com/bmts/
CDTC - Albany                                            http://www.cdtcmpo.org/

Elmira-Chemung Transportation Council - Elmira           http://www.elmirampo.org/

Genesee Transportation Council - Rochester               http://www.gtcmpo.org/
Greater Buffalo-Niagara RTC - Buffalo                    http://www.gbnrtc.org/



                                                       http://www.co.oneida.ny.us/hoctsmpo/transportation.ht
Herkimer-Oneida Counties Transp. Study - Utica         ml
Ithaca-Tompkins County TC - Ithaca                     http://owasco.co.tompkins.ny.us/itctc/
New York Metropolitan TC - New York                    http://www.nymtc.org/
Newburgh-Orange County TC - Goshen                     http://www.co.orange.ny.us/
                                                       http://www.dutchessny.gov/CountyGov/Departments/P
Poughkeepsie-Dutchess County TC - Poughkeepsie          lanning/PLPDCTCIndex.htm
Syracuse MTC - Syracuse                                http://www.smtcmpo.org/

Ulster County Transportation Council - Kingston        http://www.co.ulster.ny.us/planning/
North Carolina
Asheville Urban Area MPO - Asheville                   http://www.ci.asheville.nc.us/
Burlington-Graham MPO - Burlington                     http://www.mpo.burlington.nc.us/
Cabarrus/South Rowan MPO - Kannapolis                  http://www.crmpo.org
Capital Area MPO/LPA - Raleigh                         http://www.campo-nc.us/

Durham-Chapel Hill-Carrboro MPO - Durham               http://www.dchcmpo.org/
Fayetteville Area MPO - Fayetteville                   http://www.fampo.org/
Gaston Urban Area MPO - Gastonia                       http://www.cityofgastonia.com/
Goldsboro Transportation AC - Goldsboro                http://www.ci.goldsboro.nc.us/
Greensboro Transportation Advisory Committee -
Greensboro                                             http://www.greensboro-nc.gov/default.htm
                                                       http://www.greenvillenc.gov/departments/public_works
Greenville Urban Area TAC - Greenville                 _dept/default.aspx?id=76
High Point Transportation Advisory Committee - High
Point                                                  http://www.hpdot.net/HPMPO/default.htm
Jacksonville MPO - Jacksonville                        http://www.ci.jacksonville.nc.us/opencms/opencms
Mecklenburg - Union MPO - Charlotte                    http://www.mumpo.org/
Rocky Mount Area TAC - Rocky Mount                     http://www.wmpo.org/

Triangle J COG - Raleigh, Durham, Chapel Hill          http://www.tjcog.dst.nc.us/
Western Piedmont COG - Hickory                         http://www.wpcog.org/
Wilmington Urban Area MPO - Wilmington                 http://www.wmpo.org/
Winston-Salem/Forsyth Urban Area MPO - Winston-
Salem                                                  http://www.cityofws.org/dot/
North Dakota
                                                       http://www.bismarck.org/city_departments/department
Bismarck-Mandan MPO - Bismarck                          /default.asp?dID=16
Fargo-Moorhead Metro COG - Fargo                       http://www.fmmetrocog.org/

Grand Forks/East Grand Fork MPO - Grand Forks          http://www.theforksmpo.org/
Ohio

Akron Metropolitan Area Transportation Study - Akron   http://ci.akron.oh.us/AMATS/

Brooke-Hancock-Jefferson MPC - Steubenville            http://www.bhjmpc.org/
Clark-Springfield TS - Springfield                     http://www.ci.springfield.oh.us/
Eastgate Regional COG - Youngstown                     http://www.eastgatecog.org/
Licking County ATS - Newark                            http://www.lcats.org/



Lima/Allen County RPC - Lima                        http://lacrpc.com/
Miami Valley RPC - Dayton                           http://www.mvrpc.org/
Mid-Ohio RPC - Columbus                             http://www.morpc.org/
Northeast Ohio Areawide Coordinating Agency
(NOACA) - Cleveland                                 http://www.noaca.org/
OKI Regional COG - Cincinnati                       http://www.oki.org/
Richland County RPC - Mansfield                     http://www.rcrpc.org/
                                                    http://www.co.stark.oh.us/internet/HOME.DisplayPage
Stark County Area Transportation Study - Canton     ?v_page=rpc
Toledo Metropolitan Area COG - Toledo               http://www.tmacog.org/
Oklahoma

Assoc. of Central Oklahoma Govts. - Oklahoma City   http://www.acogok.org/
Indian Nations COG - Tulsa                          http://www.incog.org/
Lawton MPO - Lawton                                 http://www.cityof.lawton.ok.us/
Oregon
                                                    http://www.ci.bend.or.us/depts/community_developme
Bend MPO - Bend                                     nt/bend_metropolitan/
Corvallis Area MPO - Corvallis                      http://www.corvallisareampo.org/
Lane COG - Eugene                                   http://www.lcog.org/lgs/trans.html
Metro - Portland                                    http://www.metro-region.org/
Mid Willamette Valley COG - Salem                   http://www.mwvcog.org/cog/mwvcog.asp
Rogue Valley COG - Central Point                    http://www.rvcog.org/
Pennsylvania
Centre Region COG - State College                   http://www.crcog.net/
DVRPC - Philadelphia                                http://www.dvrpc.org/
Erie Area Transportation Study - Erie               http://www.eriecountyplanning.org/

Johnstown Area Transportation - Ebensburg           http://www.co.cambria.pa.us/
Lackawanna-Luzerne Transportation Study
(Lackawanna County RPC) - Wilkes Barre              http://www.luzernecounty.org/luzerne/site/default.asp
Lancaster County TCC - Lancaster                    http://www.co.lancaster.pa.us/planning/site/default.asp

Lehigh Valley Planning Commission - Allentown       http://www.lvpc.org/
Lycoming County PC - Williamsport                   http://www.lyco.org/lyco/site/default.asp
Mercer County RPC - Hermitage                       http://www.mcrpc.com/
                                                    http://www.pamunicipalitiesinfo.com/counties/Blair/ind
Blair County (Altoona MSA) MPO - Altoona            ex.htm
Reading Area Transportation - Reading               http://www.co.berks.pa.us/planning/site/default.asp
Southwestern PA Commission - Pittsburgh             http://www.spcregion.org/
Tri-County RPC - Harrisburg                         http://www.tcrpc-pa.org/
Uniontown-Connellsville MPO - Uniontown                    http://www.fayettepa.org/County/
York County Planning Commission - York              http://www.ycpc.org/
Puerto Rico
Puerto Rico DOT Public Works - San Juan             http://www.dtop.gov.pr/
Rhode Island
Rhode Island Statewide Planning Program -
Providence                                          http://www.planning.ri.gov/
South Carolina



B-C-D Council of Governments - North Charleston      http://www.bcdcog.com/
Central Midlands COG - Columbia                      http://www.centralmidlands.org/
City of Anderson MPA - Anderson                      http://www.andersoncountysc.org/

Florence Municipal/County P&BI Dept - Florence       http://www.florenceco.org/planninghome.htm
Grand Strand MPO - Georgetown                        http://www.wrcog.org/transport.htm
Greenville County PC - Greenville                    http://www.greenvilleplanning.com/
Rock Hill-Fort Mill ATS - Rock Hill                  http://www.cityofrockhill.com/
Spartanburg County Planning & Development Council
- Spartanburg                                        http://www.co.spartanburg.sc.us/

Sumter City-County Planning Commission - Sumter      http://www.sumtercountysc.org/planning/
South Dakota
Rapid City Area MPO - Rapid City                     http://www.rcgov.org/
Sioux Falls MPO - Sioux Falls                        http://www.secog.org/
Tennessee
Bristol MPO - Bristol                                http://www.bristoltn.org/

Chattanooga Hamilton County RPC - Chattanooga        http://www.chcrpa.org/

Clarksville-Montgomery County RPC - Clarksville      http://www.cityofclarksville.com/planningcommission/
Cleveland Area MPO - Cleveland                       http://www.cityofclevelandtn.com/
                                                     http://www.cityofjackson.net/departments/planning/MP
Jackson Urban Area MPO - Jackson                     O.html
Johnson City MPO - Johnson City                      http://www.jcmpo.org/
Kingsport MPO - Kingsport                            http://ci.kingsport.tn.us/
Knoxville Urban Area MPO - Knoxville                 http://www.knoxtrans.org/
Lakeway Area MTPO - Morristown                       http://www.lamtpo.org/
Memphis MPO - Memphis                                http://www.dpdgov.com/
Nashville MPO - Nashville                            http://www.nashvillempo.org/
Texas
Abilene MPO - Abilene                                http://www.abilenetx.com/
Amarillo MPO - Amarillo                              http://www.ci.amarillo.tx.us/
Brownsville MPO - Brownsville                        http://www.cob.us/
Bryan-College Station MPO - Bryan                    http://www.bcsmpo.org/
Capital Area Metropolitan Planning Organization -
Austin                                               http://www.campotexas.org/
                                                     http://www.cctexas.com/?fuseaction=main.view&page
Corpus Christi MPO - Corpus Christi                  =193
El Paso MPO - El Paso                                http://www.elpasompo.org/
Harlingen-San Benito MPO - Harlingen                 http://www.myharlingen.us/
Hidalgo County MPO - Mcallen                         http://www.lrgvdc.org/
Houston Galveston Area Council - Houston             http://www.h-gac.com/HGAC/home/Default.htm
Killeen-Temple MPO - Belton                          http://www.ktuts.org/

Laredo Metropolitan Planning Organization - Laredo   http://www.ci.laredo.tx.us/
Longview MPO - Longview                              http://www.ci.longview.tx.us/
Lubbock MPO - Lubbock                                http://mpo.ci.lubbock.tx.us/



77b63587-bd68-4731-a5f7-ebca3c46b609.doc       82                                       2/1/2013
State, Regional & Local GIS Data Sources


MPO                                                    SITE
North Central Texas COG - Arlington                    http://www.nctcog.org/
Permian Basin RPC - Midland                            http://www.txregionalcouncil.org/regions/PBRPC.php
San Angelo MPO - San Angelo                            http://www.sanangelompo.org/

San Antonio-Bexar City MPO - San Antonio               http://www.sametroplan.org/
Sherman-Denison MPO - Sherman                          http://www.sdmpo.org/
South East Texas RPC - Beaumont                        http://www.setrpc.org/
Texarkana MPO - Texarkana                              http://www.txkusa.org/tx/departments/mpo/
Tyler MPO - Tyler                                      http://www.cityoftyler.org/
Victoria MPO - Victoria                                http://www.victoriatx.org/planning/
Waco MPO - Waco                                        http://www.waco-texas.com/mpo.htm
Wichita Falls MPO - Wichita Falls                      http://www.cwftx.net/MPO/transportationplanning.htm
Utah
Cache MPO - Logan                                      http://www.cachempo.org/
Dixie MPO - St. George                                 http://www.fcaog.state.ut.us/
Mountainland AOG - Orem                                http://www.mountainland.org/
Wasatch Front Regional Council - Bountiful             http://www.wfrc.org/
Vermont
Chittenden County MPO - South Burlington               http://www.ccmpo.org/
Virginia
Central Virginia MPO (Region 2000 Regional
Commission) - Lynchburg                                http://www.regcomm.org/
Blacksburg-Christiansburg-Montgomery Area MPO -
Christiansburg                                         http://www.montva.com/departments/mpo/

Charlottesville-Albemarle MPO - Charlottesville        http://www.tjpdc.org/
Fredericksburg Area MPO - Fredericksburg               http://www.fampo.state.va.us/
Hampton Roads Planning District Commission -
Chesapeake                                             http://www.hrpdc.org/
                                                       http://www.virginiadot.org/projects/urbanplans/harrison
Harrisonburg-Rockingham MPO - Staunton                 burg.htm
Richmond Area MPO - Richmond                           http://richmondregional.org/
Roanoke Valley Area MPO - Roanoke                      http://www.rvarc.org/
Tri-Cities Area MPO - Petersburg                       http://www.craterpdc.state.va.us/MPO/mpo_main.htm

West Piedmont PDC (Danville MPO) - Martinsville        http://www.wppdc.org/
Winchester-Frederick MPO - Front Royal                 http://www.winfredmpo.org/
Washington
Benton-Franklin COG - Richland                         http://www.benton-franklin.cog.wa.us/
Cowlitz-Wahkiakum COG - Kelso                          http://www.cwcog.org/
Lewis Clark Valley MPO - Asotin                        http://www.lewisclarkmpo.org/
Puget Sound Regional Council - Seattle                 http://www.psrc.org/
Southwest Washington Regional Transportation
Council - Vancouver                                    http://www.rtc.wa.gov/

Spokane Regional Council of Governments - Spokane      http://www.srtc.org/
Thurston RPC - Olympia                                 http://www.trpc.org/
Wenatchee Valley Transportation Council -
Wanatchee                                              http://www.wvtc.org/



Whatcom COG - Bellingham                           http://www.wcog.org

Yakima Valley Conference of Governments - Yakima   http://www.yvcog.org/
West Virginia
BCKP Regional Intergovernmental Council - South
Charleston                                         http://www.wvregion3.org/
Bel-O-Mar RC - Wheeling                            http://www.belomar.org/

KYOVA Interstate Planning Comm. - Huntington       http://www.wvs.state.wv.us/kyova/
Morgantown MPO - Morgantown                        http://www.plantogether.org/

Wood-Washington-Wirt Interstate PC - Parkersburg   http://www.movrc.org/
Wisconsin
Bay-Lake RPC - Green Bay                           http://baylakerpc.org/
                                                   http://www.co.brown.wi.us/planning/transportation.htm
Brown County PC - Green Bay                        l
Madison Area MPO - Madison                         http://www.madisonareampo.org/
East Central Wisconsin RPC - Menasha               http://www.eastcentralrpc.org/
Fond du Lac MPO - Fond du Lac                      http://www.eastcentralrpc.org/
Janesville MPO - Janesville                        http://www.ci.janesville.wi.us/
La Crosse APC - La Crosse                          http://www.lapc.org/
Southeastern Wisconsin RPC - Waukesha              http://www.sewrpc.org/
State Line ATS - Beloit                            http://www.ci.beloit.wi.us/
Wausau MPO c/o Marathon County Planning Dept -     http://www.co.marathon.wi.us/infosubtop.asp?dep=27
Wausau                                             &tid=3
West Central Wisconsin RPC - Eau Claire            http://www.wcwrpc.org/
Wyoming
Casper Area MPO - Casper                           http://www.casperwy.gov/
Cheyenne Area MPO - Cheyenne                       http://www.plancheyenne.org/




MIDAS Contacts Appendix for User Manual




9 Project Contacts
     9.1     AnyLogic™ Software
Michael Goedeke, AnyLogic™ Developer                          mgoedeke@rti.org
Feng Yu, AnyLogic™ Developer                                  fengyu@rti.org
Georgiy Bobashev, AnyLogic™ Developer                               bobashev@rti.org

     9.2     Cluster
Evan Patterson, Cluster Administrator                         emp@rti.org
Doug Roberts, High Performance Computing Specialist           droberts@rti.org
Diglio Simoni, High Performance Computing Specialist          dsimoni@rti.org

     9.3     High Performance Computing
Doug Roberts, Episims Co-developer                            droberts@rti.org
Diglio Simoni, High Performance Computing Specialist          dsimoni@rti.org

     9.4     Informatics Group
Diane Wagener, PI                                             dwagener@rti.org
Philip Cooley CoPI, Modeling Support                          pcc@rti.org
Peter Highnam, MIDAS Informatics Advisor                      Highnam@NIH.gov

         9.4.1     IG Contacts
             RG                                         IG contact               Email
             Ambulatory Pilgrim Health Care (Harvard)   Scott Holmberg           sholmberg@rti.org
             Emory University                           L. Ganapathi (Gana)      lganapathi@rti.org
             Harvard School of Public Health            Christine Layton         layton@rti.org
             Johns Hopkins University                   Steve Naron              narons@us.ibm.com
             University of California, Irvine           Diglio Simoni            dsimoni@rti.org
             University of Pennsylvania                 George Ghneim            gsghneim@rti.org
             Virginia Bioinformatics Institute          Doug Roberts             droberts@rti.org

     9.5     Metadata Server
Bill Wheaton, Director GIS Applications                       wdw@rti.org
Jeannie Game, GME Administrator                               game@rti.org
David Chrest, GME Administrator                               davidc@rti.org

     9.6     MOAB and MAP
The MIDAS Help Resource                                       MIDAShelp@rti.org

     9.7     Model Comparison
Moshe Feder, Model Comparisons Task Leader                    mfeder@rti.org
Betsy Costenbader, Spatial Statistician                       ecostenbader@rti.org

     9.8     MIDAS Model Repository
Doug Roberts, Application Designer                            droberts@rti.org
L. Ganapathi (Gana), Content Administrator                    lganapathi@rti.org






      9.9       OutBreak Visualization Tool
Aaron Parks, OutBreak Designer                                       aparks@rti.org
Diglio Simoni, Linux Cluster Specialist                              dsimoni@rti.org

      9.10 MIDAS Portal
Ying Qin, Portal Administrator                                       yingqin@rti.org
Tonya Farris, Content Administrator                                  tfarris@rti.org

      9.11 Spillover Computing Capacity
Diglio Simoni, Spillover Administrator                               dsimoni@rti.org
Evan Patterson, Cluster Administrator                                emp@rti.org
Doug Roberts, Spillover User/Developer                               droberts@rti.org

      9.12 State Preparedness Assessment
Scott Holmberg, Senior Infectious Disease Epidemiologist             sholmberg@rti.org
George Ghneim, Syndromic Analysis Specialist                         gsghneim@rti.org
Christine Layton, Vaccine Delivery Analyst                           layton@rti.org

      9.13 Synthetic Populations
Bill Wheaton, GIS Program Manager                                    wdw@rti.org
Bernadette Chasteen, GIS Analyst                                     bmc@rti.org

      9.14 User Manual
Susanna Cantor, Content Administrator                                scantor@rti.org
Diglio Simoni, Content Administrator                                 dsimoni@rti.org
Bill Wheaton, Manager GIS group                                      wdw@rti.org

      9.15 Validation
Phil Cooley, Informatics Model Support Co-PI                         pcc@rti.org

      9.16 Research Group Contacts
Name                  Institution                                      City/State        Email
Stephen Eubank        Virginia Polytechnic Institute [formerly Los     Blacksburg, VA    seubank@vbi.vt.edu
                      Alamos National Laboratory], Virginia
                      Bioinformatics Institute
Don Burke             Johns Hopkins University                         Baltimore, MD     dburke@jhsph.edu
Ira Longini           Emory University                                 Atlanta, GA       ilongin@sph.emory.edu
Robin M. Bush        Department of Ecology & Evolutionary          Irvine, CA        rmbush@uci.edu
                     Biology, University of California, Irvine
Mark Lipsitch         Harvard School of Public Health, Harvard         Boston, MA        mlipsitc@hsph.harvard.edu
Richard Platt         Ambulatory Pilgrim Health Care, Harvard          Boston, MA        richard_platt@hms.harvard.edu
Gary Smith            University of Pennsylvania, School of            Kennett           garys@vet.upenn.edu
                      Veterinary Medicine                              Square, PA




Java performance issues – Automatic Code Optimization July 25, 2006




10 Java Optimization
    Proper arrangement of Java code and the way a Java Virtual Machine (JVM) is invoked can
    have a significant effect on run-time performance. A detailed discussion of how to get the
    best performance out of the IBM Java 2 environment appears in its Diagnostics Guide.1

    The University of Maryland team uses the Mersenne Twister random number generator2
    because of its extremely long period. They used the Java implementation from CERN.3
    Small modifications reduced the cost per invocation from 216 ns to 15 ns. This runtime
    difference was largely driven by the type and degree of automatic code optimization built
    into the JVMs. Indeed, on-the-fly optimization may be an advantage over fully compiled
    languages, depending on the computational requirements of the problem.
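
    The basic shape of such a timing harness can be sketched in a few lines. The block
    below is a minimal, hypothetical stand-in that uses java.util.Random in place of the
    CERN MersenneTwister (an assumption made so it needs no external jar); the structure
    mirrors the measurement code listed later in this section.

```java
import java.util.Random;

// Minimal timing-harness sketch. java.util.Random stands in for the CERN
// MersenneTwister (an assumption for self-containedness); the shape of the
// measurement -- repeated timed passes so the JIT can warm up -- is the
// point, not the absolute numbers.
class RngTimingSketch {

    static final Random generator = new Random(42);

    // Hot loop isolated in its own method, so the JIT can compile it early.
    static double genlots(int count) {
        double total = 0.0;
        for (int i = 0; i < count; ++i) {
            total += generator.nextDouble();
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 10000000; // 1.0E7 calls per timed pass
        for (int j = 0; j < 3; ++j) {
            long start = System.currentTimeMillis();
            double sum = genlots(n);
            long elapsed = System.currentTimeMillis() - start;
            double nsecPerCall = 1.0E6 * elapsed / n;
            System.out.println(j + ". per call " + nsecPerCall
                    + " nsec -- sum is " + sum);
        }
    }
}
```

    Later passes typically report lower per-call times than the first, which is the JIT
    warm-up effect the results below illustrate.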

    Stephen Wendel, a member of the MIDAS network at the University of Maryland, found that
    for certain codes the MIDAS cluster's default 1.5 JVM ran three times slower than the
    default 1.5 JVM on NCSA's Cobalt machine.4 This is surprising because, from a purely
    computer-architecture point of view, the MIDAS cluster should at least match the Cobalt
    machine.

    Subsequent research found that by modifying the code, adding a single parameter, or using a
    different JVM, performance on the MIDAS cluster improved dramatically. Java users should
    try the suggestions described below to determine whether they also yield performance
    increases in their own applications.

    The performance of some memory-intensive programs may be affected less by Java's
    automatic code optimization and more by its automatic memory management. Some MIDAS
    researchers have significant experience with these issues, because large agent-based
    disease-spread simulations can access large amounts of memory in very irregular patterns.
    A separate report, "Java performance issues – Automatic memory management," will be
    produced on this subject. The automatic features of Java provide many potential benefits,
    but they should be understood by anyone doing intense, large-scale computing.


     10.1 Java Runtime options on the MIDAS cluster
    (See examples in the Results section below.)

    1. To use the IBM 1.5 JVM

1
  http://www.ibm.com/developerworks/java/jdk/diagnosis/ To tune the optimization parameters for
your code, see the subsection "Selectively Disabling the JIT," page 261. An overview of JVM
optimization is provided in Chapter 5, "Understanding the JIT."
2
  http://en.wikipedia.org/wiki/Mersenne_twister
3
 http://hoschek.web.cern.ch/hoschek/colt/V1.0.3/doc/cern/jet/random/engine/MersenneTwister.html
4
 In fact it was reported that: "Twister runs slower on MIDAS cluster [than] 1. my Win desktop,
2. laptops here, 3. Macs here, 4. Linux here, 5. Linux cluster at UMD, 6. Cobalt, 7. Tungsten.
Ie, for everywhere we have run". Since the code is in Java, it was easy to test it quickly on all
these platforms.



Produced for MIDAS (www.epimodels.org) by Steve Naron, narons@us.ibm.com             / page 87 of 94



    source setJava --version 1.5          << Linux command to select the JVM
    then use the “–Xjit:count=0” parameter on the java command line.

    The IBM JVM appears to use what it learns from initial runs to optimize performance. For
    some Java code arrangements this approach is counterproductive. The “–Xjit:count=0”
    parameter suppresses this “optimization,” resulting in better performance (at least in the
    case of the original code).

    2. To use the SUN 1.5 JVM

    source setJava --vendor Sun --version 1.5.0

    3. Modify code to allow optimization to help.

    The best performance came from moving the often-repeated code into a separate method
    (see TwisterMultiMethod below). Notice in the results below how performance improves
    over successive runs as the JVM learns how best to optimize at runtime.


     10.2 Lessons learned from this exercise:
    1. Anyone using Java for significant computational work should try these alternative
       approaches to see if they make a difference. Further familiarization with JVM
       parameters may also help.
    2. Getting the code to run on different computers was as easy as copying the files. The Java
       code did run “everywhere” (see Footnote 4), which was handy while writing and
       debugging the code.
    3. Depending on the way it is invoked, the random number generator ran in as little as about
       11 nanoseconds per call (one billion calls in about 11 seconds). This is fast for an
       “interpreted” language and shows that Java can now produce respectable performance.
       Using the original code, a trillion invocations were done on one processor in about 6
       hours without evidence of a repeat in the sequence.
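
    As a quick sanity check on item 3, the arithmetic works out: at roughly 21 nsec per call
    (the Sun JVM figure reported below), 10^12 invocations take about 5.8 hours, matching the
    reported "about 6 hours." The sketch below only performs this unit conversion.

```java
// Back-of-envelope check: a trillion calls at ~21 nsec each.
class TrillionEstimate {
    static double hoursFor(double calls, double nsecPerCall) {
        double seconds = calls * nsecPerCall / 1.0E9; // nsec -> sec
        return seconds / 3600.0;                      // sec -> hours
    }

    public static void main(String[] args) {
        System.out.println("Estimated hours: " + hoursFor(1.0E12, 21.0)); // ~5.8
    }
}
```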

    For help with Java performance, contact Steve Naron, narons@us.ibm.com








     10.3 Results (each stopped after 2 billion invocations)

    With the IBM 1.5 JVM with the default optimization (Original results)
    narons@node030:~/java> source setJava --version 1.5
    narons@node030:~/java> java -classpath .:colt.jar:concurrent.jar
    JavaMultipleTimerRepeat
    Mersenne Twister run 1.0E9 times.
    1. 1.0E9 Twister per call            216.04nsec. -- sum is 4.999992565929561E8
    2. 1.0E9 Twister per call            207.402nsec. -- sum is 5.000001102935505E8


    With the IBM 1.5 JVM with the –Xjit:count=0 parameter
    narons@node030:~/java> source setJava --version 1.5
    narons@node031:~/java> java -classpath .:colt.jar:concurrent.jar -Xjit:count=0
    JavaMultipleTimerRepeat
    Mersenne Twister run 1.0E9 times.
    1. 1.0E9 Twister per call            24.566nsec. -- sum is 4.999977676796682E8
    2. 1.0E9 Twister per call            24.08nsec. -- sum is 4.999876955170463E8


    With the Sun 1.5 JVM
    narons@node030:~/java> source setJava --vendor Sun --version 1.5.0
    narons@node030:~/java> java -classpath .:colt.jar:concurrent.jar
    JavaMultipleTimerRepeat
    Mersenne Twister run 1.0E9 times.
    1. 1.0E9 Twister per call            19.376nsec. -- sum is 4.999958696667034E8
    2. 1.0E9 Twister per call            21.594nsec. -- sum is 5.0001468488907087E8




    Modified code to take advantage of optimization (see code below)
    Both with default optimization

    With the IBM 1.5 JVM
    narons@node030:~/java> source setJava --version 1.5
    narons@node031:~/java> java -classpath .:colt.jar:concurrent.jar
    TwisterMultiMethod IBM
    Mersenne Twister run 1000000000 times.
    IBM:0. Twister per call 97.536nsec. -- sum is 5.000097932673301E8
    IBM:1. Twister per call 14.022nsec. -- sum is 5.0001137627635044E8

    One JIT-parameter experiment (–Xjit:count=1) eventually got the time per call below
    12.3 nsec. Further experimentation might help more, and this type of experimentation
    might be valuable for your code. See the reference in Footnote 1.







     With the Sun 1.5 JVM
     narons@node031:~/java> source setJava --vendor Sun --version 1.5.0


     narons@node031:~/java> java -classpath .:colt.jar:concurrent.jar
     TwisterMultiMethod SUN
     Mersenne Twister run 100000000 times.
     SUN:0. Twister per call 13.11nsec. -- sum is 4.999835690899922E7
     SUN:1. Twister per call 11.31nsec. -- sum is 4.999979580736721E7




code: JavaMultipleTimerRepeat

import cern.jet.random.engine.MersenneTwister;

// Times the random number generator, and watches for a repeat of the
// first and second numbers in the sequence.
class JavaMultipleTimerRepeat {
    public static void main(String args[]) {
        long start, elapsed, loops = 1000000;
        int repeats = 0;
        double freq;
        double n = 1000000000;
        double sum, first, second, next, present;
        MersenneTwister generator = new MersenneTwister(new java.util.Date());

        System.out.println("Mersenne Twister run " + n + " times.");
        first = generator.raw();
        second = generator.raw();
        next = generator.raw();

        // exit from this loop if the first/second pair ever repeats
        retry:
        for (int j = 1; j < loops; ++j) {
            start = System.currentTimeMillis();
            sum = 0.0;
            for (int i = 0; i < n; ++i) {
                sum += present = next;
                next = generator.raw();
                if (present == first) {
                    repeats++;
                    System.out.println(j + " " + i + " " + repeats + " one repeat ");
                    if (second == next) {
                        System.out.println(j + " " + i + " repeat two in a row!!");
                        break retry;
                    }
                }
            }
            elapsed = System.currentTimeMillis() - start;
            freq = 1.0E6 * elapsed / n;
            System.out.println(j + ". " + n + " Twister per call" + "\t" + freq
                    + "nsec. -- sum is " + sum);
        }
    }
}
// Compile
// Windows   javac -classpath colt.jar;concurrent.jar JavaMultipleTimerRepeat.java
// Linux     javac -classpath colt.jar:concurrent.jar JavaMultipleTimerRepeat.java
//
// Run
// Windows   java -classpath .;colt.jar;concurrent.jar JavaMultipleTimerRepeat
// Linux     java -classpath .:colt.jar:concurrent.jar JavaMultipleTimerRepeat
// Linux (with IBM JVM 1.5)
//           java -classpath .:colt.jar:concurrent.jar -Xnoquickstart JavaMultipleTimerRepeat




code: TwisterMultiMethod

import cern.jet.random.engine.MersenneTwister;

// Times the random number generator.
//
// The hot loop lives in a separate method so the JIT can compile it
// while it is inactive.
class TwisterMultiMethod {

    static MersenneTwister generator = new MersenneTwister(new java.util.Date());

    public static void main(String args[]) {
        long start, stop, elapsed;
        double freq;
        int n = 1000000000;
        double sum;

        System.out.println("Mersenne Twister run " + n + " times.");

        for (int j = 0; j < 200; ++j) {
            start = System.currentTimeMillis();
            sum = genlots(n);
            stop = System.currentTimeMillis();

            elapsed = stop - start;
            freq = 1.0E6 * elapsed / n;
            System.out.println(args[0] + ":" + j + ". Twister per call" + "\t" + freq
                    + "nsec. -- sum is " + sum);
        }
    }

    private static double genlots(int count) {
        double total = 0.0;
        for (int i = 0; i < count; ++i) {
            total += generator.raw();
        }
        return total;
    }
}
// Compile
// Windows   javac -classpath colt.jar;concurrent.jar TwisterMultiMethod.java
// Linux     javac -classpath colt.jar:concurrent.jar TwisterMultiMethod.java
//
// Run
// Windows   java -classpath .;colt.jar;concurrent.jar TwisterMultiMethod
// Linux     java -classpath .:colt.jar:concurrent.jar TwisterMultiMethod
// Linux     java -classpath .:colt.jar:concurrent.jar -Xjit:bcount=0 -Xjit:verbose TwisterMultiMethod
// Linux     java -classpath .:colt.jar:concurrent.jar -Xjit:bcount=0 TwisterMultiMethod





				