Archiving State and Local Agency Digital Geospatial Data

              Archiving State and Local Agency
              Digital Geospatial Data:
              Looking for Solutions

              Steven P. Morris
              Head of Digital Library Initiatives
              North Carolina State University Libraries



NDIIPP Multistate Geospatial Project Kickoff Meeting   January 23, 2008
Looking for Solutions: Outline
 Approaches to Archiving and Preservation
 Current and Recent Geoarchiving Projects
 Content Identification
 Content Selection
 Content Exchange
 Digital Repository Development
 Engaging Spatial Data Infrastructure
 Archives Processes


                 Note: Percentages based on the actual number of
                           respondents to each question            2
Different Ways to Approach Preservation

  Technical solutions: How do we preserve acquired content
  over the long term?

  Cultural/Organizational solutions: How do we make the
  data more preservable—and more prone to be preserved—
  from point of production?


  Current use and data sharing requirements – not archiving
  needs – are most likely to drive improved preservability of
  content and improvement of metadata
Different Ways to Approach Preservation
 Technical solutions: How do we archive acquired
 content over the long term?
    Build data repositories: not just as an end in itself
    but also as a catalyst for discussion within the
    data community
    Develop repository ingest workflows: create
    technical points of engagement with other NDIIPP
    preservation projects and build on collective
    learning experience



Different Ways to Approach Preservation
 Cultural/Organizational solutions: How do we
 make the data more preservable—and more prone to
 be archived—from point of production?
    Engage data producer community and spatial data
    infrastructure through outreach and engagement;
    influence practice
    Sell the problem to software vendors and
    standards development
    Find overlap with more compelling business
    problems: disaster preparedness, business
    continuity, road building, etc.
    Start a discussion about roles at the local, state,
    and federal level
Current or Recent Geospatial Data
        Archiving Projects




Selected Geospatial Data Archive Projects
Project                                                  Organizations                           Funding
Persistent Archives Testbed                              San Diego Supercomputer Center, NARA    NARA
VanMap                                                   San Diego Supercomputer Center          InterPARES
Geospatial Repository for Academic Deposit & Extraction  EDINA                                   JISC
Geospatial Electronic Records                            CIESIN                                  NHPRC
various                                                  Carleton University                     various
National Geospatial Digital Archive                      UC Santa Barbara                        NDIIPP
Maine GeoArchives                                        State of Maine                          NHPRC



NC Geospatial Data Archiving Project
 Partnership between university library (NCSU) and
 state agency (NCCGIA), with Library of Congress under
 the National Digital Information Infrastructure and
 Preservation Program (NDIIPP)
 One of 8 initial NDIIPP collection building partnerships
 Focus on state and local geospatial content in North
 Carolina (state demonstration)
 Tied to NC OneMap initiative, which provides for
 seamless access to data, metadata, and inventories
 Objective: engage existing state/federal geospatial
 data infrastructures in preservation
  Serve as catalyst for discussion within industry
NCGDAP Goals
 Repository Goal
    Capture at-risk data
    Explore technical and organizational
    challenges
 Project End Goal
    Data Producers: Improved temporal data
    management practices
    Archives: More efficient means of
    acquiring and preserving data;
    Progress towards best practices

 Temporal data management vs. long-term preservation
Content Identification




Formal Inventory Processes
  Alleviate “contact fatigue” on the part of local
  agencies
    20 different NC state agencies contact local
    agencies for data … also, federal/regional
    agencies
  Geospatial data is complex, requiring lengthy
  inventory process
    Must capture descriptive, technical, and
    administrative information related to the data
  Make the inventory available as a sharable
  data store
   RAMONA Inventory System
      -- From March 2006
      -- Selective nationwide coverage




What do Inventories Offer to Archives?
  Data Availability Information
    Detailed information by data layer
  Contact Information
  Minimal Metadata
    Descriptive, technical, administrative
  Rights Information
  Document Technical Environment
    Software used, formats, transfer methods
  Future Data Development Plans
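
As a rough illustration of how the items above might be held in one sharable inventory record, here is a minimal sketch. The class name `InventoryRecord`, its field names, and the sample values are all hypothetical; none of them is taken from RAMONA or the NC OneMap inventory.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class InventoryRecord:
    """Hypothetical inventory entry for one data layer (illustrative only)."""
    layer_name: str                      # data availability, by layer
    contact: str                         # contact information
    description: str = ""                # minimal descriptive metadata
    rights: str = "unknown"              # rights information
    software: str = ""                   # technical environment
    formats: list = field(default_factory=list)
    transfer_methods: list = field(default_factory=list)
    future_plans: str = ""               # future data development plans

    def missing_fields(self):
        """Return names of fields that are still empty or unknown."""
        return [name for name, value in asdict(self).items()
                if value in ("", [], "unknown")]


# Sample values are made up for the example.
record = InventoryRecord(
    layer_name="parcels",
    contact="county GIS office",
    formats=["shapefile"],
)
print(json.dumps(asdict(record), indent=2))
print(record.missing_fields())
```

Listing the empty fields gives an archive a short follow-up list for the producer, which is one way an inventory could reduce repeated contact.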
   Detailed Information About Data
   [Pie charts: Imagery Type (Color-Infrared, True Color, Black & White, Not Sure); Data Meet LRMP Specifications? (Yes, No, Not Sure); Maintenance Frequency (Daily, Annually, Every 2 Yrs., Every 3 Yrs., Every 4 Yrs., Every 5 Yrs., As Funds Allow, Other, Not Sure, Not Maintained)]
   Source: NC OneMap Data Inventory 2004
Inventories as Source of Metadata
Example: Surface Water
[Bar charts: Source of Data (CGIA 1:24k, CGIA 1:100k, USGS 1:24k DLG, USGS 1:24k DRG, Private Contractor, Delineated from Locally-Produced Ortho, Elevation Data, Other, Not Sure); Surface Water Attributes (Stream Name, Stream Class, Stream Order, Stream Type, Other, Not Sure, None)]
Content Selection




Selection Issues
  Most content is already at some level of risk
  Early-Middle-Late Stage issues
    Middle stage is usually the “sweet spot”, e.g. TIFF
    orthophotos vs. raw images or compressed images
  Also added-value products: digital maps,
  cartographic representation
    Digital maps: “record” or not?
  Frequency of capture


               Problem:
               Multiple choices for format type,
               coordinate system, and tiling scheme




                Time series – vector data
        Parcel Boundary Changes 2001-2004, North Raleigh, NC




Continuously updated data:
       Frequency of snapshots?
       Different for various framework layers?



Sept. 2006 Frequency of Capture Survey
 Survey objective:
   Document current practices for obtaining archival snapshots
   of county/municipal geospatial vector data layers
   Seek guidance about frequency of capture
 Survey topics:
   General questions about data archiving practice
   Specific questions about parcels, street centerlines,
   jurisdictional boundaries, and zoning
 Survey subjects:
   All 100 counties and 25 municipalities
   58% response rate
   Survey conducted September 2006
Data Capture Survey Results: Overview
 Two-thirds of responding agencies create and retain
 periodic snapshots
 Long-term retention more common in counties with
 larger populations
 Storage environments vary, with servers and CD-
 ROMs most common
 Offsite storage (or both onsite and offsite) is used by
 nearly half of the respondents
 Popularity of historic images has resulted in scanning
 and geo-referencing of hardcopy aerial photos among
 one-third of the respondents

Survey Observations
 Process of survey formulation and
 implementation helped to socialize the problem
 of archiving data
 Local innovation needs to be mined further to
 inform development of best practices
 Business drivers for archiving need more study
 (e.g., stated adherence to retention policy)
 Exposure to peer practice encourages archiving
 Pronounced local interest in scanning/rectifying
 older analog maps and imagery

Content Exchange




Solutions: Content Exchange Infrastructure
 High volume of state/federal requests for local data
 Solving the present-day problems of data sharing is a
 prerequisite to solving the problem of long-term
 access
 Leveraging more compelling business reasons to put
 the data in motion (disaster preparedness, business
 continuity, highway construction, census, …)
 Content exchange networks:
    Minimize need to make contact
    Add technical, administrative, descriptive metadata
    Establish rights and provenance
Transfer Modes - Conventional

  CD/DVD
    e.g., 230 CD-ROMs for 1999 Wake County orthophotos
  External drives
    Becoming more routine
  FTP
    Bandwidth intensive: restricted to off hours, or not done
  WAN (Wide Area Network)
    Network incompatibilities, network load
  Web Download
    Complex interfaces make automation difficult
Transfer Modes - Web Services

  WMS (Web Map Service)
    Can only capture derived static images, losing the
    underlying data intelligence
    Possible use for agent-based image atlas creation
  WFS (Web Feature Service)
    Transfers actual vector data as GML
    Not widely deployed; variation in configuration
    Scalability for bulk transfer questionable
  Federal Enterprise Architecture Geospatial Profile
  suggests WMS, WFS, FTP
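
To make the WMS/WFS contrast concrete, the sketch below builds one request URL of each kind using the standard key-value parameters of WMS 1.1.1 `GetMap` and WFS 1.1.0 `GetFeature`. The endpoint `https://example.org/geoserver/ows`, the layer name `county:parcels`, and the bounding box are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint and layer name, used only for illustration.
BASE_URL = "https://example.org/geoserver/ows"
LAYER = "county:parcels"


def wms_getmap_url(bbox, width=800, height=600):
    """Build a WMS 1.1.1 GetMap request; the response is a rendered image."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": LAYER,
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
    }
    return BASE_URL + "?" + urlencode(params)


def wfs_getfeature_url(max_features=100):
    """Build a WFS 1.1.0 GetFeature request; the response is GML features."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": LAYER,
        "maxFeatures": max_features,
    }
    return BASE_URL + "?" + urlencode(params)


map_url = wms_getmap_url((-78.8, 35.7, -78.5, 36.0))   # made-up extent
feature_url = wfs_getfeature_url()
print(map_url)
print(feature_url)
```

The `GetMap` request names an output image format and pixel size, so only a picture comes back. The `GetFeature` request names a feature type, and the `maxFeatures` cap hints at why bulk transfer over WFS may need paging or tiling.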
Repository Development




Repository Pre-ingest Workflow

 Data Receipt
 Format Processing
 Metadata Processing
 Ingest Processes
NCGDAP Workflow – Data Receipt

 Data Receipt
    Acquisition
    Reorganization
    Validation
    Threat Analysis
    Inventory
 Format Processing
 Metadata Processing
 Ingest Processes
Workflow – Format Processing

 Data Receipt
 Format Processing
    Conversion
    Compound Formats
 Metadata Processing
 Ingest Processes
Workflow – Metadata Processing

 Data Receipt
 Format Processing
 Metadata Processing
    Creation
    Remediation
 Ingest Processes
Workflow – Ingest Processes

 Data Receipt
 Format Processing
 Metadata Processing
 Ingest Processes
    Metadata Conversion
    SIP Creation
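
One small piece of the SIP Creation step named in the ingest workflow could be a checksum manifest for the files in a package. The sketch below is an assumption about what such a step might look like, not the NCGDAP implementation; the function name `build_manifest`, the file names, and the JSON layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(package_dir):
    """Map each file's relative path to its SHA-256 checksum."""
    root = Path(package_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


# Build a throwaway package with made-up contents and list its checksums.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "parcels.gml").write_text("<gml/>")
    (Path(tmp) / "metadata.xml").write_text("<metadata/>")
    manifest = build_manifest(tmp)
    print(json.dumps(manifest, indent=2))
```

Recording checksums at packaging time lets the repository confirm after transfer that it holds the same bytes the producer sent.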
Extended Curation: Feedback and Outreach


 Workflow stages
    Data Receipt
    Format Processing
    Metadata Processing
    Ingest Processes
 Feedback and outreach to
    Content Producers
    Industry
    Standards Organizations
Engaging Spatial Data Infrastructure




   NC Spatial Data Infrastructure: NC OneMap
NC OneMap is a next generation mechanism to coordinate and
  disseminate geographic information in North Carolina and
  interact with the NSDI.

Objectives:

• Build a common understanding of North Carolina data resources

• Enable widespread access and distribution of geospatial data

   NC OneMap
Objectives (cont.):
• Develop ongoing data inventory for all geospatial data holdings
  RAMONA – http://nc.gisinventory.net

• Develop content standards for key data themes
  NC Geographic Information Coordinating Council (GICC)

   One of the defined characteristics of NC OneMap is that
   “Historic and temporal data will be maintained and
   available”.
Points of Engagement with Spatial Data
Infrastructure
 Framework data communities
    Snapshot frequency, naming schemes, classification, GML
    application schemas, format strategies
 Metadata standards and outreach
    Persistent identifiers, versioning, feedback on metadata quality
 Content replication/transfer
    For data improvement projects, disaster preparedness,
    aggregation by regional service providers, … and archives
 Where does archiving and preservation fit in?


Engaging Industry




Cultural: Changing Industry Thinking
  Is the geospatial industry “temporally impaired”?
      Lack of access to older data
      Lack of tool/model support for temporal analysis
      Metadata: poor support for changing data
      Education: building class projects around
      available data (i.e., not temporal)
  Increased interest now in temporal applications?
      Increased demand for temporal data?
      Improved tool support: ArcGIS 9.2 animation
      tools; Geodatabase History, etc.

                 What About Commercial Data?

    Project Status                                     Cultivating a commercial
                                                        market for older data.




Part of “permanent access” is
marketing, advertising, and
putting older data into the path of
the user
Points of Engagement with the Open
Geospatial Consortium (OGC)
 Geography Markup Language (GML) for archiving
 (PDF/A version of GML?)
 GeoRM (Geo Rights Management)
   Adding preservation use cases
 Content Packaging
   Will there be an industry solution?
 Web Services Context Documents
   Can we save data state as well as application state?
 Content Replication
   Is this a layer in the overall architecture?
 Persistent Identifiers
Archives Processes




Maine GeoArchives Project Components
 Retention schedules
    Geospatial data
    Administrative records
 Record accessioning
 Appraisal system
 System documentation
 Archival data and metadata standards
 Rules for disposition of local government records



Maine GeoArchives: Functional Requirements

 Adopted set of functional requirements for recordkeeping
 systems to ensure permanent retention of data layers

      Compliance                                              Auditability
      Responsible                                             Availability
      Credibility                                             Exportable
      Completeness                                            Renderable
      Authenticity                                            Redactable
      Soundness


 Conclusion




Key issues
 What are the points of intersection between archive
 needs and business continuity/disaster preparedness
 and other business needs?
 How to best stimulate and learn from innovation at the
 state/regional/local level?
 How to make data more preservable from point of
 production and on through data transfer?
 How to most effectively move data in an efficient,
 well-documented manner with clarified rights?



Key issues (continued)

 How to best make State Archives a part of spatial data
 infrastructure?
 How should tradeoffs between level of curation and
 quantity of acquisition be made?
 Defining the record: data vs. derivative components
 How to best cross-fertilize with other projects (NDIIPP,
 NHPRC, etc.)




            Questions?
Contact:

Steve Morris
Head, Digital Library Initiatives
NCSU Libraries
ph: (919) 515-1361
Steven_Morris@ncsu.edu

http://www.lib.ncsu.edu/ncgdap
Emerging Regional Partnerships

 Focused on development of shared infrastructure for
 cultivating access to data
 Becoming test beds for innovation in the area of data
 sharing and data management, including archiving




Workflow Overview
                  Handout 1




Workflow Focus: Digital
Format Curatorship
           Handout 2




Workflow Focus: Geospatial
Metadata Management
           Handout 3




   Question #1 (the filter)
“Do you create periodic snapshots of any vector datasets
  for long-term retention and archiving?”

Response (out of 57.6% response rate): Yes = 65.3%, No = 34.7%*

[Pie chart: Jurisdictions Archiving Snapshots (Yes, No, No response)]

* Respondents answering “No” automatically skip most of the
  remaining questions
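
The percentages on the filter-question slide can be tied back to the survey population (100 counties and 25 municipalities) with a little arithmetic. The sketch below infers whole-number counts from the reported percentages. Those counts are an inference for illustration and are not figures stated in the survey; the helper `implied_count` is hypothetical.

```python
def implied_count(total, percent):
    """Nearest whole number of responses implied by a reported percentage."""
    return round(total * percent / 100)


surveyed = 100 + 25                               # counties + municipalities
respondents = implied_count(surveyed, 57.6)       # inferred, not reported
yes = implied_count(respondents, 65.3)            # inferred, not reported
no = respondents - yes

print(respondents, yes, no)
print(round(100 * yes / respondents, 1), round(100 * no / respondents, 1))
```

Under these assumptions the inferred counts round back to the reported 65.3% and 34.7%, which suggests the percentages are computed over respondents only, as the slide notes.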
Key Results: Capture Frequency
[Bar chart: Capture Frequency. Y-axis: % of Respondents. X-axis: Frequency (Annually, Every 6 Months, Quarterly, Monthly, Weekly or Daily, Not Saved). Series: Parcels, Street Centerlines, Zoning.]
Key Results: Formats
[Bar chart: Format of Snapshot. Y-axis: % of Respondents. X-axis: Formats (Shapefile, Geodatabase, Arc Coverage, Arc Interchange (e00), Other). Series: Parcels, Street Centerlines, Jurisdictional Boundaries, Zoning.]
Key Results: Formats
[Bar chart: Format Conversion Involved? Y-axis: % of Respondents. X-axis: Dataset (Parcels, Street Centerlines, Jurisdictional Boundaries, Zoning). Series: No, Yes.]
Key Results: Metadata

[Bar chart: Metadata Archived? Y-axis: % of Respondents. Categories: FGDC format, Locally defined metadata, NC OneMap metadata starter block, None.]
Key Results: Storage

[Bar chart: Storage Environment. Y-axis: % of Respondents. Categories: Tape, CD, DVD, External Hard Drive, Server or Online Storage, Other.]
Key Results: Storage

[Bar chart: Storage Location. Y-axis: % of Respondents. Categories: Onsite, Offsite, Both Onsite and Offsite.]
Key Results: Reasons for Archiving


[Bar chart: Driving Factors. Y-axis: % of Respondents. Categories: IT policy, Records retention policy, Tax admin rules, Land use change analysis, Resolution of legal issues, Historic mapping, Other.]

				