

									                                                                     GIN DD-09-001

                Groundwater Information Network
           Aquifer Mapping for Groundwater Resources Program
            Earth Sciences Sector / Natural Resources Canada
                       Geological Survey of Canada

                       Technical considerations for implementing a
                           URN resolution mechanism for GIN

Date : April 2, 2011

Document Reference Number : GIN DD-09-001
Category : Discussion Document
Author: Eric Boisvert


1. Revision History

Date              Release            Author            Description
2009/10/09        0.1.0              Eric Boisvert     First version
2009/10/19        0.2.0              Eric Boisvert     Rewrote section 4
2009/10/26        0.3.0              Eric Boisvert     Rewrote section 5
2009/11/20        0.4.0              Eric Boisvert     Added comments / improvement
                                                       from A. Ritchie and discussions
                                                       with S. Cox
2009/11/24        0.5.0              Eric Boisvert     Added URI is URL to resolver

2. Introduction
This document explores the concept of URN resolution as described in
on). This document will be used to guide use case drafting and testbed planning
for both the GIN and GeoSciML communities. Typical GWML datasets, and other complex
GML documents such as GeoSciML, use several externally defined concepts, such as
vocabularies, units of measure, and common entities or features. Furthermore, due to
the deeply nested and recursive nature of complex models, serialized versions of the
dataset must also use internal and external references to remove duplication and limit the
size and complexity of the dataset. This report does not discuss resolver
implementation details, which belong in a separate document on the
implementation of the resolver, but discusses some aspects of resolution as seen from the
client application perspective.
GML has a set of syntactic constructs that allow an external reference to be
included in an XML document. The external reference can either be an explicit
reference to a fragment of valid XML that lies elsewhere, or an implicit reference where the
value conveyed in the document is part of a controlled list. In both cases, a client
application parsing the document must be able to locate those external resources, either to
complete the externalized parts of the dataset or to take action according to the values
serialized in the document.
The problem explored here is how a dataset making use of external and internal
references can be adequately parsed and ingested by a client application. What is
expected from a client application to use such a document? What information does the client
need? When does the client need to actually fetch a resource? How does it extract this
information?


3. Linkages in GML documents
A formal way to make explicit links to a resource in XML documents is through XLink
(DeRose et al., 2001). These links are explicit in the sense that the client knows exactly
(assuming the client is XLink-aware) where linkages are made. GML applications add a
second type of external reference that defines an authority for the content of a property in
order to constrain the values that can be used. The authority (a.k.a. code space) can be a
resolvable identifier that leads to a list of processable terms. We expect that some
client applications might want to resolve those lists for various reasons, such as
displaying human-readable text in place of a machine-readable code (for instance, where
multiple languages are involved) or validating the content of the property. One other
important use of those vocabularies is to query a datastore. Non-numerical
properties can only be queried effectively if the client has prior knowledge of the list of
terms and their meaning.

Linkage is expressed using a 'pointing' mechanism that can either be explicit, by
providing the location of the resource, or implicit, where a unique key is
provided to access the resource but the location and the method to get the resource is not

For GML and OGC, this unique key is represented as a URN (Uniform Resource Name,
see http://tools.ietf.org/html/rfc1737). A URN is a subtype of URI (Uniform Resource
Identifier). Another type of URI is the well-known URL (Uniform Resource
Locator). A URN is a scheme to represent an identity while a URL represents a location.
The common metaphor is to compare a URN to a person's name (assuming it is unique)
and a URL to that person's address. URNs are persistent; URLs are variable. The URN does not
provide the location of the resource it represents, just its identity. The location of the
resource identified by the URN must be resolved (or translated) using a separate system
called a resolver.

The benefit of URN + Resolver over an explicit URL is debatable. While a URN does indeed
shield resource identity from its location, it does so by pushing a lot of machinery onto the
client application. It also implies that some support infrastructure must be created to
provide URN resolution. What is gained by not relying on volatile URLs might be lost by
relying on a complex resolution infrastructure. Furthermore, the volatility of URLs is not
inevitable. If there is enough will to keep the service domain name unchanged,
URLs can become as persistent as URNs.
We can provide this initial list of pros and cons:

Pros:

  - URNs are atomic; therefore, they are easier to use as keys (while a URL must be
    parsed to extract the key). We can't use the whole URL as a unique key since the
    location of the service might change, parameters might be slightly different, etc.
    This is one of the important use cases in dataset processing.
  - URNs are decoupled from the location of the resource and therefore more robust to
    changes in service location. Furthermore, URNs are independent of access protocols,
    so alternative APIs can be used to access a resource. A URL implies an access
    method (WFS [1], SKOS [2]/SPARQL [3], SOS [4], etc.) and prevents alternative routes from
    being taken by the client application. In other words, a URL does not provide good
    separation of concerns.
  - Resolvers can be engineered for robustness (redundancy) so that when one resolver is
    down, another can take over transparently.
  - URNs are structured and thus can give a client some information about the
    authority and nature of the resource being referred to. Furthermore, the syntax
    constraints are explicit (order of elements and case sensitivity) and do not allow
    creative variations, ensuring that the URN is always presented the same way
    (more on this later). Also, they are good regular expression [5] targets.

Cons:

  - URNs require an external resolver that can suffer all the service access problems
    we are used to (server down, corrupted registries, unresolved URNs, slow
    service, proxy configurations). Granted, this can also happen with URLs, but
    URNs add one more layer of indirection: an up and running WFS (from which
    the user just got the document) becomes unreachable because the resolver is down.
  - A complex architecture and several resolution services are required (a collection of
    several registries and authorities). Certainly not a KISS system. Making a service
    "resolver" ready implies a lot of work.
  - It is likely that a URN resolver will be tied to a community, and documents that
    bridge across communities will be complex to resolve. It essentially shifts the
    "where's the resource" problem to a "where's the resolver" problem.

One of the core arguments for preferring URN over URL is URN persistence versus URL
volatility. Although this is not strictly true (in theory a URL could be made permanent),
there is one more quality of URNs that is important to consider. URN syntax imposes a
strict structure on the identifier while URL syntax is looser. The consequence is that several
URL strings can actually refer to the same resource. Although that is not a problem for
resolution, it is a problem when the URL is used as a unique key. A unique key is useful
to compare two resource pointers and decide whether they are the same (i.e., is this vocabulary the
same as this other vocabulary?). With the looser URL scheme, the same vocabulary
could be accessed through many different URLs, or the “same” URL with slight

[1] See <http://www.opengeospatial.org/standards/wfs>
[2] See <http://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System>
[3] See <http://en.wikipedia.org/wiki/SPARQL>
[4] See <http://www.opengeospatial.org/standards/sos>
[5] See <http://en.wikipedia.org/wiki/Regular_expression>


One further problem with URLs is that data providers must coordinate service location and
access method with service providers under other governance. It is more efficient to leave
this problem to an external (central) resolver than to maintain this machinery on the
provider's end.

As an example, consider these fictitious URLs. They are all different but could point to the
same resource; in this example, they could represent the vocabulary for a term:

        <property
        <property
        <property
        <property

(a) and (b) are just different servers, or alternate names that point to the same IP [6], while
(c) is the same as (b), but the parameters are in a different order and a different case, which is
perfectly legal as far as URLs are concerned. (c) and (d) are again the same server, but
different APIs. When those URLs are needed to provide a unique identifier (do two
identical terms come from the same vocabulary?), URLs can't be trusted unless there
is a very strict encoding rule for URLs, and even this would not help if the vocabulary is
accessible from various servers or through various APIs. Of course, we don't expect the
URL structure to vary when provided by a single provider, but the whole point of
interoperability is that we must deal with many providers. URN syntax is just easier to

Bottom line: URNs are good when we need to identify things, URLs are good when we
need to locate things, which is precisely what their names stand for! Choosing between
URN and URL is really a matter of defining the role of the URI.
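
The key-comparison problem can be illustrated with a minimal Python sketch. The two URLs below are hypothetical (they are not the ones from the original example) and `normalized_key` is an illustrative helper, not part of any GIN component. It shows that two URLs differing only in host case, parameter order and parameter-name case are different strings, and that comparing them requires a normalization rule that every provider would have to agree on.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URLs: same server and parameters, different order and case.
url_b = "http://example.org/vocab?service=SKOS&term=granite"
url_c = "http://EXAMPLE.org/vocab?TERM=granite&SERVICE=SKOS"


def normalized_key(url):
    """Best-effort key: lower-case host and parameter names, sort parameters."""
    parts = urlsplit(url)
    params = sorted((name.lower(), value) for name, value in parse_qsl(parts.query))
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path, tuple(params))


# Raw strings differ, so the URLs cannot be used directly as keys.
same_as_strings = url_b == url_c
# After normalization they compare equal.
same_after_normalization = normalized_key(url_b) == normalized_key(url_c)
```

Even this normalization does not help when the same vocabulary is served from another host or through another API, which is the case a URN key is meant to cover.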
3.1. URN Structure
URNs are structured strings following a strict rule of hierarchical composition. The
syntax is defined in http://www.ietf.org/rfc/rfc2141.txt. A URN is composed of a series of
alphanumeric strings separated by colons. Strings are case-sensitive and the order of
the strings (the scheme) is controlled by some governance. Reed (2004) proposed a
specific URN scheme for OGC while Cox (2008) is proposing a CGI scheme at
https://www.seegrid.csiro.au/twiki/bin/view/CGIModel/CGIIdentifierScheme . These
schemes regulate the creation of URN identifiers for GeoSciML and therefore have
ramifications for GWML. So far, there is no formal URN scheme for GWML (which is
a gap).

[6] A typical situation in countries that have more than one official language. For instance, the Natural Resources
Canada URL is either http://www.rncan-nrcan.gc.ca/com/ or http://www.nrcan-rncan.gc.ca/com/ (note the
rncan and nrcan positions for Ressources naturelles and Natural Resources). They are both perfectly valid
and can be used interchangeably. But any AJAX programmer knows that JavaScript thinks otherwise.
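
As a rough illustration of how a client could take a URN apart, here is a minimal Python sketch. It only applies the generic RFC 2141 layout (`urn:<NID>:<NSS>`) and splits the namespace-specific string on colons; what each field means (authority, resource type, local name) depends on the OGC or CGI scheme in use and is deliberately not interpreted here. The function name `parse_urn` is illustrative.

```python
def parse_urn(urn):
    """Split a URN into namespace identifier (NID) and colon-separated fields."""
    prefix, _, rest = urn.partition(":")
    nid, _, nss = rest.partition(":")
    if prefix.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The NID is case-insensitive under RFC 2141; the rest is kept as written.
    return {"nid": nid.lower(), "nss": nss, "fields": nss.split(":")}


feature = parse_urn("urn:cgi:feature:GSC:mf.2")
nil = parse_urn("urn:ogc:def:nil:OGC:missing")
```

Because the whole string is the key, two URNs can be compared directly, and the parsed fields are only needed when the client wants to route the request by authority or resource type.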

4. Resolver and Resolution
Once we have a URN, we need to fetch the resource it stands for. This is done through a
“resolver”. A resolver is an agent that locates a resource, usually on a network,
identified by a unique key (here, a URN). The actual location of the resource might not
be explicit in the dataset (i.e., available from the service that provided the dataset or provided
elsewhere in the dataset). It is the role of the resolver to bind this location. The resolver
must have access to a registry (a database) to either match an identifier to its location (or
use part of the URN structure to match a service that can return the resource) or use a
“DNS” (Domain Name System) approach of invoking a series of services in the hope that
one or more of those services can resolve the location of the resource. In the latter case,
the resolver does not have prior knowledge of which service can reach the resource. The
service to be invoked can be as simple as the direct address of a document on the web or
a more complex parameterized request through a formal API (e.g., WFS GetFeature)
where the URN (or part of it) becomes a parameter passed to the service. Each of
the services invoked by the resolver can itself be a resolver, producing a cascading
resolution service.

Several resolution approaches are proposed in Daniel (1997):

  - N2R: the URN is sent to the resolver and the resource is returned. An example is a
    web-based application (such as the one proposed by CSIRO) where a user copies and pastes a
    URN into a form and expects to download or view the resource without knowing
    where it actually came from. The common implementation uses the HTTP 303
    (http://tools.ietf.org/html/rfc2616) status code to tell the client application that the
    resource can be found at another location. The resolution is done by a process of redirection.
  - N2L: the URN is sent to the resolver and the full URL is returned. The client ends up with the
    full URL to the resource and may decide to fetch the resource or not (or defer
    the resolution, for example by turning the string into an HTML hyperlink for the user).
    Obviously, N2L might return an N2R link.
  - N2C: the URN is sent to the resolver and a description of the resource is returned. This
    technique allows the client application to get further information about the
    resource, and maybe even use this information to interact with the user.

N2C seems to be the most interesting approach since N2R and N2L are essentially
embedded in this solution. From the client application side, N2C allows all
possible operations, assuming that the information provided covers all the needs. With a
pretty thin layer on top of an N2C, it can be turned into an N2R or an N2L. Therefore, at the
core, a resolver should always be able to provide an N2C response.
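
The "thin layer" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry is an in-memory dictionary, the URL and description fields are hypothetical, and `fetch` is injected by the caller so that no particular HTTP library or resolver API is implied.

```python
# Hypothetical registry: URN -> description of the resource (the N2C answer).
REGISTRY = {
    "urn:cgi:feature:GSC:mf.2": {
        "url": "http://example.org/wfs?request=GetFeature&featureid=mf.2",
        "media_type": "application/gml+xml",
        "title": "MappedFeature mf.2",
    },
}


def n2c(urn):
    """Core operation: return the description registered for a URN."""
    try:
        return REGISTRY[urn]
    except KeyError:
        raise LookupError(f"unresolved URN: {urn}") from None


def n2l(urn):
    """N2L as a thin layer over N2C: keep only the location."""
    return n2c(urn)["url"]


def n2r(urn, fetch):
    """N2R as a thin layer over N2L: fetch the resource at the location."""
    return fetch(n2l(urn))
```

A client that only wants a hyperlink calls `n2l`; one that wants the content calls `n2r` with its own fetch function (for instance one that follows HTTP redirects).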

While all URNs are supposed to actually point to something, they are not all meant to be
resolved. They might only be flags that affect the interpretation or processing of the
document. Therefore, an efficient resolution process is not simply a matter of replacing every
URN with its corresponding resource. Whether or not to resolve a URN depends on
the context, and where a large number of URNs are to be processed, we can't request
user intervention for each occurrence. Furthermore, there are different environments
where resolution is performed:

  1. Deserialisation: the dataset is read from its XML form into some in-memory
     representation by the client application. The decision to resolve a URN is
     specific to the application.
  2. Transformation: the dataset is processed to generate another XML document.
     We could see this happening in WPS (Web Processing Service, Schut, 2007)
     where an XML document containing URNs is processed and relevant URNs are
     replaced by either a URL or the resolved resource. This is a potential use case
     where we need to deliver data to a client that does not have resolution capabilities
     and where it expects valid GML (i.e., a generic WFS client such as Gaia).
     Otherwise, we limit data to resolver-enabled client applications only. We shall at
     least consider this option to resolve fragment URNs (to be discussed later). In
     this scenario, WPS, or any other middleware application, would act as a
     preprocessor interacting with a resolver. This use case has some added
     constraints compared with deserialisation, as we suppose the resulting document
     must be schema valid, therefore:
       - Non-embeddable resources won't be resolved.
       - Resolved fragments must be valid and respect gml:id value uniqueness and non-
       - The same resource should not be repeated, which is a different issue than gml:id
         repetition. The same feature can appear twice in the document under different
         gml:id values (especially if the same feature comes from different services, for instance
         when the original document comes from an aggregation process). Of course, this
         is not the expected result.
  3. Query building: for some properties, the filter needs to refer to a vocabulary item,
     therefore the client application must allow the user to pick terms from a
     predefined vocabulary. In this case, we need to fetch a list of terms to present
     to the user.
  4. Content validation (conformance test) is another use case where the resolution
     process is used differently. Here the references must be resolved to be checked
     against a controlled list. This usage is similar to #3.
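
The transformation case can be sketched as a small preprocessing step. The following Python fragment is only an illustration: the lookup table stands in for a real resolver, the target URL is hypothetical, and the wrapper element names in the sample are placeholders. It rewrites `xlink:href` values that are resolvable URNs into URLs and leaves nil flags and unknown URNs untouched.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
NIL_FLAGS = {"urn:ogc:def:nil:OGC:missing", "urn:ogc:def:nil:OGC:unknown"}
# Hypothetical URN -> URL table standing in for a resolver (N2L).
URN_TO_URL = {"urn:cgi:feature:GSC:mf.2": "http://example.org/features/mf.2"}

SAMPLE = (
    '<root xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<a xlink:href="urn:cgi:feature:GSC:mf.2"/>'
    '<b xlink:href="urn:ogc:def:nil:OGC:missing"/>'
    '<c xlink:href="urn:cgi:feature:GSC:mf.9"/>'
    "</root>"
)


def rewrite_hrefs(xml_text, lookup=URN_TO_URL):
    """Replace resolvable URN hrefs by URLs; keep flags and unknown URNs as they are."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        href = elem.get(XLINK_HREF)
        if href and href.startswith("urn:") and href not in NIL_FLAGS and href in lookup:
            elem.set(XLINK_HREF, lookup[href])
    return root


rewritten = [child.get(XLINK_HREF) for child in rewrite_hrefs(SAMPLE)]
```

Replacing a reference by the resolved fragment itself, rather than by a URL, would additionally have to respect the schema-validity constraints listed above.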



This actually brings up a broader discussion on how a generic client can handle
domain-specific GML documents. Even if we ignore URNs, we expect a generic client to
'understand' only basic GML entities; domain-specific entities have no
significance for it. The client application can at best display the data or provide tools to navigate
the database and maybe call a resolver to get to more data. Why would a generic client
resolve a URN to fetch a piece of information that it has no prior need for? The
only thing a generic WFS client can do is draw the geometries (because it knows GML
geometries). Then one might ask: who will ever use such a client application just to
display geometries? Why bother to create a rich data transfer format if the client
application can't do much more than handle the geometries? Following this logic, we can
argue that case #2 (above) is not a real requirement, because any application that can
ingest GWML will need prior knowledge of the domain and will not need to
systematically resolve all URNs in the document, but only those that are relevant to a
given task (e.g., draw a water well, get a water level, etc.).

4.1. Resolution cases
There are four distinct cases of resolution (or non-resolution) of URI, expressed in the
context of a GML document that is being processed by a client application.
4.1.1 Fragment

GWML (or any complex GML document) can be deeply nested and even recursive. This
produces WFS responses that can be quite large, and some deeply nested elements may
not be relevant for the client and lie outside the query scope but are still serialized because
of the connectivity between features. The extra features should not be serialized but
"pointed to", leaving the client application to decide whether it wants them or not. Fragments are
always in xlink:href (but not all xlink:href values are fragments; the most notable exceptions
are 'null' pointers, such as urn:ogc:def:nil:OGC:missing).

4.1.2 Flag

This is probably the case where a URN is incontestably the best solution. Some
URNs are really just flags and are not meant to be resolved (the client is not expected to
fetch something); they simply trigger some behavior in the client application. Examples are
identity identifiers such as “http://www.cgi-iugs.org/uri”, uom, srsName and null value
identifiers. Although the URI could be resolvable, it does not resolve into processable
content (it is not guaranteed to be XML, and even less guaranteed to be valid XML for the
current schema). Some flags can appear where a fragment is normally expected
(xlink:href), such as null values (e.g., urn:ogc:def:nil:OGC:missing). Seegrid (2007a)
defines this category as pointers to “Resources that define service behavior”. A client
application should have a prior list of those URIs and should know what to do with them.
Flags can be seen as terms from very high level vocabularies (at GML level) that are
common to all GML applications.
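
The "prior list" of flags can be as simple as a lookup table consulted before any resolution is attempted. The Python sketch below is a minimal illustration: the table contains only the flag URIs mentioned in this document, and the behaviour labels and the function name `classify_href` are hypothetical.

```python
# Flags named in this document, mapped to hypothetical behaviour labels.
KNOWN_FLAGS = {
    "urn:ogc:def:nil:OGC:missing": "nil-value",
    "urn:ogc:def:nil:OGC:unknown": "nil-value",
    "http://www.cgi-iugs.org/uri": "global-identifier-codespace",
}


def classify_href(href):
    """Decide how a client treats an xlink:href value before resolving anything."""
    if href in KNOWN_FLAGS:
        return "flag"  # never resolved, triggers a behaviour
    if href.startswith("#"):
        return "local"  # target is already in the same document
    return "fragment"  # candidate for resolution, subject to client rules


kinds = [
    classify_href("urn:ogc:def:nil:OGC:missing"),
    classify_href("#mf.1"),
    classify_href("urn:cgi:feature:GSC:mf.2"),
]
```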


4.1.3 Vocabulary (or Controlled Concept)

Vocabularies differ from flags in that they actually point to content that a client application
might want to read (often a list) and that might even be processable (i.e., ControlledConcept),
but we don't normally resolve them in the context of a document. They are expected to be
used on the client side to populate the user interface and are likely to be cached prior to any
processing. Therefore, they should not be resolved in the 'response' document (they are
probably already 'known' by the client). A client might want those URNs to be resolved
in the document, but there are much more efficient ways to use vocabulary identifiers. In the
ideal case, vocabularies are 'already resolved' in the client framework. A client
application should be configurable and should be able to fetch the list (dynamically, or
through some sort of configuration if the reference points to some non-processable form,
e.g. uom). Vocabulary terms are generally delivered as ScopedName in GML (some might
appear in xlink:href) while others are located in attributes. A ScopedName is composed of
a pair of an authority and a term:

<someProperty codeSpace="authority">term</someProperty>

The codeSpace contains the authority, which might be a URN, and the element content is the term, which might also be
a URN. In GWML, we expect the codeSpace to always be a URN. The codeSpace URN
should be resolvable. If the term is a regular string, it must be part of the authority list,
but there is no formal way for the application to check. Such a term is not resolvable (not
by a formal resolver mechanism anyway). If the term is a URN, this URN should be
resolvable and return a formal dictionary item. GeoSciML used to promote
gsml:ControlledConcept (a subtype of gml:Dictionary) but the trend is now to use SKOS.
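
A client reading a ScopedName needs both halves of the pair. The following Python sketch shows one way to extract them with the standard library; the codeSpace URN and the term in the sample are hypothetical values, and the test for "term is a URN" is a simple prefix check rather than a full syntax validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical ScopedName instance; the codeSpace URN is made up for the example.
SAMPLE = '<someProperty codeSpace="urn:x-example:vocabulary:lithology">granite</someProperty>'


def read_scoped_name(xml_text):
    """Return the (authority, term) pair and whether the term itself is a URN."""
    elem = ET.fromstring(xml_text)
    code_space = elem.get("codeSpace")
    term = (elem.text or "").strip()
    return {
        "key": (code_space, term),  # authority and term together form the key
        "term_is_urn": term.lower().startswith("urn:"),
    }


scoped = read_scoped_name(SAMPLE)
```

Here the term is a plain string, so only the codeSpace is a candidate for resolution; the client would look the term up in the (cached) list obtained from that authority.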

A very special codeSpace value (http://www.cgi-iugs.org/uri) acts as a flag to indicate
that the value has a special meaning, namely one of the following:
  - The value is a global identifier (see 4.1.4) and provides the identifier of the current object
    (it's not a pointer). This role is restricted to gml:name or gml:identifier and the value
    must be a URN.
  - The value is a special pointer, such as urn:ogc:def:nil:OGC:unknown, indicating a
    special value; this value is a URN.
  - The value is a global identifier which is not part of any governance.

An important aspect of using ScopedName with URN terms is that, although the term
URN is in theory a globally unique identifier, the same term can be
simultaneously part of several dictionaries. Therefore, the codeSpace URN is also
important, and not redundant. The uniqueness of the term key is not sufficient for its

4.1.4 Identifier


This final case is when the URN is present in a document as a unique identifier for a feature
/ type. Obviously, this URN should not be resolved because the resource it represents is
already in the document. In GML 3.1.x, this is encoded with gml:name. When this
name has a codeSpace = http://www.cgi-iugs.org/uri, it means that the id is
globally unique. For GML 3.2.x, this role (global identifier) has been transferred to the
new property gml:identifier. A client application should recognize the entity that
embeds this identifier as the serialization of a fragment of a feature / type, which as such is
potentially the target of a fragment URN. According to Seegrid (2007a), the
codeSpace of gml:name and gml:identifier provides the URL of the resolver (see Note

4.2. Resolution Rules Discussion
4.2.1 Resolution depth and traversal
Given the large number of external references a typical document might contain, a
client application should not rely on user intervention for each and every URN. The
client therefore needs some rules while processing a dataset. Resolving all the URNs just
because they are there is certainly not an option, even if we limit the resolution to
fragments. Actually, resolving all fragments is likely to lead to extraction of the
complete database, simply because of the interconnectivity of entities in complex models.
This discussion is not restricted to resolvers; it applies more broadly to external references, be they
URN+Resolver or URL.

A typical example in GWML and GeoSciML is the connectivity between MappedFeature
and GeologicUnit. A MappedFeature is indirectly connected (Figure 1) to all the other
MappedFeatures of the unit through
MappedFeature/specification/GeologicUnit/occurrence/MappedFeature. This means
that a document serialising a MappedFeature selected in a given bounding box could
potentially serialise all the MappedFeatures nested in GeologicUnit. The extra
MappedFeatures might obviously not be in the bounding box of interest.


Figure 1: Interconnectivity between MappedFeatures

If we look at the GML document (we removed the properties that are not relevant to the
discussion, so this document won't validate):

<?xml version="1.0" encoding="UTF-8"?>
<gsml:MappedFeature gml:id="mf.1">
    <gsml:specification>
        <gsml:GeologicUnit gml:id="a">
            <gsml:occurrence xlink:href="#mf.1"/>
            <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.2"/>
            <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.3"/>
            <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.4"/>
            <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.5"/>
            <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.6"/>
            <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.7"/>
            <!-- ad nauseam -->
        </gsml:GeologicUnit>
    </gsml:specification>
</gsml:MappedFeature>

By blindly resolving all fragments, we can end up with a large number of MappedFeatures
that are simply out of context. Furthermore, GeologicUnit also has other properties that
link to even more features, and so forth. This has an important consequence for the
resolution process. How can a client tell whether a property target should be resolved or not?
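
To make the question concrete, the Python sketch below lists the occurrence references of a document like the one above and separates those already present in the document (`#...`) from those that would need a resolver. The namespace URIs are assumptions added so that the sample parses (the original listing omits them), and the sample is shortened to three occurrences.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
GSML = "urn:cgi:xmlns:CGI:GeoSciML:2.0"  # assumed namespace URI for the gsml prefix

DOC = f"""<gsml:MappedFeature xmlns:gsml="{GSML}"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink" gml:id="mf.1">
  <gsml:specification>
    <gsml:GeologicUnit gml:id="a">
      <gsml:occurrence xlink:href="#mf.1"/>
      <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.2"/>
      <gsml:occurrence xlink:href="urn:cgi:feature:GSC:mf.3"/>
    </gsml:GeologicUnit>
  </gsml:specification>
</gsml:MappedFeature>"""


def split_occurrences(xml_text):
    """Return (local, external) occurrence hrefs found in the document."""
    root = ET.fromstring(xml_text)
    local, external = [], []
    for occ in root.iter(f"{{{GSML}}}occurrence"):
        href = occ.get(XLINK_HREF, "")
        (local if href.startswith("#") else external).append(href)
    return local, external


local_refs, external_refs = split_occurrences(DOC)
```

Listing the external references is cheap; deciding which of them to resolve is the part that needs application rules.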

In this BBOX filter case, the extra MappedFeatures are clearly out of scope.
Depending on the goal of the client application that ingests the dataset, resolution needs
will be quite variable. We can imagine that a specific application, such as a tool that
renders well logs, will follow a predefined path until it locates the information it needs to
fulfill its role. This tool will "know" which URNs are worth resolving. This further
reinforces the idea that domain-specific GML applications can't be processed by generic
clients (at least not for any practical purpose).

But requirements for systematic resolution do exist. One case that has been explored in
GeoSciML Testbed 3 is dataset import, where a GeoSciML document is loaded into
a GIS database structure. In the MappedFeature / GeologicUnit case, when does it stop?
A similar but far more systematic case would be a generic tool that explores the
dataset, such as a reporting tool, which might want to follow all possible paths.
In GeoSciML Testbed 3, the pattern used was to not serialize the
gsml:occurrence properties of embedded GeologicUnit [9]. This is problematic because it
could be interpreted as meaning that the GeologicUnit has no MappedFeature.

We can call these strategies "fetcher" versus "crawler".

  - A fetcher tries to get a specific set of information from a dataset in order to
    produce something. It resolves a URN because it needs to access something
    specific and ignores irrelevant references. Resolution is guided by local rules.
  - A crawler reads in an arbitrary dataset and analyses the content. The crawler
    does not have any prior requirements. A 'content' validator, such as a tool that
    validates profile conformance, might be tempted to crawl the whole document
    (which might end up being the whole database unless there are some boundary
    rules). A reporting tool or a graph tool (a tool that creates an interconnection graph)
    would also fall in this category.

The issue of resolution depth does not apply to the fetcher; it will traverse the document
until it finds what it needs. The crawler is more of a problem. How do we tell the
crawler to stop? How do we scope the limits of a response to a query? Is there a
reasonable use case for crawlers? [10]

Furthermore, especially when it comes to fragments, should the resolver automatically
resolve URNs that are present in the response (and then how do we prevent recursive
resolution)? Should the resolver be minimalist when it comes to fragments (return the
strict minimum)? If the result is minimalist, a long path
(classification/GeologicUnit/event/GeologicEvent/eventAge/CGI_Value) will require
2 resolution requests [11] (for each instance!).

[9] This was not done to prevent rogue resolution at the time; it was only to homogenize the documents returned by each
participating agency into a manageable dataset.
[10] There is probably a good opportunity to use other attributes available in XLink. xlink:role could provide
such a boundary.
[11] eventAge is not a third request because its content is a DataType, which must be inline; therefore, it will be
serialized with GeologicEvent.

Best practices and recommendations:

Resolvers should implement safeguards to prevent recursive calls to themselves (infinite
loops). This is less of a problem if we agree on minimalist results.
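
One possible safeguard, shown here as a minimal Python sketch, is to combine a visited set with a depth limit. The link table is hypothetical (it mimics the MappedFeature / GeologicUnit cycle with made-up identifiers) and stands in for whatever a real resolver would return; the function name `crawl` is illustrative.

```python
from collections import deque

# Hypothetical reference graph: URN -> URNs referenced by the resolved resource.
LINKS = {
    "urn:cgi:feature:GSC:mf.1": ["urn:cgi:feature:GSC:gu.a"],
    "urn:cgi:feature:GSC:gu.a": ["urn:cgi:feature:GSC:mf.1", "urn:cgi:feature:GSC:mf.2"],
    "urn:cgi:feature:GSC:mf.2": ["urn:cgi:feature:GSC:gu.a"],
}


def crawl(start, max_depth, links=LINKS):
    """Breadth-first resolution that never revisits a URN and stops at max_depth."""
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        urn, depth = queue.popleft()
        if depth == max_depth:
            continue  # boundary rule: do not follow references any further
        for target in links.get(urn, []):
            if target not in visited:  # safeguard against cycles
                visited.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


shallow = crawl("urn:cgi:feature:GSC:mf.1", max_depth=1)
deep = crawl("urn:cgi:feature:GSC:mf.1", max_depth=10)
```

The visited set alone guarantees termination; the depth limit is what keeps a crawler from pulling the whole database.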

4.2.2 Identity of the resource

In the context of a fragment, and when this fragment can be extracted through a WFS
service, the WFS capabilities must expose this feature or object type, or it must otherwise be
available from some service. This is rarely the case: most WFS instances expose only a subset of
the feature types contained in the model. Either a WFS serving URNs should be forced to
expose all the feature types that it could externalize, or unexposed features should not be
externalized and should be serialized in the document.

To resolve a URN to a feature or an object, we must be able to send a request to a WFS
and pass a unique identifier for this feature or object:

  - WFS 2.0 (ISO 19142 section 7.2.1) states that "each feature instance in a WFS
    shall be assigned a persistent unique resource identifier"… and further "and the
    value do not have to be meaningful outside the scope of the web service instance".
  - WFS 1.1.0 (OGC 04-094 section 7.1) has a similar statement: "That is to say,
    when a WFS implementation reports a feature identifier for a feature instance,
    that feature identifier is unique to the server and can be used to repeatedly
    reference the same feature instance (assuming it has not been deleted). It is further
    assumed that a feature identifier is encoded as described in the OpenGIS® Filter
    Encoding Implementation Specification [3]."
This means that the id is not globally unique. This is a problem because we cannot
infer from the URN which identifier the service assigned, unless the URN is constructed
by a rule that lets us deduce the service id from it.
An alternative way to reach the right feature is to use a FILTER statement that targets
the gml:name or gml:identifier directly:
<?xml version="1.0" encoding="utf-8"?>
<wfs:GetFeature maxFeatures="1" service="WFS" version="1.1.0">
<wfs:Query typeName="gwml:WaterWell">

  This is not an absolutely precise request, since we should provide the right codeSpace for a unique
identifier (http://www.cgi-iugs.org/uri). But WFS 1.1.0 XPath support does not allow conditions in paths, and
an ogc:And clause is scoped to the Feature level (not to the gml:name), so this cannot be expressed (WFS 2.0
solves this). Since the URN of the feature is supposed to be unique, there should not be any ambiguities.


This works, but only for features. The WFS specifications state that this unique identifier
is for features only; they make no mention of types such as EarthMaterial.
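
A filter request of this kind can be assembled programmatically. The following is a minimal sketch, not the request used by GIN: it builds a WFS 1.1.0 GetFeature document with an ogc:PropertyIsEqualTo filter on gml:name. The function name, the feature type namespace and the URN used in the test are assumptions made for illustration, and the codeSpace limitation described in the footnote still applies.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
OGC_NS = "http://www.opengis.net/ogc"
GML_NS = "http://www.opengis.net/gml"


def get_feature_by_name(type_name, type_namespace, urn):
    """Build a WFS 1.1.0 GetFeature request that filters on gml:name."""
    ET.register_namespace("wfs", WFS_NS)
    ET.register_namespace("ogc", OGC_NS)
    prefix = type_name.split(":")[0]
    request = ET.Element(f"{{{WFS_NS}}}GetFeature", {
        "service": "WFS",
        "version": "1.1.0",
        "maxFeatures": "1",
        # Prefixes that only occur inside attribute or text values (typeName,
        # PropertyName) are not declared automatically, so declare them here.
        "xmlns:gml": GML_NS,
        f"xmlns:{prefix}": type_namespace,
    })
    query = ET.SubElement(request, f"{{{WFS_NS}}}Query", {"typeName": type_name})
    flt = ET.SubElement(query, f"{{{OGC_NS}}}Filter")
    equal = ET.SubElement(flt, f"{{{OGC_NS}}}PropertyIsEqualTo")
    ET.SubElement(equal, f"{{{OGC_NS}}}PropertyName").text = "gml:name"
    ET.SubElement(equal, f"{{{OGC_NS}}}Literal").text = urn
    return ET.tostring(request, encoding="unicode")
```

The namespace passed for the feature type must be the one actually declared by the target schema; the value used when calling the function is the caller's responsibility.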

        WFS 1.1.0 provides a way to extract a specific object through GetGmlObject, but
         the ID that this operation requires is the gml:id, not the URN that is exposed in
         gml:name or gml:identifier. Unless there is some rule to find the gml:id
         from the URN, we can't target a specific type from its URN.
        WFS 2.0 (actually ISO 19143, Filter Encoding) does seem to solve the problem by
         defining the element fes:ResourceId as a generic element for referencing
         resources by id. See section 7.14.2 (Id capabilities).
        WFS 2.0 also supports stored queries, which could be used as an alternative method
         to access features and objects.

Best practices and recommendations:

        A fragment shall not be externalized if the feature is not exposed in the WFS
         Feature List of a service within the community (does not appear in a
         GetCapabilities of a WFS somewhere)
        The URN of a fragment shall be constructed from the persistent identifier (gml:id) to
         allow a WFS 1.1.0 GetFeature request with a FEATUREID parameter, or a
         GetGmlObject request, to be used to extract a given feature or object. 13
        WFS 2.0 service should be configured to support URN as identifier.
        N2L resolvers should provide, when possible, a gml:id in the parameter
         (assuming the URN can be turned into a gml:id)
        Fragment expressed as WFS request URL should target gml:id when possible.
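
The recommendation to build the URN from the gml:id can be illustrated with a reversible naming rule. This is a sketch under assumptions: the `urn:x-example:feature:` prefix and the function names are hypothetical, and the rule relies on the fact that a gml:id is an XML NCName and therefore never contains a colon.

```python
from urllib.parse import urlencode

URN_PREFIX = "urn:x-example:feature:"  # hypothetical prefix, not a registered namespace


def gml_id_to_urn(type_name, gml_id):
    """Build a fragment URN whose last component is the gml:id itself."""
    if ":" in gml_id:
        raise ValueError("a gml:id cannot contain a colon")
    return f"{URN_PREFIX}{type_name}:{gml_id}"


def urn_to_gml_id(urn):
    """Recover (type name, gml:id) from a URN built by gml_id_to_urn."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not governed by this rule: {urn}")
    type_name, sep, gml_id = urn[len(URN_PREFIX):].partition(":")
    if not sep or not gml_id or ":" in gml_id:
        raise ValueError(f"malformed fragment URN: {urn}")
    return type_name, gml_id


def feature_id_request(wfs_base_url, urn):
    """KVP GetFeature request (WFS 1.1.0) that targets the feature by its id."""
    _, gml_id = urn_to_gml_id(urn)
    query = {"SERVICE": "WFS", "VERSION": "1.1.0",
             "REQUEST": "GetFeature", "FEATUREID": gml_id}
    return f"{wfs_base_url}?{urlencode(query)}"
```

Because the gml:id is embedded unchanged, the resolver needs no lookup table to go from the URN to the identifier expected by the service.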

4.2.3 Extracted Resource Structure

When a URN is resolved to its resource and this resource is some processable data
(the extracted resource is expected to be parsed by the client application), the API
used to extract the resource should not leave any artifact. For instance, a fragment URN
will likely be resolved using a WFS request. A WFS request returns the requested feature
embedded in a collection object (wfs:FeatureCollection). The client invoking the
resolver should not need to know that a WFS request has been invoked. Therefore, the
resolver should remove all service artifacts before sending the resource. If the resolver is
an N2L (URN to URL), it should provide an XPointer-doctored URL.

  There is one great benefit of a FEATUREID request: it does not need to know the typeName,
which removes one problem for the resolver.

Best practices and recommendations:

Resolver shall remove all service artifacts
N2L shall be formatted using XPointer to target the fragment of interest
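
Removing the WFS artifact essentially means unwrapping the feature from its collection. The sketch below assumes a WFS 1.1.0 / GML 3.1 response and a hypothetical function name; it returns the first feature found under gml:featureMember or gml:featureMembers.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
GML_NS = "http://www.opengis.net/gml"


def unwrap_feature(wfs_response):
    """Return the first feature of a WFS response without its collection wrapper."""
    root = ET.fromstring(wfs_response)
    if root.tag != f"{{{WFS_NS}}}FeatureCollection":
        return root  # nothing to strip: the service returned a bare object
    for member_tag in ("featureMember", "featureMembers"):
        member = root.find(f"{{{GML_NS}}}{member_tag}")
        if member is not None and len(member) > 0:
            return member[0]
    raise LookupError("the collection contains no feature")
```

An N2L resolver cannot do this itself, since it only returns a URL; this is why the XPointer form is recommended in that case.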

4.2.4 Service parameters

A URN only provides the identity of the resource; it does not prescribe any feature
representation options. Therefore, the resolver does not know anything about the
serialization options that should be used when this resource is requested. For instance,
WFS provides several serialization options:

         Projection of the geometry (srsName)
         Format (outputFormat)
         Profile or any other parameter that affects the output of the feature (wfs:Property,
          traversal depth)
Since the resolver cannot know the context of the client calling it, it does not know
what format or projection the client is expecting. For the projection, it is conceivable that
the client could be smart enough to reproject the geometry by itself. Format is trickier
because we can run into situations where the client is processing a GML 3.2 document and
is trying to resolve a URN that returns a GML 3.1 document. Profile is also a
problem because some WFS might remove optional properties in some profiles,
properties that the client application would actually like to fetch. The WFS way is to provide
proper wfs:Property parameters, but the resolver can't know this. We are then forced to
pass parameters to the resolver or to have a small set of resolver profiles that would fit the
client needs.

A proposed solution is to state that any parameters passed to a resolver are passed to all the
services it cascades to. This means that a resolver that is invoked with a srsName
parameter will eventually pass it to the WFS service. The rule is that all parameters the
resolver does not know about are passed untouched.


will be transformed by the resolver to

         FEATURE_ID is not passed because the resolver needed it to pass its own id. We
          assume the resolver knew how to convert the URN to a gml:id.
         The resolver added SERVICE, VERSION and MAXFEATURES and would
          therefore have ignored those if passed by the client.
         SRSNAME is passed untouched to the WFS because the resolver has no idea
          what to do with it.

Best practices and recommendations:

         Provide resolution profiles that cover the most common parameters.
         Create a WFS-specific resolver API (that supports relevant WFS parameters) or
          set as a rule that resolvers must pass parameters to all cascaded services as-is.
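
The pass-through rule can be sketched as follows. This is only an illustration of the rule, assuming a KVP WFS 1.1.0 back end: the function name, the set of resolver-owned parameter names and the example URL are hypothetical.

```python
from urllib.parse import urlencode

# Parameters the resolver sets itself; client values for these are ignored.
RESOLVER_OWNED = {"SERVICE", "VERSION", "REQUEST", "MAXFEATURES",
                  "FEATUREID", "FEATURE_ID"}


def cascade_to_wfs(wfs_base_url, gml_id, client_params):
    """Build the cascaded WFS request, forwarding unknown parameters untouched."""
    query = {"SERVICE": "WFS", "VERSION": "1.1.0", "REQUEST": "GetFeature",
             "MAXFEATURES": "1", "FEATUREID": gml_id}
    for name, value in client_params.items():
        if name.upper() not in RESOLVER_OWNED:
            query[name] = value  # e.g. srsName: the resolver does not interpret it
    return f"{wfs_base_url}?{urlencode(query)}"
```

A client that calls the resolver with `srsName=EPSG:4326` and its own `version` value would therefore see the projection forwarded to the WFS while the version is replaced by the resolver's.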

4.2.5 Error handling

How should errors be reported? The common wisdom is to generate an HTTP error
(such as 404). This is probably fine for a single resolution, but what should happen when a
large collection of URNs is processed? Should a single resolution error stop the whole
process? We suspect the behaviour will depend on the application. N2C resolvers are
probably the best solution here since the error message can be embedded in the resolver
4.2.6 Vocabularies and queries

To build a query that filters a property using a vocabulary, you must get this vocabulary
from somewhere. Ironically, the only place where the user can learn which vocabulary is
being used is the document itself, by looking at the codeSpace. The typical form
of vocabulary references is shown in this snippet.

  <gsml:rank codeSpace="urn:cgi:classifierScheme:GA:Rank">formation</gsml:rank>
    <gsml:value codeSpace="http://www.cgi-

   o   #1 tells that the rank value comes from a vocabulary list identified by
   o #2 tells that the value is a globally recognized URN for a null value.

In order to build a query, a hypothetical application needs to provide some sort of pick
list, presumably available in various languages. Therefore, it needs to know:

      What vocabulary or vocabularies are relevant for a given property
      Where to get it
      In what form it is (can the client parse it or just flash the resource to the user)
      Which service supports it.


Some of this information is coded in the instance document (which the client does not
have yet) and this document does not provide all the alternative vocabularies (just the one
that the specific document happened to use). Therefore, there is a need to provide a way
for a client application (or a user) to locate a vocabulary and explore the list of terms.

     a) To get the right vocabulary, we must figure out which property is in context: #1
        is gsml:rank and #2 is gsml:observationMethod. This means that the scope of the
        vocabulary is not necessarily the property itself (for #2, the property is gsml:value). The
        client application must be smart enough to understand the context. Furthermore,
        property names are not globally unique and might represent different things;
        the real context is therefore the name of the feature plus the name of the context
        property. A word of caution: features can inherit properties from their supertypes!
        We'll assume at this point that the client application knows the context property
        and that this property has a URN identifier. 14

     b) Once you have this context, you must match it to a list of potential vocabularies.
        Ideally, only one should exist per community, but some vocabularies are local,
        such as lexicons, so we might need to scope the search.

     c) Once you have located the vocabulary, you must get the list of terms or, if the list is not
        processable (or is not in a known schema), the user must be informed that she/he is
        on her/his own to figure out which term to use.

     A possible solution to address this:

     (a) Use a registry / catalog. Not a resolver, because a resolver would return the
         property itself (presumably in xsd). We need to query a catalog of vocabularies.
         This would return a series of vocabulary URNs.
     (b) Querying the information from the catalog is obviously done through CSW. Now, where
         do we get this catalog URL? I suppose this is the same problem as "where's the
         resolver". See the next section.
     (c) Once we have a URN (or the user has picked one from a list), the client application can
         fetch the list of terms, or a more formal SKOS / gsml:ControlledConcept
         vocabulary that can be offered to the user. This is definitely a resolver job. We
         assume that the client is either smart enough to figure out the format of the
         vocabulary or has some way to know it. We suggest this format information should
         also be part of the catalog.

Best practices and recommendations:

   o Implement a registry of vocabularies based on the context property.
   o The registry shall contain a format property and other parameters the client might
     need to successfully parse the vocabulary or take any other action. E.g., if the
     format is pdf, it can provide a link to the user. It would be nice to have a bit of
     direction in the catalog to help the user.

  See O&M for an example of property URNs (OGC 07-022r1); om:observedProperty actually uses those
property URNs.
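
The shape of such a registry can be sketched in memory. This is not a CSW implementation: the class names, the `media_type` and `scope` fields, and the registered values are assumptions chosen to illustrate a lookup keyed on feature type plus context property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyEntry:
    urn: str                  # identity of the vocabulary
    media_type: str           # format hint so the client knows whether it can parse it
    scope: str = "community"  # "community" or "local" (e.g. a provider's lexicon)


class VocabularyRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, feature_type, property_name, entry):
        # The key is feature type + property, since property names are not unique.
        self._entries.setdefault((feature_type, property_name), []).append(entry)

    def lookup(self, feature_type, property_name, scope=None):
        entries = self._entries.get((feature_type, property_name), [])
        return [e for e in entries if scope is None or e.scope == scope]
```

A client would call `lookup` before building its pick list, and use `media_type` to decide whether to parse the vocabulary or simply offer a link to the user.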

5. Resolver Location and resolution architecture
A common question, which usually triggers chicken-and-egg discussions, is “How does a
client find the location of the resolver?” There are many ways to provide this value to an
application, and the real answer seems to be that it is an architectural element that is tied
to a specific community. We don't expect that all URNs that can possibly be generated
by all communities can be resolved by some sort of universal resolver. Resolvers are
normally attached to a registry, and the latter is generally under some authority.
Therefore, a dataset is likely to require more than one resolver. This raises the problem
of matching a URN with a resolver. Several resolvers could be unified into a single
service that dispatches to several resolvers, but this only shifts the problem
from the client to the server; more than one resolver is still required. Several options
have been offered to provide the location of the resolver to a client. Two aspects of the
resolver location must be considered:

      Where is the resolver, or where are the resolvers?
      If there is more than one resolver, which resolver can handle which URN?

5.1. How many resolvers ?
5.1.1 One resolver

This scenario is the simplest from the client's point of view because all URN resolution
requests are routed to a single resolver service. This implies that the resolver is smart
enough to dispatch or "cascade" URN resolution to other resolvers when it can't process
it itself. It is unlikely that a truly single resolver service (all possible resources in a single
registry) is possible. We simply can't prevent cross-domain documents. GWML is actually
a good example: we cannot expect the GeoSciML community to store GIN resources;
therefore, the resolver must be "cascading" to other resolvers. A "dumb" version of the
cascading resolver is a system that 'hits' all resolvers until one returns something, relying
on URN uniqueness.

5.1.2 Many resolvers

The client application is provided with a list of resolvers. It must then figure out where to
send the resolution request. The problem here is how to match a specific URN to a
specific resolver. One assumption is that the structure of the URN could be a clue and the
client could be provided with a series of regular expressions that match the URNs a
resolver can handle. Or the client fires a request to all resolvers and hope one will

In those two cases, the question is "where is the dispatch mechanism" (matching a URN
to a resolver). In the single resolver scenario, the dispatch mechanism is delegated to the
root resolver itself, which can further delegate the function down. In the second scenario,
the dispatch problem is handed to the client (at least initially), so we expect a bit more
information to be provided to the client to send the URNs to the right resolver (otherwise, it
is forced to send them to all resolvers). Details of the dispatch are not opaque anymore since
they must be provided to any client requesting them.
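
Client-side dispatch based on regular expressions can be sketched as a small rule table. The patterns and resolver URLs below are hypothetical; how such rules would be published to clients is exactly the open question discussed above.

```python
import re


def build_dispatcher(rules):
    """rules: list of (regular expression, resolver URL) pairs, in priority order."""
    compiled = [(re.compile(pattern), url) for pattern, url in rules]

    def candidates(urn):
        # Return every resolver whose pattern matches the start of the URN.
        return [url for pattern, url in compiled if pattern.match(urn)]

    return candidates
```

An empty result tells the client that no known resolver claims the URN, at which point it can either give up or fall back to querying every resolver.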

There is an obvious advantage to the single resolver approach: it is really a
no-brainer for the client application. The dispatch rules are opaque and maintained by a
third party. The flip side is that any data provider that publishes URNs that are not
known by the community must register their resolver with the community OR build their
own cascading resolver (which will cascade to the community resolver when needed).
Good examples of this issue are stratigraphic lexicons, which are specific to a provider.

Table 1: Pro and con analysis of dispatch mechanisms

                 Pros                                  Cons
One resolver     Simpler for the client (no-brainer).  Harder to expand: forces services to
                 Opaque rules, fine-tuned and          register their resolvers with the
                 managed by the community.             community (or many communities),
                                                       otherwise the client won't see them,
                                                       or to build their own cascading
                                                       resolver and publish/maintain it.
Many resolvers   Expandable: one can add one's own     Client must deal with dispatch rules.
                 simple resolver (without dispatch,    Must create some sort of formalism to
                 since the client handles it).         provide dispatch rules to the client,
                                                       or assume a dumb client that
                                                       generates a lot of

5.1.3 Discussion

As is often the case, the best solution is somewhere in the middle. One possible design
would be to accept a very limited set of resolvers that are invoked in a precise order. The
ideal setup would be one or two resolvers. The first resolver is the community resolver and
the second (and maybe more, but this should be discouraged) is the local resolver. This
would simply reflect the governance structure of the URNs used in the document: the first
resolver deals with externally governed URNs and the second (or more) with locally
governed URNs.
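
The "precise order" design amounts to a short fallback chain. A minimal sketch, assuming each resolver is a callable that raises `LookupError` when it cannot resolve a URN (the function and resolver names are hypothetical):

```python
def resolve_in_order(urn, resolvers):
    """resolvers: ordered list of (name, callable), community resolver first."""
    failures = []
    for name, resolve in resolvers:
        try:
            return name, resolve(urn)
        except LookupError as exc:
            failures.append(f"{name}: {exc}")
    raise LookupError(f"no resolver could resolve {urn} ({'; '.join(failures)})")
```

Returning the name of the resolver that answered makes it easy to check that locally governed URNs are indeed served by the local resolver.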


An alternative is to avoid URNs for local references and use URLs instead. This means
that only externally governed references go to a resolver, and the others are
invoked directly by the client. This is appealing for fragments because the resolver does not
need to deal with algorithmically generated URNs (i.e., FEATURE_ID requests). Therefore, all
locally governed fragments pointed to by xlink:href would be a full URL + XPointer.
Vocabularies that are locally governed would provide the location of the vocabulary in
the codeSpace.
5.2. Where is the resolver ?
Probably the most frequently asked question. How does the client know where to send
the resolution request? There are many options and some of these solutions can support
both single cascading resolver and multiple resolver options.

5.2.1 Well Known Resolver(s) (WKR)
The URL of the resolver (or resolvers) is well known in a community and this is provided
to the client application through manual configuration. The design essentially says that
the application does not get the resolver address automatically from anywhere, it's
configured in the client application.

5.2.2 Resolver advertised by the provider
The service provides the URL of the resolver (and maybe some other information,
depending on the dispatching model). In the OGC world, we expect this to appear in the
GetCapabilities document of W*S services. There are provisions in the OGC specifications
to provide "Vendor tags" or Extended Capabilities. For example, see

       o § 7.4.6 of 06-121r3 OGC Web Services Common Specification.
       o § 6.5.11 of 01-068r3 Web Map Service Implementation Specification.15
       o § 14.3.4 of 04-094 Web Feature Service Implementation Specification.

One or more resolvers could be advertised using this technique. Obviously, this
technique must be adapted for non-OGC services (SKOS?). Another problem is that this
implies that the dataset comes from a service, which is not always the case. We could
have datasets on a CD or sent using a different medium (by email, ftp, download from a
web site). For each of those cases, the client application would need to use various
strategies to get to the resolver address. One option is to require that disconnected media
be self-contained, with all external references also distributed on the media.
This also means that all external references are now static for this dataset. We suggest
that disconnected media still have a way to provide the client application with resolver
location(s) and that the dataset still use URNs as much as possible, so the client can still opt
for remote resolution.

     WMS 1.3.0 does not seem to allow this.


5.2.3 Provider is the resolver
You generate it, you resolve it. This is probably a subtype of 5.2.2. The subtlety is that the
data provider must implement all the resolution services, and it is a de facto "single resolver"
design that will cascade the request to other resolvers. We could expect the domain
name of the resolver to be the same as that of the service, although this is not what the W*S
specifications imply, because the service location of each operation is provided in the
capabilities document and is therefore not required to be the same as the base service URL. 16
Because this solution is similar to 5.2.2, it also suffers from the same problem: the dataset
might not come from a service.

5.2.4 Codespace provides the resolver

In this case, we assume that all xlink:href values are URLs (because there is no other place
to provide the service URL) and that every codeSpace contains the URL of the resolver for a
specific term. URNs are just used for flags and vocabulary terms.

The main implication is that the vocabulary identity is the URL, and this URL should be
persistent. One serious limitation of this model is that the access method and
the location are fixed (no alternative route, nor alternative API).

Cox proposed17 to use xlink:role to hold the resolver URL but this solution is limited to
xlink (fragment) cases.

5.2.5 Resolver(s) from the dataset itself

This solution proposes to embed the resolver directly in the dataset (but not in the codeSpace
as in 5.2.4). A special metadata block could be added to the head of the document
(perhaps using ISO 19115 service metadata), or an extension could be created for either
ISO 19115 or GWML. This solution is similar to 5.2.4, but it does not occupy the codeSpace
attribute with the resolver location and leaves it for vocabulary identity; the resolver
location is provided elsewhere in the document. The immediate advantage over 5.2.2 and
5.2.3 is that this does not require a "service": the resolver information is embedded in the
document. But it suffers from the same problems as 5.2.4. It's just another way to hardcode resource

5.2.6 Resolver(s) from a registry

This is a slight variation where the resolver is not advertised directly but one or more
resolvers can be extracted from a catalog (a registry). The problem is shifted from
"where's the resolver?" to "where's the catalog?", so we still need one of the solutions
proposed in 5.2.x. The only upside of this solution appears when we need to deal with many
resolvers but the number of resolvers is not known by the service. Using a cascading
resolver fixes the problem at the source, however.

  This detail is often overlooked by client applications that just assume that all operations are located at the
same URL as the GetCapabilities URL.

5.2.7 Resolver is explicitly referenced using a URL

This option is a mixture of 5.2.4 and a full URL to the resource. As far as the client is
concerned, it is a full URL approach. The twist is that the URLs point to a resolver
(or a limited set of resolvers). It largely solves the service artifact and XPointer
problems, as well as the WFS gml:id versus URN issue, because the URN is used as an argument.
If a community can agree on a resolver API syntax, this could also address some of the
URL variability issues. It finally shields API variations because only the resolver API is

<property codeSpace=”2http://ngwd-


<property xlink:href=”http://ngwd-
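
The idea of a URL that points to a resolver and carries the URN as an argument can be sketched as follows. The `urn` parameter name and the resolver base URL are assumptions; the document does not define the resolver API syntax, which a community would have to agree on.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def resolver_url(resolver_base, urn, **extra):
    """Build an explicit resolver URL that carries the URN as an argument."""
    return f"{resolver_base}?{urlencode({'urn': urn, **extra})}"


def urn_from_resolver_url(url):
    """Recover the identity (the URN) from a resolver URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("urn", [])
    return values[0] if values else None
```

The second function shows that the identity of the resource stays recoverable from the URL, so a client that knows the convention can still route the URN to another resolver.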

5.2.8 Resolution is externalized

Models 5.2.1, 5.2.2 and 5.2.3 assume a resolver-aware client, which greatly
limits the number of clients that can access the services. One option to consider is to
provide the client application with a WFS (or some other service) façade (a service that
exposes the same API as a regular WFS; see Figure 2) that performs the resolution
before the document is sent to the client. The client then receives a URL-based document
(all URNs are turned into URLs, where possible) in which all dictionary URNs are turned
into human-readable terms (potentially in the user's language of choice).

One might ask: why bother generating URNs if they are to be turned into URLs
immediately? The short answer is that URNs are managed under different governances;
therefore, the service generating the original document might not know
how to turn shared URNs into URLs. (Service A generates a reference to
urn:CGI:some:other:governance, but it does not itself know how to resolve this URN.) This
responsibility is delegated to a specific resolver that is used by the cascading resolver.


We expect this service to be used at process time as a general 'batch resolver' or a general
'community integrator' that would bind resources coming from many sources.
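
The URN-to-URL step of such a façade can be sketched as a pass over the document's xlink:href attributes. This is an illustration only: `urn_to_url` stands for whatever lookup the cascading resolver provides, and references that cannot be resolved are left untouched ("where possible").

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def rewrite_urn_links(document, urn_to_url):
    """Replace xlink:href URNs by URLs; urn_to_url returns None when unknown."""
    root = ET.fromstring(document)
    for element in root.iter():
        href = element.get(XLINK_HREF)
        if href and href.startswith("urn:"):
            url = urn_to_url(href)
            if url is not None:
                element.set(XLINK_HREF, url)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree may rename namespace prefixes on output; a production façade would want to preserve the prefixes used by the original service.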

Figure 2: Proposed architecture for resolution of URN in GIN.

Furthermore, this façade application could do more than just turn URNs into URLs. It
could also insert fragments using some specific rules (a.k.a. profiles) that could be shared
amongst a larger community (instead of implementing those rules in the client
application). We could expect that a WFS façade resolver (WFR) would be better
synchronized with the community, and better at caching dictionaries and dealing with
GWML (and GeoSciML) best practices. The WFR could provide other services that are specific to the
This option can also be used for datasets that are not provided by a "service" by exposing
another API to this resolution service. WPS (Web Processing Service) seems to be the ideal
approach because it can provide this kind of service to any kind of GWML document (be
it from a service or an archived CD).

5.2.9 Discussion

Using URNs brings in a lot of assumptions about what the client application should do.
Thinking that a generic GML client can do anything useful with GWML is an illusion.
A typical generic GML engine can only act on entities that are defined in GML (such as
geometries, projections, etc.) and is usually limited to organizing the dataset in a tree and
plotting geometries on a map. A generic application won't know about WaterWells and won't
know that a MappedInterval should be plotted as a log. Any processing beyond this
requires a client application that is aware of the domain. Even if a generic GML parser
knew how to resolve vocabularies, it would not know what to do with them.

Therefore, it is safe to assume that any application built to ingest GWML will
be aware of the presence of URNs and of the need for a resolver. The best we can do is to
implement an architectural solution that could be reused by other communities. Rather than
domain-specific solutions (e.g., a rule that says "if this feature is a
gsml:MappedFeature, the classification property must be resolved"), we should
try to implement reusable best practices. Of the eight options for dealing with resolver
location (5.2.1 to 5.2.8), only the first and the last seem to meet all the requirements.
Given that GWML cannot be processed by a generic client anyway, a WKR that points to the
GIN infrastructure is only one more small piece of information that a client application
digesting GWML will need.


6. Recommendations
This is a list of recommendations for implementation of resolvers for GIN.
       Given that a URN both decouples resource identity from access protocol and
        is a governance tool, we recommend that URNs be used for all
        references that lie outside the governance of the provider. When a reference
        points to an object within the governance of the provider of the dataset, the choice
        is at the provider's discretion, but as a best practice, we recommend that an explicit
        URL to the resolver be used (5.2.7).
       When issuing a URN reference to an object within the governance of the provider,
        the provider must ensure that a service can resolve this object. Otherwise the
        object must be serialized in the document.
       GIN Resolvers shall remove service artefacts (see 4.2.3) when URNs are used.
       Given that the resolver will act as an integrator and that many sources of
        data under many governances are likely to be required, it will be more efficient to
        implement a single resolution point, managed by GIN, instead of expecting
        client applications to deal with multiple resolvers. We therefore recommend the
        implementation of a single cascading resolver (see 5.1.1).
       Given that N2C is the most complete response and all other forms (N2R and N2L)
        can be derived from N2C, GIN Resolvers shall return an N2C response.
       Since it is unlikely that a generic application could make any significant use of a
        complex model such as GWML, and therefore any application ingesting it will be
        aware of GWML and many other things, we propose that the resolver be a WKR
        (well known resolver) that the client application can keep in its internal
        configuration. Resolver location is then just one more bit of information a client
        application will need to know to participate in the network. This will also reinforce
        GIN as a collection of coherent resources beyond a mere WFS service.
       If a generic client must be supported, we propose that a WFS (or any
        relevant service) façade be provided, with the understanding that there is very little
        benefit in using such a tool with a complex, domain-specific model.

Implementing a resolution architecture is far more complex than expected. Starting from a
simple “turn a URN into a URL”, a lot of aspects need to be considered and agreed on. This
document is probably not complete and many details are still open to debate.



Client: A client is an application or system that accesses a remote service on another
computer system, known as a server, by way of a network. The term was first applied
to devices that were not capable of running their own stand-alone programs, but could
interact with remote computers via a network. These dumb terminals were clients of the
time-sharing mainframe computer. (<http://en.wikipedia.org/wiki/Client_(computing)>)

User: A user is a person who uses a computer or Internet service. Users are also widely
characterized as the class of people that use a system without complete technical
expertise required to fully understand the system.


Cox, 2008, A Uniform Resource Name (URN) Namespace for the Commission for the
     Management and Application of Geoscience Information (CGI) ,
     http://tools.ietf.org/html/rfc5138 accessed October 2009

Daniel, R., 1997, A Trivial Convention for using HTTP in URN Resolution, Network
     Working Group, RFC 2169. http://tools.ietf.org/html/rfc2169 accessed October

DeRose, S., Maler, E., Orchard D., editors., 2001, XML Linking Language (XLinks)
    Version 1.0, World Wide Web Consortium (See http://www.w3.org/TR/xlink/)

Fielding, R. et al, 1999, Hypertext Transfer Protocol -- HTTP/1.1, Working Group, RFC
      2616, <http://tools.ietf.org/html/rfc2616>, accessed October 2009.

Murata,M., 2001, XML Media Types, Network Working Group, RFC 3023,
     http://www.rfc-editor.org/rfc/rfc3023.txt accessed October 2009.

Schut, P. ed, 2007, OpenGIS® Web Processing Service, OGC 05-007r7,
     http://portal.opengeospatial.org/files/?artifact_id=24151 accessed October 2009.

Seegrid,2008, Workshop on Register-managed vocabularies in a service-oriented
     chanismsWorkshop> accessed October 2009.

Seegrid, 2009, Resolving terms referenced from within XML schemas and instance
     nisms>, accessed October 2009

Seegrid, 2007a, CGI Identifier Scheme, accessed at
     October 2009

Seegrid 2007b, CGI URI Resolver accessed at
     cification> October 2009.

DeRose, S., editor, 1999, XML XLink Requirements Version 1.0, World Wide Web
     Consortium (See http://www.w3.org/TR/1999/NOTE-xlink-req-19990224/)

