FAQs – Multi-Jurisdictional Data (MJD) Screening Tool
To be updated periodically as the Qs are FA’d; last updated September 2011

Will users be able to limit their queries to only extant EOs and/or, say, to just G1-G2 taxa?
Yes, that is the goal. Currently the tool is minimalist: it has no interface at all and only allows a
user to limit query results to species with status under the U.S. Endangered Species Act (since that
was our funder’s priority). The interface we build will provide a way to filter out records with EO Rank = X
or H, and to filter by Last Observation Date, by Conservation Status Rank, by other statuses such as SARA,
and potentially by many other fields, such as Representational Accuracy and State/Provincial Protection Status.

I’m uncomfortable with opening this up to the private sector. Why don’t you have at least 2 versions:
one for government/university/non-profit and one for private? Then I could make separate decisions
about exposing and fuzzing data for each version.
Our goal has always been to have a single “off-the-shelf” MJD product that we can provide to all clients
and that would be an efficient alternative to the time-and-labor-intensive custom data requests that we
will continue to offer. Maintaining two (or more) versions inevitably means a great deal more overhead:
separate spatial databases, more testing, more complexity in just about every aspect. We simply don’t
think it would be feasible.

Could someone do repeated queries in and around an area to zero in on the actual location of an EO?
They could, in theory, but there will be at least one strong disincentive: cost. The subscription fee
structure will not allow for an “unlimited query” option; queries will cost money. If you are concerned
about especially sensitive species that may be worth the cost of repeated queries to some unscrupulous
parties, you can always elect to have the EOs for those taxa fuzzed (enlarged). No matter how many
queries are run, there will be no way to zero in to anything finer than the size of that underlying
enlarged footprint. You can also make these records “species blind” to mask which species the query
has hit.

If I fuzz data, and a query “hits” those data, won’t that give the client a false positive? In other words,
wouldn’t it mislead the user into thinking a species of concern was in their area of interest when in
fact it may not be?
We get around this by returning a flag in each case where a query hits fuzzed data. The flag tells them
that the underlying footprint has been intentionally enlarged (made less precise) and the actual location
“may or may not be” within their query area. (The flag does not say anything about the fuzzing method
or the degree to which it has been enlarged.)
For anyone who wants more specifics, here’s how these flags work at the 3 query scales:
• At the largest scale, where the client gets the “Exact Species” output, the fuzz flag is placed on the
      individual EO records.
• In the intermediate scale, where the output is summarized into “Major Taxonomic Group,” if all EOs
      for a given species (e.g., “plant species 4”) within the query area have been fuzzed, the web service
      output will contain the “fuzzed” flag on that species record.
• At the smallest scale, where the return is “Known Presence” (Yes or None Known), the “Yes”
      response will include the fuzzed flag if all the EO data in the query area were fuzzed.
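
For readers who think in code, the three flagging rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `EOHit` record, its field names, and the three function names are hypothetical and do not describe the tool’s actual web service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EOHit:
    """One EO footprint hit by a query (hypothetical record layout)."""
    species: str   # species label, e.g. "plant species 4"
    fuzzed: bool   # True if the footprint was intentionally enlarged


def exact_species_flags(hits):
    """Largest scale: the fuzz flag sits on each individual EO record."""
    return [(h.species, h.fuzzed) for h in hits]


def group_flags(hits):
    """Intermediate scale: a species record is flagged only if ALL of its
    EOs within the query area were fuzzed."""
    flags = {}
    for h in hits:
        flags[h.species] = flags.get(h.species, True) and h.fuzzed
    return flags


def known_presence(hits):
    """Smallest scale: 'Yes' carries the flag only if ALL hit EOs were fuzzed."""
    if not hits:
        return ("None Known", False)
    return ("Yes", all(h.fuzzed for h in hits))
```

In this sketch, a query that hits one fuzzed and one unfuzzed EO of the same species would show the flag on only the fuzzed record at the largest scale, and no flag at the other two scales, because at least one hit is a precise location.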

Are there limits to how much I can fuzz my data?
Yes. If you feel it is necessary to fuzz all your EOs, you can fuzz them up to 2 square miles. If you only
feel it is necessary to fuzz a “handful” of especially sensitive species or EOs, you can fuzz those up to 4
square miles (we aren’t defining “handful” precisely; there is some latitude). The reasoning behind the
limit is two-fold: 1. The tool inherently provides substantial masking of actual locations and 2.
Extensive fuzzing severely compromises the basic screening functionality of the tool. We can’t, after all,
in good conscience charge clients for a “screening” tool that—in some jurisdictions at least—will not
“screen” because essentially all queries will hit something (and that something will be flagged to say the
actual location may or may not be within your query area). We have, however, developed an option to
mask the identities of EOs by making them “species blind.” Basically, instead of enlarging the underlying
spatial data, we remove the information about the record. It can be applied to all EOs for a species, or
to individual EOs.

How will the “species blind” option work to mask sensitive data?
Let’s say you have a species X that is sensitive and you elect to have all the EOs for it be included in the
tool as “species blind.” And let’s say a user queries the largest size area and hits one or more of these
“species blind” X EOs. She would get back all the usual information about whatever else the query “hit”
(if anything) plus a statement that said something like, “in addition to any other results of your query,
one or more species occurrences met your search criteria but we are unable to provide you additional
information due to data sensitivity. If you need more information, please contact [the data steward,
i.e., you, the member program].”

At the medium sized query, they’d get something like:
Birds sp. 1, G2, LE
Birds sp. 2, G1
Flowering plants sp. 1, G2
Flowering plants sp. 2, G5
Flowering plants sp. 3, G1, LT
“in addition to any other results of your query, one or more species occurrences met your search criteria
but we are unable to provide you additional information due to data sensitivity. If you need more
information, please contact [the data steward].”

At the smallest scale, it would just return the basic “Yes” response.

So basically, the species blind option just forces the Yes/No return at all the query levels for those EOs.
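
As a rough sketch of that behavior (an illustration only, not the tool’s actual implementation), the function below withholds species-blind records and appends the notice instead. The function name, the scale labels `"exact"`, `"group"`, and `"presence"`, and the dictionary layout are all assumptions made for this example.

```python
# Notice text taken from the example wording above; the contact line is omitted.
NOTICE = (
    "in addition to any other results of your query, one or more species "
    "occurrences met your search criteria but we are unable to provide you "
    "additional information due to data sensitivity."
)


def build_response(hits, scale):
    """Build a query response.

    hits  -- list of (record_label, species_blind) tuples (hypothetical layout)
    scale -- "exact", "group", or "presence" (hypothetical labels)
    """
    if scale == "presence":
        # Smallest scale: species-blind hits still count toward a plain "Yes".
        return {"known_presence": "Yes" if hits else "None Known"}

    # Other scales: list the ordinary records, withhold the blind ones.
    response = {"records": [label for label, blind in hits if not blind]}
    if any(blind for _, blind in hits):
        response["notice"] = NOTICE
    return response
```

Calling `build_response([("Birds sp. 1, G2, LE", False), ("species X", True)], "group")` in this sketch would list only the bird record and attach the notice; species X itself never appears in the output.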

I think the “Terms and Conditions” can be improved. Are you open to changes?
Yes, please. Please send your suggested changes to Kat Maybury: kat_maybury@natureserve.org.
(A suggestion from Roxanne Bittman (CA) has been added to a new draft as of October 26, 2010.)

How will the income from this (assuming there is some) be distributed around the network?
The Product Development Team has decided that the immediate priorities are: 1) developing an
interface (because we are still at a point where we need to put funds into the tool until it reaches a
certain level of usefulness/marketability); and 2) data exchange, including improvements to the data
exchange process itself. Once the tool itself is more developed, priority #1 will shift to
providing funding to programs for meeting Benchmark Data Content Standards, particularly for those
standards that will, in turn, improve the tool functionality such as addressing EO backlogs, updating S
and G ranks, assigning Representational Accuracy, verifying State Protection Statuses, and the like.
Some funds may also go towards marketing the tool. The Product Development Team, consisting of
both NatureServe and member program staff, will continue to provide specific guidance. Note that the
“disposable” income addressed here is any funding above and beyond the basic costs to NatureServe to
maintain and refresh the tool itself (the cost for servers and for staff time to refresh the data and tweak
performance, etc.).

We’ve made lots of updates to our data recently and I’m worried that the data you have from my
program are now out-of-date. What can we do about that?
There are two options for updating the data we have in central databases: a “full” data exchange and an
EO Update. The latter can take as little as one person-day or less on your end and can be scheduled
much more flexibly than the full exchange. (It is only available to programs using Biotics, though.)
Please contact Donna Reynolds (donna_reynolds@natureserve.org) to schedule one. And, even though
this is a little bit of “the-chicken-or-the-egg,” keep in mind that the proceeds from this product should
eventually help us fund more, and more efficient, exchanges of data, with both Biotics and non-Biotics
programs. That’s the goal, anyhow.

How often are the data accessed by the tool refreshed?
We are currently operating on the same schedule as the NatureServe Explorer refresh: we take a
snapshot of central databases three times per year and refresh both Explorer and the tool based on
those snapshots. We don’t expect that schedule to change in the near term.

I was testing this for my state and noticed that I got some odd results: results that do not agree
completely with the same query executed on NatureServe Explorer; and results that have data from an
adjacent state in them even though I was querying a county within my state. What’s up?
The underlying spatial data that this tool accesses are different from those used by Explorer or, for that
matter, from those used in analyses presented on Landscope America or in other venues you may have
seen. The primary reason is the fuzzing. NatureServe Explorer only allows users to obtain distribution
data by state/province, U.S. county, and 8-digit HUC. It bases that distribution on the unfuzzed EO data.
However, the screening tool also allows the user to draw a custom polygon as a query area. Because we
wanted the data to be internally consistent within the tool itself, we based ALL the queries (county,
HUC, and polygon) on the same dataset, which comprises both fuzzed and unfuzzed data. This
means a user should get the same results from the tool whether she enters a county FIPS code as her
query or submits a custom polygon that delineates that same county. It also means that she may see
data from an adjacent state or province showing up in her county results because fuzzed data from that
adjacent program that “spills over” into her county will be “hit.” (We thought about clipping the fuzzed
data from a jurisdiction to that state’s/province’s border, but you run into many cases where the size of
the fuzzed and clipped shape would become inadequate to do what the fuzzing intended: protect the
location of the actual EO.) NatureServe Explorer and the screening tool are generally refreshed on the
same database “snapshot” schedule, but data slightly out of sync could be another reason for disparities
between the screening tool and other analyses/products, as could decisions about what data are/are
not included (e.g., whether taxonomic “non-standards” are or are not included in a particular analysis).
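
The “spill over” effect described above can be illustrated with a toy example. This is a simplified sketch, not how the tool stores or queries its spatial data: footprints and query areas are reduced to axis-aligned rectangles, and the `Box` class, the `query` function, and all coordinates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a footprint or query area."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other):
        # True when the two rectangles share some interior area.
        return (self.xmin < other.xmax and other.xmin < self.xmax
                and self.ymin < other.ymax and other.ymin < self.ymax)


def query(area, footprints):
    """Return the names of all footprints that the query area hits."""
    return [name for name, box in footprints if box.intersects(area)]


# Hypothetical layout: the state line runs along x = 10. The county being
# queried lies west of the line; the EO lies east of it, in the adjacent state.
county = Box(8, 0, 10, 2)
actual_location = Box(10.5, 0.5, 11, 1)   # precise (unfuzzed) footprint
fuzzed_footprint = Box(9.5, 0, 12, 2)     # enlarged footprint crosses the line

unfuzzed_result = query(county, [("adjacent-state EO", actual_location)])
fuzzed_result = query(county, [("adjacent-state EO", fuzzed_footprint)])
```

Here `unfuzzed_result` is empty while `fuzzed_result` contains the adjacent-state EO, which is the kind of disparity a user might see between a product based on unfuzzed data and one based on the fuzzed dataset.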

A lot of programs (like mine) already have their own online screening/web services so why is an MJD
tool important?
A lot of programs do but a lot don’t. So this screening tool can fill in the gaps and help connect
programs that don’t have this kind of resource to clients who need these data. It can thus benefit the
network as a whole. And even if you have your own online tools, we think it is possible this tool may act
as a portal for some entirely new, federal-level clients that may not otherwise have accessed your data, or
perhaps wouldn’t have incorporated them into as many of their applications. The idea behind this, and
the decisions made about the scale of the data exposed, was to act as a “first stop” for data, maximizing
the benefits to programs that don’t have an online resource while at the same time not eclipsing
anyone’s locally maintained tools, and certainly not their local expertise. Basically, we think having
another portal to these data, if it exposes data at the right scale, can be a positive for all concerned. The
more ways to access the data (again, at the right scales), the more we can achieve avoidance of these
