Grand Challenges in Global
The stimulus from Paul Mather
• A man called Hilbert wrote a seminal paper in 1900
that contained a list of problems that had to be
overcome if maths was to develop.
• This provided a research focus for mathematicians
around the world.
• Given the range of uses of RS data and the
inadequacies of many of the techniques used to
extract information from that data I suggest that the
RS community …needs a remote sensing Hilbert to
write a paper that focuses on land cover extraction
(from a range of data of different scales and
coverages, and the use to which this remotely-sensed
information is put).
• To put it bluntly, would you be willing to write such a
paper for PIPG (of which I'm an editor)?
Examples of David Hilbert’s 23 problems
• The continuum hypothesis (that is,
there is no set whose size is
strictly between that of the integers
and that of the real numbers)
• The Riemann hypothesis (the real
part of any non-trivial zero of the
Riemann zeta function is ½) and
Goldbach's conjecture (every even
number greater than 2 can be
written as the sum of two primes)
• Solve all 7th-degree equations
using functions of two parameters.
Some hard nuts to crack
• Making progress, but:
• Validation is grossly unsatisfactory.
• Classification issues.
• Separating emissivity and temperature.
• Failure to repudiate nonsense
• Data Policy
• Research to operations
GlobCover (300m product)
Landsat-5: Atmospheric Correction (Masek et al.)
[Figure: 1990s Landsat-5 mosaic of the BOREAS study region; current TM mosaic, bands 3, 2, 1, values stretched from 0-1200 to 0-512; 500 m or 1 km.]
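The "(0-1200)->(0,512)" annotation on the mosaic reads like a linear display stretch. A minimal sketch, assuming a plain linear mapping with clipping (the actual display processing behind the figure is not documented here, and `linear_stretch` is a hypothetical helper):

```python
def linear_stretch(value, in_min=0.0, in_max=1200.0, out_min=0.0, out_max=512.0):
    """Map a band value linearly from [in_min, in_max] onto [out_min, out_max].

    Values outside the input range are clipped first, so saturated pixels
    end up at the ends of the output range.
    """
    clipped = min(max(value, in_min), in_max)
    # Multiply before dividing so that round inputs map to exact outputs
    return out_min + (clipped - in_min) * (out_max - out_min) / (in_max - in_min)


band_values = [0, 300, 600, 1200, 1500]
print([linear_stretch(v) for v in band_values])
```

Values above 1200 are clipped to the top of the display range, which is why a stretch like this trades detail in bright targets for contrast in dark ones.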
LAI distribution around August 12, 2000: MODIS
product (A) and processed product (B)
Fang, H., S. Liang, J. Townshend, R. Dickinson (2008). Spatially and temporally
continuous LAI data sets based on an integrated filtering method: Examples from North
America. Remote Sensing of Environment, 112: 75-93.
Monitoring Vegetation Fires in Amazonia
Schroeder et al
Optimizing the combined use of MODIS
and GOES fire detection data for
1. Schroeder, W., Prins, E., Giglio, L., Csiszar, I., Schmidt,
C., Morisette, J., and D. Morton (2008). Validation of
GOES and MODIS active fire detection products using
ASTER and ETM+ data. Remote Sensing of Environment,
112 (5), 2711-2726, doi:10.1016/j.rse.2008.01.005.
2. Schroeder, W., Csiszar, I., and Morisette, J. (2008).
Quantifying the impact of cloud obscuration on remote
sensing of active fires in the Brazilian Amazon. Remote
Sensing of Environment, 112, 456-470.
3. Schroeder, W., Morisette, J. T., Csiszar, I., Giglio, L.,
Morton, D., and Justice, C. (2005). Characterizing
vegetation fire dynamics in Brazil through multisatellite
data: Common trends and practical issues. Earth Interactions, 9, Paper 13.
[Figure: Integrated fire product for Brazilian Amazonia using 2005 MODIS and GOES data, showing average number of detection days per year.]
4. Morisette, J.T., Giglio, L., Csiszar, I., Setzer, A.,
Schroeder, W., Morton, D., and Justice, C. (2005),
Validation of MODIS active fire detection products derived
from two algorithms. Earth Interactions, 9, Paper 9.
Making data available through the web in standard formats
makes an enormous difference
From: Chris M Mayfield,
NORTHCOM COP/GIS Manager
Wildfires in California
[Figure: MODIS active fire detections superimposed with USFS park boundaries, hydrology and roads; the user can query fire detection attributes.]
“Long time NASA MODIS users, we were unaware of the FIRMS resource until that mid morning, but now, I can assure you that FIRMS is very much a part of the NORTHCOM Team in protecting … Again, our many thanks and a very big BRAVO ZULU to all of you on the FIRMS Team.”
Davies et al. UMd
SRTM/DEM and PRISM/DSM (1/2)
Remote Sensing Technology Center of Japan: SRTM DEM
ALOS PRISM DSM
What is truth?
Operational land cover validation framework
[Figure: validation stages (primary validation, comparative validation, updated validation/change, validation of new products) of increasing degree of usability and flexibility, built on interpretation of a design-based sample of reference sites (regional networks) and a statistically robust, consistent, harmonized, updated, and accessible reference database, underpinned by international consensus on technical issues.]
Strahler et al., 2006
Validation is really hard.
• Scale matters a lot
• Making ground measurements and relating them to
even 30m or 250m pixels is hard work and
• With too much inherent spatial variability relative to
pixel size and locational rms errors, you never know
where your ground observations are in relation to the pixel.
– Some areas cannot be validated.
• Not to mention MTF/PSF.
• Timing (or lack of it) is usually also an issue.
• In rugged terrain we are usually screwed.
• Validation of change detection is really, really hard.
• We have failed to make the case for validation, so not enough
funds are available!
• Few funds means that validation of all products is inadequate.
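The point about locational error can be made quantitative. A minimal sketch, assuming the ground observation is taken at a pixel centre and that geolocation errors are independent and Gaussian in x and y with a per-axis standard deviation `sigma_m` (a simplification: real rms errors may be quoted as radial totals, and off-centre plots fare worse):

```python
import math


def prob_in_same_pixel(pixel_size_m, sigma_m):
    """Probability that a ground point at a pixel centre truly lies in that pixel.

    Assumes independent Gaussian location errors with per-axis standard
    deviation sigma_m, so each axis stays within +/- half a pixel with
    probability erf(half / (sigma * sqrt(2))).
    """
    half = pixel_size_m / 2.0
    p_axis = math.erf(half / (sigma_m * math.sqrt(2.0)))
    return p_axis ** 2


for pixel in (30.0, 250.0):
    for sigma in (15.0, 50.0):
        print(f"pixel {pixel:>5.0f} m, sigma {sigma:>4.0f} m: "
              f"P(same pixel) = {prob_in_same_pixel(pixel, sigma):.2f}")
```

With a 30 m pixel and a per-axis error of half a pixel, the chance that the field plot really sits in the pixel it is compared with is already below one half.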
• Stage 1 Validation – Product accuracy has been estimated using a small
number of independent measurements from selected locations and time periods
• Stage 2 Validation – Product accuracy has been assessed by a number of
independent measurements, at a number of locations or times
representative of the range of conditions portrayed by the product e.g. EOS
Land Validation Core Sites, Fluxnet sites, Aeronet sites.
• Stage 3 Validation - Product accuracy has been assessed by independent
measurements in a systematic and statistically robust way representing
global conditions e.g. IGBP DISCover Project – suggest that this be
• For any product can we truthfully give the errors in space and
time to our own satisfaction?
• Sometimes there are no funds and no validation.
Does validation allow us to assess value?
“The widely used leaf area products derived from satellite-observed surface
reflectances contain substantial erratic fluctuations in time due to inadequate
atmospheric corrections and observational and retrieval uncertainties.
These fluctuations are inconsistent with the seasonal dynamics of leaf area,
known to be gradual.
Use in process-based terrestrial carbon models corrupts model behavior,
making diagnosis of model performance difficult.
We propose a data assimilation approach
Combines the satellite observations of Moderate Resolution Imaging
Spectroradiometer (MODIS) albedo with a dynamical leaf model.
Its novelty is that the seasonal cycle of the directly retrieved leaf areas is smooth
and consistent with both observations and current understandings of processes
controlling leaf area dynamics.”
Liu et al 2008
The point is that any sort of generic validation might not identify this problem.
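Liu et al.'s point can be illustrated with synthetic data: a check on mean values alone does not see erratic flicker, whereas a simple temporal roughness measure does. The LAI curve, the noise level and the `roughness` metric below are hypothetical choices for illustration, not the method of Liu et al. (2008):

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def roughness(series):
    """Mean absolute second difference; close to zero for a gradual seasonal cycle."""
    second = [series[i - 1] - 2 * series[i] + series[i + 1] for i in range(1, len(series) - 1)]
    return mean([abs(v) for v in second])


days = range(1, 366, 8)  # 8-day composites over one year (46 dates)
# Hypothetical "true" LAI with a smooth seasonal cycle
smooth = [3.0 + 2.0 * math.sin(2 * math.pi * (d - 105) / 365) for d in days]
rng = random.Random(42)
# Same cycle plus erratic retrieval fluctuations (synthetic)
erratic = [v + rng.uniform(-0.8, 0.8) for v in smooth]

print(f"mean LAI:  smooth {mean(smooth):.2f}, erratic {mean(erratic):.2f}")
print(f"roughness: smooth {roughness(smooth):.3f}, erratic {roughness(erratic):.3f}")
```

Both series have nearly the same mean, so a validation based on averages would pass the erratic one, while its roughness is an order of magnitude higher.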
• Classification often does not work well.
– Many reasons.
– Some arise because we still don’t know
how to classify
• Robustness to error in training data.
• Class proportions
Dealing with training site errors
• Training sets always contain errors
• Can we overcome this problem in classification?
– Test the classifiers with varying amounts of errors
introduced into the training set
– Support Vector Machine (SVM) and Kernel
Perceptron (KP) outperform Maximum Likelihood,
Decision Tree, and ARTMAP Neural Network
– SVM can tolerate errors in as much as 30% of the training data
• The soft-boundary design of modern SVM
allows a proportion of errors to exist in the training data
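The soft-boundary behaviour can be sketched with a linear soft-margin SVM trained by the Pegasos sub-gradient method on synthetic two-class data with a share of labels deliberately flipped. Everything here (class centres, `lam`, step count, helper names) is a hypothetical setup for illustration, not the experiment behind the figures:

```python
import random


def make_points(n, rng):
    """Two hypothetical spectral classes: +1 around (2, 2), -1 around (-2, -2)."""
    data = []
    for _ in range(n):
        label = rng.choice((1, -1))
        # Last feature is a constant 1 acting as the bias term
        x = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0), 1.0)
        data.append((x, label))
    return data


def corrupt(data, fraction, rng):
    """Flip the labels of a random `fraction` of the training samples."""
    flipped = list(data)
    for i in rng.sample(range(len(data)), int(fraction * len(data))):
        x, y = flipped[i]
        flipped[i] = (x, -y)
    return flipped


def train_pegasos(data, lam=0.01, steps=20000, seed=0):
    """Linear soft-margin SVM (hinge loss + L2 penalty) via Pegasos sub-gradient steps."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for t in range(1, steps + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularization step
        if margin < 1.0:  # hinge loss active: inside the margin or misclassified
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def accuracy(w, data):
    correct = sum(1 for x, y in data if y * sum(wi * xi for wi, xi in zip(w, x)) > 0)
    return correct / len(data)


rng = random.Random(1)
train = make_points(500, rng)
test = make_points(2000, rng)  # clean labels for evaluation
for noise in (0.0, 0.1, 0.2, 0.3):
    w = train_pegasos(corrupt(train, noise, rng))
    print(f"{noise:.0%} label noise -> clean test accuracy {accuracy(w, test):.3f}")
```

Because the hinge loss grows only linearly with how far a mislabeled point sits on the wrong side, flipped labels pull the boundary less than they would under a squared-error fit.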
SVM Robust against subjective errors
[Figure panels: change detection results of DT and SVM using 20% and 10% corrupted training data; condition of the experiment site.]
[Figure: Error Resistance of Major Machine Learning Algorithms. Total accuracy of MLC, ARTMAP NN, DT, SVM and KP versus percentage of error in training data (0 to 50%).]
Early Work on Training Design
• Class proportions impact on a priori probabilities
– Identified by Strahler in 1980
– Part of the Maximum Likelihood Classifier (MLC) framework
– Usage: to multiply with the probability of each pixel
– Contribution: Introduced the concept of “Class Prior”
– Issue: The concept was not used in training design
• Class proportions in the Population
– Identified by Hagner in 2001 and 2005
– Estimated using MLC
– Usage: to adjust the proportions in the training set for iterative MLC
– Contribution: Adaptive training design using “Class Prior”
– Issue: It is not MLC that needs training set design. MLC actually is
largely invariant to training sets of different proportions, as is shown
in Hagner’s own results.
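Strahler's class-prior idea amounts to Bayes' rule: the class-conditional likelihood of each pixel is multiplied by the prior, and the pixel goes to the class with the largest product. A minimal one-band sketch, assuming Gaussian class statistics invented for illustration (`mlc_posteriors` is a hypothetical helper):

```python
import math


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def mlc_posteriors(x, classes, priors):
    """Posterior P(class | x) for a one-band Gaussian maximum likelihood classifier.

    classes: {name: (mean, std)}; priors: {name: prior probability}.
    """
    scores = {name: priors[name] * gaussian_pdf(x, m, s) for name, (m, s) in classes.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical one-band reflectance statistics for two classes
classes = {"forest": (0.05, 0.02), "cropland": (0.10, 0.02)}
pixel = 0.08
equal = mlc_posteriors(pixel, classes, {"forest": 0.5, "cropland": 0.5})
forest_heavy = mlc_posteriors(pixel, classes, {"forest": 0.9, "cropland": 0.1})
print(max(equal, key=equal.get), max(forest_heavy, key=forest_heavy.get))  # cropland forest
```

The same pixel changes class when only the prior changes, which is why the priors matter even though MLC fits each class's statistics separately.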
The Over/Under-Estimation Problem (Song et al)
The Optimal Configuration of Training Data for SVM-based Forest Change Detection
[Figure: user's and producer's accuracy of forest change versus percentage of forest change pixels in the training data (0 to 100%).]
Modern algorithms such as SVM are very susceptible to this problem,
but MLC is largely unaffected.
The Over/Under-Estimation Problem
The Optimal Configuration of Training Data for MLC-based Forest Change Detection
[Figure: user's and producer's accuracy of forest change versus percentage of forest change pixels in the training data (0 to 100%).]
• Many methods need the class prior of the population
to resample the training dataset
• The class prior of the population might be estimated
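One way the population class prior might be estimated is sketched below. The source cites Hagner's iterative MLC; this sketch swaps in a generic expectation-maximization (EM) re-estimation of class proportions from unlabelled pixels, which is not claimed to be Hagner's exact procedure. The class statistics, the simulated 80/20 population and the helper names are all hypothetical:

```python
import math
import random

CLASSES = {"forest": (0.05, 0.02), "cropland": (0.10, 0.02)}  # hypothetical band statistics


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def estimate_priors(pixels, classes, iterations=50):
    """EM re-estimation of population class proportions from unlabelled pixels.

    Start from equal priors; each iteration computes posteriors under the
    current priors and sets the new priors to the mean posterior per class.
    """
    priors = {name: 1.0 / len(classes) for name in classes}
    for _ in range(iterations):
        totals = {name: 0.0 for name in classes}
        for x in pixels:
            scores = {n: priors[n] * gaussian_pdf(x, m, s) for n, (m, s) in classes.items()}
            norm = sum(scores.values())
            for n in classes:
                totals[n] += scores[n] / norm
        priors = {n: totals[n] / len(pixels) for n in classes}
    return priors


def training_counts(priors, total_samples):
    """Allocate a training budget in proportion to the estimated class priors."""
    return {n: round(p * total_samples) for n, p in priors.items()}


rng = random.Random(3)
# Synthetic scene: 80% forest, 20% cropland (unknown to the estimator)
population = [rng.gauss(*CLASSES["forest"]) if rng.random() < 0.8 else rng.gauss(*CLASSES["cropland"])
              for _ in range(3000)]
priors = estimate_priors(population, CLASSES)
print({n: round(p, 2) for n, p in priors.items()})
print(training_counts(priors, 1000))
```

The estimated proportions can then be used to resample the training set, which is the step the bullets above say many methods depend on.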
Almost impossible to separate surface emissivity
and temperature accurately (Liang)
Surface-leaving radiance is the sum of the surface-emitted radiance and the
reflected downward atmospheric radiation:
$$L(\lambda) = \varepsilon B(T) + (1 - \varepsilon) F_d / \pi$$
where $\varepsilon$ is surface emissivity, $B(\cdot)$ is the Planck function, and $F_d$ is the downward flux.
For most surfaces, emissivity is close to 1, so the reflected radiance is quite small. Thus
$$L(\lambda) \approx \varepsilon B(T)$$
It is almost impossible to separate two multiplied components, so we cannot determine
emissivity $\varepsilon$ and temperature $T$ accurately.
The alternative solution is to estimate upwelling radiation from thermal IR observations
for initialization/calibration/validation of land surface models.
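The ambiguity in $L(\lambda) \approx \varepsilon B(T)$ can be shown numerically: for one thermal band, every emissivity guess yields a different temperature that reproduces the same radiance exactly. A minimal sketch, assuming a single band at 10.5 µm, a "true" surface with $\varepsilon = 0.98$ and $T = 300$ K, and no reflected term (all illustrative choices):

```python
import math

H = 6.62607015e-34  # Planck constant (J s)
C = 2.99792458e8    # speed of light (m/s)
K = 1.380649e-23    # Boltzmann constant (J/K)


def planck(wavelength_m, temp_k):
    """Blackbody spectral radiance B(lambda, T) in W m^-2 sr^-1 m^-1."""
    return (2.0 * H * C ** 2 / wavelength_m ** 5) / math.expm1(H * C / (wavelength_m * K * temp_k))


def brightness_temperature(wavelength_m, radiance):
    """Invert the Planck function: the T for which B(lambda, T) equals `radiance`."""
    return (H * C / (wavelength_m * K)) / math.log1p(2.0 * H * C ** 2 / (wavelength_m ** 5 * radiance))


wavelength = 10.5e-6                          # a single thermal-infrared band (assumed)
observed = 0.98 * planck(wavelength, 300.0)   # L ~ eps * B(T) with eps = 0.98, T = 300 K
for eps in (0.99, 0.98, 0.96, 0.94):
    # Each (eps, T) pair below reproduces `observed` exactly
    t = brightness_temperature(wavelength, observed / eps)
    print(f"emissivity {eps:.2f} -> surface temperature {t:.2f} K")
```

At these settings each 0.01 of emissivity error moves the retrieved temperature by roughly two thirds of a kelvin, and a single measurement gives no way to choose between the pairs.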
Some other issues
• The history of remote sensing information extraction
is largely the history of over-fitting.
– Those working on identification of spam have a one-shot
externally organized test.
• Hyper-spectral RS.
– Something is almost bound to be related to something.
– How do we begin to move towards standard products?
– Where is the underlying theory to determine them?
• Disparities in resolution of reanalysis products and
typical land cover variability.
• Difficulty of getting global biomass at time and space
resolutions appropriate for REDD and conservation.
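The over-fitting and "something is bound to be related to something" points can be demonstrated with pure noise: with many bands and few samples, some band always correlates strongly with the target, and an independent one-shot test exposes it. The band count and sample sizes below are hypothetical:

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
N_BANDS, N_TRAIN, N_TEST = 200, 20, 1000  # hypothetical hyperspectral setting


def draw(n):
    """Pure noise: band values and the target are independent by construction."""
    bands = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(N_BANDS)]
    target = [rng.gauss(0, 1) for _ in range(n)]
    return bands, target


train_bands, train_target = draw(N_TRAIN)
train_r = [abs(pearson(b, train_target)) for b in train_bands]
best = max(range(N_BANDS), key=lambda i: train_r[i])  # band "selected" on training data

test_bands, test_target = draw(N_TEST)  # externally held-out data
test_r = abs(pearson(test_bands[best], test_target))
print(f"best band {best}: |r| = {train_r[best]:.2f} on training, {test_r:.2f} on independent test")
```

The selected band looks informative on the training samples and collapses to near-zero correlation on the independent test, which is what a spam-style one-shot external evaluation would catch.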
Standing up for what we believe in.
• 159 scientific papers have been found to base their
conclusions heavily on FRA statistics (Grainger, 2008)
• We know FRA is garbage for land cover change so
why don’t we say so? This should not be a challenge.
Land cover and land use change.
• FRA Problems are twofold
• Having to deal with individual countries
• Confusion between land cover and land use
– “Where part of a forest is cut down but replanted
(reforestation), or where the forest grows back
on its own within a relatively short period (natural
regeneration), there is no change in forest area.”
– But for those concerned with land cover these
differences are real
The curious case of Canada in FRA 2005
• Forest Area 1990 310,134,000 ha.*
• Forest Area 2000 310,134,000 ha.*
• Forest Area 2005 310,134,000 ha.
“Canada reports only productive forest land;
unproductive forests are classified as “other wooded
land” even though many of them meet the FAO
definition of forest land. This results in underreporting
of more than 170 million hectares, or 40 percent of
Canadian forest land.” (Matthews 2000).
Note in FRA 2000 Canada reported only 244,571,000
hectares for both 1990 and 2000!
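The Canadian figures above can be checked directly. Between the two assessments, the reported forest area for the same reference years changed by tens of millions of hectares, while each report shows zero change over time:

```python
fra2000_canada = 244_571_000  # ha, reported for both 1990 and 2000 in FRA 2000
fra2005_canada = 310_134_000  # ha, reported for 1990, 2000 and 2005 in FRA 2005

revision = fra2005_canada - fra2000_canada
print(f"Revision between reports: {revision:,} ha "
      f"({revision / fra2000_canada:.1%} of the FRA 2000 figure)")

# Matthews (2000): more than 170 million ha under-reported, said to be 40% of forest land
implied_total = 170_000_000 * 100 // 40
print(f"Implied total Canadian forest land: about {implied_total:,} ha")
```

This prints a revision of 65,563,000 ha (26.8%) and an implied total of about 425,000,000 ha, so even the revised FRA 2005 figure sits well below the total implied by Matthews.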
Issues with FRA
• Assuming we are interested in land cover
and not land use
– Global rates are wrong (much too low)
– Changes in rates (by decade and half-decade)
are wrong (tropical deforestation rates from the 80s
to the 90s supposedly declined when they were actually increasing).
– Inter-continental variations are seriously
mistaken (South America vs Africa)
– Considerable inconsistencies between countries.
The importance of formats and
How to ensure data are used
On December 8, 2008, the USGS made the
entire 36-year Landsat archive available to
anyone via the Internet at no cost.
Calibrated across missions and instruments
Questions for space agencies
• Why don’t you always provide the following:
– User friendly formats allowing immediate ingestion
– Standardized meta-data.
– Rapid response systems.
– Ortho-rectified data for all resolutions 500m and
– Atmospherically corrected data
– Up to date Calibration data
– Validation data for all products
Six Problems with RS data policies
1. If people want to use remotely sensed data then they should pay.
– They already have, as citizens. Plus the driving force for most environmental remote sensing data is scientific or policy driven.
2. Making data available has an incremental cost.
– Resources raised are a tiny fraction of the total cost of the system.
3. There is a commercial future for all environmental remote sensing data.
– No evidence for mid and coarser resolution data.
4. Restrictive Data Policy is OK because remote sensing data is made available free to scientists.
– Why should scientists have preferential access compared with those in developing countries?
5. Principal Investigators need an extended period of exclusive use.
– Only to make sure the products are characterized so that “health warnings” can be attached.
6. Tell us why you want to use the data before we will let you have it.
– Otherwise known as the “Papa ESA knows best” policy.
GEO Halls of Fame and
Shame for Agencies
HALL OF FAME
• Free and open data policy
• Data easily accessible on line
• Community specified formats
• Orthorectified
• Validated data sets
HALL OF SHAME
• Restrictive data policy with charging
• Not on-line: difficult to order
• Non-standard agency specified formats
• Not orthorectified
• Unvalidated data sets
“Valley of death”.
FROM RESEARCH TO
OPERATIONS IN WEATHER
CROSSING THE VALLEY OF DEATH
Board on Atmospheric
Sciences and Climate
The term “Crossing the Valley of Death” is sometimes
used in industry to describe a fundamental challenge
for research and development (R&D) programs. For
technology investments, the transitions from
development to implementation are frequently difficult,
and, if done improperly, these transitions often result
in “skeletons in Death Valley.”
Successful transitions from R&D
to operational implementation
• Understanding of the importance (and risks) of the transition,
• Development and maintenance of appropriate transition
• Adequate resource provision,
• Continuous feedback (in both directions) between the R&D
and operational activities.
“In the case of the atmospheric and climate sciences,
inadequacies in transition planning and resource commitment
can seriously inhibit the implementation of good research
leading to useful societal benefits.” NRC.
The Landsat-to-LDCM and MODIS-to-VIIRS transitions clearly demonstrate
the enormous difficulties that can occur.
• A near-term major challenge for the international
community will be to develop the best available,
validated Fire Disturbance ECVs.
• The Grand Challenge will be to secure the satellite
fire observing system that is needed consisting of
– 1) operational polar orbiters with appropriate saturation for
– 2) operational global geostationary network with 500m
resolution and 30-minute repeat,
– 3) operational global Landsat class observations with 3-5
Who has the responsibility for
doing things operationally?
• Broad consensus on methods to achieve operational
– But we must adapt to rapidly changing technologies and data
availability (Google and radar)
• Need to ensure commitment to:
– Supply of remote sensing data
– Generation of terrestrial products
– Operational validation process
– More broadly who will commit to generation of operational products
such as ECVs?
• Which international body will oversee the work?
– Who has both the formal responsibility and scientific and technical
– Cannot simply be left to agencies. Agencies are starting to lay claim to
certain ECVs but with little oversight.
• Urgent need to establish roles and responsibilities.
GEO and CEOS
• Internationally highly dependent on them.
• But both “best efforts” organizations.
• Much talk about cooperation but concepts
such as virtual constellations will be very
– Agreements on data policy
– Agreements on formats and pre-processing
– Common portals that work.
• Perhaps the greatest challenge is to get
these organizations acting in an integrated
coordinated fashion responding to user
Time Series for Amazon Forests
[Figure: time series for 2000 to 2006, including solar radiation and a quantity in mm/mo; dry seasons are shown as grey shaded bars.]
The phase shift between LAI and solar radiation suggests that the
rainforests have adapted to anticipate increased sunlight.
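A phase shift like the one described can be estimated by finding the lag that maximizes the correlation between the two series. A minimal sketch on synthetic monthly cosines (the peak months are invented, not Amazon observations; `best_lag` is a hypothetical helper); a positive lag means the LAI series leads solar radiation:

```python
import math


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping part."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def best_lag(x, y, max_lag):
    """Lag (in time steps) of y relative to x that maximizes their correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: lagged_correlation(x, y, k))


months = range(84)  # 2000 to 2006, monthly (synthetic)
solar = [math.cos(2 * math.pi * (m - 8) / 12) for m in months]  # peaks in month 8 (synthetic)
lai = [math.cos(2 * math.pi * (m - 6) / 12) for m in months]    # peaks two months earlier (synthetic)
print(best_lag(lai, solar, max_lag=5))  # 2
```

With real data the seasonal signal is noisier and the search window should stay well under half a year, otherwise a lag of one period minus the true shift can also score highly.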
Transitioning to operational
• Get the data policy right
• Standardization of formats
• Atmospheric correction
• Use of improved algorithms
• Performance of remote sensing studies in the
real world largely relies on two factors:
– 1. How well algorithms can handle unknown errors
– 2. How to adaptively design the training set so that
we can balance the overestimation/underestimation
Global Agricultural Fires
[Figure: global agricultural fires by month, January to December.]
Korontzi et al. 2007