                     Recommendations on archiving strategies and
                       general access methods for radar data.

                        EUMETNET / OPERA programme project 4b


                     Dr. Thomas Leitner and Dipl.-Ing. Helmut Paulitsch
                     Institute of Communications and Wave Propagation
                            University of Technology Graz / Austria


1.    Introduction

Most of the national archiving strategies currently in use are incompatible with each other,
and accessing the archived data of another NMS is highly complicated, not only because of
administrative rules but mainly because of the differing national standards. A sub-project of
the EUMETNET project OPERA has already collected information about the archiving
strategies currently used by the various OPERA members. The result of this sub-project is a
report (see [1]) which clearly confirms that the NMSs use quite different archiving methods
and standards. For the MAP project, for example, much effort had to be put into format
conversions of the archived radar data.

But why is it desirable to have a common archiving standard anyway? For example, COST-717
has stated in its memorandum of understanding: “...establish a European standard for
archiving of radar data including a quality index and supporting observations for use in model
validation studies. Collate a list of interesting cases. Establish an internet based European
radar archive for modelling studies through OPERA, with a diary of key meteorological
events.”

In other words: for the research community on the one hand, and for operational use on the
other, easy access to the archived radar data of different NMSs would help users to obtain
better results more quickly.

2.    Main goals

The main goals for the archiving strategy can be summarised as follows:

1.    Accept radar images in BUFR format (preferred) but also in other, proprietary formats.
2.    Store all radar images in a database
3.    Easy transfer of radar data to the database
4.    Maintain meta-data (data describing the contents of the radar images)
5.    Platform independent: The primary target platform is Unix (AIX, Linux, Solaris, Tru64,
      HP-UX etc.), but MS-Windows is targeted as well.
6.    Provide easy manual as well as automatic access via the Internet (WWW)
7.    Provide automatic means to export parts of the database to CD-ROM or tape.
8.    Integrate CD-ROM writing software: When the harddisk gets full, the oldest archive
      files can be exported to a CD-ROM with a simple mouse click.
9.    Integrate tape (DAT or DLT) writing software: When the harddisk gets full, the oldest
      archive files can be exported to DAT or DLT tape with a simple mouse click.
10.   Off-line data which is located on external storage media is erased from the harddisk,
      but the meta-data and the index files are kept on the harddisk for access via the WWW.
11.   Be prepared for easily extending and modifying the database to suit other external
      storage facilities.

12.   Provide automatic means to notify an operator if someone requests an image which is
      not available on-line.
13.   For BUFR images, provide a simple image preview in the WEB browser to see the
      contents of selected images.
14.   Follow the KISS (Keep It Small and Simple) principle.

The following chapters explain the suggested architecture and how these goals can be met.


3.    Architecture

The basic architecture of the archiving software is depicted in Figure 1:



[Block diagram. Components shown: Input Module (Directory Reader); On-line Database (on
harddisk); Off-line Database (on CD-ROM, DAT, DLT etc.); Management Modules and
Management Station; Output Modules (WEB Interface, Automatic Export, any other output
modules); Clients.]

                            Figure 1 : Archiving System Architecture

The database basically consists of two main parts: The on-line database and the off-line
database. The on-line database contains all datafiles and metadata as well as standardised
means to access them. The off-line database is a set of storage media (CD-ROMs, DAT or
DLT tapes) containing all data which cannot be held on-line due to the limited on-line storage
capacity. Note, however, that a list of all off-line data needs to be available on-line in order to
make off-line data accessible to the users.

Data input is managed by a set of input modules which work together to allow operational
systems to put their data into the database easily. Management modules provide a web-based
interface to maintenance tasks for the database. For example, copying data to and from the
off-line database and granting users access to the database are managed from here.

Finally, assorted output modules allow the clients to access the database. The main access
method is via a web browser. The same protocol (HTTP), however, can also be used to
retrieve data from the database automatically.


The suggested implementation and functions of all these modules are described in the
following chapters.

3.1 Database structure

The data items which need to be saved in the database are mainly

•   Radar images
•   Metadata about the images

Each radar image has a product name (CAPPI etc.), a radar site identifier, an originating
country identifier and a time-stamp. Standards have already been established for the
radar-site identifier, the product name and the originating country. For example, the German
radar composite has a product name/radar-site identifier of PAAM21, and the originating
country is identified as EDZW. This follows the WMO bulletin format for BUFR data (see [2]).
It constitutes a strictly hierarchical description of a radar image, which makes it ideal for
organising the data in a file-system hierarchy. In other words, it is suggested to store the
single radar images in the normal file-system and not to use a commercial or open-source
database product. Radar images could be stored in the following file-system hierarchy:

COUNTRY
     PRODUCT
          SITE
               YEAR
                    MONTH
                         DAY
                              RADAR-IMAGE-FILE

Of course if the database is installed on a national level, the hierarchy can start at PRODUCT
and skip the COUNTRY level.

RADAR-IMAGE-FILE is the name of the BUFR or RAW radar image file with the following file
name format:

                       PAppss_country.YYYYMMDDhhmm

Where PA is a common prefix for all BUFR files (raw files are supposed to have the prefix
“RA”), ‘pp’ is the product identifier, for example AM etc., ‘ss’ is the radar site identifier and
‘country’ is the country identifier. Please refer to [1] for more information about these
identifiers.

For example to access a German composite image recorded on 15.9.2002 at 10:20, the
following directory path can be used:

EDZW/AM/21/2002/09/15/PAAM21_EDZW.200209151020
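
The mapping from file name to directory path can be sketched in a few lines of Python. This
is only an illustration of the naming scheme described above, not part of the suggested
software. The function name archive_path is hypothetical, and the pattern assumes a
four-letter country identifier and a two-character site identifier, as in the German example.

```python
import re
from pathlib import PurePosixPath

# Pattern for names such as PAAM21_EDZW.200209151020:
# prefix (PA = BUFR, RA = raw), product, site, country, YYYYMMDDhhmm.
# Assumption: country is four letters, site is two letters or digits.
_NAME_RE = re.compile(
    r"^(?P<prefix>PA|RA)(?P<product>[A-Z]{2})(?P<site>[A-Z0-9]{2})"
    r"_(?P<country>[A-Z]{4})\.(?P<stamp>\d{12})$"
)


def archive_path(filename: str) -> PurePosixPath:
    """Return COUNTRY/PRODUCT/SITE/YEAR/MONTH/DAY/filename for an image name."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a valid radar image file name: {filename!r}")
    stamp = match["stamp"]
    year, month, day = stamp[0:4], stamp[4:6], stamp[6:8]
    return PurePosixPath(
        match["country"], match["product"], match["site"],
        year, month, day, filename,
    )


print(archive_path("PAAM21_EDZW.200209151020"))
# prints EDZW/AM/21/2002/09/15/PAAM21_EDZW.200209151020
```

For a national installation, the leading COUNTRY component would simply be dropped.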

Storing the radar images in this way has several advantages:

•   Cheap: No additional cost for a commercial database product is necessary.
•   No complex installation and maintenance: Even if an open-source database product like
    MySQL were used, users and system managers would still have to struggle with the
    installation, upgrade and maintenance of the database software. This is avoided here.
•   A hierarchical filesystem is readily available on every operating system regardless of
    whether it is Unix or Windows.


•   Radar images are not stored in complex database files with unknown structures, from
    which it is hard to retrieve the data if the database files become corrupted.
•   Modern filesystems are optimised for fast access, so traversing the filesystem hierarchy
    can be expected to be quite fast.
•   Access to the single radar images is possible via simple file open methods, so extending
    the database access methods is quite simple and can be done either with executable
    programs or with simple shell or Perl scripts.
•   When radar images are saved in BUFR code and meta-data is saved in ASCII, the
    complete database can be copied from one platform to another because both BUFR and
    ASCII are basically platform independent. This is not generally the case with
    integrated database products.
•   Network access to the database can still be provided by writing the necessary output
    modules.

Of course there are some disadvantages as well:

•   The fastest access to the files is only possible when all parameters (PRODUCT, SITE,
    DATE) are known. If any of these parameters is unknown, a filesystem search is
    required. However, the required search time can be limited by regularly creating index
    files and searching them with standard tools (grep, or fgrep for fixed strings). Using
    grep has the additional advantage that regular expressions can be used to formulate
    quite complex database queries.
•   There is no file locking: File access conflicts (an input module writes a new image to the
    database while an output module tries to read the same image) can be avoided by
    designing the input modules so that they make sure the datafile is completely written
    to the filesystem before updating the index. The usual way to achieve this is to copy
    the file to the target directory under a different (invisible) name and, after the file is
    completely copied, to rename it to its final name.
•   Advanced databases allow for triggers and stored procedures to perform actions as soon
    as there are any changes in the data structure. If any such means are necessary, they
    can be implemented by calling a common “update” script as soon as the database has
    changed. The update script could be modified to execute any “triggers” when the data
    changes.
•   There is no common access method (like SQL) to the data and basically any program on
    the computer could access the data in the filesystem. This could be avoided by using
    proper file protection and providing common access commands (programs, scripts) to the
    database.
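
The copy-then-rename technique mentioned in the list above might look as follows in Python.
This is a minimal sketch under stated assumptions: the function name store_atomically and the
".tmp" suffix of the hidden file are hypothetical, and the rename is only atomic when the
temporary file and the final file are on the same filesystem.

```python
import os
import shutil
from pathlib import Path


def store_atomically(source: Path, target: Path) -> None:
    """Copy source to target so that readers never see a half-written file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Hidden temporary name in the same directory, so that the final
    # rename stays within one filesystem.
    temp = target.with_name("." + target.name + ".tmp")
    shutil.copyfile(source, temp)
    # os.replace renames in a single step and overwrites an existing target.
    os.replace(temp, target)
```

An output module which ignores names starting with a dot will then only ever find complete
image files.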

All in all, using the file-system for storing the radar images and metadata has many
advantages and some disadvantages. However, for this particular purpose the advantages
outweigh the disadvantages by far.

Metadata (where it is needed at all, since BUFR files are basically self-descriptive) can be
stored in a similar fashion: In addition to the image files, one or more metadata files can
be saved. It is suggested to save a metadata file in each month’s directory. If it exists,
the metadata file is valid for all radar images within this month. If any images differ and
require different metadata, a metadata file can be saved along with the image. Consider this
example:

EDZW/AM/21/2002/09/meta.dat
EDZW/AM/21/2002/09/15/PAAM21_EDZW.200209151020
EDZW/AM/21/2002/09/15/PAAM21_EDZW.200209151030
EDZW/AM/21/2002/09/15/PAAM21_EDZW.200209151040
EDZW/AM/21/2002/09/15/PAAM21_EDZW.200209151040.meta



This means that the metadata in file meta.dat is valid for files
PAAM21_EDZW.200209151020 as well as for PAAM21_EDZW.200209151030. But file
PAAM21_EDZW.200209151040 has its own metadata file (with the extension .meta) which
overrides the month metadata file. Metadata files can also be put in the day directory in order
to override the month setting. So if a radar is re-configured on a certain day, this day and
all following days within this month need to have the new metadata file. The next month can
then have a per-month metadata file again.
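
The override order described above (per-image file, then day directory, then month directory)
can be expressed as a short lookup. The following Python sketch is an illustration only. The
function name find_metadata is hypothetical, and it assumes that a metadata file placed in a
day directory is also called meta.dat, which the text above does not specify.

```python
from pathlib import Path
from typing import Optional


def find_metadata(image: Path) -> Optional[Path]:
    """Return the metadata file which applies to an image, or None."""
    day_dir = image.parent
    month_dir = day_dir.parent
    candidates = [
        image.with_name(image.name + ".meta"),  # per-image override
        day_dir / "meta.dat",                   # per-day override (assumed name)
        month_dir / "meta.dat",                 # per-month default
    ]
    # The first existing candidate wins, so more specific files override
    # more general ones.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```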

Note that the data import modules take care of organizing the metadata files in this fashion.
The program which imports data only needs to send a radar image along with a metadata file
to the database. The importing modules compare the new metadata file with the current one
and save it in the fashion described above. When BUFR files are imported, no metadata is
required anyway because BUFR files are self-descriptive. Still it is possible to use metadata
in order to describe properties of the radar images which are not covered by the BUFR
standard.

The format of the data in the metadata files is arbitrary. It is suggested to use plain ASCII
files describing the contents of the radar images. No special or fixed format is used. The
database search facilities will just search through these text files and return any matches.
This way it is possible to store any descriptions in the metadata and there are no constraints
about the contents. When BUFR files are stored in the database, the BUFR description has a
fixed format and thus search results will be more reliable.
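
Because the metadata files are plain text, such a search facility needs little more than a
regular expression applied line by line. The following Python sketch shows one possible form.
The function name search_metadata is hypothetical, and mapping the matching metadata files
back to the images they describe is left out for brevity.

```python
import re
from pathlib import Path
from typing import List


def search_metadata(root: Path, pattern: str) -> List[Path]:
    """Return the metadata files under root with a line matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*")):
        # Only meta.dat files and per-image *.meta files are searched.
        is_meta = path.name == "meta.dat" or path.suffix == ".meta"
        if not path.is_file() or not is_meta:
            continue
        text = path.read_text(encoding="ascii", errors="replace")
        if any(regex.search(line) for line in text.splitlines()):
            hits.append(path)
    return hits
```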

The total storage capacity used by the on-line database is monitored at regular intervals.
As soon as the total amount of data exceeds a configurable limit, an e-mail is sent to an
administrator with a notice that data needs to be copied to the off-line storage.

Copying data to the off-line storage can be done by inserting new blank storage media
(CD-ROMs, DAT or DLT tapes) and selecting the “Swap out to off-line database” function in
the management web page. This function starts a script which collects the oldest on-line
files, updates the off-line index files and copies the files to the off-line storage. After the
files have been copied, a file and data verification run is executed which compares the files
on the media with the files on the harddisk. Only after this verification run has completed
successfully are the files erased from the on-line database.
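
Two steps of this procedure, selecting the oldest files and erasing them only after a
successful comparison, can be sketched in Python as follows. The function names oldest_images
and verify_and_erase are hypothetical. Writing to CD-ROM or tape and updating the off-line
index are not shown.

```python
import filecmp
from pathlib import Path
from typing import List


def oldest_images(root: Path, bytes_to_free: int) -> List[Path]:
    """Pick the oldest image files until at least bytes_to_free is covered."""
    # Image names end in .YYYYMMDDhhmm, so sorting by that 12-digit
    # extension sorts the files by time-stamp.
    images = [
        p for p in root.rglob("*")
        if p.is_file() and len(p.suffix) == 13 and p.suffix[1:].isdigit()
    ]
    images.sort(key=lambda p: p.suffix)
    selected, freed = [], 0
    for image in images:
        if freed >= bytes_to_free:
            break
        selected.append(image)
        freed += image.stat().st_size
    return selected


def verify_and_erase(online: Path, copy: Path) -> bool:
    """Erase the on-line file only if the off-line copy is byte-identical."""
    if not filecmp.cmp(online, copy, shallow=False):
        return False
    online.unlink()
    return True
```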

The reverse operation, copying data files from off-line storage to the on-line database, is
done in a similar way. The operator selects the required files in the web page and is notified
about which media to insert into which drive. After the media have been inserted, the files
are copied to the on-line database.

The archiving software will support writing CD-ROMs under Windows and Unix using the
normal ISO9660 CD-ROM format as well as writing to DAT or DLT drives in the standard
TAR format.

3.2 Input modules

Input modules allow data (radar images with or without a metadata file) to be imported into
the database. In the simplest case, an input module monitors a certain “transfer” directory.
As soon as a datafile with a certain name is found in the transfer directory, it is imported
into the database. As the transfer directory can be made available via the network (for
instance for FTP, RCP or NFS access), it is very easy to integrate it into operational radar
systems. The operational system just needs to be modified to copy each new complete radar
image to the transfer directory.

In order to maintain data integrity, the following way to copy files to the transfer directory is
suggested:
a.) Copy the datafile with the final filename to the transfer directory.
b.) Copy the metadata file (if any) with the final filename to the transfer directory.
c.) Create an empty file with the same name as the main datafile but with the extension “.ok”
    appended in the transfer directory. This file tells the import module that a new radar
    image is ready to be imported.

For example, in order to store the German composite of 2002.09.15, 10:20 in the database,
the following copy operations would be necessary (assuming that the transfer directory is
called /transfer):

$ cp PAAM21_EDZW.200209151020 /transfer
$ cp PAAM21_EDZW.200209151020.meta /transfer
$ touch /transfer/PAAM21_EDZW.200209151020.ok

The first two commands copy the radar image file and the metadata file to the transfer
directory, and the last command creates an empty file in the transfer directory in order to
tell the input module that there is a new file to be imported. Note that when a new import
data file is encountered and the respective directory hierarchy (in this case, directory
EDZW/AM/21/2002/09/15) does not exist yet, it is automatically created by the input
module.

The input module runs at regular intervals (say once per minute) and checks the transfer
directory. All files in this directory which follow the above naming convention are imported
into the database and then deleted from the transfer directory.
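
One pass of such a directory reader could be sketched in Python as shown below. This is an
illustration under simplifying assumptions, not the suggested implementation: the function
names destination and import_ready_files are hypothetical, file names are not validated, and
a metadata file is simply stored next to its image instead of being compared with the current
month or day metadata.

```python
import os
import shutil
from pathlib import Path
from typing import List


def destination(database: Path, name: str) -> Path:
    """Map PAppss_country.YYYYMMDDhhmm to its place in the hierarchy."""
    head, stamp = name.rsplit(".", 1)
    ident, country = head.split("_", 1)
    product, site = ident[2:4], ident[4:6]
    return (database / country / product / site
            / stamp[0:4] / stamp[4:6] / stamp[6:8] / name)


def import_ready_files(transfer: Path, database: Path) -> List[Path]:
    """Import every image which has an '.ok' marker; return the new paths."""
    imported = []
    for marker in sorted(transfer.glob("*.ok")):
        image = marker.with_suffix("")  # drop the ".ok" extension
        if not image.is_file():
            continue  # marker without a datafile: leave it alone
        target = destination(database, image.name)
        target.parent.mkdir(parents=True, exist_ok=True)
        meta = image.with_name(image.name + ".meta")
        if meta.is_file():
            shutil.move(str(meta), str(target) + ".meta")
        # Move under a hidden name first, then rename to the final name.
        hidden = target.with_name("." + target.name + ".tmp")
        shutil.move(str(image), str(hidden))
        os.replace(hidden, target)
        marker.unlink()
        imported.append(target)
    return imported
```

Files without an ".ok" marker stay in the transfer directory untouched, so an image which is
still being copied is never imported.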

The input module also runs a user configurable “update” script which can be used to perform
special actions every time a new file has been imported into the database.

It is also possible to implement input modules which directly accept files via a special
protocol over the network, but the method described above should be sufficient for most of
the cases. If there are any particular requirements for other input methods they can be
implemented easily by writing additional input modules.

3.3 Management modules

Management modules allow management tasks to be performed on the database. These tasks
can be performed either by executing commands in a shell window or through a convenient
WEB interface which basically provides the same functions. Additionally, people with Unix
experience can perform management tasks directly on the database directory structure.
For example, in order to back up the complete database, it is only necessary to copy the file
hierarchy to a tape, and in order to find the number of files or the total disk space required,
the appropriate Unix tools can be used on the directory hierarchy.

The most important management tasks can be summarised as follows:

•   Configuration: Configure e-mail addresses, max. on-line database size, base directory
    etc.
•   Swap out oldest files to off-line storage
•   Copy particular files from off-line storage to on-line storage
•   Get an overview and statistics about the number of archived files in on-line and off-line
    storage.
•   Get an overview about the current total database size as well as about the size of each
    different product.
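
The overview tasks in the list above reduce to a walk over the directory hierarchy. The
following Python sketch counts files and bytes per product. The function name
product_statistics is hypothetical, the COUNTRY/PRODUCT/... layout of chapter 3.1 is assumed,
and metadata files are counted together with the images.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Tuple


def product_statistics(root: Path) -> Dict[str, Tuple[int, int]]:
    """Return {"COUNTRY/PRODUCT": (file_count, total_bytes)}."""
    stats = defaultdict(lambda: [0, 0])
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 3:
            continue  # not inside a COUNTRY/PRODUCT directory
        key = f"{parts[0]}/{parts[1]}"
        stats[key][0] += 1
        stats[key][1] += path.stat().st_size
    return {key: (count, size) for key, (count, size) in stats.items()}
```

Summing the byte counts over all products gives the total database size which is compared
with the configured maximum.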



The WEB-based management of the database is protected by a single username/password
combination.

3.4 Output modules

Output modules allow access to the database locally and over the network. The following
access methods are suggested:

•    Get a list of all available products
•    Get a list of all available sites in total or per product
•    Get a list of all available images per product and site within a certain time period
•    Search the database metadata for the occurrence of certain values and/or strings and
     return all images which match.
•    Retrieve a single image from the database specified by product, site and date.

For access to the database, the HTTP protocol is suggested because it is simple and can
easily be used from many programming languages. For instance, in order to get a list of all
available products, the following URL can be used:

         http://radar-database.xy.com/get_list_of_all_products.cgi

and the server will answer by transmitting the list of all products as plain text.

So all of the above database query methods can be executed in a simple web browser or
sent out via simple programs.
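
As an illustration of such a simple program, the following Python sketch retrieves the product
list with the standard library. It assumes that the server returns one product per line, which
is not specified above. The function name fetch_product_list and the opener parameter (which
allows the network call to be replaced in tests) are hypothetical.

```python
import urllib.request
from typing import Callable, List

# Example URL taken from the text above.
PRODUCTS_URL = "http://radar-database.xy.com/get_list_of_all_products.cgi"


def fetch_product_list(url: str = PRODUCTS_URL,
                       opener: Callable = urllib.request.urlopen) -> List[str]:
    """Fetch the plain-text product list, one entry per non-empty line."""
    with opener(url) as response:
        text = response.read().decode("ascii", errors="replace")
    # Assumption: the server sends one product name per line.
    return [line.strip() for line in text.splitlines() if line.strip()]
```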

Additionally a web interface which provides an easy access to all of the above methods can
be implemented. The web interface will contain on one hand a hierarchical view and
selection method for the complete database with a simple preview facility for BUFR files and
on the other hand provide search facilities to search for images with certain properties.

Additionally, collections of images can be created in order to provide a list of interesting
meteorological events. Image collections are lists of URLs referencing the single images in
the collection.

By using HTTP redirection or URL rewriting on the WEB server side, it would also be possible
to provide a distributed database. For example, the databases of two NMSs can be viewed as
one big database if the output module of one NMS's database redirects queries for a
particular product and site to the web address of the other NMS.
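
The routing decision behind such a redirection can be sketched as a simple table lookup. The
Python fragment below is purely illustrative: the table REMOTE_ARCHIVES, the host name
other-nms.example and the function name redirect_target are hypothetical. An output module
would answer with an HTTP redirect to the returned URL and serve the request from the local
database when None is returned.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing table: (product, site) -> base URL of the NMS
# which holds the data.
REMOTE_ARCHIVES: Dict[Tuple[str, str], str] = {
    ("AM", "21"): "http://other-nms.example",
}


def redirect_target(product: str, site: str, query: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the data is held locally."""
    base = REMOTE_ARCHIVES.get((product, site))
    if base is None:
        return None
    return f"{base}/{query}"
```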

4.    Bibliography

[1] Weather Radar Data Archived in OPERA Countries. Compilation of information
    supplied by OPERA delegates to W K Wheeler (UK Delegate)

[2] Opera Working Document 19/01, WMO bulletin format for BUFR encoded radar data

[3] FM94-BUFR Encoding and Decoding Software, User Guidelines, Version 1.0, Konrad
    Köck, OPERA Programme Manager

[4] Operational Programme for the Exchange of Weather Radar Information, OPERA,
    Interim Report, Konrad Köck, Asko Huusonen and Bill Wheeler

[5] Apache Web Server 2.0 Documentation, http://httpd.apache.org/docs-2.0

