Data Archival, Exchange and Seismic Data Formats
Bernard Dost 1), Jan Zednik 2), J. Havskov 3), R. Willemann 4) and P. Bormann 5)
1) ORFEUS Data Center, Seismology Division KNMI, P.O. Box 201, 3730 AE De Bilt, The Netherlands
2) Jan Zednik, Geophysical Institute AS CR, Bocni II/1401, 141 31 Prague, Czech Republic
3) Jens Havskov, Institute of Solid Earth Physics, University of Bergen, Allegaten 41, 5007 Bergen, Norway
4) Raymond J. Willemann, International Seismological Centre, Pipers Lane, Thatcham, Berkshire RG19 4NS, UK-England
5) Peter Bormann, GeoForschungsZentrum Potsdam, Telegrafenberg E428, D-14473 Potsdam, Germany
Seismology entirely depends on international co-operation. Only the accumulation of large sets of compatible, high-quality data in standardized formats from many stations and networks around the globe and over long periods of time will yield sufficiently reliable long-term results in event localization, seismicity rate and hazard assessment, investigations into the structure and rheology of the Earth's interior, and other priority tasks in seismological research and applications.

For almost a century, only parameter readings taken from seismograms were exchanged with other stations and regularly transferred to national or international data centers for further processing. Because traditional paper seismograms were unique and there were few opportunities for producing high-quality copies at low cost, the original analog waveform data, cumbersome to handle and prone to damage or even loss, were rarely exchanged. The procedures for carefully processing, handling, annotating and storing such records have been described extensively in the 1979 edition of the Manual of Seismological Observatory Practice. The formats for reporting parameter readings from seismograms to international data centers such as the U.S. Geological Survey National Earthquake Information Service (NEIS), the International Seismological Centre (ISC) or the European-Mediterranean Seismological Centre (EMSC) are also outlined in detail in this manual, in the section Reporting output. They have not changed essentially since then. On the other hand, the respective working groups on parameter formats of the IASPEI and of its regional European Seismological Commission (ESC) have now debated for many years, without conclusive results or binding recommendations yet, how to make these formats more homogeneous, consistent and flexible so as to better accommodate other seismologically relevant parameter information as well.

Meanwhile, the Database Management System (DBMS) of the Center for Seismic Studies (CSS) developed a standard IMS1.0 format for exchanging parametric seismological data used to monitor the Comprehensive Test Ban Treaty (CTBT). It uses a commercial relational database management system to facilitate storage and retrieval of seismological data. Since seismological research has a broader scope than the International Monitoring System (IMS) for the CTBT, an IASPEI Seismic Format (ISF) has now been proposed. It conforms with the IMS1.0 standard but has essential extensions and is currently being tested at the ISC and NEIC. It is hoped that this format will be adopted as binding at the IASPEI meeting in 2001 and that a standardized instruction on how to report seismological parameter data to seismological data centers will follow soon. This new reporting format will fully exploit the much greater flexibility and potential of e-mail and Internet information exchange compared with the older telegraphic reports. It will be added to this manual as soon as it is adopted and recommended by the IASPEI Commission on Practice for general use.

By far the largest volume of seismic data stored and exchanged nowadays is digital waveform data. The number of waveform formats in existence and their complexity far exceed the variability of parameter formats. With the wide availability of continuous digital waveform data and modern communication technologies for the worldwide transfer of such complete original data, their reliable exchange and archival have gained tremendous importance. Several standards for exchange and archival have been proposed; however, a much larger number of formats are in daily use. The purpose of the section on digital waveform data is to describe the international standards and to summarize the most often used formats. In addition, some of the more common conversion programs are described.

Parameter formats

Parameter formats deal with all earthquake parameters such as hypocenters, magnitudes, phase arrivals, etc. There are no real standards, except the Telegraphic Format (TF), which has been used for many years to report phase arrival data to international agencies. The format is not used for processing. There have been attempts to modernize TF for many years through the IASPEI Commission on Practice and, as mentioned in the introduction, a new standard might emerge in 2001. Thus there is currently no modern, internationally accepted exchange format for parameter data comparable to SEED for waveform data. In practice, many different formats are used, and the most dominant ones have come from popular location programs.

HYPO71

The very popular location program HYPO71 has been around for many years and has been the most used program for local earthquakes. Its format covers only a few of the important parameters. An example is shown below.

FOO EPC 96 6 6 64848.47 62.67ES 136
MOL EPC 96 6 6 64849.97 65.87ES 144
HYA EP 96 6 6 64856.78 78.07ES 135
ASK EP 96 6 6 649 2.94 34.72ES 183
BER EPC 96 6 6 649 7.56 36.61ES
EGD EPD 96 6 6 649 5.76 40.53ES
10 5.0

Example of an input file in HYPO71 format. Each line contains, from left to right: station code (max. 4 characters), E (emergent) or I (impulsive) for onset clarity followed by the phase (P), polarity (C: compression; D: dilatation), year, month, day and time (hours, minutes, seconds and hundredths of seconds) of the P phase, seconds of the S phase (seconds and hundredths of seconds only), S-phase onset and, in the last column, duration. The blank space between ES and the duration has been used for different purposes, such as amplitude. The last line is a separator line between events and contains control information.

The format is rather limited: only P or S phase names can be used, the S-phase time is referenced to the same hour and minute as the P-phase, and the format cannot be used with teleseismic data. However, it is probably one of the most popular formats ever for local earthquakes. The HYPO71 program has seen many modifications, and the format exists in many forms with small changes.

HYPOINVERSE

Following the popularity of HYPO71, several other popular location programs followed, such as Hypoinverse and Hypoellipse; however, none has been used as much as HYPO71. Below is an example of the input format for Hypoinverse.

96 6 60648
FOO EPC 48.5
FOO ES 62.7
MOL EPC 50.0
MOL EPC 50.9
MOL ES 65.9

Example of the Hypoinverse input format. Note that year, month, day, hour and minute are only given in the header and only one phase is given per line.

NORDIC format

In the eighties, one of the first attempts was made to create a more complete format for data exchange and processing. The initiative came from the need to exchange and store data in the Nordic countries, and the so-called Nordic format was agreed upon among the five Nordic countries. The format later became the standard format used in the SEISAN database and processing system and is now widely used. The format tried to address some of the shortcomings of the HYPO71 format by being able to store nearly all parameters used, having space for extensions, and being useful for both input and output. An example is given below.

1996 6 6 0648 30.4 L 62.635 5.047 15.0 TES 13 1.4 3.0CTES 2.9LTES 3.0LNAO1
GAP=267 5.92 18.8 43.0 31.8 -0.5630E+03 0.8720E+03 -0.3916E+03E
1996-06-06-0647-46S.TEST__011 6
STAT SP IPHASW D HRMM SECON CODA AMPLIT PERI AZIMU VELO SNR AR TRES W DIS CAZ7
FOO SZ EP C 648 48.47 136 -0.110 116 180
FOO SZ ESG 649 2.67 0.710 116 180
FOO SZ E 649 2.89 426.4 0.3 116 180
MOL SZ EP C 648 49.97 144 -0.310 129 92
MOL SZ EPG C 648 50.90 0.410 129 92
MOL AZ E 649 5.86 129 92
MOL SZ ESG 649 5.87 0.410 129 92
MOL SZ E 649 6.98 328.6 0.6 129 92
HYA SZ EP 648 56.78 135 0.810 174 159
HYA SZ IP D 648 56.78 0.810 174 159
HYA SZ EPG D 648 57.56 0.110 174 159
HYA SZ ESG 649 18.07 0.610 174 159
NRA0 SZ Pn 0649 24.03 309.6 8.5 139 5 -0.410 403 119
NRA0 SZ Pg 0649 32.60 305.6 7.285.2 1 0.410 403 119

Example of the Nordic format. The data are the same as in Tabs. 10.1 and 10.2. The format starts with a series of header lines, the type of each line being indicated in the last column (80); the phase lines follow the header lines and have no line-type indicator. There can be any number of header lines, including comment lines. The first line gives, among other things, origin time, location and magnitudes (type 1); the second line gives error estimates (type E); the third line is the name of the corresponding waveform file (type 6); and the fourth line is the explanation line for the phases (type 7). The abbreviations are: STAT: station code, SP: component, I: I or E, PHAS: phase, W: weight, D: polarity, HRMM SECON: time, CODA: duration, AMPLIT: amplitude, PERI: period, AZIMU: azimuth at station, VELO: apparent velocity, SNR: signal-to-noise ratio, AR: azimuth residual of location, TRES: travel-time residual, W: weight in location, DIS: epicentral distance in km, and CAZ: azimuth from event to station.

IMS formats

At about the same time as the Nordic format was made, a new format (formerly called the GSE parameter format) was also created for the exchange of data within the International Monitoring System (IMS) of the Comprehensive Test Ban Treaty Organization (CTBTO). The IMS1.0 format is similar in structure to the Nordic format, but more complete in some respects and lacking features in others. A major difference is that lines can be longer than 80 characters, which is not the case for any of the previously described formats. The IMS1.0 format was the first real international parameter format (although decided upon by a very limited and specialized user group) and has been used extensively for data exchange within the institutions participating in the IMS. It has also been used for data exchange outside the IMS, e.g., in the popular AutoDRM system; however, it has been used less as a processing format than the HYPO71 and Nordic formats. The format has recently been extended to include all information needed by the IASPEI Commission on Practice and is to be approved in 2001. This extended GSE-IMS format is called the IASPEI Seismic Format (ISF). Below is an example of the ISF format.

Sta Dist EvAz Phase Time TRes Azim AzRes Slow SRes Def SNR Amp Per Qual Magnitude ArrID
KSAR 13.04 16.5 P 01:15:20.300 1.2 200.2 1.2 12.5 -0.3 TAS 47.5 1.5 0.33 a__ 25616243
BJT 16.14 340.0 P 01:15:59.460 1.9 154.3 -1.9 9.0 -2.7 T__ 26.3 1.3 0.33 a__ 25616240
MJAR 17.24 44.5 P 01:16:09.650 -0.4 240.1 7.9 10.9 -0.1 T__ 6.0 0.4 0.33 a__ 25616246
CMAR 23.49 258.8 P 01:17:16.050 0.7 60.9 0.3 8.4 0.6 T__ 35.6 10.5 0.83 a__ mb 4.1 25616266
CMAR 23.49 258.8 LR 01:27:05.155 -9.3 80.0 10.3 37.7 -0.4 ___ 96.9 19.42 a__ Ms 3.4 25636151
Net Chan F Low_F HighF AuthPhas Date eTime wTime eAzim wAzim eSlow wSlow eAmp ePer eMag Author ArrID
IMS BZH C 1.00 10.0 Pg 1997/01/01 0.200 0.000 10.0 0.400 2.5 0.400 0.1 0.05 1.0 EIDC 25636151
IMS BZH C 1.00 10.0 pPKKPPKP 1997/01/01 99.200 0.000 10.0 0.400 2.5 0.400 0.1 0.05 EIDC 25616240
IMS BZH C 1.00 10.0 P 1997/01/01 0.200 0.000 10.0 0.400 2.5 0.400 0.1 0.05 EIDC 25616246
IMS BZH C 1.00 10.0 P 1997/01/01 0.200 0.000 10.0 0.400 2.5 0.400 0.1 0.05 EIDC 25616266
(#MEASURE RECTILINEARITY=0.8)
IMS BZH C 1.00 10.0 LR 1997/01/01 0.000 10.0 0.400 2.5 0.400 1234567.9 1.00 EIDC 25636151
(#ORIG PZH NRA0 1997/01/01 01:27:05.123 359.9 1234.5 123.4 1.3)
(#MIN -99.999 -100.0 -1000.0 -1234567.9-10.23)
(#MAX +99.999 +100.0 +1000.0 +1234567.9+10.23)
(#COREC +0.500 -100.0 -1234.5 0.12)

Digital waveform formats

Many different formats for digital data are used in seismology today. Most formats can be grouped into one of the following five classes:
1. Local formats in use at individual stations or networks, or used by a particular seismic recorder (e.g., ESSTF, PDR-2, BDSN, GDSN).
2. Formats used in standard analysis software (e.g., SEISAN, SAC, AH, BDSN).
3. Formats designed for data exchange and archiving (SEED, GSE).
4. Formats designed for database systems (CSS, SUDS).
5. Formats for real-time data transmission.

Use of the term "designed" in describing Class 3 and 4 formats is intentional. It is usually only at this level that much thought has been given to the subtleties of format structure which result in efficiency, flexibility and extensibility.

The four classes 1-4 form a hierarchy. Class 4 is a superset of the others, meaning that classes 1-3 can be derived from it. The same argument applies to class 3 with respect to classes 1 and 2. Nearly all format conversions performed at seismological data centers are done to move upwards in the hierarchy for the purpose of data archiving and exchange with other data centers. Software tools are widely available to convert from one format to another, particularly upwards in the hierarchy.

The GDSN (Global Digital Seismograph Network) format began as a Class 1 format, but because it was used by an important global seismograph network (DWWSSN, SRO), it became accepted as a de facto standard for data exchange (Class 3). The beginning of widespread international data exchange within the FDSN (Federation of Digital Seismograph Networks) and GSE (Group of Scientific Experts) groups in the late 1980s revealed the weaknesses of the GDSN format in this role and set in motion the process of defining more capable exchange formats.

Some commonly encountered digital data formats

The following section gives an alphabetical list of common formats in use. The list will of course not be complete, particularly for formats in little use; however, the most important formats in use today (2000) are included. A later section lists popular analysis software systems and briefly describes some conversion programs. Only those formats are listed which can be converted by at least one of these analysis software systems. It is particularly important to know on which computer platform a binary file has been written, since only a few analysis programs work on more than one platform. Therefore, the data file should usually be written on the same platform as the one on which the analysis program is run.

AH: The Ad Hoc (AH) format is used in the AH analysis system.
CSS: The Center for Seismic Studies (CSS) Database Management System (DBMS) was designed to facilitate storage and retrieval of seismic data for seismic monitoring of test ban treaties.
ESSTF binary: The European Standard Seismic Tape Format (ESSTF).
GeoSig: Binary format used by GeoSig recorders.
GSE: The Group of Scientific Experts (GSE) format has been used extensively in the GSETT projects on disarmament.
Güralp format: Format used by Güralp recorders.
IRIS dial-up expanded ASCII: The IRIS dial-up data retrieval system format.
ISAM-PITSA: Indexed Sequential Access Method (ISAM) is a commercial database file system designed for easy access. PITSA bases its internal file structure for digital waveform data on ISAM.
Ismes: Format used by Italian Ismes recorders.
Kinemetrics formats: Kinemetrics has several binary formats.
Lennartz: Format used by Lennartz recorders.
Nanometrics: Format used by Nanometrics recorders.
NEIC ORFEUS: The format of the early NEIC ORFEUS CD-ROMs.
PDAS: The format used by the Geotech PDAS recorders.
PITSA BINARY: A PITSA format.
Public Seismic Networks format.
SAC: Seismic Analysis Code (SAC) is a general-purpose interactive program designed for the study of time-sequential signals.
SEED: The Standard for the Exchange of Earthquake Data (SEED). SEED was adopted by the Federation of Digital Seismograph Networks (FDSN) in 1987 as its standard. IRIS has also adopted SEED and uses it as the principal format for its datasets. It is worth pointing out that formats (such as SEED) designed to handle the requirements of international data exchange are seldom suited to the needs of individual researchers. Thus the wide availability of software tools to convert between SEED and a full suite of Class 2 formats is crucial for its success.
SEISAN: The SEISAN binary format is used in the seismic analysis program SEISAN.
SeisGram ASCII and binary: SeisGram software format.
Sismalp: Sismalp is a widespread French seismic data recording system.
Sprengnether: Format used by Sprengnether recorders.
SUDS: SUDS stands for "Seismic Unified Data System". The SUDS format was launched as a more carefully designed format, useful for both recording and analysis and independent of any particular equipment manufacturer.

Format conversions

Ideally we should all use the same format. Unfortunately, as the previous descriptions have shown, a large number of formats are in use. With respect to parameter formats, one can get a long way with the HYPO71, Nordic and GSE/ISF formats, for which converters are available, e.g., in the SEISAN system. For waveform formats, the situation is much more difficult.

Many processing systems require a higher-level format than the often primitive recording formats; this is probably the most common reason for conversion. A similar reason is moving from one processing system to another.

The SEED format has become a success for archival and data exchange. Unfortunately, it is not very useful for processing purposes and is almost unreadable on a PC. So it is also important to be able to move down in the hierarchy.

There are essentially two ways of converting. The first is to request data from a data center in a particular format, or to log into a data center and use one of its conversion programs. The other, more common way is to use a conversion program on the local computer. Such conversion programs are available both as stand-alone programs and as part of processing systems.

Since conversion programs are often related to analysis programs, we list some of the better known analysis systems and the formats they use directly.

Program         Author(s)                   Input format(s)                        Output format(s)
CDLOOK          R. Sleeman                  SEED                                   SAC, GSE
Geotool         J. Coyne                    CSS, SAC, GSE                          CSS, SAC, GSE
PITSA           F. Scherbaum, J. Johnson    ISAM, SEED, Pitsa binary, GSE, SUDS    ISAM, ASCII
SAC             LLNL                        SAC                                    SAC
SEISAN          J. Havskov, L. Ottemöller   SEISAN, GSE                            SEISAN, GSE, SAC
SeismicHandler  K. Stammler                 q, miniSEED, GSE, AH, ESSTF            q, GSE, miniSEED
SNAP            M. Baer                     SED, GSE                               SED, GSE
SUDS            P. Ward                     SUDS                                   SUDS
Event           M. Musil                    ESSTF, ASCII                           ESSTF, ASCII
SeisBase        T. Fischer                  ESSTF, Mars88, GSE                     GSE
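The table of analysis programs can also be read as a small conversion graph in which each program links the formats it reads to the formats it writes. The Python sketch below transcribes the input and output columns of that table. The names `PROGRAMS`, `direct_converters` and `conversion_chain` are hypothetical helpers, not part of any listed package, and a real conversion also depends on details (versions, sub-formats, channel coverage) that the table does not capture. A breadth-first search returns a shortest chain of programs when no single program can do the job.

```python
from collections import deque

# Transcribed from the table: program -> (formats it reads, formats it writes).
PROGRAMS = {
    "CDLOOK": ({"SEED"}, {"SAC", "GSE"}),
    "Geotool": ({"CSS", "SAC", "GSE"}, {"CSS", "SAC", "GSE"}),
    "PITSA": ({"ISAM", "SEED", "Pitsa binary", "GSE", "SUDS"}, {"ISAM", "ASCII"}),
    "SAC": ({"SAC"}, {"SAC"}),
    "SEISAN": ({"SEISAN", "GSE"}, {"SEISAN", "GSE", "SAC"}),
    "SeismicHandler": ({"q", "miniSEED", "GSE", "AH", "ESSTF"}, {"q", "GSE", "miniSEED"}),
    "SNAP": ({"SED", "GSE"}, {"SED", "GSE"}),
    "SUDS": ({"SUDS"}, {"SUDS"}),
    "Event": ({"ESSTF", "ASCII"}, {"ESSTF", "ASCII"}),
    "SeisBase": ({"ESSTF", "Mars88", "GSE"}, {"GSE"}),
}


def direct_converters(source, target):
    """Programs that read `source` and write `target` in one step, sorted by name."""
    return sorted(
        name
        for name, (inputs, outputs) in PROGRAMS.items()
        if source in inputs and target in outputs
    )


def conversion_chain(source, target):
    """Shortest list of (program, from_format, to_format) steps, or None.

    Breadth-first search over formats. Programs and output formats are
    visited in sorted order so that the chosen chain is deterministic.
    """
    came_from = {source: None}  # format -> (program, previous format)
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            break
        for name in sorted(PROGRAMS):
            inputs, outputs = PROGRAMS[name]
            if fmt not in inputs:
                continue
            for nxt in sorted(outputs):
                if nxt not in came_from:
                    came_from[nxt] = (name, fmt)
                    queue.append(nxt)
    if target not in came_from:
        return None
    # Walk back from the target to rebuild the chain of steps.
    steps = []
    fmt = target
    while came_from[fmt] is not None:
        name, prev = came_from[fmt]
        steps.append((name, prev, fmt))
        fmt = prev
    return steps[::-1]
```

For example, `conversion_chain("ESSTF", "SAC")` goes through GSE, using SeisBase and then Geotool, while `conversion_chain("SAC", "SEED")` returns `None`, because none of the listed programs writes full SEED (SeismicHandler writes miniSEED).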
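The class hierarchy described under Digital waveform formats can be encoded in the same spirit, to tell whether a conversion moves up the hierarchy (as for archiving and exchange) or down it (as when SEED data have to be turned into a processing format). This is a sketch under simplifying assumptions: each format gets a single class number taken from the examples in the class list, formats that the text places in two classes (BDSN, GDSN) are left out, and class 5 is omitted because it lies outside the 1-4 hierarchy.

```python
# Class numbers from the examples in the list of waveform format classes.
# Assumption: one class per format; BDSN and GDSN (placed in two classes by
# the text) and class 5 (real-time transmission) are deliberately omitted.
FORMAT_CLASS = {
    "ESSTF": 1, "PDR-2": 1,
    "SEISAN": 2, "SAC": 2, "AH": 2,
    "SEED": 3, "GSE": 3,
    "CSS": 4, "SUDS": 4,
}


def conversion_direction(source, target):
    """Return "upward", "downward" or "same level" for a format conversion."""
    src, dst = FORMAT_CLASS[source], FORMAT_CLASS[target]
    if dst > src:
        return "upward"
    if dst < src:
        return "downward"
    return "same level"


print(conversion_direction("ESSTF", "SEED"))  # upward: typical for archiving
print(conversion_direction("SEED", "SAC"))    # downward: needed for processing
```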
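The remark in the format list that it matters on which computer platform a binary file was written often comes down to byte order: the same bytes decode to different sample values on big-endian and little-endian machines. The sketch below uses only Python's standard `struct` module; the 32-bit signed integer sample type and the sample values are illustrative assumptions, not taken from any of the listed formats.

```python
import struct


def decode_int32(data, byte_order):
    """Decode consecutive 32-bit signed integers from raw bytes.

    byte_order is a struct prefix: ">" for big-endian, "<" for little-endian.
    Trailing bytes that do not form a complete sample are ignored.
    """
    count = len(data) // 4
    return list(struct.unpack(f"{byte_order}{count}i", data[: 4 * count]))


# Hypothetical samples, written as if on a big-endian machine.
samples = [1000, -2, 123456]
raw = struct.pack(f">{len(samples)}i", *samples)

print(decode_int32(raw, ">"))  # [1000, -2, 123456]
print(decode_int32(raw, "<"))  # [-402456576, -16777217, 1088553216]
```

Reading with the wrong byte order raises no error; it silently produces wrong amplitudes, which is why the writing platform has to be known.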
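In the HYPO71 example, the S reading of station FOO is written as 62.67 s, counted from the hour and minute of the P reading (06:48). The S arrival is therefore at 06:49:02.67, the same reading that the Nordic example lists as 649 2.67. The sketch below turns already-extracted HYPO71 fields into absolute times. It does not parse the card columns, whose exact positions are not given here, and the rule for two-digit years is an assumption.

```python
from datetime import datetime, timedelta


def hypo71_phase_times(yy, month, day, hour, minute, p_sec, s_sec=None):
    """Absolute P and S times for one HYPO71 phase line.

    Both the P seconds and the S seconds count from the same reference
    minute (the hour and minute of the P reading), so S seconds may exceed 60.
    """
    # Assumption: two-digit years 70-99 belong to the 1900s, others to the 2000s.
    year = 1900 + yy if yy >= 70 else 2000 + yy
    reference = datetime(year, month, day, hour, minute)
    p_time = reference + timedelta(seconds=p_sec)
    s_time = reference + timedelta(seconds=s_sec) if s_sec is not None else None
    return p_time, s_time


# Station FOO from the HYPO71 example: P at 06:48 48.47 s, S at 62.67 s.
p, s = hypo71_phase_times(96, 6, 6, 6, 48, 48.47, 62.67)
print(p, s)  # 1996-06-06 06:48:48.470000 1996-06-06 06:49:02.670000
```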
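Converting Hypoinverse-style input, with one phase per line, to a HYPO71-style layout with P and S on the same line means regrouping the readings by station. The following sketch assumes a simplified whitespace-separated line ("station remark seconds") rather than the real column layout, which the text does not specify, and takes the second character of the remark as the phase letter. Keeping only the first P and the first S per station mirrors the HYPO71 restriction to P and S phases, so the second MOL P reading in the Hypoinverse example is dropped.

```python
def hypoinverse_to_station_pairs(phase_lines):
    """Regroup one-phase-per-line readings into (station, P seconds, S seconds).

    Assumed simplified line form: "STATION REMARK SECONDS", e.g. "FOO EPC 48.5",
    where REMARK[1] is the phase letter. Seconds stay relative to the header
    minute. Only the first P and first S per station are kept.
    """
    stations = {}  # dicts keep insertion order, so stations stay in input order
    for line in phase_lines:
        station, remark, seconds = line.split()
        phase = remark[1]
        readings = stations.setdefault(station, {"P": None, "S": None})
        if phase in readings and readings[phase] is None:
            readings[phase] = float(seconds)
    return [(sta, r["P"], r["S"]) for sta, r in stations.items()]


lines = ["FOO EPC 48.5", "FOO ES 62.7", "MOL EPC 50.0", "MOL EPC 50.9", "MOL ES 65.9"]
print(hypoinverse_to_station_pairs(lines))
# [('FOO', 48.5, 62.7), ('MOL', 50.0, 65.9)]
```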
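For the Nordic format, the line-type indicator in column 80 is enough to separate header lines from phase lines, as described in the caption of the Nordic example. This sketch assumes one event per call, skips blank lines, and treats every line whose column 80 is blank as a phase line; decoding the fields inside each line is not attempted.

```python
def classify_nordic_lines(lines):
    """Split the lines of one Nordic-format event into header and phase lines.

    Header lines carry their type in column 80 (for example "1", "E", "6", "7");
    phase lines leave that column blank. Returns ([(type, line), ...], [line, ...]).
    """
    headers, phases = [], []
    for raw in lines:
        if not raw.strip():
            continue  # blank line, e.g. an event separator
        line = raw.rstrip("\r\n").ljust(80)  # pad short lines so column 80 exists
        line_type = line[79]  # column 80, counting columns from 1
        if line_type == " ":
            phases.append(line.rstrip())
        else:
            headers.append((line_type, line.rstrip()))
    return headers, phases
```

Because the type sits in a fixed column, a reader can skip header types it does not understand, which is one way the format leaves room for extensions.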