BDS Public

Blacknest BDS Data Overview

Date 2008-03-04
Version 0.6

Note that this is a "work in progress" document and is being continually updated.

Introduction

This document provides an overview of the seismic data and meta data that the BDS system needs to process.

Terminology

A lot of terminology is used in seismic instrumentation, and some of it conflicts. For the BDS system we will use the following terminology:

Term Description
Array Name of an array of Stations having seismometers and other sensors. An Array could have just one Station. A whole Array is sometimes termed a Station in some contexts. (This may be integrated into the Station, where a Station could have a type which could be an Array).
Station A single measuring site that can have a number of instruments. This has a location which is time dependent.
Instrument A single measuring instrument that can have a number of channels.
Channel A single measuring channel that consists of a seismometer sensor and a digitiser. Each channel has a calibration value.
Sensor A per channel, seismometer, hydrophone or other sensor measuring time-dependent data, e.g. temperature, wind speed.
Digitiser A per channel digitiser implicitly including the cascade of filters/decimators if it over samples.
Network This defines an organisation that is responsible for a number of seismic Arrays or provides data for a number of Arrays. Each Network organisation may have its own database with different settings for various parameters such as CalibrationScale.
PAZ Poles and Zeros table for frequency response
FAP Frequency Amplitude Phase table for frequency response
FIR Finite Impulse Response coefficient table for frequency response
Event A seismic disturbance, e.g. an earthquake or explosion.
Arrival An "arrival" is a signal from an earthquake or explosion recorded at a station.

  • Currently the term Station can refer to an Array or an individual Station. We could use the term Station for both multiple-site Arrays and single-site Stations. In software a Station object could thus have a list of "Sub-Stations".
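
The "Sub-Stations" idea can be sketched as a small recursive type. This is only an illustration in Python, not a proposed BDS class; the class layout, method names and the station names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Station:
    """A measuring site; an Array is modelled as a Station with sub-stations."""
    name: str
    sub_stations: List["Station"] = field(default_factory=list)

    def is_array(self) -> bool:
        # A Station with at least one sub-station plays the role of an Array.
        return len(self.sub_stations) > 0

    def sites(self) -> List["Station"]:
        # Flatten the tree down to the individual measuring sites.
        if not self.sub_stations:
            return [self]
        result: List["Station"] = []
        for sub in self.sub_stations:
            result.extend(sub.sites())
        return result


# Hypothetical names, for illustration only.
array = Station("ARRAY1", [Station("PIT1"), Station("PIT2")])
single = Station("SITE1")
```

With this shape, code that only needs individual sites can call sites() without caring whether it was handed an Array or a single Station.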

Instrument Response MetaData

An Array consists of a number of seismic measuring Stations. Each seismic measuring Station has a location and a number of instruments related to it. Each Instrument can measure data on a number of Channels. Each Channel has the following components:
  1. A seismometer sensor. This outputs a voltage dependent on displacement.
  2. An anti-aliasing filter with some response.
  3. A digitiser which samples at some frequency with some frequency response.
  4. A set of filters with appropriate frequency responses.
The frequency response for the seismometer is available either as a set of poles and zeros or as a frequency to amplitude/phase table. These define the frequency-dependent voltage output generated by a given displacement in nanometers. These values are defined for a particular instrument type, are provided by the manufacturer and are not changed. Sometimes the sensor's frequency response is provided as a velocity or acceleration based set of values. These would be converted to a displacement based set for the BDS system. (We certainly want these output in displacement units; how we store them, we will have to think about.)
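
As a sketch of the velocity/acceleration conversion, assuming the response is held as Laplace-domain poles and zeros in radians per second: velocity is the time derivative of displacement, which corresponds to multiplying by s, so a velocity response referred to displacement gains one zero at the origin (two for acceleration). The function name and the unit strings are hypothetical, and any change of length unit (e.g. metres to nanometers) would be a separate scale factor not shown here.

```python
from typing import List, Tuple


def to_displacement(
    zeros: List[complex], poles: List[complex], input_unit: str
) -> Tuple[List[complex], List[complex]]:
    """Return (zeros, poles) of the response referred to displacement input."""
    # Number of zeros at s = 0 to add: one per time derivative.
    extra = {"displacement": 0, "velocity": 1, "acceleration": 2}[input_unit]
    return list(zeros) + [0j] * extra, list(poles)
```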

The anti-aliasing filter, digitiser and post filters have some frequency response that is given in either a pole/zero table, a frequency to amplitude/phase table or a set of FIR coefficients. These filters are fairly "flat" over the region of interest compared to the seismometer's frequency response, and thus their responses are not normally taken into account.

A manual calibration is performed occasionally. It provides an overall CalibrationScale value at a given CalibrationFrequency for the entire channel, from seismometer movement to the digitised value in the file. This allows the seismic movement to be expressed in nanometers.

If a user wants basic seismic data they simply multiply each sample's value by the CalibrationScale value. Geotool can do this.
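
A minimal sketch of that basic scaling, assuming the samples are integer counts and CalibrationScale is in nanometers per count (the function name is hypothetical):

```python
from typing import List


def counts_to_nanometers(samples: List[int], calibration_scale: float) -> List[float]:
    """Scale raw digitiser counts to ground displacement in nanometers."""
    return [sample * calibration_scale for sample in samples]
```

This is only valid near the CalibrationFrequency; away from it the frequency response has to be taken into account.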

If a user wants more detail they can get the frequency response of the seismometer and perhaps the other filters, normalise these at the CalibrationFrequency, apply the inverse transformation to the data set and multiply by the CalibrationScale value.

In the old system, to simplify the latter procedure, the pzconst value is calculated for the pole/zero response at the calibration frequency. This provides a simple scale value that can be applied to the data's values to normalise the seismometer's pole/zero table.
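
The following sketch shows one way such a scale value could be computed. It assumes the response is H(s) = prod(s - zero) / prod(s - pole) with poles and zeros in radians per second, and that the normalising value is the reciprocal of |H| at the calibration frequency. Whether the old system's pzconst is defined exactly this way is not stated here, so treat both function names and the definition as assumptions.

```python
import math
from typing import Sequence


def response_at(
    zeros: Sequence[complex], poles: Sequence[complex], frequency_hz: float
) -> complex:
    """Evaluate the pole/zero response at s = j * 2 * pi * f."""
    s = 2j * math.pi * frequency_hz
    numerator = complex(1.0)
    for zero in zeros:
        numerator *= s - zero
    denominator = complex(1.0)
    for pole in poles:
        denominator *= s - pole
    return numerator / denominator


def normalisation_scale(
    zeros: Sequence[complex], poles: Sequence[complex], calibration_frequency_hz: float
) -> float:
    """Scale that makes the response magnitude 1 at the calibration frequency."""
    return 1.0 / abs(response_at(zeros, poles, calibration_frequency_hz))
```

Because the value depends only on the pole/zero table and the calibration frequency, it can be computed once and cached.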

The following tables provide a possible, high level, ordered list of the information available. Its order is based on the overall structure of the system. It is a high level list and does not provide detailed information.

Network

This defines an organisation that is responsible for a number of seismic Arrays or provides data for a number of Arrays. Each Network organisation may have its own database with different settings for various parameters such as CalibrationScale.
Item Description
Id Unique integer ID
Name The Network name. ( etc)
Arrays[] The list of arrays used by this Organisation. (Maybe this should also include Stations separately ??)

Array

A seismic measuring array that consists of a number of Stations.
Item Description
Id Unique integer ID
Name The Array's unique name. (YKA etc)
Stations[] The list of stations within this array

Station

A location where a set of instruments is located.
Item Description
Id Unique integer ID
Name The Station's name. (A Pit name)
Locations[] Location: latitude and longitude in degrees using the WGS84 datum, the ground level elevation in meters from the WGS84 ellipsoid (Sea level) and the depth of sensor in a pit in meters. (IDC standard is a little vague on this - we will have to check, and might need information to denote the standard(s) used. The geographical information about a site is usually quite static.) This is a list of locations with a TimePeriod for each location as Stations can be moved.
TimePeriods[] A list of time periods the Station was operational (not/operational ?)
Instruments[] A list of all of the instruments at this site.
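
Since a Station's location is time dependent, a lookup by time is needed. A minimal sketch, assuming each location carries a start time and an optional end time (None meaning "still current"); the field and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Location:
    start: datetime
    end: Optional[datetime]  # None means the location is still current
    latitude_deg: float      # WGS84
    longitude_deg: float     # WGS84
    elevation_m: float       # ground level elevation
    depth_m: float           # depth of the sensor below ground level


def location_at(locations: List[Location], when: datetime) -> Optional[Location]:
    """Return the location valid at 'when', or None if the Station had none."""
    for location in locations:
        if location.start <= when and (location.end is None or when < location.end):
            return location
    return None
```

The same start/end pattern would apply to the Sensor, Digitiser and Calibration lists, which are also kept by date.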

Instrument

A measuring instrument that can have a number of channels.
Item Description
Id Unique integer ID
Name The instrument's name.
Channels[] A list of all of the channels this Instrument has

Channel

An individual data channel.
Item Description
Id Unique integer ID
Name The channel name. (Such as SHZ, BHZ)
LocationId An ID for the location of the instrument. (Such as "00" for the borehole and "10" for the position on "Pier 1" in the vault at Wolverton) (This might be Name and therefore could be removed).
Calibration[] Calibration measurements. A list of calibration measurements by date
Sensor[] The Sensors used. A list of sensors by date
Digitiser[] The Digitisers used. A list of Digitisers by date

Sensor

A measurement sensor. This would be a seismometer or could be another unit such as a Hydrophone.
Item Description
Id Unique integer ID
TimePeriod The time period the sensor was in use
Name The sensor name. (This may not be needed)
Type The sensor's type. (Seismometer, Hydrophone etc)
Model The sensor's model name. The Vendor make/type of sensor
SerialNumber The sensor's serial number
Response Frequency response of sensor as an array of pole zero values or as an amplitude/phase table
HorizontalAngle Seismometer placement horizontal angle in degrees clockwise from north
VerticalAngle Seismometer placement vertical angle in degrees with zero = vertically up
Gain The gain setting. For information only. Set to 0 if unknown.
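
The two placement angles define the sensor's sensitive axis. As a sketch, assuming a local (east, north, up) frame, HorizontalAngle measured clockwise from north and VerticalAngle measured from vertically up, the axis as a unit vector is (the function name is hypothetical):

```python
import math
from typing import Tuple


def orientation_vector(horizontal_deg: float, vertical_deg: float) -> Tuple[float, float, float]:
    """Unit vector (east, north, up) of the sensor's sensitive axis."""
    h = math.radians(horizontal_deg)
    v = math.radians(vertical_deg)
    east = math.sin(v) * math.sin(h)
    north = math.sin(v) * math.cos(h)
    up = math.cos(v)
    return east, north, up
```

A vertical component (VerticalAngle 0) gives (0, 0, 1) regardless of HorizontalAngle; a horizontal component pointing east (HorizontalAngle 90, VerticalAngle 90) gives approximately (1, 0, 0).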

Digitiser

The digitiser used. In reality the sensor/digitiser could be a single unit.
Item Description
Id Unique integer ID
TimePeriod The time period the digitiser was in use
Name The digitiser name. (This may not be needed)
Model The digitiser model name. The Vendor make/type of instrument
SerialNumber The digitiser's serial number
SamplingFrequency The frequency of sampling in Hz
Response[] Array of responses for each module (Anti-aliasing filter, Digitiser, post filter etc)
Gain The gain setting. For information only. Set to 0 if unknown.

Calibration

A calibration measurement.
Item Description
Id Unique integer ID
TimePeriod The time period the calibration applies to
CalibrationFrequency The frequency that the CalibrationScale value is valid for
CalibrationFactor The scaling value to apply to the data to normalise to Nanometers. This is a measured value at the calibration frequency and is in Nanometers/Count.

Response

A frequency response. This can store the response as a pole/zero table, an amplitude/phase table or a set of FIR coefficients.
Item Description
Id Unique integer ID
Name The response name. (Sensor, AntiAlias, Digitiser)
PoleZeros Frequency response defined by an array of pole zero values in radians per second.
AmplitudePhaseTable Frequency response defined by an array of amplitude/phase values with respect to frequency.
FirCoefficients Frequency response defined by an array of coefficients.
Gain The overall gain of the filter (At what frequency ???)
Decimation The amount of decimation applied ??
GroupCorrectionApplied The group delay correction applied in seconds ??
Symmetry Symmetry for FIR coefficients (A = asymmetric, B = symmetric[odd], C = symmetric[even]) ??
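
If the Symmetry flag is kept, a reader of the table needs to expand symmetric filters to their full length. The sketch below assumes the SEED-style convention that only the first half of a symmetric filter is stored: for B (odd length) the last stored value is the centre tap and is not repeated, for C (even length) every stored value is mirrored. The function name is hypothetical and the convention should be checked against the SEED manual before use.

```python
from typing import List


def expand_fir(stored: List[float], symmetry: str) -> List[float]:
    """Expand stored FIR coefficients to the full filter."""
    if symmetry == "A":
        # Asymmetric: all coefficients are stored.
        return list(stored)
    if symmetry == "B":
        # Symmetric, odd length: mirror everything except the centre tap.
        return list(stored) + list(reversed(stored[:-1]))
    if symmetry == "C":
        # Symmetric, even length: mirror every stored coefficient.
        return list(stored) + list(reversed(stored))
    raise ValueError("unknown symmetry flag: %r" % symmetry)
```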

Notes:

  • This is a high level overview.
  • No information has been included as to the Network (Organisation). This would allow the choice of different sets of parameters dependent on the Organisation's own settings.
  • It may be worth pre-calculating the sensors pole/zero normalisation scale value (pzconst) and storing this in a cache for performance reasons.
  • It might be worth renaming a Channel as Instrument and losing the Instrument entry above.

Data Storage Meta Data

There needs to be information on the available data sets stored in the archive. The following is the important information that is required:

Item Description
Id Unique integer ID
Network The Network organisation the original data is from
Array/Station The Array/Station the data is from.
Period The time period of the data
Source Some information on the source of this data. This would allow multiple sources of data from the same Array and Period. It could also support processed data. (DIRECT, TAPE, PROCESSED ...) The NetCDF format could be used.
Location The location of the data. This could be a local archive or even a remote archive
Type The type of the data, basically the data format.
URL File location. This can embody a protocol, host, path and filename.
Comment A general comment string

  • For any given Array and Period there can be multiple sets of data. This can come from different Network sources and also could include pre-processed data.
  • The Array and Period information would be used to lookup the Channel and Instrument information from the Instrument Response Meta Data.
  • We may need to add information here on the channels contained in the dataset. This could include channel swaps etc.
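
A sketch of how such a catalogue entry might be held and queried. The field names loosely follow the table above; the class, the function names and all example values are hypothetical. The query returns every data set for an Array whose period overlaps the requested one, so multiple sources come back together.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple
from urllib.parse import urlparse


@dataclass
class DataSet:
    id: int
    network: str
    array: str
    start: datetime
    end: datetime
    source: str      # e.g. DIRECT, TAPE, PROCESSED
    data_type: str   # the data format
    url: str


def find_data_sets(catalogue: List[DataSet], array: str,
                   start: datetime, end: datetime) -> List[DataSet]:
    """All data sets for 'array' whose period overlaps [start, end)."""
    return [d for d in catalogue
            if d.array == array and d.start < end and start < d.end]


def url_parts(url: str) -> Tuple[str, str, str]:
    """Split a data set URL into (protocol, host, path)."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.hostname or "", parsed.path
```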

Arrival/Event Meta Data

There is a fair amount of information to be stored for Arrival handling. The main information, as from the IDC data base, has the following database tables:

Arrival General information on an event.
Stassoc Summary information on groups of Arrivals.
Origin Information on the derived origin of events.
Explo Information on explosions

The event information system will need some looking at, however its information can be separated from the main BDS data. The only link between the two data sets will probably be the time period of the event.
  • It may be useful to store processed data sets in the main BDS system linked to particular events.

Other Meta Data

There are a number of other items of Meta Data to be stored. The main sets of these include:

System Outages Information on known system outages
User Information User ID and other login and security information
Data Request Queuing Data request queuing and bandwidth management
Logs/Statistics Logs and statistics of use
Notes This could be a list of notes on the Meta Data that can be added for information

The content of most of these items depends on the operation of the BDS itself. They are not crucial to the seismic data itself.

Seismic Raw Data

The seismic data consists of a number of channels of sampled seismic amplitude values from an Array's Stations. The seismic data is stored in files in various formats. It is proposed that the BDS should initially be able to handle the following formats:

External Data Formats

BKNAS 1.0/2.0 The Blacknest standard data format 1.0 and 2.0
IMS 2.0 The IDC IMS data format

Data Storage Formats

BDRS BDRS data files are comprised of 4012 byte blocks, each containing a 6 byte header, 4000 data bytes, containing 100 data points from each of 20 channels in two byte blocks, plus 6 footer bytes.
WRA WRA40, WRA64 and WRA-AGSO.
GCF Guralp compressed format. File and streaming format.
SEED Standard for the Exchange of Earthquake Data format.
TapeDigitiser The Blacknest TapeDigitiser file format for Digitised Analogue Tapes.

The following provides the core features of these formats.

TapeDigitiser

Overview Developed for the TapeDigitiser project to store information from sampled old analogue tapes. Contains a lot of Meta Information about the quality of tape and digitisation process.
Structure ASCII header. Normally 24 Channels, blocked in variable length blocks of about 12.5 seconds of data per block.
Raw data format 32bit floating point multiplexed data
Sampling Rate 100Hz, but can vary due to tape speed fluctuations (~0.1%)
Meta Data Meta data on Array and Tape name, time, and details of the Tape Digitisation process.
Misc Two channels are error channels, one channel is a VELA time-code channel.
Amount

BKNAS

Overview Blacknest AutoDRM data format. Available in Version 1.0 and 2.0. No compression.
Structure Up to 35 Channels of continuous data. Multiplexed ASCII data. One set of channel samples per line.
Raw data format ASCII Integer
Sampling Rate 20Hz normally
Meta Data Array/Station, Channels, Original Data Type, MasterTapeNumber, TapeFileNumber, Location (no depth), Event information, Geographic region number and name, DigitisingOffset?,
Misc Cannot provide all of the information that will be available.
Amount
  • BKNAS1 imposes some limits on the number of channels. Subroutine "afdtr.f" which reads in BKNAS 1 and 2 data for "Apple" (which is our only program that seriously uses BKNAS-format data) has array size 35 for the arrays containing the channel information. Also the digital samples are read in with format statement "40I6", which both imposes a limit of 40 channels and restricts the dynamic range to 9x10^5 digital counts.
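
To make the "40I6" limit concrete: each value occupies a fixed six-character field, so the largest positive value is 999999 and, because the sign takes one character, the most negative is -99999. The sketch below reads such a line in Python; the function names are hypothetical and it is not a port of "afdtr.f". Blank fields are skipped here, which is a simplification.

```python
from typing import List


def parse_i6_line(line: str, width: int = 6) -> List[int]:
    """Split a line into fixed-width integer fields (Fortran I6 style)."""
    line = line.rstrip("\n")
    values = []
    for start in range(0, len(line), width):
        text = line[start:start + width]
        if text.strip():  # skip blank fields (simplification)
            values.append(int(text))
    return values


def fits_i6(value: int) -> bool:
    """True if the value can be written in a six-character integer field."""
    return -99999 <= value <= 999999
```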

IMS 2.0

Overview IDC AutoDRM data format. Available in Version 1.0 and 2.0. Compressed or un-compressed.
Structure Up to 35 Channels of continuous data. Separate sets of data per channel.
Raw data format This can be in: INT(ASCII Integers), CM6(compressed integers described in 6bit ASCII), CM8(compressed integers described in 8bit ASCII), CSF(a sub-format for authenticated data)
Sampling Rate Any
Meta Data Network, Station, Channels
StationCode, FDSN ChannelCode, AuxCode, InstrumentType, CalibrationScale, CalibrationFrequency, SampleRate, Start/Stop times, PAZ/FAP/FIR Tables, Event information
Misc Maximum 100MByte message size, can have multiple messages using a REF_ID field. Maximum line length 1,024 characters.
Amount

BDRS

Overview Very old and basic data storage format. No compression.
Structure 4012 byte Block based, with simple header (Part of array name + Date/Time only). 20 Channels of data.
Raw data format 16bit Integer in little-endian binary format.
Sampling Rate Normally 20Hz
Meta Data None other than Part of an array name and Date/Time (Just last digit of year)
Misc The directory path and file name is used to define the Array, the Year and Day.
Amount
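
A sketch of reading one BDRS block, following the 6 byte header + 4000 data bytes + 6 footer bytes layout given in the Data Storage Formats list. Two things are assumptions rather than facts from this document: that the 16bit values are signed, and that the data is multiplexed with one value per channel for each sample time. The header and footer are returned undecoded.

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 6
DATA_SIZE = 4000
FOOTER_SIZE = 6
BLOCK_SIZE = HEADER_SIZE + DATA_SIZE + FOOTER_SIZE  # 4012 bytes
CHANNELS = 20
SAMPLES_PER_BLOCK = 100


def split_block(block: bytes) -> Tuple[bytes, List[List[int]], bytes]:
    """Return (header, per-channel samples, footer) for one block."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("expected %d bytes, got %d" % (BLOCK_SIZE, len(block)))
    header = block[:HEADER_SIZE]
    data = block[HEADER_SIZE:HEADER_SIZE + DATA_SIZE]
    footer = block[HEADER_SIZE + DATA_SIZE:]
    # 2000 little-endian 16bit values (assumed signed).
    values = struct.unpack("<%dh" % (CHANNELS * SAMPLES_PER_BLOCK), data)
    # Assumed ordering: channel 0..19 for sample 0, then channel 0..19 for sample 1, ...
    channels = [list(values[ch::CHANNELS]) for ch in range(CHANNELS)]
    return header, channels, footer
```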

WRA 40 and 64

Overview Old basic data storage format. No compression.
Structure WRA40: 32768 Byte blocks with a 256Byte header. 40 Channels of data.
WRA64: 53248 Byte blocks with a 256Byte header. 64 Channels of data.
Raw data format Binary 16bit signed values in little-endian format.
Sampling Rate 20Hz ?
Meta Data None other than Date/Time
Misc The directory path and file name is used to define the Array, the Year and Day.
Amount

WRA-AGSO

Overview Old basic data storage format, but with compression.
Structure ASCII file header followed by 256 Byte binary data blocks or 256 Byte ASCII blocks.
Raw data format 16bit signed integers in double-difference compressed format identical to that used in GSE2.1 CM8 (Same as IMS CM8?). Non-multiplexed data (one channel at a time).
Sampling Rate 20Hz
Meta Data StationCode, SamplingRate, ChannelName, CalibrationFactor
Misc The directory path and file name is used to define the Array, the Year and Day.
Amount
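
The "double-difference" step can be sketched on its own. The code below only covers the differencing, i.e. taking differences of differences and undoing them with two running sums; the byte-level packing of the differences used by CM8 is not shown, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def second_difference(samples: List[int]) -> List[int]:
    """Difference the samples twice, treating values before the start as 0."""
    first = [b - a for a, b in zip([0] + samples[:-1], samples)]
    return [b - a for a, b in zip([0] + first[:-1], first)]


def undo_second_difference(diffs: List[int]) -> List[int]:
    """Recover the samples with two cumulative sums."""
    return list(accumulate(accumulate(diffs)))
```

For slowly varying data the second differences are small numbers, which is what makes the subsequent variable-length packing effective.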

GCF

Overview Guralp compressed format. Designed for streaming data as well as storage. A low level compressed format with little MetaData.
Structure A file or stream format. Consists of a sequence of variable length blocks, which can be up to 1024 bytes long. Blocks define data for an individual channel.
Raw data format Compressed Binary either: 32-bit differences, 16-bit differences or 8-bit differences
Sampling Rate Defined in block headers
Meta Data SystemId, StreamId, Date/Time, SamplingRate,
Misc The block duration is always a whole number of seconds, and always starts on a whole second boundary.
Amount

SEED

Overview This is an extensive well thought out format. It provides full data and meta-data capability. It is quite complex and big. It is a bit old fashioned in places and looks like it would be awkward to use.
Structure Blockette based. A Blockette is a variable length lump of information with an integer type field. The file consists of a sequence of headers and variable numbers of Blockettes. Blockettes describe data and meta information.
Raw data format Various ASCII and binary, compressed and non-compressed formats are provided. There may be issues with multiplexed channel data. The SEED manual states that it is not desirable to use multiplexed channel data in SEED files although it is possible.
Sampling Rate Any
Meta Data Full meta-data available and can be extended (with some difficulty).
Misc May not support streaming that well. Would be difficult to add additional meta-Data although it can be done.
Amount

Some notes on these data formats:
  1. The BDRS, WRA-40, WRA-64 and TapeDigitiser formats are uncompressed. If there is a large amount of data in these formats, it should be stored in a compressed format.
  2. Only the SEED and TapeDigitiser formats have any appreciable ability to store Meta Information.
  3. The BDRS, WRA-40, WRA-64, WRA-AGSO, GCF and TapeDigitiser formats can easily be stored in another format without any loss of information. Due to the extensive abilities of the SEED format, some loss of information could be incurred when storing in another format unless that format has the same abilities.

Compression

  • Have a look at the IMS CM6 compression system.
  • Some good descriptions of some methods in the SEED manual:
CODES 10 - 29 FDSN Networks
10            STEIM (1) Compression
11            STEIM (2) Compression
12            GEOSCOPE Multiplexed Format 24 bit integer
13            GEOSCOPE Multiplexed Format 16 bit gain ranged, 3 bit exponent
14            GEOSCOPE Multiplexed Format 16 bit gain ranged, 4 bit exponent
15            US National Network compression
16            CDSN 16 bit gain ranged
17            Graefenberg 16 bit gain ranged
18            IPG - Strasbourg 16 bit gain ranged
19            STEIM (3) Compression
  • What levels of compression are possible? We need to store a set of example data in the different formats.
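
A small harness for that kind of comparison might look like the following. It uses a synthetic signal and general-purpose zlib compression purely as stand-ins; none of the seismic compression schemes listed above is implemented, and real example data would be needed to draw any conclusions. All names are hypothetical.

```python
import math
import random
import struct
import zlib
from typing import Dict, List


def synthetic_samples(count: int = 4000, seed: int = 0) -> List[int]:
    """A sine wave plus small noise, standing in for real seismic data."""
    rng = random.Random(seed)
    return [int(1000 * math.sin(2 * math.pi * i / 200)) + rng.randint(-5, 5)
            for i in range(count)]


def pack_int16(values: List[int]) -> bytes:
    """Pack values as little-endian signed 16bit integers."""
    return struct.pack("<%dh" % len(values), *values)


def compare_sizes(samples: List[int]) -> Dict[str, int]:
    """Size in bytes of the raw data and of two compressed variants."""
    raw = pack_int16(samples)
    # First differences, keeping the first sample so the data can be rebuilt.
    diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    return {
        "raw": len(raw),
        "zlib": len(zlib.compress(raw)),
        "diff+zlib": len(zlib.compress(pack_int16(diffs))),
    }
```

Running compare_sizes over the same example data stored in each candidate format would give a like-for-like table of sizes.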

Notes

  1. All Station Codes in the world should be unique and registered with the ISC in the United Kingdom/the National Earthquake Information Center (NEIC) in the United States ? Answer: This is not always the case but they are unique per Network.
  2. Channel code standards are defined in the IMS "Formats and Protocols for Messages" document. Also in the SEED manual. Are all codes to these rules ? Answer: Yes, we will use these codes. There may be some two letter codes such as BZ or SZ, or in lower case, bz, sz, which will be converted.
  3. There are some Instrument type codes defined in the SEED manual (Appendix A). Are all codes to these rules ?
  4. The IMS format has information on Elevation and Depth. Why are the two given ? Answer: The "depth" value tells the seismologist that an instrument is in a borehole. This is useful information. The elevation is required for e.g. synthetic seismogram calculation. So both values are required.
  5. I note that the SEED format has the possibility of flagging that a Leap Second occurred during a block of data. Do we need to handle Leap Seconds ? Answer: Handling Leap Seconds would be good. GPS time does not include Leap Seconds, although a regularly broadcast message notes how far GPST and UTC are apart.
  6. What date time accuracy should we use: Seconds, Milli-Seconds ? Answer: Milli-Seconds.
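
Notes 5 and 6 can be illustrated together. The sketch below stores times as integer milliseconds and converts a GPS time to UTC using a GPS-minus-UTC offset supplied by the caller (for example, taken from the broadcast message). The function names are hypothetical. Note that Python's datetime cannot represent a leap second itself (second 60), so this only shows the offset arithmetic, not full Leap Second handling.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def gps_ms_to_utc(gps_ms: int, gps_minus_utc_s: int) -> datetime:
    """Convert milliseconds since the GPS epoch to a UTC datetime."""
    return (GPS_EPOCH + timedelta(milliseconds=gps_ms)
            - timedelta(seconds=gps_minus_utc_s))


def to_epoch_ms(when: datetime) -> int:
    """Whole milliseconds since 1970-01-01 UTC, using integer arithmetic."""
    return (when - UNIX_EPOCH) // timedelta(milliseconds=1)
```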