
Blacknest BDS Data Overview

Date	2008-03-04
Version	0.6

Note that this is a "work in progress" document and is being continually updated.

Introduction

This document provides an overview of the seismic data and meta data that the BDS system needs to process.

Terminology

A lot of terminology is used in seismic instrumentation, and some of it is conflicting. For the BDS system we will use the following terminology:

Term	Description
Array	Name of an array of Stations having seismometers and other sensors. An Array could have just one Station. A whole Array is sometimes termed a Station in some contexts. (This may be integrated into the Station, where a Station could have a type which could be an Array.)
Station	A single measuring site that can have a number of instruments. This has a location which is time dependent.
Instrument	A single measuring instrument that can have a number of channels.
Channel	A single measuring channel that consists of a seismometer sensor and a digitiser. Each channel has a calibration value.
Sensor	A per-channel seismometer, hydrophone or other sensor measuring time-dependent data, e.g. temperature or wind speed.
Digitiser	A per-channel digitiser, implicitly including the cascade of filters/decimators if it oversamples.
Network	This defines an organisation that is responsible for a number of seismic Arrays or provides data for a number of Arrays. Each Network organisation may have its own database with different settings for various parameters such as CalibrationScale.
PAZ	Poles and Zeros table for frequency response
FAP	Frequency Amplitude Phase table for frequency response
FIR	Finite Impulse Response coefficient table for frequency response
Event	A seismic disturbance, e.g. an earthquake or explosion.
Arrival	An "arrival" is a signal from an earthquake or explosion recorded at a station.

Currently the term Station can refer either to an Array or to an individual Station. We could use the term Station for both multiple-site Arrays and single-site Stations. In software, a Station object could thus have a list of "Sub-Stations".

Instrument Response MetaData

An Array consists of a number of seismic measuring Stations. Each seismic measuring Station has a location and a number of instruments related to it. Each Instrument can measure data on a number of Channels. Each Channel has the following components:

A seismometer sensor. This outputs a voltage dependent on displacement.
An anti-aliasing filter with some response.
A digitiser which samples at some frequency with some frequency response.
A set of filters with appropriate frequency responses.

The frequency response of the seismometer is available either as a set of poles and zeros or as a frequency to amplitude/phase table. These define the frequency-dependent voltage output generated by a given displacement in nanometers. These values are defined for a particular instrument type, are provided by the manufacturer and are not changed. Sometimes the sensor's frequency response is provided as a velocity- or acceleration-based set of values. These would be converted to a displacement-based set for the BDS system. (We certainly want these output in displacement units; how we store them, we will have to think about.)
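For a pole/zero description, the velocity-to-displacement conversion mentioned above amounts to adding zeros at the origin: velocity is s times displacement in the Laplace domain, so a response H_v(s) to velocity becomes s·H_v(s) to displacement, and an acceleration response needs two such zeros. The following is a minimal sketch, assuming a hypothetical `Paz` record (poles and zeros in radians per second plus a scalar gain); the record and function names are illustrative only, not part of any existing BDS code.

```python
from dataclasses import dataclass, field

@dataclass
class Paz:
    """Hypothetical pole/zero record: H(s) = gain * prod(s - z) / prod(s - p), s in rad/s."""
    poles: list[complex] = field(default_factory=list)
    zeros: list[complex] = field(default_factory=list)
    gain: float = 1.0

# Extra zeros at s = 0 needed to turn a response to each input quantity into a
# response to displacement (each time derivative contributes one factor of s).
EXTRA_ZEROS = {"displacement": 0, "velocity": 1, "acceleration": 2}

def to_displacement(paz: Paz, input_quantity: str) -> Paz:
    """Return a displacement-input copy of a velocity- or acceleration-input response.

    A sensor with response H_v(s) to velocity V(s) = s * X(s) has response
    s * H_v(s) to displacement X(s): one zero at the origin per derivative.
    """
    n = EXTRA_ZEROS[input_quantity]  # raises KeyError for an unknown quantity
    return Paz(poles=list(paz.poles),
               zeros=list(paz.zeros) + [0j] * n,
               gain=paz.gain)
```

The gain value itself is unchanged; only the unit it refers to changes (for example from V per nm/s to V per nm). The original record is copied rather than modified, so the manufacturer's values stay intact.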

The anti-aliasing filter, digitiser and post filters each have a frequency response that is given as a pole/zero table, a frequency to amplitude/phase table or a set of FIR coefficients. These filters are fairly "flat" over the region of interest compared to the seismometer's frequency response, so their responses are not normally taken into account.

A manual calibration is performed occasionally, which provides an overall CalibrationScale value at a given CalibrationFrequency for the entire channel, from seismometer movement to digitised value in the file. This allows the seismic movement to be expressed in nanometers.

If a user wants basic seismic data they simply multiply each sample's value by the CalibrationScale value. Geotool can do this.
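The basic conversion is one multiplication per sample. A minimal sketch, assuming the CalibrationScale value is expressed in nanometers per count (as the Calibration table later in this document states for CalibrationFactor); the function name is hypothetical.

```python
def counts_to_nm(samples: list[int], calibration_scale: float) -> list[float]:
    """Scale raw digitiser counts to ground displacement in nanometers.

    calibration_scale is assumed to be in nm/count. It was measured at the
    CalibrationFrequency, so the result is only exact at that frequency.
    """
    return [s * calibration_scale for s in samples]

# Example: three samples with an assumed scale of 0.5 nm/count.
print(counts_to_nm([100, -250, 0], 0.5))  # [50.0, -125.0, 0.0]
```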

If a user wants more detail they can get the frequency response of the seismometer and perhaps the other filters, normalize these at the CalibrationFrequency and apply the inverse transformation to the data set as well as multiply by the CalibrationScale value.

In the old system, to simplify the latter procedure, the pzconst value is calculated from the pole/zero response at the calibration frequency. This provides a simple scale value that can be applied to the data's values to normalise the seismometer's pole/zero table.
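The normalisation described in the last two paragraphs can be sketched as follows, assuming (this is not confirmed above) that pzconst is the reciprocal of the pole/zero response magnitude at the calibration frequency, so that the normalised response has unit amplitude there. The function names are hypothetical; poles and zeros are in radians per second.

```python
import math

def paz_response(poles, zeros, freq_hz):
    """Evaluate H(s) = prod(s - z) / prod(s - p) at s = i * 2 * pi * f."""
    s = 2j * math.pi * freq_hz
    h = complex(1.0)
    for z in zeros:
        h *= s - z
    for p in poles:
        h /= s - p
    return h

def pzconst(poles, zeros, calib_freq_hz):
    """Assumed definition: scale making |pzconst * H(f_cal)| equal to 1."""
    return 1.0 / abs(paz_response(poles, zeros, calib_freq_hz))

def normalised_response(poles, zeros, calib_freq_hz, freq_hz):
    """Pole/zero response normalised to unit amplitude at the calibration frequency."""
    return pzconst(poles, zeros, calib_freq_hz) * paz_response(poles, zeros, freq_hz)
```

Dividing the spectrum of the data by `normalised_response` and multiplying by the CalibrationScale value would then be the inverse transformation described above; that step, which in practice needs a water level or band limit to avoid dividing by near-zero values, is not shown.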

The following tables provide a possible high-level, ordered list of the information available. Its order is based on the overall structure of the system. It is a high-level list and does not provide detailed information.

Network

This defines an organisation that is responsible for a number of seismic Arrays or provides data for a number of Arrays. Each Network organisation may have its own database with different settings for various parameters such as CalibrationScale.

Item	Description
Id	Unique integer ID
Name	The Network name. ( etc)
Arrays[]	The list of arrays used by this Organisation. (Maybe this should also include Stations separately ??)

Array

A seismic measuring array that consists of a number of Stations.

Item	Description
Id	Unique integer ID
Name	The Array's unique name. (YKA etc)
Stations[]	The list of stations within this array

Station

A site where a set of instruments is located.

Item	Description
Id	Unique integer ID
Name	The Station's name. (A pit name)
Locations[]	Location: latitude and longitude in degrees using the WGS84 datum, the ground level elevation in meters from the WGS84 ellipsoid (Sea level) and the depth of the sensor in a pit in meters. (The IDC standard is a little vague on this; we will have to check, and might need information to denote the standard(s) used. The geographical information about a site is usually quite static.) This is a list of locations with a TimePeriod for each location, as Stations can be moved.
TimePeriods[]	A list of time periods during which the Station was operational (or not operational?)
Instruments[]	A list of all of the instruments at this site.

Instrument

A measuring instrument that can have a number of channels.

Item	Description
Id	Unique integer ID
Name	The Instrument's name.
Channels[]	A list of all of the channels this Instrument has

Channel

An individual data channel.

Item	Description
Id	Unique integer ID
Name	The channel name. (Such as SHZ, BHZ)
LocationId	An ID for the location of the instrument. (Such as "00" for the borehole and "10" for the position on "Pier 1" in the vault at Wolverton.) (This might be the same as Name and could therefore be removed.)
Calibration[]	Calibration measurements. A list of calibration measurements by date
Sensor[]	The Sensors used. A list of sensors by date
Digitiser[]	The Digitisers used. A list of Digitisers by date
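Several items above (Calibration[], Sensor[] and Digitiser[], as well as a Station's Locations[]) are lists of records, each valid over a time period, so processing a sample means finding the record in force at the sample's time. A minimal sketch, assuming hypothetical `TimePeriod` and `Calibration` records, half-open [start, end) periods and an open end for a record still in use; none of these names come from an existing schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, TypeVar

@dataclass(frozen=True)
class TimePeriod:
    start: datetime
    end: Optional[datetime] = None  # None means "still in use"

    def contains(self, t: datetime) -> bool:
        """Half-open test: start <= t < end."""
        return self.start <= t and (self.end is None or t < self.end)

@dataclass(frozen=True)
class Calibration:
    period: TimePeriod
    calibration_frequency: float  # Hz
    calibration_factor: float     # nm/count

T = TypeVar("T")

def valid_at(records: Sequence[T], t: datetime) -> Optional[T]:
    """Return the first record whose .period contains t, or None if none does."""
    for rec in records:
        if rec.period.contains(t):
            return rec
    return None
```

Using half-open periods means a record ending at a given instant and its successor starting at the same instant never both match.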

Sensor

A measurement sensor. This would be a seismometer or could be another unit such as a Hydrophone.

Item	Description
Id	Unique integer ID
TimePeriod	The time period the sensor was in use
Name	The sensor name. (This may not be needed)
Type	The sensor's type. (Seismometer, Hydrophone etc)
Model	The sensor's model name. The Vendor make/type of sensor
SerialNumber	The sensor's serial number
Response	Frequency response of the sensor as an array of pole/zero values or as an amplitude/phase table
HorizontalAngle	Seismometer placement horizontal angle in degrees clockwise from north
VerticalAngle	Seismometer placement vertical angle in degrees with zero = vertically up
Gain	The gain setting. For information only. Set to 0 if unknown.

Digitiser

The digitiser used. In reality the sensor/digitiser could be a single unit.

Item	Description
Id	Unique integer ID
TimePeriod	The time period the digitiser was in use
Name	The digitiser name. (This may not be needed)
Model	The digitiser model name. The Vendor make/type of instrument
SerialNumber	The digitiser's serial number
SamplingFrequency	The sampling frequency in Hz
Response[]	Array of responses for each module (Anti-aliasing filter, Digitiser, post filter etc)
Gain	The gain setting. For information only. Set to 0 if unknown.

Calibration

A calibration measurement.

Item	Description
Id	Unique integer ID
TimePeriod	The time period this calibration applies to
CalibrationFrequency	The frequency that the CalibrationScale value is valid for
CalibrationFactor	The scaling value to apply to the data to normalise to Nanometers. This is a measured value at the calibration frequency and is in Nanometers/Count.

Response

A frequency response. This can store the response as a pole/zero table, an amplitude/phase table or a set of FIR coefficients.

Item	Description
Id	Unique integer ID
Name	The response name. (Sensor, AntiAlias, Digitiser)
PoleZeros	Frequency response defined by an array of pole/zero values in radians per second.
AmplitudePhaseTable	Frequency response defined by an array of amplitude/phase values with respect to frequency.
FirCoefficients	Frequency response defined by an array of coefficients.

Gain	The overall gain of the filter (At what frequency ???)
Decimation	The amount of decimation applied ??
GroupCorrectionApplied	The group delay correction applied in seconds ??
Symmetry	Symmetry for FIR coefficients (A = asymmetric, B = symmetric[odd], C = symmetric[even]) ??
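The Symmetry field exists because a symmetric FIR filter only needs about half of its coefficients stored. A minimal sketch of rebuilding the full coefficient list, assuming the A/B/C codes follow the SEED convention (B: odd number of taps, first half plus the centre tap stored; C: even number of taps, first half stored); the function name is hypothetical.

```python
def expand_fir(stored: list[float], symmetry: str) -> list[float]:
    """Rebuild the full FIR coefficient list from its stored part.

    A - asymmetric: every coefficient is stored
    B - symmetric, odd length: first half plus the centre tap are stored
    C - symmetric, even length: first half is stored
    """
    if symmetry == "A":
        return list(stored)
    if symmetry == "B":
        # Mirror everything except the centre tap (the last stored value).
        return list(stored) + stored[-2::-1]
    if symmetry == "C":
        # Mirror every stored value.
        return list(stored) + stored[::-1]
    raise ValueError(f"unknown symmetry code {symmetry!r}")
```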

Notes:

This is a high level overview.
No information has been included as to the Network (Organisation). This would allow the choice of different sets of parameters dependent on the Organisation's own settings.
It may be worth pre-calculating the sensor's pole/zero normalisation scale value (pzconst) and storing it in a cache for performance reasons.
It might be worth renaming a Channel as Instrument and losing the Instrument entry above.

Data Storage Meta Data

There needs to be information on the available data sets stored in the archive. The following is the important information required:

Item	Description
Id	Unique integer ID
Network	The Network organisation the original data is from
Array/Station	The Array/Station the data is from.
Period	The time period of the data
Source	Some information on the source of this data (DIRECT, TAPE, PROCESSED ...). This would allow multiple sources of data from the same Array and Period. It could also support processed data.
Location	The location of the data. This could be a local archive or even a remote archive
Type	The type of the data, basically the data format.
URL	File location. This can embody a protocol, host, path and filename.
Comment	A general comment string

For any given Array and Period there can be multiple sets of data. These can come from different Network sources and could also include pre-processed data.
The Array and Period information would be used to look up the Channel and Instrument information from the Instrument Response Meta Data.
We may need to add information here on the channels contained in the dataset. This could include channel swaps etc.
The NetCDF format could be used.
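Since one Array and Period can have several data sets, a request for a time window has to return every overlapping entry. A minimal sketch, assuming a hypothetical in-memory `DataSet` record carrying some of the fields from the table above and half-open [start, end) periods; an archive database would apply the same overlap test in its query.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class DataSet:
    id: int
    network: str
    array: str
    start: datetime
    end: datetime  # exclusive
    source: str    # e.g. DIRECT, TAPE, PROCESSED
    url: str

def find_datasets(catalogue, array, start, end):
    """All data sets for `array` whose [start, end) overlaps the request, oldest first."""
    hits = [d for d in catalogue
            if d.array == array and d.start < end and start < d.end]
    return sorted(hits, key=lambda d: (d.start, d.id))
```

Two half-open intervals overlap exactly when each one starts before the other ends, which is the whole test.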

Arrival/Event Meta Data

There is a fair amount of information to be stored for Arrival handling. The main information, as in the IDC database, is held in the following database tables:

Arrival	General information on an arrival.
Stassoc	Summary information on groups of Arrivals.
Origin	Information on the derived origin of events.
Expio	Information on explosions

The event information system will need some looking at; however, its information can be separated from the main BDS data. The only link between the two data sets will probably be the time period of the event.

It may be useful to store processed data sets in the main BDS system linked to particular events.

Other Meta Data

There are a number of other items of Meta Data to be stored. The main sets of these include:

System Outages	Information on known system outages
User Information	User ID and other login and security information
Data Request Queuing	Data request queuing and bandwidth management
Logs/Statistics	Logs and statistics of use
Notes	This could be a list of notes on the Meta Data that can be added for information

The content of most of these items depends on the operation of the BDS itself; it is not crucial to the seismic data itself.

Seismic Raw Data

The seismic data consists of a number of channels of sampled seismic amplitude values from an Array's Stations. The seismic data is stored in files in various formats. It is proposed that the BDS should initially be able to handle the following formats:

External Data Formats

BKNAS 1.0/2.0	The Blacknest standard data format 1.0 and 2.0
IMS 2.0	The IDC IMS data format

Data Storage Formats

BDRS	BDRS data files are made up of 4012-byte blocks, each containing a 6-byte header, 4000 data bytes (100 two-byte data points from each of 20 channels) and a 6-byte footer.
WRA	WRA40, WRA64 and WRA-AGSO.
GCF	Guralp compressed format. File and streaming format.
SEED	Standard for the Exchange of Earthquake Data format.
TapeDigitiser	The Blacknest TapeDigitiser file format for Digitised Analogue Tapes.

The following provides the core features of these formats.

TapeDigitiser

Overview	Developed for the TapeDigitiser project to store information from sampled old analogue tapes. Contains a lot of Meta Information about the quality of tape and digitisation process.
Structure	ASCII header. Normally 24 Channels, blocked in variable length blocks of about 12.5 seconds of data per block.
Raw data format	32bit floating point multiplexed data
Sampling Rate	100Hz, but can vary due to tape speed fluctuations (~0.1%)
Meta Data	Meta data on Array and Tape name, time, and details of the Tape Digitisation process.
Misc	Two channels are error channels, one channel is a VELA time-code channel.
Amount

BKNAS

Overview	Blacknest AutoDRM data format. Available in Version 1.0 and 2.0. No compression.
Structure	Up to 35 Channels of continuous data. Multiplexed ASCII data. One set of channel samples per line.
Raw data format	ASCII Integer
Sampling Rate	20Hz normally
Meta Data	Array/Station, Channels, Original Data Type, MasterTapeNumber, TapeFileNumber, Location (no depth), Event information, Geographic region number and name, DigitisingOffset?,
Misc	Cannot provide all of the information that will be available.
Amount

BKNAS 1 imposes some limits on the number of channels. Subroutine "afdtr.f", which reads in BKNAS 1 and 2 data for "Apple" (our only program that seriously uses BKNAS-format data), has array size 35 for the arrays containing the channel information. Also, the digital samples are read in with format statement "40I6", which both imposes a limit of 40 channels and restricts the dynamic range to 9x10^5 digital counts (an I6 field holds integers from -99999 to 999999).
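Both limits follow from the fixed-width layout: "40I6" means up to 40 integers per line, each right-justified in a 6-character field. A minimal sketch of reading one such sample line in Python, assuming Fortran's default behaviour of padding short records with blanks and ignoring blanks inside numeric fields (so an all-blank field reads as 0); the function name is hypothetical and this is not a port of "afdtr.f".

```python
FIELD_WIDTH = 6  # "I6": one integer per 6-character field
MAX_FIELDS = 40  # "40": at most 40 fields per line

def parse_40i6_line(line: str, n_channels: int) -> list[int]:
    """Read n_channels integers from one sample line written with FORMAT(40I6)."""
    if not 0 < n_channels <= MAX_FIELDS:
        raise ValueError(f"40I6 holds 1 to {MAX_FIELDS} values per line")
    # Pad a short record with blanks, as Fortran does by default.
    line = line.rstrip("\r\n").ljust(n_channels * FIELD_WIDTH)
    values = []
    for k in range(n_channels):
        text = line[k * FIELD_WIDTH:(k + 1) * FIELD_WIDTH]
        # Blanks are ignored; an all-blank field reads as 0.
        text = text.replace(" ", "")
        values.append(int(text) if text else 0)
    return values
```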

IMS 2.0

Overview	IDC AutoDRM data format. Available in Version 1.0 and 2.0. Compressed or un-compressed.
Structure	Up to 35 Channels of continuous data. Separate sets of data per channel.
Raw data format	This can be in: INT(ASCII Integers), CM6(compressed integers described in 6bit ASCII), CM8(compressed integers described in 8bit ASCII), CSF(a sub-format for authenticated data)
Sampling Rate	Any
Meta Data	Network, Station, Channels StationCode, FDSN ChannelCode, AuxCode, InstrumentType, CalibrationScale, CalibrationFrequency, SampleRate, Start/Stop times, PAZ/FAP/FIR Tables, Event information
Misc	Maximum 100MByte message size, can have multiple messages using a REF_ID field. Maximum line length 1,024 characters.
Amount

BDRS

Overview	Very old and basic data storage format. No compression.
Structure	4010 byte Block based, with simple header (Part of array name + Date/Time only). 20 Channels of data.
Raw data format	16bit Integer in little-endian binary format.
Sampling Rate	Normally 20Hz
Meta Data	None other than Part of an array name and Date/Time (Just last digit of year)
Misc	The directory path and file name is used to define the Array, the Year and Day.
Amount
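The BDRS block layout given in the Data Storage Formats table (6-byte header, 100 two-byte samples for each of 20 channels, 6-byte footer) can be read with Python's `struct` module. A minimal sketch, assuming the samples are signed 16-bit little-endian values stored multiplexed (one sample per channel in turn), which the tables do not state; the header and footer are returned undecoded because their layout is not described here. The sizes follow the 6 + 4000 + 6 byte breakdown and would need adjusting if blocks prove to be 4010 bytes. Function names are hypothetical.

```python
import struct

HEADER_BYTES = 6
FOOTER_BYTES = 6
CHANNELS = 20
SAMPLES_PER_CHANNEL = 100
N_VALUES = CHANNELS * SAMPLES_PER_CHANNEL                 # 2000 samples
DATA_BYTES = 2 * N_VALUES                                 # 4000 bytes
BLOCK_BYTES = HEADER_BYTES + DATA_BYTES + FOOTER_BYTES    # 4012 bytes
DATA_FMT = f"<{N_VALUES}h"  # '<' little-endian, 'h' signed 16-bit

def read_bdrs_block(block: bytes):
    """Split one block into (header bytes, per-channel sample lists, footer bytes)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    header = block[:HEADER_BYTES]
    values = struct.unpack_from(DATA_FMT, block, HEADER_BYTES)
    # Assumed multiplexed order: sample 0 of channels 0..19, then sample 1, ...
    channels = [list(values[ch::CHANNELS]) for ch in range(CHANNELS)]
    footer = block[HEADER_BYTES + DATA_BYTES:]
    return header, channels, footer

def iter_bdrs_file(path):
    """Yield decoded blocks from a BDRS file, rejecting a truncated last block."""
    with open(path, "rb") as f:
        while block := f.read(BLOCK_BYTES):
            if len(block) < BLOCK_BYTES:
                raise ValueError("truncated final block")
            yield read_bdrs_block(block)
```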

WRA 40 and 64

Overview	Old basic data storage format. No compression.
Structure	WRA40: 32768 Byte blocks with a 256Byte header. 40 Channels of data. WRA64: 53248 Byte blocks with a 256Byte header. 64 Channels of data.
Raw data format	Binary 16bit signed values in little-endian format.
Sampling Rate	20Hz ?
Meta Data	None other than Date/Time
Misc	The directory path and file name is used to define the Array, the Year and Day.
Amount

WRA-AGSO

Overview	Old basic data storage format, but with compression.
Structure	ASCII file header followed by 256 Byte binary data blocks or 256 Byte ASCII blocks.
Raw data format	16bit signed integers in double-difference compressed format identical to that used in GSE2.1 CM8 (Same as IMS CM8?). Non-multiplexed data (one channel at a time).
Sampling Rate	20Hz
Meta Data	StationCode, SamplingRate, ChannelName, CalibrationFactor
Misc	The directory path and file name is used to define the Array, the Year and Day.
Amount
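The "double-difference" step is what makes this compression effective: seismic traces are usually smooth, so second differences are small and need fewer bits than the samples themselves. A minimal sketch of the differencing and its inverse, assuming the value before the first sample is taken as zero; the subsequent packing of the differences into 8-bit (CM8) or 6-bit ASCII (CM6) characters is not shown, and the function names are hypothetical.

```python
from itertools import accumulate

def first_differences(values: list[int]) -> list[int]:
    """x[0] - 0, x[1] - x[0], ...: the value before the first sample is taken as 0."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def double_difference(samples: list[int]) -> list[int]:
    """Apply first differencing twice."""
    return first_differences(first_differences(samples))

def undo_double_difference(dd: list[int]) -> list[int]:
    """Two running sums invert the two differencing passes exactly."""
    return list(accumulate(accumulate(dd)))
```

For example, the samples 100, 102, 105, 107, 108 become 100, -98, 1, -1, -1: apart from the first two values, every difference fits in a few bits.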

GCF

Overview	Guralp compressed format. Designed for streaming data as well as storage. A low level compressed format with little MetaData.
Structure	A file or stream format. Consists of a sequence of variable length blocks, which can be up to 1024 bytes long. Blocks define data for an individual channel.
Raw data format	Compressed Binary either: 32-bit differences, 16-bit differences or 8-bit differences
Sampling Rate	Defined in block headers
Meta Data	SystemId, StreamId, Date/Time, SamplingRate,
Misc	The block duration is always a whole number of seconds, and always starts on a whole second boundary.
Amount
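Storing first differences as 32-, 16- or 8-bit values suggests that an encoder picks, block by block, the narrowest width that holds every difference. A simplified sketch of that choice, not the Guralp specification: the block header, the treatment of the first sample and any block size rules are not modelled, and the function name is hypothetical.

```python
def diff_width_bytes(samples: list[int], previous: int = 0) -> int:
    """Smallest of 1, 2 or 4 bytes per first difference that fits the whole block.

    `previous` is the last sample of the preceding block (assumed 0 at the start).
    """
    diffs, prev = [], previous
    for s in samples:
        diffs.append(s - prev)
        prev = s
    for width in (1, 2, 4):
        lo = -(1 << (8 * width - 1))
        hi = (1 << (8 * width - 1)) - 1
        if all(lo <= d <= hi for d in diffs):
            return width
    raise ValueError("a difference does not fit in 32 bits")
```

A quiet block might then need a quarter of the space of 32-bit samples, while a block containing one large jump falls back to a wider encoding.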

SEED

Overview	This is an extensive well thought out format. It provides full data and meta-data capability. It is quite complex and big. It is a bit old fashioned in places and looks like it would be awkward to use.
Structure	Blockette based. A Blockette is a variable-length lump of information with an integer type field. The file consists of a sequence of headers and variable numbers of Blockettes. Blockettes describe data and meta information.
Raw data format	Various ASCII and binary, compressed and non-compressed formats are provided. There may be issues with multiplexed channel data: the SEED manual states that it is not desirable to use multiplexed channel data in SEED files, although it is possible.
Sampling Rate	Any
Meta Data	Full meta-data available and can be extended (with some difficulty).
Misc	May not support streaming that well. Would be difficult to add additional meta-Data although it can be done.
Amount

Some notes on these data formats:
The BDRS, WRA-40, WRA-64 and TapeDigitiser formats are uncompressed. If there is a large amount of data in these formats, it should be stored in a compressed format.

Only the SEED and TapeDigitiser formats have any appreciable ability to store Meta Information.
The BDRS, WRA-40, WRA-64, WRA-AGSO, GCF and TapeDigitiser formats can easily be stored in another format without any loss of information. Due to the extensive abilities of the SEED format, some loss of information could be incurred when storing in another format unless that format has the same abilities.

Compression

Have a look at the IMS CM6 compression system.
The SEED manual has good descriptions of some methods:

CODES 10 - 29 FDSN Networks
10            STEIM (1) Compression
11            STEIM (2) Compression
12            GEOSCOPE Multiplexed Format 24 bit integer
13            GEOSCOPE Multiplexed Format 16 bit gain ranged, 3 bit exponent
14            GEOSCOPE Multiplexed Format 16 bit gain ranged, 4 bit exponent
15            US National Network compression
16            CDSN 16 bit gain ranged
17            Graefenberg 16 bit gain ranged
18            IPG - Strasbourg 16 bit gain ranged
19            STEIM (3) Compression

What levels of compression are possible? To find out, we need to store a set of example data in the different formats.
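A first comparison needs no special tools: compress a trace's raw samples, and their first differences, with a general-purpose compressor and compare the sizes. A minimal sketch using Python's `zlib`, with a synthetic sine-plus-noise trace standing in for real data; its figures say nothing about real recordings, which is why example data are needed. The function names are hypothetical.

```python
import math
import random
import struct
import zlib

def compression_ratio(samples: list[int]) -> float:
    """Size of the samples as 32-bit little-endian integers divided by zlib's output size."""
    raw = struct.pack(f"<{len(samples)}i", *samples)
    return len(raw) / len(zlib.compress(raw, 9))

def first_diff(samples: list[int]) -> list[int]:
    """First differences, with the value before the first sample taken as 0."""
    return [b - a for a, b in zip([0] + samples[:-1], samples)]

if __name__ == "__main__":
    rng = random.Random(1)
    trace = [int(1000 * math.sin(2 * math.pi * i / 200) + rng.gauss(0, 5))
             for i in range(20000)]
    print(f"raw samples:       {compression_ratio(trace):.2f}:1")
    print(f"first differences: {compression_ratio(first_diff(trace)):.2f}:1")
```

Running the same measurement over real BDRS, WRA and TapeDigitiser files, and over the same data in CM6 or Steim form, would give the comparison asked for above.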

Notes

Should all station codes in the world be unique and registered with the ISC in the United Kingdom/the National Earthquake Information Center (NEIC) in the United States? Answer: This is not always the case, but they are unique per Network.
Channel code standards are defined in the IMS "Formats and Protocols for Messages" document, and also in the SEED manual. Do all codes follow these rules? Answer: Yes, we will use these codes. There may be some two-letter codes such as BZ or SZ, or lower-case codes such as bz and sz, which will be converted.
There are some Instrument type codes defined in the SEED manual (Appendix A). Do all codes follow these rules?
The IMS format has information on Elevation and Depth. Why are the two given ? Answer: The "depth" value tells the seismologist that an instrument is in a borehole. This is useful information. The elevation is required for e.g. synthetic seismogram calculation. So both values are required.
I note that the SEED format has the possibility of flagging that a Leap Second occurred during a block of data. Do we need to handle Leap Seconds ? Answer: Handling Leap Seconds would be good. GPS time does not include Leap Seconds, although a regularly broadcast message notes how far GPST and UTC are apart.
What date time accuracy should we use: Seconds, Milli-Seconds ? Answer: Milli-Seconds.
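Millisecond times and the GPS/UTC offset can both be handled with integer arithmetic. A minimal sketch, assuming times are held as integer milliseconds and that GPS times are converted to UTC with a hard-coded table of the GPS-UTC offset, which must be extended whenever a new leap second is announced. Python's `datetime` cannot represent a leap second itself (23:59:60), so GPS times falling inside one map onto the following second. The function names are hypothetical.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # 00:00:00 UTC, start of GPS time

# (UTC date from which the offset applies, GPS - UTC in seconds)
LEAP_TABLE = [
    (datetime(1981, 7, 1), 1), (datetime(1982, 7, 1), 2), (datetime(1983, 7, 1), 3),
    (datetime(1985, 7, 1), 4), (datetime(1988, 1, 1), 5), (datetime(1990, 1, 1), 6),
    (datetime(1991, 1, 1), 7), (datetime(1992, 7, 1), 8), (datetime(1993, 7, 1), 9),
    (datetime(1994, 7, 1), 10), (datetime(1996, 1, 1), 11), (datetime(1997, 7, 1), 12),
    (datetime(1999, 1, 1), 13), (datetime(2006, 1, 1), 14), (datetime(2009, 1, 1), 15),
    (datetime(2012, 7, 1), 16), (datetime(2015, 7, 1), 17), (datetime(2017, 1, 1), 18),
]

def gps_ms_to_utc(gps_ms: int) -> datetime:
    """Convert milliseconds since the GPS epoch to a naive UTC datetime."""
    gps_clock = GPS_EPOCH + timedelta(milliseconds=gps_ms)
    offset = 0
    for utc_start, seconds in LEAP_TABLE:
        # The change point expressed on the GPS time scale.
        if gps_clock >= utc_start + timedelta(seconds=seconds):
            offset = seconds
    return gps_clock - timedelta(seconds=offset)

def to_epoch_ms(t: datetime) -> int:
    """Milliseconds since 1970-01-01, ignoring leap seconds as Unix time does."""
    return (t - datetime(1970, 1, 1)) // timedelta(milliseconds=1)
```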