Account

Blacknest BDS Schema Overview

Date	2008-04-03
Version	0.1

Note that this is a "work in progress" document and is being continually updated.

Introduction

This document provides an overview of the Blacknest Data System (BDS) Database Schema.

Comments in italics by SP 3rd April 2008

Database Schema Overview

The BDS Database holds information on all of the Seismic data and associated metadata. The database will be implemented as a MySQL database. Generally the database is not directly accessed. It is normally only accessed by the BDS Data Engine in response to API requests.

General

Dates and times: These need to be stored to the nearest ms. We could use a MySQL DateTime + an Integer for the ms. Or we could simply use a String formated using ISO 8601 format: eg "2002-04-03T11:20:53.103". An entry of 000-00-00T00:00:00.000 is used for unknown. I prefer to use DateTime rather than a string. Both have the advantages over epochal time that we can read dates/times from raw database dumps, but only DateTime allows us to devise queries more easily to get specific dates and times. A string would require pre-processing before any query; and might give wrong answers to direct queries if lexical greater-than/less-than gives different answers from numerical greater-than/less-than. If we are not bothered about milliseconds then we can use simple less-than and greater-than queries on the DateTime column; but how will we do date/time comparisons when we do need them to the nearest millisecond?

Overall Management

Groups

The list of user groups.

Name	Type	Description
group	String	The Group name
description	String	The Groups description
permissions	String	Comma separated list of permissions (or separate table)

Users

The list of users.

Name	Type	Description
id	Integer	Unique ID for this entry. (Used as the userId)
name	String	The users full name
loginId	String	The users login ID
loginPassword	String	The users login password (Encrypted)
emailAddress	String	The users email address
telephone	String	The users telephone number
enabled	Bool	If this user is enabled

UserGroups

Used to keep the list of groups a user belongs to.

Name	Type	Attributes	Description
userId	Integer		The userId for this group
group	String		A group this user belongs to

Overall Information MetaData

I suggest that all the tables have a "date when last altered" column that is automatically updated when the row is modified. TB: We could use a separate Changes table to handle this.

Networks

The list of networks.

Name	Type	Attributes	Description
network	String	Primary	The Network name
description	String		The Network description

NetworkStations

Used to keep the list of all stations managed by a network.

Name	Type	Attributes	Description
network	String	Primary	The Network name
station	String		The Station/Array name

Stations

Used to keep the list of all station names. If a station is only managed by one network then we could integrate the NetworkStations table with this one. How do we denote stations that are both an array and a co-located three-component set? Will we want to do searches for three-component sets that include those co-located with an array? At present our co-located three-component set has a different name (EKB) from the array (EKA), but will this change after the upgrade? We could have duplicate entries, one to denote the array, one for the three-component set, but we will not be able to use "station" as a primary key if it is not unique within the table. Also, is this table going to contain all the pits within an array? Will they have a different station type from an isolated single station? (they don't in IDC table "site" - array pits are all designated "ss" - "single station")

Name	Type	Attributes	Description
station	String	Primary	The Station/Array name
type	String		The station type (Station, Array, Pit)

Channel MetaData

StationLocations

Used to keep the list of all station locations

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
latitude	Double		The Latitude in degrees using the WGS84 datum.
longitude	Double		The longitude in degrees using the WGS84 datum.
elevation	Double		The ground level elevation in metres from the WGS84 ellipsoid (Sea level)
arrayOffsetEast	Double		For a station in an array, this is the offset from the array's centre in km
arrayOffsetNorth	Double		For a station in an array, this is the offset from the array's centre in km

Are any of our station locations NOT relative to WGS84? Do we know? If any, do we know which reference WAS used, and who's going to do the conversion and how?

Channels

Used to keep the list of all channels. I have added "locationId" to this table since two instruments running simultaneously at the same station can have the same channel name, e.g., surface and borehole instruments at WOL when the surface instrument was mis-oriented were both called "BH1", "BH2" and "BH3".

Name	Type	Attributes	Description
startTime	String	Primary	The Start time for this channel
endTime	String	Primary	The End time for this channel
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String		The Channels name
locationId	String		The location ID (usually two digits, e.g. "01")

Calibrations

Used to keep the list of all calibration entries for a channel. Location ID (see above) must appear as a primary key wherever "channel" appears as one.

Name	Type	Attributes	Description
startTime	String	Primary	The Start time for this channel
endTime	String	Primary	The End time for this channel
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String	Primary	The Channel name
locationId	String	Primary	The location ID (usually two digits, e.g. "01")
calibrationFrequency	Double		The frequency that the CalibrationFactor value is valid for (Hz)
calibrationFactor	Double		The scaling value to apply to the data to normalise to nanometres. This is a measured value at the calibration frequency. For seisometers it is in nanometres/count. The calibrationUnits specify the actual units.
calibrationUnits	String		Unit for calibration Factor, e.g. volts per (m/s), volts per Pa

Instruments

Used to keep the list of all instruments for a channel

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String	Primary	The channel name
locationId	String	Primary	The location ID (usually two digits, e.g. "01")
instrument	String		The instrument name
type	String		The type of instrument - e.g. seismometer, infrasound microphone, hydrophone, wind gauge, thermometer .
model	String		The instrument model name. The Vendor make/type of instrument. Only used when there is a unique physical instrument
serialNumber	String		The instrument's serial number. Only used when there is a unique physical instrument.

Digitisers

Used to keep the list of all digitisers for a channel

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String	Primary	The channel name (see below, table Sensor)
locationId	String	Primary	The location ID (usually two digits, e.g. "01")
digitiser	String		The digitiser's model name
model	String		The digitiser's model name. Only used when there is a unique physical digitiser
serialNumber	String		The digitisers's serial number. Only used when there is a unique physical digitiser
samplingFrequency	Double		The sampling frequency (Hz)
initialSamplingFrequency	Double		The initial pre-decimation sampling frequency (Hz)
gain	Double		The overall gain of the digitiser at the manufacturers calibration frequency (counts/volt?). (For information only)

Sensors

Used to keep the list of all sensors for a given channels digitiser. I don't understand this table. As it stands it is a duplicate of "instrument" above. If it is meant to be keeping track of the several channels served by a single digitiser, then it has a "many-to-one" relationship with individual rows of the "digitiser" table (i.e. one digitiser has many sensors). Hence it should have as foreign key the digitiser model and serial number. The "digitiser" table above should not contain a "channel" row if each digitiser is taken to serve many channels.

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String	Primary	The channel name
locationId	String	Primary	The location ID (usually two digits, e.g. "01")
sensor	String		The sensor name What will this contain? I have lost track of why this is different from the "instrument".
type	String		The type of sensor. (Seismometer, Hydrophone etc)
model	String		The sensor's model name. Only used when there is a unique physical sensor
serialNumber	String		The sensor's serial number. Only used when there is a unique physical sensor
gain	Double		The overall gain of the sensor at the manufacturers calibration frequency. (For information only)
units of gain?	String		Units in which gain is quoted, e.g. volts per (m/s), volts per Pa

SensorLocations

Used to keep the list of sensor locations. The placement angles could be in a separate table.

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String	Primary	The channel name
sensor	String		The sensor name (see above)
locationId	String	Primary	The location ID (usually a two-digit code, e.g. "01")
depth	Double		The depth below ground level (m)
verticalAngle	Double		The Sensor's placement vertical angle in degrees degrees with zero = vertically up (degrees)
horizontalAngle	Double		The Sensor's placement horizontal angle in degrees clockwise from north (degrees)

Responses

Used to keep the list of all frequency responses for a given channels digitiser and sensors. We may need sensor id/name and digitiser id/name in here.

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network name
station	String	Primary	The Station/Array name
channel	String	Primary	The channel name
locationId	String	Primary	The location ID (see Sensor Locations, above)
digitiser	String		The digitiser name
sensor	String		The sensors name (may be null if this is a digitiser response)
response	String		The responses name (Sensor,Anti-Aliasing filter, Digitiser, post filter etc)
type	String		The type of response (PoleZero,AmplitudePhase or FIR Coefficients)
data	Blob		PoleZero, AmplitudePhase or FIR Coefficient data
gain	Double		Overall gain at gain frequency.
gainFrequency	Double		Frequency that gain is valid for. (For information)
decimation	Double		Decimation performed post filter
symmetry	String		Symmetry for FIR coefficients (A = asymmetric, B = symmetric[odd], C = symmetric[even])

Seismic Data

Data

Used to keep the list of data files managed by the BDS.

What is the purpose of this table? How does the "unique integer ID" work? Since this ID appears only in this table it can't be used as a foreign key to other tables. I have assumed that the primary keys for this table are the start and end time, network and station, and the "source", and that the purpose of the table is to look up the unique integer ID, in order to use it in other places, as yet unspecified. I have added "channel" and "locationID" to the primary keys, because although the multiplexed data and SEED data files in our archive do contain all the channels for each station, the gcf data are archived per channel/locationID. Note that for the multiplexed and multichannel formats there will be a many-to-one relationship between the primary key suite and the "unique integer ID".

Name	Type	Attributes	Description
id	Integer		Unique integer ID
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String	Primary	The Network organisation the original data is from
station	String	Primary	The Station/Array name
channel	String	Primary	The channel name
locationId	String	Primary	The location ID
source	String	Primary	Some information on the source of this data (DIRECT, TAPE, PROCESSED ...). This would allow multiple sources of data from the same Array and TimePeriod. It could also support processed data.
location	String		The location of the data. This could be a local archive or even a remote archive
format	String		The data's format
url	String		File location. This can embody a protocol, host, path and filename.
comment	String		A general comment string.
channels	String		A list of channels some how ....

Misc Data

Notes

Used to keep the list of data outages and other notes.

Name	Type	Attributes	Description
startTime	String	Primary	The Start time
endTime	String	Primary	The End time
network	String		The Network organisation the original data is from
station	String	Primary	The Station/Array name
channel	String	Primary	The Channel name
locationId	String	Primary	The location ID (see Sensor Locations, above)
type	String		The entry type: (Outage, Note, Warning etc)
source	String		The source of the note
title	String		A Title for the note
comment	String		A Comment for the note

Changes

Used to keep the list of changes made to the database. Not sure on this.

Name	Type	Description
time	String	The time the change was made
table	String	The Table that was altered. (Should be have a list of tables ?)
id	Integer	The tables entry that was altered (We would need a unique ID in each record for this)(SHould this be a list ?)
user	String	The User that made the change
title	String	A Title for the change
comment	String	A Comment for the change

We will also need database information store on the following:

Data Request queues
Data Requests serviced
Data additions made ?