Note that this is a "work in progress" document and is being continually updated.
Introduction
This document provides an overview of the Blacknest Data System (BDS) Database Schema.
Comments in italics by SP 3rd April 2008
Database Schema Overview
The BDS Database holds information on all of the Seismic data and associated metadata. The database will be implemented as a MySQL database. Generally the database is not directly accessed. It is normally only accessed by the BDS Data Engine in response to API requests.
General
Dates and times: These need to be stored to the nearest ms. We could use a MySQL DateTime + an Integer for the ms. Or we could simply use a String formated using ISO 8601format: eg "2002-04-03T11:20:53.103". An entry of 000-00-00T00:00:00.000 is used for unknown. I prefer to use DateTime rather than a string. Both have the advantages over epochal time that we can read dates/times from raw database dumps, but only DateTime allows us to devise queries more easily to get specific dates and times. A string would require pre-processing before any query; and might give wrong answers to direct queries if lexical greater-than/less-than gives different answers from numerical greater-than/less-than. If we are not bothered about milliseconds then we can use simple less-than and greater-than queries on the DateTime column; but how will we do date/time comparisons when we do need them to the nearest millisecond?
Overall Management
Groups
The list of user groups.
Name
Type
Description
group
String
The Group name
description
String
The Groups description
permissions
String
Comma separated list of permissions (or separate table)
Users
The list of users.
Name
Type
Description
id
Integer
Unique ID for this entry. (Used as the userId)
name
String
The users full name
loginId
String
The users login ID
loginPassword
String
The users login password (Encrypted)
emailAddress
String
The users email address
telephone
String
The users telephone number
enabled
Bool
If this user is enabled
UserGroups
Used to keep the list of groups a user belongs to.
Name
Type
Description
userId
Integer
The userId for this group
group
String
A group this user belongs to
Overall Information MetaData
I suggest that all the tables have a "date when last altered" column that is automatically updated when the row is modified.
I have added a column, "Key", to all the tables below, to denote which row or rows constitute the primary key to the table, i.e. this row or rows has to be unique within the table, so that any search for a single value of the primary key returns no more than one row of the table. For most of these tables the primary key is the combination of netework, station, channel and start and stop dates/times.
Networks
The list of networks.
Name
Type
Description
Key
network
String
The Network name
Primary
description
String
The Network description
NetworkStations
Used to keep the list of all stations managed by a network.
Name
Type
Description
Key
network
String
The Network name
Primary
station
String
The Station/Array name
Stations
Used to keep the list of all station names. If a station is only managed by one network then we could integrate the NetworkStations table with this one. How do we denote stations that are both an array and a co-located three-component set? Will we want to do searches for three-component sets that include those co-located with an array? At present our co-located three-component set has a different name (EKB) from the array (EKA), but will this change after the upgrade? We could have duplicate entries, one to denote the array, one for the three-component set, but we will not be able to use "station" as a primary key if it is not unique within the table. Also, is this table going to contain all the pits within an array? Will they have a different station type from an isolated single station? (they don't in IDC table "site" - array pits are all designated "ss" - "single station")
Name
Type
Description
Key
station
String
The Station/Array name
Primary
type
String
The station type (Station, Array)
Channel MetaData
StationLocations
Used to keep the list of all station locations
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
latitude
Double
The Latitude in degrees using the WGS84 datum.
longitude
Double
The longitude in degrees using the WGS84 datum.
elevation
Double
The ground level elevation in metres from the WGS84 ellipsoid (Sea level)
arrayOffsetEast
Double
For a station in an array, this is the offset from the array's centre in km
arrayOffsetNorth
Double
For a station in an array, this is the offset from the array's centre in km
Are any of our station locations NOT relative to WGS84? Do we know? If any, do we know which reference WAS used, and who's going to do the conversion and how?
Channels
Used to keep the list of all channels. I have added "locationId" to this table since two instruments running simultaneously at the same station can have the same channel name, e.g., surface and borehole instruments at WOL when the surface instrument was mis-oriented were both called "BH1", "BH2" and "BH3".
Name
Type
Description
Key
startTime
String
The Start time for this channel
Primary
endTime
String
The End time for this channel
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The Channels name
locationId
String
The location ID (usually two digits, e.g. "01")
Calibrations
Used to keep the list of all calibration entries for a channel. Location ID (see above) must appear as a primary key wherever "channel" appears as one.
Name
Type
Description
Key
startTime
String
The Start time for this channel
Primary
endTime
String
The End time for this channel
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The Channel name
Primary
locationId
String
The location ID (usually two digits, e.g. "01")
Primary
calibrationFrequency
Double
The frequency that the CalibrationFactor value is valid for (Hz)
calibrationFactor
Double
The scaling value to apply to the data to normalise to nanometres. This is a measured value at the calibration frequency and is in nanometres/count. What about instruments that aren't seismometers?
units of gain?
String
Units in which gain is quoted, e.g. volts per (m/s), volts per Pa
Instruments
Used to keep the list of all instruments for a channel
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The channel name
Primary
locationId
String
The location ID (usually two digits, e.g. "01")
Primary
instrument
String
The instrument name
type
String
The type of instrument - e.g. seismometer, infrasound microphone, hydrophone, wind gauge, thermometer - ?
model
String
The instrument model name. The Vendor make/type of instrument. Only used when there is a unique physical instrument
serialNumber
String
The instrument's serial number. Only used when there is a unique physical instrument.
Digitisers
Used to keep the list of all digitisers for a channel
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The channel name (see below, table Sensor)
Primary
locationId
String
The location ID (usually two digits, e.g. "01")
Primary
digitiser
String
The digitiser's model name
model
String
The digitiser's model name. Only used when there is a unique physical digitiser
serialNumber
String
The digitisers's serial number. Only used when there is a unique physical digitiser
samplingFrequency
Double
The sampling frequency (Hz)
initialSamplingFrequency
Double
The initial pre-decimation sampling frequency (Hz)
gain
Double
The overall gain of the digitiser at the manufacturers calibration frequency (counts/volt?). (For information only)
Sensors
Used to keep the list of all sensors for a given channels digitiser. I don't understand this table. As it stands it is a duplicate of "instrument" above. If it is meant to be keeping track of the several channels served by a single digitiser, then it has a "many-to-one" relationship with individual rows of the "digitiser" table (i.e. one digitiser has many sensors). Hence it should have as foreign key the digitiser model and serial number. The "digitiser" table above should not contain a "channel" row if each digitiser is taken to serve many channels.
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The channel name
Primary
locationId
String
The location ID (usually two digits, e.g. "01")
Primary
sensor
String
The sensor name What will this contain? I have lost track of why this is different from the "instrument".
type
String
The type of sensor. (Seismometer, Hydrophone etc)
model
String
The sensor's model name. Only used when there is a unique physical sensor
serialNumber
String
The sensor's serial number. Only used when there is a unique physical sensor
gain
Double
The overall gain of the sensor at the manufacturers calibration frequency. (For information only)
units of gain?
String
Units in which gain is quoted, e.g. volts per (m/s), volts per Pa
SensorLocations
Used to keep the list of sensor locations. The placement angles could be in a separate table.
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The channel name
Primary
sensor
String
The sensor name (see above)
locationId
String
The location ID (usually a two-digit code, e.g. "01")
Primary
depth
Double
The depth below ground level (m)
verticalAngle
Double
The Sensor's placement vertical angle in degrees degrees with zero = vertically up (degrees)
horizontalAngle
Double
The Sensor's placement horizontal angle in degrees clockwise from north (degrees)
Responses
Used to keep the list of all frequency responses for a given channels digitiser and sensors. We may need sensor id/name and digitiser id/name in here.
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network name
Primary
station
String
The Station/Array name
Primary
channel
String
The channel name
Primary
locationId
String
The location ID (see Sensor Locations, above)
Primary
digitiser
String
The digitiser name
sensor
String
The sensors name (may be null if this is a digitiser response)
response
String
The responses name (Sensor,Anti-Aliasing filter, Digitiser, post filter etc)
type
String
The type of response (PoleZero,AmplitudePhase or FIR Coefficients)
data
Blob
PoleZero, AmplitudePhase or FIR Coefficient data
gain
Double
Overall gain at gain frequency.
gainFrequency
Double
Frequency that gain is valid for. (For information)
decimation
Double
Decimation performed post filter
symmetry
String
Symmetry for FIR coefficients (A = asymmetric, B = symmetric[odd], C = symmetric[even])
Seismic Data
Data
Used to keep the list of data files managed by the BDS.
What is the purpose of this table? How does the "unique integer ID" work? Since this ID appears only in this table it can't be used as a foreign key to other tables. I have assumed that the primary keys for this table are the start and end time, network and station, and the "source", and that the purpose of the table is to look up the unique integer ID, in order to use it in other places, as yet unspecified. I have added "channel" and "locationID" to the primary keys, because although the multiplexed data and SEED data files in our archive do contain all the channels for each station, the gcf data are archived per channel/locationID. Note that for the multiplexed and multichannel formats there will be a many-to-one relationship between the primary key suite and the "unique integer ID".
Name
Type
Description
Key
id
Integer
Unique integer ID
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network organisation the original data is from
Primary
station
String
The Station/Array name
Primary
channel
String
The channel name
Primary
locationId
String
The location ID
Primary
source
String
Some information on the source of this data (DIRECT, TAPE, PROCESSED ...). This would allow multiple sources of data from the same Array and TimePeriod. It could also support processed data.
Primary
location
String
The location of the data. This could be a local archive or even a remote archive
format
String
The data's format
url
String
File location. This can embody a protocol, host, path and filename.
comment
String
A general comment string.
channels
String
A list of channels some how ....
Misc Data
Notes
Used to keep the list of data outages and other notes.
Name
Type
Description
Key
startTime
String
The Start time
Primary
endTime
String
The End time
Primary
network
String
The Network organisation the original data is from
station
String
The Station/Array name
Primary
channel
String
The Channel name
Primary
locationId
String
The location ID (see Sensor Locations, above)
Primary
type
String
The entry type: (Outage, Note, Warning etc)
source
String
The source of the note
title
String
A Title for the note
comment
String
A Comment for the note
We will also need database information store on the following: