Xenia Table Schema
This table schema is the latest adaptation of the earlier approaches documented previously.
Note: This table schema is primarily oriented towards platform/sensor type setups like buoys, water level stations, ship obs, etc. I'll probably suggest other similar observation table reference schemas for collection, grid, particle trajectory, and other data types that are sufficiently different as to need their own specialized schema.
My general strategy with the database design is to develop a single general schema that can capture the less dynamic metadata like organization/platform/sensor for everyone, plus a small set of observation table sub-schemas (say fewer than 5 for now) depending on the particular data type and its additional indexing columns or number of dimensions. For instance, for CTDs we also track the 'collection_id' column, an additional index that specifies observations within a specific cast.
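The single-table idea can be sketched in a few lines. This is a minimal, illustrative mock-up in SQLite (the real schema targets Postgres, and the column set here is heavily abbreviated); the multi_obs, m_type_id, m_date, m_value, and collection_id names come from the text, while the sample values are invented.

```python
import sqlite3

# Abbreviated sketch of the single observation table (multi_obs),
# with the extra 'collection_id' index used for CTD casts.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE multi_obs (
        row_id        INTEGER PRIMARY KEY,
        m_type_id     INTEGER NOT NULL,   -- observation type index
        m_date        TEXT,
        m_value       REAL,
        collection_id INTEGER             -- e.g. one specific CTD cast
    )""")
rows = [
    (1, "2007-06-28 12:00:00", 21.5, 1),   # cast 1
    (1, "2007-06-28 12:00:10", 21.3, 1),
    (1, "2007-06-28 14:00:00", 20.9, 2),   # cast 2
]
con.executemany(
    "INSERT INTO multi_obs (m_type_id, m_date, m_value, collection_id) "
    "VALUES (?, ?, ?, ?)", rows)

# All observations of one type within a specific cast:
cast_1 = con.execute(
    "SELECT m_value FROM multi_obs "
    "WHERE m_type_id = 1 AND collection_id = 1 ORDER BY row_id"
).fetchall()
print([v for (v,) in cast_1])  # [21.5, 21.3]
```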
The newer schema represents what I'd like to move to, as the old schema isn't as generalized as I'd like for going forward with new data or products. The old schema uses separate but similar tables for each observation type, whereas the new one sticks with a single table type and uses an observation index. The old schema also mixes in product-specific columns (mostly MapServer/GIS specific in our case) like 'top of the hour', which create a development/maintenance tax on users who might not be interested in those particular products.
Temporal Table Design
see also XeniaReplication
The schema uses the strategy of keeping the same table type/schema (multi_obs) for holding smaller and larger time periods of data:
- table of latest or past few hours of data available (small size tables)
- tables for past 2 weeks (medium size tables)
- tables for older than 2 weeks (large size tables)
- file archives produced by month, broken out by both observation type (all platforms for a specific observation type) and platform (all observation types for a specific platform). The current month is a daily appended file with past days' data.
- data cascades from smaller, more recent tables to larger archival tables/files
This allows queries on recent data to respond quickly and places a certain limit on how large any one table might get for indexing or backup purposes.
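The cascade step can be illustrated with a small SQLite mock-up (the real setup uses Postgres tables and file archives; the table names and the two-week cutoff here are illustrative, following the tiers listed above).

```python
import sqlite3
from datetime import datetime, timedelta

# Sketch: rows older than 2 weeks cascade from a smaller, recent table
# into a larger archival table.
con = sqlite3.connect(":memory:")
for t in ("multi_obs_recent", "multi_obs_archive"):
    con.execute(f"CREATE TABLE {t} (m_date TEXT, m_value REAL)")

now = datetime(2007, 6, 28)
cutoff = (now - timedelta(weeks=2)).strftime("%Y-%m-%d %H:%M:%S")
con.executemany("INSERT INTO multi_obs_recent VALUES (?, ?)", [
    ("2007-06-27 00:00:00", 1.0),   # recent, stays put
    ("2007-06-01 00:00:00", 2.0),   # older than 2 weeks, cascades
])
# Copy old rows down to the archive, then delete them from the recent table.
con.execute("INSERT INTO multi_obs_archive "
            "SELECT * FROM multi_obs_recent WHERE m_date < ?", (cutoff,))
con.execute("DELETE FROM multi_obs_recent WHERE m_date < ?", (cutoff,))

recent = con.execute("SELECT COUNT(*) FROM multi_obs_recent").fetchone()[0]
archived = con.execute("SELECT COUNT(*) FROM multi_obs_archive").fetchone()[0]
print(recent, archived)  # 1 1
```

Because every tier shares the multi_obs layout, the cascade is a plain `INSERT ... SELECT` plus a `DELETE`, with no per-tier transformation.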
Schema and lookup SQL source
The simple schema dump of just the basic table components for capturing observation data is available here.
A sample observation data dictionary population SQL is available here. This contains most of the currently collected observation types for the existing Seacoos database.
A support table population (m_type_display_order) for controlling the display order of the measurement table is available here.
To install this table (assuming Postgres is already installed) run
#as user postgres
##tables repopulated in following order
#install the base tables
psql -d db_xenia_v1 -f db_xenia_v1_simple_schema.sql
#if wanted/needed, add PostGIS and 'the_geom' to the multi_obs table
#install the observation data dictionary
psql -d db_xenia_v1 -f obs_data_dictionary.sql
I've removed any references to PostGIS from earlier versions (the multi_obs table used to contain a 'the_geom' column to store long/lat values) to help keep things simple/separate in case a user doesn't want/need geospatial indexing or wants to use a different product/version than what I'm using.
The xenia schema is set up to run fine without reference to the quality control test tables and the other table schemas to the right of the multi_obs table (in the diagram). Be sure to disable any triggers on the multi_obs table which reference these additional functional tables if you are not using them.
Sample organization/platform/sensor records
A sample organization, platform_type, platform, sensor_type, and sensor population SQL is available here.
I haven't had time to populate it more fully, but it should be enough to get started with the buoy data processing example listed here and give the general idea.
A full example listing is as follows
To install these tables run
#don't run as is - for new regional areas, edit the inserts to reflect the regional organization, platform_type, and sensor_type
##tables repopulated in following order
psql -d db_xenia_v1 -f insert_organization.sql
psql -d db_xenia_v1 -f insert_platform_type.sql
psql -d db_xenia_v1 -f insert_platform.sql
psql -d db_xenia_v1 -f insert_sensor_type.sql
psql -d db_xenia_v1 -f insert_sensor.sql
An extended schema which uses a trigger on the multi_obs table to activate quality control tests and user/group notification is available here, with sample rows here.
To install these extra tables run
psql -d db_xenia_v1 -f db_xenia_v1_extra_schema.sql
psql -d db_xenia_v1 -f insert_test_type.sql
The schema as shown in the diagram above has been color-coded and grouped into its different functional aspects:
Metadata concerning the organization or institution and projects or catalogs developed within that organization.
Metadata concerning the platforms and sensors. Observations are separately indexed on the multi_obs table by observation type (measurement type or 'm_type'), time, and location.
Platform and sensor types
The platform and sensor type labels are currently not very consistent (some platforms are described as 'buoy' while others are 'met', etc.), reflecting some earlier naming conventions as supplied by various data providers. I've avoided keying too much to the platform or sensor types, keying more on consistency at the observation/multi_obs/m_type_id level, from which sensor/platform profiles/lookups can be derived (finding all air_temperature sensors is a query on the sensor table for all sensors with that m_type_id).
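The air_temperature lookup mentioned above might look like the following SQLite sketch (table and column names are abbreviated guesses at the diagram's layout, and the sample rows are invented).

```python
import sqlite3

# Illustrative lookup: find all air_temperature sensors by m_type_id
# on the sensor table, rather than keying on platform/sensor type labels.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE m_type (row_id INTEGER PRIMARY KEY, obs_name TEXT)")
con.execute("CREATE TABLE sensor (row_id INTEGER PRIMARY KEY, "
            "platform_id INTEGER, m_type_id INTEGER)")
con.execute("INSERT INTO m_type VALUES (1, 'air_temperature'), (2, 'salinity')")
con.executemany("INSERT INTO sensor VALUES (?, ?, ?)",
                [(1, 10, 1), (2, 10, 2), (3, 11, 1)])

air_temp_sensors = con.execute("""
    SELECT sensor.row_id FROM sensor
    JOIN m_type ON m_type.row_id = sensor.m_type_id
    WHERE m_type.obs_name = 'air_temperature'
    ORDER BY sensor.row_id
""").fetchall()
print([r for (r,) in air_temp_sensors])  # [1, 3]
```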
The multi_obs table includes fields for row_entry_date and row_update_date to help determine latencies between when observations are made and when they are collected.
On the multi_obs table the field platform_handle uses the same format as the station_id referenced in earlier schemas - a concatenation of organization(short_name)_platform(short_name)_platform_type(type_name), like carocoops_CAP2_buoy.
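The naming convention is simple enough to state as a hypothetical helper (the function name is mine; the format and the carocoops_CAP2_buoy example are from the text above):

```python
# Hypothetical helper for the platform_handle convention:
# organization(short_name)_platform(short_name)_platform_type(type_name)
def make_platform_handle(org_short_name: str,
                         platform_short_name: str,
                         platform_type_name: str) -> str:
    return f"{org_short_name}_{platform_short_name}_{platform_type_name}"

print(make_platform_handle("carocoops", "CAP2", "buoy"))  # carocoops_CAP2_buoy
```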
All m_date values should be GMT
I'm leaving m_date, m_lat, m_lon, m_z, and m_value without a 'not null' constraint for now. In the XeniaPackageV2 there is a 'collection' table which supports a collection or grouping of measurements that may cover a range of time or position but not have a specific value. Also relating to version 2 are vector types, which may be ambiguous in the specific layout of how measurements are recorded.
The following two items are optional add-ons to the existing simple schema:
- The multi_obs table is the only PostGIS-enabled table. There is a trigger which auto-updates the the_geom field for an inserted long and lat.
- The multi_obs table also includes an auto-update trigger to set the qc test table ignore flag to false so the qc test will be re-run on the next qc processing pass.
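The shape of the the_geom auto-update trigger can be mimicked outside PostGIS. This SQLite sketch only illustrates the idea (the real trigger builds a PostGIS geometry in Postgres; here the_geom is just a WKT-style text column, and the trigger name is invented):

```python
import sqlite3

# Mimic of the the_geom auto-update: on insert, derive the geometry
# column from the row's own m_lon/m_lat values.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE multi_obs (row_id INTEGER PRIMARY KEY, "
            "m_lon REAL, m_lat REAL, the_geom TEXT)")
con.execute("""
    CREATE TRIGGER set_the_geom AFTER INSERT ON multi_obs
    BEGIN
        UPDATE multi_obs
        SET the_geom = 'POINT(' || NEW.m_lon || ' ' || NEW.m_lat || ')'
        WHERE row_id = NEW.row_id;
    END""")
con.execute("INSERT INTO multi_obs (m_lon, m_lat) VALUES (-79.6, 32.8)")
print(con.execute("SELECT the_geom FROM multi_obs").fetchone()[0])
```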
Observations Data Dictionary
The 'm_type' table is a data dictionary which references a standard_name and unit of measure (uom).
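A minimal sketch of that dictionary lookup, again in SQLite: the m_type and standard_name names come from the text, the uom_type table and its columns are my guess at the layout (the changelog below mentions uom_types), and the sample row is invented.

```python
import sqlite3

# Sketch: m_type acts as a data dictionary tying an observation type
# to a standard_name and a unit of measure (uom).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE uom_type (row_id INTEGER PRIMARY KEY, uom_name TEXT)")
con.execute("CREATE TABLE m_type (row_id INTEGER PRIMARY KEY, "
            "standard_name TEXT, uom_type_id INTEGER)")
con.execute("INSERT INTO uom_type VALUES (1, 'celsius')")
con.execute("INSERT INTO m_type VALUES (1, 'water_temperature', 1)")

name, uom = con.execute("""
    SELECT m_type.standard_name, uom_type.uom_name
    FROM m_type JOIN uom_type ON uom_type.row_id = m_type.uom_type_id
    WHERE m_type.row_id = 1
""").fetchone()
print(name, uom)  # water_temperature celsius
```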
Note: the following sections are less finished and more developmental. The system could perform with just the tables above for collecting and sharing observations, without getting into further functionality like quality control tests, test actions, and notification.
Quality Control Tests
Tests are periodically performed on the observations as they arrive.
Tests must also pass the following criteria:
- they are active (active flag=true)
- the test is within the effective date range (eff_date_begin to eff_date_end)
- they should not be ignored (ignore flag=false)
- the wait_interval has expired (wait_interval is measured in minutes; expiration time = done_date + wait_interval)
The recurring flag is used to reset the effective date range one year ahead as it expires, which enables tests to be seasonally enabled from year to year.
Actions are taken as a result of a test failure. Actions are email oriented, specifying a particular message, importance, and user/group to be notified.
Every user belongs to their own default group. Groups can be additionally defined as a collection of users.
October 28, 2006
November 15, 2006
- added to table organization columns url and opendap_url
November 19, 2006
- made all varchar fields a minimum of 50 characters
- added several new uom_types, obs_types, m_types and corrected a few (see new obs_data_dictionary file)
- added a url field to the sensor table (each sensor could potentially link to its own display page)
- added a support table(m_type_display_order) and population for ordering the measurement listings in various displays
I tried to avoid using column names which are SQL or PostgreSQL reserved words, using the reserved-word listing referenced here.
In some instances the reserved word was very useful, so to use the word without conflicting with it I'm prefixing these words with 'app_' like 'app_user', 'app_group'.
All tables by design use a row_id column as the primary key and do not rely on concatenated columns for primary keys. In most cases the row_id is a SERIAL type, which provides for automatic indexing. I used the SERIAL type for row_id in tables that I thought would have automatic inserts, and a regular integer type row_id where the table population is more of a manual initial setup.
To provide some flexibility for additional column type information without a dedicated table column, I've added '_attr' support tables: sensor_attr and test_attr. These support tables provide an attribute string label and value which reference a particular row_id.
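A small SQLite sketch of the '_attr' pattern (the sensor_attr table name is from the text; the column names and sample attribute pairs are illustrative assumptions):

```python
import sqlite3

# Sketch of an '_attr' support table: free-form label/value pairs
# keyed back to a particular sensor row_id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sensor_attr (sensor_id INTEGER, "
            "attr_label TEXT, attr_value TEXT)")
con.executemany("INSERT INTO sensor_attr VALUES (?, ?, ?)", [
    (1, "height_above_water", "3.5"),
    (1, "manufacturer", "acme"),
])
# Collect one sensor's attributes as a dict without any schema change.
attrs = dict(con.execute(
    "SELECT attr_label, attr_value FROM sensor_attr WHERE sensor_id = 1"))
print(attrs["height_above_water"])  # 3.5
```

The trade-off is the usual one for key/value side tables: new attributes need no DDL change, but values are untyped strings and can't be constrained by the database.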
'timestamp without time zone' on external dates
I've made the field types for multi_obs.m_date and several other date fields that represent actual external time sources (not system-internal processes like row insert or update time) 'timestamp without time zone', with the idea that all of these dates represent GMT. I don't think there is code that would be affected if a developer wants to change these fields to 'with time zone' for their own needs.
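On the client side, that convention means normalizing any zoned timestamp to GMT before writing it. A sketch of what an import script might do (the helper name and formatting are mine, not part of the schema):

```python
from datetime import datetime, timedelta, timezone

# Convention sketch: external observation times are stored as naive
# 'timestamp without time zone' values that are understood to be GMT.
def to_m_date(t: datetime) -> str:
    # normalize an aware datetime to UTC, then drop the zone marker
    return t.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# e.g. a provider timestamp reported in US Eastern Daylight Time (UTC-4)
local = datetime(2007, 6, 28, 14, 8, tzinfo=timezone(timedelta(hours=-4)))
print(to_m_date(local))  # 2007-06-28 18:08:00
```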
- Having platform_handle on multi_obs is mainly for convenience when processing the multi_obs resultsets, avoiding the repeated separate table lookups for the hierarchical metadata/labels. Spelling the platform_handle out at the multi_obs row level also makes it a lot easier to read/debug those multi_obs data rows.
The dominant use/lookup from the aggregator perspective is across platforms for certain observation types (m_type_id, hourly observation maps). You could put an index on for those queries going the other way (across observations for certain platforms), but that's a more platform-tailored approach. I did develop some basic scripts for NERRS northlinlet http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/XeniaSampleCase that are platform specific, but the code isn't very optimized and the database isn't platform-index tuned because there's only one platform going into it currently.
> -----Original Message-----
> Sent: Thursday, June 28, 2007 4:08 PM
> > Having platform_handle on multi_obs is mainly for convenience when
> > processing the multi_obs resultsets, not having to do the repeated
> > separate table lookups for the hierarchical metadata/labels. Spelling
> > the platform_handle out at the multi_obs row level makes it a lot
> > easier to read/debug those multi_obs data rows as well.
> That makes sense. So why not have it constrained by
> platform(platform_handle) to maintain integrity? It might be bad to
> index the multi_obs further, but it would make sense DDL-wise.
Sent: Thu 6/28/2007 6:05 PM
Regarding observation standard names, definitions and associated labels:
In general I'm trying to leave the labeling and semantics open to choice, with the database and scripts as usual referencing abstract system ids, so the mechanics of import/export, selection, and display are unaffected. If other installs/groups want more descriptive/different labels and definitions, that's fine, so long as whatever is decided represents what's to be collected in the database. At the end of it all, terms/definitions between systems should be mappable as superset, same-as, or subset definitions, or no mapping.
My style preference is towards shorter layman-term names, and I've been avoiding specific subtypes when a general type will do for collection/aggregation (water_temperature with associated lat, long, depth instead of sea_surface_temperature).
Basically I'd rather cast a wide net first and draw down to smaller nets than start with smaller nets. Another analogy: starting with broad brushstrokes to establish the composition before getting into the details.
Vector representation is similarly flexible depending on the potential usage - there might be several vector representations of the same data to accommodate several potential uses.