Skip to topic | Skip to bottom
Home
Main
Main.XeniaTableSchemar1.23 - 29 Jun 2007 - 20:40 - JeremyCothrantopic end

Start of topic | Skip to actions
Xenia Table Schema

History

This table schema is the latest adaptation from earlier approaches documented earlier at

http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/SEACOOSCookbook2#Data_structures_canonical_forms

http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/MultiObsSchema

Note: The above table schema is primarily oriented towards platform/sensor type setups like buoys, water level stations, ship obs, etc. I'll probably suggest other similar observation table reference schemas for collections, grid, particle trajectory, etc datatypes that are sufficiently different as to need their own specialized schema.

My general strategy with the database design is to try to develop a single general scheme that can capture the less dynamic metadata like organization/platform/sensor for everyone and a small set of observation table sub-schemas(say < 5 for now) depending on the particular datatype and it's additional indexing columns or number of dimensions. For instance for CTD's we also track the 'collection_id' column which is an additional index to specify observations within a specific cast.

http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/CtdNotes

The newer schema for me represents what I'd like to move to as the old schema isn't as generalized as I'd like for going forward with new data or products. The old schema uses separate similar tables for each observation type, whereas the new tries to stick with one table type and uses an observation index. The old schema also mixes in product specific(mostly MapServer? GIS specific in our case) columns like 'top of the hour', etc which create a development/maintenance tax on users who might not be interested in those particular products.

Temporal Table Design

see also XeniaReplication

Schemas use the strategy of using the same table type/schema(multi_obs) for holding smaller/larger time periods of data

  • table of latest or past few hours of data available (small size tables)
  • tables for past 2 weeks (medium size tables)
  • tables for older than 2 weeks (large size tables)
  • file archives produced by month broken out by both observation type(all platforms for a specific observation type) and platform(all observation types for a specific platform). Current month is a daily appended file with past days data.
  • data cascades from smaller, more recent tables to larger archival tables/files

This allows queries on recent data to respond quickly and places a certain limit on how large any one table might get for indexing or backup purposes.

Schema and lookup SQL source

Simple schema

The simple schema dump of just the basic table components for capturing observation data is availabe here

A sample observation data dictionary population SQL is available here. This contains most of the currently collected observation types for the existing Seacoos database.

A support table population(m_type_display_order) for controlling the display order of measurement table is available here.

To install this table(assuming Postgres already installed) run

#as user postgres
createdb db_xenia_v1

##tables repopulated in following order

#install the base tables
psql -d db_xenia_v1 -f db_xenia_v1_simple_schema.sql

#if want/need, add postgis and 'the_geom' to multi_obs table

#install the observation data dictionary
psql -d db_xenia_v1 -f obs_data_dictionary.sql

I've removed any references to PostGIS? from earlier(the multi_obs table used to contain a 'the_geom' column to store long/lat values) to help keep things simple/separate in the case a user doesn't want/need geospatial indexing or wants to use a different product/version than what I'm using.

NOTE The xenia schema is setup to run fine without reference to the quality control test tables and other table schemas to the right(in the diagram) of the multi_obs table. Be sure to disable any triggers on the multi-obs table which reference these additional functional tables which you may not be using.

Sample organization/platform/sensor records

A sample organization,platform_type,platform,sensor_type,sensor population SQL is available here. I haven't had time to populate more fully but it should be enough to get started with the buoy data processing example listed here and give the general idea.

A full example listing is as follows

To install these tables run

#organization/platform/sensor lookups
#don't run as is - for new regional areas, edit inserts to reflect regional organization, platform_type,sensor_type

##tables repopulated in following order
psql -d db_xenia_v1 -f insert_organization.sql

psql -d db_xenia_v1 -f insert_platform_type.sql
psql -d db_xenia_v1 -f insert_platform.sql

psql -d db_xenia_v1 -f insert_sensor_type.sql
psql -d db_xenia_v1 -f insert_sensor.sql

Extra schema

An extended schema which uses a trigger from the multi_obs table to activate quality control tests and user/group notification is available here and sample rows here

To install these extra tables run

psql -d db_xenia_v1 -f db_xenia_v1_extra_schema.sql
psql -d db_xenia_v1 -f insert_test_type.sql

Schema discussion

The schema as shown in the diagram above has been color-coded grouped into it's different functional aspects:

Organization/Project/Catalog

Metadata concerning the organization or institution and projects or catalogs developed within that organization.

Platforms/Sensors/Observations

Metadata concerning the platforms and sensors. Observations separately indexed on the multi_obs table by observation type(measurement type or 'm_type'), time and location.

platform and sensor types

The platform and sensor type labels are currently not very consistent(some platforms are described as 'buoy' while others are 'met',etc) representing some earlier naming conventions as supplied by various data providers. I've avoided keying anything too much too the platform or sensor types, keying more on consistency at the observation/multi_obs/m_type_id level from which sensor/platform profiles/lookups can be derived(Finding all air_temperature sensors is a query on the sensor table for all sensors with that m_type_id).

multi_obs table

The multi-obs table includes fields for row_entry_date and row_update_date to help determine latencies between observations made and when they are collected.

On the multi_obs table the field platform_handle is the same format as station_id referenced in earlier schemas - a concatenation of organization(short_name)_platform(short_name)_platform_type(type_name) like carocoops_CAP2_buoy

All m_date values should be GMT.

I'm leaving m_date, m_lat, m_lon, m_z, m_value without a 'not null' constraint for now. In the XeniaPackageV2 there is a 'collection' table which support a collection or grouping of measurements which may cover a range of time or position but not have a specific value. Also relating to version 2 are vector types which may be ambiguous in the specific layout of how measurements are recorded.

The following two items are optional add-ons to the existing simple schema

  • The multi_obs table is the only PostGIS? enabled table. There is a trigger which auto-updates the_geom field for an inserted long and lat.

  • The multi-obs table also includes an auto-update trigger to set the qc test table ignore flag to false so the qc test will be re-run on the next qc processing.

Observations Data Dictionary

The 'm_type' table is a data dictionary which references a standard_name and unit of measure(uom).


Note the following sections are less finished and more developmental. The system could perform with just the tables above in relation to collecting and sharing observations without actually getting into further functionality like quality control test , test actions and notification.

Quality Control Tests

Tests which are periodically performed on the observations as they arrive.

Tests must also pass the following criteria:

  • they are active (active flag=true)
  • the test is within the effective date range (eff_date_begin to eff_date_end)
  • they should not be ignored (ignore flag=false)
  • the wait_interval has expired. wait_interval is measured in minutes. wait_interval expiration time = done_date time + the wait_interval time

The recurring flag is used to reset the effective date range one year ahead as it expires which enables tests to be seasonally enabled from year to year.

Test Actions

Actions which are taken as a result of a test failure.

Email Notification

Actions which are email oriented specifying a particular message, importance and user/group to be notified.

Groups/Users

Every user belongs to their own default group. Groups can be additionally defined as a collection of users.

Schema changes

October 28, 2006

  • added to table organization columns url and opendap_url

November 15, 2006

  • made all varchar fields a minimum of 50 characters
  • added several new uom_types, obs_types, m_types and corrected a few (see new obs_data_dictionary file)

November 19, 2006

  • added a url field to the sensor table (each sensor could potentially link to it's own display page)
  • added a support table(m_type_display_order) and population for ordering the measurement listings in various displays

Conventions

column names

Tried to avoid using column names which are SQL or PostgreSQL? reserved words using the listing at
http://www.php-editors.com/postgres_manual/p_sql-keywords-appendix.html

In some instances the reserved word was very useful, so to use the word without conflicting with it I'm prefixing these words with 'app_' like 'app_user', 'app_group'.

row_id

All tables by design use an row_id column as the primary key and do not rely on concatenated columns for primary key use. In most cases the row_id is a SERIAL type which provides for automatic indexing. I used serial type for row_id for tables that I thought would be have automatic inserts and regular integer type row_id where the table population is more manual initial setup.

additional attributes/columns

To provide some flexibility for additional column type information, but without a dedicated table column, I've added support tables of '_attr' for sensor_attr and test_attr. These support tables provide a attribute string label and value which references a particular row_id.

'timestamp without time zone' on external dates

I've made the field types for multi_obs.m_date and several other date fields that represent actual external time sources (not system internal processes like row insert or update time) as 'timestamp without time zone' with the idea that all of these dates represent GMT. I don't think there is code that would be effected if a developer wants to change these fields to 'with time zone' for their own needs.

Miscellaneous

  • Having platform_handle on multi_obs is mainly for convenience when processing the multi_obs resultsets, not having to do the repeated separate table lookups for the hierachical metadata/labels. Spelling the platform_handle out at the multi_obs row level makes it a lot easier to read/debug those multi_obs data rows as well.


The dominant use/lookup from the aggregator perspective is across platforms for certain observation types(m_type_id, hourly observation maps). You could put an index for those queries going the other way (across observations for certain platforms), but that's a more platform tailored approach. I did develop out some basic scripts for NERRS northlinlet http://nautilus.baruch.sc.edu/twiki_dmcc/bin/view/Main/XeniaSampleCase that are platform specific, but the code isn't very optimized and the database isn't platform index tuned because there's only one platform going into it currently.

> -----Original Message-----
> Sent: Thursday, June 28, 2007 4:08 PM
>
> > Having platform_handle on multi_obs is mainly for convenience when
> > processing the multi_obs resultsets, not having to do the repeated
> > separate table lookups for the hierachical metadata/labels.
> Spelling
> > the platform_handle out at the multi_obs row level makes it a lot
> > easier to read/debug those multi_obs data rows as well.
>
> That makes sense. So why not have constrained by
> platform(platform_handle) to maintain integrity? It might be bad to
> index the multi_obs further, but is would make sense DDL-wise.


Sent: Thu 6/28/2007 6:05 PM

Regarding observation standard names, definitions and associated labels:

In general I'm trying to leave the labeling and semantics open to choice with the database and scripts as usual referencing abstract system id's so the mechanics of import/export, selection and display are uneffected. If other installs/groups want more descriptive/different labels and definitions then that's fine just so whatever is decided represents what's to be collected to the database. At the end of it all terms/definitions between systems should be mappable as a superset,same as or subset definitions or no mapping.

My style preference is towards shorter layman term names and I've been avoiding using specific subtypes when a general type will do for collection/aggregation(water_temperature with associated lat,long,depth instead of sea_surface_temperature).

Basically I'd rather cast a wide net first and draw down to smaller nets that start with smaller nets. Or another analogy in starting with broad brushstrokes to establish the composition before getting into the details.

Vector representation is also similarly flexible depending on the potential usage - might have several vector representations of the same data to accomodate several potential uses.



to top

I Attachment Action Size Date Who Comment
obs_data_dictionary.sql manage 7.0 K 15 Nov 2006 - 20:49 JeremyCothran NA
db_multi_obs_beta_support1.sql manage 2.1 K 30 May 2006 - 20:37 JeremyCothran NA
db_xenia_v1_simple_schema.sql manage 10.8 K 19 Nov 2006 - 20:54 JeremyCothran NA
db_xenia_v1_extra_schema.sql manage 9.7 K 16 Feb 2007 - 16:22 JeremyCothran NA
insert_organization.sql manage 1.5 K 26 Oct 2006 - 21:47 JeremyCothran NA
insert_platform_type.sql manage 0.8 K 26 Oct 2006 - 21:48 JeremyCothran NA
insert_platform.sql manage 22.8 K 26 Oct 2006 - 21:48 JeremyCothran NA
insert_sensor_type.sql manage 2.3 K 26 Oct 2006 - 21:48 JeremyCothran NA
insert_sensor.sql manage 132.9 K 26 Oct 2006 - 21:48 JeremyCothran NA
m_type_display_order.sql manage 1.8 K 19 Nov 2006 - 20:54 JeremyCothran NA
display_order.txt manage 2.0 K 20 Nov 2006 - 20:02 JeremyCothran NA
insert_test_type.sql manage 1.2 K 26 Mar 2007 - 19:59 JeremyCothran some population rows for the extra schema test_type table

You are here: Main > XeniaTableSchema

to top

Copyright © 1999-2014 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding DMCC? Send feedback