Data Management in International Monitoring Programs
Joint Workshop of the
European Environment Agency (EEA)
and the
Common Wadden Sea Secretariat (CWSS)
EU Life-project
DEMOWAD
Copenhagen 18th/19th February 1998
IEO Marine Pelagic Ecosystem
Studies and Data Management System
Luis Valdés
Instituto Español de Oceanografía, Centro Oceanográfico de Santander,
P.O. Box 240, 39080 Santander, Spain [tel: (+34)42 27 50 62; fax:
(+34)42 27 50 72; E-mail: luis.valdes@st.ieo.es].
1. Introduction
The Marine Environment Department
of the Instituto Español de Oceanografía (IEO)
is conducting several research projects based on the systematic
and continuous study of the ocean. The principal goal of the
core project (titled "Studies on time series of oceanographic
data") is to understand the underlying causes of temporal
variability in the physical and biological properties and processes
of the pelagic ecosystem in the neritic and oceanic waters surrounding
the Spanish coast. The research effort involves 1) time series
measurements along several transects off the Spanish coast, on
both the Atlantic and the Mediterranean sides; 2) large-scale
measurements of physical and biological properties of the marine
pelagic ecosystem from ichthyoplankton, acoustic and bottom-trawl
surveys, etc., where spatial resolution matters more than
temporal resolution; and 3) retrospective
analysis of existing data sets from landings of pelagic fisheries,
sea level measurements, climatological data, etc.
To deal effectively with the large and varied volumes of data
accumulated through activities 1 and 2 above, the IEO has
developed a relational database that acts both as an
archive of the data and as a tool for data analysis and the
preparation of reports.
2. Data Management System
The IEO Pelagic Ecology Data
Base is an ORACLE relational database. The database is accessed
from PCs running Windows NT. It is designed to store and manage
data on Phytoplankton, Microzooplankton, Mesozooplankton, Ichthyoplankton
and the environment. Various routines are available for coding
and entering data via the keyboard or directly from data files (e.g.
for CTD data). The database includes more than 30 Master-tables
that codify the type of data to be entered, give help
to the user and minimise the effort of entering the data.
Data can be extracted for defined areas, periods or selected
entities (surveys, species, etc.), and all the results can be
easily exported to EXCEL, with provision for users without prior
experience of the ORACLE system. A collection of selected outputs
is included in the package.
The structure is divided
into several levels that form a hierarchy, which is the
major axis of the database. The data in the hierarchy
described in Table I are pyramidal in structure: for a given
cruise only a single input is required at level 1. Each cruise
has a number of associated stations. At each station several
samples with different devices and purposes are recorded, and
for each sample a different assemblage of species is counted.
This last level is the base of the pyramid and holds by far the
largest number of records.
Table I. Summary of the data hierarchy

Level | Code     | Contents
1     | BaseCamp | Information about the organization of a cruise
2     | BaseEst  | Information about an individual station
3     | Mues*.*  | Details about a Phyto, Micro, Mesozoo, etc. sample taken at a given station
4     | Rec*.*   | Details of the assemblage of Phyto, Micro, Mesozoo, etc. in a given sample, detailed by species and numbers
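The pyramidal hierarchy of Table I can be pictured with a minimal Python sketch. This is an illustration only, not the ORACLE schema: the class and field names are assumptions, and the demo cruise and the code HYPSP1 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpeciesRecord:
    """Level 4 (Rec*.*): one species count within a sample."""
    species_code: str
    count: int


@dataclass
class Sample:
    """Level 3 (Mues*.*): one sample taken at a station."""
    device: str
    records: List[SpeciesRecord] = field(default_factory=list)


@dataclass
class Station:
    """Level 2 (BaseEst): one station of a cruise."""
    station_number: int
    samples: List[Sample] = field(default_factory=list)


@dataclass
class Cruise:
    """Level 1 (BaseCamp): a single entry per cruise."""
    cruise_id: str
    stations: List[Station] = field(default_factory=list)


def records_per_level(cruise: Cruise) -> Dict[int, int]:
    """Count the entries held at each level of the hierarchy for one cruise."""
    samples = [s for st in cruise.stations for s in st.samples]
    return {
        1: 1,
        2: len(cruise.stations),
        3: len(samples),
        4: sum(len(s.records) for s in samples),
    }


# Hypothetical cruise: 2 stations, 2 samples per station, 3 taxa per sample.
CODES = ("ACACLA", "COP", "HYPSP1")  # HYPSP1 is a made-up placeholder code
demo = Cruise(
    "DEMO-01",
    [
        Station(n, [Sample(dev, [SpeciesRecord(c, 10) for c in CODES])
                    for dev in ("net", "bottle")])
        for n in (1, 2)
    ],
)
print(records_per_level(demo))  # {1: 1, 2: 2, 3: 4, 4: 12}
```

The counts grow from one cruise entry at the top to twelve species records at the base, which is the pyramid shape described above.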
3. Data Entry
All information associated
with cruises, stations and sampling procedures is entered from
specially designed forms (Data Reporting Formats), which were
defined before the database was developed. To reduce the risk of
typing errors in critical records,
the system directs the user to a master table when an input
is not recognised. If the input is genuinely new, it is added to the
stored list. More than 25 master tables are attached to these
levels of the hierarchy, including: Vessel names, Harbours, Chief
scientist, Sampling area, Periodicity of sampling, and another four
tables to identify the cruise and sampling station; Type of device,
Mesh microns, Sampling preservation, Kind of tow, and another five
tables to identify the sampling details; Douglas scale, Beaufort
scale, and another six climatological and environmental scales to
define the conditions of the sampling. At level 1, SST images,
depth contour maps, station maps, etc., can be added as PIF,
TIF, or other file formats.
The assemblages of species
counted in each sample and their numbers are entered by means
of a multi-line stream. A three- to six-letter code is required
to enter the species and taxa (Acartia clausi is ACACLA;
Copepoda is COP, and so on). This code is checked against the
species Master-list, where every species or taxon has
been assigned an NODC code, the complete Linnaean system name,
ecological information (e.g. cosmopolitan, neritic, value as
indicator species, etc.), biological information (e.g. holoplankton,
meroplankton, etc.), physiological information (developmental stage,
weight in carbon, etc.) and other data of interest.
If an unrecognised code is typed,
or the species is new and not yet included in the master list,
a pick-list is displayed on the screen to help the user find
the correct name, or add and store the new species in the
database.
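The check against the master list and the pick-list fallback might look like the following Python sketch. It is not the IEO implementation: the function names are assumptions, only ACACLA and COP come from the text, ACADIS is a hypothetical code, and fuzzy matching with the standard `difflib` module is one assumed way of building a pick-list.

```python
import difflib
from typing import Dict, List, Tuple

# Tiny stand-in for the species master list. Only ACACLA and COP are taken
# from the text; ACADIS is a hypothetical code added for illustration.
MASTER_LIST: Dict[str, str] = {
    "ACACLA": "Acartia clausi",
    "COP": "Copepoda",
    "ACADIS": "Acartia discaudata",
}


def check_species_code(code: str, master: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (True, [code]) if the code is known, else (False, pick_list)."""
    code = code.strip().upper()
    if code in master:
        return True, [code]
    # Unknown code: offer the closest known codes, as a pick-list would.
    return False, difflib.get_close_matches(code, list(master), n=5, cutoff=0.5)


def add_species(code: str, name: str, master: Dict[str, str]) -> None:
    """Store a genuinely new species so that later entries validate against it."""
    master[code.strip().upper()] = name
```

For example, `check_species_code("ACACLS", MASTER_LIST)` reports the code as unknown and includes ACACLA among the suggestions, so a slip of one letter can be corrected without retyping the record.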
Once the complete assemblage
of a given sample has been entered in the database, several cross
sums are computed (e.g. Holoplankton, Meroplankton, Copepods,
groupings that the database obtains directly from the master
list) to verify that the data have been typed correctly. Thus,
at this level of the hierarchy more than 5 types of data are codified
in master tables.
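A cross-sum check of this kind can be sketched in Python as follows. The group memberships, the code DECLAR and the idea of comparing against totals noted on the counting sheet are assumptions made for illustration; the text does not say what the sums are compared with.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical master-list attributes: code -> groups the taxon belongs to.
GROUPS: Dict[str, Tuple[str, ...]] = {
    "ACACLA": ("Holoplankton", "Copepods"),
    "COP": ("Holoplankton", "Copepods"),
    "DECLAR": ("Meroplankton",),  # made-up code for decapod larvae
}


def crossed_sums(records: List[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts of one sample by group, plus a grand total."""
    totals: Dict[str, int] = defaultdict(int)
    for code, count in records:
        totals["Total"] += count
        for group in GROUPS[code]:
            totals[group] += count
    return dict(totals)


def mismatches(computed: Dict[str, int], expected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {group: (computed, expected)} for every total that disagrees."""
    return {
        group: (computed.get(group, 0), value)
        for group, value in expected.items()
        if computed.get(group, 0) != value
    }


sample = [("ACACLA", 120), ("COP", 30), ("DECLAR", 5)]
sums = crossed_sums(sample)
# Totals assumed to have been written on the counting sheet by the analyst.
print(mismatches(sums, {"Copepods": 150, "Meroplankton": 6}))
```

Any group returned by `mismatches` points to a probable typing error in that part of the assemblage.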
4. Logistic Considerations
As the data entry process
is critical to the accuracy and proper use of the stored data,
some logistic considerations are under continuous revision to
improve the database.
Data reporting formats: With the objective of unifying
the type of information associated with cruises, stations and
marine and climatological conditions, a Data Reporting Format
was created, which must be used by those scientists using the
database. This form is independent of the discipline, and the
main purpose is to give the essential space-time information
to allow every observation to be assigned and located in a specific
project and cruise. For the cruise, this form includes
the following items: Project, Cruise identifier, Vessel, Start
of cruise (ddmmyyyy), End of cruise, Sampling area, Number of sampling
stations, Operator (Chief Scientist). For the station,
the form includes, among others, the following items:
Project, Cruise identifier,
Station number, Date, Station Start Longitude, Station Start
Latitude, Station Start Depth, Station Start Time, Station End
Time, Station End Longitude, Station End Latitude, Station End
Depth, Number of sampling devices, 14 different records of sea
and climatological conditions, and Operator.
For the sampling
procedures, the report formats differ depending on the discipline,
but all contain the information necessary to: i) connect
the sampling with a specific station and cruise, and ii) define
the quality of sampling, e.g. sampling gear, type of tow, speed
of the tow, depth of sampling, volume of water filtered, existence
of clogging, etc. (i.e. metadata).
Data quality control and
metadata: In the
case of hydrographical parameters, a quality control indicator
can be added to the database if calibration has been applied
to the data. This helps to ensure that the data are of high quality
and that all data are processed in a similar manner.
However, the problem is not completely solved in the case of biological
data, where the best available approach to data quality control is
that provided by the metadata (data that inform about the
data).
Thus, our database contains
two types of records: those that must be filled in (they
allow us to connect different levels of the database, and are
the only data necessary to operate the database), and those that
can be left blank (but should not be). Most of the latter refer
to the data that inform about the data (Metadata).
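The distinction between mandatory linking records and optional metadata records can be sketched as a simple completeness check in Python. The field names are assumptions adapted from items listed in the reporting formats, and the choice of which fields are mandatory is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed split: these fields link the levels of the hierarchy ...
REQUIRED = ("project", "cruise_id", "station_number")
# ... and these describe how the sample was taken (metadata).
OPTIONAL = ("sampling_gear", "type_of_tow", "tow_speed", "volume_filtered")


def is_blank(value: Optional[object]) -> bool:
    """Treat None and the empty string as 'left blank'."""
    return value is None or value == ""


def check_record(record: Dict[str, Optional[object]]) -> Tuple[List[str], float]:
    """Return (missing mandatory fields, fraction of metadata fields filled)."""
    missing = [k for k in REQUIRED if is_blank(record.get(k))]
    filled = sum(1 for k in OPTIONAL if not is_blank(record.get(k)))
    return missing, filled / len(OPTIONAL)


# Hypothetical record: all links present, half of the metadata filled in.
record = {
    "project": "TIME-SERIES",
    "cruise_id": "DEMO-01",
    "station_number": 3,
    "sampling_gear": "net",
    "type_of_tow": "",
    "tow_speed": None,
    "volume_filtered": 50.0,
}
print(check_record(record))  # ([], 0.5)
```

A record with an empty list of missing fields can be stored and linked; the second value indicates how well the measurement is documented for later users.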
The problem that arises with
metadata is that, depending on the situation, it may be necessary
to fill in more than 20 items to describe a given measurement properly
(e.g. ICC, 1996, suggests 23 questions to document the methodological
details for Primary Production measurement). We have tried to
find a meeting point between metadata requirements and the working
procedures of the scientist at sea; thus, for primary
production we have reduced the metadata to 12 questions.
In recognition of the
essential need to provide metadata at every level of the
hierarchy, the database has been provided with a series of metadata
requirements that keep the data useful for future users
and for submission to a data centre.
Taxonomic code: Storage and processing capacity
is no longer a real limit in terms of computing time and space,
so the case for using a numerical code to classify the species
rests not on such constraints but on the hierarchy
built into a numeric taxonomic code. For example, the NODC v.7 code
for Acartia clausi is 6118290101, which indicates that Acartia clausi
belongs to the Family Acartiidae, Order Calanoida, Subclass Copepoda,
etc. This allows us to sum up all the species
that share a given character, e.g. those that are Copepoda,
or Decapoda, or Clupeidae, etc.
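Summation over a hierarchical numeric code can be sketched in Python as a prefix match. This is a minimal illustration: only the code 6118290101 comes from the text, the other codes and all abundances are hypothetical, and the assumption that each successive pair of digits marks one taxonomic level is ours.

```python
from typing import Dict, List


def ancestors(code: str) -> List[str]:
    """Split a code into successive two-digit prefixes (assumed level markers)."""
    return [code[:i] for i in range(2, len(code) + 1, 2)]


def sum_by_prefix(abundance: Dict[str, float], prefix: str) -> float:
    """Add up the abundance of every taxon whose code starts with the prefix."""
    return sum(value for code, value in abundance.items() if code.startswith(prefix))


# Abundances keyed by taxonomic code.
abundance = {
    "6118290101": 120.0,  # Acartia clausi (code quoted in the text)
    "6118290102": 30.0,   # hypothetical second species sharing the prefix 61182901
    "6179010101": 5.0,    # hypothetical taxon from another group
}

print(ancestors("6118290101"))
# ['61', '6118', '611829', '61182901', '6118290101']
print(sum_by_prefix(abundance, "611829"))  # 150.0
```

Because membership in a higher taxon reduces to a string prefix test, group totals need no separate lookup table.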
The choice of a single
taxonomic code for the ICES area was extensively discussed during
1997 in several Working Groups, and a general agreement was reached
about the NODC code on the basis that i) this taxonomic code
contains logical hierarchical taxonomic information, ii) it is
not limited to pelagic species, iii) it is flexible, giving the
opportunity to incorporate new codes, and iv) it is well established
(WGZE, 1997). (Subsequent to the 1997 Working Group meetings,
a new version of NODC code has been released, which differs significantly
from v.7).
The IEO Data Management System
has adopted version 7.0 of the NODC taxonomic code as
the standard for all the planktonic communities managed
in the database.
5. Data Outputs: Reports, ASCII Tables, EXCEL Tables, and Figures
The database acts not only
as an archive of the data, but also as a tool for data analysis
and the preparation of reports. Data can be extracted for defined
areas, periods or selected entities (surveys, species, etc.),
and all the results can be easily exported to EXCEL and ASCII
tables, with provision for users without prior experience of the
ORACLE system.
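Extraction by area and period followed by export to an ASCII table can be sketched in Python as follows. This does not reproduce the ORACLE queries or the package menus: the `Observation` fields, the function names and the comma-separated layout are assumptions, and the two observations are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass
class Observation:
    cruise_id: str
    day: date
    lat: float
    lon: float
    species_code: str
    count: int


def extract(obs: Iterable[Observation], start: date, end: date,
            lat_range: Tuple[float, float],
            lon_range: Tuple[float, float]) -> List[Observation]:
    """Keep the observations that fall inside the period and the bounding box."""
    return [
        o for o in obs
        if start <= o.day <= end
        and lat_range[0] <= o.lat <= lat_range[1]
        and lon_range[0] <= o.lon <= lon_range[1]
    ]


def to_csv(rows: Iterable[Observation]) -> str:
    """Write the selection as a comma-separated ASCII table that EXCEL can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["cruise", "date", "lat", "lon", "species", "count"])
    for o in rows:
        writer.writerow([o.cruise_id, o.day.isoformat(), o.lat, o.lon,
                         o.species_code, o.count])
    return buf.getvalue()


# Hypothetical observations: only the first lies inside the selection.
data = [
    Observation("C1", date(1997, 6, 15), 43.5, -3.8, "ACACLA", 120),
    Observation("C2", date(1997, 9, 1), 36.5, -4.4, "COP", 30),
]
selection = extract(data, date(1997, 6, 1), date(1997, 6, 30), (43.0, 44.0), (-4.5, -3.0))
print(to_csv(selection))
```

The same filter could be extended with a species or survey criterion to mirror the "selected entities" option.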
In addition, a collection
of selected outputs is included in the package, which allows
the user to obtain Reports, Tables and Figures. For example, it
is possible to select, from a pre-defined menu, the sampling
area, dates, depth and parameters to be plotted in a graph or
transferred to a Table, or to produce the list of species
for a particular survey, etc. Options allow the user to choose
any combination of more than 20 abiotic and biotic parameters
(Depth, Temperature, Salinity, Nutrients, PAR, Fluorescence,
Chlorophyll, Biomass, Abundance, etc.).
Two different types
of cruise reports are also pre-defined in the outputs menu, and a
dialogue menu is being developed to edit and feed the ROSCOP format
directly from the database.