Data Management in International Monitoring Programs

Joint Workshop of the
European Environment Agency (EEA) and the
Common Wadden Sea Secretariat (CWSS)

EU Life-project
DEMOWAD
Copenhagen 18th/19th February 1998


 



IEO Marine Pelagic Ecosystem Studies and Data Management System

Luis Valdés

 

Instituto Español de Oceanografía, Centro Oceanográfico de Santander, P.O. Box 240, 39080 Santander, Spain [tel: (+34)42 27 50 62; fax: (+34)42 27 50 72; E-mail: luis.valdes@st.ieo.es].

 


 


1. Introduction

The Marine Environment Department of the Instituto Español de Oceanografía (IEO) is conducting several research projects based on the systematic and continuous study of the ocean. The principal goal of the core project (titled "Studies on time series of oceanographic data") is to understand the underlying causes of temporal variability in the physical and biological properties and processes of the pelagic ecosystem in the neritic and oceanic waters surrounding the Spanish coast. The research effort involves 1) time series measurements along several transects off the Spanish coast, on both the Atlantic and the Mediterranean sides, 2) large-scale measurements of physical and biological properties of the marine pelagic ecosystem from ichthyoplankton, acoustic and bottom-trawl surveys, etc., where spatial resolution is more important than temporal resolution, and 3) retrospective analysis of existing data sets from landings of pelagic fisheries, sea level measurements, climatological data, etc.

In response to the need to deal effectively with the large and varied volumes of data that have been accumulated as a result of the above activities 1 and 2, the IEO has developed a relational database that acts as an archive of the data and as a tool for data analysis and elaboration of reports.

 

2. Data Management System

The IEO Pelagic Ecology Data Base is an ORACLE relational database, accessed from PCs running Windows NT. It is designed to store and manage data on Phytoplankton, Microzooplankton, Mesozooplankton and Ichthyoplankton, together with environmental data. Various routines are available for coding and entering data via the keyboard or directly from data files (e.g. for CTD data). The database includes more than 30 master tables that codify the type of data to be entered, guide the user, and minimise the effort of data entry. Data can be extracted for defined areas, periods or selected entities (surveys, species, etc.), and all results can be easily exported to EXCEL, with provision for users without prior experience of the ORACLE system. A collection of selected outputs is included in the package.

The database is organised into several levels that form a hierarchy, which is its major axis. The hierarchy, summarised in Table I, is pyramidal in structure: a given cruise requires only a single entry at level 1, and each cruise has a number of associated stations. At each station several samples are taken with different devices and for different purposes, and for each sample a different assemblage of species is counted. This last level is the base of the pyramid and holds by far the largest number of records.

Table I. Summary of the data hierarchy

Level  Code      Contents
1      BaseCamp  Information about the organization of a cruise
2      BaseEst   Information about an individual station
3      Mues*.*   Details about a Phyto, Micro, Mesozoo, etc. sample taken at a given station
4      Rec*.*    Details of the assemblage of Phyto, Micro, Mesozoo, etc. in a given sample, detailed by species and numbers
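
The pyramidal hierarchy of Table I can be illustrated with a minimal relational sketch. This is not the IEO schema: the real system runs on ORACLE and its table layout is not given here, so the SQLite tables, column names and demo rows below are all hypothetical. Only the four levels (cruise, station, sample, species record) and the codes ACACLA and COP come from the text.

```python
import sqlite3

# Hypothetical schema: one table per level of Table I, each row pointing
# to its parent level. Names are illustrative, not the IEO's own.
SCHEMA = """
CREATE TABLE cruise  (cruise_id  INTEGER PRIMARY KEY,
                      name       TEXT NOT NULL);
CREATE TABLE station (station_id INTEGER PRIMARY KEY,
                      cruise_id  INTEGER NOT NULL REFERENCES cruise(cruise_id),
                      station_no INTEGER NOT NULL);
CREATE TABLE sample  (sample_id  INTEGER PRIMARY KEY,
                      station_id INTEGER NOT NULL REFERENCES station(station_id),
                      discipline TEXT NOT NULL);
CREATE TABLE record  (record_id    INTEGER PRIMARY KEY,
                      sample_id    INTEGER NOT NULL REFERENCES sample(sample_id),
                      species_code TEXT NOT NULL,
                      n_counted    INTEGER NOT NULL);
"""

LEVELS = ("cruise", "station", "sample", "record")


def build_demo():
    """Create an in-memory database holding one invented cruise."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cruise VALUES (1, 'DEMO-01')")
    conn.executemany("INSERT INTO station VALUES (?, 1, ?)",
                     [(1, 1), (2, 2)])
    conn.executemany("INSERT INTO sample VALUES (?, ?, 'Mesozoo')",
                     [(1, 1), (2, 1), (3, 2)])
    conn.executemany(
        "INSERT INTO record (sample_id, species_code, n_counted) "
        "VALUES (?, ?, ?)",
        [(1, "ACACLA", 120), (1, "COP", 300), (2, "ACACLA", 40),
         (3, "ACACLA", 80), (3, "COP", 10)])
    return conn


def rows_per_level(conn):
    """Count the rows stored at each level of the hierarchy."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in LEVELS}
```

Calling `rows_per_level(build_demo())` shows the pyramid: one cruise at the top and progressively more rows at each level below it.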

3. Data Entry

All information associated with cruises, stations and sampling procedures is entered from specially designed forms (Data Reporting Formats), which were defined before the database was developed. To reduce the risk of typing errors in critical fields, the system directs the user to a master table whenever an input is not recognised. If the input is genuinely new, it is added to the stored list. More than 25 master tables are attached to these levels of the hierarchy, including: Vessel names, Harbours, Chief scientist, Sampling Area, Periodicity of sampling, and another four tables to identify the cruise and sampling station; Type of device, Mesh microns, Sampling preservation, Kind of tow, and another five tables to identify the sampling details; Douglas scale, Beaufort scale, and another six climatological and environmental scales to define the sampling conditions. At level 1, SST images, depth contour maps, station maps, etc., can be added as PIF, TIF, or other format files.

The assemblage of species counted in each sample, and their numbers, are entered by means of a multi-line stream. A three- to six-letter code is required to enter each species or taxon (Acartia clausi is ACACLA; Copepoda is COP, and so on). This code is checked against the species master list, in which every species or taxon has been assigned an NODC code, the complete Linnaean name, ecological information (e.g. cosmopolitan, neritic, value as indicator species, etc.), biological information (e.g. holoplankton, meroplankton, etc.), physiological information (developmental stage, weight in carbon, etc.) and other data of interest.

If a species code is mistyped, or belongs to a new species not yet included in the master list, a pick-list is displayed on the screen to help the user find the correct name or add the new species to the database.

Once the complete assemblage of a given sample has been entered in the database, several cross-sums are computed (e.g. Holoplankton, Meroplankton, Copepods; groupings that the database obtains directly from the master list) to verify that the data have been typed correctly. Thus, at this level of the hierarchy more than 5 types of data are codified in master tables.
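
The master-list check, pick-list and cross-sums described in this section can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the function names and the code DECLAR (an invented code standing for decapod larvae) are hypothetical. Only ACACLA and COP are codes quoted in the text. The pick-list is approximated with the standard library's `difflib.get_close_matches`, which is a stand-in and not necessarily how the IEO system searches its list.

```python
import difflib

# Hypothetical master list: short code -> attributes used for cross-sums.
MASTER = {
    "ACACLA": {"name": "Acartia clausi", "group": "Copepoda",
               "life_cycle": "holoplankton"},
    "COP": {"name": "Copepoda", "group": "Copepoda",
            "life_cycle": "holoplankton"},
    # Invented code, for illustration only.
    "DECLAR": {"name": "Decapoda larvae", "group": "Decapoda",
               "life_cycle": "meroplankton"},
}


def unknown_codes(entries, master=MASTER):
    """Return the codes in a sample that are absent from the master list."""
    return [code for code, _ in entries if code not in master]


def pick_list(code, master=MASTER):
    """Suggest master-list codes that resemble a mistyped code."""
    return difflib.get_close_matches(code, list(master))


def cross_sums(entries, attribute, master=MASTER):
    """Total the counts of a sample per value of a master-list attribute."""
    totals = {}
    for code, n in entries:
        key = master[code][attribute]
        totals[key] = totals.get(key, 0) + n
    return totals


# One sample entered as (code, count) lines.
sample = [("ACACLA", 120), ("COP", 300), ("DECLAR", 15)]
life_cycle_totals = cross_sums(sample, "life_cycle")
group_totals = cross_sums(sample, "group")
```

Comparing such totals with the subtotals written on the counting sheet is one way to catch a mistyped count before the sample is stored.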

 

4. Logistic Considerations

As the data entry process is critical to the accuracy and proper use of the stored data, some logistic considerations are under continuous revision to improve the database.

Data reporting formats: To unify the information associated with cruises, stations, and marine and climatological conditions, a Data Reporting Format was created, which must be used by all scientists using the database. This form is independent of the discipline, and its main purpose is to record the essential space-time information that allows every observation to be assigned to, and located within, a specific project and cruise. With respect to the cruise, the form includes the following items: Project, Cruise identifier, Vessel, Start of Cruise ddmmyyyy, End of cruise, Sampling area, Number of sampling stations, Operator (Chief Scientist). With respect to the station, the form includes, among others, the following items:

Project, Cruise identifier, Station number, Date, Station Start Longitude, Station Start Latitude, Station Start Depth, Station Start Time, Station End Time, Station End Longitude, Station End Latitude, Station End Depth, Number of sampling devices, 14 different records of sea and climatological conditions, and Operator.

With respect to the sampling procedures, the report formats differ by discipline, but all contain the information necessary to: i) connect the sample with a specific station and cruise, and ii) define the quality of sampling, e.g. sampling gear, type of tow, speed of the tow, depth of sampling, volume of water filtered, existence of clogging, etc. (i.e. metadata).

Data quality control and Metadata: In the case of hydrographical parameters, a quality control indicator can be added to the database if calibration has been applied to the data. This helps to ensure that the data are of high quality and that all data are processed in a similar manner. However, the problem is not completely solved in the case of biological data, for which the best available approach to data quality control is through metadata (data that inform about the data).

Thus, our database contains two types of fields: those that must be filled in (they connect the different levels of the database and are the only data necessary to operate it), and those that can be left blank (but should not be). Most of the latter hold the data that inform about the data (metadata).
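
The distinction between the two kinds of fields can be sketched as a simple check at data entry. The text does not say which items of the cruise form are mandatory, so the split below is an assumption made for illustration, and the field names and the function `review_record` are hypothetical.

```python
# Assumed split of the cruise form: linking keys are mandatory, the
# remaining items are metadata that may be blank but should be filled in.
REQUIRED = ("project", "cruise_id")
METADATA = ("vessel", "sampling_area", "operator")


def review_record(record):
    """Reject a record lacking a linking key; list its blank metadata."""
    missing = [field for field in REQUIRED if not record.get(field)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return [field for field in METADATA if not record.get(field)]


# A record that can be stored, but with two metadata items still blank.
blank_items = review_record(
    {"project": "P1", "cruise_id": "C1", "vessel": "V1"})
```

Returning the blank metadata items instead of rejecting the record mirrors the compromise described here: the database still operates, while the operator is reminded of what remains to be documented.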

The problem that arises with metadata is that, depending on the situation, it may be necessary to fill in more than 20 items to document a given measurement properly (e.g. ICC, 1996, suggests 23 questions to document the methodological details of a Primary Production measurement). We have tried to find a meeting point between metadata requirements and the working procedures of scientists at sea; for primary production, for example, we have reduced the metadata to 12 questions.

Recognising the essential need for metadata, the database has been provided, at every level of the hierarchy, with a series of metadata requirements that keep the data useful both for future users and for submission to a data centre.

Taxonomic code: The capacity to store and handle data is not a real constraint in terms of computing time and space, so the interest in using a numerical code to classify species rests not on such considerations but on the hierarchy built into a numeric taxonomic code. For example, the NODC v.7 code for Acartia clausi is 6118290101, which means that Acartia clausi belongs to the Family Acartiidae, Order Calanoida, Subclass Copepoda, etc. This allows us to sum all the species that share a given character, e.g. those that are Copepoda, or Decapoda, or Clupeida, etc.
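
Summing over a hierarchical numeric code amounts to matching on a leading run of digits. The sketch below assumes, for illustration only, that the first six digits of a ten-digit code identify the family; this digit layout is not confirmed by the text. The code 6118290101 is the one quoted above, while the other two codes, all counts, and the function name `sum_by_prefix` are invented.

```python
def sum_by_prefix(counts, prefix):
    """Sum the counts of every taxon whose code starts with `prefix`."""
    return sum(n for code, n in counts.items() if code.startswith(prefix))


# Codes are kept as strings so that prefix matching is straightforward.
counts = {
    "6118290101": 120,  # Acartia clausi (code quoted in the text)
    "6118290102": 30,   # invented code sharing the assumed family prefix
    "6199990101": 7,    # invented code outside that prefix
}

# Assumption: "611829" is the family-level prefix of the quoted code.
family_total = sum_by_prefix(counts, "611829")
```

With an alphabetic code such as ACACLA the same total would need a lookup table from species to family, which is the practical argument for the numeric code.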

The choice of a single taxonomic code for the ICES area was extensively discussed during 1997 in several Working Groups, and general agreement was reached on the NODC code on the basis that i) it contains logical hierarchical taxonomic information, ii) it is not limited to pelagic species, iii) it is flexible, allowing new codes to be incorporated, and iv) it is well established (WGZE, 1997). (Subsequent to the 1997 Working Group meetings, a new version of the NODC code has been released, which differs significantly from v.7.)

The IEO Data Management System has adopted version 7.0 of the NODC taxonomic code as the standard for all the planktonic communities managed in the database.

 

5. Data Outputs: Reports, ASCII Tables, EXCEL Tables, and Figures

The database not only acts as an archive of the data, but also as a tool for data analysis and elaboration of reports. Data can be extracted from defined areas, periods or selected entities (surveys, species, etc), and all the results can be easily exported to EXCEL and ASCII tables with provision for users without prior experience of the ORACLE system.

In addition, a collection of selected outputs is included in the package, which allows the user to obtain Reports, Tables and Figures. For example, it is possible to select, from a pre-defined menu, the sampling area, dates, depth and parameters to be plotted in a graph or transferred to a Table, or to produce the list of species for a particular survey, etc. Options allow the user to choose any combination of more than 20 abiotic and biotic parameters (Depth, Temperature, Salinity, Nutrients, PAR, Fluorescence, Chlorophyll, Biomass, Abundance, etc.).
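
Selecting a period and a set of parameters and writing the result as an ASCII table can be sketched as below. This is a hypothetical stand-in that works on in-memory rows and is not the ORACLE menu system itself. The field names, the functions `extract` and `to_csv`, and the measurement values are all invented for illustration.

```python
import csv
import io
from datetime import date


def extract(rows, start, end, parameters):
    """Keep rows dated within [start, end] and only the chosen parameters."""
    keep = ("station", "date") + tuple(parameters)
    return [{key: row[key] for key in keep}
            for row in rows if start <= row["date"] <= end]


def to_csv(rows):
    """Render the extracted rows as comma-separated text with a header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented measurements for two stations.
rows = [
    {"station": 1, "date": date(1997, 3, 10),
     "temperature": 12.5, "salinity": 35.6},
    {"station": 2, "date": date(1997, 7, 2),
     "temperature": 18.1, "salinity": 35.4},
]

table = to_csv(extract(rows, date(1997, 1, 1), date(1997, 6, 30),
                       ["temperature"]))
```

Comma-separated text of this kind can be opened directly in a spreadsheet, which is one simple route to the EXCEL and ASCII outputs mentioned above.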

Two different types of cruise report are also pre-defined in the outputs menu, and a dialogue menu is being developed to edit and fill in the ROSCOP format directly from the database.