Interoperable Catalogue System (ICS)

Collections Manual (CM)


 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
CEOS

Working Group on Information Systems and Services

CEOS INTeroperability EXtensions 

  DocRef.: CEOS/WGISS/CINTEX/CM

Date: April 1999

Issue: Version 1.3

Authority


Issue: Version 1.3
Date: April 1999

 

This document has been approved for publication by the Catalogue INTeroperability EXperiment (CINTEX) of the Committee on Earth Observation Satellites (CEOS) and reflects the consensus of the CINTEX technical panel experts from the CEOS member agencies.


Table of Contents


1. Introduction

1.1 Purpose

1.2 Organization of the Collections Manual (CM)

1.3 ICS/CINTEX Development Process

1.4 A Guide To CINTEX Documents

1.5 Glossary

1.5.1 Acronyms

1.5.2 Definitions

1.6 References

1.7 Catalogue Interoperability

1.7.1 Purpose and Scope of Catalogue Interoperability

1.7.2 ICS Concepts

1.7.2.1 Design Approach: CIP Space, IGP Space and ICS

1.7.2.2 Collections Data Model

1.7.2.3 CIP as a Z39.50 Profile

1.7.2.4 Browse Data in CIP

1.7.2.5 Data Product Ordering and Security

1.7.2.6 Guide Documents in ICS

1.7.3 Levels of Compliance to CIP, IGP and ICS

2. The ICS Collections Model

2.1 Descriptor Types

2.1.1 Archive Collection Descriptor

2.1.2 Theme Collection Descriptors

2.1.3 Theme and Archive Collection Descriptors Summary

2.1.4 Product Data

2.1.5 Product Descriptors

3. Collection Management

3.1 ISA Responsibilities

3.1.1 Collections Database

3.1.2 Explain Database

3.2 Collection Creation

3.2.1 Collections Structure

3.2.1.1 Formulating the Collections Structure

3.2.1.1.1 The Query Model

3.2.1.1.2 Analysis of Existing Data

3.2.1.1.3 Organizing the Data into a Collections Structure

3.2.1.1.3.1 Using the Query Model

3.2.1.1.3.2 Including Collections

3.2.1.1.3.3 Creating a Root Collection

3.2.1.1.4 Relating Collections/Guides

3.2.1.2 Review of the Collections Structure

3.2.2 Collections Database (CDB)

3.2.2.1 Adding Collections

3.2.2.1.1 Identify Additional Elements/Attributes

3.2.2.1.2 Verify Multiplicities

3.2.2.1.3 Verify Mandatory Attributes

3.2.2.1.4 Verify Valids

3.2.2.1.5 Populate Database

3.2.2.1.6 Adding Product Metadata

3.2.2.2 Review Processes

3.2.2.2.1 Scientific Review Process

3.2.2.2.2 Periodic Consistency Review Process

3.2.3 Explain Database

3.2.3.1 Adding a CIP Collection

3.2.3.2 Adding CIP Collection with Local Attributes

3.2.3.2.1 Creating a New Entry -- Present Service

3.2.3.2.2 Creating a New Entry -- Search and Present Service

3.3 Collection Evolution

3.3.1 Modifying the Collections Structure

3.3.1.1 Adding Collections

3.3.1.2 Deleting Theme Collections

3.3.2 Modify Existing Collections in the Collections Database

3.3.2.1 Deleting Collections

3.3.2.2 Modifying Existing Product Metadata

3.3.2.3 Deleting Product Descriptor

3.3.3 Explain Database

3.3.3.1 Modifying the Collection Related Entries in the Explain Database

3.3.3.2 Valids in the Explain

3.3.3.2.1 Capturing the Valids

3.3.3.2.2 Maintaining the ICS Valids

3.3.3.2.2.1 CIP Attribute Valid Value Changes

3.3.3.2.3 Adding Local Valids for Local Attributes
 



 
List of Figures

Figure 1-1 Collection Management Activities

Figure 1-2 ICS Domain

Figure 1-3 VENN Diagram of CIP Space and ICS

Figure 1-4 The Concept Of A Collection

Figure 3-1 Collections Process

Figure 3-2 Formulating the Collections Structure

Figure 3-3 Adding Collections to an Existing Collections Structure

Figure 3-4 Deleting a Theme Collection from an Existing Collections Structure

Figure 3-5 Including Collections

Figure 3-6 Including Remote Collections

Figure 3-7 Adding a Root Collection

Figure 3-8 Mandatory Data for Collection Descriptors

Figure 3-9 Mandatory Data for Product Descriptors

Figure 3-10 Add New CIP Collection Entry to Explain

Figure 3-11 Add Collection with Local Attributes to Explain -- Present

Figure 3-12 Add Collection with Local Attributes to Explain - Search & Present

Figure 3-13 Adding Collections - Option "1"

Figure 3-14 Adding Collections - Option "2"

Figure 3-15 Adding Collections

Figure A-1 Collection Structure "A"

Figure A-2 Collection Structure "B"

Figure A-3 Collection Structure "C"

Figure A-4 Collection Structure "D"

Figure A-5 Collection Structure "E"
 
 


List of Tables





Table 2-1 Example of Combining Attributes

Table 2-2 What Collections May Include

Table 3-1 Product Descriptor Analysis Results Table

Table 3-2 Collection Descriptor Analysis Results Table (Archive Collections)

Table 3-3 Collection Descriptor Analysis Results Table (Archive & Theme Collections)

Table 3-4 Collection Descriptor Analysis Results Table (Remote Collections)

Table 3-5 Collection Descriptor Analysis Results Table (Root Collection)

Table 3-6 Collection Descriptor Analysis Results Table (Adding Present Service Column)

Table 3-7 Explain Data Class Valid and Default Descriptions

Table 3-8 AVHRR Collection Explain Data Class Selections

Table 3-9 Present Service Extensions -- Explain Data Class Selections

Table 3-10 Search & Present Service Extensions - Explain Data Class Selections

Table 3-11 Updating Collection Descriptor Analysis Results Table

Table B-1 CIP Elements

Table B-2 CIP Item Descriptors ARS

Table B-3 CIP Sub-Elements ARS
 



Document Status Sheet

Version  Date            Comments
1.0      April 1997      First issue of document to CINTEX
1.01     May 1997        CET Review - Hold for Completion of Specification Consolidation
1.1      September 1998  CET Review – Document Reorganized per CET
1.2      February 1999   CET Review
1.3      April 1999      Baseline Version


    1.  Introduction

    1.1  Purpose

    This document provides procedures and guidelines for the creation and maintenance of information about Earth Observation (EO) Metadata Collections that are to be catalogued in the Interoperable Catalogue System (ICS) Retrieval Manager. This "data about data" is called "metadata". The metadata described in this document includes the ICS Retrieval Manager's Explain as well as Collections Databases. The major goal of this document is to provide sufficient detail to the Data Providers as well as the ICS Site Administrators (ISA) to ensure ICS metadata interoperability and to sort through the complexities of the "Collections" concepts. To assist in meeting this goal this document provides the recommended procedures and guidelines that can be applied to any/all implementation strategies by a small or large Data Provider.

    1.2  Organization of the Collections Manual (CM)

    The Collections Manual is organized into three sections.

    The first section provides general information about the contents of the Collections Manual, such as the Purpose, Glossary and Definitions. This information applies to all of the subsequent sections. This section also includes an overview of the Committee on Earth Observation Satellites (CEOS) INTeroperability EXtensions (CINTEX) approach to catalogue interoperability.

    The second section addresses the ICS Collections Model. In this section the various types of ICS Collections are defined in terms of their descriptors. Additionally, detailed characteristics that further define the various Collections are also addressed in this section.

    The third section addresses Collection Management activities that relate to the creation and maintenance of Collection Descriptors. These activities provide the framework and suggested strategy for creating Collections Structures and entries in the Collections Database (CDB). The components of the Explain Database that relate to Collection Management are also addressed in this section.

    The organization of the Collections Manual is centered around the roles that each individual should play in support of Collections Management. Figure 1-1 depicts these roles and their interaction with the various activities described in this manual. To begin, the Data Provider provides the mandatory metadata for their data Collections/products. The mandatory metadata, coupled with specific requirements obtained from the users as well as the Data Providers, serves as input to the analysis activities. The ISA then constructs or modifies a Query Model from the results of the analysis and creates or modifies a Collections Structure and the corresponding entries in the CDB and Explain Database.

    Figure 1-1 Collection Management Activities

    1.4  A Guide To CINTEX Documents

       
    A starting point for most users of CINTEX documents will be the SDD that provides tutorial information about how CIP, IGP and Collections are used in ICS. An implementer who wants the details of CIP messages may want to go directly to the CIP Specification. This is also true for an IGP implementer who may want to go directly to the Guide Design and Protocol Specification. Someone who is responsible for organizing the data for an agency may want to browse the SDD to understand the ICS data model and then proceed to the details in the Collections Manual and the Valids Document. Anyone interested in the direction the CINTEX is headed should review the CINTEX Work Plan. If technical input to the CINTEX direction is desired, reviewing the URD and proposing new User Requirements is the right approach.

    Additional information about CINTEX activities and documents can be found at:

    http://ceos.ccrs.nrcan.gc.ca/taskteam/cip.html
     

    1.5  Glossary

    The following sections list the acronyms and define the terms used throughout this document.

    1.5.1  Acronyms

        ARS Abstract Record Structure
        AVHRR Advanced Very High-Resolution Radiometer
        BNSC British National Space Centre
        CCRS Canada Centre for Remote Sensing
        CDB Collection Data Base 
        CEO Centre for Earth Observation (European Commission)
        CEOS Committee on Earth Observation Satellites
        CERES Clouds and the Earth’s Radiant Energy System
        CINTEX CEOS INTeroperability EXtensions
        CIP Catalogue Interoperability Protocol
        CM Collections Manual
        CMT Collection Management Tools
        CNES Centre National d'Etudes Spatiales (France)
        CSIRO Commonwealth Scientific and Industrial Research Organisation (Australia)
        DB Data Base 
        DBMS Data Base Management System
        DLR Deutsches Zentrum für Luft- und Raumfahrt
        EO Earth Observation
        ESA European Space Agency
        ESRIN European Space Research Institute (ESA)
        GEO Geospatial Metadata Profile
        GILS Government Information Locator Service
        GIS Geographic Information System
        GSFC Goddard Space Flight Center (NASA)
        HTML HyperText Markup Language
        HTTP Hypertext Transfer Protocol
        ICS Interoperable Catalogue System
        ID IDentifier
        IGP ICS Guide Protocol
        IRE-RAS Institute of Radio Engineering and Electronics, Russian Academy of Sciences
        ISA ICS Site Administrator
        ISO International Organization for Standardization

        LaRC  Langley Research Center (NASA)
        LIS Lightning Imaging Sensor
        MODIS Moderate-Resolution Imaging Spectroradiometer
        NASA National Aeronautics and Space Administration (US)
        NASDA National Space Development Agency (Japan)
        NOAA National Oceanic and Atmospheric Administration (US)
        NRSC National Remote Sensing Center 
        NSRS  Natural Environment Research Council (BNSC)
        PTT Protocol Task Team (Part of CEOS- WGISS-AS)
        QA Quality Assurance
        RM Retrieval Manager
        RPN Reverse Polish Notation
        SAGE Stratospheric Aerosol and Gas Experiment III
        SDD System Design Document
        SST Sea Surface Temperature
        TBD  To Be Determined
        TBR To Be Resolved 
        TBS To Be Supplied
        URD User Requirements Document
        URL Uniform Resource Locator

        VISSR Visible and Infrared Spin Scan Radiometer
        WGISS Working Group on Information Systems and Services (Part of CEOS)
        WWW World Wide Web

    1.5.2  Definitions

    This section provides definitions of the terms related to ICS and used in this document:
       
      Abstract Record Structure An Abstract Record Structure is the primary component of a database schema. An Abstract Record Structure applied to a database record results in an abstract database record.
      Archive An Archive of EO data can hold various types of data ranging from satellite images and climatological products processed from the images, to observation data and climatological statistics. An Archive may also contain information describing the EO data and also supplementary data such as design documentation, algorithm object and source code, technical reports, user manuals, etc.

      There is likely to be a database management system for maintenance and low-level access to the data. The archive will, in general, be accessed by a front-end archive server that then presents the data as requested by the Retrieval Manager.

      Archive Collection Group of related metadata based on the contents of the Archive.
      Attribute An attribute is a characteristic of a search term, or one of several characteristic components that together form a characteristic of a search term.
      Catalogue Interoperability The ability to provide a Data User with the appearance of a single, unified catalogue for all participating data providers. In order to provide catalogue interoperability, all participating data providers must support at least one common method (i.e. API) for accessing functions such as authentication, directory, inventory, guide and order. Each supplier may support additional consumer functional interfaces to support their private data users.
      Catalogue System A Catalogue System provides services such as inventory, browse, directory, order and guide, which may be supplemented by further services, but should contain at a minimum, inventory. The CIP is the protocol that shall enable the many services of many catalogue systems to interoperate. Usually a catalogue system resides at a particular agency or data provider facility but may be distributed across catalogue sites.
      Catalogue Translator One of three types of ICS Translators. Catalogue Translator converts CIP messages into a data provider's protocol for the services of Inventory, Directory, Browse, Guide and Ordering. A detailed discussion about the various translators can be found in the ICS System Design Document, Section 3.5.
      Collection  A Collection is (1) a group of related data items with certain common characteristics. Collections are generally defined by data providers, but may also be defined by users. (2) An abbreviated term for "Collection Descriptor". 
      Collection Category The type of Collection i.e. Theme or Archive. 
      Collection Descriptor Metadata description of a Collection. This descriptor includes pointers to items included in the Collection. The included items may themselves be Collections, thus forming a hierarchical Collections Structure.
      Collection Management Tool Used by the ISA for tasks involved with populating and maintaining the data in the Retrieval Manager. These tasks involve translating Collection or directory information into CIP Collection format, checking for valid entries and the presence of mandatory data.
      Collections Structure A Collections Structure is a logical organization of Collection and Product Descriptors.
      Data Provider Individual responsible for the Scientific Content of the data product(s).
      Element An element is the smallest unit of information used to define the schema elements, which in turn define a schema.
      Element Sets  Element Sets are a compilation of elements that form a view of the data. CIP has identified eight (8) Element Sets. Each of these sets is described below.

      Full (F): the Full element set includes all defined standard elements from the appropriate database schema (for the CIP, this is usually the CIP database schema, as defined in Appendix B), and so, when applied, results in a null transformation. This is a large set of elements, but it ensures that clients receive everything their users may need to evaluate the retrieval record for further processing. 

      CIP Full (CF): the CIP Full element set includes all defined standard CIP elements from the CIP database schema as defined in Appendix B. When the CIP database schema is used, the Full and CIP Full element sets are therefore equivalent. However, when a custom database schema is used (i.e. a custom schema defined by a data provider and containing Local Attributes in addition to the standard CIP ones), the CIP Full element set contains uniquely the standard CIP elements, whereas the Full element set contains all the elements defined in the custom database schema, i.e. the standard CIP elements and the custom local elements.

      Brief (B): the Brief element set includes a minimal subset of the defined standard schema elements available from the appropriate database schema.

      Summary (Sum): the Summary element set includes a subset of the defined standard schema elements that is appropriate for interoperability with the GEO profile.

      Browse (Br): the Browse element set includes a subset of the defined schema elements that are appropriate for retrieval of browse data alone.

      Options (Opt): the Options element set includes a subset of the defined schema elements that are appropriate for retrieval of options alone.

      Local Attributes (LA): the Local Attributes element set includes a subset of the defined standard schema elements that is appropriate for retrieval of the local attributes in a product descriptor.

      Collection Members (CM): the Collection Members element set includes a subset of the defined schema elements that are appropriate for retrieval of the collection hierarchy tree.

      Appendix B of this document identifies the elements that are included in each of the above Element Sets.

      Guide  Data that is available to the user to enhance understanding of the EO data, spacecraft, instrument, etc., and hence make a detailed analysis of whether the product data will be of value for a particular application. Guide data may also contain information necessary for processing the product data further, such as calibration coefficients.
      ICS Site Administrator The human operator who performs all tasks needed to establish and maintain a Retrieval Manager. In practice this is more than one person, as the tasks are of various types: a scientist for Collection definition, a database expert for maintaining the CDB, a system operator for diagnosing and correcting operational problems, etc. For convenience, this document treats all of these tasks as performed by the ISA.
      Item Descriptor An item descriptor is comprised of one or more attributes. The attributes describe the item in question in a consistent manner, therefore resulting in dynamically defined item instances. The item descriptor is used to represent a number of key objects in the CIP domain, such as the Collection Descriptor.

      The attributes and their values that constitute the item descriptor can be searched on so as to identify a particular item descriptor or group of item descriptors.

      Present Service The present service allows the client to request response records corresponding to database records represented by a specified result set.
      Product data A unique aggregation of data generated from information held in, or to be held in an archive (for predicted products). It can be located and retrieved by a user via CIP, possibly following further processing, such as map projection, sub-setting, band selection, etc., after or during extraction of the raw data as stored in the archive.
      Product Descriptor Metadata description for product data.
      Result Set A local data structure used as a selection mechanism for the transfer of records, identified by a query. Its logical structure is a named ordered list of result set items, and possibly, unspecified information that may be used as a surrogate for the search that created the result set.
      Retrieval Manager A Retrieval Manager services (and may be installed at) each catalogue site. It is used to integrate the local catalogue systems and to provide communication between users and other catalogue site Retrieval Managers. It is anticipated that each catalogue site would have at least one Retrieval Manager and that the Retrieval Manager would ‘know about’ or ‘own’ a number of Collections. The data within these Collections would be the responsibility of that Retrieval Manager, with external Collections referenced only and managed by their respective Retrieval Managers.

      The Retrieval Managers at each catalogue site would also communicate with each other using the CIP. The Retrieval Manager would then also communicate with local catalogue servers, such as archives and inventories, within its own site to service requests received from users. Another key function of the Retrieval Manager is to route search queries to other relevant Retrieval Managers and consolidate the search results before returning them to the user.

      Schema A common understanding shared by the client and the server of the information contained in the records of the database. The schema is defined in terms of schema elements.
      Search Service The search service enables an origin to query databases at a target system, and to receive information about the results of the query.
      Tag Set A tag set is the set of identifiers for the elements. 
      Task Package The set of attributes that describe an activity which is started by an Extended Services Request. Based on Z39.50 definition for a Task Package.
      Theme Collection Group of related metadata based on a common theme or purpose.
      Translators Software element that converts CIP into the protocols used by a data provider. Three Translators are identified in ICS: Catalogue Translator, OHS Translator, UPS Translator. A detailed discussion about the various translators can be found in the ICS System Design Document, Section 3.5.

    1.6  References

    The following list of documents may offer additional technical information to the reader who may be interested in CIP/ICS.

    [R1] Catalogue Interoperability Experiment (CINTEX) Development Plan, CEOS/WGISS/CINTEX/Plan, Issue 1.0, 19 July 1996, Committee on Earth Observation Satellites /CINTEX

    [R2] Interoperable Catalogue System (ICS) User Requirements Document (URD), CEOS/WGISS/CINTEX/ICS-URD, Issue 2.2, March 1997, Committee on Earth Observation Satellites /CINTEX

    [R3] Catalogue Interoperability Protocol (CIP) Specification - Release B, CEOS/WGISS/CINTEX/CIP, Issue 2.4, June 1998, Committee on Earth Observation Satellites /CINTEX

    [R4] Interoperable Catalogue System (ICS) System Design Document (SDD), CEOS/WGISS/CINTEX/SDD, Issue 1.4, June 1998, Committee on Earth Observation Satellites /CINTEX

    [R5] Information Retrieval (Z39.50): Application Service Definition and Protocol Specification, ANSI/NISO Z39.50-1995, Official Text, July 1995, Z39.50 Maintenance Agency.

    [R6] ICS Guide Design and Protocol Specification, CEOS/WGISS/CINTEX/GDPS, Issue 1.1, July 1998, Committee on Earth Observation Satellites/CINTEX

    [R7] CIP Specification Valids, CEOS/WGISS/CINTEX/GDPS, Issue 0.5, August 1998, Committee on Earth Observation Satellites/CINTEX

       

    1.7  Catalogue Interoperability

    This section provides an overview of the CINTEX efforts in support of catalogue interoperability. It addresses the purpose and scope of catalogue interoperability, the key ICS concepts, and the levels of compliance to CIP, IGP and ICS.
     

    1.7.1  Purpose and Scope of Catalogue Interoperability

    The Committee on Earth Observation Satellites (CEOS) comprises international space agencies. CEOS promotes the interoperability of space agency catalogues through the definition and development of interoperability concepts. By enhancing the standardization of EO data and information management services, CEOS enables the catalogue services to be more accessible and usable to data providers and data users worldwide. EO catalogue services, as defined by CEOS, are as follows:

    Catalogue Interoperability in this context is defined as: the ability to provide a Data User with the appearance of a single, unified catalogue for all participating data providers. In order to provide catalogue interoperability, all participating data suppliers must support at least one common method (i.e., API) for accessing functions such as authentication, directory, inventory, guide and order. Each supplier may support additional consumer functional interfaces to support their private data users.

    Catalogue interoperability may extend beyond just the members of CEOS in promoting data access within a wider community of EO data providers and eventually to non EO data providers.
     

    1.7.2  ICS Concepts

    This section introduces key concepts for the understanding of the CIP, IGP and ICS.
         

    1.7.2.1  Design Approach: CIP Space, IGP Space and ICS

    The CINTEX design approach considers catalogue interoperability as the loose coupling of a federation of existing catalogue systems using a set of common protocols. The approach provides users with the services and metadata available at all sites, regardless of which site the user established a connection with. The Catalogue Interoperability Protocol (CIP) standardizes the services needed for interaction between users and catalogues of EO data products. The ICS Guide Protocol (IGP) standardizes the services needed for a user to discover EO-related documents (i.e., guide documents). The Interoperable Catalogue System (ICS) is a design which uses CIP and IGP as the common protocols between data providers and users of the data. The objective of implementing CIP and ICS is to provide more users with access to more data more easily.
The ICS domain, shown in Figure 1-2, is divided into two virtual domains.

Figure 1-2 ICS Domain


 



 

To support transparent access to multiple catalogues, a three-tier structure was used to design the ICS space. Clients exchange messages with a middleware layer, which in turn interacts with multiple catalogue servers. The middleware provides the routing and translation services that allow client requests to be presented to the multiple heterogeneous catalogues. The middleware consists of two types of elements: Managers and Translators. Managers provide an access point for clients and route the requests to the various servers. Translators, bound with the clients and servers, translate CIP or IGP to and from the native protocol of the client or server. Future client and server developments may use CIP or IGP directly and hence not require translators.

This approach supports a diversity of clients and servers. Clients may be used directly by a human user or may be an agency system acting on behalf of a user. Depending on the design of an existing catalogue system, services may be provided by different servers and translators. Because the routing service provided by the middleware is independent of the type of service, separate translators may be provided for inventory, browse, ordering, and user profiles. This architecture is also applicable to small data providers, such as university research groups, who are unable to provide adequate middleware at their site but still wish to join the ICS domain. Their local catalogue, inventory and guide documents can be made available to the ICS community by the inclusion of appropriate descriptions within another agency’s middleware.

CIP Space is a protocol-centric view of catalogue interoperability and provides for the loosest coupling needed to achieve catalogue interoperability among a wide community of EO data providers and users of EO data. A range of design solutions is permitted by the CIP and IGP spaces. To provide for a higher degree of uniform services at the cost of additional agreements between agencies, additional design for interoperability is defined in the ICS design document. The additional design definitions pertain to the allocation of functionality and data amongst components, agreement on an underlying communication protocol, and agreement on how to conduct distributed system management of ICS. The difference between CIP Space and the ICS is depicted in Figure 1-3. CIP Space is defined by those CEOS agencies and other federations and organizations which provide catalogue services using CIP and/or guide services using IGP. Those CEOS agencies which provide services, communications and systems management compatible with the ICS design make up the ICS. It should be noted that while all ICS members must implement CIP, guide handling is considered an optional element of ICS, and an ICS member may choose not to implement IGP. Note that other federations may choose to use the ICS design as the basis for their federation.


Figure 1-3 VENN Diagram of CIP Space and ICS


 



 
 
 

Assuming query and result routing between geographically dispersed sites (see Figure 1-2), an agreed middleware layer and its interfaces to users and providers need to be in place. To define such a system, CINTEX has established the following CIP, IGP and ICS standards:

    1.7.2.2  Collections Data Model

    In an interoperable catalogue environment it is essential to organize metadata by distinguishing user and provider views as well as archive-oriented and theme-oriented structures. It is important to define a mapping between the different views and structures, which will often lead to a hierarchical relationship of Collections. The ICS data model is based on the notion of Collections. A Collection may contain descriptors for data products or descriptors for other Collections. In addition to their value for presenting the organization of data to users, Collections provide the mechanism for routing distributed searches. When a Collection contains both local and remote members, the Retrieval Manager may search the local Collection and also send the search on to the remote site.
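
    The routing behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the CIP specification: the `Member` and `Collection` classes, their field names, the `plan_search` function and the site names `RM-A` and `RM-B` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Member:
    """One included item of a Collection (hypothetical structure)."""
    item_id: str   # identifier of a product or collection descriptor
    kind: str      # "product" or "collection"
    site: str      # Retrieval Manager that holds the descriptor


@dataclass
class Collection:
    collection_id: str
    home_site: str
    members: List[Member] = field(default_factory=list)


def plan_search(collection: Collection) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split a search into local work and requests to send on.

    Returns (local_ids, forwards); forwards holds (remote_site, item_id)
    pairs for members held by another Retrieval Manager.
    """
    local_ids: List[str] = []
    forwards: List[Tuple[str, str]] = []
    for m in collection.members:
        if m.site == collection.home_site:
            local_ids.append(m.item_id)
        else:
            forwards.append((m.site, m.item_id))
    return local_ids, forwards


sst = Collection(
    collection_id="SST-THEME",
    home_site="RM-A",
    members=[
        Member("AVHRR-ARCHIVE", "collection", "RM-A"),
        Member("AMSR-ARCHIVE", "collection", "RM-B"),
    ],
)
local_ids, forwards = plan_search(sst)
print(local_ids)  # ['AVHRR-ARCHIVE']
print(forwards)   # [('RM-B', 'AMSR-ARCHIVE')]
```

    In this sketch the local member is searched at the hosting site, while the remote member yields a request to be sent on to the other site.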

    Collection Characteristics are terms that serve to further describe the roles of the various Collections within ICS. For example, Terminal Collections are those Collections that appear in terminal positions in a Collections Structure and thus have a semantic meaning that ICS recognizes and responds to. Therefore, it is important for the ISA to understand the following Collections Characteristics.

        • Commonality
        • Evolution over time
        • Terminal Collections
        • Identifier
        • Uniqueness
        • Remote Collections
        • Related Collections
        • Local Attributes
Commonality:

By definition, a Collection is a grouping of items that have something in common. A Collection may have members that have many or fully common attributes (Archive Collection), or a Collection may have members that have a common semantic theme, though only a small subset of common attributes (Theme Collection).

Further, CIP specifies a list of standard attributes that can be searched. Some of these standard attributes are mandatory for all Collection members (with different mandatory sets for different descriptor types), while some are optional (although commonly understood). Finally, some attributes can be locally defined.

Evolution over time:

Static members do not change over time. This can be a result of static underlying Collections or of the mechanism used to create the Theme Collection; for example, the members of a Volcanic Eruption Theme Collection will most likely remain static over time.

Dynamic members may change over time based on changes in the included Collections. It is envisioned that the majority of Collections will not be static, but will evolve as ICS is used. This will occur in response to the way in which the user community wants to view relationships between the various data held by ICS, and the ways the CEOS Agencies wish to respond to those desires. Dynamic membership will require close supervision by the Retrieval Manager Administrator to ensure that the Collection Descriptor information is current.

Terminal Collection

  As Collections can contain pointers to other Collections, reflected in the Included Item Descriptor IDs, there exists the concept of a ‘Collections Structure’ with the leaves of the structure's branches being product descriptors. The Collections that include only product descriptors are termed ‘Terminal Collections’. However, Theme Collections that include Remote Collections may act as Terminal Collections at the RM that hosts the Collections Structure.

Identifier

Each member (collection descriptor) of a Collections Structure must have an identifier that is unique within the Retrieval Manager. This unique identifier of a Collection Descriptor (Archive or Theme) will include the Retrieval Manager (RM) identifier and the Collection Identifier. A Product ID is assumed to be unique within its home Archive Collection.
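
The two-part identifier can be illustrated with a minimal sketch. It assumes, as the example IDs in Table 2-1 suggest but this manual does not state, that an Item Descriptor ID has the form scheme://rm-identifier/collection-identifier. The names CollectionRef and parse_item_descriptor_id are hypothetical and are not part of CIP.

```python
from typing import NamedTuple


class CollectionRef(NamedTuple):
    """Hypothetical split of an Item Descriptor ID into its two parts."""
    rm_id: str          # assumed: the host part names the Retrieval Manager
    collection_id: str  # identifier that must be unique within that RM


def parse_item_descriptor_id(item_id: str) -> CollectionRef:
    # Assumed layout: <scheme>://<rm_id>/<collection_id>
    scheme, sep, rest = item_id.partition("://")
    if not sep or not scheme:
        raise ValueError(f"missing scheme in {item_id!r}")
    rm_id, slash, collection_id = rest.partition("/")
    if not slash or not rm_id or not collection_id:
        raise ValueError(f"expected <rm>/<collection> in {item_id!r}")
    return CollectionRef(rm_id, collection_id)


ref = parse_item_descriptor_id("z39.50s://larc.nasa.gov/thc_1")
```

Under this assumption, two Collections held by different Retrieval Managers may share a Collection Identifier and still have distinct full identifiers.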

Uniqueness:

A Collection Member (included item descriptor ID) may be a member of more than one Collection. However, duplicate members (included item descriptor ID’s) must not be visible within a single Collection. For example, Provider Archive Collection AMSR on ADEOSII could not appear twice as an included item descriptor ID in a Collection that contained both the Sea Surface Temperature Collection and the Andes Event Collection. This property is known as elimination of duplicates to achieve uniqueness.

In the case of a Collection that is a child of two or more included Collections, any operation, such as a search, that traverses the Collection Tree from the top-level Collection will visit the child Collection repeatedly. The unique Collection Identifier provides a means of preventing repeated operations on the same Collection. This is achieved by noting which tree nodes (unique identifiers) have been visited and then restricting access to those nodes for the remainder of the same search. The Retrieval Manager performs this function.
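
The visited-node technique can be sketched as follows. This is an illustration only, not the Retrieval Manager implementation: the Collections Structure is assumed to be available as a plain mapping from a Collection ID to the IDs it includes, and the function name and sample IDs are hypothetical.

```python
from typing import Dict, List, Optional, Set


def search_collections(root: str,
                       includes: Dict[str, List[str]],
                       visited: Optional[Set[str]] = None) -> List[str]:
    """Return each reachable Collection ID once, in first-visit order."""
    if visited is None:
        visited = set()
    if root in visited:
        return []            # already handled during this search
    visited.add(root)
    order = [root]
    for child in includes.get(root, []):
        order.extend(search_collections(child, includes, visited))
    return order


# Hypothetical structure: "shared" is a child of both "a" and "b".
includes = {"top": ["a", "b"], "a": ["shared"], "b": ["shared"]}
result = search_collections("top", includes)
```

Because the visited set is shared across the whole traversal, the same approach also terminates if the structure accidentally contains a cycle.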

Remote Collections:

Remote Collections are members of a Collection Hierarchy whose descriptor information is stored or maintained at a CIP site other than the one in which it is listed as an included remote item.

Normally, a Collections Structure would be held in one place (for example as a database on a computer). A logical Collection Tree is where one or more of the members are held elsewhere - the complete Collection Tree thus spans multiple sites. If a Collection references (lists an included item descriptor or included task package) a Member Collection at a remote site, this Member Collection is termed a ‘remote member’.

Collection descriptors do not have to maintain information about which Remote Parent Collections refer to them; remote members are indistinguishable from local Collection Members from the user’s point of view. This concept is supported by the consistent use of Uniform Resource Locators (URLs) to identify Collection Descriptors, in the same manner as the complete World Wide Web (WWW) is seen by the user as a single database. A Retrieval Manager ‘owns’ those Collections for which it stores and maintains the attributes; for remote members it stores only the pointers (Included Item Descriptor IDs/included task packages), not their attributes and values.

No attribute or value of an attribute for a remote member, or the pointer (included item descriptor/task package name) to the remote member, can be guaranteed. The Retrieval Manager where the remote member is stored may not be available; the remote member may have changed its data structure (adding, changing or deleting attributes), or the remote member may have been deleted from the remote Retrieval Manager.

Related Collections:

Collections may be related to one another without the need for a "parent-child" or the "include" construct. The relation may be through content or purpose, for example, and allows the spanning of one Collection Tree to another. A Collection Descriptor will contain a list (possibly empty) of related Collections as part of its content.

Local Attributes

Local Attributes are Collection-specific characteristics. The existence of Local Attributes may be specified within a Collection Descriptor by setting the Local Use Attribute Flag Element: 0 indicates that the Collection Descriptor does not contain Local Attributes; 1 indicates that the Local Attributes are described within the Collection Descriptor, with corresponding values captured in the member Product Descriptors; 2 indicates that the Local Attributes are described in the Explain Database, with the values for the attributes captured in the Product Descriptor. Local Attributes Using the Collections Database and Local Attributes Using Explain are addressed in Section 3.
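
The three flag values can be modelled as a small enumeration. The sketch below is illustrative: only the numeric values 0, 1 and 2 and their meanings come from the text above, while the Python names are hypothetical.

```python
from enum import IntEnum


class LocalUseAttributeFlag(IntEnum):
    NONE = 0                      # no Local Attributes
    IN_COLLECTION_DESCRIPTOR = 1  # described in the Collection Descriptor
    IN_EXPLAIN_DATABASE = 2       # described in the Explain Database


def local_attribute_definition_source(flag: int) -> str:
    """Say where the Local Attribute definitions live for a given flag."""
    value = LocalUseAttributeFlag(flag)  # raises ValueError for other values
    if value is LocalUseAttributeFlag.NONE:
        return "none"
    if value is LocalUseAttributeFlag.IN_COLLECTION_DESCRIPTOR:
        return "collection descriptor"
    return "explain database"
```

In both non-zero cases the attribute values themselves are held in the Product Descriptors; the flag only says where the attribute definitions are found.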

Ordering Nodes

TBD

  The Collection concept is visualized in Figure 1-4 below. The Collections in the diagram are numbered so that their relationship can be easily seen; the numbers do not represent the naming of Collections in an actual implementation. The Terminal Collections (labeled ‘1.x’) group the product descriptors (inventory entries) as is appropriate. As can be seen, the Collections can overlap each other and product descriptors can appear in more than one Terminal Collection. Above the terminal level Collections, there are non-Terminal Collections that group together any number of other Collections. The grouped Collections do not all have to be at the same hierarchical level, and this grouping of Collections can continue to any hierarchical level, with existing Collections being included at any other arbitrary level. A non-terminal Collection could group together Terminal Collections and other non-terminal Collections (as the link between Collections 3.1 and 1.5 shows). Also, a Terminal Collection could exist without a relationship to a higher Collection (e.g. Collection 1.9), or a non-terminal Collection could exist with no relationship to lower Collections, in other words a Collection without members (e.g. Collection 2.5). Collection 1.9 cannot be reached by a hierarchical search, but could be located if its URL were made public (an example of such a Collection may be a Collection under construction).
Collections can be used to group data together which have a similar semantic theme. All Collections support the search mechanisms defined in the CIP. CIP defines two types of searches that a CIP user may request:
Additionally, the user may request that the search be contained locally to the target Retrieval Manager (i.e., a local search), or request that the search be propagated to other Retrieval Managers based on the Collections (i.e., a distributed search).

Figure 1-4 The Concept Of A Collection

    1.7.2.3  CIP as a Z39.50 Profile

    Based on a set of user requirements, and an analysis of existing communication standards, Z39.50 was selected as the base protocol for CIP. CIP has exploited and extended the services of Z39.50 to provide distributed searching, extensions to attribute set definitions, and the definition of a secure ordering service. The Z39.50 protocol is designed for information search and retrieval within a generic domain, which, together with the powerful services and data structures it supports, makes it an ideal basis for an EO domain search and retrieval protocol. CIP is a profile of Z39.50, i.e. it defines the use of the Z39.50 facilities within the CIP domain and defines the attributes that are used to search and present EO information. Other Z39.50 profiles include Government Information Locator Service (GILS), Geospatial Metadata Profile (GEO) and the Digital Collections Profile. CIP extends Z39.50 for distributed searching by supporting the Collection data model discussed in section 1.7.2.2, which allows hierarchies of related Collections to be constructed and searched. Additional support for compatibility is provided by the requirement that Retrieval Managers should be able to support access by any Z39.50 Version 3 compatible client.

    The GEO Profile supports Geographic Information Systems (GIS) applications and thus is of special interest to users of EO data. For this reason an alignment of the CIP and GEO profiles has been made. The objective of this alignment has been to allow both GEO and CIP clients to search and retrieve records from databases defined by either profile, and thereby maximize interoperability. The alignment was helped by the similarity of the spatial and temporal attributes of the metadata, but has had to take into account the different data models in CIP and GEO. It should be emphasized that CIP/GEO interoperability covers only search on the intersection of CIP and GEO attributes and the retrieval of item descriptors. There is no interoperability on the more advanced functions of CIP such as ordering and security.

           

    1.7.2.4  Browse Data in CIP

    Browse data helps users to evaluate EO data products. Browse data are typically reduced resolution or summary versions derived from the EO data product itself. Browse data are delivered to the user via two different mechanisms: dynamically over the network during a user query session, or as an EO data product order. The second mechanism allows users to order the Browse data from an archive system to be delivered separately from their query session, so that the user can store and access the data locally rather than dynamically over a network. It is important to note that although most catalogue systems will provide some form of reduced data retrieval, it is not a mandatory CIP service. The form and content of browse data depend on the nature of the associated EO data and on the data selection criteria that a science discipline needs in order to evaluate the EO data. Browse data in the CIP is seen as one of the following forms:
     

    1.7.2.5  Data Product Ordering and Security

    CIP includes an ordering method, including the ability to specify order options, and includes provisions for authentication and non-repudiation of orders. A user can retrieve the order options associated with a data product, where order options may be processing as well as packaging options. CIP allows the local order handling system to define the order options without attempting to define an all-encompassing order options standard. The user can request a quote for a specific order and submit the order. The order process is monitored by the Retrieval Manager and can be queried later by the user to determine the status of the order. To support ordering of data for which a user must have privileges, or orders for which the user will be charged, an authentication scheme has been defined. The authentication supports digital signatures using either a shared (symmetric) key approach or a public (asymmetric) key approach. Authentication allows the Retrieval Manager to identify the user with an appropriate level of confidence and enables the Retrieval Manager to log the authenticated user requests to provide non-repudiation. The CIP security approach avoids the need to transfer password information over the network. Future enhancements to CIP anticipate the ability to support the transfer of financial information to support billing.
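
The idea behind the shared (symmetric) key approach can be sketched as follows. This is an assumption-laden illustration, not the CIP mechanism: HMAC-SHA256 is used as a stand-in for the shared-key signature, and the key, the order encoding and the function names are all hypothetical. The actual scheme is defined in the CIP Specification.

```python
import hashlib
import hmac


def sign_order(shared_key: bytes, order: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the order bytes."""
    return hmac.new(shared_key, order, hashlib.sha256).hexdigest()


def verify_order(shared_key: bytes, order: bytes, tag: str) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sign_order(shared_key, order), tag)


key = b"example-shared-key"                      # hypothetical; agreed out of band
order = b"product=arc_atm_ceres_01;media=tape"   # hypothetical order encoding
tag = sign_order(key, order)
```

Only the order and its tag travel over the network; the key itself is never sent, which is consistent with the stated aim of not transferring password information.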
     

    1.7.2.6  Guide Documents in ICS

    Much of the metadata for EO data Collections is not easily stored in a structured form. This information is stored in documents called Guides. Since Guide documents provide information that is required for the understanding of some EO data Collections, they must easily be accessed via ICS mechanisms that provide search and retrieval of catalogs. Guide documents also provide human readable descriptions of EO data Collections and are often used by new EO data users as a discovery mechanism to identify Collections of interest. It is the goal of the ICS to make this discovery mechanism as simple and widely available as possible to extend the uses of EO data to communities which have not traditionally used EO data. This goal has resulted in the definition of a guide system in ICS which uses ICS Guide Protocols (IGP) based on HTTP and enables general purpose Internet Search/Discovery Engines such as Alta Vista to locate EO Guide documents either by free text or attribute value searches.

    This system is not based on Z39.50 and is not a mandatory capability of an ICS node. However, there is a strong linkage between the CIP client/Retrieval Manager and the HTTP-based client and indexing method for Guide Documents. To allow coordinated access to catalogues and documents, an ICS client was designed with a CIP Client component and an IGP Client component. The ingest of documents and Collections into the ICS is coordinated by the Collection Management Tool (CMT), which assists in maintaining the consistency of the Collection descriptor and the HTTP index that enable search and access of Guide Documents. Further details of this ingest process are discussed in this Collections Manual. The specific design of the guide system can be found in the ICS Guide Design and Protocol Specification [R7].
     

    1.7.3  Levels of Compliance to CIP, IGP and ICS

    The ICS and CIP are complex documents with many services and mechanisms specified. As discussed in Section 1.7.1, agencies may choose to implement a wide range of these services in their CIP Clients and Retrieval Managers. It is critical for the designers and implementers of these software components to understand which capabilities are essential to the minimal operation of the ICS, and must therefore be implemented in all components, and which capabilities are optional. In addition, it is assumed that various CIP-based components will be available either as shareware or commercial software. The developers of ICS or other CIP federations will need a method to categorize and select among these available components. For this reason, compliance levels have been defined within both the ICS SDD and the CIP Specification.
     
These compliance concepts are interdependent since in order to support a specific ICS service, the RM and CIP Client must support the CIP messages that enable that service.
 

IGP has only one compliance level, namely full compliance, so a system that does not implement the full IGP Specification cannot be considered IGP compliant.


    2.  The ICS Collections Model

    The basic components of the ICS Collections Model are the descriptors. The following sections describe the intent of the descriptors and the types of descriptors that can be included in the Collections Model.

    2.1  Descriptor Types

    In general, Collections Metadata describe the contents of a group of associated data. This set of descriptive metadata in ICS is called a Collection Descriptor. Collection Descriptors contain the attributes/elements that are used to capture a Collection of related information. This information can be thematically related (Theme Collections) or associated as a result of common information contained in the underlying product data (Archive Collection). This distinction is important as each category (Theme/Archive), is governed by a separate set of rules that specify the contents and creation possibilities for the Collection Descriptor category.

    In addition to these two primary categories of Collection Descriptors, ICS also supports the capturing of Product Descriptors. These descriptors describe the contents of the individual product data stored in the local archives.

    This section of the Collections Manual addresses the rules associated with the creation of Collection and Product Descriptors. Each subsection will begin with a brief definition of the object (Collection or Product Descriptor) followed by the mandatory information for each object and a discussion of the rules surrounding their creation. The purpose for this discussion is to ensure that the ISAs create these objects with the same semantic definition. This in turn will provide an important step towards data interoperability.

       

    2.1.1  Archive Collection Descriptor

    What They Are:

    Archive Collection Descriptors are aggregations of product descriptor information. One Archive Collection Descriptor will typically exist for many data products. The common information across these products is reflected in the Collection Descriptor for the Archive Collection. For Collections to be considered Archive the following should be true:

      1. The same time and space attribute types (i.e., Bounding Rectangle and Temporal Range) will characterize each product in the Archive Collection.

      2. All of the product descriptors reflected in the Archive Collection have the same set of elements in their description.


    What They Contain:

    These Collection Descriptors must contain

  1. The mandatory metadata identified for the Collection Descriptor (see Appendix B, Tables B-2 and B-3) and reflected in the Mandatory Data for Collections, Figure 3-3.
    Temporal Coverage

    Spatial Coverage

    Included Product Descriptors

    Keyword (Spatial)

    Keyword (Temporal)

    How They Are Created:

    The Data Provider will create these Collections from the information contained in the local inventory system.
     

    2.1.2  Theme Collection Descriptors

    What They Are:

    Theme Collections are Collections that are based on themes or topics of interest. They are a mechanism for organizing the Earth Observation Data into manageable sets of information. For example, a Data Provider may wish to organize several Archive Collections into a Sea Surface Temperature Theme Collection that may or may not contain homogeneous data. The geographic extents of the Archive Collections may be non-overlapping, resulting in a combined global extent for the new Theme Collection. Alternatively, several of the Archive Collections may contain Atmospheric Geophysical Parameter Data, while the remaining Archive Collections address Sea Surface Temperature Measurements. Theme Collections therefore allow the ISA and/or Data Provider to combine many existing Collections into a single Collection.

    What They Contain:

    Each Theme Collection must

  1. Contain as a minimum the mandatory data identified in Appendix B Table B-2 and B-3, for the Collection Descriptor, and illustrated in the Mandatory Data for Collections Figure 3-8, Section 3.2, of this document.
  2. Contain at least one (1) Included Item Descriptor ID.
  3. Eventually trace to at least one (1) Provider Archive Collection so that the goals of ICS can be met.
  4. Additionally, these Collections, like the Archive Collections, may contain dynamic or static data. The content of a static Collection will not vary over time. For example, a Science Data Provider may choose to create a Theme Collection that references all known Collection Descriptors containing information about a specific event, e.g., the Mid West Flood in the US in 1992. Once established, the content of this Collection would not necessarily vary. A dynamic Collection, on the other hand, will change as frequently as the underlying (included) Collection/Product Descriptors change. For example, assume that a Sea Surface Temperature Theme Collection has been established that includes several Archive Collections, and that these Archive Collections are still being updated with additional data product information. If the additional data product information changes the Temporal Coverage of the related Archive Collections, the Sea Surface Temperature Theme Collection’s Temporal Coverage must also be updated to reflect the changes in the underlying Archive Collections.
How They Are Created:

There are several creators of the Theme Collections. The Data Provider may create Theme Collections to support his user communities or research activities. He may include in his Theme Collection other Theme Collection, Product or Archive Collection Descriptors. Table 2-1 provides an example of combining attributes from various Collections to create a single Theme Collection Descriptor. Column one identifies the attributes that will need to be combined across Collection Descriptors. Columns two through four identify the values of the attributes in the existing Collection Descriptors that will serve as input to the new Collection Descriptor in column five. Column five identifies the results of combining the values of the attributes in columns two through four for the new Theme Collection Descriptor. Column six provides comments on combining the attributes reflected in column one.


 

Table 2-1 Example of Combining Attributes

Attributes | Theme Collection (Input Collection) | Archive Collection (Input Collection) | Archive Collection (Input Collection) | Theme Collection (New Collection) | Comments
---------- | ----------------------------------- | ------------------------------------- | ------------------------------------- | --------------------------------- | --------
ArchivingCentreId | N/A | GSFC | LaRC | N/A |
CollectionCategory | Registered | Registered | Registered | Registered | This is a default value for all collections
CollectionHierarchyCategory | Theme | Archive | Archive | Theme |
CollectionHierarchyPosition | Non-Terminal | Terminal | Terminal | Non-Terminal |
EndDate | 971201 | 980101 | 981201 | 981201 |
GeospatialForm | Model | Model | Remote-Sensing Image | Model, Remote-Sensing Image |
ItemDescriptorID | z39.50s://larc.nasa.gov/thc_1 | z39.50s://larc.nasa.gov/arc_fire_ax_isccp_dx | z39.50s://larc.nasa.gov/arc_atm_ceres_01 | z39.50s://larc.nasa.gov/thc_2 | Unique ID for this new Theme Collection
ItemDescriptorLanguage | English | Italian | French | English | Language for this new collection
ItemDescriptorName | AirQualLead | AirQualMts | AirQualUrban | AirQual |
RevisionDate | 960101 | 960101 | 961201 | 990401 | This should reflect the date of creation of the new Theme Collection.
StartDate | 960101 | 960101 | 961201 | 960101 | Earliest date of the combined collections.
Northboundingcoordinate | 40 | 45 | 23 | 45 | Take max. value
Purpose | Monitoring of Air Quality | Air Quality Urban | Air Quality Mountains | Monitoring of Air Quality Urban, Mountains | Semantic accumulation
ThemeKeyword | Earth Science>Atmosphere>Air Quality>Lead | Earth Science>Atmosphere>Air Quality>Carbon Monoxide | Earth Science>Atmosphere>Air Quality>Emissions | Earth Science>Atmosphere>Air Quality |
Local Attributes | None | a,b,c,d,e | a,c,d,f | Not Allowed | Combining Local Attributes is not allowed
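
Some of the combination rules in Table 2-1 are mechanical and can be sketched in code. The example below is a minimal illustration, assuming each input Collection Descriptor is available as a dictionary; the function name and the dictionary layout are hypothetical. Dates are compared as YYMMDD strings exactly as they appear in the table, which is only valid while all dates fall within the same century.

```python
from typing import Any, Dict, List


def combine_for_theme(inputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive combined values for a new Theme Collection Descriptor."""
    forms: List[str] = []
    for d in inputs:
        if d["GeospatialForm"] not in forms:   # keep first-seen order, no repeats
            forms.append(d["GeospatialForm"])
    return {
        "StartDate": min(d["StartDate"] for d in inputs),   # earliest start
        "EndDate": max(d["EndDate"] for d in inputs),       # latest end
        "NorthBoundingCoordinate": max(d["NorthBoundingCoordinate"] for d in inputs),
        "GeospatialForm": forms,
    }


# Values taken from the three input columns of Table 2-1.
inputs = [
    {"StartDate": "960101", "EndDate": "971201",
     "NorthBoundingCoordinate": 40, "GeospatialForm": "Model"},
    {"StartDate": "960101", "EndDate": "980101",
     "NorthBoundingCoordinate": 45, "GeospatialForm": "Model"},
    {"StartDate": "961201", "EndDate": "981201",
     "NorthBoundingCoordinate": 23, "GeospatialForm": "Remote-Sensing Image"},
]
combined = combine_for_theme(inputs)
```

Attributes such as Purpose and ThemeKeyword require semantic judgment and are deliberately left out of the sketch, as are Local Attributes, which may not be combined.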

 

    2.1.3  Theme and Archive Collection Descriptors Summary

    Table 2-2 summarizes what the various Collection Descriptors may include.
         

        Table 2-2 What Collections May Include

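        Collection Category | Product Descriptor (Local/Remote) | Collection Descriptor: Archive (Local/Remote) | Collection Descriptor: Theme (Local/Remote)
        ------------------- | --------------------------------- | --------------------------------------------- | -------------------------------------------
        Archive Collection | X/ | |
        Theme Collection | | |
           Terminal | X/ | |
           NonTerminal | | X/X | X/X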
    The rows in the above table identify the Collection Categories. The columns identify the types of descriptors, qualified by the Collection Categories where appropriate. Additionally, the columns indicate whether the descriptor is local or remote. An "X" indicates that the Collection Category specified in the row may include the descriptors (local/remote) specified in the column. For example, the Archive Collection Category may include Local Product Descriptors.
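
The inclusion rules of Table 2-2 can be expressed as a lookup. The sketch below encodes the table as read from this manual (an "X" before the slash meaning local, after the slash meaning remote); the key strings and the function name are hypothetical.

```python
ALLOWED = {
    # collection category -> set of (member kind, location) pairs marked "X"
    "archive": {("product", "local")},
    "theme-terminal": {("product", "local")},
    "theme-nonterminal": {
        ("archive", "local"), ("archive", "remote"),
        ("theme", "local"), ("theme", "remote"),
    },
}


def may_include(collection_kind: str, member_kind: str, location: str) -> bool:
    """True if Table 2-2 marks this combination with an X."""
    return (member_kind, location) in ALLOWED.get(collection_kind, set())
```

For instance, may_include("archive", "product", "local") holds, while a remote Product Descriptor in an Archive Collection is rejected.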
         

    2.1.4  Product Data

    The product data are the orderable items that are referenced in ICS. This reference may be the Item Descriptor ID’s for the Product Descriptors contained in the CDB, or the reference may be to the Local Catalogue Translator. The latter contains the translation between the CIP Product Descriptor and the Local Inventory description of the product.
         

    2.1.5  Product Descriptors

    What They Are:

    Product Descriptors are the metadata that describe the instances of the product data that are archived in the local system. This metadata is used to search and retrieve information about the product data.

What They Contain:

All Product Descriptors must


How They Are Created:

Product Descriptors are primarily created from the local inventory descriptions. This creation activity centers on mapping the existing product information contained in the local inventory to the CIP Product Descriptor Schema Definition (see Appendix B, Table B-2). This is a multi-step operation that requires knowledge of the site's existing inventory and the CIP Data Definitions. This process is defined on the CEO Web Page (TBS).


    3.  Collection Management

    Collection Management within ICS refers to the development and maintenance of the ICS Collections metadata. This covers both the physical aspects (the databases) and the logical aspects (the Collections Structures) of ICS Collections. This section begins with an overview of the ISA's Collection responsibilities, followed by the rules and guidelines for identifying Theme and Archive Collections and product information (descriptors); the process for creating and maintaining a Collections Structure; creating and maintaining the instances in the Collections Database; and lastly creating and maintaining entries in the Explain Database. Figure 3-1 graphically illustrates this process and identifies the sections within this document where each activity is described in further detail.
     
     

    Figure 3-1 Collections Process


     






    Each task identified in the above process may be the responsibility of the ISA or shared among various individuals who may be responsible for various aspects of the data contained in the ICS. For example, a Data Analyst may assist in analyzing the EO Data and establishing a Collections Structure; a Scientist may assist in the scientific review activity and a Database Designer may be responsible for the creation and modification of the database entries.
     

    3.1  ISA Responsibilities

    It is assumed that an ICS Site Administrator (ISA) will be responsible for the population and maintenance of the metadata contained in the ICS Retrieval Manager. The following identifies and describes the ISA responsibilities for the Explain and Collections Databases that are defined in the ICS SDD (System Design Document) Section 4.
       

    3.1.1 Collections Database

    The Collections Database is a repository of information about Earth Observation Data in the ICS. The database is organized around two major data objects: the Collection Descriptor and the Product Descriptor. Collectively these descriptors and their associated data objects provide the framework for describing the Earth Observation Data within ICS. The ISA is responsible for the logical consistency of this database, that is, they are responsible for the following:
    1. Ensuring that the information contained in the database conforms to the specified rules:
        • Mandatory attributes exist,
        • Valids are correct,
        • Data Types for each attribute are correct.

    2. Ensuring that the information contained in the Collections Structure conforms to the specified rules (Ref Section 2), for Archive Collections, Theme Collections, and Product Descriptors. Reference Section 3.2.1 of this document for a discussion on Collections Structures.

    3. Ensuring that the Collections Structure is referentially correct which implies that the included and related items exist either locally or remotely.

    4. Ensuring that all of the Collection IDs are unique within the Retrieval Manager.

    5. Working with the Data Providers in defining Collections that satisfy the users' needs.

    6. Ensuring that the designated scientific review group has scientifically reviewed the data.

    7. Ensuring that any data mappings that may have occurred between existing systems and ICS are semantically correct.

    8. Ensuring that the Terminal Theme Collection’s included Product Descriptors contain the related Archive Collection.

    9. Ensuring that the Product Descriptors have been previously defined.

    10. Ensuring that the Query Model has been defined.
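
Several of these checks (mandatory attributes, unique Collection IDs, referential correctness) lend themselves to automation. The following is a minimal sketch only: the descriptor layout, the attribute name IncludedItemDescriptorIDs and the three-element MANDATORY set are assumptions made for illustration. The authoritative mandatory attributes are those listed in Appendix B, Tables B-2 and B-3.

```python
from typing import Any, Dict, List, Set

# Hypothetical mandatory set; the real list is in Appendix B, Tables B-2 and B-3.
MANDATORY = {"ItemDescriptorID", "StartDate", "EndDate"}


def check_collections(descriptors: List[Dict[str, Any]],
                      known_remote_ids: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    seen: Set[str] = set()
    for d in descriptors:
        cid = str(d.get("ItemDescriptorID", "<missing id>"))
        for attr in sorted(MANDATORY - set(d)):
            problems.append(f"{cid}: missing mandatory attribute {attr}")
        if cid in seen:
            problems.append(f"{cid}: duplicate Collection ID")
        seen.add(cid)
    # Referential check: every included item must exist locally or remotely.
    for d in descriptors:
        cid = str(d.get("ItemDescriptorID", "<missing id>"))
        for member in d.get("IncludedItemDescriptorIDs", []):
            if member not in seen and member not in known_remote_ids:
                problems.append(f"{cid}: included item {member} not found")
    return problems
```

Checks that depend on human judgment, such as scientific review or the semantic correctness of data mappings, remain the responsibility of the ISA and the review group.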

    3.1.2  Explain Database

    The Explain Database contains the data necessary to support the Search and Present Services of the Z39.50 protocol. These services require that the Explain Database contain the following descriptive information about the Collections Database:

    · The elements and their associated information (size, datatype, etc.) as specified in Appendix B Table B-1;

    · The use attributes identified in Appendix B Table B-2 and B-3;

    · The schemas;

    · The tag sets;

    · The Collection Identifiers;

    · The record syntax information identified in the CIP Release B Specification as SUTRS, GRS-1, EXPLAIN, and ES Task Package.

    Thus the Explain Database acts as the data dictionary for the CIP Collections Database. Therefore it is the ISA's responsibility to ensure the following:
    1. That the elements and characteristics about the elements that are expressed in the Collections Database are contained in the Explain Database.

    2. That the use and present attributes/elements are identified in the Explain Database.

    3. That logical consistency exists between the various data objects in the Explain Database.
     

    3.2  Collection Creation

    The creation of Collections is a multi-step process. The following sections recommend establishing a Collections Structure, followed by creating entries in the Collections Database and concluding with the appropriate Explain Database entries. Each of these steps provides the necessary input into subsequent steps of the process. It is a procedure that draws on many subordinate tasks (i.e. creating Query Models and Analysis Results Tables) to accomplish its goals, thus it may appear burdensome. However, it is a recommendation for accomplishing the various Collection Management Tasks (i.e. creating a Collections Structure and populating databases), that when followed will produce the desired results.

    3.2.1  Collections Structure

    The principal purpose for the Collections Structure is to provide a framework that the ICS search function will use to determine the full extent of a Collection and/or product query. Therefore, it is necessary to first create and then to maintain the Structure. It is important to note that the Collection Descriptors in the Collections Database capture the characteristics of the instances of the EO data. The Collections Structure, on the other hand, identifies the logical organization of the EO data instances. This logical organization is based on grouping similar information (Collection and Product Descriptors) into themes that will support effective and efficient searching. Including a reference to associated Collections or products (Included Item Descriptors) carries out the logical implementation of this organization. It will be necessary for the ISA coupled with the Data Providers to establish this Collections Structure.

    Collections Structures are site-specific data organizational strategies. Like any data organizational strategy, they use themes or aggregation topics to organize the data. These strategies will vary from site to site depending on the Collection Categories and the desired associations between the various Collections. Therefore, a standard software procedure to ensure that themes or aggregation topics are consistent across sites is not possible. It may be possible for a local site to develop site-specific software to ensure that the site's Theme Collections are consistent within the site's Collections Structure; however, this is external to the ICS Retrieval Manager. Figure 3-2 illustrates a proposed process for determining the appropriate Collections Structure for the site's data holdings. Additionally, Figures 3-3 and 3-4 illustrate a process for modifying an existing structure. The shaded processes in Figures 3-2, 3-3 and 3-4 are addressed in Sections 3.2.2 and 3.2.3 of this document.

         

        Figure 3-2 Formulating the Collections Structure



         
         

        Figure 3-3 Adding Collections to an Existing Collections Structure



         
         
         

        Figure 3-4 Deleting a Theme Collection from an Existing Collections Structure



         

    3.2.1.1  Formulating the Collections Structure

    This section describes the procedure for creating a Collections Structure. The procedure is a three-step process in which the results of each step are the input to the next. Each step is discussed below, from the development of the Query Model through to the construction of the Collections Structure.
           
    3.2.1.1.1  The Query Model
    The Query Model represents the way in which a user can search for information in an Information System. It is expressed as parameters, relationships and values. Its purpose is to establish an effective way to search the site's Collections Structure and to establish common search attributes across that structure.

    In ICS the basic query parameters are the mandatory use attributes specified in Appendix B, Tables B-2 and B-3. In addition to these parameters, the site may select further use attributes from the optional use attributes, also specified in Appendix B, Tables B-2 and B-3, or define its own set of local use attributes. Through this specialization, site-specific Use Attributes can be developed that represent the site-specific query requirements. Every Query Model, regardless of its heritage (ICS and/or site-specific), contains search parameters. The ICS Use Attributes for a Product Descriptor include the Bounding Rectangle, Product Descriptor, Spatial Coverage, Temporal Coverage and Temporal Range search parameters. For the Collection Descriptor, the search parameters are Bounding Rectangle, Collection Descriptor, Collection Type, Data Originator, Included Collections Descriptors, Included Item Descriptors, Instrument Sensor, Keywords, Platform, Product Collection Specific, Revision, Spatial Coverage, Temporal Coverage, and Theme Keywords. A site-specific capability may contain all of the above in addition to site-specific parameters such as Data Centre Name.
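
    As a rough illustration of how a site might combine the ICS Use Attributes with its own local attributes, the sketch below builds the set of searchable attributes for Product Descriptors and reports any query attribute the site does not support. The attribute names are those listed above; the function name `unsupported_attributes` and the "Orbit Number" attribute in the example query are assumptions made for illustration.

```python
# ICS Use Attributes for a Product Descriptor, as listed in the text above.
ICS_PRODUCT_USE_ATTRIBUTES = frozenset({
    "Bounding Rectangle",
    "Product Descriptor",
    "Spatial Coverage",
    "Temporal Coverage",
    "Temporal Range",
})

# Local use attributes chosen by the site; "Data Centre Name" is the example from the text.
LOCAL_USE_ATTRIBUTES = frozenset({"Data Centre Name"})

# The site's searchable attributes are the union of both sets.
SITE_SEARCHABLE = ICS_PRODUCT_USE_ATTRIBUTES | LOCAL_USE_ATTRIBUTES


def unsupported_attributes(query: dict[str, str]) -> set[str]:
    """Return the attribute names in a query that this site cannot search on."""
    return set(query) - SITE_SEARCHABLE


# "Orbit Number" is a hypothetical attribute that the site has not defined.
query = {
    "Temporal Range": "920601 920630",
    "Data Centre Name": "LaRC",
    "Orbit Number": "123",
}
rejected = unsupported_attributes(query)
```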

    What makes a Query Capability a Query Model, however, is the coupling of each parameter with a relationship and a value. For example, assume that users frequently request all AVHRR and Atmospheric Collections. A Query Model would then be developed that specifies Sensor=AVHRR and Theme Keyword=Atmosphere. In the first term, "Sensor" is the parameter, "=" is the relationship, and "AVHRR" is the value. A Site Query Model may contain any number of such parameter, relationship and value terms.
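
    The AVHRR and Atmosphere example can be sketched as a list of parameter, relationship and value terms applied to Collection Descriptors. This is a minimal sketch, not an ICS interface: the class `QueryTerm`, the function `matches_model`, the Collection identifiers, and the decision to combine the terms with a logical AND are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass

# Only "=" appears in the text; further relationships could be registered here.
RELATIONSHIPS = {"=": operator.eq}


@dataclass(frozen=True)
class QueryTerm:
    """One parameter / relationship / value triple of a Query Model."""
    parameter: str
    relationship: str
    value: str

    def matches(self, descriptor: dict[str, str]) -> bool:
        if self.parameter not in descriptor:
            return False
        compare = RELATIONSHIPS[self.relationship]
        return compare(descriptor[self.parameter], self.value)


def matches_model(model: list[QueryTerm], descriptor: dict[str, str]) -> bool:
    """Assumption: a descriptor satisfies the model only if every term matches."""
    return all(term.matches(descriptor) for term in model)


# Query Model from the text: Sensor=AVHRR and Theme Keyword=Atmosphere.
avhrr_atmosphere = [
    QueryTerm("Sensor", "=", "AVHRR"),
    QueryTerm("Theme Keyword", "=", "Atmosphere"),
]

# Hypothetical Collection Descriptors, reduced to the two parameters used above.
collections = [
    {"Collection": "c1", "Sensor": "AVHRR", "Theme Keyword": "Atmosphere"},
    {"Collection": "c2", "Sensor": "AVHRR", "Theme Keyword": "Ocean"},
    {"Collection": "c3", "Sensor": "CERES", "Theme Keyword": "Atmosphere"},
]

selected = [c["Collection"] for c in collections if matches_model(avhrr_atmosphere, c)]
```

    Only the Collection that satisfies both terms is selected, which is why a Collections Structure built around this Query Model would group such Collections under a common theme.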

    The user still has available, for searching purposes, the basic ICS Use Attributes. The Query Model does not eliminate this capability; it merely extends the query capability to another level of specificity.

    The two primary sources for the components of the Query Model (parameters, relationships and values) are the user requirements and data access patterns.
     

    3.2.1.1.2  Analysis of Existing Data
    Before identifying a Collections Structure, the ISA and Science Data Providers should analyze the local site’s data holdings. The purpose of this activity is to determine the metadata that will be captured and maintained in the ICS. At its conclusion, a list of products and Collections, together with their characteristics, should have been identified. The characteristics should be determined by consulting the rules for constructing the various categories of Collections described in Section 2.

    It may be useful to construct several tables that identify and record the characteristics of each of the Collections and/or products. A Collection Table would capture the Collection characteristics, and a Product Table the product characteristics. These tables assist in identifying the characteristics for the descriptors, recording the values of the attributes among the set of Collections and/or products, constructing the Collections Structure, and populating the Collections Database. Tables 3-1 and 3-2 provide examples of the structure and content of these tables. As a minimum, it is recommended that the ICS Query Parameters serve as the column headings. In the example Tables 3-1 and 3-2, only a subset of the Query Parameters is used to demonstrate this concept.
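
    An analysis table of this kind can also be held in a simple data structure so that the recorded values can be summarized when the Collections Structure is formulated. The sketch below uses three rows and three columns taken from Table 3-1; the helper names `distinct_values` and `group_by` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Three rows of Table 3-1, reduced to three of its columns.
product_table = [
    {
        "Item Descriptor ID": "z39.50s://larc.nasa.gov/pidax_dx_noa11_9206",
        "Theme Keyword": "Albedo",
        "Sensor ID": "AVHRR",
    },
    {
        "Item Descriptor ID": "z39.50s://larc.nasa.gov/piddx_8902_noaa11",
        "Theme Keyword": "Ice",
        "Sensor ID": "AVHRR",
    },
    {
        "Item Descriptor ID": "z39.50s://larc.nasa.gov/pidmx_02_ceress2",
        "Theme Keyword": "Radiation Flux",
        "Sensor ID": "CERES",
    },
]


def distinct_values(table: list[dict[str, str]], column: str) -> list[str]:
    """List the distinct values recorded in one column, sorted for readability."""
    return sorted({row[column] for row in table})


def group_by(table: list[dict[str, str]], column: str) -> dict[str, list[str]]:
    """Group Item Descriptor IDs by the value found in the given column."""
    groups: dict[str, list[str]] = defaultdict(list)
    for row in table:
        groups[row[column]].append(row["Item Descriptor ID"])
    return dict(groups)


themes = distinct_values(product_table, "Theme Keyword")
by_sensor = group_by(product_table, "Sensor ID")
```

    Grouping the rows by a column such as Sensor ID gives a first indication of which products might later be placed in the same Collection.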

             
Table 3-1 Product Descriptor Analysis Results Table
             
    | Item Descriptor ID                          | Category | Theme Keyword  | Sensor ID | Temporal Range | Spatial Coverage                    | Archiving Centre ID |
    |---------------------------------------------|----------|----------------|-----------|----------------|-------------------------------------|---------------------|
    | z39.50s://larc.nasa.gov/pidax_dx_noa11_9206 | Product  | Albedo         | AVHRR     | 920601 920630  | 39.9000 -39.9000 10.1000 -5.1000    | LaRC                |
    | z39.50s://larc.nasa.gov/pidax_dx_noa12_9206 | Product  | Pressure       | AVHRR     | 920601 920630  | 39.9000 -39.9000 10.1000 -5.1000    | LaRC                |
    | z39.50s://larc.nasa.gov/piddx_8902_noaa11   | Product  | Ice            | AVHRR     | 890201 890228  | 90.0000 -90.0000 -180.0000 180.0000 | LaRC                |
    | z39.50s://larc.nasa.gov/piddx_8901_noaa11   | Product  | Snow           | AVHRR     | 890101 890131  | 90.0000 -90.0000 -180.0000 180.0000 | LaRC                |
    | z39.50s://larc.nasa.gov/pidmx_02_ceress2    | Product  | Radiation Flux | CERES     | 980101 980102  | 39.9000 -39.9000 10.1000 -5.1000    |