Publishing data to the National Geothermal Data System

This section is intended for NGDS project participants and for entities that would like to become nodes in the network. In this discussion, the term resource is used in a broad sense to include all the data items of interest that might be registered for use in the NGDS. A resource becomes part of the NGDS when it can be located using an NGDS catalog search and accessed according to the procedures described in the metadata record obtained from the NGDS catalog. That’s all there is to it. The first step in making any resource part of the system is thus to create a metadata record describing the resource and how it can be obtained. For guidance on metadata preparation, see Metadata.

Although core and other kinds of physical samples, as well as hard copies of documents, can be registered as NGDS resources, this discussion focuses on information resources that can be accessed electronically. Access to a resource via the web may be file-based, via web application, or service-based. Resources may provide different levels of interoperability (see http://en.wikipedia.org/wiki/-Conceptual_interoperability) based on whether their information content is unstructured, structured, or structured according to an NGDS content model and interchange format.

Publishing Data through the US DOE Geothermal Data Repository

The US DOE GDR is hosted on the Open Energy Information (OpenEI) Platform and designed to receive all US DOE Geothermal Technologies Program grant recipient data. All grant recipients are required to submit their data as it is generated or no later than 90 days after the quarter in which the data is generated. For more information on submitting data to the DOE GDR, please download the DOE GDR brochure, visit the NGDS information page, or visit the DOE GDR directly.

Resources can be published through file-based approaches, web applications, or service-based approaches. Each is described in turn below.

File-based approaches

A representation of the resource may be retrieved on the WWW as a single digital package (a file). The internal content of the package may be standardized to varying degrees, corresponding to various levels of interoperability.

At the simplest level, a file-based representation of the resource, in whatever format it currently exists, is made available at some permanent web location (identified by a URL). The contained information can be accessed by users who have software that can recognize and open files in the format in which it is delivered. They can use the information if they can understand the encoding, language, and data structure, but the system provides no support for this understanding, and little or no automation is possible.

Additional interoperability can be achieved if the information is loaded into a structured format that can be processed using computer software, and made available as a file. Typical example file formats are various spreadsheet file types (e.g. Excel), dBase files, and Microsoft Access databases. To be useful, the metadata must fully describe the data structure, for example by defining the tables and the fields in each table. With this information, the content of the file can be accessed with computer software.
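
As a minimal sketch of how a metadata description of tables and fields lets software read a structured file, the Python example below parses delimited text using a field list. The table name, field names, and types are hypothetical placeholders chosen for illustration; they are not taken from any NGDS metadata record.

```python
import csv
import io

# Hypothetical fragment of a metadata record: the fields of one table and
# their types. These names are illustrative assumptions only.
TABLE_METADATA = {
    "table": "well_temperatures",
    "fields": [
        {"name": "well_id", "type": str},
        {"name": "depth_m", "type": float},
        {"name": "temperature_c", "type": float},
    ],
}


def read_table(text, metadata):
    """Parse delimited text into typed rows using the metadata field definitions."""
    converters = {f["name"]: f["type"] for f in metadata["fields"]}
    reader = csv.DictReader(io.StringIO(text))
    # Fail early if the file does not contain every field the metadata describes.
    missing = set(converters) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"file lacks fields described in metadata: {sorted(missing)}")
    return [
        {name: convert(row[name]) for name, convert in converters.items()}
        for row in reader
    ]


sample = "well_id,depth_m,temperature_c\nW-1,1200,85.5\nW-2,950.5,72\n"
rows = read_table(sample, TABLE_METADATA)
```

Without the field definitions, the same file could still be opened, but a program would have no basis for deciding which columns are numeric or what they mean.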

A higher degree of interoperability is enabled if the information in the file is presented using a documented, community data structure. In this case the metadata can point to the data model specification that is used, and software that is aware of that data structure can process the information content of the file with little or no user interaction. In the NGDS, this would correspond to loading data into a file-based table structure defined by one of the interchange content models.
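
The idea of software that is aware of a community data structure can be illustrated with a small header check. This is only a sketch: the field names in `CONTENT_MODEL_FIELDS` are invented for the example and do not reproduce any actual NGDS content model.

```python
import csv
import io

# Hypothetical content model: placeholder field names, not an NGDS specification.
CONTENT_MODEL_FIELDS = ["feature_uri", "label", "latitude", "longitude"]


def check_header(text, model_fields):
    """Compare a delimited file's header row with the content model's field names.

    Returns (missing, extra): model fields absent from the file, and file
    columns that the model does not define.
    """
    header = next(csv.reader(io.StringIO(text)), [])
    missing = [name for name in model_fields if name not in header]
    extra = [name for name in header if name not in model_fields]
    return missing, extra


missing, extra = check_header(
    "feature_uri,label,latitude,notes\nhttp://example.org/f/1,Spring A,39.5,none\n",
    CONTENT_MODEL_FIELDS,
)
```

A file for which both lists come back empty uses exactly the model's field names, which is the condition that lets generic software process it without user interaction.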

Web applications

Information resources can be made available to users via web-browser-based applications that allow them to browse, view, process, analyze, or download data in various ways. Such approaches can provide useful functionality, but do not lend themselves to interoperability or resource reuse, because the application functionality is typically tightly coupled to a particular data source. In such a case, the application's functions cannot easily be applied to other data sources, and the data cannot be accessed directly by other applications. Download functions commonly include some capabilities for filtering and selecting data subsets, but these operations require direct human interaction. The acquired files may be structured or unstructured (see above).

The intention of the NGDS is that structured data be provided through web services, and that NGDS applications use NGDS service interfaces to acquire the data that they operate on. In this fashion the work of the data compilation, data hosting, and application development agents in the system can be decoupled.

Example web applications:

Nevada Bureau of Mines and Geology Geothermal Web Application

Geothermal Prospector

EarthChem

System for Earth Sample Registration

Service-based approaches

In pursuit of an open data/linked data approach to making data available to end users, the objective of the NGDS is to make all structured data available through web services using well-documented community specifications for service protocols and data interchange formats. This sort of data delivery requires more sophisticated client and server software stacks, and more rigid quality control of the content. Because of the additional up-front cost, widely available and useful data will need to be prioritized for web service delivery. Web services may be deployed using custom data schemas corresponding to existing database holdings, but the goal is to facilitate use by data consumers by delivering data of a particular type in a consistent schema.
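
To make service-based access concrete, the sketch below builds a request URL for an OGC Web Feature Service (WFS), one of the service types named in the data access tiers discussion. The parameter names follow the WFS 1.1.0 key-value request convention; the endpoint `https://example.org/wfs` and the feature type `example:WellTemperature` are hypothetical and do not refer to a real NGDS service.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(base_url, type_name, max_features=10):
    """Build a WFS 1.1.0 GetFeature request URL using key-value parameters."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,        # feature type to retrieve
        "maxFeatures": max_features,  # cap on the number of features returned
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and feature type, used only for illustration.
url = wfs_getfeature_url("https://example.org/wfs", "example:WellTemperature")
```

Because the request is defined by the service specification rather than by a particular application, any client that understands the protocol could issue it against any conforming server.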

The NGDS currently uses simple feature data interchange schemas for service-based delivery. These are 'flat' file formats that can be represented as simple spreadsheets or text tables with no information loss. They are developed as content models independent of any particular implementation, currently using Excel workbooks to document the content fields in the model (see Content Models).
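
The claim that flat records survive conversion to a text table can be demonstrated with a round trip through CSV. The records and field names below are invented for the example, and values are kept as strings because a plain text table carries no type information of its own.

```python
import csv
import io

# Hypothetical flat feature records: one row per feature, no nested values.
features = [
    {"id": "f1", "name": "Spring A", "latitude": "39.5", "longitude": "-116.2"},
    {"id": "f2", "name": "Well B", "latitude": "40.1", "longitude": "-117.0"},
]


def to_table(records):
    """Serialize flat records to CSV text, with one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_table(text):
    """Read CSV text back into a list of flat records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


restored = from_table(to_table(features))
```

The round trip is lossless here only because every record has the same simple fields; nested or repeating values would need a richer encoding than a single table.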

Data access tiers

In discussions of NGDS resource publication, it is useful to think in terms of three tiers:

Tier 1. Information in text, images, or recorded sound. This is unstructured data, and can be very useful to people, but is not amenable to automated computer processing and analysis of the contained information (other than what is possible using text analytics software).

Tier 2. Structured data not in an NGDS content model. The structured data must be encoded in a file format that can be determined by inspection or based on the file name or metadata. WFS and WMS services may be deployed using such data, but users will have to determine how to extract useful information on a case-by-case basis. Once the data structure has been determined, some computer processing is possible.

Tier 3. Structured data in an NGDS content model. The data may be in a file-based table with field names matching those in the content model, or in an XML document that validates against the XML schema for that model (and version of the model).
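
The three tiers above reduce to two yes/no questions about a resource, which can be sketched as a small function. The function and parameter names are hypothetical and are not part of any NGDS tooling.

```python
def access_tier(is_structured, follows_content_model):
    """Return the data access tier (1, 2, or 3) for a resource.

    is_structured: True if the content is structured data rather than text,
        images, or recorded sound.
    follows_content_model: True if the structured data conforms to an NGDS
        content model. Ignored for unstructured resources.
    """
    if not is_structured:
        return 1
    return 3 if follows_content_model else 2


# A scanned report is Tier 1; a spreadsheet with its own column layout is
# Tier 2; a table whose fields match a content model is Tier 3.
tiers = [
    access_tier(False, False),
    access_tier(True, False),
    access_tier(True, True),
]
```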

Next: Data Item Categories

See also: Data Interchange Formats
