RDF versus GML

September 2004
Chris Goad cg@mapbureau.com



GML is the XML language for geography developed by the Open GIS Consortium. The third major revision of this specification, known as GML3, was released in January of 2003. RDFMap, when used in conjunction with RDFGeom, constitutes an attempt to develop an alternative, RDF-based approach to expressing geographical information. This note outlines the relationship between this approach and GML, and suggests techniques for converting data between the two formalisms. The important differences between RDFMap and GML derive from the choice of RDF, and would apply equally to other RDF-based formalisms for geography.

These are the differences:

Data model and type system

GML is built on the XML data model and the XML Schema type system.

RDFMap and RDFGeom are built on the RDF data model. RDF Schema or OWL can be used to express typing information.

The advantage of RDF: composability

RDFMap, like any other RDF vocabulary, allows free composition with other RDF languages. Geographic assertions can be mixed with assertions about weather, physics, business processes, weblogs and syndication, genealogy, politics, and so on, without the need for prior coordination between the designers of languages for these domains. In the case of RDFMap, even geometry is relegated to a separate language (RDFGeom).
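As a rough illustration of what free composition means in practice, the sketch below treats an RDF graph as a set of (subject, property, value) triples and merges a geographic description with a weather description of the same resource. The namespaces, property names, and station URI are all hypothetical stand-ins; none of them is taken from RDFMap or from any published vocabulary.

```python
# A graph is modeled as a set of (subject, property, value) triples.
# Every URI below is a hypothetical placeholder, not a real vocabulary.
GEO = "http://example.org/geo#"      # stand-in for a geographic vocabulary
WX = "http://example.org/weather#"   # stand-in for an unrelated weather vocabulary

station = "http://example.org/id/station42"

geo_graph = {
    (station, GEO + "latitude", "52.2"),
    (station, GEO + "longitude", "0.12"),
}
weather_graph = {
    (station, WX + "temperatureCelsius", "14"),
}

# Composition is plain set union: the two vocabularies were never coordinated.
merged = geo_graph | weather_graph


def describe(graph, subject):
    """Collect every property asserted about one subject, whatever its vocabulary."""
    return {prop: value for (subj, prop, value) in graph if subj == subject}


facts = describe(merged, station)
```

The point of the sketch is only that merging requires no schema-level agreement: the shared resource URI is what ties the two descriptions together.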

In contrast, GML is not directly composable with other XML languages. Entities that are described by other languages cannot legally play the role of geographic features in GML. This is because all types of geographic features are required to derive from the GML abstract class gml:AbstractFeatureType. Even if it were not for this formal requirement, the lack of conventions about how to represent even simple semantic notions in XML languages would prevent effective integration of GML with XML languages developed independently.

The non-composability of GML requires that it absorb as application schemas the multitude of other domains to which geographical information is relevant. Failing this, non-standard mechanisms of some kind must be used to relate GML content with external data.

Indeed, GML positions itself as a universal, rather than geography-specific, semantic standard by including its own general formalisms for collections, assertion of properties (in a style very much like RDF), time and processes, and reference between content in separate files (via XLink). GML can be viewed as an alternative not just to geography in RDF, but to RDF itself.

Maturity

The application of RDF to geography is at an early stage[1], whereas GML is a mature effort. RDFMap combined with the companion RDFGeom language covers only a fraction of the ground covered by GML3. This is partly because GML3 addresses other topics, such as topology, time and processes, and observations, that are not proper to the domain of geography. These are or will be covered by other vocabularies in the composable world of RDF. There is one area included in GML3 but not RDFMap that is often regarded as a proper part of geography, and is of great practical importance: coverages. A coverage is a mechanism for assigning data systematically to a set of points or areas that (usually) take a regular form such as a grid or triangulation. Finally, GML's treatment of temporal matters and of observations is careful to support what is needed for geography, even if time and observation are not inherently geographical concepts. So, to match GML3 completely, feature for feature, within the RDF world will involve substantial effort.
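To make the notion of a coverage concrete, here is a minimal sketch of a regular-grid coverage, the kind of structure a DEM amounts to. The class name, field names, and sample elevations are assumptions made for illustration; they do not reflect the GML3 coverage schema or any RDFMap design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GridCoverage:
    """Hypothetical regular-grid coverage: one value per square cell."""
    origin_x: float            # x coordinate of the grid's lower-left corner
    origin_y: float            # y coordinate of the grid's lower-left corner
    cell_size: float           # side length of each cell
    values: List[List[float]]  # values[row][col]; row 0 starts at origin_y

    def value_at(self, x: float, y: float) -> float:
        """Return the value of the cell containing the point (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((y - self.origin_y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[row])):
            raise ValueError("point lies outside the grid")
        return self.values[row][col]


# A 2 x 2 grid of made-up elevations with 10-unit cells.
dem = GridCoverage(0.0, 0.0, 10.0, [[12.0, 15.0], [11.0, 18.0]])
elevation = dem.value_at(14.0, 3.0)
```

The data is assigned systematically: the grid parameters determine which location each value belongs to, so no per-cell geometry needs to be stored.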

However, RDFMap and RDFGeom are adequate as they stand to formulate much of the geographical information that is exchanged in standard formats today. GML is not yet nearly as widely used as older and simpler formats such as ESRI shapefiles and USGS DEMs; GML3 in particular is very early in the adoption process. In the near term, interoperability with older standards will be of greater practical importance than interoperability with GML, though this may shift with growing adoption of GML. RDFMap and RDFGeom suffice already to cover the expressiveness of shapefiles, in which a vast amount of GIS data is available. Relatively modest efforts in the direction of coverages will suffice for DEMs.

Data conversion

GML applications are developed by combining XML schemas from the GML standard with application schemas that capture the relevant types and properties of the target domain. In the RDF world, the role of an application schema is played by a domain-specific vocabulary that is mixed with RDFMap to represent geographic information about objects in the domain.

Standard conversion techniques can be developed for the feature and geometry schemas in GML. Development of domain-specific vocabularies in RDF corresponding to application schemas is also needed, and this part of the job requires separate treatment for each schema. Fortunately, a core requirement of GML is that information be represented in "striped" form by asserting values of properties on objects (as in RDF). It will often be possible to port an application schema to an RDF vocabulary by giving RDF (i.e., URI) names to the properties and types in the schema. This will allow instance data to be converted.
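The following sketch shows how the striped form lends itself to mechanical conversion: object elements alternate with property elements, so a recursive walk can emit one triple per property. It is a simplified illustration under stated assumptions, not the converter used for the examples in this note. The sample XML, its namespace, and its element names are hypothetical, URIs are formed by simply concatenating namespace and local name, and attributes, geometry, and XLink references are ignored.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical striped data: object / property / object / property ...
SAMPLE = """<c:CityModel xmlns:c="http://example.org/city#">
  <c:cityMember>
    <c:Road>
      <c:name>M11</c:name>
      <c:nLanes>4</c:nLanes>
    </c:Road>
  </c:cityMember>
</c:CityModel>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into 'namespace' + 'local'."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def striped_to_triples(xml_text):
    """Walk striped XML and return (subject, property, value) triples."""
    triples = []
    ids = itertools.count(1)

    def visit_object(elem):
        node = f"_:b{next(ids)}"                 # fresh blank node per object
        triples.append((node, RDF_TYPE, uri(elem.tag)))
        for prop in elem:                        # each child is a property element
            children = list(prop)
            if children:                         # property value is a nested object
                for child in children:
                    triples.append((node, uri(prop.tag), visit_object(child)))
            else:                                # property value is literal text
                triples.append((node, uri(prop.tag), (prop.text or "").strip()))
        return node

    visit_object(ET.fromstring(xml_text))
    return triples


triples = striped_to_triples(SAMPLE)
```

In this sketch every object becomes a blank node; a fuller converter would presumably map identifiers in the source data to URIs so that converted objects can be referenced from other RDF content.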

Type-level conversion

Converting instance data is not the only issue for interoperability between GML and RDF, though it is certainly the primary issue for practical purposes. There is also the matter of mirroring GML type definitions in the RDF world. In GML, definition of types (that is, characterization of the form of correct instance data) is handled by XML Schema, a very complex formalism. The two predominant typing formalisms for RDF are RDFS (RDF Schema) and OWL. OWL is the appropriate choice for this job, since its expressiveness corresponds more closely to that of XML Schema.

However, practical applications do not always - in fact, do not usually - require definition of types; naming the types is adequate. Type definition serves the purpose of validation, and in some cases, support for automated reasoning (particularly on the RDF side), but practical development of software involving instance data can proceed without formalizing the constraints that define types. Type definitions have not yet been developed for RDFMap or RDFGeom. So, type-level conversion between GML and RDFMap is a topic for the future.
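As a small illustration of naming types without defining them, the sketch below emits one N-Triples statement per type name, declaring each to be an rdfs:Class and saying nothing about its structure. The base namespace is a hypothetical placeholder, and the type names are used only as sample input.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
BASE = "http://example.org/city#"  # hypothetical namespace for the vocabulary


def name_types(type_names, base=BASE):
    """Return one N-Triples line per type, declaring it an rdfs:Class.

    No constraints are stated: the types are named, not defined.
    """
    return [f"<{base}{name}> <{RDF_TYPE}> <{RDFS_CLASS}> ." for name in type_names]


lines = name_types(["CityModelType", "RoadType", "RiverType", "MountainType"])
```

Instance data can then refer to these URIs through rdf:type, which is all that most software handling the data needs.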

An example

The venerable (and simple) Cambridge example has appeared in each version of the GML specification. It includes a minimal application schema for city features, and a brief data file describing two roads in Cambridge and the river Cam. The application schema used can be found on page 420 of the GML document. The schema introduces the types CityModelType, RoadType, RiverType, and MountainType (even though there is no mountain within 50 kilometers of Cambridge). The following files illustrate conversion of this simple data to RDFMap.

http://www.mapbureau.com/gml/examples/rdfcityvocabulary.xml

uses RDF Schema to define an RDF vocabulary corresponding to the application schema in the example.

http://www.mapbureau.com/gml/examples/gmlcambridge.xml

is the instance data for Cambridge, expressed in GML.

http://www.mapbureau.com/gml/examples/rdfcambridge.xml

is the instance data for Cambridge, expressed in RDF.

[1] GML version 1 of May 2000 had an RDF profile, which was dropped from subsequent versions. RDFMap revives this direction, while making different technical choices. The most important difference between RDFMap and GML 1 is the choice to partition geometry into a separate vocabulary based on SVG.
Copyright 2004, Map Bureau. All rights reserved.