This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Create a SKOS thesaurus from a Shapefile

fxprunayre edited this page Oct 24, 2012 · 1 revision

How to create a SKOS thesaurus for the GeoNetwork metadata catalogue?

GeoNetwork opensource uses thesauri to help metadata editors add keywords from a controlled vocabulary. By default, GeoNetwork provides one thesaurus containing the main regions of the world (countries, seas, continents, etc.). When setting up your catalogue, it is good practice to provide end users with keyword lists (thesauri) that match their working domain.

This tutorial explains how to convert an ESRI shapefile containing protected areas in France into a SKOS thesaurus, using Talend Open Studio and its spatial module. SKOS (http://www.w3.org/2004/02/skos/) is the format GeoNetwork uses to store thesauri.

SKOS can be extended to store additional information. GeoNetwork uses GML to add the bounding box of each concept to the thesaurus. A concept looks like this:

    <skos:Concept rdf:about="http://geonetwork-opensource.org/regions#338">
      <skos:prefLabel xml:lang="en">Africa</skos:prefLabel>
      <skos:prefLabel xml:lang="fr">Africa</skos:prefLabel>
      <skos:prefLabel xml:lang="es">Africa</skos:prefLabel>
      <skos:prefLabel xml:lang="cn">非洲</skos:prefLabel>
      <skos:prefLabel xml:lang="ar">Africa</skos:prefLabel>
      <gml:BoundedBy>
        <gml:Envelope gml:srsName="http://www.opengis.net/gml/srs/epsg.xml#epsg:4326">
          <gml:lowerCorner>-17.3 -34.6</gml:lowerCorner>
          <gml:upperCorner>51.1 38.2</gml:upperCorner>
        </gml:Envelope>
      </gml:BoundedBy>
      <skos:broader rdf:resource="http://geonetwork-opensource.org/regions#continent" />
    </skos:Concept>
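
To make the structure concrete, the following Python sketch (standard library only) parses a concept like the one above, shortened to two labels and wrapped in an rdf:RDF root, and reads back its URI, labels and bounding box. The function name read_concepts is hypothetical, and the GML namespace URI http://www.opengis.net/gml is an assumption: check it against the xmlns:gml declaration in the thesaurus files shipped with your GeoNetwork version.

```python
import xml.etree.ElementTree as ET

# RDF and SKOS are the W3C namespace URIs. The GML URI is an ASSUMPTION:
# compare it with the xmlns:gml declaration of your GeoNetwork thesauri.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "gml": "http://www.opengis.net/gml",
}
# xml:lang is always bound to this namespace by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:gml="http://www.opengis.net/gml">
  <skos:Concept rdf:about="http://geonetwork-opensource.org/regions#338">
    <skos:prefLabel xml:lang="en">Africa</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Africa</skos:prefLabel>
    <gml:BoundedBy>
      <gml:Envelope gml:srsName="http://www.opengis.net/gml/srs/epsg.xml#epsg:4326">
        <gml:lowerCorner>-17.3 -34.6</gml:lowerCorner>
        <gml:upperCorner>51.1 38.2</gml:upperCorner>
      </gml:Envelope>
    </gml:BoundedBy>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(rdf_text):
    """Return a list of (uri, {lang: label}, bbox) tuples, one per skos:Concept.

    bbox is (minx, miny, maxx, maxy) as floats, or None if the concept has no envelope.
    """
    root = ET.fromstring(rdf_text)
    concepts = []
    for concept in root.findall("skos:Concept", NS):
        uri = concept.get(f"{{{NS['rdf']}}}about")
        labels = {lbl.get(XML_LANG): lbl.text for lbl in concept.findall("skos:prefLabel", NS)}
        lower = concept.find("gml:BoundedBy/gml:Envelope/gml:lowerCorner", NS)
        upper = concept.find("gml:BoundedBy/gml:Envelope/gml:upperCorner", NS)
        bbox = None
        if lower is not None and upper is not None:
            # Each corner is a space-separated "x y" pair.
            minx, miny = (float(v) for v in lower.text.split())
            maxx, maxy = (float(v) for v in upper.text.split())
            bbox = (minx, miny, maxx, maxy)
        concepts.append((uri, labels, bbox))
    return concepts


for uri, labels, bbox in read_concepts(SAMPLE):
    print(uri, labels, bbox)
```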

This tutorial does not cover the multilingual capabilities of SKOS: the thesaurus is generated in a single language.

The processing job

Create a job

Create a simple job in the workspace.

images/00CreateJob.png

Read the Shapefile

From the metadata, create a generic schema for the Shapefile.

images/01CreateGenericSchema.png

Use the sShapefileInput component to read the file and the sEnveloppeCalculator component to compute the extent of each feature.

images/02ReadShapefile.png
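
What the envelope step computes for each feature can be illustrated in plain Python: the envelope is the minimum and maximum of all of the feature's coordinates. This is a sketch of the idea, not how the Talend component is implemented; the envelope function and the sample coordinates are hypothetical, and the coordinates are assumed to already be in EPSG:4326, the CRS used by the example concept.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def envelope(rings: Iterable[Iterable[Point]]) -> BBox:
    """Axis-aligned envelope of every point in every ring of one feature."""
    xs, ys = [], []
    for ring in rings:
        for x, y in ring:
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("geometry has no coordinates")
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical protected area made of two parts, coordinates in degrees (EPSG:4326).
area = [
    [(2.0, 48.5), (2.4, 48.5), (2.4, 48.9), (2.0, 48.5)],
    [(3.1, 47.9), (3.3, 48.0), (3.2, 48.2), (3.1, 47.9)],
]
print(envelope(area))  # (2.0, 47.9, 3.3, 48.9)
```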

While building the job, it is always a good idea to check the intermediate output with a tLogRow component.

images/04Test.png
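
The kinds of problems worth spotting in the tLogRow output can also be written down as a check. The hypothetical check_row helper below flags empty labels, inverted corners, and values outside the EPSG:4326 longitude/latitude ranges; the last case is a typical symptom of projected coordinates (in metres) reaching the output unconverted. The row values are made up for illustration.

```python
def check_row(label, bbox):
    """Return the problems found in one (label, bbox) row; an empty list means it looks OK.

    bbox is (minx, miny, maxx, maxy), expected in EPSG:4326 degrees.
    """
    minx, miny, maxx, maxy = bbox
    problems = []
    if not label or not label.strip():
        problems.append("empty label")
    if minx > maxx or miny > maxy:
        problems.append("lower corner is not below/left of upper corner")
    if not all(-180.0 <= x <= 180.0 for x in (minx, maxx)):
        problems.append("longitude out of range")
    if not all(-90.0 <= y <= 90.0 for y in (miny, maxy)):
        problems.append("latitude out of range")
    return problems


rows = [
    ("Area A", (2.0, 47.9, 3.3, 48.9)),  # hypothetical, plausible degrees
    ("", (700000.0, 6600000.0, 710000.0, 6610000.0)),  # hypothetical projected metres
]
for label, bbox in rows:
    print(repr(label), check_row(label, bbox) or "OK")
```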

Extract coordinates

A GML envelope is defined by a lower corner and an upper corner, so we have to extract these two coordinate pairs from the envelope computed for each feature.

images/05Mapping.png
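
A minimal sketch of this mapping, assuming the envelope arrives as a closed rectangle ring of (x, y) pairs: it takes the minimum and maximum over the ring instead of picking fixed vertex positions, so the result does not depend on the vertex order used by whichever component produced the envelope. The corners are formatted as space-separated "x y" strings, matching the longitude-latitude order of the example concept. The name corners_from_envelope is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def corners_from_envelope(ring: Sequence[Point]) -> Tuple[str, str]:
    """Return (lowerCorner, upperCorner) text for an envelope ring of (x, y) points."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    # GML corners are "x y" pairs; for EPSG:4326 in this thesaurus that is "lon lat".
    lower = f"{min(xs)} {min(ys)}"
    upper = f"{max(xs)} {max(ys)}"
    return lower, upper


# Envelope of a hypothetical area, as a closed rectangle ring.
ring = [(2.0, 47.9), (2.0, 48.9), (3.3, 48.9), (3.3, 47.9), (2.0, 47.9)]
print(corners_from_envelope(ring))  # ('2.0 47.9', '3.3 48.9')
```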

A tAdvancedFileOutputXML component is used to create the RDF file.

images/06SKOSOutput.png
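
For readers who want to produce a similar RDF file outside Talend, here is a sketch with Python's xml.etree.ElementTree of the kind of document this output step writes: one skos:Concept per row, each carrying a prefLabel and a gml:BoundedBy envelope. Several parts are assumptions rather than GeoNetwork requirements: the skos:ConceptScheme with a dc:title, the GML namespace URI, the example scheme URI, and the default language "fr". Compare the result with a thesaurus shipped with your GeoNetwork version before uploading. build_thesaurus and q are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
# ASSUMPTION: use the same GML namespace URI as your GeoNetwork thesauri.
GML = "http://www.opengis.net/gml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
SRS = "http://www.opengis.net/gml/srs/epsg.xml#epsg:4326"

# Serialize with the usual prefixes instead of ns0, ns1, ...
for prefix, uri in (("rdf", RDF), ("skos", SKOS), ("dc", DC), ("gml", GML)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Build an ElementTree qualified name of the form {namespace}tag."""
    return f"{{{ns}}}{tag}"


def build_thesaurus(scheme_uri, title, rows, lang="fr"):
    """Build an rdf:RDF tree with one ConceptScheme and one Concept per row.

    rows: iterable of (concept_id, label, (minx, miny, maxx, maxy)).
    """
    root = ET.Element(q(RDF, "RDF"))
    # ASSUMPTION: a ConceptScheme carrying the thesaurus title.
    scheme = ET.SubElement(root, q(SKOS, "ConceptScheme"), {q(RDF, "about"): scheme_uri})
    ET.SubElement(scheme, q(DC, "title")).text = title
    for concept_id, label, (minx, miny, maxx, maxy) in rows:
        concept = ET.SubElement(
            root, q(SKOS, "Concept"), {q(RDF, "about"): f"{scheme_uri}#{concept_id}"}
        )
        ET.SubElement(concept, q(SKOS, "prefLabel"), {XML_LANG: lang}).text = label
        bounded = ET.SubElement(concept, q(GML, "BoundedBy"))
        env = ET.SubElement(bounded, q(GML, "Envelope"), {q(GML, "srsName"): SRS})
        ET.SubElement(env, q(GML, "lowerCorner")).text = f"{minx} {miny}"
        ET.SubElement(env, q(GML, "upperCorner")).text = f"{maxx} {maxy}"
    return ET.ElementTree(root)


tree = build_thesaurus(
    "http://example.org/thesaurus/protected-areas",  # hypothetical scheme URI
    "Protected areas (example)",
    [("area-a", "Area A", (2.0, 47.9, 3.3, 48.9))],  # hypothetical row
)
print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To write the file to disk instead of printing it, call tree.write("protected-areas.rdf", encoding="UTF-8", xml_declaration=True) on the returned tree.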

Run the job from the Studio!

images/07RunTheJob.png

Results once loaded into GeoNetwork

Once the RDF file has been created, go to the administration interface of your catalogue, upload the thesaurus, and then open the metadata editor.

The new thesaurus should now appear in the thesaurus list of the keyword selection panel.

images/08AddToMetadata.png

Then you can use the "compute extent" mechanism to automatically populate the extent section of the metadata records.

images/09ComputeExtent.png

images/09ViewExtent.png