Create a SKOS thesaurus from a Shapefile
GeoNetwork opensource uses thesauri to help metadata editors add keywords from a controlled vocabulary. By default, GeoNetwork provides one thesaurus containing the main regions of the world (countries, seas, continents). A good practice when setting up your catalogue is to provide end users with lists of keywords (thesauri) that match their working domain.
This tutorial explains how to use Talend Open Studio and its spatial module to convert an ESRI Shapefile containing protected areas in France into a SKOS thesaurus. SKOS (http://www.w3.org/2004/02/skos/) is the format GeoNetwork uses to store thesauri.
SKOS can be extended to store more information. GeoNetwork uses GML to add the bounding box of a concept to the thesaurus. A concept looks like this:
<skos:Concept rdf:about="http://geonetwork-opensource.org/regions#338">
  <skos:prefLabel xml:lang="en">Africa</skos:prefLabel>
  <skos:prefLabel xml:lang="fr">Africa</skos:prefLabel>
  <skos:prefLabel xml:lang="es">Africa</skos:prefLabel>
  <skos:prefLabel xml:lang="cn">非洲</skos:prefLabel>
  <skos:prefLabel xml:lang="ar">Africa</skos:prefLabel>
  <gml:BoundedBy>
    <gml:Envelope gml:srsName="http://www.opengis.net/gml/srs/epsg.xml#epsg:4326">
      <gml:lowerCorner>-17.3 -34.6</gml:lowerCorner>
      <gml:upperCorner>51.1 38.2</gml:upperCorner>
    </gml:Envelope>
  </gml:BoundedBy>
  <skos:broader rdf:resource="http://geonetwork-opensource.org/regions#continent" />
</skos:Concept>
This tutorial does not cover the multilingual capabilities of SKOS. The thesaurus is generated in a single language.
Create a simple job in the workspace.
From the metadata, create a generic schema for the Shapefile.
images/01CreateGenericSchema.png
Use the sShapefileInput component to read the file and the sEnveloppeCalculator to compute the extent.
It is always good to check that everything is OK with a tLogRow component while building the job.
A GML envelope is defined by a lower corner and an upper corner, so we have to extract the first and second coordinates from the boundary.
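Outside Talend, the corner extraction can be illustrated with a minimal Python sketch. It assumes the boundary is available as a list of (x, y) pairs in longitude/latitude order; the function names `envelope` and `gml_corners` are hypothetical and are not part of Talend or GeoNetwork.

```python
def envelope(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def gml_corners(points):
    """Return the text of gml:lowerCorner and gml:upperCorner.

    Assumption: coordinates are written as "x y" separated by a space,
    as in the Africa concept shown earlier.
    """
    min_x, min_y, max_x, max_y = envelope(points)
    lower = f"{min_x} {min_y}"
    upper = f"{max_x} {max_y}"
    return lower, upper


# Hypothetical boundary of a small area (longitude, latitude).
boundary = [(2.0, 48.5), (2.5, 49.0), (1.5, 48.0)]
lower_corner, upper_corner = gml_corners(boundary)
```

The lower corner carries the two minimum values and the upper corner the two maximum values, which is the split the job has to perform on the computed extent.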
A tAdvancedFileOutputXML component is used to create the RDF file.
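To make the target structure of the output concrete, here is a rough Python sketch that builds one concept with the standard library. It is not what the Talend component does internally. The GML namespace URI, the concept URI, and the label are assumptions chosen for illustration; check them against an existing GeoNetwork thesaurus before reusing them.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
GML = "http://www.opengis.net/gml#"  # assumption: verify against your thesaurus
XML = "http://www.w3.org/XML/1998/namespace"
SRS = "http://www.opengis.net/gml/srs/epsg.xml#epsg:4326"

# Use readable prefixes instead of ns0, ns1, ... when serializing.
for prefix, uri in (("rdf", RDF), ("skos", SKOS), ("gml", GML)):
    ET.register_namespace(prefix, uri)


def build_concept(uri, label, lang, lower, upper):
    """Build a skos:Concept element with a single label and a GML envelope."""
    concept = ET.Element(f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri})
    pref_label = ET.SubElement(
        concept, f"{{{SKOS}}}prefLabel", {f"{{{XML}}}lang": lang}
    )
    pref_label.text = label
    bounded_by = ET.SubElement(concept, f"{{{GML}}}BoundedBy")
    env = ET.SubElement(
        bounded_by, f"{{{GML}}}Envelope", {f"{{{GML}}}srsName": SRS}
    )
    ET.SubElement(env, f"{{{GML}}}lowerCorner").text = lower
    ET.SubElement(env, f"{{{GML}}}upperCorner").text = upper
    return concept


# Hypothetical concept URI and label, for illustration only.
concept = build_concept(
    "http://example.org/protected-areas#1",
    "Example protected area",
    "fr",
    "1.5 48.0",
    "2.5 49.0",
)
xml_text = ET.tostring(concept, encoding="unicode")
```

In the job itself, the same tree is described in the XML output component, with one concept written per row of the Shapefile.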
Run the job from the studio!
Once the thesaurus is created, go to the administration interface of your catalogue, upload the thesaurus, and then go to the editor.
You should see the new thesaurus in the list of the keyword selection panel.
Then you can use the "compute extent" mechanism to automatically populate the extent section of the metadata records.