Tim L edited this page Jul 7, 2014 · 128 revisions

What's first

What we'll cover

On this page, we describe how CKAN is used to catalog the datasets in the LOD Cloud. We start by showing how to list a dataset manually, then describe how some automation can assist the process. Next, we highlight some characteristics that CKAN collects using a manual process, to see if we can implement some [FAqT Services](FAqT Service) to help out.

Let's get to it

How to list a version of https://scm.escience.rpi.edu/svn/public/logd-csv2rdf4lod/data/source/contactingthecongress/directory-for-the-112th-congress/?

Manually publish a dataset on CKAN

I. Go to http://thedatahub.org and create an account.

  • You can create an account directly on CKAN, or use an Open ID, e.g. Google.
  • After logging in, the Data Hub will show your API key, which can be used to programmatically retrieve and update CKAN data.

III. Review and accept the Open Database License

http://opendatacommons.org/licenses/odbl/1.0/

IV. Add a dataset

screenshot: add dataset link

V. Have a name for your dataset

screenshot: add a title for the dataset

Provide a description of the dataset, as well as the URL for your dataset on CKAN. The URL takes the form http://thedatahub.org/dataset/ followed by the dataset's name. Click "Add Dataset" once you are done with the basic information.

screenshot: submit button to add dataset

screenshot: dataset created confirmation

(Note, the offer to "Upload or link some data now" only appears once, but it links to the "Edit : Resources" section of the dataset page, so you can get there yourself.)

VI. Provide some additional information

You will then be directed to a page for adding further details. Provide the author, the maintainer, contact information for both, and the link to the dataset. It is also preferred that the author give the data format (.ttl, .nt, or .rdf), namespaces (e.g. http://www.myData.org/), ontologies used (e.g. Dublin Core, FOAF), links to open datasets (e.g. the number of links to DBpedia or Freebase), and basic statistics (e.g. the total number of triples), so that consumers get a thorough view of your dataset.

screenshot: add a resource to the dataset by linking to a file, API, or uploading the file itself

VII. Make the core dataset available online

There are 3 options to provide your data to the public.

  • Link to URL
  • Link to API
  • Upload a file

The first option asks for a link to data that is already available on the web. The second asks for the link to your service (e.g. http://dbpedia.org/sparql). Finally, you may upload your data to CKAN so that CKAN hosts it for you.

When providing a Link to URL, the following metadata is requested:

screenshot: link to a URL metadata requested

The same metadata is requested when submitting a Link to API, although most of the fields do not apply to an API.

Group your CKAN dataset?

If you have a handful of similar datasets, it is a good idea to create a "CKAN group" for them. For example, the lodcloud CKAN group contains all datasets worthy of being in the LOD Diagram.

Although "tagging" some CKAN datasets also groups them, creating a CKAN group is a little heavier weight. Each group gets its own page that lists its members.

Note that when you create a CKAN group, you must go into the administration settings to permit others to modify it. Groups are not "public edit" by default like dataset listings are.

Characteristics of datasets on CKAN

  • Metadata

The author, contact information, and a description of the data's content give the data consumer a basic idea of what the dataset is about.

  • Basic Data Quality Preview

The number of links to open datasets and the ontologies used give the data consumer basic confidence in using the data.

Beyond data hub - and on to LOD

For details about what metadata the lodcloud group requires and how it's encoded as both CKAN attributes and proper RDF vocabularies, see CKAN lodcloud RDF vocabulary.

An overview hypermap

The following figure illustrates the major resources that can be used to figure out how to publish a Linked Data set to the lodcloud group. Clicking on the image will lead to a higher resolution PDF version with links.

hypermap showing the web sites that support the lodcloud datahub group

Getting back what you put in

Going through the effort to list a dataset in CKAN has several advantages.

  • CKAN provides a framework to maintain and publish information about your dataset, translating to less work for you.
  • Your dataset can be discovered by those looking at CKAN
  • You can add and modify your dataset listing yourself using a web browser
  • You can add and modify your dataset listing programmatically (using REST!)
  • Your RDF dataset can be listed in the lodcloud group, and thus become part of The Diagram

So far, this page has described how to list your dataset manually. Now we'll turn to how to access that information programmatically. Then, we'll look at submitting new datasets programmatically using the CKAN API, which is described at http://wiki.ckan.org/API with a tutorial at http://wiki.ckan.org/Using_the_API.

We'll use the Farmers Markets dataset as an example, since we listed a lot of its metadata and it is listed in lodcloud. We'll start with reading and writing at the CKAN API level, then move on to how CKAN has been interpreted as RDF by Will Waites and Keith Alexander.

Q: The attributes of a dataset listing?

A dataset's web page has the "package id" in its URL: http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states

Change the base to get to the API: http://thedatahub.org/api/rest/package/farmers-markets-geographic-data-united-states

{
   "maintainer":"",
   "maintainer_email":"",
   "id":"91d2c0de-75a4-4bb6-b260-bc2946e1be8b",
   "metadata_created":"2007-04-10T21:19:38",
   "relationships":[

   ],
   "metadata_modified":"2011-12-05T11:35:55.419895",
   "author":"Tim Lebo",
   "author_email":"lebot@rpi.edu",
   "download_url":"http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.ttl",
   "state":"active",
   "version":"2011-Nov-29",
   "license_id":"cc-by",
   "resources":[
      {
         "mimetype":"text/turtle",
         "resource_group_id":"3bb8f6c7-25dc-4533-b4e5-0c84a984638e",
         "hash":"",
         "description":"Farmers Markets Geographic Data (United States) Version 2011-Nov-29, Enhancement 1",
         "format":"text/turtle",
         "url":"http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.ttl",
         "cache_url":null,
         "webstore_url":null,
         "cache_last_updated":null,
         "package_id":"91d2c0de-75a4-4bb6-b260-bc2946e1be8b",
         "mimetype_inner":"text/turtle",
         "webstore_last_updated":null,
         "last_modified":"2011-11-29T11:40:00",
         "position":0,
         "size":"5772173",
         "id":"99fce83a-f0c2-4efa-8421-59eac9fc4531",
         "resource_type":"file",
         "name":"Farmers Markets Geographic Data (United States) Version 2011-Nov-29, Enhancement 1"
      },
      {
         "mimetype":"",
         "resource_group_id":"3bb8f6c7-25dc-4533-b4e5-0c84a984638e",
         "hash":"",
         "description":"Rensselaer LOGD SPARQL Endpoint",
         "format":"api/sparql",
         "url":"http://logd.tw.rpi.edu/sparql",
         "cache_url":null,
         "webstore_url":null,
         "cache_last_updated":null,
         "package_id":"91d2c0de-75a4-4bb6-b260-bc2946e1be8b",
         "mimetype_inner":"",
         "webstore_last_updated":null,
         "last_modified":null,
         "position":1,
         "size":null,
         "id":"c71c409f-2da1-4ec4-983c-63494963fdb6",
         "resource_type":"api",
         "name":"Rensselaer LOGD SPARQL Endpoint"
      },
      {
         "mimetype":"text/turtle",
         "resource_group_id":"3bb8f6c7-25dc-4533-b4e5-0c84a984638e",
         "hash":"",
         "description":"voiD description",
         "format":"meta/void",
         "url":"http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl",
         "cache_url":null,
         "webstore_url":null,
         "cache_last_updated":null,
         "package_id":"91d2c0de-75a4-4bb6-b260-bc2946e1be8b",
         "mimetype_inner":"text/turtle",
         "webstore_last_updated":null,
         "last_modified":"2011-11-29T00:00:00",
         "position":2,
         "size":"183655",
         "id":"8b29ea8f-c971-4c0a-a4bf-ae91d68f31bf",
         "resource_type":"file",
         "name":"voiD description"
      },
      {
         "mimetype":"text/turtle",
         "resource_group_id":"3bb8f6c7-25dc-4533-b4e5-0c84a984638e",
         "hash":"",
         "description":"RDF Schema",
         "format":"meta/rdf-schema",
         "url":"http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.void.ttl",
         "cache_url":null,
         "webstore_url":null,
         "cache_last_updated":null,
         "package_id":"91d2c0de-75a4-4bb6-b260-bc2946e1be8b",
         "mimetype_inner":"text/turtle",
         "webstore_last_updated":null,
         "last_modified":"2011-11-29T00:00:00",
         "position":3,
         "size":"183655",
         "id":"86d4c0d1-2418-4bae-80b3-d9bccd3337e4",
         "resource_type":"file",
         "name":"RDF Schema"
      },
      {
         "mimetype":"text/turtle",
         "resource_group_id":"3bb8f6c7-25dc-4533-b4e5-0c84a984638e",
         "hash":"",
         "description":"Turtle example link",
         "format":"example/turtle",
         "url":"http://logd.tw.rpi.edu/source/data-gov/file/4383/version/2011-Nov-29/conversion/data-gov-4383-2011-Nov-29.e1.sample.ttl",
         "cache_url":null,
         "webstore_url":null,
         "cache_last_updated":null,
         "package_id":"91d2c0de-75a4-4bb6-b260-bc2946e1be8b",
         "mimetype_inner":"text/turtle",
         "webstore_last_updated":null,
         "last_modified":"2011-11-29T00:00:00",
         "position":4,
         "size":"183655",
         "id":"c655d560-e174-474c-bd18-3a36ff9869e0",
         "resource_type":"file",
         "name":"Turtle example link"
      }
   ],
   "tags":[
      "format-con",
      "format-conversion",
      "format-dc",
      "format-ov",
      "format-owl",
      "format-void",
      "format-wgs",
      "geographic",
      "lod",
      "no-deref-vocab",
      "provenance-metadata",
      "published-by-third-party",
      "vocab-mappings"
   ],
   "groups":[
      "lodcloud",
      "datafaqs"
   ],
   "name":"farmers-markets-geographic-data-united-states",
   "license":"OKD Compliant::Creative Commons Attribution",
   "notes_rendered":"<p>Longitude and latitude, state, address, name, and zip code of Farmers Markets in the United States, converted to RDF format.\n</p>\n<p>References <a href=\"http://logd.tw.rpi.edu/sparql.php?query-option=text&amp;query=PREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0APREFIX+conversion%3A+%3Chttp%3A%2F%2Fpurl.org%2Ftwc%2Fvocab%2Fconversion%2F%3E%0D%0ASELECT+distinct+%3Fexternal%0D%0AWHERE+%7B%0D%0A++GRAPH+%3Chttp%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fversion%2F2011-Nov-29%3E++%7B%0D%0A++++%3Flocal+owl%3AsameAs+%3Fexternal%0D%0A++%7D%0D%0A%7D%0D%0Aorder+by+%3Fexternal&amp;service-uri=&amp;output=html&amp;callback=&amp;tqx=&amp;tp=\">154 URIs</a> in DBPedia, GovTrack, and Geonames.\n</p>\n<p><a href=\"http://logd.tw.rpi.edu/sparql.php?query-option=text&amp;query=PREFIX+conversion%3A+%3Chttp%3A%2F%2Fpurl.org%2Ftwc%2Fvocab%2Fconversion%2F%3E%0D%0APREFIX+ds4383_vocab%3A+%3Chttp%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fvocab%2F%3E%0D%0ASELECT+distinct+%3Fmarket%0D%0AWHERE+%7B%0D%0A++GRAPH+%3Chttp%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fversion%2F2011-Nov-29%3E+%7B%0D%0A++++%3Fmarket+a+ds4383_vocab%3AFarmersMarket+%0D%0A++%7D%0D%0A%7Dorder+by+%3Fmarket&amp;service-uri=&amp;output=html&amp;callback=&amp;tqx=&amp;tp=\">7,223 farmers market URIs</a> dereference to RDF/XML (e.g., see <a href=\"http://validator.linkeddata.org/vapour?vocabUri=http%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fversion%2F2011-Nov-29%2FfarmersMarket_1019&amp;classUri=http%3A%2F%2F&amp;propertyUri=http%3A%2F%2F&amp;instanceUri=http%3A%2F%2F&amp;defaultResponse=dontmind&amp;userAgent=vapour.sourceforge.net\">vapour report</a> for <a href=\"http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29/farmersMarket_1019\">farmersMarket_1019</a>).\n</p>",
   "url":"http://explore.data.gov/Agriculture/Farmers-Markets-Geographic-Data/wfna-38ey",
   "ckan_url":"http://thedatahub.org/dataset/farmers-markets-geographic-data-united-states",
   "notes":"Longitude and latitude, state, address, name, and zip code of Farmers Markets in the United States, converted to RDF format.\r\n\r\nReferences [154 URIs](http://logd.tw.rpi.edu/sparql.php?query-option=text&query=PREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0D%0APREFIX+conversion%3A+%3Chttp%3A%2F%2Fpurl.org%2Ftwc%2Fvocab%2Fconversion%2F%3E%0D%0ASELECT+distinct+%3Fexternal%0D%0AWHERE+%7B%0D%0A++GRAPH+%3Chttp%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fversion%2F2011-Nov-29%3E++%7B%0D%0A++++%3Flocal+owl%3AsameAs+%3Fexternal%0D%0A++%7D%0D%0A%7D%0D%0Aorder+by+%3Fexternal&service-uri=&output=html&callback=&tqx=&tp=) in DBPedia, GovTrack, and Geonames.\r\n\r\n[7,223 farmers market URIs](http://logd.tw.rpi.edu/sparql.php?query-option=text&query=PREFIX+conversion%3A+%3Chttp%3A%2F%2Fpurl.org%2Ftwc%2Fvocab%2Fconversion%2F%3E%0D%0APREFIX+ds4383_vocab%3A+%3Chttp%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fvocab%2F%3E%0D%0ASELECT+distinct+%3Fmarket%0D%0AWHERE+%7B%0D%0A++GRAPH+%3Chttp%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fversion%2F2011-Nov-29%3E+%7B%0D%0A++++%3Fmarket+a+ds4383_vocab%3AFarmersMarket+%0D%0A++%7D%0D%0A%7Dorder+by+%3Fmarket&service-uri=&output=html&callback=&tqx=&tp=) dereference to RDF/XML (e.g., see [vapour report](http://validator.linkeddata.org/vapour?vocabUri=http%3A%2F%2Flogd.tw.rpi.edu%2Fsource%2Fdata-gov%2Fdataset%2F4383%2Fversion%2F2011-Nov-29%2FfarmersMarket_1019&classUri=http%3A%2F%2F&propertyUri=http%3A%2F%2F&instanceUri=http%3A%2F%2F&defaultResponse=dontmind&userAgent=vapour.sourceforge.net) for [farmersMarket_1019](http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29/farmersMarket_1019)).",
   "title":"Farmers Markets Geographic Data (United States)",
   "ratings_average":null,
   "extras":{
      "links:govtrack":"52",
      "triples":"130005",
      "namespace":"http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29/",
      "links:geonames-semantic-web":"50",
      "shortname":"Farmers Markets (US)",
      "links:dbpedia":"52",
      "sparql_graph_name":"http://logd.tw.rpi.edu/source/data-gov/dataset/4383/version/2011-Nov-29"
   },
   "ratings_count":0,
   "revision_id":"9c2545b8-ebf1-42bf-ac92-633ff5ceba3f"
}
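The same lookup can be scripted. The following is a minimal sketch using only the Python standard library; the helper names (`package_url`, `summarize`, `fetch_listing`) are made up for illustration, and the base URL is the one used on this page, which may no longer resolve.

```python
import json
from urllib.request import urlopen

# Base URL as used on this page (an assumption that it is still served).
API_BASE = "http://thedatahub.org/api/rest"


def package_url(name, base=API_BASE):
    """Build the REST URL for a dataset ("package") listing."""
    return "{}/package/{}".format(base.rstrip("/"), name)


def summarize(listing):
    """Pick a few attributes out of a decoded dataset listing."""
    extras = listing.get("extras", {})
    return {
        "title": listing.get("title"),
        "license_id": listing.get("license_id"),
        "groups": listing.get("groups", []),
        "triples": int(extras["triples"]) if "triples" in extras else None,
        "resource_formats": [r.get("format") for r in listing.get("resources", [])],
        # Extras named "links:<target>" hold the number of links to <target>.
        "links": {
            key.split(":", 1)[1]: int(value)
            for key, value in extras.items()
            if key.startswith("links:")
        },
    }


def fetch_listing(name, base=API_BASE):
    """GET the listing over HTTP and decode the JSON body."""
    with urlopen(package_url(name, base)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `summarize(fetch_listing("farmers-markets-geographic-data-united-states"))` would reduce the record above to its title, license, groups, triple count, resource formats, and link counts.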

Q: List all datasets in a group?

Group's web page has its id in its URL: http://thedatahub.org/group/datafaqs

Change the base to get to the API: http://thedatahub.org/api/rest/group/datafaqs

returns

{
   "display_name":"DataFAQs",
   "description":"",
   "title":"DataFAQs",
   "created":"2011-11-29T20:49:42.445597",
   "state":"active",
   "extras":{

   },
   "revision_id":"d9d3eca9-1a06-4d14-b269-5db44951a427",
   "packages":[
      "congresspeople",
      "farmers-markets-geographic-data-united-states",
      "white-house-visitor-access-records"
   ],
   "id":"3c5f8753-6f42-45ca-a2be-9dd53eb679b8",
   "name":"datafaqs"
}
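As a rough sketch, the group record can be read the same way. The function names below are hypothetical, and the URLs follow the patterns shown on this page rather than any guaranteed endpoint.

```python
import json
from urllib.request import urlopen

SITE = "http://thedatahub.org"  # site used on this page; may no longer resolve


def group_url(group, site=SITE):
    """Build the REST URL for a group record."""
    return "{}/api/rest/group/{}".format(site.rstrip("/"), group)


def packages_in_group(group_record):
    """Return the dataset names listed under the record's "packages" key."""
    return sorted(group_record.get("packages", []))


def dataset_pages(group_record, site=SITE):
    """Turn each member dataset name into the URL of its web page."""
    return ["{}/dataset/{}".format(site.rstrip("/"), name)
            for name in packages_in_group(group_record)]


def fetch_group(group, site=SITE):
    """GET the group record over HTTP and decode the JSON body."""
    with urlopen(group_url(group, site)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the response above, `packages_in_group` yields the three dataset names, each of which can then be fetched individually through the package URL.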

Updating a dataset

Q: Add a tag to a dataset?

For example, tag http://thedatahub.org/dataset/white-house-visitor-access-records with "no-deref-vocab"

(The steps here work when changing "package" to "dataset" in the URL and using the request header X-CKAN-API-Key instead of Authorization.)

To modify a dataset listing, you need an API key, which is listed in your user home page after you log in to CKAN. For POST operations, the API key must be the value of an Authorization request header.

screenshot: logging into CKAN shows you your API key

The example from the CKAN tutorial (the test site is now at http://demo.ckan.org/):

$ curl http://test.ckan.net/api/rest/package/test -d '{"name":"test", "title":"Changed Test package"}' -H "Authorization:your-api-key"

When you update an object, fields that you don’t supply will remain as they were before. (Model Formats) (However, this conflicts with a recommendation we received via email to send back EVERYTHING you get.)

We need to write the entire record, not a specific attribute. So we need to GET the whole thing, tweak it, then POST it back. Let's drop down to the command line.

curl -O http://thedatahub.org/api/rest/package/white-house-visitor-access-records
{
   "maintainer":"",
   "maintainer_email":"",
   "id":"e59f5df0-05b4-40e5-99b1-f801c15e6e93",
   "metadata_created":"2007-04-10T21:19:38",
   "relationships":[
   ],
   "metadata_modified":"2011-11-29T14:02:44.873726",
   "author":"",
   "author_email":"",
   "state":"active",
   "version":"",
   "license_id":"",
   "resources":[

   ],
   "tags":[

   ],
...

tweak it to say:

   "tags":[
      "no-deref-vocab"
   ],
...

Send it back up:

$ curl http://thedatahub.org/api/rest/package/white-house-visitor-access-records -d @white-house-visitor-access-records -H "Authorization:Aa8...Bb2"
"Access denied" :-(

Also denied:

curl http://thedatahub.org/api/rest/package/white-house-visitor-access-records --data-urlencode @white-house-visitor-access-records -H "Authorization:Aa8...Bb2"

(try http://demo.ckan.org/)

http://test.ckan.net/dataset/9f825e18-0e0c-40f6-8895-b5392c1ac18a
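The GET, tweak, POST cycle can also be sketched in Python. This is only an illustration under assumptions: the function names are invented, sending the body as JSON with a `Content-Type: application/json` header is a guess (the curl commands above do not set one), and nothing here is known to resolve the "Access denied" response. The `header` parameter lets you switch between the two header names mentioned above.

```python
import json
from urllib.request import Request, urlopen


def with_tag(record, tag):
    """Return a copy of the record whose tag list includes `tag` once."""
    updated = dict(record)
    tags = list(record.get("tags", []))
    if tag not in tags:
        tags.append(tag)
    updated["tags"] = tags
    return updated


def build_update_request(url, record, api_key, header="Authorization"):
    """Prepare (but do not send) a POST carrying the whole record."""
    body = json.dumps(record).encode("utf-8")
    headers = {header: api_key, "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def get_record(url):
    """GET the current record so that every field can be sent back."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def send(request):
    """Send a prepared request and return the decoded response body."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

A full round trip would be `send(build_update_request(url, with_tag(get_record(url), "no-deref-vocab"), key))`, which posts back every field that was retrieved rather than only the changed one.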

Automatically publish dataset on CKAN

Now that we've covered how to use the core CKAN REST API to access and modify the metadata that we added manually, we can see how the Python API performs the same operations. Although there are CKAN APIs for PHP, JavaScript, Python, Perl, and the command line, we focus here on Python because we are heading toward building SADI services in Python. Each language-specific API wraps the foundational REST API calls.

As an example, we'll tag e59f5df0-05b4-40e5-99b1-f801c15e6e93 / white-house-visitor-access-records with 'government' using CKAN's Python API.

First, we need to install the CKAN Python API:

sudo easy_install http://pypi.python.org/packages/source/c/ckanclient/ckanclient-0.9.tar.gz#md5=cb6d09eb2e60a01bce60c82c6c3a0c85

We submitted https://github.com/okfn/ckanclient/issues/15 regarding ~/.ckanclientrc not providing the api-key.

A SADI service to submit to CKAN

A SADI service that accepts RDF-encoded tags and VoID and submits them to CKAN: add-metadata.py. The X_CKAN_API_Key environment variable is described at DATAFAQS environment variables.

export X_CKAN_API_Key="YOUR API KEY"
python add-metadata.py

class TagCKANDataset(sadi.Service):
...
    name                   = 'add-metadata'
...
        # Instantiate the CKAN client.
        # http://docs.python.org/library/configparser.html (could use this technique)
        # .get() lets an unset variable reach the error message below
        # instead of raising KeyError.
        key = os.environ.get('X_CKAN_API_Key', '')
        if len(key) <= 1:
            print('ERROR: https://github.com/timrdf/DataFAQs/wiki/Missing-CKAN-API-Key')
            sys.exit(1)
        self.ckan = ckanclient.CkanClient(api_key=key)

Service will be available at: http://localhost:9090/add-metadata

Convert the Turtle to RDF/XML to avoid unreliable Python Turtle parsers, then HTTP POST the RDF to the service:

$ rapper -g -o rdfxml sample-inputs/congresspeople-tagged-government.ttl > b.rdf
$ curl -H "Content-Type: application/rdf+xml" -d @b.rdf http://localhost:9090/add-metadata
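If you would rather call the service from Python than from curl, the POST might look like the sketch below. The function names are hypothetical, and the service URL is the local one given above. One difference to be aware of: curl's `-d @file` strips newlines from the file, whereas this sketch sends the bytes unchanged.

```python
from urllib.request import Request, urlopen


def build_rdf_post(service_url, rdf_xml):
    """Prepare (but do not send) a POST of RDF/XML bytes to a service."""
    return Request(
        service_url,
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def post_rdf_file(service_url, path):
    """Read an RDF/XML file (e.g. b.rdf) and POST it; return the response text."""
    with open(path, "rb") as handle:
        request = build_rdf_post(service_url, handle.read())
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

For example, `post_rdf_file("http://localhost:9090/add-metadata", "b.rdf")` corresponds to the curl call above and requires the service to be running.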

Returns:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://thedatahub.org/dataset/congresspeople> a <http://purl.org/twc/vocab/datafaqs#ModifiedCKANDataset>;
    dcterms:modified "2011-12-16T17:19:26.635183";
    rdfs:seeAlso <http://thedatahub.org/dataset/f4c2a8bb-6580-4919-98aa-617feb766b6c> .

A useful regex, because dcterms:identifier values are both UUIDs and pretty names (for the same thing):

    def is_id(self, id_string):
        '''Tells the client if the string looks like an id or not'''
        return bool(re.match('^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$', id_string))
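To see how the check behaves, here is a standalone version of the same regex applied to the two identifiers used on this page. The `identifier_kind` helper is an addition for illustration only. Note that the pattern accepts lowercase hexadecimal digits only.

```python
import re

# Same pattern as above, compiled once at module level.
UUID_PATTERN = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$')


def is_id(id_string):
    """Tell whether the string looks like a CKAN id (a lowercase UUID)."""
    return bool(UUID_PATTERN.match(id_string))


def identifier_kind(id_string):
    """Hypothetical helper: label an identifier as an 'id' or a pretty 'name'."""
    return 'id' if is_id(id_string) else 'name'


if __name__ == '__main__':
    for value in ('e59f5df0-05b4-40e5-99b1-f801c15e6e93',
                  'white-house-visitor-access-records'):
        print(value, identifier_kind(value))
```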

Extending sadi.py with ckan

I'd like to extend sadi.py into ckan_sadi.py, which includes the ckan object and has a constructor:

   def __init__(self):
       sadi.Service.__init__(self)

       # Instantiate the CKAN client.
       # http://docs.python.org/library/configparser.html (could use this technique)
       # .get() lets an unset variable reach the error message below
       # instead of raising KeyError.
       key = os.environ.get('X_CKAN_API_Key', '')
       if len(key) <= 1:
           print('ERROR: https://github.com/timrdf/DataFAQs/wiki/Missing-CKAN-API-Key')
           sys.exit(1)
       self.ckan = ckanclient.CkanClient(api_key=key)

Are these services all in a common directory? You can put that code in the same directory as your services (as ckan_sadi.py or ckan.py; a Python module name cannot contain a hyphen) and do:

from ckan_sadi import *

since it will be on your Python path. You can have ckan_sadi.py import sadi.py and extend that class as shown above, then have your classes extend that one.
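The subclassing arrangement described above can be sketched without SADI or ckanclient installed. Everything here is illustrative: `Service` is a stand-in for sadi.Service, `CKANService` and `MissingAPIKey` are hypothetical names, and the client class is passed in as a parameter so the example runs on its own. In the real module it would presumably be ckanclient.CkanClient.

```python
import os


class Service(object):
    """Stand-in for sadi.Service so that this sketch runs on its own."""

    def __init__(self):
        self.initialized = True


class MissingAPIKey(Exception):
    """Raised when X_CKAN_API_Key is unset or too short to be a key."""


class CKANService(Service):
    """Hypothetical shared base class: a service that holds a CKAN client."""

    def __init__(self, client_factory, environ=None):
        Service.__init__(self)
        environ = os.environ if environ is None else environ
        key = environ.get('X_CKAN_API_Key', '')
        if len(key) <= 1:
            raise MissingAPIKey(
                'https://github.com/timrdf/DataFAQs/wiki/Missing-CKAN-API-Key')
        # client_factory is whatever builds the client from an api_key.
        self.ckan = client_factory(api_key=key)


class TagCKANDataset(CKANService):
    """A concrete service only needs to extend the shared base class."""
    name = 'add-metadata'
```

Raising an exception instead of calling sys.exit(1) is a design choice of this sketch, made so that the caller decides how to report a missing key.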

Dataset changes

(Apr 2013)

CKAN lists dataset changes (e.g. klappstuhlclub)

http://docs.ckan.org/en/latest/api.html#get-able-api-functions describes how to construct the URL for the functions listed at http://docs.ckan.org/en/latest/ckan.logic.action.get.html

{
  "help": "Return a dataset (package)'s revisions as a list of dictionaries.\n\n    :param id: the id or name of the dataset\n    :type id: string\n\n    ",
  "success": true,
  "result": [
    {
      "id": "3d1b86ca-557a-4be2-82e6-2eeb2d9f0e4b",
      "timestamp": "2011-10-05T15:32:01.037585",
      "message": "",
      "author": "139.18.255.65",
      "approved_timestamp": null
    },
    {
      "id": "998ebea5-d7e1-4c13-a76a-d7e9b8dd9688",
      "timestamp": "2011-09-12T07:32:17.038294",
      "message": "",
      "author": "http:\/\/kurzum.myopenid.com\/",
      "approved_timestamp": null
    },
    {
      "id": "82b23f1a-5fdb-40f3-88a9-b4bbb1b46e76",
      "timestamp": "2011-09-09T08:23:54.817967",
      "message": "",
      "author": "anjeve",
      "approved_timestamp": null
    },
    {
      "id": "20f06b51-2db0-49ca-888d-7facfc4d2a42",
      "timestamp": "2011-09-09T08:18:29.275626",
      "message": "",
      "author": "anjeve",
      "approved_timestamp": null
    },
    {
      "id": "275cb117-0a25-4a66-bdf1-092bb99ea853",
      "timestamp": "2011-09-09T07:56:31.842035",
      "message": "",
      "author": "anjeve",
      "approved_timestamp": null
    },
    {
      "id": "9e81f0ab-d0b5-49e1-a82c-11c5951a550f",
      "timestamp": "2011-09-09T07:56:01.188565",
      "message": "",
      "author": "anjeve",
      "approved_timestamp": null
    },
    {
      "id": "28abe894-2cf1-4eb6-bd93-b14e0292042b",
      "timestamp": "2011-09-09T07:53:14.447832",
      "message": "creation",
      "author": "http:\/\/kurzum.myopenid.com\/",
      "approved_timestamp": null
    }
  ]
}
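A response like the one above can be requested and reduced to a change history with a few lines of Python. This sketch makes several assumptions: the function name `package_revision_list` is inferred from the help text in the response and may not exist in newer CKAN releases, the `/api/3/action/` path follows the URL construction described in the CKAN API documentation, and datahub.io is used as the base only because it appears on this page. The helper names are invented.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def action_url(function, base="http://datahub.io", **params):
    """Build a GET URL for an action API function and its parameters."""
    url = "{}/api/3/action/{}".format(base.rstrip("/"), function)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


def revisions(response):
    """Return (timestamp, author, message) tuples from a decoded response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful call")
    return [(r["timestamp"], r["author"], r["message"])
            for r in response["result"]]


def editors(response):
    """Return the distinct authors who changed the dataset, sorted."""
    return sorted({author for _, author, _ in revisions(response)})


def fetch(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

For the dataset named above, `editors(fetch(action_url("package_revision_list", id="klappstuhlclub")))` would list who has edited it, assuming the endpoint is still available.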

Still need to find the call for e.g. http://datahub.io/revision/diff/farmers-markets-geographic-data-united-states?diff_entity=package&oldid=6278d190-33b2-4bf9-9e77-8a0f35d2540c&diff=9c2545b8-ebf1-42bf-ac92-633ff5ceba3f

CKAN test sites

As of May 2013:

There are three demo sites that we currently maintain; they are mostly similar but run slightly different versions:

All sites' databases are regularly wiped and recreated, so nothing in them should be considered permanent. User accounts are different from the ones on datahub.io.

Misc CKAN bits

Hi all,

We've been working on upgrading the Datahub (datahub.io) to CKAN 2.0.
The good news is that this upgrade will go ahead early next week. You
can see a preview of how it will look here:

http://datahub.staging.ckanhosted.com/

The new version introduces CKAN 2.0 functionality, such as the ability
to follow datasets and groups, activity streams, dashboards, improved
UI, and more.

For those who are new to it, the Datahub is a community-run portal - a
free public CKAN instance for cataloguing and publishing data. Its
'groups' can be used to bring together collections of data in a
particular area or even for lightweight data publishing by small
organisations. As with a wiki, the ability of any user to edit
metadata (e.g. to fix broken links) helps keep it relevant and
up-to-date.

As a result of the upgrade, Datahub users should note the following:

1. User creation, which was temporarily disabled recently because of a
spam attack, will be enabled again when the new site goes live.

2. For a short period on Monday afternoon, there will be a freeze on
adding or editing content. Existing content should still be readable.

3. All datasets will be editable by anyone who has a valid user
account. This is also the default at present, and in keeping with the
Datahub's wiki-like nature. CKAN 'Organizations', which enable
authorization control over datasets, will not be available on the
Datahub for the time being. Of course, the 'Group' feature will
continue to work as before, along with 2.0 enhancements such as group
activity streams and the ability to follow a group.

4. Data can be added only as links: if you are publishing data, note that
we've had to remove the option of uploading data on the Datahub for the
present. (Data already uploaded will still be available.)

For help, or if you have any questions, please e-mail datahub@okfn.org.

All the best,

Mark Wainwright

-- 
Business development and user engagement manager
The Open Knowledge Foundation
Empowering through Open Knowledge
http://okfn.org/  |  @okfn  |  http://ckan.org  |  @CKANproject

There are services for hosted instances of CKAN and for deployment on your own servers at http://ckan.org/services/hosted-slas/ and http://ckan.org/services/deployment/.

The instructions at http://docs.ckan.org/en/1117-start-new-test-suite/install-from-source.html are well-written, and I was able to get a local CKAN instance running on my Ubuntu-based laptop in about 15 minutes. (Apr 2014)

http://docs.ckan.org/en/latest/maintaining/installing/install-from-source.html is the latest version of the page that David linked to. (Apr 2014)

Harvesters

Hi Stephen,

You can probably reuse all logic in the WAF harvester [1] on your own harvester (just don't extend Spatial Harvester), as its gather_stage and fetch_stage basically deal with parsing a remote folder, downloading the contents of the remote files and storing them into the CKAN db. The only spatial specific part are some lines that you can remove [2].

You will need of course to write your own import_stage that will transform whatever document type you want to harvest into a CKAN dict. You can look into the ckan-ckan or spatial import_stages or also [3], which might be simpler to follow.

Hope this helps,

Adrià

[1] https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/waf.py

[2] https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/waf.py#L203-L219

[3] https://github.com/ckan/ckanext-dcat/blob/master/ckanext/dcat/harvesters.py#L268

On 25 February 2014 04:45, Stephen Barton svbarton@ucdavis.edu wrote:

Hi,

I am trying to make a harvester extension for a generic Web Accessible Folder (WAF). I have reviewed the documentation for the ckanext-harvest extension (that harvests other CKAN instances) and the ckanext-spatial extension (that harvests spatial metadata from WAFs), but it's not clear how to modify the code of ckanharvester.py (or the spatial harvester code waf.py) for a generic WAF. I could not find anything on the discussion archive.

https://lists.okfn.org/mailman/listinfo/ckan-discuss

The info on this page addresses writing a custom harvester, but it's not sufficient for me.

https://github.com/ckan/ckanext-harvest#the-harvesting-interface
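To make the division of labor in the exchange above concrete, here is a plain-Python illustration of the two ends of a WAF harvest: finding documents in a folder listing (the job of a gather stage) and turning one document into a CKAN-style dict (the job of an import stage). It is not written against the ckanext-harvest interface; the function names, the `.xml` filter, and the shape of the resulting dict are all assumptions made for the sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML folder listing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def gather(folder_url, listing_html, extensions=(".xml",)):
    """Gather: return absolute URLs of the documents linked in a listing."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    return [urljoin(folder_url, href) for href in parser.hrefs
            if href.lower().endswith(tuple(extensions))]


def to_ckan_dict(document_url, title, notes=""):
    """Import: map one harvested document onto a CKAN-style dataset dict."""
    filename = document_url.rstrip("/").rsplit("/", 1)[-1]
    name = filename.rsplit(".", 1)[0].lower()
    return {"name": name, "title": title, "notes": notes,
            "resources": [{"url": document_url}]}
```

A real harvester would place this kind of logic inside the gather_stage, fetch_stage, and import_stage methods mentioned above, and would parse each document's own format to fill in the title and notes.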

What's next
