theodi/data-publish-list

How to publish open data: a list of advice and tools

This is a curated list of links which we, at the Open Data Institute, think are most helpful in publishing open data. It includes useful links and advice about platforms, licences, metadata and standards.

If you would like to expand or adapt this list yourself, please feel free to fork the project. If you would like to recommend some resources for this guide, please create a GitHub Issue, and we’ll consider them.

About open data

Before getting to the tools and techniques of publishing, it’s important to know the core definitions and principles of open data.

The Open Definition | Open Knowledge Foundation

  • ‘Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)’
  • The definition is available in many languages
  • The Open Knowledge Foundation is a non-profit, international organisation and behind a number of the resources linked to here

What is Open Data? | Open Data Handbook

  • This takes the open definition and expands on it, explaining what open means for data and why it’s so valuable.
  • The handbook was also made by the Open Knowledge Foundation

Some open data publishing principles | ONS Digital

  • Principles can be very useful to refer to when making decisions – and to ask: ‘if I do this, will it adhere to our principles?’
  • These seven principles are an excellent, concise review of what to aim for when publishing open data

The Annotated 8 Principles of Open Government Data | Opendata.gov

  • These principles were formed by a committee of open government specialists
  • They contain some important points which weren’t mentioned in the previous link

The Data Spectrum | Open Data Institute

  • Data exists on a spectrum from closed to shared to open
  • Data should be as open as responsibly possible, while protecting people’s privacy, commercial confidentiality and national security.

Data publishing platforms

Data Portals | Datopian Tech

  • Comprehensive overview of data portals: what they are and why they might be needed
  • This can help you decide what your data publishing needs actually are before choosing any tool
  • Note: the portals it reviews are very centred on the CKAN platform

CKAN | Comprehensive Knowledge Archive Network

  • ‘CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data.’
  • The biggest data platform, and the most widely used by large public organisations.
  • Features include APIs, visualisations, search and analytics
  • Over 250 user-generated extensions available

Datasette

  • Datasette is a tool for exploring and publishing data. It helps people take data of any shape or size, analyze and explore it, and publish it as an interactive website and accompanying API
  • Straightforward to set up and provides a simple interface
  • Can run SQL queries on the datastore
  • Search and API functionality
  • Not as many features as platforms like CKAN

DataDock | Networked Planet

  • ‘DataDock is an online service which creates navigable data portals for individuals and organisations to publish open data free of charge’
  • Helps with adding dataset metadata, licences, and search

OpenDataSoft

  • Large, feature-rich software service for sharing and visualising open datasets
  • Holds a large share of the open data publishing market across public and private organisations
  • Runs a Data Hub which gives access to nearly 27,000 open datasets.

Open Data Platform (powered by Socrata) | Tyler Technologies

  • Socrata is a software service for data insights that was acquired by Tyler Technologies in 2018
  • Popular product in the open data publishing space
  • Connected to the Open Data Network for browsing and accessing open datasets, with a discovery API.

Dataset metadata

Dataset metadata is data that describes and provides information about data. Adding good metadata to a dataset helps the users understand what a dataset contains and how it is structured. Also, search engines use metadata to index datasets, making them easier to find.

Data and Datasets | schema.org

  • Schema.org is a community that agrees on data standards to add to structured data when publishing to the web
  • It’s used by several big search engines to find and index datasets on the web
  • Schema.org also provides overviews of the Dataset and DataCatalog types (DataCatalogs can contain multiple Datasets)

Dataset | Documentation | Google Developers

  • Guidance on how to add structured data (like that from Schema.org) to a dataset
  • Essential properties required for Google Dataset Search – such as description and name – are listed in the properties section.
  • Other recommended properties are useful for metadata, such as creator, license and url.
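
As a rough illustration of the structured data these two resources describe, the sketch below builds a schema.org Dataset description as JSON-LD using only the Python standard library. The function name `dataset_jsonld`, the example.org URL and the organisation name are placeholders invented for this example; only the property names (name, description, url, creator, license) come from the guidance above.

```python
import json


def dataset_jsonld(name, description, url=None, creator=None, license_url=None):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    `name` and `description` are the properties the guidance lists as
    essential; the others are recommended and are only added when supplied.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    if url:
        doc["url"] = url
    if creator:
        # Assumption: the creator is an organisation rather than a person.
        doc["creator"] = {"@type": "Organization", "name": creator}
    if license_url:
        doc["license"] = license_url
    return doc


# Placeholder values for illustration only.
example = dataset_jsonld(
    name="Street tree locations",
    description="Location and species of every street tree maintained by the council.",
    url="https://example.org/datasets/street-trees",
    creator="Example Council",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# The JSON-LD is typically embedded in the dataset's web page inside a script tag.
print('<script type="application/ld+json">')
print(json.dumps(example, indent=2))
print("</script>")
```

Embedding the output in the page that hosts the dataset is one common way to let search engines pick up the metadata; check the Google documentation for the current list of required and recommended properties.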

Marking up your dataset with DCAT | Open Data Institute

  • ‘The Data Catalog Vocabulary (DCAT) defines a standard way to publish machine-readable metadata about a dataset’
  • Simple guide from 2014 by the ODI on adding DCAT metadata

Data Catalog Vocabulary (DCAT) format | W3C

  • The full documentation on DCAT
  • DCAT v2 now has support for documenting APIs
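
For comparison with schema.org, here is a minimal sketch of a DCAT description written as JSON-LD from Python. It uses a handful of terms from the DCAT and Dublin Core vocabularies (`dcat:Dataset`, `dcat:distribution`, `dcat:downloadURL`, `dcat:mediaType`, `dct:title`, `dct:description`, `dct:license`). The helper name `dcat_dataset` and all example.org URLs are hypothetical, and a real catalogue entry would normally carry more properties than shown here.

```python
import json

# Namespace prefixes for the DCAT and Dublin Core Terms vocabularies.
DCAT_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def dcat_dataset(title, description, download_url, media_type_uri, license_url):
    """Describe one dataset with a single downloadable distribution."""
    return {
        "@context": DCAT_CONTEXT,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:license": {"@id": license_url},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_uri},
            }
        ],
    }


# Placeholder values for illustration only.
record = dcat_dataset(
    title="Street tree locations",
    description="Location and species of every street tree maintained by the council.",
    download_url="https://example.org/data/street-trees.csv",
    media_type_uri="https://www.iana.org/assignments/media-types/text/csv",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(record, indent=2))
```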

Sample Dataset Metadata | Centre for Gov (USA)

  • Part of a guide on dataset metadata as used by the USA federal government
  • Useful as it provides a list of properties and then clearly explains what each of those metadata properties represents

Metadata best practice for data publishers | Geospatial Commission

  • An introduction to metadata and practical guidance for data publishers
  • Outlines the importance of good metadata in ensuring published data can be discovered and used effectively
  • Provides guidance and examples in describing the key characteristics of data

Search Engine Optimisation best practice for data publishers | Geospatial Commission

  • Guidance to maximise the discoverability of your data in search engines and portals

Tabular data and CSVs

Creators of tools for open data? Steal this | Open Data Institute

  • Blogpost from the ODI summarising user research on data publishing tools, along with an accompanying report
  • The research highlights key needs of people publishing data:
    • tools that are easier to use
    • better tools for checking and ensuring quality of the published data
    • better integration between tools in the publishing workflow
  • Includes a link to a register of open source tools

Using CSV file format | UK Government Digital Service

  • Aimed at people working in the UK government, but contains lots of good points
  • ‘Setting up your CSV file’ describes the basics of well-formatted CSVs
  • Includes useful advice on when and when not to use CSV files.

Tidy data | Hadley Wickham

  • Advice, adapted from a paper, on how to structure clean, easy-to-analyse, tabular datasets
  • ‘Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table’
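
To make the tidy structure concrete, the sketch below reshapes a small "wide" table, where each year is its own column, into a tidy one where each row is a single observation. The helper `to_tidy` and the sample figures are invented for this example and use plain Python rather than any particular data library.

```python
def to_tidy(rows, id_column, variable_name, value_name):
    """Turn wide rows into tidy rows: one row per (id, variable) observation.

    Every column other than `id_column` is treated as a value of the
    variable called `variable_name`.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return tidy


# Made-up sample data: the years are column headers, which hides the
# "year" variable inside the table structure.
wide = [
    {"region": "North", "2021": 120, "2022": 135},
    {"region": "South", "2021": 98, "2022": 110},
]

tidy = to_tidy(wide, id_column="region", variable_name="year", value_name="count")
for observation in tidy:
    print(observation)
```

In the tidy version, region, year and count are each a column, so filtering or grouping by year no longer requires knowing which column headers happen to be years.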

Frictionless Framework | Frictionless Data

  • Frictionless Framework is a ‘data management framework for Python to describe, extract, validate, and transform tabular data’
  • Part of Frictionless Data, an open source toolkit of software and standards for the use of data. Part-funded by the Open Knowledge Foundation.
  • Available as a command-line tool, a Python library and in JavaScript

CSV Lint | Open Data Institute

  • Online tool to validate CSV files
  • CSVLint helps you to check that your CSV file is readable. And you can use it to check whether it contains the columns and types of values that it should
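
The kind of check described above (expected columns are present, and values have the expected types) can be sketched with the Python standard library. This is not how CSV Lint or Frictionless are implemented; `check_csv` and the sample data are hypothetical and only illustrate the idea.

```python
import csv
import io


def check_csv(text, expected_columns):
    """Return a list of human-readable problems found in CSV text.

    `expected_columns` maps each required column name to a callable
    (such as int or float) that raises ValueError for invalid values.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []

    missing = [name for name in expected_columns if name not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
        return problems

    for row in reader:
        for column, parse in expected_columns.items():
            try:
                parse(row[column])
            except (TypeError, ValueError):
                # TypeError covers short rows, where the value is None.
                problems.append(
                    f"line {reader.line_num}: {column}={row[column]!r} "
                    f"is not a valid {parse.__name__}"
                )
    return problems


# Made-up sample with one bad value in the second data row.
sample = "name,population\nLeeds,793139\nYork,unknown\n"
for problem in check_csv(sample, {"name": str, "population": int}):
    print(problem)
```

A dedicated validator will catch far more (encoding issues, ragged rows, inconsistent quoting), so a script like this is best treated as a quick pre-publication sanity check.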

An Introduction to VisiData

  • Impressive open source command line tool for interacting with tabular data
  • ‘A free, open-source tool that lets you quickly open, explore, summarize, and analyze datasets in your computer’s terminal’

Naming data files and datasets

Good names for datasets make the contents clear and save time for data users.

Datasets | Office for National Statistics UK

  • ‘How to structure and format your datasets and write effective dataset titles, summaries and information notes’
  • Simple and to-the-point advice
  • Developed by the ONS for internal use, but acts as a useful template for other data stewards to guide naming practices and standards.

Best Practices for File Naming – Records Express | USA National Archives

  • Short blogpost on things to consider when naming files

File naming and structure | Princeton University Library

  • A useful guide to the importance of descriptive file names for organising, sharing, and monitoring data files.
  • The most important piece of advice: ‘Write down naming convention in data management plan’. Data stewards and data users should be able to easily understand the naming conventions.
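
Once a naming convention is written down, it can also be checked automatically. The sketch below assumes a purely hypothetical convention of `subject_YYYY-MM-DD_vN.ext` (lowercase words joined by hyphens, an ISO date, a version number, and a csv or json extension); none of the guides above prescribe this exact pattern, so adapt the regular expression to whatever your own data management plan says.

```python
import re

# Hypothetical convention: <subject>_<YYYY-MM-DD>_v<version>.<csv|json>
FILENAME_PATTERN = re.compile(
    r"^(?P<subject>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})"
    r"_v(?P<version>\d+)"
    r"\.(?P<ext>csv|json)$"
)


def parse_filename(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None


for candidate in ["air-quality_2023-04-01_v2.csv", "Final data (new).csv"]:
    parts = parse_filename(candidate)
    status = "ok" if parts else "does not follow the convention"
    print(f"{candidate}: {status}")
```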

Application programming interfaces (APIs)

What is an Application Programming Interface (API) | IBM

  • Good introductory article on APIs, explaining some of the fundamental concepts: what APIs are, why we need them, types of APIs, etc.
  • A useful resource to help make the business case for APIs

Open standards and open APIs | Open Data Institute

  • A collection of projects from different sectors which use and apply APIs

Web API design best practices | Microsoft Azure

  • Comprehensive, detailed and high-quality technical introduction to good API development
  • We recommend the first section on ‘RESTful web API design’

API technical and data standards | UK Government Digital Service

  • ‘The following web-based application programming interface (API) standards guidance will help your organisation deliver the best possible services to users.’
  • This is a comprehensive guide with useful sections, from publishing bulk data through to testing performance and scalability

Accessibility

Accessibility | W3C

  • A good, short overview on why accessibility for websites and data is so important
  • ‘The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect.’ – Tim Berners-Lee
  • The dataset you publish is very likely to be on a web page. This page should be accessible to people with a diverse range of hearing, movement, sight, and cognitive ability.

Releasing Statistics in Spreadsheets | UK Government Statistical Service

  • Guide ‘to help improve the usability, accessibility and machine readability of statistical spreadsheets.’
  • Good examples of accessible spreadsheets
  • Advice on naming spreadsheet tables, columns and cells so that they are accessible and machine-readable

The 6 Simplest Web Accessibility Tests Anyone Can Do | Karl Groves

  • Before moving on to automated testing tools, check the basics of the page’s accessibility
  • Tests include simply turning off the mouse, images and/or CSS

Creating Accessible Tables - Data Tables | Web Accessibility In Mind

  • Discusses which HTML tags to add to make data tables more accessible
  • WebAIM is a long-running organisation that provides training and services in web accessibility, and has a good library of useful articles.
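
As a small illustration of the markup that makes data tables easier for assistive technology to interpret, the sketch below generates an HTML table with a `<caption>`, header cells marked up as `<th>`, and `scope` attributes for column and row headers. The function `accessible_table` and the sample figures are invented for this example, and it assumes the first cell of each row acts as that row's header.

```python
from html import escape


def accessible_table(caption, headers, rows):
    """Render rows as an HTML table with a caption and scoped header cells."""
    lines = [
        "<table>",
        f"  <caption>{escape(caption)}</caption>",
        "  <thead>",
        "    <tr>",
    ]
    for header in headers:
        lines.append(f'      <th scope="col">{escape(header)}</th>')
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for row in rows:
        first, *rest = row
        lines.append("    <tr>")
        # Assumption: the first cell identifies the row, so it is a row header.
        lines.append(f'      <th scope="row">{escape(str(first))}</th>')
        for cell in rest:
            lines.append(f"      <td>{escape(str(cell))}</td>")
        lines.append("    </tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


# Made-up sample data.
table_html = accessible_table(
    caption="Street trees planted per year",
    headers=["Region", "2021", "2022"],
    rows=[["North", 120, 135], ["South", 98, 110]],
)
print(table_html)
```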

WAVE

  • Free web accessibility tool from WebAIM
  • WAVE can identify many accessibility and Web Content Accessibility Guideline (WCAG) errors, but also facilitates human evaluation of web content.
  • Enter a website address to get a diagnosis of accessibility issues

Data licences

Licences define what others can and cannot do with the data you publish.

Publisher’s Guide to Open Data Licensing | Open Data Institute

  • Introduction to licensing for open data
  • This can help you understand what data you are allowed to store, and what you are allowed to openly publish
  • ‘If you have ownership of a work, and someone else wants to use it, they have to ask your permission. Licences are how you explicitly give someone else permission to use that work.’

How to choose an open data license | Koordinates

  • Includes a clear description of Creative Commons licences and the difference between the four licence elements of Attribution, ShareAlike, No-Derivatives and Non-Commercial

Choose a License | Creative Commons

  • Useful step-through to help you choose which CC licence is best for you
  • Includes the option of an icon to use on your website alongside the data

Conformant Licenses | Open Definition from the Open Knowledge Foundation

  • List of all the licences that conform to the Open Knowledge Foundation’s definition of an open licence.
  • Also includes an API to pull structured information about each of these licences
  • This can be useful if you encounter a licence elsewhere and are unsure whether it’s suitable or not

Using geospatial data: a guide to licences | Open Data Institute

  • The ODI published this guide to help people understand different types of licences
  • It has 17 example use cases of geospatial data, and discusses what licence is most suitable and why

Data standards

Data standards: What are they and why do they matter? | Transparencee

  • Nice, approachable description of data standards. Articles on this topic can often be quite dry, and this article is refreshingly clear.
  • ‘The power of the standard comes from the power of all the stakeholders using it. If it’s not really common then it isn’t really a standard.’

What are open standards for data? | ODI, Open Standards Guidebook

  • ‘Standards are documented, reusable agreements that solve a specific set of problems or meet clearly defined needs. Standards detail the language, concepts, rules, guidance or results that have been agreed.’
  • Part of a guidebook the ODI released on open data standards. It explains many aspects of data standards including how to find them, choose and create them.

Data standards | Data.gov

  • Overview of the definitions and terms used around data standards
  • ‘An anatomical unit of a data standard is referred to as a data standard “component.” The documentation for a more holistic data standard specification then assembles multiple components together to form a data standards “package.”’
  • Especially useful is the breakdown of different types of components in a data standard package – identifiers, vocabulary, formats, APIs etc

Data Standards | UK Office for National Statistics

  • Useful as a reference list of some example data standards
  • They range from simple standards on naming conventions, such as how dates should be represented, through to metadata schemas such as Dublin Core or Schema.org

OpenStand Principles | OpenStand

  • Five clear criteria (cooperation; adherence to principles; collective empowerment; availability; voluntary adoption) for the development of open standards
  • Developed jointly by a range of authoritative international bodies, including the Institute of Electrical and Electronics Engineers (IEEE) and the World Wide Web Consortium (W3C)
  • While there are many definitions of what constitutes an ‘open standard’, the OpenStand criteria are influential and widely accepted
