
Working with the Green Web Open Datasets

Every month, the Green Web Foundation publishes url2green, a dataset of green domain names and the organisations that host them.

This data closely follows the data available over the Green Web API. Generally speaking, any analysis you might use the API for can also be done with the published datasets, without needing to call the API for each check.

Understanding the url2green dataset

Every check of a website is recorded in a table called greenchecks. As of January 2020, this table has nearly 1.6 billion rows, so it is rather unwieldy to work with.

For this reason, the dataset we publish contains a smaller table, green_presenting, listing each url and its status, with the columns below.

Column             Description
id                 the id of the last check
url                the url checked
hosted_by          the organisation hosting this site
hosted_by_website  the website of the company providing the hosting for this site
partner            does this url belong to one of the Green Web partner organisations?
green              is this a green domain? 1 for yes, 0 for no
hosted_by_id       the id of the hosting company
modified           the time and date of the last check of this url
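
As a rough illustration of how a table with these columns could be queried, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the data has been loaded into a SQLite table named green_presenting. The column types, the sample rows and the helper names (open_sample_db, is_green) are invented for this example and are not taken from the published dataset.

```python
import sqlite3

# Schema mirrors the columns documented above; the column types are assumptions.
SCHEMA = """
CREATE TABLE green_presenting (
    id INTEGER PRIMARY KEY,
    url TEXT,
    hosted_by TEXT,
    hosted_by_website TEXT,
    partner TEXT,
    green INTEGER,
    hosted_by_id INTEGER,
    modified TEXT
)
"""

# Invented placeholder rows, for illustration only (not real dataset contents).
SAMPLE_ROWS = [
    (1, "green.example.org", "Example Green Host", "host.example.org",
     "", 1, 10, "2020-01-01 00:00:00"),
    (2, "grey.example.com", "Example Grey Host", "grey-host.example.com",
     "", 0, 11, "2020-01-02 00:00:00"),
]


def open_sample_db():
    """Create an in-memory database holding the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO green_presenting VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    conn.commit()
    return conn


def is_green(conn, url):
    """Return True or False for a listed url, or None if the url is not listed."""
    row = conn.execute(
        "SELECT green FROM green_presenting WHERE url = ?", (url,)
    ).fetchone()
    return None if row is None else bool(row[0])


conn = open_sample_db()
print(is_green(conn, "green.example.org"))
print(is_green(conn, "missing.example.net"))
```

Returning None for a url that is not in the table keeps "not checked" distinct from "checked and not green", which matters if you want to fall back to another source for unknown urls.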

Example uses of this dataset

Because this dataset provides similar data to the greencheck API, it can work like an offline cache in cases where making an API call for each check would either be too slow or leak data about your users that you would not want to share.

  • running local checks for privacy - a build of the privacy-protecting search engine searx uses this dataset to avoid leaking information
  • checking domains as part of a development workflow - tools which consume the Green Web Foundation's greencheck API, like Greenhouse or Website Carbon, can use this dataset to avoid relying on the API for running checks
  • running analysis to understand how centralisation of the web changes over time - because this dataset shows which organisation hosts each domain, you can get an idea of whether the web is becoming more or less centralised, flowing through fewer or more providers
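
To make the centralisation use case a little more concrete, the sketch below computes each hosting organisation's share of the listed urls from (url, hosted_by) pairs. The function name provider_shares and the sample pairs are hypothetical, and this is only one simple way to look at concentration, not a method the Green Web Foundation prescribes.

```python
from collections import Counter


def provider_shares(rows):
    """Map each hosting organisation to its share of the given urls.

    `rows` is an iterable of (url, hosted_by) pairs, for example the result
    of selecting those two columns from the green_presenting table.
    """
    counts = Counter(hosted_by for _url, hosted_by in rows)
    total = sum(counts.values())
    # most_common() orders providers from largest to smallest share.
    return {provider: n / total for provider, n in counts.most_common()}


# Invented placeholder pairs, for illustration only.
rows = [
    ("a.example.org", "Host A"),
    ("b.example.org", "Host A"),
    ("c.example.org", "Host A"),
    ("d.example.org", "Host B"),
]

shares = provider_shares(rows)
for provider, share in shares.items():
    print(f"{provider}: {share:.0%}")
```

Running the same calculation on successive monthly datasets and comparing the shares of the largest providers would give a rough picture of how concentration changes over time.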

Licensing of the data

This dataset is released under the Open Database Licence.

Getting support with using the Green Web Foundation datasets

We provide limited, free support for using the Green Web Datasets we publish, and are happy to provide advice or answer questions about this data if you want to use it in classes or research.

If you're interested in further analysis of the web's shift away from fossil fuels, the Green Web Foundation has data going back to 2009, and we're happy to collaborate.
