Working with the Green Web Open Datasets
Every month The Green Web Foundation publishes a dataset of green domain names, and who hosts them, called url2green.
This data closely follows the data available over the Green Web API. Generally speaking, any analysis you might use the Green Web API for can also be done with the published datasets, without needing to hit the API for each check.
Understanding the url2green dataset
Every check of a website is recorded in a table called greenchecks. As of January 2020, this table has nearly 1.6 billion rows, so it is rather unwieldy to work with.
For this reason, the dataset we publish contains a smaller table, green_presenting, listing the URLs and their status, with the columns below.
|id|the id of the last check|
|url|the url checked|
|hosted_by|the organisation hosting this site|
|hosted_by_website|the website of the company providing the hosting for this site|
|partner|does this url belong to one of the Green Web partner organisations?|
|green|is this a green domain? 1 for yes, 0 for no.|
|hosted_by_id|the id of the hosting company|
|modified|the time and date of the last check of this url|
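As a minimal sketch of how this table might be queried, the example below assumes the data has been loaded into a SQLite database. The storage format, the SQL column types, the sample rows and the is_green helper are all assumptions made for illustration; only the table and column names come from the list above.

```python
import sqlite3

# Assumed schema: column names follow the table above, the SQL types are guesses.
SCHEMA = """
CREATE TABLE green_presenting (
    id INTEGER PRIMARY KEY,
    url TEXT,
    hosted_by TEXT,
    hosted_by_website TEXT,
    partner TEXT,
    green INTEGER,
    hosted_by_id INTEGER,
    modified TEXT
)
"""

# Made-up rows for illustration only; they are not taken from the real dataset.
SAMPLE_ROWS = [
    (1, "green.example", "Example Green Host", "host.example", "", 1, 10, "2020-01-01 00:00:00"),
    (2, "grey.example", "Example Grey Host", "grey-host.example", "", 0, 11, "2020-01-02 00:00:00"),
]


def build_demo_db():
    """Create an in-memory database that mimics the green_presenting table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO green_presenting VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    conn.commit()
    return conn


def is_green(conn, url):
    """Return True or False for a url in the table, or None if it is absent."""
    row = conn.execute(
        "SELECT green FROM green_presenting WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    return row[0] == 1


conn = build_demo_db()
print(is_green(conn, "green.example"))
```

Returning None for an unknown url keeps "not green" distinct from "never checked", which matters if you fall back to the API for missing entries.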
Example uses of this dataset
Because this dataset provides similar data to the greencheck API, it can work like an offline cache in cases where making an API call for each check would either be too slow, or leak data about your users that you would not want to share.
- running local checks for privacy - a build of the privacy-protecting search engine searx uses this to avoid needing to leak information
- checking domains as part of a development workflow - tools which consume the Green Web Foundation's greencheck API, like Greenhouse or Website Carbon, can use this to avoid being reliant on the Green Web API for running checks
- running analysis to understand how centralisation of the web changes over time - because this dataset shows which organisation hosts each domain, you can get an idea of whether the web is becoming more or less centralised, and whether it is flowing through fewer providers.
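The centralisation analysis in the last item could be sketched as follows: count how many urls each organisation hosts and express that as a share of all urls in one snapshot. This again assumes the data sits in a SQLite database; the provider_shares helper and the sample rows are hypothetical, and only the url and hosted_by columns are used.

```python
import sqlite3


def provider_shares(conn):
    """Return {hosted_by: fraction of all urls} for the green_presenting table."""
    rows = conn.execute(
        "SELECT hosted_by, COUNT(*) FROM green_presenting GROUP BY hosted_by"
    ).fetchall()
    total = sum(count for _, count in rows)
    if total == 0:
        return {}
    return {host: count / total for host, count in rows}


# Made-up snapshot for illustration only: a reduced table with two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE green_presenting (url TEXT, hosted_by TEXT)")
conn.executemany(
    "INSERT INTO green_presenting VALUES (?, ?)",
    [
        ("a.example", "Host A"),
        ("b.example", "Host A"),
        ("c.example", "Host A"),
        ("d.example", "Host B"),
    ],
)

shares = provider_shares(conn)
# List providers from largest to smallest share of hosted urls.
for host, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{host}: {share:.0%}")
```

Running the same function against successive monthly snapshots and comparing the largest shares would give a rough view of whether hosting is concentrating in fewer providers.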
Licensing of the data
This dataset is released under the Open Database Licence.
Getting support with using the Green Web Foundation datasets
We provide limited, free support for using the Green Web Datasets we publish, and are happy to provide advice or answer questions about this data if you want to use it in classes or research.
If you're interested in further analysis of the shift of the web away from fossil fuels, the Green Web Foundation has data going back to 2009, and we're happy to collaborate.