# CKAN Database

CKAN is a widely used open-source data management system that governments around the world use to publish their data. This notebook walks through querying a CKAN data portal through its API.


**Information on the API can be found here:** [CKAN API guide](https://docs.ckan.org/en/latest/api/#example-importing-datasets-with-the-ckan-api). General information on CKAN and the organizations that use it can be found here: [CKAN official website](https://ckan.org).

requests documentation: [here](https://requests.readthedocs.io/en/latest/)

pandas documentation: [here](https://pandas.pydata.org/docs/)

In [2]:
import requests
import pandas as pd

## Using Requests to access the API

The code below uses `requests` to call the API, and the cells after it show how the returned data is typically unpacked.

In [25]:
# package_search returns a dict with 'count' and 'results', which the cells below rely on
ckan_url = "http://catalog.data.gov/api/3/action/package_search"

api_token = "45tYhqFq71zd3xYo29eMgLESXiNml4Xxm9JfMmTl"

headers = {
    'Authorization': api_token
}

response = requests.post(url=ckan_url,
                         headers=headers)

assert response.status_code == 200
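
The call above can be wrapped in a small helper that passes query parameters, sets a timeout, and checks CKAN's own `success` flag in addition to the HTTP status. This is only a sketch under stated assumptions: `ckan_action` is a hypothetical helper name, the base URL is taken from the cell above, and no token is sent because read-only actions on a public catalog generally do not need one.

```python
CKAN_BASE_URL = "https://catalog.data.gov/api/3/action"  # assumed from the URL above


def ckan_action(action, params=None, session=None, timeout=30):
    """Call a read-only CKAN action and return its 'result' payload.

    `session` can be any object with a requests-style `get` method, which
    makes the helper easy to exercise without network access.
    """
    if session is None:
        import requests  # imported lazily so a stub session can be used instead
        session = requests.Session()
    response = session.get(f"{CKAN_BASE_URL}/{action}", params=params, timeout=timeout)
    response.raise_for_status()       # fail loudly on HTTP errors
    payload = response.json()
    if not payload.get("success"):    # the response carries its own success flag
        raise RuntimeError(f"CKAN action {action!r} reported failure")
    return payload["result"]
```

A call such as `ckan_action("package_search", {"q": "electric vehicle", "rows": 5})` would then return the inner `result` dictionary directly, assuming the portal is reachable.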

The request above queries the data.gov catalog of publicly available datasets. The response body is JSON, which can be parsed into a Python dictionary:

In [27]:
response_dict = response.json()

This is a lot of information. We probably only want a list of each dataset's name, source, and URL, so we start by looking at the top-level keys of the response:

In [36]:
response_dict.keys()

dict_keys(['help', 'success', 'result'])

The data we want sits under the `result` key, so we look at its keys next:

In [35]:
response_dict['result'].keys()

dict_keys(['count', 'facets', 'results', 'sort', 'search_facets'])
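
`count` is the total number of matching datasets, while `results` holds only one page of them. A rough sketch of how the remaining pages could be addressed follows. It assumes the `rows` and `start` parameters of CKAN's `package_search` action, and `page_params` is a hypothetical helper name rather than anything from the original notebook.

```python
def page_params(count, rows=100):
    """Yield the query parameters for each page of a package_search result set."""
    for start in range(0, count, rows):
        yield {"rows": rows, "start": start}


# Example: 250 matching datasets fetched 100 at a time need three requests.
pages = list(page_params(250, rows=100))
print(pages)
# [{'rows': 100, 'start': 0}, {'rows': 100, 'start': 100}, {'rows': 100, 'start': 200}]
```

Each dictionary can be passed as the query parameters of one request, and the `results` lists from all pages concatenated afterwards.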

Then follow `results` one level further down. (Spoiler: this gives you the actual datasets.)

In [39]:
results = response_dict['result']['results']

results

[{'author': None,
  'author_email': None,
  'creator_user_id': '2b785922-9f13-491b-a3c2-2a40acbd80c2',
  'id': '844dbad1-ee1e-44b8-9799-34cb7ed24640',
  'isopen': False,
  'license_id': 'other-license-specified',
  'license_title': 'other-license-specified',
  'maintainer': 'Department of Licensing',
  'maintainer_email': 'no-reply@data.wa.gov',
  'metadata_created': '2020-11-10T17:20:33.031886',
  'metadata_modified': '2023-12-16T05:00:11.812292',
  'name': 'electric-vehicle-population-data',
  'notes': 'This dataset shows the Battery Electric Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) that are currently registered through Washington State Department of Licensing (DOL).',
  'num_resources': 4,
  'num_tags': 35,
  'organization': {'id': 'bccbad82-abc4-4712-bd29-3e194e7a8042',
   'name': 'state-of-washington',
   'title': 'State of Washington',
   'type': 'organization',
   'description': '',
   'image_url': '',
   'created': '2020-11-10T15:27:34.605650',
   'is_organi
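
Each entry in `results` is a nested dictionary, so a compact listing of dataset name, source, and URLs has to reach into its sub-structures. The sketch below shows one way to do that. `summarize_dataset` is a hypothetical helper, the sample record is a shortened stand-in built from the fields shown above, and the `resources` list with one `url` per resource is an assumption about the part of the record that is cut off in the output.

```python
def summarize_dataset(record):
    """Return the dataset name, publishing organization, and resource URLs."""
    organization = record.get("organization") or {}   # may be missing or None
    resources = record.get("resources") or []         # assumed list of dicts
    return {
        "name": record.get("name"),
        "source": organization.get("title"),
        "urls": [r["url"] for r in resources if r.get("url")],
    }


# Shortened stand-in record; the example.org URL is a placeholder, not real data.
sample = {
    "name": "electric-vehicle-population-data",
    "organization": {"name": "state-of-washington", "title": "State of Washington"},
    "resources": [{"url": "https://example.org/data.csv"}, {"url": ""}],
}
summary = summarize_dataset(sample)
print(summary)
```

Applied to the real response, `[summarize_dataset(r) for r in results]` would give one short dictionary per dataset.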

In [62]:
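# Collect every key that appears in any result record.
all_keys = set()
for result in results:
    all_keys.update(result.keys())

# Pivot the list of records into a dict of columns: {key: [value per record]}.
reformatted_dict = {}
for i, result in enumerate(results):
    for key in all_keys:
        if i == 0:
            # A column is only started for keys present in the first record.
            if key in result:
                reformatted_dict[key] = [result[key]]
        else:
            try:
                reformatted_dict[key].append(result[key])
            except KeyError:
                # Raised when this record lacks the key, or when the first
                # record lacked it and the column was never created.
                # Nothing is appended, so that column ends up shorter.
                print(key + ' key not found!')

reformatted_dict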

license_url key not found!
license_url key not found!
license_url key not found!
license_url key not found!
license_url key not found!
license_url key not found!
license_url key not found!
license_url key not found!
license_url key not found!


{'maintainer': ['Department of Licensing',
  'FDIC Public Data Feedback',
  'LAPD OpenData',
  'Terrance N. Lewis,',
  'Hayden Stewart',
  'Wesley Ingwersen',
  'Open Data NY',
  'NYC OpenData',
  None,
  'National Center for Health Statistics'],
 'url': [None, None, None, None, None, None, None, None, None, None],
 'isopen': [False,
  False,
  False,
  False,
  True,
  False,
  False,
  False,
  False,
  False],
 'creator_user_id': ['2b785922-9f13-491b-a3c2-2a40acbd80c2',
  '2b785922-9f13-491b-a3c2-2a40acbd80c2',
  '2b785922-9f13-491b-a3c2-2a40acbd80c2',
  '1ecd1fb1-1be6-46bb-b90d-07a0762ed104',
  '2b785922-9f13-491b-a3c2-2a40acbd80c2',
  '1ecd1fb1-1be6-46bb-b90d-07a0762ed104',
  '2b785922-9f13-491b-a3c2-2a40acbd80c2',
  '2b785922-9f13-491b-a3c2-2a40acbd80c2',
  '1ecd1fb1-1be6-46bb-b90d-07a0762ed104',
  '2b785922-9f13-491b-a3c2-2a40acbd80c2'],
 'metadata_modified': ['2023-12-16T05:00:11.812292',
  '2020-11-12T12:17:38.682707',
  '2023-12-22T14:48:35.626513',
  '2023-04-11T19:23:22.664
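
The loop above skips values it cannot find, which can leave columns with different lengths or drop a column entirely. A minimal alternative sketch pads every missing value with `None` so that all columns stay aligned. `records_to_columns` is a hypothetical helper name, and the two sample records are made up for illustration.

```python
def records_to_columns(records):
    """Pivot a list of dicts into {key: [value per record]}, padding gaps with None."""
    # dict.fromkeys keeps first-seen order and drops duplicate keys
    keys = list(dict.fromkeys(key for record in records for key in record))
    return {key: [record.get(key) for record in records] for key in keys}


# Made-up records: the first one has no 'license_url'.
sample_records = [
    {"name": "dataset-a", "isopen": False},
    {"name": "dataset-b", "isopen": True, "license_url": "https://example.org/license"},
]
columns = records_to_columns(sample_records)
print(columns)
```

Because every column has one entry per record, the result can be handed to `pd.DataFrame(columns)`. pandas can also build the table straight from the list of records with `pd.DataFrame(results)`, filling missing keys with `NaN`.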