# Matching Democracy Club and Wikidata items for local authorities

## Initial set-up and data load

Load our libraries

In [261]:
import json
from pathlib import Path
import mkwikidata # you may need to pip install mkwikidata

Set up our Wikidata query. 

For earlier versions see
https://w.wiki/4wVb
https://w.wiki/4wmP

In [262]:
query = """
SELECT ?typeLabel ?item ?itemLabel  ?inception ?website ?twitter ?GSS ?WDTK WHERE {
  
  {?item wdt:P31/wdt:P279* wd:Q837766 . }
  UNION # Instance of or sub-class of Local Authority or London Borough
  {?item wdt:P31 wd:Q211690 .} 

  ?item wdt:P17 wd:Q145 . #in UK
  ?item wdt:P31 ?type . #get type 
  MINUS {?item wdt:P576 ?abol .} #ignore abolished councils
  MINUS {?item wdt:P31 wd:Q640452 .} #ignore Area Committees 
  MINUS {?item wdt:P31 wd:Q7137435 .} # ignore Parish Councils
  OPTIONAL {?item wdt:P856 ?website .}
  OPTIONAL {?item wdt:P2002 ?twitter .}
  OPTIONAL {?item wdt:P836 ?GSS.}
  OPTIONAL {?item wdt:P8167 ?WDTK .}
  OPTIONAL {?item wdt:P571 ?inception .}
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

Execute the Wikidata query, loading the results into a dictionary, `query_result`.

In [263]:
query_result = mkwikidata.run_query(query, params={ })

## Initial Matching

Create a Python Set - wd_set - and load our itemLabels (names of councils)

In [264]:
wd_set = set()

for x in query_result["results"]["bindings"]:
    wd_set.add(x["itemLabel"]["value"])
    #print (x["itemLabel"]["value"])

Set the path to where the DC data is, and create a Python set to hold that data.

In [265]:
path = Path.cwd().joinpath('data')
dc_council_set = set()


Read the json file into our set - __if__ the council is current (ie not abolished)

In [266]:
with open(path / 'uk_local_authorities.json') as dc_file:
    data = json.load(dc_file)
    for item in data:
        # an empty end-date means the council is still current
        if item['end-date'] == "":
            dc_council_set.add(item['official-name'])

Check how many councils we have in DC set

In [267]:
print(len(dc_council_set))


409


Check how many in our Wikidata set

In [268]:
print(len(wd_set))

509


Using set theory, check how many councils are in DC's set which are not in our Wikidata set (derived from the Wikidata query).

This was 70+ but I've changed the query to add in London Boroughs, and updated all of both the NI councils, _and_ Combined Authorities, to be Local Authorities. 

I've changed the labels of a few Metropolitan Borough Councils - eg Oldham Metropolitan Borough Council (Q17017730) - to their official names with their common names - eg Oldham Council as the alias. This produces more matches, but it may result in some being changed back. Hopefully we can link DC/MS items with the QID before this happens! 

I __suspect__ that most of the following are down to Wikidata using _common names_ as the item title and sometimes giving the _official title_ as an alias.

Also - in the MS / DC json file, the  "official-name": "North Somerset  Council", has a double-space between Somerset and Council which trips up a rudimentary match. 
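
Some of these mismatches differ only in whitespace or punctuation, so a normalised comparison key could catch them before any hand-matching. The following is a minimal sketch, not part of the original notebook: `normalise_name` is a hypothetical helper, and the sample names are copied from the lists in this notebook. It would not help where the two sources use genuinely different names (eg "Kirklees Council" vs "Kirklees Metropolitan Borough Council").

```python
import re

def normalise_name(name: str) -> str:
    """Hypothetical helper: reduce a council name to a comparison key."""
    key = name.replace("&", "and")                     # "Hammersmith & Fulham" -> "Hammersmith and Fulham"
    key = key.replace("'", "").replace("\u2019", "")   # drop straight and curly apostrophes
    key = key.replace("-", " ")                        # "Mid-Ulster" -> "Mid Ulster"
    key = re.sub(r"\s+", " ", key)                     # collapse runs of whitespace
    return key.strip().casefold()

# Sample names taken from the DC and Wikidata lists in this notebook
dc_names = [
    "North Somerset  Council",
    "Borough Council of Kings Lynn and West Norfolk",
    "Mid Ulster District Council",
]
wd_labels = [
    "North Somerset Council",
    "Borough Council of King's Lynn and West Norfolk",
    "Mid-Ulster District Council",
]

# Index the Wikidata labels by their normalised key, then look up each DC name
wd_by_key = {normalise_name(label): label for label in wd_labels}
normalised_matches = {
    name: wd_by_key[normalise_name(name)]
    for name in dc_names
    if normalise_name(name) in wd_by_key
}

for dc_name, wd_label in normalised_matches.items():
    print(repr(dc_name), "->", repr(wd_label))
```

The mapping keeps the original strings on both sides, so it can be merged into a dictionary of DC name to Wikidata label without losing the exact spellings each source uses.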

In [269]:
print ("Councils in DC list but not in WD: ", len(dc_council_set - wd_set))
print("======================================")
# sorted() returns a new sorted list; list.sort() sorts in place and returns None
missing_list = sorted(dc_council_set - wd_set)
for council in missing_list:
    print(council)


Councils in DC list but not in WD:  10
Borough Council of Kings Lynn and West Norfolk
City of Cardiff Council
Kirklees Council
Liverpool City Region
London Borough of Hammersmith & Fulham
Mid Ulster District Council
North Somerset  Council
Royal Borough of Windsor and Maidenhead
St Helens Council
Wirral Borough Council


In [270]:
print ("Councils in WD but not in DC list: ",len(wd_set - dc_council_set))

Councils in WD but not in DC list:  110


The WD query contains parish councils which DC data does not. 

In [271]:
print ("Councils in Wikidata list but not in DC: ", len(wd_set - dc_council_set))
print("======================================")
# sorted() returns a new sorted list; list.sort() sorts in place and returns None
missing_list = sorted(wd_set - dc_council_set)
for council in missing_list:
    print(council)

Councils in Wikidata list but not in DC:  110
Barking and Dagenham London Borough Council
Barnet London Borough Council
Basildon District Council
Bexley London Borough Council
Blaenau Gwent Council
Borough Council of King's Lynn and West Norfolk
Bournemouth Borough Council
Breckland Council
Brent London Borough Council
Bromley London Borough Council
Camden London Borough Council
Caradon District Council
Cardiff Council
Carrick District Council
Castle Point District Council
Chester-le-Street District Council
Chiltern District Council
Christchurch Borough Council
City of Lincoln District Council
Corby Borough Council
Corby District Council
Croydon London Borough Council
Cumberland County Council
Daventry District Council
Derwentside District Council
Dungannon District Council
Durham City Council
Dyfed County Council
Ealing London Borough Council
Easington District Council
East Northamptonshire Council
East Riding County Council
East Suffolk County Council
Enfield London Borough Council
G

## Refining the matches

Create matches for odd cases (ampersands, extra spaces, etc.)

In [255]:
match_dict = {}
# format is dc_name: wikidata_label 

In [272]:
match_dict["Borough Council of Kings Lynn and West Norfolk"] = "Borough Council of King's Lynn and West Norfolk"
match_dict["City of Cardiff Council"] = "Cardiff Council"
match_dict["Kirklees Council"] = "Kirklees Metropolitan Borough Council"
match_dict["Liverpool City Region"] = "Liverpool City Region Combined Authority"
match_dict["London Borough of Hammersmith & Fulham"] = "Hammersmith and Fulham London Borough Council"
match_dict["Mid Ulster District Council"] = "Mid-Ulster District Council"
match_dict["North Somerset  Council"] = "North Somerset Council"
match_dict["Royal Borough of Windsor and Maidenhead"] = "Windsor and Maidenhead Borough Council"
match_dict["St Helens Council"] = "St Helens Metropolitan Borough Council"
match_dict["Wirral Borough Council"] = "Wirral Metropolitan Borough Council"

In [273]:
dc_missing = dc_council_set - wd_set
dc_matched = dc_council_set - dc_missing
print ("Councils directly matched from DC list: ", len(dc_matched))
print("======================================")
for x in dc_matched:
    match_dict[x] = x

Councils directly matched from DC list:  399


In [274]:
print(len(match_dict))

409


In [276]:
for item in match_dict.items():
    print(item)

('Borough Council of Kings Lynn and West Norfolk', "Borough Council of King's Lynn and West Norfolk")
('City of Cardiff Council', 'Cardiff Council')
('Kirklees Council', 'Kirklees Metropolitan Borough Council')
('Liverpool City Region', 'Liverpool City Region Combined Authority')
('London Borough of Hammersmith & Fulham', 'Hammersmith and Fulham London Borough Council')
('Mid Ulster District Council', 'Mid-Ulster District Council')
('North Somerset  Council', 'North Somerset Council')
('Royal Borough of Windsor and Maidenhead', 'Windsor and Maidenhead Borough Council')
('St Helens Council', 'St Helens Metropolitan Borough Council')
('Wirral Borough Council', 'Wirral Metropolitan Borough Council')
('Orkney Islands Council', 'Orkney Islands Council')
('Horsham District Council', 'Horsham District Council')
('North East Lincolnshire Council', 'North East Lincolnshire Council')
('London Borough of Bexley', 'London Borough of Bexley')
('East Lothian Council', 'East Lothian Council')
('Staff

__We now have a match for all items from DC by 'official-name' to the Label of a Wikidata item__

Next step is to use these to match the WD QID to DC's "local-authority-code"

eg: "KIN": "Q73072537" (Borough Council of King's Lynn and West Norfolk)
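
One way this next step might look is sketched below. It is an illustration under stated assumptions rather than the notebook's actual implementation. It assumes each query binding carries the item URI under `item` (as the SPARQL JSON results format does) and that each DC record has a `local-authority-code` field alongside `official-name` and `end-date`. The helper `qid_from_uri` and the `sample_*` variables are hypothetical stand-ins for `query_result`, `data` and `match_dict`.

```python
def qid_from_uri(uri: str) -> str:
    """Hypothetical helper: take the QID from a Wikidata entity URI."""
    return uri.rsplit("/", 1)[-1]

# Stand-in for query_result["results"]["bindings"] (shape assumed from the SPARQL JSON format)
sample_bindings = [
    {
        "item": {"value": "http://www.wikidata.org/entity/Q73072537"},
        "itemLabel": {"value": "Borough Council of King's Lynn and West Norfolk"},
    },
]

# Stand-in for the records in uk_local_authorities.json (field names assumed)
sample_dc = [
    {
        "local-authority-code": "KIN",
        "official-name": "Borough Council of Kings Lynn and West Norfolk",
        "end-date": "",
    },
]

# Stand-in for match_dict: DC official name -> Wikidata label
sample_match = {
    "Borough Council of Kings Lynn and West Norfolk":
        "Borough Council of King's Lynn and West Norfolk",
}

# Wikidata label -> QID; if two items share a label, the later one wins
label_to_qid = {
    b["itemLabel"]["value"]: qid_from_uri(b["item"]["value"])
    for b in sample_bindings
}

code_to_qid = {}
for record in sample_dc:
    if record["end-date"] != "":
        continue  # skip abolished councils, as in the earlier load step
    label = sample_match.get(record["official-name"])
    if label in label_to_qid:
        code_to_qid[record["local-authority-code"]] = label_to_qid[label]

print(code_to_qid)
```

Because the lookup goes through labels, any Wikidata label shared by more than one item would need checking by hand before the resulting QIDs are relied on.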
