# Matching Democracy Club and Wikidata items for local authorities

## Initial set-up and data load

Load our libraries

In [290]:
import json
from pathlib import Path
import mkwikidata # you may need to pip install mkwikidata

Set up our Wikidata query. 

For earlier versions see
https://w.wiki/4wVb
https://w.wiki/4wmP

In [291]:
query = """
SELECT ?typeLabel ?item ?itemLabel  ?inception ?website ?twitter ?GSS ?WDTK WHERE {
  
  {?item wdt:P31/wdt:P279* wd:Q837766 . }
  UNION # Instance of or sub-class of Local Authority or London Borough
  {?item wdt:P31 wd:Q211690 .} 

  ?item wdt:P17 wd:Q145 . #in UK
  ?item wdt:P31 ?type . #get type 
  MINUS {?item wdt:P576 ?abol .} #ignore abolished councils
  MINUS {?item wdt:P31 wd:Q640452 .} #ignore Area Committees 
  MINUS {?item wdt:P31 wd:Q7137435 .} # ignore Parish Councils
  OPTIONAL {?item wdt:P856 ?website .}
  OPTIONAL {?item wdt:P2002 ?twitter .}
  OPTIONAL {?item wdt:P836 ?GSS.}
  OPTIONAL {?item wdt:P8167 ?WDTK .}
  OPTIONAL {?item wdt:P571 ?inception .}
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

Execute the Wikidata query, loading the results into a dictionary: query_result.

In [292]:
query_result = mkwikidata.run_query(query, params={ })
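
The cells that follow index into query_result as results, then bindings, then one object per variable with a value key. A minimal sketch of that layout, using a hand-made sample rather than real query output (binding_value is a hypothetical helper, not part of mkwikidata); note that OPTIONAL variables which did not match are simply absent from a row:

```python
# Hand-made sample in the same layout as query_result; not real query output.
sample_result = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q73072537"},
                "itemLabel": {
                    "type": "literal",
                    "value": "Borough Council of King's Lynn and West Norfolk",
                },
            }
        ]
    }
}


def binding_value(row, name, default=None):
    """Return the value bound to `name` in one result row, or `default` if unbound."""
    return row[name]["value"] if name in row else default


row = sample_result["results"]["bindings"][0]
label = binding_value(row, "itemLabel")
website = binding_value(row, "website")  # OPTIONAL variable, unbound in this sample
```

Reading optional fields such as website or twitter through a helper like this avoids a KeyError on rows where the property is not set.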

## Initial Matching

Create a Python set, wd_set, and load our itemLabels (the names of councils) into it.

In [293]:
wd_set = set()

for x in query_result["results"]["bindings"]:
    wd_set.add(x["itemLabel"]["value"])
    #print (x["itemLabel"]["value"])

Set the path to where the DC data is, and create a Python set to hold that data.

In [294]:
path = Path.cwd().joinpath('data')
dc_council_set = set()


Read the json file into our set - __if__ the council is current (ie not abolished)

In [295]:
with open(path / 'uk_local_authorities.json') as dc_file:
    data = json.load(dc_file)
    for item in data:
        # an empty end-date means the council has not been abolished
        if item['end-date'] == "":
            dc_council_set.add(item['official-name'])
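
The same file is read again later to build a name-to-code dictionary, so one option is to filter the records once and derive both structures from the result. A minimal sketch, assuming the three keys used in this notebook; current_councils is a hypothetical helper and the second record is invented for illustration:

```python
import json

# Inline stand-in for the DC file; the second record is hypothetical.
raw = """[
  {"official-name": "Aberdeenshire Council", "local-authority-code": "ABD", "end-date": ""},
  {"official-name": "Example District Council", "local-authority-code": "EXA", "end-date": "2019-03-31"}
]"""


def current_councils(records):
    """Map official-name -> local-authority-code for records with an empty end-date."""
    return {
        rec["official-name"]: rec["local-authority-code"]
        for rec in records
        if rec["end-date"] == ""
    }


current = current_councils(json.loads(raw))
current_names = set(current)  # the keys double as the set used for matching
```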

Check how many councils we have in DC set

In [296]:
print(len(dc_council_set))


409


Check how many in our Wikidata set

In [297]:
print(len(wd_set))

509


Using set difference, check how many councils are in DC's set which are not in our Wikidata set (derived from the Wikidata query).

This was 70+ but I've changed the query to add in London Boroughs, and updated all of both the NI councils, _and_ Combined Authorities, to be Local Authorities. 

I've changed the labels of a few Metropolitan Borough Councils - eg Oldham Metropolitan Borough Council (Q17017730) - to their official names with their common names - eg Oldham Council as the alias. This produces more matches, but it may result in some being changed back. Hopefully we can link DC/MS items with the QID before this happens! 

I __suspect__ that most of the following are down to Wikidata using _common names_ as the item label and sometimes giving the _official title_ as an alias.

Also - in the MS / DC json file, the  "official-name": "North Somerset  Council", has a double-space between Somerset and Council which trips up a rudimentary match. 

In [298]:
print("Councils in DC list but not in WD: ", len(dc_council_set - wd_set))
print("======================================")
# sorted() returns a new sorted list; list.sort() sorts in place and returns None
missing_list = sorted(dc_council_set - wd_set)
for council in missing_list:
    print(council)


Councils in DC list but not in WD:  10
Borough Council of Kings Lynn and West Norfolk
City of Cardiff Council
Kirklees Council
Liverpool City Region
London Borough of Hammersmith & Fulham
Mid Ulster District Council
North Somerset  Council
Royal Borough of Windsor and Maidenhead
St Helens Council
Wirral Borough Council
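
Several of these misses differ only in spacing or punctuation (a double space, an ampersand, an apostrophe, a hyphen). A minimal sketch of a normalising key that would catch those cases, assuming it is acceptable to compare names case-insensitively; normalise_name is a hypothetical helper, not something this notebook uses, and names that differ in wording still need a manual match:

```python
import re


def normalise_name(name):
    """Build a comparison key: expand '&', drop apostrophes, treat hyphens as spaces,
    collapse runs of whitespace, and ignore case."""
    name = name.replace("&", " and ")
    name = name.replace("'", "").replace("\u2019", "")
    name = name.replace("-", " ")
    name = re.sub(r"\s+", " ", name).strip()
    return name.casefold()


# Names taken from the lists in this notebook.
wd_labels = [
    "North Somerset Council",
    "Mid-Ulster District Council",
    "Borough Council of King's Lynn and West Norfolk",
    "Kirklees Metropolitan Borough Council",
]
dc_names = [
    "North Somerset  Council",
    "Mid Ulster District Council",
    "Borough Council of Kings Lynn and West Norfolk",
    "Kirklees Council",
]

wd_index = {normalise_name(label): label for label in wd_labels}
suggested = {
    name: wd_index[normalise_name(name)]
    for name in dc_names
    if normalise_name(name) in wd_index
}
still_unmatched = [name for name in dc_names if name not in suggested]
```

Here the first three DC names find a Wikidata label, while Kirklees Council remains unmatched because the difference is in the wording rather than the punctuation.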


In [299]:
print ("Councils in WD but not in DC list: ",len(wd_set - dc_council_set))

Councils in WD but not in DC list:  110


The WD query contains parish councils which DC data does not. 

In [300]:
print("Councils in Wikidata list but not in DC: ", len(wd_set - dc_council_set))
print("======================================")
# sorted() returns a new sorted list; list.sort() sorts in place and returns None
missing_list = sorted(wd_set - dc_council_set)
for council in missing_list:
    print(council)

Councils in Wikidata list but not in DC:  110
Barking and Dagenham London Borough Council
Barnet London Borough Council
Basildon District Council
Bexley London Borough Council
Blaenau Gwent Council
Borough Council of King's Lynn and West Norfolk
Bournemouth Borough Council
Breckland Council
Brent London Borough Council
Bromley London Borough Council
Camden London Borough Council
Caradon District Council
Cardiff Council
Carrick District Council
Castle Point District Council
Chester-le-Street District Council
Chiltern District Council
Christchurch Borough Council
City of Lincoln District Council
Corby Borough Council
Corby District Council
Croydon London Borough Council
Cumberland County Council
Daventry District Council
Derwentside District Council
Dungannon District Council
Durham City Council
Dyfed County Council
Ealing London Borough Council
Easington District Council
East Northamptonshire Council
East Riding County Council
East Suffolk County Council
Enfield London Borough Council
G

## Refining the matches

Create matches for the odd cases (ampersands, extra spaces, etc.).

In [311]:
match_dict = {}
# format is dc_name: wikidata_label 

In [312]:
match_dict["Borough Council of Kings Lynn and West Norfolk"] = "Borough Council of King's Lynn and West Norfolk"
match_dict["City of Cardiff Council"] = "Cardiff Council"
match_dict["Kirklees Council"] = "Kirklees Metropolitan Borough Council"
match_dict["Liverpool City Region"] = "Liverpool City Region Combined Authority"
match_dict["London Borough of Hammersmith & Fulham"] = "Hammersmith and Fulham London Borough Council"
match_dict["Mid Ulster District Council"] = "Mid-Ulster District Council"
match_dict["North Somerset  Council"] = "North Somerset Council"  # double space is in the DC data
match_dict["Royal Borough of Windsor and Maidenhead"] = "Windsor and Maidenhead Borough Council"
match_dict["St Helens Council"] = "St Helens Metropolitan Borough Council"
match_dict["Wirral Borough Council"] = "Wirral Metropolitan Borough Council"

In [313]:
dc_missing = dc_council_set - wd_set
dc_matched = dc_council_set - dc_missing
print ("Councils directly matched from DC list: ", len(dc_matched))
print("======================================")
for x in dc_matched:
    match_dict[x] = x

Councils directly matched from DC list:  399


In [314]:
print(len(match_dict))

409


In [315]:
for item in match_dict.items():
    print(item)

('Borough Council of Kings Lynn and West Norfolk', "Borough Council of King's Lynn and West Norfolk")
('City of Cardiff Council', 'Cardiff Council')
('Kirklees Council', 'Kirklees Metropolitan Borough Council')
('Liverpool City Region', 'Liverpool City Region Combined Authority')
('London Borough of Hammersmith & Fulham', 'Hammersmith and Fulham London Borough Council')
('Mid Ulster District Council', 'Mid-Ulster District Council')
('North Somerset  Council', 'North Somerset Council')
('Royal Borough of Windsor and Maidenhead', 'Windsor and Maidenhead Borough Council')
('St Helens Council', 'St Helens Metropolitan Borough Council')
('Wirral Borough Council', 'Wirral Metropolitan Borough Council')
('Orkney Islands Council', 'Orkney Islands Council')
('Horsham District Council', 'Horsham District Council')
('North East Lincolnshire Council', 'North East Lincolnshire Council')
('London Borough of Bexley', 'London Borough of Bexley')
('East Lothian Council', 'East Lothian Council')
('Staff

__We now have a match for all items from DC by 'official-name' to the Label of a Wikidata item__

Next step is to use these to match the WD QID to DC's "local-authority-code"

eg: "KIN": "Q73072537" (for "Borough Council of King's Lynn and West Norfolk")


## Matching QIDs and DC's official names

In [316]:
# Map each Wikidata label to its QID, taken from the last path segment of the entity URI
QID_dict = {}
for x in query_result["results"]["bindings"]:
    QID_dict[x["itemLabel"]["value"]] = x["item"]["value"].split("/")[-1]
    

In [334]:
#for item in QID_dict.items():print(item)
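
Because QID_dict is keyed by label, two Wikidata items that share a label would silently overwrite one another, and an item with several types appears in several result rows. A minimal sketch of a check for ambiguous labels, assuming rows in the same layout as query_result; qids_by_label is a hypothetical helper and the sample labels and QIDs below are placeholders, not real items:

```python
from collections import defaultdict


def qids_by_label(bindings):
    """Group the distinct QIDs found under each itemLabel."""
    by_label = defaultdict(set)
    for row in bindings:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        by_label[row["itemLabel"]["value"]].add(qid)
    return by_label


def make_row(qid, label):
    """Build one placeholder row in the results/bindings layout."""
    return {
        "item": {"value": "http://www.wikidata.org/entity/" + qid},
        "itemLabel": {"value": label},
    }


# Placeholder rows: the same item twice (two types), plus two items sharing a label.
sample_bindings = [
    make_row("Q900001", "Example Borough Council"),
    make_row("Q900001", "Example Borough Council"),
    make_row("Q900002", "Example District Council"),
    make_row("Q900003", "Example District Council"),
]

ambiguous = {
    label: sorted(qids)
    for label, qids in qids_by_label(sample_bindings).items()
    if len(qids) > 1
}
```

Any label that turns up in ambiguous would need checking by hand before its QID is trusted.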

In [318]:
dc_dict = {}
with open(path / 'uk_local_authorities.json') as dc_file:
    data = json.load(dc_file)
    for item in data:
        # keep current councils only, keyed by official name
        if item['end-date'] == "":
            dc_dict[item['official-name']] = item['local-authority-code']

In [319]:
for item in dc_dict.items():
    print(item)

('Armagh City, Banbridge and Craigavon Borough Council', 'ABC')
('Aberdeenshire Council', 'ABD')
('Aberdeen City Council', 'ABE')
('Adur District Council', 'ADU')
('Argyll and Bute Council', 'AGB')
('Isle of Anglesey County Council', 'AGY')
('Allerdale Borough Council', 'ALL')
('Amber Valley Borough Council', 'AMB')
('Ards and North Down Borough Council', 'AND')
('Antrim and Newtownabbey Borough Council', 'ANN')
('Angus Council', 'ANS')
('Arun District Council', 'ARU')
('Ashford Borough Council', 'ASF')
('Ashfield District Council', 'ASH')
('Babergh District Council', 'BAB')
('Bassetlaw District Council', 'BAE')
('Basildon Borough Council', 'BAI')
('Basingstoke and Deane Borough Council', 'BAN')
('Barrow-in-Furness Borough Council', 'BAR')
('Bath and North East Somerset Council', 'BAS')
('Blackburn with Darwen Borough Council', 'BBD')
('Bedford Borough Council', 'BDF')
('London Borough of Barking and Dagenham', 'BDG')
('London Borough of Brent', 'BEN')
('London Borough of Bexley', 'BEX

In [330]:
dc_wd_dict = {}

for key,value in match_dict.items():
    print (key, "->", value)
    dc_wd_dict[dc_dict[key]] = QID_dict[value]

Borough Council of Kings Lynn and West Norfolk -> Borough Council of King's Lynn and West Norfolk
City of Cardiff Council -> Cardiff Council
Kirklees Council -> Kirklees Metropolitan Borough Council
Liverpool City Region -> Liverpool City Region Combined Authority
London Borough of Hammersmith & Fulham -> Hammersmith and Fulham London Borough Council
Mid Ulster District Council -> Mid-Ulster District Council
North Somerset  Council -> North Somerset Council
Royal Borough of Windsor and Maidenhead -> Windsor and Maidenhead Borough Council
St Helens Council -> St Helens Metropolitan Borough Council
Wirral Borough Council -> Wirral Metropolitan Borough Council
Orkney Islands Council -> Orkney Islands Council
Horsham District Council -> Horsham District Council
North East Lincolnshire Council -> North East Lincolnshire Council
London Borough of Bexley -> London Borough of Bexley
East Lothian Council -> East Lothian Council
Staffordshire Moorlands District Council -> Staffordshire Moorlands

In [332]:
print(len(dc_wd_dict))

409
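
The join in cell In [330] indexes both dictionaries directly, so it would raise a KeyError if a label were renamed on Wikidata between runs. A minimal sketch of a more forgiving version that collects the pairs it cannot link, assuming the same three dictionaries; link_codes_to_qids is a hypothetical helper, and the unlinked entry in the sample is invented:

```python
def link_codes_to_qids(match_dict, dc_dict, qid_dict):
    """Return (linked, unmatched): code -> QID for pairs found in both lookups,
    and the (dc_name, wd_label) pairs where either lookup failed."""
    linked = {}
    unmatched = []
    for dc_name, wd_label in match_dict.items():
        code = dc_dict.get(dc_name)
        qid = qid_dict.get(wd_label)
        if code is None or qid is None:
            unmatched.append((dc_name, wd_label))
        else:
            linked[code] = qid
    return linked, unmatched


# Small sample: the Cardiff values come from this notebook, the last pair is hypothetical.
sample_match = {
    "City of Cardiff Council": "Cardiff Council",
    "Example District Council": "Example District Council",
}
sample_dc = {"City of Cardiff Council": "CRF", "Example District Council": "EXA"}
sample_qids = {"Cardiff Council": "Q5038400"}

linked, unmatched = link_codes_to_qids(sample_match, sample_dc, sample_qids)
```

An empty unmatched list alongside 409 linked entries would confirm the same result as the direct join.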


In [333]:
for key,value in dc_wd_dict.items():
    print (key, "->", value)


KIN -> Q73072537
CRF -> Q5038400
KIR -> Q16995709
LCR -> Q16996746
HMF -> Q5645758
MUL -> Q16997729
NSM -> Q17016922
WNM -> Q17038560
SHN -> Q17022094
WRL -> Q17038877
ORK -> Q11994103
HOR -> Q73072656
NEL -> Q17016820
BEX -> Q207208
ELN -> Q28530253
STF -> Q73072779
ABE -> Q2425849
STN -> Q320378
BAE -> Q73045241
WYCA -> Q17035952
FAL -> Q28530255
LAN -> Q6386149
WFT -> Q40608
NNT -> Q111232446
RUN -> Q73072726
PKN -> Q7170911
POR -> Q17104656
CAY -> Q5016926
WIL -> Q1990879
EXE -> Q5420103
HRW -> Q210476
WLL -> Q7963789
ERY -> Q16989282
NECA -> Q16986408
ASH -> Q73038329
TON -> Q16901812
WECA -> Q24993670
ABD -> Q13194614
WLA -> Q73072828
WMCA -> Q21061625
ARU -> Q72980967
BRX -> Q4976877
AGB -> Q28530251
HAO -> Q73072642
NTT -> Q16934215
SLK -> Q99229588
WYE -> Q73072859
BUR -> Q16954633
KTT -> Q32508
STT -> Q17022493
NOW -> Q17012769
MSS -> Q73072679
HIG -> Q5756024
RCH -> Q17019804
IVC -> Q28530261
TVCA -> Q21061639
PEN -> Q7162423
GRY -> Q73072635
VGL -> Q7909538
BNS -> Q16950416