# An attempt to reconcile data on hospitals in Scotland from Wikidata and OpenStreetMap (and Wikipedia?)

This started with two queries, one against Wikidata and one against OpenStreetMap:

https://w.wiki/5yQy
and
https://overpass-turbo.eu/s/1nQg


In [141]:
import json
import csv

osm_names_list = []
osm_wd_list = []

## Import OSM query results

I'm using the results of the OSM query here, and the WD one below. 

In [142]:
with open('export-4.geojson', 'r') as OSM_file:
    osm_data = json.load(OSM_file)
    


In [143]:
f_data = osm_data['features']
for feature in f_data:
    props = feature['properties']
    if 'name' in props:  # NB: four items in OSM were found without names
        osm_names_list.append(props['name'])
    else:
        print(props['@id'])
        print("NO NAME")
        print("************************")
    if 'wikidata' in props:
        osm_wd_list.append(props['wikidata'])

# Sets make the comparisons below straightforward
osm_names_set = set(osm_names_list)
osm_wd_set = set(osm_wd_list)

way/359604291
NO NAME
************************
way/712499976
NO NAME
************************
way/813124398
NO NAME
************************
way/531677885
NO NAME
************************


In [144]:
'''
print(osm_names_list)
print(osm_wd_list)
print(len(osm_wd_list))
'''

'\nprint(osm_names_list)\nprint(osm_wd_list)\nprint(len(osm_wd_list))\n'

## Next get WD data from query download (for now)

In [145]:
wd_names_set = set()
wd_qid_set = set()
# Read the CSV file downloaded from the Wikidata query
with open("query-53.csv", newline="") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    data_read = [row for row in reader]

for r in data_read:
    # Data rows have a "/" in the first column; the header row does not, so it is skipped
    if len(r[0].split("/")) > 1:
        wd_qid_set.add(r[0].split("/")[4])  # fifth "/"-separated segment is the QID
        wd_names_set.add(r[1])  # add names to our WD set

## Compare sets

In [146]:
print("QIDs in both WD and OSM")

print (osm_wd_set.intersection(wd_qid_set))
print("********************************")
print()
print("QIDs in OSM but not in WD")
print(osm_wd_set.difference(wd_qid_set))
print("********************************")
print()
print("QIDs in WD but not in OSM")
print(wd_qid_set.difference(osm_wd_set))
print("********************************")
print()

QIDs in both WD and OSM
{'Q7270437', 'Q84598648', 'Q17151793', 'Q5639556', 'Q7038594', 'Q6060585', 'Q18161534', 'Q65076847', 'Q85200775', 'Q5560775', 'Q28406015', 'Q4810902', 'Q7926795', 'Q85011470', 'Q20027879', 'Q20570400', 'Q4784932', 'Q5313810', 'Q7593593', 'Q6410450', 'Q5519174', 'Q25858870', 'Q87677556', 'Q5327766', 'Q48814905', 'Q4882380', 'Q5069009', 'Q7620842', 'Q5304529', 'Q6037160', 'Q7894766', 'Q14943178', 'Q5050039', 'Q7856254', 'Q6060060', 'Q6520385', 'Q85815482', 'Q5179425', 'Q5472596', 'Q7987984', 'Q5579596', 'Q6301869', 'Q65065866', 'Q7283830', 'Q42887887', 'Q85785377', 'Q4850756', 'Q6900917', 'Q85797953', 'Q6724535', 'Q7808921', 'Q84187065', 'Q4944702', 'Q16852446', 'Q7373656', 'Q7374066', 'Q84599467', 'Q7855948', 'Q7878130', 'Q30279784', 'Q7440347', 'Q65060169', 'Q7592438', 'Q18161222', 'Q7926784', 'Q7244518', 'Q85011465', 'Q5566797', 'Q7617773', 'Q65065133', 'Q7987867', 'Q3822734', 'Q5566905', 'Q85011468', 'Q54869681', 'Q85011483', 'Q85011480', 'Q5524427', 'Q7170888
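
The raw set printouts are hard to scan, so a count of each comparison may be easier to read. This is a minimal sketch only: `summarise_overlap` is a hypothetical helper, not part of the notebook, and the QIDs in the usage lines are placeholders rather than values from the real data.

```python
def summarise_overlap(osm_ids, wd_ids):
    """Return counts for the three comparisons printed above."""
    osm_ids, wd_ids = set(osm_ids), set(wd_ids)
    return {
        "both": len(osm_ids & wd_ids),
        "osm_only": len(osm_ids - wd_ids),
        "wd_only": len(wd_ids - osm_ids),
    }


# Placeholder QIDs for illustration only
summary = summarise_overlap({"Q1", "Q2", "Q3"}, {"Q2", "Q3", "Q4", "Q5"})
print(summary)
```

In the notebook this would be called as `summarise_overlap(osm_wd_set, wd_qid_set)`.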

In [147]:
print("Names in both WD and OSM")

print (osm_names_set.intersection(wd_names_set))
print("********************************")
print()
print("Names in OSM but not in WD")
print(osm_names_set.difference(wd_names_set))
print("********************************")
print()
print("Names in WD but not in OSM")
print(wd_names_set.difference(osm_names_set))
print("********************************")
print()

Names in both WD and OSM
{'Glasgow Royal Infirmary', 'Gartnavel Royal Hospital', 'New Victoria Hospital', 'Lawson Memorial Hospital', 'Kello Hospital', 'Stonehouse Hospital', 'Seafield Hospital', 'Nairn Town and County Hospital', 'University Hospital Wishaw', "Ellen's Glen House", 'Biggart Hospital', 'Glenrothes Hospital', 'Girvan Community Hospital', 'Inverclyde Royal Hospital', 'Gartnavel General Hospital', 'East Ayrshire Community Hospital', 'Turner Memorial Hospital', 'Forth Valley Royal Hospital', 'Kincardine Community Hospital', 'Lightburn Hospital', 'Royal Edinburgh Hospital', 'Annan Hospital', 'Stracathro Hospital', 'Perth Royal Infirmary', 'Pitlochry Community Hospital', 'Inverurie Hospital', 'Edington Cottage Hospital', 'Royal Cornhill Hospital', 'Ugie Hospital', 'Wester Moffat Hospital', 'Leverndale Hospital', 'Udston Hospital', 'Campbeltown Hospital', 'Migdale Hospital', 'Fraserburgh Hospital', 'City Hospital', 'Lauriston Building', 'Ross Memorial Hospital', 'Moffat Hospita

## Issues 

So, it's clear that there are issues in both sets of data. 

Apart from the four unnamed features tagged as hospitals in OSM (see above), many hospitals in Wikidata are not in OSM, and quite a few appear to be in OSM but not in Wikidata.
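
Some of these gaps may be OSM features that match a Wikidata item by name but simply lack a `wikidata` tag. The sketch below looks for such candidates. It assumes that matching on lowercased, whitespace-collapsed names is good enough for a first pass; `normalise` and `suggest_qids` are hypothetical helpers, and the sample data is invented.

```python
def normalise(name):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def suggest_qids(osm_features, wd_rows):
    """Suggest a QID for each named OSM feature that has no wikidata tag.

    osm_features: list of GeoJSON-style feature dicts (as in osm_data['features'])
    wd_rows: iterable of (qid, name) pairs from the Wikidata query
    """
    qid_by_name = {normalise(name): qid for qid, name in wd_rows}
    suggestions = {}
    for feature in osm_features:
        props = feature["properties"]
        if "name" in props and "wikidata" not in props:
            qid = qid_by_name.get(normalise(props["name"]))
            if qid is not None:
                suggestions[props["@id"]] = qid
    return suggestions


# Invented placeholder data, not taken from the real query results
features = [
    {"properties": {"@id": "way/1", "name": "Example  Hospital"}},
    {"properties": {"@id": "way/2", "name": "Other Hospital", "wikidata": "Q2"}},
    {"properties": {"@id": "way/3"}},
]
rows = [("Q1", "example hospital"), ("Q2", "Other Hospital")]
print(suggest_qids(features, rows))
```

If two Wikidata items share a name, the later one silently wins in `qid_by_name`, so any suggestion would still need a manual check before a tag is added to OSM.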

Much of that may come down to hospital buildings retaining their name in Wikidata because that is how Canmore references them. Perhaps these should be recategorised as Former Hospital rather than Hospital.

Two Wikipedia categories may help here, [Defunct hospitals in Scotland](https://en.wikipedia.org/wiki/Category:Defunct_hospitals_in_Scotland) and [Former psychiatric hospitals in Scotland](https://en.wikipedia.org/wiki/Category:Former_psychiatric_hospitals_in_Scotland), though working through them suggests a long manual process.


Jack Gilmore has suggested using sources such as NHS inform's [Scotland's Service Directory](https://www.nhsinform.scot/scotlands-service-directory).

## Update

Wikipedia has a [list page on hospitals in Scotland](https://en.wikipedia.org/wiki/List_of_hospitals_in_Scotland) which might help but it has a note (as of 2008) saying that it is incomplete. 

Maybe we have the opportunity here to improve on existing sources _and_ to create a downloadable data set that others can use with some confidence! 
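
As a possible first step towards such a data set, the reconciliation status of each QID could be written out as CSV. This is a sketch under assumptions: `write_status_rows` is a hypothetical helper, the column names are my own choice, and the QIDs below are placeholders.

```python
import csv
import io


def write_status_rows(osm_ids, wd_ids, out):
    """Write one row per QID with a status of 'both', 'osm_only' or 'wd_only'."""
    writer = csv.writer(out)
    writer.writerow(["qid", "status"])
    for qid in sorted(osm_ids | wd_ids):
        if qid in osm_ids and qid in wd_ids:
            status = "both"
        elif qid in osm_ids:
            status = "osm_only"
        else:
            status = "wd_only"
        writer.writerow([qid, status])


# Placeholder QIDs; an in-memory buffer stands in for a real file
buffer = io.StringIO()
write_status_rows({"Q1", "Q2"}, {"Q2", "Q3"}, buffer)
print(buffer.getvalue())
```

With the real data, `out` would be a file opened with `newline=""`, and the arguments would be `osm_wd_set` and `wd_qid_set`.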