# Geocoding
This Colab notebook is part of Mini Project No. 2 for our Digital Humanities course, in which we visualize the places mentioned in news articles. Using computational tools, we extract toponyms (place names) from each article, map them, and observe how the geographic focus of the news shifts over time.

In the notebook titled Gaza_NER2_yasir_afreen_bushra, also part of our Mini Project No. 2 portfolio, we applied Named Entity Recognition (NER), a Natural Language Processing (NLP) technique, to identify place names in a dataset of 4341 Al Jazeera English articles about the Gaza war, compiled by Inacio Vieira. The dataset covers a range of dates, but, as instructed, we filtered it to articles published in January 2024 and extracted place names from that month only.

We stored the places found by NER in a TSV file named ner_counts.tsv, which is also part of our Mini Project No. 2 portfolio. In this Colab notebook, we apply geocoding to find coordinates for those places and save them in a TSV file named ner_gazetteer.tsv, which we will later use for mapping.
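
As a rough illustration of how the finished gazetteer might be consumed at the mapping stage, the sketch below reads a few rows in the placename/latitude/longitude layout written by the geocoding cell further down, and separates places that have coordinates from those recorded as NA. The function name `split_gazetteer` and the row `Hypothetical Place` are our own inventions for illustration; the coordinates for Israel and Gaza are the ones GeoNames returned in this notebook's output.

```python
import csv
import io


def split_gazetteer(tsv_file):
    """Split gazetteer rows into located places and places without coordinates.

    Hypothetical helper: assumes the columns placename, latitude, longitude
    and the marker "NA" for places that could not be geocoded.
    """
    located, missing = [], []
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        if row["latitude"] == "NA" or row["longitude"] == "NA":
            missing.append(row["placename"])
        else:
            # Convert the coordinate strings to numbers for mapping
            located.append(
                (row["placename"], float(row["latitude"]), float(row["longitude"]))
            )
    return located, missing


# In-memory sample standing in for ner_gazetteer.tsv
sample = io.StringIO(
    "placename\tlatitude\tlongitude\n"
    "Israel\t31.5\t34.75\n"
    "Gaza\t31.50161\t34.46672\n"
    "Hypothetical Place\tNA\tNA\n"
)
located, missing = split_gazetteer(sample)
print(len(located), "located;", len(missing), "missing")
```

With the real file, the `io.StringIO` sample would be replaced by `open(output_filename, encoding="utf-8")`.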

In [22]:
# Clone our FASDH25-portfolio2 repository so that we can access ner_counts.tsv, which lists the places we want to geocode
!git clone https://github.com/yasirrauf-123/FASDH25-portfolio2.git

fatal: destination path 'FASDH25-portfolio2' already exists and is not an empty directory.
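
The message above only means that the repository was already cloned in an earlier run of this cell, so the existing copy is used. One way to avoid the error on re-runs is to clone only when the folder is missing. This is a minimal sketch: `clone_if_missing` is a hypothetical helper of our own, and the `runner` parameter exists only so that the function can be tried without actually calling git.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest, runner=subprocess.run):
    """Clone repo_url into dest unless dest already exists.

    Returns True if a clone was attempted, False if dest was already there.
    """
    if os.path.isdir(dest):
        return False
    # check=True raises an error if git exits with a failure code
    runner(["git", "clone", repo_url, dest], check=True)
    return True


# Possible use in this notebook (commented out so the sketch runs offline):
# clone_if_missing(
#     "https://github.com/yasirrauf-123/FASDH25-portfolio2.git",
#     "FASDH25-portfolio2",
# )
```

Note that this skips the clone entirely when the folder exists, so it will not pick up changes pushed to the repository since the first clone.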


In [23]:
# We made minor adjustments ourselves (such as changing the username); the rest of the code is as given by the instructor in the Gaza-NER2 notebook
import requests
import time

geonames_username = "yasir.rauf2"

def get_coordinates(place, username=geonames_username, fuzzy=0, timeout=1):
  """This function gets a single set of coordinates from the geonames API.

  Args:
    place (str): the place name
    username (str): your geonames user name
    fuzzy (int): 0 = exact matching, 1 = fuzzy matching (allow similar but not exact matches)
    timeout (int): number of seconds to wait before each call to the geonames API
      (to avoid being blocked for overloading the server)

  Returns:
    dictionary: keys: latitude, longitude (values are strings, as returned by the API),
      or None if no result was found
  """
  # wait a short while, so that we don't overload the server:
  time.sleep(timeout)
  # make the API call:
  url = "http://api.geonames.org/searchJSON?"
  params = {"q": place, "username": username, "fuzzy": fuzzy, "maxRows": 1, "isNameRequired": True}
  response = requests.get(url, params=params)
  # convert the response into a dictionary:
  results = response.json()
  print(results)
  # get the first result:
  try:
    result = results["geonames"][0]
    return {"latitude": result["lat"], "longitude": result["lng"]}
  except (IndexError, KeyError):
    print("No results found for your API call", response.request.url)
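
To see what the `try`/`except` step in the function above does without calling the API, the sketch below applies the same lookup to a dictionary shaped like the GeoNames responses printed in this notebook. `extract_coordinates` is a hypothetical helper, not part of the instructor's code. Two things differ from the original and are our assumptions: it converts the coordinates to floats, and it checks for a `status` key, which is how we understand GeoNames to report problems such as an exceeded request limit.

```python
def extract_coordinates(results):
    """Return {'latitude': float, 'longitude': float}, or None if there is no match.

    Hypothetical helper working on an already-decoded GeoNames JSON response.
    """
    # Assumption: error responses carry a "status" object instead of "geonames"
    if "status" in results:
        print("GeoNames reported a problem:", results["status"].get("message"))
        return None
    matches = results.get("geonames", [])
    if not matches:
        return None
    first = matches[0]
    return {"latitude": float(first["lat"]), "longitude": float(first["lng"])}


# Trimmed copy of the response for "Israel" shown in this notebook's output
sample = {
    "totalResultsCount": 33,
    "geonames": [{"name": "Israel", "lat": "31.5", "lng": "34.75"}],
}
print(extract_coordinates(sample))  # {'latitude': 31.5, 'longitude': 34.75}
```

Separating the parsing from the network call in this way makes it possible to check the parsing on saved responses without spending GeoNames credits.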

In [24]:
# We modified this code ourselves, but asked ChatGPT to help us understand it (see ChatGPT Solutions No.)
# Define the path to the input file containing place names (from NER output)
input_filename = "/content/FASDH25-portfolio2/scripts/ner_counts.tsv"

# Define the path to the output file that will store the geocoded gazetteer
output_filename = "/content/FASDH25-portfolio2/scripts/ner_gazetteer.tsv"

# Open the input file in read mode using UTF-8 encoding
with open(input_filename, "r", encoding="utf-8") as file:
    # Read all lines from the file into a list
    lines = file.readlines()

# Extract place names: skip the header and take the first column of each non-empty line, which contains the names we will geocode
place_names = [line.strip().split("\t")[0] for line in lines[1:] if line.strip()]

# Open the output file in write mode to save the final results
with open(output_filename, "w", encoding="utf-8") as out_file:
    # Write the header row with column names as required: placename, latitude, longitude
    out_file.write("placename\tlatitude\tlongitude\n")

    # Loop through each place name to find its coordinates
    for name in place_names:
        # Call the get_coordinates function
        coordinates = get_coordinates(name)

        # If coordinates are successfully found
        if coordinates:
            lat = coordinates['latitude']    # Extract latitude
            lon = coordinates['longitude']   # Extract longitude
            # Write the place name with its coordinates to the output file
            out_file.write(f"{name}\t{lat}\t{lon}\n")
        else:
            # If coordinates are not found, write NA values
            out_file.write(f"{name}\tNA\tNA\n")

# Open the output file again in read mode to display the results
with open(output_filename, encoding="utf-8") as file:
    # Print the contents to the console for quick verification
    print(file.read())




{'totalResultsCount': 33, 'geonames': [{'adminCode1': '00', 'lng': '34.75', 'geonameId': 294640, 'toponymName': 'State of Israel', 'countryId': '294640', 'fcl': 'A', 'population': 8883800, 'countryCode': 'IL', 'name': 'Israel', 'fclName': 'country, state, region,...', 'countryName': 'Israel', 'fcodeName': 'independent political entity', 'adminName1': '', 'lat': '31.5', 'fcode': 'PCLI'}]}
{'totalResultsCount': 40, 'geonames': [{'adminCode1': 'GZ', 'lng': '34.46672', 'geonameId': 281133, 'toponymName': 'Gaza', 'countryId': '6254930', 'fcl': 'P', 'population': 410000, 'countryCode': 'PS', 'name': 'Gaza', 'fclName': 'city, village,...', 'adminCodes1': {}, 'countryName': 'Palestine', 'fcodeName': 'seat of a first-order administrative division', 'adminName1': 'Gaza Strip', 'lat': '31.50161', 'fcode': 'PPLA'}]}
{'totalResultsCount': 49, 'geonames': [{'adminCode1': '00', 'lng': '35.20329', 'geonameId': 6254930, 'toponymName': 'Palestine', 'countryId': '6254930', 'fcl': 'A', 'population': 45690