# Bigfoot Coordinate Gathering

This notebook processes Bigfoot sighting data to extract or enrich geolocation information. It geocodes each sighting with the Google Maps API and uses a regular expression to pull out coordinates that are stated directly in the report text. The resulting dataset includes latitude and longitude values for each sighting that could be located, enabling geographic visualization and analysis.



In [1]:
# Dependencies
import googlemaps
import pandas as pd
from dotenv import load_dotenv
import os

# Obtain environment variables
load_dotenv()
GOOGLE_MAPS_API_KEY = os.getenv('GOOGLE_MAPS_API_KEY')

# Initialize the Google Maps API client with the key loaded from the environment
gmaps = googlemaps.Client(key=GOOGLE_MAPS_API_KEY)
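
`os.getenv` returns `None` when a variable is unset, so a missing `.env` entry is easy to overlook. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (not part of the original notebook):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper, not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable (None) and an empty string
        raise RuntimeError(f"Environment variable {name} is not set; add it to your .env file.")
    return value
```

It could be used in place of the bare lookup, for example `GOOGLE_MAPS_API_KEY = require_env('GOOGLE_MAPS_API_KEY')`.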

## Loading the Dataset
Load the cleaned dataset from the data cleaning step.

In [2]:
# Load the dataset
bigfoot_df = pd.read_json('../data/filtered_years_clean.json')

# Preview the first few rows
bigfoot_df.head()

Unnamed: 0,report_number,report_class,year,season,month,state,county,location_details,nearest_town,nearest_road,observed,also_noticed,other_witnesses,other_stories,time_and_conditions,environment,date,a_&_g_references
0,13038,A,2004,Winter,February,Alaska,Anchorage County,Up near powerline clearings east of Potter Mar...,Anchorage / Hillside,No real roads in the area,I and two of my friends were bored one night s...,"Some tracks in the snow, and a clearing in the...",My two friends were snowmachining behind me bu...,I have not heard of any other incidents in Anc...,Middle of the night. The only light was the he...,"In the middle of the woods, in a clearing cove...",,
1,8792,B,2003,Winter,December,Alaska,Anchorage County,"Few houses on the way, a power relay station. ...",Anchorage,Dowling,"Me and a couple of friends had been bored, whe...","We smelled of colonge and after shave, and one...","4. Me, w-man, warren and sean. We were at my h...",no,"Started at 11, ended at about 3-3:30. Weather ...","A pine forest, with a bog or swamp on the righ...",Friday night,
2,1255,B,1998,Fall,September,Alaska,Bethel County,"45 miles by air west of Lake Iliamna, Alaska i...",,,My hunting buddy and I were sitting on a ridge...,nothing unusual,Scouting for caribou with high quality binoculars,,,Call Iliamna Air taxi for lat & Long of Long L...,3,
3,11616,B,2004,Summer,July,Alaska,Bristol Bay County,"Approximately 95 miles east of Egegik, Alaska....",Egegik,,"To whom it may concern, I am a commercial fish...",Just these foot prints and how obvious it was ...,"One other witness, and he was fishing prior to...","I've only heard of one other story, from an ol...","Approximately 12:30 pm, partially coudy/sunny.","Lake front,creek spit, gravel and sand, alder ...",20,
4,637,A,2000,Summer,June,Alaska,Cordova-McCarthy County,"On the main trail toward the glacier, before t...","Kennikot, Alaska",not sure,My hiking partner and I arrived late to the Ke...,I did hear what appeared to be grunting in the...,"I was the only witness, there was one other in...",,About 12:00 Midnight / full moon / clear / dim...,This sighting was located at approximately 1 t...,16,


## Geocoding with Google Maps API
To enrich the dataset, this notebook attempts to geocode sightings using the following priority:
1. `nearest_town` and `state`
2. `county` and `state`
3. Fallback: `None` for both latitude and longitude if neither lookup returns a result.

In [3]:
def geocode_with_fallback(row):
    try:
        # Try nearest_town, state
        location = gmaps.geocode(f"{row['nearest_town']}, {row['state']}")
        if location:
            return location[0]['geometry']['location']['lat'], location[0]['geometry']['location']['lng']

        # Fallback to county, state
        location = gmaps.geocode(f"{row['county']}, {row['state']}")
        if location:
            return location[0]['geometry']['location']['lat'], location[0]['geometry']['location']['lng']

        # If neither works, return None
        return None, None
    except Exception as e:
        print(f"Error geocoding {row['nearest_town']}, {row['state']} or {row['county']}, {row['state']}: {e}")
        return None, None
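
The f-strings above interpolate whatever is in the row, so a missing `nearest_town` is sent to the API as the literal text `None, Alaska` or `nan, Alaska` instead of triggering the county fallback. In the preview shown after geocoding, the Bethel County row with no nearest town ends up at roughly 64.50, -165.41, far from Bethel, which may be a result of such a query. A minimal sketch of a guard, assuming a hypothetical helper `build_query` that treats `None`, NaN, and blank strings as missing:

```python
import math


def build_query(*parts):
    """Join the parts into a geocoding query, or return None if any part is missing.

    Hypothetical helper, not part of the original notebook.
    """
    cleaned = []
    for part in parts:
        if part is None:
            continue
        if isinstance(part, float) and math.isnan(part):
            continue
        text = str(part).strip()
        if text:
            cleaned.append(text)
    # Require every part to be present; a lone state name is too coarse to be useful
    return ", ".join(cleaned) if len(cleaned) == len(parts) else None


print(build_query("Anchorage", "Alaska"))  # Anchorage, Alaska
print(build_query(None, "Alaska"))         # None
```

With a helper like this, `geocode_with_fallback` could skip the town lookup whenever `build_query(row['nearest_town'], row['state'])` returns `None` and go straight to the county query.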

### Apply Geocoding to Dataset
Run the geocoding function row-wise and create `latitude` and `longitude` columns in the dataframe.

In [None]:
bigfoot_df[['latitude', 'longitude']] = bigfoot_df.apply(
    lambda row: pd.Series(geocode_with_fallback(row)), axis=1
)
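
Row-wise `apply` issues one or two API requests per row, even when many sightings share the same town and state. One way to reduce the request count is to memoize lookups by query string. The sketch below uses a stand-in `fake_geocode` function in place of the real client so that it runs without an API key; `fake_geocode` and `cached_lookup` are illustrative names, and the returned coordinate is made up.

```python
from functools import lru_cache

calls = []  # records every query that actually reaches the stand-in geocoder


def fake_geocode(query):
    """Stand-in for a real geocoding call; returns a fixed, made-up coordinate."""
    calls.append(query)
    return (0.0, 0.0)


@lru_cache(maxsize=None)
def cached_lookup(query):
    """Geocode a query string once and reuse the result for repeated queries."""
    return fake_geocode(query)


for q in ["Anchorage, Alaska", "Egegik, Alaska", "Anchorage, Alaska"]:
    cached_lookup(q)

print(len(calls))  # 2: the repeated Anchorage query was served from the cache
```

In the notebook, the body of `cached_lookup` would call the real client instead of the stand-in.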

In [None]:

bigfoot_df.head()

Unnamed: 0,report_number,report_class,year,season,month,state,county,location_details,nearest_town,nearest_road,observed,also_noticed,other_witnesses,other_stories,time_and_conditions,environment,date,a_&_g_references,latitude,longitude
0,13038,A,2004,Winter,February,Alaska,Anchorage County,Up near powerline clearings east of Potter Mar...,Anchorage / Hillside,No real roads in the area,I and two of my friends were bored one night s...,"Some tracks in the snow, and a clearing in the...",My two friends were snowmachining behind me bu...,I have not heard of any other incidents in Anc...,Middle of the night. The only light was the he...,"In the middle of the woods, in a clearing cove...",,,61.119996,-149.74543
1,8792,B,2003,Winter,December,Alaska,Anchorage County,"Few houses on the way, a power relay station. ...",Anchorage,Dowling,"Me and a couple of friends had been bored, whe...","We smelled of colonge and after shave, and one...","4. Me, w-man, warren and sean. We were at my h...",no,"Started at 11, ended at about 3-3:30. Weather ...","A pine forest, with a bog or swamp on the righ...",Friday night,,61.217576,-149.899678
2,1255,B,1998,Fall,September,Alaska,Bethel County,"45 miles by air west of Lake Iliamna, Alaska i...",,,My hunting buddy and I were sitting on a ridge...,nothing unusual,Scouting for caribou with high quality binoculars,,,Call Iliamna Air taxi for lat & Long of Long L...,3,,64.500591,-165.408641
3,11616,B,2004,Summer,July,Alaska,Bristol Bay County,"Approximately 95 miles east of Egegik, Alaska....",Egegik,,"To whom it may concern, I am a commercial fish...",Just these foot prints and how obvious it was ...,"One other witness, and he was fishing prior to...","I've only heard of one other story, from an ol...","Approximately 12:30 pm, partially coudy/sunny.","Lake front,creek spit, gravel and sand, alder ...",20,,58.213737,-157.374253
4,637,A,2000,Summer,June,Alaska,Cordova-McCarthy County,"On the main trail toward the glacier, before t...","Kennikot, Alaska",not sure,My hiking partner and I arrived late to the Ke...,I did hear what appeared to be grunting in the...,"I was the only witness, there was one other in...",,About 12:00 Midnight / full moon / clear / dim...,This sighting was located at approximately 1 t...,16,,61.486389,-142.886389


## Extracting Coordinates from Observations
A regular expression extracts latitude and longitude values embedded in the `observed` column. When a report states its coordinates directly, those values replace the geocoded ones, since a town or county lookup only approximates where the sighting happened.

In [None]:
# Now we search for specific coordinates from the observed data
coordinate_pattern = r"\b((?:[0-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*(-?(?:1[0-7]\d(?:\.\d+)?|0?\d{1,2}(?:\.\d+)?|180(?:\.0+)?))\b"

# Extract new latitude and longitude
extracted_coords = bigfoot_df['observed'].str.extract(coordinate_pattern, expand=True)
extracted_coords.columns = ['new_lat', 'new_long']

# Convert extracted values
extracted_coords = extracted_coords.astype(float)

# Prefer coordinates found in the report text; otherwise keep the geocoded values
bigfoot_df['latitude'] = extracted_coords['new_lat'].fillna(bigfoot_df['latitude'])
bigfoot_df['longitude'] = extracted_coords['new_long'].fillna(bigfoot_df['longitude'])
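
Because the pattern accepts bare integers, any two small numbers separated by a comma can be read as a coordinate pair. The sketch below runs the notebook's pattern with `re.search`, which, like `str.extract`, returns the first match, on made-up sentences. It then shows one possible tightening that requires a decimal fraction in both numbers; `strict_pair` and `STRICT_PATTERN` are hypothetical names, not the notebook's method.

```python
import re

# Pattern copied from the extraction cell
coordinate_pattern = r"\b((?:[0-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*(-?(?:1[0-7]\d(?:\.\d+)?|0?\d{1,2}(?:\.\d+)?|180(?:\.0+)?))\b"


def first_pair(text):
    """Return the first (lat, long) string pair the notebook's pattern finds, or None."""
    match = re.search(coordinate_pattern, text)
    return match.groups() if match else None


print(first_pair("We were at 61.2181, -149.9003 near the trailhead."))  # ('61.2181', '-149.9003')
print(first_pair("I saw 2, 3 deer and then heard a howl."))             # ('2', '3'), a false positive
print(first_pair("No coordinates were recorded."))                      # None

# Hypothetical stricter variant: both numbers must have a decimal part,
# and the match may not start in the middle of a longer number.
STRICT_PATTERN = re.compile(r"(?<![\d.])(-?\d{1,2}\.\d+),\s*(-?\d{1,3}\.\d+)(?!\d)")


def strict_pair(text):
    """Return (lat, long) as floats if a decimal pair in valid ranges is found, else None."""
    match = STRICT_PATTERN.search(text)
    if not match:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    if -90 <= lat <= 90 and -180 <= lon <= 180:
        return lat, lon
    return None


print(strict_pair("I saw 2, 3 deer and then heard a howl."))  # None
```

The stricter variant trades recall for precision: it would miss coordinates written as whole degrees, so whether it fits depends on how witnesses actually wrote them in the reports.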



**The Jamaica Problem**

The Google Maps API geocodes a sighting near [Jamaica, Virginia](https://en.wikipedia.org/wiki/Jamaica,_Virginia) to the coordinates of [Jamaica](https://en.wikipedia.org/wiki/Jamaica) the country, so we manually set that sighting to the appropriate coordinates according to Wikipedia.

In [None]:
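# Restrict the fix to Virginia so towns named Jamaica in other states are left untouched
bigfoot_df.loc[bigfoot_df['nearest_town'].str.contains('jamaica', case=False, na=False) & (bigfoot_df['state'] == 'Virginia'), ['latitude', 'longitude']] = [37.4255, -76.4139]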
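
Wikipedia often lists coordinates in degrees, minutes, and seconds (DMS), while the dataframe stores decimal degrees, so the hard-coded values above are worth double-checking against the source. If they were copied from a DMS string such as 37°42′55″N 76°41′39″W (an assumption, not confirmed here), the decimal equivalent would differ noticeably from `37.4255, -76.4139`. A minimal sketch of the conversion, with `dms_to_decimal` as a hypothetical helper:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Hypothetical helper; `hemisphere` is one of 'N', 'S', 'E', 'W'.
    """
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal-degree notation
    return -value if hemisphere in ("S", "W") else value


# Illustrative DMS values only; confirm the real ones on the Wikipedia page
lat = dms_to_decimal(37, 42, 55, "N")
lon = dms_to_decimal(76, 41, 39, "W")
print(round(lat, 4), round(lon, 4))  # 37.7153 -76.6942
```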

## Filtering Valid Coordinates
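Keep only sightings whose longitude falls between -152 and -67 degrees, a rough bound for the contiguous United States, eastern Alaska, and much of Canada. Note that this range also excludes western Alaska and Atlantic Canada east of -67 degrees, applies no latitude check, and drops rows with missing coordinates.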


In [None]:
# drop longitudes that would be outside of the USA / Canada
filtered_coords = bigfoot_df[(bigfoot_df['longitude'] >= -152) & (bigfoot_df['longitude'] <= -67)]
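
To see what the bounds do in practice, the sketch below applies the same inclusive range to the longitudes of the five sample rows from the geocoded preview, keyed by `report_number`. It uses plain Python rather than pandas so it runs on its own; `LON_MIN`, `LON_MAX`, and `sample_longitudes` are illustrative names.

```python
LON_MIN, LON_MAX = -152, -67  # same bounds as the filter cell

# report_number -> longitude, copied from the geocoded preview
sample_longitudes = {
    13038: -149.74543,
    8792: -149.899678,
    1255: -165.408641,
    11616: -157.374253,
    637: -142.886389,
}

kept = {rid: lon for rid, lon in sample_longitudes.items() if LON_MIN <= lon <= LON_MAX}
dropped = {rid: lon for rid, lon in sample_longitudes.items() if rid not in kept}

print(sorted(kept))     # [637, 8792, 13038]
print(sorted(dropped))  # [1255, 11616]
```

Two of the five Alaska reports fall west of -152 and are removed, so the lower bound may need widening if western Alaska sightings should appear in the final dataset.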

In [None]:
# Save the full dataframe, including rows outside the longitude range
bigfoot_df.to_json('../data/bigfoot_coords_df.json', orient='records')

## Saving Cleaned Data
Export the cleaned dataset with enriched coordinates for future use.


In [None]:
final_columns = ['report_number', 'report_class', 'state', 'county', 'latitude', 'nearest_town', 'longitude', 'season', 'month', 'observed', 'year']

clean_coords_list = filtered_coords[final_columns]
clean_coords_list.to_json('../data/bigfoot_coordinates_clean_cols.json', orient='records')

## Conclusion
This notebook enriched the Bigfoot dataset with geolocation data from the Google Maps API and from regex-based extraction of coordinates in the report text. The cleaned dataset is now ready for geographic analysis and visualization.