# Bigfoot Coordinate Gathering

This notebook enriches Bigfoot sighting data with geolocation information. It geocodes each sighting with the Google Maps API and uses a regex to extract coordinates that appear directly in the report text. The resulting dataset includes latitude and longitude values wherever they could be determined, enabling geographic visualization and analysis.



In [None]:
# Dependencies
import googlemaps
import pandas as pd

# Initialize the Google Maps API client (replace SECRET with the actual key)
gmaps = googlemaps.Client(key="SECRET")

## Loading the Dataset
Load the cleaned dataset from the data cleaning step.

In [None]:
# Load the dataset
bigfoot_df = pd.read_json('../data/filtered_years_clean.json')

# Preview the first few rows
bigfoot_df.head()

## Geocoding with Google Maps API
To enrich the dataset, this notebook attempts to geocode sightings using the following priority:
1. `nearest town` and `state`
2. `county` and `state`
3. Fallback: `None` for both latitude and longitude if neither lookup returns a result.

In [None]:
def geocode_with_fallback(row):
    try:
        # Try nearest town, state
        location = gmaps.geocode(f"{row['nearest town']}, {row['state']}")
        if location:
            return location[0]['geometry']['location']['lat'], location[0]['geometry']['location']['lng']

        # Fallback to county, state
        location = gmaps.geocode(f"{row['county']}, {row['state']}")
        if location:
            return location[0]['geometry']['location']['lat'], location[0]['geometry']['location']['lng']

        # If neither works, return None
        return None, None
    except Exception as e:
        print(f"Error geocoding {row['nearest town']}, {row['state']} or {row['county']}, {row['state']}: {e}")
        return None, None
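
The fallback order can be checked without calling the API. The sketch below is not part of the original notebook: `FakeClient` is a hypothetical stand-in for `googlemaps.Client` that only knows the queries it is given, and `geocode_row` is a hypothetical variant of `geocode_with_fallback` that takes the client as an argument instead of using the global `gmaps`. The place names and coordinates are made up for illustration.

```python
class FakeClient:
    """Hypothetical stand-in for googlemaps.Client (illustration only)."""

    def __init__(self, known):
        self.known = known  # maps query string -> (lat, lng)

    def geocode(self, query):
        # Mimic the response shape used in the notebook: a list of results,
        # empty when nothing matches.
        if query in self.known:
            lat, lng = self.known[query]
            return [{"geometry": {"location": {"lat": lat, "lng": lng}}}]
        return []


def geocode_row(client, row):
    """Try 'nearest town, state', then 'county, state'; else (None, None)."""
    for field in ("nearest town", "county"):
        results = client.geocode(f"{row[field]}, {row['state']}")
        if results:
            loc = results[0]["geometry"]["location"]
            return loc["lat"], loc["lng"]
    return None, None


# Made-up data: the town lookup fails, so the county lookup is used.
client = FakeClient({"Example County, Washington": (47.0, -121.5)})
row = {"nearest town": "Nowhere", "county": "Example County", "state": "Washington"}
print(geocode_row(client, row))
```

Looping over the two field names removes the duplicated lookup code while keeping the same priority order.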

### Apply Geocoding to Dataset
Run the geocoding function row-wise and create `latitude` and `longitude` columns in the dataframe.

In [None]:
bigfoot_df[['latitude', 'longitude']] = bigfoot_df.apply(
    lambda row: pd.Series(geocode_with_fallback(row)), axis=1
)

In [None]:
# Drop the leftover 'Unnamed: 0' index column, then list the remaining columns
bigfoot_df = bigfoot_df.drop(columns='Unnamed: 0')
bigfoot_df.columns

## Extracting Coordinates from Observations
A regex extracts latitude and longitude pairs embedded in the `observed` column. When a report's text contains coordinates, they replace the geocoded values, since they are typically more precise than a town or county lookup.

In [None]:
# Now we search for specific coordinates from the observed data
coordinate_pattern = r"\b((?:[0-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*(-?(?:1[0-7]\d(?:\.\d+)?|0?\d{1,2}(?:\.\d+)?|180(?:\.0+)?))\b"

# Extract new latitude and longitude
extracted_coords = bigfoot_df['observed'].str.extract(coordinate_pattern, expand=True)
extracted_coords.columns = ['new_lat', 'new_long']

# Convert extracted values
extracted_coords = extracted_coords.astype(float)

# Prefer the extracted coordinates; keep the geocoded value where none was found
bigfoot_df['latitude'] = extracted_coords['new_lat'].fillna(bigfoot_df['latitude'])
bigfoot_df['longitude'] = extracted_coords['new_long'].fillna(bigfoot_df['longitude'])
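
To see what the pattern captures, here is a small sketch that runs it on three made-up report snippets (the sample text is an assumption, not taken from the dataset). It also shows a limitation worth keeping in mind: the pattern does not require a decimal point, so an ordinary number pair such as `2, 3` is captured as well.

```python
import pandas as pd

coordinate_pattern = r"\b((?:[0-8]?\d(?:\.\d+)?|90(?:\.0+)?)),\s*(-?(?:1[0-7]\d(?:\.\d+)?|0?\d{1,2}(?:\.\d+)?|180(?:\.0+)?))\b"

# Made-up example reports
sample = pd.Series([
    "Tracks found near the creek at 46.1912, -122.1944 after dark.",
    "No coordinates in this report.",
    "Heard 2, 3 knocks from the ridge.",
])

# One row per report; NaN where the pattern does not match
coords = sample.str.extract(coordinate_pattern, expand=True).astype(float)
coords.columns = ["new_lat", "new_long"]
print(coords)
```

The first row yields the embedded coordinates, the second yields `NaN` in both columns, and the third yields the false positive `2.0, 3.0`. A false positive like this one falls outside the longitude range applied in the filtering step, but pairs that happen to fall inside that range would not be caught.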



## Filtering Valid Coordinates
Remove entries with invalid longitude values that fall outside the range expected for the United States and Canada.


In [None]:
# drop longitudes that would be outside of the USA / Canada
filtered_coords = bigfoot_df[(bigfoot_df['longitude'] >= -152) & (bigfoot_df['longitude'] <= -67)]
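
A minimal sketch of what this filter keeps, using made-up longitudes rather than the dataset. It uses `Series.between`, which is inclusive on both ends by default and so matches the `>=` / `<=` comparison above. Rows with a missing longitude are dropped as well, because comparisons against `NaN` evaluate to `False`.

```python
import numpy as np
import pandas as pd

# Made-up values: one in range, one east of the range, one missing, one west of it
df = pd.DataFrame({"longitude": [-122.19, 3.0, np.nan, -160.0]})

# Keep longitudes within [-152, -67]; NaN rows are excluded automatically
kept = df[df["longitude"].between(-152, -67)]
print(kept)
```

Only the `-122.19` row survives. The `-160.0` row is dropped too, which means any sighting west of -152 (parts of western Alaska, for example) would be excluded by this range.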

In [20]:
# Save the full dataset (before the longitude filter) with its coordinate columns
bigfoot_df.to_json('../data/bigfoot_coords_df.json', orient='records')


## Saving Cleaned Data
Export the cleaned dataset with enriched coordinates for future use.


In [21]:
final_columns = ['report_number', 'report_class', 'state', 'county', 'latitude', 'longitude', 'season', 'month', 'observed']

clean_coords_list = filtered_coords[final_columns]
clean_coords_list.to_json('../data/bigfoot_coordinates_clean_cols.json', orient='records')

## Conclusion
This notebook enriched the Bigfoot dataset with geolocation data by geocoding with the Google Maps API and extracting coordinates from report text with a regex. The filtered dataset is now ready for geographic analysis and visualization.