# DATA271 Final Project - Weather and Animal Migration Patterns in California

---

## Research Details

---

### Introducing the Problem
This project applies a statistical investigative process to animal migration data and meteorological datasets in order to identify environmental factors that influence the movement patterns of species across California. We aim to understand how temperature, precipitation, and other atmospheric conditions affect when and where animals migrate, and whether these patterns are shifting in response to broader climate variability.

By pairing ecological tracking data from Movebank with detailed weather reports from NOAA and NASA’s Daymet, this project investigates correlations between environmental change and behavioral shifts in migratory species. The goal is to use spatial and temporal analysis to uncover trends and stressors that can inform environmental monitoring, conservation planning, and scientific understanding of wildlife ecology.

---

### Addressing the Problem
The approach will involve joining datasets on both geographic coordinates and dates to conduct spatiotemporal analysis. We’ll investigate time series trends of both weather conditions and migration activity to determine if patterns emerge—such as species arriving earlier due to warming winters, or retreating from drought-affected areas.

Additionally, we'll explore the possibility of species migrations being driven not just by weather, but also by ecological relationships such as predator-prey dynamics. By cross-referencing species presence and timing, we can identify potential cases where weather-driven migration may be influenced—or confounded—by avoidance of other species.

This work contributes toward a better understanding of how climate variability and atmospheric anomalies impact wildlife ecosystems, particularly within a climate-sensitive and biodiversity-rich region like California.

---

### Analysis Breakdown
We ask the following questions before conducting our official exploratory data analysis:

- What weather patterns or atmospheric variables (e.g., temperature, precipitation, wind patterns) correlate most strongly with migration timing or intensity for species in California?
- Are there identifiable climate thresholds or seasonal shifts that serve as predictors for migratory events?
- Can we spatially and temporally map migration patterns alongside climate variables to detect meaningful trends or changes over time?
- How might ongoing climate variability affect the predictability and consistency of migration behaviors in the near future?
- Are any migratory shifts better explained by predator-prey relationships than by weather factors, and can predator presence be used as a confounding control in determining causality?

Our analysis will be broken down into the following stages:

1. **Explore Individual Datasets**  
   Clean and summarize each dataset—checking for null values, date and location alignment, and variable consistency. Generate summary statistics and visualizations to understand baseline structure and behavior.

2. **Analyze Combined Datasets**  
   Merge animal movement and climate datasets on time and location. Create layered time series visualizations and heat maps to understand migratory behavior in relation to environmental variables.

3. **Evaluate Significance and Observational Limitations**  
   Evaluate both the correlation strength and ecological plausibility of observed patterns. Recognize the observational nature of the data and avoid overextending conclusions where causality cannot be proven.

4. **Answer Research Questions**  
   Revisit initial questions in light of findings. Highlight where results support or refute assumptions about how atmospheric conditions drive migration, and discuss the role of other ecological pressures.

5. **Recommendations & Further Exploration**  
   Suggest data-driven implications for conservation or climate adaptation strategies. Propose new directions for data collection (e.g., more granular predator presence data), and identify gaps or limitations in the current analysis.
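
Stage 2 depends on pairing each animal location with a nearby weather reading on the same date. The following is a minimal sketch of that pairing, assuming great-circle (haversine) distance is an acceptable definition of the "nearest" station. The function names, the record layout, and the sample fix and reading are hypothetical illustrations, not part of the project code.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_station(lat, lon, stations):
    """Return the station dict closest to (lat, lon)."""
    return min(stations, key=lambda s: haversine_km(lat, lon, s["latitude"], s["longitude"]))

def attach_weather(fixes, stations, weather):
    """Pair each tracking fix with the reading from its nearest station on the same date.

    weather maps (station_id, date_string) -> reading; fixes without a match get None.
    """
    joined = []
    for fix in fixes:
        station = nearest_station(fix["lat"], fix["lon"], stations)
        reading = weather.get((station["id"], fix["date"]))
        joined.append({**fix, "station": station["id"], "reading": reading})
    return joined

# Two station locations in the shape returned by the NOAA stations endpoint
stations = [
    {"id": "COOP:040010", "latitude": 38.2177, "longitude": -121.2013},
    {"id": "COOP:040029", "latitude": 41.19334, "longitude": -120.94458},
]
weather = {("COOP:040010", "2016-03-17"): {"TMAX": 70}}  # made-up reading
fixes = [{"animal": "owl-1", "lat": 38.3, "lon": -121.3, "date": "2016-03-17"}]  # made-up fix
joined = attach_weather(fixes, stations, weather)
print(joined[0]["station"], joined[0]["reading"])
```

For the full datasets, a spatial index or a Geopandas nearest-neighbour join would likely scale better than this brute-force search.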

---

### Datasets
1. [NOAA National Centers for Environmental Information](https://www.ncei.noaa.gov/) – historical and real-time climate data  
2. [NASA's Daymet](https://daymet.ornl.gov) – gridded daily weather and climatology variables  
3. [Movebank](https://www.movebank.org/) – open-source animal movement data across species  

---

### Libraries & Modules
- **Pandas:** For time series wrangling and merging datasets  
- **Numpy:** Core scientific computing and vectorized calculations  
- **Matplotlib & Seaborn:** Visualizations for temporal and distributional trends  
- **Plotly:** Interactive visualizations and geographic mapping  
- **Geopandas:** Spatial joins and mapping of migration paths and weather zones  

---

### Project Resources
- [GitHub Repository](https://github.com/toritotony/Data271FinalProject)


## Collecting Data

In [1]:
!pip install plotnine geopandas



In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import numpy as np
from plotnine import *
import geopandas
import plotly
import requests
from time import sleep
from requests.auth import HTTPBasicAuth
from io import StringIO

### Let's start by collecting NOAA daily weather summaries for CA. First, we check which datasets are available.

In [3]:
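import os

# Read the NOAA API token from the environment instead of hard-coding it in the notebook.
# NOAA_TOKEN is an assumed variable name; set it before running this cell.
noaa_access_token = os.environ["NOAA_TOKEN"]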
noaa_headers = {"token": noaa_access_token}
noaa_base_url = "https://www.ncei.noaa.gov/cdo-web/api/v2/"
CA_FIPS = "FIPS:06"
dataset_endpoint = noaa_base_url + "datasets"
CA_datasets_response = requests.get(dataset_endpoint, params = {"locationid": CA_FIPS}, headers=noaa_headers)
CA_datasets_response.json()

{'metadata': {'resultset': {'offset': 1, 'count': 11, 'limit': 25}},
 'results': [{'uid': 'gov.noaa.ncdc:C00861',
   'mindate': '1763-01-01',
   'maxdate': '2025-04-04',
   'name': 'Daily Summaries',
   'datacoverage': 1,
   'id': 'GHCND'},
  {'uid': 'gov.noaa.ncdc:C00946',
   'mindate': '1763-01-01',
   'maxdate': '2025-03-01',
   'name': 'Global Summary of the Month',
   'datacoverage': 1,
   'id': 'GSOM'},
  {'uid': 'gov.noaa.ncdc:C00947',
   'mindate': '1763-01-01',
   'maxdate': '2025-01-01',
   'name': 'Global Summary of the Year',
   'datacoverage': 1,
   'id': 'GSOY'},
  {'uid': 'gov.noaa.ncdc:C00345',
   'mindate': '1991-06-05',
   'maxdate': '2025-04-05',
   'name': 'Weather Radar (Level II)',
   'datacoverage': 0.95,
   'id': 'NEXRAD2'},
  {'uid': 'gov.noaa.ncdc:C00708',
   'mindate': '1994-05-20',
   'maxdate': '2025-04-02',
   'name': 'Weather Radar (Level III)',
   'datacoverage': 0.95,
   'id': 'NEXRAD3'},
  {'uid': 'gov.noaa.ncdc:C00821',
   'mindate': '2010-01-01',
   

### Looks like we want to use the GHCND id for our future requests so we can pull CA data with daily weather summaries. We should check which data categories are available for daily weather.

In [4]:
dataset_code = "GHCND"
datacategories_endpoint = noaa_base_url + "datacategories"
CA_datacat_response = requests.get(datacategories_endpoint, params = {"locationid": CA_FIPS, "datasetid": dataset_code, "limit": "1000"}, headers=noaa_headers).json()
CA_datacat_ids = [i['id'] for i in CA_datacat_response['results']]
CA_datacat_response
CA_datacat_ids

['EVAP', 'LAND', 'PRCP', 'SKY', 'SUN', 'TEMP', 'WATER', 'WIND', 'WXTYPE']

### Now that we know which data categories exist, we may want all of them, since we're searching for correlations or associations between weather and ecological patterns. Let's look at the datatypes.

In [5]:
datatype_endpoint = noaa_base_url + "datatypes"
CA_datatype_response = requests.get(datatype_endpoint, params = {"locationid": CA_FIPS, "datasetid": dataset_code, "limit": "200"}, headers=noaa_headers).json()
# filter list of ids based on importance
CA_datatype_ids = [
    "TMIN",  # Min temperature
    "TMAX",  # Max temperature
    "AWND",  # Avg wind speed
    "TSUN",  # Total sunshine
    "WT08",  # Smoke or haze (the hail code is WT05)
    "TAVG",  # Avg temperature
    "PRCP",  # Precipitation
    "SNOW",  # Snowfall
    "SNWD",  # Snow depth
    ]
CA_datatype_info = [(i['id'], i['name']) for i in CA_datatype_response['results'] if i['id'] in CA_datatype_ids]
CA_datatype_info

[('AWND', 'Average wind speed'),
 ('PRCP', 'Precipitation'),
 ('SNOW', 'Snowfall'),
 ('SNWD', 'Snow depth'),
 ('TAVG', 'Average Temperature.'),
 ('TMAX', 'Maximum temperature'),
 ('TMIN', 'Minimum temperature'),
 ('TSUN', 'Total sunshine for the period'),
 ('WT08', 'Smoke or haze ')]

### Now that we know which datatypes are available for the dataset we found, it's time to grab some data across CA. First, let's look at the location categories in case we need them later when comparing readings across stations.

In [6]:
locationcat_endpoint = noaa_base_url + "locationcategories"
CA_loccat_response = requests.get(locationcat_endpoint, params = {"locationid": CA_FIPS, "datasetid": dataset_code}, headers=noaa_headers).json()
CA_loccat_response

{'metadata': {'resultset': {'offset': 1, 'count': 12, 'limit': 25}},
 'results': [{'name': 'City', 'id': 'CITY'},
  {'name': 'Climate Division', 'id': 'CLIM_DIV'},
  {'name': 'Climate Region', 'id': 'CLIM_REG'},
  {'name': 'Country', 'id': 'CNTRY'},
  {'name': 'County', 'id': 'CNTY'},
  {'name': 'Hydrologic Accounting Unit', 'id': 'HYD_ACC'},
  {'name': 'Hydrologic Cataloging Unit', 'id': 'HYD_CAT'},
  {'name': 'Hydrologic Region', 'id': 'HYD_REG'},
  {'name': 'Hydrologic Subregion', 'id': 'HYD_SUB'},
  {'name': 'State', 'id': 'ST'},
  {'name': 'US Territory', 'id': 'US_TERR'},
  {'name': 'Zip Code', 'id': 'ZIP'}]}

### I also want to know which stations are available in CA, since we may later want to map the stations and relate them to the tracking data we receive from Movebank.

In [7]:
station_endpoint = noaa_base_url + "stations"
CA_station_response = requests.get(station_endpoint, params={"locationid": CA_FIPS, "limit": "1000", "startdate": "2013-01-01", "enddate": "2023-12-31"}, headers=noaa_headers).json()
CA_station_response

{'metadata': {'resultset': {'offset': 1, 'count': 2662, 'limit': 1000}},
 'results': [{'elevation': 26.5,
   'mindate': '1994-01-01',
   'maxdate': '2015-11-01',
   'latitude': 38.2177,
   'name': 'ACAMPO 5 NE, CA US',
   'datacoverage': 0.9469,
   'id': 'COOP:040010',
   'elevationUnit': 'METERS',
   'longitude': -121.2013},
  {'elevation': 863.5,
   'mindate': '1931-01-01',
   'maxdate': '2013-12-01',
   'latitude': 34.4938,
   'name': 'ACTON ESCONDIDO CANYON, CA US',
   'datacoverage': 0.8986,
   'id': 'COOP:040014',
   'elevationUnit': 'METERS',
   'longitude': -118.2713},
  {'elevation': 1280.8,
   'mindate': '1943-11-01',
   'maxdate': '2015-11-01',
   'latitude': 41.19334,
   'name': 'ADIN RANGER STATION, CA US',
   'datacoverage': 0.9931,
   'id': 'COOP:040029',
   'elevationUnit': 'METERS',
   'longitude': -120.94458},
  {'elevation': 516.6,
   'mindate': '1952-10-01',
   'maxdate': '2015-11-01',
   'latitude': 32.8358,
   'name': 'ALPINE, CA US',
   'datacoverage': 0.9644,
  

### Time to grab the meteorological and climatological data from NOAA across CA. I'll grab eleven years of data, 2013 through 2023.

### A single request spanning the whole period is limited to one year's worth of data, so we request each year in turn and extend our list with that year's JSON results.

In [8]:
data_endpoint = noaa_base_url + "data"

all_results = []

for year in range(2013, 2024): 
    startdate = f"{year}-01-01"
    enddate = f"{year}-12-31"
    
    params = {
        "startdate": startdate,
        "enddate": enddate,
        "datasetid": dataset_code, 
        "locationid": CA_FIPS,
        "datatypeid": CA_datatype_ids,
        "units":"standard", 
        "limit": "1000"
    }

    response = requests.get(data_endpoint, params=params, headers=noaa_headers)
    
    if response.status_code == 200:
        year_data = response.json().get("results", [])
        all_results.extend(year_data)
        print(f"Got data for {year}: {len(year_data)} records")
        sleep(3)
    else:
        print(f"Failed for {year} – {response.status_code}: {response.text}")

Got data for 2013: 1000 records
Got data for 2014: 1000 records
Got data for 2015: 1000 records
Got data for 2016: 1000 records
Got data for 2017: 1000 records
Got data for 2018: 1000 records
Got data for 2019: 1000 records
Got data for 2020: 1000 records
Got data for 2021: 1000 records
Got data for 2022: 1000 records
Got data for 2023: 1000 records
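
Every year above returned exactly 1,000 records, which is the `limit` value, so each year's result is probably truncated. A minimal sketch of paging through the results with the API's `offset` parameter follows. Here `fetch_page` is a hypothetical callable standing in for the `requests.get(...).json()` call, and relying on `metadata.resultset.count` (visible in the responses above) to know when to stop is an assumption to check against the API documentation.

```python
def fetch_all_records(fetch_page, page_size=1000):
    """Collect every record by advancing the 1-based offset until the reported count is reached.

    fetch_page(offset, limit) is a hypothetical callable that returns the decoded JSON
    of one request, i.e. a dict with 'metadata' and 'results' keys (or {} when empty).
    """
    records = []
    offset = 1  # the responses above report offsets starting at 1
    while True:
        payload = fetch_page(offset, page_size)
        page = payload.get("results", [])
        records.extend(page)
        total = payload.get("metadata", {}).get("resultset", {}).get("count", 0)
        offset += page_size
        if not page or offset > total:
            break
    return records


def make_fake_api(total):
    """Build an in-memory stand-in for the API so the paging logic can be tried offline."""
    data = list(range(total))

    def fetch_page(offset, limit):
        chunk = data[offset - 1 : offset - 1 + limit]
        return {
            "metadata": {"resultset": {"offset": offset, "count": total, "limit": limit}},
            "results": chunk,
        }

    return fetch_page


all_rows = fetch_all_records(make_fake_api(2500))
print(len(all_rows))
```

In the notebook, `fetch_page` could wrap the existing `requests.get` call with `offset` added to `params`, keeping the `sleep` between requests to stay within the rate limit.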


### We've collected daily weather summary data for CA in standard units, for the requested datatype ids, between 2013 and 2023. We can now use pandas to turn this into a dataframe, where we can change datatypes, adjust columns, and remove rows that have null values.

In [9]:
CA_NOAA_df = pd.json_normalize(all_results)
CA_NOAA_df.head()

Unnamed: 0,date,datatype,station,attributes,value
0,2013-01-01T00:00:00,PRCP,GHCND:US1CAAL0001,",,N,0700",0.0
1,2013-01-01T00:00:00,SNOW,GHCND:US1CAAL0001,",,N,0700",0.0
2,2013-01-01T00:00:00,PRCP,GHCND:US1CAAL0003,",,N,0800",0.0
3,2013-01-01T00:00:00,SNOW,GHCND:US1CAAL0003,",,N,0800",0.0
4,2013-01-01T00:00:00,PRCP,GHCND:US1CAAL0004,",,N,0700",0.0


### To summarize, the data comes from [NOAA's NCEI API](https://www.ncdc.noaa.gov/cdo-web/webservices/v2), which provides archived weather data across the United States. It was collected for CA from 2013 to 2023 and has five variables: date, datatype, station, attributes, and the value associated with each datatype. The attributes field is a comma-separated code made up of a measurement flag, quality flag, source flag, and time of observation. I won't detail them here, but you can find more information [here](https://docs.ropensci.org/rnoaa/articles/ncdc_attributes.html).
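
The attributes string can be split into its four parts directly. This is a small sketch that assumes the field order given above (measurement flag, quality flag, source flag, time of observation); `parse_attributes` and the key names are hypothetical.

```python
def parse_attributes(attributes):
    """Split a GHCND attributes string such as ',,N,0700' into named fields.

    Empty fields become None; missing trailing fields are padded with None.
    """
    names = ["measurement_flag", "quality_flag", "source_flag", "obs_time"]
    parts = attributes.split(",")
    parts += [""] * (len(names) - len(parts))  # pad strings that omit trailing fields
    return {name: (part or None) for name, part in zip(names, parts)}


parsed = parse_attributes(",,N,0700")
print(parsed)
# {'measurement_flag': None, 'quality_flag': None, 'source_flag': 'N', 'obs_time': '0700'}
```

Applied to the dataframe, something like `CA_NOAA_df['attributes'].apply(parse_attributes).apply(pd.Series)` should expand the field into four columns.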

In [10]:
CA_NOAA_df.shape

(11000, 5)

In [11]:
CA_NOAA_df.isna().sum()

date          0
datatype      0
station       0
attributes    0
value         0
dtype: int64

In [13]:
CA_NOAA_df['date'] = pd.to_datetime(CA_NOAA_df['date'])
# NOTE: astype(int) truncates fractional readings (e.g. PRCP in inches); keep float values if that precision matters
CA_NOAA_df['value'] = CA_NOAA_df['value'].astype(int)
CA_NOAA_df.head()

Unnamed: 0,date,datatype,station,attributes,value
0,2013-01-01,PRCP,GHCND:US1CAAL0001,",,N,0700",0
1,2013-01-01,SNOW,GHCND:US1CAAL0001,",,N,0700",0
2,2013-01-01,PRCP,GHCND:US1CAAL0003,",,N,0800",0
3,2013-01-01,SNOW,GHCND:US1CAAL0003,",,N,0800",0
4,2013-01-01,PRCP,GHCND:US1CAAL0004,",,N,0700",0


In [14]:
CA_NOAA_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11000 entries, 0 to 10999
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   date        11000 non-null  datetime64[ns]
 1   datatype    11000 non-null  object        
 2   station     11000 non-null  object        
 3   attributes  11000 non-null  object        
 4   value       11000 non-null  int64         
dtypes: datetime64[ns](1), int64(1), object(3)
memory usage: 429.8+ KB


In [15]:
# Several requested variables are missing: average wind speed, sunshine, smoke/haze, and average temperature.
# Each yearly request was capped at 1,000 records, so they may simply fall outside the rows retrieved.
CA_NOAA_df.value_counts("datatype")

datatype
PRCP    5794
SNOW    3728
TMAX     504
TMIN     501
SNWD     473
Name: count, dtype: int64
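
Because each row holds a single datatype reading, comparing variables side by side usually means reshaping from long to wide. A minimal sketch with `pivot_table` follows, using a tiny made-up frame in the same layout as `CA_NOAA_df`. The values and the choice of `aggfunc="mean"` for duplicate readings are assumptions.

```python
import pandas as pd

# Made-up sample in the same long layout as CA_NOAA_df (one row per station/date/datatype)
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01"] * 4),
    "station": ["GHCND:US1CAAL0001", "GHCND:US1CAAL0001",
                "GHCND:US1CAAL0003", "GHCND:US1CAAL0003"],
    "datatype": ["PRCP", "SNOW", "PRCP", "SNOW"],
    "value": [0.0, 0.0, 0.1, 0.0],
})

# One row per (date, station), one column per datatype
wide_df = (
    long_df
    .pivot_table(index=["date", "station"], columns="datatype", values="value", aggfunc="mean")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover 'datatype' axis label
print(wide_df)
```

Datatypes a station never reported become `NaN` in the wide frame, which makes gaps such as the sparse TMAX and TMIN counts easy to spot.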

### Let's move onto collecting data from Movebank, where we'll retrieve data related to animal movements 

In [16]:
movebank_base_url = "https://www.movebank.org/movebank/service/"
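import os

# Read Movebank credentials from the environment rather than hard-coding them.
# The MOVEBANK_* variable names are assumptions; set them before running this cell.
mb_email = os.environ["MOVEBANK_EMAIL"]
mb_username = os.environ["MOVEBANK_USERNAME"]
mb_password = os.environ["MOVEBANK_PASSWORD"]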

### Let's check which attributes are available so we know how to filter the studies request before requesting the associated tracking data.

In [21]:
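# NOTE: "direct_read" (underscore) is what produces the 404 below; the working requests use "direct-read"
attributes_endpoint = movebank_base_url + "direct_read?attributes"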
mb_attr_response = requests.get(attributes_endpoint, auth=HTTPBasicAuth(mb_username, mb_password))
mb_attr_response.text

'<!doctype html><html lang="en"><head><title>HTTP Status 404 – Not Found</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 404 – Not Found</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> The requested resource [&#47;movebank&#47;service&#47;direct_read] is not available</p><p><b>Description</b> The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.</p><hr class="line" /><h3>Apache Tomcat/8.5.100</h3></body></html>'

### Let's collect the studies whose data I can see and download. We authenticate with our username and password, read the CSV response into a pandas dataframe, and filter it by timestamps, main location long/lat, and whether the study name suggests California or its native animals.

In [25]:
from datetime import datetime
from requests.auth import HTTPBasicAuth

mb_studies_endpoint = movebank_base_url + "direct-read"

mb_params = {
    "entity_type": "study",
    "i_can_see_data": "true",
    "i_have_download_access": "true"
}

mb_studies_response = requests.get(
    mb_studies_endpoint,
    params=mb_params,
    auth=HTTPBasicAuth(mb_username, mb_password)
)

mb_df_studies = pd.read_csv(StringIO(mb_studies_response.text))

mb_df_studies["timestamp_first_deployed_location"] = pd.to_datetime(mb_df_studies["timestamp_first_deployed_location"], errors='coerce')
mb_df_studies["timestamp_last_deployed_location"] = pd.to_datetime(mb_df_studies["timestamp_last_deployed_location"], errors='coerce')

mb_start = pd.to_datetime("2013-01-01")
mb_end = pd.to_datetime("2023-12-31")

mb_df_filtered = mb_df_studies[
    (mb_df_studies["timestamp_last_deployed_location"] >= mb_start) &
    (mb_df_studies["timestamp_first_deployed_location"] <= mb_end)
]

mb_df_filtered = mb_df_filtered[
    (mb_df_filtered["main_location_lat"] >= 32.0) &
    (mb_df_filtered["main_location_lat"] <= 42.0) &
    (mb_df_filtered["main_location_long"] >= -125.0) &
    (mb_df_filtered["main_location_long"] <= -114.0)
]

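# case=False makes the match case-insensitive. Short tokens such as "us" and "ca" also match inside longer words, so this filter is permissive.
mb_df_filtered = mb_df_filtered[mb_df_filtered["name"].str.contains("california|sierra|bay area|america|us|ca", case=False, na=False)]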

mb_df_filtered.head(20)

# Download to analyze and filter down the studies we'll be using and animals with data we can work with
#mb_df_filtered.to_csv("movebank_studies.csv")

Unnamed: 0,acknowledgements,citation,go_public_date,grants_used,has_quota,i_am_owner,id,is_test,license_terms,license_type,...,there_are_data_which_i_cannot_see,i_have_download_access,i_am_collaborator,study_permission,timestamp_first_deployed_location,timestamp_last_deployed_location,number_of_deployed_locations,taxon_ids,sensor_type_ids,contact_person_name
147,,"Huysman AE, Castañeda XA, Johnson MD. 2021. D...",,,True,False,1426277950,False,,CC_BY,...,False,True,False,na,2016-03-17 16:40:00,2018-07-18 13:09:00,34012.0,Tyto furcata,GPS,ahuysman (Allison Huysman)
290,Support for this study was provided by the S. ...,"BA Barbaree, ME Reiter, CM Hickey, GW Page. 2...",,Grant to the Migratory Bird Conservation Partn...,True,False,1419917362,False,,CC_BY,...,False,True,False,na,2013-02-08 00:00:00,2013-04-23 00:00:00,212.0,"Calidris alpina,Limnodromus scolopaceus",Radio Transmitter,auoptimo (Blake Barbaree)
371,,A more complete version of this dataset is sto...,,,True,False,217784323,False,,CC_BY,...,False,True,False,na,2003-11-05 15:00:00,2016-12-08 14:04:04,679402.0,"Cathartes aura,Coragyps atratus",GPS,dbarber (David Barber)
429,C. G. Putnam provided the template for Figure ...,"Barbaree BA, Reiter ME, Hickey CM, Page GW. 20...",,Support for this study was provided by the S. ...,True,False,1437646021,False,,CC_BY,...,False,True,False,na,2012-08-19 07:00:00,2014-03-18 07:00:00,475.0,Limnodromus scolopaceus,Radio Transmitter,auoptimo (Blake Barbaree)
570,,"Bildstein KL, Barber D, Bechard MJ, Graña Gri...",,,True,False,481458,False,,CC_BY,...,False,True,False,na,2003-11-05 15:00:00,2025-04-05 18:00:00,2217409.0,"Cathartes aura,Coragyps atratus","GPS,Accessory Measurements",dbarber (David Barber)
595,,Bloom PH (2015) Northward summer migration of ...,,,True,False,1073231887,False,,CC_BY,...,False,True,False,na,2007-08-22 00:00:00,2017-06-04 01:32:03,43006.0,Buteo jamaicensis,"GPS,Argos Doppler Shift",bloombio (Pete Bloom)
676,Work was conducted under permits to JA Estep (...,"Fleishman E, Anderson J, Dickson BG, Krolick D...",,Support for this work was provided by Brookfie...,True,False,164144882,False,These data have been published by the Movebank...,CUSTOM,...,False,True,False,na,2011-06-17 16:00:00,2013-08-28 06:00:00,18828.0,Buteo swainsoni,"GPS,Radio Transmitter",efleishman (Erica Fleishman)
697,We thank the Acopian Family for providing Hawk...,A more complete version of this dataset is sto...,,This research was supported in part by NASA un...,True,False,16880941,False,,CC_0,...,False,True,False,na,2003-11-14 16:00:00,2013-03-19 03:00:00,215719.0,Cathartes aura,GPS,dbarber (David Barber)
771,\tPlease include your email address when reque...,"Irvine LM, Palacios DM, Lagerquist BA, Mate BR...",,,True,False,943824007,False,,CC_BY,...,False,True,False,na,2014-08-04 18:44:30,2015-08-06 20:34:53,17150.0,"Balaenoptera physalus,Balaenoptera musculus",GPS,mmiwtg (Barb Lagerquist)
794,,"Serieys LEK, Matsushima SS, Wilmers CC. 2024. ...",,,True,False,5175345606,False,,CC_0,...,False,True,False,na,2017-06-03 21:00:27,2018-12-29 07:55:08,647712.0,"Lynx rufus,Urocyon cinereoargenteus","GPS,Acceleration",ucsc@bobcat (Laurel Serieys)


### Based on this information, we'll collect tracking data from 2013 onward. Some studies don't span the whole decade, so we'll later shorten the NOAA retrieval window to match the tracking occurrences with weather readings. The animals we'll be studying are Barn Owls, Vultures, Red-tailed Hawks, Bobcats, Grey Foxes, and potentially Blue and Fin Whales.

In [30]:
# Create function to get metadata about studies, and functions that call on that function with different entity types
# we can quickly get all the study information this way 

BASE_URL = mb_studies_endpoint

def get_movebank_entity(entity_type, study_id, username, password):
    params = {
        "entity_type": entity_type,
        "study_id": study_id
    }

    response = requests.get(BASE_URL, params=params, auth=HTTPBasicAuth(username, password))

    if response.status_code == 200:
        try:
            df = pd.read_csv(StringIO(response.text))
            print(f"{entity_type.capitalize()} info retrieved for {study_id}")
            return df
        except Exception as e:
            print(f"Failed to parse {entity_type}: {e}")
            return pd.DataFrame()
    else:
        print(f"Failed to fetch {entity_type} for {study_id}: {response.status_code}")
        return pd.DataFrame()

# Wrapped functions that call separate entity types for all studies to retrieve all metadata per study 
def get_study_metadata(study_id, username, password):
    return get_movebank_entity("study", study_id, username, password)

def get_tags(study_id, username, password):
    return get_movebank_entity("tag", study_id, username, password)

def get_individuals(study_id, username, password):
    return get_movebank_entity("individual", study_id, username, password)

def get_deployments(study_id, username, password):
    return get_movebank_entity("deployment", study_id, username, password)

def get_sensors(study_id, username, password):
    return get_movebank_entity("sensor", study_id, username, password)

def collect_all_metadata_for_studies(study_dict, username, password):
    entity_types = ["study", "tag", "individual", "deployment", "sensor"]
    all_metadata = {}

    for label, study_id in study_dict.items():
        print(f"\n Collecting metadata for: {label} ({study_id})")
        study_data = {}
        for entity in entity_types:
            df = get_movebank_entity(entity, study_id, username, password)
            study_data[entity] = df
        all_metadata[label] = study_data
    
    return all_metadata

mb_studies = {
    "BarnOwl": 1426277950,
    "Vultures": 217784323,
    "RedTailedHawk": 1073231887,
    "BobcatFox": 5175345606,
    "Whales": 1027467132  # we might want to check out an oceanic species too
}

In [31]:
# Currently failing for several studies. I believe this is because license agreements must be accepted as part of the request; more updates soon.
all_study_metadata = collect_all_metadata_for_studies(mb_studies, mb_username, mb_password)
# example access
all_study_metadata["Whales"]["individual"]


 Collecting metadata for: BarnOwl (1426277950)
Study info retrieved for 1426277950
Tag info retrieved for 1426277950
Individual info retrieved for 1426277950
Deployment info retrieved for 1426277950
Failed to fetch sensor for 1426277950: 500

 Collecting metadata for: Vultures (217784323)
Study info retrieved for 217784323
Failed to parse tag: Error tokenizing data. C error: Expected 1 fields in line 13, saw 17

Failed to parse individual: Error tokenizing data. C error: Expected 1 fields in line 13, saw 17

Failed to parse deployment: Error tokenizing data. C error: Expected 1 fields in line 13, saw 17

Failed to fetch sensor for 217784323: 500

 Collecting metadata for: RedTailedHawk (1073231887)
Study info retrieved for 1073231887
Failed to parse tag: Error tokenizing data. C error: Expected 1 fields in line 13, saw 5

Failed to parse individual: Error tokenizing data. C error: Expected 1 fields in line 13, saw 5

Failed to parse deployment: Error tokenizing data. C error: Expected
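
The tokenizing errors above are consistent with the response body being license text rather than CSV. As I understand the Movebank REST API documentation, license terms are accepted by resending the request with a `license-md5` parameter containing the MD5 hash of the first response. Treat that parameter name, the `License Terms:` marker, and the cookie reuse as assumptions to verify. In this sketch, `http_get` is a hypothetical callable wrapping `requests.get`, so the logic can be exercised offline.

```python
import hashlib


def read_with_license(http_get, params):
    """Fetch a Movebank resource, accepting license terms if the first response asks for it.

    http_get(params, cookies) is a hypothetical callable returning an object with
    .content (bytes) and .cookies, e.g. a thin wrapper around requests.get.
    """
    response = http_get(params, None)
    if b"License Terms:" in response.content:
        # Assumed protocol: resend with the MD5 of the license text and the same cookies
        digest = hashlib.md5(response.content).hexdigest()
        accepted = {**params, "license-md5": digest}
        response = http_get(accepted, response.cookies)
    return response.content.decode("utf-8")


class FakeResponse:
    """Minimal stand-in for requests.Response, for offline testing only."""

    def __init__(self, content, cookies=None):
        self.content = content
        self.cookies = cookies or {}


LICENSE = b"License Terms: example text"  # made-up license body


def fake_http_get(params, cookies):
    # Return CSV only when the matching hash is supplied, otherwise the license text
    if params.get("license-md5") == hashlib.md5(LICENSE).hexdigest():
        return FakeResponse(b"id,local_identifier\n1,owl-1\n")
    return FakeResponse(LICENSE, cookies={"session": "abc"})


csv_text = read_with_license(fake_http_get, {"entity_type": "individual", "study_id": 1})
print(csv_text)
```

If this approach holds up, `get_movebank_entity` could pass the returned text to `pd.read_csv(StringIO(...))` as it already does.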

## Clean Data

## Gather Statistics

## Analyze Statistics 

## Use Above to Answer Questions using Inferential Statistics and Prediction

## Answer Questions and Conclude Findings

## References and Citations