### CAFO Poultry Plants EDA

This notebook explores and maps the poultry plants found in the state permit and Counterglow datasets.

State permit data was obtained from individual state government websites, since states typically regulate CAFOs for environmental quality reasons. Because each state formats its data differently, a farm_source config file standardizes the input columns of each dataset in the raw/cafo folder.
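As a rough illustration of this kind of column standardization, the sketch below renames one state's raw columns to the shared schema seen later in this notebook (`name`, `permit`, `state`, `source`, `address`, `lat`, `long`). The raw column names, the `EXAMPLE_COLUMN_MAP` dictionary, and the `standardize_columns` helper are assumptions made for this example; the actual farm_source config format may differ.

```python
import pandas as pd

# Hypothetical mapping from one state's raw column names to the shared schema.
# These raw names are illustrative assumptions, not taken from the real config.
EXAMPLE_COLUMN_MAP = {
    "Facility Name": "name",
    "Permit Type": "permit",
    "Latitude": "lat",
    "Longitude": "long",
}

# Shared schema, as seen in the cleaned permit data in this notebook.
STANDARD_COLUMNS = ["name", "permit", "state", "source", "address", "lat", "long"]


def standardize_columns(raw, column_map, state, source):
    """Rename raw columns to the shared schema and fill in missing ones."""
    df = raw.rename(columns=column_map)  # returns a copy; raw is unchanged
    df["state"] = state
    df["source"] = source
    # reindex adds any standard column the state did not provide (as NaN)
    # and drops columns that are not part of the shared schema.
    return df.reindex(columns=STANDARD_COLUMNS)


raw_example = pd.DataFrame(
    {
        "Facility Name": ["Example Farm A", "Example Farm B"],
        "Permit Type": ["General Permit - Poultry AFO"] * 2,
        "County": ["Example County 1", "Example County 2"],
    }
)
standardized = standardize_columns(
    raw_example, EXAMPLE_COLUMN_MAP, state="MS", source="Example agency"
)
```

Keeping the mapping in data rather than code means a new state only needs a new dictionary, which matches the per-dataset config approach described above.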

The Counterglow dataset was obtained from Project Counterglow and contains crowdsourced data on the locations of CAFOs. 

In [1]:
import pandas as pd
from pipeline.utils.visualize import map_state
from pipeline.constants import CLEANED_CAFO_POULTRY_FPATH, CLEANED_COUNTERGLOW_FPATH, MATCHED_FARMS_FPATH, UNMATCHED_FARMS_FPATH

ModuleNotFoundError: No module named 'pipeline'

In [2]:
permit_data_joined = pd.read_csv(CLEANED_CAFO_POULTRY_FPATH)
counterglow = pd.read_csv(CLEANED_COUNTERGLOW_FPATH)

permit_data_joined.head()

Unnamed: 0.1,Unnamed: 0,name,permit,state,source,address,lat,long
0,0,4D Farms LLC,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,
1,1,4S Farms LLC,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,
2,2,A B Westbrook Farms LLC,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,
3,3,A and J Poultry,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,
4,4,A and W Garner Farms,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,


The combined state permit dataset currently covers three states. More can be added by placing their data in the cafo folder for the clean_cafo function to process.

In [3]:
permit_data_joined['state'].unique()

array(['MS', 'NC', 'AL'], dtype=object)

In [4]:
# Rows without a latitude are counted as having no coordinates.
no_loc = permit_data_joined['lat'].isnull().sum()
contains_loc = len(permit_data_joined) - no_loc

print(f"In the combined state permit dataset, there are {no_loc} entries with no longitude/latitude data and {contains_loc} entries with data.")

In the combined state permit dataset, there are 1415 entries with no longitude/latitude data and 20 entries with data.
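To see which states account for the missing coordinates, a per-state breakdown may help. The sketch below uses a small toy frame with made-up values in place of `permit_data_joined`; only the column names `state`, `lat`, and `long` come from the dataset shown above.

```python
import pandas as pd

# Toy stand-in for permit_data_joined; the values are illustrative only.
toy_permits = pd.DataFrame(
    {
        "state": ["MS", "MS", "NC", "AL"],
        "lat": [None, None, 35.1, None],
        "long": [None, None, -78.9, None],
    }
)

# A row counts as located only if both coordinates are present.
has_coords = toy_permits[["lat", "long"]].notnull().all(axis=1)

coords_by_state = (
    toy_permits.assign(has_coords=has_coords)
    .groupby("state")["has_coords"]
    .agg(["sum", "count"])
    .rename(columns={"sum": "with_coords", "count": "total"})
)
print(coords_by_state)
```

Running the same aggregation on `permit_data_joined` would show whether the entries with coordinates are concentrated in a single state.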


In [5]:
counterglow.head()

Unnamed: 0.1,Unnamed: 0,Name,Latitude,Longitude,Address,City,State,County,Description,Business/company name,...,Phone number,Region,Facility name,Number of animals,Full address,Website URL,Postcode,Suburb/city,Contracted to,Farm Type
0,0,78,43.396729,-95.923149,"1757 Lily Avenue George 51237, United States",,IA,Lyon County,,,...,,,,,,,,,,Pigs (Meat)
1,1,84,42.92535,-96.429291,"1804 500th Street Hawarden 51023, United States",,IA,Sioux County,,,...,,,,,,,,,,Pigs (Meat)
2,2,87,40.658218,-92.410202,"21166 Mahogany Avenue Bloomfield 52537, United...",,IA,Davis County,,,...,,,,,,,,,,Pigs (Meat)
3,3,88,40.643219,-92.409889,"27268 Mahogany Avenue Bloomfield 52537, United...",,IA,Davis County,,,...,,,,,,,,,,Pigs (Meat)
4,4,89,40.660851,-92.421219,"21166 280th Street Bloomfield 52537, United St...",,IA,Davis County,,,...,,,,,,,,,,Pigs (Meat)


#### Cleaning

To clean these datasets, we matched poultry CAFOs in the state permit dataset to CAFOs listed in the Counterglow dataset by name and, where provided, by location. For each match we recorded how fuzzy it was (i.e., whether it was an exact name and location match or a fuzzy match). Matched and unmatched farms are saved in separate files.
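The difference between an exact and a fuzzy name match can be sketched with the standard library's `difflib`. This is not necessarily how the pipeline matches names: the `normalize_name` and `classify_name_match` helpers and the 0.9 similarity threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_name(name):
    """Uppercase, drop punctuation, and collapse repeated whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    return " ".join(cleaned.split())


def classify_name_match(permit_name, counterglow_name, threshold=0.9):
    """Return 'exact', 'fuzzy', or None for a pair of farm names.

    The threshold is an assumed cutoff, not a value from the pipeline.
    """
    if permit_name.strip().upper() == counterglow_name.strip().upper():
        return "exact"
    a = normalize_name(permit_name)
    b = normalize_name(counterglow_name)
    if SequenceMatcher(None, a, b).ratio() >= threshold:
        return "fuzzy"
    return None


exact = classify_name_match("ARNOLD OVERSTREET POULTRY", "Arnold Overstreet Poultry")
fuzzy = classify_name_match("ALAN YOUNG, POULTRY", "ALAN YOUNG POULTRY")
no_match = classify_name_match("4D FARMS LLC", "A AND W GARNER FARMS")
```

Here a name that differs only by case counts as exact, a name that differs only by punctuation counts as fuzzy, and unrelated names return `None`.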

In [6]:
matched_farms = pd.read_csv(MATCHED_FARMS_FPATH)
unmatched_farms = pd.read_csv(UNMATCHED_FARMS_FPATH)

matched_farms.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,name,permit,state,source,address,lat,long,Exact Name Match,Fuzzy Name,Exact Name/Location,Fuzzy Name/Exact Location,Location Match
0,9,9,"ALAN YOUNG, POULTRY",General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,ALAN YOUNG POULTRY,,,
1,16,16,"ALLEN AND JONNI JONES, POULTRY",General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,ALLEN AND JONNI JONES POULTRY,,,
2,23,23,"ANDY HOLTCAMP, POULTRY",General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,ANDY HOLTCAMP POULTRY,,,
3,29,29,ARNOLD OVERSTREET POULTRY,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,ARNOLD OVERSTREET POULTRY,,,,
4,31,31,"ARTIS R WINDHAM, POULTRY",General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,ARTIS R WINDHAM POULTRY,,,


In [7]:
unmatched_farms.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,name,permit,state,source,address,lat,long,Exact Name Match,...,Phone number,Region,Facility name,Number of animals,Full address,Website URL,Postcode,Suburb/city,Contracted to,Farm Type
0,0,0,4D FARMS LLC,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,...,,,,,,,,,,
1,1,1,4S FARMS LLC,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,...,,,,,,,,,,
2,2,2,A B WESTBROOK FARMS LLC,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,...,,,,,,,,,,
3,3,3,A AND J POULTRY,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,...,,,,,,,,,,
4,4,4,A AND W GARNER FARMS,General Permit - Poultry AFO,MS,Mississippi Department of Environmental Quality,,,,,...,,,,,,,,,,


#### Visualizations of CAFOs

Green markers indicate CAFOs found in both datasets, red markers indicate CAFOs present only in Counterglow, and blue markers indicate CAFOs present only in the state permit data. Only farms with both latitude and longitude coordinates are plotted.

In [8]:
nc_map = map_state(MATCHED_FARMS_FPATH, UNMATCHED_FARMS_FPATH, "NC")
nc_map
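The color scheme and the coordinate filter described in this section can be expressed as two small helpers. The internals of `map_state` are not shown in this notebook, so `marker_color`, `has_coordinates`, and the example records below are hypothetical and only illustrate the rules stated in the text.

```python
import math


def marker_color(in_permits, in_counterglow):
    """Map dataset membership to the marker color used on the map."""
    if in_permits and in_counterglow:
        return "green"  # found in both datasets
    if in_counterglow:
        return "red"  # only in Counterglow
    if in_permits:
        return "blue"  # only in state permit data
    raise ValueError("a farm must appear in at least one dataset")


def has_coordinates(lat, long):
    """True only when both latitude and longitude are present."""
    return all(v is not None and not math.isnan(v) for v in (lat, long))


# Hypothetical records: (name, lat, long, in_permits, in_counterglow)
example_farms = [
    ("Example Farm A", 35.1, -78.9, True, True),
    ("Example Farm B", 35.4, -79.2, False, True),
    ("Example Farm C", float("nan"), float("nan"), True, False),
]

# Keep only farms that can be plotted, paired with their marker color.
plottable = [
    (name, marker_color(in_permits, in_counterglow))
    for name, lat, long, in_permits, in_counterglow in example_farms
    if has_coordinates(lat, long)
]
```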