# USDA 2012 and 2017 Hired Labor Data

## Data Overview 

This dataset contains state-level hired farm labor counts for the United States for the years 2012 and 2017. It records the number of hired workers, broken down by days worked (150 days or more versus fewer than 150 days), along with totals for migrant and unpaid workers. Together, the two years allow a state-by-state comparison of the size and composition of the agricultural labor force.

 

## Data Structure 
    state (string): Name of the U.S. state 
    year (int): Reporting year 
    Hired farm labor (string): Total number of hired farm workers 
    Workers by days worked – 150 days or more: Number of workers employed for 150 days or more 
    Workers by days worked – Less than 150 days: Number of workers employed for fewer than 150 days 
    Reported only workers working 150 days or more: Workers on farms that reported only workers in the 150+ days category 
    Reported only workers working less than 150 days: Workers on farms that reported only workers in the <150 days category 
    Reported both - 150 days or more, workers: Workers employed 150 days or more on farms that reported workers in both categories 
    Reported both - less than 150 days, workers: Workers employed fewer than 150 days on farms that reported workers in both categories 
    Total migrant workers: Number of migrant workers 
    Unpaid workers: Number of unpaid farm workers 

Values in the raw files are generally stored as strings. Some include formatting such as thousands separators (commas), and some cells contain codes such as (D), which indicates that the value is undisclosed or unavailable.
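
Because counts may arrive as formatted strings or as codes, a small parsing helper makes the handling explicit. The sketch below is illustrative only: `parse_count` and `SUPPRESSION_CODES` are hypothetical names, and (D) is the only code confirmed by the data preview in this notebook.

```python
from typing import Optional

# Codes assumed to mark withheld or missing values.
# Only "(D)" is confirmed by the data preview; extend this set if other codes appear.
SUPPRESSION_CODES = {"(D)"}


def parse_count(raw) -> Optional[int]:
    """Convert a raw cell such as '1,234' to an int.

    Returns None for empty cells and known suppression codes.
    Raises ValueError for anything else that is not a whole number,
    so unexpected codes are noticed instead of silently dropped.
    """
    text = str(raw).strip()
    if text == "" or text in SUPPRESSION_CODES:
        return None
    return int(text.replace(",", ""))


print(parse_count("465,422"))  # 465422
print(parse_count("(D)"))      # None
```

With pandas, the helper could be applied to a column via `Series.map`. Alternatively, `pd.read_csv` accepts `thousands=","` and `na_values=["(D)"]`, which handles both cases at load time.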

## Data Collection & Processing 

The data was originally sourced from USDA labor reports. It has been cleaned, structured consistently, and split by year for ease of analysis. Days worked serves as the primary categorization of labor type, and the migrant and unpaid worker counts allow deeper exploration of farm labor conditions and workforce composition. The cleaning step in this notebook keeps only the state, year, and hired farm labor columns and writes them to cleaned_labor_data.csv.

In [1]:
import pandas as pd

# Read the raw 2012 labor dataset
df_labor2012 = pd.read_csv("Labor2012.csv")

# Print the number of rows and preview the structure
print(f"Dataset size: {len(df_labor2012)} rows")
df_labor2012.head(5)

Dataset size: 50 rows


Unnamed: 0,state,year,Hired farm labor,Workers by days worked\n- 150 days or more,Workers by days worked\n- Less than 150 days,Reported only workers working \n150 days or more,Reported only workers working\n less than 150 days,"Reported both - workers working\n 150 days or more and workers\n working less than 150 days\n- 150 days or more, workers","Reported both - workers working\n 150 days or more and workers\n working less than 150 days\n- less than 150 days, workers",Total migrant workers,Unpaid workers
0,ALABAMA,2012,32948,10311,22637,5581,17304,4730,5333,2032,42969
1,ALASKA,2012,1577,464,1113,133,450,331,663,(D),990
2,ARIZONA,2012,29245,16066,13179,8315,6047,7751,7132,3629,28429
3,ARKANSAS,2012,33104,13663,19441,6985,12707,6678,6734,1358,43305
4,CALIFORNIA,2012,465422,205851,259571,71384,69900,134467,189671,131457,72020
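
The previewed rows suggest that the day-count columns are additive: hired labor equals the two days-worked totals, and each days-worked total equals its "reported only" and "reported both" parts. The sketch below checks this on the five 2012 preview rows. It is an observation from these rows, not a documented guarantee for the full file, and `row_is_consistent` is a hypothetical helper name.

```python
# Values copied from the five 2012 preview rows.
# Field order: state, hired, 150+ days, <150 days,
#              only 150+, only <150, both (150+), both (<150)
PREVIEW_2012 = [
    ("ALABAMA", 32948, 10311, 22637, 5581, 17304, 4730, 5333),
    ("ALASKA", 1577, 464, 1113, 133, 450, 331, 663),
    ("ARIZONA", 29245, 16066, 13179, 8315, 6047, 7751, 7132),
    ("ARKANSAS", 33104, 13663, 19441, 6985, 12707, 6678, 6734),
    ("CALIFORNIA", 465422, 205851, 259571, 71384, 69900, 134467, 189671),
]


def row_is_consistent(row) -> bool:
    """Check the three additive relationships for one row."""
    _, hired, long_term, short_term, only_long, only_short, both_long, both_short = row
    return (
        hired == long_term + short_term
        and long_term == only_long + both_long
        and short_term == only_short + both_short
    )


# Collect the states whose previewed values break any relationship.
inconsistent = [row[0] for row in PREVIEW_2012 if not row_is_consistent(row)]
print(inconsistent)  # [] for the five previewed rows
```

Running the same check on the full file would need suppressed values such as (D) to be skipped first, since they cannot be summed.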


In [2]:
# Read the raw 2017 labor dataset
df_labor2017 = pd.read_csv("Labor2017.csv")

# Print the number of rows and preview the structure
print(f"Dataset size: {len(df_labor2017)} rows")
df_labor2017.head(5)

Dataset size: 50 rows


Unnamed: 0,state,year,Hired farm labor,Workers by days worked\n- 150 days or more,Workers by days worked\n- Less than 150 days,Reported only workers working \n150 days or more,Reported only workers working\n less than 150 days,"Reported both - workers working\n 150 days or more and workers\n working less than 150 days\n- 150 days or more, workers","Reported both - workers working\n 150 days or more and workers\n working less than 150 days\n- less than 150 days, workers",Total migrant workers,Unpaid workers
0,ALABAMA,2017,26136,9734,16402,5021,12220,4713,4182,1864,43162
1,ALASKA,2017,1988,537,1451,129,669,408,782,123,1479
2,ARIZONA,2017,24648,14254,10394,7012,3907,7242,6487,4059,21558
3,ARKANSAS,2017,29047,12694,16353,7156,10556,5538,5797,1794,44894
4,CALIFORNIA,2017,377593,187875,189718,78324,53532,109551,136186,105057,62897


In [3]:
# Define the 50 valid U.S. states
US_STATES = [
    'ALABAMA', 'ALASKA', 'ARIZONA', 'ARKANSAS', 'CALIFORNIA', 'COLORADO', 'CONNECTICUT',
    'DELAWARE', 'FLORIDA', 'GEORGIA', 'HAWAII', 'IDAHO', 'ILLINOIS', 'INDIANA', 'IOWA',
    'KANSAS', 'KENTUCKY', 'LOUISIANA', 'MAINE', 'MARYLAND', 'MASSACHUSETTS', 'MICHIGAN',
    'MINNESOTA', 'MISSISSIPPI', 'MISSOURI', 'MONTANA', 'NEBRASKA', 'NEVADA',
    'NEW HAMPSHIRE', 'NEW JERSEY', 'NEW MEXICO', 'NEW YORK', 'NORTH CAROLINA',
    'NORTH DAKOTA', 'OHIO', 'OKLAHOMA', 'OREGON', 'PENNSYLVANIA', 'RHODE ISLAND',
    'SOUTH CAROLINA', 'SOUTH DAKOTA', 'TENNESSEE', 'TEXAS', 'UTAH', 'VERMONT',
    'VIRGINIA', 'WASHINGTON', 'WEST VIRGINIA', 'WISCONSIN', 'WYOMING'
]

In [4]:
def clean_labor_file(file_path: str, year: int) -> pd.DataFrame:
    """
    Load and clean USDA hired labor CSV for a specific year.

    - Filters valid states
    - Standardizes column names
    - Extracts the hired farm labor column
    - Cleans numeric formatting
    - Adds year column
    """
    df = pd.read_csv(file_path)

    # Standardize column names and state formatting
    df.columns = df.columns.str.lower().str.strip()
    df["state"] = df["state"].str.upper().str.strip()

    # Filter to 50 U.S. states only
    df = df[df["state"].isin(US_STATES)]

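    # Locate the hired labor column; fail with a clear message if it is missing
    hired_cols = [col for col in df.columns if "hired" in col and "labor" in col]
    if not hired_cols:
        raise KeyError(f"No hired farm labor column found in {file_path}")
    hired_col = hired_cols[0]
    df = df[["state", hired_col]].copy()
    df = df.rename(columns={hired_col: "hired_farm_labor"})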

    # Remove thousands separators and convert to numeric; codes such as (D) become NaN
    df["hired_farm_labor"] = (
        df["hired_farm_labor"]
        .astype(str)
        .str.replace(",", "", regex=False)
    )
    df["hired_farm_labor"] = pd.to_numeric(df["hired_farm_labor"], errors="coerce")

    # Add year column
    df["year"] = year

    return df
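
The column lookup in `clean_labor_file` relies on lowercasing the headers and then matching substrings. The sketch below reproduces that logic in plain Python on a subset of the raw header names shown in the previews, to illustrate what the normalization does and does not change. The list `raw_columns` is an illustrative sample, not the full header row.

```python
# A subset of the raw header names from the previews (embedded newlines included).
raw_columns = [
    "state",
    "year",
    "Hired farm labor",
    "Workers by days worked\n- 150 days or more",
    "Workers by days worked\n- Less than 150 days",
    "Total migrant workers",
    "Unpaid workers",
]

# Same normalization as clean_labor_file: lowercase, then strip outer whitespace.
normalized = [name.lower().strip() for name in raw_columns]

# Same substring test as clean_labor_file.
matches = [name for name in normalized if "hired" in name and "labor" in name]
print(matches)  # ['hired farm labor']

# strip() only trims the ends, so embedded newlines are still present.
print("\n" in normalized[3])  # True
```

Only one header matches, so taking the first match is safe for these files. If the other columns were needed later, collapsing internal whitespace with `" ".join(name.split())` would be one way to make their names easier to reference.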


In [5]:
# Paths to your local data files (update if needed)
file_2012 = "Labor2012.csv"
file_2017 = "Labor2017.csv"

# Apply cleaning
labor_2012 = clean_labor_file(file_2012, 2012)
labor_2017 = clean_labor_file(file_2017, 2017)

# Combine cleaned data
combined_labor = pd.concat([labor_2012, labor_2017], ignore_index=True)
combined_labor.to_csv("cleaned_labor_data.csv", index=False)

# Display preview
print("Cleaned labor data saved as 'cleaned_labor_data.csv'")
print(f"Dataset size: {len(combined_labor)} rows")
combined_labor.head()

Cleaned labor data saved as 'cleaned_labor_data.csv'
Dataset size: 100 rows


Unnamed: 0,state,hired_farm_labor,year
0,ALABAMA,32948,2012
1,ALASKA,1577,2012
2,ARIZONA,29245,2012
3,ARKANSAS,33104,2012
4,CALIFORNIA,465422,2012
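
With both years in one table, a natural next step is comparing hired farm labor between 2012 and 2017. The sketch below does this for the five previewed states using values copied from the previews. `percent_change` is a hypothetical helper, and the dictionaries stand in for the full `combined_labor` frame.

```python
# Hired farm labor for the previewed states, copied from the 2012 and 2017 previews.
hired_2012 = {
    "ALABAMA": 32948, "ALASKA": 1577, "ARIZONA": 29245,
    "ARKANSAS": 33104, "CALIFORNIA": 465422,
}
hired_2017 = {
    "ALABAMA": 26136, "ALASKA": 1988, "ARIZONA": 24648,
    "ARKANSAS": 29047, "CALIFORNIA": 377593,
}


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Compute the 2012 -> 2017 change for every state present in both years.
changes = {
    state: percent_change(hired_2012[state], hired_2017[state])
    for state in hired_2012
    if state in hired_2017
}

for state, change in changes.items():
    print(f"{state}: {change:+.1f}%")
```

Among these five states, only Alaska shows an increase. On the full cleaned frame, pivoting with `DataFrame.pivot(index="state", columns="year", values="hired_farm_labor")` would give one row per state for the same comparison.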


## Data Quality & Limitations 

The “Hired farm labor” column excludes contract laborers, meaning workers hired through third-party labor contractors are not counted. The “Unpaid workers” and “Total migrant workers” categories are not mutually exclusive with “Hired farm labor”: both may include individuals who are also counted as hired labor, so the columns should not be summed to obtain a total workforce. In the cleaned file, non-numeric entries such as (D) are converted to missing values.
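
Because the categories overlap, they are better compared than summed. The sketch below computes migrant workers per hired worker and returns no value when an input is suppressed. `migrant_ratio` is a hypothetical helper, and the ratio is only a rough indicator, since this section does not establish that migrant workers are a strict subset of hired labor.

```python
from typing import Optional


def migrant_ratio(migrant: Optional[int], hired: Optional[int]) -> Optional[float]:
    """Migrant workers per hired worker, or None when a value is missing or hired is 0."""
    if migrant is None or hired is None or hired == 0:
        return None
    return migrant / hired


# California 2012 from the preview: 131,457 migrant workers, 465,422 hired workers.
print(round(migrant_ratio(131457, 465422), 3))  # 0.282

# Alaska 2012 reports (D) for migrant workers, represented here as None.
print(migrant_ratio(None, 1577))  # None
```

Returning `None` rather than zero keeps suppressed states out of averages and rankings instead of treating them as having no migrant workers.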