tcodeabbot/refugeeMath_machine_learning_model

RefugeeMatch — Caseworker Decision Support Tool

A machine learning system that predicts whether newly-arrived refugee families will struggle to afford monthly living expenses, helping resettlement caseworkers identify families who need proactive support.

Author: Abbot Tubeine, Sattler College
Course: BUS302 Advances in Data
Dataset: 2022 Annual Survey of Refugees (ASR), distributed via ICPSR


What's in this project

File                 Purpose
refugee_model.ipynb  Training notebook: data cleaning, feature selection, model training, optimization, model export
app.py               Streamlit caseworker dashboard: intake form, risk prediction, explanations, recommendations
requirements.txt     Python dependencies
asr_2022_data.dta    Source data (download from ICPSR; rename from 2022 ASR_Public_Use_File.dta)
refugee_model.pkl    Trained model bundle (generated by running the notebook)

How to run end-to-end

Step 1: Train the model

  1. Get the 2022 ASR data:

    • Visit ICPSR project E207021V1
    • Create a free account and download the zip
    • Extract 2022 ASR_Public_Use_File.dta
    • Rename it to asr_2022_data.dta
  2. Open refugee_model.ipynb in Jupyter or Google Colab.

  3. Run all cells top to bottom. The final cells (Section 14) will save refugee_model.pkl — the bundle the Streamlit app needs.
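
As a rough illustration of the hand-off between the notebook and the app, the sketch below writes and reads a single-file bundle with the standard-library pickle module. The bundle layout (the keys model, feature_names, and threshold) and the helper names save_bundle and load_bundle are assumptions made for this example; the notebook's actual export cell may store different contents.

```python
import pickle
from pathlib import Path

# Hypothetical bundle layout; the real notebook may use other keys.
REQUIRED_KEYS = {"model", "feature_names", "threshold"}


def save_bundle(path, model, feature_names, threshold):
    """Write the model and its metadata to one pickle file."""
    bundle = {
        "model": model,                        # any picklable estimator
        "feature_names": list(feature_names),  # column order used in training
        "threshold": float(threshold),         # decision cutoff on predicted risk
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)
    return bundle


def load_bundle(path):
    """Read the bundle back and check that the expected keys are present."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run the training notebook first")
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    missing = REQUIRED_KEYS - set(bundle)
    if missing:
        raise KeyError(f"bundle is missing keys: {sorted(missing)}")
    return bundle
```

Keeping the feature order and the tuned threshold next to the model means the app does not have to hard-code either. Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.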

Step 2: Run the Streamlit app locally

pip install -r requirements.txt
streamlit run app.py

Then open the URL shown (typically http://localhost:8501) in your browser.

Step 3 (optional): Deploy to the web

The fastest way to share the app:

  1. Push this directory, including the generated refugee_model.pkl, to a GitHub repo. Don't push asr_2022_data.dta, because the data is licensed.
  2. Go to share.streamlit.io
  3. Sign in with GitHub and click "New app"
  4. Point it at your repo and select app.py as the main file
  5. Click Deploy

The app will be live at a public URL within a couple of minutes.


How the model works

Design constraint

The model uses only intake-collectable features — things a caseworker can know during the first conversation with a newly-arrived refugee. It does NOT use post-arrival information like current employment status, current English level, or current income, even though those would improve raw accuracy.

Features used (12)

  • Demographics: household size, sex, age
  • Background: lived in refugee camp, years in camp, marital status
  • Skills: education before U.S. arrival, native language literacy, English on arrival
  • Pre-arrival history: work status in home country (Employed / Self-employed / Not working / Other)
  • Placement: region of resettlement, year of arrival

Features deliberately excluded

  • Country of birth — excluded for fairness reasons (protected demographic; signal is largely captured by other features)
  • All post-arrival outcomes — would not be available at intake time
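
One way to enforce the intake-only constraint in code is to whitelist the 12 features and drop everything else before a row reaches the model. This is a minimal sketch, not the notebook's implementation: the column names below are hypothetical stand-ins for the features listed above, and the real ASR variable names will differ.

```python
# Hypothetical column names for the 12 intake-collectable features.
INTAKE_FEATURES = (
    "household_size", "sex", "age",                        # demographics
    "lived_in_camp", "years_in_camp", "marital_status",    # background
    "education_pre_arrival", "native_language_literacy",   # skills
    "english_on_arrival",
    "home_country_work_status",                            # pre-arrival history
    "resettlement_region", "arrival_year",                 # placement
)


def select_intake_columns(raw):
    """Return only the whitelisted intake features from a raw record.

    Anything not on the whitelist (for example country of birth or
    post-arrival outcomes) is dropped; a missing intake feature is an error.
    """
    missing = [name for name in INTAKE_FEATURES if name not in raw]
    if missing:
        raise ValueError(f"missing intake features: {missing}")
    return {name: raw[name] for name in INTAKE_FEATURES}
```

A whitelist is used rather than a blacklist so that a newly added survey column cannot leak into the model by default.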

Performance

  • Test F1 score: ~0.30–0.40 (after optimization)
  • Test ROC-AUC: ~0.70–0.75
  • Decision threshold tuned to maximize F1, given the 87/13 class imbalance (about 13% of cases are positive)
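
To make the threshold tuning concrete, here is a small dependency-free sketch of an F1-maximizing threshold search. It is an illustration under assumptions rather than the notebook's code: the function names are invented for this example, label 1 is taken to mean "cannot afford monthly expenses", and the search should be run on validation data rather than the test set.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when flagging every score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Try each distinct score as a cutoff; return (threshold, f1) of the best."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:  # strict '>' keeps the lowest cutoff on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data: 2 positives out of 10, with made-up model scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.6]
threshold, f1 = best_f1_threshold(y_true, scores)  # threshold 0.3, F1 0.8
```

On this toy data the default 0.5 cutoff catches only one of the two positives (F1 of about 0.67), while lowering it to 0.3 catches both at the cost of one false alarm. The imbalance is also why accuracy is a poor yardstick here: with an 87/13 split, a model that never flags anyone is 87% accurate but has an F1 of 0 for the families the tool is meant to find.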

Limitations

This is a decision-support tool, not a replacement for caseworker judgment:

  1. Small sample size — ~1,400 usable rows after cleaning
  2. Severe class imbalance — only ~13% of refugees in the data couldn't afford monthly expenses
  3. Region-level geography only — public-use file suppresses cities, so we can't capture city-level economic variation
  4. Single survey year (2022) — outcomes may shift across years
  5. Self-reported target — "can pay expenses" is subjective
  6. Survivorship bias — only refugees who stayed at their resettlement address were surveyed

All flagged cases should be reviewed by a qualified caseworker. The model is meant to surface families who may benefit from extra attention, not to make resource allocation decisions on its own.


Citation

Urban Institute. 2022 Annual Survey of Refugees. Inter-university Consortium for Political and Social Research [distributor], 2024-09-20. DOI: 10.3886/E207021V1
