RefugeeMatch — Caseworker Decision Support Tool

A machine learning system that predicts whether newly-arrived refugee families will struggle to afford monthly living expenses, helping resettlement caseworkers identify families who need proactive support.

Author: Abbot Tubeine, Sattler College
Course: BUS302 Advances in Data
Dataset: 2022 Annual Survey of Refugees (ASR), distributed via ICPSR

What's in this project

File	Purpose
`refugee_model.ipynb`	Training notebook — data cleaning, feature selection, model training, optimization, model export
`app.py`	Streamlit caseworker dashboard — intake form, risk prediction, explanations, recommendations
`requirements.txt`	Python dependencies
`asr_2022_data.dta`	Source data (download from ICPSR; rename from `2022 ASR_Public_Use_File.dta`)
`refugee_model.pkl`	Trained model bundle (generated by running the notebook)

How to run end-to-end

Step 1: Train the model

1. Get the 2022 ASR data:
   - Visit ICPSR project E207021V1
   - Create a free account and download the zip
   - Extract `2022 ASR_Public_Use_File.dta`
   - Rename it to `asr_2022_data.dta`
2. Open `refugee_model.ipynb` in Jupyter or Google Colab.
3. Run all cells top to bottom. The final cells (Section 14) save `refugee_model.pkl`, the bundle the Streamlit app needs.
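
The hand-off between the notebook and the app can be pictured with a small sketch. This README does not document what `refugee_model.pkl` contains, so the dictionary keys below (`model`, `feature_names`, `threshold`) and the helper names `save_bundle` and `load_bundle` are assumptions made for illustration, not the notebook's actual export code.

```python
import pickle
import tempfile
from pathlib import Path

# Assumed bundle layout; the real notebook may store different keys.
REQUIRED_KEYS = {"model", "feature_names", "threshold"}


def save_bundle(path, model, feature_names, threshold):
    """Write the model and the metadata an app would need into one pickle file."""
    bundle = {
        "model": model,                          # any picklable estimator
        "feature_names": list(feature_names),    # column order used in training
        "threshold": float(threshold),           # decision cut-off chosen in training
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_bundle(path):
    """Read the bundle back and fail early if an expected key is absent."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    missing = REQUIRED_KEYS - set(bundle)
    if missing:
        raise KeyError(f"bundle is missing keys: {sorted(missing)}")
    return bundle


# Round-trip demo with a stand-in "model" (a plain dict, not a real estimator).
demo_path = Path(tempfile.mkdtemp()) / "demo_bundle.pkl"
save_bundle(demo_path, {"kind": "placeholder"}, ["household_size", "age"], 0.3)
restored = load_bundle(demo_path)
print(restored["feature_names"], restored["threshold"])
```

Keeping the feature order and threshold next to the model means the app does not have to hard-code values that were chosen during training. Only unpickle files you generated yourself, since `pickle.load` can execute arbitrary code.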

Step 2: Run the Streamlit app locally

pip install -r requirements.txt
streamlit run app.py

Then open the URL shown (typically http://localhost:8501) in your browser.

Step 3 (optional): Deploy to the web

The fastest way to share the app:

1. Push this directory to a GitHub repo. Do not push `asr_2022_data.dta`; it is licensed.
2. Go to share.streamlit.io
3. Sign in with GitHub and click "New app"
4. Point it at your repo and select `app.py` as the main file
5. Click Deploy

The app will be live at a public URL within a couple of minutes.

How the model works

Design constraint

The model uses only intake-collectable features — things a caseworker can know during the first conversation with a newly-arrived refugee. It does NOT use post-arrival information like current employment status, current English level, or current income, even though those would improve raw accuracy.

Features used (12)

- Demographics: household size, sex, age
- Background: lived in refugee camp, years in camp, marital status
- Skills: education before U.S. arrival, native language literacy, English on arrival
- Pre-arrival history: work status in home country (Employed / Self-employed / Not working / Other)
- Placement: region of resettlement, year of arrival

Features deliberately excluded

- Country of birth: excluded for fairness reasons (protected demographic; signal is largely captured by other features)
- All post-arrival outcomes: would not be available at intake time
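
One way to enforce the intake-only constraint is an explicit allow-list applied to every record before it reaches the model. The sketch below assumes hypothetical field names; the actual ASR variable names and the notebook's column names are not given in this README and will differ.

```python
# Hypothetical field names standing in for the 12 intake features listed above.
INTAKE_FEATURES = (
    "household_size", "sex", "age",                                 # demographics
    "lived_in_camp", "years_in_camp", "marital_status",             # background
    "education_pre_arrival", "native_literacy", "english_on_arrival",  # skills
    "home_country_work_status",                                     # pre-arrival history
    "region", "arrival_year",                                       # placement
)


def select_intake_features(record):
    """Return only the allow-listed fields, in a fixed order.

    Anything else in the record (post-arrival outcomes, country of birth)
    is dropped. A missing intake field raises an error instead of being
    silently filled in.
    """
    missing = [name for name in INTAKE_FEATURES if name not in record]
    if missing:
        raise ValueError(f"intake record is missing fields: {missing}")
    return {name: record[name] for name in INTAKE_FEATURES}


# Demo record with two fields the model must not see.
raw_record = {
    "household_size": 5, "sex": "F", "age": 34,
    "lived_in_camp": True, "years_in_camp": 6, "marital_status": "Married",
    "education_pre_arrival": "Secondary", "native_literacy": True,
    "english_on_arrival": "None",
    "home_country_work_status": "Self-employed",
    "region": "Midwest", "arrival_year": 2021,
    "current_income": 1800,          # post-arrival, excluded
    "country_of_birth": "Example",   # excluded for fairness reasons
}
model_input = select_intake_features(raw_record)
dropped = sorted(set(raw_record) - set(model_input))
print(dropped)
```

An allow-list is used here instead of a block-list so that a new survey column is excluded by default until someone decides it is collectable at intake.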

Performance

- Test F1 score: ~0.30–0.40 (after optimization)
- Test ROC-AUC: ~0.70–0.75
- Decision threshold tuned to maximize F1, given the 87/13 class imbalance
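
Threshold tuning for F1 can be sketched in plain Python. This is a minimal illustration of the general technique, not the notebook's code: the function names and the toy labels and scores are made up, and label 1 is assumed to mean "cannot afford monthly expenses".

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the positive class when predicting 1 if score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, scores):
    """Try every distinct score as a cut-off and keep the one with the highest F1."""
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))
    return best, f1_at_threshold(y_true, scores, best)


# Toy validation data: 2 positives out of 10, with fairly low predicted scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.60]

default_f1 = f1_at_threshold(y_true, scores, 0.5)
best_t, best_f1 = best_f1_threshold(y_true, scores)
print(f"F1 at 0.5: {default_f1:.3f}; best threshold {best_t} gives F1 {best_f1:.3f}")
```

With a rare positive class, predicted probabilities tend to sit below 0.5, so the F1-optimal cut-off is often lower than the default. The threshold should be chosen on validation data and not on the test set, otherwise the reported test F1 is optimistic.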

Limitations

This is a decision-support tool, not a replacement for caseworker judgment:

- Small sample size: ~1,400 usable rows after cleaning
- Severe class imbalance: only ~13% of refugees in the data couldn't afford monthly expenses
- Region-level geography only: the public-use file suppresses cities, so the model can't capture city-level economic variation
- Single survey year (2022): outcomes may shift across years
- Self-reported target: "can pay expenses" is subjective
- Survivorship bias: only refugees who stayed at their resettlement address were surveyed

All flagged cases should be reviewed by a qualified caseworker. The model is meant to surface families who may benefit from extra attention, not to make resource allocation decisions on its own.
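
The class-imbalance limitation is also why accuracy is a poor yardstick for this task. The sketch below uses synthetic labels in the same 87/13 proportion (not the ASR data) and assumes label 1 means "cannot afford monthly expenses". It shows that a model which never flags anyone still scores 87% accuracy while finding none of the families who need support.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true positives (label 1) that were flagged."""
    positives = sum(1 for t in y_true if t == 1)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Synthetic labels: 13 struggling families out of 100, matching the 87/13 split.
y_true = [1] * 13 + [0] * 87

# A "model" that never flags anyone.
always_negative = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_negative)
baseline_recall = recall(y_true, always_negative)
print(f"accuracy={baseline_accuracy:.2f}, recall={baseline_recall:.2f}")
```

Any reported metric for this tool should therefore be read against that do-nothing baseline, which is the reason the Performance section reports F1 and ROC-AUC.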

Citation

Urban Institute. 2022 Annual Survey of Refugees. Inter-university Consortium for Political and Social Research [distributor], 2024-09-20. DOI: 10.3886/E207021V1
