thelabdc/OVSJG-SUSO-public

OVSJG-SUSO

Code for "Does outreach encouraging families to engage with community-based organizations increase engagement and school attendance?"

Automation

Much of this repository automates the letter-sending process. To set it up, run:

poetry install

You will then find the command susocli on your path. It requires a config file; a template is provided in config.template.yml, which you'll need to fill out.

WARNING: The Dockerfile is currently out of date with respect to the migration to poetry. To fix it, install poetry in the Dockerfile and use poetry install instead of pip install -r requirements.txt. Similarly, change the commands below to poetry run susocli instead of simply susocli.

The automation itself is handled by the Dockerfile. The image is run on the ktensor box (10.56.6.64) as the following cron jobs:

0 19-23 * * 1-5 docker run --rm -v /mnt/dockervols/suso:/work thelabdc/ovsjg-suso susocli run /suso/config.yml -t /work/tex -p /work/pdf >> /mnt/dockervols/suso/log 2>&1
1 0 * * 2-6 docker run --rm -v /mnt/dockervols/suso:/work thelabdc/ovsjg-suso susocli run /suso/config.yml -t /work/tex -p /work/pdf >> /mnt/dockervols/suso/log 2>&1
31 0 * * 2-6 docker run --rm -v /mnt/dockervols/suso:/work thelabdc/ovsjg-suso susocli run /suso/config.yml -t /work/tex -p /work/pdf >> /mnt/dockervols/suso/log 2>&1

Note that before the first susocli run, you need to run susocli create to create the relevant database tables.

Running the attendance analyses in order

To run the attendance analysis front to back, first install all required dependencies:

poetry install
Rscript -e 'renv::restore()'

If you do not have renv and poetry installed, these commands will provide them:

curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3 -
Rscript -e 'if(!requireNamespace("remotes")){install.packages("remotes");remotes::install_github("rstudio/renv")} else {remotes::install_github("rstudio/renv")}'

Once these are installed, you will need the files enumerated in required_data.txt. If you are a member of The Lab @ DC, you should be able to find them in our long-term storage. If you are not a member of The Lab @ DC, please contact thelab@dc.gov, though note that access to these data is restricted by agreements with our project partners.

Once these files are in place, you can run:

make all

The outputs of all scripts will be placed in the output directory. The following notes detail exactly what each script does.

Pull, clean, and merge data (src/notebooks/100_pull_and_clean_data)

These files require access to The Lab's OSSE-wide attendance data files for SY 16-17 (pre-intervention year) and SY 17-18 (intervention year). Please note that the data associated with this project is considered highly sensitive. Any researcher wanting to rerun these analyses should contact The Lab @ DC at thelab@dc.gov to discuss potential data sharing agreements.

  1. 015_convert_to_parquet.ipynb
  • Takes in:

    • Zipped attendance files for DCPS and PCS for 16-17 and 17-18
  • What it does:

    • Converts to parquet format
  • Outputs:

    • Parquet format attendance files for DCPS and PCS for 16-17 and 17-18
  1. 020_createlookuptable_suso_osse.ipynb
  • Takes in:

    • SUSO randomization data
    • Student attributes from OSSE (e.g. student name, date of birth)
  • What it does:

    • Reads in data from SUSO randomization
    • Reads in student attributes from OSSE (e.g. student name, date of birth)
    • First tries exact matches on cleaned names and date of birth
    • Then uses fuzzy matching to find approximate matches (usually needed because of name misspellings)
  • Outputs:

    • lookup_suso_osse: a table stored in the database with each student's SUSO ID, their OSSE ID (USI), and whether the match was exact or fuzzy; used in the next script to subset long-form OSSE attendance data (student-day) to SUSO students
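
The exact-then-fuzzy matching described above can be illustrated with a minimal sketch. It is not the notebook's code: the field names (suso_id, usi, name, dob), the 0.85 similarity threshold, the restriction of fuzzy candidates to students with the same date of birth, and the sample records are all assumptions made for illustration, and the standard library's difflib stands in for whatever fuzzy matcher the notebook uses.

```python
from difflib import SequenceMatcher


def clean_name(name: str) -> str:
    """Lowercase a name and drop everything except letters."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


def match_students(suso_rows, osse_rows, threshold=0.85):
    """Return {suso_id: (usi, match_type)} using exact, then fuzzy, matching.

    The dict keys used here (suso_id, usi, name, dob) are hypothetical.
    """
    # Index OSSE records by (cleaned name, dob) for the exact pass, and by
    # dob alone so the fuzzy pass only compares plausible candidates.
    exact_index = {}
    by_dob = {}
    for row in osse_rows:
        exact_index[(clean_name(row["name"]), row["dob"])] = row["usi"]
        by_dob.setdefault(row["dob"], []).append(row)

    lookup = {}
    for row in suso_rows:
        name = clean_name(row["name"])
        key = (name, row["dob"])
        if key in exact_index:
            lookup[row["suso_id"]] = (exact_index[key], "exact")
            continue
        # Fuzzy pass: best name similarity among students with the same dob.
        best_usi, best_score = None, 0.0
        for cand in by_dob.get(row["dob"], []):
            score = SequenceMatcher(None, name, clean_name(cand["name"])).ratio()
            if score > best_score:
                best_usi, best_score = cand["usi"], score
        if best_usi is not None and best_score >= threshold:
            lookup[row["suso_id"]] = (best_usi, "fuzzy")
    return lookup


# Made-up records, for illustration only.
suso = [
    {"suso_id": "S1", "name": "Maria Lopez", "dob": "2008-03-14"},
    {"suso_id": "S2", "name": "Jonathon Smith", "dob": "2007-11-02"},
    {"suso_id": "S3", "name": "Ada Young", "dob": "2009-01-20"},
]
osse = [
    {"usi": 1001, "name": "MARIA LOPEZ", "dob": "2008-03-14"},
    {"usi": 1002, "name": "Jonathan Smith", "dob": "2007-11-02"},
    {"usi": 1003, "name": "Bo Chen", "dob": "2009-01-20"},
]
lookup = match_students(suso, osse)
```

In this sketch S1 matches exactly after cleaning, S2 matches only through the fuzzy pass because of the spelling difference, and S3 stays unmatched because no candidate clears the threshold.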
  1. 030_attendance_cleaning.ipynb
  • Takes in:

    • lookup_osse_suso: lookup table with the USIs (OSSE attendance data unique identifiers) of students referred to SUSO
  • What it does:

    • For the present and previous school years, constructs daily tallies of the different types of absences and flags whether a student is truant as of each date (10 or more unexcused absences) and/or chronically absent as of each date (absent, whether excused or unexcused, for more than 10% of school attendance days)
  • Outputs: parquet files used in subsequent scripts

    • dcps_sy1718_attendanceoutcomes_suso: used for intervention year analysis
    • charter_sy1617_attendanceoutcomes_suso: used for PAP baselines
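
As a rough illustration of the daily tallies and flags described in this step, the sketch below walks through one student's days in order and applies the two thresholds. The status codes, the DayRecord type, and the choice to compute the 10% share over days elapsed so far are assumptions, not details taken from the notebook.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DayRecord:
    day: date     # a school attendance day
    status: str   # "present", "excused", or "unexcused" (hypothetical codes)


def running_flags(records, truancy_threshold=10, chronic_share=0.10):
    """Return one (day, unexcused, absent, truant, chronic) tuple per day."""
    unexcused = absent = 0
    rows = []
    ordered = sorted(records, key=lambda r: r.day)
    for days_so_far, rec in enumerate(ordered, start=1):
        if rec.status == "unexcused":
            unexcused += 1
        if rec.status in ("excused", "unexcused"):
            absent += 1
        # Truant: 10 or more unexcused absences as of this date.
        truant = unexcused >= truancy_threshold
        # Chronically absent: absent on more than 10% of days so far (assumed).
        chronic = absent > chronic_share * days_so_far
        rows.append((rec.day, unexcused, absent, truant, chronic))
    return rows


# Made-up record: 40 consecutive days (weekends ignored for brevity),
# unexcused every fourth day and excused once.
start = date(2017, 8, 21)
records = [
    DayRecord(
        day=start + timedelta(days=i),
        status="unexcused" if i % 4 == 0 else ("excused" if i == 1 else "present"),
    )
    for i in range(40)
]
rows = running_flags(records)
```

Here the truancy flag first turns on at the tenth unexcused absence (the 37th day of the made-up record).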
  1. 040_descriptives_previoussy_forPAP.ipynb
  • Takes in:

    • Tables from previous script: dcps/charter_sy1617...
    • Geojson files that have lat/long of schools
    • DC open data geojson file of DC census tract demographics
  • What it does:

    • Descriptives on attendance outcomes at end of school year
    • Map truancy rates by school
  • Outputs: figures for pre-analysis plan and "background" section of writeup

  1. 050_cleanstudents_whoswitchsystems.ipynb
  • Takes in:

    • Tables from attendance_cleaning_sql that reflect SUSO-year (SY 17-18) attendance: dcps/charter_sy1718
  • What it does:

    • Identifies students who are in both DCPS and charter schools at some point during the year
    • Checks whether those students' attendance is accurately recorded in attendance_both_clean.parquet, the file created in the previous script
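
The check for students who switch systems might look roughly like the following sketch. It assumes attendance rows reduced to (USI, date) pairs and uses made-up data; the notebook's actual tables and checks may differ.

```python
def students_in_both(dcps_rows, charter_rows):
    """Return USIs that appear in both sectors' attendance rows."""
    dcps_ids = {usi for usi, _day in dcps_rows}
    charter_ids = {usi for usi, _day in charter_rows}
    return dcps_ids & charter_ids


def double_counted_days(dcps_rows, charter_rows):
    """Return (usi, day) pairs recorded in both sectors on the same day."""
    return sorted(set(dcps_rows) & set(charter_rows))


# Made-up (usi, ISO date) rows, for illustration only.
dcps = [(1001, "2017-09-05"), (1001, "2017-09-06"), (1002, "2017-09-05")]
charter = [(1001, "2017-09-06"), (1001, "2017-09-07"), (1003, "2017-09-05")]

switchers = students_in_both(dcps, charter)
overlaps = double_counted_days(dcps, charter)
```

Days that show up in both sectors for the same student are the ones most likely to be double counted in a combined file, which is why the sketch reports them separately.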

Descriptive Analyses (src/notebooks/200_descriptives)

These notebooks provide the descriptive analyses that appear in the report.

  1. 060_sample_descriptives.ipynb
  • Takes in:

    • Parquet files of all DCPS and PCS students in SY 17-18
    • Lookup table (suso_osse_lookup.pkl) and merged OSSE-SUSO data indicating randomization status (df_suso_merged.csv)
  • What it does:

    • Summarizes descriptive characteristics for three groups of students: (1) students in SUSO schools but not in sample, (2) students in SUSO schools and in sample, (3) students not in SUSO schools
    • Summarizes balance across treatment and control groups
  • Outputs: figures for writeup saved in output
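
One common way to summarize balance between treatment and control groups is the standardized mean difference. The sketch below shows that metric only as an example; the README does not say which balance statistics the notebook reports, and the function name and sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Made-up baseline absence counts, for illustration only.
balanced = standardized_mean_difference([8, 10, 12, 9, 11], [9, 10, 11, 10, 10])
imbalanced = standardized_mean_difference([1, 2, 3], [2, 3, 4])
```

Values near 0 suggest the groups look similar on that covariate; the second call returns a difference of one pooled standard deviation.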

  1. 070_attendance_descriptives.ipynb
  • Takes in: data created in earlier scripts

    • Attendance end-of-year outcomes: attendance_eoy_wsuso.pkl
    • SUSO data: df_suso_merged.csv
    • Attendance daily outcomes: attendance_both_clean.pkl
  • What it does:

    • Finds the start dates for the different clocks (7 days post-referral versus observed delivery date)
    • For the delivery-date clock, matches each control group student with the treatment (tx) student whose referral date is nearest
    • Using the different clocks, calculates changes in absences from start of clock to two calendar weeks after the end of the clock
  • Outputs:

    • figures for writeup
    • attendance_readyforAB.pkl
    • attendance_readyforregressions.csv
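
The two clocks can be sketched as follows. This is one reading of the description above, not the notebook's code: it assumes the referral clock starts 7 days after referral, and that a control student's delivery-date clock starts on the delivery date of the treatment student with the nearest referral date. Function names and dates are hypothetical.

```python
from datetime import date, timedelta


def referral_clock_start(referral_date, lag_days=7):
    """Start of the referral clock: a fixed lag after the referral date."""
    return referral_date + timedelta(days=lag_days)


def delivery_clock_starts(control_referrals, treated):
    """Give each control student the delivery date of the nearest-referral tx student.

    control_referrals: {student_id: referral_date}
    treated: list of (referral_date, delivery_date) pairs
    """
    starts = {}
    for student_id, referral in control_referrals.items():
        nearest = min(treated, key=lambda t: abs((t[0] - referral).days))
        starts[student_id] = nearest[1]
    return starts


# Made-up dates, for illustration only.
treated = [
    (date(2018, 1, 8), date(2018, 1, 12)),
    (date(2018, 2, 5), date(2018, 2, 10)),
]
control = {"C1": date(2018, 1, 10), "C2": date(2018, 2, 1)}

starts = delivery_clock_starts(control, treated)
c1_referral_start = referral_clock_start(control["C1"])
```

Control students have no observed delivery date of their own, so borrowing one from a treatment student referred at about the same time gives both groups a comparable starting point.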

Conduct A/B testing and regression analyses (src/notebooks/300_analysis)

These scripts perform the analysis on the data generated for the SUSO project. If you have a copy of the data generated in the previous scripts, you should be able to run these files front to back.

  1. 080_attendance_ABtests.ipynb
  • Takes in:

    • attendance_readyforAB.pkl (created in previous script)
  • What it does:

    • Conducts the A/B tests described in the pre-analysis plan
    • Plots the distribution of pr(treat > control) (or vice versa) over draws

  • Outputs: figures for writeup and attendanceoutcomes_posteriors_toplot.csv used in next script
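
For readers unfamiliar with this style of A/B test, the sketch below estimates the probability that one group's rate is lower than the other's by comparing posterior draws. It assumes a Beta-Binomial model with a uniform prior on a binary outcome; the pre-analysis plan's actual model is not described in this README, and the function name and counts are made up.

```python
import random


def prob_treat_lower(treat_events, treat_n, control_events, control_n,
                     draws=20000, seed=0):
    """Share of posterior draws in which the treatment rate is below control."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Beta(1 + events, 1 + non-events) posterior under a uniform prior.
        p_treat = rng.betavariate(1 + treat_events, 1 + treat_n - treat_events)
        p_control = rng.betavariate(1 + control_events, 1 + control_n - control_events)
        if p_treat < p_control:
            wins += 1
    return wins / draws


# Made-up counts: students flagged truant out of each group.
clear_gap = prob_treat_lower(30, 200, 60, 200)
no_gap = prob_treat_lower(50, 200, 50, 200)
```

With a large gap between the groups the probability approaches 1, while identical counts give a value near 0.5.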

  1. 090_analyze_engagement.ipynb

Original notebook for analyzing impact on family engagement.

Visualize results and R-based regression robustness (src/R/400_additional_plots)

  1. 100_attendance_ABtests_additionalplots.Rmd
  • Takes in:

    • attendanceoutcomes_posteriors_toplot.csv
  • What it does:

    • Plots posterior differences in outcomes between treatment and control (a treatment effect would show up as a distribution skewed away from 0)
  • Outputs: figures for writeup

  1. 110_attendance_regressionrobustness.Rmd
  • Takes in:

    • attendance_readyforregressions.csv
  • What it does:

    • Runs regressions as robustness checks on the A/B tests (logistic regression for binary outcomes, with and without covariates; linear and negative binomial regression for count outcomes)
  • Outputs: figures and tables for writeup; tables are generated as LaTeX via stargazer and then copied to a .tex file
