This repository has been archived by the owner on Jun 23, 2023. It is now read-only.

What Works Clearinghouse data, normalized for RESTful API format.


ypaulsussman/wwc_api


What Works Clearinghouse API

Update 05/22/20: I wouldn't say I'm abandoning this project, but I am discontinuing work on it for the foreseeable future. I touch on the reasons for doing so - as well as what I learned from getting this far! - in a post here.

If you're looking for code to extract and apply to your own exploration of the WWC data, honestly most of the interesting stuff is in the db/* dir: especially the three sets of wwc_* ETL scripts and the (pretty gnarly, if I say so myself!) PostgreSQL of the 20200507003934_add_searchable_fields_to_studies.rb migration.

Between the two of them, they should get you pretty close to having a normalized, SQL-friendly version of the WWC dataset. NB that there are several extant discrepancies in the original schema; I've yet to submit them to WWC for correction, but they can be located at notes_and_docs/initial_data_problems.md if you'd like to do so!

Repo Purpose

I used my previous Rails toy app to get more familiar with foreman, ActionMailer, Rails 5.1+ system testing, webpack, hand-rolled session-based auth(z/n), and basic full-text search in PostgreSQL.

I'm using this one to learn about JWT, and PostgreSQL's more in-depth full-text search options, before setting it up to feed JSON to (at least one) SPA... so I can learn exactly how much I dislike this decoupled approach to app development ^_^

(In addition, I like what WWC does quite a bit, but I find their current browser UI opaque and unergonomic to navigate.)

Setup

  • DB Creation

    • Option 01: run rails db:reset studies=db/WWC-export-archive-2020-Apr-25-142355/Studies.csv findings=db/WWC-export-archive-2020-Apr-25-142355/Findings.csv reports=db/WWC-export-archive-2020-Apr-25-142355/InterventionReports.csv
      • For newer data, simply substitute the CSV filepaths: modulo any newly-added corruptions to the data, the scrubbers/loaders should function identically.
      • This option is sloooooooooow -- like, ~8-9 minutes slow. It's doing tons of sequential table scans and instantiating tons of ActiveRecord objects. (Neither is necessary, but removing them is an optimization I haven't yet had time for.)
    • Option 02: run rails db:create && rails db:migrate && psql -d wwc_api_development -f ../2020_04_25_data.sql
      • This requires you to download a public Gist containing the data.
      • You're stuck with the data from April 25th, 2020 (unless you want to update and PR!) 😸
      • On the other hand, this method takes under a second.
  • JWT Testing

    • The only requirement to make use of the scripts in notes_and_docs/wwc_api.postman_collection.json is to first create an email:password record in the users table
    • The simplest way (i.e. no changes needed to those Postman queries) is to rails c in, then User.create!(email: 'foobar@example.com', password: 'password')
  • Querying

    • There are currently two search endpoints:
      • StudySearchesController#autocomplete, uh, serializes and returns your query params. (As close to a noop placeholder as I could get!)
      • StudySearchesController#create performs a full-text search against any of the three tokenized author, title, and publication fields.
    • The eventual (and, see above, currently indefinitely-paused) goal is to debounce-hit autocomplete to gather a list of viable query-terms as the user types their entry.
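
As a rough illustration of what a search like StudySearchesController#create might boil down to, here is a minimal sketch (not the repo's actual code). It assumes author_fts, title_fts, and publication_fts are tsvector columns on a studies table. The StudySearchSketch module, its method names, the 'simple' text-search configuration, and the prefix match on the final term are all hypothetical choices made for the example.

```ruby
# Hypothetical helper for illustration only; nothing here is taken from the repo.
module StudySearchSketch
  # Column names come from the README; treating them as tsvector columns
  # on a `studies` table is an assumption.
  FTS_COLUMNS = %w[author_fts title_fts publication_fts].freeze

  # Turn free-form user input into a string that to_tsquery can parse:
  # keep only runs of letters/digits, AND the terms together, and mark the
  # last term as a prefix match so partially typed words still hit.
  def self.tsquery_for(raw)
    terms = raw.to_s.downcase.scan(/[[:alnum:]]+/)
    return nil if terms.empty?

    (terms[0...-1] + ["#{terms.last}:*"]).join(" & ")
  end

  # Build SQL that matches the same bind parameter ($1) against any of the
  # three columns. The 'simple' configuration is an assumption; it would need
  # to match whatever configuration built the tsvector columns.
  def self.sql
    predicate = FTS_COLUMNS
                .map { |col| "#{col} @@ to_tsquery('simple', $1)" }
                .join(" OR ")
    "SELECT id FROM studies WHERE #{predicate}"
  end
end
```

The SQL and the sanitized query string could then be handed to something like the pg gem's `exec_params(StudySearchSketch.sql, [tsquery])`; the controller wiring and any ranking are left out of the sketch.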

Next Steps: API/Server

  • Finish studies-search page

    • Add logic for prefilter using sidebar/request.body-params
  • Add studies autocomplete

    • add trigram columns, per docs
    • use the same regexp you used to extract author_fts, title_fts, and publication_fts
    • add method on Study model (or elsewhere?) to select ten (20?) most-similar words from that column
  • Add interventions-search page

    • Add scraper script for FTS descriptions field on interventions table
      • Use Intervention_Page_URL?
    • Extract outcome_domain to separate Model (...eventually)
    • How does products relate to interventions in the reviews table?
  • Add [Review, Finding] search (by Protocol / Protocol Version...and Standards?)

  • Add Histogram chart (with selector for what to plot on x/y axes? Or static RQ's, like...)

    • Which topics most commonly collocate with each other?
    • Which topics most commonly collocate across years?
    • Which fields return the most/strongest findings?
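
The "select the ten most-similar words" step from the autocomplete item above can be sketched in plain Ruby. This is only an approximation of the trigram similarity that PostgreSQL's pg_trgm extension computes in the database, written out to show the idea. The TrigramSketch module and its method names are hypothetical, and a real implementation would presumably push this work into SQL rather than Ruby.

```ruby
require "set"

# Hypothetical illustration of trigram similarity; not code from the repo.
module TrigramSketch
  # Split text into lowercase alphanumeric words, pad each word with two
  # leading spaces and one trailing space (the convention the pg_trgm docs
  # describe), and collect every 3-character window into a set.
  def self.trigrams(text)
    text.to_s.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
      padded = "  #{word} "
      (0..padded.length - 3).each { |i| set << padded[i, 3] }
    end
  end

  # Shared trigrams divided by total distinct trigrams (a Jaccard ratio).
  def self.similarity(a, b)
    ta = trigrams(a)
    tb = trigrams(b)
    union = (ta | tb).size
    return 0.0 if union.zero?

    (ta & tb).size.to_f / union
  end

  # Return the `limit` candidates most similar to the query, best first.
  def self.most_similar(query, candidates, limit: 10)
    candidates.max_by(limit) { |word| similarity(query, word) }
  end
end
```

For example, `TrigramSketch.most_similar("readin", %w[math reading writing], limit: 1)` picks out "reading", since it shares far more trigrams with the partial input than the other candidates do.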

Next Steps: Client

  • Build Controller classes only as needed
  • No CSS framework: use FEM notes/O'Reilly books (can possibly reuse across apps)
  • One API, two SPAs
    • Vue app
      • New framework
      • Still have component classes/lifecycle events
    • React app
      • Familiar framework
      • Only use Hooks and the Context API for state management
  • Consider building a third, HTML-first, version: perhaps using this fetch() demo for faster reloads
