What Works Clearinghouse API

Update 05/22/20: I wouldn't say I'm abandoning this project, but I am discontinuing work on it for the foreseeable future. I touch on the reasons for doing so - as well as what I learned from getting this far! - at a post here.

If you're looking for code to extract and apply to your own exploration of the WWC data, honestly most of the interesting stuff is in the db/* dir: especially the three sets of wwc_* ETL scripts and the (pretty gnarly, if I say so myself!) PostgreSQL of the 20200507003934_add_searchable_fields_to_studies.rb migration.

Between the two of them, they should get you pretty close to having a normalized, SQL-friendly version of the WWC dataset. NB that there are several extant discrepancies in the original schema; I've yet to submit them to WWC for correction, but they can be located at notes_and_docs/initial_data_problems.md if you'd like to do so!

Repo Purpose

I used my previous Rails toy app to get more familiar with foreman, ActionMailer, Rails 5.1+ system testing, webpack, hand-rolled session-based auth(z/n), and basic full-text search in PostgreSQL.

I'm using this one to learn about JWT, and PostgreSQL's more in-depth full-text search options, before setting it up to feed JSON to (at least one) SPA... so I can learn exactly how much I dislike this decoupled approach to app development ^_^

(In addition, I like what WWC does quite a bit, but I find their current browser UI opaque and unergonomic to navigate.)

Setup

DB Creation
- Option 01: run `rails db:reset studies=db/WWC-export-archive-2020-Apr-25-142355/Studies.csv findings=db/WWC-export-archive-2020-Apr-25-142355/Findings.csv reports=db/WWC-export-archive-2020-Apr-25-142355/InterventionReports.csv`
  - For newer data, simply substitute the CSV filepaths: modulo any newly-added corruptions to the data, the scrubbers/loaders should function identically.
  - This option is sloooooooooow -- like, ~8-9 minutes slow. It's doing tons of sequential table scans and instantiating tons of ActiveRecord objects (neither of which is necessary, but removing them is an optimization I haven't yet had time for).
- Option 02: run `rails db:create && rails db:migrate && psql -d wwc_api_development -f ../2020_04_25_data.sql`
  - This requires you to download a public Gist containing the data.
  - You're stuck with the data from April 25th, 2020 (unless you want to update and PR!) 😸
  - On the other hand, this method takes under a second.
JWT Testing
- The only requirement to make use of the scripts in notes_and_docs/wwc_api.postman_collection.json is to first create an email:password record in the users table
- The simplest way (i.e. no changes needed to those Postman queries) is to `rails c` in, then run `User.create!(email: 'foobar@example.com', password: 'password')`
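
For anyone new to JWT, here is a minimal, stdlib-only sketch of what issuing and verifying an HS256 token involves. It is not this app's actual auth code: `SECRET`, `jwt_encode`, `jwt_decode`, and `secure_equal?` are hypothetical names made up for illustration, and a real Rails app would more likely lean on a maintained JWT gem and a secret stored in credentials.

```ruby
require "openssl"
require "base64"
require "json"

# Hypothetical secret for illustration only; never hard-code one in a real app.
SECRET = "change-me"

# Base64url without padding, as the JWT format expects.
def b64url(bytes)
  Base64.urlsafe_encode64(bytes, padding: false)
end

def hmac_sha256(secret, data)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), secret, data)
end

# Constant-time string comparison, to avoid leaking signature bytes via timing.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

# Build "header.payload.signature" for the given payload hash.
def jwt_encode(payload, secret = SECRET)
  header = { alg: "HS256", typ: "JWT" }
  signing_input = [header, payload].map { |part| b64url(JSON.generate(part)) }.join(".")
  "#{signing_input}.#{b64url(hmac_sha256(secret, signing_input))}"
end

# Return the payload hash, or nil if the token is malformed, tampered with, or expired.
def jwt_decode(token, secret = SECRET)
  header_b64, payload_b64, signature_b64 = token.split(".")
  return nil unless header_b64 && payload_b64 && signature_b64

  expected = b64url(hmac_sha256(secret, "#{header_b64}.#{payload_b64}"))
  return nil unless secure_equal?(expected, signature_b64)

  payload = JSON.parse(Base64.urlsafe_decode64(payload_b64))
  return nil if payload["exp"] && payload["exp"] < Time.now.to_i
  payload
end

token = jwt_encode({ "email" => "foobar@example.com", "exp" => Time.now.to_i + 3600 })
claims = jwt_decode(token)
puts claims ? "valid token for #{claims['email']}" : "token rejected"
```

The signature is checked before the payload is parsed, so a token whose payload has been edited is rejected without its contents ever being trusted.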
Querying
- There are currently two search endpoints:
  - StudySearchesController#autocomplete, uh, serializes and returns your query params. (As close to a noop placeholder as I could get!)
  - StudySearchesController#create performs a full-text search against any of the three tokenized author, title, and publication fields.
- The eventual (and, see above, currently indefinitely-paused) goal is to debounce-hit autocomplete to gather a list of viable query-terms as the user types their entry.
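
As a rough illustration of how raw, as-you-type input could be turned into something PostgreSQL's `to_tsquery` will accept, here is a small sketch. It is an assumption about one possible approach, not what `StudySearchesController` currently does; `to_prefix_tsquery` is a hypothetical helper, and the usage in the trailing comment assumes `title_fts` is a `tsvector` column.

```ruby
# Turn free-form user input into a to_tsquery-compatible string.
# Every term must match (&), and the last term is treated as a prefix (:*),
# since the user may still be typing it.
def to_prefix_tsquery(input)
  # Keeping only alphanumeric runs drops characters that to_tsquery would
  # otherwise interpret as operators (&, |, !, parentheses, colons, quotes).
  terms = input.downcase.scan(/[[:alnum:]]+/)
  return nil if terms.empty?

  *complete, last = terms
  (complete + ["#{last}:*"]).join(" & ")
end

puts to_prefix_tsquery("Early  literacy interv")

# Hypothetical usage inside the Rails app (column type is an assumption):
#   query = to_prefix_tsquery(params[:q])
#   Study.where("title_fts @@ to_tsquery('simple', ?)", query) if query
```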

Next Steps: API/Server

Finish studies-search page
- Add logic for prefilter using sidebar/request.body-params
Add studies autocomplete
- add trigrams columns, per docs
- use the same regexp you did to extract author_fts, title_fts, and publication_fts
- add method on Study model (or elsewhere?) to select ten (20?) most-similar words from that column
Add interventions-search page
- Add scraper script for FTS descriptions field on interventions table
  - Use Intervention_Page_URL?
- Extract outcome_domain to separate Model (...eventually)
- How does products relate to interventions in the reviews table?
Add [Review, Finding] search (by Protocol / Protocol Version...and Standards?)
Add Histogram chart (with selector for what to plot on x/y axes? Or static RQs, like...)
- Which topics most commonly collocate with each other?
- Which topics most commonly collocate across years?
- Which fields return the most/strongest findings?
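
For the "Add studies autocomplete" item above, the following pure-Ruby sketch shows the idea behind trigram similarity, approximating how PostgreSQL's pg_trgm extension scores two strings (lowercase, pad each word, compare the sets of three-character slices). In the app itself this work would presumably be done in SQL against the trigram columns; `trigrams`, `similarity`, and `most_similar` are hypothetical names, and the word list is made up.

```ruby
require "set"

# Split text into words, pad each with two leading spaces and one trailing
# space, and collect every three-character slice into a set.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by total distinct trigrams (0.0 to 1.0).
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  union.zero? ? 0.0 : (ta & tb).size.to_f / union
end

# Pick the `limit` candidates most similar to the query, best match first.
def most_similar(query, candidates, limit: 10)
  candidates.max_by(limit) { |word| similarity(query, word) }
end

# Made-up author surnames standing in for values from a trigram column.
authors = %w[Smith Smythe Jones Schmidt]
puts most_similar("smth", authors, limit: 2).inspect
```

Because the score is based on overlapping character slices rather than whole tokens, a misspelled or partial entry can still rank the intended word highly, which is what makes it a reasonable fit for autocomplete.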

Next Steps: Client

Build Controller classes only as needed
No CSS framework: use FEM notes/O'Reilly books (can possibly reuse across apps)
One API, two SPAs
- Vue app
  - New framework
  - Still have component classes/lifecycle events
- React app
  - Familiar framework
  - Only use Hooks and Context APIs for state-management
Consider building a third, HTML-first, version: perhaps using this fetch() demo for faster reloads
