💵 Downloader, preprocessor, parser and deduper for NIH and NSF grants

titipata/grant_database

Grant database

This repository provides a downloader, parser, and database for NIH and NSF grants, built from the data published on their websites. The link for NSF awards data is here and for NIH awards data is here.

Check out the nih and nsf folders, which provide bash and Python scripts to download the data and parse it into CSV files. Also check out the dedupe folder, where we will soon put scripts to deduplicate and link NIH and NSF grants together.
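As a rough illustration of the parse step (not the repository's actual script), the sketch below flattens award records from XML into CSV rows. It uses Python's standard-library xml.etree.ElementTree as a stand-in for lxml so that it runs without extra installs. The element names (Award, AwardID, AwardTitle, AwardAmount) and the sample record are assumptions made for illustration and may not match the real data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample record; the real element names may differ.
SAMPLE_XML = """<rootTag>
  <Award>
    <AwardID>0000001</AwardID>
    <AwardTitle>Example award title</AwardTitle>
    <AwardAmount>100000</AwardAmount>
  </Award>
</rootTag>"""

FIELDS = ["AwardID", "AwardTitle", "AwardAmount"]  # assumed column set


def parse_awards(xml_text):
    """Return one dict per <Award> element, keeping only FIELDS."""
    root = ET.fromstring(xml_text)
    rows = []
    for award in root.iter("Award"):
        row = {}
        for name in FIELDS:
            # findtext falls back to the default when the child is absent.
            row[name] = (award.findtext(name, default="") or "").strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling rows_to_csv(parse_awards(SAMPLE_XML)) gives a header line followed by one line per award. Swapping in lxml.etree should need few changes, since its API closely follows ElementTree.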

Download cleaned data from Amazon S3

First, install awscli using pip (see this instruction). We now provide parsed data for NSF. You can download it with awscli as follows:

aws s3 cp s3://grant-dataset/ data/ --recursive --exclude "dedupe/*" --region us-west-2 # download nih, nsf, and grid data

The dataset contains around 2M grants from NIH (1.7 GB) and 500k grants from NSF (700 MB).
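To check a download against the figures above, a small script along these lines can report the size and row count of each file. This is only a sketch: it assumes the files are UTF-8 CSVs with a header row stored somewhere under the target directory, and summarize_csvs is a hypothetical helper, not part of this repository.

```python
import csv
from pathlib import Path


def summarize_csvs(data_dir):
    """Map each CSV under data_dir to (size in bytes, number of data rows)."""
    base = Path(data_dir)
    summary = {}
    for path in sorted(base.rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed present)
            n_rows = sum(1 for _ in reader)
        # Key by path relative to data_dir, using forward slashes.
        summary[path.relative_to(base).as_posix()] = (path.stat().st_size, n_rows)
    return summary
```

For example, summarize_csvs("data") returns a dictionary keyed by relative file path. Using csv.reader rather than counting newlines keeps the row count correct when quoted fields, such as abstracts, contain line breaks.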

Install dependencies

The dependencies, pandas and lxml, are listed in requirements.txt. You can install them using pip.

pip install -r requirements.txt
