Improving cancer research with data!
- What is it?
- Main Features
- Impact
- Built With
- Getting Started
- Usage
- Roadmap
- Contributing
- License
- Contact
- Acknowledgements
trialtracker is a Python package that provides methods to easily extract, transform, and download clinical trial data. It aims to create standardized data infrastructure for clinical trial digitalization, focusing on structured representation of clinical trial protocols.
Here are some of the things trialtracker allows you to do:
- Download pre-curated clinical trial and clinical trial eligibility criteria datasets
- Easily query data from clinicaltrials.gov
- Apply state-of-the-art natural language processing methods to extract useful information from raw clinicaltrials.gov data
- Visualize and analyze clinical trial data
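To make the querying feature concrete, here is a minimal sketch of what pulling cancer trials from clinicaltrials.gov can look like using only the Python standard library. It does not use trialtracker's own API, which is not shown in this README. The endpoint, the query parameters (`query.cond`, `pageSize`, `format`), and the response field names are assumptions based on the public ClinicalTrials.gov API v2, and the function names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public ClinicalTrials.gov API v2 studies endpoint.
API_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_query_url(condition: str, page_size: int = 5) -> str:
    """Build a studies-search URL for a condition such as 'breast cancer'."""
    params = {"query.cond": condition, "pageSize": page_size, "format": "json"}
    return API_URL + "?" + urlencode(params)


def summarize_studies(payload: Dict) -> List[Dict[str, str]]:
    """Pull the NCT ID and brief title out of each study in a response."""
    rows = []
    for study in payload.get("studies", []):
        # Assumed layout: protocolSection -> identificationModule.
        ident = study.get("protocolSection", {}).get("identificationModule", {})
        rows.append(
            {"nct_id": ident.get("nctId", ""), "title": ident.get("briefTitle", "")}
        )
    return rows


def fetch_studies(condition: str, page_size: int = 5) -> List[Dict[str, str]]:
    """Query the live API and summarize the first page (needs network access)."""
    with urlopen(build_query_url(condition, page_size), timeout=30) as response:
        return summarize_studies(json.load(response))
```

Calling `fetch_studies("breast cancer")` would return one small dictionary per trial on the first page of results. Keeping URL construction and response parsing in separate functions lets both be checked without a network connection.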
The current version of the package is primarily focused on cancer trials, which are an important area for clinical development. Improved data infrastructure is especially helpful in this area given the complexity of the disease and treatments.
Cancer is one of the leading causes of death worldwide. The way we test and approve new treatments is through clinical trials. But 97% of cancer trials fail, driven by inability to recruit enough patients. And yet many patients are routinely excluded from trials, including minority groups who are most affected by the disease.
The key to solving these problems is in changing how we design trials, recruit patients, and report on results. Clinical trial registration became a regulatory requirement in 2017, making semi-structured trial protocol data available on clinicaltrials.gov. Today, this data is not being used systematically in trial design, patient recruitment, or reporting decisions in oncology. This project aims to unlock the value of clinical trial data to help accelerate cancer research and improve the lives of cancer patients.
Technologies and methods used to build this project!
To get a local copy up and running follow the steps below.
Get up and running with conda. Given the many dependencies of this project, we use conda as a package/environment manager to make sure we're running things in the same environment and that nothing breaks :)
- Clone the repo
git clone https://github.com/zfx0726/trialtracker.git
- Navigate into the trialtracker project directory and recreate the conda environment.
conda env create --file=trialtrackerenv_py36.yaml
- Activate conda python environment
conda activate trialtrackerenv_py36
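After activating the environment, a quick sanity check can confirm that the expected environment and interpreter are in use. The sketch below is not part of trialtracker. It assumes that conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable and that "py36" in the environment name means Python 3.6. The function name is hypothetical.

```python
import os
import sys
from typing import List, Mapping, Tuple

# Assumptions: the environment name comes from the activate command above,
# and "py36" in that name is read as Python 3.6.
EXPECTED_ENV = "trialtrackerenv_py36"
EXPECTED_PYTHON = (3, 6)


def environment_problems(
    environ: Mapping[str, str] = os.environ,
    version: Tuple[int, int] = (sys.version_info[0], sys.version_info[1]),
) -> List[str]:
    """Return readable problems; an empty list means the setup looks right."""
    problems = []
    active = environ.get("CONDA_DEFAULT_ENV", "")
    if active != EXPECTED_ENV:
        problems.append(
            "active conda env is {!r}, expected {!r}".format(active, EXPECTED_ENV)
        )
    if tuple(version) != EXPECTED_PYTHON:
        problems.append(
            "Python is {}.{}, expected {}.{}".format(*version, *EXPECTED_PYTHON)
        )
    return problems
```

Running `print(environment_problems())` inside the activated environment should print an empty list if both assumptions hold.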
Running eligibility criteria extraction with FB Clinical Trial Parser
- Download the MeSH vocabulary, from root directory:
./extract/src/github.com/facebookresearch/Clinical-Trial-Parser/script/mesh.sh
Running eligibility criteria extraction with pyMeSHSim
- Download and extract MetaMap as per:
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Forrest Xiao - zfx0726@gmail.com