SARRLEK

Semantic Analysis of Radiology Reports with Lexicon as Embedded Knowledge

Installing required packages

Python requirements are listed in the requirements.txt file. To install them, run:

pip install -r requirements.txt

Running pipeline

The pipeline currently has four steps: data preparation, training a word2vec model, training a topic model (LDA), and training a Flair model. Each step is run with one of the following Python scripts:

data preparation:

python cleaning.py --in=Data/ct_report_dmg_thoracic_only.csv --out=Data/ct_report_dmg_thoracic_only_CLEAN.csv --augment-data

data preparation (with additional structurize step):

python cleaning.py --in=Data/ct_report_dmg_thoracic_only.csv --out=Data/ct_report_dmg_thoracic_only_CLEAN_STRUCT.csv --augment-data --structurize
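
The commands in this README pass options both as `--in=path` and as `--in path`. The sketch below shows how a parser built with Python's standard `argparse` module would accept both forms for the flags used by the data preparation step. It is only an illustration: the function `build_parser` and the attribute names `in_path` and `out_path` are hypothetical, and the actual cleaning.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown for cleaning.py."""
    parser = argparse.ArgumentParser(description="Clean a CSV of radiology reports.")
    # "in" is a reserved word in Python, so the value is stored under another name.
    parser.add_argument("--in", dest="in_path", required=True, help="input CSV")
    parser.add_argument("--out", dest="out_path", required=True, help="output CSV")
    # Boolean switches: False unless the flag is present on the command line.
    parser.add_argument("--augment-data", action="store_true")
    parser.add_argument("--structurize", action="store_true")
    return parser


# Parse the same arguments as the first data preparation command.
args = build_parser().parse_args([
    "--in=Data/ct_report_dmg_thoracic_only.csv",
    "--out=Data/ct_report_dmg_thoracic_only_CLEAN.csv",
    "--augment-data",
])
print(args.in_path, args.out_path, args.augment_data, args.structurize)
```

With a parser like this, `--structurize` is simply an extra switch on the same command, which matches how the two data preparation variants differ.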

training word2vec model:

python buildw2v.py --in Data/ct_report_dmg_thoracic_only_CLEAN.csv --out Data/ct_report_model.w2v --vectors Data/ct_report_vectors.csv

training topic model:

python buildlda.py --in Data/ct_report_dmg_thoracic_only_CLEAN.csv --out-dict Data/ct_report_topics.dict --out-model Data/ct_report_topics.lda --out-topics Data/ct_report_topics.csv

training flair model:

python buildflair.py --in-reports=Data/ct_report_dmg_thoracic_only_CLEAN.csv --in-classes=Data/classes.csv --cancer-classifier=out/classifier_cancer --prog-classifier=out/classifier_prog
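
The steps above can also be chained from a small driver script. The following is a minimal sketch, assuming the scripts are run from the repository root with the paths shown in this README. The names `build_commands` and `run_pipeline` are made up for illustration and are not part of the project. Cleaning runs first because the three training steps read its output file.

```python
import subprocess
import sys

DATA = "Data"
RAW = f"{DATA}/ct_report_dmg_thoracic_only.csv"
CLEAN = f"{DATA}/ct_report_dmg_thoracic_only_CLEAN.csv"


def build_commands(python=sys.executable):
    """Return the pipeline commands from this README as argument lists."""
    return [
        # Data preparation: produces the cleaned CSV used by every later step.
        [python, "cleaning.py", f"--in={RAW}", f"--out={CLEAN}", "--augment-data"],
        # word2vec training.
        [python, "buildw2v.py", "--in", CLEAN,
         "--out", f"{DATA}/ct_report_model.w2v",
         "--vectors", f"{DATA}/ct_report_vectors.csv"],
        # Topic model (LDA) training.
        [python, "buildlda.py", "--in", CLEAN,
         "--out-dict", f"{DATA}/ct_report_topics.dict",
         "--out-model", f"{DATA}/ct_report_topics.lda",
         "--out-topics", f"{DATA}/ct_report_topics.csv"],
        # Flair classifier training.
        [python, "buildflair.py", f"--in-reports={CLEAN}",
         f"--in-classes={DATA}/classes.csv",
         "--cancer-classifier=out/classifier_cancer",
         "--prog-classifier=out/classifier_prog"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands(), dry_run=True)
```

The default is a dry run that only prints the commands. Passing `dry_run=False` executes them in order.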
