Welcome to the SkimLit App!



The aim of SkimLit is to make lengthy summaries skimmable. Though abstracts are already summaries of their source documents, they can still be quite hard to read. Thankfully, AI can help! The experiments closely follow the models attempted in the paper PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts, using the same data source.

The SkimLit app

  1. Set up the environment by typing the following commands in your terminal:
  • git clone https://github.com/tituslhy/Skimlit
  • pip install -r requirements.txt
  2. Run the backend (written with FastAPI) by typing the following commands in your terminal:
  • pip install "uvicorn[standard]"
  • uvicorn app:app --port 8000 --reload
  3. Run the frontend (written with Streamlit) by opening a new terminal instance and typing: streamlit run skimlit.py

This launches the application's user interface. Feel free to interact with it!
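
If you want to sanity-check the backend on its own before touching the UI, a request like the one below should do. Note that the /predict path and the JSON payload shape are assumptions for illustration, not the repo's documented API; check app.py for the real route and schema.

```python
# Minimal smoke test against the local FastAPI backend.
# ASSUMPTION: the endpoint path ("/predict") and payload key ("text") are
# illustrative placeholders -- see app.py for the actual route and schema.
import requests

response = requests.post(
    "http://localhost:8000/predict",
    json={"text": "Your hard-to-read abstract goes here."},
    timeout=30,
)
print(response.status_code, response.json())
```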


Users are encouraged to upload their unskimmable summaries to the text folder and click 'skim it'. This loads the model, which is then used to run inference on the submitted text.
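
For a sense of what running inference involves here, the submitted text first has to be split into sentences and annotated with positional features, since the final model consumes more than just the raw words. Below is a minimal sketch assuming a naive sentence splitter; the function name is hypothetical, and the repo's real preprocessing lives in utils/.

```python
# HYPOTHETICAL preprocessing sketch -- not the repo's actual code.
# Splits an abstract into sentences and derives the positional features
# the tribrid model consumes alongside word and character inputs.
def abstract_to_samples(abstract: str) -> list[dict]:
    sentences = [s.strip() for s in abstract.split(". ") if s.strip()]
    total_lines = len(sentences)
    return [
        {
            "text": sentence,             # word/sentence-level input
            "chars": " ".join(sentence),  # character-level input
            "line_number": i,             # sentence position in the abstract
            "total_lines": total_lines,   # abstract length
        }
        for i, sentence in enumerate(sentences)
    ]

samples = abstract_to_samples("We aimed to X. We did Y. We found Z.")
print(samples[1]["line_number"], samples[1]["total_lines"])  # 1 3
```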


A skimmable summary is then returned as output to the user.

The model

Unfortunately the model is too large to upload to GitHub - but do reach out to me if you would like the exact weights. The model architecture (specified in utils/utils.py under the 'build_model' function) is sketched below.
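
The following is a rough, self-contained approximation of the tribrid architecture, not the repo's build_model itself: the layer sizes, input shapes, and one-hot depths are assumptions, and a precomputed 512-d vector stands in for the Universal Sentence Encoder embeddings the repo pulls from TensorFlow Hub.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5        # sentence roles, e.g. OBJECTIVE / METHODS / RESULTS ...
MAX_POSITION = 15      # assumed one-hot depth for sentence position
MAX_TOTAL_LINES = 20   # assumed one-hot depth for abstract length
CHAR_SEQ_LEN = 290     # assumed max characters per sentence
CHAR_VOCAB = 30        # ~28 characters plus [UNK] and padding

# 1. Token branch: a 512-d vector stands in for USE sentence embeddings.
token_in = layers.Input(shape=(512,), name="token_embedding")
token_out = layers.Dense(128, activation="relu")(token_in)

# 2. Character branch: learned character embeddings with a Conv1D on top.
char_in = layers.Input(shape=(CHAR_SEQ_LEN,), dtype="int32", name="char_ids")
char_emb = layers.Embedding(CHAR_VOCAB, 28)(char_in)
char_conv = layers.Conv1D(64, kernel_size=5, activation="relu")(char_emb)
char_out = layers.GlobalMaxPooling1D()(char_conv)

# 3. Positional branches: one-hot sentence position and total abstract length.
line_in = layers.Input(shape=(MAX_POSITION,), name="line_number_onehot")
line_out = layers.Dense(32, activation="relu")(line_in)
total_in = layers.Input(shape=(MAX_TOTAL_LINES,), name="total_lines_onehot")
total_out = layers.Dense(32, activation="relu")(total_in)

# Fuse token and character features first, then add the positional features.
token_char = layers.Dense(256, activation="relu")(
    layers.Concatenate()([token_out, char_out])
)
fused = layers.Concatenate()([line_out, total_out, token_char])
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = tf.keras.Model(
    inputs=[token_in, char_in, line_in, total_in],
    outputs=outputs,
    name="tribrid_sketch",
)
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```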


All candidate models were trained for 3 epochs on only 10% of the training data to speed up experimentation. The best model from the experiments tabulated below was then trained on the full training data for 5 epochs. A summary of all experimented models and their validation accuracies follows:

| Experiment | Model | Validation accuracy | Findings |
| --- | --- | --- | --- |
| Naive Bayes TF-IDF classifier | The baseline model, serving as the benchmark for all other models experimented with. | 72.2% | The baseline model has a surprisingly good score! |
| Conv1D on word embeddings | Learns a 128-dimension embedding for each word in the vocabulary, with a Conv1D layer (n-gram of 5) on top. | 79.7% | The second-best performing model. Word embeddings are clearly very important in helping the model classify sentences in an abstract. |
| Universal Sentence Encoder (USE) with Conv1D layer | Uses pretrained USE embeddings from TensorFlow Hub with a Conv1D layer (n-gram of 5) on top; the embedding layer was frozen. | 71.2% | Performance was expectedly poorer because the frozen embedding layer leaves fewer parameters to train. |
| Conv1D on character embeddings | Uses character embeddings only: learns a 28-dimension embedding for each character, including [UNK]. | 65.2% | The worst performance, indicating either that a more sophisticated model is needed to learn character embeddings adequately, or that character embeddings are simply not ideal for this task. |
| USE sentence embeddings + Conv1D character embeddings | A hybrid of the previous two approaches. | 73.1% | Barely beats the baseline - likely because the learning of character embeddings pulled the validation accuracy down. |
| Tribrid model: USE embeddings, character embeddings, and sentence-position embeddings | Adds each sentence's position within its abstract and the abstract's total length as additional inputs to the USE word-char hybrid, so the model takes four tensors: words, characters, sentence position, and total lines in the abstract. | 83.0% (84.8% test accuracy after 5 epochs of training on the full data) | Shows that a sentence's position is very important to its classification. |
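
To make the experiment protocol concrete, here is how the quick 3-epoch runs could be kicked off against the sketch model defined above. The feature arrays below are random stand-ins with plausible shapes, not the repo's actual data pipeline.

```python
import numpy as np
import tensorflow as tf

# Random stand-in features shaped like the sketch model's four inputs;
# `model` refers to the tribrid sketch defined in the previous section.
n = 256
features = {
    "token_embedding": np.random.rand(n, 512).astype("float32"),
    "char_ids": np.random.randint(0, 30, size=(n, 290)),
    "line_number_onehot": tf.one_hot(np.random.randint(0, 15, size=n), 15).numpy(),
    "total_lines_onehot": tf.one_hot(np.random.randint(0, 20, size=n), 20).numpy(),
}
labels = tf.one_hot(np.random.randint(0, 5, size=n), 5).numpy()

# 3 quick epochs per experiment; the winning model was then retrained on
# the full training data for 5 epochs (not reproduced here).
model.fit(features, labels, validation_split=0.1, epochs=3)
```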
