
whisk-ml/disaster_tweets

Real or Not? NLP with Disaster Tweets

This is a TensorFlow-backed Keras model that predicts which tweets are about real disasters and which are not. It's derived from the popular "Basic EDA, Cleaning and GloVe" Kaggle notebook.

The project is structured with whisk, an ML project framework that makes collaboration, reproducibility, and deployment "just work".

Besides TensorFlow and Keras, the project uses DVC to version-control the data download and training stages. Because the training stage takes ~20 minutes on a laptop, pulling the stored outputs with DVC instead of re-running the stages saves a significant amount of time when bootstrapping the project.

Using the trained model

You can use the trained model in two different ways:

  1. Install the Python Package
  2. Deploy to Heroku as a web service

Install the Python Package

Install this model via pip:

    pip install git+https://github.com/whisk-ml/disaster_tweets/

See the quickstart section for usage info.

Deploy to Heroku as a web service

Click the button below to deploy the Flask web service to Heroku. See app/README.md for the HTTP API.

Deploy

Checking out the project

You can check out the project source code to run the model, notebooks, Flask app, and more. There are two options:

  1. GitHub Codespaces
  2. From source code

GitHub Codespaces

The project comes with a devcontainer.json file for the default workspace configuration. It runs the project setup commands, so no extra configuration is required.

When using the terminal, activate the project venv first:

source venv/bin/activate

See the quickstart section for usage info.

From source code

Prerequisites

The following are required to run this project:

  • Git
  • Python 3.6+
  • A Linux-based OS
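
The prerequisites above can be checked from Python before running setup. This is a minimal sketch, not part of the project: the function names `missing_prerequisites` and `check_current_environment` are hypothetical, and the check only looks at the Python version, the OS name reported by `platform.system()`, and whether `git` is on the PATH.

```python
import platform
import shutil
import sys


def missing_prerequisites(python_version, system, git_path):
    """Return a list of unmet prerequisites (empty when everything is present).

    Hypothetical helper for illustration; not shipped with the project.
    """
    problems = []
    # The README asks for Python 3.6 or newer.
    if tuple(python_version[:2]) < (3, 6):
        problems.append("Python 3.6+ is required")
    # platform.system() reports "Linux" on Linux-based systems.
    if system != "Linux":
        problems.append("a Linux-based OS is required")
    # shutil.which returns None when the executable is not on the PATH.
    if git_path is None:
        problems.append("git was not found on PATH")
    return problems


def check_current_environment():
    """Run the check against the interpreter and machine executing this script."""
    return missing_prerequisites(
        sys.version_info, platform.system(), shutil.which("git")
    )


if __name__ == "__main__":
    for problem in check_current_environment():
        print("Missing prerequisite:", problem)
```

An empty list from `check_current_environment()` means the three listed prerequisites appear to be met.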

Setup

After cloning this repo, cd into disaster_tweets and run the following in your terminal:

pip install whisk
whisk setup
source venv/bin/activate
whisk dvc setup
dvc pull

The commands above install whisk, set up the project environment, activate the created venv, set up DVC, and download the data stored in DVC.

See the quickstart section for usage info.

Quickstart

After installing the package with pip or running setup, invoke the model from the command line:

disaster_tweets predict "Theyd probably still show more life than Arsenal did yesterday, eh? EH?"
0.19104013

disaster_tweets predict "Just happened a terrible car crash"
0.658098
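
If you want to call the command-line interface from a script, the invocation above can be wrapped with `subprocess`. This is a rough sketch under assumptions: `predict_cli` and `parse_score` are hypothetical helper names, and the parsing assumes the score is the last non-empty line printed to stdout, as in the examples above.

```python
import subprocess


def parse_score(stdout):
    """Convert the CLI output into a float.

    Assumption: the score is the last non-empty line on stdout.
    """
    lines = [line for line in stdout.strip().splitlines() if line.strip()]
    return float(lines[-1])


def predict_cli(text):
    """Run `disaster_tweets predict <text>` and return the printed score.

    Requires the disaster_tweets command to be installed and on the PATH.
    """
    result = subprocess.run(
        ["disaster_tweets", "predict", text],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_score(result.stdout)
```

For example, `predict_cli("Just happened a terrible car crash")` would return the score as a float once the package is installed.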

Use within Python:

from disaster_tweets.models.model import Model

# Instantiate the model wrapper provided by the package.
model = Model()
# predict takes a list of tweet texts.
model.predict(["Theyd probably still show more life than Arsenal did yesterday, eh? EH?"])
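
The scores can be turned into labels with a simple cutoff. The sketch below assumes that a higher score means the tweet is more likely about a real disaster, which matches the two command-line examples above but is not stated by the project. The helper name `label_scores` and the 0.5 threshold are illustrative choices, not project defaults.

```python
def label_scores(scores, threshold=0.5):
    """Map each score to a label using a fixed cutoff.

    Assumptions: scores is a flat iterable of numbers, and higher
    means "more likely a real disaster". The threshold is illustrative.
    """
    return [
        "disaster" if float(score) >= threshold else "not disaster"
        for score in scores
    ]


# Scores copied from the command-line examples in the quickstart.
print(label_scores([0.19104013, 0.658098]))
```

The helper expects a flat sequence of numbers. The return shape of `model.predict` is not documented here, so its output may need flattening first.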

DVC stages

If you checked out the project source code, you can run the DVC stages.

Run the training stage:

dvc repro train.dvc

Run the download stages:

dvc repro download_dataset.dvc
dvc repro download_glove.dvc
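
To run the stages from a script, the same `dvc repro` commands can be issued in sequence. This is a sketch only: the helper names are hypothetical, the stage file names are the ones listed above, and running the download stages before training is an assumed ordering rather than something the project documents.

```python
import subprocess

# Stage files named in this README.
DOWNLOAD_STAGES = ["download_dataset.dvc", "download_glove.dvc"]
TRAIN_STAGE = "train.dvc"


def repro_commands(include_downloads=True):
    """Build the `dvc repro` commands, downloads first (assumed order)."""
    stages = (DOWNLOAD_STAGES if include_downloads else []) + [TRAIN_STAGE]
    return [["dvc", "repro", stage] for stage in stages]


def run_stages(include_downloads=True):
    """Run each command in turn, stopping at the first failure."""
    for command in repro_commands(include_downloads):
        # check=True raises CalledProcessError if dvc exits non-zero.
        subprocess.run(command, check=True)
```

Call `run_stages()` from the project root with the venv activated. Pass `include_downloads=False` to reproduce only the training stage.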

Learn more about whisk

To learn more about whisk, here are a few helpful doc pages:
