# Disaster Management App - Using ML Pipeline 

For the purpose of disaster management, disaster messages are collected and categorized. An important aspect of disaster management is the choice of disaster categories, since the category determines which measures are appropriate to manage the situation. In the case of an accident, a fire, or other immediate threats, a reaction is expected within a few minutes.

**How could AI and Machine Learning support disaster management in this context?**

A disaster response app that uses Machine Learning to classify disaster messages could be the answer to this question. In this solution, a multi-output Machine Learning classifier uses NLP techniques to assign each disaster message to one or more categories. The outcome of the classification is made available to the disaster management office, which then takes the appropriate measures to manage the situation. The solution proposed in this work consists of three parts:

- **ETL Pipeline for data preparation (ETL-Pipeline-Preparation.ipynb)**

The ETL pipeline prepares the data and cleans it for machine learning. The data are read from CSV files, transformed, and stored in an SQLite database file.

- **Multi-Output Machine Learning Pipeline (Disaster-Response-ML-Pipeline.ipynb)**

The multi-output ML pipeline reads the disaster messages and their categories from the database file mentioned above. An ML model is then built, trained, and stored on the local filesystem.

- **Flask web-App for the categorization of disaster messages (./app/run.py)**

The Flask web app loads the ML model from the filesystem and the disaster messages from the database created by the ETL pipeline. It lets the user enter a message and instantly see the categories predicted for it. This information could then be used by the disaster management office.


The [disaster messages](https://www.kaggle.com/davidshahshankhar/disasterresponsepipeline) dataset supporting this work is freely available on Kaggle. It consists of two CSV files:

- `messages.csv`: file containing the disaster messages
- `categories.csv`: file containing the disaster categories of each message
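Judging from the slicing in `clean_data` further below (`x[:-2]` for the name, `str[-1]` for the value), each row of `categories.csv` is assumed to hold one string of the form `name-value;name-value;...`. The plain-Python sketch below applies that parsing to a single row; `parse_categories` is a hypothetical helper and the sample string is illustrative, not copied from the dataset.

```python
from typing import Dict


def parse_categories(raw: str) -> Dict[str, int]:
    """Split a 'name-value;name-value;...' string into {name: value}.

    Mirrors clean_data: the name is everything except the last two
    characters ('-' and the digit), the value is the last character.
    Like the original code, it assumes every value is a single digit.
    """
    parsed = {}
    for item in raw.split(";"):
        name, value = item[:-2], int(item[-1])
        parsed[name] = value
    return parsed


row = "related-1;request-0;offer-0;aid_related-1"
print(parse_categories(row))
# {'related': 1, 'request': 0, 'offer': 0, 'aid_related': 1}
```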






```python
# import libraries
import pickle

import pandas as pd
from sqlalchemy import create_engine

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score

# download the NLTK resources used by the tokenizer;
# recent NLTK releases look up 'punkt_tab' instead of 'punkt'
nltk.download(['punkt', 'punkt_tab', 'wordnet', 'stopwords'])
```




## ETL Pipeline for data preparation (./ETL-Pipeline-Preparation.ipynb)

```python
def load_data(messages_filepath, categories_filepath):
    """Load the messages and categories CSV files and merge them on `id`."""
    # load messages dataset
    messages = pd.read_csv(messages_filepath)

    # load categories dataset
    categories = pd.read_csv(categories_filepath)

    # merge datasets
    df = messages.merge(categories, on=["id"])

    return df


def clean_data(df):
    """Split the `categories` string into 36 numeric columns and drop duplicates."""
    # create a dataframe of the 36 individual category columns
    categories = df.categories.str.split(";", expand=True)

    # select the first row of the categories dataframe
    row = categories.iloc[0]

    # use this row to extract the new column names: each entry looks like
    # "<name>-<value>", so drop the last two characters ("-" and the digit)
    category_colnames = [value[:-2] for value in row]

    # rename the columns of `categories`
    categories.columns = category_colnames

    for column in categories.columns:
        # keep only the last character of each string and convert it to int
        categories[column] = categories[column].astype(str).str[-1].astype(int)

    # replace the original categories column with the new category columns
    df = df.drop('categories', axis=1)
    df = pd.concat([df, categories], axis=1)

    # drop duplicates
    df = df.drop_duplicates()

    return df


def save_data(df, database_filename):
    """Write the cleaned dataframe to the `DisasterResponseMessages` table."""
    # database_filename already ends in ".db", so no extra suffix is appended
    engine = create_engine('sqlite:///{}'.format(database_filename))
    df.to_sql('DisasterResponseMessages', engine, if_exists='replace', index=False)


def run_etl_pipeline(messages_filepath, categories_filepath, database_filepath):
    """Run load -> clean -> save and report progress."""
    try:
        print('Loading data...\n    MESSAGES: {}\n    CATEGORIES: {}'
              .format(messages_filepath, categories_filepath))
        df = load_data(messages_filepath, categories_filepath)

        print('Cleaning data...')
        df = clean_data(df)

        print('Saving data...\n    DATABASE: {}'.format(database_filepath))
        save_data(df, database_filepath)

        print('Cleaned data saved to database!')

    except Exception as error:
        # report the actual error instead of hiding it behind the usage hint
        print('ETL pipeline failed: {}'.format(error))
        print('Please provide the filepaths of the messages and categories '
              'datasets as the first and second argument respectively, as '
              'well as the filepath of the database to save the cleaned data '
              'to as the third argument. \n\nExample: disaster_messages.csv '
              'disaster_categories.csv DisasterResponse.db')


run_etl_pipeline(messages_filepath='messages.csv',
                 categories_filepath='categories.csv',
                 database_filepath='DisasterResponse.db')
```

```
Loading data...
    MESSAGES: messages.csv
    CATEGORIES: categories.csv
Cleaning data...
Saving data...
    DATABASE: DisasterResponse.db
Cleaned data saved to database!
```
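To check what the ETL step wrote, the table can be inspected with Python's built-in `sqlite3` module. The helper `describe_table` below is a hypothetical sketch; the table name `DisasterResponseMessages` is the one used by `save_data`.

```python
import sqlite3
from contextlib import closing
from typing import List, Tuple


def describe_table(db_path: str, table: str = "DisasterResponseMessages") -> Tuple[int, List[str]]:
    """Return (row count, column names) of one table in a SQLite file."""
    # closing() makes sure the connection is closed when the block exits;
    # the table name is interpolated, so only pass trusted names
    with closing(sqlite3.connect(db_path)) as conn:
        # PRAGMA table_info returns one row per column; index 1 is the column name
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count, columns
```

Called as `describe_table("DisasterResponse.db")`, it should list the message columns `id`, `message`, `original`, `genre` followed by the category columns, since `clean_data` appends the categories after the merged message columns. Note that `sqlite3.connect` silently creates an empty file if the path does not exist, in which case the `COUNT(*)` query fails with "no such table".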


## Multi-Output Machine Learning Pipeline (./Disaster-Response-ML-Pipeline.ipynb)
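`MultiOutputClassifier` fits one copy of the wrapped estimator per target column, so each category gets its own random forest. The pure-Python sketch below illustrates that one-estimator-per-category strategy with a deliberately trivial, hypothetical `MajorityClassifier` and a hypothetical wrapper `OnePerTarget`; it is a teaching aid under those assumptions, not a replacement for the scikit-learn class.

```python
from collections import Counter
from typing import List, Sequence


class MajorityClassifier:
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X: Sequence, y: Sequence[int]) -> "MajorityClassifier":
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence) -> List[int]:
        return [self.label_ for _ in X]


class OnePerTarget:
    """Fit an independent estimator for every target column."""

    def __init__(self, make_estimator):
        self.make_estimator = make_estimator

    def fit(self, X: Sequence, Y: Sequence[Sequence[int]]) -> "OnePerTarget":
        self.estimators_ = []
        for j in range(len(Y[0])):
            column = [row[j] for row in Y]  # labels of category j only
            self.estimators_.append(self.make_estimator().fit(X, column))
        return self

    def predict(self, X: Sequence) -> List[List[int]]:
        # one prediction list per category, transposed to one row per message
        per_target = [est.predict(X) for est in self.estimators_]
        return [list(row) for row in zip(*per_target)]


X = ["water needed", "fire downtown", "need food and water"]
Y = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]  # made-up labels for three categories
model = OnePerTarget(MajorityClassifier).fit(X, Y)
print(model.predict(["any new message"]))  # [[1, 1, 0]]
```

Because every category is fitted independently, this strategy ignores correlations between categories; that is also true of `MultiOutputClassifier`.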

```python
import os

# stop words and lemmatizer are built once instead of on every call/token
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()


def load_data(database_filepath):
    """
    Function to load data from the database
    Parameters:
        database_filepath: path of the SQLite database file
    Returns:
        - X: features (messages)
        - Y: labels (categories)
        - category_names: column names of the labels
    """
    engine = create_engine('sqlite:///{}'.format(database_filepath))
    df = pd.read_sql_table("DisasterResponseMessages", con=engine)
    X = df['message']
    Y = df.drop(['id', 'message', 'original', 'genre'], axis=1)

    return X, Y, Y.columns


def tokenize(text):
    """
    Function to transform a text into a list of clean tokens
    Parameters:
        text: text to be transformed
    Returns:
        list of lowercased, lemmatized tokens without English stop words
    """
    words = word_tokenize(text.lower())
    return [LEMMATIZER.lemmatize(w).strip() for w in words if w not in STOP_WORDS]


def build_model():
    """
    Function to build a model for machine learning
    Returns:
        pipeline of a TF-IDF vectorizer and one random forest per category
    """
    pipeline = Pipeline([
        ('vect', TfidfVectorizer(tokenizer=tokenize)),
        ('clf', MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=42)))
    ])

    return pipeline


def evaluate_model(model, X_test, Y_test, category_names):
    """
    Function to evaluate a model
    Parameters:
        model: the model to be evaluated
        X_test: features for testing
        Y_test: labels for testing
        category_names: column names of the labels
    """
    Y_pred = model.predict(X_test)

    for i, col in enumerate(category_names):
        accuracy = accuracy_score(Y_test[col], Y_pred[:, i])
        print("[{}] - accuracy: {:.2f}\n".format(col, accuracy))
        # zero_division=0 silences warnings for categories without positive samples
        print(classification_report(Y_test[col], Y_pred[:, i], zero_division=0))


def save_model(model, model_filepath):
    """
    Function to save a model
    Parameters:
        model: model to be saved
        model_filepath: file path where the model will be saved
    """
    # create the target directory (e.g. "models/") if it does not exist yet
    directory = os.path.dirname(model_filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)

    # use the model_filepath argument, not the literal string 'model_filepath'
    with open(model_filepath, 'wb') as f:
        pickle.dump(model, f)


def run_ML_pipeline(database_filepath, model_filepath):
    """Load data, train and evaluate the model, then save it as a pickle file."""
    try:
        print('Loading data...\n    DATABASE: {}'.format(database_filepath))
        X, Y, category_names = load_data(database_filepath)
        X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

        print('Building model...')
        model = build_model()

        print('Training model...')
        model.fit(X_train, Y_train)

        print('Evaluating model...')
        evaluate_model(model, X_test, Y_test, category_names)

        print('Saving model...\n    MODEL: {}'.format(model_filepath))
        save_model(model, model_filepath)

        print('Trained model saved!')

    except Exception as error:
        # report the actual error instead of hiding it behind the usage hint
        print('ML pipeline failed: {}'.format(error))
        print('Please provide the filepath of the disaster messages database '
              'as the first argument and the filepath of the pickle file to '
              'save the model to as the second argument. \n\nExample: '
              './data/DisasterResponse.db classifier.pkl')


run_ML_pipeline(database_filepath='data/DisasterResponse.db', model_filepath='models/classifier.pkl')
```

```
Loading data...
    DATABASE: data/DisasterResponse.db
Building model...
Training model...
```
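The `TfidfVectorizer` in `build_model` turns each tokenized message into a weighted vector. As a sketch of what it computes with its default settings (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, L2-normalized rows), here is a plain-Python version. `tfidf` is a hypothetical helper that takes already tokenized documents and skips details such as the vocabulary index and sparse storage.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF with scikit-learn's default settings (smooth idf, L2 norm)."""
    n = len(docs)
    # document frequency: in how many documents each token occurs
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}

    vectors = []
    for doc in docs:
        counts = Counter(doc)  # raw term frequency
        weights = {tok: c * idf[tok] for tok, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({tok: w / norm for tok, w in weights.items()})
    return vectors


docs = [["water", "need"], ["water", "fire"]]
for vector in tfidf(docs):
    print({tok: round(w, 3) for tok, w in vector.items()})
# {'water': 0.58, 'need': 0.815}
# {'water': 0.58, 'fire': 0.815}
```

A token that occurs in every message (here `water`) gets the smallest idf, so the words that distinguish a message dominate its vector.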


## Flask web-App for the categorization of disaster messages (./app/run.py)

```python
# run.py

import json
import os

import joblib
import pandas as pd
import plotly
from flask import Flask, render_template, request
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from plotly.graph_objs import Bar
from sqlalchemy import create_engine


app = Flask(__name__)

# resolve data/ and models/ relative to the project root (the parent of app/),
# so the app works regardless of the directory it is started from
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
DATABASE_PATH = os.path.join(BASE_DIR, 'data', 'DisasterResponse.db')
MODEL_PATH = os.path.join(BASE_DIR, 'models', 'classifier.pkl')


def tokenize(text):
    # the pickled TfidfVectorizer references `tokenize` by name (in __main__),
    # so this function must be defined before the model is loaded
    tokens = word_tokenize(text)
    lemmatizer = WordNetLemmatizer()

    clean_tokens = []
    for tok in tokens:
        clean_tok = lemmatizer.lemmatize(tok).lower().strip()
        clean_tokens.append(clean_tok)

    return clean_tokens


# load data
engine = create_engine('sqlite:///{}'.format(DATABASE_PATH))
df = pd.read_sql_table('DisasterResponseMessages', engine)

# load model
model = joblib.load(MODEL_PATH)


# index webpage displays visuals and receives user input text for the model
@app.route('/')
@app.route('/index')
def index():

    # extract data needed for visuals: number of messages per genre
    genre_counts = df.groupby('genre').count()['message']
    genre_names = list(genre_counts.index)

    # create visuals
    graphs = [
        {
            'data': [
                Bar(
                    x=genre_names,
                    y=genre_counts
                )
            ],

            'layout': {
                'title': 'Distribution of Message Genres',
                'yaxis': {
                    'title': "Count"
                },
                'xaxis': {
                    'title': "Genre"
                }
            }
        }
    ]

    # encode plotly graphs in JSON
    ids = ["graph-{}".format(i) for i, _ in enumerate(graphs)]
    graphJSON = json.dumps(graphs, cls=plotly.utils.PlotlyJSONEncoder)

    # render web page with plotly graphs
    return render_template('master.html', ids=ids, graphJSON=graphJSON)


# web page that handles user query and displays model results
@app.route('/go')
def go():
    # save user input in query
    query = request.args.get('query', '')

    # use model to predict classification for query; the first four columns
    # (id, message, original, genre) are not categories
    classification_labels = model.predict([query])[0]
    classification_results = dict(zip(df.columns[4:], classification_labels))

    # render go.html with the query and its classification results
    return render_template(
        'go.html',
        query=query,
        classification_result=classification_results
    )


def run_app():
    app.run(host='127.0.0.1', port=3001, debug=True)


if __name__ == '__main__':
    run_app()
```
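In the `/go` route, `dict(zip(df.columns[4:], classification_labels))` relies on the first four columns being `id`, `message`, `original` and `genre`. Since `zip` silently stops at the shorter input, a mismatch between columns and labels would go unnoticed. The sketch below, with hypothetical helpers `label_categories` and `positive_categories`, makes that pairing explicit and extracts the categories predicted as present.

```python
from typing import Dict, List, Sequence


def label_categories(columns: Sequence[str], labels: Sequence[int], n_meta: int = 4) -> Dict[str, int]:
    """Pair each category column (after the metadata columns) with its predicted label."""
    categories = list(columns)[n_meta:]
    if len(categories) != len(labels):
        # fail loudly instead of letting zip() drop the extra items
        raise ValueError(f"{len(categories)} category columns but {len(labels)} labels")
    return dict(zip(categories, labels))


def positive_categories(result: Dict[str, int]) -> List[str]:
    """Names of categories predicted as present (label != 0)."""
    return [name for name, label in result.items() if label != 0]


columns = ["id", "message", "original", "genre", "related", "request", "offer"]
result = label_categories(columns, [1, 1, 0])
print(result)                       # {'related': 1, 'request': 1, 'offer': 0}
print(positive_categories(result))  # ['related', 'request']
```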

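**Run disaster web-app**
- make sure `data/DisasterResponse.db` and `models/classifier.pkl` exist (run the ETL and ML pipelines first)
- open a terminal in the project's working directory
- change directory: `cd app`
- run the app: `python run.py`


Then open http://127.0.0.1:3001/ in your browser to use the app on localhost.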
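Because the `/go` route reads the message from the `query` parameter (`request.args.get('query', '')`), a classification can also be requested directly by URL. The hypothetical helper `go_url` below builds that URL with the standard library; host and port default to the values passed to `app.run`.

```python
from urllib.parse import urlencode


def go_url(message: str, host: str = "127.0.0.1", port: int = 3001) -> str:
    """Build the URL of the /go route, which reads the message from `query`."""
    # urlencode escapes special characters and encodes spaces as '+'
    return f"http://{host}:{port}/go?{urlencode({'query': message})}"


print(go_url("We need water and food"))
# http://127.0.0.1:3001/go?query=We+need+water+and+food
```

While the app is running, the URL can be opened in a browser or fetched with `urllib.request.urlopen`; the response is the rendered `go.html` page, not JSON.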

![Disaster Home](pic1.jpg "Disaster Home")

![Disaster Messages](pic2.jpg "Disaster Message")

![Disaster Messages](pic3.jpg "Disaster Message")