# Disaster Evaluation From Tweets

[![Twitter Follow](https://img.shields.io/twitter/follow/dialhaseeb?style=social)](https://www.twitter.com/dialhaseeb)

![Logo](https://github.com/zenyc/zenyc/blob/master/logo-small.png)

## 🕯 About
**disaster-evaluation-from-tweets** is a *machine learning model* that predicts whether a given tweet refers to an actual disaster or not. It uses pretrained GloVe word embeddings and a two-layer LSTM network to do so.


## Before we begin, let's configure a few things so that the notebook runs both on your local machine and on *Google Colaboratory*

1- If you are running locally, run the following cell:

In [1]:
proj_dir = "proj-dir/"  # keep the trailing slash: paths are built as proj_dir+"train.csv"

2- If you are running on *Colab*:
- Make sure you have uploaded all the project files to your *Google Drive*. Then mount your drive by running the following cell:

In [2]:
from google.colab import drive
drive.mount("/content/drive")

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


- Then, in the following cell, replace the part of the path after `/content/drive/My Drive/` with the location of the project files in your drive:

In [7]:
%cd "/content/drive/My Drive/Projects/disaster-evaluation-from-tweets/"
proj_dir = "proj-dir/"

/content/drive/My Drive/Projects/disaster-evaluation-from-tweets
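
The two setups above differ only in where the project files live. As a minimal sketch (not part of the original notebook), the environment could be detected by checking whether the `google.colab` module is loaded, which is normally the case on Colab and always the case after `from google.colab import drive`. The helper names `in_colab` and `data_path` are hypothetical.

```python
import os
import sys


def in_colab() -> bool:
    """Return True when the google.colab module has been loaded."""
    return "google.colab" in sys.modules


def data_path(proj_dir: str, filename: str) -> str:
    """Join directory and file name so a missing trailing slash cannot break the path."""
    return os.path.join(proj_dir, filename)


proj_dir = "proj-dir"  # works with or without a trailing slash
print("Running on Colab:", in_colab())
print(data_path(proj_dir, "train.csv"))
```

Building paths with `os.path.join` instead of string concatenation avoids depending on whether `proj_dir` ends in a slash.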


## Next up, let's import everything we need. Run the following:

In [5]:
import pandas as pd
import numpy as np
from tensorflow.keras import layers
from tensorflow.keras import Input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.preprocessing.text import Tokenizer
import matplotlib.pyplot as plt
from tensorflow.keras.callbacks import EarlyStopping

## Let's load the dataset now:

In [8]:
df = pd.read_csv(proj_dir+"train.csv")

## Load the *GloVe embeddings* into a dictionary:

In [10]:
embeddings_index = {}
# Each line holds a word followed by the 100 components of its vector.
with open(proj_dir+'glove.6B.100d.txt', encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))

Found 400000 word vectors.
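
To make the file format concrete, here is a small sketch of the same parsing step run on an in-memory sample instead of the real GloVe file. The two sample lines and their 3-dimensional values are made up for illustration, the helper names are hypothetical, and plain Python lists stand in for the NumPy arrays used in the cell above.

```python
import io


def parse_glove_line(line):
    """Split one line into the word and its list of float components."""
    parts = line.rstrip().split(" ")
    return parts[0], [float(x) for x in parts[1:]]


def load_vectors(handle):
    """Read every non-empty line of an open text file into a word -> vector dict."""
    vectors = {}
    for line in handle:
        if not line.strip():
            continue
        word, coefs = parse_glove_line(line)
        vectors[word] = coefs
    return vectors


# Hypothetical sample in the "word v1 v2 ..." layout.
sample = io.StringIO("fire 0.1 -0.2 0.3\nflood 0.4 0.5 -0.6\n")
vectors = load_vectors(sample)
print(len(vectors), vectors["fire"])  # 2 [0.1, -0.2, 0.3]
```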


## View the data:

In [11]:
df.head()

| | id | keyword | location | text | target |
|---|---|---|---|---|---|
| 0 | 1 | | | Our Deeds are the Reason of this #earthquake M... | 1 |
| 1 | 4 | | | Forest fire near La Ronge Sask. Canada | 1 |
| 2 | 5 | | | All residents asked to 'shelter in place' are ... | 1 |
| 3 | 6 | | | 13,000 people receive #wildfires evacuation or... | 1 |
| 4 | 7 | | | Just got sent this photo from Ruby #Alaska as ... | 1 |


## Let's separate the tweets from their labels and hold out a test set

In [12]:
data = df.text

In [13]:
labels = df.target

In [14]:
data.shape

(7613,)

In [15]:
0.2*7613

1522.6000000000001

In [16]:
7613-1522

6091

In [17]:
x_train = data[0:6100]

In [18]:
x_test = data[6100:]

In [19]:
y_train = labels[0:6100] 

In [20]:
y_test = labels[6100:]
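
The cells above work out a roughly 80/20 split by hand and then cut the data at row 6100. The sketch below shows the same arithmetic as a function, plus a seeded shuffle before cutting. The shuffle is an assumption added for illustration: the notebook itself takes a sequential slice. The names `split_index` and `shuffled_split` are hypothetical.

```python
import random


def split_index(n_rows: int, test_fraction: float) -> int:
    """Number of rows that go to the training set."""
    return n_rows - int(n_rows * test_fraction)


def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed, then cut it at split_index."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = split_index(len(rows), test_fraction)
    return rows[:cut], rows[cut:]


print(split_index(7613, 0.2))  # 6091, matching the arithmetic in the cells above

train_rows, test_rows = shuffled_split(range(10))
print(len(train_rows), len(test_rows))  # 8 2
```

A fixed seed keeps the split reproducible between runs.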

## Tokenizing the data

In [21]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(x_train.values)
sequences = tokenizer.texts_to_sequences(x_train.values)
sequences = sequence.pad_sequences(sequences, maxlen=200)

In [22]:
sequences.shape

(6100, 200)
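
With its default settings, `pad_sequences` pads short sequences with zeros on the left and, when a sequence is longer than `maxlen`, keeps only its last `maxlen` items. The pure-Python sketch below imitates that behaviour for a single sequence; `pad_left` is a hypothetical helper, not a Keras function.

```python
def pad_left(seq, maxlen, value=0):
    """Keep the last maxlen items of seq and pad on the left up to maxlen."""
    trimmed = list(seq)[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed


print(pad_left([7, 3, 9], 5))           # [0, 0, 7, 3, 9]
print(pad_left([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```

This is why every row of `sequences` has exactly 200 entries, most of them zeros for a short tweet.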

In [31]:
vocab_size = len(tokenizer.word_index)+1
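
The `+1` is needed because the tokenizer numbers words starting from 1 and leaves index 0 free, which is the value used for padding. A simplified stand-in for the word index shows the idea. It only lowercases and splits on whitespace, whereas the real `Tokenizer` also strips punctuation; `build_word_index` is a hypothetical helper.

```python
from collections import Counter


def build_word_index(texts):
    """Number words from 1 in order of decreasing frequency."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


word_index = build_word_index(["Fire fire flood", "flood fire quake"])
print(word_index)  # {'fire': 1, 'flood': 2, 'quake': 3}

toy_vocab_size = len(word_index) + 1  # +1 because index 0 is reserved for padding
print(toy_vocab_size)  # 4
```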

In [32]:
embedding_dim = 100
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer.word_index.items():
    if i < vocab_size:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector
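
Words that have no GloVe vector keep an all-zero row in `embedding_matrix`, so it can be useful to know how much of the vocabulary is actually covered. The following is a minimal sketch of such a check on toy data; the function name and the sample dictionaries are hypothetical, and in the notebook one would pass `tokenizer.word_index` and `embeddings_index` instead.

```python
def embedding_coverage(word_index, embeddings_index):
    """Return the fraction of vocabulary words that have a pretrained vector."""
    if not word_index:
        return 0.0
    found = sum(1 for word in word_index if word in embeddings_index)
    return found / len(word_index)


# Toy data: a three-word vocabulary and two known vectors.
toy_word_index = {"fire": 1, "flood": 2, "omg": 3}
toy_embeddings = {"fire": [0.1, 0.2], "flood": [0.3, 0.4]}

print(round(embedding_coverage(toy_word_index, toy_embeddings), 2))  # 0.67
```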

## Let's define the model:

In [33]:
input_layer = Input(shape=(None,), dtype='int32', name='tweet_input')
x = layers.Embedding(vocab_size, 100, input_length=200)(input_layer)
x = layers.LSTM(32,
                dropout=0.1,
                recurrent_dropout=0.5,
                return_sequences=True)(x)
x = layers.LSTM(32,
                dropout=0.1,
                recurrent_dropout=0.5,
                return_sequences=False)(x)


In [34]:
x = layers.Dense(100, activation='relu')(x)
output = layers.Dense(1, activation='sigmoid')(x)

In [35]:
model = Model(input_layer,output)

In [36]:
model.summary()

Model: "functional_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
tweet_input (InputLayer)     [(None, None)]            0         
_________________________________________________________________
embedding_1 (Embedding)      (None, None, 100)         1934600   
_________________________________________________________________
lstm_2 (LSTM)                (None, None, 32)          17024     
_________________________________________________________________
lstm_3 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dense_2 (Dense)              (None, 100)               3300      
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 101       
Total params: 1,963,345
Trainable params: 1,963,345
Non-trainable params: 0
____________________________________________
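
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas for embedding, LSTM, and dense layers; the vocabulary size of 19,346 is inferred from the embedding row (1,934,600 / 100) rather than stated in the notebook, and the function names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    """One vector of length dim per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """A weight matrix plus one bias per unit."""
    return input_dim * units + units


counts = [
    embedding_params(19346, 100),  # 1,934,600
    lstm_params(100, 32),          # 17,024
    lstm_params(32, 32),           # 8,320
    dense_params(32, 100),         # 3,300
    dense_params(100, 1),          # 101
]
print(counts, sum(counts))  # the sum is 1963345
```

The total matches the `Total params: 1,963,345` line above.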

In [37]:
model.layers[1].set_weights([embedding_matrix])
model.layers[1].trainable = False

In [38]:
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

## Training the model

In [39]:
es = EarlyStopping(monitor='val_loss', mode='min')

In [40]:
history = model.fit(sequences, y_train.values, epochs=20, validation_split=0.2, callbacks = [es])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
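
Training ends well before epoch 20 because `EarlyStopping` is used with its default `patience` of 0, so it stops as soon as `val_loss` fails to improve. The function below is a simplified approximation of that rule, not the Keras implementation, and the loss values are hypothetical since the notebook output does not show them.

```python
def stopping_epoch(val_losses, patience=0):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: the fourth epoch is the first without improvement.
print(stopping_epoch([0.60, 0.50, 0.45, 0.47]))  # 4
print(stopping_epoch([0.50, 0.40, 0.30]))        # None
```

A larger `patience` would let training continue through a few noisy epochs before stopping.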


In [41]:
model.save(proj_dir+"trained.h5")

## Evaluating the model:

In [42]:
sequences = tokenizer.texts_to_sequences(x_test.values)
sequences = sequence.pad_sequences(sequences, maxlen=200)

In [43]:
x_test = sequences

In [44]:
score = model.evaluate(x_test, y_test.values)



In [45]:
score

[0.41791707277297974, 0.8142762780189514]
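
`model.evaluate` returns the loss first, followed by the metrics passed to `compile`, so the output above reads as a binary cross-entropy of about 0.42 and an accuracy of about 81.4%. As a sketch of what those two numbers measure, both can be computed by hand from predicted probabilities. The labels and probabilities below are hypothetical, and the 0.5 threshold is an assumption of this sketch.

```python
import math


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of predictions that land on the correct side of the threshold."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return hits / len(y_true)


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 0, 1]          # hypothetical labels
y_prob = [0.9, 0.2, 0.6, 0.4]  # hypothetical sigmoid outputs

print(binary_accuracy(y_true, y_prob))  # 0.5
print(binary_crossentropy(y_true, y_prob))
```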

## Now loading Kaggle's Test Set:

In [67]:
test = pd.read_csv(proj_dir+"test.csv")

In [68]:
test

| | id | keyword | location | text |
|---|---|---|---|---|
| 0 | 0 | | | Just happened a terrible car crash |
| 1 | 2 | | | Heard about #earthquake is different cities, s... |
| 2 | 3 | | | there is a forest fire at spot pond, geese are... |
| 3 | 9 | | | Apocalypse lighting. #Spokane #wildfires |
| 4 | 11 | | | Typhoon Soudelor kills 28 in China and Taiwan |
| ... | ... | ... | ... | ... |
| 3258 | 10861 | | | EARTHQUAKE SAFETY LOS ANGELES ÛÒ SAFETY FASTE... |
| 3259 | 10865 | | | Storm in RI worse than last hurricane. My city... |
| 3260 | 10868 | | | Green Line derailment in Chicago http://t.co/U... |
| 3261 | 10874 | | | MEG issues Hazardous Weather Outlook (HWO) htt... |


In [69]:
ids = test.id

In [70]:
test = test.text

In [71]:
sequences = tokenizer.texts_to_sequences(test)
sequences = sequence.pad_sequences(sequences, maxlen=200)

In [82]:
results = model.predict(sequences)

In [83]:
results = results.round()

In [84]:
results = results.squeeze()

In [88]:
csv_df = pd.DataFrame({
    "id": ids,
    "target": results
})

In [89]:
csv_df.index = csv_df.id

In [91]:
csv_df = csv_df["target"]

In [97]:
csv_df = csv_df.astype(int)

In [99]:
csv_df.to_csv(proj_dir+"results.csv", header=True)
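
The cells above produce a two-column file with an `id` column and an integer `target` column. As a sketch of that layout, the same structure can be written with the standard `csv` module. The ids are the first three from the test set shown earlier, the target values are made up, and `submission_csv` is a hypothetical helper.

```python
import csv
import io


def submission_csv(ids, targets):
    """Render id/target pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "target"])
    for tweet_id, target in zip(ids, targets):
        writer.writerow([tweet_id, int(target)])  # 1.0 -> 1, as astype(int) does above
    return buffer.getvalue()


print(submission_csv([0, 2, 3], [1.0, 1.0, 0.0]))
```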

## Trying the model in action

In [60]:
def encoder(text):
    text = tokenizer.texts_to_sequences([text])
    text = sequence.pad_sequences(text, maxlen=200)
    return text

In [61]:
def predict(text):
    encoded_text = encoder(text)
    prediction = model.predict(encoded_text)
    print(prediction)
    # model.predict returns an array of shape (1, 1); take the single probability
    if np.round(prediction[0][0]) == 1:
        return "Disaster"
    return "Not a Disaster"

In [63]:
predict("OMG a blazing sky!")

[[0.13082466]]


'Not a Disaster'

In [66]:
predict("fire fighters are here")

[[0.5364455]]


'Disaster'
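
The second prediction sits close to 0.5, so its label depends on where the cut-off is placed. The sketch below separates the thresholding step from the model call to make that visible; `label_for` and the alternative threshold of 0.6 are hypothetical and not part of the notebook.

```python
def label_for(probability, threshold=0.5):
    """Map a sigmoid output to one of the two labels."""
    return "Disaster" if probability > threshold else "Not a Disaster"


print(label_for(0.13082466))                # Not a Disaster
print(label_for(0.5364455))                 # Disaster
print(label_for(0.5364455, threshold=0.6))  # Not a Disaster
```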

# The End?

## 👀 Contact

If you want to contact me, you can reach me at <zenyc@live.com>.