# Extracting Situational Tweets

In [1]:
import re

import numpy as np
import pandas as pd
from nltk.tokenize import TweetTokenizer
from nltk import pos_tag, ne_chunk
from nltk.chunk import conlltags2tree, tree2conlltags
from nltk.tag.stanford import StanfordNERTagger
import nltk

from tqdm import *
from pprint import pprint

The tweets were collected through the Twitter API and saved to `tweet.csv`, which is loaded here.

In [2]:
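# Read as ISO-8859-1; the file appears to contain UTF-8 punctuation, which shows up as mojibake (e.g. 'â¦') in later outputs.
tweets = pd.read_csv('./tweet.csv', encoding='ISO-8859-1')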

## Data Preprocessing

In [3]:
tweets.dropna(inplace=True)
tweets.head()

| Unnamed: 0 | date | username | retweets | text | mentions | hashtags |
|---|---|---|---|---|---|---|
| 1 | 6/4/2019 16:10 | SRKKeralaFC | 30 | Schools are gonna open this week all over Kera... | @SRKCHENNAIFCpic | #KeralaFloods |
| 3 | 6/3/2019 6:55 | JustOutNews | 0 | Govt to construct four new dams in Kerala; aim... | @CPIMKerala @keralagovernment | #kerala #keralafloods #StateNews #CurrentUpdat... |
| 5 | 5/29/2019 14:24 | dipalisharma02 | 5 | Chairperson Ms. T.K.Sathi along with Ward Coun... | @ActionAidIndia @caretoday | #keralafloods |
| 10 | 5/28/2019 15:36 | vssanakan | 4 | Hearty Congratulations!!! Biji Thomas of @mano... | @manoramanews | #KeralaFloods |
| 14 | 6/1/2019 7:39 | blacknwhitetale | 1 | #keralafloods happened and crores of money wer... | @PrakashJavdekar @SuPriyoBabulhttps | #keralafloods #climatechange |


Clean the `text` column of the tweets dataframe by removing URLs, @mentions, and the `#` symbol from hashtags (the hashtag word itself is kept).

In [4]:
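# Remove https and http links (raw strings avoid invalid-escape warnings for \S, \s, \w)
tweets.text = tweets.text.apply(lambda x: re.sub(r'https:\S+', '', x))
tweets.text = tweets.text.apply(lambda x: re.sub(r'http:\S+', '', x))
# Remove @mentions that follow whitespace, together with that whitespace
tweets.text = tweets.text.apply(lambda x: re.sub(r'(\s)@\w+', '', x))
# Strip the '#' symbol but keep the hashtag word
tweets.text = tweets.text.apply(lambda x: re.sub(r'#', '', x))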
for text in tweets.text.tail(70):
    print(text + '\n')

Onam holidays for all schools in Kerala will commence from 17th Aug & reopen 29th Aug. Request to align the holidays of Kendriya Vidyalaya schools also to these dates, considering the crisis situation. KeralaFloods

Praying for everyone affected by the KeralaFloods. The state really needs our help and every donation, no matter how big or small, will count. Here are some details of how to donate tos distress relief fund and other helpline numbers pic.twitter.com/eezFJRE2Dv

Death toll rises to 94 in KeralaFloods and over 160,000 displaced people are in relief camps as appeals for help. â¦

Hey Paid Media, keralafloods There is a state called Kerala which is affected by floods, All 44 rivers are overflowing, 33 dams have released waters. Why there is no coverage ? pic.twitter.com/8rmvFTQ1m9

Please help the people who have been affected by the floods in Kerala. Spread the word! KeralaFloods StandWithKerala.twitter.com/beLRCe7V07

More than 79 dead over 1 Lakh displaced in Kerala and i


National media coverage has been grossly inadequate compared to the gravity of the situation in Kerala, said Congress MP. Out of water, food and electricity, many Malayalis have been sharing their difficulties on social media KeralaFloods  â¦

KeralaFloods salute.twitter.com/bJrTXVkAdM

Prayers are always good but in times of dire need and suffering we all can do more. Right now the victims and families of the KeralaFloods need our help. Letâs show them that we StandWithKerala. Even a small contribution to Keralaâs CMDRF would go a long way..twitter.com/UFQCVL3G3x

Girona FC wants to express its solidarity with the citizens of Kerala, who have suffered serious floods these days. Our thoughts are with you. KeralaFloods pic.twitter.com/uNT1lpRPpZ

AlluArjun donates big for Kerala relief measures  â¦ KeralaFloodRelief KeralaFloods pic.twitter.com/pJtYR4iQQQ

Stylish Star who is very popular in Kerala pledges 25 Lakhs donation to Kerala Chief Minister Disaster Relief fund.. A noble 
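
The printed tweets show that some residue survives these substitutions: `pic.twitter.com/...` links carry no `http:` prefix, and a mention at the very start of a tweet has no preceding whitespace to match. A minimal sketch of a stricter cleaner follows. `clean_tweet` and its patterns are illustrative assumptions rather than part of the original notebook, and the sample string is made up.

```python
import re

# Hypothetical helper; the patterns are assumptions chosen to cover the residue seen above.
URL_RE = re.compile(r'https?:\S+|pic\.twitter\.com/\S+')
MENTION_RE = re.compile(r'@\w+')
SPACE_RE = re.compile(r'\s+')

def clean_tweet(text):
    """Remove links, @mentions and '#' symbols, then collapse whitespace."""
    text = URL_RE.sub('', text)
    text = MENTION_RE.sub('', text)
    text = text.replace('#', '')
    return SPACE_RE.sub(' ', text).strip()

# Made-up sample tweet for illustration.
sample = '@user Death toll rises in #KeralaFloods https://t.co/abc pic.twitter.com/xyz'
print(clean_tweet(sample))  # Death toll rises in KeralaFloods
```

Note that `MENTION_RE` matches `@` anywhere, so it would also strip the domain part of an e-mail address. Whether that matters depends on the data.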

## Tokenizing with NLTK

In [5]:
tknzr = TweetTokenizer()

nltk_tweets = []
for text in tweets.text:
    nltk_tweets.append(tknzr.tokenize(text))
nltk_tweets[-68]

['Death',
 'toll',
 'rises',
 'to',
 '94',
 'in',
 'KeralaFloods',
 'and',
 'over',
 '160,000',
 'displaced',
 'people',
 'are',
 'in',
 'relief',
 'camps',
 'as',
 'appeals',
 'for',
 'help',
 '.',
 'â',
 '\x80',
 '¦']
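
The last three tokens, `'â'`, `'\x80'` and `'¦'`, are the three bytes of a UTF-8 ellipsis (`…`), each decoded separately as ISO-8859-1. A small sketch of undoing this on a single string is below. `fix_mojibake` is a hypothetical helper, and it assumes the text really was UTF-8 read as ISO-8859-1.

```python
def fix_mojibake(text):
    """Re-encode as ISO-8859-1 and decode as UTF-8; return the input unchanged on failure."""
    try:
        return text.encode('iso-8859-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a UTF-8 byte sequence in disguise, so leave the text alone.
        return text

# The tail of the tweet above, with the three garbled characters written as escapes.
garbled = 'for help. \u00e2\u0080\u00a6'
print(fix_mojibake(garbled))  # for help. …
```

If this assumption holds for the whole file, the function could be applied to the `text` column before tokenizing, for example with `tweets.text.apply(fix_mojibake)`.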

In [6]:
#nltk.download()

Use NLTK's POS tagger to get a list of (token, part-of-speech tag) pairs for each tweet.

In [7]:
nltk_pos = []

for text in nltk_tweets:
    nltk_pos.append(pos_tag(text))
pprint(nltk_pos[-68])
#print(ne_chunk(nltk_pos[-68]))

[('Death', 'NNP'),
 ('toll', 'NN'),
 ('rises', 'VBZ'),
 ('to', 'TO'),
 ('94', 'CD'),
 ('in', 'IN'),
 ('KeralaFloods', 'NNS'),
 ('and', 'CC'),
 ('over', 'IN'),
 ('160,000', 'CD'),
 ('displaced', 'JJ'),
 ('people', 'NNS'),
 ('are', 'VBP'),
 ('in', 'IN'),
 ('relief', 'NN'),
 ('camps', 'NNS'),
 ('as', 'IN'),
 ('appeals', 'NNS'),
 ('for', 'IN'),
 ('help', 'NN'),
 ('.', '.'),
 ('â', 'VB'),
 ('\x80', 'JJ'),
 ('¦', 'NN')]
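
One way to use the tagged output is to pull out tokens by tag, for example the cardinal numbers (`CD`) that carry the casualty and displacement figures in this tweet. The sketch below works on a copy of part of the output above. `tokens_with_tag` is a hypothetical helper, not something the notebook defines.

```python
def tokens_with_tag(tagged, tag):
    """Return the tokens whose POS tag equals `tag`."""
    return [token for token, pos in tagged if pos == tag]

# Excerpt copied from the tagger output above.
tagged_tweet = [('Death', 'NNP'), ('toll', 'NN'), ('rises', 'VBZ'), ('to', 'TO'),
                ('94', 'CD'), ('in', 'IN'), ('KeralaFloods', 'NNS'), ('and', 'CC'),
                ('over', 'IN'), ('160,000', 'CD'), ('displaced', 'JJ'), ('people', 'NNS')]

print(tokens_with_tag(tagged_tweet, 'CD'))  # ['94', '160,000']
```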


NLTK-based named entity recognition and chunking were tried, but the results were not accurate enough, so the code below is left commented out.

In [8]:
#pattern = 'NP: {<DT>?<JJ>*<NN>}'
#cp = nltk.RegexpParser(pattern)
#cs = cp.parse(nltk_pos[-68])
#print(cs)

In [9]:
#iob_tagged= tree2conlltags(cs)
#pprint(iob_tagged)

Next, we use the Stanford Named Entity Recognizer (Stanford NER) through NLTK's `StanfordNERTagger`.
First, point NLTK at the Java executable with `config_java`.

In [12]:
nltk.internals.config_java("C:/Program Files/Java/jdk-12.0.1/bin/java.exe")
# Raw strings keep the backslashes in the Windows paths from being read as escape sequences.
st = StanfordNERTagger(r'C:\Twitter-Mining\stanford-ner-2018-10-16\classifiers\english.all.3class.distsim.crf.ser.gz',
           r'C:\Twitter-Mining\stanford-ner-2018-10-16\stanford-ner.jar', encoding='utf-8')
st.tag(nltk_tweets[-68])

[('Death', 'O'),
 ('toll', 'O'),
 ('rises', 'O'),
 ('to', 'O'),
 ('94', 'O'),
 ('in', 'O'),
 ('KeralaFloods', 'O'),
 ('and', 'O'),
 ('over', 'O'),
 ('160,000', 'O'),
 ('displaced', 'O'),
 ('people', 'O'),
 ('are', 'O'),
 ('in', 'O'),
 ('relief', 'O'),
 ('camps', 'O'),
 ('as', 'O'),
 ('appeals', 'O'),
 ('for', 'O'),
 ('help', 'O'),
 ('.', 'O'),
 ('â', 'O'),
 ('\x80', 'O'),
 ('¦', 'O')]
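
Every token in this tweet is tagged `O` (outside any entity), so no named entities were found, not even in the fused hashtag `KeralaFloods`. When the 3-class model does return `PERSON`, `LOCATION` or `ORGANIZATION` labels, consecutive tokens with the same label can be merged into entity spans. A minimal sketch, using a made-up tagged sentence and a hypothetical `group_entities` helper:

```python
from itertools import groupby

def group_entities(tagged):
    """Merge runs of consecutive tokens that share a non-'O' label."""
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != 'O':
            entities.append((' '.join(token for token, _ in run), label))
    return entities

# Made-up example in the (token, label) format returned by st.tag.
example = [('Floods', 'O'), ('hit', 'O'), ('Kerala', 'LOCATION'), (',', 'O'),
           ('says', 'O'), ('Pinarayi', 'PERSON'), ('Vijayan', 'PERSON')]
print(group_entities(example))  # [('Kerala', 'LOCATION'), ('Pinarayi Vijayan', 'PERSON')]
```

One limitation of this simple grouping is that two distinct entities with the same label that sit directly next to each other are merged into one span.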