# RAKE
RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that identifies key phrases in a body of text by analyzing how frequently words appear and how often they co-occur with other words in the text.
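To make the scoring idea concrete, here is a minimal from-scratch sketch of the classic RAKE scheme. Candidate phrases are split at stopwords and punctuation. Each word gets the score degree(w)/frequency(w), where degree sums the lengths of the candidate phrases containing the word (so it counts co-occurring words plus the word itself). A phrase's score is the sum of its word scores. The tiny `STOPWORDS` set and the helper names `candidate_phrases` and `rake_scores` are illustrative assumptions. `rake-nltk` uses NLTK's full English stopword list and its own tokenization, so its rankings will differ.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); rake-nltk uses NLTK's full English list
STOPWORDS = {"a", "an", "and", "for", "i", "in", "is", "it", "my", "of", "the", "to", "was", "with"}

def candidate_phrases(text):
    """Split text into candidate phrases (tuples of words) at punctuation and stopwords."""
    phrases = []
    # Punctuation marks phrase boundaries, so split the lowercased text on it first
    for fragment in re.split(r"[^\w\s]", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword ends the current phrase (if any) and is itself discarded
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(text):
    """Rank candidate phrases by the sum of their words' degree/frequency ratios."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how many times each word appears in candidate phrases
    degree = defaultdict(int)  # sum of the lengths of the phrases containing the word
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # Repeated phrases collapse into one entry (their score is identical anyway)
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rake_scores("Keyword extraction is useful. Rapid keyword extraction is fast."))
# [('rapid keyword extraction', 8.0), ('keyword extraction', 5.0), ('useful', 1.0), ('fast', 1.0)]
```

Longer phrases built from words that keep co-occurring ("keyword", "extraction") rise to the top. This is the behaviour that pushes multi-word phrases like `'regular maintenance dose'` ahead of two-word ones in the ranked output.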



In [5]:
!pip install python-rake
!pip install rake-nltk

Collecting rake-nltk
  Downloading rake_nltk-1.0.4.tar.gz (7.6 kB)
Building wheels for collected packages: rake-nltk
  Building wheel for rake-nltk (setup.py): started
  Building wheel for rake-nltk (setup.py): finished with status 'done'
  Created wheel for rake-nltk: filename=rake_nltk-1.0.4-py2.py3-none-any.whl size=7829 sha256=276ee8781a80b235f3ff85b7decb1bcad2502fb9f3d112a1a107e3ddc6d204ac
  Stored in directory: c:\users\home\appdata\local\pip\cache\wheels\7c\d9\8a\b8a9244fa89a07f288f9fe006aafc79d93fceb58496c29b606
Successfully built rake-nltk
Installing collected packages: rake-nltk
Successfully installed rake-nltk-1.0.4


In [4]:
import nltk
import pickle
import pandas as pd
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\home\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [7]:
# data
post = "Getting back to work made a difference. After 10 days of nothing but the business of moving and all of its seemingly obligatory messy emotions, it was nice to think of nothing but my patients. I worked Wednesday through Friday, and even with a couple of long days in there, it was a relief to be away from home. It was a relief to be away from unpacking, and contemplating, and deciding. It was a pleasure to think about somebody other than myself for 3 days. I needed that. Those 3 days away, combined with a long run/walk/dip into Lake Superior with Jet yesterday, gave me the energy to unpack nearly my entire basement today. I ve still got a lot to do, but things are starting to take shape. My bedroom is almost completely put together. My bathroom and kitchen are done. I ve still got boxes in the living room, dining room and the other 2 bedrooms, but I m getting there. Tomorrow I m heading south to Mayo Clinic for a ketamine infusion. Im pleased its not an urgent need at this time, just a regular maintenance dose. Returning to work, getting some exercise, and progressing with my unpacking have each helped stabilize my mood. Im  no longer daily wiping tears from my eyes. In fact, I haven t cried for several days. That, in and of itself, is quite a feat! I m taking my time with unpacking. I m doing my best to remain patient. Taking the next right action and maintaining my attitude of gratitude are my focus now. Its still hard, but its not impossible. Settling into my new home, new routine, and new city will take time. I m keeping that fact forefront in my mind. I can do this. But I cant do it all today, nor do I have to. Patiently, Ill get it done."

In [10]:
from rake_nltk import Rake

r = Rake(min_length=2, max_length=3)  # Defaults to NLTK's English stopwords and all punctuation characters; keeps 2- to 3-word phrases.

r.extract_keywords_from_text(post)

r.get_ranked_phrases() # To get keyword phrases ranked highest to lowest.

['regular maintenance dose',
 'next right action',
 'entire basement today',
 'still got boxes',
 '3 days away',
 'still got',
 '3 days',
 'still hard',
 'several days',
 'long days',
 '10 days',
 'worked wednesday',
 'work made',
 'urgent need',
 'unpack nearly',
 'take time',
 'take shape',
 'remain patient',
 'new routine',
 'new home',
 'new city',
 'mayo clinic',
 'long run',
 'living room',
 'lake superior',
 'ketamine infusion',
 'jet yesterday',
 'im pleased',
 'ill get',
 'helped stabilize',
 'heading south',
 'getting back',
 'fact forefront',
 'dining room',
 '2 bedrooms']
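Because of the `min_length=2, max_length=3` arguments, every phrase above has two or three words. Candidate phrases whose word count falls outside that range are discarded. For example, "seemingly obligatory messy emotions" (4 words) and "patients" (1 word) never appear. Below is a stdlib sketch of that filter, assuming phrases are represented as tuples of words. The helper name `filter_by_length` is hypothetical and not part of `rake-nltk`.

```python
def filter_by_length(phrases, min_length=1, max_length=3):
    """Keep only phrases (tuples of words) whose word count lies in [min_length, max_length]."""
    return [p for p in phrases if min_length <= len(p) <= max_length]

# Candidate phrases taken from the post above (stopwords such as "its" and "my" already removed)
candidates = [
    ("seemingly", "obligatory", "messy", "emotions"),
    ("patients",),
    ("mayo", "clinic"),
    ("regular", "maintenance", "dose"),
]
print(filter_by_length(candidates, min_length=2, max_length=3))
# [('mayo', 'clinic'), ('regular', 'maintenance', 'dose')]
```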

# spaCy

In [29]:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(post)
print(doc.ents)

(10 days, Wednesday through Friday, a couple of long days, 3 days, Those 3 days, Lake Superior, yesterday, today, 2, Mayo Clinic, daily, several days, today, Ill)


In [34]:
for chunk in doc.noun_chunks:
    print(chunk.text)

work
a difference
10 days
nothing
the business
its seemingly obligatory messy emotions
it
nothing
my patients
I
Friday
a couple
long days
it
a relief
home
It
a relief
unpacking
It
a pleasure
somebody
myself
3 days
I
a long run/walk/dip
Lake Superior
Jet
me
the energy
nearly my entire basement
I
a lot
things
shape
My bedroom
My bathroom
kitchen
I
boxes
the living room
dining room
the other 2 bedrooms
I
I
Mayo Clinic
a ketamine infusion
I
its not an urgent need
this time
work
some exercise
my unpacking
my mood
I
tears
my eyes
fact
I
t
several days
itself
quite a feat
I
my time
unpacking
I
the next right action
my attitude
gratitude
my focus
its
my new home
new routine
new city
time
I
that fact
my mind
I
I
it
I
Ill
it


In [36]:
# Analyze syntax

print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])


Verbs: ['get', 'make', 'move', 'think', 'work', 'be', 'be', 'be', 'contemplate', 'decide', 'think', 'need', 'combine', 'give', 'unpack', 'get', 'do', 'start', 'take', 'put', 'do', 'get', 'get', 'head', 'm', 'return', 'get', 'progress', 'help', 'stabilize', 'wipe', 'haven', 'cry', 'take', 'do', 'remain', 'take', 'maintain', 'settle', 'take', 'keep', 'do', 'do', 'have', 'get', 'do']


In [8]:
# Load labelled data (generated in file 2.preprocessing)
with open('../data/depression_marathon_df_final.pkl', 'rb') as f:
    data = pickle.load(f)
data.columns


Index(['header', 'date', 'full_text'], dtype='object')

In [13]:
post = data.full_text.iloc[0]
post

"I've been meaning to write for 4 or 5 days, but I'm not feeling all that inspirational or interesting right now. I'm getting back into the routines of my life as best I can. Things still aren't as honky dory as they were prior to my recent depression relapse, but I'm functional. I'd like to be feeling 100% better. I'd like to be as free and light as I was just 2 months ago, but I'm not quite there yet.\n\n\n\nI'm not quite humming along, but I think I'm moving in the right direction. I'm working close to my normal schedule. Unfortunately, we are really slow right now so I've had to take some extra, unwanted time off. I still get really tired after a full work day, though, so maybe working a little less is still for the best. Regardless, I'm looking forward to resuming my regular schedule.\n\n\n\nI'm in the process of resuming my normal exercise schedule and intensity as well. I'm happy and extremely grateful to report I've been able to run 2-4 days per week for the last couple of week

In [14]:
post = post.replace('\n','')
post

"I've been meaning to write for 4 or 5 days, but I'm not feeling all that inspirational or interesting right now. I'm getting back into the routines of my life as best I can. Things still aren't as honky dory as they were prior to my recent depression relapse, but I'm functional. I'd like to be feeling 100% better. I'd like to be as free and light as I was just 2 months ago, but I'm not quite there yet.I'm not quite humming along, but I think I'm moving in the right direction. I'm working close to my normal schedule. Unfortunately, we are really slow right now so I've had to take some extra, unwanted time off. I still get really tired after a full work day, though, so maybe working a little less is still for the best. Regardless, I'm looking forward to resuming my regular schedule.I'm in the process of resuming my normal exercise schedule and intensity as well. I'm happy and extremely grateful to report I've been able to run 2-4 days per week for the last couple of weeks. I'm super slo

In [18]:
# Naive sentence split: break on every period and trim surrounding whitespace
post_str = post.split('.')
post_str = [sentence.strip() for sentence in post_str]
post_df = pd.DataFrame(post_str, columns=['sentence'])
post_df

|   | sentence |
|---|----------|
| 0 | I've been meaning to write for 4 or 5 days, bu... |
| 1 | I'm getting back into the routines of my life ... |
| 2 | Things still aren't as honky dory as they were... |
| 3 | I'd like to be feeling 100% better |
| 4 | I'd like to be as free and light as I was just... |
| 5 | I'm not quite humming along, but I think I'm m... |
| 6 | I'm working close to my normal schedule |
| 7 | Unfortunately, we are really slow right now so... |
| 8 | I still get really tired after a full work day... |
| 9 | Regardless, I'm looking forward to resuming my... |
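Splitting on `'.'` alone is fragile. It would break decimals such as `2.5`, it ignores `!` and `?`, and it leaves an empty string whenever the text ends with a period. Below is a slightly more robust sketch, assuming a simple regex heuristic is good enough for these diary-style posts. The helper `split_sentences` and the sample text are hypothetical. NLTK's `sent_tokenize` is a more thorough alternative. The sketch expects the raw `full_text`. The `replace('\n', '')` step glues sentences together (`yet.I'm`) and leaves no whitespace to split on.

```python
import re

def split_sentences(text):
    """Split text at ., ! or ? followed by whitespace (a simple heuristic, not a full tokenizer)."""
    # Collapse runs of whitespace (including "\n\n\n\n") into one space,
    # so paragraph breaks still leave a gap between sentences
    text = re.sub(r"\s+", " ", text).strip()
    # Split only where end punctuation is followed by whitespace, so "2.5" stays intact
    parts = re.split(r"(?<=[.!?])\s+", text)
    # Drop trailing end punctuation (matching post_df.sentence) and skip empty pieces
    return [p.rstrip(".!?").strip() for p in parts if p.strip(" .!?")]

sample = "I'm not quite there yet.\n\n\n\nI'm not quite humming along. I ran 2.5 miles! So far so good."
print(split_sentences(sample))
# ["I'm not quite there yet", "I'm not quite humming along", 'I ran 2.5 miles', 'So far so good']
```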


In [52]:
from rake_nltk import Rake

r = Rake(min_length=1, max_length=3)  # Defaults to NLTK's English stopwords and all punctuation characters; keeps 1- to 3-word phrases.

def get_rake_phrases(sentence):
    r.extract_keywords_from_text(sentence)
    return r.get_ranked_phrases()

In [47]:
print(post_df.sentence.iloc[4])
get_rake_phrases(post_df.sentence.iloc[4])

I'd like to be as free and light as I was just 2 months ago, but I'm not quite there yet


['2 months ago']

In [59]:
import text2emotion as te
print(post_df.sentence.iloc[4])
te.get_emotion(post_df.sentence.iloc[4])

I'd like to be as free and light as I was just 2 months ago, but I'm not quite there yet


{'Happy': 1.0, 'Angry': 0.0, 'Surprise': 0.0, 'Sad': 0.0, 'Fear': 0.0}
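`te.get_emotion` returns a dict of scores, one per emotion. To reduce a sentence to a single label, one option is to take the key with the highest score. Below is a minimal sketch. It assumes an all-zero dict means no emotion was detected and returns `None` in that case. The helper name `dominant_emotion` is hypothetical.

```python
def dominant_emotion(scores):
    """Return the highest-scoring emotion label, or None if every score is zero."""
    if not any(scores.values()):
        return None
    # max() over the keys, compared by their scores; ties go to the first key in dict order
    return max(scores, key=scores.get)

print(dominant_emotion({'Happy': 1.0, 'Angry': 0.0, 'Surprise': 0.0, 'Sad': 0.0, 'Fear': 0.0}))
# Happy
```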

In [60]:
post_df['rake'] = post_df.sentence.apply(get_rake_phrases)
post_df['text2emotion'] = post_df.sentence.apply(te.get_emotion)
# Expand each text2emotion score dict into one numeric column per emotion
for emotion in ['Happy', 'Angry', 'Surprise', 'Sad', 'Fear']:
    post_df[emotion] = [scores[emotion] for scores in post_df.text2emotion]

In [63]:
for i, row in post_df[post_df.Happy >= 0.5].iterrows():
    print(row.sentence, row.rake)

I've been meaning to write for 4 or 5 days, but I'm not feeling all that inspirational or interesting right now ['interesting right', '5 days', 'write', 'meaning', 'inspirational', 'feeling', '4']
I'd like to be as free and light as I was just 2 months ago, but I'm not quite there yet ['2 months ago', 'yet', 'quite', 'like', 'light', 'free']
I'm working close to my normal schedule ['working close', 'normal schedule']
I'm happy and extremely grateful to report I've been able to run 2-4 days per week for the last couple of weeks ['run 2', 'last couple', 'extremely grateful', 'weeks', 'report', 'happy', 'able']
But so far so good ['good', 'far']


In [64]:
for i, row in post_df[post_df.Sad >= 0.5].iterrows():
    print(row.sentence, row.rake)

I'm in the process of resuming my normal exercise schedule and intensity as well ['normal exercise schedule', 'well', 'resuming', 'process', 'intensity']
Every time I run I feel an overwhelming sense of joy, gratitude and relief ['overwhelming sense', 'every time', 'run', 'relief', 'joy', 'gratitude', 'feel']
Unfortunately, every run also brings a bit of fear, as I'm constantly waiting for one of my Achilles tendons to flare ['constantly waiting', 'achilles tendons', 'unfortunately', 'one', 'flare', 'fear', 'bit']
They say we really don't appreciate what we have until it's lost ['say', 'really', 'lost', 'appreciate']
I can now verify the truth behind that statement ['truth behind', 'verify', 'statement']
Even though I'm getting back to my routines, I still feel the sting from the losses of my mental wellness, high level of functioning, and running ['still feel', 'mental wellness', 'high level', 'getting back', 'even though', 'sting', 'running', 'routines', 'losses', 'functioning']
Inst
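The loops above print the RAKE phrases for each sentence above an emotion threshold. To aggregate them instead, the phrases from every sentence that passes the threshold can be counted. Below is a stdlib sketch. It assumes each row is a dict with a `'rake'` list and per-emotion scores, mirroring the `post_df` columns. The helper `phrases_for_emotion` is hypothetical. In the toy rows, the phrases are copied from the outputs above, but the emotion scores are illustrative only.

```python
from collections import Counter

def phrases_for_emotion(rows, emotion, threshold=0.5):
    """Count RAKE phrases across rows whose score for `emotion` is at least `threshold`."""
    counts = Counter()
    for row in rows:
        if row[emotion] >= threshold:
            counts.update(row['rake'])
    return counts

# Toy rows: phrases copied from the outputs above, scores made up for illustration
rows = [
    {'rake': ['working close', 'normal schedule'], 'Happy': 0.6, 'Sad': 0.0},
    {'rake': ['normal exercise schedule', 'well', 'resuming', 'process', 'intensity'], 'Happy': 0.0, 'Sad': 0.7},
    {'rake': ['still feel', 'mental wellness', 'high level', 'getting back'], 'Happy': 0.2, 'Sad': 0.8},
]
print(phrases_for_emotion(rows, 'Sad'))
```

On a single post every phrase will usually appear once. The counts become informative when the rows come from many posts.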

# spaCy is not a good fit here

In [36]:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(post_df.sentence.iloc[4])


I
I
I


In [37]:
# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])
for entity in doc.ents:
    print(entity.text, entity.label_)


Noun phrases: ['I', 'I', 'I']
Verbs: ['like', 'be', 'be', 'be']
just 2 months ago DATE


In [45]:
print("Lemmas and POS tags:", [[token.lemma_, token.pos_] for token in doc])

Lemmas and POS tags: [['I', 'PRON'], ["'d", 'AUX'], ['like', 'VERB'], ['to', 'PART'], ['be', 'VERB'], ['as', 'ADV'], ['free', 'ADJ'], ['and', 'CCONJ'], ['light', 'NOUN'], ['as', 'ADP'], ['I', 'PRON'], ['be', 'VERB'], ['just', 'ADV'], ['2', 'NUM'], ['month', 'NOUN'], ['ago', 'ADV'], [',', 'PUNCT'], ['but', 'CCONJ'], ['I', 'PRON'], ['be', 'VERB'], ['not', 'PART'], ['quite', 'ADV'], ['there', 'ADV'], ['yet', 'ADV']]
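The cell above prints each token's lemma with spaCy's coarse `pos_` tag. The earlier verb list is just this output filtered to `VERB`. Below is a plain-Python sketch that works from the pairs printed above, copied by hand so that no spaCy model is needed. It reproduces that filter and counts the tags. It also makes the weakness visible: only two tokens are nouns, and `light`, used as an adjective in this sentence, is tagged `NOUN`.

```python
from collections import Counter

# (lemma, coarse POS) pairs copied from the spaCy output above
pairs = [('I', 'PRON'), ("'d", 'AUX'), ('like', 'VERB'), ('to', 'PART'), ('be', 'VERB'),
         ('as', 'ADV'), ('free', 'ADJ'), ('and', 'CCONJ'), ('light', 'NOUN'), ('as', 'ADP'),
         ('I', 'PRON'), ('be', 'VERB'), ('just', 'ADV'), ('2', 'NUM'), ('month', 'NOUN'),
         ('ago', 'ADV'), (',', 'PUNCT'), ('but', 'CCONJ'), ('I', 'PRON'), ('be', 'VERB'),
         ('not', 'PART'), ('quite', 'ADV'), ('there', 'ADV'), ('yet', 'ADV')]

# Filtering on the tag reproduces the "Verbs:" list printed earlier
verbs = [lemma for lemma, pos in pairs if pos == 'VERB']
nouns = [lemma for lemma, pos in pairs if pos == 'NOUN']
print("Verbs:", verbs)  # Verbs: ['like', 'be', 'be', 'be']
print("Nouns:", nouns)  # Nouns: ['light', 'month']
# Most frequent tags: function-like words (ADV, PRON) dominate this short sentence
print(Counter(pos for _, pos in pairs).most_common(3))  # [('ADV', 6), ('VERB', 4), ('PRON', 3)]
```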
