# Topic Detection and Sentiment Analysis
### Part 1: Exploration of POS and DEP

In [None]:
import spacy
import spacy_transformers  # provides the transformer components used by en_core_web_trf
from spacy import displacy

nlp = spacy.load("en_core_web_trf")
# A plain string avoids leading/trailing "\n" tokens showing up in the parse
document = nlp(
    "Given that it’s backed up by solid performance and fair price, "
    "there’s nothing wrong with that at all."
)
displacy.render(document, style="dep")

### Part 2: Rules for Topic Detection
- How do we find topics mentioned?
- Can we extract subsentences (a.k.a. clauses) related to each topic?

Note: in the solution below we only follow the direct left and right children of each topic token (`token.lefts` and `token.rights`). We could probably improve our approach by following them recursively. Remember recursion? :-)

In [None]:
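def topic_detection(document):
    """Map each candidate topic noun to a crude subsentence built from its direct children.

    A candidate topic is a NOUN whose dependency label is nsubj (nominal subject),
    pobj (object of a preposition) or conj (conjunct). Accepts a Doc or a Span.
    """
    result = {}

    topics = [
        token for token in document
        if token.pos_ == "NOUN" and token.dep_ in ("nsubj", "pobj", "conj")
    ]

    for topic in topics:
        # Only the direct left/right children are used, not their descendants
        subsentence = [lefty.text for lefty in topic.lefts]
        subsentence.append(topic.text)
        subsentence.extend(righty.text for righty in topic.rights)

        # Keyed by the topic's text: a noun that occurs twice keeps only its last subsentence
        result[topic.text] = " ".join(subsentence)

    return result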


In [None]:
print(topic_detection(document))
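The note at the start of Part 2 suggests following children recursively. Below is a minimal, model-free sketch of that idea on a hand-annotated toy parse: the sentence, its POS/DEP labels and its head indices are our own illustration, not output of `en_core_web_trf`, which may parse it differently. It applies the same topic rule as `topic_detection` and compares the direct-children subsentence with the full recursive subtree. The helper names (`find_topics`, `direct_span`, `subtree`, `join_tokens`) are hypothetical; on a real spaCy `Token`, the built-in `token.subtree` attribute should give the recursive span without hand-written code.

```python
# Hand-annotated toy parse: (text, pos, dep, head index); the root points at itself.
# Written for illustration only; a real model may produce a different tree.
PARSE = [
    ("The", "DET", "det", 1),
    ("athlete", "NOUN", "nsubj", 6),
    ("with", "ADP", "prep", 1),
    ("the", "DET", "det", 5),
    ("best", "ADJ", "amod", 5),
    ("shoes", "NOUN", "pobj", 2),
    ("won", "VERB", "ROOT", 6),
    (".", "PUNCT", "punct", 6),
]
TOPIC_DEPS = {"nsubj", "pobj", "conj"}


def find_topics(parse):
    """Indices of nouns whose dependency label marks them as candidate topics (Part 2 rule)."""
    return [i for i, (_, pos, dep, _) in enumerate(parse)
            if pos == "NOUN" and dep in TOPIC_DEPS]


def children(parse, i):
    """Indices of the tokens whose head is token i (the root's self-loop is excluded)."""
    return [j for j, token in enumerate(parse) if token[3] == i and j != i]


def direct_span(parse, i):
    """Token i plus its direct children only, in sentence order (the Part 2 approach)."""
    return sorted([i] + children(parse, i))


def subtree(parse, i):
    """Token i plus all of its descendants, collected recursively, in sentence order."""
    indices = [i]
    for child in children(parse, i):
        indices.extend(subtree(parse, child))
    return sorted(indices)


def join_tokens(parse, indices):
    return " ".join(parse[j][0] for j in indices)


for i in find_topics(PARSE):
    print(PARSE[i][0], "|", join_tokens(PARSE, direct_span(PARSE, i)),
          "|", join_tokens(PARSE, subtree(PARSE, i)))
# athlete | The athlete with | The athlete with the best shoes
# shoes | the best shoes | the best shoes
```

For "athlete", the direct-children approach stops at the preposition "with", while the recursive subtree also picks up its object "the best shoes".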

### Part 3: Sentiment Analysis

Note: we now switch to a different document to better demonstrate the kinds of results obtained when doing sentiment analysis. The following would typically be applied to each row of a DataFrame containing our original texts. As a result, we obtain two new DataFrames: one with the sentiment scores per sentence, and another with the sentiment scores per subsentence (as detected by our approach in Part 2).

In [None]:
import pandas as pd

# Implicit string concatenation keeps stray newlines and indentation out of the text
document = nlp(
    "I am not sure if this is the case all around the world, but in "
    "American sports culture, fans love to argue not just which athlete "
    "is better, but also which has been properly rated. The gist is that "
    "being good isn't enough, the athlete has to be at least as good or "
    "better as the hype bestowed on them (usually by media or apparel "
    "companies like Nike). Not living up to the hype results in the "
    "dreaded overrated moniker, and the athlete usually receives "
    "criticism or mockery."
)
sentence_df = pd.DataFrame(columns=[
    "sentence", 
    "tb_polarity", 
    "tb_subjectivity", 
    "sia_neg", 
    "sia_neu", 
    "sia_pos", 
    "sia_compound"
    ])
subsentence_df = pd.DataFrame(columns=[
    "sentence", 
    "topic", 
    "subsentence", 
    "tb_polarity", 
    "tb_subjectivity", 
    "sia_neg", 
    "sia_neu", 
    "sia_pos", 
    "sia_compound"
    ])

In [None]:
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()


def sentiment_scores(text):
    """Score a text with TextBlob and VADER, using the column names defined above."""
    tb = TextBlob(text).sentiment
    vader = sia.polarity_scores(text)
    return {
        "tb_polarity": tb.polarity,
        "tb_subjectivity": tb.subjectivity,
        "sia_neg": vader["neg"],
        "sia_neu": vader["neu"],
        "sia_pos": vader["pos"],
        "sia_compound": vader["compound"],
    }


# Collect plain dicts and build each DataFrame once at the end: this keeps the
# sentences in their original order and avoids repeated pd.concat calls.
sentence_rows, subsentence_rows, subsentence_index = [], [], []
for i, sentence in enumerate(document.sents):
    sentence_rows.append({"sentence": sentence.text, **sentiment_scores(sentence.text)})

    # A sentence Span can be iterated like a Doc, so we reuse the existing parse
    # instead of running the transformer pipeline again on sentence.text.
    for topic, subsentence in topic_detection(sentence).items():
        # One row per topic (every topic is kept, not just the last one)
        subsentence_rows.append({
            "sentence": sentence.text,
            "topic": topic,
            "subsentence": subsentence,
            **sentiment_scores(subsentence),
        })
        subsentence_index.append(i)  # same index as the parent sentence

sentence_df = pd.DataFrame(sentence_rows, columns=sentence_df.columns)
subsentence_df = pd.DataFrame(
    subsentence_rows, index=subsentence_index, columns=subsentence_df.columns
)
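The Part 3 note says this loop would typically run once per row of a DataFrame of original texts. Here is a minimal sketch of that pattern under our own assumptions: `explode_rows` and `naive_split` are hypothetical helpers, the regex splitter only stands in for spaCy's sentence segmentation, and the scorer passed in is a placeholder for the TextBlob/VADER scoring. The point it illustrates is that every output record carries the `row_id` of the text it came from, so both result tables can be joined back to the original data.

```python
import re


def naive_split(text):
    """Hypothetical stand-in for spaCy's sentence splitter: split after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def explode_rows(rows, split, score):
    """Turn (row_id, text) pairs into one record per piece returned by split(text).

    score(piece) must return a dict of columns; each record also keeps the
    row_id and the piece's position so it can be joined back to its source row.
    """
    records = []
    for row_id, text in rows:
        for position, piece in enumerate(split(text)):
            records.append({"row_id": row_id, "position": position,
                            "text": piece, **score(piece)})
    return records


rows = [(10, "Fair price. Solid shoes."), (11, "Overrated.")]
# Placeholder scorer: counts words instead of computing sentiment
records = explode_rows(rows, naive_split, lambda t: {"n_words": len(t.split())})
for record in records:
    print(record)
```

With pandas, `pd.DataFrame(records)` turns the result into a table; the subsentence table could follow the same pattern with a variant that also stores the topic key returned by `topic_detection`.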

In [None]:
sentence_df.head()

In [None]:
subsentence_df.head()
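To read the score columns: TextBlob's polarity lies in [-1, 1] and its subjectivity in [0, 1] (0 = very objective, 1 = very subjective). VADER's `neg`, `neu` and `pos` are, as we understand its documentation, the proportions of the text falling into each category (summing to about 1), while `compound` is the sum of the word valence scores squashed into [-1, 1]. The sketch below re-implements that squashing step as we understand it from the vaderSentiment source (x / sqrt(x² + α), with α = 15 by default, clipped to [-1, 1]), plus the ±0.05 thresholds the VADER README suggests for labelling a text positive, neutral or negative. `vader_normalize` and `vader_label` are our own hypothetical names, not part of the library's API.

```python
import math


def vader_normalize(score, alpha=15):
    """Squash an unbounded sum of valence scores into [-1, 1], like VADER's compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def vader_label(compound, threshold=0.05):
    """Map a compound score to a label using the README's suggested +/-0.05 cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for raw in (-1, 0, 1, 7):
    compound = vader_normalize(raw)
    print(raw, compound, vader_label(compound))
# -1 -0.25 negative
# 0 0.0 neutral
# 1 0.25 positive
# 7 0.875 positive
```

Because the squashing flattens out quickly, a handful of strongly positive words already pushes `compound` close to 1. A label column could then be derived with, for example, `subsentence_df["sia_compound"].apply(vader_label)`.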