# SEaM free text sentiment visualiser

Scores the free-text comments in a SEaM spreadsheet for sentiment and displays them as a colour-graded table, one sentence per cell. Select the spreadsheet with the file chooser below, then run the remaining cells.

Start by selecting a file. (The widget seems to need you to click on `Open` before it actually starts properly.)

In [None]:
from IPython.display import display
from ipyfilechooser import FileChooser

# Create the file chooser widget and show it
fc = FileChooser()
display(fc)

## Pull in the data and import libraries

As far as I know, SEaM spreadsheets all share a standard format.

Start by importing the necessary modules:

In [None]:
import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns


Then stop pandas from truncating its output, so that we can see the full text of each cell:

In [None]:
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

First, let's get a SEaM data file.

In [None]:
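# Read the third sheet of the selected SEaM file (sheet_name is zero-indexed).
# Change this line if the free-text responses are on a different sheet.

feedback_df=pd.read_excel(fc.selected, sheet_name=2)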

Check the contents of the dataframe:

In [None]:
feedback_df.head()

Now, let's whittle down to the columns we actually want:

In [None]:
feedback_df=(feedback_df

    .rename({'If you answered Disagree to any of the statements above, we would like to understand why so we can make improvements in the future':'improvements',
             'Do you have any further comments about your teaching, assessment and learning on this module?':'teaching_assessment_learning',
             'Do you have any other comments to add about your study experience on this module?':'study_experience'},
            axis='columns')

    .filter(['improvements', 'teaching_assessment_learning', 'study_experience'], axis='columns')

    .dropna(axis='rows', how='all')
)
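
As a minimal illustration of the rename, filter, and dropna chain above, the sketch below applies the same steps to a small made-up DataFrame. The headers `Long question A`, `Long question B`, and `Other column` are hypothetical stand-ins, not real SEaM column names.

```python
import pandas as pd

# Hypothetical stand-in for a SEaM sheet; the real headers are much longer.
toy_df = pd.DataFrame({
    'Long question A': ['Too much reading.', None, None],
    'Long question B': [None, 'Great tutor.', None],
    'Other column': [1, 2, 3],
})

tidy_df = (toy_df
    # Give the free-text columns short names
    .rename({'Long question A': 'improvements',
             'Long question B': 'study_experience'}, axis='columns')
    # Keep only the free-text columns
    .filter(['improvements', 'study_experience'], axis='columns')
    # Drop respondents who left every free-text question blank
    .dropna(axis='rows', how='all')
)
print(tidy_df)
```

Filtering before `dropna` matters here: once `Other column` is gone, a row with no comments at all is entirely empty and can be dropped.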

In [None]:
feedback_df.head()

## Split the sentences in the free text cells

To split the input into separate sentences, use the NLTK library function `sent_tokenize`:

In [None]:
# import the language model for sentence splitting

import nltk
nltk.download('punkt')

In [None]:
from nltk.tokenize import sent_tokenize

Next, let's put all the sentences into a single DataFrame, reasonably tidily. Each of the following three cells builds one DataFrame per free-text column, with one row per sentence.

In [None]:
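l=[]

# Keep only the responses that have an 'improvements' comment
ss=(feedback_df['improvements']
 
     .dropna()
)

# One row per sentence, keyed by response index and sentence position
for idx in ss.index:
    l.extend([{'response':idx, 'sentence_num':i, 'improvements':s} for (i, s)
              in enumerate(sent_tokenize(ss[idx]))])

df1=pd.DataFrame(l)
# df1.head()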

In [None]:
l=[]

ss=(feedback_df['teaching_assessment_learning']
 
     .dropna()
)

for idx in ss.index:
    l.extend([{'response':idx, 'sentence_num':i, 'teaching_assessment_learning':s} for (i, s)
              in enumerate(sent_tokenize(ss[idx]))])

df2=pd.DataFrame(l)
# df2.head()

In [None]:
l=[]

ss=(feedback_df['study_experience']
 
     .dropna()
)

for idx in ss.index:
    l.extend([{'response':idx, 'sentence_num':i, 'study_experience':s} for (i, s)
              in enumerate(sent_tokenize(ss[idx]))])

df3=pd.DataFrame(l)
# df3.head()
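
The three cells above differ only in the column name, so they could be folded into one helper. The following is a minimal sketch, not the notebook's own code: `sentences_frame` and `naive_split` are hypothetical names, and `naive_split` is a crude regex stand-in used only so that the example runs without the NLTK data. In the notebook you would pass `split=sent_tokenize` instead.

```python
import re
import pandas as pd

def naive_split(text):
    """Hypothetical stand-in splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def sentences_frame(df, column, split=naive_split):
    """Return one row per sentence of `column`, keyed by response and sentence_num."""
    rows = []
    for idx, text in df[column].dropna().items():
        rows.extend({'response': idx, 'sentence_num': i, column: s}
                    for i, s in enumerate(split(text)))
    # Naming the columns keeps them defined even when there are no rows
    return pd.DataFrame(rows, columns=['response', 'sentence_num', column])

demo_df = pd.DataFrame({'improvements': ['Too long. Too hard!', None, 'Fine.']})
demo_sentences = sentences_frame(demo_df, 'improvements')
print(demo_sentences)
```

With such a helper, `df1`, `df2`, and `df3` would each become a one-line call on `feedback_df`.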

In [None]:
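# Outer-merge on the shared key columns so that no sentence is dropped
all_comments_df=(pd
                 
                 .merge(df1, df2, how='outer', on=['response', 'sentence_num'])
                 
                 .merge(df3, how='outer', on=['response', 'sentence_num'])
                )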

all_comments_df.head()
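
To see what the outer merge does, here is a small sketch on made-up data (the sentences are invented for illustration). Rows are matched on `response` and `sentence_num`; where only one side has a sentence for a given key, the other column is filled with NaN.

```python
import pandas as pd

# Invented sentences: response 0 answered both questions, response 3 only one
left = pd.DataFrame({'response': [0, 0], 'sentence_num': [0, 1],
                     'improvements': ['Too long.', 'Too hard!']})
right = pd.DataFrame({'response': [0, 3], 'sentence_num': [0, 0],
                      'study_experience': ['Loved it.', 'Good pace.']})

# how='outer' keeps every key that appears on either side
merged = pd.merge(left, right, how='outer', on=['response', 'sentence_num'])
print(merged)
```

The result has three rows: one with both columns filled, and two where one column is NaN. This is why the scoring step later has to cope with missing values.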

In [None]:
all_comments_df=(all_comments_df
 
                 .sort_values(['response', 'sentence_num'])
 
                 .set_index(['response', 'sentence_num'])
)

all_comments_df

## Apply the sentiment analyser

We can use the VADER sentiment analyser that ships with NLTK.

In [None]:
# import the language model for sentiment analysis

import nltk
nltk.download('vader_lexicon')

In [None]:
from nltk.sentiment import SentimentIntensityAnalyzer

In [None]:
sia = SentimentIntensityAnalyzer()

In [None]:
sia.polarity_scores("TM351 was the best module I have ever imagined!")

In [None]:
sia.polarity_scores("TM351 is the worst course I have studied in decades at the OU")

The `'compound'` key in the returned dictionary is the one we want: it is a single overall score that ranges from -1 (most negative) to +1 (most positive).
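
As far as I understand the VADER implementation, the compound score is the sum of the (rule-adjusted) word valences, squashed into the range -1 to +1 by the normalisation x / sqrt(x² + α) with α = 15. The sketch below reproduces only that normalisation step, under that assumption; `normalise` is an illustrative name and this is not the library's code.

```python
import math

def normalise(raw_score, alpha=15):
    """Squash an unbounded summed valence into the range [-1, 1]."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clamp to guard against floating-point overshoot
    return max(-1.0, min(1.0, norm))

# A raw valence of 1.0 normalises to 1 / sqrt(16) = 0.25
for raw in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(raw, round(normalise(raw), 4))
```

The practical consequence is that sentences with several strongly loaded words approach ±1 quickly, while neutral sentences stay at 0.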

## Visualising the responses

We can combine the power of *seaborn*, which generates nice graded palettes, with *pandas*' styling methods for DataFrames.

We can use a diverging palette that runs from red (negative) through a light centre to green (positive):

In [None]:
sentiment_colour_map=sns.diverging_palette(10, 125, s=75, l=50,
                                           n=12, center="light", as_cmap=True)
sentiment_colour_map

And then map the sentences in the DataFrame onto the `compound` values:

In [None]:
def polarity_scores_check(txt):
    '''Returns the compound polarity score for txt, or 0 when scoring
       raises an error (for example, for NaNs and other non-string
       values).
    '''
    try:
        return sia.polarity_scores(txt)['compound']
    except Exception:
        return 0

# Note: applymap is deprecated from pandas 2.1, where DataFrame.map replaces it
all_comments_df.applymap(polarity_scores_check)

And finally, we can use the polarity scores DataFrame to colour the cells in the text DataFrame:

In [None]:
all_comments_df.style.background_gradient(cmap=sentiment_colour_map,
                                          axis=None, vmin=-1, vmax=1,
                                          gmap=all_comments_df.applymap(polarity_scores_check))