In [12]:
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

### Get data

In [4]:
df = pd.read_csv('../data/mbti_clean_subset.csv')

In [5]:
df

Unnamed: 0,type,posts
0,INFJ,What has been the most life-changing experienc...
1,INFJ,May the PerC Experience immerse you.
2,INFJ,Hello ENFJ7. Sorry to hear of your distress. I...
3,INFJ,Welcome and stuff.
4,INFJ,"Prozac, wellbrutin, at least thirty minutes of..."
...,...,...
4545,ENFP,"Dear ISTJ Father, It's been really hard to re..."
4546,ENFP,Let me clarify this whole thing and perhaps pu...
4547,ENFP,"Lol, I thought you'd say that. Agreed, an..."
4548,ENFP,ISTJ: The OCD ISFJ: The Social Anxiety Disord...


### Exploratory Data Analysis

In [6]:
# num entries by type
print(df['type'].value_counts())

INFP    971
INFJ    933
INTP    803
INTJ    561
ENTJ    278
ENFJ    268
ENFP    188
ENTP    171
ISFP    131
ISFJ    124
ISTP     94
ISTJ     28
Name: type, dtype: int64


There is an imbalance in our dataset. We may want to use resampling methods (SMOTE) in the future.

In [9]:
def print_comment_by_type(mbti_type):
    for index, row in df.iterrows():
        print(row['posts'])
        print()

In [10]:
print_comment_by_type('ISTP')

What has been the most life-changing experience in your life?

May the PerC Experience immerse you.

Hello ENFJ7. Sorry to hear of your distress. It's only natural for a relationship to not be perfection all the time in every moment of existence. Try to figure the hard times as times of growth, as...

Welcome and stuff.

Prozac, wellbrutin, at least thirty minutes of moving your legs (and I don't mean moving them while sitting in your same desk chair), weed in moderation (maybe try edibles as a healthier alternative...

Basically come up with three items you've determined that each type (or whichever types you want to do) would more than likely use, given each types' cognitive functions and whatnot, when left by...

All things in moderation.  Sims is indeed a video game, and a good one at that. Note: a good one at that is somewhat subjective in that I am not completely promoting the death of any given Sim...

Dear ENFP:  What were your favorite video games growing up and what are your 

This was an aquired thing for me, I didn't start doing this until something impacted me and inspired me enough to start thinking this way

Ugh. I'm starting to have a hard time getting to sleep. Monday was also a complete mess. I haven't felt that emotionally unstable in a long time. But I'm really happy for my friends who were able to...

ISFJ's naturally seek validation, or just being acknowledged of what they are doing. This seems like she wanted it a little too much, or she doesn't feel like you give her enough validation and...

It's really hard to put my finger on why... but i know i don't like seeing people sad or distressed, and i don't like being rude. And as i said, i like the feeling from helping others. Maybe it's...

Not EXACTLY sure on what you are asking, but even if i'm nice to someone, i can still be annoyed with them or not want to be around them. If someone just walks up to me and starts talking to me, it...

I am truly excited for one of the first times in a VERY lo

### Vader Analysis - what type of sentiment each types posts about

In [41]:
analyzer = SentimentIntensityAnalyzer()
def get_vader_prediction(string):
    outputs = analyzer.polarity_scores(string)
    max_score = max(outputs['neg'], outputs['pos'], outputs['neu'])
    
    prediction = [key for key, value in outputs.items() if max_score == value][0]
    
    pred_map = {'neu': 'neutral', 'pos': 'positive', 'neg':'negative'}
    
    return pred_map.get(prediction)
    
print(get_vader_prediction('Welcome and stuff.'))
print(get_vader_prediction('Yessss, Adventure Time :D'))

positive
positive


In [22]:
df['vader_pred'] = df['posts'].apply(lambda x: get_vader_prediction(str(x)))

In [39]:
vader_counts = df.groupby(['type', 'vader_pred']).count().reset_index()
vader_aggr = pd.pivot_table(vader_counts, values = 'posts', columns = ['vader_pred'], index=['type'], fill_value=0)
vader_aggr['total'] = vader_aggr['positive'] + vader_aggr['negative'] + vader_aggr['neutral']

In [40]:
# vader_perc

vader_pred,negative,neutral,positive,total
type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
ENFJ,3,226,39,268
ENFP,1,179,8,188
ENTJ,5,265,8,278
ENTP,0,168,3,171
INFJ,12,879,42,933
INFP,13,918,35,966
INTJ,8,543,10,561
INTP,8,775,20,803
ISFJ,1,117,6,124
ISFP,0,123,8,131


### Topic Modelling - What subject of discussion do each type talk about