**Step 1**: Import the necessary packages and set up the environment.

In [2]:
# Install Flair for sentiment analysis
!pip install flair

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting flair
  Downloading flair-0.12.2-py3-none-any.whl (373 kB)
Collecting conllu>=4.0
  Downloading conllu-4.5.2-py2.py3-none-any.whl (16 kB)
Collecting janome
  Downloading Janome-0.4.2-py2.py3-none-any.whl (19.7 MB)
Collecting mpld3==0.3
  Downloading mpld3-0.3.tar.gz (788 kB)
  Preparing metadata (setup.py) ... done
Collecting pytorch-revgrad
  Downloading pytorch_revgrad-0.2.0-py3-none-any.whl (4.6 kB)
Collecting bpemb>=0.3.2
  Downloading bpemb-0.3.4-py3-none-any.whl (19 kB)
Collecting segto

In [3]:
import pandas as pd
import flair
from google.colab import drive

# Mount Google Drive so the notebook can read and write files under /content/drive
drive.mount('/content/drive')

**Step 2**: Load data from Google Drive.

In [4]:
# List the files at the top level of the mounted Google Drive
!ls '/content/drive/My Drive'

 CIT594GroupProjectUML.drawio  'MCIT 594 Final Project Report.gdoc'
'Colab Notebooks'	       'Meeting Notes.gdoc'
'Getting started.pdf'	        wsb_post_filtered_by_tickers.gsheet
'Group Discussion.gsheet'       wsb_post.gsheet
 kaggle.json


In [5]:
# Load the ticker-filtered posts from CSV into a pandas DataFrame
path = '/content/drive/MyDrive/Colab Notebooks/wsb_post_filtered_by_tickers.csv'
wsb_data_df = pd.read_csv(path)

**Step 3**: Check characteristics of the dataset.

In [6]:
# Check number of records loaded
len(wsb_data_df)

32603

In [7]:
# Check the data type of each column
wsb_data_df.dtypes

date       object
ticker     object
comment    object
dtype: object

In [8]:
# Print data in first 5 rows
wsb_data_df.head(5)

|   | date | ticker | comment |
|---|---|---|---|
| 0 | 2023-04-14 | TSLA | Everyone seeing Tesla price cuts and say "we g... |
| 1 | 2023-04-14 | TSLA | !banbet TSLA -100% 1h |
| 2 | 2023-04-14 | NVDA | In the future your son will say “dad you knew ... |
| 3 | 2023-04-14 | TSLA | Come on TSLA, get your ass to $200 you can’t g... |
| 4 | 2023-04-14 | TSLA | The cool thing about TSLA is you can sell your... |


**Step 4**: Load the pre-trained `Flair` sentiment model (`en-sentiment`) and its tokenizer.

In [9]:
flair_sentiment = flair.models.TextClassifier.load('en-sentiment')

2023-04-28 07:07:53,196 https://nlp.informatik.hu-berlin.de/resources/models/sentiment-curated-distilbert/sentiment-en-mix-distillbert_4.pt not found in cache, downloading to /tmp/tmpym0tyjbh
2023-04-28 07:08:04,723 copying /tmp/tmpym0tyjbh to cache at /root/.flair/models/sentiment-en-mix-distillbert_4.pt
2023-04-28 07:08:05,417 removing temp file /tmp/tmpym0tyjbh

**Step 5**: Define a function to extract the sentiment score. Because `Flair` labels each sentence as either `POSITIVE` or `NEGATIVE` and attaches a confidence score to that label, we combine the two: the label sets the sign and the confidence sets the magnitude, giving a single signed score between -1 and 1.

In [11]:
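# Define the function to extract a signed sentiment score
def senti_score(text):
    # Missing (NaN after read_csv) or blank comments yield no label, so treat them as neutral
    if not isinstance(text, str) or not text.strip():
        return 0.0
    s = flair.data.Sentence(text)
    flair_sentiment.predict(s)
    label = s.labels[0]
    assert label.value in ['POSITIVE', 'NEGATIVE']
    # Map the label to +1/-1 and scale it by the model's confidence
    sign = 1 if label.value == 'POSITIVE' else -1
    return sign * label.score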
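
The label-to-sign mapping used in `senti_score` can be checked without downloading the model. The sketch below uses a hypothetical `StubLabel` dataclass that mimics only the `value` and `score` attributes read from a Flair label; `signed_score` is likewise an illustrative name, not part of Flair.

```python
from dataclasses import dataclass


@dataclass
class StubLabel:
    """Hypothetical stand-in for a Flair label: only `value` and `score` are used."""
    value: str    # expected to be 'POSITIVE' or 'NEGATIVE'
    score: float  # model confidence in [0, 1]


def signed_score(label) -> float:
    """Combine a POSITIVE/NEGATIVE label and its confidence into one signed number."""
    if label.value not in ('POSITIVE', 'NEGATIVE'):
        raise ValueError(f"unexpected label: {label.value!r}")
    sign = 1 if label.value == 'POSITIVE' else -1
    return sign * label.score


# A confident positive lands near +1, a less confident negative closer to 0
print(signed_score(StubLabel('POSITIVE', 0.98)))  # 0.98
print(signed_score(StubLabel('NEGATIVE', 0.75)))  # -0.75
```

If the confidence is the predicted class's probability out of two classes (an assumption about this model), its magnitude never drops below 0.5, so the combined score separates positive from negative comments but does not express a neutral middle.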

**Step 6**: Apply the function to the `comment` column of `wsb_data_df` and store the results in a new `sentiment` column.

In [15]:
wsb_data_df['sentiment'] = wsb_data_df.comment.map(senti_score)
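
Mapping `senti_score` over the column calls the model once per comment, i.e. 32,603 forward passes with a batch of one. Flair's `predict` also accepts a list of `Sentence` objects together with a `mini_batch_size` argument, so a faster variant could score comments in chunks. The sketch below keeps the model call behind a hypothetical `score_batch` callable (a function from a list of texts to a list of scores) so the chunking logic can be shown without the model; `chunked`, `score_in_batches`, and `toy_scorer` are illustrative names, not Flair APIs.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` holding at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_in_batches(texts: Sequence[str],
                     score_batch: Callable[[List[str]], List[float]],
                     batch_size: int = 32) -> List[float]:
    """Score texts chunk by chunk, keeping the output aligned with the input order."""
    scores: List[float] = []
    for batch in chunked(list(texts), batch_size):
        batch_scores = score_batch(list(batch))
        if len(batch_scores) != len(batch):
            raise ValueError("score_batch must return exactly one score per text")
        scores.extend(batch_scores)
    return scores


# Hypothetical stand-in scorer: +1 if the text mentions 'moon', otherwise -1
def toy_scorer(batch: List[str]) -> List[float]:
    return [1.0 if 'moon' in t else -1.0 for t in batch]


comments = ['TSLA to the moon', 'bag holding again', 'moon soon', 'red day']
print(score_in_batches(comments, toy_scorer, batch_size=3))  # [1.0, -1.0, 1.0, -1.0]
```

With Flair, `score_batch` could build one `flair.data.Sentence` per text, pass the whole list to `flair_sentiment.predict(sentences, mini_batch_size=32)`, and apply the same sign mapping to each sentence's first label. Blank comments would still need the neutral fallback, since they receive no label.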

**Step 7**: Export the DataFrame, including the new `sentiment` column, as a **CSV** file.

In [17]:
wsb_data_df.to_csv(r'/content/drive/MyDrive/Colab Notebooks/wsb_data_with_senti.csv', index=False)
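
Passing `index=False` matters because pandas otherwise writes the DataFrame's index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. A minimal round trip on a small frame with the same columns shows the difference; the rows are made up for illustration.

```python
import io

import pandas as pd

# Small illustrative frame with the same columns as wsb_data_df after Step 6
df = pd.DataFrame({
    'date': ['2023-04-14', '2023-04-14'],
    'ticker': ['TSLA', 'NVDA'],
    'comment': ['example one', 'example two'],
    'sentiment': [0.9, -0.6],
})

# With no path, to_csv returns the CSV text; the default index=True adds the RangeIndex
csv_with_index = df.to_csv()
csv_without_index = df.to_csv(index=False)

print(list(pd.read_csv(io.StringIO(csv_with_index)).columns))
# ['Unnamed: 0', 'date', 'ticker', 'comment', 'sentiment']
print(list(pd.read_csv(io.StringIO(csv_without_index)).columns))
# ['date', 'ticker', 'comment', 'sentiment']
```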