# First and foremost, thank you for sharing this dataset.
# Author: Thomas Gamet, licensed under the Apache 2.0 license.

Update: 90% accuracy with ChatGPT as the LLM performing the diagnosis, and a Cohen's kappa of 0.78.
Lessons: ChatGPT hallucinates when given conflicting inputs, results improve with a more specific focus, and prompt design matters. Three minor adjustments were tried, and the result below is consistent with what I have been able to get from ChatGPT 3.5 (the exact model name is in the code below).
Both agree on potential suicide post =  541 times.
Both agree on not suicide post =  1068 times.
Ground truth says potential suicide post and LLM disagrees =  118 times.
Ground truth says not a suicide post and LLM disagrees =  58 times.
Calculated Observed Agreement = 0.9014005602240897
Calculated Chance Agreement = 0.5430178345848143
Cohen's Kappa= 0.7842378822676176
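The figures above can be checked by hand with the standard Cohen's kappa formulas. Here $a = 541$ and $d = 1068$ are the two kinds of agreement, $b = 118$ (ground truth says potential suicide post, LLM disagrees) and $c = 58$ (ground truth says not a suicide post, LLM disagrees), using the same a to d lettering as the Cohen's kappa code cell; $p_o$ is the observed agreement and $p_e$ the chance agreement:

$$
\begin{aligned}
N &= a + b + c + d = 541 + 118 + 58 + 1068 = 1785 \\
p_o &= \frac{a + d}{N} = \frac{1609}{1785} \approx 0.9014 \\
p_e &= \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2} = \frac{659 \cdot 599 + 1126 \cdot 1186}{1785^2} = \frac{1730177}{3186225} \approx 0.5430 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} \approx \frac{0.9014 - 0.5430}{1 - 0.5430} \approx 0.7842
\end{aligned}
$$

Note that $p_o$ is the same quantity as the accuracy relative to ground truth (90.14%).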

After reading https://www.nature.com/articles/s41598-018-25773-2 (Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing):
"The sets were independently classified by all three authors with good inter-rater agreement as indicated by a Cohen’s kappa of 0.85 for suicide ideation and 0.86 for suicide attempt." (See the supplementary material for further details on inter-rater agreement and rules: https://github.com/andreafernandes/NLP_Tools_Development.) It seems that a kappa of about 0.85 to 0.86 may be in a gold-standard range. I also took a few cues from the paper and updated the prompts.

Earlier results, after adding a Cohen's kappa calculation and further optimizing the prompts and loops:
Both agree on potential suicide post =  459 times.
Both agree on not suicide post =  1100 times.
Ground truth says potential suicide post and LLM disagrees =  200 times.
Ground truth says not a suicide post and LLM disagrees =  26 times.
Calculated Observed Agreement = 0.8733893557422969
Calculated Chance Agreement = 0.5597266357523402
Cohen's Kappa= 0.7124271996920467 (this is mid-range for what is generally accepted as substantial agreement)

There were a total of 226 errors made relative to ground truth.
Accuracy is at 87.3389355742297 %

The observed agreement looks good, and the kappa is generally also a good result, even if a bit
below the gold standard of the article above. For those not familiar with Cohen's kappa: https://www.statology.org/cohens-kappa-statistic/

I am not sure where an algorithm would start overfitting, but I am fairly sure a generalized solution
can only be a little more accurate before it becomes suspiciously accurate relative to human-to-human comparisons, and those human-to-human comparisons should be the most generalized ones.

Experiment: loading 80% of the tweets into the vector store for similarity matches, in a stratified way, made only a small change in accuracy; the main difference was in the false negatives. Experiments:
1. Originally, with the LLM deciding on potential suicide posts alone, accuracy was 82.5%.
2. With a stratified 80% of the tweets in the vector store, accuracy was 83.6%.
3. Loading 80% of the tweets labeled as potential suicide posts gave 84.6% accuracy.
4. A prompt improvement aimed at considering depression and being tired of life gave 87.3% accuracy.
5. Updated the prompts to aim at further improvements after reading (Identifying Suicide Ideation and Suicidal Attempts in a Psychiatric Clinical Research Database using Natural Language Processing).

The OpenAI ChatGPT LLM seems to strongly favor labeling a large number of ground-truth potential
suicide posts as not suicide posts (shown as false negatives against the ground truth), even when the vector store is indexed with many of the actual data points that provide similarity matches for potential suicide posts.

The prompts were improved, and the output was made easier to read when confirming the LLM's performance and
the number of retries needed when the LLM produces unusable results. The results are also now
validated to make sure the outputs are valid responses, and exceptions during dictionary processing
are handled (for example, when expected keys are missing).
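A stricter version of that response check can be sketched as a single function. This is a minimal sketch, not the notebook's code: `parse_diagnoses` and `VALID_DIAGNOSES` are hypothetical names, and it assumes the LLM replies with a Python-literal dictionary in the requested `{'tweets': [...]}` format. Any reply that cannot be parsed, has missing keys, uses an index outside the batch, skips a tweet, or uses a label other than Follow-up or Ignore returns `None`, which tells the caller to retry.

```python
import ast

# Hypothetical set of labels the scoring cells can count.
VALID_DIAGNOSES = {"Follow-up", "Ignore"}


def parse_diagnoses(response, expected_ids):
    """Parse one LLM reply into {tweet_number: diagnosis}, or return None if unusable.

    Assumes a reply shaped like {'tweets': [{'tweet_number': 0, 'diagnosis': 'Ignore'}]}.
    """
    try:
        parsed = ast.literal_eval(response.strip())
    except (ValueError, SyntaxError, TypeError):
        return None  # not a Python literal, e.g. prose or unquoted keys
    if not isinstance(parsed, dict) or not isinstance(parsed.get("tweets"), list):
        return None  # wrong shape, e.g. a bare list or a missing 'tweets' key
    results = {}
    for item in parsed["tweets"]:
        if not isinstance(item, dict):
            return None
        try:
            idx = int(item["tweet_number"])
            diagnosis = item["diagnosis"]
        except (KeyError, ValueError, TypeError):
            return None  # missing key or non-numeric tweet_number
        if idx not in expected_ids or diagnosis not in VALID_DIAGNOSES:
            return None  # index outside the batch, or a label that cannot be scored
        results[idx] = diagnosis
    if set(results) != set(expected_ids):
        return None  # the LLM skipped at least one tweet in the batch
    return results


print(parse_diagnoses("{'tweets': [{'tweet_number': 7, 'diagnosis': 'Ignore'}]}", {7}))
print(parse_diagnoses("{tweets: [{tweet_number: 7, diagnosis: 'Ignore'}]}", {7}))
```

The first call yields `{7: 'Ignore'}`; the second yields `None`, because the unquoted keys copied from the prompt's format hint are not a valid Python literal.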

## 1 Use pip to install langchain and Chroma
## 2 Load and prepare a dataset for ChatGPT to work upon
## 3 Ask ChatGPT to analyze batches of 4 tweets for suicide related ideation
## 4 Report False Positives, False Negatives, and overall accuracy (with listing)
## 5 Report Cohen's Kappa
## 6 Load a VectorstoreIndex with all the tweets to ask inter-tweet questions
## 7 Show a few (3) questions and the answers given

You will need to use your own OpenAI API key or switch to a no-cost LLM.
I hope this proves useful to some viewers. A full run of this notebook, using the
default settings with ChatGPT 3.5, cost about 0.40 USD according to OpenAI's usage information.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/suicidal-tweet-detection-dataset/Suicide_Ideation_Dataset(Twitter-based).csv


# Several pip installation calls are made

In [2]:
pip install langchain

Collecting langchain
  Downloading langchain-0.0.278-py3-none-any.whl (1.6 MB)
Collecting langsmith<0.1.0,>=0.0.21 (from langchain)
  Downloading langsmith-0.0.32-py3-none-any.whl (36 kB)
Installing collected packages: langsmith, langchain
Successfully installed langchain-0.0.278 langsmith-0.0.32
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install openai

Collecting openai
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install chromadb

Collecting chromadb
  Downloading chromadb-0.4.8-py3-none-any.whl (418 kB)
Collecting chroma-hnswlib==0.7.2 (from chromadb)
  Downloading chroma-hnswlib-0.7.2.tar.gz (31 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting posthog>=2.4.0 (from chromadb)
  Downloading posthog-3.0.2-py2.py3-none-any.whl (37 kB)
Collecting pulsar-client>=3.1.0 (from chromadb)
  Downloading pulsar_client-3.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)
Collecting onnxruntime>=1.14.1 (from chromadb)
  Downloading onnxruntime-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x8

In [5]:
pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.4.0
Note: you may need to restart the kernel to use updated packages.


# Load and prepare a dataset for ChatGPT to work upon

In [6]:
# Step one, get data and supporting modules into place

import pandas as pd
from datasets import Dataset
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.indexes import VectorstoreIndexCreator
from langchain.docstore.document import Document
from langchain.chat_models import ChatOpenAI
import os
import openai
import sys
import ast
import time

dataname = "/kaggle/input/suicidal-tweet-detection-dataset/Suicide_Ideation_Dataset(Twitter-based).csv"
tweet_df = pd.read_csv(dataname)

tweet_df = tweet_df.dropna(subset=['Tweet','Suicide']) # drop rows missing a tweet or a label
tweet_df = tweet_df.reset_index(drop=True)

ltest_df = tweet_df.copy() # [0:5].copy() # make it small for testing
ltest_df.drop(columns=['Suicide'],inplace=True)

print ("Tweets are loaded and dropna applied to nulls.")
print ("A local test dataframe (ltest_df) is ready for use.")
print ("The shape of ltest_df is",ltest_df.shape)


Tweets are loaded and dropna applied to nulls.
A local test dataframe (ltest_df) is ready for use.
The shape of ltest_df is (1785, 1)


# Provide the LLM with 80% of the tweets as examples for its categorization

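Provide the LLM with an indexed vector store that uses a typical 80% split as training data,
then test overall accuracy to see whether it recognizes the categorization from the training data.
Note that the cell below loads only the tweets labeled as potential suicide posts (4 of every 5 of them), not 80% of all tweets.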

In [7]:
# Now, to show the power of a vector index, each selected tweet will be loaded.
# Each tweet becomes one document, vectorized with embeddings,
# so there is no need to split the documents.

# However, the prompts are not worded to look for data in this vector store.

os.environ["OPENAI_API_KEY"] = "sk-" # placeholder: replace with your own OpenAI API key

print ("Provide the LLM with 80% (4 of 5) of the tweets as training knowledge.")
doclist = []
total_not_sui = 0
total_sui = 0
for tweetnum in range(tweet_df.shape[0]):
    if tweet_df.loc[tweetnum,'Suicide'] == 'Not Suicide post':
        total_not_sui += 1 # skip these
        #if total_not_sui % 5 > 0:
        #    thetweetcase = "Tweets like " + tweet_df.loc[tweetnum,'Tweet'] + " are to be Ignored"
        #    raw_documents = Document(page_content=thetweetcase)
        #    doclist.append(raw_documents)
    else:
        total_sui += 1
        if total_sui % 5 > 0:
            thetweetcase = "Tweets like " + tweet_df.loc[tweetnum,'Tweet'] + " are potential suicide posts to be diagnosed as Follow-up"
            raw_documents = Document(page_content=thetweetcase)
            doclist.append(raw_documents)

print("Loading",len(doclist),"tweets into the VectorstoreIndex as examples.")
index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":None}).from_documents(doclist)


Provide the LLM with 80% (4 of 5) of the tweets as training knowledge.
Loading 528 tweets into the VectorstoreIndex as examples.
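The 4-of-5 selection in the cell above can be sketched without pandas or LangChain as a small function. This is a hypothetical helper (`select_training_examples`, `POSITIVE_LABEL`), assuming the dataset's positive label is `"Potential Suicide post "` with its trailing space, as used in the scoring cells. With the 659 potential suicide posts in the dataset (541 + 118 in the counts at the top), holding out every 5th one keeps 659 - 131 = 528, matching the printed output.

```python
POSITIVE_LABEL = "Potential Suicide post "  # the dataset's label, trailing space included


def select_training_examples(tweets, labels, keep_every=5):
    """Build vector-store example sentences from 4 of every 5 potential suicide posts.

    The n-th positive tweet (counting from 1) is held out when n % keep_every == 0,
    the same counting rule as the notebook cell above.
    """
    examples = []
    n_positive = 0
    for tweet, label in zip(tweets, labels):
        if label != POSITIVE_LABEL:
            continue  # 'Not Suicide post' tweets are not loaded
        n_positive += 1
        if n_positive % keep_every > 0:
            examples.append("Tweets like " + tweet
                            + " are potential suicide posts to be diagnosed as Follow-up")
    return examples


labels = [POSITIVE_LABEL] * 659 + ["Not Suicide post"] * 1126
tweets = [f"tweet {i}" for i in range(len(labels))]
print(len(select_training_examples(tweets, labels)))  # 528
```

In the notebook this would be called as `select_training_examples(tweet_df["Tweet"], tweet_df["Suicide"])`, with each string then wrapped in a `Document`. Because the held-out fifth is never indexed, those tweets are the only positives the LLM cannot find verbatim in the vector store.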


# Ask ChatGPT to analyze batches of 4 tweets for suicide related ideation

The following prompts failed to deliver more than 82% accuracy or a Cohen's kappa above 0.59.
I reverted the prompts and tried one last, more expensive experiment: a batch size of 1, so that
I know the LLM is not mixing tweet contents.

    query = "Given tweets "+ \
        str(raw_documents) + \
        "From the perspective of the National Institute of Mental Health " + \
        "Examine these tweets and assess each if it contains any warning signs of suicide risk. " + \
        "Pay attention to the language used, any expressions of despair, depression, anxiety, feelings of being trapped or overwhelmed, or mentions of wanting to end one’s life" + \
        "Also, look for any indications of hopelessness, helplessness, hate, being tired, or worthlessness, as well as any mention of plans or preparations for self-harm. " + \
        "Take every tweet seriously." + \
        "Use a diagnosis of Follow-up with any potential things checked for above and use a diagnosis of Ignore when there is no risk. " + \
        "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
        " { tweets: [{ tweet_number : x, diagnosis: value}]}."
    query2 = "Retry tweets "+ \
        str(raw_documents) + \
        "From the perspective of the National Institute of Mental Health " + \
        "Examine these tweets and assess each if it contains any warning signs of suicide risk. " + \
        "Consider factors such as the language used, the tone of the message, and any references to self-harm, being tired, or self hate." + \
        "Also, look for any indications of depression, hopelessness, helplessness, or worthlessness, as well as any mention of plans or preparations for self-harm. " + \
        "Take every tweet seriously." + \
        "Use a diagnosis of Follow-up with any potential risk factors above and use a diagnosis of Ignore when there is no risk. " + \
        "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
        " { tweets: [{ tweet_number : x, diagnosis: value}]}."
    query3 = "Last try with tweets "+ \
        str(raw_documents) + \
        "From the perspective of the National Institute of Mental Health " + \
        "Evaluate these tweets and determine if it suggests that the individual may be at risk for suicide.  " + \
        "Pay attention to the language used, any expressions of despair, feelings of being trapped or overwhelmed, or mentions of wanting to end one’s life" + \
        "Also, look for any indications of hopelessness, helplessness, or worthlessness, as well as any mention of plans or preparations for self-harm. " + \
        "Take every tweet seriously." + \
        "Use a diagnosis of Follow-up with any potential risk factors for suicide above and use a diagnosis of Ignore when there is no risk. " + \
        "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
        " { tweets: [{ tweet_number : x, diagnosis: value}]}."
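The next cell slices `ltest_df` with inclusive `.loc` ranges and hands the LLM a dictionary string keyed by row number, which is what lets each returned `tweet_number` be written back into `tweet_df`. A minimal pure-Python sketch of that slicing follows; `make_batches` and `batch_prompt_payload` are hypothetical helpers, and unlike the notebook it derives the batch count from the row count instead of hard-coding 1784.

```python
def make_batches(n_rows, batch_size):
    """Split row numbers 0..n_rows-1 into inclusive (start, end) ranges.

    Leftover rows are folded into the final batch, like the notebook's
    "squeeze into the last batch" step.
    """
    batch_cnt = max(1, n_rows // batch_size)
    batches = []
    for b in range(batch_cnt):
        start = b * batch_size
        end = start + batch_size - 1
        if b == batch_cnt - 1:
            end = n_rows - 1  # the last batch absorbs any remainder
        batches.append((start, end))
    return batches


def batch_prompt_payload(tweets, start, end):
    """Format one batch as the dict string embedded in the prompt, e.g. "{0: 'text'}"."""
    return str({i: tweets[i] for i in range(start, end + 1)})


print(make_batches(1785, 4)[-1])                  # (1780, 1784): the last batch holds 5 tweets
print(batch_prompt_payload(["hi", "bye"], 0, 1))  # {0: 'hi', 1: 'bye'}
```

With `BATCH_SIZE = 1` every tweet becomes its own request, which costs more tokens (the instructions are repeated per tweet) but removes any chance of the LLM blending two tweets together.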


In [8]:
# Now ask ChatGPT to analyze each tweet as an independent post for whether it potentially contains suicidal wording.
# Tweets can be batched to reduce the number of tokens required, since the API is paid for by the tokens used (this run uses BATCH_SIZE = 1).
# The prompts do not ask the LLM to search for similarities, differences, or otherwise compare the tweets; index.query still
# retrieves context from the vector store built above. Inter-tweet questions are left as an exercise after the false positives
# and false negatives are reviewed.

import pandas as pd

# we will be checking for any updates that were missed - the values set on the next line are over-ridden
tweet_df['diagnosis'] = tweet_df['Suicide'] # this is a quick way to add a column, next we update with ltest_df's results

BATCH_SIZE = 1
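batch_cnt = 1784 # hard-coded for 1785 tweets with BATCH_SIZE = 1 (for larger batches use int(ltest_df.shape[0] / BATCH_SIZE)); the last batch also picks up the final tweet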
response = ""
responsedict = {}

for batch in range(batch_cnt):
    raw_documents = Document(page_content=str(ltest_df.loc[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE-1,"Tweet"].to_dict()))
    if batch == batch_cnt - 1: # the last batch also picks up the leftover tweet(s), e.g. 5 tweets when BATCH_SIZE = 4
        raw_documents = Document(page_content=str(ltest_df.loc[batch*BATCH_SIZE:(batch+1)*BATCH_SIZE,"Tweet"].to_dict()))
    print("Processing batch",batch+1,"of",batch_cnt,"batches.") #:",raw_documents)
    query = "Given tweets "+ \
        str(raw_documents) + \
        "Make a diangosis for Follow-up when a tweet " + \
        "has potential suicidal ideations, depression, hopelessness, helplessness, " + \
        "loss of interest in life, or self harm. " + \
        "Use a diagnosis of Ignore when the tweet does not suggest self harm. "+ \
        "It is better have a diagnosis as Follow-up than a diagnosis as Ignore and double check before a diagnosis for Ignore." + \
        "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
        " { tweets: [{ tweet_number : x, diagnosis: value}]}."
    query2 = "Retry tweets "+ \
        str(raw_documents) + \
        "You must use a diagnosis of Follow-up when a tweet " + \
        "has potential suicidal thoughts, depression of any kind, hopelessness, helplessness, deep loss," + \
        "wording that shows a interest in death, or suggests self harm. " + \
        "It is better to offer a diagnosis of Follow-up than a diagnosis of Ignore." + \
        "Otherwise the tweet receives a diagnosis of Ignore. "+ \
        "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
        " { tweets: [{ tweet_number : x, diagnosis: value}]}."
    query3 = "Last try with tweets "+ \
        str(raw_documents) + \
        "Be sure to make a diangosis for Follow-up when a tweet " + \
        "has potential suicidal writing, serious depression, " + \
        "or words that suggest self harm. " + \
        "If the tweet is not a potential suicide post then use a diagnosis of Ignore. "+ \
        "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
        " { tweets: [{ tweet_number : x, diagnosis: value}]}."   # Kept testing affordable
    otryno = 0
    while otryno < 7:
        otryno += 1
        tryno = 0
        if otryno > 4: # attempt at a fail safe last resort - short enough to usually be understood down to its output format
            query = "When answering do so as a python dictionary. Given tweets "+ \
                     str(raw_documents) + \
                "With preference to diagnosis for Follow-up when a tweet " + \
                "has potential suicidal thoughts, depression of any kind, " + \
                "wording that shows a interest in death, or suggests self harm. " + \
                "If the tweet appears not potentially suicidal then use a diagnosis of Ignore. "+ \
                "You must answer as a python dictionary with keys tweet_number and diagnosis in the format "+ \
                " { tweets: [{ tweet_number : x, diagnosis: value}]}." 
        while tryno < 3:
            time.sleep(3) # better to pace requests here than to be throttled by OpenAI
            if tryno == 0:
                response = index.query(query, llm=ChatOpenAI(model_name="gpt-3.5-turbo"))
            elif tryno == 1:
                response = index.query(query2, llm=ChatOpenAI(model_name="gpt-3.5-turbo"))
            else:
                response = index.query(query3, llm=ChatOpenAI(model_name="gpt-3.5-turbo"))
            try:
                responsedict = ast.literal_eval(response)
                tryno = 4 # made it - ast was able to read a dictionary.
            except:
                print("Must retry, on try",tryno+1) # the LLM was a jerk and ignored the must answer format
            tryno += 1
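        if tryno > 3:
            # guard against a reply that is not a dict or lacks a 'tweets' list; an empty list triggers a retry below
            tweetlist = responsedict.get('tweets') if isinstance(responsedict, dict) else None
            if not isinstance(tweetlist, list):
                tweetlist = []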
            if len(tweetlist) < 1:
                print("Must retry when the LLM failed to produce all answers, try=",otryno)
                time.sleep(7+otryno) # do not rush, this is a problem with the LLM
            else:
                ootryno = otryno
                otryno = 8 # yes, we will process answers and go to the next set of tweets
                resetRetry = False
                for i in range(len(tweetlist)):
                    diagdict = tweetlist[i]
                    idx = 0
                    try:
                        idx = int(diagdict['tweet_number'])
                    except:
                        otryno = ootryno
                        print("The LLM returned a dictionary that does not contain tweet_number")
                        break
                    if idx >= batch*BATCH_SIZE and idx <= batch*BATCH_SIZE+BATCH_SIZE:
                        diag = "Nogo"
                        try:
                            diag = diagdict['diagnosis']
                        except:
                            otryno = ootryno
                            print("The LLM returned a dictionary that does not contain diagnosis")
                            break
                        tweet_df.loc[idx,'diagnosis'] = diag
                        if diag != "Ignore" and diag != "Follow-up":
                            print ("The LLM gave a disqualified diagnosis of ", diag)
                            resetRetry = True
                    else:
                        print ("The LLM made an index error reporting idx=", idx)
                        resetRetry = True
                if resetRetry:
                    otryno = ootryno
    if otryno == 7:
        print("Failed to process this batch!")
print("Completed processing.")


Processing batch 1 of 1784 batches.
Processing batch 2 of 1784 batches.
Processing batch 3 of 1784 batches.
Processing batch 4 of 1784 batches.
Processing batch 5 of 1784 batches.
Processing batch 6 of 1784 batches.
Processing batch 7 of 1784 batches.
Processing batch 8 of 1784 batches.
Processing batch 9 of 1784 batches.
Processing batch 10 of 1784 batches.
Processing batch 11 of 1784 batches.
Processing batch 12 of 1784 batches.
Processing batch 13 of 1784 batches.
Processing batch 14 of 1784 batches.
Processing batch 15 of 1784 batches.
Processing batch 16 of 1784 batches.
Processing batch 17 of 1784 batches.
Processing batch 18 of 1784 batches.
Processing batch 19 of 1784 batches.
Processing batch 20 of 1784 batches.
Processing batch 21 of 1784 batches.
Processing batch 22 of 1784 batches.
Processing batch 23 of 1784 batches.
Processing batch 24 of 1784 batches.
Processing batch 25 of 1784 batches.
Processing batch 26 of 1784 batches.
Processing batch 27 of 1784 batches.
Processing
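The nested retry loop in the cell above (three prompt variants per round, up to seven rounds, with a shorter fallback prompt from round five on) follows a common pattern. Below is a minimal sketch with a hypothetical `query_with_retries` helper: the `ask` callable stands in for `index.query(...)` and `parse` for the dictionary checks. For brevity it omits the fallback prompt and the growing back-off.

```python
import time


def query_with_retries(prompts, ask, parse, max_rounds=2, pause_seconds=0.0):
    """Try each prompt in order, for up to max_rounds rounds.

    ask(prompt) -> str is the LLM call; parse(reply) -> result, or None if unusable.
    Returns the first usable result, or None if every attempt failed.
    """
    for round_no in range(1, max_rounds + 1):
        for attempt, prompt in enumerate(prompts, start=1):
            time.sleep(pause_seconds)  # pace requests rather than waiting to be throttled
            reply = ask(prompt)
            result = parse(reply)
            if result is not None:
                return result
            print(f"Must retry: round {round_no}, prompt {attempt} gave an unusable reply")
    return None


# A fake LLM that ignores the format once, then complies.
replies = iter(["Sure, here you go!", "{'tweets': []}"])
result = query_with_retries(["query", "query2", "query3"],
                            ask=lambda prompt: next(replies),
                            parse=lambda reply: reply if reply.startswith("{") else None)
print(result)
```

Returning `None` after the last round, instead of raising, mirrors the notebook's choice to print "Failed to process this batch!" and move on, leaving the ground-truth label in place so the missed-diagnosis check can catch it.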

# Report False Positives, False Negatives, and overall accuracy

In [9]:
# Make sure there are no values in the 'diagnosis' column with the original wording of the 'Suicide' column.
# If there are any, the LLM either failed or skipped a tweet.
condition = (tweet_df['diagnosis'] == 'Not Suicide post') | (tweet_df['diagnosis'] == 'Potential Suicide post ')
missed_df = tweet_df.loc[condition]

missed_cnt = missed_df.shape[0]

print("There are",missed_cnt,"cases where the LLM did not make a diagnosis (we are expecting 0).")

There are 0 cases where the LLM did not make a diagnosis (we are expecting 0).


In [10]:
# Identify the false positives: cases where the LLM suggests a Follow-up on a post and the ground truth says
# it is "Not Suicide post".
# The LLM sometimes gets confused and outputs the label in sentence case without the trailing space: 'Potential suicide post'
condition = (tweet_df['Suicide'] == 'Not Suicide post') & ((tweet_df['diagnosis'] == 'Follow-up') | (tweet_df['diagnosis'] == 'Potential suicide post'))
falsepos_df = tweet_df.loc[condition]

falsepos_cnt = falsepos_df.shape[0]

print("There are",falsepos_cnt,"false positives where a follow-up was recommended but the ground truth says it is not a suicide post")

for row_idx, row in falsepos_df.iterrows(): # row_idx avoids overwriting the vector store `index` defined earlier
    print("Tweet says: ",row['Tweet'])

There are 58 false positives where a follow-up was recommended but the ground truth says it is not a suicide post
Tweet says:  @jrkgirlnla Oh yes, I was reading that one
Tweet says:  i'm tired,
Tweet says:  @mikeyway http://twitpic.com/2tu6p - take care guys..!!
Tweet says:  @jackhii n i got yr pic posing in the server room ..
Tweet says:  Going to school and enjoying my last day as a 16 year old but too
Tweet says:  @JonathanRKnight
Tweet says:  @geuphers, you will fully bloom after all the hardships. i love u ):
Tweet says:  1300 words...
Tweet says:  RT @0mysky: If you find someone that makes you happy, enjoy it... life is not fair nor give many opportunities... https://t.co/dCBwDeQVH8
Tweet says:  phone againn
Tweet says:  @iwrotethis: Nope, you just caught me, cheers, I'm sure I will. Hope the sunshine holds out for you
Tweet says:  today i almost kill myself because I almost hit by a car, thank lord.
Tweet says:  @mizphenomenal god wrote that i just typed it
Tweet says:  @annaarc
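Because the LLM sometimes answers with a variant label such as "Potential suicide post", the scoring cells have to list those variants explicitly. A hedged alternative is to normalize each label once. `normalize_diagnosis` is a hypothetical helper; treating "not suicide post" as Ignore is an assumption by symmetry, not something observed in this run.

```python
def normalize_diagnosis(raw):
    """Map an LLM label onto 'Follow-up' or 'Ignore'; return None if unrecognized."""
    # Collapse case, hyphens, and stray whitespace: "Follow-up " -> "follow up"
    key = " ".join(raw.replace("-", " ").split()).casefold()
    if key in {"follow up", "potential suicide post"}:
        return "Follow-up"
    if key in {"ignore", "not suicide post"}:  # assumption: symmetric variant
        return "Ignore"
    return None


print(normalize_diagnosis("Potential suicide post"))  # Follow-up
print(normalize_diagnosis(" ignore"))                 # Ignore
print(normalize_diagnosis("Maybe"))                   # None
```

It should be applied to the LLM's reply before writing into `tweet_df`, not to the `diagnosis` column afterwards, since that column starts out holding the ground-truth labels and normalizing them would hide skipped tweets from the missed-diagnosis check.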

In [11]:
condition = (tweet_df['Suicide'] == 'Potential Suicide post ') & (tweet_df['diagnosis'] == 'Ignore')
falseneg_df = tweet_df.loc[condition]

falseneg_cnt = falseneg_df.shape[0]

print("There are",falseneg_cnt,"false negatives where ignoring was recommended but the ground truth says it is a potential suicide post")

for row_idx, row in falseneg_df.iterrows(): # row_idx avoids overwriting the vector store `index` defined earlier
    print("Tweet says: ",row['Tweet'])

There are 118 false negatives where ignoring was recommended but the ground truth says it is a potential suicide post
Tweet says:  @dizzyhrvy that crap took me forever to put together. i’m going to go sleep for DAYS
Tweet says:  I have an awful habit of avoiding writing papers by watching Instagram live videos of the kids I used to nanny for… https://t.co/NpfZu06gwy
Tweet says:  RT @tamicakeyona: Being a single mommy was never part of the plan but I wake up everyday and do my shit💪🏾
Tweet says:  I damn near hate smoking by myself but sometimes I like it cause I can’t finish blunts and I be having some for later.
Tweet says:  @EmisonNaomily but still,i highkey want him to be dead. He f killed peach and beck ugh.Candice is back so imma just… https://t.co/5pBYOKbtRc
Tweet says:  @FrankDiElsi1 No I want him to get indicted & watch the kids go to jail - I don’t want him to die because the Trump… https://t.co/BPMdXzujoo
Tweet says:  @EricBoehlert Nothing to live for? Wh

In [12]:
# a: ground truth and LLM both say potential suicide post
condition = (tweet_df['Suicide'] == 'Potential Suicide post ') & (tweet_df['diagnosis'] == 'Follow-up')
bothagree_psp_df = tweet_df.loc[condition]
bothagree_psp_cnt = bothagree_psp_df.shape[0]
print ("Both agree on potential suicide post = ", bothagree_psp_cnt, "times.")

# d: ground truth and LLM both say not a suicide post
condition = (tweet_df['Suicide'] == 'Not Suicide post') & (tweet_df['diagnosis'] == 'Ignore')
bothagree_npsp_df = tweet_df.loc[condition]
bothagree_npsp_cnt = bothagree_npsp_df.shape[0]
print ("Both agree on not suicide post = ", bothagree_npsp_cnt, "times.")

# b: ground truth says potential suicide post, LLM says Ignore
condition = (tweet_df['Suicide'] == 'Potential Suicide post ') & (tweet_df['diagnosis'] == 'Ignore')
psp_vs_ignore_df = tweet_df.loc[condition]
psp_vs_ignore_cnt = psp_vs_ignore_df.shape[0]
print ("Ground truth says potential suicide post and LLM disagrees = ", psp_vs_ignore_cnt, "times.")

# c: ground truth says not a suicide post, LLM says Follow-up
condition = (tweet_df['Suicide'] == 'Not Suicide post') & (tweet_df['diagnosis'] == 'Follow-up')
npsp_vs_followup_df = tweet_df.loc[condition]
npsp_vs_followup_cnt = npsp_vs_followup_df.shape[0]
print ("Ground truth says not a suicide post and LLM disagrees = ", npsp_vs_followup_cnt, "times.")

observed_agreement = (bothagree_psp_cnt + bothagree_npsp_cnt) / \
                (bothagree_psp_cnt + bothagree_npsp_cnt + psp_vs_ignore_cnt + npsp_vs_followup_cnt)

print ("Calculated Observed Agreement =", observed_agreement)

chance_agreement = \
    ((bothagree_psp_cnt + psp_vs_ignore_cnt) * (bothagree_psp_cnt + npsp_vs_followup_cnt) + \
     (npsp_vs_followup_cnt + bothagree_npsp_cnt) * (bothagree_npsp_cnt + psp_vs_ignore_cnt)) / \
    ((bothagree_psp_cnt + bothagree_npsp_cnt + psp_vs_ignore_cnt + npsp_vs_followup_cnt) * \
     (bothagree_psp_cnt + bothagree_npsp_cnt + psp_vs_ignore_cnt + npsp_vs_followup_cnt))

print ("Calculated Chance Agreement =", chance_agreement)

kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)

print ("Cohen's Kappa=", kappa)


Both agree on potential suicide post =  541 times.
Both agree on not suicide post =  1068 times.
Ground truth says potential suicide post and LLM disagrees =  118 times.
Ground truth says not a suicide post and LLM disagrees =  58 times.
Calculated Observed Agreement = 0.9014005602240897
Calculated Chance Agreement = 0.5430178345848143
Cohen's Kappa= 0.7842378822676176
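The same calculation can be packaged as a small reusable function over the four cells of the 2x2 agreement table. `cohens_kappa_2x2` is a hypothetical name; the formulas are the ones used in the cell above, with a to d meaning the same four counts.

```python
def cohens_kappa_2x2(a, b, c, d):
    """Cohen's kappa for two raters and two labels.

    a: both say potential suicide post    b: ground truth yes, LLM no
    c: ground truth no, LLM yes           d: both say not a suicide post
    Returns (observed_agreement, chance_agreement, kappa).
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement: product of the two raters' marginal rates, summed over both labels
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


print(cohens_kappa_2x2(541, 118, 58, 1068))  # this run
print(cohens_kappa_2x2(459, 200, 26, 1100))  # the earlier run, kappa about 0.712
```

If scikit-learn is available, `sklearn.metrics.cohen_kappa_score` applied to the two label columns (after mapping both onto the same label names) should give the same value, which is a useful cross-check on the hand-written formula.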


In [13]:
# Let's see what the accuracy relative to ground truth looks like
totalerr_cnt = falsepos_cnt + falseneg_cnt
print ("There were a total of",totalerr_cnt,"errors made relative to ground truth.")
print ("Accuracy is at",(1-totalerr_cnt/tweet_df.shape[0])*100.0,"%")

There were a total of 176 errors made relative to ground truth.
Accuracy is at 90.14005602240897 %


# Load a VectorstoreIndex with all the tweets to ask inter-tweet questions

In [14]:
# Now, to show the power of a vector index, each tweet will be loaded.
# The tweets will be one document each vectorized and with embeddings
# There is no need to split the documents

print ("Build a list of documents, one per Tweet...")
doclist = []
for tweetnum in range(tweet_df.shape[0]):
    raw_documents = Document(page_content=str(tweet_df.iloc[tweetnum].to_dict()))
    doclist.append(raw_documents)

print("Working on loading tweets into the VectorstoreIndex using default settings for Chroma.")
index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":None}).from_documents(doclist)


Build a list of documents, one per Tweet...
Working on loading tweets into the VectorstoreIndex using default settings for Chroma.
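Behind `index.query`, the vector store keeps one embedding vector per tweet document and retrieves the documents whose vectors are closest to the embedded question, which are then passed to the LLM as context. Because each document here is the full row dictionary, the retrieved context also carries the `Suicide` and `diagnosis` labels, which is why the answers below can quote them. The toy sketch that follows shows the idea with a brute-force cosine ranking over hand-made 3-dimensional vectors standing in for OpenAI embeddings; Chroma's actual distance metric and its approximate HNSW search may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k_similar(query_vec, doc_vecs, k=2):
    """Return indices of the k documents whose vectors are most similar to query_vec."""
    scores = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Hypothetical embeddings: dimension 0 loosely means "tired / wants to sleep forever".
docs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.8, 0.2, 0.1]]
query = [1.0, 0.0, 0.0]
print(top_k_similar(query, docs, k=2))  # [0, 2]
```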


## Show a few (3) questions and the answers given

In [20]:
query = "From the tweets you know show tweets most like: I am tired and going to sleep forever."
print("Asking: ", query)
response = index.query(query, llm=ChatOpenAI(model_name="gpt-3.5-turbo"))
print("Response: ",response)

Asking:  From the tweets you know show tweets most like: I am tired and going to sleep forever.
Response:  Tweets like "i wanna go to sleep forever and never wake up im so tired" are most similar to the statement "I am tired and going to sleep forever."


In [16]:
query = "Are there tweets being analyzed that are different than one like this: I am in love, and so happy. Give a few examples."
print("Asking: ", query)
response = index.query(query, llm=ChatOpenAI(model_name="gpt-3.5-turbo"))
print("Response: ",response)

Asking:  Are there tweets being analyzed that are different than one like this: I am in love, and so happy. Give a few examples.
Response:  Yes, there are tweets being analyzed that are different from the example you provided. Here are a few examples:

1. "A lot of my tweets are about being lost & miserable but it's not always all it's cracked out to be lol my life is a..." (Potential Suicide post)
2. "please somebody come fucking kill me im so in LOVE!!!" (Potential Suicide post)
3. "If you find someone that makes you happy, enjoy it... life is not fair nor give many opportunities..." (Not Suicide post)
4. "I've truly never been happier with my life :))) I'm rid of all the toxic people who brought me down :))) and those..." (Potential Suicide post)

Please note that the diagnosis of each tweet is provided based on the given context.


In [17]:
query = "Categorize the tweets according to emotional states and give an example of each category."
print("Asking: ", query)
response = index.query(query, llm=ChatOpenAI(model_name="gpt-3.5-turbo"))
print("Response: ",response)

Asking:  Categorize the tweets according to emotional states and give an example of each category.
Response:  Based on the provided context, the tweets can be categorized into the following emotional states:

1. Feeling overwhelmed: "almost broke down at class just because the most negative thoughts are kicking in and I'm mf tired of living hahahahahahahah"
2. Feeling betrayed and hurt: "I was drinking and crying my night away as usual to find out my ex is fucking a girl in his bed so I tried to kill…"
3. Feeling hopeless: "Tired of living like this"
4. Feeling lonely: "I feel so lonely"

Example of each category:
1. Feeling overwhelmed: The tweet mentions negative thoughts and being tired of living, indicating a sense of being overwhelmed by emotions and thoughts.
2. Feeling betrayed and hurt: The tweet describes the person's reaction to finding out about their ex-partner being with someone else, resulting in intense emotions and even a suicide attempt.
3. Feeling hopeless: The twee