EXPERIMENT EXPLANATION

CONTEXT: We use a Large Language Model (LLM) in a question-answering solution. To do this, we employ a RAG-like (retrieval-augmented generation) pipeline:
1. Build a corpus of Context files;
2. Find the Context that best matches the user's question;
3. Use an LLM to extract an Answer from the Context.
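The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the retriever is a toy bag-of-words cosine similarity (not necessarily what the real solution uses), the corpus and file names are hypothetical, and step 3 (the LLM call) is left out.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def best_context(question, corpus):
    """Step 2: return the name of the context file most similar to the question."""
    q = bag_of_words(question)
    return max(corpus, key=lambda name: cosine(q, bag_of_words(corpus[name])))


# Step 1: a hypothetical corpus of context files (file name -> text)
corpus = {
    "training.txt": "The training with the coach lasts an hour.",
    "grades.txt": "Semester grades are posted at the end of the term.",
}
print(best_context("How long is the training?", corpus))  # training.txt
```

Step 3 would then pass the question together with the selected context to the LLM.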

For analysis purposes, we take the answer and the context and ask the LLM: does the context contain the answer or not?
This serves as a surrogate for human review of the answers. As a result, we obtain a classification of the answers: True Positive, True Negative, etc.
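The notebook does not spell out how the judge's verdict becomes a confusion-matrix label. One plausible convention, purely an assumption here, crosses two yes/no facts: did the pipeline return a real answer (as opposed to a refusal such as "I don't know."), and does the judge say the context contains the answer. The function name is hypothetical.

```python
def confusion_label(system_answered, judge_says_in_context):
    """Map one reviewed answer to a confusion-matrix label (hypothetical convention).

    system_answered: True if the pipeline returned an answer rather than a refusal.
    judge_says_in_context: the LLM judge's verdict on whether the context
        contains the answer to the question.
    """
    if system_answered:
        return "True Positive" if judge_says_in_context else "False Positive"
    return "False Negative" if judge_says_in_context else "True Negative"


print(confusion_label(True, True))    # True Positive
print(confusion_label(False, True))   # False Negative
```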

Manual ("eyeball") review shows that in some cases the LLM classification is unstable: if we run the classification several times on the same answer and context, we sometimes get a different result on each run. We call this 'ambiguity', and a context that triggers it an 'ambiguous context'.
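The definition of ambiguity above can be expressed as a small check: classify the same pair several times and flag it when the labels disagree. A minimal sketch, assuming `classify` is a hypothetical callable that wraps the LLM judge and returns a label string.

```python
def is_ambiguous(classify, answer, context, runs=5):
    """Classify the same (answer, context) pair `runs` times.

    `classify` is a hypothetical callable wrapping the LLM judge and returning a
    label such as "True Positive". The pair is ambiguous when the runs disagree.
    Returns (ambiguous, set_of_observed_labels).
    """
    labels = {classify(answer, context) for _ in range(runs)}
    return len(labels) > 1, labels


# Usage with a stub judge that flips its verdict between runs
verdicts = iter(["True Positive", "True Negative", "True Positive"])
print(is_ambiguous(lambda a, c: next(verdicts), "an hour", "The training is an hour.", runs=3))
```

In practice `runs` trades LLM cost against the chance of catching a rare flip.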

It's a strange situation. We can brainstorm many hypotheses to explain it. But data comes first!

GOAL: a set of ambiguous context files

NOTE: The code can be improved and made shorter.

**STEP 1: Calculate F1 Score**


How does this ambiguity impact the F1 score?
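One way to quantify the impact, sketched here under assumptions: take per-answer counts shaped like the STEP 2 query output (answer, tp, tn, fp, fn), flag answers that received at least two different labels, and compare F1 over all answers with F1 over the stable answers only. The helper uses the form F1 = 2TP / (2TP + FP + FN), which matches 2PR / (P + R) whenever TP > 0 and never divides by zero. Unlike the STEP 2 filter, this sketch checks all four counts, including FN. Function names and the sample rows are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def f1_with_and_without_ambiguous(rows):
    """rows: (answer, tp, tn, fp, fn) per answer.

    An answer is ambiguous when at least two of its four counts are non-zero.
    Returns (F1 over all rows, F1 over non-ambiguous rows only).
    """
    def total(subset):
        tp = sum(r[1] for r in subset)
        fp = sum(r[3] for r in subset)
        fn = sum(r[4] for r in subset)
        return f1(tp, fp, fn)

    stable = [r for r in rows if sum(1 for n in r[1:] if n > 0) < 2]
    return total(rows), total(stable)


# Illustrative rows, not real data: "b" got two different labels
sample = [("a", 2, 0, 0, 0), ("b", 1, 0, 1, 0), ("c", 0, 0, 0, 1)]
print(f1_with_and_without_ambiguous(sample))  # (0.75, 0.8)
```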

In [None]:
import os

import psycopg2
from psycopg2 import Error

connection = None
cursor = None
try:
    # Connect to the PostgreSQL database (credentials come from environment variables)
    connection = psycopg2.connect(
        dbname=os.environ.get('DB_NAME'),
        user=os.environ.get('DB_USER'),
        password=os.environ.get('DB_PASSWORD'),
        host=os.environ.get('DB_HOST'),
        port=os.environ.get('DB_PORT')
    )

    cursor = connection.cursor()

    # Count answers per classification label (rows without an email, created after 2024-02-01)
    sql_query = """
                    SELECT
                        binary_classification, count(binary_classification) 
                    FROM chat_accuracy WHERE (email IS NULL OR email = '') 
                        AND cast(created_at as date) > '2024-02-01' 
                    GROUP BY binary_classification;
                """

    # Execute the SQL query and fetch all rows from the result set
    cursor.execute(sql_query)
    rows = cursor.fetchall()

    # Print each (label, count) row
    for row in rows:
        print(row)

except (Exception, Error) as error:
    print("Error while connecting to PostgreSQL:", error)

finally:
    # Close the cursor and connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None:
        connection.close()
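As the NOTE says, the code can be made shorter: every cell repeats the same connect/execute/close boilerplate. A minimal sketch of a reusable helper, assuming the same DB_* environment variables; `run_query` and `postgres_connect` are hypothetical names. `contextlib.closing` is used because psycopg2's own `with connection:` block only ends the transaction and does not close the connection.

```python
import contextlib
import os


def postgres_connect():
    """Open a psycopg2 connection from the DB_* environment variables used in this notebook."""
    import psycopg2  # imported here so the helper can be exercised without PostgreSQL
    return psycopg2.connect(
        dbname=os.environ.get('DB_NAME'),
        user=os.environ.get('DB_USER'),
        password=os.environ.get('DB_PASSWORD'),
        host=os.environ.get('DB_HOST'),
        port=os.environ.get('DB_PORT'),
    )


def run_query(sql, params=None, connect=postgres_connect):
    """Run one query and return all rows; the cursor and connection are always closed."""
    with contextlib.closing(connect()) as connection:
        with contextlib.closing(connection.cursor()) as cursor:
            if params is None:
                cursor.execute(sql)
            else:
                cursor.execute(sql, params)
            return cursor.fetchall()


# Usage (requires the database): rows = run_query(sql_query)
```

The `connect` parameter exists only so the helper can be pointed at another DB-API driver, for example an in-memory `sqlite3` database when testing.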

In [None]:
# Map each classification label to its count; labels absent from the query result default to 0
counts = dict(rows)
true_positives = counts.get('True Positive', 0)
false_positives = counts.get('False Positive', 0)
false_negatives = counts.get('False Negative', 0)

print(true_positives, false_positives, false_negatives)

In [None]:
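# Calculate precision (0.0 if nothing was predicted positive)
predicted_positive = true_positives + false_positives
precision = true_positives / predicted_positive if predicted_positive else 0.0

# Calculate recall (0.0 if there are no actual positives)
actual_positive = true_positives + false_negatives
recall = true_positives / actual_positive if actual_positive else 0.0

# Calculate F1 score, the harmonic mean of precision and recall
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) else 0.0
print(f1_score)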

**STEP 2: Get ambiguous context files**


In [1]:
import os

import psycopg2
from psycopg2 import Error

connection = None
cursor = None
try:
    # Connect to the PostgreSQL database
    connection = psycopg2.connect(
        dbname=os.environ.get('DB_NAME'),
        user=os.environ.get('DB_USER'),
        password=os.environ.get('DB_PASSWORD'),
        host=os.environ.get('DB_HOST'),
        port=os.environ.get('DB_PORT')
    )

    cursor = connection.cursor()

    # For every distinct answer, count how often it received each label.
    # Refusal answers are excluded; '' escapes a single quote inside an SQL string.
    sql_query = """SELECT 
                        Answer,
                        SUM(CASE WHEN binary_classification = 'True Positive' THEN 1 ELSE 0 END) AS tp_count,
                        SUM(CASE WHEN binary_classification = 'True Negative' THEN 1 ELSE 0 END) AS tn_count,
                        SUM(CASE WHEN binary_classification = 'False Positive' THEN 1 ELSE 0 END) AS fp_count,
                        SUM(CASE WHEN binary_classification = 'False Negative' THEN 1 ELSE 0 END) AS fn_count
                    FROM 
                        chat_accuracy
                    WHERE
                        (email IS NULL OR email = '')
                        AND answer NOT IN (
                            'Sorry, we don''t have an answer to your question.',
                            'Sorry, I cannot provide an answer as there is no context provided for me to base my response on. Please provide more information or context for me to assist you better.',
                            'Answer: Sorry, I cannot provide an answer as there is no context answer or question provided. Please provide more information so I can assist you better.',
                            'I don''t know.'
                    )
                    GROUP BY 
                        Answer;
                """

    # Execute the SQL query and fetch all rows from the result set
    cursor.execute(sql_query)
    rows = cursor.fetchall()

    # Print each (answer, tp, tn, fp, fn) row
    for row in rows:
        print(row)

except (Exception, Error) as error:
    print("Error while connecting to PostgreSQL:", error)

finally:
    # Close the cursor and connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None:
        connection.close()

('Star OTP GSS report.', 2, 0, 0, 0)
("Depth of Knowledge is different from Bloom's Taxonomy as it focuses on the complexity of thinking required to answer a question, rather than categorizing learning objectives into different levels.", 1, 0, 0, 0)
('The agenda for the webinar includes an overview of new features for back to school, presented by Sheila Montreat and Vicky Ross.', 2, 0, 0, 0)
('The training with Hayley Bradley is an hour.', 1, 2, 0, 0)
("Hello! How can I assist you today? If you have any questions or need help with the files you've uploaded, feel free to ask!", 0, 2, 0, 0)
('Context answer: an hour', 1, 0, 0, 0)
("Star CBM, or curriculum-based measurement, is a series of short 60 to 92-second assessments designed to help the teacher accurately assess students' readings so that instruction and intervention can be better targeted to the learner's specific needs. Star CBM is a dyslexia screener that helps ensure compliance with RSA laws. It is intended to identify risks fo

In [2]:
# Keep answers that received at least two different labels (ambiguous answers).
# Only the TP, TN and FP counts (tup[1:4]) are checked; the FN count (tup[4]) is not.
result = [tup for tup in rows if sum(1 for element in tup[1:4] if element > 0) >= 2]

print('Count of answers:', len(rows))
print('Ambiguous answers:', len(result))
print('Ratio:', len(result) / len(rows))

print(result)


Count of answers: 767
Ambiguous answers: 19
Ratio: 0.024771838331160364
[('The training with Hayley Bradley is an hour.', 1, 2, 0, 0), ('While the document specifically references using a multisensory teaching approach, it does not provide explicit examples of this approach in practice. The document highlights the inclusion of visual, auditory, kinesthetic, and tactile elements as part of the multisensory teaching approach recommended for students with dyslexia, but it stops short of giving specific classroom activities or exercises that embody these principles. Generally, such an approach could include activities like using flashcards for visual learning, reciting information aloud for auditory reinforcement, incorporating movement or gestures to explain concepts for kinesthetic learning, and engaging in hands-on activities like tracing or building for tactile learning. However, for detailed examples tailored to specific subjects or skills, one might look to educational resources or teac

In [3]:
# Keep only the answer text of each ambiguous row (despite the name, these are answers)
ambiguity_questions = [tup[0] for tup in result]
print(ambiguity_questions)


['The training with Hayley Bradley is an hour.', 'While the document specifically references using a multisensory teaching approach, it does not provide explicit examples of this approach in practice. The document highlights the inclusion of visual, auditory, kinesthetic, and tactile elements as part of the multisensory teaching approach recommended for students with dyslexia, but it stops short of giving specific classroom activities or exercises that embody these principles. Generally, such an approach could include activities like using flashcards for visual learning, reciting information aloud for auditory reinforcement, incorporating movement or gestures to explain concepts for kinesthetic learning, and engaging in hands-on activities like tracing or building for tactile learning. However, for detailed examples tailored to specific subjects or skills, one might look to educational resources or teaching guides focused on multisensory instructional strategies.', 'Semester 1 grades a

In [4]:
import os

import psycopg2

connection = None
cursor = None
try:
    # Create a connection to the PostgreSQL database
    connection = psycopg2.connect(
        dbname=os.environ.get('DB_NAME'),
        user=os.environ.get('DB_USER'),
        password=os.environ.get('DB_PASSWORD'),
        host=os.environ.get('DB_HOST'),
        port=os.environ.get('DB_PORT')
    )

    print("Connected to the database successfully!")

    cursor = connection.cursor()

    # Build "... IN (%s, %s, ...)" with one placeholder per ambiguous answer;
    # psycopg2 fills the placeholders safely when the query is executed
    query = "SELECT ID FROM chat_accuracy WHERE answer IN ({})".format(', '.join(['%s'] * len(ambiguity_questions)))

    cursor.execute(query, ambiguity_questions)

    # Fetch the IDs of every chat_accuracy row whose answer is ambiguous
    result = cursor.fetchall()
    print(result)

except psycopg2.Error as e:
    print("PostgreSQL error:", e)
finally:
    # Close the cursor and connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None:
        connection.close()


Connected to the database successfully!
[(32,), (33,), (53,), (55,), (56,), (81,), (87,), (107,), (118,), (180,), (181,), (253,), (432,), (444,), (453,), (477,), (533,), (696,), (697,), (1540,), (1559,), (1573,), (1579,), (1705,), (1706,), (1867,), (1868,), (1869,), (1870,), (1945,), (1946,), (1995,), (1996,), (1997,), (1998,), (2046,), (2047,), (2048,), (2050,), (2051,), (2053,), (2054,), (2055,), (2056,), (2057,), (2058,), (2060,), (2061,), (2062,), (2063,), (2064,), (2097,), (2102,), (2103,), (2104,), (2175,), (2176,), (2197,), (2198,), (2218,), (2225,), (2228,), (2255,)]
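psycopg2 adapts a Python list to a PostgreSQL array, so the dynamically built `IN (%s, %s, ...)` list could instead be a single placeholder with `= ANY(%s)`; per the psycopg2 documentation this also works for an empty list, where `IN ()` would be an SQL syntax error. A hedged sketch; `ids_for_answers_query` is a hypothetical helper name.

```python
def ids_for_answers_query(answers):
    """Return (sql, params) selecting the chat_accuracy IDs whose answer is in `answers`.

    psycopg2 passes the Python list as a PostgreSQL array, so one %s placeholder
    replaces a variable-length IN (...) list.
    """
    sql = "SELECT id FROM chat_accuracy WHERE answer = ANY(%s)"
    return sql, (list(answers),)


# Usage inside the cell above, with an open cursor:
#     cursor.execute(*ids_for_answers_query(ambiguity_questions))
print(ids_for_answers_query(["an hour", "I don't know."]))
```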


In [5]:
# Flatten the single-column result into a plain list of IDs
ambiguity_questions_id = [tup[0] for tup in result]
print(ambiguity_questions_id)

[32, 33, 53, 55, 56, 81, 87, 107, 118, 180, 181, 253, 432, 444, 453, 477, 533, 696, 697, 1540, 1559, 1573, 1579, 1705, 1706, 1867, 1868, 1869, 1870, 1945, 1946, 1995, 1996, 1997, 1998, 2046, 2047, 2048, 2050, 2051, 2053, 2054, 2055, 2056, 2057, 2058, 2060, 2061, 2062, 2063, 2064, 2097, 2102, 2103, 2104, 2175, 2176, 2197, 2198, 2218, 2225, 2228, 2255]


In [6]:
import os

import psycopg2

connection = None
cursor = None
try:
    # Create a connection to the PostgreSQL database
    connection = psycopg2.connect(
        dbname=os.environ.get('DB_NAME'),
        user=os.environ.get('DB_USER'),
        password=os.environ.get('DB_PASSWORD'),
        host=os.environ.get('DB_HOST'),
        port=os.environ.get('DB_PORT')
    )

    print("Connected to the database successfully!")

    cursor = connection.cursor()

    # Join each ambiguous chat_accuracy row to its message and the learning-object text.
    # messages.Metadata is a JSON array; #>> '{0,lo_item_id}' reads lo_item_id from its
    # first element as text. Braces are doubled only because of str.format().
    query ="""
        SELECT 
            chat_accuracy.id as "chat accuracy id", 
            lo_text.course_id,
            lo_text.lo_item_id,
            lo_text.lo_item_text
        FROM public.chat_accuracy
            INNER JOIN messages ON chat_accuracy.id = messages.id
            LEFT OUTER JOIN public.lo_text
                ON (messages.Metadata::jsonb #>> '{{0,lo_item_id}}')::int = lo_text.lo_item_id
                AND (messages.Metadata::jsonb #>> '{{0,id_course}}')::int = lo_text.course_id
        WHERE chat_accuracy.id in ({})
    """.format(', '.join(['%s'] * len(ambiguity_questions_id)))

    # Execute the query with the list of ambiguous chat_accuracy IDs
    cursor.execute(query, ambiguity_questions_id)

    # Fetch (chat accuracy id, course_id, lo_item_id, lo_item_text) rows
    result_lo_text = cursor.fetchall()
    print(result_lo_text)

except psycopg2.Error as e:
    print("PostgreSQL error:", e)
finally:
    # Close the cursor and connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None:
        connection.close()


Connected to the database successfully!
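The join above reads two keys out of messages.Metadata. Assuming, as the JSON paths imply, that Metadata is a JSON array whose first element holds `lo_item_id` and `id_course`, a rough Python equivalent of `(Metadata::jsonb #>> '{0,lo_item_id}')::int` is shown below; it can help spot-check a single message row. The function name and sample payload are hypothetical.

```python
import json


def lo_keys_from_metadata(metadata):
    """Mirror of the two JSON paths used in the join.

    Returns (course_id, lo_item_id) as ints, with None wherever #>> would give NULL
    (empty array, missing key, or JSON null).
    """
    items = json.loads(metadata)
    first = items[0] if isinstance(items, list) and items else None

    def as_int(key):
        value = first.get(key) if isinstance(first, dict) else None
        # #>> returns text; the SQL ::int cast then turns it into an integer
        return int(value) if value is not None else None

    return as_int("id_course"), as_int("lo_item_id")


print(lo_keys_from_metadata('[{"lo_item_id": "42", "id_course": 7}]'))  # (7, 42)
```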


### Get ambiguous texts, lo_item and course IDs

In [None]:
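# Deduplicate contexts by (course_id, lo_item_id); rows without a matching lo_text
# (NULLs from the LEFT OUTER JOIN) are skipped
unique_txt = []
seen_keys = set()
for _, course_id, lo_item_id, lo_item_text in result_lo_text:
    if course_id is None or lo_item_id is None:
        continue
    key = (course_id, lo_item_id)
    if key not in seen_keys:
        seen_keys.add(key)
        unique_txt.append((course_id, lo_item_id, lo_item_text))

print(unique_txt)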

### Save result as .txt files

In [None]:
for tup in unique_txt:
    file_name = "{}_{}.txt".format(tup[0], tup[1])
    file_content = str(tup[2])
    with open(file_name, "w", encoding="utf-8") as file:
        file.write(file_content)
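A variant of the cell above, assuming it is preferable to collect the context files in their own folder instead of the working directory; the default folder name `ambiguous_contexts` and the function name are hypothetical choices.

```python
from pathlib import Path


def save_contexts(unique_txt, out_dir="ambiguous_contexts"):
    """Write each (course_id, lo_item_id, text) tuple to <out_dir>/<course_id>_<lo_item_id>.txt."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    paths = []
    for course_id, lo_item_id, text in unique_txt:
        path = folder / f"{course_id}_{lo_item_id}.txt"
        path.write_text(str(text), encoding="utf-8")
        paths.append(path)
    return paths


# Usage: save_contexts(unique_txt)
```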