<h1 style="text-align: center">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY MicroProject</div>
<span style="">AI versus Human: AI Response Analysis</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/microproject/ai-vs-human-response-analysis/">https://discovery.cs.illinois.edu/microproject/ai-vs-human-response-analysis/</a></div>
</h1>

<hr style="color: #DD3403;">

## Data Source: Hugging Face

[Hugging Face](https://huggingface.co/) is a company that has developed a platform for natural language processing (NLP) applications. It has created and shared a large collection of pre-trained models, datasets, and learning resources that are open source and publicly available.

Hello-SimpleAI is a small group of researchers (PhD students and engineers) who released the [HC3 dataset](https://arxiv.org/abs/2301.07597), which pairs human and AI (ChatGPT) responses to the same questions.

Hugging Face also provides a public [API](https://huggingface.co/datasets/Hello-SimpleAI/HC3/viewer/all/train) for this dataset, so you can download the data with the `requests` library.
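As a rough sketch of what that API access could look like, the code below queries the Hugging Face datasets-server `rows` endpoint (the same endpoint used to build the CSV for this MicroProject) and flattens the response into a DataFrame. The JSON layout it expects (a top-level `rows` list whose items hold a `row` dict with `question`, `human_answers`, and `chatgpt_answers` fields) is an assumption about the HC3 schema, and `fetch_hc3_rows` and `rows_to_dataframe` are hypothetical helper names. It also keeps only the first answer in each list, so the cleaning that produced the provided CSV may differ.

```python
import pandas as pd

# Endpoint taken from the dataset URL used for this MicroProject.
ROWS_URL = "https://datasets-server.huggingface.co/rows"


def rows_to_dataframe(payload):
    """Flatten a datasets-server JSON payload into question/answer columns.

    ASSUMPTION: each item in payload["rows"] has a "row" dict with "question",
    "human_answers" (list), and "chatgpt_answers" (list) keys.
    """
    records = []
    for item in payload["rows"]:
        row = item["row"]
        # Fall back to [None] when an answer list is missing or empty.
        human = row.get("human_answers") or [None]
        chatgpt = row.get("chatgpt_answers") or [None]
        records.append({
            "question": row["question"],
            "human_answer": human[0],      # keep only the first human answer
            "chatgpt_answer": chatgpt[0],  # keep only the first ChatGPT answer
        })
    return pd.DataFrame(records)


def fetch_hc3_rows(offset=0, length=100):
    """Hypothetical helper: download one page of HC3 rows with requests."""
    import requests  # third-party library: python3 -m pip install requests

    params = {
        "dataset": "Hello-SimpleAI/HC3",
        "config": "all",
        "split": "train",
        "offset": offset,
        "length": length,
    }
    response = requests.get(ROWS_URL, params=params, timeout=30)
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return rows_to_dataframe(response.json())
```

Calling `fetch_hc3_rows(offset=0, length=100)` should request the same 100 rows as the URL in the next section, assuming the endpoint still serves this dataset in that format.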

### Loading the Dataset

For this MicroProject, we have provided the first 100 samples of the Hello-SimpleAI HC3 dataset, gathered from the following URL: https://datasets-server.huggingface.co/rows?dataset=Hello-SimpleAI%2FHC3&config=all&split=train&offset=0&length=100

This data has been cleaned up and provided to you as `huggingface-hello-sampleAI-100.csv`. It contains 100 questions (`question`), each with a human answer (`human_answer`) and a ChatGPT answer (`chatgpt_answer`).

Load this dataset as a DataFrame `df`:

In [397]:
import pandas as pd
df = pd.read_csv("huggingface-hello-sampleAI-100.csv")
df

Unnamed: 0.1,Unnamed: 0,question,human_answer,chatgpt_answer
0,0,"Why is every book I hear about a "" NY Times # ...","Basically there are many categories of "" Best ...",There are many different best seller lists tha...
1,1,"If salt is so bad for cars , why do we use it ...",salt is good for not dying in car crashes and ...,Salt is used on roads to help melt ice and sno...
2,2,Why do we still have SD TV channels when HD lo...,The way it works is that old TV stations got a...,There are a few reasons why we still have SD (...
3,3,Why has nobody assassinated Kim Jong - un He i...,You ca n't just go around assassinating the le...,It is generally not acceptable or ethical to a...
4,4,How was airplane technology able to advance so...,Wanting to kill the shit out of Germans drives...,After the Wright Brothers made the first power...
...,...,...,...,...
95,95,How come the Constitution of the US is relevan...,The Constitution is the highest law of the lan...,The Constitution of the United States is the s...
96,96,Why does Valve have Greenlight would n't it ma...,"Now , There are 2 problems with steam , and bo...","Valve, the company that developed the Steam pl..."
97,97,why do people so commonly say SO instead of bo...,"SO covers boyfriends , girlfriends , husbands ...","People might use the term ""SO"" instead of ""boy..."
98,98,How do companies that buy & re - sell other pe...,"I get that they are "" bottom feeders "" but to ...",When a company or individual owes money to ano...


In [398]:
### TEST CASE for Data Import
# - This read-only cell contains a "checkpoint" for this section of the MicroProject and verifies you are on the right track.
# - If this cell results in a celebration message, you PASSED all test cases!
# - If this cell results in any errors, check your previous cells, make changes, and RE-RUN your code and then this cell.
tada = "\N{PARTY POPPER}"

df = df.replace(r'\r\n', '\n', regex=True)

assert("df" in vars())
assert(len(df) == 100)
assert("How come I have to pay for Foxtel and get ads , when things like youtube are free because they play ads . Should n't Foxtel be you pay and you do nt get ads , or it should be free and you get ads . Please explain like I'm five." in df["question"].values)
assert("It's important to remember that Japan is a country with a rich and diverse culture, and like any other country, it has a range of cultural practices and expressions. While it is true that Japan has a reputation for being a more traditional and conservative culture, it is also home to a vibrant and creative media industry. \nThe Japanese media industry is known for producing a wide variety of content, including video games, TV shows, and music, that often incorporates elements of Japanese culture and history. This can include things like traditional Japanese instruments and themes, as well as more modern and unconventional elements. \nIt's also worth noting that different parts of Japanese culture can have different levels of conservatism. For example, some aspects of Japanese culture, such as traditional family structure or gender roles, may be more conservative, while other parts of Japanese culture, such as the media industry, may be more open to experimentation and innovation. \nOverall, it's important to remember that Japan is a complex and multifaceted culture, and it's not fair to generalize or stereotype all aspects of Japanese culture based on a few specific examples." in df["chatgpt_answer"].values)
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 1: Response Length Analysis

One common critique of current AI systems is that their answers are often excessively long.

For the first analysis, let's compare the average length of a human response to the average length of an AI response and test whether the difference is statistically significant. To start, add two new columns to the DataFrame that contain the length of each response.

### Puzzle 1.1: Getting the length of each human and AI response

When working with columns that contain `string` data, we can use the general syntax:

> ```py
> df["column"].str.STRING_FUNCTION()
> ```

Here, `STRING_FUNCTION()` is any of the [string accessor methods](https://pandas.pydata.org/docs/reference/series.html#api-series-str). One such method is:

> ...
> `Series.str.len()`: Compute the length of each element in the Series/Index.
> ...

To begin our analysis of the length of each response, add two new columns to the DataFrame:
- `human_answer_len`, containing the length (in characters) of the human response
- `chatgpt_answer_len`, containing the length (in characters) of the ChatGPT response
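To see what `.str.len()` produces before running it on the real columns, here is a small sketch on a toy Series (the strings are made up for illustration). Spaces and punctuation count as characters, and a missing value yields `NaN` instead of raising an error, which is why the lengths come back as floats when anything is missing.

```python
import pandas as pd

# Toy answers (made up for illustration); None stands in for a missing answer.
answers = pd.Series(["Salt melts ice.", "Yes", None])

# Count the characters in each string, including spaces and punctuation.
lengths = answers.str.len()
print(lengths)  # lengths are 15, 3, and NaN
```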

In [399]:
# Number of characters in each human answer
df['human_answer_len'] = df['human_answer'].str.len()
# Number of characters in each ChatGPT answer
df['chatgpt_answer_len'] = df['chatgpt_answer'].str.len()

### Puzzle 1.2: Compute the Average Answer Lengths

Now that we have the lengths of our responses, save the average length of the human and ChatGPT responses in the variables `human_avg` and `ai_avg`, respectively:
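`Series.mean()` skips missing values by default (`skipna=True`), so a row with a missing answer is left out of the average rather than counted as zero. A quick check on made-up lengths:

```python
import pandas as pd

# Hypothetical character counts; NaN marks a missing answer.
lengths = pd.Series([600.0, 207.0, float("nan")])

print(lengths.mean())              # NaN is skipped: (600 + 207) / 2
print(lengths.mean(skipna=False))  # NaN propagates when skipping is disabled
```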

In [400]:
# Average response length for a human response:
human_avg = df['human_answer_len'].mean()
human_avg

np.float64(729.88)

In [401]:
# Average response length for a ChatGPT response:
ai_avg = df['chatgpt_answer_len'].mean()
ai_avg

np.float64(1098.77)

In [402]:
### TEST CASE for Puzzle 1: Response Length Analysis (Puzzles 1.1 and 1.2)
tada = "\N{PARTY POPPER}"
assert("human_avg" in vars())
assert("ai_avg" in vars())
assert('human_answer_len' in df.columns)
assert('chatgpt_answer_len' in df.columns)
assert(876 in df["human_answer_len"].values and 930 in df["chatgpt_answer_len"].values)
assert((human_avg * 390)/30 == 9488.44)
assert(((ai_avg * 444)/30) + 86 == 16347.796)
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Puzzle 1.3: Testing if the difference in length is statistically significant

To determine whether the difference between the means is statistically significant, we're going to conduct a two-sample (independent) t-test. Let's first state our null and alternative hypotheses.

> **Null hypothesis**: The mean length of human responses is equal to the mean length of ChatGPT responses.
> 
> **Alternative hypothesis**: The mean length of human responses is different from the mean length of ChatGPT responses.
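With its default `equal_var=True`, `ttest_ind` computes Student's two-sample t-statistic using a pooled variance. Writing $\bar{x}_1$, $s_1^2$, $n_1$ for the mean, sample variance, and count of the human lengths and $\bar{x}_2$, $s_2^2$, $n_2$ for the ChatGPT lengths:

$$
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$

The p-value is the two-sided tail probability of $|t|$ under a t distribution with $n_1 + n_2 - 2$ degrees of freedom. Because the human lengths are passed first, a negative $t$ means the human mean is the smaller one.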

Use the [`scipy.stats.ttest_ind`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html) function to compute the t-statistic (as `t_stat`) and the p-value (as `p_value`).

In [403]:
from scipy.stats import ttest_ind
human_lengths = df['human_answer_len'].dropna()
chatgpt_lengths = df['chatgpt_answer_len'].dropna()
# Compute a two-sample t-test (human lengths first, ChatGPT lengths second):
#    Syntax: t_stat, p_value = ttest_ind( ... )
t_stat, p_value = ttest_ind(human_lengths, chatgpt_lengths)

In [404]:
# Display the t-statistic:
t_stat

np.float64(-3.2043931944794153)

In [405]:
# Display the p-value:
p_value

np.float64(0.0015776855917118066)
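As a sanity check on the formula behind `ttest_ind`, the sketch below recomputes the pooled-variance t-statistic using only Python's standard library. `pooled_t_statistic` is a hypothetical helper and the toy samples are made up. It returns only the statistic and the degrees of freedom, because the p-value additionally needs the t distribution's CDF, which SciPy supplies.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Student's two-sample t-statistic with pooled variance (like equal_var=True)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance uses the n - 1 (sample) denominator.
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, dof


# Toy data: the first sample has the smaller mean, so t is negative.
t, dof = pooled_t_statistic([1, 2, 3], [4, 5, 6])
print(t, dof)
```

Applied to `human_lengths.tolist()` and `chatgpt_lengths.tolist()`, this should reproduce `t_stat` up to floating-point rounding.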

### Puzzle 1.4: Choosing Your Conclusion

Using the calculated p-value and an alpha level of 0.05, we can determine whether there is a significant difference between the two mean values.
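The decision rule itself is a single comparison: reject the null hypothesis when the p-value is below alpha. A minimal sketch, where `reject_null` is a hypothetical helper name:

```python
def reject_null(p_value, alpha=0.05):
    """Return True when the result is statistically significant at level alpha."""
    return p_value < alpha


# The p-values below are illustrative, not taken from the dataset.
print(reject_null(0.003))  # True: below 0.05
print(reject_null(0.2))    # False: above 0.05
```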

Un-comment exactly one conclusion based on your observations.

In [406]:
# conclusion = "The p-value is greater than .05 so there is a significant difference between the means of the two response lengths."
# conclusion = "The p-value is greater than .05 so there is no significant difference between the means of the two response lengths."
conclusion = "The p-value is less than .05 so there is a significant difference between the means of the two response lengths."
# conclusion = "The p-value is less than .05 so there is no significant difference between the means of the two response lengths."

In [407]:
### TEST CASE for Puzzle 1: Response Length Analysis (Puzzles 1.3 and 1.4)

tada = "\N{PARTY POPPER}"
assert("t_stat" in vars())
assert("p_value" in vars())

import math
assert( math.isclose(t_stat * p_value, -0.005055524973109543) ), "Check your t_stat and p_value."

assert(len(conclusion) == 112), "Your conclusion is incorrect."
assert(conclusion.count("a") == 5), "Your conclusion is incorrect."

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 2: Sentiment Analysis - Polarity

A second common critique of current AI systems is that their answers are often more optimistic and/or more subjective than a human's. One common technique in text processing is to compute the "polarity" and "subjectivity" scores of a piece of text.

A "polarity score" ranges from -1 to 1, where -1 is very negative, 0 is neutral, and 1 is very positive.

A "subjectivity score" ranges from 0 to 1, where 0 is a fully objective statement and 1 is a very subjective statement.

### Puzzle 2.1: Find the Polarity

To find the polarity score, let's use the TextBlob library to calculate a polarity and subjectivity score for each response. TextBlob is a Python library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Start off by installing this library.

In the terminal, type the following: `python3 -m pip install textblob`

- If this command does not work, try the other `pip install` commands listed [here](https://discovery.cs.illinois.edu/guides/System-Setup/Setup-Your-System/#Step-4-Installing-pandas), replacing `pandas` with `textblob`.

Once it is installed, import the `TextBlob` class from the `textblob` library with the following import statement:

In [408]:
from textblob import TextBlob

As described above, the polarity score ranges from -1 (very negative) to 1 (very positive). To learn how to compute it with TextBlob, read the "Sentiment Analysis" portion of the [TextBlob Tutorial: Quickstart](https://textblob.readthedocs.io/en/dev/quickstart.html).

Once you understand how to get the polarity of a string using the library, complete the function `polarity` below, which takes a string `s` as input and returns its polarity score:

In [409]:
def polarity(s):
  # Wrap the string in a TextBlob and return its polarity score (-1 to 1)
  blob = TextBlob(s)
  return blob.sentiment.polarity


Let's test your function to make sure it's working:

In [410]:
# From the documentation, we expect a positive (above zero) polarity:
polarity("Textblob is amazingly simple to use. What great fun!")

0.39166666666666666

In [411]:
# For a negative statement, we expect a negative (below zero) polarity:
polarity("You will fail DISCOVERY if you do not take the final exam")

-0.25

In [412]:
# For a neutral statement, we expect a polarity of zero:
polarity("Hello, world!")

0.0

In [413]:
# Feel free to try your own sentence :)
polarity("HHHH")

0.0

### Puzzle 2.2: Finding the Polarity of Human Answers

Once you have defined a custom function, you can use it on every row of data by using:
> ```python
> df["column"].apply( customFunctionName )
> ```

Note that `customFunctionName` does NOT include the usual `(...)` at the end of the function name; you are providing **ONLY the name of the function**, and `apply` will internally **call** the function on each value in the column.
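To make the "name only, no parentheses" point concrete, here is a sketch with a made-up helper, `word_count`, applied to a toy column. `apply` calls `word_count` once per value and collects the results into a new Series.

```python
import pandas as pd


def word_count(s):
    """Hypothetical helper: number of whitespace-separated words in s."""
    return len(s.split())


toy = pd.DataFrame({"answer": ["Salt melts ice.", "It depends on the road."]})

# Pass the function itself (word_count), not the result of calling it.
toy["answer_words"] = toy["answer"].apply(word_count)
print(toy)
```

Writing `.apply(word_count(...))` instead would call the function immediately rather than handing it to pandas to call on each row.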

Using the syntax above, create a new column named `human_polarity` that contains the polarity of each `human_answer`, computed with the `.apply` operation:

In [414]:
df["human_polarity"] = df['human_answer'].apply(polarity)
df

Unnamed: 0.1,Unnamed: 0,question,human_answer,chatgpt_answer,human_answer_len,chatgpt_answer_len,human_polarity
0,0,"Why is every book I hear about a "" NY Times # ...","Basically there are many categories of "" Best ...",There are many different best seller lists tha...,600,1114,0.696667
1,1,"If salt is so bad for cars , why do we use it ...",salt is good for not dying in car crashes and ...,Salt is used on roads to help melt ice and sno...,207,1148,0.289286
2,2,Why do we still have SD TV channels when HD lo...,The way it works is that old TV stations got a...,There are a few reasons why we still have SD (...,784,789,0.160761
3,3,Why has nobody assassinated Kim Jong - un He i...,You ca n't just go around assassinating the le...,It is generally not acceptable or ethical to a...,613,930,-0.025000
4,4,How was airplane technology able to advance so...,Wanting to kill the shit out of Germans drives...,After the Wright Brothers made the first power...,59,1235,-0.200000
...,...,...,...,...,...,...,...
95,95,How come the Constitution of the US is relevan...,The Constitution is the highest law of the lan...,The Constitution of the United States is the s...,368,1281,-0.083333
96,96,Why does Valve have Greenlight would n't it ma...,"Now , There are 2 problems with steam , and bo...","Valve, the company that developed the Steam pl...",876,1034,-0.061111
97,97,why do people so commonly say SO instead of bo...,"SO covers boyfriends , girlfriends , husbands ...","People might use the term ""SO"" instead of ""boy...",125,846,-0.250000
98,98,How do companies that buy & re - sell other pe...,"I get that they are "" bottom feeders "" but to ...",When a company or individual owes money to ano...,202,1254,0.170000


### Puzzle 2.3: Finding the Polarity of ChatGPT Answers

Now, find the polarity of ChatGPT answers and save each polarity in the column `chatgpt_polarity`:

In [415]:
df["chatgpt_polarity"] = df['chatgpt_answer'].apply(polarity)
df

Unnamed: 0.1,Unnamed: 0,question,human_answer,chatgpt_answer,human_answer_len,chatgpt_answer_len,human_polarity,chatgpt_polarity
0,0,"Why is every book I hear about a "" NY Times # ...","Basically there are many categories of "" Best ...",There are many different best seller lists tha...,600,1114,0.696667,0.352774
1,1,"If salt is so bad for cars , why do we use it ...",salt is good for not dying in car crashes and ...,Salt is used on roads to help melt ice and sno...,207,1148,0.289286,0.113158
2,2,Why do we still have SD TV channels when HD lo...,The way it works is that old TV stations got a...,There are a few reasons why we still have SD (...,784,789,0.160761,0.196140
3,3,Why has nobody assassinated Kim Jong - un He i...,You ca n't just go around assassinating the le...,It is generally not acceptable or ethical to a...,613,930,-0.025000,0.032037
4,4,How was airplane technology able to advance so...,Wanting to kill the shit out of Germans drives...,After the Wright Brothers made the first power...,59,1235,-0.200000,0.336167
...,...,...,...,...,...,...,...,...
95,95,How come the Constitution of the US is relevan...,The Constitution is the highest law of the lan...,The Constitution of the United States is the s...,368,1281,-0.083333,0.182286
96,96,Why does Valve have Greenlight would n't it ma...,"Now , There are 2 problems with steam , and bo...","Valve, the company that developed the Steam pl...",876,1034,-0.061111,0.000000
97,97,why do people so commonly say SO instead of bo...,"SO covers boyfriends , girlfriends , husbands ...","People might use the term ""SO"" instead of ""boy...",125,846,-0.250000,0.080769
98,98,How do companies that buy & re - sell other pe...,"I get that they are "" bottom feeders "" but to ...",When a company or individual owes money to ano...,202,1254,0.170000,0.112500


In [416]:
### TEST CASE for Puzzle 2: Sentiment Analysis - Polarity
tada = "\N{PARTY POPPER}"

assert('human_polarity' in df.columns)
assert('chatgpt_polarity' in df.columns)

import math
assert(math.isclose( df["human_polarity"].mean(), 0.07790771420489018 ))
assert(math.isclose( df["chatgpt_polarity"].std(), 0.11014935757814867 ))

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


### Analysis: Overall Results

Run the following cell to see a summary of the results you found so far:

In [417]:
import pandas as pd
pd.DataFrame([
  {"Metric": "Response Length (mean):",  "Human": human_avg, "ChatGPT": ai_avg },
  {"Metric": "Polarity (mean):", "Human": df.human_polarity.mean(), "ChatGPT": df.chatgpt_polarity.mean() },
  {"Metric": "Polarity (mean of absolute values):", "Human": df.human_polarity.abs().mean(), "ChatGPT": df.chatgpt_polarity.abs().mean() },
])

Unnamed: 0,Metric,Human,ChatGPT
0,Response Length (mean):,729.88,1098.77
1,Polarity (mean):,0.077908,0.098595
2,Polarity (mean of absolute values):,0.149428,0.11658
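The last row of this summary averages the absolute values of the polarity scores because positive and negative scores cancel out in a plain mean. The absolute-value version measures how strong the sentiment is, regardless of its direction. A toy illustration with made-up scores:

```python
import pandas as pd

# Made-up polarity scores: one positive answer and one equally negative answer.
scores = pd.Series([0.5, -0.5])

print(scores.mean())        # 0.0: the two directions cancel
print(scores.abs().mean())  # 0.5: average strength, ignoring direction
```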


<hr style="color: #DD3403;">

## Part 3: Sentiment Analysis - Subjectivity

Using the same library and design as Part 2, create a function called `subjectivity` that returns the subjectivity score of a given string `s`:

In [418]:
def subjectivity(s):
  # Wrap the string in a TextBlob and return its subjectivity score (0 to 1)
  blob = TextBlob(s)
  return blob.sentiment.subjectivity

Let's try it out:

In [419]:
# A highly subjective statement, expecting a score above 0.5:
subjectivity("The best fruit is a kiwi and everyone else is wrong!")

0.6

In [420]:
# A factual statement, expecting a score near 0:
subjectivity("Winter is colder than summer.")

0.0

In [421]:
# Feel free to try your own :)
subjectivity("...")

0.0

### Finding the Subjectivity of Human and ChatGPT Answers

Add two new columns to the DataFrame, `human_subjectivity` and `chatgpt_subjectivity`, that store the subjectivity of each answer for all of the questions.

*You may need to look back at the previous section to review how to use the `.apply` function.*

In [422]:
df['human_subjectivity'] = df['human_answer'].apply(subjectivity)

df['chatgpt_subjectivity'] = df['chatgpt_answer'].apply(subjectivity)
df

Unnamed: 0.1,Unnamed: 0,question,human_answer,chatgpt_answer,human_answer_len,chatgpt_answer_len,human_polarity,chatgpt_polarity,human_subjectivity,chatgpt_subjectivity
0,0,"Why is every book I hear about a "" NY Times # ...","Basically there are many categories of "" Best ...",There are many different best seller lists tha...,600,1114,0.696667,0.352774,0.348333,0.490007
1,1,"If salt is so bad for cars , why do we use it ...",salt is good for not dying in car crashes and ...,Salt is used on roads to help melt ice and sno...,207,1148,0.289286,0.113158,0.625000,0.577193
2,2,Why do we still have SD TV channels when HD lo...,The way it works is that old TV stations got a...,There are a few reasons why we still have SD (...,784,789,0.160761,0.196140,0.387570,0.574912
3,3,Why has nobody assassinated Kim Jong - un He i...,You ca n't just go around assassinating the le...,It is generally not acceptable or ethical to a...,613,930,-0.025000,0.032037,0.521429,0.602222
4,4,How was airplane technology able to advance so...,Wanting to kill the shit out of Germans drives...,After the Wright Brothers made the first power...,59,1235,-0.200000,0.336167,0.800000,0.484583
...,...,...,...,...,...,...,...,...,...,...
95,95,How come the Constitution of the US is relevan...,The Constitution is the highest law of the lan...,The Constitution of the United States is the s...,368,1281,-0.083333,0.182286,0.333333,0.648095
96,96,Why does Valve have Greenlight would n't it ma...,"Now , There are 2 problems with steam , and bo...","Valve, the company that developed the Steam pl...",876,1034,-0.061111,0.000000,0.388194,0.346154
97,97,why do people so commonly say SO instead of bo...,"SO covers boyfriends , girlfriends , husbands ...","People might use the term ""SO"" instead of ""boy...",125,846,-0.250000,0.080769,0.495833,0.504487
98,98,How do companies that buy & re - sell other pe...,"I get that they are "" bottom feeders "" but to ...",When a company or individual owes money to ano...,202,1254,0.170000,0.112500,0.700000,0.465625


In [423]:
### TEST CASE for Puzzle 3: Sentiment Analysis - Subjectivity
tada = "\N{PARTY POPPER}"

import math

assert('human_subjectivity' in df.columns)
assert('chatgpt_subjectivity' in df.columns)
assert( math.isclose( df["human_subjectivity"].mean(), 0.4507534813188739 ) )
assert( math.isclose( df["chatgpt_subjectivity"].std(), 0.10025355258208316 ) )

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Part 4: Testing if the Difference in Subjectivity is Statistically Significant

To determine whether the difference in mean subjectivity is statistically significant, we're going to conduct another two-sample t-test. Let's first state our null and alternative hypotheses.

> **Null hypothesis**: The mean subjectivity of human responses is equal to the mean subjectivity of ChatGPT responses.
> 
> **Alternative hypothesis**: The mean subjectivity of human responses is different from the mean subjectivity of ChatGPT responses.

Use the `scipy.stats.ttest_ind` function again to compute the t-statistic and p-value.
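Both t-tests in this MicroProject call `ttest_ind` with its default `equal_var=True`, which assumes the two groups have equal population variances. To check how sensitive a result is to that assumption, SciPy also accepts `equal_var=False`, which runs Welch's t-test instead. A sketch on made-up scores (not the dataset):

```python
from scipy.stats import ttest_ind

# Made-up subjectivity scores; the second group is much more spread out.
group_a = [0.40, 0.42, 0.45, 0.43, 0.41]
group_b = [0.20, 0.70, 0.35, 0.90, 0.55, 0.60]

student = ttest_ind(group_a, group_b)                 # pooled variance (default)
welch = ttest_ind(group_a, group_b, equal_var=False)  # Welch's unequal-variance test

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

If the Student and Welch versions lead to the same conclusion at your chosen alpha, the equal-variance assumption is unlikely to be what drives the result.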

In [424]:
# Compute a t-test:
#    Syntax: t_stat, p_value = ttest_ind( ... )
a = df['human_subjectivity'].dropna()
b = df['chatgpt_subjectivity'].dropna()
t_stat, p_value = ttest_ind(a, b)

In [425]:
t_stat

np.float64(-2.5152049954631126)

In [426]:
p_value

np.float64(0.012691373254489834)

Using the calculated p-value and an alpha level of .05, we can determine whether there is a significant difference between the two mean values. Un-comment the correct conclusion based on these values.

In [427]:
conclusion = "The p-value is LESS THAN .05 so there is a significant difference between the means of the two response subjectivity."
# conclusion = "The p-value is LESS THAN .05 so there is *NO* significant difference between the means of the two response subjectivity."
# conclusion = "The p-value is GREATER THAN .05 so there is a significant difference between the means of the two response subjectivity."
# conclusion = "The p-value is GREATER THAN .05 so there is *NO* significant difference between the means of the two response subjectivity."

In [428]:
### TEST CASE for Puzzle 4: Testing if the Difference in Subjectivity is Statistically Significant
tada = "\N{PARTY POPPER}"
assert("t_stat" in vars())
assert("p_value" in vars())
assert(len(conclusion) == 117)
assert(conclusion.count("a") == 4)

import math
assert( math.isclose(t_stat * p_value, -0.031921405408979774) ), "Check your t_stat and p_value."

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Submission

You're almost done! All you need to do is commit your lab to GitHub and run the GitHub Actions Grader:

1.  ⚠️ **Make certain to save your work.** ⚠️ To do this, go to **File => Save All**

2.  After you have saved, exit this notebook and return to https://discovery.cs.illinois.edu/microproject/ai-vs-human-response-analysis/ and complete the section **"Commit and Grade Your Notebook"**.

3. If you see a 100% grade result on your GitHub Action, you've completed this MicroProject! 🎉