#### 1. Begin (or restart) part "3(a)" of the **TUT Demo** and interact with a ChatBot to make sure you understand how each part of the Monty Hall problem code above works<br>

In [1]:
# Monty Hall Simulation Code -- not the only way to code this, but it's what Prof. Schwartz came up with...

import numpy as np
all_door_options = (1,2,3)  # tuple
my_door_choice = 1  # 1,2,3
i_won = 0
reps = 100000
for i in range(reps):
    secret_winning_door = np.random.choice(all_door_options)
    all_door_options_list = list(all_door_options)
    # take the secret_winning_door, so we don't show it as a "goat" losing door
    all_door_options_list.remove(secret_winning_door)
    try:
        # if my_door_choice was secret_winning_door then it's already removed
        all_door_options_list.remove(my_door_choice)
    except ValueError:
        # list.remove raises ValueError when the item is not in the list
        pass
    # show a "goat" losing door and remove it
    goat_door_reveal = np.random.choice(all_door_options_list)
    all_door_options_list.remove(goat_door_reveal)

    # put the secret_winning_door back in if it wasn't our choice
    # we previously removed it, so it would be shown as a  "goat" losing door
    if secret_winning_door != my_door_choice:
        all_door_options_list.append(secret_winning_door)
    # if secret_winning_door was our choice then all that's left in the list is a "goat" losing door
    # if secret_winning_door wasn't our choice then it's all that will be left in the list

    # swap strategy
    my_door_choice = all_door_options_list[0]

    if my_door_choice == secret_winning_door:
        i_won += 1

i_won/reps

0.66716

In [2]:
import numpy as np
all_door_options = (1, 2, 3)
my_door_choice = 1
i_won = 0
reps = 100000

all_door_options is a tuple representing the three doors (1, 2, 3).

my_door_choice is initially set to door 1 (the player always initially picks this door).

i_won is a counter to track how many times the player wins the game.

reps is the number of repetitions or simulations (100,000 in this case).

In [3]:
for i in range(reps):
    ...  # the loop body is explained piece by piece in the cells below

This loop will run the simulation reps (100,000) times.

In [4]:
secret_winning_door = np.random.choice(all_door_options)

Each time through the loop, one of the three doors is randomly chosen to be the winning door (with the car behind it).

In [5]:
all_door_options_list = list(all_door_options)
all_door_options_list.remove(secret_winning_door)

The tuple of all doors is converted into a list, and the winning door is removed to avoid revealing it as a losing door.

In [6]:
try:
    all_door_options_list.remove(my_door_choice)
except ValueError:
    pass

If the player's initial choice happens to be the winning door, it has already been removed, so remove raises a ValueError that the try/except ignores. Otherwise, the initial choice is removed from the list so that the host cannot reveal it.
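
Why the try/except is needed can be seen in a minimal sketch. The list contents below are illustrative values, not taken from a real run; they stand for the case where the player's door was the winning door and is therefore already gone.

```python
# Illustrative values: door 1 was both the player's pick and the winning door,
# so it has already been removed from the list.
doors_left = [2, 3]
my_door_choice = 1

try:
    doors_left.remove(my_door_choice)  # 1 is not in the list
    removed = True
except ValueError:
    # list.remove raises ValueError when the item is absent
    removed = False

print(removed, doors_left)
```

Catching ValueError specifically, rather than using a bare except, keeps unrelated errors visible.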

In [7]:
goat_door_reveal = np.random.choice(all_door_options_list)
all_door_options_list.remove(goat_door_reveal)

A "goat" door (a losing door) is randomly selected from the remaining doors and revealed to the player. This door is then removed from consideration.

In [8]:
if secret_winning_door != my_door_choice:
    all_door_options_list.append(secret_winning_door)

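If the player's initial choice was not the winning door, the winning door is added back to the list. Either way, the list now holds exactly one door, the one the player switches to: the winning door if the first pick was wrong, or the remaining goat door if the first pick was right.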

In [9]:
i_won / reps

0.0

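After all iterations, the code calculates the proportion of games the player won by dividing the number of wins (i_won) by the total number of simulations (reps). The 0.0 shown here arises because the cells above were run separately rather than inside the loop, so i_won was never incremented; the full simulation in the first cell gives 0.66716.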
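
As a cross-check on the simulated value, the probability can also be worked out exactly by listing every case. This is a sketch under two assumptions that are not in the original code: the car is equally likely to be behind each door, and the first pick is independent of where the car is. The helper name switch_wins is hypothetical.

```python
from fractions import Fraction
from itertools import product

doors = (1, 2, 3)

def switch_wins(winning_door, initial_choice):
    """Return True if the switching strategy wins for this case."""
    # The host never reveals the car, so after switching the player holds
    # the car exactly when the first pick was a goat door.
    return winning_door != initial_choice

# All 9 (winning door, initial choice) pairs are equally likely.
outcomes = [switch_wins(w, c) for w, c in product(doors, repeat=2)]
exact_probability = Fraction(sum(outcomes), len(outcomes))
print(exact_probability)  # 2/3
```

The exact value 2/3 is about 0.6667, which agrees with the simulated 0.66716.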

#### 2. Extend your ChatBot sessions to now address part "3(b)" of the **TUT Demo** and interact with your ChatBot to see if it can suggest a simpler, more streamlined way to code up this *for* loop simulation so the process is more clear and easier to understand; then, describe any preferences you have in terms of readability or explainability between the original code and the code improvements suggested by the ChatBot<br>

Key Improvements:

1. Removed the repeated tuple-to-list conversion inside the loop.

2. Simplified the logic for removing doors and switching choices.

3. Used more descriptive variable names for clarity.

4. Streamlined the code by replacing the try-except block with a conditional check.

In [10]:
import numpy as np

# Monty Hall Simulation with a "switching" strategy

doors = [1, 2, 3]  # The three available doors
initial_choice = 1  # The player's initial choice
wins = 0  # Counter for wins
simulations = 100000  # Number of repetitions

for _ in range(simulations):
    # Randomly select the winning door
    winning_door = np.random.choice(doors)
    
    # Doors remaining after removing the winning door
    remaining_doors = [door for door in doors if door != winning_door]

    # The host cannot reveal the player's door, so drop it if it is still in the list
    if initial_choice in remaining_doors:
        remaining_doors.remove(initial_choice)
    
    # Randomly reveal a goat door from the remaining doors
    revealed_goat_door = np.random.choice(remaining_doors)
    
    # Switch choice to the remaining door (the one not initially chosen or revealed)
    switch_choice = [door for door in doors if door != initial_choice and door != revealed_goat_door][0]
    
    # Check if the switched choice is the winning door
    if switch_choice == winning_door:
        wins += 1

# Calculate the probability of winning with the switch strategy
win_probability = wins / simulations
print(f"Probability of winning by switching: {win_probability:.4f}")


Probability of winning by switching: 0.6653


1. Logic of the code.

First code (ChatGPT's code): The logic is relatively intuitive. For example, it uses list comprehensions to filter the doors that can be revealed and the door that can be switched to. Each step follows the description of the problem in order, so it is easier to understand.

Second code (Professor's code): The logic is slightly more complex. For example, it first removes the winning door and then adds it back, which is a bit roundabout. It also uses try-except to handle a special case (when the door you chose at the beginning is the winning door), which is slightly harder to follow.

2. Notes and explanations.

First code: The comments are short, one per step, and since the variable names and logic are clear, the code could be understood even without them.

Second code: It uses a lot of comments to explain what each step is doing. This suggests that the code itself may not be as easy to understand, so comments are needed to help explain it.

3. Overall readability.

First code: Much more concise and easy to read. The flow of the code closely follows the Monty Hall problem itself, the variable names are clear, and the logic is intuitive.

Second code: Slightly more complex, with more steps. Operations like removing and re-adding doors make it easier to get confused, and many comments are needed to help with understanding.

Summary:
The first code is easier to read and understand, especially for newcomers. The variable names and code structure are more intuitive and fit the description of the problem well.
The second code achieves the same result, but it is harder to follow because of the many steps, the roundabout logic, and the reliance on comments.

#### 3. Submit your preferred version of the Monty Hall problem that is verified to be running and working with a final printed output of the code; then, add code comments explaining the purpose of each line of the code<br>

In [None]:
import numpy as np  # Importing the NumPy library for generating random numbers

doors = [1, 2, 3]  # Define three alternative doors, numbered 1, 2, 3
wins = 0  # Initialize the counter for the number of victories to 0
simulations = 100000  # We're going to run 100,000 simulations to estimate the win rate of the door-switching strategy

# Start multiple simulations, with each simulation representing one game
for _ in range(simulations):
    # Randomly select the winning door (the host knows which door hides the car)
    winning_door = np.random.choice(doors)
    
    # The player randomly selects a door as the initial choice
    initial_choice = np.random.choice(doors)
    
    # The host reveals a door that is neither the winning door nor the player's initial choice, so it is guaranteed to hide a goat
    possible_doors_to_reveal = [door for door in doors if door != initial_choice and door != winning_door]
    revealed_goat_door = np.random.choice(possible_doors_to_reveal)
    
    # The player decides to change the door, choosing the one that is left (the one that has not been shown)
    switch_choice = [door for door in doors if door != initial_choice and door != revealed_goat_door][0]
    
    # If the selection after the door change happens to be the winning door, a win is recorded
    if switch_choice == winning_door:
        wins += 1

# Calculate the win rate: number of wins after switching / total number of simulations
win_probability = wins / simulations

# Print the win rate of the switching strategy; it should be close to 2/3 (about 66.67%)
print(f"Probability of winning by switching: {win_probability:.4f}")
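
To see why switching is the better strategy, the same game can be run with and without the switch. This is a minimal sketch, not the submitted code: it swaps NumPy for the standard-library random module, fixes a seed for repeatability, and uses hypothetical helper names (play_once, win_rate).

```python
import random

def play_once(rng, switch):
    """Play one game and return True if the player ends on the winning door."""
    doors = [1, 2, 3]
    winning_door = rng.choice(doors)
    initial_choice = rng.choice(doors)
    # The host reveals a goat door that is neither the car nor the player's pick
    revealed = rng.choice([d for d in doors if d not in (winning_door, initial_choice)])
    if switch:
        # Move to the one door that is neither the first pick nor the revealed door
        final_choice = next(d for d in doors if d not in (initial_choice, revealed))
    else:
        final_choice = initial_choice
    return final_choice == winning_door

def win_rate(switch, simulations=20000, seed=0):
    """Estimate the win rate of a strategy over many simulated games."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    wins = sum(play_once(rng, switch) for _ in range(simulations))
    return wins / simulations

switch_rate = win_rate(switch=True)
stay_rate = win_rate(switch=False)
print(f"switch: {switch_rate:.3f}, stay: {stay_rate:.3f}")
```

The switching rate should land near 2/3 and the staying rate near 1/3, since staying wins only when the first pick was already the car.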


Summary of Our Session:
In this session, we analyzed a Python script simulating the Monty Hall Problem, a well-known probability puzzle. The original code simulated 100,000 repetitions of the game, using a "switching" strategy, where the player always switches their choice after a non-winning door is revealed.

Key points from the discussion:

Original Code Explanation: We broke down each step of the original code, explaining how it simulates the game and calculates the probability of winning by switching doors.
Improvements Suggested: I proposed a simplified and more readable version of the code that eliminated redundant steps, used list comprehensions for more concise code, improved naming for better clarity, and streamlined the process of switching doors and checking for wins.

https://chatgpt.com/share/66eb3e5f-e038-8008-a919-bdd0e2bc2f6b

#### 4. Watch the embedded video tutorial on Markov chains in the next Jupyter cell below to understand their application and relevance for ChatBots; then, after watching the video, start a new ChatBot session by prompting that you have code that creates a "Markovian ChatBot"; show it the first version of the "Markovian ChatBot code" below; and interact with the ChatBot session to make sure you understand how the original first version of the "Markovian ChatBot code" works<br>

In [None]:
# Markovian Chatbot

# from collections import defaultdict
word_used = dict() # defaultdict(int)
next_word = dict() # defaultdict(lambda: defaultdict(int))
for i,word in enumerate(words[:-1]):

    if word in word_used:
        word_used[word] += 1
    else:
        word_used[word] = 1
        next_word[word] = {}

    if words[i+1] in next_word[word]:
        next_word[word][words[i+1]] += 1
    else:
        next_word[word][words[i+1]] = 1

In [None]:
word_used = dict()
next_word = dict() 

word_used is a dictionary of the number of times each word appears in the input text.

next_word is a dictionary that stores how often each word is followed by each other word. It maps a word to an inner dictionary, and the inner dictionary counts how many times each following word appears immediately after it.


In [None]:
for i, word in enumerate(words[:-1]):

words is the list of words in the input text. The for loop iterates over every word in the list except the last one, because the loop body looks ahead to words[i+1], and for the last word that index would be out of range and raise an error.

In [None]:
if word in word_used:
    word_used[word] += 1
else:
    word_used[word] = 1
    next_word[word] = {}

The if statement checks whether the current word is already in word_used. If it is, its count is incremented. If it is not, the word is added to word_used with a count of 1, recording that it has now appeared once. At the same time, an empty inner dictionary is created for the word in next_word, which will count the words that follow it.

In [None]:
if words[i+1] in next_word[word]:
    next_word[word][words[i+1]] += 1
else:
    next_word[word][words[i+1]] = 1

This part of the code looks at the word that comes immediately after the current word (words[i+1]).
If the next word (words[i+1]) already exists in the dictionary of next words for the current word (next_word[word]), its count is incremented.
Otherwise, the next word is added to the dictionary with an initial count of 1.
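
The counting logic above can be traced on a tiny input. This is a sketch with a hypothetical toy sentence standing in for the course's real words list; the sampling step at the end is an assumed illustration of how the counts could be used to pick a next word, not part of the original code.

```python
import random

# Hypothetical toy text; the real `words` list comes from a larger dataset.
words = "the cat sat on the mat the cat ran".split()

word_used = dict()
next_word = dict()
for i, word in enumerate(words[:-1]):
    # Count the current word, creating its follower dictionary on first sight
    if word in word_used:
        word_used[word] += 1
    else:
        word_used[word] = 1
        next_word[word] = {}
    # Count the word that immediately follows the current word
    if words[i+1] in next_word[word]:
        next_word[word][words[i+1]] += 1
    else:
        next_word[word][words[i+1]] = 1

print(word_used["the"])   # 3
print(next_word["the"])   # {'cat': 2, 'mat': 1}

# Assumed usage: sample a follower of "the" in proportion to its count
rng = random.Random(0)
followers = next_word["the"]
sampled = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
print(sampled)
```

The last word, "ran", never becomes a key because the loop stops one word early.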

Summary of Our Current Session:

Understanding the Original Markov Chain Program:

We discussed how the original Markov chain program builds a simple model by tracking the frequency of word transitions in a text. The program uses two dictionaries:
word_used tracks the number of occurrences of each word.
next_word tracks which words follow each word and how frequently they appear.

Extensions to the Markov Chain Program:

Extension #1 (Bigram Transitions):

The first extension enhances the Markov model by moving from single-word transitions to bigram transitions. It tracks how often a word pair (bigram) appears and which word follows that bigram.
This improves the context for word prediction, resulting in more coherent text generation.

Extension #2 (Character-based Context):

The second extension further improves the model by personalizing it based on characters from a dataset. It creates a separate bigram Markov chain for each character, tracking the words each character uses and how their speech patterns flow.
This allows for generating text that mimics the speaking styles of different characters.

Key Coding Results:

Base Markov Chain: The original program tracks word transitions using single-word dependencies, where it calculates how often a word appears and what words follow it.

Bigram-based Markov Chain: Extension #1 modifies the model to track transitions between word pairs (bigrams) and predict the word that comes after a bigram.
Character-specific Markov Chain: Extension #2 builds separate bigram-based Markov chains for each character from a dataset, capturing the unique word usage patterns of different characters.
This approach could be useful for tasks like dialogue generation in chatbots, video games, or personalized text generation systems.

https://chatgpt.com/share/66eb56ef-9724-8008-af9e-0f7f3cde030b

#### 5. Recreate (or resume) the previous ChatBot session from question "4" above, and now prompt the ChatBot session that you have a couple extensions of the code to show it, and then show it each of the extensions of the "Markovian ChatBot code" below in turn

Extension #1: Using Bigram Transitions

This extension changes the Markov chain to consider pairs of words (bigrams) rather than individual words.

In [None]:
from collections import defaultdict

word_used2 = defaultdict(int)
next_word2 = defaultdict(lambda: defaultdict(int))
for i, word in enumerate(words[:-2]):
    word_used2[word+' '+words[i+1]] += 1
    next_word2[word+' '+words[i+1]][words[i+2]] += 1

Explanation:

Bigram Transitions: This extension models transitions between bigrams (pairs of consecutive words) instead of individual words.

word_used2: This dictionary tracks how many times each bigram (word + the word immediately following it) appears in the dataset.
next_word2: This stores the frequency of words that follow each bigram.

How it Works:

The program now looks at two consecutive words (word and words[i+1]) instead of just one word.
It constructs a key for word_used2 and next_word2 using the bigram word + words[i+1].
Then, it increments the count for that bigram in word_used2 and tracks the word that follows the bigram (words[i+2]) in next_word2.
This creates a second-order Markov chain, which takes into account not only the current word but also the word before it. This helps in generating more coherent text since the model has more context.
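
The bigram counting can be checked by hand on a short input. This is a sketch with a hypothetical toy sentence in place of the real words list; the local variable bigram is introduced here only for readability.

```python
from collections import defaultdict

# Hypothetical toy text; the real `words` list comes from a larger dataset.
words = "the cat sat on the cat sat down".split()

word_used2 = defaultdict(int)
next_word2 = defaultdict(lambda: defaultdict(int))
for i, word in enumerate(words[:-2]):
    bigram = word + ' ' + words[i+1]       # key made of two consecutive words
    word_used2[bigram] += 1                # how often this bigram occurs
    next_word2[bigram][words[i+2]] += 1    # which word follows it, and how often

print(word_used2["cat sat"])         # 2
print(dict(next_word2["cat sat"]))   # {'on': 1, 'down': 1}
```

Because defaultdict supplies 0 or an empty inner dictionary for missing keys, the if/else checks from the original version are no longer needed.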

Extension #2: Character-based Context

This extension personalizes the model by incorporating character-specific Markov chains. It tracks the words spoken by different characters, using a dataset that includes a character column.

In [None]:
from collections import Counter, defaultdict

characters = Counter("\n"+ avatar.character.str.upper().str.replace(' ','.')+":")
# `avatar` is a dataset, and `character` is one of its columns

nested_dict = lambda: defaultdict(nested_dict)
word_used2C = nested_dict()
next_word2C = nested_dict()

for i, word in enumerate(words[:-2]):
    if word in characters:
        character = word

    if character not in word_used2C:
        word_used2C[character] = dict()
    if word+' '+words[i+1] not in word_used2C[character]:
        word_used2C[character][word+' '+words[i+1]] = 0
    word_used2C[character][word+' '+words[i+1]] += 1

    if character not in next_word2C:
        next_word2C[character] = dict()
    if word+' '+words[i+1] not in next_word2C[character]:
        next_word2C[character][word+' '+words[i+1]] = dict()
    if words[i+2] not in next_word2C[character][word+' '+words[i+1]]:
        next_word2C[character][word+' '+words[i+1]][words[i+2]] = 0
    next_word2C[character][word+' '+words[i+1]][words[i+2]] += 1


Explanation:
This extension uses the characters from a dataset (in this case, avatar), where the dataset has a column named character, to build separate Markov models for each character.

Character Identification:

In [None]:
characters = Counter("\n"+ avatar.character.str.upper().str.replace(' ','.')+":")


This line processes the character column from the avatar dataset: it makes each name uppercase, replaces spaces with dots, prepends a newline character, and appends a colon (:) so that every name has the same standardized form.

characters is a Counter, so it stores these standardized names as keys and maps each one to the number of times it occurs in the column.
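
The effect of the name standardization can be seen without the dataset. This is a sketch that assumes a plain Python list of hypothetical speaker names in place of the avatar.character column, and uses ordinary string methods instead of the pandas .str accessors.

```python
from collections import Counter

# Hypothetical speaker names standing in for the `avatar.character` column
raw_names = ["Aang", "Katara", "Aang", "Fire Lord Ozai"]

# Same transformation as above: uppercase, spaces to dots, newline prefix, colon suffix
characters = Counter("\n" + name.upper().replace(" ", ".") + ":" for name in raw_names)

print(characters["\nAANG:"])             # 2
print("\nFIRE.LORD.OZAI:" in characters)  # True
```

Membership tests such as `word in characters` only look at the keys, so the counts themselves are not needed by the loop that follows.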

Character-specific Dictionaries:

In [None]:
nested_dict = lambda: defaultdict(nested_dict)
word_used2C = nested_dict()
next_word2C = nested_dict()


A nested_dict is defined to create a deep hierarchy of default dictionaries. This will be used to store separate models for each character.
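
How the recursive nested_dict behaves can be seen in a short sketch. The keys used here are hypothetical examples, not values from the dataset.

```python
from collections import defaultdict

nested_dict = lambda: defaultdict(nested_dict)
model = nested_dict()

# Intermediate levels are created automatically on first access
model["AANG:"]["the cat"]["sat"] = 1
print(model["AANG:"]["the cat"]["sat"])  # 1

# Merely reading a missing key also creates it, as another nested defaultdict
_ = model["KATARA:"]
print("KATARA:" in model)                         # True
print(isinstance(model["KATARA:"], defaultdict))  # True
```

Because a missing key at any level yields another dictionary rather than 0, a count cannot be incremented before it is set, which is why the extension code assigns 0 explicitly before each `+= 1`.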

word_used2C and next_word2C are the character-specific versions of word_used2 and next_word2. They store bigram transition data, but the data is now divided by character. In other words, each character has their own Markov chain based on the words they use.

How it Works:

The loop checks whether the current word is one of the standardized character names. If it is, that name becomes the current character, and it stays current until the next character name appears.
The bigram (word + words[i+1]) is then stored specifically under that character in word_used2C and next_word2C.
This builds a separate Markov model for each character, tracking the frequency of bigrams they use and the words that follow them.
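
The per-character bookkeeping can be illustrated on a toy token stream. This is a simplified sketch, not the extension code itself: it assumes a hypothetical words list in which speaker tags precede each speaker's words, uses a plain set for characters, and replaces the explicit zero-initialization with a three-level defaultdict whose innermost values are ints.

```python
from collections import defaultdict

# Hypothetical token stream: a speaker tag followed by that speaker's words
words = ["AANG:", "we", "must", "go", "KATARA:", "we", "must", "stay", "now"]
characters = {"AANG:", "KATARA:"}

# character -> bigram -> following word -> count
next_word2C = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

character = None  # no speaker seen yet
for i, word in enumerate(words[:-2]):
    if word in characters:
        character = word  # this speaker stays current until the next tag
    bigram = word + ' ' + words[i+1]
    next_word2C[character][bigram][words[i+2]] += 1

print(dict(next_word2C["AANG:"]["we must"]))    # {'go': 1}
print(dict(next_word2C["KATARA:"]["we must"]))  # {'stay': 1}
```

The same bigram, "we must", leads to different next words depending on who is speaking, which is what lets each character's generated text differ.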

Summary of How the Full System Works:

Base Model (original code):
Tracks individual word transitions in the text.

Bigram Extension (Extension #1):
Moves from tracking individual words to tracking pairs of words (bigrams), which improves context for generating text.

Character-specific Extension (Extension #2):
Personalizes the Markov chain model by creating a separate Markov chain for each character. It uses bigrams to track the specific word patterns of each character in the dataset.

#### 6. Report on your experience interacting with ChatBots to understand the Monty Hall problem and "Markovian ChatBot" code

1. Discuss how quickly the ChatBot was able to be helpful for each of the above questions, and if so, how?<br><br>
    
2. Discuss whether or not interacting with ChatBot to try to figure things out was frustrating or unhelpful, and if so, how?<br><br>
    
3. Based on your experiences to date (e.g., including using ChatBots to troubleshoot coding errors in the previous homework), provide an overall assessment evaluating the usefulness of ChatBots as tools to help you understand code<br>

1. Monty Hall problem:

In discussing the Monty Hall problem with ChatBot, I found it very quick to explain. It told me why switching doors improves the probability of winning, and illustrated it with simple examples to help me understand the logic behind it. Even though the question itself was a bit counter-intuitive, ChatBot's explanation made sense to me very quickly, and the overall experience was very good, with quick responses and explanations.

“Markovian ChatBot” code:

When I asked ChatBot about the “Markovian ChatBot” code, it immediately provided a clear explanation that helped me understand how the Markov chain predicts relationships between words. It was especially easy to understand when discussing how to extend the code with bigrams and role-specific models. It broke down the logic of the code step-by-step, which helped me a lot to have a deeper understanding of how the code works.

2. Overall, the experience of interacting with ChatBot was positive, especially when my questions were clear enough; it solved problems efficiently by giving quick and accurate answers. However, with more complex or ambiguous questions, I sometimes had to rephrase the question to get a useful response. When I am debugging code in particular, a complex problem can take a couple of interactions to actually solve, which can be a little frustrating, but ultimately I can find my way around.

3. In my experience, ChatBot is very helpful in understanding code and explaining concepts. It provided quick feedback and did especially well when I was dealing with simple problems and small errors. For common programming problems, its instant feedback saved me a lot of time. However, for more complex debugging or in-depth code analysis, it may not provide the most comprehensive help, and I sometimes need to refer to other resources. Therefore, I think ChatBot is a useful tool, but it needs to be used together with other learning resources when dealing with more complex tasks.

#### 7. Reflect on your experience interacting with ChatBot and describe how your perception of AI-driven assistance tools in the context of learning coding, statistics, and data science has been evolving (or not) since joining the course<br><br>

Since I started using ChatBot in my courses to learn programming, statistics, and data science, my view of AI aids has changed a bit. At first, I just thought of it as a gadget that could quickly answer questions. But as I used it more, I realized that it not only helped me solve basic problems, but also explained complex concepts and helped me understand the logic behind the code.

In programming, ChatBot can quickly help me understand algorithms, explain the structure of the code, and also help me troubleshoot some common errors. In statistics and data science, it can simplify complex formulas and methods, making it easier for me to grasp and apply them. However, I also find that when the problem becomes more complex or more specific, ChatBot sometimes cannot give the most suitable solution. This is when I need to refer to other resources such as tutorials or documentation.

Overall, my view of AI tools has gradually changed from an initial aid to a helper in the learning process. Although it is useful, when I encounter more in-depth problems, I still use it in conjunction with other resources to gain a fuller understanding.

#### 8. ChatBots consume text data available on the web or platforms, and thus represent a new way to "search consensus" that condenses and summarizes mainstream human thought<br><br>

1. Start a new ChatBot session and discuss the relevance of learning and adaptability, communication, coding, and statistics and data analysis as skills in the modern world, especially with respect to career opportunities (particularly in the context of the data science industry)<br><br>
    
2. See if ChatBot thinks you could be a statistician or data scientist without coding or doing data analysis, and then transition your ChatBot conversation into a career exploration discussion, using the ChatBot to identify the skills that might be the most valuable for a career that you're interested in<br><br>
    
3. Ask for a summary of this ChatBot session and paste it into your homework notebook (including link(s) to chat log histories if you're using ChatBot)<br><br>
    
4. Paraphrase the assessments and conclusions of your conversation in the form of a reflection on your current thoughts regarding your potential future career(s) and how you can go about building the skills you need to pursue it<br><br>

5. Give your thoughts regarding the helpfulness or limitations of your conversation with a ChatBot, and describe the next steps you would take to pursue this conversation further if you felt the information the ChatBot provides was somewhat high level and general, and perhaps lacked the depth and detailed knowledge of a dedicated subject matter expert who had really taken the time to understand the ins and outs of the industry and career path in question.
<br><br>

Through my conversation with ChatBot, I realized that in the modern world, the ability to learn and adapt, communication skills, and programming and data analysis skills are very important for career opportunities, especially in the field of data science. ChatBot told me that it is almost impossible to be a statistician or a data scientist without programming or data analysis, because these are the foundations for working with data. During the career exploration discussion, ChatBot suggested that I learn programming languages such as Python and R, and learn about statistics and machine learning, which are key skills for becoming a data scientist. After this conversation, I began to reflect on my career direction; data science not only requires technical skill, but also encourages creativity in problem solving. As a result, I feel that my next step should be to focus on improving my programming skills, learning statistics and machine learning in depth, and improving my ability to analyze data.

#### 9. Have you reviewed the course [wiki-textbook](https://github.com/pointOfive/stat130chat130/wiki) and interacted with a ChatBot (or, if that wasn't sufficient, real people in the course piazza discussion board or TA office hours) to help you understand all the material in the tutorial and lecture that you didn't quite follow when you first saw it?<br><br>

Yes, I read the course wiki-textbook and interacted with ChatBot to help me understand all the material in the tutorials and lectures.