Please use Markdown cells in your submission to document your thought process. You are expected to follow the clean code and PEP 8 guidelines as much as you can. You should use docstrings for all function declarations.
In this assignment, you will learn several functions from NumPy. Please check what they do using their documentation or the help() function, and see if you can apply them to solve the homework problems.

# Cosine Similarity
Cosine similarity measures the similarity between two high-dimensional vectors as the cosine of the angle between them. It is widely used in applications such as clustering in machine learning and recommendation systems for e-commerce. See [Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity) for more background on cosine similarity.
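
For reference, the standard definition for two vectors $a$ and $b$ of the same length $n$ can be written as follows. The symbols $\theta$, $a_i$, $b_i$, and $n$ are notation introduced here for illustration; they do not appear in the assignment text.

$$
\cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
$$

The value is undefined when either vector is all zeros, because the denominator is then zero.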

Write a function named "cosine_similarity". The function takes two 1D numpy arrays as inputs, and returns their cosine similarity. 

**Note**: You are allowed to use np.dot() for inner product, np.sum() for summation, np.sqrt() for calculating square root. Do not use other built-in functions from Numpy such as np.linalg.norm(). 


In [2]:
import numpy as np

def cosine_similarity(a, b):
    """
    Calculates the cosine similarity between two 1D numpy arrays.

    Args:
        a (np.array): The first 1D array.
        b (np.array): The second 1D array.

    Returns:
        float: The cosine similarity between a and b.
    """
    # 1. Numerator
    dot_product = np.dot(a, b)
    
    # 2. Norms for the denominator
    # ||a|| = sqrt(a_1^2 + a_2^2 + ...) = sqrt(np.dot(a, a))
    norm_a = np.sqrt(np.dot(a, a))
    norm_b = np.sqrt(np.dot(b, b))

    # 3. Cosine Similarity
    similarity = dot_product / ((norm_a * norm_b) + 1e-8)
    
    return similarity

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(f"Cosine Similarity: {cosine_similarity(a, b)}") 

Cosine Similarity: 0.9746318459002302


# Stock Price Analysis--Part I
Write a function named "count_max_streak". The function takes the historical stock price data, represented as a 1D numpy array, as input, and returns the maximum number of consecutive days when the stock price increased. 

To test your function, please use `tesla_closing_price` as the 1D numpy array input to your function. The historical stock price of Tesla is provided for you to test your code.

```py
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
```
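
To make "maximum number of consecutive days when the stock price increased" concrete, here is a minimal sketch on a toy price series. The helper name `max_increase_streak` and the toy data are assumptions made for illustration, and `np.diff` is just one possible way to find the day-over-day changes.

```python
import numpy as np


def max_increase_streak(prices):
    """Hypothetical helper: longest run of consecutive day-over-day increases."""
    prices = np.asarray(prices, dtype=float)
    if prices.size < 2:
        return 0
    # ups[i] is True when day i+1 closed above day i
    ups = np.diff(prices) > 0
    best = run = 0
    for up in ups:
        # Extend the run on an increase, otherwise reset it
        run = run + 1 if up else 0
        best = max(best, run)
    return best


# Toy data (made up): changes are +, +, -, +, +, +, -
toy_prices = np.array([10.0, 11.0, 12.5, 12.0, 13.0, 14.0, 15.0, 14.5])
print(max_increase_streak(toy_prices))  # 3
```

The longest run here is the three increases from 12.0 up to 15.0; a day with an unchanged or lower price resets the count.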

In [None]:
def count_max_streak(prices):
    """
    Counts the maximum number of consecutive days the stock price increased.

    Args:
        prices (np.array): A 1D numpy array of historical stock prices.

    Returns:
        int: The maximum number of consecutive increasing days.
    """
    if len(prices) < 2:
        return 0
    
    max_streak = 0
    current_streak = 0
    
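    # Start at index 1 so each price can be compared with the previous day
    for i in range(1, len(prices)):
        if prices[i] > prices[i-1]:
            # Price rose: extend the current streak
            current_streak += 1
        else:
            # Streak broken: record it if it is the longest so far, then reset
            max_streak = max(max_streak, current_streak)
            current_streak = 0

    # Account for a streak that runs through the final day
    max_streak = max(max_streak, current_streak)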
    
    return max_streak

# import pandas as pd
# tesla = pd.read_csv('TSLA.csv')
# tesla_np = tesla.to_numpy()
# tesla_closing_price = tesla_np[:, 4].astype(float) 
# print(f"Tesla's max increasing streak: {count_max_streak(tesla_closing_price)}")

# Stock Price Analysis--Part 2
Write a function named "detect_crash". The function takes a 2D numpy array as input, where the first column represents the historical opening stock price, and the second column represents the historical closing stock price. The function should return a 2D numpy array whose first column contains the indices of the days on which crashes occurred, and whose second column contains the amount of each price drop.

**Note**: We say there is a stock price crash if the closing price is less than the opening price.

**Hint**: You may find functions np.where() and np.column_stack() helpful. You are also welcome to come up with other solutions without using these functions. 

To test your function, please use `open_close_prices` as the 2D numpy array input to your function. You can download the historical stock price of Tesla here.

```py
import numpy as np
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_opening_price = tesla_np[:, 1] # extract the opening stock price of Tesla
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
open_close_prices = np.column_stack((tesla_opening_price, tesla_closing_price)) # form a 2D numpy array using opening and closing prices
```
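
The two functions named in the hint can be tried on a tiny made-up array first. This is only a sketch of how `np.where()` and `np.column_stack()` fit together; the array `toy` and the variable names are assumptions, not part of the assignment data.

```python
import numpy as np

# Made-up data: column 0 = opening price, column 1 = closing price
toy = np.array([
    [10.0, 12.0],   # day 0: close above open
    [12.0, 11.0],   # day 1: crash, drop of 1.0
    [11.0, 11.0],   # day 2: unchanged, not a crash
    [11.0, 9.5],    # day 3: crash, drop of 1.5
])

# Boolean mask of days that closed below the open
is_crash = toy[:, 1] < toy[:, 0]

# np.where on a 1D mask returns a 1-tuple holding the matching indices
crash_idx = np.where(is_crash)[0]

# Drop amounts for the crash days only (open - close)
drops = toy[crash_idx, 0] - toy[crash_idx, 1]

# Pair each index with its drop; the indices become floats in the result
table = np.column_stack((crash_idx, drops))
print(table)
```

The result has one row per crash day, here days 1 and 3 with drops of 1.0 and 1.5.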

In [None]:
def detect_crash(open_close_price):
    """
    Detects stock price crashes (closing price < opening price).

    Args:
        open_close_price (np.array): A 2D numpy array where:
                                      column 0 is the opening price
                                      column 1 is the closing price

    Returns:
        np.array: A 2D numpy array where:
                  column 0 contains the indices of crash days
                  column 1 contains the price drop amount (open - close)
    """
    
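    # Cast to float in case the input is an object array (e.g. from DataFrame.to_numpy())
    open_close_price = open_close_price.astype(float)

    open_prices = open_close_price[:, 0]
    close_prices = open_close_price[:, 1]

    # Boolean mask: True on days that closed below the open
    condition = close_prices < open_prices

    # np.where returns a tuple; element 0 holds the row indices
    crash_indices = np.where(condition)[0]

    # Size of each drop, reported as a positive number (open - close)
    drop_amounts = open_prices[crash_indices] - close_prices[crash_indices]

    # Note: column_stack promotes the integer indices to float
    result = np.column_stack((crash_indices, drop_amounts))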
    
    return result


# import pandas as pd
# tesla = pd.read_csv('TSLA.csv')
# tesla_np = tesla.to_numpy()
# tesla_opening_price = tesla_np[:, 1]
# tesla_closing_price = tesla_np[:, 4]
# open_close_prices = np.column_stack((tesla_opening_price, tesla_closing_price))
# crashes = detect_crash(open_close_prices)
# print("Crash days (index, drop amount):")
# print(crashes)

# Infectious Disease Simulation

Write a function named "simulate_disease" to simulate how an infectious disease may propagate over a network. Given the infection probabilities $P(t)$ of all individuals at time $t$ and a network connection matrix $M$, the infection probabilities at time step $t+1$ are computed as $P(t+1) = P(t)\,M$. Here are the requirements of this function:

- The function uses a 1D numpy array to denote the probabilities of all individuals within the network being infected.
- The network connection among individuals is represented using a 2D symmetric, row-stochastic numpy array with all entries non-negative (a row-stochastic matrix is one whose elements add up to 1 in each row). If there are $N$ individuals in the network, the matrix is of dimension $N \times N$. The $(i, j)$-th entry of the matrix represents the connection strength between individuals $i$ and $j$.
- Given the initial probabilities, the network connections, and the prediction time horizon, the function returns the probability of each individual in the network being infected at the end of the prediction time horizon.

You can use the following code snippet to generate a random connection matrix of dimension $N \times N$ to verify your function:
```py
np.random.seed(20)
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)
```
You can use the following code snippet to generate an initial infection probability: `np.random.rand(N, 1)`
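
Before writing the function, it may help to check the generated inputs and trace a few update steps by hand. The sketch below assumes the update multiplies the probability vector by the connection matrix from the left (`p @ M`); the values `N = 4` and `T = 3` and all variable names other than those in the snippets above are illustrative choices.

```python
import numpy as np

N = 4  # small example size (assumption)
np.random.seed(20)
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)

# Every row should sum to 1 after the normalization
row_sums = connection_matrix.sum(axis=1)
print(np.allclose(row_sums, 1.0))  # True

# np.random.rand(N, 1) is a column vector; flatten it to a 1D array
initial_prob = np.random.rand(N, 1).flatten()

# Apply the assumed update rule p <- p @ M one step at a time
T = 3  # number of time steps (assumption)
stepwise = initial_prob.copy()
for _ in range(T):
    stepwise = stepwise @ connection_matrix

# Cross-check: T repeated steps equal one multiplication by M to the power T
closed_form = initial_prob @ np.linalg.matrix_power(connection_matrix, T)
print(np.allclose(stepwise, closed_form))  # True
```

Comparing the loop against `np.linalg.matrix_power` is a cheap way to confirm that the loop runs the intended number of steps.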

In [None]:
def simulate_disease(initial_prob, connection_matrix, time_horizon):
    """
    Simulates the disease propagation over a network for a given time horizon.

    Args:
        initial_prob (np.array): A 1D array (N,) of initial infection probabilities.
        connection_matrix (np.array): A 2D array (N, N) representing the network.
        time_horizon (int): The number of time steps to simulate.

    Returns:
        np.array: A 1D array (N,) of infection probabilities after T steps.
    """
    current_prob = initial_prob

    for _ in range(time_horizon):
        # One propagation step: P(t+1) = P(t) M
        current_prob = np.dot(current_prob, connection_matrix)
        # Equivalent form: current_prob = current_prob @ connection_matrix

    return current_prob

# N = 10
# np.random.seed(20)
# matrix = np.random.rand(N, N)
# symmetric_matrix = (matrix + matrix.T) / 2
# connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)

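# np.random.rand(N, 1) returns a column vector; flatten it to the 1D shape the function expects
# initial_prob_col = np.random.rand(N, 1)
# initial_prob_1d = initial_prob_col.flatten()

# T = 5  # example time horizon; any non-negative integer works
# final_probs = simulate_disease(initial_prob_1d, connection_matrix, T)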
# print(f"Probabilities after {T} steps:")
# print(final_probs)

# Music Composition

Write a function named "compose_music" that takes a music sheet in A major scale as input and plays a piece of music based on the music sheet. The music sheet specifies a sequence of notes and their durations (see below for an example). You can use the in-class example "generate_sine" function to generate music notes with `fs = 8000`.

You can test your code using the example music_sheet below (simplified from "A Better Tomorrow (Mark's theme)" by Joseph Koo).
```py
music_sheet = [("Note_Cs", 0.5), ("Note_D", 0.5), ("Note_B", 0.5), ("Note_A_high", 1.5), ("Note_A_high", 0.5), ("Note_Gs", 0.5), ("Note_Fs", 0.5), ("Note_Gs", 0.5), ("Note_E", 0.75)]   
```
**Extra challenge**: Can you revise the function to play chords based on a given music sheet? A chord refers to the combination of two musical notes. (Extra challenge is not graded).
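
One possible starting point for the extra challenge is to add the sine waves of the individual notes sample by sample. The sketch below is an illustration under stated assumptions, not the in-class `generate_sine`: the helper name `chord_wave` is hypothetical, and the two frequencies (A4 = 440.00 Hz and C#5 = 554.37 Hz) are picked only as an example.

```python
import numpy as np


def chord_wave(frequencies, duration, fs=8000):
    """Hypothetical helper: average several sine waves into one chord."""
    # Sample times in seconds for the requested duration
    t = np.arange(int(fs * duration)) / fs
    # One sine wave per note, all on the same time axis
    waves = [np.sin(2 * np.pi * f * t) for f in frequencies]
    # Averaging (instead of summing) keeps the result within [-1, 1]
    return np.mean(waves, axis=0)


# Two notes played together for half a second
dyad = chord_wave([440.00, 554.37], duration=0.5)
print(dyad.shape)  # (4000,)
# In a notebook, the array could then be played with
# IPython.display.Audio(dyad, rate=8000).
```

Averaging rather than summing is a design choice that avoids clipping when more notes are stacked.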

In [10]:
import numpy as np
from IPython.display import Audio, display  # notebook audio player and display helper

def generate_sine(frequency, duration, fs):
    """
    Generates a sine wave for a given frequency and duration.

    Args:
        frequency (float): Frequency in Hz.
        duration (float): Duration in seconds.
        fs (int): Sampling rate (samples per second).

    Returns:
        np.array: A 1D numpy array representing the sine wave.
    """
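    # Sample times in seconds: int(fs * duration) evenly spaced points, end excluded
    t = np.linspace(0., duration, int(fs * duration), endpoint=False)
    # Use half of the int16 range to leave headroom and avoid clipping
    amplitude = np.iinfo(np.int16).max * 0.5
    wave = amplitude * np.sin(2 * np.pi * frequency * t)

    # Short linear fade-in and fade-out to avoid clicks at note boundaries
    fade_len = int(fs * 0.01) # 10ms fade
    fade_in = np.linspace(0., 1., fade_len)
    fade_out = np.linspace(1., 0., fade_len)

    wave[:fade_len] = wave[:fade_len] * fade_in
    wave[-fade_len:] = wave[-fade_len:] * fade_out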
    
    return wave.astype(np.int16)

def compose_music(music_sheet):
    """
    Composes and returns a playable audio object from a music sheet.

    Args:
        music_sheet (list): A list of (note_name, duration) tuples.

    Returns:
        IPython.display.Audio: A playable audio object.
    """
    fs = 8000 
    
    
    note_frequencies = {
        "Note_E": 329.63,     # E4
        "Note_Fs": 369.99,    # F#4
        "Note_Gs": 415.30,    # G#4
        "Note_A": 440.00,     # A4
        "Note_B": 493.88,     # B4
        "Note_Cs": 554.37,    # C#5
        "Note_D": 587.33,     # D5
        "Note_E_high": 659.25, # E5 
        "Note_Fs_high": 739.99, # F#5 
        "Note_Gs_high": 830.61, # G#5 
        "Note_A_high": 880.00,  # A5
    }
    
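    # Mapping actually used for lookup: the sheet's note names mapped to the
    # octave chosen for this melody (note_frequencies above is reference only)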
    sheet_frequencies = {
        "Note_E": 659.25,      # E5
        "Note_Fs": 739.99,     # F#5
        "Note_Gs": 830.61,     # G#5
        "Note_A_high": 880.00, # A5
        "Note_B": 493.88,      # B4
        "Note_D": 587.33,      # D5
        "Note_Cs": 554.37,     # C#5
    }

    song_pieces = []
    
    for note_name, duration in music_sheet:
        # Unknown note names map to 0, which is treated as a rest
        frequency = sheet_frequencies.get(note_name, 0)

        if frequency == 0:
            print(f"Unknown note '{note_name}': inserting silence instead.")
            wave = np.zeros(int(fs * duration)).astype(np.int16)
        else:
            wave = generate_sine(frequency, duration, fs)

        song_pieces.append(wave)

    # Join the per-note waveforms into one continuous signal
    full_song = np.concatenate(song_pieces)

    return Audio(full_song, rate=fs)


# music_sheet = [("Note_Cs", 0.5), ("Note_D", 0.5), ("Note_B", 0.5), ("Note_A_high", 1.5), 
#                ("Note_A_high", 0.5), ("Note_Gs", 0.5), ("Note_Fs", 0.5), ("Note_Gs", 0.5), 
#                ("Note_E", 0.75)]
# 
# audio_player = compose_music(music_sheet)
# display(audio_player) #

# Rock Paper Scissor
Write a function named `play_rock_paper_scissor` so that a user can play a game of Rock Paper Scissor against the computer. Here are the expected functionalities:
- For each round of the game, the function should ask the user whether the user wants to start playing by inputting 0 or 1. User input 0 indicates no, and 1 indicates yes.
- If the user wants to start the game, prompt the user to input a choice of action among Rock, Paper, Scissor. For now, you can safely assume the user's input is always among these three options. Randomly generate an action among Rock, Paper, Scissor for the computer. Please refer to our in-class practice problem for an example on random number generation.
- Compare the user input with the computer action, and print the winner based on the following rules:
    - Rock beats Scissors
    - Scissors beats Paper
    - Paper beats Rock
- Prompt the user whether the game should continue or not as we did in the first step.
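
The three rules above can also be stored as data instead of a chain of conditions. The sketch below is one possible way to do that; the names `BEATS` and `decide_winner` are hypothetical, and the spelling "Scissor" follows the prompt options used in this problem.

```python
# Each key beats the value it maps to
BEATS = {"Rock": "Scissor", "Scissor": "Paper", "Paper": "Rock"}


def decide_winner(user_choice, computer_choice):
    """Hypothetical helper: return "Tie", "User", or "Computer"."""
    if user_choice == computer_choice:
        return "Tie"
    if BEATS[user_choice] == computer_choice:
        return "User"
    # Not a tie and the user did not win, so the computer did
    return "Computer"


print(decide_winner("Rock", "Scissor"))  # User
print(decide_winner("Paper", "Scissor"))  # Computer
print(decide_winner("Paper", "Paper"))  # Tie
```

Keeping the comparison in a pure function like this makes it testable without typing input at a prompt.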

In [11]:
import random

def play_rock_paper_scissor():
    """
    Runs an interactive game of Rock, Paper, Scissor between the user and
    the computer.

    The game loop continues until the user decides to quit.
    """
    
    print("--- Welcome to Rock, Paper, Scissor! ---")
    
    # Define the game options
    options = ["Rock", "Paper", "Scissor"]

    # Start the main game loop
    while True:
        # Step 1: Ask the user if they want to play
        user_start = input("\nDo you want to start a new round? (Enter 1 for yes, 0 for no): ")

        # Case 1: User wants to quit
        if user_start == '0':
            print("Thanks for playing. Goodbye! ")
            break  # Exit the while loop to end the game

        # Case 2: User wants to play
        elif user_start == '1':
            
            # Step 2: Get choices
            # Get the user's choice
            user_choice = input("Enter your choice (Rock, Paper, Scissor): ")
            
            # Get the computer's random choice
            computer_choice = random.choice(options)

            # Show the choices
            print(f"\n   You chose: {user_choice}")
            print(f"Computer chose: {computer_choice}\n")

            # Step 3: Compare choices and print the winner
            
            # Check for a tie
            if user_choice == computer_choice:
                print(">>> It's a Tie! ")

            # Check for all user winning conditions
            elif (user_choice == "Rock" and computer_choice == "Scissor") or \
                 (user_choice == "Scissor" and computer_choice == "Paper") or \
                 (user_choice == "Paper" and computer_choice == "Rock"):
                
                print(f">>> You win!  ({user_choice} beats {computer_choice})")

            # If it's not a tie and the user didn't win, the computer must have won
            else:
                print(f">>> Computer wins!  ({computer_choice} beats {user_choice})")
            
            print("-" * 30) # Add a separator for clarity

        # Case 3: User entered invalid input (not 0 or 1)
        else:
            print("Invalid input. Please enter 1 to play or 0 to quit.")

# --- How to run the game ---
# To play, uncomment the line below and run this script.
# play_rock_paper_scissor()

# Linear Regression
- You are given a straight line learned from linear regression, denoted as $\hat{y}=ax+b$. Your task is to use a Python function named `eval_predict` to assess prediction quality over a dataset. Please first define the metric used for the assessment. Then let your program calculate and return the value of that metric.
- To learn a high-quality straight line of the form $\hat{y}=ax+b$, we need to find optimal values for $a$ and $b$. In what follows, you will perform a grid search: you try many $(a, b)$ pairs on a rectangular grid and pick the one with the best metric (defined in Problem 7) on the dataset. For example, given a ∈ {0.0, 0.5, 1.0} and b ∈ {-1.0, 0.0, 1.0}, you evaluate all 9 combinations, compute the metric for each, and pick the best. Write a Python function named `grid_search` to find the best pair for given options of $a$ and $b$.
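
The 9-combination example can be traced with a few lines of code. This sketch assumes mean squared error as the metric, which is one common choice and not the only valid one; the helper `mse` and the toy data (points lying exactly on $y = 0.5x + 1$) are made up for illustration.

```python
from itertools import product

import numpy as np


def mse(a, b, x, y):
    """Hypothetical metric: mean squared error of y_hat = a * x + b."""
    return float(np.mean((y - (a * x + b)) ** 2))


# Toy data that lies exactly on the line y = 0.5 * x + 1.0
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 0.5 * x + 1.0

a_options = [0.0, 0.5, 1.0]
b_options = [-1.0, 0.0, 1.0]

# product() enumerates every (a, b) pair on the rectangular grid
candidates = list(product(a_options, b_options))
print(len(candidates))  # 9

# Lower MSE is better, so the best pair is the one that minimizes it
best_a, best_b = min(candidates, key=lambda ab: mse(ab[0], ab[1], x, y))
print(best_a, best_b)  # 0.5 1.0
```

Because the toy data has no noise, the true pair is on the grid and reaches an error of exactly zero; with real data the best grid point is only as good as the grid resolution allows.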

In [None]:
def eval_predict(a, b, x_data, y_data):
    """
    Assesses the prediction quality of a linear model y_hat = ax + b
    using the Mean Squared Error (MSE) metric.

    Args:
        a (float): The slope parameter.
        b (float): The intercept parameter.
        x_data (np.array): The 1D array of input features.
        y_data (np.array): The 1D array of true target values.

    Returns:
        float: The Mean Squared Error (MSE) of the predictions.
    """
    # 1. Predict y for every x with the current line
    y_pred = a * x_data + b

    # 2. Average the squared residuals
    # MSE = mean((y_true - y_pred)^2)
    mse = np.mean((y_data - y_pred) ** 2)
    
    return mse

def grid_search(a_options, b_options, x_data, y_data):
    """
    Performs a grid search to find the best (a, b) pair that
    minimizes the MSE metric.

    Args:
        a_options (list or np.array): A list of 'a' values to try.
        b_options (list or np.array): A list of 'b' values to try.
        x_data (np.array): The 1D array of input features.
        y_data (np.array): The 1D array of true target values.

    Returns:
        tuple: A tuple (best_a, best_b) containing the best pair found.
    """
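    best_metric = float('inf') # any real MSE will be smaller than this
    best_pair = (None, None)

    # Try every (a, b) combination on the grid
    for a in a_options:
        for b in b_options:
            # Score the current candidate line
            current_metric = eval_predict(a, b, x_data, y_data)

            # Keep the pair with the lowest error seen so far
            if current_metric < best_metric:
                best_metric = current_metric
                best_pair = (a, b)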
                
    print(f"Grid search complete. Best MSE found: {best_metric}")
    return best_pair
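
# --- How to test ---
# Synthetic data: a noisy line with known slope and intercept
# np.random.seed(42)
# true_a, true_b = 2.5, 1.0
# x_data = np.linspace(0, 10, 50)
# y_data = true_a * x_data + true_b + np.random.normal(0, 1.5, 50) # add Gaussian noise

# Candidate grids centered on the true values
# a_options = np.linspace(2.0, 3.0, 10) # 10 slopes between 2.0 and 3.0
# b_options = np.linspace(0.5, 1.5, 10) # 10 intercepts between 0.5 and 1.5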

# best_a, best_b = grid_search(a_options, b_options, x_data, y_data)
# print(f"Best pair found: a={best_a:.2f}, b={best_b:.2f}")

# Sign-Up Validator
Imagine you are building a sign-up page for a new app. To keep the user database clean, you must validate all inputs before creating an account.
Instead of relying on advanced libraries, you’ll use pure Python basics.

### Username Validation
Write a function `validate_username(username)` that:
- Returns False if the username is empty, shorter than 3, or longer than 20 characters.
- Returns False if it contains anything besides letters, digits, underscore _, dot ., or hyphen -.
- Returns True if valid.

### Email Validation (Simple Version)
Write a function `validate_email(email)` that:
- Returns False if it doesn’t contain an "@".
- Splits the email into local part and domain (use .partition("@")).
- Ensures the domain contains at least one "." and doesn’t start/end with ".".
- Returns True if valid, False otherwise.
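
The behavior of `.partition("@")` is easy to check interactively before building the validator around it. The sketch below only demonstrates the built-in string method and the domain checks listed above; the sample addresses and variable names are illustrative.

```python
# str.partition splits at the FIRST occurrence of the separator and
# always returns a 3-tuple: (before, separator, after)
local, sep, domain = "Alice@example.com".partition("@")
print(local, sep, domain)  # Alice @ example.com

# With no "@" present, the separator and the tail are empty strings
no_at = "bad-email.com".partition("@")
print(no_at)  # ('bad-email.com', '', '')

# Domain checks: at least one dot, and no dot at either end
domain_ok = (
    "." in domain
    and not domain.startswith(".")
    and not domain.endswith(".")
)
print(domain_ok)  # True
```

Since the separator comes back empty when "@" is missing, testing it covers the first rule without a separate `in` check.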

### Phone Normalization (Simplified)
Write a function `normalize_phone(phone, default_cc="+1")` that:
- Removes all non-digit characters.
- If the number doesn’t start with "+", prepend the default country code.
- Check that the final string has between 9 and 15 digits (excluding +).
- Return the normalized phone number (e.g., "+14155551212"), or "Invalid" if not valid.
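
Tracing the steps on the sample number from this problem shows what each rule contributes. This is a sketch of the listed steps only, not a full validator; the variable names are illustrative.

```python
raw = "(415) 555-1212"
default_cc = "+1"

# Step 1: keep only the digit characters
digits = "".join(ch for ch in raw if ch.isdigit())
print(digits)  # 4155551212

# Step 2: the input has no leading "+", so the default country code is added
has_plus = raw.strip().startswith("+")
candidate = "+" + digits if has_plus else default_cc + digits
print(candidate)  # +14155551212

# Step 3: count the digits without the leading "+"
digit_count = len(candidate) - 1
print(digit_count)  # 11
print(9 <= digit_count <= 15)  # True
```

Note that the "+" check has to look at the original input, because the digit filter in step 1 removes the "+" along with the other non-digit characters.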

### Sign-Up Aggregator
Write a function `validate_signup(user_info)` where `user_info` is a dictionary, e.g.:
```py
{
    "username": "alice_01",
    "email": "Alice@example.com",
    "password": "Strong!Pass1",
    "phone": "(415) 555-1212",
    "country": "us"
}
```
It should:
- Call each validation function.
- Collect any errors in a dictionary.
- Return a result like:
```py
{
    "ok": True,
    "errors": {},
    "normalized": {
        "username": "alice_01",
        "email": "alice@example.com",
        "phone": "+14155551212",
        "country": "US"
    }
}
```
If there are errors, `ok` should be False and errors should explain them.


In [None]:
import string  # provides ascii_letters and digits for the allowed-character set

# --- Part 1: Username Validation ---
def validate_username(username):
    """
    Validates a username based on length and allowed characters.
    """
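    # 1. Length check (an empty username fails here too)
    if not (3 <= len(username) <= 20):
        return False

    # 2. Allowed characters: ASCII letters, digits, underscore, dot, hyphen
    allowed_chars = set(string.ascii_letters + string.digits + "_.-")

    # Reject the username at the first disallowed character
    for char in username:
        if char not in allowed_chars:
            return False

    # All checks passed
    return True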

# --- Part 2: Email Validation ---
def validate_email(email):
    """
    Performs a simple validation on an email address.
    """
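    # 1. Only strings can be emails; anything else is rejected
    if not isinstance(email, str):
        return False

    # 2. Split at the first "@"; at_symbol is "" when no "@" is present
    local, at_symbol, domain = email.partition("@")
    if not at_symbol:
        return False

    # 3. Both the local part and the domain must be non-empty
    if not local or not domain:
        return False

    # 4. The domain must contain at least one "."
    if "." not in domain:
        return False

    # 5. The domain must not start or end with "."
    if domain.startswith(".") or domain.endswith("."):
        return False

    return True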

# --- Part 3: Phone Normalization ---
def normalize_phone(phone, default_cc="+1"):
    """
    Normalizes a phone number by removing non-digits and adding a country code.
    """
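    # 1. Keep only the digit characters
    digits = "".join(char for char in phone if char.isdigit())

    # 2. Add the leading "+", using the default country code when the input has none
    if phone.strip().startswith("+"):
        # The input already carries a country code; the "+" was dropped
        # with the other non-digits, so restore it
        normalized = "+" + digits
    else:
        # No country code given: prepend the default one
        normalized = default_cc + digits

    # 3. Validate the digit count, excluding the leading "+"
    number_part = normalized[1:] # assumes default_cc starts with "+"
    if not (9 <= len(number_part) <= 15):
        return "Invalid"

    return normalized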

# --- Part 4: Sign-Up Aggregator ---
def validate_signup(user_info):
    """
    Aggregates all validation functions for a user_info dictionary.
    """
    errors = {}
    normalized = {}
    
    # --- Username ---
    username = user_info.get("username", "")
    if not validate_username(username):
        errors["username"] = "Username must be 3-20 chars and can only contain letters, digits, _, ., -"
    else:
        normalized["username"] = username
        
    # --- Email ---
    email = user_info.get("email", "")
    if not validate_email(email):
        errors["email"] = "Invalid email format. Must contain @ and a valid domain."
    else:
        normalized["email"] = email.lower() # 
        
    # --- Phone ---
    phone = user_info.get("phone", "")
    
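    # Map the user's country to a dialing code, falling back to "+1"
    cc_map = {"us": "+1", "cn": "+86"} # small illustrative table
    country_code = user_info.get("country", "us").lower()
    default_cc = cc_map.get(country_code, "+1") # unknown countries default to "+1"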
    
    norm_phone = normalize_phone(phone, default_cc=default_cc)
    
    if norm_phone == "Invalid":
        errors["phone"] = "Invalid phone number. Must have 9-15 digits."
    else:
        normalized["phone"] = norm_phone
        
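    # --- Country & Password (just pass-through normalization) ---
    # The country code is upper-cased for the normalized output
    normalized["country"] = user_info.get("country", "").upper()
    # The password is deliberately left out of the normalized output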
    
    # --- Final Result ---
    ok = (len(errors) == 0)
    
    return {
        "ok": ok,
        "errors": errors,
        "normalized": normalized
    }

# --- How to test ---
# good_user = {
#     "username": "alice_01",
#     "email": "Alice@example.com",
#     "password": "Strong!Pass1",
#     "phone": "(415) 555-1212",
#     "country": "us"
# }

# bad_user = {
#     "username": "a!", # too short and contains "!"
#     "email": "bad-email.com", # missing "@"
#     "phone": "123", # too few digits
# }

# print("--- Good User Validation ---")
# print(validate_signup(good_user))
# print("\n--- Bad User Validation ---")
# print(validate_signup(bad_user))