Please use Markdown cells in your submission to document your thought process. You are expected to follow clean-code and PEP 8 guidelines as closely as you can, and to write a docstring for every function you define.
In this assignment, you will learn several functions from NumPy. Check what each one does using its documentation or the `help()` function, and see whether you can apply it to the homework problems.

# Cosine Similarity
Cosine similarity measures the similarity between two high-dimensional vectors. It is widely used in applications such as clustering in machine learning and recommendation systems for e-commerce companies. For more background on cosine similarity, see [Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity).

Write a function named "cosine_similarity". The function takes two 1D numpy arrays as inputs, and returns their cosine similarity. 

**Note**: You are allowed to use np.dot() for inner product, np.sum() for summation, np.sqrt() for calculating square root. Do not use other built-in functions from Numpy such as np.linalg.norm(). 
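
As a hand-checkable illustration of the formula $\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, the sketch below evaluates it for two small vectors using only the permitted functions. The vectors are chosen here purely for illustration; this is a worked example, not the required function.

```python
import numpy as np

# Illustrative vectors (not part of the assignment data)
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

dot = np.dot(a, b)                  # 1*1 + 0*1 = 1
norm_a = np.sqrt(np.sum(a ** 2))    # sqrt(1) = 1
norm_b = np.sqrt(np.sum(b ** 2))    # sqrt(2)

cos_sim = dot / (norm_a * norm_b)   # 1 / sqrt(2), about 0.7071
print(cos_sim)
```

The two vectors are 45 degrees apart, so the result should agree with $\cos 45^\circ$.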


In [10]:
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """
    Compute cosine similarity between two 1D numpy arrays a and b.
    cos_sim = (a · b) / (||a|| * ||b||)
    """
    # inner product
    dot = np.dot(a, b)

    # L2 norms of a and b
    norm_a = np.sqrt(np.dot(a, a))
    norm_b = np.sqrt(np.dot(b, b))

    # avoid division by zero if one vector is all zeros
    if norm_a == 0 or norm_b == 0:
        return 0.0

    return dot / (norm_a * norm_b)


# Stock Price Analysis--Part 1
Write a function named "count_max_streak". The function takes the historical stock price data, represented as a 1D numpy array, as input, and returns the maximum number of consecutive days when the stock price increased. 

To test your function, please use `tesla_closing_price` as the 1D numpy array input to your function. The historical stock price of Tesla is provided for you to test your code.

```py
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
```

In [11]:
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
import numpy as np

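def count_max_streak(price: np.ndarray) -> int:
    """
    Return the maximum number of consecutive days on which the
    stock price increased relative to the previous day.
    """
    max_streak = 0
    cur = 0

    for i in range(1, len(price)):
        if price[i] > price[i - 1]:
            cur += 1
            max_streak = max(max_streak, cur)
        else:
            # the price did not rise, so the current streak ends
            cur = 0
    return max_streak

print(count_max_streak(tesla_closing_price))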
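
For a quick check of the streak logic on data small enough to verify by hand, here is a minimal sketch. The helper name `max_increase_streak` and the toy prices are made up for this example, and `np.diff` is used only as one possible way to find the day-over-day changes.

```python
import numpy as np

def max_increase_streak(prices):
    """Hypothetical helper: longest run of consecutive day-over-day increases."""
    # True wherever a day's price is higher than the previous day's price
    rising = np.diff(np.asarray(prices, dtype=float)) > 0
    best = current = 0
    for up in rising:
        # extend the streak on a rise, reset it otherwise
        current = current + 1 if up else 0
        best = max(best, current)
    return best

# Changes: up, up, down, up, up, up, down
toy_prices = [10, 11, 12, 11, 12, 13, 14, 9]
print(max_increase_streak(toy_prices))  # 3
```

The key point is that the streak counter must reset whenever the price fails to rise; without the reset, the function would count every rising day in the whole series.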


# Stock Price Analysis--Part 2
Write a function named "detect_crash". The function takes a 2D numpy array as input, where the first column represents the historical opening stock price, and the second column represents the historical closing stock price. The function should return a 2D numpy array whose first column contains the indices of the days on which crashes occurred, and whose second column contains the amount of each price drop.

**Note**: We say there is a stock price crash if the closing price is less than the opening price.

**Hint**: You may find functions np.where() and np.column_stack() helpful. You are also welcome to come up with other solutions without using these functions. 

To test your function, please use `open_close_price` as the 2D numpy array input to your function. You can download the historical stock price of Tesla here.

```py
import pandas as pd
tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_opening_price = tesla_np[:, 1] # extract the opening stock price of Tesla
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
open_close_price = np.column_stack((tesla_opening_price, tesla_closing_price)) # form a 2D numpy array using opening and closing prices 
```

In [12]:
import numpy as np
import pandas as pd

tesla = pd.read_csv('TSLA.csv') # load Tesla stock price from csv file
tesla_np = tesla.to_numpy() # convert the data to numpy arrays
tesla_opening_price = tesla_np[:, 1] # extract the opening stock price of Tesla
tesla_closing_price = tesla_np[:, 4] # extract the closing stock price of Tesla
open_close_price = np.column_stack((tesla_opening_price, tesla_closing_price)) # form a 2D numpy array using opening and closing prices

def detect_crash(open_close: np.ndarray) -> np.ndarray:
    """
    open_close: 2D numpy array with shape (n_days, 2)
        column 0 = opening prices
        column 1 = closing prices

    Return a 2D numpy array where:
        column 0 = indices (0-based) of days with a crash
        column 1 = amount of price drop (open - close) on those days
    """
    # Separate opening and closing prices
    opening = open_close[:, 0]
    closing = open_close[:, 1]

    # Boolean mask: True where there is a crash
    crash_mask = closing < opening

    # Indices where crashes occurred
    crash_indices = np.where(crash_mask)[0]

    # Amount of drop on those days
    drops = opening[crash_mask] - closing[crash_mask]

    # Stack indices and drops into a 2D array (n_crashes x 2)
    result = np.column_stack((crash_indices, drops))

    return result

crashes = detect_crash(open_close_price)
print(crashes)



[[1 1.9600010000000019]
 [2 3.040001]
 [3 3.7999989999999997]
 ...
 [2223 3.3699960000000146]
 [2224 4.840011000000004]
 [2225 1.4199979999999925]]
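
To make the crash rule easy to verify by hand, the sketch below applies the same `np.where` and `np.column_stack` steps to a four-day toy array. The prices are invented for this example.

```python
import numpy as np

toy_open_close = np.array([
    [10.0, 12.0],   # day 0: close > open, no crash
    [12.0, 11.0],   # day 1: crash, drop of 1.0
    [11.0, 11.0],   # day 2: unchanged, no crash
    [11.0,  8.5],   # day 3: crash, drop of 2.5
])

opening, closing = toy_open_close[:, 0], toy_open_close[:, 1]

# Indices of days whose closing price is strictly below the opening price
crash_idx = np.where(closing < opening)[0]

# Pair each crash index with its drop (open - close)
toy_crashes = np.column_stack((crash_idx, opening[crash_idx] - closing[crash_idx]))
print(toy_crashes)
```

Note that `np.column_stack` returns a single array with one dtype, so the integer indices are stored as floats next to the float drops.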


# Infectious Disease Simulation

Write a function named "simulate_disease" to simulate how an infectious disease may propagate over a network. Given the infection probabilities $p_t$ of all individuals at time $t$ and a network connection matrix $C$, the infection probabilities at time step $t+1$ are computed as $p_{t+1} = C p_t$. Here are the requirements of this function:

- The function uses a 1D numpy array to denote the probabilities of all individuals within the network of being infected.
- The network connection among individuals is represented using a 2D symmetric, row-stochastic numpy array, with all entries being non-negative (a row-stochastic matrix is one whose elements add up to 1 in each row). If there are $N$ individuals in the network, the matrix is of dimension $N \times N$. The $(i, j)$-th entry of the matrix represents the connection strength between individuals $i$ and $j$.
- Given the initial probability, the network connections, and the prediction time horizon, the function returns the probability of each individual in the network being infected at the end of the prediction time horizon.

You can use the following code snippet to generate a random connection matrix of dimension $N \times N$ to verify your function:
```py
np.random.seed(20)
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)
```
You can use the following code snippet to generate an initial infection probability: `np.random.rand(N, 1)`

In [13]:
import numpy as np

def simulate_disease(initial_prob: np.ndarray,
                     connection_matrix: np.ndarray,
                     T: int) -> np.ndarray:
    """
    Simulate how an infectious disease spreads in a network.

    Parameters
    ----------
    initial_prob : np.ndarray
        1D array of length N (or shape (N, 1)) representing the initial
        infection probability of each individual in the network.
    connection_matrix : np.ndarray
        2D array of shape (N, N). It is symmetric and row-stochastic:
        - all entries are non-negative
        - each row sums to 1
        entry [i, j] represents how much person j influences person i.
    T : int
        Number of time steps to simulate.

    Returns
    -------
    np.ndarray
        Infection probabilities after T time steps. Shape matches the
        shape of `initial_prob` (1D or column vector).
    """

    # Ensure we work internally with a column vector of shape (N, 1)
    # so that matrix multiplication is clear.
    was_1d = False
    prob = initial_prob
    if prob.ndim == 1:
        # Remember that original was 1D so we can return 1D later
        was_1d = True
        prob = prob.reshape(-1, 1)  # (N,) -> (N, 1)

    # Repeatedly apply: p_{t+1} = C @ p_t
    for _ in range(T):
        prob = connection_matrix @ prob

    # Convert back to original shape
    if was_1d:
        return prob.ravel()  # (N, 1) -> (N,)
    else:
        return prob


In [14]:
N = 5
np.random.seed(20)
matrix = np.random.rand(N, N)
symmetric_matrix = (matrix + matrix.T) / 2
connection_matrix = symmetric_matrix / symmetric_matrix.sum(axis=1, keepdims=True)

initial_prob = np.random.rand(N)  # 1D vector
final_prob = simulate_disease(initial_prob, connection_matrix, T=10)
print("initial:", initial_prob)
print("final:", final_prob)

initial: [0.49238104 0.63125307 0.83949792 0.4610394  0.49794007]
final: [0.60340112 0.60340112 0.60340112 0.60340112 0.60340112]
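
Because each step multiplies by the same matrix, applying the update T times is equivalent to multiplying the initial vector once by the T-th matrix power. The sketch below checks that equivalence on a small random network; the names `C`, `p0`, and the seed are assumptions made for this example only.

```python
import numpy as np

N, T = 5, 10
rng = np.random.default_rng(0)             # seed chosen arbitrarily for this sketch
matrix = rng.random((N, N))
symmetric = (matrix + matrix.T) / 2
C = symmetric / symmetric.sum(axis=1, keepdims=True)   # each row sums to 1
p0 = rng.random(N)

# Step-by-step update: p_{t+1} = C @ p_t
p_loop = p0.copy()
for _ in range(T):
    p_loop = C @ p_loop

# Single-shot form: p_T = (C raised to the power T) @ p_0
p_power = np.linalg.matrix_power(C, T) @ p0

print(np.allclose(p_loop, p_power))
```

Each row of a row-stochastic matrix forms a weighted average, so every update keeps the probabilities between the smallest and largest initial values. That is consistent with the printed result above, where all entries settle near a common value.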


# Music Composition

Write a function named "compose_music" that takes a music sheet in A major scale as input and plays a piece of music based on the music sheet. The music sheet specifies a sequence of notes and their durations (see below for an example). You can use the in-class example "generate_sine" function to generate music notes with `fs = 8000`.

You can test your code using the example music_sheet below (simplified from "A Better Tomorrow (Mark's theme)" by Joseph Koo).
```py
music_sheet = [("Note_Cs", 0.5), ("Note_D", 0.5), ("Note_B", 0.5), ("Note_A_high", 1.5), ("Note_A_high", 0.5), ("Note_Gs", 0.5), ("Note_Fs", 0.5), ("Note_Gs", 0.5), ("Note_E", 0.75)]   
```
**Extra challenge**: Can you revise the function to play chords based on a given music sheet? A chord refers to the combination of two music notes. (The extra challenge is not graded.)

In [15]:
import numpy as np

# Example mapping from note names to frequencies in Hz.
# Adjust these to match whatever was defined in your notebook.
NOTE_FREQ = {
    "Note_Cs": 554.37,   # C#5 (example values)
    "Note_D":  587.33,
    "Note_E":  659.25,
    "Note_Fs": 739.99,
    "Note_Gs": 830.61,
    "Note_A_high": 880.00,
    "Note_B":  493.88,   # B4, assuming A4 = 440 Hz; replace with your actual settings
    # add other notes if needed
}

def generate_sine(freq, duration, fs=8000):
    """Generate a sine wave at given frequency and duration."""
    t = np.linspace(0, duration, int(fs * duration), endpoint=False)
    return np.sin(2 * np.pi * freq * t)


def compose_music(music_sheet, fs=8000):
    """
    Compose a piece of music based on a given music sheet.

    Parameters
    ----------
    music_sheet : list[tuple[str, float]]
        Each element is (note_name, duration_in_seconds), e.g.
        [("Note_Cs", 0.5), ("Note_D", 0.5), ...]
    fs : int
        Sampling frequency (default 8000 Hz).

    Returns
    -------
    np.ndarray
        A 1D numpy array containing the full audio waveform obtained
        by concatenating all generated notes.
    """
    # Start with an empty waveform
    full_waveform = np.array([], dtype=float)

    # Process each (note_name, duration) pair in the music sheet
    for note_name, duration in music_sheet:
        # Look up the frequency of this note
        freq = NOTE_FREQ[note_name]

        # Generate the sine wave for this note
        note_wave = generate_sine(freq, duration, fs)

        # Concatenate this note to the full waveform
        full_waveform = np.concatenate((full_waveform, note_wave))

    return full_waveform


In [16]:
music_sheet = [
    ("Note_Cs", 0.5),
    ("Note_D", 0.5),
    ("Note_B", 0.5),
    ("Note_A_high", 1.5),
    ("Note_A_high", 0.5),
    ("Note_Gs", 0.5),
    ("Note_Fs", 0.5),
    ("Note_Gs", 0.5),
    ("Note_E", 0.75),
]

wave = compose_music(music_sheet, fs=8000)
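
For the ungraded chord challenge, one possible approach is to mix the sine waves of the chord's notes sample by sample. The sketch below is an assumption-laden illustration: `generate_chord` is a hypothetical helper, and the two frequencies assume standard A4 = 440 Hz tuning.

```python
import numpy as np

def generate_chord(freqs, duration, fs=8000):
    """Hypothetical helper: mix several sine waves of equal length into one chord."""
    # Sample times in seconds: 0, 1/fs, 2/fs, ...
    t = np.arange(int(fs * duration)) / fs
    waves = [np.sin(2 * np.pi * f * t) for f in freqs]
    # Average rather than sum so the amplitude stays within [-1, 1]
    return np.mean(waves, axis=0)

# A4 and C#5 sounded together for half a second
chord = generate_chord([440.00, 554.37], 0.5)
print(chord.shape)
```

In a Jupyter notebook, a waveform such as `chord` or `wave` can typically be played with `IPython.display.Audio(wave, rate=8000)`.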



# Rock Paper Scissor
Write a function named `play_rock_paper_scissor` so that a user can play the Rock Paper Scissor game against the computer. Here are the expected functionalities:
- For each round of the game, the function should ask the user whether the user wants to start playing by inputting 0 or 1. User input 0 indicates no, and 1 indicates yes.
- If the user wants to start the game, prompt the user to input a choice of action among Rock, Paper, Scissor. For now, you can safely assume the user's input is always among these three options. Randomly generate an action among Rock, Paper, Scissor for the computer. Please refer to our in-class practice problem for an example on random number generation.
- Compare the user input with the computer action, and print the winner based on the following rules:
    - Rock beats Scissors
    - Scissors beats Paper
    - Paper beats Rock
- Prompt the user whether the game should continue or not as we did in the first step.

In [17]:
import random

def play_rock_paper_scissor():
    """
    Let the user play Rock–Paper–Scissors with the computer.

    For each round:
      1. Ask the user whether they want to play (0 = no, 1 = yes).
      2. If yes, ask for a choice among "Rock", "Paper", "Scissor".
      3. Randomly generate the computer's choice.
      4. Compare and print who wins.
      5. Ask again whether to continue (same as step 1).
    """
    options = ["Rock", "Paper", "Scissor"]

    while True:
        # Step 1: ask if the user wants to play a round
        start = input("Do you want to play? (1 = yes, 0 = no): ")

        if start == "0":
            print("Game over. Bye!")
            break
        elif start != "1":
            # If the user types something else, just skip this loop iteration
            print("Please enter 1 or 0.")
            continue

        # Step 2: get user's action
        user_action = input("Choose Rock, Paper, or Scissor: ")

        # (The problem statement says we can assume the input is valid.)
        # Step 3: computer randomly picks an action
        computer_action = random.choice(options)
        print(f"Computer chose: {computer_action}")

        # Step 4: determine winner
        if user_action == computer_action:
            print("It's a tie!")
        elif (
            (user_action == "Rock" and computer_action == "Scissor") or
            (user_action == "Scissor" and computer_action == "Paper") or
            (user_action == "Paper" and computer_action == "Rock")
        ):
            print("You win!")
        else:
            print("Computer wins!")


play_rock_paper_scissor()


Game over. Bye!
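
The three winning rules can also be expressed as a lookup table, which keeps the comparison logic separate from the input loop and makes it testable without user interaction. This is a sketch of an alternative design; `BEATS` and `decide_winner` are hypothetical names.

```python
# Each key beats the value it maps to
BEATS = {"Rock": "Scissor", "Scissor": "Paper", "Paper": "Rock"}

def decide_winner(user_action, computer_action):
    """Hypothetical helper: return 'tie', 'user', or 'computer'."""
    if user_action == computer_action:
        return "tie"
    if BEATS[user_action] == computer_action:
        return "user"
    return "computer"

print(decide_winner("Rock", "Scissor"))  # user
```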


# Linear Regression
- You are given a straight line learned from linear regression, denoted as $\hat{y}=ax+b$. Your task is to use a Python function named `eval_predict` to assess prediction quality over a dataset. Please first define the metric being used for assessment. Then let your program calculate and return the defined metric value.
- To learn a high-quality straight line in the form of $\hat{y}=ax+b$, we need to find optimal values for $a$ and $b$. In what follows, you will perform grid search: you’ll try many $(a, b)$ pairs on a rectangular grid and pick the one with the best metric (defined in Problem 7) on the dataset. For example, given $a \in \{0.0, 0.5, 1.0\}$ and $b \in \{-1.0, 0.0, 1.0\}$, you evaluate all 9 combinations, compute the metric for each, and pick the best. Write a Python function named `grid_search` to find the best pair for given options of $a$ and $b$.

In [21]:
import numpy as np

def eval_predict(x: np.ndarray,
                 y_true: np.ndarray,
                 a: float,
                 b: float) -> float:
    """
    Evaluate prediction quality of line y_hat = a * x + b on dataset (x, y_true).

    Parameters
    ----------
    x : np.ndarray
        1D array of input features.
    y_true : np.ndarray
        1D array of true target values.
    a, b : float
        Parameters of the line y_hat = a * x + b.

    Returns
    -------
    float
        The metric value (here we use Mean Squared Error).
        If your Problem 7 uses a different metric, modify this
        function accordingly.
    """
    # Predicted values by the line
    y_pred = a * x + b

    # ---- Metric definition ----
    # Example: Mean Squared Error (MSE)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)

    return mse



def grid_search(x: np.ndarray,
                y_true: np.ndarray,
                a_list: list,
                b_list: list):
    """
    Perform grid search over given lists of a and b to find the best line
    y_hat = a * x + b according to eval_predict metric.

    Parameters
    ----------
    x : np.ndarray
        1D array of input features.
    y_true : np.ndarray
        1D array of true target values.
    a_list : list[float]
        Candidate values for slope a.
    b_list : list[float]
        Candidate values for intercept b.

    Returns
    -------
    (best_a, best_b, best_metric)
        best_a : float
            The value of a that gives the best metric.
        best_b : float
            The value of b that gives the best metric.
        best_metric : float
            The metric value for (best_a, best_b).
    """
    best_a = None
    best_b = None
    best_metric = None  # we will minimize the metric (e.g., MSE)

    for a in a_list:
        for b in b_list:
            metric = eval_predict(x, y_true, a, b)

            # For MSE/MAE, smaller is better → we minimize metric
            if best_metric is None or metric < best_metric:
                best_metric = metric
                best_a = a
                best_b = b

    return best_a, best_b, best_metric


In [22]:
# Fake data for testing
x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x + 1   # true line: y = 2x + 1

a_candidates = [0.0, 1.0, 2.0, 3.0]
b_candidates = [0.0, 1.0, 2.0]

best_a, best_b, best_metric = grid_search(x, y, a_candidates, b_candidates)
print("Best a:", best_a)
print("Best b:", best_b)
print("Best metric:", best_metric)


Best a: 2.0
Best b: 1.0
Best metric: 0.0
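
The same search can be sketched without explicit loops by letting NumPy broadcasting evaluate every $(a, b)$ pair at once. This is an optional alternative under the assumption that the metric is mean squared error; the variable names are chosen for this example.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x + 1                                   # true line: y = 2x + 1

a_values = np.array([0.0, 1.0, 2.0, 3.0])
b_values = np.array([0.0, 1.0, 2.0])

# Predictions for every pair: shape (len(a_values), len(b_values), len(x))
pred = a_values[:, None, None] * x[None, None, :] + b_values[None, :, None]

# Mean squared error of each pair: shape (len(a_values), len(b_values))
mse = np.mean((pred - y) ** 2, axis=2)

# Convert the flat index of the smallest error back to (row, column)
i, j = np.unravel_index(np.argmin(mse), mse.shape)
best_a, best_b = a_values[i], b_values[j]
print(best_a, best_b, mse[i, j])
```

The broadcasted version trades memory for speed: it stores one prediction per pair per data point, which is fine for small grids but may be too large for very fine ones.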


# Sign-Up Validator
Imagine you are building a sign-up page for a new app. To keep the user database clean, you must validate all inputs before creating an account.
Instead of relying on advanced libraries, you’ll use pure Python basics.

### Username Validation
Write a function `validate_username(username)` that:
- Returns False if the username is empty, shorter than 3, or longer than 20 characters.
- Returns False if it contains anything besides letters, digits, underscore _, dot ., or hyphen -.
- Returns True if valid.

### Email Validation (Simple Version)
Write a function `validate_email(email)` that:
- Returns False if it doesn’t contain an "@".
- Splits the email into local part and domain (use .partition("@")).
- Ensures the domain contains at least one "." and doesn’t start/end with ".".
- Returns True if valid, False otherwise.

### Phone Normalization (Simplified)
Write a function `normalize_phone(phone, default_cc="+1")` that:
- Removes all non-digit characters.
- Prepends the default country code if the number doesn’t start with "+".
- Checks that the final string has between 9 and 15 digits (excluding "+").
- Returns the normalized phone number (e.g., "+14155551212"), or "Invalid" if it is not valid.

### Sign-Up Aggregator
Write a function `validate_signup(user_info)` where `user_info` is a dictionary, e.g.:
```py
{
    "username": "alice_01",
    "email": "Alice@example.com",
    "password": "Strong!Pass1",
    "phone": "(415) 555-1212",
    "country": "us"
}
```
It should:
- Call each validation function.
- Collect any errors in a dictionary.
- Return a result like:
```py
{
    "ok": True,
    "errors": {},
    "normalized": {
        "username": "alice_01",
        "email": "alice@example.com",
        "phone": "+14155551212",
        "country": "US"
    }
}
```
If there are errors, `ok` should be False and `errors` should explain them.


In [23]:
import string

# ---------- Username Validation ----------

def validate_username(username: str) -> bool:
    """
    Return True if username is valid, False otherwise.

    Rules:
    - False if empty, shorter than 3, or longer than 20 characters.
    - False if it contains characters other than:
        letters, digits, underscore "_", dot ".", or hyphen "-".
    """
    if not username:
        return False

    if len(username) < 3 or len(username) > 20:
        return False

    allowed_chars = (
        string.ascii_letters + string.digits + "._-"
    )

    for ch in username:
        if ch not in allowed_chars:
            return False

    return True


# ---------- Email Validation (simple) ----------

def validate_email(email: str) -> bool:
    """
    Simple email validation.

    Rules:
    - Return False if it doesn't contain "@".
    - Split into local part and domain using .partition("@").
    - Domain must:
        * contain at least one "."
        * not start with "."
        * not end with "."
    - Return True if valid, False otherwise.
    """
    if "@" not in email:
        return False

    local, sep, domain = email.partition("@")
    # local part and domain must both be non-empty
    if not local or not domain:
        return False

    # Domain must contain ".", and not start/end with "."
    if "." not in domain:
        return False
    if domain[0] == "." or domain[-1] == ".":
        return False

    return True


# ---------- Phone Normalization (simplified) ----------

def normalize_phone(phone: str, default_cc: str = "+1") -> str:
    """
    Normalize a phone number.

    Steps:
    - Remove all non-digit characters.
    - If the *original* phone string does not start with '+',
      prepend the default country code (e.g., "+1").
    - Check that the final number has between 9 and 15 digits
      (excluding the '+').
    - Return the normalized number (e.g., "+14155551212"),
      or "Invalid" if it is not valid.
    """
    # Remember whether user already included "+"
    has_plus = phone.strip().startswith("+")

    # Keep digits only
    digits = "".join(ch for ch in phone if ch.isdigit())

    if not digits:
        return "Invalid"

    if has_plus:
        normalized = "+" + digits
    else:
        # default_cc already includes '+', e.g. "+1"
        normalized = default_cc + digits

    # Check length of digits (exclude '+')
    digit_count = len(normalized.lstrip("+"))
    if 9 <= digit_count <= 15:
        return normalized
    else:
        return "Invalid"


# ---------- Sign-Up Aggregator ----------

# NOTE: I'm assuming you already have a password validator from
# your previous assignment. Here is a very simple placeholder;
# replace this with your real password rule if needed.
def validate_password(password: str) -> bool:
    """
    Very simple password validator placeholder.
    Replace this with your assignment's real password rules.
    """
    return len(password) >= 8


def validate_signup(user_info: dict) -> dict:
    """
    Validate a sign-up payload.

    user_info example:
    {
        "username": "alice_01",
        "email":    "Alice@example.com",
        "password": "Strong!Pass1",
        "phone":    "(415) 555-1212",
        "country":  "us"
    }

    Returns a dictionary like:
    {
        "ok": True/False,
        "errors": {
            "username": "Invalid username",
            ...
        },
        "normalized": {
            "username": "alice_01",
            "email":    "alice@example.com",
            "phone":    "+14155551212",
            "country":  "US"
        }
    }
    """
    errors = {}
    normalized = {}

    username = user_info.get("username", "")
    email = user_info.get("email", "")
    password = user_info.get("password", "")
    phone = user_info.get("phone", "")
    country = user_info.get("country", "")

    # --- username ---
    if not validate_username(username):
        errors["username"] = "Invalid username"
    else:
        normalized["username"] = username

    # --- email ---
    if not validate_email(email):
        errors["email"] = "Invalid email"
    else:
        # normalize email to lowercase
        normalized["email"] = email.strip().lower()

    # --- password ---
    if not validate_password(password):
        errors["password"] = "Invalid or weak password"

    # --- phone ---
    normalized_phone = normalize_phone(phone)
    if normalized_phone == "Invalid":
        errors["phone"] = "Invalid phone number"
    else:
        normalized["phone"] = normalized_phone

    # --- country ---
    # very simple normalization: uppercase country code
    if country:
        normalized["country"] = country.strip().upper()
    else:
        errors["country"] = "Country is required"

    ok = len(errors) == 0

    return {
        "ok": ok,
        "errors": errors,
        "normalized": normalized,
    }


test_user = {
    "username": "alice_01",
    "email": "Alice@example.com",
    "password": "Strong!Pass1",
    "phone": "(415) 555-1212",
    "country": "us",
}

print(validate_signup(test_user))


{'ok': True, 'errors': {}, 'normalized': {'username': 'alice_01', 'email': 'alice@example.com', 'phone': '+14155551212', 'country': 'US'}}
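
The email check relies on `str.partition`, which splits only at the first "@" and always returns a 3-tuple. The sketch below shows that behavior on a few strings. One consequence is that the simple rules accept an address such as "a@b@c.com"; `domain_has_extra_at` is a hypothetical stricter check, not something the assignment requires.

```python
# partition splits at the FIRST "@" and always returns a 3-tuple
print("alice@example.com".partition("@"))   # ('alice', '@', 'example.com')
print("no-at-sign".partition("@"))          # ('no-at-sign', '', '')
print("a@b@c.com".partition("@"))           # ('a', '@', 'b@c.com')

def domain_has_extra_at(email):
    """Hypothetical stricter check: True if the domain part still contains '@'."""
    _, _, domain = email.partition("@")
    return "@" in domain

print(domain_has_extra_at("a@b@c.com"))     # True
```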
