# STA130 Week 04 Homework
***
#### **Author**: David Daniliuc<br>**Created**: Fri Sept. 20, 2024

> #### Homework Chat Logs:
> - Question 1-4: *https://chatgpt.com/share/66fa1e6c-491c-8004-9d51-db0313f0c744*
> 
> - Question 5-: **
> 

### 1. The "Prelecture" video (above) mentioned the "standard error of the mean" as being the "standard deviation" of the distribution of bootstrapped means. What is the difference between the "standard error of the mean" and the "standard deviation" of the original data? What hypothetical idea do each of these capture? Explain this concisely in your own words.

The *standard deviation* of the original data measures how far individual data points typically lie from the mean of the dataset. It represents the hypothetical idea of the spread (variability) of individual observations in the population the data came from. A smaller *standard deviation* means the data points cluster more tightly around the mean, and a larger one means they are more spread out.

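The *standard error of the mean* measures how much the mean of a sample is expected to differ from the population mean. The hypothetical idea it represents is the variability of the sample mean across repeated samples of the same size drawn from the population; with bootstrapping, it is estimated by the standard deviation of the bootstrapped sample means. It is approximately the standard deviation of the data divided by the square root of the sample size, so as the sample size increases, the *standard error of the mean* decreases.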
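A minimal sketch contrasting the two quantities, assuming a hypothetical sample of 100 draws from a standard normal distribution (simulated with NumPy's `default_rng`, so it is not the same sample as in Question 4). It computes the standard deviation of the data, the standard error from the formula `s / sqrt(n)`, and the standard error estimated as the standard deviation of 10,000 bootstrapped sample means. The two standard-error estimates should be close to each other, and both should be much smaller than the standard deviation of the data.

```python
import numpy as np

# Hypothetical example data (assumed for illustration): 100 draws from N(0, 1)
rng = np.random.default_rng(130)
sample = rng.normal(loc=0, scale=1, size=100)
n = len(sample)

# Standard deviation of the original data: spread of individual observations
sd = np.std(sample, ddof=1)

# Standard error of the mean from the formula s / sqrt(n)
sem_formula = sd / np.sqrt(n)

# Standard error of the mean from bootstrapping: the standard deviation
# of many bootstrapped sample means (each resample has the original size n)
boot_means = rng.choice(sample, size=(10_000, n), replace=True).mean(axis=1)
sem_bootstrap = np.std(boot_means, ddof=1)

print(f"SD of data:         {sd:.3f}")
print(f"SEM (s / sqrt(n)):  {sem_formula:.3f}")
print(f"SEM (bootstrapped): {sem_bootstrap:.3f}")
```

Increasing `size` in `rng.normal` should leave the standard deviation roughly unchanged while shrinking both standard-error estimates.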

### 2. The "Prelecture" video (above) suggested that the "standard error of the mean" could be used to create a confidence interval, but didn't describe exactly how to do this.  How can we use the "standard error of the mean" to create a 95% confidence interval which "covers 95% of the bootstrapped sample means"? Explain this concisely in your own words.

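**Steps to create a 95% confidence interval using the standard error of the mean:**
1. Calculate the standard error of the mean, either as the standard deviation of the bootstrapped sample means or as the sample standard deviation divided by the square root of the sample size.
2. Find the critical value. For a 95% confidence interval, the Z-score from the standard normal distribution is approximately 1.96 (roughly 2). If the sample size is small, use the T-score from the t-distribution instead.
3. Use the following formula to find the confidence interval:
	 * `Confidence Interval = Sample Mean +/- Critical Value * Standard Error of the Mean`

Because the distribution of bootstrapped means is approximately normal, about 95% of the bootstrapped means fall within 1.96 standard errors of the sample mean, which is why this interval covers about 95% of them.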
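The steps above can be sketched in NumPy as follows, assuming a hypothetical simulated sample of 100 standard normal draws and using 1.96 as the critical value. The last lines check what fraction of the bootstrapped means actually falls inside the interval, which should come out close to 95%.

```python
import numpy as np

# Hypothetical example data (assumed for illustration): 100 draws from N(0, 1)
rng = np.random.default_rng(130)
sample = rng.normal(loc=0, scale=1, size=100)
n = len(sample)

# Step 1: estimate the standard error of the mean as the standard deviation
# of the bootstrapped sample means
boot_means = rng.choice(sample, size=(10_000, n), replace=True).mean(axis=1)
sem = np.std(boot_means, ddof=1)

# Step 2: critical value for 95% confidence from the standard normal distribution
z = 1.96

# Step 3: sample mean +/- critical value * standard error
lower = sample.mean() - z * sem
upper = sample.mean() + z * sem

# Fraction of bootstrapped means inside the interval (expected to be near 0.95)
coverage = np.mean((boot_means >= lower) & (boot_means <= upper))

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"Share of bootstrapped means covered: {coverage:.3f}")
```

For a small sample, `z` could be replaced by a t critical value such as `scipy.stats.t.ppf(0.975, df=n - 1)`, which requires SciPy.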

### 3. Creating the "sample mean plus and minus about 2 times the standard error" confidence interval addressed in the previous problem should indeed cover approximately 95% of the bootstrapped sample means. Alternatively, how do we create a 95% bootstrapped confidence interval using the bootstrapped means (without using their standard deviation to estimate the standard error of the mean)? Explain this concisely in your own words.

**Steps to create a 95% bootstrapped confidence interval using the bootstrapped means:**

1. Generate bootstrapped samples by resampling the original dataset with replacement thousands of times, each time drawing a sample of the same size as the original.
2. Calculate the mean of each bootstrapped sample. Together, these means form the distribution of bootstrapped means.
3. Determine the lower and upper bounds of the confidence interval by finding the 2.5th and 97.5th percentiles of this distribution.
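A rough sketch of these three steps, again with a hypothetical simulated sample of 100 standard normal draws and 10,000 bootstrapped samples (both numbers are assumptions chosen for illustration). The bounds come straight from the percentiles of the bootstrapped means, so no standard deviation is computed.

```python
import numpy as np

# Hypothetical example data (assumed for illustration): 100 draws from N(0, 1)
rng = np.random.default_rng(130)
sample = rng.normal(loc=0, scale=1, size=100)
n = len(sample)
num_bootstrap = 10_000

# Step 1: resample the original data with replacement, same size as the original
boot_samples = rng.choice(sample, size=(num_bootstrap, n), replace=True)

# Step 2: compute the mean of each bootstrapped sample and sort the means
boot_means = np.sort(boot_samples.mean(axis=1))

# Step 3: the 2.5th and 97.5th percentiles give the interval bounds
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(f"95% bootstrapped CI for the mean: ({lower:.3f}, {upper:.3f})")
```

Sorting is not strictly needed because `np.percentile` accepts unsorted input; it is kept only to mirror step 2.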

### 4. The "Prelecture" video (above) mentioned that bootstrap confidence intervals could apply to other statistics of the sample, such as the "median". Work with a ChatBot to create code to produce a 95% bootstrap confidence interval for a population mean based on a sample that you have and comment the code to demonstrate how the code can be changed to produce a 95% bootstrap confidence interval for a different population parameter (other than the population mean, such as the population median).

```python
import numpy as np

# Sample data from a normal distribution (can be changed to any sample)
np.random.seed(130)
sample = np.random.normal(loc=0, scale=1, size=100)  # Sample of size 100 from N(0,1)

# Bootstrap process to compute confidence interval
def bootstrap_ci(data, stat_func=np.mean, num_bootstrap_samples=1000, confidence_level=0.95):
    """
    Calculate a bootstrap confidence interval for a given statistic.
    
    Parameters:
    - data: The original sample data (array-like).
    - stat_func: The statistic function to apply (default is np.mean).
                 Any function that maps a 1-D array to a single number works,
                 e.g. np.median for the median or np.std for the standard deviation.
    - num_bootstrap_samples: Number of bootstrap samples (default 1000).
    - confidence_level: The desired confidence level (default is 0.95 for 95% CI).
    
    Returns:
    - lower_bound: The lower bound of the confidence interval.
    - upper_bound: The upper bound of the confidence interval.
    """
    # Resample the data with replacement (same size as the original) and
    # compute the chosen statistic for each bootstrap sample
    bootstrap_stats = np.array([
        stat_func(np.random.choice(data, size=len(data), replace=True))
        for _ in range(num_bootstrap_samples)
    ])
    
    # Compute the confidence interval from the percentiles of the bootstrap statistics
    # (the 2.5th and 97.5th percentiles when confidence_level is 0.95)
    lower_bound = float(np.percentile(bootstrap_stats, (1 - confidence_level) / 2 * 100))
    upper_bound = float(np.percentile(bootstrap_stats, (1 + confidence_level) / 2 * 100))
    
    return lower_bound, upper_bound

# Calculate 95% bootstrap confidence interval for the mean
mean_ci = bootstrap_ci(sample, stat_func=np.mean)
print(f"95% Bootstrap Confidence Interval for the Mean: {mean_ci}")

# To calculate 95% CI for the median, simply change the 'stat_func' to np.median
median_ci = bootstrap_ci(sample, stat_func=np.median)
print(f"95% Bootstrap Confidence Interval for the Median: {median_ci}")

# Other parameters follow the same pattern, e.g. the population standard deviation:
# std_ci = bootstrap_ci(sample, stat_func=np.std)
```

```
95% Bootstrap Confidence Interval for the Mean: (-0.018315467952926715, 0.36535703638434114)
95% Bootstrap Confidence Interval for the Median: (-0.09625506434163726, 0.3642670631309479)
```


### 5. The previous question addresses making a confidence interval for a population parameter based on a sample statistic. Why do we need to distinguish between the role of the population parameter and the sample statistic when it comes to confidence intervals? Explain this concisely in your own words.
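One way to illustrate the distinction is a small simulation in which the population is assumed to be known: a normal distribution with mean 5 and standard deviation 2 (hypothetical values). The population mean is a fixed number, while each new sample produces a different sample mean and therefore a different bootstrapped interval. Repeating the process shows that roughly 95% of the intervals contain the fixed parameter, so the confidence level describes the interval-building procedure rather than any single interval.

```python
import numpy as np

rng = np.random.default_rng(130)

# Assumed "population" for illustration: normal with a known mean (the parameter)
true_mean = 5.0
population_sd = 2.0
n = 50               # size of each sample
num_repeats = 200    # number of independent samples drawn from the population
num_bootstrap = 1_000

covered = 0
for _ in range(num_repeats):
    # Each new sample gives a different sample mean (the statistic)
    sample = rng.normal(loc=true_mean, scale=population_sd, size=n)
    boot_means = rng.choice(sample, size=(num_bootstrap, n), replace=True).mean(axis=1)
    lower, upper = np.percentile(boot_means, [2.5, 97.5])
    # The parameter stays fixed; only the interval moves from sample to sample
    covered += int(lower <= true_mean <= upper)

coverage_rate = covered / num_repeats
print(f"Fraction of intervals that contain the true mean: {coverage_rate:.2f}")
```

With only 200 repetitions the observed fraction will wander a few percentage points around 95%; increasing `num_repeats` should make it settle closer.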

### 9. Have you reviewed the course [wiki-textbook](https://github.com/pointOfive/stat130chat130/wiki) and interacted with a ChatBot (or, if that wasn't sufficient, real people in the course piazza discussion board or TA office hours) to help you understand all the material in the tutorial and lecture that you didn't quite follow when you first saw it?

This week I haven't used the *wiki-textbook* much. However, I have used ChatGPT extensively to understand kernel density estimation (KDE) and the other topics covered at the end of this week's lecture.