**1.**


## Probability Mass Function (PMF) and Probability Density Function (PDF)

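- **PMF**: Used for discrete random variables, it gives the probability of each possible value. The probabilities over all possible values sum to 1.
  Example: Rolling a fair six-sided die, where each face has probability 1/6.

- **PDF**: Used for continuous random variables, it gives the probability density at each value. The density at a point is not itself a probability; the area under the curve over an interval gives the probability, and the total area is 1.
  Example: The distribution of heights in a population.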
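
To make the difference concrete, here is a minimal sketch that sums a die's PMF and integrates a height PDF over an interval. The height mean of 170 cm and standard deviation of 10 cm are assumed values chosen only for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# PMF: a fair six-sided die assigns probability 1/6 to each face.
faces = np.arange(1, 7)
die_pmf = np.full(faces.size, 1 / 6)
pmf_total = die_pmf.sum()  # PMF values sum to 1

# PDF: heights modeled as normal (mean and std are assumed, not from data).
height_mean, height_std = 170.0, 10.0

# The density at a single point is not a probability.
density_at_170 = norm.pdf(170, loc=height_mean, scale=height_std)

# The probability of an interval is the area under the PDF over it.
area_160_180, _ = quad(
    lambda h: norm.pdf(h, loc=height_mean, scale=height_std), 160, 180
)

print(f"Sum of die PMF: {pmf_total:.4f}")
print(f"Density at 170 cm: {density_at_170:.4f}")
print(f"P(160 <= height <= 180): {area_160_180:.4f}")
```

The interval 160 to 180 spans one standard deviation on each side of the assumed mean, so the area comes out close to 0.68.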


**2.**

The Cumulative Distribution Function (CDF) gives the probability that a random variable \(X\) takes a value less than or equal to a specific value \(x\): \(F(x) = P(X ≤ x)\). It represents the accumulation of probability up to \(x\), so it never decreases and rises from 0 to 1.

Why is the CDF used?

1. To calculate probabilities for ranges of values.

2. To understand the distribution of data up to a certain point.

3. To compare distributions or model cumulative probabilities.
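
As a sketch of the first use in the list above, the probability of a range follows from two CDF values, \(P(a < X ≤ b) = F(b) - F(a)\). The endpoints below are arbitrary illustrative choices for a standard normal variable.

```python
from scipy.stats import norm

# Illustrative endpoints for a standard normal variable X.
a, b = -1.0, 2.0

# P(a < X <= b) = F(b) - F(a)
prob_range = norm.cdf(b) - norm.cdf(a)

# P(X > b) = 1 - F(b)
prob_upper = 1 - norm.cdf(b)

print(f"P({a} < X <= {b}) = {prob_range:.4f}")
print(f"P(X > {b}) = {prob_upper:.4f}")
```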

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate x values
x = np.linspace(-4, 4, 1000)

# Calculate CDF for standard normal distribution
cdf = norm.cdf(x, loc=0, scale=1)

# Plot the CDF
plt.plot(x, cdf, label="Standard Normal CDF", color="green")
plt.title("Cumulative Distribution Function (CDF)")
plt.xlabel("x")
plt.ylabel("P(X ≤ x)")
plt.grid()
plt.legend()
plt.show()

**3.**

### **Examples of Normal Distribution Usage**:
1. **Heights of People**: Heights within a population approximately follow a normal distribution.  
2. **Test Scores**: Standardized exam scores such as the SAT or IQ tests.  
3. **Measurement Errors**: Errors in scientific experiments often follow a normal distribution.  
4. **Stock Market Returns**: Daily returns of stocks or indices are often modeled as normal, although real returns tend to have heavier tails.  

### **Parameters and Shape**:
1. **Mean (\(μ\))**: Determines the center of the distribution.  
2. **Standard Deviation (\(σ\))**: Controls the spread:
   - Larger \(σ\): Wider and flatter curve.
   - Smaller \(σ\): Narrower and taller curve.  

The normal distribution is symmetrical around \(μ\).
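
The effect of \(σ\) on the shape can be checked numerically. This sketch evaluates the PDF at the mean for a few assumed values of \(σ\) and confirms the symmetry around \(μ\); the specific numbers are illustrative.

```python
import math
from scipy.stats import norm

mu = 0.0  # assumed center for illustration

# Peak height of the PDF (its value at the mean) for several spreads.
peaks = {}
for sigma in (0.5, 1.0, 2.0):
    peaks[sigma] = norm.pdf(mu, loc=mu, scale=sigma)
    print(f"sigma = {sigma}: peak density = {peaks[sigma]:.4f}")

# Symmetry: the density is the same at equal distances on either side of mu.
offset = 1.3
left = norm.pdf(mu - offset, loc=mu, scale=1.0)
right = norm.pdf(mu + offset, loc=mu, scale=1.0)
is_symmetric = math.isclose(left, right)
print(f"Symmetric around mu: {is_symmetric}")
```

The peak equals \(\frac{1}{σ\sqrt{2π}}\), so doubling \(σ\) halves the height while widening the curve.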

**4.**

### **Importance of Normal Distribution**:
1. **Common Occurrence**: Many natural and social phenomena approximately follow a normal distribution (e.g., heights, test scores).  
2. **Foundation for Statistical Methods**: Forms the basis for many statistical analyses, such as hypothesis testing and confidence intervals.  
3. **Predictability**: Allows probabilities to be calculated for ranges of data using its standard properties.  
4. **Central Limit Theorem**: The mean of a large number of independent random variables with finite variance tends to follow a normal distribution, regardless of the original distribution.

---

### **Real-Life Examples**:
1. **Heights of Individuals**: Human heights in a population are approximately normally distributed.  
2. **IQ Scores**: Intelligence scores are typically modeled as a normal distribution with a mean of 100 and standard deviation of 15.  
3. **Measurement Errors**: Errors in scientific experiments often follow a normal distribution.  
4. **Stock Returns**: Daily or monthly stock market returns roughly approximate a normal distribution, though extreme moves occur more often than the normal model predicts.  
5. **Blood Pressure**: Distribution of systolic blood pressure in a healthy population.

**5.**

### **Bernoulli Distribution**:
The **Bernoulli distribution** represents the probability distribution of a single trial with two possible outcomes: success (\(1\)) or failure (\(0\)).

- **Parameter**: \(p\), the probability of success (where \(0 ≤ p ≤ 1\)).  
- **Mean**: \(p\).  
- **Variance**: \(p(1-p)\).

---

### **Example**:
Flipping a coin:
- Success (\(1\)): Heads, with \(p = 0.5\).  
- Failure (\(0\)): Tails, with \(1-p = 0.5\).
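
The coin-flip example can be reproduced with `scipy.stats.bernoulli`. This is a minimal sketch; the seed and the number of simulated flips are arbitrary choices.

```python
from scipy.stats import bernoulli

p = 0.5  # probability of heads, as in the coin example
coin = bernoulli(p)

prob_heads = coin.pmf(1)  # P(X = 1) = p
prob_tails = coin.pmf(0)  # P(X = 0) = 1 - p
mean = coin.mean()        # p
variance = coin.var()     # p * (1 - p)

# Simulate ten flips (1 = heads, 0 = tails); the seed is arbitrary.
flips = coin.rvs(size=10, random_state=0)

print(f"P(heads) = {prob_heads}, P(tails) = {prob_tails}")
print(f"Mean = {mean}, Variance = {variance}")
print(f"Simulated flips: {flips}")
```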

---

### **Difference Between Bernoulli and Binomial Distributions**:
1. **Bernoulli Distribution**:  
   - Deals with a single trial.  
   - Example: Flipping a coin once.  

2. **Binomial Distribution**:  
   - Deals with multiple independent trials (\(n\)) of a Bernoulli process.  
   - Example: Flipping a coin 10 times and counting the number of heads.  
   - Parameters: \(n\) (number of trials) and \(p\) (probability of success).  
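
The relationship between the two distributions can be sketched with `scipy.stats`: a binomial with \(n = 1\) reduces to a Bernoulli, and the 10-flip example uses \(n = 10\), \(p = 0.5\). The choice to look at exactly 5 heads is illustrative.

```python
import math
from scipy.stats import bernoulli, binom

n, p = 10, 0.5  # ten flips of a fair coin

# A Bernoulli trial is the n = 1 special case of the binomial.
single_trial_matches = math.isclose(binom.pmf(1, 1, p), bernoulli.pmf(1, p))

# Probability of exactly 5 heads in 10 flips: C(10, 5) * 0.5**10.
prob_exactly_5 = binom.pmf(5, n, p)

# Binomial mean and variance: n * p and n * p * (1 - p).
expected_heads = binom.mean(n, p)
variance_heads = binom.var(n, p)

print(f"Binomial(n=1) matches Bernoulli: {single_trial_matches}")
print(f"P(exactly 5 heads) = {prob_exactly_5:.4f}")
print(f"Expected heads = {expected_heads}, variance = {variance_heads}")
```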

**6.**

### Problem Statement
Given:
- Mean (\(μ\)) = 50  
- Standard Deviation (\(σ\)) = 10  
- Observation: \(X > 60\)

We need to calculate \(P(X > 60)\) assuming a normal distribution.

### Steps:
1. **Standardize the Observation**:  
   Use the formula:  
   \[
   Z = \frac{X - μ}{σ}
   \]  
   Substituting \(X = 60, μ = 50, σ = 10\):  
   \[
   Z = \frac{60 - 50}{10} = 1
   \]

2. **Find \(P(Z > 1)\)**:  
   Using the standard normal distribution:  
   \[
   P(Z > 1) = 1 - P(Z ≤ 1)
   \]  
   From the Z-table or calculator:  
   \[
   P(Z ≤ 1) \approx 0.8413
   \]  
   Therefore:  
   \[
   P(Z > 1) = 1 - 0.8413 = 0.1587
   \]

3. **Final Answer**:  
   The probability \(P(X > 60)\) is approximately **0.1587** (15.87%).


In [None]:
from scipy.stats import norm

# Given data
mean = 50
std_dev = 10
x = 60

# Calculate Z-score
z_score = (x - mean) / std_dev

# Calculate P(X > 60)
probability = 1 - norm.cdf(z_score)

print(f"The probability that X > 60 is approximately {probability:.4f}")


**7.**

### **Uniform Distribution**:
A **uniform distribution** is a probability distribution where all outcomes are equally likely.

- **Continuous Uniform Distribution**: The probability density function (PDF) is constant between two values \(a\) and \(b\), so every sub-interval of the same length within this range has the same probability. (The probability of any single exact value is zero.)
  - **Formula**: \( f(x) = \frac{1}{b - a} \) for \( a ≤ x ≤ b \), and \( f(x) = 0 \) otherwise.
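  - **Example**: A number picked at random between 150 and 200, where every value in the range is equally likely to be chosen. (Real human heights in such a range would not be uniform; they cluster around the mean.)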

- **Discrete Uniform Distribution**: Each outcome in a set of discrete values is equally likely.
  - **Example**: Rolling a fair die, where each number (1–6) has a probability of \(\frac{1}{6}\).
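
Both variants are available in `scipy.stats`. The sketch below uses the endpoints 150 and 200 from the continuous example and a fair die for the discrete one. Note that `uniform` is parameterized by `loc = a` and `scale = b - a`, and that the upper bound of `randint` is exclusive.

```python
from scipy.stats import randint, uniform

# Continuous uniform on [a, b].
a, b = 150.0, 200.0
cont = uniform(loc=a, scale=b - a)

density_inside = cont.pdf(175.0)   # 1 / (b - a)
density_outside = cont.pdf(210.0)  # 0 outside [a, b]

# Intervals of equal length have equal probability.
prob_160_170 = cont.cdf(170.0) - cont.cdf(160.0)
prob_180_190 = cont.cdf(190.0) - cont.cdf(180.0)

# Discrete uniform on {1, ..., 6}: a fair die.
die = randint(1, 7)
prob_three = die.pmf(3)  # 1 / 6

print(f"Density inside the range: {density_inside}")
print(f"Density outside the range: {density_outside}")
print(f"P(160 to 170) = {prob_160_170:.2f}, P(180 to 190) = {prob_180_190:.2f}")
print(f"P(die shows 3) = {prob_three:.4f}")
```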



**8.**

### **Z-Score**:
The **z-score** (also called the **standard score**) indicates how many standard deviations a particular data point is from the mean of a dataset. It is calculated using the formula:

\[
z = \frac{X - μ}{σ}
\]

Where:  
- \(X\) is the data point.  
- \(μ\) is the mean of the dataset.  
- \(σ\) is the standard deviation of the dataset.

### **Importance of Z-Score**:
1. **Standardization**: Converts data from different distributions into a standard scale, allowing comparison between datasets with different units.
2. **Identify Outliers**: As a common rule of thumb, a data point with a z-score greater than 3 or less than -3 is treated as a potential outlier.
3. **Probability Calculation**: Used in normal distributions to find probabilities and determine how likely a certain value is to occur.
4. **Normalization**: Rescales data to have mean 0 and standard deviation 1. This does not change the shape of the distribution; the result follows the standard normal distribution (\(N(0, 1)\)) only if the original data were normally distributed.
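
A short sketch of standardization and the outlier rule of thumb follows. The dataset is hypothetical, constructed so that one value sits far from the rest; the population standard deviation (`ddof=0`) is used to match the formula above.

```python
import numpy as np

# Hypothetical measurements: most values near 50, one extreme value.
data = np.array([48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
                 50, 48, 52, 50, 49, 51, 50, 50, 49, 120], dtype=float)

mu = data.mean()
sigma = data.std()  # population standard deviation (ddof=0)

# z = (X - mu) / sigma for every data point.
z_scores = (data - mu) / sigma

# Flag points more than 3 standard deviations from the mean.
outliers = data[np.abs(z_scores) > 3]

print(f"Mean of z-scores: {z_scores.mean():.4f}")
print(f"Std of z-scores:  {z_scores.std():.4f}")
print(f"Potential outliers: {outliers}")
```

After standardization the z-scores have mean 0 and standard deviation 1, but their histogram keeps the shape of the original data.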



**9.**

### **Central Limit Theorem (CLT)**:
The **Central Limit Theorem** states that the distribution of the sample mean approaches a **normal distribution** as the sample size increases, regardless of the shape of the population distribution, provided the observations are independent and identically distributed (i.i.d.) and the population has finite variance.

- **Key Points**:
  - For a large enough sample size \(n\), the sampling distribution of the sample mean will be approximately normal.
  - The mean of the sampling distribution will be equal to the population mean \(μ\).
  - The standard deviation of the sampling distribution (called the **standard error**) is \(\frac{σ}{\sqrt{n}}\), where \(σ\) is the population standard deviation.

### **Significance of CLT**:
1. **Approximation to Normality**: CLT allows us to apply normal distribution techniques (e.g., z-scores, confidence intervals) even if the underlying data is not normally distributed.
2. **Basis for Statistical Inference**: Many statistical methods, such as hypothesis testing and confidence intervals, rely on sample means being approximately normally distributed.
3. **Improved Accuracy with Larger Sample Sizes**: Because the standard error \(\frac{σ}{\sqrt{n}}\) shrinks as \(n\) grows, the sample mean becomes a more precise estimate of the population mean.
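
The key points above can be checked with a small simulation. This sketch draws repeated samples from an exponential population (strongly skewed, with \(μ = 1\) and \(σ = 1\)) and compares the spread of the sample means with \(\frac{σ}{\sqrt{n}}\). The population, sample size, repeat count, and seed are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

n = 50                # observations per sample
num_samples = 20_000  # number of repeated samples

# Exponential(scale=1) population: mean 1, standard deviation 1.
samples = rng.exponential(scale=1.0, size=(num_samples, n))
sample_means = samples.mean(axis=1)

observed_mean = sample_means.mean()
observed_se = sample_means.std(ddof=1)
theoretical_se = 1.0 / np.sqrt(n)  # sigma / sqrt(n)

print(f"Mean of sample means: {observed_mean:.4f} (population mean = 1)")
print(f"Observed standard error:    {observed_se:.4f}")
print(f"Theoretical standard error: {theoretical_se:.4f}")
```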



**10.**

### **Assumptions of the Central Limit Theorem**:
1. **Independence**: The samples must be independent of each other.
2. **Identical Distribution**: The samples should be drawn from the same population or have the same distribution.
3. **Sample Size**: The sample size should be sufficiently large for the sample mean to approximate a normal distribution. A common rule of thumb is \(n ≥ 30\), although heavily skewed populations may need larger samples.
4. **Finite Variance**: The population should have finite variance.
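
To illustrate the sample-size assumption, this sketch measures how skewed the distribution of sample means remains for a few values of \(n\), again using an exponential population as an assumed example. Skewness near 0 indicates a shape closer to the symmetric normal curve.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)  # arbitrary seed
num_repeats = 10_000             # repeated samples per sample size

skewness_by_n = {}
for n in (2, 30, 200):
    # Means of num_repeats independent samples, each of size n.
    means = rng.exponential(scale=1.0, size=(num_repeats, n)).mean(axis=1)
    skewness_by_n[n] = skew(means)

for n, s in skewness_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {s:.3f}")
```

For this population the skewness of the sample mean is \(\frac{2}{\sqrt{n}}\) in theory, so it is still noticeable at \(n = 30\) and small by \(n = 200\).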