# Q1: What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.
Ans: \


###  **Probability Mass Function (PMF):**
- **Used for:** **Discrete random variables**
- **Definition:** PMF gives the **probability** that a discrete random variable is exactly equal to some value.

####  Example (PMF):
Let’s say we roll a fair 6-sided die. The random variable `X` is the number that shows up.

Then,  
**PMF of X:**
```math
P(X = x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}
```

This means each of the six faces has the same probability, 1/6, and the six probabilities sum to 1.
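
As a minimal sketch, the die PMF can be written as a lookup table in Python. The names `pmf` and `die_pmf` are illustrative choices, and exact fractions are used so the sum comes out to exactly 1.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face maps to probability 1/6.
faces = range(1, 7)
pmf = {x: Fraction(1, 6) for x in faces}

def die_pmf(x):
    """Return P(X = x); values outside 1..6 have probability 0."""
    return pmf.get(x, Fraction(0))

print(die_pmf(3))          # 1/6
print(die_pmf(7))          # 0
print(sum(pmf.values()))   # 1
```

A valid PMF is never negative and always sums to 1, which the last line checks.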

---

###  **Probability Density Function (PDF):**
- **Used for:** **Continuous random variables**
- **Definition:** PDF gives the **relative likelihood** (density) of a continuous random variable near a given value.  
The density itself is **not** a probability: the probability at any exact point is zero, so probabilities are obtained by integrating the PDF over an **interval**.

####  Example (PDF):
Let `X` be a continuous variable that follows a **normal distribution** with mean = 0 and standard deviation = 1. Then the PDF is:

```math
f(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}
```

To find the **probability** that `X` is between 0 and 1, we compute:

```math
P(0 \leq X \leq 1) = \int_{0}^{1} f(x) \, dx
```
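
The integral above has no elementary closed form, so in practice it is evaluated numerically. The sketch below is one possible way to do that with the Python standard library: it integrates the density with a simple midpoint rule (the helper names and the step count are assumptions made for illustration) and compares the result with the difference of two CDF values from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def std_normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate_midpoint(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

area = integrate_midpoint(std_normal_pdf, 0.0, 1.0)
exact = NormalDist(0, 1).cdf(1.0) - NormalDist(0, 1).cdf(0.0)

print(round(area, 4))   # 0.3413
print(round(exact, 4))  # 0.3413
```

So about 34.13% of the probability lies between 0 and 1, while the density at a single point, such as `std_normal_pdf(0)`, is a height of the curve rather than a probability.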

---

###  Summary:

| Feature                | PMF                                | PDF                                 |
|------------------------|-------------------------------------|--------------------------------------|
| Type of Variable       | Discrete                           | Continuous                          |
| Output                 | Probability at specific value      | Density (not exact probability)     |
| Probability at exact point | P(X = x), can be positive       | Always zero (integrate over a range) |
| Example                | Rolling a die                      | Heights of people                   |


# Q2: What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?
Ans: \

The **Cumulative Distribution Function (CDF)** gives the **probability that a random variable \( X \)** is **less than or equal to** a certain value \( x \):

\[
\text{CDF: } F(x) = P(X \leq x)
\]

It works for both **discrete** and **continuous** random variables.

---

###  Why is CDF Used?

- To understand **how probability accumulates** across values.
- Helps in computing **probabilities over intervals**.
- Used to compare **distributions** or assess **percentiles/quantiles**.
- Helpful in statistical analysis, hypothesis testing, and simulations.

---

###  Example (Discrete):

Suppose a fair die is rolled, and let \( X \) be the number on the die (1 to 6).  
The CDF at \( x = 3 \) is:

\[
F(3) = P(X \leq 3) = P(X = 1) + P(X = 2) + P(X = 3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 0.5
\]

So, there's a 50% chance of rolling a number less than or equal to 3.
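
The same accumulation can be sketched in Python by taking running totals of the PMF. The variable names here are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# Running totals of the PMF give the CDF at each face value.
cdf = dict(zip(faces, accumulate(pmf)))

print(cdf[3])         # 1/2
print(float(cdf[3]))  # 0.5
print(cdf[6])         # 1
```

The CDF never decreases and reaches 1 at the largest possible value.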

---

###  Example (Continuous):

Let’s say \( X \sim N(0,1) \), a standard normal distribution.  
The CDF at \( x = 1 \) is:

\[
F(1) = P(X \leq 1) \approx 0.8413
\]

This means about **84.13%** of the values in a standard normal distribution are less than or equal to 1.
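
A short sketch using `statistics.NormalDist` from the Python standard library reproduces this value and also illustrates two of the uses listed earlier: interval probabilities and quantiles. The variable name `std_normal` is just an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# CDF at x = 1: probability of a value less than or equal to 1.
print(round(std_normal.cdf(1), 4))   # 0.8413

# Probability over an interval is a difference of two CDF values.
print(round(std_normal.cdf(1) - std_normal.cdf(-1), 4))   # 0.6827

# The inverse CDF (quantile function) maps a probability back to a value.
print(round(std_normal.inv_cdf(0.8413), 2))   # 1.0
```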

---

###  Key Points:

| Feature         | Description |
|----------------|-------------|
| Symbol         | \( F(x) \) |
| Value Range    | 0 to 1 |
| Always         | Non-decreasing |
| Use Case       | Finding probabilities up to a point |
| At infinity    | \( \lim_{x \to \infty} F(x) = 1 \) |


# Q3: What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.
Ans: \


### **Real-Life Situations Where Normal Distribution is Used:**

1. **Heights of people**:  
   Human heights (within a population) are usually symmetrically distributed around a mean — a classic example of a normal distribution.

2. **Test scores (e.g., SAT, IQ)**:  
   Standardized test scores often follow a bell curve where most students score near the average.

3. **Measurement errors**:  
   Random errors in physical or scientific measurements tend to follow a normal distribution due to the Central Limit Theorem.

4. **Blood pressure readings**:  
   In a healthy population, blood pressure readings are approximately normally distributed.

5. **Returns in finance (over short periods)**:  
   Short-term stock returns or asset price changes are often modeled as approximately normal, although real returns tend to have heavier tails.

---

###  **Parameters of the Normal Distribution:**

The normal distribution is defined by **two parameters**:  
1. **Mean (μ)** — *center/position* of the distribution  
2. **Standard Deviation (σ)** — *spread/width* of the distribution

---

###  How They Affect the Shape:

| Parameter | Effect on Shape |
|----------|-----------------|
| **μ (Mean)** | Shifts the curve left or right along the x-axis. It determines the **center** of the distribution. |
| **σ (Standard Deviation)** | Controls the **spread**. A larger σ flattens and widens the curve, while a smaller σ makes it steeper and narrower. |
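
To make the table concrete, the sketch below compares three normal distributions by their peak height and the interval holding the central 95% of the probability. The helper `describe` and the chosen parameter values are assumptions made for illustration.

```python
from statistics import NormalDist

def describe(mu, sigma):
    """Return the peak height and the central 95% interval of N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    peak = dist.pdf(mu)                      # density is highest at the mean
    low, high = dist.inv_cdf(0.025), dist.inv_cdf(0.975)
    return round(peak, 4), round(low, 2), round(high, 2)

print(describe(0, 1))   # (0.3989, -1.96, 1.96)
print(describe(5, 1))   # (0.3989, 3.04, 6.96)
print(describe(0, 2))   # (0.1995, -3.92, 3.92)
```

Changing μ from 0 to 5 moves the interval without changing the peak, while doubling σ halves the peak and doubles the width of the interval.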

---

###  Summary:

- The **normal distribution** is useful in modeling natural and human phenomena that are influenced by many small, independent factors.
- The **bell shape** is **symmetrical**, with the **mean = median = mode** at the center.
- **μ and σ** are key to customizing the distribution to fit different datasets.


# Q4: Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution.
Ans: \
### Importance of Normal Distribution & Real-Life Examples


###  **Importance of Normal Distribution:**

1. **Foundation for Statistical Inference**  
   Many statistical procedures (like t-tests, ANOVA, and linear regression) assume that the data or the model errors are approximately normally distributed. This makes the normal distribution central to hypothesis testing.

2. **Link to the Central Limit Theorem (CLT)**  
   The CLT states that the sampling distribution of the sample mean approaches a normal distribution as sample size increases, regardless of the population's original distribution (provided its variance is finite).

3. **Predictability**  
   In a normal distribution:
   - ~68% of data falls within ±1σ of the mean  
   - ~95% within ±2σ  
   - ~99.7% within ±3σ  
   This helps in identifying outliers, risks, and making decisions.

4. **Modeling Natural Phenomena**  
   Many naturally occurring variables (like human traits or measurement errors) follow a normal pattern, so it becomes a useful model for real-world data.

5. **Simplifies Complex Calculations**  
   With known properties and symmetry, it’s easier to compute probabilities, confidence intervals, and perform analyses.
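
The 68-95-99.7 percentages in item 3 can be checked with a few lines of Python. This is a minimal sketch using the standard library; the function name `within_k_sigma` is an illustrative choice.

```python
from statistics import NormalDist

def within_k_sigma(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on μ or σ, which is why the rule applies to every normal distribution.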

---

###  **Real-Life Examples of Normal Distribution:**

1. **Human Height and Weight**  
   Heights of people in a specific age and gender group are normally distributed around a mean value.

2. **IQ Scores**  
   Intelligence Quotient (IQ) scores are designed to follow a normal distribution with a mean of 100 and a standard deviation of 15.

3. **Test Scores**  
   Standardized exams like SATs and GREs often aim to have a normal distribution of scores.

4. **Blood Pressure**  
   In a healthy population, blood pressure tends to cluster around a mean, forming a bell-shaped curve.

5. **Measurement Errors**  
   Repeated measurements of the same quantity often have small errors that follow a normal distribution due to random variation.

---

###  Summary:

The normal distribution is **a powerful tool** because it naturally arises in many contexts and simplifies both **descriptive and inferential statistics**. Understanding it is crucial for interpreting and modeling real-world data accurately.

# Q5: What is the Bernoulli Distribution? Give an Example. What is the difference between Bernoulli Distribution and Binomial Distribution?
Ans: \
## Bernoulli Distribution:
A Bernoulli distribution is a discrete probability distribution for a random variable that has exactly two possible outcomes: success (1) and failure (0). It is used to model situations where there is a single trial.

Parameter: p = probability of success (0 ≤ p ≤ 1)

The probability of failure is 1 - p

## Example:
Tossing a coin once:

Success = getting a Head (1)

Failure = getting a Tail (0)

If the coin is fair:

P(Head) = 0.5

P(Tail) = 0.5

```python
import numpy as np

# Generate a Bernoulli sample: a binomial with n=1 is the same as Bernoulli(p)
sample = np.random.binomial(n=1, p=0.5, size=1000)
```
## Difference Between Bernoulli and Binomial Distributions:
| Aspect | Bernoulli Distribution | Binomial Distribution |
|--------|------------------------|-----------------------|
| Type of experiment | A single trial whose outcome is either a success or a failure. | Multiple independent trials of a Bernoulli experiment, counting the number of successes. |
| Number of trials (n) | Only 1 trial (n = 1). | n trials, where n ≥ 1. Each trial is independent and has the same probability of success. |
| Outcomes | Only two outcomes are possible: 1 (success) or 0 (failure). | The total number of successes in the n trials, which can be any integer from 0 to n. |
| Example | Tossing a coin once. You get either a head (success) or a tail (failure). | Tossing a coin 10 times and counting how many times heads (successes) occur. |
| Random variable | X takes values 0 or 1 only. | X takes values in the range 0 to n (i.e., 0, 1, 2, ..., n), depending on the number of successes. |
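
The relationship between the two distributions can be sketched with the Python standard library alone: a binomial draw is the sum of n independent Bernoulli trials. The function names below are illustrative, and the seed is fixed only so the run is repeatable.

```python
import math
import random

def bernoulli_trial(p, rng):
    """One Bernoulli(p) trial: 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0

def binomial_draw(n, p, rng):
    """One Binomial(n, p) draw, built as the sum of n independent Bernoulli(p) trials."""
    return sum(bernoulli_trial(p, rng) for _ in range(n))

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

rng = random.Random(42)              # fixed seed so the run is repeatable
print(bernoulli_trial(0.5, rng))     # either 0 or 1
print(binomial_draw(10, 0.5, rng))   # an integer between 0 and 10
print(binomial_pmf(5, 10, 0.5))      # 0.24609375
print(binomial_pmf(1, 1, 0.5))       # 0.5, the Bernoulli case n = 1
```

With n = 1 the binomial PMF reduces to the Bernoulli probabilities p and 1 - p.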

# Q6: Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.
Ans: \
To solve this problem, we use the **Z-score** formula and then find the probability from the **standard normal distribution**.

---

###  Given:
- Mean (μ) = 50  
- Standard deviation (σ) = 10  
- Value (x) = 60  

---

###  Step 1: Calculate the Z-score
$$
Z = \frac{x - \mu}{\sigma} = \frac{60 - 50}{10} = 1.0
$$


---

###  Step 2: Find the probability that Z > 1.0  
From the standard normal distribution table:
\[
P(Z < 1.0) = 0.8413
\]
\[
P(Z > 1.0) = 1 - 0.8413 = 0.1587
\]

---

###  Final Answer:
**The probability that a randomly selected observation is greater than 60 is approximately 0.1587 or 15.87%.**
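
As a quick check, the two steps can be reproduced in Python with `statistics.NormalDist`, which evaluates the standard normal CDF instead of reading it from a table. This is a sketch and the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, x = 50, 10, 60

z = (x - mu) / sigma                   # Step 1: Z-score
p_greater = 1 - NormalDist().cdf(z)    # Step 2: upper-tail probability

print(z)                    # 1.0
print(round(p_greater, 4))  # 0.1587
```

Calling `1 - NormalDist(mu, sigma).cdf(x)` gives the same probability without standardizing first.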

# Q7: Explain uniform Distribution with an example.
Ans: \
**Uniform Distribution** is a type of probability distribution in which all outcomes are equally likely within a given range. It can be:

- **Discrete Uniform Distribution**: Finite number of outcomes (e.g., rolling a fair die).
- **Continuous Uniform Distribution**: Infinite outcomes in a continuous range (e.g., choosing a number at random between 0 and 1).

---

###  **Example (Continuous Uniform Distribution):**

Suppose you are waiting for a bus that will arrive **anytime between 10:00 AM and 10:30 AM**, and it’s equally likely to arrive at any moment in that time window.

Let:
- \( a = 0 \) minutes (10:00 AM)
- \( b = 30 \) minutes (10:30 AM)

Let \( X \) be the arrival time in minutes after 10:00 AM. The probability that the bus arrives between 10:10 and 10:20 (i.e., between 10 and 20 minutes) is the length of that window divided by the length of the whole range:
$$
P(10 \leq X \leq 20) = \frac{20 - 10}{30 - 0} = \frac{10}{30} \approx 0.33
$$
So, there's about a **33% chance** the bus arrives in that 10-minute window.
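
The calculation generalizes to any interval inside the range. The function below is a small illustrative sketch (its name and the clipping behavior are assumptions, not part of the original answer).

```python
def uniform_interval_probability(a, b, low, high):
    """P(low <= X <= high) for X uniform on [a, b]."""
    if not a < b:
        raise ValueError("a must be less than b")
    # Clip the query interval to the support [a, b] before measuring its length.
    low, high = max(low, a), min(high, b)
    return max(high - low, 0) / (b - a)

print(round(uniform_interval_probability(0, 30, 10, 20), 4))  # 0.3333
print(uniform_interval_probability(0, 30, 0, 30))             # 1.0
```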


# Q8: What is the z score? State the importance of the z score.
Ans: \
The **Z-score** (also called a *standard score*) tells you **how many standard deviations** a data point is from the **mean** of the dataset.

---

###  **Formula:**
$$
Z = \frac{X - \mu}{\sigma}
$$
Where:

- \( X \) = value
- \( \mu \) = mean of the dataset
- \( \sigma \) = standard deviation

---

###  **Importance of the Z-score:**

1. **Standardization**: Converts different datasets to a common scale.
2. **Outlier Detection**: Helps identify extreme values (typically \( |Z| > 2 \) or \( |Z| > 3 \)).
3. **Comparisons**: Allows comparison of scores from different distributions.
4. **Probability Calculation**: Used in standard normal distribution to find probabilities.
5. **Hypothesis Testing**: Central in z-tests for determining statistical significance.

---

###  **Example:**

If a student's exam score is 85, with a class mean of 75 and standard deviation of 5:
$$
Z = \frac{85 - 75}{5} = 2
$$
→ The student scored **2 standard deviations above the mean**.
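
The formula translates directly into a short Python function. This is a minimal sketch; the second call uses a hypothetical exam (mean 60, standard deviation 10) that is not part of the example above, to show how z-scores put different scales on a common footing.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

print(z_score(85, 75, 5))   # 2.0

# Hypothetical second exam with mean 60 and standard deviation 10.
print(z_score(70, 60, 10))  # 1.0
```

Although 70 and 85 are raw scores on different scales, the z-scores show that the 85 is further above its class mean.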

# Q9: What is the Central Limit Theorem? State the significance of the Central Limit Theorem.
Ans: \
The **Central Limit Theorem** is a fundamental concept in statistics that states:

> *"The sampling distribution of the sample mean approaches a normal distribution as the sample size becomes large, regardless of the shape of the population distribution, provided the samples are independent and identically distributed."*

---

###  **Key Points:**

- Applies when the **sample size (n) is sufficiently large** (a common rule of thumb is \( n \geq 30 \)).
- Works **even if the population distribution is not normal**.
- The **mean** of the sampling distribution equals the **population mean** \( \mu \).
- The **standard deviation** of the sampling distribution is called the **standard error**:
  $$
  \text{SE} = \frac{\sigma}{\sqrt{n}}
  $$

---

###  **Significance of the Central Limit Theorem:**

1. **Foundation for Inferential Statistics**: Enables use of normal distribution to estimate population parameters.
2. **Hypothesis Testing**: Justifies the use of z-tests and t-tests.
3. **Confidence Intervals**: Helps construct intervals for means/proportions.
4. **Practical Application**: Makes analysis possible even with non-normal data, if the sample size is large.

---

###  **Example:**

Imagine measuring the average height of students across many samples. Even if the height distribution is skewed, the **distribution of the sample means** will be approximately normal if the sample size is large.
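
A small simulation can illustrate the idea. The sketch below does not use height data: it assumes a right-skewed exponential population (mean 1, standard deviation 1) as a stand-in, draws many samples of size 50, and looks at the distribution of their means. The sample size, the number of samples, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed exponential population (mean 1, sd 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n)
expected_se = 1 / n ** 0.5

# Share of sample means within one standard error of the population mean;
# for a normal distribution this would be about 0.68.
share = sum(abs(m - 1) <= expected_se for m in means) / len(means)

print(round(center, 2), round(spread, 2), round(expected_se, 2), round(share, 2))
```

Even though the individual draws are strongly skewed, the sample means cluster around the population mean with a spread close to the standard error.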

# Q10: State the assumptions of the Central Limit Theorem.
Ans: \
To apply the Central Limit Theorem correctly, the following assumptions must be met:

---

### **1. Independent Observations**
- The samples must be drawn **independently** of each other.
- The outcome of one observation **should not influence** another.

---

### **2. Identically Distributed Samples**
- Each observation should come from the **same distribution**, and therefore share the same mean and standard deviation.
- In practice, this means every observation is drawn from the same population.

---

### **3. Sample Size Should Be Large**
- As a rule of thumb, a sample size of **n ≥ 30** is often considered sufficient.
- If the population is **highly skewed**, even larger samples may be needed.
- If the population is **normally distributed**, the sample mean is exactly normal for any sample size, so no large-sample approximation is needed.

---

### **4. Finite Variance**
- The population from which samples are drawn must have a **finite variance** (no infinite or undefined standard deviation).

---

### **Why These Matter:**

These assumptions ensure that the **distribution of the sample means** will tend toward a **normal distribution**, enabling reliable statistical inference.
