# **Theory:**
---

### **1. What is a random variable in probability theory?**

**Answer:**  
A **random variable** is a function that assigns a real number to each possible outcome in the sample space of a random experiment or process. Random variables can be either **discrete** (taking distinct, countable values) or **continuous** (taking any value in a continuous range).
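
To make the idea of "assigning a number to each outcome" concrete, here is a minimal sketch for two tosses of a fair coin. The function name `number_of_heads` and the fair-coin assumption are illustrative choices, not part of the definition above.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: each outcome is a tuple such as ('H', 'T')
sample_space = list(product("HT", repeat=2))

def number_of_heads(outcome):
    """Hypothetical random variable X: maps an outcome to a real number."""
    return outcome.count("H")

# Build the distribution of X, assuming every outcome is equally likely
distribution = {}
for outcome in sample_space:
    x = number_of_heads(outcome)
    distribution[x] = distribution.get(x, 0) + Fraction(1, len(sample_space))

for x in sorted(distribution):
    print(f"P(X = {x}) = {distribution[x]}")
```

Here the outcomes themselves are not numbers; the random variable is the rule that turns each outcome into one.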

---

### **2. What are the types of random variables?**

**Answer:**  
The two main types of random variables are:
1. **Discrete Random Variable**: Takes a finite or countably infinite set of values (e.g., the number of heads in ten coin tosses).
2. **Continuous Random Variable**: Takes any value in an interval of real numbers, so its possible values cannot be listed one by one (e.g., height, weight).

---

### **3. What is the difference between discrete and continuous distributions?**

**Answer:**  
- **Discrete Distribution**: The random variable can take only specific, distinct values (e.g., the number of cars in a parking lot).
- **Continuous Distribution**: The random variable can take any value within a continuous range (e.g., temperature, height).

---

### **4. What are probability distribution functions (PDF)?**

**Answer:**  
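A **Probability Distribution Function** describes how likely each possible outcome of a random variable is. For a discrete random variable this is the probability mass function (PMF), which gives the probability of each value. For a continuous random variable it is the probability density function (PDF): the probability that the variable falls within a particular range is the area under the PDF over that range, and the probability of any single exact value is zero.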

---

### **5. How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?**

**Answer:**  
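- **PDF**: Gives the probability density of a continuous random variable at each point. A density is not itself a probability; probabilities come from integrating the PDF over a range. (For a discrete variable, the PMF gives the probability of each value directly.)
- **CDF**: Gives the cumulative probability \( F(x) = P(X \le x) \) that the random variable is less than or equal to a specific value. It is the integral of the PDF up to \( x \) (or the running sum of the PMF for a discrete variable).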
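
The relationship between the two can be checked numerically. The sketch below uses the standard library's `statistics.NormalDist` and a hand-written trapezoidal rule; the helper `integrate_pdf`, the evaluation point `x = 1.0`, and the lower limit of `-8.0` (standing in for minus infinity) are assumptions made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def integrate_pdf(dist, lower, upper, steps=10_000):
    """Approximate the integral of dist.pdf over [lower, upper] (trapezoidal rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width

x = 1.0
area = integrate_pdf(std_normal, -8.0, x)  # -8 stands in for minus infinity

print(f"PDF at x = {x}: {std_normal.pdf(x):.4f}")      # a density, not a probability
print(f"Area under the PDF up to x: {area:.4f}")       # accumulated probability
print(f"CDF at x = {x}: {std_normal.cdf(x):.4f}")      # should agree with the area
```

The density value and the cumulative probability are different quantities; only the area under the density matches the CDF.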

---

### **6. What is a discrete uniform distribution?**

**Answer:**  
A **Discrete Uniform Distribution** is a probability distribution over a finite set of outcomes that are all equally likely, so each of \( n \) outcomes has probability \( 1/n \). For example, when rolling a fair die, each face (1-6) has probability \( 1/6 \).

---

### **7. What are the key properties of a Bernoulli distribution?**

**Answer:**  
The **Bernoulli Distribution** is a discrete distribution with only two possible outcomes: success (1) or failure (0). Key properties:
- The probability of success is \( p \), and the probability of failure is \( 1-p \).
- Its mean is \( p \) and its variance is \( p(1-p) \).
- It is the special case of the binomial distribution with a single trial (\( n = 1 \)).

---

### **8. What is the binomial distribution, and how is it used in probability?**

**Answer:**  
The **Binomial Distribution** describes the number of successes in a fixed number of independent Bernoulli trials that all share the same success probability. It is used to model scenarios with two outcomes, like success/failure or yes/no, and is defined by the number of trials \( n \) and the probability of success \( p \).
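
As a minimal sketch, the binomial probabilities can be computed directly from the standard formula \( P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \). The helper name `binomial_pmf` and the values `n = 10`, `p = 0.5` are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5  # assumed example: ten fair coin tosses
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k:2d}) = {prob:.4f}")

print(f"Total probability: {sum(pmf):.4f}")
```

The probabilities over all values from 0 to \( n \) sum to 1, which is a quick sanity check on any hand-written PMF.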

---

### **9. What is the Poisson distribution and where is it applied?**

**Answer:**  
The **Poisson Distribution** models the number of events occurring in a fixed interval of time or space when events happen independently at a constant average rate \( \lambda \). It is often used for counts of relatively rare events, like the number of phone calls received by a call center in an hour or the number of accidents in a day.
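
A short sketch of the standard Poisson formula \( P(X = k) = e^{-\lambda} \lambda^k / k! \) follows. The rate of 5 calls per hour and the helper name `poisson_pmf` are assumptions chosen to match the call-center example.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 5  # assumed average number of calls per hour

# Probabilities for 0..59 events; the tail beyond that is negligible for lam = 5
probs = [poisson_pmf(k, lam) for k in range(60)]
mean = sum(k * pk for k, pk in enumerate(probs))

for k in range(0, 11, 2):
    print(f"P(X = {k:2d}) = {probs[k]:.4f}")
print(f"Mean of the distribution: {mean:.4f}")
```

The computed mean comes out equal to the rate, which is the defining property of the parameter \( \lambda \).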

---

### **10. What is a continuous uniform distribution?**

**Answer:**  
A **Continuous Uniform Distribution** is a probability distribution where all intervals of the same length within a given range are equally likely. For example, if a random variable is uniform between 0 and 1, its density is constant over that interval, so it is as likely to fall in [0.1, 0.2] as in [0.7, 0.8].

---

### **11. What are the characteristics of a normal distribution?**

**Answer:**  
The **Normal Distribution** has the following characteristics:
- Symmetrical and bell-shaped curve.
- Mean, median, and mode are all equal.
- The total area under the curve is 1.
- It is defined by two parameters: mean (μ) and standard deviation (σ).
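
These properties can be inspected with the standard library's `statistics.NormalDist`. The sketch below also prints the share of values within one, two, and three standard deviations of the mean (the well-known 68-95-99.7 rule, which is not stated above). The parameters `mu=100`, `sigma=15` and the helper `share_within` are illustrative assumptions.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # assumed example parameters

# Mean, median, and mode coincide for a normal distribution
print(dist.mean, dist.median, dist.mode)

def share_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {share_within(dist, k):.4f}")
```

Because the shares depend only on the number of standard deviations, changing `mu` or `sigma` leaves them unchanged.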

---

### **12. What is the standard normal distribution, and why is it important?**

**Answer:**  
The **Standard Normal Distribution** is a normal distribution with a mean of 0 and a standard deviation of 1. It is important because any normal random variable \( X \) can be converted to it through the Z-score \( Z = (X - \mu)/\sigma \), which allows values from different normal distributions to be compared on a common scale.

---

### **13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?**

**Answer:**  
The **Central Limit Theorem (CLT)** states that, for a large enough sample size, the sampling distribution of the mean of independent observations will be approximately normally distributed, regardless of the shape of the original distribution, provided that distribution has finite variance. This is critical because it allows for the use of normal distribution methods in hypothesis testing and confidence intervals.

---

### **14. How does the Central Limit Theorem relate to the normal distribution?**

**Answer:**  
The CLT ensures that as the sample size increases, the distribution of the sample mean approaches a normal distribution, even if the original data are not normally distributed. This is why the normal distribution is often used in inferential statistics.

---

### **15. What is the application of Z statistics in hypothesis testing?**

**Answer:**  
**Z statistics** are used in hypothesis testing to determine whether a sample mean is significantly different from a hypothesized population mean. The Z statistic is calculated by subtracting the population mean from the sample mean and dividing by the standard error of the mean, \( \sigma / \sqrt{n} \), where \( \sigma \) is the population standard deviation and \( n \) is the sample size.
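
A minimal worked sketch of a one-sample Z test is shown below. It divides the difference in means by the standard error \( \sigma / \sqrt{n} \) (the population standard deviation scaled by the square root of the sample size). All numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs for illustration
sample_mean = 52.0
population_mean = 50.0   # value claimed by the null hypothesis
population_std = 10.0    # assumed known
n = 100

standard_error = population_std / sqrt(n)
z_stat = (sample_mean - population_mean) / standard_error

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"Standard error: {standard_error}")
print(f"Z statistic: {z_stat}")
print(f"Two-sided p-value: {p_value:.4f}")
```

With these inputs the sample mean sits two standard errors above the hypothesized mean, which would be judged significant at the 5% level but not at the 1% level.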

---

### **16. How do you calculate a Z-score, and what does it represent?**

**Answer:**  
A **Z-score** is calculated as:

\[
Z = \frac{X - \mu}{\sigma}
\]

Where \( X \) is the data point, \( \mu \) is the population mean, and \( \sigma \) is the population standard deviation. It represents how many standard deviations a data point is from the mean.

---

### **17. What are point estimates and interval estimates in statistics?**

**Answer:**  
- **Point Estimate**: A single value estimate for a population parameter, such as the sample mean as an estimate for the population mean.
- **Interval Estimate**: A range of values used to estimate a population parameter, commonly represented by confidence intervals.

---

### **18. What is the significance of confidence intervals in statistical analysis?**

**Answer:**  
A **Confidence Interval** provides a range of plausible values for the true population parameter, built by a procedure that captures the parameter in a stated proportion of repeated samples (for example, 95%). It is important because it quantifies the uncertainty in the estimate and helps with decision-making.
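
One way to see what a confidence level means is to simulate many samples and count how often the interval contains the true mean. The sketch below assumes a normal population with known standard deviation and a fixed random seed; the population values, sample size, and number of trials are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so repeated runs give the same result

true_mean, sigma, n = 50.0, 10.0, 50    # assumed population and sample size
z = NormalDist().inv_cdf(0.975)          # critical value for a 95% interval
margin = z * sigma / sqrt(n)             # same width for every sample (sigma known)

trials = 1000
hits = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    centre = mean(sample)
    if centre - margin <= true_mean <= centre + margin:
        hits += 1

coverage = hits / trials
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95: the confidence level describes how often the procedure succeeds over repeated samples, not the probability for any single interval.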

---

### **19. What is the relationship between a Z-score and a confidence interval?**

**Answer:**  
The **Z-score** sets the width of a **confidence interval**: the interval is the point estimate plus or minus a critical Z value times the standard error. For example, a 95% confidence interval uses a critical Z value of approximately 1.96.
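
The value 1.96 can be recovered from the standard normal distribution, as in the sketch below. The sample mean, population standard deviation, and sample size are hypothetical values used only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

confidence = 0.95
# Critical value: the point leaving (1 - confidence) / 2 in each tail
z_critical = NormalDist().inv_cdf((1 + confidence) / 2)

# Hypothetical summary statistics
sample_mean, sigma, n = 50.0, 10.0, 100
margin = z_critical * sigma / sqrt(n)
interval = (sample_mean - margin, sample_mean + margin)

print(f"Critical Z value: {z_critical:.4f}")
print(f"95% confidence interval: ({interval[0]:.2f}, {interval[1]:.2f})")
```

Raising `confidence` to 0.99 increases the critical value and therefore widens the interval.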

---

### **20. How are Z-scores used to compare different distributions?**

**Answer:**  
**Z-scores** standardize data from different distributions, allowing comparisons of values from different datasets on a common scale. This helps in comparing values even when the distributions have different means and standard deviations.
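
For instance, scores from two exams with different scales can be compared once each is converted to a Z-score. The exam names, means, and standard deviations below are made up for illustration.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

# Hypothetical exams with different means and spreads
math_z = z_score(85, mean=75, std=10)
physics_z = z_score(70, mean=60, std=5)

print(f"Math Z-score: {math_z}")
print(f"Physics Z-score: {physics_z}")

better = "physics" if physics_z > math_z else "math"
print(f"Relative to classmates, the stronger result is in {better}.")
```

Although the raw math score is higher, the physics score is further above its own class average once spread is taken into account.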

---

### **21. What are the assumptions for applying the Central Limit Theorem?**

**Answer:**  
The main assumptions are:
1. The sample size must be sufficiently large (\( n \ge 30 \) is a common rule of thumb; heavily skewed populations may need more).
2. The samples must be independent and drawn from the same distribution.
3. The population should have a finite variance.

---

### **22. What is the concept of expected value in a probability distribution?**

**Answer:**  
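The **Expected Value** (or mean) is the long-run average value of a random variable in a probability distribution. For a discrete random variable it is calculated as the probability-weighted sum of the possible values (for a continuous variable the sum becomes an integral of \( x \) times the density):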

\[
E(X) = \sum (x_i \times P(x_i))
\]

Where \( x_i \) are the possible values and \( P(x_i) \) is the probability of each value.
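
As a small worked example, the formula can be applied to a fair six-sided die. The helper name `expected_value` and the use of `Fraction` (to keep the arithmetic exact) are choices made for this sketch.

```python
from fractions import Fraction

# Distribution of a fair die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(distribution):
    """Sum of each value multiplied by its probability."""
    return sum(x * p for x, p in distribution.items())

ev = expected_value(die)
print(f"E(X) = {ev} = {float(ev)}")
```

The result, 3.5, is not a value the die can actually show; it is the average that repeated rolls settle toward.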

---

### **23. How does a probability distribution relate to the expected outcome of a random variable?**

**Answer:**  
A **probability distribution** describes the likelihood of different outcomes, and the **expected value** is the weighted average of these outcomes, representing the long-term average if the experiment is repeated many times.

---


# **Practical:**
---
### **1. Write a Python program to generate a random variable and display its value**

```python
import random
random_variable = random.randint(1, 100)  # Generating a random integer between 1 and 100
print(f"Random Variable: {random_variable}")
```

---

### **2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF)**

```python
import matplotlib.pyplot as plt
import numpy as np

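data = np.random.randint(1, 7, size=1000)  # Simulating 1000 die rolls (upper bound 7 is exclusive)
# Centre one unit-width bin on each face so bar heights are relative frequencies
plt.hist(data, bins=np.arange(0.5, 7.5, 1), density=True, alpha=0.7, color='blue')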
plt.title('Discrete Uniform Distribution (Die Roll)')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.show()
```

---

### **3. Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution**

```python
from scipy.stats import bernoulli
def bernoulli_pdf(p, x):
    # Bernoulli is discrete, so its distribution function is a PMF
    return bernoulli.pmf(x, p)

p = 0.5  # Probability of success
x = [0, 1]  # Outcomes
pdf_values = [bernoulli_pdf(p, i) for i in x]
print(f"Bernoulli PDF: {pdf_values}")
```

---

### **4. Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram**

```python
import numpy as np
import matplotlib.pyplot as plt

n = 10
p = 0.5
data = np.random.binomial(n, p, size=1000)

# Centre one unit-width bin on each possible count (0..n)
plt.hist(data, bins=np.arange(-0.5, n + 1.5, 1), density=True, alpha=0.7, color='green')
plt.title('Binomial Distribution (n=10, p=0.5)')
plt.xlabel('Number of successes')
plt.ylabel('Probability')
plt.show()
```


---

### **5. Create a Poisson distribution and visualize it using Python**

```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters for Poisson distribution
lam = 5  # rate of occurrences
data = np.random.poisson(lam, 1000)

# Plot the distribution with one unit-width bin centred on each integer count
plt.hist(data, bins=np.arange(-0.5, data.max() + 1.5, 1), density=True, alpha=0.7, color='purple')
plt.title('Poisson Distribution (λ=5)')
plt.xlabel('Number of events')
plt.ylabel('Probability')
plt.show()
```

---

### **6. Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution**

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randint(1, 7, size=1000)  # Simulating a die roll
# Calculate the empirical CDF: cumulative share of rolls at or below each face
values, counts = np.unique(data, return_counts=True)
cdf = np.cumsum(counts) / counts.sum()

plt.step(values, cdf, where='post', color='orange')
plt.title('CDF of Discrete Uniform Distribution')
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.show()
```

---

### **7. Generate a continuous uniform distribution using NumPy and visualize it**

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate data from continuous uniform distribution between 0 and 1
data = np.random.uniform(0, 1, 1000)

# Plot the histogram
plt.hist(data, bins=30, density=True, alpha=0.7, color='brown')
plt.title('Continuous Uniform Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()
```

---

### **8. Simulate data from a normal distribution and plot its histogram**

```python
import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 data points from a normal distribution (mean=0, std=1)
data = np.random.normal(0, 1, 1000)

# Plot the histogram
plt.hist(data, bins=30, density=True, alpha=0.7, color='blue')
plt.title('Normal Distribution (mean=0, std=1)')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()
```

---

### **9. Write a Python function to calculate Z-scores from a dataset and plot them**

```python
import numpy as np
import matplotlib.pyplot as plt

# Z-score function
def calculate_z_scores(data):
    mean = np.mean(data)
    std_dev = np.std(data)
    return (data - mean) / std_dev

# Generate sample data
data = np.random.normal(10, 5, 1000)

# Calculate Z-scores
z_scores = calculate_z_scores(data)

# Plot the Z-scores
plt.hist(z_scores, bins=30, density=True, alpha=0.7, color='green')
plt.title('Z-Scores Distribution')
plt.xlabel('Z-score')
plt.ylabel('Density')
plt.show()
```

---

### **10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution.**

```python
import numpy as np
import matplotlib.pyplot as plt

# Non-normal distribution (Exponential distribution)
data = np.random.exponential(scale=1, size=10000)

# Draw random samples of size 30 and calculate the mean
sample_means = [np.mean(np.random.choice(data, size=30)) for _ in range(1000)]

# Plot the sampling distribution of the sample means
plt.hist(sample_means, bins=30, density=True, alpha=0.7, color='red')
plt.title('Sampling Distribution of the Sample Mean (CLT)')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.show()
```

---

### **11. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem**

```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters for the normal distribution
mu = 0   # mean
sigma = 1  # standard deviation
sample_size = 30
n_samples = 1000

# Simulate multiple samples and compute their means
sample_means = [np.mean(np.random.normal(mu, sigma, sample_size)) for _ in range(n_samples)]

# Plot the histogram of the sample means
plt.hist(sample_means, bins=30, density=True, alpha=0.7, color='purple')
plt.title('Sampling Distribution of Sample Means - Central Limit Theorem')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.show()
```

---

### **12. Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1)**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Create a range of values
x = np.linspace(-5, 5, 1000)

# Calculate the PDF for standard normal distribution
pdf_values = norm.pdf(x, 0, 1)

# Plot the standard normal distribution
plt.plot(x, pdf_values, label="Standard Normal Distribution")
plt.title('Standard Normal Distribution (mean=0, std=1)')
plt.xlabel('Z-score')
plt.ylabel('Density')
plt.legend()
plt.show()
```

---

### **13. Generate random variables and calculate their corresponding probabilities using the binomial distribution**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Parameters for binomial distribution
n = 10  # Number of trials
p = 0.5  # Probability of success
size = 1000  # Number of samples

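# Generate random variables from binomial distribution
data = np.random.binomial(n, p, size)

# Look up the probability of each generated value
probabilities = binom.pmf(data, n, p)
print(f"First 5 samples: {data[:5]}")
print(f"Their probabilities: {probabilities[:5]}")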

# Plot the probability mass function (PMF)
x = np.arange(0, n+1)
pmf_values = binom.pmf(x, n, p)

plt.bar(x, pmf_values, color='blue', alpha=0.7)
plt.title('Binomial Distribution PMF')
plt.xlabel('Number of successes')
plt.ylabel('Probability')
plt.show()
```

---

### **14. Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution**

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Parameters for the normal distribution
mu = 10
sigma = 5

# Data point for which we need the Z-score
data_point = 12

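# Calculate the Z-score
z_score = (data_point - mu) / sigma
print(f'Z-score for {data_point}: {z_score}')
# Compare with the standard normal: share of values at or below this Z-score
print(f'P(Z <= {z_score}): {norm.cdf(z_score):.4f}')

# Plot the normal distribution the data point is drawn from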
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 1000)
pdf_values = norm.pdf(x, mu, sigma)

plt.plot(x, pdf_values, label="Normal Distribution (mean=10, std=5)")
plt.axvline(data_point, color='red', linestyle='--', label=f'Data Point ({data_point})')
plt.title('Z-score Calculation and Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.show()
```

---

### **15. Implement hypothesis testing using Z-statistics for a sample dataset**

```python
import numpy as np
import scipy.stats as stats

# Sample data
sample_data = np.random.normal(50, 10, 100)  # mean=50, std=10, sample size=100

# Hypothesis test
sample_mean = np.mean(sample_data)
population_mean = 50
std_dev = np.std(sample_data, ddof=1)  # Sample standard deviation
n = len(sample_data)

# Z-statistic
z_stat = (sample_mean - population_mean) / (std_dev / np.sqrt(n))

# Calculate the p-value
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

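print(f'Z-statistic: {z_stat}')
print(f'P-value: {p_value}')

# Decision at the 5% significance level
alpha = 0.05
print('Reject H0' if p_value < alpha else 'Fail to reject H0')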
```

---

### **16. Create a confidence interval for a dataset using Python and interpret the result**

```python
import numpy as np
import scipy.stats as stats

# Sample data
data = np.random.normal(50, 10, 100)  # mean=50, std=10, sample size=100

# Calculate mean and standard error
mean = np.mean(data)
std_error = np.std(data, ddof=1) / np.sqrt(len(data))

# Confidence level (95%)
confidence_level = 0.95
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Confidence interval
lower_bound = mean - z_score * std_error
upper_bound = mean + z_score * std_error

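print(f'95% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f})')
# Interpretation: intervals built this way capture the true mean in about 95% of repeated samples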
```

---
