# Statistics Module 2

Q1. What is a random variable in probability theory?

ans-  In probability theory, a random variable is a variable whose value is subject to variations due to chance (randomness). It's essentially a function that maps the outcomes of a random phenomenon to numerical values. For example, if you flip a coin twice, the number of heads (0, 1, or 2) is a random variable. Random variables can be discrete (taking a finite or countably infinite number of values) or continuous (taking any value within a given range).
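To make the "function on outcomes" idea concrete, here is a minimal sketch of the two-coin-flip example in plain Python. It enumerates the sample space, applies a hypothetical helper `num_heads` (the random variable X) to each outcome, and adds up the probabilities of the outcomes that map to each value. It assumes a fair coin, so all four outcomes are equally likely.

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Sample space for two flips: ('H','H'), ('H','T'), ('T','H'), ('T','T')
sample_space = list(product("HT", repeat=2))

def num_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Assuming a fair coin, each of the 4 outcomes has probability 1/4
p_outcome = Fraction(1, len(sample_space))

# Distribution of X: sum the probabilities of outcomes mapping to each value
distribution = Counter()
for outcome in sample_space:
    distribution[num_heads(outcome)] += p_outcome

for value in sorted(distribution):
    print(f"P(X = {value}) = {distribution[value]}")
# P(X = 0) = 1/4
# P(X = 1) = 1/2
# P(X = 2) = 1/4
```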

Q2. What are the types of random variables?

ans-  Random variables can generally be classified into two main types:

Discrete Random Variable: A discrete random variable is one that can take on a finite or countably infinite number of values. These values are often integers and are typically the result of counting. Examples include the number of heads in a coin flip, the number of cars passing a certain point in an hour, or the number of defective items in a sample.

Continuous Random Variable: A continuous random variable is one that can take on any value within a given range or interval. These values are typically the result of measuring. Examples include the height of a person, the temperature of a room, the time it takes to complete a task, or the amount of rainfall in a day.
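A quick way to see the difference is to simulate both kinds with NumPy's random `Generator`. In this sketch the "10 coin flips per trial" setup and the Normal(170, 10) height model are illustrative assumptions, not data from the text: the discrete draws are whole-number counts, while the continuous draws are arbitrary real-valued measurements.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# Discrete: number of heads in 10 fair coin flips (a count, an integer 0..10)
heads = rng.binomial(n=10, p=0.5, size=5)

# Continuous: heights in cm under an assumed Normal(mean=170, sd=10) model
heights = rng.normal(loc=170, scale=10, size=5)

print("Discrete (counts):        ", heads)
print("Continuous (measurements):", np.round(heights, 3))
```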

Q3. Explain the difference between discrete and continuous distributions.

ans-  
- Discrete Distributions: These describe random variables that can only take on a finite or countably infinite number of distinct values. Think of things you can count, like the number of heads in coin flips, the number of defects in a batch, or the number of cars passing a point. The probability is associated with specific, individual values. Examples include the Bernoulli, Binomial, Poisson, and Geometric distributions.

- Continuous Distributions: These describe random variables that can take on any value within a given range or interval. Think of things you measure, like height, weight, temperature, or time. For continuous variables, the probability of a random variable taking on any exact single value is zero. Instead, probabilities are associated with intervals (e.g., the probability that a person's height is between 170 cm and 180 cm). Examples include the Normal (Gaussian), Uniform, Exponential, and Chi-Squared distributions.
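The contrast between point probabilities and interval probabilities can be sketched with `scipy.stats`. The discrete case uses a Binomial(10, 0.5) PMF; for the continuous case, heights are modelled as Normal(175, 7) in cm, which is a hypothetical choice made only to illustrate the 170 to 180 cm interval mentioned above.

```python
from scipy import stats

# Discrete: a PMF assigns positive probability to individual values.
# Probability of exactly 3 heads in 10 fair coin flips.
p_three_heads = stats.binom.pmf(3, n=10, p=0.5)

# Continuous: heights modelled (hypothetically) as Normal(mean=175, sd=7) in cm.
height = stats.norm(loc=175, scale=7)

# An exact value has probability zero; intervals carry probability,
# computed as a difference of CDF values: P(170 <= X <= 180) = F(180) - F(170).
p_170_180 = height.cdf(180) - height.cdf(170)

# The PDF value at a point is a density, not a probability.
density_at_175 = height.pdf(175)

print(f"P(X = 3 heads)          = {p_three_heads:.4f}")
print(f"P(170 <= height <= 180) = {p_170_180:.4f}")
print(f"pdf(175) (a density)    = {density_at_175:.4f}")
```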


Q4. What is a binomial distribution, and how is it used in probability?

ans-  The binomial distribution is a discrete probability distribution that models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure) and the probability of success is constant for each trial. It is used in probability to calculate the likelihood of observing a specific number of successes within a set number of attempts.

The probability mass function (PMF) for a binomial distribution is given by:

P(X = k) = C(n, k) * p^k * (1-p)^(n-k)

Where:

- P(X = k): The probability of getting exactly 'k' successes.
- C(n, k): The binomial coefficient, the number of ways to choose 'k' successes from 'n' trials (read as "n choose k"), equal to n! / (k! (n-k)!).
- p: The probability of success on a single trial.
- (1-p): The probability of failure on a single trial.
- n: The total number of trials.
- k: The number of successes.
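The PMF translates directly into code. This sketch computes C(n, k) * p^k * (1-p)^(n-k) with Python's `math.comb` and, as a cross-check, compares it with SciPy's `scipy.stats.binom.pmf`. The values n = 10 and p = 0.5 (ten fair coin flips) are only an illustration, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb
from scipy import stats

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # illustrative: 10 fair coin flips

for k in range(n + 1):
    manual = binomial_pmf(k, n, p)
    library = stats.binom.pmf(k, n, p)
    print(f"k={k:2d}  formula={manual:.6f}  scipy={library:.6f}")

# Exactly 3 heads in 10 flips = C(10, 3) / 2**10 = 120 / 1024
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```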

Q5. What is the standard normal distribution, and why is it important?

ans-  The standard normal distribution is the normal distribution with a mean (μ) of 0 and a standard deviation (σ) of 1. It's important because it allows us to:

- Standardize data: Convert any normally distributed value to this common form using the Z-score Z = (X - μ) / σ, making values from different datasets comparable.
- Calculate probabilities: Because every normal distribution can be standardized, a single table (or CDF function in software) of areas under the standard normal curve, whose total area is 1, gives probabilities for any normal distribution.
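Standardization can be checked numerically. As a sketch, assuming X follows a normal distribution with μ = 50 and σ = 5 (the same parameters used in Q9), P(X < 60) can be found either by converting 60 to a Z-score and using the standard normal CDF, or by asking `scipy.stats.norm` for the non-standard normal directly; both routes give the same answer.

```python
from scipy import stats

mu, sigma = 50, 5   # assumed parameters of X ~ N(50, 5^2)
x = 60

# Z-score: how many standard deviations x lies from the mean
z = (x - mu) / sigma          # 2.0

# P(X < 60) via the standard normal CDF ...
p_standard = stats.norm.cdf(z)
# ... and directly from N(50, 5^2); the two must agree
p_direct = stats.norm.cdf(x, loc=mu, scale=sigma)

print(f"z = {z}")
print(f"P(Z < {z}) = {p_standard:.4f}")
print(f"P(X < {x}) = {p_direct:.4f}")
```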

Q6. What is the Central Limit Theorem (CLT), and why is it critical in statistics?

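ans-  The Central Limit Theorem (CLT) states that, for independent random samples drawn from a population with a finite mean and variance, the sampling distribution of the sample mean (or sum) is approximately normal when the sample size is sufficiently large, regardless of the shape of the population distribution. The approximation improves as the sample size increases; a common rule of thumb is n ≥ 30, though strongly skewed populations may need larger samples.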

It is critical in statistics because:

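1.  Enables inference: We can make probability statements about a population mean from a single sample, using a normal approximation with standard error σ/√n, even when the population itself is not normal.
2.  Foundation for hypothesis testing: z-tests, t-tests, and confidence intervals for means and proportions rely on the (approximate) normality of the sampling distribution.
3.  Simplifies analysis: One well-understood distribution (the normal) can stand in for a population distribution that is often unknown.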
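The CLT can be checked by simulation. This sketch draws many samples of size n = 50 from a strongly right-skewed exponential population (mean 1, standard deviation 1, an arbitrary choice) and examines the sample means: their average should be close to the population mean, their spread close to σ/√n ≈ 0.141, and their skewness far smaller than the population's. The seed, sample size, and number of repetitions are all illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

n = 50              # size of each sample
n_samples = 10_000  # number of repeated samples

# Population: Exponential(scale=1) has mean 1, sd 1, and is heavily right-skewed
samples = rng.exponential(scale=1.0, size=(n_samples, n))

# One sample mean per row
sample_means = samples.mean(axis=1)

print(f"Mean of sample means: {sample_means.mean():.4f}  (population mean 1)")
print(f"SD of sample means:   {sample_means.std(ddof=1):.4f}  "
      f"(sigma/sqrt(n) = {1 / np.sqrt(n):.4f})")
print(f"Skewness of population draws: {stats.skew(samples.ravel()):.3f}")
print(f"Skewness of sample means:     {stats.skew(sample_means):.3f}")
```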

Q7. What is the significance of confidence intervals in statistical analysis?

ans-  Confidence intervals are crucial in statistics because they provide a range of plausible values for an unknown population parameter (like a mean or proportion), rather than just a single point estimate. This range comes with a specified confidence level (e.g., 95%), indicating how often this type of interval would capture the true parameter if the experiment were repeated many times.
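The "repeated many times" interpretation can be demonstrated directly. In this sketch, assuming a normal population whose true mean is 100 and standard deviation 15 (values chosen only for illustration), we repeatedly draw samples of size 25, build a 95% t-interval from each, and count how often the interval contains the true mean. The proportion should come out close to 0.95.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

true_mean, true_sd = 100, 15   # assumed population parameters
n, reps, confidence = 25, 2_000, 0.95

# Two-sided t critical value with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, true_sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)       # estimated standard error
    lower = sample.mean() - t_crit * se
    upper = sample.mean() + t_crit * se
    covered += int(lower <= true_mean <= upper)

coverage = covered / reps
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```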

Q8. What is the concept of expected value in a probability distribution?

ans-  The expected value of a probability distribution is the long-run average value of a random variable. It represents the weighted average of all possible outcomes, where each outcome's weight is its probability.
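For a discrete random variable this weighted average is E[X] = Σ x · P(X = x). A minimal sketch for a fair six-sided die (an illustrative example, not one from the text) uses exact fractions so the result is not blurred by floating-point rounding, then simulates many rolls to show the long-run average approaching the same value.

```python
from fractions import Fraction
import numpy as np

values = range(1, 7)
probs = {x: Fraction(1, 6) for x in values}  # fair die: each face has probability 1/6

# E[X] = sum over outcomes of value * probability
expected = sum(x * p for x, p in probs.items())
print(f"E[X] = {expected} = {float(expected)}")   # E[X] = 7/2 = 3.5

# Long-run average of many simulated rolls
rng = np.random.default_rng(seed=7)
rolls = rng.integers(1, 7, size=100_000)  # upper bound 7 is exclusive
print(f"Average of 100,000 rolls: {rolls.mean():.3f}")
```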

Q9. Write a Python program to generate 1000 random numbers from a normal distribution with mean = 50 and standard deviation = 5. Compute its mean and standard deviation using NumPy, and draw a histogram to visualize the distribution.

ans-  
```python
import numpy as np
import matplotlib.pyplot as plt

# Parameters of the target normal distribution
mean = 50
std_dev = 5
num_samples = 1000

# Draw 1000 values from N(50, 5^2)
random_numbers = np.random.normal(loc=mean, scale=std_dev, size=num_samples)

print(f"Generated {num_samples} random numbers.")
print(f"First 5 numbers: {random_numbers[:5]}")

# Sample statistics: close to, but not exactly, 50 and 5.
# np.std defaults to ddof=0 (population formula); pass ddof=1 for the sample SD.
computed_mean = np.mean(random_numbers)
computed_std_dev = np.std(random_numbers)

print(f"Computed Mean: {computed_mean:.2f}")
print(f"Computed Standard Deviation: {computed_std_dev:.2f}")

# Histogram normalised to a density (total area 1)
plt.figure(figsize=(10, 6))
plt.hist(random_numbers, bins=30, density=True, alpha=0.6, color='g', edgecolor='black')
plt.title('Histogram of Random Numbers from Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Probability Density')
plt.grid(axis='y', alpha=0.75)
plt.show()
```


Q10. You are working as a data analyst for a retail company. The company has collected daily sales data for 2 years and wants you to identify the overall sales trend.

`daily_sales = [220, 245, 210, 265, 230, 250, 260, 275, 240, 255, 235, 260, 245, 250, 225, 270, 265, 255, 250, 260]`

- Explain how you would apply the Central Limit Theorem to estimate the average sales with a 95% confidence interval.
- Write the Python code to compute the mean sales and its confidence interval.


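ans-  By the Central Limit Theorem, the sample mean of the daily sales is approximately normally distributed around the true average daily sales, with standard error SE = s/√n (s = sample standard deviation, n = number of days). A 95% confidence interval is therefore the sample mean ± critical value × SE. Because only n = 20 days are given and the population standard deviation is unknown, the code uses the t distribution with n - 1 = 19 degrees of freedom for the critical value instead of the normal value 1.96; this assumes the daily sales are roughly independent and not strongly skewed.

```python
import numpy as np
from scipy import stats

daily_sales = [
    220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
    235, 260, 245, 250, 225, 270, 265, 255, 250, 260
]

# Point estimate of the average daily sales
mean_sales = np.mean(daily_sales)

# Sample standard deviation (ddof=1 divides by n - 1)
std_dev_sales = np.std(daily_sales, ddof=1)

n = len(daily_sales)

confidence_level = 0.95
alpha = 1 - confidence_level

# Degrees of freedom for the t distribution
df = n - 1

# Two-sided critical value: leaves alpha/2 in each tail
t_critical = stats.t.ppf(1 - alpha / 2, df)

# Standard error of the mean, s / sqrt(n)
std_err_mean = std_dev_sales / np.sqrt(n)

margin_of_error = t_critical * std_err_mean

confidence_interval_lower = mean_sales - margin_of_error
confidence_interval_upper = mean_sales + margin_of_error

print(f"Daily Sales Data: {daily_sales}")
print(f"\nSample Mean Sales: {mean_sales:.2f}")
print(f"Sample Standard Deviation: {std_dev_sales:.2f}")
print(f"Sample Size (n): {n}")
print(f"t-critical value for {confidence_level*100:.0f}% CI: {t_critical:.3f}")
print(f"Standard Error of the Mean: {std_err_mean:.2f}")
print(f"Margin of Error: {margin_of_error:.2f}")
print(f"\n{confidence_level*100:.0f}% Confidence Interval for Average Sales: ")
print(f"[{confidence_interval_lower:.2f}, {confidence_interval_upper:.2f}]")
```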
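As a cross-check of the manual calculation, SciPy can build the same t-interval in one call: `scipy.stats.sem` returns s/√n (it uses `ddof=1` by default) and `scipy.stats.t.interval` returns the lower and upper bounds. The confidence level is passed positionally here because its keyword name differs across SciPy versions.

```python
import numpy as np
from scipy import stats

daily_sales = [
    220, 245, 210, 265, 230, 250, 260, 275, 240, 255,
    235, 260, 245, 250, 225, 270, 265, 255, 250, 260,
]

mean_sales = np.mean(daily_sales)
sem = stats.sem(daily_sales)          # s / sqrt(n), with ddof=1
df = len(daily_sales) - 1

# 95% t-interval centred on the sample mean
lower, upper = stats.t.interval(0.95, df, loc=mean_sales, scale=sem)
print(f"Mean: {mean_sales:.2f}")      # Mean: 248.25
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```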