
1. What is a random variable in probability theory?


A random variable is a function that assigns a real number to each outcome in the sample space of a random phenomenon, which makes uncertain outcomes quantifiable. For example, the number of heads in three coin flips is a random variable that takes the values 0 to 3.

2. What are the types of random variables?


There are two main types of random variables:

Discrete Random Variables: These can take on a countable number of values (e.g., the number of heads in a series of coin flips).

Continuous Random Variables: These can take on any value within a given range, which is an uncountably infinite set (e.g., the height of individuals).

3. What is the difference between discrete and continuous distributions?


Discrete Distributions: These describe the probabilities of discrete random variables. The probability mass function (PMF) is used to define the probabilities.

Continuous Distributions: These describe the probabilities of continuous random variables. The probability density function (PDF) is used, and probabilities are calculated over intervals.

4. What are probability distribution functions (PDF)?


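A probability distribution function describes how probability is spread over the values a random variable can take. For discrete variables, it is the probability mass function (PMF), which gives the probability of each exact value. For continuous variables, it is the probability density function (PDF), whose values are densities rather than probabilities; a probability is obtained by integrating the density over an interval.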

5. How do cumulative distribution functions (CDF) differ from probability distribution functions (PDF)?


The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a certain value. It is the integral of the PDF for continuous variables or the sum of the PMF for discrete variables.
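The discrete case can be checked with a small sketch. It assumes a fair six-sided die as the example (not taken from the answer above) and builds the CDF as a running sum of the PMF, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of a fair six-sided die (illustrative assumption): each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6) for _ in faces]

# The CDF at each face is the running sum of the PMF up to and including that face.
cdf = list(accumulate(pmf))

for face, prob in zip(faces, cdf):
    print(f"P(X <= {face}) = {prob}")
```

The last CDF value is 1, as it must be for any distribution.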

6. What is a discrete uniform distribution?


A discrete uniform distribution is a type of probability distribution where all outcomes are equally likely. For example, rolling a fair die results in a discrete uniform distribution with six equally likely outcomes.

7. What are the key properties of a Bernoulli distribution?


A Bernoulli distribution has two possible outcomes: success (1) and failure (0). Key properties include:

The probability of success is denoted by $p$.

The mean is $p$ and the variance is $p(1-p)$.
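These two formulas can be verified directly from the definitions of mean and variance. The sketch below assumes an illustrative success probability of 0.3; the variable names are hypothetical.

```python
p = 0.3  # assumed probability of success, chosen only for illustration

# A Bernoulli variable takes the value 1 with probability p and 0 with probability 1 - p.
outcomes = {1: p, 0: 1 - p}

# Mean: sum of value * probability.
mean = sum(x * prob for x, prob in outcomes.items())

# Variance: sum of squared deviation from the mean * probability.
variance = sum((x - mean) ** 2 * prob for x, prob in outcomes.items())

print(f"mean = {mean:.2f}, p = {p:.2f}")
print(f"variance = {variance:.2f}, p(1-p) = {p * (1 - p):.2f}")
```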

8. What is the binomial distribution, and how is it used in probability?


The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials. It is characterized by two parameters: $n$ (number of trials) and $p$ (probability of success). It is used in scenarios like flipping a coin multiple times.
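As a minimal sketch, the binomial probabilities can be computed from the standard formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The helper name `binomial_pmf` and the choice of 10 fair coin flips are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative case: 10 flips of a fair coin.
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X = 5) = {probs[5]:.4f}")          # 0.2461
print(f"Sum of all probabilities = {sum(probs):.4f}")  # 1.0000
```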

9. What is the Poisson distribution and where is it applied?


The Poisson distribution models the number of events occurring in a fixed interval of time or space, given that these events happen with a known constant mean rate and independently of the time since the last event. It is often used in fields like telecommunications and traffic flow.
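A short sketch of the Poisson probabilities, using the standard formula $P(X = k) = \lambda^k e^{-\lambda} / k!$. The rate of 4 events per interval and the helper name `poisson_pmf` are assumptions for illustration; the sum is truncated at 60 terms, which is enough for the tail to be negligible at this rate.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events when events occur at a constant mean rate."""
    return rate**k * exp(-rate) / factorial(k)

rate = 4  # assumed average number of events per interval

# Truncate the infinite support at 60; the remaining tail is negligible for rate = 4.
probs = [poisson_pmf(k, rate) for k in range(60)]
total = sum(probs)
mean = sum(k * pk for k, pk in enumerate(probs))

print(f"P(X = 0) = {probs[0]:.4f}")
print(f"Total probability = {total:.6f}")
print(f"Mean = {mean:.6f}")
```

The computed mean matches the rate, which is a defining property of the Poisson distribution.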

10. What is a continuous uniform distribution?


A continuous uniform distribution is a type of distribution where all intervals of the same length within the range are equally probable. It is defined by two parameters: the minimum and maximum values.

11. What are the characteristics of a normal distribution?


A normal distribution is symmetric and bell-shaped, characterized by its mean (µ) and standard deviation (σ). Key properties include:

Approximately 68% of the data falls within one standard deviation of the mean.

Approximately 95% falls within two standard deviations.

Approximately 99.7% falls within three standard deviations.
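These percentages can be reproduced from the normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) rather than SciPy, purely to keep the example dependency-free.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean: CDF(k) - CDF(-k).
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"Within {k} standard deviation(s): {prob:.4f}")
```

Because any normal distribution can be standardized, the same proportions hold for every mean and standard deviation.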

12. What is the standard normal distribution, and why is it important?


The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is important because it allows for the standardization of scores (Z-scores), enabling comparison across different normal distributions.

13. What is the Central Limit Theorem (CLT), and why is it critical in statistics?


The Central Limit Theorem states that the distribution of the sample mean of independent observations approaches a normal distribution as the sample size increases, whatever the shape of the population distribution, provided the population has a finite variance. It is critical because it justifies the use of the normal distribution in inferential statistics.

14. How does the Central Limit Theorem relate to the normal distribution?


The CLT indicates that as the sample size increases, the sampling distribution of the sample mean will approximate a normal distribution, even if the population distribution is not normal.


15. What is the application of Z statistics in hypothesis testing?


Z statistics are used in hypothesis testing to measure how far a sample mean is from the hypothesized population mean in units of the standard error ($\sigma / \sqrt{n}$). The resulting Z value is compared with a critical value, or converted to a p-value, to decide whether to reject the null hypothesis.

16. How do you calculate a Z-score, and what does it represent?


A Z-score is calculated using the formula

$$Z = \frac{X - \mu}{\sigma}$$

where $X$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation. The Z-score represents the number of standard deviations a data point is from the mean, indicating how unusual or typical the data point is.

17. What are point estimates and interval estimates in statistics?


Point Estimates: A single value estimate of a population parameter (e.g., sample mean as an estimate of population mean).

Interval Estimates: A range of values (confidence interval) within which the population parameter is expected to lie, providing more information than a point estimate.

18. What is the significance of confidence intervals in statistical analysis?


Confidence intervals provide a range of values that likely contain the population parameter, allowing researchers to quantify the uncertainty associated with point estimates. They help in making inferences about the population based on sample data.

19. What is the relationship between a Z-score and a confidence interval?


A Z-score is used to determine the critical values that define the boundaries of a confidence interval. For example, a Z-score corresponding to a 95% confidence level is used to calculate the margin of error around the sample mean.
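A minimal sketch of this relationship follows. The sample mean, standard deviation, and sample size are made-up illustrative numbers, and the critical value comes from the inverse CDF in `statistics.NormalDist` (standard library, Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist

confidence = 0.95

# Two-sided critical value: the Z-score that leaves (1 - confidence) / 2 in each tail.
z_critical = NormalDist().inv_cdf((1 + confidence) / 2)

# Illustrative sample summary (assumed values, not real data).
sample_mean, sample_std, n = 50.0, 10.0, 100

# Margin of error = critical Z-score * standard error of the mean.
margin = z_critical * sample_std / sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"Critical Z for {confidence:.0%}: {z_critical:.3f}")
print(f"Confidence interval: ({lower:.2f}, {upper:.2f})")
```

A higher confidence level gives a larger critical Z-score and therefore a wider interval.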

20. How are Z-scores used to compare different distributions?


Z-scores standardize different datasets, allowing for comparison across distributions with different means and standard deviations. By converting values to Z-scores, one can assess relative positions within different distributions.
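As a small worked example, consider two exam scores from courses with different grading scales. All numbers below are hypothetical, and `z_score` is a helper defined only for this sketch.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical results: the raw math score is higher, but the scales differ.
math_z = z_score(80, mu=70, sigma=10)    # 1 standard deviation above the math mean
physics_z = z_score(75, mu=60, sigma=5)  # 3 standard deviations above the physics mean

print(f"Math Z-score: {math_z:.1f}")
print(f"Physics Z-score: {physics_z:.1f}")
```

Although 80 is the larger raw score, the physics result stands out more within its own distribution.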

21. What are the assumptions for applying the Central Limit Theorem?


The main assumptions include:

The samples must be independent.

The sample size should be sufficiently large (typically $n \geq 30$).

The population from which samples are drawn should have a finite mean and variance.

22. What is the concept of expected value in a probability distribution?


The expected value is the long-term average or mean of a random variable, calculated as the sum of all possible values weighted by their probabilities. It represents the center of the distribution.
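The weighted sum can be written out directly. This sketch assumes a fair six-sided die as the random variable and uses exact fractions so the result is not affected by rounding.

```python
from fractions import Fraction

# Fair six-sided die (illustrative assumption): values 1..6, each with probability 1/6.
values = range(1, 7)
prob = Fraction(1, 6)

# Expected value: each possible value weighted by its probability, then summed.
expected_value = sum(v * prob for v in values)

print(f"E[X] = {expected_value} = {float(expected_value)}")
```

The result, 3.5, is not a value the die can show; the expected value describes the long-run average rather than a single outcome.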

23. How does a probability distribution relate to the expected outcome of a random variable?

A probability distribution provides the probabilities of all possible outcomes of a random variable, and the expected value is the probability-weighted average of these outcomes, indicating the average result to expect over many trials.

# ***PRACTICAL***

1. Write a Python program to generate a random variable and display its value.



In [None]:
import random

value = random.randint(1, 100)
print("Random variable value:", value)


2. Generate a discrete uniform distribution using Python and plot the probability mass function (PMF).


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import randint

low, high = 1, 6
rv = randint(low, high + 1)
x = np.arange(low, high + 1)
pmf = rv.pmf(x)

plt.stem(x, pmf, basefmt=" ")
plt.xlabel('Value')
plt.ylabel('PMF')
plt.title('Discrete Uniform Distribution PMF')
plt.show()


3. Write a Python function to calculate the probability distribution function (PDF) of a Bernoulli distribution.


In [None]:
from scipy.stats import bernoulli

def bernoulli_pdf(p, x):
    return bernoulli.pmf(x, p)

print("Bernoulli PDF for p=0.5, x=1:", bernoulli_pdf(0.5, 1))

4. Write a Python script to simulate a binomial distribution with n=10 and p=0.5, then plot its histogram.


In [None]:
n, p = 10, 0.5
binomial_data = np.random.binomial(n, p, 1000)

plt.hist(binomial_data, bins=range(n + 2), align='left', rwidth=0.8)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Binomial Distribution (n=10, p=0.5)')
plt.show()


5. Create a Poisson distribution and visualize it using Python.


In [None]:
from scipy.stats import poisson

mu = 4
poisson_data = poisson.rvs(mu, size=1000)

plt.hist(poisson_data, bins=range(min(poisson_data), max(poisson_data) + 2), align='left', rwidth=0.8)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Poisson Distribution (mu=4)')
plt.show()


6. Write a Python program to calculate and plot the cumulative distribution function (CDF) of a discrete uniform distribution.


In [None]:
cdf = rv.cdf(x)

plt.step(x, cdf, where='post')
plt.xlabel('Value')
plt.ylabel('CDF')
plt.title('Discrete Uniform Distribution CDF')
plt.show()


7. Generate a continuous uniform distribution using NumPy and visualize it.


In [None]:
uniform_data = np.random.uniform(0, 1, 1000)

plt.hist(uniform_data, bins=20, density=True, alpha=0.7)
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Continuous Uniform Distribution')
plt.show()


8. Simulate data from a normal distribution and plot its histogram.


In [None]:
normal_data = np.random.normal(0, 1, 1000)

plt.hist(normal_data, bins=20, density=True, alpha=0.7)
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution')
plt.show()


9. Write a Python function to calculate Z-scores from a dataset and plot them.


In [None]:
def calculate_z_scores(data):
    mean = np.mean(data)
    std = np.std(data)
    z_scores = (data - mean) / std
    return z_scores

data = np.random.normal(50, 10, 1000)
z_scores = calculate_z_scores(data)

plt.hist(z_scores, bins=20, density=True, alpha=0.7)
plt.xlabel('Z-score')
plt.ylabel('Density')
plt.title('Z-scores of Dataset')
plt.show()


10. Implement the Central Limit Theorem (CLT) using Python for a non-normal distribution.


In [None]:
import seaborn as sns

# Generate a non-normal distribution (exponential)
non_normal_data = np.random.exponential(scale=2, size=10000)

# Simulate the CLT by taking sample means
sample_means = [np.mean(np.random.choice(non_normal_data, size=30)) for _ in range(1000)]

# Plot original data and sample means
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
sns.histplot(non_normal_data, bins=30, kde=True)
plt.title('Original Non-Normal Distribution')

plt.subplot(1, 2, 2)
sns.histplot(sample_means, bins=30, kde=True)
plt.title('Sample Means (CLT Applied)')

plt.show()


15. Simulate multiple samples from a normal distribution and verify the Central Limit Theorem.


In [None]:
sample_means = [np.mean(np.random.normal(0, 1, 30)) for _ in range(1000)]

plt.hist(sample_means, bins=30, density=True, alpha=0.7)
plt.title('CLT Verification with Normal Distribution')
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.show()


16. Write a Python function to calculate and plot the standard normal distribution (mean = 0, std = 1).


In [None]:
from scipy.stats import norm

def plot_standard_normal():
    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x, 0, 1)

    plt.plot(x, y)
    plt.title('Standard Normal Distribution (mean=0, std=1)')
    plt.xlabel('Z-score')
    plt.ylabel('Probability Density')
    plt.grid()
    plt.show()

plot_standard_normal()


17. Generate random variables and calculate their corresponding probabilities using the binomial distribution.


In [None]:
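from scipy.stats import binom

n, p = 10, 0.5
binom_rv = np.random.binomial(n, p, size=1000)  # simulated draws
x = np.arange(0, n + 1)
pmf = binom.pmf(x, n, p)  # theoretical probability of each possible value

# Compare the simulated relative frequencies with the theoretical PMF
plt.hist(binom_rv, bins=range(n + 2), align='left', rwidth=0.8, density=True, label='Simulated')
plt.plot(x, pmf, 'ro', label='Theoretical PMF')
plt.title('Binomial Distribution PMF')
plt.xlabel('Value')
plt.ylabel('Probability')
plt.legend()
plt.show()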


18. Write a Python program to calculate the Z-score for a given data point and compare it to a standard normal distribution.


In [None]:
def z_score(data_point, mean, std):
    return (data_point - mean) / std

data = np.random.normal(50, 10, 100)
mean, std = np.mean(data), np.std(data)

z = z_score(60, mean, std)
print(f"Z-score for data point 60: {z}")

# Plotting comparison
x = np.linspace(-4, 4, 1000)
y = norm.pdf(x, 0, 1)

plt.plot(x, y, label="Standard Normal Distribution")
plt.axvline(z, color='r', linestyle='--', label=f"Z = {z:.2f}")
plt.legend()
plt.show()


19. Implement hypothesis testing using Z-statistics for a sample dataset.


In [None]:
from scipy.stats import norm

# Example dataset
sample_mean = 52
population_mean = 50
std_dev = 10
sample_size = 30

# Calculate Z-statistic
z_stat = (sample_mean - population_mean) / (std_dev / np.sqrt(sample_size))
p_value = 2 * (1 - norm.cdf(abs(z_stat)))

print(f"Z-statistic: {z_stat:.2f}")
print(f"P-value: {p_value:.4f}")


20. Create a confidence interval for a dataset using Python and interpret the result.


In [None]:
def confidence_interval(data, confidence=0.95):
    mean = np.mean(data)
    std_err = np.std(data, ddof=1) / np.sqrt(len(data))
    margin = std_err * norm.ppf((1 + confidence) / 2)

    return mean - margin, mean + margin

data = np.random.normal(50, 10, 100)
ci = confidence_interval(data)

print(f"95% Confidence Interval: {ci}")


21. Generate data from a normal distribution, then calculate and interpret the confidence interval for its mean.


In [None]:
data = np.random.normal(100, 15, 200)
ci = confidence_interval(data)

print(f"95% Confidence Interval for Mean: {ci}")


22. Write a Python script to calculate and visualize the probability density function (PDF) of a normal distribution.


In [None]:
x = np.linspace(-4, 4, 1000)
pdf = norm.pdf(x, 0, 1)

plt.plot(x, pdf)
plt.title('Probability Density Function of Normal Distribution')
plt.xlabel('Value')
plt.ylabel('Density')
plt.grid()
plt.show()


23. Use Python to calculate and interpret the cumulative distribution function (CDF) of a Poisson distribution.


In [None]:
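from scipy.stats import poisson

mu = 4
x = np.arange(0, 15)  # the Poisson distribution is defined on non-negative integers
poisson_cdf = poisson.cdf(x, mu)

# Interpretation: the CDF at k is the probability of observing at most k events
print(f"P(X <= 4) = {poisson.cdf(4, mu):.4f}")

plt.step(x, poisson_cdf, where='post')
plt.title('Poisson Distribution CDF (mu=4)')
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.show()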


24. Simulate a random variable using a continuous uniform distribution and calculate its expected value.


In [None]:
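low, high = 0, 10
uniform_data = np.random.uniform(low, high, 1000)
expected_value = np.mean(uniform_data)  # the sample mean estimates the expected value
theoretical_value = (low + high) / 2    # E[X] = (a + b) / 2 for a uniform distribution

print("Expected Value (sample estimate):", expected_value)
print("Expected Value (theoretical):", theoretical_value)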


25. Write a Python program to compare the standard deviations of two datasets and visualize the difference.


In [None]:
data1 = np.random.normal(50, 10, 100)
data2 = np.random.normal(60, 15, 100)

std1, std2 = np.std(data1), np.std(data2)
print(f"Standard Deviation of Dataset 1: {std1:.2f}")
print(f"Standard Deviation of Dataset 2: {std2:.2f}")

plt.hist(data1, bins=20, alpha=0.5, label='Dataset 1')
plt.hist(data2, bins=20, alpha=0.5, label='Dataset 2')
plt.legend()
plt.show()


26. Calculate the range and interquartile range (IQR) of a dataset generated from a normal distribution.


In [None]:
data = np.random.normal(50, 10, 1000)

range_value = np.ptp(data)
iqr_value = np.percentile(data, 75) - np.percentile(data, 25)

print(f"Range: {range_value}")
print(f"IQR: {iqr_value}")


27. Implement Z-score normalization on a dataset and visualize its transformation.

In [None]:
def z_score_normalization(data):
    return (data - np.mean(data)) / np.std(data)

normalized_data = z_score_normalization(data)

plt.hist(data, bins=20, alpha=0.5, label='Original')
plt.hist(normalized_data, bins=20, alpha=0.5, label='Normalized')
plt.legend()
plt.show()



28. Write a Python function to calculate the skewness and kurtosis of a dataset generated from a normal distribution.

In [None]:
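from scipy.stats import skew, kurtosis

def skewness_and_kurtosis(data):
    # kurtosis() returns excess kurtosis by default, so normal data gives a value near 0
    return skew(data), kurtosis(data)

data = np.random.normal(50, 10, 1000)
skewness, kurt = skewness_and_kurtosis(data)

print(f"Skewness: {skewness:.2f}")
print(f"Kurtosis: {kurt:.2f}")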
