Q1. What is the Probability density function?

The Probability Density Function (PDF) is a concept primarily used in probability theory and statistics. It describes the probability distribution of a continuous random variable. Unlike discrete random variables, which take on distinct values with specific probabilities, continuous random variables can take on any value within a certain range.

The PDF itself is a function that describes the relative likelihood for a continuous random variable to take on a particular value. Mathematically, it's denoted as \( f(x) \) where \( x \) is the variable, and it satisfies two conditions:

1. The function \( f(x) \) is non-negative for all \( x \).
2. The total area under the curve of \( f(x) \) over the entire range of possible values for \( x \) equals 1.

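In simpler terms, the PDF describes how probability is spread over the possible values. The probability that a continuous random variable falls within a range \( [a, b] \) is the area under the curve of \( f(x) \) between \( a \) and \( b \), that is, \( P(a \le X \le b) = \int_a^b f(x)\,dx \). The value \( f(x) \) at a single point is not itself a probability (it can even exceed 1), and the probability of any single exact value is 0. The higher the curve at a certain point, the more likely it is for the random variable to take on a value in that vicinity.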

For example, the PDF of a normal distribution is a bell-shaped curve whose peak sits at the mean (the region where values are most likely) and whose tails correspond to less likely values.
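To make the area interpretation concrete, the sketch below integrates the standard normal PDF numerically with the trapezoidal rule (a simple swapped-in technique; any numerical integrator would do). It checks that the total area is close to 1, that the area between -1 and 1 is about 0.6827, and that a density value can exceed 1 when the spread is small. The helper names `normal_pdf` and `trapezoid_area` are illustrative, not from any library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Total area: integrate over [-10, 10], where essentially all of the mass lies
total_area = trapezoid_area(normal_pdf, -10, 10)

# P(-1 < X < 1) for a standard normal X is the area between -1 and 1
p_within_one_sd = trapezoid_area(normal_pdf, -1, 1)

# With sigma = 0.1 the density at the mean is about 3.99, which is larger than 1
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=0.1)

print(f"total area          ~ {total_area:.6f}")
print(f"P(-1 < X < 1)       ~ {p_within_one_sd:.4f}")
print(f"peak of N(0, 0.1^2) = {peak_narrow:.4f}")
```

The last value shows why \( f(x) \) should be read as a density rather than a probability: only areas under the curve are probabilities.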

Q2. What are the types of Probability distribution?

There are several types of probability distributions, each with its own characteristics and applications. Some of the most common ones include:

1. **Discrete Distributions:**
   - **Bernoulli Distribution:** Represents the probability of success or failure in a single experiment with two possible outcomes.
   - **Binomial Distribution:** Describes the number of successes in a fixed number of independent Bernoulli trials.
   - **Poisson Distribution:** Models the number of events occurring in a fixed interval of time or space, given a known average rate of occurrence.

2. **Continuous Distributions:**
   - **Normal Distribution:** Also known as the Gaussian distribution, it is characterized by a symmetric bell-shaped curve. Many natural phenomena are approximately normal, partly because of the Central Limit Theorem: sums or averages of many independent effects tend toward a normal distribution.
   - **Uniform Distribution:** All values within a given interval are equally likely to occur.
   - **Exponential Distribution:** Describes the time between events in a Poisson process, such as the time between phone calls at a call center.
   - **Gamma Distribution:** Generalizes the exponential distribution and is often used to model waiting times or lifetimes of objects.
   - **Beta Distribution:** Often used to model random variables constrained to a finite interval, such as proportions or probabilities.

3. **Multivariate Distributions:**
   - **Multinomial Distribution:** Generalizes the binomial distribution to more than two categories.
   - **Multivariate Normal Distribution:** Generalizes the normal distribution to multiple dimensions, describing the joint distribution of several correlated random variables.

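4. **Sampling Distributions** (continuous distributions that arise mainly in statistical inference):
   - **Student's t-Distribution:** Similar to the normal distribution but with heavier tails; commonly used in hypothesis tests and confidence intervals for a mean when the sample is small and the population standard deviation is unknown.
   - **Chi-Squared Distribution:** Used in goodness-of-fit and independence tests, and in confidence intervals for a variance.
   - **F-Distribution:** The distribution of a ratio of two independent chi-squared variables, each divided by its degrees of freedom; used in analysis of variance (ANOVA) and regression analysis.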
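Most of these distributions have samplers on NumPy's `Generator` (created with `np.random.default_rng`). The sketch below draws a few values from several of them to show the practical difference between discrete (integer-valued) and continuous (real-valued) draws; the seed and all parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the draws are reproducible

# Discrete distributions: draws are integer counts
bernoulli = rng.binomial(n=1, p=0.3, size=5)       # Bernoulli = binomial with one trial
binomial = rng.binomial(n=10, p=0.3, size=5)
poisson = rng.poisson(lam=2.0, size=5)

# Continuous distributions: draws are real numbers
normal = rng.normal(loc=0.0, scale=1.0, size=5)
uniform = rng.uniform(low=0.0, high=1.0, size=5)
exponential = rng.exponential(scale=2.0, size=5)   # scale = 1 / rate
gamma = rng.gamma(shape=2.0, scale=1.0, size=5)
beta = rng.beta(a=2.0, b=5.0, size=5)              # always inside (0, 1)

# Sampling distributions used in inference
student_t = rng.standard_t(df=5, size=5)
chi2 = rng.chisquare(df=3, size=5)
f_ratio = rng.f(dfnum=3, dfden=10, size=5)

# Multivariate: each multinomial draw is a vector of category counts summing to n
multinomial = rng.multinomial(n=10, pvals=[0.2, 0.3, 0.5], size=3)

for name, draws in [("bernoulli", bernoulli), ("binomial", binomial),
                    ("poisson", poisson), ("normal", normal),
                    ("uniform", uniform), ("exponential", exponential),
                    ("gamma", gamma), ("beta", beta), ("student_t", student_t),
                    ("chi2", chi2), ("f", f_ratio)]:
    print(f"{name:12s}", np.round(draws, 3))
print("multinomial\n", multinomial)
```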

Q3. Write a Python function to calculate the probability density function of a normal distribution with
given mean and standard deviation at a given point.

You can use the probability density function (PDF) formula for the normal distribution, which is defined as:

\[ f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]

Where:
- \( \mu \) is the mean of the distribution,
- \( \sigma \) is the standard deviation, and
- \( x \) is the point at which you want to evaluate the PDF.

Here's a Python function to calculate the PDF of a normal distribution at a given point:

```python
import math

def normal_pdf(x, mu, sigma):
    """
    Calculate the probability density function (PDF) of a normal distribution
    at a given point x with mean mu and standard deviation sigma.

    Parameters:
        x (float): The point at which to evaluate the PDF.
        mu (float): The mean of the normal distribution.
        sigma (float): The standard deviation of the normal distribution.

    Returns:
        float: The value of the PDF at point x.

    Raises:
        ValueError: If sigma is not positive.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1 / (math.sqrt(2 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    pdf_value = coefficient * math.exp(exponent)
    return pdf_value

# Example usage:
mean = 0  # Mean of the normal distribution
std_dev = 1  # Standard deviation of the normal distribution
point = 1  # Point at which to evaluate the PDF
pdf_at_point = normal_pdf(point, mean, std_dev)
print("PDF at point", point, ":", pdf_at_point)  # about 0.2420
```
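As a cross-check, Python's standard library (3.8+) provides `statistics.NormalDist`, whose `pdf` method evaluates the same density. The sketch below re-declares a compact version of `normal_pdf` so it runs on its own, then compares the two at a few points for an illustrative mean of 2 and standard deviation of 0.5.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    # Same closed-form normal density as the function above
    coefficient = 1 / (math.sqrt(2 * math.pi) * sigma)
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

reference = NormalDist(mu=2.0, sigma=0.5)
for x in (0.5, 1.8, 2.0, 3.1):
    ours = normal_pdf(x, 2.0, 0.5)
    theirs = reference.pdf(x)
    print(f"x={x}: formula={ours:.10f}  NormalDist={theirs:.10f}")
```

Agreement to many decimal places is a quick way to catch typos in a hand-written formula, such as a missing square on \( \sigma \).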

Q4. What are the properties of the Binomial distribution? Give two examples of events where the binomial distribution can be applied.

The Binomial distribution is a discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials, where each trial has only two possible outcomes: success or failure. The key properties of the Binomial distribution include:

1. **Fixed number of trials (n):** The number of trials is predetermined and remains constant throughout the experiment.

2. **Independent trials:** Each trial is independent of the others, meaning the outcome of one trial does not affect the outcome of another.

3. **Two possible outcomes:** Each trial results in either a success or a failure.

4. **Constant probability of success (p):** The probability of success (denoted as p) remains the same for each trial.

5. **Discrete nature:** The random variable representing the number of successes takes on discrete values (0, 1, 2, ..., n).

6. **Probability Mass Function (PMF):** The probability of obtaining exactly k successes in n trials is given by the Binomial PMF formula:

\[ P(X = k) = \binom{n}{k} \times p^k \times (1 - p)^{n - k} \]

where:
- \( n \) is the number of trials,
- \( k \) is the number of successes,
- \( p \) is the probability of success in each trial, and
- \( \binom{n}{k} \) is the binomial coefficient, calculated as \( \frac{n!}{k!(n-k)!} \).
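The PMF formula translates directly into code with `math.comb` (Python 3.8+) for the binomial coefficient. The sketch below, with a hypothetical helper `binomial_pmf`, uses \( n = 10 \) and \( p = 0.5 \) (a fair coin flipped 10 times) and checks that the probabilities over \( k = 0, \ldots, n \) sum to 1.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), using the PMF formula above."""
    if not 0 <= k <= n:
        return 0.0  # k outside 0..n is impossible
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print("P(X = 5) =", binomial_pmf(5, n, p))   # comb(10, 5) / 2**10 = 252 / 1024
print("sum of PMF =", sum(pmf))               # 1 up to floating-point rounding
```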

Examples of events where the Binomial distribution can be applied include:

1. **Coin Flipping:** Suppose you flip a fair coin 10 times and count the number of heads obtained. Each flip can be considered a Bernoulli trial with a probability of success (getting heads) of 0.5. The total number of heads in 10 flips follows a Binomial distribution.

2. **Manufacturing Defects:** Consider a manufacturing process where each item produced has a certain probability of being defective (e.g., 0.02). If you randomly select 100 items from the production line, and items are defective independently of one another, the number of defective items among the 100 follows a Binomial distribution, where each selection is a Bernoulli trial with a probability of success (finding a defective item) of 0.02.
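For the manufacturing example, a few quantities follow directly from the Binomial formulas. The sketch below assumes the numbers above (100 items, defect probability 0.02, independent defects) and computes the expected number of defects, the probability of at least one defect via the complement rule, and the probability of at most two defects.

```python
from math import comb

n, p = 100, 0.02  # 100 sampled items, each defective with probability 0.02

expected_defects = n * p                  # mean of Binomial(n, p)
p_none = (1 - p) ** n                     # P(X = 0)
p_at_least_one = 1 - p_none               # complement rule: P(X >= 1) = 1 - P(X = 0)
p_at_most_two = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))

print(f"expected defects = {expected_defects:.2f}")
print(f"P(no defects)    = {p_none:.4f}")
print(f"P(at least one)  = {p_at_least_one:.4f}")
print(f"P(at most two)   = {p_at_most_two:.4f}")
```

Even though only 2 defects are expected on average, the chance of seeing at least one in a batch of 100 is high (roughly 87%).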

Q5. Generate a random sample of size 1000 from a binomial distribution with probability of success 0.4
and plot a histogram of the results using matplotlib.

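To generate a random sample of size 1000 from a binomial distribution with a probability of success \( p = 0.4 \) and plot a histogram of the results using matplotlib, note that the question fixes the sample size (1000 draws) but not the number of trials per draw, so one must be chosen; the code below assumes \( n = 10 \) trials per draw. Keep the two parameters separate: in `np.random.binomial(n, p, size)`, `n` is the number of trials in each draw and `size` is how many draws to generate.

```python
import numpy as np
import matplotlib.pyplot as plt

# Set the parameters
n_trials = 10        # Trials per draw (assumption: not specified in the question)
p = 0.4              # Probability of success in each trial
sample_size = 1000   # Number of draws

# Generate random sample from binomial distribution
random_sample = np.random.binomial(n_trials, p, size=sample_size)

# Plot histogram with one bar centred on each possible count 0..n_trials
bins = np.arange(n_trials + 2) - 0.5
plt.hist(random_sample, bins=bins, color='skyblue', edgecolor='black')
plt.title(f'Histogram of {sample_size} draws from Binomial(n={n_trials}, p={p})')
plt.xlabel('Number of Successes')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
```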
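A quick numeric sanity check on such a sample is to compare its mean and variance with the theoretical \( np \) and \( np(1-p) \). The sketch below assumes 10 trials per draw (an illustrative choice, since the question does not fix the number of trials) and uses a seeded `np.random.default_rng` so the result is reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_trials = 10        # trials per draw (illustrative assumption)
p = 0.4              # probability of success per trial
sample_size = 1000   # number of draws

sample = rng.binomial(n=n_trials, p=p, size=sample_size)

print("shape:", sample.shape)                       # one count per draw
print("range:", sample.min(), "to", sample.max())   # counts lie in 0..n_trials
print("sample mean:", sample.mean(), " theory n*p:", n_trials * p)
print("sample var: ", sample.var(ddof=1), " theory n*p*(1-p):", n_trials * p * (1 - p))
```

The shape confirms that `size` controls how many draws there are, while `n` only bounds the value of each draw.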

Q6. Write a Python function to calculate the cumulative distribution function of a Poisson distribution
with given mean at a given point.

To calculate the cumulative distribution function (CDF) of a Poisson distribution at a given point \( k \) with a given mean \( \lambda \), you can use the following formula:

\[ CDF(k; \lambda) = \sum_{i=0}^{k} \frac{e^{-\lambda} \lambda^i}{i!} \]

Here's a Python function to compute the CDF of a Poisson distribution:

```python
import math

def poisson_cdf(k, mean):
    """
    Calculate the cumulative distribution function (CDF) of a Poisson distribution
    at a given point k with a given mean.

    Parameters:
        k (int): The point at which to evaluate the CDF.
        mean (float): The mean of the Poisson distribution.

    Returns:
        float: The value of the CDF at point k.
    """
    if k < 0:
        return 0.0  # A Poisson variable is never negative
    # Build each term from the previous one, P(X = i) = P(X = i - 1) * mean / i,
    # instead of computing mean ** i and i! separately (which can overflow for large i)
    term = math.exp(-mean)  # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= mean / i
        cdf += term
    return cdf

# Example usage:
mean = 3  # Mean of the Poisson distribution
point = 2  # Point at which to evaluate the CDF
cdf_at_point = poisson_cdf(point, mean)
print("CDF at point", point, ":", cdf_at_point)  # about 0.4232
```
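As a hand check on the example call, with \( \lambda = 3 \) and \( k = 2 \) the sum has three terms:

\[ CDF(2; 3) = e^{-3}\left(\frac{3^0}{0!} + \frac{3^1}{1!} + \frac{3^2}{2!}\right) = e^{-3}(1 + 3 + 4.5) = 8.5\,e^{-3} \approx 0.4232 \]

so the function should return a value close to 0.4232.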

Q7. How is the Binomial distribution different from the Poisson distribution?

The Binomial distribution and the Poisson distribution are both discrete distributions used to model counts: the Binomial counts successes in a fixed number of independent trials, while the Poisson counts events occurring in a fixed interval of time or space. They are used in different scenarios and have different characteristics:

1. **Number of Trials:**
   - **Binomial Distribution:** The number of trials (denoted by \( n \)) in a Binomial distribution is fixed and known in advance.
   - **Poisson Distribution:** There is no fixed number of trials in a Poisson distribution. It is used to model the number of events occurring in a fixed interval of time or space, given a known average rate of occurrence (\( \lambda \)).

2. **Nature of Events:**
   - **Binomial Distribution:** The Binomial distribution is used when each trial results in one of two outcomes (success or failure).
   - **Poisson Distribution:** The Poisson distribution is used to model the number of occurrences of rare events in a fixed interval of time or space, where the events are independent and occur at a constant average rate.

3. **Parameters:**
   - **Binomial Distribution:** It is characterized by two parameters: the number of trials (\( n \)) and the probability of success in each trial (\( p \)).
   - **Poisson Distribution:** It is characterized by a single parameter: the average rate of occurrence (\( \lambda \)).

4. **Formula:**
   - **Binomial Distribution:** The probability mass function (PMF) is \( P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \): the binomial coefficient times the probability of success raised to the number of successes, times the probability of failure raised to the number of failures.
   - **Poisson Distribution:** The probability mass function (PMF) is \( P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \): the rate \( \lambda \) raised to the number of occurrences, times \( e^{-\lambda} \), divided by the factorial of the number of occurrences.

5. **Assumptions:**
   - **Binomial Distribution:** It assumes a fixed number of independent trials with a constant probability of success in each trial, so the count can never exceed \( n \).
   - **Poisson Distribution:** It assumes events occur independently at a constant average rate, and the count has no upper limit. It can be viewed as the limit of the Binomial distribution when the number of trials is large and the probability of success is small, with \( \lambda = np \) held fixed.
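To see how the Poisson distribution relates to a Binomial with many trials and a small success probability, the sketch below compares the two PMFs numerically. It assumes \( n = 1000 \) and \( p = 0.003 \) (illustrative values, giving \( \lambda = np = 3 \)); the helpers `binomial_pmf` and `poisson_pmf` are hypothetical names written out from the formulas above.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003   # many trials, small success probability
lam = n * p          # matching Poisson rate, lambda = 3

max_diff = 0.0
for k in range(11):
    b = binomial_pmf(k, n, p)
    q = poisson_pmf(k, lam)
    max_diff = max(max_diff, abs(b - q))
    print(f"k={k:2d}  binomial={b:.5f}  poisson={q:.5f}")
print("largest absolute difference:", round(max_diff, 5))
```

The two columns agree closely, which is why the Poisson distribution is often used as a simpler stand-in for such Binomial counts.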

Q8. Generate a random sample of size 1000 from a Poisson distribution with mean 5 and calculate the
sample mean and variance.

To generate a random sample of size 1000 from a Poisson distribution with a mean of 5 and calculate the sample mean and variance, you can use numpy's random module. Here's how you can do it in Python:

```python
import numpy as np

# Set the parameters
mean = 5
sample_size = 1000

# Generate random sample from Poisson distribution
random_sample = np.random.poisson(mean, size=sample_size)

# Calculate sample mean and variance
sample_mean = np.mean(random_sample)
sample_variance = np.var(random_sample, ddof=1)  # ddof=1: divide by n - 1 (unbiased sample variance)

# Print the results (both should be close to 5, since a Poisson mean equals its variance)
print("Sample Mean:", sample_mean)
print("Sample Variance:", sample_variance)
```
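The same experiment can be run with the standard library alone. Python's `random` module has no Poisson sampler, so the sketch below swaps in Knuth's multiplication method (a classic technique that is only efficient for small means) via a hypothetical helper `poisson_knuth`, and uses `statistics.variance`, which divides by \( n - 1 \).

```python
import math
import random
import statistics

def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Multiplies uniform draws until the running product falls to exp(-lam);
    the number of extra factors needed is Poisson-distributed. The expected
    number of loop iterations grows with lam, so this suits small means only.
    """
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(42)          # seeded for reproducibility
sample = [poisson_knuth(5, rng) for _ in range(1000)]

print("Sample mean:", statistics.mean(sample))
print("Sample variance (n - 1):", statistics.variance(sample))
```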

Q9. How are the mean and variance related in the Binomial distribution and the Poisson distribution?

In both the Binomial distribution and the Poisson distribution, the mean (\( \mu \)) and the variance (\( \sigma^2 \)) are closely related, but they have slightly different relationships due to the different nature of these distributions.

**Binomial Distribution:**
In the Binomial distribution, where the number of trials is fixed (\( n \)) and the probability of success in each trial is constant (\( p \)), the mean (\( \mu \)) and variance (\( \sigma^2 \)) are related as follows:
\[ \mu = n \times p \]
\[ \sigma^2 = n \times p \times (1 - p) \]

So, in the Binomial distribution, the variance depends on both the number of trials and the probability of success in each trial. Because \( 1 - p < 1 \) whenever \( p > 0 \), the variance is smaller than the mean.

**Poisson Distribution:**
In the Poisson distribution, where the number of trials is not fixed and the average rate of occurrence (\( \lambda \)) is constant, the mean (\( \mu \)) and variance (\( \sigma^2 \)) are related as follows:
\[ \mu = \lambda \]
\[ \sigma^2 = \lambda \]

In the Poisson distribution, the mean and variance are both equal to \( \lambda \), so the variance depends only on the average rate of occurrence and not on a number of trials or a probability of success. This equality is a quick diagnostic: count data whose sample variance is much larger or much smaller than its sample mean is poorly described by a Poisson model.
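These formulas can be checked by computing the mean and variance directly from each PMF. The sketch assumes Binomial(\( n = 20 \), \( p = 0.3 \)) and Poisson(\( \lambda = 6 \)), illustrative values chosen so both have mean 6; the helper `moments` is hypothetical, and the Poisson sum is truncated at \( k = 100 \), where the remaining tail is negligible for \( \lambda = 6 \).

```python
import math

def moments(pmf_values):
    """Mean and variance of a discrete distribution given as (k, P(X = k)) pairs."""
    mean = sum(k * prob for k, prob in pmf_values)
    variance = sum((k - mean) ** 2 * prob for k, prob in pmf_values)
    return mean, variance

n, p = 20, 0.3
binomial = [(k, math.comb(n, k) * p**k * (1 - p) ** (n - k)) for k in range(n + 1)]

lam = 6.0
# Poisson support is unbounded; truncating at k = 100 leaves a negligible tail for lam = 6
poisson = [(k, math.exp(-lam) * lam**k / math.factorial(k)) for k in range(101)]

b_mean, b_var = moments(binomial)
p_mean, p_var = moments(poisson)
print(f"Binomial(20, 0.3): mean={b_mean:.4f} (np={n * p:.1f}), variance={b_var:.4f} (np(1-p)={n * p * (1 - p):.2f})")
print(f"Poisson(6):        mean={p_mean:.4f}, variance={p_var:.4f}")
```

With the same mean of 6, the Binomial variance (4.2) is smaller than the Poisson variance (6), matching the formulas above.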

Q10. In a normal distribution, where does the least frequent data appear relative to the mean?

In a normal distribution, the least frequent data appears in the tails of the distribution, farthest away from the mean.

A normal distribution is symmetric about its mean. Its probability density function (PDF) is highest at the mean and decreases symmetrically in both directions, so values far from the mean, whether very high or very low, occur less frequently than values near it. For example, about 68% of values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three, leaving only about 0.3% beyond three standard deviations.
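The tail fractions can be computed with `statistics.NormalDist` from Python's standard library (3.8+). The sketch below uses the standard normal; the same fractions hold for any mean and standard deviation when distances are measured in standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

for k in (1, 2, 3):
    inside = standard_normal.cdf(k) - standard_normal.cdf(-k)   # P(|X - mu| < k * sigma)
    print(f"within {k} sd: {inside:.4f}   beyond {k} sd (both tails): {1 - inside:.4f}")

# Density at the mean versus three standard deviations out
ratio = standard_normal.pdf(0) / standard_normal.pdf(3)
print("f(mu) / f(mu + 3 sigma) =", round(ratio, 2))
```

The density at the mean is roughly 90 times the density three standard deviations away, which is why tail values are rare.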