# Q1: What are the Probability Mass Function (PMF) and Probability Density Function (PDF)? Explain with an example.

Probability Mass Function (PMF) and Probability Density Function (PDF) are both mathematical functions used to describe the probability distribution of a random variable.

***Probability Mass Function (PMF):*** The PMF applies to discrete random variables. It gives the probability that the random variable takes each specific value, p(x) = P(X = x). These probabilities are non-negative and sum to 1 over all possible values.

**Example:** Consider rolling a fair six-sided die. The PMF for this scenario would assign a probability value of 1/6 to each outcome (1, 2, 3, 4, 5, or 6) because each face of the die has an equal chance of occurring.
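A minimal Python sketch of the die PMF, using exact fractions. The event "roll an even number" is an added illustration of how a PMF is used: the probability of an event is the sum of the PMF over the outcomes it contains.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 gets probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# A valid PMF assigns non-negative probabilities that sum to exactly 1.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# Probability of an event = sum of the PMF over the outcomes in the event.
p_even = sum(pmf[face] for face in (2, 4, 6))
print(p_even)  # 1/2
```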

***Probability Density Function (PDF):*** The PDF is used for continuous random variables. Unlike the PMF, the value of the PDF at a point is a density, not a probability: for a continuous variable, the probability of any single exact value is 0. Instead, the probability that the variable falls within a range [a, b] is the area under the PDF between a and b, and the height of the PDF shows where values are relatively more or less likely.

**Example:** Suppose we have a continuous random variable that represents the height of individuals in a population. The PDF for this variable describes how the probability is spread across heights. For instance, if the density is higher between 160 cm and 170 cm than elsewhere, heights in that range are more likely to be found in the population, and the probability that a randomly chosen individual's height lies between 160 cm and 170 cm is the area under the PDF over that interval.
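To make the difference between a density and a probability concrete, here is a small Python sketch. It assumes, purely for illustration, that heights are normal with mean 165 cm and standard deviation 7 cm (these numbers are not from the text), and uses the standard library's `statistics.NormalDist`. The probability of the 160 cm to 170 cm range is computed from the CDF and cross-checked by numerically adding up the area under the PDF.

```python
from statistics import NormalDist

# Hypothetical height model: normal, mean 165 cm, SD 7 cm (illustrative numbers only).
heights = NormalDist(mu=165, sigma=7)

# The PDF value at a point is a density (per cm), not a probability.
density_at_165 = heights.pdf(165)

# The probability of a range is the area under the PDF: F(170) - F(160).
p_160_170 = heights.cdf(170) - heights.cdf(160)

# Cross-check: approximate the same area with a midpoint Riemann sum of the PDF.
steps = 10_000
width = (170 - 160) / steps
area = sum(heights.pdf(160 + (i + 0.5) * width) for i in range(steps)) * width

print(round(density_at_165, 4), round(p_160_170, 4), round(area, 4))
```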

>In summary, the PMF is used for discrete random variables, providing the probability of each individual outcome. The PDF is used for continuous random variables, describing the likelihood of values falling within specific ranges.

# Q2: What is the Cumulative Distribution Function (CDF)? Explain with an example. Why is the CDF used?

***The Cumulative Distribution Function (CDF)*** is a mathematical function that gives the probability that a random variable takes on a value less than or equal to a given value, F(x) = P(X ≤ x). It provides cumulative information about the probability distribution.

**Example:** Let's consider a continuous random variable representing the time it takes for a customer to complete a task at a service center. The CDF for this variable would give the probability that a customer's task time is less than or equal to a specific value.

Suppose the CDF at time t=5 minutes is 0.8. This means that there is an 80% probability that a customer's task time is less than or equal to 5 minutes. The CDF can be interpreted as the area under the probability density curve up to a given value.

>The CDF is used for several reasons:

**Probability calculation:** The CDF allows us to calculate the probability of a random variable falling within a certain range by subtracting the CDF value at the lower limit from the CDF value at the upper limit: P(a < X ≤ b) = F(b) - F(a).

**Percentile determination:** The CDF helps determine percentiles or quantiles of a distribution. For example, the value of the random variable at which the CDF reaches 0.5 represents the median.

**Distribution comparison:** CDFs can be used to compare and analyze different probability distributions. By plotting multiple CDFs on the same graph, we can visually assess differences in their shapes, locations of percentiles, or probabilities at specific values.

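**Estimation of parameters:** CDFs are used when estimating and checking the parameters of a distribution. For example, parameters can be chosen so that the fitted CDF closely matches the empirical CDF of the observed data, and the two CDFs can then be compared to judge how well the distribution fits.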
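The first two uses can be sketched in Python for the task-time example. The text only says F(5) = 0.8; the choice of an exponential model, with its rate picked so that F(5) = 0.8, is an assumption made here for illustration, as are the helper names `cdf` and `quantile`.

```python
import math

# Assumption for illustration: task time T (minutes) is exponential, with the
# rate chosen so that F(5) = 0.8, matching the example above.
rate = -math.log(1 - 0.8) / 5  # about 0.3219 per minute

def cdf(t: float) -> float:
    """CDF of the assumed exponential model: P(T <= t)."""
    return 0.0 if t < 0 else 1 - math.exp(-rate * t)

def quantile(q: float) -> float:
    """Inverse CDF: the time t at which P(T <= t) = q, for 0 <= q < 1."""
    return -math.log(1 - q) / rate

# Probability of a range: P(2 < T <= 5) = F(5) - F(2).
p_2_to_5 = cdf(5) - cdf(2)

# Percentile: the median is the value at which the CDF reaches 0.5.
median = quantile(0.5)

print(round(cdf(5), 3), round(p_2_to_5, 3), round(median, 2))
```

With a different model for T, only `cdf` and `quantile` would change; the range and percentile calculations stay the same.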

In summary, the Cumulative Distribution Function (CDF) provides cumulative information about the probability distribution of a random variable, giving the probability that the variable takes on a value less than or equal to a given value. It is useful for probability calculations, percentile determination, distribution comparison, and parameter estimation.

# Q3: What are some examples of situations where the normal distribution might be used as a model? Explain how the parameters of the normal distribution relate to the shape of the distribution.

Some examples of situations where the normal distribution might be used as a model include:

1. **Heights or weights of a population:** The normal distribution can be used to model the distribution of heights or weights of a population.

2. **Test scores:** The normal distribution can be used to model the distribution of test scores in a population.

3. **Measurement errors:** The normal distribution can be used to model the distribution of errors in measurements, such as errors in laboratory measurements.

4. **Financial returns:** The normal distribution is often used as a first approximation to the distribution of financial returns, such as daily stock returns, although real returns tend to have heavier tails than the normal distribution predicts.

5. **Time taken to complete a task:** The normal distribution can be used to approximate the distribution of time taken to complete a task, such as the time taken to fill an order in a factory.

>The normal distribution has two parameters that control its shape. The mean μ sets the location of the center of the curve: changing μ shifts the whole bell left or right without changing its shape. The standard deviation σ sets the spread: a larger σ gives a wider, flatter curve, and a smaller σ gives a narrower, taller one, because the total area under the curve must remain equal to 1.  
 The normal distribution is symmetric about its mean, so the mean, median, and mode are all equal, and the curve is bell-shaped. The standard normal distribution is the special case with μ = 0 and σ = 1.  
 The normal distribution is often used as a model in statistical analysis because of the central limit theorem, which states that the sum (or average) of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed.
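A short Python sketch of how μ and σ change the curve, using the standard library's `statistics.NormalDist`. The three parameter settings are arbitrary illustrative choices. Shifting μ moves the peak without changing its height; doubling σ halves the peak height, since the peak of a normal PDF is 1 / (σ√(2π)). The share of probability within one standard deviation of the mean is the same for every curve.

```python
from statistics import NormalDist

# Three illustrative curves: a shifted mean, and a larger standard deviation.
curves = {
    "mu=0, sigma=1": NormalDist(0, 1),
    "mu=3, sigma=1": NormalDist(3, 1),
    "mu=0, sigma=2": NormalDist(0, 2),
}

for name, d in curves.items():
    peak = d.pdf(d.mean)  # height of the bell at its center
    within_1sd = d.cdf(d.mean + d.stdev) - d.cdf(d.mean - d.stdev)
    print(f"{name}: peak at x={d.mean}, peak height={peak:.4f}, "
          f"P(within 1 sd)={within_1sd:.4f}")
```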

# Q4: Explain the importance of Normal Distribution. Give a few real-life examples of Normal Distribution.

>The importance of the normal distribution can be explained as follows:

***Many natural phenomena approximately follow a normal distribution:*** The normal distribution approximately describes many natural phenomena, such as the heights and weights of a population, measurement errors in laboratory experiments, students' test scores, the time taken to complete a task, and many more.

***Statistical inference:*** The normal distribution is used extensively in statistical inference, which involves drawing conclusions about a population based on a sample of data. Many statistical tests, such as the t-test and ANOVA, assume that the data follow a normal distribution.

***Central limit theorem:*** The central limit theorem states that the sum (or average) of a large number of independent and identically distributed random variables with finite variance is approximately normally distributed, regardless of the distribution of the individual variables. This property makes the normal distribution a fundamental concept in probability theory and statistics.

***Data Modeling:*** Because so many real-life quantities, such as physical measurements, measurement errors, and test scores, are approximately normal, assuming a Normal Distribution simplifies the modeling process: the whole distribution is described by just two parameters, the mean and the standard deviation, which makes it straightforward to make predictions or draw inferences from the data.

***Parameter Estimation:*** In many statistical models, estimating the parameters of a distribution is a crucial step. The Normal Distribution is often used as a reference or default distribution for estimation procedures, such as maximum likelihood estimation and least squares regression, due to its mathematical properties and simplicity.

>Some real-life examples of the normal distribution are:

1. **Heights of adults:** The heights of adults approximately follow a normal distribution (most closely when men and women are considered separately), with a mean of around 5 feet 7 inches and a standard deviation of around 3 inches.

2. **Test scores:** Scores on standardized tests are often approximately normal; for example, SAT section scores are scaled from 200 to 800, with a mean of around 500 and a standard deviation of around 100.

3. **Body temperature:** The body temperature of healthy humans approximately follows a normal distribution, with a mean of around 98.6 degrees Fahrenheit and a standard deviation of around 0.5 degrees.

4. **IQ scores:** IQ scores are scaled so that they follow a normal distribution with a mean of 100 and a standard deviation of 15.

5. **Stock market returns:** Daily stock market returns are often modeled as approximately normal, with a mean of around 0 and a standard deviation of around 1-2%, although real returns show extreme moves more often than a normal distribution predicts.
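To illustrate what a normal model lets us compute, here is a short Python sketch using the IQ figures above (mean 100, standard deviation 15) and the standard library's `statistics.NormalDist`. The thresholds 130, 85, and 115 are chosen only for illustration.

```python
from statistics import NormalDist

# IQ scores: normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Share of the population above 130 (two standard deviations above the mean).
p_above_130 = 1 - iq.cdf(130)

# Share within one standard deviation of the mean (85 to 115).
p_85_115 = iq.cdf(115) - iq.cdf(85)

print(f"{p_above_130:.4f} {p_85_115:.4f}")
```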

# Q5: What is the Bernoulli Distribution? Give an Example. What is the difference between Bernoulli Distribution and Binomial Distribution?

***The Bernoulli distribution is a discrete probability distribution that models the outcome of a single binary event (success or failure). It has a single parameter p, the probability of success, so that P(X = 1) = p and P(X = 0) = 1 - p.***

**Example:** Consider flipping a fair coin once. If we define "success" as getting heads, the outcome follows a Bernoulli distribution with p = 0.5: the probability of success (heads) is 0.5, and the probability of failure (tails) is 1 - 0.5 = 0.5.

> **The key difference between the Bernoulli distribution and the binomial distribution** is that the Bernoulli distribution models the outcome of a single trial, while the binomial distribution models the number of successes in a fixed number n of independent Bernoulli trials, each with the same success probability p. In other words, a binomial random variable is the sum of n independent Bernoulli random variables, and the Bernoulli distribution is the special case of the binomial with n = 1.
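The "sum of Bernoulli variables" relationship can be checked numerically. This Python sketch (helper names are hypothetical, chosen for this example) builds the distribution of the number of heads in 10 fair coin flips by adding one Bernoulli trial at a time, and compares it with the binomial formula C(n, k) p^k (1 - p)^(n - k).

```python
from math import comb

def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for a single trial with success probability p (k is 0 or 1)."""
    return p if k == 1 else 1 - p

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k): number of successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def sum_of_bernoullis_pmf(n: int, p: float) -> list[float]:
    """Distribution of the sum of n independent Bernoulli(p) variables,
    built by adding one trial at a time (a discrete convolution)."""
    dist = [1.0]  # zero trials: the sum is 0 with probability 1
    for _ in range(n):
        new = [0.0] * (len(dist) + 1)
        for total, prob in enumerate(dist):
            new[total] += prob * bernoulli_pmf(0, p)      # this trial fails
            new[total + 1] += prob * bernoulli_pmf(1, p)  # this trial succeeds
        dist = new
    return dist

# One fair coin flip (Bernoulli) versus the number of heads in 10 flips (binomial).
print(bernoulli_pmf(1, 0.5))
conv = sum_of_bernoullis_pmf(10, 0.5)
print(all(abs(conv[k] - binomial_pmf(k, 10, 0.5)) < 1e-12 for k in range(11)))
```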

# Q6: Consider a dataset with a mean of 50 and a standard deviation of 10. If we assume that the dataset is normally distributed, what is the probability that a randomly selected observation will be greater than 60? Use the appropriate formula and show your calculations.

We can use the z-table to solve this question.

> The appropriate formula to use is:  
  z = (x - μ) / σ

where x is the value of the observation you are interested in (in this case, x = 60), μ is the mean of the dataset, and σ is the standard deviation of the dataset.

> Substituting the values given in the question, we get:  
  z = (60 - 50) / 10 = 1

Now, we need to use the z-table to find the probability that a z-score is greater than 1.

**Using the z-table**, we look up the probability corresponding to a z-score of 1.00 in the standard normal table for positive z-scores. The table gives 0.8413.

The z-table gives cumulative probabilities, that is, the area under the standard normal curve to the left of z. So 0.8413 is the probability that a randomly selected observation is less than or equal to 60. We need the area to the right of 60 (values greater than 60), so we subtract 0.8413 from 1:

> 1 - 0.8413 = 0.1587
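The same three steps can be reproduced in Python, with the standard library's `statistics.NormalDist` standing in for the printed z-table:

```python
from statistics import NormalDist

mu, sigma, x = 50, 10, 60

# Step 1: standardize the observation.
z = (x - mu) / sigma  # 1.0

# Step 2: the standard normal CDF gives the area to the left of z (the z-table value).
left_area = NormalDist().cdf(z)  # about 0.8413

# Step 3: the area to the right is the complement.
p_greater = 1 - left_area
print(round(p_greater, 4))  # 0.1587
```

`NormalDist(mu, sigma).cdf(x)` gives the same left-tail area without standardizing by hand; the explicit z step is kept here to mirror the calculation above.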

# Q7: Explain uniform Distribution with an example.

***Uniform distribution is a probability distribution in which all possible outcomes are equally likely. It comes in a discrete form, with a finite set of equally likely values, such as rolling a fair die or picking a card from a well-shuffled deck, and a continuous form over an interval [a, b], whose flat, rectangle-shaped density curve gives it the alternative name rectangular distribution.***

**Example:**
Rolling a fair six-sided die: When rolling the die, each face has an equal probability of showing up, 1/6 or approximately 0.1667. So each of the numbers 1, 2, 3, 4, 5, and 6 is equally likely to be rolled.

>In the discrete case, each of the n possible outcomes has probability 1/n, as with the 1/6 for each face of the die. In the continuous case on an interval [a, b], the probability density function is constant over the entire range, f(x) = 1/(b - a), so the probability of falling within any sub-interval is proportional to the length of that sub-interval.
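A minimal Python sketch of both forms of the uniform distribution. The die case follows the example above; the continuous case uses a hypothetical 10-minute waiting window, and the helper names `uniform_pdf` and `uniform_prob` are illustrative, not from any library.

```python
from fractions import Fraction

# Discrete uniform: a fair die, each of the n = 6 faces has probability 1/n.
faces = range(1, 7)
die_pmf = {face: Fraction(1, len(faces)) for face in faces}

# Continuous uniform on [a, b]: constant density 1 / (b - a) inside the interval.
def uniform_pdf(x: float, a: float, b: float) -> float:
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(lo <= X <= hi): length of the overlap of [lo, hi] with [a, b], over (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

# Hypothetical example: a bus arrives uniformly at random within a 10-minute window,
# so the chance it arrives between minutes 2 and 5 is 3/10.
print(die_pmf[4], uniform_pdf(3, 0, 10), uniform_prob(2, 5, 0, 10))
```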

# Q8: What is the z score? State the importance of the z score.

***The z-score is a statistical measure that expresses how far a data point is from the mean of a distribution in terms of standard deviations.***

> The formula for calculating the z-score of a data point is:  
  z = (x - μ) / σ  
  where x is the data point, μ is the mean of the distribution, and σ is the standard deviation.

**Importance:**

1. The z-score is important because it allows us to standardize data from different distributions, which can then be compared and analyzed more easily. By converting data into z-scores, we can compare observations from different samples or populations and make meaningful statements about their relative positions.
2. The z-score is also useful in hypothesis testing, where it is used to calculate the probability of observing a value as extreme as the one observed, assuming a certain null hypothesis.
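Both uses can be sketched in Python. The two test scores below are hypothetical numbers made up for the comparison, and the two-sided tail probability assumes a standard normal null distribution for the z statistic.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Use 1 (hypothetical data): compare scores from two tests with different scales.
z_math = z_score(85, mu=70, sigma=10)    # 1.5
z_history = z_score(90, mu=80, sigma=5)  # 2.0
print(z_math, z_history)  # the history score is relatively better

# Use 2: probability of a value at least this extreme, in either direction,
# assuming the z statistic is standard normal under the null hypothesis.
def two_sided_p(z: float) -> float:
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p(1.96), 3))  # 0.05
```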

# Q9: What is Central Limit Theorem? State the significance of the Central Limit Theorem.

***The Central Limit Theorem (CLT) is a fundamental result in probability theory and statistics that describes the behavior of the sum or average of a large number of independent and identically distributed random variables. It states that, under certain conditions (most importantly, a finite variance), the distribution of the sum or average of such variables approaches a normal distribution as the number of variables grows, regardless of the distribution of the individual variables.***

> ***The significance of the CLT*** is that it provides a theoretical foundation for many statistical techniques that assume normally distributed data. For example, many hypothesis tests and confidence intervals rely on the assumption of normality, which is often justified by the CLT.
Additionally, the CLT is important in practical applications such as quality control, where it is often necessary to estimate the mean and variance of a population based on a sample.
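A small simulation sketch of the CLT in Python. The population here, an exponential distribution with rate 1 (mean 1, standard deviation 1, strongly right-skewed), and the choices n = 50 and 5,000 repetitions are illustrative assumptions. The CLT predicts that the sample means are approximately normal with mean 1 and standard deviation 1/√n, so about 95% of them should fall within 1.96 standard errors of 1.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Population: exponential with rate 1 (mean 1, SD 1), strongly right-skewed.
n = 50        # size of each sample
reps = 5_000  # number of samples drawn

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# CLT prediction: sample means are roughly normal with mean 1 and SD 1/sqrt(n).
predicted_se = 1 / n ** 0.5
observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)

# Fraction of sample means within 1.96 standard errors of the true mean.
coverage = sum(abs(m - 1) <= 1.96 * predicted_se for m in sample_means) / reps

print(round(observed_mean, 3), round(observed_sd, 4), round(predicted_se, 4), round(coverage, 3))
```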

# Q10: State the assumptions of the Central Limit Theorem.

The assumptions of the Central Limit Theorem are:

1. **Independence:** The observations in the sample are independent of each other, meaning that the outcome of one observation does not influence the outcome of another observation.

2. **Sample size:** The sample size is sufficiently large. The larger the sample size, the better the approximation to the normal distribution; a common rule of thumb is n ≥ 30.

3. **Identically distributed:** The observations in the sample are identically distributed, meaning that they all come from the same population, with the same mean and variance.

4. **Finite variance:** The population has a finite variance. Without it, the normal approximation can fail no matter how large the sample is.

5. **Population skewness (a practical consideration):** Strictly, the Central Limit Theorem does not require a symmetric population, but the more strongly skewed the population distribution, the larger the sample size needed before the distribution of the sample mean is close to normal.
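The finite-variance assumption can be illustrated with a standard counterexample not mentioned above: the Cauchy distribution, which has no finite mean or variance. The mean of n independent standard Cauchy draws is itself standard Cauchy, so its spread never shrinks. This Python sketch compares the interquartile range (IQR) of sample means from a finite-variance population (exponential, SD 1) with that of Cauchy sample means; the sample sizes, seed, and helper names are illustrative choices.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def cauchy() -> float:
    """Standard Cauchy draw via the inverse CDF (no finite mean or variance)."""
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_sample_means(draw, n: int, reps: int) -> float:
    """Interquartile range of `reps` sample means, each computed from `n` draws."""
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

n, reps = 500, 1_000
# Finite variance: the spread of the sample means shrinks like 1/sqrt(n).
iqr_exp = iqr_of_sample_means(lambda: random.expovariate(1.0), n, reps)
# Infinite variance: the mean of n Cauchy draws is again standard Cauchy (IQR 2).
iqr_cauchy = iqr_of_sample_means(cauchy, n, reps)

print(round(iqr_exp, 3), round(iqr_cauchy, 2))
```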