### 1. What is statistics, and why is it important?

Statistics is the study of collecting, organizing, analyzing, interpreting, and presenting data. It matters because it lets us draw conclusions and make decisions from data while accounting for uncertainty.

### 2. What are the two main types of statistics?

Descriptive statistics and inferential statistics.

### 3. What are descriptive statistics?

They summarize and describe features of a dataset (e.g., mean, median, mode).

### 4. What is inferential statistics?

It uses a sample to make predictions or generalizations about a population.

### 5. What is sampling in statistics?

Sampling is selecting a subset of a population for analysis.

### 6. What are the different types of sampling methods?

Simple random, stratified, systematic, cluster, convenience, and quota sampling.
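
Cluster sampling is the method in this list that is easiest to confuse with stratified sampling, so a small sketch may help. The classrooms and student names below are hypothetical, and the fixed seed is only there to make the run repeatable.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 5 classrooms (clusters) of 4 students each.
clusters = {
    f"class_{c}": [f"student_{c}_{i}" for i in range(4)]
    for c in "ABCDE"
}

# Cluster sampling: randomly pick whole clusters, then keep every member.
chosen = rng.sample(sorted(clusters), 2)
cluster_sample = [student for name in chosen for student in clusters[name]]

print("Chosen clusters:", chosen)
print("Cluster sample:", cluster_sample)
```

Stratified sampling would instead draw a few students from every classroom, whereas cluster sampling keeps all students from a few classrooms.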

### 7. What is the difference between random and non-random sampling?

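Random (probability) sampling selects elements by chance, so every element has a known, nonzero probability of selection (an equal one in simple random sampling). Non-random sampling relies on convenience or judgment, so selection probabilities are unknown.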

### 8. Define and give examples of qualitative and quantitative data

Qualitative: non-numeric (e.g., color), Quantitative: numeric (e.g., height).

### 9. What are the different types of data in statistics?

Nominal, ordinal, interval, and ratio.

### 10. Explain nominal, ordinal, interval, and ratio levels of measurement

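Nominal: unordered categories (e.g., colors). Ordinal: ordered categories with unequal or unknown spacing (e.g., ratings). Interval: numeric with equal spacing but no true zero (e.g., temperature in °C). Ratio: numeric with a true zero, so ratios are meaningful (e.g., height).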
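
The practical difference between interval and ratio data is whether ratios mean anything. The sketch below uses temperature as an assumed example: Celsius is an interval scale, Kelvin is a ratio scale, and the helper name `celsius_to_kelvin` is purely illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scale reading (°C) to a ratio-scale one (K)."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
diff_celsius = warm_c - cold_c
diff_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale (true zero).
ratio_celsius = warm_c / cold_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print("Difference (°C, K):", diff_celsius, round(diff_kelvin, 2))
print("Ratio in °C (misleading):", ratio_celsius)
print("Ratio in K (meaningful):", round(ratio_kelvin, 3))
```

The Celsius ratio suggests "twice as hot", but on the Kelvin scale the warmer reading is only a few percent higher.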

### 11. What is the measure of central tendency?

A single value that describes the center of a dataset (mean, median, mode).

### 12. Define mean, median, and mode

Mean: average, Median: middle value, Mode: most frequent value.
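
One way to see how the three measures differ is to add a single extreme value and watch which ones move. The numbers below are illustrative only.

```python
import statistics

values = [2, 3, 3, 3, 4, 5, 6]
with_outlier = values + [18]  # one extreme value added

for label, sample in [("Without outlier", values), ("With outlier", with_outlier)]:
    print(
        label,
        "| mean:", round(statistics.mean(sample), 2),
        "| median:", statistics.median(sample),
        "| mode:", statistics.mode(sample),
    )

# How far each measure shifted once the outlier was added.
mean_shift = statistics.mean(with_outlier) - statistics.mean(values)
median_shift = statistics.median(with_outlier) - statistics.median(values)
print("Mean shift:", round(mean_shift, 2), "| Median shift:", median_shift)
```

The mean is pulled toward the outlier much more than the median, and the mode does not change at all.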

### 13. What is the significance of the measure of central tendency?

Helps summarize data with a single representative value.

### 14. What is variance, and how is it calculated?

Variance measures how far values spread around the mean. It is the average of the squared differences from the mean: divide by N for a population, or by n - 1 for a sample.
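
The calculation can be written out by hand in a few lines. This is a minimal sketch with illustrative values; the function names are made up for the example, and the two versions differ only in whether the sum of squared deviations is divided by N (population) or n - 1 (sample).

```python
import statistics

def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations divided by n - 1 (Bessel's correction)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

scores = [4, 5, 8, 6, 9, 10]  # illustrative values
pop_var = population_variance(scores)
samp_var = sample_variance(scores)

print("Population variance:", pop_var)
print("Sample variance:", samp_var)

# The standard library offers both: pvariance (N) and variance (n - 1).
print("statistics.pvariance:", statistics.pvariance(scores))
print("statistics.variance:", statistics.variance(scores))
```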

### 15. What is standard deviation, and why is it important?

It is the square root of the variance and shows how far values typically lie from the mean, expressed in the same units as the data.

### 16. Define and explain the term range in statistics

Range = max - min; shows data spread.

### 17. What is the difference between variance and standard deviation?

Standard deviation is the square root of variance, so it is in the data's original units, while variance is in squared units.

### 18. What is skewness in a dataset?

Skewness describes asymmetry of a distribution.

### 19. What does it mean if a dataset is positively or negatively skewed?

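Positive skew: longer right tail, and the mean is typically above the median. Negative skew: longer left tail, and the mean is typically below the median.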

### 20. Define and explain kurtosis

Kurtosis measures how heavy or light the tails of a distribution are relative to a normal distribution.
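
Both skewness and kurtosis can be computed from central moments. The sketch below uses the simple (biased) moment-based definitions, which are the ones `scipy.stats.skew` and `scipy.stats.kurtosis` apply by default; the helper names and data are illustrative.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Fourth central moment over variance squared, minus 3 (normal = 0)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [2, 3, 3, 3, 4, 5, 6, 18]

print("Symmetric skewness:", skewness(symmetric))
print("Right-tailed skewness:", skewness(right_tailed))
print("Symmetric excess kurtosis:", excess_kurtosis(symmetric))
print("Right-tailed excess kurtosis:", excess_kurtosis(right_tailed))
```

Subtracting 3 gives excess kurtosis, so a normal distribution scores 0, heavier tails score above 0, and lighter tails score below 0.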

### 21. What is the purpose of covariance?

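Shows the direction of the linear relationship between two variables: positive when they tend to move together, negative when they tend to move in opposite directions.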

### 22. What does correlation measure in statistics?

Measures the strength and direction of a relationship between two variables.

### 23. What is the difference between covariance and correlation?

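Covariance depends on the variables' units. Correlation divides covariance by the product of the two standard deviations, giving a unitless value between -1 and 1.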
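
The effect of standardizing can be checked by rescaling one variable: covariance changes with the units, correlation does not. This is a small sketch with made-up data, and the variable names are illustrative.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

hours = [1, 2, 3, 4, 5]
marks = [5, 4, 6, 8, 10]
marks_scaled = [m * 10 for m in marks]  # same data in different units

cov_original = covariance(hours, marks)
cov_scaled = covariance(hours, marks_scaled)
r_original = correlation(hours, marks)
r_scaled = correlation(hours, marks_scaled)

print("Covariance:", cov_original, "->", cov_scaled)
print("Correlation:", round(r_original, 4), "->", round(r_scaled, 4))
```

Multiplying one variable by 10 multiplies the covariance by 10 but leaves the correlation unchanged.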

### 24. What are some real-world applications of statistics?

Medicine, business, education, sports, government, etc.

# Statistics Basics - Assignment Solutions
This notebook contains explanations and Python implementations for key statistical concepts.

## 1. Mean, Median, and Mode

In [None]:
import statistics

data = [2, 4, 4, 6, 8, 10]
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)

## 2. Variance and Standard Deviation

In [None]:
data = [4, 5, 8, 6, 9, 10]

variance = statistics.variance(data)
std_dev = statistics.stdev(data)

print("Variance:", variance)
print("Standard Deviation:", std_dev)

## 3. Dataset Classification

In [None]:
dataset = {
    "Nominal": ["Red", "Blue", "Green"],
    "Ordinal": ["Poor", "Average", "Good", "Excellent"],
    "Interval": [10, 20, 30, 40],
    "Ratio": [150, 160, 170]
}
print(dataset)

## 4. Random and Stratified Sampling

In [None]:
import random
import pandas as pd

data = list(range(1, 101))
random_sample = random.sample(data, 10)
print("Random Sample:", random_sample)

df = pd.DataFrame({
    'Gender': ['Male'] * 50 + ['Female'] * 50,
    'Score': list(range(1, 101))
})
# Draw 5 rows from each gender group (stratum)
stratified_sample = df.groupby('Gender').sample(n=5)
print(stratified_sample)

## 5. Range Function

In [None]:
def calculate_range(data):
    return max(data) - min(data)

print("Range:", calculate_range([3, 6, 9, 12]))

## 6. Histogram and Skewness

In [None]:
import matplotlib.pyplot as plt

data = [2, 3, 3, 3, 4, 5, 6, 18]
plt.hist(data, bins=6)
plt.title("Histogram")
plt.show()

## 7. Skewness and Kurtosis

In [None]:
from scipy.stats import skew, kurtosis

data = [2, 3, 3, 3, 4, 5, 6, 18]
print("Skewness:", skew(data))
# kurtosis() returns excess kurtosis by default (a normal distribution scores 0)
print("Kurtosis:", kurtosis(data))

## 8. Positive and Negative Skewness

In [None]:
import numpy as np

pos_skew = np.random.exponential(scale=2, size=1000)
plt.hist(pos_skew, bins=30)
plt.title("Positive Skew")
plt.show()

neg_skew = -np.random.exponential(scale=2, size=1000)
plt.hist(neg_skew, bins=30)
plt.title("Negative Skew")
plt.show()

## 9. Covariance

In [None]:
x = [1, 2, 3, 4, 5]
y = [5, 4, 6, 8, 10]
cov_matrix = np.cov(x, y)
print("Covariance matrix:\n", cov_matrix)

## 10. Correlation Coefficient

In [None]:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
corr = np.corrcoef(x, y)
print("Correlation coefficient matrix:\n", corr)

## 11. Scatter Plot

In [None]:
x = [1, 2, 3, 4, 5]
y = [3, 6, 9, 12, 15]

plt.scatter(x, y)
plt.title("Scatter Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()

## 12. Random vs Systematic Sampling

In [None]:
simple_random = random.sample(range(1, 101), 10)
print("Simple Random Sampling:", simple_random)

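population = list(range(1, 101))
k = 10  # sampling interval: population size / desired sample size
start = random.randrange(k)  # random starting point within the first interval
systematic_sample = population[start::k]  # every k-th element from the start
print("Systematic Sampling:", systematic_sample)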

## 13. Grouped Data - Mean

In [None]:
data = {'Marks': [10, 20, 30, 40], 'Frequency': [1, 3, 4, 2]}
df = pd.DataFrame(data)
mean = sum(df['Marks'] * df['Frequency']) / sum(df['Frequency'])
print("Grouped Mean:", mean)

## 14. Simulate Data and Central Tendency

In [None]:
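# Simulate 100 draws from a normal distribution (mean 50, standard deviation 10)
data = np.random.normal(loc=50, scale=10, size=100)
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Standard Deviation:", np.std(data, ddof=1))  # ddof=1 gives the sample SD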

## Practical Section (Continued) - Last 10 Questions

### 15. Use NumPy or pandas to summarize a dataset’s descriptive statistics

In [None]:
import pandas as pd
import numpy as np

data = np.random.randint(10, 100, 20)
df = pd.DataFrame(data, columns=['Scores'])
print(df.describe())

### 16. Plot a boxplot to understand the spread and identify outliers

In [None]:
import matplotlib.pyplot as plt

plt.boxplot(data)
plt.title("Boxplot of Scores")
plt.show()

### 17. Calculate the interquartile range (IQR) of a dataset

In [None]:
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
print("Interquartile Range (IQR):", IQR)
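
A common use of the IQR, added here as an assumption rather than taken from the notebook, is Tukey's 1.5 × IQR rule for flagging outliers, which is also what the whiskers of a default boxplot are based on. The sketch uses `statistics.quantiles` with `method="inclusive"`, which interpolates the same way as `np.percentile` does by default; the data are illustrative.

```python
import statistics

values = [2, 3, 3, 3, 4, 5, 6, 18]

# Quartiles via linear interpolation over the sorted data.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's fences: points beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in values if x < lower_fence or x > upper_fence]

print("Q1:", q1, "| Q3:", q3, "| IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```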

### 18. Implement Z-score normalization and explain its significance

In [None]:
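# Z-scores rescale the data to mean 0 and standard deviation 1, so values
# measured on different scales can be compared directly.
z_scores = (data - np.mean(data)) / np.std(data)
print("Z-scores:", z_scores)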
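
To illustrate why this matters, the sketch below standardizes two hypothetical variables recorded in different units and checks that each result has mean 0 and standard deviation 1. It uses the population standard deviation, matching the default of `np.std`; the function and variable names are illustrative.

```python
import statistics

def z_scores(values):
    """Standardize values: subtract the mean, divide by the population SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

heights_cm = [150, 160, 170, 180, 190]  # hypothetical measurements
weights_kg = [50, 60, 70, 80, 90]

z_heights = z_scores(heights_cm)
z_weights = z_scores(weights_kg)

print("Height z-scores:", [round(z, 3) for z in z_heights])
print("Weight z-scores:", [round(z, 3) for z in z_weights])
print("Mean of z-scores:", round(statistics.fmean(z_heights), 6))
print("SD of z-scores:", round(statistics.pstdev(z_heights), 6))
```

Although the raw numbers are in centimeters and kilograms, the standardized values are unitless and can be compared side by side.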

### 19. Compare two datasets using their standard deviations

In [None]:
data2 = np.random.randint(20, 80, 20)
std1 = np.std(data)
std2 = np.std(data2)
print("Standard Deviation of Dataset 1:", std1)
print("Standard Deviation of Dataset 2:", std2)

### 20. Visualize covariance using a heatmap

In [None]:
import seaborn as sns

df2 = pd.DataFrame({'A': data, 'B': data2})
sns.heatmap(df2.cov(), annot=True, cmap="coolwarm")
plt.title("Covariance Heatmap")
plt.show()

### 21. Use seaborn to create a correlation matrix for a dataset

In [None]:
sns.heatmap(df2.corr(), annot=True, cmap="YlGnBu")
plt.title("Correlation Matrix")
plt.show()

### 22. Generate a dataset and implement both variance and standard deviation computations

In [None]:
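sample_data = np.random.randint(1, 100, 50)
# np.var and np.std default to the population formulas (ddof=0);
# pass ddof=1 for the sample versions
print("Variance:", np.var(sample_data))
print("Standard Deviation:", np.std(sample_data))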

### 23. Visualize skewness and kurtosis using Python libraries like matplotlib or seaborn

In [None]:
sns.histplot(sample_data, kde=True)
plt.title("Histogram with KDE")
plt.show()

from scipy.stats import skew, kurtosis
print("Skewness:", skew(sample_data))
print("Kurtosis:", kurtosis(sample_data))

### 24. Implement the Pearson and Spearman correlation coefficients for a dataset

In [None]:
from scipy.stats import pearsonr, spearmanr

pearson_corr, _ = pearsonr(data, data2)
spearman_corr, _ = spearmanr(data, data2)
print("Pearson Correlation:", pearson_corr)
print("Spearman Correlation:", spearman_corr)
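
The relationship between the two coefficients can be made concrete: Spearman's coefficient is Pearson's coefficient computed on ranks. The sketch below is a hand-rolled illustration, not the SciPy implementation, and its `ranks` helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance over the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
print("Pearson:", round(pearson_r, 4))
print("Spearman:", round(spearman_r, 4))
```

Because `y` always increases with `x`, the ranks match exactly and Spearman reports a perfect relationship, while Pearson stays below 1 because the relationship is not a straight line.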