### **What is Simple Linear Regression?**


Simple Linear Regression is a statistical method that models the relationship between a dependent variable (Y) and a single independent variable (X) as a straight line, Y = mX + c, where 'm' is the slope and 'c' is the intercept.

### **What are the key assumptions of Simple Linear Regression?**

* Linearity: The relationship between X and Y is linear.
* Independence: Observations are independent.
* Homoscedasticity: Constant variance of residuals.
* Normality: Residuals follow a normal distribution.
* No multicollinearity: Not a concern here, because the model has only one independent variable.


### **What does the coefficient m represent in the equation Y=mX+c?**

The coefficient 'm' represents the slope of the regression line, indicating the rate of change in Y for a one-unit increase in X.

### **What does the intercept c represent in the equation Y=mX+c?**

The intercept 'c' represents the value of Y when X is zero. It is the point where the regression line crosses the Y-axis.


### **How do we calculate the slope m in Simple Linear Regression?**

m = Σ[(X - mean(X)) * (Y - mean(Y))] / Σ[(X - mean(X))²]
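The slope formula can be sketched directly in NumPy. The data values below are made up for illustration, and the intercept line uses the standard fact that the fitted line passes through (mean(X), mean(Y)).

```python
import numpy as np

# Small made-up data set, used only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Deviations of each observation from its mean
x_dev = x - x.mean()
y_dev = y - y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations
m = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)

# Intercept: the fitted line passes through (mean(X), mean(Y))
c = y.mean() - m * x.mean()

print(f"slope m = {m:.3f}, intercept c = {c:.3f}")
```

The same values should come back from `np.polyfit(x, y, 1)`, which is a convenient cross-check.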

### **What is the purpose of the least squares method in Simple Linear Regression?**

The least squares method minimizes the sum of squared differences between observed and predicted Y values to find the best-fitting regression line.
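As a rough illustration of what "minimizes" means here, the sketch below fits a line with `np.polyfit` and then checks that nudging the slope or intercept always increases the sum of squared errors. The data and the size of the nudges are arbitrary choices for the example.

```python
import numpy as np

# Made-up data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

def sse(m, c):
    """Sum of squared differences between observed and predicted Y."""
    return float(np.sum((y - (m * x + c)) ** 2))

# np.polyfit with degree 1 returns the least-squares slope and intercept
m_hat, c_hat = np.polyfit(x, y, 1)
best = sse(m_hat, c_hat)

# Perturbed lines: each one should have a larger SSE than the fitted line
perturbations = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5)]
for dm, dc in perturbations:
    print(f"m={m_hat + dm:.2f}, c={c_hat + dc:.2f}: SSE={sse(m_hat + dm, c_hat + dc):.3f}")
print(f"least-squares line: SSE={best:.3f}")
```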

### **How is the coefficient of determination (R²) interpreted in Simple Linear Regression?**

R² measures the proportion of variance in Y explained by X. A value close to 1 indicates a strong relationship, while a value near 0 suggests a weak relationship.
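R² can be computed from its definition, 1 - SS_res / SS_tot. The helper name `r_squared` and the data are assumptions made for this sketch.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up data and a least-squares line fitted to it
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
m, c = np.polyfit(x, y, 1)

r2 = r_squared(y, m * x + c)
print(f"R² = {r2:.4f}")
```

For a simple linear regression fitted by least squares, this value equals the squared Pearson correlation between X and Y.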

### **What is Multiple Linear Regression?**

Multiple Linear Regression models the relationship between a dependent variable and multiple independent variables using Y = b0 + b1X1 + b2X2 + ... + bnXn.
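A minimal sketch of fitting this equation with two predictors, using `np.linalg.lstsq`. The data is constructed (without noise) from known coefficients b0 = 1, b1 = 2, b2 = 3 so the result is easy to check; real data would not fit exactly.

```python
import numpy as np

# Two predictors (columns X1 and X2), made up for illustration
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
])

# Noise-free target built from known coefficients: Y = 1 + 2*X1 + 3*X2
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution for [b0, b1, b2]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coef
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")
```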

### **What is the main difference between Simple and Multiple Linear Regression?**

Simple Linear Regression has one independent variable, while Multiple Linear Regression has two or more independent variables.

### **What are the key assumptions of Multiple Linear Regression?**

* Linearity
* Independence
* Homoscedasticity
* Normality
* No multicollinearity (independent variables should not be highly correlated).
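One common way to check the multicollinearity assumption is the variance inflation factor (VIF), defined as 1 / (1 - R²) where R² comes from regressing one predictor on all the others. The sketch below is a NumPy-only version; the function name `vif` and the simulated data are assumptions for illustration, and the often-quoted threshold of 10 is a rule of thumb rather than a strict cutoff.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on the remaining predictors (with intercept)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated predictors: x2 is almost a multiple of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

Here the first two predictors should show very large VIFs, while the third stays close to 1.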

### **What is heteroscedasticity, and how does it affect the results of a Multiple Linear Regression model?**

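Heteroscedasticity occurs when the variance of the residuals is not constant across the range of the predictors. The coefficient estimates remain unbiased, but their standard errors become unreliable, which distorts confidence intervals and hypothesis tests.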

### **How can you improve a Multiple Linear Regression model with high multicollinearity?**

* Remove highly correlated variables
* Use Principal Component Analysis (PCA)
* Apply Ridge or Lasso regression.
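To give a feel for the Ridge option, here is a sketch of its closed-form solution, (XᵀX + λI)⁻¹Xᵀy, applied to two nearly identical predictors. The data is simulated, the helper name `ridge_coefficients` is hypothetical, and λ = 1 is an arbitrary choice; in practice λ would be tuned, for example by cross-validation.

```python
import numpy as np

# Simulated data: x2 is nearly a copy of x1 (strong multicollinearity)
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.5 * rng.normal(size=n)

# Center X and y so the intercept is not penalized
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge_coefficients(Xc, yc, lam):
    """Closed-form ridge solution on centered data."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

ols = ridge_coefficients(Xc, yc, 0.0)    # lam = 0 is ordinary least squares
ridge = ridge_coefficients(Xc, yc, 1.0)  # lam > 0 shrinks the coefficients

print("OLS coefficients:  ", np.round(ols, 3))
print("Ridge coefficients:", np.round(ridge, 3))
```

With near-duplicate predictors the unpenalized coefficients tend to be large and unstable, while the penalized ones are pulled toward smaller, more balanced values.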

### **What are some common techniques for transforming categorical variables for use in regression models?**

* One-Hot Encoding
* Label Encoding (best suited to ordinal categories, since it imposes an order)
* Target Encoding
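A NumPy-only sketch of the first two encodings in the list above. The category values are made up, and dropping the first one-hot column is a common convention for avoiding perfect collinearity with the intercept, not something the list prescribes.

```python
import numpy as np

# Made-up categorical feature
colors = np.array(["red", "green", "blue", "green", "red"])

# Label encoding: map each category to an integer code
# (np.unique sorts the categories alphabetically)
categories, codes = np.unique(colors, return_inverse=True)
print("categories:", categories)
print("label codes:", codes)

# One-hot encoding: one 0/1 column per category
one_hot = np.eye(len(categories))[codes]
print(one_hot)

# For a regression with an intercept, drop one column as the baseline
one_hot_dropped = one_hot[:, 1:]
```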

### **What is the role of interaction terms in Multiple Linear Regression?**

Interaction terms capture combined effects of independent variables that impact Y beyond their individual contributions.
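An interaction term is simply the product of two predictors added as an extra column. The sketch below builds a noise-free target from Y = 1 + 2·X1 + 3·X2 + 4·X1·X2 (coefficients chosen for illustration) and recovers them by least squares.

```python
import numpy as np

# All combinations of X1 and X2 in {0, 1, 2}
g1, g2 = np.meshgrid(np.arange(3.0), np.arange(3.0))
x1, x2 = g1.ravel(), g2.ravel()

# Noise-free target with a known interaction effect
y = 1.0 + 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# Design matrix: intercept, X1, X2 and the interaction column X1*X2
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))  # intercept, b1, b2, interaction coefficient
```

With a nonzero interaction coefficient, the effect of X1 on Y depends on the value of X2, and vice versa.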

### **How can the interpretation of intercept differ between Simple and Multiple Linear Regression?**

In Simple Linear Regression, the intercept is the Y value when X is zero. In Multiple Linear Regression, it represents Y when all Xs are zero, which may not always be meaningful.

### **What is the significance of the slope in regression analysis, and how does it affect predictions?**

The slope represents the rate of change in Y per unit increase in X. Its sign gives the direction of the relationship, and its magnitude determines how much the prediction changes for each unit change in X.

### **How does the intercept in a regression model provide context for the relationship between variables?**

The intercept helps establish a reference point for Y when all predictors are at zero, providing baseline insights.

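### **What are the limitations of using R² as a sole measure of model performance?**

R² does not indicate causation, never decreases when more predictors are added (so it can reward needlessly complex models), and says nothing about whether a linear form suits the data or how well the model predicts unseen data.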

### **How would you interpret a large standard error for a regression coefficient?**

A large standard error means the coefficient is estimated imprecisely: its confidence interval is wide, and the estimate may not differ significantly from zero.
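For simple linear regression, the standard error of the slope can be sketched as sqrt(s² / Σ(X - mean(X))²), where s² is the residual variance with n - 2 degrees of freedom. The data below is made up for illustration.

```python
import numpy as np

# Made-up data and least-squares fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)
m, c = np.polyfit(x, y, 1)

# Residual variance: two parameters (m and c) were estimated
residuals = y - (m * x + c)
s2 = np.sum(residuals ** 2) / (n - 2)

# Standard error of the slope
se_m = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
print(f"slope = {m:.3f}, standard error = {se_m:.3f}")
```

A slope that is many standard errors away from zero is estimated precisely; a slope within one or two standard errors of zero is not.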

### **How can heteroscedasticity be identified in residual plots, and why is it important to address it?**

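Heteroscedasticity appears as a fan- or funnel-shaped pattern in a plot of residuals against fitted values. It is important to address because, although the coefficient estimates stay unbiased, their standard errors become unreliable, which undermines confidence intervals and hypothesis tests.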
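As a rough numeric counterpart to eyeballing the residual plot, the sketch below simulates data whose noise grows with X, fits a line, and compares the residual spread for low and high fitted values. The simulation settings are assumptions for illustration, and this comparison is an informal check rather than a formal test.

```python
import numpy as np

# Simulated data: the noise standard deviation grows in proportion to x
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5 * x)

# Fit a line and compute residuals
m, c = np.polyfit(x, y, 1)
fitted = m * x + c
residuals = y - fitted

# Compare residual spread in the lower and upper halves of the fitted values
order = np.argsort(fitted)
half = len(x) // 2
spread_low = residuals[order[:half]].std()
spread_high = residuals[order[half:]].std()
print(f"residual std (low fitted values):  {spread_low:.2f}")
print(f"residual std (high fitted values): {spread_high:.2f}")
```

A clearly larger spread in one half corresponds to the fan shape described above.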

### **What does it mean if a Multiple Linear Regression model has a high R² but low adjusted R²?**

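It indicates that some of the independent variables add little explanatory power. R² rises with every added predictor, while adjusted R² penalizes predictors that do not improve the fit, so a large gap between the two suggests overfitting.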
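The gap between the two measures can be seen from the usual adjusted R² formula, 1 - (1 - R²)(n - 1) / (n - p - 1), with n observations and p predictors. The function name and the numbers below are chosen for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R² and sample size, different numbers of predictors
few = adjusted_r2(r2=0.90, n=20, p=2)
many = adjusted_r2(r2=0.90, n=20, p=15)
print(f"p=2:  adjusted R² = {few:.3f}")
print(f"p=15: adjusted R² = {many:.3f}")
```

With the same R² of 0.90, the model using 15 predictors on 20 observations is penalized far more heavily than the one using 2.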

### **Why is it important to scale variables in Multiple Linear Regression?**

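Scaling puts variables on a comparable footing so that coefficient magnitudes can be compared and large-scale features do not dominate. Ordinary least squares predictions are unaffected by rescaling, but scaling matters for regularized variants such as Ridge and Lasso and for gradient-based fitting.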
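One common way to scale is z-score standardization: subtract each column's mean and divide by its standard deviation. A minimal sketch with made-up values on very different scales:

```python
import numpy as np

# Two features on very different scales (made-up values)
X = np.array([
    [1000.0, 0.1],
    [2000.0, 0.2],
    [3000.0, 0.4],
    [4000.0, 0.3],
])

# Standardize each column to mean 0 and standard deviation 1
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(np.round(X_scaled, 3))
```

When a model is evaluated on new data, the `mean` and `std` computed from the training data should be reused rather than recomputed.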

### **What is polynomial regression?**

Polynomial Regression is an extension of linear regression where the relationship between independent and dependent variables is modeled as an nth-degree polynomial.

### **How does polynomial regression differ from linear regression?**

Polynomial Regression captures nonlinear relationships by including higher-degree terms of the independent variable.

### **When is polynomial regression used?**

It is used when data shows a curvilinear trend that a linear model cannot capture.

### **What is the general equation for polynomial regression?**

Y = b0 + b1X + b2X² + ... + bnXⁿ
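The equation is still linear in the coefficients, so it can be fitted by ordinary least squares on the columns 1, X, X², and so on. The sketch below uses `np.vander` to build those columns for degree 2 and recovers known coefficients (b0 = 1, b1 = 2, b2 = 3, chosen for illustration) from noise-free data.

```python
import numpy as np

# Noise-free quadratic data: Y = 1 + 2*X + 3*X²
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Design matrix with columns [1, X, X²]
A = np.vander(x, N=3, increasing=True)

# Least-squares estimates of [b0, b1, b2]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 2))
```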

### **Can polynomial regression be applied to multiple variables?**

Yes, it can be extended to multiple independent variables, leading to Polynomial Multiple Regression.

### **What are the limitations of polynomial regression?**

* Overfitting with high-degree polynomials
* Computationally expensive
* Sensitive to outliers.

### **What methods can be used to evaluate model fit when selecting the degree of a polynomial?**

* R² Score
* Cross-validation
* Mean Squared Error (MSE)
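Cross-validation and MSE from the list above can be combined: fit each candidate degree on part of the data and measure the MSE on the held-out part. The sketch below is a simple k-fold version in NumPy; the simulated quadratic data, the helper name `cv_mse`, and the choice of 5 folds are assumptions for illustration.

```python
import numpy as np

# Simulated data with a quadratic trend plus noise
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = 0.5 * x ** 2 + rng.normal(scale=1.0, size=x.size)

def cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a polynomial of the given degree over k folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)           # indices not in this fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

# Held-out error for each candidate degree
scores = {d: cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)

for d, s in scores.items():
    print(f"degree {d}: CV MSE = {s:.3f}")
print("degree with lowest CV MSE:", best_degree)
```

Unlike R² on the training data, the held-out MSE does not keep improving as the degree grows, which makes it a more useful guide for choosing the degree.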

### **Why is visualization important in polynomial regression?**

Visualization helps identify the polynomial degree required to capture patterns effectively.

### **How is polynomial regression implemented in Python?**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Sample data: one feature (as a column vector) and a curved target
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([2.1, 2.9, 3.8, 5.2, 6.8, 8.7, 11.1, 13.9, 17.5, 21.2])

# Expand X into the columns [1, X, X²] so a linear model can fit a quadratic
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit ordinary least squares on the expanded features
model = LinearRegression()
model.fit(X_poly, y)
y_pred = model.predict(X_poly)

# Plot the observed points and the fitted curve
plt.scatter(X, y, color='blue', label="Data")
plt.plot(X, y_pred, color='red', label="Polynomial Regression")
plt.legend()
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.title("Polynomial Regression")
plt.show()
```