**Q1. Explain the difference between simple linear regression and multiple linear regression. Provide an example of each.**

Ans: 

**Simple Linear Regression**
In simple linear regression, we model the relationship between a dependent variable and a single independent variable using a linear equation. The goal is to find the best-fitting line, typically the one that minimizes the sum of squared differences between the predicted values and the actual values.
Example:
- Dependent Variable: House Price
- Independent Variable: House Size (square feet)
We want to predict the price of a house based on its size.
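
As a minimal sketch of the house-price example, the best-fitting line can be computed in closed form with ordinary least squares. The data below are made up for illustration, and `fit_simple_linear_regression` is a hypothetical helper name, not a library function.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: house sizes (square feet) and prices (dollars)
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [150000, 200000, 250000, 300000, 350000]

intercept, slope = fit_simple_linear_regression(sizes, prices)
predicted_price = intercept + slope * 1800  # prediction for an 1800 sq ft house
print(intercept, slope, predicted_price)
```

Because the made-up prices lie exactly on a line, the fitted line reproduces them; real data would leave non-zero residuals.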

**Multiple Linear Regression**
Multiple linear regression extends simple linear regression by incorporating multiple independent variables to predict a dependent variable. This allows for more complex relationships and potentially more accurate predictions.
Example:
- Dependent Variable: Car Price
- Independent Variables:
    1. Mileage
    2. Age
    3. Engine Size
    4. Horsepower
We want to predict the price of a car based on its mileage, age, engine size, and horsepower.

**Q2. Discuss the assumptions of linear regression. How can you check whether these assumptions hold in a given dataset?**

Ans:

Assumptions of Linear Regression:
1. Linearity: The relationship between the dependent variable (Y) and the independent variables (X) is linear. This means that a one-unit change in X changes the expected value of Y by a constant amount.
    - Checking:
    - Scatter plots: Visualize the relationship between Y and each X. A linear pattern suggests linearity.
    - Residual plots: Plot the residuals against the predicted values. A random scatter around zero, with no curved pattern, is consistent with linearity.

2. Homoscedasticity: The variance of the errors is constant across all values of the independent variables. This means that the spread of the residuals is consistent.
    - Checking:
    - Residual plots: Look for a consistent spread of residuals across the range of predicted values.
    - Breusch-Pagan test: This statistical test formally tests for heteroscedasticity.

3. Independence of Errors: The errors (residuals) are independent of each other. This means that the error in one observation does not influence the error in another observation.
    - Checking:
    - Durbin-Watson test: This statistical test checks for first-order autocorrelation in the residuals, which violates the independence assumption. The statistic ranges from 0 to 4, and a value close to 2 indicates no autocorrelation.

4. Normality of Errors: The errors are normally distributed. This assumption is important for hypothesis testing and confidence interval calculations.
    - Checking: 
    - Histogram of residuals: A bell-shaped curve suggests normality.
    - Q-Q plot: Compare the quantiles of the residuals to the quantiles of a normal distribution. A straight line indicates normality.

5. No multicollinearity: The independent variables are not highly correlated with each other. Multicollinearity can make it difficult to estimate the individual effects of the independent variables on the dependent variable.
    - Checking:
    - Correlation matrix: Calculate the correlation coefficients between the independent variables. High correlations suggest multicollinearity.
    - Variance Inflation Factor (VIF): As a common rule of thumb, a VIF greater than 10 indicates high multicollinearity.
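
To make one of these checks concrete, here is a minimal sketch of the Durbin-Watson statistic computed by hand. The residual values are invented for illustration, and the residuals are assumed to be listed in observation order. Values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation, and values near 2 to none.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    # Numerator: squared differences between successive residuals
    squared_diffs = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals
    sum_of_squares = sum(e ** 2 for e in residuals)
    return squared_diffs / sum_of_squares


# Hypothetical residuals that drift steadily (positive autocorrelation)
trending = [-3, -2, -1, 1, 2, 3]
# Hypothetical residuals that flip sign every step (negative autocorrelation)
alternating = [1, -1, 1, -1, 1, -1]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

In practice a statistics library would normally be used for this test and for Breusch-Pagan; the hand-written version only shows what the statistic measures.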
    
    


**Q3. How do you interpret the slope and intercept in a linear regression model? Provide an example using a real-world scenario.**

Ans: In a linear regression model, the equation of the line is typically represented as:

Y = b0 + b1*X

Where:
- Y: Dependent variable (the outcome we want to predict)
- X: Independent variable (the predictor variable)
- b0: Intercept
- b1: Slope

*Intercept (b0)*

The intercept represents the predicted value of Y when X is equal to 0. It is the point where the line crosses the Y-axis.

*Slope (b1)*

The slope represents the rate of change of Y with respect to X. In other words, it tells us how much Y changes for a one-unit increase in X.

**Real-world example: Predicting House Prices**

*Let's say we want to predict the price of a house based on its square footage. After running a linear regression, we get the following equation:*

Price = 50000 + 100 * SquareFootage

Interpretation:

- Intercept (b0 = 50000): A house with 0 square feet (which is unrealistic) would have a predicted price of $50,000. This might represent a base value such as land value or minimum construction costs.
- Slope (b1 = 100): For every additional square foot of living space, the predicted price of the house increases by $100.
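
The interpretation can be checked numerically with a small sketch that plugs values into the equation above. The function name `predict_price` is a hypothetical helper introduced only for this illustration.

```python
def predict_price(square_footage, intercept=50000, slope=100):
    """Predicted price from the example equation Price = 50000 + 100 * SquareFootage."""
    return intercept + slope * square_footage


price_at_zero = predict_price(0)      # equals the intercept
price_1500 = predict_price(1500)
price_1501 = predict_price(1501)
increase_per_sq_ft = price_1501 - price_1500  # equals the slope

print(price_at_zero, price_1500, increase_per_sq_ft)
```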

**Q4. Explain the concept of gradient descent. How is it used in machine learning?**

Ans: Gradient descent is an optimization algorithm used to find the minimum of a function. In machine learning, it's used to minimize the cost function, which measures how well a model's predictions match the actual values.

*How it works:*
1. Initialize Parameters:
    - Start with random initial values for the model's parameters (weights and biases).
2. Calculate the Gradient:
    - Compute the gradient of the cost function with respect to the parameters. The gradient indicates the direction of steepest ascent.
3. Update Parameters:
    - Update the parameters in the opposite direction of the gradient, multiplied by a learning rate:
    ***new_parameter = old_parameter - learning_rate * gradient***
4. Repeat:
    - Iterate steps 2 and 3 until the gradient becomes very small, indicating that we've reached a minimum (for non-convex cost functions, possibly only a local one).
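
The four steps above can be sketched for a one-variable linear model with a mean squared error cost. This is an illustrative sketch, not a production implementation: the learning rate, tolerance, and data are assumptions chosen so the loop converges, and the parameters start at zero rather than at random values to keep the output reproducible.

```python
def gradient_descent(x, y, learning_rate=0.05, tolerance=1e-10, max_iterations=10000):
    """Fit y = b0 + b1 * x by minimizing mean squared error with gradient descent."""
    # Step 1: initialize parameters (zeros here; random values would also work)
    b0, b1 = 0.0, 0.0
    n = len(x)
    for iteration in range(max_iterations):
        # Step 2: gradient of MSE = (1/n) * sum((prediction - actual)^2)
        errors = [(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        grad_b0 = (2 / n) * sum(errors)
        grad_b1 = (2 / n) * sum(e * xi for e, xi in zip(errors, x))
        # Step 4: stop once the gradient is very small
        if max(abs(grad_b0), abs(grad_b1)) < tolerance:
            break
        # Step 3: move against the gradient, scaled by the learning rate
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1, iteration


# Hypothetical data generated from y = 1 + 2x
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]

b0, b1, iterations_used = gradient_descent(x, y)
print(b0, b1, iterations_used)
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence slow, so in practice it is usually tuned.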

**Q5. Describe the multiple linear regression model. How does it differ from simple linear regression?**

Ans: **Multiple linear regression** is a statistical method used to model the relationship between a dependent variable and two or more independent variables. It extends the concept of simple linear regression, which involves only one independent variable.

The general equation for multiple linear regression is:

Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + ε

Where:
- Y: Dependent variable
- X₁, X₂, ..., Xₚ: Independent variables
- β₀: Intercept
- β₁, β₂, ..., βₚ: Coefficients for each independent variable
- ε: Error term

Key Differences from Simple Linear Regression:

1. Number of Independent Variables:
    - Simple linear regression: One independent variable
    - Multiple linear regression: Two or more independent variables
2. Model Complexity:
    - Simple linear regression: A linear relationship between two variables
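    - Multiple linear regression: A relationship involving several variables at once. The model is still linear in each variable; non-linear effects and interactions are captured only if such terms (for example, X₁² or X₁X₂) are added as extra predictors.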
3. Model Interpretation:
    - Simple linear regression: The slope coefficient indicates the change in the dependent variable for a unit change in the independent variable.
    - Multiple linear regression: The coefficients for each independent variable represent the change in the dependent variable for a unit change in that specific independent variable, holding other variables constant.
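
As a sketch of how the coefficients β₀, β₁, ..., βₚ can be estimated, the example below solves the least-squares normal equations in plain Python. The helper names and the noise-free data are assumptions made for illustration; numerical libraries use more stable solvers in practice.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, y):
    """Return [b0, b1, ..., bp] from the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 with no error term
features = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in features]

coefficients = fit_multiple_linear_regression(features, targets)
print(coefficients)
```

Each fitted coefficient is read as the change in Y for a one-unit change in that variable while the other variable is held constant.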

**Q6. Explain the concept of multicollinearity in multiple linear regression. How can you detect and address this issue?**

Ans: Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with each other.

Detecting Multicollinearity:
1. Correlation Matrix:
    - Calculate the correlation coefficients between all pairs of independent variables.   
    - High correlation coefficients (e.g., above 0.7 or 0.8) indicate potential multicollinearity.
2. Variance Inflation Factor (VIF):
    - VIF measures the extent to which the variance of a coefficient estimate is inflated due to multicollinearity.
    - A VIF greater than 10 suggests high multicollinearity.
3. Tolerance:
    - Tolerance is the reciprocal of VIF.
    - A tolerance value less than 0.1 indicates high multicollinearity.
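
The three detection measures can be sketched for the simplest case of two predictors, where the R² from regressing one predictor on the other equals the squared correlation, so VIF = 1 / (1 - r²). The car data below are made up for illustration, and the helper names are hypothetical.

```python
def pearson_correlation(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_correlation(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical car data: age in years and mileage in thousands of miles
age = [1, 2, 3, 4, 5, 6]
mileage = [12, 19, 33, 41, 48, 62]

correlation = pearson_correlation(age, mileage)
vif = vif_two_predictors(age, mileage)
tolerance = 1 / vif

print(correlation, vif, tolerance)
```

With more than two predictors, each VIF requires regressing that predictor on all the others, which is what library implementations do.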

Addressing Multicollinearity:

1. Feature Engineering:
    - Combine highly correlated variables into a single variable or create a new variable that captures the underlying relationship.
2. Principal Component Analysis (PCA):
    - Reduce the dimensionality of the data by creating uncorrelated linear combinations of the original variables.
3. Remove Redundant Variables:
    - If two variables are highly correlated, remove one of them from the model.
4. Ridge Regression:
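    - This technique adds an L2 penalty (the sum of squared coefficients) to the least-squares cost function, which shrinks the coefficients and reduces the impact of multicollinearity.
5. Lasso Regression:
    - This technique adds an L1 penalty (the sum of absolute coefficients), which can shrink some coefficients to exactly zero. It therefore selects a subset of relevant variables automatically, potentially reducing the impact of multicollinearity.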
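
To illustrate the ridge remedy, here is a small sketch with two nearly collinear, centered predictors and no intercept, solved as a 2x2 system. The data, the penalty value of 1.0, and the function name are assumptions for illustration only.

```python
def ridge_two_predictors(x1, x2, y, penalty):
    """Ridge coefficients for two centered predictors: solve (X^T X + penalty*I) b = X^T y."""
    s11 = sum(a * a for a in x1) + penalty
    s22 = sum(b * b for b in x2) + penalty
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    # Cramer's rule for the 2x2 system
    det = s11 * s22 - s12 * s12
    beta1 = (s1y * s22 - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    return beta1, beta2


# Hypothetical centered data: x2 is almost a copy of x1
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.0, 0.9, 2.1]
y = [-4.0, -2.1, 0.1, 1.9, 4.1]

ols = ridge_two_predictors(x1, x2, y, penalty=0.0)    # penalty 0 is ordinary least squares
ridge = ridge_two_predictors(x1, x2, y, penalty=1.0)

print(ols, ridge)
```

Without the penalty, the two coefficients come out very unequal even though the predictors carry nearly the same information; with the penalty, the weight is spread more evenly and the coefficient vector is smaller overall.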

**Q7. Describe the polynomial regression model. How is it different from linear regression?**

Ans: Polynomial regression is a form of regression analysis in which the relationship between the independent variable (x) and the dependent variable (y) is modeled as an nth-degree polynomial. This allows for more complex, non-linear relationships between the variables, although the model remains linear in its coefficients.

Mathematical Representation:

y = β₀ + β₁x + β₂x² + ... + βₙxⁿ + ε

Where:

- y: Dependent variable
- x: Independent variable
- β₀, β₁, ..., βₙ: Coefficients
- ε: Error term
- n: Degree of the polynomial

Key Differences from Linear Regression:

1. Relationship:
    - Linear Regression: Assumes a linear relationship between the variables.   
    - Polynomial Regression: Allows for non-linear relationships by introducing polynomial terms (x², x³, etc.) into the model.   
2. Model Complexity:
    - Linear Regression: Simpler model, often suitable for linear relationships.   
    - Polynomial Regression: More flexible model, capable of capturing complex patterns in data.   
3. Overfitting:
    - Polynomial Regression: Prone to overfitting, especially with higher-degree polynomials. Careful model selection and regularization techniques are crucial.   
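
A minimal sketch of fitting the polynomial model: the powers of x are treated as separate predictors and the coefficients are found by ordinary least squares. The helper names and the noise-free data are assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x + ... + bn*x^n via least squares."""
    # Each row holds the polynomial features 1, x, x^2, ..., x^degree
    design = [[xi ** k for k in range(degree + 1)] for xi in x]
    size = degree + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(size)] for i in range(size)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 - 2x + 0.5x^2 with no error term
x = [-2, -1, 0, 1, 2, 3]
y = [1 - 2 * xi + 0.5 * xi ** 2 for xi in x]

coefficients = fit_polynomial(x, y, degree=2)
print(coefficients)
```

Raising `degree` makes the curve more flexible, which is also what makes higher-degree fits prone to overfitting on noisy data.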


**Q8. What are the advantages and disadvantages of polynomial regression compared to linear regression? In what situations would you prefer to use polynomial regression?**

Ans: Polynomial regression models y as an nth-degree polynomial of x (the model described in Q7). Its advantages and disadvantages relative to linear regression follow from that added flexibility.

Advantages:

- Captures non-linear relationships by introducing polynomial terms (x², x³, etc.) into the model.
- More flexible than a straight line, so it can follow complex patterns in the data.
- Remains linear in its coefficients, so it can be fitted with the same least-squares methods as linear regression.

Disadvantages:

- Prone to overfitting, especially with higher-degree polynomials.
- Requires choosing the degree, so careful model selection and regularization techniques are crucial.
- Less simple than linear regression, which is often sufficient when the relationship is linear.

When to Use Polynomial Regression:
- When the relationship between variables is non-linear.
- When the data exhibits curvature or trends that cannot be captured by a linear model.
- When you need to model complex patterns in the data.
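
One simple way to spot the curvature mentioned above is to fit a straight line first and look at the residuals: a systematic pattern, such as positive at both ends and negative in the middle, suggests that a polynomial term is needed. The sketch below uses made-up data from y = x², and the helper name is hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical curved data: y = x^2
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
# Residual = actual value minus the straight-line prediction
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(residuals)
```

The residuals form a U shape rather than a random scatter, which is the signal that the straight line is missing curvature in the data.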