## Ans1

Overfitting occurs when a model is too complex and has learned the training data too well, to the point where it starts to fit noise in the data rather than the underlying pattern. As a result, the model may perform very well on the training data, but poorly on new, unseen data.

Underfitting occurs when a model is too simple and is unable to capture the underlying pattern in the data. In this case, the model performs poorly on both the training and test data.

Overfitting can lead to poor generalization performance, making the model useless for real-world applications. Underfitting, on the other hand, means that the model is unable to capture the patterns in the data, and may miss important relationships between the features and target variable.

To mitigate overfitting, we can use techniques such as regularization, which adds a penalty term to the objective function to prevent the model from over-relying on certain features. Another approach is to use cross-validation to tune hyperparameters and select the model with the best performance on unseen data.

To mitigate underfitting, we can try increasing the model's complexity by adding more features or increasing the model's capacity. However, we should be careful not to overdo this, as it may lead to overfitting. We can also try using a more sophisticated model that is better suited for the data at hand, or use ensemble techniques to combine multiple models to improve performance.
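The contrast between the two failure modes can be seen in a small experiment. The sketch below is illustrative only: the sine-shaped target, the noise level, and the polynomial degrees are all assumptions chosen for the demo, not part of the answer above. It fits polynomials of increasing degree and reports training and test error for each.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground-truth pattern for this toy example.
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger held-out test set.
x_train = np.linspace(0.0, 1.0, 30)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, x_train.shape)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, x_test.shape)

def mse(model, x, y):
    """Mean squared error of a fitted model on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; higher degree means more capacity.
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = mse(model, x_train, y_train)
    test_mse[degree] = mse(model, x_test, y_test)
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.3f}  "
          f"test MSE={test_mse[degree]:.3f}")
```

The straight line (degree 1) should show a high error on both sets, which is the signature of underfitting. Training error can only go down as the degree grows, so a widening gap between training and test error at high degree is the usual sign of overfitting.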

## Ans2

Use more data: Increasing the size of the training set can help the model better capture the underlying pattern in the data and reduce overfitting.

Use regularization: Regularization is a technique that adds a penalty term to the objective function to prevent the model from over-relying on certain features. This can help reduce overfitting and improve the model's generalization performance.

Use cross-validation: Cross-validation is a technique that splits the data into multiple training and validation sets, allowing us to tune the model's hyperparameters and select the model with the best performance on unseen data.
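As a rough illustration of the splitting step, here is a minimal k-fold index generator written with NumPy only. The function name `k_fold_splits` and its parameters are hypothetical, chosen for this sketch; libraries such as scikit-learn provide their own, more complete versions.

```python
import numpy as np

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle once up front
    folds = np.array_split(indices, k)     # k nearly equal folds
    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one is used for training.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_splits(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print("train:", np.sort(train_idx), "val:", np.sort(val_idx))
```

In practice a model would be trained on each `train_idx`, scored on the matching `val_idx`, and the scores averaged to compare hyperparameter settings.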

Simplify the model: Reducing the complexity of the model by removing unnecessary features or using a simpler model architecture can also help reduce overfitting.

Use dropout: Dropout is a technique that randomly drops out a fraction of the neurons during training, forcing the model to learn more robust features and reducing overfitting.
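A minimal sketch of the idea, assuming the common "inverted dropout" variant in which the kept activations are rescaled during training so that nothing needs to change at inference time. The function name and arguments are illustrative.

```python
import numpy as np

def dropout(activations, drop_prob, training, rng):
    """Inverted dropout: zero units with probability drop_prob while training."""
    if not training or drop_prob == 0.0:
        return activations  # inference: all units are used unchanged
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    # Rescale kept units so the expected activation stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((4, 5))
h_train = dropout(h, drop_prob=0.5, training=True, rng=rng)
h_eval = dropout(h, drop_prob=0.5, training=False, rng=rng)
print(h_train)
```

With an input of all ones and `drop_prob=0.5`, every entry of `h_train` is either 0 (dropped) or 2 (kept and rescaled), while `h_eval` is identical to the input.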

Early stopping: Early stopping is a technique that stops the training process when the model's performance on the validation set starts to degrade, preventing the model from overfitting to the training data.
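The stopping rule can be sketched independently of any training framework. The version below assumes a "patience" criterion, which is one common way to decide that validation performance has started to degrade; the loss values are made up for illustration.

```python
def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through every epoch.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving at first, then degrading.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping(val_losses, patience=3)
print(stop_epoch, best_epoch)  # prints: 6 3
```

Training would halt at epoch 6, and the weights saved at epoch 3 (the lowest validation loss) would be the ones to keep.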

Data augmentation: Data augmentation involves generating new training data by applying transformations to the existing data, such as rotating or flipping images. This can help the model generalize better to new data and reduce overfitting.
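For image-like arrays, the flips and rotations mentioned above map directly onto NumPy operations. This is a small sketch with a hypothetical `augment` helper; whether a given transformation preserves the label depends on the task (a flipped digit, for example, may no longer be a valid digit).

```python
import numpy as np

def augment(image):
    """Return simple variants of a 2-D image array."""
    return [
        image,                 # original
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, k=1),  # 90-degree counter-clockwise rotation
    ]

image = np.arange(6).reshape(2, 3)
augmented = augment(image)
for variant in augmented:
    print(variant, end="\n\n")
```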

## Ans3

Underfitting occurs when a machine learning model is too simple to capture the underlying pattern in the data, resulting in poor performance on both the training and test data. This can happen when the model is not complex enough to capture the nuances in the data, or when there is not enough training data to accurately represent the problem.

Scenarios where underfitting can occur in machine learning:

Linear models: Linear models such as linear regression and logistic regression can underfit when the relationship between the features and the target variable is not linear. In such cases, more complex models such as decision trees or neural networks may be more appropriate.

Insufficient data: When the training data is too limited or unrepresentative to expose the underlying pattern, the model may fail to capture it. Note that with a flexible model, a small dataset more often causes overfitting than underfitting. In either case, collecting more data or using data augmentation techniques can help improve performance.

Over-regularization: Regularization is a technique used to prevent overfitting by adding a penalty term to the objective function. If the regularization is too strong, it can lead to underfitting by oversimplifying the model. Tuning the regularization parameter can help find the right balance between overfitting and underfitting.

High bias: High bias can occur when the model is too simple and cannot capture the underlying patterns in the data. This can happen when the model architecture is too simple or when certain features are not included in the model. Increasing the complexity of the model or adding more features can help reduce bias and improve performance.

Incorrect data preprocessing: Data preprocessing techniques such as feature scaling and normalization can significantly impact the performance of machine learning models. Incorrect data preprocessing can lead to underfitting by making it difficult for the model to capture the underlying patterns in the data.
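The first and fourth scenarios above (a linear model on a non-linear relationship, and reducing bias by adding features) can be illustrated together. The sketch below assumes a noise-free quadratic target purely to keep the numbers easy to read; `fit_and_mse` is a hypothetical helper.

```python
import numpy as np

def fit_and_mse(features, y):
    """Least-squares fit with an intercept; returns the training MSE."""
    X = np.column_stack([np.ones(len(y)), features])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ weights - y) ** 2))

x = np.linspace(-1.0, 1.0, 21)
y = x ** 2  # assumed noise-free quadratic target

# Using x alone, the best straight line is flat and misses the curve.
mse_linear = fit_and_mse(x, y)
# Adding x^2 as a feature gives the model enough capacity to fit exactly.
mse_quadratic = fit_and_mse(np.column_stack([x, x ** 2]), y)

print(f"linear features:    MSE = {mse_linear:.4f}")
print(f"with x^2 feature:   MSE = {mse_quadratic:.2e}")
```

The model is still linear in its weights in both cases; only the feature set changes, which is why adding a missing feature is often the cheapest fix for high bias.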

## Ans4

The bias-variance tradeoff describes the tension between a model's ability to fit the training data and its ability to generalize to real-world data.

The tradeoff occurs because reducing bias typically increases variance, and vice versa.

Bias refers to the error that occurs when a model makes simplifying assumptions about the underlying problem, resulting in the model consistently underpredicting or overpredicting the target variable. High bias models are typically simple and have low complexity, making them more likely to underfit the training data.

Variance refers to the error that occurs when a model is too sensitive to the noise in the training data and captures the idiosyncrasies of the training data rather than the underlying pattern. High variance models are typically more complex and have more flexibility to fit the training data, making them more likely to overfit the data.

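Because the two usually cannot be minimized at the same time, the goal is to choose a level of model complexity that keeps both bias and variance low enough to minimize the total error on unseen data.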
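For squared-error loss, the tradeoff is often written as a decomposition of the expected prediction error. The notation here is an assumption introduced for illustration: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the fitted model, and expectations are taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

The noise term cannot be reduced by any model, so improving generalization means lowering the sum of the first two terms.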

## Ans5

## Ans6

Bias and variance are two important concepts in machine learning that describe the characteristics of a model and its ability to generalize to new data.

Bias refers to the error that arises from the model's inability to capture the true relationship between the features and the target variable. High bias models are typically too simple and have low complexity, resulting in a significant amount of underfitting. Examples of high bias models include linear regression models with few features or shallow decision trees. These models are generally easier to interpret and faster to train, but they may not capture the complexity of the underlying data, leading to poor performance on both the training and testing datasets.

Variance refers to the error that arises from the model's sensitivity to fluctuations in the training data. High variance models are typically too complex and have high flexibility, leading to overfitting. Examples of high variance models include deep neural networks or very deep decision trees. These models can capture complex relationships in the data, but they may not generalize well to new data due to their sensitivity to noise and fluctuations in the training dataset.

In general, high bias models have low variance and are stable across different training sets, but they have a higher chance of underfitting the training data. High variance models, on the other hand, have low bias but low stability, and they have a higher chance of overfitting the training data.
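Both quantities can be estimated empirically by refitting the same model on many independently drawn training sets and looking at how the predictions behave. The sketch below is a toy experiment under stated assumptions: a sine-shaped true function, Gaussian noise, and polynomial models of degree 1 (simple) and degree 9 (flexible). None of these choices come from the answer above.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_fn(x):
    # Assumed ground-truth function for this toy experiment.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=200, noise_std=0.3, seed=0):
    """Estimate average squared bias and variance of a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, 20)
    x_eval = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_trials, x_eval.size))
    for t in range(n_trials):
        # Each trial draws a fresh noisy training set from the same process.
        y_train = true_fn(x_train) + rng.normal(0.0, noise_std, x_train.shape)
        preds[t] = Polynomial.fit(x_train, y_train, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    # Bias: how far the average prediction is from the truth.
    bias_sq = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))
    # Variance: how much predictions move between training sets.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

bias_simple, var_simple = bias_variance(degree=1)
bias_complex, var_complex = bias_variance(degree=9)
print(f"degree 1: bias^2={bias_simple:.4f}  variance={var_simple:.4f}")
print(f"degree 9: bias^2={bias_complex:.4f}  variance={var_complex:.4f}")
```

The simple model should show the larger squared bias and the flexible model the larger variance, matching the description above.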

## Ans7

Regularization is a technique used in machine learning to prevent overfitting by adding a penalty term to the loss function during training. The penalty term encourages the model to have smaller weights, resulting in a simpler model that is less likely to overfit the training data.

Regularization techniques:

L1 regularization (Lasso regression): In L1 regularization, the penalty term is the sum of the absolute values of the weights. This results in a sparse model where many weights are set to zero, effectively performing feature selection.

L2 regularization (Ridge regression): In L2 regularization, the penalty term is the sum of the squares of the weights. This results in a smoother model that distributes the weight more evenly across all features.
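The different effects of the two penalties can be sketched with NumPy. For L2, the example uses the closed-form ridge solution for a linear model without an intercept. For L1 there is no closed form in general, so the example only shows the soft-thresholding step that many Lasso solvers apply repeatedly, which is what produces exact zeros. The data, the penalty strengths, and the function names are all assumptions made for the illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def soft_threshold(w, t):
    """Shrink weights toward zero by t; weights with |w| <= t become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])  # hypothetical ground truth
y = X @ true_w + rng.normal(0.0, 0.1, size=50)

w_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
w_ridge = ridge_fit(X, y, lam=50.0)  # larger penalty gives smaller weights

print("OLS weight norm:  ", np.linalg.norm(w_ols))
print("Ridge weight norm:", np.linalg.norm(w_ridge))
print("L1 shrinkage:", soft_threshold(np.array([3.0, -0.5, 0.2, -2.0]), 1.0))
```

The ridge weights are smaller in norm than the unpenalized ones but generally remain non-zero, whereas the soft-thresholding step sets the two small weights exactly to zero and shrinks the others by the threshold.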