# Unit 1 Recall of the Linear Regression Basics

## Lesson Introduction

This lesson provides a quick refresher on the core concepts of linear regression, focusing on key steps and implementation in Python using `sklearn`. By the end of this lesson, you'll be ready to load datasets, split them, create and train a linear regression model, make predictions, and evaluate the model.

## Loading Data

We'll start by loading the diabetes dataset from `sklearn`. This dataset contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements), which were obtained for each of 442 diabetes patients. The target is a quantitative measure of disease progression one year after baseline.

```python
import numpy as np
from sklearn import datasets

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data  # Features
y = diabetes.target  # Target

print("Features:\n", X[:2])
print("Target:\n", y[:2])
```

Note that we can access the features and the target of this dataset through the `.data` and `.target` attributes. This code prints the first two rows of each so we can observe their structure:

```
Features:
 [[ 0.03807591  0.05068012  0.06169621  0.02187239 -0.0442235  -0.03482076
  -0.04340085 -0.00259226  0.01990749 -0.01764613]
 [-0.00188202 -0.04464164 -0.05147406 -0.02632753 -0.00844872 -0.01916334
   0.07441156 -0.03949338 -0.06833155 -0.09220405]]
Target:
 [151.  75.]
```

There is also a shortcut for loading X and y:

```python
X, y = datasets.load_diabetes(return_X_y=True)
```

Passing `return_X_y=True` makes `load_diabetes` return the features and the target as a `(data, target)` tuple instead of a single dataset object. Use whichever approach you find more comfortable.
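To confirm that the two loading approaches give the same arrays, a quick check along these lines can help. This is a minimal sketch; the names `X_short` and `y_short` are only illustrative.

```python
import numpy as np
from sklearn import datasets

# Approach 1: load the dataset object and read its attributes
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Approach 2: ask for the (data, target) tuple directly
X_short, y_short = datasets.load_diabetes(return_X_y=True)

print(X.shape, y.shape)  # (442, 10) (442,)

# Both approaches should yield identical arrays
print(np.array_equal(X, X_short), np.array_equal(y, y_short))
```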

## Splitting the Dataset

Next, we'll split our data into training and testing sets, as we did before. As a reminder, the `train_test_split` function does this for us.

```python
from sklearn.model_selection import train_test_split

# Split dataset into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training set size:", X_train.shape)
print("Testing set size:", X_test.shape)
```

Output:

```
Training set size: (353, 10)
Testing set size: (89, 10)
```

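The `test_size` parameter is set to `0.2`, so 20% of the samples (89 of 442) go to the test set; a test share of 20-30% is common. Fixing `random_state` makes the shuffle, and therefore the split, reproducible across runs.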
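To see how these two parameters behave, here is a small sketch that repeats the split with the same `random_state` and then tries a larger `test_size`. The variable names with `_a`, `_b`, and `_c` suffixes are only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)

# Two splits with the same random_state should select the same rows
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Same split:", np.array_equal(X_test_a, X_test_b))

# A larger test_size moves more rows from the training set to the test set
X_train_c, X_test_c, y_train_c, y_test_c = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print("30% split sizes:", X_train_c.shape, X_test_c.shape)
```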

## Creating the Model

Let's create a Linear Regression model and train it:

```python
from sklearn.linear_model import LinearRegression

# Create a Linear Regression model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
```
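After `fit`, the learned parameters are available as `model.coef_` (one weight per feature) and `model.intercept_`. The sketch below, which assumes the same split as above, checks that a prediction is just the dot product of the features with the coefficients plus the intercept. The name `manual_pred` is illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# One coefficient per feature, plus a single intercept
print("Coefficients shape:", model.coef_.shape)  # (10,)
print("Intercept:", model.intercept_)

# Reproduce the model's predictions by hand: X @ w + b
manual_pred = X_test @ model.coef_ + model.intercept_
print("Matches predict():", np.allclose(manual_pred, model.predict(X_test)))
```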

## Making Predictions

Using the trained model, let's make predictions on the test set:

```python
# Make predictions on the test set
y_pred = model.predict(X_test)
print(y_pred[:5])  # [139.5475584  179.51720835 134.03875572 291.41702925 123.78965872]
```

We print the first five predictions to inspect their values. Now we can evaluate the model's performance with a metric. Here we use Mean Squared Error (MSE), the average of the squared differences between the actual and the predicted values:

```python
from sklearn.metrics import mean_squared_error

# Evaluate the model using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: %.2f" % mse)
```

Output:

```
Mean Squared Error: 2900.13
```
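Since MSE is the mean of the squared differences between actual and predicted values, it can be reproduced with plain NumPy. The sketch below assumes the same split as above and compares a hand-computed value with `mean_squared_error`. It also takes the square root (often called RMSE), which expresses the error in the same units as the target. The names `mse_manual` and `rmse` are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE from sklearn and the same quantity computed by hand
mse_sklearn = mean_squared_error(y_test, y_pred)
mse_manual = np.mean((y_test - y_pred) ** 2)
print("Values agree:", np.isclose(mse_sklearn, mse_manual))

# The square root of MSE is in the same units as the target
rmse = np.sqrt(mse_manual)
print("RMSE: %.2f" % rmse)
```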

## Lesson Summary

You've refreshed your knowledge on:

  * Loading datasets
  * Splitting data into training and testing sets
  * Creating and training a linear regression model
  * Making predictions
  * Evaluating the model using `MSE`

Now, you're prepared for the practice session to reinforce these concepts. Let's dive in!

## Print the First 10 Predictions

Hey Space Explorer, let's tweak some code! Instead of printing the model coefficients, show the first 10 predictions that the model makes on the testing set. Let's practice making predictions.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Print the coefficients of the model
print("Model coefficients:", model.coef_)

```

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Print the first 10 predictions
print("First 10 predictions:", y_pred[:10])
```

## Complete the Linear Regression Model

Fantastic job so far, Space Voyager! Now, let's test your skills. Fill in the missing pieces to make the code complete and functional. Remember, we need to create and train a Linear Regression model.

Launch into action!


```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# TODO: Create the Linear Regression model and train it

# Make predictions and evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error on Diabetes Dataset: %.2f" % mse)

```

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Linear Regression model and train it
model = LinearRegression() # Instantiate the LinearRegression model
model.fit(X_train, y_train) # Train the model using the training data

# Make predictions and evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error on Diabetes Dataset: %.2f" % mse)
```

## Predicting Diabetes Health Metrics

Space Explorer, let's wrap up our journey into the stars with this final task!

You'll need to split the dataset, create a model, make predictions, and evaluate its performance using Mean Squared Error. Complete the TODO sections with the code to accomplish these tasks.



```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load a diabetes dataset
data = datasets.load_diabetes()
X = data.data
y = data.target

# TODO: Split the data into training and testing sets
# TODO: Create and train the linear regression model
# TODO: Make predictions on the test set
# TODO: Evaluate the model using Mean Squared Error and print the result

```

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load a diabetes dataset
data = datasets.load_diabetes()
X = data.data
y = data.target

# Split the data into training and testing sets
# We'll use 80% for training and 20% for testing, with a fixed random_state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the linear regression model
# Instantiate the LinearRegression model.
model = LinearRegression()
# Train the model using the training data (features X_train and target y_train).
model.fit(X_train, y_train)

# Make predictions on the test set
# Use the trained model to make predictions on the unseen test features (X_test).
y_pred = model.predict(X_test)

# Evaluate the model using Mean Squared Error and print the result
# Calculate the Mean Squared Error between the actual target values (y_test)
# and the predicted target values (y_pred).
mse = mean_squared_error(y_test, y_pred)
# Print the calculated Mean Squared Error, formatted to two decimal places.
print(f"Mean Squared Error on Diabetes Dataset: {mse:.2f}")

```
