# Color-Magnitude diagram

We will be fitting a polynomial to the color-magnitude relation for the stars in the Pleiades cluster.

In this activity we will stick to ordinary least squares (OLS) fitting, but you are encouraged to use this notebook to explore other options.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## 1. Read the data in `pleiadesdata.csv`.

In [2]:
data = pd.read_csv('pleiadesdata.csv', skiprows=5, 
                   names=['id', 'V', 'B', 'B-V', 'Vmag', 'VMag', 'm-M'])

## 2. Plot the `B-V` color vs. `V` magnitude

Magnitudes are usually plotted with brighter stars at the top and fainter stars at the bottom. Since smaller magnitudes correspond to brighter stars, this means inverting the y-axis.

In [3]:
# Complete here
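One possible way to make this plot is sketched below. To keep the snippet runnable on its own, it uses a made-up `demo` frame with synthetic values instead of the catalogue; in the notebook you would use the `B-V` and `V` columns of `data`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the catalogue (made-up values, NOT the Pleiades data).
rng = np.random.default_rng(0)
color = rng.uniform(-0.1, 1.3, 50)
demo = pd.DataFrame({"B-V": color, "V": 4.0 + 6.0 * color + rng.normal(0, 0.3, 50)})

fig, ax = plt.subplots()
ax.scatter(demo["B-V"], demo["V"], s=10)
ax.set_xlabel("B-V")
ax.set_ylabel("V")
ax.invert_yaxis()  # smaller magnitude = brighter, so bright stars end up on top
```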

## 3. Split the sample into training and testing subsets. Start by fitting a straight line to the data using ordinary least squares.

In [4]:
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.metrics import mean_squared_error

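Notice the reshaping done in the following step. The linear models in sklearn expect the features as a 2D array of shape `(n_samples, n_features)`, so the 1D column has to be reshaped to `(n_samples, 1)`.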

In [5]:
x = data['B-V'].to_numpy().reshape(-1, 1)
y = data['V'].to_numpy().reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)
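To see what the reshaping and the split do to the array shapes, here is a small sketch on a made-up array (`values` is illustrative only, not a column of the catalogue):

```python
import numpy as np
from sklearn.model_selection import train_test_split

values = np.arange(8, dtype=float)  # made-up 1D array standing in for a DataFrame column
column = values.reshape(-1, 1)      # -1 lets NumPy infer the number of rows

print(values.shape)  # (8,)
print(column.shape)  # (8, 1)

# With no test_size given, train_test_split holds out 25% of the samples.
a_train, a_test = train_test_split(column, random_state=42)
print(a_train.shape, a_test.shape)  # (6, 1) (2, 1)
```

Passing `test_size` explicitly is an option if you want a different train/test ratio.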

Plot both the train and the test samples.

In [6]:
# complete here

Now do the fit. Report the MSE for both the train and the test samples.

In [7]:
ols1 = linear_model.LinearRegression()
linear = ols1.fit(x_train, y_train)
# print the Train MSE
print("Test MSE = ", mean_squared_error(y_test, linear.predict(x_test)))

Test MSE =  0.39227736654044426
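The train MSE follows the same pattern as the test MSE: predict on the training features and compare with the training targets. The sketch below uses synthetic data (made-up `x_demo` and `y_demo`, not the Pleiades catalogue), so its numbers will not match the output above.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made-up, NOT the Pleiades catalogue).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-0.1, 1.3, 80).reshape(-1, 1)
y_demo = 4.0 + 6.0 * x_demo + rng.normal(0, 0.3, x_demo.shape)

xd_train, xd_test, yd_train, yd_test = train_test_split(x_demo, y_demo, random_state=42)

model = linear_model.LinearRegression().fit(xd_train, yd_train)
mse_train = mean_squared_error(yd_train, model.predict(xd_train))
mse_test = mean_squared_error(yd_test, model.predict(xd_test))
print("Train MSE =", mse_train)
print("Test MSE =", mse_test)
```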


Plot the train, test, and model.

In [8]:
x_to_plot = np.linspace(-.2, 1.4, 20).reshape(-1, 1) # this will be useful to plot the model

# complete the rest
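A possible shape for this plot is sketched below, again on synthetic data so that it runs on its own. The `grid` array plays the role of `x_to_plot`; the data and the variable names are assumptions made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made-up, NOT the Pleiades catalogue).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-0.1, 1.3, 80).reshape(-1, 1)
y_demo = 4.0 + 6.0 * x_demo + rng.normal(0, 0.3, x_demo.shape)
xd_train, xd_test, yd_train, yd_test = train_test_split(x_demo, y_demo, random_state=42)

model = linear_model.LinearRegression().fit(xd_train, yd_train)
grid = np.linspace(-0.2, 1.4, 20).reshape(-1, 1)  # evenly spaced colors for drawing the model

fig, ax = plt.subplots()
ax.scatter(xd_train.ravel(), yd_train.ravel(), s=10, label="train")
ax.scatter(xd_test.ravel(), yd_test.ravel(), s=10, label="test")
ax.plot(grid.ravel(), model.predict(grid).ravel(), color="k", label="linear fit")
ax.set_xlabel("B-V")
ax.set_ylabel("V")
ax.invert_yaxis()  # brighter (smaller magnitude) on top
ax.legend()
```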


## 4. Fit a 2nd order polynomial.

Here it's recommended to use `PolynomialFeatures` to build the design matrix automatically.

In [9]:
from sklearn.preprocessing import PolynomialFeatures

The following cell transforms the train and test sets into the corresponding design matrices.

In [10]:
poly2 = PolynomialFeatures(degree=2)
x_train_2 = poly2.fit_transform(x_train)
x_test_2 = poly2.transform(x_test)  # reuse the transformer already fitted on the training set

Now we can fit the model just like before.

In [11]:
# Complete here to fit the model
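As a rough guide, the fit could look like the sketch below. It runs on synthetic data with a built-in quadratic term (made-up values, not the Pleiades catalogue), and the names `xd_train_2`, `xd_test_2` and `ols2` are chosen for the example only.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (made-up, NOT the Pleiades catalogue).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-0.1, 1.3, 80).reshape(-1, 1)
y_demo = 4.0 + 6.0 * x_demo - 1.5 * x_demo**2 + rng.normal(0, 0.3, x_demo.shape)
xd_train, xd_test, yd_train, yd_test = train_test_split(x_demo, y_demo, random_state=42)

poly = PolynomialFeatures(degree=2)
xd_train_2 = poly.fit_transform(xd_train)  # columns: 1, x, x^2
xd_test_2 = poly.transform(xd_test)        # same transformer, no refit

# The constant column duplicates the intercept that LinearRegression fits by default;
# the predictions are unaffected.
ols2 = linear_model.LinearRegression().fit(xd_train_2, yd_train)
print("Train MSE =", mean_squared_error(yd_train, ols2.predict(xd_train_2)))
print("Test MSE =", mean_squared_error(yd_test, ols2.predict(xd_test_2)))
```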

And now plot the data and the 2nd order model.

In [12]:
x_to_plot_2 = poly2.transform(x_to_plot)  # the plotting grid must be transformed too before calling predict

## 5. Keep increasing the order and decide when to stop

Write a function that does all the previous steps at once and returns the mean squared errors (MSE) for the training sample and for the testing sample.

Plot the MSE as a function of the degree of the polynomial for the training sample and for the testing sample.

In [13]:
def fit_model(degree, plot=True):
    # Complete this function to run all the previous steps automatically.
    pass
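One way such a function might be organised is sketched below. Unlike the `fit_model(degree, plot=True)` stub, this hypothetical `fit_model_sketch` takes the train and test arrays as explicit arguments so that it can run on its own; the usage lines rely on synthetic data, not the Pleiades catalogue.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures


def fit_model_sketch(degree, x_train, y_train, x_test, y_test, plot=True):
    """Fit a polynomial of the given degree by OLS; return (train MSE, test MSE)."""
    poly = PolynomialFeatures(degree=degree)
    model = linear_model.LinearRegression().fit(poly.fit_transform(x_train), y_train)

    mse_train = mean_squared_error(y_train, model.predict(poly.transform(x_train)))
    mse_test = mean_squared_error(y_test, model.predict(poly.transform(x_test)))

    if plot:
        # Draw the fitted curve over the color range covered by the training set.
        grid = np.linspace(x_train.min(), x_train.max(), 100).reshape(-1, 1)
        fig, ax = plt.subplots()
        ax.scatter(x_train.ravel(), y_train.ravel(), s=10, label="train")
        ax.scatter(x_test.ravel(), y_test.ravel(), s=10, label="test")
        ax.plot(grid.ravel(), model.predict(poly.transform(grid)).ravel(),
                color="k", label=f"degree {degree}")
        ax.invert_yaxis()  # brighter (smaller magnitude) on top
        ax.legend()
    return mse_train, mse_test


# Usage on synthetic data (made-up, NOT the Pleiades catalogue).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-0.1, 1.3, 80).reshape(-1, 1)
y_demo = 4.0 + 6.0 * x_demo - 1.5 * x_demo**2 + rng.normal(0, 0.3, x_demo.shape)
xd_train, xd_test, yd_train, yd_test = train_test_split(x_demo, y_demo, random_state=42)

print(fit_model_sketch(2, xd_train, yd_train, xd_test, yd_test, plot=False))
```

Passing the data explicitly avoids relying on notebook globals; keeping the original two-argument signature and reading `x_train`, `y_train`, `x_test`, `y_test` from the notebook works just as well.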

Use the function to fit polynomials of different degrees and store the MSE for the train and test datasets.

In [14]:
# Complete here

Plot the MSEs previously calculated.

In [15]:
# plt.plot(mse_test)
# plt.plot(mse_train)
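The loop over degrees and the final plot could be sketched as follows. This version chains `PolynomialFeatures` and `LinearRegression` with sklearn's `make_pipeline` as a compact alternative to a hand-written function, and it runs on synthetic data (made-up values, not the Pleiades catalogue). The range of degrees is an arbitrary choice for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (made-up, NOT the Pleiades catalogue).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-0.1, 1.3, 80).reshape(-1, 1)
y_demo = 4.0 + 6.0 * x_demo - 1.5 * x_demo**2 + rng.normal(0, 0.3, x_demo.shape)
xd_train, xd_test, yd_train, yd_test = train_test_split(x_demo, y_demo, random_state=42)

degrees = list(range(1, 7))  # arbitrary range chosen for the example
mse_train, mse_test = [], []
for d in degrees:
    # The pipeline applies the polynomial expansion before the OLS fit and before predict.
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(xd_train, yd_train)
    mse_train.append(mean_squared_error(yd_train, model.predict(xd_train)))
    mse_test.append(mean_squared_error(yd_test, model.predict(xd_test)))

fig, ax = plt.subplots()
ax.plot(degrees, mse_train, marker="o", label="train")
ax.plot(degrees, mse_test, marker="o", label="test")
ax.set_xlabel("polynomial degree")
ax.set_ylabel("MSE")
ax.legend()
```

The train MSE cannot increase as the degree grows, because each polynomial contains the lower-degree ones as special cases. The test MSE has no such guarantee, which is why it is the more useful curve for deciding when to stop.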