Watch: 29-minute video
Start with the following data:
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.DataFrame(
    columns=["x1", "x2", "x3", "y"],
    data=[
        [5, 1, 2, True],
        [4, 3, 3, False],
        [0, 3, 4, False],
        [0, 1, 4, False],
        [3, 1, 3, False],
        [5, 1, 1, True],
        [6, 3, 1, True],
        [1, 4, 2, False],
        [3, 0, 0, True],
        [7, 1, 0, True],
        [2, 4, 0, False],
        [3, 4, 0, False],
        [3, 5, 1, False],
        [9, 4, 3, True],
        [0, 0, 3, False],
        [7, 0, 4, True],
        [6, 0, 1, True],
        [2, 4, 2, False],
        [5, 3, 0, True],
        [4, 0, 1, True],
    ],
)
df
The pattern is that y is True whenever x1 >= x2 + x3. Let's see how well a LogisticRegression can learn that pattern.
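Before training anything, we can confirm that the stated rule really does describe every row. The following is a minimal sketch that rebuilds the same 20 rows compactly; the names rows, rule, and matches are ours and are not part of the lab:

```python
import pandas as pd

# The same 20 rows as the DataFrame above, written four per line.
rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])

# Recompute the label from the stated rule and compare it with the y column.
rule = df["x1"] >= df["x2"] + df["x3"]
matches = (rule == df["y"]).all()
print(matches)  # True: the rule reproduces y on all 20 rows
```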
We pre-shuffled the data before giving it to you, so let's use the first 10 rows for training and the last 10 for testing (no need to use sklearn's train_test_split). Complete the following slices to do this:
train, test = df.iloc[????], df.iloc[????]
ANSWER
:10, 10:
Fit a classifier to the training data:
lr = LogisticRegression()
lr.fit(????, ????)
ANSWER
train[["x1", "x2", "x3"]], train["y"]
How accurate is the model on the test data? Copy the last line from above, then replace "fit" with "score" and each "train" with "test". You should get 0.9.
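For a classifier, score reports mean accuracy: the fraction of rows whose prediction matches the label. As a rough sketch, assuming the split and fit from the answers above, we can compute that fraction by hand and compare it with score. The names rows, features, builtin_accuracy, and manual_accuracy are ours, not from the lab:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# The same 20 rows as the DataFrame above, written four per line.
rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])
features = ["x1", "x2", "x3"]  # helper name introduced for this sketch

train, test = df.iloc[:10], df.iloc[10:]
lr = LogisticRegression()
lr.fit(train[features], train["y"])

# score() returns mean accuracy on the rows it is given.
builtin_accuracy = lr.score(test[features], test["y"])

# The same quantity by hand: share of test rows predicted correctly.
manual_accuracy = (lr.predict(test[features]) == test["y"]).mean()

print(builtin_accuracy, manual_accuracy)
```

The two printed numbers should agree, since both count correct predictions over the same 10 test rows.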
Let's add the predictions as a column in the test DataFrame. Note that test refers to a subset of the rows in df; pandas will complain with a SettingWithCopyWarning if we add a column to this subset without first making test a full copy of those rows:
????
test["predicted_y"] = lr.predict(????)
test
ANSWER
test=test.copy(), test[["x1", "x2", "x3"]]
Let's show the rows in the test dataset that the model misclassifies:
test[test["y"] != ????]
ANSWER
test["predicted_y"]
Note that fit_intercept=True is the default for LogisticRegression. We overrode this default in the lecture video, but not here. This means that both lr.coef_ and lr.intercept_ will be set.
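To see what the two attributes look like, here is a small sketch that fits one model with the default and one with fit_intercept=False on the first 10 rows. The names rows, features, with_intercept, and without_intercept are ours; the shapes in the comments are what scikit-learn documents for a binary problem:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# The same 20 rows as the DataFrame above, written four per line.
rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])
features = ["x1", "x2", "x3"]  # helper name introduced for this sketch
train = df.iloc[:10]

with_intercept = LogisticRegression().fit(train[features], train["y"])
without_intercept = LogisticRegression(fit_intercept=False).fit(
    train[features], train["y"]
)

# Binary problem: one row of coefficients, one coefficient per feature.
print(with_intercept.coef_.shape)       # (1, 3)
print(with_intercept.intercept_.shape)  # (1,)

# With fit_intercept=False, scikit-learn leaves the intercept at zero.
print(without_intercept.intercept_)
```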
The formula is very similar to before. We still take the dot product, but then add the intercept:
test[["x1", "x2", "x3"]] @ lr.coef_.T + lr.intercept_ > 0
Verify that the above calculation produces the same results as we saw with lr.predict earlier.
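One way to carry out this check is sketched below. It assumes the split, fit, and copy steps from the answers above; the names rows, features, manual_y, agree, and misclassified are ours. The DataFrame expression yields a one-column result, so we flatten it to a plain array before comparing:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# The same 20 rows as the DataFrame above, written four per line.
rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])
features = ["x1", "x2", "x3"]  # helper name introduced for this sketch

train, test = df.iloc[:10], df.iloc[10:]
lr = LogisticRegression()
lr.fit(train[features], train["y"])

# Take a full copy so that adding a column does not raise SettingWithCopyWarning.
test = test.copy()
test["predicted_y"] = lr.predict(test[features])

# Manual decision rule: dot product plus intercept, positive means True.
manual = test[features] @ lr.coef_.T + lr.intercept_ > 0
manual_y = manual.to_numpy().ravel()  # one-column DataFrame -> flat bool array

# Compare the manual result with lr.predict, row by row.
agree = (manual_y == test["predicted_y"].to_numpy()).all()
print(agree)

# Rows where the label and the prediction differ.
misclassified = test[test["y"] != test["predicted_y"]]
print(misclassified)
```

If agree prints True, the manual formula and lr.predict classify every test row the same way.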