
Classification

1. LogisticRegression Classifier (it's not a regressor!)

Practice: Fit, Score, Predict

Start with the following data:

import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame(
    columns=["x1", "x2", "x3", "y"],
    data = [
       [5, 1, 2, True],
       [4, 3, 3, False],
       [0, 3, 4, False],
       [0, 1, 4, False],
       [3, 1, 3, False],
       [5, 1, 1, True],
       [6, 3, 1, True],
       [1, 4, 2, False],
       [3, 0, 0, True],
       [7, 1, 0, True],
       [2, 4, 0, False],
       [3, 4, 0, False],
       [3, 5, 1, False],
       [9, 4, 3, True],
       [0, 0, 3, False],
       [7, 0, 4, True],
       [6, 0, 1, True],
       [2, 4, 2, False],
       [5, 3, 0, True],
       [4, 0, 1, True]
    ]
)
df

The pattern is that y is True whenever x1 >= x2 + x3. Let's see how well a LogisticRegression can learn that pattern.
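As a quick sanity check of the stated pattern, here is a minimal sketch that rebuilds the same 20 rows in a more compact layout and compares the rule against the `y` column. The names `rows` and `matches_rule` are illustrative and not part of the exercise.

```python
import pandas as pd

# Same 20 rows as above, written four per line to save space.
rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])

# Evaluate x1 >= x2 + x3 for every row and compare it with the label.
matches_rule = (df["x1"] >= df["x2"] + df["x3"]) == df["y"]

# Prints whether every row follows the rule.
print(matches_rule.all())
```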

We pre-shuffled the data before giving it to you, so let's use the first 10 rows for training and the last 10 for testing (no need to use sklearn's train_test_split). Complete the following slices to do this:

train, test = df.iloc[????], df.iloc[????]

ANSWER

:10, 10:
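For reference, a small self-contained sketch of the positional split, assuming the same 20 rows as above (`rows` is an illustrative name). `iloc` slices by position, so `:10` takes rows 0 through 9 and `10:` takes the rest.

```python
import pandas as pd

rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])

# Position-based slicing: first 10 rows for training, remaining 10 for testing.
train, test = df.iloc[:10], df.iloc[10:]

print(train.shape, test.shape)
print(list(test.index))  # the test rows keep their original index labels
```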

Fit a classifier to the training data:

lr = LogisticRegression()
lr.fit(????, ????)

ANSWER

train[["x1", "x2", "x3"]], train["y"]

How accurate is the model on the test data? Copy the lr.fit(...) line from above, then replace "fit" with "score" and "train" with "test". The score is the fraction of test rows classified correctly. You should get 0.9.
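Putting the fit and score steps together, one possible end-to-end sketch looks like the following. The `rows`, `features`, and `accuracy` names are illustrative. The exact score can depend on the scikit-learn version and solver defaults, so the snippet prints it rather than assuming a value.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])
train, test = df.iloc[:10], df.iloc[10:]

features = ["x1", "x2", "x3"]

# Fit on the training rows only.
lr = LogisticRegression()
lr.fit(train[features], train["y"])

# For a classifier, score() returns accuracy: correct predictions / total rows.
accuracy = lr.score(test[features], test["y"])
print(accuracy)
```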

Let's add the predictions as a column in the test DataFrame (note: test refers to a subset of the rows in df; pandas will complain with SettingWithCopyWarning if we add a column to this subset without first making test a full copy of those rows):

????
test["predicted_y"] = lr.predict(????)
test

ANSWER

test = test.copy(), test[["x1", "x2", "x3"]]

Let's show the rows in the test dataset that the model mis-classifies:

test[test["y"] != ????]

ANSWER

test["predicted_y"]
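The predict and filter steps can be sketched together as below, again assuming the same data. The `rows`, `features`, and `wrong` names are illustrative. Taking the copy when slicing is one way to avoid the SettingWithCopyWarning mentioned earlier.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])
features = ["x1", "x2", "x3"]

train = df.iloc[:10]
# Copy the test rows so that adding a column does not touch (or warn about) df.
test = df.iloc[10:].copy()

lr = LogisticRegression()
lr.fit(train[features], train["y"])

# Store the model's predictions next to the true labels.
test["predicted_y"] = lr.predict(test[features])

# Boolean mask: keep only rows where the label and the prediction disagree.
wrong = test[test["y"] != test["predicted_y"]]
print(wrong)
```

The number of rows in `wrong` divided by the size of the test set is one minus the accuracy reported by `lr.score`.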

Note that fit_intercept=True is the default for LogisticRegression. We overrode this default in the lecture video, but not here. This means that both lr.coef_ and lr.intercept_ will be set.

The prediction formula is very similar to before: we still take the dot product of the features with the coefficients, but then add the intercept and predict True wherever the result is positive:

test[["x1", "x2", "x3"]] @ lr.coef_.T + lr.intercept_ > 0

Verify that the above calculation produces the same results as we saw with lr.predict earlier.
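One way to carry out this verification is sketched below, assuming the same data and split. The `rows`, `features`, `manual`, and `predicted` names are illustrative. The manual result has shape (10, 1) because `lr.coef_` has shape (1, 3), so it is flattened before the comparison.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rows = [
    [5, 1, 2, True], [4, 3, 3, False], [0, 3, 4, False], [0, 1, 4, False],
    [3, 1, 3, False], [5, 1, 1, True], [6, 3, 1, True], [1, 4, 2, False],
    [3, 0, 0, True], [7, 1, 0, True], [2, 4, 0, False], [3, 4, 0, False],
    [3, 5, 1, False], [9, 4, 3, True], [0, 0, 3, False], [7, 0, 4, True],
    [6, 0, 1, True], [2, 4, 2, False], [5, 3, 0, True], [4, 0, 1, True],
]
df = pd.DataFrame(rows, columns=["x1", "x2", "x3", "y"])
features = ["x1", "x2", "x3"]
train, test = df.iloc[:10], df.iloc[10:]

lr = LogisticRegression()  # fit_intercept=True by default
lr.fit(train[features], train["y"])

X_test = test[features]

# Manual rule: dot product with the coefficients, plus intercept, positive => True.
manual = (X_test.to_numpy() @ lr.coef_.T + lr.intercept_ > 0).ravel()

# The model's own predictions for the same rows.
predicted = lr.predict(X_test)

print(np.array_equal(manual, predicted))
```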


Classification

1. LogisticRegression Classifier (it's not a regressor!)

Watch: 29-minute video

Practice: Fit, Score, Predict

2. Decision Boundaries

Watch: 15-minute video
