# Exercises for Introduction to Python for Data Science

Week 07 - NumPy

Matthias Feurer and Andreas Bender  
2026-05-06

# Exercise 1

Write a class that implements a `fit` and a `predict` method. The `fit`
method should accept `X` and `y`, i.e., data for a supervised regression
task. The `predict` method should accept only `X` and return the
predictions for new data.

Here's an outline of the class to get you started.

``` python
class LinearRegression:
    def fit(self, X, y):
        pass

    def predict(self, X):
        pass
```

You can follow these steps to arrive at a useful solution:

1.  In the `fit` method, implement the equations for a linear regression
    and store the coefficients as an attribute so that they can be used
    later in the `predict` method.
2.  Standardize the data prior to fitting: make sure that `X` has a
    column-wise mean of 0 and a standard deviation of 1, and apply the
    same standardization (using the training statistics) to the test
    data.
3.  Standardize the targets prior to fitting, and de-standardize the
    predictions again before returning them to the user.

NB: use only vectorized operations.
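
As a hint for step 2, the following is a minimal sketch of column-wise standardization with broadcasting. The array names (`X_train`, `X_test`) and the simulated data are made up for illustration; the point is that the mean and standard deviation are computed once on the training data and then reused for the test data, without any Python loop.

``` python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # hypothetical training data
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))    # hypothetical test data

# column-wise statistics, computed on the training data only
mean = X_train.mean(axis=0)  # shape (3,)
std = X_train.std(axis=0)    # shape (3,)

# broadcasting applies the (3,) statistics to every row
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std  # reuse the training statistics

print(X_train_s.mean(axis=0))  # close to 0 in every column
print(X_train_s.std(axis=0))   # close to 1 in every column
```

The standardized test data will generally not have a mean of exactly 0 and a standard deviation of exactly 1, because the statistics come from the training data.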

## Solution Exercise 1

We implement ordinary least squares (OLS) in `fit` together with full
standardisation of features and target. After fitting, we undo the
scaling in `predict` so that predictions are returned in the original
units of `y`.
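
The two steps can be summarised in formulas. The notation is introduced here for illustration only and is not part of the exercise: $\bar{x}_j$ and $s_j$ denote the mean and standard deviation of feature $j$ in the training data, $\bar{y}$ and $s_y$ the same quantities for the target, and $\tilde{X}$ the standardised feature matrix with a leading column of ones.

$$
\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\tilde{y}_i = \frac{y_i - \bar{y}}{s_y}, \qquad
\hat{\beta} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top \tilde{y}
$$

$$
\hat{y}_{\text{new}} = \bar{y} + s_y \, \tilde{X}_{\text{new}} \hat{\beta}
$$

Here $\tilde{X}_{\text{new}}$ is built from the new data using the training statistics $\bar{x}_j$ and $s_j$.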

``` python
import numpy as np

class LinearRegression:
    """Minimal linear regression with standardisation of X and y."""

    # --------------------------------------------------------------------- #
    # fitting                                                               #
    # --------------------------------------------------------------------- #
    def fit(self, X, y):
        """Estimate coefficients via the closed-form OLS solution.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)
            Training data.
        y : array-like, shape (n_samples,) or (n_samples, 1)
            Targets.

        Returns
        -------
        self : fitted estimator
        """
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1, 1)

        # ----------------------------------------------------------------- #
        # 1) save means and stds                                            #
        # ----------------------------------------------------------------- #
        self.x_mean_ = X.mean(axis=0, keepdims=True)
        self.x_std_  = X.std(axis=0, keepdims=True)
        self.y_mean_ = y.mean()
        self.y_std_  = y.std()

        # guard against zero variance columns
        self.x_std_[self.x_std_ == 0] = 1.0
        if self.y_std_ == 0:
            self.y_std_ = 1.0

        # ----------------------------------------------------------------- #
        # 2) standardise                                                    #
        # ----------------------------------------------------------------- #
        Xs = (X - self.x_mean_) / self.x_std_
        ys = (y - self.y_mean_) / self.y_std_

        # ----------------------------------------------------------------- #
        # 3) add intercept                                                  #
        # ----------------------------------------------------------------- #
        Xs = np.hstack([np.ones((Xs.shape[0], 1)), Xs])

        # ----------------------------------------------------------------- #
        # 4) compute coefficients                                           #
        # ----------------------------------------------------------------- #
        XtX = Xs.T @ Xs                 # (d+1, d+1)
        XtX_inv = np.linalg.inv(XtX)    # ← assumes XtX is nonsingular
        self.coef_ = XtX_inv @ Xs.T @ ys

        return self

    # --------------------------------------------------------------------- #
    # prediction                                                            #
    # --------------------------------------------------------------------- #
    def predict(self, X):
        """Predict targets for new data.

        Parameters
        ----------
        X : array-like, shape (n_samples, n_features)

        Returns
        -------
        y_pred : ndarray, shape (n_samples,)
        """
        if not hasattr(self, "coef_"):
            raise AttributeError("Model is not fitted yet. Call 'fit' first.")

        X = np.asarray(X, dtype=float)

        # ----------------------------------------------------------------- #
        # 1) standardise                                                    #
        # ----------------------------------------------------------------- #
        Xs = (X - self.x_mean_) / self.x_std_

        # ----------------------------------------------------------------- #
        # 2) add intercept                                                  #
        # ----------------------------------------------------------------- #
        Xs = np.hstack([np.ones((Xs.shape[0], 1)), Xs])

        # ----------------------------------------------------------------- #
        # 3) compute predictions and return rescaled version                #
        # ----------------------------------------------------------------- #
        y_std = Xs @ self.coef_
        return (y_std * self.y_std_ + self.y_mean_).ravel()
```
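
The explicit inverse in `fit` only works when the matrix `Xs.T @ Xs` is nonsingular. A constant feature breaks this assumption: after centring it becomes an all-zero column, and the zero-variance guard only prevents the division by zero. The sketch below compares the explicit inverse with `np.linalg.solve` and `np.linalg.lstsq` on made-up data (`A`, `b`, and `A_bad` are hypothetical names); it is one possible alternative, not a required part of the solution.

``` python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical design matrix with an intercept column and noise-free targets
A = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
b = A @ np.array([[0.5], [2.0], [-1.0]])

# explicit inverse of the normal equations, as in the solution
beta_inv = np.linalg.inv(A.T @ A) @ A.T @ b

# solve the normal equations without forming the inverse
beta_solve = np.linalg.solve(A.T @ A, A.T @ b)

# least-squares solver working directly on A
beta_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# rank-deficient case: an extra all-zero column (e.g. a centred constant feature)
A_bad = np.hstack([A, np.zeros((50, 1))])
beta_bad, _, rank, _ = np.linalg.lstsq(A_bad, b, rcond=None)

print(beta_lstsq.ravel())
print(rank, beta_bad.ravel())
```

For the rank-deficient matrix, `np.linalg.lstsq` returns the minimum-norm solution, so the coefficient of the all-zero column is zero and the remaining coefficients are unchanged.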

# Exercise 2

Simulate random data to try your implementation:

1.  Sample random coefficients for a linear regression model and compute
    targets for a dense list of values `X`.
2.  Make predictions using the linear regression model you implemented
    in Exercise 1.
3.  Compare the learned coefficients with the data generating process,
    i.e., the random coefficients from step 1.
4.  Compute the RMSE using only vectorized operations.
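
For step 4, the RMSE is the square root of the mean squared difference between predictions and targets. The sketch below uses three made-up values (`y_true`, `y_pred`) and also illustrates a common pitfall: subtracting a column vector of shape `(n, 1)` from a flat vector of shape `(n,)` silently broadcasts to an `(n, n)` matrix instead of raising an error.

``` python
import numpy as np

y_true = np.array([[1.0], [2.0], [5.0]])  # column vector, shape (3, 1)
y_pred = np.array([1.0, 2.0, 3.0])        # flat vector, shape (3,)

# pitfall: (3,) minus (3, 1) broadcasts to all pairwise differences
wrong = y_pred - y_true
print(wrong.shape)  # (3, 3)

# flatten both arrays first so that the subtraction is element-wise
residuals = y_pred.ravel() - y_true.ravel()
rmse = np.sqrt(np.mean(residuals ** 2))
print(rmse)
```

Only the last pair differs (by 2), so the RMSE equals the square root of 4/3.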

## Solution Exercise 2

``` python
import numpy as np

rng = np.random.default_rng(0)

# --------------------------------------------------------------------------- #
# 1) simulate data
# --------------------------------------------------------------------------- #
n_samples   = 1000       # number of observations
n_features  = 3          # dimensionality
true_beta   = rng.normal(size=(n_features + 1, 1))   # random coefficients, intercept first

X     = rng.normal(size=(n_samples, n_features))     # input matrix

noise = rng.normal(scale=0.05, size=(n_samples, 1))  # Gaussian noise
y     = true_beta[0] + X @ true_beta[1:] + noise     # targets, shape (n_samples, 1)

# --------------------------------------------------------------------------- #
# 2) fit model
# --------------------------------------------------------------------------- #
model = LinearRegression().fit(X, y)

# --------------------------------------------------------------------------- #
# 3) compare coefficients
#    NB: coef_ is estimated on standardised X and y, so it lives on a
#    different scale than true_beta
# --------------------------------------------------------------------------- #
np.set_printoptions(precision=3, suppress=True)
print("True β*:      ", true_beta.ravel().round(3))
print("Estimated β̂:", model.coef_.ravel().round(3))

# --------------------------------------------------------------------------- #
# 4) compute RMSE
# --------------------------------------------------------------------------- #
pred  = model.predict(X)                             # shape (n_samples,)
rmse  = np.sqrt(((pred - y.ravel()) ** 2).mean())    # ravel y to avoid (n, n) broadcasting
print("RMSE:", rmse)
```

```
True β*:       [ 0.126 -0.132  0.64   0.105]
Estimated β̂: [-0.    -0.188  0.961  0.153]
RMSE: 0.05038573574021142
```
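
The estimated coefficients do not match the true ones even though the RMSE is close to the noise level of 0.05. The reason is that the coefficients are estimated for standardised features and a standardised target, so they are expressed in different units than `true_beta`. The following self-contained sketch repeats the simulation, fits on the standardised scale, and maps the coefficients back to the original units. It uses `np.linalg.lstsq` instead of the class from Exercise 1 to stay independent of it, and the names `coef_std`, `slopes`, `intercept`, and `beta_hat` are introduced here for illustration.

``` python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1000, 3
true_beta = rng.normal(size=(n_features + 1, 1))
X = rng.normal(size=(n_samples, n_features))
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.05, size=(n_samples, 1))

# standardise and fit on the standardised scale, mirroring `fit`
x_mean = X.mean(axis=0, keepdims=True)  # shape (1, n_features)
x_std = X.std(axis=0, keepdims=True)    # shape (1, n_features)
y_mean, y_std = y.mean(), y.std()
Xs = np.hstack([np.ones((n_samples, 1)), (X - x_mean) / x_std])
ys = (y - y_mean) / y_std
coef_std, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# map the coefficients back to the original units
slopes = y_std * coef_std[1:] / x_std.T                     # shape (n_features, 1)
intercept = y_mean + y_std * coef_std[0] - x_mean @ slopes  # shape (1, 1)
beta_hat = np.vstack([intercept, slopes])

np.set_printoptions(precision=3, suppress=True)
print("True beta:            ", true_beta.ravel())
print("Back-transformed beta:", beta_hat.ravel())
```

Each slope is rescaled by the ratio of the target's standard deviation to the feature's standard deviation, and the intercept absorbs the shifts introduced by centring. After this transformation, the estimates should agree with `true_beta` up to the noise in the data.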