# Simple Linear Regression: Confidence and Prediction Intervals

This notebook demonstrates how to compute and plot confidence and prediction intervals for a simple linear regression model in Python, using synthetic data on apartment rental prices (Y) and living area (X).

See also: https://lmc2179.github.io/posts/confidence_prediction.html

## Libraries and Settings

In [None]:
# Import libraries
import os
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import statsmodels.formula.api as smf

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Show current working directory
print(os.getcwd())

## Generate Apartment Data
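We generate a synthetic dataset of 25 apartments with rental prices (`Y`, in CHF) and living areas (`X`, in m²). The rents follow a linear trend from 1500 CHF at 45 m² to 4900 CHF at 160 m², plus normally distributed noise with a standard deviation of 400 CHF.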

In [None]:
# Set the random seed for reproducibility
np.random.seed(42)

# living_area (X): 25 evenly spaced values between 45 and 160 m2
x = np.linspace(45, 160, 25)

# rental_price (Y): linear trend from 1500 CHF (at 45 m2) to 4900 CHF (at 160 m2),
# plus Gaussian noise with a standard deviation of 400 CHF
y = np.interp(x, [45, 160], [1500, 4900]) + np.random.normal(0, 400, 25)

# Create a DataFrame with the 25 data points
df = pd.DataFrame({
    'rental_price': y,
    'living_area': x
})
df.head()
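Because `np.interp` with only two knots is simply the straight line through (45, 1500) and (160, 4900), the noise-free model behind the synthetic data can be written down exactly. The sketch below computes that true intercept and slope, which the OLS fit should approximately recover. The names `true_slope` and `true_intercept` are illustrative and not part of the original notebook.

```python
import numpy as np

# The two knots passed to np.interp in the data-generation cell
x_knots = [45, 160]
y_knots = [1500, 4900]

# Straight line through the two knots: slope = rise / run
true_slope = (y_knots[1] - y_knots[0]) / (x_knots[1] - x_knots[0])
true_intercept = y_knots[0] - true_slope * x_knots[0]

# Between the knots, np.interp reproduces exactly this line
x = np.linspace(45, 160, 25)
assert np.allclose(np.interp(x, x_knots, y_knots), true_intercept + true_slope * x)

print(f"True model: y = {true_intercept:.2f} + {true_slope:.2f}x + noise, noise ~ N(0, 400^2)")
```

Running it reports an intercept of about 169.57 CHF and a slope of about 29.57 CHF per m², the values the fitted coefficients in the next section are estimating.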

## Fit an OLS Regression Model
We will use `statsmodels` to fit an ordinary least squares (OLS) regression model for rental price as a function of living area.

In [None]:
# Fit OLS regression
model = smf.ols('rental_price ~ living_area', data=df)

# Get and show results
results = model.fit()
print(results.summary())

# Plot data and regression line
plt.scatter(df['living_area'], df['rental_price'])
plt.plot(df['living_area'], 
         results.predict(df), 
         color='darkred', 
         linewidth=2)
plt.xlabel('Living Area (m2)')
plt.ylabel('Rental Price (CHF)')
plt.title('Living Area vs Rental Price')

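# Add annotation (regression equation with f-string); coefficients are selected by label,
# because positional indexing such as params[0] is deprecated in recent pandas versions
plt.text(50, 4500, f"y = {results.params['Intercept']:.2f} + {results.params['living_area']:.2f}x", 
         fontsize=12, color='black')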

plt.show()
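For a single predictor, OLS has a closed-form solution: the slope is $S_{xy}/S_{xx}$ (sum of cross-deviations over sum of squared x-deviations) and the intercept is $\bar{y} - \text{slope}\cdot\bar{x}$. As a sanity check on what `smf.ols` computes, the self-contained sketch below regenerates the same synthetic data (same seed and call order), computes these estimates with plain NumPy, and compares them with the statsmodels coefficients. The helper `ols_closed_form` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def ols_closed_form(x, y):
    """Return (intercept, slope) of the least-squares line y ~ b0 + b1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    s_xy = np.sum((x - x_bar) * (y - y_bar))  # sum of cross-deviations
    s_xx = np.sum((x - x_bar) ** 2)           # sum of squared x-deviations
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Regenerate the notebook's synthetic data (same seed, same order of random calls)
np.random.seed(42)
x = np.linspace(45, 160, 25)
y = np.interp(x, [45, 160], [1500, 4900]) + np.random.normal(0, 400, 25)
df = pd.DataFrame({'rental_price': y, 'living_area': x})

b0, b1 = ols_closed_form(x, y)
results = smf.ols('rental_price ~ living_area', data=df).fit()

# The hand-computed estimates should match statsmodels up to floating-point error
print(f"closed form: b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"statsmodels: b0 = {results.params['Intercept']:.2f}, b1 = {results.params['living_area']:.2f}")
```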


## Generate Confidence and Prediction Intervals

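We compute both intervals at each observed living area. The confidence interval describes the uncertainty in the *mean* rental price for a given living area, whereas the prediction interval describes where the rent of an *individual* new apartment is likely to fall, so it is always wider. With `alpha = 0.05`, `summary_frame` returns 95% intervals in the columns `mean_ci_lower`/`mean_ci_upper` (confidence) and `obs_ci_lower`/`obs_ci_upper` (prediction), alongside the fitted value `mean` and its standard error `mean_se`.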
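For reference, the standard textbook formulas for simple linear regression with $n$ observations are given below; the symbols are introduced here for illustration and do not appear in the notebook's code. Both intervals are centred on the fitted value $\hat{y}_0$ at a living area $x_0$, $s$ is the residual standard error, and $t_{1-\alpha/2,\,n-2}$ is the Student-t critical value with $n-2$ degrees of freedom.

$$\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0, \qquad s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad S_{xx} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

$$\text{Confidence interval for } \mathbb{E}[Y \mid x_0]: \quad \hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$

$$\text{Prediction interval for a new } Y \text{ at } x_0: \quad \hat{y}_0 \pm t_{1-\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$$

The extra $1$ under the square root adds the estimated noise variance $s^2$ of a single new observation. That term is why the prediction interval is always wider than the confidence interval and, unlike the confidence interval, does not shrink towards zero as $n$ grows.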

In [None]:
# Generate predictions with 95% confidence and prediction intervals
alpha = 0.05
predictions = results.get_prediction(df).summary_frame(alpha=alpha)
predictions.head()
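To make the `summary_frame` columns less of a black box, the sketch below reproduces `mean_ci_*` and `obs_ci_*` by hand from the textbook formulas for simple linear regression, using `scipy.stats.t` (SciPy is a statsmodels dependency) for the critical value. It regenerates the notebook's synthetic data so that it runs on its own; the variable names `manual`, `se_mean` and `se_obs` are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Regenerate the notebook's synthetic data and model
np.random.seed(42)
x = np.linspace(45, 160, 25)
y = np.interp(x, [45, 160], [1500, 4900]) + np.random.normal(0, 400, 25)
df = pd.DataFrame({'rental_price': y, 'living_area': x})
results = smf.ols('rental_price ~ living_area', data=df).fit()

alpha = 0.05
n = len(x)
y_hat = results.fittedvalues.to_numpy()        # fitted values at the observed x
s = np.sqrt(results.mse_resid)                 # residual standard error (df = n - 2)
s_xx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # two-sided Student-t critical value

# Standard errors of the mean response and of a single new observation
se_mean = s * np.sqrt(1 / n + (x - x.mean()) ** 2 / s_xx)
se_obs = s * np.sqrt(1 + 1 / n + (x - x.mean()) ** 2 / s_xx)

manual = pd.DataFrame({
    'mean_ci_lower': y_hat - t_crit * se_mean,
    'mean_ci_upper': y_hat + t_crit * se_mean,
    'obs_ci_lower': y_hat - t_crit * se_obs,
    'obs_ci_upper': y_hat + t_crit * se_obs,
})

# Compare with the intervals reported by statsmodels
predictions = results.get_prediction(df).summary_frame(alpha=alpha)
print(np.allclose(manual.to_numpy(), predictions[manual.columns].to_numpy()))
```

The same `get_prediction` call also works for living areas that are not in the data, for example `results.get_prediction(pd.DataFrame({'living_area': [100]})).summary_frame(alpha=0.05)` for a hypothetical 100 m² apartment.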

## Visualizing the Regression Line and Confidence Interval

We now plot the observed data together with the fitted regression line and the 95% confidence band for the mean rental price.

In [None]:
# Plot observed data
plt.scatter(df['living_area'], 
            df['rental_price'], 
            label='Observed', 
            marker='x', 
            color='black')

# Plot regression line
plt.plot(df['living_area'], 
         predictions['mean'], 
         label='Regression line')

# Plot confidence intervals
plt.fill_between(df['living_area'], 
                 predictions['mean_ci_lower'], 
                 predictions['mean_ci_upper'], 
                 color='blue', 
                 alpha=0.4, 
                 label='95% Confidence Interval')

# Add legend and labels
plt.xlabel('Living Area (m2)')
plt.ylabel('Rental Price (CHF)')
plt.legend()
plt.show()
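The confidence band in this plot is narrowest at the mean living area and widens towards both ends, because the $(x_0 - \bar{x})^2$ term in its standard error vanishes at $\bar{x}$. The sketch below checks this numerically by evaluating the band on a fine grid; it regenerates the synthetic data so it can run standalone, and the grid of 1,001 points is an arbitrary choice made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Regenerate the notebook's synthetic data and model
np.random.seed(42)
x = np.linspace(45, 160, 25)
y = np.interp(x, [45, 160], [1500, 4900]) + np.random.normal(0, 400, 25)
df = pd.DataFrame({'rental_price': y, 'living_area': x})
results = smf.ols('rental_price ~ living_area', data=df).fit()

# Evaluate the 95% confidence band on a fine grid of living areas
grid = pd.DataFrame({'living_area': np.linspace(45, 160, 1001)})
frame = results.get_prediction(grid).summary_frame(alpha=0.05)
width = (frame['mean_ci_upper'] - frame['mean_ci_lower']).to_numpy()

# Living area where the band is narrowest, compared with the sample mean
x_narrowest = grid['living_area'].to_numpy()[np.argmin(width)]
print(f"narrowest at {x_narrowest:.2f} m2, mean living area {x.mean():.2f} m2")
```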

## Visualizing the Regression Line and Prediction Interval

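We now plot the observed data together with the regression line and both the confidence and the prediction interval. The prediction band is much wider because, in addition to the uncertainty in the estimated line, it includes the scatter of individual rents around that line.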

In [None]:
# Plot observed data
plt.scatter(df['living_area'], 
            df['rental_price'], 
            label='Observed', 
            marker='x', 
            color='black')

# Plot regression line
plt.plot(df['living_area'], 
         predictions['mean'], 
         label='Regression line')

# Plot prediction interval
plt.fill_between(df['living_area'], 
                 predictions['obs_ci_lower'], 
                 predictions['obs_ci_upper'], 
                 alpha=.5, 
                 label='95% Prediction interval',
                 color='orange')

# Plot confidence interval
plt.fill_between(df['living_area'], 
                 predictions['mean_ci_lower'], 
                 predictions['mean_ci_upper'], 
                 alpha=.4, 
                 label='95% Confidence interval',
                 color='blue')

# Add labels
plt.xlabel('Living area (m2)')
plt.ylabel('Rental price (CHF)')
plt.legend()

# Show plot
plt.show()
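A 95% prediction interval is meant to contain about 95% of new observations from the same process. Because this notebook's data are synthetic, that claim can be checked by simulation: repeatedly draw a fresh training set from the known generating model, refit OLS, and test whether one additional rent at a new living area falls inside the interval. The sketch below does this under stated assumptions: the 1,000 repetitions, the hypothetical new apartment at 100 m², and the separate `np.random.default_rng(0)` generator are illustrative choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)  # separate generator, independent of the notebook's seed
x = np.linspace(45, 160, 25)
sigma = 400                     # noise standard deviation used to generate the data
x0 = 100.0                      # hypothetical new apartment of 100 m2
new_point = pd.DataFrame({'living_area': [x0]})


def true_mean(v):
    """Noise-free rental price of the synthetic data-generating process."""
    return np.interp(v, [45, 160], [1500, 4900])


n_sims = 1000
hits = 0
for _ in range(n_sims):
    # Draw a fresh training set from the same process and refit the model
    y_sim = true_mean(x) + rng.normal(0, sigma, x.size)
    sim_df = pd.DataFrame({'rental_price': y_sim, 'living_area': x})
    fit = smf.ols('rental_price ~ living_area', data=sim_df).fit()
    pi = fit.get_prediction(new_point).summary_frame(alpha=0.05)
    # Draw one new rent at x0 and check whether the 95% prediction interval covers it
    y_new = true_mean(x0) + rng.normal(0, sigma)
    hits += int(pi['obs_ci_lower'].iloc[0] <= y_new <= pi['obs_ci_upper'].iloc[0])

coverage = hits / n_sims
print(f"Empirical coverage of the 95% prediction interval: {coverage:.3f}")
```

With 1,000 repetitions the empirical coverage should land close to 0.95 (its Monte Carlo standard error is roughly 0.007). Running the same check with the `mean_ci_*` columns would give far lower coverage, since the confidence interval ignores the scatter of individual rents.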

### Jupyter Notebook Footer Info (please always include this at the end of each submitted notebook)

In [None]:
import os
import platform
from platform import python_version
from datetime import datetime

print('-----------------------------------')
print(os.name.upper())
print(platform.system(), '|', platform.release())
print('Datetime:', datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
print('Python Version:', python_version())
print('-----------------------------------')