## Step 1: Importing the necessary libraries


Begin by importing the required libraries for data manipulation, visualization, and linear regression. In this case, we'll use pandas, matplotlib, and scikit-learn.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

## Step 2: Loading the dataset

Next, load your dataset into a pandas DataFrame. Assuming you have a CSV file named "dataset.csv" with `height` and `weight` columns, you can load it as follows:

In [None]:
data = pd.read_csv('dataset.csv')
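
Before going further, it can help to confirm that the file loaded with the columns the later steps rely on and to look for missing values. The following is a minimal sketch, not part of the original workflow: the inline CSV text is made-up data standing in for "dataset.csv", and the names `sample`, `missing_columns`, `na_counts`, and `clean` are illustrative.

```python
import io

import pandas as pd

# Made-up rows standing in for dataset.csv (illustrative values only).
csv_text = """height,weight
150,52
160,58
170,
180,77
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Check that the columns used in the later steps are present.
expected_columns = {"height", "weight"}
missing_columns = expected_columns - set(sample.columns)

# Count missing values per column, then drop rows that lack either value.
na_counts = sample.isna().sum()
clean = sample.dropna(subset=["height", "weight"])

print("Missing columns:", missing_columns)
print(na_counts)
print("Rows kept:", len(clean))
```

Dropping incomplete rows is only one option; whether it suits your data depends on how many rows are affected.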

## Step 3: Exploratory Data Analysis (EDA)

Performing EDA allows you to understand your dataset and identify any patterns or outliers. You can use various pandas functions and visualization techniques to explore the data. Let's start with a quick overview of the dataset:

In [None]:
print(data.head())
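
Beyond `head()`, summary statistics and a correlation coefficient give a quick numeric picture of the data. Here is a small sketch using made-up values in place of the real dataset; the `height` and `weight` column names are assumed to match the ones used in the rest of this walkthrough.

```python
import pandas as pd

# Made-up values for illustration only.
sample = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 66, 77, 85],
})

# Count, mean, standard deviation, min/max, and quartiles for each column.
summary = sample.describe()

# Pearson correlation between the two columns (close to 1 means strongly linear).
corr = sample["height"].corr(sample["weight"])

print(summary)
print(f"Pearson correlation: {corr:.3f}")
```

A correlation near 1 or -1 suggests a straight line is a reasonable first model; a value near 0 suggests it is not.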

To see how the two variables relate, draw a scatter plot of height against weight:

In [None]:
plt.scatter(data['height'], data['weight'])
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Height vs. Weight')
plt.show()

## Step 4: Data Preprocessing

Before fitting a linear regression model, separate the dataset into an input (X) and an output (y) variable. In our case, height is the input and weight is the output. We'll also split the data into training and testing sets, holding out 20% of the rows so the model can be evaluated later on data it has not seen.

In [None]:
# scikit-learn expects a 2D feature array, so reshape height into a single column
X = data['height'].values.reshape(-1, 1)
y = data['weight'].values.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
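
To make the effect of the split concrete, the sketch below runs `train_test_split` on ten made-up rows and prints the resulting shapes. The arrays `heights` and `weights` are hypothetical stand-ins for the real columns, and the target is kept 1D here, which scikit-learn also accepts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up heights (cm) as a single-column 2D array, and made-up weights (kg).
heights = np.arange(150, 200, 5).reshape(-1, 1)   # shape (10, 1)
weights = 0.9 * heights.ravel() - 85              # shape (10,)

# test_size=0.2 holds out 20% of the rows; random_state makes the split repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    heights, weights, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)  # (8, 1) (2, 1)
print(y_tr.shape, y_te.shape)  # (8,) (2,)
```

With ten rows and `test_size=0.2`, two rows go to the test set and eight to the training set.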

## Step 5: Fitting the Linear Regression Model

Now it's time to fit a linear regression model to our training data. The scikit-learn library provides a convenient LinearRegression class for this purpose:

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)
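
Once fitted, the model stores the line it learned in its `coef_` (slope) and `intercept_` attributes. As a rough illustration, the sketch below fits made-up points that lie exactly on the line weight = 0.5 × height + 10 and reads the parameters back; `demo_model`, `heights`, and `weights` are illustrative names, not part of the walkthrough's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points lying exactly on weight = 0.5 * height + 10.
heights = np.array([[150.0], [160.0], [170.0], [180.0]])
weights = 0.5 * heights.ravel() + 10.0

demo_model = LinearRegression().fit(heights, weights)

# With a 1D target, coef_ holds one slope per feature and intercept_ is a scalar.
slope = demo_model.coef_[0]
intercept = demo_model.intercept_

print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
```

In the step above, `y` is a column (2D) array, so `model.coef_` has shape `(1, 1)` and `model.intercept_` has shape `(1,)`; index into them accordingly.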

## Step 6: Making Predictions

After training the model, we can use it to make predictions on new, unseen data. Let's make predictions on the test set and visualize them alongside the actual values:

In [None]:
y_pred = model.predict(X_test)

plt.scatter(X_test, y_test, color='b', label='Actual')
plt.plot(X_test, y_pred, color='r', label='Predicted')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Actual vs. Predicted')
plt.legend()
plt.show()
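
The same `predict` call works for a height that is not in the dataset at all, as long as the input is a 2D array with one column. A minimal sketch with made-up training points (chosen to lie on weight = 0.5 × height - 20) and a hypothetical new height of 175:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training points on the line weight = 0.5 * height - 20.
heights = np.array([[150.0], [160.0], [170.0], [180.0]])
weights = np.array([55.0, 60.0, 65.0, 70.0])

demo_model = LinearRegression().fit(heights, weights)

# predict expects shape (n_samples, n_features), hence the nested brackets.
new_heights = np.array([[175.0]])
predicted = demo_model.predict(new_heights)

print(f"Predicted weight for height 175: {predicted[0]:.1f}")
```

Passing a bare number or a 1D array to `predict` raises an error, which is why the single value is wrapped in two sets of brackets.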

## Step 7: Evaluating the Model

To assess the performance of our model, we can calculate metrics such as the mean squared error (MSE) and coefficient of determination (R²). Scikit-learn provides these metrics for regression tasks:


In [None]:
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse}')
print(f'R² Score: {r2}')
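
Because MSE is in squared units, its square root (RMSE) is often easier to read, since it is in the same units as weight. The sketch below computes RMSE and also reproduces R² by hand from its definition, 1 - SS_res / SS_tot, using made-up actual and predicted values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted weights for illustration.
actual = np.array([60.0, 70.0, 80.0, 90.0])
predicted = np.array([62.0, 68.0, 83.0, 87.0])

# RMSE is the square root of MSE, expressed in the target's own units.
mse_demo = mean_squared_error(actual, predicted)
rmse_demo = np.sqrt(mse_demo)

# R² by hand: one minus residual sum of squares over total sum of squares.
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE: {mse_demo:.3f}, RMSE: {rmse_demo:.3f}")
print(f"R² (manual): {r2_manual:.3f}, R² (sklearn): {r2_score(actual, predicted):.3f}")
```

An R² close to 1 means the line explains most of the variation in weight; a value near 0 means it does little better than predicting the mean.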