Regression Using TensorFlow

Predicting fuel efficiency (MPG) from car specifications using a deep neural network built with TensorFlow/Keras.

Problem

The Auto MPG dataset contains specifications for ~400 cars (cylinders, displacement, horsepower, weight, acceleration, model year, origin) and their fuel efficiency in miles per gallon. The goal: build a neural network that predicts MPG from these features.

Approach

Clean — impute 6 missing Horsepower values with the mean
Encode — convert Origin (1/2/3) to one-hot vectors (USA/Europe/Japan)
Scale — normalise features and labels to zero mean, unit variance with StandardScaler
Build — 3-layer sequential network: Dense(32, relu) → Dense(64, relu) → Dense(1)
Train — 100 epochs, RMSProp optimiser, MSE loss
Evaluate — loss on held-out test set, inverse-transform predictions back to original MPG scale

Results

The model converges cleanly: training and validation loss track closely, which suggests little overfitting. Predictions are inverse-transformed back to MPG for interpretation.
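The test loss is an MSE on standardised labels, so it is not directly in MPG units. Standardisation is affine, y_s = (y − μ) / σ, so every error shrinks by a factor of σ. The RMSE in MPG is therefore σ · √MSE_scaled. Below is a small check of that identity on made-up numbers. For brevity the scaler is fit on the same toy labels; in the real pipeline σ would come from the training labels.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy labels and predictions in original MPG units (illustrative only).
y_true = np.array([18.0, 27.0, 24.0, 18.0, 32.0])
y_pred = np.array([19.0, 25.5, 24.5, 17.0, 30.0])

scaler = StandardScaler().fit(y_true.reshape(-1, 1))
sigma = scaler.scale_[0]  # standard deviation used for scaling

# MSE as the model would report it: computed on standardised labels.
mse_scaled = np.mean(
    (scaler.transform(y_true.reshape(-1, 1)) - scaler.transform(y_pred.reshape(-1, 1))) ** 2
)

# Standardisation is affine, so RMSE in MPG = sigma * sqrt(MSE on scaled labels).
rmse_mpg_from_scaled = sigma * np.sqrt(mse_scaled)
rmse_mpg_direct = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse_mpg_from_scaled, rmse_mpg_direct)  # the two values agree
```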

Setting	Value
Training set	80% (~314 cars)
Test set	20% (~78 cars)
Architecture	3-layer DNN (32 → 64 → 1)
Optimiser	RMSProp
Loss	MSE
Epochs	100

Setup

Google Colab

Use the Colab badge at the top of the repository README to open the notebook; no setup required.

Local

git clone https://github.com/wsamuelw/regression-using-tensorflow.git
cd regression-using-tensorflow
pip install tensorflow pandas scikit-learn matplotlib notebook
jupyter notebook regression_using_tensorflow.ipynb

Data

Auto MPG Dataset — from the UCI Machine Learning Repository. ~400 cars with 7 features and 1 target.

Feature	Type	Description
Cylinders	int	Number of cylinders
Displacement	float	Engine displacement (cu. in.)
Horsepower	float	Engine power in hp (6 missing, mean-imputed)
Weight	float	Vehicle weight (lb)
Acceleration	float	Time to accelerate 0-60 mph (s)
Model Year	int	Year of manufacture (two-digit, 70–82)
Origin	categorical	1=USA, 2=Europe, 3=Japan (one-hot encoded)

Target: MPG (miles per gallon)
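Here is a minimal pandas sketch of the cleaning and encoding steps. It assumes the data is already loaded into a DataFrame whose column names match the table above; the notebook's real column names and loading code may differ. The four toy rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "Cylinders": [8, 4, 4, 6],
    "Displacement": [307.0, 97.0, 121.0, 250.0],
    "Horsepower": [130.0, np.nan, 110.0, 100.0],
    "Weight": [3504.0, 2130.0, 2600.0, 3329.0],
    "Acceleration": [12.0, 14.5, 12.8, 15.5],
    "Model Year": [70, 70, 77, 71],
    "Origin": [1, 3, 2, 1],
    "MPG": [18.0, 27.0, 24.0, 18.0],
})

# Clean: fill missing Horsepower with the column mean (mean() skips NaNs).
df["Horsepower"] = df["Horsepower"].fillna(df["Horsepower"].mean())

# Encode: map Origin codes to names, then replace the column with one-hot columns.
df["Origin"] = df["Origin"].map({1: "USA", 2: "Europe", 3: "Japan"})
df = pd.get_dummies(df, columns=["Origin"], prefix="", prefix_sep="", dtype=float)

# Split into the 9 input features and the MPG target.
features = df.drop(columns="MPG")
labels = df["MPG"]
print(features.columns.tolist())
```

The 6 numeric columns plus 3 one-hot columns give the 9 inputs shown in the Architecture section.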

Why Normalise?

Neural networks train faster and more reliably when inputs are on similar scales. Without normalisation, large-valued features like Weight (1,000–5,000) would dominate features like Acceleration (8–25) in the first layer's weighted sums, and gradient-based training would converge more slowly and less reliably.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # mean=0, std=1
X_test = scaler.transform(X_test)        # use training stats
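The snippet above scales only the features. The Approach also standardises the labels, so here is a sketch of the label side. It assumes `y_train` and `y_test` are 1-D arrays of MPG values (toy values here). `StandardScaler` expects 2-D input, which is why the labels are reshaped into a single column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy MPG labels (illustrative values only).
y_train = np.array([18.0, 27.0, 24.0, 18.0, 32.0])
y_test = np.array([22.0, 30.0])

# StandardScaler works on 2-D input, so reshape labels to one column.
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))  # fit on training labels only
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1))        # reuse training mean/std

# Later: map scaled values (e.g. model predictions) back to MPG.
y_test_back = y_scaler.inverse_transform(y_test_scaled).ravel()
print(y_test_back)  # recovers y_test up to floating-point error
```

A separate scaler for the labels keeps `inverse_transform` independent of the feature scaler.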

Architecture

Input (9 features)
  ↓
Dense(32, activation='relu')
  ↓
Dense(64, activation='relu')
  ↓
Dense(1)  ← regression output (no activation)

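No activation on the output layer: regression needs an unbounded, continuous output. A sigmoid would restrict predictions to [0, 1], and a ReLU would clip negatives. Clipping matters here because the standardised labels are negative for every car below the mean MPG.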
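A minimal Keras sketch of the architecture and training setup described in this README follows. Default RMSProp settings are an assumption. So is `validation_split=0.2`, since the README does not say how the validation data was chosen. Random arrays stand in for the scaled training data and only match its shape (~314 rows, 9 features).

```python
import numpy as np
import tensorflow as tf

# Mirror of the architecture diagram: 9 inputs -> 32 -> 64 -> 1.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(9,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # linear output, so negative (scaled) targets are reachable
])

# RMSProp and MSE as listed above; default optimiser settings are an assumption.
model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss="mse")

# Random stand-in data with the shape of the scaled training set (not real data).
rng = np.random.default_rng(0)
X = rng.standard_normal((314, 9)).astype("float32")
y = rng.standard_normal((314, 1)).astype("float32")

# 100 epochs as in the README; validation_split=0.2 is an assumption.
history = model.fit(X, y, epochs=100, validation_split=0.2, verbose=0)
print(model.count_params())  # 2497: (9*32+32) + (32*64+64) + (64*1+1)
```

`history.history["loss"]` and `history.history["val_loss"]` hold the per-epoch curves that would be compared when checking for overfitting.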

Key Decisions

Mean imputation for missing Horsepower — only 6 of ~400 rows are affected, so filling them with the mean barely changes the feature's distribution
One-hot encoding for Origin — categorical variable, ordinal encoding would imply USA < Europe < Japan
StandardScaler on labels — helps the network converge; predictions are inverse-transformed back to MPG
RMSProp optimiser — adaptive learning rate, works well out of the box for regression
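The one-hot decision can be made concrete with distances. Ordinal codes put Japan twice as far from USA as Europe is, while one-hot vectors keep every pair of origins equally far apart. The sketch below uses the same 1/2/3 codes as the raw data.

```python
import numpy as np

# Ordinal codes as stored in the raw data: USA=1, Europe=2, Japan=3.
ordinal = {"USA": 1.0, "Europe": 2.0, "Japan": 3.0}

# One-hot vectors: one column per origin.
one_hot = {
    "USA": np.array([1.0, 0.0, 0.0]),
    "Europe": np.array([0.0, 1.0, 0.0]),
    "Japan": np.array([0.0, 0.0, 1.0]),
}

pairs = [("USA", "Europe"), ("Europe", "Japan"), ("USA", "Japan")]
for a, b in pairs:
    d_ord = abs(ordinal[a] - ordinal[b])                 # implied by ordinal encoding
    d_hot = np.linalg.norm(one_hot[a] - one_hot[b])      # Euclidean distance between one-hot vectors
    print(f"{a}-{b}: ordinal {d_ord:.0f}, one-hot {d_hot:.3f}")
# USA-Europe: ordinal 1, one-hot 1.414
# Europe-Japan: ordinal 1, one-hot 1.414
# USA-Japan: ordinal 2, one-hot 1.414
```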
