# Training a Linear Regression Model to Predict GPU Performance 

Welcome to our programming project! This is where you'll apply the knowledge you gained in class to a real-world problem. While the class demos use scikit-learn for simplicity, these programming projects aim to expose you to PyTorch, an industry-standard machine learning library.


## Introduction

The GPU is a piece of computing hardware first developed in the early 1990s to accelerate graphical applications. Because of its highly parallel architecture, the GPU eventually found uses well beyond gaming. Today, the GPU market is a 20-billion-dollar market that serves video games, AI, cryptocurrency, scientific computing, and engineering.

In this project, we will train a linear regression model on a GPU dataset. Our goal is to build a web application that takes a GPU's **process node, die area, memory size,** and **transistor count (in millions)** and predicts its **compute performance** (in GFLOPS).
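Concretely, a linear regression model predicts GFLOPS as a weighted sum of the four inputs plus a bias. Written out below, the weights $w_1, \dots, w_4$ and the bias $b$ are learned from the data; the symbol names are our own, and the mean-squared-error objective is an assumption about how the model will be trained, not something stated above.

$$
\hat{y} = w_1 x_{\text{transistors}} + w_2 x_{\text{die}} + w_3 x_{\text{mem}} + w_4 x_{\text{fab}} + b = \mathbf{w}^\top \mathbf{x} + b
$$

$$
\mathcal{L}(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
$$

Here $\mathbf{x}$ is one row of the feature matrix and $y_i$ is the GPU's measured GFLOPS, so training amounts to finding the $\mathbf{w}$ and $b$ that make $\mathcal{L}$ as small as possible.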

## Importing the Necessary Libraries

This project trains a linear regression model on the **AMD Radeon and Nvidia GPU Specifications** dataset by JetBrains Datalore. It uses the following libraries:

- **PyTorch**: A powerful machine learning library
- **Pandas** and **NumPy**: For tabular data handling and numerical/linear algebra operations
- **Matplotlib** and **Bokeh**: For creating static and interactive visualizations

```python
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from bokeh.plotting import show, figure, output_notebook

# Display full cell contents (e.g. long GPU names) instead of truncating them
pd.set_option('display.max_colwidth', None)
```

## Loading the Dataset

The first step is to load and inspect the dataset. This is crucial in any machine learning workflow, because looking at the data tells you which features are worth including and which to exclude.

```python
# Load the GPU specifications and preview the first rows
gpu_dataset = pd.read_csv("datasets/gpus_v2.csv")
gpu_dataset.head(20)
```

| Unnamed: 0 | Manufacturer | Class | Name | Year | Fab | Transistors (mln) | Die size | Memory size | Memory speed | GFLOPS | TDP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Nvidia | Desktop | GeForce 8300 GS | 2007 | 80 | 210 | 127 | 512 | 6.4 | 14.4 | 40.0 |
| 1 | Nvidia | Desktop | GeForce 8400 GS | 2007 | 80 | 210 | 127 | 512 | 6.4 | 28.8 | 40.0 |
| 2 | Nvidia | Desktop | GeForce 8400 GS rev.2 | 2007 | 65 | 210 | 86 | 512 | 6.4 | 24.4 | 25.0 |
| 3 | Nvidia | Desktop | GeForce 8400 GS rev.3 | 2010 | 40 | 260 | 57 | 1024 | 9.6 | 19.7 | 25.0 |
| 4 | Nvidia | Desktop | GeForce 8500 GT | 2007 | 80 | 210 | 127 | 1024 | 12.8 | 28.8 | 45.0 |
| 5 | Nvidia | Desktop | GeForce 8600 GS | 2007 | 80 | 289 | 169 | 512 | 12.8 | 75.5 | 47.0 |
| 6 | Nvidia | Desktop | GeForce 8600 GT | 2007 | 80 | 289 | 169 | 1024 | 22.4 | 76.0 | 45.0 |
| 7 | Nvidia | Desktop | GeForce 8600 GTS | 2007 | 80 | 289 | 169 | 512 | 32.0 | 92.8 | 71.0 |
| 8 | Nvidia | Desktop | GeForce 8600 GTS | 2008 | 65 | 754 | 324 | 768 | 38.4 | 264.0 | 105.0 |
| 9 | Nvidia | Desktop | GeForce 8800 GTS | 2007 | 90 | 681 | 484 | 640 | 64.0 | 228.0 | 146.0 |
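One quick way to act on this inspection is to measure how strongly each candidate column tracks `GFLOPS`. The sketch below is a minimal illustration, not part of the original project: it types in the ten preview rows by hand (instead of reading `datasets/gpus_v2.csv`) and computes Pearson correlations with pandas' `DataFrame.corr`. The variable names `preview` and `correlations` are our own.

```python
import pandas as pd

# The ten preview rows shown above, typed in by hand (only the columns used here)
preview = pd.DataFrame({
    "Transistors (mln)": [210, 210, 210, 260, 210, 289, 289, 289, 754, 681],
    "Die size": [127, 127, 86, 57, 127, 169, 169, 169, 324, 484],
    "Memory size": [512, 512, 512, 1024, 1024, 512, 1024, 512, 768, 640],
    "Fab": [80, 80, 65, 40, 80, 80, 80, 80, 65, 90],
    "GFLOPS": [14.4, 28.8, 24.4, 19.7, 28.8, 75.5, 76.0, 92.8, 264.0, 228.0],
})

# Pearson correlation of every candidate feature with the GFLOPS label
correlations = preview.corr()["GFLOPS"].drop("GFLOPS")
print(correlations.sort_values(ascending=False))
```

On the full dataset, the same idea would be `gpu_dataset[["Transistors (mln)", "Die size", "Memory size", "Fab", "GFLOPS"]].corr()["GFLOPS"]`. With only ten rows the numbers are illustrative, and a weak correlation does not by itself rule a feature out, since linear regression weighs all features jointly.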


Here, we choose the columns `Transistors (mln)`, `Die size`, `Memory size`, and `Fab` (the process node) as our **training features** and `GFLOPS` as our **labels**.

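```python
# Select the four feature columns and the GFLOPS target as NumPy arrays.
# Double brackets keep `labels` 2-D with shape (n, 1) instead of 1-D (n,).
features = gpu_dataset[["Transistors (mln)", "Die size", "Memory size", "Fab"]].to_numpy()
labels = gpu_dataset[['GFLOPS']].to_numpy()

print(features.shape)
print(labels.shape)
```

```
(497, 4)
(497, 1)
```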
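Before any PyTorch training, it can help to have a reference answer to compare against. The sketch below swaps in a different technique from the project's PyTorch workflow: it solves ordinary least squares in closed form with NumPy's `np.linalg.lstsq`, using the ten preview rows typed in by hand (the names `X`, `y`, `X_b`, and `w` are our own). The labels are kept as an (n, 1) column, matching the shape that the double-bracket selection gives `labels`.

```python
import numpy as np

# Features (Transistors (mln), Die size, Memory size, Fab) for the ten preview rows
X = np.array([
    [210, 127, 512, 80],
    [210, 127, 512, 80],
    [210, 86, 512, 65],
    [260, 57, 1024, 40],
    [210, 127, 1024, 80],
    [289, 169, 512, 80],
    [289, 169, 1024, 80],
    [289, 169, 512, 80],
    [754, 324, 768, 65],
    [681, 484, 640, 90],
], dtype=np.float64)

# GFLOPS labels as an (n, 1) column
y = np.array([[14.4], [28.8], [24.4], [19.7], [28.8],
              [75.5], [76.0], [92.8], [264.0], [228.0]])

# Append a column of ones so the last weight acts as the bias term b
X_b = np.hstack([X, np.ones((X.shape[0], 1))])

# Ordinary least squares: finds w minimizing ||X_b @ w - y||^2
w, residuals, rank, _ = np.linalg.lstsq(X_b, y, rcond=None)

predictions = X_b @ w
print("weights:", w[:-1].ravel())
print("bias:", w[-1, 0])
print("prediction shape:", predictions.shape)
```

To run it on the full dataset, substitute `features` for `X` and `labels` for `y`. A PyTorch model trained with mean squared error on the same inputs should approach these values if training converges, which makes them a useful sanity check. When handing the arrays to PyTorch, `torch.from_numpy(features.astype(np.float32))` produces a float32 tensor, matching the default dtype of `torch.nn.Linear` parameters.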


## References

https://www.alliedmarketresearch.com/graphic-processing-unit-market