# Diamond Price Prediction
***


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Tuple
from numpy.typing import NDArray

## 1. Introduction
Diamond is the hardest known natural mineral, and its elegance as a gemstone has long been admired. The diamond industry relies on accurate, objective valuation because prices are determined by a combination of physical and qualitative attributes. Machine learning models can estimate diamond prices from these attributes, supporting jewellers, investors, and consumers in making informed decisions.

The objective of this project is to develop and compare multiple predictive models that estimate the price of a diamond from its features, and to provide actionable insights for jewellers, clients, and other stakeholders in the diamond industry.

## 2. Loading Data
The dataset was retrieved from [Kaggle - Diamonds](https://www.kaggle.com/datasets/shivam2503/diamonds) and contains the following columns:
- **price**: Price in US dollars ($326 - $18,823). Target variable.
- **carat**: Weight of the diamond (0.2 - 5.01)
- **cut**: Quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- **color**: Diamond colour, from J (worst) to D (best)
- **clarity**: Clarity grade, ordered I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best)
- **x**: Length in mm (0 - 10.74)
- **y**: Width in mm (0 - 58.9)
- **z**: Depth in mm (0 - 31.8)
- **depth**: Total depth percentage = 100 * z / mean(x, y) = 200 * z / (x + y) (43 - 79)
- **table**: Width of the top of the diamond relative to its widest point, as a percentage (43 - 95)
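
The quality columns above are ordinal, so one option is to map each grade to an integer rank before modelling. The following is a minimal sketch, not necessarily the encoding used later in this project. The names `CUT_ORDER`, `COLOR_ORDER`, `CLARITY_ORDER`, and `encode_ordinal` are hypothetical, and the colour scale is assumed to run through every letter between J and D.

```python
import pandas as pd

# Grade orders taken from the column descriptions, listed worst to best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
# Assumption: the colour scale covers every letter from J down to D.
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def encode_ordinal(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with cut, color and clarity replaced by integer ranks (0 = worst)."""
    out = frame.copy()
    orders = {"cut": CUT_ORDER, "color": COLOR_ORDER, "clarity": CLARITY_ORDER}
    for column, order in orders.items():
        # An ordered Categorical assigns code 0 to the first listed grade;
        # a grade missing from the list would receive code -1.
        out[column] = pd.Categorical(out[column], categories=order, ordered=True).codes
    return out


# Small hand-made sample for illustration only.
sample = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good"],
    "color": ["E", "E", "J"],
    "clarity": ["SI2", "SI1", "VS1"],
})
encoded = encode_ordinal(sample)
print(encoded)
```

Integer ranks preserve the worst-to-best ordering but assume equal spacing between grades. One-hot encoding is an alternative when that assumption is not wanted.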

In [None]:
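# Load the dataset; the first CSV column is an unnamed row index saved with the file
df = pd.read_csv("_datasets/diamonds.csv")
# Drop the redundant index column
df = df.drop(columns="Unnamed: 0")
df.head()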

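| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |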
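
The five rows shown above can be used to sanity-check the depth definition, depth = 200 * z / (x + y). The sketch below recomputes depth from the dimensions and also flags rows with a zero dimension. The listed minimum of 0 mm for x, y, and z is physically implausible and most likely marks a missing measurement. The helper names `depth_percentage` and `has_zero_dimension` are hypothetical.

```python
import pandas as pd

# First five rows of the dataset, copied from the preview above.
preview = pd.DataFrame({
    "depth": [61.5, 59.8, 56.9, 62.4, 63.3],
    "x": [3.95, 3.89, 4.05, 4.20, 4.34],
    "y": [3.98, 3.84, 4.07, 4.23, 4.35],
    "z": [2.43, 2.31, 2.31, 2.63, 2.75],
})


def depth_percentage(frame: pd.DataFrame) -> pd.Series:
    """Total depth percentage: 100 * z / mean(x, y) = 200 * z / (x + y)."""
    return 200 * frame["z"] / (frame["x"] + frame["y"])


def has_zero_dimension(frame: pd.DataFrame) -> pd.Series:
    """True for rows where any of x, y or z is exactly 0."""
    return (frame[["x", "y", "z"]] == 0).any(axis=1)


# Compare the recorded depth with the value recomputed from x, y and z.
preview["depth_check"] = depth_percentage(preview).round(1)
preview["abs_diff"] = (preview["depth"] - preview["depth_check"]).abs()
zero_rows = has_zero_dimension(preview)

print(preview)
print("Rows with a zero dimension:", int(zero_rows.sum()))
```

For these five rows the recomputed values agree with the recorded depth to within about 0.2 percentage points. Rows where x + y is 0 would produce a division by zero, so they are best flagged before the check is applied to the full dataset.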


## References

1. Omondi, Evans (2023). *Assessing predictive performance of supervised machine learning algorithms for a diamond pricing model* [Dataset]. Dryad. <br>
https://doi.org/10.5061/dryad.wh70rxwrh

2. Agrawal, Shivam (2017). *Diamonds* [Dataset]. Kaggle. <br>
https://www.kaggle.com/datasets/shivam2503/diamonds