# **<u>Cleaning and adapting the fuel price data:</u>**

In [28]:
import pandas as pd

## **<u>Loading the data using Pandas:</u>**

In [29]:
df = pd.read_csv('../external_data/fuel_price_index.csv', sep=';')

In [30]:
df.head()

      Date  CNR NGV fuel index
0  2020-01              105,24
1  2020-02               97,50
2  2020-03               92,06
3  2020-04               94,59
4  2020-05               87,77


## **<u>Checking for null values:</u>**

In [31]:
df.isna().sum().sort_values(ascending=False)

Date                  0
CNR NGV fuel index    0
dtype: int64

-> Neither column contains null values, so we can proceed to the next step.
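The visual check above can also be turned into a guard that fails loudly if a later version of the file contains gaps. This is only a minimal sketch: the `sample` frame and the `assert_no_nulls` helper are hypothetical names introduced for illustration, and the sample rows merely mirror the first rows shown earlier.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the fuel index data
sample = pd.DataFrame({
    "Date": ["2020-01", "2020-02", "2020-03"],
    "CNR NGV fuel index": ["105,24", "97,50", "92,06"],
})


def assert_no_nulls(frame: pd.DataFrame) -> None:
    """Raise a ValueError naming every column that contains missing values."""
    null_counts = frame.isna().sum()
    bad = null_counts[null_counts > 0]
    if not bad.empty:
        raise ValueError(f"Columns with missing values: {bad.to_dict()}")


# Passes silently here because the sample has no missing values
assert_no_nulls(sample)
```

Raising an error rather than printing the counts means a re-run of the notebook on updated data cannot silently continue with missing values.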

## **<u>Modifying the dataset's columns:</u>**

First, let's split the 'Date' column into 'year' and 'month' columns, then drop it:

In [32]:
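# 'Date' is formatted as 'YYYY-MM'; both parts are kept as strings
df['year'] = df['Date'].str.split('-').str[0]
df['month'] = df['Date'].str.split('-').str[1]

# The original column is now redundant
df = df.drop(['Date'], axis=1)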
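Splitting on '-' leaves 'year' and 'month' as strings. If integer columns are preferred, one possible alternative is to parse the dates with `pd.to_datetime` and read the parts from the `.dt` accessor. The sketch below assumes every value follows the 'YYYY-MM' pattern and uses a small hypothetical `sample` frame rather than the notebook's `df`.

```python
import pandas as pd

# Hypothetical sample with the same 'YYYY-MM' layout as the 'Date' column
sample = pd.DataFrame({"Date": ["2020-01", "2020-02", "2020-03"]})

# Parse once; each value becomes the first day of the given month
parsed = pd.to_datetime(sample["Date"], format="%Y-%m")

# The .dt accessor exposes the components as integers
sample["year"] = parsed.dt.year
sample["month"] = parsed.dt.month

sample = sample.drop(columns=["Date"])
```

With this variant a malformed date raises an error during parsing instead of producing an unexpected string fragment.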

In [33]:
df.head()

   CNR NGV fuel index  year  month
0              105,24  2020     01
1               97,50  2020     02
2               92,06  2020     03
3               94,59  2020     04
4               87,77  2020     05


Next, we convert the fuel index values from 'object' (strings that use a decimal comma) to 'float':

In [34]:
df['CNR NGV fuel index'] = df['CNR NGV fuel index'].str.replace(',', '.').astype(float)
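The same conversion can usually be done at load time, since `pd.read_csv` accepts a `decimal` argument for the decimal separator. The sketch below is an assumption-laden illustration: it reads a small in-memory sample (hypothetical rows in the same semicolon-separated layout) instead of the real file.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the semicolon-separated source file
raw = io.StringIO(
    "Date;CNR NGV fuel index\n"
    "2020-01;105,24\n"
    "2020-02;97,50\n"
)

# decimal=',' tells the parser to treat the comma as the decimal separator,
# so the column is read as float without a separate replace step
parsed = pd.read_csv(raw, sep=";", decimal=",")
print(parsed.dtypes)
```

This only works cleanly when the field separator is not itself a comma, which holds here because the file uses ';'.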

In [35]:
df.head()

   CNR NGV fuel index  year  month
0              105.24  2020     01
1               97.50  2020     02
2               92.06  2020     03
3               94.59  2020     04
4               87.77  2020     05


## **<u>Creating a csv file with the cleaned data:</u>**

In [36]:
df.to_csv("fuel_index_v1.csv", index=False)
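One thing to keep in mind when reading the file back: string columns such as 'year' and 'month' are written without type information, so `read_csv` will infer integers by default and a month stored as '01' comes back as 1. The sketch below demonstrates this with a hypothetical `cleaned` frame and an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

# Hypothetical frame shaped like the cleaned data (year/month as strings)
cleaned = pd.DataFrame({
    "CNR NGV fuel index": [105.24, 97.5],
    "year": ["2020", "2020"],
    "month": ["01", "02"],
})

# Write to an in-memory buffer, mirroring to_csv(..., index=False)
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)

# Default read: year and month are inferred as integers
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Explicit dtypes keep the zero-padded strings intact
buffer.seek(0)
string_read = pd.read_csv(buffer, dtype={"year": str, "month": str})
```

Whether integers or zero-padded strings are preferable depends on how the downstream code joins on these columns.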