# ANP-PROJECT - [EN]

---

# Vinicius Guerra e Ribas -  Energy Sector Analyst
[Energy Engineer (UnB)](https://www.unb.br/) │ [Data Scientist and Analytics (USP)](https://www5.usp.br/)


## [E-mail](mailto:viniciusgribas@gmail.com?Subject=%5BANP-PROJECT%5D%20-%20Contact)│ [Linkedin](https://www.linkedin.com/in/vinicius-guerra-e-ribas/) │[GitHub](https://github.com/viniciusgribas) 

---

# [Project Notebook](https://github.com/viniciusgribas/ANP-PROJECT/blob/main/Codigos_Python/Notebook_Master.ipynb)

---



# ANP PROJECT - [VBA] [PYTHON] - [GitHub](https://github.com/viniciusgribas/ANP-PROJECT)

## PART 1 - INTRODUCTION

The programming languages used were PYTHON and VBA EXCEL.


To make the workflow leading to the final product easier to follow, this notebook is divided into 4 parts.

> PART 1 - INTRODUCTION
 -  Containing a short summary of how the project was developed, the basics and bibliography.

> PART 2 - EXCEL VBA
 - Introducing the `VBA` macros developed in Excel to be called from `python`.

> PART 3 - PYTHON
 - Presenting the `python` code used for this project.

> PART 4 - CONCLUSION
 - Final considerations of the project.

### 1.1 AUXILIARY BIBLIOGRAPHY

 - https://www.automateexcel.com/vba-code-examples/
 - https://www.rondebruin.nl/index.htm
 - https://www.xlwings.org/
 - https://pandas.pydata.org/docs/
 - http://timgolden.me.uk/pywin32-docs/contents.html
 - https://github.com/wesm/pydata-book
 - https://github.com/fzumstein/python-for-excel
 - https://github.com/raizen-analytics/data-engineering-test

### 1.2 PROJECT FLOW

1) Initially, the data is only available in Excel `".XLS"` format, in the file ["vendas-combustiveis-m3.xls"](https://github.com/viniciusgribas/ANP-PROJECT/tree/main/assets).

2) Within this file, there are two target pivot tables:

    - Pivot Table 1 ) "Vendas, pelas distribuidoras, dos derivados combustíveis de petróleo por Unidade da Federação e produto - 2000-2020 (m3)"
    
    - Pivot Table 2 ) "Vendas, pelas distribuidoras, de óleo diesel por tipo e Unidade da Federação - 2013-2020 (m3)"

3) The source data behind these pivot tables is not readily accessible in another worksheet, nor can it be reached through the Excel menu PivotTable Tools > Analyze > Change Data Source. It therefore has to be extracted with Excel's own `VBA` programming language. Besides shortening what would otherwise be a long manual process, this approach allows the macros to be run from `python` through the `xlwings` library.

- The workbook, once opened, has by default only one sheet, called "Plan1".
- The macros created in VBA are available in the folder [`"\ANP-PROJECT\Codigos_VBA"`](https://github.com/viniciusgribas/ANP-PROJECT/tree/main/Codigos_VBA).
- To extract this data, 4 macros were created in VBA. These are presented and described in **PART 2 - EXCEL VBA**.

4) Once all the *VBA - MACROS* have been created, they can be called from `python` via the `xlwings` library.

5) After the macros are run from python, the extraction produces two `"CSV-UTF8"` files:

 - [`PlanConsolidada1.CSV`](https://github.com/viniciusgribas/ANP-PROJECT/tree/main/assets)

 - [`PlanConsolidada2.CSV`](https://github.com/viniciusgribas/ANP-PROJECT/tree/main/assets)

6) These files were then processed with the *[Pandas library](https://pandas.pydata.org/)* in python, as described in **PART 3 - PYTHON**.

7) Finally, the end product of this project is two `"CSV-UTF8"` files, available in the folder [`"\ANP-PROJECT\Planilhas Finais"`](https://github.com/viniciusgribas/ANP-PROJECT/tree/main/Planilhas%20Finais), with the following schema:

| Column     | Type      |
|------------|-----------|
| year_month | date      |
| uf         | string    |
| product    | string    |
| unit       | string    |
| volume     | double    |
| created_at | timestamp |

 - `Sales_Of_Diesel_By_UF_And_Type.CSV`

 - `Sales_Of_Oil_Derivative_Fuels_By_UF_And_Product.CSV`
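
As a rough illustration of the target layout, the sketch below builds a one-row frame with the column names from the table above and checks its dtypes. The helper name `matches_target_schema`, the `TARGET_SCHEMA` mapping and the sample values (including the `created_at` value) are assumptions made for this example. pandas has no separate `date` dtype, so both `year_month` and `created_at` are represented as `datetime64` here.

```python
import pandas as pd

# Expected columns, in order, mapped to the NumPy dtype "kind" used in this sketch:
# "M" = datetime64, "O" = object/string, "f" = float (double).
TARGET_SCHEMA = {
    "year_month": "M",
    "uf": "O",
    "product": "O",
    "unit": "O",
    "volume": "f",
    "created_at": "M",
}


def matches_target_schema(df: pd.DataFrame) -> bool:
    """Return True when df has exactly the target columns with compatible dtypes."""
    if list(df.columns) != list(TARGET_SCHEMA):
        return False
    return all(df[col].dtype.kind == kind for col, kind in TARGET_SCHEMA.items())


# One illustrative row; the created_at value is made up for the example.
sample = pd.DataFrame({
    "year_month": pd.to_datetime(["2000-01-01"]),
    "uf": ["RONDÔNIA"],
    "product": ["GASOLINA C (m3)"],
    "unit": ["m3"],
    "volume": [9563.263],
    "created_at": pd.to_datetime(["2021-01-01 12:00:00"]),
})

print(matches_target_schema(sample))
```

A check of this kind can be run just before exporting, so that a missing or mistyped column is caught before the CSV is written.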



## PART 2 - EXCEL VBA


### The first *MACRO* used, created in *VBA - EXCEL*. - [`Desagrupar_TabDinFixa_1()`](https://github.com/viniciusgribas/ANP-PROJECT/blob/main/Codigos_VBA/M%C3%B3dulo1.bas)

`Desagrupar_TabDinFixa_1()` drills down into the data of pivot table 1 by performing a cell-by-cell extraction (`ShowDetail`) over its year columns. Since each column of the pivot table corresponds to a year, an extra worksheet is generated for each year. After running this macro, the workbook therefore contains the initial worksheet plus 21 others, one for each year from 2000 to 2020, for a total of 22 worksheets.

---


`Sub Desagrupar_TabDinFixa_1()`

`For Each c In Worksheets("Plan1").Range("C54:W54")`

`    c.Select`

`    Selection.ShowDetail = True`

`    Sheets("Plan1").Select`

`Next c`
    

`End Sub`

*MACRO EXCEL VISUAL BASIC - VBA*
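
The count of 21 generated worksheets can be cross-checked against the range used in the macro. The short sketch below (the helper `column_index` is a name introduced here, not part of the project) converts the column letters of `Range("C54:W54")` to indices and compares the number of cells with the number of years from 2000 to 2020.

```python
def column_index(letters: str) -> int:
    """Convert an Excel column label such as "C" or "AA" to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


first_col, last_col = "C", "W"            # from Range("C54:W54") in the macro
n_columns = column_index(last_col) - column_index(first_col) + 1
n_years = len(range(2000, 2020 + 1))      # 2000..2020 inclusive

print(n_columns, n_years)  # prints: 21 21
```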

---

### The second *MACRO* used, created in *VBA - EXCEL*. - [`ConsolidacaoDeDados_1()`](https://github.com/viniciusgribas/ANP-PROJECT/blob/main/Codigos_VBA/M%C3%B3dulo2.bas)

`ConsolidacaoDeDados_1()` joins all the previously disaggregated worksheets into a single one, called `"PlanConsolidada1"`. In addition to stacking them, it adds a header row with the same column names as the worksheets generated from the pivot table. It then exports the consolidated worksheet to a `CSV UTF-8` file called `"PlanConsolidada1.CSV"` and deletes all the worksheets that were created, keeping only the original "Plan1".

---
`Sub ConsolidacaoDeDados_1()`

`    Dim sh As Worksheet`
    
`    Dim DestSh As Worksheet`
    
`    Dim Last As Long`
    
`    Dim shLast As Long`
    
`    Dim CopyRng As Range`
    
`    Dim StartRow As Long`
    
`    Dim Plan1 As String`
    
`    Dim PlanConsolidada As String`
    
`    Dim FPath As String`
    
    

`    With Application`

`        .ScreenUpdating = False`

`        .EnableEvents = False`

`    End With`
    
    
`    '#Define the name as "PlanConsolidada1" and the custom path to export to at the end of this macro'`
    
`    FPath = "C:\Users\vinic\Documents\GitHub\ANP\ANP-PROJECT\assets\PlanConsolidada1"`
    
  

`    '#Delete the sheet "PlanConsolidada1" if it exists`

`    Application.DisplayAlerts = False`

`    On Error Resume Next`

`    ActiveWorkbook.Worksheets("PlanConsolidada1").Delete`

`    On Error GoTo 0`

`    Application.DisplayAlerts = False`
    
    

`    'Add a worksheet with the name "PlanConsolidada1"`

`    Set DestSh = ActiveWorkbook.Worksheets.Add`

`    DestSh.Name = "PlanConsolidada1"`
     
`    'Add a title in the first row`
    
`    With Sheets("PlanConsolidada1")`

`    .Range("a1").Value = "COMBUSTÍVEL"`

`    .Range("b1").Value = "ANO"`

`    .Range("c1").Value = "REGIÃO"`

`    .Range("d1").Value = "ESTADO"`

`    .Range("e1").Value = "UNIDADE"`

`    .Range("f1").Value = "JAN"`

`    .Range("g1").Value = "FEV"`

`    .Range("h1").Value = "MAR"`

`    .Range("i1").Value = "ABR"`

`    .Range("j1").Value = "MAI"`

`    .Range("k1").Value = "JUN"`

`    .Range("l1").Value = "JUL"`

`    .Range("m1").Value = "AGO"`

`    .Range("n1").Value = "SET"`

`    .Range("o1").Value = "OUT"`

`    .Range("p1").Value = "NOV"`

`    .Range("q1").Value = "DEZ"`

`    .Range("r1").Value = "TOTAL"`


`End With`
    
    
`    'Fill in the start row (without header)`

`    StartRow = 2`

`    'loop through all worksheets and copy the data to the DestSh`

`    For Each sh In ActiveWorkbook.Worksheets`

`        'Loop through all worksheets except the PlanConsolidada1 worksheet and the Plan1 worksheet; you can add more sheets to the array if you want.`

`        If IsError(Application.Match(sh.Name,Array(DestSh.Name, "Plan1"), 0)) Then`

`            'Find the last row with data on the DestSh and sh`
            
`            Last = DestSh.Cells.Find(What:="*", After:=DestSh.Range("A1"), Lookat:=xlPart, LookIn:=xlFormulas,SearchOrder:=xlByRows, SearchDirection:=xlPrevious, MatchCase:=False).Row`

`            shLast = sh.Cells.Find(What:="*", After:=sh.Range("A1"), Lookat:=xlPart, LookIn:=xlFormulas,SearchOrder:=xlByRows, SearchDirection:=xlPrevious, MatchCase:=False).Row`
            

`            'If sh is not empty and if the last row >= StartRow copy the CopyRng`

`            If shLast > 0 And shLast >= StartRow Then`

`                'Set the range that you want to copy `

`                Set CopyRng = sh.Range(sh.Rows(StartRow), sh.Rows(shLast))`

`                'Test if there are enough rows in the DestSh to copy all the data`

`                If Last + CopyRng.Rows.Count > DestSh.Rows.Count Then`

`                    MsgBox "There are not enough rows in the Destsh"`

`                    GoTo ExitTheSub`

`                End If`

`                'This copies values/formats`

`                CopyRng.Copy`

`                With DestSh.Cells(Last + 1, "A")`

`                    .PasteSpecial xlPasteValues`

`                    .PasteSpecial xlPasteFormats`

`                    Application.CutCopyMode = False`

`                End With`

`            End If`

`        End If`

`    Next`

`'Export the new worksheet to a new file'`


`ThisWorkbook.Sheets("PlanConsolidada1").Copy`

`ActiveWorkbook.SaveAs FPath, FileFormat:=xlCSVUTF8`





`'Delete worksheets after merge data keeping just the Plan1'`

`For Each sh In ThisWorkbook.Worksheets`

`    If sh.Name <> "Plan1" Then`

`       sh.Delete`

`    End If`

`Application.DisplayAlerts = False`

`Next sh`







`'FINAL'`

`ExitTheSub:`

`   'Application.GoTo DestSh.Cells(1)`

`    'AutoFit the column width in the DestSh sheet DestSh.Columns.AutoFit `

`    'Restore the application settings that were switched off at the start`

`    With Application`

`        .ScreenUpdating = True`

`        .EnableEvents = True`

`    End With`

`End Sub`

*MACRO EXCEL VISUAL BASIC - VBA*
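
For readers more familiar with Python than VBA, the consolidation logic of the macro can be sketched in plain Python. This is only an illustration of the idea (skip `Plan1`, drop each sheet's own header row, stack the rest under a single header, write CSV). The function names, the sheet names `Sheet2000` and `Sheet2001` and the shortened header are assumptions for the example; the project itself performs this step in VBA.

```python
import csv
import io

# Shortened header for the example; the macro writes 18 column titles.
HEADER = ["COMBUSTÍVEL", "ANO", "REGIÃO", "ESTADO", "UNIDADE"]


def consolidate(sheets, header, skip=("Plan1",)):
    """Stack the data rows of every sheet (own header dropped) under one header."""
    rows = [list(header)]
    for name, sheet_rows in sheets.items():
        if name in skip:
            continue
        # StartRow = 2 in the macro: the first row of each sheet is its header.
        rows.extend(sheet_rows[1:])
    return rows


def to_csv_text(rows):
    """Serialise the consolidated rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical workbook content: one pivot sheet plus two drill-down sheets.
sheets = {
    "Plan1": [["pivot table lives here"]],
    "Sheet2000": [HEADER, ["GASOLINA C (m3)", 2000, "REGIÃO NORTE", "ACRE", "m3"]],
    "Sheet2001": [HEADER, ["GASOLINA C (m3)", 2001, "REGIÃO NORTE", "ACRE", "m3"]],
}

consolidated = consolidate(sheets, HEADER)
print(to_csv_text(consolidated))
```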

---

### The third *MACRO* used, created in *VBA - EXCEL*. - [`Desagrupar_TabDinFixa_2()`](https://github.com/viniciusgribas/ANP-PROJECT/blob/main/Codigos_VBA/M%C3%B3dulo1.bas)

`Desagrupar_TabDinFixa_2()` works analogously to `Desagrupar_TabDinFixa_1()`. However, since it is applied to the second target pivot table, which covers fewer years (2013 to 2020), it produces the initial worksheet plus 8 worksheets, for a total of 9 worksheets in the workbook.

In practice, the only difference from `Desagrupar_TabDinFixa_1()` is the argument of `.Range()`, which changes according to the pivot table chosen as target.


### The fourth MACRO used, created in *VBA - EXCEL.* - [`ConsolidacaoDeDados_2()`](https://github.com/viniciusgribas/ANP-PROJECT/blob/main/Codigos_VBA/M%C3%B3dulo2.bas)

`ConsolidacaoDeDados_2()` works analogously to `ConsolidacaoDeDados_1()`, with very little change in its code: essentially, the name of the consolidated sheet and of the exported file is `"PlanConsolidada2"` instead of `"PlanConsolidada1"`.



## PART 3 - PYTHON

In [1]:
# Importing all the libraries that will be used
import pandas as pd
import xlwings as xw
import openpyxl as p
import numpy as np
import win32com.client as win32
import os

In [2]:

# Defining the path where the files are and will be saved
patch = "C:\\Users\\vinic\\Documents\\GitHub\\ANP\\ANP-PROJECT\\assets"

# Recognizing the files inside the folder
for dir, sub, files in os.walk(patch):
    for file in files:
        print(os.path.join(dir, file))

# Recognizing the name of the worksheet to work with
wk_name = "vendas-combustiveis-m3.xls"

# Defining the path where the worksheet to be worked on is located
wk_patch = patch + "\\" + wk_name

print("workbook patch: ", wk_patch)

C:\Users\vinic\Documents\GitHub\ANP\ANP-PROJECT\assets\PlanConsolidada1.CSV
C:\Users\vinic\Documents\GitHub\ANP\ANP-PROJECT\assets\PlanConsolidada2.CSV
C:\Users\vinic\Documents\GitHub\ANP\ANP-PROJECT\assets\vendas-combustiveis-m3.xls
workbook patch:  C:\Users\vinic\Documents\GitHub\ANP\ANP-PROJECT\assets\vendas-combustiveis-m3.xls
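
As an alternative to concatenating strings with `"\\"`, the same path can be assembled with `pathlib`. The sketch below uses `PureWindowsPath` so that it behaves the same on any operating system; on the Windows machine where Excel runs, `pathlib.Path` would be the usual choice. The variable names `assets_dir` and `workbook_path` are introduced here for illustration.

```python
from pathlib import PureWindowsPath

# Same folder and file name as in the cell above.
assets_dir = PureWindowsPath(r"C:\Users\vinic\Documents\GitHub\ANP\ANP-PROJECT\assets")
workbook_path = assets_dir / "vendas-combustiveis-m3.xls"

print(workbook_path)
print(workbook_path.suffix)
```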


In [3]:

# Applying the address where the final files will be saved
patch_final = "C:\\Users\\vinic\\Documents\\GitHub\\ANP\\ANP-PROJECT\\Planilhas Finais\\"

# Determining a name for the final files
arquivo_final_1 = 'Sales_Of_Oil_Derivative_Fuels_By_UF_And_Product.csv'

arquivo_final_2 = 'Sales_Of_Diesel_By_UF_And_Type.csv'

In [4]:
# Shortcut to open excel application via win32
excel = win32.gencache.EnsureDispatch('Excel.Application')

excel.Workbooks.Open(wk_patch)

# Assigning active workbook to python via xlwings library. The excel application needs to be open to work.
wk = xw.books.open(wk_patch)

# Calling macros that already exist in excel and assigning them to new variables in python, via xlwings.

# Macro that disaggregates the data from pivot table 1 ("Sales of oil derivative fuels by UF and product")
desagrupar_TabDinFixa_1 = wk.app.macro("Desagrupar_TabDinFixa_1")

# Macro that disaggregates the data from pivot table 2 ("Sales of diesel by UF and type")
desagrupar_TabDinFixa_2 = wk.app.macro("Desagrupar_TabDinFixa_2")

# Macro that joins the disaggregated data from pivot table 1 into a single sheet exported as CSV UTF-8, named "PlanConsolidada1"
consolidacaoDeDados_1 = wk.app.macro("ConsolidacaoDeDados_1")

# Macro that joins the disaggregated data from pivot table 2 into a single sheet exported as CSV UTF-8, named "PlanConsolidada2"
consolidacaoDeDados_2 = wk.app.macro("ConsolidacaoDeDados_2")

# Close the excel application, to prevent bugs
excel.Application.Quit()

In [5]:
# Function that runs the extraction flow for DataFrame 1 ("Sales of oil derivative fuels by UF and product") and exports it as CSV.
def gerar_df_1():

    excel.Workbooks.Open(wk_patch)
    desagrupar_TabDinFixa_1()
    consolidacaoDeDados_1()
    excel.Application.Quit()
    
# Function that runs the extraction flow for DataFrame 2 ("Sales of diesel by UF and type") and exports it as CSV.
def gerar_df_2():
    excel.Workbooks.Open(wk_patch)
    desagrupar_TabDinFixa_2()
    consolidacaoDeDados_2()
    excel.Application.Quit()


In [6]:
# Execution flow for creating the DataFrames to be used

# 1 - Recognize the macros that are in VBA in python

# 2 - Generate the DF 1 - Petroleum products sales by UF and products
gerar_df_1()

# 3 - Generate the DF 2 - Diesel oil sales by UF and type
gerar_df_2()
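
The open, run and quit pattern used above can also be written so that Excel is always closed, even if a macro fails. The following is a minimal sketch under that assumption, not the project's own implementation: `run_macros` and `extract_with_excel` are hypothetical helper names, while `xw.App`, `app.books.open`, `Book.macro` and `app.quit` are the xlwings calls they rely on.

```python
from typing import Iterable, List


def run_macros(book, macro_names: Iterable[str]) -> List[str]:
    """Call each named VBA macro on an already opened workbook, in order."""
    executed = []
    for name in macro_names:
        book.macro(name)()        # Book.macro(name) returns a callable
        executed.append(name)
    return executed


def extract_with_excel(workbook_path: str, macro_names: Iterable[str]) -> List[str]:
    """Open the workbook in a hidden Excel instance, run the macros, then quit Excel."""
    import xlwings as xw          # imported here so run_macros works without Excel installed

    app = xw.App(visible=False)
    try:
        book = app.books.open(workbook_path)
        return run_macros(book, macro_names)
    finally:
        app.quit()                # runs even when a macro raises
```

With these helpers, the first flow would read `extract_with_excel(wk_patch, ["Desagrupar_TabDinFixa_1", "ConsolidacaoDeDados_1"])`, which requires Windows or macOS with Excel installed.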

PART 3 - Importing the created files and putting them in the format:

| Column     | Type      |
|------------|-----------|
| year_month | date      |
| uf         | string    |
| product    | string    |
| unit       | string    |
| volume     | double    |
| created_at | timestamp |

In [7]:
# Referencing the files generated by the flow above, analogous to what was done for the workbook
df1_name = "PlanConsolidada1.CSV"
df2_name = "PlanConsolidada2.CSV"
# Built directly from `patch`, so that wk_patch keeps pointing at the workbook
df1_path = patch + "\\" + df1_name
df2_path = patch + "\\" + df2_name

df1 = pd.read_csv(df1_path)
df2 = pd.read_csv(df2_path)

# A simple function, to generate a line in print
def linha(x):
    print('\n')
    print('='*x)
    print('\n')

In [8]:
# Working on copies (".copy()") leaves the original DataFrames df1 and df2 untouched

petroleo_df = df1.copy()
diesel_df = df2.copy()

In [9]:
# Checking for null values

print("DATA FRAME 1")

print(petroleo_df.isna().sum())

linha(30)

print("DATA FRAME 2")

print(diesel_df.isna().sum())

DATA FRAME 1
COMBUSTÍVEL      0
ANO              0
REGIÃO           0
ESTADO           0
UNIDADE          0
JAN              0
FEV              0
MAR              0
ABR              0
MAI              0
JUN              0
JUL              0
AGO              0
SET              0
OUT            216
NOV            216
DEZ            216
TOTAL            0
dtype: int64




DATA FRAME 2
COMBUSTÍVEL      0
ANO              0
REGIÃO           0
ESTADO           0
UNIDADE          0
JAN              0
FEV              0
MAR              0
ABR              0
MAI              0
JUN              0
JUL              0
AGO              0
SET              0
OUT            135
NOV            135
DEZ            135
TOTAL            0
dtype: int64


In [10]:
# Replacing NaN values with 0
petroleo_df = petroleo_df.fillna(0)
diesel_df = diesel_df.fillna(0)


# Checking if the values have been replaced
print("DATA FRAME 1")

print(petroleo_df.isna().sum())
linha(30)
print("DATA FRAME 2")

print(diesel_df.isna().sum())

DATA FRAME 1
COMBUSTÍVEL    0
ANO            0
REGIÃO         0
ESTADO         0
UNIDADE        0
JAN            0
FEV            0
MAR            0
ABR            0
MAI            0
JUN            0
JUL            0
AGO            0
SET            0
OUT            0
NOV            0
DEZ            0
TOTAL          0
dtype: int64




DATA FRAME 2
COMBUSTÍVEL    0
ANO            0
REGIÃO         0
ESTADO         0
UNIDADE        0
JAN            0
FEV            0
MAR            0
ABR            0
MAI            0
JUN            0
JUL            0
AGO            0
SET            0
OUT            0
NOV            0
DEZ            0
TOTAL          0
dtype: int64
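
The missing values appear only in the `OUT`, `NOV` and `DEZ` columns, which is consistent with (though not proof of) the last months of the most recent year not yet being published. Before filling with 0, it can be worth confirming which years the gaps belong to, since `fillna(0)` records those months as zero sales. The toy frame below is made up for illustration and uses only two month columns.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the consolidated data: NaN only in OUT for 2020.
toy = pd.DataFrame({
    "ANO": [2019, 2019, 2020, 2020],
    "SET": [10.0, 20.0, 30.0, 40.0],
    "OUT": [11.0, 21.0, np.nan, np.nan],
})

# Years in which at least one value is missing.
rows_with_gaps = toy.isna().any(axis=1)
years_with_gaps = sorted(toy.loc[rows_with_gaps, "ANO"].unique().tolist())
print(years_with_gaps)

# Same replacement as in the cell above.
filled = toy.fillna(0)
print(int(filled.isna().sum().sum()))
```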


In [11]:
# taking a look at the dataframe
petroleo_df.head()

Unnamed: 0,COMBUSTÍVEL,ANO,REGIÃO,ESTADO,UNIDADE,JAN,FEV,MAR,ABR,MAI,JUN,JUL,AGO,SET,OUT,NOV,DEZ,TOTAL
0,GASOLINA C (m3),2000,REGIÃO NORTE,RONDÔNIA,m3,9563.263,11341.229,9369.746,10719.983,11165.968,12312.451,11220.97,12482.281,13591.122,11940.57,11547.576,10818.094,136073.253
1,GASOLINA C (m3),2000,REGIÃO NORTE,ACRE,m3,3065.758,3495.29,2946.93,3023.92,3206.93,3612.58,3264.46,3835.74,3676.571,3225.61,3289.718,3358.346,40001.853
2,GASOLINA C (m3),2000,REGIÃO NORTE,AMAZONAS,m3,17615.604,20258.2,18741.344,19604.023,20221.674,20792.616,19912.898,21869.338,21145.643,20633.175,20766.918,21180.919,242742.352
3,GASOLINA C (m3),2000,REGIÃO NORTE,RORAIMA,m3,3259.3,3636.216,3631.569,3348.416,3394.016,4078.616,3346.616,4029.9,4358.516,3716.032,3200.4,3339.332,43338.929
4,GASOLINA C (m3),2000,REGIÃO NORTE,PARÁ,m3,28830.479,32297.047,27310.979,29396.384,26511.009,36553.25,31807.84,31009.972,29755.907,28661.951,28145.784,29294.796,359575.398


In [12]:
diesel_df.head()

Unnamed: 0,COMBUSTÍVEL,ANO,REGIÃO,ESTADO,UNIDADE,JAN,FEV,MAR,ABR,MAI,JUN,JUL,AGO,SET,OUT,NOV,DEZ,TOTAL
0,ÓLEO DIESEL S-10 (m3),2013,REGIÃO NORTE,RONDÔNIA,m3,3517.6,3681.7,4700.67,5339.2,6166.4,6539.65,7283.7,8082.85,7902.55,9383.15,9767.4,9088.8,81453.67
1,ÓLEO DIESEL S-10 (m3),2013,REGIÃO NORTE,ACRE,m3,363.0,410.0,536.0,607.0,740.0,756.0,971.0,1174.0,1240.0,1439.0,1483.0,1483.0,11202.0
2,ÓLEO DIESEL S-10 (m3),2013,REGIÃO NORTE,AMAZONAS,m3,3190.585,3305.0,3391.0,3637.0,4250.0,4576.0,5756.879,6228.636,6334.0,7154.2,6836.3,6784.232,61443.832
3,ÓLEO DIESEL S-10 (m3),2013,REGIÃO NORTE,RORAIMA,m3,795.4,757.2,939.8,1040.6,966.0,992.9,1027.0,1083.8,1311.2,1475.3,1502.7,1531.8,13423.7
4,ÓLEO DIESEL S-10 (m3),2013,REGIÃO NORTE,PARÁ,m3,30137.8,28146.3,31280.5,33033.05,33519.88,34321.53,37168.16,41248.336,40913.48,45383.5,44013.219,41975.03,441140.785


In [14]:
# Summing each month column and the TOTAL column (positions 5 to 17), then converting the sums into a list
petroleo_totals_lista = list(petroleo_df.iloc[0:,5:18].sum())


petroleo_total = 0

linha(30)
print("DATA FRAME 1")

# A test to see if the values of all months summed together, correspond to the totals
for i in range(12):
    petroleo_total = petroleo_totals_lista[i] + petroleo_total
print(" The sum of the values gives us:", petroleo_total)
print(" The sum of the totals gives us: ", petroleo_totals_lista[12])

diferenca_petroleo_total_e_valores = petroleo_total - petroleo_totals_lista[12]

print(" The difference between both is: ", diferenca_petroleo_total_e_valores)

# Here the same logic is applied as above 

diesel_totals_lista = list(diesel_df.iloc[0:,5:18].sum())
diesel_total = 0
linha(30)
print('DATA FRAME 2')

for i in range(12):
    diesel_total = diesel_totals_lista[i] + diesel_total

print(" The sum of the values gives us:", diesel_total)
print(" The sum of the totals gives us: ", diesel_totals_lista[12])

diferenca_diesel_total_e_valores = diesel_total - diesel_totals_lista[12]


print(" The difference between both is: ", diferenca_diesel_total_e_valores)








DATA FRAME 1
 The sum of the values gives us: 2369227066.362818
 The sum of the totals gives us:  2369226660.951311
 The difference between both is:  405.41150665283203




DATA FRAME 2
 The sum of the values gives us: 440145528.3683949
 The sum of the totals gives us:  440145528.36791205
 The difference between both is:  0.00048285722732543945


In [15]:
# Relocating the column "ANO"
cols1 = petroleo_df.columns.tolist()
cols1 = ['COMBUSTÍVEL',
 'REGIÃO',
 'ESTADO',
 'UNIDADE',
 'ANO',
 'JAN',
 'FEV',
 'MAR',
 'ABR',
 'MAI',
 'JUN',
 'JUL',
 'AGO',
 'SET',
 'OUT',
 'NOV',
 'DEZ',
 'TOTAL']

petroleo_df = petroleo_df[cols1]

petroleo_df.head(5)

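# This exploratory analysis reproduces the pivot table as it appeared in Excel

# numeric_only=True keeps the text columns (COMBUSTÍVEL, REGIÃO, ...) out of the sum
tab_din_1 = petroleo_df.groupby("ANO").sum(numeric_only=True)
tab_din_1 = tab_din_1.T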

round(tab_din_1,0)

ANO,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
JAN,6995110.0,7180755.0,7196583.0,6688222.0,6770295.0,6709517.0,7050202.0,7470613.0,8256556.0,8215630.0,...,9110855.0,9708937.0,10931208.0,11429169.0,12047932.0,10514005.0,10428955.0,10803217.0,11230328.0,11338520.0
FEV,7416433.0,6548590.0,6655637.0,6294698.0,6362191.0,6597881.0,6733218.0,7066233.0,8048320.0,7840033.0,...,9262542.0,9877027.0,10117862.0,11230845.0,10490254.0,10792682.0,10121262.0,10220520.0,10802186.0,11090441.0
MAR,7350795.0,7655263.0,7480153.0,6414785.0,7587215.0,7604798.0,7713760.0,8306224.0,8604527.0,8870255.0,...,10148732.0,10929436.0,11053520.0,11521485.0,12095848.0,11601156.0,11853141.0,11724175.0,11243410.0,10645620.0
ABR,7282573.0,7228914.0,7314485.0,6637099.0,7362965.0,7243180.0,7057554.0,7601038.0,8706378.0,8897817.0,...,9656897.0,10186467.0,11377921.0,11874577.0,11767280.0,11199689.0,10570124.0,11109721.0,11477191.0,8872672.0
MAI,7517882.0,7611896.0,7475773.0,7064183.0,7065017.0,7300532.0,7575075.0,8026656.0,8760642.0,8639589.0,...,10158050.0,10775104.0,11503972.0,12132305.0,11449771.0,11195715.0,11374611.0,9901079.0,11650922.0,9497880.0
JUN,7788443.0,7657814.0,7072255.0,6636083.0,7227510.0,7475039.0,7432494.0,8059006.0,8798586.0,8990697.0,...,10184503.0,10506639.0,11035208.0,11283405.0,11867076.0,11238239.0,11399228.0,11645914.0,11073020.0,10181302.0
JUL,7434315.0,7630442.0,7537209.0,7351964.0,7738422.0,7492139.0,7539417.0,8209920.0,9168209.0,9506494.0,...,10360738.0,10882148.0,11691185.0,12215763.0,12183784.0,11414542.0,11617861.0,11530423.0,12283090.0,11276882.0
AGO,7840944.0,8003568.0,7541355.0,7131433.0,7757805.0,8042449.0,8036491.0,8752805.0,9051564.0,9242321.0,...,10979441.0,11592307.0,12082456.0,12538269.0,12046884.0,11851856.0,12116749.0,12373865.0,12326808.0,11208269.0
SET,7522385.0,7500609.0,7523606.0,7308319.0,7739156.0,7720372.0,7892382.0,8127878.0,9368012.0,9481387.0,...,10692546.0,10717877.0,11451925.0,12592408.0,11985710.0,11644013.0,11732360.0,11411721.0,11664500.0,11554429.0
OUT,7686804.0,7952203.0,8243983.0,7718482.0,7595617.0,7386628.0,7980418.0,9020457.0,9550067.0,10055111.0,...,10519427.0,11858856.0,12466574.0,13289592.0,12526918.0,11425058.0,11939680.0,12073887.0,12695469.0,0.0
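
The year-by-month table above comes from a group-by on `ANO` followed by a transpose. A small self-contained version of that step, on made-up numbers and with only two month columns, is sketched below. Passing `numeric_only=True` is a choice made in this sketch so that text columns are excluded from the sums rather than concatenated.

```python
import pandas as pd

# Made-up miniature of the consolidated data.
toy = pd.DataFrame({
    "COMBUSTÍVEL": ["GASOLINA C (m3)"] * 4,
    "ANO": [2000, 2000, 2001, 2001],
    "JAN": [1.0, 2.0, 3.0, 4.0],
    "FEV": [10.0, 20.0, 30.0, 40.0],
})

# Sum the numeric columns per year, then transpose so months become rows
# and years become columns, as in the Excel pivot table.
by_year = toy.groupby("ANO").sum(numeric_only=True).T
print(by_year)
```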


In [16]:
total_dif1 = 0

# Checking: 1) the observed differences by year; 2) the sum of all differences.

print("DATA FRAME 1")
print("\n")
for i in range(21):
    diferenca1 = (tab_din_1.iloc[0:12,i].sum() - tab_din_1.iloc[12,i].sum())
    total_dif1 = total_dif1 + diferenca1
    print("Difference verified for the year ending in",i, " = ",diferenca1)

linha(30)
diferenca1 = (tab_din_1.iloc[0:12,20].sum() - tab_din_1.iloc[12,20].sum())
print("THE MAIN DIFFERENCE WAS IN 2020 = " , diferenca1)
print("THE TOTAL DIFFERENCE WAS = " , total_dif1)

# These checks show that the difference is negligible; it is nevertheless kept for reference.

DATA FRAME 1


Difference verified for the year ending in 0  =  -0.0014043450355529785
Difference verified for the year ending in 1  =  -0.00012385845184326172
Difference verified for the year ending in 2  =  -1.2978911399841309e-05
Difference verified for the year ending in 3  =  -0.00022134184837341309
Difference verified for the year ending in 4  =  -0.00024077296257019043
Difference verified for the year ending in 5  =  0.0005107522010803223
Difference verified for the year ending in 6  =  -0.0004882961511611938
Difference verified for the year ending in 7  =  0.0018863528966903687
Difference verified for the year ending in 8  =  0.004105135798454285
Difference verified for the year ending in 9  =  0.006928756833076477
Difference verified for the year ending in 10  =  0.002915412187576294
Difference verified for the year ending in 11  =  0.0007937699556350708
Difference verified for the year ending in 12  =  0.0020993202924728394
Difference verified for the year ending in 13  =  0.

In [17]:
# The same analyses performed for Data Frame 1 are repeated for Data Frame 2

diesel_df = diesel_df[cols1]
diesel_df.head(5)

tab_din_2 = diesel_df.groupby("ANO").sum(numeric_only=True)
tab_din_2 = tab_din_2.T

round(tab_din_2,0)

ANO,2013,2014,2015,2016,2017,2018,2019,2020
JAN,4456693.0,4566321.0,4732999.0,3942870.0,3959167.0,4135742.0,4391503.0,4432971.0
FEV,4276021.0,4679585.0,4071621.0,4284567.0,4034946.0,4120482.0,4375219.0,4514232.0
MAR,4696752.0,4815103.0,5013802.0,4751359.0,4852097.0,4825773.0,4554753.0,4710564.0
ABR,4943159.0,4885146.0,4738923.0,4572944.0,4146624.0,4618470.0,4653654.0,4004817.0
MAI,4928346.0,5131919.0,4636557.0,4499733.0,4614687.0,3772603.0,4796718.0,4360350.0
JUN,4708673.0,4707725.0,4863309.0,4616496.0,4677454.0,5011752.0,4653211.0,4696043.0
JUL,5119508.0,5186601.0,4963402.0,4697057.0,4821464.0,4982153.0,5187032.0,5231199.0
AGO,5369365.0,5350987.0,5017610.0,4903385.0,5001582.0,5197650.0,5284081.0,5164439.0
SET,5029823.0,5355678.0,4932081.0,4775598.0,4856584.0,4759711.0,4891111.0,5237176.0
OUT,5483350.0,5732737.0,5181460.0,4631472.0,4915778.0,5058821.0,5415773.0,0.0


In [18]:
# The same analyses performed for Data Frame 1 are repeated for Data Frame 2

total_dif2 = 0

print("DATA FRAME 2")
print("\n")

for i in range(8):
    diferenca2 = (tab_din_2.iloc[0:12,i].sum() - tab_din_2.iloc[12,i].sum())
    total_dif2 = total_dif2 + diferenca2
    print("Difference verified for the year ending in",i+13, " = ",diferenca2)

linha(30)

diferenca2 = (tab_din_2.iloc[0:12,4].sum() - tab_din_2.iloc[12,4].sum())

print("THE MAIN DIFFERENCE WAS IN 2017 = " , diferenca2)
print("THE TOTAL DIFFERENCE WAS = " , total_dif2)

# These checks show that the difference is negligible; it is nevertheless kept for reference.


DATA FRAME 2


Difference verified for the year ending in 13  =  7.450580596923828e-09
Difference verified for the year ending in 14  =  -1.4901161193847656e-08
Difference verified for the year ending in 15  =  -1.4901161193847656e-08
Difference verified for the year ending in 16  =  7.450580596923828e-09
Difference verified for the year ending in 17  =  0.00048291683197021484
Difference verified for the year ending in 18  =  1.4901161193847656e-08
Difference verified for the year ending in 19  =  2.2351741790771484e-08
Difference verified for the year ending in 20  =  0.0




THE MAIN DIFFERENCE WAS IN 2017 =  0.00048291683197021484
THE TOTAL DIFFERENCE WAS =  0.0004829391837120056


In [19]:
# Renaming the columns

petroleo_df.columns = ['COMBUSTÍVEL',
 'REGIÃO',
 'ESTADO',
 'UNIDADE',
 'ANO',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 'TOTAL']

petroleo_df.head(5)



Unnamed: 0,COMBUSTÍVEL,REGIÃO,ESTADO,UNIDADE,ANO,1,2,3,4,5,6,7,8,9,10,11,12,TOTAL
0,GASOLINA C (m3),REGIÃO NORTE,RONDÔNIA,m3,2000,9563.263,11341.229,9369.746,10719.983,11165.968,12312.451,11220.97,12482.281,13591.122,11940.57,11547.576,10818.094,136073.253
1,GASOLINA C (m3),REGIÃO NORTE,ACRE,m3,2000,3065.758,3495.29,2946.93,3023.92,3206.93,3612.58,3264.46,3835.74,3676.571,3225.61,3289.718,3358.346,40001.853
2,GASOLINA C (m3),REGIÃO NORTE,AMAZONAS,m3,2000,17615.604,20258.2,18741.344,19604.023,20221.674,20792.616,19912.898,21869.338,21145.643,20633.175,20766.918,21180.919,242742.352
3,GASOLINA C (m3),REGIÃO NORTE,RORAIMA,m3,2000,3259.3,3636.216,3631.569,3348.416,3394.016,4078.616,3346.616,4029.9,4358.516,3716.032,3200.4,3339.332,43338.929
4,GASOLINA C (m3),REGIÃO NORTE,PARÁ,m3,2000,28830.479,32297.047,27310.979,29396.384,26511.009,36553.25,31807.84,31009.972,29755.907,28661.951,28145.784,29294.796,359575.398


In [20]:
# Renaming the columns

diesel_df.columns = ['COMBUSTÍVEL',
 'REGIÃO',
 'ESTADO',
 'UNIDADE',
 'ANO',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 'TOTAL']

diesel_df.head(5)

Unnamed: 0,COMBUSTÍVEL,REGIÃO,ESTADO,UNIDADE,ANO,1,2,3,4,5,6,7,8,9,10,11,12,TOTAL
0,ÓLEO DIESEL S-10 (m3),REGIÃO NORTE,RONDÔNIA,m3,2013,3517.6,3681.7,4700.67,5339.2,6166.4,6539.65,7283.7,8082.85,7902.55,9383.15,9767.4,9088.8,81453.67
1,ÓLEO DIESEL S-10 (m3),REGIÃO NORTE,ACRE,m3,2013,363.0,410.0,536.0,607.0,740.0,756.0,971.0,1174.0,1240.0,1439.0,1483.0,1483.0,11202.0
2,ÓLEO DIESEL S-10 (m3),REGIÃO NORTE,AMAZONAS,m3,2013,3190.585,3305.0,3391.0,3637.0,4250.0,4576.0,5756.879,6228.636,6334.0,7154.2,6836.3,6784.232,61443.832
3,ÓLEO DIESEL S-10 (m3),REGIÃO NORTE,RORAIMA,m3,2013,795.4,757.2,939.8,1040.6,966.0,992.9,1027.0,1083.8,1311.2,1475.3,1502.7,1531.8,13423.7
4,ÓLEO DIESEL S-10 (m3),REGIÃO NORTE,PARÁ,m3,2013,30137.8,28146.3,31280.5,33033.05,33519.88,34321.53,37168.16,41248.336,40913.48,45383.5,44013.219,41975.03,441140.785


In [21]:
# Combining the year column with the month columns (1 to 12) into a single "year_month" column.
# The wide-to-long reshaping is done with the ".melt()" function.

temp1 = petroleo_df.melt(id_vars=["COMBUSTÍVEL","ESTADO","UNIDADE", "ANO"], 
                        value_vars=['1','2','3','4','5','6','7','8','9','10','11','12'])

temp1['year_month'] = temp1[['ANO','variable']].astype(str).agg('-'.join,1)


temp2 = diesel_df.melt(id_vars=["COMBUSTÍVEL","ESTADO","UNIDADE", "ANO"], 
                        value_vars=['1','2','3','4','5','6','7','8','9','10','11','12'])

temp2['year_month'] = temp2[['ANO','variable']].astype(str).agg('-'.join,1)
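
To make the reshaping concrete, the sketch below applies the same `melt` call to a two-row, two-month miniature built from values shown earlier. It also shows one possible way (an assumption, not part of the cell above) to turn the year and month into a real date for the `year_month` column of the target schema: the month is zero-padded and a day is appended so the string is an unambiguous ISO date.

```python
import pandas as pd

# Two rows and two month columns taken from the data shown earlier.
wide = pd.DataFrame({
    "COMBUSTÍVEL": ["GASOLINA C (m3)", "GASOLINA C (m3)"],
    "ESTADO": ["ACRE", "PARÁ"],
    "UNIDADE": ["m3", "m3"],
    "ANO": [2000, 2000],
    "1": [3065.758, 28830.479],
    "2": [3495.29, 32297.047],
})

# Wide to long: one row per (state, year, month).
long = wide.melt(
    id_vars=["COMBUSTÍVEL", "ESTADO", "UNIDADE", "ANO"],
    value_vars=["1", "2"],
)

# Build "YYYY-MM-01" strings and parse them as dates (first day of the month).
iso = long["ANO"].astype(str) + "-" + long["variable"].str.zfill(2) + "-01"
long["year_month"] = pd.to_datetime(iso, format="%Y-%m-%d")

print(long[["ESTADO", "year_month", "value"]])
```

The number of rows grows from 2 to 4 (rows times month columns), while the sum of `value` stays equal to the sum of the original month columns, which is the invariant checked in the next cell.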


In [22]:
# Checking that the difference at the end of this analysis is the same as the difference that has already been saved.

diferenca3 = round(temp1['value'].sum(),2) - round(petroleo_total,2)

print("DataFrame 1 difference: ", diferenca3)

linha(30)

diferenca4 = round(temp2['value'].sum(),2) - round(diesel_total,2)

print("DataFrame 2 difference: ", diferenca4)

DataFrame 1 difference:  0.0




DataFrame 2 difference:  0.0


In [23]:
# Viewing the data as it is so far. Copies are taken so that the temporary variables "temp1" and "temp2" are preserved.

petroleo_df_final = temp1.copy()

print(petroleo_df_final.head())

linha(30)

diesel_df_final = temp2.copy()

print(diesel_df_final.head())



       COMBUSTÍVEL    ESTADO UNIDADE   ANO variable      value year_month
0  GASOLINA C (m3)  RONDÔNIA      m3  2000        1   9563.263     2000-1
1  GASOLINA C (m3)      ACRE      m3  2000        1   3065.758     2000-1
2  GASOLINA C (m3)  AMAZONAS      m3  2000        1  17615.604     2000-1
3  GASOLINA C (m3)   RORAIMA      m3  2000        1   3259.300     2000-1
4  GASOLINA C (m3)      PARÁ      m3  2000        1  28830.479     2000-1




             COMBUSTÍVEL    ESTADO UNIDADE   ANO variable      value  \
0  ÓLEO DIESEL S-10 (m3)  RONDÔNIA      m3  2013        1   3517.600   
1  ÓLEO DIESEL S-10 (m3)      ACRE      m3  2013        1    363.000   
2  ÓLEO DIESEL S-10 (m3)  AMAZONAS      m3  2013        1   3190.585   
3  ÓLEO DIESEL S-10 (m3)   RORAIMA      m3  2013        1    795.400   
4  ÓLEO DIESEL S-10 (m3)      PARÁ      m3  2013        1  30137.800   

  year_month  
0     2013-1  
1     2013-1  
2     2013-1  
3     2013-1  
4     2013-1  


In [24]:
# Dropping the columns "ANO" and "variable"
 
petroleo_df_final = petroleo_df_final.drop(columns=["ANO", "variable"])

print(petroleo_df_final.head())

linha(30)

diesel_df_final = diesel_df_final.drop(columns=["ANO", "variable"])

print(diesel_df_final.head())


       COMBUSTÍVEL    ESTADO UNIDADE      value year_month
0  GASOLINA C (m3)  RONDÔNIA      m3   9563.263     2000-1
1  GASOLINA C (m3)      ACRE      m3   3065.758     2000-1
2  GASOLINA C (m3)  AMAZONAS      m3  17615.604     2000-1
3  GASOLINA C (m3)   RORAIMA      m3   3259.300     2000-1
4  GASOLINA C (m3)      PARÁ      m3  28830.479     2000-1




             COMBUSTÍVEL    ESTADO UNIDADE      value year_month
0  ÓLEO DIESEL S-10 (m3)  RONDÔNIA      m3   3517.600     2013-1
1  ÓLEO DIESEL S-10 (m3)      ACRE      m3    363.000     2013-1
2  ÓLEO DIESEL S-10 (m3)  AMAZONAS      m3   3190.585     2013-1
3  ÓLEO DIESEL S-10 (m3)   RORAIMA      m3    795.400     2013-1
4  ÓLEO DIESEL S-10 (m3)      PARÁ      m3  30137.800     2013-1


In [25]:
# Reordering and renaming the columns

cols2 = ['year_month', 'ESTADO', 'COMBUSTÍVEL', 'UNIDADE','value']

petroleo_df_final = petroleo_df_final[cols2]
diesel_df_final = diesel_df_final[cols2]

nomes_DF = ('year_month', 'uf', 'product', 'unit','volume')

petroleo_df_final.columns = nomes_DF
diesel_df_final.columns = nomes_DF


print(petroleo_df_final.head())

linha(30)


print(diesel_df_final.head())

  year_month        uf          product unit     volume
0     2000-1  RONDÔNIA  GASOLINA C (m3)   m3   9563.263
1     2000-1      ACRE  GASOLINA C (m3)   m3   3065.758
2     2000-1  AMAZONAS  GASOLINA C (m3)   m3  17615.604
3     2000-1   RORAIMA  GASOLINA C (m3)   m3   3259.300
4     2000-1      PARÁ  GASOLINA C (m3)   m3  28830.479




  year_month        uf                product unit     volume
0     2013-1  RONDÔNIA  ÓLEO DIESEL S-10 (m3)   m3   3517.600
1     2013-1      ACRE  ÓLEO DIESEL S-10 (m3)   m3    363.000
2     2013-1  AMAZONAS  ÓLEO DIESEL S-10 (m3)   m3   3190.585
3     2013-1   RORAIMA  ÓLEO DIESEL S-10 (m3)   m3    795.400
4     2013-1      PARÁ  ÓLEO DIESEL S-10 (m3)   m3  30137.800
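
Assigning to `.columns` renames by position, so it relies on the order fixed by `cols2`. A dictionary-based rename ties each new name to its source column instead. The following is a minimal sketch on a one-row frame with illustrative values; the names `frame`, `new_names` and `renamed` are assumptions made for this example.

```python
import pandas as pd

# One illustrative row with the column layout produced by the previous step.
frame = pd.DataFrame({
    "COMBUSTÍVEL": ["GASOLINA C (m3)"],
    "ESTADO": ["RONDÔNIA"],
    "UNIDADE": ["m3"],
    "value": [9563.263],
    "year_month": ["2000-1"],
})

# Old name -> new name; the dictionary order is also the desired column order.
new_names = {
    "year_month": "year_month",
    "ESTADO": "uf",
    "COMBUSTÍVEL": "product",
    "UNIDADE": "unit",
    "value": "volume",
}

# Select the columns in the target order, then rename them by name.
renamed = frame[list(new_names)].rename(columns=new_names)

print(renamed.columns.tolist())
```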


In [26]:
# Adjusting the "product" column by removing the redundant "(m3)" suffix (the slice keeps the space that preceded it).

petroleo_df_final["product"] = petroleo_df_final["product"].str.slice(stop = -4)

print(petroleo_df_final.head())

linha(30)

diesel_df_final["product"] = diesel_df_final["product"].str.slice(stop = -4)
print(diesel_df_final.head())



  year_month        uf      product unit     volume
0     2000-1  RONDÔNIA  GASOLINA C    m3   9563.263
1     2000-1      ACRE  GASOLINA C    m3   3065.758
2     2000-1  AMAZONAS  GASOLINA C    m3  17615.604
3     2000-1   RORAIMA  GASOLINA C    m3   3259.300
4     2000-1      PARÁ  GASOLINA C    m3  28830.479




  year_month        uf            product unit     volume
0     2013-1  RONDÔNIA  ÓLEO DIESEL S-10    m3   3517.600
1     2013-1      ACRE  ÓLEO DIESEL S-10    m3    363.000
2     2013-1  AMAZONAS  ÓLEO DIESEL S-10    m3   3190.585
3     2013-1   RORAIMA  ÓLEO DIESEL S-10    m3    795.400
4     2013-1      PARÁ  ÓLEO DIESEL S-10    m3  30137.800
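
Because `str.slice(stop=-4)` removes exactly four characters, the space before `(m3)` stays in the product name. If that trailing space is unwanted, one option is a regular expression that removes the suffix together with the whitespace in front of it. This is a sketch of that alternative, not the method used in the notebook; the series `products` is a small illustrative sample.

```python
import pandas as pd

products = pd.Series(["GASOLINA C (m3)", "ÓLEO DIESEL S-10 (m3)"])

# Approach used in the notebook: drops "(m3)" but keeps the space before it.
sliced = products.str.slice(stop=-4)

# Alternative: remove optional whitespace plus "(m3)" at the end of the string.
cleaned = products.str.replace(r"\s*\(m3\)$", "", regex=True)

print(sliced.tolist())
print(cleaned.tolist())
```

The regular expression only touches names that actually end in `(m3)`, so a product without the suffix would pass through unchanged.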


In [27]:
# Converting the "year_month" column to datetime64; each "YYYY-M" label becomes the first day of that month.

petroleo_df_final["year_month"] = pd.to_datetime(petroleo_df_final["year_month"])

print(petroleo_df_final.dtypes)
linha(30)

diesel_df_final["year_month"] = pd.to_datetime(diesel_df_final["year_month"])

print(diesel_df_final.dtypes)


year_month    datetime64[ns]
uf                    object
product               object
unit                  object
volume               float64
dtype: object




year_month    datetime64[ns]
uf                    object
product               object
unit                  object
volume               float64
dtype: object
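
The conversion above parses the `"YYYY-M"` strings. An alternative that avoids string parsing is to assemble the dates from numeric year and month parts, which `pd.to_datetime` accepts as a DataFrame with `year`, `month` and `day` columns. A minimal sketch, assuming the `ANO` and `variable` columns were still available; the sample values are illustrative.

```python
import pandas as pd

# Illustrative rows with the year and month columns produced by the melt.
frame = pd.DataFrame({"ANO": [2000, 2013], "variable": ["1", "12"]})

# Build the date parts explicitly; day 1 marks the start of each month.
parts = pd.DataFrame({
    "year": frame["ANO"].astype(int),
    "month": frame["variable"].astype(int),
    "day": 1,
})

frame["year_month"] = pd.to_datetime(parts)

print(frame)
```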


In [28]:
# Listing the state names in the "uf" column before abbreviating them.

print(petroleo_df_final["uf"].unique())
linha(30)
print(diesel_df_final["uf"].unique())



['RONDÔNIA' 'ACRE' 'AMAZONAS' 'RORAIMA' 'PARÁ' 'AMAPÁ' 'TOCANTINS'
 'MARANHÃO' 'PIAUÍ' 'CEARÁ' 'RIO GRANDE DO NORTE' 'PARAÍBA' 'PERNAMBUCO'
 'ALAGOAS' 'SERGIPE' 'BAHIA' 'MINAS GERAIS' 'ESPÍRITO SANTO'
 'RIO DE JANEIRO' 'SÃO PAULO' 'PARANÁ' 'SANTA CATARINA'
 'RIO GRANDE DO SUL' 'MATO GROSSO DO SUL' 'MATO GROSSO' 'GOIÁS'
 'DISTRITO FEDERAL']




['RONDÔNIA' 'ACRE' 'AMAZONAS' 'RORAIMA' 'PARÁ' 'AMAPÁ' 'TOCANTINS'
 'MARANHÃO' 'PIAUÍ' 'CEARÁ' 'RIO GRANDE DO NORTE' 'PARAÍBA' 'PERNAMBUCO'
 'ALAGOAS' 'SERGIPE' 'BAHIA' 'MINAS GERAIS' 'ESPÍRITO SANTO'
 'RIO DE JANEIRO' 'SÃO PAULO' 'PARANÁ' 'SANTA CATARINA'
 'RIO GRANDE DO SUL' 'MATO GROSSO DO SUL' 'MATO GROSSO' 'GOIÁS'
 'DISTRITO FEDERAL']


In [29]:
# Converting the full state names into two-letter abbreviations by applying one shared dictionary with ".map()".

uf_siglas = {
    'RONDÔNIA':'RO', 'ACRE':'AC', 'AMAZONAS':'AM', 'RORAIMA':'RR', 'PARÁ':'PA', 'AMAPÁ':'AP',
    'TOCANTINS':'TO', 'MARANHÃO':'MA', 'PIAUÍ':'PI', 'CEARÁ':'CE', 'RIO GRANDE DO NORTE':'RN',
    'PARAÍBA':'PB', 'PERNAMBUCO':'PE', 'ALAGOAS':'AL', 'SERGIPE':'SE', 'BAHIA':'BA',
    'MINAS GERAIS':'MG', 'ESPÍRITO SANTO':'ES', 'RIO DE JANEIRO':'RJ', 'SÃO PAULO':'SP',
    'PARANÁ':'PR', 'SANTA CATARINA':'SC', 'RIO GRANDE DO SUL':'RS',
    'MATO GROSSO DO SUL':'MS', 'MATO GROSSO':'MT', 'GOIÁS':'GO', 'DISTRITO FEDERAL':'DF'}

petroleo_df_final["uf"] = petroleo_df_final["uf"].map(uf_siglas)
diesel_df_final["uf"] = diesel_df_final["uf"].map(uf_siglas)

In [30]:
# Checking for null values: a state name missing from the dictionary would have become NaN.

print("Null in DF1 abbreviations: ", petroleo_df_final["uf"].isnull().sum())

linha(30)

print("Null in DF2 abbreviations: ", diesel_df_final["uf"].isnull().sum())



Null in DF1 abbreviations:  0




Null in DF2 abbreviations:  0
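
A count of zero confirms that every state name was found in the dictionary. When the count is not zero, it helps to list which names failed to map. The sketch below shows one way to do that; the two-entry dictionary and the misspelled `"PARA"` are hypothetical values chosen to trigger a miss.

```python
import pandas as pd

# Deliberately incomplete, illustrative mapping.
uf_map = {"ACRE": "AC", "PARÁ": "PA"}

# "PARA" (without the accent) is a hypothetical typo that has no dictionary key.
states = pd.Series(["ACRE", "PARÁ", "PARA"])

# .map() returns NaN for any value that is not a key of the dictionary.
mapped = states.map(uf_map)

# Recover the original names that were not mapped.
missing = states[mapped.isnull()].unique()

print(mapped.isnull().sum())
print(missing)
```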


In [31]:
print(petroleo_df_final.head())
linha(30)
print(diesel_df_final.head())

  year_month  uf      product unit     volume
0 2000-01-01  RO  GASOLINA C    m3   9563.263
1 2000-01-01  AC  GASOLINA C    m3   3065.758
2 2000-01-01  AM  GASOLINA C    m3  17615.604
3 2000-01-01  RR  GASOLINA C    m3   3259.300
4 2000-01-01  PA  GASOLINA C    m3  28830.479




  year_month  uf            product unit     volume
0 2013-01-01  RO  ÓLEO DIESEL S-10    m3   3517.600
1 2013-01-01  AC  ÓLEO DIESEL S-10    m3    363.000
2 2013-01-01  AM  ÓLEO DIESEL S-10    m3   3190.585
3 2013-01-01  RR  ÓLEO DIESEL S-10    m3    795.400
4 2013-01-01  PA  ÓLEO DIESEL S-10    m3  30137.800


In [32]:
# Converting to the requested types, rounding "volume" to 2 decimal places and creating a new column "created_at" with today's date (stored as a "YYYY-MM-DD" string), using the function pd.to_datetime('today')

petroleo_df_final['uf'] = petroleo_df_final['uf'].astype(str)
petroleo_df_final['product'] = petroleo_df_final['product'].astype(str)
petroleo_df_final['unit'] = petroleo_df_final['unit'].astype(str)
petroleo_df_final['volume'] = petroleo_df_final['volume'].astype(float)
petroleo_df_final['volume'] = petroleo_df_final['volume'].round(2)
petroleo_df_final['created_at'] = pd.to_datetime('today').strftime("%Y-%m-%d")

diesel_df_final['uf'] = diesel_df_final['uf'].astype(str)
diesel_df_final['product'] = diesel_df_final['product'].astype(str)
diesel_df_final['unit'] = diesel_df_final['unit'].astype(str)
diesel_df_final['volume'] = diesel_df_final['volume'].astype(float)
diesel_df_final['volume'] = diesel_df_final['volume'].round(2)
diesel_df_final['created_at'] = pd.to_datetime('today').strftime("%Y-%m-%d")



print(petroleo_df_final.head())
linha(30)
print(diesel_df_final.head())

  year_month  uf      product unit    volume  created_at
0 2000-01-01  RO  GASOLINA C    m3   9563.26  2022-03-18
1 2000-01-01  AC  GASOLINA C    m3   3065.76  2022-03-18
2 2000-01-01  AM  GASOLINA C    m3  17615.60  2022-03-18
3 2000-01-01  RR  GASOLINA C    m3   3259.30  2022-03-18
4 2000-01-01  PA  GASOLINA C    m3  28830.48  2022-03-18




  year_month  uf            product unit    volume  created_at
0 2013-01-01  RO  ÓLEO DIESEL S-10    m3   3517.60  2022-03-18
1 2013-01-01  AC  ÓLEO DIESEL S-10    m3    363.00  2022-03-18
2 2013-01-01  AM  ÓLEO DIESEL S-10    m3   3190.58  2022-03-18
3 2013-01-01  RR  ÓLEO DIESEL S-10    m3    795.40  2022-03-18
4 2013-01-01  PA  ÓLEO DIESEL S-10    m3  30137.80  2022-03-18


In [33]:
# Exporting the final files in CSV format with UTF-8 encoding

petroleo_df_final.to_csv(patch_final+arquivo_final_1, encoding='utf-8', index=False)
diesel_df_final.to_csv(patch_final+arquivo_final_2, encoding='utf-8', index=False)
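
After exporting, it can be useful to read a file back and confirm that the columns and types survived the round trip. The sketch below does this with an in-memory buffer instead of the real output files, so it runs on its own; the frame contents are illustrative and the name `restored` is an assumption for this example.

```python
import io

import pandas as pd

# Small illustrative frame with the final column layout.
frame = pd.DataFrame({
    "year_month": pd.to_datetime(["2000-01-01", "2000-02-01"]),
    "uf": ["RO", "AC"],
    "product": ["GASOLINA C", "ÓLEO DIESEL"],
    "unit": ["m3", "m3"],
    "volume": [9563.26, 3065.76],
})

# Write the CSV to an in-memory text buffer (stands in for a file on disk).
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# Read it back; dates are plain text in CSV, so they are parsed again here.
restored = pd.read_csv(buffer, parse_dates=["year_month"])

print(restored.dtypes)
```

Note that CSV stores no type information, so a date column comes back as text unless it is parsed again on load, as `parse_dates` does here.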