# Calculations

We can use this notebook to compute some statistics on the compiled data.

In [4]:
# load the events

import pandas as pd

df = pd.read_csv("chuvas.csv", encoding="latin1", parse_dates=["date"])
print(df.head())
print(df.dtypes)

   station       date  monthly_total  sub_basin  latitude  longitude    name  \
0  1036005 2024-01-10           41.1         49   -10.285   -36.5564  Penedo   
1  1036005 2024-01-09           23.8         49   -10.285   -36.5564  Penedo   
2  1036005 2024-01-08           48.6         49   -10.285   -36.5564  Penedo   
3  1036005 2024-01-07          164.2         49   -10.285   -36.5564  Penedo   
4  1036005 2024-01-06          257.6         49   -10.285   -36.5564  Penedo   

     state municipality  
0  Alagoas       Penedo  
1  Alagoas       Penedo  
2  Alagoas       Penedo  
3  Alagoas       Penedo  
4  Alagoas       Penedo  
station                   int64
date             datetime64[ns]
monthly_total           float64
sub_basin                 int64
latitude                float64
longitude               float64
name                     object
state                    object
municipality             object
dtype: object
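
One thing worth checking in the preview above: the column is called `monthly_total`, yet the first rows are dated on consecutive days (2024-01-10, 2024-01-09, and so on). That pattern is what a day-first source such as `01/10/2024` (1 October) would produce when read month-first. The raw file is not shown here, so this is an assumption. The sketch below uses hypothetical strings to show the difference between the two readings.

```python
import pandas as pd

# Hypothetical raw values, assuming the CSV stores dates as dd/mm/yyyy
raw = pd.Series(["01/10/2024", "01/09/2024", "01/08/2024"])

# Month-first reading (what pandas assumes by default for ambiguous strings):
# the results are days in January
month_first = pd.to_datetime(raw, format="%m/%d/%Y")

# Day-first reading with an explicit format: first day of Oct, Sep and Aug
day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.dt.strftime("%Y-%m-%d").tolist())
print(day_first.dt.strftime("%Y-%m-%d").tolist())
```

If the assumption holds, passing `dayfirst=True` to `pd.read_csv` would be one way to get the intended months. The year is identical under both readings for strings like these, so the annual figures in the cells shown here would not be affected.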


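Here we compute annual precipitation for each sub-basin and for the whole São Francisco basin using simple means: every `monthly_total` record in a given year is averaged with equal weight, so the resulting `total` column is a mean monthly value, not an accumulated annual sum.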

In [5]:
# mean of the monthly totals per sub-basin and year
# (the column is named "total", but the aggregation is a mean, not a sum)
df["year"] = df["date"].dt.year
df_annual = df.groupby(["sub_basin", "year"])["monthly_total"].mean().reset_index()
df_annual.columns = ["sub_basin", "year", "total"]

# export to csv
df_annual.to_csv("chuvas_anual.csv", index=False)

print(df_annual.head())

# mean of the monthly totals per year across the whole basin (all sub-basins pooled)
df_annual_all = df.groupby("year")["monthly_total"].mean().reset_index()
df_annual_all.columns = ["year", "total"]

print(df_annual_all.head())

# export to csv
df_annual_all.to_csv("chuvas_anual_all.csv", index=False)

   sub_basin  year       total
0         40  2004  148.679736
1         40  2005  134.404148
2         40  2006  127.906127
3         40  2007  112.834586
4         40  2008  169.484211
   year       total
0  2004  138.677304
1  2005  118.910801
2  2006  100.916377
3  2007   96.485333
4  2008  133.728768
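
Because the aggregation is `.mean()`, the `total` column above holds the average monthly rainfall of each year, not the accumulated rainfall. If an accumulated annual figure is wanted, one possible approach (a sketch, not what the notebook does) is to sum the months of each station first and then average across stations. The sample values below are made up for illustration, and the sketch assumes one row per station per month with no missing months.

```python
import pandas as pd

# Tiny synthetic sample (hypothetical values): two stations in one sub-basin,
# three monthly records each, all in the same year
sample = pd.DataFrame({
    "station": [1, 1, 1, 2, 2, 2],
    "sub_basin": [40] * 6,
    "year": [2004] * 6,
    "monthly_total": [100.0, 50.0, 0.0, 80.0, 40.0, 30.0],
})

# Mean of all monthly records, the same aggregation used for "total" above
monthly_mean = sample.groupby(["sub_basin", "year"])["monthly_total"].mean()

# Accumulated rainfall: sum per station, then average the station sums
station_sum = sample.groupby(["sub_basin", "year", "station"])["monthly_total"].sum()
annual_accumulated = station_sum.groupby(level=["sub_basin", "year"]).mean()

print(monthly_mean)
print(annual_accumulated)
```

Summing per station before averaging keeps a sub-basin with many stations from inflating the accumulated value, which a plain sum over all rows would do.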


With the annual figures computed, we can extract further statistics from the dataset, such as the wettest and driest years for each sub-basin and for the basin as a whole.
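
The tables printed further down carry a `variation` column whose computation does not appear in the cells shown here. Its values are consistent with a year-over-year percent change within each sub-basin, with `NaN` for the first year of a sub-basin (as in the 2004 rows for sub-basins 47 and 48). The sketch below reproduces that under this assumed definition, using the sub-basin 40 rows from the `df_annual` preview; it is not necessarily the cell the notebook actually ran.

```python
import pandas as pd

# Sub-basin 40 rows copied from the df_annual preview
annual = pd.DataFrame({
    "sub_basin": [40, 40, 40, 40, 40],
    "year": [2004, 2005, 2006, 2007, 2008],
    "total": [148.679736, 134.404148, 127.906127, 112.834586, 169.484211],
})

# Assumed definition: percent change from the previous year within each sub-basin
annual = annual.sort_values(["sub_basin", "year"])
annual["variation"] = annual.groupby("sub_basin")["total"].pct_change() * 100

print(annual)
```

For 2008 this gives roughly 50.2, in line with the value printed for sub-basin 40 in the wettest-years table.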

In [20]:
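# wettest and driest entries of df_annual
# note: df_annual has one row per (sub_basin, year), so these two rows are the
# extreme sub-basin/year pairs, not basin-wide years (df_annual_all holds those)
# selecting a single row returns a float Series, so "year" prints as a float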

wettest_year = df_annual.loc[df_annual["total"].idxmax()]
driest_year = df_annual.loc[df_annual["total"].idxmin()]

print(f'Wettest year: {wettest_year["year"]} with {wettest_year["total"]} mm')
print(f'Driest year: {driest_year["year"]} with {driest_year["total"]} mm')

sub_basin_wettest_year = df_annual.loc[df_annual.groupby("sub_basin")["total"].idxmax()]
sub_basin_driest_year = df_annual.loc[df_annual.groupby("sub_basin")["total"].idxmin()]

print("")

print("Wettest years by sub basin:")
print(sub_basin_wettest_year)

print("")

print("Driest years by sub basin:")
print(sub_basin_driest_year)


Wettest year: 2021.0 with 209.68444444444444 mm
Driest year: 2012.0 with 22.453521126760563 mm

Wettest years by sub basin:
     sub_basin  year       total   variation
4           40  2008  169.484211   50.205904
28          41  2011  169.005670   36.971916
62          42  2024  195.520482  108.980742
80          43  2021  209.684444   45.332066
101         44  2021  154.093846   19.807372
122         45  2021  157.518310   10.693541
146         46  2024  190.120408  180.108379
147         47  2004  101.443421         NaN
168         48  2004   83.598276         NaN
207         49  2022   95.773636   34.438007

Driest years by sub basin:
     sub_basin  year      total  variation
10          40  2014  70.535446 -41.357627
31          41  2014  63.980269 -52.231016
61          42  2023  93.559091 -39.109083
74          43  2015  73.045098 -31.533593
95          44  2015  49.671631 -18.763176
124         45  2023  62.228395 -53.161561
145         46  2023  67.873874 -45.771773
158      