# Obtaining the **number of confirmed COVID-19 deaths** per **epidemiological week** for each of the **24 departments** of Peru

## 0. Run libraries and functions
We start by loading the libraries we need and reading our updated CSV of COVID-19 deaths.

In [10]:
import pandas as pd
import numpy as np

import functions as fn

In [11]:
def just_cities(fal_url):
    """
    Tags each reported death with its epidemiological year and week,
    keeping the department (fallecidos = fal, vacunados = vac).
    """
    fal_col = ['FECHA_FALLECIMIENTO', 'DEPARTAMENTO']   # Keep only date and department
    df_fal = fn.read_largeCSV_file(fal_url, ';', fal_col)
    fn.variable_fecha(df_fal, 'FECHA_FALLECIMIENTO')
    fn.date_to_epiweek(df_fal, 'FECHA_FALLECIMIENTO')
    df_fal['fallecido'] = 1     # One per row, so a sum counts cases
    df_fal['fallecido'] = df_fal['fallecido'].astype(np.int8)   # Assign the result: the cast is not in place

    return df_fal

## 1. Add the year and epidemiological week to each death record

In [12]:
fal_url = "RawData/fallecidos_covid.csv"
falxdep = just_cities(fal_url)
del fal_url

Print the resulting `falxdep` dataframe (confirmed COVID-19 deaths, *fallecidos*, by department and epidemiological week).

In [13]:
print("Head of the 'falxdep' dataframe of each death confirmed case from COVID-19 in Peru:")
print(falxdep.head())
print("\n")

print('TOTAL of confirmed death cases from COVID-19 in Peru: ' + str(falxdep['fallecido'].sum())) 

Head of the 'falxdep' dataframe of each death confirmed case from COVID-19 in Peru:
  DEPARTAMENTO  epi_year  epi_week  fallecido
0         LIMA      2021        17          1
1         LIMA      2021        17          1
2     AYACUCHO      2021        17          1
3         LIMA      2021        16          1
4     AREQUIPA      2021        31          1


TOTAL of confirmed death cases from COVID-19 in Peru: 200246
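
The `epi_year` and `epi_week` columns above come from `fn.date_to_epiweek`, whose source is not shown in this notebook. As a rough illustration only, the sketch below derives an epidemiological year and week under one common convention (weeks run Sunday to Saturday, and week 1 is the week containing January 4). Both the convention and the helper names `week1_start` and `epi_year_week` are assumptions, not the actual contents of the `functions` module.

```python
from datetime import date, timedelta

def week1_start(year):
    """Sunday that opens week 1: the Sunday-to-Saturday week containing January 4."""
    jan4 = date(year, 1, 4)
    days_since_sunday = (jan4.weekday() + 1) % 7   # weekday(): Monday=0 ... Sunday=6
    return jan4 - timedelta(days=days_since_sunday)

def epi_year_week(d):
    """Return (epi_year, epi_week) for a date under the assumed Sunday-start convention."""
    year = d.year
    if d >= week1_start(year + 1):      # late December may belong to next year's week 1
        year += 1
    elif d < week1_start(year):         # early January may belong to last year's final week
        year -= 1
    week = (d - week1_start(year)).days // 7 + 1
    return year, week

print(epi_year_week(date(2020, 3, 1)))   # (2020, 10)
print(epi_year_week(date(2021, 1, 1)))   # (2020, 53)
```

Under this convention the first days of January can fall in the last week of the previous epidemiological year, which is why the year is stored alongside the week.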


Crosstab of the total deaths from COVID-19 by department of Peru, for each epidemiological week and year

In [14]:
ct_falxdep = pd.crosstab(index=[falxdep['epi_year'], falxdep['epi_week']],
                       columns=[falxdep['fallecido'], falxdep['DEPARTAMENTO']],
                       margins = True)

ct_falxdep.to_csv('Data/fallecidos_x_departamentos_x_semanasEpi.csv')

Fix the indexes and print the crosstab (`ct`) of `falxdep`

In [15]:
ct_falxdep = pd.read_csv('Data/fallecidos_x_departamentos_x_semanasEpi.csv')

def filtering_data_dep(falxdep_df):
    """Function to fix the indexes of the data of deceased by department of Peru.
    IMPORTANT: There are more efficient ways to modify indexes using 'loc' and 
    'iloc' but this method at least 'works'"""

    time = falxdep_df[["fallecido", "Unnamed: 1"]]   # Get the cols with the epidemiological year and week
    time = time.rename(columns=time.iloc[1])     # Use the second row (epi_year, epi_week) as header
    time = time.drop([0,1, len(time)-1],axis=0)  # Drop the first two rows (header labels) and the last row (totals)
    time = time.reset_index(drop=True)           # Reset index

    departments = falxdep_df.drop(["Unnamed: 1", 'fallecido'], axis=1)    # Drop cols that are not departments
    departments = departments.rename(columns=departments.iloc[0])     # Use the first row (department names) as header
    departments = departments.drop([0,1, len(departments)-1],axis=0)  # Drop the first two rows (header labels) and the last row (totals)
    departments = departments.reset_index(drop=True)                  # Reset index

    falxdep_df = pd.concat([time, departments], axis=1)
    return falxdep_df

ct_falxdep_fix = filtering_data_dep(ct_falxdep)

print("Head of the crosstab of each death confirmed case from COVID-19 by year and epidemiological week:")
print(ct_falxdep_fix.head(10))

del ct_falxdep
ct_falxdep_fix.to_csv('Data/fallecidos_x_departamentos_x_semanasEpi.csv', index = False)

Head of the crosstab of each death confirmed case from COVID-19 by year and epidemiological week:
  epi_year epi_week AMAZONAS ANCASH APURIMAC AREQUIPA AYACUCHO CAJAMARCA  \
0     2020       10        0      0        0        0        0         0   
1     2020       11        0      0        0        0        0         0   
2     2020       12        0      0        0        0        1         0   
3     2020       13        0      3        0        0        0         0   
4     2020       14        0      0        0       15        0         2   
5     2020       15        1      9        1        7        1         4   
6     2020       16        0     24        0        7        2         4   
7     2020       17        0     47        3       13        3         4   
8     2020       18        2     67        0       14        1         5   
9     2020       19        7    102        1       18        7         4   

  CALLAO CUSCO  ... MADRE DE DIOS MOQUEGUA PASCO PIURA PUNO SAN M
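
The docstring of `filtering_data_dep` notes that more efficient ways exist. One possible alternative, sketched here on a five-row toy dataframe rather than the real data, is to pass only `DEPARTAMENTO` as the crosstab columns and call `reset_index()`, which yields flat headers without the CSV round trip. The `toy` and `weekly` names are illustrative, and this sketch leaves out the `margins=True` totals.

```python
import pandas as pd

# Toy stand-in for falxdep (same column names as in the notebook)
toy = pd.DataFrame({
    'DEPARTAMENTO': ['LIMA', 'LIMA', 'AYACUCHO', 'LIMA', 'AREQUIPA'],
    'epi_year': [2021, 2021, 2021, 2021, 2021],
    'epi_week': [17, 17, 17, 16, 31],
})

# One row per (epi_year, epi_week), one column per department
weekly = pd.crosstab(index=[toy['epi_year'], toy['epi_week']],
                     columns=toy['DEPARTAMENTO']).reset_index()
weekly.columns.name = None   # Remove the leftover 'DEPARTAMENTO' label on the header

print(weekly)
```

Because every row is one death, counting rows per cell gives the same numbers as summing the `fallecido` column.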

# Obtaining the **total number of fully vaccinated people (2 doses)** and the **total number of confirmed COVID-19 deaths** for each of the **24 departments** of Peru

## 0. Crosstab of the TOTAL confirmed deaths from COVID-19 by department.
Use 'falxdep' dataframe

In [16]:
dep_deaths = pd.crosstab(index = falxdep['DEPARTAMENTO'], columns = falxdep['fallecido'])
dep_deaths.columns = ['fallecidos']

# Adding an extra row of the total deaths for the whole country
dep_deaths.loc['PERU']= dep_deaths.sum()

## 1. Get the TOTAL confirmed DEATHS from COVID-19 by each of the 24 departments of Peru

In [17]:
print("SUMMARY dataframe of all confirmed COVID-19 deaths in Peru per department:")
print(dep_deaths)

SUMMARY dataframe of all confirmed COVID-19 deaths in Peru per department:
               fallecidos
DEPARTAMENTO             
AMAZONAS             1258
ANCASH               6674
APURIMAC             1510
AREQUIPA             9680
AYACUCHO             2121
CAJAMARCA            4135
CALLAO               9982
CUSCO                4794
HUANCAVELICA         1161
HUANUCO              2700
ICA                  8367
JUNIN                7020
LA LIBERTAD         10196
LAMBAYEQUE           8610
LIMA                88734
LORETO               4174
MADRE DE DIOS         773
MOQUEGUA             1516
PASCO                1048
PIURA               11996
PUNO                 4174
SAN MARTIN           3012
TACNA                1954
TUMBES               1566
UCAYALI              3091
PERU               200246


## 2. Get the total of people FULLY VACCINATED (2 doses) by the 24 departments of Peru

There is no direct way to find the total number of fully vaccinated people per department. The plan to get there is:

1. The vaccination dataset **(RawData/TB_VACUNACION_COVID19.csv)** only identifies the vaccination center, through *id_centro_vacunacion*. It does NOT give the department or any other relevant location.

2. The vaccination centers dataset **(RawData/TB_CENTRO_VACUNACION.csv)** can be used to match each *id_centro_vacunacion* with its *id_ubigeo*, a numeric variable from 0 to 1894 that identifies the specific district.

3. Finally, with the UBIGEO dataset **(RawData/TB_UBIGEOS.csv)** it is possible to match each *id_ubigeo* with the correct department.
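
The three steps above amount to two chained left merges. The sketch below shows the idea on invented toy tables; only the column names are taken from the notebook, and all ids and values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the three datasets (values are invented)
vaccinations = pd.DataFrame({'id_centro_vacunacion': [10, 10, 20, 30]})
centers = pd.DataFrame({'id_centro_vacunacion': [10, 20, 30],
                        'id_ubigeo': [1, 2, 2]})
ubigeos = pd.DataFrame({'id_ubigeo': [1, 2],
                        'departamento': ['LIMA', 'PUNO']})

# Step 2: center -> ubigeo, then step 3: ubigeo -> department
located = (vaccinations
           .merge(centers, on='id_centro_vacunacion', how='left')
           .merge(ubigeos, on='id_ubigeo', how='left'))

# Count vaccination records per department
per_department = located.groupby('departamento').size()
print(per_department)
```

A left merge keeps every vaccination record even when a center has no match, so unmatched rows show up as missing departments instead of silently disappearing at the merge.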


In [18]:
def vac_department(vac_url):
    """
    Takes the path to the vaccination dataset and returns a list of chunks
    (dataframes) that hold only the fully vaccinated records.
    """
    vac_col = ['id_centro_vacunacion', 'dosis','fecha_vacunacion']                     
    df_vac = fn.read_largeCSV_file(vac_url, ',', vac_col)    
    lst_vac = fn.df_into_chunks(df_vac)               
    
    fully_vac = []
    for df in lst_vac:
        df = df[df["dosis"] != 1].copy()    # Drop first-dose records (not fully vaccinated); work on a copy
        df['vacunado'] = 1                  # To count each case
        df['vacunado'] = df['vacunado'].astype(np.int8)
        del df['dosis']                     # Dose variable is no longer needed
        fully_vac.append(df)

    return fully_vac

In [19]:
vac_url = "RawData/TB_VACUNACION_COVID19.csv"
vacxdep = vac_department(vac_url)
del vac_url

In [20]:
print("Head of the first chunk of all fully vaccinated people in Peru (2 doses) per place of vaccination:")
print(vacxdep[0].head())

Head of the first chunk of all fully vaccinated people in Peru (2 doses) per place of vaccination:
  fecha_vacunacion  id_centro_vacunacion  vacunado
0       19/07/2021                    17         1
1       17/06/2021                  1828         1
2       11/06/2021                103617         1
3       28/07/2021                   891         1
7        5/06/2021                108281         1


Note how each fully vaccinated person has a location id. The next step is to aggregate the fully vaccinated cases by this id.

## 2. From a list of chunks to a summary dataframe of people fully vaccinated (2 doses) per department

**IMPORTANT:** "vacxdep" variable is actually a list of *dataframes* or *chunks*.
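
The helper `fn.df_into_chunks` is not shown in this notebook. As a hypothetical stand-in, the sketch below splits a dataframe into consecutive row slices of a fixed size; the name `split_into_chunks`, the `chunk_size` parameter, and the toy data are all assumptions made for illustration.

```python
import pandas as pd

def split_into_chunks(df, chunk_size):
    """Hypothetical stand-in for fn.df_into_chunks: consecutive row slices."""
    # .copy() makes each chunk independent of the parent dataframe
    return [df.iloc[start:start + chunk_size].copy()
            for start in range(0, len(df), chunk_size)]

toy = pd.DataFrame({'id_centro_vacunacion': [17, 1828, 891, 17, 891],
                    'dosis': [2, 1, 2, 2, 1]})
chunks = split_into_chunks(toy, chunk_size=2)

print(len(chunks))               # 3
print([len(c) for c in chunks])  # [2, 2, 1]
```

Copying each slice costs some memory, but it lets later steps add or drop columns on a chunk without touching the original dataframe.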

In [21]:
def vacxdep_chunks(dfs_vac):
    """
    Returns a dataframe with the TOTAL VACCINATED per vaccination center
    (takes a list of dataframes or chunks)
    """
    var_holder = {}     # Dictionary to hold the per-chunk crosstabs by name
    lst_epi_vac = []    # List of dfs, one count table per chunk
                                         
    for i, chunk in enumerate(dfs_vac):
        var_holder['epi_vac_' + str(i)]= pd.crosstab(index=[chunk['id_centro_vacunacion']],
                                                     columns=chunk['vacunado'])
        lst_epi_vac.append(var_holder['epi_vac_' + str(i)])
    
    merged_epivac = pd.concat(lst_epi_vac, axis=1)  # Put all count tables side by side
    epi_vac = pd.DataFrame(merged_epivac.sum(numeric_only=True, axis=1))   # Add the counts across chunks
    epi_vac.columns = ['vacunados']
    epi_vac['vacunados'] = epi_vac['vacunados'].astype(np.int64)
    epi_vac.reset_index(level=0, inplace=True)

    return epi_vac

In [22]:
vacxdep_sum = vacxdep_chunks(vacxdep)
print("Head of a SUMMARY dataframe of all fully vaccinated people in Peru (2 doses) per place of vaccination:")
print(vacxdep_sum.head())
print("\n")

print('Fully vaccinated (2 doses): ' + str(vacxdep_sum['vacunados'].sum())) 

Head of a SUMMARY dataframe of all fully vaccinated people in Peru (2 doses) per place of vaccination:
   id_centro_vacunacion  vacunados
0                     1        331
1                     3     150269
2                     5       7998
3                     6      33530
4                     8      44992


Fully vaccinated (2 doses): 15676507
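
As a possible alternative to building one crosstab per chunk, the per-chunk counts can be computed with `groupby` and then combined with a second `groupby` on the index. This is a sketch on two invented toy chunks, not the notebook's method; `partials` and `totals` are illustrative names.

```python
import pandas as pd

# Two toy chunks with the same columns the notebook's chunks carry
chunks = [
    pd.DataFrame({'id_centro_vacunacion': [1, 3, 3], 'vacunado': [1, 1, 1]}),
    pd.DataFrame({'id_centro_vacunacion': [3, 5], 'vacunado': [1, 1]}),
]

# Count per center inside each chunk, then add up the partial counts
partials = [c.groupby('id_centro_vacunacion')['vacunado'].sum() for c in chunks]
totals = (pd.concat(partials)
            .groupby(level=0).sum()
            .rename('vacunados')
            .reset_index())

print(totals)
```

Center 3 appears in both chunks, so its partial counts (2 and 1) are added into a single row with 3.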


## 3. Getting the department of each vaccination center
Read the other two CSV files, which hold the location information.

In this case it is necessary to match ...

In [23]:
ubigeo_url = 'RawData/TB_UBIGEOS.csv'
vaccenter_url = 'RawData/TB_CENTRO_VACUNACION.csv'

ubigeo = pd.read_csv(ubigeo_url, usecols = ['id_ubigeo', 'departamento'])
vaccenter = pd.read_csv(vaccenter_url, usecols= ['id_centro_vacunacion','id_ubigeo'])

del ubigeo_url, vaccenter_url

vaccenter = vaccenter.merge(ubigeo, on = 'id_ubigeo', how = 'left')
del vaccenter['id_ubigeo']

print("Head of the merged dataframe (vaccenter) with: 'id_centro_vacunacion' and 'departamento'") 
print(vaccenter.head(10))

Head of the merged dataframe (vaccenter) with: 'id_centro_vacunacion' and 'departamento'
   id_centro_vacunacion departamento
0                  2021         PUNO
1                  3699   SAN MARTIN
2                   154   SAN MARTIN
3                   155   SAN MARTIN
4                  3260   SAN MARTIN
5                  2906       ANCASH
6                  2907       ANCASH
7                  2909       ANCASH
8                  2910       ANCASH
9                  2912       ANCASH


Note that the new dataframe **'vaccenter'** can be used to get the department of each vaccination center

## 4. Find the department of all fully vaccinated people (2 doses)

In [24]:
vacxdep_sum = vacxdep_sum.merge(vaccenter, on = 'id_centro_vacunacion', how = 'left')
del vacxdep_sum['id_centro_vacunacion']

print(vacxdep_sum.head())

   vacunados departamento
0        331         LIMA
1     150269         LIMA
2       7998         LIMA
3      33530         LIMA
4      44992         LIMA


Finally, get the total number of fully vaccinated people by grouping by department.

In [25]:
ct_vacxdep = vacxdep_sum.groupby(['departamento']).sum()    # Sum by departments
ct_vacxdep.loc['PERU',:] = ct_vacxdep.sum(axis = 0)         # Total of fully vaccinated
ct_vacxdep['vacunados'] = ct_vacxdep['vacunados'].apply(np.int64) 

print(ct_vacxdep)

               vacunados
departamento            
AMAZONAS          165057
ANCASH            644423
APURIMAC          201423
AREQUIPA          763523
AYACUCHO          243124
CAJAMARCA         610599
CALLAO            695497
CUSCO             575067
HUANCAVELICA      149145
HUANUCO           265974
ICA               515352
JUNIN             651236
LA LIBERTAD       935166
LAMBAYEQUE        597063
LIMA             6011436
LORETO            272941
MADRE DE DIOS      49748
MOQUEGUA          116037
PASCO             129634
PIURA             777954
PUNO              345062
SAN MARTIN        374244
TACNA             211017
TUMBES            128651
UCAYALI           188548
PERU            15617921


## 5. Merge fully vaccinated and confirmed deaths from COVID-19 per department

In [26]:
vndxdep = pd.concat([ct_vacxdep, dep_deaths], axis=1)
vndxdep.reset_index(level=0, inplace=True)
vndxdep.rename(columns = {'index':'departamento'}, inplace = True)

print(vndxdep)

     departamento  vacunados  fallecidos
0        AMAZONAS     165057        1258
1          ANCASH     644423        6674
2        APURIMAC     201423        1510
3        AREQUIPA     763523        9680
4        AYACUCHO     243124        2121
5       CAJAMARCA     610599        4135
6          CALLAO     695497        9982
7           CUSCO     575067        4794
8    HUANCAVELICA     149145        1161
9         HUANUCO     265974        2700
10            ICA     515352        8367
11          JUNIN     651236        7020
12    LA LIBERTAD     935166       10196
13     LAMBAYEQUE     597063        8610
14           LIMA    6011436       88734
15         LORETO     272941        4174
16  MADRE DE DIOS      49748         773
17       MOQUEGUA     116037        1516
18          PASCO     129634        1048
19          PIURA     777954       11996
20           PUNO     345062        4174
21     SAN MARTIN     374244        3012
22          TACNA     211017        1954
23         TUMBE
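
`pd.concat(..., axis=1)` lines the two tables up by their index labels, not by row position, so the department order in each table does not matter. A minimal sketch with invented numbers follows; `rename_axis` is shown as one optional way to name the index before `reset_index`, in place of renaming the resulting `index` column afterwards.

```python
import pandas as pd

# Same departments, deliberately listed in a different order
vac = pd.DataFrame({'vacunados': [100, 200]}, index=['LIMA', 'PUNO'])
dea = pd.DataFrame({'fallecidos': [7, 3]}, index=['PUNO', 'LIMA'])

both = pd.concat([vac, dea], axis=1)    # Aligned on the department labels
flat = both.rename_axis('departamento').reset_index()

print(flat)
```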

## 6. Adding the number of inhabitants per department

Population based on: https://es.wikipedia.org/wiki/Anexo:Departamentos_del_Per%C3%BA_por_poblaci%C3%B3n

In [27]:
dic_dep = {
    "AMAZONAS"	:	426806,
    "ANCASH"	:	1180638,
    "APURIMAC"	:	430736,
    "AREQUIPA"	:	1497438,
    "AYACUCHO"	:	668213,
    "CAJAMARCA"	:	1453711,
    "CALLAO"	:	1129854,
    "CUSCO"	    :	1357075,
    "HUANCAVELICA":	365317,
    "HUANUCO"	:	760267,
    "ICA"	    :	975182,
    "JUNIN"	    :	1361467,
    "LA LIBERTAD":	2016771,
    "LAMBAYEQUE":	1310785,
    "LIMA"	    :	10628470,
    "LORETO"	:	1027559,
    "MADRE DE DIOS":173811,
    "MOQUEGUA"	:	192740,
    "PASCO"	    :	271904,
    "PIURA"	    :	2047954,
    "PUNO"	    :	1237997,
    "SAN MARTIN":	899648,
    "TACNA"	    :	370974,
    "TUMBES"	:	251521,
    "UCAYALI"	:	589110,
    "PERU"  	:	32625948,
}

In [28]:
vndxdep['no_habitantes'] = vndxdep['departamento'].map(dic_dep)
vndxdep = vndxdep[['departamento', 'no_habitantes', 'vacunados', 'fallecidos']].copy()   # Copy so new columns can be added safely

# Deaths per 100,000 inhabitants per department
vndxdep['tasa_mortalidad'] = (vndxdep['fallecidos']/vndxdep['no_habitantes'])*100000

# % of people fully vaccinated per department
vndxdep['vac_porcentaje'] = (vndxdep['vacunados']*100)/vndxdep['no_habitantes']

print(vndxdep)

     departamento  no_habitantes  vacunados  fallecidos  tasa_mortalidad  \
0        AMAZONAS         426806     165057        1258       294.747497   
1          ANCASH        1180638     644423        6674       565.287582   
2        APURIMAC         430736     201423        1510       350.562758   
3        AREQUIPA        1497438     763523        9680       646.437448   
4        AYACUCHO         668213     243124        2121       317.413759   
5       CAJAMARCA        1453711     610599        4135       284.444432   
6          CALLAO        1129854     695497        9982       883.476980   
7           CUSCO        1357075     575067        4794       353.259768   
8    HUANCAVELICA         365317     149145        1161       317.806179   
9         HUANUCO         760267     265974        2700       355.138392   
10            ICA         975182     515352        8367       857.993687   
11          JUNIN        1361467     651236        7020       515.620283   
12    LA LIB
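
As a quick check of the two formulas, the AMAZONAS row can be reproduced by hand from the figures printed above (426806 inhabitants, 165057 fully vaccinated, 1258 deaths). The variable names below are illustrative.

```python
# Figures for AMAZONAS, copied from the tables above
habitantes = 426806
vacunados = 165057
fallecidos = 1258

tasa_mortalidad = fallecidos / habitantes * 100000   # Deaths per 100,000 inhabitants
vac_porcentaje = vacunados * 100 / habitantes        # Percentage fully vaccinated

print(round(tasa_mortalidad, 2))   # 294.75
print(round(vac_porcentaje, 2))    # 38.67
```

The first value matches the `tasa_mortalidad` of 294.747497 shown for AMAZONAS in the output.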

Save the result:

In [29]:
vndxdep.to_csv('Data/TOTAL_vacunados_y_fallecidos_x_departamento.csv',index = False)