# Obteniendo el número **total de personas completamente vacunadas (2 dosis)** y el número **total de muertes** confirmadas por COVID-19 por cada uno de los **24 departamentos** del Perú 

## 0. Crosstab of the TOTAL confirmed deaths from COVID-19 by department.
Use 'falxdep' dataframe

In [1]:
import pandas as pd
import numpy as np

import functions as fn

## 2. Get the total of people FULLY VACCINATED (2 doses) by the 24 departments of Peru

There are no direct way to found the total of people fully vaccinated per departments. To achieve this the following is planned:

1. The vaccination dataset **(RawData/TB_VACUNACION_COVID19.csv)** only gives information about the vaccination center called *id_centro_vacunacion*. NOT the department or another relevant location.

2. The vaccination centers dataset **(RawData/TB_CENTRO_VACUNACION.csv)** can be used to "match" the *id_centro_vacunacion* with the *id_ubigeo*. Which is a numeric variable from 0 to 1894 of the specifict district.

3.  Finally with the UBIGEO dataset **(RawData/TB_UBIGEOS.csv)** it is possible to "match" each *id_ubigeo* with the correct department.


In [2]:
def vac_department(vac_url):
    """
    Función que toma la dirección del dataset de vacunados y devuelve el número de VACUNADOS 
    por los 24 departamentos del Perú
    """
    vac_col = ['id_centro_vacunacion', 'dosis','fecha_vacunacion']                     
    df_vac = fn.read_largeCSV_file(vac_url, ',', vac_col)    
    lst_vac = fn.df_into_chunks(df_vac)               
    
    for df in lst_vac:                                           
        df = df.drop(df[df["dosis"] == 1].index,  inplace=True)     # Drop non fully vaccinated (1 dose)

    for df in lst_vac:
        df['vacunado'] = 1  # To count each case
        df['vacunado'] = df['vacunado'].apply(np.int8)
        del df['dosis']     # Dose var is no needed anymore

    return lst_vac

In [3]:
vac_url = "RawData/TB_VACUNACION_COVID19.csv"
vacxdep = vac_department(vac_url)
del vac_url

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats i

In [7]:
print("Head of the first chunk of all fully vaccinated people in Peru (2 doses) per place of vaccination:")
print(vacxdep[0].head())

Head of the first chunk of all fully vaccinated people in Peru (2 doses) per place of vaccination:
  fecha_vacunacion  id_centro_vacunacion  vacunado
0       19/07/2021                    17         1
1       17/06/2021                  1828         1
2       11/06/2021                103617         1
3       28/07/2021                   891         1
7        5/06/2021                108281         1


Note how each fully vaccinated person has a location id. Now it is necessary to collect each fully vaccine case by this id.

## 2. From a list of chunks to a summary dataframe of people fully vaccinated (2 doses) per department

**IMPORTANT:** "vacxdep" variable is actually a list of *dataframes* or *chunks*.

In [8]:
def vacxdep_chunks(dfs_vac):
    """
    Devuelve un dataframe con el TOTAL DE VACUNADOS por DEPARTAMENTO 
    del PERÚ (recibe una lista de dataframes o chunks)
    """
    var_holder = {}     # Diccionario para guardar nombres                                             
    lst_epi_vac = []    # Lista de dfs para cada sumatoria de chunks
                                         
    for i, chunk in enumerate(dfs_vac):
        var_holder['epi_vac_' + str(i)]= pd.crosstab(index=[chunk['id_centro_vacunacion']],
                                                     columns=chunk['vacunado'])
        lst_epi_vac.append(var_holder['epi_vac_' + str(i)])
    
    merged_epivac = pd.concat(lst_epi_vac, axis=1)  # Merge all dfs
    epi_vac = pd.DataFrame(merged_epivac.sum(numeric_only=True, axis=1))
    epi_vac.columns = ['vacunados']
    epi_vac['vacunados'] = epi_vac['vacunados'].astype(np.int64)
    epi_vac.reset_index(level=0, inplace=True)

    return epi_vac

In [10]:
vacxdep_sum = vacxdep_chunks(vacxdep)
print("Head of a SUMMARY dataframe of all fully vaccinated people in Peru (2 doses) per place of vaccination:")
print(vacxdep_sum.head()), print("\n")

print('Fully vaccinated (2 doses): ' + str(vacxdep_sum['vacunados'].sum())) 

Head of a SUMMARY dataframe of all fully vaccinated people in Peru (2 doses) per place of vaccination:
   id_centro_vacunacion  vacunados
0                     1        331
1                     3     150269
2                     5       7998
3                     6      33530
4                     8      44992


Fully vaccinated (2 doses): 15676507


## 3. Getting the department of each vaccination center
Read the 2 other csv with the directions

For this case is necesary to match ...

In [12]:
ubigeo_url = 'RawData/TB_UBIGEOS.csv'
vaccenter_url = 'RawData/TB_CENTRO_VACUNACION.csv'

ubigeo = pd.read_csv(ubigeo_url, usecols = ['id_ubigeo', 'departamento'])
vaccenter = pd.read_csv(vaccenter_url, usecols= ['id_centro_vacunacion','id_ubigeo'])

del ubigeo_url, vaccenter_url

vaccenter = vaccenter.merge(ubigeo, on = 'id_ubigeo', how = 'left')
del vaccenter['id_ubigeo']

print("Head of the merged dataframe (vaccenter) with: 'id_centro_vacunacion' and 'departamento'") 
print(vaccenter.head(10))



Head of the merged dataframe (vaccenter) with: 'id_centro_vacunacion' and 'departamento'
   id_centro_vacunacion departamento
0                  2021         PUNO
1                  3699   SAN MARTIN
2                   154   SAN MARTIN
3                   155   SAN MARTIN
4                  3260   SAN MARTIN
5                  2906       ANCASH
6                  2907       ANCASH
7                  2909       ANCASH
8                  2910       ANCASH
9                  2912       ANCASH


Note that the new dataframe **'vaccenter'** can be used to get the department of each vaccination center

## 4. Found the deparment of all people fully vaccinated (2 doses)

In [None]:
vacxdep_sum = vacxdep_sum.merge(vaccenter, on = 'id_centro_vacunacion', how = 'left')
del vacxdep_sum['id_centro_vacunacion']

print(vacxdep_sum.head())

Finally just get the total of vaccinated grouping by department.

In [None]:
ct_vacxdep = vacxdep_sum.groupby(['departamento']).sum()    # Sum by departments
ct_vacxdep.loc['PERU',:] = ct_vacxdep.sum(axis = 0)         # Total of fully vaccinated
ct_vacxdep['vacunados'] = ct_vacxdep['vacunados'].apply(np.int64) 

print(ct_vacxdep)

# 5. Merge fully vaccinated and confirmed deaths from COVID-19 per departments

In [None]:
vndxdep = pd.concat([ct_vacxdep, dep_deaths], axis=1)
vndxdep.reset_index(level=0, inplace=True)
vndxdep.rename(columns = {'index':'departamento'}, inplace = True)

print(vndxdep)

# 6. Adding the number of inhabitants per department

Population based on: https://es.wikipedia.org/wiki/Anexo:Departamentos_del_Per%C3%BA_por_poblaci%C3%B3n

In [13]:
dic_dep = {
    "AMAZONAS"	:	426806,
    "ANCASH"	:	1180638,
    "APURIMAC"	:	430736,
    "AREQUIPA"	:	1497438,
    "AYACUCHO"	:	668213,
    "CAJAMARCA"	:	1453711,
    "CALLAO"	:	1129854,
    "CUSCO"	    :	1357075,
    "HUANCAVELICA":	365317,
    "HUANUCO"	:	760267,
    "ICA"	    :	975182,
    "JUNIN"	    :	1361467,
    "LA LIBERTAD":	2016771,
    "LAMBAYEQUE":	1310785,
    "LIMA"	    :	10628470,
    "LORETO"	:	1027559,
    "MADRE DE DIOS":173811,
    "MOQUEGUA"	:	192740,
    "PASCO"	    :	271904,
    "PIURA"	    :	2047954,
    "PUNO"	    :	1237997,
    "SAN MARTIN":	899648,
    "TACNA"	    :	370974,
    "TUMBES"	:	251521,
    "UCAYALI"	:	589110,
    "PERU"  	:	32625948,
}

In [None]:
vndxdep['no_habitantes'] = vndxdep['departamento'].map(dic_dep)
vndxdep = vndxdep[['departamento', 'no_habitantes', 'vacunados', 'fallecidos']]

# Mortality rate per 100k per department
vndxdep['tasa_mortalidad'] = (vndxdep['fallecidos']/vndxdep['no_habitantes'])*100000

# % of people fully vaccinated per department
vndxdep['vac_porcentaje'] = (vndxdep['vacunados']*100)/vndxdep['no_habitantes']

print(vndxdep)

Guardamos:

In [None]:
vndxdep.to_csv('Data/vac_fal_x_departamento.csv',index = False)

# Número de vacunados por semana epidemiológica de cada departamento

## 0. Usamos la data incial de vacunados por departamento (vacxdep)

In [11]:
print("Head of the first chunk of all fully vaccinated people in Peru (2 doses) per place of vaccination:")
print(vacxdep[0].head())

Head of the first chunk of all fully vaccinated people in Peru (2 doses) per place of vaccination:
  fecha_vacunacion  id_centro_vacunacion  vacunado
0       19/07/2021                    17         1
1       17/06/2021                  1828         1
2       11/06/2021                103617         1
3       28/07/2021                   891         1
7        5/06/2021                108281         1


## 1. Transformamos a formato fecha (yyyy-mm-dd) la columna "fecha_vacunacion" de todos los chunks

In [12]:
for i, chunk in enumerate(vacxdep):
    chunk = fn.variable_fecha_ymd(chunk, "fecha_vacunacion")
    print(vacxdep[i].head())

## 2. Obtenemos la semana epidemiológica y año de la columna "fecha_vacunacion" ahora en formato fecha (yyyy-mm-dd) de todos los chunks

In [13]:
for i, chunk in enumerate(vacxdep):
    chunk = fn.date_to_epiweek(chunk, "fecha_vacunacion")

print(vacxdep[0].head())

   id_centro_vacunacion  vacunado  epi_year  epi_week
0                    17         1      2021        29
1                  1828         1      2021        24
2                103617         1      2021        23
3                   891         1      2021        30
7                108281         1      2021        22


## 3. Reemplazar id_centro_vacunacion con el departamento respectivo

In [14]:
ubigeo_url = 'RawData/TB_UBIGEOS.csv'
vaccenter_url = 'RawData/TB_CENTRO_VACUNACION.csv'

ubigeo = pd.read_csv(ubigeo_url, usecols = ['id_ubigeo', 'departamento'])
vaccenter = pd.read_csv(vaccenter_url, usecols= ['id_centro_vacunacion','id_ubigeo'])

del ubigeo_url, vaccenter_url

vaccenter = vaccenter.merge(ubigeo, on = 'id_ubigeo', how = 'left')
del vaccenter['id_ubigeo']

print("Head of the merged dataframe (vaccenter) with: 'id_centro_vacunacion' and 'departamento'") 
print(vaccenter.head(10))

Head of the merged dataframe (vaccenter) with: 'id_centro_vacunacion' and 'departamento'
   id_centro_vacunacion departamento
0                  2021         PUNO
1                  3699   SAN MARTIN
2                   154   SAN MARTIN
3                   155   SAN MARTIN
4                  3260   SAN MARTIN
5                  2906       ANCASH
6                  2907       ANCASH
7                  2909       ANCASH
8                  2910       ANCASH
9                  2912       ANCASH


## 4.

In [30]:

df = vacxdep[0].merge(vaccenter, on = 'id_centro_vacunacion', how = 'left')
#del vacxdep_id['id_centro_vacunacion']

print(df.head())

   id_centro_vacunacion  vacunado  epi_year  epi_week departamento
0                    17         1      2021        29         LIMA
1                  1828         1      2021        24   LAMBAYEQUE
2                103617         1      2021        23     AYACUCHO
3                   891         1      2021        30         LIMA
4                108281         1      2021        22   LAMBAYEQUE


In [34]:
for i, chunk in enumerate(vacxdep):
    chunk = chunk.merge(vaccenter, on = 'id_centro_vacunacion', how = 'left')
    vacxdep[i] = chunk

print(vacxdep[0].head())

   id_centro_vacunacion  vacunado  epi_year  epi_week departamento
0                    17         1      2021        29         LIMA
1                  1828         1      2021        24   LAMBAYEQUE
2                103617         1      2021        23     AYACUCHO
3                   891         1      2021        30         LIMA
4                108281         1      2021        22   LAMBAYEQUE


## 3. Unimos nu

In [46]:
def vacxdep_chunks(dfs_vac):
    """
    Devuelve un dataframe con el TOTAL DE VACUNADOS por DEPARTAMENTO 
    del PERÚ (recibe una lista de dataframes o chunks)
    """
    var_holder = {}     # Diccionario para guardar nombres                                             
    lst_epi_vac = []    # Lista de dfs para cada sumatoria de chunks
                                         
    for i, chunk in enumerate(dfs_vac):
        var_holder['epi_vac_' + str(i)]= pd.crosstab(index = [chunk['epi_year'], chunk['epi_week']],
                                                     columns = [chunk['departamento']])
        lst_epi_vac.append(var_holder['epi_vac_' + str(i)])

    merged_epivac = pd.concat(lst_epi_vac, axis=1)  # Merge all dfs
    merged_epivac = merged_epivac.fillna(0).astype(np.int64)
    #epi_vac = pd.DataFrame(merged_epivac.sum(numeric_only=True, axis=1))
    #epi_vac.columns = ['vacunados']
    #epi_vac['vacunados'] = epi_vac['vacunados'].astype(np.int64)
    #epi_vac.reset_index(level=0, inplace=True)

    return merged_epivac

In [47]:
vacxdepxsemEpi = vacxdep_chunks(vacxdep)

In [48]:
print(vacxdepxsemEpi.head(50))
vacxdepxsemEpi.to_csv('Data/vacunados_x_departamento_x_semanaEpi.csv')

departamento       AMAZONAS  ANCASH  APURIMAC  AREQUIPA  AYACUCHO  CAJAMARCA  \
epi_year epi_week                                                              
2021     6                0       0         0         0         0          0   
         7                0       0         0         0         0          0   
         8                0       0         0         0         0          0   
         9               13      20        15        49        29         30   
         10               4      10        10        62        32          6   
         11              47      67        27        67        12         62   
         12               5      16        11        35         5         18   
         13               3       4         1         6         3          6   
         14               3       6         7        32         8          6   
         15               0      56         2       129         0          4   
         16               2       6     