## Dataset description

We work with two groups of datasets: **coronavirus cases** and **vaccinations by country**.

## Dataset source

Kaggle datasets published by Sumit Kumar: www.kaggle.com/code/anaverageengineer/will-vaccination-able-to-compress-covid-cases/data

Scraped by Joseph Assaker on 30 June 2021 from: www.worldometers.info/coronavirus

www.worldometers.info is a website that has been providing statistics on a variety of topics for the past 15 years. Here it is used as the source of data on the evolution of the COVID-19 pandemic.

The data is contributed by people around the world and updated daily. It is checked and validated by a team of analysts.

The site does not state any specific policy on the use of its data, only that the data can be accessed free of charge.

## Data exploration

In [1]:
import pandas as pd

In [2]:
df_worldwide_daily = pd.read_csv('./data/worldometer_coronavirus_daily_data.csv')
df_worldwide_summary = pd.read_csv('./data/worldometer_coronavirus_summary_data.csv')

df_country_vaccinations = pd.read_csv('./data/country_vaccinations.csv')
df_country_vaccinations_by_manufacturer = pd.read_csv('./data/country_vaccinations_by_manufacturer.csv')

In [3]:
display(df_worldwide_daily.head())
display(df_worldwide_summary.head())

Unnamed: 0,date,country,cumulative_total_cases,daily_new_cases,active_cases,cumulative_total_deaths,daily_new_deaths
0,2020-2-15,Afghanistan,0.0,,0.0,0.0,
1,2020-2-16,Afghanistan,0.0,,0.0,0.0,
2,2020-2-17,Afghanistan,0.0,,0.0,0.0,
3,2020-2-18,Afghanistan,0.0,,0.0,0.0,
4,2020-2-19,Afghanistan,0.0,,0.0,0.0,


Unnamed: 0,country,continent,total_confirmed,total_deaths,total_recovered,active_cases,serious_or_critical,total_cases_per_1m_population,total_deaths_per_1m_population,total_tests,total_tests_per_1m_population,population
0,Afghanistan,Asia,120216,4962.0,71012.0,44242.0,1124.0,3021,125.0,612112.0,15381.0,39797047
1,Albania,Europe,132521,2456.0,130009.0,56.0,3.0,46100,854.0,805546.0,280223.0,2874665
2,Algeria,Africa,139626,3716.0,97089.0,38821.0,32.0,3128,83.0,230861.0,5172.0,44636630
3,Andorra,Europe,13911,127.0,13720.0,64.0,3.0,179757,1641.0,193595.0,2501615.0,77388
4,Angola,Africa,38849,900.0,33242.0,4707.0,11.0,1147,27.0,648883.0,19154.0,33876821


**df_worldwide_daily**:
- date: designates the date of observation of the row's data, in year-month-day order (the raw file does not zero-pad the month or day, e.g. 2020-2-15).
- country: designates the country in which the row's data was observed.
- cumulative_total_cases: designates the cumulative number of confirmed cases as of the row's date, for the row's country.
- daily_new_cases: designates the daily new number of confirmed cases on the row's date, for the row's country.
- active_cases: designates the number of active cases (i.e., confirmed cases that have neither recovered nor died) on the row's date, for the row's country.
- cumulative_total_deaths: designates the cumulative number of confirmed deaths as of the row's date, for the row's country.
- daily_new_deaths: designates the daily new number of confirmed deaths on the row's date, for the row's country.

**df_worldwide_summary**:
- country: designates the country in which the row's data was observed.
- continent: designates the continent of the observed country.
- total_confirmed: designates the total number of confirmed cases in the observed country.
- total_deaths: designates the total number of confirmed deaths in the observed country.
- total_recovered: designates the total number of confirmed recoveries in the observed country.
- active_cases: designates the number of active cases in the observed country.
- serious_or_critical: designates the estimated number of cases in serious or critical conditions in the observed country.
- total_cases_per_1m_population: designates the number of total cases per 1 million population in the observed country.
- total_deaths_per_1m_population: designates the number of total deaths per 1 million population in the observed country.
- total_tests: designates the number of total tests done in the observed country.
- total_tests_per_1m_population: designates the number of total tests done per 1 million population in the observed country.
- population: designates the population count in the observed country.
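
The per-million columns look like they are derived from the raw counts and `population`. As a quick sanity check, the sketch below recomputes `total_cases_per_1m_population` for the five rows displayed above. It assumes the published figure is `total_confirmed / population * 1,000,000` rounded to the nearest integer; the rounding rule is an assumption, not something the source documents, and the helper name `cases_per_million` is illustrative.

```python
# Rows copied from the df_worldwide_summary.head() output above:
# (country, total_confirmed, population, reported total_cases_per_1m_population)
rows = [
    ("Afghanistan", 120216, 39797047, 3021),
    ("Albania", 132521, 2874665, 46100),
    ("Algeria", 139626, 44636630, 3128),
    ("Andorra", 13911, 77388, 179757),
    ("Angola", 38849, 33876821, 1147),
]


def cases_per_million(total_confirmed, population):
    """Cases per 1 million inhabitants, rounded to the nearest integer (assumed rounding)."""
    return round(total_confirmed / population * 1_000_000)


# Collect the countries whose recomputed value differs from the reported one.
mismatches = [
    country
    for country, confirmed, population, reported in rows
    if cases_per_million(confirmed, population) != reported
]
print(mismatches)
```

For these five rows the recomputed values agree with the reported column, so the list is empty. The same check can be run over the full table with `(df_worldwide_summary['total_confirmed'] / df_worldwide_summary['population'] * 1e6).round()`.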

In [4]:
countries = df_worldwide_daily['country'].nunique()
print(f'Number of countries: {countries}')

df_worldwide_daily['date'] = pd.to_datetime(df_worldwide_daily['date'])
start_date = df_worldwide_daily['date'].min()
print(f'Start date: {start_date}')

end_date = df_worldwide_daily['date'].max()
print(f'End date: {end_date}')

records = df_worldwide_daily.shape[0]
print(f'Number of records: {records}')

Number of countries: 220
Start date: 2020-01-22 00:00:00
End date: 2021-06-30 00:00:00
Number of records: 110464
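
These figures allow a quick completeness check. If every country had one row for every day in the range, the record count would equal the number of countries times the number of days. The sketch below does that arithmetic with the standard library, using only the values printed above; the variable names are illustrative.

```python
from datetime import date

# Values printed by the cell above.
n_countries = 220
start = date(2020, 1, 22)
end = date(2021, 6, 30)
n_records = 110464

# Inclusive number of calendar days in the range.
n_days = (end - start).days + 1

# Row count if every country had exactly one row per day.
expected_if_complete = n_countries * n_days
missing = expected_if_complete - n_records

print(n_days, expected_if_complete, missing)  # 526 115720 5256
```

The shortfall of 5256 rows suggests that not every country covers the full date range. A per-country view such as `df_worldwide_daily.groupby('country')['date'].agg(['min', 'max', 'count'])` would show where the gaps are.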


In [5]:
display(df_country_vaccinations.head())
display(df_country_vaccinations_by_manufacturer.head())

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
0,Afghanistan,AFG,2021-02-22,0.0,0.0,,,,0.0,0.0,,,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech",World Health Organization,https://covid19.who.int/
1,Afghanistan,AFG,2021-02-23,,,,,1367.0,,,,35.0,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech",World Health Organization,https://covid19.who.int/
2,Afghanistan,AFG,2021-02-24,,,,,1367.0,,,,35.0,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech",World Health Organization,https://covid19.who.int/
3,Afghanistan,AFG,2021-02-25,,,,,1367.0,,,,35.0,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech",World Health Organization,https://covid19.who.int/
4,Afghanistan,AFG,2021-02-26,,,,,1367.0,,,,35.0,"BBIBP-CorV, Oxford/AstraZeneca, Pfizer/BioNTech",World Health Organization,https://covid19.who.int/


Unnamed: 0,location,date,vaccine,total_vaccinations
0,Austria,2021-01-08,Johnson&Johnson,0
1,Austria,2021-01-08,Moderna,0
2,Austria,2021-01-08,Oxford/AstraZeneca,0
3,Austria,2021-01-08,Pfizer/BioNTech,30938
4,Austria,2021-01-15,Johnson&Johnson,0


In [6]:
countries = df_country_vaccinations['country'].nunique()
print(f'Number of countries: {countries}')

start_date = df_country_vaccinations['date'].min()
print(f'Start date: {start_date}')

end_date = df_country_vaccinations['date'].max()
print(f'End date: {end_date}')

records = df_country_vaccinations.shape[0]
print(f'Number of records: {records}')

Number of countries: 217
Start date: 2020-12-02
End date: 2021-07-05
Number of records: 29086
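
Unlike the daily-cases cell, this cell calls `min()` and `max()` directly on the `date` strings. That works here because the vaccination dates shown above are zero-padded (`2021-02-22`), so alphabetical order matches chronological order. The daily-cases file uses unpadded dates such as `2020-2-15`, where comparing strings can give the wrong answer. A minimal illustration, in which `2020-10-1` is a made-up date added purely for contrast:

```python
from datetime import datetime

# "2020-2-15" is the format seen in the daily-cases file; "2020-10-1" is a
# hypothetical later date written in the same unpadded style.
unpadded = ["2020-2-15", "2020-10-1"]

# Text comparison: "1" sorts before "2", so October wrongly comes out as the minimum.
min_as_text = min(unpadded)

# Parsing first gives the chronological minimum; strptime accepts unpadded month/day.
min_as_date = min(datetime.strptime(d, "%Y-%m-%d") for d in unpadded)

print(min_as_text)                     # 2020-10-1
print(min_as_date.date().isoformat())  # 2020-02-15
```

Converting with `pd.to_datetime` before aggregating, as the daily-cases cell does, avoids depending on how the source file happens to format its dates.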


## Rationale for the choice of data

The main dataset is the coronavirus cases dataset, since it provides the most information about how the pandemic evolved. It can also be compared against the vaccination data to see whether vaccination is reducing the number of cases. The vaccinations-by-country dataset, in turn, shows how vaccination progressed in each country and lets us compare that with the course of the pandemic there.
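
Comparing cases against vaccinations means joining the two tables on country and date. The sketch below shows one way this might be done with `DataFrame.merge`. The rows are toy data with made-up numbers; only the column names and the two date formats mirror the real files. Note that the real tables differ in coverage (220 vs. 217 countries, end dates of 2021-06-30 vs. 2021-07-05), so an inner join will drop the rows that have no counterpart.

```python
import pandas as pd

# Toy rows with made-up numbers; only column names and date formats
# mirror the real files.
cases = pd.DataFrame({
    "country": ["Spain", "Spain"],
    "date": ["2021-3-1", "2021-3-2"],        # unpadded, as in the daily-cases file
    "daily_new_cases": [100.0, 120.0],
})
vaccinations = pd.DataFrame({
    "country": ["Spain", "Spain"],
    "date": ["2021-03-02", "2021-03-03"],    # zero-padded, as in country_vaccinations
    "daily_vaccinations": [5000.0, 5200.0],
})

# Parse both date columns so the join keys compare as dates, not as text.
cases["date"] = pd.to_datetime(cases["date"])
vaccinations["date"] = pd.to_datetime(vaccinations["date"])

# An inner join keeps only the country/date pairs present in both tables.
merged = cases.merge(vaccinations, on=["country", "date"], how="inner")
print(merged)
```

Only 2 March appears in both toy tables, so the result has a single row. Passing `how="outer"` instead would keep unmatched rows and fill the missing side with `NaN`, which can be useful for spotting coverage gaps.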

## Questions of interest

- How has the pandemic evolved worldwide?

- How has vaccination evolved worldwide?

- How has the pandemic evolved in Spain compared with other countries?

- Which countries have vaccinated the most people?

- Have countries reacted differently to the pandemic?

- Which vaccines are being used the most?

- Can the type of vaccine be related to a higher or lower case rate?