# COVID-19 Data Visualization and Exploratory Data Analysis

## Data Source
This portfolio project explores the Our World in Data COVID-19 dataset. A copy of the dataset is available at [this site](https://github.com/owid/covid-19-data/tree/master/public/data).

The data comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The dataset covers confirmed cases and deaths, hospitalizations, testing for COVID-19, vaccinations against COVID-19, and other variables. In this notebook, we perform exploratory data analysis on the dataset. Run the code cell below to prepare this Jupyter notebook.

In [1]:
import ipywidgets as widgets
import seaborn as sns
import matplotlib.pyplot as plt


## Data Ingestion
Ingest the data into a SQL Server instance. The database should contain two tables: CovidDeaths and CovidVaccinations. Each table is imported from its own spreadsheet file, laid out as described below.

### CovidDeaths spreadsheet fields: 
The CovidDeaths spreadsheet file should have these fields in the following order:

iso_code, continent, location, date, population, total_cases, new_cases, new_cases_smoothed, total_deaths, new_deaths, new_deaths_smoothed, total_cases_per_million, new_cases_per_million, new_cases_smoothed_per_million, total_deaths_per_million, new_deaths_per_million, new_deaths_smoothed_per_million, reproduction_rate, icu_patients, icu_patients_per_million, hosp_patients, hosp_patients_per_million, weekly_icu_admissions, weekly_icu_admissions_per_million, weekly_hosp_admissions, weekly_hosp_admissions_per_million, total_tests

### CovidVaccinations spreadsheet fields:
The CovidVaccinations spreadsheet file should have these fields in the following order:

iso_code, continent, location, date, total_tests, new_tests, total_tests_per_thousand, new_tests_per_thousand, new_tests_smoothed, new_tests_smoothed_per_thousand, positive_rate, tests_per_case, tests_units, total_vaccinations, people_vaccinated, people_fully_vaccinated, total_boosters, new_vaccinations, new_vaccinations_smoothed, total_vaccinations_per_hundred, people_vaccinated_per_hundred, people_fully_vaccinated_per_hundred, total_boosters_per_hundred, new_vaccinations_smoothed_per_million, new_people_vaccinated_smoothed, new_people_vaccinated_smoothed_per_hundred, stringency_index, population, population_density, median_age, aged_65_older, aged_70_older, gdp_per_capita, extreme_poverty, cardiovasc_death_rate, diabetes_prevalence, female_smokers, male_smokers, handwashing_facilities, hospital_beds_per_thousand, life_expectancy, human_development_index, excess_mortality_cumulative_absolute, excess_mortality_cumulative, excess_mortality, excess_mortality_cumulative_per_million

Then import these two files to create two tables: PortfolioProject..CovidDeaths and PortfolioProject..CovidVaccinations.
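
One way to produce the two spreadsheet files is to split a single combined CSV by column before importing. The sketch below assumes the combined file is available locally; the helper name `split_columns`, the shortened field lists, and the sample row are illustrative and not part of the original project.

```python
import csv
import io

# Shortened, illustrative field lists; the full lists are given above.
DEATHS_FIELDS = ["iso_code", "continent", "location", "date",
                 "population", "total_cases", "total_deaths"]
VACC_FIELDS = ["iso_code", "continent", "location", "date", "new_vaccinations"]


def split_columns(source, deaths_out, vacc_out):
    """Copy selected columns from one CSV stream into two CSV streams."""
    reader = csv.DictReader(source)
    # extrasaction="ignore" drops columns that are not in fieldnames.
    deaths = csv.DictWriter(deaths_out, fieldnames=DEATHS_FIELDS, extrasaction="ignore")
    vacc = csv.DictWriter(vacc_out, fieldnames=VACC_FIELDS, extrasaction="ignore")
    deaths.writeheader()
    vacc.writeheader()
    for row in reader:
        deaths.writerow(row)
        vacc.writerow(row)


# Tiny made-up sample standing in for the downloaded file.
sample = io.StringIO(
    "iso_code,continent,location,date,population,total_cases,total_deaths,new_vaccinations\n"
    "XXX,Nowhere,Exampleland,2021-01-01,1000,10,1,5\n"
)
deaths_csv, vacc_csv = io.StringIO(), io.StringIO()
split_columns(sample, deaths_csv, vacc_csv)
print(deaths_csv.getvalue())
print(vacc_csv.getvalue())
```

With real data, the `io.StringIO` objects would be replaced by files opened with `newline=""`, as the `csv` module documentation recommends.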

## Connect to the database
Connect to the database that has been created. Next, we will explore the dataset.
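
From Python, the connection could be opened along these lines. This is a minimal sketch, not the project's own setup: it assumes the third-party `pyodbc` package, a local server named `localhost`, the "ODBC Driver 17 for SQL Server" driver, and Windows authentication. Adjust all of these to your environment.

```python
def build_connection_string(server, database="PortfolioProject",
                            driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC connection string that uses Windows authentication."""
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )


def connect(server):
    """Open a connection to the database on the given server."""
    import pyodbc  # third-party package, assumed installed; imported lazily
    return pyodbc.connect(build_connection_string(server))


# The server name is an assumption; nothing is contacted by this line.
print(build_connection_string("localhost"))
```

Calling `connect("localhost")` returns a connection object whose cursor can run the queries in the following sections.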

## EDA
The code for the exploratory data analysis of the COVID-19 dataset is below.
#### First exploration of CovidDeaths and CovidVaccinations dataset.


In [2]:
Select *
From PortfolioProject..CovidDeaths
where continent is not null
order by 3,4

Select * 
From PortfolioProject..CovidVaccinations
Where continent is not null
order by 3,4

Select Location, date, total_cases, new_cases, total_deaths, population
From PortfolioProject..CovidDeaths
Where continent is not null
order by 1,2


#### Looking at Total Cases vs. Total Deaths
#### Shows likelihood of dying if you contract COVID in the US.

In [None]:
Select Location, date, total_cases, total_deaths, (total_deaths/total_cases)*100 AS DeathPercentage
From PortfolioProject..CovidDeaths
Where Location like '%states%'
and continent is not null
order by 1,2
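
To see what the death-percentage calculation does without a SQL Server instance, the sketch below runs an equivalent query against an in-memory SQLite database. SQLite is only a stand-in here, and the rows are made-up numbers chosen for easy arithmetic, not values from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CovidDeaths ("
    "location TEXT, continent TEXT, date TEXT, "
    "total_cases INTEGER, total_deaths INTEGER)"
)
conn.executemany(
    "INSERT INTO CovidDeaths VALUES (?, ?, ?, ?, ?)",
    [
        ("United States", "North America", "2021-01-01", 200, 4),
        ("United States", "North America", "2021-01-02", 400, 10),
        ("World", None, "2021-01-01", 1000, 30),  # aggregate row: continent is NULL
    ],
)

# Multiplying by 100.0 first forces floating-point division;
# dividing two integers would truncate the result.
rows = conn.execute(
    """
    SELECT location, date, total_cases, total_deaths,
           100.0 * total_deaths / total_cases AS DeathPercentage
    FROM CovidDeaths
    WHERE location LIKE '%states%' AND continent IS NOT NULL
    ORDER BY location, date
    """
).fetchall()

for row in rows:
    print(row)
```

The aggregate "World" row is excluded by the `continent IS NOT NULL` filter, which is why that filter appears in most queries in this notebook.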

#### Looking at Total Cases vs. Population.

In [None]:
Select Location, date, total_cases, population, (total_cases/population)*100 as CasePercentage
From PortfolioProject..CovidDeaths
Where continent is not null
order by 1,2

#### Looking at Countries with Highest Infection Rate Compared to Populations

In [None]:
Select Location, population, MAX(total_cases) as HighestInfectionCount, Max(total_cases/population)*100 as PercentPopulationInfected
From PortfolioProject..CovidDeaths
Where continent is not null
Group by Location, Population
order by PercentPopulationInfected desc

#### Showing Countries with Highest Death Count

In [None]:
Select Location, MAX(cast(total_deaths as int)) as TotalDeathCount
From PortfolioProject..CovidDeaths
Where continent is not null
Group by Location
order by TotalDeathCount desc

#### Break things down by continent

In [None]:
Select continent, MAX(cast(total_deaths as int)) as TotalDeathCount
From PortfolioProject..CovidDeaths
Where continent is not null
Group by continent
Order by TotalDeathCount desc

#### GLOBAL NUMBERS

In [None]:
Select date, sum(new_cases) as total_cases, sum(cast(new_deaths as int)) as total_deaths, SUM(cast(new_deaths as int))/SUM(new_cases)*100 as DeathPercentage
From PortfolioProject..CovidDeaths
where continent is not null
group by date
order by 1,2

Select sum(new_cases) as total_cases, sum(cast(new_deaths as int)) as total_deaths, SUM(cast(new_deaths as int))/SUM(new_cases)*100 as DeathPercentage
From PortfolioProject..CovidDeaths
where continent is not null
order by 1,2

#### Looking at Total Population versus Vaccinations

In [None]:
Select dea.continent, dea.location, dea.date, dea.population,  
SUM(cast(vac.new_vaccinations as BIGINT)) OVER (Partition by dea.Location Order by dea.location, dea.date) as cumulative_vaccination, 
sum(cast(dea.new_deaths as BIGINT)) OVER (partition by dea.Location Order by dea.location, dea.date) as cumulative_deaths
From PortfolioProject..CovidDeaths dea
Join PortfolioProject..CovidVaccinations vac
	on dea.location = vac.location
	and dea.date = vac.date
where dea.continent is not null
order by 2,3
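
The `SUM(...) OVER (Partition by ... Order by ...)` expressions compute a running total that restarts for each location. The pure-Python sketch below emulates that behaviour on made-up rows; the function name `running_totals` is illustrative, and it assumes at most one row per location and date.

```python
from itertools import groupby
from operator import itemgetter

# Made-up (location, date, new_vaccinations) rows; None mimics a SQL NULL.
rows = [
    ("A", "2021-01-02", 20),
    ("A", "2021-01-01", 10),
    ("B", "2021-01-01", 5),
    ("A", "2021-01-03", None),
    ("B", "2021-01-02", 7),
]


def running_totals(rows):
    """Emulate SUM(value) OVER (PARTITION BY location ORDER BY date)."""
    result = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # by location, then date
    for location, group in groupby(ordered, key=itemgetter(0)):
        total = None  # like SQL, the sum stays NULL until a non-NULL value appears
        for _, date, value in group:
            if value is not None:  # SUM ignores NULLs
                total = (total or 0) + value
            result.append((location, date, total))
    return result


for row in running_totals(rows):
    print(row)
```

The total for location "A" carries over unchanged on the date whose value is missing, and the total for "B" starts again from zero, which mirrors how the partition resets the window.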

#### Using Common Table Expressions (CTE)


In [None]:
With PopvsVac (continent, location, date, population, cumulative_vaccination, cumulative_deaths)
as 
(
Select dea.continent, dea.location, dea.date, dea.population,  
SUM(cast(vac.new_vaccinations as BIGINT)) OVER (Partition by dea.Location Order by dea.location, dea.date) as cumulative_vaccination, 
sum(cast(dea.new_deaths as BIGINT)) OVER (partition by dea.Location Order by dea.location, dea.date) as cumulative_deaths
From PortfolioProject..CovidDeaths dea
Join PortfolioProject..CovidVaccinations vac
	on dea.location = vac.location
	and dea.date = vac.date
where dea.continent is not null
)

Select *, (cumulative_vaccination)/population*100 percentage_vaccinated
From PopvsVac
order by location, date

#### Using a temporary table

In [None]:
drop table if exists #PercentPopulationVaccinated

Create Table #PercentPopulationVaccinated
(
continent nvarchar(255),
location nvarchar(255), 
date datetime,
population numeric, 
cumulative_vaccination numeric,
cumulative_deaths numeric
)

Insert into #PercentPopulationVaccinated
Select dea.continent, dea.location, dea.date, dea.population,  
SUM(cast(vac.new_vaccinations as BIGINT)) OVER (Partition by dea.Location Order by dea.location, dea.date) as cumulative_vaccination, 
sum(cast(dea.new_deaths as BIGINT)) OVER (partition by dea.Location Order by dea.location, dea.date) as cumulative_deaths
From PortfolioProject..CovidDeaths dea
Join PortfolioProject..CovidVaccinations vac
	on dea.location = vac.location
	and dea.date = vac.date
where dea.continent is not null

Select *, (cumulative_vaccination)/population*100 percentage_vaccinated
From #PercentPopulationVaccinated
order by location, date

#### Creating view to store data for later visualizations

In [None]:
Create view PercentPopulationVaccinated as
Select dea.continent, dea.location, dea.date, dea.population,  
SUM(cast(vac.new_vaccinations as BIGINT)) OVER (Partition by dea.Location Order by dea.location, dea.date) as cumulative_vaccination, 
sum(cast(dea.new_deaths as BIGINT)) OVER (partition by dea.Location Order by dea.location, dea.date) as cumulative_deaths
From PortfolioProject..CovidDeaths dea
Join PortfolioProject..CovidVaccinations vac
	on dea.location = vac.location
	and dea.date = vac.date
where dea.continent is not null
-- CREATE VIEW must be the first statement in its batch; GO ends the batch in SSMS or sqlcmd.
GO

Select * 
From PercentPopulationVaccinated
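
Rows read from the view can then feed a plot. The following is a minimal sketch under stated assumptions: rows arrive as dictionaries keyed by the view's column names, the sample values are made up, and the helper names `percent_vaccinated_series` and `plot_percent_vaccinated` are hypothetical. Note that cumulative doses divided by population gives doses per 100 people, which can exceed 100.

```python
def percent_vaccinated_series(rows, location):
    """Return (dates, doses per 100 people) for one location, ordered by date."""
    selected = sorted(
        (r for r in rows if r["location"] == location),
        key=lambda r: r["date"],
    )
    dates = [r["date"] for r in selected]
    percents = [100.0 * r["cumulative_vaccination"] / r["population"] for r in selected]
    return dates, percents


def plot_percent_vaccinated(rows, location):
    """Draw the series as a line chart and return the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    dates, percents = percent_vaccinated_series(rows, location)
    fig, ax = plt.subplots()
    ax.plot(dates, percents, marker="o")
    ax.set_xlabel("date")
    ax.set_ylabel("cumulative vaccinations per 100 people")
    ax.set_title(location)
    return fig


# Made-up rows shaped like the view's output.
rows = [
    {"location": "Exampleland", "date": "2021-01-02", "population": 1000, "cumulative_vaccination": 25},
    {"location": "Exampleland", "date": "2021-01-01", "population": 1000, "cumulative_vaccination": 10},
    {"location": "Otherland", "date": "2021-01-01", "population": 500, "cumulative_vaccination": 5},
]
print(percent_vaccinated_series(rows, "Exampleland"))
```

In the notebook, `plot_percent_vaccinated(rows, "Exampleland")` would render the chart inline.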