<a href="https://www.kaggle.com/code/theyazilimci/covid-world-vaccination-analysis?scriptVersionId=94443532" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Covid World Vaccination 💉 

> Context: Data is collected daily from the Our World in Data GitHub repository for COVID-19, then merged and uploaded. Country-level vaccination data is gathered and assembled in a single file. This file is then merged with the locations file to include information on vaccination sources. A second file, with manufacturer information, is also included.

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 In this notebook we'll analyse the COVID World Vaccination dataset,
     which contains columns such as <br>
       <p style="font-family: Helvetica;background_color: #fffffff;">
        <li>Date information </li>
        <li>Country name </li>
        <li>Daily vaccinations per million</li>
        <li>The data source, and more </li>
        </p>
       We are going to analyse two files by writing a few helper functions and plotting the results
</div>

<img src= "https://www.interpol.int/var/interpol/storage/images/7/1/4/0/230417-1-eng-GB/Cyber_preview.jpg" alt ="Image" style='width: 900px;'>

# Libraries Used 📖
We'll use well-known Python libraries for data analysis and visualization:
*  NumPy link: https://numpy.org
*  Pandas link: https://pandas.pydata.org
*  Matplotlib link: https://matplotlib.org
*  Plotly link: https://plotly.com

<hr>

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly as py
import plotly.graph_objs as go
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot, plot
from pandas_profiling import ProfileReport

# Enable Plotly rendering inside the notebook
init_notebook_mode(connected=True)

# First Look at the Data 👀


<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 Here we create a function, getDf, that returns the original dataset. It is useful whenever the working DataFrame has been filtered or broken by a bug and we need a fresh copy.<br>
Then we'll look at the column names,<br> check the head of the data,<br> get the description and info about it ℹ️,<br> and add a profiling report to get a summary of the whole dataset.
</div>


In [2]:
def getDf():
    """Reload the country-level vaccination file and return it as a fresh DataFrame."""
    df = pd.read_csv('/kaggle/input/covid-world-vaccination-progress/country_vaccinations.csv')
    return df
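
Re-reading the CSV on every call is simple, but it repeats the disk read each time. One possible variant, sketched below on the assumption that the file does not change during the session, reads the data once and hands out copies. The names `makeLoader` and `getToyDf` and the inline CSV are hypothetical stand-ins, not part of the original notebook.

```python
import io
import pandas as pd

# Made-up miniature of country_vaccinations.csv, for illustration only
TOY_CSV = """country,iso_code,date,daily_vaccinations
Aland,ALD,2022-02-03,100
Aland,ALD,2022-02-04,120
Borduria,BRD,2022-02-04,80
"""

def makeLoader(source):
    """Return a function that reads `source` once and hands out fresh copies."""
    cache = {}

    def load():
        if 'df' not in cache:
            cache['df'] = pd.read_csv(source)
        # A copy keeps later filtering from touching the cached original
        return cache['df'].copy()

    return load

getToyDf = makeLoader(io.StringIO(TOY_CSV))

df1 = getToyDf()
df1 = df1[df1['country'] == 'Aland']   # filter the working frame
df2 = getToyDf()                       # still the full dataset

print(len(df1), len(df2))
```

In the notebook itself, `source` would be the Kaggle path passed to `pd.read_csv` in `getDf`.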


In [3]:
df = getDf()

df.columns

Index(['country', 'iso_code', 'date', 'total_vaccinations',
       'people_vaccinated', 'people_fully_vaccinated',
       'daily_vaccinations_raw', 'daily_vaccinations',
       'total_vaccinations_per_hundred', 'people_vaccinated_per_hundred',
       'people_fully_vaccinated_per_hundred', 'daily_vaccinations_per_million',
       'vaccines', 'source_name', 'source_website'],
      dtype='object')

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86512 entries, 0 to 86511
Data columns (total 15 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   country                              86512 non-null  object 
 1   iso_code                             86512 non-null  object 
 2   date                                 86512 non-null  object 
 3   total_vaccinations                   43607 non-null  float64
 4   people_vaccinated                    41294 non-null  float64
 5   people_fully_vaccinated              38802 non-null  float64
 6   daily_vaccinations_raw               35362 non-null  float64
 7   daily_vaccinations                   86213 non-null  float64
 8   total_vaccinations_per_hundred       43607 non-null  float64
 9   people_vaccinated_per_hundred        41294 non-null  float64
 10  people_fully_vaccinated_per_hundred  38802 non-null  float64
 11  daily_vaccinations_per_milli

In [5]:
df.describe()

| | total_vaccinations | people_vaccinated | people_fully_vaccinated | daily_vaccinations_raw | daily_vaccinations | total_vaccinations_per_hundred | people_vaccinated_per_hundred | people_fully_vaccinated_per_hundred | daily_vaccinations_per_million |
|---|---|---|---|---|---|---|---|---|---|
| count | 43607.0 | 41294.0 | 38802.0 | 35362.0 | 86213.0 | 43607.0 | 41294.0 | 38802.0 | 86213.0 |
| mean | 45929640.0 | 17705080.0 | 14138300.0 | 270599.6 | 131305.5 | 80.188543 | 40.927317 | 35.523243 | 3257.049157 |
| std | 224600400.0 | 70787310.0 | 57139200.0 | 1212427.0 | 768238.8 | 67.913577 | 29.290759 | 28.376252 | 3934.31244 |
| min | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 526410.0 | 349464.2 | 243962.2 | 4668.0 | 900.0 | 16.05 | 11.37 | 7.02 | 636.0 |
| 50% | 3590096.0 | 2187310.0 | 1722140.0 | 25309.0 | 7343.0 | 67.52 | 41.435 | 31.75 | 2050.0 |
| 75% | 17012300.0 | 9152520.0 | 7559870.0 | 123492.5 | 44098.0 | 132.735 | 67.91 | 62.08 | 4682.0 |
| max | 3263129000.0 | 1275541000.0 | 1240777000.0 | 24741000.0 | 22424290.0 | 345.37 | 124.76 | 122.37 | 117497.0 |


For those who want a well-structured overview of the whole dataset, a profiling report is available:

In [6]:
# Left disabled: uncomment to build an HTML profiling report
# profiling = ProfileReport(df)
# profiling.to_file("profiling.html")
# profiling

<hr>

# Example Plotting

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 Here we create a function that takes a country's ISO code as a parameter and plots its daily vaccinations. With this function it is easy to get the daily vaccination information for any country.
</div>



In [7]:
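def dateVacination(countryCode):
    """Scatter-plot the daily vaccinations of the country with the given ISO code."""
    df = getDf()

    dfA = df[df['iso_code'] == countryCode]
    # All rows with the same ISO code share one country name
    countryName = dfA['country'].iloc[0]

    fig = px.scatter(x=dfA['date'], y=dfA['daily_vaccinations'], title='Country {}'.format(countryName))

    fig.show()


# Plot the first three ISO codes found in the data
isoCodes = list(df['iso_code'].unique())

for i in range(3):
    source_code = isoCodes[i]
    dateVacination(source_code)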
    

**You can see some missing values in the data**
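
To put a number on those gaps, one option is to compute the share of missing values per column. The sketch below uses a small made-up DataFrame (the country names and figures are invented); on the real data, the same call on `df` should mirror the non-null counts reported by `df.info()`.

```python
import numpy as np
import pandas as pd

# Made-up rows; NaN marks days without a reported figure
toy = pd.DataFrame({
    'country': ['Aland', 'Aland', 'Aland', 'Borduria'],
    'date': ['2022-02-02', '2022-02-03', '2022-02-04', '2022-02-04'],
    'total_vaccinations': [1000.0, np.nan, 1300.0, np.nan],
    'daily_vaccinations': [100.0, 150.0, 150.0, 80.0],
})

# isna() gives True/False per cell; the column mean is the missing share
missingShare = toy.isna().mean().sort_values(ascending=False)
print(missingShare)
```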

#  Data Analysis 📈

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 In this section we make some basic analyses of the data using different types of plots.<br>
First, we look at how daily vaccinations are distributed across countries on 2022-02-04, the last date used in this analysis.
</div>

In [8]:
df = df[ df['date'] == '2022-02-04']
df = df[['country','daily_vaccinations']].sort_values(by='daily_vaccinations', ascending=False)

In [9]:
fig = px.treemap(df, path=[px.Constant('daily_vaccinations'),'country'], values='daily_vaccinations',
                   hover_data=['country'])
fig.show()
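
Filtering on one hard-coded date keeps only the countries that reported on that exact day. A possible alternative, sketched here on made-up rows, is to take each country's most recent record instead. The variable names are illustrative only.

```python
import pandas as pd

# Made-up rows: Borduria stopped reporting two days before Aland
toy = pd.DataFrame({
    'country': ['Aland', 'Aland', 'Borduria', 'Borduria'],
    'date': ['2022-02-03', '2022-02-04', '2022-02-01', '2022-02-02'],
    'daily_vaccinations': [100, 120, 70, 80],
})

# ISO dates (YYYY-MM-DD) sort correctly even as plain strings
latestDate = toy['date'].max()

# Filtering on one date keeps only countries that reported that day
onLatest = toy[toy['date'] == latestDate]

# Taking the last row per country keeps every country
latestPerCountry = toy.sort_values('date').groupby('country').tail(1)

print(latestDate)
print(sorted(onLatest['country']))
print(sorted(latestPerCountry['country']))
```

The trade-off is that the per-country rows no longer refer to the same day, so the chart title should say so.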

### Compare It with an Earlier Date (2021-02-22)

In [10]:
df = getDf()

df = df[ df['date'] == '2021-02-22']
df = df[['country','daily_vaccinations']].sort_values(by='daily_vaccinations', ascending=False)

In [11]:
fig = px.treemap(df, path=[px.Constant('daily_vaccinations'),'country'], values='daily_vaccinations',
                   hover_data=['country'])
fig.show()

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 NOTE: Between the two dates, India took the place of the United States
</div>
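
The same comparison can be made without reading it off the treemaps. The sketch below, on made-up numbers, picks the country with the highest `daily_vaccinations` for each date; it is one possible way to check the note above, not the notebook's own method.

```python
import pandas as pd

# Made-up figures for two dates
toy = pd.DataFrame({
    'country': ['Aland', 'Borduria', 'Aland', 'Borduria'],
    'date': ['2021-02-22', '2021-02-22', '2022-02-04', '2022-02-04'],
    'daily_vaccinations': [500, 300, 200, 900],
})

# idxmax() returns the row label of the largest value within each date
topRows = toy.loc[toy.groupby('date')['daily_vaccinations'].idxmax()]
topByDate = topRows.set_index('date')['country']
print(topByDate)
```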

# Version 1 
<hr>

# Go Deeper Into The Analysis

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 The DataFrame also gives vaccination figures per hundred people (vaccinations / population × 100), which lets us compare countries regardless of their size
</div>
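
As a reminder of what a per-hundred column means, here is a minimal sketch that rebuilds one from raw counts. The populations are made up, and the `population` column is an assumption: the dataset's column list above does not include one.

```python
import pandas as pd

# Made-up counts and populations, for illustration only
toy = pd.DataFrame({
    'country': ['Aland', 'Borduria'],
    'people_vaccinated': [30000, 2000000],
    'population': [40000, 10000000],
})

# Share of the population with at least one dose, expressed per hundred people
toy['people_vaccinated_per_hundred'] = (
    100 * toy['people_vaccinated'] / toy['population']
).round(2)

print(toy[['country', 'people_vaccinated_per_hundred']])
```

Here the smaller country ranks first even though it has vaccinated far fewer people in absolute terms, which is why the ranking below is based on the per-hundred columns.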

In [12]:
df = getDf()

In [13]:
df = df[ df['date'] == '2022-02-04']
df = df.sort_values(by='people_vaccinated_per_hundred',ascending=False)

In [14]:
px.bar(df.head(10), x='country', y='people_vaccinated_per_hundred',
                   title='People Vaccinated per Hundred for the Date 2022-02-04')

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 These countries often have a total population between 5 and 40 million
</div>

In [15]:
df = df.sort_values(by='people_fully_vaccinated_per_hundred',ascending=False)

px.bar(df.head(10), x='country', y='people_fully_vaccinated_per_hundred',
                   title='People Fully Vaccinated per Hundred for the Date 2022-02-04')

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 The ranking doesn't change much; the United Arab Emirates moves into first place
</div>

# Second Part ❷

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 Let's take a look at the second file
</div>



In [16]:
def getDfA():
    """Reload the per-manufacturer vaccination file and return it as a fresh DataFrame."""
    dfA = pd.read_csv('/kaggle/input/covid-world-vaccination-progress/country_vaccinations_by_manufacturer.csv')
    return dfA


<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 As with the first file, I create a loader function for this one too. First,
        let's get the vaccination records for 2022-02-04
</div>

In [17]:
dfA = getDfA()
dfA = dfA[dfA.date == '2022-02-04']
dfA.head()

| | location | date | vaccine | total_vaccinations |
|---|---|---|---|---|
| 2305 | Argentina | 2022-02-04 | CanSino | 468481 |
| 2306 | Argentina | 2022-02-04 | Moderna | 5318406 |
| 2307 | Argentina | 2022-02-04 | Oxford/AstraZeneca | 25606912 |
| 2308 | Argentina | 2022-02-04 | Pfizer/BioNTech | 11225368 |
| 2309 | Argentina | 2022-02-04 | Sinopharm/Beijing | 27396208 |


In [18]:
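def vaccineRepartition(location):
    """Bar-plot the latest cumulative total of each vaccine used in a location."""
    dfA = getDfA()

    dfA = dfA[dfA['location'] == location]
    # total_vaccinations is cumulative, so keep only the latest row per vaccine;
    # plotting every date would stack the same doses many times over
    dfA = dfA.sort_values('date').groupby('vaccine', as_index=False).last()

    fig = px.bar(x=dfA['vaccine'], y=dfA['total_vaccinations'],
                   title='Most Used Vaccine for {}'.format(location))

    fig.show()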

In [19]:
vaccineRepartition('Belgium') # replace 'Belgium' with any location in the file


<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 Belgium uses the Pfizer/BioNTech vaccine the most. Statistics are not available for some countries, such as India, China, or the USA
</div>

In [20]:
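# dfA still holds the 2022-02-04 records from In [17]
# 'European Union' aggregates member states that are also listed, so drop it to avoid double counting
countries = dfA[dfA['location'] != 'European Union']
total = countries.groupby('vaccine')['total_vaccinations'].sum()

px.bar(x=total.index, y=total.values,
                   title='Most Used Vaccine in the World')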


<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 Among the locations covered by this file, the most used vaccine is Pfizer/BioNTech
</div>
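
Because `total_vaccinations` is cumulative and not every location reports on the same day, a more robust ranking could use the latest record for each location and vaccine, then express the result as a share. The sketch below runs on made-up rows laid out like the manufacturer file; `VaxA` and `VaxB` are placeholder vaccine names.

```python
import pandas as pd

# Made-up cumulative totals in the layout of the manufacturer file
toy = pd.DataFrame({
    'location': ['Aland', 'Aland', 'Aland', 'Borduria', 'Borduria'],
    'date': ['2022-02-03', '2022-02-04', '2022-02-04', '2022-02-01', '2022-02-02'],
    'vaccine': ['VaxA', 'VaxA', 'VaxB', 'VaxA', 'VaxA'],
    'total_vaccinations': [900, 1000, 500, 400, 500],
})

# Keep the most recent cumulative total for every (location, vaccine) pair
latest = toy.sort_values('date').groupby(['location', 'vaccine']).tail(1)

# Sum across locations, then convert to a percentage of all doses
totals = latest.groupby('vaccine')['total_vaccinations'].sum()
share = (100 * totals / totals.sum()).round(1)
print(share)
```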

In [21]:
dfA = getDfA()
dfA.columns

Index(['location', 'date', 'vaccine', 'total_vaccinations'], dtype='object')

In [22]:
bestTotalVac = dfA[dfA['total_vaccinations'] == dfA['total_vaccinations'].max()]

bestTotalVac

| | location | date | vaccine | total_vaccinations |
|---|---|---|---|---|
| 35619 | European Union | 2022-03-29 | Pfizer/BioNTech | 600519998 |


<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 The European Union holds the highest total: 600,519,998 Pfizer/BioNTech doses as of 2022-03-29. Let's look at how it evolved over time
</div>

In [23]:
europeanUnion = dfA[dfA['location'] ==  'European Union']

In [24]:
px.line(europeanUnion, x="date", y="total_vaccinations",color='vaccine',title='European Union Vaccination')

<div class="alert alert-block alert-info" style="font-size:16px; font-family:Helvetica;">
     📌 We can wrap this plot in a function too, so that it works for any location
</div>

In [25]:
def vaccinationDuration(location):
    """Line-plot the cumulative vaccinations over time, per vaccine, for a location."""
    dfA = getDfA()

    dfLocation = dfA[dfA['location'] == location]

    fig = px.line(dfLocation, x="date", y="total_vaccinations", color='vaccine', title='{} Vaccination'.format(location))

    fig.show()
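
These curves show cumulative totals. To see how many doses were added between two reports, one option is to difference the series within each vaccine. This is a sketch on made-up rows; `new_doses` is a hypothetical column name, not one from the dataset.

```python
import pandas as pd

# Made-up cumulative totals for one location and two placeholder vaccines
toy = pd.DataFrame({
    'location': ['Aland'] * 6,
    'date': ['2022-02-02', '2022-02-03', '2022-02-04'] * 2,
    'vaccine': ['VaxA'] * 3 + ['VaxB'] * 3,
    'total_vaccinations': [100, 150, 230, 10, 10, 40],
})

toy = toy.sort_values(['vaccine', 'date'])
# diff() within each vaccine turns cumulative totals into per-report increments;
# the first report of each vaccine has no predecessor, so it becomes NaN
toy['new_doses'] = toy.groupby('vaccine')['total_vaccinations'].diff()
print(toy[['date', 'vaccine', 'new_doses']])
```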


In [26]:
import random 
listeOfCountry = list(set(dfA['location']))

for i in range(3):
    location = random.choice(listeOfCountry)
    vaccinationDuration(location)
    

# Version 2