# Monkeypox Analysis

Monkeypox is an infectious viral disease that can occur in humans and some other animals. It is caused by the monkeypox virus, a zoonotic virus in the genus Orthopoxvirus. An ongoing outbreak was confirmed in May 2022, beginning with a cluster of cases found in the United Kingdom.

The first confirmed case was traced to an individual with travel links to Nigeria (where the disease is endemic) and was detected on 6 May 2022, although it has been suggested that cases were already spreading in the previous months. 

From 18 May 2022 onwards, cases were reported from an increasing number of countries and regions, predominantly in Europe, but also in North and South America, Asia, Africa and Australia. As of 27 June 2022, the World Health Organization had described the outbreak as an "Evolving Health Threat" rather than a Public Health Emergency of International Concern (PHEIC).

In [6]:
# Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

# About the dataset

Monkey_Pox_Cases_Worldwide: This dataset contains a tally of confirmed and suspected cases for each country.

Worldwide_Case_Detection_Timeline: This dataset contains a timeline of confirmed cases by confirmation date, along with some additional details for each reported case.

Daily_Country_Wise_Confirmed_Cases: This dataset contains the daily number of confirmed cases for every country where the virus has been detected.

In [8]:
# Read data files
case = pd.read_csv('Monkey_Pox_Cases_Worldwide.csv')
case_timeline = pd.read_csv('Worldwide_Case_Detection_Timeline.csv')
confirmed_case = pd.read_csv('Daily_Country_Wise_Confirmed_Cases.csv')

In [11]:
# Print shape of data files (row, column)
print('Cases Worldwide: ', case.shape)
print('Case Detection Timeline: ', case_timeline.shape)
print('Daily Confirmed Cases: ', confirmed_case.shape)

Cases Worldwide:  (125, 6)
Case Detection Timeline:  (58002, 9)
Daily Confirmed Cases:  (107, 129)


# Basic exploration of the dataset

In [12]:
case.head()

Unnamed: 0,Country,Confirmed_Cases,Suspected_Cases,Hospitalized,Travel_History_Yes,Travel_History_No
0,England,3320.0,0.0,5.0,2.0,7.0
1,Portugal,871.0,0.0,0.0,0.0,34.0
2,Spain,6884.0,0.0,13.0,2.0,0.0
3,United States,21761.0,0.0,4.0,41.0,11.0
4,Canada,1320.0,12.0,1.0,5.0,0.0


In [13]:
case_timeline.head()

Unnamed: 0,Date_confirmation,Country,City,Age,Gender,Symptoms,Hospitalised (Y/N/NA),Isolated (Y/N/NA),Travel_history (Y/N/NA)
0,2022-01-31,Nigeria,,,,,,,
1,2022-01-31,Nigeria,,,,,,,
2,2022-01-31,Nigeria,,,,,,,
3,2022-02-17,Cameroon,,0-39,,,,,
4,2022-02-17,Cameroon,,0-39,,,,,


In [16]:
case_timeline.tail()

Unnamed: 0,Date_confirmation,Country,City,Age,Gender,Symptoms,Hospitalised (Y/N/NA),Isolated (Y/N/NA),Travel_history (Y/N/NA)
57997,2022-09-09,United States,,,,,,,
57998,2022-09-09,United States,,,,,,,
57999,2022-09-09,United States,,,,,,,
58000,2022-09-09,United States,,,,,,,
58001,2022-09-09,United States,,,,,,,
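Both the head and the tail of the timeline show mostly empty detail columns, so it may be worth measuring how sparse each column is before relying on it. The snippet below is a minimal sketch on a stand-in frame (`timeline_sample` is a hypothetical name) rebuilt from a few columns of the `head()` rows shown above. On the real data the same expression would be applied to `case_timeline`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for case_timeline, copied from the head() output above
timeline_sample = pd.DataFrame({
    "Date_confirmation": ["2022-01-31", "2022-01-31", "2022-01-31",
                          "2022-02-17", "2022-02-17"],
    "Country": ["Nigeria", "Nigeria", "Nigeria", "Cameroon", "Cameroon"],
    "City": [np.nan] * 5,
    "Age": [np.nan, np.nan, np.nan, "0-39", "0-39"],
})

# Fraction of missing values per column, most sparse first
missing_share = timeline_sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns with a share close to 1.0 carry almost no information and are probably best excluded from per-case analysis.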


In [15]:
confirmed_case.head()

Unnamed: 0,Country,2022-01-31,2022-02-17,2022-02-28,2022-03-04,2022-03-31,2022-04-10,2022-04-12,2022-04-30,2022-05-06,...,2022-08-31,2022-09-01,2022-09-02,2022-09-03,2022-09-04,2022-09-05,2022-09-06,2022-09-07,2022-09-08,2022-09-09
0,Nigeria,3,0,1,0,6,0,0,5,0,...,0,0,0,0,0,0,0,0,0,0
1,Cameroon,0,3,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central African Republic,0,0,0,2,0,4,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Republic of Congo,0,0,0,0,0,0,2,0,0,...,0,0,0,0,0,0,0,0,0,0
4,England,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,63,0,0,0,0
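The daily table is in wide format, with one column per date, which is awkward for time-series plotting. One possible approach, sketched here on a small stand-in frame (`wide_sample` and `long_df` are hypothetical names, with values taken from three of the date columns shown above), is to melt it into long format and derive a cumulative count per country.

```python
import pandas as pd

# Hypothetical stand-in for confirmed_case, using values from the head() output above
wide_sample = pd.DataFrame({
    "Country": ["Nigeria", "Cameroon", "England"],
    "2022-01-31": [3, 0, 0],
    "2022-02-17": [0, 3, 0],
    "2022-05-06": [0, 0, 1],
})

# One row per (country, date) instead of one column per date
long_df = wide_sample.melt(id_vars="Country", var_name="Date", value_name="New_Cases")
long_df["Date"] = pd.to_datetime(long_df["Date"])
long_df = long_df.sort_values(["Country", "Date"]).reset_index(drop=True)

# Running total of confirmed cases within each country
long_df["Cumulative_Cases"] = long_df.groupby("Country")["New_Cases"].cumsum()
print(long_df)
```

The long layout maps directly onto the `x`, `y` and `color` arguments of plotting libraries such as seaborn and plotly.express, which are imported at the top of this notebook.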


In [18]:
print("Case Dataset Information :\n")
case.info()

Case Dataset Information :

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 125 entries, 0 to 124
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Country             125 non-null    object 
 1   Confirmed_Cases     125 non-null    float64
 2   Suspected_Cases     125 non-null    float64
 3   Hospitalized        125 non-null    float64
 4   Travel_History_Yes  125 non-null    float64
 5   Travel_History_No   125 non-null    float64
dtypes: float64(5), object(1)
memory usage: 6.0+ KB
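The `info()` output shows that every count column is stored as `float64` with no missing values. If the values are all whole numbers (an assumption that should be checked on the real data), they could be cast to integers for cleaner display. The sketch below uses a hypothetical stand-in frame, `case_sample`, built from a few columns of the first three rows of `case.head()`.

```python
import pandas as pd

# Hypothetical stand-in for the case DataFrame, from the head() output above
case_sample = pd.DataFrame({
    "Country": ["England", "Portugal", "Spain"],
    "Confirmed_Cases": [3320.0, 871.0, 6884.0],
    "Suspected_Cases": [0.0, 0.0, 0.0],
    "Hospitalized": [5.0, 0.0, 13.0],
})

count_cols = case_sample.select_dtypes(include="float64").columns

# Cast only if no value has a fractional part, so nothing is silently truncated
if (case_sample[count_cols] % 1 == 0).all().all():
    case_sample = case_sample.astype({col: "int64" for col in count_cols})

print(case_sample.dtypes)
```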


# Summary of Dataset

In [22]:
print("Summary of Case Dataset :\n")
case.describe().T  # transpose so that each row is a column of the dataset

Summary of Case Dataset :



Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Confirmed_Cases,125.0,461.496,2153.757181,0.0,2.0,5.0,71.0,21761.0
Suspected_Cases,125.0,27.056,242.531884,0.0,0.0,0.0,0.0,2681.0
Hospitalized,125.0,1.184,2.82667,0.0,0.0,0.0,1.0,18.0
Travel_History_Yes,125.0,2.128,4.827635,0.0,0.0,1.0,3.0,41.0
Travel_History_No,125.0,0.72,3.56642,0.0,0.0,0.0,0.0,34.0
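The summary suggests a heavily skewed distribution: the mean number of confirmed cases is far above the median. As a rough illustration, the arithmetic below uses only figures copied from the `Confirmed_Cases` row above. The variable names are illustrative, and the maximum is assumed to be the United States, which matches the 21761 shown in `case.head()`.

```python
# Values copied from the Confirmed_Cases row of the describe() output above
count = 125
mean = 461.496
median = 5.0
maximum = 21761.0

total_confirmed = count * mean          # since mean = total / count
top_share = maximum / total_confirmed   # share of all cases in the largest country
mean_to_median = mean / median          # a simple indicator of right skew

print(f"Total confirmed cases: {total_confirmed:.0f}")
print(f"Largest country's share: {top_share:.1%}")
print(f"Mean / median ratio: {mean_to_median:.1f}")
```

With these inputs the total comes to about 57,687 cases, of which a single country accounts for roughly 38%. This is why the median is a more representative "typical country" figure than the mean here.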
