<h1 style="text-align: center">
<div style="color: #DD3403; font-size: 60%">Data Science DISCOVERY MicroProject</div>
<span style="">MicroProject: Exploring COVID-19 Data from GitHub</span>
<div style="font-size: 60%;"><a href="https://discovery.cs.illinois.edu/microproject/covid-data-from-github/">https://discovery.cs.illinois.edu/microproject/covid-data-from-github/</a></div>
</h1>

<hr style="color: #DD3403;">

## Data Source: COVID-19 Case Data from Johns Hopkins University, via GitHub

Since before COVID-19 was detected in the United States, the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University has provided daily updates of COVID-19 case data as clean, structured CSV files on GitHub as a free public service to the world.

You can view their COVID-19 GitHub repository here: [https://github.com/CSSEGISandData/COVID-19](https://github.com/CSSEGISandData/COVID-19).  You can find their daily reports by navigating into their repository:

- Click **csse_covid_19_data** to navigate into the `csse_covid_19_data` folder,
- Navigate into `csse_covid_19_daily_reports`,
- Find the CSV data for **Jan. 3, 2022** *(it'll be near the top, be careful to get the correct year)*
- Click the **Raw** button above the file contents to navigate to the raw CSV version of the file (without the GitHub interface)
- Use the URL of the **raw data as your dataset** for this MicroProject.
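
The raw URL you end up with has a predictable shape: a fixed base path followed by a file name in `MM-DD-YYYY.csv` form. As a minimal sketch, the URL can be assembled from a date. The helper name `daily_report_url` is hypothetical and not part of pandas or the JHU repository; the base path is copied from the raw URL used in this MicroProject.

```python
from datetime import date

# Base path copied from the raw URL used in this MicroProject.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)

def daily_report_url(day):
    """Hypothetical helper: build the raw CSV URL for one daily report."""
    # Daily report files are named MM-DD-YYYY.csv.
    return f"{BASE_URL}{day.strftime('%m-%d-%Y')}.csv"

url = daily_report_url(date(2022, 1, 3))
print(url)
```

Building the URL from a `date` object keeps the month, day, and year in the right order, which helps with the "correct year" warning above.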

Use pandas' `read_csv` function to read the dataset you found and create a DataFrame called `df`:

In [1]:
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-03-2022.csv")
df

Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
0,,,,Afghanistan,2022-01-04 04:22:17,33.93911,67.709953,158183,7364,,,Afghanistan,406.344057,4.655368
1,,,,Albania,2022-01-04 04:22:17,41.15330,20.168300,210885,3220,,,Albania,7327.993606,1.526899
2,,,,Algeria,2022-01-04 04:22:17,28.03390,1.659600,219532,6298,,,Algeria,500.631194,2.868830
3,,,,Andorra,2022-01-04 04:22:17,42.50630,1.521800,24502,140,,,Andorra,31711.641752,0.571382
4,,,,Angola,2022-01-04 04:22:17,-11.20270,17.873900,83764,1775,,,Angola,254.863132,2.119049
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4011,,,Unknown,Ukraine,2022-01-04 04:22:17,,,0,0,0.0,0.0,"Unknown, Ukraine",0.000000,0.000000
4012,,,,Nauru,2022-01-04 04:22:17,-0.52280,166.931500,0,0,0.0,0.0,Nauru,0.000000,0.000000
4013,,,Niue,New Zealand,2022-01-04 04:22:17,-19.05440,-169.867200,0,0,0.0,0.0,"Niue, New Zealand",0.000000,0.000000
4014,,,,Tuvalu,2022-01-04 04:22:17,-7.10950,177.649300,0,0,0.0,0.0,Tuvalu,0.000000,0.000000


### 🔬 Checkpoint Tests 🔬

In [2]:
### TEST CASE for Data Import
tada = "\N{PARTY POPPER}"

assert("df" in vars())
assert("Country_Region" in df)
assert("People_Hospitalized" not in df), "Make sure you have the global daily reports, not just the US daily reports."
assert("India" in df["Country_Region"].unique())
assert("2022-01-04" in df["Last_Update"].unique()[0]), "Make sure you have the Jan. 3, 2022 CSV file."
print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Country-Level Analysis of COVID-19

The CSV file from JHU provides the **total reported cases over all time until the end of the day on Jan. 3, 2022**.  However, the data often breaks countries into individual regions.  For example, let's check out the United States.  Create a DataFrame, stored in the variable `df_us`, with all records from the dataset about the United States:

In [3]:
df_us = df[df["Country_Region"]=="US"]
df_us

Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
673,1001.0,Autauga,Alabama,US,2022-01-04 04:22:17,32.539527,-86.644082,11256,160,,,"Autauga, Alabama, US",20147.129893,1.421464
674,1003.0,Baldwin,Alabama,US,2022-01-04 04:22:17,30.727750,-87.722071,40549,593,,,"Baldwin, Alabama, US",18164.347725,1.462428
675,1005.0,Barbour,Alabama,US,2022-01-04 04:22:17,31.868263,-85.387129,3961,81,,,"Barbour, Alabama, US",16045.531880,2.044938
676,1007.0,Bibb,Alabama,US,2022-01-04 04:22:17,32.996421,-87.125115,4594,95,,,"Bibb, Alabama, US",20514.423506,2.067915
677,1009.0,Blount,Alabama,US,2022-01-04 04:22:17,33.982109,-86.567906,11368,198,,,"Blount, Alabama, US",19658.976931,1.741731
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3947,56039.0,Teton,Wyoming,US,2022-01-04 04:22:17,43.935225,-110.589080,6273,14,,,"Teton, Wyoming, US",26734.572110,0.223179
3948,56041.0,Uinta,Wyoming,US,2022-01-04 04:22:17,41.287818,-110.547578,4173,31,,,"Uinta, Wyoming, US",20631.859982,0.742871
3949,90056.0,Unassigned,Wyoming,US,2022-01-04 04:22:17,,,0,0,,,"Unassigned, Wyoming, US",,
3950,56043.0,Washakie,Wyoming,US,2022-01-04 04:22:17,43.904516,-107.680187,1886,37,,,"Washakie, Wyoming, US",24163.997438,1.961824
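
The filter in the cell above relies on a boolean mask. The sketch below shows the two steps separately on a tiny made-up DataFrame (the rows and numbers are invented for illustration and are not JHU data):

```python
import pandas as pd

# Made-up rows for illustration only; these are not real JHU figures.
toy = pd.DataFrame({
    "Province_State": ["Alabama", "Alaska", "Ontario", None],
    "Country_Region": ["US", "US", "Canada", "Albania"],
    "Confirmed": [10, 20, 30, 40],
})

# Comparing a column to a value yields one True/False per row.
is_us = toy["Country_Region"] == "US"
print(is_us.tolist())   # [True, True, False, False]

# Indexing with the mask keeps only the rows marked True.
toy_us = toy[is_us]
print(toy_us)
```

The kept rows retain their original index labels, which is why `df_us` above starts at index 673 rather than 0.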


### Analysis of COVID-19 in the United States

Create a new DataFrame, `df_us_sorted`, that sorts the DataFrame based on the number of confirmed cases of COVID-19 in the United States, where the **first row contains the location with the highest number of confirmed cases**:

In [4]:
df_us_sorted = df_us.sort_values("Confirmed",ascending=False)
df_us_sorted

Unnamed: 0,FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
885,6037.0,Los Angeles,California,US,2022-01-04 04:22:17,34.308284,-118.228241,1757522,27647,,,"Los Angeles, California, US",17506.756328,1.573067
781,4013.0,Maricopa,Arizona,US,2022-01-04 04:22:17,33.348359,-112.491815,887308,13737,,,"Maricopa, Arizona, US",19782.075857,1.548166
1305,17031.0,Cook,Illinois,US,2022-01-04 04:22:17,41.841448,-87.816588,862761,12055,,,"Cook, Illinois, US",16751.882876,1.397258
1050,12086.0,Miami-Dade,Florida,US,2022-01-04 04:22:17,25.611236,-80.551706,854670,6472,,,"Miami-Dade, Florida, US",31457.080392,0.757251
3440,48201.0,Harris,Texas,US,2022-01-04 04:22:17,29.858649,-95.393395,759902,9766,,,"Harris, Texas, US",14660.987732,1.413273
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2848,90039.0,Unassigned,Ohio,US,2022-01-04 04:22:17,,,0,5,,,"Unassigned, Ohio, US",,
725,80001.0,Out of AL,Alabama,US,2020-12-21 13:27:30,,,0,0,,,"Out of AL, Alabama, US",,
2763,90038.0,Unassigned,North Dakota,US,2022-01-04 04:22:17,,,0,45,,,"Unassigned, North Dakota, US",,
1287,90016.0,Unassigned,Idaho,US,2022-01-04 04:22:17,,,0,0,,,"Unassigned, Idaho, US",,


### Create a DataFrame for Country Level Analysis

Create a new DataFrame, `df_countries`, that aggregates the data within each country together to get a DataFrame that contains one row for each country:

In [5]:
df_countries = df.groupby("Country_Region").agg("sum").reset_index()
df_countries

Unnamed: 0,Country_Region,FIPS,Admin2,Province_State,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
0,Afghanistan,0.0,0,0,2022-01-04 04:22:17,33.939110,67.709953,158183,7364,0.0,0.0,Afghanistan,406.344057,4.655368
1,Albania,0.0,0,0,2022-01-04 04:22:17,41.153300,20.168300,210885,3220,0.0,0.0,Albania,7327.993606,1.526899
2,Algeria,0.0,0,0,2022-01-04 04:22:17,28.033900,1.659600,219532,6298,0.0,0.0,Algeria,500.631194,2.868830
3,Andorra,0.0,0,0,2022-01-04 04:22:17,42.506300,1.521800,24502,140,0.0,0.0,Andorra,31711.641752,0.571382
4,Angola,0.0,0,0,2022-01-04 04:22:17,-11.202700,17.873900,83764,1775,0.0,0.0,Angola,254.863132,2.119049
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
196,West Bank and Gaza,0.0,0,0,2022-01-04 04:22:17,31.952200,35.233200,469748,4919,0.0,0.0,West Bank and Gaza,9208.188472,1.047157
197,Winter Olympics 2022,0.0,0,0,2022-01-04 04:22:17,39.904200,116.407400,0,0,0.0,0.0,Winter Olympics 2022,0.000000,0.000000
198,Yemen,0.0,0,0,2022-01-04 04:22:17,15.552727,48.516388,10138,1984,0.0,0.0,Yemen,33.990515,19.569935
199,Zambia,0.0,0,0,2022-01-04 04:22:17,-13.133897,27.849332,261221,3753,0.0,0.0,Zambia,1420.918327,1.436715
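
Note that summing every column also adds up coordinates and joins text columns together, which is rarely meaningful. One possible alternative, sketched here on made-up data (the values are invented, not JHU figures), is to select only the count columns before summing:

```python
import pandas as pd

# Made-up rows for illustration only; these are not real JHU figures.
toy = pd.DataFrame({
    "Province_State": ["Alabama", "Alaska", "Ontario", "Quebec"],
    "Country_Region": ["US", "US", "Canada", "Canada"],
    "Confirmed": [10, 20, 30, 40],
    "Deaths": [1, 2, 3, 4],
})

# Select only the count columns before summing, so text columns are not
# concatenated and coordinates are not added together.
toy_countries = (
    toy.groupby("Country_Region")[["Confirmed", "Deaths"]]
    .sum()
    .reset_index()
)
print(toy_countries)
```

The result has one row per country, with only `Country_Region`, `Confirmed`, and `Deaths` as columns.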


### Performing Country-Level Analysis

Create a DataFrame called `df_most_cases` that contains the country which has had the most confirmed cases of COVID-19:

In [6]:
df_most_cases = df_countries.nlargest(1,"Confirmed")
df_most_cases

Unnamed: 0,Country_Region,FIPS,Admin2,Province_State,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
186,US,105902623.0,AutaugaBaldwinBarbourBibbBlountBullockButlerCa...,AlabamaAlabamaAlabamaAlabamaAlabamaAlabamaAlab...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,121693.134039,-293352.619164,56438999,828234,0.0,0.0,"Autauga, Alabama, USBaldwin, Alabama, USBarbou...",55760020.0,7600.185927


### 🔬 Checkpoint Tests 🔬

In [7]:
### TEST CASE for Data Analysis
tada = "\N{PARTY POPPER}"

assert("df_us" in vars())
assert("Province_State" in df_us)

assert("df_us_sorted" in vars())
assert(df_us_sorted.iloc[:,-7].values[2] <= df_us_sorted.iloc[:,-7].values[1])
assert(df_us_sorted.iloc[:,-7].values[10] <= df_us_sorted.iloc[:,-7].values[9])
assert(df_us_sorted["Confirmed"].values[0] == max(df_us["Confirmed"]))

assert("df_countries" in vars())
assert(df_countries["Confirmed"].sum() == df["Confirmed"].cumsum().values[len(df) - 1])

assert("df_most_cases" in vars())
assert(len(df_most_cases) == 1)
assert(len(df_most_cases.iloc[0]["Country_Region"]) == 2)

print(f"{tada} All Tests Passed! {tada}")


🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Checking for the Pareto principle

The Pareto principle states that *"for many outcomes, roughly 80% of consequences come from 20% of causes (the "vital few")"* ([See more on Wikipedia: Pareto principle](https://en.wikipedia.org/wiki/Pareto_principle)).  This is also known as the "80-20 rule" and appears often in Data Science.

In terms of COVID-19 cases, the application of the Pareto principle would be that **80% of confirmed cases come from just 20% of countries**.  Is this true?

To test this, we need to find the total number of cases across all countries.  Compute the total number of confirmed cases in the variable `confirmed_total` (this should be a number, not a DataFrame):

In [8]:
confirmed_total = df_countries["Confirmed"].sum()
confirmed_total

293190148

Using a bit of math, 80% of the total number of confirmed cases would be:

In [9]:
confirmed_80pct = confirmed_total * 0.8
confirmed_80pct

234552118.4

### Finding the Cumulative Sum of Cases

DataFrames provide a "cumulative sum" function, `df.cumsum(...)`, that calculates the sum of every row up to and including the current row.

To learn the syntax of the cumulative sum function, read the DISCOVERY guide:
- [Guide: "What is the Cumulative Sum of a pandas DataFrame?"](https://discovery.cs.illinois.edu/guides/DataFrame-Fundamentals/Cumulative-Sum-in-pandas/)
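
As a quick illustration of the idea, here is a minimal sketch of a cumulative sum on a small made-up Series (the numbers are invented for illustration):

```python
import pandas as pd

# Made-up values for illustration only.
cases = pd.Series([50, 30, 15, 5])

# Each entry is the sum of all values up to and including that position.
running_total = cases.cumsum()
print(running_total.tolist())   # [50, 80, 95, 100]
```

The last entry of a cumulative sum always equals the total of the original values.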

Before finding the cumulative sum, we need a DataFrame of all countries sorted in descending order, so that the countries with the most confirmed cases come first.  Use `df_countries` to create a DataFrame sorted by confirmed cases in the variable `df_countries_sorted`:

In [10]:
df_countries_sorted = df_countries.sort_values("Confirmed",ascending=False)
df_countries_sorted

Unnamed: 0,Country_Region,FIPS,Admin2,Province_State,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio
186,US,105902623.0,AutaugaBaldwinBarbourBibbBlountBullockButlerCa...,AlabamaAlabamaAlabamaAlabamaAlabamaAlabamaAlab...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,121693.134039,-293352.619164,56438999,828234,0.0,0.0,"Autauga, Alabama, USBaldwin, Alabama, USBarbou...",5.576002e+07,7600.185927
80,India,0.0,0,Andaman and Nicobar IslandsAndhra PradeshAruna...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,831.177882,2945.020567,34960261,482017,0.0,0.0,"Andaman and Nicobar Islands, IndiaAndhra Prade...",1.537603e+05,46.956740
24,Brazil,0.0,0,AcreAlagoasAmapaAmazonasBahiaCearaDistrito Fed...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,-342.077100,-1308.973300,22309081,619473,0.0,0.0,"Acre, BrazilAlagoas, BrazilAmapa, BrazilAmazon...",3.272539e+05,66.658983
190,United Kingdom,0.0,0,AnguillaBermudaBritish Virgin IslandsCayman Is...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,496.754894,-642.546956,13498751,178396,0.0,0.0,"Anguilla, United KingdomBermuda, United Kingdo...",1.993084e+05,11.388305
63,France,0.0,0,French GuianaFrench PolynesiaGuadeloupeMartini...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,77.169695,-416.763414,10422830,125200,0.0,0.0,"French Guiana, FranceFrench Polynesia, FranceG...",1.306596e+05,13.176218
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181,Tonga,0.0,0,0,2022-01-04 04:22:17,-21.179000,-175.198200,1,0,0.0,0.0,Tonga,9.461006e-01,0.000000
93,"Korea, North",0.0,0,0,2022-01-04 04:22:17,40.339900,127.510100,0,0,0.0,0.0,"Korea, North",0.000000e+00,0.000000
126,Nauru,0.0,0,0,2022-01-04 04:22:17,-0.522800,166.931500,0,0,0.0,0.0,Nauru,0.000000e+00,0.000000
197,Winter Olympics 2022,0.0,0,0,2022-01-04 04:22:17,39.904200,116.407400,0,0,0.0,0.0,Winter Olympics 2022,0.000000e+00,0.000000


Using `df_countries_sorted`, create a new column called `Cumulative Confirmed` that contains the cumulative sum of the Confirmed cases:

In [12]:
df_countries_sorted["Cumulative Confirmed"] = df_countries_sorted["Confirmed"].cumsum()
df_countries_sorted

Unnamed: 0,Country_Region,FIPS,Admin2,Province_State,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio,Cumulative Confirmed
186,US,105902623.0,AutaugaBaldwinBarbourBibbBlountBullockButlerCa...,AlabamaAlabamaAlabamaAlabamaAlabamaAlabamaAlab...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,121693.134039,-293352.619164,56438999,828234,0.0,0.0,"Autauga, Alabama, USBaldwin, Alabama, USBarbou...",5.576002e+07,7600.185927,56438999
80,India,0.0,0,Andaman and Nicobar IslandsAndhra PradeshAruna...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,831.177882,2945.020567,34960261,482017,0.0,0.0,"Andaman and Nicobar Islands, IndiaAndhra Prade...",1.537603e+05,46.956740,91399260
24,Brazil,0.0,0,AcreAlagoasAmapaAmazonasBahiaCearaDistrito Fed...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,-342.077100,-1308.973300,22309081,619473,0.0,0.0,"Acre, BrazilAlagoas, BrazilAmapa, BrazilAmazon...",3.272539e+05,66.658983,113708341
190,United Kingdom,0.0,0,AnguillaBermudaBritish Virgin IslandsCayman Is...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,496.754894,-642.546956,13498751,178396,0.0,0.0,"Anguilla, United KingdomBermuda, United Kingdo...",1.993084e+05,11.388305,127207092
63,France,0.0,0,French GuianaFrench PolynesiaGuadeloupeMartini...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,77.169695,-416.763414,10422830,125200,0.0,0.0,"French Guiana, FranceFrench Polynesia, FranceG...",1.306596e+05,13.176218,137629922
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
181,Tonga,0.0,0,0,2022-01-04 04:22:17,-21.179000,-175.198200,1,0,0.0,0.0,Tonga,9.461006e-01,0.000000,293190148
93,"Korea, North",0.0,0,0,2022-01-04 04:22:17,40.339900,127.510100,0,0,0.0,0.0,"Korea, North",0.000000e+00,0.000000,293190148
126,Nauru,0.0,0,0,2022-01-04 04:22:17,-0.522800,166.931500,0,0,0.0,0.0,Nauru,0.000000e+00,0.000000,293190148
197,Winter Olympics 2022,0.0,0,0,2022-01-04 04:22:17,39.904200,116.407400,0,0,0.0,0.0,Winter Olympics 2022,0.000000e+00,0.000000,293190148


Finally, create a DataFrame called `df_80pct` containing the top countries whose cumulative confirmed cases stay at or below 80% of the global cases (remember, that threshold is the value you stored in `confirmed_80pct`):

In [13]:
df_80pct = df_countries_sorted[df_countries_sorted["Cumulative Confirmed"]<= confirmed_80pct]
df_80pct

Unnamed: 0,Country_Region,FIPS,Admin2,Province_State,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incident_Rate,Case_Fatality_Ratio,Cumulative Confirmed
186,US,105902623.0,AutaugaBaldwinBarbourBibbBlountBullockButlerCa...,AlabamaAlabamaAlabamaAlabamaAlabamaAlabamaAlab...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,121693.134039,-293352.619164,56438999,828234,0.0,0.0,"Autauga, Alabama, USBaldwin, Alabama, USBarbou...",55760020.0,7600.185927,56438999
80,India,0.0,0,Andaman and Nicobar IslandsAndhra PradeshAruna...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,831.177882,2945.020567,34960261,482017,0.0,0.0,"Andaman and Nicobar Islands, IndiaAndhra Prade...",153760.3,46.95674,91399260
24,Brazil,0.0,0,AcreAlagoasAmapaAmazonasBahiaCearaDistrito Fed...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,-342.0771,-1308.9733,22309081,619473,0.0,0.0,"Acre, BrazilAlagoas, BrazilAmapa, BrazilAmazon...",327253.9,66.658983,113708341
190,United Kingdom,0.0,0,AnguillaBermudaBritish Virgin IslandsCayman Is...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,496.754894,-642.546956,13498751,178396,0.0,0.0,"Anguilla, United KingdomBermuda, United Kingdo...",199308.4,11.388305,127207092
63,France,0.0,0,French GuianaFrench PolynesiaGuadeloupeMartini...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,77.169695,-416.763414,10422830,125200,0.0,0.0,"French Guiana, FranceFrench Polynesia, FranceG...",130659.6,13.176218,137629922
147,Russia,0.0,0,Adygea RepublicAltai KraiAltai RepublicAmur Ob...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,4527.343882,5156.031399,10374292,305096,0.0,0.0,"Adygea Republic, RussiaAltai Krai, RussiaAltai...",589972.2,250.391796,148004214
184,Turkey,0.0,0,0,2022-01-04 04:22:17,38.9637,35.2433,9597670,82795,0.0,0.0,Turkey,11382.2,0.86248,157601884
67,Germany,0.0,0,Baden-WurttembergBayernBerlinBrandenburgBremen...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,825.1495,164.4461,7129500,111602,0.0,0.0,"Baden-Wurttemberg, GermanyBayern, GermanyBerli...",140020.6,25.414232,164731384
167,Spain,0.0,0,AndalusiaAragonAsturiasBalearesC. ValencianaCa...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,755.510158,-69.972552,6667511,89573,0.0,0.0,"Andalusia, SpainAragon, SpainAsturias, SpainBa...",267985.1,24.964501,171398895
86,Italy,0.0,0,AbruzzoBasilicataCalabriaCampaniaEmilia-Romagn...,2022-01-04 04:22:172022-01-04 04:22:172022-01-...,903.972147,256.745065,6396110,137786,0.0,0.0,"Abruzzo, ItalyBasilicata, ItalyCalabria, Italy...",217843.9,44.65312,177795005
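
A `<=` comparison keeps only the rows whose running total is at or under the threshold, so the country whose cases push the total past 80% is left out. The sketch below, on made-up data (invented values, not JHU figures), compares that filter with one possible alternative that includes the crossing row:

```python
import pandas as pd

# Made-up values for illustration only, already sorted largest first.
toy = pd.DataFrame({
    "Country_Region": ["A", "B", "C", "D"],
    "Confirmed": [50, 25, 15, 10],
})
toy["Cumulative Confirmed"] = toy["Confirmed"].cumsum()   # 50, 75, 90, 100
threshold = 0.8 * toy["Confirmed"].sum()                  # 80.0

# Same style of filter as the cell above: stops just short of the threshold.
below = toy[toy["Cumulative Confirmed"] <= threshold]

# Alternative: keep a row if the running total *before* it was still
# under the threshold, so the row that crosses 80% is included.
before_row = toy["Cumulative Confirmed"] - toy["Confirmed"]
reaching = toy[before_row < threshold]

print(below["Country_Region"].tolist())      # ['A', 'B']
print(reaching["Country_Region"].tolist())   # ['A', 'B', 'C']
```

Either choice is defensible; the first slightly undershoots 80% of the cases and the second slightly overshoots it.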


### Does the Pareto Principle show up?

Currently:
- `df_countries` contains EVERY country in the world with COVID-19 data, and
- `df_80pct` contains countries that make up 80% of the cases.

If the Pareto principle applies to the confirmed cases of COVID-19, then we expect that `df_80pct` holds only approximately 20% of all the countries.  Let's see:


In [14]:
pct_cases = 100 * sum(df_80pct["Confirmed"]) / sum(df_countries["Confirmed"])
pct_cases = round(pct_cases, 2)

pct_countries = 100 * len(df_80pct) / len(df_countries)
pct_countries = round(pct_countries, 2)

print(f"Result: {pct_cases}% of the COVID-19 cases come from {pct_countries}% of the countries in the dataset.")

Result: 79.42% of the COVID-19 cases come from 12.44% of the countries in the dataset.
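
The whole check (sort, cumulative sum, filter, percentages) can also be wrapped into one function and tried on other data. This is only a sketch: `pareto_share` is a hypothetical helper name, and the counts are made up for illustration.

```python
import pandas as pd

def pareto_share(counts, cutoff=0.8):
    """Hypothetical helper: percent of the total, and percent of the items,
    covered by the largest items whose running total stays within `cutoff`."""
    ordered = counts.sort_values(ascending=False)
    running = ordered.cumsum()
    kept = ordered[running <= cutoff * ordered.sum()]
    pct_total = round(100 * kept.sum() / ordered.sum(), 2)
    pct_items = round(100 * len(kept) / len(ordered), 2)
    return pct_total, pct_items

# Made-up counts for illustration only.
toy_counts = pd.Series([5, 60, 1, 20, 10, 1, 1, 1, 1, 0])
pct_total, pct_items = pareto_share(toy_counts)
print(f"{pct_total}% of the total comes from {pct_items}% of the items.")
```

With these invented counts, the two largest of the ten items account for exactly 80% of the total, which is the textbook 80-20 split.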


### 🔬 Checkpoint Tests 🔬

In [15]:
### TEST CASE for Pareto Principle
tada = "\N{PARTY POPPER}"

assert("confirmed_total" in vars())
assert("confirmed_80pct" in vars())
assert(confirmed_total > 2.9e8)
                         
assert("df_countries_sorted" in vars())
assert("Cumulative Confirmed" in df_countries_sorted)
assert(max(df_countries_sorted["Cumulative Confirmed"]) == sum(df_countries_sorted["Confirmed"]))
assert(min(df_countries_sorted["Cumulative Confirmed"]) == df_countries_sorted.iloc[0]["Confirmed"])

assert("df_80pct" in vars())
assert("US" in df_80pct["Country_Region"].unique())
assert("India" in df_80pct["Country_Region"].unique())
assert("Tonga" not in df_80pct["Country_Region"].unique())

assert(sum(df_80pct["Confirmed"]) / sum(df_countries["Confirmed"]) > 0.7)
assert(sum(df_80pct["Confirmed"]) / sum(df_countries["Confirmed"]) < 0.9)

assert(len(df_80pct["Confirmed"]) / len(df_countries["Confirmed"]) > 0.1)
assert(len(df_80pct["Confirmed"]) / len(df_countries["Confirmed"]) < 0.2)

print(f"{tada} All Tests Passed! {tada}")

🎉 All Tests Passed! 🎉


<hr style="color: #DD3403;">

## Submission

You're almost done!  All you need to do is to commit your lab to GitHub and run the GitHub Actions Grader:

1.  ⚠️ **Make certain to save your work.** ⚠️ To do this, go to **File => Save All**

2.  After you have saved, exit this notebook and return to https://discovery.cs.illinois.edu/microproject/covid-data-from-github/ and complete the section **"Commit and Grade Your Notebook"**.

3. If you see a 100% grade result on your GitHub Action, you've completed this MicroProject! 🎉
