# Assignment 3 - Pandas Data Analysis Practice

*This assignment is a part of the course ["Data Analysis with Python: Zero to Pandas"](https://jovian.ml/learn/data-analysis-with-python-zero-to-pandas)*

- By [Tushar Nankani](https://www.linkedin.com/in/tusharnankani/)

In [1]:
import pandas as pd

In this assignment, we're going to analyze and operate on data from a CSV file. Let's begin by downloading the CSV file.

In [2]:
from urllib.request import urlretrieve

urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/countries.csv', 
            'countries.csv')

('countries.csv', <http.client.HTTPMessage at 0x203f6651988>)

## Quick Hack: [*Reference*](https://www.shanelynn.ie/using-pandas-dataframe-creating-editing-viewing-data-in-python/)
#### Quickly lowercase and snake_case all column names in a DataFrame
- A tidying step that gives column names a standard, lowercase snake_case format. When loading data from potentially unstructured data sets, it can be useful to lowercase all column names and replace spaces with underscores.
```python
data = pd.read_csv("https://shanelynnwebsite-mid9n9g1q9y8tt.netdna-ssl.com/path/to/csv/file.csv")
# rename returns a new DataFrame, so assign the result back
data = data.rename(columns=lambda x: x.lower().replace(' ', '_'))
```

Let's load the data from the CSV file into a Pandas data frame.

In [3]:
countries_df = pd.read_csv('countries.csv')

In [4]:
countries_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita
0,Afghanistan,Asia,38928341.0,64.83,0.50,1803.987
1,Albania,Europe,2877800.0,78.57,2.89,11803.431
2,Algeria,Africa,43851043.0,76.88,1.90,13913.839
3,Andorra,Europe,77265.0,83.73,,
4,Angola,Africa,32866268.0,61.15,,5819.495
...,...,...,...,...,...,...
205,Vietnam,Asia,97338583.0,75.40,2.60,6171.884
206,Western Sahara,Africa,597330.0,70.26,,
207,Yemen,Asia,29825968.0,66.12,0.70,1479.147
208,Zambia,Africa,18383956.0,63.89,2.00,3689.251


In [5]:
countries_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 210 entries, 0 to 209
Data columns (total 6 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   location                    210 non-null    object 
 1   continent                   210 non-null    object 
 2   population                  210 non-null    float64
 3   life_expectancy             207 non-null    float64
 4   hospital_beds_per_thousand  164 non-null    float64
 5   gdp_per_capita              183 non-null    float64
dtypes: float64(4), object(2)
memory usage: 10.0+ KB


**Q: How many countries does the dataframe contain?**

*Hint: Use the `.shape` attribute.*

In [6]:
num_countries = countries_df.shape[0]

In [7]:
print('There are {} countries in the dataset'.format(num_countries))

There are 210 countries in the dataset
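
As a small aside, `.shape` is an attribute that holds a `(rows, columns)` tuple, so it is read without parentheses. The sketch below uses a made-up three-row frame (`toy_df` is a hypothetical stand-in, not the assignment data) to show it next to `len`:

```python
import pandas as pd

# Hypothetical toy frame standing in for countries_df
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C'],
    'population': [10.0, 20.0, 30.0],
})

# shape is a (rows, columns) tuple; no call parentheses are needed
n_rows, n_cols = toy_df.shape
print(n_rows, n_cols)  # 3 2

# len() on a DataFrame also returns the number of rows
print(len(toy_df))     # 3
```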


**Q: Retrieve a list of continents from the dataframe.**

*Hint: Use the `.unique` method of a series.*

In [8]:
continents = countries_df.continent.unique()

In [9]:
continents

array(['Asia', 'Europe', 'Africa', 'North America', 'South America',
       'Oceania'], dtype=object)

**Q: What is the total population of all the countries listed in this dataset?**

In [10]:
total_population = countries_df.population.sum()

In [11]:
print('The total population is {}.'.format(int(total_population)))

The total population is 7757980095.


**Q: (Optional) What is the overall life expectancy across the world?**

*Hint: You'll need to take a weighted average of life expectancy using populations as weights.*

In [12]:
overall_life_expectancy = (countries_df.life_expectancy * countries_df.population).sum() / total_population

In [13]:
overall_life_expectancy

72.72165193409664
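
One caveat: `info()` above reports only 207 non-null `life_expectancy` values out of 210 rows. The product with a missing value is `NaN` and is skipped by `.sum()`, while the denominator still counts those populations. The sketch below, on a made-up frame (`toy_df` and its numbers are hypothetical), compares that calculation with one that restricts the weights to rows where life expectancy is known. How much this shifts the figure for the real dataset has not been checked here.

```python
import pandas as pd

# Hypothetical toy data: the third row has no life expectancy
toy_df = pd.DataFrame({
    'population': [100.0, 300.0, 600.0],
    'life_expectancy': [70.0, 80.0, None],
})

# Same pattern as the cell above: the NaN product is skipped in the
# numerator, but the denominator still includes all populations
naive = (toy_df.life_expectancy * toy_df.population).sum() / toy_df.population.sum()

# Alternative: use only rows with a known life expectancy as weights
known = toy_df.dropna(subset=['life_expectancy'])
weighted = (known.life_expectancy * known.population).sum() / known.population.sum()

print(naive)     # 31.0
print(weighted)  # 77.5
```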

**Q: Create a dataframe containing 10 countries with the highest population.**

*Hint: Chain the `sort_values` and `head` methods.*

In [14]:
most_populous_df = countries_df.sort_values('population', ascending=False).head(10)

In [15]:
most_populous_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita
41,China,Asia,1439324000.0,76.91,4.34,15308.712
90,India,Asia,1380004000.0,69.66,0.53,6426.674
199,United States,North America,331002600.0,78.86,2.77,54225.446
91,Indonesia,Asia,273523600.0,71.72,1.04,11188.744
145,Pakistan,Asia,220892300.0,67.27,0.6,5034.708
27,Brazil,South America,212559400.0,75.88,2.2,14103.452
141,Nigeria,Africa,206139600.0,54.69,,5338.454
15,Bangladesh,Asia,164689400.0,72.59,0.8,3523.984
157,Russia,Europe,145934500.0,72.58,8.05,24765.954
125,Mexico,North America,128932800.0,75.05,1.38,17336.469
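
For reference, pandas also offers `DataFrame.nlargest`, which gives the same top rows as chaining `sort_values` and `head`. A minimal sketch on a hypothetical four-row frame (`toy_df` is not the assignment data):

```python
import pandas as pd

# Hypothetical toy frame
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'population': [5.0, 50.0, 20.0, 1.0],
})

# Approach used in the assignment: sort descending, keep the first rows
by_sort = toy_df.sort_values('population', ascending=False).head(2)

# Alternative: ask directly for the rows with the largest values
by_nlargest = toy_df.nlargest(2, 'population')

print(list(by_sort.location))      # ['B', 'C']
print(list(by_nlargest.location))  # ['B', 'C']
```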


**Q: Add a new column in `countries_df` to record the overall GDP per country (product of population & per capita GDP).**



In [16]:
countries_df['gdp'] = (countries_df.population * countries_df.gdp_per_capita)

In [17]:
countries_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp
0,Afghanistan,Asia,38928341.0,64.83,0.50,1803.987,7.022622e+10
1,Albania,Europe,2877800.0,78.57,2.89,11803.431,3.396791e+10
2,Algeria,Africa,43851043.0,76.88,1.90,13913.839,6.101364e+11
3,Andorra,Europe,77265.0,83.73,,,
4,Angola,Africa,32866268.0,61.15,,5819.495,1.912651e+11
...,...,...,...,...,...,...,...
205,Vietnam,Asia,97338583.0,75.40,2.60,6171.884,6.007624e+11
206,Western Sahara,Africa,597330.0,70.26,,,
207,Yemen,Asia,29825968.0,66.12,0.70,1479.147,4.411699e+10
208,Zambia,Africa,18383956.0,63.89,2.00,3689.251,6.782303e+10


**Q: (Optional) Create a dataframe containing 10 countries with the lowest GDP per capita, among the countries with population greater than 100 million.**

In [18]:
required_df = countries_df[countries_df.population > 1e8].sort_values('gdp_per_capita').head(10)

In [19]:
required_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp
63,Ethiopia,Africa,114963600.0,66.6,0.3,1729.927,198878600000.0
15,Bangladesh,Asia,164689400.0,72.59,0.8,3523.984,580362800000.0
145,Pakistan,Asia,220892300.0,67.27,0.6,5034.708,1112128000000.0
141,Nigeria,Africa,206139600.0,54.69,,5338.454,1100467000000.0
90,India,Asia,1380004000.0,69.66,0.53,6426.674,8868838000000.0
151,Philippines,Asia,109581100.0,71.23,1.0,7599.188,832727300000.0
58,Egypt,Africa,102334400.0,71.99,1.6,10550.206,1079649000000.0
91,Indonesia,Asia,273523600.0,71.72,1.04,11188.744,3060386000000.0
27,Brazil,South America,212559400.0,75.88,2.2,14103.452,2997821000000.0
41,China,Asia,1439324000.0,76.91,4.34,15308.712,22034190000000.0


**Q: Create a data frame that counts the number of countries in each continent.**

*Hint: Use `groupby`, select the `location` column and aggregate using `count`.*

In [20]:
country_counts_df = countries_df.groupby('continent').count()

In [21]:
country_counts_df

continent,location,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp
Africa,55,55,55,40,53,53
Asia,47,47,47,43,45,45
Europe,51,51,48,43,42,42
North America,36,36,36,23,27,27
Oceania,8,8,8,3,4,4
South America,13,13,13,12,12,12
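
The output above counts non-null values in every column, which is why the numbers differ between columns. The hint suggests selecting the `location` column first. A minimal sketch of that variant on a hypothetical frame (`toy_df` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with some missing GDP values
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C', 'D'],
    'continent': ['Asia', 'Asia', 'Europe', 'Africa'],
    'gdp_per_capita': [1.0, None, 3.0, None],
})

# Counting every column: each cell is the number of non-null values
counts_all = toy_df.groupby('continent').count()

# Selecting `location` first gives one count per continent
counts = toy_df.groupby('continent')['location'].count()

print(counts.to_dict())  # {'Africa': 1, 'Asia': 2, 'Europe': 1}
```

Because `location` has no missing values in the dataset (210 non-null out of 210), its count is the number of countries in each group.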


**Q: Create a data frame showing the total population of each continent.**

*Hint: Use `groupby`, select the population column and aggregate using `sum`.*

In [22]:
continent_populations_df = countries_df.groupby('continent').sum()

In [23]:
continent_populations_df

continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp
Africa,1339424000.0,3532.41,60.22,288523.368,6149204000000.0
Asia,4607388000.0,3506.05,124.93,1032210.905,56761880000000.0
Europe,748506200.0,3829.4,222.078,1401145.971,24358850000000.0
North America,591242500.0,2760.6,53.28,584691.58,22662970000000.0
Oceania,40958320.0,609.69,8.75,93260.722,1354559000000.0
South America,430461100.0,982.53,24.82,166089.423,6258200000000.0
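
Summing every column also adds up values such as `life_expectancy`, where a total has no useful meaning. Following the hint, the population column can be selected before aggregating. A sketch on a hypothetical frame (`toy_df` is made up):

```python
import pandas as pd

# Hypothetical toy frame
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C'],
    'continent': ['Asia', 'Asia', 'Europe'],
    'population': [10.0, 20.0, 5.0],
    'life_expectancy': [70.0, 80.0, 75.0],
})

# The double brackets keep the result a DataFrame with a single column
continent_pop = toy_df.groupby('continent')[['population']].sum()

print(continent_pop.loc['Asia', 'population'])    # 30.0
print(continent_pop.loc['Europe', 'population'])  # 5.0
```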


Let's download another CSV file containing overall Covid-19 stats for various countries, and read the data into another Pandas data frame.

In [24]:
urlretrieve('https://hub.jovian.ml/wp-content/uploads/2020/09/covid-countries-data.csv', 
            'covid-countries-data.csv')

('covid-countries-data.csv', <http.client.HTTPMessage at 0x203f673d688>)

In [25]:
covid_data_df = pd.read_csv('covid-countries-data.csv')

In [26]:
covid_data_df

Unnamed: 0,location,total_cases,total_deaths,total_tests
0,Afghanistan,38243.0,1409.0,
1,Albania,9728.0,296.0,
2,Algeria,45158.0,1525.0,
3,Andorra,1199.0,53.0,
4,Angola,2729.0,109.0,
...,...,...,...,...
207,Western Sahara,766.0,1.0,
208,World,26059065.0,863535.0,
209,Yemen,1976.0,571.0,
210,Zambia,12415.0,292.0,


In [27]:
covid_data_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 212 entries, 0 to 211
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   location      212 non-null    object 
 1   total_cases   211 non-null    float64
 2   total_deaths  211 non-null    float64
 3   total_tests   90 non-null     float64
dtypes: float64(3), object(1)
memory usage: 6.8+ KB


**Q: Count the number of countries for which the `total_tests` data is missing.**

*Hint: Use the `.isna` method.*

In [28]:
covid_data_df[covid_data_df.total_tests.isna()]

Unnamed: 0,location,total_cases,total_deaths,total_tests
0,Afghanistan,38243.0,1409.0,
1,Albania,9728.0,296.0,
2,Algeria,45158.0,1525.0,
3,Andorra,1199.0,53.0,
4,Angola,2729.0,109.0,
...,...,...,...,...
205,Venezuela,48883.0,398.0,
207,Western Sahara,766.0,1.0,
208,World,26059065.0,863535.0,
209,Yemen,1976.0,571.0,


In [29]:
total_tests_missing = covid_data_df[covid_data_df.total_tests.isna()].shape[0]

In [30]:
print("The data for total tests is missing for {} countries.".format(int(total_tests_missing)))

The data for total tests is missing for 122 countries.
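
The same count can be obtained without building the filtered data frame, since `True` values in a boolean series sum as 1. A small sketch with a hypothetical frame (`toy_df` is made up):

```python
import pandas as pd

# Hypothetical toy frame: two of three rows have no test count
toy_df = pd.DataFrame({
    'location': ['A', 'B', 'C'],
    'total_tests': [100.0, None, None],
})

# Approach used above: filter the rows, then count them
missing_by_filter = toy_df[toy_df.total_tests.isna()].shape[0]

# Shorter alternative: sum the boolean mask directly
missing_by_sum = int(toy_df.total_tests.isna().sum())

print(missing_by_filter, missing_by_sum)  # 2 2
```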


Let's merge the two data frames, and compute some more metrics.

**Q: Merge `countries_df` with `covid_data_df` on the `location` column.**

*Hint: Use the `.merge` method on `countries_df`.*

In [31]:
countries_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp
0,Afghanistan,Asia,38928341.0,64.83,0.50,1803.987,7.022622e+10
1,Albania,Europe,2877800.0,78.57,2.89,11803.431,3.396791e+10
2,Algeria,Africa,43851043.0,76.88,1.90,13913.839,6.101364e+11
3,Andorra,Europe,77265.0,83.73,,,
4,Angola,Africa,32866268.0,61.15,,5819.495,1.912651e+11
...,...,...,...,...,...,...,...
205,Vietnam,Asia,97338583.0,75.40,2.60,6171.884,6.007624e+11
206,Western Sahara,Africa,597330.0,70.26,,,
207,Yemen,Asia,29825968.0,66.12,0.70,1479.147,4.411699e+10
208,Zambia,Africa,18383956.0,63.89,2.00,3689.251,6.782303e+10


In [32]:
covid_data_df

Unnamed: 0,location,total_cases,total_deaths,total_tests
0,Afghanistan,38243.0,1409.0,
1,Albania,9728.0,296.0,
2,Algeria,45158.0,1525.0,
3,Andorra,1199.0,53.0,
4,Angola,2729.0,109.0,
...,...,...,...,...
207,Western Sahara,766.0,1.0,
208,World,26059065.0,863535.0,
209,Yemen,1976.0,571.0,
210,Zambia,12415.0,292.0,


In [33]:
combined_df = countries_df.merge(covid_data_df, on='location')

In [34]:
combined_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests
0,Afghanistan,Asia,38928341.0,64.83,0.50,1803.987,7.022622e+10,38243.0,1409.0,
1,Albania,Europe,2877800.0,78.57,2.89,11803.431,3.396791e+10,9728.0,296.0,
2,Algeria,Africa,43851043.0,76.88,1.90,13913.839,6.101364e+11,45158.0,1525.0,
3,Andorra,Europe,77265.0,83.73,,,,1199.0,53.0,
4,Angola,Africa,32866268.0,61.15,,5819.495,1.912651e+11,2729.0,109.0,
...,...,...,...,...,...,...,...,...,...,...
205,Vietnam,Asia,97338583.0,75.40,2.60,6171.884,6.007624e+11,1046.0,35.0,261004.0
206,Western Sahara,Africa,597330.0,70.26,,,,766.0,1.0,
207,Yemen,Asia,29825968.0,66.12,0.70,1479.147,4.411699e+10,1976.0,571.0,
208,Zambia,Africa,18383956.0,63.89,2.00,3689.251,6.782303e+10,12415.0,292.0,
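
By default `merge` performs an inner join, so rows whose `location` appears in only one frame (for example the `World` row in `covid_data_df`) are dropped. One way to see which rows do not match is an outer join with `indicator=True`. The sketch below uses two hypothetical frames (`countries` and `covid` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frames; 'C' and 'World' each appear in only one of them
countries = pd.DataFrame({
    'location': ['A', 'B', 'C'],
    'population': [1.0, 2.0, 3.0],
})
covid = pd.DataFrame({
    'location': ['A', 'B', 'World'],
    'total_cases': [10.0, 20.0, 30.0],
})

# Default inner join: only locations present in both frames survive
inner = countries.merge(covid, on='location')

# Outer join with an indicator column named `_merge`
checked = countries.merge(covid, on='location', how='outer', indicator=True)
unmatched = checked[checked['_merge'] != 'both']

print(len(inner))                  # 2
print(sorted(unmatched.location))  # ['C', 'World']
```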


**Q: Add columns `tests_per_million`, `cases_per_million` and `deaths_per_million` into `combined_df`.**

In [35]:
combined_df['tests_per_million'] = combined_df['total_tests'] * 1e6 / combined_df['population']

In [36]:
combined_df.sample(5)

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests,tests_per_million
190,Togo,Africa,8278737.0,61.04,0.7,1429.813,11837050000.0,1434.0,30.0,54709.0,6608.375166
30,Bulgaria,Europe,6948445.0,75.05,7.454,18563.307,128986100000.0,16454.0,642.0,415797.0,59840.295203
162,San Marino,Europe,33938.0,84.97,3.8,56861.47,1929765000.0,735.0,42.0,,
69,French Polynesia,Oceania,280904.0,77.66,,,,596.0,0.0,,
81,Guernsey,Europe,67052.0,,,,,252.0,13.0,,


In [37]:
combined_df['cases_per_million'] = combined_df['total_cases'] * 1e6 / combined_df['population']

In [38]:
combined_df['deaths_per_million'] = combined_df['total_deaths'] * 1e6 / combined_df['population']

In [39]:
combined_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests,tests_per_million,cases_per_million,deaths_per_million
0,Afghanistan,Asia,38928341.0,64.83,0.50,1803.987,7.022622e+10,38243.0,1409.0,,,982.394806,36.194710
1,Albania,Europe,2877800.0,78.57,2.89,11803.431,3.396791e+10,9728.0,296.0,,,3380.359997,102.856349
2,Algeria,Africa,43851043.0,76.88,1.90,13913.839,6.101364e+11,45158.0,1525.0,,,1029.804468,34.776824
3,Andorra,Europe,77265.0,83.73,,,,1199.0,53.0,,,15518.022390,685.950948
4,Angola,Africa,32866268.0,61.15,,5819.495,1.912651e+11,2729.0,109.0,,,83.033462,3.316470
...,...,...,...,...,...,...,...,...,...,...,...,...,...
205,Vietnam,Asia,97338583.0,75.40,2.60,6171.884,6.007624e+11,1046.0,35.0,261004.0,2681.403324,10.745996,0.359570
206,Western Sahara,Africa,597330.0,70.26,,,,766.0,1.0,,,1282.373228,1.674116
207,Yemen,Asia,29825968.0,66.12,0.70,1479.147,4.411699e+10,1976.0,571.0,,,66.250993,19.144391
208,Zambia,Africa,18383956.0,63.89,2.00,3689.251,6.782303e+10,12415.0,292.0,,,675.317108,15.883415
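
The three per-million columns follow one formula: `total * 1e6 / population`. As a sketch, the repetition could be folded into a small helper. `per_million` and `toy_df` are hypothetical names introduced here and are not part of the assignment:

```python
import pandas as pd

def per_million(df, column):
    """Scale a total column to a rate per million people (hypothetical helper)."""
    return df[column] * 1e6 / df['population']

# Hypothetical toy frame; the second row has no case count
toy_df = pd.DataFrame({
    'population': [2_000_000.0, 500_000.0],
    'total_cases': [100.0, None],
    'total_deaths': [4.0, 1.0],
})

for col in ['total_cases', 'total_deaths']:
    # 'total_cases' -> 'cases_per_million', and so on
    new_name = col.replace('total_', '') + '_per_million'
    toy_df[new_name] = per_million(toy_df, col)

print(toy_df['deaths_per_million'].tolist())  # [2.0, 2.0]
```

Missing totals stay missing: the `cases_per_million` value for the second row is `NaN`, which matches the empty cells in the output above.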


**Q: Create a dataframe with 10 countries that have the highest number of tests per million people.**

In [40]:
highest_tests_df = combined_df.sort_values('tests_per_million', ascending=False).head(10)

In [41]:
highest_tests_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests,tests_per_million,cases_per_million,deaths_per_million
197,United Arab Emirates,Asia,9890400.0,77.97,1.2,67293.483,665559500000.0,71540.0,387.0,7177430.0,725696.635121,7233.276713,39.128852
14,Bahrain,Asia,1701583.0,77.29,2.0,43290.705,73662730000.0,52440.0,190.0,1118837.0,657527.137965,30818.36149,111.66073
115,Luxembourg,Europe,625976.0,82.25,4.51,94277.965,59015740000.0,7928.0,124.0,385820.0,616349.508607,12665.022301,198.090662
122,Malta,Europe,441539.0,82.53,4.485,36513.323,16122060000.0,1931.0,13.0,188539.0,427004.183096,4373.339614,29.442473
53,Denmark,Europe,5792203.0,80.9,2.5,46682.515,270394600000.0,17195.0,626.0,2447911.0,422621.755488,2968.645954,108.076323
96,Israel,Asia,8655541.0,82.97,2.99,33132.32,286778200000.0,122539.0,969.0,2353984.0,271962.665303,14157.289533,111.951408
89,Iceland,Europe,341250.0,82.99,2.91,46482.958,15862310000.0,2121.0,10.0,88829.0,260304.761905,6215.384615,29.304029
157,Russia,Europe,145934460.0,72.58,8.05,24765.954,3614206000000.0,1005000.0,17414.0,37176827.0,254750.159763,6886.653091,119.327539
199,United States,North America,331002647.0,78.86,2.77,54225.446,17948770000000.0,6114406.0,185744.0,83898416.0,253467.507769,18472.377957,561.155633
10,Australia,Oceania,25499881.0,83.44,3.84,44648.71,1138537000000.0,25923.0,663.0,6255797.0,245326.517406,1016.592979,26.000121


**Q: Create a dataframe with 10 countries that have the highest number of positive cases per million people.**

In [42]:
highest_cases_df = combined_df.sort_values('cases_per_million', ascending=False).head(10)

In [43]:
highest_cases_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests,tests_per_million,cases_per_million,deaths_per_million
155,Qatar,Asia,2881060.0,80.23,1.2,116935.6,336898500000.0,119206.0,199.0,634745.0,220316.48074,41375.74365,69.0718
14,Bahrain,Asia,1701583.0,77.29,2.0,43290.705,73662730000.0,52440.0,190.0,1118837.0,657527.137965,30818.36149,111.66073
147,Panama,North America,4314768.0,78.51,2.3,22267.037,96077100000.0,94084.0,2030.0,336345.0,77952.04748,21805.112117,470.477208
40,Chile,South America,19116209.0,80.18,2.11,22767.037,435219400000.0,414739.0,11344.0,2458762.0,128621.841287,21695.671982,593.4231
162,San Marino,Europe,33938.0,84.97,3.8,56861.47,1929765000.0,735.0,42.0,,,21657.13949,1237.550828
9,Aruba,North America,106766.0,76.29,,35973.781,3840777000.0,2211.0,12.0,,,20708.839893,112.395332
105,Kuwait,Asia,4270563.0,75.49,2.0,65530.537,279852300000.0,86478.0,535.0,621616.0,145558.325682,20249.789079,125.276222
150,Peru,South America,32971846.0,76.74,1.6,12236.706,403466800000.0,663437.0,29259.0,584232.0,17719.117092,20121.318048,887.393445
27,Brazil,South America,212559409.0,75.88,2.2,14103.452,2997821000000.0,3997865.0,123780.0,4797948.0,22572.268255,18808.224105,582.331314
199,United States,North America,331002647.0,78.86,2.77,54225.446,17948770000000.0,6114406.0,185744.0,83898416.0,253467.507769,18472.377957,561.155633


**Q: Create a dataframe with 10 countries that have the highest number of deaths per million people.**

In [44]:
highest_deaths_df = combined_df.sort_values('deaths_per_million', ascending=False).head(10)

In [45]:
highest_deaths_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests,tests_per_million,cases_per_million,deaths_per_million
162,San Marino,Europe,33938.0,84.97,3.8,56861.47,1929765000.0,735.0,42.0,,,21657.13949,1237.550828
150,Peru,South America,32971846.0,76.74,1.6,12236.706,403466800000.0,663437.0,29259.0,584232.0,17719.117092,20121.318048,887.393445
18,Belgium,Europe,11589616.0,81.63,5.64,42658.576,494396500000.0,85817.0,9898.0,2281853.0,196887.713967,7404.645676,854.040375
3,Andorra,Europe,77265.0,83.73,,,,1199.0,53.0,,,15518.02239,685.950948
177,Spain,Europe,46754783.0,83.56,2.97,34272.36,1602397000000.0,479554.0,29194.0,6416533.0,137238.001939,10256.790198,624.406705
198,United Kingdom,Europe,67886004.0,81.32,2.54,39753.244,2698689000000.0,338676.0,41514.0,13447568.0,198090.434075,4988.89285,611.525168
40,Chile,South America,19116209.0,80.18,2.11,22767.037,435219400000.0,414739.0,11344.0,2458762.0,128621.841287,21695.671982,593.4231
97,Italy,Europe,60461828.0,83.51,3.18,35220.084,2129471000000.0,271515.0,35497.0,5214766.0,86248.897403,4490.684602,587.097697
27,Brazil,South America,212559409.0,75.88,2.2,14103.452,2997821000000.0,3997865.0,123780.0,4797948.0,22572.268255,18808.224105,582.331314
182,Sweden,Europe,10099270.0,82.8,2.22,46949.283,474153500000.0,84532.0,5820.0,,,8370.109919,576.279276


**(Optional) Q: Count the number of countries that feature in both the lists of "highest number of tests per million" and "highest number of cases per million".**

In [46]:
highest_tests_df

Unnamed: 0,location,continent,population,life_expectancy,hospital_beds_per_thousand,gdp_per_capita,gdp,total_cases,total_deaths,total_tests,tests_per_million,cases_per_million,deaths_per_million
197,United Arab Emirates,Asia,9890400.0,77.97,1.2,67293.483,665559500000.0,71540.0,387.0,7177430.0,725696.635121,7233.276713,39.128852
14,Bahrain,Asia,1701583.0,77.29,2.0,43290.705,73662730000.0,52440.0,190.0,1118837.0,657527.137965,30818.36149,111.66073
115,Luxembourg,Europe,625976.0,82.25,4.51,94277.965,59015740000.0,7928.0,124.0,385820.0,616349.508607,12665.022301,198.090662
122,Malta,Europe,441539.0,82.53,4.485,36513.323,16122060000.0,1931.0,13.0,188539.0,427004.183096,4373.339614,29.442473
53,Denmark,Europe,5792203.0,80.9,2.5,46682.515,270394600000.0,17195.0,626.0,2447911.0,422621.755488,2968.645954,108.076323
96,Israel,Asia,8655541.0,82.97,2.99,33132.32,286778200000.0,122539.0,969.0,2353984.0,271962.665303,14157.289533,111.951408
89,Iceland,Europe,341250.0,82.99,2.91,46482.958,15862310000.0,2121.0,10.0,88829.0,260304.761905,6215.384615,29.304029
157,Russia,Europe,145934460.0,72.58,8.05,24765.954,3614206000000.0,1005000.0,17414.0,37176827.0,254750.159763,6886.653091,119.327539
199,United States,North America,331002647.0,78.86,2.77,54225.446,17948770000000.0,6114406.0,185744.0,83898416.0,253467.507769,18472.377957,561.155633
10,Australia,Oceania,25499881.0,83.44,3.84,44648.71,1138537000000.0,25923.0,663.0,6255797.0,245326.517406,1016.592979,26.000121


In [47]:
# using the `isin` method: Checks whether `values` are contained in Series.
highest_tests_df.location.isin(highest_cases_df.location)

197    False
14      True
115    False
122    False
53     False
96     False
89     False
157    False
199     True
10     False
Name: location, dtype: bool

In [48]:
required_count = highest_tests_df.location.isin(highest_cases_df.location).sum()

In [49]:
print(f"The number of countries that feature in both lists is {required_count}.")

The number of countries that feature in both lists is 2.
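
To see which countries overlap rather than only how many, a set intersection on the `location` columns is one option. A sketch with two hypothetical lists (`tests_top` and `cases_top` are made up and use placeholder names):

```python
import pandas as pd

# Hypothetical toy "top" lists with placeholder names
tests_top = pd.DataFrame({'location': ['A', 'B', 'C']})
cases_top = pd.DataFrame({'location': ['D', 'B', 'C']})

# Count, as in the cell above: True values sum as 1
overlap_count = int(tests_top.location.isin(cases_top.location).sum())

# Names of the shared entries via set intersection
overlap_names = sorted(set(tests_top.location) & set(cases_top.location))

print(overlap_count)  # 2
print(overlap_names)  # ['B', 'C']
```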


**(Optional) Q: Count the number of countries that feature in both the lists "20 countries with lowest GDP per capita" and "20 countries with the lowest number of hospital beds per thousand population". Only consider countries with a population higher than 10 million while creating the list.**

In [50]:
lowest_gdp_per_capita = combined_df[combined_df['population'] > 1e7].sort_values('gdp_per_capita').head(20)
# lowest_gdp_per_capita

In [51]:
lowest_hospital_beds = combined_df[combined_df['population'] > 1e7].sort_values('hospital_beds_per_thousand').head(20)
# lowest_hospital_beds

In [52]:
lowest_gdp_per_capita.location.isin(lowest_hospital_beds.location)

32      True
52     False
140     True
118    False
132     True
117     True
207     True
176    False
85      True
195     True
31      True
63      True
39     False
0       True
158    False
209    False
82      True
121     True
20      True
135     True
Name: location, dtype: bool

In [53]:
required_ans = lowest_gdp_per_capita.location.isin(lowest_hospital_beds.location).sum()

print(f"The number of countries that feature in both lists is {required_ans}.")

The number of countries that feature in both lists is 14.


In [54]:
combined_df.to_csv('final_results.csv', index=False)

### Thank You!
- Tushar Nankani