# Wrangling Data with `pandas`
***
SC207 Social Data Science, University of Essex, 2021/2022

## This Jupyter Notebook
* A hands-on tutorial on how to load, describe, access, and summarise data with the Python package `pandas` using a real-world data set.
* Analysis of the [Google COVID-19 Community Mobility Reports](https://www.google.com/covid19/mobility/), a large, anonymised, open data set of aggregate mobility trends that traces how global communities respond to Covid-19.
* Real-world examples and understanding of local mobility trends in the United Kingdom and Essex in comparison to other countries and counties.
* Open and reproducible research workflow.

### Python tools for data analysis

The Python data science community has developed an open source ecosystem of libraries for data science, including the `pandas` library for data loading, wrangling, and analysis, the `seaborn` library for data visualisation, and many other [data science libraries](https://github.com/krzjoa/awesome-python-data-science#data-manipulation). Think of Python libraries as tools that let you carry out data science tasks with ease and minimal programming, so that you can focus on scalable and reproducible analysis of social data.

### Getting started with [`pandas`](https://pandas.pydata.org/docs/getting_started/index.html#getting-started)

The `pandas` library:
* is a fast, powerful, and flexible open source tool for doing real world data analysis in Python.
* offers a diverse range of high-performance tools for data loading, cleaning, wrangling, merging, reshaping, and summarising.
* is the go-to data science library in Python.

<img src="https://upload.wikimedia.org/wikipedia/commons/e/ed/Pandas_logo.svg" title='Pandas Logo' width="400" height="200"/>

### Dataset: Google Covid-19 Community Mobility Reports (GCMR)
* Aggregated, anonymized sets of data that protect individual privacy.
* Shows trends of human mobility over time by country and region, across different categories of places, including retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. 
* For each place in a region, the data display the percentage change in visits for the reported date compared to a baseline day. Mobility changes are reported as a positive or negative percentage. An overview of the data from the Community Mobility Reports is provided [here](https://support.google.com/covid19-mobility/answer/9824897?hl=en&ref_topic=9822927).
* Provides an opportunity to explore how mobility trends have changed in response to non-pharmaceutical public health interventions (e.g., lockdowns, school closures) designed to reduce the spread of Covid-19.
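
As a rough illustration of what "percentage change compared to a baseline" means, the sketch below computes it for made-up visit counts. Google publishes only the percentages, not raw counts, so the numbers and the function name `percent_change_from_baseline` are hypothetical.

```python
def percent_change_from_baseline(visits, baseline):
    """Return the percentage change of `visits` relative to `baseline`."""
    return (visits - baseline) * 100 / baseline


# Hypothetical numbers: 80 visits on the reported date, 100 on the baseline day
change = percent_change_from_baseline(visits=80, baseline=100)
print(change)  # -20.0, i.e. 20% fewer visits than on the baseline day
```

A negative value therefore means fewer visits than on the baseline day, and a positive value means more.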

<img src="https://www.google.com/covid19/static/reports-icon-grid.png" title='Google Covid-19 Community Mobility Data' width="400" height="200"/>

## Importing the `pandas` library

We first import the `pandas` library and, by convention, give it the alias `pd`.

In [1]:
# We import the pandas library using Python's import statement
import pandas as pd
pd.__version__ # For reproducibility, we also check the version of the library 

'1.3.3'

We can now access all the functions and capabilities the `pandas` library provides.

## Loading your data

`pandas` supports many data file formats, including CSV, Excel, SQL, and JSON.
For details, see [How do I read and write tabular data?](https://pandas.pydata.org/docs/getting_started/intro_tutorials/02_read_write.html#min-tut-02-read-write)

<img src="https://pandas.pydata.org/docs/_images/02_io_readwrite.svg" width="800" height="400" >

### Loading data from the Web

To load the Covid-19 Community Mobility Reports data, there is no need to download the file to your local computer. Because the data set is provided as a comma-separated values (.csv) file, we just call the `read_csv()` function in `pandas` and specify the URL.


### What is a (`pandas`) function?

A function is a block of code that:

* takes input parameters
* performs a specific task
* returns an output.

The `pandas` function `read_csv()` takes a comma-separated values (csv) file as an input parameter, reads the file, and returns a `pandas` DataFrame.

We call a function by writing the function name followed by parentheses. The function `read_csv()` takes many input parameters, for example

* `sep`: the delimiter to use when reading the file; the default is a comma (`,`), but other possible delimiters include tab or space characters.
* `parse_dates`: the columns to be parsed as dates, for example `['date']`.
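
The two parameters above can be tried out on a tiny, made-up table held in memory. This is only a sketch: the text in `raw_text` and the name `toy_df` are invented for illustration, and `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# A tiny, made-up semicolon-separated "file" held in memory
raw_text = "date;visits\n2020-02-15;10\n2020-02-16;12\n"

# sep tells read_csv which delimiter separates the values;
# parse_dates converts the listed columns to a datetime type
toy_df = pd.read_csv(io.StringIO(raw_text), sep=";", parse_dates=["date"])

print(toy_df.dtypes)
```

Without `sep=";"`, each line would be read as a single column, and without `parse_dates` the `date` column would stay as plain text.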

### Getting help when needed

To learn more about a function, type its name followed by a question mark `?` in a Jupyter code cell. For example, to access help information about the `pandas` function `read_csv()`, type `pd.read_csv?`.
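
The question mark shortcut is specific to Jupyter and IPython. In any other Python session, the built-in `help()` function shows the same documentation; a minimal sketch (the variable name `first_lines` is just for illustration):

```python
import pandas as pd

# Print the full documentation of read_csv (works in any Python session)
help(pd.read_csv)

# The same text is stored in the function's docstring
first_lines = pd.read_csv.__doc__.strip().splitlines()[:3]
print(first_lines)
```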


The code below loads the most recent online version of the data. We also assign the loaded data set to a variable called `mobility_trends_df`.

In [2]:
# The code below loads the most recent online version of the data

mobility_trends_df = pd.read_csv('https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv', parse_dates = ['date'])

# Pandas represents tabular data as a DataFrame
mobility_trends_df



Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
0,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-15,0.0,4.0,5.0,0.0,2.0,1.0
1,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-16,1.0,4.0,4.0,1.0,2.0,1.0
2,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-17,-1.0,1.0,5.0,1.0,2.0,1.0
3,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-18,-2.0,1.0,5.0,0.0,2.0,1.0
4,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-19,-2.0,0.0,4.0,-1.0,2.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7465942,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-25,,,,,33.0,
7465943,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-26,,,,,57.0,
7465944,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-27,,,,,55.0,
7465945,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-28,,,,,48.0,


### Loading data from your local computer

In [3]:
# The same read_csv function can be used to load the file Global_Mobility_Report.csv from your computer 
# Prerequisite: the file needs to be pre-downloaded from https://www.google.com/covid19/mobility/
# Replace 'Downloads' with the actual folder in which the file is stored on your computer

# mobility_trends_downloaded = pd.read_csv('~/Downloads/Global_Mobility_Report.csv')
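
Because the full file has several million rows, it can be convenient to read only part of it while experimenting. The sketch below is an assumption about how one might do this, not part of the original workflow: it uses the `usecols` and `nrows` parameters of `read_csv()` on a made-up in-memory table (`raw_text`, `preview_df` are hypothetical names), with `io.StringIO` standing in for the file path.

```python
import io

import pandas as pd

# A made-up table standing in for a large file on disk
raw_text = (
    "country_region,date,workplaces\n"
    "X,2020-02-15,2\n"
    "X,2020-02-16,3\n"
    "Y,2020-02-15,-1\n"
)

preview_df = pd.read_csv(
    io.StringIO(raw_text),               # replace with a file path for a real file
    usecols=["country_region", "date"],  # read only these columns
    nrows=2,                             # read only the first two data rows
    parse_dates=["date"],
)
print(preview_df)
```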

### Pandas DataFrame

['A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.'](https://pandas.pydata.org/docs/getting_started/intro_tutorials/01_table_oriented.html#min-tut-01-tableoriented)
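
To make the definition concrete, here is a minimal sketch that builds a small DataFrame by hand. The column names and values are made up; each dictionary key becomes a column, and each column has its own data type.

```python
import pandas as pd

# A small, made-up table: each dictionary key becomes a column
toy_df = pd.DataFrame({
    "place": ["A", "B", "C"],     # text
    "visits": [10, 20, 30],       # integers
    "change": [-1.5, 0.0, 2.5],   # floating point values
})

print(toy_df)
print(toy_df.dtypes)  # one data type per column
```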

<img src="https://pandas.pydata.org/docs/_images/01_table_dataframe.svg" title='Pandas DataFrame' width="400" height="200"/>

## Viewing, Describing, and Accessing your Data

### Viewing data

In [4]:
# Show the first five rows using the method DataFrame.head()
mobility_trends_df.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
0,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-15,0.0,4.0,5.0,0.0,2.0,1.0
1,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-16,1.0,4.0,4.0,1.0,2.0,1.0
2,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-17,-1.0,1.0,5.0,1.0,2.0,1.0
3,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-18,-2.0,1.0,5.0,0.0,2.0,1.0
4,AE,United Arab Emirates,,,,,,ChIJvRKrsd9IXj4RpwoIwFYv0zM,2020-02-19,-2.0,0.0,4.0,-1.0,2.0,1.0


In [5]:
# Show the last five rows using the method DataFrame.tail()   
mobility_trends_df.tail()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
7465942,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-25,,,,,33.0,
7465943,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-26,,,,,57.0,
7465944,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-27,,,,,55.0,
7465945,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-28,,,,,48.0,
7465946,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-29,,,,,49.0,


In [6]:
# Specify the number of rows to return
mobility_trends_df.tail(10)

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
7465937,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-18,,,,,34.0,
7465938,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-19,,,,,49.0,
7465939,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-20,,,,,53.0,
7465940,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-21,,,,,41.0,
7465941,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-22,,,,,47.0,
7465942,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-25,,,,,33.0,
7465943,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-26,,,,,57.0,
7465944,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-27,,,,,55.0,
7465945,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-28,,,,,48.0,
7465946,ZW,Zimbabwe,Midlands Province,Kwekwe,,,,ChIJRcIZ3-FJNBkRRsj55IcLpfU,2021-10-29,,,,,49.0,


> # Try on your own—Exercise 1

Using the DataFrame `mobility_trends_df`, view the 
* top 20 rows
* last 30 rows

In [7]:
# Please write the code related to Exercise 1 in this cell   






### Describing your DataFrame

In [8]:
# Accessing columns using the DataFrame.columns attribute
mobility_trends_df.columns

Index(['country_region_code', 'country_region', 'sub_region_1', 'sub_region_2',
       'metro_area', 'iso_3166_2_code', 'census_fips_code', 'place_id', 'date',
       'retail_and_recreation_percent_change_from_baseline',
       'grocery_and_pharmacy_percent_change_from_baseline',
       'parks_percent_change_from_baseline',
       'transit_stations_percent_change_from_baseline',
       'workplaces_percent_change_from_baseline',
       'residential_percent_change_from_baseline'],
      dtype='object')

In [9]:
# Accessing the index using the DataFrame.index attribute
mobility_trends_df.index

RangeIndex(start=0, stop=7465947, step=1)

In [10]:
# Accessing the values using the DataFrame.values attribute 
mobility_trends_df.values

array([['AE', 'United Arab Emirates', nan, ..., 0.0, 2.0, 1.0],
       ['AE', 'United Arab Emirates', nan, ..., 1.0, 2.0, 1.0],
       ['AE', 'United Arab Emirates', nan, ..., 1.0, 2.0, 1.0],
       ...,
       ['ZW', 'Zimbabwe', 'Midlands Province', ..., nan, 55.0, nan],
       ['ZW', 'Zimbabwe', 'Midlands Province', ..., nan, 48.0, nan],
       ['ZW', 'Zimbabwe', 'Midlands Province', ..., nan, 49.0, nan]],
      dtype=object)

In [11]:
# Type of data structure
type(mobility_trends_df)

pandas.core.frame.DataFrame

In [12]:
# Dimensionality of a DataFrame  

mobility_trends_df.shape

(7465947, 15)

In [13]:
# Use the print function to display the number of rows and columns in a DataFrame 
print("\nThe Google COVID-19 Community Mobility Reports contain", 
      mobility_trends_df.shape[0], "rows and", mobility_trends_df.shape[1],"columns.")


The Google COVID-19 Community Mobility Reports contain 7465947 rows and 15 columns.
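
The attributes used above are easier to inspect on a table small enough to read in full. A sketch with a made-up DataFrame (`toy_df` is a hypothetical name):

```python
import pandas as pd

toy_df = pd.DataFrame({"place": ["A", "B", "C"], "visits": [10, 20, 30]})

print(list(toy_df.columns))  # column labels
print(list(toy_df.index))    # row labels (0, 1, 2 by default)
print(toy_df.values.shape)   # the underlying NumPy array has the same shape

# shape is a (rows, columns) tuple, so it can be unpacked into two variables
n_rows, n_columns = toy_df.shape
print(f"The toy table contains {n_rows} rows and {n_columns} columns.")
```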


In [14]:
# Information about a DataFrame
mobility_trends_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7465947 entries, 0 to 7465946
Data columns (total 15 columns):
 #   Column                                              Dtype         
---  ------                                              -----         
 0   country_region_code                                 object        
 1   country_region                                      object        
 2   sub_region_1                                        object        
 3   sub_region_2                                        object        
 4   metro_area                                          object        
 5   iso_3166_2_code                                     object        
 6   census_fips_code                                    float64       
 7   place_id                                            object        
 8   date                                                datetime64[ns]
 9   retail_and_recreation_percent_change_from_baseline  float64       
 10  grocery_and_pharma

### Accessing columns and rows in your data

#### Accessing columns
We can access columns via column name and column position.

*Accessing columns via column name*

In [15]:
# Get the country column and save it to its own variable
# The double square bracket option `[[]]` returns a DataFrame
mobility_trends_df_country = mobility_trends_df[['country_region']]

In [16]:
# Display the top five rows
mobility_trends_df_country.head()

Unnamed: 0,country_region
0,United Arab Emirates
1,United Arab Emirates
2,United Arab Emirates
3,United Arab Emirates
4,United Arab Emirates


In [17]:
# Display the type of data structure
type(mobility_trends_df_country)

pandas.core.frame.DataFrame

In [18]:
# The single square bracket option `[]` returns a Series
mobility_trends_df_country = mobility_trends_df['country_region']
mobility_trends_df_country.head()

0    United Arab Emirates
1    United Arab Emirates
2    United Arab Emirates
3    United Arab Emirates
4    United Arab Emirates
Name: country_region, dtype: object

In [19]:
# Display the type of data structure
type(mobility_trends_df_country)

pandas.core.series.Series
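
The difference between the two bracket styles can be seen side by side on a made-up table. In this sketch (all names are illustrative), the Series is one-dimensional, while the one-column DataFrame keeps two dimensions.

```python
import pandas as pd

toy_df = pd.DataFrame({"place": ["A", "B"], "visits": [10, 20]})

as_series = toy_df["visits"]   # single brackets: a one-dimensional Series
as_frame = toy_df[["visits"]]  # double brackets: a DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```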

In [20]:
# Accessing more than one column by using Python list syntax
mobility_trends_df_country_region_date = mobility_trends_df[['country_region', 'sub_region_1', 'date']]

In [21]:
# Display the top five rows
mobility_trends_df_country_region_date.head()

Unnamed: 0,country_region,sub_region_1,date
0,United Arab Emirates,,2020-02-15
1,United Arab Emirates,,2020-02-16
2,United Arab Emirates,,2020-02-17
3,United Arab Emirates,,2020-02-18
4,United Arab Emirates,,2020-02-19


---

> # Try on your own—Exercise 2
Access the column `country_region_code` from the DataFrame `mobility_trends_df`

In [22]:
# Please write the code related to Exercise 2 in this cell   






---

*Accessing columns via column position*

In [23]:
# Accessing columns via column position
mobility_trends_df_country_region_date = mobility_trends_df.iloc[:, [1, 2, 8]]

# Display the top five rows
mobility_trends_df_country_region_date.head()

Unnamed: 0,country_region,sub_region_1,date
0,United Arab Emirates,,2020-02-15
1,United Arab Emirates,,2020-02-16
2,United Arab Emirates,,2020-02-17
3,United Arab Emirates,,2020-02-18
4,United Arab Emirates,,2020-02-19


In [24]:
# Accessing a subset of rows and columns
mobility_trends_df_country_region_date_3rows = mobility_trends_df.iloc[0:3, [1, 2, 8]]
mobility_trends_df_country_region_date_3rows.head()

Unnamed: 0,country_region,sub_region_1,date
0,United Arab Emirates,,2020-02-15
1,United Arab Emirates,,2020-02-16
2,United Arab Emirates,,2020-02-17


#### Accessing rows

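Rows can be accessed by row label with `df.loc` and by integer position with `df.iloc`. Rows can also be filtered with a condition on a column, as the cells below show.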
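
The difference between labels and positions is easiest to see when the row labels are not the default 0, 1, 2. A minimal sketch with a made-up table and custom labels:

```python
import pandas as pd

toy_df = pd.DataFrame(
    {"visits": [10, 20, 30]},
    index=["a", "b", "c"],  # custom row labels
)

by_label = toy_df.loc["b", "visits"]  # the row labelled "b"
by_position = toy_df.iloc[1, 0]       # second row, first column

# Slices differ: .loc includes the end label, .iloc excludes the end position
label_slice = toy_df.loc["a":"b"]
position_slice = toy_df.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```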

In [25]:
# Before accessing particular rows, let's see the names of all countries in the dataset 
# by listing all unique values in the df['country_region'] column
mobility_trends_df.country_region.unique()

array(['United Arab Emirates', 'Afghanistan', 'Antigua and Barbuda',
       'Angola', 'Argentina', 'Austria', 'Australia', 'Aruba',
       'Bosnia and Herzegovina', 'Barbados', 'Bangladesh', 'Belgium',
       'Burkina Faso', 'Bulgaria', 'Bahrain', 'Benin', 'Bolivia',
       'Brazil', 'The Bahamas', 'Botswana', 'Belarus', 'Belize', 'Canada',
       'Switzerland', "Côte d'Ivoire", 'Chile', 'Cameroon', 'Colombia',
       'Costa Rica', 'Cape Verde', 'Czechia', 'Germany', 'Denmark',
       'Dominican Republic', 'Ecuador', 'Estonia', 'Egypt', 'Spain',
       'Finland', 'Fiji', 'France', 'Gabon', 'United Kingdom', 'Georgia',
       'Ghana', 'Greece', 'Guatemala', 'Guinea-Bissau', 'Hong Kong',
       'Honduras', 'Croatia', 'Haiti', 'Hungary', 'Indonesia', 'Ireland',
       'Israel', 'India', 'Iraq', 'Italy', 'Jamaica', 'Jordan', 'Japan',
       'Kenya', 'Kyrgyzstan', 'Cambodia', 'South Korea', 'Kuwait',
       'Kazakhstan', 'Laos', 'Lebanon', 'Liechtenstein', 'Sri Lanka',
       'Lithuania', '

In [26]:
# Accessing specific rows from a DataFrame
# We are interested in the data about the United Kingdom 
mobility_trends_df_country_UK = mobility_trends_df[mobility_trends_df['country_region']=='United Kingdom']
mobility_trends_df_country_UK.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
2746676,GB,United Kingdom,,,,,,ChIJqZHHQhE7WgIReiWIMkOg-MQ,2020-02-15,-12.0,-7.0,-35.0,-12.0,-4.0,2.0
2746677,GB,United Kingdom,,,,,,ChIJqZHHQhE7WgIReiWIMkOg-MQ,2020-02-16,-7.0,-6.0,-28.0,-7.0,-3.0,1.0
2746678,GB,United Kingdom,,,,,,ChIJqZHHQhE7WgIReiWIMkOg-MQ,2020-02-17,10.0,1.0,24.0,-2.0,-14.0,2.0
2746679,GB,United Kingdom,,,,,,ChIJqZHHQhE7WgIReiWIMkOg-MQ,2020-02-18,7.0,-1.0,20.0,-3.0,-14.0,2.0
2746680,GB,United Kingdom,,,,,,ChIJqZHHQhE7WgIReiWIMkOg-MQ,2020-02-19,6.0,-2.0,8.0,-4.0,-14.0,3.0


In [27]:
mobility_trends_df_country_UK.tail()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
3006267,GB,United Kingdom,York,,,GB-YOR,,ChIJh-IigLwxeUgRAKFv7Z75DAM,2021-10-25,25.0,24.0,113.0,-8.0,-46.0,8.0
3006268,GB,United Kingdom,York,,,GB-YOR,,ChIJh-IigLwxeUgRAKFv7Z75DAM,2021-10-26,25.0,27.0,123.0,-12.0,-45.0,9.0
3006269,GB,United Kingdom,York,,,GB-YOR,,ChIJh-IigLwxeUgRAKFv7Z75DAM,2021-10-27,22.0,22.0,117.0,-6.0,-43.0,8.0
3006270,GB,United Kingdom,York,,,GB-YOR,,ChIJh-IigLwxeUgRAKFv7Z75DAM,2021-10-28,23.0,24.0,125.0,-7.0,-43.0,8.0
3006271,GB,United Kingdom,York,,,GB-YOR,,ChIJh-IigLwxeUgRAKFv7Z75DAM,2021-10-29,11.0,21.0,67.0,2.0,-43.0,8.0


In [28]:
# Accessing data about multiple countries 
mobility_trends_df_countries = mobility_trends_df[mobility_trends_df['country_region'].isin(['United Kingdom', 'Germany', 'Italy', 'Sweden'])]
mobility_trends_df_countries.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
2389784,DE,Germany,,,,,,ChIJa76xwh5ymkcRW-WRjmtd6HU,2020-02-15,6.0,1.0,45.0,10.0,0.0,-1.0
2389785,DE,Germany,,,,,,ChIJa76xwh5ymkcRW-WRjmtd6HU,2020-02-16,7.0,10.0,9.0,6.0,-1.0,0.0
2389786,DE,Germany,,,,,,ChIJa76xwh5ymkcRW-WRjmtd6HU,2020-02-17,2.0,2.0,7.0,1.0,-2.0,0.0
2389787,DE,Germany,,,,,,ChIJa76xwh5ymkcRW-WRjmtd6HU,2020-02-18,2.0,2.0,10.0,1.0,-1.0,1.0
2389788,DE,Germany,,,,,,ChIJa76xwh5ymkcRW-WRjmtd6HU,2020-02-19,3.0,0.0,6.0,-1.0,-1.0,1.0


In [29]:
# Filter by two conditions (country and county) simultaneously
# First let's see the list of counties in the dataset
mobility_trends_df_country_UK.sub_region_1.unique()

array([nan, 'Aberdeen City', 'Aberdeenshire', 'Angus Council',
       'Antrim and Newtownabbey', 'Ards and North Down',
       'Argyll and Bute Council', 'Armagh City, Banbridge and Craigavon',
       'Bath and North East Somerset', 'Bedford', 'Belfast',
       'Blackburn with Darwen', 'Blackpool', 'Blaenau Gwent',
       'Borough of Halton', 'Bracknell Forest', 'Bridgend County Borough',
       'Brighton and Hove', 'Bristol City', 'Buckinghamshire',
       'Caerphilly County Borough', 'Cambridgeshire', 'Cardiff',
       'Carmarthenshire', 'Causeway Coast and Glens',
       'Central Bedfordshire', 'Ceredigion', 'Cheshire East',
       'Cheshire West and Chester', 'Clackmannanshire',
       'Conwy Principal Area', 'Cornwall', 'County Durham', 'Cumbria',
       'Darlington', 'Denbighshire', 'Derby', 'Derbyshire',
       'Derry and Strabane', 'Devon', 'Dorset', 'Dumfries and Galloway',
       'Dundee City Council', 'East Ayrshire Council',
       'East Dunbartonshire Council', 'East Lothi

In [30]:
# Access data about UK and Essex
mobility_trends_df_country_UK_county_Essex = mobility_trends_df[(mobility_trends_df['country_region'] == 'United Kingdom') & 
                                 (mobility_trends_df['sub_region_1']=='Essex')]

In [31]:
mobility_trends_df_country_UK_county_Essex.head()

Unnamed: 0,country_region_code,country_region,sub_region_1,sub_region_2,metro_area,iso_3166_2_code,census_fips_code,place_id,date,retail_and_recreation_percent_change_from_baseline,grocery_and_pharmacy_percent_change_from_baseline,parks_percent_change_from_baseline,transit_stations_percent_change_from_baseline,workplaces_percent_change_from_baseline,residential_percent_change_from_baseline
2801794,GB,United Kingdom,Essex,,,GB-ESS,,ChIJ0w2H_idW2EcReVDuRzjLV0I,2020-02-15,-9.0,-7.0,-27.0,-11.0,-3.0,2.0
2801795,GB,United Kingdom,Essex,,,GB-ESS,,ChIJ0w2H_idW2EcReVDuRzjLV0I,2020-02-16,-16.0,-12.0,-45.0,-9.0,-5.0,2.0
2801796,GB,United Kingdom,Essex,,,GB-ESS,,ChIJ0w2H_idW2EcReVDuRzjLV0I,2020-02-17,9.0,-1.0,23.0,-1.0,-16.0,3.0
2801797,GB,United Kingdom,Essex,,,GB-ESS,,ChIJ0w2H_idW2EcReVDuRzjLV0I,2020-02-18,10.0,-4.0,15.0,0.0,-16.0,2.0
2801798,GB,United Kingdom,Essex,,,GB-ESS,,ChIJ0w2H_idW2EcReVDuRzjLV0I,2020-02-19,9.0,-3.0,10.0,1.0,-16.0,2.0
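
The filters above all rely on boolean masks: a comparison on a column returns one `True`/`False` value per row, and indexing with that mask keeps only the `True` rows. A sketch on a made-up table (all names and values are illustrative):

```python
import pandas as pd

toy_df = pd.DataFrame({
    "country": ["X", "X", "Y", "Z"],
    "region": ["north", "south", "north", "north"],
    "change": [-5.0, 3.0, -2.0, 1.0],
})

# A comparison returns a Series of True/False values, one per row
is_country_x = toy_df["country"] == "X"
is_north = toy_df["region"] == "north"

# & combines masks element-wise; when comparisons are written inline, each
# needs its own parentheses because & binds more tightly than ==
x_north = toy_df[is_country_x & is_north]

# isin() tests membership in a list of values
x_or_y = toy_df[toy_df["country"].isin(["X", "Y"])]

print(x_north)
print(x_or_y)
```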


---

> # Try on your own—Exercise 3
Access all rows about `Greater London`

In [32]:
# Please write the code related to Exercise 3 in this cell






---

#### Accessing multiple rows and columns and conditioning

In [33]:
# Let's see which UK counties had the lowest retail and recreation mobility on 2020-03-10, the day after Italy went into lockdown
mobility_trends_df_UK_mobility1003=mobility_trends_df_country_UK.loc[(mobility_trends_df_country_UK['date']=='2020-03-10') &
                                       (mobility_trends_df_country_UK['retail_and_recreation_percent_change_from_baseline']<0),
                                       ['sub_region_1','retail_and_recreation_percent_change_from_baseline']]

# Sort in ascending order, so that the largest decreases come first
mobility_trends_df_UK_mobility1003.sort_values(by='retail_and_recreation_percent_change_from_baseline', ascending=True)

Unnamed: 0,sub_region_1,retail_and_recreation_percent_change_from_baseline
2816101,Gloucestershire,-16.0
2817961,Greater London,-14.0
2834782,Greater London,-13.0
2818584,Greater London,-12.0
2831044,Greater London,-11.0
...,...,...
2974582,Tyne and Wear,-1.0
2926580,Nottinghamshire,-1.0
2841635,Greater Manchester,-1.0
2960923,Suffolk,-1.0


In [34]:
# UK counties with the lowest retail and recreation mobility on 2020-03-24, the day after the UK went into lockdown
mobility_trends_df_UK_mobility2403=mobility_trends_df_country_UK.loc[(mobility_trends_df_country_UK['date']=='2020-03-24') & 
                                       (mobility_trends_df_country_UK['retail_and_recreation_percent_change_from_baseline']<0), 
                                       ['sub_region_1','retail_and_recreation_percent_change_from_baseline']]

# Sort in ascending order, so that the largest decreases come first
mobility_trends_df_UK_mobility2403.sort_values(by='retail_and_recreation_percent_change_from_baseline', ascending=True)

Unnamed: 0,sub_region_1,retail_and_recreation_percent_change_from_baseline
2817975,Greater London,-94.0
2818598,Greater London,-92.0
2822336,Greater London,-87.0
2837288,Greater London,-84.0
2829189,Greater London,-83.0
...,...,...
2990770,West Sussex,-56.0
2776377,Cumbria,-56.0
2868361,Kent,-54.0
2869607,Kent,-54.0
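
The pattern used in the two cells above, `df.loc[row_condition, column_list]` followed by `sort_values()`, can be checked on a table small enough to verify by eye. A sketch with made-up regions and values:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "region": ["A", "B", "C", "D"],
    "change": [-10.0, 5.0, -40.0, -25.0],
})

# Keep rows with a negative change, and only two columns (rows first, then columns)
negative = toy_df.loc[toy_df["change"] < 0, ["region", "change"]]

# ascending=True lists the most negative value (the largest drop) first
largest_drops_first = negative.sort_values(by="change", ascending=True)
print(largest_drops_first)
```

Note that `sort_values()` returns a new, sorted DataFrame and leaves `negative` unchanged.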


## Summarising your data

In [35]:
# For each country, find the maximum value of visits to retail and recreation compared to baseline
mobility_trends_df.groupby("country_region") \
    ["retail_and_recreation_percent_change_from_baseline"].max() \
    .sort_values(ascending=False)

country_region
Poland             616.0
India              499.0
United States      409.0
Croatia            343.0
Turkey             333.0
                   ...  
Myanmar (Burma)      6.0
Costa Rica           5.0
Bahrain              5.0
Hong Kong            4.0
Singapore           -3.0
Name: retail_and_recreation_percent_change_from_baseline, Length: 135, dtype: float64

In [36]:
# For each country, find the mean value of visits to retail and recreation compared to baseline
mobility_trends_df_retail=mobility_trends_df.groupby('country_region') \
    ['retail_and_recreation_percent_change_from_baseline'].mean() \
    .sort_values(ascending=False)
mobility_trends_df_retail

country_region
Yemen              35.832797
Libya              31.868917
Niger              26.352843
Afghanistan        19.163653
Burkina Faso       17.276572
                     ...    
Argentina         -34.037057
Philippines       -34.835336
Chile             -36.412832
Panama            -42.718299
Myanmar (Burma)   -45.708935
Name: retail_and_recreation_percent_change_from_baseline, Length: 135, dtype: float64

In [37]:
# Type of data structure
type(mobility_trends_df_retail)

pandas.core.series.Series

In [38]:
# Find the mean value of mobility in retail and recreation in Italy
mobility_trends_df_retail.loc['Italy']

-20.94490173234728
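
The `groupby()` steps above split the rows into groups, select one column, and reduce each group to a single number. The sketch below repeats this on a made-up table (names and values are illustrative) so that the results can be checked by hand; note that missing values are skipped when computing the mean.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "country": ["X", "X", "Y", "Y", "Y"],
    "change": [-10.0, 20.0, 5.0, None, 1.0],  # None becomes a missing value (NaN)
})

# Split rows by country, pick one column, and reduce each group to one number
mean_by_country = toy_df.groupby("country")["change"].mean()
max_by_country = toy_df.groupby("country")["change"].max()

print(mean_by_country)
print(mean_by_country.loc["Y"])  # look up one group by its label
```

The result is a Series indexed by the group labels, which is why `.loc` with a country name works on it.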

---

> # Try on your own—Exercise 4
Find the mean value of mobility in `workplace` in the `United Kingdom`

In [39]:
# Please write the code related to Exercise 4 in this cell





---

## Acknowledgements
* Wes McKinney. 2017. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython.
* Daniel Chen. 2017. Pandas for Everyone: Python Data Analysis.
* Manuel Amunategui. 2020. COVID-19 Community Mobility Reports From Google and Apple - Available to All - Explore with Python. 