The following code connects to the Data Mill North page, parses the html and processes the csv files uploaded by Leeds City Council (LCC).  It is adapted from Nic Malleson's initial exploration of the data found at https://github.com/Urban-Analytics/dust/blob/main/Projects/Ambient_Populations/AmbientPopulations.ipynb.

There are various checks to ensure duplicate files are not downloaded and merged into the final dataframe.  Initially the code included a check on the filename to filter out anything that started with 'Copy of'.  However, after visualising the data I discovered that a lot of the data was missing from earlier years (mostly 2015-2017), as many of the files had been named 'Copy of....' yet were not duplicates.  The code already ensures that files which exist locally are not downloaded again, and I've gone through and eyeballed the files as a sense check of whether duplicates exist or not.
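As a rough illustration of the "don't download what already exists" check, here is a minimal sketch.  It is not the actual `download_data` code from `source`; the function name `files_to_download` and the example file list are hypothetical, and it assumes files are saved under their URL-encoded names (as the import log further down suggests).

```python
import os
import tempfile
from urllib.parse import unquote


def files_to_download(remote_names, data_dir):
    """Return the remote file names that are not yet saved in data_dir.

    Names are compared as-is, so a 'Copy of ...' file is kept unless a file
    with exactly the same name has already been downloaded.
    """
    existing = set(os.listdir(data_dir)) if os.path.isdir(data_dir) else set()
    return [name for name in remote_names if name not in existing]


# Hypothetical example: one file is already on disk, two are new.
with tempfile.TemporaryDirectory() as demo_dir:
    already_saved = "Monthly%20Data%20Feed%20-%20Dec%202017.csv"
    open(os.path.join(demo_dir, already_saved), "w").close()

    remote = [
        "Monthly%20Data%20Feed%20-%20Dec%202017.csv",
        "Copy%20of%20Monthly%20Data%20Feed%20-%20Oct%202017.csv",
        "Monthly%20Data%20Feed%20-%20Nov%202017.csv",
    ]
    pending = files_to_download(remote, demo_dir)

    # Decode the URL-encoded names purely for readability when eyeballing them.
    readable = [unquote(name) for name in pending]
```

Comparing on the full filename rather than on a 'Copy of' prefix is what keeps the mislabelled 2015-2017 files in the dataset.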

If viewing in a notebook browser window, the Plotly charts may not show unless everything is run.  They don't seem to automatically appear.  I'm looking into whether there is a workaround for this.

In [1]:
import plotly.express as px
from bs4 import BeautifulSoup  # requirement beautifulsoup4
from urllib.request import (
    urlopen, urlretrieve)
import os, os.path
import sys
import pandas as pd
import numpy as np
from IPython.display import HTML
import datetime

from source import *

data_dir = "./data/lcc_footfall"


If you're not bothered about checking the website for updated files, only run the following cell once.  Once the files are downloaded you can skip it to save time.

In [2]:
#Function to parse the html and download the csv files to data_dir (set in the first cell)
download_data(data_dir)

In [3]:
footfalldf_imported = import_data(data_dir)

File Copy%20of%20Monthly%20Data%20Feed%20-%20Oct%202017.csv has nans in the following columns: '['Hour']'. Ignoring it
File Copy%20of%20Monthly%20Data%20Feed%20-%20Sept%202017.csv has nans in the following columns: '['Hour']'. Ignoring it
File Copy%20of%20Monthly%20Data%20Feed-November%202016%20-%2020161221.csv has nans in the following columns: '['Date', 'InCount', 'Hour', 'Location', 'BRCYear', 'BRCMonthName', 'BRCWeek']'. Ignoring it
File Monthly%20Data%20Feed%20-%20%20Jan%202018.csv has nans in the following columns: '['Hour']'. Ignoring it
File Monthly%20Data%20Feed%20-%20Dec%202017.csv has nans in the following columns: '['Hour']'. Ignoring it
File Monthly%20Data%20Feed%20-%20Nov%202017.csv has nans in the following columns: '['Hour']'. Ignoring it
File Monthly%20Data%20Feed-April%202017%20-%2020170510.csv has nans in the following columns: '['Date', 'InCount', 'Hour', 'Location', 'BRCYear', 'BRCMonthName', 'BRCWeek']'. Ignoring it
File Monthly%20Data%20Feed-Feb%202018.csv has na

Let's import the csv data files and clean the resulting dataframe based on some initial exploration.

First, the import_data function runs through each csv, creates a dataframe for each and then merges them.  Columns are converted to appropriate data types and any mismatches are fixed before merging.  This is important as Leeds City Council changed the formats of the files several times, which led to some differences in column names and potentially data types.
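To give a feel for what fixing mismatches before merging involves, here is a minimal sketch.  The alias mapping is an assumption based on the column names in the log output above and the names used later in this notebook; the real `import_data` function may do this differently, and the two small dataframes are made up.

```python
import pandas as pd

# Assumed mapping from older column names to the names used later in this
# notebook; the real import_data function may handle this differently.
COLUMN_ALIASES = {
    "InCount": "Count",
    "BRCMonthName": "BRCMonth",
    "BRCWeek": "BRCWeekNum",
}


def harmonise(frame):
    """Rename known aliases and coerce columns to consistent types."""
    frame = frame.rename(columns=COLUMN_ALIASES)
    frame["Count"] = pd.to_numeric(frame["Count"], errors="coerce")
    frame["BRCYear"] = frame["BRCYear"].astype(int)
    return frame


# Two made-up monthly files with different layouts and data types.
old_format = pd.DataFrame(
    {"Location": ["Commercial Street at Lush"], "InCount": ["120"], "BRCYear": ["2016"]}
)
new_format = pd.DataFrame(
    {"Location": ["Commercial Street at Sharps"], "Count": [95], "BRCYear": [2019]}
)

combined = pd.concat(
    [harmonise(old_format), harmonise(new_format)], ignore_index=True
)
```

Harmonising each file before concatenating avoids ending up with half-empty columns such as `InCount` and `Count` side by side.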

The next step in the pipeline is to check for duplicates and remove them.  Initial data exploration revealed errors in some of the csv files where individual records had been duplicated.  In some instances, the same records existed in several different files, for example dates in early July appeared towards the end of the June csv.
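A minimal sketch of the duplicate check, assuming a record is a duplicate when the same camera reports the same hour twice.  This is not the `check_remove_dup` implementation from `source`, and the counts are made up.

```python
import pandas as pd

# Made-up records: the 1st July 10:00 row appears in both the June and July files.
june = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-06-30 10:00", "2019-07-01 10:00"]),
    "Location": ["Commercial Street at Sharps"] * 2,
    "Count": [410, 385],
})
july = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-07-01 10:00", "2019-07-01 11:00"]),
    "Location": ["Commercial Street at Sharps"] * 2,
    "Count": [385, 402],
})

merged = pd.concat([june, july], ignore_index=True)

# Treat a record as a duplicate when the same camera reports the same hour.
key = ["DateTime", "Location"]
n_duplicates = int(merged.duplicated(subset=key).sum())
deduplicated = merged.drop_duplicates(subset=key, keep="first").reset_index(drop=True)
```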

The cameras don't all come online at the same time, with the last starting on 27th August 2008.  To ensure meaningful comparability, any records before this date have been removed.

Finally, one of the cameras appeared to have moved locations on 31st May 2015 from Commercial Street at Lush to Commercial Street at Sharps.  These are combined and renamed to Commercial Street Combined.
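The start date filter and the camera relabelling can be sketched as below.  This is an illustration under assumptions, not the `set_start_date` and `combine_cameras` code from `source`; the records are made up, and the check on the total simply mirrors the idea that combining cameras should not change footfall.

```python
import pandas as pd

# Made-up hourly records spanning the start date and both camera names.
records = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2008-08-26 12:00", "2015-05-30 12:00", "2015-06-01 12:00",
    ]),
    "Location": [
        "Commercial Street at Lush",
        "Commercial Street at Lush",
        "Commercial Street at Sharps",
    ],
    "Count": [300, 450, 470],
})

# Keep only records from the date the last camera came online.
records = records.loc[records["DateTime"] >= pd.Timestamp("2008-08-27")]

# Relabel both names for the moved camera; the counts themselves are untouched.
total_before = records["Count"].sum()
records = records.assign(Location=records["Location"].replace({
    "Commercial Street at Lush": "Commercial Street Combined",
    "Commercial Street at Sharps": "Commercial Street Combined",
}))
assert records["Count"].sum() == total_before  # footfall must not change
```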

In [4]:
#Pipeline that imports csv files, creates a dataframe and applies cleaning functions
footfalldf = (footfalldf_imported
              .pipe(start_pipeline)
              .pipe(set_start_date, '2008-08-27')
              .pipe(combine_cameras)
              .pipe(check_remove_dup)
              .pipe(create_BRC_MonthNum))

#Useful list for if months ever get out of whack when resampling or plotting.
Months = ['January','February','March','April','May','June','July','August','September','October','November','December']

Footfall hasn't changed when combining cameras
There are 0 duplicates left


Output to CSV file for analysis elsewhere (need to look at tidying this up and using as the main data import rather than always downloading and importing multiple files).

In [5]:
footfalldf.to_csv("./data/lcc_footfall_combined.csv")

### What has the impact of lockdown policies been on footfall in Leeds City Centre?

The bar charts below show mean hourly footfall over each month for 2019, 2020 and 2021.  Visualising the data like this first helped to identify gaps in the data, which in turn led to a cleaner dataset for analysis.  It also provides a useful high-level overview of the impact of lockdown on footfall in Leeds City Centre over most of 2020 and all of 2021 compared to 'business as usual' in 2019.  The immediate drop in March 2020 after restrictions were implemented is just one indication of the impact, which was felt throughout the year as shown by the variation in the height of the bars.

Where possible, British Retail Consortium (BRC) frequencies have been used to subset and analyse the data, rather than extracting them directly from the DateTime field.  This allowed sense checks to be undertaken against the official Springboard reports during analysis.

In [49]:
#Subset to 2019 onwards and calculate mean hourly footfall by BRC month
data_19to21 = footfalldf.loc[footfalldf.BRCYear >= 2019]
data_19to21 = mean_hourly(data_19to21,"month")

fig = px.bar(data_19to21,
             x='BRCMonth',y='Count',title="Mean Hourly Footfall by Month 2019 - 2021",
             facet_row='BRCYear',
             category_orders={"BRCYear": [2019,2020,2021],
                              "BRCMonth": Months},
             labels={"Count": "","BRCMonth": "BRC Month"})
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

fig.show()

With such a clear contrast to 2019 established, the next step of the analysis was to look at the lockdown period specifically.  Since the pandemic hit so early on in 2020, the entire year is included in most of the analysis except for the change from baseline later on.

First, the data was aggregated to daily frequency and the mean hourly footfall for each day was calculated and plotted below.  The chart has been split out by location to identify whether any specific trends in the data occurred homogeneously across the city centre.  Previous analysis at LIDA has identified gaps which might lead to distorted visualisations if not corrected for.

In [7]:
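def mean_hourly_location(dataf,freq):
    """Mean hourly footfall per camera, aggregated by day, month, week or year."""

    if freq == "day":
        #One row per camera per calendar day, keeping the BRC fields as group keys
        dataf = dataf.groupby(['Location',
            pd.Grouper(key="DateTime",freq="D"),'BRCWeekNum','BRCMonth','BRCYear'])['Count'].mean()
    elif freq == "month":
        #One row per camera per BRC month; the index is reset ready for plotting
        dataf = dataf.groupby(
            ['Location','BRCMonthNum',pd.Grouper(key="BRCMonth"),'BRCYear'])['Count'].mean().reset_index()
    elif freq == "week":
        dataf = dataf.set_index('DateTime').groupby(
            ['Location',pd.Grouper(key="BRCWeekNum")])['Count'].mean()
    elif freq == "year":
        dataf = dataf.set_index('DateTime').groupby(
            ['Location',pd.Grouper(key="BRCYear")])['Count'].mean()
    else:
        #Fail loudly rather than silently returning the unaggregated data
        raise ValueError("freq must be one of 'day', 'month', 'week' or 'year'")

    return dataf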


data_lockdown_location = (footfalldf
        .pipe(start_pipeline)
        .pipe(mean_hourly_location,"day")
        .pipe(reset_df_index)
        .pipe(set_dt_index))

data_lockdown_location = data_lockdown_location.loc[data_lockdown_location.BRCYear >= 2020]

fig = px.line(data_lockdown_location,
              title="Daily Mean Hourly footfall by individual camera - 2020 and 2021",
              y='Count',
              facet_col='Location',facet_col_wrap=4,
              labels={"Count": ""})
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1],
                                           font_size=10))

fig.update_xaxes(nticks=10, tickangle=45, tickfont=dict(size=8))

fig.show()

The lack of data from Albion Street at McDonalds and Park Row is explained by the fact that they are relatively new cameras, coming online in September 2020.  Unfortunately, both appear to have large gaps between October 2020 and February 2021.  The decision was taken to drop these from the analysis so that the limited data they provide doesn't distort the footfall figures around the time they came online and later in 2021.
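Dropping the two cameras can be sketched as a simple filter on `Location`.  This is an illustration, not the `remove_new_cameras` function from `source`; the exact camera labels are assumptions and should be checked against `footfalldf.Location.unique()`, and the counts are made up.

```python
import pandas as pd

# Camera labels are assumptions; check them against footfalldf.Location.unique().
NEW_CAMERAS = ["Albion Street at McDonalds", "Park Row"]


def drop_cameras(dataf, cameras=NEW_CAMERAS):
    """Return a copy of dataf without rows from the listed cameras."""
    return dataf.loc[~dataf["Location"].isin(cameras)].copy()


demo = pd.DataFrame({
    "Location": ["Park Row", "Commercial Street Combined", "Albion Street at McDonalds"],
    "Count": [12, 340, 25],  # made-up values
})
kept = drop_cameras(demo)
```

Returning a copy keeps the function safe to use inside a `.pipe` chain without modifying the original dataframe.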

### All Cameras

The chart below shows mean hourly footfall for all cameras during 2020 and 2021.  The key policy dates are shown in the table below the chart.  Red zones indicate where a full lockdown was in place whilst orange shows partial restrictions in place.

In [8]:
#Create a pipeline to format the dataframe into mean hourly footfall by day.
data_lockdown = (footfalldf
                 .pipe(start_pipeline)
                 .pipe(remove_new_cameras)
                 .pipe(mean_hourly, "day")
                 .pipe(reset_df_index)
                 .pipe(set_dt_index))

#Filter data to be only 2020/2021
data_lockdown = data_lockdown.loc[(data_lockdown.BRCYear == 2020) | (data_lockdown.BRCYear == 2021)]



fig = px.line(data_lockdown,
              title="Comparison of Mean Hourly footfall over 12 months",
              y='Count')

fig.update_xaxes(nticks=20, tickangle=45, tickfont=dict(size=8))

chart_lockdown_dates(fig).show()


Footfall starts to drop almost immediately after the announcement on 16th March that people should stop non-essential contact and travel.  This is four days before the intermediate 20th March restrictions on hospitality and a full week before the announcement of a full lockdown on 23rd March.  There's a stark contrast between the period of 'normality' in January and February and the sudden drop towards the end of March, with the relative variation between days of the week minimised to a large extent.

Footfall stays low, with a small increase as summer approaches, but it isn't until 15th June, when non-essential shops reopened, that a huge spike occurs.  More restrictions were eased on 4th July and footfall continues to rise into August.

One of the most high profile policies to have occurred in 2020 was the Eat out to Help out scheme.  A government backed scheme launched on 3rd August, it gave people a 50% discount on food and non-alcoholic drinks at participating outlets.  Businesses were able to claim the lost revenue back, the aim being to encourage people to help boost the hospitality sector after months of closure.  At the time of the launch, footfall was already steadily increasing so it's difficult to discern from the chart the exact impact of the scheme.  Additionally, it was only valid Monday to Wednesday.  A more granular analysis is undertaken later to tease out whether this had any impact or not.
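For the more granular analysis, it helps to flag exactly which days the discount applied on.  A minimal sketch follows; the function name is hypothetical, and the end date of 31st August 2020 is an addition that isn't stated in the text above.

```python
from datetime import date, timedelta

SCHEME_START = date(2020, 8, 3)
SCHEME_END = date(2020, 8, 31)  # end of the scheme; not stated in the text above


def scheme_active(day):
    """True if the discount applied on this day (Monday to Wednesday only)."""
    in_window = SCHEME_START <= day <= SCHEME_END
    return in_window and day.weekday() <= 2  # Monday=0, Tuesday=1, Wednesday=2


# Every day in August 2020, filtered down to the days the scheme applied.
august = [date(2020, 8, 1) + timedelta(days=i) for i in range(31)]
eligible_days = [d for d in august if scheme_active(d)]
```

A flag like this could be compared against Thursday to Sunday footfall in the same weeks to see whether the eligible days stand out.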

Another large drop comes after 22nd September when new restrictions are announced, although footfall noticeably doesn't go back to first lockdown levels.  This lasts until an interesting spike around 14th October when the tier system comes into effect.  Leeds is put in tier 2, which bans mixing indoors outside of household/bubbles, imposes a rule of 6 outside and allows hospitality to open only if serving full meals.  A second lockdown between 2nd November and 2nd December drives footfall down again but seemingly not to the levels of the first lockdown.  Leeds is then placed into tier 3, which all but prevents people from meeting in groups of 6 in a public place.  Hospitality remains closed and travel is restricted for all but essential purposes.

Christmas is a strange period, with some days showing significantly increased footfall, most likely attributable to shopping at retail outlets.

Finally, the entire country is placed into tier 4, driving footfall down to levels similar to those seen in previous lockdowns.


| Date                | Description                                                                                  | Restrictions Implemented                                                                                            | Restrictions lifted                                                                                        |
|---------------------|----------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| 16th March 2020     | Advice to stop non-essential contact and travel                                              | None - advice only                                                                                                  | None                                                                                                       |
| 20th March 2020     | Hospitality and leisure to close at midnight                                                 | Hospitality and leisure venues close                                                                                | None                                                                                                       |
| 23rd March 2020     | First lockdown announced                                                                     | Stay at Home, non-essential travel restricted, <br>non-essential retail and schools closed                          | None                                                                                                       |
| 10th May 2020       | Conditional lockdown easing announced, <br>working from home can be relaxed                  | None                                                                                                                | Partial return to workplaces if agreed with employer, <br>unlimited time outside, meet one person outside. |
| 1st June 2020       | Phased reopening of primary schools                                                                  | None                                                                                                                | Primary schools reopened, <br>groups of up to six meeting outside, Stay at Home.                           |
| 15th June 2020      | Non-essential retail and secondary schools reopen                                            | Face coverings mandatory on public transport                                                                        | Non-essential shops reopen. <br>Secondary schools reopen.                                                  |
| 3rd August 2020     | Eat out to Help out launched                                                                 | None                                                                                                                | Eat out to help out scheme                                                                                 |
| 22nd September 2020 | New restrictions in England, <br>including working from home and 10pm curfew for hospitality | Working from home, restrictions on hospitality.                                                                     | None                                                                                                       |
| 14th October 2020   | West Yorkshire put into tier 2                                                               | Mixing indoors outside of household/bubble banned, <br>rule of 6 outside. Pubs and bars only open if serving meals. | None                                                                                                       |
| 2nd November 2020   | 2nd Lockdown comes into force                                                                | All restrictions reimplemented                                                                                      | None                                                                                                       |
| 2nd December 2020   | 2nd Lockdown ends, tier system returns                                                       | None                                                                                                                | Restrictions eased in some areas.                                                                         |


Let's break some of these periods down to look at daily changes as mean hourly footfall and percentage change.
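One way to express the percentage change is to compare each day with the same weekday one week earlier, which avoids mixing up normal weekday/weekend differences with the effect of an announcement.  A minimal sketch with made-up values (not the real footfall figures):

```python
import pandas as pd

# Made-up daily mean hourly footfall for two consecutive weeks.
days = pd.date_range("2020-03-09", periods=14, freq="D")
daily = pd.Series(
    [500, 510, 505, 515, 530, 600, 450,   # week of 9th March
     480, 420, 380, 350, 330, 200, 150],  # week of 16th March
    index=days, name="Count",
)

# Compare each day with the same weekday one week earlier.
week_on_week = daily.pct_change(periods=7)

# Change for Saturday 21st March relative to Saturday 14th March.
saturday_change = week_on_week.loc[pd.Timestamp("2020-03-21")]
```

The first seven values of `week_on_week` are missing because there is no earlier week to compare against.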

In [44]:
#Chart fed by a pipeline to create dataframe grouped by days between 9th March 2020 - 29th March 2020 and show mean hourly footfall.  Haven't figured out how to sort the colours dividing the days up yet!
data_2020_subset = (footfalldf
        .pipe(start_pipeline)
        .pipe(remove_new_cameras)
        .pipe(mean_hourly,"day")
        .pipe(reset_df_index)
        .pipe(date_range,"2020-03-09","2020-03-29"))

data_2020_subset['Day_Name'] = data_2020_subset.index.day_name()

bar = px.bar(data_2020_subset,y='Count',title="Mean hourly footfall by weekday - 9th to 29th March 2020",
             color='Day_Name',hover_data=['Day_Name','Count'],
             labels={'Count': 'Mean hourly footfall', 'DateTime': 'Date', 'Day_Name':'Day'})

bar.update_xaxes(nticks=26,tickangle=45,tickfont=dict(size=8))

bar.show()


There's definitely a drop through the week immediately after the announcement (March 16th) where people were advised to stop non-essential contact and travel but no official lockdown was in place.  Until then there seemed to be relatively stable levels of hourly footfall with minor variations day by day in the week of 9th to 15th March, seeing an expected rise on Saturday before a drop on Sunday.

On the day of the announcement, hourly footfall drops a little from the previous Monday, however it is in the days after that the changes are really noticeable.  Rather than the small increases/decreases observed the week before, levels decrease sharply over the week, with some minor levelling off midweek.

Look at Saturday 21st March though.  Footfall drops by approximately two-thirds following the announcement on 20th March that all hospitality and leisure must close.  Levels continue to drop as the official lockdown is announced on 23rd March and legally comes into force by 26th March.

Let's ignore 10th May and 1st June as the line chart didn't register much of a change and jump to 15th June, when non-essential shops and secondary schools reopened.

In [10]:
data_2020_subset = (footfalldf
        .pipe(start_pipeline)
        .pipe(remove_new_cameras)
        .pipe(mean_hourly,"day")
        .pipe(reset_df_index)
        .pipe(date_range,"2020-06-08","2020-06-28"))

data_2020_subset['Day_Name'] = data_2020_subset.index.day_name()

bar = px.bar(data_2020_subset,y='Count', title="Mean hourly footfall by weekday - 8th to 28th June 2020",
             color='Day_Name',hover_data=['Day_Name','Count'],
             labels={'Count': 'Mean hourly footfall', 'DateTime': 'Date', 'Day_Name':'Day'})

bar.update_xaxes(nticks=21,tickangle=45,tickfont=dict(size=8))
bar.update_yaxes(range=[0,1000])

bar.show()

There is a huge jump in mean hourly footfall on Monday 15th, which was to be expected for Leeds City Centre as a large part of the retail sector was allowed to reopen.


I really want to look at the period around 14th October.  Something strange happens: footfall rises when it would have been expected to fall.


In [11]:
data_2020_subset = (footfalldf
        .pipe(start_pipeline)
        .pipe(remove_new_cameras)
        .pipe(mean_hourly,"day")
        .pipe(reset_df_index)
        .pipe(date_range,"2020-10-05","2020-10-25"))

data_2020_subset['Day_Name'] = data_2020_subset.index.day_name()

bar = px.bar(data_2020_subset,y='Count', title="Mean hourly footfall by weekday - 5th to 25th October 2020",
             color='Day_Name',hover_data=['Day_Name','Count'],
             labels={'Count': 'Mean hourly footfall', 'DateTime': 'Date', 'Day_Name':'Day'})

bar.update_xaxes(nticks=26,tickangle=45,tickfont=dict(size=8))
bar.update_yaxes(range=[0,1000])

bar.show()

This is interesting.  Footfall jumps in the week commencing 12th October.  The tier system was introduced on 14th October, with Liverpool being the only city to be placed in the highest tier.  Leeds was placed in tier 2, which put restrictions on meeting indoors and on hospitality, particularly pubs and bars.  Perhaps in anticipation of closures, people started flocking into the city centre to make the most of supposed freedoms before things maybe got worse?  This trend certainly continued later into October.

I wonder if it had a huge peak before Leeds was put into tier 3 and the 2nd national lockdown came into force?

In [12]:
data_2020_subset = (footfalldf
        .pipe(start_pipeline)
        .pipe(remove_new_cameras)
        .pipe(mean_hourly,"day")
        .pipe(reset_df_index)
        .pipe(date_range,"2020-10-26","2020-11-08"))
data_2020_subset['Day_Name'] = data_2020_subset.index.day_name()

bar = px.bar(data_2020_subset,y='Count', title="Mean hourly footfall by weekday - 26th October to 8th November 2020",
             color='Day_Name',hover_data=['Day_Name','Count'],
             labels={'Count': 'Mean hourly footfall', 'DateTime': 'Date', 'Day_Name':'Day'})

bar.update_xaxes(nticks=26,tickangle=45,tickfont=dict(size=8))
bar.update_yaxes(range=[0,1000])

bar.show()

This is interesting.  Combined with the previous viz, what it seems to be illustrating (at least for mobility in Leeds City Centre) is that the tier system actually drove footfall up quite significantly.  Perhaps because non-essential retail was still open despite restrictions on hospitality existing?  Or people travelling into Leeds from other areas whilst it was still in tier 2?  Or people just ignoring the rules and going out on a last shopping spree or pub crawl whilst they still can?  Probably a combination of these factors and many more.

Compare this with the dramatic drop seen during the second lockdown, when the public are given clear instructions to stay at home, businesses must close and travel is for essential purposes only.


There's much more to look at with this data.  It's fascinating to see how different stages of lockdown have impacted footfall within the city centre.  Even without factoring in other variables such as weather, there are clear patterns in the data.  I would like to dig into some later periods a bit more, or even compare some of the retail weeks with their equivalents in previous years, but for now I feel it's right to move on.  It's definitely a viable data source for modelling.

Potential next steps:
    - Start constructing lockdown variables to join with footfall data
    - Repeat analysis for alternative data sources to determine suitability for modelling.
    - Start to write this analysis up more formally for an output?
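As a starting point for the first of these steps, here is a minimal sketch of a lockdown variable: for any date, look up the most recent policy event on or before it.  The dates come from the table earlier in this notebook; the short labels and the function name are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Key dates taken from the table above; the short labels are hypothetical.
POLICY_EVENTS = [
    (date(2020, 3, 16), "advice_only"),
    (date(2020, 3, 20), "hospitality_closed"),
    (date(2020, 3, 23), "lockdown_1"),
    (date(2020, 6, 15), "retail_reopened"),
    (date(2020, 9, 22), "new_restrictions"),
    (date(2020, 10, 14), "tier_2"),
    (date(2020, 11, 2), "lockdown_2"),
    (date(2020, 12, 2), "tiers_return"),
]
EVENT_DATES = [event_date for event_date, _ in POLICY_EVENTS]


def policy_on(day):
    """Return the label of the most recent policy event on or before day."""
    position = bisect_right(EVENT_DATES, day)
    return POLICY_EVENTS[position - 1][1] if position else "pre_pandemic"


example = policy_on(date(2020, 11, 15))  # falls within the second lockdown
```

Mapping a function like this over the daily index would give a categorical column that can be joined to the footfall data.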


Had a thought after looking at the Google Mobility data.  They take a baseline value of the median for each weekday during a period between 3rd Jan and 6th Feb 2020, to be 'normal' values.  They caveat this with how unreliable it might be due to regional differences in when COVID became a serious problem, however that seems to be a good period for the UK.  It might even be worth extending that to the beginning of March to take an even wider slice as the weather supposedly starts to improve and days are lighter.
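The baseline idea can be sketched in a few lines: take the median total for each weekday over the baseline window, then express any later day as a fractional change from the median for its weekday.  The totals below are made up and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_baseline(daily_totals, start, end):
    """Median total for each weekday name between start and end (inclusive)."""
    by_weekday = defaultdict(list)
    for day, total in daily_totals.items():
        if start <= day <= end:
            by_weekday[WEEKDAYS[day.weekday()]].append(total)
    return {name: median(values) for name, values in by_weekday.items()}


def change_from_baseline(day, total, baseline):
    """Fractional change of one day's total against its weekday baseline."""
    reference = baseline[WEEKDAYS[day.weekday()]]
    return (total - reference) / reference


# Made-up totals: three Mondays in the baseline window, then one later Monday.
totals = {
    date(2020, 1, 6): 100_000,
    date(2020, 1, 13): 110_000,
    date(2020, 1, 20): 90_000,
}
baseline = weekday_baseline(totals, date(2020, 1, 3), date(2020, 3, 5))
drop = change_from_baseline(date(2020, 3, 30), 25_000, baseline)
```

Using the median rather than the mean keeps one unusually busy or quiet day in the window from skewing the 'normal' value for that weekday.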

In [14]:
data_lockdown = (footfalldf
                 .pipe(start_pipeline)
                 .pipe(remove_new_cameras)
                 .pipe(set_dt_index))

data_lockdown = data_lockdown.loc[(data_lockdown.BRCYear == 2020) | (data_lockdown.BRCYear == 2021)]
data_lockdown = data_lockdown.groupby( [pd.Grouper(level='DateTime',freq='D')])['Count'].sum()
data_lockdown

fig = px.line(data_lockdown,
              title="Total Daily Footfall 2020-2021",
              y='Count')

fig.update_xaxes(nticks=20, tickangle=45, tickfont=dict(size=8))

chart_lockdown_dates(fig).show()

In [35]:
#Calculate median of period 3rd January to March 5th

data_lockdown = (footfalldf
                 .pipe(start_pipeline)
                 .pipe(remove_new_cameras)
                 .pipe(set_dt_index))

data_lockdown = data_lockdown.loc[(data_lockdown.BRCYear == 2020) | (data_lockdown.BRCYear == 2021)]

data_lockdown['Day_Name'] = data_lockdown.index.day_name()
data_lockdown = data_lockdown.groupby( [pd.Grouper(level='DateTime',freq='D'),'Day_Name'])['Count'].sum().reset_index()

baseline = (data_lockdown
        .pipe(start_pipeline)
        .pipe(date_range,"2020-01-03","2020-03-05"))

baseline = baseline.groupby([pd.Grouper(key="Day_Name")])['Count'].aggregate(np.median)
baseline

data_lockdown = data_lockdown.loc[data_lockdown.DateTime > "2020-03-05"].set_index('DateTime')
data_lockdown['baseline'] = data_lockdown.Day_Name.map(baseline.to_dict())
data_lockdown['baseline_change'] = data_lockdown.Count - data_lockdown.baseline
data_lockdown['baseline_per_change'] = (data_lockdown.baseline_change/data_lockdown.baseline)
data_lockdown

fig = px.line(data_lockdown,
              title="Daily Footfall 6th March 2020-2021 - % change from baseline",
              y='baseline_per_change',
            hover_data=['Day_Name','baseline_per_change'],
             labels={'DateTime': 'Date', 'Day_Name':'Day'})

fig.update_xaxes(nticks=20, tickangle=45, tickfont=dict(size=8))
fig.update_yaxes(tickformat='%')

chart_lockdown_dates(fig).show()

This is really interesting.  What it shows is the percentage change of total daily footfall from a baseline.  This baseline was calculated as the median value of each weekday between 3rd January and 5th March 2020.

Except for a few outliers (which I'll come on to), footfall has remained incredibly low compared to the baseline established from the start of the year.  Considering that this baseline mostly covers the middle to end of winter, it's astonishing to see that footfall didn't really recover to pre-pandemic levels even over summer as restrictions were eased.  Most days were still at least 10 percent lower than the baseline value.  Given that advice to work from home and restrictions on other parts of society were eased, it's understandable that footfall increased so dramatically.  However, the gap that still exists on most days could be explained by employers choosing to keep offices closed or staffed to a minimum, combined with the caution some people may be showing in travelling into the city centre for anything.

Some of the outliers can most likely be explained according to the dates they happened on.  For example August 30th 2020 records approximately 11% higher than the baseline for a Sunday, the only instance of an increase in the whole dataset.  Most likely this is due to the August Bank Holiday taking place, with restrictions at their loosest and the weather fairly good.

The lead up to Christmas is interesting.  Monday 21st and Tuesday 22nd have footfall representing less than a 10% and 5% decrease respectively from the baseline value.  Safe to say this is most likely due to a combination of last minute Christmas shopping and people travelling through the city centre en route to wherever they were spending the festive period.  If the latter were true, it would be interesting considering that the rules were changed on 19th December to only permit bubbles on 25th December.  Certainly according to the law, nobody should have been travelling long distances on public transport for Christmas as the advice was 'stay local'.

### Comparison to Google Mobility data

Since the outbreak of the COVID-19 pandemic, Google have made aggregated mobility data available and produced community mobility reports for public health officials to use in their response work.  This covers mobility of Google devices users over several different categories: Retail/Recreation, Grocery/Pharmacy, Parks, Transit Stations, Workplaces and Residential.

Whilst these data are interesting, their use in estimating ambient population is questionable due to a lack of spatial granularity (Whipp et al, 2021).  Some of the categories follow broadly similar trends to the footfall data when considering key lockdown dates, however it is not possible to identify whether these are related to Leeds City Centre as the dataset covers the entire city region.