# The LFC Goal Machine (R version)

This python3 notebook analyses Liverpool FC's goal scorers, in particular exploring a scatter plot of a player's top level league goals scored in a season against the age of the player at the mid-point of that season. The plot is available as an interactive web app called the LFC Goal Machine [here](https://terrydolan.shinyapps.io/lfcgmR/). 

The notebook generates and validates the required data (based on reference data from lfchistory.net) and prototypes the key parts of the R application. Python is used for the data preparation and analysis, and R is used for the interactive application and plotting. The application's core plotting function uses R's ggplot2.

The notebook contains the key algorithms, some interesting LFC player plots and describes how the lfcgmR app is built and deployed.

The project uses [Anaconda](https://www.anaconda.com/), [Jupyter Notebook](http://jupyter.org/), [Python](https://www.python.org/), [Pandas](http://pandas.pydata.org/), [rpy2](https://rpy2.readthedocs.io/), [R](https://www.r-project.org/), [R ggplot](http://ggplot2.org/), [R studio](https://www.rstudio.com/), [R Shiny](http://shiny.rstudio.com/), [R Dplyr](https://cran.r-project.org/web/packages/dplyr/index.html), [R Scales](https://cran.r-project.org/web/packages/scales/index.html).

__Application History, at August 2019__

The original lfcgm interactive application (built around 2016) contained a shared core plotting function and was deployed as both a Python app (using Spyre and Heroku) and an R app (using RStudio). Both of these versions had ggplot at their core, with the R implementation using ggplot2 and the Python implementation using yhat's 'port' of ggplot2. The R ggplot implementation is excellent and the original objective was always to have the core plotting algorithm defined in ggplot.

The original Python app used yhat's ggplot. The yhat ggplot developer has decided not to continue the R-like version for Python, and the latest version of yhat's ggplot is 'work in progress'. Several other ggplot-like projects are also available in Python. All of these aim to be pythonic, which makes it difficult to maintain a single core function based on ggplot that can be shared between Python and R.

Also, R Studio makes it very easy to develop and deploy interactive web apps. The nearest equivalent in python in 2016 was Spyre in conjunction with Heroku. Spyre is no longer developed.

Therefore I've decided to maintain only the R version of the lfcgm app. However, this remains a hybrid Python and R development. Python infrastructure (Pandas, etc) is used to generate and validate the required data, and explore the core plotting function. The Python rpy2 library is used to run R commands in a python3 notebook.

## Notebook Approach

This python3 notebook prepares and validates the data and algorithms for the LFC Goal Machine app (lfcgmR). 

1. Load the input data files, enhance (e.g. add player age at season midpoint) and generate the required lfcgmR application data. The application requires 2 csvs:
 - data/lfc_scorers_tl_pos_age.csv: containing all of the LFC scorers in the top flight seasons with their age.
 - data/lfcgm_app_dropdown.csv: containing a dropdown list of those scorers.

1. Validate the generated data.

1. Validate the core R plotting algorithm (using R's ggplot2).

1. Describe the lfcgmR app, with a link to the github source code.

The lfcgmR app is available as an interactive web app [here](https://terrydolan.shinyapps.io/lfcgmR/). 

### Notebook Change Log

In [1]:
%%html
<!-- left align the change log table in next cell -->
<style>
table {float:left}
</style>

| Date          | Change Description |
| :------------- | :----------------- |
| 21st February 2016 | Initial python and r baseline versions|
| 30th October 2016 | Added LFC season 2015-16 |
| 12th October 2017 | Added LFC season 2016-17 |
| 31st August 2019 | Added LFC seasons 2017-18 and 2018-19; moved to Python 3; restructured the application to focus on the R version; updated function to generate age at midpoint of season; improved the data validation; enhanced R app to: improve core plotting function, especially y axis ints; add table tab to show the data table |

To Do
- Review latest R Studio UI presentation and controls to see if application can be improved.

- Consider extending app to also allow sort by season(s) rather than age; allow overlapping seasons?

## Set-Up

### Environment

This notebook uses an Anaconda py37r environment.

In [None]:
# This notebook requires a Python 3 (3.7+) and R (3.6+) environment
# Show active conda env
!conda env list

In [None]:
!python --version

In [None]:
!R --version

### Import the python modules needed for the analysis.

In [None]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import dateutil as du
import dateutil.relativedelta as dr
import sys
import os

# enable inline plotting
%matplotlib inline

# activate the rpy capability so that R commands can be run in this notebook
from rpy2.robjects import pandas2ri
pandas2ri.activate()
import rpy2

# load the extension for the %%R cell magic
%load_ext rpy2.ipython

Print key module version numbers.

In [None]:
print('python version: {}'.format(sys.version))
print('pandas version: {}'.format(pd.__version__))
print('numpy version: {}'.format(np.__version__))
print('matplotlib version: {}'.format(mpl.__version__))
print('dateutil version: {}'.format(du.__version__))
print('rpy2 version: {}'.format(rpy2.__version__))

## Prepare The Data Files

### Define name and location of csv data files

In [None]:
# define key seasons
LG_SEASON_START = '1893-1894' # first league season (2nd Division)
SEASON_START = '1892-1893' # first season in existence
# modify the following, as required
SEASON_END = '2018-2019' # most recent season
PLAYERS_CSV_MONTH = 'August' # month of players csv extract
PLAYERS_CSV_YEAR = '2019' # year of players csv extract
NUMBER_OF_SEASONS = 104 # 104 in 2018-2019

# define input csv files
print('\nLocation of the input csv data files:')

# define scorers CSV file name (and check exists)
SCORERS_PREFIX = 'lfc_scorers'
SCORERS_CSV_FILE = '{}_{}_{}.csv'.format(SCORERS_PREFIX, SEASON_START, SEASON_END)
LFC_SCORERS_CSV_FILE = os.path.relpath('data/{}'.format(SCORERS_CSV_FILE))
assert os.path.isfile(LFC_SCORERS_CSV_FILE) 
print('LFC scorers csv file is: {}'.format(LFC_SCORERS_CSV_FILE))

# define squads CSV file name (and check exists)
SQUADS_PREFIX = 'lfc_squads'
SQUADS_CSV_FILE = '{}_{}_{}.csv'.format(SQUADS_PREFIX, SEASON_START, SEASON_END)
LFC_SQUADS_CSV_FILE = os.path.relpath('data/{}'.format(SQUADS_CSV_FILE))
assert os.path.isfile(LFC_SQUADS_CSV_FILE) 
print('LFC squads csv file is: {}'.format(LFC_SQUADS_CSV_FILE))

# define player appearances CSV file name (and check exists)
APPS_PREFIX = 'lfc_apps'
APPS_CSV_FILE = '{}_{}_{}.csv'.format(APPS_PREFIX, SEASON_START, SEASON_END)
LFC_APPS_CSV_FILE = os.path.relpath('data/{}'.format(APPS_CSV_FILE))
assert os.path.isfile(LFC_APPS_CSV_FILE) 
print('LFC appearances csv file is: {}'.format(LFC_APPS_CSV_FILE))

# define league CSV file name (and check exists)
LEAGUE_PREFIX = 'lfc_league'
LEAGUE_CSV_FILE = '{}_{}_{}.csv'.format(LEAGUE_PREFIX, LG_SEASON_START, SEASON_END)
LFC_LEAGUE_CSV_FILE = os.path.relpath('data/{}'.format(LEAGUE_CSV_FILE))
assert os.path.isfile(LFC_LEAGUE_CSV_FILE) 
print('LFC league csv file is: {}'.format(LFC_LEAGUE_CSV_FILE))
                                          
# define players CSV file name (and check exists)
PLAYERS_PREFIX = 'lfc_players'
PLAYERS_CSV_FILE_UPDATED = '{}_{}{}_upd.csv'.format(PLAYERS_PREFIX, PLAYERS_CSV_MONTH, PLAYERS_CSV_YEAR)
LFC_PLAYERS_CSV_FILE_UPDATED = os.path.relpath('data/{}'.format(PLAYERS_CSV_FILE_UPDATED))
assert os.path.isfile(LFC_PLAYERS_CSV_FILE_UPDATED) 
print('LFC players csv file is: {}'.format(LFC_PLAYERS_CSV_FILE_UPDATED))

# define generated csv files (this is the data used by the lfcgm app)
print('\nLocation of the generated csv data files:')

# define scorers in top league with position and age CSV file name
LFC_SCORERS_TL_POS_AGE_CSV_FILE = os.path.relpath('data/lfc_scorers_tl_pos_age.csv')
print('LFC scorers in top league with position and age is: {}'.format(LFC_SCORERS_TL_POS_AGE_CSV_FILE))

# define dropdown CSV file name
LFCGM_DROPDOWN = os.path.relpath('data/lfcgm_app_dropdown.csv')
print('LFC goal machine dropdown is: {}'.format(LFCGM_DROPDOWN))
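The naming convention used in the cell above can be summarised in a small standalone sketch. The helper names `season_csv_name` and `missing_files` are hypothetical and not part of the notebook; the sketch uses only the standard library and lists every missing file at once, rather than stopping at the first failed assert.

```python
import os

def season_csv_name(prefix, season_start, season_end):
    """Build a file name such as 'lfc_scorers_1892-1893_2018-2019.csv'."""
    return '{}_{}_{}.csv'.format(prefix, season_start, season_end)

def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.isfile(p)]

# hypothetical usage, mirroring the scorers file defined above
name = season_csv_name('lfc_scorers', '1892-1893', '2018-2019')
print(name)
print(missing_files([os.path.join('data', name)]))
```

Collecting the missing paths in one list makes it easier to see which extracts still need to be refreshed when a new season is added.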

## Load The LFC Data Into Dataframes And Munge

### Create a dataframe of scorers in top level league seasons
Input data source: lfchistory.net

In [None]:
print('Loading LFC scorers csv from {}'.format(LFC_SCORERS_CSV_FILE))
dflfc_scorers = pd.read_csv(LFC_SCORERS_CSV_FILE)

# sort by season, then league goals
dflfc_scorers = dflfc_scorers.sort_values(['season', 'league'], ascending=([False, False]))
dflfc_scorers.shape

In [None]:
# show the most recent top goal scorers
dflfc_scorers.head()

In [None]:
# expect 1892-1893
dflfc_scorers.tail()

In [None]:
# note that scorers includes own goals
dflfc_scorers[dflfc_scorers.player == 'Own goals'].head()

In [None]:
# note: war years already excluded in input files
LANCS_YRS = ['1892-1893']
SECOND_DIV_YRS = ['1893-1894', '1895-1896', '1904-1905', '1961-1962', 
                  '1954-1955', '1955-1956', '1956-1957', '1957-1958', 
                  '1958-1959', '1959-1960', '1960-1961']

NOT_TOP_LEVEL_YRS = LANCS_YRS + SECOND_DIV_YRS
dflfc_scorers_tl = dflfc_scorers[~dflfc_scorers.season.isin(NOT_TOP_LEVEL_YRS)].copy()
dflfc_scorers_tl.shape

In [None]:
## check number of top level seasons aligns with http://www.lfchistory.net/Stats/LeagueOverall
## e.g. total was 102 for top level seasons from 1894-95 to 2016-17
print('the number of seasons is {}'.format(len(dflfc_scorers_tl.season.unique())))
assert len(dflfc_scorers_tl.season.unique()) == NUMBER_OF_SEASONS
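The expected total of 104 top level seasons can be cross-checked with simple arithmetic. The sketch below is standalone and makes one assumption that is not stated in the notebook: that the war years excluded from the input files are 1915-16 to 1918-19 and 1939-40 to 1945-46. The helper `season_label` is hypothetical.

```python
def season_label(start_year):
    """Return a season label in the 'YYYY-YYYY' format used by the data files."""
    return '{}-{}'.format(start_year, start_year + 1)

# all calendar seasons from the first league season to the most recent one
league_seasons = [season_label(y) for y in range(1893, 2019)]

# assumption: seasons with no league football because of the two world wars
war_years = {season_label(y) for y in list(range(1915, 1919)) + list(range(1939, 1946))}

# second division seasons, as listed in SECOND_DIV_YRS above
second_div = {'1893-1894', '1895-1896', '1904-1905'} | {season_label(y) for y in range(1954, 1962)}

top_level = [s for s in league_seasons if s not in war_years and s not in second_div]
print(len(league_seasons), len(war_years), len(second_div), len(top_level))
```

When a new season is added, `NUMBER_OF_SEASONS` should move in step with the upper bound of the range.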

In [None]:
# show most league goals in a season in top level
# cross-check with http://en.wikipedia.org/wiki/List_of_Liverpool_F.C._records_and_statistics#Goalscorers
# expect 101 in 2013-14
assert dflfc_scorers_tl[['season', 'league']].groupby(['season'])\
            .sum().sort_values('league', ascending=False).head(1).reset_index().values.tolist()[0] == ['2013-2014', 101]
dflfc_scorers_tl[['season', 'league']].groupby(['season']).sum().sort_values('league', ascending=False).head(1)

In [None]:
# remove OG
dflfc_scorers_tl = dflfc_scorers_tl[dflfc_scorers_tl.player != 'Own goals']
dflfc_scorers_tl.shape

In [None]:
# check latest season
dflfc_scorers_tl[dflfc_scorers_tl.season == SEASON_END].head(10)

In [None]:
# check first top level season, expect 1894-95
assert dflfc_scorers_tl.tail(1).season.values[0] =='1894-1895'

### Create dataframe of squads giving the position of each player
Input data source: lfchistory.net

In [None]:
print('Loading LFC squads csv from {}'.format(LFC_SQUADS_CSV_FILE))
dflfc_squads = pd.read_csv(LFC_SQUADS_CSV_FILE)
dflfc_squads.shape

In [None]:
dflfc_squads.head()

In [None]:
dflfc_squads.tail()

### Create dataframe of LFC's league position
Input data source: lfchistory.net

In [None]:
print('Loading LFC league csv from {}'.format(LFC_LEAGUE_CSV_FILE))
dflfc_league = pd.read_csv(LFC_LEAGUE_CSV_FILE)
dflfc_league.shape

In [None]:
dflfc_league.head()

In [None]:
dflfc_league.tail()

In [None]:
# check most recent league data is present
assert dflfc_league.tail(1).Season.values[0] == SEASON_END

### Create merged dataframe, combining scorers in top league level season with squad position

In [None]:
dflfc_scorers_tl_pos = pd.DataFrame.merge(dflfc_scorers_tl, dflfc_squads)
dflfc_scorers_tl_pos.shape

In [None]:
dflfc_scorers_tl_pos.head()

In [None]:
dflfc_scorers_tl_pos.tail()

In [None]:
assert dflfc_scorers_tl_pos[(dflfc_scorers_tl_pos.season == '2018-2019') &
                            (dflfc_scorers_tl_pos.player == 'Mohamed Salah')].position.values[0] == 'Striker'

In [None]:
assert dflfc_scorers_tl_pos[(dflfc_scorers_tl_pos.season == '1905-1906') &
                            (dflfc_scorers_tl_pos.player == 'Alex Raisbeck')].position.values[0] == 'Defender'
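Called without `on` or `how`, `pd.DataFrame.merge` joins on the columns the two dataframes share and keeps only rows that match in both (an inner join), so a scorer missing from the squads data would be dropped silently. The standalone sketch below illustrates that behaviour without pandas. It assumes the shared keys are `season` and `player`, which is a guess because the squads columns are not shown here; the rows and the helper `inner_join` are illustrative only.

```python
def inner_join(left_rows, right_rows, keys):
    """Inner-join two lists of dicts on the given key columns."""
    index = {}
    for row in right_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined

# illustrative rows only, not taken from the data files
scorers = [{'season': '2018-2019', 'player': 'Player A', 'league': 10},
           {'season': '2018-2019', 'player': 'Player B', 'league': 3}]
squads = [{'season': '2018-2019', 'player': 'Player A', 'position': 'Striker'}]

merged = inner_join(scorers, squads, keys=('season', 'player'))
matched_keys = {(m['season'], m['player']) for m in merged}
unmatched = [r['player'] for r in scorers if (r['season'], r['player']) not in matched_keys]
print(merged)
print(unmatched)
```

Comparing the shape before and after the merge, as the cells above do, is a quick way to spot scorers lost in this way.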

### Create a dataframe of players with birthdate and country of birth

In [None]:
print('Loading LFC players csv from {}'.format(LFC_PLAYERS_CSV_FILE_UPDATED))
dflfc_players = pd.read_csv(LFC_PLAYERS_CSV_FILE_UPDATED, parse_dates=['birthdate'])
assert dflfc_players.birthdate.dtypes == 'datetime64[ns]'
dflfc_players.shape

In [None]:
dflfc_players.head()

In [None]:
dflfc_players.tail()

### Create merged dataframe of players, combining scorers in top league level season with squad position and age

Add players age to the dflfc_scorers_tl_pos dataframe

In [None]:
# v3
# change log: Aug 2019, 
# enhanced to pass individual data items as input, rather than a row of data
# made use of dflfc_players more explicit
# ensure consistent calculation of age as year decimal
def age_at_season(player, season, dflfc_players=dflfc_players):
    """Return player's age at mid-point of season, assumed to be 1st Jan.
    
        player -> player's name: string e.g. 'Ian Rush'
        season -> season: string e.g. '1984-1985'
        dflfc_players -> dataframe with columns ['player', 'birthdate'] e.g. 
                player   | birthdate  | country
                Ian Rush | 1961-10-20 | Wales
        
        uses dflfc_players to look-up birthdate, keyed on player
        return age as year.fraction, rounded to one decimal place
        
        e.g. age_at_season('Ian Rush', '1985-1986') -> 24.2
        
        return average age (26.5) if player is missing from dflfc_players
    """
    #print('this player: {}'.format(player))
    
    AVERAGE_AGE = 26.5
    
    # define mid-point of the season, to be used to calculate the players age in a given season
    mid_point = pd.Timestamp('01 January {}'.format(season[-4:]))
    #print('this season mid point: {}'.format(mid_point))
          
    try:
        # look-up player's date of birth
        dob = dflfc_players[dflfc_players.player==player].birthdate.values[0]
    except IndexError:
        # use average age if player's birthdate not available
        print('warning: age not found for player {} in season {}, using average age {}'.format(player, 
                                                                                             season, 
                                                                                             AVERAGE_AGE))
        return AVERAGE_AGE

    # calculate player's age at mid-point of season
    bday = pd.Timestamp(dob)
    result = dr.relativedelta(mid_point.to_pydatetime(), bday.to_pydatetime())
    
    # return result as year.fraction
    yr_int = result.years
    yr_fraction = (result.months*30 + result.days)/365.2425

    return yr_int + round(yr_fraction, 1)
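The arithmetic in `age_at_season()` can be followed by hand with a standalone sketch. The function `age_at_1_jan` below is hypothetical and is not used by the notebook. It reproduces the years, months and days split without dateutil, and that shortcut is only valid because the mid-point is always 1st January.

```python
from datetime import date

def age_at_1_jan(birthdate, season):
    """Age on 1 January of the season's second year, as years plus a rounded fraction."""
    year = int(season[-4:])
    months_total = (year - birthdate.year) * 12 + (1 - birthdate.month)
    if birthdate.day > 1:
        # the day of birth has not yet come round on the 1st of the month,
        # so drop one whole month and count the remaining days from December
        months_total -= 1
        days = 32 - birthdate.day
    else:
        days = 0
    years, months = divmod(months_total, 12)
    # months are approximated as 30 days, as in age_at_season()
    fraction = (months * 30 + days) / 365.2425
    return years + round(fraction, 1)

# Ian Rush, born 20 October 1961: 24 years, 2 months and 12 days on 1 January 1986
print(age_at_1_jan(date(1961, 10, 20), '1985-1986'))
```

For Ian Rush the fraction is (2 × 30 + 12) / 365.2425 ≈ 0.197, which rounds to 0.2. Because months are treated as 30 days, the fraction is an approximation.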

In [None]:
# test function age_at_season()

assert age_at_season('Ian Rush', '1985-1986') == 24.2

# create a dataframe with test data
d = {'Jan_baby': ['01 January 2000', '2000-2001'], 'Feb_baby': ['01 February 2000', '2000-2001'],
     'Jan2_baby': ['02 January 2000', '2000-2001'], 'Dec2_baby': ['31 December 2000', '2000-2001'],
     'Mar_baby': ['01 March 2000', '2000-2001'], 'Apr_baby': ['01 April 2000', '2000-2001'],
     'May_baby': ['01 May 2000', '2000-2001'], 'Jun_baby': ['01 June 2000', '2000-2001'],
     'Jul_baby': ['01 July 2000', '2000-2001'], 'Aug_baby': ['01 August 2000', '2000-2001'],
     'Sep_baby': ['01 September 2000', '2000-2001'], 'Oct_baby': ['01 October 2000', '2000-2001'],
     'Nov_baby': ['01 November 2000', '2000-2001'], 'Dec_baby': ['01 December 2000', '2000-2001']}
df_test = pd.DataFrame.from_dict(d, orient='index', columns=['birthdate', 'season'])
df_test.index.name = 'player'
df_test.reset_index(inplace=True)
df_test['birthdate'] = pd.to_datetime(df_test.birthdate)

# check single values in dataframe
assert (age_at_season('Jul_baby', '2000-2001', df_test) == 0.5)
assert (age_at_season('Nov_baby', '2000-2001', df_test) == 0.2)

# check edge examples
#print(age_at_season('Jan2_baby', '2000-2001', df_test))
assert (age_at_season('Jan2_baby', '2000-2001', df_test) == 1.0)
#print(age_at_season('Dec2_baby', '2000-2001', df_test))
assert (age_at_season('Dec2_baby', '2000-2001', df_test) == 0.0)

# check all values in dataframe
#print(age_at_season('Jan_baby', '2000-2001', df_test))
assert (age_at_season('Jan_baby', '2000-2001', df_test) == 1.0)
#print(age_at_season('Feb_baby', '2000-2001', df_test))
assert (age_at_season('Feb_baby', '2000-2001', df_test) == 0.9)
#print(age_at_season('Mar_baby', '2000-2001', df_test))
assert (age_at_season('Mar_baby', '2000-2001', df_test) == 0.8)
#print(age_at_season('Apr_baby', '2000-2001', df_test))
assert (age_at_season('Apr_baby', '2000-2001', df_test) == 0.7)
#print(age_at_season('May_baby', '2000-2001', df_test))
assert (age_at_season('May_baby', '2000-2001', df_test) == 0.7)
#print(age_at_season('Jun_baby', '2000-2001', df_test))
assert (age_at_season('Jun_baby', '2000-2001', df_test) == 0.6)
#print(age_at_season('Jul_baby', '2000-2001', df_test))
assert (age_at_season('Jul_baby', '2000-2001', df_test) == 0.5)
#print(age_at_season('Aug_baby', '2000-2001', df_test))
assert (age_at_season('Aug_baby', '2000-2001', df_test) == 0.4)
#print(age_at_season('Sep_baby', '2000-2001', df_test))
assert (age_at_season('Sep_baby', '2000-2001', df_test) == 0.3)
#print(age_at_season('Oct_baby', '2000-2001', df_test))
assert (age_at_season('Oct_baby', '2000-2001', df_test) == 0.2)
#print(age_at_season('Nov_baby', '2000-2001', df_test))
assert (age_at_season('Nov_baby', '2000-2001', df_test) == 0.2)
#print(age_at_season('Dec_baby', '2000-2001', df_test))
assert (age_at_season('Dec_baby', '2000-2001', df_test) == 0.1)

# check for same result when applying the function
expected_set = {1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0}
#print(set(df_test.apply(lambda row: age_at_season(row.player, row.season, df_test), axis=1)))
assert (set(df_test.apply(lambda row: age_at_season(row.player, row.season, df_test), axis=1)) == expected_set)

# check specific players with dflfc_players
# Jon Flanagan was born on 1st Jan 1993, so check age is whole number
assert (age_at_season('Jon Flanagan', '2012-2013') == 20.0)
# Rigobert Song was born on 1st Jul 1976, so check the age ends in .5
assert (age_at_season('Rigobert Song', '2005-2006') == 29.5)

In [None]:
# create new dataframe and add age column
# no warnings expected i.e. all players should be found
dflfc_scorers_tl_pos_age = dflfc_scorers_tl_pos.copy()
dflfc_scorers_tl_pos_age['age'] = dflfc_scorers_tl_pos.apply(lambda row: age_at_season(row.player, row.season), axis=1)

In [None]:
dflfc_scorers_tl_pos_age.head(15)

In [None]:
# check data for Kenny Dalglish
dflfc_scorers_tl_pos_age[dflfc_scorers_tl_pos_age.player=='Kenny Dalglish']

#### Save the new dataframe
_This is the key dataframe used in the plot of age vs league goals._

In [None]:
dflfc_scorers_tl_pos_age.to_csv(LFC_SCORERS_TL_POS_AGE_CSV_FILE, header=True, index=False, sep=',')
assert os.path.isfile(LFC_SCORERS_TL_POS_AGE_CSV_FILE) 

### Create a dataframe of players who have scored a goal for Liverpool in the top flight

In [None]:
df_dropdown = pd.DataFrame(dflfc_scorers_tl_pos_age.player.unique(), columns=['player'])\
                .sort_values('player')
print(df_dropdown.head())

#### Save the new dataframe
_This is used by the lfcgmR application to provide a 'dropdown' list of players._

In [None]:
df_dropdown.to_csv(LFCGM_DROPDOWN, header=True, index=False)
assert os.path.isfile(LFCGM_DROPDOWN) 

### Create dataframe of player's league appearances
Input data source: lfchistory.net

In [None]:
# read the appearance file
print('Loading LFC appearances csv from {}'.format(LFC_APPS_CSV_FILE))
dflfc_lgapps = pd.read_csv(LFC_APPS_CSV_FILE)
print(dflfc_lgapps.shape)
dflfc_lgapps.head()

### Create merged dataframe of players, combining scorers in top league level season with squad position, age and appearances

In [None]:
dflfc_scorers_tl_pos_age_apps = dflfc_scorers_tl_pos_age.merge(dflfc_lgapps)
# calculate GPG for each player season and add as a column
dflfc_scorers_tl_pos_age_apps['GPG'] = (dflfc_scorers_tl_pos_age_apps.league/dflfc_scorers_tl_pos_age_apps.lgapp).round(3)
print(dflfc_scorers_tl_pos_age_apps.shape)
print(dflfc_scorers_tl_pos.shape)

In [None]:
dflfc_scorers_tl_pos_age_apps.head()

In [None]:
dflfc_scorers_tl_pos_age_apps.tail()

In [None]:
# show top GPG for players with > 10 games in a season
dflfc_scorers_tl_pos_age_apps[dflfc_scorers_tl_pos_age_apps.lgapp > 10].sort_values('GPG', ascending=False).head(10)
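The goals per game (GPG) calculation is a simple ratio rounded to three decimal places. The standalone sketch below shows the same arithmetic for a single player season. The helper `goals_per_game` is hypothetical and the figures are illustrative. It adds a guard for zero appearances, a case that pandas would report as `inf` or `NaN` rather than as an error.

```python
def goals_per_game(league_goals, league_apps):
    """Return goals per game rounded to 3 places, or None when there are no appearances."""
    if not league_apps:
        return None
    return round(league_goals / league_apps, 3)

# illustrative figures only
print(goals_per_game(20, 30))
print(goals_per_game(1, 0))
```

Filtering on more than 10 appearances, as in the cell above, stops a player with one goal in one game from topping the table.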

## Validate The App Data

Read the key player goals vs age data that is used by lfcgmR app

In [None]:
dflfcgm = pd.read_csv(LFC_SCORERS_TL_POS_AGE_CSV_FILE)
dflfcgm_dd = pd.read_csv(LFCGM_DROPDOWN)

In [None]:
dflfcgm.head()

In [None]:
dflfcgm_dd.head()

In [None]:
# check start and end
assert(dflfcgm.season.head(1).values[0] == SEASON_END)
assert(dflfcgm.season.tail(1).values[0] == '1894-1895') # first season in top flight

# check lfcgm dropdown consistent with main data source
assert(set(dflfcgm_dd.player) == set(dflfcgm.player))

# confirm that season 1939-1940 is not included in analysis
assert len(dflfcgm[dflfcgm.season == '1939-1940']) == 0

In [None]:
dflfcgm.head()

In [None]:
# show players scoring for first time in latest season
# these are new to the app dropdown
LATEST_SEASON = SEASON_END
latest_season_int = int(LATEST_SEASON[0:4])
PREVIOUS_SEASON = '{}-{}'.format(latest_season_int-1, latest_season_int)
#print(LATEST_SEASON, PREVIOUS_SEASON)
scorers_for_latest_season_set = set(dflfcgm[(dflfcgm.season == LATEST_SEASON) & (dflfcgm.league >= 1)].player.values)
scorers_for_previous_seasons_set = set(dflfcgm[(dflfcgm.season <= PREVIOUS_SEASON) & (dflfcgm.league >= 1)].player.values)
print('players scoring for first time in latest season: \n\t{}'\
          .format(', '.join(list(scorers_for_latest_season_set - scorers_for_previous_seasons_set))))
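The first-time scorer check above is a set difference. The standalone sketch below shows the same logic on a few illustrative rows (the player names are placeholders). It also shows why comparing season labels as strings is safe: the 'YYYY-YYYY' labels have a fixed width, so they sort chronologically.

```python
# illustrative (season, player, league goals) rows, not taken from the data files
rows = [('2018-2019', 'Player A', 5),
        ('2018-2019', 'Player B', 1),
        ('2017-2018', 'Player A', 7),
        ('2016-2017', 'Player C', 2)]

latest = '2018-2019'
start = int(latest[:4])
previous = '{}-{}'.format(start - 1, start)

# string comparison orders the fixed-width season labels chronologically
latest_scorers = {p for s, p, g in rows if s == latest and g >= 1}
earlier_scorers = {p for s, p, g in rows if s <= previous and g >= 1}

first_timers = sorted(latest_scorers - earlier_scorers)
print(first_timers)
```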

Validate the data using reference data from lfchistory.net

In [None]:
# build a dictionary with selected LFC season and total top flight league goalscorers in that season
# ref: data from lfchistory.net, season archive
lfc_season_tot_league_goals_d = {'1894-1895': 9,
                                 '1905-1906': 11,
                                 '1914-1915': 8,
                                 '1924-1925': 11,
                                 '1934-1935': 11,
                                 '1946-1947': 11,
                                 '1953-1954': 14,
                                 '1964-1965': 15,
                                 '1974-1975': 12,
                                 '1984-1985': 12,
                                 '1994-1995': 9,
                                 '2004-2005': 13,
                                 '2014-2015': 15}

# check total number of scorers matches dictionary
for season, tot_lg_scorers in lfc_season_tot_league_goals_d.items():
    #print(season, tot_lg_scorers, len(dflfcgm[(dflfcgm.season==season)].player.unique()))
    assert tot_lg_scorers == len(dflfcgm[(dflfcgm.season==season)].player.unique())

In [None]:
# build a dictionary with selected LFC players, giving list of 
# (season, total top flight league goals in that season, age in that season)
# ref: data from lfchistory.net, player archive (with calculation of age based on player profile)
lfc_player_season_tot_league_goals_d = {
   'Alex Raisbeck': [('1898-1899', 1, 19.0),
                     ('1899-1900', 3, 20.0),
                     ('1900-1901', 1, 21.0),
                     ('1902-1903', 1, 23.0),
                     ('1903-1904', 1, 24.0),
                     ('1905-1906', 1, 26.0),
                     ('1906-1907', 4, 27.0),
                     ('1907-1908', 2, 28.0),
                     ('1908-1909', 2, 29.0)],
        'Ian Rush': [('1981-1982', 17, 20.2),
                     ('1982-1983', 24, 21.2),
                     ('1983-1984', 32, 22.2),
                     ('1984-1985', 14, 23.2),
                     ('1985-1986', 22, 24.2),
                     ('1986-1987', 30, 25.2),
                     ('1988-1989', 7, 27.2),
                     ('1989-1990', 18, 28.2),
                     ('1990-1991', 16, 29.2),
                     ('1991-1992', 4, 30.2),
                     ('1992-1993', 14, 31.2),
                     ('1993-1994', 14, 32.2),
                     ('1994-1995', 12, 33.2),
                     ('1995-1996', 5, 34.2)],
   'Mohamed Salah': [('2017-2018', 32, 25.5),
                     ('2018-2019', 22, 26.5)]}

# check each player's league goals and age in each season match the dictionary
for player, l in lfc_player_season_tot_league_goals_d.items():
    for season, tot_lg_goals, age in l:
        #print(player, season, tot_lg_goals)
        assert dflfcgm[(dflfcgm.season==season) & (dflfcgm.player==player)].league.values[0] == tot_lg_goals
        assert dflfcgm[(dflfcgm.season==season) & (dflfcgm.player==player)].age.values[0] == age

## Explore The Lfcgm App's Core Plotting Function

### Set-up the R environment

In [None]:
%%R

#load R libraries
library(ggplot2)
library(dplyr)
library(scales)

### Define the core R plotting function using ggplot2

In [None]:
%%R

# Change log
# v1, February 2016, original
# v2, August 2019, enhance to ensure pretty printing of x and y axis
ggplot_age_vs_lgoals <- function(df, players) {
  # Return ggplot of League Goals vs Age for given players in dataframe.
  #
  #  Given the low number of points, ggplot's geom_smooth uses
  #  the loess method with default span.
  
  TITLE <- 'LFCGMR League Goals vs Age'
  XLABEL <- 'Age at Midpoint of Season'
  YLABEL <- 'League Goals per Season'
  EXEMPLAR_PLAYERS <- c('Ian Rush', 'Kenny Dalglish', 'Roger Hunt', 'David Johnson', 
                        'Harry Chambers', 'John Toshack', 'John Barnes', 'Kevin Keegan')
  EXEMPLAR_TITLE <- 'LFCGMR Example Plot, The Champions: League Goals vs Age

This plot shows the goalscoring performance over their Liverpool career of 
arguably the most important 8 players, those who scored most goals in the 
18 title winning seasons
'
  
  # if players vector is empty then set the default exemplar options
  if (length(players) == 0) {
    players <- EXEMPLAR_PLAYERS
    title <- EXEMPLAR_TITLE
  } else {
    title <- TITLE
  }
  
  # create dataframes to plot...
  # filter those players with only 2 points and those with more than 2
  this_df <- df[df$player %in% players, ]
  this_dfeq2 <- this_df %>% group_by(player) %>% filter(n()==2)
  this_dfgt2 <- this_df %>% group_by(player) %>% filter(n()>2) 
  
  # produce the plot and return it
  this_plot <- ggplot(this_df, aes(x=age, y=league, color=player, shape=player)) + 
    geom_point(size=2) + 
    geom_line(data=this_dfeq2, size=0.5) +
    geom_smooth(data=this_dfgt2, se=FALSE, size=0.8) + 
    xlab(XLABEL) + 
    ylab(YLABEL) + 
    ggtitle(title) + 
    scale_shape_manual(values=0:length(players)) +
    theme(legend.text=element_text(size=10)) + 
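    # set the limits and integer breaks in a single y scale; a separate ylim()
    # would replace scale_y_continuous() and discard the custom breaks
    scale_y_continuous(limits = c(0, max(this_df$league) + 1),
                       breaks = function(x) unique(floor(pretty(seq(0, (max(x) + 1) * 1.1))))) + 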
    scale_x_continuous(breaks = pretty_breaks())
  return (this_plot)
}

### Explore different plots and check the function is well-behaved

Note that one of the key challenges is that the function must cope with:
- single points (for players who have only scored in 1 season)
- straight lines (for players who have scored in 2 seasons)
- curved lines, using a line of best fit (for players who have scored in more than 2 seasons)
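The three cases correspond to the number of scoring seasons per player, which the R function separates with `filter(n()==2)` and `filter(n()>2)`. The standalone Python sketch below shows the same classification. The function name `plot_style_by_player` and the player names are illustrative.

```python
from collections import Counter

def plot_style_by_player(player_per_row):
    """Map each player to 'point', 'line' or 'smooth' from their number of scoring seasons."""
    counts = Counter(player_per_row)
    styles = {}
    for player, n in counts.items():
        if n == 1:
            styles[player] = 'point'    # a single season: marker only
        elif n == 2:
            styles[player] = 'line'     # two seasons: straight line between the points
        else:
            styles[player] = 'smooth'   # three or more seasons: line of best fit
    return styles

# one entry per scoring season; illustrative names only
rows = ['Player A', 'Player B', 'Player B', 'Player C', 'Player C', 'Player C']
print(plot_style_by_player(rows))
```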

Start with the default plot

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

# show the default plot, showing the champions (see below for more info on 'The Champions')
players = c()
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Plot a player with a single point

In [None]:
# plot a player with a single point
dflfcgm[dflfcgm.player=='Abel Xavier']

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

# plot a player with a single point
players = c('Abel Xavier')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Plot a player with 2 points

In [None]:
# plot a player with 2 points
dflfcgm[dflfcgm.player=='Andy Carroll']

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

# plot a player with 2 points
players = c('Andy Carroll')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Plot a player with more than 2 points

In [None]:
# plot a player with more than 2 points
dflfcgm[dflfcgm.player=='Luis Suarez']

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

# plot a player with more than 2 points
players = c('Luis Suarez')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Plot all 3, a player with 1, 2 and more than 2 points

In [None]:
# plot all 3
dflfcgm[dflfcgm.player.isin(['Abel Xavier', 'Andy Carroll', 'Luis Suarez'])]

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

# Plot all 3, a player with 1, 2 and more than 2 points
players = c('Abel Xavier', 'Andy Carroll', 'Luis Suarez')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

### Show some interesting LFC plots

Early Riser

In [None]:
# show all players scoring 20 or more goals when under 20 years old
dflfcgm[(dflfcgm.league >= 20) & 
        (dflfcgm.age < 20)]

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

# produce plot for player known as 'god'
players = c('Robbie Fowler')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Late Flourish

In [None]:
# show all players scoring 20 or more goals when over 30 years old
df_late = dflfcgm[(dflfcgm.league >= 20) & 
                  (dflfcgm.age > 30)]
players = list(df_late.player.values)
print(players)
df_late

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('John Aldridge', 'Jack Balmer', 'Gordon Hodgson', 'Dick Forshaw', 'Ronald Orr')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

All time career top scorers

In [None]:
# show players scoring most league goals over their career
df_top = dflfcgm[['player', 'league']].groupby('player').sum()
df_top = df_top.sort_values('league', ascending=False).head(12)
df_top

In [None]:
players = list(df_top[df_top.league >= 120].index.values)
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Gordon Hodgson', 'Ian Rush', 'Roger Hunt', 'Harry Chambers', 'Robbie Fowler', 'Steven Gerrard')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Elite 30

In [None]:
# show the highest league goal tallies in a season; the plot below uses players scoring more than 30
df_elite = dflfcgm[['season', 'player', 'league']].sort_values('league', ascending=False)
df_elite.head(10)

In [None]:
players = list(df_elite[df_elite.league > 30].player.unique())
print(players)
#ggplot_age_vs_lgoals(dflfc_scorers_tl_pos_age, list(players))

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Gordon Hodgson', 'Ian Rush', 'Mohamed Salah', 'Sam Raybould', 'Luis Suarez', 'Roger Hunt')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

A Striking Trio
- For a discussion of Liverpool's best ever trio see Terry's blog [A Striking Trio](http://www.lfcsorted.com/2015/07/lfc-striking-trio.html)

In [None]:
# show best total for a striking trio in the league
df_trio = dflfcgm[['season', 'league']].groupby('season').head(3).groupby('season').sum()
df_trio.sort_values('league', ascending=False).head(10)
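Note that `groupby('season').head(3)` takes the first three rows of each season in their existing order, so the trio total appears to rely on the data already being sorted by league goals within each season. The standalone sketch below makes that sort explicit. The function `top_n_total` and the rows are illustrative.

```python
from collections import defaultdict

def top_n_total(rows, n):
    """Sum the n highest league-goal tallies in each season."""
    by_season = defaultdict(list)
    for season, goals in rows:
        by_season[season].append(goals)
    return {s: sum(sorted(g, reverse=True)[:n]) for s, g in by_season.items()}

# illustrative (season, league goals) rows, deliberately not pre-sorted
rows = [('2000-2001', 4), ('2000-2001', 12), ('2000-2001', 9), ('2000-2001', 7),
        ('2001-2002', 20), ('2001-2002', 1)]

print(top_n_total(rows, 3))
```

The same function with `n=2` gives the striking duo totals.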

In [None]:
TOP_TRIO = ['1963-1964']
df_trio_players = dflfcgm[['season', 'player', 'league']]\
                                            [dflfcgm.season.isin(TOP_TRIO)].groupby('season').head(3)
players = list(df_trio_players.player.values)
print(players)
df_trio_players

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Roger Hunt', 'Ian St John', 'Alf Arrowsmith')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

A Striking Duo

In [None]:
# show best total for a striking duo in the league
df_duo = dflfcgm[['season', 'league']].groupby('season').head(2).groupby('season').sum()
df_duo.sort_values('league', ascending=False).head(20)

In [None]:
TOP_DUO = ['1963-1964', '2013-2014']
df_duo_players = dflfcgm[['season', 'player', 'league']]\
                             [dflfcgm.season.isin(TOP_DUO)].groupby('season').head(2)
df_duo_players

In [None]:
# plot first of TOP_DUO seasons
players = list(df_duo_players[df_duo_players.season == TOP_DUO[0]].player.values)
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Roger Hunt', 'Ian St John')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

In [None]:
# plot second of TOP_DUO seasons
players = list(df_duo_players[df_duo_players.season == TOP_DUO[1]].player.values)
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Luis Suarez', 'Daniel Sturridge')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Performance of Liverpool players who went on to be managers

In [None]:
# produce list of managers - ref: http://www.lfchistory.net/Managers/
MANAGERS = ['William Barclay', 'Tom Watson', 'David Ashworth', 'Matt McQueen', 'George Patterson',\
    'George Kay', 'Don Welsh', 'Phil Taylor', 'Bill Shankly', 'Bob Paisley', 'Joe Fagan',\
    'Kenny Dalglish', 'Graeme Souness', 'Roy Evans', 'Gerard Houllier',\
    'Rafael Benitez', 'Roy Hodgson', 'Kenny Dalglish', 'Brendan Rodgers', 'Jurgen Klopp']
# excludes Ronnie Moran who was temporary manager in 1991

In [None]:
# produce list of players (who scored in more than 1 season at top level) who were managers
df_mgrs = dflfcgm[['player', 'league']][dflfcgm.player.isin(MANAGERS)]\
                                        .groupby('player').sum().sort_values('league', ascending=False)
df_mgrs

In [None]:
players = list(df_mgrs.index.values)
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Kenny Dalglish', 'Graeme Souness', 'Phil Taylor', 'Bob Paisley')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Top midfielders

In [None]:
# show midfielders who have scored more than 15 league goals in a season
df_mids = dflfcgm[(dflfcgm.position == 'Midfielder') &
                    (dflfcgm.league > 15)].sort_values('league', ascending=False)
df_mids

In [None]:
players = list(df_mids.sort_values('league', ascending=False).player.unique())
print(len(players), players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Billy Liddell', 'John Wark', 'Kenny Dalglish', 'Gordon Gunson', 'Steven Gerrard', 'John Barnes', 'Dick Edmed')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Top Defenders

In [None]:
# show defenders who have scored more than 6 league goals in a season
df_defs = dflfcgm[(dflfcgm.position == 'Defender') &
                  (dflfcgm.league > 6)].sort_values('league', ascending=False)
df_defs

In [None]:
players = list(df_defs.sort_values('league', ascending=False).player.unique())
print(len(players), players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Chris Lawler', 'Phil Neal', 'James Milner', 'Martin Skrtel', 'John Arne Riise')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Peak Performance

In [None]:
# show player with top score in a season, Gordon Hodgson
top_player = dflfcgm[dflfcgm.league == max(dflfcgm.league)]
top_player

In [None]:
players = list(top_player.player.values)
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Gordon Hodgson')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Rocket Men

Show the players who scored 50 or more league goals in total over 3 or more consecutive seasons, with the number of goals never falling from one season to the next.

In [None]:
# create dataframe ordered by player and season
# keep numeric columns only, so that each row holds just league goals and age
df = dflfcgm.groupby(['player', 'season']).sum(numeric_only=True)
df.head(12)

In [None]:
def linefit(x, y):
    """Return gradient and intercept of straight line of best fit for given x and y arrays."""
    gradient, intercept = np.polyfit(x, y, 1)
    return gradient, intercept

In [None]:
# test linefit()
# using y = 2x^2 + 6
x=np.array([-1, 0, 1, 2])
print(x)
y=2*x*x + 6
print(y)
print(plt.plot(x, y))
gradient, intercept = linefit(x, y)
print(np.round(gradient, 1), np.round(intercept, 1))
print(plt.plot(x, gradient*x + intercept))

In [None]:
# Show the players who scored 50 or more goals in total over 3 or more consecutive seasons,
# with the number of goals never falling from one season to the next.
MIN_SEASONS = 3
MIN_TOTAL_GOALS = 50
p_prev = None # previous player
l_prev = None # previous league goals
Lg = [] # List of consecutive goals
La = [] # List of consecutive ages
Ls = [] # List of consecutive seasons

def report_rocket_man(player, Lg, La, Ls):
    """Print the run if it is long enough and has enough total goals."""
    if len(Lg) >= MIN_SEASONS and sum(Lg) >= MIN_TOTAL_GOALS:
        grad, intercept = linefit(np.array(range(len(Lg))), np.array(Lg))
        print('Rocket man: {}, goals={}, start_season={}, start_age={}, total={}, grad={}'
              .format(player, Lg, Ls[0], La[0], sum(Lg), np.round(grad, 2)))

# iterate through dataframe
# for each row of (player, season) (league goals, age)
for (p, s), (l, a) in df.iterrows():
    if p != p_prev:
        # new player, so check the previous player's last run and reset
        report_rocket_man(p_prev, Lg, La, Ls)
        l_prev = None
        Lg, La, Ls = [], [], []

    if (l_prev is None) or (l >= l_prev):
        # goals did not fall, so extend the current run
        Lg.append(l)
        La.append(a)
        Ls.append(s)
    else:
        # goals fell, so check the completed run and start a new one
        report_rocket_man(p, Lg, La, Ls)
        Lg, La, Ls = [l], [a], [s]

    l_prev = l
    p_prev = p

# check the last run of the final player, which the loop itself never reports
report_rocket_man(p_prev, Lg, La, Ls)

Top 5 Rocket Men (sorted by gradient of line of best fit) are
- Dick Forshaw, 11
- Luis Suarez, 9.3
- Gordon Hodgson, 8.5
- Ian Rush, 8.0
- Robbie Fowler, 8.0
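
The gradients above can be checked by hand. The following is a minimal sketch in plain Python rather than the notebook's own code: the helper names `non_decreasing_runs` and `gradient` are hypothetical, and the `tallies` list is made up for illustration. It splits a sequence of season tallies into runs that never fall, and computes the least-squares slope of a run against the season index 0, 1, 2, ..., which is the same quantity that `linefit()` obtains from `np.polyfit`.

```python
def non_decreasing_runs(goals):
    """Split a list of season goal tallies into runs that never fall."""
    runs = []
    for g in goals:
        if runs and g >= runs[-1][-1]:
            runs[-1].append(g)   # tally did not fall, extend the current run
        else:
            runs.append([g])     # first tally, or tally fell: start a new run
    return runs

def gradient(values):
    """Least-squares slope of values against 0, 1, 2, ... (needs at least two values)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx

# Made-up season tallies for one player, in season order.
tallies = [3, 12, 25, 28, 18, 9]
runs = non_decreasing_runs(tallies)
print(runs)

# The Robbie Fowler run used in the example cell that follows.
fowler_run = [12.0, 25.0, 28.0]
print(round(gradient(fowler_run), 2))
```

For a three-season run the slope reduces to half the difference between the last and first tallies, which is why the Fowler run of 12, 25 and 28 goals gives a gradient of 8.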

In [None]:
# show example graph of the rocket portion of the player's career e.g. Robbie Fowler
p = 'Robbie Fowler'
Lg = [12.0, 25.0, 28.0]
dfp = dflfcgm[(dflfcgm.player == p) &
              (dflfcgm.league.isin(Lg))]
print(dfp)
#print ggplot_age_vs_lgoals(dfp, [p])

In [None]:
%%R -i dfp -w 12 -h 8 -u in

players = c('Robbie Fowler')
plt <- ggplot_age_vs_lgoals(dfp, players)
print(plt)

Striking Nostalgia

In [None]:
# Just a few of my early favourites
players = ['Kevin Keegan', 'Kenny Dalglish', 'Steve Heighway']

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Kevin Keegan', 'Kenny Dalglish', 'Steve Heighway')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Highest scoring midfielders over career

In [None]:
df = dflfcgm[(dflfcgm.position == 'Midfielder')]\
                            .groupby('player').sum()
print(df[df.league > 50].sort_values('league', ascending=False)['league'])
players = df[df.league > 50].sort_values('league', ascending=False).index.unique()
print(len(players), list(players))

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Steven Gerrard', 'Billy Liddell', 'Berry Nieuwenhuys', 'Arthur Goddard', 'Jack Cox', 'John Barnes', 'Terry McDermott')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Highest scoring defenders over career

In [None]:
df = dflfcgm[(dflfcgm.position == 'Defender')]\
                            .groupby('player').sum()
print(df[df.league > 20].sort_values('league', ascending=False)['league'])
players = df[df.league > 20].sort_values('league', ascending=False).index.unique()
print(len(players), list(players))

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Chris Lawler', 'Phil Neal', 'Tommy Smith', 'Donald Mackinlay', 'Steve Nicol', 'Sami Hyypia', 'John Arne Riise')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

The Champions

This plot shows the goalscoring performance, over their Liverpool careers, of 
arguably the 8 most important players: those who scored the most goals in the 
18 title-winning seasons.

In [None]:
# create list of seasons when LFC were champions
CHAMPS = ['1900-1901', '1905-1906', '1921-1922', '1922-1923', '1946-1947', '1963-1964',\
          '1965-1966', '1972-1973', '1975-1976', '1976-1977', '1978-1979', '1979-1980',\
          '1981-1982', '1982-1983', '1983-1984', '1985-1986', '1987-1988', '1989-1990']
print(len(CHAMPS))

In [None]:
# show total goals over career in title winning teams
df_champs = dflfcgm[dflfcgm.season.isin(CHAMPS)][['league', 'player']].groupby('player').sum()\
                            .sort_values('league', ascending=False).head(12)
df_champs

In [None]:
# plot top 8
players = list(df_champs.index.values[:8])
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Ian Rush', 'Kenny Dalglish', 'Roger Hunt', 'David Johnson', 'Harry Chambers', 'John Toshack', 'John Barnes', 'Kevin Keegan')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

In [None]:
# show highest scorers in a title winning season
dflfcgm[dflfcgm.season.isin(CHAMPS)].sort_values('league', ascending=False).head(12)

European Cup Winning Team, May 1977

In [None]:
players = ['Ray Clemence', 'Phil Neal', 'Joey Jones', 'Tommy Smith',
           'Ray Kennedy', 'Emlyn Hughes', 'Kevin Keegan', 'Jimmy Case',
           'Steve Heighway', 'Ian Callaghan', 'Terry McDermott']
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Ray Clemence', 'Phil Neal', 'Joey Jones', 'Tommy Smith', 'Ray Kennedy', 'Emlyn Hughes', 'Kevin Keegan', 'Jimmy Case', 'Steve Heighway', 'Ian Callaghan', 'Terry McDermott')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

Best goals per game

In [None]:
dflfc_scorers_tl_pos_age_apps.head()

In [None]:
# show best goals per game (GPG) in a season, for players with more than 10 league appearances
dflfc_scorers_tl_pos_age_apps[dflfc_scorers_tl_pos_age_apps.lgapp > 10].sort_values('GPG', ascending=False).head(15)

In [None]:
# show best Career GPG (CGPG), for players with more than 50 league appearances
df_gpg = dflfc_scorers_tl_pos_age_apps[['player', 'league', 'lgapp']].groupby('player').sum()
df_gpg['CGPG'] = (df_gpg.league/df_gpg.lgapp).round(3) # career goals per game
df_gpg['CMPG'] = (df_gpg.lgapp*90/df_gpg.league).round(3) # career minutes per goal (assume all apps = 90 mins)
df_gpg[df_gpg.lgapp > 50].sort_values('CGPG', ascending=False).head(12)
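
As a minimal sketch of the same calculation, the cell below applies it to made-up goals and appearances (the player names and numbers are hypothetical). Because every appearance is assumed to last 90 minutes, CMPG is simply 90 divided by the unrounded goals-per-game ratio.

```python
import pandas as pd

# Made-up appearances and goals for illustration only.
toy = pd.DataFrame({
    'player': ['p1', 'p1', 'p2', 'p2'],
    'league': [10, 20, 5, 15],   # league goals in a season
    'lgapp':  [30, 30, 40, 40],  # league appearances in a season
})

# Sum goals and appearances over each player's career, then derive the ratios.
career = toy.groupby('player')[['league', 'lgapp']].sum()
career['CGPG'] = (career.league / career.lgapp).round(3)       # career goals per game
career['CMPG'] = (career.lgapp * 90 / career.league).round(3)  # career minutes per goal
print(career.sort_values('CGPG', ascending=False))
```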

In [None]:
# plot top 6 goal scorers with best Career GPG
players = list(df_gpg[df_gpg.lgapp > 50].sort_values('CGPG', ascending=False).head(6).index.values)
print(players)

In [None]:
%%R -i dflfcgm -w 12 -h 8 -u in

players = c('Mohamed Salah', 'Gordon Hodgson', 'Fernando Torres', 'Luis Suarez', 'Jimmy Smith', 'John Aldridge')
plt <- ggplot_age_vs_lgoals(dflfcgm, players)
print(plt)

## Note on the variable number of games per season

Note that the number of league games has varied over the top level seasons.

In [None]:
# show the distinct numbers of league games played in a top level season
print(dflfc_league[dflfc_league.League.isin(['1st Division', 'Premier League'])].PLD.unique())

In [None]:
# show number of seasons for each total
dflfc_league[dflfc_league.League.isin(['1st Division', 'Premier League'])][['PLD', 'Season']].groupby('PLD').count()

In [None]:
# show the seasons for each total
dflfc_league[dflfc_league.League.isin(['1st Division', 'Premier League'])][['PLD', 'Season']]\
                        .groupby('PLD')['Season'].apply(lambda x: ','.join(x))

## lfcgmR App

### Building and Deploying the lfcgmR App

The lfcgmR interactive web app is built using R Shiny and published on the shinyapps.io cloud platform.

Useful reference material:
+ How to develop a shiny app [https://shiny.rstudio.com/](https://shiny.rstudio.com/).

### Running the App

The app is available at [lfcgmr.lfcsorted.com](http://lfcgmr.lfcsorted.com) and [terrydolan.shinyapps.io/lfcgmR](https://terrydolan.shinyapps.io/lfcgmR/).

### App Source Code

The lfcgmR source code is available on GitHub at [lfcgmR github repo](https://github.com/terrydolan/lfcgmR).

The main files are:
+ ui.R: user interface definition.
+ server.R: server definition, with main plot and table functions.
+ global.R: global definition e.g. version.
+ about.R: description of the app.
+ lfcgmR.ipynb: this notebook.

### App Data

The LFC Goal Machine app uses the following data files:
+ data/lfcgm_app_dropdown.csv (used to build the dropdown list of players)
+ data/lfc_scorers_tl_pos_age.csv (used to build the dataframe of LFC scorers in the top level league)

The data structure of these files is described in this notebook. However, the data is not in the lfcgmR GitHub repository because the data is owned by [lfchistory.net](http://www.lfchistory.net).
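
Since the data files are not distributed, the following is a minimal sketch of how a scorers file might be loaded and checked before use. The helper name `load_scorers` is hypothetical, the rows are made up, and the column names are assumptions based on how `dflfcgm` is used in this notebook (`season`, `player`, `league`, `position`, plus an assumed `age` column).

```python
import io
import pandas as pd

# Made-up rows standing in for data/lfc_scorers_tl_pos_age.csv.
# Column names are assumptions, not a confirmed description of the real file.
csv_text = """season,player,league,position,age
1900-1901,Player A,10,Midfielder,24.5
1900-1901,Player B,3,Defender,29.0
"""

def load_scorers(source):
    """Read scorers data from a path or file-like object and check its columns."""
    df = pd.read_csv(source)
    expected = {'season', 'player', 'league', 'position', 'age'}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError('missing columns: {}'.format(sorted(missing)))
    return df

scorers = load_scorers(io.StringIO(csv_text))
print(scorers.dtypes)
```

With a local copy of the data, the same function could be called with the file path instead of the `StringIO` object.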

In [None]:
print("reached final cell")