# Peak Player Performance 

This notebook is based on the article [The Best Stats Measure](https://www.thecricketmonthly.com/story/1057899/the-best-stats-measure) by Andy Zaltzman, which compares alternative ways of measuring the ability of cricketers. In it he defines the Peak-33 metric:

> Peak-33 is based on the 33 matches in which batsmen scored most runs and bowlers took most wickets, rather than the 33 in which they returned the best average. 

The search for better metrics of cricket performance is ongoing, and how to compare players across different eras remains hotly debated. Zaltzman goes on to discuss normalising the numbers by era and by match, but I will not explore that here, at least initially.

This notebook implements averages and summaries over chosen time periods, matches and innings for a given cricketer, so that cricketers can be compared at their peaks. The stats are scraped from the web using `BeautifulSoup`.

In [4]:
import numpy as np 
import pandas as pd 
import requests
from bs4 import BeautifulSoup
import re

I will begin by using Ben Stokes as a working example.

In [5]:
player_id = 311158 # Ben Stokes ESPN CricInfo ID 

base_player_url = f'https://stats.espncricinfo.com/ci/engine/player/{player_id}.html'
batting_innings_link = base_player_url + '?class=1;template=results;type=batting;view=innings'
bowling_innings_link = base_player_url + '?class=1;template=results;type=bowling;view=innings'
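The two links differ only in the `type` parameter, so the URL construction can be wrapped in a small function to make it easy to switch player or discipline. This is a minimal sketch: `innings_url` and its `match_class` argument are hypothetical names, not part of the original notebook, and `class=1` is assumed to select Test matches, as the rest of the notebook implies.

```python
STATSGURU_PLAYER_URL = 'https://stats.espncricinfo.com/ci/engine/player/{player_id}.html'


def innings_url(player_id, stat_type, match_class=1):
    """Build an innings-by-innings URL (hypothetical helper, for illustration)."""
    if stat_type not in ('batting', 'bowling'):
        raise ValueError(f"stat_type must be 'batting' or 'bowling', got {stat_type!r}")
    # Same query string as the hand-built links, with the type filled in
    query = f'class={match_class};template=results;type={stat_type};view=innings'
    return STATSGURU_PLAYER_URL.format(player_id=player_id) + '?' + query


# 311158 is the Ben Stokes ID used in this notebook
print(innings_url(311158, 'batting'))
print(innings_url(311158, 'bowling'))
```

The output should match `batting_innings_link` and `bowling_innings_link` exactly.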

In [6]:
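def get_innings_by_innings_table(url):
    """Scrape the 'Innings by innings list' table at `url` into a DataFrame."""
    soup = BeautifulSoup(requests.get(url).text, features="html.parser")

    # Find the stats table through its caption
    main_table = None
    for caption in soup.find_all('caption'):
        if caption.get_text() == 'Innings by innings list':
            main_table = caption.find_parent(
                'table', {'class': 'engineTable'})
    if main_table is None:
        raise ValueError(f"No 'Innings by innings list' table found at {url}")

    columns = [header.get_text() for header in main_table.find('thead').find_all('tr')[0].find_all('th')]
    rows = [[stat.get_text() for stat in innings.find_all('td')]
            for innings in main_table.find('tbody').find_all('tr')]

    def to_numeric_if_possible(column):
        # Convert a column to numbers; leave it unchanged if any value is non-numeric.
        # This replaces errors='ignore', which is deprecated in recent pandas versions.
        try:
            return pd.to_numeric(column)
        except (ValueError, TypeError):
            return column

    final_table = pd.DataFrame(rows, columns=columns).apply(to_numeric_if_possible)

    # Remove blank columns
    return final_table.loc[:, final_table.columns != '']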
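The table-extraction steps can be tried without a network request by running them on an inline HTML fragment. The fragment below is hand-written to mimic the structure the function expects (a captioned `engineTable` with `thead` and `tbody`, plus a blank spacer column); it is an assumption, and the real page markup may differ. The values are taken from the first two batting innings shown later in this notebook.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written stand-in for the scraped page (assumed structure)
SAMPLE_HTML = """
<table class="engineTable">
  <caption>Innings by innings list</caption>
  <thead><tr><th>Runs</th><th>BF</th><th></th><th>Ground</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>12</td><td></td><td>Adelaide</td></tr>
    <tr><td>28</td><td>90</td><td></td><td>Adelaide</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, features='html.parser')

# Locate the table through its caption, as the function does
table = None
for caption in soup.find_all('caption'):
    if caption.get_text() == 'Innings by innings list':
        table = caption.find_parent('table', {'class': 'engineTable'})

headers = [th.get_text() for th in table.find('thead').find('tr').find_all('th')]
rows = [[td.get_text() for td in tr.find_all('td')]
        for tr in table.find('tbody').find_all('tr')]

df = pd.DataFrame(rows, columns=headers)
df = df.loc[:, df.columns != '']      # drop the blank spacer column
df['Runs'] = pd.to_numeric(df['Runs'])  # scraped cells start out as strings
print(df)
```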

We can now fetch the lists of Test match batting and bowling innings.

In [19]:
batting = get_innings_by_innings_table(batting_innings_link)
bowling = get_innings_by_innings_table(bowling_innings_link)

In [20]:
batting.head()

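|   | Runs | Mins | BF | 4s | 6s | SR | Pos | Dismissal | Inns | Opposition | Ground | Start Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 19 | 12 | 0 | 0 | 8.33 | 6 | lbw | 2 | v Australia | Adelaide | 5 Dec 2013 |
| 1 | 28 | 122 | 90 | 5 | 0 | 31.11 | 6 | caught | 4 | v Australia | Adelaide | 5 Dec 2013 |
| 2 | 18 | 94 | 57 | 3 | 0 | 31.57 | 6 | caught | 2 | v Australia | Perth | 13 Dec 2013 |
| 3 | 120 | 256 | 195 | 18 | 1 | 61.53 | 6 | caught | 4 | v Australia | Perth | 13 Dec 2013 |
| 4 | 14 | 37 | 23 | 1 | 1 | 60.86 | 6 | caught | 1 | v Australia | Melbourne | 26 Dec 2013 |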
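With innings in this shape, a Peak-N batting figure can be sketched by grouping innings into matches, keeping the N matches with the most runs, and dividing runs by dismissals across them. This is my reading of the definition quoted at the top, not Zaltzman's own code. `peak_batting_average` is a hypothetical helper, and it assumes that `Runs` is already numeric and that undismissed innings are marked `not out` in the `Dismissal` column; real scraped data may need cleaning first (for example not-out markers or did-not-bat rows).

```python
import pandas as pd

# The five innings shown above, reduced to the columns the calculation needs
sample = pd.DataFrame({
    'Runs': [1, 28, 18, 120, 14],
    'Dismissal': ['lbw', 'caught', 'caught', 'caught', 'caught'],
    'Opposition': ['v Australia'] * 5,
    'Ground': ['Adelaide', 'Adelaide', 'Perth', 'Perth', 'Melbourne'],
    'Start Date': ['5 Dec 2013', '5 Dec 2013', '13 Dec 2013', '13 Dec 2013', '26 Dec 2013'],
})


def peak_batting_average(innings, n_matches=33):
    """Hypothetical helper: batting average over the n matches with the most runs."""
    # Assumption: an innings counts as a dismissal unless marked 'not out'
    innings = innings.assign(out=innings['Dismissal'] != 'not out')
    # A match is identified here by start date, ground and opposition
    per_match = (innings
                 .groupby(['Start Date', 'Ground', 'Opposition'], sort=False)
                 .agg(runs=('Runs', 'sum'), dismissals=('out', 'sum')))
    peak = per_match.nlargest(n_matches, 'runs')
    return peak['runs'].sum() / peak['dismissals'].sum()


# Top two matches: Perth (138 runs) and Adelaide (29 runs), four dismissals in total
print(peak_batting_average(sample, n_matches=2))  # 41.75
```

The sketch does not handle the case where the selected matches contain no dismissals.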


In [21]:
bowling.head()

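|   | Overs | Mdns | Runs | Wkts | Econ | Pos | Inns | Opposition | Ground | Start Date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 2 | 70 | 2 | 3.88 | 5 | 1 | v Australia | Adelaide | 5 Dec 2013 |
| 1 | 7.0 | 3 | 20 | 0 | 2.85 | 4 | 3 | v Australia | Adelaide | 5 Dec 2013 |
| 2 | 17.0 | 3 | 63 | 1 | 3.7 | 4 | 1 | v Australia | Perth | 13 Dec 2013 |
| 3 | 18.0 | 1 | 82 | 2 | 4.55 | 3 | 3 | v Australia | Perth | 13 Dec 2013 |
| 4 | 15.0 | 4 | 46 | 1 | 3.06 | 3 | 2 | v Australia | Melbourne | 26 Dec 2013 |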
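The bowling side of Peak-N can be sketched the same way, this time ranking matches by wickets taken and dividing runs conceded by wickets. `peak_bowling_average` is a hypothetical helper, and it assumes `Runs` and `Wkts` are numeric in every row; innings in which the player did not bowl would need filtering out first.

```python
import pandas as pd

# The five bowling innings shown above, reduced to the columns the calculation needs
sample_bowling = pd.DataFrame({
    'Runs': [70, 20, 63, 82, 46],
    'Wkts': [2, 0, 1, 2, 1],
    'Opposition': ['v Australia'] * 5,
    'Ground': ['Adelaide', 'Adelaide', 'Perth', 'Perth', 'Melbourne'],
    'Start Date': ['5 Dec 2013', '5 Dec 2013', '13 Dec 2013', '13 Dec 2013', '26 Dec 2013'],
})


def peak_bowling_average(innings, n_matches=33):
    """Hypothetical helper: bowling average over the n matches with the most wickets."""
    per_match = (innings
                 .groupby(['Start Date', 'Ground', 'Opposition'], sort=False)[['Runs', 'Wkts']]
                 .sum())
    # Ties on wickets are broken by order of appearance (nlargest keeps the first)
    peak = per_match.nlargest(n_matches, 'Wkts')
    return peak['Runs'].sum() / peak['Wkts'].sum()


# Top two matches: Perth (3 for 145) and Adelaide (2 for 90)
print(peak_bowling_average(sample_bowling, n_matches=2))  # 47.0
```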
