# ESPN Cricinfo 
## The 'Top 4' 
There is a debate in cricket about who is the best batsman in the world. I think it was fairly universally agreed that four batsmen were part of the debate:
* Virat Kohli (India)
* Steve Smith (Australia)
* Kane Williamson (New Zealand)
* Joe Root (England)

However, it seems that Joe Root has fallen away slightly in this debate. I thought it would be good to look at their Test match scores across their entire careers and see whether there is a clear drop-off for England's nominee.

In [1]:
# Imports
import requests
import pandas as pd
from bs4 import BeautifulSoup

We can grab the innings-by-innings list of Test matches for each of these players.

In [4]:
virat_kohli = 'http://stats.espncricinfo.com/ci/engine/player/253802.html?class=1;template=results;type=allround;view=innings'
steve_smith = 'http://stats.espncricinfo.com/ci/engine/player/267192.html?class=1;template=results;type=allround;view=innings'
kane_williamson = 'http://stats.espncricinfo.com/ci/engine/player/277906.html?class=1;template=results;type=allround;view=innings'
joe_root = 'http://stats.espncricinfo.com/ci/engine/player/303669.html?class=1;template=results;type=allround;view=innings'
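The four URLs above differ only in the numeric player ID, so they could also be generated from a small mapping. The following is a minimal sketch, assuming the URL pattern stays exactly as shown above; `PLAYER_IDS`, `URL_TEMPLATE`, `innings_url` and `player_urls` are hypothetical names introduced here for illustration.

```python
# Player IDs copied from the URLs above; the dictionary name is illustrative.
PLAYER_IDS = {
    'virat_kohli': 253802,
    'steve_smith': 267192,
    'kane_williamson': 277906,
    'joe_root': 303669,
}

# URL pattern copied from the URLs above, with the ID left as a placeholder.
URL_TEMPLATE = ('http://stats.espncricinfo.com/ci/engine/player/{player_id}.html'
                '?class=1;template=results;type=allround;view=innings')


def innings_url(player_id):
    """Build the innings-by-innings URL for one player ID."""
    return URL_TEMPLATE.format(player_id=player_id)


# One URL per player, keyed by the same names used for the variables above.
player_urls = {name: innings_url(pid) for name, pid in PLAYER_IDS.items()}
```

Keeping the IDs in one dictionary should make it easier to loop over all four players later instead of repeating the scraping code per variable.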

In [12]:
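# Download Joe Root's page and parse the HTML
soup = BeautifulSoup(requests.get(joe_root).text, features="html.parser")

# Locate the stats table by its caption; stays None if the caption is missing
main_table = None
for caption in soup.find_all('caption'):
    if caption.get_text() == 'Innings by innings list':
        main_table = caption.find_parent('table', {'class': 'engineTable'})
        break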
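Since the same lookup will be needed for each of the four players, it may help to wrap it in a function. This is a sketch rather than the notebook's own code: `find_table_by_caption` is a hypothetical helper, the `.strip()` call is an added precaution, and `sample_html` is a made-up stand-in for the real page (the 'Career averages' caption is invented for the example) so the lookup can be tried without a network request.

```python
from bs4 import BeautifulSoup


def find_table_by_caption(soup, caption_text):
    """Return the engineTable whose caption matches, or None if absent."""
    for caption in soup.find_all('caption'):
        if caption.get_text().strip() == caption_text:
            return caption.find_parent('table', {'class': 'engineTable'})
    return None


# Made-up HTML that mimics the structure assumed above: two tables,
# each identified by its <caption>.
sample_html = """
<table class="engineTable"><caption>Career averages</caption>
<tbody><tr><td>x</td></tr></tbody></table>
<table class="engineTable"><caption>Innings by innings list</caption>
<thead><tr><th>Inns</th><th>Score</th></tr></thead>
<tbody><tr><td>1</td><td>73</td></tr></tbody></table>
"""

sample_soup = BeautifulSoup(sample_html, features='html.parser')
sample_table = find_table_by_caption(sample_soup, 'Innings by innings list')
sample_cells = [td.get_text() for td in sample_table.find_all('td')]

# A caption that does not exist gives None rather than an undefined name.
missing_table = find_table_by_caption(sample_soup, 'No such caption')
```

Returning `None` when nothing matches makes a missing table easy to detect before calling `.find('thead')` on it.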

We should isolate the headers of the table so that we know the structure that we are aiming for. 

In [28]:
[header.get_text() for header in main_table.find('thead').find_all('tr')[0].find_all('th')]

['Inns',
 'Score',
 'Overs',
 'Conc',
 'Wkts',
 'Ct',
 'St',
 '',
 'Opposition',
 'Ground',
 'Start Date',
 '']

We can then take the body of the table and start working on that. We are aiming for something that we can turn into a `pd.DataFrame`, as that will make it much easier to manipulate and plot.

In [35]:
# Print the cell texts of every row in the table body
for innings in main_table.find('tbody').find_all('tr'):
    stats = [stat.get_text() for stat in innings.find_all('td')]
    print(stats)

['1', '73', '-', '-', '-', '-', '-', '', 'v India', 'Nagpur', '13 Dec 2012', 'Test # 2066']
['2', '-', '1.0', '5', '0', '0', '0', '', 'v India', 'Nagpur', '13 Dec 2012', 'Test # 2066']
['3', '20*', '-', '-', '-', '-', '-', '', 'v India', 'Nagpur', '13 Dec 2012', 'Test # 2066']
['1', '4', '-', '-', '-', '-', '-', '', 'v New Zealand', 'Dunedin', '6 Mar 2013', 'Test # 2077']
['2', '-', '5.0', '8', '0', '0', '0', '', 'v New Zealand', 'Dunedin', '6 Mar 2013', 'Test # 2077']
['3', '0', '-', '-', '-', '-', '-', '', 'v New Zealand', 'Dunedin', '6 Mar 2013', 'Test # 2077']
['1', '10', '-', '-', '-', '-', '-', '', 'v New Zealand', 'Wellington', '14 Mar 2013', 'Test # 2080']
['2', '-', '1.0', '6', '0', '0', '0', '', 'v New Zealand', 'Wellington', '14 Mar 2013', 'Test # 2080']
['3', '-', '2.0', '12', '0', '0', '0', '', 'v New Zealand', 'Wellington', '14 Mar 2013', 'Test # 2080']
['1', '-', '2.0', '5', '0', '0', '0', '', 'v New Zealand', 'Auckland', '22 Mar 2013', 'Test # 2084']
['2', '45', '-', '-

That looks ideal for now. 

In [36]:
# Column names come from the first header row
headers = [header.get_text() for header in main_table.find('thead').find_all('tr')[0].find_all('th')]
rows = []

# One list of cell texts per row of the table body
for innings in main_table.find('tbody').find_all('tr'):
    rows.append([stat.get_text() for stat in innings.find_all('td')])

In [37]:
headers

['Inns',
 'Score',
 'Overs',
 'Conc',
 'Wkts',
 'Ct',
 'St',
 '',
 'Opposition',
 'Ground',
 'Start Date',
 '']

In [40]:
rows[:2]

[['1',
  '73',
  '-',
  '-',
  '-',
  '-',
  '-',
  '',
  'v India',
  'Nagpur',
  '13 Dec 2012',
  'Test # 2066'],
 ['2',
  '-',
  '1.0',
  '5',
  '0',
  '0',
  '0',
  '',
  'v India',
  'Nagpur',
  '13 Dec 2012',
  'Test # 2066']]

In [46]:
len(headers)

12

In [47]:
len(rows)

327

In [48]:
len(rows[0])

12

This is encouraging: the header list and the first row both have 12 entries, so the data looks like it has been pulled in the right dimensions and should hopefully align.
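The check above only measures the first row. A small sketch of how every row could be compared against the header count before building the DataFrame; `rows_match_headers` is a hypothetical helper, and the sample data is copied from the output shown earlier.

```python
def rows_match_headers(headers, rows):
    """Return True when every row has exactly one cell per header."""
    return all(len(row) == len(headers) for row in rows)


# Header list and two rows copied from the scraped output above.
sample_headers = ['Inns', 'Score', 'Overs', 'Conc', 'Wkts', 'Ct', 'St', '',
                  'Opposition', 'Ground', 'Start Date', '']
sample_rows = [
    ['1', '73', '-', '-', '-', '-', '-', '',
     'v India', 'Nagpur', '13 Dec 2012', 'Test # 2066'],
    ['2', '-', '1.0', '5', '0', '0', '0', '',
     'v India', 'Nagpur', '13 Dec 2012', 'Test # 2066'],
]

widths_ok = rows_match_headers(sample_headers, sample_rows)
```

On the real data this would be called as `rows_match_headers(headers, rows)`.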

In [50]:
pd.DataFrame(rows, columns=headers)

| | Inns | Score | Overs | Conc | Wkts | Ct | St | | Opposition | Ground | Start Date | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 73 | - | - | - | - | - | | v India | Nagpur | 13 Dec 2012 | Test # 2066 |
| 1 | 2 | - | 1.0 | 5 | 0 | 0 | 0 | | v India | Nagpur | 13 Dec 2012 | Test # 2066 |
| 2 | 3 | 20* | - | - | - | - | - | | v India | Nagpur | 13 Dec 2012 | Test # 2066 |
| 3 | 1 | 4 | - | - | - | - | - | | v New Zealand | Dunedin | 6 Mar 2013 | Test # 2077 |
| 4 | 2 | - | 5.0 | 8 | 0 | 0 | 0 | | v New Zealand | Dunedin | 6 Mar 2013 | Test # 2077 |
| 5 | 3 | 0 | - | - | - | - | - | | v New Zealand | Dunedin | 6 Mar 2013 | Test # 2077 |
| 6 | 1 | 10 | - | - | - | - | - | | v New Zealand | Wellington | 14 Mar 2013 | Test # 2080 |
| 7 | 2 | - | 1.0 | 6 | 0 | 0 | 0 | | v New Zealand | Wellington | 14 Mar 2013 | Test # 2080 |
| 8 | 3 | - | 2.0 | 12 | 0 | 0 | 0 | | v New Zealand | Wellington | 14 Mar 2013 | Test # 2080 |
| 9 | 1 | - | 2.0 | 5 | 0 | 0 | 0 | | v New Zealand | Auckland | 22 Mar 2013 | Test # 2084 |
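To compare batting scores, the `Score` strings would need to become numbers. In the rows above, `Score` is either a number, a number followed by `*` (the usual cricket notation for a not-out innings), or `-`. The sketch below assumes those are the only shapes that matter and treats anything else as "no batting score"; `SCORE_PATTERN` and `parse_score` are hypothetical names, not part of the notebook.

```python
import re

# A batting score: digits with an optional trailing '*' for not out.
SCORE_PATTERN = re.compile(r'^(\d+)(\*?)$')


def parse_score(text):
    """Return (runs, not_out) for a batting score, or None otherwise."""
    match = SCORE_PATTERN.match(text.strip())
    if match is None:
        return None  # e.g. '-' on rows with no batting score
    return int(match.group(1)), match.group(2) == '*'


# Score values copied from the first six rows of the table above.
sample_scores = ['73', '-', '20*', '4', '-', '0']
parsed = [parse_score(score) for score in sample_scores]

# Keep only the rows that actually hold a batting score.
batting_only = [p for p in parsed if p is not None]
```

Keeping the not-out flag separate from the run count preserves the information needed if batting averages are calculated later.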
