<center>
<img src="https://habrastorage.org/files/fd4/502/43d/fd450243dd604b81b9713213a247aa20.jpg" />
    
## [mlcourse.ai](https://mlcourse.ai) – Open Machine Learning Course 
    
### <center> Author: Rajasekhar Battula, @Rajasekhar Battula
    
## <center> Tutorial
### <center> Cricket Scorecard Data Scraping.

Each cricket match has a unique MatchID, which appears at the end of the match page URL on the website ('http://www.espncricinfo.com/matches/engine/match/1119496.html'). The MatchID is highlighted with a red box in the URL shown in the image below. We will use this ID to fetch the scorecard details. Let's use a couple of matches to illustrate this tutorial:
1. Match 1: India vs Australia, Sep 17 2017  --> MatchID = '**1119496**'
2. Match 2: England vs New Zealand, Jun 6 2017   --> MatchID = '**1022357**'

![Cricket1](http://imgur.com/wueAhlY)
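
Since the MatchID is simply the number at the end of the URL, it can also be pulled out programmatically. The following is a minimal sketch using only the standard library; the helper name `match_id_from_url` is hypothetical and is not part of the espncricinfo library.

```python
import re

def match_id_from_url(url):
    """Return the numeric match ID at the end of a match page URL, or None."""
    found = re.search(r"/(\d+)\.html$", url)
    return found.group(1) if found else None

url = "http://www.espncricinfo.com/matches/engine/match/1119496.html"
print(match_id_from_url(url))  # 1119496
```

The ID is returned as a string, which is the form used for the match IDs in the rest of this tutorial.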

In [1]:
#Install the required libraries as below
!pip install python-espncricinfo
!pip install grequests

Collecting python-espncricinfo
  Downloading https://files.pythonhosted.org/packages/5f/36/ab9c7a617f7420235f1fa12aad9878cdbb0a3d53e6ca0e5fd8c99ac9bc7a/python_espncricinfo-0.4.1-py2.py3-none-any.whl
Collecting bs4 (from python-espncricinfo)
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting dateparser (from python-espncricinfo)
  Downloading https://files.pythonhosted.org/packages/a0/30/5cb8bb214c0b111fb59137c2e19c636a136209dbe45e1c3e9d63f7a76c1a/dateparser-0.7.1-py2.py3-none-any.whl (351kB)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Stored in directory: /tmp/.cache/pip/wheels/a0/b0/b2/4f80b9456b87abedbc0bf2d52235414c3467d8889be38dd472
Successfully built bs4
Installing collected packages: bs4, dateparser, python-espncricinfo
Successfully installed 

Do you like Cricket? I think you do.

In this tutorial we will convert the publicly available scorecard of any cricket match on the website (www.espncricinfo.com) into a structured, tabular format as a pandas DataFrame. For this purpose I will use the Python library espncricinfo.

In [2]:
from espncricinfo.summary import Summary
from espncricinfo.match import Match 
from espncricinfo.series import Series

import json
import requests
from bs4 import BeautifulSoup
from espncricinfo.exceptions import MatchNotFoundError, NoScorecardError

import pandas as pd



For the sake of illustration, I have used the two match IDs below. You can use any number of them; the running time grows with the number of matches, so how many you process depends on your system's computational power.

In [3]:
testlist = ['1119496', '1022357']

In [4]:
# The match_url attribute of a Match object in the espncricinfo library gives the URL of the match page
Match('1119496').match_url

'http://www.espncricinfo.com/matches/engine/match/1119496.html'

In [5]:
# This function expands a list of dictionaries into columns of a dataframe.
def flatten(js):
    return pd.DataFrame(js).set_index(['text','name']).squeeze()
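
To see what `flatten` does, here is a minimal sketch on made-up data. The keys `text` and `name` come from the function itself; the `value` key and the sample numbers are assumptions for illustration, not data taken from the website.

```python
import pandas as pd

def flatten(js):
    # Same helper as above: index by (text, name) and squeeze to a Series
    return pd.DataFrame(js).set_index(['text', 'name']).squeeze()

# Hypothetical stats for two players (the 'value' key is an assumption)
stats_a = [
    {'text': 'R', 'name': 'runs', 'value': 5},
    {'text': 'B', 'name': 'ballsFaced', 'value': 15},
    {'text': '4s', 'name': 'fours', 'value': 0},
]
stats_b = [
    {'text': 'R', 'name': 'runs', 'value': 28},
    {'text': 'B', 'name': 'ballsFaced', 'value': 44},
    {'text': '4s', 'name': 'fours', 'value': 3},
]

# One list of dictionaries becomes one Series indexed by (text, name)
flat = flatten(stats_a)
print(flat)

# Applied to a column of such lists, it yields one row per player
# and one column per statistic
players = pd.DataFrame({'name': ['Player A', 'Player B'],
                        'stats': [stats_a, stats_b]})
wide = players.stats.apply(flatten)
print(wide)
```

Note that `squeeze` collapses the frame to a Series only because a single column remains after `set_index`; a list with extra keys would stay a DataFrame.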

Steps to create the structured data from the ESPNcricinfo website:
-  To hold the scorecard details we create two dataframes, one for batsmen and one for bowlers.
-  First, extract the HTML content of the match using Match(<matchID>).html.
-  Locate the scorecard script in the HTML content using the "find_all" method of BeautifulSoup.
-  Parse its content as JSON.
-  Extract the scorecard of each batsman or bowler from the key-value pairs of the JSON dictionaries.
-  Flatten the scorecard of each batsman or bowler using the "flatten" function written above.
-  Use the attributes of the Match object in the espncricinfo library to extract the city and date of the match.
-  Finally, return a dataframe with all the above details.
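
The middle steps (locating the script and parsing its JSON) can be sketched offline on a made-up page. This sketch is an alternative illustration, not the method used in the functions of this tutorial: it uses the standard library's `HTMLParser` instead of BeautifulSoup, finds the script by the `window.__INITIAL_STATE__ =` marker instead of a fixed index, and lets `json.JSONDecoder.raw_decode` stop at the end of the first JSON object. The sample HTML and the helper names are hypothetical, and a real page may still need the entity replacements applied in the tutorial code.

```python
import json
from html.parser import HTMLParser

MARKER = "window.__INITIAL_STATE__ ="

class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data

def extract_initial_state(html):
    """Return the JSON object assigned to window.__INITIAL_STATE__."""
    collector = ScriptCollector()
    collector.feed(html)
    for text in collector.scripts:
        start = text.find(MARKER)
        if start != -1:
            payload = text[start + len(MARKER):].lstrip()
            # raw_decode parses one JSON value and ignores what follows it
            state, _ = json.JSONDecoder().raw_decode(payload)
            return state
    raise ValueError("no script containing the marker was found")

# Made-up page that mimics the nesting used in this tutorial
sample_html = """
<html><head>
<script>var unrelated = 1;</script>
<script>window.__INITIAL_STATE__ = {"gamePackage": {"scorecard": {"innings":
{"1": {"title": "India Innings", "batsmen": [{"name": "AM Rahane"}]}}}}};
window.somethingElse = {};</script>
</head><body></body></html>
"""

state = extract_initial_state(sample_html)
first_innings = state["gamePackage"]["scorecard"]["innings"]["1"]
print(first_innings["title"])
print(first_innings["batsmen"][0]["name"])
```

Searching for the marker avoids depending on the position of the script tag, which may change if the page layout changes.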

In [6]:
def getbattingdatafame(list1):
    # Raw batsman fields that are not needed in the final table
    dropcols = ['captain','commentary','runningScore','runningOver','stats','hasVideoId',
                'href','isNotOut','roles','shortText','trackingName']
    df = pd.DataFrame()
    for x in list1:
        match = Match(x)  # build the Match object once and reuse it below
        # The scorecard JSON is embedded in the script tag at index 13 of the page
        x2 = json.loads(match.html.find_all('script')[13].get_text().replace("\n", " ").replace('window.__INITIAL_STATE__ =','').replace('&dagger;','wk').replace('&amp;','').replace('wkts;','wkts,').replace('wkt;','wkt,').strip().replace('};', "}};").split('};')[0])
        innings = x2['gamePackage']['scorecard']['innings']
        df1bat = pd.DataFrame(innings['1']['batsmen'])
        d1title = innings['1']['title']
        # The first word of the innings title is taken as the team name
        df1bat['Team'] = d1title.split(' ')[0]
        df2bat = pd.DataFrame(innings['2']['batsmen'])
        d2title = innings['2']['title']
        df2bat['Team'] = d2title.split(' ')[0]
        df1bat['Oppositionteam'] = d2title.split(' ')[0]
        df2bat['Oppositionteam'] = d1title.split(' ')[0]

        # Replace the nested 'stats' column with one column per statistic, then stack both innings
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        Finaldf_bat = pd.concat([
            pd.concat([df1bat.drop(dropcols, axis=1), df1bat.stats.apply(flatten)], axis=1),
            pd.concat([df2bat.drop(dropcols, axis=1), df2bat.stats.apply(flatten)], axis=1),
        ])
        Finaldf_bat['city'] = match.town_name
        Finaldf_bat['date'] = match.date
        df = pd.concat([df, Finaldf_bat])
    return df

In [7]:
getbattingdatafame(testlist).head()

| | name | Team | Oppositionteam | (R, runs) | (B, ballsFaced) | (M, minutes) | (4s, fours) | (6s, sixes) | (SR, strikeRate) | city | date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AM Rahane | India | Australia | 5 | 15 | 18 | 0 | 0 | 33.33 | Chennai | 2017-09-17 |
| 1 | RG Sharma | India | Australia | 28 | 44 | 81 | 3 | 0 | 63.63 | Chennai | 2017-09-17 |
| 2 | V Kohli | India | Australia | 0 | 4 | 7 | 0 | 0 | 0.0 | Chennai | 2017-09-17 |
| 3 | MK Pandey | India | Australia | 0 | 2 | 4 | 0 | 0 | 0.0 | Chennai | 2017-09-17 |
| 4 | KM Jadhav | India | Australia | 40 | 54 | 72 | 5 | 0 | 74.07 | Chennai | 2017-09-17 |
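
The batting function takes the first word of the innings title as the team name. That works for single-word names such as India, but a title that starts with a multi-word name such as New Zealand (Match 2) would be shortened to "New". The sketch below shows one possible way around this, assuming the title consists of the full team name followed by the word "Innings"; that title format and the helper name `team_from_title` are assumptions, not something confirmed by the library.

```python
import re

def team_from_title(title):
    """Return the part of an innings title before the word 'innings'."""
    # Split once on whitespace followed by 'innings' (case-insensitive)
    return re.split(r"\s+innings\b", title, maxsplit=1, flags=re.IGNORECASE)[0].strip()

print(team_from_title("India Innings"))        # India
print(team_from_title("New Zealand Innings"))  # New Zealand
print("New Zealand Innings".split(' ')[0])     # New
```

If the title does not contain the word "innings" at all, the whole title is returned unchanged.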


In [8]:
def getbowlingdatafame(list1):
    # Raw bowler fields that are not needed in the final table
    dropcols = ['captain','stats','hasVideoId','href','roles','trackingName']
    df = pd.DataFrame()
    for x in list1:
        match = Match(x)  # build the Match object once and reuse it below
        # The scorecard JSON is embedded in the script tag at index 13 of the page
        x2 = json.loads(match.html.find_all('script')[13].get_text().replace("\n", " ").replace('window.__INITIAL_STATE__ =','').replace('&dagger;','wk').replace('&amp;','').replace('wkts;','wkts,').replace('wkt;','wkt,').strip().replace('};', "}};").split('};')[0])
        innings = x2['gamePackage']['scorecard']['innings']
        df1bowl = pd.DataFrame(innings['1']['bowlers'])
        d1title = innings['1']['title']
        df2bowl = pd.DataFrame(innings['2']['bowlers'])
        d2title = innings['2']['title']
        # Bowlers in an innings belong to the team that is not batting in it
        df1bowl['Team'] = d2title.split(' ')[0]
        df2bowl['Team'] = d1title.split(' ')[0]
        df1bowl['Oppositionteam'] = d1title.split(' ')[0]
        df2bowl['Oppositionteam'] = d2title.split(' ')[0]

        # Replace the nested 'stats' column with one column per statistic, then stack both innings
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        Finaldf_bowl = pd.concat([
            pd.concat([df1bowl.drop(dropcols, axis=1), df1bowl.stats.apply(flatten)], axis=1),
            pd.concat([df2bowl.drop(dropcols, axis=1), df2bowl.stats.apply(flatten)], axis=1),
        ])
        Finaldf_bowl['city'] = match.town_name
        Finaldf_bowl['date'] = match.date
        df = pd.concat([df, Finaldf_bowl])
    return df

In [9]:
getbowlingdatafame(testlist).head()

| | name | Team | Oppositionteam | (O, overs) | (M, maidens) | (R, conceded) | (W, wickets) | (Econ, economyRate) | (0s, dots) | (4s, foursConceded) | (6s, sixesConceded) | (WD, wides) | (NB, noballs) | city | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PJ Cummins | Australia | India | 10 | 1 | 44 | 0 | 4.4 | 33 | 3 | 0 | 2 | 0 | Chennai | 2017-09-17 |
| 1 | NM Coulter-Nile | Australia | India | 10 | 0 | 44 | 3 | 4.4 | 38 | 5 | 0 | 1 | 1 | Chennai | 2017-09-17 |
| 2 | JP Faulkner | Australia | India | 10 | 1 | 67 | 1 | 6.7 | 31 | 8 | 2 | 0 | 2 | Chennai | 2017-09-17 |
| 3 | MP Stoinis | Australia | India | 10 | 0 | 54 | 2 | 5.4 | 27 | 3 | 1 | 2 | 0 | Chennai | 2017-09-17 |
| 4 | A Zampa | Australia | India | 10 | 0 | 66 | 1 | 6.6 | 26 | 3 | 4 | 0 | 0 | Chennai | 2017-09-17 |


The library also provides helpers, such as the Series class, to look up the details of a series, as shown below.

In [10]:
# Create the Series object once and reuse it for each attribute
series = Series('18808')
print(series.years)
print(series.url)
print(series.name)

['2019']
http://www.espn.in/cricket/series/_/id/18808/india-in-new-zealand-2018-19
India tour of New Zealand 2018/19


## References:
1. [espncricinfo reference](https://github.com/dwillis/python-espncricinfo)<br>
2. [Scorecards from YAML files](https://github.com/tvganesh/yorkpy/blob/master/yorkpy/analytics.py)<br>