<a href="https://colab.research.google.com/github/yashguptaab99/Cricket-Prediction/blob/master/Cricket_Predictions_Batting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# INCREASED PREDICTION ACCURACY IN THE GAME OF CRICKET USING MACHINE LEARNING

Player selection is one the most important tasks for any sport and cricket is no exception. The performance of the players depends on various factors such as the opposition team, the venue, his current form etc. The team management, the coach and the captain select 11 players for each match from a squad of 15 to 20 players. They analyze different characteristics and the statistics of the players to select the best playing 11 for each match. Each batsman contributes by scoring maximum runs possible and each bowler contributes by taking maximum wickets and conceding minimum runs. This paper attempts to predict the performance of players as how many runs will each batsman score and how many wickets will each bowler take for both the teams. Both the problems are targeted as classification problems where number of runs and number of wickets are classified in different ranges. We used naïve bayes, random forest, multiclass SVM and decision tree classifiers to generate the prediction models for both the problems. Random Forest classifier was found to be the most accurate for both the problems. 

# Importing Libraray

In [71]:
import pandas as pd
import re

# Importing Data

In [72]:
# All Innings list after 14 Jan 2005
innings = pd.read_csv("/content/drive/My Drive/Projects/Cricket Prediction/Batting.csv")

# All Ininngs list from 18 Dec 1989 to 13 Jan 2005
inningsExtra = pd.read_csv("/content/drive/My Drive/Projects/Cricket Prediction/Batting89-05.csv")

# Data Preprocessing

## Batting data

In [73]:
innings = innings.drop(columns=['Mins', '4s', '6s', 'Sr', 'Inns'])
inningsExtra = inningsExtra.drop(columns=['Mins', '4s', '6s', 'Sr', 'Inns'])

In [74]:
# Cleaning data

innings = innings[innings.Runs != 'DNB']
innings = innings[innings.Runs != 'TDNB']
innings = innings[innings.Runs != 'sub']
innings = innings[innings.Runs != 'absent']
innings = innings.rename(columns={"Player 1":"Player", "Start Date":"StartDate"})

inningsExtra = inningsExtra[inningsExtra.Runs != 'DNB']
inningsExtra = inningsExtra[inningsExtra.Runs != 'TDNB']
inningsExtra = inningsExtra[inningsExtra.Runs != 'sub']
inningsExtra = inningsExtra[inningsExtra.Runs != 'absent']
inningsExtra = inningsExtra.rename(columns={"Player 1":"Player", "Start Date":"StartDate"})


In [75]:
#List of all players who played after 14 Jan 2005

listOfBatsman = list(innings['Player'].unique())

In [76]:
#Merging player past performance innings which were present in matches after 2005
# for ex. Sachine was senior most so his mast matches performance shoould be added 

for player in listOfBatsman:
  playerframe = inningsExtra[inningsExtra.Player == player]
  innings = innings.append(playerframe)


In [77]:
innings['StartDate'] = pd.to_datetime(innings['StartDate'])
# Now innings variable contains all players past played innings
innings.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47844 entries, 2 to 35439
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Player      47844 non-null  object        
 1   Team        47844 non-null  object        
 2   Runs        47844 non-null  object        
 3   Bf          47844 non-null  object        
 4   Opposition  47844 non-null  object        
 5   Ground      47844 non-null  object        
 6   StartDate   47844 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(6)
memory usage: 2.9+ MB


In [78]:
#Converting bf to integer and cleaning it
bf = []
for st in innings['Bf'].values:
  st = re.findall(r'[0-9]+', st)
  if not st:
    st.append('0')
  bf.append(float(st[0]))

In [79]:
innings['Bf'] = bf

In [80]:
innings.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47844 entries, 2 to 35439
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Player      47844 non-null  object        
 1   Team        47844 non-null  object        
 2   Runs        47844 non-null  object        
 3   Bf          47844 non-null  float64       
 4   Opposition  47844 non-null  object        
 5   Ground      47844 non-null  object        
 6   StartDate   47844 non-null  datetime64[ns]
dtypes: datetime64[ns](1), float64(1), object(5)
memory usage: 2.9+ MB


### Calculating The Derived Attributes

#### Consistency

This attribute describes how experienced the player is and how consistent he has been throughout his career. All the traditional attributes used in this formula are calculated over the entire career of the player. 

**Consistency = (0.4262 X average) + (0.2566 X no. of innings) + (0.1510 X SR) + (0.0787 X Centuries) + (0.0556 X Fifties) – (0.0328 X Zeros)**


In [81]:
#Please Rate then Calculate

## Consistency 
Consistency = []

for player in listOfBatsman:
  not_outs = 0
  runs_score = 0
  balls_faced = 0
  playerframe = innings[innings.Player == player]

  ######### Number of innings #########
  numInnings = playerframe.shape[0]

  ######### Amount of not out #########
  for st in playerframe['Runs'].values:
    if st.endswith("*"):
      not_outs+=1

   ######### Number of Dismisal #########
  num_of_dismisal = numInnings - not_outs

  ######### Total Runs #########
  #converting to int
  playruns = []
  for st in playerframe['Runs'].values:
    st = re.findall(r'[0-9]+', st)
    if not st:
      st.append('0')
    playruns.append(float(st[0]))
  playerframe['Runs'] = playruns
  runs_score = playerframe['Runs'].sum()

  ######### Total Ball Faced #########
  balls_faced = playerframe['Bf'].sum()

  ######### Batting Average #########
  if (num_of_dismisal==0):
    average = 0
  else:
    average = runs_score/num_of_dismisal

  ######### Strike Rate #########
  if (balls_faced==0):
    sr = 0
  else:
    sr = (runs_score/balls_faced) * 100

  ######### Number of Centuries #########
  cen = playerframe[playerframe.Runs >= 100].shape[0]

  ######### Number of Fifties #########
  fif = playerframe[playerframe.Runs >= 50].shape[0]
  fif = fif - cen

  ######### Highest Score #########
  h = playerframe['Runs'].max()

  ######### Number of Zeros #########
  zero = playerframe[playerframe.Runs == 0].shape[0]


####################  Rate the Elements Before Calculation  ####################

  #### numInnings ####
  if (numInnings>=1 and numInnings<=49):
    numInnings = 1
  elif (numInnings>=50 and numInnings<=99):
    numInnings = 2
  elif (numInnings>=100 and numInnings<=124):
    numInnings = 3
  elif (numInnings>=125 and numInnings<=149):
    numInnings = 4
  elif (numInnings>=150):
    numInnings = 5 

  #### average ####
  if (average>=0.0 and average<=9.9):
    average = 1
  elif (average>=10.0 and average<=19.9):
    average = 2
  elif (average>=20.0 and average<=29.9):
    average = 3
  elif (average>=30.0 and average<=39.9):
    average = 4
  elif (average>=40.0):
    average = 5   

  #### sr ####
  if (sr>=0.0 and sr<=49.9):
    sr = 1
  elif (sr>=50.0 and sr<=59.9):
    sr = 2
  elif (sr>=60.0 and sr<=79.0):
    sr = 3
  elif (sr>=80.0 and sr<=99.9):
    sr = 4
  elif (sr>=100.0):
    sr = 5  

  #### cen ####
  if (cen>=1 and cen<=4):
    cen = 1
  elif (cen>=5 and cen<=9):
    cen = 2
  elif (cen>=10 and cen<=14):
    cen = 3
  elif (cen>=15 and cen<=19):
    cen = 4
  elif (cen>=20):
    cen = 5 

  #### fif ####
  if (fif>=1 and fif<=9):
    fif = 1
  elif (fif>=10 and fif<=19):
    fif = 2
  elif (fif>=20 and fif<=29):
    fif = 3
  elif (fif>=30 and fif<=39):
    fif = 4
  elif (fif>=40):
    fif = 5 

  #### zero ####
  if (zero>=1 and zero<=4):
    zero = 1
  elif (zero>=5 and zero<=9):
    zero = 2
  elif (zero>=10 and zero<=14):
    zero = 3
  elif (zero>=15 and zero<=19):
    zero = 4
  elif (zero>=20):
    zero = 5 


  consistency = (0.4262 * average) + (0.2566 * numInnings) + (0.1510 * sr) + (0.0787 * cen) + (0.0556 * fif) - (0.0328 * zero)
  Consistency.append(consistency)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [82]:
ConsistencyFrame = pd.DataFrame(Consistency, columns = ["Consistency"])

In [83]:
ConsistencyFrame

Unnamed: 0,Consistency
0,1.4050
1,2.2902
2,3.0472
3,1.7130
4,1.3782
...,...
1395,2.4700
1396,1.6474
1397,2.0838
1398,1.7130


#### Form

Form of a player describes his performance over last one year. All the traditional attributes used in this formula are calculated over the matches played by the player in last 12 months from the day of the match. 

**Form = 0.4262 X average + 0.2566 X no. of innings + 0.1510 X SR + 0.0787 X Centuries + 0.0556 X Fifties – 0.0328 X Zeros**

In [84]:
#Please Rate then Calculate

## Form
Form = []

for player in listOfBatsman:
  playerframe = innings[innings.Player == player]
  playerframe = playerframe[playerframe.StartDate > "2019-01-01"]

  if not playerframe.empty:
    ######### Number of innings #########
    numInnings = playerframe.shape[0]

    ######### Amount of not out #########
    for st in playerframe['Runs'].values:
      if st.endswith("*"):
        not_outs+=1

    ######### Number of Dismisal #########
    num_of_dismisal = numInnings - not_outs

    ######### Total Runs #########
    #converting to int
    playruns = []
    for st in playerframe['Runs'].values:
      st = re.findall(r'[0-9]+', st)
      if not st:
        st.append('0')
      playruns.append(float(st[0]))
    playerframe['Runs'] = playruns
    runs_score = playerframe['Runs'].sum()

    ######### Total Ball Faced #########
    balls_faced = playerframe['Bf'].sum()

    ######### Batting Average #########
    if (num_of_dismisal==0):
      average = 0
    else:
      average = runs_score/num_of_dismisal

    ######### Strike Rate #########
    if (balls_faced==0):
      sr = 0
    else:
      sr = (runs_score/balls_faced) * 100

    ######### Number of Centuries #########
    cen = playerframe[playerframe.Runs >= 100].shape[0]

    ######### Number of Fifties #########
    fif = playerframe[playerframe.Runs >= 50].shape[0]
    fif = fif - cen

    ######### Highest Score #########
    h = playerframe['Runs'].max()

    ######### Number of Zeros #########
    zero = playerframe[playerframe.Runs == 0].shape[0]


  ####################  Rate the Elements Before Calculation  ####################

    #### numInnings ####
    if (numInnings>=1 and numInnings<=4):
      numInnings = 1
    elif (numInnings>=5 and numInnings<=9):
      numInnings = 2
    elif (numInnings>=10 and numInnings<=11):
      numInnings = 3
    elif (numInnings>=12 and numInnings<=14):
      numInnings = 4
    elif (numInnings>=15):
      numInnings = 5 

    #### average ####
    if (average>=0.0 and average<=9.9):
      average = 1
    elif (average>=10.0 and average<=19.9):
      average = 2
    elif (average>=20.0 and average<=29.9):
      average = 3
    elif (average>=30.0 and average<=39.9):
      average = 4
    elif (average>=40.0):
      average = 5   

    #### sr ####
    if (sr>=0.0 and sr<=49.9):
      sr = 1
    elif (sr>=50.0 and sr<=59.9):
      sr = 2
    elif (sr>=60.0 and sr<=79.0):
      sr = 3
    elif (sr>=80.0 and sr<=99.9):
      sr = 4
    elif (sr>=100.0):
      sr = 5  

    #### cen ####
    if (cen==1):
      cen = 1
    elif (cen==2):
      cen = 2
    elif (cen==3):
      cen = 3
    elif (cen==4):
      cen = 4
    elif (cen==5):
      cen = 5 

    #### fif ####
    if (fif>=1 and fif<=2):
      fif = 1
    elif (fif>=3 and fif<=4):
      fif = 2
    elif (fif>=5 and fif<=6):
      fif = 3
    elif (fif>=7 and fif<=9):
      fif = 4
    elif (fif>=10):
      fif = 5 

    #### zero ####
    if (zero==1):
      zero = 1
    elif (zero==2):
      zero = 2
    elif (zero==3):
      zero = 3
    elif (zero==4):
      zero = 4
    elif (zero==5):
      zero = 5 

    form = (0.4262 * average) + (0.2566 * numInnings) + (0.1510 * sr) + (0.0787 * cen) + (0.0556 * fif) - (0.0328 * zero)
  else:
    form = 0

  Form.append(form)

In [85]:
FormFrame = pd.DataFrame(Form, columns=['Form'])

In [86]:
FormFrame

Unnamed: 0,Form
0,-2.004600
1,-8.791000
2,0.000000
3,-0.986267
4,2.946200
...,...
1395,0.000000
1396,0.000000
1397,0.000000
1398,0.000000


#### Opposition

Opposition describes a player’s performance against a particular team. All the traditional attributes used in this formula are calculated over all the matches played by the player against the opposition team in his entire career till the day of the match. 

**Opposition = 0.4262 X average + 0.2566 X no. of innings + 0.1510 X SR + 0.0787 X Centuries + 0.0556 X Fifties – 0.0328 X Zeros** 

In [87]:
listOfOpposition = list(innings['Opposition'].unique())
listOfOpposition.sort()

In [88]:
#Please Rate then Calculate

## Opposition 
Oppositions = []

for player in listOfBatsman:
  playerframe = innings[innings.Player == player]
  perPlayerOpposition = []
  for opposition in listOfOpposition:
    oppositionframe = playerframe[playerframe.Opposition == opposition]
    if not oppositionframe.empty:
      ######### Number of innings #########
      numInnings = oppositionframe.shape[0]

      ######### Amount of not out #########
      for st in oppositionframe['Runs'].values:
        if st.endswith("*"):
          not_outs+=1

      ######### Number of Dismisal #########
      num_of_dismisal = numInnings - not_outs

      ######### Total Runs #########
      #converting to int
      playruns = []
      for st in oppositionframe['Runs'].values:
        st = re.findall(r'[0-9]+', st)
        if not st:
          st.append('0')
        playruns.append(float(st[0]))
      oppositionframe['Runs'] = playruns
      runs_score = oppositionframe['Runs'].sum()

      ######### Total Ball Faced #########
      balls_faced = oppositionframe['Bf'].sum()

      ######### Batting Average #########
      if (num_of_dismisal==0):
        average = 0
      else:
        average = runs_score/num_of_dismisal

      ######### Strike Rate #########
      if (balls_faced==0):
        sr = 0
      else:
        sr = (runs_score/balls_faced) * 100

      ######### Number of Centuries #########
      cen = oppositionframe[oppositionframe.Runs >= 100].shape[0]

      ######### Number of Fifties #########
      fif = oppositionframe[oppositionframe.Runs >= 50].shape[0]
      fif = fif - cen

      ######### Highest Score #########
      h = oppositionframe['Runs'].max()

      ######### Number of Zeros #########
      zero = oppositionframe[oppositionframe.Runs == 0].shape[0]


    ####################  Rate the Elements Before Calculation  ####################

      #### numInnings ####
      if (numInnings>=1 and numInnings<=2):
        numInnings = 1
      elif (numInnings>=3 and numInnings<=4):
        numInnings = 2
      elif (numInnings>=5 and numInnings<=6):
        numInnings = 3
      elif (numInnings>=7 and numInnings<=9):
        numInnings = 4
      elif (numInnings>=10):
        numInnings = 5 

      #### average ####
      if (average>=0.0 and average<=9.9):
        average = 1
      elif (average>=10.0 and average<=19.9):
        average = 2
      elif (average>=20.0 and average<=29.9):
        average = 3
      elif (average>=30.0 and average<=39.9):
        average = 4
      elif (average>=40.0):
        average = 5   

      #### sr ####
      if (sr>=0.0 and sr<=49.9):
        sr = 1
      elif (sr>=50.0 and sr<=59.9):
        sr = 2
      elif (sr>=60.0 and sr<=79.0):
        sr = 3
      elif (sr>=80.0 and sr<=99.9):
        sr = 4
      elif (sr>=100.0):
        sr = 5  

      #### cen ####
      if (cen==1):
        cen = 3
      elif (cen==2):
        cen = 4
      elif (cen>=3):
        cen = 5

      #### fif ####
      if (fif>=1 and fif<=2):
        fif = 1
      elif (fif>=3 and fif<=4):
        fif = 2
      elif (fif>=5 and fif<=6):
        fif = 3
      elif (fif>=7 and fif<=9):
        fif = 4
      elif (fif>=10):
        fif = 5 

      #### zero ####
      if (zero==1):
        zero = 1
      elif (zero==2):
        zero = 2
      elif (zero==3):
        zero = 3
      elif (zero==4):
        zero = 4
      elif (zero==5):
        zero = 5 

      oppo = (0.4262 * average) + (0.2566 * numInnings) + (0.1510 * sr) + (0.0787 * cen) + (0.0556 * fif) - (0.0328 * zero)
    else:
      oppo = 0
    perPlayerOpposition.append(oppo)
  Oppositions.append(perPlayerOpposition)   

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [89]:
OppositionsFrame = pd.DataFrame(Oppositions, columns = listOfOpposition) 

In [90]:
OppositionsFrame

Unnamed: 0,Afghanistan,Africa XI,Asia XI,Australia,Bangladesh,Bermuda,Canada,England,Hong Kong,ICC World XI,India,Ireland,Kenya,Namibia,Nepal,Netherlands,New Zealand,Oman,P.N.G.,Pakistan,Scotland,South Africa,Sri Lanka,U.A.E.,U.S.A.,West Indies,Zimbabwe
0,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.801,0.000000,0.000000,0.000000,0.0,0.000000,1.007160,0.000000,0.00000,1.010861,0.00000,0.000000,0.000000
1,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.982054,0.000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00000,0.856168,0.00000,0.000000,0.000000
2,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,0.988004,0.000000,0.0,0.000000,0.000000,0.000000,0.00000,0.000000,0.00000,0.000000,0.869746
3,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,0.000000,0.000000,0.0,0.000000,0.851031,0.000000,0.00000,0.000000,0.00000,0.000000,0.000000
4,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000,0.363002,0.000000,0.684529,0.0,0.000000,0.000000,0.000000,0.00000,0.908585,0.40613,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1395,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,1.017295,0.000000,0.000000,0.000000,0.000,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.00000,0.000000,0.00000,0.000000,0.707442
1396,0.000000,0.0,0.0,1.232960,0.000000,0.000000,0.000000,1.051178,0.0,0.0,1.412721,0.000000,0.000000,0.000000,0.000,0.000000,1.776571,0.000000,0.0,1.785084,0.000000,0.920191,0.00000,0.708617,0.00000,0.965264,1.372536
1397,0.597336,0.0,0.0,0.406945,0.858962,9.866598,1.237503,0.717133,0.0,0.0,0.768200,0.882055,1.578994,0.709366,0.000,0.000000,0.407553,0.000000,0.0,0.965218,1.423507,0.813189,0.78095,0.709460,0.00000,0.407086,0.406899
1398,0.000000,0.0,0.0,1.115704,0.407553,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000,0.000000,0.709507,0.000000,0.0,0.000000,0.000000,0.000000,0.00000,0.000000,0.00000,0.000000,0.000000


In [91]:
OppositionsFrame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1400 entries, 0 to 1399
Data columns (total 27 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Afghanistan   1400 non-null   float64
 1   Africa XI     1400 non-null   float64
 2   Asia XI       1400 non-null   float64
 3   Australia     1400 non-null   float64
 4   Bangladesh    1400 non-null   float64
 5   Bermuda       1400 non-null   float64
 6   Canada        1400 non-null   float64
 7   England       1400 non-null   float64
 8   Hong Kong     1400 non-null   float64
 9   ICC World XI  1400 non-null   float64
 10  India         1400 non-null   float64
 11  Ireland       1400 non-null   float64
 12  Kenya         1400 non-null   float64
 13  Namibia       1400 non-null   float64
 14  Nepal         1400 non-null   float64
 15  Netherlands   1400 non-null   float64
 16  New Zealand   1400 non-null   float64
 17  Oman          1400 non-null   float64
 18  P.N.G.        1400 non-null 

#### Venue

Venue describes a player’s performance at a particular venue. All the traditional attributes used in this formula are calculated over all the matches played by the player at the venue in his entire career till the day of the match. 

**Venue = 0.4262 X average + 0.2566 X no. of innings + 0.1510 X SR + 0.0787X Centuries + 0.0556X Fifties + 0.0328 X HS**

In [92]:
listOfVenue = list(innings['Ground'].unique())
listOfVenue.sort()

In [93]:
#Please Rate then Calculate

## Venue
Venues = []

for player in listOfBatsman:
  playerframe = innings[innings.Player == player]
  perPlayerVenue = []
  for venue in listOfVenue:
    venueframe = playerframe[playerframe.Ground == venue]
    if not venueframe.empty:
      ######### Number of innings #########
      numInnings = venueframe.shape[0]

      ######### Amount of not out #########
      for st in venueframe['Runs'].values:
        if st.endswith("*"):
          not_outs+=1

      ######### Number of Dismisal #########
      num_of_dismisal = numInnings - not_outs

      ######### Total Runs #########
      #converting to int
      playruns = []
      for st in venueframe['Runs'].values:
        st = re.findall(r'[0-9]+', st)
        if not st:
          st.append('0')
        playruns.append(float(st[0]))
      venueframe['Runs'] = playruns
      runs_score = venueframe['Runs'].sum()

      ######### Total Ball Faced #########
      balls_faced = venueframe['Bf'].sum()

      ######### Batting Average #########
      if (num_of_dismisal==0):
        average = 0
      else:
        average = runs_score/num_of_dismisal

      ######### Strike Rate #########
      if (balls_faced==0):
        sr = 0
      else:
        sr = (runs_score/balls_faced) * 100

      ######### Number of Centuries #########
      cen = venueframe[venueframe.Runs >= 100].shape[0]

      ######### Number of Fifties #########
      fif = venueframe[venueframe.Runs >= 50].shape[0]
      fif = fif - cen

      ######### Highest Score #########
      h = venueframe['Runs'].max()

      ######### Number of Zeros #########
      zero = venueframe[venueframe.Runs == 0].shape[0]


    ####################  Rate the Elements Before Calculation  ####################

      #### numInnings ####
      if (numInnings==1):
        numInnings = 1
      elif (numInnings==2):
        numInnings = 2
      elif (numInnings==3):
        numInnings = 3
      elif (numInnings==4):
        numInnings = 4
      elif (numInnings>=5):
        numInnings = 5 

      #### average ####
      if (average>=0.0 and average<=9.9):
        average = 1
      elif (average>=10.0 and average<=19.9):
        average = 2
      elif (average>=20.0 and average<=29.9):
        average = 3
      elif (average>=30.0 and average<=39.9):
        average = 4
      elif (average>=40.0):
        average = 5   

      #### sr ####
      if (sr>=0.0 and sr<=49.9):
        sr = 1
      elif (sr>=50.0 and sr<=59.9):
        sr = 2
      elif (sr>=60.0 and sr<=79.0):
        sr = 3
      elif (sr>=80.0 and sr<=99.9):
        sr = 4
      elif (sr>=100.0):
        sr = 5  

      #### cen ####
      if (cen==1):
        cen = 4
      elif (cen>=2):
        cen = 5

      #### fif ####
      if (fif==1):
        fif = 4
      elif (fif>=2):
        fif = 5

      #### h ####
      if (h>=1 and h<=24):
        h = 1
      elif (h>=25 and h<=49):
        h = 2
      elif (h>=50 and h<=99):
        h = 3
      elif (h>=100 and h<=149):
        h = 4
      elif (h>=150):
        h = 5   

      ven = (0.4262 * average) + (0.2566 * numInnings) + (0.1510 * sr) + (0.0787 * cen) + (0.0556 * fif) + (0.0328 * h)
    else:
      ven = 0
    perPlayerVenue.append(ven)
  Venues.append(perPlayerVenue)    

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [94]:
VenuesFrame = pd.DataFrame(Venues, columns = listOfVenue)

In [95]:
VenuesFrame

Unnamed: 0,Aberdeen,Abu Dhabi,Adelaide,Ahmedabad,Al Amerat,Amritsar,Amstelveen,Auckland,Ayr,Ballarat,Basseterre,Belfast,Bengaluru,Benoni,Berri,Birmingham,Bloemfontein,Bogra,Bready,Bridgetown,Brisbane,Bristol,Bulawayo,Cairns,Canberra,Canterbury,Cape Town,Cardiff,Centurion,Chandigarh,Chattogram,Chelmsford,Chennai,Chester-le-Street,Christchurch,Colombo (PSS),Colombo (RPS),Colombo (SSC),Cuttack,Dambulla,...,Pietermaritzburg,Port Elizabeth,Port Moresby,Port of Spain,Potchefstroom,Providence,Pune,Queenstown,Quetta,Rajkot,Ranchi,Rawalpindi,Roseau,Rotterdam,Sargodha,Schiedam,Sharjah,Sheikhupura,Sialkot,Singapore,Southampton,St George's,St John's,Sydney,Sylhet,Tangier,Taunton,Taupo,The Hague,The Oval,Thiruvananthapuram,Toronto,Townsville,Vadodara,Vijayawada,Visakhapatnam,Wellington,Whangarei,Windhoek,Worcester
0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0
1,0.000000,0.000000,0.0,0.0,1.331651,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0
2,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,1.075705,0.0,0.0,0.0
3,0.892793,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0
4,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.696253,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1395,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.741892,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.439989,0.0,1.28509,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0
1396,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.893207,0.893062,0.0,0.0,0.0,0.440327,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,1.076426,0.0,1.768026,1.589240,0.0,0.741893,...,0.0,0.440376,0.0,0.0,0.000000,0.0,0.440352,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,2.067948,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,1.044303,0.0,0.0,0.0,0.000000,0.0,0.0,1.300710,0.0,0.0,0.0
1397,0.000000,0.000000,0.0,0.0,0.000000,0.0,2.053636,0.0,0.847614,0.0,1.13612,0.774331,0.0,0.696831,0.0,0.0,0.742279,0.0,0.0,0.0,0.0,0.000000,0.440038,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.833800,0.742231,0.0,0.000000,...,0.0,0.000000,0.0,0.0,2.106751,0.0,0.000000,0.0,0.0,0.0,0.0,0.623548,0.0,1.798848,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.696469,0.000000,0.0,0.0,0.0,0.440376,0.0,0.0,0.000000,0.0,0.0,0.0
1398,0.000000,1.300614,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.0,0.000000,...,0.0,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,1.044110,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0


In [96]:
VenuesFrame.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1400 entries, 0 to 1399
Columns: 170 entries, Aberdeen to Worcester
dtypes: float64(170)
memory usage: 1.8 MB


In [97]:
listOfBatsmanFrame = pd.DataFrame(listOfBatsman, columns=["Players"])

In [98]:
listOfBatsmanFrame

Unnamed: 0,Players
0,NP Kenjige
1,ME Sanuth
2,Aamer Yamin
3,Aamir Kaleem
4,Aarif Sheikh
...,...
1395,K Zondo
1396,DNT Zoysa
1397,B Zuiderent
1398,Zulfiqar Babar


#### Final Data for training the model

In [99]:
playerPerformance = pd.concat([listOfBatsmanFrame, ConsistencyFrame, FormFrame], axis = 1)
playerOpposition = pd.concat([listOfBatsmanFrame, OppositionsFrame], axis = 1)
playerVenue = pd.concat([listOfBatsmanFrame, VenuesFrame], axis = 1)

In [100]:
playerPerformance

Unnamed: 0,Players,Consistency,Form
0,NP Kenjige,1.4050,-2.004600
1,ME Sanuth,2.2902,-8.791000
2,Aamer Yamin,3.0472,0.000000
3,Aamir Kaleem,1.7130,-0.986267
4,Aarif Sheikh,1.3782,2.946200
...,...,...,...
1395,K Zondo,2.4700,0.000000
1396,DNT Zoysa,1.6474,0.000000
1397,B Zuiderent,2.0838,0.000000
1398,Zulfiqar Babar,1.7130,0.000000


In [101]:
batting = pd.read_csv("/content/drive/My Drive/Projects/Cricket Prediction/Batting.csv")

In [102]:
# Cleaning data

batting = batting[batting.Runs != 'DNB']
batting = batting[batting.Runs != 'TDNB']
batting = batting[batting.Runs != 'sub']
batting = batting[batting.Runs != 'absent']
batting = batting.rename(columns={"Player 1":"Players", "Start Date":"StartDate"})

In [103]:
batting = batting.drop(columns=['Team', 'Mins', 'Bf', '4s', '6s', 'Sr', 'Inns', 'StartDate'])

In [104]:
batting

Unnamed: 0,Players,Runs,Opposition,Ground
2,NP Kenjige,1*,U.A.E.,ICCA Dubai
3,NP Kenjige,6*,Scotland,ICCA Dubai
4,ME Sanuth,6,U.A.E.,Al Amerat
5,ME Sanuth,40,Namibia,Al Amerat
7,NP Kenjige,0,Nepal,Kirtipur
...,...,...,...,...
45281,Zulfiqar Babar,1*,Bangladesh,Dhaka
45282,Zulqarnain Haider,12*,South Africa,Abu Dhabi
45283,Zulqarnain Haider,6,South Africa,Abu Dhabi
45284,Zulqarnain Haider,11,South Africa,Dubai (DSC)


In [105]:
runs = []
for st in batting['Runs'].values:
  st = re.findall(r'[0-9]+', st)
  if not st:
    st.append('0')
  r = float(st[0])
  ######## Rate the run attribute ########
  if (r>=0 and r<=24):
    r = 1
  elif (r>=25 and r<=49):
    r = 2
  elif (r>=50 and r<=74):
    r = 3
  elif (r>=75 and r<=99):
    r = 4
  elif (r>=100):
    r = 5  

  runs.append(r)
batting['Runs'] = runs

In [106]:
batting.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 36109 entries, 2 to 45285
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Players     36109 non-null  object
 1   Runs        36109 non-null  int64 
 2   Opposition  36109 non-null  object
 3   Ground      36109 non-null  object
dtypes: int64(1), object(3)
memory usage: 1.4+ MB


In [107]:
batting

Unnamed: 0,Players,Runs,Opposition,Ground
2,NP Kenjige,1,U.A.E.,ICCA Dubai
3,NP Kenjige,1,Scotland,ICCA Dubai
4,ME Sanuth,1,U.A.E.,Al Amerat
5,ME Sanuth,2,Namibia,Al Amerat
7,NP Kenjige,1,Nepal,Kirtipur
...,...,...,...,...
45281,Zulfiqar Babar,1,Bangladesh,Dhaka
45282,Zulqarnain Haider,1,South Africa,Abu Dhabi
45283,Zulqarnain Haider,1,South Africa,Abu Dhabi
45284,Zulqarnain Haider,1,South Africa,Dubai (DSC)


In [108]:
# Now we have to join playerPerformance and batting dataframe to create our final dataset

finalBatting = pd.merge(batting, playerPerformance, on="Players")

In [109]:
finalBatting.isna().any()

Players        False
Runs           False
Opposition     False
Ground         False
Consistency    False
Form           False
dtype: bool

In [110]:
finalPlayersFrame = finalBatting[['Players','Opposition']]
finalbattingOpposition = pd.merge(finalPlayersFrame, playerOpposition, on="Players")
finalbattingOpposition =finalbattingOpposition.drop(columns=['Players', 'Opposition'])
finalPlayersFrame = pd.get_dummies(finalPlayersFrame.Opposition)
finalbattingOpposition = pd.DataFrame(finalbattingOpposition.values*finalPlayersFrame.values, columns=finalbattingOpposition.columns, index=finalbattingOpposition.index)
finalBatting = pd.concat([finalBatting, finalbattingOpposition], axis=1)

In [111]:
finalPlayersFrame = finalBatting[['Players','Ground']]
finalbattingVenue = pd.merge(finalPlayersFrame, playerVenue, on="Players")
finalbattingVenue =finalbattingVenue.drop(columns=['Players', 'Ground'])
finalPlayersFrame = pd.get_dummies(finalPlayersFrame.Ground)
colfinalPlayersFrame = finalPlayersFrame.columns
finalbattingVenue = finalbattingVenue[colfinalPlayersFrame]
finalbattingVenue = pd.DataFrame(finalbattingVenue.values*finalPlayersFrame.values, columns=finalbattingVenue.columns, index=finalbattingVenue.index)
finalBatting = pd.concat([finalBatting, finalbattingVenue], axis=1)

In [112]:
finalBatting = finalBatting.drop(columns=['Opposition', 'Ground'])
finalBatting

Unnamed: 0,Players,Runs,Consistency,Form,Afghanistan,Africa XI,Asia XI,Australia,Bangladesh,Bermuda,Canada,England,Hong Kong,ICC World XI,India,Ireland,Kenya,Namibia,Nepal,Netherlands,New Zealand,Oman,P.N.G.,Pakistan,Scotland,South Africa,Sri Lanka,U.A.E.,U.S.A.,West Indies,Zimbabwe,Aberdeen,Abu Dhabi,Adelaide,Ahmedabad,Al Amerat,Amstelveen,Auckland,Ayr,Basseterre,...,Nairobi (Ruaraka),Napier,Nelson,North Sound,Nottingham,Paarl,Pallekele,Perth,Peshawar,Port Elizabeth,Port Moresby,Port of Spain,Potchefstroom,Providence,Pune,Queenstown,Rajkot,Ranchi,Rawalpindi,Roseau,Rotterdam,Schiedam,Sharjah,Sheikhupura,Southampton,St George's,St John's,Sydney,Sylhet,Taunton,The Hague,The Oval,Thiruvananthapuram,Toronto,Townsville,Vadodara,Visakhapatnam,Wellington,Whangarei,Windhoek
0,NP Kenjige,1,1.4050,-2.0046,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,1.010861,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,NP Kenjige,1,1.4050,-2.0046,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,1.00716,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,NP Kenjige,1,1.4050,-2.0046,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.801,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,ME Sanuth,1,2.2902,-8.7910,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.856168,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,1.331651,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,ME Sanuth,2,2.2902,-8.7910,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.982054,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,1.331651,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36104,Zulfiqar Babar,1,1.7130,0.0000,0.0,0.0,0.0,0.0,0.407553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36105,Zulqarnain Haider,1,1.9882,0.0000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.963956,0.0,0.000000,0.0,0.0,0.0,0.0,0.696566,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36106,Zulqarnain Haider,1,1.9882,0.0000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.963956,0.0,0.000000,0.0,0.0,0.0,0.0,0.696566,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36107,Zulqarnain Haider,1,1.9882,0.0000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.963956,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [113]:
finalBatting.to_csv("/content/drive/My Drive/Projects/Cricket Prediction/finalBatting.csv")

In [114]:
finalBatting = pd.read_csv("/content/drive/My Drive/Projects/Cricket Prediction/finalBatting.csv")

In [115]:
X_batting = finalBatting.drop(columns=['Runs', 'Unnamed: 0', 'Players'])
y_batting = finalBatting['Runs']

In [116]:
X_batting

Unnamed: 0,Consistency,Form,Afghanistan,Africa XI,Asia XI,Australia,Bangladesh,Bermuda,Canada,England,Hong Kong,ICC World XI,India,Ireland,Kenya,Namibia,Nepal,Netherlands,New Zealand,Oman,P.N.G.,Pakistan,Scotland,South Africa,Sri Lanka,U.A.E.,U.S.A.,West Indies,Zimbabwe,Aberdeen,Abu Dhabi,Adelaide,Ahmedabad,Al Amerat,Amstelveen,Auckland,Ayr,Basseterre,Belfast,Bengaluru,...,Nairobi (Ruaraka),Napier,Nelson,North Sound,Nottingham,Paarl,Pallekele,Perth,Peshawar,Port Elizabeth,Port Moresby,Port of Spain,Potchefstroom,Providence,Pune,Queenstown,Rajkot,Ranchi,Rawalpindi,Roseau,Rotterdam,Schiedam,Sharjah,Sheikhupura,Southampton,St George's,St John's,Sydney,Sylhet,Taunton,The Hague,The Oval,Thiruvananthapuram,Toronto,Townsville,Vadodara,Visakhapatnam,Wellington,Whangarei,Windhoek
0,1.4050,-2.0046,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,1.010861,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.4050,-2.0046,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,1.00716,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.4050,-2.0046,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.801,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,2.2902,-8.7910,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.856168,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,1.331651,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,2.2902,-8.7910,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.982054,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,1.331651,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36104,1.7130,0.0000,0.0,0.0,0.0,0.0,0.407553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36105,1.9882,0.0000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.963956,0.0,0.000000,0.0,0.0,0.0,0.0,0.696566,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36106,1.9882,0.0000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.963956,0.0,0.000000,0.0,0.0,0.0,0.0,0.696566,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
36107,1.9882,0.0000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000,0.0,0.0,0.0,0.0,0.0,0.00000,0.963956,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [117]:
y_batting

0        1
1        1
2        1
3        1
4        2
        ..
36104    1
36105    1
36106    1
36107    1
36108    1
Name: Runs, Length: 36109, dtype: int64

# Oversampling SMOTE

In [118]:
pip install -U imbalanced-learn

Requirement already up-to-date: imbalanced-learn in /usr/local/lib/python3.6/dist-packages (0.7.0)


In [119]:
from imblearn.over_sampling import SMOTE
X_resample_batting, y_resample_batting = SMOTE().fit_sample(X_batting,y_batting.values.ravel())

In [120]:
X_resample_batting.shape

(118315, 169)

In [121]:
X_batting.shape

(36109, 169)

# Taking Care of Zero values

In [122]:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=0, strategy='mean')
imputer.fit(X_resample_batting)
X_resample_batting = imputer.transform(X_resample_batting) 

# Splitting the datasets into the Training set and Test set

In [123]:
###############   FOR BATTING   ###############
from sklearn.model_selection import train_test_split
X_train_batting, X_test_batting, y_train_batting, y_test_batting = train_test_split(X_resample_batting, y_resample_batting, test_size=0.3, random_state = 1)

In [124]:
print(X_train_batting.shape)

print(X_test_batting.shape)

print(y_train_batting.shape)

print(y_test_batting.shape)

(82820, 169)
(35495, 169)
(82820,)
(35495,)


In [125]:
X_train_batting

array([[ 2.48981009, -0.28212157,  1.08188752, ...,  1.56729411,
         0.85459538,  0.68192765],
       [ 2.08724043,  0.14677431,  1.08188752, ...,  1.56729411,
         0.85459538,  0.68192765],
       [ 2.7767    ,  1.29181555,  1.08188752, ...,  1.56729411,
         0.85459538,  0.68192765],
       ...,
       [ 4.204     , -0.28212157,  1.08188752, ...,  1.56729411,
         0.85459538,  0.68192765],
       [ 2.37233465,  0.48167676,  1.08188752, ...,  1.56729411,
         0.85459538,  0.68192765],
       [ 3.85269583,  0.00560943,  1.08188752, ...,  1.56729411,
         0.85459538,  0.68192765]])

# Feature Scaleing

In [126]:
###############   FOR BATTING   ###############
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_batting = sc.fit_transform(X_train_batting)
X_test_batting = sc.transform(X_test_batting)

In [127]:
X_train_batting

array([[-4.91470474e-01, -2.46740657e-03, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-6.98313969e-01,  1.33851501e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-3.44064154e-01,  4.97787780e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       ...,
       [ 3.89293953e-01, -2.46740657e-03, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-5.51830293e-01,  2.40295836e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [ 2.08791071e-01,  8.89841009e-02, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04]])

In [128]:
X_test_batting

array([[-3.55778976e-01, -2.46740657e-03, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [ 5.21483481e-01, -2.17958308e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-1.37604745e+00,  2.54763133e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       ...,
       [ 4.11803660e-01,  4.07041884e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-6.19465223e-01,  3.40204414e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-6.80197324e-01, -1.26531573e-02, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04]])

In [129]:
import numpy as np

X_train_batting = np.array(X_train_batting)
X_test_batting = np.array(X_test_batting)
y_train_batting = np.array(y_train_batting)
y_test_batting = np.array(y_test_batting)

In [130]:
X_train_batting

array([[-4.91470474e-01, -2.46740657e-03, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-6.98313969e-01,  1.33851501e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-3.44064154e-01,  4.97787780e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       ...,
       [ 3.89293953e-01, -2.46740657e-03, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [-5.51830293e-01,  2.40295836e-01, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04],
       [ 2.08791071e-01,  8.89841009e-02, -1.75731049e-03, ...,
         9.77993971e-04, -2.24279217e-04, -1.95359592e-04]])

# Model Building

In [131]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

In [132]:
models_batting = []
models_batting.append(('DTC', DecisionTreeClassifier(criterion= 'entropy', random_state=0)))
models_batting.append(('NB', GaussianNB()))
models_batting.append(('RFC', RandomForestClassifier(n_estimators=500, criterion='entropy', random_state=0)))
models_batting.append(('SVC', SVC(random_state = 0, kernel = 'rbf')))

In [133]:
results = []
names = []

for name, model in models_batting:
  model.fit(X_train_batting, y_train_batting)
  y_pred = model.predict(X_test_batting)
  accuracies = accuracy_score(y_test_batting, y_pred)
  results.append(accuracies*100)
  names.append(name)
  print("Model Completed")
final_comparison_batting = pd.DataFrame(list(zip(names, results)), columns = ['Model Name', 'Accuracy'])

Model Completed
Model Completed
Model Completed
Model Completed


In [134]:
final_comparison_batting.sort_values(by=['Accuracy'], ascending=False)

Unnamed: 0,Model Name,Accuracy
2,RFC,76.920693
0,DTC,61.729821
3,SVC,45.09931
1,NB,22.777856
