<a href="https://colab.research.google.com/github/yashguptaab99/Cricket-Prediction/blob/master/Cricket_Predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# INCREASED PREDICTION ACCURACY IN THE GAME OF CRICKET USING MACHINE LEARNING

Player selection is one of the most important tasks in any sport, and cricket is no exception. A player's performance depends on factors such as the opposition team, the venue, and the player's current form. The team management, the coach, and the captain select 11 players for each match from a squad of 15 to 20, analyzing the players' characteristics and statistics to pick the best playing 11. Each batsman contributes by scoring as many runs as possible, and each bowler by taking as many wickets and conceding as few runs as possible. This paper attempts to predict player performance: how many runs each batsman will score and how many wickets each bowler will take, for both teams. Both problems are treated as classification problems in which the number of runs and the number of wickets are classified into ranges. We used naïve Bayes, random forest, multiclass SVM, and decision tree classifiers to build prediction models for both problems. The random forest classifier was found to be the most accurate for both.
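
To make the "classified into ranges" step concrete, here is a minimal sketch of turning raw run counts into class labels with `pd.cut`. The bin edges and labels (`RUN_BINS`, `RUN_LABELS`) and the helper `runs_to_class` are assumptions for illustration; the text above does not state the ranges actually used.

```python
import pandas as pd

# Hypothetical class boundaries; the actual ranges are not given in the text.
RUN_BINS = [-1, 24, 49, 74, 99, float("inf")]
RUN_LABELS = ["0-24", "25-49", "50-74", "75-99", "100+"]


def runs_to_class(runs: pd.Series) -> pd.Series:
    """Map numeric run counts to ordered range labels (right-closed bins)."""
    return pd.cut(runs, bins=RUN_BINS, labels=RUN_LABELS)


sample = pd.Series([0, 24, 25, 63, 99, 100, 183])
classes = runs_to_class(sample)
print(classes.tolist())
```

The resulting labels could then serve as the target column for any of the classifiers named above.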

# Importing Libraries

In [1]:
import pandas as pd
import re

# Importing Data

In [2]:
# All Innings list after 14 Jan 2005
batting = pd.read_csv("/content/drive/My Drive/Projects/Cricket Prediction/Batting.csv")

# All innings list from 18 Dec 1989 to 13 Jan 2005
battingExtra = pd.read_csv("/content/drive/My Drive/Projects/Cricket Prediction/Batting89-05.csv")

# Data Manipulation

## Batting data

In [3]:
batting = batting.drop(columns=['Mins', '4s', '6s', 'Sr', 'Inns'])
battingExtra = battingExtra.drop(columns=['Mins', '4s', '6s', 'Sr', 'Inns'])

In [4]:
# Drop rows where the player did not bat, then normalise the column names
didNotBat = ['DNB', 'TDNB', 'sub', 'absent']

batting = batting[~batting.Runs.isin(didNotBat)]
batting = batting.rename(columns={"Player 1":"Player", "Start Date":"StartDate"})

battingExtra = battingExtra[~battingExtra.Runs.isin(didNotBat)]
battingExtra = battingExtra.rename(columns={"Player 1":"Player", "Start Date":"StartDate"})

In [5]:
listOfBatsman = list(batting['Player'].unique())

In [15]:
# Merging the earlier innings of players who also appear in matches after 2005
# e.g. Sachin was among the most senior players, so his past match performances should be added

for player in listOfBatsman:
  playerframe = battingExtra[battingExtra.Player == player]
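
The loop above selects each player's older innings but does not yet combine them with the recent data. One possible way to finish the merge, sketched here on toy frames, is to filter the older frame with `isin` and append it with `pd.concat`. The helper name `merge_past_innings` and the sample data are hypothetical, and the sketch assumes both frames share the same columns.

```python
import pandas as pd


def merge_past_innings(recent: pd.DataFrame, older: pd.DataFrame) -> pd.DataFrame:
    """Prepend the older innings of players who also appear in `recent`."""
    known = older[older["Player"].isin(recent["Player"].unique())]
    return pd.concat([known, recent], ignore_index=True)


# Toy data: player "C" only appears in the older frame and is left out.
recent = pd.DataFrame({"Player": ["A", "B"], "Runs": ["10", "45*"]})
older = pd.DataFrame({"Player": ["A", "C"], "Runs": ["100", "7"]})

merged = merge_past_innings(recent, older)
print(merged)
```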
  


In [5]:
# Cleaning Bf (balls faced): keep the first run of digits and store it as a float (0 if none)
bf = []
for st in batting['Bf'].values:
  st = re.findall(r'[0-9]+', st)
  if not st:
    st.append('0')
  bf.append(float(st[0]))

bfExtra = []
for st in battingExtra['Bf'].values:
  st = re.findall(r'[0-9]+', st)
  if not st:
    st.append('0')
  bfExtra.append(float(st[0]))

In [6]:
batting['Bf'] = bf
battingExtra['Bf'] = bfExtra
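
The same digit-extraction pattern is repeated for `Bf` and, later, for `Runs`. A small helper could express it once. The sketch below is an illustration only: the name `first_number` and its `default` parameter are hypothetical. It also converts the input with `str()`, because `re` functions raise a `TypeError` on non-string values such as a missing (NaN) cell.

```python
import re


def first_number(text, default=0.0):
    """Return the first run of digits in `text` as a float, else `default`."""
    match = re.search(r"[0-9]+", str(text))
    return float(match.group()) if match else default


# "-" and "" contain no digits, "102*" carries a not-out marker.
cleaned = [first_number(value) for value in ["45", "-", "102*", ""]]
print(cleaned)
```

With such a helper, a column could be cleaned with `batting['Bf'].map(first_number)`.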

In [8]:
# List of batting attributes 
num_of_innings = []
batting_avg = []
strike_rate = []
centuries = []
fifties = []
highest = []
zeros = []

In [9]:
for player in listOfBatsman:
  not_outs = 0
  runs_score = 0
  balls_faced = 0
  playerframe = batting[batting.Player == player]
  playerframeExtra = battingExtra[battingExtra.Player == player]

  #number of innings
  numInnings = playerframe.shape[0]
  if not playerframeExtra.empty:
    numInnings += playerframeExtra.shape[0]

  #Number of not outs (scores ending with "*")
  for st in playerframe['Runs'].values:
    if st.endswith("*"):
      not_outs+=1
  if not playerframeExtra.empty:
    for st in playerframeExtra['Runs'].values:
      if st.endswith("*"):
        not_outs+=1

  #Number of dismissals
  num_of_dismisal = numInnings - not_outs

  #Total Runs
  #keeping only the digits of each score (dropping the "*") and converting to float
  playruns = []
  playrunsExtra = []
  for st in playerframe['Runs'].values:
    st = re.findall(r'[0-9]+', st)
    if not st:
      st.append('0')
    playruns.append(float(st[0]))
  playerframe['Runs'] = playruns
  runs_score = playerframe['Runs'].sum()

  if not playerframeExtra.empty:
    for st in playerframeExtra['Runs'].values:
      st = re.findall(r'[0-9]+', st)
      if not st:
        st.append('0')
      playrunsExtra.append(float(st[0]))
    playerframeExtra['Runs'] = playrunsExtra
    runs_score += playerframeExtra['Runs'].sum()


  #Total Balls Faced
  balls_faced = playerframe['Bf'].sum()
  if not playerframeExtra.empty:
    balls_faced += playerframeExtra['Bf'].sum()


  #Batting Average
  if (num_of_dismisal==0):
    battign_average = 0
  else:
    battign_average = runs_score/num_of_dismisal

  #Strike Rate
  if (balls_faced==0):
    sr = 0
  else:
    sr = (runs_score/balls_faced) * 100

  #Number of Centuries
  cen = playerframe[playerframe.Runs >= 100].shape[0]
  if not playerframeExtra.empty:
    cen += playerframeExtra[playerframeExtra.Runs >= 100].shape[0]



  #Number of Fifties (scores of 50 to 99; centuries are subtracted below)
  fif = playerframe[playerframe.Runs >= 50].shape[0]
  if not playerframeExtra.empty:
    fif += playerframeExtra[playerframeExtra.Runs >= 50].shape[0]
  fif = fif - cen

  #Highest Score
  #Note: if playerframeExtra is empty, hsExtra is NaN; hs > NaN is False, so h ends up NaN
  h=0
  hs = playerframe['Runs'].max()
  hsExtra = playerframeExtra['Runs'].max()
  if (hs > hsExtra):
    h = hs
  else:
    h = hsExtra

  #Number of Zeros
  zero = playerframe[playerframe.Runs == 0].shape[0]
  if not playerframeExtra.empty:
    zero += playerframeExtra[playerframeExtra.Runs == 0].shape[0]

  num_of_innings.append(numInnings)
  batting_avg.append(battign_average)
  strike_rate.append(sr)
  centuries.append(cen)
  fifties.append(fif)
  highest.append(h)
  zeros.append(zero)

df = pd.DataFrame()
df['listOfBatsman'] = listOfBatsman
df['num_of_innings'] = num_of_innings
df['batting_avg'] = batting_avg
df['strike_rate'] = strike_rate
df['centuries'] = centuries
df['fifties'] = fifties
df['highest_score'] = highest
df['num_of_zeroes'] = zeros

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
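
The warning above is raised by assignments such as `playerframe['Runs'] = playruns`, because `playerframe` is a filtered selection of `batting`. A common way to avoid it is to take an explicit `.copy()` of the selection before assigning. The sketch below shows the idea on toy data; the frame and variable names are illustrative, not taken from the notebook.

```python
import pandas as pd

innings = pd.DataFrame({"Player": ["A", "A", "B"], "Runs": ["10", "45*", "7"]})

# .copy() makes the selection an independent frame, so the assignment
# below is unambiguous and leaves `innings` untouched.
player_frame = innings[innings["Player"] == "A"].copy()
player_frame["Runs"] = (
    player_frame["Runs"].str.extract(r"([0-9]+)", expand=False).fillna("0").astype(float)
)

print(player_frame["Runs"].tolist())
print(innings["Runs"].tolist())
```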


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1400 entries, 0 to 1399
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   listOfBatsman   1400 non-null   object 
 1   num_of_innings  1400 non-null   int64  
 2   batting_avg     1400 non-null   float64
 3   strike_rate     1400 non-null   float64
 4   centuries       1400 non-null   int64  
 5   fifties         1400 non-null   int64  
 6   highest_score   288 non-null    float64
 7   num_of_zeroes   1400 non-null   int64  
dtypes: float64(3), int64(4), object(1)
memory usage: 87.6+ KB
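
Only 288 of the 1400 `highest_score` values are non-null. This is consistent with how the highest score is computed in the loop: when a player has no innings in `battingExtra`, `max()` of the empty selection is NaN, the comparison `hs > hsExtra` is False, and the `else` branch stores NaN. A minimal sketch of the pitfall and one NaN-safe alternative follows; the helper `highest_score` is hypothetical and is not the notebook's code.

```python
import math

import pandas as pd


def highest_score(recent_runs: pd.Series, older_runs: pd.Series) -> float:
    """Highest score across both series, skipping whichever one is empty."""
    candidates = [s.max() for s in (recent_runs, older_runs) if not s.empty]
    return float(max(candidates)) if candidates else float("nan")


recent = pd.Series([12.0, 87.0])
no_history = pd.Series([], dtype=float)

# The comparison pattern used in the loop: any comparison with NaN is False.
naive = recent.max() if recent.max() > no_history.max() else no_history.max()
fixed = highest_score(recent, no_history)

print(naive, fixed)
```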


In [11]:
df.to_csv("/content/drive/My Drive/Projects/Cricket Prediction/PlayerStats05-20.csv")

In [12]:
# Cleaning Runs: keep the first run of digits (dropping the "*") and store it as a float (0 if none)
runs = []
for st in batting['Runs'].values:
  st = re.findall(r'[0-9]+', st)
  if not st:
    st.append('0')
  runs.append(float(st[0]))

In [13]:
batting['Runs'] = runs
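
Once `Runs` and `Bf` are numeric, the per-player statistics built in the earlier loop could also be computed with `groupby`. The sketch below uses toy data and assumes a boolean `NotOut` column recorded before the "*" marker is stripped from `Runs`; that column and the `stats` frame are hypothetical and do not exist in the notebook.

```python
import pandas as pd

innings = pd.DataFrame({
    "Player": ["A", "A", "A", "B", "B"],
    "Runs": [120.0, 50.0, 0.0, 30.0, 0.0],
    "Bf": [100.0, 60.0, 3.0, 40.0, 1.0],
    "NotOut": [False, True, False, False, False],  # hypothetical helper column
})

grouped = innings.groupby("Player")
stats = pd.DataFrame({
    "num_of_innings": grouped.size(),
    "runs": grouped["Runs"].sum(),
    "balls": grouped["Bf"].sum(),
    "dismissals": grouped["NotOut"].apply(lambda s: int((~s).sum())),
    "centuries": grouped["Runs"].apply(lambda s: int((s >= 100).sum())),
    "fifties": grouped["Runs"].apply(lambda s: int(((s >= 50) & (s < 100)).sum())),
    "highest_score": grouped["Runs"].max(),
    "num_of_zeroes": grouped["Runs"].apply(lambda s: int((s == 0).sum())),
})

# Same conventions as the loop: 0 when there are no dismissals or no balls faced.
stats["batting_avg"] = (stats["runs"] / stats["dismissals"]).where(stats["dismissals"] > 0, 0.0)
stats["strike_rate"] = (stats["runs"] / stats["balls"] * 100).where(stats["balls"] > 0, 0.0)

print(stats)
```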