
# IPL Dataset Analysis

## Problem Statement
We want to understand what happens during an IPL match. With limited knowledge of cricket, the game on which the league is based, this raises several questions. This analysis looks at which factors led one of the teams to win and how much they matter.

## About the Dataset
The Indian Premier League (IPL) is a professional T20 cricket league in India contested during April-May of every year by teams representing Indian cities. It is the most-attended cricket league in the world and ranks sixth among all the sports leagues. It has teams with players from around the world and is very competitive and entertaining with a lot of close matches between teams.

The IPL and other cricket-related datasets are available at [cricsheet.org](https://cricsheet.org/). Feel free to visit the website and explore the data yourself; exploring new sources of data is one of the more interesting things a data scientist gets to do.

## About the dataset:
Snapshot of the data you will be working on:<br>
<br>
The dataset has 1452 data points and 23 features.<br>

|Features|Description|
|-----|-----|
|match_code|Code identifying an individual match|
|date|Date on which the match was played|
|city|City where the match was played|
|team1|First team|
|team2|Second team|
|toss_winner|Team that won the toss|
|toss_decision|Decision taken by the toss winner|
|winner|Team that won the match|
|win_type|How the team won (by wickets, by runs, etc.)|
|win_margin|Margin by which the team won|
|inning|Innings (1st or 2nd)|
|delivery|Delivery (ball) number|
|batting_team|Team currently batting|
|batsman|Batsman currently on strike|
|non_striker|Batsman at the non-striker's end|
|bowler|Current bowler|
|runs|Runs scored by the batsman on that delivery|
|extras|Extra runs scored on that delivery|
|total|Total runs scored on that delivery (runs plus extras)|
|extras_type|Type of extra (wide, no ball or leg bye)|
|player_out|Player who got out|
|wicket_kind|How the player got out|
|wicket_fielders|Fielder who caught the player out|


### Analysing data using numpy module

### Read the data using numpy module.

In [0]:
import numpy as np
# Data does not always come as CSV; other file formats exist too.
# This exercise will help you deal with other file formats and how to read them.
path = './ipl_matches_small.csv'
data_ipl = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=str)



In [3]:
print(data_ipl)

[['392203' '2009-05-01' 'East London' ... '' '' '']
 ['392203' '2009-05-01' 'East London' ... '' '' '']
 ['392203' '2009-05-01' 'East London' ... '' '' '']
 ...
 ['335987' '2008-04-21' 'Jaipur' ... '' '' '']
 ['335987' '2008-04-21' 'Jaipur' ... '' '' '']
 ['335987' '2008-04-21' 'Jaipur' ... '' '' '']]
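
A minimal sketch of how `np.genfromtxt` behaves with the arguments used above, run on a small in-memory CSV instead of `ipl_matches_small.csv`. The three-column sample and the names `sample_csv` and `sample` are assumptions made for illustration; the real file has 23 columns.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for the real file.
sample_csv = io.StringIO(
    "match_code,date,city\n"
    "392203,2009-05-01,East London\n"
    "335987,2008-04-21,Jaipur\n"
)

# skip_header=1 drops the column names; dtype=str keeps every field as text.
sample = np.genfromtxt(sample_csv, delimiter=",", skip_header=1, dtype=str)

print(sample.shape)   # (rows, columns)
print(sample[:, 2])   # the city column
```

Because every field is read as a string, numeric columns have to be converted explicitly before doing arithmetic on them.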


### Calculate the number of unique matches in the provided dataset

In [4]:
# We need to know how many matches were held in total so that we can keep it in mind when analyzing further statistics.
import numpy as np
unique_match_code=np.unique(data_ipl[:,0])
print(unique_match_code)

['335987' '392197' '392203' '392212' '501226' '729297']
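
To get the count itself rather than the list of codes, one option is to take the size of the array that `np.unique` returns. The sketch below uses a made-up `match_codes` column; `return_counts=True` additionally reports how many rows (deliveries) belong to each code.

```python
import numpy as np

# Hypothetical match_code column: one entry per delivery.
match_codes = np.array(["335987", "392203", "335987", "392203", "501226"])

# np.unique returns the sorted distinct values and, optionally, their counts.
codes, deliveries_per_match = np.unique(match_codes, return_counts=True)

print(codes.size)  # number of distinct matches
print(dict(zip(codes.tolist(), deliveries_per_match.tolist())))
```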


### Find the set of all unique teams that played in the matches in the data set.

In [5]:
# This exercise identifies all the teams that played in the matches in the dataset.
import numpy as np
unique_match_team3=np.unique(data_ipl[:,3])  # distinct values of team1
print(unique_match_team3)
unique_match_team4=np.unique(data_ipl[:,4])  # distinct values of team2
print(unique_match_team4)

# np.union1d returns the sorted, unique union of both arrays,
# so the np.unique call below leaves the result unchanged.
union=np.union1d(unique_match_team3,unique_match_team4)
print(union)
unique=np.unique(union)
print(unique)

['Chennai Super Kings' 'Deccan Chargers' 'Kolkata Knight Riders'
 'Rajasthan Royals']
['Chennai Super Kings' 'Kings XI Punjab' 'Mumbai Indians' 'Pune Warriors']
['Chennai Super Kings' 'Deccan Chargers' 'Kings XI Punjab'
 'Kolkata Knight Riders' 'Mumbai Indians' 'Pune Warriors'
 'Rajasthan Royals']
['Chennai Super Kings' 'Deccan Chargers' 'Kings XI Punjab'
 'Kolkata Knight Riders' 'Mumbai Indians' 'Pune Warriors'
 'Rajasthan Royals']
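
The last two outputs are identical because `np.union1d` already returns sorted, de-duplicated values. A small sketch with made-up `team1` and `team2` columns shows this:

```python
import numpy as np

# Hypothetical team columns with repeated entries.
team1 = np.array(["Rajasthan Royals", "Chennai Super Kings", "Chennai Super Kings"])
team2 = np.array(["Mumbai Indians", "Chennai Super Kings"])

# union1d de-duplicates and sorts in one step.
teams = np.union1d(team1, team2)
print(teams)

# Applying np.unique again changes nothing.
print(np.array_equal(teams, np.unique(teams)))
```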


### Find the sum of all extras across all deliveries in all matches in the dataset

In [6]:
# An exercise to get familiar with indexing and slicing the data.
import numpy as np
extras=data_ipl[:,17]      # column 17 holds the extras for each delivery
data=extras.astype(int)    # np.int was removed in NumPy 1.24; the built-in int works
print(data.sum())


88
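
Since the file is loaded with `dtype=str`, the column has to be converted before it can be summed. A minimal sketch of that conversion on made-up values (the names `extras_text`, `extras_int` and `total_extras` are illustrative):

```python
import numpy as np

# Hypothetical extras column, stored as text like the loaded data.
extras_text = np.array(["0", "1", "0", "4", "0"])

# Convert the strings to integers, then sum with the array's own method.
extras_int = extras_text.astype(int)
total_extras = extras_int.sum()

print(total_extras)
```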


### Get the array of all delivery numbers when a given player got out. Also mention the wicket type.

In [7]:
import numpy as np

deliveries = []
wicket_type = []
for row in data_ipl:
    # Column 20 (player_out) is non-empty only when a wicket fell.
    if row[20] != "":
        deliveries.append(row[11])   # delivery
        wicket_type.append(row[21])  # wicket_kind
print(deliveries)
print(wicket_type)


['3.2', '5.5', '7.6', '11.4', '15.6', '18.6', '0.4', '2.2', '14.5', '17.2', '18.6', '19.3', '12.2', '13.5', '14.4', '15.1', '16.6', '18.5', '1.7', '2.7', '10.2', '12.1', '12.3', '13.2', '14.5', '15.1', '15.2', '1.5', '5.3', '9.4', '12.6', '17.1', '19.1', '1.4', '1.5', '8.5', '14.1', '15.5', '15.6', '17.1', '17.3', '5.3', '7.2', '8.2', '10.1', '11.1', '14.5', '1.3', '5.2', '6.4', '6.5', '10.5', '12.6', '13.3', '14.2', '18.3', '19.5', '9.2', '9.6', '16.4', '17.2', '17.5', '19.6', '2.4', '3.6', '4.6', '5.3', '12.6', '18.3', '18.5', '19.1', '19.2', '4.5', '6.3', '7.4', '8.6', '16.5', '17.2', '17.4', '18.6', '1.1', '2.3', '4.5', '11.2']
['caught', 'caught', 'caught', 'bowled', 'caught', 'caught', 'bowled', 'bowled', 'caught', 'bowled', 'run out', 'caught', 'lbw', 'caught', 'caught', 'run out', 'caught', 'caught', 'caught', 'caught', 'bowled', 'caught', 'caught', 'caught', 'caught', 'bowled', 'bowled', 'caught', 'caught', 'bowled', 'bowled', 'caught', 'run out', 'caught', 'bowled', 'caught',
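
The same selection can also be written without a loop by using a boolean mask. This is a sketch on a made-up three-column array (delivery, player_out, wicket_kind), not the notebook's data; in `data_ipl` the corresponding columns are 11, 20 and 21.

```python
import numpy as np

# Hypothetical rows: delivery, player_out, wicket_kind.
rows = np.array([
    ["0.1", "", ""],
    ["3.2", "Player A", "caught"],
    ["3.3", "", ""],
    ["11.4", "Player B", "bowled"],
])

# True for deliveries on which a player got out.
was_out = rows[:, 1] != ""

out_deliveries = rows[was_out, 0]
out_kinds = rows[was_out, 2]

print(out_deliveries)
print(out_kinds)
```

The results are NumPy arrays rather than Python lists; call `.tolist()` if a list is needed.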

### In how many matches has the team `Mumbai Indians` won the toss?

In [8]:
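data_arr = []
for row in data_ipl:
    # Column 5 is toss_winner; collect the match code (column 0) of each matching row.
    if row[5] == "Mumbai Indians":
        data_arr.append(row[0])
# Each match spans many deliveries, so reduce to the unique match codes.
unique_match_id = np.unique(data_arr)
print(unique_match_id)
print(len(unique_match_id))
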
['392197' '392203']
2
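
A loop-free variant of the same idea, sketched on a made-up two-column array (match_code, toss_winner). The match codes here are invented; in `data_ipl` the corresponding columns are 0 and 5.

```python
import numpy as np

# Hypothetical rows: match_code, toss_winner (one row per delivery).
rows = np.array([
    ["1001", "Mumbai Indians"],
    ["1001", "Mumbai Indians"],
    ["1002", "Deccan Chargers"],
    ["1003", "Mumbai Indians"],
])

# Keep rows where Mumbai Indians won the toss, then de-duplicate the match codes.
toss_mask = rows[:, 1] == "Mumbai Indians"
toss_matches = np.unique(rows[toss_mask, 0])

print(toss_matches)
print(toss_matches.size)
```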


### Create a filter that keeps only those records where the batsman scored 6 runs. Also, who has hit the maximum number of sixes overall?
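
One way such a filter could look, sketched on a made-up two-column array (batsman, runs) with placeholder player names. In `data_ipl` the corresponding columns are 13 and 16.

```python
import numpy as np

# Hypothetical rows: batsman, runs.
rows = np.array([
    ["Player A", "6"],
    ["Player B", "4"],
    ["Player A", "6"],
    ["Player B", "6"],
    ["Player C", "1"],
])

# Boolean filter: True where the batsman scored exactly 6 runs.
runs = rows[:, 1].astype(int)
six_mask = runs == 6
sixes = rows[six_mask]

# Count sixes per batsman and pick the one with the most.
names, counts = np.unique(sixes[:, 0], return_counts=True)
top_hitter = names[counts.argmax()]

print(sixes)
print(top_hitter, counts.max())
```

If two batsmen are tied, `argmax` returns the first one in sorted name order.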

In [11]:
# An exercise to find the most aggressive (highest-scoring) batsman.
# Note: this cell totals the runs scored by each batsman
# (column 13 = batsman, column 16 = runs); it does not filter for sixes.
import numpy as np
run_dict = {}
for row in data_ipl:
    batsman = row[13]
    # Start from 0 the first time a batsman appears, then accumulate.
    run_dict[batsman] = run_dict.get(batsman, 0) + int(row[16])
print(run_dict)


{'ST Jayasuriya': 63, 'SR Tendulkar': 104, 'Harbhajan Singh': 24, 'AM Nayar': 19, 'JP Duminy': 122, 'GR Napier': 15, 'AM Rahane': 25, 'Z Khan': 2, 'CH Gayle': 19, 'SC Ganguly': 34, 'BJ Hodge': 97, 'MN van Wyk': 32, 'LR Shukla': 12, 'BB McCullum': 12, 'WP Saha': 8, 'DJ Bravo': 16, 'S Dhawan': 12, 'SS Tiwary': 7, 'Yashpal Singh': 8, 'AN Ghosh': 0, 'I Sharma': 6, 'BAW Mendis': 0, 'AB Dinda': 0, 'AC Gilchrist': 25, 'HH Gibbs': 0, 'TL Suman': 20, 'RG Sharma': 38, 'DR Smith': 66, 'Y Venugopal Rao': 28, 'DB Ravi Teja': 4, 'RJ Harris': 5, 'PR Shah': 29, 'RR Raje': 11, 'DS Kulkarni': 35, 'SK Raina': 6, 'F du Plessis': 7, 'MS Dhoni': 31, 'RA Jadeja': 72, 'M Manhas': 30, 'R Ashwin': 9, 'SV Samson': 16, 'SR Watson': 83, 'SPD Smith': 19, 'STR Binny': 8, 'R Bhatia': 23, 'JP Faulkner': 4, 'TG Southee': 4, 'PV Tambe': 2, 'M Vijay': 31, 'MEK Hussey': 61, 'JA Morkel': 0, 'S Badrinath': 11, 'S Anirudha': 7, 'JD Ryder': 15, 'MD Mishra': 9, 'MK Pandey': 12, 'RV Uthappa': 0, 'Yuvraj Singh': 91, 'NL McCullum