### 📌 Source of Men's Player Statistics (BCCI)

The data for **Indian Men's ODI players** has been scraped from the official BCCI website:

🔗 **URL:** https://www.bcci.tv/international/men/stats/odi

This page lists career statistics for India's men's ODI players. The values were extracted with Python, using `requests` to fetch the page and `BeautifulSoup` to parse it.

In [5]:
import requests 
from bs4 import BeautifulSoup

In [6]:
url="https://www.bcci.tv/international/men/stats/odi"

In [7]:
page=requests.get(url)

In [8]:
page

<Response [200]>

In [9]:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

In [10]:
page=requests.get(url,headers=headers)
page

<Response [200]>
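
The two requests above can be folded into one helper that sets a timeout and fails loudly on an error status. This is a minimal sketch rather than the notebook's own code: the name `fetch_html`, the 10-second timeout, and the optional `session` parameter are assumptions added for illustration.

```python
import requests

URL = "https://www.bcci.tv/international/men/stats/odi"
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, headers=None, timeout=10, session=None):
    """Return the response body as text, raising on HTTP errors.

    `session` is a hypothetical hook: any object with a requests-style
    `get` method works, which keeps the helper testable offline.
    """
    client = session or requests
    response = client.get(url, headers=headers or DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


# Usage (performs a real network call, so it is left commented out):
# html = fetch_html(URL)
```

Checking the status in code replaces the manual inspection of `<Response [200]>`, and the timeout keeps the cell from hanging if the site stops responding.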

In [4]:
### 📝 Note on `soup`

# soup = page.text
# soup  # (This would print the entire HTML code of the webpage, which is very large and not suitable to display or upload)

In [6]:
### 📝 Note on `BeautifulSoup`

# soup = BeautifulSoup(page.text, "html.parser")
# soup  # (Printing this will show the entire parsed HTML, which is very large and not required in the notebook)

### 📥 Extracting Men's Player Data

In this notebook, I extracted the **Indian Men's ODI player statistics** using **BeautifulSoup**.  
The HTML content from the BCCI website was parsed to collect key player information such as:

- Player Name  
- Role  
- Batting Stats  
- Bowling Stats  
- Additional profile details  

BeautifulSoup was used to navigate the page structure, locate relevant tags, and scrape the required columns efficiently.
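
The label-based lookup used throughout this notebook can be done in one pass over a single parsed page. The sketch below assumes each stat cell is a `<td>` holding an `<h6>` value and a `<span>` label inside `div.stats-data-table-player`, which is what the selectors in this notebook imply. The sample HTML and the name `collect_stats` are made up for illustration, and the real page markup may differ.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical markup shaped like what the notebook's selectors imply.
SAMPLE_HTML = """
<div class="stats-data-table-player">
  <table>
    <tr>
      <td width="20%"><h6>Player A</h6></td>
      <td><h6>10</h6><span>Matches</span></td>
      <td><h6>9</h6><span>Inns</span></td>
    </tr>
    <tr>
      <td width="20%"><h6>Player B</h6></td>
      <td><h6>20</h6><span>Matches</span></td>
      <td><h6>18</h6><span>Inns</span></td>
    </tr>
  </table>
</div>
"""


def collect_stats(html):
    """Group every <h6> value under its <span> label in one pass."""
    soup = BeautifulSoup(html, "html.parser")
    stats = defaultdict(list)
    for td in soup.select("div.stats-data-table-player td"):
        value = td.find("h6")
        label = td.find("span")
        # Name cells have an <h6> but no <span>, so they are skipped here.
        if value is not None and label is not None:
            stats[label.get_text(strip=True)].append(value.get_text(strip=True))
    return dict(stats)


stats = collect_stats(SAMPLE_HTML)
print(stats)
```

Collecting everything from one parsed page means the site is requested once, and all lists come from the same snapshot of the data.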

In [13]:

url = "https://www.bcci.tv/international/men/stats/odi"   
headers = {"User-Agent": "Mozilla/5.0"}

res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Extract the <h6> elements that hold player names (inside the 20%-width <td> cells)
player_tags = soup.select('div.stats-data-table-player td[width="20%"] h6')

# Store names in a list
player_names = [tag.get_text(strip=True) for tag in player_tags]

print(player_names)
print(f"\nTotal players found: {len(player_names)}")

['Virat Kohli', 'Rohit Sharma', 'Sourav Ganguly', 'Rahul Dravid', 'MS Dhoni', 'Mohammad Azharuddin', 'Yuvraj Singh', 'Virender Sehwag', 'Shikhar Dhawan', 'Suresh Raina', 'Ajay Jadeja', 'Gautam Gambhir', 'Navjot Sidhu', 'Kris Srikkanth', 'Kapil Dev', 'Dilip Vengsarkar', 'Ravi Shastri', 'Sunil Gavaskar', 'KL Rahul', 'Ajinkya Rahane', 'Shreyas Iyer', 'Shubman Gill', 'Ravindra Jadeja', 'Mohammad Kaif', 'Vinod Kambli', 'VVS Laxman', 'Sanjay Manjrekar', 'Mohinder Amarnath', 'Hardik Pandya', 'Manoj Prabhakar', 'Dinesh Karthik', 'Ambati Rayudu', 'Irfan Pathan', 'Kedar Jadhav', 'Nayan Mongia', 'Ajit Agarkar', 'Harbhajan Singh', 'Dinesh Mongia', 'Sandeep Patil', 'Anil Kumble', 'Robin Uthappa', 'Ishan Kishan', 'Javagal Srinath', 'Yashpal Sharma', 'Rishabh Pant', 'Hemang Badani', 'Axar Patel', 'Yusuf Pathan', 'Zaheer Khan', 'Raman Lamba', 'Surya Kumar Yadav', 'Parthiv Patel', 'Ravichandran Ashwin', 'Sadagoppan Ramesh', 'Roger Binny', 'Woorkeri Raman', 'Sunil Joshi', 'Manish Pandey', 'Kiran More', 

In [14]:
player_names

['Virat Kohli',
 'Rohit Sharma',
 'Sourav Ganguly',
 'Rahul Dravid',
 'MS Dhoni',
 'Mohammad Azharuddin',
 'Yuvraj Singh',
 'Virender Sehwag',
 'Shikhar Dhawan',
 'Suresh Raina',
 'Ajay Jadeja',
 'Gautam Gambhir',
 'Navjot Sidhu',
 'Kris Srikkanth',
 'Kapil Dev',
 'Dilip Vengsarkar',
 'Ravi Shastri',
 'Sunil Gavaskar',
 'KL Rahul',
 'Ajinkya Rahane',
 'Shreyas Iyer',
 'Shubman Gill',
 'Ravindra Jadeja',
 'Mohammad Kaif',
 'Vinod Kambli',
 'VVS Laxman',
 'Sanjay Manjrekar',
 'Mohinder Amarnath',
 'Hardik Pandya',
 'Manoj Prabhakar',
 'Dinesh Karthik',
 'Ambati Rayudu',
 'Irfan Pathan',
 'Kedar Jadhav',
 'Nayan Mongia',
 'Ajit Agarkar',
 'Harbhajan Singh',
 'Dinesh Mongia',
 'Sandeep Patil',
 'Anil Kumble',
 'Robin Uthappa',
 'Ishan Kishan',
 'Javagal Srinath',
 'Yashpal Sharma',
 'Rishabh Pant',
 'Hemang Badani',
 'Axar Patel',
 'Yusuf Pathan',
 'Zaheer Khan',
 'Raman Lamba',
 'Surya Kumar Yadav',
 'Parthiv Patel',
 'Ravichandran Ashwin',
 'Sadagoppan Ramesh',
 'Roger Binny',
 'Woorkeri

In [17]:
import requests
from bs4 import BeautifulSoup

# Step 1: Get page content
url = "https://www.bcci.tv/international/men/stats/odi"
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Step 2: Find all <td> tags inside the stats section, then keep those labelled 'Matches'
td_tags = soup.select('div.stats-data-table-player td')

matches = []

for td in td_tags:
    value = td.find("h6")
    label = td.find("span")
    if label and value:
        if label.get_text(strip=True) == "Matches":
            matches.append(value.get_text(strip=True))

print(matches)
print(f"\nTotal match records found: {len(matches)}")

['305', '276', '311', '344', '350', '334', '304', '251', '167', '226', '196', '147', '136', '146', '225', '129', '150', '108', '88', '90', '73', '58', '204', '125', '104', '86', '74', '85', '94', '130', '94', '55', '120', '73', '140', '191', '236', '57', '45', '271', '46', '27', '229', '42', '31', '40', '71', '57', '200', '32', '37', '38', '116', '24', '72', '27', '69', '29', '94', '121', '37', '16', '20', '65', '25', '31', '67', '49', '34', '26', '34', '17', '47', '39', '68', '36', '12', '18', '25', '15', '23', '10', '14', '108', '12', '161', '16', '19', '114', '13', '7', '6', '15', '10', '10', '31', '13', '10', '10', '7', '11', '12', '120', '10', '35', '17', '5', '37', '13', '30', '19', '32', '6', '2', '19', '5', '8', '58', '3', '5', '89', '12', '7', '31', '5', '12', '8', '23', '75', '7', '72', '70', '2', '80', '22', '4', '47', '2', '1', '15', '4', '11', '22', '5', '5', '5', '59', '6', '3', '5', '18', '2', '53', '1', '36', '17', '4', '9', '4', '25', '8', '2', '4', '20', '10', '9', '5

In [18]:
import requests
from bs4 import BeautifulSoup

# Step 1: Get page content
url = "https://www.bcci.tv/international/men/stats/odi"
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Step 2: Find all <td> tags inside the stats section
td_tags = soup.select('div.stats-data-table-player td')

# Step 3: Extract Innings values
innings = []

for td in td_tags:
    value = td.find("h6")
    label = td.find("span")
    if label and value:
        if label.get_text(strip=True) == "Inns":
            innings.append(value.get_text(strip=True))

print(innings)
print(f"\nTotal innings records found: {len(innings)}")

['293', '268', '300', '318', '297', '308', '278', '245', '164', '194', '179', '143', '127', '145', '198', '120', '128', '102', '81', '87', '67', '58', '137', '110', '97', '83', '70', '75', '68', '98', '79', '50', '87', '52', '96', '113', '128', '51', '42', '136', '42', '24', '121', '40', '27', '36', '49', '41', '101', '31', '35', '34', '63', '24', '49', '27', '45', '24', '65', '55', '30', '14', '19', '35', '23', '27', '35', '31', '27', '17', '27', '16', '25', '26', '33', '23', '12', '14', '21', '14', '17', '9', '11', '49', '8', '63', '11', '15', '42', '9', '7', '6', '15', '10', '10', '19', '10', '8', '7', '7', '10', '11', '46', '9', '18', '11', '4', '14', '13', '16', '12', '17', '6', '2', '16', '4', '5', '20', '3', '3', '26', '8', '7', '13', '5', '7', '7', '12', '24', '6', '14', '27', '2', '28', '12', '4', '19', '2', '1', '9', '4', '8', '7', '5', '4', '3', '18', '5', '2', '4', '10', '2', '21', '1', '11', '6', '4', '5', '4', '12', '3', '2', '3', '13', '5', '6', '16', '9', '7', '11', '2'

In [19]:
import requests
from bs4 import BeautifulSoup

# Step 1: Get page content
url = "https://www.bcci.tv/international/men/stats/odi"
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Step 2: Find all <td> tags inside the stats section
td_tags = soup.select('div.stats-data-table-player td')

# Step 3: Extract "Avg" values
averages = []

for td in td_tags:
    value = td.find("h6")
    label = td.find("span")
    if label and value:
        if label.get_text(strip=True) == "Avg":
            averages.append(value.get_text(strip=True))

print(averages)
print(f"\nTotal average records found: {len(averages)}")

['57.71', '49.22', '41.02', '39.16', '50.57', '36.92', '36.55', '35.05', '44.11', '35.31', '37.47', '39.68', '37.08', '29.01', '23.79', '34.73', '29.04', '35.13', '48.31', '35.26', '47.81', '56.36', '32.62', '32.01', '32.59', '30.76', '33.23', '30.53', '32.82', '24.12', '30.20', '47.05', '23.39', '42.09', '20.19', '14.58', '13.30', '27.95', '24.51', '10.54', '25.94', '42.40', '10.63', '28.48', '33.50', '33.34', '23.18', '27.00', '12.00', '27.00', '25.76', '23.74', '16.44', '28.08', '16.12', '23.73', '17.17', '33.29', '13.09', '14.15', '20.52', '56.66', '30.46', '24.00', '19.95', '20.19', '19.09', '20.72', '15.73', '21.93', '17.84', '21.18', '17.31', '15.50', '13.90', '20.71', '26.09', '25.45', '14.15', '20.69', '23.27', '30.37', '28.75', '7.75', '31.85', '6.90', '24.22', '16.61', '9.85', '33.83', '27.57', '31.50', '13.84', '22.00', '18.33', '11.64', '22.57', '22.57', '25.50', '21.85', '18.87', '13.54', '5.64', '27.20', '10.15', '14.55', '65.00', '14.00', '9.38', '12.00', '11.80', '11.6

In [20]:
import requests
from bs4 import BeautifulSoup

# Step 1: Get page content
url = "https://www.bcci.tv/international/men/stats/odi"
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Step 2: Get all <td> tags from the stats table
td_tags = soup.select('div.stats-data-table-player td')

# Initialize empty lists
strike_rates = []
highest_scores = []
fours = []
sixes = []

# Step 3: Loop through and extract based on <span> labels
for td in td_tags:
    value = td.find("h6")
    label = td.find("span")
    if label and value:
        label_text = label.get_text(strip=True)
        value_text = value.get_text(strip=True)

        if label_text == "SR":
            strike_rates.append(value_text)
        elif label_text == "HS":
            highest_scores.append(value_text)
        elif label_text == "4’s":
            fours.append(value_text)
        elif label_text == "6’s":
            sixes.append(value_text)

# Step 4: Display outputs
print("Strike Rates:", strike_rates[:10])
print("Highest Scores:", highest_scores[:10])
print("Fours:", fours[:10])
print("Sixes:", sixes[:10])

print(f"\nTotal SR records: {len(strike_rates)}")
print(f"Total HS records: {len(highest_scores)}")
print(f"Total 4's records: {len(fours)}")
print(f"Total 6's records: {len(sixes)}")

Strike Rates: ['93.26', '92.66', '73.70', '71.23', '87.56', '74.02', '87.67', '104.33', '91.35', '93.5']
Highest Scores: ['183', '264', '183', '153', '183', '153', '150', '219', '143', '116']
Fours: ['1332', '1066', '1122', '950', '826', '622', '908', '1132', '842', '476']
Sixes: ['152', '349', '190', '42', '229', '77', '155', '136', '79', '120']

Total SR records: 236
Total HS records: 236
Total 4's records: 236
Total 6's records: 236


In [21]:
import requests
from bs4 import BeautifulSoup

# Step 1: Get page content
url = "https://www.bcci.tv/international/men/stats/odi"
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Step 2: Select all <td> blocks from the player stats table
td_tags = soup.select('div.stats-data-table-player td')

# Step 3: Initialize lists
fifties = []
hundreds = []

# Step 4: Loop through and extract based on <span> text
for td in td_tags:
    value = td.find("h6")
    label = td.find("span")
    if label and value:
        label_text = label.get_text(strip=True)
        value_text = value.get_text(strip=True)

        if label_text == "50’s":
            fifties.append(value_text)
        elif label_text == "100’s":
            hundreds.append(value_text)

# Step 5: Print results
print("50’s:", fifties[:10])
print("100’s:", hundreds[:10])

print(f"\nTotal 50’s records: {len(fifties)}")
print(f"Total 100’s records: {len(hundreds)}")

50’s: ['75', '59', '72', '83', '73', '58', '52', '38', '39', '36']
100’s: ['51', '33', '22', '12', '10', '7', '14', '15', '17', '5']

Total 50’s records: 236
Total 100’s records: 236


In [23]:
import requests
from bs4 import BeautifulSoup

# Step 1: Get page content
url = "https://www.bcci.tv/international/men/stats/odi"
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

# Step 2: Find all <td> blocks in the player stats section
td_tags = soup.select('div.stats-data-table-player td')

# Step 3: Initialize list
runs = []

# Step 4: Loop through and extract 'Runs'
for td in td_tags:
    value = td.find("h6")
    label = td.find("span")
    if label and value:
        if label.get_text(strip=True) == "Runs":
            runs.append(value.get_text(strip=True))

# Step 5: Display output
print("Runs:", runs[:15])
print(f"\nTotal Runs records found: {len(runs)}")

Runs: ['14255', '11370', '11363', '10889', '10773', '9378', '8701', '8273', '6793', '5615', '5359', '5238', '4413', '4091', '3783']

Total Runs records found: 236


In [24]:
print("player_names :", len(player_names))
print("matches      :", len(matches))
print("innings      :", len(innings))
print("averages     :", len(averages))
print("strike_rates :", len(strike_rates))
print("highest_scores:", len(highest_scores))
print("fours        :", len(fours))
print("sixes        :", len(sixes))
print("fifties      :", len(fifties))
print("hundreds     :", len(hundreds))
print("runs         :", len(runs))

player_names : 236
matches      : 236
innings      : 236
averages     : 236
strike_rates : 236
highest_scores: 236
fours        : 236
sixes        : 236
fifties      : 236
hundreds     : 236
runs         : 236
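
The length check above can also be enforced in code, so that a mismatch stops the run and names the offending list before the DataFrame is built. A small sketch with toy data; `check_equal_lengths` is a hypothetical helper, not part of the notebook.

```python
def check_equal_lengths(columns):
    """Return the shared length, or raise ValueError naming the lists that differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    expected = next(iter(lengths.values()), 0)
    mismatched = {name: n for name, n in lengths.items() if n != expected}
    if mismatched:
        raise ValueError(f"expected {expected} rows, got {mismatched}")
    return expected


# Toy data standing in for the scraped lists.
columns = {"Player": ["A", "B"], "Matches": ["10", "20"], "Runs": ["300", "450"]}
print(check_equal_lengths(columns))  # 2
```

In the notebook, the same call could take a dict of the eleven scraped lists keyed by column name.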


In [25]:
import pandas as pd

# --- Step 5: Build DataFrame for Men’s ODI Stats ---
df_menn = pd.DataFrame({
    "Player": player_names,
    "Matches": matches,
    "Innings": innings,
    "Average": averages,
    "SR": strike_rates,
    "HS": highest_scores,
    "4s": fours,
    "6s": sixes,
    "50s": fifties,
    "100s": hundreds,
    "Runs": runs
})

# --- Step 6: Display & Save ---
print(df_menn.head())
print("\nTotal rows in df_men:", len(df_menn))

# Save to CSV for merging later
df_menn.to_csv("bcci_men_odi_stats.csv", index=False)

           Player Matches Innings Average     SR   HS    4s   6s 50s 100s  \
0     Virat Kohli     305     293   57.71  93.26  183  1332  152  75   51   
1    Rohit Sharma     276     268   49.22  92.66  264  1066  349  59   33   
2  Sourav Ganguly     311     300   41.02  73.70  183  1122  190  72   22   
3    Rahul Dravid     344     318   39.16  71.23  153   950   42  83   12   
4        MS Dhoni     350     297   50.57  87.56  183   826  229  73   10   

    Runs  
0  14255  
1  11370  
2  11363  
3  10889  
4  10773  

Total rows in df_men: 236
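
Every scraped value is a string, so the columns above hold text rather than numbers. Before any arithmetic or sorting they would need converting; one possible approach is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN`. The toy frame and the `to_numeric_columns` helper are assumptions for illustration.

```python
import pandas as pd


def to_numeric_columns(df, skip=("Player",)):
    """Return a copy with every column except those in `skip` converted to numbers."""
    out = df.copy()
    for col in out.columns:
        if col not in skip:
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Toy rows shaped like the scraped data (all values are strings).
raw = pd.DataFrame({
    "Player": ["A", "B"],
    "Matches": ["10", "20"],
    "Average": ["41.50", "-"],  # "-" stands in for a missing value
})
clean = to_numeric_columns(raw)
print(clean.dtypes)
```

Converting on a copy leaves the original strings intact, which helps when checking what a coerced `NaN` looked like on the page.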


In [None]:
# Sachin Tendulkar's row is added manually: the website had issues
# and the scraper could not pick up the first record.
# After adding the row, I saved it along with the men's details into one CSV file.

In [26]:
import pandas as pd

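# Create a new row (Sachin Tendulkar's data)
# NOTE: these values match Tendulkar's Test career figures (200 matches, 15,921 runs),
# not his ODI record; check them against the source before relying on this row.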
sachin_row = pd.DataFrame([{
    "Player": "Sachin Tendulkar",
    "Matches": 200,
    "Innings": 329,
    "Average": 53.78,
    "SR": 54.09,
    "HS": 248,
    "4s": 2058,
    "6s": 69,
    "50s": 68,
    "100s": 51,
    "Runs": 15921
}])

# Add it above existing df_menn
df_menn = pd.concat([sachin_row, df_menn], ignore_index=True)

print(df_menn.head())

             Player Matches Innings Average     SR   HS    4s   6s 50s 100s  \
0  Sachin Tendulkar     200     329   53.78  54.09  248  2058   69  68   51   
1       Virat Kohli     305     293   57.71  93.26  183  1332  152  75   51   
2      Rohit Sharma     276     268   49.22  92.66  264  1066  349  59   33   
3    Sourav Ganguly     311     300   41.02  73.70  183  1122  190  72   22   
4      Rahul Dravid     344     318   39.16  71.23  153   950   42  83   12   

    Runs  
0  15921  
1  14255  
2  11370  
3  11363  
4  10889  
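
The save step mentioned in the note before this cell is not shown in the notebook. A minimal sketch of prepending a row and writing the result without the index column, using toy values and an in-memory buffer in place of a real file (both are assumptions):

```python
import io

import pandas as pd

# Toy frame standing in for df_menn.
df = pd.DataFrame({"Player": ["B", "C"], "Runs": [200, 100]})
new_row = pd.DataFrame([{"Player": "A", "Runs": 300}])

# Put the new row first and renumber the index from 0.
combined = pd.concat([new_row, df], ignore_index=True)

# index=False keeps the row numbers out of the file, so reading it back
# does not produce an extra "Unnamed: 0" column.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
print(buffer.getvalue())
```

With a real file, the buffer would be replaced by a path such as the `bcci_men_odi_stats.csv` name used earlier.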


In [77]:
#del df_menn

In [27]:
df_menn.head()

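| | Player | Matches | Innings | Average | SR | HS | 4s | 6s | 50s | 100s | Runs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sachin Tendulkar | 200 | 329 | 53.78 | 54.09 | 248 | 2058 | 69 | 68 | 51 | 15921 |
| 1 | Virat Kohli | 305 | 293 | 57.71 | 93.26 | 183 | 1332 | 152 | 75 | 51 | 14255 |
| 2 | Rohit Sharma | 276 | 268 | 49.22 | 92.66 | 264 | 1066 | 349 | 59 | 33 | 11370 |
| 3 | Sourav Ganguly | 311 | 300 | 41.02 | 73.7 | 183 | 1122 | 190 | 72 | 22 | 11363 |
| 4 | Rahul Dravid | 344 | 318 | 39.16 | 71.23 | 153 | 950 | 42 | 83 | 12 | 10889 |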
