### 📌 Source of Women's Player Statistics (BCCI)

The data for **Indian Women's ODI players** has been scraped from the official BCCI website:

🔗 **URL:** https://www.bcci.tv/international/women/stats/odi

This page provides detailed statistics for Indian women's international cricketers, which were extracted using `Python`, `Requests`, and `BeautifulSoup` in this notebook.

### 📥 Extracting Women's Player Data

In this notebook, I extracted the **Indian Women's ODI player statistics** using **BeautifulSoup**.  
The HTML content from the BCCI website was parsed to collect important player information such as:

- Player name
- Matches and innings
- Batting average and strike rate
- Highest score
- Fours and sixes
- Fifties and hundreds
- Total runs

BeautifulSoup's CSS selectors were used to locate the player-name cells and the value/label pair inside each stat cell.
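
To make the selector logic easier to follow, here is a minimal sketch that runs the same selectors on a small, hypothetical HTML fragment. The fragment is an assumption about the page layout (a name cell with `width="20%"` and stat cells that pair an `<h6>` value with a `<span>` label); the real BCCI markup may differ, and `sample_html`, `sample_names`, and `sample_stats` are illustrative names only.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the layout the selectors expect;
# the real BCCI markup may differ.
sample_html = """
<div class="stats-data-table-player">
  <table><tr>
    <td width="20%"><h6>Player One</h6></td>
    <td><h6>117</h6><span>Matches</span></td>
    <td><h6>48.38</h6><span>Avg</span></td>
  </tr></table>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Name cells: the td with width="20%" holds the player name in an <h6>.
sample_names = [
    h6.get_text(strip=True)
    for h6 in sample_soup.select('div.stats-data-table-player td[width="20%"] h6')
]

# Stat cells: each td pairs a value (<h6>) with a label (<span>).
# The name cell has no <span>, so it is skipped here.
sample_stats = {}
for td in sample_soup.select("div.stats-data-table-player td"):
    value_tag, label_tag = td.find("h6"), td.find("span")
    if value_tag and label_tag:
        sample_stats[label_tag.get_text(strip=True)] = value_tag.get_text(strip=True)

print(sample_names)
print(sample_stats)
```

Testing selectors on a small fragment like this first makes it easier to see which cells each selector matches before running them against the live page.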

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# --- Step 1: fetch page ---
url = "https://www.bcci.tv/international/women/stats/odi"   # or men's URL if needed
headers = {"User-Agent": "Mozilla/5.0"}
res = requests.get(url, headers=headers, timeout=30)
res.raise_for_status()   # stop early if the page could not be fetched
soup = BeautifulSoup(res.text, "html.parser")

# --- Step 2: extract player names (normal loop) ---
player_tags = soup.select('div.stats-data-table-player td[width="20%"] h6')

player_names = []
for tag in player_tags:
    name = tag.get_text(strip=True)
    player_names.append(name)

print("Players found:", len(player_names))

# --- Step 3: extract all stat td blocks ---
td_tags = soup.select('div.stats-data-table-player td')

# initialize lists
matches = []
innings = []
averages = []
strike_rates = []
highest_scores = []
fours = []
sixes = []
fifties = []
hundreds = []
runs = []

# loop and append to respective lists
for td in td_tags:
    value_tag = td.find("h6")
    label_tag = td.find("span")
    if not (value_tag and label_tag):
        continue
    label = label_tag.get_text(strip=True)
    value = value_tag.get_text(strip=True)

    if label == "Matches":
        matches.append(value)
    elif label == "Inns":
        innings.append(value)
    elif label == "Avg":
        averages.append(value)
    elif label == "SR":
        strike_rates.append(value)
    elif label == "HS":
        highest_scores.append(value)
    elif label in ("4’s", "4's", "4s"):   # tolerate curly or straight apostrophes
        fours.append(value)
    elif label in ("6’s", "6's", "6s"):
        sixes.append(value)
    elif label in ("50’s", "50's", "50s"):
        fifties.append(value)
    elif label in ("100’s", "100's", "100s"):
        hundreds.append(value)
    elif label == "Runs":
        runs.append(value)

Players found: 134


In [2]:
len(player_names)

134

In [3]:
# --- Step 4: check that every list has one entry per player ---
print("player_names :", len(player_names))
print("matches      :", len(matches))
print("innings      :", len(innings))
print("averages     :", len(averages))
print("strike_rates :", len(strike_rates))
print("highest_scores:", len(highest_scores))
print("fours        :", len(fours))
print("sixes        :", len(sixes))
print("fifties      :", len(fifties))
print("hundreds     :", len(hundreds))
print("runs         :", len(runs))

player_names : 134
matches      : 134
innings      : 134
averages     : 134
strike_rates : 134
highest_scores: 134
fours        : 134
sixes        : 134
fifties      : 134
hundreds     : 134
runs         : 134
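
The printed counts above are compared by eye. As a hedged alternative, the same check can be automated so the notebook stops if any list is short. `check_equal_lengths` is a hypothetical helper, not part of the original notebook, and the call below uses toy data rather than the scraped lists.

```python
def check_equal_lengths(columns):
    """Return the shared length of the lists in `columns`.

    Raises ValueError if the lists differ in length.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Illustrative call with toy data; in the notebook one would pass
# {"Player": player_names, "Matches": matches, ...} instead.
n_rows = check_equal_lengths({"Player": ["A", "B"], "Matches": ["10", "20"]})
print(n_rows)
```

Equal lengths only show that no list is missing entries. They do not prove that each value belongs to the right player, so spot-checking a few rows against the website is still worthwhile.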


In [6]:
# --- Step 5: build DataFrame ---
df_women = pd.DataFrame({
    "Player": player_names,
    "Matches": matches,
    "Innings": innings,
    "Average": averages,
    "SR": strike_rates,
    "HS": highest_scores,
    "4s": fours,
    "6s": sixes,
    "50s": fifties,
    "100s": hundreds,
    "Runs": runs
})

print(df_women.head())
print("\nTotal rows in df:", len(df_women))

             Player Matches Innings Average     SR   HS   4s  6s 50s 100s  \
0   Smriti Mandhana     117     117   48.38  90.52  136  642  74  34   14   
1  Harmanpreet Kaur     161     140   37.05  77.06  171  441  56  22    7   
2      Anjum Chopra     127     112   31.38      -  100    -   -  18    1   
3     Deepti Sharma     121     103   37.01  70.64  188  261  19  18    1   
4        Punam Raut      73      73   34.83  58.26  109  261   6  15    3   

   Runs  
0  5322  
1  4409  
2  2856  
3  2739  
4  2299  

Total rows in df: 134
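
All scraped values are strings, and missing statistics appear as `-` (for example, the `SR` value for Anjum Chopra above). A minimal sketch of converting the stat columns to numbers follows. `to_numeric_stats` is a hypothetical helper, and stripping a trailing `*` is an assumption for sites that mark not-out scores that way; the output above does not show such markers.

```python
import pandas as pd


def to_numeric_stats(df, skip=("Player",)):
    """Return a copy with every column except those in `skip` converted to numbers."""
    out = df.copy()
    for col in out.columns:
        if col in skip:
            continue
        # Assumption: a trailing "*" (not-out marker) may be present; strip it.
        cleaned = out[col].astype(str).str.rstrip("*")
        # Anything non-numeric, including "-", becomes NaN.
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Toy data for illustration; in the notebook one would pass df_women.
demo = pd.DataFrame({"Player": ["A", "B"], "SR": ["90.52", "-"], "HS": ["136", "100*"]})
demo_numeric = to_numeric_stats(demo)
print(demo_numeric.dtypes)
```

Converting before analysis lets sorting and aggregation work on numbers rather than on text, and keeps missing values explicit as `NaN`.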


### 📝 Note on Manual Addition (Mithali Raj)

Due to scraping issues on the BCCI website, the first record (Mithali Raj) could not be extracted automatically.
To maintain dataset completeness, I manually created a row containing Mithali Raj's ODI career statistics and added it to the top of the existing `df_women` DataFrame.

After adding the row, the final dataset was saved as:

**`bcci_women_odi_stats.csv`**

This ensures that the women's ODI dataset includes all key players with accurate records.

In [7]:
import pandas as pd

mithali_row = pd.DataFrame([{
    "Player": "Mithali Raj",
    "Matches": 232,
    "Innings": 211,
    "Average": 50.68,
    "SR": 67.54,
    "HS": 125,
    "4s": 805,
    "6s": 19,
    "50s": 64,
    "100s": 7,
    "Runs": 7805
}])


df_women = pd.concat([mithali_row, df_women], ignore_index=True)


print(df_women.head(5))
print(f"Total rows now: {len(df_women)}")


df_women.to_csv("bcci_women_odi_stats.csv", index=False)

             Player Matches Innings Average     SR   HS   4s  6s 50s 100s  \
0       Mithali Raj     232     211   50.68  67.54  125  805  19  64    7   
1   Smriti Mandhana     117     117   48.38  90.52  136  642  74  34   14   
2  Harmanpreet Kaur     161     140   37.05  77.06  171  441  56  22    7   
3      Anjum Chopra     127     112   31.38      -  100    -   -  18    1   
4     Deepti Sharma     121     103   37.01  70.64  188  261  19  18    1   

   Runs  
0  7805  
1  5322  
2  4409  
3  2856  
4  2739  
Total rows now: 135
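
Re-running the cell above would prepend the manual row a second time. A hedged sketch of a guard against that is shown below; `prepend_if_missing` is a hypothetical helper that is not in the original notebook, and the example uses toy data.

```python
import pandas as pd


def prepend_if_missing(df, row, key="Player"):
    """Return df with `row` (a dict) as the first row, unless its key value already exists."""
    if (df[key] == row[key]).any():
        return df
    return pd.concat([pd.DataFrame([row]), df], ignore_index=True)


# Toy data for illustration; in the notebook one would pass df_women
# and the dictionary of manually entered statistics.
demo = pd.DataFrame({"Player": ["A"], "Runs": [10]})
once = prepend_if_missing(demo, {"Player": "B", "Runs": 20})
twice = prepend_if_missing(once, {"Player": "B", "Runs": 20})
print(len(once), len(twice))
```

Because the second call finds the player already present, the row count stays the same however many times the cell is executed.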


In [8]:
print(f"Total rows of women: {len(df_women)}")

Total rows of women: 135
