# CBIC GST Goods Rates Scraper
This notebook scrapes the GST rates for goods (CGST, SGST/UTGST, IGST, and Compensation Cess) from the official [CBIC GST website](https://cbic-gst.gov.in/gst-goods-services-rates.html) and saves them to a CSV file.

It uses the Python libraries requests, BeautifulSoup, and pandas.

### Install dependencies

In [1]:
!pip install requests beautifulsoup4 pandas



### Imports

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

### Helper function to clean HTML cell text

In [3]:
def clean_text(cell):
    txt = cell.get_text(separator=" ", strip=True)
    txt = txt.replace("\xa0", " ")
    return re.sub(r"\s+", " ", txt).strip()
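
The string handling inside `clean_text` can be tried without fetching any HTML. The sketch below applies the same two steps (replace non-breaking spaces, then collapse whitespace runs) to a plain string. The name `normalize_whitespace` and the sample text are assumptions made for illustration; they are not part of the notebook.

```python
import re

def normalize_whitespace(txt):
    # Hypothetical helper: the string-only part of clean_text
    txt = txt.replace("\xa0", " ")              # non-breaking space -> regular space
    return re.sub(r"\s+", " ", txt).strip()     # collapse runs of whitespace, trim the ends

# Made-up sample with a non-breaking space, a newline, and padding
sample = "  Live\xa0horses,\n   asses  "
print(normalize_whitespace(sample))  # Live horses, asses
```

In the notebook itself, `cell.get_text(separator=" ", strip=True)` runs first, so text from nested tags is already joined with spaces before this normalization is applied.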

The cell below fetches the page, finds the goods table, iterates over its rows, builds a DataFrame with named columns, converts the rate columns to numbers, and saves the result to CSV.

In [4]:
url = "https://cbic-gst.gov.in/gst-goods-services-rates.html"
# A timeout keeps the request from hanging indefinitely if the server stalls
resp = requests.get(url, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

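# Locate the goods table by its id and fail early if the page layout has changed
table = soup.find("table", id="goods_table")
if table is None or table.tbody is None:
    raise RuntimeError("Could not find the table with id 'goods_table' (or its <tbody>)")

records = []
for tr in table.tbody.find_all("tr"):
    tds = tr.find_all("td")
    # Keep only rows with exactly 8 cells, matching the 8 column names below
    if len(tds) != 8:
        continue
    vals = [clean_text(td) for td in tds]
    records.append(vals)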

cols = [
    "Schedules",
    "S.No.",
    "Chapter/Heading/Sub-heading/Tariffitem",
    "DescriptionofGoods",
    "CGST(%)",
    "SGST/UTGST(%)",
    "IGST(%)",
    "CompensationCess"
]
df = pd.DataFrame(records, columns=cols)

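def to_float(x):
    # Drop everything except digits and dots (e.g. a "%" sign); empty results become None
    x = re.sub(r"[^0-9.]", "", x)
    return float(x) if x else None

# Convert the three rate columns from text to numbers
for rate_col in ["CGST(%)", "SGST/UTGST(%)", "IGST(%)"]:
    df[rate_col] = df[rate_col].apply(to_float)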

df.to_csv("cbic_gst_goods_rates_exact.csv", index=False)
print(f"Extracted {len(df)} rows → cbic_gst_goods_rates_exact.csv")

Extracted 1850 rows → cbic_gst_goods_rates_exact.csv
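
To see how `to_float` treats different cell contents, the sketch below runs it on a few illustrative strings. These inputs are made up and are not values taken from the CBIC table. It also shows one possible defensive variant, `first_number`, which is a hypothetical helper and not part of the notebook: it parses only the first numeric token, so a cell holding two rates would not be concatenated into an unparseable string.

```python
import re

def to_float(x):
    # Same logic as the notebook cell: keep digits and dots, then parse
    x = re.sub(r"[^0-9.]", "", x)
    return float(x) if x else None

def first_number(x):
    # Hypothetical alternative: parse only the first numeric token in the cell
    m = re.search(r"\d+(?:\.\d+)?", x)
    return float(m.group()) if m else None

# Illustrative inputs, not values taken from the CBIC table
print(to_float("2.5%"))           # 2.5
print(to_float("Nil"))            # None
print(first_number("1.5%/7.5%"))  # 1.5

try:
    to_float("1.5%/7.5%")         # digits and dots concatenate to "1.57.5"
except ValueError as exc:
    print("to_float failed:", exc)
```

Whether such a variant is needed depends on what the rate cells actually contain. The notebook run above completed with `to_float` as written.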
