# Project on BRSR

This project takes the BRSR disclosures filed in India between April 1, 2024 and March 31, 2025 and performs data analysis and visualization on them.

## Business Responsibility and Sustainability Reporting (BRSR)

BRSR is a sustainability reporting framework mandated by the Securities and Exchange Board of India (SEBI) for the top 1,000 listed companies in India. It requires companies to disclose their environmental, social, and governance (ESG) performance indicators, promoting responsible business practices and sustainable development. 

### Key aspects of BRSR

**Mandatory Reporting:**
BRSR is mandatory for the top 1,000 listed companies in India by market capitalization.

**ESG Focus:**
It focuses on disclosing a company's performance related to Environmental, Social, and Governance factors. 

**Alignment with National Guidelines:**
BRSR disclosures are aligned with the National Guidelines on Responsible Business Conduct (NGRBC). 

**Transition from BRR:**
BRSR replaces the older Business Responsibility Reporting (BRR) framework, marking a shift towards more comprehensive and quantifiable sustainability reporting.

### Objective:
The main goal is to link a company's financial performance with its ESG performance, promoting transparency and accountability. 

# BRSR disclosures on NSE website

### Fetch NSE Data

URL: https://www.nseindia.com/companies-listing/corporate-filings-bussiness-sustainabilitiy-reports

BRSR filings for the relevant period can be downloaded as a CSV file. They have been saved in the project directory as **nse_brsr_filings.csv**.

In [1]:
import pandas as pd

In [2]:
# Let's take a look at the NSE data

nse_df = pd.read_csv("nse_brsr_filings.csv")
nse_df

Unnamed: 0,COMPANY \n,FROM YEAR \n,TO YEAR \n,ATTACHMENT \n,**XBRL \n,ORIGINAL SUBMISSION DATE \n,LATEST REVISION DATE \n
0,Varun Beverages Limited,2024,2024,https://nsearchives.nseindia.com/corporate/VBL...,https://nsearchives.nseindia.com/corporate/xbr...,11-Mar-2025,-
1,Castrol India Limited,2024,2024,https://nsearchives.nseindia.com/corporate/CAS...,https://nsearchives.nseindia.com/corporate/xbr...,25-Feb-2025,25-Feb-2025
2,Cyient Limited,2023,2024,https://nsearchives.nseindia.com/corporate/CYI...,https://nsearchives.nseindia.com/corporate/xbr...,19-Feb-2025,-
3,Siemens Limited,2023,2024,https://nsearchives.nseindia.com/corporate/SIE...,https://nsearchives.nseindia.com/corporate/xbr...,14-Jan-2025,-
4,Indraprastha Gas Limited,2023,2024,https://nsearchives.nseindia.com/corporate/IGL...,https://nsearchives.nseindia.com/corporate/xbr...,17-Dec-2024,-
...,...,...,...,...,...,...,...
1176,Huhtamaki India Limited,2023,2023,https://nsearchives.nseindia.com/corporate/HUH...,https://nsearchives.nseindia.com/corporate/xbr...,30-Apr-2024,-
1177,Sanofi India Limited,2023,2023,https://nsearchives.nseindia.com/corporate/SAN...,https://nsearchives.nseindia.com/corporate/xbr...,23-Apr-2024,-
1178,Transformers and Rectifiers (India) Limited,2023,2024,https://nsearchives.nseindia.com/corporate/TRI...,https://nsearchives.nseindia.com/corporate/xbr...,20-Apr-2024,-
1179,RAIN INDUSTRIES LIMITED,2023,2023,https://nsearchives.nseindia.com/corporate/RAI...,https://nsearchives.nseindia.com/corporate/xbr...,18-Apr-2024,-


### Clean NSE data

In [3]:
# Some basic cleaning
nse_df.rename(columns={
    "COMPANY \n": "Company",
    "FROM YEAR \n": "FromYear",
    "TO YEAR \n": "ToYear",
    "ATTACHMENT \n": "PDFURL",
    "**XBRL \n": "XBRLURL",
    "ORIGINAL SUBMISSION DATE \n": "SubmissionDate",
    "LATEST REVISION DATE \n": "RevisionDate"
}, inplace=True)

# Let's replace '-' with an empty string and convert dates to YYYY-MM-DD format
nse_df['RevisionDate'] = nse_df['RevisionDate'].replace("-", "").apply(
    lambda x: pd.to_datetime(x, format="%d-%b-%Y").strftime("%Y-%m-%d") if x else x
)

nse_df['SubmissionDate'] = nse_df['SubmissionDate'].replace("-", "").apply(
    lambda x: pd.to_datetime(x, format="%d-%b-%Y").strftime("%Y-%m-%d") if x else x
)

# Count empty values in the date columns
empty_submission = (nse_df['SubmissionDate'] == "").sum()
empty_revision = (nse_df['RevisionDate'] == "").sum()

# A notebook cell displays only its last expression, so the output below is empty_revision
empty_submission
empty_revision


1157
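
The same date cleanup can also be sketched in vectorized form. This is only an illustration on a made-up three-row frame (`sample` and `to_iso` are hypothetical names, not part of the notebook), and it assumes the placeholder for a missing date is always a bare `-`. `pd.to_datetime` with `errors="coerce"` turns anything that does not match the format into `NaT`, which is then written out as an empty string.

```python
import pandas as pd

# Hypothetical sample mimicking the two NSE date columns
sample = pd.DataFrame({
    "SubmissionDate": ["11-Mar-2025", "25-Feb-2025", "19-Feb-2025"],
    "RevisionDate": ["-", "25-Feb-2025", "-"],
})

def to_iso(dates: pd.Series) -> pd.Series:
    """Convert DD-Mon-YYYY strings to YYYY-MM-DD; unparseable values become ''."""
    parsed = pd.to_datetime(dates, format="%d-%b-%Y", errors="coerce")
    return parsed.dt.strftime("%Y-%m-%d").fillna("")

for col in ["SubmissionDate", "RevisionDate"]:
    sample[col] = to_iso(sample[col])

# Number of filings that were never revised
empty_revision = int((sample["RevisionDate"] == "").sum())
print(sample)
print(empty_revision)
```

One caveat of this variant: `errors="coerce"` also hides genuinely malformed dates, so it is worth checking the count of empty values against the number of `-` placeholders in the raw file.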

In [4]:
nse_df

Unnamed: 0,Company,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,RevisionDate
0,Varun Beverages Limited,2024,2024,https://nsearchives.nseindia.com/corporate/VBL...,https://nsearchives.nseindia.com/corporate/xbr...,2025-03-11,
1,Castrol India Limited,2024,2024,https://nsearchives.nseindia.com/corporate/CAS...,https://nsearchives.nseindia.com/corporate/xbr...,2025-02-25,2025-02-25
2,Cyient Limited,2023,2024,https://nsearchives.nseindia.com/corporate/CYI...,https://nsearchives.nseindia.com/corporate/xbr...,2025-02-19,
3,Siemens Limited,2023,2024,https://nsearchives.nseindia.com/corporate/SIE...,https://nsearchives.nseindia.com/corporate/xbr...,2025-01-14,
4,Indraprastha Gas Limited,2023,2024,https://nsearchives.nseindia.com/corporate/IGL...,https://nsearchives.nseindia.com/corporate/xbr...,2024-12-17,
...,...,...,...,...,...,...,...
1176,Huhtamaki India Limited,2023,2023,https://nsearchives.nseindia.com/corporate/HUH...,https://nsearchives.nseindia.com/corporate/xbr...,2024-04-30,
1177,Sanofi India Limited,2023,2023,https://nsearchives.nseindia.com/corporate/SAN...,https://nsearchives.nseindia.com/corporate/xbr...,2024-04-23,
1178,Transformers and Rectifiers (India) Limited,2023,2024,https://nsearchives.nseindia.com/corporate/TRI...,https://nsearchives.nseindia.com/corporate/xbr...,2024-04-20,
1179,RAIN INDUSTRIES LIMITED,2023,2023,https://nsearchives.nseindia.com/corporate/RAI...,https://nsearchives.nseindia.com/corporate/xbr...,2024-04-18,


It appears that the companies filing between 01-04-2024 and 31-03-2025 reported on different periods. The three permissible periods are:
1. Calendar year 2023 (Jan to Dec 2023, report submitted by June 2024)
2. Financial year 2023-24 (April 2023 to March 2024, report submitted before Sept 2024)
3. Calendar year 2024 (Jan to Dec 2024, report submitted by March 2025)

Let's count the number of reports in these buckets to get a better idea.

In [5]:
count_2023_2023 = nse_df[(nse_df["FromYear"] == 2023) & (nse_df["ToYear"] == 2023)].shape[0]

count_2023_2024 = nse_df[(nse_df["FromYear"] == 2023) & (nse_df["ToYear"] == 2024)].shape[0]

count_2024_2024 = nse_df[(nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2024)].shape[0]

# Rows that fit none of the three expected periods
count_neither = nse_df[~((nse_df["FromYear"] == 2023) & (nse_df["ToYear"].isin([2023, 2024]))) & 
                       ~((nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2024))]

print(f"Rows where FromYear is 2023 and ToYear is 2023: {count_2023_2023}")
print(f"Rows where FromYear is 2023 and ToYear is 2024: {count_2023_2024}")
print(f"Rows where FromYear is 2024 and ToYear is 2024: {count_2024_2024}")
print(f"Rows where neither condition fits: {count_neither.shape[0]}")


Rows where FromYear is 2023 and ToYear is 2023: 10
Rows where FromYear is 2023 and ToYear is 2024: 1167
Rows where FromYear is 2024 and ToYear is 2024: 2
Rows where neither condition fits: 2
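
A more compact way to get the same overview is to count every (FromYear, ToYear) pair in one pass and flag the pairs that are not among the three permissible periods. The sketch below runs on a small made-up frame; `sample`, `expected`, and the `Reports` and `Expected` column names are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical sample with one unexpected period (2024 to 2025)
sample = pd.DataFrame({
    "FromYear": [2023, 2023, 2023, 2024, 2024],
    "ToYear":   [2023, 2024, 2024, 2024, 2025],
})

# Count the reports in every (FromYear, ToYear) bucket
period_counts = (
    sample.groupby(["FromYear", "ToYear"])
    .size()
    .rename("Reports")
    .reset_index()
)

# The three permissible reporting periods
expected = {(2023, 2023), (2023, 2024), (2024, 2024)}
pairs = [
    (int(f), int(t))
    for f, t in zip(period_counts["FromYear"], period_counts["ToYear"])
]
period_counts["Expected"] = [pair in expected for pair in pairs]

print(period_counts)
```

Any row with `Expected` equal to `False` is a candidate for manual checking, and new unexpected combinations show up without having to write another condition.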


The cases where none of the conditions are true seem to be erroneous. Let's open them up.

In [6]:
neither_fits_df = nse_df[~((nse_df["FromYear"] == 2023) & (nse_df["ToYear"].isin([2023, 2024]))) & 
                         ~((nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2024))]

# Display full column without truncation
pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.max_colwidth', None)  # Ensure full column width is displayed

neither_fits_df


Unnamed: 0,Company,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,RevisionDate
18,ITI LIMITED,2024,2025,https://nsearchives.nseindia.com/corporate/ITI_10102024182155_SE_Intimation_on_BRSR.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1265258_10102024062627_WEB.xml,2024-10-10,
512,Equitas Small Finance Bank Limited,2024,2025,https://nsearchives.nseindia.com/corporate/EQUITASBNK_13082024202258_ESFBBRSRIntimationFY2324.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1224424_13082024083934_WEB.xml,2024-08-13,


Upon checking manually, both of these filings are for FY 2023-24, so we correct the years in the DataFrame.

In [7]:
# Modify rows where FromYear = 2024 and ToYear = 2025
nse_df.loc[(nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2025), ["FromYear", "ToYear"]] = [2023, 2024]

In [8]:
neither_fits_df = nse_df[~((nse_df["FromYear"] == 2023) & (nse_df["ToYear"].isin([2023, 2024]))) & 
                         ~((nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2024))]

# Display full column without truncation
pd.set_option('display.max_rows', None)  # Show all rows
pd.set_option('display.max_colwidth', None)  # Ensure full column width is displayed

neither_fits_df

Unnamed: 0,Company,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,RevisionDate


In [9]:
count_2023_2023 = nse_df[(nse_df["FromYear"] == 2023) & (nse_df["ToYear"] == 2023)].shape[0]

count_2023_2024 = nse_df[(nse_df["FromYear"] == 2023) & (nse_df["ToYear"] == 2024)].shape[0]

count_2024_2024 = nse_df[(nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2024)].shape[0]

# Rows that fit none of the three expected periods
count_neither = nse_df[~((nse_df["FromYear"] == 2023) & (nse_df["ToYear"].isin([2023, 2024]))) & 
                       ~((nse_df["FromYear"] == 2024) & (nse_df["ToYear"] == 2024))]

print(f"Rows where FromYear is 2023 and ToYear is 2023: {count_2023_2023}")
print(f"Rows where FromYear is 2023 and ToYear is 2024: {count_2023_2024}")
print(f"Rows where FromYear is 2024 and ToYear is 2024: {count_2024_2024}")
print(f"Rows where neither condition fits: {count_neither.shape[0]}")

Rows where FromYear is 2023 and ToYear is 2023: 10
Rows where FromYear is 2023 and ToYear is 2024: 1169
Rows where FromYear is 2024 and ToYear is 2024: 2
Rows where neither condition fits: 0


### Remove duplicates

In [10]:
# Count duplicates in 'Company' column
duplicate_counts = nse_df['Company'].value_counts()
duplicates = duplicate_counts[duplicate_counts > 1]

# Display duplicate companies and their counts
print("Duplicate values in 'Company' column:")
print(duplicates)


Duplicate values in 'Company' column:
Company
Jain Irrigation Systems Limited    2
Tata Motors Limited                2
Name: count, dtype: int64


In [11]:
# Identify duplicate companies and display their rows
duplicate_companies_df = nse_df[nse_df.duplicated(subset=['Company'], keep=False)]

# Display the full DataFrame with duplicate 'Company' values
print(duplicate_companies_df)

                              Company  FromYear  ToYear  \
536   Jain Irrigation Systems Limited      2023    2024   
537   Jain Irrigation Systems Limited      2023    2024   
1139              Tata Motors Limited      2023    2024   
1140              Tata Motors Limited      2023    2024   

                                                                                            PDFURL  \
536    https://nsearchives.nseindia.com/corporate/JISLDVREQS_09082024173911_Jains_BRSR_2023_24.pdf   
537    https://nsearchives.nseindia.com/corporate/JISLJALEQS_09082024173541_Jains_BRSR_2023_24.pdf   
1139    https://nsearchives.nseindia.com/corporate/TATAMTRDVRB_31052024160028_NSEBSELETTERBRSR.pdf   
1140  https://nsearchives.nseindia.com/corporate/TATAMOTORSSJS_31052024155411_NSEBSELETTERBRSR.pdf   

                                                                                  XBRLURL  \
536   https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1219094_09082024053948_WEB.xml   
537   h

Each pair consists of two disclosures filed by the same company on the same date. We delete the older row of each pair manually and keep the newer one, which is the upper row because NSE sorts the original CSV file in descending order of date and time.

In [12]:
urls_to_delete = [
    "https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1219073_09082024053555_WEB.xml", 
    "https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1150389_31052024035442_WEB.xml"
    ]

# Remove rows where XBRLURL matches the given values
nse_df = nse_df[~nse_df["XBRLURL"].isin(urls_to_delete)]

nse_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1179 entries, 0 to 1180
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Company         1179 non-null   object
 1   FromYear        1179 non-null   int64 
 2   ToYear          1179 non-null   int64 
 3   PDFURL          1179 non-null   object
 4   XBRLURL         1179 non-null   object
 5   SubmissionDate  1179 non-null   object
 6   RevisionDate    1179 non-null   object
dtypes: int64(2), object(5)
memory usage: 73.7+ KB
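
If the newest-first ordering of the NSE file can be relied on, the same result can be sketched without hard-coding URLs. The example below uses a made-up frame (`sample` and its file names are hypothetical) and keeps the first, that is the newest, row per company. The trailing `.copy()` makes the result an independent DataFrame, which avoids chained-assignment warnings in later cells.

```python
import pandas as pd

# Hypothetical sample, sorted newest first like the NSE download
sample = pd.DataFrame({
    "Company": ["Alpha Limited", "Beta Limited", "Beta Limited", "Gamma Limited"],
    "XBRLURL": ["a.xml", "b_new.xml", "b_old.xml", "c.xml"],
})

# Keep the first (newest) filing of each company
deduped = sample.drop_duplicates(subset="Company", keep="first").copy()

# Record what was dropped so the removal can be reviewed
dropped = sample.loc[~sample.index.isin(deduped.index), "XBRLURL"].tolist()

print(deduped)
print(dropped)
```

Listing the dropped URLs keeps the step auditable, which matters here because the choice of which duplicate to keep rests on an assumption about sort order.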


In [13]:
print(nse_df.shape)

(1179, 7)


In [14]:
# Count duplicates in 'Company' column
duplicate_counts = nse_df['Company'].value_counts()
duplicates = duplicate_counts[duplicate_counts > 1]

# Display duplicate companies and their counts
print("Duplicate values in 'Company' column:")
print(duplicates)

Duplicate values in 'Company' column:
Series([], Name: count, dtype: int64)


In [15]:
nse_df.shape

(1179, 7)

### Clean up Company Names

In [16]:
nse_df['Company']

0                                         Varun Beverages Limited
1                                           Castrol India Limited
2                                                  Cyient Limited
3                                                 Siemens Limited
4                                        Indraprastha Gas Limited
5                                    Religare Enterprises Limited
6                                         Future Consumer Limited
7                                    JEENA SIKHO LIFECARE LIMITED
8                                            BF UTILITIES LIMITED
9                           Network18 Media & Investments Limited
10                                Procter & Gamble Health Limited
11                                         Gillette India Limited
12               Procter & Gamble Hygiene and Health Care Limited
13                          Zee Entertainment Enterprises Limited
14                                                    HMT LIMITED
15        

In [17]:
# Some names end in 'LTD.' and some are entirely uppercase or lowercase. Let's fix that.

def clean_company_name(name):
    # Convert to title case if the name is all uppercase or all lowercase
    if name.isupper() or name.islower():
        name = name.title()

    # Replace trailing 'LTD.' (case-insensitive) with 'Limited'
    if name.upper().endswith('LTD.'):
        name = name[:-4].rstrip() + ' Limited'

    return name

nse_df.loc[:, 'Company'] = nse_df['Company'].apply(clean_company_name)

In [18]:
nse_df['Company']

0                                         Varun Beverages Limited
1                                           Castrol India Limited
2                                                  Cyient Limited
3                                                 Siemens Limited
4                                        Indraprastha Gas Limited
5                                    Religare Enterprises Limited
6                                         Future Consumer Limited
7                                    Jeena Sikho Lifecare Limited
8                                            Bf Utilities Limited
9                           Network18 Media & Investments Limited
10                                Procter & Gamble Health Limited
11                                         Gillette India Limited
12               Procter & Gamble Hygiene and Health Care Limited
13                          Zee Entertainment Enterprises Limited
14                                                    Hmt Limited
15        

In [19]:
# Further Cleanup

first_word_upper = {
    'Bf Utilities Limited': 'BF Utilities Limited',
    'Hmt Limited': 'HMT Limited',
    'Iti Limited': 'ITI Limited',
    'Nmdc Steel Limited': 'NMDC Steel Limited',
    'Pb Fintech Limited': 'PB Fintech Limited',
    'Tarc Limited': 'TARC Limited',
    'Hma Agro Industries Limited': 'HMA Agro Industries Limited',
    'Nlc India Limited': 'NLC India Limited',
    'Pvr Inox Limited': 'PVR INOX Limited',
    'Gfl Limited': 'GFL Limited',
    'Mstc Limited': 'MSTC Limited',
    'Jtl Industries Limited': 'JTL Industries Limited',
    'Ncc Limited': 'NCC Limited',
    'Pcbl Limited': 'PCBL Limited',
    'Ntpc Limited': 'NTPC Limited',
    'Pi Industries Limited': 'PI Industries Limited',
    'Upl Limited': 'UPL Limited',
    'Nhpc Limited': 'NHPC Limited',
    'Kcp Limited': 'KCP Limited',
    'Eih Limited': 'EIH Limited',
    'Sis Limited': 'SIS Limited',
    'Ksb Limited': 'KSB Limited',
    'Uco Bank': 'UCO Bank',
    'Gm Breweries Limited': 'GM Breweries Limited',
    'Force Motors Ltd': 'Force Motors Limited',
    'Shree Digvijay Cement Co.Ltd': 'Shree Digvijay Cement Co. Limited',
    'Ems Limited': 'EMS Limited'
}

def fix_specific_names(name):
    name = name.strip()
    return first_word_upper.get(name, name)

nse_df.loc[:, 'Company'] = nse_df['Company'].apply(fix_specific_names)
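
The two cleaning passes could also be folded into a single function driven by a set of known acronyms instead of a full-name lookup table. The sketch below is one possible variant, not the notebook's method: `ACRONYMS` is a hypothetical, deliberately incomplete set, and the regular expression for a trailing `Ltd` or `Ltd.` is an assumption about how the abbreviation appears in the data.

```python
import re

# Hypothetical, illustrative subset of acronyms that must stay uppercase
ACRONYMS = {"BF", "HMT", "ITI", "NMDC", "UCO"}

def clean_company_name(name: str) -> str:
    name = name.strip()

    # Title-case names that are entirely uppercase or entirely lowercase
    if name.isupper() or name.islower():
        name = name.title()

    # Normalise a trailing 'Ltd' or 'Ltd.' (any case) to 'Limited'
    name = re.sub(r"\s*\bLtd\.?$", " Limited", name, flags=re.IGNORECASE)

    # Restore acronyms that title-casing turned into e.g. 'Bf' or 'Hmt'
    words = [w.upper() if w.upper() in ACRONYMS else w for w in name.split(" ")]
    return " ".join(words)

for raw in ["BF UTILITIES LIMITED", "Force Motors Ltd", "Shree Digvijay Cement Co.Ltd"]:
    print(raw, "->", clean_company_name(raw))
```

The trade-off is that an acronym set generalises to names not seen yet, while the explicit dictionary gives full control over special cases such as `PVR INOX Limited`.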

In [20]:
nse_df['Company']

0                                         Varun Beverages Limited
1                                           Castrol India Limited
2                                                  Cyient Limited
3                                                 Siemens Limited
4                                        Indraprastha Gas Limited
5                                    Religare Enterprises Limited
6                                         Future Consumer Limited
7                                    Jeena Sikho Lifecare Limited
8                                            BF Utilities Limited
9                           Network18 Media & Investments Limited
10                                Procter & Gamble Health Limited
11                                         Gillette India Limited
12               Procter & Gamble Hygiene and Health Care Limited
13                          Zee Entertainment Enterprises Limited
14                                                    HMT Limited
15        

### Address Submission and Revision Dates

In [21]:
# List the filings that carry a revision date, sorted by company name
sorted_df = nse_df[nse_df["RevisionDate"] != ""].sort_values(by="Company")
print(sorted_df)

                                    Company  FromYear  ToYear  \
666        Bharat Heavy Electricals Limited      2023    2024   
1                     Castrol India Limited      2024    2024   
1175           Craftsman Automation Limited      2023    2024   
779                             DLF Limited      2023    2024   
165    Diamond Power Infrastructure Limited      2023    2024   
690   Federal-Mogul Goetze (India) Limited.      2023    2024   
636        Flair Writing Industries Limited      2023    2024   
969             GIC Housing Finance Limited      2023    2024   
841                             GRP Limited      2023    2024   
385             Gateway Distriparks Limited      2023    2024   
955                  Graphite India Limited      2023    2024   
287          Hariom Pipe Industries Limited      2023    2024   
318                  Juniper Hotels Limited      2023    2024   
809                        Kamdhenu Limited      2023    2024   
88                      L

Several companies filed revised BRSR disclosures, but each of them appears only once in the Company column, as the duplicate check showed earlier. We therefore keep all of these filings and treat the revision date as the filing date. The 2 duplicate rows have already been removed manually.

In [22]:
# Replace SubmissionDate with RevisionDate where RevisionDate is not empty
nse_df.loc[nse_df["RevisionDate"] != "", "SubmissionDate"] = nse_df["RevisionDate"]
nse_df

Unnamed: 0,Company,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,RevisionDate
0,Varun Beverages Limited,2024,2024,https://nsearchives.nseindia.com/corporate/VBL_11032025174358_VBLNoticeAR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1394994_11032025054427_WEB.xml,2025-03-11,
1,Castrol India Limited,2024,2024,https://nsearchives.nseindia.com/corporate/CASTROLINDIA_25022025195909_SE_Intimation_BRSRFY2024_25022025.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1389867_25022025075940_WEB.xml,2025-02-25,2025-02-25
2,Cyient Limited,2023,2024,https://nsearchives.nseindia.com/corporate/CYIENT_19022025125655_BRSR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1387421_19022025125706_WEB.xml,2025-02-19,
3,Siemens Limited,2023,2024,https://nsearchives.nseindia.com/corporate/SIEMENS57_14012025173818_Siemens-Limited-BRSR-2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1345203_14012025053859_WEB.xml,2025-01-14,
4,Indraprastha Gas Limited,2023,2024,https://nsearchives.nseindia.com/corporate/IGL1_17122024163450_BRSRIGL.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1328008_17122024043540_WEB.xml,2024-12-17,
5,Religare Enterprises Limited,2023,2024,https://nsearchives.nseindia.com/corporate/RELIGARE_07122024171106_RELIGAREAGMNOTICEAR.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1324332_07122024051536_WEB.xml,2024-12-07,
6,Future Consumer Limited,2023,2024,https://nsearchives.nseindia.com/corporate/FCEL1_05122024180848_FCLBRSR31032024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1323499_05122024061135_WEB.xml,2024-12-05,
7,Jeena Sikho Lifecare Limited,2023,2024,https://nsearchives.nseindia.com/corporate/JEENASIKHO_29112024133437_SignedBRSRJeena2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320946_29112024013446_WEB.xml,2024-11-29,
8,BF Utilities Limited,2023,2024,https://nsearchives.nseindia.com/corporate/BFUTILITIE_28112024123201_BRSR_BFUL_28112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320378_28112024123226_WEB.xml,2024-11-28,
9,Network18 Media & Investments Limited,2023,2024,https://nsearchives.nseindia.com/corporate/Shambhu_27112024210620_NW18BRSRfiling27112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320324_27112024090720_WEB.xml,2024-11-27,
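
If the original submission date should be preserved rather than overwritten, the effective filing date can be derived into a separate column. The sketch below is a hedged alternative on made-up data; `sample` and the `FilingDate` column name are assumptions, not names used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical sample: only the second filing was revised
sample = pd.DataFrame({
    "SubmissionDate": ["2025-03-11", "2025-02-20", "2024-08-01"],
    "RevisionDate": ["", "2025-02-25", ""],
})

has_revision = sample["RevisionDate"] != ""

# Use the revision date where one exists, otherwise the submission date
sample["FilingDate"] = sample["RevisionDate"].where(
    has_revision, sample["SubmissionDate"]
)

print(sample)
```

`Series.where` keeps the value wherever the condition is true and takes the fallback elsewhere, so both source columns stay untouched.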


In [23]:
# Finally, let's drop the RevisionDate column.
# Reassigning (rather than using inplace=True) avoids pandas' SettingWithCopyWarning,
# because nse_df was itself created by filtering an earlier DataFrame.
nse_df = nse_df.drop(columns=["RevisionDate"])

nse_df


Unnamed: 0,Company,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate
0,Varun Beverages Limited,2024,2024,https://nsearchives.nseindia.com/corporate/VBL_11032025174358_VBLNoticeAR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1394994_11032025054427_WEB.xml,2025-03-11
1,Castrol India Limited,2024,2024,https://nsearchives.nseindia.com/corporate/CASTROLINDIA_25022025195909_SE_Intimation_BRSRFY2024_25022025.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1389867_25022025075940_WEB.xml,2025-02-25
2,Cyient Limited,2023,2024,https://nsearchives.nseindia.com/corporate/CYIENT_19022025125655_BRSR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1387421_19022025125706_WEB.xml,2025-02-19
3,Siemens Limited,2023,2024,https://nsearchives.nseindia.com/corporate/SIEMENS57_14012025173818_Siemens-Limited-BRSR-2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1345203_14012025053859_WEB.xml,2025-01-14
4,Indraprastha Gas Limited,2023,2024,https://nsearchives.nseindia.com/corporate/IGL1_17122024163450_BRSRIGL.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1328008_17122024043540_WEB.xml,2024-12-17
5,Religare Enterprises Limited,2023,2024,https://nsearchives.nseindia.com/corporate/RELIGARE_07122024171106_RELIGAREAGMNOTICEAR.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1324332_07122024051536_WEB.xml,2024-12-07
6,Future Consumer Limited,2023,2024,https://nsearchives.nseindia.com/corporate/FCEL1_05122024180848_FCLBRSR31032024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1323499_05122024061135_WEB.xml,2024-12-05
7,Jeena Sikho Lifecare Limited,2023,2024,https://nsearchives.nseindia.com/corporate/JEENASIKHO_29112024133437_SignedBRSRJeena2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320946_29112024013446_WEB.xml,2024-11-29
8,BF Utilities Limited,2023,2024,https://nsearchives.nseindia.com/corporate/BFUTILITIE_28112024123201_BRSR_BFUL_28112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320378_28112024123226_WEB.xml,2024-11-28
9,Network18 Media & Investments Limited,2023,2024,https://nsearchives.nseindia.com/corporate/Shambhu_27112024210620_NW18BRSRfiling27112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320324_27112024090720_WEB.xml,2024-11-27


Let's save this DataFrame into an Excel file so that we can pick up from this point in the next notebook.
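
A minimal sketch of such a checkpoint is shown below. It assumes `openpyxl` is installed (pandas uses it to write `.xlsx` files), and the file name `nse_brsr_cleaned.xlsx` is a hypothetical choice. The example writes a small made-up frame to a temporary directory and reads it back, so it does not touch the project directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the cleaned nse_df
sample = pd.DataFrame({
    "Company": ["Alpha Limited", "Beta Limited"],
    "FromYear": [2023, 2023],
    "ToYear": [2024, 2024],
    "SubmissionDate": ["2024-08-13", "2024-10-10"],
})

with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "nse_brsr_cleaned.xlsx"  # hypothetical file name

    # index=False keeps the row index out of the file
    sample.to_excel(checkpoint, index=False)

    # This is how the next notebook would pick the data up again
    restored = pd.read_excel(checkpoint)

print(restored)
```

Writing with `index=False` avoids an extra `Unnamed: 0` column when the file is loaded again.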

### Download NSE data

For each filing, we take the XBRL URL, save the XML file, convert it to Excel using the template available on the NSE website, and save the resulting Excel file as well.

In [24]:
import pandas as pd
from tqdm import tqdm
from pathlib import Path
import requests
import xml.etree.ElementTree as ET
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter

# === CONFIGURATION ===
TEMPLATE_PATH = "template.xlsx"
OUTPUT_DIR = Path("excel_files")
XML_DIR = Path("xbrl_files")
OUTPUT_DIR.mkdir(exist_ok=True)
XML_DIR.mkdir(exist_ok=True)

# === CONSTANTS ===
NS_XBRLI = "{http://www.xbrl.org/2003/instance}"

# === HELPER FUNCTIONS ===
def concepts_from_template(template_path: str) -> set[str]:
    wb = load_workbook(template_path, read_only=True)
    ws = wb["Element"]
    header = [c.value for c in next(ws.iter_rows(max_row=1))]
    idx = header.index("name")
    concepts = {str(row[idx]).strip() for row in ws.iter_rows(min_row=2, values_only=True) if row[idx]}
    wb.close()
    return concepts

def load_context_periods(root) -> dict[str, str]:
    ctx_period = {}
    for ctx in root.findall(NS_XBRLI + "context"):
        ctx_id = ctx.attrib.get("id")
        period = ctx.find(NS_XBRLI + "period")
        if period is None:
            continue
        sd = period.find(NS_XBRLI + "startDate")
        ed = period.find(NS_XBRLI + "endDate")
        inst = period.find(NS_XBRLI + "instant")
        if sd is not None and ed is not None:
            ctx_period[ctx_id] = f"{sd.text} To {ed.text}"
        elif inst is not None:
            ctx_period[ctx_id] = inst.text
        else:
            ctx_period[ctx_id] = ""
    return ctx_period

def extract_facts(xbrl_bytes: bytes, wanted_concepts: set[str]):
    """Yield (concept, period, unit, decimals, value) for every fact listed in the template."""
    root = ET.fromstring(xbrl_bytes)
    ctx_period = load_context_periods(root)
    for elem in root:
        tag = elem.tag
        # Skip structural XBRL elements; only facts are of interest
        if tag.endswith(("context", "unit", "schemaRef")):
            continue
        local = tag.split("}", 1)[1] if "}" in tag else tag
        if local not in wanted_concepts:
            continue
        ctx_id = elem.attrib.get("contextRef")
        period = ctx_period.get(ctx_id, "") if ctx_id else ""
        unit = elem.attrib.get("unitRef")  # present on numeric facts only
        decimals = elem.attrib.get("decimals")
        value = (elem.text or "").strip()
        yield local, period, unit, decimals, value

def build_workbook(facts) -> Workbook:
    wb = Workbook()
    ws = wb.active
    ws.title = "Instance Data"
    ws.append(["Sr.No.", "Element Name", "Period", "Unit", "Decimals", "Fact Value"])
    # Bold header row
    for cell in ws[1]:
        cell.font = Font(bold=True)
    for sr, (local, period, unit, dec, val) in enumerate(facts, start=1):
        ws.append([sr, local, period, unit, dec, val])
    for col in range(1, 7):
        ws.column_dimensions[get_column_letter(col)].width = 25
    return wb

# === MAIN FUNCTION (NOW WITH XML DOWNLOAD) ===
def xml_url_to_excel(xml_url: str, company_name: str, template_path: str = TEMPLATE_PATH) -> "str | None":
    """Download one XBRL file (unless already cached), extract its facts, and save them as Excel.

    Returns the path of the Excel file, or None if the download fails or no facts are found.
    """
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        }

        # Save path for XML
        xml_path = XML_DIR / f"{company_name}.xml"

        # Skip download if file already exists
        if not xml_path.exists():
            response = requests.get(xml_url, headers=headers, timeout=30)
            response.raise_for_status()
            xml_path.write_bytes(response.content)

        # Load template concepts
        concepts = concepts_from_template(template_path)

        # Read the XBRL bytes (read_bytes opens and closes the file for us)
        xbrl_bytes = xml_path.read_bytes()

        # Extract facts
        facts = list(extract_facts(xbrl_bytes, concepts))
        if not facts:
            return None

        # Create Excel file
        wb = build_workbook(facts)
        excel_path = OUTPUT_DIR / f"{company_name.replace(' ', '_')}.xlsx"
        wb.save(excel_path)
        return str(excel_path)

    except Exception as e:
        print(f"❌ Error processing {company_name} | {xml_url}: {e}")
        return None
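
To make the parsing logic easier to follow, here is a self-contained sketch that runs on a tiny inline XBRL instance using only the standard library. The document, the `ex` namespace, and the concept names `NameOfTheCompany` and `PaidUpCapital` are made up for illustration and are not taken from the NSE taxonomy. It shows how each fact's `contextRef` is resolved to a period string, and why the missing-period check is written as `is None`: an `Element` without child elements is falsy, so a plain truthiness test can misfire.

```python
import xml.etree.ElementTree as ET

NS_XBRLI = "{http://www.xbrl.org/2003/instance}"

# Hypothetical miniature XBRL instance: two contexts and two facts
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:ex="http://example.com/brsr">
  <xbrli:context id="D2024">
    <xbrli:period>
      <xbrli:startDate>2023-04-01</xbrli:startDate>
      <xbrli:endDate>2024-03-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:context id="I2024">
    <xbrli:period><xbrli:instant>2024-03-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <ex:NameOfTheCompany contextRef="D2024">Example Limited</ex:NameOfTheCompany>
  <ex:PaidUpCapital contextRef="I2024" unitRef="INR" decimals="0">1000</ex:PaidUpCapital>
</xbrli:xbrl>"""

root = ET.fromstring(SAMPLE)

# Map each context id to a readable period string
periods = {}
for ctx in root.findall(NS_XBRLI + "context"):
    period = ctx.find(NS_XBRLI + "period")
    if period is None:  # explicit None check, not truthiness
        continue
    start = period.findtext(NS_XBRLI + "startDate")
    end = period.findtext(NS_XBRLI + "endDate")
    instant = period.findtext(NS_XBRLI + "instant")
    periods[ctx.get("id")] = f"{start} To {end}" if start and end else (instant or "")

# Every top-level element outside the xbrli namespace is treated as a fact
facts = []
for elem in root:
    if elem.tag.startswith(NS_XBRLI):
        continue
    local = elem.tag.split("}", 1)[1]
    period = periods.get(elem.get("contextRef"), "")
    facts.append((local, period, elem.get("unitRef"), (elem.text or "").strip()))

for fact in facts:
    print(fact)
```

Duration contexts produce a "start To end" string and instant contexts a single date, which is the same convention the helper functions in this cell follow.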


In [25]:
missing_files = []

for _, row in tqdm(nse_df.iterrows(), total=nse_df.shape[0], desc="Processing XBRL Files"):
    xml_url = row["XBRLURL"]
    company_name = row["Company"].strip()

    if pd.notna(xml_url) and xml_url.strip():
        result = xml_url_to_excel(xml_url, company_name)
        if result is None:
            missing_files.append(f"{company_name} | {xml_url}")

if missing_files:
    print("\n❌ The following files failed:")
    for entry in missing_files:
        print(entry)

print("✅ All done!")


Processing XBRL Files:  35%|███████████████████▍                                    | 408/1179 [11:28<20:08,  1.57s/it]

❌ Error processing GMR Power and Urban Infra Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1231919_24082024112726_WEB.xml: 404 Client Error: Not Found for url: https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1231919_24082024112726_WEB.xml


Processing XBRL Files:  44%|████████████████████████▋                               | 519/1179 [14:44<16:33,  1.51s/it]

❌ Error processing GE T&D India Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1223287_13082024045134_WEB.xml: 404 Client Error: Not Found for url: https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1223287_13082024045134_WEB.xml


Processing XBRL Files:  74%|█████████████████████████████████████████▍              | 872/1179 [25:19<07:49,  1.53s/it]

❌ Error processing Amara Raja Energy & Mobility Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1179038_10072024070935_WEB.xml: 404 Client Error: Not Found for url: https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1179038_10072024070935_WEB.xml


Processing XBRL Files:  85%|███████████████████████████████████████████████▍        | 998/1179 [28:31<03:53,  1.29s/it]

❌ Error processing Mahindra & Mahindra Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1167401_29062024075519_WEB.xml: 404 Client Error: Not Found for url: https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1167401_29062024075519_WEB.xml


Processing XBRL Files:  89%|████████████████████████████████████████████████▋      | 1045/1179 [29:37<03:01,  1.36s/it]

❌ Error processing Mahindra & Mahindra Financial Services Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1162361_24062024101552_WEB.xml: 404 Client Error: Not Found for url: https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1162361_24062024101552_WEB.xml


Processing XBRL Files: 100%|███████████████████████████████████████████████████████| 1179/1179 [32:48<00:00,  1.67s/it]


❌ The following files failed:
GMR Power and Urban Infra Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1231919_24082024112726_WEB.xml
GE T&D India Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1223287_13082024045134_WEB.xml
Amara Raja Energy & Mobility Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1179038_10072024070935_WEB.xml
Mahindra & Mahindra Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1167401_29062024075519_WEB.xml
Mahindra & Mahindra Financial Services Limited | https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1162361_24062024101552_WEB.xml
✅ All done!





So there are 5 cases where the XBRL file URLs are absent or broken. We will remove those rows from our DataFrame.

In [26]:
# List of companies you want to filter
companies_to_filter = ["GMR Power and Urban Infra Limited", 
                       "GE T&D India Limited", 
                       "Amara Raja Energy & Mobility Limited", 
                       "Mahindra & Mahindra Limited", 
                       "Mahindra & Mahindra Financial Services Limited"
                      ]

# Filter rows where Company is in the list
filtered_df = nse_df[nse_df["Company"].isin(companies_to_filter)]

# Display filtered DataFrame
print("Matching rows:")
print(filtered_df)

Matching rows:
                                             Company  FromYear  ToYear  \
407                GMR Power and Urban Infra Limited      2023    2024   
518                             GE T&D India Limited      2023    2024   
872             Amara Raja Energy & Mobility Limited      2023    2024   
998                      Mahindra & Mahindra Limited      2023    2024   
1045  Mahindra & Mahindra Financial Services Limited      2023    2024   

                                                                                                   PDFURL  \
407                        https://nsearchives.nseindia.com/corporate/GPUIL_24082024232635_BRSRattach.pdf   
518                            https://nsearchives.nseindia.com/corporate/GETDIL1_13082024165119_BRSR.pdf   
872                    https://nsearchives.nseindia.com/corporate/AMARAJABAT_10072024190429_BRSR_2024.pdf   
998        https://nsearchives.nseindia.com/corporate/ferozebaria_29062024195438_AnnualReportwithBRSR.pd

In [27]:
# Remove rows where Company matches any value in the list
final_df = nse_df[~nse_df["Company"].isin(companies_to_filter)]

# Display updated DataFrame
print("✅ Rows deleted. Updated DataFrame:")
print(final_df)


✅ Rows deleted. Updated DataFrame:
                                                        Company  FromYear  \
0                                       Varun Beverages Limited      2024   
1                                         Castrol India Limited      2024   
2                                                Cyient Limited      2023   
3                                               Siemens Limited      2023   
4                                      Indraprastha Gas Limited      2023   
5                                  Religare Enterprises Limited      2023   
6                                       Future Consumer Limited      2023   
7                                  Jeena Sikho Lifecare Limited      2023   
8                                          BF Utilities Limited      2023   
9                         Network18 Media & Investments Limited      2023   
10                              Procter & Gamble Health Limited      2023   
11                                       

In [28]:
final_df.shape

(1174, 6)
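
Filtering on the company name drops every filing by that company. Here that matches exactly the five failed rows, but a more targeted variant would filter on the failed XBRL URL instead. The sketch below uses toy data: the `failed_urls` list and the example rows are illustrative assumptions, and only the column names `Company` and `XBRLURL` come from the notebook.

```python
import pandas as pd

# Toy stand-in for nse_df (illustrative rows, not real filings)
toy_df = pd.DataFrame({
    "Company": ["A Ltd", "A Ltd", "B Ltd"],
    "XBRLURL": [
        "https://example.com/a1.xml",
        "https://example.com/a2.xml",
        "https://example.com/b1.xml",
    ],
})

# Hypothetical list of URLs that failed during download
failed_urls = ["https://example.com/a2.xml"]

# Keep only the rows whose XBRL URL did not fail
kept_df = toy_df[~toy_df["XBRLURL"].isin(failed_urls)].reset_index(drop=True)
print(kept_df)
```

With this approach the second filing of "A Ltd" is dropped while its first filing is kept.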

In [29]:
final_df.head()

Unnamed: 0,Company,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate
0,Varun Beverages Limited,2024,2024,https://nsearchives.nseindia.com/corporate/VBL_11032025174358_VBLNoticeAR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1394994_11032025054427_WEB.xml,2025-03-11
1,Castrol India Limited,2024,2024,https://nsearchives.nseindia.com/corporate/CASTROLINDIA_25022025195909_SE_Intimation_BRSRFY2024_25022025.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1389867_25022025075940_WEB.xml,2025-02-25
2,Cyient Limited,2023,2024,https://nsearchives.nseindia.com/corporate/CYIENT_19022025125655_BRSR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1387421_19022025125706_WEB.xml,2025-02-19
3,Siemens Limited,2023,2024,https://nsearchives.nseindia.com/corporate/SIEMENS57_14012025173818_Siemens-Limited-BRSR-2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1345203_14012025053859_WEB.xml,2025-01-14
4,Indraprastha Gas Limited,2023,2024,https://nsearchives.nseindia.com/corporate/IGL1_17122024163450_BRSRIGL.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1328008_17122024043540_WEB.xml,2024-12-17


So we're left with 1181 - 2 (duplicates) - 5 (broken links) = 1174 rows.
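
The same arithmetic can be written as a small check. This is only a sketch: the three counts are copied from the text above, and the variable names are made up for illustration.

```python
# Counts quoted in the notebook text
rows_downloaded = 1181
duplicate_rows = 2
broken_link_rows = 5

expected_rows = rows_downloaded - duplicate_rows - broken_link_rows
print(expected_rows)  # 1174
```

With `final_df` in scope, `assert len(final_df) == expected_rows` before saving would catch an accidental extra or missing row.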

### Save final DF to Excel

In [30]:
final_df.to_excel("nse_data.xlsx", index=False)

# Add Sector Column to NSE Data

Let's add a sector for each company. I have added the NSE symbol for each company, and I have updated the names and symbols of companies that were renamed or merged.
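
The symbols here were added by hand. If a name-to-symbol lookup table were available, one way to attach symbols and spot renamed companies would be a left merge with an indicator column. The sketch below is an assumption about how that could look, not what was done in this notebook: `symbol_lookup` and "Renamed Co Limited" are hypothetical.

```python
import pandas as pd

# Toy filings table (the third company name is made up)
filings = pd.DataFrame({
    "Company": ["Varun Beverages Limited", "Cyient Limited", "Renamed Co Limited"],
})

# Hypothetical lookup of company name to NSE symbol
symbol_lookup = pd.DataFrame({
    "Company": ["Varun Beverages Limited", "Cyient Limited"],
    "Symbol": ["VBL", "CYIENT"],
})

# A left merge keeps every filing; the indicator column marks rows without a match
merged = filings.merge(symbol_lookup, on="Company", how="left", indicator=True)

# Companies that need a manual fix (renamed, merged, or spelled differently)
unmatched = merged.loc[merged["_merge"] == "left_only", "Company"].tolist()
print(unmatched)
```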

In [49]:
import pandas as pd
import requests
import time

In [50]:
df = pd.read_excel('nse_data.xlsx')
df

Unnamed: 0,Company,Symbol,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,Sector
0,Varun Beverages Limited,VBL,2024,2024,https://nsearchives.nseindia.com/corporate/VBL_11032025174358_VBLNoticeAR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1394994_11032025054427_WEB.xml,2025-03-11,Fast Moving Consumer Goods
1,Castrol India Limited,CASTROLIND,2024,2024,https://nsearchives.nseindia.com/corporate/CASTROLINDIA_25022025195909_SE_Intimation_BRSRFY2024_25022025.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1389867_25022025075940_WEB.xml,2025-02-25,Oil Gas & Consumable Fuels
2,Cyient Limited,CYIENT,2023,2024,https://nsearchives.nseindia.com/corporate/CYIENT_19022025125655_BRSR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1387421_19022025125706_WEB.xml,2025-02-19,Information Technology
3,Siemens Limited,SIEMENS,2023,2024,https://nsearchives.nseindia.com/corporate/SIEMENS57_14012025173818_Siemens-Limited-BRSR-2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1345203_14012025053859_WEB.xml,2025-01-14,Capital Goods
4,Indraprastha Gas Limited,IGL,2023,2024,https://nsearchives.nseindia.com/corporate/IGL1_17122024163450_BRSRIGL.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1328008_17122024043540_WEB.xml,2024-12-17,Oil Gas & Consumable Fuels
5,Religare Enterprises Limited,RELIGARE,2023,2024,https://nsearchives.nseindia.com/corporate/RELIGARE_07122024171106_RELIGAREAGMNOTICEAR.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1324332_07122024051536_WEB.xml,2024-12-07,Financial Services
6,Future Consumer Limited,FCONSUMER,2023,2024,https://nsearchives.nseindia.com/corporate/FCEL1_05122024180848_FCLBRSR31032024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1323499_05122024061135_WEB.xml,2024-12-05,Consumer Services
7,Jeena Sikho Lifecare Limited,JSLL,2023,2024,https://nsearchives.nseindia.com/corporate/JEENASIKHO_29112024133437_SignedBRSRJeena2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320946_29112024013446_WEB.xml,2024-11-29,Healthcare
8,BF Utilities Limited,BFUTILITIE,2023,2024,https://nsearchives.nseindia.com/corporate/BFUTILITIE_28112024123201_BRSR_BFUL_28112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320378_28112024123226_WEB.xml,2024-11-28,Services
9,Network18 Media & Investments Limited,NETWORK18,2023,2024,https://nsearchives.nseindia.com/corporate/Shambhu_27112024210620_NW18BRSRfiling27112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320324_27112024090720_WEB.xml,2024-11-27,Media Entertainment & Publication


### Get Sector from NSE website

In [36]:
class NSE:
    def __init__(self):
        # Define a single global header for all NSE requests
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9',
            'Cache-Control': 'max-age=0',
            'Referer': 'https://www.nseindia.com/',
            'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
            'Sec-Ch-Ua-Mobile': '?0',
            'Sec-Ch-Ua-Platform': '"Windows"',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'same-origin',
            'Sec-Fetch-User': '?1',
            'Upgrade-Insecure-Requests': '1',
            'Connection': 'keep-alive'
        }
        self.session = self._create_session()
        self.initialize_session()

    def _create_session(self):
        session = requests.Session()
        session.headers.update(self.headers)
        return session

    def initialize_session(self, specific_symbol=None):
        """Initializes or refreshes the session cookies by visiting NSE."""
        max_retries = 3
        retry_delay = 2  # seconds
        
        for attempt in range(1, max_retries + 1):
            try:
                print(f"Initializing/Refreshing NSE session (attempt {attempt}/{max_retries})")
                
                # Reset session for fresh attempt if not the first try
                if attempt > 1:
                    print("Creating fresh session")
                    self.session = self._create_session()
                
                # Visit homepage to get cookies
                homepage_url = "https://www.nseindia.com/"
                response = self.session.get(homepage_url, timeout=15)
                response.raise_for_status()
                
                # Check if we received cookies
                if not self.session.cookies:
                    print(f"No cookies received from NSE homepage (attempt {attempt})")
                    time.sleep(retry_delay)
                    continue
                    
                #print(f"Successfully accessed NSE homepage. Cookies: {dict(self.session.cookies)}")
                
                # Small delay to ensure cookies are processed
                time.sleep(1)
                
                if specific_symbol:
                    # Visit quote page for specific symbol
                    quote_url = f"https://www.nseindia.com/get-quotes/equity?symbol={specific_symbol.upper()}"
                    print(f"Accessing quote page: {quote_url}")
                    
                    # Update referrer
                    self.session.headers.update({'Referer': homepage_url})
                    
                    response = self.session.get(quote_url, timeout=15)
                    response.raise_for_status()
                    print(f"Successfully accessed quote page for {specific_symbol}")
                    
                    # Additional delay after accessing the quote page
                    time.sleep(1)
                
                return True
                
            except requests.exceptions.RequestException as e:
                print(f"Error initializing NSE session (attempt {attempt}): {e}")
                
                if attempt < max_retries:
                    print(f"Retrying in {retry_delay} seconds...")
                    time.sleep(retry_delay)
                    retry_delay *= 2  # Exponential backoff
                else:
                    print("All NSE session initialization attempts failed")
                    return False
        
        return False

    def get_sector(self, symbol):
        """Return the sector NSE reports for `symbol`, or '' if it cannot be fetched."""
        # First make sure we have a valid session
        if not self.initialize_session(symbol):
            print(f"Failed to initialize NSE session for {symbol}, retrying once more")
            time.sleep(2)
            if not self.initialize_session(symbol):
                print("Failed to establish NSE session after multiple attempts")
                return ""

        nse_url = f"https://www.nseindia.com/api/quote-equity?symbol={symbol}"
        referer = f"https://www.nseindia.com/get-quotes/equity?symbol={symbol}"
        try:
            # Set the proper referer
            self.session.headers.update({'Referer': referer})

            response = self.session.get(nse_url, timeout=15)

            # If we get an error, try refreshing the session once
            if response.status_code != 200:
                print(f"Failed to get NSE data (status {response.status_code}), refreshing session")
                self.initialize_session(symbol)
                time.sleep(2)  # Short delay
                self.session.headers.update({'Referer': referer})
                response = self.session.get(nse_url, timeout=15)

            if response.status_code != 200:
                print(f"NSE returned status {response.status_code} for {symbol}")
                return ""

            # "industryInfo" may be missing or null, so fall back to an empty dict
            industry_info = response.json().get("industryInfo") or {}
            return industry_info.get("sector", "")

        except (requests.exceptions.RequestException, ValueError) as e:
            # Covers network errors as well as malformed JSON
            print(f"Error fetching NSE data for {symbol}: {e}")
            return ""
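
The parsing step of `get_sector` can be tested without touching the network. The helper below is a minimal sketch, assuming the payload shape used in the class above (an `industryInfo` object holding a `sector` string). The name `extract_sector` and the sample payloads are illustrative, not part of any NSE API.

```python
def extract_sector(payload):
    """Return the sector string from a quote-equity style payload, or '' if absent."""
    if not isinstance(payload, dict):
        return ""
    # "industryInfo" may be missing or null
    industry_info = payload.get("industryInfo") or {}
    if not isinstance(industry_info, dict):
        return ""
    sector = industry_info.get("sector")
    return sector.strip() if isinstance(sector, str) else ""


# Made-up payloads covering the normal case and two failure cases
print(extract_sector({"industryInfo": {"sector": "Capital Goods"}}))
print(repr(extract_sector({"industryInfo": None})))
print(repr(extract_sector({})))
```

Returning an empty string in every failure case keeps the `Sector` column uniform, so unfilled rows can be found later with a single check.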


In [37]:
nse=NSE()
# Define a safe wrapper around the sector fetch
def fetch_sector(symbol):
    try:
        return nse.get_sector(symbol)
    except Exception:
        return ""  # Leave it empty if it fails

Initializing/Refreshing NSE session (attempt 1/3)
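
The retry loop in `initialize_session` doubles its delay after each failed request. The same pattern in isolation looks roughly like the sketch below. `retry_with_backoff` and `flaky` are hypothetical names, and the demo passes a fake `sleep` so that it runs instantly.

```python
import time


def retry_with_backoff(action, max_retries=3, initial_delay=2, sleep=time.sleep):
    """Call `action` until it succeeds, doubling the wait after each failure."""
    delay = initial_delay
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # exponential backoff


# Demo: an action that fails twice and then succeeds
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []  # records the requested delays instead of actually sleeping
result = retry_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```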


In [None]:
# Ensure 'Sector' column exists
if 'Sector' not in df.columns:
    df['Sector'] = ''

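# Identify rows to process (i.e. where Sector is empty).
# Blank cells typically come back from Excel as NaN, so treat NaN and '' alike.
to_process = df[df['Sector'].fillna('').astype(str).str.strip() == ''].copy()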
batch_size = 100

def fetch_sector(symbol):
    try:
        return nse.get_sector(symbol)
    except Exception:
        return ''

# Process in batches
for start in range(0, len(to_process), batch_size):
    end = min(start + batch_size, len(to_process))  # last batch may be shorter
    batch_indices = to_process.iloc[start:end].index
    symbols = df.loc[batch_indices, 'Symbol']

    print(f"Processing rows {start} to {end - 1}")

    # Apply the sector fetching function
    sectors = symbols.apply(fetch_sector)

    # Update the main dataframe
    df.loc[batch_indices, 'Sector'] = sectors

    # Save intermediate result to avoid progress loss
    df.to_excel('nse_data.xlsx', index=False)

    time.sleep(1)  # slight pause if needed to avoid overloading the API
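
The batch-and-checkpoint idea can be tried offline with a stubbed fetch function. This is a sketch under stated assumptions: `fill_missing_in_batches`, the `lookup` dictionary and the in-memory `checkpoint` callback are hypothetical stand-ins for `nse.get_sector` and `df.to_excel`.

```python
import pandas as pd


def fill_missing_in_batches(frame, column, key_column, fetch, batch_size, checkpoint):
    """Fill empty cells of `column` batch by batch, checkpointing after each batch."""
    values = frame[column]
    # Treat NaN, '' and whitespace-only cells as missing
    missing = values.isna() | values.astype(str).str.strip().eq("")
    pending = frame.index[missing]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        frame.loc[batch, column] = frame.loc[batch, key_column].map(fetch)
        checkpoint(frame)  # e.g. write the frame to disk
    return frame


toy = pd.DataFrame({
    "Symbol": ["VBL", "CYIENT", "IGL"],
    "Sector": ["Fast Moving Consumer Goods", None, ""],
})

# Stub standing in for the network call
lookup = {"CYIENT": "Information Technology", "IGL": "Oil Gas & Consumable Fuels"}
checkpoints = []

fill_missing_in_batches(
    toy, "Sector", "Symbol",
    fetch=lambda symbol: lookup.get(symbol, ""),
    batch_size=1,
    checkpoint=lambda frame: checkpoints.append(len(frame)),
)
print(toy)
```

Because only missing cells are selected, rerunning the function after an interruption resumes where the last checkpoint left off.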


### Further Cleaning

In [51]:
df

Unnamed: 0,Company,Symbol,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,Sector
0,Varun Beverages Limited,VBL,2024,2024,https://nsearchives.nseindia.com/corporate/VBL_11032025174358_VBLNoticeAR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1394994_11032025054427_WEB.xml,2025-03-11,Fast Moving Consumer Goods
1,Castrol India Limited,CASTROLIND,2024,2024,https://nsearchives.nseindia.com/corporate/CASTROLINDIA_25022025195909_SE_Intimation_BRSRFY2024_25022025.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1389867_25022025075940_WEB.xml,2025-02-25,Oil Gas & Consumable Fuels
2,Cyient Limited,CYIENT,2023,2024,https://nsearchives.nseindia.com/corporate/CYIENT_19022025125655_BRSR2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1387421_19022025125706_WEB.xml,2025-02-19,Information Technology
3,Siemens Limited,SIEMENS,2023,2024,https://nsearchives.nseindia.com/corporate/SIEMENS57_14012025173818_Siemens-Limited-BRSR-2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1345203_14012025053859_WEB.xml,2025-01-14,Capital Goods
4,Indraprastha Gas Limited,IGL,2023,2024,https://nsearchives.nseindia.com/corporate/IGL1_17122024163450_BRSRIGL.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1328008_17122024043540_WEB.xml,2024-12-17,Oil Gas & Consumable Fuels
5,Religare Enterprises Limited,RELIGARE,2023,2024,https://nsearchives.nseindia.com/corporate/RELIGARE_07122024171106_RELIGAREAGMNOTICEAR.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1324332_07122024051536_WEB.xml,2024-12-07,Financial Services
6,Future Consumer Limited,FCONSUMER,2023,2024,https://nsearchives.nseindia.com/corporate/FCEL1_05122024180848_FCLBRSR31032024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1323499_05122024061135_WEB.xml,2024-12-05,Consumer Services
7,Jeena Sikho Lifecare Limited,JSLL,2023,2024,https://nsearchives.nseindia.com/corporate/JEENASIKHO_29112024133437_SignedBRSRJeena2024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320946_29112024013446_WEB.xml,2024-11-29,Healthcare
8,BF Utilities Limited,BFUTILITIE,2023,2024,https://nsearchives.nseindia.com/corporate/BFUTILITIE_28112024123201_BRSR_BFUL_28112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320378_28112024123226_WEB.xml,2024-11-28,Services
9,Network18 Media & Investments Limited,NETWORK18,2023,2024,https://nsearchives.nseindia.com/corporate/Shambhu_27112024210620_NW18BRSRfiling27112024.pdf,https://nsearchives.nseindia.com/corporate/xbrl/BRSR_1320324_27112024090720_WEB.xml,2024-11-27,Media Entertainment & Publication


In [52]:
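# Normalize every cell to a stripped string, then flag the empty ones
# (avoids DataFrame.applymap, which is deprecated in recent pandas versions)
is_empty = df.fillna('').astype(str).apply(lambda col: col.str.strip()).eq('')
empty_counts = is_empty.sum()
print(empty_counts)
df[is_empty.any(axis=1)]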

Company           0
Symbol            0
FromYear          0
ToYear            0
PDFURL            0
XBRLURL           0
SubmissionDate    0
Sector            0
dtype: int64




Unnamed: 0,Company,Symbol,FromYear,ToYear,PDFURL,XBRLURL,SubmissionDate,Sector


In [53]:
df[df['Sector']=='']['Company']

Series([], Name: Company, dtype: object)

In [None]:
# Done manually the second time around

# update_sector(df, 'Infobeans Technologies Limited', 'Information Technology')
# update_sector(df, 'The Jammu & Kashmir Bank Limited', 'Financial Services')
# Suven Pharma name changed - updated it manually 
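
The `update_sector` helper called in the commented-out lines is not defined anywhere in this notebook. A minimal sketch of what such a helper could look like follows; the signature is inferred from those calls, and the body is an assumption.

```python
import pandas as pd


def update_sector(frame, company, sector):
    """Set Sector for every row whose Company equals `company`; return rows changed."""
    mask = frame["Company"] == company
    frame.loc[mask, "Sector"] = sector
    return int(mask.sum())


# Toy frame with one company whose sector is still empty
toy = pd.DataFrame({
    "Company": ["Infobeans Technologies Limited", "Siemens Limited"],
    "Sector": ["", "Capital Goods"],
})

changed = update_sector(toy, "Infobeans Technologies Limited", "Information Technology")
print(changed)
print(toy)
```

Returning the number of rows changed makes a misspelled company name visible, since the call then reports 0.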

In [None]:
df.to_excel('nse_data.xlsx', index=False)

In [57]:
df['Company']


0                                         Varun Beverages Limited
1                                           Castrol India Limited
2                                                  Cyient Limited
3                                                 Siemens Limited
4                                        Indraprastha Gas Limited
5                                    Religare Enterprises Limited
6                                         Future Consumer Limited
7                                    Jeena Sikho Lifecare Limited
8                                            BF Utilities Limited
9                           Network18 Media & Investments Limited
10                                Procter & Gamble Health Limited
11                                         Gillette India Limited
12               Procter & Gamble Hygiene and Health Care Limited
13                          Zee Entertainment Enterprises Limited
14                                                    HMT Limited
15        