# **Taiwo Adeoye Dada**

# **Web Scraping Assignment: Largest Companies in Nigeria**

This project scrapes Wikipedia's lists of the largest companies in Nigeria for the year 2024, ranked by revenue and by market capitalization. The stated scope excludes banks, although the market capitalization table scraped below still lists banking and finance companies. The extracted data can be used for Exploratory Data Analysis (EDA) and Machine Learning (ML): financial metrics such as revenue, profits, and market cap can reveal trends, correlations, and outliers across industries, and can support market analysis, investment strategies, and predictive modeling of future business performance.

In [37]:
# import necessary libraries
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [39]:
url = 'https://en.wikipedia.org/wiki/List_of_largest_companies_in_Nigeria'
url

'https://en.wikipedia.org/wiki/List_of_largest_companies_in_Nigeria'

In [41]:
res = requests.get(url)
res.status_code

200
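The status code above is checked by eye. A minimal sketch of a fetch step that fails loudly instead is shown here. The helper name `fetch_html`, the `User-Agent` string, and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body, raising requests.HTTPError on a 4xx/5xx response."""
    # Illustrative User-Agent (an assumption); identifying the script is a courtesy.
    headers = {"User-Agent": "largest-companies-notebook/0.1 (educational use)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # does nothing for 200, raises for error statuses
    return response.text
```

With this helper, `fetch_html(url)` could stand in for `res.text` when building the soup.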

In [43]:
# parse the response text as HTML using BeautifulSoup
soup = BeautifulSoup(res.text, 'html.parser')

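## **Largest Companies in Nigeria - Based on Revenue**

### ***Scrape data for the Largest Companies in Nigeria based on revenue***
Table tag in the browser inspector: `<table class="wikitable sortable jquery-tablesorter">`. The fetched HTML carries only `class="wikitable sortable"`, as the output below shows.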

In [45]:
# get all sortable wikitables on the page (index 0: revenue, index 1: market capitalization)
revenue_table = soup.find_all('table', class_="wikitable sortable")
revenue_table

[<table class="wikitable sortable">
 <tbody><tr>
 <th>Rank
 </th>
 <th>Company
 </th>
 <th>Industry
 </th>
 <th>Revenue<br/>(US$ millions)
 </th>
 <th>Profits<br/>(US$ millions)
 </th></tr>
 <tr>
 <td>1
 </td>
 <td><a class="mw-redirect" href="/wiki/Nigeria_National_Petroleum_Corporation" title="Nigeria National Petroleum Corporation">Nigeria National Petroleum</a>
 </td>
 <td>Oil and gas
 </td>
 <td>9,706
 </td>
 <td>1,877
 </td></tr>
 <tr>
 <td>2
 </td>
 <td><a class="mw-redirect" href="/wiki/Nigeria_Liquefied_Natural_Gas" title="Nigeria Liquefied Natural Gas">Nigeria Liquefied Natural Gas</a>
 </td>
 <td>Oil and gas
 </td>
 <td>6,315
 </td>
 <td>...
 </td></tr>
 <tr>
 <td>3
 </td>
 <td><a class="mw-redirect" href="/wiki/MTN_Nigeria" title="MTN Nigeria">MTN Nigeria</a>
 </td>
 <td>Telecommunications
 </td>
 <td>3,514
 </td>
 <td>536
 </td></tr>
 <tr>
 <td>4
 </td>
 <td><a href="/wiki/Dangote_Cement" title="Dangote Cement">Dangote Cement</a>
 </td>
 <td>Cement
 </td>
 <td>2,699
 </td>

In [47]:
# get the revenue headers with tags
rev_headers = revenue_table[0].find_all('th')
rev_headers

[<th>Rank
 </th>,
 <th>Company
 </th>,
 <th>Industry
 </th>,
 <th>Revenue<br/>(US$ millions)
 </th>,
 <th>Profits<br/>(US$ millions)
 </th>]

In [49]:
# get the revenue headers without tags
revenue_titles = []

for header in rev_headers:
    text = header.get_text().strip()
    revenue_titles.append(text)
revenue_titles

['Rank',
 'Company',
 'Industry',
 'Revenue(US$ millions)',
 'Profits(US$ millions)']
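In the titles above, `Revenue` and `(US$ millions)` run together because the `<br/>` between them contributes no text. A small sketch of one way to keep them apart uses the `separator` and `strip` arguments of `get_text`. The sample HTML is a stand-in for one header cell. Adopting this would change the column names used in later cells.

```python
from bs4 import BeautifulSoup

# Stand-in for one header cell of the revenue table.
sample = BeautifulSoup(
    "<table><tr><th>Revenue<br/>(US$ millions)\n</th></tr></table>",
    "html.parser",
)
cell = sample.find("th")

joined = cell.get_text().strip()                   # pieces concatenated directly
spaced = cell.get_text(separator=" ", strip=True)  # pieces joined with one space

print(joined)  # Revenue(US$ millions)
print(spaced)  # Revenue (US$ millions)
```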

In [51]:
# extract rows (excluding the header row)
rows = revenue_table[0].find_all('tr')[1:]

# initialize an empty list to store the rows of data
revenue_data = []

# loop through each row and extract the columns
for row in rows:
    columns = row.find_all('td')
    
    # extract the text from each column and strip surrounding whitespace
    row_data = [col.get_text().strip() for col in columns]
    
    # combine the header titles and row data
    revenue_data.append(dict(zip(revenue_titles, row_data)))

print(revenue_data)

[{'Rank': '1', 'Company': 'Nigeria National Petroleum', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '9,706', 'Profits(US$ millions)': '1,877'}, {'Rank': '2', 'Company': 'Nigeria Liquefied Natural Gas', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '6,315', 'Profits(US$ millions)': '...'}, {'Rank': '3', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '3,514', 'Profits(US$ millions)': '536'}, {'Rank': '4', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Revenue(US$ millions)': '2,699', 'Profits(US$ millions)': '721'}, {'Rank': '5', 'Company': 'Nigerian Petroleum Development', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '2,686', 'Profits(US$ millions)': '219'}, {'Rank': '6', 'Company': 'Flour Mills of Nigeria', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '2,014', 'Profits(US$ millions)': '67'}, {'Rank': '7', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '1,503', 'Profits(US$ m
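The header and row steps above can be wrapped in one helper, so that the same logic serves both tables on the page. This is a sketch only. The name `table_to_records` is hypothetical, and it assumes every `<th>` sits in the header row, as in the tables shown here.

```python
from bs4 import BeautifulSoup


def table_to_records(table):
    """Convert one <table> element into a list of {header: cell_text} dicts."""
    titles = [th.get_text().strip() for th in table.find_all("th")]
    records = []
    for row in table.find_all("tr"):
        cells = [td.get_text().strip() for td in row.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped
            records.append(dict(zip(titles, cells)))
    return records


# Small stand-in table using two rows from the output above.
html = """
<table class="wikitable sortable">
<tr><th>Rank</th><th>Company</th><th>Industry</th></tr>
<tr><td>1</td><td>Nigeria National Petroleum</td><td>Oil and gas</td></tr>
<tr><td>3</td><td>MTN Nigeria</td><td>Telecommunications</td></tr>
</table>
"""
demo_table = BeautifulSoup(html, "html.parser").find("table")
print(table_to_records(demo_table))
```

In the notebook, `table_to_records(revenue_table[0])` would play the role of `revenue_data`.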

In [53]:
# store revenue data in a dataframe
rev_data = [{'Rank': '1', 'Company': 'Nigeria National Petroleum', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '9,706', 'Profits(US$ millions)': '1,877'}, {'Rank': '2', 'Company': 'Nigeria Liquefied Natural Gas', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '6,315', 'Profits(US$ millions)': '...'}, {'Rank': '3', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '3,514', 'Profits(US$ millions)': '536'}, {'Rank': '4', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Revenue(US$ millions)': '2,699', 'Profits(US$ millions)': '721'}, {'Rank': '5', 'Company': 'Nigerian Petroleum Development', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '2,686', 'Profits(US$ millions)': '219'}, {'Rank': '6', 'Company': 'Flour Mills of Nigeria', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '2,014', 'Profits(US$ millions)': '67'}, {'Rank': '7', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Revenue(US$ millions)': '1,503', 'Profits(US$ millions)': '343'}, {'Rank': '8', 'Company': 'Nigerian Breweries', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '890', 'Profits(US$ millions)': '19'}, {'Rank': '9', 'Company': 'Jumia', 'Industry': 'Retail', 'Revenue(US$ millions)': '837', 'Profits(US$ millions)': '...'}, {'Rank': '10', 'Company': 'Nestle Nigeria', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '749', 'Profits(US$ millions)': '102'}, {'Rank': '11', 'Company': 'Krystal Digital Network Solutions', 'Industry': 'Infotech', 'Revenue(US$ millions)': '678', 'Profits(US$ millions)': '21'}, {'Rank': '12', 'Company': 'Julius Berger', 'Industry': 'Construction', 'Revenue(US$ millions)': '631', 'Profits(US$ millions)': '3'}, {'Rank': '13', 'Company': 'Nigerian Bottling Company', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '627', 'Profits(US$ millions)': '...'}, {'Rank': '14', 'Company': 'Lafarge Africa', 'Industry': 'Cement', 'Revenue(US$ millions)': '602', 'Profits(US$ millions)': '97'}, {'Rank': 
'15', 'Company': 'Dangote Sugar Refinery', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '559', 'Profits(US$ millions)': '78'}, {'Rank': '16', 'Company': 'BUA Cement', 'Industry': 'Cement', 'Revenue(US$ millions)': '547', 'Profits(US$ millions)': '184'}, {'Rank': '17', 'Company': 'TotalEnergies Nigeria', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '534', 'Profits(US$ millions)': '5'}, {'Rank': '18', 'Company': 'Seplat Petroleum Development', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '498', 'Profits(US$ millions)': '−80'}, {'Rank': '19', 'Company': 'Ardova Plc', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '474', 'Profits(US$ millions)': '5'}, {'Rank': '20', 'Company': '11PLC', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '428', 'Profits(US$ millions)': '16'}, {'Rank': '21', 'Company': 'International Breweries plc', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '357', 'Profits(US$ millions)': '−32'}, {'Rank': '22', 'Company': 'Conoil', 'Industry': 'Oil and gas', 'Revenue(US$ millions)': '307', 'Profits(US$ millions)': '...'}, {'Rank': '23', 'Company': 'Honeywell Flour Mill', 'Industry': 'Agroindustry', 'Revenue(US$ millions)': '286', 'Profits(US$ millions)': '3'}, {'Rank': '24', 'Company': 'PZ Cussons Nigeria', 'Industry': 'Consumer goods', 'Revenue(US$ millions)': '216', 'Profits(US$ millions)': '4'}, {'Rank': '25', 'Company': 'UAC of Nigeria', 'Industry': 'Conglomerate', 'Revenue(US$ millions)': '213', 'Profits(US$ millions)': '11'}]
large_rev_data = pd.DataFrame(rev_data)
large_rev_data.head()

| | Rank | Company | Industry | Revenue(US$ millions) | Profits(US$ millions) |
|---|---|---|---|---|---|
| 0 | 1 | Nigeria National Petroleum | Oil and gas | 9,706 | 1,877 |
| 1 | 2 | Nigeria Liquefied Natural Gas | Oil and gas | 6,315 | ... |
| 2 | 3 | MTN Nigeria | Telecommunications | 3,514 | 536 |
| 3 | 4 | Dangote Cement | Cement | 2,699 | 721 |
| 4 | 5 | Nigerian Petroleum Development | Oil and gas | 2,686 | 219 |
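Every value in the scraped data is a string, so the revenue and profit columns need converting before any EDA. The sketch below handles the three cases visible in the data: thousands separators, the Unicode minus sign in negative profits, and `...` placeholders. Treating `...` as a missing value is an assumption, and `to_numeric_column` is a hypothetical helper name.

```python
import pandas as pd

# Three rows copied from rev_data, one for each special case.
sample = pd.DataFrame([
    {"Company": "Nigeria National Petroleum",
     "Revenue(US$ millions)": "9,706", "Profits(US$ millions)": "1,877"},
    {"Company": "Nigeria Liquefied Natural Gas",
     "Revenue(US$ millions)": "6,315", "Profits(US$ millions)": "..."},
    {"Company": "Seplat Petroleum Development",
     "Revenue(US$ millions)": "498", "Profits(US$ millions)": "\N{MINUS SIGN}80"},
])


def to_numeric_column(series):
    """Parse strings such as '9,706' or '...' into numbers (NaN if unparseable)."""
    cleaned = (
        series.str.replace(",", "", regex=False)                # drop thousands separators
              .str.replace("\N{MINUS SIGN}", "-", regex=False)  # Unicode minus -> ASCII hyphen
    )
    return pd.to_numeric(cleaned, errors="coerce")               # '...' becomes NaN


for column in ["Revenue(US$ millions)", "Profits(US$ millions)"]:
    sample[column] = to_numeric_column(sample[column])

print(sample.dtypes)
```

The same loop could be applied to `large_rev_data` before saving it.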


In [63]:
# save the DataFrame to a CSV file (without the index column)
large_rev_data.to_csv('largest_companies_rev_2024.csv', index=False)

## **Largest Companies in Nigeria - Based on Market Capitalization**

### ***Scrape data for the Largest Companies in Nigeria based on market capitalization***
Table tag in the browser inspector: `<table class="wikitable sortable jquery-tablesorter">`. This is the second table matched by the `find_all` call above.

In [55]:
# get the market capitalization headers with tags
mkt_cap_headers = revenue_table[1].find_all('th')
mkt_cap_headers

[<th>Rank
 </th>,
 <th>Company
 </th>,
 <th>Industry
 </th>,
 <th>Market cap<br/>(US$ millions)
 </th>]

In [57]:
# get the market capitalization headers without tags
mkt_cap_titles = []

for header in mkt_cap_headers:
    text = header.get_text().strip()
    mkt_cap_titles.append(text)
mkt_cap_titles

['Rank', 'Company', 'Industry', 'Market cap(US$ millions)']

In [59]:
# extract rows (excluding the header row)
rows = revenue_table[1].find_all('tr')[1:]

# initialize an empty list to store the rows of data
mkt_cap_data = []

# loop through each row and extract the columns
for row in rows:
    columns = row.find_all('td')
    
    # extract the text from each column and strip surrounding whitespace
    row_data = [col.get_text().strip() for col in columns]
    
    # combine the header titles and row data
    mkt_cap_data.append(dict(zip(mkt_cap_titles, row_data)))

print(mkt_cap_data)

[{'Rank': '1', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '11,203'}, {'Rank': '2', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '10,471'}, {'Rank': '3', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '6,903'}, {'Rank': '4', 'Company': 'BUA Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '5,759'}, {'Rank': '5', 'Company': 'Nestle Nigeria', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '2,658'}, {'Rank': '6', 'Company': 'BUA Foods', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '2,575'}, {'Rank': '7', 'Company': 'Zenith Bank', 'Industry': 'Banking', 'Market cap(US$ millions)': '1,691'}, {'Rank': '8', 'Company': 'Guaranty Trust Holding Company PLC', 'Industry': 'Finance', 'Market cap(US$ millions)': '1,585'}, {'Rank': '9', 'Company': 'First Bank of Nigeria', 'Industry': 'Banking', 'Market cap(US$ millions)': '1,070'}, {'Rank': '10', 'Com
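The scraped market capitalization rows include companies labelled `Banking` and `Finance`, while the project description excludes banks. One possible way to filter them out is sketched below. Which industry labels count as banks is an assumption made for illustration. The sample rows are copied from the output above.

```python
# A subset of rows copied from the mkt_cap_data output.
sample_rows = [
    {"Rank": "5", "Company": "Nestle Nigeria", "Industry": "Agroindustry",
     "Market cap(US$ millions)": "2,658"},
    {"Rank": "6", "Company": "BUA Foods", "Industry": "Agroindustry",
     "Market cap(US$ millions)": "2,575"},
    {"Rank": "7", "Company": "Zenith Bank", "Industry": "Banking",
     "Market cap(US$ millions)": "1,691"},
    {"Rank": "8", "Company": "Guaranty Trust Holding Company PLC", "Industry": "Finance",
     "Market cap(US$ millions)": "1,585"},
]

# Assumption: both of these industry labels are treated as banks.
BANK_INDUSTRIES = {"Banking", "Finance"}

non_bank_rows = [row for row in sample_rows if row["Industry"] not in BANK_INDUSTRIES]

for row in non_bank_rows:
    print(row["Rank"], row["Company"])
```

The original `Rank` values are left as scraped, so the filtered list has gaps in its ranking.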

In [61]:
mcap_data = [{'Rank': '1', 'Company': 'Dangote Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '11,203'}, {'Rank': '2', 'Company': 'MTN Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '10,471'}, {'Rank': '3', 'Company': 'Airtel Nigeria', 'Industry': 'Telecommunications', 'Market cap(US$ millions)': '6,903'}, {'Rank': '4', 'Company': 'BUA Cement', 'Industry': 'Cement', 'Market cap(US$ millions)': '5,759'}, {'Rank': '5', 'Company': 'Nestle Nigeria', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '2,658'}, {'Rank': '6', 'Company': 'BUA Foods', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '2,575'}, {'Rank': '7', 'Company': 'Zenith Bank', 'Industry': 'Banking', 'Market cap(US$ millions)': '1,691'}, {'Rank': '8', 'Company': 'Guaranty Trust Holding Company PLC', 'Industry': 'Finance', 'Market cap(US$ millions)': '1,585'}, {'Rank': '9', 'Company': 'First Bank of Nigeria', 'Industry': 'Banking', 'Market cap(US$ millions)': '1,070'}, {'Rank': '10', 'Company': 'Stanbic IBTC Holdings', 'Industry': 'Finance', 'Market cap(US$ millions)': '1,064'}, {'Rank': '11', 'Company': 'Lafarge Africa', 'Industry': 'Cement', 'Market cap(US$ millions)': '918'}, {'Rank': '12', 'Company': 'Access Holdings', 'Industry': 'Finance', 'Market cap(US$ millions)': '833'}, {'Rank': '13', 'Company': 'Nigerian Breweries', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '890'}, {'Rank': '14', 'Company': 'United Bank for Africa', 'Industry': 'Finance', 'Market cap(US$ millions)': '633'}, {'Rank': '15', 'Company': 'Ecobank', 'Industry': 'Banking', 'Market cap(US$ millions)': '529'}, {'Rank': '16', 'Company': 'Dangote Sugar Refinery', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '467'}, {'Rank': '17', 'Company': 'Union Bank of Nigeria', 'Industry': 'Banking', 'Market cap(US$ millions)': '431'}, {'Rank': '18', 'Company': 'Guinness Nigeria', 'Industry': 'Consumer goods', 'Market cap(US$ millions)': '375'}, {'Rank': '19', 'Company': 
'Okomu Oil Palm', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '343'}, {'Rank': '20', 'Company': 'Presco PLC', 'Industry': 'Agroindustry', 'Market cap(US$ millions)': '320'}]
large_mcap_data = pd.DataFrame(mcap_data)
large_mcap_data.head()

| | Rank | Company | Industry | Market cap(US$ millions) |
|---|---|---|---|---|
| 0 | 1 | Dangote Cement | Cement | 11,203 |
| 1 | 2 | MTN Nigeria | Telecommunications | 10,471 |
| 2 | 3 | Airtel Nigeria | Telecommunications | 6,903 |
| 3 | 4 | BUA Cement | Cement | 5,759 |
| 4 | 5 | Nestle Nigeria | Agroindustry | 2,658 |


In [65]:
# save the DataFrame to a CSV file (without the index column)
large_mcap_data.to_csv('largest_companies_mcap_2024.csv', index=False)