# **Hands-on Lab: Web Scraping**


Estimated time needed: **30 to 45** minutes


## Objectives


In this lab, we will:


* Extract information from a given website.
* Write the scraped data to a CSV file.


## Extract information from the given website
We will extract the data from the following website:


```python
# This URL points to the page containing the data we need to scrape
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"
```

The data we need to scrape is the **name of the programming language** and its **average annual salary**. It is a good idea to open the URL in a web browser and study the contents of the page before we start scraping.
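Before hard-coding column positions, it can help to list each cell of the table's first row together with its index. The following is a minimal sketch run against a small hypothetical HTML snippet rather than the real page: the `Col 0` and `Col 2` labels are placeholders, and only the positions of `Language` (index 1) and `Average Annual Salary` (index 3) are taken from the printed output later in this lab. The helper name `header_columns` is our own.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; "Col 0" and "Col 2" are placeholders.
SAMPLE_HTML = """
<table>
  <tr><td>Col 0</td><td>Language</td><td>Col 2</td><td>Average Annual Salary</td></tr>
  <tr><td>1</td><td>Python</td><td>-</td><td>$114,383</td></tr>
</table>
"""


def header_columns(html):
    """Return (index, text) pairs for the cells in the first row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    first_row = soup.find("table").find("tr")
    # Header cells may be <td> or <th>, so look for both tag names
    cells = first_row.find_all(["td", "th"])
    return [(i, cell.get_text(strip=True)) for i, cell in enumerate(cells)]


for index, label in header_columns(SAMPLE_HTML):
    print(index, label)
```

On the real page, passing `requests.get(url).text` instead of `SAMPLE_HTML` would show which index holds each field.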


Import the required libraries


```python
from bs4 import BeautifulSoup  # parses HTML so we can navigate its tags (web scraping)
import requests  # downloads the web page over HTTP
```

Download the web page at the URL


```python
# Fetch the page and keep its raw HTML as a string
data = requests.get(url).text
```
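The one-line download does not check whether the request succeeded: if the server returned an error page, BeautifulSoup would parse that page instead. A minimal sketch of a more defensive download, using the `timeout` argument of `requests.get` and `Response.raise_for_status()`; the helper name `fetch_html` and the 30-second timeout are our own choices, not part of the lab.

```python
import requests


def fetch_html(page_url, timeout=30):
    """Download a page and return its HTML, raising on HTTP or connection errors."""
    # timeout (in seconds) stops the request from hanging indefinitely; 30 is arbitrary
    response = requests.get(page_url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page
    response.raise_for_status()
    return response.text
```

With this helper, the download step would become `data = fetch_html(url)`.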

Create a soup object


```python
# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(data, "html.parser")
```
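Once the soup exists, BeautifulSoup's `find()` returns the first matching tag (or `None` if there is none), while `find_all()` returns a list of every match. Because the lab calls `table.find_all(...)` directly, a page without a `<table>` would raise an `AttributeError`. A minimal sketch that guards against this; the helper name `find_table_rows` is hypothetical.

```python
from bs4 import BeautifulSoup


def find_table_rows(html):
    """Return the <tr> tags of the first <table> in html, or [] if there is no table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # first matching tag, or None
    if table is None:
        return []
    return table.find_all("tr")  # list of every <tr> inside the table


rows = find_table_rows("<table><tr><td>a</td></tr><tr><td>b</td></tr></table>")
print(len(rows))  # 2
print(find_table_rows("<p>no table here</p>"))  # []
```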

Scrape the language name and the average annual salary from each row of the table.


```python
table = soup.find('table')
for row in table.find_all('tr'):  # in HTML, a table row is represented by the <tr> tag
    # Get all cells in this row; in HTML, a data cell is represented by the <td> tag
    cols = row.find_all('td')
    if len(cols) < 4:  # skip rows that do not have enough cells
        continue
    language = cols[1].getText()    # second column (index 1) holds the language name
    avg_salary = cols[3].getText()  # fourth column (index 3) holds the average annual salary
    print("{}--->{}".format(language, avg_salary))
```

```
Language--->Average Annual Salary
Python--->$114,383
Java--->$101,013
R--->$92,037
Javascript--->$110,981
Swift--->$130,801
C++--->$113,865
C#--->$88,726
PHP--->$84,727
SQL--->$84,793
Go--->$94,082
```
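The first printed line is the page's own header row, which also uses `<td>` cells. A minimal sketch that collects only the data rows as `(language, salary)` pairs, assuming (as the output suggests) that the header is the first row, the language is at index 1 and the salary at index 3. The function name `extract_language_salaries` and the sample HTML are hypothetical; `Col 0` and `Col 2` are placeholders.

```python
from bs4 import BeautifulSoup


def extract_language_salaries(html):
    """Return (language, salary) pairs from the first table, skipping its header row."""
    table = BeautifulSoup(html, "html.parser").find("table")
    pairs = []
    for row in table.find_all("tr")[1:]:  # [1:] skips the header row
        cols = row.find_all("td")
        if len(cols) < 4:  # ignore rows without enough cells
            continue
        # strip=True removes surrounding whitespace from the cell text
        pairs.append((cols[1].get_text(strip=True), cols[3].get_text(strip=True)))
    return pairs


SAMPLE_HTML = """
<table>
  <tr><td>Col 0</td><td>Language</td><td>Col 2</td><td>Average Annual Salary</td></tr>
  <tr><td>1</td><td> Python </td><td>-</td><td>$114,383</td></tr>
  <tr><td>2</td><td>Java</td><td>-</td><td>$101,013</td></tr>
</table>
"""
print(extract_language_salaries(SAMPLE_HTML))
# [('Python', '$114,383'), ('Java', '$101,013')]
```

Returning a list instead of printing makes the same rows reusable for the CSV step.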


Save the scraped data to a file named *popular-languages.csv*


```python
import csv

table = soup.find('table')

# Open a CSV file for writing
with open('popular-languages.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)

    # Write headers
    writer.writerow(['Language', 'Average Annual Salary'])

    # Loop through the data rows; [1:] skips the page's own header row,
    # which would otherwise be written as a duplicate header
    for row in table.find_all('tr')[1:]:
        cols = row.find_all('td')
        if len(cols) >= 4:  # make sure the row has enough columns
            language = cols[1].getText(strip=True)
            avg_salary = cols[3].getText(strip=True)
            writer.writerow([language, avg_salary])
```
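To check the result, the file can be read back with `csv.DictReader`, which uses the header row as dictionary keys. Note that `csv.writer` quotes values such as `$114,383` because they contain a comma, and `DictReader` removes the quotes again. This sketch uses an in-memory `io.StringIO` buffer and two sample rows from the output above so it runs without the web page; the helper names `write_rows` and `read_rows` are hypothetical. For the real file, open `popular-languages.csv` with `newline=''` and `encoding='utf-8'` instead.

```python
import csv
import io


def write_rows(file_obj, rows):
    """Write a header plus (language, salary) rows to an open text file."""
    writer = csv.writer(file_obj)
    writer.writerow(["Language", "Average Annual Salary"])
    writer.writerows(rows)


def read_rows(file_obj):
    """Read the CSV back as a list of dicts keyed by the header row."""
    return list(csv.DictReader(file_obj))


buffer = io.StringIO(newline="")
write_rows(buffer, [("Python", "$114,383"), ("C++", "$113,865")])
buffer.seek(0)  # rewind before reading
records = read_rows(buffer)
print(records[0]["Language"], records[0]["Average Annual Salary"])  # Python $114,383
print(len(records))  # 2
```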
