## Scraping Le Monde

**Scrape the content of the [Le Monde website](https://www.lemonde.fr/) and save it as a CSV. We want: title, subhead, article URL, whether it's premium or not, byline, article type, and image URL.**

*Bonus, if you want to get fancy: Make the CSV file auto-updating.*

In [6]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

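# Fetch the English edition and fail early on HTTP errors instead of parsing an error page.
response = requests.get("https://www.lemonde.fr/en/", timeout=30)
response.raise_for_status()
# Name the parser explicitly so BeautifulSoup does not have to guess one.
doc = BeautifulSoup(response.text, "html.parser")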

In [7]:
articles = doc.find_all(class_='article')
print(len(articles))

53


In [8]:
rows = []


def text_or_none(parent, class_name):
    """Return the text of the first descendant with this class, or None if absent."""
    tag = parent.find(class_=class_name)
    return tag.text if tag else None


for article in articles:
    row = {}

    row['headline'] = text_or_none(article, 'article__title')

    # Use the first nested link; fall back to the element's own href.
    link = article.find('a')
    row['link'] = link.get('href') if link else article.get('href')

    # On this page the 'sr-only' element carries the "Subscribers only" label.
    row['premium'] = text_or_none(article, 'sr-only')
    row['subhead'] = text_or_none(article, 'article__desc')
    row['byline'] = text_or_none(article, 'article__byline')
    row['article_type'] = text_or_none(article, 'article__type')

    # Prefer the lazy-loading attributes, then fall back to the plain src.
    img = article.find('img')
    if img:
        row['image'] = img.get('data-lazy') or img.get('data-src') or img.get('src')
    else:
        row['image'] = None

    rows.append(row)

len(rows)


53
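
The task asks whether each article is premium or not, while the `premium` field stores the raw label text (or `None`). A minimal sketch of turning that label into a boolean follows. It assumes that "Subscribers only" is the only label marking premium articles, which matches the values visible in this notebook but has not been checked against the whole site. The helper name `is_premium` and the sample rows are hypothetical.

```python
def is_premium(label):
    """Return True when the scraped label marks a subscribers-only article."""
    # Assumption: "Subscribers only" is the only premium marker on the page.
    if label is None:
        return False
    return label.strip().lower() == "subscribers only"


# Hypothetical rows shaped like the ones built by the scraping loop.
sample_rows = [
    {"headline": "Example A", "premium": "Subscribers only"},
    {"headline": "Example B", "premium": None},
]

for sample in sample_rows:
    sample["premium"] = is_premium(sample["premium"])
```

Because `sr-only` is a generic class name for screen-reader text, other labels may appear under it; comparing against the exact label keeps those from being counted as premium.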

In [15]:
df = pd.DataFrame(rows)
df.head()

| | headline | link | premium | subhead | byline | article_type | image |
|---|---|---|---|---|---|---|---|
| 0 | The Islamic Republic of Iran, Netanyahu's favo... | https://www.lemonde.fr/en/international/articl... | Subscribers only | | | | https://img.lemde.fr/2025/06/13/0/0/3637/2424/... |
| 1 | Israel launches massive lightning attack on Iran | https://www.lemonde.fr/en/international/articl... | Subscribers only | | | | https://img.lemde.fr/2025/06/13/0/0/8640/5760/... |
| 2 | As Israel strikes Iran, US pointedly steps bac... | https://www.lemonde.fr/en/international/articl... | Subscribers only | | | | https://img.lemde.fr/2025/06/13/142/0/2700/180... |
| 3 | Federal judge rules Trump's deployment of Nati... | https://www.lemonde.fr/en/international/articl... | Subscribers only | | | | https://img.lemde.fr/2025/06/13/0/0/8256/5504/... |
| 4 | Criticism is mounting in the Netherlands again... | https://www.lemonde.fr/en/international/articl... | Subscribers only | | | | https://img.lemde.fr/2025/06/03/0/0/5899/3933/... |


In [13]:
df.to_csv('le_monde.csv', index=False)
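
For the bonus (an auto-updating CSV), one possible approach is to merge each new scrape into the existing file instead of overwriting it, keeping one row per article URL. The sketch below uses only the standard-library `csv` module so it can run from a scheduled script. The function name `update_csv`, the `FIELDS` list, and the choice of `link` as the deduplication key are assumptions made for illustration, not part of the original notebook.

```python
import csv
import os

# Column order matching the keys built in the scraping loop.
FIELDS = ["headline", "link", "premium", "subhead", "byline", "article_type", "image"]


def update_csv(path, new_rows, key="link"):
    """Merge new_rows into the CSV at path, keeping one row per key value."""
    merged = {}

    # Load rows saved by earlier runs, if the file already exists.
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                merged[row[key]] = row

    # Rows from the latest scrape replace older rows with the same key.
    for row in new_rows:
        merged[row[key]] = {field: row.get(field) for field in FIELDS}

    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops stray columns such as a saved index.
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(merged.values())

    return len(merged)
```

Calling `update_csv('le_monde.csv', rows)` after each scrape would grow the file over time. Running the script on a schedule (for example with cron or a CI workflow) is what would make it auto-updating. With pandas, a comparable merge could be done with `pd.concat` followed by `drop_duplicates(subset='link', keep='last')`.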