## Web Scraping with Python - Beautiful Soup Crash Course
```
⭐️ Course Contents ⭐️
Local HTML Scraping:
⌨️ (00:00) Basic HTML Structure, HTML Tags Explanation
⌨️ (05:35) Packages Installation
⌨️ (07:23) Scraping Usage, Local files
⌨️ (12:41) Beautiful Soup find & find_all() methods
⌨️ (16:22) Web Browser Inspect tool
⌨️ (18:30) Grab all Prices, Basic Scraping Project

Website Scraping:
⌨️ (24:48) Using the Requests Library to see a Website's HTML  
⌨️ (30:10) Scraping a Production Website, Best practices for pulling info
⌨️ (44:05) Looping through similar soup.find_all() objects

Features addition:
⌨️ (48:26) Prettifying the Jobs paragraph
⌨️ (54:05) Jobs Filtration by owned skills
⌨️ (57:45) Setting up the Project to scrape every 10 minutes
⌨️ (1:01:53) Storing the jobs paragraph in text files
```

source: https://www.youtube.com/watch?v=XVv6mJpFOb0&t=335s

In [9]:
#!pip install beautifulsoup4 lxml requests

In [3]:
import requests
from bs4 import BeautifulSoup

In [15]:
with open("home.html", 'r') as html_file:
    content = html_file.read()
    print(content)

<!doctype html>
<html lang="en">
   <head>
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
      <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" integrity="sha384-JcKb8q3iqJ61gNV9KGb8thSsNjpSL0n8PARn9HuZOnIxN0hoP+VmmDGMN5t9UJ0Z" crossorigin="anonymous">
      <title>My Courses</title>
   </head>
   <body>
      <h1>Hello, Start Learning!</h1>
      <div class="card" id="card-python-for-beginners">
         <div class="card-header">
            Python
         </div>
         <div class="card-body">
            <h5 class="card-title">Python for beginners</h5>
            <p class="card-text">If you are new to Python, this is the course that you should buy!</p>
            <a href="#" class="btn btn-primary">Start for 20$</a>
         </div>
      </div>
      <div class="card" id="card-python-web-development">
         <div class="card-header">
            Pyt

In [17]:
with open("home.html", 'r') as html_file:
    content = html_file.read()
    #print(content)
    
    soup = BeautifulSoup(content, 'lxml')  # name the parser explicitly
    print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
  <link crossorigin="anonymous" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css" integrity="sha384-JcKb8q3iqJ61gNV9KGb8thSsNjpSL0n8PARn9HuZOnIxN0hoP+VmmDGMN5t9UJ0Z" rel="stylesheet"/>
  <title>
   My Courses
  </title>
 </head>
 <body>
  <h1>
   Hello, Start Learning!
  </h1>
  <div class="card" id="card-python-for-beginners">
   <div class="card-header">
    Python
   </div>
   <div class="card-body">
    <h5 class="card-title">
     Python for beginners
    </h5>
    <p class="card-text">
     If you are new to Python, this is the course that you should buy!
    </p>
    <a class="btn btn-primary" href="#">
     Start for 20$
    </a>
   </div>
  </div>
  <div class="card" id="card-python-web-development">
   <div class="card-header">
    Python
   </div>
   <div class="card-body">
    <h5 class="ca

In [21]:
with open("home.html", 'r') as html_file:
    content = html_file.read()
    soup = BeautifulSoup(content, 'lxml')  # name the parser explicitly

In [27]:
# find() returns only the first matching tag
tag = soup.find('h5')
print(tag)

<h5 class="card-title">Python for beginners</h5>


In [28]:
courses_html_tags = soup.find_all('h5')
print(courses_html_tags)

[<h5 class="card-title">Python for beginners</h5>, <h5 class="card-title">Python Web Development</h5>, <h5 class="card-title">Python Machine Learning</h5>]


In [30]:
for course in courses_html_tags:
    print(course.text)

Python for beginners
Python Web Development
Python Machine Learning
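
The difference between `find()` and `find_all()` is easier to see on a small inline document. The sketch below is illustrative only: the `html` string is a hypothetical stand-in for `home.html`, and it uses the built-in `html.parser` so it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for home.html, trimmed to the tags that matter here.
html = """
<div class="card"><h5 class="card-title">Python for beginners</h5></div>
<div class="card"><h5 class="card-title">Python Web Development</h5></div>
"""

# html.parser ships with Python, so no extra install is needed for this sketch.
soup = BeautifulSoup(html, "html.parser")

first = soup.find("h5")        # first matching tag, or None when nothing matches
all_h5 = soup.find_all("h5")   # list-like result with every match, possibly empty
missing = soup.find("h6")      # there is no <h6> in the markup above

titles = [tag.get_text(strip=True) for tag in all_h5]
print(first.text)
print(titles)
print(missing)
```

Because `find()` can return `None`, chained access such as `soup.find("h6").text` raises `AttributeError` when the tag is absent, so it is worth checking the result first.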


In [43]:
course_cards = soup.find_all('div', class_='card')
for course in course_cards:
    # Each card holds the course title in its <h5> and the price in its <a> button.
    course_name = course.h5.text
    # The button text looks like "Start for 20$"; the last token is the price.
    course_price = course.a.text.split()[-1]

    print(f'{course_name}: costs {course_price}')

Python for beginners: costs 20$
Python Web Development: costs 50$
Python Machine Learning: costs 100$
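
The prices above are still strings such as `20$`. If they need to be compared or summed, one possible approach is to strip the currency sign and convert to `int`. This is a minimal sketch, assuming every price is a whole number followed by `$`; the inline `html` is a shortened, hypothetical copy of two cards from `home.html`.

```python
from bs4 import BeautifulSoup

# Shortened copy of two course cards; the real file has three.
html = """
<div class="card">
  <div class="card-body">
    <h5 class="card-title">Python for beginners</h5>
    <a href="#" class="btn btn-primary">Start for 20$</a>
  </div>
</div>
<div class="card">
  <div class="card-body">
    <h5 class="card-title">Python Web Development</h5>
    <a href="#" class="btn btn-primary">Start for 50$</a>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

prices = {}
# class_="card" matches the exact class token, so "card-body" divs are skipped.
for card in soup.find_all("div", class_="card"):
    name = card.h5.get_text(strip=True)
    # Last whitespace-separated token of the button text, e.g. "20$".
    raw_price = card.a.get_text(strip=True).split()[-1]
    # Assumption: the price is an integer with a trailing "$".
    prices[name] = int(raw_price.rstrip("$"))

print(prices)
```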


## Scraping a job advertising website

In [44]:
import requests
from bs4 import BeautifulSoup

In [68]:
url = "https://www.jobijoba.com/fr/query/?what=python"

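# The second positional argument of requests.get() is `params`, not a parser name,
# so the parser is chosen later, when the HTML is handed to BeautifulSoup.
res = requests.get(url)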
res

<Response [200]>
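
In `requests.get(url, params)`, the second positional argument is the query string, so search terms can be passed there instead of being written into the URL by hand. The sketch below builds the request without sending it, which lets the final URL be checked offline. The `fetch_html` helper and its 10-second timeout are assumptions for illustration, not part of the course.

```python
import requests

base_url = "https://www.jobijoba.com/fr/query/"

# Build the request without sending it, to inspect the final URL offline.
prepared = requests.Request("GET", base_url, params={"what": "python"}).prepare()
print(prepared.url)


def fetch_html(url, params=None, timeout=10):
    """Hypothetical helper: return the page HTML, raising on 4xx/5xx responses."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(base_url, params={"what": "python"})` would then perform the actual download; `raise_for_status()` turns an error status into an exception instead of silently returning an error page.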

In [69]:
res.encoding

'UTF-8'

In [70]:
#res.encoding = "utf-8"

In [64]:
print(res.text)

<!DOCTYPE html>
<html lang="fr-FR">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <link rel="shortcut icon" type="image/x-icon" href="/build/images/favicon.ico"/>
    <link rel="apple-touch-icon" sizes="192x192" href="/build/images/icons/icon-192x192.png"/>
    <title>Emploi python - Mars 2023 - Jobijoba</title>
    <meta name="description" content="Trouvez votre emploi python parmi les 24574 offres proposées par Jobijoba ➤ CDI, CDD, Stages ☑ Alertes personnalisées par mail"/>
    <link rel="canonical" href="https://www.jobijoba.com/fr/query/"/>
    <meta name="referrer" content="unsafe-url"/>
    <meta name="theme-color" content="#ffffff">
    <meta name="viewport"
          content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, user-scalable=1"/>
    <meta name="robots" content="noindex, nofollow">
    <meta name="google" content="nositelinkssearchbox"/>
    <link rel="preconnect" href="https://jobijoba.imgix.net

In [66]:
soup = BeautifulSoup(res.text, 'lxml')  # name the parser explicitly
jobs = soup.find_all('div', class_="offer")
print(jobs)

[<div class="offer" data-id="ad_7dcf881f6b46fb78f7c5d8d938e0c384">
<div class="actions float-right">
<span class="icon-save-ad icon-heart-empty js-save-ad" title="Sauvegarder l'offre"></span>
</div>
<a class="offer-link" href="https://www.jobijoba.com/fr/redirect/offer/499/7dcf881f6b46fb78f7c5d8d938e0c384" onclick='dataLayer.push({"event":"productClick","ecommerce":{"click":{"actionField":{"list":"search_results_index"},"products":[{"name":"D\u00e9veloppeur Python S&amp;R (H\/F)","id":"7dcf881f6b46fb78f7c5d8d938e0c384","price":0,"brand":"EURO-INFORMATION PRODUCTION","category":"Informatique","variant":"emploi_payant_sponso","rhw":1348383.44,"chw":1341285,"position":1,"dimension24":"indeed.fr_masterfeed"}]}}});' rel="nofollow" target="_blank">
<div class="offer-header">
<h3 class="offer-header-title">
                Développeur Python S&amp;R (H/F)
            </h3>
<div class="offer-features">
<span class="feature">
<span class="iconwrap">
<span class="icon-map-marker"></span>
</span>

In [67]:
job = soup.find('div', class_="offer")
print(job)

<div class="offer" data-id="ad_7dcf881f6b46fb78f7c5d8d938e0c384">
<div class="actions float-right">
<span class="icon-save-ad icon-heart-empty js-save-ad" title="Sauvegarder l'offre"></span>
</div>
<a class="offer-link" href="https://www.jobijoba.com/fr/redirect/offer/499/7dcf881f6b46fb78f7c5d8d938e0c384" onclick='dataLayer.push({"event":"productClick","ecommerce":{"click":{"actionField":{"list":"search_results_index"},"products":[{"name":"D\u00e9veloppeur Python S&amp;R (H\/F)","id":"7dcf881f6b46fb78f7c5d8d938e0c384","price":0,"brand":"EURO-INFORMATION PRODUCTION","category":"Informatique","variant":"emploi_payant_sponso","rhw":1348383.44,"chw":1341285,"position":1,"dimension24":"indeed.fr_masterfeed"}]}}});' rel="nofollow" target="_blank">
<div class="offer-header">
<h3 class="offer-header-title">
                Développeur Python S&amp;R (H/F)
            </h3>
<div class="offer-features">
<span class="feature">
<span class="iconwrap">
<span class="icon-map-marker"></span>
</span>
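
Once a single offer card is isolated, its fields can be read from the child tags and attributes. The following is a sketch on a trimmed copy of the card printed above (only the tags visible in that output are kept), parsed with the built-in `html.parser`. The variable names are illustrative, and the live page may change its markup at any time.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first offer shown in the output; the real card has more children.
html = """
<div class="offer" data-id="ad_7dcf881f6b46fb78f7c5d8d938e0c384">
<a class="offer-link" href="https://www.jobijoba.com/fr/redirect/offer/499/7dcf881f6b46fb78f7c5d8d938e0c384" rel="nofollow" target="_blank">
<div class="offer-header">
<h3 class="offer-header-title">
                Développeur Python S&amp;R (H/F)
            </h3>
</div>
</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
job = soup.find("div", class_="offer")

# get_text(strip=True) removes the surrounding whitespace and decodes &amp;.
title = job.find("h3", class_="offer-header-title").get_text(strip=True)
# Attributes are read with dictionary-style access on the tag.
link = job.find("a", class_="offer-link")["href"]
offer_id = job["data-id"]

print(title)
print(link)
print(offer_id)
```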


In [71]:
soup = BeautifulSoup(res.text, 'lxml')  # name the parser explicitly
job = soup.find('div', class_="offer")

company_name = job.find("h3")