
thiiagoms/links-extractor

Extract links from URLs 🗜️

A Python library for extracting links from web pages.
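At its core, extracting links means collecting the `href` of every `<a>` element on a page. The library lists BeautifulSoup for this step. The minimal sketch below swaps in the standard library's `html.parser.HTMLParser` so it runs without extra installs, and `LinkCollector` and `extract_links` are hypothetical names, not part of this repository.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without a usable href (e.g. <a name="top">).
            if name == "href" and value:
                self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return the href of each <a> element found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="#main">Skip</a><p><a href="/features">Features</a><a name="x">no href</a></p>'
print(extract_links(page))  # ['#main', '/features']
```

As in the sample output in the Run section, hrefs come back exactly as written in the page, so fragments (`#main`) and relative paths (`/features`) are left unresolved.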

Dependencies

  • Python 3.8+
  • Requests
  • BeautifulSoup

Install

01 -) Clone:

$ git clone https://github.com/thiiagoms/links-extractor

02 -) Go to links-extractor directory:

$ cd links-extractor
links-extractor $

Run

01 -) In a script (for example, example.py), call the main Extractor class like this:

from src.services.extractor import Extractor
from src.utils.printer import Printer

# Pages to scan for links
urls = ['https://github.com', 'https://google.com']

extractor = Extractor()
# extract() returns a mapping of each input URL to the links found on that page
links = extractor.extract(urls, timeout=10)

for url, extracted_links in links.items():
    Printer.message(f"Url: {url}")
    for link in extracted_links:
        # Indent each link under the URL it was found on
        Printer.success(f"  {link}")
    Printer.message("###############")

You should see output similar to the following (the exact links depend on the live pages when you run it):

$ python example.py

Url: https://github.com

  #start-of-content
  https://github.com/
  /signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home
  /features/actions
  /features/packages
  /features/security

###############

Url: https://google.com

  https://www.google.com/imghp?hl=pt-BR&tab=wi
  https://maps.google.com.br/maps?hl=pt-BR&tab=wl
  https://play.google.com/?hl=pt-BR&tab=w8

###############
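The sample output mixes absolute URLs, root-relative paths such as `/features/actions`, and fragment links such as `#start-of-content`. If you need absolute URLs, a minimal post-processing sketch follows. It assumes `extract` returns a dict mapping each input URL to a list of raw `href` strings, which is what the loop over `links.items()` suggests; `absolutize` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from urllib.parse import urldefrag, urljoin


def absolutize(results: dict[str, list[str]], drop_fragments: bool = True) -> dict[str, list[str]]:
    """Resolve each extracted href against the page it came from.

    `results` is assumed to map a page URL to the raw hrefs found on it.
    Duplicates are removed per page, keeping the first occurrence.
    """
    resolved: dict[str, list[str]] = {}
    for page_url, hrefs in results.items():
        seen: set[str] = set()
        absolute: list[str] = []
        for href in hrefs:
            # Relative paths are joined onto the page URL; absolute URLs pass through.
            url = urljoin(page_url, href)
            if drop_fragments:
                # A fragment-only href like "#start-of-content" resolves to the page itself.
                url = urldefrag(url)[0]
            if url not in seen:
                seen.add(url)
                absolute.append(url)
        resolved[page_url] = absolute
    return resolved


sample = {"https://github.com": ["#start-of-content", "/features/actions", "/features/actions"]}
print(absolutize(sample))
# {'https://github.com': ['https://github.com', 'https://github.com/features/actions']}
```

Deduplication keeps the first occurrence so the page order is preserved. Note that `https://github.com` and `https://github.com/` remain distinct strings; normalize trailing slashes separately if that matters for your use.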

Bonus

01 -) Run tests with pytest:

links-extractor $ pytest
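Because `extract` downloads live pages, tests that call it directly depend on network access and on whatever those sites return that day. One way to keep a pytest suite offline is to make the fetch step injectable. The sketch below is hypothetical: `fetch_html`, `extract_many` and the policy of mapping unreachable URLs to an empty list are assumptions for illustration, and it swaps the standard library's `urllib` in for Requests.

```python
from __future__ import annotations

from typing import Callable
from urllib.request import urlopen


def fetch_html(url: str, timeout: float) -> str:
    """Download a page with urllib; blocking socket operations time out after `timeout` seconds."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_many(
    urls: list[str],
    parse: Callable[[str], list[str]],
    timeout: float = 10,
    fetch: Callable[[str, float], str] = fetch_html,
) -> dict[str, list[str]]:
    """Map each URL to the links `parse` finds on it; unreachable URLs map to []."""
    results: dict[str, list[str]] = {}
    for url in urls:
        try:
            html = fetch(url, timeout)
        except (OSError, ValueError):
            # URLError, HTTPError and socket timeouts all subclass OSError;
            # urlopen raises ValueError for a malformed URL.
            results[url] = []
            continue
        results[url] = parse(html)
    return results


# A pytest-style test that never touches the network: the fake fetcher
# stands in for fetch_html and simulates one reachable and one dead site.
def test_extract_many_skips_unreachable_urls():
    def fake_fetch(url: str, timeout: float) -> str:
        if "down" in url:
            raise OSError("connection refused")
        return "/a /b"

    result = extract_many(
        ["https://up.example", "https://down.example"],
        parse=str.split,
        fetch=fake_fetch,
    )
    assert result == {"https://up.example": ["/a", "/b"], "https://down.example": []}
```

Saved as, say, `tests/test_offline.py` (a hypothetical path), the test is picked up by the `pytest` command above.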

02 -) Format files with autopep8, for example:

links-extractor $ autopep8 --in-place --aggressive --aggressive src/services/extractor.py