GitHub - thiiagoms/links-extractor: :hammer: extract all links from website

Extract links from urls 🗜️

A Python library for extracting links from web pages.

Dependencies ➕
Install 📦
Run 🏃
Bonus 🏅

Dependencies

Python 3.8+
Requests
BeautifulSoup
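
Before running the examples, it can help to confirm that the interpreter and both third-party packages are available. The sketch below is not part of links-extractor. The helper name `missing_modules` is hypothetical, and the sketch assumes the usual import names `requests` (Requests) and `bs4` (BeautifulSoup 4).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Python 3.8+ is the minimum version listed under Dependencies.
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required")
    # Assumed import names: Requests -> "requests", BeautifulSoup -> "bs4".
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found")
```

The repository ships a Pipfile, so the packages are presumably meant to be installed from it. This check only reports what is missing and does not install anything.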

Install

01 -) Clone:

$ git clone https://github.com/thiiagoms/links-extractor

02 -) Go to the links-extractor directory:

$ cd links-extractor
links-extractor $

Run

01 -) In your script (for example, example.py), import the Extractor class and call it like this:

from src.services.extractor import Extractor
from src.utils.printer import Printer

# Pages to scan for links
urls = ['https://github.com', 'https://google.com']
extractor = Extractor()
links = extractor.extract(urls, timeout=10)

# Print each page URL followed by the links found on it
for url, extracted_links in links.items():
    Printer.message(f"Url: {url}")
    for link in extracted_links:
        Printer.success(f" {link}")
    Printer.message("###############")

Running the script should print output similar to this:

$ python example.py

Url: https://github.com

  #start-of-content
  https://github.com/
  /signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home
  /features/actions
  /features/packages
  /features/security

###############

Url: https://google.com

  https://www.google.com/imghp?hl=pt-BR&tab=wi
  https://maps.google.com.br/maps?hl=pt-BR&tab=wl
  https://play.google.com/?hl=pt-BR&tab=w8

###############
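
As the sample output shows, the extracted values appear to be raw `href` strings, so they can be relative paths (`/features/actions`) or in-page fragments (`#start-of-content`). The sketch below shows one possible way to turn them into absolute URLs with the standard library's `urllib.parse.urljoin`. It assumes `extract` returns a dict mapping each page URL to a list of link strings, as the loop in the example suggests. The helper name `absolutize` is hypothetical and not part of the library.

```python
from urllib.parse import urljoin


def absolutize(links_by_url):
    """Resolve every link against the URL of the page it was found on."""
    resolved = {}
    for page_url, hrefs in links_by_url.items():
        absolute = []
        for href in hrefs:
            # Skip pure in-page fragments such as "#start-of-content".
            if href.startswith("#"):
                continue
            # urljoin leaves absolute URLs unchanged and resolves relative ones.
            absolute.append(urljoin(page_url, href))
        resolved[page_url] = absolute
    return resolved


# Sample data shaped like the output above (assumed return format).
sample = {
    "https://github.com": [
        "#start-of-content",
        "https://github.com/",
        "/features/actions",
    ]
}
print(absolutize(sample))
```

Dropping fragments is a design choice for this sketch. Keep them if in-page anchors matter for your use case.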

Bonus

01 -) Run tests with pytest:

links-extractor $ pytest

02 -) Format a file with autopep8, for example:

links-extractor $ autopep8 --in-place --aggressive --aggressive src/services/extractor.py

Latest commit: build(deps): bump certifi from 2024.2.2 to 2024.7.4 (dependabot[bot] and thiiagoms, e475000, Jul 6, 2024), 54 commits

Name	Last commit message	Last commit date
.github/workflows	fix: fix python version in action file 💚	Jun 25, 2023
assets/img	static: fix logo path 🍱	Jun 25, 2023
src	feat: create custom cli printer ✨	Jun 25, 2023
tests/unit	Update test_extractor.py	Jun 26, 2023
.gitignore	static: update gitignore 🙈	Jun 25, 2023
Dockerfile	dockernize application 🐳	Apr 29, 2021
LICENSE	Add License 📚	May 3, 2021
Pipfile	build: add new dependencies ➕	Jun 25, 2023
Pipfile.lock	build(deps): bump certifi from 2024.2.2 to 2024.7.4	Jul 6, 2024
README.md	fix: update README.md 📝	Jun 25, 2023
example.py	docs: add example about how to use this library 📚	Jun 25, 2023
