A web crawler that searches the web for URLs. Currently it serves no purpose other than collecting the different URLs it finds across the web.
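For context, the core technique here is link extraction: download a page and pull out every URL it references. Below is a minimal standard-library sketch of that idea; it is not the project's actual code, and WebCrawler.py may do this quite differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from every <a href="..."> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.add(urljoin(self.base_url, value))

def extract_links(url):
    """Download one page and return the set of URLs it links to."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = LinkExtractor(url)
    parser.feed(html)
    return parser.links
```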
- Python >= 3
- pipenv
OR
- Docker Compose
Navigate to the project root and run the following command:
ROOT_URL=insert_root_url_here docker-compose up --build --scale web-crawler=2
This starts the web-crawler-scheduler and two web-crawler instances. The scale value can be any number, but take care not to overload the target server.
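Compose passes the root URL to the scheduler through the ROOT_URL environment variable, so the scheduler presumably picks it up with something like the following. This is a sketch only; the actual main.py may read it differently.

```python
import os
import sys

# Hypothetical: Compose injects the seed URL as an environment variable.
root_url = os.environ.get("ROOT_URL")
if not root_url:
    sys.exit("ROOT_URL environment variable is not set")
```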
OR
Open two separate terminals and navigate to the web-crawler-scheduler and web-crawler directories respectively. In both, first run:
pipenv install
Then, in the web-crawler-scheduler directory, run
pipenv run python main.py insert_root_url_here
and in the web-crawler directory run
pipenv run python main.py
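Note that the scheduler command above takes the root URL as a command-line argument rather than an environment variable; main.py presumably reads it along these lines (again a hypothetical sketch, not the actual code):

```python
import sys

# Hypothetical: the seed URL arrives as the first command-line argument.
if len(sys.argv) < 2:
    sys.exit("usage: python main.py <root_url>")
root_url = sys.argv[1]
```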
All URLs the crawler finds are stored in /web-crawler-scheduler/data/data.txt.
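Since the output is plain text with one URL per line, persistence boils down to appending deduplicated URLs, roughly like the sketch below. The names here (DATA_FILE, _seen, save_url) are made up for illustration and are not taken from the project.

```python
DATA_FILE = "data/data.txt"  # relative to the web-crawler-scheduler directory

_seen = set()  # in-memory index of URLs already written to disk

def save_url(url):
    """Append a URL to the data file, one per line, skipping duplicates."""
    if url in _seen:
        return
    _seen.add(url)
    with open(DATA_FILE, "a", encoding="utf-8") as f:
        f.write(url + "\n")
```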
Please note that this project was created just to practice network programming with Python. If you choose to test this app, be careful not to overload the servers you are targeting: don't start too many crawlers at once, and don't remove the time.sleep(1) call that slows down the loop in the WebCrawler.py file. I'm not liable for any misuse of this application.
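For reference, the time.sleep(1) mentioned above implies a crawl loop shaped roughly like the sketch below. This is an assumed structure that reuses the extract_links helper from the first sketch; the real loop in WebCrawler.py differs in its details.

```python
import time
from collections import deque

def crawl(root_url, max_pages=100):
    """Hypothetical breadth-first crawl, fetching at most one page per second."""
    queue = deque([root_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            # extract_links is the helper from the first sketch above.
            for link in extract_links(url):
                if link not in visited:
                    queue.append(link)
        except OSError:
            pass  # unreachable or malformed page; skip it
        time.sleep(1)  # the politeness delay the warning above refers to
    return visited
```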