Home
Sohom Datta edited this page May 22, 2024 · 2 revisions
The VisibleV8 Crawler is a framework that makes large-scale crawling of URLs with VisibleV8 much easier.
- Clone this repository
git clone ssh://git@github.com:/rekap-ncsu/vv8-crawler-slim.git
- Set up the crawler locally
pip install -r ./scripts/requirements.txt
python3 ./scripts/vv8-cli.py setup
- Crawl a webpage with a postprocessor
python3 ./scripts/vv8-cli.py crawl -u 'https://google.com' -pp 'Mfeatures'
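To crawl several URLs, one option is to repeat the single-URL command above from a small script. The following is a minimal sketch, not part of the crawler itself: the helper names build_crawl_command and crawl_all are hypothetical, and it assumes only the crawl, -u and -pp arguments shown above. Whether vv8-cli.py has its own batch option is not covered here.

```python
import subprocess
import sys

# Path to the CLI as used in the commands above (relative to the repository root).
CLI_PATH = "./scripts/vv8-cli.py"


def build_crawl_command(url, postprocessor="Mfeatures"):
    """Return the argument list for one crawl, mirroring the documented command."""
    return [sys.executable, CLI_PATH, "crawl", "-u", url, "-pp", postprocessor]


def crawl_all(urls, postprocessor="Mfeatures", dry_run=True):
    """Build one crawl command per URL; run them in order unless dry_run is set."""
    commands = [build_crawl_command(url, postprocessor) for url in urls]
    if not dry_run:
        for command in commands:
            # check=True stops at the first crawl that exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling crawl_all(["https://google.com"], dry_run=False) from the repository root would run the same crawl as the command above, using the Python interpreter that runs the script.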
ModuleNotFoundError: No module named 'requests'
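The crawler's Python dependencies are probably not installed in the environment you are running it from. If your operating system manages your Python environment, you can create and activate a virtual environment using the following commands, then re-run the pip install step from the setup instructions above: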
python3 -m venv env
source ./env/bin/activate
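To confirm which environment Python is actually using, a quick check like the one below may help. It is a minimal sketch using only the standard library; the helper names in_virtualenv and module_available are illustrative and not part of the crawler.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != sys.base_prefix


def module_available(name):
    """True when a top-level module with this name can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("requests importable:", module_available("requests"))
```

If the second line reports False after activating the environment, the requirements have most likely not been installed inside it yet.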
Cannot connect to the Docker daemon at unix:///var/run/docker.sock
The VV8 crawler expects your user to be able to run Docker without sudo. You can follow the Linux post-installation steps in the Docker documentation to set up Docker to run without sudo.
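A quick way to test this before crawling is to run docker info as your normal user and look at its exit code. The sketch below does that from Python; the function name docker_usable and its injectable run and which parameters are assumptions made for illustration, not something the crawler provides.

```python
import shutil
import subprocess


def docker_usable(run=subprocess.run, which=shutil.which):
    """Return True if the docker client exists and can reach the daemon without sudo."""
    if which("docker") is None:
        # The docker client is not on PATH at all.
        return False
    # "docker info" exits with a non-zero status when the daemon cannot be reached,
    # for example because the current user lacks permission on the socket.
    result = run(
        ["docker", "info"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("docker usable without sudo:", docker_usable())
```

A False result with Docker installed usually points to the same permission problem as the error message above, or to a daemon that is not running.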
File "./scripts/vv8-cli.py", line 39
    match opts.mode:
          ^
SyntaxError: invalid syntax
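The VV8 crawler uses the match statement, which was added in Python 3.10, so it only works with Python 3.10 and above. Run python3 --version to check which version you have.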
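The same check can be made from inside Python, which avoids confusion when several interpreters are installed. This is a minimal sketch; MIN_VERSION and is_supported are illustrative names, and the crawler itself is not known to perform such a check.

```python
import sys

# Minimum version stated above: the match statement needs Python 3.10.
MIN_VERSION = (3, 10)


def is_supported(version_info=sys.version_info):
    """True when the given (major, minor, ...) version is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    if not is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise SystemExit(f"Python 3.10 or newer is required, found {found}")
    print("Python version is new enough")
```

Run it with the same interpreter you use for vv8-cli.py, since python and python3 can point to different versions.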