Sohom Datta edited this page May 22, 2024 · 2 revisions

The VisibleV8 Crawler is a framework that makes large-scale crawling of URLs with VisibleV8 much easier.

Quick start

  • Clone this repository
git clone ssh://git@github.com:/rekap-ncsu/vv8-crawler-slim.git
  • Set up the crawler locally
pip install -r ./scripts/requirements.txt
python3 ./scripts/vv8-cli.py setup
  • Crawl a webpage with a postprocessor
python3 ./scripts/vv8-cli.py crawl -u 'https://google.com' -pp 'Mfeatures'
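To crawl several pages, the crawl command above can be wrapped in a short script. The following is a minimal sketch, not part of the crawler: `build_crawl_command` and `crawl_all` are hypothetical helper names, and the only CLI arguments used are the ones shown above (`crawl`, `-u`, `-pp`). It assumes it is run from the repository root, and it does not cover whether the CLI has its own batch input.

```python
import subprocess
import sys


def build_crawl_command(url, postprocessor="Mfeatures"):
    """Build the argument list for one crawl, mirroring the quick start command."""
    return [
        sys.executable,            # the interpreter running this script
        "./scripts/vv8-cli.py",    # path taken from the quick start
        "crawl",
        "-u", url,
        "-pp", postprocessor,
    ]


def crawl_all(urls, postprocessor="Mfeatures"):
    """Run one crawl per URL, stopping at the first failing command."""
    for url in urls:
        subprocess.run(build_crawl_command(url, postprocessor), check=True)


if __name__ == "__main__":
    # Hypothetical usage: python3 crawl_many.py https://example.com https://example.org
    crawl_all(sys.argv[1:])
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as `&` or `?`.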

Quick troubleshooting

ModuleNotFoundError: No module named 'requests'

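The crawler's Python dependencies are not installed in the environment you are running it from. If your operating system manages your Python environment, create and activate a virtual environment with the following commands, then rerun the pip install step from the quick start inside it: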

python3 -m venv env
source ./env/bin/activate
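If the error persists, it can help to confirm that the interpreter in use really belongs to the virtual environment. A minimal sketch, assuming a standard `venv` environment; `in_virtualenv` is a hypothetical helper name:

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base installation.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the Python installation it was created from.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Run it with the same `python3` you use for the crawler. A result of `False` suggests the environment was not activated in the current shell.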

Cannot connect to the Docker daemon at unix:///var/run/docker.sock

The VV8 crawler expects your user to be able to run Docker without sudo. Docker's Linux post-installation instructions describe how to set this up.
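A quick way to test this before crawling is to run a Docker command as your normal user and look at its exit status. A minimal sketch, assuming the `docker` client is on your PATH; `docker_without_sudo` is a hypothetical helper name, and `docker ps` is used only because it needs a working connection to the daemon:

```python
import subprocess


def docker_without_sudo(run=subprocess.run):
    """Return True if `docker ps` succeeds for the current user."""
    try:
        result = run(
            ["docker", "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The docker client is missing or could not be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if docker_without_sudo():
        print("Docker is usable without sudo.")
    else:
        print("Docker is not usable by this user; check the daemon and group membership.")
```

The `run` parameter exists only so the check can be exercised without a Docker installation.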

File "./scripts/vv8-cli.py", line 39
   match opts.mode:
         ^
SyntaxError: invalid syntax

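The VV8 crawler only works with Python 3.10 and above. The script uses the match statement, which was added in Python 3.10, so older interpreters fail with this syntax error before running any code.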
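To see which interpreter a given `python3` command resolves to, a version check along these lines can be used. This is a minimal sketch; `supports_match` and `MIN_VERSION` are hypothetical names. It has to live in its own file, because a syntax error is raised while the file is being compiled, before any check inside that file could run.

```python
import sys

# The match statement requires Python 3.10 or newer.
MIN_VERSION = (3, 10)


def supports_match(version_info=None):
    """Return True if the given (or current) interpreter version is at least 3.10."""
    version = sys.version_info if version_info is None else version_info
    return tuple(version[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if supports_match():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; 3.10 or newer is needed.")
```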
