virgodarth/image-spider

Description

Extract and download images from websites based on input keywords

Available website

  • unsplash_com

Requirements

  • python >= 3.7
  • Ubuntu >= 18.04

How to set up the system

Install python

  1. Start by updating the package list and installing the prerequisites
  • sudo apt update
  • sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev wget libbz2-dev
  • sudo apt install software-properties-common
  2. Add the deadsnakes PPA to your sources list (used to install python3.8 or later)
  • sudo add-apt-repository ppa:deadsnakes/ppa
  • When prompted with "Press [ENTER] to continue or Ctrl-c to cancel adding it.", press Enter to continue.
  3. Once the repository is enabled, install Python 3.8
  • sudo apt install python3.8
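
After the install it can be useful to confirm that the interpreter meets the "python >= 3.7" requirement. The following is a minimal sketch, not part of the project: the helper name version_ge is hypothetical, and it assumes GNU sort with the -V option, which ships with Ubuntu.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A is at least B.
# Assumption: GNU sort's -V (version sort) is available, as on Ubuntu.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="3.7"
if command -v python3.8 >/dev/null 2>&1; then
    # `python3.8 --version` prints something like "Python 3.8.10";
    # keep only the second field.
    found="$(python3.8 --version 2>&1 | awk '{print $2}')"
    if version_ge "$found" "$required"; then
        echo "python3.8 reports $found (>= $required)"
    else
        echo "python3.8 reports $found, older than $required" >&2
    fi
else
    echo "python3.8 not found on PATH" >&2
fi
```

The comparison sorts the two versions and checks that the required one comes first, which avoids comparing version strings as plain text.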

Set up environment

  1. Install the Python venv package
  • sudo apt install python3.8-dev python3.8-venv
  2. Create a new virtualenv
  • python3.8 -m venv your_folder_name
  3. Activate the environment
  • source your_folder_name/bin/activate
  4. Deactivate the environment (if needed)
  • deactivate
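
To check whether the environment is currently active before installing packages, one option is to look at the VIRTUAL_ENV variable, which the venv activation script exports and deactivate unsets. This is a small sketch; the helper name in_venv is hypothetical.

```shell
#!/usr/bin/env bash
# in_venv: succeed when a virtualenv is active in the current shell.
# The venv activation script exports VIRTUAL_ENV; `deactivate` unsets it.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "active virtualenv: $VIRTUAL_ENV"
else
    echo "no virtualenv active; run: source your_folder_name/bin/activate"
fi
```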

Install the necessary Python packages

  • pip install -U pip wheel setuptools
  • pip install -r requirements/dev.txt

Run Code

  1. Move to the working directory
  • cd ./spider_app
  2. Set up settings.py for Scrapy
  • cp
  3. Show available spiders
  • scrapy list
  4. Choose and run a spider
  • scrapy crawl -a tags=your_keywords_are_separated_by_comma your_selected_spider
  • Example: scrapy crawl -a tags=flower,friend,baby unsplash_spider
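
When the keywords come from a script rather than being typed by hand, the comma-separated tags value can be assembled first. The sketch below only prints the resulting command (a dry run) and does not invoke Scrapy. The helper name join_tags is hypothetical, and the spider name is copied from the example above, so confirm it against the output of scrapy list.

```shell
#!/usr/bin/env bash
# join_tags: join all arguments with commas.
join_tags() {
    local IFS=','
    printf '%s\n' "$*"
}

spider="unsplash_spider"   # taken from the example above; verify with `scrapy list`
tags="$(join_tags flower friend)"

# Dry run: print the command instead of executing it.
echo scrapy crawl -a "tags=$tags" "$spider"
```

Removing the echo turns the dry run into the real call. Quoting "tags=$tags" keeps the whole value as a single argument to -a.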

Default Config

  • Watch log file: tail -f -n 100 ./spider_app/logs
  • Download folder: ls ./spider_app/spider_app/download/
  • Total downloaded images: find ./spider_app/download -type f | wc -l
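
The file count can also be wrapped in a small function so it can be reused for any folder. This is a sketch under assumptions: the helper name count_files is hypothetical, the download path is the one used in the find command above, and the function counts every regular file, not only images.

```shell
#!/usr/bin/env bash
# count_files DIR: print the number of regular files under DIR, recursively.
# tr strips the padding that some wc implementations add around the number.
count_files() {
    find "$1" -type f | wc -l | tr -d '[:space:]'
}

download_dir="./spider_app/download"   # path from the find command above; adjust as needed
if [ -d "$download_dir" ]; then
    echo "downloaded files: $(count_files "$download_dir")"
else
    echo "download folder not found: $download_dir" >&2
fi
```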
