Classified Ads Scraper

Scrape ads from ClassifiedAds.com

Requirements

  • Python (>= 3.10)
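A quick way to confirm that the interpreter on your path meets this requirement is a short check like the one below. This is only a sketch: the helper name `meets_requirement` is made up for illustration and is not part of the repository.

```python
import sys

# Minimum version taken from the Requirements list above.
MIN_VERSION = (3, 10)


def meets_requirement(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if meets_requirement():
    print("Python version is new enough")
else:
    print("Python 3.10 or newer is required")
```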

Reproducing the environment

  • Clone the repository.
    git clone https://github.com/toludaree/classified-ads-scraper.git
  • Create a Python virtual environment named .venv and activate it. You can use the built-in venv module.
    python -m venv .venv
    
    # Activate
    .venv/Scripts/activate     # Windows
    source .venv/bin/activate  # Linux/macOS
  • Install Scrapy and its associated libraries from requirements.txt.
    pip install -r requirements.txt

Scrape ClassifiedAds

  • Navigate into the classifiedads directory.
    cd classifiedads
  • Choose the category or subcategory you want to scrape from ClassifiedAds.com. Here is a screenshot of all the categories and subcategories.
  • Start the crawl using the scrapy crawl command.
    scrapy crawl ads -a name=<category> -O <file-path>
    
    # category - name of the category or subcategory you chose in the previous step
    # file-path - path to save the scraped results to. The extension sets the format: .json, .csv or .xml.
    • For example, to scrape SUV ads and save the results to suv.json:
      scrapy crawl ads -a name="SUVs" -O suv.json
    • A screenshot of the crawling session in progress.
    • A screenshot of the results. You can get the JSON file here.
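Once the crawl finishes, you may want to sanity-check the output file before using it. The sketch below assumes the results were saved as JSON, which Scrapy writes as a single array of item objects. The helper name `summarize_ads` is hypothetical, and no field names are assumed: the code simply reports whichever fields it finds.

```python
import json
from pathlib import Path


def summarize_ads(path):
    """Return (item count, sorted field names) for a Scrapy JSON export."""
    # A Scrapy JSON feed is one array containing one object per scraped item.
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    # Collect every field name that appears in at least one item.
    fields = sorted({key for item in items for key in item})
    return len(items), fields


# Only runs if the example output from the crawl command above exists.
if Path("suv.json").exists():
    count, fields = summarize_ads("suv.json")
    print(f"{count} ads scraped; fields: {fields}")
```

Run it from the directory that contains suv.json. A count of zero usually means that the category name passed with -a name= did not match anything.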