
vis

Environment Setup

  1. Clone the repository into a directory of your choice and navigate to the vis folder (hereafter referred to as <vis_home>).
  2. From <vis_home>, run the command sudo sh setup_env.sh.
  3. From the command line, run python3 setup.py install.

Usage

To use this module, first run import scraper. This imports scraper.Session, which keeps track of the scraping session you are currently running and handles multi-threading internally.

The Scraper module has two important classes: Scraper and Action. As a rule of thumb, a Scraper is the information source, while an Action operates on the information held in a Scraper.

Scraper

This class is the source of truth for any Action that acts on the scraper. Note that this means the Scraper itself does not actually do anything: you submit Actions to the Scraper, initialize them, and then run the queue.

Action

Instances of this class act on a Scraper. To run an action immediately, use Action.execute(self). To schedule an action that runs when resources are available, use Action.run(self), which attaches the action's method to the Session queue; it then runs as soon as a thread is free to handle it.
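The difference between running an action immediately and queueing it can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the Action class, the module-level work_queue (standing in for the Session queue), the results list, and the worker function are all hypothetical names introduced here.

```python
import queue
import threading

# Hypothetical stand-ins: the real Session queue and Action class may differ.
work_queue = queue.Queue()
results = []


class Action:
    def __init__(self, name):
        self.name = name

    def execute(self):
        # Run right now, on the calling thread.
        results.append(self.name)

    def run(self):
        # Defer: hand the bound method to the queue for a worker to pick up.
        work_queue.put(self.execute)


def worker():
    # Pull tasks until a None sentinel arrives.
    while True:
        task = work_queue.get()
        try:
            if task is None:
                return
            task()
        finally:
            work_queue.task_done()


Action("immediate").execute()   # happens before any thread starts
Action("queued-1").run()        # waits in the queue
Action("queued-2").run()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
work_queue.join()               # block until both queued actions finish
for _ in threads:
    work_queue.put(None)        # tell each worker to stop
for t in threads:
    t.join()
```

The immediate action is always recorded first; the two queued actions may finish in either order, depending on which worker thread picks them up.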

To create custom actions, extend the Action class and override the Action.get_act(self, scraper) method. It should return a function that, when run, acts on the information stored in the Scraper. Actions can be chained: inside the returned function, create sub-actions and call Action.execute(scraper) to run each one immediately, which strings their functionality together and consolidates it into a single Action.
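A custom action and a chained action might look roughly like the sketch below. It does not import the real module: Scraper and Action are simplified stand-ins written for this example, and the data attribute, the RecordLength and RecordAndDouble classes, and the exact execute(scraper) signature are assumptions rather than the module's actual API.

```python
# Simplified stand-ins for illustration; the real classes may differ.
class Scraper:
    def __init__(self, site):
        self.site = site
        self.data = {}  # hypothetical store for scraped information


class Action:
    def get_act(self, scraper):
        # Subclasses return a function that acts on the scraper.
        raise NotImplementedError

    def execute(self, scraper):
        # Build the function for this scraper and run it immediately.
        return self.get_act(scraper)()


class RecordLength(Action):
    """Custom action: store the length of the site URL."""

    def get_act(self, scraper):
        def act():
            scraper.data["site_length"] = len(scraper.site)
            return scraper.data["site_length"]
        return act


class RecordAndDouble(Action):
    """Chained action: runs RecordLength as a sub-action, then builds on it."""

    def get_act(self, scraper):
        def act():
            length = RecordLength().execute(scraper)  # sub-action runs first
            scraper.data["doubled"] = length * 2
            return scraper.data["doubled"]
        return act


s = Scraper("https://www.google.com/")
result = RecordAndDouble().execute(s)
```

After the call, s.data holds the values written by both the sub-action and the outer action, which is the consolidation the chaining pattern is meant to achieve.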

Examples

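import scraper

# The Session's action queue, which runs the submitted actions.
queue = scraper.Session.action_queue

# Create an action and attach it to a Scraper for the target site.
get_action = scraper.Scraper.Get_Action()
scraper1 = scraper.Scraper.Scraper(site='https://www.google.com/', actions=[get_action])

# Populate the queue, then run it.
queue.populate_queue()
queue.run()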