Callbacks

# callbacks_example.py

from nyawc.Options import Options
from nyawc.Crawler import Crawler
from nyawc.CrawlerActions import CrawlerActions
from nyawc.http.Request import Request

def cb_crawler_before_start():
    print("Started crawling")

def cb_crawler_after_finish(queue):
    print("Finished crawling")

def cb_request_before_start(queue, queue_item):
    print("Making request: {}".format(queue_item.request.url))
    return CrawlerActions.DO_CONTINUE_CRAWLING

def cb_request_after_finish(queue, queue_item, new_queue_items):
    print("Finished request: {}".format(queue_item.request.url))
    return CrawlerActions.DO_CONTINUE_CRAWLING

def cb_form_before_autofill(queue_item, elements, form_data):
    return CrawlerActions.DO_AUTOFILL_FORM

def cb_form_after_autofill(queue_item, elements, form_data):
    pass

options = Options()

options.callbacks.crawler_before_start = cb_crawler_before_start
options.callbacks.crawler_after_finish = cb_crawler_after_finish
options.callbacks.request_before_start = cb_request_before_start
options.callbacks.request_after_finish = cb_request_after_finish
options.callbacks.form_before_autofill = cb_form_before_autofill
options.callbacks.form_after_autofill = cb_form_after_autofill

crawler = Crawler(options)
crawler.start_with(Request("https://finnwea.com/"))

The crawler_before_start callback can be used to run some code before the crawler starts crawling. It does not receive any arguments.

...

def cb_crawler_before_start():
    print("Started crawling")

options.callbacks.crawler_before_start = cb_crawler_before_start

...

The crawler_after_finish callback can be used to run some code after the crawler has finished crawling. It receives one argument, the :class:`nyawc.Queue`.

...

def cb_crawler_after_finish(queue):
    # Print the amount of request/response pairs that were found.
    print("Crawler finished, found " + str(queue.count_total) + " requests.")

    # Iterate over all request/response pairs that were found.
    for queue_item in queue.get_all():
        print("Request method {}".format(queue_item.request.method))
        print("Request URL {}".format(queue_item.request.url))
        print("Request POST data {}".format(queue_item.request.data))
        # print("Response body {}".format(queue_item.response.text))

options.callbacks.crawler_after_finish = cb_crawler_after_finish

...
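
As a variation on the example above, the same callback could be used to persist the crawl results once the crawler has finished. This is a minimal sketch: the results.json path and the serialized fields are just examples, and default=str is used because the request data may not be directly JSON serializable.

import json

def cb_crawler_after_finish(queue):
    # Collect the request details of every request/response pair that was found.
    results = []
    for queue_item in queue.get_all():
        results.append({
            "method": queue_item.request.method,
            "url": queue_item.request.url,
            "data": queue_item.request.data,
        })

    # Write the crawl results to disk ("results.json" is just an example path).
    with open("results.json", "w") as handle:
        json.dump(results, handle, indent=4, default=str)

options.callbacks.crawler_after_finish = cb_crawler_after_finish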

The request_before_start callback can be used to run some code before a request starts executing. It receives two arguments: :class:`nyawc.Queue`, which contains all the items currently in the queue (including finished items), and :class:`nyawc.QueueItem`, which is the item (request/response pair) in the queue that is about to be executed.

  • By returning CrawlerActions.DO_SKIP_TO_NEXT, this queue_item (request/response pair) will be skipped.
  • By returning CrawlerActions.DO_STOP_CRAWLING, the crawler will stop crawling entirely.
  • By returning CrawlerActions.DO_CONTINUE_CRAWLING, the crawler will continue as normal.
...

def cb_request_before_start(queue, queue_item):
    # return CrawlerActions.DO_SKIP_TO_NEXT
    # return CrawlerActions.DO_STOP_CRAWLING

    return CrawlerActions.DO_CONTINUE_CRAWLING

options.callbacks.request_before_start = cb_request_before_start

...
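
The return values above can also be combined with simple conditions. The sketch below skips requests that leave the target domain and stops the crawl after an arbitrary number of requests; the finnwea.com hostname check and the limit of 500 requests are only examples.

def cb_request_before_start(queue, queue_item):
    # Skip requests that point outside the target domain (example condition).
    if "finnwea.com" not in queue_item.request.url:
        return CrawlerActions.DO_SKIP_TO_NEXT

    # Stop the entire crawl once the queue grows beyond an arbitrary limit.
    if queue.count_total > 500:
        return CrawlerActions.DO_STOP_CRAWLING

    return CrawlerActions.DO_CONTINUE_CRAWLING

options.callbacks.request_before_start = cb_request_before_start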

The request_after_finish callback can be used to run some code after a request has finished executing. It receives three arguments: :class:`nyawc.Queue`, which contains all the items currently in the queue (including finished items); :class:`nyawc.QueueItem`, which is the item (request/response pair) that has just finished executing; and new_queue_items (an array of :class:`nyawc.QueueItem`), which contains the request/response pairs that were found during this request.

  • By returning CrawlerActions.DO_STOP_CRAWLING, the crawler will stop crawling entirely.
  • By returning CrawlerActions.DO_CONTINUE_CRAWLING, the crawler will continue as normal.
...

def cb_request_after_finish(queue, queue_item, new_queue_items):
    percentage = str(int(queue.get_progress()))
    total_requests = str(queue.count_total)

    print("At " + percentage + "% of " + total_requests + " requests ([" + str(queue_item.response.status_code) + "] " + queue_item.request.url + ").")

    # return CrawlerActions.DO_STOP_CRAWLING
    return CrawlerActions.DO_CONTINUE_CRAWLING

options.callbacks.request_after_finish = cb_request_after_finish

...
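
The new_queue_items argument can be used to inspect what a request contributed to the crawl. The sketch below prints every newly discovered request and stops the crawl once the queue exceeds an arbitrary cut-off of 150 requests.

def cb_request_after_finish(queue, queue_item, new_queue_items):
    # Report every request/response pair that was discovered during this request.
    for new_queue_item in new_queue_items:
        print("Found new request: {}".format(new_queue_item.request.url))

    # Stop crawling once the queue grows beyond an arbitrary cut-off.
    if queue.count_total > 150:
        return CrawlerActions.DO_STOP_CRAWLING

    return CrawlerActions.DO_CONTINUE_CRAWLING

options.callbacks.request_after_finish = cb_request_after_finish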

The form_before_autofill callback can be used to run some code before a form is automatically filled in. It receives three arguments: :class:`nyawc.QueueItem`, which is the item (request/response pair) whose form is about to be submitted; elements, an array of BeautifulSoup4 input elements found in the form; and form_data, the (editable) form data that will be used in the request.

  • By returning CrawlerActions.DO_AUTOFILL_FORM, the form will be filled with random data.
  • By returning CrawlerActions.DO_NOT_AUTOFILL_FORM, only default input values will be used.
...

def cb_form_before_autofill(queue_item, elements, form_data):
    # return CrawlerActions.DO_NOT_AUTOFILL_FORM

    return CrawlerActions.DO_AUTOFILL_FORM

options.callbacks.form_before_autofill = cb_form_before_autofill

...
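
Because form_data is editable, the callback can also fill known fields by hand instead of relying on random values. The sketch below assumes form_data behaves like a dict keyed by input name; the username and password field names and their values are only examples.

def cb_form_before_autofill(queue_item, elements, form_data):
    # Fill a known login form manually and keep the default values for the rest,
    # assuming form_data can be indexed by input name (example field names).
    if "username" in form_data and "password" in form_data:
        form_data["username"] = "crawler@example.com"
        form_data["password"] = "not-a-real-password"
        return CrawlerActions.DO_NOT_AUTOFILL_FORM

    # Let the crawler fill any other form with random data.
    return CrawlerActions.DO_AUTOFILL_FORM

options.callbacks.form_before_autofill = cb_form_before_autofill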

The form_after_autofill callback can be used to run some code after the crawler has automatically filled in a form. It receives the same three arguments as form_before_autofill: :class:`nyawc.QueueItem`, elements (an array of BeautifulSoup4 input elements found in the form) and form_data, the (editable) form data that will be used in the request.

Please note that this callback will not be called if CrawlerActions.DO_NOT_AUTOFILL_FORM was returned from the form_before_autofill callback.

...

def cb_form_after_autofill(queue_item, elements, form_data):
    pass

options.callbacks.form_after_autofill = cb_form_after_autofill

...
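
A minimal sketch of what this callback could do is shown below: it prints the input elements of the form (BeautifulSoup4 tags, so .get() works on them) together with the form data that will be submitted.

def cb_form_after_autofill(queue_item, elements, form_data):
    # List the input elements of the form and the values that will be submitted.
    for element in elements:
        print("Input {} of type {}".format(element.get("name"), element.get("type")))

    print("Form data that will be submitted: {}".format(form_data))

options.callbacks.form_after_autofill = cb_form_after_autofill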