From 4a618099e810b9be6bff5af6e3c86e5c5da6a64d Mon Sep 17 00:00:00 2001
From: Sophia Antipenko
Date: Wed, 23 Sep 2020 20:53:08 +0300
Subject: [PATCH 1/2] Update README.md

---
 README.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/README.md b/README.md
index 9a6f9501..bcfef15b 100644
--- a/README.md
+++ b/README.md
@@ -202,6 +202,33 @@ Number, maximum amount of concurrent requests. Defaults to `Infinity`.
 
 #### plugins
 
+Plugins allow you to extend scraper behaviour.
+
+* [Existing plugins](#existing-plugins)
+* [Create plugin](#create-plugin)
+* Create action
+  * [beforeStart](#beforestart)
+  * [afterFinish](#afterfinish)
+  * [error](#error)
+  * [beforeRequest](#beforerequest)
+  * [afterResponse](#afterresponse)
+  * [onResourceSaved](#onresourcesaved)
+  * [onResourceError](#onresourceerror)
+  * [saveResource](#saveresource)
+  * [generateFilename](#generatefilename)
+  * [getReference](#getreference)
+
+##### Existing plugins
+* [website-scraper-puppeteer](https://github.com/website-scraper/website-scraper-puppeteer) - download dynamic (rendered with JS) websites using Puppeteer
+* [website-scraper-phantom](https://github.com/website-scraper/node-website-scraper-phantom) - download dynamic (rendered with JS) websites using PhantomJS
+* [website-scraper-existing-directory](https://github.com/website-scraper/website-scraper-existing-directory) - save files to an existing directory
+* [request throttle](https://benjaminhorn.io/code/request-throttle-for-npm-package-website-scraper/) - add a random timeout between requests, thanks @erickhavel
+
+##### Create plugin
+
+Note! Before creating a new plugin, consider using or extending [existing plugins](#existing-plugins).
+
 A plugin is an object with an `.apply` method, which can be used to change scraper behavior. The `.apply` method takes one argument, a `registerAction` function, which allows adding handlers for different actions.
 
 Action handlers are functions that are called by the scraper at different stages of downloading a website. For example, `generateFilename` is called to generate a filename for a resource based on its URL, and `onResourceError` is called when an error occurs during requesting/handling/saving a resource.

From 561246798bf229593fceca1b01ff0ffb85c4dc33 Mon Sep 17 00:00:00 2001
From: Sophia Antipenko
Date: Wed, 23 Sep 2020 20:56:26 +0300
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index bcfef15b..10897792 100644
--- a/README.md
+++ b/README.md
@@ -223,7 +223,7 @@ Plugins allow you to extend scraper behaviour.
 * [website-scraper-puppeteer](https://github.com/website-scraper/website-scraper-puppeteer) - download dynamic (rendered with JS) websites using Puppeteer
 * [website-scraper-phantom](https://github.com/website-scraper/node-website-scraper-phantom) - download dynamic (rendered with JS) websites using PhantomJS
 * [website-scraper-existing-directory](https://github.com/website-scraper/website-scraper-existing-directory) - save files to an existing directory
-* [request throttle](https://benjaminhorn.io/code/request-throttle-for-npm-package-website-scraper/) - add a random timeout between requests, thanks @erickhavel
+* [request throttle](https://benjaminhorn.io/code/request-throttle-for-npm-package-website-scraper/) - add a random timeout between requests
 
 ##### Create plugin
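
The plugin API added to the README by patch 1/2 (an object with an `.apply` method that receives `registerAction`) can be sketched as follows. This is a minimal illustration, not code from the patch: the class name `MyPlugin`, the returned filename, and the logged messages are invented for the example.

```javascript
// Sketch of a website-scraper plugin: an object with an `.apply` method.
// `.apply` receives `registerAction`, which registers handlers for actions.
class MyPlugin {
  apply(registerAction) {
    // Called once before scraping starts.
    registerAction('beforeStart', async ({ options }) => {
      console.log('scraping started');
    });

    // Called to generate a filename for a resource.
    // Illustrative: returns a fixed name; a real plugin would derive
    // the filename from the resource's URL.
    registerAction('generateFilename', async ({ resource }) => {
      return { filename: 'index.html' };
    });

    // Called when an error occurs while requesting/handling/saving a resource.
    registerAction('onResourceError', ({ resource, error }) => {
      console.error('resource failed:', error);
    });
  }
}
```

A plugin instance would then be passed to the scraper via the `plugins` option, e.g. `scrape({ urls, directory, plugins: [new MyPlugin()] })`.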