Browser cache option #130

Merged
merged 2 commits into from
Feb 24, 2018
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]

- Support `browserCache` for [crawler.queue()](https://github.com/yujiosaka/headless-chrome-crawler#crawlerqueueoptions)'s options.
- Support `depthPriority` option again.

## [1.3.4] - 2018-02-22

### changed
5 changes: 3 additions & 2 deletions README.md
@@ -182,7 +182,7 @@ browserWSEndpoint, ignoreHTTPSErrors
Also, the following options can be set as default values when [crawler.queue()](#crawlerqueueoptions) is executed.

```
-url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, device, username, password, evaluatePage
+url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, browserCache, device, username, password, evaluatePage
```

> **Note**: In practice, setting the options every time you queue requests is redundant. Therefore, it's recommended to set the default values and override them depending on the necessity.
@@ -222,7 +222,7 @@ ignoreHTTPSErrors, headless, executablePath, slowMo, args, ignoreDefaultArgs, ha
Also, the following options can be set as default values when [crawler.queue()](#crawlerqueueoptions) is executed.

```
-url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, device, username, password, evaluatePage
+url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, retryCount, retryDelay, jQuery, browserCache, device, username, password, evaluatePage
```

> **Note**: In practice, setting the options every time you queue the requests is redundant. Therefore, it's recommended to set the default values and override them depending on the necessity.
@@ -251,6 +251,7 @@ url, allowedDomains, deniedDomains, timeout, priority, depthPriority, delay, ret
* `retryCount` <[number]> Maximum number of retries when a request fails, defaults to `3`.
* `retryDelay` <[number]> Number of milliseconds to wait after each failed retry, defaults to `10000`.
* `jQuery` <[boolean]> Whether to automatically add [jQuery](https://jquery.com) tag to page, defaults to `true`.
* `browserCache` <[boolean]> Whether to enable browser cache for each request, defaults to `true`.
* `device` <[string]> Device to emulate. Available devices are listed [here](https://github.com/GoogleChrome/puppeteer/blob/master/DeviceDescriptors.js).
* `username` <[string]> Username for basic authentication. Pass `null` if it's not necessary.
* `screenshot` <[Object]> Screenshot option, defaults to `null`. This option is passed to [Puppeteer's page.screenshot()](https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagescreenshotoptions). Pass `null` or leave default to disable screenshot.
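The README note about setting defaults once and overriding them per request can be sketched as a plain object merge. This is only an illustration: `BUILT_IN_DEFAULTS`, `resolveOptions`, and the example URLs are hypothetical, and the merge order is an assumption, since the library's actual merging code is not part of this diff.

```javascript
// Hypothetical defaults; the values mirror the ones visible in lib/hccrawler.js in this PR.
const BUILT_IN_DEFAULTS = {
  retryCount: 3,
  retryDelay: 10000,
  jQuery: true,
  browserCache: true,
};

// Assumed precedence: built-in defaults < launch-time defaults < per-request options.
function resolveOptions(launchDefaults, queueOptions) {
  return Object.assign({}, BUILT_IN_DEFAULTS, launchDefaults, queueOptions);
}

// Disable the browser cache for every request by default...
const launchDefaults = { browserCache: false };

// ...and re-enable it for a single request that can tolerate cached responses.
const cachedRequest = resolveOptions(launchDefaults, {
  url: 'https://example.com/static',
  browserCache: true,
});

// A request that does not mention browserCache inherits the launch-time default.
const uncachedRequest = resolveOptions(launchDefaults, {
  url: 'https://example.com/live',
});

console.log(cachedRequest.browserCache, uncachedRequest.browserCache);
```

Under these assumptions, a request only needs to name `browserCache` when it wants to deviate from the default chosen at launch time.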
10 changes: 10 additions & 0 deletions lib/crawler.js
@@ -80,6 +80,7 @@ class Crawler {
this._preventNewTabs(),
this._authenticate(),
this._emulate(),
this._setCacheEnabled(),
this._setUserAgent(),
this._setExtraHeaders(),
this._handlePageEvents(),
@@ -118,6 +119,15 @@ class Crawler {
    return this._page.emulate(devices[this._options.device]);
  }

  /**
   * @return {!Promise}
   * @private
   */
  _setCacheEnabled() {
    if (this._options.browserCache) return Promise.resolve();
    return this._page.setCacheEnabled(false);
  }

  /**
   * @return {!Promise}
   * @private
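To see what `_setCacheEnabled()` does without launching Chrome, the same branching can be run against a stub page. In this sketch `createStubPage` and `applyCacheOption` are hypothetical helpers written for illustration; only `setCacheEnabled(false)` corresponds to the Puppeteer `Page` method called in the diff.

```javascript
// Stub standing in for a Puppeteer Page; it only records setCacheEnabled() calls.
function createStubPage() {
  const calls = [];
  return {
    calls,
    setCacheEnabled(enabled) {
      calls.push(enabled);
      return Promise.resolve();
    },
  };
}

// Same branching as _setCacheEnabled() in the diff, written as a free function
// so it can be exercised outside the Crawler class.
function applyCacheOption(page, options) {
  // Cache requested: leave the page untouched and resolve immediately.
  if (options.browserCache) return Promise.resolve();
  // Cache not requested: ask the page to turn it off.
  return page.setCacheEnabled(false);
}

const page = createStubPage();
applyCacheOption(page, { browserCache: false }).then(() => {
  // The stub has recorded a single call with the argument `false`.
  console.log(page.calls);
});
```

Because `browserCache` defaults to `true`, the page is only touched when a request explicitly opts out of caching.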
1 change: 1 addition & 0 deletions lib/hccrawler.js
@@ -111,6 +111,7 @@ class HCCrawler extends EventEmitter {
retryCount: 3,
retryDelay: 10000,
jQuery: true,
browserCache: true,
persistCache: false,
skipDuplicates: true,
depthPriority: true,