automation-extra-interception-proxy

A simple way to inspect and modify a site's requests and responses.

The spiritual heir of puppeteer-page-proxy: the same behavior, but extended and built on promises.

Using a proxy is optional.

Supported proxies (through proxy-agent):

  • http://proxy-server-over-tcp.com:3128
  • https://proxy-server-over-tls.com:3129
  • socks://username:password@some-socks-proxy.com:9050 (username & password are optional)
  • socks5://username:password@some-socks-proxy.com:9050 (username & password are optional)
  • socks4://some-socks-proxy.com:9050
  • pac+http://www.example.com/proxy.pac
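Before passing a proxy to the plugin, you may want to check that its scheme is one of those listed above, and to log it without exposing credentials. The following is a minimal sketch; `isSupportedProxyUrl` and `redactProxyUrl` are hypothetical helpers, not part of this package, and the scheme list is copied from the list above (proxy-agent itself may accept more).

```javascript
// Hypothetical helpers (not part of this package).
// Schemes copied from the supported-proxy list in this README.
const SUPPORTED_PROXY_SCHEMES = new Set([
  'http:', 'https:', 'socks:', 'socks5:', 'socks4:', 'pac+http:',
]);

// Returns true if the string parses as a URL with a listed scheme.
function isSupportedProxyUrl(proxyUrl) {
  let parsed;
  try {
    parsed = new URL(proxyUrl);
  } catch {
    return false; // not a parseable URL at all
  }
  return SUPPORTED_PROXY_SCHEMES.has(parsed.protocol);
}

// Replaces username and password with *** so the URL can be logged.
function redactProxyUrl(proxyUrl) {
  const parsed = new URL(proxyUrl);
  if (parsed.username) parsed.username = '***';
  if (parsed.password) parsed.password = '***';
  return parsed.toString();
}
```

For example, `redactProxyUrl('socks5://username:password@some-socks-proxy.com:9050')` yields `socks5://***:***@some-socks-proxy.com:9050`.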

Tested with Puppeteer/Chromium only!

Known issues in managed mode

  • GREASE ciphers are missing.
  • Some headers are missing.
  • [CORS] OPTIONS (preflight) requests are not sent before the actual request is executed.
  • [CORS] Headers are close to correct, but not exactly right.
  • WebSockets are handled by the browser itself (your IP may leak if you set a proxy in this package but not in Puppeteer itself).
  • Performance may suffer under high load.
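One way to reduce the WebSocket leak described above is to give Chromium the same proxy at launch. This is a minimal sketch assuming an http, https, socks4 or socks5 proxy; `chromiumProxyArg` is a hypothetical helper, not part of this package. As far as we know, Chromium's `--proxy-server` flag does not accept credentials embedded in the URL, so they are dropped here; an authenticated proxy would need a separate mechanism (for HTTP proxies, Puppeteer's `page.authenticate`).

```javascript
// Hypothetical helper (not part of this package): builds a Chromium
// --proxy-server launch argument from the same proxy URL given to the plugin,
// so traffic the browser handles itself (e.g. WebSockets) also uses the proxy.
// Credentials are intentionally dropped (see the note above).
function chromiumProxyArg(proxyUrl) {
  const { protocol, hostname, port } = new URL(proxyUrl);
  const scheme = protocol.slice(0, -1); // "socks5:" -> "socks5"
  if (!['http', 'https', 'socks4', 'socks5'].includes(scheme)) {
    throw new Error(`No direct --proxy-server equivalent for ${protocol}`);
  }
  return `--proxy-server=${scheme}://${hostname}${port ? `:${port}` : ''}`;
}

// Usage sketch:
// const browser = await puppeteer.launch({ args: [chromiumProxyArg(proxy)] });
```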

Installing

Using npm

npm install automation-extra-interception-proxy

Using yarn

yarn add automation-extra-interception-proxy

Why?

This package solves the following problems:

1. Traffic sniffing

From time to time you need information from a browser request. By default, only header information is easy to reach. You can also read every response, but sometimes this throws errors for one of the following reasons.

First, the page may already be closed, in which case your code will throw an error.

Second, some sites use service workers to request information. Unfortunately, you can't handle this without making the request manually and then passing the result back to Puppeteer.

2. Data manipulation

If you want to adjust some requests or responses, you normally have to do it manually.

For example, you want to get the original request/response and make some adjustments. This package makes that easy: you get what you want with a single function call.

3. Set proxy

Puppeteer already supports proxies through additional browser launch arguments. However, you may have to maintain proxy credentials manually for each request (not verified), and you may not be able to use a SOCKS proxy (not verified).

4. Asynchronous decisions for requests

Even with Puppeteer's cooperative intercept mode, you cannot make your decisions asynchronously. Here you can build a chain of handlers that decide about a request one by one. A handler can mark its decision as final, so the remaining handlers in the chain are not asked, and one handler can adjust the request/response for the next one.
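To make the idea concrete, here is a toy model of such a chain in plain JavaScript. It is not this package's API or implementation: `runHandlerChain`, the `{ final: true }` return value, the request shape and the handler names are all hypothetical.

```javascript
// Toy model of an async handler chain (hypothetical, not this package's API).
// Handlers run one by one; each may await, mutate the shared request for the
// handlers after it, and return { final: true } to stop the chain.
async function runHandlerChain(handlers, request) {
  const called = [];
  for (const [name, handler] of handlers) {
    called.push(name);
    const decision = await handler(request);
    if (decision && decision.final) break; // no need to ask later handlers
  }
  return called;
}

const handlers = new Map([
  ['add-header', async (req) => { req.headers['x-debug'] = '1'; }],
  ['block-ads', async (req) => (req.url.includes('/ads/') ? { final: true } : undefined)],
  ['log', async (req) => { console.log('reached', req.url); }],
]);

// The second handler makes a final decision, so "log" never runs.
const request = { url: 'https://example.com/ads/banner.js', headers: {} };
runHandlerChain(handlers, request).then((called) => {
  console.log(called); // [ 'add-header', 'block-ads' ]
});
```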

Motivation

We live in a world where almost every website has an internal API. When you look at the Network tab in Chrome DevTools, it's easy to see what goes where. The data is already in your browser, but you can't simply take what you want; you have to fight for the information you need. So let's fight together!

API

wrapPage

src/index.ts:22-24

Adds interception capability to the page (see the Samples section).

Parameters

  • page Puppeteer.Page Page for future interceptions
  • config IConfig? Optional plugin configuration

Returns Promise<InterceptionProxyPageConfig>

IConfig

src/interfaces/base.ts:41-132

Plugin configuration object

cooperativePriority

src/interfaces/base.ts:51-51

Puppeteer's "Cooperative Intercept Mode" priority.

This package uses its own way to manage cooperation.

Use only if you know what it does.

Type: (undefined | number)

requestMode

src/interfaces/base.ts:61-61

ignore - The plugin does nothing with the original request.

native - The plugin only listens to the original request/response data, and all requests are fulfilled by Puppeteer itself. Some plugin functionality may be unavailable.

managed - The plugin executes all requests itself or through requestHandlers. All plugin features are available.

Default: managed

Type: RequestMode
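If you build the configuration dynamically, a typo in requestMode is easy to miss. A minimal sketch of validating it up front; `resolveRequestMode` is a hypothetical helper, not exported by this package, and it only applies the documented default and the three documented values.

```javascript
// Hypothetical helper (not exported by this package).
// The three documented RequestMode values.
const REQUEST_MODES = ['ignore', 'native', 'managed'];

function resolveRequestMode(config = {}) {
  const mode = config.requestMode ?? 'managed'; // documented default
  if (!REQUEST_MODES.includes(mode)) {
    throw new TypeError(
      `Unknown requestMode "${mode}"; expected one of ${REQUEST_MODES.join(', ')}`,
    );
  }
  return mode;
}
```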

proxy

src/interfaces/base.ts:77-77

Proxy for requests

Automatically sets the agent property using proxy-agent.

Examples:

  • http://proxy-server-over-tcp.com:3128
  • https://proxy-server-over-tls.com:3129
  • socks://username:password@some-socks-proxy.com:9050 (username & password are optional)
  • socks5://username:password@some-socks-proxy.com:9050 (username & password are optional)
  • socks4://some-socks-proxy.com:9050
  • pac+http://www.example.com/proxy.pac

Default: null

Type: (string | null)

agent

src/interfaces/base.ts:88-88

Your agent for handling requests

Set automatically from the proxy property. Setting it directly clears the proxy property.

Default: null

Type: (Agent | null)

Meta

  • deprecated: Use the proxy property instead. Deprecated because of a possible upcoming rework of request handling.
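The documented relationship between proxy and agent (setting proxy sets the agent; setting agent directly clears proxy) can be modelled as follows. This is a toy model with a hypothetical `ProxySettings` class, not this package's code: the real plugin builds the agent with proxy-agent, while here an injected factory stands in for it.

```javascript
// Toy model of the proxy/agent relationship (hypothetical class).
class ProxySettings {
  #proxy = null;
  #agent = null;

  // makeAgent stands in for proxy-agent; the default returns a placeholder.
  constructor(makeAgent = (url) => ({ proxyUrl: url })) {
    this.makeAgent = makeAgent;
  }

  get proxy() { return this.#proxy; }
  get agent() { return this.#agent; }

  // Setting proxy also sets the agent (or clears it for null).
  set proxy(url) {
    this.#proxy = url;
    this.#agent = url ? this.makeAgent(url) : null;
  }

  // Setting the agent directly clears proxy.
  set agent(agent) {
    this.#agent = agent;
    this.#proxy = null;
  }
}
```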

logger

src/interfaces/base.ts:92-92

Receives all of the plugin's messages

Type: any

timeout

src/interfaces/base.ts:96-96

Request timeout in milliseconds (actual execution only)

Type: number

nativeContinueIfPossible

src/interfaces/base.ts:103-103

If you did not change the request or response, let Puppeteer handle the request itself

Default: false

Type: boolean

ignoreResponseBodyIfPossible

src/interfaces/base.ts:113-113

If you do not use the plugin's response object, the plugin will not retrieve the response from Puppeteer, for better performance

Applies to native mode only

Type: boolean
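For illustration, the options above might be combined like this. Which combinations make sense is an assumption drawn from the descriptions: nativeContinueIfPossible is shown with managed mode because it hands untouched requests back to Puppeteer, while ignoreResponseBodyIfPossible is documented as native-mode only. The values are examples, not recommendations.

```javascript
// Managed mode: requests and responses you leave untouched
// are handed back to Puppeteer.
const managedConfig = {
  requestMode: 'managed',
  nativeContinueIfPossible: true,
  timeout: 30000, // milliseconds, actual execution only
};

// Native mode: listen only, and skip retrieving response bodies
// when the plugin's response object is never used.
const nativeConfig = {
  requestMode: 'native',
  ignoreResponseBodyIfPossible: true,
};

// Either object is passed as the second argument of wrapPage, e.g.:
// await InterceptionUtils.wrapPage(page, nativeConfig);
```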

enableLegacyCookieHandling

src/interfaces/base.ts:121-121

With old versions of Puppeteer, the plugin has to handle cookies itself.

Enable this option if you have an issue with cookies.

Upgrading Puppeteer is recommended instead.

Type: boolean

gotHooks

src/interfaces/base.ts:129-129

Not recommended. Use this library's other properties to do it where possible.

Modifies requests in a more advanced way through interaction with got.

Type: Hooks

continue

src/interfaces/classes.ts:42-42

Sends the gathered response back to Puppeteer immediately.

If the response has not been collected yet, calls getResponse first.

Returns Promise<void>

ignoreResponseBodyIfPossible

src/interfaces/mixins.ts:14-14

If you use this specific method, the global ignoreResponseBodyIfPossible setting is ignored

Type: boolean

flushLocal

src/interfaces/mixins.ts:43-43

Flush local configuration

Parameters

  • key any? If provided, only this specific parameter is flushed at the local level

Returns void

recordError

src/interfaces/mixins.ts:55-59

Pass an error to the logger

Parameters

  • message any Flow description
  • error any? Original error object
  • meta ...any Non-specific meta information

Returns void

recordInternalError

src/interfaces/mixins.ts:65-68

Pass an internal error to the logger

Parameters

  • message any Flow/error description
  • meta ...any Non-specific meta information

Returns void

recordWarning

src/interfaces/mixins.ts:74-77

Pass a warning to the logger

Parameters

  • message any Flow/error description
  • meta ...any Non-specific meta information

Returns void

RequestMode

src/interfaces/network.ts:7-21

Plugin mode for handling requests

ignore

src/interfaces/network.ts:11-11

Plugin will do nothing about original request

Type: string

native

src/interfaces/network.ts:16-16

The plugin only listens to the original request/response data, and all requests are fulfilled by Puppeteer itself. Some plugin functionality may be unavailable.

Type: string

managed

src/interfaces/network.ts:20-20

The plugin executes all requests itself. All plugin features are available.

Type: string

RequestStage

src/interfaces/network.ts:26-65

Current stage of the request

gotRequest

src/interfaces/network.ts:35-35

We received a new request from Puppeteer, which includes all the necessary information about it.

At this stage we can adjust the request.

Type: string

sentRequest

src/interfaces/network.ts:42-42

The request is being executed.

At this stage we can no longer adjust the request, and there is no response yet to move forward with.

Type: string

gotResponse

src/interfaces/network.ts:50-50

We received the response to the request (which may have been modified by the user), and now the user can adjust the response.

At this stage we can adjust the response. The user can no longer override the request.

Type: string

sentResponse

src/interfaces/network.ts:57-57

We sent the final response of the request to the browser.

It's too late to adjust the request or response.

Type: string

closed

src/interfaces/network.ts:64-64

The page was closed and we cannot do anything.

From a technical perspective, this looks the same as sentResponse.

Type: string
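The stage descriptions above can be summarised as a lookup table. `canAdjust` is a hypothetical helper, not exported by this package, and the table encodes only what the descriptions state explicitly; for example, whether a response can already be supplied at gotRequest is not covered above, so it is marked false.

```javascript
// What each RequestStage description explicitly allows you to adjust.
const ADJUSTABLE = {
  gotRequest:   { request: true,  response: false },
  sentRequest:  { request: false, response: false },
  gotResponse:  { request: false, response: true  },
  sentResponse: { request: false, response: false },
  closed:       { request: false, response: false },
};

// Hypothetical helper: what is "request" or "response".
function canAdjust(stage, what) {
  const entry = ADJUSTABLE[stage];
  if (!entry) throw new RangeError(`Unknown request stage: ${stage}`);
  return entry[what] === true;
}
```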

IRequestOptions

src/interfaces/network.ts:72-100

The plugin's request options. The request differs significantly from Puppeteer's request.

Can be modified. All changes will be applied to the actual Puppeteer request and executed.

method

src/interfaces/network.ts:78-78

Request method.

Once the request has been executed, this property can no longer be changed.

Type: Method

url

src/interfaces/network.ts:85-85

Request url.

Once the request has been executed, this property can no longer be changed.

Type: string

headers

src/interfaces/network.ts:92-92

Request headers.

Once the request has been executed, this property can no longer be changed.

Type: Headers

body

src/interfaces/network.ts:99-99

Request body.

Once the request has been executed, this property can no longer be changed.

Type: (string | Buffer | undefined)
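As an illustration of editing these fields before the request is executed, the sketch below produces a copy of an options object with a JSON body. `withJsonBody` is a hypothetical helper working on a plain object with the same fields (method, url, headers, body); how the result is applied to a live plugin request depends on the request object's API, which is not shown here.

```javascript
// Hypothetical helper: returns a copy of request options with a JSON body.
// The input object is not mutated.
function withJsonBody(options, data) {
  // Drop any existing content-type header, whatever its casing.
  const headers = Object.fromEntries(
    Object.entries(options.headers || {}).filter(
      ([name]) => name.toLowerCase() !== 'content-type',
    ),
  );
  headers['content-type'] = 'application/json';
  return { ...options, headers, body: JSON.stringify(data) };
}
```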

_bodyError

src/interfaces/network.ts:108-108

Type: string

IAbortReason

src/interfaces/network.ts:129-129

This option will override the response

  • aborted - An operation was aborted (due to user action).
  • accessdenied - Permission to access a resource, other than the network, was denied.
  • addressunreachable - The IP address is unreachable. This usually means that there is no route to the specified host or network.

  • blockedbyclient - The client chose to block the request.
  • blockedbyresponse - The request failed because the response was delivered along with requirements which are not met ('X-Frame-Options' and 'Content-Security-Policy' ancestor checks, for instance).
  • connectionaborted - A connection timed out as a result of not receiving an ACK for data sent.
  • connectionclosed - A connection was closed (corresponding to a TCP FIN).
  • connectionfailed - A connection attempt failed.
  • connectionrefused - A connection attempt was refused.
  • connectionreset - A connection was reset (corresponding to a TCP RST).
  • internetdisconnected - The Internet connection has been lost.
  • namenotresolved - The host name could not be resolved.
  • timedout - An operation timed out.
  • failed - A generic failure occurred.

Type: ErrorCode
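The codes above are plain strings, so a small guard can catch typos before a code is used as an abort reason. A minimal sketch; `isAbortReason` and the ad-blocking policy in `abortReasonFor` are hypothetical, and only the list of codes comes from this documentation.

```javascript
// The error codes listed above.
const ABORT_REASONS = new Set([
  'aborted', 'accessdenied', 'addressunreachable', 'blockedbyclient',
  'blockedbyresponse', 'connectionaborted', 'connectionclosed',
  'connectionfailed', 'connectionrefused', 'connectionreset',
  'internetdisconnected', 'namenotresolved', 'timedout', 'failed',
]);

function isAbortReason(code) {
  return ABORT_REASONS.has(code);
}

// Hypothetical policy: block anything served from an "ads." subdomain,
// otherwise return undefined (no abort).
function abortReasonFor(url) {
  const { hostname } = new URL(url);
  return hostname.startsWith('ads.') ? 'blockedbyclient' : undefined;
}
```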

InterceptionProxyRequest

src/classes/Request.ts:40-236

Extends RequestBase

The plugin's request. It differs significantly from Puppeteer's request.

Parameters

Samples

/**
 * This example shows how to enable a proxy for a single page.
 */

// require libs
const puppeteer = require('puppeteer');
const InterceptionUtils = require('automation-extra-interception-proxy');

// do everything async
(async () => {

    // launch some browser
    const browser = await puppeteer.launch({
        headless: false,
    });

    // get some page
    const page = await browser.newPage();

    // attach interception commands
    await InterceptionUtils.wrapPage(page, {
        requestMode: "managed",

        // optional, will be handled by https://www.npmjs.com/package/proxy-agent
        proxy: "socks5://username:password@some-socks-proxy.com:9050" 
    });

    // go to our destination and wait for the response
    await page.goto('https://www.npmjs.com/package/automation-extra-interception-proxy');

    // closing browser
    await browser.close();

})(); // end of our thread

/**
 * This example shows how to enable interceptions for a single page.
 *
 * This code gets some wallpaper image URLs from bing.com.
 *
 * This code may break if Bing changes its behavior.
 */

// require libs
const puppeteer = require('puppeteer');
const InterceptionUtils = require('automation-extra-interception-proxy');

// do everything async
(async () => {

    // launch some browser
    const browser = await puppeteer.launch({
        headless: false,
    });

    // get some page
    const page = await browser.newPage();

    // attach interception commands
    await InterceptionUtils.wrapPage(page, {
        requestMode: "managed",

        // optional, will be handled by https://www.npmjs.com/package/proxy-agent
        // proxy: "socks5://username:password@some-socks-proxy.com:9050" 
    });

    // create promise callback for async processing
    let callback;
    const promise = new Promise((resolve) => { callback = resolve; });

    // add some listener
    page.interceptions.addRequestListener('bing-images', async request => {

        // filter anything else
        if (request.url !== 'https://www.bing.com/hp/api/model') {
            // just letting you know that we got something else here
            console.log('Ignoring', request.url.slice(0, 50));
            return;
        }

        // get response data
        const response = await request.getResponse();

        // grab data directly from their api response
        const apiData = response.json;

        // doing anything you like
        const imageUrls = apiData.MediaContents.map(({ ImageContent }) =>
            `https://www.bing.com${ImageContent.Image.Url}`);

        // back to async thread
        callback(imageUrls);

    }); // end of listener

    // go to our destination and wait for the response
    const [imageUrls] = await Promise.all([
        promise,
        page.goto('https://www.bing.com/'),
    ]);

    // print our image urls
    console.log('imageUrls', imageUrls);

    // not necessary: cleaning our listener
    page.interceptions.deleteLocalRequestListener('bing-images');

    // closing browser
    await browser.close();

})(); // end of our thread

Troubleshooting

Cookies do not work

You are probably using an old version of Puppeteer. Try upgrading first.

If you don't want to upgrade, or cookies still do not work, enable enableLegacyCookieHandling.

Are CORS requests broken?

Yes, the implementation is still raw.

TODO:

  • finalize CORS managed requests (need to pass the CORS test)
  • add tests
    • plugin flow
  • documentation
    • improve docs command
    • describe wrapPage
    • describe InterceptionProxyPlugin class
  • add more proxy api
    • waitRequest
  • websocket support
  • migrate to automation-extra-plugin
  • support GREASE ciphers

License

Copyright © 2021 - 2023, Utyfua. Released under the MIT License.