Command-line client

Once you have installed python-zyte-api and configured your API key, you can use the zyte-api command-line client.

To use zyte-api, pass an input file as the first parameter and specify an output file with --output. For example:

zyte-api urls.txt --output result.jsonl

Input file

The input file can be either of the following:

  • A plain-text file with a list of target URLs, one per line. For example:

    https://books.toscrape.com
    https://quotes.toscrape.com

    For each URL, a Zyte API request is sent with browserHtml set to true.

  • A JSON Lines file with one JSON object of Zyte API request parameters per line. For example:

    {"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
    {"url": "https://b.example", "httpResponseBody": true}
    {"url": "https://books.toscrape.com", "productNavigation": true}
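Based on the description above, each line of a plain-text input file corresponds to a request that sets url and browserHtml. The Python sketch below converts a URL list into that JSON Lines form, which can be a convenient starting point if you later want to add parameters such as geolocation. The helper names urls_to_requests and to_jsonl are hypothetical (not part of python-zyte-api), and skipping blank lines is an assumption of this sketch, not documented CLI behaviour.

```python
import json
from typing import Iterable, Iterator


def urls_to_requests(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a Zyte API request dict for each URL line.

    Follows the plain-text behaviour described above: every URL becomes
    a request with browserHtml set to true. Skipping blank lines is an
    assumption of this sketch.
    """
    for line in lines:
        url = line.strip()
        if not url:
            continue
        yield {"url": url, "browserHtml": True}


def to_jsonl(requests: Iterable[dict]) -> str:
    """Serialize request dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(request) + "\n" for request in requests)


if __name__ == "__main__":
    plain_text = "https://books.toscrape.com\nhttps://quotes.toscrape.com\n"
    print(to_jsonl(urls_to_requests(plain_text.splitlines())), end="")
```

Writing the result to a file gives you a JSON Lines input file that you can extend with other request parameters before passing it to zyte-api.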

Output file

You can specify the path to an output file with the --output/-o switch. If not specified, the output is printed to standard output (stdout).

Warning

If a file already exists at the output path, it is overwritten.

The output file is in JSON Lines format. Each line contains a JSON object with a response from Zyte API.
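Because the output is JSON Lines, it can be processed one line at a time without loading the whole file into memory. The following Python sketch uses a hypothetical read_responses helper (not part of python-zyte-api). The usage comment reads a url key from each response; .get is used there so the sketch does not fail if a given response lacks that key.

```python
import json
from typing import Iterable, Iterator


def read_responses(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines output: yield one response dict per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


# Usage, assuming result.jsonl was written by:
#   zyte-api urls.txt --output result.jsonl
#
# with open("result.jsonl", encoding="utf-8") as f:
#     for response in read_responses(f):
#         print(response.get("url"))
```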

By default, zyte-api uses multiple concurrent connections for performance reasons. As a result, the order of responses will probably not match the order of the requests in the input file. If you need to match output results to input requests, the best way is to use echoData. By default, zyte-api fills echoData with the input URL.
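A minimal Python sketch of matching responses back to input requests through echoData follows. It assumes that each echoData value is a unique string (true of the default, the input URL, as long as the input has no duplicate URLs) and that Zyte API returns it unchanged in the echoData field of each response. The helpers index_by_echo_data and in_input_order are hypothetical. A request whose response is missing from the output (for example, an error that was only logged) maps to None.

```python
import json
from typing import Dict, Iterable, List, Optional


def index_by_echo_data(lines: Iterable[str]) -> Dict[str, dict]:
    """Map each response's echoData value to the response.

    Assumes echoData is a unique string per request (by default zyte-api
    uses the input URL) and that each response carries it back unchanged.
    Lines without echoData are skipped.
    """
    index: Dict[str, dict] = {}
    for line in lines:
        if not line.strip():
            continue
        response = json.loads(line)
        echo = response.get("echoData")
        if echo is not None:
            index[echo] = response
    return index


def in_input_order(input_urls: List[str], index: Dict[str, dict]) -> List[Optional[dict]]:
    """Reorder responses to follow the input file; None marks a missing response."""
    return [index.get(url) for url in input_urls]
```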

Optimization

By default, zyte-api uses 20 concurrent connections for requests. Use the --n-conn switch to change that:

zyte-api --n-conn 40 …
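To illustrate what a connection limit means, and why it also scrambles the output order, here is a toy Python model that never contacts Zyte API. An asyncio.Semaphore caps the number of in-flight requests at n_conn, and random simulated latencies make the completion order differ from the input order. This is a sketch of bounded concurrency in general under those assumptions, not python-zyte-api's implementation; fake_request, run, and Tracker are hypothetical names.

```python
import asyncio
import random
from typing import List


class Tracker:
    """Records completion order and the peak number of in-flight requests."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0
        self.completed: List[str] = []


async def fake_request(url: str, sem: asyncio.Semaphore, tracker: Tracker) -> None:
    """Hypothetical stand-in for one Zyte API request; no network I/O."""
    async with sem:  # at most n_conn requests get past this line at once
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated latency
        tracker.active -= 1
        tracker.completed.append(url)  # completion order, not input order


async def run(urls: List[str], n_conn: int) -> Tracker:
    sem = asyncio.Semaphore(n_conn)
    tracker = Tracker()
    await asyncio.gather(*(fake_request(u, sem, tracker) for u in urls))
    return tracker


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(10)]
    result = asyncio.run(run(urls, n_conn=4))
    print("peak in flight:", result.peak)
    print("completion order:", result.completed)
```

Raising n_conn lets more simulated requests overlap, which is the throughput gain that --n-conn trades against load on the API and the target websites.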

If you target multiple websites and your input file is sorted by website, the --shuffle option can be useful: it randomizes the request order and hence distributes the load more evenly across websites:

zyte-api urls.txt --shuffle …
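The effect described above can be shown with a short Python sketch: with input sorted by website, every request to one host is sent back to back, while a shuffled copy interleaves hosts. The helpers shuffled and longest_same_host_run are hypothetical and only measure the idea; they are not how zyte-api implements --shuffle. A fixed seed is used so the sketch is reproducible.

```python
import random
from itertools import groupby
from typing import List
from urllib.parse import urlsplit


def shuffled(urls: List[str], seed: int = 0) -> List[str]:
    """Return a shuffled copy of urls, leaving the input list untouched."""
    copy = list(urls)
    random.Random(seed).shuffle(copy)
    return copy


def longest_same_host_run(urls: List[str]) -> int:
    """Length of the longest streak of consecutive URLs for the same host."""
    runs = (len(list(group)) for _, group in groupby(urls, key=lambda u: urlsplit(u).netloc))
    return max(runs, default=0)


if __name__ == "__main__":
    urls = [f"https://a.example/{i}" for i in range(50)]
    urls += [f"https://b.example/{i}" for i in range(50)]
    print("sorted input, longest run:", longest_same_host_run(urls))
    print("shuffled, longest run:", longest_same_host_run(shuffled(urls)))
```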

For guidelines on choosing the optimal --n-conn value, and for other optimization tips, see zyte-api-optimize.

Errors and retries

zyte-api automatically retries rate-limiting and unsuccessful responses, as well as network errors, following the default retry policy.
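For readers unfamiliar with automatic retries, the sketch below shows the general technique of retrying with capped exponential backoff and random jitter. It is a generic illustration, not python-zyte-api's default retry policy, whose exact rules (which errors are retried, delays, attempt limits) are documented separately. RetryableError and retry_with_backoff are hypothetical names, and every numeric default here is an arbitrary assumption. In this model, --dont-retry-errors would correspond to treating only rate-limiting responses as retryable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. rate limiting."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds or max_attempts is reached.

    Waits a random time up to base_delay * 2**(attempt - 1), capped at
    max_delay, between attempts ("full jitter"). Non-retryable exceptions
    propagate immediately; the last RetryableError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```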

Use --dont-retry-errors to disable retries of error responses, so that only rate-limiting responses are retried:

zyte-api --dont-retry-errors …

By default, errors are only logged to standard error (stderr). If you want to include error responses in the output file, use --store-errors:

zyte-api --store-errors …

cli-ref