Once you have installed python-zyte-api <install>
and configured
your API key <api-key>
, you can use the zyte-api
command-line client.
To use zyte-api
, pass an input file <input-file>
as the first parameter and specify an output file <output-file>
with --output
. For example:
zyte-api urls.txt --output result.jsonl
The input file can be either of the following:
A plain-text file with a list of target URLs, one per line. For example:
https://books.toscrape.com
https://quotes.toscrape.com
For each URL, a Zyte API request will be sent with
request:browserHtml
set to True.
A JSON Lines file with an object of
Zyte API request parameters <zyte-api-reference>
per line. For example:
{"url": "https://a.example", "browserHtml": true, "geolocation": "GB"}
{"url": "https://b.example", "httpResponseBody": true}
{"url": "https://books.toscrape.com", "productNavigation": true}
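If the request parameters are generated programmatically, a JSON Lines input file like the one above can be written with a few lines of Python. This is only a minimal sketch: the `write_requests` helper, the `requests` list and the `requests.jsonl` file name are illustrative and not part of python-zyte-api.

```python
import json


def write_requests(path, requests):
    """Write one JSON object per line (JSON Lines) to ``path``.

    ``write_requests`` is a hypothetical helper, not part of python-zyte-api.
    """
    with open(path, "w", encoding="utf-8") as f:
        for params in requests:
            # json.dumps emits a single line, so each request stays on its own line.
            f.write(json.dumps(params) + "\n")


# Illustrative request parameters, taken from the example above.
requests = [
    {"url": "https://a.example", "browserHtml": True, "geolocation": "GB"},
    {"url": "https://b.example", "httpResponseBody": True},
]
```

Calling `write_requests("requests.jsonl", requests)` would produce a file that can then be passed to zyte-api as the first parameter.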
You can specify the path to an output file with the --output
/-o
switch. If not specified, the output is printed to the standard output.
Warning
If a file already exists at the output path, it is overwritten.
The output file is in JSON Lines format. Each line contains a JSON object with a response from Zyte API.
By default, zyte-api
uses multiple concurrent connections for performance reasons <cli-optimize>
and, as a result, the order of responses will probably not match the order of the source requests from the input file <input-file>
. If you need to match the output results to the input requests, the best way is to use request:echoData
. By default, zyte-api
fills request:echoData
with the input URL.
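Matching responses back to requests can be sketched as follows. The sketch assumes that each output line is a JSON object carrying the `echoData` value of its request, and that this value is a string such as the input URL; `index_by_echo_data` is a hypothetical helper, not part of python-zyte-api.

```python
import json


def index_by_echo_data(lines):
    """Map each response's echoData value to the parsed response.

    Assumes echoData is a hashable value, such as the input URL string.
    """
    by_echo = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        response = json.loads(line)
        by_echo[response.get("echoData")] = response
    return by_echo
```

For example, after opening `result.jsonl` as `f`, `index_by_echo_data(f)` returns a dictionary in which the response for a given input URL can be looked up regardless of the order in which responses arrived.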
By default, zyte-api
uses 20 concurrent connections for requests. Use the --n-conn
switch to change that:
zyte-api --n-conn 40 …
The --shuffle
option can be useful if you target multiple websites and your input file <input-file>
is sorted by website, because it randomizes the request order and so spreads the load more evenly across websites:
zyte-api urls.txt --shuffle …
For guidelines on how to choose the optimal --n-conn
value for you, and other optimization tips, see zyte-api-optimize
.
zyte-api
automatically handles retries for rate-limiting
<zyte-api-rate-limit>
and unsuccessful
<zyte-api-unsuccessful-responses>
responses, as well as network errors, following the default retry policy <default-retry-policy>
.
Use --dont-retry-errors
to disable retries for error responses and retry only rate-limiting responses <zyte-api-rate-limit>
:
zyte-api --dont-retry-errors …
By default, errors are only logged to the standard error output (stderr
). If you want to include error responses in the output file, use --store-errors
:
zyte-api --store-errors …
cli-ref