Description
Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/core
Feature
Hopefully we can all agree that crawlee is an extremely powerful library. Unfortunately, that power means debugging crawlee and understanding exactly what it is doing can be quite challenging in some cases. To help mitigate this, I would like to propose that crawlee implement automatic instrumentation for OpenTelemetry.
OpenTelemetry (OTLP) is quickly becoming an industry standard for providing observability into complex or distributed systems. It provides three primitives to help developers and ops teams understand their systems:
- Traces (scrapes, pages, hooks) - These are made up of spans, which are events that last for a given length of time. Data can be attached to them through attributes, but they are mostly used to build a hierarchical picture of what is going on in a system (what called what).
- Metrics (resource usage, queue progression) - These are figures that can change over time. They let you see how a system's state evolves while it is executing.
- Logs - In OTLP, logs can be attached to a parent span at a specific time. As you might expect, they can convey anything a developer wants.
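To make the relationship between the three primitives concrete, here is a rough TypeScript model. The types and helper names (`Span`, `LogRecord`, `MetricPoint`, `startSpan`, `spanPath`) are invented for illustration and are not the OpenTelemetry API; the span names and metric name are also just placeholders.

```typescript
// Illustrative types only; these are NOT the @opentelemetry/api interfaces.
interface LogRecord {
  timestampMs: number;
  message: string;
}

interface Span {
  name: string;
  parent: Span | undefined; // the parent link builds the hierarchy
  attributes: Record<string, string | number>;
  logs: LogRecord[]; // logs attach to the span they occurred in
  startMs: number;
  endMs: number | undefined;
}

// A metric is modelled here as a named value sampled over time.
interface MetricPoint {
  name: string;
  value: number;
  timestampMs: number;
}

function startSpan(name: string, parent: Span | undefined, startMs: number): Span {
  return { name, parent, attributes: {}, logs: [], startMs, endMs: undefined };
}

// Walk the parent links to recover "what called what".
function spanPath(span: Span): string {
  const names: string[] = [];
  for (let s: Span | undefined = span; s; s = s.parent) {
    names.unshift(s.name);
  }
  return names.join(" > ");
}

// A crawl containing a page, which in turn contains a hook.
const crawlSpan = startSpan("crawl", undefined, 0);
const pageSpan = startSpan("page", crawlSpan, 5);
pageSpan.attributes["url"] = "https://example.com";
const hookSpan = startSpan("hook", pageSpan, 6);
hookSpan.logs.push({ timestampMs: 7, message: "hook started" });
hookSpan.endMs = 9;

// Queue progression expressed as a metric that changes over time.
const queuePending: MetricPoint[] = [
  { name: "queue.pending", value: 10, timestampMs: 0 },
  { name: "queue.pending", value: 9, timestampMs: 9 },
];
```

With this model, `spanPath(hookSpan)` gives the chain from the crawl down to the hook, which is the kind of view a tracing backend would render as a waterfall.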
Library instrumentation can be written for OTLP in two ways: natively, which would mean adding OTLP to @crawlee/core, or through a separate instrumentation library, such as a @crawlee/otlp package that monkey patches the rest of crawlee.
Native instrumentation gets more direct access to the internal workings of crawlee, but it would be installed by everyone even if they do not use OTLP. Monkey patching would be less flexible, but crawlee would still be installable without it. So far, in the JavaScript ecosystem, the only package I know of with native instrumentation is Next.js (doing it natively would put crawlee on a list next to Next.js, which could be a bonus?).
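For anyone unfamiliar with the monkey patching approach, here is a minimal sketch of the idea. Everything in it is a hypothetical stand-in: `FakeCrawler` is not a crawlee class, `patchMethod` is not an existing helper, and `recordedSpans` is an in-memory array rather than an OpenTelemetry exporter. A real instrumentation package would create spans through the OpenTelemetry API instead.

```typescript
// Hypothetical stand-ins: FakeCrawler is NOT a crawlee class and
// recordedSpans is NOT an OpenTelemetry exporter.
interface RecordedSpan {
  name: string;
  durationMs: number;
  error: string | undefined;
}

const recordedSpans: RecordedSpan[] = [];

class FakeCrawler {
  async run(urls: string[]): Promise<number> {
    return urls.length; // pretend every URL was processed
  }
}

// Replace proto[method] with a wrapper that records a span per call.
// Returns a function that restores the original method.
function patchMethod(proto: any, method: string, spanName: string): () => void {
  const original = proto[method];
  proto[method] = async function (this: unknown, ...args: unknown[]): Promise<unknown> {
    const start = Date.now();
    const span: RecordedSpan = { name: spanName, durationMs: 0, error: undefined };
    try {
      // Call through to the untouched implementation with the same `this`.
      return await original.apply(this, args);
    } catch (err) {
      span.error = String(err);
      throw err; // instrumentation must not swallow errors
    } finally {
      span.durationMs = Date.now() - start;
      recordedSpans.push(span);
    }
  };
  return () => {
    proto[method] = original;
  };
}

// Install the patch without touching FakeCrawler's source.
const unpatch = patchMethod(FakeCrawler.prototype, "run", "FakeCrawler.run");
```

After this, every `new FakeCrawler().run(...)` call records a span, and calling `unpatch()` restores the original method. The trade-off mentioned above shows up here: the wrapper can only see what crosses the method boundary (arguments, result, errors, timing), not the internals of `run`.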
Motivation
Our crawlee scraper is so complex that it would be really useful to have spans and metrics automatically collected and collated.
Ideal solution or implementation, and any additional constraints
Native OTLP instrumentation throughout crawlee's packages.
Alternative solutions or implementations
An optional package along the lines of @crawlee/openTelemetry that monkey patches tracing into crawlee.
Other context
I have attached an example of a basic Express app here. It uses an auto-instrumentation library for Express and creates spans for "rollTheDice" and "rollOnce".
Below is an example request with the data shown in Azure Monitor.