Skip to content

Conversation

@doronkopit5
Copy link
Member

@doronkopit5 doronkopit5 commented Nov 24, 2025

Fixes #2897
Fixes #2512

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Important

Add mypy type-checking support to traceloop-sdk, enhance type annotations, and integrate type validation into CI/CD pipeline.

  • Type Checking:
    • Added mypy for static type checking in pyproject.toml and project.json.
    • Integrated type-checking step in CI/CD pipeline in ci.yml.
  • Code Changes:
    • Enhanced type annotations across various files for better type safety.
    • Renamed entity_id to entity_instance_id in user_feedback.py and base_annotation.py for consistency.
  • Dependencies:
    • Added mypy, types-requests, types-colorama, and pandas-stubs to pyproject.toml for type checking support.

This description was created by Ellipsis for 5ea4dd6. You can customize this summary. It will automatically update as commits are pushed.


Summary by CodeRabbit

  • Chores

    • Integrated static type checking into CI pipeline with mypy configuration
    • Added comprehensive type annotations across SDK modules and test suites
    • Enhanced overall type safety configuration
  • Documentation

    • Updated user feedback API documentation reflecting entity identifier parameter naming standardization

✏️ Tip: You can customize this high-level summary in your review settings.

…ependencies

- Added mypy for static type checking in the traceloop-sdk.
- Updated poetry.lock to include mypy and its dependencies.
- Enhanced CI workflow to include type-checking step.
- Refactored user feedback and dataset classes to use entity_instance_id.
- Improved type annotations across various modules for better type safety.
- Updated poetry.lock to include pandas-stubs and types-pytz for improved type checking.
- Modified project.json to specify the mypy command for type checking the sdk directory.
- Enhanced type annotations across various modules for better type safety and clarity.
- Refactored function signatures to include return types for improved type hinting.
@coderabbitai
Copy link

coderabbitai bot commented Nov 24, 2025

Walkthrough

Adds comprehensive type hints and mypy support across traceloop-sdk. Introduces CI type-check target, configures strict mypy validation in pyproject.toml, updates type annotations in core modules, and simplifies decorator TypeVars to resolve Pyright compatibility issues.

Changes

Cohort / File(s) Summary
Workflow & Project Configuration
.github/workflows/ci.yml, packages/traceloop-sdk/project.json
Added NX type-check target and CI step running poetry run mypy traceloop/sdk in parallel with 3 workers.
Mypy Configuration
packages/traceloop-sdk/pyproject.toml
Added [tool.mypy] with strict type checking settings, [tool.mypy.overrides] to ignore missing imports for specific modules (cuid, posthog), and [pydantic-mypy] config. Added dev dependencies: mypy, types-requests, types-colorama, pandas-stubs.
Decorator Type System
packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py, packages/traceloop-sdk/traceloop/sdk/decorators/base.py
Simplified TypeVar F from Callable[P, R | Awaitable[R]] to Callable[..., Any] and removed ParamSpec and R TypeVar declarations to resolve Pyright compatibility.
Test Type Hints
packages/traceloop-sdk/tests/test_user_feedback.py
Added type annotations to test function signatures (UserFeedback, Mock types) and updated payload keys from entity_id to entity_instance_id.
Annotation Module Updates
packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py, packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py
Updated docstrings and examples in base_annotation.py; moved create method into UserFeedback class and renamed parameter from entity_instance_id to entity_id.
HTTP & Client Modules
packages/traceloop-sdk/traceloop/sdk/client/http.py, packages/traceloop-sdk/traceloop/sdk/client/client.py
Added return type Dict[str, str] to _headers() method; added type: ignore[arg-type] comment in Experiment constructor.
Dataset Modules
packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py, packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
Added return type annotations (-> None) to _create_columns and _create_rows in Dataset; added cast() usage, None checks, and string type casting in datasets.py.
Stream Client
packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py
Added type annotation response: httpx.Response to _handle_sse_response method; refactored error message construction with type ignore comment.
Experiment Module
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
Added type hints to inner coroutines and multiple type: ignore comments for dynamic typing expectations in async execution paths.
Fetcher Module
packages/traceloop-sdk/traceloop/sdk/fetcher.py
Added comprehensive return type annotations to all public methods and functions; enhanced HTTP request handling with headers and stricter error checks.
Image Uploader
packages/traceloop-sdk/traceloop/sdk/images/image_uploader.py
Added extensive type hints to __init__, upload_base64_image, aupload_base64_image, _get_image_url, and _async_upload method signatures with explicit parameter and return types.
Logging & Metrics Wrappers
packages/traceloop-sdk/traceloop/sdk/logging/logging.py, packages/traceloop-sdk/traceloop/sdk/metrics/metrics.py
Widened resource_attributes from dict to Dict[Any, Any], changed endpoint from str to Optional[str], and made internal exporter/provider fields Optional; adjusted initialization logic to handle None values.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Decorator changes require careful validation that the simplified TypeVar still satisfies all decorator use cases and doesn't inadvertently break runtime behavior
  • Logging/Metrics modules have control-flow changes (early returns on None) that need verification
  • Datasets module introduces cast() calls and type-dependent logic that should be validated for correctness
  • Many files have annotation-only changes that are lower-risk but still need scanning for accuracy
  • Cross-module consistency of type patterns (Optional, Dict[Any, Any]) should be verified

Possibly related PRs

Suggested reviewers

  • nirga
  • galkleinman
  • dinmukhamedm

Poem

🐰 Hops of type, whiskers of care,
Mypy checks floating through the air,
ParamSpec gone, Callable stays true,
Pyright smiles at the typing we do,
Strict and strong, our code shines bright!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The renaming of entity_id to entity_instance_id in user_feedback.py and related files appears unrelated to the type-checking objectives from linked issues #2897 and #2512. Clarify whether the entity_id/entity_instance_id renaming is necessary for type-checking compliance or if it should be addressed in a separate PR to maintain focus.
Docstring Coverage ⚠️ Warning Docstring coverage is 36.73% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main objective: adding mypy type-checking support to traceloop-sdk, which is the primary focus of the changes.
Linked Issues check ✅ Passed The PR addresses both linked issues: #2897 (fixes Pyright errors by simplifying ParamSpec typing to Callable[..., Any]) and #2512 (adds comprehensive mypy support with type annotations, stubs, and configuration).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dk/add-mypy-for-traceloop-sdk

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5ea4dd6 and ba4e094.

⛔ Files ignored due to path filters (1)
  • packages/traceloop-sdk/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/traceloop-sdk/pyproject.toml (2 hunks)
  • packages/traceloop-sdk/traceloop/sdk/decorators/base.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/traceloop-sdk/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/traceloop-sdk/traceloop/sdk/decorators/base.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Lint
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Build Packages (3.11)
🔇 Additional comments (2)
packages/traceloop-sdk/traceloop/sdk/decorators/base.py (2)

4-10: LGTM: Import reorganization is appropriate.

The reorganization of the typing imports (moving Optional into the block and removing ParamSpec/Awaitable) aligns with the simplified typing approach and improves code organization.


29-29: Verify that overload alternatives were considered when simplifying TypeVar F.

The simplification from Callable[P, R | Awaitable[R]] to Callable[..., Any] is confirmed at line 29. This reduces type precision for decorated functions—callers will lack proper type information about function parameters and return types.

Verification found:

The type precision regression is real and impacts IDE autocomplete and type checking quality. Before accepting this trade-off, please confirm this simplified approach was chosen deliberately after evaluating overload-based alternatives that could preserve more type information while addressing Pyright compatibility.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed everything up to c6cb82f in 2 minutes and 33 seconds. Click for details.
  • Reviewed 747 lines of code in 20 files
  • Skipped 1 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py:29
  • Draft comment:
    The docstring still refers to the old parameter name 'entity_id' whereas the method now uses 'entity_instance_id'. Please update the documentation to match the updated parameter name.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:100
  • Draft comment:
    The TODO indicates that the task function’s return type is not properly annotated and is being type ignored. Consider updating the 'task' parameter signature to return an Awaitable (e.g. Awaitable[Dict[str, Any]]) instead of a plain dict.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The diff shows that TODO comments and type: ignore annotations were added in this PR. The comment is suggesting to fix the type annotation issue that the TODO mentions. However, looking at the rules, I need to check: 1) Is this about a code change in the diff? Yes, the TODO and type: ignore were added. 2) Is this actionable? Yes, it suggests updating the task parameter signature. 3) Is this obvious? The TODO already states what needs to be fixed, so the comment is essentially just repeating what the TODO says. 4) Does it require a code change? Yes, but the author already acknowledged this with the TODO. The comment is essentially asking the author to resolve the TODO they just added, which seems redundant. The author clearly knows about this issue (they added the TODO), so the comment doesn't add new information. The comment is actionable and technically correct about the type mismatch. The author added a TODO acknowledging the issue, which might mean they plan to fix it later. The comment could be seen as reinforcing the need to address this, rather than being redundant. However, it's also possible the author intentionally added the TODO to defer this work. While the comment is technically correct, it's redundant with the TODO that was just added by the author. The author clearly knows about this issue and has documented it. The comment doesn't provide any new information or insight beyond what the TODO already states. According to the rules, comments should not be obvious or unimportant, and this falls into that category since it's just restating what the code already says. This comment should be deleted because it's redundant with the TODO comment that the author just added. The author is already aware of the type annotation issue and has documented it. The comment doesn't add any new information or actionable insight beyond what's already in the code.
3. packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py:137
  • Draft comment:
    The TODO comments regarding eval_results type suggest that the current type is not fully accurate. Consider refining the type of eval_results to accommodate both dict and string outputs for cleaner type-safety without using type: ignore.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is directly about changes made in the diff (the TODO comments and type: ignore annotations). However, the comment is essentially just restating what the TODO comments already say - the author already acknowledged this issue by adding TODO comments. The comment doesn't provide any new information or actionable guidance beyond what the author already documented. This seems like a case where the author is aware of the type issue and has documented it for future work. The comment is somewhat redundant with the TODOs already present. According to the rules, comments should not be obvious or unimportant, and this comment is essentially just echoing what's already in the code. The author explicitly added TODO comments acknowledging this exact issue, so they are clearly aware of it. However, the comment could be seen as reinforcing the importance of addressing these TODOs, or suggesting it should be done now rather than later. Perhaps the comment is trying to elevate the priority of fixing this type issue. While the comment might be trying to elevate priority, it doesn't provide any new information, specific guidance on how to fix it, or explain why it should be done now versus later. The author has already documented the issue with TODOs, and the comment is essentially redundant. It's not actionable beyond what the TODOs already indicate. The comment should be deleted because it merely restates what the author has already documented in TODO comments without providing additional value, specific guidance, or actionable next steps. It violates the rule about not making obvious comments.
4. packages/traceloop-sdk/traceloop/sdk/telemetry.py:10
  • Draft comment:
    The Posthog API key is hardcoded in the source. For security and flexibility, consider moving this key to an environment variable or secure configuration.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py:60
  • Draft comment:
    The TODO indicates that the bytes from response.aread() should be properly decoded before being included in the error message. Consider decoding the bytes (e.g., error_text.decode('utf-8')) to avoid issues with byte-string formatting.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% The comment is directly about code that was changed in the diff - specifically the error handling that was modified. The TODO was added by the PR author themselves, indicating they know there's an issue but haven't fixed it yet. The comment provides a concrete, actionable suggestion to decode the bytes. This is a legitimate code quality issue that needs to be fixed. The comment is not speculative - the TODO confirms there IS an issue. The suggestion is clear and actionable. However, I need to consider: the PR author already added the TODO themselves, so they're aware of the issue. Does this comment add value beyond what the TODO already says? The PR author already added a TODO acknowledging this exact issue, so they're clearly aware of it. The comment might be redundant since it's just restating what the TODO says and providing the obvious fix. The author may have intentionally left this as a TODO to fix later. While the author added the TODO, the comment provides a concrete code suggestion that makes it immediately actionable. The comment shows exactly how to fix it, which is more helpful than just the TODO. However, the rules state "Do NOT comment unless there is clearly a code change required" - and if the author intentionally added a TODO for later, this might not be a change required NOW. The comment is technically correct and actionable, but it's addressing a TODO that the PR author themselves added, suggesting they're aware of the issue and may be intentionally deferring the fix. Since the author already documented this as a known issue with a TODO, the comment is somewhat redundant. However, it does provide a concrete fix.
6. packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py:27
  • Draft comment:
    Typographical issue: Consider changing "an user feedback annotation" to "a user feedback annotation".
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_ZXsP8FVGY9MEmS0g

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@nirga nirga changed the title feat(traceloop-sdk): add type-checking support with mypy fix(traceloop-sdk): add type-checking support with mypy Nov 24, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)

34-34: Fix the task signature to reflect async behavior.

The task parameter is typed as Callable[[Optional[Dict[str, Any]]], Dict[str, Any]], but line 101 awaits the result, meaning it should return an Awaitable. The current signature misrepresents the API contract and prevents mypy from catching incorrect usage.

Apply this diff to fix the signature:

     async def run(
         self,
-        task: Callable[[Optional[Dict[str, Any]]], Dict[str, Any]],
+        task: Callable[[Optional[Dict[str, Any]]], Awaitable[Dict[str, Any]]],
         dataset_slug: Optional[str] = None,

Then update the import on line 4:

-from typing import Any, List, Callable, Optional, Tuple, Dict
+from typing import Any, List, Callable, Optional, Tuple, Dict, Awaitable
packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py (1)

27-50: Fix docstring to match the parameter rename.

The docstring still references entity_id in the Args section (line 32) and in the example code (line 42), but the actual parameter name is entity_instance_id. This inconsistency will mislead users and cause runtime errors if they follow the documentation.

Apply this diff to update the docstring:

         Args:
             annotation_task (str): The ID/slug of the annotation task to report to.
                 Can be found at app.traceloop.com/annotation_tasks/:annotation_task_id
-            entity_id (str): The ID of the specific entity instance being annotated, should be reported
+            entity_instance_id (str): The ID of the specific entity instance being annotated, should be reported
                 in the association properties
             tags (Dict[str, Any]): Dictionary containing the tags to be reported.
                 Should match the tags defined in the annotation task
 
         Example:
             ```python
             client = Client(api_key="your-key")
             client.annotation.create(
                 annotation_task="task_123",
-                entity_id="instance_456",
+                entity_instance_id="instance_456",
                 tags={
                     "sentiment": "positive",
                     "relevance": 0.95,
                     "tones": ["happy", "nice"]
                 },
             )
             ```
🧹 Nitpick comments (10)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (2)

137-143: Consider properly typing eval_results instead of using type: ignore.

The type: ignore[assignment] pragmas suppress legitimate type concerns about assigning strings to eval_results. Since TaskResponse.evaluations is typed as Dict[str, Any] (based on the model snippet), the assignments are actually valid.

Remove the type: ignore comments and simplify:

                             else:
                                 await self._evaluator.trigger_experiment_evaluator(
                                     evaluator_slug=evaluator_slug,
                                     evaluator_version=evaluator_version,
                                     task_id=task_id,
                                     experiment_id=experiment.experiment.id,
                                     experiment_run_id=run_id,
                                     input=task_result,
                                 )

-                                # TODO: Fix type - eval_results should accept Union[Dict, str]
-                                msg = f"Triggered execution of {evaluator_slug}"
-                                eval_results[evaluator_slug] = msg  # type: ignore[assignment]
+                                eval_results[evaluator_slug] = f"Triggered execution of {evaluator_slug}"

                         except Exception as e:
-                            # TODO: Fix type - eval_results should accept Union[Dict, str]
-                            eval_results[evaluator_slug] = f"Error: {str(e)}"  # type: ignore[assignment]
+                            eval_results[evaluator_slug] = f"Error: {str(e)}"

The assignments are valid because Dict[str, Any] accepts string values.


143-143: Optional: Use explicit conversion flag in f-string.

The linter suggests using an explicit conversion flag for clarity.

As per coding guidelines, based on Ruff/Flake8 linting rules:

                         except Exception as e:
-                            eval_results[evaluator_slug] = f"Error: {str(e)}"
+                            eval_results[evaluator_slug] = f"Error: {e!s}"
packages/traceloop-sdk/traceloop/sdk/telemetry.py (1)

17-19: LGTM with optional improvement suggestion.

The Optional[Posthog] typing and associated guards correctly handle initialization failures. The silent exception handling in capture, log_exception, and feature_enabled is appropriate for telemetry code.

Consider logging exceptions at debug level instead of silently passing, which would aid troubleshooting without impacting production:

     def capture(self, event: str, event_properties: dict = {}) -> None:
         try:  # don't fail if telemetry fails
             if self._telemetry_enabled and self._posthog is not None:
                 self._posthog.capture(
                     self._anon_id(), event, {**self._context(), **event_properties}
                 )
-        except Exception:
-            pass
+        except Exception as e:
+            logging.getLogger(__name__).debug(f"Telemetry capture failed: {e}")

Also applies to: 71-71, 78-88, 92-98

packages/traceloop-sdk/traceloop/sdk/logging/logging.py (1)

43-50: Align parameter type with class attribute type.

The resource_attributes parameter is typed as dict, but the class attribute is Dict[Any, Any]. For consistency and better mypy validation, use the same type annotation.

Apply this diff:

 @staticmethod
 def set_static_params(
-    resource_attributes: dict,
+    resource_attributes: Dict[Any, Any],
     endpoint: str,
     headers: Dict[str, str],
 ) -> None:
packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py (1)

42-42: Consider using super() for cleaner delegation.

The explicit BaseAnnotation.create(self, ...) works but super().create(...) is more idiomatic and maintainable.

Apply this diff:

-        return BaseAnnotation.create(self, annotation_task, entity_instance_id, tags)
+        return super().create(annotation_task, entity_instance_id, tags)
packages/traceloop-sdk/traceloop/sdk/fetcher.py (2)

67-71: Consider improving error handling.

The method catches all exceptions and only prints them, which silently swallows errors and can make debugging difficult. Consider using proper logging (with logging.error() or logging.exception()) or letting exceptions propagate to the caller for appropriate handling.

Apply this diff to improve error handling:

 def api_post(self, api: str, body: Dict[str, typing.Any]) -> None:
     try:
         post_url(f"{self._base_url}/v2/{api}", self._api_key, body)
     except Exception as e:
-        print(e)
+        logging.exception(f"Error posting to {api}: {e}")

97-114: The Any return type is acceptable but not ideal.

The function returns response.json() which can vary by endpoint, making Any reasonable without a unified response schema. Consider defining typed response models (e.g., using TypedDict or Pydantic) for better type safety in the future.

packages/traceloop-sdk/traceloop/sdk/decorators/base.py (1)

145-145: Use logging module instead of print statement.

Debug output in library code should use the logging module rather than print() so users can configure log levels and output destinations.

Apply this diff:

+import logging
+
+logger = logging.getLogger(__name__)
+
 ...
 
     if tlp_span_kind == TraceloopSpanKindValues.AGENT:
-        print(f"Setting agent name: {entity_name}")
+        logger.debug(f"Setting agent name: {entity_name}")
         set_agent_name(entity_name)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (2)

84-86: LGTM! Header validation prevents potential errors.

The explicit check for None fieldnames correctly prevents an AttributeError when processing CSVs without headers. The current implementation with a clear error message is appropriate.

If you consider the current error handling sufficient, the TODO comment could be removed. Alternatively, you might consider defining a custom exception class to address the static analysis hint, but that's purely optional.


145-147: LGTM! Defensive type casting is appropriate.

The explicit str(k) cast correctly handles pandas DataFrame columns, which are typed as Hashable and can include integers, tuples, or other non-string types. This ensures type safety when passing to _slugify, which expects a string parameter.

The TODO could be removed if this defensive casting approach is the intended solution. The current implementation properly handles the type mismatch between pandas' Hashable column names and the string-expecting _slugify method.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ba1aa4c and c6cb82f.

⛔ Files ignored due to path filters (1)
  • packages/traceloop-sdk/poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/workflows/ci.yml (1 hunks)
  • packages/traceloop-sdk/project.json (1 hunks)
  • packages/traceloop-sdk/pyproject.toml (2 hunks)
  • packages/traceloop-sdk/tests/test_user_feedback.py (3 hunks)
  • packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py (2 hunks)
  • packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/client/client.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/client/http.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py (2 hunks)
  • packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (4 hunks)
  • packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py (2 hunks)
  • packages/traceloop-sdk/traceloop/sdk/decorators/base.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py (1 hunks)
  • packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (3 hunks)
  • packages/traceloop-sdk/traceloop/sdk/fetcher.py (8 hunks)
  • packages/traceloop-sdk/traceloop/sdk/images/image_uploader.py (2 hunks)
  • packages/traceloop-sdk/traceloop/sdk/logging/logging.py (3 hunks)
  • packages/traceloop-sdk/traceloop/sdk/metrics/metrics.py (3 hunks)
  • packages/traceloop-sdk/traceloop/sdk/telemetry.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/traceloop-sdk/traceloop/sdk/decorators/base.py
  • packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py
  • packages/traceloop-sdk/traceloop/sdk/client/http.py
  • packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py
  • packages/traceloop-sdk/traceloop/sdk/client/client.py
  • packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py
  • packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py
  • packages/traceloop-sdk/traceloop/sdk/telemetry.py
  • packages/traceloop-sdk/tests/test_user_feedback.py
  • packages/traceloop-sdk/traceloop/sdk/logging/logging.py
  • packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py
  • packages/traceloop-sdk/traceloop/sdk/metrics/metrics.py
  • packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py
  • packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py
  • packages/traceloop-sdk/traceloop/sdk/fetcher.py
  • packages/traceloop-sdk/traceloop/sdk/images/image_uploader.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
Repo: traceloop/openllmetry PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Use Nx workspace commands to run tests, lint, lockfile updates, and affected graphs

Applied to files:

  • .github/workflows/ci.yml
🧬 Code graph analysis (8)
packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py (1)
packages/traceloop-sdk/traceloop/sdk/evaluator/model.py (1)
  • ExecutionResponse (39-43)
packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (1)
packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
  • get (39-58)
packages/traceloop-sdk/traceloop/sdk/client/client.py (2)
packages/traceloop-sdk/tests/experiment/test_experiment.py (1)
  • experiment (8-14)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
  • Experiment (19-244)
packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py (1)
packages/traceloop-sdk/traceloop/sdk/dataset/model.py (2)
  • ColumnDefinition (14-17)
  • RowObject (31-35)
packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py (1)
packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py (2)
  • create (21-71)
  • BaseAnnotation (6-71)
packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)
packages/traceloop-sdk/traceloop/sdk/experiment/model.py (1)
  • TaskResponse (10-15)
packages/traceloop-sdk/traceloop/sdk/fetcher.py (2)
packages/traceloop-sdk/traceloop/sdk/client/http.py (1)
  • post (23-37)
packages/traceloop-sdk/traceloop/sdk/tracing/content_allow_list.py (1)
  • load (23-24)
packages/traceloop-sdk/traceloop/sdk/images/image_uploader.py (2)
packages/opentelemetry-instrumentation-anthropic/tests/conftest.py (3)
  • upload_base64_image (81-82)
  • upload_base64_image (104-105)
  • upload_base64_image (130-131)
packages/traceloop-sdk/traceloop/sdk/fetcher.py (1)
  • run (54-62)
🪛 Ruff (0.14.5)
packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py

60-60: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


61-61: Create your own exception

(TRY002)

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)


174-174: Create your own exception

(TRY002)


174-174: Avoid specifying long messages outside the exception class

(TRY003)

packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py

55-55: Avoid specifying long messages outside the exception class

(TRY003)


57-57: Avoid specifying long messages outside the exception class

(TRY003)

packages/traceloop-sdk/traceloop/sdk/telemetry.py

75-76: try-except-pass detected, consider logging the exception

(S110)


75-75: Do not catch blind exception: Exception

(BLE001)


89-90: try-except-pass detected, consider logging the exception

(S110)


89-89: Do not catch blind exception: Exception

(BLE001)

packages/traceloop-sdk/traceloop/sdk/logging/logging.py

18-18: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


20-20: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

packages/traceloop-sdk/traceloop/sdk/metrics/metrics.py

23-23: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


26-26: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py

141-141: Do not catch blind exception: Exception

(BLE001)


143-143: Use explicit conversion flag

Replace with conversion flag

(RUF010)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Packages (3.12)
  • GitHub Check: Test Packages (3.11)
  • GitHub Check: Test Packages (3.10)
  • GitHub Check: Build Packages (3.11)
  • GitHub Check: Lint
🔇 Additional comments (29)
packages/traceloop-sdk/traceloop/sdk/client/http.py (1)

17-17: LGTM! Return type annotation is correct and improves type safety.

The -> Dict[str, str] annotation accurately describes the method's return value and aligns with the PR's goal of adding mypy type-checking support.

packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py (1)

55-55: LGTM! Type annotations align with mypy support objective.

The method signature is properly typed with explicit parameter and return type annotations.

packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py (1)

157-159: LGTM!

The type hint addition for the row parameter is correct and consistent with run_single_row.

.github/workflows/ci.yml (1)

77-77: LGTM!

The type-check step is properly integrated using Nx commands with parallel execution, aligning with the project's workflow patterns.

packages/traceloop-sdk/traceloop/sdk/dataset/dataset.py (1)

89-89: LGTM!

The -> None return type annotations correctly reflect that these private methods mutate instance state without returning values.

Also applies to: 101-101

packages/traceloop-sdk/project.json (1)

53-60: LGTM!

The type-check target is properly configured to run mypy on the SDK codebase, integrating seamlessly with the Nx workflow.

packages/traceloop-sdk/traceloop/sdk/metrics/metrics.py (1)

30-30: LGTM!

The Optional typing for method parameters correctly reflects that these values can be None, improving type safety.

Also applies to: 69-69

packages/traceloop-sdk/pyproject.toml (2)

83-86: LGTM!

The mypy development dependencies and type stub packages are appropriate for comprehensive type checking.


102-139: LGTM!

The mypy configuration is well-structured with:

  • Strict type checking enabled for improved code quality
  • Reasonable exclusions for gradual adoption (decorators, prompts, tracing, utils)
  • Appropriate overrides for third-party libraries without type stubs
  • Pydantic plugin integration for model validation

The configuration aligns with Python 3.10+ support and follows best practices for introducing static type checking.

packages/traceloop-sdk/traceloop/sdk/images/image_uploader.py (1)

9-9: LGTM!

The type annotations are comprehensive and accurate:

  • Method signatures clearly specify parameter and return types
  • upload_base64_image correctly returns None while wrapping the async call
  • aupload_base64_image properly returns str (the uploaded image URL)
  • The type: ignore[no-any-return] on Line 40 is appropriate for the JSON response

Also applies to: 14-17, 19-26, 28-40, 42-59

packages/traceloop-sdk/traceloop/sdk/client/client.py (1)

66-67: No action required; runtime behavior is safe.

The Experiment class gracefully handles None at runtime. When experiment_slug=None is passed to __init__, it stores the value in self._experiment_slug. Later, when the run() method executes, it has a fallback check: if not experiment_slug: experiment_slug = self._experiment_slug or "exp-" + str(cuid.cuid())[:11]. This means if the stored value is None, a default slug is generated automatically, preventing any runtime errors.

The type: ignore[arg-type] suppression is appropriate and necessary because the type annotation requires str while os.getenv() returns Optional[str], but the actual runtime behavior is safe.

packages/traceloop-sdk/traceloop/sdk/logging/logging.py (1)

24-40: LGTM!

The Optional type hints correctly reflect that the logger may be initialized without an endpoint, and the early return pattern appropriately handles this case in the singleton pattern.

packages/traceloop-sdk/tests/test_user_feedback.py (1)

38-89: LGTM!

The type hints enhance test clarity, and the parameter rename from entity_id to entity_instance_id is applied consistently across all test cases and assertions.

packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py (1)

52-71: LGTM!

The validation and payload construction correctly use the renamed parameter entity_instance_id.

packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py (1)

11-40: LGTM!

The method signature and documentation correctly use entity_instance_id and provide clear guidance for users of the API.

packages/traceloop-sdk/traceloop/sdk/fetcher.py (8)

9-9: LGTM!

The addition of Any to the typing imports is necessary for the type annotations added throughout the file.


54-54: LGTM!

The return type annotation is correct for this method.


86-89: LGTM!

The BaseException parameter type is appropriate here since this function is used as a retry predicate that needs to filter any exception type.


117-128: LGTM!

The function signature and implementation are correct. The Authorization header properly uses the API key from parameters, following security guidelines.


138-138: LGTM!

The return type annotation is correct for this polling function.


154-154: LGTM!

The return type annotation is correct for this data refresh function.


162-165: LGTM!

The return type annotation is correct for this thread monitoring function.


64-65: The review comment is incorrect.

The original concern conflates two different HTTP client classes. Fetcher has two post methods: post(api: str, body: Dict[str, str]) for v1 endpoints and api_post(api: str, body: Dict[str, typing.Any]) for v2 endpoints. The restrictive Dict[str, str] type is intentional for the v1 endpoint.

More importantly, dataset operations use HTTPClient (confirmed by BaseDatasetEntity._http: HTTPClient), which correctly declares post(path: str, data: Dict[str, Any]). All callers passing complex types—including Pydantic model_dump() results and ValuesMap dictionaries—use HTTPClient, not Fetcher. No call sites invoke Fetcher.post() with non-string values.

Likely an incorrect or invalid review comment.

packages/traceloop-sdk/traceloop/sdk/decorators/base.py (3)

206-284: Implementation logic is sound.

The decorator correctly handles all three function types (async generators, async functions, and sync functions) with proper span lifecycle management and error handling. The generator handling at lines 275-276 appropriately delegates to _handle_generator to ensure the span is ended after generation completes.


287-304: LGTM!

The class decorator correctly retrieves the target method, wraps it with entity_method, and sets it back on the class. The default naming convention using camel_to_snake is appropriate.


30-30: Revise review comment: mypy excludes this folder, so the "defeats mypy support" claim is incorrect.

The decorators folder is explicitly excluded from mypy checking in pyproject.toml, so mypy doesn't validate this code regardless of the TypeVar definition. The premise that removing the bound "defeats mypy support" is inaccurate.

However, the underlying concern remains valid: F = TypeVar("F") is unconstrained and the code assumes fn is callable (accessing fn.__qualname__ and calling _is_async_method(fn)). While this works at runtime, using ParamSpec for decorators is the standard approach to improve type clarity. Since Python 3.10+ is the target, ParamSpec is available from the typing module.

Recommended change: Use ParamSpec for better code quality, not as a mypy fix:

from typing import ParamSpec, TypeVar, Callable, Awaitable

P = ParamSpec('P')
R = TypeVar('R')
F = Callable[P, R | Awaitable[R]]

Likely an incorrect or invalid review comment.

packages/traceloop-sdk/traceloop/sdk/datasets/datasets.py (3)

2-2: LGTM! Import supports type narrowing.

The addition of cast to the imports is appropriate for the type-checking improvements made in this file.


163-168: No issues found—code is correct.

Verification of the API endpoint confirms it returns text/csv content-type. The HTTPClient.get() method properly handles this by checking the content-type header and returning response.text for CSV responses. The cast(str, result) is appropriate here since HTTPClient.get() returns Any, and the None check ensures the cast operates on actual string data.


170-175: Confirm API sets Content-Type "application/x-ndjson" for datasets/{slug}/versions/{version}/jsonl

HTTPClient.get returns response.text when Content-Type includes "application/x-ndjson" or "text/csv", so cast(str, result) is correct only if the server sends that header; repository contains no server implementation or test/cassette for the /jsonl endpoint — verify the API sends "application/x-ndjson" for this route or add test/cassette asserting that behavior.

Comment on lines +59 to +61
# TODO: Fix bytes formatting - should decode error_text or use !r
error_msg = f"Failed to stream results: {response.status_code}, body: {error_text}" # type: ignore[str-bytes-safe] # noqa: E501
raise Exception(error_msg)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix bytes formatting in error message.

The error message includes error_text (bytes) directly in the f-string, which will display as b'...' instead of the actual error text. This degrades the error message quality for users.

Apply this diff to decode the bytes and remove the type ignore:

-        error_text = await response.aread()
-        # TODO: Fix bytes formatting - should decode error_text or use !r
-        error_msg = f"Failed to stream results: {response.status_code}, body: {error_text}"  # type: ignore[str-bytes-safe]  # noqa: E501
-        raise Exception(error_msg)
+        error_text = await response.aread()
+        error_msg = f"Failed to stream results: {response.status_code}, body: {error_text.decode('utf-8', errors='replace')}"
+        raise Exception(error_msg)

Additional note: As per static analysis, the # noqa: E501 directive on line 60 is unused and can be removed (already removed in the suggested fix above).

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.5)

60-60: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


61-61: Create your own exception

(TRY002)

🤖 Prompt for AI Agents
In packages/traceloop-sdk/traceloop/sdk/evaluator/stream_client.py around lines
59 to 61, the error_msg f-string includes error_text which may be bytes and
renders as b'...'; decode error_text before interpolating (e.g. error_str =
error_text.decode('utf-8', errors='replace') if isinstance(error_text, (bytes,
bytearray)) else str(error_text)) and use that decoded string in the f-string,
and remove the trailing "# type: ignore[str-bytes-safe]" and the unused "# noqa:
E501" comments so static analysis passes.

Comment on lines +98 to 108
async def run_single_row(row: Optional[Dict[str, Any]]) -> TaskResponse:
try:
task_result = await task(row)
# TODO: Fix type annotation - task should return Awaitable, not dict
task_result = await task(row) # type: ignore[misc]
# TODO: Fix type - task_input should accept Optional[Dict]
task_id = self._create_task(
experiment_slug=experiment_slug,
experiment_run_id=run_id,
task_input=row,
task_input=row, # type: ignore[arg-type]
task_output=task_result,
).id
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Address the Optional[Dict] vs Dict type mismatch.

The row parameter is typed as Optional[Dict[str, Any]] (line 98), but _create_task expects a non-optional Dict[str, Any] for task_input (line 214). The type: ignore[arg-type] on line 106 suppresses the error without handling the potential None case.

If row can actually be None at runtime, add a guard:

                 task_result = await task(row)  # type: ignore[misc]
+                if row is None:
+                    raise ValueError("Cannot create task with None input")
                 task_id = self._create_task(
                     experiment_slug=experiment_slug,
                     experiment_run_id=run_id,
-                    task_input=row,  # type: ignore[arg-type]
+                    task_input=row,
                     task_output=task_result,
                 ).id

Otherwise, if row is guaranteed to be non-None (since it comes from the rows list parsed from JSONL), change the type annotation:

-        async def run_single_row(row: Optional[Dict[str, Any]]) -> TaskResponse:
+        async def run_single_row(row: Dict[str, Any]) -> TaskResponse:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async def run_single_row(row: Optional[Dict[str, Any]]) -> TaskResponse:
try:
task_result = await task(row)
# TODO: Fix type annotation - task should return Awaitable, not dict
task_result = await task(row) # type: ignore[misc]
# TODO: Fix type - task_input should accept Optional[Dict]
task_id = self._create_task(
experiment_slug=experiment_slug,
experiment_run_id=run_id,
task_input=row,
task_input=row, # type: ignore[arg-type]
task_output=task_result,
).id
async def run_single_row(row: Dict[str, Any]) -> TaskResponse:
try:
# TODO: Fix type annotation - task should return Awaitable, not dict
task_result = await task(row) # type: ignore[misc]
# TODO: Fix type - task_input should accept Optional[Dict]
task_id = self._create_task(
experiment_slug=experiment_slug,
experiment_run_id=run_id,
task_input=row, # type: ignore[arg-type]
task_output=task_result,
).id
🤖 Prompt for AI Agents
In packages/traceloop-sdk/traceloop/sdk/experiment/experiment.py around lines
98–108, the function run_single_row types row as Optional[Dict[str, Any]] but
passes it into self._create_task which requires a non-optional Dict, and the
current type: ignore[arg-type] masks this; fix by either (A) if row can be None
at runtime: add an explicit guard before calling _create_task (e.g., if row is
None: raise a clear ValueError/TypeError with context) and then pass row (typed
as Dict) without type ignores, or (B) if row is never None (rows come from
validated JSONL), change the run_single_row parameter annotation to Dict[str,
Any] and adjust any callers accordingly, then remove the type: ignore[arg-type];
ensure the chosen approach removes the type ignore and the mypy error is
resolved.

Comment on lines +18 to 20
resource_attributes: Dict[Any, Any] = {}
endpoint: Optional[str] = None
headers: Dict[str, str] = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use ClassVar annotation for shared mutable class attributes.

The resource_attributes and headers attributes are class-level mutables shared across all instances of the singleton. Without ClassVar, mypy may incorrectly infer them as instance attributes. Based on coding guidelines, static analysis correctly flags this.

Apply this diff to add ClassVar annotations:

+from typing import Dict, Optional, Any, ClassVar
 
 class LoggerWrapper(object):
-    resource_attributes: Dict[Any, Any] = {}
-    endpoint: Optional[str] = None
-    headers: Dict[str, str] = {}
+    resource_attributes: ClassVar[Dict[Any, Any]] = {}
+    endpoint: ClassVar[Optional[str]] = None
+    headers: ClassVar[Dict[str, str]] = {}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
resource_attributes: Dict[Any, Any] = {}
endpoint: Optional[str] = None
headers: Dict[str, str] = {}
resource_attributes: ClassVar[Dict[Any, Any]] = {}
endpoint: ClassVar[Optional[str]] = None
headers: ClassVar[Dict[str, str]] = {}
🧰 Tools
🪛 Ruff (0.14.5)

18-18: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


20-20: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In packages/traceloop-sdk/traceloop/sdk/logging/logging.py around lines 18-20,
the mutable class-level attributes resource_attributes and headers must be
annotated with typing.ClassVar to indicate they are shared across instances;
update the top imports to include ClassVar (e.g., from typing import ClassVar)
and change the declarations to resource_attributes: ClassVar[Dict[Any, Any]] =
{} and headers: ClassVar[Dict[str, str]] = {}; leave endpoint as an
instance/optional attribute. Ensure imports remain unused-free.

Comment on lines +23 to 26
resource_attributes: Dict[Any, Any] = {}
endpoint: Optional[str] = None
# if it needs headers?
headers: Dict[str, str] = {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use ClassVar for mutable class attributes.

The mutable class attributes resource_attributes and headers should be annotated with ClassVar to indicate they are class-level attributes shared across all instances, not instance attributes.

Apply this diff:

+from typing import ClassVar
 from collections.abc import Sequence
 from typing import Dict, Optional, Any
 
 class MetricsWrapper(object):
-    resource_attributes: Dict[Any, Any] = {}
-    endpoint: Optional[str] = None
+    resource_attributes: ClassVar[Dict[Any, Any]] = {}
+    endpoint: ClassVar[Optional[str]] = None
     # if it needs headers?
-    headers: Dict[str, str] = {}
+    headers: ClassVar[Dict[str, str]] = {}
🧰 Tools
🪛 Ruff (0.14.5)

23-23: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


26-26: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🤖 Prompt for AI Agents
In packages/traceloop-sdk/traceloop/sdk/metrics/metrics.py around lines 23-26,
the mutable class attributes resource_attributes and headers are currently typed
as Dict and should be declared as ClassVar to indicate they are class-level
shared mutables; update their type annotations to use ClassVar[Dict[Any, Any]]
and ClassVar[Dict[str, str]] respectively, and add/import ClassVar from typing
at the top of the file if not already present.

Comment on lines -11 to -13
P = ParamSpec("P")
R = TypeVar("R")
F = TypeVar("F", bound=Callable[P, R | Awaitable[R]])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this with the agent after allowing mypy

The issue: The TypeVar F has a bound that uses ParamSpec and is too strict:

  1. Coroutine[Any, Any, WorkflowFinalOutput] is more specific than Awaitable[R], but TypeVars require exact matching (invariance)
  2. The ParamSpec creates strict parameter matching that doesn't work well with instance methods (which include self)
  3. This appears to be introduced in a recent traceloop-sdk commit (77bb066) that tried to "improve type safety"

Why It Looks Good in traceloop-sdk

The decorator types look correct syntactically, but they're overly restrictive for real-world usage patterns. The traceloop-sdk itself doesn't strictly type-check the decorators folder, so this issue wasn't caught in their own type checking.

self.datasets = Datasets(self._http)
experiment_slug = os.getenv("TRACELOOP_EXP_SLUG")
self.experiment = Experiment(self._http, self._async_http, experiment_slug)
# TODO: Fix type - Experiment constructor should accept Optional[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

"""

return BaseAnnotation.create(self, annotation_task, entity_instance_id, tags)
def create(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an un needed change

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Looks good to me! 👍

Reviewed 5ea4dd6 in 1 minute and 21 seconds. Click for details.
  • Reviewed 166 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. packages/traceloop-sdk/tests/test_user_feedback.py:41
  • Draft comment:
    Good update: the test now calls create() with the new parameter name 'entity_id'. The expected payload still uses 'entity_instance_id'. Ensure that this mapping is documented for clarity.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
2. packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py:23
  • Draft comment:
    Parameter renamed from 'entity_instance_id' to 'entity_id' improves clarity and type support. Note that the HTTP payload still maps to the key 'entity_instance_id' – if intentional for backward compatibility, consider adding a brief comment.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
3. packages/traceloop-sdk/traceloop/sdk/annotation/user_feedback.py:11
  • Draft comment:
    The UserFeedback.create() method now accepts 'entity_id' and delegates correctly to BaseAnnotation.create(). Changes are consistent with the new parameter naming.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
4. packages/traceloop-sdk/traceloop/sdk/decorators/__init__.py:11
  • Draft comment:
    Nice improvement: constraining the TypeVar 'F' to 'Callable[..., Any]' strengthens type-checking and mypy support.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
5. packages/traceloop-sdk/traceloop/sdk/decorators/base.py:30
  • Draft comment:
    Updating the TypeVar 'F' to be bound to 'Callable[..., Any]' is a good move for enhanced type safety. This aligns well with the overall mypy improvements.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50% None
6. packages/traceloop-sdk/traceloop/sdk/annotation/base_annotation.py:27
  • Draft comment:
    Typo: Consider changing "Create an user feedback annotation for a specific task." to "Create a user feedback annotation for a specific task."
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_V519RPEKNzU5JiUt

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

@doronkopit5 doronkopit5 merged commit f9a586a into main Nov 25, 2025
12 checks passed
@doronkopit5 doronkopit5 deleted the dk/add-mypy-for-traceloop-sdk branch November 25, 2025 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: @task and @workflow decorators throw Pyright type errors with traceloop-sdk v0.39.4+ 🚀 Feature: missing py.typed file/typings

3 participants