@simantak-dabhade
Contributor

bonus points to anyone who figures out why this commit is called "batman"

hint: Batman has no parents. Git stores commits in a directed acyclic graph, and the first commit has no parents. Boom. (I realize this was more than just a hint.)

@coderabbitai
coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

This PR introduces a new TinyFish Web Agent skill in two files. SKILL.md documents how to use the skill: environment requirements, best practices, request/response structures, and Python code examples for calling the MINO API over Server-Sent Events (SSE). extract.py is a CLI tool exposing an extract() function that validates the MINO_API_KEY environment variable, builds a POST request payload from the URL, goal, and optional stealth/proxy parameters, streams the response from the MINO API, parses the SSE events, and returns the extracted result when a COMPLETE event with status "COMPLETED" is received.
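
The event-parsing step described above can be sketched in isolation, without touching the network. This is a minimal illustration, not the code from extract.py: the helper names `iter_sse_events` and `first_completed_result` are hypothetical, and the sample stream (including the `message` field) is made up. Only the `data: ` prefix and the `type`, `status`, and `resultJson` fields come from the walkthrough and the diffs in this review.

```python
import json


def iter_sse_events(lines):
    """Yield the decoded JSON payload of every `data: ` line in a byte stream."""
    for raw in lines:
        text = raw.decode("utf-8").strip()
        if text.startswith("data: "):
            yield json.loads(text[len("data: "):])


def first_completed_result(lines):
    """Return resultJson from the first COMPLETE/COMPLETED event, or None."""
    for event in iter_sse_events(lines):
        if event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
            return event.get("resultJson")
    return None


# Hypothetical sample stream; the field values are invented for illustration.
sample = [
    b'data: {"type": "STATUS_UPDATE", "message": "navigating"}\n',
    b"\n",
    b'data: {"type": "COMPLETE", "status": "COMPLETED", "resultJson": {"title": "Example"}}\n',
]
print(first_completed_result(sample))
```

Keeping the parser separate from the HTTP call makes it possible to exercise the event handling with canned byte lines, as the sample does.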

Sequence Diagram

sequenceDiagram
    participant CLI as CLI User
    participant Extract as extract() Function
    participant MINO as MINO API<br/>(mino.ai/v1/automation/run-sse)
    participant Response as Response Stream<br/>(SSE Events)

    CLI->>Extract: extract(url, goal, stealth?, proxy?)
    activate Extract
    Extract->>Extract: Validate MINO_API_KEY<br/>from environment
    alt API Key Missing
        Extract->>CLI: Error: Print error & exit
    else API Key Present
        Extract->>Extract: Build request payload<br/>(url, goal, browser_profile,<br/>proxy_config)
        Extract->>MINO: POST with X-API-Key header
        activate MINO
        MINO-->>Response: Stream SSE events
        deactivate MINO
        activate Response
        Extract->>CLI: Log "Extracting from {url}..."
        loop Parse Event Stream
            Response-->>Extract: SSE line (data: {...})
            alt Event Type = "STATUS_UPDATE"
                Extract->>CLI: Log to stderr
            else Event Type = "COMPLETE"
                alt Status = "COMPLETED"
                    Extract->>Extract: Extract resultJson
                    Extract->>CLI: Print resultJson to stdout
                    Extract-->>CLI: Return resultJson
                end
            end
        end
        deactivate Response
    end
    deactivate Extract
🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description is a humorous note about the commit branch name, not directly related to the changeset itself. | Consider adding a substantive description of the changes, such as what the TinyFish Web Agent skill does and why these example scripts are valuable. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Setting up the skills.md and example scripts!' directly describes the main changes: a new SKILL.md documentation file and a new extract.py example script. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@skills/tinyfish-web-agent/scripts/extract.py`:
- Around line 16-21: The network call in extract.py (the urllib.request.urlopen
call) needs a timeout and friendly error handling: add a configurable timeout
(e.g., TIMEOUT constant or argparse --timeout) and pass it to
urllib.request.urlopen(..., timeout=timeout), wrap the call in try/except
catching urllib.error.HTTPError, urllib.error.URLError, socket.timeout (and a
fallback Exception), and on each print a concise user-facing message and exit
with a non-zero code instead of letting raw tracebacks propagate; update the
code around the existing urllib.request.urlopen usage to implement this
handling.
- Around line 25-28: Replace the in-function process exit with an exception:
inside extract() (where it checks MINO_API_KEY) raise a descriptive exception
(e.g., EnvironmentError or ValueError) instead of calling sys.exit; then update
the CLI entry point that invokes extract() (the top-level main/if __name__ ==
'__main__' caller) to wrap the extract() call in try/except, print the error to
stderr and call sys.exit(1) on failure so library callers get an exception while
the CLI still exits gracefully.

Comment on lines +16 to +21
import os
import sys
import json
import urllib.request
import argparse

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the extract.py file to see the current state
head -60 skills/tinyfish-web-agent/scripts/extract.py | cat -n

Repository: tinyfish-io/skills

Length of output: 2172


🏁 Script executed:

#!/bin/bash
# Read the rest of the extract.py file to see complete API call code
sed -n '55,75p' skills/tinyfish-web-agent/scripts/extract.py | cat -n

Repository: tinyfish-io/skills

Length of output: 1213


Add timeout + HTTP/URL error handling for the API call.

The blocking network call at line 55 lacks both a timeout parameter and exception handling. Without a timeout, this call can hang indefinitely, and unhandled HTTP/URL errors will crash the script with raw tracebacks instead of graceful error messages.

🛠️ Suggested fix (timeout + friendly errors)
-import urllib.request
+import urllib.request
+import urllib.error
@@
-    with urllib.request.urlopen(req) as response:
-        for line in response:
-            line_str = line.decode("utf-8").strip()
-            if line_str.startswith("data: "):
-                event = json.loads(line_str[6:])
+    try:
+        with urllib.request.urlopen(req, timeout=30) as response:
+            for line in response:
+                line_str = line.decode("utf-8").strip()
+                if line_str.startswith("data: "):
+                    event = json.loads(line_str[6:])
@@
-                if event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
-                    print(json.dumps(event["resultJson"], indent=2))
-                    return event["resultJson"]
+                    if event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
+                        print(json.dumps(event["resultJson"], indent=2))
+                        return event["resultJson"]
+    except urllib.error.HTTPError as e:
+        raise RuntimeError(f"MINO API HTTP {e.code}: {e.reason}") from e
+    except urllib.error.URLError as e:
+        raise RuntimeError(f"MINO API connection error: {e.reason}") from e
🤖 Prompt for AI Agents
In `@skills/tinyfish-web-agent/scripts/extract.py` around lines 16 - 21, The
network call in extract.py (the urllib.request.urlopen call) needs a timeout and
friendly error handling: add a configurable timeout (e.g., TIMEOUT constant or
argparse --timeout) and pass it to urllib.request.urlopen(..., timeout=timeout),
wrap the call in try/except catching urllib.error.HTTPError,
urllib.error.URLError, socket.timeout (and a fallback Exception), and on each
print a concise user-facing message and exit with a non-zero code instead of
letting raw tracebacks propagate; update the code around the existing
urllib.request.urlopen usage to implement this handling.
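
One way to cover the cases this prompt lists, including `socket.timeout` and a configurable timeout, is sketched below. It is an assumption-laden illustration rather than the merged code: `stream_lines`, `DEFAULT_TIMEOUT`, and the injectable `opener` parameter are hypothetical names added so the error paths can be exercised without a network. It raises `RuntimeError` instead of exiting, in line with the other comment in this review, so the CLI entry point would still be responsible for printing the message and setting the exit code.

```python
import socket
import urllib.error
import urllib.request

DEFAULT_TIMEOUT = 30.0  # assumed default in seconds; not taken from extract.py


def stream_lines(req, timeout=DEFAULT_TIMEOUT, opener=urllib.request.urlopen):
    """Yield decoded response lines, turning network failures into RuntimeError.

    `opener` is a hypothetical injection point so tests can pass a fake.
    """
    try:
        with opener(req, timeout=timeout) as response:
            for raw in response:
                yield raw.decode("utf-8").strip()
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it must be caught first.
        raise RuntimeError(f"MINO API HTTP {exc.code}: {exc.reason}") from exc
    except urllib.error.URLError as exc:
        raise RuntimeError(f"MINO API connection error: {exc.reason}") from exc
    except socket.timeout as exc:
        # A read timeout while streaming is raised directly, not wrapped.
        raise RuntimeError(f"MINO API timed out after {timeout}s") from exc
```

Because the function is a generator, nothing is opened until the caller starts iterating, and a timeout that occurs midway through the stream is reported the same way as one that occurs while connecting.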

Comment on lines +25 to +28
    api_key = os.environ.get("MINO_API_KEY")
    if not api_key:
        print("Error: MINO_API_KEY environment variable not set", file=sys.stderr)
        sys.exit(1)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "extract.py" | grep -E "tinyfish-web-agent"

Repository: tinyfish-io/skills

Length of output: 108


🏁 Script executed:

cat -n skills/tinyfish-web-agent/scripts/extract.py | head -100

Repository: tinyfish-io/skills

Length of output: 3254


Avoid sys.exit inside the reusable extract() function.

extract() is a library function that should raise exceptions rather than exit the process. Calling sys.exit() here (line 28) means any code importing and calling this function will terminate unexpectedly when the environment variable is missing. Raise an exception instead and handle it in the CLI entry point.

Also, the CLI entry point (line 79) calls extract() without catching potential exceptions. Add error handling there to catch the exception and exit gracefully.

🛠️ Suggested fix
 def extract(url, goal, stealth=False, proxy_country=None):
     """Extract/scrape data from a website using TinyFish"""
     api_key = os.environ.get("MINO_API_KEY")
     if not api_key:
-        print("Error: MINO_API_KEY environment variable not set", file=sys.stderr)
-        sys.exit(1)
+        raise RuntimeError("MINO_API_KEY environment variable not set")
@@
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="TinyFish web extract/scrape tool")
     parser.add_argument("url", help="URL to extract/scrape from")
     parser.add_argument("goal", help="What to extract (natural language)")
     parser.add_argument("--stealth", action="store_true", help="Use stealth mode")
     parser.add_argument("--proxy", help="Proxy country code (e.g., US, UK, DE)")
 
     args = parser.parse_args()
-    extract(args.url, args.goal, args.stealth, args.proxy)
+    try:
+        extract(args.url, args.goal, args.stealth, args.proxy)
+    except RuntimeError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        sys.exit(1)
🤖 Prompt for AI Agents
In `@skills/tinyfish-web-agent/scripts/extract.py` around lines 25 - 28, Replace
the in-function process exit with an exception: inside extract() (where it
checks MINO_API_KEY) raise a descriptive exception (e.g., EnvironmentError or
ValueError) instead of calling sys.exit; then update the CLI entry point that
invokes extract() (the top-level main/if __name__ == '__main__' caller) to wrap
the extract() call in try/except, print the error to stderr and call sys.exit(1)
on failure so library callers get an exception while the CLI still exits
gracefully.
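
The split between "library raises" and "CLI exits" can be tried out on its own with a sketch like the one below. The helper names `require_api_key` and `run_cli` are hypothetical and do not appear in extract.py; the environment mapping and the stderr stream are passed in as parameters only so the behavior is easy to check.

```python
import os
import sys


def require_api_key(environ=os.environ):
    """Return MINO_API_KEY, or raise instead of exiting the process."""
    api_key = environ.get("MINO_API_KEY")
    if not api_key:
        raise RuntimeError("MINO_API_KEY environment variable not set")
    return api_key


def run_cli(action, stderr=sys.stderr):
    """Call action(); report a RuntimeError on stderr and return an exit code."""
    try:
        action()
    except RuntimeError as exc:
        print(f"Error: {exc}", file=stderr)
        return 1
    return 0


if __name__ == "__main__":
    # Only the entry point decides to terminate the process.
    sys.exit(run_cli(require_api_key))
```

Code that imports `require_api_key` gets an exception it can handle, while the command-line path still ends with a short message and a non-zero exit status.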

@simantak-dabhade simantak-dabhade merged commit bc1d9b4 into main Jan 28, 2026
3 checks passed
@simantak-dabhade simantak-dabhade deleted the batman branch January 28, 2026 02:25