Setting up the skills.md and example scripts! #1
Conversation
📝 Walkthrough

A new TinyFish Web Agent skill is introduced through two files. The SKILL.md documentation file provides comprehensive guidance on using the skill, including environment requirements, best practices, request/response structures, and Python code examples for interacting with the MINO API via Server-Sent Events (SSE). The extract.py script implements a CLI tool exposing an extract() function.

Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI User
    participant Extract as extract() Function
    participant MINO as MINO API<br/>(mino.ai/v1/automation/run-sse)
    participant Response as Response Stream<br/>(SSE Events)
    CLI->>Extract: extract(url, goal, stealth?, proxy?)
    activate Extract
    Extract->>Extract: Validate MINO_API_KEY<br/>from environment
    alt API Key Missing
        Extract->>CLI: Error: Print error & exit
    else API Key Present
        Extract->>Extract: Build request payload<br/>(url, goal, browser_profile,<br/>proxy_config)
        Extract->>MINO: POST with X-API-Key header
        activate MINO
        MINO-->>Response: Stream SSE events
        deactivate MINO
        activate Response
        Extract->>CLI: Log "Extracting from {url}..."
        loop Parse Event Stream
            Response-->>Extract: SSE line (data: {...})
            alt Event Type = "STATUS_UPDATE"
                Extract->>CLI: Log to stderr
            else Event Type = "COMPLETE"
                alt Status = "COMPLETED"
                    Extract->>Extract: Extract resultJson
                    Extract->>CLI: Print resultJson to stdout
                    Extract-->>CLI: Return resultJson
                end
            end
        end
        deactivate Response
    end
    deactivate Extract
```
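For orientation, here is a minimal Python sketch of the flow the diagram describes: POST the goal to the SSE endpoint with an `X-API-Key` header, then read `data:` lines from the stream until a `COMPLETE` event arrives. The endpoint path, payload keys, and event fields (`type`, `status`, `resultJson`) come from the walkthrough and diagram above; the `https://` scheme, the timeout value, and the omission of `browser_profile`/`proxy_config` are assumptions rather than confirmed API details.

```python
import json
import os
import urllib.request

# Sketch only -- endpoint path and field names follow the diagram above;
# the https:// scheme and exact payload shape are assumptions.
MINO_ENDPOINT = "https://mino.ai/v1/automation/run-sse"


def run_extraction(url: str, goal: str, timeout: int = 60):
    api_key = os.environ["MINO_API_KEY"]  # required, per the skill docs
    payload = json.dumps({"url": url, "goal": goal}).encode("utf-8")
    req = urllib.request.Request(
        MINO_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as response:
        for raw in response:  # SSE stream: one "data: {...}" line per event
            line = raw.decode("utf-8").strip()
            if not line.startswith("data: "):
                continue
            event = json.loads(line[len("data: "):])
            if event.get("type") == "STATUS_UPDATE":
                print(event, flush=True)  # progress goes to the log
            elif event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
                return event["resultJson"]  # final structured result
    return None
```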
🚥 Pre-merge checks (✅ 1 | ❌ 1)
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@skills/tinyfish-web-agent/scripts/extract.py`:
- Around line 16-21: The network call in extract.py (the urllib.request.urlopen
call) needs a timeout and friendly error handling: add a configurable timeout
(e.g., TIMEOUT constant or argparse --timeout) and pass it to
urllib.request.urlopen(..., timeout=timeout), wrap the call in try/except
catching urllib.error.HTTPError, urllib.error.URLError, socket.timeout (and a
fallback Exception), and on each print a concise user-facing message and exit
with a non-zero code instead of letting raw tracebacks propagate; update the
code around the existing urllib.request.urlopen usage to implement this
handling.
- Around line 25-28: Replace the in-function process exit with an exception:
inside extract() (where it checks MINO_API_KEY) raise a descriptive exception
(e.g., EnvironmentError or ValueError) instead of calling sys.exit; then update
the CLI entry point that invokes extract() (the top-level main/if __name__ ==
'__main__' caller) to wrap the extract() call in try/except, print the error to
stderr and call sys.exit(1) on failure so library callers get an exception while
the CLI still exits gracefully.
```python
import os
import sys
import json
import urllib.request
import argparse
```
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Read the extract.py file to see the current state
head -60 skills/tinyfish-web-agent/scripts/extract.py | cat -n
```

Repository: tinyfish-io/skills
Length of output: 2172

🏁 Script executed:

```bash
#!/bin/bash
# Read the rest of the extract.py file to see complete API call code
sed -n '55,75p' skills/tinyfish-web-agent/scripts/extract.py | cat -n
```

Repository: tinyfish-io/skills
Length of output: 1213
Add timeout + HTTP/URL error handling for the API call.
The blocking network call at line 55 lacks both a timeout parameter and exception handling. Without a timeout, this call can hang indefinitely, and unhandled HTTP/URL errors will crash the script with raw tracebacks instead of graceful error messages.
🛠️ Suggested fix (timeout + friendly errors)

```diff
-import urllib.request
+import urllib.request
+import urllib.error
@@
-    with urllib.request.urlopen(req) as response:
-        for line in response:
-            line_str = line.decode("utf-8").strip()
-            if line_str.startswith("data: "):
-                event = json.loads(line_str[6:])
+    try:
+        with urllib.request.urlopen(req, timeout=30) as response:
+            for line in response:
+                line_str = line.decode("utf-8").strip()
+                if line_str.startswith("data: "):
+                    event = json.loads(line_str[6:])
@@
-                if event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
-                    print(json.dumps(event["resultJson"], indent=2))
-                    return event["resultJson"]
+                    if event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
+                        print(json.dumps(event["resultJson"], indent=2))
+                        return event["resultJson"]
+    except urllib.error.HTTPError as e:
+        raise RuntimeError(f"MINO API HTTP {e.code}: {e.reason}") from e
+    except urllib.error.URLError as e:
+        raise RuntimeError(f"MINO API connection error: {e.reason}") from e
```
```python
    api_key = os.environ.get("MINO_API_KEY")
    if not api_key:
        print("Error: MINO_API_KEY environment variable not set", file=sys.stderr)
        sys.exit(1)
```
🧩 Analysis chain

🏁 Script executed:

```bash
find . -name "extract.py" | grep -E "tinyfish-web-agent"
```

Repository: tinyfish-io/skills
Length of output: 108

🏁 Script executed:

```bash
cat -n skills/tinyfish-web-agent/scripts/extract.py | head -100
```

Repository: tinyfish-io/skills
Length of output: 3254
Avoid sys.exit inside the reusable extract() function.
extract() is a library function that should raise exceptions rather than exit the process. Calling sys.exit() here (line 28) means any code importing and calling this function will terminate unexpectedly when the environment variable is missing. Raise an exception instead and handle it in the CLI entry point.
Also, the CLI entry point (line 79) calls extract() without catching potential exceptions. Add error handling there to catch the exception and exit gracefully.
🛠️ Suggested fix

```diff
 def extract(url, goal, stealth=False, proxy_country=None):
     """Extract/scrape data from a website using TinyFish"""
     api_key = os.environ.get("MINO_API_KEY")
     if not api_key:
-        print("Error: MINO_API_KEY environment variable not set", file=sys.stderr)
-        sys.exit(1)
+        raise RuntimeError("MINO_API_KEY environment variable not set")
@@
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description="TinyFish web extract/scrape tool")
     parser.add_argument("url", help="URL to extract/scrape from")
     parser.add_argument("goal", help="What to extract (natural language)")
     parser.add_argument("--stealth", action="store_true", help="Use stealth mode")
     parser.add_argument("--proxy", help="Proxy country code (e.g., US, UK, DE)")
     args = parser.parse_args()
-    extract(args.url, args.goal, args.stealth, args.proxy)
+    try:
+        extract(args.url, args.goal, args.stealth, args.proxy)
+    except RuntimeError as exc:
+        print(f"Error: {exc}", file=sys.stderr)
+        sys.exit(1)
```
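With that change, code that imports the module gets an exception it can handle instead of a process exit. A brief usage sketch (the import path `extract` is an assumption about how the script would be packaged):

```python
# Hypothetical library-style usage after the fix; the module name "extract"
# assumes the script is importable as-is.
from extract import extract

try:
    result = extract("https://example.com", "list all product names")
except RuntimeError as exc:
    # e.g. MINO_API_KEY missing or (with the other fix) an API/network failure
    print(f"extraction failed: {exc}")
    result = None
```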
bonus points to anyone who figures out why this commit is called "batman"
hint: Batman has no parents; git stores commits in a directed graph, and the first commit has no parents. Boom. (I realize this was more than just a hint.)