Update URL handler to extract og:title metadata by thomwiggers · Pull Request #245 · thomwiggers/onebot

thomwiggers · 2026-04-14T07:07:57Z

Summary

- URL handler now prefers og:title over <title> for all URLs (fixes Instagram reels showing a generic page title)
- Deduplicated truncation/formatting logic into _extract_title_from_content
- Strip internal newlines/whitespace from extracted titles (og:title content can contain embedded newlines)

Test changes

- Replaced the hand-crafted Instagram fixture with a real Betamax cassette
- Documented the Betamax zstd workaround (force gzip to avoid binary corruption in cassettes)
- Added an assertion that the title contains no newlines

Instagram pages have a generic <title> ("Instagram") but put the full post description in the og:title meta tag. Add a dedicated _process_url_instagram handler that detects instagram.com URLs and extracts og:title instead of falling through to the default <title> extraction. Add a fixture HTML file and unit test that verify the og:title is used and the result includes the actual post content.

https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC

Rather than a dedicated Instagram processor, simply update _extract_title_from_content to check og:title first and fall back to <title>. This fixes Instagram (and any other site where the <title> is generic) in one place. Remove the now-redundant _process_url_instagram method. Update the old test_too_long_title_text test to assert og:title preference and add a separate test_too_long_title_truncated for the truncation logic.

https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
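The og:title-first selection described in this commit can be sketched with only the standard library's html.parser. This is a sketch under assumptions, not the code in onebot/plugins/urlinfo.py: the names _TitleParser and pick_title are hypothetical, and the real _extract_title_from_content may parse the page differently.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Hypothetical helper: collects og:title content and <title> text."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.page_title = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and attr_map.get("property") == "og:title":
            # Keep only the first og:title that appears in the document.
            if self.og_title is None:
                self.og_title = attr_map.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            if self.page_title is None:
                self.page_title = "".join(self._title_parts)

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def pick_title(html):
    """Return og:title if present and non-empty, else <title>, else None."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.og_title or parser.page_title
```

Because the preference lives in one function, a page with a generic <title> and a descriptive og:title needs no site-specific handler.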

…ntent

Select the title string first (og:title then <title>), then apply the truncation and curly-quote wrapping once instead of repeating it in both branches.

https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
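To illustrate the "select first, format once" structure, here is a minimal sketch. The function name format_title, the 80-character default limit, and the use of an ellipsis character as the truncation marker are all assumptions made for illustration; the PR does not state the actual values.

```python
OPEN_QUOTE = "\u201c"   # left curly double quote
CLOSE_QUOTE = "\u201d"  # right curly double quote
ELLIPSIS = "\u2026"     # assumed truncation marker


def format_title(og_title, page_title, max_len=80):
    """Pick a title (og:title preferred), then truncate and wrap it once."""
    # Step 1: selection. og:title wins; <title> is the fallback.
    title = og_title or page_title
    if not title:
        return None

    # Step 2: formatting, applied exactly once to whichever title was picked.
    if len(title) > max_len:
        title = title[: max_len - 1] + ELLIPSIS
    return f"{OPEN_QUOTE}{title}{CLOSE_QUOTE}"
```

Keeping the truncation and quoting after the selection means a change to either rule only has to be made in one branch.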

Replace the hand-crafted instagram-reel-example.html and its mock-based test with a Betamax cassette recorded against the live Instagram URL. Set Accept-Encoding: gzip, deflate in the session to avoid zstd, which Betamax cannot round-trip through JSON (urllib3 advertises zstd, but the cassette serialiser corrupts the binary frame, causing ZstdError on playback).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
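The Accept-Encoding workaround amounts to overriding one header on the session before it is handed to the recorder. A minimal sketch, assuming the test uses a requests.Session; the helper name make_recording_session is hypothetical, and the Betamax wiring itself is left out here.

```python
import requests


def make_recording_session():
    """Hypothetical helper: a session that never advertises zstd."""
    session = requests.Session()
    # Override the default Accept-Encoding so the server replies with
    # gzip or deflate, which the commit message reports as safe to
    # record, instead of zstd.
    session.headers["Accept-Encoding"] = "gzip, deflate"
    return session
```

A session built this way would then be wrapped by the cassette recorder as usual, so both recording and playback see only gzip or deflate bodies.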

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Use split()/join() instead of strip() to collapse all whitespace (including embedded newlines) in extracted titles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
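The difference between the two approaches is easy to see on a small input. The helper name collapse_whitespace and the sample string are illustrative only and are not taken from the PR.

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with " " normalises the string.
    return " ".join(text.split())


raw = "  First line\nsecond   line \n"

# strip() only trims the ends; the embedded newline survives.
stripped = raw.strip()

# split()/join() removes the embedded newline and the repeated spaces too.
collapsed = collapse_whitespace(raw)
```

Here stripped still contains "\n", while collapsed is a single line, which is what the new "no newlines in the title" assertion checks for.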

thomwiggers changed the title ~~Add Instagram URL handler to extract og:title metadata~~ Update URL handler to extract og:title metadata Apr 14, 2026

thomwiggers commented Apr 14, 2026

Comment thread onebot/plugins/urlinfo.py Outdated

thomwiggers and others added 11 commits April 14, 2026 14:55

update devcontainer

f740472

test: document Betamax zstd workaround in test_instagram

b88f6e2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

format

44c57d3

expand test

07c9546

fix: strip internal newlines from og:title and page title

c4c8aa7

Use split()/join() instead of strip() to collapse all whitespace (including embedded newlines) in extracted titles. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

format, check

9305660

fix ruff formatting target

de879a0

thomwiggers force-pushed the claude/fix-instagram-previews-rryOY branch from 9d94ca2 to de879a0 Compare April 14, 2026 12:55

thomwiggers enabled auto-merge (squash) April 14, 2026 12:55

thomwiggers merged commit 3da4ddd into develop Apr 14, 2026
13 of 14 checks passed

thomwiggers deleted the claude/fix-instagram-previews-rryOY branch April 14, 2026 12:56

thomwiggers merged 11 commits into develop from claude/fix-instagram-previews-rryOY

2 participants
