Skip to content

Update URL handler to extract og:title metadata#245

Merged
thomwiggers merged 11 commits intodevelopfrom
claude/fix-instagram-previews-rryOY
Apr 14, 2026
Merged

Update URL handler to extract og:title metadata#245
thomwiggers merged 11 commits intodevelopfrom
claude/fix-instagram-previews-rryOY

Conversation

@thomwiggers
Copy link
Copy Markdown
Owner

@thomwiggers thomwiggers commented Apr 14, 2026

Summary

  • URL handler now prefers og:title over <title> for all URLs (fixes Instagram reels showing generic page title)
  • Deduplicated truncation/formatting logic into _extract_title_from_content
  • Strip internal newlines/whitespace from extracted titles (og:title content can contain embedded newlines)

Test changes

  • Replaced invented Instagram fixture with real Betamax cassette
  • Documented Betamax zstd workaround (force gzip to avoid binary corruption in cassettes)
  • Added assertion that title contains no newlines

@thomwiggers thomwiggers changed the title Add Instagram URL handler to extract og:title metadata Update URL handler to extract og:title metadata Apr 14, 2026
Comment thread onebot/plugins/urlinfo.py Outdated
thomwiggers and others added 11 commits April 14, 2026 14:55
Instagram pages have a generic <title> ("Instagram") but put the full
post description in the og:title meta tag. Add a dedicated
_process_url_instagram handler that detects instagram.com URLs and
extracts og:title instead of falling through to the default <title>
extraction.

Add a fixture HTML file and unit test that verify the og:title is used
and the result includes the actual post content.

https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
Rather than a dedicated Instagram processor, simply update
_extract_title_from_content to check og:title first and fall back to
<title>. This fixes Instagram (and any other site where the <title> is
generic) in one place.

Remove the now-redundant _process_url_instagram method. Update the old
test_too_long_title_text test to assert og:title preference and add a
separate test_too_long_title_truncated for the truncation logic.

https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
…ntent

Select the title string first (og:title then <title>), then apply the
truncation and curly-quote wrapping once instead of repeating it in
both branches.

https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
Replace the hand-crafted instagram-reel-example.html and its mock-based
test with a Betamax cassette recorded against the live Instagram URL.
Set Accept-Encoding: gzip, deflate in the session to avoid zstd, which
Betamax cannot round-trip through JSON (urllib3 advertises zstd but the
cassette serialiser corrupts the binary frame, causing ZstdError on playback).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use split()/join() instead of strip() to collapse all whitespace
(including embedded newlines) in extracted titles.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thomwiggers thomwiggers force-pushed the claude/fix-instagram-previews-rryOY branch from 9d94ca2 to de879a0 Compare April 14, 2026 12:55
@thomwiggers thomwiggers enabled auto-merge (squash) April 14, 2026 12:55
@thomwiggers thomwiggers merged commit 3da4ddd into develop Apr 14, 2026
13 of 14 checks passed
@thomwiggers thomwiggers deleted the claude/fix-instagram-previews-rryOY branch April 14, 2026 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants