Update URL handler to extract og:title metadata #245
Merged
thomwiggers merged 11 commits into develop, Apr 14, 2026
thomwiggers commented on Apr 14, 2026
Instagram pages have a generic <title> ("Instagram") but put the full
post description in the og:title meta tag. Add a dedicated
_process_url_instagram handler that detects instagram.com URLs and
extracts og:title instead of falling through to the default <title>
extraction.
Add a fixture HTML file and unit test that verify the og:title is used
and the result includes the actual post content.
https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
Rather than a dedicated Instagram processor, simply update
_extract_title_from_content to check og:title first and fall back to
<title>. This fixes Instagram (and any other site where the <title> is
generic) in one place. Remove the now-redundant _process_url_instagram
method.
Update the old test_too_long_title_text test to assert og:title
preference and add a separate test_too_long_title_truncated for the
truncation logic.
https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
…ntent
Select the title string first (og:title then <title>), then apply the
truncation and curly-quote wrapping once instead of repeating it in
both branches.
https://claude.ai/code/session_01HpfD4oaZHyZS9gePtLqsHC
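The select-then-format flow described in these commits can be sketched roughly as follows. This is not the code from the PR: `extract_title`, `_TitleParser`, the `MAX_TITLE_LENGTH` value, and the exact truncation style are all assumptions for illustration, and the sketch uses only the standard library `html.parser`.

```python
from html.parser import HTMLParser

MAX_TITLE_LENGTH = 200  # hypothetical limit; the PR does not state the real value


class _TitleParser(HTMLParser):
    """Collect the og:title meta content and the <title> text, if present."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title":
            # Keep only the first og:title encountered.
            if self.og_title is None:
                self.og_title = attrs.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def extract_title(content, max_length=MAX_TITLE_LENGTH):
    """Return the page title in curly quotes, or None if no title is found."""
    parser = _TitleParser()
    parser.feed(content)

    # Step 1: choose the title string, og:title first, then <title>.
    title = parser.og_title or parser.title
    if not title or not title.strip():
        return None
    title = title.strip()

    # Step 2: truncate and wrap once, regardless of where the title came from.
    # The ellipsis-based truncation is an assumption, not taken from the PR.
    if len(title) > max_length:
        title = title[: max_length - 1] + "…"
    return "“" + title + "”"
```

With a page whose `<title>` is the generic "Instagram" and whose og:title carries the post text, this sketch returns the og:title value; a page without og:title falls back to `<title>`.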
Replace the hand-crafted instagram-reel-example.html and its mock-based
test with a Betamax cassette recorded against the live Instagram URL.
Set Accept-Encoding: gzip, deflate in the session to avoid zstd, which
Betamax cannot round-trip through JSON (urllib3 advertises zstd but the
cassette serialiser corrupts the binary frame, causing ZstdError on
playback).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
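As a minimal sketch of the header override described above, assuming the handler uses a `requests.Session` (the helper name `make_session` is hypothetical, and the Betamax recorder setup is omitted):

```python
import requests


def make_session():
    """Build a session that never advertises zstd to the server."""
    session = requests.Session()
    # Override the default Accept-Encoding so responses arrive as gzip or
    # deflate, which can be stored in a JSON cassette and replayed.
    session.headers["Accept-Encoding"] = "gzip, deflate"
    return session
```

Setting the header on the session rather than per request means every recorded and replayed request carries the same encoding preference.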
Use split()/join() instead of strip() to collapse all whitespace
(including embedded newlines) in extracted titles.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
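The difference between the two approaches can be illustrated with a small sketch (the helper name `collapse_whitespace` and the sample string are made up for illustration):

```python
def collapse_whitespace(text):
    """Collapse every run of whitespace, including newlines, to one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so join() rebuilds a single-line string.
    return " ".join(text.split())


raw = "  First line\n   second line\t end  "

# strip() only trims the ends; the embedded newline and tab survive.
stripped = raw.strip()

# split()/join() normalises the interior whitespace as well.
collapsed = collapse_whitespace(raw)
```

Here `stripped` still contains the newline between the two lines, while `collapsed` is a single line with single spaces.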
9d94ca2 to de879a0
Summary
og:title over <title> for all URLs (fixes Instagram reels showing generic page title)
_extract_title_from_content
Test changes