chat: mitigate unnecessarily risky code in third party oembed html #3115

darighost · 2023-12-09T12:38:23Z

When embedding content, there are two things I wanted to beef up:

The regexes for isTrusted are easy to bypass
Pass embedHtml through DOMPurify

This PR aims to address these two issues.

Perhaps there's still a workaround I've neglected, please do let me know. So easy to miss small details when making this kind of change.

Do embeds still work?

To check that Youtube, Twitter, and Spotify embeds still work, I simply ran Groups with these changes and tried embedding all three.

Appears to work!

Is this exploitable?

Actually, I think so! Specifically, for Twitter embeds. We can bypass the regex by using any domain that has the Twitter URL as a query parameter, ala https://example.text/evil?ignoreMe=https://twitter.com/angelrightsnow/status/1557429719032905728.

This gets us past the regex but then we're hit with this roadblock:

https://github.com/tloncorp/landscape-apps/blob/e052bd0f39221652f39a0ea39f6be58804420e41/ui/src/state/embed.ts#L5

And then

https://github.com/tloncorp/landscape-apps/blob/e052bd0f39221652f39a0ea39f6be58804420e41/ui/src/state/embed.ts#L35

No way around it, we need to serve the malicious content from noembed. But noembed only serves oembed content from specific, supported providers. On the other hand, noembed is kind of old. Maybe some providers have actually died, and their domain is available for sale, all while noembed continues to support them!

We can search for noembed providers who's domains are for-sale by writing a simple Python script that uses noembed's and godaddy's APIs:

import re

domains = set()
r = requests.get('https://noembed.com/providers')

for provider in r.json():
    for pattern in provider["patterns"]:
        m = re.match('http.*www?(.*\\.[a-z]+).*?\\/', pattern)
        if m is not None:
            s = re.sub('^[^a-z]+', '', m.group(1), flags=re.M)
            d = re.sub('[^a-z.]', '', s)
            domains.add(d)

for d in domains:
    api_base_url = 'https://api.godaddy.com/v1/domains/available?domain='
    api_auth = 'sso-key [REDACTED]' # too casual for env vars rn lol
    r = requests.get(f'{api_base_url}{d}', headers={'Authorization': api_auth})

    if r.json().get('available'):
        print('noembed supported provider available for sale:', d)

And running it...

➜  ~/ily/urbit $ python3 search_noembeds_market.py 
noembed supported provider available for sale: utposts.com
noembed supported provider available for sale: nooledge.com

Woo! So here's what an attacker does. First, you buy one of these domains.

Then you set the "html" property to contain a malicious link. The TwitterEmbed component looks for a link that looks like this: <a href=\"https://twitter.com/random_user/status/1733422327390490663?ref_src=twsrc%5Etfw\">

So we make nooledge.com/vid/9001 serve an oembed with a javascript: URL that passes these same checks. This will render as an error, including a link to the broken Tweet, which (when clicked) will launch the malicious JS code.

In the end, we get a URL like nooledge.com/vid/9001?ignoreme=twitter.com/status/etcetc which Groups would misinterpret as a Twitter embed (due to the broken regex), which noembed would support, and which would snag HTML from a domain we can buy and control.

Very convoluted and unrealistic! Still, maybe there's something worse lurking beneath this silly thought exercise. In any case, maybe that justifies adding some additional mitigations.

patosullivan

Makes sense to me! Thanks, Darigo!

mrozanski · 2023-12-14T15:14:07Z

Nice work, great to see a detailed breakdown of what was done and why!

nomad added 2 commits December 9, 2023 05:15

Mitigate risky code in embedded HTML

a465f1e

Add www subdomain for Youtube embeds

a850c53

patosullivan approved these changes Dec 9, 2023

View reviewed changes

mrozanski requested a review from latter-bolden December 13, 2023 18:33

mrozanski merged commit 78c34dc into tloncorp:develop Dec 14, 2023
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chat: mitigate unnecessarily risky code in third party oembed html #3115

chat: mitigate unnecessarily risky code in third party oembed html #3115

darighost commented Dec 9, 2023 •

edited

patosullivan left a comment

mrozanski commented Dec 14, 2023

chat: mitigate unnecessarily risky code in third party oembed html #3115

chat: mitigate unnecessarily risky code in third party oembed html #3115

Conversation

darighost commented Dec 9, 2023 • edited

Do embeds still work?

Is this exploitable?

patosullivan left a comment

Choose a reason for hiding this comment

mrozanski commented Dec 14, 2023

darighost commented Dec 9, 2023 •

edited