Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chat: mitigate unnecessarily risky code in third party oembed html #3115

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

darighost
Copy link

@darighost darighost commented Dec 9, 2023

When embedding content, there are two things I wanted to beef up:

  1. The regexes for isTrusted are easy to bypass
  2. Pass embedHtml through DOMPurify

This PR aims to address these two issues.

Perhaps there's still a workaround I've neglected, please do let me know. So easy to miss small details when making this kind of change.

Do embeds still work?

To check that Youtube, Twitter, and Spotify embeds still work, I simply ran Groups with these changes and tried embedding all three.

Screenshot 2023-12-09 at 5 58 07

Appears to work!

Is this exploitable?

Actually, I think so! Specifically, for Twitter embeds. We can bypass the regex by using any domain that has the Twitter URL as a query parameter, ala https://example.text/evil?ignoreMe=https://twitter.com/angelrightsnow/status/1557429719032905728.

This gets us past the regex but then we're hit with this roadblock:

https://github.com/tloncorp/landscape-apps/blob/e052bd0f39221652f39a0ea39f6be58804420e41/ui/src/state/embed.ts#L5

And then

https://github.com/tloncorp/landscape-apps/blob/e052bd0f39221652f39a0ea39f6be58804420e41/ui/src/state/embed.ts#L35

No way around it, we need to serve the malicious content from noembed. But noembed only serves oembed content from specific, supported providers. On the other hand, noembed is kind of old. Maybe some providers have actually died, and their domain is available for sale, all while noembed continues to support them!

We can search for noembed providers who's domains are for-sale by writing a simple Python script that uses noembed's and godaddy's APIs:

import re

domains = set()
r = requests.get('https://noembed.com/providers')

for provider in r.json():
    for pattern in provider["patterns"]:
        m = re.match('http.*www?(.*\\.[a-z]+).*?\\/', pattern)
        if m is not None:
            s = re.sub('^[^a-z]+', '', m.group(1), flags=re.M)
            d = re.sub('[^a-z.]', '', s)
            domains.add(d)

for d in domains:
    api_base_url = 'https://api.godaddy.com/v1/domains/available?domain='
    api_auth = 'sso-key [REDACTED]' # too casual for env vars rn lol
    r = requests.get(f'{api_base_url}{d}', headers={'Authorization': api_auth})

    if r.json().get('available'):
        print('noembed supported provider available for sale:', d)

And running it...

~/ily/urbit $ python3 search_noembeds_market.py 
noembed supported provider available for sale: utposts.com
noembed supported provider available for sale: nooledge.com

Woo! So here's what an attacker does. First, you buy one of these domains.

Screenshot 2023-12-09 at 0 17 30

Then you set the "html" property to contain a malicious link. The TwitterEmbed component looks for a link that looks like this: <a href=\"https://twitter.com/random_user/status/1733422327390490663?ref_src=twsrc%5Etfw\">

So we make nooledge.com/vid/9001 serve an oembed with a javascript: URL that passes these same checks. This will render as an error, including a link to the broken Tweet, which (when clicked) will launch the malicious JS code.

In the end, we get a URL like nooledge.com/vid/9001?ignoreme=twitter.com/status/etcetc which Groups would misinterpret as a Twitter embed (due to the broken regex), which noembed would support, and which would snag HTML from a domain we can buy and control.

Very convoluted and unrealistic! Still, maybe there's something worse lurking beneath this silly thought exercise. In any case, maybe that justifies adding some additional mitigations.

Copy link
Member

@patosullivan patosullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Thanks, Darigo!

@mrozanski
Copy link
Contributor

Nice work, great to see a detailed breakdown of what was done and why!

@mrozanski mrozanski merged commit 78c34dc into tloncorp:develop Dec 14, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants