Incorrect escaping in utils.js_to_json #24965

Open
LukeLR opened this issue Apr 23, 2020 · 0 comments

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2020.03.24
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Description

Consider a JavaScript dictionary that has a key-value pair like this:

{
link: encodeURI('https://mediathek.hhu.de/watch/6ea779d9-a2c1-4150-8d26-0653832e9d67/'),
}

utils.js_to_json is not able to escape this value correctly. The expected result would be:

{
"link": "encodeURI('https://mediathek.hhu.de/watch/6ea779d9-a2c1-4150-8d26-0653832e9d67/')",
}

However, the actual result is:

"link": "encodeURI"("https://mediathek.hhu.de/watch/7db1e695-b4c2-46a7-9227-1b22d7c7c05f/")

Because the value is not enclosed in quotes, it gets split into two separate parts, and the output is no longer valid JSON. Since the input was valid JavaScript, js_to_json should be able to produce valid JSON from it as well.
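To make the failure concrete, here is a small check of the two outputs quoted above using only the standard `json` module. It does not call youtube-dl. The helper name `is_valid_json` is made up for this illustration. The actual output is wrapped in braces and the trailing comma is dropped from the expected one, so that each string is a complete object.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON (hypothetical helper for this issue)."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Actual output reported above, wrapped in braces (assumption: the rest of
# the object is converted correctly).
actual = ('{"link": "encodeURI"'
          '("https://mediathek.hhu.de/watch/7db1e695-b4c2-46a7-9227-1b22d7c7c05f/")}')

# Expected output from above, without the trailing comma.
expected = ('{"link": "encodeURI('
            "'https://mediathek.hhu.de/watch/6ea779d9-a2c1-4150-8d26-0653832e9d67/'"
            ')"}')

print(is_valid_json(actual))
print(is_valid_json(expected))
```

The parser rejects the first string at the `(` that follows `"encodeURI"`, because only `,` or `}` may follow a value there.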

I think the regex in youtube_dl/utils.py#L4022 could be adapted to parse a single word before a : as a key, and anything after the : up to the next , as a value. This would allow parsing JavaScript like this and similar cases, and might even simplify the regex a bit.
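A rough sketch of that idea, written as a standalone function rather than as a patch to the regex in utils.py. The names `PAIR_RE` and `quote_bare_values` are hypothetical, and this is not how js_to_json works today. It only covers a flat object like the one in this report: a comma inside a string, a nested object, or a value that is already quoted would all be handled incorrectly.

```python
import json
import re

# Hypothetical pattern: a single bare word before ":" is the key, and
# everything after the ":" up to the next "," or "}" is the value.
PAIR_RE = re.compile(
    r"(?P<key>[A-Za-z_$][\w$]*)\s*:\s*(?P<value>[^,}]+?)\s*(?=[,}])"
)


def quote_bare_values(js):
    """Quote bare keys and unquoted values in a flat JS object literal."""

    def quote(match):
        # json.dumps adds the double quotes and any escaping that is needed.
        return "%s: %s" % (json.dumps(match.group("key")),
                           json.dumps(match.group("value")))

    quoted = PAIR_RE.sub(quote, js)
    # Drop a trailing comma before the closing brace, which JSON forbids.
    return re.sub(r",\s*(?=})", "", quoted)


js = """{
link: encodeURI('https://mediathek.hhu.de/watch/6ea779d9-a2c1-4150-8d26-0653832e9d67/'),
}"""

result = quote_bare_values(js)
print(result)
print(json.loads(result)["link"])
```

Note that the `https:` inside the URL is not mistaken for a key here, because the match that starts at `link` already consumes the whole value before the scan continues.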

I have been trying to implement such a fix for the past hour, but I guess I'm not enough of a regex magician for that (although I'll keep looking into it). What do you think?

Any other ideas? How can js_to_json be improved to cover JavaScript like this?
