Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Use lazy quantifier in regex for sanitizing dates in js_to_json #8295

Merged
merged 1 commit into from Oct 7, 2023

Conversation

awalgarg
Copy link
Contributor

@awalgarg awalgarg commented Oct 6, 2023

Given the following JavaScript code as string:

[new Date("foobar"), '("baz")']

The greedy quantifier's usage matches the string till the last quote instead of just the text inside the parenthesis. This patch fixes that using the STRING_RE regex for proper string literal matching.

Also updated tests.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Core bug fix/improvement
Copilot Summary

馃 Generated by Copilot at c3dadad

Summary

馃悰馃晵馃幀

Fix js_to_json bug with single-quoted strings in new Date arguments. This improves YouTube metadata extraction.

js_to_json fixed
Single quotes now accepted
Winter of errors

Walkthrough

  • Fix bug in js_to_json function that caused parsing errors for single-quoted strings in new Date constructor (link)

@bashonly bashonly added the bug Bug that is not site-specific label Oct 6, 2023
@bashonly bashonly self-requested a review October 6, 2023 17:28
@coletdjnz
Copy link
Member

Also update/add tests if feasible (see test/test_utils.py - possibly test_js_to_json_edgecases).

@Grub4K
Copy link
Member

Grub4K commented Oct 7, 2023

We can instead use proper greedy string matching by doing this:

-        code = re.sub(r'new Date\((".+")\)', r'\g<1>', code)
+        code = re.sub(rf'new Date\(({STRING_RE})\)', r'\g<1>', code)

It passes the test you provided:

        on = js_to_json('[new Date("spam"), \'("eggs")\']')
        self.assertEqual(json.loads(on), ['spam', '("eggs")'], msg='Date regex should match a single string')

Given the following JavaScript code as string:

    [new Date("foobar"), '("baz")']

The greedy quantifier's usage matches the string till the last quote instead of just the text inside the parenthesis. This patch fixes that using the `STRING_RE` regex for matching string literals properly.

Also updated tests.
@awalgarg
Copy link
Contributor Author

awalgarg commented Oct 7, 2023

Thanks both, updated the PR. Using STRING_RE for matching string literals and updated tests.

Copy link
Member

@coletdjnz coletdjnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@bashonly bashonly added the pending-review PR needs a review label Oct 7, 2023
@Grub4K Grub4K removed the pending-review PR needs a review label Oct 7, 2023
@Grub4K Grub4K self-assigned this Oct 7, 2023
@Grub4K Grub4K merged commit 9d7ded6 into yt-dlp:master Oct 7, 2023
16 checks passed
@awalgarg awalgarg deleted the patch-1 branch October 8, 2023 07:20
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug that is not site-specific
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants