YouTube video description breaks JSON parsing #19228

Open
ealgase opened this issue Feb 15, 2019 · 2 comments

Comments

Contributor

@ealgase ealgase commented Feb 15, 2019

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2019.02.08. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2019.02.08

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

(bionic)ealgase@localhost:~/USB/YouTubearchives$ youtube-dl -v Od39xwKZ1d0
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'Od39xwKZ1d0']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.30.1
[debug] Python version 3.6.7 (CPython) - Linux-3.14.0-x86_64-with-Ubuntu-18.04-bionic
[debug] exe versions: ffmpeg 3.4.4, ffprobe 3.4.4, phantomjs 2.1.1
[debug] Proxy map: {}
[youtube] Od39xwKZ1d0: Downloading webpage
WARNING: [youtube] Od39xwKZ1d0: Failed to parse JSON Invalid \escape: line 1 column 98897 (char 98896)
[youtube] Od39xwKZ1d0: Downloading video info webpage
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'https://r6---sn-vgqsknel.googlevideo.com/videoplayback?ei=wwdmXILjFNDFwQHbnb3gDQ&clen=31890124&source=youtube&fvip=6&lmt=1536792839139822&expire=1550212131&pcm2=no&mime=video%2Fwebm&c=WEB&itag=303&key=yt6&ipbits=0&dur=290.683&sparams=aitags%2Cclen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpcm2%2Cpl%2Crequiressl%2Csource%2Cexpire&requiressl=yes&id=o-AB-PnNf2R0KuhCyoy4ch-flN-NO-kue7Ks5xxMDwFmDW&ms=au%2Crdu&mt=1550190404&mv=m&gir=yes&pl=26&initcwndbps=1913750&signature=5B29175C0D3764D174A456AC0F2C2C678B6BE413.2C6BC4AD0D521C9ECC1D50C2B668F6466A5B6E5E&keepalive=yes&ip=2601%3A400%3Ac200%3Ac6ef%3Af18d%3Aa14f%3Ae313%3Abf36&mn=sn-vgqsknel%2Csn-vgqs7nlr&txp=5432332&mm=31%2C29&aitags=133%2C134%2C135%2C136%2C137%2C160%2C242%2C243%2C244%2C247%2C248%2C278%2C298%2C299%2C302%2C303&ratebypass=yes'
[download] Destination: Emojis in Linux Terminal-Od39xwKZ1d0.f303.webm
[download] 100% of 30.41MiB in 00:04
[debug] Invoking downloader on 'https://r6---sn-vgqsknel.googlevideo.com/videoplayback?ei=wwdmXILjFNDFwQHbnb3gDQ&clen=4540185&source=youtube&fvip=6&lmt=1536795394737305&expire=1550212131&pcm2=no&mime=audio%2Fwebm&c=WEB&itag=251&ipbits=0&dur=290.701&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Ckeepalive%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpcm2%2Cpl%2Crequiressl%2Csource%2Cexpire&requiressl=yes&id=o-AB-PnNf2R0KuhCyoy4ch-flN-NO-kue7Ks5xxMDwFmDW&ms=au%2Crdu&mt=1550190404&mv=m&gir=yes&pl=26&initcwndbps=1913750&signature=8DF19FD6D4F0D10E05526B21D1DA3FC9C6950379.DA4177550CDF240A3C2A40AD0303C557548F063C&keepalive=yes&ip=2601%3A400%3Ac200%3Ac6ef%3Af18d%3Aa14f%3Ae313%3Abf36&key=yt6&txp=5411222&mm=31%2C29&mn=sn-vgqsknel%2Csn-vgqs7nlr&ratebypass=yes'
[download] Destination: Emojis in Linux Terminal-Od39xwKZ1d0.f251.webm
[download] 100% of 4.33MiB in 00:00
[ffmpeg] Merging formats into "Emojis in Linux Terminal-Od39xwKZ1d0.webm"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:Emojis in Linux Terminal-Od39xwKZ1d0.f303.webm' -i 'file:Emojis in Linux Terminal-Od39xwKZ1d0.f251.webm' -c copy -map 0:v:0 -map 1:a:0 'file:Emojis in Linux Terminal-Od39xwKZ1d0.temp.webm'
Deleting original file Emojis in Linux Terminal-Od39xwKZ1d0.f303.webm (pass -k to keep)
Deleting original file Emojis in Linux Terminal-Od39xwKZ1d0.f251.webm (pass -k to keep)

Description of your issue, suggested solution and other information

I found the cause of #17940. The video description contains escape characters that break the JSON parsing. This doesn't stop a plain download of the video, but it does cause a failure when the video is deep in the watch history (as shown in #17940).
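
The warning in the log above can be reproduced in isolation. The following is a minimal sketch, assuming the description contains a backslash followed by a character that JSON does not accept as an escape; the payload is a made-up stand-in, not the actual YouTube page data, and `try_parse` is a hypothetical helper.

```python
import json

# Hypothetical payload (assumption): "\x" is not a valid JSON escape sequence.
# The raw string keeps the backslash literally in the text handed to the parser.
bad_payload = r'{"description": "emoji \x41 in a terminal"}'


def try_parse(text):
    """Return (parsed_object, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        return None, str(exc)


result, error = try_parse(bad_payload)
print(error)  # the message starts with "Invalid \escape"
```

The standard library parser rejects the whole document on the first invalid escape, which matches the "Failed to parse JSON" warning shown in the verbose output.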


@NathanJewell NathanJewell commented Mar 18, 2019

I reviewed this issue, verified the given case, and can draw a stronger conclusion about the source.

The JSON parsing in question is _parse_json() in extractor/common.py.

After analyzing the JSON string, I can say the following:

  • The issue is caused by the plaintext Unicode escape sequences in the video description
  • By the time the string is passed to the parsing function, the sequence contains an extra escape character
  • I believe this may be due to an encoding error that is not the fault of youtube-dl

The only resolution I can see would be to check all incoming JSON for erroneous escape sequences, which has performance implications and is unnecessary in the majority of cases.
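
One way to limit that cost would be to sanitize only after a normal parse has already failed, so well-formed JSON pays nothing extra. This is a minimal sketch under that assumption, not youtube-dl's implementation: `escape_invalid_backslashes` and `parse_json_leniently` are hypothetical names, and the repair keeps a stray backslash as a literal character, which may not be what the original author of the description meant.

```python
import json
import re

# Matches either a valid JSON escape (kept as-is) or a lone backslash (to be doubled).
# Valid escapes are listed first, so a "\\" pair is consumed whole and never split.
_ESCAPE_RE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})|\\')


def escape_invalid_backslashes(text):
    """Double every backslash that does not start a valid JSON escape sequence."""
    def fix(match):
        token = match.group(0)
        # A single-character match is a lone backslash; longer matches are valid escapes.
        return token if len(token) > 1 else '\\\\'
    return _ESCAPE_RE.sub(fix, text)


def parse_json_leniently(text):
    """Parse normally first; sanitize and retry only if the strict parse fails."""
    try:
        return json.loads(text)
    except ValueError:
        return json.loads(escape_invalid_backslashes(text))


# Hypothetical input: one invalid escape (\x41) next to several valid ones.
sample = r'{"description": "a \x41 b \\ c \n d \u00e9"}'
print(parse_json_leniently(sample)["description"])
```

If the retry also fails, the second `json.loads` call raises as usual, so malformed input that the sanitizer cannot repair still surfaces as an error.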

I will continue to check out the issue and see if I can confirm the original source of the encoding error.

Contributor Author

@ealgase ealgase commented Mar 18, 2019

Thanks for your help!
