Youtube description info truncates at 160 chars, started today #25937

Closed
subirax opened this issue Jul 9, 2020 · 30 comments

@subirax subirax commented Jul 9, 2020

Checklist

- [x] I'm reporting a broken site support
- [x] I've verified that I'm running youtube-dl version **2020.06.16.1**
- [x] I've checked that all provided URLs are alive and playable in a browser
- [x] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [x] I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--restrict-filenames', '--write-description', '-o', '%(title)s.%(ext)s', '-f', '140', 'https://www.youtube.com/watch?v=WXYRvg8_H_w', '-v']
[debug] Encodings: locale cp1252, fs mbcs, out cp1252, pref cp1252
[debug] youtube-dl version 2020.06.16.1
[debug] Python version 3.4.4 (CPython) - Windows-7-6.1.7600-SP0
[debug] exe versions: ffmpeg git-2020-01-24-e931119, ffprobe git-2020-01-24-e931119
[debug] Proxy map: {}

Description

Youtube has apparently changed the way it passes video descriptions to youtube-dl, causing .description files to contain severely truncated data.

Descriptions used to be correctly formatted and complete, until today. Now they run on a single unformatted line and are truncated at 160 characters, discarding much important information.

I compared the description output of the same YouTube download yesterday and today, using the same command line in a batch file. Yesterday's is correct; today's is truncated. I tried this with three other YouTube videos and got the same problem.

Using the current version, checked with --update.

(Wild guess:  Has youtube changed the way its descriptions use end-of-line or linefeed characters?)

Entire truncated video description as downloaded today by youtube-dl, between dashed lines:
----------------------------------------------------------
Mario Castelnuovo-Tedesco (1895-1968): Antony and Cleopatra, Ouverture op.134 (1947). West Australian Symphony Orchestra diretta da Andrew Penny. Cover image...
----------------------------------------------------------

Correct video description as downloaded a week ago with the same command line:
----------------------------------------------------------
Mario Castelnuovo-Tedesco (1895-1968): Antony and Cleopatra, Ouverture op.134 (1947).

West Australian Symphony Orchestra diretta da Andrew Penny.

Cover image: painting by Harry Tresham.

***

The music published in our channel is exclusively dedicated to divulgation purposes and not commercial. This within a program shared to study classic educational music of the 1900's (mostly Italian) which involves thousands of people around the world. If someone, for any reason, would deem that a video appearing in this channel violates the copyright, please inform us immediately before you submit a claim to Youtube, and it will be our care to remove immediately the video accordingly.
----------------------------------------------------------


Hope this is helpful. Thanks very much!

@bamtan bamtan commented Jul 9, 2020

--write-info-json is affected as well, it seems.

@subirax subirax commented Jul 9, 2020

Strangely, descriptions still download correctly using JDownloader. With its YouTube settings set to download descriptions, the result is the same complete text file as before.

@randombyte-developer randombyte-developer commented Jul 10, 2020

Some things like Artist and Album for music videos rely on \n in the Regex. The new shortened description doesn't have any line breaks, which breaks detecting those.
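
To illustrate why missing line breaks matter, here is a minimal Python sketch. The pattern `ARTIST_RE` and the sample text are hypothetical and are not the expressions youtube-dl actually uses; the sketch only shows how a line-anchored regex stops matching once newlines are collapsed into spaces.

```python
import re

# Hypothetical pattern for illustration only (not youtube-dl's own regex):
# it expects "Artist: ..." to sit on a line of its own.
ARTIST_RE = re.compile(r'(?m)^Artist:\s*(.+)$')


def find_artist(description):
    """Return the artist named on an 'Artist:' line, or None if no such line exists."""
    match = ARTIST_RE.search(description)
    return match.group(1).strip() if match else None


# Made-up description in the old multi-line form.
full = "Some Song\nArtist: Some Band\nAlbum: Some Album"

# The same text with every line break replaced by a space.
flattened = " ".join(full.split("\n"))

print(find_artist(full))
print(find_artist(flattened))
```

In the flattened text "Artist:" no longer starts a line, so the anchored pattern finds nothing.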

@randombyte-developer randombyte-developer commented Jul 10, 2020

I was able to fix this issue at least for my purposes in these two commits: https://github.com/randombyte-developer/youtube-dl/commits?author=randombyte-developer

I am using the description to extract details about music videos like title and artist. Since it uses the video description it should also have fixed this issue.

Here is a build for linux: https://github.com/randombyte-developer/youtube-dl/releases/tag/yt-fixed-description-1

@holygamer holygamer commented Jul 11, 2020

Any idea when the main program will be updated with that fix? I'm in a hurry to download a channel before YouTube deletes it.

Also even after the fix is applied, if in the future the description is truncated again, is there any way to get the program to display a warning letting me know that only a truncated description was downloaded?
Thanks

@GlassedSilver GlassedSilver commented Jul 11, 2020

> Any idea when the main program will be updated with that fix? I'm in a hurry to download a channel before YouTube deletes it.
>
> Also even after the fix is applied, if in the future the description is truncated again, is there any way to get the program to display a warning letting me know that only a truncated description was downloaded?
> Thanks

What you're asking for is if ytdl can implement sanity checks and possibly even probe the issue tracker for new bugs that may have a certain tag assigned like "data integrity" (just a suggestion), right?

I'd fancy that; I'm not sure how much of a sanity-check process is already integrated (if any).
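
As a rough idea of what such a check could look like on the user's side, here is a Python sketch. It is not a youtube-dl feature; `looks_truncated` and `warn_if_truncated` are hypothetical helpers, and the heuristic (single line, at most 160 characters, ending in "...") is only inferred from the symptoms reported in this thread. A short description that genuinely ends in "..." would be flagged too.

```python
import warnings


def looks_truncated(description, limit=160):
    """Heuristic: a single line of at most `limit` characters that ends in '...'."""
    if not description:
        return False
    single_line = "\n" not in description
    return single_line and description.endswith("...") and len(description) <= limit


def warn_if_truncated(description, limit=160):
    """Emit a warning when the description matches the truncation heuristic."""
    if looks_truncated(description, limit):
        warnings.warn("description may be truncated (single line ending in '...')")
        return True
    return False
```

A script could run this on the contents of each `.description` file after a download and log the files that trip the warning.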

@subirax subirax commented Jul 12, 2020

@randombyte-developer:
How does a complete noob like myself implement your fix?
Thanks!

@subirax subirax commented Jul 12, 2020

I figured out what's causing the problem with downloading descriptions.

For
https://www.youtube.com/watch?v=-dac_2af0Ww

Use Chrome Inspect and look for instances of "description." Three of them contain the truncated description, including this line:
<meta name="description" content="Arturo Toscanini [etc]

However, one line begins like this:
{"@context":"https://schema.org","@type":"VideoObject","description":"Arturo Toscanini [etc]

This line contains the complete correct description. The others only contain the truncated version.

I'm speculating that YTDL uses the meta name line above for extracting the description. YouTube probably changed the way the meta name works.

This looks like it'd be a small and easy change for YTDL to implement.
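
A minimal Python sketch of the difference between the two sources described above. This is not the change that was merged into youtube-dl; the regex, the marker string, the helper names, and the sample page fragment are all assumptions based only on the two lines quoted in this comment, and real YouTube markup may differ.

```python
import json
import re
from html import unescape

# Assumed shape of the truncated tag, based on the meta line quoted above.
META_RE = re.compile(r'<meta\s+name="description"\s+content="([^"]*)"')

# Assumed start of the structured-data object, copied from the line quoted above.
JSON_MARKER = '{"@context":"https://schema.org"'


def meta_description(html):
    """Return the (possibly truncated) description from the meta tag, or None."""
    match = META_RE.search(html)
    return unescape(match.group(1)) if match else None


def structured_description(html):
    """Return the description from the schema.org VideoObject, or None."""
    decoder = json.JSONDecoder()
    start = html.find(JSON_MARKER)
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores the markup after it.
            obj, _ = decoder.raw_decode(html, start)
        except ValueError:
            obj = None
        if isinstance(obj, dict) and obj.get("@type") == "VideoObject":
            return obj.get("description")
        start = html.find(JSON_MARKER, start + 1)
    return None


# Made-up page fragment for demonstration only.
SAMPLE_PAGE = (
    '<meta name="description" content="First line. Second li...">'
    '<script>{"@context":"https://schema.org","@type":"VideoObject",'
    '"description":"First line.\\nSecond line.\\nThird line."}</script>'
)

print(repr(meta_description(SAMPLE_PAGE)))
print(repr(structured_description(SAMPLE_PAGE)))
```

The JSON route also keeps the line breaks, because they are stored as `\n` escapes inside the JSON string.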

How does one suggest this to their programmers?

(Thanks to all who've answered!)

@libjared libjared commented Jul 13, 2020

  • Empty descriptions are being replaced with the string "Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube."
  • Ideographic space U+3000 is being replaced with a regular space.
  • One or more \n is replaced with a space.

I'm sure there are a few other regressions; these are the ones I found.
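
The first regression can be worked around after the fact, since the placeholder text is fixed. A small Python sketch, assuming the placeholder is exactly the string quoted above; `strip_placeholder` is a hypothetical helper, not part of youtube-dl. The other two regressions lose information (which spaces were U+3000 or line breaks), so they cannot be undone this way.

```python
# The generic text quoted above, which appears where the description is empty.
GENERIC_DESCRIPTION = (
    "Enjoy the videos and music you love, upload original content, "
    "and share it all with friends, family, and the world on YouTube."
)


def strip_placeholder(description):
    """Map the generic placeholder back to an empty description."""
    if description is None:
        return ""
    if description.strip() == GENERIC_DESCRIPTION:
        return ""
    return description
```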

@libjared libjared commented Jul 13, 2020

  • The original-language description is used instead of the translated one.
    • Example (https://www.youtube.com/watch?v=6l2wdyDwGcA): a month ago, ytdl downloaded the English description that displays on the web, and now ytdl grabs the original Japanese language version. This doesn't seem to rely on a session cookie. Is it GeoIP? Was there ever a command-line switch to select between title/description languages?

@GlassedSilver GlassedSilver commented Jul 14, 2020

> * The original-language description is used instead of the translated one.
>   * [example](https://www.youtube.com/watch?v=6l2wdyDwGcA). A month ago, ytdl downloaded the English description that displays on the web, and now ytdl grabs the original Japanese language version. This doesn't seem to rely on a session cookie. Is it GeoIP? Was there ever a command-line switch to select between title/description languages?

I don't think so, no, but I may be wrong and have missed an important change like the addition of an arg.

If anything, I would love to store both the original description as well as any set of possible translations that may be available. (same for title, tags, whatever...)

If anyone knows whether there's any progress on that front that I missed, please kindly inform me, because it's quite a concern of mine.

@subirax subirax commented Jul 15, 2020

I don't understand why this issue has been closed. The issue remains exactly as it was when I first reported it.

This command including write-description:

youtube-dl --restrict-filenames --write-description -o "%%(title)s.%%(ext)s" -f 140 https://www.youtube.com/watch?v=-dac_2af0Ww

...still truncates descriptions at 160 characters. This problem has not been fixed.

See my original report:

#25937 (comment)

Will this issue be re-opened, or should I start a new thread?

Thanks!

@michealespinola michealespinola commented Jul 15, 2020

> I don't understand why this issue has been closed. The issue remains exactly as it was when I first reported it.

Forgive me if I'm mistaken, but it's closed because code has been merged that corrects the issue. This doesn't "fix" it for your Windows compiled executable until the next version is published/released. You still have to wait for the next version to be released, and update to it.

@joeschmoe40 joeschmoe40 commented Jul 15, 2020

Kind of a strange definition of closed.

Most people assume that "closed" means the same thing as "solved" and it is not "solved" until the solution is in people's hands. That's not (yet) the case here.

@johan456789 johan456789 commented Jul 15, 2020

@joeschmoe40 That's not how it works. The issue tracker is for tracking issues in current development code, not for tracking problems in users' production code.

But I compiled from source and ran subirax's example, and the description is still truncated. @michealespinola @remitamine Can you confirm this? If so, we should reopen this.

@libjared libjared commented Jul 15, 2020

I'm running from the master branch, which is at commit a115e07 at the time of writing. When I run their example, I get the full description, starting from "Arturo" and ending in "accordingly". So no, I cannot reproduce this anymore, and the bug is fixed. If you're compiling from source, please ensure you've run git pull. The description has a few other minor defects we're covering in #26006, but truncation is not one of them. This should stay closed, unless someone has a different example?

@johan456789 johan456789 commented Jul 15, 2020

You are correct. It's fixed. It turns out it was a PATH issue and I was still running the old version. Sorry for that.
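
For anyone who wants to rule out the same mix-up, a short Python sketch that reports which executable the PATH resolves to and what version it prints. `resolve_and_version` is a hypothetical helper; it relies only on the standard library and on youtube-dl's `--version` option.

```python
import shutil
import subprocess


def resolve_and_version(command="youtube-dl"):
    """Return (path, version output) for the first match on PATH, or None if not found."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return path, result.stdout.strip()


print(resolve_and_version())
```

If the reported path is not the copy that was just built, an older install earlier in the PATH is the one being run.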

@subirax subirax commented Jul 15, 2020

Thanks to michealespinola, johan456789 and libjared for clarifying this! Much appreciated!

@xperia64 xperia64 commented Jul 18, 2020

I can confirm that master with the above commits merged fixes the description truncation issue reported here, but it does NOT fix age-gated videos.

Please reopen and unlock #25945

(I'm not a fan of locking out issues entirely like the above until it is 100% confirmed that they are the same issue, because now I either have to write about it here and hope it gets noticed, or open a new one which would probably get locked again for being a duplicate)

@michealespinola michealespinola commented Jul 19, 2020

> issue reported here is fixed, but it does NOT fix age gated videos

While age-gating occurred at the same time, it is not the same issue. (are you certain #25945 is the report you meant to reference - and not #25954?)

Like on other websites, using cookies in association with your Google user account and settings will allow you to access the video.

@xperia64 xperia64 commented Jul 19, 2020

> While age-gating occurred at the same time, it is not the same issue. (are you certain #25945 is the report you meant to reference - and not #25954?)
>
> Like on other websites, using cookies in association with your Google user account and settings will allow you to access the video.

I agree it is not the same issue. I linked #25945 because I believe it was first confirmed by the bottom comment in that issue, then later posted in its own issue, #25954. Of course, both are locked for discussion now because they were assumed to be this issue.

youtube-dl has successfully bypassed age-gating sans cookies before, but if it can no longer do so, the resolution to #25954 is a documentation update.

@joeschmoe40 joeschmoe40 commented Jul 20, 2020

Note: The last (and, therefore, "current") version is 2020.06.16.1, which is over a month old.

Is there any indication when a new version will be out, incorporating this fix (the problem of the description being truncated)?

Given that we used to get new versions like every few days or so, this seems long overdue.

@randombyte-developer randombyte-developer commented Jul 21, 2020

@joeschmoe40 Actually it's really not hard to build the project yourself. But yes, I agree that an official build would be best.

@michealespinola michealespinola commented Jul 21, 2020

I originally wrote this as a reply to #26053, but it may be useful to readers here as well:

Sergey - who has been handling releases for as long as I have been using youtube-dl - has been inactive for 10+ days now. I don't otherwise know what the dev/mod hierarchy is for this project. But given the current pandemic, I can only hope that he and his loved ones are well and in good health.

@randombyte-developer randombyte-developer commented Jul 23, 2020

@ebrawer I think it's just a make. If any errors occur, dependencies are probably missing. Google those errors and install those dependencies. Then try building it again.

@cdmackay cdmackay commented Jul 24, 2020

After a successful make, do a make install.

If you want to make it obvious that you're running a different version, edit youtube_dl/version.py, and increment the final digit. Then repeat the above.
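
The increment itself is simple enough to do by hand, but as a sketch in Python, assuming a dotted version string such as the 2020.06.16.1 reported in this thread (`bump_last_component` is a hypothetical helper and does not touch version.py):

```python
def bump_last_component(version):
    """Increment the final dot-separated number, keeping any zero padding."""
    parts = version.split(".")
    last = parts[-1]
    parts[-1] = str(int(last) + 1).zfill(len(last))
    return ".".join(parts)


print(bump_last_component("2020.06.16.1"))
```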

@joeschmoe40 joeschmoe40 commented Jul 25, 2020

Just to be clear, I don't need this immediately, so I'm willing to wait. I just need to know that a solution will come out eventually. Not interested in creating more confusion by doing my own "make"s.

To be clear, I need both of the following:

  1. To know that a solution is in the pipeline - that it will come eventually.
  2. For it to come via the normal channels. No out-of-band stuff.

That's all.

@samutamm samutamm commented Jul 27, 2020

I was able to build the project using make install, and the truncated description bug seems to be gone when running youtube-dl in the terminal. But I am using the embedded youtube-dl inside Python scripts. How could I build it into a Python executable and have this fix in my Python script?

@johan456789 johan456789 commented Jul 27, 2020

> How could I build it to python executable and have this fix in my python script?

I simply changed the package's Python source code.

  1. find out where the package folder is

    $ pip show youtube-dl
    WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
    Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
    To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
    Name: youtube-dl
    Version: 2020.6.16.1
    Summary: YouTube video downloader
    Home-page: https://github.com/ytdl-org/youtube-dl
    Author: Ricardo Garcia
    Author-email: ytdl@yt-dl.org
    License: Unlicense
    Location: /usr/local/Caskroom/miniconda/base/envs/selenium/lib/python3.8/site-packages
    Requires: 
    Required-by: 

    So it's at Location: /usr/local/Caskroom/miniconda/base/envs/selenium/lib/python3.8/site-packages. Your location may differ from mine; it depends on your virtual environment and Python version.

  2. Edit or replace /usr/local/Caskroom/miniconda/base/envs/selenium/lib/python3.8/site-packages/youtube_dl/extractor/youtube.py to match the pull request edits

    It's just one line of code.
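
As an alternative to reading the pip output, the file to edit can be located from Python itself. A minimal sketch using the standard library; `locate_module_source` is a hypothetical helper, and note that looking up a submodule imports its parent package as a side effect.

```python
import importlib.util


def locate_module_source(dotted_name):
    """Return the file path of an importable module, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ImportError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    if spec is None:
        return None
    return spec.origin


# Prints the path of the extractor file in the active environment,
# or None if youtube-dl is not installed there.
print(locate_module_source("youtube_dl.extractor.youtube"))
```

Run it with the same interpreter your script uses, so that the path points into the right virtual environment.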

alexmerkel added a commit to alexmerkel/ytarchiver that referenced this issue Aug 7, 2020
This is necessary to workaround the extraction bug currently present
in youtube-dl (ytdl-org/youtube-dl#25937)

@medbenchohra medbenchohra commented Sep 18, 2020

> > While age-gating occurred at the same time, it is not the same issue. (are you certain #25945 is the report you meant to reference - and not #25954?)
> > Like on other websites, using cookies in association with your Google user account and settings will allow you to access the video.
>
> I agree it is not the same issue. I linked #25945 because I believe it was first confirmed by the bottom comment in that issue, then later posted in its own issue in #25954. Of course, both are locked for discussion now because they were assumed to be this issue.
>
> youtube-dl has successfully bypassed age-gating sans cookies before, but if it can no longer do so, the resolution to #25954 is a documentation update.

Downgrading to 2020.6.6 solves the issue. Other issues may pop up (the ones fixed after 2020.6.6).
