output template support for %(channel_id)s #9676

Closed
glenn-slayden opened this issue Jun 2, 2016 · 18 comments

Comments

@glenn-slayden (Contributor) commented Jun 2, 2016

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2016.06.02. If it's not, read this FAQ entry and update. Issues with outdated versions will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2016.06.02

Before submitting an issue make sure you have:

  • At least skimmed through README and most notably FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The output template should have a way to reference the YouTube channel id, i.e.

%(channel_id)s

Currently, %(uploader_id)s falls back to rendering the channel id if the uploader doesn't have a name, but there's no way to force this behavior if the uploader does have a name.

Note: If this feature is already available somewhere in youtube-dl, then I could not find it in the documentation, so documentation should be added or improved.
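For readers unfamiliar with the mechanism: an output template is a Python %-style format string filled in from a video's metadata fields. The sketch below is a simplification, not youtube-dl's actual code. It assumes that a missing field renders as the placeholder NA (the value reported later in this thread for an unset field), and `render_output_template` and the sample metadata are hypothetical.

```python
import collections


def render_output_template(template, info):
    """Expand a %-style output template from a metadata dict.

    Hypothetical helper: any field missing from `info` renders as 'NA'.
    """
    fields = collections.defaultdict(lambda: 'NA', info)
    return template % fields


# Hypothetical metadata for one video; note there is no 'channel_id' key.
info = {'id': 'BaW_jenozKc', 'ext': 'mp4', 'uploader_id': 'phihag'}

print(render_output_template('%(uploader_id)s/%(id)s.%(ext)s', info))  # phihag/BaW_jenozKc.mp4
print(render_output_template('%(channel_id)s/%(id)s.%(ext)s', info))   # NA/BaW_jenozKc.mp4
```

Under this model, supporting %(channel_id)s would mean having the extractor populate a `channel_id` key in that metadata.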

@Synetech commented Jul 8, 2016

You can use %(uploader)s to force it to use the uploader’s name rather than their ID (it’s the second item listed in the --help).

@glenn-slayden (Contributor, Author) commented Jul 8, 2016

No, this request is for a way to force the YouTube channel id (such as "UCLqxVugv74EIW3VWh2NOa3Q"), when there is also an uploader name.

As noted above, when using %(uploader_id)s, and when the channel also has a user name, the name seems to have precedence over the channel id, and there doesn't appear to be a way to get the id ("UCLqxVugv74EIW3VWh2NOa3Q"). You get the uploader name instead of the id.

@yan12125 added the request label Jul 9, 2016

@Synetech commented Jul 9, 2016

%(uploader_id)s is the channel ID. For example, because Derek had already created Veritasium before, when he re-created it he had to give it the ID 1Veritasium since the original one was already taken, but its shortcut is Veritasium. Likewise, Tom Scott’s channel has his name, its ID is enyay, and its shortcut is tomscottgo.

Where are you getting that other string? Is that an internal Google ID? (And why would you want it anyway?)

@glenn-slayden (Contributor, Author) commented Jul 9, 2016

Ok, but every channel on YouTube also has an 'id' of the form "UCLqxVugv74EIW3VWh2NOa3Q", which is just as permanent as the 'uploader ID' you mention, and can be used interchangeably with it in URLs.

It is easy to find this string in any respective YouTube HTML page, even if originating with the human-readable value. Just search for "channelId" and note the 24-character string that starts with 'UC'.
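The manual lookup described above can be sketched in Python as a plain text search. This is an illustration under assumptions: `find_channel_id` is a hypothetical helper, the one-line HTML sample is made up, and real page markup may differ or change over time.

```python
import string

# Characters allowed in a channel ID (the base64 "URL and filename safe" alphabet).
_ID_CHARS = set(string.ascii_letters + string.digits + '-_')


def find_channel_id(html):
    """Return the first 24-character 'UC...' string after "channelId", or None."""
    key = html.find('channelId')
    if key == -1:
        return None
    start = html.find('UC', key)
    if start == -1:
        return None
    candidate = html[start:start + 24]
    # Accept only a full-length candidate made of URL-safe base64 characters.
    if len(candidate) == 24 and set(candidate) <= _ID_CHARS:
        return candidate
    return None


# Hypothetical page fragment.
page = '<meta itemprop="channelId" content="UCLqxVugv74EIW3VWh2NOa3Q">'
print(find_channel_id(page))  # UCLqxVugv74EIW3VWh2NOa3Q
```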

As for why I want it, the 24-character format which starts with 'UC' is a consistent and predictable format which is identifiably associated with YouTube, for example when it appears as, e.g., a local folder name mixed with other types of strings. That would not be true for the human-readable version of the "uploader ID".

And also, remember that %(uploader_id)s already does exactly what I'm asking for here, but only if there is no human-readable ID available. What doesn't seem helpful to me is the inconsistency in youtube-dl; there's essentially no way to predict whether you'll get a 24-character ID or something else when using %(uploader_id)s. Hence I propose %(channel_id)s, which would always return an 'id' of the form "UCLqxVugv74EIW3VWh2NOa3Q".

@Synetech commented Jul 9, 2016

Ah right, I know what you are referring to now. I get those IDs for the /channel/ links in the subscription section of the menu.

I’m still not sure how it would be useful for downloaded videos. I for one would prefer to just see the uploaders’ names rather than some random string; that would make organization a lot easier, and is just as easy to sort. I debated whether even the named channel ID (e.g. enyay for Tom Scott’s channel) would be of any practical real-world use, and ended up taking it out of my template and using just %(uploader)s.

I’m sure that if you really need it, they could probably add it, but there are probably more useful things that could be added, like the individual timestamp fields for more customizable timestamps.

(Also, the --help docs need to be updated because apparently there are a bunch of fields like %(duration)s that are available which were not added.)

@glenn-slayden (Contributor, Author) commented Jul 10, 2016

It's useful for downloaded videos because you can use the output template to specify folder names in the full path leading to the target filename.

You say that having the name "makes organization a lot easier," but as I noted above, I find exactly the opposite to be true for my application. When the downloaded files are managed by a file management program, it is better to have a unique folder identifier which has a well-understood fixed-width format, and is composed from a very limited ASCII character set (many channel names use e.g. Japanese, combining Unicode characters, etc.)

@glenn-slayden (Contributor, Author) commented Oct 16, 2016

bump?

(apologies if doing so is considered rude; I don't know the culture here)

@glenn-slayden (Contributor, Author) commented Oct 17, 2016

Actually, it looks like this feature request is mostly solved by the new %(playlist_id)s capability which was recently added to the output template. Although the aforementioned keyword in fact returns the 24-character "playlist identifier" (UU...) of the channel, the "channel_id" is trivially obtained by changing the "UU" prefix to "UC."
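Assuming the UU/UC correspondence described here holds (an observation from this thread, not a documented guarantee), the conversion is a simple prefix swap. `channel_id_from_uploads_playlist` and the sample playlist ID are hypothetical.

```python
def channel_id_from_uploads_playlist(playlist_id):
    """Map an uploads playlist ID ('UU...') to a channel ID ('UC...').

    Returns None for anything that does not look like an uploads playlist ID,
    such as the 'NA' placeholder for an unset field.
    """
    if playlist_id and playlist_id.startswith('UU') and len(playlist_id) == 24:
        return 'UC' + playlist_id[2:]
    return None


print(channel_id_from_uploads_playlist('UULqxVugv74EIW3VWh2NOa3Q'))  # UCLqxVugv74EIW3VWh2NOa3Q
print(channel_id_from_uploads_playlist('NA'))                        # None
```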

Unless someone feels like fixing the remaining glitch by adding %(channel_id)s to the template syntax as well (should be easy now, eh?), I guess this issue could be marked resolved.

@glenn-slayden (Contributor, Author) commented Oct 17, 2016

Sorry, looks like I spoke too soon. For operations on individual files, the playlist_id is returning "NA," a value which is not trivially converted to the proposed %(channel_id)s value :-(

youtube_dl --simulate --o "/%(playlist_id)s/%(id)s.%(ext)s" --get-filename http://www.youtube.com/watch?v=BaW_jenozKc

--> \NA\BaW_jenozKc.mp4

Obviously, that's not useful. As noted above, using %(uploader_id)s instead gives...

--> \phihag\BaW_jenozKc.mp4

But this result is unpredictable: depending on the presence of a friendly name, you may or may not get the channel_id. What I continue to hope for of course, is...

--> \UCLqxVugv74EIW3VWh2NOa3Q\BaW_jenozKc.mp4

This one is predictable because the channel_id is always available. Also, unlike the human-readable channel name which the user can change (I believe), the channel_id is fixed over the lifetime of the channel. This last point, which I don't think has been mentioned yet in this thread, is a compelling reason to add proper support for %(channel_id)s.

So anyway, as far as I can tell, it remains impossible to specify this with the current output template options.

@Ectomind1990 commented Nov 17, 2016

I agree with the comments by glenn-slayden, and would like to see a channel_id field made available for the output template.

He explained why the existing uploader_id field seems inconsistent. It is probably worth mentioning that the uploader_id field in the optional JSON file (created if you invoke youtube-dl with --write-info-json) seems to behave the same way as the uploader_id field in the output template. So it will be:

  • either a channel ID like UClYV6hHlupm_S_ObS1W-DYw (a string of length 24, seemingly using the base64 "URL and Filename Safe Alphabet", and always starting with "UC")
  • or, effectively anything else (whatever the uploader has chosen as their username, if they have one?)
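The two cases can be told apart mechanically. The sketch below checks for the 24-character "UC" shape; its final character class [AQgw] is copied from the regex in the patch posted later in this thread, and the reasoning in the comment (the 22 characters after "UC" carrying 128 bits) is an inference, not something YouTube documents. `looks_like_channel_id` is hypothetical.

```python
import re

# 'UC' + 21 base64url characters + one final character from [AQgw].
# Assumption: the 22 characters after 'UC' encode 128 bits (22 * 6 = 132),
# so the last character's low 4 bits are zero, leaving only A, Q, g or w.
CHANNEL_ID_RE = re.compile(r'UC[-_A-Za-z0-9]{21}[AQgw]')


def looks_like_channel_id(value):
    """True if `value` has the shape of a 24-character 'UC...' channel ID."""
    return CHANNEL_ID_RE.fullmatch(value) is not None


print(looks_like_channel_id('UClYV6hHlupm_S_ObS1W-DYw'))  # True
print(looks_like_channel_id('1veritasium'))               # False
```

A check like this can only classify what %(uploader_id)s returned; it cannot recover the channel ID when a username came back instead, which is why a separate field is being requested.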

I guess the code changes needed to add a new channel_id field to the JSON file would be implemented at the same time as making channel_id available in the output template. (But if the developers want, I would be happy to create another issue for this.)

If the developers want people to do some testing of these change(s), I volunteer. I could also help by writing documentation. (I don't think much documentation will be needed, but if I'm wrong I'd be happy to write quite a lot.)

@Ectomind1990 commented Nov 20, 2016

Demo

Actual

For a quick demonstration of the apparent inconsistency of %(uploader_id)s, try these 2 commands
youtube-dl -o 'savedvids/%(uploader_id)s/%(id)s.%(ext)s' https://youtu.be/5THOUSvpCKk
youtube-dl -o 'savedvids/%(uploader_id)s/%(id)s.%(ext)s' https://youtu.be/UM96LYjO06E
And
ls -crR -w 1 savedvids/*/*
Gives
savedvids/1veritasium/5THOUSvpCKk.mkv
savedvids/UCBLGbNIdfHuY2UAA0hQ_ytw/UM96LYjO06E.mp4

That inconsistency in directory naming is why I would like to have a new %(channel_id)s field.

Desired

Here is what I would like to happen, if the feature is implemented:
youtube-dl -o 'savedvids/%(channel_id)s/%(id)s.%(ext)s' https://youtu.be/5THOUSvpCKk
youtube-dl -o 'savedvids/%(channel_id)s/%(id)s.%(ext)s' https://youtu.be/UM96LYjO06E
And
ls -crR -w 1 savedvids/*/*
Would give
savedvids/UCHnyfMqiRRG1u-2MsSQLbXA/5THOUSvpCKk.mkv
savedvids/UCBLGbNIdfHuY2UAA0hQ_ytw/UM96LYjO06E.mp4

My Theory About YouTube "Channels"/"Users"

Every uploader is a "channel", and can be accessed at URLs like these:
https://www.youtube.com/channel/UCHnyfMqiRRG1u-2MsSQLbXA
https://www.youtube.com/channel/UCHnyfMqiRRG1u-2MsSQLbXA/videos
https://www.youtube.com/channel/UCHnyfMqiRRG1u-2MsSQLbXA/about
Some uploaders are also a "user", meaning that the same webpages can also be accessed at URLs like these:
https://www.youtube.com/user/1veritasium
https://www.youtube.com/user/1veritasium/videos
https://www.youtube.com/user/1veritasium/about
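The two URL forms can be distinguished by their first path segment. A minimal sketch, assuming only the /channel/ and /user/ layouts listed above (YouTube has other URL forms that this ignores); `classify_uploader_url` is hypothetical.

```python
from urllib.parse import urlparse


def classify_uploader_url(url):
    """Return ('channel', id) or ('user', name) for the URL forms above, else None."""
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) >= 2 and parts[0] in ('channel', 'user'):
        return parts[0], parts[1]
    return None


print(classify_uploader_url('https://www.youtube.com/channel/UCHnyfMqiRRG1u-2MsSQLbXA/videos'))
# ('channel', 'UCHnyfMqiRRG1u-2MsSQLbXA')
print(classify_uploader_url('https://www.youtube.com/user/1veritasium/about'))
# ('user', '1veritasium')
```

Only the "channel" form yields the fixed-format "UC..." identifier; the "user" form yields whatever name the uploader chose.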

If an uploader is also a "user", then YouTube mostly prefers to give you links to the "user" form of URLs. For example, even this search gives you this result.

The exception is when you are on a video page, the link back to the uploader (just above the Subscribe button) always seems to be the "channel" form of URL. For example, at 5THOUSvpCKk, the link is: "Veritasium".

A Look at Some Example "Channels" That Are Not "Users"

The example I've already given was video UM96LYjO06E - from a random channel with just 2 subscribers, currently called GTMA GTMA.

At the other end of the scale is The Weeknd - Topic - a singer with 1/4 million subscribers.

Here is an Arabic channel: https://www.youtube.com/channel/UCrIy5XTmRPwZPOlilXDWacA
I don't know what youtube-dl does when %(uploader)s is in Arabic, a right-to-left language. But I think it would be difficult for someone like me to deal with. That's perhaps another reason why I'd like to have a %(channel_id)s field.

@glenn-slayden (Contributor, Author) commented Nov 20, 2016

I completely concur with Ectomind1990's comments and I still remain quite eager to have this issue fixed in the manner (s)he has so nicely described.

@glenn-slayden (Contributor, Author) commented Dec 28, 2016

Here is the complete code that fixes this issue, if someone would like to propagate it to master. All changes are in 'extractor/youtube.py'. Line numbers are based on youtube-dl-master as of 12/23/2016, but should be obvious from context. The two changes (listed in reverse order so that the original line numbers stay valid) are as follows:

extractor/youtube.py, INSERT at line 1716:
        'channel_id': video_channel_id,

extractor/youtube.py, INSERT at line 1413:
        # channel_id
        mobj = re.search(r'itemprop="channelId" content="(UC[-_A-Za-z0-9]{21}[AQgw])"', video_webpage)
        if mobj is not None:
            video_channel_id = mobj.group(1)
        else:
            self._downloader.report_warning('unable to extract channelId')
            video_channel_id = None

With this change, you can specify %(channel_id)s in your output template, and you will consistently get the "UC..." identifier for both single item and playlist-oriented downloads.
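To try the patch's extraction step outside youtube-dl, it can be lifted into a standalone function. This sketch keeps the patch's regex unchanged but swaps `self._downloader.report_warning` for Python's `warnings.warn`; `extract_channel_id` and the sample HTML line are hypothetical, and the regex only matches pages that carry the `itemprop="channelId"` meta tag it targets.

```python
import re
import warnings

# Regex copied from the patch: 'UC' + 21 base64url characters + one of [AQgw].
CHANNEL_ID_META_RE = r'itemprop="channelId" content="(UC[-_A-Za-z0-9]{21}[AQgw])"'


def extract_channel_id(video_webpage):
    """Return the channel ID from the page's meta tag, or None with a warning."""
    mobj = re.search(CHANNEL_ID_META_RE, video_webpage)
    if mobj is not None:
        return mobj.group(1)
    warnings.warn('unable to extract channelId')
    return None


# Hypothetical page fragment.
page = '<meta itemprop="channelId" content="UCLqxVugv74EIW3VWh2NOa3Q">'
print(extract_channel_id(page))  # UCLqxVugv74EIW3VWh2NOa3Q
```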

@Ectomind1990 commented Mar 1, 2017

Some people who are interested in this feature request might also be interested in bug #12317 which affected the same area. Could it be worth checking that the proposed new channel_id field doesn't suffer from the same problem as the uploader_id?

@Piokaz commented Oct 13, 2017

I was having the same problem; glad to see I was not alone.

Will this be merged to master? Having "channel_id" is more consistent than "uploader_id".

@glenn-slayden (Contributor, Author) commented Dec 4, 2017

@Piokaz The fix seems to have been rejected, so I am still privately using the mod shown above.

@KronK0321 commented Jul 12, 2018

I'd like to pile onto this one. I have an indexer that pulls YouTube metadata based on the naming of my folder structure.

The channel:
Numberphile [numberphile] is not correctly identified, while
Numberphile [UCoxcjq-8xIDTYp3uz647V5A] is.

@ianc125 commented Aug 18, 2018

Please add this; the channel ID is necessary for Plex plugins to properly scrape metadata from YouTube. See here: https://github.com/ZeroQI/YouTube-Agent.bundle

@dstftw closed this in dd4c449 Sep 14, 2018
7 participants