Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--write-description fails on Bandcamp #25056

Open
emphoeller opened this issue Apr 29, 2020 · 2 comments
Open

--write-description fails on Bandcamp #25056

emphoeller opened this issue Apr 29, 2020 · 2 comments

Comments

@emphoeller
Copy link

@emphoeller emphoeller commented Apr 29, 2020

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2020.03.24
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

$ youtube-dl -v --write-description 'https://virt.bandcamp.com/track/hyper-camelot-guest-director-boss-battle'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--write-description', 'https://virt.bandcamp.com/track/hyper-camelot-guest-director-boss-battle']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 3.6.5 (CPython) - Linux-5.5.19-pclos1-x86_64-with-mandrake-2020-PCLinuxOS
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2
[debug] Proxy map: {}
[Bandcamp] hyper-camelot-guest-director-boss-battle: Downloading webpage
[debug] Default format spec: bestvideo+bestaudio/best
WARNING: There's no description to write.
[debug] Invoking downloader on 'https://t4.bcbits.com/stream/e493e1ebbbedc392966bb1bff071780e/mp3-128/804085936?p=0&ts=1588216165&t=2284db76a3a67ef719da6743c43d1a07a2c793a0&token=1588216165_cd699ad85036eea2d3e6f9921d390425419a380e'
[download] Destination: Jake Kaufman - Hyper Camelot (Guest Director Boss Battle)-804085936.mp3
[download] 100% of 2.15MiB in 00:00

Description

Attempting to download a track description from Bandcamp using --write-description results in WARNING: There's no description to write. and no file being written, even though a description is present.
As you can see here, the track used above does have a description.
Is this maybe not supported at all for Bandcamp yet, making this more of a feature request? (In that case, the current warning is giving the user the wrong idea.)

@oxguy3
Copy link

@oxguy3 oxguy3 commented Apr 30, 2020

Was able to replicate this on my machine. It looks like BandcampIE (the extractor for individual tracks on Bandcamp) doesn't include any code for retrieving the description, so this is a missing feature rather than a bug.

Currently, BandcampIE pulls all its metadata out of a JSON object in the page: the trackinfo property of the TralbumData variable. The description of the song can be found in the current property of the TralbumData variable. Actually, the description is made of two parts, current.about and current.lyrics. Given that Bandcamp's website displays these as though they were one continuous description, it seems like it'd be easiest if youtube-dl combined them as well.

Here's some untested code for BandcampIE's _real_extract() method that would pull those two fields out of TralbumData.current, then combine them into one variable:

current = self._parse_json(
    self._search_regex(
        r'current\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
        webpage, 'current', default='{}'), title)
if current:
    about = str_or_none(current.get('about'))
    lyrics = str_or_none(current.get('lyrics'))
    description = "\n\n".join(filter(None, (about, lyrics)))

If only one of the two values is set, then description will just be set to that value. If they're both set, description will be set to a concatenation of both, with two line breaks added between them.

This code is based on the existing track_info code in BandcampIE, and can be dropped in immediately after that code on line 117. You'd also need to add the description to the function's return value on line 204.

You could also add the same functionality to BandcampAlbumIE with similar code. However, albums don't have lyrics, so you would only need the about field.

@emphoeller
Copy link
Author

@emphoeller emphoeller commented Apr 30, 2020

Here’s what I came up with, based on your code:

tralbumdata_current = self._parse_json(
	self._search_regex(
		r'TralbumData\s*=\s*\{.*?current\s*:\s*(\{.*?\})',
		webpage, 'track description', default='{}'), title)
description = None
if tralbumdata_current:
	description = '\r\n\r\n'.join(filter(None, (
		str_or_none(tralbumdata_current.get('about')),
		str_or_none(tralbumdata_current.get('lyrics')))))

My modifications explained:

  • Using variable name tralbumdata_current instead of current as it is a bit more informative.
  • Different regex:
    • Matching prefix TralbumData = { so not just any current in the webpage is matched.
    • Dropped the \[ \] since current is not an array.
    • Escaped the braces, which is more readable IMO as it makes clear that they are being matched literally.
    • Switched from .+? to .*? inside the capturing group in order to match {}.
  • Passing a different string ('track description') to _search_regex – it appears in error messages, and “current” is not very transparent.
  • Inlined the single-use variables about and lyrics.
  • Using CR LF instead of LF in the separator as this seems to be the standard for line breaks in Bandcamp's descriptions and lyrics, avoiding creating a file with inconsistent line breaks.

I have also added 'description': description, to the return dictionary. However, right now I can’t figure out how to actually execute my modified code, as whenever I call python3 ./youtube-dl/youtube_dl/ --write-description 'https://virt.bandcamp.com/track/hyper-camelot-guest-director-boss-battle', the program executes unchangedly, even when I print something, raise an exception, or produce a syntax error. How do I run my code properly?

Also a side question: What happens when about or lyrics contains }? Wouldn’t that have the regex cut off there, which would lead to an incomplete JSON string with an unterminated string literal being parsed, resulting in an error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.