[extractor/common] _parse_mpd_formats fails on parsing AdaptationSet with contentType="text" and mimeType="application/mp4" #14635

Closed
tobijjah opened this issue Oct 31, 2017 · 0 comments

@tobijjah tobijjah commented Oct 31, 2017

  • I've verified and I assure that I'm running youtube-dl 2017.10.29
  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Bug report (encountered problems with youtube-dl)

Description

Traceback (most recent call last):
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/YoutubeDL.py", line 784, in extract_info
    ie_result = ie.extract(url)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/common.py", line 434, in extract
    ie_result = self._real_extract(url)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/hbo.py", line 309, in _real_extract
    return self._extract_from_path('series/%s' % path)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/hbo.py", line 154, in _extract_from_path
    return self._extract_from_xml(api_data, path)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/hbo.py", line 147, in _extract_from_xml
    data = self._extract_mpd_formats(url, 12)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/common.py", line 1753, in _extract_mpd_formats
    formats_dict=formats_dict, mpd_url=mpd_url)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/common.py", line 2007, in _parse_mpd_formats
    self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/youtube_dl/extractor/common.py", line 702, in report_warning
    '[%s] %s%s' % (self.IE_NAME, idstr, msg))
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/test/helper.py", line 244, in _report_warning
    real_warning(w)
  File "/home/ilex/Documents/code/python/projects/yt-dl/youtube-dl/test/test_download.py", line 52, in report_warning
    raise ExtractorError(message)
youtube_dl.utils.ExtractorError: [hbo:series] Unknown MIME type application/mp4 in DASH manifest; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Parsing the provided example MPD manifest with _parse_mpd_formats (or _extract_mpd_formats) fails with an ExtractorError. The error is triggered when the parser reaches the AdaptationSet with contentType="text" and mimeType="application/mp4". Judging from the code snippet below (where the error occurs), the parser is meant to skip an AdaptationSet with this contentType, but it does not: content_type is derived from the mimeType and therefore becomes 'application', so the content_type == 'text' condition is never satisfied (see the comments in the snippet). I have attached three possible solutions because I don't know whether this behavior is intended or a bug in the parser. I also don't know whether the proposed solutions have side effects; I tested them with only a few examples, and they worked as expected in my case.

for representation in adaptation_set.findall(_add_ns('Representation')):
    if is_drm_protected(representation):
        continue
    representation_attrib = adaptation_set.attrib.copy()
    representation_attrib.update(representation.attrib)  # contentType is text
    # According to [1, 5.3.7.2, Table 9, page 41], @mimeType is mandatory
    mime_type = representation_attrib['mimeType']  # mimeType is application/mp4
    content_type = mime_type.split('/')[0]  # sets content_type to application
    if content_type == 'text':  # therefore this condition is not fulfilled and this AdaptationSet is not skipped
        # TODO implement WebVTT downloading
        pass
    elif content_type in ('video', 'audio'):
        ...
    else:  # trigger error
        self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
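The mismatch can be reproduced outside youtube-dl with a minimal sketch. The manifest fragment here is a hypothetical, namespace-free stand-in for the real MPD (the Representation id and bandwidth are made up), and merged_attrib is an illustrative helper, not a youtube-dl function; it only mirrors the copy/update/split steps from the snippet above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down stand-in for the problematic AdaptationSet.
MPD_FRAGMENT = """
<AdaptationSet contentType="text" mimeType="application/mp4">
  <Representation id="subs-en" bandwidth="1000"/>
</AdaptationSet>
"""


def merged_attrib(adaptation_set, representation):
    """Merge AdaptationSet and Representation attributes (illustrative helper)."""
    attrib = adaptation_set.attrib.copy()
    attrib.update(representation.attrib)
    return attrib


adaptation_set = ET.fromstring(MPD_FRAGMENT)
representation = adaptation_set.find('Representation')
attrib = merged_attrib(adaptation_set, representation)

mime_type = attrib['mimeType']
# Same derivation as in the parser: only the MIME major type is kept.
content_type = mime_type.split('/')[0]

print(attrib.get('contentType'))  # text
print(content_type)               # application
```

The declared contentType and the MIME major type disagree, so a check that looks only at the MIME major type never sees 'text'.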
  • Possible solutions for common.InfoExtractor._parse_mpd_formats():
# First: tested and my preferred
representation_attrib.update(representation.attrib)
mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0]
if content_type == 'text' or representation_attrib.get('contentType') == 'text':  # change
    pass

# Second: untested
representation_attrib.update(representation.attrib)
mime_type = representation_attrib['mimeType']
content_type = representation_attrib.get('contentType')  # change
if content_type == 'text':
    pass

# Third: tested
representation_attrib.update(representation.attrib)
mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0]
if content_type in ('text', 'application'):  # change
    pass
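To make the trade-offs between the three variants easier to see, here is a rough side-by-side sketch. The skip_* functions and the sample attribute dicts are hypothetical and exist only for this comparison; they isolate the "should this set be skipped as text?" decision and are not part of youtube-dl.

```python
def skip_first(attrib):
    """First variant: MIME major type or contentType says text."""
    content_type = attrib['mimeType'].split('/')[0]
    return content_type == 'text' or attrib.get('contentType') == 'text'


def skip_second(attrib):
    """Second variant: rely on the contentType attribute alone."""
    return attrib.get('contentType') == 'text'


def skip_third(attrib):
    """Third variant: treat every application/* MIME type like text."""
    return attrib['mimeType'].split('/')[0] in ('text', 'application')


# Made-up merged attribute dicts, one per case of interest.
SAMPLES = {
    'mp4 subtitles': {'contentType': 'text', 'mimeType': 'application/mp4'},
    'plain WebVTT': {'mimeType': 'text/vtt'},
    'video': {'contentType': 'video', 'mimeType': 'video/mp4'},
}

for name, attrib in SAMPLES.items():
    print(name, skip_first(attrib), skip_second(attrib), skip_third(attrib))
```

In this sketch the second variant does not skip a text/vtt set that carries no contentType attribute, and the third variant would also skip any other application/* representation, which is why the first variant looks like the most conservative change to me. This only covers the samples above, so side effects on real manifests are still possible.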
@tobijjah tobijjah closed this Oct 31, 2017