youtube-dl downloads as .ogg and ffmpeg does not convert correctly #6204

Closed
haveneersrobin opened this issue Jul 13, 2015 · 5 comments

Comments

@haveneersrobin haveneersrobin commented Jul 13, 2015

I'm trying to download some videos and extract the audio. It seems that when the video is .webm (which I believe is an HTML5 YouTube video), youtube-dl downloads the audio as .ogg. When I specify --audio-format m4a, it downloads the songs and converts them. However, when I try to set the tags with mutagen in Python, it raises an error saying the file is not an .m4a (MPEG-4) file, so the tags cannot be set. Is this an issue with youtube-dl, or do I have to look for the problem elsewhere (e.g. in ffmpeg or mutagen)?

@haveneersrobin haveneersrobin changed the title from "YouTube dl downloads as .ogg and ffmpeg does not convert correctly" to "youtube-dl downloads as .ogg and ffmpeg does not convert correctly" Jul 13, 2015
Collaborator

@jaimeMF jaimeMF commented Jul 13, 2015

Post the steps you follow so that we can reproduce the problem, including the youtube-dl command (with the output you get with the --verbose flag) and the mutagen commands.

Author

@haveneersrobin haveneersrobin commented Jul 13, 2015

Well, I use youtube-dl in a tool I helped develop, and it recently stopped working. I'll try to go into full detail, so bear with me.
I call the youtube-dl command from a Python script using os.system. The command that is executed is: youtube-dl -x ytsearch:<query>
When I call this command (with --verbose) in my script, I get the following:

usr/local/bin/youtube-dl -x --verbose  -o "/Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a" ytsearch:"Reality Lost Frequencies, Janieck Devy Audio" (1)
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-x', u'--verbose', u'-o', u'/Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a', u'ytsearch:Reality Lost Frequencies, Janieck Devy Audio']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.04.26
[debug] Python version 2.7.9 - Darwin-15.0.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 2.6.2, ffprobe 2.6.2
[debug] Proxy map: {}
[youtube:search] query "Reality Lost Frequencies, Janieck Devy Audio": Downloading page 1
[download] Downloading playlist: Reality Lost Frequencies, Janieck Devy Audio
[youtube:search] playlist Reality Lost Frequencies, Janieck Devy Audio: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[youtube] w5_pa7TYxGM: Downloading webpage
[youtube] w5_pa7TYxGM: Extracting video information
[youtube] w5_pa7TYxGM: Downloading DASH manifest
[debug] Invoking downloader on 'https://r7---sn-5hne6n76.googlevideo.com/videoplayback?id=c39fe96bb4d8c463&itag=171&source=youtube&requiressl=yes&mn=sn-5hne6n76&mm=31&ms=au&mv=m&pl=25&nh=IgpwcjA0LmFtczE1KgkxMjcuMC4wLjE&ratebypass=yes&mime=audio/webm&gir=yes&clen=3073553&lmt=1435050464882686&dur=186.147&upn=vPHH3RUMel0&signature=56B609FE91E8288479E93B107DCBC1DF13081D89.61A67C471A4D33466F997216C7062F197626D789&sver=3&key=dg_yt0&mt=1436792387&fexp=901816,920935,937432,9406545,9406848,9408142,9408420,9408710,9412839,9415005,9415833,9416126,948802&ip=193.190.253.145&ipbits=0&expire=1436814054&sparams=ip,ipbits,expire,id,itag,source,requiressl,mn,mm,ms,mv,pl,nh,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: /Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a
[download] 100% of 2.93MiB in 00:00
[debug] ffmpeg command line: ffprobe -show_streams '/Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a'
[ffmpeg] Destination: /Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.ogg (2)
[debug] ffmpeg command line: ffmpeg -y -i '/Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a' -vn -acodec copy '/Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.ogg'
Deleting original file /Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a (pass -k to keep)

(1) Is the command I use to call youtube-dl
(2) Is where ffmpeg converts to .ogg (I think)
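
The ffprobe call visible in the log above can also be run by hand to check which audio codec a downloaded file really contains, regardless of its file name. A minimal sketch, assuming ffprobe is installed and on PATH and prints its default key=value layout; the function names parse_codec_names and probe_codecs are made up for illustration and are not part of youtube-dl:

```python
import subprocess


def parse_codec_names(ffprobe_output):
    """Collect codec_name values from `ffprobe -show_streams` text output."""
    names = []
    for line in ffprobe_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "codec_name":
            names.append(value)
    return names


def probe_codecs(path):
    """Run ffprobe on `path` (assumes ffprobe is installed and on PATH)."""
    out = subprocess.check_output(
        ["ffprobe", "-v", "error", "-show_streams", path]
    )
    return parse_codec_names(out.decode("utf-8", "replace"))
```

A result such as ['vorbis'] for a file whose name ends in .m4a would suggest that the extension does not match the content.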

When I specify --audio-format m4a, this happens:

/usr/local/bin/youtube-dl -x --audio-format m4a --verbose  -o "/Users/Robin/Desktop/Download/Rio - Netsky.m4a" ytsearch:"Rio Netsky Audio" (1)
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-x', u'--audio-format', u'm4a', u'--verbose', u'-o', u'/Users/Robin/Desktop/Download/Rio - Netsky.m4a', u'ytsearch:Rio Netsky Audio']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2015.04.26
[debug] Python version 2.7.9 - Darwin-15.0.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 2.6.2, ffprobe 2.6.2
[debug] Proxy map: {}
[youtube:search] query "Rio Netsky Audio": Downloading page 1
[download] Downloading playlist: Rio Netsky Audio
[youtube:search] playlist Rio Netsky Audio: Collected 1 video ids (downloading 1 of them)
[download] Downloading video 1 of 1
[youtube] exNtYwaL0gw: Downloading webpage
[youtube] exNtYwaL0gw: Extracting video information
[youtube] exNtYwaL0gw: Downloading DASH manifest
[debug] Invoking downloader on 'https://r6---sn-5hne6n7e.googlevideo.com/videoplayback?id=7b136d63068bd20c&itag=171&source=youtube&requiressl=yes&gcr=be&mn=sn-5hne6n7e&mm=31&pl=25&mv=m&nh=IgpwcjA0LmFtczE1KgkxMjcuMC4wLjE&ms=au&ratebypass=yes&mime=audio/webm&gir=yes&clen=3668594&lmt=1434208367347597&dur=229.185&mt=1436792600&fexp=901816,9405451,9405633,9405995,9408142,9408195,9408420,9408710,9409097,9409172,9412773,9414737,9416126,9416882,9417178&sver=3&upn=92FP52blfMA&signature=57E1459B902E20347220EF34E41A107D85EC023D.33EE81B993C1918536CC8CD9B6CB85307D6B70C5&key=dg_yt0&ip=193.190.253.145&ipbits=0&expire=1436814239&sparams=ip,ipbits,expire,id,itag,source,requiressl,gcr,mn,mm,pl,mv,nh,ms,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: /Users/Robin/Desktop/Download/Rio - Netsky.m4a
[download] 100% of 3.50MiB in 00:00
[debug] ffmpeg command line: ffprobe -show_streams '/Users/Robin/Desktop/Download/Rio - Netsky.m4a'
[youtube] Post-process file /Users/Robin/Desktop/Download/Rio - Netsky.m4a exists, skipping
/Users/Robin/Desktop/Download/Rio - Netsky.m4a
Error caught on signal handler: <bound method ?.result of <Scraper(Scraper-1, started)>>
Traceback (most recent call last): (2)
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 578, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/core/engine.py", line 275, in <lambda>
    spider=spider, reason=reason, spider_stats=self.crawler.stats.get_stats()))
  File "/Library/Python/2.7/site-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
    return signal.send_catch_log_deferred(*a, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
    *arguments, **named)
--- <exception caught here> ---
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 140, in maybeDeferred
    result = f(*args, **kw)
  File "/Library/Python/2.7/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 54, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/bin/Cellar/Vire/Vire/src/scraper.py", line 55, in result
    self.youtube.download(playlist, self.path)
  File "/usr/local/bin/Cellar/Vire/Vire/src/youtube.py", line 60, in download
    Youtube.tag(track, path)
  File "/usr/local/bin/Cellar/Vire/Vire/src/youtube.py", line 73, in tag
    audio = MP4(path)
  File "/Library/Python/2.7/site-packages/mutagen/_file.py", line 40, in __init__
    self.load(filename, *args, **kwargs)
  File "/Library/Python/2.7/site-packages/mutagen/mp4/__init__.py", line 932, in load
    self.info = MP4Info(atoms, fileobj)
  File "/Library/Python/2.7/site-packages/mutagen/mp4/__init__.py", line 812, in __init__
    raise MP4StreamInfoError("not a MP4 file")
mutagen.mp4.MP4StreamInfoError: not a MP4 file

(1) The command used for calling youtube-dl
(2) This is where my Python error starts

I just noticed that in the first example (without the --audio-format m4a flag), it correctly downloaded the song 'Netsky-Rio' and failed for 'Lost Frequencies-Reality', but in the second example mutagen raises the error that 'Netsky-Rio' is not an M4A file. However, it appears as an .m4a file in the location where I downloaded the song.

I understand if this is not very helpful or clear. Please let me know if I can do anything to make it clearer.

Author

@haveneersrobin haveneersrobin commented Jul 13, 2015

This is the Python file that does most of the work. It calls some utilities that are in a separate file. If any of these need explanation, please let me know.

# -*- coding: utf-8 -*-

# Import dependencies.
from __future__ import division
import os
import os.path
import string
import urllib2 as urllib
from os.path import expanduser
from utils import Utils
from playlist import Playlist
from mutagen.mp4 import MP4, MP4Cover
from distutils.spawn import find_executable

# Youtube instance that downloads playlists from Youtube.com.
class Youtube():

    # Create new instance of Youtube with forced boolean.
    def __init__(self, forced):
        self.forced = forced

    # Generate search query from a given track.
    @staticmethod
    def query(track):
        GLOBAL_QUERY = 'Audio'
        enc = (track['title'], track['artist'], GLOBAL_QUERY)
        q = ' '.join(enc)
        return Utils.clean_string(q, False)

    # Download the given track to the given download path.
    def download_track(self, track, path):
        path = Utils.clean_string(Utils.full_path(track, path, 'm4a'), False)
        path = filter(lambda x: x in string.printable, path)
        path = path.replace('~', expanduser('~'))
        if self.forced:
            if os.path.exists(path):
                os.system('rm "' + path + '"')
        query = Utils.clean_string(Youtube.query(track), False)
        youtubedlpath = find_executable('youtube-dl')
        enc = (youtubedlpath, ' -x --audio-format m4a --verbose  -o "', path, '" ytsearch:"', query, '"')
        command = ''.join(enc)
        command = filter(lambda x: x in string.printable, command)
        print command
        os.system(command.encode('utf-8'))

    # Download the given playlist to the given download path.
    def download(self, playlist, path):
        if playlist.length() == 0:
            print 'Invalid playlist.'
            os._exit(1)
        playlist.reset()
        #Utils.progress(0)
        total = playlist.length()
        current = 0
        while playlist.has_next():
            playlist.next()
            track = playlist.current_track()
            self.download_track(track, path)
            Youtube.tag(track, path)
            current += 1
            progress = (current / total) * 100
            #Utils.progress(progress)

    # Set the MP4 (iTunes-style) tags of the given track on the file at the given path.
    @staticmethod
    def tag(track, path):
        path = Utils.full_path(track, path, 'm4a')
        path = filter(lambda x: x in string.printable, path)
        path = path.replace('~', expanduser('~'))
        if not os.path.exists(path):
            return
        audio = MP4(path)
        print track
        if track['album_type'] == 'single':
            track['album_name'] += ' - Single'
        audio['\xa9nam'] = track['title'].encode('utf-8')
        audio['\xa9ART'] = track['artist'].encode('utf-8')
        audio['\xa9alb'] = track['album_name'].encode('utf-8')
        audio['aART'] = track['album_artist'].encode('utf-8')
        audio['cprt'] = track['copyright'].encode('utf-8')
        audio['disk'] = [(1, 1)]
        audio['trkn'] = [(int(track['track']), int(track['maxtracks']))]
        audio['\xa9day'] = track['year']
        if track['explicit']:
            audio['rtng'] = [(str(4))]
        cover = track['album_url'].encode('utf-8')
        fd = urllib.urlopen(cover)
        covr = MP4Cover(fd.read(), getattr(MP4Cover,'FORMAT_PNG' if cover.endswith('png') else 'FORMAT_JPEG'))
        fd.close()
        audio['covr'] = [covr]
        audio.save()
Collaborator

@jaimeMF jaimeMF commented Jul 14, 2015

OK, the problem is that you specify the extension in the output template: /Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.m4a. You should use the placeholder instead: /Users/Robin/Desktop/Download/Reality - Lost Frequencies, Janieck Devy.%(ext)s. This is needed because the downloaded file is actually a webm file, but youtube-dl thinks it's already an m4a file.
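
The mismatch between file name and content can be confirmed without any external tool by looking at the first bytes of the file. A minimal sketch, assuming only the well-known signatures of the three containers involved here; sniff_container is a hypothetical helper, not something youtube-dl or mutagen provides:

```python
def sniff_container(path):
    """Guess the container from the first bytes of the file, ignoring its name."""
    with open(path, "rb") as handle:
        head = handle.read(12)
    if head[:4] == b"\x1a\x45\xdf\xa3":  # EBML header: Matroska/WebM
        return "webm"
    if head[4:8] == b"ftyp":  # ISO base media file: MP4/M4A
        return "mp4"
    if head[:4] == b"OggS":  # Ogg page header
        return "ogg"
    return "unknown"
```

Calling this before MP4(path) in the tagging step would let the script skip or report a file that only carries the .m4a name, rather than failing with MP4StreamInfoError.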

Since you seem to be interested only in getting the audio in m4a format, the best choice could be: youtube-dl -x --audio-format m4a --format 'bestaudio[ext=m4a]/best' --verbose -o "/Users/Robin/Desktop/Download/Rio - Netsky.%(ext)s" ytsearch:"Rio Netsky Audio". This first tries to download the audio directly in m4a format and falls back to downloading the full file and extracting the audio.
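
In a Python script, the same command could be assembled as an argument list rather than a concatenated shell string. A minimal sketch under stated assumptions: build_command and download are hypothetical names, the flags are the ones quoted in this thread, and youtube-dl is assumed to be on PATH:

```python
import os.path
import subprocess


def build_command(query, out_dir, name, youtube_dl="youtube-dl"):
    """Build the argument list; %(ext)s is left for youtube-dl to fill in."""
    template = os.path.join(out_dir, name + ".%(ext)s")
    return [
        youtube_dl,
        "-x",
        "--audio-format", "m4a",
        "--format", "bestaudio[ext=m4a]/best",
        "-o", template,
        "ytsearch:" + query,
    ]


def download(query, out_dir, name):
    """Run youtube-dl without a shell, so quotes in titles need no escaping."""
    return subprocess.call(build_command(query, out_dir, name))
```

With this template, the finished file would be expected at name + ".m4a" inside out_dir, which is the path to hand to the tagging step.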

If this still fails, I'll try to inspect the rest of your code.

Author

@haveneersrobin haveneersrobin commented Jul 14, 2015

This worked! Thank you very much!
