Issues with embedding youtube-dl in python, getting auto subtitles, and getting a filename #15262

Closed
zefoo opened this issue Jan 15, 2018 · 2 comments


@zefoo zefoo commented Jan 15, 2018

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2018.01.14. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2018.01.14

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

Reference purposes for embedding: https://github.com/rg3/youtube-dl/blob/master/README.md#embedding-youtube-dl

Reference options that can be used when embedding: https://github.com/rg3/youtube-dl/blob/3e4cedf9e8cd3157df2457df7274d0c842421945/youtube_dl/YoutubeDL.py#L137-L312

I am embedding youtube-dl into a Python script.

I have this code currently:

from __future__ import unicode_literals
import youtube_dl

class MyLogger(object):
    def debug(self, msg):
        pass
    def warning(self, msg):
        pass
    def error(self, msg):
        print(msg)

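def my_hook(d):
    # Called by youtube-dl with a status dict for each file it downloads.
    print(d['status'])
    if d['status'] == 'finished':
        print('Done downloading, now converting ...')
        # Guess the subtitle path from the video path. This only works for
        # .webm downloads, and it replaces every 'webm' in the path.
        transcription_file = d['filename'].replace('webm', 'en.vtt')
        with open(transcription_file, 'r') as myfile:
            transcription_text = myfile.read().replace('\n', '')
        print(transcription_text)
        print('printing...')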

ydl_opts = {
    'writeautomaticsub': True,
    'progress_hooks': [my_hook],
    'forcefilename': True,
    'skip_download': True,
    'logger': MyLogger(),
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    output = ydl.download(['https://www.youtube.com/watch?v=SAgYiERRDPY'])

I have spent several hours on this, across multiple days, without getting it to work the way I want. Ideally, this is what I want to happen:

A) I feed in a URL
B) I get all the subtitles for that URL returned as a string/list (regular subtitles if they exist, otherwise the autogenerated ones)
C) I delete the video file and the subtitle files
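Steps B and C could be sketched roughly as follows, assuming the subtitle file is already on disk in WebVTT format. The name `vtt_to_text` is a hypothetical helper, not part of youtube-dl, and the parsing is deliberately simple: it skips everything before the first cue timing line, strips inline tags, and drops immediately repeated lines (which automatic captions tend to contain). It does not handle NOTE blocks or cue identifiers that appear after the first cue.

```python
import io
import os
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start".
TIMING = re.compile(r'^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+')
# Inline tags such as <c> or <00:00:01.500>.
TAG = re.compile(r'<[^>]+>')


def vtt_to_text(path, delete=False):
    """Return the cue text of a .vtt file as one string (hypothetical helper)."""
    texts = []
    in_cues = False
    with io.open(path, 'r', encoding='utf-8') as handle:
        for raw in handle:
            if TIMING.match(raw.strip()):
                in_cues = True
                continue
            if not in_cues:
                # Still in the header (WEBVTT, Kind:, Language:, ...).
                continue
            line = TAG.sub('', raw).strip()
            # Skip blank lines and immediate repeats of the previous line.
            if line and (not texts or texts[-1] != line):
                texts.append(line)
    if delete:
        # Step C: remove the subtitle file once its text has been read.
        os.remove(path)
    return ' '.join(texts)
```

Calling `vtt_to_text(path, delete=True)` returns the text and removes the file in one go; leave `delete` at its default while debugging.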

The issues/things I don't understand with this:

  1. my_hook gets called twice with d['status'] == 'finished' being True. I would expect it to be called only once. Maybe the second call is for the subtitles.

  2. 'keepvideo': False - I don't want to keep the video file, I only want the subtitles. I also tried setting "simulate" to True, but the video file was still kept. This isn't a big deal, I'm mostly curious (I can delete the video file manually in Python). Actually, never mind: 'skip_download': True does what I want and keeps only the en.vtt file. I think.

  3. I want to get the filename of the resulting video file or, ideally, of the subtitle file. I can't seem to get it, despite it being in the hook.

Once I have the video filename, I can swap its extension for the subtitle one, read that file in, and do what I want. But simply getting the filename in a way that works is proving difficult. I look forward to any insight.
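One possible way to get at the filename without scraping log output is to ask youtube-dl for it directly. This is only a sketch: `extract_info` and `prepare_filename` are existing `YoutubeDL` methods, but `subtitle_path` and `expected_paths` are hypothetical helper names, and the assumption that the subtitle file is named `<video name without extension>.<lang>.<format>` should be checked against the youtube-dl version in use. It also assumes the URL points at a single video rather than a playlist.

```python
import os


def subtitle_path(video_path, lang='en', fmt='vtt'):
    """Swap only the final extension of video_path for '<lang>.<fmt>'."""
    stem, _ext = os.path.splitext(video_path)
    return '{0}.{1}.{2}'.format(stem, lang, fmt)


def expected_paths(url, lang='en', fmt='vtt'):
    """Return (video_path, subtitle_path) without downloading anything."""
    # Imported here so subtitle_path stays usable without youtube-dl installed.
    import youtube_dl
    with youtube_dl.YoutubeDL({'quiet': True}) as ydl:
        # download=False extracts the metadata only.
        info = ydl.extract_info(url, download=False)
        video_path = ydl.prepare_filename(info)
    return video_path, subtitle_path(video_path, lang, fmt)
```

Unlike `.replace('webm', 'en.vtt')`, `os.path.splitext` touches only the final extension, so a title that happens to contain "webm" or a video saved as `.mp4` is handled the same way.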


@zefoo zefoo commented Jan 15, 2018

By the way, after varying my search terms, I found this: #10987

I can get the filename that way, which gives me this:

from __future__ import unicode_literals
import youtube_dl

SUBTITLE_PREFIX = '[info] Writing video subtitles to: '
filename = None  # set by MyLogger.debug once the subtitle path is reported

class MyLogger(object):
    def debug(self, message):
        # youtube-dl sends every debug/info line here, so keep only the one
        # that names the subtitle file instead of overwriting on each call.
        global filename
        if message.startswith(SUBTITLE_PREFIX):
            filename = message[len(SUBTITLE_PREFIX):]
    def warning(self, msg):
        pass
    def error(self, msg):
        print(msg)

def my_hook(d):
    if d['status'] == 'finished':
        print('Done downloading, now converting ...')

ydl_opts = {
    'writeautomaticsub': True,
    'logger': MyLogger(),
    'skip_download': True,
    'progress_hooks': [my_hook],
    'forcefilename': True,
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    output = ydl.download(['https://www.youtube.com/watch?v=SAgYiERRDPY'])
if filename:
    transcription_file = filename
    with open(transcription_file, 'r') as myfile:
        transcription_text = myfile.read().replace('\n', '')
    print(transcription_text)
  1. The question then is, does anyone see anything for improvement here?

  2. Any gotchas this could create?

  3. It would be nice to pull regular subtitles if they exist and fall back to the autogenerated ones if not. I'll have to experiment with that some more, but at least this mostly works for now.
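For the "regular first, autogenerated as fallback" idea, here is a minimal sketch that works on the info dict returned by `extract_info`. It relies on the `subtitles` and `automatic_captions` keys of that dict; `pick_subtitles` is a hypothetical helper name. As far as I can tell, enabling both `writesubtitles` and `writeautomaticsub` makes youtube-dl apply the same preference on its own, but treat that as an assumption to verify rather than documented behavior.

```python
def pick_subtitles(info, lang='en'):
    """Return (kind, tracks) for lang, preferring manually created subtitles.

    kind is 'manual', 'automatic', or None when nothing is available.
    """
    manual = (info.get('subtitles') or {}).get(lang)
    if manual:
        return 'manual', manual
    automatic = (info.get('automatic_captions') or {}).get(lang)
    if automatic:
        return 'automatic', automatic
    return None, []


# Options that, under the assumption stated above, request both kinds and
# let youtube-dl fall back to automatic captions per language.
YDL_OPTS = {
    'writesubtitles': True,
    'writeautomaticsub': True,
    'subtitleslangs': ['en'],
    'subtitlesformat': 'vtt',
    'skip_download': True,
}
```

Checking the info dict first also tells you which kind you ended up with, which matters because autogenerated captions usually need more cleanup.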

Collaborator

@dstftw dstftw commented Jan 15, 2018

  1. YouTube serves video and audio separately, hence two finished calls, one for each file.
  2. keepvideo keeps intermediate files. It has nothing to do with keeping the final file.
  3. There is no easy way to get the filename.
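
Given point 1, a hook that expects several 'finished' calls could simply collect every reported filename instead of acting on the first one. This is a sketch only: `FinishedCollector` is a hypothetical name, and it assumes the hook receives a dict with `status` and `filename` keys, as in the snippets above. I believe, without having verified it against this exact version, that with 'skip_download': True no media is fetched and the hook may never fire, so the list can legitimately stay empty.

```python
class FinishedCollector(object):
    """Progress hook that records each file youtube-dl reports as finished."""

    def __init__(self):
        self.filenames = []

    def __call__(self, d):
        name = d.get('filename')
        # Record each finished file once, in the order it was reported.
        if d.get('status') == 'finished' and name and name not in self.filenames:
            self.filenames.append(name)
```

Pass an instance via `'progress_hooks': [collector]` and read `collector.filenames` after `download()` returns.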
@dstftw dstftw closed this Jan 15, 2018