Dumping the page source possible? #10234

Closed
EnginePod opened this issue Aug 5, 2016 · 6 comments

Comments


@EnginePod EnginePod commented Aug 5, 2016

  • I've verified and I assure that I'm running youtube-dl 2016.08.01

Before submitting an issue make sure you have:

  • At least skimmed through README and most notably FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

I see options that allow me to dump the user agent and the JSON output, but is there a way to actually dump the source of the page(s) that youtube-dl went through?

Collaborator

@yan12125 yan12125 commented Aug 5, 2016

Use --dump-pages or --write-pages. Note that the latter is not perfect as requests to the same URL are dumped to the same file.

@yan12125 yan12125 closed this Aug 5, 2016
Author

@EnginePod EnginePod commented Aug 5, 2016

Not sure how I missed these, thanks.

I went through both just now and they seem to have issues (as you pointed out).
--dump-pages doesn't work with most other parameters (-g and -j are just two examples), which makes it of little use.
--write-pages, as you said, just dumps pages all over the place and you have no idea which is which. This feature would've been useful if you could at least set the directory that you wanted to save the pages to.

Collaborator

@yan12125 yan12125 commented Aug 5, 2016

Yes, currently these options are for debugging only and are not designed for fine-grained control. You can modify https://github.com/rg3/youtube-dl/blob/8b40854/youtube_dl/extractor/common.py#L445-L472 to achieve "setting the directory that you wanted to save the pages to" or similar.

Author

@EnginePod EnginePod commented Aug 5, 2016

Ah, I see, but the downside is that I'd have to modify it and set it up again every single time there's an update, which would be an absolute pain.

PS
Nice new picture! 👍

Collaborator

@yan12125 yan12125 commented Aug 5, 2016

Something hacky :) At least no need to modify sources.

from __future__ import unicode_literals

import json
import re

import youtube_dl


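class Logger(object):
    """Collects the pages that dump_intermediate_pages prints via debug()."""

    def __init__(self):
        # URL announced by the most recent "Dumping request to ..." message
        self.dumping_url = None
        self.dumped_files = []

    def debug(self, msg):
        # The message right after a "Dumping request to ..." line is the page data
        if self.dumping_url:
            self.dumped_files.append({
                'url': self.dumping_url,
                'base64_data': msg,
            })
            self.dumping_url = None
            return

        # Remember the URL so the next message can be paired with it
        mobj = re.search(r'Dumping request to (.+)', msg)
        if mobj:
            self.dumping_url = mobj.group(1)

    def warning(self, msg):
        pass

    def error(self, msg):
        pass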

logger = Logger()

ydl_opts = {
    'dump_intermediate_pages': True,
    'logger': logger,
}

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.extract_info('https://www.youtube.com/watch?v=cbjMwKLE-RE', download=False)

print(json.dumps(logger.dumped_files, indent=4, sort_keys=True))
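
To get closer to "set the directory that you wanted to save the pages to", the collected entries could be decoded and written out afterwards. The following is only a sketch under assumptions: save_dumped_pages, the file naming scheme and the .dump extension are hypothetical and not part of youtube-dl, it targets Python 3, and it assumes each 'base64_data' value is the plain base64 string captured by the logger above. A running index is prepended so that repeated requests to the same URL do not overwrite each other.

```python
import base64
import os
import re


def save_dumped_pages(dumped_files, target_dir):
    """Decode each dumped page and write it into target_dir.

    dumped_files is a list of dicts with 'url' and 'base64_data' keys,
    in the shape produced by the Logger above. Returns the written paths.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    for index, entry in enumerate(dumped_files):
        # Hypothetical naming scheme: keep only filesystem-safe characters
        # from the URL and cap the length to avoid overly long file names.
        safe_name = re.sub(r'[^A-Za-z0-9._-]+', '_', entry['url'])[:100]
        # The index keeps files distinct even when the same URL occurs twice.
        path = os.path.join(target_dir, '%03d_%s.dump' % (index, safe_name))
        with open(path, 'wb') as f:
            f.write(base64.b64decode(entry['base64_data']))
        written.append(path)
    return written


# Possible usage after extract_info() has run (directory name is an example):
# save_dumped_pages(logger.dumped_files, 'dumped_pages')
```

Since this only post-processes what the logger collected, nothing in the youtube-dl sources has to be patched again after an update.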
Author

@EnginePod EnginePod commented Aug 5, 2016

Big thanks, I'll give this a try! :)
