--dump-pages Problem #5781

Closed

Alavar opened this issue May 22, 2015 · 11 comments
Alavar commented May 22, 2015

I'm trying to use --dump-pages, but the only output I'm getting is a long run of seemingly random alphanumeric characters, starting with slashes.

Here is a very small fragment of the output: /IHl0Y2ZnLmQoKVtrXSA6IG87fSxzZXQ6IGZ1bmN0aW9uKCkge3ZhciBhID0gYXJndW1lbnRzO2lmIChhLmxlbmd0aCA+IDEpIHt5dGNmZy5kKClbYVswXV0gPSBhWzFdO30gZWxzZSB7Zm9yICh2YXIgayBpbiBhWzBdKSB7eXRjZmcuZCgpW2tdID0gYVswXVtrXTt9fX19Ozwvc2NyaXB0PiAgPHNjcmlwdD4KICAgIHl0Y2ZnLnNldCgiRVhQX0xBQ1RfTU9VU0UiLCBmYWxzZSk7CiAgICB5dGNmZy5zZXQoIkVYUF9MQUNUX1JFU0laRSIsIGZhbHNlKTsKICAgIHl0Y2ZnLnNldCgiRVhQX0xBQ1RfU0NST0xMIiwgZmFsc2UpOwogIDwvc2NyaXB0PgogIDxzY3JpcHQ+eXRjZmcuc2V0KCJMQUNUIiwgbnVsbCk7PC9zY3JpcHQ+CiAgCgoKCgogIDxzY3JpcHQ+CiAgICAgICAgKGZ1bmN0aW9uKCl7dmFyIGI9e2Y6ImNvbnRlbnQtc25hcC13aWR0aC0xIixoOiJjb250ZW50LXNuYXAtd2lkdGgtMiIsajoiY29udGVudC1zbmFwLXdpZHRoLTMiLGM6ImNvbnRlbnQtc25hcC13aWR0aC1za2lubnktbW9kZSJ9O2Z1bmN0aW9uIGcoKXt2YXIgYT1bXSxjO2ZvcihjIGluIGIpYS5wdXNoKGJbY10pO3JldHVybiBhfWZ1bmN0aW9uIGgoYSl7dmFyIGM9ZygpLmNvbmNhdChbImd1aWRlLXBpbm5lZCIsInNob3ctZ3VpZGUiXSksZT1jLmxlbmd0aCxmPVtdO2EucmVwbGFjZSgvXFMrL2csZnVuY3Rpb24oYSl7Zm9yKHZhciBkPTA7ZDxlO2QrKylpZihhPT1jW2RdKXJldHVybjtmLnB1c2goYSl9KTtyZXR1cm4gZn07ZnVuY3Rpb24gbChhLGMsZSl7dmFyIGY9ZG9jdW1lbnQuZ2V0RWxlbWVudHNCeVRhZ05hbWUoImh0bWwiKVswXSxrPWgoZi5jbGFzc05hbWUpO2EmJjEyNTE8PSh3aW5kb3cuaW5uZXJXaWR0aHx8ZG9jdW1lbnQuZG9jdW1lbnRFbGVtZW50LmNsaWVudFdpZHRoKSYmKGsucHVzaCgiZ3VpZGUtcGlubmVkIiksYyYmay5wdXNoKCJzaG93LWd1aWRlIikpO2lmKGUpe2U9d2luZG93LmlubmVyV2lkdGh8fGRvY3VtZW50LmRvY3VtZW50RWxlbWVudC5jbGllbnRXaWR0aDt2YXIgZD1lLTIxLTUwOzEyNTE8PSh3aW5kb3cuaW5uZXJXaWR0aHx8ZG9jdW1lbnQuZG9jdW1lbnRFbGVtZW50LmNsaWVudFdpZHRoKSYmYSYmYyYmKGQtPTIzMCk7ay5wdXNoKDY0MD49ZT8iY29udGVudC1zbmFwLXdpZHRoLXNraW5ueS1tb2RlIjoxMjYyPD1kPyJjb250ZW50LXNuYXAtd2lkdGgtMyI6MTA1Njw9ZD8iY29udGVudC1zbmFwLXdpZHRoLTIiOiJjb250ZW50LXNuYXAtd2lkdGgtMSIpfWYuY2xhc3NOYW1lPQprLmpvaW4oIiAiKX12YXIgbT1bInl0Iiwid3d3IiwibWFzdGhlYWQiLCJzaXppbmciLCJydW5CZWZvcmVCb2R5SXNSZWFkeSJdLG49dGhpczttWzBdaW4gbnx8IW4uZXhlY1NjcmlwdHx8bi5leGVjU2NyaXB0KCJ2YXIgIittWzBdKTtmb3IodmFyIHA7bS5sZW5ndGgmJihwPW0uc2hpZnQoKSk7KW0ubGVuZ3RofHx2b2lkIDA9PT1sP25bcF0.

How can I get youtube-dl to correctly show the HTML of the retrieved pages?

jaimeMF (Collaborator) commented May 22, 2015

It's encoded in base64; otherwise it would be really messy to handle pages that contain newlines (which most of them do). If you have base64 installed, you can use the following command on *nix to print the webpage contents:

youtube-dl --simulate --dump-pages test:youtube | grep -A 1 'Dumping' | sed '/Dumping/d' | base64 --decode
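If you want to verify the decoding pipeline without hitting the network, you can run the same grep/sed/base64 chain against a stand-in for youtube-dl's output. The format assumed here (a line containing "Dumping" followed by one line of base64) mirrors the command above; the URL is a placeholder:

```shell
# Build a fake one-page dump: a 'Dumping' marker line followed by one
# line of base64-encoded page source (tr strips the trailing newline
# that base64 appends).
encoded=$(printf '<html><body>hello</body></html>' | base64 | tr -d '\n')

# Run the same pipeline as above; prints the decoded HTML.
printf '[debug] Dumping request to http://example.com\n%s\n' "$encoded" \
  | grep -A 1 'Dumping' | sed '/Dumping/d' | base64 --decode
```

This makes it easy to confirm that your local base64 supports --decode before pointing the pipeline at real youtube-dl output.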
jaimeMF closed this May 22, 2015
Alavar (Author) commented May 22, 2015

That makes complete sense, thanks for the explanation.
Because the string was so terribly long, I didn't take the time to notice the last characters.

It would however be helpful if this was stated in the --help.

jaimeMF (Collaborator) commented May 22, 2015

It would however be helpful if this wasn't stated in the --help

Assuming you mean if this was stated in the --help ;), that's what I did in 79979c6

Alavar (Author) commented May 22, 2015

Yes, exactly; that was a typo and I just edited my post.
Glad to see it added and keep up the great work! :)

Alavar (Author) commented May 22, 2015

I have run into yet another problem.
When combining --dump-pages with --dump-json, I only end up getting the JSON, without the Base64-encoded source of the page.

Is it possible to receive both the JSON and the page dump in one command?

jaimeMF (Collaborator) commented May 22, 2015

Is it possible to receive both the JSON and the page dump in one command?

No, it suppresses all output except the JSON.

Alavar (Author) commented May 22, 2015

No, it suppresses all output except the JSON.

Would it be difficult to modify the code to have it output both the JSON and the page dump?
The alternative would otherwise be to run two commands: one retrieving the page dump and one the JSON.
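The two-command alternative can be sketched as a small script; since --dump-json suppresses the page dumps, each run produces one of the two artifacts. The URL and output file names here are placeholders, and the decode pipeline is the one shown earlier in this thread:

```shell
#!/bin/sh
# Sketch: gather both the JSON metadata and the decoded page dumps by
# running youtube-dl twice (URL and file names are placeholders).
URL="test:youtube"

# First run: JSON metadata only.
youtube-dl --simulate --dump-json "$URL" > info.json

# Second run: decode the base64 page dumps into a file.
youtube-dl --simulate --dump-pages "$URL" \
  | grep -A 1 'Dumping' | sed '/Dumping/d' | base64 --decode > pages.html
```

Depending on your version, youtube-dl's --write-pages option (which writes each fetched intermediary page to a file in the current directory) may serve the same purpose without any decoding step.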

jaimeMF (Collaborator) commented May 22, 2015

Would it be difficult to have it output both the JSON and the page dump?

It wouldn't be too hard, but it would be messy, and it would be inconsistent with the --get-* and --dump-single-json options.
Your use case is very specific; maybe if you told us why you need to inspect the pages, we could give some advice.

Alavar (Author) commented May 22, 2015

Your use case is very specific; maybe if you told us why you need to inspect the pages, we could give some advice.

The main reason is that I want to see which videos have encrypted signatures. If you have any suggestions on how to find out whether a video has an encrypted signature while also outputting the JSON, I would be more than glad to hear them.

Another reason is that I could then support some other smaller sites myself; since I don't write Python, I would otherwise have to open a request for a new host that is barely used by other users.

Alavar (Author) commented May 23, 2015

As this once again got off-topic: #5787.

jaimeMF (Collaborator) commented May 23, 2015

Another reason is that I could then support some other smaller sites myself; since I don't write Python, I would otherwise have to open a request for a new host that is barely used by other users.

I don't see how inspecting the webpages would allow you to support new sites. Anyway, the proper thing is to ask us to add support for the page (we have lots of sites that probably are only used by a few users, so it's not a problem).

The main reason is that I want to see which videos that have encrypted signatures

As I said, this could be added to the JSON output and it shouldn't be too hard. Feel free to open an issue requesting it, although I can't imagine why you need to know whether they are encrypted.
