instagram: --write-pages broken #14769

Closed

Vrihub (Contributor) opened this issue Nov 16, 2017 · 0 comments

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.11.15. If it's not, read this FAQ entry and update. Issues with an outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2017.11.15

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

Description of your issue, suggested solution and other information

The recent fix to the instagram extractor (5fc12b9) broke the --write-pages option for this extractor.

The pages are downloaded and parsed correctly, but they are all saved under the same file name, so each page overwrites the previous one and, when the program exits, the user is left with only the last page.
The old implementation produced unique file names by including a max_id=... chunk in each name.

Reason: in the new implementation of the extractor, the max_id chunk is not part of the url_or_request argument that is passed to _download_json() (in /extractor/common.py) and used by _webpage_read_content() to build the file name for the --write-pages option. Instead, max_id is stored in the query dictionary, which _webpage_read_content() never consults when building the file name.
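
To illustrate, here is a minimal sketch of the flow described above (my own simplification, not the verbatim youtube-dl source; sanitize_filename and update_url_query are real helpers from youtube_dl.utils, while the URL, video id, and cursor values are just examples). The query dict is merged into the URL only for the HTTP request, while the dump file name is still derived from the bare url_or_request, so every page collides on the same name:

    from youtube_dl.utils import sanitize_filename, update_url_query

    def dump_filename(url_or_request, video_id):
        # Mirrors how _webpage_read_content() names the dump file: it only
        # sees url_or_request, never the query dict.
        return sanitize_filename(
            '%s_%s.dump' % (video_id, url_or_request), restricted=True)

    base_url = 'https://www.instagram.com/graphql/query/'
    for max_id in (None, 'AQAxyz', 'AQBabc'):  # hypothetical cursor values
        query = {'max_id': max_id} if max_id else {}
        # _request_webpage() merges the query into the URL it fetches...
        requested = update_url_query(base_url, query) if query else base_url
        print('requested URL :', requested)
        # ...but the dump name is computed from base_url alone, so all three
        # iterations print the identical file name.
        print('dump file name:', dump_filename(base_url, 'someuser'))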

I think there are two alternative ways to fix this bug:

A) Fix common.py so that the file name for written pages is also built from the query argument.
(I didn't dig into this, because I thought it could have side effects on other extractors.)
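
For illustration, a hypothetical sketch of what option A could look like (this is not the merged fix, and the helper name is mine): re-apply the same query that the request used before deriving the dump name, so each page gets a unique file:

    from youtube_dl.utils import sanitize_filename, update_url_query

    def dump_filename_with_query(url_or_request, video_id, query=None):
        # Fold the query dict back into the URL before naming the dump
        # file, so the name matches the URL that was actually fetched.
        if query:
            url_or_request = update_url_query(url_or_request, query)
        return sanitize_filename(
            '%s_%s.dump' % (video_id, url_or_request), restricted=True)

Doing this for real would require passing query through to _webpage_read_content(), which is exactly the kind of plumbing that might have side effects on other extractors.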

B) Fix the instagram extractor: avoid using the query object and put max_id back into the URL.
I tried this and it works: see the attached patch.

Attachment: instagram.py.fixwritepages.txt
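
For reference, a sketch in the spirit of option B (this is not the attached patch verbatim, and the ?__a=1 endpoint shape is only illustrative). Embedding max_id in the URL before calling _download_json() means the unique URL reaches _webpage_read_content(), and the dump names differ again:

    from youtube_dl.utils import update_url_query

    def page_url(uploader_id, max_id=None):
        # Endpoint shape assumed for illustration only.
        url = 'https://www.instagram.com/%s/?__a=1' % uploader_id
        if max_id:
            # max_id travels inside url_or_request, so it also ends up in
            # the --write-pages file name.
            url = update_url_query(url, {'max_id': max_id})
        return url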

dstftw closed this in f610dbb on Nov 18, 2017