Make sure you are using the latest version: run `youtube-dl --version` and ensure your version is 2017.11.15. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

Before submitting an issue make sure you have:
What is the purpose of your issue?
Description of your issue, suggested solution and other information
The recent fix to the instagram extractor (5fc12b9) broke the `--write-pages` option for this extractor. The pages are correctly downloaded and parsed, but they are all saved under the same file name, so they overwrite each other and, when the program exits, the user is left with only one page.
The old implementation produced unique file names by including a `max_id=...` chunk in the name.

Reason: in the new implementation of the extractor, the `max_id` chunk is no longer part of the `url_or_request` argument that is passed to `_download_json()` (in `/extractor/common.py`) and used by `_webpage_read_content()` to build the file name for the `--write-pages` option. Instead, `max_id` is stored in the `query` dictionary, which `_webpage_read_content()` does not use when building the file name.

I think there are two alternative ways to fix this bug:
A) Fix `common.py` and build the file name for written pages by also using the `query` argument. (I didn't dig into this, because I thought it could have side effects for other extractors.) A rough sketch of this idea is included below, after the attached patch.
B) Fix the instagram extractor: avoid using the `query` object and put `max_id` back into the URL. I tried this and it works: see the attached patch, and the simplified sketch below.
instagram.py.fixwritepages.txt
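
For reference, a minimal sketch of what option A could look like, assuming the dump file name is derived from the request URL as described above. `_dump_basename` is a hypothetical helper, not an existing function in `common.py`:

```python
from youtube_dl.utils import update_url_query


def _dump_basename(video_id, url, query=None):
    # Hypothetical helper illustrating option A: fold the query dict into the
    # URL before deriving the --write-pages file name, so requests that differ
    # only in their query parameters (e.g. max_id) get distinct dump files.
    if query:
        url = update_url_query(url, query)
    return '%s_%s' % (video_id, url)
```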
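
And a simplified sketch of option B (the attached patch is the actual change; `build_page_url` is only an illustrative name): inline `max_id` into the URL so that `_download_json()` receives a distinct `url_or_request` for every page.

```python
from youtube_dl.utils import update_url_query


def build_page_url(page_url, max_id=None):
    # Illustrative helper for option B: put max_id back into the URL itself
    # instead of passing it via the query= keyword argument, so each page's
    # request URL (and therefore its --write-pages file name) is unique.
    return update_url_query(page_url, {'max_id': max_id}) if max_id else page_url
```

The extractor would then pass this URL to `_download_json()` without a `query` argument.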