Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[niconico] Improvement niconico douga from another approach. (Support encrypted movies and old server source movies) #49

Merged
merged 28 commits into from
Feb 10, 2021

Conversation

kurumigi
Copy link
Contributor

@kurumigi kurumigi commented Feb 2, 2021

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of some part of this code and I am willing to release it under Unlicense
  • I am not the original author of rest of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Main part of this PR was pull requested two times by @tsukumijima (ytdl-org/youtube-dl#23824 / blackjack4494/youtube-dlc#55).

niconico (ニコニコ動画), which is not known outside Japan, is the largest video sharing site in Japan.

niconico has begun using HLS to encrypt the distribution of some official videos, such as anime, on March 5, 2019.
As a result, some official videos can no longer be downloaded with the previously used Extractor.
(Examples of videos that can be downloaded as before: https://www.nicovideo.jp/watch/sm32868846
(Example of video that cannot be downloaded due to HLS encryption: https://www.nicovideo.jp/watch/so36189921)

This pull request includes an improvement to allow you to download niconico's encrypted HLS videos.

In niconico, if you do not send "heartbeat" once every 60 seconds, the download session will be closed and you will get a 403 error.
This pull request also includes a change to allow "heartbeat" to be sent once every 60 seconds during the download, and to allow long videos to be downloaded to the end.

We have also made changes so that you can obtain higher quality thumbnails and original titles.
(Sentence may be a bit strange because we use translation)

Additonaly, I write some code for downloading movie from SMILEVIDEO, niconico douga old distribution system. Niconico douga changed its distribution system from SMILEVIDEO to DMC (Dwango Media Cluster) in 2016. But SMILEVIDEO was stored source movies until 2018-06-28, and we can access this movies even now.

@pukkandan
Copy link
Member

I didn't go through the code in-depth, but these are my initial thoughts:

  1. Fix the conflicts - I am not knowledgeable in niconico, so it's totally up to you to resolve conflicts
  2. Rebase the branch on yt-dlp and get rid of the merge commits from youtube-dlc. Having these unnecessary commits makes both review and conflict resolution tedious
  3. From what I understand, the code is similar to https://github.com/animelover1984/youtube-dl, but without the live video extractor (and hence no need for websockets), correct?
  4. The heartbeat logic should be moved to either a separate downloader similar to YoutubeLiveChatReplayFD or moved inside the extractor itself. (I can help with this)

@tsukumijima
Copy link
Contributor

From what I understand, the code is similar to https://github.com/animelover1984/youtube-dl, but without the live video extractor (and hence no need for websockets), correct?

Yes, it's correct. I'd like to support NicoLive in the future, but the drawback is that I need a WebSocket module. There seems to be a restriction that youtube-dl should not use external libraries, so I can't find a good solution.

The heartbeat logic should be moved to either a separate downloader similar to YoutubeLiveChatReplayFD or moved inside the extractor itself. (I can help with this)

Heartbeat logic is an improvement over the code proposed by others, but I don't think the current implementation is good either.
I'm not familiar with Python, so I don't know how to solve it.

The sentence may be unnatural because it is machine translated from Japanese.

@pukkandan
Copy link
Member

pukkandan commented Feb 2, 2021

I'm not familiar with Python, so I don't know how to solve it.

And, I am not familiar with niconico. So, unless someone who is familiar with it is willing to help, this PR probably won't go anywhere. Maybe @kurumigi or @bbepis would be interested?

there seems to be a restriction that youtube-dl should not use external libraries, so I can't find a good solution.

Yes, youtube-dl has that requirement. But yt-dlp does not. However, if the dependency is not found, the import should fail gracefully. What I mean is, the ImportError should be handled silently and when the user tries to use a functionality that requires the dependancy, yt-dlp must give error message stating that an additional module is needed. You can see how I handled mutagen requirement for example:

https://github.com/pukkandan/yt-dlp/blob/b60419c51aa3eb9872e278e526cc5e62bf484462/youtube_dlc/postprocessor/embedthumbnail.py#L11-L15
https://github.com/pukkandan/yt-dlp/blob/b60419c51aa3eb9872e278e526cc5e62bf484462/youtube_dlc/postprocessor/embedthumbnail.py#L155-L156

@pukkandan
Copy link
Member

@kurumigi Don't mind the linter. It is an issue from my end

@kurumigi
Copy link
Contributor Author

kurumigi commented Feb 2, 2021

Fix the conflicts - I am not knowledgeable in niconico, so it's totally up to you to resolve conflicts
Rebase the branch on yt-dlp and get rid of the merge commits from youtube-dlc. Having these unnecessary commits makes both review and conflict resolution tedious

Oops.
I pushd it now.

From what I understand, the code is similar to https://github.com/animelover1984/youtube-dl, but without the live video extractor (and hence no need for websockets), correct?

Yes. I think only support of niconico douga (niconico's non-live service)..

youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
'url': video_real_url,
'format_id': 'smile',
'format_note': 'Source movie file in old server (SMILEVIDEO)',
'preference': -2, # I think demand for source movie is not high.
Copy link
Member

@pukkandan pukkandan Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain 'preference': -2. This should probably be source_preference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary: Most of movie files in SMILEVIDEO (old server) before 2018-06-28 are SOURCE files. But, not recommend for obsolete encoding and lack of movie file information.

According to official and unofficial technical documents written by Japanese.

Until 2018-06-28, under the following conditions SMILEVIDEO preserved uploaded movies without change.

The other cases, uploaded movies were re-encoded to mp4 (H.264).

However, SMILEVIDEO is deprecated system. So, download SMILEVIDEO movie file have several problems.

  • Movie encoding : Now, swf and flv are obsolete. DMC (new server) always re-encoded them to mp4 (H.264).
  • Reducing API : Now, information of SMILEVIDEO movie file from API are URL, extension and filesize only. API for DMC provided resolution and bitrate, mandatory information to detect download movie file.

Because I down 'preference', not 'source_preference'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kurumigi this is wrong. smile_high source files are the most important download from niconico because for the time period you specify (which might actually be around november 2016, i don't remember what my notes said), the smile_high videos are the highest quality for both video and audio.

The DMC streams are always worse in this regard, with degraded audio and reencoded video (which is usually downscaled to the nearest 16:9 standard resolution). The fact that some files use outdated codecs doesn't matter; any modern video player (VLC, MPC, MPV) can play them without issue. Since youtube-dl is an archival tool, the highest quality version should be preferred regardless. If you need certain codecs you can reencode them yourself, and you will still end up with a better quality video than the DMC versions because niconico uses very low quality settings for conversion.

There is no reason why they shouldn't be preferred (for videos uploaded before 2018 or 2016, need to double check). There are plenty of videos with 320kbit audio being reencoded to 192kbit in the DMC streams which is horrible. I can give plenty of examples of videos where quality is degraded from DMC conversion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kurumigi
quality is used to mark that a video is better than ALL videos of a higher quality
source_quality is used to indicate that the format has a source that is prefered over other sources (this does not significantly affect the actual format selection unless explicitly specified by the user with -S source)

Like I said before, I don't know anything about niconico. But if @bbepis is correct, the quality should not be set lower and instead source_quality should be set lower (since less metadata is available in the older source and therefore, the newer source is preferred if everything else is equal)

All this is assuming that the video codec and resolution (or bitrate) of the formats is known. If these cannot be determined, the situation is more complicated.

Please give me an example output of -list-formats of a video with both sources so that I can better understand the situation

Copy link
Contributor Author

@kurumigi kurumigi Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... Using ffprobe is an interesting idea.

Apart fromt that, API return "filesize is 1 byte" like this, the movie no longer stored in SMILEVIDEO.
This matter happens only in movies uploaded after 2018-06-28, it's one of the reasons I think "deadline" is 2018-06-28.

Copy link
Member

@pukkandan pukkandan Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now only this issue left @kurumigi Great work :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-check official and unofficial technical documents written by Japanese, I understand major specification changes of niconico douga are two times.

  • Movies uploaded before 2016-12-08:
    Maximum Movie filesize was 100MB.
    SMILEVIDEO preserved acceptable encode (wrote in previous comment) movies, and re-encoded unacceptable movies to H.264.
    After 2018-04-10, DMC re-encoded all movies from SMILEVIDEO. (Official Source: http://blog.nicovideo.jp/niconews/69409.html (Japanese))

    Conclusion : SMILEVIDEO's movie are always superior to DMC's one.

  • Movies uploaded between 2016-12-08 and 2018-06-28:
    Maximum Movie filesize was upped to 1.5GB.
    SMILEVIDEO preserved acceptable encode movies below 100MB, and re-encoded unacceptable movies to H.264 below 100MB. (Archive of Official Source: https://www.nicovideo.jp/watch/sm33168111 (Japanese / Movie))
    DMC re-encoded all movies immediately.

    Conclusion : SMILEVIDEO acceptable movies are superior to DMC's one. Other movies depends re-encoded movie size.

  • Movies uploaded after 2018-06-28:
    SMILEVIDEO deactivated. No longer stored movies. (Official Source: https://twitter.com/nico_nico_talk/status/1004290141530173440 (Japanese))
    DMC re-encoded all movies immediately.

    Conclusion : DMC's movie only.

Hmmmm....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bbepis

Here is my logic that I use for smile format extraction, feel free to copy it outright.

I will merge this code. Thank you for permission to copy code.

Copy link
Member

@pukkandan pukkandan Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bbepis Could you review the metadata extraction. I don't fully understand it

youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
youtube_dlc/downloader/common.py Outdated Show resolved Hide resolved
@bbepis
Copy link
Contributor

bbepis commented Feb 4, 2021

And, I am not familiar with niconico. So, unless someone who is familiar with it is willing to help, this PR probably won't go anywhere. Maybe @kurumigi or @bbepis would be interested?

I have no idea how to do a clean job of this; I'm not familiar with python enough to do concurrency (at least without googling everything). There might be a clean way with async code but you don't wish to have the version requirement for that.

I do strongly feel that https://github.com/pukkandan/yt-dlp/pull/49#discussion_r569817286 should be marked as unresolved though, since it causes lower quality videos to be downloaded in a significant amount of videos

@pukkandan
Copy link
Member

I have no idea how to do a clean job of this; I'm not familiar with python enough to do concurrency (at least without googling everything).

The current code is mostly good enough. Once https://github.com/pukkandan/yt-dlp/pull/49/files#discussion_r569649508 is resolved, I should be able to move the heartbeat code to a seperate downloader myself

There might be a clean way with async code but you don't wish to have the version requirement for that.

See above comment https://github.com/pukkandan/yt-dlp/pull/49/files#issuecomment-771771482

However, if the dependency is not found, the import should fail gracefully. What I mean is, the ImportError should be handled silently and when the user tries to use a functionality that requires the dependancy, yt-dlp must give error message stating that an additional module is needed.

As long as the the async code can be made to fail gracefully in python2.6+ (and therefore not affect the use of other extractors), It can be used

kurumigi and others added 5 commits February 5, 2021 22:04
 * [extractor/niconico] Split method to get acturally format info.
 * [downloader/niconico] Call above method just before the download.
 * [downloader/niconico] Select downloader by acturally format info.
 * [downloader/niconico] Unify downloader class now.
Copy link
Member

@pukkandan pukkandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kurumigi I have reviewed the code so far and made some changes. Since I don't have write permission in your branch, I made a new branch with the changes. Please have a look https://github.com/pukkandan/yt-dlp/commits/nico

@kurumigi
Copy link
Contributor Author

kurumigi commented Feb 7, 2021

@pukkandan Thanks for adjustments! I tested using "--external_downloader ffmpeg", downloaded no problem.

@tsukumijima
Copy link
Contributor

The heartbeat related processing is nicely coded and nice. Thank you.

@pukkandan
Copy link
Member

Let me know when this is ready for final review

@pukkandan
Copy link
Member

pukkandan commented Feb 8, 2021

Don't worry about the merge conflict. It is caused because I have changed how nested downloaders (like niconico) should behave.

I will resolve it once the code is complete from your end

Resolved

Copy link
Member

@pukkandan pukkandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small things. Leaving review for future reference, I will fix these

youtube_dlc/postprocessor/ffmpeg.py Outdated Show resolved Hide resolved
youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
@kurumigi
Copy link
Contributor Author

kurumigi commented Feb 9, 2021

Sorry.
I forget to check on community members only movies.
Fix about them.

Copy link
Member

@pukkandan pukkandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good.

Is this ready for merge, or are you still making changes?

youtube_dlc/extractor/niconico.py Show resolved Hide resolved
youtube_dlc/extractor/niconico.py Outdated Show resolved Hide resolved
It's major reason is non-community member try to download community members only movie.
@pukkandan
Copy link
Member

Merging it now. If you have any further improvements, make a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants