Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sequential download #4795

Merged
merged 9 commits into from
Apr 14, 2023
Merged

Conversation

pldubouilh
Copy link
Contributor

@pldubouilh pldubouilh commented Feb 10, 2023

Allows to optionally sequentially download a torrent - closes #597. Based off @rnhmjoj's patch ported to transmission v4.

A few caveats / open-questions :

  • no GUI implementation yet
  • no web-ui implementation yet
  • defaults to off - should there also be a globally settable-option ?
  • usage is a bit counter intuitive (more on this later)
  • I'm not a c++ programmer !

The usage can be counter intuitive - e.g. when sequentially downloading the free tears of steel torrent, if I only enable the movie, it will download it sequentially, but it will end with the beginning of the file - effectively preventing streaming the file from the disk with a video-player.

Knowing this, an easy workaround here is to download everything but the movie first, and then the movie. But I can imagine this would not immediately obvious to most users ! I don't see an immediate solution for this - I just wanted to share along with this PR to get ideas.

Thank you !

Fixes #5169

@ckerr ckerr added enhancement scope:core semver:minor adds functionality in a backwards compatible manner labels Feb 10, 2023
Copy link
Member

@ckerr ckerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've requested a handful of changes but this is a great start, thanks for submitting this PR!

cli/cli.cc Outdated Show resolved Hide resolved
libtransmission/torrent.h Outdated Show resolved Hide resolved
libtransmission/torrent.cc Outdated Show resolved Hide resolved
libtransmission/torrent.h Outdated Show resolved Hide resolved
libtransmission/rpcimpl.cc Outdated Show resolved Hide resolved
libtransmission/peer-mgr-wishlist.cc Outdated Show resolved Hide resolved
@pldubouilh
Copy link
Contributor Author

All fixed up - thanks a lot for the detailed review :) keep me posted if you need any further change !

@pldubouilh pldubouilh requested review from ckerr and removed request for ckerr February 10, 2023 20:03
docs/rpc-spec.md Outdated Show resolved Hide resolved
@ckerr ckerr added this to the 4.1.0 milestone Feb 11, 2023
@pldubouilh
Copy link
Contributor Author

pldubouilh commented Feb 11, 2023

One more question - I have a further commit c051cc3 that I would like to include as part of this PR. Should I just cherry-pick it on this branch, or open a further PR afterwards ?

This commit is fairly important imho in order to honor the sequential download promise, as it fixes the issue I mentioned above.
It prioritises the beginning (first piece) of each downloaded file - making the sequential downloaded afterwards truly sequential from a file point of view. Incidentally it allows streaming the files currently being downloaded from the disk.

Thanks !

@ckerr
Copy link
Member

ckerr commented Feb 11, 2023

I'm not completely against the idea but I'm not convinced either -- it feels like added complexity for not much value. What is the benefit of sequential sorting being more fine-grained than the piece level?

Also

    [[nodiscard]] auto isFileFirstPiece(tr_piece_index_t piece) const
    {
        auto const [begin_idx, end_idx] = fpm_.fileSpan(piece);
        return piece == begin_idx;
    }

This is incorrect, begin_idx and end_idx are file indices, not piece indices. So the last line is compare a piece index to a file index, which is not going to give a meaningful value

@ckerr
Copy link
Member

ckerr commented Feb 11, 2023

Should I just cherry-pick it on this branch, or open a further PR afterwards ?

This is a more complicated question than it ought to be. This is a 4.1.0 feature, but got at least one 4.0.x release ahead of us first since people are crawling out of the woodwork to report issues they're finding in 4.0.0. An obvious answer to this is to make a new 4-0-x branch so that we can commit both features and fixes (see #4736) but that means every fix would need to be double-merged and that is a lot of work, so I'd rather not until the flow of bugfixes has slowed down.

So tl;dr is let's discuss it here in-PR and if/when we agree on merging it, add it to the PR

@pldubouilh
Copy link
Contributor Author

To me this effectively brings sequential-download-ness, as I think users care about the file/binary point of view, versus the torrent piece-ordering point of view.
Also this brings disk-streamability of the content being downloaded which is the reason why I did this PR in the first place :)

This is incorrect, begin_idx and end_idx are file indices

Cheers for this - I amended my commit in 2a6106e I think this should fix it 👍

@ckerr
Copy link
Member

ckerr commented Feb 11, 2023

as I think users care about the file/binary point of view, versus the torrent piece-ordering point of view.

Since whole pieces are what get checksum-tested, any files in the piece will still show as .part until the piece is complete and so there's no practical user-visible difference. So even if the piece holds >1 file you still won't be able to use the first file until the entire piece is done.

For that reason I don't see the benefit of extra code for block-level sequential downloading. I'm happy to keep discussing it if there are more arguments for the change but am still in my previous position of not completely against the idea but not convinced either.

@ckerr ckerr changed the title sequential download feat: sequential download Feb 12, 2023
@ckerr ckerr added type:feat A new feature and removed enhancement labels Feb 12, 2023
@pldubouilh
Copy link
Contributor Author

I'm not sure I fully understand the link with piece verification - iiuc content is verified and written on disk at path (+ optional .part extension), but users are free to also open and read the files actively being downloaded.

Just to clarify / rephrase the point I tried to make above - without 2a6106e, on the on the aforementioned tears of steel free torrent if I only select the movie, the download sequence will be from the file's second piece, till the end - then finishing by the file's first piece (first piece is shared between multiple files).

This prevents streaming the video file from the drive (e.g. with VLC) while we're downloading the torrent, as we're only downloading the first bytes at the beginning of the file (holding header metadata) at the very end.

To me the point of sequential downloading is purely on the block level - being able to read sequentially a file being downloaded is imho an important feature to have nowadays. It's directly helpful for video files, but I'm sure other applications would benefit from being able to stream-read large binaries from torrents while they're being downloaded.

And we're almost there with this PR, aside from the first few bytes of the files.

@pldubouilh
Copy link
Contributor Author

I'm also happy to change 2a6106e to another implementation if you prefer (maybe a simpler approach by putting a higher prio on the file's first piece ?)
Cheers !

@afontenot
Copy link

if I only select the movie, the download sequence will be from the file's second piece

I'm not a contributor to Transmission, but I've been following this PR and wanted to test this. I wasn't able to confirm.

  • Downloading and pausing the "Tears of Steel" torrent in sequential mode after a couple percent (after applying this PR), I was able to start playing the file immediately in VLC / mpv. The first (and last) piece of the torrent appear to be downloaded, including all the subtitles (which fit into one piece), even though they were deselected.
  • In case this was caused by Transmission prioritizing the first and last torrent pieces, I created a new test torrent with two large files of random data. I deselected the first file. After a very small percent had downloaded, I paused and checked the files with a hex editor. The very end of the first file had downloaded, along with the beginning of the second. Despite being in sequential mode, Transmission didn't start with the second piece of the enabled file. (It also downloaded the last piece right away again, for reasons unclear to me.)

I think this behavior makes sense - pieces are considered wanted for download even if the first part of the piece is in an unwanted file, and the first piece of the movie file necessarily comes before the second in sequential order, so it's not clear to me why there would be a problem (unless Transmission was incorrectly ignoring pieces that started in an unwanted file). It doesn't seem like explicitly prioritizing the first part of each file should be necessary.

Happy to test other things related to this patch if that's useful.

@ckerr
Copy link
Member

ckerr commented Feb 22, 2023

A couple of quick answers --

  • The "unwanted files downloaded anyway" question is covered in long-standing bug Unselected files are downloaded anyway #290.

  • The "prioritize first and last piece" feature's been in there for a looong time since f2daeb2 since it's needed to make media playable. See also 1, 2.

Is the current patch is fine as-is?

@Deathcow
Copy link

Deathcow commented Mar 5, 2023

Hi @ckerr, @pldubouilh, quick question: This kind of leech behavior seems pretty detrimental to the health of a swarm, and as far as I understand, the way bittorrent downloads and distributes pieces has a lot to do with its success. Is this really a good idea?

IMHO users who desire sequential downloads should use a different protocol, designed for that purpose.

@freeseacher
Copy link

@Deathcow as far as i can see that is no default behavior.
in my opinion torrents seeded much more time than downloading.
is that argument really fits in 2023?

@pldubouilh
Copy link
Contributor Author

pldubouilh commented Mar 5, 2023

hey @Deathcow - afaict there are two use cases where rarest-first comes in helpful

  • at the inception - to avoid thundering-herd-ing the few seeders with the same piece
  • when seeders are few and on unreliable connections - to maintain a torrent alive

wrt inception, this is an issue that is brought in by sequential download, but this will never be in practice a real thundering herd as all peers will start downloading the content at different speed, and not exactly at the same time, so it creates a de-facto offset. A proper real-life network analysis would be very interesting to perform regarding this !

and about the few seeders case, the idea was to focus on pieces that are only held by a few peers - so that the torrent wouldn't die if these few peers disconnect. It think the consensus is that nowadays long term seeders are much more reliable compared to the early bittorrent days, so this is not really an issue anymore.

so imho I really think this will-have // had (it's already implemented in other clients) no impact on the health of the network.

[edit] interestingly enough, the piece-selection's behaviour is not strictly standardised in a BEP, so clients are free to implement whatever they want. So protocol wise, it's perfectly compliant to have sequential downloads as well :)


also side question for @ckerr, I can't find references of rarest first in the codebase ? Afaict pieces are downloaded semi randomly (sorted by > salt) - also taking into account priority, and near-completion.

@pldubouilh
Copy link
Contributor Author

pldubouilh commented Mar 5, 2023

The "prioritize first and last piece" feature's been in there for a looong time since f2daeb2

@ckerr I can't find this feature in the current codebase ? main does not behave like this (checkout the second screenshot below, the files first bits are all zeros) - I'm wondering if this code was not lost at some point.
basically what I want to do in 2a6106e is exactly what was done in f2daeb2, and for the same reason !

Downloading and pausing the "Tears of Steel" torrent in sequential mode after a couple percent (after applying this PR), I was able to start playing the file immediately in VLC / mpv. The first (and last) piece of the torrent appear to be downloaded

@afontenot thank you for testing ! I can't repro this locally - just to clarify, you need to start the torrent with all the files already disabled (expect the movie) like vvv

2023-03-05-16:19:44

After downloading the file for a bit (17%) I could confirm that the beginning of the data is not present (with ghex) - also if you checkout the entropy graph you can see that it starts flat - which shows the zeros. (generated with binwalk -E file.bin). The video file is corrupted and VLC complains (it actually manages to fixup the video stream ~30 seconds in).

2023-03-05-16:35:05

And here's a similar screenshot where I followed the same process, but including my patch 2a6106e - the file header is here and it happily plays in VLC.

2023-03-05-16:56:37

@afontenot
Copy link

afontenot commented Mar 6, 2023

@pldubouilh @ckerr I found the issue! It's a glitch when a torrent has a web seed. The beginning chunk of the file isn't prioritized for some reason.

If you use this this modified torrent the sequential downloading patch appears to work as expected. When I was originally testing the patch I removed the webseed as that was the only way I could be sure sequential downloading was working.

Importantly this doesn't seem to be caused by this patch - I see the issue on my distribution's 4.0.1. So this is presumably a bug with web seed torrents having broken priority for the first and last pieces.

Edit: it does download it eventually for me. I wonder if this is just caused by a really slow webseed that gets "assigned" the first piece?

@ckerr
Copy link
Member

ckerr commented Mar 6, 2023

The "prioritize first and last piece" feature's been in there for a looong time since f2daeb2

@ckerr I can't find this feature in the current codebase ? main does not behave like this (checkout the second screenshot below, the files first bits are all zeros) - I'm wondering if this code was not lost at some point. basically what I want to do in 2a6106e is exactly what was done in f2daeb2, and for the same reason !

Yes good catch, it looks like this was lost as part of the huge rewriting that led up to 4.0.0.

IMO the right place for this fix would be

tr_priority_t tr_file_priorities::piecePriority(tr_piece_index_t piece) const
That class already has fast access to info :about what files are touched by the piece, and which feeds directly into the wishlist code via Wishlist::Mediator::priority().

Mind if I make a separate PR for this? IMO this is a regression in 4.0.x and so a narrow-scope PR for that would be a bugfix that could go into the next release instead of waiting for the semver-minor freeze to be lifted

@pldubouilh
Copy link
Contributor Author

@afontenot I could repro without webseed as well (using the magnet link from tears of steel, sans webseed). There are a lot of edge cases where it would not happen though (1 single file in torrent, different piece-size configurations, etc...)

@ckerr super, that sounds good to me - thanks for the PR !

@pldubouilh
Copy link
Contributor Author

I think this PR is now fully ready - thank y'all for the active and careful review 🙌

@afontenot
Copy link

For what it's worth I see the same issue after applying the new first / last piece priority patch. The presence of the webseed (which downloads at <100 kbps for me) delays the existence of the first piece for a long time, even though the rest of the pieces download sequentially.

I timed the appearance of the first chunk several times and it takes somewhere around a minute to appear, which at the 512 KiB piece size translates to about 70 kbps. That's almost exactly what I see for the transfer speed before Transmission connects to the first peer - so when it's downloading exclusively from the web seed.

Maybe it would make more sense to race the first piece between peers and the webseed when there's a webseed? A webseed could be arbitrarily slow, meaning for a large enough piece size you get basically no benefit to sequential downloading. Alternatively, it would be nice if I could remove a webseed in transmission-qt.

A screen recording of the issue
transmission.mp4

(The last chunk of the entire torrent appears to be getting prioritized even before the patch - but not the last chunk of each wanted file. The patch fixes this, and also correctly prioritizes the first chunk in the non-sequential download cases. In the sequential case it doesn't matter, in my tests, because the first piece gets downloaded first anyway.)

@OmlineEditor
Copy link

if you implement the download sequentially for several files, for example, when a TV series is downloaded. I ask you to sort the files inside the torrent by the name of the files and not by the position in the torrent. In this case, the series and files will be loaded really in the right sequence.

it was suggested by me here #5169

@KapJI
Copy link

KapJI commented Apr 3, 2023

FYI: There is also this patch which includes some UI including web UI.
https://github.com/Mikayex/transmission/tree/3.00-seq1

@OmlineEditor
Copy link

FYI: There is also this patch which includes some UI including web UI. https://github.com/Mikayex/transmission/tree/3.00-seq1

we need support inside the program itself without patches and web interfaces. if there is no such function inside the kernel, then you need to write this function for sequential download

@ckerr
Copy link
Member

ckerr commented Apr 14, 2023

Notes: Added optional sequential downloading.

@ckerr ckerr merged commit ebfba68 into transmission:main Apr 14, 2023
@bananakid
Copy link

Notes: Added optional sequential downloading.

Sorry to nag, please include UI toggle for sequential data download in the upcoming Transmission 4.1.0 update for Windows if possible.

P.S. It looks like Keenetic has already introduced this for web interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notes:highlight Should be listed prominently in release notes scope:core semver:minor adds functionality in a backwards compatible manner type:feat A new feature
8 participants