Estimating progress in bees #72

Open
Zygo opened this issue Sep 25, 2018 · 0 comments

Zygo commented Sep 25, 2018

It would be nice if bees could estimate progress, i.e. how much of the filesystem has been scanned or needs to be scanned before bees can return to an idle state.

For subvol scans (root 5 and all roots 256 and higher), in .beeshome/beescrawl.dat we can see the position of the extent iterators:

root 5 objectid 18446744073709551611 offset 193891 min_transid 1345204 max_transid 1345216 started 1537886291 start_ts 2018-09-25-10-38-11
root 258 objectid 6385071 offset 540699 min_transid 1345204 max_transid 1345216 started 1537886291 start_ts 2018-09-25-10-38-11
root 259 objectid 0 offset 0 min_transid 1345204 max_transid 1345216 started 1537886291 start_ts 2018-09-25-10-38-11
root 289 objectid 92116 offset 257 min_transid 1345204 max_transid 1345216 started 1537886291 start_ts 2018-09-25-10-38-11

The 'objectid' field is the inode number within the subvol. If you know the largest inode number in a subvol (let's call it max_objectid), then the progress, as an approximate percentage, is:

progress_percent = 100 * objectid / max_objectid
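
As a rough illustration of the formula above, here is a minimal Python sketch that parses one line of beescrawl.dat and computes the percentage. The function names are hypothetical, and max_objectid has to be supplied by the caller since beescrawl.dat does not record it. Clamping the result to 100 is an assumption of this sketch, made so that very large objectid values such as the one shown for root 5 do not produce a result above 100%.

```python
def parse_crawl_line(line):
    """Parse one 'key value key value ...' line of beescrawl.dat into a dict.

    Purely numeric values become ints; anything else (e.g. start_ts)
    is kept as a string.
    """
    tokens = line.split()
    record = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        record[key] = int(value) if value.isdigit() else value
    return record


def progress_percent(objectid, max_objectid):
    """Return 100 * objectid / max_objectid, clamped to at most 100.

    max_objectid is the largest inode number in the subvol; it is not
    stored in beescrawl.dat and must come from elsewhere.
    The clamp is an assumption of this sketch, not bees behaviour.
    """
    if max_objectid <= 0:
        return 0.0
    return min(100.0, 100.0 * objectid / max_objectid)


if __name__ == "__main__":
    line = ("root 258 objectid 6385071 offset 540699 min_transid 1345204 "
            "max_transid 1345216 started 1537886291 "
            "start_ts 2018-09-25-10-38-11")
    rec = parse_crawl_line(line)
    # 8000000 is a made-up max_objectid, used only for illustration.
    print(rec["root"], progress_percent(rec["objectid"], 8000000))
```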

The scan mode option (-m) determines how subvols are scanned:

  • In mode 0, all subvols are scanned in lock-step, i.e. they all progress at the same rate, and they all restart at the same time.
  • In mode 1, all subvols are scanned in parallel with no synchronization. Each subvol scan restarts immediately. A small subvol will be scanned many times while a large subvol is scanned once.
  • In mode 2, all subvols are scanned in order of start_ts, with root ID to break ties. When a subvol is completed it will not be scanned again until all other subvols have been scanned.
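
The mode 2 ordering can be sketched as a sort over the parsed crawl records. This assumes each record is a dict with 'start_ts' and 'root' keys (a hypothetical representation, not the bees data structure) and that the subvol with the oldest start_ts goes first. The start_ts strings shown earlier sort chronologically as plain strings because every field is zero-padded.

```python
def mode2_scan_order(records):
    """Order crawl records by start_ts, using root ID to break ties.

    Assumes oldest start_ts first, so a subvol that has just restarted
    sorts behind every subvol that is still waiting for its turn.
    """
    return sorted(records, key=lambda r: (r["start_ts"], r["root"]))


if __name__ == "__main__":
    records = [
        {"root": 289, "start_ts": "2018-09-25-10-38-11"},
        {"root": 5, "start_ts": "2018-09-25-10-38-11"},
        {"root": 258, "start_ts": "2018-09-24-08-00-00"},
    ]
    for rec in mode2_scan_order(records):
        print(rec["root"], rec["start_ts"])
```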

At the end of each subvol scan (100% completion), the max_transid field is copied to the min_transid field and the scan starts over. If all subvols have no new data, scanning stops until 10 transids have passed.

When a new subvol is detected, the lowest value for any subvol's min_transid field is copied to the min_transid field of the new subvol, since any extent older than the lowest min_transid has already been scanned.

If the min_transid field of all subvols is non-zero then at least one scan has been completed for the entire filesystem.
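
The transid bookkeeping described in the last three paragraphs could be modelled roughly as follows. This is a sketch over plain dicts with hypothetical function names, not the bees implementation. Starting a new subvol at objectid 0 and offset 0 is an assumption based on the root 259 line above, and the sketch does not model what happens to the other fields when a scan restarts.

```python
def restart_scan(record):
    """At 100% completion, copy max_transid into min_transid.

    Other fields are left untouched in this sketch.
    """
    record["min_transid"] = record["max_transid"]
    return record


def new_subvol_record(root, existing):
    """Create a record for a newly detected subvol.

    It inherits the lowest min_transid of any existing subvol, since
    extents older than that have already been scanned. Starting at
    objectid 0 / offset 0 is an assumption of this sketch.
    """
    lowest = min((r["min_transid"] for r in existing), default=0)
    return {"root": root, "objectid": 0, "offset": 0, "min_transid": lowest}


def full_scan_completed(records):
    """True if every subvol has a non-zero min_transid."""
    return bool(records) and all(r["min_transid"] != 0 for r in records)


if __name__ == "__main__":
    subvols = [
        {"root": 5, "min_transid": 0, "max_transid": 1345216},
        {"root": 258, "min_transid": 1345204, "max_transid": 1345216},
    ]
    print(full_scan_completed(subvols))
    restart_scan(subvols[0])
    print(full_scan_completed(subvols))
    print(new_subvol_record(300, subvols))
```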

For root 2 (extent) or root 7 (csum) scans, the objectid is a data block bytenr. In these cases the entire filesystem is scanned in a single pass. However, data block bytenrs are not contiguous, so producing an accurate estimate requires some extra work (scans of device tree 4) to determine which parts of the bytenr space are occupied by data.
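
Once the occupied parts of the bytenr space are known, the estimate itself is simple. The sketch below assumes they are available as a list of non-overlapping (start, length) ranges. That list is a hypothetical input here; obtaining it from the device tree is the extra work mentioned above and is not shown.

```python
def bytenr_progress_percent(position, occupied_ranges):
    """Estimate progress of a single-pass scan over a sparse bytenr space.

    position        -- current scan position (a data block bytenr)
    occupied_ranges -- non-overlapping (start, length) pairs covering the
                       parts of the bytenr space that hold data
                       (hypothetical input for this sketch)

    Only bytes inside occupied ranges count, so gaps between ranges do
    not distort the estimate.
    """
    total = sum(length for _, length in occupied_ranges)
    if total == 0:
        return 0.0
    done = 0
    for start, length in occupied_ranges:
        if position >= start + length:
            done += length            # range fully behind the scan position
        elif position > start:
            done += position - start  # range partially scanned
    return 100.0 * done / total


if __name__ == "__main__":
    # Two made-up 100-byte ranges separated by a large gap.
    ranges = [(0, 100), (1000, 100)]
    for pos in (50, 500, 1050):
        print(pos, bytenr_progress_percent(pos, ranges))
```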
