Make DB_SIZE= resize the hashtable #70
Change that and restart beesd. That will resize the hash table.
@Massimo-B Maybe you are confused by how truncate works when you looked at the script. Truncate does what the name says: it truncates the contents of the file at the specified position. Everything after that position is thrown away (you lose that part of the hash table). If you truncate the file after its end, the file is resized and the gap is filled with a hole (it makes the file sparse). Truncate does not create an empty file by overwriting it with zeros; it just creates void (or cuts away what ends up in void).
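A quick way to see the sparse-extend behavior (a minimal sketch, assuming a Unix filesystem with sparse file support such as btrfs or ext4):

```python
import os

# Write one real 4K block, then "grow" the file with truncate.
path = "/tmp/truncate-demo"
with open(path, "wb") as f:
    f.write(b"x" * 4096)

os.truncate(path, 1 << 30)  # extend to 1 GiB without writing any zeros

st = os.stat(path)
print(st.st_size)           # 1073741824 -- the apparent size
print(st.st_blocks * 512)   # ~4096 -- only the written block is allocated
```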
bees saves a few bits per hash table entry by encoding some data in the position within the hash table (i.e. some bits of the hash form the page address in the hash table file, so it is not necessary to store those bits in the hash table entry). If the hash table is resized, bees can no longer find most of the data, and rejects any data that is found as invalid. If the hash table size is multiplied by 2, then 50% of the data lands in the right place by chance (and the other 50% is lost), but this is an exception. If the hash table size is multiplied by 3, over 99% of the data is lost. Issue #44 has a deeper discussion of the technical issues (and an algorithm for resizing at #44 (comment)).
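To make the position encoding concrete, here is a toy model (my own illustration, not bees's actual on-disk layout) of why a resize orphans entries:

```python
import random

# Toy model: the page that holds an entry is derived from the hash
# itself, so those bits never need to be stored in the entry.
def page_of(h: int, n_pages: int) -> int:
    # assume n_pages is a power of two, so low hash bits select the page
    return h & (n_pages - 1)

old_n, new_n = 1 << 10, 1 << 11  # double the table size
hashes = [random.getrandbits(64) for _ in range(100_000)]

# After growing the file, old entries stay on their old pages, but a
# lookup now computes the page with the new size; an entry is only
# found if the two computations happen to agree.
found = sum(page_of(h, old_n) == page_of(h, new_n) for h in hashes)
print(f"{found / len(hashes):.1%} of entries still found")  # ~50%
```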
FWIW as you mentioned in #105 (comment) @Zygo, there are certain situations where creating a new empty hash table is preferable (e.g. full btrfs balance). I read #44 and all of the other documentation extensively. I am very far from an expert at the internals of bees and/or btrfs (I've done enough sysadminning/programming [is that a word?] to understand what's going on in general...) That said, perhaps I can make a small request/plea for some clarification/documentation on the internals of bees? Maybe a section on "Don't do these things - they will break btrfs" or "These things are OK/safe to do"? Especially on interactions with btrfs functionality like defrag/defrag+compress/balance/etc. (I think a huge factor in my lack of clarity comes from carrying assumptions from other filesystems' dedupe mechanisms (e.g. NTFS/ZFS) into the btrfs/bees world - some things are similar, some things are very different.) Maybe just as a suggestion (in no particular order at all):
I'd be happy to volunteer and help write the documentation I suggested above -- I just don't have the understanding (yet :) ) to do so. I hope my comments don't come off as critical -- I am enormously grateful for the project so far and am excited for the future.
NTFS dedupe attacks the problem from a very different vector. As far as I know, it creates a hidden chunk-store file somewhere in the volume's System Volume Information folder. I guess the ZFS approach is more like bees because it keeps a hash table in memory for dedupe, with the difference that it is real-time (a native implementation within the FS; dedupe happens before writing to disk) while bees is near-time (it walks and re-reads the extents as soon as transaction changes are detected; the extents have already been written to disk). From that perspective, you'd be better off using bees on storage holding NTFS images than using NTFS Dedup within a Windows VM.
The hash table becomes empty. You must also delete beescrawl.dat.
The answer is probabilistic and depends on which subset of the data is present in the hash table. If the hash table is large enough and there are no hash collisions, then all the data is present, and all duplicate extents can be deduplicated. If the hash table is smaller, unique data and data scanned a long time ago will be discarded in favor of duplicate data and data scanned recently. There is a random function which selects which hash table entries to discard, to avoid biasing the hash table toward some data structures at the expense of others (e.g. files which are duplicate in the middle, but have unique headers or trailers).

Once bees has located a pair of duplicate blocks, any adjacent blocks in the same extents can be deduplicated. This means that only one block per extent needs to be in the hash table. Blocks are deleted from the hash table randomly, so a larger extent is more likely to retain blocks in the hash table than a smaller extent. When comparing bees to a dedupe engine that uses blocks of constant size and alignment, bees appears to have a block size proportional to the ratio of data size to hash table size; however, this only compares averages while ignoring the rest of the practical differences in implementation.
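As a rough illustration (my own simplification, not bees's actual eviction policy): if each block of an extent survives in the hash table independently with probability p, the extent stays dedupable as long as at least one of its blocks survives:

```python
# Probability an extent can still be deduped when each of its blocks
# survives in the hash table independently with probability p.
def extent_dedupable(p: float, n_blocks: int) -> float:
    return 1 - (1 - p) ** n_blocks

for n in (1, 4, 32):  # 4K, 16K, and 128K extents
    print(f"{n * 4}K extent: {extent_dedupable(0.25, n):.4f}")
# 4K: 0.2500, 16K: 0.6836, 128K: 0.9999 -- larger extents tolerate a
# much smaller hash table than small ones do
```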
It has no effect on previously deduped data--it will stay deduped until it is deleted, overwritten, or defragmented (btrfs defrag is a data copy with some atomic update magic).
Memory usage is constant in bees and all dedupe state is persistent, so the best you could do would be to adjust the number of cores and IO/CPU priorities. It may be possible to sample the hash table to make an existing table smaller (e.g. discard 75% of the hash table) to get something similar to this.
defrag undoes all dedupe whether it compresses or not. bees will treat defragged data as new data and dedupe it again (usually undoing the defrag if a non-defragged reference to the old data still exists). btrfs balance does not modify extents or references except to relocate them, i.e. balanced data retains all fragment sizes and compression, but not position on disk (both virtual and physical addresses change). bees stores references to data by virtual address, so anything that changes the virtual address of data will invalidate the bees hash table. bees will interpret balanced data as new data and reinsert it into the hash table at its new virtual address.
Depends on what you mean by 'well'. bees copes with this by brute force, treating all such operations the same way as any other new data write: new data is read, looked up in the hash table, and inserted/updated/deduped as appropriate. Obviously there are cases (e.g. a full balance to reshape a RAID array) where this is not the best-performing behavior, hence #105 (comment). It will not break a system to run bees and balance together, but running two maintenance functions at the same time is usually slower than running them at separate times on spinning disks. defrag and bees are more directly opposed--each will undo the other.
When you figure this out, please tell me. ;) bees is designed to work with minimal degradation when the hash table is not optimally sized. If the hash table is too small, the dedupe ratio decreases because bees can't match extents that are far apart in space or time. If the hash table is too large, performance decreases because bees will have stale data in the hash table that takes extra time to read and invalidate. An "optimal" hash table size also depends on the desired efficiency rate vs. RAM usage, which can only be answered by a sysadmin: is it worth using an extra 30% of RAM to get a 2.6:1 dedupe ratio instead of 2.5:1?

The time component makes estimation based on size metrics difficult. It matters when data was written to the disk. Data that is written at close to the same time is more likely to be present in an undersized hash table than data that is written at different times (e.g. everybody downloads the same popular file at once). If all of your data is like that, you can reduce the bees hash table size with no impact; if none of your data is like that, you need a bigger hash table or you don't find any duplicates. It's also possible to delete the hash table and start over with a different size.
If your filesystem is too big to run an analysis of the actual duplicate data, some arithmetic gives useful bounds. The total size of the data divided by 4K (the btrfs dedupe alignment block size) and multiplied by 16 (the number of bytes in each bees hash table entry) is the largest hash table that could possibly be useful. At that size, every 4K block on the filesystem is indexed and all duplicate blocks can be found. If the hash table is larger than this size then bees will not use the extra space.

The average size of an extent is the uncompressed disk usage (10G) divided by the regular extent count (84388)--in this example, just under 128K. The probability of an extent being deduped is a function of the extent length and the ratio between the largest useful hash table size and the actual hash table size. For a 4K extent, if the hash table drops 1 block of the extent, the extent cannot be deduped; for a 128K extent, the hash table can drop 31 blocks before the extent cannot be deduped. If all extents were 128K, the filesystem could theoretically be deduped with a hash table 32 times smaller than the maximum useful size. Note these numbers are averages: some extents will certainly have all 32 blocks dropped from the hash table, while others will have more than 1 block still present. Real filesystems don't have a uniform data distribution, which affects the dedupe hit rate either way. Temporal proximity has a large impact as well: very old extents are more likely to be completely dropped, while very new extents probably have all their blocks in the hash table.
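Working through those numbers (a small sketch; the 10G and 84388-extent figures are taken from the example above):

```python
# Largest hash table that could possibly be useful: one 16-byte entry
# per 4K block of data.
data_size = 10 * 2**30   # 10 GiB of uncompressed data
block = 4096             # btrfs dedupe alignment block size
entry = 16               # bytes per bees hash table entry

max_useful = data_size // block * entry
print(max_useful / 2**20)          # 40.0 MiB indexes every 4K block

# Average extent size from the example's regular extent count.
extents = 84388
print(data_size / extents / 1024)  # ~124 KiB, "just under 128K"

# If all extents were 128K (32 blocks), one surviving block per extent
# suffices, so a table ~32x smaller could in theory still find them all.
print(max_useful / 32 / 2**20)     # 1.25 MiB
```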
config.md has a lot of those already. Do you have more concrete criticisms?
I'd start with uncompressed total data size (as reported by compsize).
In practice this is the only question that matters, unless you want to know how much RAM you should buy for a given data size.
bees's on-disk hash table is the same size as its in-RAM hash table. The file can occupy about 10% less space on disk due to filesystem compression.
Pull requests welcome!
bees uses btrfs's backref counting to achieve this. It is similar in some ways, just on a much different scale, i.e. if you have multiple reflinks to an extent and you delete some but not all of them, you can have unreachable blocks on the filesystem that don't get freed until the last reflink to any part of the extent is removed. This means you can sometimes see disk usage greater than referenced size in compsize output.
ZFS allows swapping of the hash table to disk. bees does not--anything that doesn't fit in RAM is simply discarded. I considered various swapping algorithms for bees, but any swapping algorithm would make bees 3-6 orders of magnitude slower, while only adding a few percent more dedupe ratio. 10,000 times slower for 1% more disk space didn't seem like a worthwhile tradeoff. That said, there has been some discussion here about putting some kind of hash table swapping into bees to reduce RAM requirements during long periods of idleness, but this is not the same use case as ZFS DDT swap.
bees trails a little behind the filesystem (waiting until at least two commits have gone to disk since the data was written before bees reads the data) due to limitations of the btrfs kernel interface. bees needs to know the physical locations of data on disk, but the kernel doesn't necessarily allocate data immediately, so bees has to wait until the data flush to disk is done before picking up the data. If all goes well, the data is still in the VFS cache, so at least bees doesn't need to do extra IOs to read it.
Yes, that's true, but it's still different in NTFS: if you delete all files sharing an extent, it won't free up space until the GC has run, because NTFS reflinks shared extents into one additional chunk database file. It also seems to be a very special feature: files with shared extents can only be accessed by Windows Server; Windows consumer versions cannot access such files (or maybe cannot even mount the volume). So extent reflinks in NTFS seem not to be something baked into the FS at a native level; it probably needs some helper service to access those files. This makes me think that NTFS extents shared this way are not really reflinked but are a special type of extent linking into the chunk database. It's probably built around the VSS functions, so the chunk database would be some special volume snapshot, and as such all chunks are kept even if all references to them are deleted. This is also a big headache for backup software which doesn't know about the chunk database and tries to back it up: this database may be very big and take up a good part of the storage space of the volume, and it constantly changes, so it is backed up over and over again by file-based backup software.
@Zygo That would mean the service wrapper script should kill the beescrawl.dat file whenever it changes the hash table size?
If the hash table is deleted and recreated, so should beescrawl.dat be. If the file is just resized with truncate, beescrawl.dat can be kept.
Is this from @Zygo still true, or did it change by now? The beesd wrapper does delete the beescrawl.dat file even when resizing: https://github.com/Zygo/bees/blob/master/scripts/beesd.in#L129-L133

```bash
if (( "$OLD_SIZE" != "$NEW_SIZE" )); then
    INFO "Resize db: $OLD_SIZE -> $NEW_SIZE"
    rm -f "$BEESHOME/beescrawl.dat"
    truncate -s $NEW_SIZE $DB_PATH
fi
```

Besides being unnecessary, does it impact performance, cause reindexing, or other issues?
If the hash table size is altered by any change other than multiplying or dividing the hash table size by 2, then bees won't be able to locate old hash table entries, and won't be able to dedupe new data against copies that were scanned before the hash table resize. That usually makes dedupe stop working, and to fix that, we remove beescrawl.dat so the filesystem is scanned again from the beginning. Even when dividing or multiplying by 2, half of the pre-resize hash table entries will be lost, but bees can still get a reasonable hit rate with only half the entries.

When bees can't locate the old hash table entries, it won't be able to dedupe any data that was scanned before the hash table was resized. This wasn't really an intended feature, but it is a result of the structure of the hash table: it's a giant array that places hash entries in blocks according to some of the bits of the hash itself, so those bits don't have to be stored in the entries.

There's a long-standing feature request to have bees properly support resizing the hash table, or a tool that can read a hash table and write a new table with a different size. This would involve reading the old hash table and inserting its data into a new hash table while preserving the historical LRU order, effectively simulating just the hash table insertions of a filesystem scan. That would remove the need to restart the filesystem scan from the beginning.
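A sketch of what such a tool might look like (hypothetical code: the page layout, entry format, and page_capacity are invented for illustration and are not bees's real on-disk format):

```python
Entry = tuple[int, int]  # (hash value, extent address) -- invented format

def resize_table(old_pages: list[list[Entry]], new_n_pages: int,
                 page_capacity: int = 256) -> list[list[Entry]]:
    """Replay old entries into a new table of a different size."""
    new_pages: list[list[Entry]] = [[] for _ in range(new_n_pages)]
    # Insert entries oldest-first (assuming each page stores its oldest
    # entries first) so each new page ends up holding its most recently
    # inserted entries, approximating the LRU state that a full rescan
    # of the filesystem would have produced.
    for page in old_pages:
        for h, addr in page:
            target = new_pages[h % new_n_pages]
            if len(target) >= page_capacity:
                target.pop(0)  # evict the oldest entry on a full page
            target.append((h, addr))
    return new_pages
```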
Hi,
is it possible to make changing DB_SIZE= in the beesd config trigger a resize of the hash table?
How would I actually resize the hash table myself, without losing all of the existing hash table contents?