Update README.md

tomtomwombat · Mar 22, 2024 · a0c2ab6
1 parent 311a43e
commit a0c2ab6
Showing 1 changed file with 14 additions and 14 deletions.
diff --git a/README.md b/README.md
@@ -8,7 +8,7 @@
     <img src="https://codecov.io/gh/tomtomwombat/fastbloom/branch/main/graph/badge.svg">
 </a>
 
-The fastest bloom filter in Rust. Compatible with any hasher.
+The fastest Bloom filter in Rust. Compatible with any hasher.
 
 
 ## Usage
@@ -45,7 +45,7 @@ let filter = BloomFilter::with_num_bits(1024)
 ```
 
 ## Background
-Bloom filters are space-efficient approximate membership set data structures supported by an underlying bit array to track item membership. To insert/check membership, a number of bits are set/checked at positions based on the item's hash. False positives from a membership check are possible, but false negatives are not. Once constructed, neither the bloom filter's underlying memory usage nor number of bits per item change. [See more.](https://en.wikipedia.org/wiki/Bloom_filter)
+Bloom filters are space-efficient approximate membership set data structures supported by an underlying bit array to track item membership. To insert/check membership, a number of bits are set/checked at positions based on the item's hash. False positives from a membership check are possible, but false negatives are not. Once constructed, neither the Bloom filter's underlying memory usage nor number of bits per item change. [See more.](https://en.wikipedia.org/wiki/Bloom_filter)
 
 ```text
 hash(4) ──────┬─────┬───────────────┐
@@ -60,21 +60,21 @@ hash(4) ──────┬─────┬───────────
 
 ## Implementation
 
-`fastbloom` is **several times faster** than existing bloom filters and scales very well with the number of hashes per item. In all cases, `fastbloom` maintains competitive false positive rates. `fastbloom` is blazingly fast because it uses L1 cache friendly blocks, efficiently derives many index bits from **only one real hash per item**, and leverages other research findings on bloom filters.
+`fastbloom` is **several times faster** than existing Bloom filters and scales very well with the number of hashes per item. In all cases, `fastbloom` maintains competitive false positive rates. `fastbloom` is blazingly fast because it uses L1 cache friendly blocks, efficiently derives many index bits from **only one real hash per item**, and leverages other research findings on Bloom filters.
 
 
-`fastbloom` is implemented as a blocked bloom filter. Blocked bloom filters partition their underlying bit array into sub-array “blocks”. Bits set and checked from the item’s hash are constrained to a single block instead of the entire bit array. This allows for better cache-efficiency and the opportunity to leverage SIMD and [SWAR](https://en.wikipedia.org/wiki/SWAR) operations when generating bits from an item’s hash. [See more on blocked bloom filters.](https://web.archive.org/web/20070623102632/http://algo2.iti.uni-karlsruhe.de/singler/publications/cacheefficientbloomfilters-wea2007.pdf)
+`fastbloom` is implemented as a blocked Bloom filter. Blocked Bloom filters partition their underlying bit array into sub-array “blocks”. Bits set and checked from the item’s hash are constrained to a single block instead of the entire bit array. This allows for better cache-efficiency and the opportunity to leverage SIMD and [SWAR](https://en.wikipedia.org/wiki/SWAR) operations when generating bits from an item’s hash. [See more on blocked bloom filters.](https://web.archive.org/web/20070623102632/http://algo2.iti.uni-karlsruhe.de/singler/publications/cacheefficientbloomfilters-wea2007.pdf)
 
 
 ## Runtime Performance
 
-`fastbloom` is 50-1000% faster than existing bloom filters implemented in Rust.
+`fastbloom` is 50-1000% faster than existing Bloom filters implemented in Rust.
 
 #### SipHash
-Runtime comparison to other bloom filter crates (all using SipHash).
+Runtime comparison to other Bloom filter crates (all using SipHash).
 Note:
-- The number hashes for all bloom filters is derived to optimize accuracy, meaning fewer items in the bloom filters result in more hashes per item and generally slower performance.
-- As number of items (input) increases, the accuracy of the bloom filter decreases. 1000 random strings were used to test membership.
+- The number hashes for all Bloom filters is derived to optimize accuracy, meaning fewer items in the Bloom filters result in more hashes per item and generally slower performance.
+- As number of items (input) increases, the accuracy of the Bloom filter decreases. 1000 random strings were used to test membership.
 
 ![member](https://github.com/tomtomwombat/fastbloom/assets/45644087/c74ea802-a7a2-4df7-943c-92b3bcec982e)
 ![non-member](https://github.com/tomtomwombat/fastbloom/assets/45644087/326c2558-6f86-4675-99cb-c95aed73e90d)
@@ -91,10 +91,10 @@ The fastbloom-rs crate (similarily named) uses xxhash, which is faster than SipH
 
 ## False Positive Performance
 
-`fastbloom` does not compromise accuracy. Below is a comparison of false positive rates with other bloom filter crates:
+`fastbloom` does not compromise accuracy. Below is a comparison of false positive rates with other Bloom filter crates:
 
 ![bloom-fp](https://github.com/tomtomwombat/fastbloom/assets/45644087/07e22ab3-f777-4e4e-8910-4f1c764e4134)
-> The bloom filters and a control hash set were populated with a varying number of random 64 bit integers ("Number of Items"). Then 100,000 random 64 bit integers were checked: false positives are numbers that do NOT exist in the control hash set but do report as existing in the bloom filter.
+> The Bloom filters and a control hash set were populated with a varying number of random 64 bit integers ("Number of Items"). Then 100,000 random 64 bit integers were checked: false positives are numbers that do NOT exist in the control hash set but do report as existing in the Bloom filter.
 
 [Benchmark source](https://github.com/tomtomwombat/bench-bloom-filters)
 
@@ -103,7 +103,7 @@ The fastbloom-rs crate (similarily named) uses xxhash, which is faster than SipH
 `fastbloom` offers 4 different block sizes: 64, 128, 256, and 512 bits. 512 bits is the default. Larger block sizes generally have slower performance but are more accurate.
 
 #### Runtime Performance
-Times are for 1000 random strings. The bloom filters used ahash.
+Times are for 1000 random strings. The Bloom filters used ahash.
 
 ![member-fastbloom-blocks](https://github.com/tomtomwombat/fastbloom/assets/45644087/44073965-cc2d-4e70-9151-7e821b30b208)
 ![non-member-fastbloom-blocks](https://github.com/tomtomwombat/fastbloom/assets/45644087/6e5ee0e0-f460-46b9-95d6-f4b91d9fa424)
@@ -128,9 +128,9 @@ Instead of deriving a single bit position per hash, a hash with ~N 1 bits set ca
 
 ##### Example
 
-For a bloom filter with a bit vector of size 64 and desired hashes 24, 24 (potentially overlapping) positions in the bit vector are set or checked for each item on insertion or membership check respectively.
+For a Bloom filter with a bit vector of size 64 and desired hashes 24, 24 (potentially overlapping) positions in the bit vector are set or checked for each item on insertion or membership check respectively.
 
-Other bloom filters derive 24 positions based on 24 hashes of the item:
+Other Bloom filters derive 24 positions based on 24 hashes of the item:
 - `hash0(item) % 64`
 - `hash1(item) % 64`
 - ...
@@ -145,7 +145,7 @@ Note:
 - Given 64 bits, and 24 hashes, a bit has probability (63/64)^24 to NOT be set, i.e. 0, after 24 hashes. The expected number of bits to be set for an item is 64 - (64 * (63/64)^24) ~= 20.
 - A 64 bit `hash0(item)` provides us with roughly 32 set bits with a binomial distribution. `hash0(item) & hash1(item)` gives us ~16 set bits, `hash0(item) | hash1(item)` gives us ~48 set bits, etc.
 
-In reality, the bloom filter may have more than 64 bits of storage. In that case, many underlying `u64`s in the block are operated on, and the number of hashes is adjusted to be the number of hashes per `u64` in the block. Additionally, some bits may be set in the usual way to account for any rounding errors.
+In reality, the Bloom filter may have more than 64 bits of storage. In that case, many underlying `u64`s in the block are operated on, and the number of hashes is adjusted to be the number of hashes per `u64` in the block. Additionally, some bits may be set in the usual way to account for any rounding errors.
 
 ## References
 - [Bloom filter - Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter)
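
The arithmetic in the `Note:` bullets of the last hunk (about 20 expected set bits for 24 hashes into 64 bits, and about 32, 16, and 48 set bits for `hash0`, `hash0 & hash1`, and `hash0 | hash1`) can be checked with a few lines of Rust. This is only a sketch of that arithmetic, assuming every hash bit is independent and uniformly random. The helpers `expected_bits_set` and `expected_popcount` are hypothetical names made up for this illustration and are not part of the `fastbloom` API.

```rust
/// Expected number of distinct bits set in an `m`-bit vector after `k`
/// independent, uniformly random positions are set: m - m * ((m - 1) / m)^k.
/// Hypothetical helper for illustration only, not a fastbloom function.
fn expected_bits_set(m: u32, k: u32) -> f64 {
    let m = f64::from(m);
    m - m * ((m - 1.0) / m).powi(k as i32)
}

/// Expected number of 1 bits in a 64-bit word whose bits are each set
/// independently with probability `p`.
/// Hypothetical helper for illustration only, not a fastbloom function.
fn expected_popcount(p: f64) -> f64 {
    64.0 * p
}

fn main() {
    // 64 bits and 24 hashes, as in the README example.
    println!("expected bits set: {:.2}", expected_bits_set(64, 24));

    // Each bit of a single uniformly random 64-bit hash is 1 with probability 1/2.
    let p = 0.5;
    println!("hash0:         {}", expected_popcount(p));
    // AND keeps a bit only if it is set in both hashes.
    println!("hash0 & hash1: {}", expected_popcount(p * p));
    // OR clears a bit only if it is clear in both hashes.
    println!("hash0 | hash1: {}", expected_popcount(1.0 - (1.0 - p) * (1.0 - p)));
}
```

Under these assumptions the results agree with the figures quoted in the README: roughly 20 expected set bits for 24 hashes, and 32, 16, and 48 expected set bits for the three hash combinations.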