Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Saturated bloomfilter size #666

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

cesarcolle
Copy link

I reproduce the error from issue #632

  1. The error from the initial issue is reproduced with the test :
    "return his size event if it's saturated" in {
      val bfMonoid = BloomFilterMonoid[String](5, 13)
      val strings = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).map(_.toString)
      val bf = bfMonoid.create(strings: _*)

      assert(bf.size.min > 0)
    }

throws :

[info] - should return his size event if it's saturated *** FAILED ***
[info]   java.lang.IllegalArgumentException: requirement failed

Due to, I think , #632 (comment) we discuss before.

  1. This is not the only error I found, seem like a saturated bloomfilter also provided a huge max approximation for a given input. But not throw an exception

Coming from the comment (L.135 to 138 ) in Bloomfilter file :

/*
   * approximationWidth defines an interval around the maximum-likelihood cardinality
   * estimate. Namely, the approximation returned is of the form
   * (min, estimate, max) =
   *   ((1 - approxWidth) * estimate, estimate, (1 + approxWidth) * estimate)
   */

But there is a problem between the result from the :

    "return a max approximate size if it's saturated" in {
      val bfMonoid = BloomFilterMonoid[String](5, 13)
      val strings = List(8, 9, 8, 10, 1, 8, 11, 7, 7, 1).map(_.toString)
      val bf = bfMonoid.create(strings: _*)
      // even it's seems big. This assert fail..
      assert(bf.size.max < strings.length * 100)
    }

which return :

[info] - should return a max approximate size if it's saturated *** FAILED ***
[info]   2147483647 was not less than 1000 (BloomFilterTest.scala:459)

2147483647 is a particular number because it's the maximum 32 bits integer you can have.

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good.

Could we possibly revert the style changes for code you are not touching?

@cesarcolle
Copy link
Author

Sorry it's Intelij clean code not working with the .scalafmt.conf of the project. I apply algebird's scalafmt by hand

From the paper linked to the size estimation of a bloomfilter,
we have a inegality mixing t = number bits to 1 of the filter, m=width of filter, numHash=k

        nl <= S-1(t-1)
        nr >= S-1(t+1)
S-1 is defined like :

	S-1(t) = ln(1 - t/m) / k.ln(1 - 1/m)

problem occur when _t = m_

	S-1(m) = ln(0) / k.ln(1 - 1/m)

but *ln(-1/m)* is not correct.

problem is :

	ln( 1 - 1/m) < 0
The scala maths library give :

	scala> val infinity = scala.math.log(0) / -42
	infinity: Double = Infinity

	scala> infinity.round.toInt
	res24: Int = -1

just add special case for the S-1 function.
@codecov-io
Copy link

Codecov Report

Merging #666 into develop will increase coverage by 0.21%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #666      +/-   ##
===========================================
+ Coverage    89.31%   89.52%   +0.21%     
===========================================
  Files          113      113              
  Lines         8944     8945       +1     
  Branches       490      494       +4     
===========================================
+ Hits          7988     8008      +20     
+ Misses         956      937      -19
Impacted Files Coverage Δ
.../main/scala/com/twitter/algebird/BloomFilter.scala 95.15% <100%> (+0.9%) ⬆️
...c/main/scala/com/twitter/algebird/MapAlgebra.scala 75.67% <0%> (+0.9%) ⬆️
.../main/scala/com/twitter/algebird/Applicative.scala 61.76% <0%> (+2.94%) ⬆️
.../main/scala/com/twitter/algebird/Successible.scala 91.66% <0%> (+4.16%) ⬆️
...src/main/scala/com/twitter/algebird/Interval.scala 81.73% <0%> (+5.21%) ⬆️
...ala/com/twitter/algebird/ApproximateProperty.scala 82% <0%> (+10%) ⬆️
...scala/com/twitter/algebird/PredecessibleLaws.scala 86.66% <0%> (+20%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fdb079...ee3b2db. Read the comment docs.

@CLAassistant
Copy link

CLAassistant commented Nov 16, 2019

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants