Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formalization of approximate interface with an exact backing #465

Open
jnievelt opened this issue Jul 15, 2015 · 6 comments
Open

Formalization of approximate interface with an exact backing #465

jnievelt opened this issue Jul 15, 2015 · 6 comments

Comments

@jnievelt
Copy link
Contributor

This idea seems to crop up in a few places. Essentially, exact structures can be taken as a special case of our sketch structures that take up less space.

In theory, Monoids could take advantage of this, and maybe we could simplify some logic by having implicit conversions? And perhaps we could simplify some EventuallyMonoids.

Some examples would be:

  • String or Set[String] as exact BloomFilter
  • K or Set[K] as exact HLL
  • K, (K, Long), or TraversableOnce[(K, Long)] as exact CMS
  • (K, V) or TraversableOnce[(K, V)] as exact SketchMap
@johnynek
Copy link
Collaborator

I'd say CMS[K] ~ Map[K, Long] and SketchMap[K, V] ~ Map[K, V]. But how would we use this?

These approximations are generally for datastructures and methods/functions. How do we codify that?

@jnievelt
Copy link
Contributor Author

Yeah it's not completely clear to me, either.

But it occurs to me that all of these Monoids have/need a 'create' method that does the conversion involved here (more or less). Perhaps we could have types like:

trait Exact[A, E] extends A {
  def exactData: E
}
trait ApproximationMonoid[A, E] extends Monoid[A] {
  def approximate(exactData: E): A
}
trait ExactMonoid[A, E] extends Monoid[Exact[A, E]]

Though this is already problematic due to 'extends A', because these things aren't really traits (SketchMap is a case class, even).

@johnynek
Copy link
Collaborator

A minor tweak to your suggestion:

// A approximates E
  // ideally there would be some metric here, like maybe:
  // if we have |approx(f(e1, e2)) - f'(approx(e1), approx(e2))| < eps
  // for some approximating transformation f -> f' and some Metric[A].
trait Approximates[A, E, M[_]] {
  def approximate(e: E): A
  // get the approximate typeclass from the exact one.
  def transform(m: M[E]): M[A]
  def metric: Metric[A]
  def error: Double
}

sealed trait MaybeApprox[+A, +E] extends Any
case class Exact[E](exact: E) extends MaybeApprox[Nothing, A] with AnyVal
case class Approx[A](approx: A) extends MaybeApprox[A, Nothing] with AnyVal

// This is basically a renamed version of the EventuallyMonoid to be more what we are using it for
// with the addition of a new Typeclass: Approximates[A, E]
case class ApproximateMonoid[A, E](implicit exact: Monoid[E], approx: Approx[A, E]) extends Monoid[MaybeApprox[A, E]] ...
  // we can get Monoid[A] from approx and exact.

@jnievelt
Copy link
Contributor Author

I think the signature of the Monoid would be a bit longer. Something like:

class EventuallyApproxMonoid[A, E](
  makeApprox: E => A,
  p: E => Boolean
)(
  implicit
  exact: Monoid[E],
  approx: Monoid[A]
) extends Monoid[MaybeApprox[A, E]]

@sid-kap
Copy link
Contributor

sid-kap commented Jul 31, 2015

Something like this could be a great way to clean up our test code. I'm thinking about making something like

trait ApproximateProperty {
  type Params
  type Exact
  type Approximate
  type ExactResult
  type ApproximateResult

  def makeApproximate(p: Params, e: Exact): Approximate
  def exactResult(e: Exact): ExactResult
  def approximateResult(a: Approximate): ApproximateResult
  def claim(e: ExactResult, a: ApproximateResult): Boolean
  def probability(p: Params): Double
}

Example implementation:

CmsProperty extends ApproximateProperty {
  type Params = CMSParams
  type Exact = List[T]
  type Approximate = CountMinSketch[T]
  type ExactResult = Int
  type ApproximateResult = Int

  def makeApproximate(p: CMSParams, exact: List[T]) = {
    val cmsMonoid = CountMinSketchMonoid(params)
    cmsMonoid.sum(exact.map(cmsMonoid.create(_)))
  }
  def exactResult(list: List[T]) = list.getFrequency(x)
  def approximateResult(cms: CountMinSketch[T]) = cms.frequency(x)
  def claim(e: Int, a: Int) = e == a

  // probability that `claim` is true, using eps and delta in the params
  def probability(p: CMSParams) = ??? 
}

This example doesn't check out because I don't know where to get the x in exactResult and approximateResult. Maybe there should be a def getArbitraryTestInput which returns a suitable x which we can feed into exactResult and approximateResult?

Does this seem like a good idea? My goal is to make something similar to scalacheck's Properties but for approximate properties. This could be useful in bounding the probabilities of test failures and such for each test.

@sritchie
Copy link
Collaborator

Do we all mind if I close this and merge the discussion into #44 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants