Formalization of approximate interface with an exact backing #465

jnievelt · 2015-07-15T19:31:41Z

This idea seems to crop up in a few places. Essentially, exact structures can be taken as a special case of our sketch structures that take up less space.

In theory, Monoids could take advantage of this, and maybe we could simplify some logic by having implicit conversions? And perhaps we could simplify some EventuallyMonoids.

Some examples would be:

String or Set[String] as exact BloomFilter
K or Set[K] as exact HLL
K, (K, Long), or TraversableOnce[(K, Long)] as exact CMS
(K, V) or TraversableOnce[(K, V)] as exact SketchMap


johnynek · 2015-07-15T21:00:51Z

I'd say CMS[K] ~ Map[K, Long] and SketchMap[K, V] ~ Map[K, V]. But how would we use this?

These approximations generally apply to both data structures and methods/functions. How do we codify that?
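
To make the CMS[K] ~ Map[K, Long] correspondence above concrete, here is a minimal sketch of the exact side only. The names ExactCounts and MapCountMonoid are hypothetical, and the Monoid trait is a local stand-in so the snippet compiles without Algebird; it is not the library's actual API.

```scala
object ExactCounts {
  // Local stand-in for a Monoid typeclass (assumption: kept minimal on purpose).
  trait Monoid[T] {
    def zero: T
    def plus(a: T, b: T): T
    def sum(ts: Iterable[T]): T = ts.foldLeft(zero)(plus)
  }

  // Exact counterpart of a count-min sketch: every key maps to its true count.
  class MapCountMonoid[K] extends Monoid[Map[K, Long]] {
    val zero: Map[K, Long] = Map.empty

    // Merge two count maps by adding the counts of shared keys.
    def plus(a: Map[K, Long], b: Map[K, Long]): Map[K, Long] =
      b.foldLeft(a) { case (acc, (k, v)) =>
        acc.updated(k, acc.getOrElse(k, 0L) + v)
      }

    // Single-key constructor: one observation of k.
    def create(k: K): Map[K, Long] = Map(k -> 1L)

    // Exact frequency query; keys never seen have count 0.
    def frequency(m: Map[K, Long], k: K): Long = m.getOrElse(k, 0L)
  }
}
```

The query surface (create, plus, frequency) is the part a sketch would also offer; only the error guarantee differs.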

jnievelt · 2015-07-15T21:44:06Z

Yeah it's not completely clear to me, either.

But it occurs to me that all of these Monoids have/need a 'create' method that does the conversion involved here (more or less). Perhaps we could have types like:

trait Exact[A, E] extends A {
  def exactData: E
}
trait ApproximationMonoid[A, E] extends Monoid[A] {
  def approximate(exactData: E): A
}
trait ExactMonoid[A, E] extends Monoid[Exact[A, E]]

Though this is already problematic due to 'extends A', because these things aren't really traits (SketchMap is a case class, even).

johnynek · 2015-07-24T17:56:39Z

A minor tweak to your suggestion:

// A approximates E
// ideally there would be some metric here, like maybe:
// if we have |approx(f(e1, e2)) - f'(approx(e1), approx(e2))| < eps
// for some approximating transformation f -> f' and some Metric[A].
trait Approximates[A, E, M[_]] {
  def approximate(e: E): A
  // get the approximate typeclass from the exact one.
  def transform(m: M[E]): M[A]
  def metric: Metric[A]
  def error: Double
}

sealed trait MaybeApprox[+A, +E] extends Any
case class Exact[E](exact: E) extends AnyVal with MaybeApprox[Nothing, E]
case class Approx[A](approx: A) extends AnyVal with MaybeApprox[A, Nothing]

// This is basically a renamed version of the EventuallyMonoid, to better reflect what we are using it for,
// with the addition of a new typeclass: Approximates[A, E, Monoid]
case class ApproximateMonoid[A, E](implicit exact: Monoid[E], approx: Approximates[A, E, Monoid]) extends Monoid[MaybeApprox[A, E]] ...
  // we can get Monoid[A] from approx and exact.

jnievelt · 2015-07-24T18:45:17Z

I think the signature of the Monoid would be a bit longer. Something like:

class EventuallyApproxMonoid[A, E](
  makeApprox: E => A,
  p: E => Boolean
)(
  implicit
  exact: Monoid[E],
  approx: Monoid[A]
) extends Monoid[MaybeApprox[A, E]]
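
For illustration, one way the body of such a monoid could look. This is a sketch under assumptions, not the Algebird implementation: Monoid is a local stand-in trait, MaybeApprox is redefined without the value-class detail, and the toy instances (Set[Int] as the exact type, a 64-bit Bloom-style bitmask as the approximation, toMask, the size threshold) are made up for the example.

```scala
object EventuallyApproxSketch {
  // Local stand-in for a Monoid typeclass.
  trait Monoid[T] {
    def zero: T
    def plus(a: T, b: T): T
  }

  sealed trait MaybeApprox[+A, +E]
  final case class Exact[+E](exact: E) extends MaybeApprox[Nothing, E]
  final case class Approx[+A](approx: A) extends MaybeApprox[A, Nothing]

  // Stays exact until p says the exact value is too big, then switches to A for good.
  class EventuallyApproxMonoid[A, E](makeApprox: E => A, p: E => Boolean)(
      implicit exact: Monoid[E], approx: Monoid[A]
  ) extends Monoid[MaybeApprox[A, E]] {
    val zero: MaybeApprox[A, E] = Exact(exact.zero)

    private def toApprox(m: MaybeApprox[A, E]): A = m match {
      case Exact(e)  => makeApprox(e)
      case Approx(a) => a
    }

    def plus(l: MaybeApprox[A, E], r: MaybeApprox[A, E]): MaybeApprox[A, E] =
      (l, r) match {
        case (Exact(a), Exact(b)) =>
          val e = exact.plus(a, b)
          if (p(e)) Approx(makeApprox(e)) else Exact(e)
        // If either side is already approximate, the result is approximate.
        case _ => Approx(approx.plus(toApprox(l), toApprox(r)))
      }
  }

  // Toy instances (hypothetical): set union vs. bitwise OR of a 64-bit mask.
  implicit val setMonoid: Monoid[Set[Int]] = new Monoid[Set[Int]] {
    val zero: Set[Int] = Set.empty[Int]
    def plus(a: Set[Int], b: Set[Int]): Set[Int] = a ++ b
  }
  implicit val maskMonoid: Monoid[Long] = new Monoid[Long] {
    val zero: Long = 0L
    def plus(a: Long, b: Long): Long = a | b
  }

  // Each element sets the bit at (x mod 64); distinct elements may collide.
  def toMask(s: Set[Int]): Long =
    s.foldLeft(0L)((acc, x) => acc | (1L << Math.floorMod(x, 64)))

  // Switch to the approximate form once the exact set has more than 3 elements.
  val monoid: EventuallyApproxMonoid[Long, Set[Int]] =
    new EventuallyApproxMonoid[Long, Set[Int]](toMask, _.size > 3)
}
```

In this toy case toMask commutes with plus (the mask of a union is the OR of the masks), so converting early or late gives the same approximate value. That is the kind of law an Approximates-style typeclass would need to state, up to some error bound, for real sketches.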

sid-kap · 2015-07-31T23:35:29Z

Something like this could be a great way to clean up our test code. I'm thinking about making something like

trait ApproximateProperty {
  type Params
  type Exact
  type Approximate
  type ExactResult
  type ApproximateResult

  def makeApproximate(p: Params, e: Exact): Approximate
  def exactResult(e: Exact): ExactResult
  def approximateResult(a: Approximate): ApproximateResult
  def claim(e: ExactResult, a: ApproximateResult): Boolean
  def probability(p: Params): Double
}

Example implementation:

class CmsProperty[T] extends ApproximateProperty {
  type Params = CMSParams
  type Exact = List[T]
  type Approximate = CountMinSketch[T]
  type ExactResult = Int
  type ApproximateResult = Int

  def makeApproximate(p: CMSParams, exact: List[T]) = {
    val cmsMonoid = CountMinSketchMonoid(p)
    cmsMonoid.sum(exact.map(cmsMonoid.create(_)))
  }
  def exactResult(list: List[T]) = list.count(_ == x) // x: the queried key, not yet defined anywhere
  def approximateResult(cms: CountMinSketch[T]) = cms.frequency(x)
  def claim(e: Int, a: Int) = e == a

  // probability that `claim` is true, using eps and delta in the params
  def probability(p: CMSParams) = ??? 
}

This example doesn't check out because I don't know where to get the x in exactResult and approximateResult. Maybe there should be a def getArbitraryTestInput which returns a suitable x which we can feed into exactResult and approximateResult?

Does this seem like a good idea? My goal is to make something similar to scalacheck's Properties but for approximate properties. This could be useful in bounding the probabilities of test failures and such for each test.
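
A rough sketch of how the missing x could be threaded through as an explicit input type. Everything here is an assumption for illustration: the Input member, holdsFor, ToyCms (a tiny hand-rolled count-min table, not Algebird's CountMinSketch), and the fixed depth and width. Params and probability are left out to keep it short, and the claim is the one-sided count-min guarantee (never undercounts) rather than equality, since that one holds deterministically.

```scala
object ApproxPropertySketch {
  trait ApproximateProperty {
    type Exact
    type Approx
    type Input   // the query (the "x" above), supplied by the test harness
    type Result

    def makeApproximate(e: Exact): Approx
    def exactResult(e: Exact, i: Input): Result
    def approximateResult(a: Approx, i: Input): Result
    def claim(e: Result, a: Result): Boolean

    // Run one check: build the approximation and compare both answers for input i.
    def holdsFor(e: Exact, i: Input): Boolean =
      claim(exactResult(e, i), approximateResult(makeApproximate(e), i))
  }

  // Toy count-min table: `depth` rows of `width` counters, one hash per row.
  final class ToyCms(depth: Int, width: Int, table: Vector[Vector[Long]]) {
    private def col(row: Int, k: String): Int =
      Math.floorMod((k, row).hashCode, width)

    // Increment one counter in every row.
    def add(k: String): ToyCms =
      new ToyCms(depth, width, Vector.tabulate(depth) { r =>
        val c = col(r, k)
        table(r).updated(c, table(r)(c) + 1L)
      })

    // Estimate is the minimum over rows; collisions can only inflate it.
    def frequency(k: String): Long =
      (0 until depth).map(r => table(r)(col(r, k))).min
  }

  object ToyCms {
    def empty(depth: Int, width: Int): ToyCms =
      new ToyCms(depth, width, Vector.fill(depth, width)(0L))
  }

  object CmsNeverUndercounts extends ApproximateProperty {
    type Exact = List[String]
    type Approx = ToyCms
    type Input = String
    type Result = Long

    def makeApproximate(e: List[String]): ToyCms =
      e.foldLeft(ToyCms.empty(3, 16))((cms, k) => cms.add(k))
    def exactResult(e: List[String], i: String): Long = e.count(_ == i).toLong
    def approximateResult(a: ToyCms, i: String): Long = a.frequency(i)
    def claim(e: Long, a: Long): Boolean = e <= a
  }
}
```

A probabilistic claim (for example, an upper bound on the overcount) would bring back probability(p) from the trait above, and the harness could then decide how many failures to tolerate across generated inputs.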

sritchie · 2016-10-24T20:37:01Z

Does anyone mind if I close this and merge the discussion into #44?

sritchie added the on-close-alert label Oct 24, 2016

sritchie mentioned this issue Nov 18, 2016

Approximate collections API. #44

Closed

sritchie removed the on-close-alert label Nov 18, 2016
