diff --git a/CHANGELOG.rst b/CHANGELOG.rst index 33d63b5b83..40267a84af 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -7,6 +7,116 @@ Note that ``RB_ID=#`` and ``PHAB_ID=#`` correspond to associated message in comm Unreleased ---------- +19.1.0 +------- + +Added +~~~~~ + +* finatra-kafka-streams: SumAggregator and CompositeSumAggregator only support enhanced window + aggregations for the sum operation. Deprecate SumAggregator and CompositeSumAggregator and create + an AggregatorTransformer class that can perform arbitrary aggregations. ``PHAB_ID=D257138`` + +* finatra-streams: Open-source Finatra Streams. Finatra Streams is an integration + between Kafka Streams and Finatra which we've been using internally at Twitter + for the last year. The library is now open-source. + ``PHAB_ID=D248408`` + +* inject-server: Add lint rule to alert when deprecated `util-logging` JUL flags from the + `c.t.inject.server.DeprecatedLogging` trait are user defined. This trait was mixed-in + only for backwards compatibility when TwitterServer was moved to the slf4j-api and the flags are + not expected to be configured. By default, `util-app` based applications will fail to start if + they are passed a flag value at startup which they do not define. Users should instead configure + their chosen slf4j-api logging implementation directly. ``PHAB_ID=D256489`` + +* finatra-thrift: `c.t.finatra.thrift.Controllers` now support per-method filtering and + access to headers via `c.t.scrooge.{Request, Response}` wrappers. To use this new + functionality, create a `Controller` which extends the + `c.t.finatra.thrift.Controller(SomeThriftService)` abstract class instead of constructing a + Controller that mixes in the `SomeThriftService.BaseServiceIface` trait. With this, you can now + provide implementations in the form of `c.t.scrooge.Request`/`c.t.scrooge.Response` wrappers by calling + the `handle(ThriftMethod)` method. Note that a `Controller` constructed this way cannot also + extend a `BaseServiceIface`. + + handle(SomeMethod).filtered(someFilter).withFn { req: Request[SomeMethod.Args] => + val requestHeaders = req.headers + // .. implementation here + + // response: Future[Response[SomeMethod.SuccessType]] + } + + Note that if `Request`/`Response` based implementations are used, the types on any + existing `ExceptionMappers` should be adjusted accordingly. Also, if a `DarkTrafficFilterModule` + was previously used, it must be swapped out for a `ReqRepDarkTrafficFilterModule`. + ``PHAB_ID=D236724`` + +Changed +~~~~~~~ + +* inject-core, inject-server: Remove deprecated `@Bind` support from test mixins. Users should + instead prefer using the `bind[T] `__ + DSL in tests. ``PHAB_ID=D250325`` + +* inject-app: Remove deprecated `bind[T]` DSL methods from `c.t.inject.app.BindDSL`. + + Instead of: + + .. code:: scala + + injector.bind[T](instance) + injector.bind[T, Ann](instance) + injector.bind[T](ann, instance) + + Users should instead use the more expressive forms of these methods, e.g.,: + + .. code:: scala + + injector.bind[T].toInstance(instance) + injector.bind[T].annotatedWith[Ann].toInstance(instance) + injector.bind[T].annotatedWith(ann).toInstance(instance) + + which more closely mirrors the scala-guice binding DSL. 
``PHAB_ID=D255591`` + +* finatra-thrift: For services that wish to support dark traffic over + `c.t.scrooge.Request`/`c.t.scrooge.Response`-based services, a new dark traffic module is + available: `c.t.finatra.thrift.modules.ReqRepDarkTrafficFilterModule` ``PHAB_ID=D236724`` + +* finatra-thrift: Creating a `c.t.finatra.thrift.Controller` that extends a + `ThriftService.BaseServiceIface` has been deprecated. See the related bullet point in "Added" with + the corresponding PHAB_ID to this one for how to migrate. ``PHAB_ID=D236724`` + +* inject-core, inject-server: Remove deprecated `WordSpec` testing utilities. The framework + default ScalaTest testing style is `FunSuite` though users are free to mix their testing + style of choice with the framework provided test mixins as per the + `documentation `__. + ``PHAB_ID=D255094`` + +* finatra-thrift: Instead of failing (potentially silently) + `c.t.finatra.thrift.routing.ThriftWarmup` now explicitly checks that it is + using a properly configured `c.t.finatra.thrift.routing.Router` ``PHAB_ID=D253603`` + +* finatra-inject: `c.t.finatra.inject.server.PortUtils` has been modified to + work with `c.t.f.ListeningServer` only. Methods which worked with the + now-removed `c.t.f.b.Server` have been modified or removed. + ``PHAB_ID=D254339`` + +* finatra-kafka-streams: Finatra Queryable State methods currently require the window size + to be passed into query methods for windowed key value stores. This is unnecessary, as + the queryable state class can be passed the window size at construction time. We also now + save off all FinatraKeyValueStores in a global manager class to allow query services + (e.g. thrift) to access the same KeyValueStore implementation that the FinatraTransformer + is using. ``PHAB_ID=D256920`` + +Fixed +~~~~~ + +* finatra-kafka-streams: Fix bug where KeyValueStore#isOpen was throwing an + exception when called on an uninitialized key value store + ``PHAB_ID=D257635`` + +Closed +~~~~~~ + 18.12.0 ------- @@ -16,6 +126,12 @@ Added Changed ~~~~~~~ +* finatra-thrift: `c.t.finatra.thrift.Controller` is now an abstract class + rather than a trait. ``PHAB_ID=D251314`` + +* finatra-thrift: `c.t.finatra.thrift.internal.ThriftMethodService` is now + private. ``PHAB_ID=D251186`` + * finatra-thrift: `c.t.finatra.thrift.exceptions.FinatraThriftExceptionMapper` and `c.t.finatra.thrift.exceptions.FinatraJavaThriftExceptionMapper` now extend `ExceptionManager[Throwable, Nothing]` since the return type was never used. They are @@ -40,7 +156,7 @@ Changed Fixed ~~~~~ -* finatra-http: Validate headers to prevent header injection vulnerability. ``PHAB_ID=D246889`` +* finatra-http: Validate headers to prevent header injection vulnerability. ``PHAB_ID=D246889`` Closed ~~~~~~ diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e42ef3ce3b..9c37458881 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -69,7 +69,7 @@ the maintainers will help out during code review. We've standardized on using the [ScalaTest testing framework][scalatest]. Because ScalaTest has such a big surface area, we use a restricted subset of it in our tests to keep them easy to read. We've chosen the `Matchers` API, and we -use the [`WordSpec` mixin][wordspec]. Please mixin our [Test trait][test-trait] +use the [`FunSuite` mixin][funsuite]. Please mixin our [Test trait][test-trait] to get these defaults. Note that while you will see a [Travis CI][travis-ci] status message in your @@ -200,13 +200,13 @@ Scaladocs. Please file an [issue](https://github.com/twitter/finatra/issues). 
[finagle-repo]: https://github.com/twitter/finagle [util-repo]: https://github.com/twitter/util [effectivescala]: https://twitter.github.io/effectivescala/ -[wordspec]: http://doc.scalatest.org/2.2.1/#org.scalatest.WordSpec +[funsuite]: http://doc.scalatest.org/2.2.1/#org.scalatest.FunSuite [scalatest]: http://www.scalatest.org/ -[scala-style-guide]: http://docs.scala-lang.org/style/scaladoc.html -[sbt]: http://www.scala-sbt.org/ +[scala-style-guide]: https://docs.scala-lang.org/style/index.html +[sbt]: https://www.scala-sbt.org/ [travis-ci]: https://travis-ci.org/twitter/finatra [test-trait]: https://github.com/twitter/finatra/blob/develop/inject/inject-core/src/test/scala/com/twitter/inject/Test.scala -[scaladoc]: http://docs.scala-lang.org/style/scaladoc.html +[scaladoc]: https://docs.scala-lang.org/style/scaladoc.html [scalacheck]: https://www.scalacheck.org/ [gendrivenprop]: http://www.scalatest.org/user_guide/generator_driven_property_checks diff --git a/build.sbt b/build.sbt index 3aa2e8e5f9..92490aba9c 100644 --- a/build.sbt +++ b/build.sbt @@ -4,7 +4,7 @@ import scoverage.ScoverageKeys concurrentRestrictions in Global += Tags.limit(Tags.Test, 1) // All Twitter library releases are date versioned as YY.MM.patch -val releaseVersion = "18.12.0" +val releaseVersion = "19.1.0" lazy val buildSettings = Seq( version := releaseVersion, @@ -51,21 +51,26 @@ lazy val versions = new { // All Twitter library releases are date versioned as YY.MM.patch val twLibVersion = releaseVersion + val agrona = "0.9.22" + val bijectionCore = "0.9.5" val commonsCodec = "1.9" val commonsFileupload = "1.3.1" val commonsIo = "2.4" val commonsLang = "2.6" + val fastutil = "8.1.1" val guava = "19.0" val guice = "4.0" val jackson = "2.9.6" val jodaConvert = "1.2" val jodaTime = "2.5" val junit = "4.12" + val kafka = "2.0.1" val libThrift = "0.10.0" val logback = "1.1.7" val mockito = "1.9.5" val mustache = "0.8.18" val nscalaTime = "2.14.0" + val rocksdbjni = "5.14.2" val scalaCheck = "1.13.4" val scalaGuice = "4.1.0" val scalaTest = "3.0.0" @@ -213,7 +218,7 @@ lazy val finatraModules = Seq[sbt.ProjectReference]( httpclient, injectApp, injectCore, - injectLogback, + injectLogback, injectModules, injectRequestScope, injectServer, @@ -223,6 +228,12 @@ lazy val finatraModules = Seq[sbt.ProjectReference]( injectThriftClientHttpMapper, injectUtils, jackson, + kafka, + kafkaStreams, + kafkaStreamsPrerestore, + kafkaStreamsStaticPartitioning, + kafkaStreamsQueryableThriftClient, + kafkaStreamsQueryableThrift, thrift, utils) @@ -275,8 +286,6 @@ lazy val injectCoreTestJarSources = "com/twitter/inject/Test", "com/twitter/inject/TestMixin", "com/twitter/inject/TwitterTestModule", - "com/twitter/inject/WordSpecIntegrationTest", - "com/twitter/inject/WordSpecTest", "org/specs2/matcher/ScalaTestExpectations") lazy val injectCore = (project in file("inject/inject-core")) .settings(projectSettings) @@ -406,8 +415,7 @@ lazy val injectServerTestJarSources = "com/twitter/inject/server/EmbeddedTwitterServer", "com/twitter/inject/server/FeatureTest", "com/twitter/inject/server/FeatureTestMixin", - "com/twitter/inject/server/package", - "com/twitter/inject/server/WordSpecFeatureTest") + "com/twitter/inject/server/package") lazy val injectServer = (project in file("inject/inject-server")) .settings(projectSettings) .settings( @@ -757,12 +765,180 @@ lazy val injectThriftClientHttpMapper = (project in file("inject-thrift-client-h injectThriftClient % "test->test;compile->compile", thrift % "test->test;test->compile") +lazy val 
kafkaStreamsExclusionRules = Seq( + ExclusionRule("javax.ws.rs", "javax.ws.rs-api"), + ExclusionRule("log4j", "log4j"), + ExclusionRule("org.slf4j", "slf4j-log4j12")) + +lazy val kafkaTestJarSources = + Seq("com/twitter/finatra/kafka/test/EmbeddedKafka", + "com/twitter/finatra/kafka/test/KafkaTopic", + "com/twitter/finatra/kafka/test/utils/ThreadUtils", + "com/twitter/finatra/kafka/test/utils/PollUtils", + "com/twitter/finatra/kafka/test/utils/InMemoryStatsUtil", + "com/twitter/finatra/kafka/test/KafkaFeatureTest", + "com/twitter/finatra/kafka/test/KafkaStateStore") +lazy val kafka = (project in file("kafka")) + .settings(projectSettings) + .settings( + name := "finatra-kafka", + moduleName := "finatra-kafka", + ScoverageKeys.coverageExcludedPackages := ";.*", + libraryDependencies ++= Seq( + "com.twitter" %% "finagle-core" % versions.twLibVersion, + "com.twitter" %% "finagle-exp" % versions.twLibVersion, + "com.twitter" %% "finagle-thrift" % versions.twLibVersion, + "com.twitter" %% "scrooge-serializer" % versions.twLibVersion, + "com.twitter" %% "util-core" % versions.twLibVersion, + "org.apache.kafka" %% "kafka" % versions.kafka % "compile->compile;test->test", + "org.apache.kafka" %% "kafka" % versions.kafka % "test" classifier "test", + "org.apache.kafka" % "kafka-clients" % versions.kafka % "test->test", + "org.apache.kafka" % "kafka-clients" % versions.kafka % "test" classifier "test", + "org.apache.kafka" % "kafka-streams" % versions.kafka % "compile->compile;test->test", + "org.apache.kafka" % "kafka-streams" % versions.kafka % "test" classifier "test", + "org.apache.kafka" % "kafka-streams-test-utils" % versions.kafka % "compile->compile;test->test", + "org.apache.kafka" % "kafka-streams-test-utils" % versions.kafka % "test" classifier "test", + "org.slf4j" % "slf4j-api" % versions.slf4j % "compile->compile;test->test" + ), + excludeDependencies in Test ++= kafkaStreamsExclusionRules, + excludeDependencies ++= kafkaStreamsExclusionRules, + scroogeThriftIncludeFolders in Test := Seq(file("src/test/thrift")), + scroogeLanguages in Test := Seq("scala"), + excludeFilter in unmanagedResources := "BUILD", + publishArtifact in Test := true, + mappings in (Test, packageBin) := { + val previous = (mappings in (Test, packageBin)).value + previous.filter(mappingContainsAnyPath(_, kafkaTestJarSources)) + }, + mappings in (Test, packageDoc) := { + val previous = (mappings in (Test, packageDoc)).value + previous.filter(mappingContainsAnyPath(_, kafkaTestJarSources)) + }, + mappings in (Test, packageSrc) := { + val previous = (mappings in (Test, packageSrc)).value + previous.filter(mappingContainsAnyPath(_, kafkaTestJarSources)) + } + ).dependsOn( + injectCore % "test->test;compile->compile", + injectSlf4j % "test->test;compile->compile", + injectUtils % "test->test;compile->compile", + jackson % "test->test", + utils % "test->test;compile->compile") + +lazy val kafkaStreamsQueryableThriftClient = (project in file("kafka-streams/kafka-streams-queryable-thrift-client")) + .settings(projectSettings) + .settings( + name := "finatra-kafka-streams-queryable-thrift-client", + moduleName := "finatra-kafka-streams-queryable-thrift-client", + ScoverageKeys.coverageExcludedPackages := ";.*", + libraryDependencies ++= Seq( + "com.twitter" %% "finagle-serversets" % versions.twLibVersion + ), + excludeDependencies in Test ++= kafkaStreamsExclusionRules, + excludeDependencies ++= kafkaStreamsExclusionRules, + excludeFilter in unmanagedResources := "BUILD" + ).dependsOn( + injectCore % 
"test->test;compile->compile", + injectSlf4j % "test->test;compile->compile", + injectUtils % "test->test;compile->compile", + thrift % "test->test;compile->compile", + utils % "test->test;compile->compile") + +lazy val kafkaStreamsStaticPartitioning = (project in file("kafka-streams/kafka-streams-static-partitioning")) + .settings(projectSettings) + .settings( + name := "finatra-kafka-streams-static-partitioning", + moduleName := "finatra-kafka-streams-static-partitioning", + ScoverageKeys.coverageExcludedPackages := ";.*", + excludeDependencies in Test ++= kafkaStreamsExclusionRules, + excludeDependencies ++= kafkaStreamsExclusionRules, + excludeFilter in unmanagedResources := "BUILD" + ).dependsOn( + injectCore % "test->test;compile->compile", + injectSlf4j % "test->test;compile->compile", + injectUtils % "test->test;compile->compile", + kafkaStreams % "test->test;compile->compile", + kafkaStreamsQueryableThriftClient % "test->test;compile->compile", + thrift % "test->test;compile->compile", + utils % "test->test;compile->compile") + +lazy val kafkaStreamsPrerestore = (project in file("kafka-streams/kafka-streams-prerestore")) + .settings(projectSettings) + .settings( + name := "finatra-kafka-streams-prerestore", + moduleName := "finatra-kafka-streams-prerestore", + ScoverageKeys.coverageExcludedPackages := ";.*", + excludeDependencies in Test ++= kafkaStreamsExclusionRules, + excludeDependencies ++= kafkaStreamsExclusionRules, + excludeFilter in unmanagedResources := "BUILD" + ).dependsOn( + injectCore % "test->test;compile->compile", + injectSlf4j % "test->test;compile->compile", + injectUtils % "test->test;compile->compile", + kafkaStreams % "test->test;compile->compile", + kafkaStreamsStaticPartitioning % "test->test;compile->compile", + thrift % "test->test;compile->compile", + utils % "test->test;compile->compile") + +lazy val kafkaStreamsQueryableThrift = (project in file("kafka-streams/kafka-streams-queryable-thrift")) + .settings(projectSettings) + .settings( + name := "finatra-kafka-streams-queryable-thrift", + moduleName := "finatra-kafka-streams-queryable-thrift", + ScoverageKeys.coverageExcludedPackages := ";.*", + excludeDependencies in Test ++= kafkaStreamsExclusionRules, + excludeDependencies ++= kafkaStreamsExclusionRules, + scroogeThriftIncludeFolders in Compile := Seq(file("src/test/thrift")), + scroogeLanguages in Compile := Seq("java", "scala"), + scroogeLanguages in Test := Seq("java", "scala"), + excludeFilter in unmanagedResources := "BUILD" + ).dependsOn( + injectCore % "test->test;compile->compile", + injectSlf4j % "test->test;compile->compile", + injectUtils % "test->test;compile->compile", + kafkaStreams % "test->test;compile->compile", + kafkaStreamsQueryableThriftClient % "test->test;compile->compile", + kafkaStreamsStaticPartitioning % "test->test;compile->compile", + thrift % "test->test;compile->compile", + utils % "test->test;compile->compile") + +lazy val kafkaStreams = (project in file("kafka-streams/kafka-streams")) + .settings(projectSettings) + .settings( + name := "finatra-kafka-streams", + moduleName := "finatra-kafka-streams", + ScoverageKeys.coverageExcludedPackages := ";.*", + libraryDependencies ++= Seq( + "it.unimi.dsi" % "fastutil" % versions.fastutil, + "jakarta.ws.rs" % "jakarta.ws.rs-api" % "2.1.3", + "org.agrona" % "agrona" % versions.agrona, + "org.apache.kafka" %% "kafka-streams-scala" % versions.kafka % "compile->compile;test->test", + "org.rocksdb" % "rocksdbjni" % versions.rocksdbjni % "provided;compile->compile;test->test", + 
"org.apache.kafka" % "kafka-streams" % versions.kafka % "compile->compile;test->test", + "org.apache.kafka" % "kafka-streams" % versions.kafka % "test" classifier "test", + ), + excludeDependencies in Test ++= kafkaStreamsExclusionRules, + excludeDependencies ++= kafkaStreamsExclusionRules, + excludeFilter in unmanagedResources := "BUILD", + publishArtifact in Test := true + ).dependsOn( + injectCore % "test->test;compile->compile", + injectLogback % "test->test", + injectSlf4j % "test->test;compile->compile", + injectUtils % "test->test;compile->compile", + jackson % "test->test;compile->compile", + kafka % "test->test;compile->compile", + kafkaStreamsQueryableThriftClient % "test->test;compile->compile", + thrift % "test->test", + utils % "test->test;compile->compile") + + lazy val site = (project in file("doc")) .enablePlugins(SphinxPlugin) .settings( - baseSettings ++ buildSettings ++ Seq( - scalacOptions in doc ++= Seq("-doc-title", "Finatra", "-doc-version", version.value), - includeFilter in Sphinx := ("*.html" | "*.png" | "*.svg" | "*.js" | "*.css" | "*.gif" | "*.txt"))) + baseSettings ++ buildSettings ++ Seq( + scalacOptions in doc ++= Seq("-doc-title", "Finatra", "-doc-version", version.value), + includeFilter in Sphinx := ("*.html" | "*.png" | "*.svg" | "*.js" | "*.css" | "*.gif" | "*.txt"))) // START EXAMPLES diff --git a/doc/src/sphinx/user-guide/index.rst b/doc/src/sphinx/user-guide/index.rst index 374f028a84..3a3f589260 100644 --- a/doc/src/sphinx/user-guide/index.rst +++ b/doc/src/sphinx/user-guide/index.rst @@ -73,6 +73,13 @@ Clients - :doc:`thrift/clients` +Kafka Streams +------------- + +- :doc:`kafka-streams/index` +- :doc:`kafka-streams/examples` +- :doc:`kafka-streams/testing` + Testing ------- @@ -124,6 +131,9 @@ Testing thrift/exceptions thrift/warmup thrift/clients + kafka-streams/index + kafka-streams/examples + kafka-streams/testing testing/index testing/embedded testing/feature_tests diff --git a/doc/src/sphinx/user-guide/kafka-streams/examples.rst b/doc/src/sphinx/user-guide/kafka-streams/examples.rst new file mode 100644 index 0000000000..25921a2197 --- /dev/null +++ b/doc/src/sphinx/user-guide/kafka-streams/examples.rst @@ -0,0 +1,55 @@ +.. _kafka-streams_examples: + +Examples +======== + +The `integration tests `__ serve as a good collection of example Finatra Kafka Streams servers. + +Word Count Server +----------------- + +We can build a lightweight server which counts the unique words from an input topic, storing the results in RocksDB. + +.. code:: scala + + class WordCountRocksDbServer extends KafkaStreamsTwitterServer { + + override val name = "wordcount" + private val countStoreName = "CountsStore" + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder.asScala + .stream[Bytes, String]("TextLinesTopic")(Consumed.`with`(Serdes.Bytes, Serdes.String)) + .flatMapValues(_.split(' ')) + .groupBy((_, word) => word)(Serialized.`with`(Serdes.String, Serdes.String)) + .count()(Materialized.as(countStoreName)) + .toStream + .to("WordsWithCountsTopic")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } + } + +Queryable State +~~~~~~~~~~~~~~~ + +We can then expose a Thrift endpoint enabling clients to directly query the state via `interactive queries `__. + +.. code:: scala + + class WordCountRocksDbServer extends KafkaStreamsTwitterServer with QueryableState { + + ... 
+ + final override def configureThrift(router: ThriftRouter): Unit = { + router + .add( + new WordCountQueryService( + queryableFinatraKeyValueStore[String, Long]( + storeName = countStoreName, + primaryKeySerde = Serdes.String + ) + ) + ) + } + } + +In this example, ``WordCountQueryService`` is an underlying Thrift service. \ No newline at end of file diff --git a/doc/src/sphinx/user-guide/kafka-streams/index.rst b/doc/src/sphinx/user-guide/kafka-streams/index.rst new file mode 100644 index 0000000000..cd113f9652 --- /dev/null +++ b/doc/src/sphinx/user-guide/kafka-streams/index.rst @@ -0,0 +1,56 @@ +.. _kafka-streams: + +Finatra Kafka Streams +===================== + +Finatra has native integration with `Kafka Streams `__ to easily build Kafka Streams applications on top of a `TwitterServer `__. + +Features +-------- + +- Intuitive `DSL `__ for topology creation, compatible with the `Kafka Streams DSL `__ +- Full Kafka Streams metric integration, exposed as `TwitterServer Metrics `__ +- `RocksDB integration <#rocksdb>`__ +- `Queryable State <#queryable-state>`__ +- `Rich testing functionality `__ + +Basics +------ + +With `KafkaStreamsTwitterServer `__, +a fully functional service can be written by simply configuring the Kafka Streams Builder via the ``configureKafkaStreams()`` lifecycle method. See the `examples `__ section. + +Transformers +~~~~~~~~~~~~ + +Implement custom `transformers `__ using `FinatraTransformerV2 `__. + +Aggregations +^^^^^^^^^^^^ + +There are several included aggregating transformers, which may be used when configuring a ``StreamsBuilder`` + + ``sample`` + + ``sum`` + + ``compositeSum`` + +Stores +------ + +RocksDB +~~~~~~~ + +In addition to using `state stores `__, you may also use a RocksDB-backed store. This affords all of the advantages of using `RocksDB `__, including efficient range scans. + +Queryable State +~~~~~~~~~~~~~~~ + +Finatra Kafka Streams supports directly querying state from a store. This can be useful for creating a service that serves data aggregated within a local Topology. You can use `static partitioning `__ to query an instance deterministically known to hold a key. + +See how queryable state is used in the following `example `__. + +Queryable Stores +^^^^^^^^^^^^^^^^ + + - `QueryableFinatraKeyValueStore `__ + - `QueryableFinatraWindowStore `__ + - `QueryableFinatraCompositeWindowStore `__ diff --git a/doc/src/sphinx/user-guide/kafka-streams/testing.rst b/doc/src/sphinx/user-guide/kafka-streams/testing.rst new file mode 100644 index 0000000000..65bd74b8c5 --- /dev/null +++ b/doc/src/sphinx/user-guide/kafka-streams/testing.rst @@ -0,0 +1,6 @@ +.. _kafka-streams_testing: + +Testing +======= + +Finatra Kafka Streams includes tooling that simplifies the process of writing highly testable services. See `TopologyFeatureTest `__, which includes a `FinatraTopologyTester `__ that integrates Kafka Streams' `TopologyTestDriver `__ with a `KafkaStreamsTwitterServer `__. \ No newline at end of file diff --git a/doc/src/sphinx/user-guide/testing/feature_tests.rst b/doc/src/sphinx/user-guide/testing/feature_tests.rst index 16d8e87cb6..42e648b6a3 100644 --- a/doc/src/sphinx/user-guide/testing/feature_tests.rst +++ b/doc/src/sphinx/user-guide/testing/feature_tests.rst @@ -52,7 +52,7 @@ from the |EmbeddedThriftServer|_. .. 
code:: scala import com.example.thriftscala.ExampleThrift - import com.twitter.conversions.time._ + import com.twitter.conversions.DurationOps._ import com.twitter.finatra.thrift.EmbeddedThriftServer import com.twitter.inject.server.FeatureTest import com.twitter.util.Await @@ -136,7 +136,7 @@ could close the client in the ScalaTest `afterAll` lifecycle block. E.g., .. code:: scala import com.example.thriftscala.ExampleThrift - import com.twitter.conversions.time._ + import com.twitter.conversions.DurationOps._ import com.twitter.finatra.thrift.EmbeddedThriftServer import com.twitter.inject.server.FeatureTest import com.twitter.util.Await @@ -167,7 +167,7 @@ then you can feature test by constructing an `EmbeddedHttpServer with ThriftClie .. code:: scala import com.example.thriftscala.ExampleThrift - import com.twitter.conversions.time._ + import com.twitter.conversions.DurationOps._ import com.twitter.finatra.http.EmbeddedHttpServer import com.twitter.finatra.thrift.ThriftClient import com.twitter.inject.server.FeatureTest @@ -246,8 +246,8 @@ For example, we could define a "base" testing trait: override val overrideModules = Seq(???) }, flags = flags - ).bind[Foo](bar) - .bind[Baz](bazImpl) + ).bind[Foo].toInstance(bar) + .bind[Baz].toInstance(bazImpl) } This "base" trait can define a method for obtaining a properly configured Embedded server for test diff --git a/doc/src/sphinx/user-guide/testing/index.rst b/doc/src/sphinx/user-guide/testing/index.rst index 9c702baaf6..0b789ca33c 100644 --- a/doc/src/sphinx/user-guide/testing/index.rst +++ b/doc/src/sphinx/user-guide/testing/index.rst @@ -42,14 +42,13 @@ in Finatra revolves around the following definitions: `ScalaTest `__ ----------------------------------------- -The Finatra testing framework is in transition from the `WordSpec `__ -ScalaTest `testing style `__ to `FunSuite `__ -for framework testing and to facilitate the types of testing outlined above we have several testing -traits to aid in creating simple and powerful tests. +The Finatra testing framework uses the Twitter recommended ScalaTest `testing style `__ `FunSuite `__ for framework testing and to +facilitate the types of testing outlined above we have several testing traits to aid in creating simple +and powerful tests. For more information on `ScalaTest `__, see the `ScalaTest User Guide `__. -To make use of another ScalaTest test style, such as `FunSpec `__ +To make use of another ScalaTest testing style, such as `FunSpec `__ or others, see `Test Mixins `__. 
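For a concrete sense of the default style, a minimal `FunSuite`-style test built on `c.t.inject.Test` might look like the following sketch (the `EchoService` class under test is hypothetical and shown only to keep the example self-contained):

.. code:: scala

    import com.twitter.inject.Test

    // Hypothetical class under test, included only to make the sketch self-contained.
    class EchoService {
      def echo(msg: String): String = msg
    }

    class EchoServiceTest extends Test {

      private val service = new EchoService

      test("EchoService#echo returns its input unchanged") {
        service.echo("hello") should equal("hello")
      }
    }

The `test(...)` blocks and `Matchers`-style assertions come from the `FunSuite` and `Matchers` defaults provided by the framework test traits.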
More Information diff --git a/doc/src/sphinx/user-guide/testing/mixins.rst b/doc/src/sphinx/user-guide/testing/mixins.rst index a5459f33f6..821c4acaab 100644 --- a/doc/src/sphinx/user-guide/testing/mixins.rst +++ b/doc/src/sphinx/user-guide/testing/mixins.rst @@ -11,12 +11,6 @@ You can use this ScalaTest test style by extending either: - |c.t.inject.IntegrationTest|_ - |c.t.inject.server.FeatureTest|_ -There are also deprecated versions which mix-in the `WordSpec `__ testing style: - -- `c.t.inject.WordSpecTest` -- `c.t.inject.WordSpecIntegrationTest` -- `c.t.inject.server.WordSpecFeatureTest` - However, you are free to choose a ScalaTest testing style that suits your team by using the test mixin companion classes directly and mix in your preferred ScalaTest style: - |c.t.inject.TestMixin|_ diff --git a/doc/src/sphinx/user-guide/thrift/controllers.rst b/doc/src/sphinx/user-guide/thrift/controllers.rst index 4ad323622f..5ab21558f5 100644 --- a/doc/src/sphinx/user-guide/thrift/controllers.rst +++ b/doc/src/sphinx/user-guide/thrift/controllers.rst @@ -1,27 +1,14 @@ -.. _thrift_controllers: +.. _thrift_Controllers: Defining Thrift Controllers =========================== -A *Thrift Controller* is an implementation of your thrift service. To create the controller, extend the `c.t.finatra.thrift.Controller `__ trait and mix-in the `Scrooge `__-generated `BaseServiceIface` trait for your service. Scrooge generates a `ServiceIface` which is a case class containing a `Service` for each thrift method over the corresponding `Args` and `SuccessType` structures for the method that extends from the `BaseServiceIface` trait. E.g, +A *Thrift Controller* is an implementation of your thrift service. To create the Controller, extend the `c.t.finatra.thrift.Controller `__ with the generated thrift service as its argument. Scrooge generates a `GeneratedThriftService` which is a class containing information about the various `ThriftMethods` and types that this service defines. Each `ThriftMethod` defines an `Args` type and `SuccessType` type. When creating a `Controller`, you must provide exactly one implementation for each method defined in your Thrift service using the `handle(ThriftMethod)` DSL. -.. code:: scala - - case class ServiceIface( - fetchBlob: Service[FetchBlob.Args, FetchBlob.SuccessType] - ) extends BaseServiceIface - - -For Thrift Controllers we use the `BaseServiceIface` trait since we are not able to extend the `ServiceIface` case class. - -.. note:: - - The generated `BaseServiceIface` was deprecated on 2017-11-07, but Finatra still only supports `BaseServiceIface` and it is safe to ignore the deprecation warning until Finatra supports `ServicePerEndpoint`. +Implementing methods with `handle(ThriftMethod)` +------------------------------------------------ -`handle(ThriftMethod)` DSL --------------------------- - -The Finatra `c.t.finatra.thrift.Controller` provides a DSL with which you can easily implement your thrift service methods via the `handle(ThriftMethod) `__ function which takes a callback from `ThriftMethod.Args => Future[ThriftMethod.SuccessType]`. +The Finatra `c.t.finatra.thrift.Controller` provides a DSL with which you can implement your thrift service methods via the `handle(ThriftMethod) `__ function. 
Using this DSL, you can apply `TypeAgnostic` `Filters` to the handling of methods as well as provide an implementation in the form of a function from `ThriftMethod.Args => Future[ThriftMethod.SuccessType]`, `Request[ThriftMethod.Args] => Future[Response[ThriftMethod.SuccessType]]`, or `Service[Request[ThriftMethod.Args], Response[ThriftMethod.SuccessType]]`. For example, given the following thrift IDL: `example_service.thrift` @@ -39,7 +26,7 @@ For example, given the following thrift IDL: `example_service.thrift` ) throws ( 1: finatra_thrift_exceptions.ServerError serverError, 2: finatra_thrift_exceptions.UnknownClientIdError unknownClientIdError - 3: finatra_thrift_exceptions.NoClientIdError noClientIdError + 3: finatra_thrift_exceptions.NoClientIdError kClientError ) } @@ -53,27 +40,46 @@ We can implement the following Thrift Controller: import com.twitter.util.Future class ExampleThriftController - extends Controller - with ExampleService.BaseServiceIface { + extends Controller(ExampleService) { - override val add1 = handle(Add1) { args: Add1.Args => + val addFilter: Filter.TypeAgnostic = { ... } + + handle(Add1).filtered(addFilter) { args: Add1.Args => Future(args.num + 1) } } -The `handle(ThriftMethod)` function may seem magical but it serves an important purpose. By implementing your service method via this function, it allows the framework to apply the configured filter chain defined in your `server definition <../build-new-thrift-server#server-definition>`__ to your method implementation (passed as the callback to `handle(ThriftMethod)`). +The `handle(ThriftMethod)` function may seem magical but it serves an important purpose. By implementing your service method via this function, it allows the framework to apply the configured global filter chain defined in your `server definition <../build-new-thrift-server#server-definition>`__ to your method implementation (passed as the callback to `handle(ThriftMethod)`). -That is to say, the `handle(ThriftMethod)` function captures your method implementation then exposes it for the `ThriftRouter `__ to combine with the configured filter chain to build the `Finagle Service `__ that represents your server. +That is to say, the `handle(ThriftMethod)` function captures the filters applied to that particular method, plus your method implementation, and then exposes them for the `ThriftRouter `__ to combine with the configured global filter chain to build the `Finagle Service `__ that represents your server. See the `Filters `__ section for more information on adding filters to your server definition. -Note, in the example above we implement the `ExampleService.BaseServiceIface#add1` method to satisfy the `ExampleService.BaseServiceIface` interface -- however, the framework will not call the `add1` method in this way as it uses the implementation of the thrift method captured by the `handle(ThriftMethod)` function (as mentioned above this in order to apply the configured filter chain to requests). Thus if you were to directly call `ExampleThriftController.add1(request)` this would by-pass any configured `filters `__ from the server definition. +When creating a Controller to handle a `ThriftService`, all methods defined in the thrift service must have exactly one implementation - that is, there should be exactly one call to `handle(ThriftMethod)` for each thrift method defined. Anything else will result in the Finatra service failing at runtime. 
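To make the one-implementation-per-method rule concrete, the following sketch covers two methods exactly once each using the plain function form described above (it assumes the IDL also defines an `Add2` method, as used in the modularized example later on this page; names are illustrative):

.. code:: scala

    import com.twitter.example.thriftscala.ExampleService
    import com.twitter.finatra.thrift.Controller
    import com.twitter.util.Future

    class CompleteExampleThriftController extends Controller(ExampleService) {

      // exactly one handle(...) call for each method defined in the IDL
      handle(Add1) { args: Add1.Args =>
        Future(args.num + 1)
      }

      handle(Add2) { args: Add2.Args =>
        Future(args.num + 2)
      }
    }

Omitting a method, or registering the same method more than once, surfaces as the runtime failure described above.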
-Ensure you override using `val` -------------------------------- +Scrooge `Request` and `Response` Wrappers +----------------------------------------- +By providing an implementation that is aware of the Scrooge-generated `Request` and `Response` wrappers, header data is available. Using the earlier `ExampleThrift`, we can construct a Controller that examines header information like this: + +.. code:: scala + + import com.twitter.example.thriftscala.ExampleService + import com.twitter.finatra.thrift.Controller + import com.twitter.util.Future + import com.twitter.scrooge.{Request, Response} + + class ExampleThriftController extends Controller(ExampleService) { + + handle(Add1).withFn { request: Request[Add1.Args] => + val num = request.args.num + val headers = request.headers + + log(s"Add1 called with $num and headers: $headers") + Future(Response(num + 1)) + } + } -You will see above that we use `override val` since the computed `ThriftMethodService `__ instance returned `is effectively constant `__. However, you MUST override as a `val` when using the `handle(ThriftMethod)` function as using a `def` here will cause indeterminate behavior that will be hard to debug. Add the Controller to the Server -------------------------------- @@ -92,37 +98,64 @@ Controller: } -Please note that Finatra only currently supports adding a **single** Thrift controller to the `ThriftRouter`. The expectation is that you are implementing a single Thrift *service* and thus a single `BaseServiceIface` which is implementable in a single controller. +Please note that Finatra only currently supports adding a **single** Thrift Controller to the `ThriftRouter`. The expectation is that you are implementing a single Thrift *service* and thus a single `ThriftService`. But I don't want to write all of my code inside of one Controller class ----------------------------------------------------------------------- Don't worry. You don't have to. -The only requirement is a single class which implements the service's `BaseServiceIface`. Nothing specifies that *this* class needs to contain all of your service implementation or logic. +The only requirement is a single class which implements the service's defined thrift methods. Nothing specifies that *this* class needs to contain all of your service implementation or logic. -If you want to modularize or componentize to have a better separation of concerns in your code, your `BaseServiceIface` implementation can be easily written to inject other services or handlers such that complicated logic can be handled in other classes as is generally good practice. E.g., +If you want to modularize or componentize to have a better separation of concerns in your code, your `Controller` implementation can be easily written to inject other services or handlers such that complicated logic can be handled in other classes as is generally good practice. E.g., .. code:: scala class ExampleThriftController @Inject() ( add1Service: Add1Service, add2Service: Add2Service, - ) extends Controller - with ExampleService.BaseServiceIface { + ) extends Controller(ExampleService) { - override val add1 = handle(Add1) { args: Add1.Args => - add1Service.add1(args) - } + // add1Service must be of a unique type for injection but also extends: + // Service[Request[Add1.Args], Response[Add1.SuccessType]] + // which is what the withService method is looking for. 
+ handle(Add1).withService(add1Service) - override val add2 = handle(Add2) { args: Add2.Args => - add2Service.add2(args) - } - } + handle(Add2).withService(add2Service) + } -In the above example the `BaseServiceIface` implementation merely calls the methods of other classes to provide the service's Thrift Controller method implementations. +In the above example, the `Controller` implementation forwards handling of the various methods to the injected services directly. + +How you structure and call other classes from the `Controller` implementation is completely up to you to implement in whatever way makes sense for your service or team. + +Deprecated/Legacy Controller Information +---------------------------------------- + +Prior to constructing a `Controller` by extending `Controller(GeneratedThriftService)`, a Controller was constructed by creating a class that extended `Controller with GeneratedThriftService.BaseServiceIface`. Constructing a Controller this way is still possible but deprecated. + +Since a legacy-style `Controller` extends the `BaseServiceIface` directly, it must provide implementations for each of the thrift methods, but it also must still use the `handle(ThriftMethod)` method to make Finatra aware of which methods are being served for reporting and filtering reasons. If this is not done, none of the configured global filters will be applied (including things like per-method stats). + +When providing the overrides for the `BaseServiceIface`, it is important that they be implemented as `val`s instead of `def`s. If they are `def`s, the service/filters will be re-created for each incoming request, incurring serious overhead. + +Legacy-style Controllers cannot use per-method filtering or have access to headers via Scrooge's `Request` and `Response` types. + +A properly configured legacy-style Controller looks like this: + +.. code:: scala + + import com.twitter.example.thriftscala.ExampleService + import com.twitter.finatra.thrift.Controller + import com.twitter.util.Future + + class ExampleThriftController + extends Controller with ExampleService.BaseServiceIface { + + // Note that this is a val instead of a def + override val add1 = handle(Add1) { args: Add1.Args => + Future(args.num + 1) + } + } -How you structure and call other classes from the `BaseServiceIface` implementation is completely up to you to implement in whatever way makes sense for your service or team. 
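For reference, a sketch of what an injectable, uniquely-typed per-method service such as the `add1Service` passed to `withService` above might look like (this class is illustrative, not generated code):

.. code:: scala

    // Import path follows the ExampleService used in the examples above.
    import com.twitter.example.thriftscala.ExampleService.Add1
    import com.twitter.finagle.Service
    import com.twitter.scrooge.{Request, Response}
    import com.twitter.util.Future
    import javax.inject.Singleton

    // A uniquely-typed Service implementation that can be injected into the
    // Controller and registered via handle(Add1).withService(add1Service).
    @Singleton
    class Add1Service extends Service[Request[Add1.Args], Response[Add1.SuccessType]] {
      def apply(request: Request[Add1.Args]): Future[Response[Add1.SuccessType]] =
        Future(Response(request.args.num + 1))
    }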
More information ---------------- diff --git a/examples/thrift-server/thrift-example-server/src/test/scala/com/twitter/calculator/CalculatorServerFeatureTest.scala b/examples/thrift-server/thrift-example-server/src/test/scala/com/twitter/calculator/CalculatorServerFeatureTest.scala index 784bcff26b..fe7ede9564 100644 --- a/examples/thrift-server/thrift-example-server/src/test/scala/com/twitter/calculator/CalculatorServerFeatureTest.scala +++ b/examples/thrift-server/thrift-example-server/src/test/scala/com/twitter/calculator/CalculatorServerFeatureTest.scala @@ -13,18 +13,22 @@ class CalculatorServerFeatureTest extends FeatureTest { val client = server.thriftClient[Calculator[Future]](clientId = "client123") test("whitelist#clients allowed") { - client.increment(1).value should equal(2) - client.addNumbers(1, 2).value should equal(3) - client.addStrings("1", "2").value should equal("3") + await(client.increment(1)) should equal(2) + await(client.addNumbers(1, 2)) should equal(3) + await(client.addStrings("1", "2")) should equal("3") } test("blacklist#clients blocked with UnknownClientIdException") { val clientWithUnknownId = server.thriftClient[Calculator[Future]](clientId = "unlisted-client") - intercept[UnknownClientIdError] { clientWithUnknownId.increment(2).value } + intercept[UnknownClientIdError] { + await(clientWithUnknownId.increment(2)) + } } test("clients#without a client-id blocked with NoClientIdException") { val clientWithoutId = server.thriftClient[Calculator[Future]]() - intercept[NoClientIdError] { clientWithoutId.increment(1).value } + intercept[NoClientIdError] { + await(clientWithoutId.increment(1)) + } } } diff --git a/http/src/main/scala/com/twitter/finatra/http/filters/HttpResponseFilter.scala b/http/src/main/scala/com/twitter/finatra/http/filters/HttpResponseFilter.scala index 7cd5e1b0c3..6d0b46b283 100644 --- a/http/src/main/scala/com/twitter/finatra/http/filters/HttpResponseFilter.scala +++ b/http/src/main/scala/com/twitter/finatra/http/filters/HttpResponseFilter.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.http.filters -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.http.{MediaType, Message, Request, Response} import com.twitter.finagle.{Service, SimpleFilter} import com.twitter.finatra.http.HttpHeaders diff --git a/http/src/main/scala/com/twitter/finatra/http/internal/server/BaseHttpServer.scala b/http/src/main/scala/com/twitter/finatra/http/internal/server/BaseHttpServer.scala index 5c21c93b25..5de3875f21 100644 --- a/http/src/main/scala/com/twitter/finatra/http/internal/server/BaseHttpServer.scala +++ b/http/src/main/scala/com/twitter/finatra/http/internal/server/BaseHttpServer.scala @@ -2,8 +2,8 @@ package com.twitter.finatra.http.internal.server import com.google.inject.Module import com.twitter.app.Flag -import com.twitter.conversions.storage._ -import com.twitter.conversions.time._ +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.http.service.NullService import com.twitter.finagle.http.{Request, Response} import com.twitter.finagle.stats.StatsReceiver diff --git a/http/src/test/scala/com/twitter/finatra/http/EmbeddedHttpServer.scala b/http/src/test/scala/com/twitter/finatra/http/EmbeddedHttpServer.scala index fb8961fd38..cad431fad6 100644 --- a/http/src/test/scala/com/twitter/finatra/http/EmbeddedHttpServer.scala +++ b/http/src/test/scala/com/twitter/finatra/http/EmbeddedHttpServer.scala @@ -1091,6 +1091,7 @@ class 
EmbeddedHttpServer( } private def matchesAdminRoute(method: Method, path: String): Boolean = { + start() // ensure we have started the server and thus added admin routes. path.startsWith(HttpRouter.FinatraAdminPrefix) || adminHttpRouteMatchesPath(method -> path) } diff --git a/http/src/test/scala/com/twitter/finatra/http/ExternalHttpClient.scala b/http/src/test/scala/com/twitter/finatra/http/ExternalHttpClient.scala index ea14efe580..ae633cbb27 100644 --- a/http/src/test/scala/com/twitter/finatra/http/ExternalHttpClient.scala +++ b/http/src/test/scala/com/twitter/finatra/http/ExternalHttpClient.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.http -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finatra.json.FinatraObjectMapper import com.twitter.inject.server.{EmbeddedTwitterServer, PortUtils, Ports, info} import com.twitter.util.Closable diff --git a/http/src/test/scala/com/twitter/finatra/http/JsonAwareEmbeddedHttpClient.scala b/http/src/test/scala/com/twitter/finatra/http/JsonAwareEmbeddedHttpClient.scala index 4f18478161..f1c46fc80e 100644 --- a/http/src/test/scala/com/twitter/finatra/http/JsonAwareEmbeddedHttpClient.scala +++ b/http/src/test/scala/com/twitter/finatra/http/JsonAwareEmbeddedHttpClient.scala @@ -1,7 +1,7 @@ package com.twitter.finatra.http import com.fasterxml.jackson.databind.JsonNode -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.http.{Request, Response, Status} import com.twitter.finatra.json.{FinatraObjectMapper, JsonDiff} import com.twitter.inject.server.{EmbeddedHttpClient, _} diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/conversions/FutureHttpConversionsTest.scala b/http/src/test/scala/com/twitter/finatra/http/tests/conversions/FutureHttpConversionsTest.scala index c6e4eb165c..479a9650bd 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/conversions/FutureHttpConversionsTest.scala +++ b/http/src/test/scala/com/twitter/finatra/http/tests/conversions/FutureHttpConversionsTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.http.tests.conversions -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle._ import com.twitter.finagle.http.Status._ import com.twitter.finatra.http.conversions.futureHttp._ diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/filters/StatsFilterTest.scala b/http/src/test/scala/com/twitter/finatra/http/tests/filters/StatsFilterTest.scala index d4008d7e0a..04e46ea4f9 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/filters/StatsFilterTest.scala +++ b/http/src/test/scala/com/twitter/finatra/http/tests/filters/StatsFilterTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.http.tests.filters -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.Service import com.twitter.finagle.http.{Request, Response} import com.twitter.finagle.service.{ReqRep, ResponseClass, ResponseClassifier} diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/integration/darktraffic/main/DarkServerTestModule.scala b/http/src/test/scala/com/twitter/finatra/http/tests/integration/darktraffic/main/DarkServerTestModule.scala index 5ff3af402a..21caf27b7f 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/integration/darktraffic/main/DarkServerTestModule.scala +++ 
b/http/src/test/scala/com/twitter/finatra/http/tests/integration/darktraffic/main/DarkServerTestModule.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.http.tests.integration.darktraffic.main -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.Http import com.twitter.finagle.http.Method.{Delete, Post} import com.twitter.finagle.http.Request diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/integration/doeverything/main/modules/DoEverythingModule.scala b/http/src/test/scala/com/twitter/finatra/http/tests/integration/doeverything/main/modules/DoEverythingModule.scala index d3f92c5653..e9c12b86a1 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/integration/doeverything/main/modules/DoEverythingModule.scala +++ b/http/src/test/scala/com/twitter/finatra/http/tests/integration/doeverything/main/modules/DoEverythingModule.scala @@ -2,7 +2,7 @@ package com.twitter.finatra.http.tests.integration.doeverything.main.modules import com.google.inject.Provides import com.google.inject.name.{Named, Names} -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finatra.http.tests.integration.doeverything.main.services.{ ComplexServiceFactory, MultiService, diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/main/TweetsEndpointServer.scala b/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/main/TweetsEndpointServer.scala index 475f6ff760..a763a0843f 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/main/TweetsEndpointServer.scala +++ b/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/main/TweetsEndpointServer.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.http.tests.integration.tweetexample.main -import com.twitter.conversions.storage._ +import com.twitter.conversions.StorageUnitOps._ import com.twitter.finatra.http.{HttpServer, Tls} import com.twitter.finatra.http.filters.CommonFilters import com.twitter.finatra.http.tests.integration.tweetexample.main.controllers.{AdminController, TweetsController} diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/test/TweetsControllerIntegrationTest.scala b/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/test/TweetsControllerIntegrationTest.scala index 28fa3884a9..8ac412171b 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/test/TweetsControllerIntegrationTest.scala +++ b/http/src/test/scala/com/twitter/finatra/http/tests/integration/tweetexample/test/TweetsControllerIntegrationTest.scala @@ -19,7 +19,7 @@ class TweetsControllerIntegrationTest extends FeatureTest { defaultRequestHeaders = Map("X-UserId" -> "123"), // Set client flags to also start on HTTPS port flags = Map("https.port" -> ":0", "cert.path" -> "", "key.path" -> "") - ).bind[mutable.ArrayBuffer[String]](onWriteLog) + ).bind[mutable.ArrayBuffer[String]].toInstance(onWriteLog) lazy val streamingJsonHelper = new StreamingJsonTestHelper(server.mapper) diff --git a/http/src/test/scala/com/twitter/finatra/http/tests/response/StreamingResponseTest.scala b/http/src/test/scala/com/twitter/finatra/http/tests/response/StreamingResponseTest.scala index e4239ae560..03ec4a6b35 100644 --- a/http/src/test/scala/com/twitter/finatra/http/tests/response/StreamingResponseTest.scala +++ 
b/http/src/test/scala/com/twitter/finatra/http/tests/response/StreamingResponseTest.scala @@ -1,7 +1,7 @@ package com.twitter.finatra.http.tests.response import com.twitter.concurrent.AsyncStream -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.http.{Response, Status} import com.twitter.finatra.http.response.StreamingResponse import com.twitter.inject.Test diff --git a/httpclient/src/test/scala/com/twitter/finatra/httpclient/HttpClientIntegrationTest.scala b/httpclient/src/test/scala/com/twitter/finatra/httpclient/HttpClientIntegrationTest.scala index e93f521722..6cb3963ced 100644 --- a/httpclient/src/test/scala/com/twitter/finatra/httpclient/HttpClientIntegrationTest.scala +++ b/httpclient/src/test/scala/com/twitter/finatra/httpclient/HttpClientIntegrationTest.scala @@ -22,7 +22,7 @@ class HttpClientIntegrationTest extends IntegrationTest { private[this] val httpClient = injector.instance[HttpClient] override def afterEach(): Unit = { - resetResettables(inMemoryHttpService) + inMemoryHttpService.reset() } test("execute") { diff --git a/inject-thrift-client-http-mapper/src/test/scala/com/twitter/finatra/multiserver/CombinedServer/DoEverythingCombinedServer.scala b/inject-thrift-client-http-mapper/src/test/scala/com/twitter/finatra/multiserver/CombinedServer/DoEverythingCombinedServer.scala index 02fed2b82b..7af8ea615f 100644 --- a/inject-thrift-client-http-mapper/src/test/scala/com/twitter/finatra/multiserver/CombinedServer/DoEverythingCombinedServer.scala +++ b/inject-thrift-client-http-mapper/src/test/scala/com/twitter/finatra/multiserver/CombinedServer/DoEverythingCombinedServer.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.multiserver.CombinedServer -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.Filter import com.twitter.finatra.http.HttpServer import com.twitter.finatra.http.filters.CommonFilters diff --git a/inject/inject-app/src/test/scala/com/twitter/inject/app/BindDSL.scala b/inject/inject-app/src/test/scala/com/twitter/inject/app/BindDSL.scala index 57b2665c8f..ffdcf2fa27 100644 --- a/inject/inject-app/src/test/scala/com/twitter/inject/app/BindDSL.scala +++ b/inject/inject-app/src/test/scala/com/twitter/inject/app/BindDSL.scala @@ -14,66 +14,6 @@ import scala.reflect.runtime.universe._ */ private[twitter] trait BindDSL { self => - /** - * Bind an instance of type [T] to the object graph of the underlying server. - * This will REPLACE any previously bound instance of the given type. - * - * @param instance - to bind instance. - * @tparam T type of the instance to bind. - * - * @see [[https://twitter.github.io/finatra/user-guide/testing/index.html#feature-tests Feature Tests]] - */ - @deprecated("Use bind[T].toInstance(T)", "2018-03-17") - def bind[T: TypeTag](instance: T): self.type = { - addInjectionServiceModule(new TwitterModule { - override def configure(): Unit = { - bind(asManifest[T]).toInstance(instance) - } - }) - self - } - - /** - * Bind an instance of type [T] annotated with Annotation type [A] to the object - * graph of the underlying server. This will REPLACE any previously bound instance of - * the given type bound with the given annotation type. - * - * @param instance - to bind instance. - * @tparam T type of the instance to bind. - * @tparam Ann type of the Annotation used to bind the instance. 
- * @see [[https://twitter.github.io/finatra/user-guide/testing/index.html#feature-tests Feature Tests]] - */ - @deprecated("Use bind[T].annotatedWith[Ann].toInstance(T)", "2018-03-17") - def bind[T: TypeTag, Ann <: Annotation: TypeTag](instance: T): self.type = { - addInjectionServiceModule(new TwitterModule { - override def configure(): Unit = { - bind(asManifest[T], asManifest[Ann]).toInstance(instance) - } - }) - self - } - - /** - * Bind an instance of type [T] annotated with the given Annotation value to the object - * graph of the underlying server. This will REPLACE any previously bound instance of - * the given type bound with the given annotation. - * - * @param annotation [[java.lang.annotation.Annotation]] instance value - * @param instance to bind instance. - * @tparam T type of the instance to bind. - * - * @see [[https://twitter.github.io/finatra/user-guide/testing/index.html#feature-tests Feature Tests]] - */ - @deprecated("Use bind[T].annotatedWith(annotation).toInstance(T)", "2018-03-17") - def bind[T: TypeTag](annotation: Annotation, instance: T): self.type = { - addInjectionServiceModule(new TwitterModule { - override def configure(): Unit = { - bind(asManifest[T]).annotatedWith(annotation).toInstance(instance) - } - }) - self - } - /** * Supports a DSL for binding a type [[T]] in different ways. * {{{ diff --git a/inject/inject-app/src/test/scala/com/twitter/inject/app/tests/TestInjectorTest.scala b/inject/inject-app/src/test/scala/com/twitter/inject/app/tests/TestInjectorTest.scala index 1aa473f5fc..2fa5a72125 100644 --- a/inject/inject-app/src/test/scala/com/twitter/inject/app/tests/TestInjectorTest.scala +++ b/inject/inject-app/src/test/scala/com/twitter/inject/app/tests/TestInjectorTest.scala @@ -1,11 +1,16 @@ package com.twitter.inject.app.tests +import com.google.inject.Provides import com.google.inject.name.Names import com.twitter.app.GlobalFlag +import com.twitter.finagle.Service import com.twitter.inject.annotations.{Flag, Flags} import com.twitter.inject.app.TestInjector -import com.twitter.inject.{Test, TwitterModule} -import javax.inject.Inject +import com.twitter.inject.{Mockito, Test, TwitterModule, TypeUtils} +import com.twitter.util.Future +import javax.inject.{Inject, Singleton} +import scala.language.higherKinds +import scala.reflect.runtime.universe._ object testBooleanGlobalFlag extends GlobalFlag[Boolean](false, "Test boolean global flag defaulted to false") @@ -17,24 +22,8 @@ object testMapGlobalFlag "Test map global flag defaulted to Map.empty" ) -object BooleanFlagModule extends TwitterModule { - flag[Boolean]("x", false, "default to false") -} - -object TestBindModule extends TwitterModule { - flag[Boolean]("bool", false, "default is false") - - override protected def configure(): Unit = { - bind[String, Up].toInstance("Hello, world!") - bind[Baz].toInstance(new Baz(10)) - bind[Baz](Names.named("five")).toInstance(new Baz(5)) - bind[Baz](Names.named("six")).toInstance(new Baz(6)) - bind[Boolean](Flags.named("bool")).toInstance(true) - } -} - class FooWithInject @Inject()(@Flag("x") x: Boolean) { - def bar = x + def bar: Boolean = x } class Bar { @@ -44,6 +33,12 @@ class Bar { val mapGlobalFlag = testMapGlobalFlag() } +trait DoEverything[+MM[_]] { + def uppercase(msg: String): MM[String] + def echo(msg: String): MM[String] + def magicNum(): MM[String] +} + trait TestTrait { def foobar(): String } @@ -82,6 +77,34 @@ class ThirtyThree extends Number { override def longValue(): Long = int.toLong } +class DoEverythingImpl42 extends 
DoEverything[Future] { + override def uppercase(msg: String): Future[String] = { + Future.value(msg.toUpperCase) + } + + override def echo(msg: String): Future[String] = { + Future.value(msg) + } + + override def magicNum(): Future[String] = { + Future.value("42") + } +} + +class DoEverythingImpl137 extends DoEverything[Future] { + override def uppercase(msg: String): Future[String] = { + Future.value(msg.toUpperCase) + } + + override def echo(msg: String): Future[String] = { + Future.value(msg) + } + + override def magicNum(): Future[String] = { + Future.value("137") + } +} + trait Processor { def process: String } @@ -96,7 +119,34 @@ class ProcessorB extends Processor { class Baz(val value: Int) -class TestInjectorTest extends Test { +object BooleanFlagModule extends TwitterModule { + flag[Boolean]("x", false, "default to false") +} + +object TestBindModule extends TwitterModule { + flag[Boolean]("bool", false, "default is false") + + override protected def configure(): Unit = { + bind[String, Up].toInstance("Hello, world!") + bind[Baz].toInstance(new Baz(10)) + bind[Baz](Names.named("five")).toInstance(new Baz(5)) + bind[Baz](Names.named("six")).toInstance(new Baz(6)) + bind[Boolean](Flags.named("bool")).toInstance(true) + bind[Service[Int, String]].toInstance(Service.mk { i: Int => Future.value(s"The answer is: $i") }) + bind[Option[Boolean]].toInstance(None) + bind[Seq[Long]].toInstance(Seq(1L, 2L, 3L)) + // need to explicitly provide a Manifest for the higher kinded type + bind[DoEverything[Future]](TypeUtils.asManifest[DoEverything[Future]]).toInstance(new DoEverythingImpl42) + } + + @Provides + @Singleton + def providesOptionalService: Option[Service[String, String]] = { + Some(Service.mk { name: String => Future.value(s"Hello $name!") }) + } +} + +class TestInjectorTest extends Test with Mockito { override protected def afterEach(): Unit = { // reset flags @@ -142,24 +192,23 @@ class TestInjectorTest extends Test { injector.instance[Baz]("five").value should equal(5) injector.instance[Baz](Names.named("six")).value should equal(6) injector.instance[String, Up] should equal("Hello, world!") + injector.instance[Seq[Long]] should equal(Seq(1L, 2L, 3L)) + val svc = injector.instance[Service[Int, String]] + await(svc(42)) should equal("The answer is: 42") + // need to explicitly provide a Manifest here for the higher kinded type + // note: this is not always necessary + await(injector.instance[DoEverything[Future]](TypeUtils.asManifest[DoEverything[Future]]).magicNum()) should equal("42") } - test("bind deprecated") { - val injector = TestInjector(modules = Seq(TestBindModule)) - .bind[Baz](new Baz(100)) - .bind[String, Up]("Goodbye, world!") - .bind[String](Names.named("foo"), "bar") - .bind[String](Flags.named("cat.flag"), "Kat") - .create + test("bind") { + val testMap: Map[Number, Processor] = + Map( + new FortyTwo -> new ProcessorB, + new ThirtyThree -> new ProcessorA) - injector.instance[Baz].value should equal(100) - injector.instance[String, Up] should equal("Goodbye, world!") - injector.instance[String]("foo") should equal("bar") - injector.instance[String](Names.named("foo")) should equal("bar") - injector.instance[String](Flags.named("cat.flag")) should be("Kat") - } + val mockService: Service[Int, String] = mock[Service[Int, String]] + mockService.apply(anyInt).returns(Future.value("hello, world")) - test("bind") { // bind[T] to [T] // bind[T] to (clazz) // bind[T] toInstance (instance) @@ -191,6 +240,15 @@ class TestInjectorTest extends Test { 
.bind[String].annotatedWith[Down].toInstance("Goodbye, world!") .bind[String].annotatedWith(classOf[Up]).toInstance("Very important Up String") .bind[String].annotatedWith(Flags.named("cat.flag")).toInstance("Kat") + .bind[Map[Number, Processor]].toInstance(testMap) + .bind[Service[Int, String]].toInstance(mockService) + .bind[Option[Boolean]].toInstance(Some(false)) + .bind[Option[Long]].toInstance(None) + .bind[Option[Service[String, String]]].toInstance(None) + .bind[Seq[Long]].toInstance(Seq(33L, 34L)) + // need to explicitly provide a TypeTag here for the higher kinded type + // note: this is not always necessary + .bind[DoEverything[Future]](typeTag[DoEverything[Future]]).toInstance(new DoEverythingImpl137) .create injector.instance[TestTrait].foobar() should be("TestTraitImpl1") @@ -211,6 +269,19 @@ class TestInjectorTest extends Test { injector.instance[String](classOf[Up]) should equal("Very important Up String") injector.instance[String, Up] should equal("Very important Up String") injector.instance[String](Flags.named("cat.flag")) should be("Kat") + + injector.instance[Map[Number, Processor]] should equal(testMap) + val svc = injector.instance[Service[Int, String]] + await(svc(1)) should equal("hello, world") + + injector.instance[Option[Boolean]] shouldBe Some(false) + injector.instance[Option[Long]] shouldBe None + + injector.instance[Option[Service[String, String]]] shouldBe None + + // need to explicitly provide a Manifest here for the higher kinded type + // note: this is not always necessary + await(injector.instance[DoEverything[Future]](TypeUtils.asManifest[DoEverything[Future]]).magicNum()) should equal("137") } test("bindClass") { @@ -267,11 +338,11 @@ class TestInjectorTest extends Test { test("bind fails after injector is called") { val testInjector = TestInjector(modules = Seq(TestBindModule)) - .bind[Baz](new Baz(100)) + .bind[Baz].toInstance(new Baz(100)) val injector = testInjector.create intercept[IllegalStateException] { - testInjector.bind[String, Up]("Goodbye, world!") + testInjector.bind[String].annotatedWith[Up].toInstance("Goodbye, world!") } injector.instance[Baz].value should equal(100) injector.instance[String, Up] should equal("Hello, world!") diff --git a/inject/inject-core/src/test/scala/com/twitter/inject/IntegrationTestMixin.scala b/inject/inject-core/src/test/scala/com/twitter/inject/IntegrationTestMixin.scala index 5ecda8d545..dd6915222e 100644 --- a/inject/inject-core/src/test/scala/com/twitter/inject/IntegrationTestMixin.scala +++ b/inject/inject-core/src/test/scala/com/twitter/inject/IntegrationTestMixin.scala @@ -1,9 +1,5 @@ package com.twitter.inject -import com.google.inject.Module -import com.google.inject.testing.fieldbinder.{Bind, BoundFieldModule} -import java.lang.reflect.Field -import org.mockito.internal.util.MockUtil import org.scalatest.{Suite, SuiteMixin} /** @@ -25,82 +21,4 @@ trait IntegrationTestMixin /* Protected */ protected def injector: Injector - - @deprecated("Users are encouraged to reset mocks and resettables directly as appropriate. 
See c.t.inject.Mockito#resetMocks and IntegrationTest#resetResettables", "2017-09-28") - protected val resetBindings = true - - /** See https://github.com/google/guice/wiki/BoundFields */ - @deprecated("Use #bind[T] DSL instead.", "2017-03-01") - protected val integrationTestModule: Module = BoundFieldModule.of(this) - - override protected def beforeAll(): Unit = { - super.beforeAll() - injector.underlying.injectMembers(this) - } - - override protected def afterEach(): Unit = { - super.afterEach() - - if (resetBindings) { - for (mockObject <- mockObjects) { - org.mockito.Mockito.reset(mockObject) - } - - resetResettables(resettableObjects: _*) - } - } - - protected def resetResettables(resettables: Resettable*): Unit = { - for (resettable <- resettables) { - debug("Clearing " + resettable) - resettable.reset() - } - } - - @deprecated("Use #bind[T] DSL instead.", "2017-03-01") - protected def hasBoundFields: Boolean = boundFields.nonEmpty - - /* Private */ - - @deprecated("Use #bind[T] DSL instead.", "2017-03-01") - private[this] lazy val mockObjects = { - val mockUtil = new MockUtil() - for { - field <- boundFields - fieldValue = field.get(this) - if mockUtil.isMock(fieldValue) - } yield fieldValue - } - - @deprecated("Users are encouraged to reset mocks and resettables directly as appropriate. See c.t.inject.Mockito#resetMocks and IntegrationTest#resetResettables", "2017-09-28") - private[this] lazy val resettableObjects = { - for { - field <- boundFields - if classOf[Resettable].isAssignableFrom(field.getType) - _ = field.setAccessible(true) - fieldValue = field.get(this) - } yield fieldValue.asInstanceOf[Resettable] - } - - @deprecated("Use #bind[T] DSL instead.", "2017-03-01") - private[this] lazy val boundFields = { - for { - field <- getDeclaredFieldsRespectingInheritance(getClass) - if hasBindAnnotation(field) - _ = field.setAccessible(true) - } yield field - } - - @deprecated("Use #bind[T] DSL instead.", "2017-03-01") - private def hasBindAnnotation(field: Field): Boolean = { - field.getAnnotation(classOf[Bind]) != null - } - - private def getDeclaredFieldsRespectingInheritance(clazz: Class[_]): Array[Field] = { - if (clazz == null) { - Array() - } else { - clazz.getDeclaredFields ++ getDeclaredFieldsRespectingInheritance(clazz.getSuperclass) - } - } } diff --git a/inject/inject-core/src/test/scala/com/twitter/inject/WordSpecIntegrationTest.scala b/inject/inject-core/src/test/scala/com/twitter/inject/WordSpecIntegrationTest.scala deleted file mode 100644 index 7fcbeccbe3..0000000000 --- a/inject/inject-core/src/test/scala/com/twitter/inject/WordSpecIntegrationTest.scala +++ /dev/null @@ -1,7 +0,0 @@ -package com.twitter.inject - -@deprecated( - "It is recommended that users switch to com.twitter.inject.IntegrationTest which uses FunSuite", - "2017-01-16" -) -trait WordSpecIntegrationTest extends WordSpecTest with IntegrationTestMixin diff --git a/inject/inject-core/src/test/scala/com/twitter/inject/WordSpecTest.scala b/inject/inject-core/src/test/scala/com/twitter/inject/WordSpecTest.scala deleted file mode 100644 index 5077acda35..0000000000 --- a/inject/inject-core/src/test/scala/com/twitter/inject/WordSpecTest.scala +++ /dev/null @@ -1,12 +0,0 @@ -package com.twitter.inject - -import org.junit.runner.RunWith -import org.scalatest.WordSpec -import org.scalatest.junit.JUnitRunner - -@deprecated( - "It is recommended that users switch to com.twitter.inject.Test which uses FunSuite", - "2017-01-16" -) -@RunWith(classOf[JUnitRunner]) -abstract class WordSpecTest extends WordSpec with 
TestMixin diff --git a/inject/inject-core/src/test/scala/com/twitter/inject/tests/WhenReadyMixinTest.scala b/inject/inject-core/src/test/scala/com/twitter/inject/tests/WhenReadyMixinTest.scala index b62e2463d8..33d42f5d67 100644 --- a/inject/inject-core/src/test/scala/com/twitter/inject/tests/WhenReadyMixinTest.scala +++ b/inject/inject-core/src/test/scala/com/twitter/inject/tests/WhenReadyMixinTest.scala @@ -1,6 +1,6 @@ package com.twitter.inject.tests -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.inject.{Test, WhenReadyMixin} import com.twitter.util.Future import org.scalatest.concurrent.ScalaFutures._ diff --git a/inject/inject-core/src/test/scala/com/twitter/inject/tests/module/DoEverythingModule.scala b/inject/inject-core/src/test/scala/com/twitter/inject/tests/module/DoEverythingModule.scala index e5953bfb4c..a424b1c475 100644 --- a/inject/inject-core/src/test/scala/com/twitter/inject/tests/module/DoEverythingModule.scala +++ b/inject/inject-core/src/test/scala/com/twitter/inject/tests/module/DoEverythingModule.scala @@ -3,7 +3,7 @@ package com.twitter.inject.tests.module import com.google.inject.name.{Named, Names} import com.google.inject.spi.TypeConverter import com.google.inject.{Provides, TypeLiteral} -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.inject.tests.Prod import com.twitter.inject.{Injector, TwitterModule} import java.util.Properties diff --git a/inject/inject-server/src/main/scala/com/twitter/inject/server/PortUtils.scala b/inject/inject-server/src/main/scala/com/twitter/inject/server/PortUtils.scala index 4b18da5c26..420280c777 100644 --- a/inject/inject-server/src/main/scala/com/twitter/inject/server/PortUtils.scala +++ b/inject/inject-server/src/main/scala/com/twitter/inject/server/PortUtils.scala @@ -2,7 +2,6 @@ package com.twitter.inject.server import com.twitter.app.Flaggable import com.twitter.finagle.ListeningServer -import com.twitter.finagle.builder.{Server => BuilderServer} import java.net.{InetAddress, InetSocketAddress, SocketAddress} /** @@ -39,13 +38,8 @@ object PortUtils { socketAddress.asInstanceOf[InetSocketAddress].getPort } - /** Returns the Integer representation of the given [[BuilderServer]] */ - def getPort(server: BuilderServer): Int = { - getSocketAddress(server).asInstanceOf[InetSocketAddress].getPort - } - - /** Returns the bound address of the given [[BuilderServer]] */ - def getSocketAddress(server: BuilderServer): SocketAddress = { + /** Returns the bound address of the given [[ListeningServer]] */ + def getSocketAddress(server: ListeningServer): SocketAddress = { server.boundAddress } diff --git a/inject/inject-server/src/main/scala/com/twitter/inject/server/TwitterServer.scala b/inject/inject-server/src/main/scala/com/twitter/inject/server/TwitterServer.scala index 723e9ef0ce..d5f07b2fee 100644 --- a/inject/inject-server/src/main/scala/com/twitter/inject/server/TwitterServer.scala +++ b/inject/inject-server/src/main/scala/com/twitter/inject/server/TwitterServer.scala @@ -1,7 +1,8 @@ package com.twitter.inject.server import com.google.inject.Module -import com.twitter.conversions.time._ +import com.twitter.app.Flag +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.client.ClientRegistry import com.twitter.inject.Logging import com.twitter.inject.annotations.Lifecycle @@ -74,12 +75,12 @@ abstract class AbstractTwitterServer extends TwitterServer * @see 
[[https://twitter.github.io/finatra/user-guide/twitter-server/index.html Creating an Injectable TwitterServer]] */ trait TwitterServer - extends App - with com.twitter.server.TwitterServer - with DeprecatedLogging - with Ports - with Warmup - with Logging { + extends App + with com.twitter.server.TwitterServer + with DeprecatedLogging + with Ports + with Warmup + with Logging { addFrameworkModules( statsReceiverModule, @@ -362,5 +363,64 @@ private[server] trait DeprecatedLogging extends com.twitter.logging.Logging { se @deprecated("For backwards compatibility only.", "2017-10-06") override lazy val log: com.twitter.logging.Logger = com.twitter.logging.Logger(name) - override def configureLoggerFactories(): Unit = {} + // lint if any com.twitter.logging.Logging flags are set + premain { + val flgs: Seq[Flag[_]] = + Seq( + inferClassNamesFlag, + outputFlag, + levelFlag, + asyncFlag, + asyncMaxSizeFlag, + rollPolicyFlag, + appendFlag, + rotateCountFlag + ) + + val userDefinedFlgs: Seq[Flag[_]] = flgs.collect { + case flg: Flag[_] if flg.isDefined => flg + } + + if (userDefinedFlgs.nonEmpty && !userDefinedFlgsAllowed) { + GlobalRules.get.add( + Rule( + Category.Configuration, + "Unsupported util-logging (JUL) flag set", + """By default, Finatra uses the slf4j-api for logging and as such setting util-logging + | flags is not expected to have any effect. Setting these flags may cause your server to + | fail startup in the future. Logging configuration should always match your chosen logging + | implementation. + | See: https://twitter.github.io/finatra/user-guide/logging/index.html.""".stripMargin + ) { + userDefinedFlgs.map(flg => Issue(s"-${flg.name}")) + } + ) + } + } + + /** If slf4j-jdk14 is being used, it is acceptable to have user defined values for these flags */ + private[this] def userDefinedFlgsAllowed: Boolean = { + try { + Class.forName("org.slf4j.impl.JDK14LoggerFactory", false, this.getClass.getClassLoader) + true + } catch { + case _: ClassNotFoundException => false + } + } + + /** + * [[com.twitter.logging.Logging.configureLoggerFactories()]] removes all added JUL handlers + * and adds only handlers defined by [[com.twitter.logging.Logging.loggerFactories]]. + * + * `Logging.configureLoggerFactories` would thus remove the installed SLF4J BridgeHandler + * from [[com.twitter.server.TwitterServer]]. Therefore, we override with a no-op to prevent the + * SLF4J BridgeHandler from being removed. + * + * @note Subclasses MUST override this method with an implementation that configures the + * `com.twitter.logging.Logger` if they want to use their configured logger factories via + * the util-logging style of configuration. 
+ * + * @see [[https://www.slf4j.org/legacy.html#jul-to-slf4j jul-to-slf4j bridge]] + */ + override protected def configureLoggerFactories(): Unit = {} } diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/EmbeddedHttpClient.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/EmbeddedHttpClient.scala index 9319d534a2..ec1496b731 100644 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/EmbeddedHttpClient.scala +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/EmbeddedHttpClient.scala @@ -1,6 +1,6 @@ package com.twitter.inject.server -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.{Http, Service} import com.twitter.finagle.http.{Request, Response, Status} import com.twitter.finagle.http.codec.HttpCodec diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/FeatureTestMixin.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/FeatureTestMixin.scala index 151be05179..17e3556ec2 100644 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/FeatureTestMixin.scala +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/FeatureTestMixin.scala @@ -1,7 +1,6 @@ package com.twitter.inject.server import com.twitter.inject.{Injector, IntegrationTestMixin} -import com.twitter.util.{Await, Future} import org.scalatest.{Suite, SuiteMixin} import scala.util.control.NonFatal @@ -28,23 +27,6 @@ trait FeatureTestMixin def printStats = true - override protected def beforeAll(): Unit = { - if (server.isStarted && hasBoundFields) { - throw new Exception( - "ERROR: Server started before integrationTestModule added. " + - "@Bind will not work unless references to the server are lazy, or within a ScalaTest " + - "lifecycle method or test method, or the integrationTestModule is manually added as " + - "an override module." 
- ) - } - - if (hasBoundFields) { - assert(server.isInjectable) - server.injectableServer.addFrameworkOverrideModules(integrationTestModule) - } - super.beforeAll() - } - override protected def afterEach(): Unit = { super.afterEach() if (server.isInjectable) { @@ -68,11 +50,4 @@ trait FeatureTestMixin } } } - - implicit class RichFuture[T](future: Future[T]) { - def value: T = { - Await.result(future) - } - } - } diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/WordSpecFeatureTest.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/WordSpecFeatureTest.scala deleted file mode 100644 index 78bc724e34..0000000000 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/WordSpecFeatureTest.scala +++ /dev/null @@ -1,9 +0,0 @@ -package com.twitter.inject.server - -import com.twitter.inject.WordSpecTest - -@deprecated( - "It is recommended that users switch to com.twitter.inject.server.FeatureTest which uses FunSuite", - "2017-01-16" -) -trait WordSpecFeatureTest extends WordSpecTest with FeatureTestMixin diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/EmbeddedTwitterServerIntegrationTest.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/EmbeddedTwitterServerIntegrationTest.scala index 3777afcc25..af060d4883 100644 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/EmbeddedTwitterServerIntegrationTest.scala +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/EmbeddedTwitterServerIntegrationTest.scala @@ -1,69 +1,23 @@ package com.twitter.inject.server.tests -import com.google.inject.{Provides, Stage} import com.google.inject.name.Names -import com.twitter.finagle.{Http, Service} +import com.google.inject.{Provides, Stage} import com.twitter.finagle.http.{Request, Response, Status} +import com.twitter.finagle.{Http, Service} import com.twitter.inject.server.{EmbeddedTwitterServer, TwitterServer} import com.twitter.inject.{Logging, Test, TwitterModule} import com.twitter.util.{Await, Future} -import com.twitter.util.lint.{Category, GlobalRules, Issue, Rule, Rules, RulesImpl} -import com.twitter.util.registry.{GlobalRegistry, SimpleRegistry} import javax.inject.Singleton -import org.apache.commons.lang.RandomStringUtils class EmbeddedTwitterServerIntegrationTest extends Test { - private [this] def generateTestRuleName: String = { - s"TestRule-${RandomStringUtils.randomAlphabetic(12).toUpperCase()}" - } - - private def mkRules(rules: Rule*): Rules = { - val toReturn = new RulesImpl() - rules.foreach(rule => toReturn.add(rule)) - toReturn - } - - private val alwaysRule1 = - Rule( - Category.Configuration, - generateTestRuleName, - "Lorem ipsum dolor sit amet, consectetur adipiscing elit.") { - Seq( - Issue("Etiam nisi metus, commodo in erat id, sagittis luctus purus."), - Issue("Vestibulum sagittis justo a ex suscipit, sit amet efficitur mi varius."), - Issue("Maecenas eu condimentum nulla, non porta tortor.")) - } - private val alwaysRule2 = - Rule( - Category.Configuration, - generateTestRuleName, - "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla aliquam eu ante et auctor. " + - "Vestibulum sagittis justo a ex suscipit, sit amet efficitur mi varius. 
Maecenas egestas " + - "viverra arcu, id volutpat magna molestie sit amet.") { - Seq(Issue("Duis blandit orci mi, sit amet euismod magna maximus eu.")) - } - private val alwaysRule3 = - Rule( - Category.Configuration, - generateTestRuleName, - "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla aliquam eu ante et auctor. " + - "Vestibulum sagittis justo a ex suscipit, sit amet efficitur mi varius. Maecenas egestas " + - "viverra arcu, id volutpat magna molestie sit amet.") { - Seq(Issue("This is just a test.")) - } - private val neverRule = - Rule( - Category.Configuration, - generateTestRuleName, - "Donec ligula nibh, accumsan a tempor a, consequat sit amet enim.") { - Seq.empty - } - test("server#start") { val twitterServer = new TwitterServer {} twitterServer.addFrameworkOverrideModules(new TwitterModule {}) - val embeddedServer = new EmbeddedTwitterServer(twitterServer) + val embeddedServer = new EmbeddedTwitterServer( + twitterServer = twitterServer, + disableTestLogging = true + ) try { embeddedServer.httpGetAdmin("/health", andExpect = Status.Ok, withBody = "OK\n") @@ -80,15 +34,19 @@ class EmbeddedTwitterServerIntegrationTest extends Test { test("server#fail if bind on a non-injectable server") { intercept[IllegalStateException] { - new EmbeddedTwitterServer(new NonInjectableServer) - .bind[String].toInstance("hello!") + new EmbeddedTwitterServer( + twitterServer = new NonInjectableServer, + disableTestLogging = true + ).bind[String].toInstance("hello!") } } test("server#support bind in server") { val server = - new EmbeddedTwitterServer(new TwitterServer {}) - .bind[String].toInstance("helloworld") + new EmbeddedTwitterServer( + twitterServer = new TwitterServer {}, + disableTestLogging = true + ).bind[String].toInstance("helloworld") try { server.injector.instance[String] should be("helloworld") @@ -99,20 +57,26 @@ class EmbeddedTwitterServerIntegrationTest extends Test { test("server#support bind with @Named in server") { val server = - new EmbeddedTwitterServer(new TwitterServer {}) - .bind[String] - .annotatedWith(Names.named("best")) - .toInstance("helloworld") + new EmbeddedTwitterServer( + twitterServer = new TwitterServer {}, + disableTestLogging = true + ).bind[String] + .annotatedWith(Names.named("best")) + .toInstance("helloworld") try { - server.injector.instance[String]("best") should be("helloworld") + server.injector.instance[String](Names.named("best")) should be("helloworld") } finally { server.close() } } test("server#fail because of unknown flag") { - val server = new EmbeddedTwitterServer(new TwitterServer {}, flags = Map("foo.bar" -> "true")) + val server = new EmbeddedTwitterServer( + twitterServer = new TwitterServer {}, + flags = Map("foo.bar" -> "true"), + disableTestLogging = true + ) try { val e = intercept[Exception] { @@ -135,158 +99,24 @@ class EmbeddedTwitterServerIntegrationTest extends Test { throw new Exception("Yikes") } }) - }).bind[String].toInstance("helloworld") + }, + disableTestLogging = true + ).bind[String].toInstance("helloworld") + try { val e = intercept[Exception] { server.injector.instance[String] should be("helloworld") } e.getCause.getMessage should be("Yikes") - } finally{ + } finally { server.close() } } - - // Currently fails in sbt - ignore("server#fail startup because of linting violations") { - val rules = mkRules(alwaysRule1, alwaysRule2, neverRule) - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new TwitterServer {}, 
failOnLintViolation = true) - try { - val e = intercept[Exception] { - server.assertHealthy() - } - e.getMessage.contains("and 4 Linter issues found") should be(true) - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#starts when there is an artificial rule but no violations and failOnLintViolation = true") { - val rules = mkRules(neverRule) - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new TwitterServer {}, failOnLintViolation = true) - - try { - server.assertHealthy() - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#starts when there are no lint rule violations and failOnLintViolation = true") { - val rules = mkRules() - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new TwitterServer {}, failOnLintViolation = true) - - try { - server.assertHealthy() - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#starts when there are linting violations and failOnLintViolation = false") { - val rules = mkRules(alwaysRule1, alwaysRule2, alwaysRule3) - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new TwitterServer {}) - - try { - server.assertHealthy() - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#non-injectable server fail startup because linting violation and failOnLintViolation = true") { - val rules = mkRules(alwaysRule3) - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new NonInjectableServer, failOnLintViolation = true) - try { - val e = intercept[Exception] { - server.assertHealthy() - } - e.getMessage.contains("and 1 Linter issue found") should be(true) - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#non-injectable server starts when there are linting violations and failOnLintViolation = false") { - val rules = mkRules(alwaysRule3) - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new NonInjectableServer) - try { - server.assertHealthy() - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#non-injectable server starts when there are no linting violations and and failOnLintViolation = false") { - val rules = mkRules() - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new NonInjectableServer) - try { - server.assertHealthy() - } finally { - server.close() - } - } - } - } - - // Currently fails in sbt - ignore("server#non-injectable server starts when there are no linting violations and and failOnLintViolation = true") { - val rules = mkRules() - - GlobalRegistry.withRegistry(new SimpleRegistry) { - GlobalRules.withRules(rules) { - val server = new EmbeddedTwitterServer(new NonInjectableServer, failOnLintViolation = true) - try { - server.assertHealthy() - } finally { - server.close() - } - } - } - } } class NonInjectableServer extends com.twitter.server.TwitterServer with Logging { - val service = new Service[Request, Response] { - def apply(request: Request) = { + private[this] val service = new Service[Request, Response] { + def apply(request: Request): 
Future[Response] = { val response = Response(request.version, Status.Ok) response.contentString = "hello" Future.value(response) diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestNonInjectionTest.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestNonInjectionTest.scala new file mode 100644 index 0000000000..003663ff2c --- /dev/null +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestNonInjectionTest.scala @@ -0,0 +1,50 @@ +package com.twitter.inject.server.tests + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.http.Status +import com.twitter.inject.server.{EmbeddedTwitterServer, FeatureTest} +import com.twitter.server.TwitterServer +import com.twitter.util.{Await, Duration} + +/** Test a non-inject TwitterServer with the [[FeatureTest]] trait */ +class FeatureTestNonInjectionTest extends FeatureTest { + + override val server: EmbeddedTwitterServer = + new EmbeddedTwitterServer( + twitterServer = new TestTwitterServer, + disableTestLogging = true) + + /** + * Explicitly start the server before all tests, close will be attempted + * by [[com.twitter.inject.server.FeatureTestMixin]] in `afterAll`. + */ + override def beforeAll(): Unit = { + server.start() + } + + test("TestServer#starts up") { + server.assertHealthy() + } + + test("TestServer#feature test") { + server.httpGetAdmin( + "/admin/lint.json", + andExpect = Status.Ok + ) + + server.httpGetAdmin( + "/admin/registry.json", + andExpect = Status.Ok + ) + } + +} + +class TestTwitterServer extends TwitterServer { + /* ensure enough time to close resources */ + override val defaultCloseGracePeriod: Duration = 15.seconds + def main(): Unit = { + /* injectable TwitterServer automatically awaits on the admin, we need to do it explicitly here.*/ + Await.ready(adminHttpServer) + } +} diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestTest.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestTest.scala index 46c010fd25..a5340ca041 100644 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestTest.scala +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/FeatureTestTest.scala @@ -1,14 +1,45 @@ package com.twitter.inject.server.tests +import com.twitter.finagle.http.Status import com.twitter.inject.server.{EmbeddedTwitterServer, FeatureTest, TwitterServer} +/** Test an injectable TwitterServer with the [[FeatureTest]] trait */ class FeatureTestTest extends FeatureTest { - override val server = - new EmbeddedTwitterServer(new TwitterServer {}) - .bind[String].toInstance("helloworld") + /* Disable printing of stats for injectable TwitterServer under test */ + override val printStats = false - test("feature test") { + override val server: EmbeddedTwitterServer = + new EmbeddedTwitterServer( + twitterServer = new TwitterServer {}, + disableTestLogging = true + ).bind[String].toInstance("helloworld") + + /** + * Explicitly start the server before all tests, close will be attempted by + * [[com.twitter.inject.server.FeatureTestMixin]] in `afterAll`. 
+ */ + override def beforeAll(): Unit = { + server.start() + } + + test("TwitterServer#starts up") { + server.assertHealthy() + } + + test("TwitterServer#feature test") { + server.httpGetAdmin( + "/admin/lint.json", + andExpect = Status.Ok + ) + + server.httpGetAdmin( + "/admin/registry.json", + andExpect = Status.Ok + ) + } + + test("TwitterServer#bind test") { server.injector.instance[String] should be("helloworld") } } diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/PortUtilsTest.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/PortUtilsTest.scala index 5f7e91474c..7694b6bd7d 100644 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/PortUtilsTest.scala +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/PortUtilsTest.scala @@ -1,6 +1,6 @@ package com.twitter.inject.server.tests -import com.twitter.finagle.builder.Server +import com.twitter.finagle.ListeningServer import com.twitter.inject.Test import com.twitter.inject.server.PortUtils import com.twitter.util.Awaitable.CanAwait @@ -9,8 +9,8 @@ import java.net.InetSocketAddress class PortUtilsTest extends Test { - test("PortUtils#getPort for Server") { - val server = new Server { + test("PortUtils#getPort for ListeningServer") { + val server = new ListeningServer { /** * The address to which this server is bound. diff --git a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/StartupIntegrationTest.scala b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/StartupIntegrationTest.scala index 5bc8719ef8..237b065568 100644 --- a/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/StartupIntegrationTest.scala +++ b/inject/inject-server/src/test/scala/com/twitter/inject/server/tests/StartupIntegrationTest.scala @@ -2,7 +2,7 @@ package com.twitter.inject.server.tests import com.google.inject.AbstractModule import com.twitter.app.CloseException -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.http.Status import com.twitter.inject.app.App import com.twitter.inject.server.{EmbeddedTwitterServer, Ports, TwitterServer} diff --git a/inject/inject-thrift-client/src/main/scala/com/twitter/inject/thrift/modules/ThriftClientModuleTrait.scala b/inject/inject-thrift-client/src/main/scala/com/twitter/inject/thrift/modules/ThriftClientModuleTrait.scala index 30dd1fa778..6fac0031f2 100644 --- a/inject/inject-thrift-client/src/main/scala/com/twitter/inject/thrift/modules/ThriftClientModuleTrait.scala +++ b/inject/inject-thrift-client/src/main/scala/com/twitter/inject/thrift/modules/ThriftClientModuleTrait.scala @@ -1,6 +1,6 @@ package com.twitter.inject.thrift.modules -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.ThriftMux import com.twitter.finagle.service.RetryBudget import com.twitter.inject.{Injector, Logging} diff --git a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingReqRepThriftMethodBuilderClientModuleFeatureTest.scala b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingReqRepThriftMethodBuilderClientModuleFeatureTest.scala index e05b26fbee..0f470cec94 100644 --- a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingReqRepThriftMethodBuilderClientModuleFeatureTest.scala +++ 
b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingReqRepThriftMethodBuilderClientModuleFeatureTest.scala @@ -1,6 +1,6 @@ package com.twitter.inject.thrift -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.http.Status.Ok import com.twitter.finatra.http.{EmbeddedHttpServer, HttpTest} import com.twitter.finatra.thrift.EmbeddedThriftServer diff --git a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingThriftMethodBuilderClientModuleFeatureTest.scala b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingThriftMethodBuilderClientModuleFeatureTest.scala index 0a6a579328..e861cf46b7 100644 --- a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingThriftMethodBuilderClientModuleFeatureTest.scala +++ b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/DoEverythingThriftMethodBuilderClientModuleFeatureTest.scala @@ -40,6 +40,11 @@ class DoEverythingThriftMethodBuilderClientModuleFeatureTest extends FeatureTest ) ) + override def beforeAll(): Unit = { + super.beforeAll() + server.start() + } + override def afterAll(): Unit = { greeterThriftServer.close() echoThriftServer.close() diff --git a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/TestThriftServer.scala b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/TestThriftServer.scala index df62ddb585..92575fef47 100644 --- a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/TestThriftServer.scala +++ b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/TestThriftServer.scala @@ -1,6 +1,6 @@ package com.twitter.inject.thrift.integration -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.{ListeningServer, ThriftMux} import com.twitter.finagle.thrift.ThriftService import com.twitter.inject.server.{PortUtils, TwitterServer} diff --git a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/reqrepserviceperendpoint/GreeterReqRepThriftMethodBuilderClientModule.scala b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/reqrepserviceperendpoint/GreeterReqRepThriftMethodBuilderClientModule.scala index d193550c50..21fca7ced0 100644 --- a/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/reqrepserviceperendpoint/GreeterReqRepThriftMethodBuilderClientModule.scala +++ b/inject/inject-thrift-client/src/test/scala/com/twitter/inject/thrift/integration/reqrepserviceperendpoint/GreeterReqRepThriftMethodBuilderClientModule.scala @@ -1,7 +1,7 @@ package com.twitter.inject.thrift.integration.reqrepserviceperendpoint import com.google.inject.Module -import com.twitter.conversions.percent._ +import com.twitter.conversions.PercentOps._ import com.twitter.finagle.service.{ReqRep, ResponseClass, ResponseClassifier} import com.twitter.greeter.thriftscala.{Greeter, InvalidOperation} import com.twitter.inject.exceptions.PossiblyRetryable diff --git a/inject/inject-utils/src/test/scala/com/twitter/inject/tests/utils/ExceptionUtilsTest.scala b/inject/inject-utils/src/test/scala/com/twitter/inject/tests/utils/ExceptionUtilsTest.scala index 9fb9f4b892..6bd19dc687 100644 --- a/inject/inject-utils/src/test/scala/com/twitter/inject/tests/utils/ExceptionUtilsTest.scala +++ 
b/inject/inject-utils/src/test/scala/com/twitter/inject/tests/utils/ExceptionUtilsTest.scala @@ -1,6 +1,6 @@ package com.twitter.inject.tests.utils -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.{FailedFastException, IndividualRequestTimeoutException} import com.twitter.inject.Test import com.twitter.inject.utils.ExceptionUtils diff --git a/kafka-streams/PROJECT b/kafka-streams/PROJECT new file mode 100644 index 0000000000..c57fb17484 --- /dev/null +++ b/kafka-streams/PROJECT @@ -0,0 +1,7 @@ +owners: + - messaging-group:ldap + - scosenza + - dbress + - adams +watchers: + - ds-messaging@twitter.com diff --git a/kafka-streams/kafka-streams-prerestore/src/main/scala/BUILD b/kafka-streams/kafka-streams-prerestore/src/main/scala/BUILD new file mode 100644 index 0000000000..452a680047 --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/main/scala/BUILD @@ -0,0 +1,16 @@ +scala_library( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-streams-prerestore", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "finatra/kafka-streams/kafka-streams-static-partitioning/src/main/scala", + ], + exports = [ + "finatra/kafka-streams/kafka-streams-static-partitioning/src/main/scala", + ], +) diff --git a/kafka-streams/kafka-streams-prerestore/src/main/scala/com/twitter/finatra/streams/prerestore/PreRestoreState.scala b/kafka-streams/kafka-streams-prerestore/src/main/scala/com/twitter/finatra/streams/prerestore/PreRestoreState.scala new file mode 100644 index 0000000000..bbcfcd64c8 --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/main/scala/com/twitter/finatra/streams/prerestore/PreRestoreState.scala @@ -0,0 +1,140 @@ +package com.twitter.finatra.streams.prerestore + +import com.twitter.finatra.annotations.Experimental +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.internal.utils.ReflectionUtils +import com.twitter.finatra.streams.partitioning.StaticPartitioning +import java.util.Properties +import java.util.concurrent.TimeUnit +import java.util.concurrent.atomic.AtomicInteger +import org.apache.kafka.clients.consumer.Consumer +import org.apache.kafka.common.Metric +import org.apache.kafka.common.utils.Utils +import org.apache.kafka.streams.processor.internals.StreamThread +import org.apache.kafka.streams.{KafkaStreams, StreamsConfig} +import org.joda.time.DateTimeUtils +import scala.collection.JavaConverters._ +import scala.util.control.NonFatal + +@Experimental +trait PreRestoreState extends KafkaStreamsTwitterServer with StaticPartitioning { + + private val preRestoreState = flag("kafka.prerestore.state", true, "Pre-Restore state") + private val preRestoreDurationInitialDelay = + flag("kafka.prerestore.duration", 2.minutes, "Pre-Restore min delay") + + /* Protected */ + + final override protected[finatra] def createAndStartKafkaStreams(): Unit = { + if (preRestoreState()) { + val copiedProperties = properties.clone().asInstanceOf[Properties] + val preRestoreProperties = configurePreRestoreProperties(copiedProperties) + val preRestoreKafkaStreams = + new KafkaStreams(topology, preRestoreProperties, kafkaStreamsClientSupplier) + setExceptionHandler(preRestoreKafkaStreams) + preRestoreKafkaStreams.start() + startWaitForPreRestoredThread(preRestoreKafkaStreams) + } else { + 
super.createAndStartKafkaStreams() + } + } + + /* Private */ + + private def startWaitForPreRestoredThread(preRestoreKafkaStreams: KafkaStreams): Unit = { + new Thread("wait-for-pre-restoring-server-thread") { + override def run(): Unit = { + try { + waitForPreRestoreFinished(preRestoreKafkaStreams) + + info(s"Closing pre-restoring server") + preRestoreKafkaStreams.close(1, TimeUnit.MINUTES) + info(s"Pre-restore complete.") + + //Reset the thread id and start Kafka Streams as if we weren't using pre-restore mode + resetStreamThreadId() + PreRestoreState.super.createAndStartKafkaStreams() + } catch { + case NonFatal(e) => + error("PreRestore error", e) + close(defaultCloseGracePeriod) + } + } + }.start() + } + + // Note: 10000 is somewhat arbitrary. The goal is to get close to the head of the changelog, before exiting pre-restore mode and taking active ownership of the pre-restoring tasks + private def waitForPreRestoreFinished(preRestoreKafkaStreams: KafkaStreams): Unit = { + info( + s"Waiting for Total Restore Lag to be less than 1000 after an initial wait period of ${preRestoreDurationInitialDelay()}" + ) + val minTimeToFinish = DateTimeUtils + .currentTimeMillis() + preRestoreDurationInitialDelay().inMillis + var totalRestoreLag = Double.MaxValue + while (totalRestoreLag >= 10000 || DateTimeUtils.currentTimeMillis() < minTimeToFinish) { + totalRestoreLag = findTotalRestoreLag(preRestoreKafkaStreams) + Thread.sleep(1000) + } + } + + private def findTotalRestoreLag(preRestoreKafkaStreams: KafkaStreams): Double = { + val lagMetrics = findRestoreConsumerLagMetrics(preRestoreKafkaStreams) + val totalRestoreLag = lagMetrics.map(_.metricValue.asInstanceOf[Double]).sum + info(s"Total Restore Lag: $totalRestoreLag") + totalRestoreLag + } + + private def configurePreRestoreProperties(properties: Properties) = { + val applicationServerConfigHost = Utils.getHost(applicationServerConfig()) + properties.put( + StreamsConfig.APPLICATION_SERVER_CONFIG, + s"$applicationServerConfigHost:${StaticPartitioning.PreRestoreSignalingPort}" + ) + + // During prerestore we set poll_ms to 0 to prevent activeTask.polling from slowing down the standby tasks + // See https://github.com/apache/kafka/blob/b532ee218e01baccc0ff8c4b1df586577637de50/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamThread.java#L832 + properties.put(StreamsConfig.POLL_MS_CONFIG, "0") + + properties + } + + //HACK: Reset StreamThread's so the kafka broker doesn't see 2x the consumer client.ids (since thread number is part of client id) + private def resetStreamThreadId(): Unit = { + try { + val streamThreadClass = classOf[StreamThread] + val streamThreadIdSequenceField = + streamThreadClass.getDeclaredField("STREAM_THREAD_ID_SEQUENCE") + streamThreadIdSequenceField.setAccessible(true) + val streamThreadIdSequence = streamThreadIdSequenceField.get(null).asInstanceOf[AtomicInteger] + streamThreadIdSequence.set(1) + } catch { + case NonFatal(e) => + error("Error resetting stream threads", e) + } + } + + private def findRestoreConsumerLagMetrics(kafkaStreams: KafkaStreams): Seq[Metric] = { + for { + thread <- getThreads(kafkaStreams).toSeq + restoreConsumer = getRestoreConsumer(thread) + (name, recordsLag) <- findConsumerLagMetric(restoreConsumer) + } yield { + recordsLag + } + } + + private def findConsumerLagMetric(restoreConsumer: Consumer[Array[Byte], Array[Byte]]) = { + restoreConsumer.metrics().asScala.find { + case (metricName, metric) => metricName.name() == "records-lag" + } + } + + private def 
getThreads(kafkaStreams: KafkaStreams) = { + ReflectionUtils.getFinalField[Array[StreamThread]](anyRef = kafkaStreams, fieldName = "threads") + } + + private def getRestoreConsumer(thread: StreamThread) = { + ReflectionUtils.getFinalField[Consumer[Array[Byte], Array[Byte]]](thread, "restoreConsumer") + } +} diff --git a/kafka-streams/kafka-streams-prerestore/src/test/resources/BUILD b/kafka-streams/kafka-streams-prerestore/src/test/resources/BUILD new file mode 100644 index 0000000000..9237675c63 --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/test/resources/BUILD @@ -0,0 +1,3 @@ +resources( + sources = globs("*.xml"), +) diff --git a/kafka-streams/kafka-streams-prerestore/src/test/resources/logback-test.xml b/kafka-streams/kafka-streams-prerestore/src/test/resources/logback-test.xml new file mode 100644 index 0000000000..6c2fbb5ebe --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/test/resources/logback-test.xml @@ -0,0 +1,90 @@ + + + + %date %.-3level %-25logger{0} %msg%n + + + + + + %red(%date [%thread] %.-3level %-25logger{0} %msg%n) + + + + + %red(%date [%thread] %.-3level %-25logger{0} %msg%n) + + + + + + %blue(%date [%thread] %.-3level %-25logger{0} %msg%n) + + + + + + %green(%date [%thread] %.-3level %-25logger{0} %msg%n) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/kafka-streams/kafka-streams-prerestore/src/test/scala/BUILD b/kafka-streams/kafka-streams-prerestore/src/test/scala/BUILD new file mode 100644 index 0000000000..1d6aa0f7c8 --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/test/scala/BUILD @@ -0,0 +1,21 @@ +junit_tests( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + strict_deps = False, + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-client", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-server", + "finatra/http/src/test/scala:test-deps", + "finatra/inject/inject-core/src/test/scala:test-deps", + "finatra/inject/inject-server/src/test/scala:test-deps", + "finatra/kafka-streams/kafka-streams-prerestore/src/main/scala", + "finatra/kafka-streams/kafka-streams-prerestore/src/test/resources", + "finatra/kafka-streams/kafka-streams/src/test/scala:test-deps", + "finatra/kafka/src/test/scala:test-deps", + ], +) diff --git a/kafka-streams/kafka-streams-prerestore/src/test/scala/com/twitter/finatra/kafkastreams/integration/wordcount/PreRestoreWordCountRocksDbServer.scala b/kafka-streams/kafka-streams-prerestore/src/test/scala/com/twitter/finatra/kafkastreams/integration/wordcount/PreRestoreWordCountRocksDbServer.scala new file mode 100644 index 0000000000..e7621ff58b --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/test/scala/com/twitter/finatra/kafkastreams/integration/wordcount/PreRestoreWordCountRocksDbServer.scala @@ -0,0 +1,25 @@ +package com.twitter.finatra.kafkastreams.integration.wordcount + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.streams.prerestore.PreRestoreState +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Materialized, Produced, 
Serialized} + +class PreRestoreWordCountRocksDbServer extends KafkaStreamsTwitterServer with PreRestoreState { + + override val name = "wordcount" + private val countStoreName = "CountsStore" + flag("hack_to_allow_explicit_http_port_below", "hack", "hack") + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder.asScala + .stream("TextLinesTopic")(Consumed.`with`(Serdes.Bytes, Serdes.String)) + .flatMapValues(_.split(' ')) + .groupBy((_, word) => word)(Serialized.`with`(Serdes.String, Serdes.String)) + .count()(Materialized.as(countStoreName)) + .toStream + .to("WordsWithCountsTopic")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } +} diff --git a/kafka-streams/kafka-streams-prerestore/src/test/scala/com/twitter/finatra/kafkastreams/integration/wordcount/PreRestoreWordCountServerFeatureTest.scala b/kafka-streams/kafka-streams-prerestore/src/test/scala/com/twitter/finatra/kafkastreams/integration/wordcount/PreRestoreWordCountServerFeatureTest.scala new file mode 100644 index 0000000000..5157e9684b --- /dev/null +++ b/kafka-streams/kafka-streams-prerestore/src/test/scala/com/twitter/finatra/kafkastreams/integration/wordcount/PreRestoreWordCountServerFeatureTest.scala @@ -0,0 +1,103 @@ +package com.twitter.finatra.kafkastreams.integration.wordcount + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.http.EmbeddedHttpServer +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.test.utils.InMemoryStatsUtil +import com.twitter.finatra.kafkastreams.test.KafkaStreamsMultiServerFeatureTest +import com.twitter.util.{Await, Duration} +import org.apache.kafka.common.serialization.Serdes +import org.scalatest.Ignore + +@Ignore +class PreRestoreWordCountServerFeatureTest extends KafkaStreamsMultiServerFeatureTest { + + private def createServer(preRestore: Boolean): EmbeddedHttpServer = { + new EmbeddedHttpServer( + new PreRestoreWordCountRocksDbServer, + flags = kafkaStreamsFlags ++ Map( + "kafka.application.num.instances" -> "1", + "kafka.prerestore.state" -> s"$preRestore", + "kafka.prerestore.duration" -> "1000.milliseconds", + "kafka.application.server" -> "0.foo.scosenza.service.smf1.twitter.com:12345", + "kafka.application.id" -> "wordcount-prod" + ) + ) + } + + override protected def kafkaCommitInterval: Duration = 1.second + + private val textLinesTopic = + kafkaTopic(ScalaSerdes.Long, Serdes.String, "TextLinesTopic", logPublish = false) + private val countsChangelogTopic = kafkaTopic( + Serdes.String, + Serdes.Long, + "wordcount-prod-CountsStore-changelog", + autoCreate = false + ) + private val wordsWithCountsTopic = kafkaTopic(Serdes.String, Serdes.Long, "WordsWithCountsTopic") + + test("word count") { + testInitialStartupWithoutPrerestore() + testRestartWithoutPrerestore() + testRestartWithoutPrerestore() + testRestartWithPrerestore() + } + + private def testInitialStartupWithoutPrerestore(): Unit = { + val server = createServer(preRestore = false) + //val countsStore = kafkaStateStore[String, Long]("CountsStore", server) + server.start() + val serverStats = InMemoryStatsUtil(server.injector) + + textLinesTopic.publish(1L -> "hello world hello") + /*countsStore.queryKeyValueUntilValue("hello", 2L) + countsStore.queryKeyValueUntilValue("world", 1L)*/ + + textLinesTopic.publish(1L -> "world world") + //countsStore.queryKeyValueUntilValue("world", 3L) + serverStats.waitForGauge("kafka/thread1/restore_consumer/records_consumed_total", 0) + + for (i <- 1 to 1000) { + textLinesTopic.publish(1L -> 
s"foo$i") + } + + server.close() + Await.result(server.mainResult) + server.clearStats() + resetStreamThreadId() + } + + private def testRestartWithoutPrerestore(): Unit = { + val server = createServer(preRestore = false) + /*val countsStore = kafkaStateStore[String, Long]("CountsStore", server) + server.start() + val serverStats = InMemoryStatsUtil(server.injector) + + countsStore.queryKeyValueUntilValue("hello", 2L) + countsStore.queryKeyValueUntilValue("world", 3L) + server.printStats() + //TODO byte consumed is >0 but records consumed is 0 :-/ serverStats.waitForGauge("kafka/thread1/restore_consumer/records_consumed_total", 2)*/ + + server.close() + Await.result(server.mainResult) + server.clearStats() + resetStreamThreadId() + } + + private def testRestartWithPrerestore(): Unit = { + val server = createServer(preRestore = true) + /*val countsStore = kafkaStateStore[String, Long]("CountsStore", server) + server.start() + val serverStats = InMemoryStatsUtil(server.injector) + + countsStore.queryKeyValueUntilValue("hello", 2L) + countsStore.queryKeyValueUntilValue("world", 3L) + serverStats.waitForGauge("kafka/thread1/restore_consumer/records_consumed_total", 0)*/ + + server.close() + Await.result(server.mainResult) + server.clearStats() + resetStreamThreadId() + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java/BUILD b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java/BUILD new file mode 100644 index 0000000000..2f44d8bcd7 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java/BUILD @@ -0,0 +1,8 @@ +java_library( + sources = rglobs("*.java"), + compiler_option_sets = {}, + dependencies = [ + ], + exports = [ + ], +) diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java/com/twitter/finatra/streams/queryable/thrift/client/partitioning/utils/KafkaUtils.java b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java/com/twitter/finatra/streams/queryable/thrift/client/partitioning/utils/KafkaUtils.java new file mode 100644 index 0000000000..390069eaac --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java/com/twitter/finatra/streams/queryable/thrift/client/partitioning/utils/KafkaUtils.java @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +//Note: Copied from Kafka Utils and Serde classes to prevent a query clients from depending on Kafka +package com.twitter.finatra.streams.queryable.thrift.client.partitioning.utils; + +//Note: All code below copied from Kafka 1.1 +@SuppressWarnings("checkstyle:off") +public final class KafkaUtils { + + private KafkaUtils() { + + } + + /** + * Generates 32 bit murmur2 hash from byte array + * @param data byte array to hash + * @return 32 bit hash of the given array + */ + @SuppressWarnings("fallthrough") + public static int murmur2(final byte[] data) { + int length = data.length; + int seed = 0x9747b28c; + // 'm' and 'r' are mixing constants generated offline. + // They're not really 'magic', they just happen to work well. + final int m = 0x5bd1e995; + final int r = 24; + + // Initialize the hash to a random value + int h = seed ^ length; + int length4 = length / 4; + + // SUPPRESS CHECKSTYLE:OFF LineLength + for (int i = 0; i < length4; i++) { + final int i4 = i * 4; + int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8) + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24); + k *= m; + k ^= k >>> r; + k *= m; + h *= m; + h ^= k; + } + + // Handle the last few bytes of the input array + switch (length % 4) { + // SUPPRESS CHECKSTYLE:OFF FallThrough + case 3: + h ^= (data[(length & ~3) + 2] & 0xff) << 16; + // SUPPRESS CHECKSTYLE:OFF FallThrough + case 2: + h ^= (data[(length & ~3) + 1] & 0xff) << 8; + // SUPPRESS CHECKSTYLE:OFF FallThrough + case 1: + h ^= data[length & ~3] & 0xff; + h *= m; + default: + } + + h ^= h >>> 13; + h *= m; + h ^= h >>> 15; + + return h; + } + + /** + * A cheap way to deterministically convert a number to a positive value. When the input is + * positive, the original value is returned. When the input number is negative, the returned + * positive value is the original value bit AND against 0x7fffffff which is not its absolutely + * value. + * + * Note: changing this method in the future will possibly cause partition selection not to be + * compatible with the existing messages already placed on a partition since it is used + * in producer's {@link org.apache.kafka.clients.producer.internals.DefaultPartitioner} + * + * @param number a given number + * @return a positive number. 
+ */ + public static int toPositive(int number) { + return number & 0x7fffffff; + } + + // SUPPRESS CHECKSTYLE:OFF JavadocMethodRegex + public static byte[] serializeLong(Long data) { + if (data == null) { + return null; + } + + return new byte[] { + (byte) (data >>> 56), + (byte) (data >>> 48), + (byte) (data >>> 40), + (byte) (data >>> 32), + (byte) (data >>> 24), + (byte) (data >>> 16), + (byte) (data >>> 8), + data.byteValue() + }; + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/BUILD b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/BUILD new file mode 100644 index 0000000000..c9e26577fa --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/BUILD @@ -0,0 +1,23 @@ +scala_library( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + java_sources = [ + "finatra/kafka-streams/kafka-streams-queryable-thrift-client/src/main/java", + ], + provides = scala_artifact( + org = "com.twitter", + name = "finatra-streams-queryable-thrift-client", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "3rdparty/jvm/com/twitter/bijection:core", + "finagle/finagle-serversets", + "finatra/inject/inject-thrift-client", + ], + exports = [ + "3rdparty/jvm/com/twitter/bijection:core", + "finagle/finagle-serversets", + "finatra/inject/inject-thrift-client", + ], +) diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/NoRequestedBrokersAvailableException.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/NoRequestedBrokersAvailableException.scala new file mode 100644 index 0000000000..e7fd8a629b --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/NoRequestedBrokersAvailableException.scala @@ -0,0 +1,17 @@ +package com.twitter.finagle.loadbalancer + +import com.twitter.finagle.{Dtab, RequestException, SourcedException} + +class NoRequestedBrokersAvailableException( + val name: String, + val baseDtab: Dtab, + val localDtab: Dtab) + extends RequestException + with SourcedException { + def this(name: String = "unknown") = this(name, Dtab.empty, Dtab.empty) + + override def exceptionMessage: String = + s"No requested hosts are available for $name, Dtab.base=[${baseDtab.show}], Dtab.local=[${localDtab.show}]" + + serviceName = name +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/NoRequestedShardIdsException.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/NoRequestedShardIdsException.scala new file mode 100644 index 0000000000..a263a49ac6 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/NoRequestedShardIdsException.scala @@ -0,0 +1,15 @@ +package com.twitter.finagle.loadbalancer + +import com.twitter.finagle.{Dtab, RequestException, SourcedException} + +class NoRequestedShardIdsException(val name: String, val baseDtab: Dtab, val localDtab: Dtab) + extends RequestException + with SourcedException { + def this(name: String = "unknown") = this(name, Dtab.empty, Dtab.empty) + + override def exceptionMessage: String = + s"No requested shard ids set in the local context. 
The ShardIdAwareRoundRobinBalancer should " + + s"only be used when there's always a requested shard id" + + serviceName = name +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/ShardIdAwareRoundRobinBalancer.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/ShardIdAwareRoundRobinBalancer.scala new file mode 100644 index 0000000000..0df9a9b925 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finagle/loadbalancer/ShardIdAwareRoundRobinBalancer.scala @@ -0,0 +1,206 @@ +package com.twitter.finagle.loadbalancer + +import com.twitter.finagle.Address.Inet +import com.twitter.finagle._ +import com.twitter.finagle.context.Contexts +import com.twitter.finagle.serverset2.addr.ZkMetadata +import com.twitter.finagle.stats.{Counter, StatsReceiver} +import com.twitter.finatra.streams.queryable.thrift.domain.{RequestedShardIds, ServiceShardId} +import com.twitter.inject.Logging +import com.twitter.util.{Activity, Future, Time} +import scala.collection.mutable + +//TODO: DRY with com.twitter.finagle.loadbalancer.Balancers +object ShardIdAwareRoundRobinBalancer { + def create: LoadBalancerFactory = new LoadBalancerFactory { + override def newBalancer[Req, Rep]( + endpoints: Activity[IndexedSeq[EndpointFactory[Req, Rep]]], + exc: NoBrokersAvailableException, + params: Stack.Params + ): ServiceFactory[Req, Rep] = { + val sr = params[param.Stats].statsReceiver + //Note: We set maxEffort to 1 since our pick has it's own retry logic to try to find an open shard //TODO: Verify this + val balancer = new ShardIdAwareRoundRobinBalancer(endpoints, sr, exc, maxEffort = 1) + newScopedBal(params[param.Label].label, sr, "round_robin", balancer) + } + + override def toString: String = "ShardAwareRoundRobin" + } + + private def newScopedBal[Req, Rep]( + label: String, + sr: StatsReceiver, + lbType: String, + bal: ServiceFactory[Req, Rep] + ): ServiceFactory[Req, Rep] = { + bal match { + case balancer: Balancer[Req, Rep] => balancer.register(label) + case _ => () + } + + new ServiceFactoryProxy(bal) { + private[this] val typeGauge = sr.scope("algorithm").addGauge(lbType)(1) + + override def close(when: Time): Future[Unit] = { + typeGauge.remove() + super.close(when) + } + } + } +} + +/** + * A shard id aware balancer that lets clients specify a list of requested shard ids through a Local + * + * @see com.twitter.finatra.streams.queryable.thrift.client.internal.RequestedShardIds + * + * This class is adapted from Finagle RoundRobinLoadBalancer + * TODO: DRY with RoundRobinLoadBalancer + * + * TODO: This load balancer currently only works with zookeeper-backed destinations. + * A general solution could extract a ShardIdentifier interface e.g. + * trait ShardIdentifier { + * def fromAddrMetadata(meta: Addr.Metadata): Option[Int] + * } + * + * which we could then implement when implementing a new Announcer/Resolver pair ie. 
+ * + * class ZkShardIdentifier extends ShardIdentifier { + * def fromAddrMetadata(meta: Addr.Metadata): Option[Int] = ZkMetadata.fromAddrMetadata(meta).shardId + * } + */ +private final class ShardIdAwareRoundRobinBalancer[Req, Rep]( + protected val endpoints: Activity[IndexedSeq[EndpointFactory[Req, Rep]]], + protected val statsReceiver: StatsReceiver, + protected val emptyException: NoBrokersAvailableException, + protected val maxEffort: Int = 5) + extends ServiceFactory[Req, Rep] + with Balancer[Req, Rep] + with Updating[Req, Rep] + with Logging { + + protected[this] val maxEffortExhausted: Counter = statsReceiver.counter("max_effort_exhausted") + + override def additionalMetadata: Map[String, Any] = Map.empty + + override def initDistributor(): Distributor = new Distributor(Vector.empty) + + override def newNode(factory: EndpointFactory[Req, Rep]): Node = new Node(factory) + + override def failingNode(cause: Throwable): Node = new Node(new FailingEndpointFactory(cause)) + + protected class Node(val factory: EndpointFactory[Req, Rep]) + extends ServiceFactoryProxy[Req, Rep](factory) + with NodeT[Req, Rep] { + + // Note: These stats are never updated. + override def load: Double = 0.0 + + override def pending: Int = 0 + + override def close(deadline: Time): Future[Unit] = { + factory.close(deadline) + } + + override def apply(conn: ClientConnection): Future[Service[Req, Rep]] = { + factory(conn) + } + + //Updated from RoundRobinBalancer + def shardId: Option[Int] = { + val shardId = factory.address match { + case inet: Inet => + for { + zkMetadata <- ZkMetadata.fromAddrMetadata(inet.metadata) + shardId <- zkMetadata.shardId + } yield { + shardId + } + case _ => + None + } + + if (shardId.isEmpty) { + //TODO: Should this be fatal? + error( + s"ShardIdAwareRoundRobinBalancer should only be used with nodes containing a shardId. No shardId found for $this" + ) + } + + shardId + } + } + + /** + * A simple round robin distributor. + */ + protected class Distributor(vector: Vector[Node]) extends DistributorT[Node](vector) { + type This = Distributor + + /** + * Indicates if we've seen any down nodes during `pick` which we expected to be available + */ + @volatile + protected[this] var sawDown = false + + /** + * `up` is the Vector of nodes that were `Status.Open` at creation time. + * `down` is the Vector of nodes that were not `Status.Open` at creation time. + */ + private[this] val (up: Vector[Node], down: Vector[Node]) = vector.partition(_.isAvailable) + + //Updated for ShardAware balancing + private val shardIdToNode = new mutable.LongMap[Node](vector.size) + + /* Populate shardIdToNode map at construction time */ + for { + node <- vector + shardId <- node.shardId + prevNode <- shardIdToNode.put(shardId, node) + } { + warn(s"Multiple nodes assigned the same shardId! prevNode: $prevNode newNode: $node") //TODO: Zombie detection? 
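+      // Note: mutable.LongMap#put returns the value previously bound to the key, so the
+      // `prevNode <- ...` generator only yields (and this warning only fires) when two
+      // distinct nodes report the same shardId.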
+ } + + private val availableShardIds = shardIdToNode.keys.map(id => ServiceShardId(id.toInt)).toSet + + private val noBrokersAvailableNode = failingNode(new NoBrokersAvailableException) + private val noRequestedBrokersAvailableNode = failingNode( + new NoRequestedBrokersAvailableException + ) + private val noRequestedShardIdsNode = failingNode(new NoRequestedShardIdsException) + + override def pick(): Node = { + if (vector.isEmpty) { + return noBrokersAvailableNode + } + + Contexts.local.get(RequestedShardIds.requestedShardIdsKey) match { + case Some(requestedShardIds) => + requestedShardIds.chooseShardId(availableShardIds) match { + case Some(chosenShardId) => + val node = shardIdToNode(chosenShardId.id) //Note: chooseShardId ensures that we choose a shardId that exists in the map + if (node.status != Status.Open) { + sawDown = true + } + debug(s"PICK with requested shards $requestedShardIds selects node $node") + node + case None => + noRequestedBrokersAvailableNode + } + case _ => // + noRequestedShardIdsNode + } + } + + override def rebuild(): This = new Distributor(vector) + + override def rebuild(vector: Vector[Node]): This = new Distributor(vector) + + // while the `nonEmpty` check isn't necessary, it is an optimization + // to avoid the iterator allocation in the common case where `down` + // is empty. + override def needsRebuild: Boolean = { + sawDown || (down.nonEmpty && down.exists(_.isAvailable)) + } + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/KafkaGroupId.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/KafkaGroupId.scala new file mode 100644 index 0000000000..5b9b01ca9b --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/KafkaGroupId.scala @@ -0,0 +1,3 @@ +package com.twitter.finatra.streams.queryable.thrift.domain + +case class KafkaGroupId(id: Int) diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/KafkaPartitionId.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/KafkaPartitionId.scala new file mode 100644 index 0000000000..75c2d215ab --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/KafkaPartitionId.scala @@ -0,0 +1,3 @@ +package com.twitter.finatra.streams.queryable.thrift.domain + +case class KafkaPartitionId(id: Int) diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/RequestedShardIds.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/RequestedShardIds.scala new file mode 100644 index 0000000000..ea358b0ddf --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/RequestedShardIds.scala @@ -0,0 +1,31 @@ +package com.twitter.finatra.streams.queryable.thrift.domain + +import com.twitter.finagle.context.Contexts +import com.twitter.finagle.context.Contexts.local + +object RequestedShardIds { + val requestedShardIdsKey: local.Key[RequestedShardIds] = + new Contexts.local.Key[RequestedShardIds]() +} + +/** + * Requested shard ids in order of 
preference + */ +case class RequestedShardIds(shardIds: IndexedSeq[ServiceShardId]) { + + private[this] var nextShardIdx: Int = 0 + + //TODO: Also check for Node with state = Open? + def chooseShardId(availableShardIds: Set[ServiceShardId]): Option[ServiceShardId] = { + var choiceNum = 0 + while (choiceNum < shardIds.size) { + val nextShardId = shardIds(nextShardIdx) + nextShardIdx = math.abs((nextShardIdx + 1) % shardIds.size) + if (availableShardIds.contains(nextShardId)) { + return Some(nextShardId) + } + choiceNum += 1 + } + None + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/ServiceShardId.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/ServiceShardId.scala new file mode 100644 index 0000000000..91f0dde57c --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/domain/ServiceShardId.scala @@ -0,0 +1,3 @@ +package com.twitter.finatra.streams.queryable.thrift.domain + +case class ServiceShardId(id: Int) diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/KafkaPartitioner.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/KafkaPartitioner.scala new file mode 100644 index 0000000000..8950c5691e --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/KafkaPartitioner.scala @@ -0,0 +1,25 @@ +package com.twitter.finatra.streams.queryable.thrift.partitioning + +import com.twitter.finatra.streams.queryable.thrift.client.partitioning.utils.KafkaUtils.{ + murmur2, + toPositive +} +import com.twitter.finatra.streams.queryable.thrift.domain.{KafkaPartitionId, ServiceShardId} + +object KafkaPartitioner { + def partitionId(numPartitions: Int, keyBytes: Array[Byte]): KafkaPartitionId = { + val partitionId = toPositive(murmur2(keyBytes)) % numPartitions + KafkaPartitionId(partitionId) + } +} + +case class KafkaPartitioner(serviceShardPartitioner: ServiceShardPartitioner, numPartitions: Int) { + + def shardIds(keyBytes: Array[Byte]): IndexedSeq[ServiceShardId] = { + val kafkaPartitionId = KafkaPartitioner.partitionId(numPartitions, keyBytes) + IndexedSeq( + serviceShardPartitioner.activeShardId(kafkaPartitionId), + serviceShardPartitioner.standbyShardIds(kafkaPartitionId).head + ) + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/ServiceShardPartitioner.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/ServiceShardPartitioner.scala new file mode 100644 index 0000000000..937d60a504 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/ServiceShardPartitioner.scala @@ -0,0 +1,11 @@ +package com.twitter.finatra.streams.queryable.thrift.partitioning + +import com.twitter.finatra.streams.queryable.thrift.domain.{KafkaPartitionId, ServiceShardId} + +trait ServiceShardPartitioner { + def shardIds(numPartitions: Int, keyBytes: Array[Byte]): IndexedSeq[ServiceShardId] + + def activeShardId(kafkaPartitionId: KafkaPartitionId): ServiceShardId + + def 
standbyShardIds(kafkaPartitionId: KafkaPartitionId): Seq[ServiceShardId] +} diff --git a/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/StaticServiceShardPartitioner.scala b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/StaticServiceShardPartitioner.scala new file mode 100644 index 0000000000..898c8defc1 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala/com/twitter/finatra/streams/queryable/thrift/partitioning/StaticServiceShardPartitioner.scala @@ -0,0 +1,37 @@ +package com.twitter.finatra.streams.queryable.thrift.partitioning + +import com.twitter.finatra.streams.queryable.thrift.client.partitioning.utils.KafkaUtils.{ + murmur2, + toPositive +} +import com.twitter.finatra.streams.queryable.thrift.domain.{KafkaPartitionId, ServiceShardId} + +object StaticServiceShardPartitioner { + def partitionId(numPartitions: Int, keyBytes: Array[Byte]): KafkaPartitionId = { + val partitionId = toPositive(murmur2(keyBytes)) % numPartitions + KafkaPartitionId(partitionId) + } +} + +case class StaticServiceShardPartitioner(numShards: Int) extends ServiceShardPartitioner { + + override def activeShardId(kafkaPartitionId: KafkaPartitionId): ServiceShardId = { + ServiceShardId(kafkaPartitionId.id % numShards) + } + + override def standbyShardIds(kafkaPartitionId: KafkaPartitionId): Seq[ServiceShardId] = { + Seq(standbyShardId(kafkaPartitionId)) + } + + override def shardIds(numPartitions: Int, keyBytes: Array[Byte]): IndexedSeq[ServiceShardId] = { + val kafkaPartitionId = StaticServiceShardPartitioner.partitionId(numPartitions, keyBytes) + IndexedSeq(activeShardId(kafkaPartitionId), standbyShardId(kafkaPartitionId)) + } + + //Note: We divide numShards by 2 so that standby tasks will be assigned on a different + // "side" of the available shards. 
This allows deploys to upgrade the first half of the + // shards while the second half of the shards can remain with the full set of tasks + def standbyShardId(kafkaPartitionId: KafkaPartitionId): ServiceShardId = { + ServiceShardId((kafkaPartitionId.id + (numShards / 2)) % numShards) + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift/src/main/scala/BUILD b/kafka-streams/kafka-streams-queryable-thrift/src/main/scala/BUILD new file mode 100644 index 0000000000..1bef8b2dd6 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift/src/main/scala/BUILD @@ -0,0 +1,27 @@ +scala_library( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-streams-queryable-thrift", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "finatra/inject/inject-core", + "finatra/inject/inject-server", + "finatra/inject/inject-slf4j", + "finatra/inject/inject-utils", + "finatra/kafka-streams/kafka-streams-static-partitioning/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/main/java", + "finatra/thrift", + ], + exports = [ + "finatra/inject/inject-core", + "finatra/inject/inject-server", + "finatra/inject/inject-slf4j", + "finatra/inject/inject-utils", + "finatra/kafka-streams/kafka-streams/src/main/java", + "finatra/thrift", + ], +) diff --git a/kafka-streams/kafka-streams-queryable-thrift/src/main/scala/com/twitter/finatra/streams/queryable/thrift/QueryableState.scala b/kafka-streams/kafka-streams-queryable-thrift/src/main/scala/com/twitter/finatra/streams/queryable/thrift/QueryableState.scala new file mode 100644 index 0000000000..7141ea270a --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift/src/main/scala/com/twitter/finatra/streams/queryable/thrift/QueryableState.scala @@ -0,0 +1,64 @@ +package com.twitter.finatra.streams.queryable.thrift + +import com.twitter.app.Flag +import com.twitter.finatra.streams.partitioning.StaticPartitioning +import com.twitter.finatra.streams.query.{ + QueryableFinatraCompositeWindowStore, + QueryableFinatraKeyValueStore, + QueryableFinatraWindowStore +} +import com.twitter.util.Duration +import org.apache.kafka.common.serialization.Serde + +trait QueryableState extends StaticPartitioning { + protected val currentShard: Flag[Int] = flag[Int]("kafka.current.shard", "") + protected val numQueryablePartitions: Flag[Int] = flag[Int]("kafka.num.queryable.partitions", "") + + protected def queryableFinatraKeyValueStore[PK, K, V]( + storeName: String, + primaryKeySerde: Serde[PK] + ): QueryableFinatraKeyValueStore[PK, K, V] = { + new QueryableFinatraKeyValueStore[PK, K, V]( + storeName, + primaryKeySerde, + numApplicationInstances(), + numQueryablePartitions(), + currentShard()) + } + + protected def queryableFinatraWindowStore[K, V]( + storeName: String, + windowSize: Duration, + primaryKeySerde: Serde[K] + ): QueryableFinatraWindowStore[K, V] = { + new QueryableFinatraWindowStore[K, V]( + storeName, + windowSize = windowSize, + keySerde = primaryKeySerde, + numShards = numApplicationInstances(), + numQueryablePartitions = numQueryablePartitions(), + currentShardId = currentShard()) + } + + @deprecated("Use queryableFinatraWindowStore without a windowSize", "1/7/2019") + protected def queryableFinatraWindowStore[K, V]( + storeName: String, + primaryKeySerde: Serde[K] + ): QueryableFinatraWindowStore[K, V] = { + queryableFinatraWindowStore(storeName, null, primaryKeySerde) + } + + protected def queryableFinatraCompositeWindowStore[PK, 
SK, V]( + storeName: String, + windowSize: Duration, + primaryKeySerde: Serde[PK] + ): QueryableFinatraCompositeWindowStore[PK, SK, V] = { + new QueryableFinatraCompositeWindowStore[PK, SK, V]( + storeName, + windowSize = windowSize, + primaryKeySerde = primaryKeySerde, + numShards = numApplicationInstances(), + numQueryablePartitions = numQueryablePartitions(), + currentShardId = currentShard()) + } +} diff --git a/kafka-streams/kafka-streams-queryable-thrift/src/test/resources/BUILD b/kafka-streams/kafka-streams-queryable-thrift/src/test/resources/BUILD new file mode 100644 index 0000000000..9237675c63 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift/src/test/resources/BUILD @@ -0,0 +1,3 @@ +resources( + sources = globs("*.xml"), +) diff --git a/kafka-streams/kafka-streams-queryable-thrift/src/test/resources/logback-test.xml b/kafka-streams/kafka-streams-queryable-thrift/src/test/resources/logback-test.xml new file mode 100644 index 0000000000..5d2d081597 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift/src/test/resources/logback-test.xml @@ -0,0 +1,32 @@ + + + + %.-3level %-100logger %msg%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/kafka-streams/kafka-streams-queryable-thrift/src/test/thrift/BUILD b/kafka-streams/kafka-streams-queryable-thrift/src/test/thrift/BUILD new file mode 100644 index 0000000000..f207a073f0 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift/src/test/thrift/BUILD @@ -0,0 +1,12 @@ +create_thrift_libraries( + base_name = "thrift", + sources = rglobs("*.thrift"), + dependency_roots = [ + ], + generate_languages = [ + "java", + "scala", + ], + provides_java_name = "kafka-streams-thrift-java", + provides_scala_name = "kafka-streams-thrift-scala", +) diff --git a/kafka-streams/kafka-streams-queryable-thrift/src/test/thrift/queryabletest.thrift b/kafka-streams/kafka-streams-queryable-thrift/src/test/thrift/queryabletest.thrift new file mode 100644 index 0000000000..9e9b3faad5 --- /dev/null +++ b/kafka-streams/kafka-streams-queryable-thrift/src/test/thrift/queryabletest.thrift @@ -0,0 +1,10 @@ +namespace java com.twitter.finatra.messaging.kafkastreams.thrift +#@namespace scala com.twitter.finatra.messaging.kafkastreams.thriftscala + +struct TweetId { + 1: i64 id +} + +struct UserId { + 1: i64 id +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/java/BUILD b/kafka-streams/kafka-streams-static-partitioning/src/main/java/BUILD new file mode 100644 index 0000000000..f7c760ae10 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/java/BUILD @@ -0,0 +1,15 @@ +java_library( + sources = rglobs("*.java"), + compiler_option_sets = {}, + provides = artifact( + org = "com.twitter", + name = "finatra-streams-static-partitioning-java", + repo = artifactory, + ), + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-streams", + ], + exports = [ + ], +) diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/java/org/apache/kafka/streams/processor/internals/OverridableStreamsPartitionAssignor.java b/kafka-streams/kafka-streams-static-partitioning/src/main/java/org/apache/kafka/streams/processor/internals/OverridableStreamsPartitionAssignor.java new file mode 100644 index 0000000000..2387b20ce5 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/java/org/apache/kafka/streams/processor/internals/OverridableStreamsPartitionAssignor.java @@ -0,0 +1,989 @@ +/* + * Licensed to the 
Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.kafka.streams.processor.internals; + +// SUPPRESS CHECKSTYLE:OFF ImportOrder +// SUPPRESS CHECKSTYLE:OFF LineLength +// SUPPRESS CHECKSTYLE:OFF ModifierOrder +// SUPPRESS CHECKSTYLE:OFF OperatorWrap +// SUPPRESS CHECKSTYLE:OFF HiddenField +// SUPPRESS CHECKSTYLE:OFF NeedBraces +// SUPPRESS CHECKSTYLE:OFF NestedForDepth +// SUPPRESS CHECKSTYLE:OFF JavadocStyle +// SUPPRESS CHECKSTYLE:OFF NestedForDepth + +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collection; +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.HashSet; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.UUID; +import java.util.concurrent.atomic.AtomicBoolean; + +import org.apache.kafka.clients.CommonClientConfigs; +import org.apache.kafka.clients.consumer.internals.PartitionAssignor; +import org.apache.kafka.common.Cluster; +import org.apache.kafka.common.Configurable; +import org.apache.kafka.common.KafkaException; +import org.apache.kafka.common.Node; +import org.apache.kafka.common.PartitionInfo; +import org.apache.kafka.common.TopicPartition; +import org.apache.kafka.common.config.ConfigException; +import org.apache.kafka.common.utils.LogContext; +import org.apache.kafka.common.utils.Utils; +import org.apache.kafka.streams.StreamsConfig; +import org.apache.kafka.streams.errors.StreamsException; +import org.apache.kafka.streams.errors.TaskAssignmentException; +import org.apache.kafka.streams.processor.PartitionGrouper; +import org.apache.kafka.streams.processor.TaskId; +import org.apache.kafka.clients.consumer.internals.PartitionAssignor.Assignment; +import org.apache.kafka.clients.consumer.internals.PartitionAssignor.Subscription; +import org.apache.kafka.streams.processor.internals.assignment.AssignmentInfo; +import org.apache.kafka.streams.processor.internals.assignment.ClientState; +import org.apache.kafka.streams.processor.internals.assignment.StickyTaskAssignor; +import org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo; +import org.apache.kafka.streams.processor.internals.assignment.TaskAssignor; +import org.apache.kafka.streams.state.HostInfo; +import org.slf4j.Logger; + +import static org.apache.kafka.common.utils.Utils.getHost; +import static org.apache.kafka.common.utils.Utils.getPort; + +//Note: The following class is copied from Kafka Streams 2.0.0 with the only changes being +// those that allow overriding TaskAssignor (see 'TWITTER CHANGED' comment) +public class OverridableStreamsPartitionAssignor implements PartitionAssignor, Configurable { + + private final static int UNKNOWN = -1; + public final static int NOT_AVAILABLE = -2; + private final 
static int VERSION_ONE = 1; + private final static int VERSION_TWO = 2; + private final static int VERSION_THREE = 3; + private final static int EARLIEST_PROBEABLE_VERSION = VERSION_THREE; + private int minReceivedMetadataVersion = UNKNOWN; + protected Set supportedVersions = new HashSet<>(); + + private Logger log; + private String logPrefix; + + private static class AssignedPartition implements Comparable { + public final TaskId taskId; + public final TopicPartition partition; + + AssignedPartition(final TaskId taskId, + final TopicPartition partition) { + this.taskId = taskId; + this.partition = partition; + } + + @Override + public int compareTo(final AssignedPartition that) { + return PARTITION_COMPARATOR.compare(this.partition, that.partition); + } + + @Override + public boolean equals(final Object o) { + if (!(o instanceof AssignedPartition)) { + return false; + } + final AssignedPartition other = (AssignedPartition) o; + return compareTo(other) == 0; + } + + @Override + public int hashCode() { + // Only partition is important for compareTo, equals and hashCode. + return partition.hashCode(); + } + } + + public static class ClientMetadata { + public final HostInfo hostInfo; + public final Set consumers; + public final ClientState state; + + ClientMetadata(final String endPoint) { + + // get the host info if possible + if (endPoint != null) { + final String host = getHost(endPoint); + final Integer port = getPort(endPoint); + + if (host == null || port == null) { + throw new ConfigException(String.format("Error parsing host address %s. Expected format host:port.", endPoint)); + } + + hostInfo = new HostInfo(host, port); + } else { + hostInfo = null; + } + + // initialize the consumer memberIds + consumers = new HashSet<>(); + + // initialize the client state + state = new ClientState(); + } + + void addConsumer(final String consumerMemberId, + final SubscriptionInfo info) { + consumers.add(consumerMemberId); + state.addPreviousActiveTasks(info.prevTasks()); + state.addPreviousStandbyTasks(info.standbyTasks()); + state.incrementCapacity(); + } + + @Override + public String toString() { + return "ClientMetadata{" + + "hostInfo=" + hostInfo + + ", consumers=" + consumers + + ", state=" + state + + '}'; + } + } + + static class InternalTopicMetadata { + public final InternalTopicConfig config; + + public int numPartitions; + + InternalTopicMetadata(final InternalTopicConfig config) { + this.config = config; + this.numPartitions = UNKNOWN; + } + + @Override + public String toString() { + return "InternalTopicMetadata(" + + "config=" + config + + ", numPartitions=" + numPartitions + + ")"; + } + } + + protected static final Comparator PARTITION_COMPARATOR = new Comparator() { + @Override + public int compare(final TopicPartition p1, + final TopicPartition p2) { + final int result = p1.topic().compareTo(p2.topic()); + + if (result != 0) { + return result; + } else { + return Integer.compare(p1.partition(), p2.partition()); + } + } + }; + + private String userEndPoint; + private int numStandbyReplicas; + + private TaskManager taskManager; + private PartitionGrouper partitionGrouper; + private AtomicBoolean versionProbingFlag; + + protected int usedSubscriptionMetadataVersion = SubscriptionInfo.LATEST_SUPPORTED_VERSION; + + private InternalTopicManager internalTopicManager; + private CopartitionedTopicsValidator copartitionedTopicsValidator; + + protected String userEndPoint() { + return userEndPoint; + } + + protected TaskManager taskManger() { + return taskManager; + } + + /** + * We need 
to have the PartitionAssignor and its StreamThread to be mutually accessible + * since the former needs later's cached metadata while sending subscriptions, + * and the latter needs former's returned assignment when adding tasks. + * @throws KafkaException if the stream thread is not specified + */ + @Override + public void configure(final Map configs) { + final StreamsConfig streamsConfig = new StreamsConfig(configs); + + // Setting the logger with the passed in client thread name + logPrefix = String.format("stream-thread [%s] ", streamsConfig.getString(CommonClientConfigs.CLIENT_ID_CONFIG)); + final LogContext logContext = new LogContext(logPrefix); + log = logContext.logger(getClass()); + + final String upgradeFrom = streamsConfig.getString(StreamsConfig.UPGRADE_FROM_CONFIG); + if (upgradeFrom != null) { + switch (upgradeFrom) { + case StreamsConfig.UPGRADE_FROM_0100: + log.info("Downgrading metadata version from {} to 1 for upgrade from 0.10.0.x.", SubscriptionInfo.LATEST_SUPPORTED_VERSION); + usedSubscriptionMetadataVersion = VERSION_ONE; + break; + case StreamsConfig.UPGRADE_FROM_0101: + case StreamsConfig.UPGRADE_FROM_0102: + case StreamsConfig.UPGRADE_FROM_0110: + case StreamsConfig.UPGRADE_FROM_10: + case StreamsConfig.UPGRADE_FROM_11: + log.info("Downgrading metadata version from {} to 2 for upgrade from {}.x.", SubscriptionInfo.LATEST_SUPPORTED_VERSION, upgradeFrom); + usedSubscriptionMetadataVersion = VERSION_TWO; + break; + default: + throw new IllegalArgumentException("Unknown configuration value for parameter 'upgrade.from': " + upgradeFrom); + } + } + + final Object o = configs.get(StreamsConfig.InternalConfig.TASK_MANAGER_FOR_PARTITION_ASSIGNOR); + if (o == null) { + final KafkaException fatalException = new KafkaException("TaskManager is not specified"); + log.error(fatalException.getMessage(), fatalException); + throw fatalException; + } + + if (!(o instanceof TaskManager)) { + final KafkaException fatalException = new KafkaException(String.format("%s is not an instance of %s", o.getClass().getName(), TaskManager.class.getName())); + log.error(fatalException.getMessage(), fatalException); + throw fatalException; + } + + taskManager = (TaskManager) o; + + final Object o2 = configs.get(StreamsConfig.InternalConfig.VERSION_PROBING_FLAG); + if (o2 == null) { + final KafkaException fatalException = new KafkaException("VersionProbingFlag is not specified"); + log.error(fatalException.getMessage(), fatalException); + throw fatalException; + } + + if (!(o2 instanceof AtomicBoolean)) { + final KafkaException fatalException = new KafkaException(String.format("%s is not an instance of %s", o2.getClass().getName(), AtomicBoolean.class.getName())); + log.error(fatalException.getMessage(), fatalException); + throw fatalException; + } + + versionProbingFlag = (AtomicBoolean) o2; + + numStandbyReplicas = streamsConfig.getInt(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG); + + partitionGrouper = streamsConfig.getConfiguredInstance(StreamsConfig.PARTITION_GROUPER_CLASS_CONFIG, PartitionGrouper.class); + + final String userEndPoint = streamsConfig.getString(StreamsConfig.APPLICATION_SERVER_CONFIG); + if (userEndPoint != null && !userEndPoint.isEmpty()) { + try { + final String host = getHost(userEndPoint); + final Integer port = getPort(userEndPoint); + + if (host == null || port == null) + throw new ConfigException(String.format("%s Config %s isn't in the correct format. 
Expected a host:port pair" + + " but received %s", + logPrefix, StreamsConfig.APPLICATION_SERVER_CONFIG, userEndPoint)); + } catch (final NumberFormatException nfe) { + throw new ConfigException(String.format("%s Invalid port supplied in %s for config %s", + logPrefix, userEndPoint, StreamsConfig.APPLICATION_SERVER_CONFIG)); + } + + this.userEndPoint = userEndPoint; + } + + internalTopicManager = new InternalTopicManager(taskManager.adminClient, streamsConfig); + + copartitionedTopicsValidator = new CopartitionedTopicsValidator(logPrefix); + } + + //TWITTER COMMENT: We keep the assignor name as "stream" since our overridden TaskAssignor is compatibility with the default StickyTaskAssignor + @Override + public String name() { + return "stream"; + } + + @Override + public Subscription subscription(final Set topics) { + // Adds the following information to subscription + // 1. Client UUID (a unique id assigned to an instance of KafkaStreams) + // 2. Task ids of previously running tasks + // 3. Task ids of valid local states on the client's state directory. + + final Set previousActiveTasks = taskManager.prevActiveTaskIds(); + final Set standbyTasks = taskManager.cachedTasksIds(); + standbyTasks.removeAll(previousActiveTasks); + final SubscriptionInfo data = new SubscriptionInfo( + usedSubscriptionMetadataVersion, + taskManager.processId(), + previousActiveTasks, + standbyTasks, + this.userEndPoint); + + taskManager.updateSubscriptionsFromMetadata(topics); + + return new Subscription(new ArrayList<>(topics), data.encode()); + } + + /* + * This assigns tasks to consumer clients in the following steps. + * + * 0. check all repartition source topics and use internal topic manager to make sure + * they have been created with the right number of partitions. + * + * 1. using user customized partition grouper to generate tasks along with their + * assigned partitions; also make sure that the task's corresponding changelog topics + * have been created with the right number of partitions. + * + * 2. using TaskAssignor to assign tasks to consumer clients. + * - Assign a task to a client which was running it previously. + * If there is no such client, assign a task to a client which has its valid local state. + * - A client may have more than one stream threads. + * The assignor tries to assign tasks to a client proportionally to the number of threads. + * - We try not to assign the same set of tasks to two different clients + * We do the assignment in one-pass. The result may not satisfy above all. + * + * 3. within each client, tasks are assigned to consumer clients in round-robin manner. 
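+     *    For example, if a client owns active tasks 0_0, 0_1, 0_2 and 0_3 and runs two
+     *    consumers, the interleaved round-robin assignment gives the first consumer
+     *    tasks {0_0, 0_2} and the second consumer tasks {0_1, 0_3}
+     *    (see interleaveTasksByGroupId below).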
+ */ + @Override + public Map assign(final Cluster metadata, + final Map subscriptions) { + // construct the client metadata from the decoded subscription info + final Map clientsMetadata = new HashMap<>(); + final Set futureConsumers = new HashSet<>(); + + minReceivedMetadataVersion = SubscriptionInfo.LATEST_SUPPORTED_VERSION; + supportedVersions.clear(); + int futureMetadataVersion = UNKNOWN; + for (final Map.Entry entry : subscriptions.entrySet()) { + final String consumerId = entry.getKey(); + final Subscription subscription = entry.getValue(); + + final SubscriptionInfo info = SubscriptionInfo.decode(subscription.userData()); + final int usedVersion = info.version(); + supportedVersions.add(info.latestSupportedVersion()); + if (usedVersion > SubscriptionInfo.LATEST_SUPPORTED_VERSION) { + futureMetadataVersion = usedVersion; + futureConsumers.add(consumerId); + continue; + } + if (usedVersion < minReceivedMetadataVersion) { + minReceivedMetadataVersion = usedVersion; + } + + // create the new client metadata if necessary + ClientMetadata clientMetadata = clientsMetadata.get(info.processId()); + + if (clientMetadata == null) { + clientMetadata = new ClientMetadata(info.userEndPoint()); + clientsMetadata.put(info.processId(), clientMetadata); + } + + // add the consumer to the client + clientMetadata.addConsumer(consumerId, info); + } + + final boolean versionProbing; + if (futureMetadataVersion != UNKNOWN) { + if (minReceivedMetadataVersion >= EARLIEST_PROBEABLE_VERSION) { + log.info("Received a future (version probing) subscription (version: {}). Sending empty assignment back (with supported version {}).", + futureMetadataVersion, + SubscriptionInfo.LATEST_SUPPORTED_VERSION); + versionProbing = true; + } else { + throw new IllegalStateException("Received a future (version probing) subscription (version: " + futureMetadataVersion + + ") and an incompatible pre Kafka 2.0 subscription (version: " + minReceivedMetadataVersion + ") at the same time."); + } + } else { + versionProbing = false; + } + + if (minReceivedMetadataVersion < SubscriptionInfo.LATEST_SUPPORTED_VERSION) { + log.info("Downgrading metadata to version {}. 
Latest supported version is {}.", + minReceivedMetadataVersion, + SubscriptionInfo.LATEST_SUPPORTED_VERSION); + } + + log.debug("Constructed client metadata {} from the member subscriptions.", clientsMetadata); + + // ---------------- Step Zero ---------------- // + + // parse the topology to determine the repartition source topics, + // making sure they are created with the number of partitions as + // the maximum of the depending sub-topologies source topics' number of partitions + final Map topicGroups = taskManager.builder().topicGroups(); + + final Map repartitionTopicMetadata = new HashMap<>(); + for (final InternalTopologyBuilder.TopicsInfo topicsInfo : topicGroups.values()) { + for (final InternalTopicConfig topic: topicsInfo.repartitionSourceTopics.values()) { + repartitionTopicMetadata.put(topic.name(), new InternalTopicMetadata(topic)); + } + } + + boolean numPartitionsNeeded; + do { + numPartitionsNeeded = false; + + for (final InternalTopologyBuilder.TopicsInfo topicsInfo : topicGroups.values()) { + for (final String topicName : topicsInfo.repartitionSourceTopics.keySet()) { + int numPartitions = repartitionTopicMetadata.get(topicName).numPartitions; + + // try set the number of partitions for this repartition topic if it is not set yet + if (numPartitions == UNKNOWN) { + for (final InternalTopologyBuilder.TopicsInfo otherTopicsInfo : topicGroups.values()) { + final Set otherSinkTopics = otherTopicsInfo.sinkTopics; + + if (otherSinkTopics.contains(topicName)) { + // if this topic is one of the sink topics of this topology, + // use the maximum of all its source topic partitions as the number of partitions + for (final String sourceTopicName : otherTopicsInfo.sourceTopics) { + final Integer numPartitionsCandidate; + // It is possible the sourceTopic is another internal topic, i.e, + // map().join().join(map()) + if (repartitionTopicMetadata.containsKey(sourceTopicName)) { + numPartitionsCandidate = repartitionTopicMetadata.get(sourceTopicName).numPartitions; + } else { + numPartitionsCandidate = metadata.partitionCountForTopic(sourceTopicName); + if (numPartitionsCandidate == null) { + repartitionTopicMetadata.get(topicName).numPartitions = NOT_AVAILABLE; + } + } + + if (numPartitionsCandidate != null && numPartitionsCandidate > numPartitions) { + numPartitions = numPartitionsCandidate; + } + } + } + } + // if we still have not find the right number of partitions, + // another iteration is needed + if (numPartitions == UNKNOWN) { + numPartitionsNeeded = true; + } else { + repartitionTopicMetadata.get(topicName).numPartitions = numPartitions; + } + } + } + } + } while (numPartitionsNeeded); + + + // ensure the co-partitioning topics within the group have the same number of partitions, + // and enforce the number of partitions for those repartition topics to be the same if they + // are co-partitioned as well. 
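+        // For example, if a repartition topic is co-partitioned with a source topic that
+        // has 8 partitions, the validation below forces that repartition topic to
+        // 8 partitions as well so the downstream join sees aligned partition counts.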
+ ensureCopartitioning(taskManager.builder().copartitionGroups(), repartitionTopicMetadata, metadata); + + // make sure the repartition source topics exist with the right number of partitions, + // create these topics if necessary + prepareTopic(repartitionTopicMetadata); + + // augment the metadata with the newly computed number of partitions for all the + // repartition source topics + final Map allRepartitionTopicPartitions = new HashMap<>(); + for (final Map.Entry entry : repartitionTopicMetadata.entrySet()) { + final String topic = entry.getKey(); + final int numPartitions = entry.getValue().numPartitions; + + for (int partition = 0; partition < numPartitions; partition++) { + allRepartitionTopicPartitions.put(new TopicPartition(topic, partition), + new PartitionInfo(topic, partition, null, new Node[0], new Node[0])); + } + } + + final Cluster fullMetadata = metadata.withPartitions(allRepartitionTopicPartitions); + taskManager.setClusterMetadata(fullMetadata); + + log.debug("Created repartition topics {} from the parsed topology.", allRepartitionTopicPartitions.values()); + + // ---------------- Step One ---------------- // + + // get the tasks as partition groups from the partition grouper + final Set allSourceTopics = new HashSet<>(); + final Map> sourceTopicsByGroup = new HashMap<>(); + for (final Map.Entry entry : topicGroups.entrySet()) { + allSourceTopics.addAll(entry.getValue().sourceTopics); + sourceTopicsByGroup.put(entry.getKey(), entry.getValue().sourceTopics); + } + + final Map> partitionsForTask = partitionGrouper.partitionGroups(sourceTopicsByGroup, fullMetadata); + + // check if all partitions are assigned, and there are no duplicates of partitions in multiple tasks + final Set allAssignedPartitions = new HashSet<>(); + final Map> tasksByTopicGroup = new HashMap<>(); + for (final Map.Entry> entry : partitionsForTask.entrySet()) { + final Set partitions = entry.getValue(); + for (final TopicPartition partition : partitions) { + if (allAssignedPartitions.contains(partition)) { + log.warn("Partition {} is assigned to more than one tasks: {}", partition, partitionsForTask); + } + } + allAssignedPartitions.addAll(partitions); + + final TaskId id = entry.getKey(); + tasksByTopicGroup.computeIfAbsent(id.topicGroupId, k -> new HashSet<>()).add(id); + } + for (final String topic : allSourceTopics) { + final List partitionInfoList = fullMetadata.partitionsForTopic(topic); + if (!partitionInfoList.isEmpty()) { + for (final PartitionInfo partitionInfo : partitionInfoList) { + final TopicPartition partition = new TopicPartition(partitionInfo.topic(), partitionInfo.partition()); + if (!allAssignedPartitions.contains(partition)) { + log.warn("Partition {} is not assigned to any tasks: {}" + + " Possible causes of a partition not getting assigned" + + " is that another topic defined in the topology has not been" + + " created when starting your streams application," + + " resulting in no tasks created for this topology at all.", partition, partitionsForTask); + } + } + } else { + log.warn("No partitions found for topic {}", topic); + } + } + + // add tasks to state change log topic subscribers + final Map changelogTopicMetadata = new HashMap<>(); + for (final Map.Entry entry : topicGroups.entrySet()) { + final int topicGroupId = entry.getKey(); + final Map stateChangelogTopics = entry.getValue().stateChangelogTopics; + + for (final InternalTopicConfig topicConfig : stateChangelogTopics.values()) { + // the expected number of partitions is the max value of TaskId.partition + 1 + int 
numPartitions = UNKNOWN; + if (tasksByTopicGroup.get(topicGroupId) != null) { + for (final TaskId task : tasksByTopicGroup.get(topicGroupId)) { + if (numPartitions < task.partition + 1) + numPartitions = task.partition + 1; + } + final InternalTopicMetadata topicMetadata = new InternalTopicMetadata(topicConfig); + topicMetadata.numPartitions = numPartitions; + + changelogTopicMetadata.put(topicConfig.name(), topicMetadata); + } else { + log.debug("No tasks found for topic group {}", topicGroupId); + } + } + } + + prepareTopic(changelogTopicMetadata); + + log.debug("Created state changelog topics {} from the parsed topology.", changelogTopicMetadata.values()); + + // ---------------- Step Two ---------------- // + + // assign tasks to clients + final Map states = new HashMap<>(); + for (final Map.Entry entry : clientsMetadata.entrySet()) { + states.put(entry.getKey(), entry.getValue().state); + } + + log.debug("Assigning tasks {} to clients {} with number of replicas {}", + partitionsForTask.keySet(), states, numStandbyReplicas); + + //TWITTER CHANGED + //final StickyTaskAssignor taskAssignor = new StickyTaskAssignor<>(states, partitionsForTask.keySet()); + final TaskAssignor taskAssignor = createTaskAssignor(partitionsForTask, states, clientsMetadata); + taskAssignor.assign(numStandbyReplicas); + + log.info("Assigned tasks to clients as {}.", states); + + // ---------------- Step Three ---------------- // + + // construct the global partition assignment per host map + final Map> partitionsByHostState = new HashMap<>(); + if (minReceivedMetadataVersion == 2 || minReceivedMetadataVersion == 3) { + for (final Map.Entry entry : clientsMetadata.entrySet()) { + final HostInfo hostInfo = entry.getValue().hostInfo; + + if (hostInfo != null) { + final Set topicPartitions = new HashSet<>(); + final ClientState state = entry.getValue().state; + + for (final TaskId id : state.activeTasks()) { + topicPartitions.addAll(partitionsForTask.get(id)); + } + + partitionsByHostState.put(hostInfo, topicPartitions); + } + } + } + taskManager.setPartitionsByHostState(partitionsByHostState); + + final Map assignment; + if (versionProbing) { + assignment = versionProbingAssignment(clientsMetadata, partitionsForTask, partitionsByHostState, futureConsumers, minReceivedMetadataVersion); + } else { + assignment = computeNewAssignment(clientsMetadata, partitionsForTask, partitionsByHostState, minReceivedMetadataVersion); + } + + return assignment; + } + + //TWITTER CHANGED + protected TaskAssignor createTaskAssignor(Map> partitionsForTask, Map states, Map clientsMetadata) { + return new StickyTaskAssignor<>(states, partitionsForTask.keySet()); + } + + private Map computeNewAssignment(final Map clientsMetadata, + final Map> partitionsForTask, + final Map> partitionsByHostState, + final int minUserMetadataVersion) { + final Map assignment = new HashMap<>(); + + // within the client, distribute tasks to its owned consumers + for (final Map.Entry entry : clientsMetadata.entrySet()) { + final Set consumers = entry.getValue().consumers; + final ClientState state = entry.getValue().state; + + final List> interleavedActive = interleaveTasksByGroupId(state.activeTasks(), consumers.size()); + final List> interleavedStandby = interleaveTasksByGroupId(state.standbyTasks(), consumers.size()); + + int consumerTaskIndex = 0; + + for (final String consumer : consumers) { + final Map> standby = new HashMap<>(); + final ArrayList assignedPartitions = new ArrayList<>(); + + final List assignedActiveList = 
interleavedActive.get(consumerTaskIndex); + + for (final TaskId taskId : assignedActiveList) { + for (final TopicPartition partition : partitionsForTask.get(taskId)) { + assignedPartitions.add(new AssignedPartition(taskId, partition)); + } + } + + if (!state.standbyTasks().isEmpty()) { + final List assignedStandbyList = interleavedStandby.get(consumerTaskIndex); + for (final TaskId taskId : assignedStandbyList) { + standby.computeIfAbsent(taskId, k -> new HashSet<>()).addAll(partitionsForTask.get(taskId)); + } + } + + consumerTaskIndex++; + + Collections.sort(assignedPartitions); + final List active = new ArrayList<>(); + final List activePartitions = new ArrayList<>(); + for (final AssignedPartition partition : assignedPartitions) { + active.add(partition.taskId); + activePartitions.add(partition.partition); + } + + // finally, encode the assignment before sending back to coordinator + assignment.put(consumer, new Assignment( + activePartitions, + new AssignmentInfo(minUserMetadataVersion, active, standby, partitionsByHostState).encode())); + } + } + + return assignment; + } + + private Map versionProbingAssignment(final Map clientsMetadata, + final Map> partitionsForTask, + final Map> partitionsByHostState, + final Set futureConsumers, + final int minUserMetadataVersion) { + final Map assignment = new HashMap<>(); + + // assign previously assigned tasks to "old consumers" + for (final ClientMetadata clientMetadata : clientsMetadata.values()) { + for (final String consumerId : clientMetadata.consumers) { + + if (futureConsumers.contains(consumerId)) { + continue; + } + + final List activeTasks = new ArrayList<>(clientMetadata.state.prevActiveTasks()); + + final List assignedPartitions = new ArrayList<>(); + for (final TaskId taskId : activeTasks) { + assignedPartitions.addAll(partitionsForTask.get(taskId)); + } + + final Map> standbyTasks = new HashMap<>(); + for (final TaskId taskId : clientMetadata.state.prevStandbyTasks()) { + standbyTasks.put(taskId, partitionsForTask.get(taskId)); + } + + assignment.put(consumerId, new Assignment( + assignedPartitions, + new AssignmentInfo( + minUserMetadataVersion, + activeTasks, + standbyTasks, + partitionsByHostState) + .encode() + )); + } + } + + // add empty assignment for "future version" clients (ie, empty version probing response) + for (final String consumerId : futureConsumers) { + assignment.put(consumerId, new Assignment( + Collections.emptyList(), + new AssignmentInfo().encode() + )); + } + + return assignment; + } + + // visible for testing + List> interleaveTasksByGroupId(final Collection taskIds, final int numberThreads) { + final LinkedList sortedTasks = new LinkedList<>(taskIds); + Collections.sort(sortedTasks); + final List> taskIdsForConsumerAssignment = new ArrayList<>(numberThreads); + for (int i = 0; i < numberThreads; i++) { + taskIdsForConsumerAssignment.add(new ArrayList<>()); + } + while (!sortedTasks.isEmpty()) { + for (final List taskIdList : taskIdsForConsumerAssignment) { + final TaskId taskId = sortedTasks.poll(); + if (taskId == null) { + break; + } + taskIdList.add(taskId); + } + } + return taskIdsForConsumerAssignment; + } + + /** + * @throws TaskAssignmentException if there is no task id for one of the partitions specified + */ + @Override + public void onAssignment(final Assignment assignment) { + final List partitions = new ArrayList<>(assignment.partitions()); + Collections.sort(partitions, PARTITION_COMPARATOR); + + final AssignmentInfo info = AssignmentInfo.decode(assignment.userData()); + final int 
receivedAssignmentMetadataVersion = info.version(); + final int leaderSupportedVersion = info.latestSupportedVersion(); + + if (receivedAssignmentMetadataVersion > usedSubscriptionMetadataVersion) { + throw new IllegalStateException("Sent a version " + usedSubscriptionMetadataVersion + + " subscription but got an assignment with higher version " + receivedAssignmentMetadataVersion + "."); + } + + if (receivedAssignmentMetadataVersion < usedSubscriptionMetadataVersion + && receivedAssignmentMetadataVersion >= EARLIEST_PROBEABLE_VERSION) { + + if (receivedAssignmentMetadataVersion == leaderSupportedVersion) { + log.info("Sent a version {} subscription and got version {} assignment back (successful version probing). " + + "Downgrading subscription metadata to received version and trigger new rebalance.", + usedSubscriptionMetadataVersion, + receivedAssignmentMetadataVersion); + usedSubscriptionMetadataVersion = receivedAssignmentMetadataVersion; + } else { + log.info("Sent a version {} subscription and got version {} assignment back (successful version probing). " + + "Setting subscription metadata to leaders supported version {} and trigger new rebalance.", + usedSubscriptionMetadataVersion, + receivedAssignmentMetadataVersion, + leaderSupportedVersion); + usedSubscriptionMetadataVersion = leaderSupportedVersion; + } + + versionProbingFlag.set(true); + return; + } + + // version 1 field + final Map<TaskId, Set<TopicPartition>> activeTasks = new HashMap<>(); + // version 2 fields + final Map<TopicPartition, PartitionInfo> topicToPartitionInfo = new HashMap<>(); + final Map<HostInfo, Set<TopicPartition>> partitionsByHost; + + switch (receivedAssignmentMetadataVersion) { + case VERSION_ONE: + processVersionOneAssignment(info, partitions, activeTasks); + partitionsByHost = Collections.emptyMap(); + break; + case VERSION_TWO: + processVersionTwoAssignment(info, partitions, activeTasks, topicToPartitionInfo); + partitionsByHost = info.partitionsByHost(); + break; + case VERSION_THREE: + if (leaderSupportedVersion > usedSubscriptionMetadataVersion) { + log.info("Sent a version {} subscription and group leader's latest supported version is {}. " + + "Upgrading subscription metadata version to {} for next rebalance.", + usedSubscriptionMetadataVersion, + leaderSupportedVersion, + leaderSupportedVersion); + usedSubscriptionMetadataVersion = leaderSupportedVersion; + } + processVersionThreeAssignment(info, partitions, activeTasks, topicToPartitionInfo); + partitionsByHost = info.partitionsByHost(); + break; + default: + throw new IllegalStateException("This code should never be reached. Please file a bug report at https://issues.apache.org/jira/projects/KAFKA/"); + } + + taskManager.setClusterMetadata(Cluster.empty().withPartitions(topicToPartitionInfo)); + taskManager.setPartitionsByHostState(partitionsByHost); + taskManager.setAssignmentMetadata(activeTasks, info.standbyTasks()); + taskManager.updateSubscriptionsFromAssignment(partitions); + } + + private void processVersionOneAssignment(final AssignmentInfo info, + final List<TopicPartition> partitions, + final Map<TaskId, Set<TopicPartition>> activeTasks) { + // the number of assigned partitions should be the same as number of active tasks, which + // could be duplicated if one task has more than one assigned partitions + if (partitions.size() != info.activeTasks().size()) { + throw new TaskAssignmentException( + String.format("%sNumber of assigned partitions %d is not equal to the number of active taskIds %d" + + ", assignmentInfo=%s", logPrefix, partitions.size(), info.activeTasks().size(), info.toString()) + ); + } + + for (int i = 0; i < partitions.size(); i++) { + final TopicPartition partition = partitions.get(i); + final TaskId id = info.activeTasks().get(i); + activeTasks.computeIfAbsent(id, k -> new HashSet<>()).add(partition); + } + } + + private void processVersionTwoAssignment(final AssignmentInfo info, + final List<TopicPartition> partitions, + final Map<TaskId, Set<TopicPartition>> activeTasks, + final Map<TopicPartition, PartitionInfo> topicToPartitionInfo) { + processVersionOneAssignment(info, partitions, activeTasks); + + // process partitions by host + final Map<HostInfo, Set<TopicPartition>> partitionsByHost = info.partitionsByHost(); + for (final Set<TopicPartition> value : partitionsByHost.values()) { + for (final TopicPartition topicPartition : value) { + topicToPartitionInfo.put( + topicPartition, + new PartitionInfo(topicPartition.topic(), topicPartition.partition(), null, new Node[0], new Node[0])); + } + } + } + + private void processVersionThreeAssignment(final AssignmentInfo info, + final List<TopicPartition> partitions, + final Map<TaskId, Set<TopicPartition>> activeTasks, + final Map<TopicPartition, PartitionInfo> topicToPartitionInfo) { + processVersionTwoAssignment(info, partitions, activeTasks, topicToPartitionInfo); + } + + // for testing + protected void processLatestVersionAssignment(final AssignmentInfo info, + final List<TopicPartition> partitions, + final Map<TaskId, Set<TopicPartition>> activeTasks, + final Map<TopicPartition, PartitionInfo> topicToPartitionInfo) { + processVersionThreeAssignment(info, partitions, activeTasks, topicToPartitionInfo); + } + + /** + * Internal helper function that creates a Kafka topic + * + * @param topicPartitions Map that contains the topic names to be created with the number of partitions + */ + private void prepareTopic(final Map<String, InternalTopicMetadata> topicPartitions) { + log.debug("Starting to validate internal topics {} in partition assignor.", topicPartitions); + + // first construct the topics to make ready + final Map<String, InternalTopicConfig> topicsToMakeReady = new HashMap<>(); + + for (final InternalTopicMetadata metadata : topicPartitions.values()) { + final InternalTopicConfig topic = metadata.config; + final int numPartitions = metadata.numPartitions; + + if (numPartitions == NOT_AVAILABLE) { + continue; + } + if (numPartitions < 0) { + throw new StreamsException(String.format("%sTopic [%s] number of partitions not defined", logPrefix, topic.name())); + } + + topic.setNumberOfPartitions(numPartitions); + topicsToMakeReady.put(topic.name(), topic); + } + + if (!topicsToMakeReady.isEmpty()) { + internalTopicManager.makeReady(topicsToMakeReady); + } + + log.debug("Completed validating internal topics {} in partition assignor.", topicPartitions); + } + + private void ensureCopartitioning(final Collection<Set<String>> copartitionGroups, + final Map<String, InternalTopicMetadata> allRepartitionTopicsNumPartitions, + final Cluster metadata) { + for (final Set<String> copartitionGroup : copartitionGroups) { + copartitionedTopicsValidator.validate(copartitionGroup, allRepartitionTopicsNumPartitions, metadata); + } + } + + static class CopartitionedTopicsValidator { + private final String logPrefix; + + CopartitionedTopicsValidator(final String logPrefix) { + this.logPrefix = logPrefix; + } + + void validate(final Set<String> copartitionGroup, + final Map<String, InternalTopicMetadata> allRepartitionTopicsNumPartitions, + final Cluster metadata) { + int numPartitions = UNKNOWN; + + for (final String topic : copartitionGroup) { + if (!allRepartitionTopicsNumPartitions.containsKey(topic)) { + final Integer partitions = metadata.partitionCountForTopic(topic); + + if (partitions == null) { + throw new org.apache.kafka.streams.errors.TopologyException(String.format("%sTopic not found: %s", logPrefix, topic)); + } + + if (numPartitions == UNKNOWN) { + numPartitions = partitions; + } else if (numPartitions != partitions) { + final String[] topics = copartitionGroup.toArray(new String[copartitionGroup.size()]); + Arrays.sort(topics); + throw new org.apache.kafka.streams.errors.TopologyException(String.format("%sTopics not co-partitioned: [%s]", logPrefix, Utils.join(Arrays.asList(topics), ","))); + } + } else if (allRepartitionTopicsNumPartitions.get(topic).numPartitions == NOT_AVAILABLE) { + numPartitions = NOT_AVAILABLE; + break; + } + } + + // if all topics for this co-partition group are repartition topics, + // then set the number of partitions to be the maximum of the number of partitions. + if (numPartitions == UNKNOWN) { + for (final Map.Entry<String, InternalTopicMetadata> entry: allRepartitionTopicsNumPartitions.entrySet()) { + if (copartitionGroup.contains(entry.getKey())) { + final int partitions = entry.getValue().numPartitions; + if (partitions > numPartitions) { + numPartitions = partitions; + } + } + } + } + // enforce co-partitioning restrictions to repartition topics by updating their number of partitions + for (final Map.Entry<String, InternalTopicMetadata> entry : allRepartitionTopicsNumPartitions.entrySet()) { + if (copartitionGroup.contains(entry.getKey())) { + entry.getValue().numPartitions = numPartitions; + } + } + + } + } + + // following functions are for test only + void setInternalTopicManager(final InternalTopicManager internalTopicManager) { + this.internalTopicManager = internalTopicManager; + } + +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/BUILD b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/BUILD new file mode 100644 index 0000000000..da01091078 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/BUILD @@ -0,0 +1,18 @@ +scala_library( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-streams-static-partitioning", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "finatra/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala", + "finatra/kafka-streams/kafka-streams-static-partitioning/src/main/java", + "finatra/kafka-streams/kafka-streams/src/main/scala", + ], + exports = [ + "finatra/kafka-streams/kafka-streams/src/main/scala", + ], +) diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/StaticPartitioning.scala b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/StaticPartitioning.scala new file mode 100644 index 0000000000..76544b0636 --- /dev/null +++
b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/StaticPartitioning.scala @@ -0,0 +1,25 @@ +package com.twitter.finatra.streams.partitioning + +import com.twitter.app.Flag +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.streams.partitioning.internal.StaticPartitioningKafkaClientSupplierSupplier +import org.apache.kafka.streams.KafkaClientSupplier + +object StaticPartitioning { + val PreRestoreSignalingPort = 0 //TODO: Hack to signal our assignor that we are in PreRestore mode +} + +trait StaticPartitioning extends KafkaStreamsTwitterServer { + + protected val numApplicationInstances: Flag[Int] = + flag[Int]( + "kafka.application.num.instances", + "Total number of instances for static partitioning" + ) + + /* Protected */ + + override def kafkaStreamsClientSupplier: KafkaClientSupplier = { + new StaticPartitioningKafkaClientSupplierSupplier(numApplicationInstances()) + } +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/ClientStateAndHostInfo.scala b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/ClientStateAndHostInfo.scala new file mode 100644 index 0000000000..2735b33d20 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/ClientStateAndHostInfo.scala @@ -0,0 +1,12 @@ +package com.twitter.finatra.streams.partitioning.internal + +import com.twitter.finatra.streams.queryable.thrift.domain.ServiceShardId +import org.apache.kafka.streams.processor.internals.assignment.ClientState +import org.apache.kafka.streams.state.HostInfo + +case class ClientStateAndHostInfo[ID](id: ID, clientState: ClientState, hostInfo: HostInfo) { + + val serviceShardId: ServiceShardId = { + StaticPartitioningStreamAssignor.parseShardId(hostInfo.host()) + } +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticPartitioningKafkaClientSupplierSupplier.scala b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticPartitioningKafkaClientSupplierSupplier.scala new file mode 100644 index 0000000000..74d01618f0 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticPartitioningKafkaClientSupplierSupplier.scala @@ -0,0 +1,21 @@ +package com.twitter.finatra.streams.partitioning.internal + +import java.util +import org.apache.kafka.clients.consumer.{Consumer, ConsumerConfig} +import org.apache.kafka.streams.processor.internals.DefaultKafkaClientSupplier + +class StaticPartitioningKafkaClientSupplierSupplier(numApplicationInstances: Int) + extends DefaultKafkaClientSupplier { + + override def getConsumer(config: util.Map[String, AnyRef]): Consumer[Array[Byte], Array[Byte]] = { + config.put( + StaticPartitioningStreamAssignor.ApplicationNumInstances, + numApplicationInstances.toString + ) + config.put( + ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, + classOf[StaticPartitioningStreamAssignor].getName + ) + super.getConsumer(config) + } +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticPartitioningStreamAssignor.scala 
b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticPartitioningStreamAssignor.scala new file mode 100644 index 0000000000..8da58e1a15 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticPartitioningStreamAssignor.scala @@ -0,0 +1,65 @@ +package com.twitter.finatra.streams.partitioning.internal + +import com.twitter.finatra.streams.queryable.thrift.domain.ServiceShardId +import com.twitter.finatra.streams.queryable.thrift.partitioning.StaticServiceShardPartitioner +import com.twitter.inject.Logging +import java.util +import java.util.UUID +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.streams.processor.TaskId +import org.apache.kafka.streams.processor.internals.OverridableStreamsPartitionAssignor +import org.apache.kafka.streams.processor.internals.assignment.{ClientState, TaskAssignor} +import scala.collection.JavaConverters._ +import scala.util.control.NonFatal + +object StaticPartitioningStreamAssignor { + val StreamsPreRestoreConfig = "streams.prerestore" + val ApplicationNumInstances = "application.num.instances" + + //TODO: Generalize + def parseShardId(applicationServerHost: String): ServiceShardId = { + val firstPeriodIndex = applicationServerHost.indexOf('.') + + val shardId = try { + applicationServerHost.substring(0, firstPeriodIndex).toInt + } catch { + case NonFatal(e) => + throw new Exception( + "StaticPartitioning currently requires flag 'kafka.application.server' to be set to the Aurora service proxy hostname " + + "e.g. 0.tweet-word-count.prod.team1.service.smf1.twitter.com:12345" + ) + } + + ServiceShardId(shardId) + } +} + +class StaticPartitioningStreamAssignor extends OverridableStreamsPartitionAssignor with Logging { + + private var _configs: util.Map[String, _] = _ + + override def configure(configs: util.Map[String, _]): Unit = { + super.configure(configs) + _configs = configs + } + + override protected def createTaskAssignor( + partitionsForTask: util.Map[TaskId, util.Set[TopicPartition]], + states: util.Map[UUID, ClientState], + clients: util.Map[UUID, OverridableStreamsPartitionAssignor.ClientMetadata] + ): TaskAssignor[UUID, TaskId] = { + val clientStateAndHostInfo: Map[UUID, ClientStateAndHostInfo[UUID]] = + (for ((id, metadata) <- clients.asScala) yield { + id -> ClientStateAndHostInfo(id, metadata.state, metadata.hostInfo) + }).toMap + + val numInstances = _configs + .get(StaticPartitioningStreamAssignor.ApplicationNumInstances).toString.toInt //Required flag + + new StaticTaskAssignor[UUID]( + serviceShardPartitioner = new StaticServiceShardPartitioner(numShards = numInstances), + clientsMetadata = clientStateAndHostInfo, + taskIds = partitionsForTask.keySet().asScala.toSet + ) + } +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticTaskAssignor.scala b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticTaskAssignor.scala new file mode 100644 index 0000000000..0a9964e0d4 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/StaticTaskAssignor.scala @@ -0,0 +1,229 @@ +package com.twitter.finatra.streams.partitioning.internal + +import com.twitter.finagle.stats.LoadedStatsReceiver +import com.twitter.finatra.streams.partitioning.StaticPartitioning +import 
com.twitter.finatra.streams.queryable.thrift.domain.{ + KafkaGroupId, + KafkaPartitionId, + ServiceShardId +} +import com.twitter.finatra.streams.queryable.thrift.partitioning.ServiceShardPartitioner +import com.twitter.inject.Logging +import org.apache.kafka.streams.processor.TaskId +import org.apache.kafka.streams.processor.internals.assignment.TaskAssignor +import org.apache.kafka.streams.state.HostInfo +import scala.collection.mutable +import scala.language.existentials + +/** + * StaticTaskAssignor is a TaskAssignor that statically/deterministically assigns tasks to Kafka Streams instances/shards. + * Deterministic assignment has the following benefits over the dynamic default StickyTaskAssignor: + * * Since the same active and standby tasks are assigned to the same shards, monitoring and alerting are easier to reason about + * * PreRestore functionality is easier to implement because we know where the active and standby tasks live + * * Queryable State is easier to implement since the query client can deterministically know where the active and standby tasks are available + * to be queried + * + * TODO: Detect zombie shards (e.g. multiple hosts with the same instanceId) + * TODO: Spread out standby replicas so they fall across multiple instances (currently, when a single instance restarts, all its tasks go to the same instance containing all its standby replicas) + * TODO: Currently the "interleaving" code in StreamPartitionAssignor spreads out active and standby tasks across multiple threads. To speed up active threads, we may want to move all standby processing to its own thread... + */ +class StaticTaskAssignor[ID]( + clientsMetadata: Map[ID, ClientStateAndHostInfo[ID]], + taskIds: Set[TaskId], + serviceShardPartitioner: ServiceShardPartitioner, + dynamicAssignments: Boolean = false) + extends TaskAssignor[ID, TaskId] + with Logging { + + //Note: We need to use LoadedStatsReceiver, since the parent class is created by Kafka without access to a scoped statsReceiver + private val clientsMetadataSizeStat = LoadedStatsReceiver.stat("clientsMetadataSize") + + /* Public */ + + //Note: Synchronized to protect against 2 assignors concurrently running when performing local feature testing.
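The deterministic placement described in the scaladoc above can be illustrated with a small, self-contained sketch. This is not the StaticServiceShardPartitioner implementation (its source is not part of this diff); it assumes a simple modulo scheme with hypothetical helper names, purely to show why a query client can compute, without any coordination, which shard owns the active task and which owns the standby for a given Kafka partition.

.. code:: scala

    // Illustrative sketch only: assumes a modulo-based placement rule, which may
    // differ from what StaticServiceShardPartitioner actually implements.
    object DeterministicPlacementSketch {
      final case class ShardId(id: Int)

      // Assumed rule: the active task for a partition lives on shard (partition % numShards)
      def activeShard(partition: Int, numShards: Int): ShardId =
        ShardId(partition % numShards)

      // Assumed rule: the standby lives half the ring away, so losing one shard
      // never takes out both the active task and its standby
      def standbyShard(partition: Int, numShards: Int): ShardId =
        ShardId((partition + numShards / 2) % numShards)

      def main(args: Array[String]): Unit = {
        val numShards = 4
        (0 until numShards).foreach { p =>
          println(s"partition $p -> active=${activeShard(p, numShards).id} standby=${standbyShard(p, numShards).id}")
        }
      }
    }

Because the placement is a pure function of the partition id and the shard count, every instance and every query client derives the same answer, which is the property the assignor relies on for monitoring, PreRestore, and Queryable State. This particular rule happens to reproduce the active/standby placements asserted in StaticTaskAssignorTest later in this diff, but the real partitioner may compute them differently.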
+ override def assign(numStandbyReplicas: Int): Unit = synchronized { + assert( + numStandbyReplicas <= 1, + "Num standby replicas > 1 not currently supported in static task assignor" + ) + clientsMetadataSizeStat.add(clientsMetadata.size) + info("clientsMetadata: " + clientsMetadata.mkString(", ")) + + val assignmentsMap = scala.collection.mutable.Map[HostInfo, TaskAssignments]() + val shardIdToClientStateAndHostInfo = createInstanceToClientStateAndHostInfo() + for { + (groupId, tasks) <- tasksByGroupId(numStandbyReplicas, shardIdToClientStateAndHostInfo) + taskId <- tasks + } { + val kafkaPartitionId = KafkaPartitionId(taskId.partition) + val activeShardId = serviceShardPartitioner.activeShardId(kafkaPartitionId) + val standbyShardId = getStandbyShardId(numStandbyReplicas, kafkaPartitionId) + debug(s"TaskId $taskId StaticActive $activeShardId StaticStandby $standbyShardId") + + for (activeClientStateAndHostInfo <- lookupActiveClientState( + shardIdToClientStateAndHostInfo, + activeShardId, + standbyShardId + )) { + val standbyClientStateAndHostInfo = getStandbyHostInfo( + activeClientStateAndHostInfo, + numStandbyReplicas, + shardIdToClientStateAndHostInfo, + standbyShardId + ) + assign(assignmentsMap, taskId, activeClientStateAndHostInfo, standbyClientStateAndHostInfo) + } + } + info(s"Assignments\n${assignmentsMap.iterator.toSeq + .sortBy(_._1.host()).map { case (k, v) => k + "\t" + v.toPrettyStr }.mkString("\n")}") + } + + /* Private */ + + private def getStandbyHostInfo( + activeClientStateAndHostInfo: ClientStateAndHostInfo[ID], + numStandbyReplicas: Int, + shardIdToClientStateAndHostInfo: Map[ServiceShardId, ClientStateAndHostInfo[ID]], + standbyShardId: Option[ServiceShardId] + ): Option[ClientStateAndHostInfo[ID]] = { + val standbyHostInfo = standbyShardId.flatMap( + lookupStandbyClientState(numStandbyReplicas, shardIdToClientStateAndHostInfo, _) + ) + + if (standbyHostInfo.contains(activeClientStateAndHostInfo)) { + None + } else { + standbyHostInfo + } + } + + private def getStandbyShardId(numStandbyReplicas: Int, kafkaPartitionId: KafkaPartitionId) = { + if (numStandbyReplicas == 0) { + None + } else { + serviceShardPartitioner.standbyShardIds(kafkaPartitionId).headOption + } + } + + private def assign( + assignmentsMap: mutable.Map[HostInfo, TaskAssignments], + taskId: TaskId, + activeClientStateAndHostInfo: ClientStateAndHostInfo[ID], + standbyClientStateAndHostInfoOpt: Option[ClientStateAndHostInfo[ID]] + ): Unit = { + if (isPreRestoringInstance(activeClientStateAndHostInfo)) { + standbyClientStateAndHostInfoOpt match { + case Some(standbyClientStateAndHostInfo) + if !isPreRestoringInstance(standbyClientStateAndHostInfo) => + assignStandby(assignmentsMap, taskId, activeClientStateAndHostInfo, preRestore = true) + assignActive(assignmentsMap, taskId, standbyClientStateAndHostInfo) + case Some(standbyClientStateAndHostInfo) => + assignStandby(assignmentsMap, taskId, activeClientStateAndHostInfo, preRestore = true) + case None => + assignActive(assignmentsMap, taskId, activeClientStateAndHostInfo) + } + } else { + assignActive(assignmentsMap, taskId, activeClientStateAndHostInfo) + + standbyClientStateAndHostInfoOpt match { + case Some(standbyClientStateAndHostInfo) + if !isPreRestoringInstance(standbyClientStateAndHostInfo) => + assignStandby(assignmentsMap, taskId, standbyClientStateAndHostInfo, preRestore = false) + case _ => + } + } + } + + private def tasksByGroupId( + numStandbyReplicas: Int, + shardIdToClientStateAndHostInfo: Map[ServiceShardId, 
ClientStateAndHostInfo[ID]] + ): Map[KafkaGroupId, Seq[TaskId]] = { + val tasksByGroupId = taskIds + .groupBy(taskId => KafkaGroupId(taskId.topicGroupId)).mapValues(_.toSeq.sortBy(_.partition)) + info( + s"Assign with $serviceShardPartitioner and numStandbyReplicas $numStandbyReplicas \nTasksByGroupId:\n$tasksByGroupId\nInstanceToClientStateAndHostInfo:\n$shardIdToClientStateAndHostInfo" + ) + tasksByGroupId + } + + private def createInstanceToClientStateAndHostInfo( + ): Map[ServiceShardId, ClientStateAndHostInfo[ID]] = { + for ((id, clientStateAndHostInfo) <- clientsMetadata) yield { + clientStateAndHostInfo.serviceShardId -> clientStateAndHostInfo + } + } + + private def assignActive( + assignmentsMap: mutable.Map[HostInfo, TaskAssignments], + taskId: TaskId, + clientStateAndHostInfo: ClientStateAndHostInfo[ID] + ) = { + debug(s"assignActive $taskId $clientStateAndHostInfo") + clientStateAndHostInfo.clientState.assign(taskId, true) + val taskAssignments = + assignmentsMap.getOrElseUpdate(clientStateAndHostInfo.hostInfo, TaskAssignments()) + taskAssignments.activeTasks += taskId + } + + private def assignStandby( + assignmentsMap: mutable.Map[HostInfo, TaskAssignments], + taskId: TaskId, + clientStateAndHostInfo: ClientStateAndHostInfo[ID], + preRestore: Boolean + ) = { + debug(s"assignStandby ${if (preRestore) "PreRestore" else ""} $taskId $clientStateAndHostInfo") + clientStateAndHostInfo.clientState.assign(taskId, false) + val taskAssignments = + assignmentsMap.getOrElseUpdate(clientStateAndHostInfo.hostInfo, TaskAssignments()) + if (preRestore) { + taskAssignments.standbyPreRestoreTasks += taskId + } else { + taskAssignments.standbyTasks += taskId + } + } + + private def isPreRestoringInstance(clientStateAndHostInfo: ClientStateAndHostInfo[ID]) = { + clientStateAndHostInfo.hostInfo.port() == StaticPartitioning.PreRestoreSignalingPort + } + + private def lookupActiveClientState( + instanceToClientStateAndHostInfo: Map[ServiceShardId, ClientStateAndHostInfo[ID]], + activeInstance: ServiceShardId, + standbyInstance: Option[ServiceShardId] + ): Option[ClientStateAndHostInfo[ID]] = { + instanceToClientStateAndHostInfo + .get(activeInstance).orElse(standbyInstance.flatMap(instanceToClientStateAndHostInfo.get)).orElse( + getNextAvailableInstance(instanceToClientStateAndHostInfo, activeInstance) + ) + } + + //TODO: Refactor + private def getNextAvailableInstance( + instanceToClientStateAndHostInfo: Map[ServiceShardId, ClientStateAndHostInfo[ID]], + activeInstance: ServiceShardId + ): Option[ClientStateAndHostInfo[ID]] = { + if (dynamicAssignments) { + var nextAvailableInstance: Option[ClientStateAndHostInfo[ID]] = None + var idx = activeInstance.id + while (nextAvailableInstance.isEmpty) { + idx += 1 + nextAvailableInstance = instanceToClientStateAndHostInfo.get(ServiceShardId(idx)) + } + nextAvailableInstance + } else { + None + } + } + + private def lookupStandbyClientState( + numStandbyReplicas: Int, + instanceToClientStateAndHostInfo: Map[ServiceShardId, ClientStateAndHostInfo[ID]], + shardId: ServiceShardId + ): Option[ClientStateAndHostInfo[ID]] = { + if (numStandbyReplicas > 0) { + instanceToClientStateAndHostInfo.get(shardId) + } else { + None + } + } +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/TaskAssignments.scala b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/TaskAssignments.scala new file mode 100644 index 0000000000..4ce2943d0d 
--- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/main/scala/com/twitter/finatra/streams/partitioning/internal/TaskAssignments.scala @@ -0,0 +1,16 @@ +package com.twitter.finatra.streams.partitioning.internal + +import org.apache.kafka.streams.processor.TaskId +import scala.collection.mutable.ArrayBuffer + +case class TaskAssignments( + activeTasks: ArrayBuffer[TaskId] = ArrayBuffer(), + standbyTasks: ArrayBuffer[TaskId] = ArrayBuffer(), + standbyPreRestoreTasks: ArrayBuffer[TaskId] = ArrayBuffer()) { + + def toPrettyStr: String = { + s"Active:\t${activeTasks.mkString(", ")}" + + s"\tStandby:\t${standbyTasks.mkString(", ")}" + + s"\tPreRestore:\t${standbyPreRestoreTasks.mkString(", ")}" + } +} diff --git a/kafka-streams/kafka-streams-static-partitioning/src/test/resources/BUILD b/kafka-streams/kafka-streams-static-partitioning/src/test/resources/BUILD new file mode 100644 index 0000000000..9237675c63 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/test/resources/BUILD @@ -0,0 +1,3 @@ +resources( + sources = globs("*.xml"), +) diff --git a/kafka-streams/kafka-streams-static-partitioning/src/test/resources/logback-test.xml b/kafka-streams/kafka-streams-static-partitioning/src/test/resources/logback-test.xml new file mode 100644 index 0000000000..ba176ff4be --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/test/resources/logback-test.xml @@ -0,0 +1,65 @@ + + + + %date %.-3level %-25logger{0} %msg%n + + + + + + %red(%date [%thread] %.-3level %-25logger{0} %msg%n) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/kafka-streams/kafka-streams-static-partitioning/src/test/scala/BUILD b/kafka-streams/kafka-streams-static-partitioning/src/test/scala/BUILD new file mode 100644 index 0000000000..57d6e018a6 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/test/scala/BUILD @@ -0,0 +1,15 @@ +junit_tests( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + strict_deps = False, + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-client", + "finatra/kafka-streams/kafka-streams-static-partitioning/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/test/scala:test-deps", + "finatra/kafka/src/test/scala:test-deps", + ], +) diff --git a/kafka-streams/kafka-streams-static-partitioning/src/test/scala/org/apache/kafka/streams/processor/internals/assignment/StaticTaskAssignorTest.scala b/kafka-streams/kafka-streams-static-partitioning/src/test/scala/org/apache/kafka/streams/processor/internals/assignment/StaticTaskAssignorTest.scala new file mode 100644 index 0000000000..34c6ed1ea9 --- /dev/null +++ b/kafka-streams/kafka-streams-static-partitioning/src/test/scala/org/apache/kafka/streams/processor/internals/assignment/StaticTaskAssignorTest.scala @@ -0,0 +1,504 @@ +package org.apache.kafka.streams.processor.internals.assignment + +import com.twitter.finatra.streams.partitioning.StaticPartitioning +import com.twitter.finatra.streams.partitioning.internal.{ + ClientStateAndHostInfo, + StaticTaskAssignor +} +import com.twitter.finatra.streams.queryable.thrift.partitioning.StaticServiceShardPartitioner +import com.twitter.inject.Test +import java.util +import org.apache.kafka.streams.processor.TaskId +import 
org.apache.kafka.streams.state.HostInfo +import scala.collection.JavaConverters._ + +class StaticTaskAssignorTest extends Test { + + private val task00 = new TaskId(0, 0) + private val task01 = new TaskId(0, 1) + private val task02 = new TaskId(0, 2) + private val task03 = new TaskId(0, 3) + private val task04 = new TaskId(0, 4) + private val task05 = new TaskId(0, 5) + private val task06 = new TaskId(0, 6) + private val task07 = new TaskId(0, 7) + private val task08 = new TaskId(0, 8) + + private val clients = new util.HashMap[Int, ClientStateAndHostInfo[Int]]() + + override def beforeEach(): Unit = { + super.beforeEach() + clients.clear() + } + + test("testAssign") { + createClient(processId = 0, capacity = 1) + createClient(processId = 1, capacity = 1) + createClient(processId = 2, capacity = 1) + createClient(processId = 3, capacity = 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 4), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set(task02)) + + assertTasks(instanceId = 1, active = Set(task01), standby = Set(task03)) + + assertTasks(instanceId = 2, active = Set(task02), standby = Set(task00)) + + assertTasks(instanceId = 3, active = Set(task03), standby = Set(task01)) + } + + test("testAssign Side A down") { + createClient(processId = 2, capacity = 1) + createClient(processId = 3, capacity = 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 4), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 2, active = Set(task00, task02), standby = Set()) + + assertTasks(instanceId = 3, active = Set(task01, task03), standby = Set()) + } + + test("testAssign Side B down") { + createClient(processId = 0, capacity = 1) + createClient(processId = 1, capacity = 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 4), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00, task02), standby = Set()) + + assertTasks(instanceId = 1, active = Set(task01, task03), standby = Set()) + } + + test("testAssign with 0 standby replicas") { + createClient(processId = 0, capacity = 1) + createClient(processId = 1, capacity = 1) + createClient(processId = 2, capacity = 1) + createClient(processId = 3, capacity = 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 4), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03) + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + + assertTasks(instanceId = 1, active = Set(task01), standby = Set()) + + assertTasks(instanceId = 2, active = Set(task02), standby = Set()) + + assertTasks(instanceId = 3, active = Set(task03), standby = Set()) + } + + test("assign with task 01 active instance down") { + createClient(0, 1) + createClient(2, 1) + createClient(3, 1) + + val assignor = new 
StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 4), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set(task02)) + + assertTasks(instanceId = 2, active = Set(task02), standby = Set(task00)) + + assertTasks(instanceId = 3, active = Set(task03, task01), standby = Set()) + } + + test( + "don't dynamically assign task00 when active and standby instances down and dynamicAssignments = false" + ) { + createClient(processId = 1, capacity = 1) + createClient(processId = 3, capacity = 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 4), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03), + dynamicAssignments = false + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 1, active = Set(task01), standby = Set(task03)) + + assertTasks(instanceId = 3, active = Set(task03), standby = Set(task01)) + } + + test("dont assign active and standby to single instance") { + createClient(0, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 1), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + } + + test("assign standbys to 2 of 2 clients") { + createClient(0, 2) + createClient(1, 2) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 2), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set(task01)) + + assertTasks(instanceId = 1, active = Set(task01), standby = Set(task00)) + } + + test("assign standbys to 3 of 3 clients") { + createClient(0, 1) + createClient(1, 1) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set(task02)) + + assertTasks(instanceId = 1, active = Set(task01), standby = Set(task00)) + + assertTasks(instanceId = 2, active = Set(task02), standby = Set(task01)) + } + + test("assign standbys to 2 of 3 clients") { + createClient(0, 2) + createClient(1, 2) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00, task02), standby = Set()) + + assertTasks(instanceId = 1, active = Set(task01), standby = Set(task00)) + } + + test("instance 0 should get 1 active and 1 standby of the 100 other tasks") { + createClient(0, 2) + createClient(50, 2) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 100), + clientsMetadata = 
clients.asScala.toMap, + taskIds = createTasks(100), + dynamicAssignments = false + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set(new TaskId(0, 50))) + } + + test( + "instance 0 should get 2 active when standby replicas disabled with 50 aurora shards and 100 partitions" + ) { + createClient(0, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 50), + clientsMetadata = clients.asScala.toMap, + taskIds = createTasks(100), + dynamicAssignments = false + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(new TaskId(0, 0), new TaskId(0, 50)), standby = Set()) + } + + def createTasks(numPartitions: Int) = { + (for (partition <- 0 until numPartitions) yield { + new TaskId(0, partition) + }).toSet + } + + //TODO: Spread out standby tasks so they do not all move to the same client when an instance restarts + test("9 tasks across 3 clients") { + createClient(0, 1) + createClient(1, 1) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02, task03, task04, task05, task06, task07, task08) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks( + instanceId = 0, + active = Set(task00, task03, task06), + standby = Set(task02, task05, task08) + ) + + assertTasks( + instanceId = 1, + active = Set(task01, task04, task07), + standby = Set(task00, task03, task06) + ) + + assertTasks( + instanceId = 2, + active = Set(task02, task05, task08), + standby = Set(task01, task04, task07) + ) + } + + test("1 task and 1 client and no standby replicas") { + createClient(0, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00) + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + } + + test("1 task and 1 client and 1 standby replicas") { + createClient(0, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + } + + test("3 clients, 3 tasks, with standby 0") { + createClient(0, 1) + createClient(1, 1) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + assertTasks(instanceId = 1, active = Set(task01), standby = Set()) + assertTasks(instanceId = 2, active = Set(task02), standby = Set()) + } + + test("3 clients, 3 tasks, with standby 0. 
Prerestore 0") { + createClient(0, 1, preRestore = true) + createClient(1, 1) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + assertTasks(instanceId = 1, active = Set(task01), standby = Set()) + assertTasks(instanceId = 2, active = Set(task02), standby = Set()) + } + + test("3 clients, 3 tasks, with standby 0. Prerestore 0, 1") { + createClient(0, 1, preRestore = true) + createClient(1, 1, preRestore = true) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + assertTasks(instanceId = 1, active = Set(task01), standby = Set()) + assertTasks(instanceId = 2, active = Set(task02), standby = Set()) + } + + test("3 clients, 3 tasks, with standby 0. Prerestore 0, 1, 2") { + createClient(0, 1, preRestore = true) + createClient(1, 1, preRestore = true) + createClient(2, 1, preRestore = true) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 0 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set()) + assertTasks(instanceId = 1, active = Set(task01), standby = Set()) + assertTasks(instanceId = 2, active = Set(task02), standby = Set()) + } + + test("3 clients, 3 tasks, with standby 1") { + createClient(0, 1) + createClient(1, 1) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(task00), standby = Set(task02)) + assertTasks(instanceId = 1, active = Set(task01), standby = Set(task00)) + assertTasks(instanceId = 2, active = Set(task02), standby = Set(task01)) + } + + test("3 clients, 3 tasks, with standby 1. Prerestore 0") { + createClient(0, 1, preRestore = true) + createClient(1, 1) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(), standby = Set(task00)) + assertTasks(instanceId = 1, active = Set(task01, task00), standby = Set()) + assertTasks(instanceId = 2, active = Set(task02), standby = Set(task01)) + } + + test("3 clients, 3 tasks, with standby 1. 
Prerestore 0 and 1") { + createClient(0, 1, preRestore = true) + createClient(1, 1, preRestore = true) + createClient(2, 1) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(), standby = Set(task00)) + assertTasks(instanceId = 1, active = Set(), standby = Set(task01)) + assertTasks(instanceId = 2, active = Set(task02, task01), standby = Set()) + } + + test("3 clients, 3 tasks, with standby 1. Prerestore 0 and 1 and 2") { + createClient(0, 1, preRestore = true) + createClient(1, 1, preRestore = true) + createClient(2, 1, preRestore = true) + + val assignor = new StaticTaskAssignor[Int]( + serviceShardPartitioner = StaticServiceShardPartitioner(numShards = 3), + clientsMetadata = clients.asScala.toMap, + taskIds = Set(task00, task01, task02) + ) + + val numStandbyReplicas = 1 + assignor.assign(numStandbyReplicas) + + assertTasks(instanceId = 0, active = Set(), standby = Set(task00)) + assertTasks(instanceId = 1, active = Set(), standby = Set(task01)) + assertTasks(instanceId = 2, active = Set(), standby = Set(task02)) + } + + private def assertTasks(instanceId: Int, active: Set[TaskId], standby: Set[TaskId]): Unit = { + val state = clients.get(instanceId) + assert(state.clientState.activeTasks().asScala == active) + assert(state.clientState.standbyTasks().asScala == standby) + } + + private def createClient(processId: Int, capacity: Int, preRestore: Boolean = false) = { + val clientState = new ClientState(capacity) + val hostInfo = new HostInfo( + s"$processId.foo.com", + if (preRestore) StaticPartitioning.PreRestoreSignalingPort else 12345 + ) + clients.put(processId, ClientStateAndHostInfo(processId, clientState, hostInfo)) + clientState + } +} diff --git a/kafka-streams/kafka-streams/src/main/java/BUILD b/kafka-streams/kafka-streams/src/main/java/BUILD new file mode 100644 index 0000000000..0e8b8664b7 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/java/BUILD @@ -0,0 +1,20 @@ +java_library( + sources = rglobs( + "com/twitter/finatra/kafkastreams/*.java", + "com/twitter/finatra/streams/*.java", + "org/*.java", + ), + compiler_option_sets = {}, + provides = artifact( + org = "com.twitter", + name = "finatra-streams-java", + repo = artifactory, + ), + dependencies = [ + "3rdparty/jvm/org/agrona", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-streams", + ], + exports = [ + ], +) diff --git a/kafka-streams/kafka-streams/src/main/java/com/twitter/finatra/kafkastreams/domain/ProcessingGuarantee.java b/kafka-streams/kafka-streams/src/main/java/com/twitter/finatra/kafkastreams/domain/ProcessingGuarantee.java new file mode 100644 index 0000000000..1dd3ecfaad --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/java/com/twitter/finatra/kafkastreams/domain/ProcessingGuarantee.java @@ -0,0 +1,17 @@ +package com.twitter.finatra.kafkastreams.domain; + +public enum ProcessingGuarantee { + AT_LEAST_ONCE("at_least_once"), + EXACTLY_ONCE("exactly_once"); + + private String value; + + ProcessingGuarantee(String value) { + this.value = value; + } + + @Override + public String toString() { + return value; + } +} diff --git a/kafka-streams/kafka-streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueFlushingLoggedStore.java 
b/kafka-streams/kafka-streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueFlushingLoggedStore.java new file mode 100644 index 0000000000..597c473fb7 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/java/org/apache/kafka/streams/state/internals/InMemoryKeyValueFlushingLoggedStore.java @@ -0,0 +1,154 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.kafka.streams.state.internals; + +// SUPPRESS CHECKSTYLE:OFF LineLength +// SUPPRESS CHECKSTYLE:OFF ModifierOrder +// SUPPRESS CHECKSTYLE:OFF OperatorWrap +// SUPPRESS CHECKSTYLE:OFF HiddenField +// SUPPRESS CHECKSTYLE:OFF NeedBraces +// SUPPRESS CHECKSTYLE:OFF NestedForDepth +// SUPPRESS CHECKSTYLE:OFF JavadocStyle +// SUPPRESS CHECKSTYLE:OFF NestedForDepth + +import java.util.List; + +import org.apache.kafka.common.serialization.Serde; +import org.apache.kafka.streams.KeyValue; +import org.apache.kafka.streams.processor.ProcessorContext; +import org.apache.kafka.streams.processor.StateStore; +import org.apache.kafka.streams.processor.internals.ProcessorStateManager; +import org.apache.kafka.streams.state.KeyValueIterator; +import org.apache.kafka.streams.state.KeyValueStore; +import org.apache.kafka.streams.state.StateSerdes; + +//Note: This class is copied from Kafka Streams InMemoryKeyValueLoggedStore with the only changes commented with "Twitter Changed" +public class InMemoryKeyValueFlushingLoggedStore extends WrappedStateStore.AbstractStateStore implements KeyValueStore { + + private final KeyValueStore inner; + private final Serde keySerde; + private final Serde valueSerde; + + // Twitter Changed + //private StoreChangeLogger changeLogger; + private StoreChangeFlushingLogger changeLogger; + + InMemoryKeyValueFlushingLoggedStore(final KeyValueStore inner, Serde keySerde, Serde valueSerde) { + super(inner); + this.inner = inner; + this.keySerde = keySerde; + this.valueSerde = valueSerde; + } + + @Override + @SuppressWarnings("unchecked") + public void init(ProcessorContext context, StateStore root) { + inner.init(context, root); + + // construct the serde + StateSerdes serdes = new StateSerdes<>( + ProcessorStateManager.storeChangelogTopic(context.applicationId(), inner.name()), + keySerde == null ? (Serde) context.keySerde() : keySerde, + valueSerde == null ? 
(Serde) context.valueSerde() : valueSerde); + + // Twitter Changed + //this.changeLogger = new StoreChangeLogger<>(inner.name(), context, serdes); + this.changeLogger = new StoreChangeFlushingLogger<>(inner.name(), context, serdes); + + // if the inner store is an LRU cache, add the eviction listener to log removed record + if (inner instanceof MemoryLRUCache) { + ((MemoryLRUCache) inner).whenEldestRemoved(new MemoryNavigableLRUCache.EldestEntryRemovalListener() { + @Override + public void apply(K key, V value) { + removed(key); + } + }); + } + } + + @Override + public long approximateNumEntries() { + return inner.approximateNumEntries(); + } + + @Override + public V get(K key) { + return this.inner.get(key); + } + + @Override + public void put(K key, V value) { + this.inner.put(key, value); + + changeLogger.logChange(key, value); + } + + @Override + public V putIfAbsent(K key, V value) { + V originalValue = this.inner.putIfAbsent(key, value); + if (originalValue == null) { + changeLogger.logChange(key, value); + } + return originalValue; + } + + @Override + public void putAll(List> entries) { + this.inner.putAll(entries); + + for (KeyValue entry : entries) { + K key = entry.key; + changeLogger.logChange(key, entry.value); + } + } + + @Override + public V delete(K key) { + V value = this.inner.delete(key); + + removed(key); + + return value; + } + + /** + * Called when the underlying {@link #inner} {@link KeyValueStore} removes an entry in response to a call from this + * store. + * + * @param key the key for the entry that the inner store removed + */ + protected void removed(K key) { + changeLogger.logChange(key, null); + } + + @Override + public KeyValueIterator range(K from, K to) { + return this.inner.range(from, to); + } + + @Override + public KeyValueIterator all() { + return this.inner.all(); + } + + @Override + //Twitter added + public void flush() { + changeLogger.flush(); + super.flush(); + } +} diff --git a/kafka-streams/kafka-streams/src/main/java/org/apache/kafka/streams/state/internals/StoreChangeFlushingLogger.java b/kafka-streams/kafka-streams/src/main/java/org/apache/kafka/streams/state/internals/StoreChangeFlushingLogger.java new file mode 100644 index 0000000000..ff5153422a --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/java/org/apache/kafka/streams/state/internals/StoreChangeFlushingLogger.java @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.kafka.streams.state.internals; + +// SUPPRESS CHECKSTYLE:OFF LineLength +// SUPPRESS CHECKSTYLE:OFF ModifierOrder +// SUPPRESS CHECKSTYLE:OFF OperatorWrap +// SUPPRESS CHECKSTYLE:OFF HiddenField +// SUPPRESS CHECKSTYLE:OFF NeedBraces +// SUPPRESS CHECKSTYLE:OFF NestedForDepth +// SUPPRESS CHECKSTYLE:OFF JavadocStyle +// SUPPRESS CHECKSTYLE:OFF NestedForDepth +// SUPPRESS CHECKSTYLE:OFF ConstantName + +import java.util.function.BiConsumer; + +import org.agrona.collections.Hashing; +import org.agrona.collections.Object2ObjectHashMap; +import org.apache.kafka.common.serialization.Serializer; +import org.apache.kafka.streams.processor.ProcessorContext; +import org.apache.kafka.streams.processor.TaskId; +import org.apache.kafka.streams.processor.internals.ProcessorStateManager; +import org.apache.kafka.streams.processor.internals.RecordCollector; +import org.apache.kafka.streams.state.StateSerdes; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * Note that the use of array-typed keys is discouraged because they result in incorrect caching behavior. + * If you intend to work on byte arrays as key, for example, you may want to wrap them with the {@code Bytes} class, + * i.e. use {@code RocksDBStore} rather than {@code RocksDBStore}. + * + * @param + * @param + */ +//See FlushingStores for motivations of this class +//Note: This class is copied from Kafka Streams StoreChangeLogger with the only changes commented with "Twitter Changed" +// The modifications provide "flushing" functionality which flushes the latest records for a given key to the changelog +// after every Kafka commit (which triggers the flush method below) +class StoreChangeFlushingLogger { + + protected final StateSerdes serialization; + + private final String topic; + private final int partition; + private final ProcessorContext context; + private final RecordCollector collector; + + // Twitter Changed + private static final Logger log = LoggerFactory.getLogger(StoreChangeFlushingLogger.class); + private final TaskId taskId; + private final Serializer keySerializer; + private final Serializer valueSerializer; + private final Object2ObjectHashMap> newEntries = new Object2ObjectHashMap<>(100000, Hashing.DEFAULT_LOAD_FACTOR); + + StoreChangeFlushingLogger(String storeName, ProcessorContext context, StateSerdes serialization) { + this(storeName, context, context.taskId().partition, serialization); + } + + private StoreChangeFlushingLogger(String storeName, ProcessorContext context, int partition, StateSerdes serialization) { + this.topic = ProcessorStateManager.storeChangelogTopic(context.applicationId(), storeName); + this.context = context; + this.partition = partition; + this.serialization = serialization; + this.collector = ((RecordCollector.Supplier) context).recordCollector(); + + // Twitter Added + this.taskId = context.taskId(); + this.keySerializer = serialization.keySerializer(); + this.valueSerializer = serialization.valueSerializer(); + } + + void logChange(final K key, final V value) { + if (collector != null) { + // Twitter Added + newEntries.put(key, new ValueAndTimestamp<>(value, context.timestamp())); + } + } + + // Twitter Changed + /* + * logChange now saves new entries into a map, which collapses entries using the same key. When flush + * is called, we send the latest collapsed entries to the changelog topic. By buffering entries + * before flush is called, we avoid writing every log change to the changelog topic. + * Pros: Less messages to and from changelog. 
Less broker side compaction needed. Bursts of changelog messages are better batched and compressed. + * Cons: Changelog messages are written to the changelog topic in bursts. + */ + void flush() { + if (!newEntries.isEmpty()) { + newEntries.forEach(foreachConsumer); + log.info("Task " + taskId + " flushed " + newEntries.size() + " entries into " + topic + "." + partition); + newEntries.clear(); + } + } + + private final BiConsumer> foreachConsumer = new BiConsumer>() { + @Override + public final void accept(K key, ValueAndTimestamp valueAndTimestamp) { + // Sending null headers to changelog topics (KIP-244) + collector.send( + topic, + key, + valueAndTimestamp.value, + null, + partition, + valueAndTimestamp.timestamp, + keySerializer, + valueSerializer); + } + }; + + class ValueAndTimestamp { + public final V value; + public final Long timestamp; + + ValueAndTimestamp(V value, Long timestamp) { + this.value = value; + this.timestamp = timestamp; + } + + @Override + public boolean equals(Object o) { + if (this == o) return true; + if (o == null || getClass() != o.getClass()) return false; + + ValueAndTimestamp that = (ValueAndTimestamp) o; + + if (value != null ? !value.equals(that.value) : that.value != null) return false; + return timestamp != null ? timestamp.equals(that.timestamp) : that.timestamp == null; + } + + @Override + public int hashCode() { + int result = value != null ? value.hashCode() : 0; + result = 31 * result + (timestamp != null ? timestamp.hashCode() : 0); + return result; + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/BUILD b/kafka-streams/kafka-streams/src/main/scala/BUILD new file mode 100644 index 0000000000..588f03f714 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/BUILD @@ -0,0 +1,53 @@ +# Upgrade version of rocksdb +jar_library( + name = "rocksdb-5.14.2", + jars = [ + jar( + org = "org.rocksdb", + name = "rocksdbjni", + force = True, + rev = "5.14.2", + ), + ], +) + +scala_library( + sources = rglobs( + "com/twitter/finatra/kafkastreams/*.scala", + "com/twitter/finatra/streams/*.scala", + "org/*.scala", + ), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-streams", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + ":rocksdb-5.14.2", + "3rdparty/jvm/it/unimi/dsi:fastutil", + "3rdparty/jvm/org/agrona", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-streams", + "3rdparty/jvm/org/apache/kafka:kafka-streams-scala", + "finatra/inject/inject-core", + "finatra/inject/inject-server", + "finatra/inject/inject-slf4j", + "finatra/inject/inject-utils", + "finatra/kafka-streams/kafka-streams-queryable-thrift-client/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/main/java", + "finatra/kafka/src/main/scala", + ], + exports = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-streams", + "3rdparty/jvm/org/apache/kafka:kafka-streams-scala", + "finatra/inject/inject-core", + "finatra/inject/inject-server", + "finatra/inject/inject-slf4j", + "finatra/inject/inject-utils", + "finatra/kafka-streams/kafka-streams/src/main/java", + "finatra/kafka/src/main/scala", + ], +) diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/KafkaStreamsTwitterServer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/KafkaStreamsTwitterServer.scala new file mode 100644 index 0000000000..ce0916036e --- /dev/null +++ 
b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/KafkaStreamsTwitterServer.scala @@ -0,0 +1,336 @@ +package com.twitter.finatra.kafkastreams + +import com.twitter.app.Flag +import com.twitter.conversions.DurationOps._ +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.interceptors.{ + InstanceMetadataProducerInterceptor, + MonitoringConsumerInterceptor, + PublishTimeProducerInterceptor +} +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter +import com.twitter.finatra.kafkastreams.config.{FinatraRocksDBConfig, KafkaStreamsConfig} +import com.twitter.finatra.kafkastreams.domain.ProcessingGuarantee +import com.twitter.finatra.kafkastreams.internal.ScalaStreamsImplicits +import com.twitter.finatra.kafkastreams.internal.listeners.FinatraStateRestoreListener +import com.twitter.finatra.kafkastreams.internal.serde.AvoidDefaultSerde +import com.twitter.finatra.kafkastreams.internal.stats.KafkaStreamsFinagleMetricsReporter +import com.twitter.finatra.kafkastreams.utils.KafkaFlagUtils +import com.twitter.finatra.streams.interceptors.KafkaStreamsMonitoringConsumerInterceptor +import com.twitter.inject.server.TwitterServer +import com.twitter.util.Duration +import java.util.Properties +import java.util.concurrent.TimeUnit +import java.util.concurrent.atomic.AtomicLong +import org.apache.kafka.clients.consumer.{ConsumerConfig, OffsetResetStrategy} +import org.apache.kafka.common.metrics.Sensor.RecordingLevel +import org.apache.kafka.streams.KafkaStreams.{State, StateListener} +import org.apache.kafka.streams.processor.internals.DefaultKafkaClientSupplier +import org.apache.kafka.streams.{ + KafkaClientSupplier, + KafkaStreams, + StreamsBuilder, + StreamsConfig, + Topology +} + +/** + * A [[com.twitter.server.TwitterServer]] that supports configuring a KafkaStreams topology. + * + * To use, override the [[configureKafkaStreams]] method to setup your topology. 
+ * + * {{{ + * import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer + * + * object MyKafkaStreamsTwitterServerMain extends MyKafkaStreamsTwitterServer + * + * class MyKafkaStreamsTwitterServer extends KafkaStreamsTwitterServer { + * + * override def configureKafkaStreams(streamsBuilder: StreamsBuilder): Unit = { + * streamsBuilder.asScala + * .stream("dp-it-devel-tweetid-to-interaction")( + * Consumed.`with`(ScalaSerdes.Long, ScalaSerdes.Thrift[MigratorInteraction]) + * ) + * } + * }}} + */ +abstract class KafkaStreamsTwitterServer + extends TwitterServer + with KafkaFlagUtils + with ScalaStreamsImplicits { + + // Required configs + protected[kafkastreams] val applicationId = + requiredKafkaFlag[String](StreamsConfig.APPLICATION_ID_CONFIG) + protected[kafkastreams] val bootstrapServer = requiredKafkaFlag[String]( + StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, + helpPrefix = "A finagle destination or" + ) + + // Configs using kafka default + private val numStreamThreads = + flagWithKafkaDefault[Integer](StreamsConfig.NUM_STREAM_THREADS_CONFIG) + private val numStandbyReplicas = + flagWithKafkaDefault[Integer](StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG) + private val processingGuarantee = + flagWithKafkaDefault[String](StreamsConfig.PROCESSING_GUARANTEE_CONFIG) + private val cacheMaxBytesBuffering = + flagWithKafkaDefault[Long](StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG) + private val metadataMaxAge = flagWithKafkaDefault[Long](StreamsConfig.METADATA_MAX_AGE_CONFIG) + private val maxPollRecords = kafkaFlag[Int]( + ConsumerConfig.MAX_POLL_RECORDS_CONFIG, + 500, + "The maximum number of records returned in a single call to poll()." + ) //TODO: Use ConsumerConfig aware flagWithKafkaDefault + + // Configs with customized default + private val replicationFactor = kafkaFlag(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3) // We set it to 3 for durability and reliability. + protected[kafkastreams] val applicationServerConfig = + kafkaFlag(StreamsConfig.APPLICATION_SERVER_CONFIG, s"localhost:$defaultAdminPort") + private val stateDir = kafkaFlag(StreamsConfig.STATE_DIR_CONFIG, "kafka-stream-state") + private val metricsRecordingLevel = + kafkaFlag(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, "INFO") + private val autoOffsetReset = kafkaFlag( + ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, + "latest", + ConsumerConfig.AUTO_OFFSET_RESET_DOC + ) + protected val commitInterval: Flag[Duration] = flag( + "kafka.commit.interval", + 30.seconds, + "The frequency with which to save the position of the processor." + ) + private val instanceKey: Flag[String] = flag( + InstanceMetadataProducerInterceptor.KafkaInstanceKeyFlagName, + "", + "The application specific identifier for process or job that gets added to record header as `instance_key`." + + "The `instance_key` is only included when this flag is set, otherwise no header will be included." + ) + + @volatile private var timeStartedRebalancingOpt: Option[Long] = None + private val totalTimeRebalancing: AtomicLong = new AtomicLong(0) + + @volatile private var lastUncaughtException: Throwable = _ + + def uncaughtException: Throwable = lastUncaughtException + + protected[kafkastreams] val kafkaStreamsBuilder = new StreamsBuilder() + protected[kafkastreams] var properties: Properties = _ + protected[kafkastreams] var topology: Topology = _ + protected var kafkaStreams: KafkaStreams = _ + + /* Abstract Protected */ + + /** + * Callback method which is executed after the injector is created and before any other lifecycle + * methods. 
+ * + * Use the provided StreamsBuilder to create your KafkaStreams topology. + * + * @note It is NOT expected that you block in this method as you will prevent completion + * of the server lifecycle. + * @param builder + */ + protected def configureKafkaStreams(builder: StreamsBuilder): Unit + + /* Protected */ + + override val defaultCloseGracePeriod: Duration = 1.minute + + protected def streamsStatsReceiver: StatsReceiver = { + injector.instance[StatsReceiver].scope("kafka").scope("stream") + } + + override protected def postInjectorStartup(): Unit = { + super.postInjectorStartup() + properties = createKafkaStreamsProperties() + topology = createKafkaStreamsTopology() + } + + override protected def postWarmup(): Unit = { + super.postWarmup() + createAndStartKafkaStreams() + } + + /* Protected */ + + protected[finatra] def createAndStartKafkaStreams(): Unit = { + kafkaStreams = new KafkaStreams(topology, properties, kafkaStreamsClientSupplier) + setExceptionHandler(kafkaStreams) + monitorStateChanges(kafkaStreams) + closeKafkaStreamsOnExit(kafkaStreams) + + kafkaStreams.start() + while (!kafkaStreams.state().isRunning) { + Thread.sleep(100) + debug("Waiting for Initial Kafka Streams Startup") + } + } + + protected def kafkaStreamsClientSupplier: KafkaClientSupplier = { + new DefaultKafkaClientSupplier + } + + protected def onStateChange(newState: State, oldState: State): Unit = {} + + protected def setExceptionHandler(streams: KafkaStreams): Unit = { + streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() { + override def uncaughtException(t: Thread, e: Throwable): Unit = { + error("UncaughtException in thread " + t, e) + lastUncaughtException = e + } + }) + } + + /** + * Callback method which is executed after the injector is created and before KafkaStreams is + * configured. + * + * Use the provided KafkaStreamsConfig and augment to configure your KafkaStreams topology. + * + * Example: + * + * {{{ + * override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + * super + * .streamsProperties(config) + * .retries(60) + * .retryBackoff(1.second) + * .consumer.sessionTimeout(10.seconds) + * .consumer.heartbeatInterval(1.second) + * .producer.retries(300) + * .producer.retryBackoff(1.second) + * .producer.requestTimeout(2.minutes) + * .producer.transactionTimeout(2.minutes) + * .producer.batchSize(500.kilobytes) + * } + * }}} + * + * + * @param config the default KafkaStreamsConfig defined at [[createKafkaStreamsProperties]] + * + * @return a KafkaStreamsConfig with your additional configurations applied. 
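+   *
+   * @note The returned [[KafkaStreamsConfig]] is converted to a `java.util.Properties` by
+   *       [[createKafkaStreamsProperties]], which is where these settings are ultimately applied.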
+ */ + protected def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = config + + protected[finatra] def createKafkaStreamsProperties(): Properties = { + var defaultConfig = + new KafkaStreamsConfig() + .metricReporter[KafkaStreamsFinagleMetricsReporter] + .metricsRecordingLevelConfig(RecordingLevel.forName(metricsRecordingLevel())) + .metricsSampleWindow(60.seconds) + .applicationServer(applicationServerConfig()) + .dest(bootstrapServer()) + .stateDir(stateDir()) + .commitInterval(commitInterval()) + .replicationFactor(replicationFactor()) + .numStreamThreads(numStreamThreads()) + .cacheMaxBuffering(cacheMaxBytesBuffering().bytes) + .numStandbyReplicas(numStandbyReplicas()) + .metadataMaxAge(metadataMaxAge().milliseconds) + .processingGuarantee(ProcessingGuarantee.valueOf(processingGuarantee().toUpperCase)) + .defaultKeySerde[AvoidDefaultSerde] + .defaultValueSerde[AvoidDefaultSerde] + .withConfig(InstanceMetadataProducerInterceptor.KafkaInstanceKeyFlagName, instanceKey()) + .producer.metricReporter[KafkaStreamsFinagleMetricsReporter] + .producer.metricsRecordingLevel(RecordingLevel.forName(metricsRecordingLevel())) + .producer.metricsSampleWindow(60.seconds) + .producer.interceptor[PublishTimeProducerInterceptor] + .producer.interceptor[InstanceMetadataProducerInterceptor] + .consumer.metricReporter[KafkaStreamsFinagleMetricsReporter] + .consumer.metricsRecordingLevel(RecordingLevel.forName(metricsRecordingLevel())) + .consumer.metricsSampleWindow(60.seconds) + .consumer.autoOffsetReset(OffsetResetStrategy.valueOf(autoOffsetReset().toUpperCase)) + .consumer.maxPollRecords(maxPollRecords()) + .consumer.interceptor[KafkaStreamsMonitoringConsumerInterceptor] + + if (applicationId().nonEmpty) { + defaultConfig = defaultConfig.applicationId(applicationId()) + } + + val properties = streamsProperties(defaultConfig).properties + + // Extra properties used by KafkaStreamsFinagleMetricsReporter. 
+    properties.put("stats_scope", "kafka")
+    properties.put(StreamsConfig.producerPrefix("stats_scope"), "kafka")
+    properties.put(StreamsConfig.consumerPrefix("stats_scope"), "kafka")
+
+    properties
+  }
+
+  protected[finatra] def createKafkaStreamsTopology(): Topology = {
+    KafkaFinagleMetricsReporter.init(injector)
+    MonitoringConsumerInterceptor.init(injector)
+    FinatraRocksDBConfig.init(injector)
+
+    configureKafkaStreams(kafkaStreamsBuilder)
+    val topology = kafkaStreamsBuilder.build()
+    info(topology.describe)
+    topology
+  }
+
+  /* Private */
+
+  private def closeKafkaStreamsOnExit(kafkaStreamsToClose: KafkaStreams): Unit = {
+    onExit {
+      info("Closing kafka streams")
+      try {
+        kafkaStreamsToClose.close(defaultCloseGracePeriod.inMillis, TimeUnit.MILLISECONDS)
+      } catch {
+        case e: Throwable =>
+          error("Error while closing kafka streams", e)
+      }
+      info("Closed kafka streams")
+    }
+  }
+
+  private def monitorStateChanges(streams: KafkaStreams): Unit = {
+    streams.setStateListener(new FinatraStateChangeListener(streams))
+
+    streams.setGlobalStateRestoreListener(new FinatraStateRestoreListener(streamsStatsReceiver))
+
+    streamsStatsReceiver.provideGauge("totalTimeRebalancing")(totalTimeRebalancing.get())
+
+    streamsStatsReceiver.provideGauge("state") {
+      streams.state match {
+        case State.CREATED => 1
+        case State.RUNNING => 2
+        case State.REBALANCING => 3
+        case State.PENDING_SHUTDOWN => 4
+        case State.NOT_RUNNING => 5
+        case State.ERROR => 6
+      }
+    }
+  }
+
+  private class FinatraStateChangeListener(streams: KafkaStreams) extends StateListener {
+    override def onChange(newState: State, oldState: State): Unit = {
+      debug(streams.toString)
+      if (newState == State.REBALANCING) {
+        timeStartedRebalancingOpt = Some(System.currentTimeMillis())
+      } else {
+        for (timeStartedRebalancing <- timeStartedRebalancingOpt) {
+          totalTimeRebalancing.addAndGet(System.currentTimeMillis - timeStartedRebalancing)
+          timeStartedRebalancingOpt = None
+        }
+      }
+
+      onStateChange(newState, oldState)
+
+      if (newState == State.ERROR) {
+        forkAndCloseServer("State.Error")
+      }
+    }
+  }
+
+  // Note: Kafka feature tests hang without closing the twitter server from a separate thread.
+  private def forkAndCloseServer(reason: String): Unit = {
+    new Thread {
+      override def run(): Unit = {
+        info(s"FinatraStreams closing server: $reason")
+        close(defaultCloseGracePeriod)
+      }
+    }.start()
+  }
+}
diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/StatelessKafkaStreamsTwitterServer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/StatelessKafkaStreamsTwitterServer.scala
new file mode 100644
index 0000000000..8f59d6ecf9
--- /dev/null
+++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/StatelessKafkaStreamsTwitterServer.scala
@@ -0,0 +1,24 @@
+package com.twitter.finatra.kafkastreams
+
+import com.twitter.finatra.kafkastreams.internal.utils.TopologyReflectionUtils
+import org.apache.kafka.streams.Topology
+
+/**
+ * StatelessKafkaStreamsTwitterServer is used for stateless Kafka Streams transformations that do
+ * not need to store data in local state stores.
+ *
+ * Note 1: When using this class, server startup will fail if a local state store is used.
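+ *
+ * For example, a minimal sketch of a purely stateless server (the topic names and serdes below
+ * are illustrative only):
+ *
+ * {{{
+ *   class MyStatelessServer extends StatelessKafkaStreamsTwitterServer {
+ *     override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = {
+ *       builder.asScala
+ *         .stream("input-topic")(Consumed.`with`(ScalaSerdes.Long, ScalaSerdes.Long))
+ *         .filter((_, value) => value > 0)
+ *         .to("output-topic")(Produced.`with`(ScalaSerdes.Long, ScalaSerdes.Long))
+ *     }
+ *   }
+ * }}}
+ *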
+ * Note 2: In the future, we could potentially use a different TaskAssignment strategy to avoid
+ * unneeded metadata in the partition join requests.
+ */
+abstract class StatelessKafkaStreamsTwitterServer extends KafkaStreamsTwitterServer {
+
+  override def createKafkaStreamsTopology(): Topology = {
+    val topology = super.createKafkaStreamsTopology()
+    if (!TopologyReflectionUtils.isStateless(topology)) {
+      throw new UnsupportedOperationException(
+        "This server extends StatelessKafkaStreamsTwitterServer but its topology is not stateless"
+      )
+    }
+    topology
+  }
+}
diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/config/FinatraRocksDBConfig.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/config/FinatraRocksDBConfig.scala
new file mode 100644
index 0000000000..52b1adbc4b
--- /dev/null
+++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/config/FinatraRocksDBConfig.scala
@@ -0,0 +1,138 @@
+package com.twitter.finatra.kafkastreams.config
+
+import com.twitter.conversions.StorageUnitOps._
+import com.twitter.finagle.stats.{LoadedStatsReceiver, StatsReceiver}
+import com.twitter.finatra.kafkastreams.internal.stats.RocksDBStatsCallback
+import com.twitter.inject.{Injector, Logging}
+import com.twitter.jvm.numProcs
+import com.twitter.util.StorageUnit
+import java.util
+import org.apache.kafka.streams.state.RocksDBConfigSetter
+import org.rocksdb.{
+  BlockBasedTableConfig,
+  BloomFilter,
+  CompactionStyle,
+  CompressionType,
+  InfoLogLevel,
+  LRUCache,
+  Options,
+  Statistics,
+  StatisticsCollector,
+  StatsCollectorInput,
+  StatsLevel
+}
+
+object FinatraRocksDBConfig {
+
+  val RocksDbBlockCacheSizeConfig = "rocksdb.block.cache.size"
+  val RocksDbLZ4Config = "rocksdb.lz4"
+  val RocksDbEnableStatistics = "rocksdb.statistics"
+  val RocksDbStatCollectionPeriodMs = "rocksdb.statistics.collection.period.ms"
+
+  // BlockCache to be shared by all RocksDB instances created on this instance. Note that a single
+  // Kafka Streams instance may get multiple tasks assigned to it, and each stateful task will have
+  // a separate RocksDB instance created. This cache will be shared across all the tasks.
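+  // The cache size defaults to 100 megabytes and can be overridden by setting the
+  // "rocksdb.block.cache.size" config (a value in bytes) in the streams properties.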
+ // See: https://github.com/facebook/rocksdb/wiki/Block-Cache + private var SharedBlockCache: LRUCache = _ + + private var globalStatsReceiver: StatsReceiver = LoadedStatsReceiver + + def init(injector: Injector): Unit = { + globalStatsReceiver = injector.instance[StatsReceiver] + } +} + +class FinatraRocksDBConfig extends RocksDBConfigSetter with Logging { + + //See https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#other-general-options + override def setConfig( + storeName: String, + options: Options, + configs: util.Map[String, AnyRef] + ): Unit = { + if (FinatraRocksDBConfig.SharedBlockCache == null) { + val blockCacheSize = + getBytesOrDefault(configs, FinatraRocksDBConfig.RocksDbBlockCacheSizeConfig, 100.megabytes) + val numShardBits = 1 //TODO: Make configurable so this can be increased for multi-threaded queryable state access + FinatraRocksDBConfig.SharedBlockCache = new LRUCache(blockCacheSize, numShardBits) + } + + val tableConfig = new BlockBasedTableConfig + tableConfig.setBlockSize(16 * 1024) + tableConfig.setBlockCache(FinatraRocksDBConfig.SharedBlockCache) + tableConfig.setFilter(new BloomFilter(10)) + options + .setTableFormatConfig(tableConfig) + + options + .setDbWriteBufferSize(0) + .setWriteBufferSize(1.gigabyte.inBytes) //TODO: Make configurable with default value equal to RocksDB default (which is much lower than 1 GB!) + .setMinWriteBufferNumberToMerge(1) + .setMaxWriteBufferNumber(2) + + options + .setBytesPerSync(1048576) //See: https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#other-general-options + .setMaxBackgroundCompactions(4) + .setMaxBackgroundFlushes(2) + .setIncreaseParallelism(Math.max(numProcs().toInt, 2)) + + /* From the docs: "Allows thread-safe inplace updates. If this is true, there is no way to + achieve point-in-time consistency using snapshot or iterator (assuming concurrent updates). + Hence iterator and multi-get will return results which are not consistent as of any point-in-time." 
*/ + options + .setInplaceUpdateSupport(true) //We set to true since we never have concurrent updates + .setAllowConcurrentMemtableWrite(false) + .setEnableWriteThreadAdaptiveYield(false) + + options + .setCompactionStyle(CompactionStyle.UNIVERSAL) + .setMaxBytesForLevelBase(1.gigabyte.inBytes) + .setLevelCompactionDynamicLevelBytes(true) + .optimizeUniversalStyleCompaction() + + if (configs.get(FinatraRocksDBConfig.RocksDbLZ4Config) == "true") { + options.setCompressionType(CompressionType.LZ4_COMPRESSION) + } + + options + .setInfoLogLevel(InfoLogLevel.DEBUG_LEVEL) + + if (configs.get(FinatraRocksDBConfig.RocksDbEnableStatistics) == "true") { + val statistics = new Statistics + + val statsCallback = new RocksDBStatsCallback(FinatraRocksDBConfig.globalStatsReceiver) + val statsCollectorInput = new StatsCollectorInput(statistics, statsCallback) + val statsCollector = new StatisticsCollector( + util.Arrays.asList(statsCollectorInput), + getIntOrDefault(configs, FinatraRocksDBConfig.RocksDbStatCollectionPeriodMs, 60000) + ) + statsCollector.start() + + statistics.setStatsLevel(StatsLevel.ALL) + options + .setStatistics(statistics) + .setStatsDumpPeriodSec(20) + } + } + + private def getBytesOrDefault( + configs: util.Map[String, AnyRef], + key: String, + default: StorageUnit + ): Long = { + val valueBytesString = configs.get(key) + if (valueBytesString != null) { + valueBytesString.toString.toLong + } else { + default.inBytes + } + } + + private def getIntOrDefault(configs: util.Map[String, AnyRef], key: String, default: Int): Int = { + val valueString = configs.get(key) + if (valueString != null) { + valueString.toString.toInt + } else { + default + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/config/KafkaStreamsConfig.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/config/KafkaStreamsConfig.scala new file mode 100644 index 0000000000..b4109a7bc2 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/config/KafkaStreamsConfig.scala @@ -0,0 +1,164 @@ +package com.twitter.finatra.kafkastreams.config + +import com.twitter.finatra.kafka.config.{KafkaConfigMethods, ToKafkaProperties} +import com.twitter.finatra.kafka.consumers.KafkaConsumerConfigMethods +import com.twitter.finatra.kafka.producers.KafkaProducerConfigMethods +import com.twitter.finatra.kafka.utils.BootstrapServerUtils +import com.twitter.finatra.kafkastreams.domain.ProcessingGuarantee +import com.twitter.util.{Duration, StorageUnit} +import java.util.Properties +import org.apache.kafka.common.metrics.Sensor.RecordingLevel +import org.apache.kafka.common.security.auth.SecurityProtocol +import org.apache.kafka.streams.StreamsConfig + +/** + * Builder for setting various [[StreamsConfig]] parameters, see that class for documentation on + * each parameter. 
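+ *
+ * A minimal, illustrative sketch (the values shown are examples, not recommended settings):
+ *
+ * {{{
+ *   val properties: Properties =
+ *     new KafkaStreamsConfig()
+ *       .applicationId("example-app")
+ *       .bootstrapServers("localhost:9092")
+ *       .numStreamThreads(2)
+ *       .retries(60)
+ *       .retryBackoff(1.second)
+ *       .consumer.maxPollRecords(500)
+ *       .properties
+ * }}}
+ *
+ * Arbitrary [[StreamsConfig]] keys can also be set via `withConfig(key, value)`.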
+ */ +class KafkaStreamsConfig( + streamsConfigMap: Map[String, String] = Map.empty, + producerConfigMap: Map[String, String] = Map.empty, + consumerConfigMap: Map[String, String] = Map.empty) + extends KafkaConfigMethods[KafkaStreamsConfig] + with ToKafkaProperties { + override protected def fromConfigMap(configMap: Map[String, String]): KafkaStreamsConfig = + new KafkaStreamsConfig(configMap, producerConfigMap, consumerConfigMap) + + override protected def configMap: Map[String, String] = streamsConfigMap + + val producer: KafkaProducerConfigMethods[KafkaStreamsConfig] = + new KafkaProducerConfigMethods[KafkaStreamsConfig] { + override protected def fromConfigMap(configMap: Map[String, String]): This = + new KafkaStreamsConfig(streamsConfigMap, configMap, consumerConfigMap) + override protected def configMap: Map[String, String] = producerConfigMap + } + + val consumer: KafkaConsumerConfigMethods[KafkaStreamsConfig] = + new KafkaConsumerConfigMethods[KafkaStreamsConfig] { + override protected def fromConfigMap(configMap: Map[String, String]): This = + new KafkaStreamsConfig(streamsConfigMap, producerConfigMap, configMap) + + override protected def configMap: Map[String, String] = consumerConfigMap + } + + override def properties: Properties = { + val streamsProperties = super.properties + + for ((k, v) <- producerConfigMap) { + streamsProperties.put(StreamsConfig.producerPrefix(k), v) + } + + for ((k, v) <- consumerConfigMap) { + streamsProperties.put(StreamsConfig.consumerPrefix(k), v) + } + + streamsProperties + } + + def dest(dest: String): This = bootstrapServers(BootstrapServerUtils.lookupBootstrapServers(dest)) + + def applicationId(appId: String): This = + withConfig(StreamsConfig.APPLICATION_ID_CONFIG, appId) + + def applicationServer(hostPort: String): This = + withConfig(StreamsConfig.APPLICATION_SERVER_CONFIG, hostPort) + + def bootstrapServers(servers: String): This = + withConfig(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, servers) + + def bufferedRecordsPerPartition(records: Int): This = + withConfig(StreamsConfig.BUFFERED_RECORDS_PER_PARTITION_CONFIG, records.toString) + + def cacheMaxBuffering(storageUnit: StorageUnit): This = + withConfig(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, storageUnit) + + def clientId(clientId: String): This = + withConfig(StreamsConfig.CLIENT_ID_CONFIG, clientId) + + def commitInterval(duration: Duration): This = + withConfig(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, duration) + + def connectionsMaxIdle(duration: Duration): This = + withConfig(StreamsConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, duration) + + def defaultDeserializationExceptionHandler[T: Manifest]: This = + withClassName[T](StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG) + + def defaultKeySerde[T: Manifest]: This = + withClassName[T](StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG) + + def defaultTimestampExtractor[T: Manifest]: This = + withClassName[T](StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG) + + def defaultValueSerde[T: Manifest]: This = + withClassName[T](StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG) + + def numStandbyReplicas(threads: Int): This = + withConfig(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, threads.toString) + + def numStreamThreads(threads: Int): This = + withConfig(StreamsConfig.NUM_STREAM_THREADS_CONFIG, threads.toString) + + def metadataMaxAge(duration: Duration): This = + withConfig(StreamsConfig.METADATA_MAX_AGE_CONFIG, duration) + + def metricReporter[T: Manifest]: This = + 
withClassName[T](StreamsConfig.METRIC_REPORTER_CLASSES_CONFIG) + + def metricsNumSamples(samples: Int): This = + withConfig(StreamsConfig.METRICS_NUM_SAMPLES_CONFIG, samples.toString) + + def metricsRecordingLevelConfig(recordingLevel: RecordingLevel): This = + withConfig(StreamsConfig.METRICS_RECORDING_LEVEL_CONFIG, recordingLevel.name) + + def metricsSampleWindow(duration: Duration): This = + withConfig(StreamsConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG, duration) + + def partitionGrouper[T: Manifest]: This = + withClassName[T](StreamsConfig.PARTITION_GROUPER_CLASS_CONFIG) + + def poll(duration: Duration): This = + withConfig(StreamsConfig.POLL_MS_CONFIG, duration) + + def processingGuarantee(guarantee: ProcessingGuarantee): This = + withConfig(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, guarantee.toString) + + def receiveBuffer(storageUnit: StorageUnit): This = + withConfig(StreamsConfig.RECEIVE_BUFFER_CONFIG, storageUnit) + + def reconnectBackoffMax(duration: Duration): This = + withConfig(StreamsConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, duration) + + def reconnectBackoff(duration: Duration): This = + withConfig(StreamsConfig.RECONNECT_BACKOFF_MS_CONFIG, duration) + + def replicationFactor(factor: Int): This = + withConfig(StreamsConfig.REPLICATION_FACTOR_CONFIG, factor.toString) + + def requestTimeout(duration: Duration): This = + withConfig(StreamsConfig.REQUEST_TIMEOUT_MS_CONFIG, duration) + + def retries(retries: Int): This = + withConfig(StreamsConfig.RETRIES_CONFIG, retries.toString) + + def retryBackoff(duration: Duration): This = + withConfig(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, duration) + + def rocksDbConfigSetter[T: Manifest]: This = + withClassName[T](StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG) + + def securityProtocol(securityProtocol: SecurityProtocol): This = + withConfig(StreamsConfig.SECURITY_PROTOCOL_CONFIG, securityProtocol.name) + + def sendBuffer(storageUnit: StorageUnit): This = + withConfig(StreamsConfig.SEND_BUFFER_CONFIG, storageUnit) + + def stateCleanupDelay(duration: Duration): This = + withConfig(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, duration) + + def stateDir(directory: String): This = + withConfig(StreamsConfig.STATE_DIR_CONFIG, directory) + + def windowStoreChangeLogAdditionalRetention(duration: Duration): This = + withConfig(StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG, duration) +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/dsl/FinatraDslSampling.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/dsl/FinatraDslSampling.scala new file mode 100644 index 0000000000..5f210f2160 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/dsl/FinatraDslSampling.scala @@ -0,0 +1,155 @@ +package com.twitter.finatra.kafkastreams.dsl + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.internal.utils.sampling.{ + IndexedSampleKeySerde, + ReservoirSamplingTransformer +} +import com.twitter.finatra.streams.config.DefaultTopicConfig +import com.twitter.finatra.streams.flags.FinatraTransformerFlags +import com.twitter.finatra.streams.transformer.{FinatraTransformer, SamplingUtils} +import com.twitter.inject.Logging +import com.twitter.util.Duration +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.StreamsBuilder +import 
org.apache.kafka.streams.scala.kstream.{KStream => KStreamS}
+import org.apache.kafka.streams.state.Stores
+import scala.reflect.ClassTag
+
+/**
+ * This trait adds reservoir sampling DSL methods to the Kafka Streams DSL.
+ */
+trait FinatraDslSampling
+  extends KafkaStreamsTwitterServer
+  with FinatraTransformerFlags
+  with Logging {
+
+  protected def streamsStatsReceiver: StatsReceiver
+
+  protected def kafkaStreamsBuilder: StreamsBuilder
+
+  implicit class FinatraKeyValueStream[K: ClassTag, V](inner: KStreamS[K, V]) {
+
+    /**
+     * Counts and samples an attribute of a stream of records.
+     *
+     * This transformer uses two state stores:
+     *
+     * numCountsStore is a KeyValueStore[SampleKey, Long] which stores a SampleKey and the total
+     * number of times that SampleKey was seen.
+     *
+     * sampleStore is a KeyValueStore[IndexedSampleKey[SampleKey], SampleValue] which stores the
+     * samples themselves. The Key is an IndexedSampleKey, which is your sample key wrapped with an
+     * index of 0..sampleSize. The value is the SampleValue that you want to sample.
+     *
+     * Example: if you had a stream of Interaction(engagingUserId, engagementType) and you wanted a
+     * sample of users who performed each engagement type, then your sampleKey would be
+     * engagementType and your sampleValue would be userId.
+     *
+     * Incoming stream:
+     * (engagingUserId = 12, engagementType = Displayed)
+     * (engagingUserId = 100, engagementType = Favorited)
+     * (engagingUserId = 101, engagementType = Favorited)
+     * (engagingUserId = 12, engagementType = Favorited)
+     *
+     * This is what the numCountsStore table would look like:
+     * SampleKey is EngagementType.
+     *
+     * |-----------|-------|
+     * | SampleKey | Count |
+     * |-----------|-------|
+     * | Displayed |   1   |
+     * | Favorited |   3   |
+     * |-----------|-------|
+     *
+     * This is what the sampleStore table would look like:
+     * SampleKey is EngagementType.
+     * SampleValue is the engaging user id.
+     *
+     * |-----------------------------|-------------|
+     * | IndexedSampleKey[SampleKey] | SampleValue |
+     * |-----------------------------|-------------|
+     * | (Displayed, index = 0)      |     12      |
+     * | (Favorited, index = 0)      |     100     |
+     * | (Favorited, index = 1)      |     101     |
+     * | (Favorited, index = 2)      |     12      |
+     * |-----------------------------|-------------|
+     *
+     * If you want to reference the sample store (so that you can query it), the name of the store
+     * can be found by calling `SamplingUtils.getSampleStoreName(sampleName)`. You can reference the
+     * name of the count store by calling `SamplingUtils.getNumCountsStoreName(sampleName)`.
+     *
+     * *Note* This method will create the state stores for you.
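+     *
+     * An illustrative sketch, assuming a stream `userIdToEngagementType` keyed by userId (Long)
+     * with a String engagement type as the value (the sample name and sizes are examples only):
+     *
+     * {{{
+     *   userIdToEngagementType.sample(
+     *     toSampleKey = (_, engagementType) => engagementType,
+     *     toSampleValue = (userId, _) => userId,
+     *     sampleSize = 1000,
+     *     expirationTime = Some(3.days),
+     *     sampleName = "EngagersPerEngagementType",
+     *     sampleKeySerde = Serdes.String,
+     *     sampleValueSerde = ScalaSerdes.Long)
+     * }}}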
+ * + * @param toSampleKey returns the key of the sample + * @param toSampleValue returns the type that you want to sample + * @param sampleSize the size of the sample + * @param expirationTime the amount of time after creation that a sample should be expired + * @param sampleName the name of the sample + * @param sampleKeySerde the serde for the SampleKey + * @param sampleValueSerde the serde for the SampleValue + * @tparam SampleValue the type of the SampleVaule + * + * @return a stream of SampleKey and SampleValue + */ + def sample[SampleKey: ClassTag, SampleValue: ClassTag]( + toSampleKey: (K, V) => SampleKey, + toSampleValue: (K, V) => SampleValue, + sampleSize: Int, + expirationTime: Option[Duration], + sampleName: String, + sampleKeySerde: Serde[SampleKey], + sampleValueSerde: Serde[SampleValue] + ): KStreamS[SampleKey, SampleValue] = { + + val countStoreName = SamplingUtils.getNumCountsStoreName(sampleName) + kafkaStreamsBuilder.addStateStore( + Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(countStoreName), + sampleKeySerde, + ScalaSerdes.Long + ) + .withLoggingEnabled(DefaultTopicConfig.FinatraChangelogConfig) + ) + + val sampleStoreName = SamplingUtils.getSampleStoreName(sampleName) + kafkaStreamsBuilder.addStateStore( + Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(sampleStoreName), + new IndexedSampleKeySerde(sampleKeySerde), + sampleValueSerde + ) + .withLoggingEnabled(DefaultTopicConfig.FinatraChangelogConfig) + ) + + val timerStoreName = SamplingUtils.getTimerStoreName(sampleName) + kafkaStreamsBuilder.addStateStore( + FinatraTransformer.timerStore(timerStoreName, sampleKeySerde) + ) + + val transformer = () => + new ReservoirSamplingTransformer[K, V, SampleKey, SampleValue]( + statsReceiver = streamsStatsReceiver, + toSampleKey = toSampleKey, + toSampleValue = toSampleValue, + sampleSize = sampleSize, + expirationTime = expirationTime, + countStoreName = countStoreName, + sampleStoreName = sampleStoreName, + timerStoreName = timerStoreName + ) + + inner.transform[SampleKey, SampleValue]( + transformer, + countStoreName, + sampleStoreName, + timerStoreName) + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/dsl/FinatraDslWindowedAggregations.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/dsl/FinatraDslWindowedAggregations.scala new file mode 100644 index 0000000000..7c8f9d4887 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/dsl/FinatraDslWindowedAggregations.scala @@ -0,0 +1,301 @@ +package com.twitter.finatra.kafkastreams.dsl + +import com.twitter.app.Flag +import com.twitter.conversions.storage._ +import com.twitter.conversions.time._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.internal.ScalaStreamsImplicits +import com.twitter.finatra.kafkastreams.processors.FlushingAwareServer +import com.twitter.finatra.streams.flags.FinatraTransformerFlags +import com.twitter.finatra.streams.transformer.domain._ +import com.twitter.finatra.streams.transformer.{AggregatorTransformer, FinatraTransformer} +import com.twitter.inject.Logging +import com.twitter.util.Duration +import org.apache.kafka.common.config.TopicConfig.{ + CLEANUP_POLICY_COMPACT, + CLEANUP_POLICY_CONFIG, + CLEANUP_POLICY_DELETE, + DELETE_RETENTION_MS_CONFIG, + RETENTION_MS_CONFIG, + SEGMENT_BYTES_CONFIG +} +import 
org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.scala.kstream.{KStream => KStreamS} +import org.apache.kafka.streams.state.Stores +import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder} +import scala.collection.JavaConverters._ +import scala.reflect.ClassTag + +/** + * This trait adds Enhanced Windowed Aggregation DSL methods which offer additional control that + * is not included in the default Kafka Streams DSL + * + * Note: We extend FlushingAwareServer, because WindowStore flags are used by the AggregatorTransformer + * which requires us to be a "Flushing Aware" server. We plan to improve this coupling in the future. + */ +trait FinatraDslWindowedAggregations + extends FlushingAwareServer + with FinatraTransformerFlags + with ScalaStreamsImplicits + with Logging { + + protected val windowSize = flag("window.size", 1.hour, "Window size") + protected val emitOnClose = flag("emit.on.close", false, "Emit records on window close") + protected val emitUpdatedEntriesOnCommit = + flag("emit.updated.entries.on.commit", false, "Emit updated entries on commit interval") + + protected val queryableAfterClose = flag( + "queryable.after.close", + 0.minutes, + "Time for window entries to remain queryable after the window closes") + + protected val allowedLateness = flag("allowed.lateness", 5.minutes, "Allowed lateness") + + protected def kafkaStreams: KafkaStreams + + protected def streamsStatsReceiver: StatsReceiver + + protected def kafkaStreamsBuilder: StreamsBuilder + + protected def commitInterval: Flag[Duration] + + implicit class FinatraKeyValueStream[K: ClassTag, V](inner: KStreamS[K, V]) { + + /** + * For each unique key, aggregate the values in the stream that occurred within a given time window + * and store those values in a StateStore named stateStore. + * + * *Note* This method will create the state stores for you. + * + * A TimeWindow is a tumbling window of fixed length defined by the windowSize parameter. + * + * A Window is closed after event time passes the end of a TimeWindow + allowedLateness. + * + * After a window is closed, if emitOnClose=true it is forwarded out of this transformer with a + * [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.WindowClosed]] + * + * If a record arrives after a window is closed it is immediately forwarded out of this + * transformer with a [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.Restatement]] + * + * @param stateStore the name of the StateStore used to maintain the counts. + * @param windowSize splits the stream of data into buckets of data of windowSize, + * based on the timestamp of each message. + * @param allowedLateness allow messages that are upto this amount late to be added to the + * store, otherwise they are emitted as restatements. + * @param queryableAfterClose allow state to be queried upto this amount after the window is closed. + * @param keySerde Serde for the keys in the StateStore. + * @param aggregateSerde Serde for the aggregation type + * @param initializer Initializer function that computes an initial intermediate aggregation result + * @param aggregator Aggregator function that computes a new aggregate result + * @param windowStart Function to determine the window start time given the message time, key, + * and value. If not set, the default window start is calculated using the + * message time and the window size. + * @param emitOnClose Emit messages for each entry in the window when the window close. 
Emitted + * entries will have a WindowResultType set to WindowClosed. + * @param emitUpdatedEntriesOnCommit Emit messages for each updated entry in the window on the Kafka + * Streams commit interval. Emitted entries will have a + * WindowResultType set to WindowOpen. + * @param windowSizeRetentionMultiplier A multiplier on top of the windowSize to ensure data is + * not deleted from the changelog prematurely. Allows for clock drift. Default is 2 + * + * @return a stream of Keys for a particular timewindow, and the aggregation of the values for that key + * within a particular timewindow. + */ + def aggregate[Aggregate]( + stateStore: String, + windowSize: Duration, + allowedLateness: Duration, + queryableAfterClose: Duration, + keySerde: Serde[K], + aggregateSerde: Serde[Aggregate], + initializer: () => Aggregate, + aggregator: ((K, V), Aggregate) => Aggregate, + windowStart: (Time, K, V) => Long = null, + emitOnClose: Boolean = true, + emitUpdatedEntriesOnCommit: Boolean = false, + windowSizeRetentionMultiplier: Int = 2 + ): KStreamS[TimeWindowed[K], WindowedValue[Aggregate]] = { + val windowedKeySerde = FixedTimeWindowedSerde(inner = keySerde, duration = windowSize) + + val aggregateStore = Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(stateStore), + windowedKeySerde, + aggregateSerde) + .withLoggingEnabled(Map( + CLEANUP_POLICY_CONFIG -> (CLEANUP_POLICY_COMPACT + ", " + CLEANUP_POLICY_DELETE), + SEGMENT_BYTES_CONFIG -> 100.megabytes.inBytes.toString, + RETENTION_MS_CONFIG -> (windowSizeRetentionMultiplier * windowSize.inMillis).toString, + //configure delete retention such that standby replicas have 5 minutes to read deletes + DELETE_RETENTION_MS_CONFIG -> 5.minutes.inMillis.toString + ).asJava) + + val timerStore = FinatraTransformer.timerStore( + name = s"$stateStore-TimerStore", + timerKeySerde = ScalaSerdes.Long) + + debug(s"Add $aggregateStore") + kafkaStreamsBuilder.addStateStore(aggregateStore) + + debug(s"Add $timerStore") + kafkaStreamsBuilder.addStateStore(timerStore) + + val transformerSupplier = () => + new AggregatorTransformer[K, V, Aggregate]( + commitInterval = commitInterval(), + statsReceiver = streamsStatsReceiver, + stateStoreName = stateStore, + timerStoreName = timerStore.name, + windowSize = windowSize, + allowedLateness = allowedLateness, + queryableAfterClose = queryableAfterClose, + initializer = initializer, + aggregator = aggregator, + customWindowStart = windowStart, + emitOnClose = emitOnClose, + emitUpdatedEntriesOnCommit = emitUpdatedEntriesOnCommit) + + inner.transform(transformerSupplier, stateStore, timerStore.name) + } + } + + /* ---------------------------------------- */ + implicit class FinatraKStream[K: ClassTag](inner: KStreamS[K, Int]) extends Logging { + + /** + * For each unique key, sum the values in the stream that occurred within a given time window + * and store those values in a StateStore named stateStore. + * + * *Note* This method will create the state stores for you. + * + * A TimeWindow is a tumbling window of fixed length defined by the windowSize parameter. + * + * A Window is closed after event time passes the end of a TimeWindow + allowedLateness. 
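+     * For example, with a windowSize of 1.hour and an allowedLateness of 5.minutes, the window
+     * covering [09:00, 10:00) is closed once event time passes 10:05.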
+ * + * After a window is closed, if emitOnClose=true it is forwarded out of this transformer with a + * [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.WindowClosed]] + * + * If a record arrives after a window is closed it is immediately forwarded out of this + * transformer with a [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.Restatement]] + * + * @param stateStore the name of the StateStore used to maintain the counts. + * @param windowSize splits the stream of data into buckets of data of windowSize, + * based on the timestamp of each message. + * @param allowedLateness allow messages that are upto this amount late to be added to the + * store, otherwise they are emitted as restatements. + * @param queryableAfterClose allow state to be queried upto this amount after the window is closed. + * @param keySerde Serde for the keys in the StateStore. + * @param emitOnClose Emit messages for each entry in the window when the window close. Emitted + * entries will have a WindowResultType set to WindowClosed. + * @param emitUpdatedEntriesOnCommit Emit messages for each updated entry in the window on the Kafka + * Streams commit interval. Emitted entries will have a + * WindowResultType set to WindowOpen. + * @param windowSizeRetentionMultiplier A multiplier on top of the windowSize to ensure data is not deleted from the changelog prematurely. Allows for clock drift. Default is 2 + * @return a stream of Keys for a particular timewindow, and the sum of the values for that key + * within a particular timewindow. + */ + def sum( + stateStore: String, + windowSize: Duration, + allowedLateness: Duration, + queryableAfterClose: Duration, + keySerde: Serde[K], + emitUpdatedEntriesOnCommit: Boolean = false, + emitOnClose: Boolean = true, + windowSizeRetentionMultiplier: Int = 2 + ): KStreamS[TimeWindowed[K], WindowedValue[Int]] = { + inner.aggregate( + stateStore = stateStore, + windowSize = windowSize, + allowedLateness = allowedLateness, + queryableAfterClose = queryableAfterClose, + keySerde = keySerde, + aggregateSerde = ScalaSerdes.Int, + initializer = () => 0, + aggregator = { + case ((key: K, incrCount: Int), aggregateCount: Int) => + aggregateCount + incrCount + }, + emitOnClose = emitOnClose, + emitUpdatedEntriesOnCommit = emitUpdatedEntriesOnCommit, + windowSizeRetentionMultiplier = windowSizeRetentionMultiplier + ) + } + } + + /* ---------------------------------------- */ + implicit class FinatraKeyToWindowedValueStream[ + K: ClassTag, + TimeWindowedType <: TimeWindowed[Int] + ](inner: KStreamS[K, TimeWindowedType]) + extends Logging { + + /** + * For each unique key, sum the TimeWindowed values in the stream that occurred within the + * TimeWindowed window and store those values in a StateStore named stateStore. + * + * *Note* This method will create the state stores for you. + * + * A TimeWindow is a tumbling window of fixed length defined by the windowSize parameter. + * + * A Window is closed after event time passes the end of a TimeWindow + allowedLateness. 
+ * + * After a window is closed, if emitOnClose=true it is forwarded out of this transformer with a + * [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.WindowClosed]] + * + * If a record arrives after a window is closed it is immediately forwarded out of this + * transformer with a [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.Restatement]] + * + * @param stateStore the name of the StateStore used to maintain the counts. + * @param windowSize splits the stream of data into buckets of data of windowSize, + * based on the timestamp of each message. + * @param allowedLateness allow messages that are upto this amount late to be added to the + * store, otherwise they are emitted as restatements. + * @param queryableAfterClose allow state to be queried upto this amount after the window is closed. + * @param keySerde Serde for the keys in the StateStore. + * @param emitOnClose Emit messages for each entry in the window when the window close. Emitted + * entries will have a WindowResultType set to WindowClosed. + * @param emitUpdatedEntriesOnCommit Emit messages for each updated entry in the window on the Kafka + * Streams commit interval. Emitted entries will have a + * WindowResultType set to WindowOpen. + * @param windowSizeRetentionMultiplier A multiplier on top of the windowSize to ensure data is not deleted from the changelog prematurely. Allows for clock drift. Default is 2 + * @return a stream of Keys for a particular timewindow, and the sum of the values for that key + * within a particular timewindow. + */ + def sum( + stateStore: String, + allowedLateness: Duration, + queryableAfterClose: Duration, + emitOnClose: Boolean, + windowSize: Duration, + keySerde: Serde[K], + emitUpdatedEntriesOnCommit: Boolean = false, + windowSizeRetentionMultiplier: Int = 2 + ): KStreamS[TimeWindowed[K], WindowedValue[Int]] = { + val windowSizeMillis = windowSize.inMillis + + FinatraKeyValueStream(inner).aggregate( + stateStore = stateStore, + windowSize = windowSize, + allowedLateness = allowedLateness, + queryableAfterClose = queryableAfterClose, + keySerde = keySerde, + aggregateSerde = ScalaSerdes.Int, + initializer = () => 0, + aggregator = { + case ((key: K, windowedIncrCount: TimeWindowed[Int]), aggregateCount: Int) => + aggregateCount + windowedIncrCount.value + }, + windowStart = { + case (time, key, timeWindowedCount) => + assert(timeWindowedCount.sizeMillis == windowSizeMillis) + timeWindowedCount.startMs + }, + emitOnClose = emitOnClose, + emitUpdatedEntriesOnCommit = emitUpdatedEntriesOnCommit, + windowSizeRetentionMultiplier = windowSizeRetentionMultiplier + ) + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/ScalaStreamsImplicits.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/ScalaStreamsImplicits.scala new file mode 100644 index 0000000000..f0d53568dc --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/ScalaStreamsImplicits.scala @@ -0,0 +1,95 @@ +package com.twitter.finatra.kafkastreams.internal + +import org.apache.kafka.streams.kstream.{Transformer, TransformerSupplier, KStream => KStreamJ} +import org.apache.kafka.streams.processor.ProcessorContext +import org.apache.kafka.streams.scala.kstream.{KStream => KStreamS} +import org.apache.kafka.streams.scala.{StreamsBuilder => StreamsBuilderS} +import org.apache.kafka.streams.{KeyValue, StreamsBuilder => 
StreamsBuilderJ} +import scala.language.implicitConversions + +/** + * Implicit conversions to enhance the Scala Kafka Streams DSL + */ +trait ScalaStreamsImplicits { + + /* ---------------------------------------- */ + implicit class StreamsBuilderConversions(streamsBuilder: StreamsBuilderJ) { + + def asScala: StreamsBuilderS = { + new StreamsBuilderS(streamsBuilder) + } + } + + /* ---------------------------------------- */ + implicit class KStreamJConversions[K, V](kStream: KStreamJ[K, V]) { + + def asScala: KStreamS[K, V] = { + new KStreamS(kStream) + } + } + + /* ---------------------------------------- */ + // Helper until we move to Scala 2.12 which will use SAM conversion to implement the TransformerSupplier interface + implicit def transformerFunctionToSupplier[K, V, K1, V1]( + transformerFactory: () => Transformer[K, V, (K1, V1)] + ): TransformerSupplier[K, V, KeyValue[K1, V1]] = { + new TransformerSupplier[K, V, KeyValue[K1, V1]] { + override def get(): Transformer[K, V, KeyValue[K1, V1]] = { + new Transformer[K, V, KeyValue[K1, V1]] { + private val transformer = transformerFactory() + + override def init(context: ProcessorContext): Unit = { + transformer.init(context) + } + + override def transform(key: K, value: V): KeyValue[K1, V1] = { + transformer.transform(key, value) match { + case (k1, v1) => KeyValue.pair(k1, v1) + case _ => null + } + } + + override def close(): Unit = { + transformer.close() + } + } + } + } + } + + /* ---------------------------------------- */ + implicit class KStreamSConversions[K, V](inner: KStreamS[K, V]) { + + // Helper until we move to Scala 2.12 which will use SAM conversion to implement the TransformerSupplier interface + def transformS[K1, V1]( + transformerFactory: () => Transformer[K, V, (K1, V1)], + stateStoreNames: String* + ): KStreamS[K1, V1] = { + val transformerSupplierJ: TransformerSupplier[K, V, KeyValue[K1, V1]] = + new TransformerSupplier[K, V, KeyValue[K1, V1]] { + override def get(): Transformer[K, V, KeyValue[K1, V1]] = { + val transformer = transformerFactory() + new Transformer[K, V, KeyValue[K1, V1]] { + override def transform(key: K, value: V): KeyValue[K1, V1] = { + transformer.transform(key, value) match { + case (k1, v1) => KeyValue.pair(k1, v1) + case _ => null + } + } + + override def init(context: ProcessorContext): Unit = transformer.init(context) + + override def close(): Unit = transformer.close() + } + } + } + new KStreamS(inner.inner.transform(transformerSupplierJ, stateStoreNames: _*)) + } + + def filterValues(f: V => Boolean): KStreamS[K, V] = { + inner.filter { (key: K, value: V) => + f(value) + } + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/listeners/FinatraStateRestoreListener.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/listeners/FinatraStateRestoreListener.scala new file mode 100644 index 0000000000..b52d2a2308 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/listeners/FinatraStateRestoreListener.scala @@ -0,0 +1,47 @@ +package com.twitter.finatra.kafkastreams.internal.listeners + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.util.logging.Logging +import org.apache.kafka.common.TopicPartition +import org.apache.kafka.streams.processor.StateRestoreListener + +class FinatraStateRestoreListener( + statsReceiver: StatsReceiver) //TODO: Add stats for restoration (e.g. 
total time) + extends StateRestoreListener + with Logging { + + override def onRestoreStart( + topicPartition: TopicPartition, + storeName: String, + startingOffset: Long, + endingOffset: Long + ): Unit = { + val upToRecords = endingOffset - startingOffset + info( + s"${storeAndPartition(storeName, topicPartition)} start restoring up to $upToRecords records from $startingOffset to $endingOffset" + ) + } + + override def onBatchRestored( + topicPartition: TopicPartition, + storeName: String, + batchEndOffset: Long, + numRestored: Long + ): Unit = { + trace(s"Restored $numRestored records for ${storeAndPartition(storeName, topicPartition)}") + } + + override def onRestoreEnd( + topicPartition: TopicPartition, + storeName: String, + totalRestored: Long + ): Unit = { + info( + s"${storeAndPartition(storeName, topicPartition)} finished restoring $totalRestored records" + ) + } + + private def storeAndPartition(storeName: String, topicPartition: TopicPartition) = { + s"$storeName topic ${topicPartition.topic}_${topicPartition.partition}" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/serde/AvoidDefaultSerde.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/serde/AvoidDefaultSerde.scala new file mode 100644 index 0000000000..dfdef92963 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/serde/AvoidDefaultSerde.scala @@ -0,0 +1,22 @@ +package com.twitter.finatra.kafkastreams.internal.serde + +import java.util +import org.apache.kafka.common.serialization.{Deserializer, Serde, Serializer} + +class AvoidDefaultSerde extends Serde[Object] { + + private val exceptionErrorStr = "should be avoided as they are error prone and often result in confusing error messages. " + + "Instead, explicitly specify your serdes. 
See https://kafka.apache.org/10/documentation/streams/developer-guide/datatypes.html#overriding-default-serdes" + + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def deserializer(): Deserializer[Object] = { + throw new Exception(s"Default Deserializer's $exceptionErrorStr") + } + + override def serializer(): Serializer[Object] = { + throw new Exception(s"Default Serializer's $exceptionErrorStr") + } + + override def close(): Unit = {} +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/stats/KafkaStreamsFinagleMetricsReporter.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/stats/KafkaStreamsFinagleMetricsReporter.scala new file mode 100644 index 0000000000..ca758ac3ad --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/stats/KafkaStreamsFinagleMetricsReporter.scala @@ -0,0 +1,216 @@ +package com.twitter.finatra.kafkastreams.internal.stats + +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter +import java.util +import org.apache.kafka.clients.CommonClientConfigs +import org.apache.kafka.common.MetricName +import org.apache.kafka.common.metrics.Sensor.RecordingLevel +import org.apache.kafka.common.metrics.KafkaMetric + +object KafkaStreamsFinagleMetricsReporter { + + /** + * These metrics are derived from all Sensors configured at the DEBUG RecordingLevel in Kafka Streams version 2.0.0 + */ + val debugMetrics = Set( + /** Defined in: [[org.apache.kafka.streams.processor.internals.ProcessorNode]] */ + "create-latency-avg", + "create-latency-max", + "create-rate", + "create-total", + "destroy-latency-avg", + "destroy-latency-max", + "destroy-rate", + "destroy-total", + "forward-latency-avg", + "forward-latency-max", + "forward-rate", + "forward-total", + "process-latency-avg", + "process-latency-max", + "process-rate", + "process-total", + "punctuate-latency-avg", + "punctuate-latency-max", + "punctuate-rate", + "punctuate-total", + /** Defined in: [[org.apache.kafka.streams.processor.internals.StreamTask]] */ + "commit-latency-avg", + "commit-latency-max", + "commit-rate", + "commit-total", + /** Defined in: + * [[org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore]] + * [[org.apache.kafka.streams.state.internals.MeteredSessionStore]] + * [[org.apache.kafka.streams.state.internals.MeteredWindowStore]] + */ + "all-latency-avg", + "all-latency-max", + "all-rate", + "all-total", + "delete-latency-avg", + "delete-latency-max", + "delete-rate", + "delete-total", + "flush-latency-avg", + "flush-latency-max", + "flush-rate", + "flush-total", + "get-latency-avg", + "get-latency-max", + "get-rate", + "get-total", + "put-latency-avg", + "put-latency-max", + "put-rate", + "put-total", + "put-all-latency-avg", + "put-all-latency-max", + "put-all-rate", + "put-all-total", + "put-if-absent-latency-avg", + "put-if-absent-latency-max", + "put-if-absent-rate", + "put-if-absent-total", + "range-latency-avg", + "range-latency-max", + "range-rate", + "range-total", + "restore-latency-avg", + "restore-latency-max", + "restore-rate", + "restore-total", + /** Defined in: [[org.apache.kafka.streams.state.internals.NamedCache]] */ + "hitRatio-avg", + "hitRatio-min", + "hitRatio-max" + ) + + private val rateMetricsToIgnore = Set( + "commit-rate", + "poll-rate", + "process-rate", + "punctuate-rate", + "skipped-records-rate", + "task-closed-rate", + "task-created-rate" + ) + + /** + * Disables 
"noisy", UUID-filled, GlobalTable metrics of the form: + * ''kafka/$applicationId_$UUID_{GlobalStreamThread, global_consumer}/...'' + * These metrics are derived from the GlobalStreamThread using the global consumer configuration found in Kafka Streams version 2.0.0 + */ + private val globalTableClientIdPatterns = Set("global-consumer", "GlobalStreamThread") +} + +/** + * Kafka-Streams specific MetricsReporter which adds some additional logic on top of the metrics + * reporter used for Kafka consumers and producers + */ +class KafkaStreamsFinagleMetricsReporter extends KafkaFinagleMetricsReporter { + + private var includeProcessorNodeId = false + private var includeGlobalTableMetrics = false + private var recordingLevel: RecordingLevel = RecordingLevel.INFO + + override def configure(configs: util.Map[String, _]): Unit = { + super.configure(configs) + + includeProcessorNodeId = + Option(configs.get("includeProcessorNodeId")).getOrElse("false").toString.toBoolean + includeGlobalTableMetrics = + Option(configs.get("includeGlobalTableMetrics")).getOrElse("false").toString.toBoolean + recordingLevel = RecordingLevel.forName( + Option(configs.get(CommonClientConfigs.METRICS_RECORDING_LEVEL_CONFIG)) + .getOrElse("INFO").toString) + } + + override protected def shouldIncludeMetric(metric: KafkaMetric): Boolean = { + val metricName = metric.metricName() + + if (isDebugMetric(metricName) && (recordingLevel != RecordingLevel.DEBUG)) { + false + } else if (KafkaStreamsFinagleMetricsReporter.rateMetricsToIgnore(metricName.name())) { // remove any metrics that are already "rated" as these not consistent with other stats: http://go/jira/DINS-2187 + false + } else if (isGlobalTableMetric(metricName)) { + includeGlobalTableMetrics + } else { + super.shouldIncludeMetric(metric) + } + } + + private val clientIdStreamThreadPattern = """.*StreamThread-(\d+)(.*)""".r + + override protected def parseComponent(clientId: String, group: String): String = { + val groupComponent = group match { + case "consumer-metrics" => "" + case "consumer-coordinator-metrics" => "" + case "producer-metrics" => "" + case "kafka-client-metrics" => "" + case "consumer-fetch-manager-metrics" => "fetch" + case "producer-topic-metrics" => "" + case "admin-client-metrics" => "" + case "stream-metrics" => "stream" + case "stream-task-metrics" => "stream" + case "stream-rocksdb-state-metrics" => "stream/rocksdb" + case "stream-rocksdb-window-metrics" => "stream/rocksdb_window" + case "stream-in-memory-state-metrics" => "stream/in-memory" + case "stream-record-cache-metrics" => "stream/record-cache" + case _ => + debug("Dropping Metric Component: " + group) + "" + } + + if (clientId == null || clientId.isEmpty) { + "" + } else if (clientIdStreamThreadPattern.findFirstIn(clientId).isDefined) { + val clientIdStreamThreadPattern(threadNumber, clientIdComponent) = clientId + if (clientIdComponent.isEmpty) { + "thread" + threadNumber + "/" + groupComponent + } else { + "thread" + threadNumber + "/" + clientIdComponent.stripPrefix("-") + } + } else { + clientId + } + } + + override protected def createFinagleMetricName( + metric: KafkaMetric, + metricName: String, + allTags: java.util.Map[String, String], + component: String, + nodeId: String, + topic: String + ): String = { + val taskId = Option(allTags.remove("task-id")).map("/" + _).getOrElse("") + val processorNodeId = + if (!includeProcessorNodeId) "" + else Option(allTags.remove("processor-node-id")).map("/" + _).getOrElse("") + val partition = parsePartitionTag(allTags) + val 
rocksDbStateId = Option(allTags.remove("rocksdb-state-id")).map("/" + _).getOrElse("") + val rocksDbWindowId = Option(allTags.remove("rocksdb-window-id")).map("/" + _).getOrElse("") + val inMemWindowId = Option(allTags.remove("in-mem-window-id")).map("/" + _).getOrElse("") + val inMemoryStateId = Option(allTags.remove("in-memory-state-id")).map("/" + _).getOrElse("") + val recordCacheId = Option(allTags.remove("record-cache-id")).map("/" + _).getOrElse("") + + val otherTagsStr = createOtherTagsStr(metric, allTags) + + component + taskId + rocksDbStateId + rocksDbWindowId + inMemWindowId + inMemoryStateId + + recordCacheId + topic + partition + otherTagsStr + nodeId + processorNodeId + "/" + metricName + } + + private def isDebugMetric(metricName: MetricName): Boolean = { + KafkaStreamsFinagleMetricsReporter.debugMetrics.contains(metricName.name) + } + + private def isGlobalTableMetric(metricName: MetricName): Boolean = { + val clientId = metricName.tags.get("client-id") + if (clientId != null) { + KafkaStreamsFinagleMetricsReporter.globalTableClientIdPatterns.exists(clientId.contains) + } else { + false + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/stats/RocksDBStatsCallback.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/stats/RocksDBStatsCallback.scala new file mode 100644 index 0000000000..311df2e538 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/stats/RocksDBStatsCallback.scala @@ -0,0 +1,163 @@ +package com.twitter.finatra.kafkastreams.internal.stats + +import com.twitter.finagle.stats.{Gauge, StatsReceiver} +import com.twitter.inject.Logging +import java.util.concurrent.atomic.AtomicLong +import org.rocksdb.{HistogramData, HistogramType, StatisticsCollectorCallback, TickerType} +import scala.collection.mutable.{Map => MutableMap} + +/** + * Implements the callback statistics collection and reporting for RocksDB Statistics. + * + * All ticker stats are cumulative since process start. + * Histograms measures distribution of a stat across all operations. + * + * Stats are scoped to "rocksdb/statistics". + * + * For more information see: + * https://github.com/facebook/rocksdb/wiki/Statistics + * https://github.com/facebook/rocksdb/blob/master/include/rocksdb/statistics.h + */ +class RocksDBStatsCallback(statsReceiver: StatsReceiver) + extends StatisticsCollectorCallback + with Logging { + + /** + * Root scope for statistics. + */ + private val statisticsStatsScope: StatsReceiver = statsReceiver.scope("rocksdb", "statistics") + + /** + * List of ignored ticker types that will not be included in exported stats. + */ + private val ignoredTickerTypes: Seq[TickerType] = Seq( + TickerType.STALL_L0_SLOWDOWN_MICROS, + TickerType.STALL_MEMTABLE_COMPACTION_MICROS, + TickerType.STALL_L0_NUM_FILES_MICROS, + TickerType.TICKER_ENUM_MAX + ) + + /** + * List of ignored histogram types that will not be included in exported stats. + */ + private val ignoredHistogramTypes: Seq[HistogramType] = Seq( + HistogramType.HISTOGRAM_ENUM_MAX + ) + + /** + * Allowed ticker types used for stats collection. + */ + private val allowedTickerTypes: Seq[TickerType] = TickerType.values.filter(isAllowedTickerType) + + /** + * Allowed histogram types used for stats collection. + */ + private val allowedHistogramTypes: Seq[HistogramType] = + HistogramType.values.filter(isAllowedHistogramType) + + /** + * Most recent counter values from TickerType to Value. 
+ */ + private val mostRecentCounterValues: Map[TickerType, AtomicLong] = allowedTickerTypes.map { + tickerType => + tickerType -> new AtomicLong(0) + }.toMap + + /** + * Ticker gauges. Note we need to hold a strong reference to prevent GC from collecting away the gauge. + */ + private val tickerGauges = allowedTickerTypes.map { tickerType => + statisticsStatsScope.addGauge(tickerTypeName(tickerType)) { + mostRecentCounterValues.get(tickerType).map(_.floatValue()).getOrElse(0f) + } + } + + /** + * Updatable histogram cache. + */ + private val histogramCache: MutableMap[String, Float] = MutableMap.empty[String, Float] + + /** + * Histogram suffixes + */ + private val histogramSuffixes = Seq("avg", "std", "p50", "p95", "p99") + + /** + * Histogram gauges. Note we need to hold a strong reference to prevent GC from collecting away the gauge. + */ + private val histogramGauges: Seq[Gauge] = allowedHistogramTypes.flatMap { histogramType => + val prefix = histogramTypeName(histogramType) + "_" + for { + suffix <- histogramSuffixes + } yield { + // NOTE: We use Gauge here because these metrics are coming out of RocksDB already as percentiles: + // we don't have the raw data. Hence we use Gauge which will avoid a percentile-of-percentile + // that we'll get if we used Metric. + val statName = prefix + suffix + statisticsStatsScope.addGauge(statName) { + histogramCache.getOrElse(statName, 0f) + } + } + } + + /** + * Callback that updates counters by ticker count for given ticker type + */ + override def tickerCallback(tickerType: TickerType, tickerCount: Long): Unit = { + if (isAllowedTickerType(tickerType)) { + mostRecentCounterValues.get(tickerType).foreach(_.set(tickerCount)) + } + } + + /** + * Callback that updates histogram cache for gauges + */ + override def histogramCallback( + histogramType: HistogramType, + histogramData: HistogramData + ): Unit = { + if (isAllowedHistogramType(histogramType)) { + val prefix = histogramTypeName(histogramType) + "_" + histogramCache.update(prefix + "avg", histogramData.getAverage.toFloat) + histogramCache.update(prefix + "std", histogramData.getStandardDeviation.toFloat) + histogramCache.update(prefix + "p50", histogramData.getMedian.toFloat) + histogramCache.update(prefix + "p95", histogramData.getPercentile95.toFloat) + histogramCache.update(prefix + "p99", histogramData.getPercentile99.toFloat) + } + } + + /** + * Accessor for this callback's ticker values. + */ + def tickerValues: Map[TickerType, Long] = { + mostRecentCounterValues.mapValues(_.get()) + } + + /** + * Simplified ticker name. + */ + private def tickerTypeName(tickerType: TickerType): String = { + tickerType.name().toLowerCase + } + + /** + * Simplified histogram name. + */ + private def histogramTypeName(histogramType: HistogramType): String = { + histogramType.name().toLowerCase + } + + /** + * Returns true if ticker type is allowed to be included in stats, false otherwise. + */ + private def isAllowedTickerType(tickerType: TickerType): Boolean = { + !ignoredTickerTypes.contains(tickerType) + } + + /** + * Returns true if histogram type is allowed to be included in stats, false otherwise. 
+ */ + private def isAllowedHistogramType(histogramType: HistogramType): Boolean = { + !ignoredHistogramTypes.contains(histogramType) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/FinatraDslV2Implicits.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/FinatraDslV2Implicits.scala new file mode 100644 index 0000000000..0bc4ea2550 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/FinatraDslV2Implicits.scala @@ -0,0 +1,254 @@ +package com.twitter.finatra.kafkastreams.internal.utils + +import com.twitter.app.Flag +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.internal.ScalaStreamsImplicits +import com.twitter.finatra.streams.config.DefaultTopicConfig +import com.twitter.finatra.streams.transformer.domain.{ + CompositeKey, + FixedTimeWindowedSerde, + TimeWindowed, + WindowedValue +} +import com.twitter.finatra.streams.transformer.{ + CompositeSumAggregator, + FinatraTransformer, + SumAggregator +} +import com.twitter.inject.Logging +import com.twitter.util.Duration +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.kstream.Transformer +import org.apache.kafka.streams.scala.kstream.{KStream => KStreamS} +import org.apache.kafka.streams.state.Stores +import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder} +import scala.reflect.ClassTag + +@deprecated("Use FinatraDslWindowedAggregations", "1/7/2019") +trait FinatraDslV2Implicits extends ScalaStreamsImplicits { + + protected def kafkaStreams: KafkaStreams + + protected def streamsStatsReceiver: StatsReceiver + + protected def kafkaStreamsBuilder: StreamsBuilder + + protected def commitInterval: Flag[Duration] + + /* ---------------------------------------- */ + implicit class FinatraKStream[K: ClassTag](inner: KStreamS[K, Int]) extends Logging { + + /** + * For each unique key, sum the values in the stream that occurred within a given time window + * and store those values in a StateStore named stateStore. + * + * A TimeWindow is a tumbling window of fixed length defined by the windowSize parameter. + * + * A Window is closed after event time passes the end of a TimeWindow + allowedLateness. + * + * After a window is closed it is forwarded out of this transformer with a + * [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.WindowClosed]] + * + * If a record arrives after a window is closed it is immediately forwarded out of this + * transformer with a [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.Restatement]] + * + * @param stateStore the name of the StateStore used to maintain the counts. + * @param windowSize splits the stream of data into buckets of data of windowSize, + * based on the timestamp of each message. + * @param allowedLateness allow messages that are up to this amount late to be added to the + * store, otherwise they are emitted as restatements. + * @param queryableAfterClose allow state to be queried up to this amount after the window is + * closed. + * @param keyRangeStart The minimum value that will be stored in the key based on binary sort order. + * @param keySerde Serde for the keys in the StateStore. + * @return a stream of keys for a particular time window, and the sum of the values for that key + * within a particular time window.
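+ *
+ * Example usage (an illustrative sketch only; the input stream, store name, key type, and
+ * serde shown here are hypothetical):
+ * {{{
+ *   val hourlyClicksPerUser: KStreamS[TimeWindowed[UserId], WindowedValue[Int]] =
+ *     clicksByUserId.sum(
+ *       stateStore = "hourly-clicks-per-user",
+ *       windowSize = 1.hour,
+ *       allowedLateness = 5.minutes,
+ *       queryableAfterClose = 1.hour,
+ *       keyRangeStart = UserId(0),
+ *       keySerde = UserId.serde)
+ * }}}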
+ */ + def sum( + stateStore: String, + windowSize: Duration, + allowedLateness: Duration, + queryableAfterClose: Duration, + keyRangeStart: K, + keySerde: Serde[K] + ): KStreamS[TimeWindowed[K], WindowedValue[Int]] = { + + kafkaStreamsBuilder.addStateStore( + Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(stateStore), + FixedTimeWindowedSerde(keySerde, windowSize), + ScalaSerdes.Int + ) + .withLoggingEnabled(DefaultTopicConfig.FinatraChangelogConfig) + ) + + //Note: The TimerKey is a WindowStartMs value used by MultiAttributeCountAggregator + val timerStore = FinatraTransformer.timerStore(s"$stateStore-TimerStore", ScalaSerdes.Long) + kafkaStreamsBuilder.addStateStore(timerStore) + + val transformerSupplier = () => + new SumAggregator[K, Int]( + commitInterval = commitInterval(), + keyRangeStart = keyRangeStart, + statsReceiver = streamsStatsReceiver, + stateStoreName = stateStore, + timerStoreName = timerStore.name(), + windowSize = windowSize, + allowedLateness = allowedLateness, + queryableAfterClose = queryableAfterClose, + countToAggregate = (key, count) => count, + windowStart = (messageTime, key, value) => + TimeWindowed.windowStart(messageTime, windowSize.inMillis) + ) + + inner.transform(transformerSupplier, stateStore, timerStore.name) + } + } + + /* ---------------------------------------- */ + implicit class FinatraKeyToWindowedValueStream[K, TimeWindowedType <: TimeWindowed[Int]]( + inner: KStreamS[K, TimeWindowedType]) + extends Logging { + + def sum( + stateStore: String, + allowedLateness: Duration, + queryableAfterClose: Duration, + emitOnClose: Boolean, + windowSize: Duration, + keyRangeStart: K, + keySerde: Serde[K] + ): KStreamS[TimeWindowed[K], WindowedValue[Int]] = { + kafkaStreamsBuilder.addStateStore( + Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(stateStore), + FixedTimeWindowedSerde(keySerde, windowSize), + ScalaSerdes.Int + ) + .withLoggingEnabled(DefaultTopicConfig.FinatraChangelogConfig) + ) + + //Note: The TimerKey is a WindowStartMs value used by MultiAttributeCountAggregator + val timerStore = FinatraTransformer.timerStore(s"$stateStore-TimerStore", ScalaSerdes.Long) + kafkaStreamsBuilder.addStateStore(timerStore) + + val transformerSupplier = ( + () => + new SumAggregator[K, TimeWindowed[Int]]( + commitInterval = commitInterval(), + keyRangeStart = keyRangeStart, + statsReceiver = streamsStatsReceiver, + stateStoreName = stateStore, + timerStoreName = timerStore.name(), + windowSize = windowSize, + allowedLateness = allowedLateness, + emitOnClose = emitOnClose, + queryableAfterClose = queryableAfterClose, + countToAggregate = (key, windowedValue) => windowedValue.value, + windowStart = (messageTime, key, windowedValue) => windowedValue.startMs + ) + ).asInstanceOf[() => Transformer[ + K, + TimeWindowedType, + (TimeWindowed[K], WindowedValue[Int])]] //Coerce TimeWindowed[Int] into TimeWindowedType :-/ + + inner + .transform(transformerSupplier, stateStore, timerStore.name) + } + } + + /* ---------------------------------------- */ + implicit class FinatraCompositeKeyKStream[CompositeKeyType <: CompositeKey[_, _]: ClassTag]( + inner: KStreamS[CompositeKeyType, Int]) + extends Logging { + + /** + * For each unique composite key, sum the values in the stream that occurred within a given time window. + * + * A composite key is a multi part key that can be efficiently range scanned using the + * primary key, or the primary key and the secondary key. 
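+ * For example (the types here are hypothetical), a CompositeKey[UserId, ClickType] can be range
+ * scanned by UserId alone, or by UserId together with ClickType.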
+ * + * A TimeWindow is a tumbling window of fixed length defined by the windowSize parameter. + * + * A Window is closed after event time passes the end of a TimeWindow + allowedLateness. + * + * After a window is closed it is forwarded out of this transformer with a + * [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.WindowClosed]] + * + * If a record arrives after a window is closed it is immediately forwarded out of this + * transformer with a [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.Restatement]] + * + * @param stateStore the name of the StateStore used to maintain the counts. + * @param windowSize splits the stream of data into buckets of data of windowSize, + * based on the timestamp of each message. + * @param allowedLateness allow messages that are up to this amount late to be added to the + * store, otherwise they are emitted as restatements. + * @param queryableAfterClose allow state to be queried up to this amount after the window is + * closed. + * @param emitOnClose whether or not to emit a record when the window is closed. + * @param compositeKeyRangeStart The minimum value that will be stored in the key based on binary sort order. + * @param compositeKeySerde serde for the composite key in the StateStore. + * @tparam PrimaryKey the type for the primary key + * @tparam SecondaryKey the type for the secondary key + * + * @return a stream of time windowed primary keys and, for each, the map of secondary keys to summed values + */ + def compositeSum[PrimaryKey, SecondaryKey]( + stateStore: String, + windowSize: Duration, + allowedLateness: Duration, + queryableAfterClose: Duration, + emitOnClose: Boolean, + compositeKeyRangeStart: CompositeKey[PrimaryKey, SecondaryKey], + compositeKeySerde: Serde[CompositeKeyType] + ): KStreamS[TimeWindowed[PrimaryKey], WindowedValue[scala.collection.Map[SecondaryKey, Int]]] = { + + kafkaStreamsBuilder.addStateStore( + Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(stateStore), + FixedTimeWindowedSerde(compositeKeySerde, windowSize), + ScalaSerdes.Int + ) + .withLoggingEnabled(DefaultTopicConfig.FinatraChangelogConfig) + ) + + //Note: The TimerKey is a WindowStartMs value used by MultiAttributeCountAggregator + val timerStore = FinatraTransformer.timerStore(s"$stateStore-TimerStore", ScalaSerdes.Long) + kafkaStreamsBuilder.addStateStore(timerStore) + + val transformerSupplier = ( + () => + new CompositeSumAggregator[ + PrimaryKey, + SecondaryKey, + CompositeKey[ + PrimaryKey, + SecondaryKey + ]]( + commitInterval = commitInterval(), + compositeKeyRangeStart = compositeKeyRangeStart, + statsReceiver = streamsStatsReceiver, + stateStoreName = stateStore, + timerStoreName = timerStore.name(), + windowSize = windowSize, + allowedLateness = allowedLateness, + queryableAfterClose = queryableAfterClose, + emitOnClose = emitOnClose + ) + ).asInstanceOf[() => Transformer[ + CompositeKeyType, + Int, + (TimeWindowed[PrimaryKey], WindowedValue[scala.collection.Map[SecondaryKey, Int]]) + ]] //Coerce CompositeKey[PrimaryKey, SecondaryKey] into CompositeKeyType :-/ + + inner + .transform(transformerSupplier, stateStore, timerStore.name) + } + + } + +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/KafkaFlagUtils.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/KafkaFlagUtils.scala new file mode 100644 index 0000000000..f018923881 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/KafkaFlagUtils.scala
@@ -0,0 +1,47 @@ +package com.twitter.finatra.kafkastreams.utils + +import com.twitter.app.{App, Flag, Flaggable} +import org.apache.kafka.streams.StreamsConfig + +trait KafkaFlagUtils extends App { + + def requiredKafkaFlag[T: Flaggable: Manifest](key: String, helpPrefix: String = ""): Flag[T] = { + flag[T](name = "kafka." + key, help = helpPrefix + kafkaDocumentation(key)) + } + + def flagWithKafkaDefault[T: Flaggable](key: String): Flag[T] = { + kafkaFlag[T](key, getKafkaDefault[T](key)) + } + + def kafkaFlag[T: Flaggable](key: String, default: => T): Flag[T] = { + kafkaFlag[T](key, default, kafkaDocumentation(key)) + } + + def kafkaFlag[T: Flaggable](key: String, default: => T, helpDoc: String): Flag[T] = { + flag( + name = "kafka." + key, + default = default, + help = helpDoc + ) + } + + def kafkaDocumentation(key: String): String = { + val configKey = StreamsConfig.configDef.configKeys.get(key) + if (configKey == null) { + throw new Exception("Kafka Config Key Not Found: " + key) + } + configKey.documentation + } + + def getKafkaDefault[T](key: String): T = { + val configKey = StreamsConfig.configDef().configKeys().get(key) + if (configKey == null) { + throw new Exception("Kafka Config Key Not Found: " + key) + } else if (!configKey.hasDefault) { + throw new Exception( + s"Kafka doesn't have a default value for ${key}, please provide a default value" + ) + } + configKey.defaultValue.asInstanceOf[T] + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/ProcessorContextLogging.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/ProcessorContextLogging.scala new file mode 100644 index 0000000000..b81032af9c --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/ProcessorContextLogging.scala @@ -0,0 +1,76 @@ +package com.twitter.finatra.kafkastreams.internal.utils + +import com.twitter.util.logging.Logger +import org.apache.kafka.streams.processor.ProcessorContext +import org.joda.time.DateTime +import org.joda.time.format.ISODateTimeFormat + +trait ProcessorContextLogging { + + private val _logger = Logger(getClass) + + @deprecated("Use error, warn, info, debug, or trace methods directly") + protected def logger: Logger = { + _logger + } + + protected def processorContext: ProcessorContext + + final protected[this] def error(message: => Any): Unit = { + if (_logger.isErrorEnabled) { + _logger.error(s"$taskIdStr$message") + } + } + + final protected[this] def info(message: => Any): Unit = { + if (_logger.isInfoEnabled) { + _logger.info(s"$taskIdStr$message") + } + } + + final protected[this] def warn(message: => Any): Unit = { + if (_logger.isWarnEnabled) { + _logger.warn(s"$taskIdStr$message") + } + } + + final protected[this] def debug(message: => Any): Unit = { + if (_logger.isDebugEnabled) { + _logger.debug(s"$taskIdStr$message") + } + } + + final protected[this] def trace(message: => Any): Unit = { + if (_logger.isTraceEnabled) { + _logger.trace(s"$taskIdStr$message") + } + } + + final protected def timeStr: String = { + val timestamp = processorContext.timestamp() + if (timestamp == Long.MaxValue) { + "@MaxTimestamp" + } else { + "@" + new DateTime(processorContext.timestamp()) + } + } + + final protected def taskIdStr: String = { + if (processorContext != null && processorContext.taskId != null) { + processorContext.taskId + "\t" + } else { + "" + } + } + + implicit class RichLong(long: Long) { + def iso8601Millis: String = 
{ + ISODateTimeFormat.dateTime.print(long) + } + + def iso8601: String = { + ISODateTimeFormat.dateTimeNoMillis.print(long) + } + } + +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/ReflectionUtils.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/ReflectionUtils.scala new file mode 100644 index 0000000000..8b8d04dcd9 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/ReflectionUtils.scala @@ -0,0 +1,36 @@ +package com.twitter.finatra.kafkastreams.internal.utils + +import java.lang.reflect.{Field, Modifier} + +object ReflectionUtils { + + def getField(clazz: Class[_], fieldName: String): Field = { + val field = clazz.getDeclaredField(fieldName) + field.setAccessible(true) + field + } + + def getField[T](anyRef: AnyRef, fieldName: String): T = { + val field = getField(anyRef.getClass, fieldName) + field.get(anyRef).asInstanceOf[T] + } + + def getFinalField(clazz: Class[_], fieldName: String): Field = { + val field = clazz.getDeclaredField(fieldName) + field.setAccessible(true) + removeFinal(field) + field + } + + def getFinalField[T](anyRef: AnyRef, fieldName: String): T = { + val field = getFinalField(anyRef.getClass, fieldName) + field.get(anyRef).asInstanceOf[T] + } + + def removeFinal(field: Field): Unit = { + val fieldModifiers = classOf[Field].getDeclaredField("modifiers") + fieldModifiers.setAccessible(true) + fieldModifiers.setInt(field, field.getModifiers & ~Modifier.FINAL) + } + +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/TopologyReflectionUtils.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/TopologyReflectionUtils.scala new file mode 100644 index 0000000000..f73337c204 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/TopologyReflectionUtils.scala @@ -0,0 +1,23 @@ +package com.twitter.finatra.kafkastreams.internal.utils + +import org.apache.kafka.streams.Topology +import org.apache.kafka.streams.processor.internals.InternalTopologyBuilder + +object TopologyReflectionUtils { + + private val internalTopologyBuilderField = + ReflectionUtils.getFinalField(classOf[Topology], "internalTopologyBuilder") + + def isStateless(topology: Topology): Boolean = { + val internalTopologyBuilder = getInternalTopologyBuilder(topology) + + internalTopologyBuilder.allStateStoreName().isEmpty && + internalTopologyBuilder.globalStateStores().isEmpty + } + + private def getInternalTopologyBuilder(topology: Topology): InternalTopologyBuilder = { + internalTopologyBuilderField + .get(topology) + .asInstanceOf[InternalTopologyBuilder] + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/sampling/IndexedSampleKeySerde.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/sampling/IndexedSampleKeySerde.scala new file mode 100644 index 0000000000..7d3e16bd72 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/sampling/IndexedSampleKeySerde.scala @@ -0,0 +1,49 @@ +package com.twitter.finatra.kafkastreams.internal.utils.sampling + +import com.google.common.primitives.Ints +import com.twitter.finatra.kafka.serde.AbstractSerde +import com.twitter.finatra.streams.transformer.domain.IndexedSampleKey +import 
java.nio.ByteBuffer +import org.apache.kafka.common.serialization.Serde + +object IndexedSampleKeySerde { + + /** + * Indexed sample key adds one Integer to the bytes + */ + val IndexSize: Int = Ints.BYTES +} + +class IndexedSampleKeySerde[SampleKey](sampleKeySerde: Serde[SampleKey]) + extends AbstractSerde[IndexedSampleKey[SampleKey]] { + + private val sampleKeySerializer = sampleKeySerde.serializer() + private val sampleKeyDeserializer = sampleKeySerde.deserializer() + + override def deserialize(bytes: Array[Byte]): IndexedSampleKey[SampleKey] = { + val bb = ByteBuffer.wrap(bytes) + + val sampleKeyBytesLength = bytes.length - IndexedSampleKeySerde.IndexSize + val sampleKeyBytes = new Array[Byte](sampleKeyBytesLength) + bb.get(sampleKeyBytes) + val sampleKey = sampleKeyDeserializer.deserialize(topic, sampleKeyBytes) + + val index = bb.getInt() + + IndexedSampleKey(sampleKey, index) + } + + override def serialize(indexedSampleKey: IndexedSampleKey[SampleKey]): Array[Byte] = { + val sampleKeyBytes = sampleKeySerializer.serialize(topic, indexedSampleKey.sampleKey) + val sampleKeyBytesLength = sampleKeyBytes.length + + val indexedSampleKeyBytes = + new Array[Byte](sampleKeyBytesLength + IndexedSampleKeySerde.IndexSize) + val bb = ByteBuffer.wrap(indexedSampleKeyBytes) + bb.put(sampleKeyBytes) + + bb.putInt(indexedSampleKey.index) + + indexedSampleKeyBytes + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/sampling/ReservoirSamplingTransformer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/sampling/ReservoirSamplingTransformer.scala new file mode 100644 index 0000000000..1b309590fa --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/internal/utils/sampling/ReservoirSamplingTransformer.scala @@ -0,0 +1,95 @@ +package com.twitter.finatra.kafkastreams.internal.utils.sampling + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.streams.transformer.domain.{ + Expire, + IndexedSampleKey, + Time, + TimerMetadata +} +import com.twitter.finatra.streams.transformer.{FinatraTransformerV2, PersistentTimers} +import com.twitter.util.Duration +import org.apache.kafka.streams.processor.PunctuationType +import scala.reflect.ClassTag +import scala.util.Random + +/** + * A Reservoir sampling transformer + * + * See "Random Sampling with a Reservoir", Vitter, 1985 + */ +class ReservoirSamplingTransformer[ + Key: ClassTag, + Value, + SampleKey: ClassTag, + SampleValue: ClassTag +](statsReceiver: StatsReceiver, + toSampleKey: (Key, Value) => SampleKey, + toSampleValue: (Key, Value) => SampleValue, + sampleSize: Int, + expirationTime: Option[Duration], + countStoreName: String, + sampleStoreName: String, + timerStoreName: String) + extends FinatraTransformerV2[Key, Value, SampleKey, SampleValue](statsReceiver = statsReceiver) + with PersistentTimers { + + private val numExpiredCounter = statsReceiver.counter("numExpired") + private val random = new Random() + + private val countStore = getKeyValueStore[SampleKey, Long](countStoreName) + private val sampleStore = + getKeyValueStore[IndexedSampleKey[SampleKey], SampleValue](sampleStoreName) + private val timerStore = + getPersistentTimerStore[SampleKey](timerStoreName, onEventTimer, PunctuationType.STREAM_TIME) + + override protected[finatra] def onMessage(messageTime: Time, key: Key, value: Value): Unit = { + val sampleKey = toSampleKey(key, value) + val totalCount = 
countStore.getOrDefault(sampleKey, 0) + + for (eTime <- expirationTime) { + if (isFirstTimeSampleKeySeen(totalCount)) { + timerStore.addTimer(messageTime.plus(eTime), Expire, sampleKey) + } + } + + sample(sampleKey, toSampleValue(key, value), totalCount) + countStore.put(sampleKey, totalCount + 1) + } + + /* Private */ + + private def onEventTimer(time: Time, metadata: TimerMetadata, key: SampleKey): Unit = { + assert(metadata == Expire) + countStore.delete(key) + + sampleStore + .deleteRange( + IndexedSampleKey.rangeStart(key), + IndexedSampleKey.rangeEnd(key), + maxDeletes = sampleSize + ) + + numExpiredCounter.incr() + } + + private def isFirstTimeSampleKeySeen(count: Long): Boolean = { + // an empty value will be returned as 0 + count == 0 + } + + private def getNextSampleIndex(count: Int) = { + if (count < sampleSize) { + count + } else { + random.nextInt(count) + } + } + + private def sample(sampleKey: SampleKey, value: SampleValue, count: Long): Unit = { + val sampleIndex = getNextSampleIndex(count.toInt) + if (sampleIndex < sampleSize) { + sampleStore.put(IndexedSampleKey(sampleKey, sampleIndex), value) + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/partitioners/RoundRobinStreamPartitioner.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/partitioners/RoundRobinStreamPartitioner.scala new file mode 100644 index 0000000000..8d0243b0cb --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/partitioners/RoundRobinStreamPartitioner.scala @@ -0,0 +1,21 @@ +package com.twitter.finatra.kafkastreams.partitioners + +import org.apache.kafka.streams.processor.StreamPartitioner + +/** + * Partitions in a round robin fashion going from 0 to numPartitions -1 and wrapping around again. 
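+ *
+ * A minimal sketch of the behavior (repeated calls cycle through the partitions):
+ * {{{
+ *   val partitioner = new RoundRobinStreamPartitioner[String, String]
+ *   partitioner.partition("topic", "key", "value", numPartitions = 3) // returns 0
+ *   partitioner.partition("topic", "key", "value", numPartitions = 3) // returns 1
+ *   partitioner.partition("topic", "key", "value", numPartitions = 3) // returns 2
+ *   partitioner.partition("topic", "key", "value", numPartitions = 3) // returns 0 again
+ * }}}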
+ * + * @tparam K the key on the stream + * @tparam V the value on the stream + */ +@deprecated("no longer supported", "1/7/2019") +class RoundRobinStreamPartitioner[K, V] extends StreamPartitioner[K, V] { + + private var nextPartitionId: Int = 0 + + override def partition(topic: String, key: K, value: V, numPartitions: Int): Integer = { + val partitionIdToReturn = nextPartitionId + nextPartitionId = (nextPartitionId + 1) % numPartitions + partitionIdToReturn + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/AsyncProcessor.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/AsyncProcessor.scala new file mode 100644 index 0000000000..b70bd549ee --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/AsyncProcessor.scala @@ -0,0 +1,23 @@ +package com.twitter.finatra.kafkastreams.processors + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafkastreams.processors.internal.AsyncFlushing +import com.twitter.util.{Duration, Future} + +abstract class AsyncProcessor[K, V]( + override val statsReceiver: StatsReceiver, + override val maxOutstandingFuturesPerTask: Int, + override val commitInterval: Duration, + override val flushTimeout: Duration) + extends FlushingProcessor[K, V] + with AsyncFlushing[K, V, Unit, Unit] { + + protected def processAsync(key: K, value: V, timestamp: MessageTimestamp): Future[Unit] + + override final def process(key: K, value: V): Unit = { + val processAsyncResult = + processAsync(key = key, value = value, timestamp = processorContext.timestamp()) + + addFuture(key = key, value = value, future = processAsyncResult.map(_ => Iterable())) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/AsyncTransformer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/AsyncTransformer.scala new file mode 100644 index 0000000000..b6a5d5bb6d --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/AsyncTransformer.scala @@ -0,0 +1,141 @@ +package com.twitter.finatra.kafkastreams.processors + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafkastreams.internal.utils.ProcessorContextLogging +import com.twitter.finatra.kafkastreams.processors.internal.AsyncFlushing +import com.twitter.util.{Duration, Future} +import java.util.concurrent.ConcurrentHashMap +import org.apache.kafka.streams.processor.{ + Cancellable, + ProcessorContext, + PunctuationType, + Punctuator, + To +} + +/** + * The AsyncTransformer class allows async futures to be used to emit records downstream + * + * See https://issues.apache.org/jira/browse/KAFKA-6989 for related ticket in Kafka Streams backlog + * See also: + * https://stackoverflow.com/questions/42049047/how-to-handle-error-and-dont-commit-when-use-kafka-streams-dsl/42056286#comment92197161_42056286 + * https://stackoverflow.com/questions/42064430/external-system-queries-during-kafka-stream-processing?noredirect=1&lq=1 + * https://issues.apache.org/jira/browse/KAFKA-7432 + * + * Note: Completed futures add output records to an outstandingResults set. Futures do not directly + * call context.forward on success since the Processor/Transformer classes have a defined lifecycle + * which revolves around 2 main processing methods (process/punctuate or transform/punctuate).
+ * Kafka Streams then ensures that these methods are never called from 2 threads at the same time. + * Kafka Streams assumes "forward" would only ever be called from the thread that calls process/transform/punctuate. + * As such, it could be dangerous to have a Finagle thread calling forward at any time + * + * Note 2: throwIfAsyncFailure is used to fail the Kafka Streams service more quickly than waiting + * for an eventual failure to occur at the next commit interval. We try to fail fast and a future failure + * will result in your entire instance shutting down. This default behavior prevents data loss. If + * you want your service to handle failed futures please use handle/transform on your returned future + */ +abstract class AsyncTransformer[K1, V1, K2, V2]( + override val statsReceiver: StatsReceiver, + override val maxOutstandingFuturesPerTask: Int, + flushAsyncRecordsInterval: Duration, + override val commitInterval: Duration, + override val flushTimeout: Duration) + extends FlushingTransformer[K1, V1, K2, V2] + with AsyncFlushing[K1, V1, K2, V2] + with ProcessorContextLogging { + + @volatile private var flushOutputRecordsCancellable: Cancellable = _ + private val outstandingResults = ConcurrentHashMap + .newKeySet[(K2, V2, MessageTimestamp)](maxOutstandingFuturesPerTask) + + private var _context: ProcessorContext = _ + + override protected def processorContext: ProcessorContext = _context + + /* Abstract */ + + /** + * Asynchronously transform the record with the given key and value. + * + * Additionally, any {@link StateStore state} that is {@link KStream#transform(TransformerSupplier, String...) + * attached} to this operator can be accessed and modified arbitrarily (cf. {@link ProcessorContext#getStateStore(String)}). + * + * @param key the key for the record + * @param value the value for the record + * @param timestamp the timestamp for the record + * + * @return Future iterable of output messages each containing a key, value, and message timestamp + */ + protected def transformAsync( + key: K1, + value: V1, + timestamp: MessageTimestamp + ): Future[Iterable[(K2, V2, Long)]] + + /* Overrides */ + + final override def init(context: ProcessorContext): Unit = { + _context = context + + flushOutputRecordsCancellable = context + .schedule( + flushAsyncRecordsInterval.inMillis, + PunctuationType.WALL_CLOCK_TIME, + new Punctuator { + override def punctuate(timestamp: Long): Unit = { + flushOutputRecords() + } + } + ) + + super.onInit() + } + + override final def transform(key: K1, value: V1): (K2, V2) = { + addFuture(key, value, transformAsync(key, value, _context.timestamp())) + + null + } + + override protected def onFutureSuccess( + key: K1, + value: V1, + result: Iterable[(K2, V2, MessageTimestamp)] + ): Unit = { + for ((key, value, timestamp) <- result) { + outstandingResults.add((key, value, timestamp)) + } + } + + override def onFlush(): Unit = { + //First call super.onFlush so that we wait for outstanding futures to complete + super.onFlush() + + //Then output any resulting records + flushOutputRecords() + } + + final override def close(): Unit = { + debug("Close") + + if (flushOutputRecordsCancellable != null) { + flushOutputRecordsCancellable.cancel() + flushOutputRecordsCancellable = null + } + + super.onClose() + } + + /* Private */ + + private def flushOutputRecords(): Unit = { + val iterator = outstandingResults.iterator() + while (iterator.hasNext) { + val (key, value, timestamp) = iterator.next() + + processorContext.forward(key, value, 
To.all().withTimestamp(timestamp)) + + iterator.remove() + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingAwareServer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingAwareServer.scala new file mode 100644 index 0000000000..cae93745fc --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingAwareServer.scala @@ -0,0 +1,23 @@ +package com.twitter.finatra.kafkastreams.processors + +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.config.KafkaStreamsConfig +import com.twitter.util.Duration + +/** + * FlushingAwareServer must be mixed in to servers that rely on manually controlling when a flush/commit occurs. + * As such, this trait will be needed when using the following classes, FlushingProcessor, FlushingTransformer, + * AsyncProcessor, AsyncTransformer, FinatraTransformer, and FinatraTransformerV2 + * + * This trait sets 'kafka.commit.interval' to 'Duration.Top' to disable the normal Kafka Streams commit process. + * As such the only commits that will occur are triggered manually, thus allowing us to control when flush/commit + * occurs + */ +trait FlushingAwareServer extends KafkaStreamsTwitterServer { + + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + super + .streamsProperties(config) + .commitInterval(Duration.Top) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingProcessor.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingProcessor.scala new file mode 100644 index 0000000000..61401a715e --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingProcessor.scala @@ -0,0 +1,26 @@ +package com.twitter.finatra.kafkastreams.processors + +import com.twitter.finatra.kafkastreams.internal.utils.ProcessorContextLogging +import com.twitter.finatra.kafkastreams.processors.internal.Flushing +import com.twitter.finatra.streams.transformer.internal.OnInit +import org.apache.kafka.streams.processor._ + +trait FlushingProcessor[K, V] + extends AbstractProcessor[K, V] + with OnInit + with Flushing + with ProcessorContextLogging { + + private var _context: ProcessorContext = _ + + override def init(processorContext: ProcessorContext): Unit = { + _context = processorContext + onInit() + } + + override def processorContext: ProcessorContext = _context + + final override def close(): Unit = { + onClose() + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingTransformer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingTransformer.scala new file mode 100644 index 0000000000..274244d555 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/FlushingTransformer.scala @@ -0,0 +1,6 @@ +package com.twitter.finatra.kafkastreams.processors + +import com.twitter.finatra.kafkastreams.processors.internal.Flushing +import org.apache.kafka.streams.kstream.Transformer + +trait FlushingTransformer[K, V, K1, V1] extends Transformer[K, V, (K1, V1)] with Flushing diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/internal/AsyncFlushing.scala 
b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/internal/AsyncFlushing.scala new file mode 100644 index 0000000000..1b5596c11f --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/internal/AsyncFlushing.scala @@ -0,0 +1,92 @@ +package com.twitter.finatra.kafkastreams.processors.internal + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafkastreams.processors.MessageTimestamp +import com.twitter.finatra.streams.transformer.internal.{OnClose, OnInit} +import com.twitter.util.{Await, Duration, Future, Return, Throw} +import java.util.concurrent.Semaphore + +/** + * The AsyncFlushing trait allows outstanding futures to be tracked to completion when the flush() + * method is called + */ +trait AsyncFlushing[K1, V1, K2, V2] extends Flushing with OnInit with OnClose { + + @volatile private var outstandingFutures = Future.Unit + + @volatile private var asyncFailure: Throwable = _ + + private val addPermits = new Semaphore(maxOutstandingFuturesPerTask, /* fairness = */ false) + + private val outstandingFuturesGauge = + statsReceiver.addGauge("outstandingFutures")(numOutstandingFutures) + + /* Protected */ + + protected def statsReceiver: StatsReceiver + + protected def maxOutstandingFuturesPerTask: Int + + protected def flushTimeout: Duration + + protected def addFuture( + key: K1, + value: V1, + future: Future[Iterable[(K2, V2, MessageTimestamp)]] + ): Unit = { + throwIfAsyncFailure() + + addPermits.acquire() + + outstandingFutures = outstandingFutures + .join(future.respond { + case Throw(t) => + addPermits.release() + onFutureFailure(key, value, t) + case Return(fr) => + addPermits.release() + onFutureSuccess(key, value, fr) + }).unit + } + + protected def onFutureSuccess( + key: K1, + value: V1, + result: Iterable[(K2, V2, MessageTimestamp)] + ): Unit = { + debug(s"FutureSuccess $key $value $result") + } + + protected def onFutureFailure(key: K1, value: V1, t: Throwable): Unit = { + error("Async asyncFailure: " + t) + setAsyncFailure(t) + } + + protected def setAsyncFailure(e: Throwable): Unit = { + asyncFailure = e + } + + override def onFlush(): Unit = { + debug(s"Flush: Waiting on async results") + Await.result(outstandingFutures, flushTimeout) + outstandingFutures = Future.Unit + assert(numOutstandingFutures == 0) + debug(s"Finished waiting on async results") + } + + protected def throwIfAsyncFailure(): Unit = { + if (asyncFailure != null) { + throw asyncFailure + } + } + + protected def numOutstandingFutures: Int = { + maxOutstandingFuturesPerTask - addPermits.availablePermits + } + + override def onClose(): Unit = { + super.onClose() + debug("Close") + outstandingFuturesGauge.remove() + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/internal/Flushing.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/internal/Flushing.scala new file mode 100644 index 0000000000..2063b122cf --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/internal/Flushing.scala @@ -0,0 +1,48 @@ +package com.twitter.finatra.kafkastreams.processors.internal + +import com.twitter.finatra.kafkastreams.internal.utils.ProcessorContextLogging +import com.twitter.finatra.streams.transformer.internal.{OnClose, OnInit} +import com.twitter.util.Duration +import org.apache.kafka.streams.StreamsConfig +import 
org.apache.kafka.streams.processor.{Cancellable, PunctuationType, Punctuator} + +trait Flushing extends OnInit with OnClose with ProcessorContextLogging { + + @volatile private var commitPunctuatorCancellable: Cancellable = _ + + protected def commitInterval: Duration + + protected def onFlush(): Unit = {} + + //TODO: Create and use frameworkOnInit for framework use + override def onInit(): Unit = { + super.onInit() + + val streamsCommitIntervalMillis = processorContext + .appConfigs().get(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG).asInstanceOf[java.lang.Long] + assert( + streamsCommitIntervalMillis == Duration.Top.inMillis, + s"You're using an operator that requires 'Flushing' functionality (e.g. FlushingProcessor/Transformer or AsyncProcessor/Transformer). As such, your server must mix in FlushingAwareServer so that automatic Kafka Streams commit will be disabled." + ) + + if (commitInterval != Duration.Top) { + info(s"Scheduling timer to call commit every $commitInterval") + commitPunctuatorCancellable = processorContext + .schedule(commitInterval.inMillis, PunctuationType.WALL_CLOCK_TIME, new Punctuator { + override def punctuate(timestamp: Long): Unit = { + onFlush() + processorContext.commit() + } + }) + } + } + + //TODO: Create and use frameworkOnClose + override def onClose(): Unit = { + super.onClose() + if (commitPunctuatorCancellable != null) { + commitPunctuatorCancellable.cancel() + commitPunctuatorCancellable = null + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/package.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/package.scala new file mode 100644 index 0000000000..9e32f2ff97 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/processors/package.scala @@ -0,0 +1,5 @@ +package com.twitter.finatra.kafkastreams + +package object processors { + type MessageTimestamp = Long +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/punctuators/AdvancedPunctuator.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/punctuators/AdvancedPunctuator.scala new file mode 100644 index 0000000000..0914e8d24d --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/kafkastreams/punctuators/AdvancedPunctuator.scala @@ -0,0 +1,28 @@ +package com.twitter.finatra.kafkastreams.punctuators + +import org.apache.kafka.streams.processor.Punctuator + +/** + * A Punctuator that will only call 'punctuateAdvanced' when the timestamp is greater than the last timestamp.
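+ *
+ * A minimal sketch of an implementation (the logging body is illustrative only):
+ * {{{
+ *   class LoggingPunctuator extends AdvancedPunctuator {
+ *     override def punctuateAdvanced(timestampMillis: Long): Unit =
+ *       println(s"punctuating at $timestampMillis")
+ *   }
+ * }}}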
+ * + * *Note* if you extend this class you probably do not want to override 'punctuate' + */ +@deprecated("no longer supported", "1/7/2019") +trait AdvancedPunctuator extends Punctuator { + + private var lastPunctuateTimeMillis = Long.MinValue + + override def punctuate(timestampMillis: Long): Unit = { + if (timestampMillis > lastPunctuateTimeMillis) { + punctuateAdvanced(timestampMillis) + lastPunctuateTimeMillis = timestampMillis + } + } + + /** + * This will only be called if the timestamp is greater than the previous time + * + * @param timestampMillis the timestamp of the punctuate + */ + def punctuateAdvanced(timestampMillis: Long): Unit +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/config/DefaultTopicConfig.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/config/DefaultTopicConfig.scala new file mode 100644 index 0000000000..bb4de18d22 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/config/DefaultTopicConfig.scala @@ -0,0 +1,27 @@ +package com.twitter.finatra.streams.config + +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.conversions.DurationOps._ +import java.util +import org.apache.kafka.common.config.TopicConfig.{ + CLEANUP_POLICY_COMPACT, + CLEANUP_POLICY_CONFIG, + DELETE_RETENTION_MS_CONFIG, + SEGMENT_BYTES_CONFIG +} +import scala.collection.JavaConverters._ + +object DefaultTopicConfig { + + /** + * Default changelog topic configs generally suitable for non-windowed use cases using FinatraTransformer. + * We explicitly do not enable cleanup-policy: compact,delete + * because we'd rather rely on FinatraTransformer PersistentTimers to handle expiration/deletes + * (which gives us more control over when and how expiration's can occur). 
+ */ + val FinatraChangelogConfig: util.Map[String, String] = Map( + CLEANUP_POLICY_CONFIG -> CLEANUP_POLICY_COMPACT, + SEGMENT_BYTES_CONFIG -> 100.megabytes.inBytes.toString, + DELETE_RETENTION_MS_CONFIG -> 5.minutes.inMillis.toString //configure delete retention such that standby replicas have 5 minutes to read deletes + ).asJava +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/converters/time.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/converters/time.scala new file mode 100644 index 0000000000..3a54920473 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/converters/time.scala @@ -0,0 +1,15 @@ +package com.twitter.finatra.streams.converters + +import org.joda.time.format.ISODateTimeFormat + +object time { + implicit class RichLong(long: Long) { + def iso8601Millis: String = { + ISODateTimeFormat.dateTime.print(long) + } + + def iso8601: String = { + ISODateTimeFormat.dateTimeNoMillis.print(long) + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/flags/FinatraTransformerFlags.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/flags/FinatraTransformerFlags.scala new file mode 100644 index 0000000000..f0dbd2fee6 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/flags/FinatraTransformerFlags.scala @@ -0,0 +1,28 @@ +package com.twitter.finatra.streams.flags + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.streams.flags.FinatraTransformerFlags._ + +object FinatraTransformerFlags { + val AutoWatermarkInterval = "finatra.streams.watermarks.auto.interval" + val EmitWatermarkPerMessage = "finatra.streams.watermarks.per.message" +} + +/** + * A trait providing flags for configuring FinatraTransformers + */ +trait FinatraTransformerFlags extends KafkaStreamsTwitterServer { + + protected val autoWatermarkIntervalFlag = flag( + AutoWatermarkInterval, + 100.milliseconds, + "Minimum interval at which to call onWatermark when a new watermark is assigned. Set to 0.millis to disable auto watermark functionality which can be useful during topology tests." + ) + + protected val emitWatermarkPerMessageFlag = flag( + EmitWatermarkPerMessage, + false, + "Call onWatermark after each message. When set to false, onWatermark is called every finatra.streams.auto.watermark.interval. Note: onWatermark is only called when the watermark changes." + ) +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/flags/RocksDbFlags.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/flags/RocksDbFlags.scala new file mode 100644 index 0000000000..43e75e75f3 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/flags/RocksDbFlags.scala @@ -0,0 +1,39 @@ +package com.twitter.finatra.streams.flags + +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.finatra.kafkastreams.config.FinatraRocksDBConfig +import com.twitter.inject.server.TwitterServer + +trait RocksDbFlags extends TwitterServer { + + protected val rocksDbCountsStoreBlockCacheSize = + flag( + name = FinatraRocksDBConfig.RocksDbBlockCacheSizeConfig, + default = 200.megabytes, + help = + "Size of the rocksdb block cache per task. We recommend that this should be about 1/3 of your total memory budget. 
The remaining free memory can be left for the OS page cache" + ) + + protected val rocksDbEnableStatistics = + flag( + name = FinatraRocksDBConfig.RocksDbEnableStatistics, + default = false, + help = + "Enable RocksDB statistics. Note: RocksDB Statistics could add 5-10% degradation in performance (see https://github.com/facebook/rocksdb/wiki/Statistics)" + ) + + protected val rocksDbStatCollectionPeriodMs = + flag( + name = FinatraRocksDBConfig.RocksDbStatCollectionPeriodMs, + default = 60000, + help = "Set the period in milliseconds for stats collection." + ) + + protected val rocksDbEnableLZ4 = + flag( + name = FinatraRocksDBConfig.RocksDbLZ4Config, + default = false, + help = + "Enable RocksDB LZ4 compression. (See https://github.com/facebook/rocksdb/wiki/Compression)" + ) +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/interceptors/KafkaStreamsMonitoringConsumerInterceptor.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/interceptors/KafkaStreamsMonitoringConsumerInterceptor.scala new file mode 100644 index 0000000000..0250e8cac2 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/interceptors/KafkaStreamsMonitoringConsumerInterceptor.scala @@ -0,0 +1,20 @@ +package com.twitter.finatra.streams.interceptors + +import com.twitter.finatra.kafka.interceptors.MonitoringConsumerInterceptor + +/** + * A Kafka Streams aware interceptor that looks for the `publish_time` header and record timestamp and calculates + * how much time has passed since each of those times and updates stats for each. + * + * Note: Since this interceptor is Kafka Streams aware, it will not calculate stats when reading changelog topics to restore + * state, since this has been shown to be a hot-spot during restoration of large amounts of state.
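+ *
+ * For example (the client ids shown follow typical Kafka Streams naming and are illustrative only),
+ * a consumer with client id "myapp-StreamThread-1-restore-consumer" is ignored by this interceptor,
+ * while "myapp-StreamThread-1-consumer" has its stats recorded.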
+ */ +class KafkaStreamsMonitoringConsumerInterceptor extends MonitoringConsumerInterceptor { + + /** + * Determines if this interceptor should be enabled given the consumer client id + */ + override protected def enableInterceptorForClientId(consumerClientId: String): Boolean = { + !consumerClientId.endsWith("restore-consumer") + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraCompositeWindowStore.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraCompositeWindowStore.scala new file mode 100644 index 0000000000..ba1fd4a19b --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraCompositeWindowStore.scala @@ -0,0 +1,137 @@ +package com.twitter.finatra.streams.query + +import com.twitter.finatra.streams.converters.time._ +import com.twitter.finatra.streams.queryable.thrift.domain.ServiceShardId +import com.twitter.finatra.streams.queryable.thrift.partitioning.{ + KafkaPartitioner, + StaticServiceShardPartitioner +} +import com.twitter.finatra.streams.stores.internal.FinatraStoresGlobalManager +import com.twitter.finatra.streams.transformer.FinatraTransformer.{DateTimeMillis, WindowStartTime} +import com.twitter.finatra.streams.transformer.domain.{CompositeKey, Time, TimeWindowed} +import com.twitter.inject.Logging +import com.twitter.util.Duration +import org.apache.kafka.common.serialization.{Serde, Serializer} +import org.joda.time.DateTimeUtils +import scala.collection.JavaConverters._ + +//TODO: DRY with other queryable finatra stores +class QueryableFinatraCompositeWindowStore[PK, SK, V]( + storeName: String, + windowSize: Duration, + primaryKeySerde: Serde[PK], + numShards: Int, + numQueryablePartitions: Int, + currentShardId: Int) + extends Logging { + + // The number of windows to query before and/or after specified start and end times + private val defaultWindowMultiplier = 3 + + private val primaryKeySerializer = primaryKeySerde.serializer() + + private val currentServiceShardId = ServiceShardId(currentShardId) + + private val windowSizeMillis = windowSize.inMillis + + private val partitioner = new KafkaPartitioner( + StaticServiceShardPartitioner(numShards = numShards), + numPartitions = numQueryablePartitions + ) + + /* Public */ + + /** + * Get a range of composite keys and return them combined in a map. If the primary key for the + * composite keys is non-local to this Kafka Streams instance, return an exception indicating + * which instance is hosting this primary key + * + * @param primaryKey The primary key for the data being queried (e.g. a UserId) + * @param startCompositeKey The starting composite key being queried (e.g. UserId-ClickType) + * @param endCompositeKey The ending composite key being queried (e.g. UserId-ClickType) + * @param allowStaleReads Allow stale reads when querying a caching key value store. If set to false, + * each query will trigger a flush of the cache. 
+ * @param startTime The start time of the windows being queried + * @param endTime The end time of the windows being queried + * @return A time windowed map of composite keys to their values + */ + def get( + primaryKey: PK, + startCompositeKey: CompositeKey[PK, SK], + endCompositeKey: CompositeKey[PK, SK], + allowStaleReads: Boolean, + startTime: Option[DateTimeMillis] = None, + endTime: Option[DateTimeMillis] = None + ): Map[WindowStartTime, scala.collection.Map[SK, V]] = { + throwIfNonLocalKey(primaryKey, primaryKeySerializer) + + val (startWindowRange, endWindowRange) = startAndEndRange( + startTime = startTime, + endTime = endTime, + windowSizeMillis = windowSizeMillis) + + val resultMap = new java.util.TreeMap[Long, scala.collection.mutable.Map[SK, V]]().asScala + + var windowStartTime = startWindowRange + while (windowStartTime <= endWindowRange) { + queryWindow(startCompositeKey, endCompositeKey, windowStartTime, allowStaleReads, resultMap) + windowStartTime = windowStartTime + windowSizeMillis + } + + resultMap.toMap + } + + /* Private */ + + private def queryWindow( + startCompositeKey: CompositeKey[PK, SK], + endCompositeKey: CompositeKey[PK, SK], + windowStartTime: DateTimeMillis, + allowStaleReads: Boolean, + resultMap: scala.collection.mutable.Map[Long, scala.collection.mutable.Map[SK, V]] + ): Unit = { + trace(s"QueryWindow $startCompositeKey to $endCompositeKey ${windowStartTime.iso8601}") + + //TODO: Use store.taskId to find exact store where the key is assigned + for (store <- FinatraStoresGlobalManager.getWindowedCompositeStores[PK, SK, V](storeName)) { + val iterator = store.range( + TimeWindowed.forSize(startMs = windowStartTime, windowSizeMillis, startCompositeKey), + TimeWindowed.forSize(startMs = windowStartTime, windowSizeMillis, endCompositeKey), + allowStaleReads = allowStaleReads + ) + + while (iterator.hasNext) { + val entry = iterator.next() + trace(s"$store\t$entry") + val innerMap = + resultMap.getOrElseUpdate(entry.key.startMs, scala.collection.mutable.Map[SK, V]()) + innerMap += (entry.key.value.secondary -> entry.value) + } + } + } + + private def startAndEndRange( + startTime: Option[DateTimeMillis], + endTime: Option[DateTimeMillis], + windowSizeMillis: DateTimeMillis + ): (DateTimeMillis, DateTimeMillis) = { + val endWindowRange = endTime.getOrElse { + TimeWindowed.windowStart( + messageTime = Time(DateTimeUtils.currentTimeMillis), + sizeMs = windowSizeMillis) + defaultWindowMultiplier * windowSizeMillis + } + + val startWindowRange = + startTime.getOrElse(endWindowRange - (defaultWindowMultiplier * windowSizeMillis)) + + (startWindowRange, endWindowRange) + } + + private def throwIfNonLocalKey(key: PK, keySerializer: Serializer[PK]): Unit = { + val keyBytes = keySerializer.serialize("", key) + val partitionsToQuery = partitioner.shardIds(keyBytes) + if (partitionsToQuery.head != currentServiceShardId) { + throw new Exception(s"Non local key. 
Query $partitionsToQuery") + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraKeyValueStore.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraKeyValueStore.scala new file mode 100644 index 0000000000..86ae4b92b3 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraKeyValueStore.scala @@ -0,0 +1,108 @@ +package com.twitter.finatra.streams.query + +import com.twitter.finatra.streams.queryable.thrift.domain.ServiceShardId +import com.twitter.finatra.streams.queryable.thrift.partitioning.{ + KafkaPartitioner, + StaticServiceShardPartitioner +} +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.stores.internal.FinatraStoresGlobalManager +import com.twitter.inject.Logging +import java.util.NoSuchElementException +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.state.KeyValueIterator + +//TODO: DRY with window store +class QueryableFinatraKeyValueStore[PK, K, V]( + storeName: String, + primaryKeySerde: Serde[PK], + numShards: Int, + numQueryablePartitions: Int, + currentShardId: Int) + extends Logging { + private val primaryKeySerializer = primaryKeySerde.serializer() + + private val currentServiceShardId = ServiceShardId(currentShardId) + + private val partitioner = new KafkaPartitioner( + StaticServiceShardPartitioner(numShards = numShards), + numPartitions = numQueryablePartitions + ) + + /** + * Get the value corresponding to this key. + * + * @param key The key to fetch + * + * @return The value or null if no value is found. + * + * @throws NullPointerException If null is used for key. + * @throws InvalidStateStoreException if the store is not initialized + */ + def get(primaryKey: PK, key: K): Option[V] = { + throwIfNonLocalKey(primaryKey) + + trace(s"Get $key") + + //TODO: Use store.taskId to find exact store where the key is assigned + for (store <- stores) { + val result = store.get(key) + if (result != null) { + return Some(result) + } + } + + None + } + + /** + * Get an iterator over a given range of keys. This iterator must be closed after use. + * The returned iterator must be safe from {@link java.util.ConcurrentModificationException}s + * and must not return null values. No ordering guarantees are provided. + * + * @param from The first key that could be in the range + * @param to The last key that could be in the range + * + * @return The iterator for this range. + * + * @throws NullPointerException If null is used for from or to. + * @throws InvalidStateStoreException if the store is not initialized + */ + def range(primaryKey: PK, from: K, to: K): KeyValueIterator[K, V] = { + throwIfNonLocalKey(primaryKey) + + //TODO: Use store.taskId to find exact store where the key is assigned + for (store <- stores) { + val result = store.range(from, to) + if (result.hasNext) { + return result + } + } + + EmptyKeyValueIterator + } + + private def throwIfNonLocalKey(primaryKey: PK): Unit = { + val keyBytes = primaryKeySerializer.serialize("", primaryKey) + val partitionsToQuery = partitioner.shardIds(keyBytes) + if (partitionsToQuery.head != currentServiceShardId) { + throw new Exception(s"Non local key. 
Query $partitionsToQuery") + } + } + + private def stores: Iterable[FinatraKeyValueStore[K, V]] = { + FinatraStoresGlobalManager.getStores[K, V](storeName) + } + + private object EmptyKeyValueIterator extends KeyValueIterator[K, V] { + override def hasNext = false + + override def close(): Unit = {} + + override def peekNextKey = throw new NoSuchElementException + + override def next = throw new NoSuchElementException + + override def remove(): Unit = {} + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraWindowStore.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraWindowStore.scala new file mode 100644 index 0000000000..f73e4eb9f6 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/query/QueryableFinatraWindowStore.scala @@ -0,0 +1,87 @@ +package com.twitter.finatra.streams.query + +import com.twitter.finatra.streams.queryable.thrift.domain.ServiceShardId +import com.twitter.finatra.streams.queryable.thrift.partitioning.{ + KafkaPartitioner, + StaticServiceShardPartitioner +} +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.stores.internal.FinatraStoresGlobalManager +import com.twitter.finatra.streams.transformer.FinatraTransformer.{DateTimeMillis, WindowStartTime} +import com.twitter.finatra.streams.transformer.domain.{Time, TimeWindowed} +import com.twitter.inject.Logging +import com.twitter.util.Duration +import org.apache.kafka.common.serialization.{Serde, Serializer} +import org.joda.time.DateTimeUtils +import scala.collection.JavaConverters._ + +class QueryableFinatraWindowStore[K, V]( + storeName: String, + windowSize: Duration, + keySerde: Serde[K], + numShards: Int, + numQueryablePartitions: Int, + currentShardId: Int) + extends Logging { + + // The number of windows to query before and/or after specified start and end times + private val defaultWindowMultiplier = 3 + + private val keySerializer = keySerde.serializer() + + private val currentServiceShardId = ServiceShardId(currentShardId) + + private val windowSizeMillis = windowSize.inMillis + + private val partitioner = new KafkaPartitioner( + StaticServiceShardPartitioner(numShards = numShards), + numPartitions = numQueryablePartitions + ) + + def get( + key: K, + startTime: Option[Long] = None, + endTime: Option[Long] = None + ): Map[WindowStartTime, V] = { + throwIfNonLocalKey(key, keySerializer) + + val endWindowRange = endTime.getOrElse( + TimeWindowed.windowStart( + messageTime = Time(DateTimeUtils.currentTimeMillis), + sizeMs = windowSizeMillis) + defaultWindowMultiplier * windowSizeMillis) + + val startWindowRange = + startTime.getOrElse(endWindowRange - (defaultWindowMultiplier * windowSizeMillis)) + + val windowedMap = new java.util.TreeMap[DateTimeMillis, V] + + var currentWindowStart = startWindowRange + while (currentWindowStart <= endWindowRange) { + val windowedKey = TimeWindowed.forSize(currentWindowStart, windowSize.inMillis, key) + + //TODO: Use store.taskId to find exact store where the key is assigned + for (store <- stores) { + val result = store.get(windowedKey) + if (result != null) { + windowedMap.put(currentWindowStart, result) + } + } + + currentWindowStart = currentWindowStart + windowSizeMillis + } + + windowedMap.asScala.toMap + } + + private def throwIfNonLocalKey(key: K, keySerializer: Serializer[K]): Unit = { + val keyBytes = keySerializer.serialize("", key) + val partitionsToQuery = 
partitioner.shardIds(keyBytes) + if (partitionsToQuery.head != currentServiceShardId) { + throw new Exception(s"Non local key. Query $partitionsToQuery") + } + } + + private def stores: Iterable[FinatraKeyValueStore[TimeWindowed[K], V]] = { + FinatraStoresGlobalManager.getWindowedStores(storeName) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/CachingFinatraKeyValueStore.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/CachingFinatraKeyValueStore.scala new file mode 100644 index 0000000000..eafd7cba16 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/CachingFinatraKeyValueStore.scala @@ -0,0 +1,14 @@ +package com.twitter.finatra.streams.stores + +/** + * A FinatraKeyValueStore with a callback that fires when an entry is flushed into the underlying store + */ +trait CachingFinatraKeyValueStore[K, V] extends FinatraKeyValueStore[K, V] { + + /** + * Register a flush listener callback that will be called every time a cached key value store + * entry is flushed into the underlying RocksDB store + * @param listener Flush callback for cached entries + */ + def registerFlushListener(listener: (K, V) => Unit): Unit +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/FinatraKeyValueStore.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/FinatraKeyValueStore.scala new file mode 100644 index 0000000000..617b011451 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/FinatraKeyValueStore.scala @@ -0,0 +1,124 @@ +package com.twitter.finatra.streams.stores +import com.twitter.finatra.streams.transformer.domain.TimerResult +import org.apache.kafka.streams.processor.TaskId +import org.apache.kafka.streams.state.{KeyValueIterator, KeyValueStore} + +trait FinatraKeyValueStore[K, V] extends KeyValueStore[K, V] { + + /** + * The task id associated with this store + */ + def taskId: TaskId + + /** + * Get an iterator over a given range of keys. This iterator must be closed after use. + * The returned iterator must be safe from {@link java.util.ConcurrentModificationException}s + * and must not return null values. No ordering guarantees are provided. + * + * @param from The first key that could be in the range + * @param to The last key that could be in the range + * @param allowStaleReads Allow stale reads when querying a caching key value store. If set to false, + * each query will trigger a flush of the cache. + * + * @return The iterator for this range. + * + * @throws NullPointerException If null is used for from or to. + * @throws InvalidStateStoreException if the store is not initialized + */ + def range(from: K, to: K, allowStaleReads: Boolean): KeyValueIterator[K, V] + + @deprecated("no longer supported", "1/7/2019") + def deleteRange(from: K, to: K, maxDeletes: Int = 25000): TimerResult[K] + + /** + * Delete the value from the store (if there is one) + * Note: This version of delete avoids getting the prior value which keyValueStore.delete does + * + * @param key The key + * + * @return The old value or null if there is no such key. + * + * @throws NullPointerException If null is used for key. 
+ */ + def deleteWithoutGettingPriorValue(key: K): Unit + + /** + * Get the value corresponding to this key or return the specified default value if no key is found + * + * @param key The key to fetch + * @param default The default value to return if key is not found in the store + * + * @return The value associated with the key or the default value if the key is not found + * + * @throws NullPointerException If null is used for key. + * @throws InvalidStateStoreException if the store is not initialized + */ + def getOrDefault(key: K, default: => V): V + + /** + * A range scan starting from bytes. + * + * Note 1: This is an API for advanced users only + * + * Note 2: If this RocksDB instance is configured in "prefix seek mode", then fromBytes will be used as a "prefix" and the iteration will end when the prefix is no longer part of the next element. + * Enabling "prefix seek mode" can be done by calling options.useFixedLengthPrefixExtractor. When enabled, prefix scans can take advantage of a prefix-based bloom filter for better seek performance. + * See: https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes + */ + def range(fromBytes: Array[Byte]): KeyValueIterator[K, V] + + /** + * Get an iterator over a given range of keys. This iterator must be closed after use. + * The returned iterator must be safe from {@link java.util.ConcurrentModificationException}s + * and must not return null values. No ordering guarantees are provided. + * + * @param fromBytesInclusive Inclusive bytes to start the range scan + * @param toBytesExclusive Exclusive bytes to end the range scan + * + * @return The iterator for this range. + * + * @throws NullPointerException If null is used for from or to. + * @throws InvalidStateStoreException if the store is not initialized + */ + def range(fromBytesInclusive: Array[Byte], toBytesExclusive: Array[Byte]): KeyValueIterator[K, V] + + /** + Removes the database entries in the range ["begin_key", "end_key"), i.e., + including "begin_key" and excluding "end_key". Returns OK on success, and + a non-OK status on error. It is not an error if no keys exist in the range + ["begin_key", "end_key"). + + This feature is currently an experimental performance optimization for + deleting very large ranges of contiguous keys. Invoking it many times or on + small ranges may severely degrade read performance; in particular, the + resulting performance can be worse than calling Delete() for each key in + the range. Note also the degraded read performance affects keys outside the + deleted ranges, and affects database operations involving scans, like flush + and compaction. + + Consider setting ReadOptions::ignore_range_deletions = true to speed + up reads for key(s) that are known to be unaffected by range deletions. 
+ + Note: Changelog entries will not be deleted, so this method is best used + when relying on retention.ms to delete entries from the changelog + */ + def deleteRangeExperimentalWithNoChangelogUpdates( + beginKeyInclusive: Array[Byte], + endKeyExclusive: Array[Byte] + ): Unit + + /* + Note: We define equals and hashcode so we can store Finatra Key Value stores in maps for + retrieval when implementing queryable state + */ + + override def equals(other: Any): Boolean = other match { + case that: FinatraKeyValueStore[_, _] => + taskId == that.taskId && + name == that.name() + case _ => false + } + + override def hashCode(): Int = { + 31 * taskId.hashCode() + name.hashCode + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/CachingFinatraKeyValueStoreImpl.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/CachingFinatraKeyValueStoreImpl.scala new file mode 100644 index 0000000000..c3e8efe6fc --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/CachingFinatraKeyValueStoreImpl.scala @@ -0,0 +1,259 @@ +package com.twitter.finatra.streams.stores.internal + +import com.twitter.finagle.stats.{Gauge, StatsReceiver} +import com.twitter.finatra.streams.stores.{CachingFinatraKeyValueStore, FinatraKeyValueStore} +import com.twitter.finatra.streams.transformer.domain.TimerResult +import com.twitter.inject.Logging +import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap +import java.util +import java.util.function.BiConsumer +import org.apache.kafka.streams.KeyValue +import org.apache.kafka.streams.processor.{ProcessorContext, StateStore, TaskId} +import org.apache.kafka.streams.state.KeyValueIterator +import scala.reflect.ClassTag + +/** + * A write-behind caching layer around the FinatraKeyValueStore. + * + * We cache Java objects here and then periodically flush entries into RocksDB which involves + * serializing the objects into byte arrays. As such this cache: + * 1) Reduces the number of reads/writes to RocksDB + * 2) Reduces the number of serialization/deserialization operations which can be expensive for some classes + * 3) Reduces the number of publishes to the Kafka changelog topic backing this key value store + * + * This caching does introduce a few odd corner cases :-( + * 1. Items in the cache have pass-by-reference semantics but items in rocksdb have pass-by-value semantics. Modifying items after a put is a bad idea! Ideally, only + * immutable objects would be stored in a CachingFinatraKeyValueStore + * 2. Range queries currently only work against the uncached RocksDB data. + * This is because sorted Java maps are much less performant than their unsorted counterparts. + * We typically only use range queries for queryable state where it is ok to read stale data + * If fresher data is required for range queries, decrease your commit interval. + * + * This class is inspired by: https://github.com/apache/samza/blob/1.0.0/samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala + */ +class CachingFinatraKeyValueStoreImpl[K: ClassTag, V]( + statsReceiver: StatsReceiver, + keyValueStore: FinatraKeyValueStore[K, V]) + extends CachingFinatraKeyValueStore[K, V] + with Logging { + + private var numCacheEntriesGauge: Gauge = _ + + private var _taskId: TaskId = _ + + /* Regarding concurrency, Kafka Stream's Transformer interface assures us that only 1 thread will ever modify this map. 
+ * However, when using QueryableState, there may be concurrent readers of this map. FastUtil documentation says the following: + * ** All classes are not synchronized. + * ** If multiple threads access one of these classes concurrently, and at least one of the threads modifies it, it must be synchronized externally. + * ** Iterators will behave unpredictably in the presence of concurrent modifications. + * ** Reads, however, can be carried out concurrently. + * + * Since we only ever execute concurrent gets from queryable state and don't access iterators, we are safe performing these "gets" concurrently along with Transformer read and writes + */ + private val objectCache = new Object2ObjectOpenHashMap[K, V] + + private var flushListener: (K, V) => Unit = _ + + /* + TODO: Consider making this a "batch consumer" so we could use keyValueStore.putAll which uses RocksDB WriteBatch + which may lead to better performance... + See: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ "Q: What's the fastest way to load data into RocksDB?" + See: https://github.com/apache/samza/blob/1.0.0/samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala#L175 + */ + private val flushListenerBiConsumer = new BiConsumer[K, V] { + override def accept(key: K, value: V): Unit = { + debug(s"flush_put($key -> $value") + keyValueStore.put(key, value) + if (flushListener != null) { + flushListener(key, value) + } + } + } + + /* Public */ + + /** + * Register a flush listener callback that will be called every time a cached key value store + * entry is flushed into the underlying RocksDB store + * @param listener Flush callback for cached entries + */ + def registerFlushListener(listener: (K, V) => Unit): Unit = { + assert(flushListener == null, "Can only currently call registerFlushListener once") + flushListener = listener + } + + override def taskId: TaskId = _taskId + + override def name(): String = keyValueStore.name + + override def init(processorContext: ProcessorContext, stateStore: StateStore): Unit = { + _taskId = processorContext.taskId() + + numCacheEntriesGauge = statsReceiver + .scope("stores") + .scope(name) + .addGauge(s"numCacheEntries")(objectCache.size()) + + keyValueStore.init(processorContext, stateStore) + } + + override def flush(): Unit = { + trace("flush") + flushObjectCache() + keyValueStore.flush() + } + + override def close(): Unit = { + flushListener = null + if (numCacheEntriesGauge != null) { + numCacheEntriesGauge.remove() + numCacheEntriesGauge = null + } + keyValueStore.close() + } + + override def put(key: K, value: V): Unit = { + trace(s"put($key -> $value") + objectCache.put(key, value) + } + + override def putIfAbsent(k: K, v: V): V = { + objectCache.putIfAbsent(k, v) + } + + override def putAll(list: util.List[KeyValue[K, V]]): Unit = { + val iterator = list.iterator() + while (iterator.hasNext) { + val entry = iterator.next() + objectCache.put(entry.key, entry.value) + } + } + + override def delete(k: K): V = { + objectCache.remove(k) + keyValueStore.delete(k) + } + + override def get(k: K): V = { + trace(s"get($k)") + val cacheResult = objectCache.get(k) + if (cacheResult != null) { + cacheResult + } else { + keyValueStore.get(k) + } + } + + override def getOrDefault(k: K, default: => V): V = { + trace(s"getOrDefault($k)") + val result = get(k) + if (result != null) { + result + } else { + default + } + } + + override def deleteWithoutGettingPriorValue(key: K): Unit = { + objectCache.remove(key) + keyValueStore.put(key, null.asInstanceOf[V]) + } + 
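+
+  /* Illustrative sketch (hypothetical String/Long store and values, not part of this class) of the
+   * write-behind contract described in the class scaladoc: puts only touch the on-heap cache and
+   * reach RocksDB, the changelog, and any registered flush listener on flush/commit, while gets
+   * read through the cache first:
+   *
+   * {{{
+   *   store.registerFlushListener((k, v) => println(s"flushed $k -> $v"))
+   *   store.put("userA", 5L)   // cached only; no RocksDB write or changelog publish yet
+   *   store.get("userA")       // returns 5L straight from the object cache
+   *   store.flush()            // writes userA -> 5L to RocksDB and fires the flush listener
+   * }}}
+   */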
+ override def all(): KeyValueIterator[K, V] = { + flushObjectCache() + keyValueStore.all() + } + + override def range(fromInclusive: K, toInclusive: K): KeyValueIterator[K, V] = { + flushObjectCache() + keyValueStore.range(fromInclusive, toInclusive) + } + + override def range(fromBytesInclusive: Array[Byte]): KeyValueIterator[K, V] = { + flushObjectCache() + keyValueStore.range(fromBytesInclusive) + } + + override def range( + fromBytesInclusive: Array[Byte], + toBytesExclusive: Array[Byte] + ): KeyValueIterator[K, V] = { + flushObjectCache() + keyValueStore.range(fromBytesInclusive, toBytesExclusive) + } + + override def range( + fromInclusive: K, + toInclusive: K, + allowStaleReads: Boolean + ): KeyValueIterator[K, V] = { + trace(s"range($fromInclusive to $toInclusive)") + if (allowStaleReads) { + staleRange(fromInclusive, toInclusive) + } else { + flushObjectCache() + keyValueStore.range(fromInclusive, toInclusive) + } + } + + override def deleteRangeExperimentalWithNoChangelogUpdates( + beginKeyInclusive: Array[Byte], + endKeyExclusive: Array[Byte] + ): Unit = { + flushObjectCache() + keyValueStore.deleteRangeExperimentalWithNoChangelogUpdates(beginKeyInclusive, endKeyExclusive) + } + + override def deleteRange(from: K, to: K, maxDeletes: Int): TimerResult[K] = { + flushObjectCache() + keyValueStore.deleteRange(from, to, maxDeletes) + } + + override def approximateNumEntries(): Long = { + keyValueStore.approximateNumEntries() + } + + override def persistent(): Boolean = keyValueStore.persistent() + + override def isOpen: Boolean = keyValueStore.isOpen + + /* Private */ + + private def flushObjectCache(): Unit = { + if (!objectCache.isEmpty) { + objectCache.forEach(flushListenerBiConsumer) + objectCache.clear() + } + } + + /* A stale range read will occur for new keys (meaning that new keys will not be returned by this + * method until a flush/commit. Existing keys with stale values in rocksdb will be + * updated by checking the cache on the way out. In this way, we use RocksDB for efficient sorting + * but can still leverage the most recent values in the cache... 
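+   * For example (hypothetical values): if RocksDB currently holds k -> 1 but the cache holds an
+   * unflushed k -> 3, this iterator returns k -> 3, while a key that only exists in the cache is
+   * not returned at all until the next flush/commit.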
*/ + private def staleRange(fromInclusive: K, toInclusive: K) = { + new KeyValueIterator[K, V] { + private val iterator = keyValueStore.range(fromInclusive, toInclusive) + + override def hasNext: Boolean = iterator.hasNext + + override def peekNextKey(): K = { + iterator.peekNextKey() + } + + override def next(): KeyValue[K, V] = { + val result = iterator.next() + val newerResultValue = objectCache.get(result.key) + if (newerResultValue != null) { + new KeyValue(result.key, newerResultValue) + } else { + result + } + } + + override def close(): Unit = { + iterator.close() + } + } + } + +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/FinatraKeyValueStoreImpl.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/FinatraKeyValueStoreImpl.scala new file mode 100644 index 0000000000..22e84cea9d --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/FinatraKeyValueStoreImpl.scala @@ -0,0 +1,296 @@ +package com.twitter.finatra.streams.stores.internal + +import com.twitter.finagle.stats.{Gauge, Stat, StatsReceiver} +import com.twitter.finatra.kafkastreams.internal.utils.ReflectionUtils +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.stores.internal.FinatraKeyValueStoreImpl._ +import com.twitter.finatra.streams.transformer.IteratorImplicits +import com.twitter.finatra.streams.transformer.domain.{DeleteTimer, RetainTimer, TimerResult} +import com.twitter.inject.Logging +import java.util +import java.util.Comparator +import java.util.concurrent.TimeUnit +import org.apache.kafka.common.serialization.{Deserializer, Serializer} +import org.apache.kafka.common.utils.Bytes +import org.apache.kafka.streams.KeyValue +import org.apache.kafka.streams.processor.{ProcessorContext, StateStore, TaskId} +import org.apache.kafka.streams.state.internals.{ + MeteredKeyValueBytesStore, + RocksDBStore, + RocksKeyValueIterator +} +import org.apache.kafka.streams.state.{KeyValueIterator, KeyValueStore, StateSerdes} +import org.rocksdb.{RocksDB, WriteOptions} +import scala.collection.JavaConverters._ +import scala.reflect.ClassTag + +object FinatraKeyValueStoreImpl { + val InitLatencyStatName = "init" + val CloseLatencyStatName = "close" + val PutLatencyStatName = "put" + val PutIfAbsentLatencyStatName = "put_if_absent" + val PutAllLatencyStatName = "put_all" + val DeleteLatencyStatName = "delete" + val FlushLatencyStatName = "flush" + val PersistentLatencyStatName = "persistent" + val IsOpenLatencyStatName = "is_open" + val GetLatencyStatName = "get" + val RangeLatencyStatName = "range" + val AllLatencyStatName = "all" + val ApproximateNumEntriesLatencyStatName = "approximate_num_entries" + val DeleteRangeLatencyStatName = "delete_range" + val DeleteWithoutGettingPriorValueLatencyStatName = "delete_without_getting_prior_value" + val FinatraRangeLatencyStatName = "finatra_range" + val DeleteRangeExperimentalLatencyStatName = "delete_range_experimental" +} + +case class FinatraKeyValueStoreImpl[K: ClassTag, V]( + override val name: String, + statsReceiver: StatsReceiver) + extends KeyValueStore[K, V] + with Logging + with IteratorImplicits + with FinatraKeyValueStore[K, V] { + + private val latencyStatName: String = "latency_us" + private val storeStatsScope: StatsReceiver = statsReceiver.scope("stores").scope(name) + + /* Private Mutable */ + private var _taskId: TaskId = _ + private var _keyValueStore: 
MeteredKeyValueBytesStore[K, V] = _ + private var rocksDb: RocksDB = _ + private var writeOptions: WriteOptions = _ + private var serdes: StateSerdes[K, V] = _ + private var keySerializer: Serializer[K] = _ + private var keyDeserializer: Deserializer[K] = _ + private var valueDeserializer: Deserializer[V] = _ + private var numEntriesGauge: Gauge = _ + + /* Private Stats */ + private val initLatencyStat = createStat(InitLatencyStatName) + private val closeLatencyStat = createStat(CloseLatencyStatName) + private val putLatencyStat = createStat(PutLatencyStatName) + private val putIfAbsentLatencyStat = createStat(PutIfAbsentLatencyStatName) + private val putAllLatencyStat = createStat(PutAllLatencyStatName) + private val deleteLatencyStat = createStat(DeleteLatencyStatName) + private val flushLatencyStat = createStat(FlushLatencyStatName) + private val persistentLatencyStat = createStat(PersistentLatencyStatName) + private val isOpenLatencyStat = createStat(IsOpenLatencyStatName) + private val getLatencyStat = createStat(GetLatencyStatName) + private val rangeLatencyStat = createStat(RangeLatencyStatName) + private val allLatencyStat = createStat(AllLatencyStatName) + private val approximateNumEntriesLatencyStat = createStat(ApproximateNumEntriesLatencyStatName) + private val deleteRangeLatencyStat = createStat(DeleteRangeLatencyStatName) + private val deleteWithoutGettingPriorValueLatencyStat = createStat( + DeleteWithoutGettingPriorValueLatencyStatName) + private val finatraRangeLatencyStat = createStat(FinatraRangeLatencyStatName) + private val deleteRangeExperimentalLatencyStat = createStat( + DeleteRangeExperimentalLatencyStatName) + + /* Public */ + + override def init(processorContext: ProcessorContext, root: StateStore): Unit = { + _taskId = processorContext.taskId() + + meterLatency(initLatencyStat) { + _keyValueStore = processorContext + .getStateStore(name) + .asInstanceOf[MeteredKeyValueBytesStore[K, V]] + + serdes = ReflectionUtils.getField[StateSerdes[K, V]](_keyValueStore, "serdes") + keySerializer = serdes.keySerializer() + keyDeserializer = serdes.keyDeserializer() + valueDeserializer = serdes.valueDeserializer() + + _keyValueStore.inner() match { + case rocksDbStore: RocksDBStore => + rocksDb = ReflectionUtils.getField[RocksDB](rocksDbStore, "db") + case _ => + throw new Exception("FinatraTransformer only supports RocksDB State Stores") + } + + writeOptions = new WriteOptions + writeOptions.setDisableWAL(true) + + numEntriesGauge = + storeStatsScope.addGauge(s"approxNumEntries")(_keyValueStore.approximateNumEntries) + } + } + + override def close(): Unit = { + meterLatency(closeLatencyStat) { + if (numEntriesGauge != null) { + numEntriesGauge.remove() + } + numEntriesGauge = null + + _keyValueStore = null + rocksDb = null + + if (writeOptions != null) { + writeOptions.close() + writeOptions = null + } + + serdes = null + keySerializer = null + keyDeserializer = null + valueDeserializer = null + } + } + + override def taskId: TaskId = _taskId + + override def put(key: K, value: V): Unit = + meterLatency(putLatencyStat)(keyValueStore.put(key, value)) + + override def putIfAbsent(k: K, v: V): V = + meterLatency(putIfAbsentLatencyStat)(keyValueStore.putIfAbsent(k, v)) + + override def putAll(list: util.List[KeyValue[K, V]]): Unit = + meterLatency(putAllLatencyStat)(keyValueStore.putAll(list)) + + override def delete(k: K): V = meterLatency(deleteLatencyStat)(keyValueStore.delete(k)) + + override def flush(): Unit = meterLatency(flushLatencyStat)(keyValueStore.flush()) + + 
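+
+  /* Every public operation in this class is wrapped in meterLatency, so each store surfaces
+   * per-operation latency stats of the form stores/<storeName>/<operation>/latency_us, e.g. for a
+   * store named "wordCountStore" (hypothetical name):
+   *
+   * {{{
+   *   stores/wordCountStore/get/latency_us
+   *   stores/wordCountStore/put/latency_us
+   *   stores/wordCountStore/range/latency_us
+   * }}}
+   */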
override def persistent(): Boolean = + meterLatency(persistentLatencyStat)(keyValueStore.persistent()) + + override def isOpen: Boolean = + _keyValueStore != null && meterLatency(isOpenLatencyStat)(keyValueStore.isOpen) + + override def get(key: K): V = meterLatency(getLatencyStat)(keyValueStore.get(key)) + + override def range(from: K, to: K): KeyValueIterator[K, V] = + meterLatency(rangeLatencyStat)(keyValueStore.range(from, to)) + + override def range(from: K, to: K, allowStaleReads: Boolean): KeyValueIterator[K, V] = + meterLatency(rangeLatencyStat)(keyValueStore.range(from, to)) + + override def all(): KeyValueIterator[K, V] = meterLatency(allLatencyStat)(keyValueStore.all()) + + override def approximateNumEntries(): Long = + meterLatency(approximateNumEntriesLatencyStat)(keyValueStore.approximateNumEntries()) + + /* Finatra Additions */ + + @deprecated("no longer supported", "1/7/2019") + override def deleteRange(from: K, to: K, maxDeletes: Int = 25000): TimerResult[K] = { + meterLatency(deleteRangeLatencyStat) { + val iterator = range(from, to) + try { + val keysToDelete = iterator.asScala + .take(maxDeletes) + .map(keyValue => new KeyValue(keyValue.key, null.asInstanceOf[V])) + .toList + .asJava + + putAll(keysToDelete) + deleteOrRetainTimer(iterator) + } finally { + iterator.close() + } + } + } + + // Optimization which avoid getting the prior value which keyValueStore.delete does :-/ + override final def deleteWithoutGettingPriorValue(key: K): Unit = { + meterLatency(deleteWithoutGettingPriorValueLatencyStat) { + keyValueStore.put(key, null.asInstanceOf[V]) + } + } + + override final def getOrDefault(key: K, default: => V): V = { + val existing = keyValueStore.get(key) + if (existing == null) { + default + } else { + existing + } + } + + /** + * A range scan starting from bytes. If RocksDB "prefix seek mode" is not enabled, than the iteration will NOT end when fromBytes is no longer the prefix + * + * Note 1: This is an API for Advanced users only + * + * Note 2: If this RocksDB instance is configured in "prefix seek mode", than fromBytes will be used as a "prefix" and the iteration will end when the prefix is no longer part of the next element. + * Enabling "prefix seek mode" can be done by calling options.useFixedLengthPrefixExtractor. When enabled, prefix scans can take advantage of a prefix based bloom filter for better seek performance + * See: https://github.com/facebook/rocksdb/wiki/Prefix-Seek-API-Changes + */ + override def range(fromBytes: Array[Byte]): KeyValueIterator[K, V] = { + meterLatency(finatraRangeLatencyStat) { + val iterator = rocksDb.newIterator() //TODO: Save off iterators to make sure they are all closed... 
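+      // Raw RocksDB iterator: seek positions it at (or after) fromBytes and, unless this instance
+      // was opened in "prefix seek mode" (see the scaladoc above), iteration continues to the end
+      // of the store rather than stopping once fromBytes is no longer a prefix of the current key.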
+ iterator.seek(fromBytes) + + new RocksKeyValueIterator(iterator, keyDeserializer, valueDeserializer, keyValueStore.name) + } + } + + override def range( + fromBytesInclusive: Array[Byte], + toBytesExclusive: Array[Byte] + ): KeyValueIterator[K, V] = + meterLatency(finatraRangeLatencyStat) { + val iterator = rocksDb.newIterator() + iterator.seek(fromBytesInclusive) + + new RocksKeyValueIterator(iterator, keyDeserializer, valueDeserializer, keyValueStore.name) { + private val comparator: Comparator[Array[Byte]] = Bytes.BYTES_LEXICO_COMPARATOR + + override def hasNext: Boolean = { + super.hasNext && + comparator.compare(iterator.key(), toBytesExclusive) < 0 // < 0 since to is exclusive + } + } + } + + override def deleteRangeExperimentalWithNoChangelogUpdates( + beginKeyInclusive: Array[Byte], + endKeyExclusive: Array[Byte] + ): Unit = { + meterLatency(deleteRangeExperimentalLatencyStat) { + rocksDb.deleteRange(beginKeyInclusive, endKeyExclusive) + } + } + + /* Private */ + + private def meterLatency[T](stat: Stat)(operation: => T): T = { + Stat.time[T](stat, TimeUnit.MICROSECONDS) { + try { + operation + } catch { + case e: Throwable => + error("Failure executing operation", e) + throw e + } + } + } + + @deprecated + private def deleteOrRetainTimer( + iterator: KeyValueIterator[K, _], + onDeleteTimer: => Unit = () => () + ): TimerResult[K] = { + if (iterator.hasNext) { + RetainTimer(stateStoreCursor = iterator.peekNextKeyOpt, throttled = true) + } else { + onDeleteTimer + DeleteTimer() + } + } + + private def keyValueStore: KeyValueStore[K, V] = { + assert( + _keyValueStore != null, + "FinatraTransformer.getKeyValueStore must be called once outside of onMessage" + ) + _keyValueStore + } + + private def createStat(name: String) = { + storeStatsScope.scope(name).stat(latencyStatName) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/FinatraStoresGlobalManager.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/FinatraStoresGlobalManager.scala new file mode 100644 index 0000000000..68bff217ad --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/stores/internal/FinatraStoresGlobalManager.scala @@ -0,0 +1,60 @@ +package com.twitter.finatra.streams.stores.internal + +import scala.collection.JavaConverters._ +import com.google.common.collect.{ArrayListMultimap, Multimaps} +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.transformer.domain.{CompositeKey, TimeWindowed} +import scala.reflect.ClassTag + +/** + * This class stores a global list of all Finatra Key Value Stores for use by Finatra's + * queryable state functionality. + * + * Note: We maintain our own global list of state stores so we can retrieve the + * FinatraKeyValueStore implementations directly for querying (the alternative would be to retrieve + * the underlying RocksDBStore which is wrapped by the FinatraKeyValueStore implementations). 
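+ *
+ * A minimal usage sketch (the store name and key/value types below are hypothetical): stores
+ * register themselves at creation time and query services look them up by name:
+ *
+ * {{{
+ *   // at store creation time (done by the Finatra transformers)
+ *   FinatraStoresGlobalManager.addStore(store)
+ *
+ *   // at query time, e.g. from a thrift query service
+ *   val stores = FinatraStoresGlobalManager.getStores[String, Long]("wordCountStore")
+ * }}}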
+ */ +object FinatraStoresGlobalManager { + + private[stores] val queryableStateStoreNameToStores = + Multimaps.synchronizedMultimap(ArrayListMultimap.create[String, FinatraKeyValueStore[_, _]]()) + + /** + * @return True if the added store was new + */ + def addStore[VV, KK: ClassTag](store: FinatraKeyValueStore[KK, VV]): Boolean = { + queryableStateStoreNameToStores.put(store.name, store) + } + + def removeStore(store: FinatraKeyValueStore[_, _]): Unit = { + queryableStateStoreNameToStores.remove(store.name, store) + } + + def getWindowedCompositeStores[PK, SK, V]( + storeName: String + ): Iterable[FinatraKeyValueStore[TimeWindowed[CompositeKey[PK, SK]], V]] = { + queryableStateStoreNameToStores + .get(storeName) + .asScala + .asInstanceOf[Iterable[FinatraKeyValueStore[TimeWindowed[CompositeKey[PK, SK]], V]]] + .filter(_.isOpen) + } + + def getWindowedStores[K, V]( + storeName: String + ): Iterable[FinatraKeyValueStore[TimeWindowed[K], V]] = { + queryableStateStoreNameToStores + .get(storeName) + .asScala + .asInstanceOf[Iterable[FinatraKeyValueStore[TimeWindowed[K], V]]] + .filter(_.isOpen) + } + + def getStores[K, V](storeName: String): Iterable[FinatraKeyValueStore[K, V]] = { + queryableStateStoreNameToStores + .get(storeName) + .asScala + .asInstanceOf[Iterable[FinatraKeyValueStore[K, V]]] + .filter(_.isOpen) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/thriftscala/WindowResultType.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/thriftscala/WindowResultType.scala new file mode 100644 index 0000000000..563eec85aa --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/thriftscala/WindowResultType.scala @@ -0,0 +1,11 @@ +package com.twitter.finatra.streams.thriftscala + +object WindowResultType { + @deprecated("Use com.twitter.finatra.streams.transformer.domain.WindowClosed") + object WindowClosed + extends com.twitter.finatra.streams.transformer.domain.WindowResultType( + com.twitter.finatra.streams.transformer.domain.WindowClosed.value) { + + override def toString: String = "WindowClosed" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/AggregatorTransformer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/AggregatorTransformer.scala new file mode 100644 index 0000000000..da63f1dc87 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/AggregatorTransformer.scala @@ -0,0 +1,206 @@ +package com.twitter.finatra.streams.transformer + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.streams.stores.CachingFinatraKeyValueStore +import com.twitter.finatra.streams.transformer.FinatraTransformer.WindowStartTime +import com.twitter.finatra.streams.transformer.domain._ +import com.twitter.util.Duration +import it.unimi.dsi.fastutil.longs.LongOpenHashSet +import org.apache.kafka.streams.processor.PunctuationType +import org.apache.kafka.streams.state.KeyValueIterator + +/** + * An aggregating transformer for fixed windows which + * offers additional controls that are not included in the built in Kafka Streams Windowing DSL + * + * A TimeWindow is a tumbling window of fixed length defined by the windowSize parameter. + * + * A Window is closed after event time passes the end of a TimeWindow + allowedLateness. 
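+ * For example (hypothetical sizes), with windowSize = 1.hour and allowedLateness = 10.minutes,
+ * the window covering 10:00-11:00 closes once the event-time watermark passes
+ * 10:00 + 1.hour + 10.minutes = 11:10.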
+ * + * After a window is closed, if emitOnClose=true it is forwarded out of this transformer with a + * [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.WindowClosed]] + * + * If a record arrives after a window is closed it is immediately forwarded out of this + * transformer with a [[WindowedValue.resultState]] of [[com.twitter.finatra.streams.transformer.domain.Restatement]] + * + * @param statsReceiver The StatsReceiver for collecting stats + * @param stateStoreName the name of the StateStore used to maintain the counts. + * @param timerStoreName the name of the StateStore used to maintain the timers. + * @param windowSize splits the stream of data into buckets of windowSize length, + * based on the timestamp of each message. + * @param allowedLateness allow messages that are up to this amount late to be added to the + * store; otherwise they are emitted as restatements. + * @param queryableAfterClose allow state to be queried up to this amount after the window is closed. + * @param initializer Initializer function that computes an initial intermediate aggregation result + * @param aggregator Aggregator function that computes a new aggregate result + * @param emitOnClose Emit messages for each entry in the window when the window closes. Emitted + * entries will have a WindowResultType set to WindowClosed. + * @param emitUpdatedEntriesOnCommit Emit messages for each updated entry in the window on the Kafka + * Streams commit interval. Emitted entries will have a + * WindowResultType set to WindowOpen. + * @return a stream of keys for a particular time window, and the aggregations of the values for that + * key within a particular time window. + */ +class AggregatorTransformer[K, V, Aggregate]( + override val statsReceiver: StatsReceiver, + stateStoreName: String, + timerStoreName: String, + windowSize: Duration, + allowedLateness: Duration, + initializer: () => Aggregate, + aggregator: ((K, V), Aggregate) => Aggregate, + customWindowStart: (Time, K, V) => Long, + emitOnClose: Boolean = false, + queryableAfterClose: Duration, + emitUpdatedEntriesOnCommit: Boolean, + val commitInterval: Duration) + extends FinatraTransformerV2[K, V, TimeWindowed[K], WindowedValue[Aggregate]](statsReceiver) + with CachingKeyValueStores[K, V, TimeWindowed[K], WindowedValue[Aggregate]] + with PersistentTimers { + + private val windowSizeMillis = windowSize.inMillis + private val allowedLatenessMillis = allowedLateness.inMillis + private val queryableAfterCloseMillis = queryableAfterClose.inMillis + + private val emitEarlyCounter = statsReceiver.counter("emitEarly") + private val closedWindowCounter = statsReceiver.counter("closedWindows") + private val expiredWindowCounter = statsReceiver.counter("expiredWindows") + private val restatementsCounter = statsReceiver.counter("numRestatements") + + private val longSerializer = ScalaSerdes.Long.serializer + private val nonExpiredWindowStartTimes = new LongOpenHashSet() + + private val stateStore: CachingFinatraKeyValueStore[TimeWindowed[K], Aggregate] = + getCachingKeyValueStore[TimeWindowed[K], Aggregate](stateStoreName) + + private val timerStore = getPersistentTimerStore[WindowStartTime]( + timerStoreName = timerStoreName, + onTimer = onEventTimer, + punctuationType = PunctuationType.STREAM_TIME) + + /* Public */ + + override def onInit(): Unit = { + super.onInit() + nonExpiredWindowStartTimes.clear() + stateStore.registerFlushListener(onFlushed) + } + + override def onMessage(time: Time, key: K, value: V): Unit = { + val 
windowedKey = TimeWindowed.forSize( + startMs = windowStart(time, key, value), + sizeMs = windowSizeMillis, + value = key) + + if (windowedKey.isLate(allowedLatenessMillis, watermark)) { + restatement(time, key, value, windowedKey) + } else { + addWindowTimersIfNew(windowedKey.startMs) + + val currentAggregateValue = stateStore.getOrDefault(windowedKey, initializer()) + stateStore.put(windowedKey, aggregator((key, value), currentAggregateValue)) + } + } + + /* Private */ + + //TODO: Optimize for when Close and Expire are at the same time e.g. TimerMetadata.CloseAndExpire + private def addWindowTimersIfNew(windowStartTime: WindowStartTime): Unit = { + val isNewWindow = nonExpiredWindowStartTimes.add(windowStartTime) + if (isNewWindow) { + val closeTime = windowStartTime + windowSizeMillis + allowedLatenessMillis + if (emitOnClose) { + timerStore.addTimer(Time(closeTime), Close, windowStartTime) + } + + timerStore.addTimer(Time(closeTime + queryableAfterCloseMillis), Expire, windowStartTime) + } + } + + private def onFlushed(timeWindowedKey: TimeWindowed[K], value: Aggregate): Unit = { + if (emitUpdatedEntriesOnCommit) { + emitEarlyCounter.incr() + val existing = stateStore.get(timeWindowedKey) + forward( + key = timeWindowedKey, + value = WindowedValue(resultState = WindowOpen, value = existing), + timestamp = forwardTime) + } + } + + private def restatement(time: Time, key: K, value: V, windowedKey: TimeWindowed[K]): Unit = { + val windowedValue = + WindowedValue(resultState = Restatement, value = aggregator((key, value), initializer())) + + forward(key = windowedKey, value = windowedValue, timestamp = forwardTime) + + restatementsCounter.incr() + } + + private def onEventTimer( + time: Time, + timerMetadata: TimerMetadata, + windowStartTime: WindowStartTime + ): Unit = { + debug(s"onEventTimer $time $timerMetadata WindowStartTime(${windowStartTime.iso8601Millis})") + val windowedEntriesIterator = stateStore.range( + fromBytesInclusive = windowStartTimeBytes(windowStartTime), + toBytesExclusive = windowStartTimeBytes(windowStartTime + 1)) + + try { + if (timerMetadata == Close) { + onClosed(windowStartTime, windowedEntriesIterator) + } else { + onExpired(windowStartTime, windowedEntriesIterator) + } + } finally { + windowedEntriesIterator.close() + } + } + + private def onClosed( + windowStartTime: WindowStartTime, + windowIterator: KeyValueIterator[TimeWindowed[K], Aggregate] + ): Unit = { + while (windowIterator.hasNext) { + val entry = windowIterator.next() + assert(entry.key.startMs == windowStartTime) + forward( + key = entry.key, + value = WindowedValue(resultState = WindowClosed, value = entry.value), + timestamp = forwardTime) + } + + closedWindowCounter.incr() + } + + private def onExpired( + windowStartTime: WindowStartTime, + windowIterator: KeyValueIterator[TimeWindowed[K], Aggregate] + ): Unit = { + stateStore.deleteRangeExperimentalWithNoChangelogUpdates( + beginKeyInclusive = windowStartTimeBytes(windowStartTime), + endKeyExclusive = windowStartTimeBytes(windowStartTime + 1)) + + nonExpiredWindowStartTimes.remove(windowStartTime) + + expiredWindowCounter.incr() + } + + private def windowStartTimeBytes(windowStartMs: Long): Array[Byte] = { + longSerializer.serialize("", windowStartMs) + } + + private def windowStart(time: Time, key: K, value: V): Long = { + if (customWindowStart != null) { + customWindowStart(time, key, value) + } else { + TimeWindowed.windowStart(time, windowSizeMillis) + } + } + + private def forwardTime: Long = { + watermark.timeMillis + } +} diff --git 
a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/CachingKeyValueStores.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/CachingKeyValueStores.scala new file mode 100644 index 0000000000..615bcb8a73 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/CachingKeyValueStores.scala @@ -0,0 +1,47 @@ +package com.twitter.finatra.streams.transformer + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafkastreams.processors.FlushingTransformer +import com.twitter.finatra.streams.stores.internal.{ + CachingFinatraKeyValueStoreImpl, + FinatraKeyValueStoreImpl, + FinatraStoresGlobalManager +} +import com.twitter.finatra.streams.stores.{CachingFinatraKeyValueStore, FinatraKeyValueStore} +import scala.collection.mutable +import scala.reflect.ClassTag + +trait CachingKeyValueStores[K, V, K1, V1] extends FlushingTransformer[K, V, K1, V1] { + + protected def statsReceiver: StatsReceiver + + protected def finatraKeyValueStoresMap: mutable.Map[String, FinatraKeyValueStore[_, _]] + + override def onFlush(): Unit = { + finatraKeyValueStoresMap.values.foreach(_.flush()) + } + + /** + * Lookup a caching key value store by name + * @param name The name of the store + * @tparam KK Type of keys in the store + * @tparam VV Type of values in the store + * @return A caching key value store + */ + protected def getCachingKeyValueStore[KK: ClassTag, VV]( + name: String + ): CachingFinatraKeyValueStore[KK, VV] = { + val store = new CachingFinatraKeyValueStoreImpl[KK, VV]( + statsReceiver, + new FinatraKeyValueStoreImpl[KK, VV](name, statsReceiver)) + + val previousStore = finatraKeyValueStoresMap.put(name, store) + assert( + previousStore.isEmpty, + s"getCachingKeyValueStore was called for store $name more than once") + FinatraStoresGlobalManager.addStore(store) + + store + } + +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/CompositeSumAggregator.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/CompositeSumAggregator.scala new file mode 100644 index 0000000000..95375ce2af --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/CompositeSumAggregator.scala @@ -0,0 +1,142 @@ +package com.twitter.finatra.streams.transformer + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.streams.transformer.FinatraTransformer.WindowStartTime +import com.twitter.finatra.streams.transformer.domain._ +import com.twitter.util.Duration +import org.apache.kafka.streams.state.KeyValueIterator + +@deprecated("Use AggregatorTransformer", "1/7/2019") +class CompositeSumAggregator[K, A, CK <: CompositeKey[K, A]]( + commitInterval: Duration, + compositeKeyRangeStart: CK, + statsReceiver: StatsReceiver, + stateStoreName: String, + timerStoreName: String, + windowSize: Duration, + allowedLateness: Duration, + queryableAfterClose: Duration, + emitOnClose: Boolean = true, + maxActionsPerTimer: Int = 25000) + extends FinatraTransformer[ + CK, + Int, + TimeWindowed[CK], + WindowStartTime, + TimeWindowed[K], + WindowedValue[ + scala.collection.Map[A, Int] + ]](timerStoreName = timerStoreName, statsReceiver = statsReceiver, cacheTimers = true) { + + private val windowSizeMillis = windowSize.inMillis + private val allowedLatenessMillis = allowedLateness.inMillis + private val queryableAfterCloseMillis = queryableAfterClose.inMillis + + 
private val restatementsCounter = statsReceiver.counter("numRestatements") + private val deletesCounter = statsReceiver.counter("numDeletes") + + private val closedCounter = statsReceiver.counter("closedWindows") + private val expiredCounter = statsReceiver.counter("expiredWindows") + private val getLatencyStat = statsReceiver.stat("getLatency") + private val putLatencyStat = statsReceiver.stat("putLatency") + + private val stateStore = getKeyValueStore[TimeWindowed[CK], Int](stateStoreName) + + override def onMessage(time: Time, compositeKey: CK, count: Int): Unit = { + val windowedCompositeKey = TimeWindowed.forSize(time.hourMillis, windowSizeMillis, compositeKey) + if (windowedCompositeKey.isLate(allowedLatenessMillis, Watermark(watermark))) { + restatementsCounter.incr() + forward(windowedCompositeKey.map { _ => + compositeKey.primary + }, WindowedValue(Restatement, Map(compositeKey.secondary -> count))) + } else { + val newCount = stateStore.increment( + windowedCompositeKey, + count, + getStat = getLatencyStat, + putStat = putLatencyStat + ) + if (newCount == count) { + val closeTime = windowedCompositeKey.startMs + windowSizeMillis + allowedLatenessMillis + if (emitOnClose) { + addEventTimeTimer(Time(closeTime), Close, windowedCompositeKey.startMs) + } + addEventTimeTimer( + Time(closeTime + queryableAfterCloseMillis), + Expire, + windowedCompositeKey.startMs + ) + } + } + } + + /* + * TimeWindowedKey(2018-08-04T10:00:00.000Z-20-displayed) -> 50 + * TimeWindowedKey(2018-08-04T10:00:00.000Z-20-fav) -> 10 + * TimeWindowedKey(2018-08-04T10:00:00.000Z-30-displayed) -> 30 + * TimeWindowedKey(2018-08-04T10:00:00.000Z-40-retweet) -> 4 + */ + //Note: We use the cursor even for deletes to skip tombstones that may otherwise slow down the range scan + override def onEventTimer( + time: Time, + timerMetadata: TimerMetadata, + windowStartMs: WindowStartTime, + cursor: Option[TimeWindowed[CK]] + ): TimerResult[TimeWindowed[CK]] = { + debug(s"onEventTimer $time $timerMetadata") + val windowIterator = stateStore.range( + cursor getOrElse TimeWindowed + .forSize(windowStartMs, windowSizeMillis, compositeKeyRangeStart), + TimeWindowed.forSize(windowStartMs + 1, windowSizeMillis, compositeKeyRangeStart) + ) + + try { + if (timerMetadata == Close) { + onClosed(windowStartMs, windowIterator) + } else { + onExpired(windowIterator) + } + } finally { + windowIterator.close() + } + } + + private def onClosed( + windowStartMs: Long, + windowIterator: KeyValueIterator[TimeWindowed[CK], Int] + ): TimerResult[TimeWindowed[CK]] = { + windowIterator + .groupBy( + primaryKey = timeWindowed => timeWindowed.value.primary, + secondaryKey = timeWindowed => timeWindowed.value.secondary, + mapValue = count => count, + sharedMap = true + ) + .take(maxActionsPerTimer) + .foreach { + case (key, countsMap) => + forward( + key = TimeWindowed.forSize(windowStartMs, windowSizeMillis, key), + value = WindowedValue(resultState = WindowClosed, value = countsMap) + ) + } + + deleteOrRetainTimer(windowIterator, onDeleteTimer = closedCounter.incr()) + } + + //Note: We call "put" w/ a null value instead of calling "delete" since "delete" also gets the previous value :-/ + //TODO: Consider performing deletes in a transaction so that queryable state sees all or no keys per "primary key" + private def onExpired( + windowIterator: KeyValueIterator[TimeWindowed[CK], Int] + ): TimerResult[TimeWindowed[CK]] = { + windowIterator + .take(maxActionsPerTimer) + .foreach { + case (timeWindowedCompositeKey, count) => + deletesCounter.incr() + 
stateStore.put(timeWindowedCompositeKey, null.asInstanceOf[Int]) + } + + deleteOrRetainTimer(windowIterator, onDeleteTimer = expiredCounter.incr()) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/FinatraTransformer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/FinatraTransformer.scala new file mode 100644 index 0000000000..f40f4ac784 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/FinatraTransformer.scala @@ -0,0 +1,396 @@ +package com.twitter.finatra.streams.transformer + +import com.google.common.annotations.Beta +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.stats.{LoadedStatsReceiver, StatsReceiver} +import com.twitter.finatra.kafkastreams.internal.utils.ProcessorContextLogging +import com.twitter.finatra.streams.config.DefaultTopicConfig +import com.twitter.finatra.streams.stores.internal.{ + FinatraKeyValueStoreImpl, + FinatraStoresGlobalManager +} +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.stores.internal.FinatraStoresGlobalManager +import com.twitter.finatra.streams.transformer.FinatraTransformer.TimerTime +import com.twitter.finatra.streams.transformer.domain.{ + DeleteTimer, + RetainTimer, + Time, + TimerMetadata, + TimerResult +} +import com.twitter.finatra.streams.transformer.internal.domain.{Timer, TimerSerde} +import com.twitter.finatra.streams.transformer.internal.{ + OnClose, + OnInit, + ProcessorContextUtils, + StateStoreImplicits, + WatermarkTracker +} +import com.twitter.util.Duration +import org.agrona.collections.ObjectHashSet +import org.apache.kafka.common.serialization.{Serde, Serdes} +import org.apache.kafka.streams.kstream.Transformer +import org.apache.kafka.streams.processor.{ + Cancellable, + ProcessorContext, + PunctuationType, + Punctuator +} +import org.apache.kafka.streams.state.{KeyValueIterator, KeyValueStore, StoreBuilder, Stores} +import org.joda.time.DateTime +import scala.collection.JavaConverters._ +import scala.reflect.ClassTag + +object FinatraTransformer { + type TimerTime = Long + type WindowStartTime = Long + type DateTimeMillis = Long + + def timerStore[TimerKey]( + name: String, + timerKeySerde: Serde[TimerKey] + ): StoreBuilder[KeyValueStore[Timer[TimerKey], Array[Byte]]] = { + Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(name), + TimerSerde(timerKeySerde), + Serdes.ByteArray + ) + .withLoggingEnabled(DefaultTopicConfig.FinatraChangelogConfig) + } +} + +/** + * A KafkaStreams Transformer supporting Per-Key Persistent Timers + * Inspired by Flink's ProcessFunction: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html + * + * Note: Timers are based on a sorted RocksDB KeyValueStore + * Note: Timers that fire at the same time MAY NOT fire in the order which they were added + * + * Example Timer Key Structures (w/ corresponding CountsStore Key Structures) + * {{{ + * ImpressionsCounter (w/ TimerKey storing TweetId) + * TimeWindowedKey(2018-08-04T10:00:00.000Z-20) + * Timer( 2018-08-04T12:00:00.000Z-Expire-2018-08-04T10:00:00.000Z-20 + * TimeWindowedKey(2018-08-04T10:00:00.000Z-30) + * Timer( 2018-08-04T12:00:00.000Z-Expire-2018-08-04T10:00:00.000Z-30 + * + * ImpressionsCounter (w/ TimerKey storing windowStartMs) + * TimeWindowedKey(2018-08-04T10:00:00.000Z-20) + * TimeWindowedKey(2018-08-04T10:00:00.000Z-30) + * 
TimeWindowedKey(2018-08-04T10:00:00.000Z-40) + * TimeWindowedKey(2018-08-04T11:00:00.000Z-20) + * TimeWindowedKey(2018-08-04T11:00:00.000Z-30) + * Timer( 2018-08-04T12:00:00.000Z-Expire-2018-08-04T10:00:00.000Z + * Timer( 2018-08-04T13:00:00.000Z-Expire-2018-08-04T11:00:00.000Z + * + * EngagementCounter (w/ TimerKey storing windowStartMs) + * TimeWindowedKey(2018-08-04T10:00:00.000Z-20-displayed) -> 5 + * TimeWindowedKey(2018-08-04T10:00:00.000Z-20-fav) -> 10 + * Timer( 2018-08-04T12:00:00.000Z-Expire-2018-08-04T10:00:00.000Z + * + * @tparam InputKey Type of the input keys + * @tparam InputValue Type of the input values + * @tparam StoreKey Type of the key being stored in the state store (needed to support onEventTimer cursoring) + * @tparam TimerKey Type of the timer key + * @tparam OutputKey Type of the output keys + * @tparam OutputValue Type of the output values + * }}} + */ +//TODO: Create variant for when there are no timers (e.g. avoid the extra time params and need to specify a timer store +@Beta +abstract class FinatraTransformer[InputKey, InputValue, StoreKey, TimerKey, OutputKey, OutputValue]( + commitInterval: Duration = null, //TODO: This field is currently only used by one external customer (but unable to @deprecate a constructor param). Will remove from caller and here in followup Phab. + cacheTimers: Boolean = true, + throttlingResetDuration: Duration = 3.seconds, + disableTimers: Boolean = false, + timerStoreName: String, + statsReceiver: StatsReceiver = LoadedStatsReceiver) //TODO + extends Transformer[InputKey, InputValue, (OutputKey, OutputValue)] + with OnInit + with OnClose + with StateStoreImplicits + with IteratorImplicits + with ProcessorContextLogging { + + /* Private Mutable */ + + @volatile private var _context: ProcessorContext = _ + @volatile private var cancellableThrottlingResetTimer: Cancellable = _ + @volatile private var processingTimerCancellable: Cancellable = _ + @volatile private var nextTimer: Long = Long.MaxValue //Maintain to avoid iterating timerStore every time fireTimers is called + + //TODO: Persist cursor in stateStore to avoid duplicate cursored work after a restart + @volatile private var throttled: Boolean = false + @volatile private var lastThrottledCursor: Option[StoreKey] = None + + /* Private */ + + private val watermarkTracker = new WatermarkTracker + private val cachedTimers = new ObjectHashSet[Timer[TimerKey]](16) + private val finatraKeyValueStores = + scala.collection.mutable.Map[String, FinatraKeyValueStore[_, _]]() + + protected[finatra] final val timersStore = if (disableTimers) { + null + } else { + getKeyValueStore[Timer[TimerKey], Array[Byte]](timerStoreName) + } + + /* Abstract */ + + protected[finatra] def onMessage(messageTime: Time, key: InputKey, value: InputValue): Unit + + protected def onProcessingTimer(time: TimerTime): Unit = {} + + /** + * Callback for when an Event timer is ready for processing + * + * @return TimerResult indicating if this timer should be retained or deleted + */ + protected def onEventTimer( + time: Time, + metadata: TimerMetadata, + key: TimerKey, + cursor: Option[StoreKey] + ): TimerResult[StoreKey] = { + warn(s"Unhandled timer $time $metadata $key") + DeleteTimer() + } + + /* Protected */ + + final override def init(processorContext: ProcessorContext): Unit = { + _context = processorContext + + for ((name, store) <- finatraKeyValueStores) { + store.init(processorContext, null) + } + + if (!disableTimers) { + cancellableThrottlingResetTimer = _context + .schedule( + 
throttlingResetDuration.inMillis, + PunctuationType.WALL_CLOCK_TIME, + new Punctuator { + override def punctuate(timestamp: TimerTime): Unit = { + resetThrottled() + fireEventTimeTimers() + } + } + ) + + findAndSetNextTimer() + cacheTimersIfEnabled() + } + + onInit() + } + + override protected def processorContext: ProcessorContext = _context + + final override def transform(k: InputKey, v: InputValue): (OutputKey, OutputValue) = { + if (watermarkTracker.track(_context.topic(), _context.timestamp)) { + fireEventTimeTimers() + } + + debug(s"onMessage ${_context.timestamp.iso8601Millis} $k $v") + onMessage(Time(_context.timestamp()), k, v) + + null + } + + final override def close(): Unit = { + setNextTimerTime(0) + cachedTimers.clear() + watermarkTracker.reset() + + if (cancellableThrottlingResetTimer != null) { + cancellableThrottlingResetTimer.cancel() + cancellableThrottlingResetTimer = null + } + + if (processingTimerCancellable != null) { + processingTimerCancellable.cancel() + processingTimerCancellable = null + } + + for ((name, store) <- finatraKeyValueStores) { + store.close() + FinatraStoresGlobalManager.removeStore(store) + } + + onClose() + } + + final protected def getKeyValueStore[KK: ClassTag, VV]( + name: String + ): FinatraKeyValueStore[KK, VV] = { + val store = new FinatraKeyValueStoreImpl[KK, VV](name, statsReceiver) + val previousStore = finatraKeyValueStores.put(name, store) + FinatraStoresGlobalManager.addStore(store) + assert(previousStore.isEmpty, s"getKeyValueStore was called for store $name more than once") + + // Initialize stores that are still using the "lazy val store" pattern + if (processorContext != null) { + store.init(processorContext, null) + } + + store + } + + //TODO: Add a forwardOnCommit which just takes a key + final protected def forward(key: OutputKey, value: OutputValue): Unit = { + trace(f"${"Forward:"}%-20s $key $value") + _context.forward(key, value) + } + + final protected def forward(key: OutputKey, value: OutputValue, timestamp: Long): Unit = { + trace(f"${"Forward:"}%-20s $key $value @${new DateTime(timestamp)}") + ProcessorContextUtils.setTimestamp(_context, timestamp) + _context.forward(key, value) + } + + final protected def watermark: Long = { + watermarkTracker.watermark + } + + final protected def addEventTimeTimer( + time: Time, + metadata: TimerMetadata, + key: TimerKey + ): Unit = { + trace( + f"${"AddEventTimer:"}%-20s ${metadata.getClass.getSimpleName}%-12s Key $key Timer ${time.millis.iso8601Millis}" + ) + val timer = Timer(time = time.millis, metadata = metadata, key = key) + if (cacheTimers && cachedTimers.contains(timer)) { + trace(s"Deduped unkeyed timer: $timer") + } else { + timersStore.put(timer, Array.emptyByteArray) + if (time.millis < nextTimer) { + setNextTimerTime(time.millis) + } + if (cacheTimers) { + cachedTimers.add(timer) + } + } + } + + final protected def addProcessingTimeTimer(duration: Duration): Unit = { + assert( + processingTimerCancellable == null, + "NonPersistentProcessingTimer already set. We currently only support a single processing timer being set through addProcessingTimeTimer." 
+ ) + processingTimerCancellable = + processorContext.schedule(duration.inMillis, PunctuationType.WALL_CLOCK_TIME, new Punctuator { + override def punctuate(time: Long): Unit = { + onProcessingTimer(time) + } + }) + } + + final protected def deleteOrRetainTimer( + iterator: KeyValueIterator[StoreKey, _], + onDeleteTimer: => Unit = () => () + ): TimerResult[StoreKey] = { + if (iterator.hasNext) { + RetainTimer(stateStoreCursor = iterator.peekNextKeyOpt, throttled = true) + } else { + onDeleteTimer + DeleteTimer() + } + } + + /* Private */ + + private def fireEventTimeTimers(): Unit = { + trace( + s"FireTimers watermark ${watermark.iso8601Millis} nextTimer ${nextTimer.iso8601Millis} throttled $throttled" + ) + if (!disableTimers && !isThrottled && watermark >= nextTimer) { + val timerIterator = timersStore.all() + try { + timerIterator.asScala + .takeWhile { timerAndEmptyValue => + !isThrottled && watermark >= timerAndEmptyValue.key.time + } + .foreach { timerAndEmptyValue => + fireEventTimeTimer(timerAndEmptyValue.key) + } + } finally { + timerIterator.close() + findAndSetNextTimer() //TODO: Optimize by avoiding the need to re-read from the timersStore iterator + } + } + } + + //Note: LastThrottledCursor is shared per Task. However, since the timers are sorted, we should only be cursoring the head timer at a time. + private def fireEventTimeTimer(timer: Timer[TimerKey]): Unit = { + trace( + s"fireEventTimeTimer ${timer.metadata.getClass.getName} key: ${timer.key} timerTime: ${timer.time.iso8601Millis}" + ) + + onEventTimer( + time = Time(timer.time), + metadata = timer.metadata, + key = timer.key, + lastThrottledCursor + ) match { + case DeleteTimer(throttledResult) => + lastThrottledCursor = None + throttled = throttledResult + + timersStore.deleteWithoutGettingPriorValue(timer) + if (cacheTimers) { + cachedTimers.remove(timer) + } + case RetainTimer(stateStoreCursor, throttledResult) => + lastThrottledCursor = stateStoreCursor + throttled = throttledResult + } + } + + private def findAndSetNextTimer(): Unit = { + val iterator = timersStore.all() + try { + if (iterator.hasNext) { + setNextTimerTime(iterator.peekNextKey.time) + } else { + setNextTimerTime(Long.MaxValue) + } + } finally { + iterator.close() + } + } + + private def setNextTimerTime(time: TimerTime): Unit = { + nextTimer = time + if (time != Long.MaxValue) { + trace(s"NextTimer: ${nextTimer.iso8601Millis}") + } + } + + private def cacheTimersIfEnabled(): Unit = { + if (cacheTimers) { + val iterator = timersStore.all() + try { + for (timerKeyValue <- iterator.asScala) { + val timer = timerKeyValue.key + cachedTimers.add(timer) + } + } finally { + iterator.close() + } + } + } + + private def resetThrottled(): Unit = { + throttled = false + } + + private def isThrottled: Boolean = { + throttled + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/FinatraTransformerV2.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/FinatraTransformerV2.scala new file mode 100644 index 0000000000..4adf00758c --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/FinatraTransformerV2.scala @@ -0,0 +1,199 @@ +package com.twitter.finatra.streams.transformer + +import com.google.common.annotations.Beta +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.utils.ConfigUtils +import com.twitter.finatra.kafkastreams.internal.utils.ProcessorContextLogging +import 
com.twitter.finatra.streams.flags.FinatraTransformerFlags
+import com.twitter.finatra.streams.stores.FinatraKeyValueStore
+import com.twitter.finatra.streams.stores.internal.{
+  FinatraKeyValueStoreImpl,
+  FinatraStoresGlobalManager
+}
+import com.twitter.finatra.streams.transformer.FinatraTransformer.TimerTime
+import com.twitter.finatra.streams.transformer.domain.{Time, Watermark}
+import com.twitter.finatra.streams.transformer.internal.{OnClose, OnInit}
+import com.twitter.finatra.streams.transformer.watermarks.internal.WatermarkManager
+import com.twitter.finatra.streams.transformer.watermarks.{
+  DefaultWatermarkAssignor,
+  WatermarkAssignor
+}
+import com.twitter.util.Duration
+import org.apache.kafka.streams.kstream.Transformer
+import org.apache.kafka.streams.processor.{
+  Cancellable,
+  ProcessorContext,
+  PunctuationType,
+  Punctuator,
+  To
+}
+import scala.collection.mutable
+import scala.reflect.ClassTag
+
+/**
+ * A Kafka Streams Transformer offering an upgraded API over the built-in Transformer interface.
+ *
+ * This Transformer differs from the built-in Transformer interface by exposing an [onMessage]
+ * interface that is used to process incoming messages. Within [onMessage] you may use the
+ * [forward] method to emit 0 or more records.
+ *
+ * This transformer also manages watermarks (see [WatermarkManager]) and extends [OnWatermark],
+ * which allows you to track the passage of event time.
+ *
+ * Note: In time, this class will replace the deprecated FinatraTransformer class.
+ *
+ * @tparam InputKey Type of the input keys
+ * @tparam InputValue Type of the input values
+ * @tparam OutputKey Type of the output keys
+ * @tparam OutputValue Type of the output values
+ */
+@Beta
+abstract class FinatraTransformerV2[InputKey, InputValue, OutputKey, OutputValue](
+  statsReceiver: StatsReceiver,
+  watermarkAssignor: WatermarkAssignor[InputKey, InputValue] =
+    new DefaultWatermarkAssignor[InputKey, InputValue])
+    extends Transformer[InputKey, InputValue, (OutputKey, OutputValue)]
+    with OnInit
+    with OnWatermark
+    with OnClose
+    with ProcessorContextLogging {
+
+  protected[streams] val finatraKeyValueStoresMap: mutable.Map[String, FinatraKeyValueStore[_, _]] =
+    scala.collection.mutable.Map[String, FinatraKeyValueStore[_, _]]()
+
+  private var watermarkManager: WatermarkManager[InputKey, InputValue] = _
+
+  /* Private Mutable */
+
+  @volatile private var _context: ProcessorContext = _
+  @volatile private var watermarkTimerCancellable: Cancellable = _
+
+  /* Abstract */
+
+  /**
+   * Callback method which is called for every message in the stream this Transformer is attached to.
+   * Implementers of this method may emit 0 or more records by using the processorContext.
+ * + * @param messageTime the time of the message + * @param key the key of the message + * @param value the value of the message + */ + protected[finatra] def onMessage(messageTime: Time, key: InputKey, value: InputValue): Unit + + /* Protected */ + + override protected def processorContext: ProcessorContext = _context + + final override def init(processorContext: ProcessorContext): Unit = { + _context = processorContext + + watermarkManager = new WatermarkManager[InputKey, InputValue]( + onWatermark = this, + watermarkAssignor = watermarkAssignor, + emitWatermarkPerMessage = shouldEmitWatermarkPerMessage(_context)) + + for ((name, store) <- finatraKeyValueStoresMap) { + store.init(processorContext, null) + } + + val autoWatermarkInterval = parseAutoWatermarkInterval(_context).inMillis + if (autoWatermarkInterval > 0) { + watermarkTimerCancellable = _context.schedule( + autoWatermarkInterval, + PunctuationType.WALL_CLOCK_TIME, + new Punctuator { + override def punctuate(timestamp: TimerTime): Unit = { + watermarkManager.callOnWatermarkIfChanged() + } + } + ) + } + + onInit() + } + + override def onWatermark(watermark: Watermark): Unit = { + trace(s"onWatermark $watermark") + } + + final override def transform(k: InputKey, v: InputValue): (OutputKey, OutputValue) = { + /* Note: It's important to save off the message time before watermarkManager.onMessage is called + which can trigger persistent timers to fire, which can cause messages to be forwarded, which + can cause context.timestamp to be mutated to the forwarded message timestamp :-( */ + val messageTime = Time(_context.timestamp()) + + debug(s"onMessage $watermark MessageTime(${messageTime.millis.iso8601Millis}) $k -> $v") + watermarkManager.onMessage(messageTime, _context.topic(), k, v) + onMessage(messageTime, k, v) + null + } + + final override def close(): Unit = { + if (watermarkTimerCancellable != null) { + watermarkTimerCancellable.cancel() + watermarkTimerCancellable = null + } + watermarkManager.close() + + for ((name, store) <- finatraKeyValueStoresMap) { + store.close() + FinatraStoresGlobalManager.removeStore(store) + } + + onClose() + } + + final protected def getKeyValueStore[KK: ClassTag, VV]( + name: String + ): FinatraKeyValueStore[KK, VV] = { + val store = new FinatraKeyValueStoreImpl[KK, VV](name, statsReceiver) + + val previousStore = finatraKeyValueStoresMap.put(name, store) + assert(previousStore.isEmpty, s"getKeyValueStore was called for store $name more than once") + FinatraStoresGlobalManager.addStore(store) + + // Initialize stores that are still using the "lazy val store" pattern + if (processorContext != null) { + store.init(processorContext, null) + } + + store + } + + final protected def forward(key: OutputKey, value: OutputValue): Unit = { + debug(s"Forward ${_context.timestamp().iso8601Millis} $key $value") + _context.forward(key, value) + } + + final protected def forward(key: OutputKey, value: OutputValue, timestamp: Long): Unit = { + if (timestamp <= 10000) { + warn(s"Forward SMALL TIMESTAMP: $timestamp $key $value") + } else { + debug(s"Forward ${timestamp.iso8601Millis} $key $value") + } + + _context.forward(key, value, To.all().withTimestamp(timestamp)) + } + + final protected def watermark: Watermark = { + watermarkManager.watermark + } + + private def parseAutoWatermarkInterval(processorContext: ProcessorContext): Duration = { + Duration.parse( + ConfigUtils.getConfigOrElse( + processorContext.appConfigs, + FinatraTransformerFlags.AutoWatermarkInterval, + "100.milliseconds" + ) + ) + } + + 
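+  /* For illustration only: a minimal sketch (with hypothetical type and store names) of how a
+     concrete subclass is expected to combine getKeyValueStore, onMessage and forward:
+
+     {{{
+     class UppercaseTransformer(statsReceiver: StatsReceiver)
+       extends FinatraTransformerV2[String, String, String, String](statsReceiver) {
+
+       private val seenStore = getKeyValueStore[String, String]("seen-values-store")
+
+       override def onMessage(messageTime: Time, key: String, value: String): Unit = {
+         seenStore.put(key, value) // remember the last value seen for this key
+         forward(key, value.toUpperCase) // emit zero or more output records per input message
+       }
+     }
+     }}}
+  */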
private def shouldEmitWatermarkPerMessage(processorContext: ProcessorContext): Boolean = {
+    ConfigUtils
+      .getConfigOrElse(
+        configs = processorContext.appConfigs,
+        key = FinatraTransformerFlags.EmitWatermarkPerMessage,
+        default = "false").toBoolean
+  }
+}
diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/IteratorImplicits.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/IteratorImplicits.scala
new file mode 100644
index 0000000000..91022fbadf
--- /dev/null
+++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/IteratorImplicits.scala
@@ -0,0 +1,107 @@
+package com.twitter.finatra.streams.transformer
+
+import org.agrona.collections.{Hashing, Object2ObjectHashMap}
+import org.apache.kafka.streams.state.KeyValueIterator
+import scala.collection.JavaConverters._
+trait IteratorImplicits {
+
+  implicit class RichIterator[T](iterator: Iterator[T]) {
+
+    final def multiSpan[SpanId](getSpanId: T => SpanId): Iterator[Iterator[T]] = {
+      new MultiSpanIterator(iterator, getSpanId)
+    }
+  }
+
+  /* ------------------------------------------ */
+  implicit class RichKeyValueIterator[K, V](keyValueIterator: KeyValueIterator[K, V]) {
+
+    final def peekNextKeyOpt: Option[K] = {
+      if (keyValueIterator.hasNext) {
+        Some(keyValueIterator.peekNextKey())
+      } else {
+        None
+      }
+    }
+
+    final def keys: Iterator[K] = {
+      new Iterator[K] {
+        override def hasNext: Boolean = {
+          keyValueIterator.hasNext
+        }
+
+        override def next(): K = {
+          keyValueIterator.next().key
+        }
+      }
+    }
+
+    final def values: Iterator[V] = {
+      new Iterator[V] {
+        override def hasNext: Boolean = {
+          keyValueIterator.hasNext
+        }
+
+        override def next(): V = {
+          keyValueIterator.next().value
+        }
+      }
+    }
+
+    //NOTE: If sharedMap is set to true, a shared mutable map is returned by the iterator. You must immediately use the map's contents or copy the map; otherwise, the map's
+    //      contents will change after each iteration of the iterator!
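+    // Illustrative usage (hypothetical types; assumes a KeyValueIterator[CompositeKey[UserId, EngagementType], Long]):
+    //   {{{
+    //   storeIterator
+    //     .groupBy[UserId, EngagementType, Long](
+    //       primaryKey = compositeKey => compositeKey.primary,
+    //       secondaryKey = compositeKey => compositeKey.secondary,
+    //       mapValue = count => count,
+    //       sharedMap = true)
+    //     .map { case (userId, counts) => userId -> counts.toMap } // copy the shared map before the next iteration
+    //   }}}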
+ final def groupBy[PrimaryKey, SecondaryKey, MappedValue]( + primaryKey: K => PrimaryKey, + secondaryKey: K => SecondaryKey, + mapValue: V => MappedValue, + filterSecondaryKey: (SecondaryKey => Boolean) = (_: SecondaryKey) => true, + sharedMap: Boolean = false + ): Iterator[(PrimaryKey, scala.collection.Map[SecondaryKey, MappedValue])] = { + new Iterator[(PrimaryKey, scala.collection.Map[SecondaryKey, MappedValue])] { + final override def hasNext: Boolean = keyValueIterator.hasNext + + final override def next(): (PrimaryKey, scala.collection.Map[SecondaryKey, MappedValue]) = { + val secondaryKeyMap = getSecondaryMap() + + val currentPartition = primaryKey(keyValueIterator.peekNextKey()) + while (keyValueIterator.hasNext && primaryKey(keyValueIterator.peekNextKey) == currentPartition) { + val entry = keyValueIterator.next() + val secondaryKeyToAdd = secondaryKey(entry.key) + if (filterSecondaryKey(secondaryKeyToAdd)) { + secondaryKeyMap.put( + secondaryKeyToAdd.asInstanceOf[Any], + mapValue(entry.value).asInstanceOf[Any] + ) + } + } + currentPartition -> secondaryKeyMap.asScala + } + + private var reusableSharedMap: Object2ObjectHashMap[SecondaryKey, MappedValue] = _ + + private def getSecondaryMap(): Object2ObjectHashMap[SecondaryKey, MappedValue] = { + if (sharedMap) { + if (reusableSharedMap == null) { + reusableSharedMap = createMap() + } else { + reusableSharedMap.clear() + } + reusableSharedMap + } else { + createMap() + } + } + + private def createMap() = { + new Object2ObjectHashMap[SecondaryKey, MappedValue](16, Hashing.DEFAULT_LOAD_FACTOR) + } + } + } + + final def take(num: Int): Iterator[(K, V)] = { + keyValueIterator.asScala + .take(num) + .map(keyValue => (keyValue.key, keyValue.value)) + } + } + +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/MultiSpanIterator.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/MultiSpanIterator.scala new file mode 100644 index 0000000000..c0f15271df --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/MultiSpanIterator.scala @@ -0,0 +1,60 @@ +package com.twitter.finatra.streams.transformer + +/** + * This Iterator will take an Iterator and split it into subiterators, where each subiterator + * contains all contiguous elements that have the same span as defined by the getSpan function. + * + * For example: + * + * Passing in an Iterator(1, 1, 1, 2, 2, 3) with a span function of `identity` will yield: + * {{{ + * Iterator( + * Iterator(1,1,1), + * Iterator(2,2), + * Iterator(3)) + * }}} + * + * If there are multiple elements that have the same span, but they are not contiguous then + * they will be returned in separate subiterators. 
+ * + * For example: + * + * Passing in an Iterator(1,2,1,2) with a span function of `identity` will yield: + * {{{ + * Iterator( + * Iterator(1), + * Iterator(2), + * Iterator(1), + * Iterator(2)) + * }}} + * + * Contiguous is defined by the Iterator.span function: + * + * @see [[scala.collection.Iterator.span]] + * + * @param iterator The iterator to split + * @param getSpanId A function of item to span + * @tparam T the type of the item + * @tparam SpanId the type of the span + */ +class MultiSpanIterator[T, SpanId](private var iterator: Iterator[T], getSpanId: T => SpanId) + extends Iterator[Iterator[T]] { + + override def hasNext: Boolean = { + iterator.nonEmpty + } + + override def next(): Iterator[T] = { + val headItem = iterator.next + val headSpanId = getSpanId(headItem) + + val (contiguousItems, remainingItems) = iterator.span { currentItem => + getSpanId(currentItem) == headSpanId + } + + // mutate the iterator member to be the remaining items + iterator = remainingItems + + Iterator(headItem) ++ contiguousItems + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/OnWatermark.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/OnWatermark.scala new file mode 100644 index 0000000000..249198150a --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/OnWatermark.scala @@ -0,0 +1,7 @@ +package com.twitter.finatra.streams.transformer + +import com.twitter.finatra.streams.transformer.domain.Watermark + +trait OnWatermark { + def onWatermark(watermark: Watermark): Unit +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PeriodicWatermarkManager.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PeriodicWatermarkManager.scala new file mode 100644 index 0000000000..6b3f9a131d --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PeriodicWatermarkManager.scala @@ -0,0 +1,17 @@ +package com.twitter.finatra.streams.transformer + +trait PeriodicWatermarkManager[K, V] { + + def init(onWatermark: Long => Unit): Unit + + def close(): Unit + + def currentWatermark: Long + + def onMessage(topic: String, timestamp: Long, key: K, value: V): Unit + + /** + * Called every watermarkPeriodicWallClockDuration allowing the Watermark manager decide whether to call onWatermark to emit a new watermark + */ + def onPeriodicWallClockDuration(): Unit +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PersistentTimerStore.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PersistentTimerStore.scala new file mode 100644 index 0000000000..81d2dd5717 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PersistentTimerStore.scala @@ -0,0 +1,154 @@ +package com.twitter.finatra.streams.transformer + +import com.google.common.annotations.Beta +import com.twitter.finatra.streams.converters.time._ +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.transformer.FinatraTransformer.TimerTime +import com.twitter.finatra.streams.transformer.domain.{Time, TimerMetadata, Watermark} +import com.twitter.finatra.streams.transformer.internal.domain.{Timer, TimerSerde} +import com.twitter.inject.Logging +import org.apache.kafka.streams.state.KeyValueIterator + +@Beta +class 
PersistentTimerStore[TimerKey]( + timersStore: FinatraKeyValueStore[Timer[TimerKey], Array[Byte]], + onTimer: (Time, TimerMetadata, TimerKey) => Unit, + maxTimerFiresPerWatermark: Int) + extends OnWatermark + with Logging { + + /* Private Mutable */ + + @volatile private var nextTimerTime: Long = _ + @volatile private var currentWatermark: Watermark = _ + + /* Public */ + + def onInit(): Unit = { + setNextTimerTime(Long.MaxValue) + currentWatermark = Watermark(0) + + val iterator = timersStore.all() + try { + if (iterator.hasNext) { + setNextTimerTime(iterator.next.key.time) + } + } finally { + iterator.close() + } + } + + final override def onWatermark(watermark: Watermark): Unit = { + if (watermark.timeMillis < 10000) { + warn(s"onWatermark too small $watermark") + } else { + trace(s"onWatermark $watermark nextTimerTime ${nextTimerTime.iso8601Millis}") + } + + if (watermark.timeMillis >= nextTimerTime) { + trace(s"Calling fireTimers($watermark)") + fireTimers(watermark) + } + currentWatermark = watermark + } + + def addTimer(time: Time, metadata: TimerMetadata, key: TimerKey): Unit = { + if (time.millis < currentWatermark.timeMillis) { + info( + f"${"DirectlyFireTimer:"}%-20s ${metadata.getClass.getSimpleName}%-12s Key $key Timer $time since $time < $currentWatermark") + + onTimer(time, metadata, key) + } else { + debug(f"${"AddTimer:"}%-20s ${metadata.getClass.getSimpleName}%-12s Key $key Timer $time") + timersStore.put( + Timer(time = time.millis, metadata = metadata, key = key), + Array.emptyByteArray) + + if (time.millis < nextTimerTime) { + setNextTimerTime(time.millis) + } + } + } + + /* Private */ + + private sealed trait TimerIteratorState { + def done: Boolean + } + private object Iterating extends TimerIteratorState { + override val done = false + } + private object FoundTimerAfterWatermark extends TimerIteratorState { + override val done = true + } + private object ExceededMaxTimers extends TimerIteratorState { + override val done = true + } + + // Mostly optimized (although hasNext is still called more times than needed) + private def fireTimers(watermark: Watermark): Unit = { + val timerIterator = timersStoreIterator() + + try { + var timerIteratorState: TimerIteratorState = Iterating + var currentTimer: Timer[TimerKey] = null + var numTimerFires = 0 + + while (timerIterator.hasNext && !timerIteratorState.done) { + currentTimer = timerIterator.next().key + + if (watermark.timeMillis >= currentTimer.time) { + fireAndDeleteTimer(currentTimer) + numTimerFires += 1 + if (numTimerFires >= maxTimerFiresPerWatermark) { + timerIteratorState = ExceededMaxTimers + } + } else { + timerIteratorState = FoundTimerAfterWatermark + } + } + + if (timerIteratorState == FoundTimerAfterWatermark) { + setNextTimerTime(currentTimer.time) + } else if (timerIteratorState == ExceededMaxTimers && timerIterator.hasNext) { + setNextTimerTime(timerIterator.next().key.time) + debug( + s"Exceeded $maxTimerFiresPerWatermark max timer fires per watermark. LastTimerFired: ${currentTimer.time.iso8601Millis} NextTimer: ${nextTimerTime.iso8601Millis}" + ) + } else { + assert(!timerIterator.hasNext) + setNextTimerTime(Long.MaxValue) + } + } finally { + timerIterator.close() + } + } + + /* + * Instead of calling timersStore.all, we perform a range scan starting at our nextTimerTime. This optimization + * avoids a performance issue where timersStore.all may need to traverse lots of tombstoned timers that were + * deleted but not yet compacted. 
+ * + * For more information see: + * https://github.com/facebook/rocksdb/issues/261 + * https://www.reddit.com/r/IAmA/comments/3de3cv/we_are_rocksdb_engineering_team_ask_us_anything/ct4c0fk/ + */ + private def timersStoreIterator(): KeyValueIterator[Timer[TimerKey], Array[Byte]] = { + timersStore.range(TimerSerde.timerTimeToBytes(nextTimerTime)) + } + + private def fireAndDeleteTimer(timer: Timer[TimerKey]): Unit = { + trace(s"fireAndDeleteTimer $timer") + onTimer(Time(timer.time), timer.metadata, timer.key) + timersStore.deleteWithoutGettingPriorValue(timer) + } + + private def setNextTimerTime(time: TimerTime): Unit = { + nextTimerTime = time + if (time != Long.MaxValue) { + trace(s"NextTimer: ${nextTimerTime.iso8601Millis}") + } else { + trace(s"NextTimer: Long.MaxValue") + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PersistentTimers.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PersistentTimers.scala new file mode 100644 index 0000000000..3581314f8a --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/PersistentTimers.scala @@ -0,0 +1,65 @@ +package com.twitter.finatra.streams.transformer + +import com.google.common.annotations.Beta +import com.twitter.finatra.streams.stores.FinatraKeyValueStore +import com.twitter.finatra.streams.transformer.domain.{Time, TimerMetadata, Watermark} +import com.twitter.finatra.streams.transformer.internal.OnInit +import com.twitter.finatra.streams.transformer.internal.domain.Timer +import java.util +import org.apache.kafka.streams.processor.PunctuationType +import scala.reflect.ClassTag + +/** + * Per-Key Persistent Timers inspired by Flink's ProcessFunction: + * https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html + * + * Note: Timers are based on a sorted RocksDB KeyValueStore + * Note: Timers that fire at the same time MAY NOT fire in the order which they were added + */ +@Beta +trait PersistentTimers extends OnWatermark with OnInit { + + private val timerStoresMap = scala.collection.mutable.Map[String, PersistentTimerStore[_]]() + private val timerStores = new util.ArrayList[PersistentTimerStore[_]] + + protected def getKeyValueStore[KK: ClassTag, VV](name: String): FinatraKeyValueStore[KK, VV] + + override def onInit(): Unit = { + val iterator = timerStores.iterator + while (iterator.hasNext) { + iterator.next.onInit() + } + super.onInit() + } + + protected def getPersistentTimerStore[TimerKey]( + timerStoreName: String, + onTimer: (Time, TimerMetadata, TimerKey) => Unit, + punctuationType: PunctuationType, + maxTimerFiresPerWatermark: Int = 10000 + ): PersistentTimerStore[TimerKey] = { + assert(punctuationType == PunctuationType.STREAM_TIME) //TODO: Support WALL CLOCK TIME + + val store = new PersistentTimerStore[TimerKey]( + timersStore = getKeyValueStore[Timer[TimerKey], Array[Byte]](timerStoreName), + onTimer = onTimer, + maxTimerFiresPerWatermark = maxTimerFiresPerWatermark) + + assert( + timerStoresMap.put(timerStoreName, store).isEmpty, + s"getPersistentTimerStore already called for $timerStoreName") + + timerStores.add(store) + + store + } + + //TODO: protected def getCursoredTimerStore[TimerKey, CursorKey] ... 
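+  /* Illustrative sketch (hypothetical transformer, store name and timeout; domain imports assumed)
+     of wiring a persistent timer store into a transformer that mixes in this trait:
+
+     {{{
+     class SessionTransformer(statsReceiver: StatsReceiver)
+       extends FinatraTransformerV2[String, String, String, String](statsReceiver)
+       with PersistentTimers {
+
+       private val sessionTimers = getPersistentTimerStore[String](
+         timerStoreName = "session-timers",
+         onTimer = onSessionTimer,
+         punctuationType = PunctuationType.STREAM_TIME)
+
+       override def onMessage(messageTime: Time, key: String, value: String): Unit = {
+         // (Re)arm a per-key timer 30 minutes past the message's event time
+         sessionTimers.addTimer(messageTime.plus(Duration.fromSeconds(30 * 60)), Expire, key)
+       }
+
+       private def onSessionTimer(time: Time, metadata: TimerMetadata, key: String): Unit = {
+         forward(key, "session expired")
+       }
+     }
+     }}}
+  */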
+ + final override def onWatermark(watermark: Watermark): Unit = { + val iterator = timerStores.iterator + while (iterator.hasNext) { + iterator.next.onWatermark(watermark) + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/SamplingUtils.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/SamplingUtils.scala new file mode 100644 index 0000000000..cf68035731 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/SamplingUtils.scala @@ -0,0 +1,15 @@ +package com.twitter.finatra.streams.transformer + +object SamplingUtils { + def getNumCountsStoreName(sampleName: String): String = { + s"Num${sampleName}CountStore" + } + + def getSampleStoreName(sampleName: String): String = { + s"${sampleName}SampleStore" + } + + def getTimerStoreName(sampleName: String): String = { + s"${sampleName}TimerStore" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/SumAggregator.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/SumAggregator.scala new file mode 100644 index 0000000000..3e7975232f --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/SumAggregator.scala @@ -0,0 +1,114 @@ +package com.twitter.finatra.streams.transformer + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.streams.transformer.FinatraTransformer.WindowStartTime +import com.twitter.finatra.streams.transformer.domain._ +import com.twitter.util.Duration +import org.apache.kafka.streams.state.KeyValueIterator + +@deprecated("Use AggregatorTransformer") +class SumAggregator[K, V]( + commitInterval: Duration, + keyRangeStart: K, + statsReceiver: StatsReceiver, + stateStoreName: String, + timerStoreName: String, + windowSize: Duration, + allowedLateness: Duration, + queryableAfterClose: Duration, + windowStart: (Time, K, V) => Long, + countToAggregate: (K, V) => Int, + emitOnClose: Boolean = true, + maxActionsPerTimer: Int = 25000) + extends FinatraTransformer[ + K, + V, + TimeWindowed[K], + WindowStartTime, + TimeWindowed[K], + WindowedValue[ + Int + ]](timerStoreName = timerStoreName, statsReceiver = statsReceiver, cacheTimers = true) { + + private val windowSizeMillis = windowSize.inMillis + private val allowedLatenessMillis = allowedLateness.inMillis + private val queryableAfterCloseMillis = queryableAfterClose.inMillis + + private val restatementsCounter = statsReceiver.counter("numRestatements") + private val closedCounter = statsReceiver.counter("closedWindows") + private val expiredCounter = statsReceiver.counter("expiredWindows") + + private val stateStore = getKeyValueStore[TimeWindowed[K], Int](stateStoreName) + + override def onMessage(time: Time, key: K, value: V): Unit = { + val windowedKey = TimeWindowed.forSize( + startMs = windowStart(time, key, value), + sizeMs = windowSizeMillis, + value = key + ) + + val count = countToAggregate(key, value) + if (windowedKey.isLate(allowedLatenessMillis, Watermark(watermark))) { + restatementsCounter.incr() + forward(windowedKey, WindowedValue(Restatement, count)) + } else { + val newCount = stateStore.increment(windowedKey, count) + if (newCount == count) { + val closeTime = windowedKey.startMs + windowSizeMillis + allowedLatenessMillis + if (emitOnClose) { + addEventTimeTimer(Time(closeTime), Close, windowedKey.startMs) + } + addEventTimeTimer(Time(closeTime + 
queryableAfterCloseMillis), Expire, windowedKey.startMs) + } + } + } + + override def onEventTimer( + time: Time, + timerMetadata: TimerMetadata, + windowStartMs: WindowStartTime, + cursor: Option[TimeWindowed[K]] + ): TimerResult[TimeWindowed[K]] = { + val hourlyWindowIterator = stateStore.range( + cursor getOrElse TimeWindowed.forSize(windowStartMs, windowSizeMillis, keyRangeStart), + TimeWindowed.forSize(windowStartMs + 1, windowSizeMillis, keyRangeStart) + ) + + try { + if (timerMetadata == Close) { + onClosed(windowStartMs, hourlyWindowIterator) + } else { + onExpired(hourlyWindowIterator) + } + } finally { + hourlyWindowIterator.close() + } + } + + private def onClosed( + windowStartMs: Long, + windowIterator: KeyValueIterator[TimeWindowed[K], Int] + ): TimerResult[TimeWindowed[K]] = { + windowIterator + .take(maxActionsPerTimer) + .foreach { + case (key, value) => + forward(key = key, value = WindowedValue(resultState = WindowClosed, value = value)) + } + + deleteOrRetainTimer(windowIterator, closedCounter.incr()) + } + + private def onExpired( + windowIterator: KeyValueIterator[TimeWindowed[K], Int] + ): TimerResult[TimeWindowed[K]] = { + windowIterator + .take(maxActionsPerTimer) + .foreach { + case (key, value) => + stateStore.deleteWithoutGettingPriorValue(key) + } + + deleteOrRetainTimer(windowIterator, expiredCounter.incr()) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/CompositeKey.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/CompositeKey.scala new file mode 100644 index 0000000000..e948f0ed74 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/CompositeKey.scala @@ -0,0 +1,6 @@ +package com.twitter.finatra.streams.transformer.domain + +trait CompositeKey[P, S] { + def primary: P + def secondary: S +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/FixedTimeWindowedSerde.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/FixedTimeWindowedSerde.scala new file mode 100644 index 0000000000..6573effffb --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/FixedTimeWindowedSerde.scala @@ -0,0 +1,57 @@ +package com.twitter.finatra.streams.transformer.domain + +import com.twitter.finatra.kafka.serde.AbstractSerde +import com.twitter.util.Duration +import java.nio.ByteBuffer +import org.apache.kafka.common.serialization.Serde + +object FixedTimeWindowedSerde { + + def apply[K](inner: Serde[K], duration: Duration): Serde[TimeWindowed[K]] = { + new FixedTimeWindowedSerde[K](inner, duration) + } +} + +/** + * Serde for use when your time windows are fixed length and non-overlapping. 
When this + * condition is met, we are able to avoid serializing 8 bytes for TimeWindowed.endTime + */ +class FixedTimeWindowedSerde[K](val inner: Serde[K], windowSize: Duration) + extends AbstractSerde[TimeWindowed[K]] { + + private val WindowStartTimeSizeBytes = java.lang.Long.BYTES + + private val innerDeserializer = inner.deserializer() + private val innerSerializer = inner.serializer() + private val windowSizeMillis = windowSize.inMillis + assert(windowSizeMillis > 10, "The minimum window size currently supported is 10ms") + + /* Public */ + + final override def deserialize(bytes: Array[Byte]): TimeWindowed[K] = { + val keyBytesSize = bytes.length - WindowStartTimeSizeBytes + val keyBytes = new Array[Byte](keyBytesSize) + + val bb = ByteBuffer.wrap(bytes) + val startMs = bb.getLong() + bb.get(keyBytes) + val endMs = startMs + windowSizeMillis + + TimeWindowed(startMs = startMs, endMs = endMs, innerDeserializer.deserialize(topic, keyBytes)) + } + + final override def serialize(timeWindowedKey: TimeWindowed[K]): Array[Byte] = { + assert( + timeWindowedKey.startMs + windowSizeMillis == timeWindowedKey.endMs, + s"TimeWindowed element being serialized has end time which is not consistent with the FixedTimeWindowedSerde window size of $windowSize. ${timeWindowedKey.startMs + windowSizeMillis} != ${timeWindowedKey.endMs}" + ) + + val keyBytes = innerSerializer.serialize(topic, timeWindowedKey.value) + val windowAndKeyBytesSize = new Array[Byte](WindowStartTimeSizeBytes + keyBytes.length) + + val bb = ByteBuffer.wrap(windowAndKeyBytesSize) + bb.putLong(timeWindowedKey.startMs) + bb.put(keyBytes) + bb.array() + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/IndexedSampleKey.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/IndexedSampleKey.scala new file mode 100644 index 0000000000..71749c9346 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/IndexedSampleKey.scala @@ -0,0 +1,21 @@ +package com.twitter.finatra.streams.transformer.domain + +object IndexedSampleKey { + def rangeStart[SampleKey](sampleKey: SampleKey): IndexedSampleKey[SampleKey] = { + IndexedSampleKey(sampleKey, 0) + } + + def rangeEnd[SampleKey](sampleKey: SampleKey): IndexedSampleKey[SampleKey] = { + IndexedSampleKey(sampleKey, Int.MaxValue) + } +} + +/** + * The key in a sample KeyValue store. Each sample is stored as a row in the table, + * and the index is what makes each row unique. The index is a number of 0..sampleSize + * + * @param sampleKey the user specified key of the sample(e.g. engagement type, or audience) + * @param index a number of 0..sampleSize + * @tparam SampleKey the user specified sample key type. 
+ */ +case class IndexedSampleKey[SampleKey](sampleKey: SampleKey, index: Int) diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/Time.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/Time.scala new file mode 100644 index 0000000000..b0edf56ba2 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/Time.scala @@ -0,0 +1,36 @@ +package com.twitter.finatra.streams.transformer.domain + +import com.twitter.util.Duration +import org.joda.time.DateTimeConstants +import com.twitter.finatra.streams.converters.time._ + +object Time { + def nextInterval(time: Long, duration: Duration): Long = { + val durationMillis = duration.inMillis + val currentNumIntervals = time / durationMillis + (currentNumIntervals + 1) * durationMillis + } +} + +//TODO: Refactor +case class Time(millis: Long) extends AnyVal { + + final def plus(duration: Duration): Time = { + new Time(millis + duration.inMillis) + } + + final def hourMillis: Long = { + val unitsSinceEpoch = millis / DateTimeConstants.MILLIS_PER_HOUR + unitsSinceEpoch * DateTimeConstants.MILLIS_PER_HOUR + } + + final def hourlyWindowed[K](key: K): TimeWindowed[K] = { + val start = hourMillis + val end = start + DateTimeConstants.MILLIS_PER_HOUR + TimeWindowed(start, end, key) + } + + override def toString: String = { + s"Time(${millis.iso8601Millis})" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/TimeWindowed.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/TimeWindowed.scala new file mode 100644 index 0000000000..195d367540 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/TimeWindowed.scala @@ -0,0 +1,86 @@ +package com.twitter.finatra.streams.transformer.domain + +import com.twitter.util.Duration +import org.joda.time.{DateTime, DateTimeConstants} + +object TimeWindowed { + + def forSize[V](startMs: Long, sizeMs: Long, value: V): TimeWindowed[V] = { + TimeWindowed(startMs, startMs + sizeMs, value) + } + + def forSizeFromMessageTime[V](messageTime: Time, sizeMs: Long, value: V): TimeWindowed[V] = { + val windowStartMs = windowStart(messageTime, sizeMs) + TimeWindowed(windowStartMs, windowStartMs + sizeMs, value) + } + + def hourly[V](startMs: Long, value: V): TimeWindowed[V] = { + TimeWindowed(startMs, startMs + DateTimeConstants.MILLIS_PER_HOUR, value) + } + + def windowStart(messageTime: Time, sizeMs: Long): Long = { + (messageTime.millis / sizeMs) * sizeMs + } +} + +/** + * A time windowed value specified by a start and end time + * @param startMs the start timestamp of the window (inclusive) + * @param endMs the end timestamp of the window (exclusive) + */ +case class TimeWindowed[V](startMs: Long, endMs: Long, value: V) { + + /** + * Determine if this windowed value is late given the allowedLateness configuration and the + * current watermark + * + * @param allowedLateness the configured amount of allowed lateness specified in milliseconds + * @param watermark a watermark used to determine if this windowed value is late + * @return If the windowed value is late + */ + def isLate(allowedLateness: Long, watermark: Watermark): Boolean = { + watermark.timeMillis > endMs + allowedLateness + } + + /** + * Determine the start of the next fixed window interval + */ + def nextInterval(time: Long, duration: Duration): Long = { + val 
intervalStart = math.max(startMs, time) + Time.nextInterval(intervalStart, duration) + } + + /** + * Map the time windowed value into another value occurring in the same window + */ + def map[KK](f: V => KK): TimeWindowed[KK] = { + copy(value = f(value)) + } + + /** + * The size of this windowed value in milliseconds + */ + def sizeMillis: Long = endMs - startMs + + final override val hashCode: Int = { + var result = value.hashCode() + result = 31 * result + (startMs ^ (startMs >>> 32)).toInt + result = 31 * result + (endMs ^ (endMs >>> 32)).toInt + result + } + + final override def equals(obj: scala.Any): Boolean = { + obj match { + case other: TimeWindowed[V] => + startMs == other.startMs && + endMs == other.endMs && + value == other.value + case _ => + false + } + } + + override def toString: String = { + s"TimeWindowed(${new DateTime(startMs)}-${new DateTime(endMs)}-$value)" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/TimerMetadata.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/TimerMetadata.scala new file mode 100644 index 0000000000..901cdb9cb9 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/TimerMetadata.scala @@ -0,0 +1,49 @@ +package com.twitter.finatra.streams.transformer.domain + +object TimerMetadata { + def apply(value: Byte): TimerMetadata = { + value match { + case EmitEarly.value => EmitEarly + case Close.value => Close + case Expire.value => Expire + case _ => new TimerMetadata(value) + } + } +} + +/** + * Metadata used to convey the purpose of a + * [[com.twitter.finatra.streams.transformer.internal.domain.Timer]]. + * + * [[TimerMetadata]] represents the following Timer actions: [[EmitEarly]], [[Close]], [[Expire]] + */ +class TimerMetadata(val value: Byte) { + require(value >= 0) + + override def equals(obj: scala.Any): Boolean = { + obj match { + case other: TimerMetadata => value == other.value + case _ => false + } + } + + override def hashCode(): Int = { + value.hashCode() + } + + override def toString: String = { + s"TimerMetadata($value)" + } +} + +object EmitEarly extends TimerMetadata(0) { + override def toString: String = "EmitEarly" +} + +object Close extends TimerMetadata(1) { + override def toString: String = "Close" +} + +object Expire extends TimerMetadata(2) { + override def toString: String = "Expire" +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/Watermark.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/Watermark.scala new file mode 100644 index 0000000000..3357d136ea --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/Watermark.scala @@ -0,0 +1,10 @@ +package com.twitter.finatra.streams.transformer.domain + +import com.twitter.finatra.streams.converters.time._ + +case class Watermark(timeMillis: Long) extends AnyVal { + + override def toString: String = { + s"Watermark(${timeMillis.iso8601Millis})" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowValueResult.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowValueResult.scala new file mode 100644 index 0000000000..8b99291499 --- /dev/null +++ 
b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowValueResult.scala @@ -0,0 +1,40 @@ +package com.twitter.finatra.streams.transformer.domain + +object WindowResultType { + def apply(value: Byte): WindowResultType = { + value match { + case WindowOpen.value => WindowOpen + case WindowClosed.value => WindowClosed + case Restatement.value => Restatement + } + } +} + +class WindowResultType(val value: Byte) { + override def equals(obj: scala.Any): Boolean = { + obj match { + case other: WindowResultType => value == other.value + case _ => false + } + } + + override def hashCode(): Int = { + value.hashCode() + } + + override def toString: String = { + s"WindowResultType($value)" + } +} + +object WindowOpen extends WindowResultType(0) { + override def toString: String = "WindowOpen" +} + +object WindowClosed extends WindowResultType(1) { + override def toString: String = "WindowClosed" +} + +object Restatement extends WindowResultType(2) { + override def toString: String = "Restatement" +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowedValue.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowedValue.scala new file mode 100644 index 0000000000..9a619e8817 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowedValue.scala @@ -0,0 +1,9 @@ +package com.twitter.finatra.streams.transformer.domain + +//TODO: Rename resultState to WindowResultType +case class WindowedValue[V](resultState: WindowResultType, value: V) { + + def map[VV](f: V => VV): WindowedValue[VV] = { + copy(value = f(value)) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowedValueSerde.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowedValueSerde.scala new file mode 100644 index 0000000000..5ea0aeacea --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/WindowedValueSerde.scala @@ -0,0 +1,42 @@ +package com.twitter.finatra.streams.transformer.domain + +import com.twitter.finatra.kafka.serde.AbstractSerde +import java.nio.ByteBuffer +import org.apache.kafka.common.serialization.Serde + +object WindowedValueSerde { + def apply[V](inner: Serde[V]): WindowedValueSerde[V] = { + new WindowedValueSerde[V](inner) + } +} + +/** + * Serde for the [[WindowedValue]] class. + * + * @param inner Serde for [[WindowedValue.value]]. 
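+ *
+ * For illustration (names assumed), wrapping a primitive value serde, typically alongside a
+ * [[FixedTimeWindowedSerde]] for the corresponding windowed key:
+ *
+ * {{{
+ *   import org.apache.kafka.common.serialization.Serdes
+ *
+ *   val countSerde = WindowedValueSerde(Serdes.Long()) // serde for WindowedValue[java.lang.Long]
+ * }}}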
+ */ +class WindowedValueSerde[V](inner: Serde[V]) extends AbstractSerde[WindowedValue[V]] { + + private val innerDeserializer = inner.deserializer() + private val innerSerializer = inner.serializer() + + override def deserialize(bytes: Array[Byte]): WindowedValue[V] = { + val resultState = WindowResultType(bytes(0)) + + val valueBytes = new Array[Byte](bytes.length - 1) + System.arraycopy(bytes, 1, valueBytes, 0, valueBytes.length) + val value = innerDeserializer.deserialize(topic, valueBytes) + + WindowedValue(resultState = resultState, value = value) + } + + override def serialize(windowedValue: WindowedValue[V]): Array[Byte] = { + val valueBytes = innerSerializer.serialize(topic, windowedValue.value) + + val resultTypeAndValueBytes = new Array[Byte](1 + valueBytes.size) + val bb = ByteBuffer.wrap(resultTypeAndValueBytes) + bb.put(windowedValue.resultState.value) + bb.put(valueBytes) + resultTypeAndValueBytes + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/timerResults.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/timerResults.scala new file mode 100644 index 0000000000..78aee918ab --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/domain/timerResults.scala @@ -0,0 +1,33 @@ +package com.twitter.finatra.streams.transformer.domain + +/** + * Indicates the result of a Timer-based operation. + */ +sealed trait TimerResult[SK] { + def map[SKR](f: SK => SKR): TimerResult[SKR] = { + this match { + case result @ RetainTimer(Some(cursor), throttled) => + result.copy(stateStoreCursor = Some(f(cursor))) + case _ => + this.asInstanceOf[TimerResult[SKR]] + } + } +} + +/** + * A [[TimerResult]] that represents the completion of a deletion. + * + * @param throttled Indicates the number of operations has surpassed those allocated + * for a period of time. + */ +case class DeleteTimer[SK](throttled: Boolean = false) extends TimerResult[SK] + +/** + * A [[TimerResult]] that represents the retention of an incomplete deletion. + * + * @param stateStoreCursor A cursor representing the next key in an iterator. + * @param throttled Indicates the number of operations has surpassed those allocated + * for a period of time. 
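+ *
+ * For illustration, a sketch (mirroring `deleteOrRetainTimer`) of how an onEventTimer
+ * implementation typically chooses between retaining and deleting a timer:
+ *
+ * {{{
+ *   if (iterator.hasNext) {
+ *     RetainTimer(stateStoreCursor = iterator.peekNextKeyOpt, throttled = true)
+ *   } else {
+ *     DeleteTimer()
+ *   }
+ * }}}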
+ */ +case class RetainTimer[SK](stateStoreCursor: Option[SK] = None, throttled: Boolean = false) + extends TimerResult[SK] diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/OnClose.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/OnClose.scala new file mode 100644 index 0000000000..960d7fac6b --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/OnClose.scala @@ -0,0 +1,5 @@ +package com.twitter.finatra.streams.transformer.internal + +trait OnClose { + protected def onClose(): Unit = {} +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/OnInit.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/OnInit.scala new file mode 100644 index 0000000000..9070439f5b --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/OnInit.scala @@ -0,0 +1,5 @@ +package com.twitter.finatra.streams.transformer.internal + +trait OnInit { + protected def onInit(): Unit = {} +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/ProcessorContextUtils.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/ProcessorContextUtils.scala new file mode 100644 index 0000000000..d13c2fe90a --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/ProcessorContextUtils.scala @@ -0,0 +1,23 @@ +package com.twitter.finatra.streams.transformer.internal + +import com.twitter.finatra.kafkastreams.internal.utils.ReflectionUtils +import java.lang.reflect.Field +import org.apache.kafka.streams.processor.ProcessorContext +import org.apache.kafka.streams.processor.internals.{ + AbstractProcessorContext, + ProcessorRecordContext +} + +object ProcessorContextUtils { + + private val processorRecordContextTimestampField: Field = + ReflectionUtils.getFinalField(classOf[ProcessorRecordContext], "timestamp") + + //Workaround until new KIP code lands from: https://github.com/apache/kafka/pull/4519/files#diff-7fba7b13f10a41d067e38316bf3f01b6 + //See also: KAFKA-6454 + def setTimestamp(processorContext: ProcessorContext, newTimestamp: Long): Unit = { + val processorRecordContext = processorContext + .asInstanceOf[AbstractProcessorContext].recordContext.asInstanceOf[ProcessorRecordContext] + processorRecordContextTimestampField.setLong(processorRecordContext, newTimestamp) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/StateStoreImplicits.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/StateStoreImplicits.scala new file mode 100644 index 0000000000..08088cdaeb --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/StateStoreImplicits.scala @@ -0,0 +1,45 @@ +package com.twitter.finatra.streams.transformer.internal + +import com.twitter.finagle.stats.Stat +import com.twitter.finatra.kafkastreams.internal.utils.ProcessorContextLogging +import com.twitter.util.Stopwatch +import org.apache.kafka.streams.state.KeyValueStore + +trait StateStoreImplicits extends ProcessorContextLogging { + + /* ------------------------------------------ */ + implicit class RichKeyIntValueStore[SK](keyValueStore: KeyValueStore[SK, 
Int]) { + + /** + * @return the new value associated with the specified key + */ + final def increment(key: SK, amount: Int): Int = { + val existingCount = keyValueStore.get(key) + val newCount = existingCount + amount + trace(s"keyValueStore.put($key, $newCount)") + keyValueStore.put(key, newCount) + newCount + } + + /** + * @return the new value associated with the specified key + */ + final def increment(key: SK, amount: Int, getStat: Stat, putStat: Stat): Int = { + val getElapsed = Stopwatch.start() + val existingCount = keyValueStore.get(key) + val getElapsedMillis = getElapsed.apply().inMillis + getStat.add(getElapsedMillis) + if (getElapsedMillis > 10) { + warn(s"SlowGet $getElapsedMillis ms for key $key") + } + + val newCount = existingCount + amount + + val putElapsed = Stopwatch.start() + keyValueStore.put(key, newCount) + putStat.add(putElapsed.apply().inMillis) + + newCount + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/WatermarkTracker.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/WatermarkTracker.scala new file mode 100644 index 0000000000..010dc56696 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/WatermarkTracker.scala @@ -0,0 +1,32 @@ +package com.twitter.finatra.streams.transformer.internal + +//TODO: Need method called by processing timer so that watermarks can be emitted without input records +class WatermarkTracker { + private var _watermark: Long = 0L + reset() + + def watermark: Long = _watermark + + def reset(): Unit = { + _watermark = 0L + } + + /** + * @param timestamp + * + * @return True if watermark changed + */ + //TODO: Verify topic is correct when merging inputs + //TODO: Also take in deserialized key and value since we can extract source info (e.g. 
source of interactions) + //TODO: Also take in maxOutOfOrder param + //TODO: Use rolling histogram + def track(topic: String, timestamp: Long): Boolean = { + val potentialWatermark = timestamp - 1 + if (potentialWatermark > _watermark) { + _watermark = potentialWatermark + true + } else { + false + } + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/domain/Timer.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/domain/Timer.scala new file mode 100644 index 0000000000..0c48bbef40 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/domain/Timer.scala @@ -0,0 +1,14 @@ +package com.twitter.finatra.streams.transformer.internal.domain + +import com.twitter.finatra.streams.converters.time._ +import com.twitter.finatra.streams.transformer.domain.TimerMetadata + +/** + * @param time Time to fire the timer + */ +case class Timer[K](time: Long, metadata: TimerMetadata, key: K) { + + override def toString: String = { + s"Timer(${metadata.getClass.getName} $key @${time.iso8601Millis})" + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/domain/TimerSerde.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/domain/TimerSerde.scala new file mode 100644 index 0000000000..7145774126 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/internal/domain/TimerSerde.scala @@ -0,0 +1,52 @@ +package com.twitter.finatra.streams.transformer.internal.domain + +import com.google.common.primitives.Longs +import com.twitter.finatra.kafka.serde.AbstractSerde +import com.twitter.finatra.streams.transformer.domain.TimerMetadata +import java.nio.ByteBuffer +import org.apache.kafka.common.serialization.Serde + +object TimerSerde { + def apply[K](inner: Serde[K]): TimerSerde[K] = { + new TimerSerde(inner) + } + + def timerTimeToBytes(time: Long): Array[Byte] = { + ByteBuffer + .allocate(Longs.BYTES) + .putLong(time) + .array() + } +} + +class TimerSerde[K](inner: Serde[K]) extends AbstractSerde[Timer[K]] { + + private val TimerTimeSizeBytes = Longs.BYTES + private val MetadataSizeBytes = 1 + + private val innerDeser = inner.deserializer() + private val innerSer = inner.serializer() + + final override def deserialize(bytes: Array[Byte]): Timer[K] = { + val bb = ByteBuffer.wrap(bytes) + val time = bb.getLong() + val metadata = TimerMetadata(bb.get) + + val keyBytes = new Array[Byte](bb.remaining()) + bb.get(keyBytes) + val key = innerDeser.deserialize(topic, keyBytes) + + Timer(time = time, metadata = metadata, key = key) + } + + final override def serialize(timer: Timer[K]): Array[Byte] = { + val keyBytes = innerSer.serialize(topic, timer.key) + val timerBytes = new Array[Byte](TimerTimeSizeBytes + MetadataSizeBytes + keyBytes.length) + + val bb = ByteBuffer.wrap(timerBytes) + bb.putLong(timer.time) + bb.put(timer.metadata.value) + bb.put(keyBytes) + timerBytes + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/DefaultWatermarkAssignor.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/DefaultWatermarkAssignor.scala new file mode 100644 index 0000000000..665c05e529 --- /dev/null +++ 
b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/DefaultWatermarkAssignor.scala @@ -0,0 +1,21 @@ +package com.twitter.finatra.streams.transformer.watermarks + +import com.twitter.finatra.streams.transformer.domain.{Time, Watermark} +import com.twitter.inject.Logging + +class DefaultWatermarkAssignor[K, V] extends WatermarkAssignor[K, V] with Logging { + + @volatile private var watermark = Watermark(0L) + + override def onMessage(topic: String, timestamp: Time, key: K, value: V): Unit = { + trace(s"onMessage $topic $timestamp $key -> $value") + val potentialWatermark = timestamp.millis - 1 + if (potentialWatermark > watermark.timeMillis) { + watermark = Watermark(potentialWatermark) + } + } + + override def getWatermark: Watermark = { + watermark + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/WatermarkAssignor.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/WatermarkAssignor.scala new file mode 100644 index 0000000000..a70ebc68c7 --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/WatermarkAssignor.scala @@ -0,0 +1,9 @@ +package com.twitter.finatra.streams.transformer.watermarks + +import com.twitter.finatra.streams.transformer.domain.{Time, Watermark} + +trait WatermarkAssignor[K, V] { + def onMessage(topic: String, timestamp: Time, key: K, value: V): Unit + + def getWatermark: Watermark +} diff --git a/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/internal/WatermarkManager.scala b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/internal/WatermarkManager.scala new file mode 100644 index 0000000000..669080f36c --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/com/twitter/finatra/streams/transformer/watermarks/internal/WatermarkManager.scala @@ -0,0 +1,45 @@ +package com.twitter.finatra.streams.transformer.watermarks.internal + +import com.twitter.finatra.streams.transformer.OnWatermark +import com.twitter.finatra.streams.transformer.domain.{Time, Watermark} +import com.twitter.finatra.streams.transformer.watermarks.WatermarkAssignor +import com.twitter.inject.Logging + +class WatermarkManager[K, V]( + onWatermark: OnWatermark, + watermarkAssignor: WatermarkAssignor[K, V], + emitWatermarkPerMessage: Boolean) + extends Logging { + + @volatile private var lastEmittedWatermark = Watermark(0L) + + /* Public */ + + def close(): Unit = { + setLastEmittedWatermark(Watermark(0L)) + } + + def watermark: Watermark = { + lastEmittedWatermark + } + + def onMessage(messageTime: Time, topic: String, key: K, value: V): Unit = { + watermarkAssignor.onMessage(topic = topic, timestamp = messageTime, key = key, value = value) + + if (emitWatermarkPerMessage) { + callOnWatermarkIfChanged() + } + } + + def callOnWatermarkIfChanged(): Unit = { + val currentWatermark = watermarkAssignor.getWatermark + if (currentWatermark.timeMillis > lastEmittedWatermark.timeMillis) { + onWatermark.onWatermark(currentWatermark) + setLastEmittedWatermark(currentWatermark) + } + } + + protected[streams] def setLastEmittedWatermark(newWatermark: Watermark): Unit = { + lastEmittedWatermark = newWatermark + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/FinatraAbstractStoreBuilder.scala 
b/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/FinatraAbstractStoreBuilder.scala new file mode 100644 index 0000000000..cd7578a65a --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/FinatraAbstractStoreBuilder.scala @@ -0,0 +1,13 @@ +package org.apache.kafka.streams.state.internals + +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.common.utils.Time +import org.apache.kafka.streams.processor.StateStore + +/* Note: To avoid code duplication for now, this class is created for access to package protected AbstractStoreBuilder */ +abstract class FinatraAbstractStoreBuilder[K, V, T <: StateStore]( + name: String, + keySerde: Serde[K], + valueSerde: Serde[V], + time: Time) + extends AbstractStoreBuilder[K, V, T](name, keySerde, valueSerde, time) diff --git a/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/InMemoryKeyValueFlushingStoreBuilder.scala b/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/InMemoryKeyValueFlushingStoreBuilder.scala new file mode 100644 index 0000000000..35240489ea --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/InMemoryKeyValueFlushingStoreBuilder.scala @@ -0,0 +1,20 @@ +package org.apache.kafka.streams.state.internals + +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.common.utils.Time +import org.apache.kafka.streams.state.KeyValueStore + +class InMemoryKeyValueFlushingStoreBuilder[K, V]( + name: String, + keySerde: Serde[K], + valueSerde: Serde[V], + time: Time = Time.SYSTEM) + extends FinatraAbstractStoreBuilder[K, V, KeyValueStore[K, V]](name, keySerde, valueSerde, time) { + + override def build(): KeyValueStore[K, V] = { + val inMemoryKeyValueStore = new InMemoryKeyValueStore[K, V](name, keySerde, valueSerde) + val inMemoryFlushingKeyValueStore = + new InMemoryKeyValueFlushingLoggedStore[K, V](inMemoryKeyValueStore, keySerde, valueSerde) + new MeteredKeyValueStore[K, V](inMemoryFlushingKeyValueStore, "in-memory-state", time) + } +} diff --git a/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/RocksKeyValueIterator.scala b/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/RocksKeyValueIterator.scala new file mode 100644 index 0000000000..64b8efda3a --- /dev/null +++ b/kafka-streams/kafka-streams/src/main/scala/org/apache/kafka/streams/state/internals/RocksKeyValueIterator.scala @@ -0,0 +1,45 @@ +package org.apache.kafka.streams.state.internals + +import java.util.NoSuchElementException +import org.apache.kafka.common.serialization.Deserializer +import org.apache.kafka.streams.KeyValue +import org.apache.kafka.streams.errors.InvalidStateStoreException +import org.apache.kafka.streams.state.KeyValueIterator +import org.rocksdb.RocksIterator + +class RocksKeyValueIterator[K, V]( + iterator: RocksIterator, + keyDeserializer: Deserializer[K], + valueDeserializer: Deserializer[V], + storeName: String) + extends KeyValueIterator[K, V] { + + private var open: Boolean = true + + override def hasNext: Boolean = { + if (!open) throw new InvalidStateStoreException(s"RocksDB store $storeName has closed") + iterator.isValid + } + + override def peekNextKey(): K = { + if (!hasNext) throw new NoSuchElementException + keyDeserializer.deserialize("", iterator.key()) + } + + override def next(): KeyValue[K, V] = { + if (!hasNext) throw new 
NoSuchElementException + val entry = new KeyValue( + keyDeserializer.deserialize("", iterator.key()), + valueDeserializer.deserialize("", iterator.value()) + ) + + iterator.next() + + entry + } + + override def close(): Unit = { + open = false + iterator.close() + } +} diff --git a/kafka-streams/kafka-streams/src/test/resources/BUILD b/kafka-streams/kafka-streams/src/test/resources/BUILD new file mode 100644 index 0000000000..9237675c63 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/resources/BUILD @@ -0,0 +1,3 @@ +resources( + sources = globs("*.xml"), +) diff --git a/kafka-streams/kafka-streams/src/test/resources/logback-test.xml b/kafka-streams/kafka-streams/src/test/resources/logback-test.xml new file mode 100644 index 0000000000..4fee7c9f6f --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/resources/logback-test.xml @@ -0,0 +1,30 @@ + + + + %.-3level %-100logger %msg%n + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/kafka-streams/kafka-streams/src/test/scala/BUILD b/kafka-streams/kafka-streams/src/test/scala/BUILD new file mode 100644 index 0000000000..a0a7124dd8 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/BUILD @@ -0,0 +1,6 @@ +target( + name = "test-deps", + dependencies = [ + "finatra/kafka-streams/kafka-streams/src/test/scala/com/twitter:test-deps", + ], +) diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/BUILD b/kafka-streams/kafka-streams/src/test/scala/com/twitter/BUILD new file mode 100644 index 0000000000..ddad30417c --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/BUILD @@ -0,0 +1,66 @@ +scala_library( + name = "test-deps", + sources = globs( + "finatra/kafkastreams/test/*.scala", + "finatra/streams/tests/*.scala", + "inject/*.scala", + ), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-streams-tests", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/junit", + "3rdparty/jvm/org/apache/kafka", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test-utils", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/scalatest", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-core/src/test/scala:test-deps", + "finatra/inject/inject-server/src/test/scala:test-deps", + "finatra/inject/inject-slf4j/src/main/scala", + "finatra/jackson/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/main/scala", + "finatra/kafka/src/test/scala:test-deps", + "util/util-slf4j-api/src/main/scala", + ], + excludes = [ + exclude( + org = "com.twitter", + name = "twitter-server-internal-naming_2.11", + ), + exclude( + org = "com.twitter", + name = "loglens-log4j-logging_2.11", + ), + exclude( + org = "log4j", + name = "log4j", + ), + ], + exports = [ + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/junit", + "3rdparty/jvm/org/apache/kafka", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/scalatest", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-core/src/test/scala:test-deps", + "finatra/inject/inject-server/src/test/scala:test-deps", + 
"finatra/inject/inject-slf4j/src/main/scala", + "finatra/jackson/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/main/scala", + "finatra/kafka/src/test/scala:test-deps", + "util/util-slf4j-api/src/main/scala", + ], +) diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/KafkaStreamsFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/KafkaStreamsFeatureTest.scala new file mode 100644 index 0000000000..856e25fee8 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/KafkaStreamsFeatureTest.scala @@ -0,0 +1,155 @@ +package com.twitter.finatra.kafkastreams.test + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.test._ +import com.twitter.inject.Test +import com.twitter.util.Duration +import java.io.File +import java.util.concurrent.atomic.AtomicInteger +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.processor.internals.StreamThread + +/** + * Extensible abstract test class used when testing a single KafkaStreamsTwitterServer in your test. + */ +abstract class KafkaStreamsFeatureTest extends AbstractKafkaStreamsFeatureTest with KafkaFeatureTest + +/** + * Extensible abstract test class used when testing multiple KafkaStreamsTwitterServers in your + * test. + */ +abstract class KafkaStreamsMultiServerFeatureTest extends AbstractKafkaStreamsFeatureTest + +/** + * Extensible abstract test class that provides helper methods to create and access the kafka + * topics used in testing. + */ +abstract class AbstractKafkaStreamsFeatureTest extends Test with EmbeddedKafka { + + override def afterAll(): Unit = { + super.afterAll() + + resetStreamThreadId() + } + + protected def newTempDirectory(): File = { + TestDirectoryUtils.newTempDirectory() + } + + protected def kafkaCommitInterval: Duration = 50.milliseconds + + protected def kafkaStreamsFlags: Map[String, String] = { + kafkaBootstrapFlag ++ Map( + "kafka.auto.offset.reset" -> "earliest", + "kafka.max.poll.records" -> "1", //Read one record at a time to help makes tests more deterministic + "kafka.commit.interval" -> kafkaCommitInterval.toString(), + "kafka.replication.factor" -> "1", + "kafka.state.dir" -> newTempDirectory().toString + ) + } + + /** + * Creates a kafka topic on the internal test brokers to be used for testing. Returns an instance + * of a [[KafkaTopic]] which can be used in testing to write to/read from the topic. + * + * @param keySerde serde for the key of the topic + * @param valSerde serde for the value of the topic + * @param name name of the topic + * @param partitions number of partitions of the topic + * @param replication replication factor for the topic + * @param autoCreate true to create the topic on the brokers, false to simply access an existing + * topic. + * @param autoConsume true causes the [[KafkaTopic]] class to read messages off of the topic as + * soon as they are available, false leaves them on the topic until + * [[KafkaTopic.consumeRecord()]] is called. + * @param logPublish true will log each + * @param allowPublishes whether or not this topic allows you to publish to it from a test + * @tparam K the type of the key + * @tparam V the type of the value + * + * @return a kafkaTopic which can be used to read and assert that values were written to a topic, + * or insert values into a topic. 
+ */ + override protected def kafkaTopic[K, V]( + keySerde: Serde[K], + valSerde: Serde[V], + name: String, + partitions: Int = 1, + replication: Int = 1, + autoCreate: Boolean = true, + autoConsume: Boolean = true, + logPublish: Boolean = false, + allowPublishes: Boolean = true // TODO is this used?!?!? + ): KafkaTopic[K, V] = { + if (name.contains("changelog") && autoCreate) { + warn( + s"Changelog topics should be created by Kafka-Streams. It's recommended that you set autoCreate=false for kafka topic $name" + ) + } + super.kafkaTopic( + keySerde, + valSerde, + name, + partitions, + replication, + autoCreate, + autoConsume = autoConsume, + logPublishes = logPublish + ) + } + + /** + * Returns an instance of a [[KafkaTopic]] which can be used in testing to write to/read from the + * topic. + * + * @note because changelog topics are automatically created by the KafkaStreams app, use this + * method to access them, which will not create them on the broker. + * + * @note you cannot publish to this [[KafkaTopic]] from your test, because all publishes to this + * topic should originate from KafkaStreams. Attempting to do so will cause an assertion + * to fail. + * + * @param keySerde serde for the key of the topic + * @param valSerde serde for the value of the topic + * @param name name of the topic + * @param partitions number of partitions of the topic + * @param replication replication factor for the topic + * @param autoCreate true to create the topic on the brokers, false to simply access an existing + * topic. + * @tparam K the type of the key + * @tparam V the type of the value + * + * @return a kafkaTopic which can be used to assert that values have been written to the changelog. + */ + protected def kafkaChangelogTopic[K, V]( + keySerde: Serde[K], + valSerde: Serde[V], + name: String, + partitions: Int = 1, + replication: Int = 1, + autoConsume: Boolean = true + ): KafkaTopic[K, V] = { + super.kafkaTopic( + keySerde, + valSerde, + name, + partitions, + replication, + autoCreate = false, + autoConsume = autoConsume, + logPublishes = false, + allowPublishes = false + ) + } + + //HACK: Reset StreamThread's id after each test so that each test starts from a known fresh state + //Without this hack, tests would need to always wildcard the thread number when asserting stats + protected def resetStreamThreadId(): Unit = { + val streamThreadClass = classOf[StreamThread] + val streamThreadIdSequenceField = streamThreadClass + .getDeclaredField("STREAM_THREAD_ID_SEQUENCE") + streamThreadIdSequenceField.setAccessible(true) + val streamThreadIdSequence = streamThreadIdSequenceField.get(null).asInstanceOf[AtomicInteger] + streamThreadIdSequence.set(1) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/TestDirectoryUtils.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/TestDirectoryUtils.scala new file mode 100644 index 0000000000..b0652ecd2b --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/TestDirectoryUtils.scala @@ -0,0 +1,24 @@ +package com.twitter.finatra.kafkastreams.test + +import com.twitter.inject.Logging +import java.io.File +import java.nio.file.Files +import org.apache.commons.io.FileUtils +import scala.util.control.NonFatal + +object TestDirectoryUtils extends Logging { + + def newTempDirectory(): File = { + val dir = Files.createTempDirectory("kafkastreams").toFile + Runtime.getRuntime.addShutdownHook(new Thread() { + override def run(): Unit = 
{ + try FileUtils.forceDelete(dir) + catch { + case NonFatal(e) => + error(s"Error deleting $dir", e) + } + } + }) + dir + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/TimeTraveler.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/TimeTraveler.scala new file mode 100644 index 0000000000..8589186149 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/kafkastreams/test/TimeTraveler.scala @@ -0,0 +1,51 @@ +package com.twitter.finatra.kafkastreams.test + +import com.github.nscala_time.time.DurationBuilder +import com.twitter.util.Duration +import org.joda.time.{DateTime, DateTimeUtils} + +/** + * Helper trait to modify the timestamps that will be used in FinatraStreams tests. + */ +trait TimeTraveler { + def setTime(now: String): Unit = { + setTime(new DateTime(now)) + } + + def setTime(now: DateTime): Unit = { + DateTimeUtils.setCurrentMillisFixed(now.getMillis) + } + + def advanceTime(duration: Duration): DateTime = { + advanceTimeMillis(duration.inMillis) + } + + def advanceTime(duration: DurationBuilder): DateTime = { + advanceTimeMillis(duration.toDuration.getMillis) + } + + def currentHour: DateTime = { + now.hourOfDay().roundFloorCopy() + } + + def now: DateTime = { + DateTime.now() + } + + def currentHourMillis: Long = { + currentHour.getMillis + } + + def priorHour: DateTime = { + currentHour.minusHours(1) + } + + def priorHourMillis: Long = { + priorHour.getMillis + } + + private def advanceTimeMillis(durationMillis: Long) = { + DateTimeUtils.setCurrentMillisFixed(DateTimeUtils.currentTimeMillis() + durationMillis) + now + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/FinatraTopologyTester.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/FinatraTopologyTester.scala new file mode 100644 index 0000000000..c19fd2f967 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/FinatraTopologyTester.scala @@ -0,0 +1,311 @@ +package com.twitter.finatra.streams.tests + +import com.github.nscala_time.time.DurationBuilder +import com.google.inject.Module +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.stats.{InMemoryStatsReceiver, StatsReceiver} +import com.twitter.finatra.kafka.modules.KafkaBootstrapModule +import com.twitter.finatra.kafka.test.utils.InMemoryStatsUtil +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.test.TestDirectoryUtils +import com.twitter.finatra.streams.converters.time._ +import com.twitter.finatra.streams.flags.FinatraTransformerFlags +import com.twitter.finatra.streams.query.{ + QueryableFinatraKeyValueStore, + QueryableFinatraWindowStore +} +import com.twitter.finatra.streams.transformer.domain.TimeWindowed +import com.twitter.finatra.streams.transformer.internal.domain.Timer +import com.twitter.inject.{AppAccessor, Injector, Logging, TwitterModule} +import com.twitter.util.Duration +import java.util.Properties +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.state.KeyValueStore +import org.apache.kafka.streams.{Topology, TopologyTestDriver} +import org.joda.time.{DateTime, DateTimeUtils} + +object FinatraTopologyTester { + + /** + * FinatraTopologyTester provides useful testing utilities integrating Kafka's + * [[TopologyTestDriver]] and a Finatra Streams server.
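 *
 * A minimal construction sketch follows; the server class and application id are placeholders
 * for this example. Setting finatraTransformer = true additionally applies the
 * FinatraTransformerFlags derived from emitWatermarkPerMessage and autoWatermarkInterval:
 *
 * {{{
 *   FinatraTopologyTester(
 *     kafkaApplicationId = "my-app-prod",
 *     server = new MyKafkaStreamsTwitterServer,
 *     startingWallClockTime = new DateTime("2018-01-01T00:00:00Z"),
 *     finatraTransformer = true
 *   )
 * }}}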
+ * + * @param kafkaApplicationId The application.id of the Kafka Streams server being tested. + * The application.id is used to name the changelog topics, so we recommend + * setting this value to the same value used in your production service + * @param server A KafkaStreamsTwitterServer containing the streams topology to be + * tested. + * @param startingWallClockTime The starting wall clock time for each individual test. Note, that + * publishing a message using TopologyTesterTopic#pipeInput will use + * the current mocked wall clock time unless an explicit publish time + * is specified. + * @param flags Additional application level flags that you may have. + * @param thriftQueryable Enable if your service is exposing queryable state using the + * com.twitter.finatra.streams.queryable.thrift.QueryableState trait. + * @param overrideModules Finatra override modules which redefine production bindings. + */ + def apply( + kafkaApplicationId: String, + server: KafkaStreamsTwitterServer, + startingWallClockTime: DateTime, + flags: Map[String, String] = Map(), + overrideModules: Seq[Module] = Seq(), + thriftQueryable: Boolean = false, + finatraTransformer: Boolean = false, + emitWatermarkPerMessage: Boolean = true, + autoWatermarkInterval: Duration = 0.millis + ): FinatraTopologyTester = { + AppAccessor.callInits(server) + + AppAccessor.callParseArgs( + server = server, + args = kafkaStreamsArgs( + kafkaApplicationId = kafkaApplicationId, + otherFlags = flags, + thriftQueryable = thriftQueryable, + finatraTransformer = finatraTransformer, + emitWatermarkPerMessage = emitWatermarkPerMessage, + autoWatermarkInterval = autoWatermarkInterval + ) + ) + + val inMemoryStatsReceiver = new InMemoryStatsReceiver + AppAccessor.callAddFrameworkOverrideModules( + server, + overrideModules.toList :+ new TwitterModule { + override def configure(): Unit = { + bind[StatsReceiver].toInstance(inMemoryStatsReceiver) + } + } + ) + + val injector = AppAccessor.loadAndSetInstalledModules(server) + + FinatraTopologyTester( + properties = server.createKafkaStreamsProperties(), + topology = server.createKafkaStreamsTopology(), + inMemoryStatsReceiver = inMemoryStatsReceiver, + injector = injector, + startingWallClockTime = startingWallClockTime + ) + } + + private def kafkaStreamsArgs( + kafkaApplicationId: String, + otherFlags: Map[String, String] = Map(), + thriftQueryable: Boolean, + finatraTransformer: Boolean, + emitWatermarkPerMessage: Boolean, + autoWatermarkInterval: Duration + ): Map[String, String] = { + val kafkaStreamsAndOtherFlags = otherFlags ++ Map( + "kafka.application.id" -> kafkaApplicationId, + KafkaBootstrapModule.kafkaBootstrapServers.name -> "127.0.0.1:12345", + "kafka.state.dir" -> TestDirectoryUtils.newTempDirectory().toString + ) + + addFinatraTransformerFlags( + flags = addThriftQueryableFlags( + thriftQueryable = thriftQueryable, + otherFlagsPlusKafkaStreamsRequiredFlags = kafkaStreamsAndOtherFlags + ), + finatraTransformer = finatraTransformer, + emitWatermarkPerMessage = emitWatermarkPerMessage, + autoWatermarkInterval = autoWatermarkInterval + ) + } + + private def addThriftQueryableFlags( + thriftQueryable: Boolean, + otherFlagsPlusKafkaStreamsRequiredFlags: Map[String, String] + ) = { + if (thriftQueryable) { + Map( + "kafka.application.num.instances" -> "1", + "kafka.num.queryable.partitions" -> "1", + "kafka.current.shard" -> "0" + ) ++ otherFlagsPlusKafkaStreamsRequiredFlags + } else { + otherFlagsPlusKafkaStreamsRequiredFlags + } + } + + private def addFinatraTransformerFlags( + 
flags: Map[String, String], + finatraTransformer: Boolean, + emitWatermarkPerMessage: Boolean, + autoWatermarkInterval: Duration + ): Map[String, String] = { + if (finatraTransformer) { + flags ++ Map( + FinatraTransformerFlags.AutoWatermarkInterval -> s"$autoWatermarkInterval", + FinatraTransformerFlags.EmitWatermarkPerMessage -> s"$emitWatermarkPerMessage" + ) + } else { + flags + } + } +} + +case class FinatraTopologyTester private ( + properties: Properties, + topology: Topology, + inMemoryStatsReceiver: InMemoryStatsReceiver, + injector: Injector, + startingWallClockTime: DateTime) + extends Logging { + + private val inMemoryStatsUtil = new InMemoryStatsUtil(inMemoryStatsReceiver) + private var _driver: TopologyTestDriver = _ + + /* Public */ + + def driver: TopologyTestDriver = _driver + + def topic[K, V]( + name: String, + keySerde: Serde[K], + valSerde: Serde[V] + ): TopologyTesterTopic[K, V] = { + new TopologyTesterTopic(_driver, name, keySerde, valSerde) + } + + def getKeyValueStore[K, V](name: String): KeyValueStore[K, V] = { + driver + .getStateStore(name) + .asInstanceOf[KeyValueStore[K, V]] + } + + /** + * Get a Finatra windowed key value store by name + * @param name Name of the store + * @tparam K Key type of the store + * @tparam V Value type of the store + * @return KeyValueStore used for time windowed keys + */ + def getFinatraWindowedStore[K, V](name: String): KeyValueStore[TimeWindowed[K], V] = { + getKeyValueStore[TimeWindowed[K], V](name) + } + + /** + * Get a Finatra timer key value store by name + * @param name Name of the store + * @tparam K Key type of the store + * @tparam V Value type of the store + * @return KeyValueStore used for timer entries + */ + def getFinatraTimerStore[K](name: String): KeyValueStore[Timer[K], Array[Byte]] = { + getKeyValueStore[Timer[K], Array[Byte]](name) + } + + /** + * Get a Finatra windowed timer store by name + * @param name Name of the store + * @tparam K Key type of the store + * @tparam V Value type of the store + * @return KeyValueStore used for time windowed timer entries + */ + def getFinatraWindowedTimerStore[K]( + name: String + ): KeyValueStore[Timer[TimeWindowed[K]], Array[Byte]] = { + getFinatraTimerStore[TimeWindowed[K]](name) + } + + def reset(): Unit = { + close() + createTopologyTester() + DateTimeUtils.setCurrentMillisFixed(startingWallClockTime.getMillis) + } + + def close(): Unit = { + if (_driver != null) { + _driver.close() + } + } + + def setWallClockTime(now: String): Unit = { + setWallClockTime(new DateTime(now)) + } + + def setWallClockTime(now: DateTime): Unit = { + DateTimeUtils.setCurrentMillisFixed(now.getMillis) + } + + def now: DateTime = { + DateTime.now() + } + + def currentHour: DateTime = { + now.hourOfDay().roundFloorCopy() + } + + def currentMinute: DateTime = { + now.minuteOfDay().roundFloorCopy() + } + + def priorMinute: DateTime = { + priorMinute(1) + } + + def priorMinute(minutesBack: Int): DateTime = { + currentMinute.minusMinutes(minutesBack) + } + + def priorHour: DateTime = { + priorHour(1) + } + + def priorHour(hoursBack: Int): DateTime = { + currentHour.minusHours(hoursBack) + } + + def advanceWallClockTime(duration: Duration): DateTime = { + advanceWallClockTime(duration.inMillis) + } + + def advanceWallClockTime(duration: DurationBuilder): DateTime = { + advanceWallClockTime(duration.toDuration.getMillis) + } + + def queryableFinatraKeyValueStore[PK, K, V]( + storeName: String, + primaryKeySerde: Serde[PK] + ): QueryableFinatraKeyValueStore[PK, K, V] = { + new 
QueryableFinatraKeyValueStore[PK, K, V]( + storeName, + primaryKeySerde = primaryKeySerde, + numShards = 1, + numQueryablePartitions = 1, + currentShardId = 0 + ) + } + + def queryableFinatraWindowStore[K, V]( + storeName: String, + windowSize: Duration, + keySerde: Serde[K] + ): QueryableFinatraWindowStore[K, V] = { + new QueryableFinatraWindowStore[K, V]( + storeName, + windowSize = windowSize, + keySerde = keySerde, + numShards = 1, + numQueryablePartitions = 1, + currentShardId = 0) + } + + def stats: InMemoryStatsUtil = inMemoryStatsUtil + + /* Private */ + + private def advanceWallClockTime(durationMillis: Long): DateTime = { + DateTimeUtils.setCurrentMillisFixed(DateTimeUtils.currentTimeMillis() + durationMillis) + debug(s"Advance wall clock to ${DateTimeUtils.currentTimeMillis().iso8601Millis}") + _driver.advanceWallClockTime(durationMillis) + now + } + + private def createTopologyTester(): Unit = { + _driver = new TopologyTestDriver(topology, properties) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/TopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/TopologyFeatureTest.scala new file mode 100644 index 0000000000..c1b331f0c6 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/TopologyFeatureTest.scala @@ -0,0 +1,66 @@ +package com.twitter.finatra.streams.tests + +import com.twitter.inject.Test + +/** + * Extensible abstract test class used when testing your KafkaStreams topology using the + * [[FinatraTopologyTester]]. + * + * Example usage: + * + * {{{ + * class WordCountServerTopologyFeatureTest extends TopologyFeatureTest { + * + * override val topologyTester = FinatraTopologyTester( + * kafkaApplicationId = "wordcount-prod-bob", + * server = new WordCountRocksDbServer, + * startingWallClockTime = new DateTime("2018-01-01T00:00:00Z") + * ) + * + * private val textLinesTopic = + * topologyTester.topic("TextLinesTopic", Serdes.ByteArray(), Serdes.String) + * + * private val wordsWithCountsTopic = + * topologyTester.topic("WordsWithCountsTopic", Serdes.String, ScalaSerdes.Long) + * + * test("word count test 1") { + * val countsStore = topologyTester.getKeyValueStore[String, Long]("CountsStore") + * + * textLinesTopic.pipeInput(Array.emptyByteArray, "Hello World Hello") + * + * wordsWithCountsTopic.assertOutput("Hello", 1) + * wordsWithCountsTopic.assertOutput("World", 1) + * wordsWithCountsTopic.assertOutput("Hello", 2) + * + * countsStore.get("Hello") should equal(2) + * countsStore.get("World") should equal(1) + * } + * + * test("word count test 2") { + * val countsStore = topologyTester.getKeyValueStore[String, Long]("CountsStore") + * + * textLinesTopic.pipeInput(Array.emptyByteArray, "yo yo yo") + * + * wordsWithCountsTopic.assertOutput("yo", 1) + * wordsWithCountsTopic.assertOutput("yo", 2) + * wordsWithCountsTopic.assertOutput("yo", 3) + * + * countsStore.get("yo") should equal(3) + * } + * } + * }}} + */ +abstract class TopologyFeatureTest extends Test { + + protected def topologyTester: FinatraTopologyTester + + override def beforeEach(): Unit = { + super.beforeEach() + topologyTester.reset() + } + + override def afterAll(): Unit = { + super.afterAll() + topologyTester.close() + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/TopologyTesterTopic.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/TopologyTesterTopic.scala new file mode 100644 
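FinatraTopologyTester pins JodaTime to `startingWallClockTime`, and `advanceWallClockTime` moves that clock together with the underlying `TopologyTestDriver` wall clock, so time-driven behavior (for example a periodic commit) can be exercised deterministically from a `TopologyFeatureTest`. The sketch below illustrates that flow; the `MyStreamsServer` topology, the topic names, the serdes and the 30 second interval are assumptions for the example rather than part of the library.

.. code:: scala

    import com.twitter.conversions.DurationOps._
    import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest}
    import org.apache.kafka.common.serialization.Serdes
    import org.joda.time.DateTime

    class MyTopologyFeatureTest extends TopologyFeatureTest {

      // MyStreamsServer is a placeholder KafkaStreamsTwitterServer assumed to copy records
      // from "events" to "events-out" when its (assumed) 30 second commit fires.
      override val topologyTester = FinatraTopologyTester(
        kafkaApplicationId = "my-app-prod",
        server = new MyStreamsServer,
        startingWallClockTime = new DateTime("2018-01-01T00:00:00Z")
      )

      private val eventsTopic =
        topologyTester.topic("events", Serdes.String, Serdes.String)

      private val eventsOutTopic =
        topologyTester.topic("events-out", Serdes.String, Serdes.String)

      test("records are forwarded once the commit punctuation fires") {
        eventsTopic.pipeInput("user1", "hello")

        // Advances both the fixed JodaTime clock and the TopologyTestDriver wall clock.
        topologyTester.advanceWallClockTime(30.seconds)

        // Drain everything the topology has produced so far and assert on it.
        val produced = eventsOutTopic.readAllOutput().map(record => record.key -> record.value)
        produced should contain("user1" -> "hello")
      }
    }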
index 0000000000..60e4d31d0a --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/finatra/streams/tests/TopologyTesterTopic.scala @@ -0,0 +1,63 @@ +package com.twitter.finatra.streams.tests + +import com.twitter.finatra.streams.converters.time._ +import org.apache.kafka.clients.producer.ProducerRecord +import org.apache.kafka.common.serialization.Serde +import org.apache.kafka.streams.TopologyTestDriver +import org.apache.kafka.streams.test.ConsumerRecordFactory +import org.joda.time.{DateTime, DateTimeUtils} +import org.scalatest.Matchers + +/** + * Used to read/write from Kafka topics in the topology tester. + * + * @param topologyTestDriver the topology test driver + * @param name the name of the topic + * @param keySerde the serde for the key + * @param valSerde the serde for the value + * @tparam K the type of the key + * @tparam V the type of the value + */ +class TopologyTesterTopic[K, V]( + topologyTestDriver: => TopologyTestDriver, + name: String, + keySerde: Serde[K], + valSerde: Serde[V]) + extends Matchers { + + private val recordFactory = + new ConsumerRecordFactory(name, keySerde.serializer, valSerde.serializer) + + def pipeInput(key: K, value: V, timestamp: Long = DateTimeUtils.currentTimeMillis()): Unit = { + topologyTestDriver.pipeInput(recordFactory.create(key, value, timestamp)) + } + + def readOutput(): ProducerRecord[K, V] = { + topologyTestDriver.readOutput(name, keySerde.deserializer(), valSerde.deserializer()) + } + + def readAllOutput(): Seq[ProducerRecord[K, V]] = { + Iterator + .continually(readOutput()) + .takeWhile(_ != null) + .toSeq + } + + def assertOutput(key: K, value: V, time: DateTime = null): Unit = { + val outputRecord = readOutput() + assert(outputRecord != null, "No output record is available for this assertion") + + if (key != outputRecord.key() || value != outputRecord.value()) { + assert((outputRecord.key.toString -> outputRecord.value.toString) == (key -> value)) + } + + if (time != null && outputRecord.timestamp.toLong.iso8601Millis != time.getMillis.iso8601Millis) { + assert( + Tuple3( + outputRecord.key(), + outputRecord.value(), + outputRecord.timestamp.toLong.iso8601Millis) == + Tuple3(key, value, time.getMillis.iso8601Millis)) + } + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/inject/AppAccessor.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/inject/AppAccessor.scala new file mode 100644 index 0000000000..24a523645e --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/inject/AppAccessor.scala @@ -0,0 +1,49 @@ +package com.twitter.inject + +import com.google.inject.Module +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import scala.collection.mutable + +//TODO: DINS-2387: Update com.twitter.inject.app.App to avoid the need for reflection below +object AppAccessor { + private val appClass = classOf[com.twitter.inject.app.App] + + /* Public */ + + def callInits(server: KafkaStreamsTwitterServer): Unit = { + val initsMethod = classOf[com.twitter.app.App].getMethods.find(_.getName.endsWith("inits")).get + initsMethod.setAccessible(true) + val inits = initsMethod.invoke(server).asInstanceOf[mutable.Buffer[() => Unit]] + for (f <- inits) { + f() + } + } + + def callParseArgs(server: KafkaStreamsTwitterServer, args: Map[String, String]): Unit = { + val parseArgsMethod = appClass.getMethod("parseArgs", classOf[Array[String]]) + parseArgsMethod.invoke(server, flagsAsArgs(args)) + } + + def callAddFrameworkOverrideModules( + server: 
KafkaStreamsTwitterServer, + overrideModules: Seq[Module] + ): Unit = { + server.addFrameworkOverrideModules(overrideModules: _*) + } + + def loadAndSetInstalledModules(server: KafkaStreamsTwitterServer): Injector = { + val installedModules = server.loadModules() + val injector = installedModules.injector + + val setInstalledModulesMethod = appClass.getMethods + .find(_.toString.contains("installedModules_$eq")).get + + setInstalledModulesMethod.invoke(server, installedModules) + + injector + } + + def flagsAsArgs(flags: Map[String, String]): Array[String] = { + flags.map { case (k, v) => "-" + k + "=" + v }.toArray + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/BUILD b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/BUILD new file mode 100644 index 0000000000..150a15777c --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/BUILD @@ -0,0 +1,27 @@ +junit_tests( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + strict_deps = False, + dependencies = [ + "3rdparty/jvm/ch/qos/logback:logback-classic", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test-utils", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-client", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-server", + "finatra-internal/streams/examples/tweet-word-count/src/main/scala", + "finatra/inject/inject-app/src/main/scala", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-core/src/test/scala:test-deps", + "finatra/inject/inject-server/src/main/scala", + "finatra/inject/inject-server/src/test/scala:test-deps", + "finatra/inject/inject-slf4j/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/main/scala", + "finatra/kafka-streams/kafka-streams/src/test/resources", + "finatra/kafka-streams/kafka-streams/src/test/scala/com/twitter:test-deps", + "finatra/kafka/src/test/scala:test-deps", + "finatra/thrift/src/test/scala:test-deps", + ], +) diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/FinatraKeyValueStoreLatencyTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/FinatraKeyValueStoreLatencyTest.scala new file mode 100644 index 0000000000..b33eac4d72 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/FinatraKeyValueStoreLatencyTest.scala @@ -0,0 +1,152 @@ +package com.twitter.unittests + +import com.twitter.finagle.stats.InMemoryStatsReceiver +import com.twitter.finatra.kafka.test.utils.InMemoryStatsUtil +import com.twitter.finatra.streams.stores.internal.FinatraKeyValueStoreImpl +import com.twitter.inject.Test +import org.apache.kafka.common.metrics.Metrics +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.common.utils.LogContext +import org.apache.kafka.streams.KeyValue +import org.apache.kafka.streams.processor.StateStore +import org.apache.kafka.streams.processor.internals.MockStreamsMetrics +import org.apache.kafka.streams.state.Stores +import org.apache.kafka.streams.state.internals.ThreadCache +import org.apache.kafka.test.{InternalMockProcessorContext, NoOpRecordCollector, TestUtils} +import scala.collection.JavaConversions._ + +class FinatraKeyValueStoreLatencyTest extends Test { + + private var context: InternalMockProcessorContext = _ + + private val statsReceiver = new 
InMemoryStatsReceiver() + private val statsUtil = new InMemoryStatsUtil(statsReceiver) + private val keyValueStore = new FinatraKeyValueStoreImpl[Int, String]( + name = "FinatraKeyValueStoreTest", + statsReceiver = statsReceiver + ) + + private val Keys1to10 = 1 to 10 + private val Values1to10 = 'a' to 'j' + private val KeyValues1to10 = (Keys1to10 zip Values1to10) + .map(keyValue => new KeyValue(keyValue._1, keyValue._2.toString)) + private val Key1 = KeyValues1to10.head.key + private val Value1 = KeyValues1to10.head.value + + // TODO: add `FinatraKeyValueStoreImpl.DeleteRangeExperimentalLatencyStatName` for testing + private val AllLatencyStats = Seq( + FinatraKeyValueStoreImpl.InitLatencyStatName, + FinatraKeyValueStoreImpl.CloseLatencyStatName, + FinatraKeyValueStoreImpl.PutLatencyStatName, + FinatraKeyValueStoreImpl.PutIfAbsentLatencyStatName, + FinatraKeyValueStoreImpl.PutAllLatencyStatName, + FinatraKeyValueStoreImpl.DeleteLatencyStatName, + FinatraKeyValueStoreImpl.FlushLatencyStatName, + FinatraKeyValueStoreImpl.PersistentLatencyStatName, + FinatraKeyValueStoreImpl.IsOpenLatencyStatName, + FinatraKeyValueStoreImpl.GetLatencyStatName, + FinatraKeyValueStoreImpl.RangeLatencyStatName, + FinatraKeyValueStoreImpl.AllLatencyStatName, + FinatraKeyValueStoreImpl.ApproximateNumEntriesLatencyStatName, + FinatraKeyValueStoreImpl.DeleteRangeLatencyStatName, + FinatraKeyValueStoreImpl.DeleteWithoutGettingPriorValueLatencyStatName, + FinatraKeyValueStoreImpl.FinatraRangeLatencyStatName + ) + + private def getLatencyStat(name: String): Seq[Float] = { + val latencyStatNamePrefix = "stores/FinatraKeyValueStoreTest" + val latencyStatNameSuffix = "latency_us" + statsUtil.getStat(s"$latencyStatNamePrefix/$name/$latencyStatNameSuffix") + } + + private def assertNonzeroLatency(name: String) = { + val latencyStat = getLatencyStat(name) + assert(latencyStat.nonEmpty, s"$name stat is empty") + assert(latencyStat.forall(_ >= 0), s"$name call had zero latency") + } + + private def assertAllNonzeroLatency() = { + AllLatencyStats.map { name => + assertNonzeroLatency(name) + } + } + + override def beforeEach(): Unit = { + context = new InternalMockProcessorContext( + TestUtils.tempDirectory, + Serdes.Integer, + Serdes.String, + new NoOpRecordCollector, + new ThreadCache(new LogContext(), 0, new MockStreamsMetrics(new Metrics())) + ) { + override def getStateStore(name: String): StateStore = { + val storeBuilder = Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(name), + Serdes.Integer(), + Serdes.String() + ) + + val store = storeBuilder.build + store.init(this, store) + store + } + } + } + + override def afterEach(): Unit = { + statsReceiver.clear() + } + + test("Series of store operations") { // TODO: test deleteRangeExperimental() + keyValueStore.init(context, null) + assertNonzeroLatency(FinatraKeyValueStoreImpl.InitLatencyStatName) + + assert(keyValueStore.isOpen()) + assertNonzeroLatency(FinatraKeyValueStoreImpl.IsOpenLatencyStatName) + + assert(keyValueStore.persistent()) + assertNonzeroLatency(FinatraKeyValueStoreImpl.PersistentLatencyStatName) + + keyValueStore.put(Key1, Value1) + assertNonzeroLatency(FinatraKeyValueStoreImpl.PutLatencyStatName) + + assert(keyValueStore.get(Key1) == Value1) + assertNonzeroLatency(FinatraKeyValueStoreImpl.GetLatencyStatName) + + keyValueStore.putIfAbsent(Key1, Value1) + assertNonzeroLatency(FinatraKeyValueStoreImpl.PutIfAbsentLatencyStatName) + + keyValueStore.delete(Key1) + assertNonzeroLatency(FinatraKeyValueStoreImpl.DeleteLatencyStatName) + + 
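    // Note on the stat layout exercised above and below: getLatencyStat expects every store
    // operation to record a latency stat scoped as
    //   stores/FinatraKeyValueStoreTest/<operation stat name>/latency_us
    // so, assuming DeleteLatencyStatName resolves to the string "delete", the preceding
    // assertion reads "stores/FinatraKeyValueStoreTest/delete/latency_us" from the
    // InMemoryStatsReceiver.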
keyValueStore.putAll(KeyValues1to10) + assertNonzeroLatency(FinatraKeyValueStoreImpl.PutAllLatencyStatName) + + keyValueStore.range(1, 5).close() + assertNonzeroLatency(FinatraKeyValueStoreImpl.RangeLatencyStatName) + + keyValueStore.all().close() + assertNonzeroLatency(FinatraKeyValueStoreImpl.AllLatencyStatName) + + keyValueStore.approximateNumEntries() + assertNonzeroLatency(FinatraKeyValueStoreImpl.ApproximateNumEntriesLatencyStatName) + + keyValueStore.deleteRange(1, 2) + assertNonzeroLatency(FinatraKeyValueStoreImpl.DeleteRangeLatencyStatName) + + keyValueStore.deleteWithoutGettingPriorValue(10) + assertNonzeroLatency(FinatraKeyValueStoreImpl.DeleteWithoutGettingPriorValueLatencyStatName) + + keyValueStore.range(Array()) + assertNonzeroLatency(FinatraKeyValueStoreImpl.FinatraRangeLatencyStatName) + + keyValueStore.flush() + assertNonzeroLatency(FinatraKeyValueStoreImpl.FlushLatencyStatName) + + keyValueStore.close() + assertNonzeroLatency(FinatraKeyValueStoreImpl.CloseLatencyStatName) + + assertAllNonzeroLatency() + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/MultiSpanIteratorTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/MultiSpanIteratorTest.scala new file mode 100644 index 0000000000..c62d7d5b80 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/MultiSpanIteratorTest.scala @@ -0,0 +1,47 @@ +package com.twitter.unittests + +import com.twitter.finatra.streams.transformer.MultiSpanIterator +import com.twitter.inject.Test + +class MultiSpanIteratorTest extends Test { + + test("test spanning function works, all items to the same span") { + val spanningIterator = new MultiSpanIterator[Int, Int](Seq(1, 1, 1, 2, 2, 3).toIterator, x => 0) + assertSpanningIterator(spanningIterator, Seq(Seq(1, 1, 1, 2, 2, 3))) + } + + test("1,1,1") { + val spanningIterator = new MultiSpanIterator[Int, Int](Seq(1, 1, 1).toIterator, identity) + assertSpanningIterator(spanningIterator, Seq(Seq(1, 1, 1))) + } + + test("1") { + val spanningIterator = new MultiSpanIterator[Int, Int](Seq(1).toIterator, identity) + assertSpanningIterator(spanningIterator, Seq(Seq(1))) + } + + test("empty") { + val spanningIterator = new MultiSpanIterator[Int, Int](Seq().toIterator, identity) + assertSpanningIterator(spanningIterator, Seq()) + } + + test("1,1,1,2,2,3") { + val spanningIterator = + new MultiSpanIterator[Int, Int](Seq(1, 1, 1, 2, 2, 3).toIterator, identity) + assertSpanningIterator(spanningIterator, Seq(Seq(1, 1, 1), Seq(2, 2), Seq(3))) + } + + test("1,2,1,2") { + val spanningIterator = new MultiSpanIterator[Int, Int](Seq(1, 2, 1, 2).toIterator, identity) + assertSpanningIterator(spanningIterator, Seq(Seq(1), Seq(2), Seq(1), Seq(2))) + } + + /* Private */ + + private def assertSpanningIterator[T]( + iterator: Iterator[Iterator[T]], + expected: Seq[Seq[T]] + ): Unit = { + assert(iterator.map(_.toSeq).toSeq == expected) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/PersistentTimerStoreTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/PersistentTimerStoreTest.scala new file mode 100644 index 0000000000..1149a40d95 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/PersistentTimerStoreTest.scala @@ -0,0 +1,188 @@ +package com.twitter.unittests + +import com.twitter.finagle.stats.NullStatsReceiver +import com.twitter.finatra.json.JsonDiff +import com.twitter.finatra.streams.stores.internal.FinatraKeyValueStoreImpl +import 
com.twitter.finatra.streams.transformer.PersistentTimerStore +import com.twitter.finatra.streams.transformer.domain.{Expire, Time, TimerMetadata, Watermark} +import com.twitter.finatra.streams.transformer.internal.domain.{Timer, TimerSerde} +import com.twitter.inject.Test +import org.apache.kafka.common.metrics.Metrics +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.common.utils.LogContext +import org.apache.kafka.streams.processor.StateStore +import org.apache.kafka.streams.processor.internals.MockStreamsMetrics +import org.apache.kafka.streams.state.Stores +import org.apache.kafka.streams.state.internals.ThreadCache +import org.apache.kafka.test.{InternalMockProcessorContext, NoOpRecordCollector, TestUtils} +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +class PersistentTimerStoreTest extends Test { + + private type TimerKey = String + + private var context: InternalMockProcessorContext = _ + + private val keyValueStore = new FinatraKeyValueStoreImpl[Timer[TimerKey], Array[Byte]]( + name = "TimerStore", + statsReceiver = NullStatsReceiver + ) + + val timerStore = new PersistentTimerStore[TimerKey]( + timersStore = keyValueStore, + onTimer, + maxTimerFiresPerWatermark = 2 + ) + + private val onTimerCalls = new ArrayBuffer[OnTimerCall] + + override def beforeEach(): Unit = { + context = new InternalMockProcessorContext( + TestUtils.tempDirectory, + Serdes.String, + Serdes.String, + new NoOpRecordCollector, + new ThreadCache(new LogContext("testCache"), 0, new MockStreamsMetrics(new Metrics())) + ) { + + override def getStateStore(name: TimerKey): StateStore = { + val storeBuilder = Stores + .keyValueStoreBuilder( + Stores.persistentKeyValueStore(name), + TimerSerde(Serdes.String()), + Serdes.ByteArray() + ) + + val store = storeBuilder.build + store.init(this, store) + store + } + } + + keyValueStore.init(context, null) + timerStore.onInit() + + onTimerCalls.clear() + } + + override def afterEach(): Unit = { + assertEmptyOnTimerCalls() + assert(keyValueStore.all.asScala.isEmpty) + keyValueStore.close() + } + + test("one timer") { + val timerCall = OnTimerCall(Time(100), Expire, "key123") + addTimer(timerCall) + timerStore.onWatermark(Watermark(100)) + assertAndClearOnTimerCallbacks(timerCall) + timerStore.onWatermark(Watermark(101)) + assertEmptyOnTimerCalls() + } + + test("two timers same time before onWatermark") { + val timerCall1 = OnTimerCall(Time(100), Expire, "key1") + val timerCall2 = OnTimerCall(Time(100), Expire, "key2") + + addTimer(timerCall1) + addTimer(timerCall2) + + timerStore.onWatermark(Watermark(100)) + assertAndClearOnTimerCallbacks(timerCall1, timerCall2) + } + + test("add timer before current watermark") { + timerStore.onWatermark(Watermark(100)) + + val timerCall = OnTimerCall(Time(50), Expire, "key123") + addTimer(timerCall) + assertAndClearOnTimerCallbacks(timerCall) + + timerStore.onWatermark(Watermark(101)) + assertEmptyOnTimerCalls() + } + + test("foundTimerAfterWatermark") { + val timerCall1 = OnTimerCall(Time(100), Expire, "key1") + val timerCall2 = OnTimerCall(Time(200), Expire, "key2") + + addTimer(timerCall1) + addTimer(timerCall2) + + timerStore.onWatermark(Watermark(150)) + assertAndClearOnTimerCallbacks(timerCall1) + + timerStore.onWatermark(Watermark(250)) + assertAndClearOnTimerCallbacks(timerCall2) + } + + test("exceededMaxTimersFired(2) with hasNext") { + val timerCall1 = OnTimerCall(Time(100), Expire, "key1") + val timerCall2 = OnTimerCall(Time(200), Expire, "key2") + val 
timerCall3 = OnTimerCall(Time(300), Expire, "key3") + + addTimer(timerCall1) + addTimer(timerCall2) + addTimer(timerCall3) + + timerStore.onWatermark(Watermark(400)) + assertAndClearOnTimerCallbacks(timerCall1, timerCall2) + + timerStore.onWatermark(Watermark(401)) + assertAndClearOnTimerCallbacks(timerCall3) + } + + test("exceededMaxTimersFired(2) with no hasNext") { + val timerCall1 = OnTimerCall(Time(100), Expire, "key1") + val timerCall2 = OnTimerCall(Time(200), Expire, "key2") + + addTimer(timerCall1) + addTimer(timerCall2) + + timerStore.onWatermark(Watermark(400)) + assertAndClearOnTimerCallbacks(timerCall1, timerCall2) + + val timerCall3 = OnTimerCall(Time(300), Expire, "key3") + addTimer(timerCall3) + + timerStore.onWatermark(Watermark(401)) + assertAndClearOnTimerCallbacks(timerCall3) + } + + test("onWatermark when no timers") { + timerStore.onWatermark(Watermark(100)) + timerStore.onWatermark(Watermark(200)) + } + + test("init with existing timers") { + val timerCall1 = OnTimerCall(Time(100), Expire, "key1") + addTimer(timerCall1) + + timerStore.onInit() + + timerStore.onWatermark(Watermark(100)) + assertAndClearOnTimerCallbacks(timerCall1) + } + + private def addTimer(timerCall: OnTimerCall): Unit = { + timerStore.addTimer(timerCall.time, timerCall.metadata, timerCall.timerKey) + } + + private def assertAndClearOnTimerCallbacks(expectedTimerCalls: OnTimerCall*): Unit = { + if (onTimerCalls != expectedTimerCalls) { + JsonDiff.jsonDiff(onTimerCalls, expectedTimerCalls) + } + onTimerCalls.clear() + } + + private def onTimer(time: Time, metadata: TimerMetadata, timerKey: TimerKey): Unit = { + onTimerCalls += OnTimerCall(time, metadata, timerKey) + } + + private def assertEmptyOnTimerCalls(): Unit = { + assert(onTimerCalls.isEmpty) + } + + private case class OnTimerCall(time: Time, metadata: TimerMetadata, timerKey: TimerKey) +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServer.scala new file mode 100644 index 0000000000..2907671fca --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServer.scala @@ -0,0 +1,23 @@ +package com.twitter.unittests.integration.async_transformer + +import com.twitter.finatra.kafka.serde.{ScalaSerdes, UnKeyedSerde} +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.processors.FlushingAwareServer +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Produced} + +class WordLookupAsyncServer extends KafkaStreamsTwitterServer with FlushingAwareServer { + + override val name = "wordcount" + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + val supplier = () => new WordLookupAsyncTransformer(streamsStatsReceiver, commitInterval()) + + builder.asScala + .stream("TextLinesTopic")(Consumed.`with`(UnKeyedSerde, Serdes.String)) + .flatMapValues(_.split(' ')) + .transform(supplier) + .to("WordToWordLength")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServerFeatureTest.scala 
b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServerFeatureTest.scala new file mode 100644 index 0000000000..da79ba819a --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServerFeatureTest.scala @@ -0,0 +1,35 @@ +package com.twitter.unittests.integration.async_transformer + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.test.KafkaStreamsFeatureTest +import com.twitter.inject.conversions.time._ +import com.twitter.inject.server.EmbeddedTwitterServer +import com.twitter.util.Try +import org.apache.kafka.common.serialization.Serdes + +class WordLookupAsyncServerFeatureTest extends KafkaStreamsFeatureTest { + + override val server = new EmbeddedTwitterServer( + new WordLookupAsyncServer, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "wordcount-prod") + ) + + private val textLinesTopic = kafkaTopic( + ScalaSerdes.Long, + Serdes.String, + "TextLinesTopic", + logPublish = true, + autoConsume = false + ) + private val wordsWithCountsTopic = kafkaTopic(Serdes.String, Serdes.Long, "WordToWordLength") + + test("word count") { + server.start() + textLinesTopic.publish(1L -> "hello world foo") + wordsWithCountsTopic.consumeExpectedMap(Map("hello" -> 5L, "world" -> 5L, "foo" -> 3L)) + + val otherResults = + Try(wordsWithCountsTopic.consumeMessages(numMessages = 1, 2.seconds)).getOrElse(Seq.empty) + otherResults should equal(Seq.empty) // make sure there are no more results except those we have seen so far! + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServerTopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServerTopologyFeatureTest.scala new file mode 100644 index 0000000000..568533f4f7 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncServerTopologyFeatureTest.scala @@ -0,0 +1,35 @@ +package com.twitter.unittests.integration.async_transformer + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest} +import org.apache.kafka.common.serialization.Serdes +import org.joda.time.DateTime + +class WordLookupAsyncServerTopologyFeatureTest extends TopologyFeatureTest { + + override val topologyTester = FinatraTopologyTester( + "async-server-prod-bob", + new WordLookupAsyncServer, + startingWallClockTime = new DateTime("2018-01-01T00:00:00Z") + ) + + private val textLinesTopic = topologyTester + .topic("TextLinesTopic", ScalaSerdes.Long, Serdes.String) + + private val wordsWithCountsTopic = topologyTester + .topic("WordToWordLength", Serdes.String, Serdes.Long) + + test("word count") { + val messagePublishTime = topologyTester.now + textLinesTopic.pipeInput(1L, "hello") + + // Trigger manual commit which is configured to run every 30 seconds + topologyTester.advanceWallClockTime(30.seconds) + + val result = wordsWithCountsTopic.readOutput() + result.key should equal("hello") + result.value should equal(5) + new DateTime(result.timestamp()) should equal(messagePublishTime) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncTransformer.scala 
b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncTransformer.scala new file mode 100644 index 0000000000..c968d154bd --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/async_transformer/WordLookupAsyncTransformer.scala @@ -0,0 +1,33 @@ +package com.twitter.unittests.integration.async_transformer + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.serde.UnKeyed +import com.twitter.finatra.kafkastreams.processors.{AsyncTransformer, MessageTimestamp} +import com.twitter.util.{Duration, Future} + +class WordLookupAsyncTransformer(statsReceiver: StatsReceiver, commitInterval: Duration) + extends AsyncTransformer[UnKeyed, String, String, Long]( + statsReceiver, + maxOutstandingFuturesPerTask = 10, + flushAsyncRecordsInterval = 1.second, + commitInterval = commitInterval, + flushTimeout = commitInterval + ) { + + override def transformAsync( + key: UnKeyed, + value: String, + timestamp: MessageTimestamp + ): Future[Iterable[(String, Long, MessageTimestamp)]] = { + info(s"transformAsync $key $value") + + for (length <- lookupWordLength(value)) yield { + Seq((value, length, timestamp)) + } + } + + private def lookupWordLength(word: String): Future[Int] = { + Future(word.length) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicks.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicks.scala new file mode 100644 index 0000000000..2887e0c560 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicks.scala @@ -0,0 +1,5 @@ +package com.twitter.unittests.integration.compositesum + +import com.twitter.unittests.integration.compositesum.UserClicksTypes.UserId + +case class UserClicks(userId: UserId, clickType: Int) diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksSerde.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksSerde.scala new file mode 100644 index 0000000000..6a29fe7672 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksSerde.scala @@ -0,0 +1,21 @@ +package com.twitter.unittests.integration.compositesum + +import com.google.common.primitives.Ints +import com.twitter.finatra.kafka.serde.AbstractSerde +import java.nio.ByteBuffer + +object UserClicksSerde extends AbstractSerde[UserClicks] { + override def deserialize(bytes: Array[Byte]): UserClicks = { + val bb = ByteBuffer.wrap(bytes) + val userId = bb.getInt() + val clicks = bb.getInt() + UserClicks(userId = userId, clickType = clicks) + } + + override def serialize(obj: UserClicks): Array[Byte] = { + val bb = ByteBuffer.allocate(Ints.BYTES + Ints.BYTES) + bb.putInt(obj.userId) + bb.putInt(obj.clickType) + bb.array() + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksServer.scala new file mode 100644 index 0000000000..7fcdf0be10 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksServer.scala @@ -0,0 +1,30 @@ +package 
com.twitter.unittests.integration.compositesum + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.dsl.FinatraDslWindowedAggregations +import com.twitter.finatra.streams.transformer.domain.{FixedTimeWindowedSerde, WindowedValueSerde} +import com.twitter.unittests.integration.compositesum.UserClicksTypes.{NumClicksSerde, UserIdSerde} +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Produced} + +class UserClicksServer extends KafkaStreamsTwitterServer with FinatraDslWindowedAggregations { + + override def configureKafkaStreams(streamsBuilder: StreamsBuilder): Unit = { + streamsBuilder.asScala + .stream("userid-to-clicktype")(Consumed.`with`(UserIdSerde, ScalaSerdes.Int)) + .map((userId, clickType) => UserClicks(userId, clickType) -> 1) + .sum( + stateStore = "user-clicks-store", + windowSize = 1.hour, + allowedLateness = 5.minutes, + queryableAfterClose = 1.hour, + emitUpdatedEntriesOnCommit = true, + keySerde = UserClicksSerde + ) + .to("userid-to-hourly-clicks")(Produced.`with`( + FixedTimeWindowedSerde(UserClicksSerde, duration = 1.hour), + WindowedValueSerde(NumClicksSerde))) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksTopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksTopologyFeatureTest.scala new file mode 100644 index 0000000000..a6b3589031 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksTopologyFeatureTest.scala @@ -0,0 +1,97 @@ +package com.twitter.unittests.integration.compositesum + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest} +import com.twitter.finatra.streams.transformer.domain.{ + FixedTimeWindowedSerde, + TimeWindowed, + WindowClosed, + WindowOpen, + WindowedValue, + WindowedValueSerde +} +import com.twitter.unittests.integration.compositesum.UserClicksTypes.{ + ClickTypeSerde, + NumClicksSerde, + UserIdSerde +} +import org.joda.time.DateTime + +class UserClicksTopologyFeatureTest extends TopologyFeatureTest { + + override val topologyTester = FinatraTopologyTester( + kafkaApplicationId = "user-clicks-prod", + server = new UserClicksServer, + startingWallClockTime = new DateTime("2018-01-01T00:00:00Z")) + + private val userIdToClicksTopic = + topologyTester.topic("userid-to-clicktype", UserIdSerde, ClickTypeSerde) + + private val hourlyWordAndCountTopic = + topologyTester.topic( + "userid-to-hourly-clicks", + FixedTimeWindowedSerde(UserClicksSerde, duration = 1.hour), + WindowedValueSerde(NumClicksSerde)) + + test("windowed clicks") { + val userId1 = 1 + val firstHourStartMillis = new DateTime("2018-01-01T00:00:00Z").getMillis + val fifthHourStartMillis = new DateTime("2018-01-01T05:00:00Z").getMillis + + userIdToClicksTopic.pipeInput(userId1, 100) + userIdToClicksTopic.pipeInput(userId1, 200) + userIdToClicksTopic.pipeInput(userId1, 300) + userIdToClicksTopic.pipeInput(userId1, 200) + userIdToClicksTopic.pipeInput(userId1, 300) + userIdToClicksTopic.pipeInput(userId1, 300) + + topologyTester.advanceWallClockTime(30.seconds) + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 100)), + 
WindowedValue(WindowOpen, 1)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 300)), + WindowedValue(WindowOpen, 3)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 200)), + WindowedValue(WindowOpen, 2)) + + userIdToClicksTopic.pipeInput(userId1, 100) + userIdToClicksTopic.pipeInput(userId1, 200) + userIdToClicksTopic.pipeInput(userId1, 300) + + topologyTester.advanceWallClockTime(5.hours) + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 100)), + WindowedValue(WindowOpen, 2)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 300)), + WindowedValue(WindowOpen, 4)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 200)), + WindowedValue(WindowOpen, 3)) + + userIdToClicksTopic.pipeInput(userId1, 1) + topologyTester.advanceWallClockTime(30.seconds) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(fifthHourStartMillis, UserClicks(userId1, clickType = 1)), + WindowedValue(WindowOpen, 1)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 100)), + WindowedValue(WindowClosed, 2)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 200)), + WindowedValue(WindowClosed, 3)) + + hourlyWordAndCountTopic.assertOutput( + TimeWindowed.hourly(firstHourStartMillis, UserClicks(userId1, clickType = 300)), + WindowedValue(WindowClosed, 4)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksTypes.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksTypes.scala new file mode 100644 index 0000000000..a8ba9dd98d --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/compositesum/UserClicksTypes.scala @@ -0,0 +1,15 @@ +package com.twitter.unittests.integration.compositesum + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import org.apache.kafka.common.serialization.Serde + +object UserClicksTypes { + type UserId = Int + val UserIdSerde: Serde[Int] = ScalaSerdes.Int + + type NumClicks = Int + val NumClicksSerde: Serde[Int] = ScalaSerdes.Int + + type ClickType = Int + val ClickTypeSerde: Serde[Int] = ScalaSerdes.Int +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/default_serde/DefaultSerdeWordCountDbServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/default_serde/DefaultSerdeWordCountDbServer.scala new file mode 100644 index 0000000000..7de876fecd --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/default_serde/DefaultSerdeWordCountDbServer.scala @@ -0,0 +1,25 @@ +package com.twitter.unittests.integration.default_serde + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.common.utils.Bytes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Materialized, Produced, Serialized} + +class DefaultSerdeWordCountDbServer extends KafkaStreamsTwitterServer { + + 
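+  // Note: this server intentionally consumes "TextLinesTopic" without explicit serdes so that
+  // the accompanying feature test can verify that the server fails rather than silently
+  // falling back to the default deserializers.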
override val name = "wordcount" + private val countStoreName = "CountsStore" + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder + .stream[Bytes, String]("TextLinesTopic") // Uses default serdes since Consumed.with not specified + .asScala + .flatMapValues(_.split(' ')) + .groupBy((_, word) => word)(Serialized.`with`(Serdes.String, Serdes.String)) + .count()(Materialized.as(countStoreName)) + .toStream + .to("WordsWithCountsTopic")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/default_serde/DefaultSerdeWordCountServerFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/default_serde/DefaultSerdeWordCountServerFeatureTest.scala new file mode 100644 index 0000000000..e324378432 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/default_serde/DefaultSerdeWordCountServerFeatureTest.scala @@ -0,0 +1,35 @@ +package com.twitter.unittests.integration.default_serde + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.test.KafkaStreamsFeatureTest +import com.twitter.inject.server.EmbeddedTwitterServer +import com.twitter.util.Await +import org.apache.kafka.common.serialization.Serdes + +class DefaultSerdeWordCountServerFeatureTest extends KafkaStreamsFeatureTest { + + override val server = new EmbeddedTwitterServer( + new DefaultSerdeWordCountDbServer, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "wordcount-prod") + ) + + private val textLinesTopic = kafkaTopic(ScalaSerdes.Long, Serdes.String, "TextLinesTopic") + private val countsChangelogTopic = kafkaTopic( + Serdes.String, + Serdes.Long, + "wordcount-prod-CountsStore-changelog", + autoCreate = false + ) + private val wordsWithCountsTopic = kafkaTopic(Serdes.String, Serdes.Long, "WordsWithCountsTopic") + + test("word count") { + server.start() + textLinesTopic.publish(1L -> "hello world hello") + Await.result(server.mainResult) + server.injectableServer + .asInstanceOf[KafkaStreamsTwitterServer].uncaughtException.toString should include( + "Default Deserializer's should be avoided " + ) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthFinatraTransformerV2.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthFinatraTransformerV2.scala new file mode 100644 index 0000000000..6fa26b74aa --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthFinatraTransformerV2.scala @@ -0,0 +1,33 @@ +package com.twitter.unittests.integration.finatratransformer + +import com.twitter.conversions.DurationOps._ +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.streams.transformer.domain.{Expire, Time, TimerMetadata} +import com.twitter.finatra.streams.transformer.{FinatraTransformerV2, PersistentTimers} +import com.twitter.unittests.integration.finatratransformer.WordLengthFinatraTransformerV2._ +import com.twitter.util.Duration +import org.apache.kafka.streams.processor.PunctuationType + +object WordLengthFinatraTransformerV2 { + val delayedMessageTime: Duration = 5.seconds +} + +class WordLengthFinatraTransformerV2(statsReceiver: StatsReceiver, timerStoreName: String) + extends 
FinatraTransformerV2[String, String, String, String](statsReceiver) + with PersistentTimers { + + private val timerStore = + getPersistentTimerStore[String](timerStoreName, onEventTimer, PunctuationType.STREAM_TIME) + + override def onMessage(messageTime: Time, key: String, value: String): Unit = { + forward(key, "onMessage " + key + " " + key.length) + + val time = messageTime.plus(delayedMessageTime) + + timerStore.addTimer(time, Expire, key) + } + + private def onEventTimer(time: Time, metadata: TimerMetadata, key: String): Unit = { + forward(key, "onEventTimer " + key + " " + key.length) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthServer.scala new file mode 100644 index 0000000000..98c8c6d26d --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthServer.scala @@ -0,0 +1,32 @@ +package com.twitter.unittests.integration.finatratransformer + +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.streams.transformer.FinatraTransformer +import com.twitter.unittests.integration.finatratransformer.WordLengthServer._ +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Produced} + +object WordLengthServer { + val timerStoreName = "timers" + val stringsAndInputsTopic = "strings-and-inputs" + val StringsAndOutputsTopic = "strings-and-outputs" +} + +class WordLengthServer extends KafkaStreamsTwitterServer { + + override protected def configureKafkaStreams(streamsBuilder: StreamsBuilder): Unit = { + + kafkaStreamsBuilder.addStateStore( + FinatraTransformer.timerStore(timerStoreName, Serdes.String())) + + val transformerSupplier = () => + new WordLengthFinatraTransformerV2(statsReceiver, timerStoreName) + + streamsBuilder.asScala + .stream(stringsAndInputsTopic)( + Consumed.`with`(Serdes.String(), Serdes.String()) + ).transform(transformerSupplier, timerStoreName) + .to(StringsAndOutputsTopic)(Produced.`with`(Serdes.String(), Serdes.String())) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthServerTopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthServerTopologyFeatureTest.scala new file mode 100644 index 0000000000..7fc06b441c --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/finatratransformer/WordLengthServerTopologyFeatureTest.scala @@ -0,0 +1,35 @@ +package com.twitter.unittests.integration.finatratransformer + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest} +import org.apache.kafka.common.serialization.Serdes +import org.joda.time.DateTime + +class WordLengthServerTopologyFeatureTest extends TopologyFeatureTest { + + override val topologyTester = FinatraTopologyTester( + kafkaApplicationId = "test-transformer-prod-alice", + server = new WordLengthServer, + startingWallClockTime = new DateTime("2018-01-01T00:00:00Z") + ) + + private val wordAndCountTopic = + topologyTester.topic(WordLengthServer.stringsAndInputsTopic, Serdes.String(), Serdes.String()) + + private val stringAndCountTopic = + 
topologyTester.topic(WordLengthServer.StringsAndOutputsTopic, Serdes.String(), Serdes.String()) + + test("test inputs get transformed and timers fire") { + wordAndCountTopic.pipeInput("key", "") + stringAndCountTopic.assertOutput("key", "onMessage key " + "key".length) + // advance time + topologyTester.advanceWallClockTime(6.seconds) + // send a message to advance the watermark + wordAndCountTopic.pipeInput("key2", "") + // advance time again to cause the new watermark to get passed through onWatermark + topologyTester.advanceWallClockTime(1.seconds) + + stringAndCountTopic.assertOutput("key2", "onMessage key2 " + "key2".length) + stringAndCountTopic.assertOutput("key", "onEventTimer key " + "key".length) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/globaltable/GlobalTableServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/globaltable/GlobalTableServer.scala new file mode 100644 index 0000000000..d50819b5f3 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/globaltable/GlobalTableServer.scala @@ -0,0 +1,27 @@ +package com.twitter.unittests.integration.globaltable + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import org.apache.kafka.common.utils.Bytes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.Materialized +import org.apache.kafka.streams.state.KeyValueStore + +object GlobalTableServer { + final val GlobalTableTopic = "GlobalTableTopic" +} + +class GlobalTableServer extends KafkaStreamsTwitterServer { + override val name = "globaltable" + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder + .globalTable( + GlobalTableServer.GlobalTableTopic, + Materialized + .as[Int, Int, KeyValueStore[Bytes, Array[Byte]]]("CountsStore") + .withKeySerde(ScalaSerdes.Int) + .withValueSerde(ScalaSerdes.Int) + ) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/globaltable/GlobalTableServerFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/globaltable/GlobalTableServerFeatureTest.scala new file mode 100644 index 0000000000..3fa8438156 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/globaltable/GlobalTableServerFeatureTest.scala @@ -0,0 +1,49 @@ +package com.twitter.unittests.integration.globaltable + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.config.KafkaStreamsConfig +import com.twitter.finatra.kafkastreams.test.KafkaStreamsMultiServerFeatureTest +import com.twitter.inject.server.EmbeddedTwitterServer + +class GlobalTableServerFeatureTest extends KafkaStreamsMultiServerFeatureTest { + + private val globalTableClientIdPatterns = Set("global-consumer", "GlobalStreamThread") + + kafkaTopic(ScalaSerdes.Int, ScalaSerdes.Int, GlobalTableServer.GlobalTableTopic) + + test("verify globalTable metrics included") { + val server = new EmbeddedTwitterServer( + new GlobalTableServer { + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + super + .streamsProperties(config) + .withConfig("includeGlobalTableMetrics", "true") + } + }, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "GlobalTableServer") + ) + + assert(server.gaugeMap.keys.exists { metric => + 
globalTableClientIdPatterns.exists(metric.contains) + }) + server.close() + } + + test("verify globalTable metrics filtered") { + val server = new EmbeddedTwitterServer( + new GlobalTableServer { + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + super + .streamsProperties(config) + .withConfig("includeGlobalTableMetrics", "false") + } + }, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "GlobalTableServer") + ) + + assert(!server.gaugeMap.keys.exists { metric => + globalTableClientIdPatterns.exists(metric.contains) + }) + server.close() + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/sampling/SamplingServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/sampling/SamplingServer.scala new file mode 100644 index 0000000000..c5c32a43dd --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/sampling/SamplingServer.scala @@ -0,0 +1,68 @@ +package com.twitter.unittests.integration.sampling + +import com.twitter.conversions.DurationOps._ +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.config.FinatraRocksDBConfig.{ + RocksDbBlockCacheSizeConfig, + RocksDbEnableStatistics, + RocksDbLZ4Config +} +import com.twitter.finatra.kafkastreams.config.{FinatraRocksDBConfig, KafkaStreamsConfig} +import com.twitter.finatra.kafkastreams.dsl.FinatraDslSampling +import com.twitter.finatra.streams.flags.FinatraTransformerFlags.{ + AutoWatermarkInterval, + EmitWatermarkPerMessage +} +import com.twitter.finatra.streams.flags.RocksDbFlags +import com.twitter.unittests.integration.sampling.SamplingServer._ +import org.apache.kafka.common.record.CompressionType +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream._ + +object SamplingServer { + val tweetToImpressingUserTopic = "tweet-id-to-impressing-user-id" + val sampleName = "TweetImpressors" +} + +class SamplingServer extends KafkaStreamsTwitterServer with RocksDbFlags with FinatraDslSampling { + + override def configureKafkaStreams(streamsBuilder: StreamsBuilder): Unit = { + streamsBuilder.asScala + .stream(topic = tweetToImpressingUserTopic)(Consumed + .`with`(ScalaSerdes.Long, ScalaSerdes.Long)) + .sample( + toSampleKey = (tweetId, _) => tweetId, + toSampleValue = (_, impressorId) => impressorId, + sampleSize = 5, + expirationTime = Some(1.minute), + sampleName = sampleName, + sampleKeySerde = ScalaSerdes.Long, + sampleValueSerde = ScalaSerdes.Long + ) + } + + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + super + .streamsProperties(config) + .retries(60) + .retryBackoff(1.second) + .rocksDbConfigSetter[FinatraRocksDBConfig] + .withConfig(RocksDbBlockCacheSizeConfig, rocksDbCountsStoreBlockCacheSize()) + .withConfig(RocksDbEnableStatistics, rocksDbEnableStatistics().toString) + .withConfig(RocksDbLZ4Config, rocksDbEnableLZ4().toString) + .withConfig(AutoWatermarkInterval, autoWatermarkIntervalFlag().toString) + .withConfig(EmitWatermarkPerMessage, emitWatermarkPerMessageFlag().toString) + .consumer.sessionTimeout(10.seconds) + .consumer.heartbeatInterval(1.second) + .producer.retries(300) + .producer.retryBackoff(1.second) + .producer.requestTimeout(2.minutes) + .producer.transactionTimeout(2.minutes) + .producer.compressionType(CompressionType.LZ4) + 
.producer.batchSize(500.kilobytes) + .producer.bufferMemorySize(256.megabytes) + .producer.linger(10.seconds) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/sampling/SamplingServerTopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/sampling/SamplingServerTopologyFeatureTest.scala new file mode 100644 index 0000000000..d95341056f --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/sampling/SamplingServerTopologyFeatureTest.scala @@ -0,0 +1,93 @@ +package com.twitter.unittests.integration.sampling + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest} +import com.twitter.finatra.streams.transformer.domain.IndexedSampleKey +import com.twitter.finatra.streams.transformer.{IteratorImplicits, SamplingUtils} +import org.apache.kafka.streams.state.KeyValueStore +import org.joda.time.DateTime + +class SamplingServerTopologyFeatureTest extends TopologyFeatureTest with IteratorImplicits{ + + override val topologyTester = FinatraTopologyTester( + kafkaApplicationId = "sampling-server-prod-alice", + server = new SamplingServer, + startingWallClockTime = new DateTime("2018-01-01T00:00:00Z") + ) + + private val tweetIdToImpressingUserId = + topologyTester.topic(SamplingServer.tweetToImpressingUserTopic, ScalaSerdes.Long, ScalaSerdes.Long) + + private var countStore: KeyValueStore[Long, Long] = _ + + private var sampleStore: KeyValueStore[IndexedSampleKey[Long], Long] = _ + + + override def beforeEach(): Unit = { + super.beforeEach() + + countStore = topologyTester.driver.getKeyValueStore[Long, Long](SamplingUtils.getNumCountsStoreName(SamplingServer.sampleName)) + sampleStore = topologyTester.driver.getKeyValueStore[IndexedSampleKey[Long], Long](SamplingUtils.getSampleStoreName(SamplingServer.sampleName)) + } + + test("test that a sample does what you want") { + val tweetId = 1 + + publishImpression(tweetId, userId = 1) + countStore.get(tweetId) should equal(1) + assertSampleSize(tweetId, 1) + assertSampleEquals(tweetId, Set(1)) + + publishImpression(tweetId, userId = 2) + countStore.get(tweetId) should equal(2) + assertSampleSize(tweetId, 2) + assertSampleEquals(tweetId, Set(1, 2)) + + publishImpression(tweetId, userId = 3) + countStore.get(tweetId) should equal(3) + assertSampleSize(tweetId, 3) + assertSampleEquals(tweetId, Set(1, 2, 3)) + + publishImpression(tweetId, userId = 4) + countStore.get(tweetId) should equal(4) + assertSampleSize(tweetId, 4) + assertSampleEquals(tweetId, Set(1, 2, 3, 4)) + + publishImpression(tweetId, userId = 5) + countStore.get(tweetId) should equal(5) + assertSampleSize(tweetId, 5) + assertSampleEquals(tweetId, Set(1, 2, 3, 4, 5)) + + publishImpression(tweetId, userId = 6) + countStore.get(tweetId) should equal(6) + assertSampleSize(tweetId, 5) + + // advance time and verify that the sample is thrown away. 
+ topologyTester.advanceWallClockTime(2.minutes) + publishImpression(tweetId = 666, userId = 666) + topologyTester.stats.assertCounter("kafka/stream/numExpired", 0) + topologyTester.advanceWallClockTime(2.minutes) + topologyTester.stats.assertCounter("kafka/stream/numExpired", 1) + assert(countStore.get(tweetId) == 0) + assertSampleSize(tweetId, 0) + } + + private def assertSampleSize(tweetId: Int, expectedSize: Int): Unit = { + val range = sampleStore.range(IndexedSampleKey(tweetId, 0), IndexedSampleKey(tweetId, Int.MaxValue)) + range.values.toSet.size should be(expectedSize) + } + + private def assertSampleEquals(tweetId: Int, expectedSample: Set[Int]): Unit = { + val range = sampleStore.range(IndexedSampleKey(tweetId, 0), IndexedSampleKey(tweetId, Int.MaxValue)) + range.values.toSet should be(expectedSample) + } + + private def publishImpression( + tweetId: Long, + userId: Long, + publishTime: DateTime = DateTime.now + ): Unit = { + tweetIdToImpressingUserId.pipeInput(tweetId, userId) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/stateless/VerifyFailureServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/stateless/VerifyFailureServer.scala new file mode 100644 index 0000000000..06148bf77d --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/stateless/VerifyFailureServer.scala @@ -0,0 +1,21 @@ +package com.twitter.unittests.integration.stateless + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.StatelessKafkaStreamsTwitterServer +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Materialized, Produced, Serialized} + +class VerifyFailureServer extends StatelessKafkaStreamsTwitterServer { + + override val name = "stateless" + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder.asScala + .stream("TextLinesTopic")(Consumed.`with`(Serdes.Bytes, Serdes.String)) + .flatMapValues(_.split(' ')) + .groupBy((_, word) => word)(Serialized.`with`(Serdes.String, Serdes.String)) + .count()(Materialized.as("CountsStore")) + .toStream + .to("WordsWithCountsTopic")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/stateless/VerifyFailureServerFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/stateless/VerifyFailureServerFeatureTest.scala new file mode 100644 index 0000000000..4dd8b95599 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/stateless/VerifyFailureServerFeatureTest.scala @@ -0,0 +1,25 @@ +package com.twitter.unittests.integration.stateless + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.test.KafkaStreamsMultiServerFeatureTest +import com.twitter.inject.server.EmbeddedTwitterServer +import org.apache.kafka.common.serialization.Serdes + +class VerifyFailureServerFeatureTest extends KafkaStreamsMultiServerFeatureTest { + + kafkaTopic(ScalaSerdes.Long, Serdes.String, "TextLinesTopic") + kafkaTopic(Serdes.String, Serdes.Long, "WordsWithCountsTopic") + + test("verify stateful server will fail") { + val server = new EmbeddedTwitterServer( + new VerifyFailureServer, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "VerifyFailureServer") + ) + + 
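+    // VerifyFailureServer builds a stateful topology (count() backs the "CountsStore" state
+    // store), which a StatelessKafkaStreamsTwitterServer is expected to reject at startup.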
intercept[UnsupportedOperationException] { + server.start() + } + + server.close() + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/window/WindowedTweetWordCountServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/window/WindowedTweetWordCountServer.scala new file mode 100644 index 0000000000..decbfb29ed --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/window/WindowedTweetWordCountServer.scala @@ -0,0 +1,32 @@ +package com.twitter.unittests.integration.window + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import com.twitter.finatra.kafkastreams.dsl.FinatraDslWindowedAggregations +import com.twitter.finatra.streams.transformer.domain._ +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Produced} + +class WindowedTweetWordCountServer + extends KafkaStreamsTwitterServer + with FinatraDslWindowedAggregations { + + private val countStoreName = "CountsStore" + + override protected def configureKafkaStreams(streamsBuilder: StreamsBuilder): Unit = { + streamsBuilder.asScala + .stream("word-and-count")(Consumed.`with`(Serdes.String(), ScalaSerdes.Int)) + .sum( + stateStore = countStoreName, + windowSize = windowSize(), + allowedLateness = 5.minutes, + queryableAfterClose = 1.hour, + keySerde = Serdes.String()) + .to("word-to-hourly-counts")( + Produced.`with`( + FixedTimeWindowedSerde(Serdes.String, duration = windowSize()), + WindowedValueSerde(ScalaSerdes.Int))) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/window/WindowedTweetWordCountServerTopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/window/WindowedTweetWordCountServerTopologyFeatureTest.scala new file mode 100644 index 0000000000..06650991db --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/window/WindowedTweetWordCountServerTopologyFeatureTest.scala @@ -0,0 +1,51 @@ +package com.twitter.unittests.integration.window + +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.streams.query.QueryableFinatraWindowStore +import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest} +import com.twitter.finatra.streams.transformer.domain.WindowedValueSerde +import org.apache.kafka.common.serialization.Serdes +import org.joda.time.DateTime + +class WindowedTweetWordCountServerTopologyFeatureTest extends TopologyFeatureTest { + + override val topologyTester = FinatraTopologyTester( + kafkaApplicationId = "windowed-wordcount-prod-alice", + server = new WindowedTweetWordCountServer, + startingWallClockTime = new DateTime("2018-01-01T00:00:00Z") + ) + + private val wordAndCountTopic = + topologyTester.topic("word-and-count", Serdes.String(), ScalaSerdes.Int) + + private val hourlyWordAndCountTopic = + topologyTester.topic( + "word-to-hourly-counts", + Serdes.String, + WindowedValueSerde(ScalaSerdes.Int)) + + test("windowed word count test 1") { + val countStore = + topologyTester + .queryableFinatraWindowStore[String, Int]("CountsStore", 1.hour, Serdes.String()) + + wordAndCountTopic.pipeInput("bob", 1) + assertCurrentHourContains(countStore, "bob", 1) + 
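+    // Piping a second record for "bob" within the same hour adds to the existing windowed sum.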
wordAndCountTopic.pipeInput("bob", 1) + assertCurrentHourContains(countStore, "bob", 2) + wordAndCountTopic.pipeInput("alice", 1) + assertCurrentHourContains(countStore, "bob", 2) + assertCurrentHourContains(countStore, "alice", 1) + } + + private def assertCurrentHourContains( + countStore: QueryableFinatraWindowStore[String, Int], + key: String, + expectedValue: Int + ): Unit = { + val currentHourOpt = Some(topologyTester.currentHour.getMillis) + countStore.get(key, startTime = currentHourOpt, endTime = currentHourOpt) should + equal(Map(topologyTester.currentHour.getMillis -> expectedValue)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountRocksDbServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountRocksDbServer.scala new file mode 100644 index 0000000000..79728e54e7 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountRocksDbServer.scala @@ -0,0 +1,24 @@ +package com.twitter.unittests.integration.wordcount + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.common.utils.Bytes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Materialized, Produced, Serialized} + +class WordCountRocksDbServer extends KafkaStreamsTwitterServer { + + override val name = "wordcount" + private val countStoreName = "CountsStore" + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder.asScala + .stream[Bytes, String]("TextLinesTopic")(Consumed.`with`(Serdes.Bytes, Serdes.String)) + .flatMapValues(_.split(' ')) + .groupBy((_, word) => word)(Serialized.`with`(Serdes.String, Serdes.String)) + .count()(Materialized.as(countStoreName)) + .toStream + .to("WordsWithCountsTopic")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountServerFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountServerFeatureTest.scala new file mode 100644 index 0000000000..e4bca39e69 --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountServerFeatureTest.scala @@ -0,0 +1,155 @@ +package com.twitter.unittests.integration.wordcount + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafka.test.utils.InMemoryStatsUtil +import com.twitter.finatra.kafkastreams.config.KafkaStreamsConfig +import com.twitter.finatra.kafkastreams.internal.stats.KafkaStreamsFinagleMetricsReporter +import com.twitter.finatra.kafkastreams.test.KafkaStreamsMultiServerFeatureTest +import com.twitter.inject.server.EmbeddedTwitterServer +import com.twitter.util.Await +import org.apache.kafka.common.metrics.Sensor.RecordingLevel +import org.apache.kafka.common.serialization.Serdes + +class WordCountServerFeatureTest extends KafkaStreamsMultiServerFeatureTest { + + private def createServer( + recordingLevel: RecordingLevel = RecordingLevel.INFO + ): EmbeddedTwitterServer = { + new EmbeddedTwitterServer( + new WordCountRocksDbServer { + override def streamsProperties(config: KafkaStreamsConfig): KafkaStreamsConfig = { + super + .streamsProperties(config) + 
.metricsRecordingLevelConfig(recordingLevel) + } + }, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "wordcount-prod") + ) + } + + override def beforeEach(): Unit = { + resetStreamThreadId() + } + + private val textLinesTopic = kafkaTopic(ScalaSerdes.Long, Serdes.String, "TextLinesTopic") + private val countsChangelogTopic = kafkaTopic( + Serdes.String, + Serdes.Long, + "wordcount-prod-CountsStore-changelog", + autoCreate = false + ) + private val wordsWithCountsTopic = kafkaTopic(Serdes.String, Serdes.Long, "WordsWithCountsTopic") + + test("word count") { + val serverBeforeRestart = createServer() + val serverBeforeRestartStats = InMemoryStatsUtil(serverBeforeRestart.injector) + serverBeforeRestart.start() + + textLinesTopic.publish(1L -> "hello world hello") + serverBeforeRestartStats.waitForGauge( + "kafka/thread1/consumer/TextLinesTopic/records_consumed_total", + 1 + ) + wordsWithCountsTopic.consumeAsManyMessagesUntilMap(Map("world" -> 1L, "hello" -> 2L)) + serverBeforeRestart.assertGauge( + "kafka/thread1/producer/wordcount_prod_CountsStore_changelog/record_send_total", + 2 + ) + serverBeforeRestart.assertGauge( + "kafka/thread1/producer/WordsWithCountsTopic/record_send_total", + 2 + ) + + textLinesTopic.publish(1L -> "world world") + serverBeforeRestartStats.waitForGauge( + "kafka/thread1/consumer/TextLinesTopic/records_consumed_total", + 2 + ) + wordsWithCountsTopic.consumeAsManyMessagesUntilMap(Map("world" -> 3L)) + serverBeforeRestart.assertGauge( + "kafka/thread1/producer/wordcount_prod_CountsStore_changelog/record_send_total", + 3 + ) + serverBeforeRestart.assertGauge( + "kafka/thread1/producer/WordsWithCountsTopic/record_send_total", + 3 + ) + + serverBeforeRestart.assertGauge("kafka/stream/state", 2) + assert(countsChangelogTopic.consumeValue() > 0) + assert( + serverBeforeRestart + .getGauge("kafka/thread1/producer/wordcount_prod_CountsStore_changelog/byte_total") > 0 + ) + assertTimeSincePublishedSet( + serverBeforeRestart, + "kafka/consumer/TextLinesTopic/time_since_record_published_ms" + ) + assertTimeSincePublishedSet( + serverBeforeRestart, + "kafka/consumer/wordcount-prod-CountsStore-repartition/time_since_record_published_ms" + ) + + serverBeforeRestart.printStats() + serverBeforeRestart.close() + Await.result(serverBeforeRestart.mainResult) + + val serverAfterRestart = createServer() + serverAfterRestart.start() + + textLinesTopic.publish(1L -> "world world") + wordsWithCountsTopic.consumeAsManyMessagesUntilMap(Map("world" -> 5L)) + + // Why isn't the records_consumed_total stat > 0 in the feature test? 
+ // val serverAfterRestartStats = new InMemoryStatsUtil(serverAfterRestart.injector) + // serverAfterRestartStats.waitForGaugeUntil("kafka/thread2/restore_consumer/records_consumed_total", _ > 0) + serverAfterRestart.getStat( + "kafka/consumer/wordcount-prod-CountsStore-changelog/time_since_record_published_ms" + ) should equal(Seq()) + serverAfterRestart.getStat( + "kafka/consumer/wordcount-prod-CountsStore-changelog/time_since_record_timestamp_ms" + ) should equal(Seq()) + serverAfterRestart.close() + Await.result(serverAfterRestart.mainResult) + } + + test("ensure debug metrics not included if RecordingLevel is not DEBUG") { + val debugServer = createServer(recordingLevel = RecordingLevel.DEBUG) + debugServer.start() + val debugServerStats = InMemoryStatsUtil(debugServer.injector) + + textLinesTopic.publish(1L -> "hello world hello") + debugServerStats.waitForGauge("kafka/thread1/consumer/TextLinesTopic/records_consumed_total", 1) + val debugServerMetricNames = debugServerStats.metricNames + + debugServer.close() + resetStreamThreadId() + + val infoServer = createServer(recordingLevel = RecordingLevel.INFO) + infoServer.start() + val infoServerStats = InMemoryStatsUtil(infoServer.injector) + + textLinesTopic.publish(1L -> "hello world hello") + infoServerStats.waitForGauge("kafka/thread1/consumer/TextLinesTopic/records_consumed_total", 1) + val infoServerMetricNames = infoServerStats.metricNames + + infoServer.close() + + // `infoServer` should not contain DEBUG-level ''rocksdb'' metrics + assert(debugServerMetricNames.exists(_.contains("rocksdb"))) + assert(!infoServerMetricNames.exists(_.contains("rocksdb"))) + assert(!infoServerMetricNames.exists { metric => + KafkaStreamsFinagleMetricsReporter.debugMetrics.exists(metric.endsWith) + }) + assert(debugServerMetricNames.size > infoServerMetricNames.size) + } + + /* Private */ + + private def assertTimeSincePublishedSet(server: EmbeddedTwitterServer, topic: String): Unit = { + assert( + server.getStat(topic).nonEmpty && + server.getStat(topic).forall(_ >= 0) + ) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountServerTopologyFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountServerTopologyFeatureTest.scala new file mode 100644 index 0000000000..c189d8b8bd --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount/WordCountServerTopologyFeatureTest.scala @@ -0,0 +1,46 @@ +package com.twitter.unittests.integration.wordcount + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.streams.tests.{FinatraTopologyTester, TopologyFeatureTest} +import org.apache.kafka.common.serialization.Serdes +import org.joda.time.DateTime + +class WordCountServerTopologyFeatureTest extends TopologyFeatureTest { + + override val topologyTester = FinatraTopologyTester( + kafkaApplicationId = "wordcount-prod-bob", + server = new WordCountRocksDbServer, + startingWallClockTime = new DateTime("2018-01-01T00:00:00Z") + ) + + private val textLinesTopic = + topologyTester.topic("TextLinesTopic", Serdes.ByteArray(), Serdes.String) + + private val wordsWithCountsTopic = + topologyTester.topic("WordsWithCountsTopic", Serdes.String, ScalaSerdes.Long) + + test("word count test 1") { + val countsStore = topologyTester.getKeyValueStore[String, Long]("CountsStore") + + textLinesTopic.pipeInput(Array.emptyByteArray, "Hello World Hello") + + 
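+    // The input line is split into three words and each word updates the running count, so
+    // three output records are expected: ("Hello", 1), ("World", 1), then ("Hello", 2).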
wordsWithCountsTopic.assertOutput("Hello", 1) + wordsWithCountsTopic.assertOutput("World", 1) + wordsWithCountsTopic.assertOutput("Hello", 2) + + countsStore.get("Hello") should equal(2) + countsStore.get("World") should equal(1) + } + + test("word count test 2") { + val countsStore = topologyTester.getKeyValueStore[String, Long]("CountsStore") + + textLinesTopic.pipeInput(Array.emptyByteArray, "yo yo yo") + + wordsWithCountsTopic.assertOutput("yo", 1) + wordsWithCountsTopic.assertOutput("yo", 2) + wordsWithCountsTopic.assertOutput("yo", 3) + + countsStore.get("yo") should equal(3) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount_in_memory/WordCountInMemoryServer.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount_in_memory/WordCountInMemoryServer.scala new file mode 100644 index 0000000000..0a4dee1dab --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount_in_memory/WordCountInMemoryServer.scala @@ -0,0 +1,35 @@ +package com.twitter.unittests.integration.wordcount_in_memory + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.KafkaStreamsTwitterServer +import org.apache.kafka.common.serialization.Serdes +import org.apache.kafka.streams.StreamsBuilder +import org.apache.kafka.streams.kstream.{Consumed, Materialized, Produced, Serialized} +import org.apache.kafka.streams.state.Stores + +object WordCountInMemoryServerMain extends WordCountInMemoryServer + +class WordCountInMemoryServer extends KafkaStreamsTwitterServer { + + override val name = "wordcount" + private val countStoreName = "CountsStore" + + override protected def configureKafkaStreams(builder: StreamsBuilder): Unit = { + builder.asScala + .stream("TextLinesTopic")(Consumed.`with`(Serdes.Bytes, Serdes.String)) + .flatMapValues(_.split(' ')) + .groupBy((_, word) => word)(Serialized.`with`(Serdes.String, Serdes.String)) + .count()( + Materialized + .as( + Stores + .inMemoryKeyValueStore(countStoreName) + ) + .withKeySerde(Serdes.String()) + .withValueSerde(ScalaSerdes.Long) + .withCachingDisabled() + ) + .toStream + .to("WordsWithCountsTopic")(Produced.`with`(Serdes.String, ScalaSerdes.Long)) + } +} diff --git a/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount_in_memory/WordCountInMemoryServerFeatureTest.scala b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount_in_memory/WordCountInMemoryServerFeatureTest.scala new file mode 100644 index 0000000000..e9a1baac0a --- /dev/null +++ b/kafka-streams/kafka-streams/src/test/scala/com/twitter/unittests/integration/wordcount_in_memory/WordCountInMemoryServerFeatureTest.scala @@ -0,0 +1,49 @@ +package com.twitter.unittests.integration.wordcount_in_memory + +import com.twitter.finatra.kafka.serde.ScalaSerdes +import com.twitter.finatra.kafkastreams.test.KafkaStreamsFeatureTest +import com.twitter.inject.server.EmbeddedTwitterServer +import org.apache.kafka.common.serialization.Serdes + +class WordCountInMemoryServerFeatureTest extends KafkaStreamsFeatureTest { + + override val server = new EmbeddedTwitterServer( + new WordCountInMemoryServer, + flags = kafkaStreamsFlags ++ Map("kafka.application.id" -> "wordcount-prod") + ) + + private val textLinesTopic = kafkaTopic(ScalaSerdes.Long, Serdes.String, "TextLinesTopic") + private val wordsWithCountsTopic = kafkaTopic(Serdes.String, Serdes.Long, "WordsWithCountsTopic") + + 
test("word count") { + server.start() + + textLinesTopic.publish(1L -> "hello world hello") + waitForKafkaMetric("kafka/thread1/consumer/TextLinesTopic/records_consumed_total", 1) + wordsWithCountsTopic.consumeMessages(numMessages = 3) should contain theSameElementsAs Seq( + "world" -> 1, + "hello" -> 1, + "hello" -> 2 + ) + server.assertGauge( + "kafka/thread1/producer/wordcount_prod_CountsStore_changelog/record_send_total", + 3 + ) + server.assertGauge("kafka/thread1/producer/WordsWithCountsTopic/record_send_total", 3) + + textLinesTopic.publish(1L -> "world world") + waitForKafkaMetric("kafka/thread1/consumer/TextLinesTopic/records_consumed_total", 2) + wordsWithCountsTopic.consumeMessages(numMessages = 2) should contain theSameElementsAs Seq( + "world" -> 2, + "world" -> 3 + ) + server.assertGauge( + "kafka/thread1/producer/wordcount_prod_CountsStore_changelog/record_send_total", + 5 + ) + server.assertGauge("kafka/thread1/producer/WordsWithCountsTopic/record_send_total", 5) + + server.assertGauge("kafka/stream/state", 2) + server.printStats() + } +} diff --git a/kafka/PROJECT b/kafka/PROJECT new file mode 100644 index 0000000000..c57fb17484 --- /dev/null +++ b/kafka/PROJECT @@ -0,0 +1,7 @@ +owners: + - messaging-group:ldap + - scosenza + - dbress + - adams +watchers: + - ds-messaging@twitter.com diff --git a/kafka/src/main/java/BUILD b/kafka/src/main/java/BUILD new file mode 100644 index 0000000000..cd1f7ca1ec --- /dev/null +++ b/kafka/src/main/java/BUILD @@ -0,0 +1,13 @@ +java_library( + sources = rglobs("*.java"), + compiler_option_sets = {}, + provides = artifact( + org = "com.twitter", + name = "finatra-kafka-java", + repo = artifactory, + ), + dependencies = [ + ], + exports = [ + ], +) diff --git a/kafka/src/main/java/com/twitter/finatra/kafka/domain/AckMode.java b/kafka/src/main/java/com/twitter/finatra/kafka/domain/AckMode.java new file mode 100644 index 0000000000..6d82d3f260 --- /dev/null +++ b/kafka/src/main/java/com/twitter/finatra/kafka/domain/AckMode.java @@ -0,0 +1,18 @@ +package com.twitter.finatra.kafka.domain; + +public enum AckMode { + ALL("all"), + ONE("1"), + ZERO("0"); + + private String value; + + AckMode(String value) { + this.value = value; + } + + @Override + public String toString() { + return value; + } +} diff --git a/kafka/src/main/java/com/twitter/finatra/kafka/domain/IsolationLevel.java b/kafka/src/main/java/com/twitter/finatra/kafka/domain/IsolationLevel.java new file mode 100644 index 0000000000..90a1ed8d2f --- /dev/null +++ b/kafka/src/main/java/com/twitter/finatra/kafka/domain/IsolationLevel.java @@ -0,0 +1,27 @@ +package com.twitter.finatra.kafka.domain; + +/** + * Controls how to read messages written transactionally. + * + * If set to read_committed, consumer.poll() will only return transactional messages + * if they have been committed. + * If set to read_uncommitted (the default), consumer.poll() will return all messages, + * even transactional messages which have been aborted. + * + * Non-transactional messages will be returned unconditionally in either mode. 
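+ *
+ * The enum's {@code toString()} returns the raw value expected by the Kafka consumer's
+ * {@code isolation.level} config, e.g. {@code READ_COMMITTED.toString()} yields
+ * {@code "read_committed"}.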
+ */ +public enum IsolationLevel { + READ_UNCOMMITTED("read_uncommitted"), + READ_COMMITTED("read_committed"); + + private String value; + + IsolationLevel(String value) { + this.value = value; + } + + @Override + public String toString() { + return value; + } +} diff --git a/kafka/src/main/java/com/twitter/finatra/kafka/domain/SeekStrategy.java b/kafka/src/main/java/com/twitter/finatra/kafka/domain/SeekStrategy.java new file mode 100644 index 0000000000..6561c5ac81 --- /dev/null +++ b/kafka/src/main/java/com/twitter/finatra/kafka/domain/SeekStrategy.java @@ -0,0 +1,5 @@ +package com.twitter.finatra.kafka.domain; + +public enum SeekStrategy { + BEGINNING, RESUME, REWIND, END +} diff --git a/kafka/src/main/scala/BUILD b/kafka/src/main/scala/BUILD new file mode 100644 index 0000000000..ad9c8ac36d --- /dev/null +++ b/kafka/src/main/scala/BUILD @@ -0,0 +1,31 @@ +scala_library( + sources = rglobs("*.scala"), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-kafka", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finagle/finagle-core/src/main/scala", + "finatra/inject/inject-core", + "finatra/inject/inject-slf4j", + "finatra/inject/inject-utils", + "finatra/kafka/src/main/java", + "finatra/utils", + "scrooge/scrooge-serializer/src/main/scala", + "util/util-codec/src/main/scala", + ], + exports = [ + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "finagle/finagle-core/src/main/scala", + "finatra/inject/inject-core", + "finatra/inject/inject-slf4j", + "finatra/inject/inject-utils", + "finatra/utils", + "scrooge/scrooge-serializer/src/main/scala", + "util/util-codec/src/main/scala", + ], +) diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/config/KafkaConfig.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/config/KafkaConfig.scala new file mode 100644 index 0000000000..24ec5d7a20 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/config/KafkaConfig.scala @@ -0,0 +1,85 @@ +package com.twitter.finatra.kafka.config + +import com.twitter.util.{Duration, StorageUnit} +import java.util.Properties + +/** + * Base trait for everything Kafka config related. + * Kafka's configuration eventually ends up in a + * java.util.Properties (see ToKafkaProperties below). + * + * We keep it in a Map[String, String] for convenience + * until the last possible moment. + */ +trait KafkaConfig { + protected def configMap: Map[String, String] +} + +/** + * Base trait for making builders that set kafka config. Gives you helpers + * for setting the config with values of different types: + * - Time + * - StorageUnit + * - class name + * + * If your builder would be useful as part of another builder, + * implement your methods in a method trait that extends KafkaConfigMethods, + * so that other builders can include you. + * See KafkaProducerConfigMethods and FinagleKafkaConsumerBuilderMethods + * for examples of this pattern. + * + * @tparam Self The type of your concrete builder. This lets all the convenience + * methods here and all the methods defined in intermediate traits + * return that type. + */ +trait KafkaConfigMethods[Self] extends KafkaConfig { + type This = Self + + /** + * Override this in your concrete builder with a copy constructor for that + * builder that replaces the old configMap with a modified one. 
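+   *
+   * For example, a hypothetical builder backed by a single config map might implement this as:
+   * {{{
+   *   case class MyClientBuilder(configMap: Map[String, String] = Map.empty)
+   *     extends KafkaConfigMethods[MyClientBuilder] {
+   *     override protected def fromConfigMap(configMap: Map[String, String]): MyClientBuilder =
+   *       copy(configMap = configMap)
+   *   }
+   * }}}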
+ */ + protected def fromConfigMap(configMap: Map[String, String]): This + + def withConfig(key: String, value: String): This = + fromConfigMap(configMap + (key -> value)) + + def withConfig(key: String, value: Duration): This = { + fromConfigMap(configMap + (key -> value.inMilliseconds.toString)) + } + + def withConfig(key: String, value: StorageUnit): This = { + fromConfigMap(configMap + (key -> value.bytes.toString)) + } + + protected def withClassName[T: Manifest](key: String): This = { + fromConfigMap(configMap + (key -> manifest[T].runtimeClass.getName)) + } + + protected def withClassNameBuilder[T: Manifest](key: String): This = { + val className = manifest[T].runtimeClass.getName + val classes = configMap.get(key) match { + case Some(classNameValues) => s"$classNameValues,$className" + case _ => className + } + fromConfigMap(configMap + (key -> classes)) + } +} + +/** + * Extend in your concrete configuration object so that the configMap + * can be converted to java.util.Properties. + * + * See KafkaProducerConfig and FinagleKafkaConsumerConfig + * for examples of this pattern. + */ +trait ToKafkaProperties { self: KafkaConfig => + def properties: Properties = { + val p = new Properties + configMap.foreach { + case (k, v) => + p.setProperty(k, v) + } + p + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/AsyncStreamKafkaConsumerBuilder.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/AsyncStreamKafkaConsumerBuilder.scala new file mode 100644 index 0000000000..9c44641004 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/AsyncStreamKafkaConsumerBuilder.scala @@ -0,0 +1,50 @@ +package com.twitter.finatra.kafka.consumers + +import com.twitter.concurrent.AsyncStream +import com.twitter.finatra.kafka.domain.KafkaTopic +import com.twitter.util.{Future, Time} +import org.apache.kafka.clients.consumer.ConsumerRecord +import scala.collection.JavaConverters._ + +case class AsyncStreamKafkaConsumerBuilder[K, V]( + asyncStreamConsumerConfig: AsyncStreamKafkaConsumerConfig[K, V] = + AsyncStreamKafkaConsumerConfig[K, V]()) + extends FinagleKafkaConsumerBuilderMethods[K, V, AsyncStreamKafkaConsumerBuilder[K, V]] { + override protected def fromFinagleConsumerConfig(config: FinagleKafkaConsumerConfig[K, V]): This = + AsyncStreamKafkaConsumerBuilder( + asyncStreamConsumerConfig.copy(finagleKafkaConsumerConfig = config) + ) + + override protected def finagleConsumerConfig: FinagleKafkaConsumerConfig[K, V] = + asyncStreamConsumerConfig.finagleKafkaConsumerConfig + + def topics(topics: Set[KafkaTopic]): This = + AsyncStreamKafkaConsumerBuilder(asyncStreamConsumerConfig.copy(topics = topics)) + + def subscribe(): ClosableAsyncStream[ConsumerRecord[K, V]] = { + new ClosableAsyncStream[ConsumerRecord[K, V]] { + + private val consumer = build() + + consumer.subscribe(asyncStreamConsumerConfig.topics) + + def asyncStream(): AsyncStream[ConsumerRecord[K, V]] = { + val futureConsumerRecords: Future[Seq[ConsumerRecord[K, V]]] = consumer + .poll(asyncStreamConsumerConfig.finagleKafkaConsumerConfig.pollTimeout).map( + _.iterator().asScala.toSeq + ) + val asyncStreamOfSeqs: AsyncStream[Seq[ConsumerRecord[K, V]]] = + AsyncStream.fromFuture(futureConsumerRecords) + asyncStreamOfSeqs.flatMap(AsyncStream.fromSeq) ++ asyncStream() + } + + override def close(deadline: Time): Future[Unit] = { + consumer.close(deadline) + } + } + } +} + +case class AsyncStreamKafkaConsumerConfig[K, V]( + finagleKafkaConsumerConfig: FinagleKafkaConsumerConfig[K, V] = 
FinagleKafkaConsumerConfig[K, V](), + topics: Set[KafkaTopic] = Set.empty) diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/ClosableAsyncStream.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/ClosableAsyncStream.scala new file mode 100644 index 0000000000..7ffc667b9d --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/ClosableAsyncStream.scala @@ -0,0 +1,8 @@ +package com.twitter.finatra.kafka.consumers + +import com.twitter.concurrent.AsyncStream +import com.twitter.util.Closable + +trait ClosableAsyncStream[T] extends Closable { + def asyncStream(): AsyncStream[T] +} \ No newline at end of file diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/FinagleKafkaConsumer.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/FinagleKafkaConsumer.scala new file mode 100644 index 0000000000..a8e69a7b64 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/FinagleKafkaConsumer.scala @@ -0,0 +1,209 @@ +package com.twitter.finatra.kafka.consumers + +import com.twitter.finatra.kafka.domain.{KafkaTopic, SeekStrategy} +import com.twitter.finatra.utils.FuturePools +import com.twitter.inject.Logging +import com.twitter.util._ +import java.util +import java.util.concurrent.TimeUnit.SECONDS +import java.util.concurrent.atomic.AtomicBoolean +import org.apache.kafka.clients.consumer._ +import org.apache.kafka.common.TopicPartition +import scala.collection.JavaConverters._ + +/* + * Note: The current implementation relies on a future pool with a single thread since Kafka + * requires poll to always be called from the same thread. However, we can likely optimize the + * conversion to avoid the need for a future pool + */ +class FinagleKafkaConsumer[K, V](config: FinagleKafkaConsumerConfig[K, V]) + extends Closable + with Logging { + + private val groupId = config.kafkaConsumerConfig.configMap(ConsumerConfig.GROUP_ID_CONFIG) + private val keyDeserializer = config.keyDeserializer.get + private val valueDeserializer = config.valueDeserializer.get + private val seekStrategy = config.seekStrategy + private val rewindDuration = config.rewindDuration + private val singleThreadFuturePool = FuturePools.fixedPool(s"kafka-consumer-$groupId", 1) + private val consumer = createConsumer() + private var subscribed = false + private var assigned = false + private val initialSeekCompleted = new AtomicBoolean() + + /** + * This class will handle seek strategy when partitions are first assigned to this consumer. + * And it also takes an inner listener which can be passed in by user to handle user defined + * listening actions. 
+   */
+  private class SeekRebalanceListener(innerListener: Option[ConsumerRebalanceListener] = None)
+    extends ConsumerRebalanceListener {
+    override def onPartitionsAssigned(partitions: util.Collection[TopicPartition]): Unit = {
+      if (initialSeekCompleted.compareAndSet(false, true)) {
+        info(s"Applying seek strategy $seekStrategy")
+        seekStrategy match {
+          case SeekStrategy.BEGINNING => consumer.seekToBeginning(partitions)
+          case SeekStrategy.END => consumer.seekToEnd(partitions)
+          case SeekStrategy.REWIND =>
+            require(rewindDuration.isDefined)
+            val seekTime = rewindDuration.get.ago
+            seekToTime(seekTime)
+          case _ =>
+        }
+        // We don't need to commit offsets when resuming, only when we're seeking to a designated position
+        if (seekStrategy != SeekStrategy.RESUME) {
+          info(s"Committing offsets after seek is complete")
+          consumer.commitSync()
+        }
+      }
+      if (innerListener.isDefined) {
+        innerListener.get.onPartitionsAssigned(partitions)
+      }
+    }
+
+    override def onPartitionsRevoked(partitions: util.Collection[TopicPartition]): Unit = {
+      if (innerListener.isDefined) {
+        innerListener.get.onPartitionsRevoked(partitions)
+      }
+    }
+
+    /**
+     * Positions the consumer to a specific time for each partition assigned to this consumer.
+     */
+    private def seekToTime(seekTime: Time): Unit = {
+      val partitionTimestamps = (consumer.assignment.asScala map { topicPartition =>
+        topicPartition -> java.lang.Long.valueOf(seekTime.inMillis)
+      }).toMap.asJava
+      consumer.offsetsForTimes(partitionTimestamps).asScala foreach {
+        partitionOffsetPair: (TopicPartition, OffsetAndTimestamp) =>
+          val partition = partitionOffsetPair._1
+          val offset = partitionOffsetPair._2.offset()
+          consumer.seek(partition, offset)
+      }
+    }
+  }
+
+  /**
+   * Subscribe to the given list of topics to get dynamically assigned partitions.
+   *
+   * We block startup until the consumer is subscribed to the topics. We're using a future pool here
+   * since Kafka requires all interactions with the consumer to come from a single thread.
+   */
+  def subscribe(
+    topics: Set[KafkaTopic],
+    listener: Option[ConsumerRebalanceListener] = None
+  ): Unit = {
+    Await.result(singleThreadFuturePool({
+      assert(!subscribed, "subscribe() has already been called")
+      val topicNames = topics.map(_.name)
+      consumer.subscribe(topicNames.asJava, new SeekRebalanceListener(listener))
+      info(s"Subscribed to topics ${consumer.subscription()}")
+      subscribed = true
+    }))
+  }
+
+  def assignment(): Future[util.Set[TopicPartition]] = {
+    singleThreadFuturePool(consumer.assignment())
+  }
+
+  /**
+   * Manually assign partitions to the consumer. This assignment is not incremental but replaces
+   * the previous assignment.
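+   *
+   * For example (the topic name is illustrative):
+   * {{{
+   *   consumer.assign(Seq(new TopicPartition("my-topic", 0)))
+   * }}}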
+ */ + def assign(partitions: Seq[TopicPartition]): Unit = { + Await.result(singleThreadFuturePool({ + consumer.assign(partitions.asJavaCollection) + info(s"Assigned to topics-partitions ${consumer.assignment()}") + assigned = true + })) + } + + def seekToOffset(partition: TopicPartition, offset: Long): Future[Unit] = { + singleThreadFuturePool(consumer.seek(partition, offset)) + } + + def seekToBeginning(partitions: util.Collection[TopicPartition]): Future[Unit] = { + singleThreadFuturePool(consumer.seekToBeginning(partitions)) + } + + def seekToEnd(partitions: util.Collection[TopicPartition]): Future[Unit] = { + singleThreadFuturePool(consumer.seekToEnd(partitions)) + } + + def offsetsForTimes( + timestampsToSearch: java.util.Map[TopicPartition, java.lang.Long] + ): Future[util.Map[TopicPartition, OffsetAndTimestamp]] = { + singleThreadFuturePool(consumer.offsetsForTimes(timestampsToSearch)) + } + + /** + * @param timeout The time, in milliseconds, spent waiting in poll if data is not available in the buffer. + * If 0, returns immediately with any records that are available currently in the buffer, else returns empty. + * Must not be negative. + * @return map of topic to records since the last fetch for the subscribed list of topics and partitions + */ + def poll(timeout: Duration = config.pollTimeout): Future[ConsumerRecords[K, V]] = { + singleThreadFuturePool({ + assert(subscribed || assigned, "either subscribe() or assign() has not been called") + // We should only seek after the first poll and getting partition assignment successfully + consumer.poll(timeout.inMilliseconds) + }) + } + + /** + * Commit offsets returned on the last poll() for all the subscribed list of topics and partition. + */ + def commit(): Future[Unit] = { + singleThreadFuturePool({ + consumer.commitSync() + }) + } + + /** + * Get the offset of the next record that will be fetched (if a record with that offset exists). + */ + def position(partition: TopicPartition): Future[Long] = { + singleThreadFuturePool({ + consumer.position(partition) + }) + } + + /** + * Commit the specified offsets for the specified list of topics and partitions. + */ + def commit(offsets: util.Map[TopicPartition, OffsetAndMetadata]): Future[Unit] = { + singleThreadFuturePool({ + consumer.commitSync(offsets) + }) + } + + /** + * Wakeup the consumer. This method is thread-safe and is useful in particular to abort a long poll. + * + * The thread which is blocking in an operation will throw WakeupException + * If no thread is blocking, the next blocking call will raise it instead. 
+   */
+  def wakeup(): Unit = {
+    consumer.wakeup()
+  }
+
+  def close(deadline: Time): Future[Unit] = {
+    try {
+      singleThreadFuturePool({
+        info(s"Closing consumer for topics: ${consumer.subscription()}")
+        consumer.close(deadline.inSeconds, SECONDS)
+      }).ensure {
+        singleThreadFuturePool.executor.shutdown()
+      }
+    } catch {
+      case e: Exception =>
+        error(s"Error closing consumer ${groupId}", e)
+        Future.exception(e)
+    }
+  }
+
+  /* Private */
+  private def createConsumer(): KafkaConsumer[K, V] = {
+    new KafkaConsumer[K, V](config.properties, keyDeserializer, valueDeserializer)
+  }
+}
diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/FinagleKafkaConsumerBuilder.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/FinagleKafkaConsumerBuilder.scala
new file mode 100644
index 0000000000..00cd152a77
--- /dev/null
+++ b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/FinagleKafkaConsumerBuilder.scala
@@ -0,0 +1,120 @@
+package com.twitter.finatra.kafka.consumers
+
+import com.twitter.conversions.DurationOps._
+import com.twitter.finatra.kafka.config.{KafkaConfig, ToKafkaProperties}
+import com.twitter.finatra.kafka.domain.SeekStrategy
+import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter
+import com.twitter.util.Duration
+import java.util.Properties
+import org.apache.kafka.clients.consumer.ConsumerConfig
+import org.apache.kafka.common.serialization.Deserializer
+
+/**
+ * Trait defining methods for configuring a FinagleKafkaConsumer.
+ * It extends KafkaConsumerConfigMethods to get all the Kafka-specific methods that
+ * set elements of the java.util.Properties object.
+ * It also adds methods for configuring parameters that are specific to
+ * FinagleKafkaConsumer (pollTimeout and seekStrategy).
+ *
+ * @tparam K The key type of the consumer this will build.
+ * @tparam V The value type of the consumer this will build.
+ * @tparam Self The type of the concrete builder that includes these methods.
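+ *
+ * A minimal construction sketch (the dest, group id, and deserializers are illustrative assumptions):
+ * {{{
+ *   val consumer = FinagleKafkaConsumerBuilder[String, String]()
+ *     .dest("/s/kafka/my-cluster")
+ *     .groupId(KafkaGroupId("my-group"))
+ *     .keyDeserializer(new StringDeserializer)
+ *     .valueDeserializer(new StringDeserializer)
+ *     .build()
+ * }}}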
+ */ +trait FinagleKafkaConsumerBuilderMethods[K, V, Self] extends KafkaConsumerConfigMethods[Self] { + protected def finagleConsumerConfig: FinagleKafkaConsumerConfig[K, V] + protected def fromFinagleConsumerConfig(config: FinagleKafkaConsumerConfig[K, V]): This + + override protected def configMap: Map[String, String] = + finagleConsumerConfig.kafkaConsumerConfig.configMap + override protected def fromConfigMap(configMap: Map[String, String]): This = + fromFinagleConsumerConfig( + finagleConsumerConfig.copy(kafkaConsumerConfig = KafkaConsumerConfig(configMap)) + ) + + /** + * Deserializer class for key + */ + def keyDeserializer(keyDeserializer: Deserializer[K]): This = + fromFinagleConsumerConfig(finagleConsumerConfig.copy(keyDeserializer = Some(keyDeserializer))) + + /** + * Deserializer class for value + */ + def valueDeserializer(valueDeserializer: Deserializer[V]): This = + fromFinagleConsumerConfig( + finagleConsumerConfig.copy(valueDeserializer = Some(valueDeserializer)) + ) + + /** + * Default poll timeout in milliseconds + */ + def pollTimeout(pollTimeout: Duration): This = + fromFinagleConsumerConfig(finagleConsumerConfig.copy(pollTimeout = pollTimeout)) + + /** + * Whether the consumer should start from end, beginning or from the offset + */ + def seekStrategy(seekStrategy: SeekStrategy): This = + fromFinagleConsumerConfig(finagleConsumerConfig.copy(seekStrategy = seekStrategy)) + + /** + * If using SeekStrategy.REWIND, specify the duration back in time to rewind and start consuming from + */ + def rewindDuration(rewindDuration: Duration): This = + fromFinagleConsumerConfig(finagleConsumerConfig.copy(rewindDuration = Some(rewindDuration))) + + /** + * For KafkaFinagleMetricsReporter: whether to include node-level metrics. + */ + def includeNodeMetrics(include: Boolean): This = + fromFinagleConsumerConfig(finagleConsumerConfig.copy(includeNodeMetrics = include)) + + def build(): FinagleKafkaConsumer[K, V] = { + validateConfigs(finagleConsumerConfig) + new FinagleKafkaConsumer[K, V](finagleConsumerConfig) + } + + protected def validateConfigs(config: FinagleKafkaConsumerConfig[K, V]) = { + require( + configMap.get(ConsumerConfig.GROUP_ID_CONFIG).isDefined, + "FinagleKafkaConsumerBuilder: groupId must be configured" + ) + require( + config.keyDeserializer.isDefined, + "FinagleKafkaConsumerBuilder: keyDeserializer must be configured" + ) + require( + config.valueDeserializer.isDefined, + "FinagleKafkaConsumerBuilder: valueDeserializer must be configured" + ) + } +} + +case class FinagleKafkaConsumerBuilder[K, V]( + override protected val finagleConsumerConfig: FinagleKafkaConsumerConfig[K, V] = + FinagleKafkaConsumerConfig[K, V]()) + extends FinagleKafkaConsumerBuilderMethods[K, V, FinagleKafkaConsumerBuilder[K, V]] { + override protected def fromFinagleConsumerConfig(config: FinagleKafkaConsumerConfig[K, V]): This = + new FinagleKafkaConsumerBuilder[K, V](config) +} + +case class FinagleKafkaConsumerConfig[K, V]( + kafkaConsumerConfig: KafkaConsumerConfig = KafkaConsumerConfig(), + keyDeserializer: Option[Deserializer[K]] = None, + valueDeserializer: Option[Deserializer[V]] = None, + pollTimeout: Duration = 100.millis, + seekStrategy: SeekStrategy = SeekStrategy.RESUME, + rewindDuration: Option[Duration] = None, + includeNodeMetrics: Boolean = false) + extends KafkaConfig + with ToKafkaProperties { + override protected def configMap: Map[String, String] = kafkaConsumerConfig.configMap + + override def properties: Properties = { + val properties = super.properties + + 
properties.put(KafkaFinagleMetricsReporter.IncludeNodeMetrics, includeNodeMetrics.toString) + + properties + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/KafkaConsumerConfig.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/KafkaConsumerConfig.scala new file mode 100644 index 0000000000..8fc558ccee --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/consumers/KafkaConsumerConfig.scala @@ -0,0 +1,141 @@ +package com.twitter.finatra.kafka.consumers + +import com.twitter.finatra.kafka.config.{KafkaConfigMethods, ToKafkaProperties} +import com.twitter.finatra.kafka.domain.{IsolationLevel, KafkaGroupId} +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter +import com.twitter.finatra.kafka.utils.BootstrapServerUtils +import com.twitter.util.{Duration, StorageUnit} +import org.apache.kafka.clients.consumer.{ConsumerConfig, OffsetResetStrategy} +import org.apache.kafka.common.metrics.Sensor.RecordingLevel +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.interceptors.MonitoringConsumerInterceptor +import com.twitter.inject.Logging + +object KafkaConsumerConfig { + def apply(): KafkaConsumerConfig = + new KafkaConsumerConfig() + .metricReporter[KafkaFinagleMetricsReporter] + .metricsRecordingLevel(RecordingLevel.INFO) + .metricsSampleWindow(60.seconds) + .interceptor[MonitoringConsumerInterceptor] +} + +trait KafkaConsumerConfigMethods[Self] extends KafkaConfigMethods[Self] with Logging { + def dest(dest: String): This = bootstrapServers(BootstrapServerUtils.lookupBootstrapServers(dest)) + + def autoCommitInterval(duration: Duration): This = + withConfig(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, duration) + + def autoOffsetReset(offsetResetStrategy: OffsetResetStrategy): This = + withConfig(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, offsetResetStrategy.toString.toLowerCase) + + def bootstrapServers(servers: String): This = + withConfig(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, servers) + + def checkCrcs(boolean: Boolean): This = + withConfig(ConsumerConfig.CHECK_CRCS_CONFIG, boolean.toString) + + def clientId(clientId: String): This = + withConfig(ConsumerConfig.CLIENT_ID_CONFIG, clientId) + + def connectionsMaxIdle(duration: Duration): This = + withConfig(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, duration) + + def enableAutoCommit(boolean: Boolean): This = + withConfig(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, boolean.toString) + + def excludeInternalTopics(boolean: Boolean): This = + withConfig(ConsumerConfig.EXCLUDE_INTERNAL_TOPICS_CONFIG, boolean.toString) + + def fetchMax(storageUnit: StorageUnit): This = + withConfig(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, storageUnit) + + def fetchMaxWait(duration: Duration): This = + withConfig(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, duration) + + def fetchMin(storageUnit: StorageUnit): This = + withConfig(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, storageUnit) + + def groupId(groupId: KafkaGroupId): This = + withConfig(ConsumerConfig.GROUP_ID_CONFIG, groupId.name) + + def heartbeatInterval(duration: Duration): This = + withConfig(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, duration) + + def interceptor[T: Manifest]: This = { + val interceptorKey = ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG + configMap.get(interceptorKey) match { + case Some(interceptors) + if interceptors.split(",").contains(manifest[T].runtimeClass.getName) => + warn( + s"Appending duplicate consumer interceptor class name ${manifest[T].runtimeClass.getName} in $interceptors 
ignored" + ) + fromConfigMap(configMap) + case _ => + withClassNameBuilder(interceptorKey) + } + } + + def isolationLevel(isolationLevel: IsolationLevel): This = + withConfig(ConsumerConfig.ISOLATION_LEVEL_CONFIG, isolationLevel.toString) + + def maxPartitionFetch(storageUnit: StorageUnit) = + withConfig(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, storageUnit) + + def maxPollInterval(duration: Duration): This = + withConfig(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, duration) + + def maxPollRecords(int: Int): This = + withConfig(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, int.toString) + + def metadataMaxAge(duration: Duration): This = + withConfig(ConsumerConfig.METADATA_MAX_AGE_CONFIG, duration) + + def metricReporter[T: Manifest]: This = + withClassName[T](ConsumerConfig.METRIC_REPORTER_CLASSES_CONFIG) + + def metricsNumSamples(int: Int): This = + withConfig(ConsumerConfig.METRICS_NUM_SAMPLES_CONFIG, int.toString) + + def metricsRecordingLevel(recordingLevel: RecordingLevel): This = + withConfig(ConsumerConfig.METRICS_RECORDING_LEVEL_CONFIG, recordingLevel.name) + + def metricsSampleWindow(duration: Duration): This = + withConfig(ConsumerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG, duration) + + def partitionAssignmentStrategy[T: Manifest]: This = + withClassName(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG) + + def receiveBuffer(storageUnit: StorageUnit): This = + withConfig(ConsumerConfig.RECEIVE_BUFFER_CONFIG, storageUnit) + + def reconnectBackoffMax(duration: Duration): This = + withConfig(ConsumerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, duration) + + def reconnectBackoff(duration: Duration): This = + withConfig(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG, duration) + + def requestTimeout(duration: Duration): This = + withConfig(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, duration) + + def retryBackoff(duration: Duration): This = + withConfig(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG, duration) + + def sendBufferConfig(storageUnit: StorageUnit): This = + withConfig(ConsumerConfig.SEND_BUFFER_CONFIG, storageUnit) + + def sessionTimeout(duration: Duration): This = + withConfig(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, duration) + + // Unsupported. Pass instances directly to the consumer instead. 
+  // ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG
+  // ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG
+}
+
+case class KafkaConsumerConfig private (configMap: Map[String, String] = Map.empty)
+    extends KafkaConsumerConfigMethods[KafkaConsumerConfig]
+    with ToKafkaProperties {
+
+  override def fromConfigMap(config: Map[String, String]): KafkaConsumerConfig =
+    KafkaConsumerConfig(config)
+}
diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/domain/KafkaGroupId.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/domain/KafkaGroupId.scala
new file mode 100644
index 0000000000..0746df1656
--- /dev/null
+++ b/kafka/src/main/scala/com/twitter/finatra/kafka/domain/KafkaGroupId.scala
@@ -0,0 +1,3 @@
+package com.twitter.finatra.kafka.domain
+
+case class KafkaGroupId(name: String)
diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/domain/KafkaTopic.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/domain/KafkaTopic.scala
new file mode 100644
index 0000000000..cde91ccf15
--- /dev/null
+++ b/kafka/src/main/scala/com/twitter/finatra/kafka/domain/KafkaTopic.scala
@@ -0,0 +1,3 @@
+package com.twitter.finatra.kafka.domain
+
+case class KafkaTopic(name: String)
diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/InstanceMetadataProducerInterceptor.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/InstanceMetadataProducerInterceptor.scala
new file mode 100644
index 0000000000..93a8efffb1
--- /dev/null
+++ b/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/InstanceMetadataProducerInterceptor.scala
@@ -0,0 +1,53 @@
+package com.twitter.finatra.kafka.interceptors
+
+import com.twitter.finatra.kafka.interceptors.InstanceMetadataProducerInterceptor._
+import com.twitter.finatra.kafka.utils.ConfigUtils
+import java.util
+import org.apache.kafka.clients.producer.{ProducerInterceptor, ProducerRecord, RecordMetadata}
+import org.apache.kafka.common.serialization.Serdes
+
+object InstanceMetadataProducerInterceptor {
+  val KafkaInstanceKeyFlagName = "kafka.instance.key"
+  val InstanceKeyHeaderName = "instance_key"
+}
+
+/**
+ * An interceptor that includes metadata about a specific instance in `record` headers.
+ *
+ * `instance_key` is a configurable header serialized with `Serdes.StringSerde`. The value
+ * of the header is configured by the `kafka.instance.key` application flag. Only if the flag
+ * is set will a header key/value be serialized into the record.
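+ *
+ * For example, a server started with the flag value below (an illustrative assumption) adds an
+ * `instance_key` header with the value `my-service-0` to every produced record:
+ * {{{
+ *   -kafka.instance.key=my-service-0
+ * }}}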
+ */
+class InstanceMetadataProducerInterceptor extends ProducerInterceptor[Any, Any] {
+  private var instanceKey = ""
+  private var instanceKeyBytes: Array[Byte] = _
+  private val serializer = new Serdes.StringSerde().serializer()
+
+  override def onSend(record: ProducerRecord[Any, Any]): ProducerRecord[Any, Any] = {
+    if (instanceKey.nonEmpty) {
+      record
+        .headers()
+        .add(InstanceKeyHeaderName, instanceKeyBytes)
+    }
+    record
+  }
+
+  override def onAcknowledgement(metadata: RecordMetadata, exception: Exception): Unit = {}
+
+  override def close(): Unit = {
+    serializer.close()
+    instanceKey = ""
+    instanceKeyBytes = null
+  }
+
+  override def configure(configs: util.Map[String, _]): Unit = {
+    instanceKey = ConfigUtils.getConfigOrElse(configs, key = KafkaInstanceKeyFlagName, default = "")
+    if (instanceKey.nonEmpty) {
+      instanceKeyBytes = serializeInstanceKeyHeader()
+    }
+  }
+
+  protected def serializeInstanceKeyHeader(): Array[Byte] = {
+    serializer.serialize("", instanceKey)
+  }
+}
diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/MonitoringConsumerInterceptor.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/MonitoringConsumerInterceptor.scala
new file mode 100644
index 0000000000..13f8ffe989
--- /dev/null
+++ b/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/MonitoringConsumerInterceptor.scala
@@ -0,0 +1,109 @@
+package com.twitter.finatra.kafka.interceptors
+
+import com.google.common.primitives.Longs
+import com.twitter.finatra.kafka.interceptors.PublishTimeProducerInterceptor._
+import com.twitter.finagle.stats.{LoadedStatsReceiver, Stat, StatsReceiver}
+import com.twitter.finatra.kafka.utils.ConfigUtils
+import com.twitter.inject.Injector
+import java.util
+import org.apache.kafka.clients.consumer.{ConsumerInterceptor, ConsumerRecords, OffsetAndMetadata}
+import org.apache.kafka.common.TopicPartition
+import org.joda.time.DateTimeUtils
+import scala.collection.mutable
+
+object MonitoringConsumerInterceptor {
+  private var globalStatsReceiver: StatsReceiver = LoadedStatsReceiver
+
+  def init(injector: Injector): Unit = {
+    globalStatsReceiver = injector.instance[StatsReceiver]
+  }
+}
+
+/**
+ * An interceptor that looks for the `publish_time` header and the record timestamp, calculates
+ * how much time has passed since each of those times, and updates a stat for each.
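+ *
+ * For example, with the default `stats_scope` of `kafka`, a record consumed from a topic named
+ * `topic-a` (an illustrative assumption) updates stats such as:
+ * {{{
+ *   kafka/consumer/topic-a/time_since_record_published_ms
+ *   kafka/consumer/topic-a/time_since_record_timestamp_ms
+ * }}}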
+ */ +class MonitoringConsumerInterceptor extends ConsumerInterceptor[Any, Any] { + + private var consumerStatsReceiver: StatsReceiver = _ + private val topicNameToLagStat = mutable.Map[TopicAndStatName, Stat]() + private var enabled: Boolean = _ + + override def configure(configs: util.Map[String, _]): Unit = { + val consumerClientId = ConfigUtils.getConfigOrElse(configs, "client.id", "") + enabled = enableInterceptorForClientId(consumerClientId) + + val statsScope = ConfigUtils.getConfigOrElse(configs, key = "stats_scope", default = "kafka") + consumerStatsReceiver = MonitoringConsumerInterceptor.globalStatsReceiver + .scope(statsScope) + .scope("consumer") + } + + override def onConsume(records: ConsumerRecords[Any, Any]): ConsumerRecords[Any, Any] = { + if (enabled) { + val now = DateTimeUtils.currentTimeMillis() + val iterator = records.iterator() + while (iterator.hasNext) { + val record = iterator.next() + val topic = record.topic() + + val publishTimeHeader = record.headers().lastHeader(PublishTimeHeaderName) + if (publishTimeHeader != null) { + val publishTimeHeaderMillis = Longs.fromByteArray(publishTimeHeader.value()) + updateLagStat( + now = now, + topic = topic, + timestamp = publishTimeHeaderMillis, + statName = "time_since_record_published_ms" + ) + } + + val recordTimestamp = record.timestamp() + updateLagStat( + now = now, + topic = topic, + timestamp = recordTimestamp, + statName = "time_since_record_timestamp_ms" + ) + } + } + + records + } + + override def onCommit(offsets: util.Map[TopicPartition, OffsetAndMetadata]): Unit = {} + + override def close(): Unit = { + topicNameToLagStat.clear() + } + + /** + * Determines if this interceptor should be enabled given the consumer client id + */ + protected def enableInterceptorForClientId(consumerClientId: String): Boolean = { + true + } + + /* Private */ + + private def createNewStat(topicName: String, statName: String): Stat = { + consumerStatsReceiver + .scope(topicName) + .stat(statName) + } + + //TODO: Optimize map lookup which is a hotspot during profiling + private def updateLagStat(now: Long, topic: String, timestamp: Long, statName: String): Unit = { + val lag = now - timestamp + if (lag >= 0) { + val cacheKey = TopicAndStatName(topic, statName) + val lagStat = topicNameToLagStat.getOrElseUpdate( + cacheKey, + createNewStat(topicName = topic, statName = statName) + ) + lagStat.add(lag) + } + } + + private case class TopicAndStatName(topic: String, statName: String) +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/PublishTimeProducerInterceptor.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/PublishTimeProducerInterceptor.scala new file mode 100644 index 0000000000..24925268ff --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/interceptors/PublishTimeProducerInterceptor.scala @@ -0,0 +1,30 @@ +package com.twitter.finatra.kafka.interceptors + +import com.google.common.primitives.Longs +import com.twitter.finatra.kafka.interceptors.PublishTimeProducerInterceptor._ +import java.util +import org.apache.kafka.clients.producer.{ProducerInterceptor, ProducerRecord, RecordMetadata} +import org.joda.time.DateTimeUtils + +object PublishTimeProducerInterceptor { + val PublishTimeHeaderName = "publish_time" +} + +/** + * An interceptor that puts a header on each Kafka record indicating when the record was published. 
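+ *
+ * The header is named `publish_time` and its value is the publish time in epoch milliseconds,
+ * encoded as a big-endian `Long` via `Longs.toByteArray`.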
+ */ +class PublishTimeProducerInterceptor extends ProducerInterceptor[Any, Any] { + + override def onSend(record: ProducerRecord[Any, Any]): ProducerRecord[Any, Any] = { + record + .headers() + .add(PublishTimeHeaderName, Longs.toByteArray(DateTimeUtils.currentTimeMillis())) + record + } + + override def onAcknowledgement(metadata: RecordMetadata, exception: Exception): Unit = {} + + override def close(): Unit = {} + + override def configure(configs: util.Map[String, _]): Unit = {} +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/modules/KafkaBootstrapModule.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/modules/KafkaBootstrapModule.scala new file mode 100644 index 0000000000..4e78c71682 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/modules/KafkaBootstrapModule.scala @@ -0,0 +1,17 @@ +package com.twitter.finatra.kafka.modules + +import com.twitter.app.Flag +import com.twitter.inject.TwitterModule + +/** + * Use this module when your app connects to a kafka cluster. Your app should use this flag + * to indicate which kafka cluster your app should talk to. + */ +object KafkaBootstrapModule extends TwitterModule { + + val kafkaBootstrapServers: Flag[String] = + flag[String]( + "kafka.bootstrap.servers", + "Destination of kafka bootstrap servers. Can be a wily path, or a host:port" + ) +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/producers/FinagleKafkaProducer.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/producers/FinagleKafkaProducer.scala new file mode 100644 index 0000000000..c45dbd77ce --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/producers/FinagleKafkaProducer.scala @@ -0,0 +1,117 @@ +package com.twitter.finatra.kafka.producers + +import com.twitter.finagle.stats.Stat +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter.sanitizeMetricName +import com.twitter.inject.Logging +import com.twitter.util._ +import java.util +import java.util.concurrent.TimeUnit.SECONDS +import org.apache.kafka.clients.consumer.OffsetAndMetadata +import org.apache.kafka.clients.producer._ +import org.apache.kafka.common.{PartitionInfo, TopicPartition} +import scala.collection.JavaConverters._ + +class FinagleKafkaProducer[K, V](config: FinagleKafkaProducerConfig[K, V]) + extends Closable + with Logging { + + private val keySerializer = config.keySerializer.get + private val valueSerializer = config.valueSerializer.get + private val producer = createProducer() + + private val clientId = + config.kafkaProducerConfig.configMap.getOrElse(ProducerConfig.CLIENT_ID_CONFIG, "no_client_id") + private val scopedStatsReceiver = + config.statsReceiver.scope("kafka").scope(sanitizeMetricName(clientId)) + private val timestampOnSendLag = scopedStatsReceiver.stat("record_timestamp_on_send_lag") + private val timestampOnSuccessLag = scopedStatsReceiver.stat("record_timestamp_on_success_lag") + private val timestampOnFailureLag = scopedStatsReceiver.stat("record_timestamp_on_failure_lag") + + /* Public */ + + //Note: Default partitionIdx should be set to null, see: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=69406838 + //TODO: producer.send will throw exceptions in the transactional API which are not recoverable. As such, we may want to continue to allow these exceptions + //to be thrown by this method, but this may be unexpected as most Future returning methods will return failed futures rather than throwing + //exceptions. To be discussed... 
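+  /**
+   * Send a record to the given topic. The returned Future is satisfied with the RecordMetadata
+   * once the send completes successfully, or failed with the error reported by the underlying
+   * producer callback.
+   *
+   * A usage sketch (the topic, key, and value are illustrative assumptions):
+   * {{{
+   *   producer.send("topic-a", "key1", "value1", System.currentTimeMillis())
+   * }}}
+   */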
+ def send( + topic: String, + key: K, + value: V, + timestamp: Long, + partitionIdx: Option[Integer] = None + ): Future[RecordMetadata] = { + val producerRecord = new ProducerRecord[K, V]( + topic, + partitionIdx.orNull, + timestamp, + key, + value + ) + send(producerRecord) + } + + def send(producerRecord: ProducerRecord[K, V]): Future[RecordMetadata] = { + val resultPromise = Promise[RecordMetadata]() + calcTimestampLag(timestampOnSendLag, producerRecord.timestamp) + producer.send( + producerRecord, + new Callback { + override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit = { + if (exception != null) { + calcTimestampLag(timestampOnFailureLag, producerRecord.timestamp) + resultPromise.setException(exception) + } else { + calcTimestampLag(timestampOnSuccessLag, producerRecord.timestamp) + resultPromise.setValue(metadata) + } + } + } + ) + resultPromise + } + + def initTransactions(): Unit = { + producer.initTransactions() + } + + def beginTransaction(): Unit = { + producer.beginTransaction() + } + + def sendOffsetsToTransaction( + offsets: Map[TopicPartition, OffsetAndMetadata], + consumerGroupId: String + ): Unit = { + producer.sendOffsetsToTransaction(offsets.asJava, consumerGroupId) + } + + def commitTransaction(): Unit = { + producer.commitTransaction() + } + + def abortTransaction(): Unit = { + producer.abortTransaction() + } + + def flush(): Unit = { + producer.flush() + } + + def partitionsFor(topic: String): util.List[PartitionInfo] = { + producer.partitionsFor(topic) + } + + override def close(deadline: Time): Future[Unit] = { + Future(producer.close(deadline.inSeconds, SECONDS)) + } + + /* Private */ + + private def createProducer(): KafkaProducer[K, V] = { + new KafkaProducer[K, V](config.properties, keySerializer, valueSerializer) + } + + private def calcTimestampLag(stat: Stat, timestamp: Long): Unit = { + stat.add(System.currentTimeMillis() - timestamp) + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/producers/FinagleKafkaProducerBuilder.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/producers/FinagleKafkaProducerBuilder.scala new file mode 100644 index 0000000000..6dcd96e12b --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/producers/FinagleKafkaProducerBuilder.scala @@ -0,0 +1,83 @@ +package com.twitter.finatra.kafka.producers + +import com.twitter.finagle.stats.{LoadedStatsReceiver, StatsReceiver} +import com.twitter.finatra.kafka.config.{KafkaConfig, ToKafkaProperties} +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter +import java.util.Properties +import org.apache.kafka.clients.producer.ProducerConfig +import org.apache.kafka.common.serialization.Serializer + +case class FinagleKafkaProducerBuilder[K, V]( + config: FinagleKafkaProducerConfig[K, V] = FinagleKafkaProducerConfig[K, V]()) + extends KafkaProducerConfigMethods[FinagleKafkaProducerBuilder[K, V]] { + override protected def fromConfigMap(configMap: Map[String, String]): This = + FinagleKafkaProducerBuilder(config.copy(kafkaProducerConfig = KafkaProducerConfig(configMap))) + + override protected def configMap: Map[String, String] = config.kafkaProducerConfig.configMap + + protected def withConfig(config: FinagleKafkaProducerConfig[K, V]): This = + new FinagleKafkaProducerBuilder[K, V](config) + + /** + * Serializer class for key + */ + def keySerializer(keySerializer: Serializer[K]): This = + withConfig(config.copy(keySerializer = Some(keySerializer))) + + /** + * Serializer class for value + */ + def 
valueSerializer(valueSerializer: Serializer[V]): This = + withConfig(config.copy(valueSerializer = Some(valueSerializer))) + + /** + * For KafkaFinagleMetricsReporter: whether to include node-level metrics. + */ + def includeNodeMetrics(include: Boolean): This = + withConfig(config.copy(includeNodeMetrics = include)) + + def statsReceiver(statsReceiver: StatsReceiver): This = + withConfig(config.copy(statsReceiver = statsReceiver)) + + def build(): FinagleKafkaProducer[K, V] = { + validateConfigs(config) + new FinagleKafkaProducer[K, V](config) + } + + private def validateConfigs(config: FinagleKafkaProducerConfig[K, V]) = { + require( + configMap.get(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG).isDefined, + "FinagleKafkaProducerBuilder: dest must be configured" + ) + require( + config.keySerializer.isDefined, + "FinagleKafkaProducerBuilder: keySerializer must be configured" + ) + require( + config.valueSerializer.isDefined, + "FinagleKafkaProducerBuilder: valueSerializer must be configured" + ) + require( + configMap.get(ProducerConfig.CLIENT_ID_CONFIG).isDefined, + "FinagleKafkaProducerBuilder: clientId must be configured" + ) + } +} + +case class FinagleKafkaProducerConfig[K, V]( + kafkaProducerConfig: KafkaProducerConfig = KafkaProducerConfig(), + keySerializer: Option[Serializer[K]] = None, + valueSerializer: Option[Serializer[V]] = None, + includeNodeMetrics: Boolean = false, + statsReceiver: StatsReceiver = LoadedStatsReceiver) + extends KafkaConfig + with ToKafkaProperties { + override def configMap: Map[String, String] = kafkaProducerConfig.configMap + override def properties: Properties = { + val properties = super.properties + + properties.put(KafkaFinagleMetricsReporter.IncludeNodeMetrics, includeNodeMetrics.toString) + + properties + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/producers/KafkaProducerConfig.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/producers/KafkaProducerConfig.scala new file mode 100644 index 0000000000..4d29d547a9 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/producers/KafkaProducerConfig.scala @@ -0,0 +1,134 @@ +package com.twitter.finatra.kafka.producers + +import com.twitter.finatra.kafka.domain.AckMode +import com.twitter.finatra.kafka.config.{KafkaConfigMethods, ToKafkaProperties} +import com.twitter.util.{Duration, StorageUnit} +import org.apache.kafka.clients.producer.ProducerConfig +import org.apache.kafka.common.metrics.Sensor.RecordingLevel +import org.apache.kafka.common.record.CompressionType +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.interceptors.PublishTimeProducerInterceptor +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter +import com.twitter.finatra.kafka.utils.BootstrapServerUtils +import com.twitter.inject.Logging + +object KafkaProducerConfig { + def apply(): KafkaProducerConfig = + new KafkaProducerConfig() + .ackMode(AckMode.ALL) // kafka default is AckMode.ONE + .metricReporter[KafkaFinagleMetricsReporter] + .metricsRecordingLevel(RecordingLevel.INFO) + .metricsSampleWindow(60.seconds) + .interceptor[PublishTimeProducerInterceptor] +} + +trait KafkaProducerConfigMethods[Self] extends KafkaConfigMethods[Self] with Logging { + def dest(dest: String): This = bootstrapServers(BootstrapServerUtils.lookupBootstrapServers(dest)) + + def ackMode(ackMode: AckMode): This = + withConfig(ProducerConfig.ACKS_CONFIG, ackMode.toString) + + def batchSize(size: StorageUnit): This = + withConfig(ProducerConfig.BATCH_SIZE_CONFIG, size) + + def 
bootstrapServers(servers: String): This = + withConfig(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, servers) + + def bufferMemorySize(size: StorageUnit): This = + withConfig(ProducerConfig.BUFFER_MEMORY_CONFIG, size) + + def clientId(clientId: String): This = + withConfig(ProducerConfig.CLIENT_ID_CONFIG, clientId) + + def compressionType(compresionType: CompressionType): This = + withConfig(ProducerConfig.COMPRESSION_TYPE_CONFIG, compresionType.name) + + def connectionsMaxIdle(duration: Duration): This = + withConfig(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, duration) + + def enableIdempotence(boolean: Boolean): This = + withConfig(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, boolean.toString) + + def interceptor[T: Manifest]: This = { + val interceptorKey = ProducerConfig.INTERCEPTOR_CLASSES_CONFIG + + configMap.get(interceptorKey) match { + case Some(interceptors) + if interceptors.split(",").contains(manifest[T].runtimeClass.getName) => + warn( + s"Appending duplicate producer interceptor class name ${manifest[T].runtimeClass.getName} in $interceptors ignored" + ) + fromConfigMap(configMap) + case _ => + withClassNameBuilder(interceptorKey) + } + } + + def linger(duration: Duration): This = + withConfig(ProducerConfig.LINGER_MS_CONFIG, duration) + + def maxBlock(duration: Duration): This = + withConfig(ProducerConfig.MAX_BLOCK_MS_CONFIG, duration) + + def maxInFlightRequestsPerConnection(max: Int): This = + withConfig(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, max.toString) + + def maxRequestSize(size: StorageUnit): This = + withConfig(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, size) + + def metadataMaxAge(duration: Duration): This = + withConfig(ProducerConfig.METADATA_MAX_AGE_CONFIG, duration) + + def metricReporter[T: Manifest]: This = + withClassName[T](ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG) + + def metricsSampleWindow(duration: Duration): This = + withConfig(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG, duration) + + def metricsNumSamples(samples: Int): This = + withConfig(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG, samples.toString) + + def metricsRecordingLevel(recordingLevel: RecordingLevel): This = + withConfig(ProducerConfig.METRICS_RECORDING_LEVEL_CONFIG, recordingLevel.name) + + def partitioner[T: Manifest]: This = + withClassName[T](ProducerConfig.PARTITIONER_CLASS_CONFIG) + + def receiveBufferSize(size: StorageUnit): This = + withConfig(ProducerConfig.RECEIVE_BUFFER_CONFIG, size) + + def reconnectBackoffMax(duration: Duration): This = + withConfig(ProducerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG, duration) + + def reconnectBackoff(duration: Duration): This = + withConfig(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG, duration) + + def requestTimeout(duration: Duration): This = + withConfig(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, duration) + + def retries(retries: Int): This = + withConfig(ProducerConfig.RETRIES_CONFIG, retries.toString) + + def retryBackoff(duration: Duration): This = + withConfig(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, duration) + + def sendBufferSize(size: StorageUnit): This = + withConfig(ProducerConfig.SEND_BUFFER_CONFIG, size) + + def transactionalId(id: String): This = + withConfig(ProducerConfig.TRANSACTIONAL_ID_CONFIG, id) + + def transactionTimeout(duration: Duration): This = + withConfig(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, duration) + + // Unsupported. Pass instances directly to the producer instead. 
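+  // (e.g. via FinagleKafkaProducerBuilder#keySerializer and #valueSerializer):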
+ // ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG + // ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG +} + +case class KafkaProducerConfig private (configMap: Map[String, String] = Map.empty) + extends KafkaProducerConfigMethods[KafkaProducerConfig] + with ToKafkaProperties { + override def fromConfigMap(config: Map[String, String]): KafkaProducerConfig = + KafkaProducerConfig(config) +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/AbstractSerde.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/AbstractSerde.scala new file mode 100644 index 0000000000..a9b12fd448 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/AbstractSerde.scala @@ -0,0 +1,65 @@ +package com.twitter.finatra.kafka.serde + +import java.util +import org.apache.kafka.common.serialization.{Deserializer, Serde, Serializer} + +abstract class AbstractSerde[T] extends Serde[T] { + + private var _topic: String = _ + + private val _deserializer = new Deserializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + final override def deserialize(topic: String, bytes: Array[Byte]): T = { + if (bytes == null) { + null.asInstanceOf[T] + } else { + _topic = topic + AbstractSerde.this.deserialize(bytes) + } + } + + override def close(): Unit = {} + } + + private val _serializer = new Serializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + final override def serialize(topic: String, obj: T): Array[Byte] = { + if (obj == null) { + null + } else { + _topic = topic + AbstractSerde.this.serialize(obj) + } + } + + override def close(): Unit = {} + } + + /* Public Abstract */ + + def deserialize(bytes: Array[Byte]): T + + def serialize(obj: T): Array[Byte] + + /* Public */ + + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def deserializer(): Deserializer[T] = { + _deserializer + } + + override def serializer(): Serializer[T] = { + _serializer + } + + override def close(): Unit = {} + + /** + * The topic of the element being serialized or deserialized + * Note: topic is only available when called from the "deserialize" or "serialize" methods + */ + final def topic: String = _topic +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/AbstractThriftDelegateSerde.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/AbstractThriftDelegateSerde.scala new file mode 100644 index 0000000000..46b40c0ba2 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/AbstractThriftDelegateSerde.scala @@ -0,0 +1,21 @@ +package com.twitter.finatra.kafka.serde + +import com.twitter.scrooge.ThriftStruct + +abstract class AbstractThriftDelegateSerde[T, ThriftType <: ThriftStruct: Manifest] + extends AbstractSerde[T] { + + private val thriftStructSerializer = ScalaSerdes.Thrift[ThriftType].thriftStructSerializer + + def toThrift(value: T): ThriftType + + def fromThrift(thrift: ThriftType): T + + final override def serialize(obj: T): Array[Byte] = { + thriftStructSerializer.toBytes(toThrift(obj)) + } + + final override def deserialize(bytes: Array[Byte]): T = { + fromThrift(thriftStructSerializer.fromBytes(bytes)) + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/ReusableDeserialize.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/ReusableDeserialize.scala new file mode 100644 index 0000000000..bad6c24385 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/ReusableDeserialize.scala @@ -0,0 +1,5 
@@ +package com.twitter.finatra.kafka.serde + +trait ReusableDeserialize[T] { + def deserialize(bytes: Array[Byte], reusable: T): Unit +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/ScalaSerdes.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/ScalaSerdes.scala new file mode 100644 index 0000000000..f684137377 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/ScalaSerdes.scala @@ -0,0 +1,39 @@ +/* + * Copyright (c) 2016 Fred Cecilia, Valentin Kasas, Olivier Girardot + * + * Permission is hereby granted, free of charge, to any person obtaining a copy of + * this software and associated documentation files (the "Software"), to deal in + * the Software without restriction, including without limitation the rights to + * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of + * the Software, and to permit persons to whom the Software is furnished to do so, + * subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS + * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR + * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER + * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ + +//Derived from: https://github.com/aseigneurin/kafka-streams-scala +package com.twitter.finatra.kafka.serde + +import com.twitter.finatra.kafka.serde.internal._ +import com.twitter.scrooge.ThriftStruct + +object ScalaSerdes { + + def Thrift[T <: ThriftStruct: Manifest]: ThriftSerDe[T] = new ThriftSerDe[T] + + def CompactThrift[T <: ThriftStruct: Manifest]: CompactThriftSerDe[T] = new CompactThriftSerDe[T] + + val Int = IntSerde + + val Long = LongSerde + + val Double = DoubleSerde +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/UnKeyed.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/UnKeyed.scala new file mode 100644 index 0000000000..71673f8483 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/UnKeyed.scala @@ -0,0 +1,5 @@ +package com.twitter.finatra.kafka.serde + +object UnKeyed extends UnKeyed + +class UnKeyed \ No newline at end of file diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/UnKeyedSerde.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/UnKeyedSerde.scala new file mode 100644 index 0000000000..aa8da8fc06 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/UnKeyedSerde.scala @@ -0,0 +1,30 @@ +package com.twitter.finatra.kafka.serde + +import java.util +import org.apache.kafka.common.serialization.{Deserializer, Serde, Serializer} + +object UnKeyedSerde extends Serde[UnKeyed] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def deserializer: Deserializer[UnKeyed] = new Deserializer[UnKeyed] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} + + override def deserialize(topic: String, data: Array[Byte]): UnKeyed = { + UnKeyed + } + } + + override def serializer: Serializer[UnKeyed] = new Serializer[UnKeyed] { + override def configure(configs: util.Map[String, _], 
isKey: Boolean): Unit = {} + + override def serialize(topic: String, data: UnKeyed): Array[Byte] = { + null + } + + override def close(): Unit = {} + } + + override def close(): Unit = {} +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/internal/serde.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/internal/serde.scala new file mode 100644 index 0000000000..0b000bb751 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/internal/serde.scala @@ -0,0 +1,116 @@ +/* + * Copyright (c) 2016 Fred Cecilia, Valentin Kasas, Olivier Girardot + * + * Permission is hereby granted, free of charge, to any person obtaining a copy of + * this software and associated documentation files (the "Software"), to deal in + * the Software without restriction, including without limitation the rights to + * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of + * the Software, and to permit persons to whom the Software is furnished to do so, + * subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in all + * copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS + * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR + * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER + * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + */ +//Derived from: https://github.com/aseigneurin/kafka-streams-scala +package com.twitter.finatra.kafka.serde.internal + +import java.util +import org.apache.kafka.common.serialization.{Deserializer, Serde, Serializer, _} + +private[serde] object IntSerde extends BaseSerde[Int] { + private val innerSerializer = new IntegerSerializer + private val innerDeserializer = new IntegerDeserializer + + final override def serialize(topic: String, data: Int): Array[Byte] = + innerSerializer.serialize(topic, data) + + final override def deserialize(topic: String, data: Array[Byte]): Int = + innerDeserializer.deserialize(topic, data) + +} + +private[serde] object LongSerde extends BaseSerde[Long] { + private val innerSerializer = new LongSerializer + private val innerDeserializer = new LongDeserializer + + final override def serialize(topic: String, data: Long): Array[Byte] = + innerSerializer.serialize(topic, data) + + final override def deserialize(topic: String, data: Array[Byte]): Long = + innerDeserializer.deserialize(topic, data) + +} + +private[serde] object DoubleSerde extends BaseSerde[Double] { + private val innerSerializer = new DoubleSerializer + private val innerDeserializer = new DoubleDeserializer + + final override def serialize(topic: String, data: Double): Array[Byte] = + innerSerializer.serialize(topic, data) + + final override def deserialize(topic: String, data: Array[Byte]): Double = + innerDeserializer.deserialize(topic, data) + +} + +abstract class BaseSerde[T] extends Serde[T] { + + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} + + override def serializer = BaseSerializer(serialize) + + override def deserializer = BaseDeserializer(deserialize) + + def serialize(topic: String, data: T): Array[Byte] + + def serializeBytes(data: T): Array[Byte] = serialize("", 
data) + + def deserialize(topic: String, data: Array[Byte]): T + + def deserializeBytes(data: Array[Byte]): T = deserialize("", data) +} + +abstract class BaseSerializer[T] extends Serializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} +} + +object BaseSerializer { + def apply[T](func: (String, T) => Array[Byte]): BaseSerializer[T] = new BaseSerializer[T] { + final override def serialize(topic: String, data: T): Array[Byte] = { + if (data == null) { + null + } else { + func(topic, data) + } + } + } +} + +abstract class BaseDeserializer[T] extends Deserializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} +} + +object BaseDeserializer { + def apply[T](func: (String, Array[Byte]) => T): BaseDeserializer[T] = new BaseDeserializer[T] { + final override def deserialize(topic: String, data: Array[Byte]): T = { + if (data == null) { + null.asInstanceOf[T] + } else { + func(topic, data) + } + } + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/serde/internal/thrift.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/internal/thrift.scala new file mode 100644 index 0000000000..db49e8a24c --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/serde/internal/thrift.scala @@ -0,0 +1,128 @@ +package com.twitter.finatra.kafka.serde.internal + +import java.util + +import com.twitter.scrooge.{ + CompactThriftSerializer, + ThriftStruct, + ThriftStructCodec, + ThriftStructSerializer +} +import org.apache.kafka.common.serialization.{Deserializer, Serde, Serializer} +import org.apache.thrift.protocol.TBinaryProtocol + +import scala.util.Try + +private[serde] abstract class AbstractScroogeSerDe[T <: ThriftStruct: Manifest] extends Serde[T] { + private[kafka] val thriftStructSerializer: ThriftStructSerializer[T] = { + val clazz = manifest[T].runtimeClass.asInstanceOf[Class[T]] + val codec = constructCodec(clazz) + + constructThriftStructSerializer(clazz, codec) + } + + private val _deserializer = new Deserializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} + + override def deserialize(topic: String, data: Array[Byte]): T = { + if (data == null) { + null.asInstanceOf[T] + } else { + thriftStructSerializer.fromBytes(data) + } + } + } + + private val _serializer = new Serializer[T] { + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def serialize(topic: String, data: T): Array[Byte] = { + if (data == null) { + null + } else { + thriftStructSerializer.toBytes(data) + } + } + + override def close(): Unit = {} + } + + /* Public */ + + override def configure(configs: util.Map[String, _], isKey: Boolean): Unit = {} + + override def close(): Unit = {} + + override def deserializer: Deserializer[T] = { + _deserializer + } + + override def serializer: Serializer[T] = { + _serializer + } + + /** + * Subclasses should implement this method and provide a concrete ThriftStructSerializer + */ + protected[this] def constructThriftStructSerializer( + thriftStructClass: Class[T], + thriftStructCodec: ThriftStructCodec[T] + ): ThriftStructSerializer[T] + + /* Public */ + private[this] def constructCodec(thriftStructClass: Class[T]): ThriftStructCodec[T] = + codecForNormal(thriftStructClass) + .orElse(codecForUnion(thriftStructClass)) + .get + + /** + * For unions, we split on $ after the dot. 
+ * this is costly, but only done once per Class + */ + private[this] def codecForUnion(maybeUnion: Class[T]): Try[ThriftStructCodec[T]] = + Try( + getObject( + Class.forName( + maybeUnion.getName.reverse.dropWhile(_ != '$').reverse, + true, + maybeUnion.getClassLoader + ) + ) + ).map(_.asInstanceOf[ThriftStructCodec[T]]) + + private[this] def codecForNormal(thriftStructClass: Class[T]): Try[ThriftStructCodec[T]] = + Try( + getObject( + Class.forName(thriftStructClass.getName + "$", true, thriftStructClass.getClassLoader) + ) + ).map(_.asInstanceOf[ThriftStructCodec[T]]) + + private def getObject(companionClass: Class[_]): AnyRef = + companionClass.getField("MODULE$").get(null) +} + +private[serde] class ThriftSerDe[T <: ThriftStruct: Manifest] extends AbstractScroogeSerDe[T] { + protected[this] override def constructThriftStructSerializer( + thriftStructClass: Class[T], + thriftStructCodec: ThriftStructCodec[T] + ): ThriftStructSerializer[T] = { + new ThriftStructSerializer[T] { + override val protocolFactory = new TBinaryProtocol.Factory + override def codec: ThriftStructCodec[T] = thriftStructCodec + } + } +} + +private[serde] class CompactThriftSerDe[T <: ThriftStruct: Manifest] + extends AbstractScroogeSerDe[T] { + override protected[this] def constructThriftStructSerializer( + thriftStructClass: Class[T], + thriftStructCodec: ThriftStructCodec[T] + ): ThriftStructSerializer[T] = { + new CompactThriftSerializer[T] { + override def codec: ThriftStructCodec[T] = thriftStructCodec + } + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/stats/KafkaFinagleMetricsReporter.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/stats/KafkaFinagleMetricsReporter.scala new file mode 100644 index 0000000000..3eb6f4188f --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/stats/KafkaFinagleMetricsReporter.scala @@ -0,0 +1,205 @@ +package com.twitter.finatra.kafka.stats + +import com.twitter.finagle.stats.{Gauge, LoadedStatsReceiver, StatsReceiver} +import com.twitter.inject.conversions.string._ +import com.twitter.inject.{Injector, Logging} +import java.util +import java.util.regex.Pattern +import org.apache.kafka.common.metrics.{KafkaMetric, MetricsReporter} +import scala.collection.JavaConverters._ +import scala.collection.mutable + +object KafkaFinagleMetricsReporter { + + private[kafka] val IncludeNodeMetrics = "include.node.metrics" + + //Hack to allow tests to use an injected StatsReceiver suitable for assertions + private var globalStatsReceiver: StatsReceiver = LoadedStatsReceiver + + def init(injector: Injector): Unit = { + globalStatsReceiver = injector.instance[StatsReceiver] + } + + def sanitizeMetricName(metricName: String) = { + KafkaFinagleMetricsReporter.notAllowedMetricPattern + .matcher(metricName) + .replaceAll("_") + } + + private val notAllowedMetricPattern = + Pattern.compile("-| -> |: |, |\\(|\\)| |[^\\w\\d]&&[^./]") + + private val rateMetricsToIgnore = Set( + "batch-split-rate", + "buffer-exhausted-rate", + "byte-rate", + "bytes-consumed-rate", + "connection-close-rate", + "connection-creation-rate", + "failed-authentication-rate", + "fetch-rate", + "heartbeat-rate", + "incoming-byte-rate", + "join-rate", + "network-io-rate", + "outgoing-byte-rate", + "record-error-rate", + "record-retry-rate", + "record-send-rate", + "records-consumed-rate", + "request-rate", + "response-rate", + "select-rate", + "sync-rate", + "successful-authentication-rate" + ) +} + +class KafkaFinagleMetricsReporter extends MetricsReporter with Logging { + private var 
statsReceiver: StatsReceiver = _ + private val gauges: mutable.Map[String, Gauge] = mutable.Map() + private var statsScope: String = "" + private var includeNodeMetrics: Boolean = _ + private var includePartition: Boolean = _ + + /* Public */ + + override def init(metrics: util.List[KafkaMetric]): Unit = { + // Initial testing shows that no metrics appear to be passed into init... + } + + override def configure(configs: util.Map[String, _]): Unit = { + trace("Configure: " + configs.asScala.mkString("\n")) + statsScope = Option(configs.get("stats_scope")).getOrElse("kafka").toString + includeNodeMetrics = Option(configs.get(KafkaFinagleMetricsReporter.IncludeNodeMetrics)) + .getOrElse("false").toString.toBoolean + includePartition = Option(configs.get("includePartition")).getOrElse("true").toString.toBoolean + statsReceiver = KafkaFinagleMetricsReporter.globalStatsReceiver.scope(statsScope.toString) + } + + override def metricRemoval(metric: KafkaMetric): Unit = { + if (shouldIncludeMetric(metric)) { + val combinedName = createAndSanitizeFinagleMetricName(metric) + trace("metricRemoval: " + metric.metricName() + "\t" + combinedName) + + for (removedGauge <- gauges.remove(combinedName)) { + removedGauge.remove() + } + } + } + + override def metricChange(metric: KafkaMetric): Unit = { + if (shouldIncludeMetric(metric)) { + val combinedName = createAndSanitizeFinagleMetricName(metric) + trace("metricChange: " + metric.metricName() + "\t" + combinedName) + + // Ensure prior metrics are removed (although these should be removed in the metricRemoval method + for (removedGauge <- gauges.remove(combinedName)) { + warn( + s"Duplicate metric found. Removing prior gauges for: " + metric + .metricName() + "\t" + combinedName + ) + removedGauge.remove() + } + + val gauge = statsReceiver.addGauge(combinedName) { metricToFloat(metric) } + + gauges.put(combinedName, gauge) + } + } + + override def close(): Unit = { + trace("Closing FinagleMetricsReporter") + gauges.values.foreach(_.remove()) + gauges.clear() + } + + /* Protected */ + + protected def createFinagleMetricName(metric: KafkaMetric): String = { + val allTags = new util.HashMap[String, String]() + allTags.putAll(metric.metricName().tags()) + allTags.putAll(metric.config().tags()) + + val metricName = metric.metricName().name() + val component = + parseComponent(clientId = allTags.remove("client-id"), group = metric.metricName().group) + val nodeId = Option(allTags.remove("node-id")).map("/" + _).getOrElse("") + val topic = Option(allTags.remove("topic")).map("/" + _).getOrElse("") + + createFinagleMetricName(metric, metricName, allTags, component, nodeId, topic) + } + + protected def createFinagleMetricName( + metric: KafkaMetric, + metricName: String, + allTags: java.util.Map[String, String], + component: String, + nodeId: String, + topic: String + ): String = { + val partition = parsePartitionTag(allTags) + val otherTagsStr = createOtherTagsStr(metric, allTags) + + component + topic + partition + otherTagsStr + nodeId + "/" + metricName + } + + protected def createOtherTagsStr( + metric: KafkaMetric, + allTags: util.Map[String, String] + ): String = { + val otherTagsStr = allTags.asScala.mkString("__").toOption.map("/" + _).getOrElse("") + if (otherTagsStr.nonEmpty) { + warn(s"Unexpected metrics tags found: $metric ${metric.metricName()} $otherTagsStr") + } + otherTagsStr + } + + protected def shouldIncludeMetric(metric: KafkaMetric): Boolean = { + val metricName = metric.metricName() + + // remove any metrics that are already "rated" as these 
not consistent with other metrics: http://go/jira/DINS-2187 + if (KafkaFinagleMetricsReporter.rateMetricsToIgnore(metricName.name())) { + false + } else if (metricName + .name() == "assigned-partitions") { //See: https://issues.apache.org/jira/browse/KAFKA-4950 where an occasional error reading the assigned-partitions stat then leads to the instance hanging and not restarting + false + } else if (metricName.group.contains("node")) { //By default we omit node level metrics which leads to lots of fine grained stats + includeNodeMetrics + } else { + metricName.group() != "kafka-metrics-count" && + metric.metricValue().isInstanceOf[Number] + } + } + + protected def parseComponent(clientId: String, group: String): String = { + clientId + } + + protected def parsePartitionTag(allTags: util.Map[String, String]): String = { + val partitionOpt = Option(allTags.remove("partition")) + if (!includePartition) { + "" + } else { + partitionOpt.map("/" + _).getOrElse("") + } + } + + /* Private */ + + private def createAndSanitizeFinagleMetricName(metric: KafkaMetric): String = { + trace(metric.metricName()) + val finagleMetricName = createFinagleMetricName(metric) + KafkaFinagleMetricsReporter.sanitizeMetricName(finagleMetricName) + } + + //Note: We map Double.NegInfinitiy to Float.MinValue since it would otherwise map to Float.NegInfiniti which doesn't render as a number in /admin/metrics.json + private def metricToFloat(metric: KafkaMetric) = { + metric.metricValue() match { + case number: Number if number.doubleValue().isNegInfinity => Float.MinValue + case number: Number if number.doubleValue().isInfinity => Float.MaxValue + case number: Number => number.floatValue() + case _ => Float.NaN + } + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/utils/BootstrapServerUtils.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/utils/BootstrapServerUtils.scala new file mode 100644 index 0000000000..a6de38a8a4 --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/utils/BootstrapServerUtils.scala @@ -0,0 +1,52 @@ +package com.twitter.finatra.kafka.utils + +import com.twitter.finagle.Addr.{Bound, Failed, Neg, Pending} +import com.twitter.finagle.Address.Inet +import com.twitter.finagle.{Addr, Address, Namer} +import com.twitter.inject.Logging +import com.twitter.util.{Await, Promise, Witness} +import java.net.InetSocketAddress + +object BootstrapServerUtils extends Logging { + + def lookupBootstrapServers(dest: String): String = { + if (!dest.startsWith("/")) { + info(s"Resolved Kafka Dest = $dest") + dest + } else { + info(s"Resolving Kafka Bootstrap Servers: $dest") + val promise = new Promise[Seq[InetSocketAddress]]() + val resolveResult = Namer + .resolve(dest).changes + .register(new Witness[Addr] { + override def notify(note: Addr): Unit = note match { + case Pending => + case Bound(addresses, _) => + val socketAddresses = toAddresses(addresses) + promise.setValue(socketAddresses) + case Failed(t) => + promise + .setException(new IllegalStateException(s"Unable to find addresses for $dest", t)) + case Neg => + promise.setException(new IllegalStateException(s"Unable to bind addresses for $dest")) + } + }) + + val socketAddress = Await.result(promise) + resolveResult.close() + val servers = + socketAddress.take(5).map(a => s"${a.getAddress.getHostAddress}:${a.getPort}").mkString(",") + info(s"Resolved $dest = " + servers) + servers + } + } + + private def toAddresses(addresses: Set[Address]): Seq[InetSocketAddress] = { + addresses.flatMap { + case Inet(addr, _) => Some(addr) + 
case unknown => + warn(s"Found unknown address type looking up bootstrap servers: $unknown") + None + }.toSeq + } +} diff --git a/kafka/src/main/scala/com/twitter/finatra/kafka/utils/ConfigUtils.scala b/kafka/src/main/scala/com/twitter/finatra/kafka/utils/ConfigUtils.scala new file mode 100644 index 0000000000..96bf8bc88b --- /dev/null +++ b/kafka/src/main/scala/com/twitter/finatra/kafka/utils/ConfigUtils.scala @@ -0,0 +1,11 @@ +package com.twitter.finatra.kafka.utils + +import java.util + +object ConfigUtils { + def getConfigOrElse(configs: util.Map[String, _], key: String, default: String): String = { + Option(configs.get(key)) + .map(_.toString) + .getOrElse(default) + } +} diff --git a/kafka/src/test/resources/BUILD b/kafka/src/test/resources/BUILD new file mode 100644 index 0000000000..9237675c63 --- /dev/null +++ b/kafka/src/test/resources/BUILD @@ -0,0 +1,3 @@ +resources( + sources = globs("*.xml"), +) diff --git a/kafka/src/test/resources/logback-test.xml b/kafka/src/test/resources/logback-test.xml new file mode 100644 index 0000000000..e1ca500127 --- /dev/null +++ b/kafka/src/test/resources/logback-test.xml @@ -0,0 +1,33 @@ + + + + %.-3level %-100logger %msg%n + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/kafka/src/test/scala/BUILD b/kafka/src/test/scala/BUILD new file mode 100644 index 0000000000..084f3a53ce --- /dev/null +++ b/kafka/src/test/scala/BUILD @@ -0,0 +1,74 @@ +scala_library( + name = "test-deps", + sources = globs( + "com/twitter/finatra/kafka/test/*.scala", + "com/twitter/finatra/kafka/test/utils/*.scala", + ), + compiler_option_sets = {"fatal_warnings"}, + provides = scala_artifact( + org = "com.twitter", + name = "finatra-kafka-tests", + repo = artifactory, + ), + strict_deps = False, + dependencies = [ + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/junit", + "3rdparty/jvm/org/apache/kafka", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-client", + "3rdparty/jvm/org/scalatest", + "finatra/http/src/test/scala:test-deps", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-core/src/test/scala:test-deps", + "finatra/inject/inject-server/src/test/scala:test-deps", + "finatra/inject/inject-slf4j/src/main/scala", + "finatra/jackson/src/main/scala", + "finatra/kafka/src/main/scala", + "finatra/kafka/src/test/thrift:thrift-scala", + "util/util-slf4j-api/src/main/scala", + ], + excludes = [ + exclude( + org = "com.twitter", + name = "twitter-server-internal-naming_2.11", + ), + exclude( + org = "com.twitter", + name = "loglens-log4j-logging_2.11", + ), + exclude( + org = "log4j", + name = "log4j", + ), + ], + exports = [ + "3rdparty/jvm/com/google/inject:guice", + "3rdparty/jvm/junit", + "3rdparty/jvm/org/apache/kafka", + "3rdparty/jvm/org/apache/kafka:kafka-clients", + "3rdparty/jvm/org/apache/kafka:kafka-clients-test", + "3rdparty/jvm/org/apache/kafka:kafka-streams-test", + "3rdparty/jvm/org/apache/kafka:kafka-test", + "3rdparty/jvm/org/scalatest", + "finatra/inject/inject-core/src/main/scala", + "finatra/inject/inject-core/src/test/scala:test-deps", + "finatra/inject/inject-server/src/test/scala:test-deps", + "finatra/inject/inject-slf4j/src/main/scala", + "finatra/kafka/src/main/scala", + "util/util-slf4j-api/src/main/scala", + ], +) + +junit_tests( + sources = rglobs("*.scala"), + compiler_option_sets = 
{"fatal_warnings"}, + strict_deps = False, + dependencies = [ + ":test-deps", + "3rdparty/jvm/org/apache/zookeeper:zookeeper-server", + ], +) diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/serde/AbstractSerdeTest.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/serde/AbstractSerdeTest.scala new file mode 100644 index 0000000000..e2f010e7f8 --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/serde/AbstractSerdeTest.scala @@ -0,0 +1,38 @@ +package com.twitter.finatra.kafka.serde + +import com.twitter.inject.Test + +class AbstractSerdeTest extends Test { + + private val bob = Person("Bob", 22) + private val serde = new PersonSerde() + + test("serde") { + val bobBytes = serde.serializer().serialize("topicA", bob) + serde.deserializer().deserialize("topicA", bobBytes) should equal(bob) + + val reusablePerson = Person("", 0) + serde.deserialize(bobBytes, reusablePerson) + reusablePerson should equal(bob) + } + + private case class Person(var name: String, var age: Int) + + private class PersonSerde extends AbstractSerde[Person] with ReusableDeserialize[Person] { + + override def deserialize(bytes: Array[Byte]): Person = { + val personParts = new String(bytes).split(',') + Person(personParts(0), personParts(1).toInt) + } + + override def deserialize(bytes: Array[Byte], reusable: Person): Unit = { + val personParts = new String(bytes).split(',') + reusable.name = personParts(0) + reusable.age = personParts(1).toInt + } + + override def serialize(person: Person): Array[Byte] = { + s"${person.name},${person.age}".getBytes + } + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/serde/AbstractThriftDelegateSerdeTest.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/serde/AbstractThriftDelegateSerdeTest.scala new file mode 100644 index 0000000000..17f4e2d588 --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/serde/AbstractThriftDelegateSerdeTest.scala @@ -0,0 +1,26 @@ +package com.twitter.finatra.kafka.serde + +import com.twitter.finatra.kafka.test.thriftscala.ThriftPerson +import com.twitter.inject.Test + +class AbstractThriftDelegateSerdeTest extends Test { + test("serde") { + val bob = Person("Bob", 22) + val serde = new PersonSerde() + + val bobBytes = serde.serializer().serialize("topicA", bob) + serde.deserializer().deserialize("topicA", bobBytes) should equal(bob) + } + + case class Person(name: String, age: Int) + + class PersonSerde extends AbstractThriftDelegateSerde[Person, ThriftPerson] { + override def toThrift(person: Person): ThriftPerson = { + ThriftPerson(person.name, person.age.toShort) + } + + override def fromThrift(thrift: ThriftPerson): Person = { + Person(thrift.name, thrift.age) + } + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/EmbeddedKafka.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/EmbeddedKafka.scala new file mode 100644 index 0000000000..a7124325ec --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/EmbeddedKafka.scala @@ -0,0 +1,130 @@ +package com.twitter.finatra.kafka.test + +import com.twitter.conversions.StorageUnitOps._ +import com.twitter.conversions.DurationOps._ +import com.twitter.finatra.kafka.modules.KafkaBootstrapModule +import com.twitter.inject.{Logging, Test} +import com.twitter.util.Duration +import java.util.Properties +import kafka.server.KafkaConfig +import org.apache.kafka.common.serialization._ +import org.apache.kafka.common.utils.Bytes +import org.apache.kafka.streams.integration.utils.{EmbeddedKafkaCluster, 
KafkaEmbedded} +import scala.collection.mutable.ArrayBuffer + +trait EmbeddedKafka extends Test with Logging { + + private val kafkaTopics = ArrayBuffer[KafkaTopic[_, _]]() + + /* Protected */ + + protected def numKafkaBrokers: Int = 1 + + protected def autoCreateTopicsEnable: Boolean = false + + protected def groupInitialRebalanceDelay: Duration = 0.seconds + + protected def maxMessageBytes = 20.megabytes.bytes + + protected val emptyBytes: Bytes = Bytes.wrap(Array.emptyByteArray) + + protected def brokerConfig: Properties = { + val properties = new Properties() + properties.put(KafkaConfig.AutoCreateTopicsEnableProp, autoCreateTopicsEnable.toString) + properties.put( + KafkaConfig.GroupInitialRebalanceDelayMsProp, + groupInitialRebalanceDelay.inMillis.toString + ) + properties.put(KafkaConfig.NumIoThreadsProp, "1") + properties.put(KafkaConfig.NumNetworkThreadsProp, "1") + properties.put(KafkaConfig.BackgroundThreadsProp, "1") + properties.put(KafkaConfig.LogCleanerThreadsProp, "1") + properties.put(KafkaConfig.DefaultReplicationFactorProp, "1") + properties.put(KafkaConfig.TransactionsTopicReplicationFactorProp, "1") + properties.put(KafkaConfig.TransactionsTopicMinISRProp, "1") + properties.put(KafkaConfig.OffsetsTopicReplicationFactorProp, "1") + properties.put(KafkaConfig.MinInSyncReplicasProp, "1") + properties.put(KafkaConfig.MessageMaxBytesProp, maxMessageBytes.toString) + properties + } + + protected lazy val kafkaCluster = new EmbeddedKafkaCluster(numKafkaBrokers, brokerConfig) + + protected def brokers: Array[KafkaEmbedded] = { + val brokersField = classOf[EmbeddedKafkaCluster].getDeclaredField("brokers") + brokersField.setAccessible(true) + brokersField.get(kafkaCluster).asInstanceOf[Array[KafkaEmbedded]] + } + + override protected def beforeAll(): Unit = { + kafkaCluster.start() + createKafkaTopics(kafkaTopics) + } + + protected def kafkaBootstrapFlag: Map[String, String] = { + Map(KafkaBootstrapModule.kafkaBootstrapServers.name -> kafkaCluster.bootstrapServers()) + } + + protected def kafkaTopic[K, V]( + keySerde: Serde[K], + valSerde: Serde[V], + name: String, + partitions: Int = 1, + replication: Int = 1, + autoCreate: Boolean = true, + autoConsume: Boolean = true, + logPublishes: Boolean = false, + allowPublishes: Boolean = true + ): KafkaTopic[K, V] = { + val topic = KafkaTopic( + topic = name, + keySerde = keySerde, + valSerde = valSerde, + _kafkaCluster = () => kafkaCluster, + partitions = partitions, + replication = replication, + autoCreate = autoCreate, + autoConsume = autoConsume, + logPublishes = logPublishes, + allowPublishes = allowPublishes + ) + + kafkaTopics += topic + topic + } + + override protected def afterAll(): Unit = { + super.afterAll() + try { + debug("Shutdown kafka topics") + kafkaTopics.foreach(_.close()) + } finally { + debug("Shutdown embedded kafka") + closeEmbeddedKafka() + debug("Embedded kafka closed") + } + } + + //Note: EmbeddedKafkaCluster appears to only be closable through a JUnit ExternalResource + protected def closeEmbeddedKafka() = { + val afterMethod = classOf[EmbeddedKafkaCluster].getDeclaredMethod("after") + afterMethod.setAccessible(true) + afterMethod.invoke(kafkaCluster) + } + + protected def createKafkaServerProperties(): Properties = { + val kafkaServerProperties = new Properties + kafkaServerProperties.put(KafkaConfig.OffsetsTopicReplicationFactorProp, "1") + kafkaServerProperties + } + + private def createKafkaTopics(topics: Seq[KafkaTopic[_, _]]): Unit = { + for (topic <- topics) { + if (topic.autoCreate) { + 
info("Creating topic " + topic.toPrettyString) + kafkaCluster.createTopic(topic.topic, topic.partitions, topic.replication) + } + topic.init() + } + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/KafkaFeatureTest.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/KafkaFeatureTest.scala new file mode 100644 index 0000000000..b595a78065 --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/KafkaFeatureTest.scala @@ -0,0 +1,31 @@ +package com.twitter.finatra.kafka.test + +import com.twitter.finatra.kafka.test.utils.InMemoryStatsUtil +import com.twitter.inject.Test +import com.twitter.inject.server.EmbeddedTwitterServer + +trait KafkaFeatureTest extends Test with EmbeddedKafka { + + protected def server: EmbeddedTwitterServer + + override def beforeAll(): Unit = { + super.beforeAll() + server.start() + } + + // Note: We close the server connected to kafka before closing the embedded kafka server + override def afterAll(): Unit = { + try { + server.close() + //TODO: Await.result(server.mainResult) + } finally { + super.afterAll() + } + } + + protected lazy val inMemoryStatsUtil = InMemoryStatsUtil(server.injector) + + protected def waitForKafkaMetric(name: String, expected: Float): Unit = { + inMemoryStatsUtil.waitForGauge(name, expected) + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/KafkaTopic.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/KafkaTopic.scala new file mode 100644 index 0000000000..3c3cc4d6ca --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/KafkaTopic.scala @@ -0,0 +1,361 @@ +package com.twitter.finatra.kafka.test + +import com.twitter.finatra.json.JsonDiff +import com.twitter.finatra.kafka.interceptors.PublishTimeProducerInterceptor +import com.twitter.finatra.kafka.test.utils.{PollUtils, ThreadUtils} +import com.twitter.inject.Logging +import com.twitter.inject.conversions.time._ +import com.twitter.util.TimeoutException +import java.util.concurrent.LinkedBlockingQueue +import java.util.{Collections, Properties} +import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer} +import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord} +import org.apache.kafka.common.header.Header +import org.apache.kafka.common.serialization.{ByteArrayDeserializer, ByteArraySerializer, Serde} +import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster +import org.joda.time.{DateTime, Duration} +import org.scalatest.Matchers +import org.slf4j.event.Level +import scala.collection.JavaConverters._ +import scala.util.control.NonFatal + +/** + * Used to read/write from Kafka topics created on local brokers during testing. 
+ * + * @param topic the topic to write to + * @param keySerde the serde for the key + * @param valSerde the serde for the value + * @param _kafkaCluster the kafka cluster to use to produce/consume from + * @param partitions the number of partitions for this topic + * @param replication tge replication factor for this topic + * @param autoConsume whether or not to automatically consume messages off this topic(useful for logging) + * @param autoCreate whether or not to automatically create this topic on the brokers + * @param logPublishes whether or not to publish logs + * @param allowPublishes whether or not this topic allows publishes + * @tparam K the type of the key + * @tparam V the type of the value + */ +case class KafkaTopic[K, V]( + topic: String, + keySerde: Serde[K], + valSerde: Serde[V], + _kafkaCluster: () => EmbeddedKafkaCluster, + partitions: Int = 1, + replication: Int = 1, + autoConsume: Boolean = true, //TODO: Rename autoConsume + autoCreate: Boolean = true, + logPublishes: Boolean = true, + allowPublishes: Boolean = true) + extends Logging + with Matchers { + + private val defaultConsumeTimeout = 60.seconds + private lazy val kafkaCluster = _kafkaCluster() + private lazy val producer = new KafkaProducer[Array[Byte], Array[Byte]](producerConfig) + private[twitter] lazy val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](consumerConfig) + private val consumedMessages = new LinkedBlockingQueue[ConsumerRecord[Array[Byte], Array[Byte]]]() + @volatile private var running = false + @volatile private var failure: Throwable = _ + + private val keyDeserializer = keySerde.deserializer() + private val valueDeserializer = valSerde.deserializer() + + /* Public */ + + def init(): Unit = { + running = true + if (autoConsume) { + ThreadUtils.fork { + try { + consumer.subscribe(Collections.singletonList(topic)) + + while (running) { + val consumerRecords = consumer.poll(java.time.Duration.ofMillis(Long.MaxValue)) + for (record <- consumerRecords.iterator().asScala) { + val (key, value) = deserializeKeyValue(record) + debug( + f"@${dateTimeStr(record)}%-24s ${topic + "_" + record.partition}%-80s$key%-50s -> $value" + ) + + consumedMessages.put(record) + } + } + + consumer.close() + } catch { + case NonFatal(e) => + running = false + failure = e + error(s"Error reading KafkaTopic $topic", e) + } + } + } + } + + private def dateTimeStr(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]]): String = { + val timestamp = consumerRecord.timestamp() + dateTimeStr(timestamp) + } + + private def dateTimeStr(timestamp: Long) = { + if (timestamp == Long.MaxValue) { + "MaxWatermark" + } else { + new DateTime(timestamp).toString + } + } + + def close(): Unit = { + running = false + producer.close() + assert(failure == null, s"There was an error consuming from KafkaTopic $topic " + failure) + } + + def publish( + keyValue: (K, V), + timestamp: Long = System.currentTimeMillis(), + headers: Iterable[Header] = Seq.empty[Header] + ): Unit = { + assert(allowPublishes) + val (key, value) = keyValue + val producerRecord = new ProducerRecord( + topic, + null, + timestamp, + keySerde.serializer().serialize(topic, key), + valSerde.serializer().serialize(topic, value), + headers.asJava + ) + + val sendResult = producer.send(producerRecord).get() + if (logPublishes) { + info( + f"@${dateTimeStr(timestamp)}%-24s ${topic + "_" + sendResult.partition}%-80s$key%-50s -> $value" + ) + } + } + + def publishUnkeyedValue(value: V, timestamp: Long = System.currentTimeMillis()): Unit = { + publish(keyValue = 
null.asInstanceOf[K] -> value, timestamp = timestamp) + } + + def consumeValue(): V = consumeValue(defaultConsumeTimeout) + + def consumeValue(timeout: Duration = defaultConsumeTimeout): V = consumeValues(1, timeout).head + + def consumeValues(numValues: Int, timeout: Duration = defaultConsumeTimeout): Seq[V] = + consumeMessages(numValues, timeout).map(kv => kv._2) + + def consumeMessage(): (K, V) = consumeMessages(1, defaultConsumeTimeout).head + + def consumeRecord(): ConsumerRecord[K, V] = { + val record = consumeRecords(1).head + + val (k, v) = deserializeKeyValue(record) + + new ConsumerRecord( + record.topic(), + record.partition(), + record.offset(), + record.timestamp(), + record.timestampType(), + record.checksum(), + record.serializedKeySize(), + record.serializedValueSize(), + k, + v, + record.headers() + ) + } + + def consumeMessage(timeout: Duration = defaultConsumeTimeout): (K, V) = { + consumeMessages(1, timeout).head + } + + def consumeMessages(numMessages: Int, timeout: Duration = defaultConsumeTimeout): Seq[(K, V)] = { + assert(failure == null, s"There was an error consuming from KafkaTopic $topic " + failure) + assert(autoConsume) + if (!running) { + init() + } + + val resultBuilder = Seq.newBuilder[(K, V)] + resultBuilder.sizeHint(numMessages) + val endTime = System.currentTimeMillis() + timeout.getMillis + + var messagesRemaining = numMessages + while (messagesRemaining > 0) { + if (System.currentTimeMillis() > endTime) { + throw new TimeoutException(s"Timeout waiting to consume $numMessages messages") + } + + val pollResult = consumedMessages.poll() + if (pollResult != null) { + messagesRemaining -= 1 + trace(s"Poll result w/ messages remaining $messagesRemaining: " + pollResult) + val (key, value) = deserializeKeyValue(pollResult) + + resultBuilder += ((key, value)) + } + + if (messagesRemaining > 0) { + Thread.sleep(5) + } + } + + resultBuilder.result() + } + + def consumeRecords( + numMessages: Int, + timeout: Duration = defaultConsumeTimeout + ): Seq[ConsumerRecord[Array[Byte], Array[Byte]]] = { + assert(failure == null, s"There was an error consuming from KafkaTopic $topic " + failure) + assert(autoConsume) + if (!running) { + init() + } + + val resultBuilder = Seq.newBuilder[ConsumerRecord[Array[Byte], Array[Byte]]] + resultBuilder.sizeHint(numMessages) + val endTime = System.currentTimeMillis() + timeout.getMillis + + var messagesRemaining = numMessages + while (messagesRemaining > 0) { + if (System.currentTimeMillis() > endTime) { + throw new TimeoutException(s"Timeout waiting to consume $numMessages messages") + } + + val pollResult = consumedMessages.poll() + if (pollResult != null) { + messagesRemaining -= 1 + trace(s"Poll result w/ messages remaining $messagesRemaining: " + pollResult) + + resultBuilder += pollResult + } + + if (messagesRemaining > 0) { + Thread.sleep(5) + } + } + + resultBuilder.result() + } + + /** + * Note: This method may consume more messages than the expected number of keys + */ + def consumeAsManyMessagesUntil( + timeout: Duration = defaultConsumeTimeout, + exhaustedTimeoutMessage: => String = "", + exhaustedTriesMessage: => String = "" + )(until: ((K, V)) => Boolean + ): (K, V) = { + try { + PollUtils.poll( + func = consumeMessage(Duration.standardHours(999)), //Note: Set set a high duration here so that we rely on PollUtils to enforce the duration + exhaustedTriesMessage = (_: (K, V)) => exhaustedTriesMessage, + exhaustedTimeoutMessage = exhaustedTimeoutMessage, + timeout = timeout, + sleepDuration = 0.millis + )(until = until) + 
} catch { + case e: com.twitter.util.TimeoutException => + warn(exhaustedTimeoutMessage) + throw e + } + } + + /** + * Note: This method may consume more messages than the expected number of keys + */ + //TODO: DRY + def consumeAsManyMessagesUntilMap( + expected: Map[K, V], + timeout: Duration = defaultConsumeTimeout, + logLevel: Level = Level.TRACE + ): (K, V) = { + val unSeenKeys = expected.keySet.toBuffer + consumeAsManyMessagesUntil( + timeout, + exhaustedTimeoutMessage = s"UnSeenKeys: $unSeenKeys", + exhaustedTriesMessage = s"UnSeenKeys: $unSeenKeys" + ) { + case (key, value) => + if (expected.get(key).contains(value)) { + unSeenKeys -= key + log(logLevel, s"Match $key $value $expected UnseenKeys $unSeenKeys") + } else { + log(logLevel, s"NoMatch $key $value $expected UnseenKeys $unSeenKeys") + } + unSeenKeys.isEmpty + } + } + + def consumeExpectedMap(expected: Map[K, V], timeout: Duration = defaultConsumeTimeout): Unit = { + val receivedMap = consumeMessages(expected.size, timeout).toMap + if (receivedMap != expected) { + JsonDiff.jsonDiff(receivedMap, expected) + } + } + + def toPrettyString: String = { + s"$topic\tPartitions: $partitions Replication: $replication" + } + + def clearConsumedMessages(): Unit = consumedMessages.clear() + + def numConsumedMessages: Int = consumedMessages.size + + /* Private */ + + private lazy val producerConfig = { + val config = new Properties + config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaCluster.bootstrapServers) + config.put(ProducerConfig.ACKS_CONFIG, "all") + config.put(ProducerConfig.RETRIES_CONFIG, Integer.valueOf(0)) + config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[ByteArraySerializer]) + config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[ByteArraySerializer]) + config.put( + ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, + classOf[PublishTimeProducerInterceptor].getName + ) + config + } + + private lazy val consumerConfig = { + val consumerConfig = new Properties + consumerConfig.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaCluster.bootstrapServers()) + consumerConfig.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka-tester-consumer") + consumerConfig.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest") + consumerConfig.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "5000") + consumerConfig.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[ByteArrayDeserializer]) + consumerConfig.put( + ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, + classOf[ByteArrayDeserializer] + ) + consumerConfig + } + + private def log(level: Level, msg: String): Unit = { + level match { + case Level.ERROR => error(msg) + case Level.WARN => warn(msg) + case Level.INFO => info(msg) + case Level.DEBUG => debug(msg) + case Level.TRACE => trace(msg) + } + } + + private def deserializeKeyValue(record: ConsumerRecord[Array[Byte], Array[Byte]]): (K, V) = { + val key = keyDeserializer.deserialize(topic, record.key()) + val recordValue = record.value() + val value: V = if (recordValue == null) { + null.asInstanceOf[V] + } else { + valueDeserializer.deserialize(topic, record.value()) + } + + (key, value) + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/integration/FinagleKafkaProducerIntegrationTest.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/integration/FinagleKafkaProducerIntegrationTest.scala new file mode 100644 index 0000000000..d3aa2bddb0 --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/integration/FinagleKafkaProducerIntegrationTest.scala @@ -0,0 +1,58 @@ +package 
com.twitter.finatra.kafka.test.integration + +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finatra.kafka.domain.AckMode +import com.twitter.finatra.kafka.producers.FinagleKafkaProducerBuilder +import com.twitter.finatra.kafka.stats.KafkaFinagleMetricsReporter +import com.twitter.finatra.kafka.test.EmbeddedKafka +import com.twitter.finatra.kafka.test.utils.InMemoryStatsUtil +import com.twitter.inject.app.TestInjector +import com.twitter.inject.modules.InMemoryStatsReceiverModule +import com.twitter.util.Await +import org.apache.kafka.common.serialization.Serdes + +class FinagleKafkaProducerIntegrationTest extends EmbeddedKafka { + + private val testTopic = kafkaTopic(Serdes.String, Serdes.String, "test-topic") + + test("success then failure publish") { + val injector = TestInjector(InMemoryStatsReceiverModule).create + KafkaFinagleMetricsReporter.init(injector) + + val producer = FinagleKafkaProducerBuilder() + .dest(brokers.map(_.brokerList()).mkString(",")) + .statsReceiver(injector.instance[StatsReceiver]) + .clientId("test-producer") + .ackMode(AckMode.ALL) + .keySerializer(Serdes.String.serializer) + .valueSerializer(Serdes.String.serializer) + .build() + + Await.result(producer.send("test-topic", "Foo", "Bar", System.currentTimeMillis)) + + val statsUtils = InMemoryStatsUtil(injector) + statsUtils.printStats() + + statsUtils.assertGauge("kafka/test_producer/record_send_total", 1) + val onSendLag = statsUtils.getStat("kafka/test_producer/record_timestamp_on_send_lag") + assert(onSendLag.size == 1) + assert(onSendLag.head >= 0) + + val onSuccessLag = statsUtils.getStat("kafka/test_producer/record_timestamp_on_success_lag") + assert(onSuccessLag.size == 1) + assert(onSuccessLag.head >= onSendLag.head) + + /* Stop the brokers so that the next publish attempt results in publish error */ + closeEmbeddedKafka() + + intercept[org.apache.kafka.common.errors.TimeoutException] { + Await.result(producer.send("test-topic", "Hello", "World", System.currentTimeMillis)) + } + + statsUtils.printStats() + statsUtils.assertGauge("kafka/test_producer/record_error_total", 1) + val onFailureLag = statsUtils.getStat("kafka/test_producer/record_timestamp_on_failure_lag") + assert(onFailureLag.size == 1) + assert(onFailureLag.head >= 0) + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/InMemoryStatsUtil.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/InMemoryStatsUtil.scala new file mode 100644 index 0000000000..a553ad0878 --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/InMemoryStatsUtil.scala @@ -0,0 +1,150 @@ +package com.twitter.finatra.kafka.test.utils + +import com.twitter.finagle.stats.{InMemoryStatsReceiver, StatsReceiver} +import com.twitter.inject.conversions.map._ +import com.twitter.inject.{Injector, Logging} +import org.scalatest.Matchers + +object InMemoryStatsUtil { + def apply(injector: Injector): InMemoryStatsUtil = { + val inMemoryStatsReceiver = injector.instance[StatsReceiver].asInstanceOf[InMemoryStatsReceiver] + new InMemoryStatsUtil(inMemoryStatsReceiver) + } +} + +class InMemoryStatsUtil(val inMemoryStatsReceiver: InMemoryStatsReceiver) + extends Logging + with Matchers { + + def statsMap: Map[String, Seq[Float]] = inMemoryStatsReceiver.stats.toMap.mapKeys(keyStr) + + def countersMap: Map[String, Long] = inMemoryStatsReceiver.counters.iterator.toMap.mapKeys(keyStr) + + def gaugeMap: Map[String, () => Float] = + inMemoryStatsReceiver.gauges.iterator.toMap.mapKeys(keyStr) + + def metricNames: 
Set[String] = + statsMap.keySet ++ countersMap.keySet ++ gaugeMap.keySet + + def getCounter(name: String): Long = { + getOptionalCounter(name) getOrElse { + printStats() + throw new Exception(name + " not found") + } + } + + def getOptionalCounter(name: String): Option[Long] = { + countersMap.get(name) + } + + def assertCounter(name: String, expected: Long): Unit = { + val value = getCounter(name) + if (value != expected) { + printStats() + } + value should equal(expected) + } + + def assertCounter(name: String)(callback: Long => Boolean): Unit = { + callback(getCounter(name)) should be(true) + } + + def getStat(name: String): Seq[Float] = { + statsMap.getOrElse(name, throw new Exception(name + " not found")) + } + + def assertStat(name: String, expected: Seq[Float]): Unit = { + val value = getStat(name) + if (value != expected) { + printStats() + } + value should equal(expected) + } + + def getGauge(name: String): Float = { + getOptionalGauge(name) getOrElse (throw new Exception(name + " not found")) + } + + def getOptionalGauge(name: String): Option[Float] = { + gaugeMap.get(name) map { _.apply() } + } + + def assertGauge(name: String, expected: Float): Unit = { + val value = getGauge(name) + if (value != expected) { + printStats() + } + assert(value == expected) + } + + def printStats(): Unit = { + info(" Stats") + for ((key, values) <- statsMap.toSortedMap) { + val avg = values.sum / values.size + val valuesStr = values.mkString("[", ", ", "]") + info(f"$key%-70s = $avg = $valuesStr") + } + + info("\nCounters:") + for ((key, value) <- countersMap.toSortedMap) { + info(f"$key%-70s = $value") + } + + info("\nGauges:") + for ((key, value) <- gaugeMap.toSortedMap) { + info(f"$key%-70s = ${value()}") + } + } + + def waitForGauge(name: String, expected: Float): Unit = { + waitForGaugeUntil(name, _ == expected) + } + + def waitForGaugeUntil(name: String, predicate: Float => Boolean): Unit = { + PollUtils + .poll[Option[Float]](func = getOptionalGauge(name), exhaustedTriesMessage = result => { + printStats() + s"Gauge $name $result did not satisfy predicate" + })(until = { result => + result.nonEmpty && predicate(result.get) + }).get + } + + /** + * Wait for a counter's value to equal the expected + * @param name Counter name + * @param expected Expected value of counter + * @param failIfActualOverExpected Enable to have test fail fast once the current value is greater than the expected value + * @return + */ + def waitForCounter( + name: String, + expected: Long, + failIfActualOverExpected: Boolean = true + ): Unit = { + PollUtils + .poll[Option[Long]](func = getOptionalCounter(name), exhaustedTriesMessage = result => { + printStats() + s"Counter $name $result != $expected" + })(until = { result => + if (failIfActualOverExpected && result.isDefined) { + assert(result.get <= expected, "Actual counter value is greater than the expected value") + } + result.getOrElse(0) == expected + }) + } + + def waitForCounterUntil(name: String, predicate: Long => Boolean): Long = { + PollUtils + .poll[Option[Long]](func = getOptionalCounter(name), exhaustedTriesMessage = result => { + printStats() + s"Counter $name $result did not satisfy predicate" + })(until = { result => + result.nonEmpty && predicate(result.get) + }).get + } + + private def keyStr(keys: Seq[String]): String = { + keys.mkString("/") + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/PollUtils.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/PollUtils.scala new file mode 100644 index 
0000000000..40d2635bed --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/PollUtils.scala @@ -0,0 +1,48 @@ +package com.twitter.finatra.kafka.test.utils + +import com.twitter.inject.Logging +import com.twitter.inject.conversions.time._ +import org.joda.time.Duration + +object PollUtils extends Logging { + def poll[T]( + func: => T, + sleepDuration: Duration = 100.millis, + timeout: Duration = 60.seconds, + @deprecated("Use timeout") maxTries: Int = -1, + pollMessage: String = "", + exhaustedTimeoutMessage: => String = "", + exhaustedTriesMessage: (T => String) = (_: T) => "" + )(until: T => Boolean + ): T = { + var tries = 0 + var funcResult: T = func + + val timeoutToUse = if (maxTries == -1) { + timeout + } else { + maxTries * sleepDuration + } + + val endTime = System.currentTimeMillis + timeoutToUse.millis + + while (!until(funcResult)) { + tries += 1 + + if (System.currentTimeMillis() > endTime) { + throw new Exception( + s"Poll exceeded totalDuration $timeoutToUse: ${exhaustedTriesMessage(funcResult)}" + ) + } + + if (pollMessage.nonEmpty) { + info(pollMessage) + } + + Thread.sleep(sleepDuration.millis) + funcResult = func + } + + funcResult + } +} diff --git a/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/ThreadUtils.scala b/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/ThreadUtils.scala new file mode 100644 index 0000000000..7a5ccafe24 --- /dev/null +++ b/kafka/src/test/scala/com/twitter/finatra/kafka/test/utils/ThreadUtils.scala @@ -0,0 +1,13 @@ +package com.twitter.finatra.kafka.test.utils + +object ThreadUtils { + + def fork(func: => Unit): Unit = { + new Thread { + override def run(): Unit = { + func + } + }.start() + } + +} diff --git a/kafka/src/test/thrift/BUILD b/kafka/src/test/thrift/BUILD new file mode 100644 index 0000000000..123f149cf3 --- /dev/null +++ b/kafka/src/test/thrift/BUILD @@ -0,0 +1,12 @@ +create_thrift_libraries( + base_name = "thrift", + sources = rglobs("*.thrift"), + dependency_roots = [ + ], + generate_languages = [ + "java", + "scala", + ], + provides_java_name = "testperson-thrift-java", + provides_scala_name = "testperson-thrift-scala", +) diff --git a/kafka/src/test/thrift/person.thrift b/kafka/src/test/thrift/person.thrift new file mode 100644 index 0000000000..803737e52b --- /dev/null +++ b/kafka/src/test/thrift/person.thrift @@ -0,0 +1,7 @@ +namespace java com.twitter.finatra.kafka.test.thrift +#@namespace scala com.twitter.finatra.kafka.test.thriftscala + +struct ThriftPerson { + 1: string name + 2: i16 age +} diff --git a/project/plugins.sbt b/project/plugins.sbt index 562e89ff7e..5c8c1abfae 100644 --- a/project/plugins.sbt +++ b/project/plugins.sbt @@ -3,7 +3,7 @@ resolvers ++= Seq( Resolver.sonatypeRepo("snapshots") ) -val releaseVersion = "18.12.0" +val releaseVersion = "19.1.0" addSbtPlugin("com.twitter" % "scrooge-sbt-plugin" % releaseVersion) diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/Controller.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/Controller.scala index 4a40ae5abd..a13a07b59d 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/Controller.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/Controller.scala @@ -1,29 +1,126 @@ package com.twitter.finatra.thrift -import com.twitter.finagle.Service -import com.twitter.finagle.thrift.ToThriftService +import com.twitter.finagle.{Filter, Service} +import com.twitter.finagle.thrift.{GeneratedThriftService, ToThriftService} import com.twitter.finatra.thrift.internal.ThriftMethodService 
import com.twitter.inject.Logging -import com.twitter.scrooge.ThriftMethod +import com.twitter.scrooge.{Request, Response, ThriftMethod} +import com.twitter.util.Future import scala.collection.mutable.ListBuffer -trait Controller extends Logging { self: ToThriftService => - private[thrift] val methods = new ListBuffer[ThriftMethodService[_, _]] - - protected def handle[Args, Success]( - method: ThriftMethod - )( - f: method.FunctionType - )( - implicit argsEv: =:=[Args, method.Args], - successEv: =:=[Success, method.SuccessType], - serviceFnEv: =:=[method.ServiceIfaceServiceType, Service[Args, Success]] - ): ThriftMethodService[Args, Success] = { - val service: method.ServiceIfaceServiceType = method.toServiceIfaceService(f) - val thriftMethodService = - new ThriftMethodService[Args, Success](method, service) - - methods += thriftMethodService - thriftMethodService +private[thrift] object Controller { + case class ConfiguredMethod( + method: ThriftMethod, + filters: Filter.TypeAgnostic, + impl: ScroogeServiceImpl + ) + + sealed trait Config + + @deprecated("Construct controllers with a GeneratedThriftService", "2018-12-20") + class LegacyConfig extends Config { + val methods = new ListBuffer[ThriftMethodService[_, _]] + } + + class ControllerConfig(val gen: GeneratedThriftService) extends Config { + val methods = new ListBuffer[ConfiguredMethod] + + def isValid: Boolean = { + val expected = gen.methods + methods.size == expected.size && methods.map(_.method).toSet == expected + } } } + +abstract class Controller private (val config: Controller.Config) extends Logging { self => + import Controller._ + + @deprecated("Construct controllers with a GeneratedThriftService", "2018-12-20") + def this() { + this(new Controller.LegacyConfig) + assert(self.isInstanceOf[ToThriftService], "Legacy controllers must extend a service iface") + } + + def this(gen: GeneratedThriftService) { + this(new Controller.ControllerConfig(gen)) + assert(!self.isInstanceOf[ToThriftService], "Controllers should no longer extend ToThriftSerivce") + } + + /** + * The MethodDSL child class is responsible for capturing the state of the applied filter chains + * and implementation. + */ + class MethodDSL[M <: ThriftMethod] (val m: M, chain: Filter.TypeAgnostic) { + + private[this] def nonLegacy[T](f: ControllerConfig => T): T = config match { + case cc: ControllerConfig => f(cc) + case _: LegacyConfig => throw new IllegalStateException("Legacy controllers cannot use method DSLs") + } + + /** + * Add a filter to the implementation + */ + def filtered(f: Filter.TypeAgnostic): MethodDSL[M] = nonLegacy { _ => + new MethodDSL[M](m, chain.andThen(f)) + } + + /** + * Provide an implementation for this method in the form of a [[com.twitter.finagle.Service]] + * + * @param svc the service to use as an implementation + */ + def withService(svc: Service[Request[M#Args], Response[M#SuccessType]]): Unit = nonLegacy { cc => + cc.methods += ConfiguredMethod(m, chain, svc.asInstanceOf[ScroogeServiceImpl]) + } + + /** + * Provide an implementation for this method in the form of a function of + * Request => Future[Response] + * + * @param fn the function to use + */ + def withFn(fn: Request[M#Args] => Future[Response[M#SuccessType]]): Unit = nonLegacy { cc => + withService(Service.mk(fn)) + } + + /** + * Provide an implementation for this method in the form a function of Args => Future[SuccessType] + * This exists for legacy compatibility reasons. Users should instead use Request/Response + * based functionality. 
+ * + * @param f the implementation + * @return a ThriftMethodService, which is used in legacy controller configurations + */ + @deprecated("Use Request/Response based functionality", "2018-12-20") + def apply(f: M#Args => Future[M#SuccessType]): ThriftMethodService[M#Args, M#SuccessType] = { + config match { + case _: ControllerConfig => + withService(Service.mk { req: Request[M#Args] => + f(req.args).map(Response[M#SuccessType]) + }) + + // This exists to match return types with the legacy methods of creating a controller. + // The service created here should never be invoked. + new ThriftMethodService[M#Args, M#SuccessType](m, Service.mk { _ => + throw new RuntimeException("Legacy shim service invoked") + }) + + case lc: LegacyConfig => + val thriftMethodService = new ThriftMethodService[M#Args, M#SuccessType](m, Service.mk(f)) + lc.methods += thriftMethodService + thriftMethodService + + } + } + } + + /** + * Have the controller handle a thrift method with optionally applied filters and an + * implementation. All thrift methods that a ThriftSerivce handles must be registered using + * this method to properly construct a Controller. + * + * @param m The thrift method to handle. + */ + protected def handle[M <: ThriftMethod](m: M) = new MethodDSL[M](m, Filter.TypeAgnostic.Identity) +} + diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/filters/DarkTrafficFilter.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/filters/DarkTrafficFilter.scala index 4396bccee2..82b09849a8 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/filters/DarkTrafficFilter.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/filters/DarkTrafficFilter.scala @@ -63,25 +63,35 @@ class DarkTrafficFilter[ServiceIface: ClassTag]( darkServiceIface: ServiceIface, override protected val enableSampling: Any => Boolean, forwardAfterService: Boolean, - override val statsReceiver: StatsReceiver + override val statsReceiver: StatsReceiver, + lookupByMethod: Boolean = false ) extends BaseDarkTrafficFilter(forwardAfterService, statsReceiver) { private val serviceIfaceClass = implicitly[ClassTag[ServiceIface]].runtimeClass + private def getService(methodName: String) = { + if (lookupByMethod) { + val field = serviceIfaceClass.getDeclaredMethod(methodName) + field.invoke(this.darkServiceIface) + } else { + val field = serviceIfaceClass.getDeclaredField(methodName) + field.setAccessible(true) + field.get(this.darkServiceIface) + } + } + /** * The [[com.twitter.finagle.Filter.TypeAgnostic]] filter chain works on a Service[T, Rep]. * The method name is extracted from the local context. * @param request - the request to send to dark service * @tparam T - the request type - * @tparam U - the response type param of the service call. + * @tparam Rep - the response type param of the service call. * @return a [[com.twitter.util.Future]] over the Rep type. */ protected def invokeDarkService[T, Rep](request: T): Future[Rep] = { MethodMetadata.current match { case Some(mm) => - val field = serviceIfaceClass.getDeclaredField(mm.methodName) - field.setAccessible(true) - val service = field.get(this.darkServiceIface).asInstanceOf[Service[T, Rep]] + val service = getService(mm.methodName).asInstanceOf[Service[T, Rep]] service(request) case None => val t = new IllegalStateException("DarkTrafficFilter invoked without method data") @@ -100,14 +110,10 @@ class DarkTrafficFilter[ServiceIface: ClassTag]( * @param darkService Service to which to send requests. 
Expected to be * `Service[ThriftClientRequest, Array[Byte]]` which is the return * from `ThriftMux.newService`. - * @param enableSampling if function returns true, the request will be forwarded. + * @param enableSamplingFn if function returns true, the request will be forwarded. * @param forwardAfterService forward the request after the initial service has processed the request * @param statsReceiver keeps stats for requests forwarded, skipped and failed. * - * @tparam T this Filter's request type, which is expected to be Array[Byte] at runtime. - * @tparam U this Filter's and the dark service's response type, which is expected to both - * be Array[Byte] at runtime. - * * @see [[com.twitter.finagle.ThriftMux.newService]] * @see [[com.twitter.finagle.exp.AbstractDarkTrafficFilter]] */ diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/internal/ThriftMethodService.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/internal/ThriftMethodService.scala index bc636c284d..83a3108c2c 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/internal/ThriftMethodService.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/internal/ThriftMethodService.scala @@ -4,18 +4,18 @@ import com.twitter.finagle.{Filter, Service} import com.twitter.scrooge.ThriftMethod import com.twitter.util.Future -class ThriftMethodService[Args, Result]( +private[thrift] class ThriftMethodService[Args, Result]( val method: ThriftMethod, val service: Service[Args, Result] ) extends Service[Args, Result] { private[this] var filter: Filter[Args, Result, Args, Result] = Filter.identity - def name: String = method.name + private[finatra] def name: String = method.name override def apply(request: Args): Future[Result] = filter.andThen(service)(request) - def setFilter(f: Filter.TypeAgnostic): Unit = + private[finatra] def setFilter(f: Filter.TypeAgnostic): Unit = filter = filter.andThen(f.toFilter[Args, Result]) } diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/internal/routing/Registrar.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/internal/routing/Registrar.scala index e89f0ae135..956f8dad0f 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/internal/routing/Registrar.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/internal/routing/Registrar.scala @@ -1,5 +1,6 @@ package com.twitter.finatra.thrift.internal.routing +import com.twitter.finagle.Filter import com.twitter.inject.internal.LibraryRegistry import com.twitter.scrooge.ThriftMethod import java.lang.reflect.Method @@ -7,7 +8,7 @@ import java.lang.reflect.Method /** Performs registration of Thrift domain entities in a LibraryRegistry */ private[thrift] class Registrar(registry: LibraryRegistry) { - def register(clazz: Class[_], method: ThriftMethod): Unit = { + def register(clazz: Class[_], method: ThriftMethod, filters: Filter.TypeAgnostic): Unit = { registry.put( Seq(method.name, "service_name"), method.serviceName @@ -28,9 +29,16 @@ private[thrift] class Registrar(registry: LibraryRegistry) { method.annotations.map { case (k, v) => s"$k = $v" }.mkString(",") ) } + + if (filters ne Filter.TypeAgnostic.Identity) { + registry.put( + Seq(method.name, "filters"), + filters.toString + ) + } } - def register(serviceName: String, clazz: Class[_], method: Method): Unit = { + def registerJavaMethod(serviceName: String, clazz: Class[_], method: Method): Unit = { registry.put(Seq(method.getName, "service_name"), serviceName) registry.put(Seq(method.getName, "class"), clazz.getName) if 
(method.getParameterTypes.nonEmpty) { diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/modules/darktrafficmodules.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/modules/darktrafficmodules.scala index 0b54692a7d..d1a5ecbf90 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/modules/darktrafficmodules.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/modules/darktrafficmodules.scala @@ -4,7 +4,7 @@ import com.google.inject.Provides import com.twitter.app.Flag import com.twitter.finagle.{Filter, ThriftMux} import com.twitter.finagle.stats.StatsReceiver -import com.twitter.finagle.thrift.service.Filterable +import com.twitter.finagle.thrift.service.{Filterable, ReqRepServicePerEndpointBuilder} import com.twitter.finagle.thrift.{ClientId, ServiceIfaceBuilder} import com.twitter.finatra.annotations.DarkTrafficFilterType import com.twitter.finatra.thrift.filters.{DarkTrafficFilter, JavaDarkTrafficFilter} @@ -116,7 +116,37 @@ abstract class DarkTrafficFilterModule[ServiceIface <: Filterable[ServiceIface]: client.newServiceIface[ServiceIface](dest, label), enableSampling(injector), forwardAfterService, - stats + stats, + lookupByMethod = false + ) + } +} + +abstract class ReqRepDarkTrafficFilterModule[MethodIface <: Filterable[MethodIface]: ClassTag]( + implicit serviceBuilder: ReqRepServicePerEndpointBuilder[MethodIface] +) extends AbstractDarkTrafficFilterModule { + + /** + * Function to determine if the request should be "sampled", e.g. + * sent to the dark service. + * + * @param injector the [[com.twitter.inject.Injector]] for use in determining if a given request + * should be forwarded or not. + */ + protected def enableSampling(injector: Injector): Any => Boolean + + protected def newFilter( + dest: String, + client: ThriftMux.Client, + injector: Injector, + stats: StatsReceiver + ): DarkTrafficFilter[MethodIface] = { + new DarkTrafficFilter[MethodIface]( + client.servicePerEndpoint[MethodIface](dest, label), + enableSampling(injector), + forwardAfterService, + stats, + lookupByMethod = true ) } } diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/package.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/package.scala new file mode 100644 index 0000000000..5eb1454b78 --- /dev/null +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/package.scala @@ -0,0 +1,8 @@ +package com.twitter.finatra + +import com.twitter.finagle.Service +import com.twitter.scrooge.{Request, Response} + +package object thrift { + private[thrift] type ScroogeServiceImpl = Service[Request[_], Response[_]] +} diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/routing/ThriftWarmup.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/routing/ThriftWarmup.scala index b5ed2add2e..162c454514 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/routing/ThriftWarmup.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/routing/ThriftWarmup.scala @@ -1,10 +1,9 @@ package com.twitter.finatra.thrift.routing -import com.twitter.finatra.thrift.internal.ThriftMethodService import com.twitter.inject.Logging import com.twitter.inject.thrift.utils.ThriftMethodUtils._ -import com.twitter.scrooge.ThriftMethod -import com.twitter.util.{Await, Future, Try} +import com.twitter.scrooge.{Request, Response, ThriftMethod} +import com.twitter.util.{Await, Try} import javax.inject.Inject private object ThriftWarmup { @@ -33,38 +32,48 @@ class ThriftWarmup @Inject()( * @param method - [[com.twitter.scrooge.ThriftMethod]] to send request through * @param 
args - [[com.twitter.scrooge.ThriftMethod]].Args to send * @param times - number of times to send the request - * @param responseCallback - callback called for every response where assertions can be made. NOTE: be aware that - * failed assertions that throws Exceptions could prevent a server from restarting. This is - * generally when dependent services are unresponsive causing the warm-up request(s) to fail. - * As such, you should wrap your warm-up calls in these situations in a try/catch {}. + * @param responseCallback - callback called for every response where assertions can be made. * @tparam M - type of the [[com.twitter.scrooge.ThriftMethod]] + * @note be aware that in the response callback, failed assertions that throw Exceptions could + * prevent a server from restarting. This is generally when dependent services are + * unresponsive causing the warm-up request(s) to fail. As such, you should wrap your + * warm-up calls in these situations in a try/catch {}. */ + @deprecated("Use Request/Response based functionality", "2018-12-20") def send[M <: ThriftMethod](method: M, args: M#Args, times: Int = 1)( responseCallback: Try[M#SuccessType] => Unit = unitFunction ): Unit = { + if (!router.isConfigured) throw new IllegalStateException("Thrift warmup requires a properly configured router") + sendRequest(method, Request[M#Args](args), times) { response => + responseCallback(response.map(_.value)) + } + } + /** + * Send a request to warmup services that are not yet externally receiving traffic. + * + * @param method - [[com.twitter.scrooge.ThriftMethod]] to send request through + * @param req - [[com.twitter.scrooge.Request]] to send + * @param times - number of times to send the request + * @param responseCallback - callback called for every response where assertions can be made. + * @tparam M - type of the [[com.twitter.scrooge.ThriftMethod]] + * @note be aware that in the response callback, failed assertions that throw Exceptions could + * prevent a server from restarting. This is generally when dependent services are + * unresponsive causing the warm-up request(s) to fail. As such, you should wrap your + * warm-up calls in these situations in a try/catch {}. 
+ */ + def sendRequest[M <: ThriftMethod](method: M, req: Request[M#Args], times: Int = 1)( + responseCallback: Try[Response[M#SuccessType]] => Unit = unitFunction + ): Unit = { + if (!router.isConfigured) throw new IllegalStateException("Thrift warmup requires a properly configured router") + val service = router.routeWarmup(method) for (_ <- 1 to times) { time(s"Warmup ${prettyStr(method)} completed in %sms.") { - val response = executeRequest(method, args) - responseCallback(response) + responseCallback(Await.result(service(req).liftToTry)) } } } @deprecated("This is now a no-op.", "2018-03-20") def close(): Unit = {} - - /* Private */ - - private def executeRequest[M <: ThriftMethod](method: M, args: M#Args): Try[M#SuccessType] = { - Try(Await.result(routeRequest(method, args))) - } - - private def routeRequest[M <: ThriftMethod](method: M, args: M#Args): Future[M#SuccessType] = { - val service = - router - .thriftMethodService(method) - .asInstanceOf[ThriftMethodService[M#Args, M#SuccessType]] - service(args) - } } diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/routing/routers.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/routing/routers.scala index 22cc622b11..d56d7c9780 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/routing/routers.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/routing/routers.scala @@ -5,24 +5,26 @@ import com.twitter.finagle.thrift.{RichServerParam, ThriftService, ToThriftServi import com.twitter.finagle.{Filter, Service, Thrift, ThriftMux} import com.twitter.finatra.thrift._ import com.twitter.finatra.thrift.exceptions.{ExceptionManager, ExceptionMapper} -import com.twitter.finatra.thrift.internal.ThriftMethodService import com.twitter.finatra.thrift.internal.routing.{NullThriftService, Registrar} import com.twitter.inject.TypeUtils._ import com.twitter.inject.internal.LibraryRegistry import com.twitter.inject.{Injector, Logging} -import com.twitter.scrooge.ThriftMethod +import com.twitter.scrooge.{Request, Response, ThriftMethod} import java.lang.reflect.{Method => JMethod} import java.lang.annotation.{Annotation => JavaAnnotation} import javax.inject.{Inject, Singleton} import org.apache.thrift.protocol.TProtocolFactory -import scala.collection.mutable.{Map => MutableMap} private[routing] abstract class BaseThriftRouter[Router <: BaseThriftRouter[Router]]( injector: Injector, - exceptionManager: ExceptionManager -) extends Logging { this: Router => + exceptionManager: ExceptionManager) + extends Logging { this: Router => - private[this] var done: Boolean = false + def isConfigured: Boolean = configurationComplete + + // There is no guarantee that this is always accessed from the same thread + @volatile + private[this] var configurationComplete: Boolean = false /** * Add exception mapper used for the corresponding exceptions. 
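As an illustration of the new `ThriftWarmup#sendRequest` entry point above, here is a minimal warm-up handler sketch. `ExampleService`, its `Echo` method (a single `msg: String` argument), and the `c.t.inject.utils.Handler` wiring are assumptions for illustration only, not part of this change.

.. code:: scala

    import com.twitter.finatra.thrift.routing.ThriftWarmup
    import com.twitter.inject.Logging
    import com.twitter.inject.utils.Handler
    import com.twitter.scrooge.Request
    import javax.inject.Inject
    import scala.util.control.NonFatal

    // Hypothetical warm-up handler for a scrooge-generated ExampleService with an Echo method.
    class ExampleWarmupHandler @Inject()(warmup: ThriftWarmup) extends Handler with Logging {
      override def handle(): Unit = {
        try {
          warmup.sendRequest(ExampleService.Echo, Request(ExampleService.Echo.Args("ping")), times = 3) {
            result => assert(result.isReturn, s"Warm-up call failed: $result")
          }
        } catch {
          // As the scaladoc warns, don't let a failed warm-up assertion keep the server from starting.
          case NonFatal(e) => warn("Thrift warm-up failed", e)
        }
      }
    }

The callback receives a `c.t.util.Try[c.t.scrooge.Response[SuccessType]]`, so both the returned value (via `response.map(_.value)`) and any response headers are available for assertions.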
@@ -49,14 +51,15 @@ private[routing] abstract class BaseThriftRouter[Router <: BaseThriftRouter[Rout * * @see the [[https://twitter.github.io/finatra/user-guide/thrift/exceptions.html user guide]] */ - def exceptionMapper[T <: Throwable](clazz: Class[_ <: ExceptionMapper[T, _]]): Router = { - val mapperType = superTypeFromClass(clazz, classOf[ExceptionMapper[_, _]]) - val throwableType = singleTypeParam(mapperType) - exceptionMapper(injector.instance(clazz))( - Manifest.classType(Class.forName(throwableType.getTypeName)) - ) - this - } + def exceptionMapper[T <: Throwable](clazz: Class[_ <: ExceptionMapper[T, _]]): Router = + preConfig("Exception mappers must be added before a controller is added") { + val mapperType = superTypeFromClass(clazz, classOf[ExceptionMapper[_, _]]) + val throwableType = singleTypeParam(mapperType) + exceptionMapper(injector.instance(clazz))( + Manifest.classType(Class.forName(throwableType.getTypeName)) + ) + this + } /* Protected */ @@ -69,14 +72,33 @@ private[routing] abstract class BaseThriftRouter[Router <: BaseThriftRouter[Rout .withSection("thrift", "methods") ) - protected def assertController(f: => Unit): Unit = { - assert( - !done, + /** + * Ensure that `f` is only run prior to configuring a controller and setting up a thrift service. + */ + protected def preConfig[T](what: String)(f: => T): T = { + assert(!configurationComplete, what) + f + } + + /** + * Ensure that `f` is only run after a controller has been configured + */ + protected def postConfig[T](what: String)(f: => T): T = { + assert(configurationComplete, what) + f + } + + /** + * Ensures that configuring a controller happens only once and provides a consistent message + */ + protected def assertController[T](f: => T): T = { + val message = s"${this.getClass.getSimpleName}#add cannot be called multiple times, as we don't " + s"currently support serving multiple thrift services via the same router." - ) - f - done = true + + val result = preConfig(message)(f) + configurationComplete = true + result } protected[this] def registerGlobalFilter(thriftFilter: Filter.TypeAgnostic): Unit = { @@ -104,15 +126,28 @@ class ThriftRouter @Inject()(injector: Injector, exceptionManager: ExceptionMana extends BaseThriftRouter[ThriftRouter](injector, exceptionManager) { private[this] var underlying: ThriftService = NullThriftService + + // This map of routes is generated based on the controller and set once. + private[this] var routes: Map[ThriftMethod, ScroogeServiceImpl] = _ + protected[this] var filters: Filter.TypeAgnostic = Filter.TypeAgnostic.Identity - private[finatra] val methods = MutableMap[ThriftMethod, ThriftMethodService[_, _]]() + private[finatra] def routeWarmup[M <: ThriftMethod]( + m: M + ): Service[Request[M#Args], Response[M#SuccessType]] = + postConfig("Router has not been configured with a controller") { + routes.get(m) match { + case Some(s) => s.asInstanceOf[Service[Request[M#Args], Response[M#SuccessType]]] + case None => throw new IllegalArgumentException(s"No route for method $m") + } + } /* Public */ - def thriftService: ThriftService = this.underlying - - def thriftMethodService(method: ThriftMethod): ThriftMethodService[_, _] = this.methods(method) + def thriftService: ThriftService = + postConfig("Router has not been configured with a controller") { + this.underlying + } /** * Add global filter used for all requests. 
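The hunks below enforce that global filters and exception mappers are configured before the single `add` call, and rework controller registration around `Controller.ControllerConfig`. As a rough sketch of the intended wiring, assuming a hypothetical scrooge-generated `ExampleService` with `Echo` and `Reverse` methods (each taking a single `msg: String`) and the existing framework filters in `c.t.finatra.thrift.filters`:

.. code:: scala

    import com.twitter.finagle.{Filter, Service}
    import com.twitter.finatra.thrift.Controller
    import com.twitter.scrooge.{Request, Response}
    import com.twitter.util.Future

    // Every generated method must be registered exactly once, otherwise
    // ThriftRouter#add reports the controller as misconfigured.
    class ExampleController extends Controller(ExampleService) {

      // Function form, with a per-method filter (Identity used here as a stand-in).
      handle(ExampleService.Echo)
        .filtered(Filter.TypeAgnostic.Identity)
        .withFn { req: Request[ExampleService.Echo.Args] =>
          Future.value(Response(req.args.msg))
        }

      // Service form, for implementations that already exist as a Finagle Service.
      handle(ExampleService.Reverse).withService(
        Service.mk { req: Request[ExampleService.Reverse.Args] =>
          Future.value(Response(req.args.msg.reverse))
        }
      )
    }

and a matching server definition, where the filters are applied before `add` completes configuration (calling `filter` afterwards now fails the `preConfig` assertion):

.. code:: scala

    import com.twitter.finatra.thrift.ThriftServer
    import com.twitter.finatra.thrift.filters.{AccessLoggingFilter, ExceptionMappingFilter, StatsFilter}
    import com.twitter.finatra.thrift.routing.ThriftRouter

    class ExampleThriftServer extends ThriftServer {
      override def configureThrift(router: ThriftRouter): Unit = {
        router
          .filter[StatsFilter]
          .filter[AccessLoggingFilter]
          .filter[ExceptionMappingFilter]
          .add[ExampleController] // `add` may only be called once and must come last
      }
    }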
@@ -134,8 +169,10 @@ class ThriftRouter @Inject()(injector: Injector, exceptionManager: ExceptionMana * * @see The [[https://twitter.github.io/finatra/user-guide/thrift/filters.html user guide]] */ - def filter[FilterType <: Filter.TypeAgnostic: Manifest, Ann <: JavaAnnotation: Manifest] - : ThriftRouter = { + def filter[ + FilterType <: Filter.TypeAgnostic: Manifest, + Ann <: JavaAnnotation: Manifest + ]: ThriftRouter = { filter(injector.instance[FilterType, Ann]) } @@ -159,11 +196,11 @@ class ThriftRouter @Inject()(injector: Injector, exceptionManager: ExceptionMana * * @see The [[https://twitter.github.io/finatra/user-guide/thrift/filters.html user guide]] */ - def filter(filter: Filter.TypeAgnostic): ThriftRouter = { - assert(underlying == NullThriftService, "'filter' must be called before 'add'.") - filters = filters.andThen(filter) - this - } + def filter(filter: Filter.TypeAgnostic): ThriftRouter = + preConfig("'filter' must be called before 'add'.") { + filters = filters.andThen(filter) + this + } /** * Instantiate and add thrift controller used for all requests. @@ -172,7 +209,7 @@ class ThriftRouter @Inject()(injector: Injector, exceptionManager: ExceptionMana * * @see the [[https://twitter.github.io/finatra/user-guide/thrift/controllers.html user guide]] */ - def add[C <: Controller with ToThriftService: Manifest]: ThriftRouter = { + def add[C <: Controller: Manifest]: Unit = { val controller = injector.instance[C] add(controller) } @@ -183,37 +220,88 @@ class ThriftRouter @Inject()(injector: Injector, exceptionManager: ExceptionMana * * @see the [[https://twitter.github.io/finatra/user-guide/thrift/controllers.html user guide]] */ - def add(controller: Controller with ToThriftService): ThriftRouter = { + def add(controller: Controller): Unit = { assertController { - if (controller.methods.isEmpty) { - error( - s"${controller.getClass.getName} contains no visible methods. For more details see: ${ThriftRouter.url}" - ) - } else { - for (m <- controller.methods) { - m.setFilter(filters) - methods += (m.method -> m) - } - info( - "Adding methods\n" + controller.methods - .map(method => s"${controller.getClass.getSimpleName}.${method.name}") - .mkString("\n") - ) + val reg = injector + .instance[LibraryRegistry] + .withSection("thrift", "methods") + + registerGlobalFilter(reg, filters) + + underlying = controller.config match { + case c: Controller.ControllerConfig => addController(controller, c) + case c: Controller.LegacyConfig => addLegacyController(controller, c) } - registerMethods(controller.getClass, controller.methods.map(_.method)) - registerGlobalFilter(filters) - underlying = controller.toThriftService } - this } - private[this] def registerMethods( - clazz: Class[_], - methods: Seq[ThriftMethod] - ): Unit = - methods.foreach(thriftMethodRegistrar.register(clazz, _)) + private[this] def addController( + controller: Controller, + conf: Controller.ControllerConfig + ): ThriftService = { + if (!conf.isValid) { + val expectStr = conf.methods.map(_.method.name).mkString("{,", ", ", "}") + val message = + s"${controller.getClass.getSimpleName} for service " + + s"${conf.gen.getClass.getSimpleName} is misconfigured. 
" + + s"Expected exactly one implementation for each of $expectStr but found:\n" + + conf.methods.map(m => s" - ${m.method.name}").mkString("\n") + error(message) + } + + routes = conf.methods.map { cm => + val method: ThriftMethod = cm.method + val service = cm.impl.asInstanceOf[Service[Request[method.Args], Response[method.SuccessType]]] + thriftMethodRegistrar.register(controller.getClass, method, cm.filters) + + method -> filters.andThen(cm.filters).andThen(service).asInstanceOf[ScroogeServiceImpl] + }.toMap + + info( + "Adding methods\n" + routes.keys + .map(method => s"${controller.getClass.getSimpleName}.${method.name}") + .mkString("\n") + ) + + conf.gen.unsafeBuildFromMethods(routes).toThriftService + } + + private[this] def addLegacyController( + controller: Controller, + conf: Controller.LegacyConfig + ): ThriftService = { + if (conf.methods.isEmpty) { + error( + s"${controller.getClass.getName} contains no visible methods. " + + s"For more details see: ${ThriftRouter.url}" + ) + } else { + routes = conf.methods.map { methodService => + val method = methodService.method + thriftMethodRegistrar.register(controller.getClass, method, Filter.TypeAgnostic.Identity) + methodService.setFilter(filters) + + // Convert to a ScroogeServiceImpl for issuing warmup requests + val castedService = methodService.asInstanceOf[Service[method.Args, method.SuccessType]] + val reqRepService = Service.mk[Request[method.Args], Response[method.SuccessType]] { req => + castedService(req.args).map(Response[method.SuccessType]) + } + method -> reqRepService.asInstanceOf[ScroogeServiceImpl] + }.toMap - private[this] def registerGlobalFilter(thriftFilter: Filter.TypeAgnostic): Unit = { + info( + "Adding methods\n" + conf.methods + .map(method => s"${controller.getClass.getSimpleName}.${method.name}") + .mkString("\n") + ) + } + controller.asInstanceOf[ToThriftService].toThriftService + } + + private[this] def registerGlobalFilter( + registry: LibraryRegistry, + thriftFilter: Filter.TypeAgnostic + ): Unit = { if (thriftFilter ne Filter.TypeAgnostic.Identity) { libraryRegistry .withSection("thrift") @@ -243,7 +331,10 @@ class JavaThriftRouter @Inject()(injector: Injector, exceptionManager: Exception /* Public */ - def service: Service[Array[Byte], Array[Byte]] = this.underlying + def service: Service[Array[Byte], Array[Byte]] = + postConfig("Router has not been configured with a controller") { + this.underlying + } /** * Add global filter used for all requests. 
@@ -265,8 +356,10 @@ class JavaThriftRouter @Inject()(injector: Injector, exceptionManager: Exception * * @see The [[https://twitter.github.io/finatra/user-guide/thrift/filters.html user guide]] */ - def filter[FilterType <: Filter.TypeAgnostic: Manifest, Ann <: JavaAnnotation: Manifest] - : JavaThriftRouter = { + def filter[ + FilterType <: Filter.TypeAgnostic: Manifest, + Ann <: JavaAnnotation: Manifest + ]: JavaThriftRouter = { this.filter(injector.instance[FilterType, Ann]) } @@ -290,11 +383,11 @@ class JavaThriftRouter @Inject()(injector: Injector, exceptionManager: Exception * * @see The [[https://twitter.github.io/finatra/user-guide/thrift/filters.html user guide]] */ - def filter(filter: Filter.TypeAgnostic): JavaThriftRouter = { - assert(underlying == NilService, "'filter' must be called before 'add'.") - filters = filters.andThen(filter) - this - } + def filter(filter: Filter.TypeAgnostic): JavaThriftRouter = + preConfig("'filter' must be called before add") { + filters = filters.andThen(filter) + this + } /** * Add controller used for all requests for usage from Java. The [[ThriftRouter]] only supports @@ -368,5 +461,5 @@ class JavaThriftRouter @Inject()(injector: Injector, exceptionManager: Exception clazz: Class[_], methods: Seq[JMethod] ): Unit = - methods.foreach(thriftMethodRegistrar.register(serviceName, clazz, _)) + methods.foreach(thriftMethodRegistrar.registerJavaMethod(serviceName, clazz, _)) } diff --git a/thrift/src/main/scala/com/twitter/finatra/thrift/servers.scala b/thrift/src/main/scala/com/twitter/finatra/thrift/servers.scala index 3e4f497ce9..7e052cbad9 100644 --- a/thrift/src/main/scala/com/twitter/finatra/thrift/servers.scala +++ b/thrift/src/main/scala/com/twitter/finatra/thrift/servers.scala @@ -2,7 +2,7 @@ package com.twitter.finatra.thrift import com.google.inject.Module import com.twitter.app.Flag -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finagle.service.NilService import com.twitter.finagle.stats.StatsReceiver import com.twitter.finagle.{ListeningServer, NullServer, Service, ThriftMux} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/EmbeddedThriftServer.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/EmbeddedThriftServer.scala index c14a6a6787..ea5f700d47 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/EmbeddedThriftServer.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/EmbeddedThriftServer.scala @@ -8,7 +8,7 @@ import scala.collection.JavaConverters._ /** * EmbeddedThriftServer allows a [[com.twitter.server.TwitterServer]] serving thrift endpoints to be started - * locally (on ephemeral ports), and tested through it's thrift interface. + * locally (on ephemeral ports), and tested through its thrift interface. * * @param twitterServer The twitter server to be started locally for integration testing. * @param flags Command line Flags (e.g. "foo"->"bar" will be translated into -foo=bar). See: [[com.twitter.app.Flag]]. 
diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/ControllerTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/ControllerTest.scala new file mode 100644 index 0000000000..7dee184429 --- /dev/null +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/ControllerTest.scala @@ -0,0 +1,110 @@ +package com.twitter.finatra.thrift.tests + +import com.twitter.doeverything.thriftscala.{Answer, DoEverything} +import com.twitter.doeverything.thriftscala.DoEverything.{Ask, Echo, Echo2, MagicNum, MoreThanTwentyTwoArgs, Uppercase} +import com.twitter.finagle.{Filter, Service} +import com.twitter.finatra.thrift.Controller +import com.twitter.inject.Test +import com.twitter.scrooge.{Request, Response} +import com.twitter.util.Future + +class ControllerTest extends Test { + private val futureExn = Future.exception(new AssertionError("oh no")) + + test("Controller configuration is valid when all methods are specified") { + val ctrl = new Controller(DoEverything) { + handle(Ask) { args => futureExn } + handle(MoreThanTwentyTwoArgs) { args: MoreThanTwentyTwoArgs.Args => futureExn } + handle(MagicNum) { args: MagicNum.Args => futureExn } + handle(Echo2) { args: Echo2.Args => futureExn } + handle(Echo) { args: Echo.Args => futureExn } + handle(Uppercase) { args: Uppercase.Args => futureExn } + } + ctrl.config match { + case cc: Controller.ControllerConfig => assert(cc.isValid) + case _ => fail("Configuration profile was incorrect") + } + } + + test("Controller configuration is invalid unless all methods are specified") { + val ctrl = new Controller(DoEverything) { + handle(Ask) { args => futureExn } + handle(MoreThanTwentyTwoArgs) { args: MoreThanTwentyTwoArgs.Args => futureExn } + handle(MagicNum) { args: MagicNum.Args => futureExn } + // Missing Impl: handle(Echo2) { args: Echo2.Args => futureExn } + handle(Echo) { args: Echo.Args => futureExn } + handle(Uppercase) { args: Uppercase.Args => futureExn } + } + ctrl.config match { + case cc: Controller.ControllerConfig => assert(!cc.isValid) + case _ => fail("Configuration profile was incorrect") + } + } + + test("Controller configuration is invalid if more than one impl is given for a method") { + val ctrl = new Controller(DoEverything) { + handle(Ask) { args => futureExn } + handle(MoreThanTwentyTwoArgs) { args: MoreThanTwentyTwoArgs.Args => futureExn } + handle(MagicNum) { args: MagicNum.Args => futureExn } + handle(Echo2) { args: Echo2.Args => futureExn } + handle(Echo) { args: Echo.Args => futureExn } + handle(Uppercase) { args: Uppercase.Args => futureExn } + handle(Uppercase) { args: Uppercase.Args => futureExn } + } + ctrl.config match { + case cc: Controller.ControllerConfig => assert(!cc.isValid) + case _ => fail("Configuration profile was incorrect") + } + } + + test("When constructed in legacy mode, controller configuration is legacy config") { + val ctrl = new Controller with DoEverything.BaseServiceIface { + val uppercase: Service[Uppercase.Args, String] = handle(Uppercase) { args => futureExn } + val echo: Service[Echo.Args, String] = handle(Echo) { args => futureExn } + val echo2: Service[Echo2.Args, String] = handle(Echo2) { args => futureExn } + val magicNum: Service[MagicNum.Args, String] = handle(MagicNum) { args => futureExn } + val moreThanTwentyTwoArgs: Service[MoreThanTwentyTwoArgs.Args, String] = handle(MoreThanTwentyTwoArgs) { args => futureExn } + val ask: Service[Ask.Args, Answer] = handle(Ask) { args => futureExn } + } + + ctrl.config match { + case lc: Controller.LegacyConfig => + 
assert(lc.methods.map(_.method).toSet.sameElements(DoEverything.methods)) + case _ => fail(s"Bad configuration ${ctrl.config}") + } + } + + test("Legacy controllers cannot use any MethodDSL functionality") { + class TestController extends Controller with DoEverything.BaseServiceIface { + val uppercase: Service[Uppercase.Args, String] = handle(Uppercase) { args => futureExn } + val echo: Service[Echo.Args, String] = handle(Echo) { args => futureExn } + val echo2: Service[Echo2.Args, String] = handle(Echo2) { args => futureExn } + val magicNum: Service[MagicNum.Args, String] = handle(MagicNum) { args => futureExn } + val moreThanTwentyTwoArgs: Service[MoreThanTwentyTwoArgs.Args, String] = handle(MoreThanTwentyTwoArgs) { args => futureExn } + val ask: Service[Ask.Args, Answer] = handle(Ask) { args => futureExn } + + def check(): Boolean = { + val dsl = handle(Echo) + val fn = { req: Request[Echo.Args] => + Future.value(Response(req.args.msg)) + } + + intercept[IllegalStateException] { + dsl.filtered(Filter.TypeAgnostic.Identity) { args: Echo.Args => Future.value(args.msg) } + } + + intercept[IllegalStateException] { + dsl.withFn(fn) + } + + intercept[IllegalStateException] { + dsl.withService(Service.mk(fn)) + } + true + } + } + + assert(new TestController().check()) + } +} + diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/DoEverythingThriftServerFeatureTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/DoEverythingThriftServerFeatureTest.scala index 9d4490d6f1..56981343f8 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/DoEverythingThriftServerFeatureTest.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/DoEverythingThriftServerFeatureTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.doeverything.thriftscala.{Answer, DoEverything, Question} import com.twitter.finagle.http.Status import com.twitter.finagle.tracing.Trace @@ -167,13 +167,13 @@ class DoEverythingThriftServerFeatureTest extends FeatureTest { } } - // should be caught by BarExceptionMapper + // should be caught by ReqRepBarExceptionMapper test("BarException mapping") { - await(client123.echo2("barException")) should equal("BarException caught") + await(client123.echo2("barException")) should equal("ReqRep BarException caught") } - // should be caught by FooExceptionMapper + // should be caught by ReqRepFooExceptionMapper test("FooException mapping") { - await(client123.echo2("fooException")) should equal("FooException caught") + await(client123.echo2("fooException")) should equal("ReqRep FooException caught") } test("ThriftException#UnhandledSourcedException mapping") { @@ -234,7 +234,7 @@ class DoEverythingThriftServerFeatureTest extends FeatureTest { test("ask fail") { val question = Question("fail") - await(client123.ask(question)) should equal(Answer("DoEverythingException caught")) + await(client123.ask(question)) should equal(Answer("ReqRep DoEverythingException caught")) } test("MDC filtering") { @@ -303,19 +303,27 @@ class DoEverythingThriftServerFeatureTest extends FeatureTest { test("Per-method stats scope") { val question = Question("fail") - await(client123.ask(question)) should equal(Answer("DoEverythingException caught")) + await(client123.ask(question)) should equal(Answer("ReqRep DoEverythingException caught")) server.assertCounter("per_method_stats/ask/success", 1L) server.assertCounter("per_method_stats/ask/failures", 0L) 
} test("Per-endpoint stats scope") { val question = Question("fail") - await(client123.ask(question)) should equal(Answer("DoEverythingException caught")) + await(client123.ask(question)) should equal(Answer("ReqRep DoEverythingException caught")) server.assertCounter("srv/thrift/ask/requests", 1L) server.assertCounter("srv/thrift/ask/success", 1L) server.assertCounter("srv/thrift/ask/failures", 0L) } + test("per-endpoint filtering") { + // The echo counter is applied to the two echo endpoints + await(client123.echo("a")) + await(client123.echo2("a")) + await(client123.uppercase("a")) + server.assertCounter("echo_calls", 2L) + } + private def await[T](f: Future[T]): T = { Await.result(f, 2.seconds) } diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/EmbeddedThriftServerControllerFeatureTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/EmbeddedThriftServerControllerFeatureTest.scala index 3c0b6af6c2..57dc587eed 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/EmbeddedThriftServerControllerFeatureTest.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/EmbeddedThriftServerControllerFeatureTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.converter.thriftscala.Converter import com.twitter.converter.thriftscala.Converter.Uppercase import com.twitter.finagle.{Filter, Service} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/InheritanceServerFeatureTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/InheritanceServerFeatureTest.scala index 2aa6d1e110..6e89fbc140 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/InheritanceServerFeatureTest.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/InheritanceServerFeatureTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.finatra.thrift.EmbeddedThriftServer import com.twitter.finatra.thrift.tests.inheritance.InheritanceServer import com.twitter.inject.server.FeatureTest diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/LegacyDoEverythingThriftServerFeatureTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/LegacyDoEverythingThriftServerFeatureTest.scala new file mode 100644 index 0000000000..02b2b3cc15 --- /dev/null +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/LegacyDoEverythingThriftServerFeatureTest.scala @@ -0,0 +1,327 @@ +package com.twitter.finatra.thrift.tests + +import com.twitter.conversions.time._ +import com.twitter.doeverything.thriftscala.{Answer, DoEverything, Question} +import com.twitter.finagle.http.Status +import com.twitter.finagle.tracing.Trace +import com.twitter.finagle.{Filter, Service} +import com.twitter.finatra.thrift.EmbeddedThriftServer +import com.twitter.finatra.thrift.tests.doeverything.LegacyDoEverythingThriftServer +import com.twitter.finatra.thrift.tests.doeverything.controllers.LegacyDoEverythingThriftController +import com.twitter.finatra.thrift.thriftscala.{ClientError, NoClientIdError, ServerError, UnknownClientIdError} +import com.twitter.inject.server.FeatureTest +import com.twitter.io.Buf +import com.twitter.scrooge +import com.twitter.util.{Await, Future} +import org.apache.thrift.TApplicationException +import scala.util.parsing.json.JSON + +@deprecated("These tests exist to ensure legacy 
functionaly still operates. Do not use them for guidance", "2018-12-20") +class LegacyDoEverythingThriftServerFeatureTest extends FeatureTest { + override val server = new EmbeddedThriftServer( + twitterServer = new LegacyDoEverythingThriftServer, + disableTestLogging = true, + flags = Map("magicNum" -> "57") + ) + + /* Higher-kinded interface type */ + val client123: DoEverything[Future] = server.thriftClient[DoEverything[Future]](clientId = "client123") + /* Method-Per-Endpoint type: https://twitter.github.io/scrooge/Finagle.html#id1 */ + val methodPerEndpointClient123: DoEverything.MethodPerEndpoint = + server.thriftClient[DoEverything.MethodPerEndpoint](clientId = "client123") + /* Service-Per-Endpoint type: https://twitter.github.io/scrooge/Finagle.html#id2 */ + val servicePerEndpoint123: DoEverything.ServicePerEndpoint = + server.servicePerEndpoint[DoEverything.ServicePerEndpoint](clientId = "client123") + /* Higher-kinded interface type wrapping a Service-per-endpoint: https://twitter.github.io/scrooge/Finagle.html#id1 */ + val anotherMethodPerEndpointClient123: DoEverything[Future] = + server.thriftClient[DoEverything.ServicePerEndpoint, DoEverything[Future]]( + servicePerEndpoint123 + ) + /* Another Method-Per-Endpoint type wrapping a Service-per-endpoint: https://twitter.github.io/scrooge/Finagle.html#id1 */ + val yetAnotherMethodPerEndpointClient123: DoEverything.MethodPerEndpoint = + server.methodPerEndpoint[DoEverything.ServicePerEndpoint, DoEverything.MethodPerEndpoint]( + servicePerEndpoint123 + ) + /* Req/Rep Service-Per-Endpoint type: https://twitter.github.io/scrooge/Finagle.html#id3 */ + val reqRepServicePerEndpoint123: DoEverything.ReqRepServicePerEndpoint = + server.servicePerEndpoint[DoEverything.ReqRepServicePerEndpoint](clientId = "client123") + + override protected def afterAll(): Unit = { + Await.all( + Seq( + client123.asClosable.close(), + methodPerEndpointClient123.asClosable.close(), + servicePerEndpoint123.asClosable.close(), + anotherMethodPerEndpointClient123.asClosable.close(), + yetAnotherMethodPerEndpointClient123.asClosable.close(), + reqRepServicePerEndpoint123.asClosable.close() + ), + 2.seconds + ) + super.afterAll() + } + + test("success") { + await(client123.uppercase("Hi")) should equal("HI") + await(methodPerEndpointClient123.uppercase("Hi")) should equal("HI") + await(anotherMethodPerEndpointClient123.uppercase("Hi")) should equal("HI") + await(yetAnotherMethodPerEndpointClient123.uppercase("Hi")) should equal("HI") + + val filter = new Filter[ + DoEverything.Uppercase.Args, + DoEverything.Uppercase.SuccessType, + DoEverything.Uppercase.Args, + DoEverything.Uppercase.SuccessType + ] { + override def apply( + request: DoEverything.Uppercase.Args, + service: Service[DoEverything.Uppercase.Args, String] + ): Future[String] = { + if (request.msg == "hello") { + service(DoEverything.Uppercase.Args("goodbye")) + } else service(request) + } + } + val service = filter.andThen(servicePerEndpoint123.uppercase) + await(service(DoEverything.Uppercase.Args("hello"))) should equal("GOODBYE") + + val filter2 = new Filter[scrooge.Request[DoEverything.Uppercase.Args], scrooge.Response[ + DoEverything.Uppercase.SuccessType + ], scrooge.Request[DoEverything.Uppercase.Args], scrooge.Response[ + DoEverything.Uppercase.SuccessType + ]] { + override def apply( + request: scrooge.Request[DoEverything.Uppercase.Args], + service: Service[scrooge.Request[DoEverything.Uppercase.Args], scrooge.Response[ + DoEverything.Uppercase.SuccessType + ]] + ): 
Future[scrooge.Response[DoEverything.Uppercase.SuccessType]] = { + val filteredRequest: scrooge.Request[DoEverything.Uppercase.Args] = + scrooge.Request(Map("com.twitter.test.header" -> Seq(Buf.Utf8("foo"))), request.args) + service(filteredRequest) + } + } + val service2 = filter2.andThen(reqRepServicePerEndpoint123.uppercase) + await(service2(scrooge.Request(DoEverything.Uppercase.Args("hello")))).value should equal( + "HELLO" + ) + } + + test("failure") { + val e = assertFailedFuture[Exception] { + client123.uppercase("fail") + } + e.getMessage should include("oops") + } + + test("magicNum") { + await(client123.magicNum()) should equal("57") + } + + test("blacklist") { + val notWhitelistClient = + server.thriftClient[DoEverything[Future]](clientId = "not_on_whitelist") + assertFailedFuture[UnknownClientIdError] { + notWhitelistClient.echo("Hi") + } + } + + test("no client id") { + val noClientIdClient = server.thriftClient[DoEverything[Future]]() + assertFailedFuture[NoClientIdError] { + noClientIdClient.echo("Hi") + } + } + + // echo method doesn't define throws ClientError Exception + // we should receive TApplicationException + test("ClientError throw back") { + assertFailedFuture[TApplicationException] { + client123.echo("clientError") + } + } + + // should be caught by FinatraThriftExceptionMapper + test("ThriftException#ClientError mapping") { + val e = assertFailedFuture[ClientError] { + client123.echo2("clientError") + } + e.getMessage should include("client error") + } + + test("ThriftException#UnknownClientIdError mapping") { + val e = assertFailedFuture[UnknownClientIdError] { + client123.echo2("unknownClientIdError") + } + e.getMessage should include("unknown client id error") + } + + test("ThriftException#RequestException mapping") { + assertFailedFuture[ServerError] { + client123.echo2("requestException") + } + } + + test("ThriftException#TimeoutException mapping") { + assertFailedFuture[ClientError] { + client123.echo2("timeoutException") + } + } + + // should be caught by ReqRepBarExceptionMapper + test("BarException mapping") { + await(client123.echo2("barException")) should equal("BarException caught") + } + // should be caught by ReqRepFooExceptionMapper + test("FooException mapping") { + await(client123.echo2("fooException")) should equal("FooException caught") + } + + test("ThriftException#UnhandledSourcedException mapping") { + assertFailedFuture[ServerError] { + client123.echo2("unhandledSourcedException") + } + } + + test("ThriftException#UnhandledException mapping") { + assertFailedFuture[ServerError] { + client123.echo2("unhandledException") + } + } + + // should be caught by framework root exception mapper - ThrowableExceptionMapper + test("ThriftException#UnhandledThrowable mapping") { + assertFailedFuture[TApplicationException] { + client123.echo2("unhandledThrowable") + } + } + + test("more than 22 args") { + await( + client123.moreThanTwentyTwoArgs( + "one", + "two", + "three", + "four", + "five", + "six", + "seven", + "eight", + "nine", + "ten", + "eleven", + "twelve", + "thirteen", + "fourteen", + "fifteen", + "sixteen", + "seventeen", + "eighteen", + "nineteen", + "twenty", + "twentyone", + "twentytwo", + "twentythree" + ) + ) should equal("handled") + } + + test("ask") { + val question = Question("What is the meaning of life?") + await(client123.ask(question)) should equal( + Answer("The answer to the question: `What is the meaning of life?` is 42.") + ) + } + + test("ask fail") { + val question = Question("fail") + await(client123.ask(question)) 
should equal(Answer("DoEverythingException caught")) + } + + test("MDC filtering") { + val traceId = Trace.nextId + val response = await { + Trace.letId(traceId) { + client123.uppercase("Hi") + } + } + + response should equal("HI") + + val MDC = server.injector.instance[LegacyDoEverythingThriftController].getStoredMDC + MDC should not be None + MDC.get.size should equal(3) + + MDC.get("method") should not be null + MDC.get("method") should be("uppercase") + + MDC.get("clientId") should not be null + MDC.get("clientId") should be("client123") + + MDC.get("traceId") should not be null + MDC.get("traceId") should be(traceId.traceId.toString()) + } + + test("GET /admin/registry.json") { + val response = server.httpGetAdmin( + "/admin/registry.json", + andExpect = Status.Ok) + + val json: Map[String, Any] = + JSON.parseFull(response.contentString).get.asInstanceOf[Map[String, Any]] + + val registry = json("registry").asInstanceOf[Map[String, Any]] + registry.contains("library") should be(true) + registry("library").asInstanceOf[Map[String, String]].contains("finatra") should be(true) + + val finatra = registry("library") + .asInstanceOf[Map[String, Any]]("finatra") + .asInstanceOf[Map[String, Any]] + + finatra.contains("thrift") should be(true) + val thrift = finatra("thrift").asInstanceOf[Map[String, Any]] + thrift.contains("filters") should be(true) + thrift.contains("methods") should be(true) + + val methods = thrift("methods").asInstanceOf[Map[String, Any]] + methods.size should be > 0 + + methods.foreach { case (_, data) => + data.isInstanceOf[Map[_, _]] should be(true) + val methodJsonInformation = data.asInstanceOf[Map[String, Any]] + methodJsonInformation.contains("service_name") should be(true) + methodJsonInformation.contains("class") should be(true) + } + } + + test("Basic server stats") { + await(client123.uppercase("Hi")) should equal("HI") + server.assertCounter("srv/thrift/sent_bytes")(_ > 0) + server.assertCounter("srv/thrift/received_bytes")(_ > 0) + server.assertCounter("srv/thrift/requests", 1L) + server.assertCounter("srv/thrift/success", 1L) + } + + test("Per-method stats scope") { + val question = Question("fail") + await(client123.ask(question)) should equal(Answer("DoEverythingException caught")) + server.assertCounter("per_method_stats/ask/success", 1L) + server.assertCounter("per_method_stats/ask/failures", 0L) + } + + test("Per-endpoint stats scope") { + val question = Question("fail") + await(client123.ask(question)) should equal(Answer("DoEverythingException caught")) + server.assertCounter("srv/thrift/ask/requests", 1L) + server.assertCounter("srv/thrift/ask/success", 1L) + server.assertCounter("srv/thrift/ask/failures", 0L) + } + + private def await[T](f: Future[T]): T = { + Await.result(f, 2.seconds) + } + + override protected def beforeEach(): Unit = { + server.inMemoryStatsReceiver.clear() + } +} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/NonInjectionThriftServerFeatureTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/NonInjectionThriftServerFeatureTest.scala index 52ac4122e0..047d798b16 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/NonInjectionThriftServerFeatureTest.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/NonInjectionThriftServerFeatureTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.noninjection.thriftscala.NonInjectionService import 
com.twitter.finatra.thrift.EmbeddedThriftServer import com.twitter.finatra.thrift.tests.noninjection.NonInjectionThriftServer diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/StatsFilterTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/StatsFilterTest.scala index 025ce16d8e..529ad69323 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/StatsFilterTest.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/StatsFilterTest.scala @@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.doeverything.thriftscala.DoEverything import com.twitter.finagle.Service import com.twitter.finagle.service.{ReqRep, ResponseClass, ResponseClassifier} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/ThriftWarmupTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/ThriftWarmupTest.scala new file mode 100644 index 0000000000..338f9cb00c --- /dev/null +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/ThriftWarmupTest.scala @@ -0,0 +1,14 @@ +package com.twitter.finatra.thrift.tests + +import com.twitter.doeverything.thriftscala.DoEverything.Uppercase +import com.twitter.finatra.thrift.routing.{ThriftWarmup, ThriftRouter} +import com.twitter.inject.Test + +class ThriftWarmupTest extends Test { + test("ThriftWarmup refuses to send a request if a router is not configured") { + intercept[IllegalStateException] { + val tw = new ThriftWarmup(new ThriftRouter(null, null)) + tw.send(Uppercase, Uppercase.Args("hithere"), 1)() + } + } +} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftServer.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftServer.scala index ee1880af89..fefae3cf57 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftServer.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftServer.scala @@ -8,7 +8,7 @@ import com.twitter.finatra.thrift.filters._ import com.twitter.finatra.thrift.modules.ClientIdAcceptlistModule import com.twitter.finatra.thrift.routing.ThriftRouter import com.twitter.finatra.thrift.tests.doeverything.controllers.DoEverythingThriftController -import com.twitter.finatra.thrift.tests.doeverything.exceptions.{BarExceptionMapper, DoEverythingExceptionMapper, FooExceptionMapper} +import com.twitter.finatra.thrift.tests.doeverything.exceptions.{ReqRepBarExceptionMapper, ReqRepDoEverythingExceptionMapper, ReqRepFooExceptionMapper} import com.twitter.finatra.thrift.tests.doeverything.modules.DoEverythingThriftServerDarkTrafficFilterModule import com.twitter.finatra.thrift.ThriftServer import com.twitter.util.NullMonitor @@ -44,9 +44,9 @@ class DoEverythingThriftServer extends ThriftServer { .filter(Filter.TypeAgnostic.Identity) .filter[Filter.TypeAgnostic, DarkTrafficFilterType] .exceptionMapper[FinatraThriftExceptionMapper] - .exceptionMapper[BarExceptionMapper] - .exceptionMapper[FooExceptionMapper] - .exceptionMapper[DoEverythingExceptionMapper] + .exceptionMapper[ReqRepBarExceptionMapper] + .exceptionMapper[ReqRepFooExceptionMapper] + .exceptionMapper[ReqRepDoEverythingExceptionMapper] .add[DoEverythingThriftController] } diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftTwitterServer.scala 
b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftTwitterServer.scala
index acaa32e98f..e472060137 100644
--- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftTwitterServer.scala
+++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/DoEverythingThriftTwitterServer.scala
@@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests.doeverything
-import com.twitter.conversions.time._
+import com.twitter.conversions.DurationOps._
 import com.twitter.doeverything.thriftscala.DoEverything
 import com.twitter.doeverything.thriftscala.DoEverything.{Ask, Echo, Echo2, MagicNum, MoreThanTwentyTwoArgs, Uppercase}
 import com.twitter.finagle.{ListeningServer, NullServer, Service, ThriftMux}
diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/LegacyDoEverythingThriftServer.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/LegacyDoEverythingThriftServer.scala
new file mode 100644
index 0000000000..8206fe4f4f
--- /dev/null
+++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/LegacyDoEverythingThriftServer.scala
@@ -0,0 +1,57 @@
+package com.twitter.finatra.thrift.tests.doeverything
+
+import com.twitter.finagle.{Filter, ThriftMux}
+import com.twitter.finagle.tracing.NullTracer
+import com.twitter.finatra.annotations.DarkTrafficFilterType
+import com.twitter.finatra.thrift.exceptions.FinatraThriftExceptionMapper
+import com.twitter.finatra.thrift.filters._
+import com.twitter.finatra.thrift.modules.ClientIdAcceptlistModule
+import com.twitter.finatra.thrift.routing.ThriftRouter
+import com.twitter.finatra.thrift.tests.doeverything.controllers.LegacyDoEverythingThriftController
+import com.twitter.finatra.thrift.tests.doeverything.exceptions.{BarExceptionMapper, DoEverythingExceptionMapper, FooExceptionMapper}
+import com.twitter.finatra.thrift.tests.doeverything.modules.LegacyDoEverythingThriftServerDarkTrafficFilterModule
+import com.twitter.finatra.thrift.ThriftServer
+import com.twitter.util.NullMonitor
+
+object LegacyDoEverythingThriftServerMain extends LegacyDoEverythingThriftServer
+
+@deprecated("These tests exist to ensure legacy functionality still operates. 
Do not use them for guidance", "2018-12-20") +class LegacyDoEverythingThriftServer extends ThriftServer { + override val name = "example-server" + + flag("magicNum", "26", "Magic number") + + override val modules = + Seq( + new ClientIdAcceptlistModule("/clients.yml"), + new LegacyDoEverythingThriftServerDarkTrafficFilterModule) + + override protected def configureThriftServer(server: ThriftMux.Server): ThriftMux.Server = { + server + .withMonitor(NullMonitor) + .withTracer(NullTracer) + .withPerEndpointStats + } + + override def configureThrift(router: ThriftRouter): Unit = { + router + .filter[LoggingMDCFilter] + .filter[TraceIdMDCFilter] + .filter[ThriftMDCFilter] + .filter(classOf[AccessLoggingFilter]) + .filter[StatsFilter] + .filter[ExceptionMappingFilter] + .filter[ClientIdAcceptlistFilter] + .filter(Filter.TypeAgnostic.Identity) + .filter[Filter.TypeAgnostic, DarkTrafficFilterType] + .exceptionMapper[FinatraThriftExceptionMapper] + .exceptionMapper[BarExceptionMapper] + .exceptionMapper[FooExceptionMapper] + .exceptionMapper[DoEverythingExceptionMapper] + .add[LegacyDoEverythingThriftController] + } + + override protected def warmup(): Unit = { + handle[DoEverythingThriftWarmupHandler]() + } +} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoEverythingThriftController.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoEverythingThriftController.scala index 469c255fc3..d8774a296b 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoEverythingThriftController.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoEverythingThriftController.scala @@ -1,9 +1,10 @@ package com.twitter.finatra.thrift.tests.doeverything.controllers -import com.twitter.conversions.time._ +import com.twitter.conversions.DurationOps._ import com.twitter.doeverything.thriftscala.{Answer, DoEverything, DoEverythingException} import com.twitter.doeverything.thriftscala.DoEverything.{Ask, Echo, Echo2, MagicNum, MoreThanTwentyTwoArgs, Uppercase} -import com.twitter.finagle.{ChannelException, RequestException, RequestTimeoutException} +import com.twitter.finagle.stats.StatsReceiver +import com.twitter.finagle.{ChannelException, Filter, RequestException, RequestTimeoutException, Service, SimpleFilter} import com.twitter.finatra.thrift.Controller import com.twitter.finatra.thrift.tests.doeverything.exceptions.{BarException, FooException} import com.twitter.finatra.thrift.thriftscala.{ClientError, UnknownClientIdError} @@ -17,13 +18,23 @@ import scala.collection.JavaConverters._ import scala.util.control.NoStackTrace @Singleton -class DoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: String) - extends Controller - with DoEverything.BaseServiceIface { +class DoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: String, stats: StatsReceiver) + extends Controller(DoEverything) { + + private[this] val echos = stats.counter("echo_calls") + + private[this] val countEchoFilter = new Filter.TypeAgnostic { + def toFilter[Req, Rep]: Filter[Req, Rep, Req, Rep] = new SimpleFilter[Req, Rep]{ + def apply(request: Req, service: Service[Req, Rep]): Future[Rep] = { + echos.incr() + service(request) + } + } + } private[this] var storedMDC: Option[Map[String, String]] = None - override val uppercase = handle(Uppercase) { args: Uppercase.Args => + handle(Uppercase) { args: Uppercase.Args => storeForTesting() info("In uppercase 
method.") if (args.msg == "fail") { @@ -33,7 +44,7 @@ class DoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: St } } - override val echo = handle(Echo) { args: Echo.Args => + handle(Echo).filtered(countEchoFilter) { args: Echo.Args => if (args.msg == "clientError") { Future.exception(ClientError(BadRequest, "client error")) } else { @@ -41,7 +52,7 @@ class DoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: St } } - override val echo2 = handle(Echo2) { args: Echo2.Args => + handle(Echo2).filtered(countEchoFilter) { args: Echo2.Args => args.msg match { // should be handled by FinatraExceptionMapper case "clientError" => throw new ClientError(BadRequest, "client error") @@ -49,7 +60,7 @@ class DoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: St case "requestException" => throw new RequestException case "timeoutException" => throw new RequestTimeoutException(1.second, "timeout exception") case "unhandledException" => throw new Exception("unhandled exception") with NoStackTrace - // should be handled by BarExceptionMapper and FooExceptionMapper + // should be handled by ReqRepBarExceptionMapper and ReqRepFooExceptionMapper case "barException" => throw new BarException case "fooException" => throw new FooException case "unhandledSourcedException" => throw new ChannelException with NoStackTrace @@ -59,16 +70,16 @@ class DoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: St } } - override val magicNum = handle(MagicNum) { args: MagicNum.Args => + handle(MagicNum) { args: MagicNum.Args => Future.value(magicNumValue) } - override val moreThanTwentyTwoArgs = handle(MoreThanTwentyTwoArgs) { + handle(MoreThanTwentyTwoArgs) { args: MoreThanTwentyTwoArgs.Args => Future.value("handled") } - override val ask = handle(Ask) { args: Ask.Args => + handle(Ask) { args: Ask.Args => val question = args.question if (question.text.equals("fail")) { Future.exception(new DoEverythingException("This is a test.")) diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoNothingController.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoNothingController.scala index 592dc412ae..e4aa081359 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoNothingController.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/DoNothingController.scala @@ -3,4 +3,4 @@ package com.twitter.finatra.thrift.tests.doeverything.controllers import com.twitter.doeverything.thriftscala.DoNothing import com.twitter.finatra.thrift.Controller -class DoNothingController extends Controller with DoNothing.BaseServiceIface +class DoNothingController extends Controller(DoNothing) diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/LegacyDoEverythingThriftController.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/LegacyDoEverythingThriftController.scala new file mode 100644 index 0000000000..a1d82a179c --- /dev/null +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/controllers/LegacyDoEverythingThriftController.scala @@ -0,0 +1,93 @@ +package com.twitter.finatra.thrift.tests.doeverything.controllers + +import com.twitter.conversions.time._ +import com.twitter.doeverything.thriftscala.{Answer, DoEverything, DoEverythingException} +import com.twitter.doeverything.thriftscala.DoEverything.{Ask, Echo, Echo2, 
MagicNum, MoreThanTwentyTwoArgs, Uppercase} +import com.twitter.finagle.{ChannelException, RequestException, RequestTimeoutException} +import com.twitter.finatra.thrift.Controller +import com.twitter.finatra.thrift.tests.doeverything.exceptions.{BarException, FooException} +import com.twitter.finatra.thrift.thriftscala.{ClientError, UnknownClientIdError} +import com.twitter.finatra.thrift.thriftscala.ClientErrorCause.BadRequest +import com.twitter.inject.annotations.Flag +import com.twitter.inject.logging.FinagleMDCAdapter +import com.twitter.util.Future +import javax.inject.{Inject, Singleton} +import org.slf4j.MDC +import scala.collection.JavaConverters._ +import scala.util.control.NoStackTrace + +@Singleton +@deprecated("These tests exist to ensure legacy functionaly still operates. Do not use them for guidance", "2018-12-20") +class LegacyDoEverythingThriftController @Inject()(@Flag("magicNum") magicNumValue: String) + extends Controller with DoEverything.BaseServiceIface + { + + private[this] var storedMDC: Option[Map[String, String]] = None + + override val uppercase = handle(Uppercase) { args: Uppercase.Args => + storeForTesting() + info("In uppercase method.") + if (args.msg == "fail") { + Future.exception(new Exception("oops") with NoStackTrace) + } else { + Future.value(args.msg.toUpperCase) + } + } + + override val echo = handle(Echo) { args: Echo.Args => + if (args.msg == "clientError") { + Future.exception(ClientError(BadRequest, "client error")) + } else { + Future.value(args.msg) + } + } + + override val echo2 = handle(Echo2) { args: Echo2.Args => + args.msg match { + // should be handled by FinatraExceptionMapper + case "clientError" => throw new ClientError(BadRequest, "client error") + case "unknownClientIdError" => throw new UnknownClientIdError("unknown client id error") + case "requestException" => throw new RequestException + case "timeoutException" => throw new RequestTimeoutException(1.second, "timeout exception") + case "unhandledException" => throw new Exception("unhandled exception") with NoStackTrace + // should be handled by BarExceptionMapper and FooExceptionMapper + case "barException" => throw new BarException + case "fooException" => throw new FooException + case "unhandledSourcedException" => throw new ChannelException with NoStackTrace + // should be handled by root mapper, ThrowableExceptionMapper + case "unhandledThrowable" => throw new Throwable("unhandled throwable") + case _ => Future.value("no specified exception") + } + } + + override val magicNum = handle(MagicNum) { args: MagicNum.Args => + Future.value(magicNumValue) + } + + override val moreThanTwentyTwoArgs = handle(MoreThanTwentyTwoArgs) { + args: MoreThanTwentyTwoArgs.Args => + Future.value("handled") + } + + override val ask = handle(Ask) { args: Ask.Args => + val question = args.question + if (question.text.equals("fail")) { + Future.exception(new DoEverythingException("This is a test.")) + } else { + Future.value( + Answer(s"The answer to the question: `${question.text}` is 42.")) + } + } + + def getStoredMDC: Option[Map[String, String]] = this.storedMDC + + private def storeForTesting(): Unit = { + this.storedMDC = Some( + MDC.getMDCAdapter + .asInstanceOf[FinagleMDCAdapter] + .getPropertyContextMap + .asScala + .toMap + ) + } +} diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/exceptions/mappers.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/exceptions/mappers.scala index 4ded80eae1..6d58e7079f 100644 --- 
a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/exceptions/mappers.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/exceptions/mappers.scala @@ -2,20 +2,28 @@ package com.twitter.finatra.thrift.tests.doeverything.exceptions import com.twitter.doeverything.thriftscala.{Answer, DoEverythingException} import com.twitter.finatra.thrift.exceptions.ExceptionMapper +import com.twitter.scrooge.Response import com.twitter.util.Future import javax.inject.Singleton @Singleton -class BarExceptionMapper extends ExceptionMapper[BarException, String] { - def handleException(throwable: BarException): Future[String] = { - Future.value("BarException caught") +class ReqRepBarExceptionMapper extends ExceptionMapper[BarException, Response[String]] { + def handleException(throwable: BarException): Future[Response[String]] = { + Future.value(Response("ReqRep BarException caught")) } } @Singleton -class FooExceptionMapper extends ExceptionMapper[FooException, String] { - def handleException(throwable: FooException): Future[String] = { - Future.value("FooException caught") +class ReqRepFooExceptionMapper extends ExceptionMapper[FooException, Response[String]] { + def handleException(throwable: FooException): Future[Response[String]] = { + Future.value(Response("ReqRep FooException caught")) + } +} + +@Singleton +class ReqRepDoEverythingExceptionMapper extends ExceptionMapper[DoEverythingException, Response[Answer]] { + def handleException(throwable: DoEverythingException): Future[Response[Answer]] = { + Future.value(Response(Answer("ReqRep DoEverythingException caught"))) } } @@ -25,3 +33,18 @@ class DoEverythingExceptionMapper extends ExceptionMapper[DoEverythingException, Future.value(Answer("DoEverythingException caught")) } } + + +@Singleton +class BarExceptionMapper extends ExceptionMapper[BarException, String] { + def handleException(throwable: BarException): Future[String] = { + Future.value("BarException caught") + } +} + +@Singleton +class FooExceptionMapper extends ExceptionMapper[FooException, String] { + def handleException(throwable: FooException): Future[String] = { + Future.value("FooException caught") + } +} \ No newline at end of file diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/modules/DoEverythingThriftServerDarkTrafficFilterModule.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/modules/DoEverythingThriftServerDarkTrafficFilterModule.scala index 33681516d9..3a1d819a15 100644 --- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/modules/DoEverythingThriftServerDarkTrafficFilterModule.scala +++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/doeverything/modules/DoEverythingThriftServerDarkTrafficFilterModule.scala @@ -2,18 +2,33 @@ package com.twitter.finatra.thrift.tests.doeverything.modules import com.twitter.doeverything.thriftscala.DoEverything import com.twitter.finagle.thrift.MethodMetadata -import com.twitter.finatra.thrift.modules.DarkTrafficFilterModule +import com.twitter.finatra.thrift.modules.{ReqRepDarkTrafficFilterModule, DarkTrafficFilterModule} import com.twitter.inject.Injector class DoEverythingThriftServerDarkTrafficFilterModule - extends DarkTrafficFilterModule[DoEverything.ServiceIface] { + extends ReqRepDarkTrafficFilterModule[DoEverything.ReqRepServicePerEndpoint] { /** * Function to determine if the request should be "sampled", e.g. * sent to the dark service. 
   */
   override def enableSampling(injector: Injector): Any => Boolean = { request =>
+    MethodMetadata.current match {
+      case Some(m) => !(m.methodName.equals("uppercase") || m.methodName.equals("moreThanTwentyTwoArgs"))
+      case _ => true
+    }
+  }
+}
+@deprecated("These tests exist to ensure legacy functionality still operates. Do not use them for guidance", "2018-12-20")
+class LegacyDoEverythingThriftServerDarkTrafficFilterModule
+  extends DarkTrafficFilterModule[DoEverything.ServiceIface] {
+
+  /**
+   * Function to determine if the request should be "sampled", e.g.
+   * sent to the dark service.
+   */
+  override def enableSampling(injector: Injector): Any => Boolean = { request =>
     MethodMetadata.current match {
       case Some(m) => !(m.methodName.equals("uppercase") || m.methodName.equals("moreThanTwentyTwoArgs"))
       case _ => true
diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/exceptions/FinatraThriftExceptionMapperIntegrationTest.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/exceptions/FinatraThriftExceptionMapperIntegrationTest.scala
index 929d13ab9c..41e0a68a69 100644
--- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/exceptions/FinatraThriftExceptionMapperIntegrationTest.scala
+++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/exceptions/FinatraThriftExceptionMapperIntegrationTest.scala
@@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests.exceptions
-import com.twitter.conversions.time._
+import com.twitter.conversions.DurationOps._
 import com.twitter.finagle.{CancelledRequestException, Failure, RequestTimeoutException}
 import com.twitter.finatra.thrift.exceptions.FinatraThriftExceptionMapper
 import com.twitter.finatra.thrift.thriftscala.{
diff --git a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/inheritance/InheritanceThriftTwitterServer.scala b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/inheritance/InheritanceThriftTwitterServer.scala
index 929f6819f3..470c85ffbf 100644
--- a/thrift/src/test/scala/com/twitter/finatra/thrift/tests/inheritance/InheritanceThriftTwitterServer.scala
+++ b/thrift/src/test/scala/com/twitter/finatra/thrift/tests/inheritance/InheritanceThriftTwitterServer.scala
@@ -1,6 +1,6 @@ package com.twitter.finatra.thrift.tests.inheritance
-import com.twitter.conversions.time._
+import com.twitter.conversions.DurationOps._
 import com.twitter.finagle.{Filter, ListeningServer, NullServer, Service, ThriftMux}
 import com.twitter.finatra.thrift.tests.ReqRepServicePerEndpointTest._
 import com.twitter.inject.server.{PortUtils, Ports}
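
A note on the new controller style exercised above: methods are registered with per-method `handle(ThriftMethod)` calls instead of `BaseServiceIface` overrides, and each method can carry its own filter or use the `c.t.scrooge.Request`/`Response` wrappers. The following is a minimal sketch built from the patterns in `ControllerTest` and `DoEverythingThriftController`; the controller name and `loggingFilter` are illustrative, the `DoEverything` thrift is the test service used throughout this patch, and a complete controller must provide exactly one implementation for every method of the service (only two are shown here).

.. code:: scala

    import com.twitter.doeverything.thriftscala.DoEverything
    import com.twitter.doeverything.thriftscala.DoEverything.{Echo, Uppercase}
    import com.twitter.finagle.{Filter, Service, SimpleFilter}
    import com.twitter.finatra.thrift.Controller
    import com.twitter.scrooge.{Request, Response}
    import com.twitter.util.Future

    class ExampleDoEverythingController extends Controller(DoEverything) {

      // A TypeAgnostic filter that can be attached to individual methods via `filtered`.
      private[this] val loggingFilter = new Filter.TypeAgnostic {
        def toFilter[Req, Rep]: Filter[Req, Rep, Req, Rep] = new SimpleFilter[Req, Rep] {
          def apply(request: Req, service: Service[Req, Rep]): Future[Rep] =
            service(request) // a real filter would add behavior around this call
        }
      }

      // Args => Future[SuccessType] implementation with a per-method filter.
      handle(Echo).filtered(loggingFilter) { args: Echo.Args =>
        Future.value(args.msg)
      }

      // Request/Response wrappers expose headers on the way in and out.
      handle(Uppercase).withFn { req: Request[Uppercase.Args] =>
        info(s"uppercase called with ${req.headers.size} header(s)")
        Future.value(Response(req.args.msg.toUpperCase))
      }
    }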
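
The router changes above also make configuration order explicit: filters and exception mappers must be registered before the single `add` call (`preConfig`), and accessors such as `thriftService` are only valid once a controller has been added (`postConfig`). A hedged sketch of a server wired in that order, reusing the hypothetical controller from the previous example:

.. code:: scala

    import com.twitter.finatra.thrift.ThriftServer
    import com.twitter.finatra.thrift.exceptions.FinatraThriftExceptionMapper
    import com.twitter.finatra.thrift.filters.StatsFilter
    import com.twitter.finatra.thrift.routing.ThriftRouter

    class ExampleThriftServer extends ThriftServer {
      override val name = "example-thrift-server"

      override def configureThrift(router: ThriftRouter): Unit = {
        router
          .filter[StatsFilter]                           // preConfig: must precede `add`
          .exceptionMapper[FinatraThriftExceptionMapper] // preConfig: must precede `add`
          .add[ExampleDoEverythingController]            // assertController: one controller, added once
        // From this point on the router reports isConfigured == true, so postConfig
        // accessors such as `thriftService` (and warmup routing) are safe to use.
      }
    }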
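
Finally, the reworked `ThriftWarmup#sendRequest` issues warmup calls through the configured router using `c.t.scrooge.Request`/`Response` wrappers and hands the result back as a `Try`. Below is a minimal sketch of a warmup handler built on it; `ExampleWarmupHandler` is an illustrative name, the `Handler` base class from `inject-utils` is an assumption about the usual wiring, and `DoEverything.Uppercase` again comes from the test thrift in this patch.

.. code:: scala

    import com.twitter.doeverything.thriftscala.DoEverything.Uppercase
    import com.twitter.finatra.thrift.routing.ThriftWarmup
    import com.twitter.inject.Logging
    import com.twitter.inject.utils.Handler
    import com.twitter.scrooge.Request
    import com.twitter.util.{Return, Throw}
    import javax.inject.{Inject, Singleton}

    @Singleton
    class ExampleWarmupHandler @Inject()(warmup: ThriftWarmup) extends Handler with Logging {

      override def handle(): Unit = {
        // Send a single request through the router's warmup route and inspect the Try result.
        warmup.sendRequest(Uppercase, Request(Uppercase.Args("warmup")), times = 1) {
          case Return(response) => info("Warmup response: " + response.value)
          case Throw(e) => error("Warmup request failed: " + e.getMessage)
        }
      }
    }

Such a handler would be invoked from the server's `warmup()` phase (e.g. `handle[ExampleWarmupHandler]()`), mirroring the `DoEverythingThriftWarmupHandler` wiring already present in the test servers.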