Pull cascading flowDef thread into cascading_backend #1681

johnynek · 2017-05-03T22:00:37Z

This is a first step towards factoring Execution so that it does not know about the backend. This is a good cleanup regardless, because we separate the concern of caching from producing Futures from FlowDefs.

But I want to drive towards the Mode providing something that takes a sequence of (TypedPipe[T], TypedSink[T]) pairs and returns a Future[Unit] (and maybe counters) once they have all been written.

ianoc · 2017-05-03T23:09:05Z

...ng-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/AsyncFlowDefRunner.scala

+    // get an immutable copy
+    val cleanUp = filesToCleanup.synchronized { filesToCleanup.toList }
+    if (cleanUp.nonEmpty) {
+      Runtime.getRuntime.addShutdownHook(TempFileCleanup(cleanUp, mode))


Since I'm reading this: any thoughts on how we should handle this instead? Not for doing in this PR, obviously. I'm guessing we don't run the cleanup right now because we might have done a toIterableExecution and still be using it after the run?

It's a good question. I'm currently thinking of making some methods like:

    object Execution {
      def forceToDisk[T](t: TypedPipe[T]): Execution[TypedPipe[T]] = ...
      def toIterable[T](t: TypedPipe[T]): Execution[Iterable[T]] = ...
    }

and just have those two be new subclasses of the Execution trait. Then each Writer can have functions like:

    trait Writer {
      def forceToDisk[T](c: Config, m: Mode, t: TypedPipe[T]): Future[TypedPipe[T]]
      def toIterable[T](c: Config, m: Mode, t: TypedPipe[T]): Future[Iterable[T]]
    }

which execution will just call through to. What do you think of that?

When we get to that stage, Execution really is just a memoizing State/Reader monad. In a cats-like description: Execution[T] = StateT[Future, (Config, Mode, ExecutionContext), T], with some memoization.

I wasn't that clear: if we do the above, the logic about cleanup files can happen when you do forceToDisk, which Writer is driving, so it can keep the list of clean-up files it needs.
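To make the idea concrete, here is a minimal sketch of a Writer that owns its own clean-up list. It is not the scalding implementation: `Config`, `Mode`, and `TypedPipe` are simplified stand-ins for the real scalding types, and `InMemoryWriter`, `tempFiles`, and `filesToCleanup` are hypothetical names used only for illustration.

```scala
import scala.collection.mutable
import scala.concurrent.Future

object WriterSketch {
  // Stand-ins for the real scalding types (assumptions, for illustration only).
  final case class Config(settings: Map[String, String])
  sealed trait Mode
  case object LocalMode extends Mode
  final case class TypedPipe[T](items: List[T])

  // The Writer interface proposed in the comment above.
  trait Writer {
    def forceToDisk[T](c: Config, m: Mode, t: TypedPipe[T]): Future[TypedPipe[T]]
    def toIterable[T](c: Config, m: Mode, t: TypedPipe[T]): Future[Iterable[T]]
  }

  // Hypothetical backend: it pretends to write a temp file on forceToDisk
  // and remembers the file name so that it can clean up later.
  final class InMemoryWriter extends Writer {
    private val tempFiles = mutable.ListBuffer.empty[String]

    def forceToDisk[T](c: Config, m: Mode, t: TypedPipe[T]): Future[TypedPipe[T]] = {
      // Record the (fake) temp file under a lock, since callers may be concurrent.
      tempFiles.synchronized { tempFiles += s"tmp-${tempFiles.size}" }
      Future.successful(t)
    }

    def toIterable[T](c: Config, m: Mode, t: TypedPipe[T]): Future[Iterable[T]] =
      Future.successful(t.items)

    // Immutable snapshot of everything this Writer must eventually delete.
    def filesToCleanup: List[String] = tempFiles.synchronized { tempFiles.toList }
  }
}
```

Because the Writer drives forceToDisk, Execution never needs to see the file list; when the files are actually deleted is still left open in this sketch.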

Yeah, I like this ^^. It makes it more pluggable, and ultimately separating the two notions inside the Execution feels like a good place to handle it. The question of when to do the delete is still rough, and I'm not sure what to do there. I don't know if we can hook into some sort of Hadoop FS shutdown thing. But separating this logic out from TypedPipe to the platforms is 👍

ianoc · 2017-05-03T23:11:49Z

👍 lgtm

benpence

Looks good. A few minor comments.

benpence · 2017-05-04T21:20:50Z

scalding-core/src/main/scala/com/twitter/scalding/Execution.scala

+  sealed trait ToWrite
+  object ToWrite {
+    case class SimpleWrite[T](pipe: TypedPipe[T], sink: TypedSink[T]) extends ToWrite
+    case class PreparedWrite[T](fn: (Config, Mode) => SimpleWrite[T]) extends ToWrite


Do we need to change the visibility of these?

Yes, the idea is that we want to be able to write new backends. To do that, you will need to be able to see these types (so, in this case AsyncFlowDefRunner needs to be able to pattern match on them).
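As a rough illustration of why a backend needs to see these types, the sketch below pattern matches on ToWrite to resolve every write down to a SimpleWrite. The ToWrite definitions mirror the diff above; `Config`, `Mode`, `TypedPipe`, and `TypedSink` are simplified stand-ins, and `resolve` is a hypothetical helper, not the code in AsyncFlowDefRunner.

```scala
object ToWriteSketch {
  // Stand-ins for the real scalding types (assumptions, for illustration only).
  final case class Config(settings: Map[String, String])
  sealed trait Mode
  case object LocalMode extends Mode
  final case class TypedPipe[T](items: List[T])
  final case class TypedSink[T](name: String)

  // Same shape as the definitions in the diff above.
  sealed trait ToWrite
  object ToWrite {
    case class SimpleWrite[T](pipe: TypedPipe[T], sink: TypedSink[T]) extends ToWrite
    case class PreparedWrite[T](fn: (Config, Mode) => SimpleWrite[T]) extends ToWrite
  }

  import ToWrite._

  // Hypothetical backend helper: a PreparedWrite is applied to the current
  // Config and Mode, a SimpleWrite is passed through unchanged.
  def resolve(conf: Config, mode: Mode, w: ToWrite): SimpleWrite[_] = w match {
    case s: SimpleWrite[_]   => s
    case p: PreparedWrite[_] => p.fn(conf, mode)
  }
}
```

A backend outside the Execution object can only write a match like this if ToWrite and its cases are visible to it.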

benpence · 2017-05-05T22:12:21Z

...ng-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/AsyncFlowDefRunner.scala

+}
+
+/**
+ * This holds an internal thread to submit run


"submit run"

benpence · 2017-05-05T22:20:55Z

...ng-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/AsyncFlowDefRunner.scala

+      filesToCleanup ++= files
+    }
+
+  private def toFuture[T](t: Try[T]): Future[T] =


This is already defined elsewhere. Can we just put it in some central place or use bijection's or inline it?

It's so simple. I hear you, but I don't want to make a public API which we can't delete, nor add a bunch of syntax to call bijection here. Can we let this minor duplication slide?
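For context, a helper of this kind is typically only a few lines. The sketch below is one plausible shape, not necessarily the body in this PR (the diff shows only the signature); `TryToFuture` is a hypothetical wrapper object.

```scala
import scala.concurrent.Future
import scala.util.{Failure, Success, Try}

object TryToFuture {
  // Lift an already-computed Try into an already-completed Future.
  def toFuture[T](t: Try[T]): Future[T] = t match {
    case Success(v) => Future.successful(v)
    case Failure(e) => Future.failed(e)
  }
}
```

On Scala 2.11 and later, the standard library's `Future.fromTry` does the same thing, which is another way to avoid both the duplication and a public helper.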

benpence · 2017-05-05T22:40:29Z

...ng-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/AsyncFlowDefRunner.scala

+  def finished(mode: Mode): Unit = {
+    messageQueue.put(Stop)
+    // get an immutable copy
+    val cleanUp = filesToCleanup.synchronized { filesToCleanup.toList }


Why the addition?

The previous stuff was not thread safe.
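A minimal sketch of the pattern, assuming a mutable collection guarded by its own monitor: every writer and every reader takes the same lock, and readers leave with an immutable copy. The collection type and the names `CleanupSnapshot`, `addFiles`, and `snapshot` are assumptions for illustration; only `filesToCleanup` appears in the diff.

```scala
import scala.collection.mutable

object CleanupSnapshot {
  // Hypothetical stand-in for the runner's mutable state.
  private val filesToCleanup = mutable.Set.empty[String]

  // Writers mutate only while holding the collection's monitor.
  def addFiles(files: Iterable[String]): Unit =
    filesToCleanup.synchronized { filesToCleanup ++= files }

  // Readers copy under the same monitor, so they never observe a
  // half-finished update and can use the result without further locking.
  def snapshot(): List[String] =
    filesToCleanup.synchronized { filesToCleanup.toList }
}
```

Without the lock on the read side, `toList` could iterate the set while another thread is in the middle of `++=`.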

johnynek · 2017-05-06T00:30:39Z

Merging now; we can continue fixes in the next PR in the series.

* Pull cascading flowDef thread into cascading_backend
* Make ToWrite public
* remove unused imports

Pull cascading flowDef thread into cascading_backend

45ffc7c

johnynek requested review from ianoc and benpence May 3, 2017 22:00

johnynek added 2 commits May 3, 2017 12:38

Make ToWrite public

2585650

remove unused imports

ad57580

ianoc reviewed May 3, 2017


benpence reviewed May 5, 2017


johnynek merged commit 53e2568 into develop May 6, 2017

johnynek mentioned this pull request May 6, 2017

Move toIterableExecution and forceToDiskExecution into Execution #1682

Merged

jcdavis pushed a commit to jcdavis/scalding that referenced this pull request May 23, 2017

Pull cascading flowDef thread into cascading_backend (twitter#1681)

5f6a0b6


johnynek added this to merged in Modularize the typed API Oct 7, 2017
