Make an Execution[T] type, which is a monad, which makes composing Jobs easy. #974

johnynek · 2014-07-25T02:40:34Z

No description provided.

jcoveney · 2014-07-25T02:55:17Z

scalding-core/src/main/scala/com/twitter/scalding/Execution.scala

+/**
+ * This is a Monad, that represents a computation and a result
+ */
+sealed trait Execution[+T] {


Still reading the code, so I'm just tossing stuff out as I go along, but this seems like a Reader[(Config, Mode, ConcurrentExecutionContext), Future[T]]?

Sort of, but it should actually be something like a ReaderT[WriterT[ImmutableFlowDef], (Config, Mode, ConcurrentExecutionContext), Future, T]. There are three monads here: Reader, Writer, and Future.

On a similar note, should we lift this out into a ScaldingExecution[...]? This seems like something that would be useful outside of Scalding jobs themselves. That said, I'm quite in favor of getting this down first and abstracting later.
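To make the Reader-over-Future reading concrete, here is a minimal Scala sketch. Everything in it (`Env`, `Exec`, `readConf`, the `tmp.dir` key, `runSync`) is hypothetical and much simpler than scalding's actual `Execution`; it leaves out the Writer layer that accumulates the FlowDef and only shows that an `Exec` is a description that does nothing until it is run against an environment.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ReaderFutureSketch {
  // Hypothetical stand-in for scalding's (Config, Mode); not the real types.
  final case class Env(conf: Map[String, String])

  // Roughly Reader[(Env, ExecutionContext), Future[T]]: calling `run` is the
  // only thing that starts work. The Writer layer (building up a FlowDef)
  // is intentionally omitted.
  final case class Exec[+T](run: (Env, ExecutionContext) => Future[T]) {
    def map[U](f: T => U): Exec[U] =
      Exec[U]((env, ec) => run(env, ec).map(f)(ec))
    def flatMap[U](f: T => Exec[U]): Exec[U] =
      Exec[U]((env, ec) => run(env, ec).flatMap(t => f(t).run(env, ec))(ec))
  }

  // Counts how many times the environment was actually read.
  val reads = new AtomicInteger(0)

  // Hypothetical helper: the Reader part, looking a key up in the config.
  def readConf(key: String): Exec[Option[String]] =
    Exec[Option[String]] { (env, _) =>
      reads.incrementAndGet()
      Future.successful(env.conf.get(key))
    }

  // Building this value performs no reads; it only describes them.
  val tmpDir: Exec[String] = readConf("tmp.dir").map(_.getOrElse("/tmp"))

  def runSync[T](ex: Exec[T], env: Env): T =
    Await.result(ex.run(env, ExecutionContext.global), 10.seconds)
}
```

Running the same `tmpDir` value against two different environments reads the config twice and can give two different answers, which is the Reader half of the stack.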

jcoveney · 2014-07-25T20:11:28Z

scalding-core/src/main/scala/com/twitter/scalding/typed/TypedPipe.scala

+   * This is the functionally pure approach to building jobs. Note,
+   * that you have to call run on the result for anything to happen here.
+   */
+  def writeExecution(dest: TypedSink[T]): Execution[Unit] =


So right now, it's common to do pipe.stuff.write(sink).stuff.write(sink2) and so on... If we want to encourage functional purity, we'd force the user to compose executions, right? I wonder if it would be useful to have another method:

def writeThrough(dest: TypedSource[T] with TypedSink[T]): Execution[TypedPipe[T]]

where it wouldn't force the user to compose the Executions.

That said, I'm still ruminating on whether or not this is necessary...
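One way to read the proposed writeThrough is "write, then resume from what was written." The sketch below assumes a toy `Exec` over `Future` and a hypothetical in-memory `MemSourceSink` standing in for `TypedSource[T] with TypedSink[T]`; none of these names are scalding APIs. It only shows the sequencing: the returned value reads back the materialized data, and nothing is written until the value is run.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object WriteThroughSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    def map[U](f: T => U): Exec[U] = Exec[U](ec => run(ec).map(f)(ec))
    def flatMap[U](f: T => Exec[U]): Exec[U] =
      Exec[U](ec => run(ec).flatMap(t => f(t).run(ec))(ec))
  }

  // Hypothetical in-memory stand-in for a TypedSource[T] with TypedSink[T].
  final class MemSourceSink[T] {
    private val buf = mutable.ArrayBuffer.empty[T]
    def write(ts: Seq[T]): Exec[Unit] =
      Exec[Unit](ec => Future { buf.synchronized { buf.clear(); buf ++= ts; () } }(ec))
    def read: Exec[Seq[T]] =
      Exec[Seq[T]](ec => Future(buf.synchronized(buf.toList))(ec))
  }

  // Force the data out to the sink, then resume from what was written,
  // so later steps read the materialized copy.
  def writeThrough[T](data: Seq[T], dest: MemSourceSink[T]): Exec[Seq[T]] =
    dest.write(data).flatMap(_ => dest.read)

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```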

I'm not sure it is; that example would all be part of one execution, right? What does writeThrough mean there? Do we force-write it out and resume from there?

We could add that easily. snapshotExecution is similar; you just don't control the task.

Alternatively:

val forked = somestuff.fork
val ex1 = forked.writeExecution(sink)
val ex2 = someMore(forked)
zip(ex1, ex2).unit : Execution[Unit]
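The fork-then-zip pattern can be sketched with a toy `Exec` type. The names `Exec`, `writeExecution`, `written`, and `runSync` are hypothetical, the "sinks" are entries in an in-memory map, and `forked` is a plain list rather than a forked TypedPipe. The point is that both writes end up inside the single `Exec[Unit]` value `job`, so running `job` runs both branches.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ForkZipSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    def map[U](f: T => U): Exec[U] = Exec[U](ec => run(ec).map(f)(ec))
    // Start both sides, then combine their results.
    def zip[U](that: Exec[U]): Exec[(T, U)] =
      Exec[(T, U)] { ec =>
        val left = run(ec)
        val right = that.run(ec)
        left.zip(right)
      }
    // Keep the effects, discard the value.
    def unit: Exec[Unit] = map(_ => ())
  }

  // Hypothetical sinks: each write records its data under a name.
  val written = TrieMap.empty[String, Seq[Int]]

  def writeExecution(sink: String, data: Seq[Int]): Exec[Unit] =
    Exec[Unit](ec => Future { written.put(sink, data); () }(ec))

  val forked: Seq[Int] = List(1, 2, 3)                  // shared input
  val ex1 = writeExecution("sink", forked)
  val ex2 = writeExecution("sink2", forked.map(_ * 10)) // "someMore(forked)"
  val job: Exec[Unit] = ex1.zip(ex2).unit               // one value, both writes

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```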

Edit: I wrote this before I saw Oscar's, but I think it still stands.

That's what I wasn't sure about. I mean, once they do .write(), how can they continue the computation?

If you have (tpipe:TypedPipeFactory).stuff.writeExecution(sink), now you're stuck. If you do

val tp = (tpipe:TypedPipeFactory).stuff
tp.writeExecution(sink)
tp.writeExecution(sink2)

What is going on, right? We have to call run on each, I guess, and the composition is a bit weird. Likely I'm misunderstanding something.

Edit: this is after Oscar's

Is that really how we want people to use this? I feel like we should have a nicer way to do continued computation, or maybe we layer that on top, since this is just a lower-level set of primitives anyway.

It is like Summingbird's .also. If you don't .zip them, one of them will be ignored. Watch out!
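To see the warning in action, here is a toy sketch (a hypothetical `Exec`, `write`, and `writes` counter, not scalding code). In `forgetful`, the second write is built but never composed into the returned value, so running the job performs only one write; zipping the branches keeps both.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object IgnoredBranchSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    def map[U](f: T => U): Exec[U] = Exec[U](ec => run(ec).map(f)(ec))
    def zip[U](that: Exec[U]): Exec[(T, U)] =
      Exec[(T, U)] { ec =>
        val left = run(ec)
        val right = that.run(ec)
        left.zip(right)
      }
    def unit: Exec[Unit] = map(_ => ())
  }

  // Counts writes that actually happened.
  val writes = new AtomicInteger(0)

  def write(): Exec[Unit] =
    Exec[Unit](ec => Future { writes.incrementAndGet(); () }(ec))

  // Bug: ex2 is constructed but dropped, so it never runs.
  def forgetful: Exec[Unit] = {
    val ex1 = write()
    val ex2 = write()
    ex1
  }

  // Fix: both branches are part of the returned value.
  def composed: Exec[Unit] = {
    val ex1 = write()
    val ex2 = write()
    ex1.zip(ex2).unit
  }

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```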

You can also flatMap them.

So, the problem is that if we flatMap an Execution, it is going to push it into two flows; a zip will be only one.

Let's recall: to make this pure, all the side effects (writes) must wind up in the Execution type. So they can use for (which will sequence things) or zip (which will run the branches in parallel), or they can avoid functional purity and go back to the way they had it.

Any better suggestions?
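The difference between `for` (sequencing) and `zip` (parallel branches) can be sketched with a toy `Exec` over `Future`. The `Exec`, `write`, and logging helpers are hypothetical. In the `for` version the second step is constructed from the first step's result, so it cannot start earlier; in the `zip` version both sides are started before either is awaited.

```scala
import scala.collection.mutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SequenceVsZipSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    def map[U](f: T => U): Exec[U] = Exec[U](ec => run(ec).map(f)(ec))
    def flatMap[U](f: T => Exec[U]): Exec[U] =
      Exec[U](ec => run(ec).flatMap(t => f(t).run(ec))(ec))
    def zip[U](that: Exec[U]): Exec[(T, U)] =
      Exec[(T, U)] { ec =>
        val left = run(ec)
        val right = that.run(ec)
        left.zip(right)
      }
  }

  // Records the order in which the hypothetical writes actually ran.
  private val log = mutable.ListBuffer.empty[String]
  def entries: List[String] = log.synchronized(log.toList)
  def reset(): Unit = log.synchronized(log.clear())

  // Hypothetical stand-in for a write step: logs its name, then returns it.
  def write(name: String): Exec[String] =
    Exec[String](ec => Future { log.synchronized(log += name); name }(ec))

  // for/flatMap: the second write is only created after the first finishes,
  // and it can use the first one's result.
  val sequenced: Exec[(String, String)] =
    for {
      a <- write("a")
      b <- write(a + "-then-b")
    } yield (a, b)

  // zip: both writes are started together; their finishing order is unspecified.
  val parallel: Exec[(String, String)] = write("x").zip(write("y"))

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```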

What about a method like:

def zipAll[T](exs: Seq[Execution[T]]): Execution[Seq[T]] =

And then you capture your whole job in one.

How does that sound?
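A `zipAll` along the lines proposed here could be built by folding pairwise `zip` over the sequence. The sketch assumes a toy `Exec` over `Future` with hypothetical `pure`, `zip`, and `runSync` helpers; it is not scalding's implementation. Folding from the right keeps the results in the same order as the input.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ZipAllSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    def map[U](f: T => U): Exec[U] = Exec[U](ec => run(ec).map(f)(ec))
    def zip[U](that: Exec[U]): Exec[(T, U)] =
      Exec[(T, U)] { ec =>
        val left = run(ec)
        val right = that.run(ec)
        left.zip(right)
      }
  }

  def pure[T](t: T): Exec[T] = Exec[T](_ => Future.successful(t))

  // Combine many Execs into one; all of them run when the result runs.
  def zipAll[T](exs: Seq[Exec[T]]): Exec[Seq[T]] =
    exs.foldRight(pure(List.empty[T])) { (ex, acc) =>
      ex.zip(acc).map { case (head, tail) => head :: tail }
    }

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```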

johnynek · 2014-07-25T21:52:36Z

Check whether these added zip methods make it clearer how to compose after writes.

johnynek · 2014-07-27T03:00:33Z

I implemented K-means this weekend without this patch. It is insane how much work you have to do to implement an iterative job without this pull request.

I'm really looking forward to merging this, as writing page-rank and k-means will become almost trivial (and will serve as excellent examples).
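Iteration becomes easy because the next step can depend on the previous step's result through `flatMap`. The sketch below is a toy: `Exec`, `iterate`, and `kmeansStep` are hypothetical, and the data is a small in-memory vector of 1-D points rather than a TypedPipe. In a real job each step would be its own Execution, with the loop deciding whether to run another one.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object IterateSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    def map[U](f: T => U): Exec[U] = Exec[U](ec => run(ec).map(f)(ec))
    def flatMap[U](f: T => Exec[U]): Exec[U] =
      Exec[U](ec => run(ec).flatMap(t => f(t).run(ec))(ec))
  }

  def pure[T](t: T): Exec[T] = Exec[T](_ => Future.successful(t))
  def from[T](t: => T): Exec[T] = Exec[T](ec => Future(t)(ec))

  // Run `step` repeatedly until two consecutive states agree or maxIters is hit.
  def iterate[T](init: T, maxIters: Int)(step: T => Exec[T])(
      converged: (T, T) => Boolean): Exec[T] =
    if (maxIters <= 0) pure(init)
    else step(init).flatMap { next =>
      if (converged(init, next)) pure(next)
      else iterate(next, maxIters - 1)(step)(converged)
    }

  // Toy 1-D k-means: assign each point to its nearest centroid, then
  // move each centroid to the mean of its assigned points.
  val points = Vector(1.0, 1.5, 2.0, 8.0, 9.0, 10.0)

  def kmeansStep(centroids: Vector[Double]): Exec[Vector[Double]] =
    from {
      val clusters = points.groupBy(p => centroids.minBy(c => math.abs(c - p)))
      centroids.map(c => clusters.get(c).map(ps => ps.sum / ps.size).getOrElse(c))
    }

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```

Starting from centroids `Vector(0.0, 5.0)`, the first step moves them to the two cluster means and the second step leaves them unchanged, so the loop stops there.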

jcoveney · 2014-07-27T03:02:21Z

Why not add the example to this patch, in the Scalding test code that runs?
It would be great to have, and also good to concretize the API.

johnynek · 2014-07-27T03:51:33Z

@jcoveney what's your bar for shipping? Is implementing k-means (and writing tests for it, because I don't want a buggy example) needed to see the utility in a composable execution monad?

jcoveney · 2014-07-27T03:53:38Z

Certainly not part of the bar at all. I was just suggesting that if you already had an example, it could be good to clean up and include as a nice motivator for why this patch is awesome. People def ask about doing iterative or dependent processing in Scalding, and this goes from awful to amazing.

But definitely not necessary!

jcoveney · 2014-07-27T17:59:53Z

scalding-core/src/main/scala/com/twitter/scalding/Execution.scala

+      prev.run(conf, mode).map(fn)
+
+    // Don't bother applying the function if we are mapped
+    override def unit = prev.unit


I'm trying to think whether this could break people... It definitely will if fn has side effects. I'm also trying to think if there are "more legitimate" cases where it would, i.e. where they need the computation to happen but don't care about the result.

Still, seems like it could lead to hard to understand bugs?

I would say that is not map; that is foreach. Map is pure. We assume that mapping functions are pure all the time.

By the way, what kind of side-effect would be safe to apply?

Hm. I mean I agree with you. I suppose we will just punish people who abuse the API :)
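A small model of the `Mapped.unit` shortcut from the diff, using a hypothetical `Exec` ADT rather than scalding's classes. `Mapped.unit` delegates to `prev.unit`, so the mapping function never runs when only completion is requested. With a pure function that is invisible; the deliberately impure function below shows exactly what gets skipped.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object MappedUnitSketch {
  sealed trait Exec[+T] {
    def run(ec: ExecutionContext): Future[T]
    def map[U](fn: T => U): Exec[U] = Mapped(this, fn)
    def unit: Exec[Unit] = map(_ => ())
  }

  final case class Const[T](value: T) extends Exec[T] {
    def run(ec: ExecutionContext): Future[T] = Future.successful(value)
  }

  final case class Mapped[S, T](prev: Exec[S], fn: S => T) extends Exec[T] {
    def run(ec: ExecutionContext): Future[T] = prev.run(ec).map(fn)(ec)
    // Don't bother applying fn: only prev's completion matters for unit.
    override def unit: Exec[Unit] = prev.unit
  }

  // Deliberately impure mapping function, to expose what unit skips.
  val calls = new AtomicInteger(0)
  val mapped: Exec[Int] = Const(1).map { x => calls.incrementAndGet(); x + 1 }

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```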

jcoveney · 2014-07-27T18:08:28Z

Had the question about .unit in Mapped; otherwise LGTM.

jcoveney · 2014-07-27T19:17:56Z

You need to merge master; then feel free to merge.

johnynek · 2014-07-27T21:27:00Z

@jcoveney ok. Here is kmeans. With tests.

In a later pull request we can add .foreach or .recover, or other methods from Future.
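As a rough idea of what a `.recover`-style method could look like, here is a sketch that forwards to `Future.recoverWith`. `Exec`, `recoverWith`, `pure`, `failed`, and `runSync` are hypothetical names on a toy type, not an existing scalding signature. Failures the partial function does not handle are passed through unchanged.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object RecoverSketch {
  final case class Exec[+T](run: ExecutionContext => Future[T]) {
    // On failure, switch to the Exec chosen by pf, if pf handles the error.
    def recoverWith[U >: T](pf: PartialFunction[Throwable, Exec[U]]): Exec[U] =
      Exec[U] { ec =>
        run(ec).recoverWith[U] { case e if pf.isDefinedAt(e) => pf(e).run(ec) }(ec)
      }
  }

  def pure[T](t: T): Exec[T] = Exec[T](_ => Future.successful(t))
  def failed(e: Throwable): Exec[Nothing] = Exec[Nothing](_ => Future.failed(e))

  // A step that fails, with a fallback for one specific error type.
  val flaky: Exec[Int] = failed(new IllegalStateException("boom"))
  val withFallback: Exec[Int] =
    flaky.recoverWith[Int] { case _: IllegalStateException => pure(0) }

  def runSync[T](ex: Exec[T]): T =
    Await.result(ex.run(ExecutionContext.global), 10.seconds)
}
```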

jcoveney · 2014-07-27T21:30:12Z

Wow, thanks for going above and beyond. I think that's an incredibly useful example though. A very nontrivial computation which can be expressed quite succinctly with this API.


johnynek added 2 commits July 24, 2014 15:51

Add the Execution Monad

e75a026

Get the tmp directory from the config

382030c

jcoveney reviewed Jul 25, 2014

Add some tests

124e5db

jcoveney reviewed Jul 25, 2014

Add more zip methods and .unit to ease composition

0c92c76

johnynek added 2 commits July 25, 2014 12:05

Add Jco's writeThrough

bea09ef

Rename snapshotExecution to forceToDiskExecution for consistency

03eebcf

jcoveney reviewed Jul 27, 2014

johnynek added 2 commits July 27, 2014 09:27

Merge with develop

7aac46b

Add Kmeans test to execution

f491a27

jcoveney added a commit that referenced this pull request Jul 28, 2014

Merge pull request #974 from twitter/execution_monad

2da0540

Make an Execution[T] type, which is a monad, which makes composing Jobs easy.

jcoveney merged commit 2da0540 into develop Jul 28, 2014

caniszczyk deleted the execution_monad branch May 18, 2015 19:32


johnynek commented Jul 25, 2014

jcoveney Jul 25, 2014

johnynek Jul 25, 2014

ianoc Jul 25, 2014

jcoveney Jul 25, 2014

ianoc Jul 25, 2014

johnynek Jul 25, 2014

jcoveney Jul 25, 2014

johnynek Jul 25, 2014

johnynek Jul 25, 2014

johnynek Jul 25, 2014

johnynek commented Jul 25, 2014

johnynek commented Jul 27, 2014

jcoveney commented Jul 27, 2014

johnynek commented Jul 27, 2014

jcoveney commented Jul 27, 2014

jcoveney Jul 27, 2014

johnynek Jul 27, 2014

jcoveney Jul 27, 2014

jcoveney commented Jul 27, 2014

jcoveney commented Jul 27, 2014

johnynek commented Jul 27, 2014

jcoveney commented Jul 27, 2014
