Better support for Iterative Jobs / Temporary outputs #258

Open
johnynek opened this issue Dec 22, 2012 · 0 comments

If you want to do an iterative job, the pattern is:

Read some previous state, compute some new state, and check whether you should stop. If you should, next returns None; otherwise next returns Some(job), where job is probably a copy of the current job.
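The pattern above can be sketched without any Scalding dependency. This is only an illustration: IterativeJob, HalvingJob, and Driver are hypothetical names, not Scalding classes, and the "state" here is a plain Double rather than data read from a source.

```scala
// Hypothetical stand-in for a job; not Scalding's actual Job class.
trait IterativeJob {
  def run(): Boolean               // true if the job succeeded
  def next: Option[IterativeJob]   // None means the iteration is finished
}

// Toy job: the "new state" is half of the previous state.
final class HalvingJob(previous: Double, tolerance: Double) extends IterativeJob {
  // Compute some new state from the previous state.
  val state: Double = previous / 2.0

  def run(): Boolean = true

  // Check convergence: stop once the state changes by less than the tolerance,
  // otherwise continue with a copy of this job fed with the new state.
  def next: Option[IterativeJob] =
    if (math.abs(previous - state) < tolerance) None
    else Some(new HalvingJob(state, tolerance))
}

object Driver {
  // Runs jobs until next returns None; returns how many jobs were run.
  @annotation.tailrec
  def runAll(job: IterativeJob, jobsRun: Int = 1): Int = {
    require(job.run(), "job failed")
    job.next match {
      case Some(following) => runAll(following, jobsRun + 1)
      case None            => jobsRun
    }
  }
}
```

In a real job the previous and new state would live in temporary sources on disk, which is exactly the bookkeeping this issue is about.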

The problem is that the programmer has to manage all the temporary sources that are created to check convergence.

I imagine something like a TempSourceFactory that keeps track of temporary sources (probably Cascading sequence files) that you can pass between jobs.

After the last job is run, the TempSourceFactory cleans up all the allocated data on the disk (which is to say, all of this data is ephemeral).

My design idea is a map inside the TempSourceFactory object from a UUID to a Map[String,Source]. Each iterative job has one UUID, and that UUID can be accessed from any job in the iteration.

This should probably be plumbed through with an API on TempSourceFactory, something like:

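object TempSourceFactory {
  // Temporary source registered under `name` for the current iterative job.
  def apply[T](name: String)(implicit mf: Manifest[T], args: Args): Mappable[T]
  // Removes all temporary data allocated by the factory.
  def cleanup(): Unit
}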
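
As a rough illustration of the bookkeeping described above, here is a minimal self-contained sketch. It is not the proposed API: TempPathFactory is a hypothetical name, it hands out local temporary directories (java.nio.file.Path) instead of Scalding sources, and it takes the run UUID as an explicit argument instead of reading it from an implicit Args.

```scala
import java.io.File
import java.nio.file.{Files, Path}
import java.util.UUID
import scala.collection.mutable

// Hypothetical sketch: tracks temp directories per iterative run.
object TempPathFactory {
  // run id -> (logical name -> allocated temp directory)
  private val allocated = mutable.Map.empty[UUID, mutable.Map[String, Path]]

  // Returns the directory registered under `name` for this run,
  // creating it the first time the name is requested.
  def apply(runId: UUID, name: String): Path = synchronized {
    val byName = allocated.getOrElseUpdate(runId, mutable.Map.empty[String, Path])
    byName.getOrElseUpdate(name, Files.createTempDirectory(name + "-"))
  }

  // Deletes everything allocated for this run and forgets the run.
  def cleanup(runId: UUID): Unit = synchronized {
    allocated.remove(runId).foreach { byName =>
      byName.values.foreach(p => deleteRecursively(p.toFile))
    }
  }

  // Deletes children first, then the file or directory itself.
  private def deleteRecursively(f: File): Unit = {
    Option(f.listFiles).foreach(_.foreach(deleteRecursively))
    f.delete()
  }
}
```

Every job in the same iteration that asks for the same name under the same UUID gets the same location, so one job can write state that the next job reads, and a single cleanup call after the last job removes all of it.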