Timeout error: "WORKER ERROR, A WORKER EXPIRED! Did not get results or even errors back from all workers!" #1

Open
mattheworiordan opened this Issue Feb 13, 2010 · 0 comments

Hi

I recently started using Skynet and found it easy to get up and running quickly.

The only problem I have now is that my worker threads are timing out, and I am not sure what to do about it.

I am following a simple structure where I use a class similar to the following:

class BuildLandMassImagesMapreduce
  include SkynetDebugger

  def self.run
    data = BuildLandMassImages.new().get_tile_pairs

    job = Skynet::Job.new(
      :mappers          => 4,
      :reducers         => 1,
      :map_reduce_class => self,
      :map_data         => data,
      :map_timeout      => 10.minutes,
      :data_debug       => true
    )
    results = job.run
  end

  def self.map(tile_pairs)
    result = Array.new
    tile_pairs.each do |tile_pair|
      # the next step is quite intensive and can take up to a second for each request
      processed = Worker.heavy_lifting(tile_pair) ? true : false
      result << [tile_pair, processed]
    end
    result
  end

  def self.reduce(tiles)
    totals = Hash.new
    tiles.each do |tile|
      tile_pair, processed = tile
      if (tile_pair)
        totals[tile_pair] ||= 0
        totals[tile_pair] += 1 if processed
      end
    end
    totals.keys.sort.map { |key| [key, totals[key]] }
  end
end

Whenever I call the run method, the Skynet MapReduce system kicks off and seems to work. Watching the workers, I see about 1 active thread at first, building up to around 4. Then, after a short while, the following error appears in the console:

"WORKER ERROR, A WORKER EXPIRED! Did not get results or even errors back from all workers!"

I am not sure why I am receiving this error. I assumed that the MapReduce system distributes the tasks to the worker threads and manages how much data each one is given to process. If Skynet manages the load automatically, then I don't think the map method should be timing out. However, if Skynet only splits the data across the 4 threads, and each map call is expected to redistribute its share to further map calls, then I have misunderstood how to implement this. Could you advise where I am going wrong?
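
To make the concern concrete, here is a rough sketch of the arithmetic in plain Ruby. It assumes (I have not confirmed this against the Skynet source) that the map data is split evenly across the mappers and that each map call must finish within map_timeout. The figure of 5000 tile pairs and the helper names are hypothetical, chosen only for illustration; the 1 second per pair and the 10 minute timeout come from the code above.

```ruby
# Rough timing estimate. ASSUMPTION: map_data is split evenly across the
# mappers and each map call must finish within map_timeout.
SECONDS_PER_PAIR = 1.0      # worst case per tile pair, as noted in the map method
MAP_TIMEOUT      = 10 * 60  # 10 minutes, expressed in seconds
MAPPERS          = 4

# Largest number of tile pairs one map call can process before the timeout.
def max_pairs_per_task(timeout = MAP_TIMEOUT, seconds_per_pair = SECONDS_PER_PAIR)
  (timeout / seconds_per_pair).floor
end

# Worst-case seconds a single map call needs if the data is split evenly.
def worst_case_map_seconds(total_pairs, mappers = MAPPERS, seconds_per_pair = SECONDS_PER_PAIR)
  (total_pairs.to_f / mappers).ceil * seconds_per_pair
end

# Split the data into chunks small enough to finish within the timeout.
# `safety` is the fraction of the timeout each chunk is allowed to use.
def chunk_for_timeout(data, safety = 0.5)
  size = [(max_pairs_per_task * safety).floor, 1].max
  data.each_slice(size).to_a
end

data = (1..5000).to_a  # hypothetical stand-in for get_tile_pairs

puts worst_case_map_seconds(data.size)  # prints 1250.0, well over the 600 second timeout
puts chunk_for_timeout(data).size       # prints 17 (chunks of at most 300 pairs)
```

If the even-split assumption holds, any data set larger than about 2400 tile pairs would push a single map call past the 10 minute timeout, which would be consistent with the error I am seeing.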

Finally, slightly off topic: I have tried to use the SkynetDebugger info / log methods to debug these problems, without luck. What namespace or method do I need to use to call them from within a map method?
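
One possible explanation, which is general Ruby behaviour and not something I have verified about SkynetDebugger itself: `include` adds a module's methods as instance methods only, so they are not callable from a class method such as self.map unless the module is also applied with `extend`. The sketch below uses a hypothetical DemoDebugger module built on the standard library Logger; it is not the real SkynetDebugger API.

```ruby
require "logger"

# HYPOTHETICAL stand-in for a debugging mixin; not the real SkynetDebugger.
module DemoDebugger
  LOGGER = Logger.new($stdout)

  def info(msg)
    LOGGER.info(msg)
  end
end

# include only: info is available on instances, not on the class itself.
class IncludeOnly
  include DemoDebugger
end

# include + extend: info is also available inside class methods.
class IncludeAndExtend
  include DemoDebugger
  extend DemoDebugger

  def self.map(items)
    info("mapping #{items.size} items")  # works because of extend
    items
  end
end

puts IncludeOnly.respond_to?(:info)       # prints false
puts IncludeAndExtend.respond_to?(:info)  # prints true
IncludeAndExtend.map([1, 2, 3])           # logs an INFO line to stdout
```

If SkynetDebugger behaves like the include-only case, that would explain why calling info from self.map fails, but I would appreciate confirmation of the intended usage.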

Thanks,
Matthew O'Riordan
http://mattheworiordan.com
