No description, website, or topics provided.
Switch branches/tags
Nothing to show
Pull request Compare This branch is even with progrium:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Miyamoto Task Queue

Miyamoto is a fast, clusterable task queue inspired by Google App Engine's task 
queue. This means it speaks HTTP with a RESTful API for enqueuing tasks and 
uses HTTP callbacks (webhooks) for processing tasks. Worker daemons can be any 
web daemon.

End result? Super easy asynchronous processing.

Like the App Engine task queue, Miyamoto features task scheduling and rate 
limiting. Unlike the App Engine task queue, it provides idempotency semantics
and helps you avoid duplicate execution of tasks.

Miyamoto is designed for:
 - High throughput (fast)
 - Fault tolerance (highly available)
 - Data replication (minimal lost messages)
 - No major dependencies or client libs (easy to set up and use)
 - Simple design/implementation (easy to hack!)

Miyamoto is bad if you want:
 - Slow disk persistence (but we may add it)
 - Traditional task polling
 - Smart clients
 - AMQP, JMS, STOMP or other interfaces*

Not out of the box, but you can still achieve:
 - Synchronous tasks
 - Return values / result storage
 - Workers behind firewall / NAT
 - Dynamically adding to the worker pool
* = Excluding ZeroMQ, which is used internally, so there are optional ZMQ interfaces

Make sure you have Python 2.6+ installed with gevent. Run on port 8088 with:

    INTERFACE= python scripts/


Once it's running (currently hardcoded to 8088), set up a pretend web server with netcat:

    nc -l 9099

Now let's post a task to it scheduled to run in 5 seconds: 

    curl -d "task.url=http://localhost:9099/worker&task.countdown=5&foo=bar" http://localhost:8088/queue

You should get back some JSON:

    {"status": "scheduled", "id": "7e998f34-bed7-44f4-80cd-49620b0ab562"}

And in 5 seconds, you should see on your listening netcat:

    POST /worker HTTP/1.1
    Accept-Encoding: identity
    Content-Length: 7
    Host: localhost:9099
    Content-Type: application/x-www-form-urlencoded
    X-Task-Id: 7e998f34-bed7-44f4-80cd-49620b0ab562
    Connection: close
    User-Agent: miyamoto/0.1

You can also add tasks as JSON objects. Here's one without a delay:

    curl -H "Content-Type: applicatoin/json" -d '{"url": "http://localhost:9099/worker", "params": {"foo": "bar"}}' http://localhost:8088/queue


In theory, you'd run a cluster of Miyamoto nodes. Each one is exactly the same, 
except that one will be the leader. You seed the leader with the first node, but
otherwise, leaders are elected if, say, the leader dies. Internally there is a list
of hosts in the cluster that form a ring. Whenever you add tasks to any node, it 
will send the task round robin to another node in the cluster for basic load 
balancing. But it will do this N times where N is a replication factor. Each replicant
has an added delay to execution. If any of them run, the others are canceled.
In this way you can schedule tasks and not worry that hosts might go down, achieving
HA of service and data. 


- rate limiting
- idempotency safeguards
- error handling


App Engine - API, features
Celery - features, anti-features
Memcache - ideas around consistent hashing
Membase - replication, clustering
ZooKeeper - realtime group membership, leader election
Kestrel - speed, simplicity
ZeroMQ - oh, the possibilities


Evan Cooke - offset replica scheduling idea
JJ Behrens - code review, feedback