Updated documentation

1 parent 94d5178 commit d6714953a95739a7be81e112ad773d10383a7703 Ville Tuulos committed Apr 8, 2009
Showing with 115 additions and 1 deletion.
  1. +3 −1 doc/faq.rst
  2. +1 −0 doc/index.rst
  3. +11 −0 doc/py/core.rst
  4. +74 −0 doc/py/disco_worker.rst
  5. +26 −0 doc/releases.rst
4 doc/faq.rst
@@ -55,7 +55,7 @@ also for development in general. It is highly recommended that you test
your functions first locally with :mod:`homedisco`, before running them
in the normal distributed Disco environment.
-.. _reduceonly:
+.. _outputtypes:
How can I output arbitrary Python objects in map and reduce, not only strings?
@@ -71,6 +71,8 @@ If you want to output arbitrary objects in your reduce function, set also
:func:`disco.core.result_iterator` to read results, set its *reader* parameter
to :func:`disco.func.object_reader`.
+.. _reduceonly:
Do I always have to provide a function for map and reduce?
*Updated for Disco 0.2 which supports the reduce-only case*
1 doc/index.rst
@@ -14,6 +14,7 @@ Background
FAQ <faq>
+ releases
Getting started
11 doc/py/core.rst
@@ -109,6 +109,17 @@ anymore. You can delete the unneeded job files as follows::
Returns a dictionary containing information about the job *name*.
+ .. method:: Disco.oob_get(name, key)
+ Returns an out-of-band value assigned to *key* for the job *name*.
+ The key-value pair was stored with a :func:`disco_worker.put` call
+ in the job *name*.
+ .. method:: Disco.oob_list(name)
+ Returns all out-of-band keys for the job *name*. Keys were stored by
+ the job *name* using the :func:`disco_worker.put` function.
.. method:: Disco.wait(name[, poll_interval, timeout, clean])
Block until the job *name* has finished. Returns a list of URLs to the
74 doc/py/disco_worker.rst
@@ -15,6 +15,80 @@ As job functions are imported to the :mod:`disco_worker` namespace
for execution, they can use functions in this module directly without
importing the module explicitly.
+.. _oob:
+Out-of-band results
+*(new in version 0.2)*
+In addition to standard input and output streams, map and reduce tasks can
+output results through an auxiliary channel called *out-of-band results* (OOB).
+In contrast to the standard output stream, which is sequential, OOB results
+can be accessed by unique keys.
+Out-of-band results should not be used as a substitute for the normal output
+stream. Each OOB key-value pair is saved to an individual file, which wastes
+space when values are small and is inefficient to access randomly in bulk.
+Due to these limitations, OOB results are mainly suitable for outputting, e.g.,
+statistics and other metadata about the actual results.
+To prevent rogue tasks from overwhelming nodes with a large number of OOB
+results, each task is allowed to output at most 1000 results (:func:`put` calls).
+Hitting this limit is often a sign that you should use the normal output stream
+for your results instead.
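One way to stay well below the limit is to fold many small statistics into a single OOB value. The following is a minimal sketch, not part of Disco: the helper names ``encode_stats`` and ``decode_stats`` are hypothetical, and it assumes that OOB values are plain strings and that statistic names contain neither whitespace nor ``=``.

```python
def encode_stats(stats):
    """Pack a dict of integer counters into one string, e.g. 'errors=2 lines=10'."""
    return " ".join("%s=%d" % (name, stats[name]) for name in sorted(stats))


def decode_stats(text):
    """Inverse of encode_stats: parse 'name=value' items back into a dict."""
    pairs = (item.split("=") for item in text.split())
    return dict((name, int(value)) for name, value in pairs)


# Inside a map task one could then issue a single put() call, for example
# (assumption: the key prefix 'map-stats' is only an illustrative choice):
#     put("map-stats:%d" % this_partition(), encode_stats(stats))
packed = encode_stats({"lines": 10, "errors": 2})
unpacked = decode_stats(packed)
```

With this pattern a task uses one OOB key regardless of how many counters it tracks.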
+You cannot use OOB results as a communication channel between concurrent tasks.
+Concurrent tasks need to be independent to preserve desirable fault-tolerance
+and scheduling characteristics of the map/reduce paradigm. However, in the
+reduce phase you can access OOB results produced in the preceding map phase.
+Similarly you can access OOB results produced by other finished jobs, given
+a job name.
+You can retrieve OOB results outside tasks using the :meth:`disco.core.Disco.oob_list` and
+:meth:`disco.core.Disco.oob_get` methods.
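As an illustration, the two methods can be combined to fetch every OOB result of a finished job into a dictionary. This is a sketch under stated assumptions: ``collect_oob`` is a hypothetical helper, ``oob_list`` is assumed to return an iterable of keys, and ``FakeDisco`` is an in-memory stand-in used only so that the example runs without a Disco master.

```python
def collect_oob(disco, name):
    """Return {key: value} for all out-of-band results of the job *name*."""
    return dict((key, disco.oob_get(name, key)) for key in disco.oob_list(name))


class FakeDisco(object):
    """Hypothetical stand-in exposing only oob_list and oob_get (not the real class)."""

    def __init__(self, results):
        self._results = results  # {job name: {key: value}}

    def oob_list(self, name):
        return list(self._results[name])

    def oob_get(self, name, key):
        return self._results[name][key]


fake = FakeDisco({"wordcount": {"map-stats:0": "lines=10", "map-stats:1": "lines=7"}})
stats = collect_oob(fake, "wordcount")
```

With a real installation, the ``disco`` argument would be a :class:`disco.core.Disco` instance connected to the master.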
+.. function:: put(key, value)
+Stores an out-of-band result *value* with the key *key*. Key must be unique in
+this job. Maximum key length is 256 characters. Only characters in the set
+``[a-zA-Z_\-:0-9]`` are allowed in the key.
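The key rules above can be checked before calling :func:`put`. The following sketch is not part of Disco; ``is_valid_oob_key`` is a hypothetical helper, and rejecting the empty key is an assumption that the documentation does not state.

```python
import re

# 1 to 256 characters from the documented set [a-zA-Z_\-:0-9].
# \Z anchors at the very end, so a trailing newline is rejected as well.
_OOB_KEY = re.compile(r"[a-zA-Z_\-:0-9]{1,256}\Z")


def is_valid_oob_key(key):
    """Return True if *key* satisfies the documented length and character rules."""
    return _OOB_KEY.match(key) is not None
```

Note that this checks only the format of the key; uniqueness within the job still has to be ensured by the caller.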
+.. function:: get(key[, job])
+Gets the out-of-band result assigned to the key *key*. The job name *job*
+defaults to the current job.
+Given the semantics of OOB results (see above), this means that the default
+value is useful only in the reduce phase, which can access results produced
+in the preceding map phase.
+Utility functions
+.. function:: this_partition()
+For a map task, returns an integer between *[0..nr_maps]* that identifies
+the task. This value is mainly useful if you need to generate unique IDs
+in each map task. There are no guarantees about how IDs are assigned
+for map tasks.
+For a reduce task, returns an integer between *[0..nr_reduces]* that
+identifies this partition. You can use a custom partitioning function to
+assign key-value pairs to a particular partition.
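To illustrate the unique-ID use case, a task can combine its partition number with a local counter. This is a minimal sketch: ``make_id_generator`` is a hypothetical helper, and in a real task the argument would come from ``this_partition()``. Since map and reduce tasks may share partition numbers, such IDs are unique only within one phase of a job.

```python
def make_id_generator(partition):
    """Yield IDs of the form '<partition>:<counter>', unique within one phase."""
    counter = 0
    while True:
        yield "%d:%d" % (partition, counter)
        counter += 1


# In a task: ids = make_id_generator(this_partition())
ids = make_id_generator(3)
first = next(ids)
second = next(ids)
```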
+.. function:: this_host()
+Returns the hostname of the node that is currently executing the task.
+.. function:: this_master()
+Returns the hostname and port of the Disco master.
+.. function:: this_inputs()
+Returns the list of input files for this task.
.. function:: msg(message)
Sends the string *message* to the master for logging. The message is
26 doc/releases.rst
@@ -0,0 +1,26 @@
+Release notes
+Disco 0.2 (April 7th 2009)
+New features
+ - :ref:`oob`: A mechanism to produce auxiliary results in map/reduce tasks.
+ - Map writers, reduce readers and writers (see :meth:`disco.core.Disco.new_job`): Support for custom result formats and internal protocols.
+ - Support for arbitrary output types: :ref:`outputtypes`.
+ - Custom task initialization functions: See *map_init* and *reduce_init* in :meth:`disco.core.Disco.new_job`.
+ - Jobs without inputs, i.e. generator maps: See the ``raw://`` protocol in :meth:`disco.core.Disco.new_job`.
+ - Reduces without maps for efficient join and merge operations: See :ref:`reduceonly`.
+ - ``chunked = false`` mode produced incorrect input files for the reduce phase (commit db718eb6)
+ - Shell enabled for the disco master process (bug #7, commit 7944e4c8)
+ - Added warning about unknown parameters in ``new_job()`` (bug #8, commit db707e7d)
+ - Fix for sending invalid configuration data (bug #1, commit bea70dd4)
+ - Fixed missing ``msg``, ``err`` and ``data_err`` functions (commit e99a406d)
