commit 863ca31 (parent 40f2839)
Author: GreatYYX
Date: Apr 5, 2019

    update docs for scaling

docs/scaling_and_optimization.rst: 10 additions, 6 deletions

Some of the methods have optional or required arguments for buffer size, chunk size, etc.
Parallel processing
-------------------

Here you need to use a package called `pyrallel <https://github.com/usc-isi-i2/pyrallel>`_.

General parallel processing
```````````````````````````

If you have some compute-intensive procedures and your machine has more than one CPU core, `pyrallel.ParallelProcessor` is a tool to try. You can find more detailed information in its API documentation; in general, it encapsulates multiprocessing for parallel computation and multithreading for collecting results.

.. code-block:: python

    import pyrallel

    result = []

    def heavy_calculation(x, y):  # stand-in for the compute-intensive mapper (body elided in the diff)
        return x * x, y + 5

    def output_handler(r1, r2):  # collector: runs in a thread of the main process
        result.append(r1 if r1 > r2 else r2)

    pp = pyrallel.ParallelProcessor(8, mapper=heavy_calculation, collector=output_handler)
    pp.start()
    for i in range(8):
        pp.add_task(i, i + 1)
    pp.task_done()
    pp.join()
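
Here ``add_task`` hands a task to the worker processes, ``task_done`` signals that no more tasks will be added, and ``join`` blocks until all workers finish and the collector has drained.
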
MapReduce
`````````

The above solution uses one thread (in the main process) to collect calculated data. If you want to do something like divide and conquer, especially when the "conquer" step needs heavy calculation, try the `pyrallel.MapReduce` module.

.. code-block:: python

    import pyrallel

    def mapper(x):  # stand-in mapper (body elided in the diff); identity keeps the math simple
        return x

    def reducer(r1, r2):
        return r1 + r2

    mr = pyrallel.MapReduce(8, mapper, reducer)
    for i in range(10000):
        mr.add_task(i)
    mr.task_done()
    result = mr.join()
    print(result)
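
With the identity mapper sketched above, ``result`` is the sum of 0 through 9999, i.e. ``49995000``.
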
Then on worker machines, run::

    python -m rltk remote.worker <scheduler ip>:8786 --nprocs <processors>
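
For example, with a scheduler at ``192.168.0.10`` and four worker processes per machine (hypothetical values)::

    python -m rltk remote.worker 192.168.0.10:8786 --nprocs 4
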
Second, change your code a bit and run it. The API for distributed computing is very similar to `pyrallel.ParallelProcessor`, but you need a `rltk.remote.Remote` object, which connects to the scheduler, and an instance of `rltk.remote.Task`, which has an input and an output handler.

.. code-block:: python
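
    # A sketch under assumptions: the exact rltk.remote method names and
    # signatures are elided here; the calls below mirror the
    # pyrallel.ParallelProcessor example above and are illustrative only.
    import rltk.remote

    result = []

    def input_handler(x, y):  # runs on remote workers
        return x * x, y + 5

    def output_handler(r1, r2):  # runs locally, collects results
        result.append(r1 if r1 > r2 else r2)

    remote = rltk.remote.Remote('<scheduler ip>:8786')
    task = rltk.remote.Task(remote, input_handler=input_handler, output_handler=output_handler)
    task.start()
    for i in range(8):
        task.compute(i, i + 1)
    task.task_done()
    task.join()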
