DistributedReinforcementLearning.jl

If it works, it works everywhere!


Design

Components

  • 👷 Worker, a worker creates a task to run an experiment in the background. It periodically sends out the transitions between the agent and the environment and fetches the latest parameters.
  • 📢 WorkerProxy, a worker proxy collects the messages from and to the workers on the same node so that some message data (e.g. model parameters) can be shared across different workers.
  • 💿 TrajectoryManager, a trajectory manager is a wrapper around an AbstractTrajectory. It takes in bulk transitions and samples a batch of training data in response to requests.
  • 💡 Trainer, a trainer is a wrapper around an AbstractPolicy. It does nothing but update its internal parameters when it receives a batch of training data and periodically broadcast its latest parameters (a toy sketch of this behavior follows the list).
  • ⏱️ Orchestrator, an orchestrator is in charge of starting, stopping, and pacing the communication between the components above.
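
The following is a toy, self-contained sketch of the Trainer behavior just described: update the internal parameters on every batch and broadcast them every few updates. Everything in it (the ToyTrainer name, its fields, and handle_batch!) is an illustrative assumption, not this package's API.

```julia
# Toy sketch only: a "trainer" that updates a parameter vector on every batch
# and pushes a copy of its parameters downstream every `sync_every` updates.
mutable struct ToyTrainer
    params::Vector{Float64}        # stand-in for the wrapped policy's parameters
    n_updates::Int                 # number of batches processed so far
    sync_every::Int                # broadcast parameters every `sync_every` updates
    out::Channel{Vector{Float64}}  # mailbox of whoever consumes the latest parameters
end

function handle_batch!(t::ToyTrainer, batch::Vector{Float64})
    t.params .+= 0.01 .* batch       # stand-in for a real policy update on the batch
    t.n_updates += 1
    if t.n_updates % t.sync_every == 0
        put!(t.out, copy(t.params))  # periodically broadcast the latest parameters
    end
end
```

For example, with `t = ToyTrainer(zeros(4), 0, 2, Channel{Vector{Float64}}(8))`, the first `handle_batch!(t, rand(4))` only updates the parameters, while the second call also pushes a copy of them to `t.out`.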

Note that:

  1. We adopt the actor model here: each instance of the components above is an actor, and they communicate only by passing messages.
  2. A node is a Julia process. Different nodes may run on one machine or be spread across several machines.
  3. The tasks in different workers are started with Threads.@spawn. By design, there is no direct communication between them (a minimal sketch of this actor pattern follows the list).
  4. In a single-node setup (the WorkerNode and the MainNode are the same one), the WorkerProxy can be removed and the workers communicate with the Orchestrator directly.
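
Notes 1 and 3 above describe the concurrency model. Below is a minimal, self-contained sketch of that pattern in plain Julia: each actor owns a Channel as its mailbox plus a task spawned with Threads.@spawn that handles one message at a time. The Actor, actor, and send names are illustrative, not part of this package.

```julia
# Minimal actor sketch: a mailbox (Channel) plus a background task draining it.
struct Actor
    mailbox::Channel{Any}
    task::Task
end

# Spawn an actor whose behavior is given by the callback `handle(msg)`.
function actor(handle; buffer = 32)
    mailbox = Channel{Any}(buffer)
    task = Threads.@spawn for msg in mailbox   # blocks until a message arrives
        handle(msg)
    end
    Actor(mailbox, task)
end

# Sending a message is simply putting it into the target actor's mailbox.
send(a::Actor, msg) = put!(a.mailbox, msg)
```

Because such actors share no state and interact only through their mailboxes, the same wiring can carry over when a local Channel is swapped for a RemoteChannel from the Distributed standard library, which is one way to make the single-machine and multi-machine setups look alike.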

Messages

  • 1️⃣ (👷 → 📢) InsertTransitionMsg, contains the local transitions between the agent and the environment in an experiment (illustrative definitions of these message types are sketched after this list).
  • 2️⃣ (📢 → ⏱️) InsertTransitionMsg, the transition messages collected from different workers.
  • 3️⃣ (⏱️ → 💿) InsertTransitionMsg and SampleBatchMsg (which contains the address of the Trainer).
  • 4️⃣ (💿 → 💡) BatchTrainingDataMsg
  • 5️⃣ (💡 → 💿) UpdatePriorityMsg, only needed for algorithms that use prioritized experience replay.
  • 6️⃣ (💡 → ⏱️) LoadParamsMsg, contains the latest parameters of the policy.
  • 7️⃣ (⏱️ → 📢) LoadParamsMsg
  • 8️⃣ (📢 → 👷) LoadParamsMsg
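
To make the message list above concrete, here are illustrative Julia definitions of the message types. Only the type names come from this design note; the field names and type parameters are assumptions.

```julia
# Illustrative message types; the fields are assumptions, not this package's API.
struct InsertTransitionMsg{T}
    transitions::T    # local transitions collected by a worker
end

struct SampleBatchMsg{A}
    trainer_addr::A   # where the sampled batch should be sent
end

struct BatchTrainingDataMsg{B}
    batch::B          # a batch of training data sampled from the trajectory
end

struct UpdatePriorityMsg{K,P}
    keys::K           # which transitions were sampled
    priorities::P     # their new priorities (prioritized replay only)
end

struct LoadParamsMsg{P}
    params::P         # the latest parameters of the policy
end
```

A component's message loop can then use ordinary multiple dispatch, with one method per message kind, e.g. `handle(trainer, msg::BatchTrainingDataMsg)`.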
