Improvement #2

vwxyzjn · 2023-02-27T22:12:06Z

This PR does a few things.

Adds a make_bulk_array that combines the lists of DeviceArrays into bulk DeviceArrays. E.g., obs is a list containing args.num_steps * args.async_update instances of DeviceArrays, and make_bulk_array combines them into a single DeviceArray. This makes data transfer from the actor's device to the learner's device faster — instead of making MemcpyP2P of args.num_steps * args.async_update times, we do it just one time. Related Why does device-to-device transfer (`DevicePut`) block computation in another thread? google/jax#14693
- Preliminary experiments show about ~10% speed increase and some compilation time reduction.
When making the env, change from envs = make_env(args.env_id, args.seed, args.local_num_envs, args.async_batch_size)() to envs = make_env(args.env_id, args.seed + jax.process_index(), args.local_num_envs, args.async_batch_size)(). While this change does not matter when args.async_batch_size != args.local_num_envs, when they are equal, the seeds are essentially the same across replicated processes and are respected by envpool, which is bad: we will be dealing with the same data in different processes.

Both changes do not impact sample efficiency performance, so benchmark rerun is unnecessary.

Improvement

f271d24

vwxyzjn merged commit a3d7584 into main Feb 27, 2023

Provide feedback