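```python
from itertools import product
import numpy as np

# Use lists: generators can't be indexed (envs[0]) and are exhausted after one pass.
envs = [RandomMDPEnv(10, 10, 'r1', 'c1', transition_seed=1066) for _ in range(10)]
Ps = [env.transition_probabilities for env in envs]
pairs = list(product(envs[0].states, envs[0].actions))
# Compare every pair of environments on every (state, action) pair.
all(np.all(P_a(*pair) == P_b(*pair)) for P_a, P_b in product(Ps, Ps) for pair in pairs)
```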
This returns True, so I think setting transition_seed is enough to guarantee that RandomMDPEnv always returns the same environment.
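To see why a single seed is enough, here is a minimal sketch of how a seeded random MDP can be generated. It does not reproduce RandomMDPEnv's internals: make_random_transitions is a hypothetical helper, and drawing each next-state distribution from a flat Dirichlet is an assumption chosen for illustration. The point is that every random number comes from one generator built from the seed, so equal seeds give identical transition tensors.

```python
import numpy as np


def make_random_transitions(n_states, n_actions, seed):
    """Return a random transition tensor P[s, a, s'] (hypothetical helper).

    All randomness comes from a generator built from `seed`, so equal seeds
    always produce identical tensors.
    """
    rng = np.random.default_rng(seed)
    # One Dirichlet(1, ..., 1) draw per (state, action); each row sums to 1.
    return rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))


P1 = make_random_transitions(10, 10, seed=1066)
P2 = make_random_transitions(10, 10, seed=1066)
print(np.array_equal(P1, P2))  # True
```

Because the generator is local to the call rather than NumPy's global state, other code drawing random numbers in between cannot change the result.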
The keyword argument training_seed in RandomMDPEnv appears to be superfluous, since an agent responsible for training should be making the appropriate call to np.random.seed anyway.
I would be happy to get rid of training_seed. Setting the seed in one place, and letting all the downstream consequences flow from that, is by far the best option.
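A minimal sketch of the "one seed in one place" idea, using NumPy's Generator and SeedSequence API instead of the legacy global np.random.seed. The helper make_generators is hypothetical and not part of RandomMDPEnv; it splits one master seed into independent streams for the environment and the training agent, so both are reproducible from a single number.

```python
import numpy as np


def make_generators(seed):
    """Split one master seed into independent generators (hypothetical helper).

    Returns (env_rng, agent_rng); both are fully determined by `seed`.
    """
    env_seq, agent_seq = np.random.SeedSequence(seed).spawn(2)
    return np.random.default_rng(env_seq), np.random.default_rng(agent_seq)


env_rng, agent_rng = make_generators(1066)
# The environment would draw only from env_rng and the agent only from
# agent_rng (an assumed convention for this sketch).
```

With separate streams, changing how many random numbers the agent consumes during training cannot shift the numbers the environment sees.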
Issue
We need to be able to seed RandomMDPEnv so that, whenever identical seeds are provided, identical MDPEnvs are produced.
Question
Is this already possible with the current class definition?