Training loss of UrbanDriver diverges easily. Are there any training tricks to stabilize the training process? #383

Open
shubaozhang opened this issue Apr 1, 2022 · 5 comments

Comments

@shubaozhang

Are there any training tricks for stabilizing the training process of UrbanDriver?

  1. Optimizer params: learning rate, batch_size, etc. (see the sketch after this list)
  2. Since UrbanDriver is an offline RL method, are there any tricks to constrain the simulation?
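
A minimal sketch of the kind of settings point 1 asks about, assuming `model`, `train_dataloader`, and `device` are set up as in the l5kit urban_driver training example; the learning rate and clipping threshold below are placeholders, not recommendations:

```python
import torch

# Assumes `model`, `train_dataloader`, and `device` come from the l5kit urban_driver
# training example; the numeric values below are placeholders.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # conservative learning rate

model.train()
for data in train_dataloader:
    data = {k: v.to(device) for k, v in data.items()}
    loss = model(data)["loss"]  # the training example reads the loss from the returned dict

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping is a common way to keep losses from blowing up when
    # backpropagating through long unrolls.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```
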
@perone
Contributor

perone commented Apr 1, 2022

Hi @shubaozhang, none of the authors of this work are at Level-5 anymore, so you could try reaching out to them directly. That said, my personal take is that it doesn't seem to be an offline RL method, for several reasons (you are not optimizing an expected reward but an imitation loss, you still have differentiable loss minimization and a differentiable simulation, there is no exploration, etc.), so it is quite different from an RL setting, or from the setting the policy gradient theorem is derived from.
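
To make that distinction concrete, here is a toy sketch (not l5kit code): a policy unrolled through a differentiable simulation step and trained with an imitation (MSE) loss against a logged trajectory; there is no reward, no exploration, and no policy-gradient estimator anywhere.

```python
import torch

# Toy closed-loop imitation learning: gradients flow through the rollout itself (BPTT),
# and the objective is a supervised loss against logged expert states.
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

expert_states = torch.randn(10, 2)       # stand-in for a logged expert trajectory
state = expert_states[0].clone()

loss = torch.zeros(())
for t in range(1, expert_states.shape[0]):
    action = policy(state)               # predicted displacement
    state = state + action               # differentiable "simulation" step
    loss = loss + torch.nn.functional.mse_loss(state, expert_states[t])

optimizer.zero_grad()
loss.backward()                          # backprop through time, through the whole rollout
optimizer.step()
```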

@shubaozhang
Author

Thanks for your reply

@jeffreywu13579

Hi @shubaozhang, were you ever able to figure out the issues with training? I'm also having difficulty getting my trained UrbanDriver model (specifically the open-loop-with-history variant) to match the performance of the provided pretrained models.

@shubaozhang
Author

The parameters history_num_frames_ego, future_num_frames, etc. affect the result a lot. I used the following parameters, and the training loss converges.
[screenshot of the config parameters attached]
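
For reference, those fields live under `model_params` in the l5kit urban_driver config; a minimal sketch of overriding them after loading the config (the numbers below are placeholders, the values that worked for the poster are only in the screenshot above):

```python
from l5kit.configs import load_config_data

# Placeholder values only -- the working values are in the screenshot above.
cfg = load_config_data("./config.yaml")             # the urban_driver example config
cfg["model_params"]["history_num_frames_ego"] = 4   # placeholder
cfg["model_params"]["future_num_frames"] = 12       # placeholder
```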

@jeffreywu13579

Hi @shubaozhang, thanks for these configurations! Have you tried evaluating your trained model in closed_loop_test.ipynb and visualizing the scenes? My trained UrbanDriver model (with the configs above, as well as with the default configs) trained for 150k iterations on train_full.zarr still seems to converge to a degenerate solution (such as just driving straight ahead regardless of the map), whereas the provided pre-trained BPTT.pt does not have this issue. Were there any additional changes you had to make (such as to closed_loop_model.py or open_loop_model.py)?

Also, by chance, have you tried loading the state dict of the pretrained BPTT.pt model (as opposed to using the JIT model directly)? It seems the provided configs do not work when trying to load the state dict. I had to change d_local from 256 to 128 in open_loop_model.py to get the pretrained state dict to load into my model, and there seem to be other mismatches as well (the shape of result['positions'] differs between the BPTT.pt JIT model and my model with the BPTT.pt state dict loaded).
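
One way to track down those mismatches is to list the parameter shapes stored in the TorchScript archive before trying to load them; this is plain PyTorch, assuming BPTT.pt is in the working directory:

```python
import torch

# Inspect the provided TorchScript checkpoint: discrepancies such as d_local 128 vs. 256
# show up directly in the parameter shapes.
jit_model = torch.jit.load("BPTT.pt", map_location="cpu")
jit_state = jit_model.state_dict()

for name, tensor in jit_state.items():
    print(name, tuple(tensor.shape))

# When rebuilding the Python model from open_loop_model.py to take these weights,
# strict=False reports missing/unexpected keys instead of raising on the first mismatch:
# my_model.load_state_dict(jit_state, strict=False)
```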
