Training loss of UrbanDriver diverges easily. Are there any training tricks to stabilize the training process? #383

Open
shubaozhang opened this issue Apr 1, 2022 · 5 comments

Comments

@shubaozhang

Are there any training tricks for stabilizing the training process of UrbanDriver?

  1. Optimizer params: learning rate, batch_size, etc. (see the sketch after this list)
  2. Since UrbanDriver is an offline RL method, are there any tricks to constrain the simulation?
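
A minimal sketch of the kind of settings point 1 asks about, assuming `model`, `train_dataloader`, and `device` are set up as in the l5kit urban_driver training example; the learning rate and clipping threshold below are placeholders, not recommendations:

```python
import torch

# Assumes `model`, `train_dataloader`, and `device` come from the l5kit urban_driver
# training example; the numeric values below are placeholders.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # conservative learning rate

model.train()
for data in train_dataloader:
    data = {k: v.to(device) for k, v in data.items()}
    loss = model(data)["loss"]  # the training example reads the loss from the returned dict

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping is a common way to keep losses from blowing up when
    # backpropagating through long unrolls.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```
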
@perone
Contributor

perone commented Apr 1, 2022

Hi @shubaozhang, none of the authors of this work are at Level-5 anymore, so you could try reaching out to them directly. That said, my personal take is that it doesn't seem to be an offline RL method, for several reasons (you are not optimizing an expected reward but an imitation loss, you still have differentiable loss minimization and a differentiable simulation, there is no exploration, etc.), so it is quite different from an RL setting, or from the setting the policy gradient theorem is derived from.
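
To make that distinction concrete, here is a toy sketch (not l5kit code): a policy unrolled through a differentiable simulation step and trained with an imitation (MSE) loss against a logged trajectory; there is no reward, no exploration, and no policy-gradient estimator anywhere.

```python
import torch

# Toy closed-loop imitation learning: gradients flow through the rollout itself (BPTT),
# and the objective is a supervised loss against logged expert states.
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

expert_states = torch.randn(10, 2)       # stand-in for a logged expert trajectory
state = expert_states[0].clone()

loss = torch.zeros(())
for t in range(1, expert_states.shape[0]):
    action = policy(state)               # predicted displacement
    state = state + action               # differentiable "simulation" step
    loss = loss + torch.nn.functional.mse_loss(state, expert_states[t])

optimizer.zero_grad()
loss.backward()                          # backprop through time, through the whole rollout
optimizer.step()
```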

@shubaozhang
Author

Thanks for your reply

@jeffreywu13579

Hi @shubaozhang, were you ever able to figure out the issues with training? I'm also having difficulty getting my trained UrbanDriver model (specifically the open-loop-with-history variant) to match the performance of the provided pretrained models.

@shubaozhang
Author

The parameters history_num_frames_ego, future_num_frames, etc. affect the result a lot. I used the following parameters, and the training loss converges.
[screenshot of the config parameters attached]
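
For reference, those fields live under `model_params` in the l5kit urban_driver config; a minimal sketch of overriding them after loading the config (the numbers below are placeholders, the values that worked for the poster are only in the screenshot above):

```python
from l5kit.configs import load_config_data

# Placeholder values only -- the working values are in the screenshot above.
cfg = load_config_data("./config.yaml")             # the urban_driver example config
cfg["model_params"]["history_num_frames_ego"] = 4   # placeholder
cfg["model_params"]["future_num_frames"] = 12       # placeholder
```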

@jeffreywu13579

Hi @shubaozhang, thanks for these configurations! Have you tried evaluating your trained model in closed_loop_test.ipynb and visualizing the scenes? My trained UrbanDriver model (with the configs above, as well as with the default configs) trained for 150k iterations on train_full.zarr still seems to converge to a degenerate solution (such as just driving straight ahead regardless of the map), whereas the provided pre-trained BPTT.pt does not have this issue. Were there any additional changes you had to make (such as to closed_loop_model.py or open_loop_model.py)?

Also, by chance, have you tried loading the state dict of the pretrained BPTT.pt model (as opposed to using the JIT model directly)? It seems the provided configs do not work when trying to load the state dict. I had to change d_local from 256 to 128 in open_loop_model.py to get the pretrained state dict to load into my model, and there seem to be other mismatches as well (the shape of result['positions'] differs between the BPTT.pt JIT model and my model with the BPTT.pt state dict loaded).
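
One way to track down those mismatches is to list the parameter shapes stored in the TorchScript archive before trying to load them; this is plain PyTorch, assuming BPTT.pt is in the working directory:

```python
import torch

# Inspect the provided TorchScript checkpoint: discrepancies such as d_local 128 vs. 256
# show up directly in the parameter shapes.
jit_model = torch.jit.load("BPTT.pt", map_location="cpu")
jit_state = jit_model.state_dict()

for name, tensor in jit_state.items():
    print(name, tuple(tensor.shape))

# When rebuilding the Python model from open_loop_model.py to take these weights,
# strict=False reports missing/unexpected keys instead of raising on the first mismatch:
# my_model.load_state_dict(jit_state, strict=False)
```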
