Question on freezing target network #15

hashbangCoder · 2016-04-20T09:46:18Z

Hi @yenchenlin1994, love your implementation!
I went through your code and I can't seem to find where you've frozen the target network.
Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch?
Wouldn't that hurt your convergence rate badly?
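
For readers unfamiliar with the term, here is a minimal sketch of what "freezing the target network" usually means in DQN: the TD target is computed from a separate copy of the weights, and that copy is only refreshed every few training steps. This is not the code from this repo; it uses a tiny linear Q-function in plain Python, and `SYNC_EVERY`, `GAMMA`, `LEARNING_RATE`, and all function names are assumptions made for illustration.

```python
import copy
import random

GAMMA = 0.99         # discount factor (assumed value)
SYNC_EVERY = 4       # hypothetical number of training steps between target syncs
LEARNING_RATE = 0.1  # assumed step size


def q_values(weights, state):
    """Linear Q-function: weights[a] is the weight vector for action a."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def td_target(target_weights, reward, next_state, terminal):
    """Bootstrap from the frozen target weights, not the online ones."""
    if terminal:
        return reward
    return reward + GAMMA * max(q_values(target_weights, next_state))


def train_step(online_weights, target_weights, transition):
    """Update the online weights in place; the target weights are only read."""
    state, action, reward, next_state, terminal = transition
    y = td_target(target_weights, reward, next_state, terminal)
    error = q_values(online_weights, state)[action] - y
    row = online_weights[action]
    for i, s in enumerate(state):
        # Gradient of 0.5 * error**2 with respect to the taken action's weights.
        row[i] -= LEARNING_RATE * error * s


def run(num_steps, seed=0):
    """Train on random transitions, syncing the target every SYNC_EVERY steps."""
    rng = random.Random(seed)
    state_dim, n_actions = 3, 2
    online = [[rng.uniform(-1, 1) for _ in range(state_dim)]
              for _ in range(n_actions)]
    target = copy.deepcopy(online)
    sync_steps = []
    for step in range(1, num_steps + 1):
        transition = (
            [rng.uniform(-1, 1) for _ in range(state_dim)],  # state
            rng.randrange(n_actions),                        # action
            rng.uniform(-1, 1),                              # reward
            [rng.uniform(-1, 1) for _ in range(state_dim)],  # next state
            False,                                           # terminal flag
        )
        train_step(online, target, transition)
        if step % SYNC_EVERY == 0:
            # The only place the target weights ever change.
            target = copy.deepcopy(online)
            sync_steps.append(step)
    return online, target, sync_steps
```

Updating the target every batch corresponds to `SYNC_EVERY = 1`, in which case the target copy never lags behind the online weights.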

yenchenlin · 2016-04-20T10:25:50Z

Hello,
Yeah, you are right.
I actually have a reimplemented version.
Will submit it soon!

hashbangCoder · 2016-05-10T02:46:44Z

Hi again,
I'm trying to reproduce the results in Keras and have trained for ~400,000 steps, but the bird is still unable to cross the first pipe consistently. My loss is low though (~0.2), and the Q-values are in the range [0, 8]. How long did it take for you before it actually started working, i.e. crossing the first pipe consistently?

yenchenlin · 2016-05-10T06:12:41Z

I can't remember the exact number of iterations, but it was no more than ~1,000,000 steps.

xiahouzuoxin · 2017-05-27T07:29:27Z

I still cannot find where the target network is frozen in the current version of the code. Does it really have no effect?

zsy372901 · 2017-09-19T19:09:05Z

@hashbangCoder
I ran into the same problem: the silly bird stays at the top of the screen... Did you fix it?

weijinsong · 2017-12-08T02:54:49Z

I also couldn't find the code that freezes the target network. But thanks for your code. It's been helpful to me.

initial-h · 2018-06-05T03:56:05Z

I wrote a version based on this repo with a frozen target network: FlappyBird_DQN_with_target_network

patrick-llgc · 2019-01-29T03:01:15Z

Here is another repo with a target network: https://github.com/patrick-12sigma/DRL_FlappyBird

I made the target network an option. You can turn it on and off and experiment to see how much it affects the convergence of training.

I refactored the network into a class and added some logging functionality to track the training process. I also borrowed the human-play function from @initial-h. Thanks!
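
As a rough illustration of making the target network an option, the sketch below wraps the weights in a small class with an on/off flag. It is not taken from the linked repo: `QLearner`, `use_target_network`, `sync_every`, `bootstrap_weights`, and `after_update` are hypothetical names, and the weights are a plain list standing in for real network parameters.

```python
import copy


class QLearner:
    """Holds online weights and, optionally, a periodically synced frozen copy."""

    def __init__(self, weights, use_target_network=True, sync_every=1000):
        self.online = weights
        self.use_target_network = use_target_network
        self.sync_every = sync_every  # hypothetical sync interval, in steps
        self.target = copy.deepcopy(weights) if use_target_network else None
        self.step = 0

    def bootstrap_weights(self):
        """Return the weights that should be used to compute the TD target."""
        return self.target if self.use_target_network else self.online

    def after_update(self):
        """Call once per training step, after the online weights have changed."""
        self.step += 1
        if self.use_target_network and self.step % self.sync_every == 0:
            self.target = copy.deepcopy(self.online)
```

With the flag off, `bootstrap_weights()` returns the online weights themselves, so the target moves on every update; with it on, the target only changes at each sync.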
