
Code review #1

Open
tejank10 opened this issue May 28, 2018 · 9 comments

Comments

@tejank10
Owner

Hey @MikeInnes, if you are back, could you please review the code? The new models I have added are Dueling DQN, Advantage Actor-Critic, and DDPG. All the previous work on DQN has been moved to the dqn directory.

@jekbradbury

These look so much better than the RL code I've seen in Python-based frameworks. One thing I noticed, though, is that functions like zero_grad! and update_target! are implemented with nested loops over individual parameter elements, so they would have to be rewritten for the GPU. Instead, maybe use a single loop over the parameter arrays with an in-place broadcast inside the loop? Or maybe the outer loop could also be replaced with mapleaves?
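The suggestion above can be sketched as follows. The function names come from the thread, but the bodies are an assumed illustration, not the repository's actual code: one loop over whole parameter arrays, with an in-place broadcast per array, which works unchanged on CPU and GPU arrays.

```julia
# Hypothetical sketch: zero every gradient array with one in-place
# broadcast per array, instead of nested loops over elements.
zero_grad!(ps) = foreach(p -> p .= 0, ps)

# Copy each source parameter array into the corresponding target array
# (as used when syncing a target network in DQN-style algorithms).
update_target!(target_ps, ps) = foreach((t, p) -> t .= p, target_ps, ps)
```

Because `p .= 0` and `t .= p` lower to a single fused broadcast per array, no elementwise indexing is needed and the same code runs on GPU arrays.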

@tejank10
Owner Author

tejank10 commented May 30, 2018

Fixed it. Can we add those two functions (or just zero_grad! maybe) to Flux?

@MikeInnes

How about turning this into a Julia package? That would make the dependencies easier to manage. It would be nice to be able to start training a standard model in a couple of lines, and even better if I could quickly demo a trained model.

It would also be great to make sure the models are all GPU-compatible.
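Packaging the repo could look roughly like the sketch below. The package name `FluxBaselines` is hypothetical (chosen to echo the repo name); `Pkg.generate` scaffolds `Project.toml` and a stub source file, after which dependencies are recorded from inside the activated project.

```julia
using Pkg

Pkg.generate("FluxBaselines")   # scaffold Project.toml + src/FluxBaselines.jl
Pkg.activate("FluxBaselines")   # work inside the new project environment
# Pkg.add("Flux")               # record Flux as a dependency (needs network)
```

With a `Project.toml` in place, users can `Pkg.add` the package and its dependencies resolve automatically, which addresses the "dependencies easier" point above.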

@tejank10
Owner Author

tejank10 commented Jun 2, 2018

Cool, I'll start working on it.

@jekbradbury

What's going on with lines like https://github.com/tejank10/Flux-baselines/blob/master/dqn/duel-dqn.jl#L46? Is it a matter of Flux not being able to handle the complicated broadcast expression if you write it out with dots? (If so, we should make sure that's fixed in Flux on v0.7.)
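For reference, "writing it out with dots" means fusing the whole expression into one elementwise broadcast. The snippet below is purely illustrative (a dueling-style Q-value combine, not the actual line from duel-dqn.jl):

```julia
using Statistics

# Illustrative only: explicit dots fuse `val + adv - mean(adv)` into a
# single elementwise broadcast kernel, which GPU arrays handle directly.
# Note mean itself is a reduction and is NOT dotted.
q_combine(val, adv) = val .+ adv .- mean(adv, dims=1)
```

The dotted form allocates one result array in a single fused pass, instead of one temporary per operation.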

@tejank10
Owner Author

tejank10 commented Jun 7, 2018

It works with dots; fixed it.

@zahachtah

I am trying to adapt this code to a problem of mine, but I get an error (on Julia 1.0). I was wondering if anyone could give a hint on how to deal with this. The offending line is

logπ = log.(mean(π .* action, 1) + 1f-10)

and π comes directly from the Flux model (I am using the actor-critic example):

π = softmax(policy(base_out))
MethodError: no method matching mean(::TrackedArray{…,Array{Float64,1}}, ::Int64)
Closest candidates are:
  mean(!Matched::Union{Function, Type}, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:58
  mean(::TrackedArray; dims) at /Users/jonnorberg/.julia/packages/Flux/jbpWo/src/tracker/array.jl:226
  mean(::AbstractArray; dims) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Statistics/src/Statistics.jl:128

I'm not sure why the mean function doesn't accept the ::TrackedArray with a positional dimension argument, or how I could adjust the call to make it work.

Thanks for any hints.

@jekbradbury

If you're on Julia 1.0, mean takes dims as a keyword argument.
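Concretely: Julia 0.6 accepted a positional dimension, `mean(A, 1)`, but from Julia 1.0 the reduction dimension is the `dims` keyword, and (per the "Closest candidates" in the error above) Flux's TrackedArray only overloads the keyword form.

```julia
using Statistics

A = [1.0 2.0; 3.0 4.0]
mean(A, dims=1)   # 1×2 matrix of column means: [2.0 3.0]
# mean(A, 1)      # positional form; MethodError on Julia 1.0
```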

@zahachtah

Major thanks. I also noticed I needed to change + to .+ in two places to avoid error messages.
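Putting both fixes together on the line from the question, it would read roughly as below; `p` and `action` are small stand-in arrays (the real `π` comes from `softmax(policy(base_out))` in the actor-critic example):

```julia
using Statistics

p = [0.7 0.4; 0.3 0.6]        # stand-in for π = softmax(policy(base_out))
action = [1.0 0.0; 0.0 1.0]   # one-hot actions, one column per sample

# `dims` as a keyword, and broadcast `.+` to add the scalar epsilon
# elementwise to the 1×N result of mean.
logp = log.(mean(p .* action, dims=1) .+ 1f-10)
```

The `.+` matters because `mean(...)` returns an array (a TrackedArray under Flux), and adding a plain scalar to it with `+` is not defined, while the broadcast form is.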
