Running out of GPU memory after several minutes training #7

Open
ganzhi opened this issue Feb 21, 2021 · 5 comments

ganzhi commented Feb 21, 2021

Hi,

I got a CUDA out-of-memory error after a few minutes of training. Is there a way to fix it?

(py38) C:\Src\GitHub\MadMario>python main.py
Loading model at checkpoints\2021-02-20T16-13-06\trained_mario.chkpt with exploration rate 0.1
Episode 0 - Step 660 - Epsilon 0.1 - Mean Reward 2990.0 - Mean Length 660.0 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 10.198 - Time 2021-02-20T16:29:03
Episode 20 - Step 5262 - Epsilon 0.1 - Mean Reward 1311.095 - Mean Length 250.571 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 61.936 - Time 2021-02-20T16:30:05
Episode 40 - Step 9888 - Epsilon 0.1 - Mean Reward 1149.829 - Mean Length 241.171 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 62.843 - Time 2021-02-20T16:31:08
Episode 60 - Step 13407 - Epsilon 0.1 - Mean Reward 1072.361 - Mean Length 219.787 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 47.898 - Time 2021-02-20T16:31:56
Episode 80 - Step 19197 - Epsilon 0.1 - Mean Reward 1144.407 - Mean Length 237.0 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 77.715 - Time 2021-02-20T16:33:14
Episode 100 - Step 22474 - Epsilon 0.1 - Mean Reward 1060.12 - Mean Length 218.14 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 44.237 - Time 2021-02-20T16:33:58
Episode 120 - Step 26864 - Epsilon 0.1 - Mean Reward 1015.29 - Mean Length 216.02 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 58.86 - Time 2021-02-20T16:34:57
Episode 140 - Step 32109 - Epsilon 0.1 - Mean Reward 1094.56 - Mean Length 222.21 - Mean Loss 0.0 - Mean Q Value 0.0 - Time Delta 71.322 - Time 2021-02-20T16:36:08
Traceback (most recent call last):
  File "main.py", line 59, in <module>
    action = mario.act(state)
  File "C:\Src\GitHub\MadMario\agent.py", line 57, in act
    state = torch.FloatTensor(state).cuda() if self.use_cuda else torch.FloatTensor(state)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 10.00 GiB total capacity; 7.56 GiB already allocated; 0 bytes free; 7.74 GiB reserved in total by PyTorch)

@oldschooler-dev

Hi,
I also hit this error when giving it a try. I tweaked the memory setting in the agent: self.memory = deque(maxlen=20000) at agent.py line 13, down from 100000.

My guess is it's the torch.FloatTensors: the cached experiences are never freed, which I think is by design, since keeping them on the GPU makes them fast to access during training (e.g. the memory for experiences). I'm not 100% sure, as I'm on a journey of trying to fit this onto an RTX 2080 on Windows, PyTorch 1.7.1 with CUDA. I tried the latest dev branch with the same results, so I think it comes down to not having 32 GB on the GPU. After shrinking the memory as above I'm now up to 10,000 episodes. Fingers crossed; I'd imagine it will take another 24 hours, and I'm not sure how many episodes it takes to reach the results of the provided model.
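For illustration, a minimal sketch of one way to keep the replay buffer off the GPU entirely: store experiences on the CPU and move only the sampled mini-batch to the GPU at learning time. Method names follow the cache/recall structure of the upstream agent.py; the batch size and other details here are assumptions, not the actual repo code:

```python
import random
from collections import deque

import torch


class Mario:
    def __init__(self, batch_size=32):
        self.use_cuda = torch.cuda.is_available()
        # Smaller CPU-side buffer, as tweaked above (upstream default: 100000).
        self.memory = deque(maxlen=20000)
        self.batch_size = batch_size

    def cache(self, state, next_state, action, reward, done):
        # Store every experience as a CPU tensor so the replay buffer
        # can never exhaust GPU memory, no matter how long training runs.
        self.memory.append((
            torch.FloatTensor(state),
            torch.FloatTensor(next_state),
            torch.LongTensor([action]),
            torch.DoubleTensor([reward]),
            torch.BoolTensor([done]),
        ))

    def recall(self):
        # Sample a mini-batch and move only that batch to the GPU.
        batch = random.sample(self.memory, self.batch_size)
        state, next_state, action, reward, done = map(torch.stack, zip(*batch))
        if self.use_cuda:
            state, next_state = state.cuda(), next_state.cuda()
            action, reward, done = action.cuda(), reward.cuda(), done.cuda()
        return state, next_state, action.squeeze(), reward.squeeze(), done.squeeze()
```

The trade-off is an extra host-to-device copy per training step, but GPU usage then stays bounded by the model and one batch rather than growing with the buffer.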

@oldschooler-dev

I also changed the burn-in: self.burnin = 1e4
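Taken together, the two tweaks would look roughly like this in Mario.__init__ in agent.py. The 20000 and 1e4 values are the ones from this thread; as I understand it, burnin is the minimum number of experiences collected before learning begins:

```python
# In Mario.__init__ (agent.py):
self.memory = deque(maxlen=20000)  # replay buffer, reduced from the default 100000
self.burnin = 1e4                  # min. experiences to collect before learning starts
```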

ganzhi commented Mar 3, 2021

Created a PR to address the issue here: #8

@oldschooler-dev, you can try my fix by cloning this repo: https://github.com/ganzhi/MadMario

@oldschooler-dev

Seems to work OK, no memory errors... Cheers!

@LI-SUSTech

> Seems to work OK, no memory errors... Cheers!

How did you fix the problem? There seems to be no difference between @ganzhi's fork and master...
