
CUDA out of memory #1

Closed

Sanchez2020 opened this issue Jun 28, 2021 · 2 comments

@Sanchez2020

Hi, thank you for releasing your code. When I run R-GSN I get the error "RuntimeError: CUDA out of memory. Tried to allocate 562.00 MiB (GPU 1; 10.76 GiB total capacity; 8.98 GiB already allocated; 470.56 MiB free; 9.19 GiB reserved in total by PyTorch)".

I tried reducing the batch size: batch_size is now 64 and test_batch_size is 4, but I still get the same error. I am using a GeForce RTX 2080. Can you tell me why this happens and how to fix it? Thanks a lot!
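One factor worth checking, sketched below under assumptions not confirmed by the repository: the error comes from the test-time inference loop, and peak memory there is usually driven by the neighbor-sampling fan-out rather than by test_batch_size alone. If the script builds its test loader with PyG's NeighborSampler and samples all neighbors (sizes=[-1, -1]), capping the fan-out bounds the number of edges per mini-batch. The variable names here (edge_index, paper_idx, test_loader) are hypothetical placeholders, not the repository's own.

```python
from torch_geometric.data import NeighborSampler  # location in PyG 1.7.0

# Hypothetical sketch: the real R-GSN script builds its own sampler, so
# edge_index and paper_idx are placeholders for illustration only.
# sizes=[-1, -1] samples *all* neighbors per layer, which can blow up memory;
# a capped fan-out (e.g. 25 and 20 neighbors) bounds the per-batch edge count.
test_loader = NeighborSampler(
    edge_index,           # homogeneous edge_index used by the sampler
    node_idx=paper_idx,   # nodes to run inference on
    sizes=[25, 20],       # per-layer fan-out instead of [-1, -1]
    batch_size=4,         # matches test_batch_size above
    shuffle=False,
    num_workers=4,
)
```

Note that capping the test-time fan-out makes inference approximate, so the reported accuracy may shift slightly.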

Environment

numpy==1.18.5
scipy==1.6.2
ogb==1.3.1
texttable==1.6.3
torch==1.7.0+cu110
torchvision==0.8.0
torch-cluster==1.5.9
torch-geometric==1.7.0
torch-scatter==2.0.7
torch-sparse==0.6.9
torch-spline-conv==1.2.1

Full error information

Using backend: pytorch
+-----------------+-------+
| Parameter       | Value |
+-----------------+-------+
| device          | 1     |
+-----------------+-------+
| num_layers      | 2     |
+-----------------+-------+
| hidden_channels | 64    |
+-----------------+-------+
| dropout         | 0.500 |
+-----------------+-------+
| lr              | 0.004 |
+-----------------+-------+
| epochs          | 3     |
+-----------------+-------+
| runs            | 10    |
+-----------------+-------+
| batch_size      | 64    |
+-----------------+-------+
| test_batch_size | 4     |
+-----------------+-------+
| opt             | adamw |
+-----------------+-------+
| early_stop      | 1     |
+-----------------+-------+
| feat_dir        | feat  |
+-----------------+-------+
| conv_name       | rgsn  |
+-----------------+-------+
| Norm4           | 1     |
+-----------------+-------+
| FDFT            | 1     |
+-----------------+-------+
| use_attack      | 1     |
+-----------------+-------+
Data(
  edge_index_dict={
    ('author', 'affiliated_with', 'institution')=[2, 1043998],
    ('author', 'writes', 'paper')=[2, 7145660],
    ('paper', 'cites', 'paper')=[2, 5416271],
    ('paper', 'has_topic', 'field_of_study')=[2, 7505078]
  },
  edge_reltype={
    ('author', 'affiliated_with', 'institution')=[1043998, 1],
    ('author', 'writes', 'paper')=[7145660, 1],
    ('paper', 'cites', 'paper')=[5416271, 1],
    ('paper', 'has_topic', 'field_of_study')=[7505078, 1]
  },
  node_year={
    paper=[736389, 1]
  },
  num_nodes_dict={
    author=1134649,
    field_of_study=59965,
    institution=8740,
    paper=736389
  },
  x_dict={
    paper=[736389, 128]
  },
  y_dict={
    paper=[736389, 1]
  }
)
preprocess finished
/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py:550: UserWarning: Setting attributes on ParameterDict is not supported.
  warnings.warn("Setting attributes on ParameterDict is not supported.")
Model #Params: 154373028
Attack Epoch 01: 100%|███████████████| 629571/629571 [1:01:40<00:00, 170.13it/s]
* infer valid_test exact :  86%|█████▏| 629655/736389 [01:41<2:44:33, 10.81it/s]Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/data-input/houl/R-GSN/rgsn.py", line 284, in infer
    out = model(n_id, x_dict, adjs, edge_type, node_type, local_node_idx)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data-input/houl/R-GSN/models.py", line 266, in forward
    x = conv((x, x_target), edge_index, edge_type[e_id], node_type, src_node_type)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data-input/houl/R-GSN/models.py", line 124, in forward
    msg_from_i = F.normalize(self.propagate(ei, x=x, edge_type=i, src_node_type = src_node_type, a=a))
  File "/opt/conda/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 237, in propagate
    out = self.message(**msg_kwargs)
  File "/data-input/houl/R-GSN/models.py", line 163, in message
    res = a.unsqueeze(-1) * self.rel_lins[edge_type](x_j)  ######## Message Transform
RuntimeError: CUDA out of memory. Tried to allocate 310.00 MiB (GPU 1; 10.76 GiB total capacity; 9.36 GiB already allocated; 54.56 MiB free; 9.59 GiB reserved in total by PyTorch)
python-BaseException
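The allocation that fails is in the per-relation message transform (`self.rel_lins[edge_type](x_j)` in models.py), which materializes a float32 tensor of shape [num_edges_in_batch, hidden] for each relation. A minimal sketch of one possible workaround, assuming torch 1.7.0 as listed above and not part of the released code: running the inference forward pass under torch.cuda.amp.autocast() so those intermediates are held in fp16, roughly halving their footprint (the effect on accuracy would need to be verified).

```python
import torch

# Sketch only: model, n_id, x_dict, adjs, edge_type, node_type and
# local_node_idx stand in for the variables used by infer() in rgsn.py.
# autocast runs linear layers in fp16, shrinking the per-edge intermediates
# created inside message().
with torch.no_grad(), torch.cuda.amp.autocast():
    out = model(n_id, x_dict, adjs, edge_type, node_type, local_node_idx)
out = out.float()  # back to fp32 before computing evaluation metrics
```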
@xjtuwxliang
Owner

@Sanchez2020 Hello, the GPU I used in the experiments was a GTX 1080 Ti (11 GB), but I don't currently have a GPU on hand. If adjusting test_batch_size does not solve the problem, you may have to try a GPU with slightly more memory. I can't think of a better solution right now, sorry.
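If a larger GPU is not available, another option (a generic PyTorch sketch, not something the repository provides) is to run only the evaluation pass on the CPU; it is slow for a graph of this size but is not bound by the 11 GB card.

```python
import torch

# Hypothetical sketch: move the model to the CPU just for inference; the
# per-batch tensors built inside infer() (x_dict slices, adjs, etc.) would
# likewise need to stay on the CPU instead of being sent to cuda:1.
cpu = torch.device('cpu')
model = model.to(cpu)
with torch.no_grad():
    out = model(n_id, x_dict, adjs, edge_type, node_type, local_node_idx)
```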

@Sanchez2020
Author

@xjtuwxliang, thank you for your reply.
The GPU I used has 10.76 GiB of total capacity, and no other programs were occupying it, so I am confused.
I'm looking for other ways; thank you again.
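For reference, a quick way to see how much of the 10.76 GiB this process itself is holding (plain PyTorch, independent of the repository); memory used by other processes or the display only shows up in nvidia-smi:

```python
import torch

# Report this process's CUDA memory usage on GPU 1.
dev = torch.device('cuda:1')
print(f"allocated: {torch.cuda.memory_allocated(dev) / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(dev) / 1024**3:.2f} GiB")
```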
