NumpyIndex does not require the full index to be in memory, but can the same be said of TorchIndex?
I'm trying to run an experiment on the MS MARCO dev set with TorchIndex, and I get an error saying CUDA is out of memory.
Thanks for any help!
Here's the code:
`import pyterrier as pt
import pyterrier_dr

# Training
model = pyterrier_dr.TctColBert(model_name='distilbert-base-uncased')
dataset = pt.get_dataset('irds:msmarco-passage/train/judged')
model.fit(dataset=dataset, steps=1000)

# Evaluation
dataset = pt.get_dataset('irds:msmarco-passage/dev/judged')
index = pyterrier_dr.TorchIndex('index.torch')
# Encode the corpus with the model and write the vectors to the index
idx_pipeline = model >> index
idx_pipeline.index(dataset.get_corpus_iter())
# Encode the queries with the model and retrieve from the index
retr_pipeline = model >> index
pt.Experiment(
    [retr_pipeline],
    dataset.get_topics(),
    dataset.get_qrels(),
    eval_metrics=["map", "recip_rank", "ndcg", "ndcg_cut_10"]
)`
And I get this error:
`Traceback (most recent call last):
File "/XXX/DR_cuda/DR_cuda.py", line 22, in
pt.Experiment(
File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier/pipelines.py", line 450, in Experiment
time, evalMeasuresDict = _run_and_evaluate(
File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier/pipelines.py", line 170, in _run_and_evaluate
res = system.transform(topics)
File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier/ops.py", line 335, in transform
topics = m.transform(topics)
File "/XXX/dr_env/lib/python3.8/site-packages/pyterrier_dr/indexes.py", line 676, in transform
scores = query_vecs @ self._cuda_data[:bsize].T
RuntimeError: CUDA out of memory. Tried to allocate 33.70 GiB (GPU 0; 10.92 GiB total capacity; 1.63 GiB already allocated; 8.27 GiB free; 2.00 GiB reserved in total by PyTorch)
srun: error: gpu-nc06: task 0: Exited with exit code 1`
oussaidene changed the title from "Memory requerement for TorchIndex" to "Memory requirement for TorchIndex" on Feb 7, 2023
TorchIndex works by moving chunks of the index to GPU memory. The size of these chunks is controlled by the idx_mem parameter, which specifies approximately how much memory each chunk will use, in bytes. The default is 500000000 (~500MB), but you can adjust it as needed. Since your encoder model is probably also on the GPU, this size needs to be lower than your total GPU memory.
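To get a feel for the numbers, here is a rough back-of-the-envelope sketch in plain Python. It rests on several assumptions that are not confirmed anywhere in this thread: that vectors are stored as 4-byte floats, that the encoder produces 768-dimensional vectors (typical for distilbert-base-uncased), that the chunk length is simply idx_mem divided by the bytes per vector, and that msmarco-passage/dev/judged contains 55,578 queries. The helper names are made up for illustration. Under those assumptions, the score matrix computed by the line in the traceback (`query_vecs @ self._cuda_data[:bsize].T`) has one row per query and one column per document in the chunk, so its size grows with the number of queries as well as with idx_mem.

```python
GIB = 1024 ** 3

def chunk_len(idx_mem: int, dim: int, bytes_per_value: int = 4) -> int:
    """Number of document vectors that fit in idx_mem bytes (assumed formula)."""
    return idx_mem // (dim * bytes_per_value)

def score_matrix_bytes(num_queries: int, num_docs: int, bytes_per_value: int = 4) -> int:
    """Size of a dense (num_queries x num_docs) matrix of scores."""
    return num_queries * num_docs * bytes_per_value

IDX_MEM = 500_000_000   # default quoted in the reply above
DIM = 768               # assumption: output size of a BERT-base-style encoder
NUM_QUERIES = 55_578    # assumption: query count of msmarco-passage/dev/judged

docs_per_chunk = chunk_len(IDX_MEM, DIM)
scores_gib = score_matrix_bytes(NUM_QUERIES, docs_per_chunk) / GIB

print(f"{docs_per_chunk} vectors per chunk")        # 162760 vectors per chunk
print(f"{scores_gib:.2f} GiB for the score matrix")  # 33.70 GiB for the score matrix
```

If these assumptions hold, the estimate lines up with the 33.70 GiB allocation in the error message. That would suggest the chunk itself (~500MB) fits comfortably, and that it is the scores for all queries at once that do not; lowering idx_mem, or scoring fewer queries at a time, shrinks that matrix proportionally.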