How to load multiple models #18

Don9wanKim · 2021-12-31T08:28:48Z

How to reproduce

First, thank you for building such a great project.
I'm currently testing loading several Korean models with Flask on a server with eight 1080 GPUs.
Loading a single model across multiple GPUs works fine, but loading several models at the same time fails with the error below.
Is there any additional work required when loading multiple models at once?
To choose the target GPU, I change CUDA_VISIBLE_DEVICES in the environment before calling each model.
e.g. os.environ["CUDA_VISIBLE_DEVICES"]="0" , parallelize(model_1, ... )

     os.environ["CUDA_VISIBLE_DEVICES"]="1" , parallelize(model_2, ... )

.... (the error occurs when loading the second model)
===========================================================       
model name :  ./model/ko-gpt-trinity-1.2B-v0.5
CUDA_VISIBLE_DEVICES :  1
request_gpu :  1                            
used_gpu    :  2
===========================================================          
Process ParallelProcess-2:                                         
Traceback (most recent call last):                                        
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()                  
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/parallelformers/parallel/process.py", line 254, in run
    custom_policies=self.custom_policies,
  File "/opt/conda/lib/python3.7/site-packages/parallelformers/parallel/engine.py", line 53, in __init__
    self.mp_group = self.create_process_group(backend)
  File "/opt/conda/lib/python3.7/site-packages/parallelformers/parallel/engine.py", line 104, in create_process_group
    dist.init_process_group(backend=backend)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 576, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 229, in _env_rendezvous_handler
    store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 158, in _create_c10d_store
    hostname, port, world_size, start_daemon, timeout, multi_tenant=True
RuntimeError: Address already in use

The error seems to be raised when parallelformers/parallel/engine.py calls dist.init_process_group.
How should I change the parallelize call so that I can load several different models at the same time?

    def create_process_group(self, backend: str):
        """
        Create Pytorch distributed process group
        Args:
            backend (str): distributed backend
        Returns:
            ProcessGroupNCCL: process group for parallization
        """
        if not dist.is_initialized():
            dist.init_process_group(backend=backend)

        torch.cuda.set_device(int(os.getenv("LOCAL_RANK", "0")))
        new_group = dist.new_group([i for i in range(self.num_gpus)])

        return new_group

Environment

OS : Ubuntu 18.04
Python version :3.7.11
Transformers version : 4.15.0
Whether to use Docker: FROM pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
Misc.: "Loading multiple models with parallelformers inside Flask"

Don9wanKim · 2021-12-31T08:34:58Z

This is a question, not a bug; I attached the wrong label.

hyunwoongko · 2021-12-31T08:36:45Z

In parallelformers, you can change the master port, e.g. parallelize(..., master_port=YOUR_MASTER_PORT), and then use it as usual.
Also, GPT2 models are supported in OSLO as well. OSLO improves on parallelformers in many ways, so I recommend using OSLO.

https://github.com/tunib-ai/parallelformers/blob/main/FAQ.md#q-can-i-parallelize-multiple-models-on-same-gpus
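As a rough sketch of the suggestion above: each parallelize call needs its own master port, otherwise the second call tries to bind the port the first one already holds and fails with "Address already in use". The helper names allocate_master_ports and parallelize_all are hypothetical and not part of parallelformers; the fp16 argument is an assumption about how the models are loaded. Only master_port and num_gpus come from this thread.

```python
import socket
from contextlib import ExitStack


def allocate_master_ports(count):
    """Return `count` distinct TCP ports that were free at the time of the check.

    Hypothetical helper: all sockets stay open until every port has been
    collected, so the operating system cannot hand out the same port twice.
    """
    with ExitStack() as stack:
        ports = []
        for _ in range(count):
            sock = stack.enter_context(
                socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            )
            sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
            ports.append(sock.getsockname()[1])
        return ports


def parallelize_all(models, num_gpus=1):
    """Hypothetical wrapper: give every model its own master port."""
    # Imported lazily so the port helper can be used without parallelformers.
    from parallelformers import parallelize

    ports = allocate_master_ports(len(models))
    for model, port in zip(models, ports):
        # master_port is the argument suggested in the reply above;
        # fp16=True is an assumption, adjust it to your setup.
        parallelize(model, num_gpus=num_gpus, fp16=True, master_port=port)
    return ports
```

The ports are released before parallelize runs, so another process could in principle grab one in between. Fixed, manually chosen ports (one per model) avoid that race if it matters in your deployment.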

Don9wanKim · 2021-12-31T08:38:15Z

Thank you so much for the quick reply.
I saw that the deployment part was added to OSLO yesterday.
I plan to switch to OSLO later.
Thanks again for the answer.

Don9wanKim · 2021-12-31T10:00:41Z

After making the change you suggested, the problem above is resolved. : )

I have one more question, if I may.

parallelize takes num_gpus as an int, and the init_environments method then sets os.environ["CUDA_VISIBLE_DEVICES"] = ", ".join([str(i) for i in range(num_gpus)] ).

When loading multiple models inside a single Flask app, how can I control this so that each model is placed on the target devices I want?

What I tried was changing os.environ["CUDA_VISIBLE_DEVICES"] before calling parallelize, but the code above still runs and overwrites my value, so the models seem to end up on the same GPU after all.
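To make the observation concrete, here is a minimal sketch that reproduces only the assignment quoted above. The function name visible_devices_for is hypothetical; it stands in for the expression attributed to init_environments in this comment and does not call parallelformers itself.

```python
import os


def visible_devices_for(num_gpus):
    """Hypothetical stand-in for the expression quoted from init_environments."""
    return ", ".join([str(i) for i in range(num_gpus)])


# The caller tries to target GPU 1 ...
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# ... but an assignment like the quoted one replaces the value, always
# counting up from device 0 regardless of what was set before.
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices_for(1)

print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 0
print(visible_devices_for(2))              # prints 0, 1
```

Because the value is derived only from num_gpus, any CUDA_VISIBLE_DEVICES setting made before the call is lost, which matches the behavior described here.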

hyunwoongko · 2021-12-31T10:45:52Z

That isn't supported at the moment. I'll let you know if we add it later.

Don9wanKim · 2021-12-31T11:12:55Z

Thank you for replying so late, even at the end of the year. : )

Don9wanKim added the bug label Dec 31, 2021

hyunwoongko added the question label and removed the bug label Dec 31, 2021

Don9wanKim closed this as completed Dec 31, 2021

Don9wanKim reopened this Dec 31, 2021

Don9wanKim closed this as completed Dec 31, 2021
