Is there a training script for pre-training a LLaMA-7B model on GPUs such as A100s? The current examples are based on TPUs, and I'm not sure whether anything needs to change. Thanks.
I believe the configuration would be very similar, although you might need to tune the mesh dimensions according to your cluster configuration and network topology to get the best performance. Specifically, you'll want to add these options when training on GPUs in a multihost environment:
python -m EasyLM.models.llama.llama_train \
--jax_distributed.initialize_jax_distributed=True \
--jax_distributed.coordinator_address=<your coordinator (process 0) address and port> \
--jax_distributed.num_processes=<total number of processes (hosts)> \
--jax_distributed.process_id=<current process id>
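For example, on a hypothetical two-host A100 cluster (the address, port, and process count below are placeholder values, not defaults), the launch on the second host might look like:

python -m EasyLM.models.llama.llama_train \
    --jax_distributed.initialize_jax_distributed=True \
    --jax_distributed.coordinator_address='10.0.0.1:1234' \
    --jax_distributed.num_processes=2 \
    --jax_distributed.process_id=1

The first host would run the same command with --jax_distributed.process_id=0, and the coordinator address on every host must point to process 0.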