sm-distributed_model_parallel_v2

Folders and files

- gpt-neox
- llama_v2_v3
- llama_v3d1
- mixtral
- shared-scripts
- README.md

README.md

SageMaker Model Parallelism

This directory contains example scripts for training or fine-tuning large-scale models with the SageMaker distributed model parallelism library. When using one of the ipynb notebooks in the folders of this directory, make sure to use the ./shared-scripts/ directory as the source directory when submitting a job.

For example, to submit a Llama fine-tuning job on SageMaker using the /llama_v2/smp-finetuning-llama-fsdp-tp.ipynb notebook, copy that notebook into the ./shared-scripts/ directory so that it can access all the accompanying files.

After cloning this repository, run the following command to copy the notebook for your desired model into the ./shared-scripts/ directory.

cp [RELATIVE PATH TO ipynb] shared-scripts/

Finally, upload the ./shared-scripts/ directory to a SageMaker notebook to submit your training or fine-tuning job.
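
The copy step can also be wrapped in a small shell function that checks both paths before copying. This is only a sketch: the function name stage_notebook and its arguments are hypothetical and not part of this repository, and it assumes you run it from the sm-distributed_model_parallel_v2 directory.

```shell
# Hypothetical helper (not part of this repository): copy one notebook into
# the shared scripts directory and print the path of the staged copy.
stage_notebook() {
    notebook_path="$1"                  # relative path to the .ipynb to stage
    target_dir="${2:-shared-scripts}"   # defaults to ./shared-scripts

    # Fail early if the notebook or the target directory is missing.
    if [ ! -f "$notebook_path" ]; then
        echo "notebook not found: $notebook_path" >&2
        return 1
    fi
    if [ ! -d "$target_dir" ]; then
        echo "target directory not found: $target_dir" >&2
        return 1
    fi

    # Same operation as the cp command in this README.
    cp "$notebook_path" "$target_dir/" || return 1
    echo "$target_dir/$(basename "$notebook_path")"
}
```

Calling stage_notebook with the relative path to a notebook prints the location of the copy inside shared-scripts/, or returns a non-zero status without copying anything if either path does not exist.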
