Set the required variables. To get the values for the S3_ENDPOINT, S3_ACCESS_KEY, and S3_SECRET_KEY variables, run the following command on the bastion/helper node: `oc extract -n openshift-storage secret/s3-secret-bck --to=-`

In [None]:
%env S3_ENDPOINT=<S3_RGW_ROUTE>
%env S3_ACCESS_KEY=<AWS_ACCESS_KEY_ID>
%env S3_SECRET_KEY=<AWS_SECRET_ACCESS_KEY>
%env HF_TOKEN=<PASTE_HUGGINGFACE_TOKEN>
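
Before continuing, it can help to confirm that none of the variables was left empty or still holds its `<...>` placeholder. The following is a minimal sketch, not part of the original notebook; the helper name `missing_variables` and the `REQUIRED` tuple are hypothetical and introduced only for illustration.

```python
import os

# Variables the download script reads (names taken from the cell above).
REQUIRED = ("S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY", "HF_TOKEN")


def missing_variables(env, names=REQUIRED):
    """Return the names that are unset, empty, or still a <PLACEHOLDER>."""
    missing = []
    for name in names:
        value = env.get(name, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


missing = missing_variables(os.environ)
if missing:
    print("Still to be set:", ", ".join(missing))
else:
    print("All required variables are set.")
```

Running this in its own cell right after the `%env` cell lists any variable that still needs a real value.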

If the cluster is behind a proxy, set the proxy variables. Otherwise, skip the following code block.

In [None]:
%env http_proxy=<PASTE_HTTP_PROXY>
%env https_proxy=<PASTE_HTTPS_PROXY>
%env no_proxy=<PASTE_NO_PROXY>,${S3_ENDPOINT}
%env NO_PROXY=<PASTE_NO_PROXY>,${S3_ENDPOINT}
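
Proxy exclusion lists are commonly matched against host names rather than full URLs. If S3_ENDPOINT holds a URL with a scheme (for example `https://...`), some clients may not recognize that entry. The sketch below appends only the host part to `no_proxy`, assuming that is the case here. The helper names `endpoint_host` and `extend_no_proxy` are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import urlparse


def endpoint_host(endpoint):
    """Return the host part of an endpoint given with or without a scheme."""
    if "://" not in endpoint:
        # urlparse only recognizes the host when a netloc marker is present.
        endpoint = "//" + endpoint
    return urlparse(endpoint).hostname


def extend_no_proxy(current, endpoint):
    """Append the endpoint host to a comma-separated no_proxy value."""
    entries = [entry.strip() for entry in current.split(",") if entry.strip()]
    host = endpoint_host(endpoint)
    if host and host not in entries:
        entries.append(host)
    return ",".join(entries)


endpoint = os.environ.get("S3_ENDPOINT", "")
if endpoint:
    value = extend_no_proxy(os.environ.get("no_proxy", ""), endpoint)
    # Set both spellings, because tools differ in which one they read.
    os.environ["no_proxy"] = value
    os.environ["NO_PROXY"] = value
```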

Install the required `huggingface-hub` package.

In [None]:
!pip install huggingface-hub

Run the Python script to download the model from Hugging Face and upload it to OpenShift RGW storage. The download can take some time, depending on the model size.

In [None]:
import os
import boto3
import botocore
import glob

from huggingface_hub import snapshot_download

import warnings
# Suppress warnings, such as the TLS warnings triggered by verify=False below.
warnings.filterwarnings("ignore")


bucket_name = 'model-bucket'
s3_endpoint = os.environ.get('S3_ENDPOINT')
s3_accesskey = os.environ.get('S3_ACCESS_KEY')
s3_secretkey = os.environ.get('S3_SECRET_KEY')
path = 'models'
hf_token = os.environ.get('HF_TOKEN')

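# An empty proxy map is intended to keep S3 requests from using the proxy
# settings defined in the environment.
config = botocore.config.Config(proxies={})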
session = boto3.session.Session()
s3_resource = session.resource('s3',
                               endpoint_url=s3_endpoint,
                               verify=False,
                               aws_access_key_id=s3_accesskey,
                               aws_secret_access_key=s3_secretkey, config=config)

bucket = s3_resource.Bucket(bucket_name)

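print('Downloading model...')
# Download the full model snapshot into the local cache directory.
snapshot_download("meta-llama/Llama-2-70b-chat-hf", cache_dir=path, token=hf_token)

# Select only regular files located under a "snapshots" directory.
files = (file for file in glob.glob(f'{path}/**/*', recursive=True) if os.path.isfile(file) and "snapshots" in file)

print('Uploading files to S3...')
for filename in files:
    # Note: str.replace() removes every occurrence of 'models', including the
    # one in the cache folder name ('models--...'), so keys start with 'models/--'.
    s3_name = filename.replace(path, '')
    print(f'Uploading: {filename} to {path}{s3_name}')
    bucket.upload_file(filename, f'{path}{s3_name}')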
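
The upload loop derives each object key with `filename.replace(path, '')`. Because `str.replace` removes every occurrence of the string, it also strips `models` from the Hugging Face cache folder name (`models--meta-llama--...`). The sketch below shows one possible alternative that keeps the folder name intact by using a relative path. The helper name `s3_key` and the sample revision folder `abc123` are hypothetical. Note that this changes the object keys compared with the loop above, so any path that refers to the uploaded model would need to match.

```python
import os


def s3_key(filename, root="models"):
    """Build an object key that mirrors the local path below `root`."""
    relative = os.path.relpath(filename, root)
    # Object keys use forward slashes regardless of the local separator.
    return "/".join([root] + relative.split(os.sep))


# Hypothetical local file, following the Hugging Face cache layout.
local_file = os.path.join(
    "models", "models--meta-llama--Llama-2-70b-chat-hf",
    "snapshots", "abc123", "config.json",
)

print("replace():", "models" + local_file.replace("models", ""))
print("relpath(): ", s3_key(local_file))
```

To use it in the loop, the call would become `bucket.upload_file(filename, s3_key(filename, path))`.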