# Most popular open weights LLMs - December 2023

## Dependencies

In [None]:
pip install --upgrade transformers

In [3]:
import transformers
transformers.__version__

'4.36.2'

For Qwen/Qwen-7B:

In [None]:
pip install tiktoken

In [None]:
pip install transformers_stream_generator

For 01-ai/Yi-6B:

In [None]:
pip install sentencepiece
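
Before downloading anything, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the original notebook: the helper name `missing_packages` is hypothetical, and the module names are assumed to match the pip package names installed above.

```python
import importlib.util

def missing_packages(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Import names of the packages installed above (assumed identical to the pip names).
required = ["transformers", "tiktoken", "transformers_stream_generator", "sentencepiece"]
print("Missing packages:", missing_packages(required) or "none")
```

An empty result means all four packages can be imported; any name that is listed still needs a `pip install`.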

## Models dictionary

Models are selected based on the following criteria:
- Base models only: to test their initial strength on multilingual text before fine-tuning
- State of the art or very significant at the time of their release
- For local use: should run reasonably well on a 24 GB consumer GPU machine

For models in the 30B-45B parameter range:
- Only 4-bit quantized versions are considered: 16-bit versions are too big to store on disk on a consumer machine
- Fine-tuned models are used where necessary: 4-bit quantized versions of the base models are unfortunately often missing on HuggingFace
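
The disk-size argument can be checked with simple arithmetic. The sketch below is a rough estimate only: it counts weight storage as parameters times bits per weight and ignores quantization metadata, tokenizer files, and other overhead. The function name is hypothetical.

```python
GIB = 1024 ** 3  # same unit as the "GB" figures measured later in this notebook

def estimated_weights_size_gib(n_params, bits_per_weight):
    """Rough size of the raw weights only, ignoring any file overhead."""
    return n_params * bits_per_weight / 8 / GIB

# Compare 16-bit and 4-bit storage for a 34 B parameter model.
for bits in (16, 4):
    size = estimated_weights_size_gib(34e9, bits)
    print(f"34 B parameters at {bits} bits: ~{size:.1f} GB")
```

For a 34 B model this gives roughly 63 GB at 16 bits against roughly 16 GB at 4 bits, which is in line with the 17.33 GB listed for `yi_34b` below.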

### 1B parameters

| Date | Name | Disk size | Gated access | Remote code | URL | Params | Context | Train tokens | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023/12/29 | tinyllama_1b | 4.10 GB | No | No | https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T | 1.1 B | 2048 | 2.5 T | Apache 2.0 |

### 3B parameters

| Date | Name | Disk size | Gated access | Remote code | URL | Params | Context | Train tokens | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023/05/04 | redpajama_3b | 5.30 GB | No | No | https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-3B-v1 | 2.8 B | 2048 | 800 B | Apache 2.0 |
| 2023/07/14 | btlm_3b | 4.93 GB | No | YES | https://huggingface.co/cerebras/btlm-3b-8k-base | 3 B | 8192 | 627 B | Apache 2.0 |
| 2023/07/16 | openllama2_3b | 6.38 GB | No | No | https://huggingface.co/openlm-research/open_llama_3b_v2 | 3 B | 2048 | 1 T | Apache 2.0 |
| 2023/09/29 | stablelm_3b | 5.21 GB | YES | YES | https://huggingface.co/stabilityai/stablelm-3b-4e1t | 3 B | 4096 | 1 T | CC BY-SA-4.0 |
| 2023/12/13 | phi2_3b | 5.18 GB | No | YES | https://huggingface.co/microsoft/phi-2 | 2.7 B | 2048 | 1.4 T | MIT |

### 7B parameters

| Date | Name | Disk size | Gated access | Remote code | URL | Params | Context | Train tokens | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2022/10/09 | bloomz_7b | 13.18 GB | No | No | https://huggingface.co/bigscience/bloomz-7b1-mt | 7.1 B | 2048 | 350 B | BigScience RAIL License |
| 2023/04/24 | falcon_7b | 13.45 GB | No | YES | https://huggingface.co/tiiuae/falcon-7b | 7 B | 2048 | 1.5 T | Apache 2.0 |
| 2023/05/04 | redpajama_7b | 12.90 GB | No | No | https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Base | 6.9 B | 2048 | 1 T | Apache 2.0 |
| 2023/05/05 | mpt_7b | 12.39 GB | No | YES | https://huggingface.co/mosaicml/mpt-7b | 6.7 B | 2048 | 1 T | Apache 2.0 |
| 2023/06/30 | mpt_7b_8k | 12.39 GB | No | YES | https://huggingface.co/mosaicml/mpt-7b-8k | 6.7 B | 8192 | 1.5 T | Apache 2.0 |
| 2023/07/06 | openllama2_7b | 12.55 GB | No | No | https://huggingface.co/openlm-research/open_llama_7b_v2 | 7 B | 2048 | 1 T | Apache 2.0 |
| 2023/07/18 | llama2_7b | 12.55 GB | YES | No | https://huggingface.co/meta-llama/Llama-2-7b-hf | 7 B | 4096 | 2 T | LLAMA 2 COMMUNITY LICENSE AGREEMENT |
| 2023/07/26 | llama2_7b_32k | 12.55 GB | No | YES | https://huggingface.co/togethercomputer/LLaMA-2-7B-32K | 7 B | 32768 | fine-tuned | LLAMA 2 COMMUNITY LICENSE AGREEMENT |
| 2023/09/20 | mistral_7b | 13.49 GB | No | No | https://huggingface.co/mistralai/Mistral-7B-v0.1 | 7.3 B | 8192 | ?? | Apache 2.0 |
| 2023/09/24 | qwen_7b | 14.38 GB | No | YES | https://huggingface.co/Qwen/Qwen-7B | 7 B | 8192 | 2.4 T | Tongyi Qianwen LICENSE AGREEMENT |
| 2023/11/01 | yi_6b | 11.29 GB | No | No | https://huggingface.co/01-ai/Yi-6B | 6 B | 4096 | 3 T | Yi Series Models Community License Agreement |
| 2023/12/10 | decilm_7b | 13.12 GB | No | YES | https://huggingface.co/Deci/DeciLM-7B | 7 B | 8192 | ?? | Apache 2.0 |

### 13B parameters

| Date | Name | Disk size | Gated access | Remote code | URL | Params | Context | Train tokens | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023/06/15 | openllama1_13b | 24.24 GB | No | No | https://huggingface.co/openlm-research/open_llama_13b | 13 B | 2048 | 1 T | Apache 2.0 |
| 2023/07/18 | llama2_13b | 24.25 GB | YES | No | https://huggingface.co/meta-llama/Llama-2-13b-hf | 13 B | 4096 | 2 T | LLAMA 2 COMMUNITY LICENSE AGREEMENT |
| 2023/09/24 | qwen_14b | 26.39 GB | No | YES | https://huggingface.co/Qwen/Qwen-14B | 14 B | 2048 | 3 T | Tongyi Qianwen LICENSE AGREEMENT |
| 2023/12/12 | solar_10b | 19.99 GB | No | No | https://huggingface.co/upstage/SOLAR-10.7B-v1.0 | 10.7 B | 4096 | 3 T | Apache 2.0 |

### 30-45B parameters

| Date | Name | Disk size | Gated access | Remote code | URL | Params | Context | Train tokens | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2023/02/24 | llama1_33b | 15.78 GB | No | No | https://huggingface.co/alexl83/LLaMA-33B-HF | 32.5 B | 2048 | 1.4 T | Noncommercial license focused on research use cases |
| 2023/05/24 | falcon_40b | 21.00 GB | No | YES | https://huggingface.co/tiiuae/falcon-40b | 40 B | 2048 | 1 T | Apache 2.0 |
| 2023/06/20 | mpt_30b | 15.00 GB | No | YES | https://huggingface.co/mosaicml/mpt-30b | 30 B | 8192 | 1 T | Apache 2.0 |
| 2023/08/24 | codellama_34b | 17.07 GB | No | No | https://huggingface.co/codellama/CodeLlama-34b-hf | 34 B | 16384 | 2.5 T | LLAMA 2 COMMUNITY LICENSE AGREEMENT |
| 2023/11/01 | yi_34b | 17.33 GB | No | No | https://huggingface.co/01-ai/Yi-34B | 34 B | 4096 | 3 T | Yi Series Models Community License Agreement |
| 2023/12/11 | mixtral_8x7B | 22.18 GB | No | No | https://huggingface.co/mistralai/Mixtral-8x7B-v0.1 | 46.7 B -> 12.9 B | 32768 | ?? | Apache 2.0 |

In [1]:
models = {
    "tinyllama_1b": "TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T", # 4.10 GB
     
    "redpajama_3b" : "togethercomputer/RedPajama-INCITE-Base-3B-v1", # 5.30 GB
    "btlm_3b" : "cerebras/btlm-3b-8k-base", #  4.93 GB
    "openllama2_3b" : "openlm-research/open_llama_3b_v2", #  6.38 GB
    "stablelm_3b" : "stabilityai/stablelm-3b-4e1t", # 5.21 GB
    "phi2_3b" : "microsoft/phi-2", # 5.18 GB

    "bloomz_7b" : "bigscience/bloomz-7b1-mt", # 13.18 GB
    "falcon_7b" : "tiiuae/falcon-7b", # 13.45 GB       
    "redpajama_7b" : "togethercomputer/RedPajama-INCITE-7B-Base", # 12.90 GB
    "mpt_7b" : "mosaicml/mpt-7b", # 12.39 GB
    "mpt_7b_8k" : "mosaicml/mpt-7b-8k", # 12.39 GB
    "openllama2_7b" : "openlm-research/open_llama_7b_v2", # 12.55 GB
    "llama2_7b" : "meta-llama/Llama-2-7b-hf", # 12.55 GB
    "llama2_7b_32k" : "togethercomputer/LLaMA-2-7B-32K", # 12.55 GB
    "mistral_7b" : "mistralai/Mistral-7B-v0.1", # 13.49 GB
    "qwen_7b" : "Qwen/Qwen-7B", # 14.38 GB
    "yi_6b" : "01-ai/Yi-6B", # 11.29 GB
    "decilm_7b" : "Deci/DeciLM-7B", # 13.12 GB
    
    "openllama1_13b" : "openlm-research/open_llama_13b", # 24.24 GB
    "llama2_13b" : "meta-llama/Llama-2-13b-hf", # 24.25 GB
    "qwen_14b" : "Qwen/Qwen-14B", # 26.39 GB
    "solar_10b" : "upstage/SOLAR-10.7B-v1.0", # 19.99 GB
    
    # 30-45B range: 4-bit quantized repositories; the URL in each comment is the original base model
    "llama1_33b" : "TheBloke/WizardLM-33B-V1.0-Uncensored-GPTQ", # 15.78 GB, base model: https://huggingface.co/alexl83/LLaMA-33B-HF
    "falcon_40b" : "TheBloke/falcon-40b-instruct-GPTQ", # 21.00 GB, base model: https://huggingface.co/tiiuae/falcon-40b
    "mpt_30b" : "abhinavkulkarni/mosaicml-mpt-30b-instruct-w4-g128-awq", # 15.00 GB, base model: https://huggingface.co/mosaicml/mpt-30b
    "codellama_34b" : "TheBloke/CodeLlama-34B-Instruct-GPTQ", # 17.07 GB, base model: https://huggingface.co/codellama/CodeLlama-34b-hf
    "yi_34b" : "TheBloke/Yi-34B-GPTQ", # 17.33 GB, base model: https://huggingface.co/01-ai/Yi-34B
    "mixtral_8x7B" : "TheBloke/Mixtral-8x7B-v0.1-GPTQ" # 22.18 GB, base model: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1
}
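
Several models in the tables above are marked as gated or as requiring remote code, so they need extra arguments at load time. The sketch below is one possible way to derive those arguments per model key. The names `REMOTE_CODE_KEYS`, `GATED_KEYS` and `loading_kwargs` are hypothetical, and the sets are simply transcribed from the "Remote code" and "Gated access" columns. Keys in the 30-45B range are left out on purpose, because the tables describe the original repositories rather than the quantized ones used in the dictionary.

```python
# Flags transcribed from the "Remote code" and "Gated access" table columns (1B to 13B only).
REMOTE_CODE_KEYS = {
    "btlm_3b", "stablelm_3b", "phi2_3b",
    "falcon_7b", "mpt_7b", "mpt_7b_8k", "llama2_7b_32k", "qwen_7b", "decilm_7b",
    "qwen_14b",
}
GATED_KEYS = {"stablelm_3b", "llama2_7b", "llama2_13b"}

def loading_kwargs(model_key, token=None):
    """Build the extra keyword arguments a given model key is assumed to need."""
    kwargs = {}
    if model_key in REMOTE_CODE_KEYS:
        kwargs["trust_remote_code"] = True  # executes code shipped in the model repository
    if model_key in GATED_KEYS and token is not None:
        kwargs["token"] = token  # access token for gated repositories
    return kwargs

print(loading_kwargs("qwen_7b"))
print(loading_kwargs("llama2_7b", token="hf_xxx"))  # placeholder token value
```

The resulting dictionary could then be unpacked into a `from_pretrained(...)` call. Note that `trust_remote_code=True` runs code downloaded from the repository, so it is only enabled for the keys that need it.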

## Models download

If you want to be able to access gated HuggingFace repositories:

1. Log in to your HuggingFace account, go to https://huggingface.co/settings/tokens, create a READ access token and copy it
2. Paste your HuggingFace access token into the local file /workspace/hftoken
3. Load it in Python with the code below

In [2]:
with open("/workspace/hftoken", 'r') as file:
    myhftoken = file.read().strip()
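
If the token file may be absent (for example on a machine that only uses public models), a slightly more defensive variant can fall back to an environment variable. This is a sketch under assumptions: the helper name `read_hf_token` is hypothetical, and `HF_TOKEN` is used as the fallback variable name.

```python
import os

def read_hf_token(path="/workspace/hftoken", env_var="HF_TOKEN"):
    """Return the token stored in `path`, else the environment variable, else None."""
    if os.path.isfile(path):
        with open(path, "r") as file:
            token = file.read().strip()
        if token:
            return token
    # An empty or missing environment variable is reported as None.
    return os.environ.get(env_var) or None

myhftoken = read_hf_token()
```

With this variant `myhftoken` is `None` when no token is configured, which still allows public repositories to be downloaded.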

Download all models into the local HF cache and measure the size of the model files on disk:

In [6]:
import os
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.utils.hub import cached_file

memory_unit_mb = 1024*1024
memory_unit_gb = 1024*1024*1024

def get_directory_size(directory):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(directory):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

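def get_model_path_and_size_on_disk(pretrained_model_id):
    # Locate config.json in the local cache only (no network access)
    model_config_file = cached_file(pretrained_model_id, "config.json", local_files_only=True)
    # In the standard HF cache layout, config.json sits in <repo>/snapshots/<revision>/,
    # so going up two levels gives the snapshots directory of the repository
    model_directory = os.path.dirname(os.path.dirname(model_config_file))
    total_size = get_directory_size(model_directory)
    return model_directory, total_size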

def download_in_local_cache(pretrained_model_id, **kwargs):
    print(f"Loading model {pretrained_model_id} in local cache ...")
    # Downloads the tokenizer files
    AutoTokenizer.from_pretrained(pretrained_model_id, **kwargs)
    try:
        # Downloads the weights; the "meta" device avoids allocating real memory for them
        AutoModelForCausalLM.from_pretrained(pretrained_model_id, device_map="meta", **kwargs)
    except Exception as e:
        # Only the download matters here, so errors while instantiating the model are reported and ignored
        print("Ignored exception while loading the model in memory (not the goal here):")
        print(e)
    path, size = get_model_path_and_size_on_disk(pretrained_model_id)
    print(f"--> model files size   : {(size/memory_unit_gb):.2f} GB")
    print(f"--> stored in directory: {path}")
    print()
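
`os.path.getsize` follows symbolic links, so if the cache holds several snapshots that point to the same underlying file, or both `.bin` and `.safetensors` copies of the weights, the total above can be larger than expected. The following sketch, which assumes the standard HF cache layout where snapshot entries are symlinks to shared files, counts each underlying file once and breaks the total down by file extension. The function name `size_by_extension` is hypothetical.

```python
import os
from collections import defaultdict

def size_by_extension(directory):
    """Sum file sizes per extension, counting each underlying file only once."""
    sizes = defaultdict(int)
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            # Resolve symlinks so that two links to the same file are not counted twice
            real_path = os.path.realpath(os.path.join(dirpath, name))
            if real_path in seen or not os.path.isfile(real_path):
                continue
            seen.add(real_path)
            extension = os.path.splitext(name)[1] or "(none)"
            sizes[extension] += os.path.getsize(real_path)
    return dict(sizes)
```

Calling it on the directory returned by `get_model_path_and_size_on_disk` would show at a glance whether `.bin` files are taking space next to the `.safetensors` files.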

In [None]:
for model_id in models.values():
    download_in_local_cache(model_id, use_safetensors=True, token=myhftoken)

In [None]:
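# Download a single model, selected by its position in the dictionary (18 is "openllama1_13b")
model_name = list(models)[18]
download_in_local_cache(models[model_name], use_safetensors=True)  # add trust_remote_code=True if the model requires it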

Loading model openlm-research/open_llama_13b in local cache ...


/models/huggingface/transformers/models--togethercomputer--RedPajama-INCITE-7B-Base
/models/huggingface/transformers/models--mosaicml--mpt-7b
/models/huggingface/transformers/models--mosaicml--mpt-7b-8k
/models/huggingface/transformers/models--openlm-research--open_llama_7b_v2
/models/huggingface/transformers/models--togethercomputer--LLaMA-2-7B-32K
/models/huggingface/transformers/models--mistralai--Mistral-7B-v0.1
/models/huggingface/transformers/models--Qwen--Qwen-7B
/models/huggingface/transformers/models--01-ai--Yi-6B

/models/huggingface/transformers/models--openlm-research--open_llama_13b

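# Delete every *.bin weight file found under the current directory
find . -name "*.bin" | while read -r linkname; do
    target=$(readlink -f "$linkname")   # resolve the symlink to the file it points to
    rm -f "$target" "$linkname"
done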

In [None]:
find . -name "*.bin"
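
Since the removal above is irreversible, a dry run can be useful first. The sketch below lists each `.bin` entry together with the file it resolves to, without deleting anything. It is an illustration only, and the function name `find_bin_files` is hypothetical.

```python
import os

def find_bin_files(root):
    """Return (link_path, resolved_target_path) pairs for every *.bin file under root."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".bin"):
                link = os.path.join(dirpath, name)
                # realpath resolves a symlink to the file it points to
                pairs.append((link, os.path.realpath(link)))
    return pairs
```

Looping over `find_bin_files(".")` from the cache directory and printing each pair shows exactly which files the shell commands would remove.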