This lists various services that provide free access or credits towards API-based LLM usage.
Note
Please don't abuse these services, else we might lose them.
Warning
This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).
Limits:
20 requests/minute
50 requests/day
1,000 requests/day with a $10 lifetime top-up
Models share a common quota.
- DeepCoder 14B Preview
- DeepHermes 3 Llama 3 8B Preview
- DeepSeek R1
- DeepSeek R1 Distill Llama 70B
- DeepSeek R1 Distill Qwen 14B
- DeepSeek V3
- DeepSeek V3 0324
- DeepSeek V3 Base
- Dolphin 3.0 Mistral 24B
- Dolphin 3.0 R1 Mistral 24B
- Featherless Qwerky 72B
- Gemma 2 9B Instruct
- Gemma 3 12B Instruct
- Gemma 3 27B Instruct
- Gemma 3 4B Instruct
- Kimi VL A3B Thinking
- Llama 3.1 8B Instruct
- Llama 3.1 Nemotron Ultra 253B v1
- Llama 3.2 11B Vision Instruct
- Llama 3.2 1B Instruct
- Llama 3.3 70B Instruct
- Llama 3.3 Nemotron Super 49B v1
- Llama 4 Maverick
- Llama 4 Scout
- Mistral 7B Instruct
- Mistral Nemo
- Mistral Small 24B Instruct 2501
- Mistral Small 3.1 24B Instruct
- QwQ 32B ArliAI RpR v1
- Qwen 2.5 72B Instruct
- Qwen 2.5 VL 32B Instruct
- Qwen QwQ 32B
- Qwen2.5 Coder 32B Instruct
- Qwen2.5 VL 72B Instruct
- Reka Flash 3
- Shisa V2 Llama 3.3 70B
- deepseek/deepseek-r1-0528-qwen3-8b:free
- deepseek/deepseek-r1-0528:free
- google/gemma-3n-e4b-it:free
- microsoft/mai-ds-r1:free
- mistralai/devstral-small:free
- mistralai/mistral-small-3.2-24b-instruct:free
- moonshotai/kimi-dev-72b:free
- qwen/qwen3-14b:free
- qwen/qwen3-235b-a22b:free
- qwen/qwen3-30b-a3b:free
- qwen/qwen3-32b:free
- qwen/qwen3-8b:free
- sarvamai/sarvam-m:free
- thudm/glm-4-32b:free
- thudm/glm-z1-32b:free
- tngtech/deepseek-r1t-chimera:free
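
Because the models above share one quota, a client can track a single request history against both windows (20 requests/minute and 50 requests/day). The following is a minimal client-side sketch, not part of any provider SDK; the class name `SlidingWindowLimiter` and the constant `FREE_TIER_LIMITS` are hypothetical, and the provider's own accounting (for example, when the daily window resets) may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for several request limits that share one quota."""

    def __init__(self, limits, clock=time.monotonic):
        # limits: list of (max_requests, window_seconds) pairs
        self.limits = list(limits)
        self.clock = clock
        # One shared history, because the listed models share a common quota.
        self.history = deque()

    def wait_time(self):
        """Seconds to wait before the next request fits every window."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Forget requests older than the longest window.
        while self.history and now - self.history[0] >= longest:
            self.history.popleft()
        wait = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.history if now - t < window]
            if len(recent) >= max_requests:
                # The request that must leave the window before a new one fits.
                blocking = recent[len(recent) - max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self):
        """Call once per request actually sent."""
        self.history.append(self.clock())


# 20 requests/minute and 50 requests/day, as listed above.
FREE_TIER_LIMITS = [(20, 60), (50, 24 * 60 * 60)]
```

In use, a caller would run `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it. The injectable `clock` exists only to make the behavior easy to test.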
Data is used for training when the service is accessed from outside the UK/CH/EEA/EU.
Model Name | Model Limits |
---|---|
Gemini 2.5 Pro | 6,000,000 tokens/day<br>250,000 tokens/minute<br>100 requests/day<br>5 requests/minute |
Gemini 2.5 Flash | 250,000 tokens/minute<br>250 requests/day<br>10 requests/minute |
Gemini 2.0 Flash | 1,000,000 tokens/minute<br>200 requests/day<br>15 requests/minute |
Gemini 2.0 Flash-Lite | 1,000,000 tokens/minute<br>200 requests/day<br>30 requests/minute |
Gemini 2.0 Flash (Experimental) | 250,000 tokens/minute<br>50 requests/day<br>10 requests/minute |
Gemini 1.5 Flash | 250,000 tokens/minute<br>50 requests/day<br>15 requests/minute |
Gemini 1.5 Flash-8B | 250,000 tokens/minute<br>50 requests/day<br>15 requests/minute |
LearnLM 2.0 Flash (Experimental) | 1,500 requests/day<br>15 requests/minute |
Gemma 3 27B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
Gemma 3 12B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
Gemma 3 4B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
Gemma 3 1B Instruct | 15,000 tokens/minute<br>14,400 requests/day<br>30 requests/minute |
text-embedding-004 | 150 batch requests/minute<br>1,500 requests/minute<br>100 content/batch<br>Shared Quota |
embedding-001 |
Phone number verification required. Models tend to be context window limited.
Limits: 40 requests/minute
- Free tier (Experiment plan) requires opting into data training
- Requires phone number verification.
Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
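
As a rough worked example of how these three limits interact, the arithmetic below shows that the monthly token quota, not the per-minute limit, is the binding constraint for sustained use. The 30-day month is an assumption made for illustration; how the provider actually defines its monthly window is not stated here.

```python
TOKENS_PER_MINUTE = 500_000
TOKENS_PER_MONTH = 1_000_000_000
REQUESTS_PER_SECOND = 1

# Minutes of sustained maximum throughput before the monthly quota runs out.
minutes_at_full_rate = TOKENS_PER_MONTH / TOKENS_PER_MINUTE  # 2000.0
hours_at_full_rate = minutes_at_full_rate / 60  # about 33.3 hours

# Average tokens per request if both short-term limits are saturated.
tokens_per_request = TOKENS_PER_MINUTE / (REQUESTS_PER_SECOND * 60)  # about 8,333

# Even pace that spreads the monthly quota over a 30-day month (assumed length).
MINUTES_PER_MONTH = 30 * 24 * 60
sustainable_tokens_per_minute = TOKENS_PER_MONTH / MINUTES_PER_MONTH  # about 23,148
```

In other words, running at the full per-minute rate would exhaust the monthly quota in well under two days, so a steady workload needs to stay near 23,000 tokens/minute on average.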
- Currently free to use
- Monthly subscription based
- Requires phone number verification
Limits: 30 requests/minute, 2,000 requests/day
- Codestral
HuggingFace Serverless Inference is limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.
Limits: $0.10/month in credits
- Various open models across supported providers
Routes to various supported providers.
Limits: $5/month
Free tier restricted to 8K context.
Model Name | Model Limits |
---|---|
Qwen 3 32B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
Llama 4 Scout | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
Llama 3.1 8B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
Llama 3.3 70B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
Model Name | Model Limits |
---|---|
Allam 2 7B | 7,000 requests/day<br>6,000 tokens/minute |
DeepSeek R1 Distill Llama 70B | 1,000 requests/day<br>6,000 tokens/minute |
Distil Whisper Large v3 | 7,200 audio-seconds/minute<br>2,000 requests/day |
Gemma 2 9B Instruct | 14,400 requests/day<br>15,000 tokens/minute |
Groq compound-beta | 200 requests/day<br>70,000 tokens/minute |
Groq compound-beta-mini | 200 requests/day<br>70,000 tokens/minute |
Llama 3 70B | 14,400 requests/day<br>6,000 tokens/minute |
Llama 3 8B | 14,400 requests/day<br>6,000 tokens/minute |
Llama 3.1 8B | 14,400 requests/day<br>6,000 tokens/minute |
Llama 3.3 70B | 1,000 requests/day<br>12,000 tokens/minute |
Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day<br>6,000 tokens/minute |
Llama 4 Scout Instruct | 1,000 requests/day<br>30,000 tokens/minute |
Mistral Saba 24B | 1,000 requests/day<br>6,000 tokens/minute |
Qwen QwQ 32B | 1,000 requests/day<br>6,000 tokens/minute |
Whisper Large v3 | 7,200 audio-seconds/minute<br>2,000 requests/day |
Whisper Large v3 Turbo | 7,200 audio-seconds/minute<br>2,000 requests/day |
meta-llama/llama-guard-4-12b | 14,400 requests/day<br>15,000 tokens/minute |
meta-llama/llama-prompt-guard-2-22m | |
meta-llama/llama-prompt-guard-2-86m | |
qwen/qwen3-32b | 1,000 requests/day<br>6,000 tokens/minute |
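
The limit cells in the table above all follow a `<number> <unit>/<period>` pattern, so they can be turned into structured data for use in scripts. This is a small illustrative sketch; `LIMIT_PATTERN` and `parse_limits` are hypothetical names, and the regular expression is an assumption about the cell format rather than any official schema.

```python
import re

# <number> <unit>/<period>, e.g. "14,400 requests/day" or "7,200 audio-seconds/minute".
LIMIT_PATTERN = re.compile(r"(\d[\d,]*)\s+([A-Za-z][A-Za-z\- ]*?)/([A-Za-z]+)")


def parse_limits(cell):
    """Return {(unit, period): amount} for every limit found in a table cell."""
    limits = {}
    for amount, unit, period in LIMIT_PATTERN.findall(cell):
        limits[(unit, period)] = int(amount.replace(",", ""))
    return limits
```

For example, `parse_limits("14,400 requests/day 15,000 tokens/minute")` yields one entry keyed by `("requests", "day")` and one keyed by `("tokens", "minute")`. An empty cell yields an empty dictionary, and the entries may be separated by spaces or by any other non-matching text.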
Limits: Up to 60 requests/minute
Limits:
20 requests/minute
1,000 requests/month
Models share a common quota.
- Command-A
- Command-R7B
- Command-R+
- Command-R
- Aya Expanse 8B
- Aya Expanse 32B
- Aya Vision 8B
- Aya Vision 32B
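
With a quota this small (20 requests/minute, 1,000 requests/month), retrying immediately after a rejected request only wastes more of the quota. The sketch below computes a retry delay using capped exponential backoff with full jitter. It assumes the service signals rate limiting with HTTP 429 and may send a `Retry-After` value in seconds; neither is confirmed here for this provider, and `backoff_delay` is a hypothetical helper.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Trust the server's hint when one is given (assumed to be in seconds).
        return float(retry_after)
    # Exponential growth capped at `cap`, with full jitter to avoid bursts.
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

A caller would sleep for `backoff_delay(attempt)` after each rate-limited response and give up after a small, fixed number of attempts.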
Extremely restrictive input/output token limits.
Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)
- AI21 Jamba 1.5 Large
- AI21 Jamba 1.5 Mini
- Codestral 25.01
- Cohere Command A
- Cohere Command R
- Cohere Command R 08-2024
- Cohere Command R+
- Cohere Command R+ 08-2024
- Cohere Embed v3 English
- Cohere Embed v3 Multilingual
- DeepSeek-R1
- DeepSeek-R1-0528
- DeepSeek-V3-0324
- Grok 3
- Grok 3 Mini
- JAIS 30b Chat
- Llama 4 Maverick 17B 128E Instruct FP8
- Llama 4 Scout 17B 16E Instruct
- Llama-3.2-11B-Vision-Instruct
- Llama-3.2-90B-Vision-Instruct
- Llama-3.3-70B-Instruct
- MAI-DS-R1
- Meta-Llama-3-70B-Instruct
- Meta-Llama-3-8B-Instruct
- Meta-Llama-3.1-405B-Instruct
- Meta-Llama-3.1-70B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Ministral 3B
- Mistral Large 24.11
- Mistral Medium 3 (25.05)
- Mistral Nemo
- Mistral Small 3.1
- OpenAI GPT-4.1
- OpenAI GPT-4.1-mini
- OpenAI GPT-4.1-nano
- OpenAI GPT-4o
- OpenAI GPT-4o mini
- OpenAI Text Embedding 3 (large)
- OpenAI Text Embedding 3 (small)
- OpenAI o1
- OpenAI o1-mini
- OpenAI o1-preview
- OpenAI o3
- OpenAI o3-mini
- OpenAI o4-mini
- Phi-3-medium instruct (128k)
- Phi-3-medium instruct (4k)
- Phi-3-mini instruct (128k)
- Phi-3-mini instruct (4k)
- Phi-3-small instruct (128k)
- Phi-3-small instruct (8k)
- Phi-3.5-MoE instruct (128k)
- Phi-3.5-mini instruct (128k)
- Phi-3.5-vision instruct (128k)
- Phi-4
- Phi-4-Reasoning
- Phi-4-mini-instruct
- Phi-4-mini-reasoning
- Phi-4-multimodal-instruct
Distributed, decentralized crypto-based compute. Data is sent to individual hosts.
Limits: 200 requests/day
- Various open models
Limits: 10,000 neurons/day
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base (AWQ)
- Deepseek Coder 6.7B Instruct (AWQ)
- Deepseek Math 7B Instruct
- Discolm German 7B v1 (AWQ)
- Falcon 7B Instruct
- Gemma 2B Instruct (LoRA)
- Gemma 3 12B Instruct
- Gemma 7B Instruct
- Gemma 7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 13B Chat (AWQ)
- Llama 2 7B Chat (FP16)
- Llama 2 7B Chat (INT8)
- Llama 2 7B Chat (LoRA)
- Llama 3 8B Instruct
- Llama 3 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (AWQ)
- Llama 3.1 8B Instruct (FP8)
- Llama 3.2 11B Vision Instruct
- Llama 3.2 1B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct (FP8)
- Llama 4 Scout Instruct
- Llama Guard 3 8B
- LlamaGuard 7B (AWQ)
- Mistral 7B Instruct v0.1
- Mistral 7B Instruct v0.1 (AWQ)
- Mistral 7B Instruct v0.2
- Mistral 7B Instruct v0.2 (LoRA)
- Mistral Small 3.1 24B Instruct
- Neural Chat 7B v3.1 (AWQ)
- OpenChat 3.5 0106
- OpenHermes 2.5 Mistral 7B (AWQ)
- Phi-2
- Qwen 1.5 0.5B Chat
- Qwen 1.5 1.8B Chat
- Qwen 1.5 14B Chat (AWQ)
- Qwen 1.5 7B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct
- Qwen QwQ 32B
- SQLCoder 7B 2
- Starling LM 7B Beta
- TinyLlama 1.1B Chat v1.0
- Una Cybertron 7B v2 (BF16)
- Zephyr 7B Beta (AWQ)
Very stringent payment verification for Google Cloud.
Model Name | Model Limits |
---|---|
Llama 3.2 90B Vision Instruct | 30 requests/minute<br>Free during preview |
Llama 3.1 70B Instruct | 60 requests/minute<br>Free during preview |
Llama 3.1 8B Instruct | 60 requests/minute<br>Free during preview |
DeepSeek R1-0528 | 60 requests/minute<br>Free during preview |
Credits: $1 when you add a payment method
Models: Various open models
Credits: $1
Models: Various open models
Credits: $30
Models: Any supported model - pay by compute time
Credits: $1
Models: Various open models
Credits: $0.50 for 1 year; $10 for 3 months for LLMs with a referral code and a connected GitHub account
Models: Various open models
Credits: $10 for 3 months
Models: Jamba family of models
Credits: $10 for 3 months
Models: Solar Pro/Mini
Credits: $15
Requirements: Phone number verification
Models: Various open models
Credits: 1 million tokens/model
Models: Various open and proprietary Qwen models
Credits: $5/month upon sign up, $30/month with payment method added
Models: Any supported model - pay by compute time
Credits: $1, $25 on responding to email survey
Models: Various open models
Credits: $1
Models: Various open models
Credits: $5
Models:
- BGE-M3
- DeepSeek-R1-0528
- DeepSeek-V3-0324
- Gemma 3 27B
- Magistral Small
- Meta Llama 3.1 8B
- Meta Llama 3.3 70B
- Meta Llama 4 Maverick
- Meta Llama 4 Scout
- Mistral NeMo
- Mistral Small
- Qwen2.5-VL 7B
- Qwen3-235B-A22B
- kluster reliability check
Credits: $1
Models:
- DeepSeek V3
- DeepSeek V3 0324
- Hermes 3 Llama 3.1 70B
- Llama 3 70B Instruct
- Llama 3.1 405B Base
- Llama 3.1 405B Base (FP8)
- Llama 3.1 405B Instruct
- Llama 3.1 70B Instruct
- Llama 3.1 8B Instruct
- Llama 3.2 3B Instruct
- Llama 3.3 70B Instruct
- Pixtral 12B (2409)
- Qwen QwQ 32B
- Qwen QwQ 32B Preview
- Qwen2.5 72B Instruct
- Qwen2.5 Coder 32B Instruct
- Qwen2.5 VL 72B Instruct
- Qwen2.5 VL 7B Instruct
Credits: $5 for 3 months
Models:
- E5-Mistral-7B-Instruct
- Llama 3.1 8B
- Llama 3.3 70B
- Llama-4-Maverick-17B-128E-Instruct
- Qwen/Qwen3-32B
- Whisper-Large-v3
- deepseek-ai/DeepSeek-R1-0528
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- deepseek-ai/DeepSeek-V3-0324
Credits: 1,000,000 free tokens
Models:
- BGE-Multilingual-Gemma2
- DeepSeek R1 Distill Llama 70B
- Gemma 3 27B Instruct
- Llama 3.1 70B Instruct
- Llama 3.1 8B Instruct
- Llama 3.3 70B Instruct
- Mistral Nemo 2407
- Mistral Small 3.1 24B Instruct 2503
- Pixtral 12B (2409)
- Qwen2.5 Coder 32B Instruct