
bright-spark/free-llm-api-resources

 
 


Free LLM API resources

This is a list of services that provide free access or credits for API-based LLM usage.

Note

Please don't abuse these services, otherwise we might lose them.

Warning

This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

Free Providers

Limits:

20 requests/minute
50 requests/day
1000 requests/day with $10 credit balance

Models share a common quota.
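
The per-minute and per-day caps apply together, and every model draws on one quota. A client can therefore track both caps in a single shared limiter. The sketch below is a sliding-window log in Python. It is purely client-side bookkeeping and not the provider's API. It treats "per day" as a rolling 24-hour window, which is an assumption: a rolling window is at least as strict as a calendar-day reset. The class name `MultiWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Client-side sliding-window log enforcing several caps at once.

    limits: list of (max_requests, window_seconds) pairs.
    Share one instance across all models when the quota is common.
    """

    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits
        self.clock = clock
        # Timestamps of accepted requests, oldest first.
        self.history = deque()
        # Timestamps older than the longest window can never matter again.
        self.longest = max(window for _, window in limits)

    def _prune(self, now):
        while self.history and now - self.history[0] >= self.longest:
            self.history.popleft()

    def try_acquire(self):
        """Record a request and return True if every window has room, else False."""
        now = self.clock()
        self._prune(now)
        for max_requests, window in self.limits:
            recent = sum(1 for t in self.history if now - t < window)
            if recent >= max_requests:
                return False
        self.history.append(now)
        return True
```

For the limits above, use `MultiWindowLimiter([(20, 60), (50, 86_400)])`. With a $10 credit balance, the daily pair would become `(1_000, 86_400)`.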

Data is used for training when the service is used from outside the UK/CH/EEA/EU.

Model limits:

  • Gemini 2.5 Pro (Experimental): 1,000,000 tokens/day, 250,000 tokens/minute, 25 requests/day, 5 requests/minute
  • Gemini 2.5 Flash (Preview): 250,000 tokens/minute, 500 requests/day, 10 requests/minute
  • Gemini 2.0 Flash: 1,000,000 tokens/minute, 1,000 requests/day, 15 requests/minute
  • Gemini 2.0 Flash-Lite: 1,000,000 tokens/minute, 1,500 requests/day, 30 requests/minute
  • Gemini 2.0 Flash (Experimental): 250,000 tokens/minute, 500 requests/day, 10 requests/minute
  • Gemini 1.5 Flash: 250,000 tokens/minute, 500 requests/day, 15 requests/minute
  • Gemini 1.5 Flash-8B: 250,000 tokens/minute, 500 requests/day, 15 requests/minute
  • LearnLM 1.5 Pro (Experimental): 1,500 requests/day, 15 requests/minute
  • Gemma 3 27B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • Gemma 3 12B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • Gemma 3 4B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • Gemma 3 1B Instruct: 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute
  • text-embedding-004: 150 batch requests/minute, 1,500 requests/minute, 100 content/batch (quota shared with embedding-001)
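
Which cap actually binds depends on request size. Each cap can be converted to a requests/day ceiling, and the smallest ceiling wins. This rough Python sketch assumes traffic is spread evenly over the day. It also assumes the token limits count prompt plus completion tokens, which the table does not specify. The function name is illustrative.

```python
def max_requests_per_day(rpm=None, rpd=None, tpm=None, tpd=None, tokens_per_request=1):
    """Most requests/day a steady workload can make under every listed cap.

    Caps left as None are ignored. Token caps are divided by the
    (assumed) total tokens per request, rounded down per window.
    """
    minutes_per_day = 24 * 60
    ceilings = []
    if rpm is not None:
        ceilings.append(rpm * minutes_per_day)
    if rpd is not None:
        ceilings.append(rpd)
    if tpm is not None:
        # Only whole requests fit inside each minute's token budget.
        ceilings.append(tpm // tokens_per_request * minutes_per_day)
    if tpd is not None:
        ceilings.append(tpd // tokens_per_request)
    if not ceilings:
        raise ValueError("at least one limit is required")
    return min(ceilings)
```

Consider Gemma 3 27B Instruct with 2,000-token requests. `max_requests_per_day(rpm=30, rpd=14_400, tpm=15_000, tokens_per_request=2_000)` gives 10,080. The tokens/minute cap allows only 7 such requests per minute, which keeps the total below the 14,400 requests/day cap. For Gemini 2.5 Pro (Experimental), the 25 requests/day cap binds for any request of up to 40,000 tokens.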

Phone number verification is required. Models tend to have limited context windows.

Limits: 40 requests/minute

  • The free tier (Experiment plan) requires opting into data training.
  • Requires phone number verification.

Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
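
At the per-minute token cap, one model could in principle use up the monthly allowance in under a day and a half. For sustained use, the monthly cap is therefore the one that binds. Here is a quick check, assuming a 30-day month:

```latex
\begin{aligned}
500{,}000\ \tfrac{\text{tokens}}{\text{min}} \times 60 \times 24 \times 30 &= 2.16 \times 10^{10}\ \text{tokens/month} \gg 10^{9}\ \text{tokens/month} \\
\frac{10^{9}\ \text{tokens/month}}{30 \times 24 \times 60\ \text{min/month}} &\approx 23{,}148\ \text{tokens/min on average}
\end{aligned}
```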

  • Currently free to use
  • Monthly subscription based
  • Requires phone number verification

Limits: 30 requests/minute, 2,000 requests/day

  • Codestral

HuggingFace Serverless Inference is limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.
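
To judge whether a model is likely to fall under the 10GB cutoff, estimate its weight size as parameter count times bytes per parameter. This Python sketch ignores tokenizer and config files and assumes GB means 10^9 bytes. How Hugging Face actually measures the limit is not specified here, so treat the result only as a rough guide.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def approx_weight_size_gb(num_params, dtype="fp16"):
    """Rough size of the model weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def under_size_limit(num_params, dtype="fp16", limit_gb=10):
    """True if the estimated weight size is below the (assumed) cutoff."""
    return approx_weight_size_gb(num_params, dtype) < limit_gb
```

`approx_weight_size_gb(7e9)` is 14.0, so a 7B model stored in FP16 would exceed the cutoff. A 3B FP16 model (about 6 GB) would not.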

Limits: $0.10/month in credits

  • Various open models across supported providers

The free tier is restricted to an 8K context.

Model limits:

  • Llama 4 Scout: 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Llama 3.1 8B: 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
  • Llama 3.3 70B: 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day
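
With an 8K context, long chat histories have to be trimmed before each request. The Python sketch below keeps the system message plus the newest turns that fit, and reserves room for the reply. It assumes 8K means 8,192 tokens. It also uses a crude four-characters-per-token estimate in place of each model's real tokenizer. All names here are hypothetical.

```python
def estimate_tokens(text):
    # Crude heuristic: roughly 4 characters per token for English text.
    # Real counts depend on each model's tokenizer.
    return max(1, len(text) // 4)


def fit_to_context(messages, context_tokens=8_192, reserve_for_output=1_024, count=estimate_tokens):
    """Keep the newest messages whose estimated size fits the context budget.

    messages: list of {"role": ..., "content": ...} dicts, oldest first.
    A leading system message, if present, is always kept.
    """
    budget = context_tokens - reserve_for_output
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]
    used = sum(count(m["content"]) for m in system)
    kept = []
    # Walk backwards from the newest message until the budget runs out.
    for message in reversed(rest):
        cost = count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

Stopping at the first message that does not fit keeps the conversation contiguous. The alternative of skipping it and packing older, smaller turns could leave gaps in the dialogue.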

Model limits:

  • Allam 2 7B: 7,000 requests/day, 6,000 tokens/minute
  • DeepSeek R1 Distill Llama 70B: 1,000 requests/day, 6,000 tokens/minute
  • Distil Whisper Large v3: 7,200 audio-seconds/minute, 2,000 requests/day
  • Gemma 2 9B Instruct: 14,400 requests/day, 15,000 tokens/minute
  • Groq compound-beta: 200 requests/day, 70,000 tokens/minute
  • Groq compound-beta-mini: 200 requests/day, 70,000 tokens/minute
  • Llama 3 70B: 14,400 requests/day, 6,000 tokens/minute
  • Llama 3 8B: 14,400 requests/day, 6,000 tokens/minute
  • Llama 3.1 8B: 14,400 requests/day, 6,000 tokens/minute
  • Llama 3.3 70B: 1,000 requests/day, 12,000 tokens/minute
  • Llama 4 Maverick 17B 128E Instruct: 1,000 requests/day, 6,000 tokens/minute
  • Llama 4 Scout Instruct: 1,000 requests/day, 30,000 tokens/minute
  • Llama Guard 3 8B: 14,400 requests/day, 15,000 tokens/minute
  • Mistral Saba 24B: 1,000 requests/day, 6,000 tokens/minute
  • Qwen QwQ 32B: 1,000 requests/day, 6,000 tokens/minute
  • Whisper Large v3: 7,200 audio-seconds/minute, 2,000 requests/day
  • Whisper Large v3 Turbo: 7,200 audio-seconds/minute, 2,000 requests/day

Model limits:

  • DeepSeek R1 Distill Llama 70B: 12 requests/minute
  • Llama 3.1 70B Instruct: 12 requests/minute
  • Llama 3.1 8B Instruct: 12 requests/minute
  • Llama 3.3 70B Instruct: 12 requests/minute
  • Llava Next Mistral 7B: 12 requests/minute
  • Mamba Codestral 7B v0.1: 12 requests/minute
  • Mistral 7B Instruct v0.3: 12 requests/minute
  • Mistral Nemo 2407: 12 requests/minute
  • Mixtral 8x7B Instruct v0.1: 12 requests/minute
  • Qwen 2.5 VL 72B Instruct: 12 requests/minute
  • Qwen2.5 Coder 32B Instruct: 12 requests/minute

Limits: Up to 60 requests/minute
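
When a rate limit like this is hit, HTTP APIs typically answer with status 429 (Too Many Requests), and retrying immediately only wastes quota. A common pattern is exponential backoff with jitter, sketched here in Python. `RateLimited` is a hypothetical exception that your own request function would raise on a 429. The `sleep` and `rng` parameters are injectable so the logic can be tested without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by the caller's request function on an HTTP 429."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call request(); on RateLimited, back off exponentially and retry.

    Re-raises RateLimited after max_attempts failed attempts.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            # "Full jitter": wait a random fraction of the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * rng())
```

If the provider sends a `Retry-After` header, waiting that long instead of the computed delay is usually the better choice.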

Limits:

20 requests/minute
1,000 requests/month

Models share a common quota.

  • Command-A
  • Command-R7B
  • Command-R+
  • Command-R
  • Aya Expanse 8B
  • Aya Expanse 32B
  • Aya Vision 8B
  • Aya Vision 32B

Extremely restrictive input/output token limits.

Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)

  • AI21 Jamba 1.5 Large
  • AI21 Jamba 1.5 Mini
  • Codestral 25.01
  • Cohere Command A
  • Cohere Command R
  • Cohere Command R 08-2024
  • Cohere Command R+
  • Cohere Command R+ 08-2024
  • Cohere Embed 4
  • Cohere Embed v3 English
  • Cohere Embed v3 Multilingual
  • DeepSeek-R1
  • DeepSeek-V3-0324
  • JAIS 30b Chat
  • Llama 4 Maverick 17B 128E Instruct FP8
  • Llama 4 Scout 17B 16E Instruct
  • Llama-3.2-11B-Vision-Instruct
  • Llama-3.2-90B-Vision-Instruct
  • Llama-3.3-70B-Instruct
  • MAI-DS-R1
  • Meta-Llama-3-70B-Instruct
  • Meta-Llama-3-8B-Instruct
  • Meta-Llama-3.1-405B-Instruct
  • Meta-Llama-3.1-70B-Instruct
  • Meta-Llama-3.1-8B-Instruct
  • Ministral 3B
  • Mistral Large (2407)
  • Mistral Large 24.11
  • Mistral Nemo
  • Mistral Small 3.1
  • OpenAI GPT-4.1
  • OpenAI GPT-4.1-mini
  • OpenAI GPT-4.1-nano
  • OpenAI GPT-4o
  • OpenAI GPT-4o mini
  • OpenAI Text Embedding 3 (large)
  • OpenAI Text Embedding 3 (small)
  • OpenAI o1
  • OpenAI o1-mini
  • OpenAI o1-preview
  • OpenAI o3
  • OpenAI o3-mini
  • OpenAI o4-mini
  • Phi-3-medium instruct (128k)
  • Phi-3-medium instruct (4k)
  • Phi-3-mini instruct (128k)
  • Phi-3-mini instruct (4k)
  • Phi-3-small instruct (128k)
  • Phi-3-small instruct (8k)
  • Phi-3.5-MoE instruct (128k)
  • Phi-3.5-mini instruct (128k)
  • Phi-3.5-vision instruct (128k)
  • Phi-4
  • Phi-4-Reasoning
  • Phi-4-mini-instruct
  • Phi-4-mini-reasoning
  • Phi-4-multimodal-instruct

Distributed, decentralized crypto-based compute. Data is sent to individual hosts.

  • DeepCoder 14B Preview
  • DeepSeek R1
  • DeepSeek R1-Zero
  • DeepSeek V3
  • DeepSeek V3 0324
  • DeepSeek V3 Base
  • Dolphin 3.0 Mistral 24B
  • Dolphin 3.0 R1 Mistral 24B
  • Llama 3.1 Nemotron Ultra 253B v1
  • Llama 4 Maverick 17B 128E Instruct FP8
  • Llama 4 Scout 17B 16E Instruct
  • QwQ 32B ArliAI RpR v1
  • Qwen 2.5 VL 32B Instruct
  • Shisa V2 Llama 3.3 70B
  • chutesai/llama-3.1-405b-fp8
  • deepseek-ai/deepseek-prover-v2-671b
  • microsoft/mai-ds-r1-fp8
  • qwen/qwen3-14b
  • qwen/qwen3-235b-a22b
  • qwen/qwen3-30b-a3b
  • qwen/qwen3-32b
  • qwen/qwen3-8b
  • thudm/glm-4-32b-0414
  • thudm/glm-z1-32b-0414
  • tngtech/deepseek-r1t-chimera

Limits: 10,000 neurons/day

  • DeepSeek R1 Distill Qwen 32B
  • Deepseek Coder 6.7B Base (AWQ)
  • Deepseek Coder 6.7B Instruct (AWQ)
  • Deepseek Math 7B Instruct
  • Discolm German 7B v1 (AWQ)
  • Falcon 7B Instruct
  • Gemma 2B Instruct (LoRA)
  • Gemma 3 12B Instruct
  • Gemma 7B Instruct
  • Gemma 7B Instruct (LoRA)
  • Hermes 2 Pro Mistral 7B
  • Llama 2 13B Chat (AWQ)
  • Llama 2 7B Chat (FP16)
  • Llama 2 7B Chat (INT8)
  • Llama 2 7B Chat (LoRA)
  • Llama 3 8B Instruct
  • Llama 3 8B Instruct
  • Llama 3 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct
  • Llama 3.1 8B Instruct (AWQ)
  • Llama 3.1 8B Instruct (FP8)
  • Llama 3.2 11B Vision Instruct
  • Llama 3.2 1B Instruct
  • Llama 3.2 3B Instruct
  • Llama 3.3 70B Instruct (FP8)
  • Llama 4 Scout Instruct
  • Llama Guard 3 8B
  • LlamaGuard 7B (AWQ)
  • Mistral 7B Instruct v0.1
  • Mistral 7B Instruct v0.1 (AWQ)
  • Mistral 7B Instruct v0.2
  • Mistral 7B Instruct v0.2 (LoRA)
  • Mistral Small 3.1 24B Instruct
  • Neural Chat 7B v3.1 (AWQ)
  • OpenChat 3.5 0106
  • OpenHermes 2.5 Mistral 7B (AWQ)
  • Phi-2
  • Qwen 1.5 0.5B Chat
  • Qwen 1.5 1.8B Chat
  • Qwen 1.5 14B Chat (AWQ)
  • Qwen 1.5 7B Chat (AWQ)
  • Qwen 2.5 Coder 32B Instruct
  • Qwen QwQ 32B
  • SQLCoder 7B 2
  • Starling LM 7B Beta
  • TinyLlama 1.1B Chat v1.0
  • Una Cybertron 7B v2 (BF16)
  • Zephyr 7B Beta (AWQ)

Google Cloud has very stringent payment verification.

Model limits:

  • Gemini 2.5 Pro (Experimental): 10 requests/minute (quota shared with Gemini 2.0 Flash (Experimental), Gemini 2.0 Flash Thinking (Experimental) and Gemini 2.0 Pro (Experimental))
  • Llama 3.2 90B Vision Instruct: 30 requests/minute, free during preview
  • Llama 3.1 70B Instruct: 60 requests/minute, free during preview
  • Llama 3.1 8B Instruct: 60 requests/minute, free during preview

Providers with trial credits

Credits: $1 when you add a payment method

Models: Various open models

Credits: $1

Models: Various open models

Credits: $5 when you add a payment method

Models: Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.)
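
How far a dollar-denominated credit goes depends entirely on each model's per-token price, which varies widely between open and proprietary models. Below is a minimal Python sketch. It assumes a single blended price per million tokens, whereas real pricing usually differs for input and output tokens.

```python
def tokens_for_credit(credit_usd, usd_per_million_tokens):
    """Tokens a credit balance buys at a flat (assumed) per-million-token price."""
    if usd_per_million_tokens <= 0:
        raise ValueError("price must be positive")
    return int(credit_usd / usd_per_million_tokens * 1_000_000)
```

At a hypothetical $0.50 per million tokens, the $5 credit covers 10,000,000 tokens. At a hypothetical $15 per million, it covers only 333,333.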

Credits: $30

Models: Any supported model (pay by compute time)

Credits: $1

Models: Various open models

Credits: $0.50 for 1 year; $10 for 3 months for LLMs with a referral code and a connected GitHub account

Models: Various open models

Credits: $10 for 3 months

Models: Jamba family of models

Credits: $10 for 3 months

Models: Solar Pro/Mini

Credits: $15

Requirements: Phone number verification

Models: Various open models

Credits: 1 million tokens/model

Models: Various open and proprietary Qwen models

Credits: $5/month upon sign-up, $30/month with a payment method added

Models: Any supported model (pay by compute time)

Credits: $1; $25 after responding to an email survey

Models: Various open models

Credits: $1

Models: Various open models

Credits: $5

Models: Various open models

Credits: $1

Models: Various open models

Credits: $1

Models:

  • DeepSeek V3
  • DeepSeek V3 0324
  • Hermes 3 Llama 3.1 70B
  • Llama 3 70B Instruct
  • Llama 3.1 405B Base
  • Llama 3.1 405B Base (FP8)
  • Llama 3.1 405B Instruct
  • Llama 3.1 70B Instruct
  • Llama 3.1 8B Instruct
  • Llama 3.2 3B Instruct
  • Llama 3.3 70B Instruct
  • Pixtral 12B (2409)
  • Qwen QwQ 32B
  • Qwen QwQ 32B Preview
  • Qwen2.5 72B Instruct
  • Qwen2.5 Coder 32B Instruct
  • Qwen2.5 VL 72B Instruct
  • Qwen2.5 VL 7B Instruct

Credits: $5 for 3 months

Models:

  • E5-Mistral-7B-Instruct
  • Llama 3.1 405B
  • Llama 3.1 8B
  • Llama 3.2 1B
  • Llama 3.2 3B
  • Llama 3.3 70B
  • Llama-4-Maverick-17B-128E-Instruct
  • Llama-4-Scout-17B-16E-Instruct
  • Llama-Guard-3-8B
  • Qwen/QwQ-32B
  • Qwen/Qwen2-Audio-7B-Instruct
  • Qwen/Qwen3-32B
  • deepseek-ai/DeepSeek-R1
  • deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • deepseek-ai/DeepSeek-V3-0324

Credits: 1,000,000 free tokens

Models:

  • BGE-Multilingual-Gemma2
  • DeepSeek R1 Distill Llama 70B
  • DeepSeek R1 Distill Llama 8B
  • Gemma 3 27B Instruct
  • Llama 3.1 70B Instruct
  • Llama 3.1 8B Instruct
  • Llama 3.3 70B Instruct
  • Mistral Nemo 2407
  • Mistral Small 3.1 24B Instruct 2503
  • Pixtral 12B (2409)
  • Qwen2.5 Coder 32B Instruct

About

A list of free LLM inference resources accessible via API.
