# 🎓 FrugalGPT: Better Quality and Lower Cost for LLM Apps

This notebook illustrates the FrugalGPT framework for _building LLM Applications with budget constraints._

Below, we demonstrate two techniques that FrugalGPT offers for building cost-efficient LLM applications. The first, LLMforAll, lets users query various LLM APIs through a unified interface. The second, LLMCascade, automates and optimizes the query process under a user-defined budget constraint.



## Installation
Let us start by making FrugalGPT importable (install it first if you haven't yet!). The cell below adds the local `src/` directory to the Python path.

In [1]:
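# Reload edited modules automatically before each cell runs
%load_ext autoreload
%autoreload 2
import sys, json, copy
import logging
# Disable all logging output (every level up to and including CRITICAL)
logging.disable(logging.CRITICAL)
# Make the FrugalGPT sources in src/ importable
sys.path.append("src/")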

## Setup
Next, let us set up the environment. FrugalGPT currently uses LLM APIs from OpenAI (including ChatGPT and GPT-4), AI21, Cohere, TextSynth, and Anthropic, so we need to provide their API keys. You can still run the notebook without the keys, but they are required to run FrugalGPT on your own queries.

In [2]:
import os
import yaml
import wandb

# Read API keys and settings from a local YAML file
with open("myconfig.yml") as f:
    config = yaml.safe_load(f)

# Export each provider's API key as an environment variable
for key in ['OPENAI_API_KEY', 'AI21_STUDIO_API_KEY', 'COHERE_STUDIO_API_KEY',
            'TEXTSYNTH_API_SECRET_KEY', 'ANTHROPIC_API_KEY']:
    os.environ[key] = config[key]

# Start a Weights & Biases run for this notebook
wandb.login()
wandb.init(project=config['WANDB_PROJECT'])

from IPython.display import display
import FrugalGPT
# List every service name FrugalGPT can route queries to
supported_LLM = FrugalGPT.getservicename()
print("supported LLMs:", supported_LLM)

supported LLMs: ['textsynth/gptneox_20B', 'textsynth/fairseq_gpt_13B', 'textsynth/gptj_6B', 'openai/text-davinci-002', 'openai/text-davinci-003', 'openai/text-curie-001', 'openai/text-babbage-001', 'openai/text-ada-001', 'openaichat/gpt-3.5-turbo', 'openaichat/gpt-4', 'ai21/j1-jumbo', 'ai21/j1-grande', 'ai21/j1-large', 'ai21/j2-ultra', 'ai21/j2-mid', 'ai21/j2-light', 'cohere/command', 'cohere/base', 'cohere/xlarge', 'cohere/medium', 'anthropic/claude-1', 'anthropic/claude-instant-1', 'anthropic/claude-1-100k']
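The setup cell indexes `config[...]` directly, so any key missing from `myconfig.yml` stops the cell with a `KeyError`. If you only have keys for some providers, a more forgiving variant could export whatever is present and report the rest. The sketch below assumes the YAML file parses into a flat dict mapping key names to strings; `export_api_keys` is a hypothetical helper, not part of FrugalGPT.

```python
import os
from typing import Dict, List, Sequence

# Environment variables that the setup cell above populates.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "AI21_STUDIO_API_KEY",
    "COHERE_STUDIO_API_KEY",
    "TEXTSYNTH_API_SECRET_KEY",
    "ANTHROPIC_API_KEY",
)


def export_api_keys(config: Dict[str, str],
                    required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Export every required key present in `config` to os.environ; return the missing names."""
    missing = []
    for key in required:
        value = config.get(key)
        if value:
            os.environ[key] = str(value)
        else:
            missing.append(key)
    return missing


# Example: a config that only provides an OpenAI key (placeholder value).
missing = export_api_keys({"OPENAI_API_KEY": "sk-placeholder"})
print("missing keys:", missing)
```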


## 1. LLMforAll: One interface for all LLM services
Let us first look at an example of LLMforAll, an interface that unifies all supported LLM services.

In [3]:
MyLLMforAll = FrugalGPT.LLMforAll()
query = "Question: Who is Matei Zaharia in 2023?\nAnswer:"
# Use the last supported service (anthropic/claude-1-100k in the list above)
service_name = supported_LLM[-1]
# Short, near-deterministic generation
genparams = FrugalGPT.GenerationParameter(max_tokens=50, temperature=0.1, stop=['\n\n\n\n'])
answer = MyLLMforAll.get_completion(query, service_name, genparams=genparams)
# Cost of the query above
cost = MyLLMforAll.get_cost()
print("API:", service_name, "answer:", answer, "cost:", cost)

API: anthropic/claude-1-100k answer:  I apologize, but I do not actually know who Matei Zaharia will be in 2023 or specifics about his future career. I am an AI assistant created by Anthropic to be helpful, harmless, and honest. cost: 0.0016795999999999998


The code above shows how to use LLMforAll. Its get_completion method provides a unified interface to all LLMs: it takes the query, the service name, and the generation parameters (such as temperature) as input, and returns the corresponding generation. The cost of the call can then be obtained by calling get_cost().
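Service names follow a `provider/model` pattern (for example `openai/text-davinci-003`), which suggests how a unified interface can work: split the name, route the query to a provider-specific backend, and price the call from its token counts. The following is a minimal sketch of that idea, not FrugalGPT's implementation; `UnifiedLLM`, `GenParams`, `echo_backend`, and the per-token prices are all hypothetical, and the fake backend stands in for a real API client.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GenParams:
    """Hypothetical stand-in for FrugalGPT.GenerationParameter."""
    max_tokens: int = 50
    temperature: float = 0.1
    stop: List[str] = field(default_factory=list)


# A backend takes (model, query, params) and returns (text, prompt_tokens, completion_tokens).
Backend = Callable[[str, str, GenParams], Tuple[str, int, int]]


class UnifiedLLM:
    """Route 'provider/model' service names to provider backends and track cost."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}
        # (input, output) price per token for each full service name
        self._prices: Dict[str, Tuple[float, float]] = {}
        self._last_cost = 0.0

    def register(self, service_name: str, backend: Backend,
                 price_in: float, price_out: float) -> None:
        provider = service_name.split("/", 1)[0]
        self._backends[provider] = backend
        self._prices[service_name] = (price_in, price_out)

    def get_completion(self, query: str, service_name: str, genparams: GenParams) -> str:
        provider, model = service_name.split("/", 1)
        text, n_in, n_out = self._backends[provider](model, query, genparams)
        price_in, price_out = self._prices[service_name]
        self._last_cost = n_in * price_in + n_out * price_out
        return text

    def get_cost(self) -> float:
        """Cost of the most recent get_completion call."""
        return self._last_cost


def echo_backend(model: str, query: str, params: GenParams) -> Tuple[str, int, int]:
    """Fake provider: answers with the model name and counts whitespace-separated tokens."""
    text = f"[{model}] ok"
    return text, len(query.split()), len(text.split())


llm = UnifiedLLM()
llm.register("fake/small", echo_backend, price_in=1e-6, price_out=2e-6)
answer = llm.get_completion("Who is Matei Zaharia?", "fake/small", GenParams())
print(answer, llm.get_cost())
```

Adding a provider then only means registering one more backend; callers keep using the same `get_completion` / `get_cost` pair.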

In [4]:
responses = MyLLMforAll.get_completion_allservice(query, supported_LLM, genparams=genparams)
print("full responses")
display(responses)

full responses


| | service | answer | cost |
|---|---|---|---|
| 0 | textsynth/gptneox_20B | Matei Zaharia is a computer scientist who wor... | 0.0017724 |
| 1 | textsynth/fairseq_gpt_13B | Matei Zaharia is a Romanian politician and me... | 0.000759 |
| 2 | textsynth/gptj_6B | The world’s first self-driving car.\n\nMatei ... | 0.000253 |
| 3 | openai/text-davinci-002 | Matei Zaharia is a Romanian-American computer... | 0.00108 |
| 4 | openai/text-davinci-003 | In 2023, Matei Zaharia is a computer scientis... | 0.0013 |
| 5 | openai/text-curie-001 | Matei Zaharia will be 73 years old in 2023. | 5.6e-05 |
| 6 | openai/text-babbage-001 | Matei Zaharia is a Romanian-born American bus... | 1.5e-05 |
| 7 | openai/text-ada-001 | \n\nMatei Zaharia is a Russian-born American b... | 2e-05 |
| 8 | openaichat/gpt-3.5-turbo | As an AI language model, I cannot predict the ... | 0.000146 |
| 9 | openaichat/gpt-4 | As an AI, I cannot predict the future or provi... | 0.00366 |


## 2. LLMCascade: Optimizing performance within budget constraints 
Next let us use LLMCascade to automatically optimize the overall performance given a budget constraint.

### Example usage: predicting gold price trends from financial news
Let us first create a few NLP queries that ask an LLM to predict gold price trends from news headlines.

In [5]:
# Each entry: [query, reference answer, query id]
dev = [['Q: april gold down 20 cents to settle at $1,116.10/oz\nA:', 'down', '0'],
       ['Q: gold suffers third straight daily decline\nA:', 'down', '1'],
       ['Q: Gold futures edge up after two-session decline\nA:', 'up', '2'],
       ['Q: Dec. gold climbs $9.40, or 0.7%, to settle at $1,356.90/oz\nA:', 'up', '3'],
       ['Q: Gold struggles; silver slides, base metals falter\nA:', 'up', '4'],
       ['Q: feb. gold ends up $9.60, or 1.1%, at $901.60 an ounce\nA:', 'up', '5'],
       ['Q: dent research : is gold\'s day in the sun coming soon?\nA:', 'none', '6']]
# Prompt prefix shared by all queries
with open('config/prompt/HEADLINES/prefix_e8.txt') as f:
    prefix = f.read()
# Keep an unmodified copy of the raw queries for display later
raw_data = copy.deepcopy(dev)
data = FrugalGPT.formatdata(dev, prefix)
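`formatdata` combines each raw query with the prompt prefix read from `prefix_e8.txt`. Its exact behavior is not shown in this notebook; the sketch below assumes it simply prepends the prefix to every query and passes the reference answer and id through unchanged. `format_data` and the one-line example prefix are hypothetical. (The notebook keeps a deep copy in `raw_data`, which suggests `formatdata` may modify its input in place; this sketch builds new lists instead.)

```python
from typing import List


def format_data(rows: List[List[str]], prefix: str) -> List[List[str]]:
    """Hypothetical stand-in for FrugalGPT.formatdata.

    Each row is [query, reference_answer, id]; the prefix is prepended to the
    query, and the answer and id are copied through unchanged.
    """
    return [[prefix + query, answer, idx] for query, answer, idx in rows]


# Made-up one-example prefix; the notebook reads its real prefix from prefix_e8.txt.
prefix = "Q: gold rallies as dollar weakens\nA: up\n\n"
rows = [["Q: gold suffers third straight daily decline\nA:", "down", "1"]]
formatted = format_data(rows, prefix)
print(formatted[0][0])
```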

Next, let us create an LLMCascade instance and load a pre-trained strategy for this task.

In [6]:
MyCascade = FrugalGPT.LLMCascade()
# Load the strategy trained for HEADLINES; budget is the per-query budget
MyCascade.load(loadpath="strategy/HEADLINES/", budget=0.000665)
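LLMCascade is based on the LLM cascade idea from the FrugalGPT paper: send a query to a sequence of LLMs, starting with cheap ones, and stop as soon as a scoring function judges the current answer reliable enough; otherwise escalate to the next model. The sketch below illustrates that control flow only. The model list, thresholds, and scorer are toy stand-ins (all hypothetical); in FrugalGPT they presumably come from the strategy that `load` reads, and in the paper the scorer is a learned model.

```python
from typing import Callable, List, Tuple

# A model maps a query to (answer, cost of that call).
Model = Callable[[str], Tuple[str, float]]
# A scorer rates how reliable an answer to a query is, in [0, 1].
Scorer = Callable[[str, str], float]


def cascade(query: str, models: List[Model], thresholds: List[float],
            scorer: Scorer) -> Tuple[str, float]:
    """Query models in order (cheapest first) and stop at the first trusted answer.

    thresholds[i] is the minimum score needed to accept models[i]'s answer;
    the last model's answer is always accepted. Returns (answer, total cost).
    """
    total_cost = 0.0
    answer = ""
    for i, model in enumerate(models):
        answer, cost = model(query)
        total_cost += cost  # every model that is called is paid for
        if i == len(models) - 1 or scorer(query, answer) >= thresholds[i]:
            break
    return answer, total_cost


# Toy stand-ins for a cheap and an expensive LLM.
def cheap_model(query: str) -> Tuple[str, float]:
    return ("up" if " up" in query else "unsure"), 0.00005


def strong_model(query: str) -> Tuple[str, float]:
    return "none", 0.0005


def toy_scorer(query: str, answer: str) -> float:
    return 0.0 if answer == "unsure" else 1.0


models, thresholds = [cheap_model, strong_model], [0.5, 0.0]
print(cascade("Gold futures edge up after two-session decline", models, thresholds, toy_scorer))
print(cascade("is gold's day in the sun coming soon?", models, thresholds, toy_scorer))
```

Cost accumulates over every model that is called, so an escalated query pays for both stages. The few noticeably higher per-query costs in the LLMCascade results in this notebook (around 0.0005 versus about 0.000055) are consistent with this pattern.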

Let us look at LLMCascade's generation for a single query.

In [7]:
index = 2
query = data[index][0]
query_raw = raw_data[index][0]
genparams=FrugalGPT.GenerationParameter(max_tokens=50, temperature=0.1, stop=['\n'])
answer = MyCascade.get_completion(query=query,genparams=genparams)
cost = MyCascade.get_cost()
print("query:",query_raw)
print("FrugalGPT LLMCascade answer:",answer)

query: Q: Gold futures edge up after two-session decline
A:
FrugalGPT LLMCascade answer:  up


Now we can pass all the queries to both LLMCascade and vanilla GPT-4, and compare their performance.

In [8]:
# batch generation
result = MyCascade.get_completion_batch(queries=data,genparams=genparams)
result_GPT4 = MyLLMforAll.get_completion_batch(queries=data,genparams=genparams,service_name='openaichat/gpt-4')
print("FrugalGPT LLMCascade generations")
display(result)
display(FrugalGPT.compute_score(result))
print("GPT-4 generations")
display(result_GPT4)
display(FrugalGPT.compute_score(result_GPT4))

FrugalGPT LLMCascade generations


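| | _id | answer | ref_answer | cost |
|---|---|---|---|---|
| 0 | 0 | down | down | 5.6e-05 |
| 1 | 1 | down | down | 5.4e-05 |
| 2 | 2 | up | up | 5.5e-05 |
| 3 | 3 | up | up | 5.8e-05 |
| 4 | 4 | down | up | 5.5e-05 |
| 5 | 5 | up | up | 5.7e-05 |
| 6 | 6 | none | none | 0.000497 |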


{'em': 0.8571428571428571, 'cost': 0.00011888571428571427}

GPT-4 generations


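| | _id | answer | ref_answer | cost |
|---|---|---|---|---|
| 0 | 0 | down | down | 0.00678 |
| 1 | 1 | down | down | 0.00645 |
| 2 | 2 | up | up | 0.00651 |
| 3 | 3 | up | up | 0.00705 |
| 4 | 4 | down | up | 0.00657 |
| 5 | 5 | up | up | 0.00699 |
| 6 | 6 | none | none | 0.00663 |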


{'em': 0.8571428571428571, 'cost': 0.006711428571428572}

On these seven queries, LLMCascade matches GPT-4's exact-match score (0.857, i.e. 6 of 7 correct) at a much smaller cost: about 0.00012 per query versus 0.0067, roughly 56x cheaper. This sample is of course tiny; later we will see an evaluation on a larger dataset.
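`compute_score` reports an exact-match rate (`em`) and an average per-query `cost`. A minimal re-implementation under that reading, applied to the seven rows shown above, reproduces both `em` values and the roughly 56x cost gap. `score_rows` is hypothetical; because the displayed per-row costs are rounded, the recomputed LLMCascade average (about 0.0001189) differs slightly from the reported 0.000118886.

```python
from statistics import mean
from typing import Dict, List, Tuple


def score_rows(rows: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Exact-match rate and mean cost over (answer, ref_answer, cost) rows."""
    em = mean(1.0 if answer.strip() == ref.strip() else 0.0 for answer, ref, _ in rows)
    avg_cost = mean(cost for _, _, cost in rows)
    return {"em": em, "cost": avg_cost}


# Rows copied from the two tables above (costs as displayed, i.e. rounded).
cascade_rows = [("down", "down", 5.6e-05), ("down", "down", 5.4e-05),
                ("up", "up", 5.5e-05), ("up", "up", 5.8e-05),
                ("down", "up", 5.5e-05), ("up", "up", 5.7e-05),
                ("none", "none", 0.000497)]
gpt4_rows = [("down", "down", 0.00678), ("down", "down", 0.00645),
             ("up", "up", 0.00651), ("up", "up", 0.00705),
             ("down", "up", 0.00657), ("up", "up", 0.00699),
             ("none", "none", 0.00663)]

cascade_score, gpt4_score = score_rows(cascade_rows), score_rows(gpt4_rows)
print(cascade_score)
print(gpt4_score)
print("GPT-4 / LLMCascade cost ratio:", gpt4_score["cost"] / cascade_score["cost"])
```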

### Using FrugalGPT-LLMCascade for your own data
Interested in using FrugalGPT on your own data? No problem! The following code snippets show how.

First, load the training dataset.

In [9]:
# load data
dev = FrugalGPT.loadcsvdata("data/HEADLINES/train.csv")
# Keep only the first 10 examples so this demo runs quickly
dev = dev[0:10]
with open('config/prompt/HEADLINES/prefix_e8.txt') as f:
    prefix = f.read()
data = FrugalGPT.formatdata(dev, prefix)

Second, specify the budget per query, and then train the model. Warning: This can take a while on large datasets!

In [10]:
# train the model
MyCascade = FrugalGPT.LLMCascade()
service_names = ['openaichat/gpt-3.5-turbo','openaichat/gpt-4','ai21/j1-large','textsynth/gptj_6B']
result = MyCascade.train(data,budget=100,service_names=service_names)



| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.663858 | 0.666667 |
| 2 | No log | 0.663796 | 0.666667 |
| 3 | No log | 0.66367 | 0.666667 |
| 4 | No log | 0.663486 | 0.666667 |
| 5 | No log | 0.663241 | 0.666667 |
| 6 | No log | 0.662938 | 0.666667 |
| 7 | No log | 0.662582 | 0.666667 |
| 8 | No log | 0.662167 | 0.666667 |

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.712168 | 0.333333 |
| 2 | No log | 0.712062 | 0.333333 |
| 3 | No log | 0.711848 | 0.333333 |
| 4 | No log | 0.711504 | 0.333333 |
| 5 | No log | 0.711062 | 0.333333 |
| 6 | No log | 0.710515 | 0.333333 |
| 7 | No log | 0.709853 | 0.333333 |
| 8 | No log | 0.709063 | 0.333333 |

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.76138 | 0.0 |
| 2 | No log | 0.76115 | 0.0 |
| 3 | No log | 0.760674 | 0.0 |
| 4 | No log | 0.76001 | 0.0 |
| 5 | No log | 0.759138 | 0.0 |
| 6 | No log | 0.758036 | 0.0 |
| 7 | No log | 0.756688 | 0.0 |
| 8 | No log | 0.755082 | 0.0 |

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | No log | 0.76138 | 0.0 |
| 2 | No log | 0.76115 | 0.0 |
| 3 | No log | 0.760674 | 0.0 |
| 4 | No log | 0.76001 | 0.0 |
| 5 | No log | 0.759138 | 0.0 |
| 6 | No log | 0.758036 | 0.0 |
| 7 | No log | 0.756688 | 0.0 |
| 8 | No log | 0.755082 | 0.0 |
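Conceptually, training a cascade means choosing which services to call, in what order, and with what acceptance thresholds, so that accuracy on the training data is maximized while the average cost per query stays within the budget; the FrugalGPT paper frames it as this kind of constrained search. The sketch below is a deliberately simplified brute-force version: it assumes each candidate service's correctness, cost, and scorer confidence have already been collected for every training query, and it only considers two-stage cascades. `replay_cascade`, `search_strategy`, and the toy records are hypothetical. The epoch logs printed by `train` suggest it also fits learned models (likely the scorers), which this sketch takes as given.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

# For one training query: service name -> (scorer confidence, answer is correct, cost).
Record = Dict[str, Tuple[float, bool, float]]


def replay_cascade(records: List[Record], first: str, second: str,
                   threshold: float) -> Tuple[float, float]:
    """Accuracy and mean cost of a two-stage cascade, replayed on cached results."""
    n_correct, total_cost = 0, 0.0
    for rec in records:
        score, correct, cost = rec[first]
        total_cost += cost
        if score < threshold:  # first answer not trusted: escalate
            _, correct, cost = rec[second]
            total_cost += cost
        n_correct += correct
    return n_correct / len(records), total_cost / len(records)


def search_strategy(records: List[Record], services: List[str], budget: float,
                    thresholds=(0.0, 0.25, 0.5, 0.75, 1.01)) -> Optional[tuple]:
    """Best (accuracy, cost, first, second, threshold) with mean cost <= budget.

    Ties in accuracy go to the cheaper strategy; returns None if nothing fits.
    With scores in [0, 1], threshold 0.0 never escalates and 1.01 always does.
    """
    best = None
    for first, second in permutations(services, 2):
        for t in thresholds:
            acc, cost = replay_cascade(records, first, second, t)
            if cost > budget:
                continue
            if best is None or (acc, -cost) > (best[0], -best[1]):
                best = (acc, cost, first, second, t)
    return best


# Toy cached results for four training queries.
records = [
    {"cheap": (0.9, True, 0.0001), "strong": (0.9, True, 0.006)},
    {"cheap": (0.2, False, 0.0001), "strong": (0.9, True, 0.006)},
    {"cheap": (0.8, True, 0.0001), "strong": (0.9, True, 0.006)},
    {"cheap": (0.3, False, 0.0001), "strong": (0.9, False, 0.006)},
]
print(search_strategy(records, ["cheap", "strong"], budget=0.004))
```

With a budget of 0.004, always calling `strong` (mean cost 0.006) is ruled out, and the search settles on `cheap` first with escalation below a confidence of 0.25. On this toy data that matches `strong`-only accuracy at roughly a quarter of its cost.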


In [11]:
# save to disk
MyCascade.save(savepath="strategy/TEST/")

Now the model has been saved to disk. You can load it as follows for future applications.

In [12]:
MyCascade = FrugalGPT.LLMCascade()
MyCascade.load(loadpath="strategy/TEST/",budget=100)

### Performance evaluation
Now let us evaluate the performance of FrugalGPT. We use LLMCascade on the HEADLINES dataset as an example.

First, we load the evaluation dataset and the LLMCascade.

In [13]:
test = FrugalGPT.loadcsvdata("data/HEADLINES/test.csv")
with open('config/prompt/HEADLINES/prefix_e8.txt') as f:
    prefix = f.read()
data_eval = FrugalGPT.formatdata(test, prefix)
print("test data size:", len(data_eval))
MyCascade = FrugalGPT.LLMCascade()
MyCascade.load(loadpath="strategy/HEADLINES/", budget=0.000665)

test data size: 5000


Then let us run both GPT-4 and LLMCascade on the evaluation dataset and compare them.

In [14]:
genparams=FrugalGPT.GenerationParameter(max_tokens=50, temperature=0.1, stop=['\n'])
result_GPT4 = MyLLMforAll.get_completion_batch(queries=data_eval,genparams=genparams,service_name='openaichat/gpt-4')
print("GPT-4 generations")
display(result_GPT4)
display(FrugalGPT.compute_score(result_GPT4))
result = MyCascade.get_completion_batch(queries=data_eval,genparams=genparams)
print("FrugalGPT LLMCascade generations")
display(result)
display(FrugalGPT.compute_score(result))

GPT-4 generations


| | _id | answer | ref_answer | cost |
|---|---|---|---|---|
| 0 | 6556 | up | up | 0.00660 |
| 1 | 5832 | none | none | 0.00666 |
| 2 | 5618 | none | none | 0.00666 |
| 3 | 4205 | down | down | 0.00660 |
| 4 | 842 | up | down | 0.00654 |
| ... | ... | ... | ... | ... |
| 4995 | 8376 | neutral | up | 0.00651 |
| 4996 | 4242 | down | down | 0.00654 |
| 4997 | 663 | down | up | 0.00660 |
| 4998 | 6890 | neutral | neutral | 0.00648 |


{'em': 0.8556, 'cost': 0.006661218000000001}

FrugalGPT LLMCascade generations


| | _id | answer | ref_answer | cost |
|---|---|---|---|---|
| 0 | 6556 | up | up | 0.000055 |
| 1 | 5832 | none | none | 0.000499 |
| 2 | 5618 | none | none | 0.000499 |
| 3 | 4205 | down | down | 0.000055 |
| 4 | 842 | up | down | 0.000055 |
| ... | ... | ... | ... | ... |
| 4995 | 8376 | up | up | 0.000055 |
| 4996 | 4242 | down | down | 0.000055 |
| 4997 | 663 | down | up | 0.000495 |
| 4998 | 6890 | neutral | neutral | 0.000054 |


{'em': 0.8594, 'cost': 0.00064688504}

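Overall, on the 5,000-query test set LLMCascade achieves slightly higher exact-match accuracy than GPT-4 (0.8594 vs. 0.8556) at roughly one tenth of its average per-query cost (0.00065 vs. 0.00666).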
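The "one tenth" figure follows directly from the two `compute_score` outputs above; the arithmetic is short enough to check by hand:

```python
# Aggregate scores reported above for the 5,000-query HEADLINES test set.
gpt4 = {"em": 0.8556, "cost": 0.006661218000000001}
cascade = {"em": 0.8594, "cost": 0.00064688504}

cost_ratio = gpt4["cost"] / cascade["cost"]   # how many times cheaper the cascade is
savings = 1 - cascade["cost"] / gpt4["cost"]  # fraction of per-query spend saved
em_gain = cascade["em"] - gpt4["em"]          # absolute exact-match difference

print(f"cost ratio: {cost_ratio:.1f}x, savings: {savings:.1%}, EM gain: {em_gain:+.4f}")
```

So the cascade cuts the average per-query cost by about 90% while scoring 0.38 percentage points higher in exact match.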