<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>


<h1 style="text-align: center;">Evaluating Response Completeness</h1>


**What is Response Completeness?**: Response Completeness is a metric that measures how well a model's response addresses every aspect of a question. This matters when the response informs a high-stakes decision that depends on all aspects of the question being considered.

For example, if you are asking a model to determine if a person is eligible for a loan, you want to make sure that the model considers all aspects of the person's financial situation. If the model only considers the person's income, it may not be able to make an accurate decision.

**Data schema**: The data schema required for this evaluation is as follows:

| Column Name | Description                     |
| ----------- | ------------------------------- |
| question    | The question asked by the user  |
| response    | The response given by the model |
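
Before sending data for evaluation, it can help to check that every row follows this schema. The sketch below is a minimal illustration, not part of UpTrain: `find_schema_problems` is a hypothetical helper, and it assumes that both columns should be non-empty strings.

```python
# Hypothetical helper (not an UpTrain API): checks rows against the schema above.
REQUIRED_COLUMNS = ("question", "response")


def find_schema_problems(rows):
    """Return a list of problems; an empty list means every row fits the schema."""
    problems = []
    for index, row in enumerate(rows):
        for column in REQUIRED_COLUMNS:
            value = row.get(column)
            # Assumption: each column must be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {index}: '{column}' is missing or empty")
    return problems


sample_rows = [
    {"question": "What are the best ways to save money?", "response": "Create a budget."},
    {"question": "What are the different types of business loans?"},  # no response
]
print(find_schema_problems(sample_rows))
```

Running a check like this first means a malformed row is caught locally rather than after an evaluation request has been sent.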


Install UpTrain by running `pip install uptrain`.


In [1]:
from uptrain import EvalLLM, Evals
import json

OPENAI_API_KEY = "sk-*****************"  # Insert your OpenAI key here

Let's look at some good data first. The first question asks about the best ways to save money. The response contains many different ways as well as specific tips to save money. This is a complete response. The other two responses are also good and complete.

In [2]:
good_data = [{
    "question": "What are the best ways to save money?",
    "response": "There are many ways to save money, including:\
\
* Creating a budget and tracking your spending\
* Cutting back on unnecessary expenses\
* Saving money on your groceries and other everyday purchases\
* Investing your money wisely\
\
Here are some specific tips for saving money:\
\
* Set financial goals. What are you saving for? Once you know what you're saving for, you can create a plan to reach your goals.\
* Create a budget. A budget can help you track your income and expenses, so you can see where your money is going and make changes to save more.\
* Cut back on unnecessary expenses. Take a close look at your spending and identify any areas where you can cut back. For example, you could cook more meals at home instead of eating out, or cancel unused subscriptions.\
* Save money on your groceries and other everyday purchases. There are many ways to save money on groceries and other everyday purchases, such as using coupons, shopping at discount stores, and buying in bulk.\
* Invest your money wisely. Investing is a great way to grow your money over time. However, it's important to do your research and invest in a way that's appropriate for your risk tolerance and financial goals."
},
    {
    "question": "What are the different types of business loans?",
    "response": "There are many different types of business loans available, depending on the needs of your business. Some of the most common types of business loans include:\
\
\
* Term loans: Term loans are repaid over a set period of time, typically with equal monthly payments.\
* Line of credit: A line of credit is a revolving loan that you can borrow from and repay as needed.\
* Equipment loans: Equipment loans are used to finance the purchase of equipment for your business.\
* Real estate loans: Real estate loans are used to finance the purchase of commercial real estate for your business.\
* SBA loans: SBA loans are guaranteed by the Small Business Administration, making them easier to qualify for.\
\
To choose the right type of business loan for your needs, you should consider the following factors:\
\
* The amount of money you need to borrow\
* The purpose of the loan\
* Your credit score and financial history\
* The term of the loan\
* The interest rate\
\
You should also compare offers from multiple lenders before choosing a loan."
},
    {
    "question": "What are the tax implications of starting a business?",
    "response": "There are a number of tax implications to consider when starting a business. Some of the most important tax considerations include:\
\
* The type of business entity you choose: The type of business entity you choose will affect how your business is taxed. For example, sole proprietors and partnerships are taxed as pass-through entities, meaning that the business income is passed through to the owners and taxed on their individual tax returns. Corporations, on the other hand, are taxed as separate entities.\
* Business income and expenses: All business income is subject to income tax. Business expenses can be deducted to reduce taxable income. However, there are certain rules that apply to business deductions.\
* Self-employment tax: Self-employment tax is a combination of Social Security and Medicare taxes that self-employed individuals must pay.\
* Estimated taxes: Self-employed individuals and businesses that expect to owe more than $1,000 in taxes for the year are required to make estimated tax payments throughout the year.\
\
It is important to consult with a tax advisor to discuss the specific tax implications of starting your business."
}
]
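
One detail worth knowing about the cell above, assuming it is run exactly as written: inside an ordinary Python string literal, a backslash at the very end of a line is a line continuation, so the line break is dropped and the bullet points are joined into a single line. That is why the `response` text in the printed results reads as "including:* Creating a budget...". The standalone sketch below uses short illustrative strings, not the notebook's data, to show the difference from explicit `\n` escapes.

```python
# A backslash followed by a newline inside a normal string literal
# removes the line break, so the pieces are joined directly.
joined = "Ways to save:\
* Budget\
* Invest"

# Explicit \n escapes keep each bullet on its own line.
separated = "Ways to save:\n* Budget\n* Invest"

print(repr(joined))
print(repr(separated))
```

If the line breaks should be preserved in the data sent for evaluation, `\n` escapes or a triple-quoted string are the usual alternatives.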

In [3]:
eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

res = eval_llm.evaluate(
    data=good_data,
    checks=[Evals.RESPONSE_COMPLETENESS]
)

2023-11-16 12:27:37.114 | INFO     | uptrain.framework.evalllm:evaluate:95 - Sending evaluation request for rows 0 to <50 to the Uptrain
2023-11-16 12:27:47.567 | INFO     | uptrain.framework.evalllm:evaluate:95 - Sending evaluation request for rows 0 to <50 to the Uptrain
2023-11-16 12:27:52.009 | INFO     | uptrain.framework.evalllm:evaluate:95 - Sending evaluation request for rows 0 to <50 to the Uptrain


Now let's look at the results of the UpTrain evaluation. We can see that the responses have been given a high score along with a detailed explanation of why the score was given.

In [4]:
print(json.dumps(res, indent=3))

[
   {
      "question": "What are the best ways to save money?",
      "response": "There are many ways to save money, including:* Creating a budget and tracking your spending* Cutting back on unnecessary expenses* Saving money on your groceries and other everyday purchases* Investing your money wiselyHere are some specific tips for saving money:* Set financial goals. What are you saving for? Once you know what you're saving for, you can create a plan to reach your goals.* Create a budget. A budget can help you track your income and expenses, so you can see where your money is going and make changes to save more.* Cut back on unnecessary expenses. Take a close look at your spending and identify any areas where you can cut back. For example, you could cook more meals at home instead of eating out, or cancel unused subscriptions.* Save money on your groceries and other everyday purchases. There are many ways to save money on groceries and other everyday purchases, such as using coupons,

We can look at the explanation for the first response and see that it is logical and reasonable. 

In [5]:
print(res[0]["explanation_response_completeness"])

1. The question asks for the best ways to save money.
2. The response provides a list of ways to save money, including creating a budget, cutting back on unnecessary expenses, saving money on groceries and everyday purchases, and investing money wisely.
3. The response also provides specific tips for saving money, such as setting financial goals, creating a budget, cutting back on unnecessary expenses, saving money on groceries and everyday purchases, and investing money wisely.
4. The response addresses the question by providing relevant and adequate information on the best ways to save money.
5. Therefore, the generated answer answers the given question adequately.

The score is 1.0
1.0


Now, let's look at some bad data. We will reuse the same questions with bad responses and look at the results of the UpTrain evaluation. The response to the first question clearly does not answer the question completely. Let's also see what happens when a response is not even in the same domain as the question.

In [6]:
bad_data = [{
    "question": "What are the best ways to save money?",
    "response": "Just don't spend money. It's that simple.",
},
    {
    "question": "What are the tax implications of starting a business?",
    "response": "Taxes are complicated. You should hire an accountant to help you with your taxes.",
},
    {
    "question": "What are the different types of business loans?",
    "response": "Fruits with high water content, such as watermelon and cantaloupe, are good for hydration.",
}]

In [7]:
res = eval_llm.evaluate(
    data=bad_data,
    checks=[Evals.RESPONSE_COMPLETENESS]
)

2023-11-16 12:27:56.483 | INFO     | uptrain.framework.evalllm:evaluate:95 - Sending evaluation request for rows 0 to <50 to the Uptrain
2023-11-16 12:28:01.054 | INFO     | uptrain.framework.evalllm:evaluate:95 - Sending evaluation request for rows 0 to <50 to the Uptrain
2023-11-16 12:28:05.993 | INFO     | uptrain.framework.evalllm:evaluate:95 - Sending evaluation request for rows 0 to <50 to the Uptrain


Let's look at the results of the evaluations.

In [8]:
print(json.dumps(res, indent=3))

[
   {
      "question": "What are the best ways to save money?",
      "response": "Just don't spend money. It's that simple.",
      "score_response_completeness": 0.5,
      "explanation_response_completeness": "1. The question asks for the best ways to save money.\n2. The machine-generated response states, \"Just don't spend money. It's that simple.\"\n3. The response does not provide any specific ways or strategies to save money.\n4. Instead, the response suggests not spending money as the solution.\n5. While not spending money can be a way to save money, it is not a comprehensive answer to the question.\n6. The response does not address other potential ways to save money such as budgeting, cutting expenses, or investing.\n7. Therefore, the generated answer only partially answers the given question.\n8. The correct choice is B: The generated answer only partially answers the given question.\n\n0.5\n0.5"
   },
   {
      "question": "What are the tax implications of starting a busi

We can look at the explanation for the first response and contrast it with the explanation for the good response. This response misses many aspects of the question, so it gets a score of 0.5.

In [9]:
print(res[0]["explanation_response_completeness"])

1. The question asks for the best ways to save money.
2. The machine-generated response states, "Just don't spend money. It's that simple."
3. The response does not provide any specific ways or strategies to save money.
4. Instead, the response suggests not spending money as the solution.
5. While not spending money can be a way to save money, it is not a comprehensive answer to the question.
6. The response does not address other potential ways to save money such as budgeting, cutting expenses, or investing.
7. Therefore, the generated answer only partially answers the given question.
8. The correct choice is B: The generated answer only partially answers the given question.

0.5
0.5


The first two responses got a score of 0.5 because they answered some aspects of the question, but not all. The last response got a score of 0 because it did not answer any aspect of the question. Let's look at the explanation that UpTrain returned for this score.

In [10]:
print(res[2]["explanation_response_completeness"])

1. The question asks about the different types of business loans.
2. The response talks about fruits with high water content, such as watermelon and cantaloupe, being good for hydration.
3. The response is completely unrelated to the question and does not provide any information about business loans.
4. Therefore, the generated answer doesn't answer the given question at all.

Score: 0.0
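
To review many rows at once, the scores can be turned into short labels. The sketch below is illustrative only: `summarize_completeness` and `SCORE_LABELS` are hypothetical names, the `score_response_completeness` key is taken from the printed results above, and the mapping assumes scores take only the three values seen in this notebook (1.0, 0.5, 0.0). The example rows are hand-written to mirror those outputs, so no API call is needed.

```python
# Assumption: only the three score levels seen in this notebook occur.
SCORE_LABELS = {1.0: "complete", 0.5: "partially complete", 0.0: "not answered"}


def summarize_completeness(results):
    """Return (question, score, label) tuples for each evaluated row."""
    summary = []
    for row in results:
        score = row.get("score_response_completeness")
        # Any value outside the assumed levels is reported as "unknown".
        label = SCORE_LABELS.get(score, "unknown")
        summary.append((row["question"], score, label))
    return summary


# Hand-written rows mirroring the bad-data results shown above.
example_results = [
    {"question": "What are the best ways to save money?", "score_response_completeness": 0.5},
    {"question": "What are the different types of business loans?", "score_response_completeness": 0.0},
]
for question, score, label in summarize_completeness(example_results):
    print(f"{score} ({label}): {question}")
```

With real output, the list returned by `eval_llm.evaluate` could be passed in place of `example_results`.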
