Add Multi-Query Accuracy docs and improve Sub-Query Completeness docs #676

Merged 3 commits on Apr 4, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -215,6 +215,8 @@ Speak directly with the maintainers of UpTrain by [booking a call here](https://
| Eval | Description |
| ---- | ----------- |
|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |
| [Multi-Query Accuracy](https://docs.uptrain.ai/predefined-evaluations/query-quality/multi-query-accuracy) | Evaluate whether the variants generated accurately represent the original query |


<br />

3 changes: 2 additions & 1 deletion docs/mint.json
@@ -128,7 +128,8 @@
{
"group": "Query Clarity Evals",
"pages": [
"predefined-evaluations/query-quality/sub-query-completeness"
"predefined-evaluations/query-quality/sub-query-completeness",
"predefined-evaluations/query-quality/multi-query-accuracy"
]
},
{
1 change: 1 addition & 0 deletions docs/predefined-evaluations/overview.mdx
@@ -60,6 +60,7 @@ You can choose evals as per your needs. We have divided them into a few categori
| Eval | Description |
| ---- | ----------- |
|[Sub-query Completeness](/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate if the list of generated sub-questions comprehensively covers all aspects of the main question. |
|[Multi-query Accuracy](/predefined-evaluations/query-quality/multi-query-accuracy) | Evaluates how accurately the variations of the query represent the same question. |
</Accordion>

<Accordion title="Code Related Evals">
83 changes: 83 additions & 0 deletions docs/predefined-evaluations/query-quality/multi-query-accuracy.mdx
@@ -0,0 +1,83 @@
---
title: Multi-Query Accuracy
description: Evaluates how accurately the variations of the query represent the same question.
---

Columns required:
- `question`: The question asked by the user
- `variants`: Variants (rephrasings) of the question, generated from it
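
The usage example passes `variants` as a single numbered string rather than a list. If your pipeline produces the variants as a list, a small helper along these lines could build that string. This is only a sketch: `format_variants` is a hypothetical helper, not part of UpTrain, and it assumes the exact whitespace between items is not significant to the evaluator.

```python
from typing import List


def format_variants(variants: List[str]) -> str:
    """Join query variants into one numbered, newline-separated string.

    Hypothetical helper (not an UpTrain API): it mirrors the
    '1. ...\n2. ...' layout used in the usage example.
    """
    return "\n".join(
        f"{i}. {variant.strip()}" for i, variant in enumerate(variants, start=1)
    )


# Build one row with the two required columns.
row = {
    "question": "How does the stock market work?",
    "variants": format_variants(
        [
            "What is the stock market?",
            "How does the stock market function?",
            "What is the purpose of the stock market?",
        ]
    ),
}

print(row["variants"])
```

Keeping the numbering inside a helper avoids hand-editing escape sequences when the number of variants changes.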

### How to use it?

```python
from uptrain import EvalLLM, Evals

OPENAI_API_KEY = "sk-********************" # Insert your OpenAI key here

data = [
    # Three variants that together restate the original question
    {
        'question': 'How does the stock market work?',
        'variants': '1. What is the stock market?\n 2. How does the stock market function?\n 3. What is the purpose of the stock market?'
    },
    # A single variant that covers only part of the original question
    {
        'question': 'How does the stock market work?',
        'variants': '1. What is the stock market?'
    }
]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

# Run the Multi-Query Accuracy check on every row
res = eval_llm.evaluate(
    data=data,
    checks=[Evals.MULTI_QUERY_ACCURACY]
)
```
<Info>By default, we are using GPT 3.5 Turbo for evaluations. If you want to use a different model, check out this [tutorial](https://github.com/uptrain-ai/uptrain/blob/main/examples/open_source_evaluator_tutorial.ipynb).</Info>

Sample Response:
```json
[
    {
        "question": "How does the stock market work?",
        "variants": "1. What is the stock market?\n 2. How does the stock market function?\n 3. What is the purpose of the stock market?",
        "score_multi_query_accuracy": 1.0,
        "explanation_multi_query_accuracy": "{\n \"Reasoning\": \"The response provides accurate and relevant information about the functioning and purpose of the stock market, addressing the various aspects of the question across different queries. It covers the definition of the stock market, its functioning, and its purpose, demonstrating a comprehensive understanding of the topic.\",\n \"Choice\": \"A\"\n}"
    },
    {
        "question": "How does the stock market work?",
        "variants": "1. What is the stock market?",
        "score_multi_query_accuracy": 0.0,
        "explanation_multi_query_accuracy": "{\n \"Reasoning\": \"The given variation does not directly address the main causes of climate change, but rather focuses on defining the stock market. It does not cover the aspects of how the stock market works, such as trading, investment, and market dynamics.\",\n \"Choice\": \"C\"\n}"
    }
]
```
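
In the sample response, `explanation_multi_query_accuracy` is itself a JSON-encoded string with `Reasoning` and `Choice` keys. The sketch below shows one way to unpack it. It assumes every row follows that shape, which the evaluator may not guarantee, so it falls back to the raw text when parsing fails. `parse_explanation` is a hypothetical helper, and the reasoning strings are placeholders rather than real model output.

```python
import json

# Rows shaped like the sample response; the reasoning text is a placeholder.
results = [
    {
        "question": "How does the stock market work?",
        "score_multi_query_accuracy": 1.0,
        "explanation_multi_query_accuracy": json.dumps(
            {"Reasoning": "placeholder reasoning", "Choice": "A"}, indent=4
        ),
    },
    {
        "question": "How does the stock market work?",
        "score_multi_query_accuracy": 0.0,
        "explanation_multi_query_accuracy": json.dumps(
            {"Reasoning": "placeholder reasoning", "Choice": "C"}, indent=4
        ),
    },
]


def parse_explanation(row):
    """Return (choice, reasoning) for one result row.

    Hypothetical helper: if the explanation is not a JSON object,
    return (None, raw_text) instead of raising.
    """
    raw = row["explanation_multi_query_accuracy"]
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None, raw
    if not isinstance(parsed, dict):
        return None, raw
    return parsed.get("Choice"), parsed.get("Reasoning")


for row in results:
    choice, reasoning = parse_explanation(row)
    print(row["score_multi_query_accuracy"], choice, reasoning)
```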

<Note>A higher Multi-Query Accuracy score reflects that the generated variants accurately represent the main question. A lower score indicates that the variants only partially mean the same as the main question, or do not mean the same at all.</Note>

### How it works?

We evaluate Multi-Query Accuracy by determining which of the following three cases applies to the given task data:

* The given variations mean the same as the original question.
* The given variations partially mean the same as the original question.
* The given variations do not mean the same as the original question.
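
The three cases above line up with the `Choice` letters in the sample response, where `A` came with a score of 1.0 and `C` with 0.0. A minimal sketch of that mapping follows. The value for the partial case (`B` as 0.5) is an assumption and is not confirmed by the sample; `CHOICE_TO_SCORE` and `score_from_choice` are illustrative names, not UpTrain APIs.

```python
# Choice letter -> score. A and C match the sample response;
# B -> 0.5 is an assumption for the "partially the same" case.
CHOICE_TO_SCORE = {
    "A": 1.0,  # variations mean the same as the original question
    "B": 0.5,  # variations partially mean the same (assumed value)
    "C": 0.0,  # variations do not mean the same
}


def score_from_choice(choice: str) -> float:
    """Translate a choice letter into a numeric score."""
    key = choice.strip().upper()
    if key not in CHOICE_TO_SCORE:
        raise ValueError(f"Unexpected choice: {choice!r}")
    return CHOICE_TO_SCORE[key]


print(score_from_choice("A"))
print(score_from_choice("c"))
```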


<CardGroup cols={2}>
<Card
title="Tutorial"
href="https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/query_quality/multi_query_accuracy.ipynb"
icon="github"
color="#808080"
>
Open this tutorial in GitHub
</Card>
<Card
title="Have Questions?"
href="https://join.slack.com/t/uptraincommunity/shared_invite/zt-1yih3aojn-CEoR_gAh6PDSknhFmuaJeg"
icon="slack"
color="#808080"
>
Join our community for any questions or requests
</Card>
</CardGroup>
@@ -7,7 +7,7 @@ Sub-Query Completeness checks whether the sub-queries generated from a question

Columns required:
- `question`: The question asked by the user
- `sub_question`: Sub questions generated from the question
- `sub_questions`: Sub questions generated from the question

### How to use it?

@@ -17,10 +17,10 @@ from uptrain import EvalLLM, Evals
OPENAI_API_KEY = "sk-********************" # Insert your OpenAI key here

data = [
{
'question': 'What is the Taj Mahal? When was it built, where and by whom',
'sub_questions': '1. What is the Taj Mahal? '
}
{
'question': 'What is the Taj Mahal? When was it built, where and by whom?',
'sub_questions': '1. What is the Taj Mahal? 2. When was the Taj Mahal built? 3. Where is the Taj Mahal? 4. Who built the Taj Mahal?'
}
]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)
@@ -35,15 +35,17 @@ res = eval_llm.evaluate(
Sample Response:
```json
[
{
"score_sub_query_completeness": 0.0,
"explanation_sub_query_completeness": "Step by step reasoning:\n\n1. The main question is \"What is the Taj Mahal? When was it built, where and by whom.\"\n2. The sub-question provided is \"What is the Taj Mahal?\"\n3. The sub-question does not cover the aspects of when it was built, where, and by whom.\n4. The sub-question collectively does not cover any aspects of the main question.\n\n[Choice]: (C) Sub Questions collectively does not cover any aspects of the main question.\n[Explanation]: The sub-question provided collectively does not cover any aspects of the main question."
}
{
"question": "What is the Taj Mahal? When was it built, where and by whom?",
"sub_questions": "1. What is the Taj Mahal? 2. When was the Taj Mahal built? 3. Where is the Taj Mahal? 4. Who built the Taj Mahal?",
"score_sub_query_completeness": 1.0,
"explanation_sub_query_completeness": "Step by step reasoning:\n\n1. What is the Taj Mahal? - This sub-question covers the aspect of understanding what the Taj Mahal is, providing information about its nature and purpose.\n2. When was the Taj Mahal built? - This sub-question covers the aspect of the time of construction, addressing the historical timeline of the Taj Mahal's creation.\n3. Where is the Taj Mahal? - This sub-question covers the aspect of location, providing information about the geographical placement of the Taj Mahal.\n4. Who built the Taj Mahal? - This sub-question covers the aspect of the creator, addressing the individuals or entities responsible for the construction of the Taj Mahal.\n\nConclusion:\nThe sub-questions collectively cover all the aspects of the main question.\n\n[Choice]: (A) Sub Questions collectively all the aspects of the main question."
}
]
```
<Note>A higher Sub-Query Completeness score reflects that the generated sub-questions cover all aspects of the question asked.</Note>

The `sub_question` does not contain some parts of the `question` such as: "When was the Taj Mahal?", "Who built the Taj Mahal?", "Where is the the Taj Mahal?"
The `sub_questions` do not contain some parts of the `question` such as: "When was the Taj Mahal built?", "Who built the Taj Mahal?", "Where is the Taj Mahal?"

Resulting in low Sub-Query Completeness score.
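
The Sub-Query Completeness explanation in the sample response is free text that ends with a `[Choice]: (A)` style verdict. Assuming that marker is present (the model's wording may vary between runs), a regular expression can pull out the letter. `extract_choice` is a hypothetical helper, not part of UpTrain.

```python
import re
from typing import Optional

# Matches the verdict marker seen in the sample explanation, e.g. "[Choice]: (A)".
CHOICE_PATTERN = re.compile(r"\[Choice\]:\s*\(([A-Z])\)")


def extract_choice(explanation: str) -> Optional[str]:
    """Return the choice letter from an explanation, or None if absent."""
    match = CHOICE_PATTERN.search(explanation)
    return match.group(1) if match else None


# Tail of the explanation from the sample response.
sample = (
    "Conclusion:\nThe sub-questions collectively cover all the aspects of the "
    "main question.\n\n[Choice]: (A) Sub Questions collectively all the aspects "
    "of the main question."
)

print(extract_choice(sample))
```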

@@ -59,7 +61,7 @@ We evaluate Sub-Query Completeness by determining which of the following three c
<CardGroup cols={2}>
<Card
title="Tutorial"
href="https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/sub_query/sub_query_completeness.ipynb"
href="https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/query_quality/sub_query_completeness.ipynb"
icon="github"
color="#808080"
>
1 change: 1 addition & 0 deletions examples/checks/README.md
@@ -87,3 +87,4 @@
| Eval | Description |
| ---- | ----------- |
|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |
| [Multi-Query Accuracy](https://docs.uptrain.ai/predefined-evaluations/query-quality/multi-query-accuracy) | Evaluate whether the variants generated accurately represent the original query |
@@ -25,3 +25,4 @@
| Eval | Description |
| ---- | ----------- |
|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |
| [Multi-Query Accuracy](https://docs.uptrain.ai/predefined-evaluations/query-quality/multi-query-accuracy) | Evaluate whether the variants generated accurately represent the original query |