From df185d1d931e1c1a35d71c5c22d07cf9b9d6bf62 Mon Sep 17 00:00:00 2001
From: Shreyansh Jain
Try out Evaluations
-
Read Docs
-
+Quickstart Tutorials
+-
Slack Community
-
Feature Request
+Try out Evaluations
+-
+Read Docs
+-
+Quickstart Tutorials
+-
+Slack Community
+-
+Feature Request
+
+Try out Evaluations
+-
+Read Docs
+-
+Quickstart Tutorials
+-
+Slack Community
+-
+Feature Request
+
-Try out Evaluations
--
-Read Docs
--
-Slack Community
--
-Feature Request
-
| Eval | Description |
| ---- | ----------- |
|[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. |
-|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. |
+|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. |
|[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
|[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.|
|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
@@ -123,9 +123,15 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
| Eval | Description |
| ---- | ----------- |
-|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the generated response is leaking any system prompt. |
+|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |
+
+
+| Eval | Description |
+| ---- | ----------- |
+|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluates whether the sub-questions generated from a user's query, taken together, cover all aspects of the question. |
+
# Get started
diff --git a/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx b/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx
index 37846b6f7..e7dd5ec5e 100644
--- a/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx
+++ b/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx
@@ -1,11 +1,9 @@
---
title: Response Matching
-description: Grades how relevant the generated context was to the question specified.
+description: Grades how well the response generated by the LLM aligns with the provided ground truth.
---
-Response relevance is the measure of how relevant the generated response is to the question asked.
-
-It helps evaluate how well the response addresses the question asked and if it contains any additional information that is not relevant to the question asked.
+Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric.
Columns required:
- `question`: The question asked by the user
diff --git a/examples/checks/README.md b/examples/checks/README.md
index b2164c807..8dc443ba4 100644
--- a/examples/checks/README.md
+++ b/examples/checks/README.md
@@ -1,88 +1,89 @@
-
-
+
+
+
-
-
-
-# Pre-built Evaluations We Offer
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-#### Evaluate the quality of your responses:
+
-
-
-
-
-
-
-
-
-
-
-
-| Metrics | Usage |
-|------------|----------|
-| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. |
-| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. |
-| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. |
-| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/matching.ipynb) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. |
-| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. |
+# Pre-built Evaluations We Offer
+
+| Eval | Description |
+| ---- | ----------- |
+|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. |
+|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
+|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated response was to the question specified.|
+|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
+|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
-#### Evaluate the quality of retrieved context and response groundedness:
+
-| Metrics | Usage |
-|------------|----------|
+| Eval | Description |
+| ---- | ----------- |
|[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. |
-|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. |
+|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. |
|[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
+|[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.|
|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
+
+
+| Eval | Description |
+| ---- | ----------- |
+|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the language quality of the response on aspects such as fluency, politeness, grammar, and coherence. |
+|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone. |
+
+
+
+| Eval | Description |
+| ---- | ----------- |
+|[Code Hallucination](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the code present in the generated response is grounded by the context. |
-#### Evaluations to safeguard system prompts and avoid LLM mis-use:
+
-| Metrics | Usage |
-|------------|----------|
-| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks|
-| [Jailbreak Detection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/jailbreak_detection.ipynb) | Detect prompts with potentially harmful or illegal behaviour|
+| Eval | Description |
+| ---- | ----------- |
+|[User Satisfaction](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction) | Grades how well the user's concerns are addressed and assesses their satisfaction based on the provided conversation. |
-#### Evaluate the language quality of the response:
+
-| Metrics | Usage |
-|------------|----------|
-| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.|
-| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.|
-| [Rouge Score](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/rouge_score.ipynb) | Measure the similarity between two pieces of text. |
+| Eval | Description |
+| ---- | ----------- |
+|[Custom Guideline](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-guideline) | Allows you to specify a guideline and grades how well the LLM adheres to the provided guideline when giving a response. |
+|[Custom Prompts](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval) | Allows you to create your own set of evaluations. |
-#### Evaluate the conversation as a whole:
+
-| Metrics | Usage |
-|------------|----------|
-| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user's satisfaction with the conversation with the AI assistant based on completeness and user acceptance.|
+| Eval | Description |
+| ---- | ----------- |
+|[Response Matching](https://docs.uptrain.ai/predefined-evaluations/ground-truth-comparison/response-matching) | Compares and grades how well the response generated by the LLM aligns with the provided ground truth. |
-#### Defining custom evaluations and others:
+
-| Metrics | Usage |
-|------------|----------|
-| [Guideline Adherence](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/guideline_adherence.ipynb) | Grade how well the LLM adheres to a given custom guideline.|
-| [Custom Prompt Evaluation](https://github.com/uptrain-ai/uptrain/blob/main) | Evaluate by defining your custom grading prompt.|
-| [Cosine Similarity](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/cosine_similarity.ipynb) | Calculate cosine similarity between embeddings of two texts.|
+| Eval | Description |
+| ---- | ----------- |
+|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
+|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluates whether the sub-questions generated from a user's query, taken together, cover all aspects of the question. |
\ No newline at end of file
diff --git a/examples/checks/code_eval/README.md b/examples/checks/code_eval/README.md
new file mode 100644
index 000000000..2d4d2436a
--- /dev/null
+++ b/examples/checks/code_eval/README.md
@@ -0,0 +1,25 @@
+
+
+
+
+
+
+
+
+| Eval | Description |
+| ---- | ----------- |
+|[Code Hallucination](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the code present in the generated response is grounded by the context. |
diff --git a/examples/checks/compare_ground_truth/README.md b/examples/checks/compare_ground_truth/README.md
new file mode 100644
index 000000000..8fcec5f21
--- /dev/null
+++ b/examples/checks/compare_ground_truth/README.md
@@ -0,0 +1,26 @@
+
+
+
+
+
+
+
+
+
+| Eval | Description |
+| ---- | ----------- |
+|[Response Matching](https://docs.uptrain.ai/predefined-evaluations/ground-truth-comparison/response-matching) | Compares and grades how well the response generated by the LLM aligns with the provided ground truth. |
\ No newline at end of file
diff --git a/examples/checks/compare_ground_truth/matching.ipynb b/examples/checks/compare_ground_truth/matching.ipynb
index f1e0342cb..07ecd6b31 100644
--- a/examples/checks/compare_ground_truth/matching.ipynb
+++ b/examples/checks/compare_ground_truth/matching.ipynb
@@ -33,9 +33,7 @@
"id": "2ef54d59-295e-4f15-a35f-33f4e86ecdd2",
"metadata": {},
"source": [
- "**What is Response Matching?**: Response Completeness is a metric that determines how well the response generated by an LLM matches the ground truth. It comes in handy while checking for the overlap between an LLM generated response and ground truth.\n",
- "\n",
- "For example, if a user asks a question about the formula of chlorophyll, the ideal response could be: \"The formula of chlorophyll is C55H72MgN4O5\", rather if the response contains some other information about chlorophyll and not the formula: \"Chlorophyll is the pigmet used in photosynthesis, it helpes in generating oxygen.\" it might not really be ideal as it does not match with the ground truth resulting in a low matching score.\n",
+ "**What is Response Matching?**: Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric. \n",
"\n",
"**Data schema**: The data schema required for this evaluation is as follows:\n",
"\n",
diff --git a/examples/checks/context_awareness/README.md b/examples/checks/context_awareness/README.md
index 4e97cd06b..fef040b84 100644
--- a/examples/checks/context_awareness/README.md
+++ b/examples/checks/context_awareness/README.md
@@ -1,47 +1,31 @@
-
-
+
+
+
-
-
-
-# Context Awareness Evaluations
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-#### Evaluate the quality of retrieved context and response groundedness:
+
-
-
-
-
-
-
-
-
-
-
-
-| Metrics | Usage |
-|------------|----------|
+| Eval | Description |
+| ---- | ----------- |
|[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. |
-|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. |
+|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. |
|[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
|[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.|
-|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
-
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
\ No newline at end of file
diff --git a/examples/checks/conversation/README.md b/examples/checks/conversation/README.md
index 6fac2f4e1..0bb178a43 100644
--- a/examples/checks/conversation/README.md
+++ b/examples/checks/conversation/README.md
@@ -1,43 +1,26 @@
-
-
+
+
+
-
-
-
-
-# Conversation Satisfaction Evaluation
-
-#### Evaluate the conversation as a whole:
-
-| Metrics | Usage |
-|------------|----------|
-| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user's satisfaction with the conversation with the AI assistant based on completeness and user acceptance.|
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-
-
-
-
-
-
-
-
-
-
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[User Satisfaction](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction) | Grades how well the user's concerns are addressed and assesses their satisfaction based on the provided conversation. |
\ No newline at end of file
diff --git a/examples/checks/custom/README.md b/examples/checks/custom/README.md
index 9d5c27698..347edb323 100644
--- a/examples/checks/custom/README.md
+++ b/examples/checks/custom/README.md
@@ -1,117 +1,27 @@
-
-
-
+
+
-
-
-
-
-# Other Custom Evaluations
-
-#### Evaluate the quality of your responses:
-
-| Metrics | Usage |
-|------------|----------|
-| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. |
-| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. |
-| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. |
-| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. |
-| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. |
-
-
-#### Evaluate based on langauge features:
-
-| Metrics | Usage |
-|------------|----------|
-| [Factual Accuracy](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/factual_accuracy.ipynb) | Evaluate if the facts present in the response can be verified by the retrieved context. |
-| [Response Completeness wrt Context](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/response_completeness_wrt_context.ipynb) | Grades how complete the response was for the question specified concerning the information present in the context.|
-| [Context Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/relevance.ipynb) | Evaluate if the retrieved context contains sufficient information to answer the given question. |
-
-
-#### Evaluations to safeguard system prompts and avoid LLM mis-use:
-
-| Metrics | Usage |
-|------------|----------|
-| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks|
-
-
-#### Evaluate the language quality of the response:
-
-| Metrics | Usage |
-|------------|----------|
-| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.|
-| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.|
-
-#### Evaluate the conversation as a whole:
-
-| Metrics | Usage |
-|------------|----------|
-| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user's satisfaction with the conversation with the AI assistant based on completeness and user acceptance.|
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-# Pre-built Evaluations We Offer
-
-#### Defining custom evaluations and others:
-
-| Metrics | Usage |
-|------------|----------|
-| [Guideline Adherence](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/guideline_adherence.ipynb) | Grade how well the LLM adheres to a given custom guideline.|
-| [Custom Prompt Evaluation](https://github.com/uptrain-ai/uptrain/blob/main) | Evaluate by defining your custom grading prompt.|
-| [Cosine Similarity](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/cosine_similarity.ipynb) | Calculate cosine similarity between embeddings of two texts.|
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-
-
-
-
-
-
-
-
-
-
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Custom Guideline](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-guideline) | Allows you to specify a guideline and grades how well the LLM adheres to the provided guideline when giving a response. |
+|[Custom Prompts](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval) | Allows you to create your own set of evaluations. |
\ No newline at end of file
diff --git a/examples/checks/language_features/README.md b/examples/checks/language_features/README.md
index b046842f9..4a595c853 100644
--- a/examples/checks/language_features/README.md
+++ b/examples/checks/language_features/README.md
@@ -1,45 +1,27 @@
-
-
+
+
+
-
-
-
-
-# Evaluations based on language quality
-
-#### Evaluate the language quality of the response:
-
-| Metrics | Usage |
-|------------|----------|
-| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.|
-| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.|
-| [Rouge Score](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/rouge_score.ipynb) | Measure the similarity between two pieces of text. |
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-
-
-
-
-
-
-
-
-
-
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the language quality of the response on aspects such as fluency, politeness, grammar, and coherence. |
+|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone. |
\ No newline at end of file
diff --git a/examples/checks/response_quality/README.md b/examples/checks/response_quality/README.md
index 8a34bceaa..dbb4a1273 100644
--- a/examples/checks/response_quality/README.md
+++ b/examples/checks/response_quality/README.md
@@ -1,47 +1,31 @@
-
-
+
+
+
-
-
-
-
-# Evaluations based on response quality
-
-#### Evaluate the quality of your responses:
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-| Metrics | Usage |
-|------------|----------|
-| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. |
-| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. |
-| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. |
-| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/matching.ipynb) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. |
-| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. |
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-
-
-
-
-
-
-
-
-
-
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. |
+|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
+|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated response was to the question specified.|
+|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
+|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
\ No newline at end of file
diff --git a/examples/checks/safeguarding/README.md b/examples/checks/safeguarding/README.md
index ac177c1bb..e190d0a07 100644
--- a/examples/checks/safeguarding/README.md
+++ b/examples/checks/safeguarding/README.md
@@ -1,44 +1,27 @@
-
-
+
+
+
-
-
-
-
-# Evaluations to ensure better safety
-
-#### Evaluations to safeguard system prompts and avoid LLM mis-use:
-
-| Metrics | Usage |
-|------------|----------|
-| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks|
-| [Jailbreak Detection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/jailbreak_detection.ipynb) | Detect prompts with potentially harmful or illegal behaviour|
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-
-
-
-
-
-
-
-
-
-
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
+|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |
\ No newline at end of file
diff --git a/examples/checks/sub_query/README.md b/examples/checks/sub_query/README.md
index 08179307b..cbc621cb6 100644
--- a/examples/checks/sub_query/README.md
+++ b/examples/checks/sub_query/README.md
@@ -1,44 +1,27 @@
-
-
+
+
+
-
-
-
-
-# Pre-built Evaluations We Offer
-
-#### Evaluate the quality of sub queries:
-
-| Metrics | Usage |
-|------------|----------|
-| [Sub-query Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/sub_query/sub_query_completeness.ipynb) | Evaluate if the list of generated sub-questions comprehensively cover all aspects of the main question.|
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+
-
-
-
-
-
-
-
-
-
-
-
-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=)
\ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluates whether the sub-questions generated from a user's query, taken together, cover all aspects of the question. |
\ No newline at end of file
From 723acb908d5b43476a8d8e3e11ef30c149abda6d Mon Sep 17 00:00:00 2001
From: Shreyansh Jain
-
+
@@ -21,7 +21,7 @@
-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
+**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
diff --git a/examples/checks/README.md b/examples/checks/README.md
index 8dc443ba4..624bff999 100644
--- a/examples/checks/README.md
+++ b/examples/checks/README.md
@@ -1,5 +1,5 @@
-
+
@@ -18,7 +18,7 @@