From df185d1d931e1c1a35d71c5c22d07cf9b9d6bf62 Mon Sep 17 00:00:00 2001
From: Shreyansh Jain
Date: Wed, 6 Mar 2024 17:11:29 +0530
Subject: [PATCH 1/6] fixed readme for evals tutorials

---
 README.md                                   |  20 ++--
 .../response-matching.mdx                   |   6 +-
 examples/checks/README.md                   | 107 ++++++++---------
 examples/checks/code_eval/README.md         |  25 ++++
 .../checks/compare_ground_truth/README.md   |  26 +++++
 .../compare_ground_truth/matching.ipynb     |   4 +-
 examples/checks/context_awareness/README.md |  38 ++----
 examples/checks/conversation/README.md      |  37 ++----
 examples/checks/custom/README.md            | 110 ++----------------
 examples/checks/language_features/README.md |  40 ++-----
 examples/checks/response_quality/README.md  |  44 +++----
 examples/checks/safeguarding/README.md      |  39 ++-----
 examples/checks/sub_query/README.md         |  37 ++----
 13 files changed, 198 insertions(+), 335 deletions(-)
 create mode 100644 examples/checks/code_eval/README.md
 create mode 100644 examples/checks/compare_ground_truth/README.md

diff --git a/README.md b/README.md
index adbb89e5e..8495e9a69 100644
--- a/README.md
+++ b/README.md
@@ -71,18 +71,18 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact

| Eval | Description |
| ---- | ----------- |
-|[Reponse Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. |
-|[Reponse Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
-|[Reponse Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.|
-|[Reponse Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
-|[Reponse Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
+|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. |
+|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
+|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.|
+|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
+|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|

quality of retrieved context and response groundedness

| Eval | Description |
| ---- | ----------- |
|[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. |
-|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. |
+|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. |
|[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
|[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.
|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
@@ -123,9 +123,15 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact

| Eval | Description |
| ---- | ----------- |
-|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the generated response is leaking any system prompt. |
+|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |

+evaluate the clarity of user queries
+
+| Eval | Description |
+| ---- | ----------- |
+|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, answer all aspects of the question or not |
+
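+As a quick illustration, the checks listed above can be run through the open-source `EvalLLM` client. This is a minimal sketch, assuming the `Evals` check names exposed by your installed package version:
+
+```python
+from uptrain import EvalLLM, Evals
+
+# A sample data point with the columns most checks expect
+data = [{
+    "question": "What causes seasons on Earth?",
+    "context": "Seasons result from the tilt of the Earth's rotational axis relative to its orbital plane.",
+    "response": "Seasons are caused by the tilt of the Earth's axis of rotation."
+}]
+
+eval_llm = EvalLLM(openai_api_key="sk-*****")  # the LLM used as grader
+
+# Pick any of the preconfigured checks from the tables above
+results = eval_llm.evaluate(
+    data=data,
+    checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_COMPLETENESS]
+)
+print(results)  # one score (with explanation) per check, per row
+```
+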
# Get started πŸ™Œ diff --git a/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx b/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx index 37846b6f7..e7dd5ec5e 100644 --- a/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx +++ b/docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx @@ -1,11 +1,9 @@ --- title: Response Matching -description: Grades how relevant the generated context was to the question specified. +description: Grades how well the response generated by the LLM aligns with the provided ground truth. --- -Response relevance is the measure of how relevant the generated response is to the question asked. - -It helps evaluate how well the response addresses the question asked and if it contains any additional information that is not relevant to the question asked. +Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric. Columns required: - `question`: The question asked by the user diff --git a/examples/checks/README.md b/examples/checks/README.md index b2164c807..8dc443ba4 100644 --- a/examples/checks/README.md +++ b/examples/checks/README.md @@ -1,88 +1,89 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- -# Pre-built Evaluations We Offer πŸ“ +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -#### Evaluate the quality of your responses: +
-| Metrics | Usage | -|------------|----------| -| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. | -| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. | -| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. | -| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/matching.ipynb) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. | -| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. | +# Pre-built Evaluations We Offer πŸ“ +quality of your responses +| Eval | Description | +| ---- | ----------- | +|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. | +|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. | +|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.| +|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.| +|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.| -#### Evaluate the quality of retrieved context and response groundedness: +quality of retrieved context and response groundedness -| Metrics | Usage | -|------------|----------| +| Eval | Description | +| ---- | ----------- | |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. | -|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. | +|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. 
|
|[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
|[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.
|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|

+language quality of the response
+
+| Eval | Description |
+| ---- | ----------- |
+|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades whether the response has answered all the aspects of the question specified. |
+|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone |
+
+quality of generated code
+
+| Eval | Description |
+| ---- | ----------- |
+|[Code Hallucination](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the code present in the generated response is grounded by the context. |

-#### Evaluations to safeguard system prompts and avoid LLM mis-use:
+conversation as a whole

-| Metrics | Usage |
-|------------|----------|
-| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks|
-| [Jailbreak Detection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/jailbreak_detection.ipynb) | Detect prompts with potentially harmful or illegal behaviour|
+| Eval | Description |
+| ---- | ----------- |
+|[User Satisfaction](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction) | Grades how well the user's concerns are addressed and assesses their satisfaction based on provided conversation. |

-#### Evaluate the language quality of the response:
+custom evaluations and others

-| Metrics | Usage |
-|------------|----------|
-| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.|
-| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.|
-| [Rouge Score](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/rouge_score.ipynb) | Measure the similarity between two pieces of text. |
+| Eval | Description |
+| ---- | ----------- |
+|[Custom Guideline](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-guideline) | Allows you to specify a guideline and grades how well the LLM adheres to the provided guideline when giving a response. |
+|[Custom Prompts](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval) | Allows you to create your own set of evaluations. |

-#### Evaluate the conversation as a whole:
+compare responses with ground truth

-| Metrics | Usage |
-|------------|----------|
-| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user’s satisfaction with the conversation with the AI assistant based on completeness and user acceptance.|
+| Eval | Description |
+| ---- | ----------- |
+|[Response Matching](https://docs.uptrain.ai/predefined-evaluations/ground-truth-comparison/response-matching) | Compares and grades how well the response generated by the LLM aligns with the provided ground truth. |

-#### Defining custom evaluations and others:
+safeguard system prompts and avoid LLM mis-use

-| Metrics | Usage |
-|------------|----------|
-| [Guideline Adherence](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/guideline_adherence.ipynb) | Grade how well the LLM adheres to a given custom guideline.|
-| [Custom Prompt Evaluation](https://github.com/uptrain-ai/uptrain/blob/main) | Evaluate by defining your custom grading prompt.|
-| [Cosine Similarity](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/cosine_similarity.ipynb) | Calculate cosine similarity between embeddings of two texts.|
+| Eval | Description |
+| ---- | ----------- |
+|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
+|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |

-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+evaluate the clarity of user queries

-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, answer all aspects of the question or not | \ No newline at end of file
diff --git a/examples/checks/code_eval/README.md b/examples/checks/code_eval/README.md
new file mode 100644
index 000000000..2d4d2436a
--- /dev/null
+++ b/examples/checks/code_eval/README.md
@@ -0,0 +1,25 @@

+ + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications + +

+ +

+Try out Evaluations +- +Read Docs +- +Quickstart Tutorials +- +Slack Community +- +Feature Request +

+
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
+
+quality of generated code
+
+| Eval | Description |
+| ---- | ----------- |
+|[Code Hallucination](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the code present in the generated response is grounded by the context. |
diff --git a/examples/checks/compare_ground_truth/README.md b/examples/checks/compare_ground_truth/README.md
new file mode 100644
index 000000000..8fcec5f21
--- /dev/null
+++ b/examples/checks/compare_ground_truth/README.md
@@ -0,0 +1,26 @@

+ + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications + +

+ + +

+Try out Evaluations +- +Read Docs +- +Quickstart Tutorials +- +Slack Community +- +Feature Request +

+ +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. + +compare responses with ground truth + +| Eval | Description | +| ---- | ----------- | +|[Response Matching](https://docs.uptrain.ai/predefined-evaluations/ground-truth-comparison/response-matching) | Compares and grades how well the response generated by the LLM aligns with the provided ground truth. | \ No newline at end of file diff --git a/examples/checks/compare_ground_truth/matching.ipynb b/examples/checks/compare_ground_truth/matching.ipynb index f1e0342cb..07ecd6b31 100644 --- a/examples/checks/compare_ground_truth/matching.ipynb +++ b/examples/checks/compare_ground_truth/matching.ipynb @@ -33,9 +33,7 @@ "id": "2ef54d59-295e-4f15-a35f-33f4e86ecdd2", "metadata": {}, "source": [ - "**What is Response Matching?**: Response Completeness is a metric that determines how well the response generated by an LLM matches the ground truth. It comes in handy while checking for the overlap between an LLM generated response and ground truth.\n", - "\n", - "For example, if a user asks a question about the formula of chlorophyll, the ideal response could be: \"The formula of chlorophyll is C55H72MgN4O5\", rather if the response contains some other information about chlorophyll and not the formula: \"Chlorophyll is the pigmet used in photosynthesis, it helpes in generating oxygen.\" it might not really be ideal as it does not match with the ground truth resulting in a low matching score.\n", + "**What is Response Matching?**: Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric. \n", "\n", "**Data schema**: The data schema required for this evaluation is as follows:\n", "\n", diff --git a/examples/checks/context_awareness/README.md b/examples/checks/context_awareness/README.md index 4e97cd06b..fef040b84 100644 --- a/examples/checks/context_awareness/README.md +++ b/examples/checks/context_awareness/README.md @@ -1,47 +1,31 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- -# Context Awareness Evaluations πŸ“ +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -#### Evaluate the quality of retrieved context and response groundedness: +quality of retrieved context and response groundedness -| Metrics | Usage | -|------------|----------| +| Eval | Description | +| ---- | ----------- | |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. | -|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. | +|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. | |[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.| |[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information. -|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.| - -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). - -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.| \ No newline at end of file diff --git a/examples/checks/conversation/README.md b/examples/checks/conversation/README.md index 6fac2f4e1..0bb178a43 100644 --- a/examples/checks/conversation/README.md +++ b/examples/checks/conversation/README.md @@ -1,43 +1,26 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Conversation Satisfaction Evaluation πŸ“ - -#### Evaluate the conversation as a whole: - -| Metrics | Usage | -|------------|----------| -| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user’s satisfaction with the conversation with the AI assistant based on completeness and user acceptance.| +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +conversation as a whole -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[User Satisfaction](https://docs.uptrain.ai/predefined-evaluations/conversation-evals/user-satisfaction) | Grades how well the user's concerns are addressed and assesses their satisfaction based on provided conversation. | \ No newline at end of file diff --git a/examples/checks/custom/README.md b/examples/checks/custom/README.md index 9d5c27698..347edb323 100644 --- a/examples/checks/custom/README.md +++ b/examples/checks/custom/README.md @@ -1,117 +1,27 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

-

-Try out Evaluations -- -Read Docs -- -Slack Community -- -Feature Request -

- -

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Other Custom Evaluations πŸ“ - -#### Evaluate the quality of your responses: - -| Metrics | Usage | -|------------|----------| -| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. | -| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. | -| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. | -| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. | -| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. | - - -#### Evaluate based on langauge features: - -| Metrics | Usage | -|------------|----------| -| [Factual Accuracy](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/factual_accuracy.ipynb) | Evaluate if the facts present in the response can be verified by the retrieved context. | -| [Response Completeness wrt Context](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/response_completeness_wrt_context.ipynb) | Grades how complete the response was for the question specified concerning the information present in the context.| -| [Context Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/context_awareness/relevance.ipynb) | Evaluate if the retrieved context contains sufficient information to answer the given question. | - - -#### Evaluations to safeguard system prompts and avoid LLM mis-use: - -| Metrics | Usage | -|------------|----------| -| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks| - - -#### Evaluate the language quality of the response: - -| Metrics | Usage | -|------------|----------| -| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.| -| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.| - -#### Evaluate the conversation as a whole: - -| Metrics | Usage | -|------------|----------| -| [Conversation Satisfaction](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/conversation/conversation_satisfaction.ipynb) | Measures the user’s satisfaction with the conversation with the AI assistant based on completeness and user acceptance.| - - Github banner 006 (1) - -

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

-
-# Pre-built Evaluations We Offer πŸ“
-
-#### Defining custom evaluations and others:
-
-| Metrics | Usage |
-|------------|----------|
-| [Guideline Adherence](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/guideline_adherence.ipynb) | Grade how well the LLM adheres to a given custom guideline.|
-| [Custom Prompt Evaluation](https://github.com/uptrain-ai/uptrain/blob/main) | Evaluate by defining your custom grading prompt.|
-| [Cosine Similarity](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/cosine_similarity.ipynb) | Calculate cosine similarity between embeddings of two texts.|
+**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

-If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
+custom evaluations and others

-You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file
+| Eval | Description |
+| ---- | ----------- |
+|[Custom Guideline](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-guideline) | Allows you to specify a guideline and grades how well the LLM adheres to the provided guideline when giving a response. |
+|[Custom Prompts](https://docs.uptrain.ai/predefined-evaluations/custom-evals/custom-prompt-eval) | Allows you to create your own set of evaluations. | \ No newline at end of file
diff --git a/examples/checks/language_features/README.md b/examples/checks/language_features/README.md
index b046842f9..4a595c853 100644
--- a/examples/checks/language_features/README.md
+++ b/examples/checks/language_features/README.md
@@ -1,45 +1,27 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Evaluations based on language quality πŸ“ - -#### Evaluate the language quality of the response: - -| Metrics | Usage | -|------------|----------| -| [Tone Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/tone_critique.ipynb) | Assess if the tone of machine-generated responses matches with the desired persona.| -| [Language Critique](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/language_critique.ipynb) | Evaluate LLM generated responses on multiple aspects - fluence, politeness, grammar, and coherence.| -| [Rouge Score](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/language_features/rouge_score.ipynb) | Measure the similarity between two pieces of text. | +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +language quality of the response -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades whether the response has answered all the aspects of the question specified. | +|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone | \ No newline at end of file diff --git a/examples/checks/response_quality/README.md b/examples/checks/response_quality/README.md index 8a34bceaa..dbb4a1273 100644 --- a/examples/checks/response_quality/README.md +++ b/examples/checks/response_quality/README.md @@ -1,47 +1,31 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Evaluations based on response quality πŸ“ - -#### Evaluate the quality of your responses: +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -| Metrics | Usage | -|------------|----------| -| [Response Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/completeness.ipynb) | Evaluate if the response completely resolves the given user query. | -| [Response Relevance](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/relevance.ipynb) | Evaluate whether the generated response for the given question, is relevant or not. | -| [Response Conciseness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/conciseness.ipynb) | Evaluate how concise the generated response is i.e. the extent of additional irrelevant information in the response. | -| [Response Matching ](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/matching.ipynb) | Compare the LLM-generated text with the gold (ideal) response using the defined score metric. | -| [Response Consistency](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/response_quality/consistency.ipynb) | Evaluate how consistent the response is with the question asked as well as with the context provided. | -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +quality of your responses -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. | +|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. | +|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.| +|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.| +|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.| \ No newline at end of file diff --git a/examples/checks/safeguarding/README.md b/examples/checks/safeguarding/README.md index ac177c1bb..e190d0a07 100644 --- a/examples/checks/safeguarding/README.md +++ b/examples/checks/safeguarding/README.md @@ -1,44 +1,27 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Evaluations to ensure better safety πŸ“ - -#### Evaluations to safeguard system prompts and avoid LLM mis-use: - -| Metrics | Usage | -|------------|----------| -| [Prompt Injection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/system_prompt_injection.ipynb) | Identify prompt leakage attacks| -| [Jailbreak Detection](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/safeguarding/jailbreak_detection.ipynb) | Detect prompts with potentially harmful or illegal behaviour| +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +safeguard system prompts and avoid LLM mis-use -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. | +|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). | \ No newline at end of file diff --git a/examples/checks/sub_query/README.md b/examples/checks/sub_query/README.md index 08179307b..cbc621cb6 100644 --- a/examples/checks/sub_query/README.md +++ b/examples/checks/sub_query/README.md @@ -1,44 +1,27 @@

- - Github banner 006 (1) + + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

+

Try out Evaluations - Read Docs - +Quickstart Tutorials +- Slack Community - Feature Request

-

- - PRs Welcome - - - - - - Quickstart - - - Website - -

- - -# Pre-built Evaluations We Offer πŸ“ - -#### Evaluate the quality of sub queries: - -| Metrics | Usage | -|------------|----------| -| [Sub-query Completeness](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/sub_query/sub_query_completeness.ipynb) | Evaluate if the list of generated sub-questions comprehensively cover all aspects of the main question.| +**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. -If you face any difficulties, need some help with using UpTrain or want to brainstorm on custom evaluations for your use-case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min). +evaluate the clarity of user queries -You can also raise a request for a new metrics [here](https://github.com/uptrain-ai/uptrain/issues/new?assignees=&labels=enhancement&template=feature_request.md&title=) \ No newline at end of file +| Eval | Description | +| ---- | ----------- | +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, answer all aspects of the question or not | \ No newline at end of file From 723acb908d5b43476a8d8e3e11ef30c149abda6d Mon Sep 17 00:00:00 2001 From: Shreyansh Jain Date: Wed, 6 Mar 2024 20:05:48 +0530 Subject: [PATCH 2/6] fix uptrain website links --- README.md | 4 ++-- examples/checks/README.md | 4 ++-- examples/checks/code_eval/README.md | 4 ++-- examples/checks/compare_ground_truth/README.md | 4 ++-- examples/checks/context_awareness/README.md | 4 ++-- examples/checks/conversation/README.md | 4 ++-- examples/checks/custom/README.md | 4 ++-- examples/checks/language_features/README.md | 4 ++-- examples/checks/response_quality/README.md | 4 ++-- examples/checks/safeguarding/README.md | 4 ++-- examples/checks/sub_query/README.md | 4 ++-- 11 files changed, 22 insertions(+), 22 deletions(-) diff --git a/README.md b/README.md index 8495e9a69..fdcd932a7 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -21,7 +21,7 @@ Demo of UpTrain's LLM evaluations with scores for hallucinations, retrieved-context quality, response tonality for a customer support chatbot -**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
diff --git a/examples/checks/README.md b/examples/checks/README.md index 8dc443ba4..624bff999 100644 --- a/examples/checks/README.md +++ b/examples/checks/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -18,7 +18,7 @@

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
diff --git a/examples/checks/code_eval/README.md b/examples/checks/code_eval/README.md index 2d4d2436a..c7b379ec5 100644 --- a/examples/checks/code_eval/README.md +++ b/examples/checks/code_eval/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -16,7 +16,7 @@ Feature Request


-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
+**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

quality of generated code

diff --git a/examples/checks/compare_ground_truth/README.md b/examples/checks/compare_ground_truth/README.md
index 8fcec5f21..3fa7b7eb0 100644
--- a/examples/checks/compare_ground_truth/README.md
+++ b/examples/checks/compare_ground_truth/README.md
@@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -17,7 +17,7 @@ Feature Request

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. compare responses with ground truth diff --git a/examples/checks/context_awareness/README.md b/examples/checks/context_awareness/README.md index fef040b84..4199cec74 100644 --- a/examples/checks/context_awareness/README.md +++ b/examples/checks/context_awareness/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -18,7 +18,7 @@

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. quality of retrieved context and response groundedness diff --git a/examples/checks/conversation/README.md b/examples/checks/conversation/README.md index 0bb178a43..8403ef1fd 100644 --- a/examples/checks/conversation/README.md +++ b/examples/checks/conversation/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -17,7 +17,7 @@ Feature Request

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. conversation as a whole diff --git a/examples/checks/custom/README.md b/examples/checks/custom/README.md index 347edb323..c777a28fb 100644 --- a/examples/checks/custom/README.md +++ b/examples/checks/custom/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -17,7 +17,7 @@ Feature Request

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. custom evaluations and others diff --git a/examples/checks/language_features/README.md b/examples/checks/language_features/README.md index 4a595c853..427e1ca40 100644 --- a/examples/checks/language_features/README.md +++ b/examples/checks/language_features/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -17,7 +17,7 @@ Feature Request

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. language quality of the response diff --git a/examples/checks/response_quality/README.md b/examples/checks/response_quality/README.md index dbb4a1273..314e1c50c 100644 --- a/examples/checks/response_quality/README.md +++ b/examples/checks/response_quality/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -17,7 +17,7 @@ Feature Request

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. quality of your responses diff --git a/examples/checks/safeguarding/README.md b/examples/checks/safeguarding/README.md index e190d0a07..a3a353ef9 100644 --- a/examples/checks/safeguarding/README.md +++ b/examples/checks/safeguarding/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -17,7 +17,7 @@ Feature Request

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. safeguard system prompts and avoid LLM mis-use diff --git a/examples/checks/sub_query/README.md b/examples/checks/sub_query/README.md index cbc621cb6..5cf06deec 100644 --- a/examples/checks/sub_query/README.md +++ b/examples/checks/sub_query/README.md @@ -1,5 +1,5 @@

- + Logo of UpTrain - an open-source platform to evaluate and improve LLM applications

@@ -18,7 +18,7 @@

-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. +**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them. evaluate the clarity of user queries From 37a4c4ae735878e5d0833eeca39db8953bcb8239 Mon Sep 17 00:00:00 2001 From: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com> Date: Wed, 6 Mar 2024 21:49:08 +0530 Subject: [PATCH 3/6] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index fdcd932a7..e3770d9bf 100644 --- a/README.md +++ b/README.md @@ -130,7 +130,7 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact | Eval | Description | | ---- | ----------- | -|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, answer all aspects of the question or not | +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |
From 8e868aaf514d22d8c12231cb7dc5907046825442 Mon Sep 17 00:00:00 2001 From: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com> Date: Thu, 7 Mar 2024 20:00:17 +0530 Subject: [PATCH 4/6] Update README.md --- examples/checks/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/checks/README.md b/examples/checks/README.md index 624bff999..82246fd23 100644 --- a/examples/checks/README.md +++ b/examples/checks/README.md @@ -86,4 +86,4 @@ | Eval | Description | | ---- | ----------- | -|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, answer all aspects of the question or not | \ No newline at end of file +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not | From a16378fe81c977e5a6bae9109e2cd6aa3ceab98a Mon Sep 17 00:00:00 2001 From: Dhruv Chawla <43818888+Dominastorm@users.noreply.github.com> Date: Thu, 7 Mar 2024 20:02:39 +0530 Subject: [PATCH 5/6] Update README.md --- examples/checks/sub_query/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/checks/sub_query/README.md b/examples/checks/sub_query/README.md index 5cf06deec..73abf6173 100644 --- a/examples/checks/sub_query/README.md +++ b/examples/checks/sub_query/README.md @@ -24,4 +24,4 @@ | Eval | Description | | ---- | ----------- | -|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, answer all aspects of the question or not | \ No newline at end of file +|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not | From ba4e5508da3a05b49842964e2fc509b9b8ecc847 Mon Sep 17 00:00:00 2001 From: Shreyansh Jain Date: Mon, 11 Mar 2024 17:52:14 +0530 Subject: [PATCH 6/6] minor fixes --- README.md | 17 +++++++++++------ docs/dashboard/evaluations.mdx | 2 ++ docs/dashboard/getting_started.mdx | 3 +++ docs/dashboard/project.mdx | 2 ++ docs/dashboard/prompts.mdx | 2 ++ examples/checks/README.md | 2 +- examples/checks/language_features/README.md | 2 +- 7 files changed, 22 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index e3770d9bf..dda975984 100644 --- a/README.md +++ b/README.md @@ -55,14 +55,17 @@ UpTrain provides tons of ways to **customize evaluations**. You can customize ev Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact match, etc. +Interactive Dashboards + +UpTrain Dashboard is a web-based interface that runs on your **local machine**. You can use the dashboard to evaluate your LLM applications, view the results, and perform root cause analysis. + ### Coming Soon: -1. Experiment Dashboards -2. Collaborate with your team -3. Embedding visualization via UMAP and Clustering -4. Pattern recognition among failure cases -5. Prompt improvement suggestions +1. Collaborate with your team +2. Embedding visualization via UMAP and Clustering +3. Pattern recognition among failure cases +4. Prompt improvement suggestions
@@ -79,6 +82,7 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact quality of retrieved context and response groundedness + | Eval | Description | | ---- | ----------- | |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. | @@ -91,7 +95,7 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact | Eval | Description | | ---- | ----------- | -|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades whether the response has answered all the aspects of the question specified. | +|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication. | |[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone | language quality of the response @@ -153,6 +157,7 @@ cd uptrain # Run UpTrain bash run_uptrain.sh ``` +> **_NOTE:_** UpTrain Dashboard is currently in **Beta version**. We would love your feedback to improve it. ## Using the UpTrain package diff --git a/docs/dashboard/evaluations.mdx b/docs/dashboard/evaluations.mdx index 8dddf580b..733902e9f 100644 --- a/docs/dashboard/evaluations.mdx +++ b/docs/dashboard/evaluations.mdx @@ -55,6 +55,8 @@ You can look at the complete list of UpTrain's supported metrics [here](/predefi +UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + Before you start, ensure you have docker installed on your machine. If not, you can install it from [here](https://docs.docker.com/get-docker/). + ### How to install? The following commands will download the UpTrain dashboard and start it on your local machine: @@ -24,6 +25,8 @@ cd uptrain bash run_uptrain.sh ``` +UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + +UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it. + diff --git a/examples/checks/language_features/README.md b/examples/checks/language_features/README.md index 427e1ca40..ef3236477 100644 --- a/examples/checks/language_features/README.md +++ b/examples/checks/language_features/README.md @@ -23,5 +23,5 @@ | Eval | Description | | ---- | ----------- | -|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades whether the response has answered all the aspects of the question specified. | +|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication. | |[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone | \ No newline at end of file
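+
+For reference, a minimal sketch of invoking these language-quality checks through the open-source client, assuming the `Evals.CRITIQUE_LANGUAGE` check and the `CritiqueTone` parametric check exposed by your installed package version:
+
+```python
+from uptrain import EvalLLM, Evals, CritiqueTone
+
+data = [{
+    "question": "How do I reset my password?",
+    "response": "Click 'Forgot password' on the login page and follow the emailed link."
+}]
+
+eval_llm = EvalLLM(openai_api_key="sk-*****")
+
+results = eval_llm.evaluate(
+    data=data,
+    checks=[
+        Evals.CRITIQUE_LANGUAGE,  # scores fluency, coherence, grammar and politeness
+        CritiqueTone(llm_persona="helpful support agent"),  # tone vs. the required persona
+    ]
+)
+print(results)
+```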