
Add Vectara Hallucination Detection Model #950

Merged
merged 22 commits into from
Mar 23, 2024


@Josephrp Josephrp commented Mar 2, 2024

Added the Vectara hallucination detection model to the Huggingface class.

Also adding an exposition/model example that uses Hugging Face models end to end as a demo.

This is a draft PR; we just need to add some text explaining the examples we have chosen.


Ellipsis 🚀 This PR description was created by Ellipsis for commit 4f3353e.

Summary:

The PR adds a new method hallucination_evaluator to the Huggingface class in hugs.py for evaluating the hallucination score of a combined input of two statements using the Huggingface hallucination evaluation model.

Key points:

  • Added hallucination_evaluator method to Huggingface class in hugs.py.
  • The method uses the HUGS_HALLUCINATION_API_URL endpoint for the hallucination evaluation model from Huggingface.
  • The method takes two arguments: model_output and retrieved_text_chunks, combines them, and sends a POST request to the API.
  • The response is parsed to extract the hallucination score.
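
The flow described in the key points above can be sketched roughly as follows. This is a minimal illustration, not the code merged in `hugs.py`: the URL value, the `[SEP]` joining convention, the `{"inputs": ...}` payload, the nested-list response shape, and the `http_post` and `fake_post` helpers are all assumptions made for the example.

```python
import json
from typing import Any, Callable
from urllib import request

# Hypothetical placeholder: the real value of HUGS_HALLUCINATION_API_URL
# is not shown on this page.
HUGS_HALLUCINATION_API_URL = "https://example.invalid/hallucination-model"


def http_post(url: str, payload: dict) -> Any:
    """POST a JSON payload and decode the JSON reply (standard library only)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def hallucination_evaluator(
    model_output: str,
    retrieved_text_chunks: str,
    post: Callable[[str, dict], Any] = http_post,
) -> float:
    """Combine the two texts, send them to the model endpoint, return the score."""
    # Assumed joining convention for the two statements.
    combined_input = f"{model_output} [SEP] {retrieved_text_chunks}"
    response = post(HUGS_HALLUCINATION_API_URL, {"inputs": combined_input})
    # Assumed response shape: [[{"label": "...", "score": 0.87}]]
    return float(response[0][0]["score"])


def fake_post(url: str, payload: dict) -> Any:
    """Stand-in for the network call so the sketch runs offline."""
    return [[{"label": "consistent", "score": 0.87}]]


score = hallucination_evaluator(
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    post=fake_post,
)
```

Passing the POST function as a parameter is only a convenience for this sketch: it lets the parsing logic be exercised without a network call or an API key.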



@Josephrp Josephrp marked this pull request as ready for review March 2, 2024 15:21
@dosubot dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and documentation (Improvements or additions to documentation) labels Mar 2, 2024

@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested.

  • Reviewed the entire pull request up to 4f3353e
  • Looked at 61 lines of code in 1 file
  • Took 1 minute and 3 seconds to review
  • Skipped 4 files when reviewing.
  • Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%.
1. trulens_eval/trulens_eval/feedback/provider/hugs.py:485:
  • Assessed confidence : 100%
  • Grade: 40%
  • Comment:
    The method hallucination_evaluator does not handle the case when the response from the API is not a list and not a proper HTTP response. This could lead to unexpected behavior. Consider adding an else clause to handle this case.
  • Reasoning:
    The new method hallucination_evaluator is not handling the case when the response from the API is not a list and not a proper HTTP response. This could lead to unexpected behavior.
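
One way the suggested else clause could look is sketched below. It is an illustration only, not the fix that was merged: the function name `parse_hallucination_score`, the nested-list payload shape, and the use of a `status_code` attribute to recognize an HTTP response object are assumptions.

```python
from typing import Any


def parse_hallucination_score(response: Any) -> float:
    """Extract a score from an API reply, failing loudly on anything unexpected."""
    if isinstance(response, list):
        # Assumed success shape: [[{"label": "...", "score": 0.87}]]
        try:
            return float(response[0][0]["score"])
        except (IndexError, KeyError, TypeError, ValueError) as exc:
            raise ValueError(f"Unexpected list payload: {response!r}") from exc
    elif hasattr(response, "status_code"):
        # Looks like an HTTP response object, which here means the call failed.
        raise RuntimeError(
            f"API request failed with status {response.status_code}"
        )
    else:
        # The branch the review asks for: neither a list nor an HTTP response.
        raise TypeError(
            f"Unexpected response type: {type(response).__name__}"
        )
```

Raising in the final branch, rather than returning a default score, keeps a malformed reply from being silently recorded as a valid evaluation.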

Workflow ID: wflow_I5ako30SCD4tU8DH



@@ -0,0 +1,300 @@
{
@joshreini1 joshreini1 Mar 4, 2024

Vectra -> Vectara?

Please remove the stray "or."



@joshreini1
Can you remove the `.ipynb_checkpoints` files from this change?

Also, can you elaborate on why there are two separate notebooks to show this capability?

@@ -0,0 +1,300 @@
{
@joshreini1 joshreini1 Mar 4, 2024

Since this notebook is focused on the Vectara HHEM evaluator, can you rename the notebook to reflect that?

Suggestion: vectara_hallucination_evaluator.ipynb



@@ -0,0 +1,300 @@
{
@joshreini1 joshreini1 Mar 4, 2024

It'd be useful here to show usage of this evaluator as part of a recorded app, e.g. as shown in https://www.trulens.org/trulens_eval/langchain_quickstart/



@@ -0,0 +1,260 @@
{
@joshreini1 joshreini1 Mar 4, 2024

This notebook seems similar to the one under /models - what is the distinction?



@joshreini1
Thanks for all the work on this, @MN-Noor @Josephrp. This is close! I just added a few small comments to address before it can go in.

@Josephrp

Josephrp commented Mar 9, 2024

Hi @MN-Noor, would you like to take the last reviews above?

@joshreini1, thanks for the comments. Should we keep both notebooks (renamed), or remove one of them?

Sorry to bother you, but I'll be able to wrap this up :-)

@joshreini1

@Josephrp keep the one in /models and remove the other. Thanks!

@dosubot dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) and removed the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Mar 13, 2024
@Josephrp
@joshreini1 thanks! @MN-Noor wrapped it up nicely. Hope that's us done, until next time!

@joshreini1 joshreini1 self-requested a review March 15, 2024 17:28
@joshreini1 joshreini1 merged commit bff1cdc into truera:main Mar 23, 2024
9 checks passed
@joshreini1
@Josephrp @MN-Noor do you have a Twitter/X account? I’ll give you a shoutout from the TruLens handle if so!

@Josephrp
Ha! I sure do, here's me.
Noor's doesn't work well where she is, so she prefers LinkedIn.

Hope that's okay :-)

Labels: documentation (Improvements or additions to documentation), size:XL (This PR changes 500-999 lines, ignoring generated files.)

3 participants