verify feedback selectors on recorder init #961

Merged
merged 13 commits into from
Mar 12, 2024

Conversation

piotrm0
Contributor

@piotrm0 piotrm0 commented Mar 6, 2024

  • Fill in normal method call information in the dummy record.
  • Ignore failures of the form something_referring_to_method_m.args.method_m_arg.anything or something_referring_to_method_m.rets.anything, since nothing beyond the known parameter names and the existence of return values is known before the app is invoked.
  • Add an App.dummy_record method that produces the records used to check for selector issues; it may also be independently useful to users.
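
The second bullet can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `known_parameters` and `is_unknowable_before_invocation`, the `ToySynthesizer` class, and the representation of a selector as a list of steps are all assumptions made for illustration. The idea is that parameter names can be read from a method signature ahead of time, but anything below an argument or a return value cannot be checked until the app actually runs.

```python
import inspect
from typing import Callable, Dict, List, Sequence


def known_parameters(methods: Dict[str, Callable]) -> Dict[str, List[str]]:
    """Collect the parameter names of each method from its signature (hypothetical helper)."""
    return {
        name: [p for p in inspect.signature(fn).parameters if p != "self"]
        for name, fn in methods.items()
    }


def is_unknowable_before_invocation(
    path: Sequence[str], method_params: Dict[str, List[str]]
) -> bool:
    """True if the selector reaches below a known method's argument or return value."""
    for i, step in enumerate(path):
        if step not in method_params:
            continue
        rest = list(path[i + 1:])
        # <method>.rets.<anything>: the shape of return values is unknown in advance.
        if len(rest) >= 2 and rest[0] == "rets":
            return True
        # <method>.args.<known arg>.<anything>: only the argument name is known in advance.
        if len(rest) >= 3 and rest[0] == "args" and rest[1] in method_params[step]:
            return True
    return False


class ToySynthesizer:
    """Stand-in for an instrumented component; not a real library class."""

    def get_response(self, query_str, text_chunks):
        return "..."


params = known_parameters({"get_response": ToySynthesizer.get_response})
```

With these definitions, a path ending in `get_response.rets.response` would be tolerated by the check, while `get_response.args.nosucharg.x` would still be reported because `nosucharg` is not a known parameter name.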

When an app recorder is created with feedbacks, the selectors in those feedbacks are checked against the app and an (empty) record so that wrong selectors are caught early. Settings that skip this check, or that print the message without raising the error, are provided and explained in the error message. The message also includes a dump of the data selected by the longest prefix of the selector that does exist. Example:

f = Feedback(hugs.language_match).on(Select.App.app._response_synthesizer.thisdoesnotexist.thisalso)

tru_query_engine_recorder = TruLlama(query_engine, feedbacks=[f])

Produces an exception and a hint message:

ValueError: Some selectors do not exist in the app or record.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                              Selector check failed                                              ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Source of argument text1 to language_match does not exist in app or expected record:                               

                                                                                                                   
 __record__.app._response_synthesizer.thisdoesnotexist.thisalso                                                    
                                                                                                                   

The data used to make this check may be incomplete. If you expect records produced by your app to contain the      
selected content, you can ignore this error by setting selectors_nocheck in the TruLlama constructor.              
Alternatively, setting selectors_check_warning will print out this message but will not raise an error.            


                                              Additional information:                                              

Feedback function signature:                                                                                       

                                                                                                                   
 (text1: str, text2: str) -> Tuple[float, Dict]                                                                    
                                                                                                                   

The prefix __record__.app._response_synthesizer selects this data that exists in your app or typical records:      

 • Object of type dict starting with:                                                                              

                                                                                                                   
       {                                                                                                           
         '_llm': {                                                                                                 
           'wrapped_llm_predict': [...],                                                                           
           'wrapped_async_llm_predict': [...],                                                                     
           'wrapped_llm_chat': [...],                                                                              
           'wrapped_async_llm_chat': [...]                                                                         
         },                                                                                                        
         'get_response': [RecordAppCall(...), RecordAppCall(...), RecordAppCall(...)]                              
       }                                                                                                           
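
The "longest prefix that does exist" shown in the hint above can be computed with a simple walk over the data. The following is a rough sketch under assumptions, not the PR's implementation: the function name `longest_existing_prefix`, the list-of-steps representation of a selector, and the plain nested dict standing in for the app and record are all hypothetical.

```python
from typing import Any, List, Tuple


def longest_existing_prefix(data: Any, path: List[str]) -> Tuple[List[str], Any]:
    """Follow `path` step by step and stop at the first step that does not resolve.

    Returns the steps that resolved and the value they select.
    """
    current = data
    found: List[str] = []
    for step in path:
        if isinstance(current, dict):
            if step not in current:
                break
            current = current[step]
        elif hasattr(current, step):
            current = getattr(current, step)
        else:
            break
        found.append(step)
    return found, current


# Toy stand-in for the serialized app; the keys mirror the dump in the message above.
app_like = {"app": {"_response_synthesizer": {"_llm": {}, "get_response": []}}}
path = ["app", "_response_synthesizer", "thisdoesnotexist", "thisalso"]
prefix, value = longest_existing_prefix(app_like, path)
```

Here `prefix` stops at `_response_synthesizer`, and `value` is the dict that a hint message could dump to show which keys are actually available.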
                                                                                                                   

Ellipsis 🚀 This PR description was created by Ellipsis for commit 90514b3.

Summary:

This PR introduces a method to verify feedback selectors, improves comments and logging, adds support for NEMO Guardrails apps with new classes, and updates test files and documentation.

Key points:

  • Added a new method check_selectors in App and Feedback classes in app.py and feedback.py respectively.
  • Improved comments and logging messages in app.py and feedback.py.
  • Introduced support for NEMO Guardrails apps with the creation of TruRails and RailsInstrument classes in tru_rails.py.
  • Modified test files and updated documentation.

Generated with ❤️ by ellipsis.dev


Ellipsis 🚀 This PR description was created by Ellipsis for commit c2289f5.

Summary:

This PR maintains the detailed explanation of the error handling mechanism, introduces a method to verify feedback selectors, improves comments and logging, adds support for NEMO Guardrails apps, and updates test files and documentation.

Key points:

  • Maintains the detailed explanation of the error handling mechanism when selectors in feedbacks are checked against the app and record.
  • Introduces a method to verify feedback selectors.
  • Improves comments and logging in app.py and feedback.py.
  • Adds support for NEMO Guardrails apps with the creation of TruRails and RailsInstrument classes in tru_rails.py.
  • Updates test files and documentation.


@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Mar 6, 2024
@piotrm0 piotrm0 marked this pull request as draft March 6, 2024 19:06

@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested.

  • Reviewed the entire pull request up to 0cb3c5f
  • Looked at 195 lines of code in 2 files
  • Took 2 minutes and 51 seconds to review
  • Skipped 0 files when reviewing.
  • Skipped posting 0 additional comments because they didn't meet confidence threshold of 50%.

Workflow ID: wflow_KHxlrV4eb8P1Hk0F


"""
Check that the selectors are valid for the given app and record.
"""
return True

The check_selectors method currently does not perform any checks and always returns True. This could potentially lead to issues if the selectors are not valid for the given app and record. Please implement the necessary checks or remove the method if it's not needed.
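
For illustration only, a non-trivial check could look roughly like the sketch below. It is not the implementation that was eventually merged: the free function `check_selectors`, the helper `_resolves`, and the nested-dict data are assumptions. Only the flag names `selectors_nocheck` and `selectors_check_warning` and the opening of the error text come from the PR description.

```python
import warnings
from typing import Any, Iterable, Sequence


def _resolves(data: Any, path: Sequence[str]) -> bool:
    """True if every step of `path` is a key in the nested dicts of `data`."""
    current = data
    for step in path:
        if not (isinstance(current, dict) and step in current):
            return False
        current = current[step]
    return True


def check_selectors(
    data: Any,
    selectors: Iterable[Sequence[str]],
    selectors_nocheck: bool = False,
    selectors_check_warning: bool = False,
) -> bool:
    """Return True if all selectors resolve; otherwise warn or raise depending on the flags."""
    if selectors_nocheck:
        # The caller opted out of the check entirely.
        return True
    missing = [".".join(p) for p in selectors if not _resolves(data, p)]
    if not missing:
        return True
    message = "Some selectors do not exist in the app or record: " + ", ".join(missing)
    if selectors_check_warning:
        # Report the problem but let construction continue.
        warnings.warn(message)
        return False
    raise ValueError(message)
```

Returning False in the warning case, rather than True, is a design choice of this sketch so that callers can still tell that something was missing.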

@@ -415,6 +415,7 @@
higher_is_better=self.higher_is_better
)

# alias
on_input = on_prompt

The on_prompt and on_input methods are identical, as are on_response and on_output. Having two methods that do the same thing can lead to confusion. Consider removing one of each pair, or if they are needed for backward compatibility, mark one as deprecated.
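
If one name of each pair were to be deprecated (the PR itself keeps plain aliases), a common Python pattern is a thin wrapper that emits a DeprecationWarning and forwards to the preferred method. The class `FeedbackSketch` below is a hypothetical stand-in, not the library's Feedback class, and the choice of `on_input` as the deprecated name is arbitrary.

```python
import warnings


class FeedbackSketch:
    """Toy stand-in used only to show the deprecation pattern."""

    def __init__(self):
        self.selected = None

    def on_prompt(self, arg=None):
        """Preferred name: record that the main input is selected."""
        self.selected = ("main_input", arg)
        return self

    def on_input(self, arg=None):
        """Deprecated alias kept for backward compatibility."""
        warnings.warn(
            "on_input is deprecated; use on_prompt instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.on_prompt(arg)
```

Existing callers keep working, and the warning gives them a migration path before the alias is removed.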

@ellipsis-dev ellipsis-dev bot left a comment

No problems found on commit 90514b3.


@piotrm0 piotrm0 requested a review from joshreini1 March 7, 2024 00:04
@piotrm0 piotrm0 marked this pull request as ready for review March 7, 2024 00:04
@dosubot dosubot bot added the documentation Improvements or additions to documentation label Mar 7, 2024
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Mar 7, 2024
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 7, 2024
@joshreini1
Contributor

Expecting this change to make a big difference in users understanding why they run into issues with the feedback selector. Nicely done!

@ellipsis-dev ellipsis-dev bot left a comment

👍 Looks good to me!

  • Performed an incremental review on c2289f5
  • Looked at 5025 lines of code in 41 files
  • Took 13 minutes and 30 seconds to review
  • Skipped 6 files when reviewing.
  • Skipped posting 6 additional comments because they didn't meet confidence threshold of 50%.
1. trulens_eval/trulens_eval/tru_chain.py:14:
  • Assessed confidence : 10%
  • Comment:
    The TruChain class is a wrapper for a LangChain chain that adds logging and feedback functionality. It checks the selectors in the feedback functions and raises an error or prints a warning if a selector doesn't exist, depending on the settings. It also logs the calls to the __call__ method of the chain and their results, and can be used as a context manager for automatic logging.
  • Reasoning:
    The TruChain class is a wrapper for a LangChain chain that adds logging and feedback functionality. The __init__ method takes a chain argument, which is the chain to be wrapped, and an app_id argument, which is a unique identifier for the app. It also takes a feedbacks argument, which is a list of feedback functions to be used for evaluating the app. The __init__ method also has optional arguments for controlling the behavior of the wrapper, such as selectors_nocheck and selectors_check_warning, which control whether to check the selectors in the feedback functions and whether to raise an error if a selector doesn't exist. The __init__ method calls the check_selectors method to check the selectors in the feedback functions. The check_selectors method raises a ValueError if a selector doesn't exist and selectors_nocheck is False. If selectors_check_warning is True, it prints a warning message instead of raising an error. The TruChain class also has a __call__ method that calls the __call__ method of the chain and logs the call and its result. The __call__ method returns the result of the __call__ method of the chain. The __call__ method also has a with_record method that calls the __call__ method of the chain and returns both the result and the record of the call. The TruChain class also has a __enter__ and a __exit__ method, which allow it to be used as a context manager. When used as a context manager, it automatically logs the calls to the __call__ method of the chain and their results.
2. trulens_eval/trulens_eval/tru_llama.py:15:
  • Assessed confidence : 10%
  • Comment:
    The TruLlama class is a wrapper for a Llama Index app that adds logging and feedback functionality. It checks the selectors in the feedback functions and raises an error or prints a warning if a selector doesn't exist, depending on the settings. It also logs the calls to the query method of the query_engine and their results, and can be used as a context manager for automatic logging.
  • Reasoning:
    The TruLlama class is a wrapper for a Llama Index app that adds logging and feedback functionality. The __init__ method takes a query_engine argument, which is the query engine of the Llama Index app to be wrapped, and an app_id argument, which is a unique identifier for the app. It also takes a feedbacks argument, which is a list of feedback functions to be used for evaluating the app. The __init__ method also has optional arguments for controlling the behavior of the wrapper, such as selectors_nocheck and selectors_check_warning, which control whether to check the selectors in the feedback functions and whether to raise an error if a selector doesn't exist. The __init__ method calls the check_selectors method to check the selectors in the feedback functions. The check_selectors method raises a ValueError if a selector doesn't exist and selectors_nocheck is False. If selectors_check_warning is True, it prints a warning message instead of raising an error. The TruLlama class also has a __call__ method that calls the query method of the query_engine and logs the call and its result. The __call__ method returns the result of the query method of the query_engine. The __call__ method also has a with_record method that calls the query method of the query_engine and returns both the result and the record of the call. The TruLlama class also has a __enter__ and a __exit__ method, which allow it to be used as a context manager. When used as a context manager, it automatically logs the calls to the query method of the query_engine and their results.
3. trulens_eval/trulens_eval/tru_rails.py:15:
  • Assessed confidence : 10%
  • Comment:
    The TruRails class is a wrapper for a NEMO Guardrails app that adds logging and feedback functionality. It checks the selectors in the feedback functions and raises an error or prints a warning if a selector doesn't exist, depending on the settings. It also logs the calls to the __call__ method of the app and their results, and can be used as a context manager for automatic logging.
  • Reasoning:
    The TruRails class is a wrapper for a NEMO Guardrails app that adds logging and feedback functionality. The __init__ method takes an app argument, which is the app to be wrapped, and an app_id argument, which is a unique identifier for the app. It also takes a feedbacks argument, which is a list of feedback functions to be used for evaluating the app. The __init__ method also has optional arguments for controlling the behavior of the wrapper, such as selectors_nocheck and selectors_check_warning, which control whether to check the selectors in the feedback functions and whether to raise an error if a selector doesn't exist. The __init__ method calls the check_selectors method to check the selectors in the feedback functions. The check_selectors method raises a ValueError if a selector doesn't exist and selectors_nocheck is False. If selectors_check_warning is True, it prints a warning message instead of raising an error. The TruRails class also has a __call__ method that calls the __call__ method of the app and logs the call and its result. The __call__ method returns the result of the __call__ method of the app. The __call__ method also has a with_record method that calls the __call__ method of the app and returns both the result and the record of the call. The TruRails class also has a __enter__ and a __exit__ method, which allow it to be used as a context manager. When used as a context manager, it automatically logs the calls to the __call__ method of the app and their results.
4. trulens_eval/trulens_eval/tru_rails.py:169:
  • Assessed confidence : 10%
  • Comment:
    The RailsInstrument class is a wrapper for a NEMO Guardrails app that adds logging functionality. It checks the selectors in the feedback functions and raises an error or prints a warning if a selector doesn't exist, depending on the settings. It also logs the calls to the __call__ method of the app and their results, and can be used as a context manager for automatic logging.
  • Reasoning:
    The RailsInstrument class is a wrapper for a NEMO Guardrails app that adds logging functionality. The __init__ method takes an app argument, which is the app to be wrapped, and an app_id argument, which is a unique identifier for the app. The __init__ method also has optional arguments for controlling the behavior of the wrapper, such as selectors_nocheck and selectors_check_warning, which control whether to check the selectors in the feedback functions and whether to raise an error if a selector doesn't exist. The __init__ method calls the check_selectors method to check the selectors in the feedback functions. The check_selectors method raises a ValueError if a selector doesn't exist and selectors_nocheck is False. If selectors_check_warning is True, it prints a warning message instead of raising an error. The RailsInstrument class also has a __call__ method that calls the __call__ method of the app and logs the call and its result. The __call__ method returns the result of the __call__ method of the app. The __call__ method also has a with_record method that calls the __call__ method of the app and returns both the result and the record of the call. The RailsInstrument class also has a __enter__ and a __exit__ method, which allow it to be used as a context manager. When used as a context manager, it automatically logs the calls to the __call__ method of the app and their results.
5. trulens_eval/trulens_eval/tru_llama.py:28:
  • Assessed confidence : 10%
  • Comment:
    The check_selectors method checks the selectors in the feedback functions against the app and the record. If a selector doesn't exist and selectors_nocheck is False, it raises a ValueError. If selectors_check_warning is True, it prints a warning message instead of raising an error. The error message includes a dump of the longest prefix of the selector that does exist.
  • Reasoning:
    The check_selectors method checks the selectors in the feedback functions against the app and the record. If a selector doesn't exist and selectors_nocheck is False, it raises a ValueError. If selectors_check_warning is True, it prints a warning message instead of raising an error. The error message includes a dump of the longest prefix of the selector that does exist. This method is used in the __init__ method of the TruLlama and TruRails classes to check the selectors in the feedback functions when the classes are instantiated.
6. trulens_eval/trulens_eval/utils/notebook_utils.py:51:
  • Assessed confidence : 90%
  • Grade: 0%
  • Comment:
    This writefileinterpolated function is not used anywhere in the codebase. Consider removing it if it's not needed. Also, it's generally a good practice to avoid defining functions conditionally as it can lead to unexpected behavior.
  • Reasoning:
    The function writefileinterpolated is not used anywhere in the codebase. It seems to be a utility function for writing to a file, but it's not clear why it's needed in this context. It's also not clear why it's conditionally defined based on the result of is_notebook(). This could potentially lead to unexpected behavior if the code is run in different environments.

Workflow ID: wflow_XXNjdE2UlsuaAvIM


@piotrm0 piotrm0 marked this pull request as draft March 8, 2024 08:31
@piotrm0 piotrm0 changed the title verify feedback selectors on recorder init [DRAFT] verify feedback selectors on recorder init Mar 8, 2024
@piotrm0 piotrm0 marked this pull request as ready for review March 12, 2024 02:40
@dosubot dosubot bot removed the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 12, 2024
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Mar 12, 2024
@piotrm0 piotrm0 changed the title [DRAFT] verify feedback selectors on recorder init verify feedback selectors on recorder init Mar 12, 2024
@piotrm0 piotrm0 merged commit 5927f66 into main Mar 12, 2024
8 checks passed
Labels
documentation Improvements or additions to documentation lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.
2 participants