added two new caches: FileCache, MemoryCache #2
Conversation
I love these two new Evaluators! I've added a few comments that I think are important to address.
When you have a chance, please take a look and let me know your thoughts on them.
```
self._num_cache_hits += 1
if lookup:
    return Evaluator.Match(prediction=prediction.output, expected=expected)
return None
```
I am not sure I get the logic here. If we miss, we want to move on to the next iteration of this loop, not return, right? Otherwise, we don't count the miss (line 44).
```python
for expected in prediction.test.expected:
    lookup = self.lookup(expected, prediction.output)
    # None indicates that nothing was found in the cache,
    # while True and False are both valid cache values
    if lookup is None:
        continue
    self._num_cache_hits += 1
    if lookup:
        return Evaluator.Match(prediction=prediction.output, expected=expected)
    return None
```
If lookup is None, then we don't have any entry for that (prediction, expected) tuple, so we continue.
If we have an entry, it might be a positive entry (they match), and then we can stop searching.
If we have an entry, it might also be a negative entry (they do not match), and we can also stop searching [this could be debated tbh].
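For illustration, a minimal sketch of a cache honoring that tri-state contract; the class shape and key encoding here are assumptions, not the PR's actual implementation:

```python
import json
from typing import Optional

class MemoryCache:
    """Hypothetical sketch: maps (expected, output) pairs to bools."""

    def __init__(self) -> None:
        self._store: dict = {}

    def store(self, expected, output, matched: bool) -> None:
        # Keys are JSON-encoded so non-string values work too.
        key = json.dumps([expected, output], sort_keys=True)
        self._store[key] = matched

    def lookup(self, expected, output) -> Optional[bool]:
        # None  -> no entry for this pair (cache miss)
        # True  -> positive entry: the pair matched before
        # False -> negative entry: the pair is known not to match
        key = json.dumps([expected, output], sort_keys=True)
        return self._store.get(key)
```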
Have a look now; your comment helped me find a corner case, which I also wrote a test case for!
I see, thank you for adding some documentation, it's much clearer now!
```python
from pydantic import BaseModel

Json = Union[str, bool, list, dict]
```
I don't think this is the right way of typing a JSON-serializable object/variable. I found a very interesting issue in the Python official repo: python/typing#182.
I really like this approach:
> All major type checkers now support recursive type aliases by default, so this should largely work:
```python
JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None
```
What do you think?
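As a quick sketch of how that recursive alias reads in practice (the sample value below is illustrative, not from the thread; assumes Python 3.10+ for the `|` syntax):

```python
from typing import TypeAlias

JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None

# Arbitrarily nested JSON-like data satisfies the alias.
payload: JSON = {"ok": True, "items": [1, "two", {"three": 3.0}], "meta": None}
```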
Interesting, I will check it out. I removed int and float on purpose since they made pydantic parse things in a strange way:

```
"2" -> 2
["2"] -> ["2"]
```

which was kind of unexpected and hard to work with.
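A minimal sketch of the coercion being described, assuming pydantic v1's left-to-right union validation; the model names here are made up for illustration:

```python
from typing import Union
from pydantic import BaseModel

JsonWithNumbers = Union[int, float, str, bool, list, dict]  # int/float included
Json = Union[str, bool, list, dict]                         # the PR's alias

class WithNumbers(BaseModel):
    value: JsonWithNumbers

class WithoutNumbers(BaseModel):
    value: Json

# With int in the union, the string "2" is coerced to the integer 2;
# without it, the string survives as-is.
print(WithNumbers(value="2").value)     # 2
print(WithoutNumbers(value="2").value)  # '2'
```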
Seems like Pydantic does not yet support recursive type aliases.
I think this looks really good, especially:
- better inline comments as dev-friendly documentation
- split of evaluators into individual modules
- better and larger test coverage
Approved! 👍
Adds two new caches, FileCache and MemoryCache.

FileCache is used by default from the CLI (same for `bench eval`).

Example cache:

There is most likely a better way to store the keys, but right now we encode them to JSON strings; this way strings, dicts, ints, and lists are all supported for `expected` and for `output`. Didn't want to shoehorn us into only supporting strings.

The CLI also indicates if the cache was used or not.