Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OFV market resolver #225

Merged
merged 111 commits into from
Aug 7, 2024
Merged

Conversation

kongzii
Copy link
Contributor

@kongzii kongzii commented May 7, 2024

Proposed changes

Implements a new tool that can be used for market resolution.

Types of changes

What types of changes does your code introduce? (A breaking change is a fix or feature that would cause existing functionality and APIs to not work as expected.)
Put an x in the box that applies

  • Non-breaking fix (non-breaking change which fixes an issue)
  • Breaking fix (breaking change which fixes an issue)
  • Non-breaking feature (non-breaking change which adds functionality)
  • Breaking feature (breaking change which adds functionality)
  • Refactor (non-breaking change which changes implementation)
  • Messy (mixture of the above - requires explanation!)

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING doc
  • I am making a pull request against the main branch (left side). Also you should start your branch off our main.
  • Lint and unit tests pass locally with my changes
    • mypy found 54 errors, but in files that aren't created or modified by me
  • I have added tests that prove my fix is effective or that my feature works

Further comments

I used a customised fork of OpenFactVerifier (PR is currently open again their main branch) to create a new market resolver.

The steps are:

  1. Verify that the given market is reasonable to resolve using the is_predictable_binary function.
  2. Rewrite the market's question into a statement. For example, Will former Trump Organization CFO Allen Weisselberg be sentenced to jail by 15 April 2024? would be rewritten to Former Trump Organization CFO Allen Weisselberg was sentenced to jail by 15 April 2024..
  3. Use OpenFactVerifier to verify the given statement. It will be either True, False or None if there is some kind of problem.

I also implemented a benchmark where I compare (1) current resolution visible on Omen, (2) resolution obtained by packages.napthaai.customs.resolve_market_reasoning.resolve_market_reasoning by running it today and (3) resolution by OFV. The results are:

  1. Current accuracy: 73.47%
  2. Original's new run accuracy: 57.14
  3. OFV's accuracy: 91.84%

However, I don't understand why (2) is so low, I expected it to be at least a little better than (1), because new information is available on internet. If you can point out a bug in my implementation, I'd re-run the benchmark again.

The benchmark is run over 50 markets that I resolved manually and the full results are available here.

I will pick just the ones where OFV made mistakes:

  • Will Apple introduce spatial computing in Mac's screen by 14th April 2024?
    • This is questionable, should Vision Pro count as it? I guess nobody would really expect that Macbooks would get spatial computing by themselves by April 2024, but Vision Pro allows to interact with macs.
  • Will McDonald's successfully buy back all its Israeli restaurants by 12 April 2024?
    • I found articles that it will happen, but no article that it already did happen. I found an article from April 5 saying they signed the deal. Bu that shouldn't count as market was created on April 8.
  • Will Samsung release the Galaxy Z Fold 6 Ultra on or before 11 April 2024?
  • Will Disney Plus implement its password-sharing crackdown by 11 April 2024?
    • It's questionable if annoucment should make this YES, or the fact that it will be used from June should make it NO.

As we can see, 3 out of 4 mistakes are questionable.

pyproject.toml Outdated
py-multicodec = "==0.2.1"
grpcio = "==1.53.0"
python = ">=3.10,<3.12"
open-autonomy = { git = "https://github.com/kongzii/open-autonomy.git", rev = "13344d6551222224492024623cd10aa79ad1a13e" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for the dependency updates in related repositories, please?

And I know Evan was working on some dependency-related stuff as well, I'll get in touch with him about the current status.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the issue you're facing? Did you have to make adjustments to open-autonomy? If so, feel free to make a PR on the main open-autonomy repo, we can take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, PR for open-autonomy is open: valory-xyz/open-autonomy#2223
And for tomte too: valory-xyz/tomte#27

The issue we are facing is that with current strict versioning, it's almost impossible to use it with other libraries.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to merge this, it means that we need to keep using your fork until we reflect your changes on our framework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe you could merge and release changes in these two PRs so I can use them properly here?

@kongzii
Copy link
Contributor Author

kongzii commented May 8, 2024

@richardblythman Could you please explain why this pattern of doing def run(**kwargs) is necessary? I saw it in other tools as well but I didn't understand, because from the caller's perspective, having

def run(**kwargs): ...

or having

def run(
  prompt: str,
  api_keys: dict[str, str],
): ...

is called in the same way by doing run(prompt=..., api_keys=...). But in the second way it cleaner what arguments the function expects and what types.

@richardblythman
Copy link
Contributor

richardblythman commented May 8, 2024

def run(
  prompt: str,
  api_keys: dict[str, str],
): ...

@kongzii I think this would have worked, but it's not what you had originally.

@kongzii
Copy link
Contributor Author

kongzii commented May 8, 2024

I think this would have worked, but it's not what you had originally.

@richardblythman Yeah, I see the changes in arguments, that's clear to me that it needs to have the same interface as other tools to make it simpler for calling from agents/mech. I just didn't understand why the **kwargs.

@richardblythman
Copy link
Contributor

@kongzii I just copied and pasted. Not sure if the team at Valory prefer one over the other.

victorpolisetty and others added 5 commits May 17, 2024 09:54
# Conflicts:
#	packages/packages.json
#	packages/valory/agents/mech/aea-config.yaml
#	packages/valory/customs/prediction_request/component.yaml
#	packages/valory/services/mech/service.yaml
#	packages/valory/skills/task_execution/skill.yaml
@0xArdi 0xArdi mentioned this pull request Aug 7, 2024
10 tasks
@0xArdi 0xArdi changed the base branch from main to feat/ofv-2 August 7, 2024 15:00
@0xArdi 0xArdi changed the base branch from feat/ofv-2 to main August 7, 2024 15:02
@0xArdi 0xArdi changed the base branch from main to feat/ofv-2 August 7, 2024 15:03
@0xArdi 0xArdi merged commit 678a47f into valory-xyz:feat/ofv-2 Aug 7, 2024
5 of 7 checks passed
This was referenced Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants