
The initial implementation of the code was incorrect, but RDAgent failed to detect it. #520


Open
liujianliuku opened this issue Jan 13, 2025 · 5 comments
Labels
question Further information is requested

Comments

@liujianliuku

During the first loop, the implementation of SMA10 is wrong. The incorrect line is: df['daily_pct_change'] = df['$close'].pct_change() , which sends all the following loops in the wrong direction. Is there a simple way to fix this issue through some config or prompts?

@liujianliuku liujianliuku added the question Further information is requested label Jan 13, 2025
@liujianliuku
Author

It's supposed to be df['daily_pct_change'] = df['$close'].groupby(level='instrument').pct_change()
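To illustrate the difference, here is a minimal sketch on made-up data. It assumes the index levels are ordered (datetime, instrument) and named as in the snippets above; the tickers and prices are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data with a two-level (datetime, instrument) index. The level names and
# the `$close` column follow this issue; tickers and prices are invented.
dates = pd.date_range("2025-01-01", periods=3, name="datetime")
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "BBB"]], names=["datetime", "instrument"]
)
df = pd.DataFrame({"$close": [10.0, 100.0, 11.0, 110.0, 12.1, 121.0]}, index=index)

# Incorrect: each row is compared with the previous row, which belongs to a
# different instrument, so the "daily change" mixes AAA and BBB prices.
naive = df["$close"].pct_change()

# Correct: the change is computed within each instrument's own price series.
grouped = df["$close"].groupby(level="instrument").pct_change()

print(pd.DataFrame({"naive": naive, "grouped": grouped}))
```

In this toy frame both instruments rise 10% per day, so the grouped version gives NaN on the first date and roughly 0.1 afterwards, while the naive version alternates between large positive and negative values.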

@TPLin22
Collaborator

TPLin22 commented Jan 14, 2025

The prompts are intended to be general, so they do not target specific factor code. However, if you do want to provide specific code hints, this can be achieved. You can try editing the prompts in rdagent/components/coder, although this is not recommended.

@liujianliu

Do you have a better idea for this issue?

@peteryang1
Collaborator

Hi @liujianliu @liujianliuku !
Thank you for bringing this important issue to our attention and for your interest in improving RD-Agent!

You are correct that there is a bug in the implementation of SMA10 in the first loop, where the code df['daily_pct_change'] = df['$close'].pct_change() does not handle the two-level index in the data correctly. The correct implementation should be df['daily_pct_change'] = df['$close'].groupby(level='instrument').pct_change() as you pointed out.

To address this issue, a good approach would be to add a new evaluation step to the evaluator. The problem arises because the LLM fails to understand the two-level index in the data. To catch this, we can randomly select half of the instruments and run the generated code on that subset to produce factor values. The evaluator then compares these values against the factor values computed from the whole dataset for the same instruments, and gives a pass signal only if every value matches.
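A minimal sketch of such a check, under stated assumptions: `passes_subset_check`, `naive_factor`, and `grouped_factor` are hypothetical names for illustration and are not part of RD-Agent's evaluator API, and the toy data is made up.

```python
import numpy as np
import pandas as pd


def passes_subset_check(factor_fn, df, frac=0.5, seed=0):
    """Hypothetical check: factor values computed on a random subset of
    instruments must equal the values computed on the full dataset."""
    level = df.index.get_level_values("instrument")
    instruments = level.unique().to_numpy()
    rng = np.random.default_rng(seed)
    n_pick = max(1, int(len(instruments) * frac))
    chosen = rng.choice(instruments, size=n_pick, replace=False)
    mask = level.isin(chosen)
    full = factor_fn(df)[mask]   # full run, restricted to the chosen instruments
    part = factor_fn(df[mask])   # run on the chosen instruments only
    return bool(np.allclose(full.to_numpy(), part.to_numpy(), equal_nan=True))


def naive_factor(df):
    # Ignores the instrument level (the bug reported in this issue).
    return df["$close"].pct_change()


def grouped_factor(df):
    # Computes the change within each instrument.
    return df["$close"].groupby(level="instrument").pct_change()


# Made-up data: 4 instruments, 3 dates, (datetime, instrument) index.
dates = pd.date_range("2025-01-01", periods=3, name="datetime")
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "BBB", "CCC", "DDD"]], names=["datetime", "instrument"]
)
base = np.array([10.0, 20.0, 40.0, 80.0])
close = np.concatenate([base * growth for growth in (1.0, 1.1, 1.21)])
df = pd.DataFrame({"$close": close}, index=index)

print(passes_subset_check(grouped_factor, df))  # per-instrument factor
print(passes_subset_check(naive_factor, df))    # factor that mixes instruments
```

One caveat with this design: a factor that is cross-sectional by intent (for example, a rank across all instruments on a date) would also fail the check, so it fits best for factors that should depend only on each instrument's own history.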

The same evaluator can also randomly select a subset of dates, which tests for data leakage as well. 😄
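One way the date-based check might look, as a sketch only: the factor is recomputed on data truncated at a cutoff date and compared with the full run up to that date. `passes_cutoff_check`, `leaky_factor`, and `clean_factor` are hypothetical names, and the cutoff is passed in explicitly here rather than drawn at random.

```python
import numpy as np
import pandas as pd


def passes_cutoff_check(factor_fn, df, cutoff):
    """Hypothetical check: values up to `cutoff` must not change when all
    later dates are removed from the input."""
    mask = df.index.get_level_values("datetime") <= cutoff
    full = factor_fn(df)[mask]   # full run, restricted to dates <= cutoff
    part = factor_fn(df[mask])   # run on truncated data only
    return bool(np.allclose(full.to_numpy(), part.to_numpy(), equal_nan=True))


def clean_factor(df):
    # Uses only current and past prices of each instrument.
    return df["$close"].groupby(level="instrument").pct_change()


def leaky_factor(df):
    # Uses the next day's price, i.e. information from the future.
    future = df["$close"].groupby(level="instrument").shift(-1)
    return future / df["$close"] - 1


# Made-up data: 2 instruments, 4 dates, (datetime, instrument) index.
dates = pd.date_range("2025-01-01", periods=4, name="datetime")
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "BBB"]], names=["datetime", "instrument"]
)
close = [10.0, 100.0, 11.0, 110.0, 12.0, 125.0, 13.0, 120.0]
df = pd.DataFrame({"$close": close}, index=index)

cutoff = dates[2]
print(passes_cutoff_check(clean_factor, df, cutoff))
print(passes_cutoff_check(leaky_factor, df, cutoff))
```

The leaky factor fails because its value on the cutoff date depends on the following day's price, which is missing from the truncated input.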

Implementing this new evaluator is not very complex. However, as our team is currently busy working on new features, we encourage you to participate in our open-source repository and draft a pull request (PR) to address this bug. We will do our best to help review and refine the code to ensure the issue is fixed.

Thank you once again for your valuable contribution!

@liujianliu

Thank you for providing the solution. I will try this method to submit a PR, but I can't guarantee the timeline.
