The initial implementation of the code was incorrect, but RDAgent failed to detect it. #520

liujianliuku · 2025-01-13T15:44:48Z

In the first loop, the implementation of SMA10 is wrong. The incorrect code is df['daily_pct_change'] = df['$close'].pct_change(), which sends all subsequent loops in the wrong direction. Is there a simple way to fix this through configuration or prompts?

liujianliuku · 2025-01-13T15:48:34Z

It's supposed to be df['daily_pct_change'] = df['$close'].groupby(level='instrument').pct_change()
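To illustrate the difference, here is a minimal sketch on a made-up frame. The two-level index names (`datetime`, `instrument`), the tickers, and the prices are assumptions chosen for the example; they are not taken from the actual RD-Agent data.

```python
import pandas as pd

# Hypothetical toy data: two dates x two instruments, indexed by (datetime, instrument).
dates = pd.to_datetime(["2024-01-02", "2024-01-03"])
index = pd.MultiIndex.from_product(
    [dates, ["AAA", "BBB"]], names=["datetime", "instrument"]
)
df = pd.DataFrame({"$close": [10.0, 100.0, 11.0, 110.0]}, index=index)

# Naive version: each row is compared with the previous row,
# which belongs to a different instrument.
naive = df["$close"].pct_change()

# Grouped version: each row is compared with the previous row
# of the same instrument.
grouped = df["$close"].groupby(level="instrument").pct_change()

print(pd.DataFrame({"naive": naive, "grouped": grouped}))
```

In the naive column, the second row is BBB's price divided by AAA's price on the same day, which is not a return at all. In the grouped column, both instruments show a 10% change on the second date.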

TPLin22 · 2025-01-14T10:27:33Z

The prompts are intentionally general and do not target specific factor code. If you do want to provide specific code hints, you can try editing the prompts in rdagent/components/coder, although this is not recommended.

liujianliu · 2025-01-14T15:07:41Z

Do you have a better idea for addressing this issue?

peteryang1 · 2025-01-16T07:45:33Z

Hi @liujianliu @liujianliuku !
Thank you for bringing this important issue to our attention and for your interest in improving RD-Agent!

You are correct that there is a bug in the implementation of SMA10 in the first loop, where the code df['daily_pct_change'] = df['$close'].pct_change() does not handle the two-level index in the data correctly. The correct implementation should be df['daily_pct_change'] = df['$close'].groupby(level='instrument').pct_change() as you pointed out.

To address this issue, a good approach would be to add a new evaluation step in the evaluator. The problem arises because the LLM fails to understand the two-level index in the data. To mitigate this, we can randomly select half of the instruments and feed them into the code to generate the factors. The evaluator will then test the factor values in the selected instruments against the original factor values calculated from the whole dataset. The evaluator will only give a pass signal if every value matches.

The same evaluator can also randomly select a subset of dates in the dataset, which additionally tests for data leakage. 😄
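A rough sketch of the instrument-subset check described above might look like the following. This is not RD-Agent's evaluator API: the function name `subset_consistency_check`, the `factor_fn` callable, the index level names, and the toy data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def subset_consistency_check(factor_fn, df, frac=0.5, seed=0):
    """Hypothetical check: factor values computed on a random subset of
    instruments must match the values computed on the whole dataset."""
    rng = np.random.default_rng(seed)
    level = df.index.get_level_values("instrument")
    instruments = level.unique().to_numpy()
    n_pick = max(1, int(len(instruments) * frac))
    picked = rng.choice(instruments, size=n_pick, replace=False)
    mask = level.isin(picked)

    full = factor_fn(df)[mask]        # computed on everything, then filtered
    part = factor_fn(df[mask])        # computed on the subset only
    part = part.reindex(full.index)   # align rows before comparing
    return bool(np.allclose(full.to_numpy(), part.to_numpy(), equal_nan=True))


# Made-up data: three dates x four instruments with distinct price levels.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
names = ["AAA", "BBB", "CCC", "DDD"]
index = pd.MultiIndex.from_product([dates, names], names=["datetime", "instrument"])
base = {"AAA": 10.0, "BBB": 50.0, "CCC": 200.0, "DDD": 1000.0}
close = [base[inst] * (1.1 ** i) for i, _ in enumerate(dates) for inst in names]
df = pd.DataFrame({"$close": close}, index=index)

# The naive factor mixes instruments; the grouped factor does not.
naive_ok = subset_consistency_check(lambda d: d["$close"].pct_change(), df)
grouped_ok = subset_consistency_check(
    lambda d: d["$close"].groupby(level="instrument").pct_change(), df
)
print("naive passes:", naive_ok)
print("grouped passes:", grouped_ok)
```

On this toy data the naive implementation fails the check and the grouped one passes, because only the grouped version is unaffected by which other instruments are present. A date-based variant for detecting leakage could follow the same pattern by masking on the `datetime` level instead, though the details would depend on the factor's lookback window.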

Implementing this new evaluator is not very complex. However, as our team is currently busy working on new features, we encourage you to participate in our open-source repository and draft a pull request (PR) to address this bug. We will do our best to help review and refine the code to ensure the issue is fixed.

Thank you once again for your valuable contribution!

liujianliu · 2025-01-16T15:22:10Z

Thank you for providing the solution. I will try this approach and submit a PR, but I can't guarantee a timeline.

liujianliuku added the question label Jan 13, 2025
