Conversation
…included as a dir as well.
This reverts commit 3b83ce3.
soulios
left a comment
So, I checked the files, and there doesn't seem to be anything wrong with them. Can someone review this? Do I have to do anything else besides checking the files?
There still seem to be conflicts. My strategy would be the following:
If needed I can help next week. Should be around Monday and Wednesday. Have a nice git-free weekend :)
Can you assign it to me and put someone else as a reviewer, so I can resolve some of the conflicts myself?
@soulios cool that there are no more conflicts! I started the workflows. Linter and tests revealed some (hopefully) minor problems :) Could you take care of these? If I find some time I may add some comments earlier. ping @halirutan maybe you can also add some first comments.
The flake8 and pytest checks have failed.
bernt-matthias
left a comment
Can you add some test calls for the new functionality to the smoke tests defined here:
deepFPlearn/.github/workflows/pr.yml
Line 66 in 0f85b06
From today's perspective, I would implement it as shell scripts in tests/ and, in the pr.yml file, just loop over all those scripts and execute them. That might make it simpler to create new tests.
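The proposed layout can be sketched as follows. This is only an illustration of the loop the workflow step would run; the script names and the tests/ contents are hypothetical stand-ins, not the actual dfpl smoke tests.

```shell
#!/usr/bin/env bash
# Sketch: one shell script per smoke test under tests/, and the CI step
# just loops over them, failing fast on the first error (set -e).
set -euo pipefail

# For demonstration only: create a throwaway tests/ directory with two
# dummy test scripts (names are hypothetical).
dir=$(mktemp -d)
mkdir -p "$dir/tests"
printf '#!/usr/bin/env bash\necho "train smoke test ok"\n' > "$dir/tests/01_train.sh"
printf '#!/usr/bin/env bash\necho "predict smoke test ok"\n' > "$dir/tests/02_predict.sh"
chmod +x "$dir"/tests/*.sh

# This is the loop that would live in the workflow step: run each script.
for t in "$dir"/tests/*.sh; do
    echo "Running $t"
    "$t"
done
```

Adding a new smoke test then only requires dropping a new executable script into tests/, with no workflow changes.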
dfpl/options.py
Outdated
  metavar="DIR",
- help="The directory where the full model of the encoder will be saved (if trainAE=True) or "
-      "loaded from (if trainAE=False). Provide a full path here.",
+ help="The directory where the full model of the encoder will be saved...",
Why shorten the help?
dfpl/options.py
Outdated
- metavar="BOOL",
- type=bool,
  help="Track training performance via Weights & Biases, see https://wandb.ai.",
+ metavar="STRING",
Is it intended that the type changed? This parameter and several others.
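The type change may stem from a real argparse pitfall: `type=bool` does not parse boolean strings, because `bool("False")` is `True` (any non-empty string is truthy). A minimal sketch of the pitfall and one common workaround; the option name `--trackMetrics` is a hypothetical stand-in, not the actual dfpl flag:

```python
import argparse

# Pitfall: type=bool applies bool() to the raw string, and bool("False") is True.
parser = argparse.ArgumentParser()
parser.add_argument("--trackMetrics", metavar="BOOL", type=bool, default=False)
args = parser.parse_args(["--trackMetrics", "False"])
print(args.trackMetrics)  # True -- not what the user intended

# One common fix: an explicit string-to-bool converter.
def str2bool(v: str) -> bool:
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {v!r}")

parser2 = argparse.ArgumentParser()
parser2.add_argument("--trackMetrics", metavar="BOOL", type=str2bool, default=False)
print(parser2.parse_args(["--trackMetrics", "False"]).trackMetrics)  # False
```

If that is the motivation behind the change, documenting it (or using a converter like the one above instead of a plain string type) would make the intent clearer.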
Not sure if I understand why this has been merged yet.
The solution™ is to create a proper pull request of your changes upstream! I also have to note that this is a blocker for conda and Galaxy deployment .. besides being incredibly hard to maintain .. I would say impossible (because someone needs to sync this with upstream changes of chemprop).
But if the user installs my version of chemprop, all the code runs at this point. Why should it always be the latest version of chemprop? Mine has a few extra functionalities that the original project does not, e.g. tracking the training metrics. I asked them before whether they would like it, and they said no (in a private chat). So I don't think I can open a PR to them. Or shall I, and we can decide from there?
True
I'm just saying that it's quite an effort to get updates (like bug fixes and improvements) .. my pessimistic prediction would be that we will be stuck with this version for a long time. Also, think about the wording in a publication. We can't just write that we used chemprop version x.y; we'd have to write that we used a modified version of chemprop and describe in detail what has been changed.
Sure. But it needs to be well prepared. At the moment a few changes seem wrong (e.g. changes in the imports). It's also important to submit PR(s) that are as small/atomic as possible. It's probably a good idea to make all new functionality optional. We also should think twice if at least some of the changes may be implemented externally (or if we can suggest a change of chemprop such that it can be implemented externally).
- ) -> (np.ndarray, np.ndarray):
+     df: pd.DataFrame, target: str, opts: options.Options, return_dataframe: bool = False
+ ) -> Union[Tuple[np.ndarray, np.ndarray], pd.DataFrame]:
  # check the value counts and abort if too imbalanced
This makes it a bit hard to use because one always needs to check if a DataFrame or a tuple has been returned.
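One way to keep the flag but remove the caller-side check is `typing.overload` with `Literal`: mypy then narrows the return type from the value of `return_dataframe` at each call site. A sketch under assumptions; `balanced_split` and the plain `dict`/`list` types are stand-ins for the real dfpl function and its DataFrame/ndarray types:

```python
from typing import Literal, Tuple, Union, overload

# Overload stubs: the declared return type depends on the literal value
# of return_dataframe, so callers never see the Union.
@overload
def balanced_split(data: dict, return_dataframe: Literal[True]) -> dict: ...
@overload
def balanced_split(data: dict, return_dataframe: Literal[False] = ...) -> Tuple[list, list]: ...

def balanced_split(data: dict, return_dataframe: bool = False) -> Union[Tuple[list, list], dict]:
    xs, ys = list(data.keys()), list(data.values())
    if return_dataframe:
        return data  # via the first overload, typed as dict for the caller
    return xs, ys

# mypy infers the exact return type at each call site:
xs, ys = balanced_split({"a": 1})       # Tuple[list, list]
frame = balanced_split({"a": 1}, True)  # dict
```

At runtime the overload stubs are inert; only the final definition executes, so this changes nothing except the static types.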
Also, the function is not used correctly in vae.py: the call assumes a triple (and there is no seed parameter):
train_data, val_data, test_data = weight_split(
    df, sizes=(1 - opts.testSize, 0.0, opts.testSize), bias="small", seed=42
)
I'm wondering if this works at all.
Unfortunately, you completely ignored my request for a test.
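A test of the kind requested could be quite small. A sketch in pytest style; `split_xy` is a stand-in for the real dfpl split helper, since only the shape of the test matters here:

```python
# Minimal pytest-style tests for a split helper that returns either a
# tuple or a frame-like object depending on return_dataframe.
def split_xy(pairs, return_dataframe=False):
    # Stand-in: the real code would split a DataFrame into features/targets.
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    if return_dataframe:
        return pairs
    return xs, ys

def test_split_returns_tuple_by_default():
    xs, ys = split_xy([(1, "a"), (2, "b")])
    assert xs == [1, 2]
    assert ys == ["a", "b"]

def test_split_returns_frame_when_requested():
    out = split_xy([(1, "a")], return_dataframe=True)
    assert out == [(1, "a")]
```

Placed under tests/, pytest would pick these up automatically, and misuse like the triple-unpacking above would fail immediately.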
In order to find such problems easily you can use mypy dfpl (note that some errors predate your PR :) )
This makes it a bit hard to use because one always needs to check if a DataFrame or a tuple has been returned.
Well, the return_dataframe argument solves this ambiguity, no?
Also the function is not used correctly in vae.py: the use assumes a triple (and there is no seed parameter)
train_data, val_data, test_data = weight_split(
    df, sizes=(1 - opts.testSize, 0.0, opts.testSize), bias="small", seed=42
)
wondering if this works at all.
Unfortunately, you completely ignored my request for a test.
I didn't know about mypy. On it!
Well, the return_dataframe argument solves this ambiguity, no?
Indeed
Regarding weight_split .. somehow I mixed up the functions .. but the comment regarding mypy is still valid. But I think you only need to fix errors concerning your contributions.
Just to clarify: For some reason (a temporary problem at GitHub ...), the automated testing did not run for the last commits prior to the merge. Thus the PR may have appeared successful, but it wasn't (at least tests are failing locally).