-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataframe effect size #52
Conversation
Codecov Report
@@ Coverage Diff @@
## main #52 +/- ##
==========================================
+ Coverage 93.25% 94.75% +1.49%
==========================================
Files 3 4 +1
Lines 267 343 +76
Branches 53 67 +14
==========================================
+ Hits 249 325 +76
Misses 18 18
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@jeromekelleher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, a few comments.
tstrait/simulate_effect_size.py
Outdated
|
||
|
||
def sim_traits(ts, num_causal, model, alpha=0, random_seed=None): | ||
"""Rnadomly selects causal sites from the inputted tree sequence data, and simulates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo Rnadomly
tstrait/simulate_effect_size.py
Outdated
import tstrait | ||
|
||
|
||
class EffectSizeSimulator: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps TraitSimulator
would be clearer?
tstrait/simulate_effect_size.py
Outdated
simulation. The tree sequence data must include a mutation. | ||
:type ts: tskit.TreeSequence | ||
:param num_causal: Number of causal sites that will be chosen randomly. It must be | ||
a positive integer that is greater than the number of sites in the tree sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
less than?
tstrait/simulate_effect_size.py
Outdated
data=[causal_site_array, causal_state_array, beta_array] | ||
).T | ||
effect_size_df = effect_size_df.set_axis( | ||
["SiteID", "CausalState", "EffectSize"], axis="columns" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - any particular reason for CamelCasing the column names? I would tend to see them as variables, so would use site_id
, causal_state
etc
tstrait/simulate_effect_size.py
Outdated
return effect_size_df | ||
|
||
|
||
def sim_traits(ts, num_causal, model, alpha=0, random_seed=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess an issue here is that we're simulating a single trait. Should we call this function sim_trait
then?
tests/test_sim_trait.py
Outdated
alpha=alpha, | ||
random_seed=random_seed, | ||
) | ||
assert sim_result.shape[0] == num_causal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests aren't very strong, and there's a lot of duplicated code. You could delegate most of this to a method
def check_dimensions(self, df, num_causal):
assert len(df) == num_causal
# etc
tests/test_sim_trait.py
Outdated
): | ||
tstrait.sim_traits(ts=ts, num_causal=1, model=model, alpha=1, random_seed=1) | ||
|
||
@pytest.mark.parametrize("num_ind", [1, 2, np.array([5])[0]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's really no point in taking the cross product of these parameters - they don't interact and all you're testing is whether you raise a ValueError. It's just a waste of time and electricity!
will also need to add pandas to setup.cfg install requires |
@jeromekelleher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let's squash and merge
Generate a pandas dataframe to output simulated effect sizes
Created a function to obtain dataframe based on effect size simulation.