Dataframe effect size #52

Merged
merged 1 commit into from
Jul 15, 2023

Conversation

daikitag
Collaborator

Created a function that returns a dataframe of simulated effect sizes.

@codecov

codecov bot commented Jul 14, 2023

Codecov Report

Merging #52 (dd15d4b) into main (eb1ad76) will increase coverage by 1.49%.
The diff coverage is 100.00%.

❗ Current head dd15d4b differs from pull request most recent head 6f0d265. Consider uploading reports for the commit 6f0d265 to get more accurate results

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   93.25%   94.75%   +1.49%     
==========================================
  Files           3        4       +1     
  Lines         267      343      +76     
  Branches       53       67      +14     
==========================================
+ Hits          249      325      +76     
  Misses         18       18              
Impacted Files Coverage Δ
tstrait/__init__.py 100.00% <100.00%> (ø)
tstrait/simulate_effect_size.py 100.00% <100.00%> (ø)


@daikitag
Collaborator Author

@jeromekelleher
Thank you very much for your feedback on the previous pull request. I really appreciate it, as this is my first time doing this.
I have only included the files that simulate a dataframe of effect sizes, so that the changes are easier to review. I will open a pull request with the additional files after this one. Please let me know if there is anything else that you need. I really appreciate your time.

@daikitag daikitag mentioned this pull request Jul 14, 2023
Member

@jeromekelleher jeromekelleher left a comment

Looks great, a few comments.



def sim_traits(ts, num_causal, model, alpha=0, random_seed=None):
"""Rnadomly selects causal sites from the inputted tree sequence data, and simulates
Member

typo Rnadomly

import tstrait


class EffectSizeSimulator:
Member

Perhaps TraitSimulator would be clearer?

tstrait/simulate_effect_size.py
simulation. The tree sequence data must include a mutation.
:type ts: tskit.TreeSequence
:param num_causal: Number of causal sites that will be chosen randomly. It must be
a positive integer that is greater than the number of sites in the tree sequence
Member

less than?
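
The question is about which way the bound on num_causal runs. A minimal sketch of the check the corrected wording would describe, assuming num_causal may be at most the number of sites (whether equality is allowed is an assumption here). The helper name `validate_num_causal` is hypothetical and is not part of tstrait:

```python
import numbers


def validate_num_causal(num_causal, num_sites):
    """Hypothetical helper: check that num_causal is a usable number of causal sites."""
    # numbers.Integral also accepts NumPy integer scalars; bool is excluded explicitly
    if isinstance(num_causal, bool) or not isinstance(num_causal, numbers.Integral):
        raise TypeError("num_causal must be an integer")
    if num_causal <= 0:
        raise ValueError("num_causal must be a positive integer")
    # Assumption: we cannot pick more causal sites than the tree sequence has
    if num_causal > num_sites:
        raise ValueError("num_causal must not exceed the number of sites")
    return int(num_causal)
```

Under this reading the docstring would say "less than or equal to the number of sites" rather than "greater than".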

data=[causal_site_array, causal_state_array, beta_array]
).T
effect_size_df = effect_size_df.set_axis(
["SiteID", "CausalState", "EffectSize"], axis="columns"
Member

Looks good - any particular reason for CamelCasing the column names? I would tend to see them as variables, so would use site_id, causal_state etc
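
A short sketch of what the snake_case naming could look like. The values are made up for illustration, and building the frame from a dict (rather than transposing a list of arrays as in the diff) is a swapped-in approach, not what the PR does:

```python
import pandas as pd

# Made-up values for illustration only
site_id = [0, 4, 7]
causal_state = ["A", "T", "G"]
effect_size = [0.12, -0.53, 1.07]

# A dict of columns names each column directly and keeps one dtype per column
effect_size_df = pd.DataFrame(
    {
        "site_id": site_id,
        "causal_state": causal_state,
        "effect_size": effect_size,
    }
)

# snake_case names are valid identifiers, so attribute access also works
total_effect = effect_size_df.effect_size.sum()
```

With these names the columns read like the variables they hold, which is the point of the suggestion above.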

return effect_size_df


def sim_traits(ts, num_causal, model, alpha=0, random_seed=None):
Member

I guess an issue here is that we're simulating a single trait. Should we call this function sim_trait then?

alpha=alpha,
random_seed=random_seed,
)
assert sim_result.shape[0] == num_causal
Member

These tests aren't very strong, and there's a lot of duplicated code. You could delegate most of this to a method

def check_dimensions(self, df, num_causal):
    assert len(df) == num_causal  
    # etc
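
One way the delegation might be fleshed out, as a sketch only. `make_effect_size_df` is a hypothetical stand-in for the real simulation call, and the column names and the uniqueness check are assumptions rather than the PR's actual tests:

```python
import pandas as pd


def make_effect_size_df(num_causal):
    """Hypothetical stand-in for the simulation call; mimics the output shape only."""
    return pd.DataFrame(
        {
            "site_id": list(range(num_causal)),
            "causal_state": ["A"] * num_causal,
            "effect_size": [0.0] * num_causal,
        }
    )


class TestDimensions:
    def check_dimensions(self, df, num_causal):
        # One row per causal site, and exactly the expected columns
        assert len(df) == num_causal
        assert list(df.columns) == ["site_id", "causal_state", "effect_size"]
        # Assumption: each site is chosen at most once
        assert df["site_id"].is_unique

    def test_one_site(self):
        self.check_dimensions(make_effect_size_df(1), 1)

    def test_many_sites(self):
        self.check_dimensions(make_effect_size_df(5), 5)
```

Each test then reduces to one line that builds its input and hands it to the shared checker, so a new assertion only has to be written once.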

):
tstrait.sim_traits(ts=ts, num_causal=1, model=model, alpha=1, random_seed=1)

@pytest.mark.parametrize("num_ind", [1, 2, np.array([5])[0]])
Member

There's really no point in taking the cross product of these parameters - they don't interact and all you're testing is whether you raise a ValueError. It's just a waste of time and electricity!
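
To illustrate the point: stacking several `pytest.mark.parametrize` decorators on one test runs every combination, while one decorator per test varies a single argument at a time. A rough sketch, where `check_positive_int` is a hypothetical validator standing in for the library's input checks and the bad values are made up:

```python
import pytest


def check_positive_int(value, name):
    """Hypothetical validator standing in for the library's input checks."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer")


BAD_VALUES = [0, -1, 1.5, "1"]


# One parameter varies per test: len(BAD_VALUES) cases each, not their product
@pytest.mark.parametrize("num_causal", BAD_VALUES)
def test_bad_num_causal(num_causal):
    with pytest.raises(ValueError):
        check_positive_int(num_causal, "num_causal")


@pytest.mark.parametrize("num_ind", BAD_VALUES)
def test_bad_num_ind(num_ind):
    with pytest.raises(ValueError):
        check_positive_int(num_ind, "num_ind")
```

With four bad values per argument this gives 4 + 4 = 8 cases, where stacking both decorators on a single test would give 4 x 4 = 16 without exercising any additional code path.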

@jeromekelleher
Member

We will also need to add pandas to install_requires in setup.cfg.

@daikitag
Collaborator Author

@jeromekelleher
Thank you very much for your prompt feedback on my work. I have edited all the sections that you mentioned. Would it be possible for you to check them whenever you have some time? If they look good, I will squash all commits as before, so that a single commit reflects all the changes.

@daikitag daikitag mentioned this pull request Jul 15, 2023
Member

@jeromekelleher jeromekelleher left a comment

LGTM, let's squash and merge

Generate a pandas dataframe to output simulated effect sizes
@mergify mergify bot merged commit 1601552 into tskit-dev:main Jul 15, 2023
@daikitag daikitag deleted the dataframe-effect-size branch July 15, 2023 11:20
2 participants