Dataframe effect size #52

Merged

mergify merged 1 commit into tskit-dev:main from daikitag:dataframe-effect-size

Jul 15, 2023

Collaborator

daikitag commented Jul 14, 2023

Created a function that returns a dataframe of simulated effect sizes.

codecov bot commented Jul 14, 2023

Codecov Report

Merging #52 (dd15d4b) into main (eb1ad76) will increase coverage by 1.49%.
The diff coverage is 100.00%.

❗ Current head dd15d4b differs from pull request most recent head 6f0d265. Consider uploading reports for the commit 6f0d265 to get more accurate results

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
+ Coverage   93.25%   94.75%   +1.49%     
==========================================
  Files           3        4       +1     
  Lines         267      343      +76     
  Branches       53       67      +14     
==========================================
+ Hits          249      325      +76     
  Misses         18       18

Impacted Files	Coverage Δ
tstrait/__init__.py	`100.00% <100.00%> (ø)`
tstrait/simulate_effect_size.py	`100.00% <100.00%> (ø)`

Collaborator Author

daikitag commented Jul 14, 2023

@jeromekelleher
Thank you very much for your feedback on the previous pull request. I really appreciate it, as this is my first time doing these things.
I have included only the files that simulate a dataframe of effect sizes, so that the changes are easier to review. I will open a pull request with the additional files after this one. Please let me know if there is anything else you need. I really appreciate your time.

daikitag mentioned this pull request

Dataframe simulation #51

Closed

jeromekelleher reviewed

Member

jeromekelleher left a comment

Looks great, a few comments.

tstrait/simulate_effect_size.py Outdated



		def sim_traits(ts, num_causal, model, alpha=0, random_seed=None):
		"""Rnadomly selects causal sites from the inputted tree sequence data, and simulates

Member

jeromekelleher Jul 14, 2023

typo Rnadomly

tstrait/simulate_effect_size.py Outdated

		import tstrait


		class EffectSizeSimulator:

Member

jeromekelleher Jul 14, 2023

Perhaps TraitSimulator would be clearer?

tstrait/simulate_effect_size.py Outdated

+                      simulation. The tree sequence data must include a mutation.
+                  :type ts: tskit.TreeSequence
+                  :param num_causal: Number of causal sites that will be chosen randomly. It must be
+                      a positive integer that is greater than the number of sites in the tree sequence

Member

jeromekelleher Jul 14, 2023

less than?

tstrait/simulate_effect_size.py Outdated

+                          data=[causal_site_array, causal_state_array, beta_array]
+                      ).T
+                      effect_size_df = effect_size_df.set_axis(
+                          ["SiteID", "CausalState", "EffectSize"], axis="columns"

Member

jeromekelleher Jul 14, 2023

Looks good - any particular reason for CamelCasing the column names? I would tend to see them as variables, so would use site_id, causal_state etc
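
A minimal sketch of the snake_case naming suggested above, assuming the three arrays from the diff are one-dimensional and of equal length. The array values are made up for illustration, and building the frame from a dict is one possible approach, not necessarily what the PR ended up doing. To the best of my understanding, a dict of columns also lets each column keep its own dtype, whereas transposing a list of mixed arrays tends to produce object columns.

```python
import numpy as np
import pandas as pd

# Illustrative values only; the real arrays come from the simulation.
causal_site_array = np.array([3, 7, 12])
causal_state_array = np.array(["T", "G", "A"])
beta_array = np.array([0.12, -0.40, 0.05])

# Name the columns directly, using snake_case as for variables.
effect_size_df = pd.DataFrame(
    {
        "site_id": causal_site_array,
        "causal_state": causal_state_array,
        "effect_size": beta_array,
    }
)
```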

tstrait/simulate_effect_size.py Outdated

		return effect_size_df


		def sim_traits(ts, num_causal, model, alpha=0, random_seed=None):

Member

jeromekelleher Jul 14, 2023

I guess an issue here is that we're simulating a single trait. Should we call this function sim_trait then?

tests/test_sim_trait.py Outdated

+                          alpha=alpha,
+                          random_seed=random_seed,
+                      )
+                      assert sim_result.shape[0] == num_causal

Member

jeromekelleher Jul 14, 2023

These tests aren't very strong, and there's a lot of duplicated code. You could delegate most of this to a method

def check_dimensions(self, df, num_causal):
    assert len(df) == num_causal  
    # etc
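
One way the delegation idea above might look in practice. This is a sketch under assumptions: `fake_sim_trait` is a hypothetical stand-in for the real simulation function so the example runs on its own, and the column names follow the snake_case suggestion made earlier in this review.

```python
import numpy as np
import pandas as pd


def fake_sim_trait(num_causal, random_seed=None):
    """Hypothetical stand-in for the simulation; returns one row per causal site."""
    rng = np.random.default_rng(random_seed)
    return pd.DataFrame(
        {
            "site_id": np.arange(num_causal),
            "causal_state": ["T"] * num_causal,
            "effect_size": rng.normal(size=num_causal),
        }
    )


class TestDimensions:
    def check_dimensions(self, df, num_causal):
        # Shared checks live here so each test only sets up its inputs.
        assert len(df) == num_causal
        assert list(df.columns) == ["site_id", "causal_state", "effect_size"]
        assert df["site_id"].is_unique

    def test_one_site(self):
        self.check_dimensions(fake_sim_trait(1, random_seed=1), 1)

    def test_many_sites(self):
        self.check_dimensions(fake_sim_trait(10, random_seed=2), 10)
```

Each test then reduces to a single call, and strengthening the checks means editing one method.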

tests/test_sim_trait.py Outdated

+                      ):
+                          tstrait.sim_traits(ts=ts, num_causal=1, model=model, alpha=1, random_seed=1)
+                  @pytest.mark.parametrize("num_ind", [1, 2, np.array([5])[0]])

Member

jeromekelleher Jul 14, 2023

There's really no point in taking the cross product of these parameters - they don't interact and all you're testing is whether you raise a ValueError. It's just a waste of time and electricity!
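
To illustrate the point about the cross product: stacking `@pytest.mark.parametrize` decorators on one test runs every combination of the parameters, whereas one parametrized test per argument runs only the cases that matter. In the sketch below, `validate_inputs` is a hypothetical validator written for this example, not tstrait's actual input checking.

```python
import pytest


def validate_inputs(num_causal, alpha):
    """Hypothetical validator standing in for the library's input checks."""
    if isinstance(num_causal, bool) or not isinstance(num_causal, int) or num_causal <= 0:
        raise ValueError("num_causal must be a positive integer")
    if isinstance(alpha, bool) or not isinstance(alpha, (int, float)):
        raise ValueError("alpha must be a number")


# One parametrized test per argument: 3 + 2 = 5 cases,
# rather than the 3 * 2 = 6 that stacking both decorators would generate.
@pytest.mark.parametrize("num_causal", [0, -1, 1.5])
def test_bad_num_causal(num_causal):
    with pytest.raises(ValueError, match="num_causal"):
        validate_inputs(num_causal=num_causal, alpha=1)


@pytest.mark.parametrize("alpha", ["1", None])
def test_bad_alpha(alpha):
    with pytest.raises(ValueError, match="alpha"):
        validate_inputs(num_causal=1, alpha=alpha)
```

The saving is small here, but it grows quickly as more independent parameters are added.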

Member

jeromekelleher commented Jul 14, 2023

We will also need to add pandas to install_requires in setup.cfg.

Collaborator Author

daikitag commented Jul 14, 2023

@jeromekelleher
Thank you very much for your prompt feedback on my work. I have edited all the sections you mentioned. Would it be possible for you to check them whenever you have some time? If they look good, I will squash all commits as before into a single commit that reflects all the changes.

daikitag mentioned this pull request

causal sites #53

Closed

jeromekelleher approved these changes

Member

jeromekelleher left a comment

LGTM, let's squash and merge


dataftame-effect-size

6f0d265

Generate a pandas dataframe to output simulated effect sizes

jeromekelleher added the AUTOMERGE-REQUESTED label

mergify bot merged commit 1601552 into tskit-dev:main

mergify bot removed the AUTOMERGE-REQUESTED label

jeromekelleher mentioned this pull request

Simulating multiple traits #54

Closed

daikitag deleted the dataframe-effect-size branch

July 15, 2023 11:20
