Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store PopulationSizeHistory as a json list #274

Closed
hyanwong opened this issue Jun 15, 2023 · 5 comments · Fixed by #283
Closed

Store PopulationSizeHistory as a json list #274

hyanwong opened this issue Jun 15, 2023 · 5 comments · Fixed by #283

Comments

@hyanwong
Copy link
Member

hyanwong commented Jun 15, 2023

It would be nice to store the passed-in PopulationSizeHistory object in the provenance of a tsdated tree sequence. I don't think it is a very large object, right? It just consists of 2 arrays, neither of which is likely to be millions of numbers long. Any reason why not to store it?

@hyanwong
Copy link
Member Author

Sorry - forgot to ping @nspope : mainly a question for him.

hyanwong added a commit to hyanwong/tsdate that referenced this issue Jun 15, 2023
hyanwong added a commit to hyanwong/tsdate that referenced this issue Jun 15, 2023
@nspope
Copy link
Contributor

nspope commented Jun 15, 2023

I think this is a good idea! Not too large, it should be two smallish 1D arrays of breakpoints and sizes respectively.

@hyanwong
Copy link
Member Author

hyanwong commented Jun 20, 2023

One problem is that we can't store a Python class in the JSON format required by the provenance spec.

One way around this, as long as we are content with piecewise population size changes (I'm not sure if this is valid for variational inference) would be to allow the population_size parameter to take either a number, a PopulationSizeHistory object, or a list of 2 lists, which are passed as the parameters to PopulationSizeHistory, e.g.

date(ts, population_size=[[1000, 2000, 3000], [500, 2500]])

which would be equivalent to

date(ts, population_size=tsinfer.PopulationSizeHistory(*[[1000, 2000, 3000], [500, 2500]]))

Then either invocation would result in a provenance like

{
  "parameters": {
    "command": "date",
    "population_size": [[1000.0, 2000.0, 3000.0], [500.0, 2500.0]],
  }
}

What do you think @nspope ? Is it too ugly allowing a list of lists as an additional population_size option? How would it fit with the variational inference. I assume that since we haven't actually released a new version, we're free to change the API etc as we wish.

@hyanwong
Copy link
Member Author

Or actually, it might be clearer as a dictionary that is unpacked into the PopulationSizeHistory constructor:

date(ts, population_size={"population_size": [1000, 2000, 3000], "time_breaks":[500, 2500]})

saved in provenance as

{
  "parameters": {
    "command": "date",
    "population_size": {"population_size":[1000.0, 2000.0, 3000.0], "time_breaks":[500.0, 2500.0]}
  }
}

@nspope
Copy link
Contributor

nspope commented Jun 21, 2023

I think the dict option is cleaner. Wrt how variable population size interfaces with the variational algorithm-- it should work perfectly fine with the current implementation. This might change depending on how the prior calibration is fixed (still a work in progress).

hyanwong added a commit to hyanwong/tsdate that referenced this issue Jun 21, 2023
hyanwong added a commit to hyanwong/tsdate that referenced this issue Jun 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants