iedorep: Prototyping detection of commands/lines generating irreproducibility #259

Closed
4 of 9 tasks
luisesanmartin opened this issue Nov 1, 2021 · 0 comments · Fixed by #261
Labels: new command; resolved but not yet published (issue is fixed, but not yet published on SSC)

luisesanmartin commented Nov 1, 2021

Objective

To detect the first line at which a dofile stops producing the same partial result (intermediate dataset) and the same random number state when it is run twice.

Inputs and outputs

  • Input: a Stata dofile
  • Output: a message in the console that tells you in which line the two runs first diverge

Outline of the idea

  • Take a dofile as input, run it, and save the data signature and random number state after every line of code. Then run it again and check, line by line, that the data signature and random number state match those of the first run. If they do not, the result stops being reproducible at the first line where they diverge.
  • The intended use of this command is to be run after iesave detects that there are changes in the final result of the dofile with respect to the previous result.
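The comparison step can be illustrated with a minimal, language-neutral sketch in Python. It is not the command's implementation: the `State` record and the `first_divergence` function are hypothetical names, and the sketch assumes both runs record the same lines in the same order.

```python
from typing import NamedTuple, Optional


class State(NamedTuple):
    line: int       # line number in the original dofile
    data_sig: str   # data signature recorded after the line ran
    rng_state: str  # random number state recorded after the line ran


def first_divergence(run1: list[State], run2: list[State]) -> Optional[int]:
    """Return the first line whose recorded state differs between runs, else None."""
    for first, second in zip(run1, run2):
        if first != second:
            return first.line
    return None


# Hypothetical traces: the runs agree on line 1 and diverge on line 2.
run1 = [State(1, "sigA", "rng0"), State(2, "sigB", "rng1"), State(3, "sigC", "rng2")]
run2 = [State(1, "sigA", "rng0"), State(2, "sigB", "rng9"), State(3, "sigX", "rng7")]
divergent_line = first_divergence(run1, run2)
```

Only the first mismatch is reported because every later difference may simply be a consequence of it.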

Outline of possible tasks

  1. Input dofile modification:
  • Add an empty line after every line of code in the dofile
  • In each new line, record the dataset's data signature and the random number state
  • Save the line number, data signature, and random number state in a temporary data table
  2. Data signature and random number state comparison:
  • In the second run of the dofile, compare the data signature and random number state with the corresponding values saved in the first run
  • Stop if any of the values differ and show a message saying in which line the discrepancy was found
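The first task, modifying the input dofile, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the actual implementation: `record_state` is a hypothetical Stata program name (not an existing command), and the sketch only handles blank lines, `*` and `//` comment lines, and `///` line continuations. It does not handle `#delimit ;`, block comments, or `if`/`else` chains, where an inserted line between `}` and `else` would break the code.

```python
def instrument(dofile_lines: list[str], recorder: str = "record_state") -> list[str]:
    """Insert a recorder call after every complete line of code.

    `recorder` is a hypothetical Stata program that would save the line
    number, data signature, and random number state.
    """
    out = []
    for number, text in enumerate(dofile_lines, start=1):
        out.append(text)
        stripped = text.strip()
        is_blank = stripped == ""
        is_comment = stripped.startswith("*") or stripped.startswith("//")
        continues = stripped.endswith("///")  # command continues on the next line
        if not (is_blank or is_comment or continues):
            # Pass the ORIGINAL line number so messages point at the user's file.
            out.append(f"{recorder} {number}")
    return out


lines = [
    "sysuse auto, clear",
    "",
    "* draw a random number",
    "gen u = ///",
    "    runiform()",
    "sort u",
]
instrumented = instrument(lines)
```

Passing the original line number to the recorder keeps the final message meaningful even though the instrumented file is longer than the one the user wrote.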

Decision tree

We track changes on:

  • Data signature (data changed)
  • RNG (random number generator state)
  • Sort RNG (state of the random number generator used when sorting the data)

Test that:

  • If RNG advanced: flag, but optional
  • If RNG advanced to a different place than in the first run: error!
  • If Sort RNG advanced: flag, but optional
  • If the data signature is changing: error!
    • Note: the data signature does not track changes in sort order
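One possible reading of this decision tree, sketched in Python for a single line of the dofile. The names `Snapshot` and `check_line` are hypothetical. The sketch assumes that "advanced" means the state changed relative to the previous line within the first run, and that "the data signature is changing" means it differs between the two runs. Both are interpretations, not confirmed behaviour.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    data_sig: str  # data signature
    rng: str       # random number generator state
    sort_rng: str  # sort RNG state


def check_line(prev1: Snapshot, cur1: Snapshot, cur2: Snapshot) -> dict[str, list[str]]:
    """Classify one line: prev1/cur1 are run 1 before/after the line, cur2 is run 2 after it."""
    flags, errors = [], []
    # Optional flags: the line consumed random numbers in the first run.
    if cur1.rng != prev1.rng:
        flags.append("RNG advanced")
    if cur1.sort_rng != prev1.sort_rng:
        flags.append("Sort RNG advanced")
    # Errors: the second run ended up somewhere different from the first.
    if cur2.rng != cur1.rng:
        errors.append("RNG state differs from first run")
    if cur2.data_sig != cur1.data_sig:
        errors.append("data signature differs from first run")
    return {"flags": flags, "errors": errors}


# Hypothetical values: the line used the RNG, and run 2 landed on a different state.
prev1 = Snapshot("sigA", "rng0", "srt0")
cur1 = Snapshot("sigB", "rng1", "srt0")
cur2 = Snapshot("sigB", "rng5", "srt0")
result = check_line(prev1, cur1, cur2)
```

Keeping flags and errors in separate lists mirrors the split above between optional flags and hard errors, so a caller could choose to print the flags only on request.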
@bbdaniels bbdaniels linked a pull request Nov 9, 2021 that will close this issue
21 tasks
@luisesanmartin luisesanmartin changed the title Prototyping detection of commands/lines generating irreproducibility [iedorep] Prototyping detection of commands/lines generating irreproducibility Jan 28, 2022
@bbdaniels bbdaniels added the resolved but not yet published Issue is fixed, but not yet published on SSC label Jun 7, 2022
@bbdaniels bbdaniels changed the title [iedorep] Prototyping detection of commands/lines generating irreproducibility iedorep: Prototyping detection of commands/lines generating irreproducibility Dec 13, 2022
3 participants