Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changed random state definition as a temp fix... issue stemming from … #192

Merged
merged 1 commit into from
May 23, 2020

Conversation

ZainNasrullah
Copy link
Collaborator

As per our discussion, minor fix which resolves failure of running compare_all_models.py

The true issue seems to be in utility.py, where the sample_without_replacement function (called in generate_indices) produces inconsistent results with np.random.RandomState(42). Take a deeper look at this interaction to solve permanently.

…interaction between np.Random.RandomState and sample_without_replacement()
@coveralls
Copy link

coveralls commented May 18, 2020

Pull Request Test Coverage Report for Build 1495

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.02%) to 95.621%

Totals Coverage Status
Change from base Build 1476: -0.02%
Covered Lines: 4411
Relevant Lines: 4613

💛 - Coveralls

@Shihab-Shahriar
Copy link

Shihab-Shahriar commented May 19, 2020

If the problem is in sample_without_replacement function, can't it be simply removed from generate_indices? Something like below:

def generate_indices(random_state, bootstrap, n_population, n_samples):
    return random_state.choice(n_population,size=n_samples,replace=bootstrap)    

Couldn't test this, so sorry if I'm missing something.

@ZainNasrullah
Copy link
Collaborator Author

If the problem is in sample_without_replacement function, can't it be simply removed from generate_indices? Something like below:

def generate_indices(random_state, bootstrap, n_population, n_samples):
    return random_state.choice(n_population,size=n_samples,replace=bootstrap)    

Couldn't test this, so sorry if I'm missing something.

Hey Shihab, as you mentioned, this is a potential fix. However, it is a good idea for us to understand what the underlying problem is between sample_without_replacement and RandomState since it can have implications in other contexts.

Additionally, an issue with the proposed fix is that the API for these methods allows random_state to be an integer or a RandomState object; if the user passes in an integer, the proposed code will eventually hit an error. Although we can manually force the int to RandomState, I'll defer to @yzhao062 on design decisions which can have impacts beyond the scope of this model.

@Shihab-Shahriar
Copy link

If I understood correctly, current implementation forces random_state to be a RandomState object. This can be, I think, relaxed by adding random_state = check_random_state(random_state) line at the beginning.

I agree that investigating first why this error is thrown is a good idea.

@ZainNasrullah
Copy link
Collaborator Author

ZainNasrullah commented May 19, 2020

Taking a look at the implementation, both the fit and generate_bagging_indices methods call check_random_state(self.random_state) and return a RandomState object. For reasons I don't understand yet, this RandomState object seems to behave differently within sample_without_replacement when it is defined in compare_all_models.py versus when it is created by check_random_state.

You are correct though, your proposed fix would work but we would need to ensure that, in all uses of generate_indices, random_state is always input as a RandomState object or it gets converted into one.

@yzhao062 yzhao062 merged commit 5bcd52c into development May 23, 2020
@yzhao062 yzhao062 deleted the issue-180-temp-fix branch May 23, 2020 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants