-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generate() should default to providing numbers from the unallocated development / test range #24
Comments
It's a fair point, although it means that the default behaviour of |
I'd have thought it would be clear enough if the usage documentation clearly stated that by default numbers are generated in the synthetic range, unless explicitly specified. |
I disagree. There's a perfectly usable |
My proposal is about helping to protect users from generating random live NHS numbers for testing unless they explicitly need to. The parameter is only as good as the people who use it. If it is easier not to use it, users will generally not. It's perfectly pythonic to apply a default keyword parameter in absence of a user-provided one. In this case that would be a default of I have experienced several incidents over the years of people using live NHS numbers for testing, and for various reasons those then finding their ways into places they shouldn't and causing confusion / data issues. What is the use case for generating a random number from all regions that couldn't be met with an explicit |
Then why not just declare and export
In my experience, the supplied arguments that are defaulted tend to be numbers and Booleans, not more complex objects
Your problem there is people testing on live systems, not code that generates NHS Numbers.
Or simply create a function called One could invert your logic and ask "What is the use case for generating a random number from all regions that couldn't be met with an explicit Changing the current behaviour would also be a breaking change, in my opinion, so there's another reason for keeping it as it is and extending if required |
I don't really understand this point I'm afraid - but I infer this wasn't a serious suggestion.
I agree they tend to be static values - I think constants are fine personally - but I can't question your personal experience.
Yes - the problem is generally and very often how people use things, not the things themselves. Generally, this is not an argument for not helping users do sensible things. We cannot predict how people might use this library to generate numbers, in particular simply as a tool to generate test numbers in large quantities etc.
That's a good alternative suggestion. It does add complexity in a different way, but could be documented clearly and give people an easy way of generating numbers that they know can never be live.
Sorry I don't understand this.
I agree with the general principle about avoiding breaking changes. The package was only published today - I expected the usage to be very low on that basis. |
If you're concerned that users will generate NHS numbers that are potentially live numbers, then why not mandate that they just use a single pre-defined value to test the systems that you're responsible for. Why do they need more than one? Why do they need anything that might be a live number? It's a slightly tongue-in-cheek suggestion, yes, but I'm gently highlighting how far you could go with the "won't someone think of the poor defenceless naive coders" argument |
A breaking change is a breaking change |
Again - I'm sorry, but I'm not clear as to your point here. If there was ever a time to make breaking changes with less impact, it's before anyone is using the package - surely. This is a distraction from the original question.
There are surely imaginable scenarios for using multiple validly-formatted NHS numbers for testing purposes. That's not to say that many testing use cases couldn't be met with a single allocated number.
Generally, this is not a satisfactory argument for not addressing any potential harms. This is ultimately a risk mitigation argument. |
@andylaw I feel strongly that software in healthcare should help people do good and sensible things. I do not feel strongly about the implementation detail. I'll leave this with you now, as it would appear you do feel incredibly strongly both about the implementation details and about the idea of compromising the code for a "non-issue". |
It was a clumsy, and logically wrong as I have written it, attempt to point out that your request to generate NHS numbers in a particular block could equally be met by specifying a range. Which it could be. What we're differing on is way that we instinctively interpret the "natural" output of the function. I've been working on systems that span the entire UK. I haven't used the generator, but I have use cases for sets of NHS numbers that trigger different actions depending on the region that they come from. In my mind, specifying a region restricts the numbers generated. Thus, not specifying a region should produce numbers from the entire spectrum of possibilities |
Sorry I feel like I'm taking over your issues section. Hope you don't mind me contributing to this conversation as well. The current behaviour of this is that it creates an NHS number from the full range, meaning it could very well generate one from the synthetic range anyway, or it might not. My feeling is, regardless of generating live numbers, being explicit about the region you would like to test is preferable. In terms of the argument for_region, the default is currently None and there is no need to change it. Instead, in the generate function you just need to change ranges = [FULL_RANGE] to ranges = [RANGE_NOT_ISSUED_SYNTHETIC] If it was me, I would be perfectly okay knowing that by default any number generated wasn't going to get confused for a live number and would see this as a security improvement. I have also experience live test numbers getting into places they shouldn't be, and it's a pain. Making a user supply a specific region just means they are forced to make a conscious decision to dice with potentially live numbers. You aren't preventing that behaviour, but you are making them consider their actions. Seeing as you aren't changing the function call and only slightly changing the ambiguity of the output, I wouldn't think this qualifies as a breaking change. In terms of why you might want to create more than one NHS number for testing, there are plenty of good use cases. We have systems that prevent users inputting an existing NHS number, we model family connections, and for load testing. I'm sure there are a host of other reasons. My personal feelings aside, to implement the CHI number checks (issue 25) and have the valid argument of this function behave correctly, you have to lock the Scottish range down to only valid dates. Letting it randomly draw numbers from the Scottish range will mean it produces invalid CHI numbers. My suggestion would be to check the region, if it's Scotland build a random date, build the rest of the number, get the correct check sum, and return that. This means the existing behaviour for generating an invalid number would remain untouched. If this is agreeable, I can make a PR that deals with both issues. I don't think it's possible/easy to have one behaviour without the other. |
A PR that fixes both issues would be most welcome. I agree that number generation should by default come from the unallocated range to avoid accidentally using live numbers in code, tests or documentation, which could be seen as a breach of privacy for the individual whose number it was, even though it was inadvertent, unintentional, and didn't leak any PID. It's avoidable and so we should avoid it. Having fully functional CHI number validation and generation would be a major boost to the library so thank you for offering to add this. I have colleagues in SCIMP and other Scottish government healthcare IT service who would be glad we have solved this. |
Should generate() default to generating numbers that are from the dedicated test range? I.e. 900000000n - 999999999n
It would still create valid NHS numbers, but would remove the risk of using a live NHS number as a test number.
If we do feel there is a need for generating live numbers from specific regional ranges, then that should only happen by specifying a region parameter
The text was updated successfully, but these errors were encountered: