Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CHI number validation #25

Open
mattstibbs opened this issue Jun 13, 2023 · 14 comments
Open

CHI number validation #25

mattstibbs opened this issue Jun 13, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@mattstibbs
Copy link

mattstibbs commented Jun 13, 2023

CHI numbers conform to the standard NHS validation format, but have an additional requirement that the first 6 digits must be a valid date of birth (DDMMYY).

The validate functions should identify that a provided number falls into the Scotland range, and then apply the additional validation checks.

https://en.wikipedia.org/wiki/National_Health_Service_Central_Register#Community_Health_Index

Patients are identified using a ten-digit number known as the CHI Number. This number is normally formed using the patient's date of birth (as DDMMYY), followed by four digits: two digits randomly generated, the third digit identifying gender (odd for men, even for women) and a check digit.

This new validation should then be applied to the generate() function when a user passes in for_region=nhs_number.REGION_SCOTLAND so that only a valid CHI is generated.

@mattstibbs mattstibbs changed the title Implement CHI number validation CHI number validation Jun 13, 2023
@pacharanero
Copy link
Contributor

Thanks @mattstibbs we would definitely like to add this additional validation, we are open to a PR 😄

@pacharanero pacharanero added the enhancement New feature or request label Jun 13, 2023
@mattstibbs
Copy link
Author

The validate functions should identify that a provided number falls into the Scotland range, and then apply the additional validation checks.

Question: is this the right behaviour?

i.e. is_valid called on a number that falls within the CHI range will automatically apply the additional CHI validation in order to decide whether the number is valid.

Alternatively, should CHI validation be an additional validation applied somehow, leaving is_valid to validate only that the number falls into the correct format with a valid checksum?

@pacharanero
Copy link
Contributor

pacharanero commented Jun 14, 2023

I think perhaps if/when we get around to adding strict CHI number validation, this could be something like

>>> import nhs_number

>>> nhs_number.is_valid("1406230002",
                     for_region=nhs-numberREGION_SCOTLAND,
                     sex="Male"
                     strict_chi=True)
True

so that the strict_chi=True flag adds:
strict CHI validity checking with DOB as plausible DDMMYY date and that the sex accords with the odd/even sex digit.

for_region continues to purely check the region/range of numbers.

We would just document clearly that this is what you would need to do in order to check a CHI number properly. For additional syntactic sugar we could create a convenience function is_valid_chi_strict(chi_number: str) function which wrapped the above is_valid syntax and added the CHI related parameters.

This os just one of many ways to implement, and I am happy for the implementer to add their own flavour.

@andylaw
Copy link
Collaborator

andylaw commented Jun 14, 2023

It would be very easy to overthink this around the phrase "plausible" date of birth

For example - 2904240012.

Not valid if the date of birth is 1924, but valid if it is 2024, but that's in the future so...??

Lots of glorious rabbit-holes to disappear down :)

@mattstibbs
Copy link
Author

mattstibbs commented Jun 14, 2023

@pacharanero Just so I'm clear, would you suggest that we keep the default behaviour of is_valid to ignore the CHI-specific DOB validation, even for numbers that fall within the CHI range?

And then add the option to apply strict CHI validation to is_valid and add a new method is_valid_chi_number as a wrap around is_valid which applies any additional CHI validation logic we want to specify

@pacharanero
Copy link
Contributor

Yes, that's what I'm suggesting, however I'm open to views, especially from anyone actively validating CHI numbers.

I am given to understand that Python prefers 'explicit' over 'implicit' so having explicit flags for strict CHI checking seems more Pythonic? They don't like 'magic' do they, Python people 😉. Rubyists like me, we love magic.

I have some colleagues in GP in Scotland and I will ask them to have a look and give their views. In general I'm keener to see working code than a lot of worrying about how to implement. It'll be fine as long as it works and is documented.

@andylaw
Copy link
Collaborator

andylaw commented Jun 14, 2023

They don't like 'magic' do they, Python people 😉.

Hate it. Mind you, I used to be (still am, under the hood) a Perl person ("There's more than one way to do it")

@pacharanero
Copy link
Contributor

(still am, under the hood) a Perl person ("There's more than one way to do it")

You're probably more of a Rubyist than you think then! Ruby gives you all the ways to do it, and all the footguns - you as the user need to know where to point the gun :-)

@andylaw
Copy link
Collaborator

andylaw commented Jun 14, 2023

You're probably more of a Rubyist than you think then!

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-221

@mattstibbs
Copy link
Author

I think the key is probably agreeing on what we mean by a 'valid' number. i.e. is a valid number just conformant with the general format and checksum, or is it only valid if it could feasibly be issued etc.

There is probably an alternative argument to say that is_valid should reflect the rules we've documented in the library and the rules that people will encounter in real life, and that someone could opt to loosen the validation i.e. with an ignore_chi_validation flag.

i.e. is this simple a format validation library, or is it a helper library for the way things are in reality

@VJ911
Copy link

VJ911 commented Jun 18, 2023

Practically, you know whether it is a CHI number or H&C number based upon the range, so you shouldn't need additional arguments.

010 100 0000 to 311 299 9999 (used for CHI numbers in Scotland)
320 000 0001 to 399 999 9999 (allocated to the Northern Ireland H&C)
400 000 0000 onwards for England & Wales

Logic as follows: if cast to int of left 6 digits < 320,000 then check that if datetime.strptime throws an error.

@andyatterton
Copy link

Hey guys. Been looking into your package as a replacement for our own in house solution, and I was curious if this conversation was resolved.

If you don't mind me making a suggestion, I think I would take a binary approach and the number is either valid or it is not. If the date section isn't possible it's not valid and as such I would add the check to the is_valid function using a check for region, not give CHI numbers their own is valid wrapper.

We are currently taking the approach mentioned by @VJ911 where we check for an error from strptime which I believe takes care of any leap year oddness. I would be of the opinion that to be a valid CHI number it must be in the range of CHI numbers and the first 6 digits must make up a valid date.

Something along the lines of

if for_region == 'RANGE_SCOTLAND':
            try:
            # Should start with ddmmyy date
                datetime.strptime(nhs_number[:6], "%d%m%y")
            except ValueError:
                return False

You could have is_valid() return a NhsNumber object (details.py) that had some extra attributes to cover the information supplied with CHI numbers. Valid is a current attribute, but you could include an in_range attribute so that CHI number in range but with impossible dates would be
{valid: False, in_range: True}

Could also return the sex rather than asking a user to specify it when calling is_valid so the user can run their own checks against that. On top of that, you could include a faliure_message to provide some feed back about the impossible date being the issue and any other issues that you might encounter during validation.

{valid: False, in_range: True, sex: 'male', faliure_message: 'In CHI number range but date section impossible' }

The major draw back would be that this would all constitute breaking changes if applied to is_valid() so it might need a detailed_is_valid() function or something 🤷‍♂️

@pacharanero
Copy link
Contributor

Thanks @andyatterton for this, very valid points made and I think we should finish off the CHI implementation so it works for real-life use-cases such as yours. I don't think we have a huge user base of the library at this stage, so a breaking change that makes is_valid more accurate for real-world usage of CHI numbers is unlikely to upset anyone.

If anyone is up for making the update and creating a PR, I think we would be amenable to accepting that. If not, I will get around to this when I get around to it.

@andyatterton
Copy link

I opened a PR but aimed it at main instead of dev and didn't get round to mentioning it here. I haven't had chance to check back until now so apologise for that. I think you can remove #37 and hopefully #38 will answer this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants