Telephone number format not de-identified #61

Closed
sandertan opened this issue Oct 18, 2022 · 1 comment

@sandertan
Contributor

The telephone number format 06 18 34 56 78 (five two-digit groups separated by spaces) is currently not de-identified by DEDUCE. I suspect this format is fairly common, so it might be worth adding support for it.

import deduce
text = u"De patient J. Jansen (e: j.jnsen@email.com, t: 06 18 34 56 78) is 64 jaar oud."
print(deduce.deidentify_annotations(deduce.annotate_text(text)))

De <PERSOON-1> (e: <URL-1>, t: 06 18 34 56 78) is <LEEFTIJD-1> jaar oud.

Interestingly, with 12 instead of 18 as the second group, the first three digit pairs (06 12 34) are recognized as a date:

import deduce
text = u"De patient J. Jansen (e: j.jnsen@email.com, t: 06 12 34 56 78) is 64 jaar oud."
print(deduce.deidentify_annotations(deduce.annotate_text(text)))

De <PERSOON-1> (e: <URL-1>, t: <DATUM-1> 56 78) is <LEEFTIJD-1> jaar oud.
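
To make the requested behaviour concrete, here is a minimal sketch of a pattern that would catch this spaced format. It is not DEDUCE's actual rule or API: the names `SPACED_MOBILE` and `mask_spaced_mobile` and the placeholder tag are hypothetical, and it assumes a mobile number written as `06` followed by four two-digit groups separated by single spaces.

```python
import re

# Hypothetical pattern (an assumption, not DEDUCE's own rule): "06" followed by
# four two-digit groups, each preceded by a single space, e.g. "06 18 34 56 78".
# The lookarounds prevent matching inside a longer run of digits.
SPACED_MOBILE = re.compile(r"(?<!\d)06(?: \d{2}){4}(?!\d)")


def mask_spaced_mobile(text: str, tag: str = "<TELEFOONNUMMER-1>") -> str:
    """Replace every spaced mobile number in `text` with `tag` (placeholder tag is assumed)."""
    return SPACED_MOBILE.sub(tag, text)


text = "De patient J. Jansen (e: j.jnsen@email.com, t: 06 18 34 56 78) is 64 jaar oud."
print(mask_spaced_mobile(text))
```

If a rule like this ran before date detection, the 06 12 34 56 78 variant would presumably be masked as a whole rather than being split into a date and a remainder, but that ordering is an assumption about how the two rules would interact.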

@vmenger
Owner

vmenger commented Aug 1, 2023

Fixed in #89

@vmenger vmenger closed this as completed Aug 1, 2023