
Deep scan and shallow scan results are different. #67

Closed
vrajat opened this issue Feb 14, 2020 · 2 comments

Comments

@vrajat
Member

vrajat commented Feb 14, 2020

Will this work? Deep scan and shallow scan results are different.
deepscan.txt
shallowscan.txt

Originally posted by @jayeshagwan1 in #63 (comment)

@vrajat
Member Author

vrajat commented Feb 14, 2020

It is not surprising that deep scan and shallow scan show different results. Shallow scan only looks at column names, while deep scan looks at a sample of the data. I've even noticed that two different runs of deep scan show different results because the sampled rows are different. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy, and there is no right answer.
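To make the difference concrete, here is a minimal sketch of the two approaches. It is not PIICatcher's actual code: the patterns and the `shallow_scan` / `deep_scan` function names are assumptions made up for illustration. The point is only that one scan inspects the column name while the other inspects a random sample of values, so the two can disagree, and two deep scans with different samples can disagree too.

```python
import random
import re

# Hypothetical patterns for illustration; not PIICatcher's real rules.
NAME_PATTERNS = {
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "email": re.compile(r"e?mail", re.IGNORECASE),
}
VALUE_PATTERNS = {
    "phone": re.compile(r"^\+?\d[\d\- ]{7,}\d$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def shallow_scan(column_name):
    """Guess PII types from the column name only."""
    return {t for t, p in NAME_PATTERNS.items() if p.search(column_name)}


def deep_scan(values, sample_size, rng):
    """Guess PII types from a random sample of the column's values."""
    sample = rng.sample(values, min(sample_size, len(values)))
    return {
        t
        for t, p in VALUE_PATTERNS.items()
        for v in sample
        if v and p.match(v)
    }


# A column whose name gives nothing away and whose data is mostly empty.
contact_values = [None] * 9 + ["jane@example.com"]

print(shallow_scan("contact"))                           # name has no hint
print(deep_scan(contact_values, 10, random.Random(0)))   # full sample finds the email
print(deep_scan(contact_values, 3, random.Random()))     # small sample may miss it
```

With a sample of 3 out of 10 rows, the last call finds the email only when the one populated row happens to be drawn, which is the run-to-run variation described above.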

With respect to the output in particular, my observations are:

  1. Shallow scan should recognize phone, credit card, person and location from column names
  2. Deep scan did not recognize PII in a few columns. I need to look at the data to figure out if that's a bug or if the column did not have any relevant data.
  3. Deep scan should also scan column names for candidates
  4. Along with an array, PIICatcher should add confidence numbers.
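Points 3 and 4 could look roughly like the sketch below. This is only an assumption about one possible shape, not how PIICatcher implements or will implement it: `scan_column`, the patterns, and the output keys are hypothetical, and "confidence" here is simply the fraction of non-empty sampled values that match a pattern. The issue does not say how confidence should be computed.

```python
import re

# Hypothetical patterns for illustration; not PIICatcher's real rules.
NAME_HINTS = {
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "email": re.compile(r"e?mail", re.IGNORECASE),
}
VALUE_PATTERNS = {
    "phone": re.compile(r"^\+?\d[\d\- ]{7,}\d$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def scan_column(name, sampled_values):
    """Combine a column-name check with a check of sampled values.

    Returns {pii_type: {"name_match": bool, "value_confidence": float}}
    for every type flagged by either the name or the data.
    """
    non_empty = [v for v in sampled_values if v]
    result = {}
    for pii_type, value_re in VALUE_PATTERNS.items():
        name_hit = bool(NAME_HINTS[pii_type].search(name))
        hits = sum(1 for v in non_empty if value_re.match(v))
        # Assumed definition: share of non-empty sampled values that match.
        ratio = hits / len(non_empty) if non_empty else 0.0
        if name_hit or hits:
            result[pii_type] = {
                "name_match": name_hit,
                "value_confidence": ratio,
            }
    return result


# The name flags the column even though the sample is empty.
print(scan_column("phone", []))
# The name says nothing, but one of four sampled values is an email.
print(scan_column("notes", ["a@b.com", "hello", "x", "y"]))
```

Reporting the name match and the data ratio separately lets a caller tell "flagged by name but no data seen" apart from "flagged by data", which is the ambiguity raised in point 2.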

@vrajat
Member Author

vrajat commented Apr 14, 2020

I am closing this issue as I have created specific issues for all the gaps that were identified.

vrajat closed this as completed Apr 14, 2020