Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance #149

Merged
merged 6 commits into from
Oct 12, 2019
Merged

Improve performance #149

merged 6 commits into from
Oct 12, 2019

Conversation

dineshba
Copy link
Collaborator

Co-authored-by: @aswinkarthik

We introduced go routines in pattern detector and file content detector to
make things in parallel.

Results:

For a codebase which had 600 newly added files

Before our change:

time ~/.talisman/bin/talisman_darwin_amd64 --githook pre-commit >/dev/null

real    0m14.029s
user    0m12.035s
sys     0m2.216s

After our change:

time ./talisman_darwin_amd64 --githook pre-commit >/dev/null
real    0m7.797s
user    0m19.857s
sys     0m2.336s

Method level metrics:

Pattern Detector:
    Before: 5.498711305s
    After: 1.898453802s

Content Detector:
    Before: 639.769275m
    After: 221.733402ms

dineshba and others added 2 commits October 11, 2019 23:17
Co-authored-by: Aswin Karthik <aswinkarthik93@gmail.com>
Signed-off-by: Dinesh <dineshudt17@gmail.com>
Co-authored-by: Aswin Karthik <aswinkarthik93@gmail.com>
Signed-off-by: Dinesh <dineshudt17@gmail.com>
detector/pattern_detector.go Outdated Show resolved Hide resolved
detector/pattern_detector.go Outdated Show resolved Hide resolved
Signed-off-by: Dinesh <dineshudt17@gmail.com>
Co-authored-by: Aswin Karthik <aswinkarthik93@gmail.com>
Comment on lines 95 to 113
}
contents <- content{
name: addition.Name,
path: addition.Path,
contentType: base64Content,
results: fc.detectFile(addition.Data, checkBase64),
}
contents <- content{
name: addition.Name,
path: addition.Path,
contentType: hexContent,
results: fc.detectFile(addition.Data, checkHex),
}
contents <- content{
name: addition.Name,
path: addition.Path,
contentType: creditCardContent,
results: fc.detectFile(addition.Data, checkCreditCardNumber),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot of repeated code here.
I wonder if repetition could be avoided ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 96 to 112, we are just creating 3 structs and sending to channel. I am not very sure how to reduce that. If you have any idea, please let me know.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this. Please review it

Signed-off-by: Dinesh <dineshudt17@gmail.com>
Signed-off-by: Dinesh <dineshudt17@gmail.com>
Signed-off-by: Dinesh <dineshudt17@gmail.com>
@svishwanath-tw svishwanath-tw merged commit 98e9b83 into thoughtworks:master Oct 12, 2019
@dineshba
Copy link
Collaborator Author

This will help to close this #152
Thanks @svishwanath-tw for merging

@dineshba dineshba deleted the improve_performance branch October 12, 2019 19:47
@karuppiah7890
Copy link

karuppiah7890 commented Oct 13, 2019

I haven't checked the PR completely. But from an overview, I'm just curious about tests for this PR. Tests for the happy path and to show that performance has increased. Like, how do I say if my performance has increased by this PR, and how do I know if it gets decreased by some other code later? I haven't done such testing before, just curious. Some ideas - probably do some profiling? Or use some other parameter that relates to performance and check it?

cc @svishwanath-tw @dineshba @aswinkarthik

@svishwanath-tw
Copy link
Collaborator

@karuppiah7890 are you saying we need to add tests to the test suite ?

@karuppiah7890
Copy link

If it's possible, yes. I mean, I would need some measure to say the performance has increased, it can be in any form. Numbers always speak for themselves. Or else we will never know how much performant the code is compared to before. And such a metric would be useful in the future too

@svishwanath-tw
Copy link
Collaborator

@svishwanath-tw
Copy link
Collaborator

https://tour.golang.org/concurrency/4
According to this article: for ... range will also work in the scenario.

@dineshba
Copy link
Collaborator Author

https://tour.golang.org/concurrency/4
According to this article: for ... range will also work in the scenario.

We are reading from two channels simultaneously. So we have to use select. Refer https://stackoverflow.com/questions/20593126/reading-from-multiple-channels-simultaneously-in-golang

@svishwanath-tw
Copy link
Collaborator

@dineshba : The two processing methods seem to have the same signature.
Why not extend the contentType abstraction to process ignored content too and just have one processor ?
The advantage is that you can get away with 1 channel right ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants