-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
you might not like this pull request ... #2
base: master
Are you sure you want to change the base?
Conversation
Thanks for the PR! I wrote this a long time ago when I was first getting my feet wet with DynamoRIO, so I suspect that there are many improvements that could be made. This is a big PR so I'll be taking a look at it over the course of a few days :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some small nits and formatting concerns, but I'm more concerned about correctness right now. Does this work?
@@ -1,7 +1,7 @@ | |||
cmake_minimum_required (VERSION 3.0) | |||
project (cfg) | |||
|
|||
add_library(cfg SHARED cfg.cpp cfg_impl.cpp cbr.cpp cti.cpp app.cpp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why both of these are gone. On ARM I wouldn't expect cbr.cpp to work (it's a special case of branches for which all branch targets are known statically, inspired by a sample in the DR repo, cbr.c), but why are you deleting cti.cpp? cti.cpp is supposed to cover a superset of what cbr.cpp covers. Maybe we can simply disable cbr.cpp for ARM and let cti.cpp take over?
event_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t *instr, | ||
bool for_trace, bool translating, void *user_data) | ||
{ | ||
per_thread_t *data = (per_thread_t *) drmgr_get_cls_field(drcontext, tcls_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is trying to re-implement cbr.cpp. I would not recommend this, as it'll probably be easier to fix cbr.cpp and adapt it for ARM (and potentially re-contribute the fix towards the cbr.c sample in the DynamoRIO repo which should exhibit the same problems).
If you do not have the time for this, I can take a stab at it within the next week or so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing--I'm wondering if this works in general. DynamoRIO fires an event_app_instruction
event the first time it sees an instruction, but not subsequently. Consider the following control flow:
a
/ \
b c
\ /
d
It's a simple if-then-else statement in C. However, if we record the edge b-d
, then on a subsequent execution from path a-c-d
I wouldn't expect the DynamoRIO event to fire when we execute the instructions in basic block d
. Does this work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just reading this after reading & answering your email - just returned from vacation :)
actually, you have a point that your a -> b|c ->d might only log one path. that might be the case.
I will investigate and give you a response on that. if it is like you think, then yes, my approach is broken. if it work like I hope or thought it would work, then it would be the easiest solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done.
I did a comparison to drcfg in your repository (called "before.txt") and my cloned repository with the changes (called "after.txt"). I disabled random address space, and traced "unrar -inul p test.rar"
# ls -l after.txt before.txt
-rw-r--r-- 1 root root 324991 Apr 16 16:58 after.txt
-rw-r--r-- 1 root root 325064 Apr 16 16:58 before.txt
# wc -l after.txt before.txt
20320 after.txt
20325 before.txt
40645 total
this looks like 99% similar results.
if your assumption would hold true, then for every source, there would only be one target recorded. however I find in the my patched results even results with multiple entries, e.g.:
"address": 140737351486947,
"targets": [
140737324874432,
4205056,
140737333003728,
140737329276560,
140737327079680
]
so it looks like my approach is working. I would feel more sure if you would verify with your own tests though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will take a look w/in the next few days :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following code: cbr.zip
It has the diamond shape, like I mentioned above. This PR yields the following output:
{
"branches": [
{
"address": 40008a,
"targets": [
40008c
]
},
{
"address": 400081,
"targets": [
400085,
400083
]
}
]
}
Clearly we're missing one of the targets (to the done branch, I think).
dynamorio is jumping from instruction 0x400085 to 0x40008e without processing again the basic block 400087-40008A. and that is why 40008A=>40008E is missing. the question is now why.
here it seems (that is my unrar test) that 5 branch destinations are lost which is 0.00025% . that is a very small number. so maybe could it be that dynamorio tries to be efficient and jumps over blocks if it is near the exit of the program? in other words maybe the destination would not be lost if the would be more branches after 0x40008E? still it would mean loosing a few branches (for me I would not mind as in the real world that is libc cleanup which is not interesting). sigh it seems I have to dig deeper. if it is not like I hope, then it needs implementing the dr_insert_ ... function |
because it eliminates most parts of your code.
I did this because of
#1
which led me to DynamoRIO/dynamorio#2919
and boiled down to this solution which makes it work on all platforms and is even a bit faster.
plus the locations are not reported in hex not decimal.
and -txt command line option added.