Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

you might not like this pull request ... #2

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

vanhauser-thc
Copy link

because it eliminates most parts of your code.

I did this because of
#1
which led me to DynamoRIO/dynamorio#2919

and boiled down to this solution which makes it work on all platforms and is even a bit faster.

plus the locations are not reported in hex not decimal.

and -txt command line option added.

@toshipiazza
Copy link
Owner

Thanks for the PR! I wrote this a long time ago when I was first getting my feet wet with DynamoRIO, so I suspect that there are many improvements that could be made. This is a big PR so I'll be taking a look at it over the course of a few days :)

Copy link
Owner

@toshipiazza toshipiazza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some small nits and formatting concerns, but I'm more concerned about correctness right now. Does this work?

@@ -1,7 +1,7 @@
cmake_minimum_required (VERSION 3.0)
project (cfg)

add_library(cfg SHARED cfg.cpp cfg_impl.cpp cbr.cpp cti.cpp app.cpp)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why both of these are gone. On ARM I wouldn't expect cbr.cpp to work (it's a special case of branches for which all branch targets are known statically, inspired by a sample in the DR repo, cbr.c), but why are you deleting cti.cpp? cti.cpp is supposed to cover a superset of what cbr.cpp covers. Maybe we can simply disable cbr.cpp for ARM and let cti.cpp take over?

event_app_instruction(void *drcontext, void *tag, instrlist_t *bb, instr_t *instr,
bool for_trace, bool translating, void *user_data)
{
per_thread_t *data = (per_thread_t *) drmgr_get_cls_field(drcontext, tcls_idx);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is trying to re-implement cbr.cpp. I would not recommend this, as it'll probably be easier to fix cbr.cpp and adapt it for ARM (and potentially re-contribute the fix towards the cbr.c sample in the DynamoRIO repo which should exhibit the same problems).

If you do not have the time for this, I can take a stab at it within the next week or so.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing--I'm wondering if this works in general. DynamoRIO fires an event_app_instruction event the first time it sees an instruction, but not subsequently. Consider the following control flow:

  a
 / \
b   c
 \ /
  d

It's a simple if-then-else statement in C. However, if we record the edge b-d, then on a subsequent execution from path a-c-d I wouldn't expect the DynamoRIO event to fire when we execute the instructions in basic block d. Does this work for you?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just reading this after reading & answering your email - just returned from vacation :)

actually, you have a point that your a -> b|c ->d might only log one path. that might be the case.
I will investigate and give you a response on that. if it is like you think, then yes, my approach is broken. if it work like I hope or thought it would work, then it would be the easiest solution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done.

I did a comparison to drcfg in your repository (called "before.txt") and my cloned repository with the changes (called "after.txt"). I disabled random address space, and traced "unrar -inul p test.rar"

# ls -l after.txt before.txt
-rw-r--r-- 1 root root 324991 Apr 16 16:58 after.txt
-rw-r--r-- 1 root root 325064 Apr 16 16:58 before.txt
# wc -l after.txt before.txt
 20320 after.txt
 20325 before.txt
 40645 total

this looks like 99% similar results.

if your assumption would hold true, then for every source, there would only be one target recorded. however I find in the my patched results even results with multiple entries, e.g.:

      "address": 140737351486947,
      "targets": [
        140737324874432,
        4205056,
        140737333003728,
        140737329276560,
        140737327079680
      ]

so it looks like my approach is working. I would feel more sure if you would verify with your own tests though :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will take a look w/in the next few days :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the following code: cbr.zip

It has the diamond shape, like I mentioned above. This PR yields the following output:

{
  "branches": [
    {
      "address": 40008a,
      "targets": [
        40008c
      ]
    },
    {
      "address": 400081,
      "targets": [
        400085,
        400083
      ]
    }
  ]
}

Clearly we're missing one of the targets (to the done branch, I think).

@vanhauser-thc
Copy link
Author

dynamorio is jumping from instruction 0x400085 to 0x40008e without processing again the basic block 400087-40008A. and that is why 40008A=>40008E is missing.

the question is now why.

# wc -l after.txt before.txt
 20320 after.txt
 20325 before.txt

here it seems (that is my unrar test) that 5 branch destinations are lost which is 0.00025% . that is a very small number. so maybe could it be that dynamorio tries to be efficient and jumps over blocks if it is near the exit of the program? in other words maybe the destination would not be lost if the would be more branches after 0x40008E? still it would mean loosing a few branches (for me I would not mind as in the real world that is libc cleanup which is not interesting).

sigh it seems I have to dig deeper. if it is not like I hope, then it needs implementing the dr_insert_ ... function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants