New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed ups #402
Speed ups #402
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good, but a few things to discuss.
…unky and now names are broken?
op = envi.Emulator.parseOpcode(self, va, arch=arch) | ||
self.opcache[va] = op | ||
return op | ||
return self.vw.parseOpcode(va, arch=arch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't like this. the extra call does cost performance-wise. this is directly in the critical path for emulation, which needs to be extremely fast.
i get the idea of code-reuse. however, in this case, i think we're better off maintaining the separate code (or inheriting the function somehow). i don't think there is any function called more often than this one.
if you're interested in getting the value of the vw's opcache, you can make them the same dict in __init__()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't have a strong position on this change either way; however...
most respectfully, i don't find this performance argument very convincing without any numbers/profiling.
python, as a dynamically interpreted scripting language, doesn't have the same icache/dcache characteristics of native code. my intuition is that a single call
python opcode won't affect performance that much; more importantly, i really have no idea if this intuition is true or not - hence, i'd want to see numbers to back up any performance argument like this.
if performance is critical here, then there are other optimizations to be made as well, such as maintaining a local reference to parseOpcode
to avoid the module lookup to envi.Emulator
(at least two call
s & hash table lookups). and even then, we need profiling to guide us to what actually matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of this change was to present the same interface, since we expect vw to always be populated in an emulator instance.
However, I did do profiling to guide this change:
On master:
Timer unit: 1e-06 s
Total time: 21.3873 s
File: /home/james/work/vivisect/vivisect/__init__.py
Function: parseOpcode at line 1113
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1113 @profile
1114 def parseOpcode(self, va, arch=envi.ARCH_DEFAULT):
1115 '''
1116 Parse an opcode from the specified virtual address.
1117
1118 Example: op = m.parseOpcode(0x7c773803)
1119
1120 note: differs from the IMemory interface by checking loclist
1121 '''
1122 221791 476526.0 2.1 2.2 off, b = self.getByteDef(va)
1123 221791 129628.0 0.6 0.6 if arch == envi.ARCH_DEFAULT:
1124 118660 333287.0 2.8 1.6 loctup = self.getLocation(va)
1125 # XXX - in the case where we've set a location on what should be an
1126 # opcode lets make sure L_LTYPE == LOC_OP if not lets reset L_TINFO = original arch param
1127 # so that at least parse opcode wont fail
1128 118660 66195.0 0.6 0.3 if loctup is not None and loctup[L_TINFO] and loctup[L_LTYPE] == LOC_OP:
1129 118036 47468.0 0.4 0.2 arch = loctup[L_TINFO]
1130
1131 221791 20334217.0 91.7 95.1 return self.imem_archs[(arch & envi.ARCH_MASK) >> 16].archParseOpcode(b, off, va)
Total time: 15.1873 s
File: /home/james/work/vivisect/vivisect/impemu/emulator.py
Function: parseOpcode at line 240
Line # Hits Time Per Hit % Time Line Contents
==============================================================
240 @profile
241 def parseOpcode(self, va, arch=envi.ARCH_DEFAULT):
242 # We can make an opcode *faster* with the workspace because of
243 # getByteDef etc... use it.
244 152977 127568.0 0.8 0.8 op = self.opcache.get(va)
245 152977 61823.0 0.4 0.4 if op is None:
246 152977 14838736.0 97.0 97.7 op = envi.Emulator.parseOpcode(self, va, arch=arch)
247 152854 112640.0 0.7 0.7 self.opcache[va] = op
248 152854 46491.0 0.3 0.3 return op
this branch:
Timer unit: 1e-06 s
Total time: 20.7949 s
File: /home/james/work/vivisect/vivisect/__init__.py
Function: parseOpcode at line 1115
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1115 @profile
1116 def parseOpcode(self, va, arch=envi.ARCH_DEFAULT, skip=False):
1117 '''
1118 Parse an opcode from the specified virtual address.
1119
1120 Example: op = m.parseOpcode(0x7c773803, skip=True)
1121
1122 Set skip=True in order to bypass the opcode cache and force a reparsing of bytes
1123 '''
1124 353264 792157.0 2.2 3.8 off, b = self.getByteDef(va)
1125 353264 204849.0 0.6 1.0 if arch == envi.ARCH_DEFAULT:
1126 271637 851600.0 3.1 4.1 loctup = self.getLocation(va)
1127 # XXX - in the case where we've set a location on what should be an
1128 # opcode lets make sure L_LTYPE == LOC_OP if not lets reset L_TINFO = original arch param
1129 # so that at least parse opcode wont fail
1130 271637 166207.0 0.6 0.8 if loctup is not None and loctup[L_TINFO] and loctup[L_LTYPE] == LOC_OP:
1131 248398 106030.0 0.4 0.5 arch = loctup[L_TINFO]
1132 353264 139779.0 0.4 0.7 if not skip:
1133 353264 233272.0 0.7 1.1 key = (va, arch, b[:16])
1134 353264 355915.0 1.0 1.7 valu = self._op_cache.get(key, None)
1135 353264 274178.0 0.8 1.3 if not valu:
1136 172663 17248817.0 99.9 82.9 valu = self.imem_archs[(arch & envi.ARCH_MASK) >> 16].archParseOpcode(b, off, va)
1137 353141 285284.0 0.8 1.4 self._op_cache[key] = valu
1138 353141 136858.0 0.4 0.7 return valu
1139 return self.imem_archs[(arch & envi.ARCH_MASK) >> 16].archParseOpcode(b, off, va)
Total time: 4.04843 s
File: /home/james/work/vivisect/vivisect/impemu/emulator.py
Function: parseOpcode at line 239
Line # Hits Time Per Hit % Time Line Contents
==============================================================
239 @profile
240 def parseOpcode(self, va, arch=envi.ARCH_DEFAULT):
241 152977 4048431.0 26.5 100.0 return self.vw.parseOpcode(va, arch=arch)
So about 50% more hits to the parseOpcode in the VivWorkspace object, but comparable performance, but faster emulator performance, mostly because the opcache in the workspace isn't local to each emulator instance (and actually gets to persist)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and this was just and example done on the gcc-7 binary in our test files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough.
i'm confused about the numbers. how did you have such wildly different numbers in the top parts, but matching on the bottom parts: 152977 ? how are you caching the vw.parseOpcode versus the emulator in the comparison? i'm guessing we're gaining so much because of the method of caching.
i'll get over it, but i've had it pounded into me that calls are not cheap.
vivisect/__init__.py
Outdated
''' | ||
Parse an opcode from the specified virtual address. | ||
|
||
Example: op = m.parseOpcode(0x7c773803) | ||
Example: op = m.parseOpcode(0x7c773803, skip=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change these comments from skip=
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nm. i got it.
update comment to follow the api changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work, @rakuy0
3 main things in this PR:
The first two I can rip out as need be. They're helpful but I'm no beholden to them. The pointers one though I feel is fairly important since it has a huge effect on performance.