Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed ups #402

Merged
merged 17 commits into from Jun 12, 2021
Merged

Speed ups #402

merged 17 commits into from Jun 12, 2021

Conversation

rakuy0
Copy link
Contributor

@rakuy0 rakuy0 commented May 21, 2021

3 main things in this PR:

  • I wanted to bring up caching opcodes again since we do that all over the place and it would be good to try and make that as fast as possible without inducing a bunch of cache thrashing.
  • There are some experimental changes to symboliks that I wanted to discuss that hopefully speed things up.
  • Main speedup though is in vivisect/analysis/generic/pointers.py. The creation of pointers was O(n^2), which was causing all sorts of performance problems. So instead, I treat the pointers like a somewhat connected graph, since I realized pointers essentially form a bunch of trees. Didn't have to change any of the tests, worked fine, and from looking at the Elf tests in testelf.py, reduces analysis time anywhere from 20% to 80%, with an average of ~40%.

The first two I can rip out as need be. They're helpful but I'm no beholden to them. The pointers one though I feel is fairly important since it has a huge effect on performance.

Copy link
Contributor

@atlas0fd00m atlas0fd00m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, but a few things to discuss.

vivisect/__init__.py Outdated Show resolved Hide resolved
vivisect/analysis/generic/pointers.py Show resolved Hide resolved
vivisect/analysis/generic/pointers.py Show resolved Hide resolved
vivisect/analysis/generic/pointers.py Show resolved Hide resolved
vivisect/analysis/generic/pointers.py Show resolved Hide resolved
vivisect/symboliks/common.py Show resolved Hide resolved
vivisect/symboliks/common.py Show resolved Hide resolved
vivisect/symboliks/common.py Show resolved Hide resolved
vivisect/symboliks/tests/test_reduce.py Show resolved Hide resolved
@rakuy0 rakuy0 changed the title WIP: Speed ups Speed ups May 25, 2021
vivisect/__init__.py Outdated Show resolved Hide resolved
vivisect/analysis/generic/pointers.py Show resolved Hide resolved
vivisect/symboliks/common.py Show resolved Hide resolved
op = envi.Emulator.parseOpcode(self, va, arch=arch)
self.opcache[va] = op
return op
return self.vw.parseOpcode(va, arch=arch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like this. the extra call does cost performance-wise. this is directly in the critical path for emulation, which needs to be extremely fast.

i get the idea of code-reuse. however, in this case, i think we're better off maintaining the separate code (or inheriting the function somehow). i don't think there is any function called more often than this one.

if you're interested in getting the value of the vw's opcache, you can make them the same dict in __init__().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't have a strong position on this change either way; however...

most respectfully, i don't find this performance argument very convincing without any numbers/profiling.

python, as a dynamically interpreted scripting language, doesn't have the same icache/dcache characteristics of native code. my intuition is that a single call python opcode won't affect performance that much; more importantly, i really have no idea if this intuition is true or not - hence, i'd want to see numbers to back up any performance argument like this.

if performance is critical here, then there are other optimizations to be made as well, such as maintaining a local reference to parseOpcode to avoid the module lookup to envi.Emulator (at least two calls & hash table lookups). and even then, we need profiling to guide us to what actually matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this change was to present the same interface, since we expect vw to always be populated in an emulator instance.

However, I did do profiling to guide this change:
On master:

Timer unit: 1e-06 s

Total time: 21.3873 s
File: /home/james/work/vivisect/vivisect/__init__.py
Function: parseOpcode at line 1113

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1113                                               @profile
  1114                                               def parseOpcode(self, va, arch=envi.ARCH_DEFAULT):
  1115                                                   '''
  1116                                                   Parse an opcode from the specified virtual address.
  1117
  1118                                                   Example: op = m.parseOpcode(0x7c773803)
  1119
  1120                                                   note: differs from the IMemory interface by checking loclist
  1121                                                   '''
  1122    221791     476526.0      2.1      2.2          off, b = self.getByteDef(va)
  1123    221791     129628.0      0.6      0.6          if arch == envi.ARCH_DEFAULT:
  1124    118660     333287.0      2.8      1.6              loctup = self.getLocation(va)
  1125                                                       # XXX - in the case where we've set a location on what should be an
  1126                                                       # opcode lets make sure L_LTYPE == LOC_OP if not lets reset L_TINFO = original arch param
  1127                                                       # so that at least parse opcode wont fail
  1128    118660      66195.0      0.6      0.3              if loctup is not None and loctup[L_TINFO] and loctup[L_LTYPE] == LOC_OP:
  1129    118036      47468.0      0.4      0.2                  arch = loctup[L_TINFO]
  1130
  1131    221791   20334217.0     91.7     95.1          return self.imem_archs[(arch & envi.ARCH_MASK) >> 16].archParseOpcode(b, off, va)

Total time: 15.1873 s
File: /home/james/work/vivisect/vivisect/impemu/emulator.py
Function: parseOpcode at line 240

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   240                                               @profile
   241                                               def parseOpcode(self, va, arch=envi.ARCH_DEFAULT):
   242                                                   # We can make an opcode *faster* with the workspace because of
   243                                                   # getByteDef etc... use it.
   244    152977     127568.0      0.8      0.8          op = self.opcache.get(va)
   245    152977      61823.0      0.4      0.4          if op is None:
   246    152977   14838736.0     97.0     97.7              op = envi.Emulator.parseOpcode(self, va, arch=arch)
   247    152854     112640.0      0.7      0.7              self.opcache[va] = op
   248    152854      46491.0      0.3      0.3          return op

this branch:

Timer unit: 1e-06 s

Total time: 20.7949 s
File: /home/james/work/vivisect/vivisect/__init__.py
Function: parseOpcode at line 1115

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1115                                               @profile
  1116                                               def parseOpcode(self, va, arch=envi.ARCH_DEFAULT, skip=False):
  1117                                                   '''
  1118                                                   Parse an opcode from the specified virtual address.
  1119
  1120                                                   Example: op = m.parseOpcode(0x7c773803, skip=True)
  1121
  1122                                                   Set skip=True in order to bypass the opcode cache and force a reparsing of bytes
  1123                                                   '''
  1124    353264     792157.0      2.2      3.8          off, b = self.getByteDef(va)
  1125    353264     204849.0      0.6      1.0          if arch == envi.ARCH_DEFAULT:
  1126    271637     851600.0      3.1      4.1              loctup = self.getLocation(va)
  1127                                                       # XXX - in the case where we've set a location on what should be an
  1128                                                       # opcode lets make sure L_LTYPE == LOC_OP if not lets reset L_TINFO = original arch param
  1129                                                       # so that at least parse opcode wont fail
  1130    271637     166207.0      0.6      0.8              if loctup is not None and loctup[L_TINFO] and loctup[L_LTYPE] == LOC_OP:
  1131    248398     106030.0      0.4      0.5                  arch = loctup[L_TINFO]
  1132    353264     139779.0      0.4      0.7          if not skip:
  1133    353264     233272.0      0.7      1.1              key = (va, arch, b[:16])
  1134    353264     355915.0      1.0      1.7              valu = self._op_cache.get(key, None)
  1135    353264     274178.0      0.8      1.3              if not valu:
  1136    172663   17248817.0     99.9     82.9                  valu = self.imem_archs[(arch & envi.ARCH_MASK) >> 16].archParseOpcode(b, off, va)
  1137    353141     285284.0      0.8      1.4              self._op_cache[key] = valu
  1138    353141     136858.0      0.4      0.7              return valu
  1139                                                   return self.imem_archs[(arch & envi.ARCH_MASK) >> 16].archParseOpcode(b, off, va)

Total time: 4.04843 s
File: /home/james/work/vivisect/vivisect/impemu/emulator.py
Function: parseOpcode at line 239

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   239                                               @profile
   240                                               def parseOpcode(self, va, arch=envi.ARCH_DEFAULT):
   241    152977    4048431.0     26.5    100.0          return self.vw.parseOpcode(va, arch=arch)

So about 50% more hits to the parseOpcode in the VivWorkspace object, but comparable performance, but faster emulator performance, mostly because the opcache in the workspace isn't local to each emulator instance (and actually gets to persist)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and this was just and example done on the gcc-7 binary in our test files)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough.
i'm confused about the numbers. how did you have such wildly different numbers in the top parts, but matching on the bottom parts: 152977 ? how are you caching the vw.parseOpcode versus the emulator in the comparison? i'm guessing we're gaining so much because of the method of caching.

i'll get over it, but i've had it pounded into me that calls are not cheap.

'''
Parse an opcode from the specified virtual address.

Example: op = m.parseOpcode(0x7c773803)
Example: op = m.parseOpcode(0x7c773803, skip=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change these comments from skip= as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nm. i got it.

update comment to follow the api changes
Copy link
Contributor

@atlas0fd00m atlas0fd00m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, @rakuy0

@rakuy0 rakuy0 merged commit 012f45d into master Jun 12, 2021
@rakuy0 rakuy0 deleted the i_am_speed branch June 12, 2021 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants