Optimize amz_norm_whitespace #35

noamkush · 2018-02-21T22:59:27Z

While profiling I noticed that requests-aws4auth spends 2/3 of the time in amz_norm_whitespace while many of the strings that are passed to it don't contain any whitespace. Checking for whitespace using a regex before trying to replace it turned out to be much faster than running shlex every time.

tedder · 2021-03-03T22:50:35Z

I did some testing, and I'm going to reject this because it is much worse in some scenarios.

#!/usr/bin/python3

import timeit


def run(run_count=10000, _repeat=1):
  print(run_count, _repeat)
  test_str = 'Vice lyft pitchfork, poke wayfarers flexitarian lumbersexual taxidermy twee bespoke la croix listicle. Tote bag cloud bread XOXO yuccie crucifix, fam messenger bag letterpress cronut subway tile leggings small batch. Freegan shaman tattooed, gluten-free pinterest chambray whatever. Put a bird on it small batch franzen, readymade cardigan four dollar toast marfa heirloom cold-pressed YOLO.'
  t = timeit.repeat(stmt='''
if re.search('\s', text):
  f = ' '.join(shlex.split(text, posix=False))
''', setup=f'''
import re
import shlex
text = '{test_str}' * {_repeat}
''', number=run_count, repeat=3)
  print(f"   spaces, re and shlex: {t}")
  
  t = timeit.repeat(stmt='''
f = ' '.join(shlex.split(text, posix=False))
''', setup=f'''
import shlex
text = '{test_str}' * {_repeat}
''', number=run_count, repeat=3)
  print(f"   spaces,        shlex: {t}")
  t = timeit.repeat(stmt='''
if re.search('\s', text):
  f = ' '.join(shlex.split(text, posix=False))
''', setup=f'''
import re
import shlex
text = '{test_str}'.replace(' ', '') * {_repeat}
''', number=run_count, repeat=3)
  print(f"no spaces, re and shlex: {t}")
  
  t = timeit.repeat(stmt='''
f = ' '.join(shlex.split(text, posix=False))
''', setup=f'''
import shlex
text = '{test_str}'.replace(' ', '') * {_repeat}
''', number=run_count, repeat=3)
  print(f"no spaces,        shlex: {t}")

run(_repeat=1)
print("*** again with length multiplier")
run(500, 1024)

$ python3 rex_time.py 
10000 1
   spaces, re and shlex: [2.3800748139619827, 2.653704177006148, 2.585710724000819]
   spaces,        shlex: [2.2576327560236678, 2.394987210049294, 2.31907087599393]
no spaces, re and shlex: [0.01793892600107938, 0.017424626974388957, 0.01678658206947148]
no spaces,        shlex: [2.1401367279468104, 1.9101750730769709, 1.9226093819597736]
*** again with length multiplier
500 1024
   spaces, re and shlex: [118.54910042299889, 121.399052057066, 122.54422497900669]
   spaces,        shlex: [121.31505271594506, 122.77670257701539, 129.68882209795993]
no spaces, re and shlex: [0.659932279959321, 0.6706405270379037, 0.6688751910114661]
no spaces,        shlex: [1140.910614454071, 1158.019293964957, 1188.4628928540042]

For relatively short strings (and high iterations) the overhead is very similar for "shlex with regex" versus "shlex without regex". There's one exception that is very fast: strings with zero spaces. But the other three are very close. Removing the regex generally speeds things up by 5-20%.

Now, when it gets to a long string with spaces, the results are similar to the short strings. (removing the regex actually slows it down slightly). But long strings without spaces is what seals the deal: shlex is drastically slower.

So, TLDR, removing the regex speeds up the best cases by 11%. But it slows down the worst cases by a factor of 10 compared to long strings with spaces. Like many algorithms, the worst-case is the biggest problem.

noamkush · 2021-03-04T09:52:12Z

Huh? According to your own numbers without whitespace my version is 117 to 1600 times faster (this is not a percentage, it's a multiplier) and practically the same with whitespace. You said

shlex is drastically slower.

So why are you not merging this? In what scenario was it much worse?

tedder · 2021-03-04T10:15:34Z

Aha- I misremembered the diff. I thought you were removing the regex.

Optimized amz_norm_whitespace.

404e8ee

tedder closed this Mar 3, 2021

tedder reopened this Mar 4, 2021

tedder merged commit 31c09f8 into tedder:master Mar 4, 2021

tedder added a commit that referenced this pull request Mar 4, 2021

add @noamkush's #35 to the changelog

6d4af88

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize amz_norm_whitespace #35

Optimize amz_norm_whitespace #35

noamkush commented Feb 21, 2018

tedder commented Mar 3, 2021

noamkush commented Mar 4, 2021

tedder commented Mar 4, 2021

Optimize amz_norm_whitespace #35

Optimize amz_norm_whitespace #35

Conversation

noamkush commented Feb 21, 2018

tedder commented Mar 3, 2021

noamkush commented Mar 4, 2021

tedder commented Mar 4, 2021