Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket: optimize C websocket_mask function #2024

Merged
merged 2 commits into from May 20, 2017

Conversation

@pjknkda
Copy link
Contributor

pjknkda commented Apr 25, 2017

This change includes the optimization of websocket_mask function which is written in C extension.

Basically the idea is from the cython implementation of another popular Python web framework - aiohttp. (https://github.com/aio-libs/aiohttp/blob/v1.2.0/aiohttp/_websocket.pyx)

Here is a benchmark result (CPython 3.5.1):

# Original
% python -m timeit \
         -n 100000 \
         -s 'import os; from tornado.speedups import websocket_mask; mask = b"asdf"; data = os.urandom(100 * 1024)' \
         'websocket_mask(mask, data)'
100000 loops, best of 3: 113 usec per loop

# Proposed
% python -m timeit \
         -n 100000 \
         -s 'import os; from tornado.speedups import websocket_mask; mask = b"asdf"; data = os.urandom(100 * 1024)' \
         'websocket_mask(mask, data)'
100000 loops, best of 3: 23.1 usec per loop

The change takes more advantages if the length of data gets longer.

len(data) Original Proposed Ratio
1 0.131 0.129 0.98
10 0.157 0.133 0.85
100 0.253 0.140 0.55
1,000 1.24 0.387 0.31
10,000 11.1 2.36 0.21
100,000 113 22.4 0.20
1,000,000 1110 218 0.20
uint64_mask = (uint64_mask << 32) | uint32_mask;

while (data_len >= 8) {
((uint64_t*)buf)[0] = ((uint64_t*)data)[0] ^ uint64_mask;

This comment has been minimized.

Copy link
@ploxiln

ploxiln Apr 26, 2017

Contributor

minor style tip, feel free to ignore: this sort of thing is more succinctly written as

    *(uint64_t*)buf = *(uint64_t*)data ^ uint64_mask;
}
}

while (data_len >= 4) {

This comment has been minimized.

Copy link
@ploxiln

ploxiln Apr 26, 2017

Contributor

another silly little thing which probably has no measurable effect: this can just be if (data_len >= 4) because it can only be true once, saving an extra compare/branch after the first (only) iteration

This comment has been minimized.

Copy link
@pjknkda

pjknkda Apr 26, 2017

Author Contributor

In the case of 32bit system, above 64-bit XOR will not be triggered and it's a fallback for that case.

This comment has been minimized.

Copy link
@ploxiln

ploxiln Apr 26, 2017

Contributor

yes, my bad, I see the sizeof() check

@bdarnell

This comment has been minimized.

Copy link
Member

bdarnell commented May 20, 2017

Thanks!

@bdarnell bdarnell merged commit 9a18f6c into tornadoweb:master May 20, 2017
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.