Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket: optimize C websocket_mask function #2024

Merged
merged 2 commits into from May 20, 2017
Merged

websocket: optimize C websocket_mask function #2024

merged 2 commits into from May 20, 2017

Conversation

pjknkda
Copy link
Contributor

@pjknkda pjknkda commented Apr 25, 2017

This change includes the optimization of websocket_mask function which is written in C extension.

Basically the idea is from the cython implementation of another popular Python web framework - aiohttp. (https://github.com/aio-libs/aiohttp/blob/v1.2.0/aiohttp/_websocket.pyx)

Here is a benchmark result (CPython 3.5.1):

# Original
% python -m timeit \
         -n 100000 \
         -s 'import os; from tornado.speedups import websocket_mask; mask = b"asdf"; data = os.urandom(100 * 1024)' \
         'websocket_mask(mask, data)'
100000 loops, best of 3: 113 usec per loop

# Proposed
% python -m timeit \
         -n 100000 \
         -s 'import os; from tornado.speedups import websocket_mask; mask = b"asdf"; data = os.urandom(100 * 1024)' \
         'websocket_mask(mask, data)'
100000 loops, best of 3: 23.1 usec per loop

The change takes more advantages if the length of data gets longer.

len(data) Original Proposed Ratio
1 0.131 0.129 0.98
10 0.157 0.133 0.85
100 0.253 0.140 0.55
1,000 1.24 0.387 0.31
10,000 11.1 2.36 0.21
100,000 113 22.4 0.20
1,000,000 1110 218 0.20

uint64_mask = (uint64_mask << 32) | uint32_mask;

while (data_len >= 8) {
((uint64_t*)buf)[0] = ((uint64_t*)data)[0] ^ uint64_mask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style tip, feel free to ignore: this sort of thing is more succinctly written as

    *(uint64_t*)buf = *(uint64_t*)data ^ uint64_mask;

}
}

while (data_len >= 4) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another silly little thing which probably has no measurable effect: this can just be if (data_len >= 4) because it can only be true once, saving an extra compare/branch after the first (only) iteration

Copy link
Contributor Author

@pjknkda pjknkda Apr 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of 32bit system, above 64-bit XOR will not be triggered and it's a fallback for that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, my bad, I see the sizeof() check

@bdarnell
Copy link
Member

Thanks!

@bdarnell bdarnell merged commit 9a18f6c into tornadoweb:master May 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants