Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite HTTP strings processing in assembler #1249

Merged
merged 9 commits into from May 16, 2019
Merged

Conversation

krizhanovsky
Copy link
Contributor

Rewrite HTTP strings processing in assembler:

  1. remove tls/dummy_headers
  2. replace dynaic table initialization with static one
  3. shrink tables size
  4. small assembly optimizations

Introduce test_debug_relax() to avoid debug message drop on unit tests with debug.

1. remove tls/dummy_headers
2. replace dynaic table initialization with static one
3. shrink tables size
4. small assembly optimizations

Inroduce test_debug_relax() to avoid debug message drop on unit
tests with debug.
Best number for VM on my i7-6500U host is 80ms.
is suggested by the Intel optimization manual as a good practice).
Actually, GCC does exactly the same since with -mavx implying -msse2avx
it generates code with VEX-prefixed instructions, AVX version of SSE
instructions. So according to the optimization manual there is no
AVX to SSE transition penalties.

I checked str_avx2.S as well as lib/str_simd.S and none of them uses
plain SSE instructions. Other kernel SIMD code works with AVX (e.g.
crypto routines) and/or takes care about vzeroupper call on their own.
Since SIMD isn't so widely used in kernel code, there is no sense to
follow the recommended good practice.

The best time for the str benchmark is 84ms, which is 5% worse than
previously, so I'll remove vzeroupper with the next patch.
tempesta_fw/t/unit/test_tfw_str.c Outdated Show resolved Hide resolved
tempesta_fw/t/unit/test_tfw_str.c Outdated Show resolved Hide resolved
Copy link
Contributor

@i-rinat i-rinat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Here are some minor comments:

tempesta_fw/str.c Show resolved Hide resolved
tempesta_fw/str.h Outdated Show resolved Hide resolved
tempesta_fw/str_avx2.S Outdated Show resolved Hide resolved
cmpq $8, %rdx
ja .str2low_test_len128
movq .str2low_switch(,%rdx,8), %rax
jmpq *%rax
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These indirect jumps make objtool generate a warning: "warning: objtool: __tfw_strtolower_avx2()+0x46: indirect jump found in RETPOLINE build". To suppress it we need either annotate these jumps like that:

diff --git a/tempesta_fw/str_avx2.S b/tempesta_fw/str_avx2.S
index 5f890fe6..df3e2e2f 100644
--- a/tempesta_fw/str_avx2.S
+++ b/tempesta_fw/str_avx2.S
@@ -24,6 +24,7 @@
 #include <linux/linkage.h>
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
+#include <asm/nospec-branch.h>
 
 #define CASE	0x2020202020202020
 
@@ -476,6 +477,7 @@ ENTRY(__tfw_strtolower_avx2)
 	cmpq	$8, %rdx
 	ja	.str2low_test_len128
 	movq	.str2low_switch(,%rdx,8), %rax
+	ANNOTATE_RETPOLINE_SAFE
 	jmpq	*%rax
 .section .rodata
 .align	8
@@ -653,6 +655,7 @@ ENTRY(__tfw_stricmp_avx2)
 
 	/* Process short strings below 8 bytes in length. */
 	movq	.stricmp_switch(,%rdx,8), %rax
+	ANNOTATE_RETPOLINE_SAFE
 	jmpq	*%rax
 .section .rodata
 .align 8
@@ -1009,6 +1012,7 @@ ENTRY(__tfw_stricmp_avx2_2lc)
 	ja	.sic2lc_short
 
 	movq	.sic2lc_switch(,%rdx,8), %rax
+	ANNOTATE_RETPOLINE_SAFE
 	jmpq	*%rax
 .section	.rodata
 .align 8

or enable Spectre mitigation:

diff --git a/tempesta_fw/str_avx2.S b/tempesta_fw/str_avx2.S
index 5f890fe6..d5d2334c 100644
--- a/tempesta_fw/str_avx2.S
+++ b/tempesta_fw/str_avx2.S
@@ -24,6 +24,7 @@
 #include <linux/linkage.h>
 #include <asm/alternative-asm.h>
 #include <asm/export.h>
+#include <asm/nospec-branch.h>
 
 #define CASE	0x2020202020202020
 
@@ -476,7 +477,7 @@ ENTRY(__tfw_strtolower_avx2)
 	cmpq	$8, %rdx
 	ja	.str2low_test_len128
 	movq	.str2low_switch(,%rdx,8), %rax
-	jmpq	*%rax
+	JMP_NOSPEC	%rax
 .section .rodata
 .align	8
 .str2low_switch:
@@ -653,7 +654,7 @@ ENTRY(__tfw_stricmp_avx2)
 
 	/* Process short strings below 8 bytes in length. */
 	movq	.stricmp_switch(,%rdx,8), %rax
-	jmpq	*%rax
+	JMP_NOSPEC	%rax
 .section .rodata
 .align 8
 .stricmp_switch:
@@ -1009,7 +1010,7 @@ ENTRY(__tfw_stricmp_avx2_2lc)
 	ja	.sic2lc_short
 
 	movq	.sic2lc_switch(,%rdx,8), %rax
-	jmpq	*%rax
+	JMP_NOSPEC	%rax
 .section	.rodata
 .align 8
 .sic2lc_switch:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check jump array bounds against a constant, encoded in an instruction opcode, so there is no cache miss on boundary check required for Spectre. JPM_NOSPEC gives about 15% performance impact in comparison with ANNOTATE_RETPOLINE_SAFE: 101ms vs 85ms in the tfw_str microbenchmark.

movq %rbx, %r11
jmp .str2low_test_len8

.str2low_tolower8:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of volatile in the C source code resulted in obviously redundant memory stores and loads. As far as I understand, code below is equivalent to:

        movq    (%rsi,%r11), %rax
        movq    %r9, %mm2
        movq    %r8, %mm1
        movq    %rax, %mm0
        movq    %rax, %r12
        psubb   %mm2, %mm0
        pcmpgtb %mm0, %mm1
        movq    %mm1, %rax
        andq    %rcx, %rax
        orq     %r12, %rax
        movq    %rax, (%rdi,%r11)
        movslq  %ebx, %rax
        jmp     .str2low_small_len

tempesta_fw/str_avx2.S Outdated Show resolved Hide resolved
Use `req` notation for macro arguments.
Use ANNOTATE_RETPOLINE_SAFE as we're save against Spectre attack.
__tfw_strtolower_avx2() optimizations.
Small cleanups.
@krizhanovsky krizhanovsky merged commit 0dc931b into master May 16, 2019
@krizhanovsky krizhanovsky deleted the ak-str-asm2 branch May 16, 2019 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants