http_parser: optimize parsing using vector instructions (AVX) #182

Closed
steils opened this Issue Aug 12, 2015 · 4 comments

@steils
Contributor
steils commented Aug 12, 2015

Currently the HTTP parser loops over every single character. Improve it by using AVX to process strings.
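Below is a rough illustration of what "process strings with AVX" could mean. This minimal user-space sketch finds the first CR or LF by comparing 32 bytes per iteration with AVX2 intrinsics, instead of looping over each character. It assumes GCC or Clang on x86 (`__attribute__((target("avx2")))` and `__builtin_cpu_supports()`) and falls back to a scalar loop elsewhere. The function names are hypothetical, not Tempesta FW code. Inside the kernel, the SIMD part would also have to be wrapped in kernel_fpu_begin()/kernel_fpu_end().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define USE_AVX2_PATH 1 /* hypothetical build switch for this sketch */
#endif

/* Byte-at-a-time reference: index of the first '\r' or '\n', or len. */
static size_t find_eol_scalar(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (s[i] == '\r' || s[i] == '\n')
			return i;
	return len;
}

#ifdef USE_AVX2_PATH
/* Compare 32 bytes per iteration against '\r' and '\n'. */
__attribute__((target("avx2")))
static size_t find_eol_avx2(const char *s, size_t len)
{
	const __m256i cr = _mm256_set1_epi8('\r');
	const __m256i lf = _mm256_set1_epi8('\n');
	size_t i = 0;

	for (; i + 32 <= len; i += 32) {
		/* Unaligned load: HTTP data has no alignment guarantees. */
		__m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
		__m256i hit = _mm256_or_si256(_mm256_cmpeq_epi8(v, cr),
					      _mm256_cmpeq_epi8(v, lf));
		/* One bit per byte; the lowest set bit is the first match. */
		unsigned int mask = (unsigned int)_mm256_movemask_epi8(hit);

		if (mask)
			return i + (size_t)__builtin_ctz(mask);
	}
	/* Tail shorter than one vector is handled byte by byte. */
	return i + find_eol_scalar(s + i, len - i);
}
#endif

/* Use AVX2 when the CPU supports it, otherwise the scalar loop. */
static size_t find_eol(const char *s, size_t len)
{
#ifdef USE_AVX2_PATH
	if (__builtin_cpu_supports("avx2"))
		return find_eol_avx2(s, len);
#endif
	return find_eol_scalar(s, len);
}
```

For example, find_eol("Host: test\r\n", 12) returns 10. Inputs shorter than 32 bytes only go through the scalar tail, so any gain would come from long strings.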

@steils steils added the enhancement label Aug 12, 2015
@krizhanovsky krizhanovsky added this to the TBD milestone Aug 12, 2015
@krizhanovsky
Contributor

The most crucial piece is x_HdrOther processing, which potentially has to scan long strings and verify the allowed alphabet (currently we do a simple memchr()); see 9da1b89#diff-0a919ed70e9a8fc9bdfce5b5ee1e1e99R674.
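Here is one possible shape for the alphabet check. It assumes the allowed set is the RFC 7230 field-value bytes: HTAB, SP, VCHAR and obs-text, i.e. everything except 0x00-0x08, 0x0A-0x1F and 0x7F. A single AVX2 pass returns the offset of the first disallowed byte. That way the terminating CR is found and the alphabet verified at the same time, unlike a plain memchr() for CR. The names are hypothetical. The AVX2 path assumes GCC or Clang on x86, with a scalar fallback elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define USE_AVX2_PATH 1 /* hypothetical build switch for this sketch */
#endif

/* Control bytes other than HTAB, plus DEL, are not allowed in a value. */
static int is_bad_byte(unsigned char c)
{
	return (c < 0x20 && c != '\t') || c == 0x7F;
}

static size_t hdr_value_scan_scalar(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (is_bad_byte((unsigned char)s[i]))
			return i;
	return len;
}

#ifdef USE_AVX2_PATH
__attribute__((target("avx2")))
static size_t hdr_value_scan_avx2(const char *s, size_t len)
{
	const __m256i max_ctl = _mm256_set1_epi8(0x1F);
	const __m256i tab = _mm256_set1_epi8('\t');
	const __m256i del = _mm256_set1_epi8(0x7F);
	size_t i = 0;

	for (; i + 32 <= len; i += 32) {
		__m256i v = _mm256_loadu_si256((const __m256i *)(s + i));
		/* Unsigned v <= 0x1F  <=>  min(v, 0x1F) == v. */
		__m256i ctl = _mm256_cmpeq_epi8(_mm256_min_epu8(v, max_ctl), v);
		/* Remove HTAB from the control set, then add DEL. */
		__m256i bad = _mm256_or_si256(
			_mm256_andnot_si256(_mm256_cmpeq_epi8(v, tab), ctl),
			_mm256_cmpeq_epi8(v, del));
		unsigned int mask = (unsigned int)_mm256_movemask_epi8(bad);

		if (mask)
			return i + (size_t)__builtin_ctz(mask);
	}
	return i + hdr_value_scan_scalar(s + i, len - i);
}
#endif

/* Offset of the first byte not allowed in a header value, or len. */
static size_t hdr_value_scan(const char *s, size_t len)
{
#ifdef USE_AVX2_PATH
	if (__builtin_cpu_supports("avx2"))
		return hdr_value_scan_avx2(s, len);
#endif
	return hdr_value_scan_scalar(s, len);
}
```

LF (0x0A) is outside the allowed set, so this scan also stops at a bare LF inside a header value instead of treating it as an ordinary character.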

It is worth mentioning that the current implementation can accept LF as an allowed character in an HTTP header. For example, the request

    echo -ne 'GET / HTTP/1.1\r\nHost: test\nConnection: close\nConnection: foo\n\r\n'

yields the Host header value test\nConnection: close\nConnection: foo\n, since the parser expects the header to end at CR.
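Here is one way to close this hole. The sketch assumes the parser follows RFC 7230, section 3.5, which lets a recipient recognize a single LF as a line terminator and ignore any preceding CR. Rejecting such requests outright would be the stricter alternative. The helper next_line() is hypothetical. It splits lines on LF and strips an optional CR, so the example request yields a Host value of test followed by two separate Connection lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Tempesta FW code): find the end of the line
 * starting at s. On success, store the line length without its terminator
 * in *line_len and return the number of bytes consumed, terminator
 * included. Return 0 if no LF is present yet, i.e. the line is incomplete.
 * Both CRLF and a bare LF end a line, as RFC 7230, section 3.5 permits.
 */
static size_t next_line(const char *s, size_t len, size_t *line_len)
{
	const char *lf = memchr(s, '\n', len);
	size_t n;

	if (!lf)
		return 0;
	n = (size_t)(lf - s);
	/* Ignore a CR right before the LF; a CR elsewhere is left as is. */
	*line_len = (n && s[n - 1] == '\r') ? n - 1 : n;
	return n + 1;
}
```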

Another funny request that is parsed incorrectly:

     GET / HTTP/1.1\r\n
     GET / HTTP/1.1\r\n
     Host: test\r\n
     \r\n

The second GET / HTTP/1.1 line is parsed as an Other header even though it has no ':' delimiter.
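Here is a minimal sketch of the missing check, assuming RFC 7230 rules. A header line must start with a non-empty token (tchar characters) immediately followed by ':'. Section 3.2.4 allows no whitespace between the field name and the colon. hdr_name_len() is a hypothetical helper, not the parser's actual state machine. With it, the second GET / HTTP/1.1 line is rejected because the space after GET is not a tchar.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RFC 7230 tchar: ALPHA / DIGIT / one of "!#$%&'*+-.^_`|~". */
static int is_tchar(unsigned char c)
{
	if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
	    (c >= 'A' && c <= 'Z'))
		return 1;
	/* Guard '\0': strchr() would otherwise match the terminator. */
	return c != '\0' && strchr("!#$%&'*+-.^_`|~", c) != NULL;
}

/*
 * Hypothetical helper: return the length of the header name at the start
 * of line, or -1 if the line does not start with "token ':'".
 */
static long hdr_name_len(const char *line, size_t len)
{
	size_t i = 0;

	while (i < len && is_tchar((unsigned char)line[i]))
		i++;
	/* Non-empty token, then ':' with no whitespace before it. */
	if (i == 0 || i == len || line[i] != ':')
		return -1;
	return (long)i;
}
```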

@krizhanovsky krizhanovsky self-assigned this Aug 12, 2015
@krizhanovsky
Contributor

Depends on #69

@milabs milabs added a commit that referenced this issue Apr 19, 2016
@milabs milabs http-parser: ensure that common headers contain ':'
This patch ensures that common (raw) headers contain ':' after the
header name. It also fixes the issue where the second `GET / HTTP/1.1` is
parsed as a common header despite the absence of the ':' delimiter:

    GET / HTTP/1.1\r\n
    GET / HTTP/1.1\r\n
    Host: test\r\n
    \r\n

Related to #182, #444.
701e737
@milabs milabs added a commit that referenced this issue Apr 19, 2016
@milabs milabs http-parser: ensure that common headers contain ':'
788c21d
@milabs milabs added a commit that referenced this issue Apr 20, 2016
@milabs milabs http-parser: ensure that common headers contain ':'
c0efd31
@milabs
Contributor
milabs commented Apr 20, 2016

Processing of common HTTP headers is fixed by 04ca164.

@sergsever sergsever added a commit that referenced this issue May 6, 2016
@milabs @sergsever milabs + sergsever http-parser: ensure that common headers contain ':'
fee0391
@krizhanovsky
Contributor

According to the benchmark, small flood-like HTTP requests don't stress the parser at all; Linux I/O is the bottleneck instead. Meanwhile, large strings in a request significantly hurt HTTP parser performance.

A request with very large strings (e.g. see the example of very long User-Agent strings), such as

    $ ./wrk -t 8 -c 64 -d 60s --header 'Connection: keep-alive' \
        --header 'Upgrade-Insecure-Requests: 1' \
        --header 'User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; (R1 1.6); SLCC1; .NET CLR 2.0.50727; InfoPath.2; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; 66760635803; runtime 11.00294; 876906799603; 97880703; 669602703; 9778063903; 877905603; 89670803; 96690803; 8878091903; 7879040603; 999608065603; 799808803; 6666059903; 669602102803; 888809342903; 696901603; 788907703; 887806555703; 97690214703; 66760903; 968909903; 796802422703; 8868026703; 889803611803; 898706903; 977806408603; 976900799903; 9897086903; 88780803; 798802301603; 9966008603; 66760703; 97890452603; 9789064803; 96990759803; 99960107703; 8868087903; 889801155603; 78890703; 8898070603; 89970603; 89970539603; 89970488703; 8789007603; 87890903; 877904603; 9887077703; 798804903; 97890264603; 967901703; 87890703; 97690420803; 79980706603; 9867086703; 996602846703; 87690803; 6989010903; 977809603; 666601903; 876905337803; 89670603; 89970200903; 786903603; 696901911703; 788905703; 896709803; 96890703; 998601903; 88980703; 666604769703; 978806603; 7988020803; 996608803; 788903297903; 98770043603; 899708803; 66960371603; 9669088903; 69990703; 99660519903; 97780603; 888801803; 9867071703; 79780803; 9779087603; 899708603; 66960456803; 898706824603; 78890299903; 99660703; 9768079803; 977901591603; 89670605603; 787903608603; 998607934903; 799808573903; 878909603; 979808146703; 9996088603; 797803154903; 69790603; 99660565603; 7869028603; 896707703; 97980965603; 976907191703; 88680703; 888809803; 69690903; 889805523703; 899707703; 997605035603; 89970029803; 9699094903; 877906803; 899707002703; 786905857603; 69890803; 97980051903; 997603978803; 9897097903; 66960141703; 7968077603; 977804603; 88980603; 989700803; 999607887803; 78690772803; 96990560903; 98970961603; 9996032903; 9699098703; 69890655603; 978903803; 698905066803; 977806903; 9789061703; 967903747703; 976900550903; 88980934703; 8878075803; 8977028703; 97980903; 9769006603; 786900803; 98770682703; 78790903; 878906967903; 87690399603; 99860976703; 796805703; 87990603; 968906803; 967904724603; 999606603; 988705903; 989702842603; 96790603; 99760703; 88980166703; 9799038903; 98670903; 697905248603; 7968043603; 66860703; 66860127903; 9779048903; 89670123903; 78890397703; 97890603; 87890803; 8789030603; 69990603; 88880763703; 9769000603; 96990203903; 978900405903; 7869022803; 699905422903; 97890703; 87990903; 878908703; 7998093903; 898702507603; 97780637603; 966907903; 896702603; 9769004803; 7869007903; 99660158803; 7899099603; 8977055803; 99660603; 7889080903; 66660981603; 997604603; 6969089803; 899701903; 9769072703; 666603903; 99860803; 997608803; 69790903; 88680756703; 979805677903; 9986047703; 89970803; 66660603; 96690903; 8997051603; 789901209803; 8977098903; 968900326803; 87790703; 98770024803; 697901794603; 69990803; 887805925803; 968908903; 97880603; 897709148703; 877909476903; 66760197703; 977908603; 698902703; 988706504803; 977802026603; 88680964703; 8878068703; 987705107903; 978902878703' \
        --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
        --header 'Accept-Encoding: gzip, deflate, sdch' \
        --header 'Accept-Language: en-US,en;q=0.8,ru;q=0.6' \
        --header 'Cookie: a=sdfasd; sdf=3242u389erfhhs; djcnjhe=sdfsdafsdjfb324te1267dd; sdaf=mo2u8943478t67437461746rfdgfcdc; ityu=9u489573484duifhd; GTYFT=nsdjhcbyq3te76ewgfcZ; uityut=23Y746756247856425784657; GA=URHUFVHHVSDNFDHGYSDGF; a=%45345%dfdfg%4656%4534sdfjhsdb.sdfsg.sdfgsf.; nsdjhfb=4358345y; jkbsdff=aaaa; aa=4583478; aaaaa=34435345; rrr=iy7t67t6tsdf; ggg=234i5y24785y78ry534785; mmm=23uy47fbhdsfbgh; bsdfhbhfgdqqwew=883476757%345345; jksdfb=2348y; ndfsgsfdg=235trHHVGHFGC; a=sdfasd; sdf=3242u389erfhhs; djcnjhe=sdfsdafsdjfb324te1267dd; sdaf=mo2u8943478t67437461746rfdgfcdc; ityu=9u489573484duifhd; GTYFT=nsdjhcbyq3te76ewgfcZ; uityut=23Y746756247856425784657; GA=URHUFVHHVSDNFDHGYSDGF; a=%45345%dfdfg%4656%4534sdfjhsdb.sdfsg.sdfgsf.; nsdjhfb=4358345y; jkbsdff=aaaa; aa=4583478; aaaaa=34435345; rrr=iy7t67t6tsdf; ggg=234i5y24785y78ry534785; mmm=23uy47fbhdsfbgh; bsdfhbhfgdqqwew=883476757%345345; jksdfb=2348y; ndfsgsfdg=235trHHVGHFGC; erertrt=3242342343423324234; ggggggggg=8888888888888888888888888888888888888888888888888888888888888788' \
        'http://192.168.100.100:80/?;r=657222568;a=p-2945K0QbJw0BA;fpan=0;fpa=P0-456992954-1322415728212;ns=0;ce=1;je=0;sr=1280x800x24;enc=n;dst=1;et=1340553300515;tzo=-240;ref=;url=http%3A%2F%2Fitman.livejournal.com%2F474249.html%3Fthread%3D5941385%23t5941385;ogl=title.%D0%9F%D0%BE%D1%87%D0%B5%D0%BC%D1%83%20%D0%BA%D0%BE%D0%BC%D0%BF%D1%8C%D1%8E%D1%82%D0%B5%D1%80%20--%20%D1%8D%D1%82%D0%BE%20%D0%BD%D0%B5%20%D0%BA%D0%BE%D0%BD%D0%B5%D1%87%D0%BD%D1%8B%D0%B9%20%D0%B0%D0%B2%D1%82%D0%BE%D0%BC%D0%B0%D1%82%3F%2Cdescription.%D0%A1%D1%82%D0%BE%D0%BB%D0%B5%D1%82%D0%B8%D1%8E%20%D0%A2%D1%8C%D1%8E%D1%80%D0%B8%D0%BD%D0%B3%D0%B0%20%D0%BF%D0%BE%D1%81%D0%B2%D1%8F%D1%89%D0%B0%D0%B5%D1%82%D1%81%D1%8F%252E%20%D0%9E%D0%BA%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D0%B5%D1%82%D1%81%D1%8F%252C%20%D0%BE%D0%B3%D1%80%D0%BE%D0%BC%D0%BD%D0%BE%D0%B5%20%D0%BA%D0%BE%D0%BB%D0%B8%D1%87%D0%B5%D1%81%D1%82%D0%B2%D0%BE%20%D0%D0%B8%D1%8E%20%D0%A2%D1%8C%D1%8E%D1%80%D0%B8%D0%BD%D0%B3%D0%B0%20%D0%BF%D0%BE%D1%81%D0%B2%D1%8F%D1%89%D0%B0%D0%B5%D1%82%D1%81%D1%8F%252E%20%D0%9E%D0%BA%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D0%B5%D1%82%D1%81%D1%8F%252C%20%D0%BE%D0%B3%D1%80%D0%BE%D0%BC%D0%BD%D0%BE%D0%B5%20%D0%BA%D0%BE%D0%BB%D0%B8%D1%87%D0%B5%D1%81%D1%82%D0%B2%D0%BE%20%D0%BB%D1%8E%D0%B4%D0%B5%D0%B9%20%D1%81%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D1%82%252C%20%D1%BE%D0%BB%D0%B8%D1%87%D0%B5%D1%81%D1%82%D0%B2%D0%BE%20%D0%BB%D1%8E%D0%B4%D0%B5%D0%B9%20%D1%81%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D1%82%252C%20%D1%87%2Cimage.http%3A%2F%2Fl-userpic%252Elivejournal%252Ecom%2F113387160%2F8313909'

produces the following perf profile:

    + 24.31% tfw_http_parse_req
    + 5.95% strncasecmp
    + 4.47% check_poison_obj
    + 3.12% __memset
    + 2.63% debug_lockdep_rcu_enabled
    + 2.56% lock_acquire
    + 2.46% lock_release
    + 2.26% __str_grow_tree
    + 2.24% tfw_http_msg_field_chunk_fixup
    + 1.26% __memcpy
    + 1.23% tfw_pool_realloc
    + 1.09% ixgbe_clean_rx_irq
    + 1.06% debug_locks_off
    + 0.88% kmem_cache_free
    + 0.85% trace_hardirqs_off_caller
    + 0.82% ixgbe_xmit_frame_ring
    + 0.74% lock_acquired
    + 0.68% do_raw_spin_trylock
    + 0.62% tfw_str_add_compound
    + 0.58% __lock_acquire
    + 0.57% dev_gro_receive
    + 0.56% kmem_cache_alloc
    + 0.56% debug_check_no_obj_freed
    + 0.54% inet_gro_receive
    + 0.53% tfw_wq_tasklet
    + 0.51% tfw_http_msg_hdr_chunk_fixup
@krizhanovsky krizhanovsky added a commit that closed this issue Oct 30, 2016
@krizhanovsky krizhanovsky Fix #182, fix #624: SIMD HTTP strings processing, multiple HTTP parser fixes with extended unit tests
39ef33b