HTTP normalization #2

Open
krizhanovsky opened this issue Sep 6, 2014 · 2 comments

krizhanovsky commented Sep 6, 2014

We need at least minimal HTTP request normalization (as ngx_http_parse_complex_uri() does in Nginx). The normalization must be implemented as part of the current HTTP FSM (to avoid double processing as in Nginx) and must write the normalized fields to the appropriate parts of TfwHttpReq.

Normalization depending on back-end server personality must also be done. However, this is very customizable and expensive logic, so it should be possible to switch the functionality off. Thus it must be implemented in http_norm.h as pluggable HTTP FSM states.

Basically, we shouldn't do the normalization if Cache-Control: no-transform is present.

Depends on #902. Linked with #1207.

krizhanovsky commented Sep 6, 2014

Motivation

#628 implements strict alphabet validation. However, if an alphabet prohibits, for example, the space character (0x20) in the URI, a request can bypass the validator with simple hex encoding: GET /foo%20bar HTTP/1.1. A dangerous real-life example is a response splitting attack:

/redir_lang.jsp?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0a
HTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0a
Content-Length:%2019%0d%0a%0d%0a<html>Shazam</html>

Allowed characters (bytes) must be taken from the same configuration options as for #628.
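
To make the bypass concrete, here is a minimal Python sketch (an illustration, not Tempesta code). It shows that an alphabet check on the raw bytes accepts the percent-encoded payload, while the same check on the decoded bytes catches CR, LF and space. The `ALLOWED` set and the `in_alphabet()` helper are hypothetical stand-ins for a character set configured as in #628.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical alphabet: printable ASCII without space (stand-in for #628).
ALLOWED = frozenset(range(0x21, 0x7F))

def in_alphabet(data: bytes) -> bool:
    """Return True if every byte belongs to the allowed alphabet."""
    return all(b in ALLOWED for b in data)

raw = b"/redir_lang.jsp?lang=foobar%0d%0aContent-Length:%200%0d%0a%0d%0a"
decoded = unquote_to_bytes(raw)

print(in_alphabet(raw))      # the encoded form passes the raw-byte check
print(in_alphabet(decoded))  # CR, LF and space show up only after decoding
```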

The encodings must be validated, see for example validate_url_encoding() from ModSecurity/apache2/re_operators.c.
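
A minimal sketch of such an encoding check in Python, assuming the only rule is that every `%` must be followed by two hex digits. It is not a port of the ModSecurity function; the name `percent_encoding_is_valid()` is hypothetical.

```python
HEX = frozenset(b"0123456789abcdefABCDEF")

def percent_encoding_is_valid(data: bytes) -> bool:
    """Return True if every '%' is followed by two hex digits."""
    i, n = 0, len(data)
    while i < n:
        if data[i] == 0x25:  # '%'
            # Both digits must exist and must be hexadecimal.
            if i + 2 >= n or data[i + 1] not in HEX or data[i + 2] not in HEX:
                return False
            i += 3
        else:
            i += 1
    return True

print(percent_encoding_is_valid(b"/foo%20bar"))
print(percent_encoding_is_valid(b"/?%product=select 1,2,3"))
```

Note that a double-encoded string such as `%2541` is syntactically valid here; catching it is a job for the alphabet applied after decoding.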

Traffic normalization for intrusion detection is well studied, see for example Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics for L3-L4 NIDS.

HPACK & QPACK

The Huffman decoder and encoder should be reviewed: at the moment we use a 1-character decoding table, which shows better performance than the nghttp2 and Nginx decoders (#1176 (comment)). However, LiteSpeed uses large tables and batching to speed up Huffman encoding and decoding. The allowed characters (in the sense of #628), already decoded (in the sense of this issue), can probably be encoded in the large table. Also see #1207.

References:

Modes of operation

To avoid hurting performance in cases that don't require strong security, the feature should be configurable per-vhost and per-location in the same sense as #688.

The transformation logic (as described in RFC 7230 5.7.2) for cookies and URI must be enabled by a configuration option (see also #902):

    http_norm <uri|cookie>
    content_security_mode <strict|transform|log>

e.g.:

    http_norm uri cookie;
    content_security_mode strict;

The following checks and transformations must be done:

  • url - decode percent-encoded strings (double percent hex and first/second/double nibble hex aren't allowed). Messages with malformed hex strings (e.g. http://foo.com/?%product=select 1,2,3, where % isn't followed by 2 hex digits) must be blocked. Spaces may be represented in many ways, e.g. with + or %20 (see HTML URL Encoding Reference) - we don't need to do anything about that. RFC 3986 allows percent-encoding in all parts of a URI, but it's unclear how to deal with e.g. a UTF-8 hostname, so we decode the URI abs_path only.

  • utf8 - validate UTF-8 encoding: decode percent-encoded and validate UTF-8 bytes;

  • path - remove path traversals like /../ or // (see ngx_http_parse_complex_uri()) and translate \ to /.

  • pollution (subject of #1276, "Process HTTP abs_path and query separately") - take the first occurrence of a polluted HTTP parameter for URI or POST in content_security_mode=transform mode. In validation mode (w/o content_security_mode=log attribute) it just builds a map of the parameters and ensures that there is no pollution. In content_security_mode=transform mode it rewrites the URI (available for URIs only); in content_security_mode=strict mode it drops the request and writes a warning. HTTP parameter fragmentation, e.g.

      http://test.com/url?a=1+select&b=1+from&c=base
    

    is left for application-side WAF.
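
The pollution check above can be sketched in Python as follows. This is an illustration under assumptions: the function names, the returned verdict strings and the exact behavior of the log mode are hypothetical readings of the description, and parameters are split on `&` only.

```python
def check_pollution(query: str, mode: str):
    """Build a map of parameters and react to duplicates.

    mode is one of 'strict', 'transform' or 'log' (assumed semantics).
    Returns a (verdict, query) tuple.
    """
    first_seen = {}   # parameter name -> first raw "name=value" item
    polluted = False
    for item in query.split("&"):
        if not item:
            continue
        name = item.partition("=")[0]
        if name in first_seen:
            polluted = True          # same parameter given more than once
        else:
            first_seen[name] = item
    if not polluted:
        return ("pass", query)
    if mode == "strict":
        return ("block", query)      # drop the request, write a warning
    if mode == "transform":
        # Keep only the first occurrence of every parameter.
        return ("pass", "&".join(first_seen.values()))
    return ("log", query)            # report only, forward unchanged

print(check_pollution("a=1&b=2&a=3", "transform"))
```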

Additional alphabets must be introduced to validate the strings after all the decodings. These alphabets may prevent double percent-encoding (e.g. %2541, which is essentially %41 after the first hex decoding and A after the second) by prohibiting %.
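
A short Python sketch of this idea, assuming a hypothetical post-decoding alphabet (`POST_DECODE_ALLOWED`) that is printable ASCII without `%`. The string is decoded exactly once and any surviving `%` is treated as a violation.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical post-decoding alphabet: printable ASCII without '%'.
POST_DECODE_ALLOWED = frozenset(range(0x20, 0x7F)) - {ord("%")}

def decode_once_and_check(raw: bytes) -> bool:
    """Decode once; return True only if the result fits the alphabet."""
    decoded = unquote_to_bytes(raw)
    return all(b in POST_DECODE_ALLOWED for b in decoded)

print(unquote_to_bytes(b"%2541"))          # one decoding step leaves b"%41"
print(decode_once_and_check(b"/%2541"))    # rejected: '%' survives decoding
print(decode_once_and_check(b"/%41"))      # accepted: decodes to b"/A"
```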

The path transformation must be executed after string decoding, e.g. the path /a/b/abba/%2e%2e/abba must be decoded first and only then .. removed. The allowed alphabets must also be verified after the decodings to block messages with CR, LF or a zero byte.
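
The ordering can be illustrated with a small Python sketch: decode first, then translate backslashes, collapse empty segments and resolve dot segments. The function name and the choice to raise an error on traversal above the root are assumptions made for the example, not behavior defined in this issue.

```python
from urllib.parse import unquote_to_bytes

def normalize_path(raw: bytes) -> bytes:
    """Percent-decode, then clean up the path segments."""
    # Decoding comes first, so %2e%2e becomes '..' before it is examined.
    decoded = unquote_to_bytes(raw).replace(b"\\", b"/")
    out = []
    for seg in decoded.split(b"/"):
        if seg in (b"", b"."):
            continue                 # collapses '//' and drops '/./'
        if seg == b"..":
            if not out:
                raise ValueError("path traversal above the root")
            out.pop()
        else:
            out.append(seg)
    normalized = b"/" + b"/".join(out)
    if out and decoded.endswith(b"/"):
        normalized += b"/"           # keep a trailing slash
    return normalized

print(normalize_path(b"/a/b/abba/%2e%2e/abba"))
```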

Implementation requirements

If no normalization option is specified, the HTTP parser must not perform detailed processing and must just validate the allowed alphabet as it does now, i.e. there must be zero performance overhead if normalization isn't required by the configuration.

All the decoders, as well as the log and strict modes, must copy the observed string to a buffer, because we need to forward the percent-encoded URI. Since every encoded form is no shorter than the data it encodes, the content_security_mode=transform mode must percent-recode the decoded string in place, rewriting the original string. skb fragmentation should be used to handle the data gap between the shortened URI and the HTTP/ part. The fragmentation must be done only once, when all the decoders finish. Fall back to full data copying if the number of fragments per buffer (#498) exceeds a compile-time constant.
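
The in-place property can be sketched in Python on a plain `bytearray`. This is only an illustration under assumptions: it decodes RFC 3986 unreserved characters and re-encodes everything else with upper-case hex (the issue does not fix the canonical form), it expects already validated input, and it does not model skb fragmentation. The returned offset marks where the gap before the `HTTP/` part begins.

```python
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")

def recode_in_place(buf: bytearray, start: int, end: int) -> int:
    """Rewrite buf[start:end] in place; return the new end offset."""
    r = w = start                    # read and write positions, w <= r
    while r < end:
        if buf[r] == 0x25:           # '%', input assumed already validated
            byte = int(buf[r + 1:r + 3].decode("ascii"), 16)
            if byte in UNRESERVED:
                buf[w] = byte        # 3 bytes shrink to 1
                w += 1
            else:
                buf[w:w + 3] = b"%%%02X" % byte   # same length, upper case
                w += 3
            r += 3
        else:
            buf[w] = buf[r]
            w += 1
            r += 1
    return w                         # bytes in [w, end) form the gap

buf = bytearray(b"GET /%41b%2fc%0d HTTP/1.1")
old_end = buf.index(b" HTTP")
new_end = recode_in_place(buf, 4, old_end)
print(bytes(buf[4:new_end]), old_end - new_end)
```

Because the write position never overtakes the read position, no extra buffer is needed for the rewrite itself.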

The normalization must be done before the cache processing to minimize different URI keys stored in the cache.

Since it's undesirable to grow the current set of HTTP parser states, the logic must be implemented in the pluggable (external) FSM, entered by a conditional unlikely jump (no need to support the compilation directive any more).

Also please fix the TODO for URI abs_path for more accurate filtering of injection attacks, e.g. it'd be good to be able to prohibit / in the query string.

SIMD

There are SIMD implementations of UTF-8 validation or recoding (e.g. to/from UTF-16 or UTF-32). See for example

  1. https://github.com/lemire/fastvalidate-utf-8 and the paper https://arxiv.org/pdf/2010.03090.pdf
  2. https://r-libre.teluq.ca/2400/3/Transcoding%20Billions%20of%20Unicode%20Characters%20per%20Second%20with%20SIMD%20Instructions.pdf
  3. https://nullprogram.com/blog/2017/10/06/
  4. Adventures in SIMD-Thinking (part 2 of 2) - Bob Steagall - CppCon 2020, UTF-8 to UTF-32 Conversion Algorithm

However, it probably makes sense to sacrifice SIMD in order to do percent-decoding, UTF-8 validation, validation of allowed character sets (in the sense of #628) and transformations (path or arguments) in a single pass.
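
A scalar single-pass loop of this kind can be sketched in Python as below. It combines percent-decoding, UTF-8 well-formedness checks (rejecting overlong forms, surrogates and code points above U+10FFFF) and an alphabet check for ASCII bytes; path and argument transformations are left out. The names `decode_and_validate()` and `ALLOWED`, and the decision to apply the alphabet to ASCII bytes only, are assumptions for the example.

```python
HEXVAL = {ord(c): int(c, 16) for c in "0123456789abcdefABCDEF"}
ALLOWED = frozenset(range(0x21, 0x7F))   # hypothetical ASCII alphabet

def decode_and_validate(raw: bytes, allowed=ALLOWED) -> bytes:
    """Decode and validate in one pass; raise ValueError on any violation."""
    out = bytearray()
    need = 0                  # UTF-8 continuation bytes still expected
    lo, hi = 0x80, 0xBF       # valid range for the next continuation byte
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        if b == 0x25:         # '%': decode two hex digits
            if i + 2 >= n:
                raise ValueError("truncated percent-encoding")
            hn, ln = HEXVAL.get(raw[i + 1]), HEXVAL.get(raw[i + 2])
            if hn is None or ln is None:
                raise ValueError("malformed percent-encoding")
            b = (hn << 4) | ln
            i += 3
        else:
            i += 1
        if need:              # inside a multi-byte UTF-8 sequence
            if not lo <= b <= hi:
                raise ValueError("invalid UTF-8 continuation byte")
            need -= 1
            lo, hi = 0x80, 0xBF
        elif b < 0x80:
            if b not in allowed:
                raise ValueError("byte outside the allowed alphabet")
        elif 0xC2 <= b <= 0xDF:
            need = 1
        elif 0xE0 <= b <= 0xEF:
            need = 2
            if b == 0xE0:
                lo = 0xA0     # reject overlong 3-byte forms
            elif b == 0xED:
                hi = 0x9F     # reject UTF-16 surrogates
        elif 0xF0 <= b <= 0xF4:
            need = 3
            if b == 0xF0:
                lo = 0x90     # reject overlong 4-byte forms
            elif b == 0xF4:
                hi = 0x8F     # reject code points above U+10FFFF
        else:
            raise ValueError("invalid UTF-8 lead byte")
        out.append(b)
    if need:
        raise ValueError("truncated UTF-8 sequence")
    return bytes(out)

print(decode_and_validate(b"/caf%C3%A9"))
```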

Tests and docs

Please update https://github.com/tempesta-tech/tempesta/wiki/Web-security Wiki page on finishing the task.

  • At least one functional test for detecting a response splitting attack is required.
  • A functional test for custom character sets (tempesta-test#3, "Custom character sets").
  • Tempesta must block requests like GET /vulnerabilities/xss_d/?default=/%0aSet-Cookie:crlf-injection HTTP/2

Further possible extensions

We leave back-end server personality normalization for further development, should there be any real requests for it. It probably won't be needed, since we're going to provide full Web server functionality and leave really heavy processing logic to dedicated WAF solutions.

HTTP responses aren't normalized either - we target filtering of the initial attacks rather than of their consequences.

Also, the set of decoders is very restricted, e.g. there is no lower-case conversion, Microsoft %U decoding or Unicode normalization, so please keep possible further extensions of the decoder in mind.

@krizhanovsky krizhanovsky added this to the TBD milestone Jun 21, 2015
dkirjanov pushed a commit that referenced this issue Aug 23, 2017
there is a chance that the server connection resources won't be released
on server disconnect and tempesta shutdown thereafter so we have to explicitly
call the tfw_connection_release()

[17193.213542] ------------[ cut here ]------------
[17193.217477] Kernel BUG at ffffffffc04f06e7 [verbose debug info unavailable]
[17193.217477] invalid opcode: 0000 [#1] SMP
[17193.217477] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O)
tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O)
bochs_drm ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect
ppdev input_leds led_class sg serio_raw sysimgblt parport_pc parport
pcspkr button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto
mbcache sr_mod cdrom sd_mod ata_generic crc32c_intel psmouse ata_piix
libata i2c_piix4 e1000 scsi_mod floppy [last unloaded: tempesta_tls]
[17193.217477] CPU: 1 PID: 4288 Comm: sysctl Tainted: G           O    4.9.35 #2
[17193.217477] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[17193.217477] task: ffff93939a8f6700 task.stack: ffffa400005f0000
[17193.217477] RIP: 0010:[<ffffffffc04f06e7>]  [<ffffffffc04f06e7>]
tfw_sock_srv_del_conns+0xf7/0x110 [tempesta_fw]
[17193.217477] RSP: 0018:ffffa400005f3ce8  EFLAGS: 00010202
[17193.217477] RAX: 0000000000000001 RBX: ffff93937642b2a0 RCX: ffffffffc0500a50
[17193.217477] RDX: ffff93937642b350 RSI: ffff93939ba9c700 RDI: ffff9393734a6420
[17193.217477] RBP: ffff93937642b2d8 R08: fffffffffffffffc R09: 0000000000000003
[17193.217477] R10: 0010000100023588 R11: 0000000000000000 R12: ffff9393734a63d8
[17193.217477] R13: ffff9393734a6410 R14: ffff9393734a6410 R15: ffffa400005f3f20
[17193.217477] FS:  00007fbc0655c880(0000) GS:ffff9393bfd00000(0000)
knlGS:0000000000000000
[17193.217477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17193.217477] CR2: 00007fbc05c55d00 CR3: 0000000036565000 CR4: 00000000000006e0
[17193.217477] Stack:
[17193.217477]  ffff9393734a6410 ffff93939ba9c490 ffffffffc04f05f0
ffff93939ba9c480
[17193.217477]  0000561d95e7e2e0 ffffffffc04eceb8 ece581022d7a1302
ffffffffc0500fd0
[17193.217477]  ffffffffc0500fd0 dead000000000200 dead000000000100
ffffffffc04f0835
[17193.217477] Call Trace:
[17193.217477]  [<ffffffffc04f05f0>] ?
tfw_cfg_handle_ratio_predyn_opts+0x150/0x150 [tempesta_fw]
[17193.217477]  [<ffffffffc04eceb8>] ? tfw_sg_for_each_srv+0x58/0x90
[tempesta_fw]
[17193.217477]  [<ffffffffc04f0835>] ?
tfw_clean_srv_groups+0x135/0x150 [tempesta_fw]
[17193.217477]  [<ffffffffc04d0c5f>] ? tfw_cfg_stop+0x6f/0xb0 [tempesta_fw]
[17193.217477]  [<ffffffffc04eb98d>] ?
handle_sysctl_state_io+0x19d/0x1d0 [tempesta_fw]
[17193.217477]  [<ffffffffc04eb82a>] ?
handle_sysctl_state_io+0x3a/0x1d0 [tempesta_fw]
[17193.217477]  [<ffffffffa55fc22e>] ? proc_sys_call_handler+0xde/0x100
[17193.217477]  [<ffffffffa558b6ae>] ? __vfs_write+0x2e/0x160
[17193.217477]  [<ffffffffa558bdab>] ? vfs_write+0xab/0x190
[17193.217477]  [<ffffffffa558d14d>] ? SyS_write+0x4d/0xb0
[17193.217477]  [<ffffffffa59278e4>] ? entry_SYSCALL_64_fastpath+0x17/0x98
[17193.217477] Code: 38 48 83 e8 38 4c 39 f5 74 23 4d 8b ac 24 88 00
00 00 4d 85 ed 74 14 49 8b 54 24 38 4c 89 e3 49 89 c4 48 39 d5 0f 85
46 ff ff ff <0f> 0b 5b 31 c0 5d 41 5c 41 5d 41 5e c3 66 90 66 2e 0f 1f
84 00
[17193.217477] RIP  [<ffffffffc04f06e7>]
tfw_sock_srv_del_conns+0xf7/0x110 [tempesta_fw]
[17193.217477]  RSP <ffffa400005f3ce8>
[17193.473816] ---[ end trace c254541427767bd1 ]---
[17193.485634] [tempesta] Un-registering scheduler: hash
[17193.518667] [tempesta] Un-registering scheduler: http
[17193.550624] [tempesta] Un-registering scheduler: ratio
[17193.585109] [tempesta] exiting...
[17193.588362] kmem_cache_destroy tfw_srv_conn_cache: Slab cache still
has objects
[17193.594033] CPU: 0 PID: 4301 Comm: rmmod Tainted: G      D    O    4.9.35 #2
[17193.598002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[17193.598002]  0000000000000000 ffffffffa569ddf8 ffff93939a0bdb40
0000000000000000
[17193.598002]  ffffffffa553c59e 00ff93939a0bdae0 ffffa400003b7e70
ffffa400003b7e70
[17193.598002]  ffffa400003b7e80 ffffa400003b7e80 4501bb40293c7162
000000000000000a
[17193.598002] Call Trace:
[17193.598002]  [<ffffffffa569ddf8>] ? dump_stack+0x46/0x5e
[17193.598002]  [<ffffffffa553c59e>] ? kmem_cache_destroy+0x23e/0x250
[17193.598002]  [<ffffffffc04eb9eb>] ? tfw_exit+0x2b/0x60 [tempesta_fw]
[17193.598002]  [<ffffffffa54cb858>] ? SyS_delete_module+0x178/0x240
[17193.598002]  [<ffffffffa5402064>] ? exit_to_usermode_loop+0x64/0x80
[17193.598002]  [<ffffffffa59278e4>] ? entry_SYSCALL_64_fastpath+0x17/0x98

Signed-off-by: Denis Kirjanov <dk@tempesta-tech.com>
@krizhanovsky krizhanovsky mentioned this issue Feb 4, 2018
12 tasks
@krizhanovsky krizhanovsky modified the milestones: backlog, 0.8 TDB v0.2 Feb 4, 2018
@krizhanovsky krizhanovsky modified the milestones: 0.8 TDB v0.2, 0.6 KTLS Mar 22, 2018
@krizhanovsky krizhanovsky changed the title HTTP requests normalization HTTP normalization Apr 4, 2018
@krizhanovsky krizhanovsky modified the milestones: 0.6 KTLS, 1.0 Beta Jul 17, 2018
@krizhanovsky krizhanovsky modified the milestones: 1.0 Beta, 0.7 HTTP/2 Jul 17, 2018
@krizhanovsky krizhanovsky removed this from the 0.8 TDBv0.2 milestone Feb 2, 2019
@krizhanovsky
Contributor Author

The issue was wrongly closed

@krizhanovsky krizhanovsky reopened this May 14, 2023
@krizhanovsky krizhanovsky removed their assignment May 14, 2023
dmpetroff pushed a commit that referenced this issue May 18, 2023
- fuse domain fronting and strict host checking frang validations together
- show full SNI in "unknown host" message
krizhanovsky pushed a commit that referenced this issue Jun 16, 2023
Review fixes #2

- fuse domain fronting and strict host checking frang validations together
- show full SNI in "unknown host" message
krizhanovsky pushed a commit that referenced this issue Jun 16, 2023
- fuse domain fronting and strict host checking frang validations together
- show full SNI in "unknown host" message
const-t added a commit that referenced this issue Feb 26, 2024
We can do this, because connection won't
be removed while we have alive response-request
pair.
const-t added a commit that referenced this issue Feb 27, 2024
We can do this, because connection won't
be removed while we have alive response-request
pair.