Safe handling of websockets drop #1797

Merged
merged 11 commits into from
Feb 3, 2023
Conversation

@krizhanovsky (Contributor)

Fix for #1647 (see commit messages). Now

for i in `seq 200`; do ./run_tests.py ws.test_ws_ping.CacheTest.test_ping_websockets; done

passes for local tests.

@const-t (Contributor) left a comment:

I got a kernel panic running run_tests.py ws.test_ws_ping.CacheTest.test_ping_websockets

It looks like we use the server even after its connection is closed. tfw_ws_conn_release() should contain the "backend management" code.

[20093.217396] kernel BUG at /home/constantine/projects/tempesta/fw/sock.c:456!
[20093.217940] invalid opcode: 0000 [#1] SMP NOPTI
[20093.218276] CPU: 2 PID: 7450 Comm: python3 Kdump: loaded Tainted: G           OE     5.10.35+ #1
[20093.218867] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[20093.219542] RIP: 0010:ss_send+0x2bd/0x310 [tempesta_fw]
[20093.219905] Code: 89 44 24 08 eb 98 e8 52 ea d0 e0 85 c0 0f 85 5c 74 01 00 41 bc f4 ff ff ff e9 55 ff ff ff 48 c7 03 00 00 00 00 e9 f8 fe ff ff <0f> 0b be 01 00 00 00 48 89 ef e8 b4 de a7 e0 e9 39 fe ff ff 0f 0b
[20093.221189] RSP: 0018:ffffc900000f4a00 EFLAGS: 00010246
[20093.221557] RAX: 0000000000000000 RBX: ffffc900000f4b28 RCX: ffff88817530f878
[20093.222072] RDX: 0000000000000002 RSI: ffff88817530f030 RDI: 0000000000000000
[20093.222562] RBP: ffff88817530f020 R08: ffff8881602dd598 R09: 0000000000000001
[20093.223052] R10: 0000000000000001 R11: 000000000000000c R12: ffff8881602dd578
[20093.223542] R13: ffff8881602dd220 R14: ffff8881602dd330 R15: ffff8881602dd220
[20093.224037] FS:  00007f6f52f15740(0000) GS:ffff888277d00000(0000) knlGS:0000000000000000
[20093.224591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20093.224987] CR2: 00007f6f4fcf4018 CR3: 000000018eca8000 CR4: 0000000000350ee0
[20093.225477] Call Trace:
[20093.225654]  <IRQ>
[20093.225813]  tfw_http_req_fwd_send.constprop.0+0x51/0x2d0 [tempesta_fw]
[20093.226282]  ? tfw_http_msg_hdr_xfrm+0xb8/0xf0 [tempesta_fw]
[20093.226677]  tfw_http_conn_fwd_unsent+0xe7/0x3d0 [tempesta_fw]
[20093.227089]  tfw_http_req_cache_cb+0x37c/0x640 [tempesta_fw]
[20093.227483]  ? tfw_http_req_redir+0x7e0/0x7e0 [tempesta_fw]
[20093.227871]  tfw_cache_process+0x146/0x290 [tempesta_fw]
[20093.228250]  tfw_http_req_process+0x39d/0x820 [tempesta_fw]
[20093.228682]  ? __coalesce_frag+0x99/0x150 [tempesta_fw]
[20093.229039]  ? ss_skb_queue_coalesce_tail+0xe7/0x120 [tempesta_fw]
[20093.229457]  tfw_connection_recv+0x53/0x90 [tempesta_fw]
[20093.229822]  ss_tcp_process_data+0x1f1/0x400 [tempesta_fw]
[20093.230196]  ss_tcp_data_ready+0x3f/0xa0 [tempesta_fw]
[20093.230552]  tcp_data_queue+0x8d3/0xdf0
[20093.230820]  tcp_rcv_established+0x234/0x650
[20093.231112]  ? sk_filter_trim_cap+0xd2/0x220
[20093.231407]  tcp_v4_do_rcv+0x127/0x1e0
[20093.231666]  tcp_v4_rcv+0xc49/0xdb0
[20093.231906]  ip_protocol_deliver_rcu+0x2b/0x1c0
[20093.232215]  ip_local_deliver_finish+0x44/0x50
[20093.232517]  __netif_receive_skb_one_core+0x85/0xa0
[20093.232851]  process_backlog+0x96/0x170
[20093.233112]  net_rx_action+0x142/0x3e0
[20093.233369]  __do_softirq+0xd0/0x28a
[20093.233618]  asm_call_irq_on_stack+0x12/0x20
[20093.233914]  </IRQ>
[20093.234064]  do_softirq_own_stack+0x32/0x40
[20093.234388]  do_softirq+0x5e/0x70
[20093.234621]  __local_bh_enable_ip+0x4b/0x50
[20093.234906]  ip_finish_output2+0x193/0x540
[20093.235185]  __ip_queue_xmit+0x17a/0x400
[20093.235454]  ? __local_bh_enable_ip+0x4b/0x50
[20093.235751]  __tcp_transmit_skb+0x9f0/0xb90
[20093.236037]  tcp_write_xmit+0x408/0x1270
[20093.236305]  __tcp_push_pending_frames+0x32/0xe0
[20093.236620]  tcp_sendmsg_locked+0xa1d/0xb80
[20093.236907]  tcp_sendmsg+0x27/0x40
[20093.237153]  sock_sendmsg+0x54/0x60
[20093.237400]  __sys_sendto+0xde/0x120
[20093.237649]  ? init_wait_entry+0x30/0x30
[20093.237922]  ? __sys_connect+0x8c/0xb0
[20093.238182]  ? switch_fpu_return+0x40/0xb0
[20093.238463]  __x64_sys_sendto+0x20/0x30
[20093.238728]  do_syscall_64+0x33/0x80
[20093.238979]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20093.239327] RIP: 0033:0x7f6f5329c44c
[20093.239574] Code: 89 02 48 c7 c0 ff ff ff ff eb b5 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 19 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 64 c3 0f 1f 00 55 48 83 ec 20 48 89 54 24 10
[20093.240835] RSP: 002b:00007fff37a14e58 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[20093.241346] RAX: ffffffffffffffda RBX: 00007fff37a14f10 RCX: 00007f6f5329c44c
[20093.241831] RDX: 0000000000000217 RSI: 0000000003969810 RDI: 0000000000000005
[20093.242314] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[20093.242796] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[20093.243277] R13: ffffffffc4653600 R14: 00007fff37a14f10 R15: 0000000000000001
[20093.243765] Modules linked in: tempesta_fw(OE) tempesta_db(OE) tempesta_tls(OE) tempesta_lib(OE) sha256_ssse3(E) sha512_ssse3(E) sha512_generic(E) uinput(E) joydev(E) hid_generic(E) usbhid(E) hid(E) rfkill(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) libaes(E) crypto_simd(E) cryptd(E) glue_helper(E) snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E) snd_intel_dspcfg(E) iTCO_wdt(E) soundwire_intel(E) intel_pmc_bxt(E) soundwire_generic_allocation(E) iTCO_vendor_support(E) snd_soc_core(E) watchdog(E) virtio_net(E) xhci_pci(E) snd_compress(E) net_failover(E) soundwire_cadence(E) snd_hda_codec(E) i2c_i801(E) xhci_hcd(E) snd_hda_core(E) i2c_smbus(E) pcspkr(E) usbcore(E) failover(E) virtiofs(E) virtio_balloon(E) snd_hwdep(E) ahci(E) soundwire_bus(E) libahci(E) virtio_console(E) snd_pcm(E) libata(E) snd_timer(E) snd(E) soundcore(E) scsi_mod(E) lpc_ich(E) qemu_fw_cfg(E) button(E) msr(E) parport_pc(E) ppdev(E) lp(E) parport(E) fuse(E) configfs(E) virtio_rng(E) rng_core(E)
[20093.243797]  ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc32c_generic(E) crc16(E) mbcache(E) jbd2(E) virtio_gpu(E) virtio_dma_buf(E) drm_kms_helper(E) cec(E) drm(E) virtio_blk(E) evdev(E) crct10dif_pclmul(E) crct10dif_common(E) crc32c_intel(E) psmouse(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) [last unloaded: tempesta_lib]
[    0.041690] Tempesta: cannot allocate 256 continous huge pages at node 0
[    0.041929] Tempesta: cannot vmalloc area of 536870912 bytes at node 0

@s0nx (Contributor) commented Jan 27, 2023

@const-t, I can't reproduce your crash with the same test.

@s0nx (Contributor) commented Jan 27, 2023

The fix looks good, though I'm worried about the aforementioned crash.

@krizhanovsky (Contributor, Author)

Yeah, I also tried for i in `seq 200`; do ./run_tests.py ws.test_ws_ping.CacheTest.test_ping_websockets; done on my tests branch from tempesta-tech/tempesta-test#381 in a KVM guest with 8 CPUs and 8 GB RAM, with no luck. One difference on my side is that I run a KASAN build (config in the attachment).
1797-linux-conf.txt

The call stack contains no WS functions at all, so it seems the regular HTTP code path is broken. The crash is in request processing, and I don't see any changes there except 5c2226e ... Need to dig deeper into the logic. If the bug can be reproduced, it'll simplify debugging a lot.

@const-t, could you please share your config and any details about your setup?

@const-t (Contributor) commented Jan 30, 2023

@krizhanovsky @s0nx I think the problem is in this line: we must clear the TFW_CONN_B_UNSCHED bit only on an unsuccessful "steal". After fixing this, the panic is gone.

To reproduce it, I build Tempesta with no debug flags. It may also be worth changing this line to server ${server_ip}:8099 conns_n=1;

config: const-t-vm-config.txt

@EvgeniiMekhanik (Contributor) commented Jan 30, 2023

I got a crash after two hours of tests, but I can't tell whether it relates to this issue (I want to run the tests all night to check that it doesn't reproduce):
[ 2746.178951] tfw_http_conn_send+0x1c/0x20 [tempesta_fw]
[ 2746.179262] tfw_connection_send+0x25/0x30 [tempesta_fw]
[ 2746.179580] tfw_http_conn_fwd_unsent+0x11e/0x5e0 [tempesta_fw]
[ 2746.179939] tfw_http_req_cache_cb+0x1c8/0x5f0 [tempesta_fw]
[ 2746.180270] ? tfw_http_req_redir+0x8d0/0x8d0 [tempesta_fw]
[ 2746.180579] tfw_cache_process+0x63/0x360 [tempesta_fw]
[ 2746.180915] ? tfw_http_parse_req+0x13580/0x13580 [tempesta_fw]
[ 2746.181259] tfw_http_req_process+0x3e1/0x940 [tempesta_fw]
[ 2746.181572] ? __tfw_http_msg_alloc+0x1a0/0x340 [tempesta_fw]
[ 2746.181887] ? bzero_fast+0xe/0x10 [tempesta_lib]
[ 2746.182151] ? tfw_http_init_parser_req+0x22/0x50 [tempesta_fw]
[ 2746.182492] tfw_http_msg_process_generic+0x19d/0x6b0 [tempesta_fw]
[ 2746.182867] ? __alloc_skb+0x42/0x200
[ 2746.183075] ? __coalesce_frag+0x99/0x150 [tempesta_fw]
[ 2746.183379] tfw_http_msg_process+0x38/0x50 [tempesta_fw]
[ 2746.183699] tfw_connection_recv+0x59/0xa0 [tempesta_fw]
[ 2746.184107] ss_tcp_process_data+0x1e0/0x430 [tempesta_fw]
[ 2746.184488] ss_tcp_data_ready+0x44/0xc0 [tempesta_fw]
[ 2746.184804] tcp_data_ready+0x2b/0xd0
[ 2746.185023] tcp_data_queue+0x805/0xe50
[ 2746.185261] tcp_rcv_established+0x254/0x6b0
[ 2746.185515] tcp_v4_do_rcv+0x140/0x200
[ 2746.185737] tcp_v4_rcv+0xcd0/0xe20
[ 2746.185945] ip_protocol_deliver_rcu+0x44/0x230
[ 2746.186214] ip_local_deliver_finish+0x48/0x60
[ 2746.186479] ip_local_deliver+0xf8/0x110
[ 2746.186698] ? ip_protocol_deliver_rcu+0x230/0x230
[ 2746.186960] ip_rcv_finish+0x87/0xa0
[ 2746.187171] ip_rcv+0xce/0xe0
[ 2746.187338] ? ip_rcv_finish_core.constprop.0+0x470/0x470
[ 2746.187658] __netif_receive_skb_one_core+0x86/0xa0
[ 2746.187952] __netif_receive_skb+0x18/0x60
[ 2746.188194] process_backlog+0x9e/0x170
[ 2746.188424] net_rx_action+0x13b/0x430
[ 2746.188673] __do_softirq+0xe3/0x340
[ 2746.188879] asm_call_irq_on_stack+0xf/0x20
[ 2746.189109]
[ 2746.189229] do_softirq_own_stack+0x3d/0x50
[ 2746.189473] do_softirq+0x66/0x80
[ 2746.189656] __local_bh_enable_ip+0x50/0x60
[ 2746.189886] ip_finish_output2+0x1bb/0x5c0
[ 2746.190118] ? nf_conntrack_in+0x200/0x680 [nf_conntrack]
[ 2746.190443] __ip_finish_output+0xca/0x210
[ 2746.190689] ip_finish_output+0x31/0xc0
[ 2746.190918] ip_output+0x78/0x100
[ 2746.191123] ? __ip_finish_output+0x210/0x210
[ 2746.191362] __ip_queue_xmit+0x18d/0x440
[ 2746.191581] ? __ip_finish_output+0x210/0x210
[ 2746.191820] ip_queue_xmit+0x15/0x20
[ 2746.192033] __tcp_transmit_skb+0xae3/0xcd0
[ 2746.192305] tcp_write_xmit+0x423/0x1260
[ 2746.192523] ? _copy_from_iter_full+0x8a/0x280
[ 2746.192796] __tcp_push_pending_frames+0x37/0x100
[ 2746.193067] tcp_push+0xd3/0x100
[ 2746.193246] tcp_sendmsg_locked+0xad7/0xc40
[ 2746.193495] tcp_sendmsg+0x2d/0x50
[ 2746.193684] inet_sendmsg+0x43/0x70
[ 2746.193904] sock_sendmsg+0x6a/0x80
[ 2746.194210] __sys_sendto+0x113/0x190
[ 2746.194516] ? __sys_connect+0xa2/0xd0
[ 2746.194846] ? do_vfs_ioctl+0x20b/0x6b0
[ 2746.195120] ? switch_fpu_return+0x49/0xc0
[ 2746.195391] __x64_sys_sendto+0x24/0x30
[ 2746.195689] do_syscall_64+0x38/0x90
[ 2746.195917] entry_SYSCALL_64_after_hwframe+0x44/0xa9

@krizhanovsky (Contributor, Author)

I reproduced exactly the same crash as @const-t did, using his kernel config.

@const-t (Contributor) left a comment:

LGTM

req_sent might be NULL; tfw_http_conn_on_hold() checks this but may
still return true, and then we may crash on a NULL pointer dereference,
so check req_sent before entering the if condition.
corresponding to the client connection pair. The reference counter
is set under the server socket spinlock. This makes the server
connection management safe: the server connection cannot pass away
from under the client connection.

The client connection cannot be freed until the upgrade response is
received and processed (the response is either dropped or
tfw_http_websocket_upgrade() is called), so we never see a freed
client connection in tfw_http_websocket_upgrade().

The only remaining problem is that tfw_http_websocket_upgrade() makes
the connection pair but does not acquire one more reference count
on the client connection.

This patch adds the client connection reference count and decrements
it in tfw_ws_conn_drop() when the server connection dies.
about HTTP and is mostly about connection handling. Now we can
export the higher-level tfw_http_websocket_upgrade() instead of
tfw_ws_srv_new_steal_sk().

Add an assertion that the websocket upgrade happens under the server socket lock.
forwarding a response. This is required to keep the request in the
seq_queue, so that the client connection is referenced by at least
the request paired with the currently processed response.

Move tfw_ws_cli_mod_timer() above the first user.

More accurate and safe connection initialization and putting.
and client websocket connections:

* tfw_ws_srv_ss_hook_drop() is used for server connections only, so
  it should use the same sequence of calls as tfw_sock_clnt_drop()
  for client connections.

* at upgrade time both the server and the client connection must have
  a reference count of at least 2: one for the pairing pointers and
  one for the downcall destruction. Only the client connection may
  have a refcounter > 2 (in case of pipelined HTTP requests after the
  upgrade request).

* tfw_ws_conn_unpair() is now responsible for its own connection
  only, since it doesn't know the state of its pair. This makes the
  code robust against a concurrent drop of the paired connections.

* obviously, we should put a connection only after closing it.

* Small coding style cleanups and additional tracing.
has just recovered after the destructor call in
tfw_ws_srv_new_steal_sk(). In this case tfw_ws_srv_new_steal_sk()
might clear TFW_CONN_B_UNSCHED after it was set by a concurrent
tfw_http_req_fwd_send(), which would make the connection available
for transmission again.

We still need clear_bit(TFW_CONN_B_UNSCHED) in normal operation
since we have a fixed and limited number of server connections and
must make them available again after the socket stealing.
  DEBUG=3 DBG_HTTP=3 DBG_TLS=3 DBG_SS=3 DBG_WS=3 make

kvmalloc_node() and kvfree() symbols can't be found.
just raises the reconnection timer and the srv_conn remains with
sk == NULL, so this isn't a good place to clear the TFW_CONN_B_UNSCHED
bit. This patch clears TFW_CONN_B_UNSCHED on error in
tfw_ws_srv_new_steal_sk(), or when a server connection is established
in tfw_sock_srv_connect_complete().