Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #782: release server connection resources on server disconnect #814

Closed
wants to merge 1 commit into from

Conversation

dkirjanov
Copy link
Contributor

there is a chance that the server connection resources wont't be released
on server disconnect and tempesta shutdown threafter so we have to explicitly
call the tfw_connection_release()

[17193.213542] ------------[ cut here ]------------
[17193.217477] Kernel BUG at ffffffffc04f06e7 [verbose debug info unavailable]
[17193.217477] invalid opcode: 0000 [#1] SMP
[17193.217477] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O)
tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O)
bochs_drm ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect
ppdev input_leds led_class sg serio_raw sysimgblt parport_pc parport
pcspkr button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto
mbcache sr_mod cdrom sd_mod ata_generic crc32c_intel psmouse ata_piix
libata i2c_piix4 e1000 scsi_mod floppy [last unloaded: tempesta_tls]
[17193.217477] CPU: 1 PID: 4288 Comm: sysctl Tainted: G O 4.9.35 #2
[17193.217477] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[17193.217477] task: ffff93939a8f6700 task.stack: ffffa400005f0000
[17193.217477] RIP: 0010:[] []
tfw_sock_srv_del_conns+0xf7/0x110 [tempesta_fw]
[17193.217477] RSP: 0018:ffffa400005f3ce8 EFLAGS: 00010202
[17193.217477] RAX: 0000000000000001 RBX: ffff93937642b2a0 RCX: ffffffffc0500a50
[17193.217477] RDX: ffff93937642b350 RSI: ffff93939ba9c700 RDI: ffff9393734a6420
[17193.217477] RBP: ffff93937642b2d8 R08: fffffffffffffffc R09: 0000000000000003
[17193.217477] R10: 0010000100023588 R11: 0000000000000000 R12: ffff9393734a63d8
[17193.217477] R13: ffff9393734a6410 R14: ffff9393734a6410 R15: ffffa400005f3f20
[17193.217477] FS: 00007fbc0655c880(0000) GS:ffff9393bfd00000(0000)
knlGS:0000000000000000
[17193.217477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17193.217477] CR2: 00007fbc05c55d00 CR3: 0000000036565000 CR4: 00000000000006e0
[17193.217477] Stack:
[17193.217477] ffff9393734a6410 ffff93939ba9c490 ffffffffc04f05f0
ffff93939ba9c480
[17193.217477] 0000561d95e7e2e0 ffffffffc04eceb8 ece581022d7a1302
ffffffffc0500fd0
[17193.217477] ffffffffc0500fd0 dead000000000200 dead000000000100
ffffffffc04f0835
[17193.217477] Call Trace:
[17193.217477] [] ?
tfw_cfg_handle_ratio_predyn_opts+0x150/0x150 [tempesta_fw]
[17193.217477] [] ? tfw_sg_for_each_srv+0x58/0x90
[tempesta_fw]
[17193.217477] [] ?
tfw_clean_srv_groups+0x135/0x150 [tempesta_fw]
[17193.217477] [] ? tfw_cfg_stop+0x6f/0xb0 [tempesta_fw]
[17193.217477] [] ?
handle_sysctl_state_io+0x19d/0x1d0 [tempesta_fw]
[17193.217477] [] ?
handle_sysctl_state_io+0x3a/0x1d0 [tempesta_fw]
[17193.217477] [] ? proc_sys_call_handler+0xde/0x100
[17193.217477] [] ? __vfs_write+0x2e/0x160
[17193.217477] [] ? vfs_write+0xab/0x190
[17193.217477] [] ? SyS_write+0x4d/0xb0
[17193.217477] [] ? entry_SYSCALL_64_fastpath+0x17/0x98
[17193.217477] Code: 38 48 83 e8 38 4c 39 f5 74 23 4d 8b ac 24 88 00
00 00 4d 85 ed 74 14 49 8b 54 24 38 4c 89 e3 49 89 c4 48 39 d5 0f 85
46 ff ff ff <0f> 0b 5b 31 c0 5d 41 5c 41 5d 41 5e c3 66 90 66 2e 0f 1f
84 00
[17193.217477] RIP []
tfw_sock_srv_del_conns+0xf7/0x110 [tempesta_fw]
[17193.217477] RSP
[17193.473816] ---[ end trace c254541427767bd1 ]---
[17193.485634] [tempesta] Un-registering scheduler: hash
[17193.518667] [tempesta] Un-registering scheduler: http
[17193.550624] [tempesta] Un-registering scheduler: ratio
[17193.585109] [tempesta] exiting...
[17193.588362] kmem_cache_destroy tfw_srv_conn_cache: Slab cache still
has objects
[17193.594033] CPU: 0 PID: 4301 Comm: rmmod Tainted: G D O 4.9.35 #2
[17193.598002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[17193.598002] 0000000000000000 ffffffffa569ddf8 ffff93939a0bdb40
0000000000000000
[17193.598002] ffffffffa553c59e 00ff93939a0bdae0 ffffa400003b7e70
ffffa400003b7e70
[17193.598002] ffffa400003b7e80 ffffa400003b7e80 4501bb40293c7162
000000000000000a
[17193.598002] Call Trace:
[17193.598002] [] ? dump_stack+0x46/0x5e
[17193.598002] [] ? kmem_cache_destroy+0x23e/0x250
[17193.598002] [] ? tfw_exit+0x2b/0x60 [tempesta_fw]
[17193.598002] [] ? SyS_delete_module+0x178/0x240
[17193.598002] [] ? exit_to_usermode_loop+0x64/0x80
[17193.598002] [] ? entry_SYSCALL_64_fastpath+0x17/0x98

Signed-off-by: Denis Kirjanov dk@tempesta-tech.com

there is a chance that the server connection resources wont't be released
on server disconnect and tempesta shutdown threafter so we have to explicitly
call the tfw_connection_release()

[17193.213542] ------------[ cut here ]------------
[17193.217477] Kernel BUG at ffffffffc04f06e7 [verbose debug info unavailable]
[17193.217477] invalid opcode: 0000 [#1] SMP
[17193.217477] Modules linked in: tfw_sched_ratio(O) tfw_sched_http(O)
tfw_sched_hash(O) tempesta_fw(O) tempesta_db(O) tempesta_tls(O)
bochs_drm ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect
ppdev input_leds led_class sg serio_raw sysimgblt parport_pc parport
pcspkr button ip_tables x_tables autofs4 ext4 crc16 jbd2 fscrypto
mbcache sr_mod cdrom sd_mod ata_generic crc32c_intel psmouse ata_piix
libata i2c_piix4 e1000 scsi_mod floppy [last unloaded: tempesta_tls]
[17193.217477] CPU: 1 PID: 4288 Comm: sysctl Tainted: G           O    4.9.35 #2
[17193.217477] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[17193.217477] task: ffff93939a8f6700 task.stack: ffffa400005f0000
[17193.217477] RIP: 0010:[<ffffffffc04f06e7>]  [<ffffffffc04f06e7>]
tfw_sock_srv_del_conns+0xf7/0x110 [tempesta_fw]
[17193.217477] RSP: 0018:ffffa400005f3ce8  EFLAGS: 00010202
[17193.217477] RAX: 0000000000000001 RBX: ffff93937642b2a0 RCX: ffffffffc0500a50
[17193.217477] RDX: ffff93937642b350 RSI: ffff93939ba9c700 RDI: ffff9393734a6420
[17193.217477] RBP: ffff93937642b2d8 R08: fffffffffffffffc R09: 0000000000000003
[17193.217477] R10: 0010000100023588 R11: 0000000000000000 R12: ffff9393734a63d8
[17193.217477] R13: ffff9393734a6410 R14: ffff9393734a6410 R15: ffffa400005f3f20
[17193.217477] FS:  00007fbc0655c880(0000) GS:ffff9393bfd00000(0000)
knlGS:0000000000000000
[17193.217477] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17193.217477] CR2: 00007fbc05c55d00 CR3: 0000000036565000 CR4: 00000000000006e0
[17193.217477] Stack:
[17193.217477]  ffff9393734a6410 ffff93939ba9c490 ffffffffc04f05f0
ffff93939ba9c480
[17193.217477]  0000561d95e7e2e0 ffffffffc04eceb8 ece581022d7a1302
ffffffffc0500fd0
[17193.217477]  ffffffffc0500fd0 dead000000000200 dead000000000100
ffffffffc04f0835
[17193.217477] Call Trace:
[17193.217477]  [<ffffffffc04f05f0>] ?
tfw_cfg_handle_ratio_predyn_opts+0x150/0x150 [tempesta_fw]
[17193.217477]  [<ffffffffc04eceb8>] ? tfw_sg_for_each_srv+0x58/0x90
[tempesta_fw]
[17193.217477]  [<ffffffffc04f0835>] ?
tfw_clean_srv_groups+0x135/0x150 [tempesta_fw]
[17193.217477]  [<ffffffffc04d0c5f>] ? tfw_cfg_stop+0x6f/0xb0 [tempesta_fw]
[17193.217477]  [<ffffffffc04eb98d>] ?
handle_sysctl_state_io+0x19d/0x1d0 [tempesta_fw]
[17193.217477]  [<ffffffffc04eb82a>] ?
handle_sysctl_state_io+0x3a/0x1d0 [tempesta_fw]
[17193.217477]  [<ffffffffa55fc22e>] ? proc_sys_call_handler+0xde/0x100
[17193.217477]  [<ffffffffa558b6ae>] ? __vfs_write+0x2e/0x160
[17193.217477]  [<ffffffffa558bdab>] ? vfs_write+0xab/0x190
[17193.217477]  [<ffffffffa558d14d>] ? SyS_write+0x4d/0xb0
[17193.217477]  [<ffffffffa59278e4>] ? entry_SYSCALL_64_fastpath+0x17/0x98
[17193.217477] Code: 38 48 83 e8 38 4c 39 f5 74 23 4d 8b ac 24 88 00
00 00 4d 85 ed 74 14 49 8b 54 24 38 4c 89 e3 49 89 c4 48 39 d5 0f 85
46 ff ff ff <0f> 0b 5b 31 c0 5d 41 5c 41 5d 41 5e c3 66 90 66 2e 0f 1f
84 00
[17193.217477] RIP  [<ffffffffc04f06e7>]
tfw_sock_srv_del_conns+0xf7/0x110 [tempesta_fw]
[17193.217477]  RSP <ffffa400005f3ce8>
[17193.473816] ---[ end trace c254541427767bd1 ]---
[17193.485634] [tempesta] Un-registering scheduler: hash
[17193.518667] [tempesta] Un-registering scheduler: http
[17193.550624] [tempesta] Un-registering scheduler: ratio
[17193.585109] [tempesta] exiting...
[17193.588362] kmem_cache_destroy tfw_srv_conn_cache: Slab cache still
has objects
[17193.594033] CPU: 0 PID: 4301 Comm: rmmod Tainted: G      D    O    4.9.35 #2
[17193.598002] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[17193.598002]  0000000000000000 ffffffffa569ddf8 ffff93939a0bdb40
0000000000000000
[17193.598002]  ffffffffa553c59e 00ff93939a0bdae0 ffffa400003b7e70
ffffa400003b7e70
[17193.598002]  ffffa400003b7e80 ffffa400003b7e80 4501bb40293c7162
000000000000000a
[17193.598002] Call Trace:
[17193.598002]  [<ffffffffa569ddf8>] ? dump_stack+0x46/0x5e
[17193.598002]  [<ffffffffa553c59e>] ? kmem_cache_destroy+0x23e/0x250
[17193.598002]  [<ffffffffc04eb9eb>] ? tfw_exit+0x2b/0x60 [tempesta_fw]
[17193.598002]  [<ffffffffa54cb858>] ? SyS_delete_module+0x178/0x240
[17193.598002]  [<ffffffffa5402064>] ? exit_to_usermode_loop+0x64/0x80
[17193.598002]  [<ffffffffa59278e4>] ? entry_SYSCALL_64_fastpath+0x17/0x98

Signed-off-by: Denis Kirjanov <dk@tempesta-tech.com>
@dkirjanov dkirjanov changed the title Fix #782: release a server connection resources on server disconnect Fix #782: release server connection resources on server disconnect Aug 23, 2017
Copy link
Contributor

@keshonok keshonok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix itself is correct. However, a change is required.

@@ -414,6 +414,8 @@ tfw_sock_srv_disconnect(TfwConn *conn)
if (atomic_read(&conn->refcnt) != TFW_CONN_DEATHCNT)
ret = ss_close_sync(sk, true);

tfw_connection_release(conn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the fix is correct. However, an explanation of the fix is completely lost. The code interactions are very complex, so an explanation needs to be put somewhere. Perhaps as a comment to the tfw_sock_srv_disconnect() function (best!), and possibly as a comment to the issue itself on GitHub and in the commit message.

I think that putting a crash trace in the commit message is overkill.

My understanding of the issue is as follows:

  • There are two different ways of closing a server connection. One is closing by a backend. The other is closing by Tempesta. These two different ways of a close go through slightly different code paths.
    • A close by a backend is considered to be temporary. In that case all pending requests in the queue of a connection are resent to the backend if the connection is restored relatively quickly, or rescheduled to other connections. In other words, the requests are not lost, and the communication between clients and backends is not stopped.
    • A close by Tempesta on shutdown is considered permanent. All resources are released, which includes all pending requests.
  • Closing of a connection is a competitive process as it can be invoked concurrently by either a backend or Tempesta. The code is deliberately written in such a way so that only one of concurrent closes is allowed to proceed. Given that, there's a chance that even when Tempesta is in shutdown a request for a close from a backend gets through a tad bit earlier, so the process of closing would go on an undesired path. The request for a close from Tempesta won't get through. Therefore there's the need to assure that the connection's resources are definitely released in this case. Hence the fix.

Copy link
Contributor

@vankoven vankoven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix is good, but comment is a mandatory, since explicit release is used in addition to reference counting.

I also don't see any need in crash case in the commit message. We have bug tracker and the issue number is sufficient information to describe the crash.

@@ -414,6 +414,8 @@ tfw_sock_srv_disconnect(TfwConn *conn)
if (atomic_read(&conn->refcnt) != TFW_CONN_DEATHCNT)
ret = ss_close_sync(sk, true);

tfw_connection_release(conn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally connections are destroyed when a reference counter reach zero or DEATH value, and it's not obvious why need explicit release() here. A comment is needed here or before function.

@dkirjanov dkirjanov closed this Aug 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants