
Continuous memory leak with console_subscriber #184

Closed
Seppel3210 opened this issue Nov 17, 2021 · 5 comments · May be fixed by #457
Labels
C-subscriber Crate: console-subscriber. S-bug Severity: bug

Comments


Seppel3210 commented Nov 17, 2021

When calling console_subscriber::init() there is a large memory leak that seems to grow faster the more futures are spawned.
I ran the application through Valgrind, which reported that all the blocks were still reachable (pointers to them still exist), so I'm guessing it's some sort of circular reference-counted data structure that isn't using Weak properly (or something equivalent).

When I commented out the console_subscriber::init() line, the leak went away completely (the process now idles at 3 MB, where it would grow to 252 MB after running for a couple of hours with the subscriber).

@Seppel3210 changed the title from “Continuous memory leak with `console_subscriber`” to “Continuous memory leak with console_subscriber” Nov 17, 2021
Member

hawkw commented Dec 2, 2021

Over what time interval was the memory leak observed, and does it eventually become stable?

I'm wondering about this because it's possible what you're seeing is not actually a memory leak per se, but memory use: by default, the console subscriber will store historical data for tasks when no clients are connected, so that this data can be played back to the client when it connects. Of course, this uses memory.

The historical data that's stored by default includes data for tasks which have completed, based on the assumption that users may wish to see information about what previously happened in the system. This data is retained for a period of time after the task completes, and then is eventually deleted. By default, completed tasks are retained for an hour after the task completes.

You can change how long completed tasks are retained for using a builder setting if you're using the TasksLayer::builder API, or with the TOKIO_CONSOLE_RETENTION_SECS environment variable if you're using console_subscriber::init(). Decreasing the retention duration should reduce the console subscriber's memory use significantly.

Of course, it's also certainly possible that there is an actual memory leak somewhere (where memory is never actually deallocated) --- this is still pre-release software, and there might be any number of bugs. :) But it would be good to isolate that from the memory used to store historical data. Can you try changing the value of TOKIO_CONSOLE_RETENTION_SECS to something much lower (say, 60 seconds) and see if memory use continues to increase over a long period of runtime?

@hawkw hawkw added the C-subscriber Crate: console-subscriber. label Dec 2, 2021
@Milo123459
Contributor

It would go up by about a megabyte every 10-15 minutes or so, even when no tasks were being spawned. This meant that overnight it could add up to 250 MB more.

@AdamLeyshon

Just thought I'd give a little bit of info here, as it might be related:
I've noticed that if I don't connect the console at all and leave it for a few days, the memory usage climbs forever.

However, the instant I connect the console, the memory is freed. I have also noticed that if I use the builder and set the retention period instead of using the default, this does not occur.

use std::time::Duration;

if let Some(config) = settings.debug.as_ref() {
    if config.enabled {
        // Start the Tokio Console server with a 5-minute retention period.
        console_subscriber::ConsoleLayer::builder()
            .retention(Duration::from_secs(300))
            .server_addr(([0, 0, 0, 0], config.port))
            .init();
    }
}

Cargo.toml:

# Async runtime
tokio = { version = "1.20.1", features = ["full", "rt-multi-thread"] }
tokio-stream = "0.1.11"
futures = "0.3.0"
futures-util = "0.3.21"
async-trait = "0.1.57"
console-subscriber = "0.1.8"

This AWS instance has 2 GB of RAM, and the program was using 55% of it; the normal amount is about 20 MB. When I last checked it was at 1.1 GB, and after opening the console it dropped back to 23 MB. This was over a period of approx. 6 days.


@AdamLeyshon

Looking at it again, this only seems to be part of the problem.
Even with retention set to 5 minutes, it still appears to leak forever.
In 15 minutes it made 3706 allocations totalling about 44 KB, but if left unchecked for several days, this grows to hundreds of megabytes.


Most of them occur where I use Tokio's sleep in a loop of some sort, but I suppose any spawned async task could lead to this, not just sleep.

#0 [libc.so.6] 0x00007FB0C332C9FF
#1 [libc.so.6] 0x00007FB0C329AB42
#2 [registry-rpc] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#3 [registry-rpc] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#4 [registry-rpc] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#5 [registry-rpc] core::ops::function::FnOnce::call_once{{vtable.shim}} [function.rs:251]
#6 [registry-rpc] std::thread::Builder::spawn_unchecked_::{{closure}} [mod.rs:550]
#7 [registry-rpc] std::panic::catch_unwind [panic.rs:137]
#8 [registry-rpc] std::panicking::try [panicking.rs:447]
#9 [registry-rpc] __rust_try
#10 [registry-rpc] std::panicking::try::do_call [panicking.rs:483]
#11 [registry-rpc] <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once [unwind_safe.rs:271]
#12 [registry-rpc] std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} [mod.rs:551]
#13 [registry-rpc] std::sys_common::backtrace::__rust_begin_short_backtrace [backtrace.rs:121]
#14 [registry-rpc] tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} [pool.rs:469]
#15 [registry-rpc] tokio::runtime::blocking::pool::Inner::run [pool.rs:511]
#16 [registry-rpc] tokio::runtime::blocking::pool::Task::run [pool.rs:159]
#17 [registry-rpc] tokio::runtime::task::UnownedTask<S>::run [mod.rs:431]
#18 [registry-rpc] tokio::runtime::task::raw::RawTask::poll [raw.rs:200]
#19 [registry-rpc] tokio::runtime::task::raw::poll [raw.rs:255]
#20 [registry-rpc] tokio::runtime::task::harness::Harness<T,S>::poll [harness.rs:152]
#21 [registry-rpc] tokio::runtime::task::harness::Harness<T,S>::poll_inner [harness.rs:198]
#22 [registry-rpc] tokio::runtime::task::harness::poll_future [harness.rs:464]
#23 [registry-rpc] std::panic::catch_unwind [panic.rs:137]
#24 [registry-rpc] std::panicking::try [panicking.rs:447]
#25 [registry-rpc] __rust_try
#26 [registry-rpc] std::panicking::try::do_call [panicking.rs:483]
#27 [registry-rpc] <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once [unwind_safe.rs:271]
#28 [registry-rpc] tokio::runtime::task::harness::poll_future::{{closure}} [harness.rs:476]
#29 [registry-rpc] tokio::runtime::task::core::Core<T,S>::poll [core.rs:212]
#30 [registry-rpc] tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut [unsafe_cell.rs:14]
#31 [registry-rpc] tokio::runtime::task::core::Core<T,S>::poll::{{closure}} [core.rs:223]
#32 [registry-rpc] <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll [instrument.rs:272]
#33 [registry-rpc] <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll [task.rs:42]
#34 [registry-rpc] tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}} [worker.rs:365]
#35 [registry-rpc] tokio::runtime::scheduler::multi_thread::worker::run [worker.rs:403]
#36 [registry-rpc] tokio::macros::scoped_tls::ScopedKey<T>::set [scoped_tls.rs:61]
#37 [registry-rpc] tokio::runtime::scheduler::multi_thread::worker::run::{{closure}} [worker.rs:406]
#38 [registry-rpc] tokio::runtime::scheduler::multi_thread::worker::Context::run [worker.rs:426]
#39 [registry-rpc] tokio::runtime::scheduler::multi_thread::worker::Context::run_task [worker.rs:463]
#40 [registry-rpc] tokio::runtime::coop::budget [coop.rs:68]
#41 [registry-rpc] tokio::runtime::coop::with_budget [coop.rs:102]
#42 [registry-rpc] tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}} [worker.rs:464]
#43 [registry-rpc] tokio::runtime::task::LocalNotified<S>::run [mod.rs:394]
#44 [registry-rpc] tokio::runtime::task::raw::RawTask::poll [raw.rs:200]
#45 [registry-rpc] tokio::runtime::task::raw::poll [raw.rs:255]
#46 [registry-rpc] tokio::runtime::task::harness::Harness<T,S>::poll [harness.rs:152]
#47 [registry-rpc] tokio::runtime::task::harness::Harness<T,S>::poll_inner [harness.rs:198]
#48 [registry-rpc] tokio::runtime::task::harness::poll_future [harness.rs:464]
#49 [registry-rpc] std::panic::catch_unwind [panic.rs:137]
#50 [registry-rpc] std::panicking::try [panicking.rs:447]
#51 [registry-rpc] __rust_try
#52 [registry-rpc] std::panicking::try::do_call [panicking.rs:483]
#53 [registry-rpc] <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once [unwind_safe.rs:271]
#54 [registry-rpc] tokio::runtime::task::harness::poll_future::{{closure}} [harness.rs:476]
#55 [registry-rpc] tokio::runtime::task::core::Core<T,S>::poll [core.rs:212]
#56 [registry-rpc] tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut [unsafe_cell.rs:14]
#57 [registry-rpc] tokio::runtime::task::core::Core<T,S>::poll::{{closure}} [core.rs:223]
#58 [registry-rpc] <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll [instrument.rs:272]
#59 [registry-rpc] <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll [mod.rs:91]
**#60 [registry-rpc] registry_rs::internal::controller::rebalance_handler::{{closure}} [controller.rs:238]**
#61 [registry-rpc] <tokio::time::sleep::Sleep as core::future::future::Future>::poll [sleep.rs:433]
#62 [registry-rpc] tokio::time::sleep::Sleep::poll_elapsed [sleep.rs:393]
#63 [registry-rpc] tokio::time::sleep::Sleep::poll_elapsed::{{closure}} [macros.rs:664]
#64 [registry-rpc] tracing_core::event::Event::dispatch [event.rs:34]
#65 [registry-rpc] tracing_core::dispatcher::get_default [dispatcher.rs:368]
#66 [registry-rpc] std::thread::local::LocalKey<T>::try_with [local.rs:446]
#67 [registry-rpc] tracing_core::dispatcher::get_default::{{closure}} [dispatcher.rs:371]
#68 [registry-rpc] tracing_core::event::Event::dispatch::{{closure}} [event.rs:35]
#69 [registry-rpc] tracing_core::dispatcher::Dispatch::event [dispatcher.rs:586]
#70 [registry-rpc] <alloc::sync::Arc<S> as tracing_core::subscriber::Subscriber>::event [subscriber.rs:828]
#71 [registry-rpc] <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::event [layered.rs:152]
#72 [registry-rpc] <tracing_subscriber::layer::layered::Layered<L,S> as tracing_core::subscriber::Subscriber>::event [layered.rs:153]
#73 [registry-rpc] <tracing_subscriber::filter::layer_filters::Filtered<L,F,S> as tracing_subscriber::layer::Layer<S>>::on_event [mod.rs:724]
#74 [registry-rpc] tracing_subscriber::filter::layer_filters::Filtered<L,F,S>::did_enable [mod.rs:528]
#75 [registry-rpc] std::thread::local::LocalKey<T>::with [local.rs:422]
#76 [registry-rpc] std::thread::local::LocalKey<T>::try_with [local.rs:446]
#77 [registry-rpc] tracing_subscriber::filter::layer_filters::Filtered<L,F,S>::did_enable::{{closure}} [mod.rs:528]
#78 [registry-rpc] tracing_subscriber::filter::layer_filters::FilterState::did_enable [mod.rs:1068]
#79 [registry-rpc] <tracing_subscriber::filter::layer_filters::Filtered<L,F,S> as tracing_subscriber::layer::Layer<S>>::on_event::{{closure}} [mod.rs:725]
#80 [registry-rpc] <console_subscriber::ConsoleLayer as tracing_subscriber::layer::Layer<S>>::on_event [lib.rs:701]
#81 [registry-rpc] tracing_core::event::Event::record [event.rs:87]
#82 [registry-rpc] tracing_core::field::ValueSet::record [field.rs:997]
#83 [registry-rpc] <&T as tracing_core::field::Value>::record [field.rs:585]
#84 [registry-rpc] <&T as tracing_core::field::Value>::record [field.rs:585]
#85 [registry-rpc] <str as tracing_core::field::Value>::record [field.rs:530]
#86 [registry-rpc] <console_subscriber::visitors::PollOpVisitor as tracing_core::field::Visit>::record_str [visitors.rs:433]
#87 [registry-rpc] <str as alloc::string::ToString>::to_string [string.rs:2599]
#88 [registry-rpc] <alloc::string::String as core::convert::From<&str>>::from [string.rs:2653]
#89 [registry-rpc] alloc::str::<impl alloc::borrow::ToOwned for str>::to_owned [str.rs:209]
#90 [registry-rpc] alloc::slice::<impl alloc::borrow::ToOwned for [T]>::to_owned [slice.rs:786]
#91 [registry-rpc] alloc::slice::<impl [T]>::to_vec [slice.rs:411]
#92 [registry-rpc] alloc::slice::<impl [T]>::to_vec_in [slice.rs:436]
#93 [registry-rpc] alloc::slice::hack::to_vec [slice.rs:106]
#94 [registry-rpc] <T as alloc::slice::hack::ConvertVec>::to_vec [slice.rs:157]
#95 [registry-rpc] alloc::vec::Vec<T,A>::with_capacity_in [mod.rs:673]
#96 [registry-rpc] alloc::raw_vec::RawVec<T,A>::with_capacity_in [raw_vec.rs:131]
#97 [registry-rpc] alloc::raw_vec::RawVec<T,A>::allocate_in [raw_vec.rs:185]
#98 [registry-rpc] <alloc::alloc::Global as core::alloc::Allocator>::allocate [alloc.rs:241]
#99 [registry-rpc] alloc::alloc::Global::alloc_impl [alloc.rs:181]
#100 [registry-rpc] alloc::alloc::alloc [alloc.rs:99]
#101 [libbytehound.so] malloc [api.rs:294]

grahamking pushed a commit to grahamking/console that referenced this issue Dec 14, 2023
Do not record poll_ops if there are no current connected clients
(watchers). Without this `Aggregator::poll_ops` would grow forever.

Follow up to tokio-rs#311 and
fix for these two:
- tokio-rs#184
- tokio-rs#500
@grahamking
Contributor

> I've noticed that if I don't connect the console at all and leave it a few days the memory usage climbs forever.
> However, the instant I connect the console, the memory is freed

I think this is Aggregator::poll_ops. It grows forever if there are no connected consoles to send the data to. This fixes it for me: #501

@hawkw hawkw closed this as completed in 1656c79 Jan 22, 2024