Improve Garbage Collection (#8154)
### Description

Simplify and improve GC

This improves the GC queue.

The job of the GC queue is to find tasks that should be garbage
collected. There are three factors which influence that:

* age of the task: time since the last access.
* memory usage of the task.
* compute duration of the task: CPU time spent computing the task.

Memory usage and compute duration combine into a GC priority by
calculating: `(memory_usage + C1) / (compute_duration + C2)`. C1 and C2
are constants used to fine-tune the priority.
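As a minimal sketch, the priority formula could be expressed like this. The constant values below are illustrative assumptions, not the ones used in the commit; a higher priority means a better garbage-collection candidate (large footprint, cheap to recompute):

```rust
// Illustrative constants: C1 smooths the priority for tasks with tiny
// memory usage, C2 avoids division by zero for near-instant tasks.
const C1: f32 = 1024.0;
const C2: f32 = 0.001;

/// Compute the GC priority of a task from its memory usage and the CPU
/// time it took to compute. Higher values are collected first.
fn gc_priority(memory_usage_bytes: usize, compute_duration_secs: f32) -> f32 {
    (memory_usage_bytes as f32 + C1) / (compute_duration_secs + C2)
}
```

The ratio favors tasks that hold a lot of memory but were cheap to compute, since recomputing them on demand costs little.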

The age of the task is constantly changing so a different scheme is used
here:

Every task has a generation in which it was last accessed.
The generation is increased every 100,000 tasks.

We accumulate tasks of the current generation in a concurrent queue.
Once 100,000 tasks are reached (tracked by an atomic counter), we
increase the generation and pop 100,000 tasks from the queue into an
`OldGeneration`. These old generations are stored in another queue. No
sorting is applied at this point; these are just lists of task ids.
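The generation scheme above can be sketched roughly as follows. `GcQueue`, `OldGeneration`, and the plain `Vec` standing in for the concurrent queue are illustrative simplifications, not the commit's actual types:

```rust
use std::collections::VecDeque;

// Tasks per generation, per the description above.
const GENERATION_SIZE: usize = 100_000;

type TaskId = u32;

/// A sealed batch of task ids: a plain, unsorted list.
struct OldGeneration {
    generation: u32,
    tasks: Vec<TaskId>,
}

struct GcQueue {
    generation: u32,
    /// Stands in for the concurrent queue of the current generation.
    current: Vec<TaskId>,
    /// Old generations, oldest at the front.
    old_generations: VecDeque<OldGeneration>,
}

impl GcQueue {
    fn new() -> Self {
        GcQueue {
            generation: 0,
            current: Vec::new(),
            old_generations: VecDeque::new(),
        }
    }

    /// Record a task access; returns the generation it was accessed in.
    /// Every GENERATION_SIZE accesses, the current batch is sealed into
    /// an `OldGeneration` and the generation counter is bumped.
    fn task_accessed(&mut self, task: TaskId) -> u32 {
        let accessed_in = self.generation;
        self.current.push(task);
        if self.current.len() >= GENERATION_SIZE {
            let tasks = std::mem::take(&mut self.current);
            self.old_generations.push_back(OldGeneration {
                generation: accessed_in,
                tasks,
            });
            self.generation += 1;
        }
        accessed_in
    }
}
```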

Once we need to perform GC, we pop the oldest old generation from the
queue, filter out all tasks that are in a higher generation (they have
been accessed in the meantime), and sort the list by GC priority.
Then we take the top 30% of tasks and garbage collect them.
The remaining tasks are pushed to the front of the queue again,
intermixed with other tasks in existing old generations, until a
generation item reaches a maximum of 200,000 tasks; in that case the
generation item is split into two items.
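A single GC pass over the oldest generation might look roughly like this. The `current_generation_of` and `priority_of` lookups are assumed helpers standing in for the task table; only the filter/sort/top-30% steps follow the description, and the re-queueing and 200,000-task split are left out:

```rust
/// One GC pass over a sealed generation: drop tasks accessed since the
/// generation was sealed, sort by GC priority (highest first), and split
/// off the top 30% for collection. Returns (to_collect, to_requeue).
fn gc_pass(
    sealed_generation: u32,
    tasks: Vec<u32>,
    current_generation_of: impl Fn(u32) -> u32,
    priority_of: impl Fn(u32) -> f32,
) -> (Vec<u32>, Vec<u32>) {
    // Filter out tasks that are in a higher generation: they have been
    // accessed in the meantime and must not be collected from here.
    let mut candidates: Vec<u32> = tasks
        .into_iter()
        .filter(|&t| current_generation_of(t) <= sealed_generation)
        .collect();
    // Sort by GC priority, highest priority first.
    candidates.sort_by(|&a, &b| priority_of(b).partial_cmp(&priority_of(a)).unwrap());
    // Take the top 30%; the rest would be pushed back onto the queue.
    let cutoff = candidates.len() * 3 / 10;
    let remaining = candidates.split_off(cutoff);
    (candidates, remaining)
}
```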

sokra committed May 17, 2024
1 parent ef69b7e commit 864a6ad
Showing 28 changed files with 439 additions and 2,458 deletions.
1 change: 1 addition & 0 deletions Cargo.lock


30 changes: 3 additions & 27 deletions crates/node-file-trace/src/lib.rs
@@ -6,7 +6,6 @@ mod nft_json;
use std::{
collections::{BTreeSet, HashMap},
env::current_dir,
fs,
future::Future,
path::{Path, PathBuf},
pin::Pin,
@@ -24,15 +23,12 @@ use serde::Serialize;
use tokio::sync::mpsc::channel;
use turbo_tasks::{
backend::Backend, util::FormatDuration, TaskId, TransientInstance, TransientValue, TurboTasks,
TurboTasksBackendApi, UpdateInfo, Value, Vc,
UpdateInfo, Value, Vc,
};
use turbo_tasks_fs::{
glob::Glob, DirectoryEntry, DiskFileSystem, FileSystem, FileSystemPath, ReadGlobResult,
};
use turbo_tasks_memory::{
stats::{ReferenceType, Stats},
viz, MemoryBackend,
};
use turbo_tasks_memory::MemoryBackend;
use turbopack::{
emit_asset, emit_with_completion, module_options::ModuleOptionsContext, rebase::RebasedAsset,
ModuleAssetContext,
@@ -101,10 +97,6 @@ pub struct CommonArgs {
#[cfg_attr(feature = "node-api", serde(default))]
cache: CacheArgs,

#[cfg_attr(feature = "cli", clap(short, long))]
#[cfg_attr(feature = "node-api", serde(default))]
visualize_graph: bool,

#[cfg_attr(feature = "cli", clap(short, long))]
#[cfg_attr(feature = "node-api", serde(default))]
watch: bool,
@@ -325,7 +317,6 @@ pub async fn start(
) -> Result<Vec<String>> {
register();
let &CommonArgs {
visualize_graph,
memory_limit,
#[cfg(feature = "persistent_cache")]
cache: CacheArgs {
@@ -390,22 +381,7 @@ pub async fn start(
TurboTasks::new(MemoryBackend::new(memory_limit.unwrap_or(usize::MAX)))
})
},
|tt, root_task, _| async move {
if visualize_graph {
let mut stats = Stats::new();
let b = tt.backend();
b.with_all_cached_tasks(|task| {
stats.add_id(b, task);
});
stats.add_id(b, root_task);
// stats.merge_resolve();
let tree = stats.treeify(ReferenceType::Child);
let graph =
viz::graph::visualize_stats_tree(tree, ReferenceType::Child, tt.stats_type());
fs::write("graph.html", viz::graph::wrap_html(&graph)).unwrap();
println!("graph.html written");
}
},
|_, _, _| async move {},
module_options,
resolve_options,
)
4 changes: 4 additions & 0 deletions crates/turbo-tasks-malloc/src/counter.rs
@@ -70,6 +70,10 @@ pub fn allocation_counters() -> AllocationCounters {
with_local_counter(|local| local.allocation_counters.clone())
}

pub fn reset_allocation_counters(start: AllocationCounters) {
with_local_counter(|local| local.allocation_counters = start);
}

fn with_local_counter<T>(f: impl FnOnce(&mut ThreadLocalCounter) -> T) -> T {
LOCAL_COUNTER.with(|local| {
let ptr = local.get();
4 changes: 4 additions & 0 deletions crates/turbo-tasks-malloc/src/lib.rs
@@ -61,6 +61,10 @@ impl TurboMalloc {
pub fn allocation_counters() -> AllocationCounters {
self::counter::allocation_counters()
}

pub fn reset_allocation_counters(start: AllocationCounters) {
self::counter::reset_allocation_counters(start);
}
}

#[cfg(all(
4 changes: 2 additions & 2 deletions crates/turbo-tasks-memory/src/aggregation/new_edge.rs
@@ -20,9 +20,9 @@ const MAX_AFFECTED_NODES: usize = 4096;
/// Handle the addition of a new edge to a node. The edge is propagated to
/// the uppers of that node or added as an inner node.
#[tracing::instrument(level = tracing::Level::TRACE, name = "handle_new_edge_preparation", skip_all)]
pub fn handle_new_edge<'l, C: AggregationContext>(
pub fn handle_new_edge<C: AggregationContext>(
ctx: &C,
origin: &mut C::Guard<'l>,
origin: &mut C::Guard<'_>,
origin_id: &C::NodeRef,
target_id: &C::NodeRef,
number_of_children: usize,
38 changes: 3 additions & 35 deletions crates/turbo-tasks-memory/src/cell.rs
@@ -75,41 +75,6 @@ impl Cell {
}
}

/// Returns true if the cell has dependent tasks.
pub fn has_dependent_tasks(&self) -> bool {
match self {
Cell::Empty => false,
Cell::Recomputing {
dependent_tasks, ..
}
| Cell::Value {
dependent_tasks, ..
}
| Cell::TrackedValueless {
dependent_tasks, ..
} => !dependent_tasks.is_empty(),
}
}

/// Returns the list of dependent tasks.
pub fn dependent_tasks(&self) -> &TaskIdSet {
match self {
Cell::Empty => {
static EMPTY: TaskIdSet = AutoSet::with_hasher();
&EMPTY
}
Cell::Value {
dependent_tasks, ..
}
| Cell::TrackedValueless {
dependent_tasks, ..
}
| Cell::Recomputing {
dependent_tasks, ..
} => dependent_tasks,
}
}

/// Switch the cell to recomputing state.
fn recompute(
&mut self,
@@ -235,6 +200,9 @@ impl Cell {
} => {
// Assigning to a cell will invalidate all dependent tasks as the content might
// have changed.
// TODO this leads to flagging task unnecessarily dirty when a GC'ed task is
// recomputed. We need to use the notification of changed cells for the current
// task to check if it's valid to skip the invalidation here
if !dependent_tasks.is_empty() {
turbo_tasks.schedule_notify_tasks_set(dependent_tasks);
}
185 changes: 0 additions & 185 deletions crates/turbo-tasks-memory/src/concurrent_priority_queue.rs

This file was deleted.

