
YTsaurus 25.2.0 release notes draft

Ignat edited this page Sep 23, 2025 · 15 revisions

YTsaurus 25.2.0 release notes:

To install YTsaurus Server 25.2.0, update the k8s-operator to version 0.27.0.

Significant changes

  • Added support for NVIDIA GPUs in k8s-operator. Improved GPU device discovery in job containers. Documentation.
  • Added bundle controller for managing tablet cell bundles on small clusters. This component distributes tablet nodes between bundles, manages node maintenance and controls CPU and memory distribution over tablet nodes. Documentation.
  • Added support for multiproxy mode in RPC proxies. RPC proxies (including the RPC proxy in the job proxy) can now be configured to operate with remote clusters. Documentation.

Query language features

  • Added cardinality_state and cardinality_merge functions.
  • Added support for timestamp functions for arbitrary time zones.
  • Implemented array_agg function.
  • Added support for simple subqueries in FROM clause.

Default changes and deprecations

  • Enabled decommission_through_extra_peers by default, which significantly reduces downtime during tablet node maintenance.
  • Enabled hunks remote copy by default.
  • Switched to per-bundle tablet resource accounting by default.
  • Remote copy operations now set some system attributes on the destination table even if copy_attributes is set to false in the spec. These attributes are compression_codec, erasure_codec and optimize_for.
  • Deprecated list_node. Master servers now issue an alert-level log message after loading a snapshot that contains a list node. This behaviour can be turned off with the alert_on_list_node_load option. Consider switching to other types and removing or replacing all remaining list nodes; otherwise, the master server will not start in the next major update. This release bundles a script that should help you migrate in the vast majority of cases; you can find it at yt/yt/scripts/master/replace_list_nodes. We've published a blog post explaining our reasoning for deprecating this type and suggesting other migration methods. !!(TODO: Add a link to the blogpost!)!!
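
The remote copy change above can be modelled in a few lines of Python. This is only a sketch, not YTsaurus code: the function name and the plain-dictionary representation of attributes are assumptions made for illustration, and the copy_attributes=True branch is simplified to copying every attribute.

```python
# The three system attributes named in the release note.
ALWAYS_SET = ("compression_codec", "erasure_codec", "optimize_for")


def attributes_set_on_destination(source_attributes, copy_attributes):
    """Hypothetical model of which attributes reach the destination table."""
    if copy_attributes:
        # Simplification (assumption): treat this as copying everything.
        return dict(source_attributes)
    # Even with copy_attributes=False, the listed system attributes are set.
    return {
        key: source_attributes[key]
        for key in ALWAYS_SET
        if key in source_attributes
    }
```

Under this model, a user attribute on the source table is dropped when copy_attributes is false, while the codec and optimize_for settings still carry over.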

Full changelog

Scheduler and GPU

New Features & Changes:

  • Added validation of exec node resource limits. If a node does not meet the configured limits, the scheduler raises an alert, c4b5dcd.
  • Allowed dot by default in pool name validation regexp, 93cf17a.
  • Added type, user and title to operation orchid, ffda123.
  • Added an option to the scheduler config and operation spec that fails the operation if the specified pools do not exist, 30eebf3.

Queue Agent

New Features & Changes:

  • Added the enable_verbose_logging option to the dynamic config; it enables verbose logging for the objects listed in verbose_logging_objects, 67efedf.
  • Queue exports are now taken into account when trimming replicated table and chaos replicated table queues, c016ca9.
  • Added retry support for the CreateQueueProducerSession method, 591d500.

Fixes & Optimizations:

  • Fixed init_queue_agent_state when the directory already exists (occurs with the k8s operator), 8aca0e2.
  • Fixed write_data_weight_rate for empty partitions, 3e1dd6d.
  • Prevented split-brain between queue agent instances leading to an orchid redirect loop by limiting the number of redirections and adding retries, 39b014d.
  • Fixed queue agent crashes in some tricky cases with queue recreation, 133b109.
  • Fixed potential data loss in case of multiple exports per queue, f51f9fa.

Proxy

New Features & Changes:

  • Added support for multiproxy mode in RPC proxies: a client can use a multiproxy of one cluster to operate on other connected clusters, e0f98d7.
  • Introduced get_current_user method in RPC and HTTP proxies, ab3f903.
  • Added the ability to configure a user limit for a specific proxy role, 14d49a0.
  • Extended impersonation functionality in the HTTP protocol: it is now allowed for all unbanned superusers, 9eee43d.
  • Allowed sending/reading values larger than 16MB to RPC proxy via wire protocol to/from methods dealing with static tables, df8fb64.
  • Added to_lower and to_upper options to OAuth login_transformations, 0245614.
  • Added support for dynamic reconfiguration of the signature subsystem in HTTP and RPC proxies, 029f6ce.

Fixes & Optimizations:

  • Various fixes in arrow format:
    • Fixed reading of tables with date type columns, 1a23993.
    • Added the ability to read tables with different number of columns in chunk meta in arrow format, 6210035.
  • Changed caching options in config of CypressUserManager for OAuthAuthenticator. Migrate to options compatible with AsyncExpiringCache ("expire_after_*_time"). Older options ("cache_ttl", "optimistic_cache_ttl") are deprecated and will be removed in future versions, 1ecabbc.
  • Fixed CVE-2023-33460: memory leak in yajl 2.1.0 when using the yajl_tree_parse function, f7b9064.
  • Fixed possible deadlock in the chunk meta cache, 7c68dbe.
  • Fixed the calculation of state_counts and type_counts in the list_jobs method, 84d7713.
  • Set attribute treat_as_queue_producer=%true during queue_producer creation, 88eac20.
  • Made the HTTP proxy functional when the master is in read-only mode, ccb0228.
  • Improved handling of memory pressure errors, db04463.
  • Under memory pressure, only heavy requests are now dropped, 646071a.

Dynamic Tables

New Features & Changes:

  • Added support for bulk insert under user transaction, 3fe8c73.
  • ORDER BY now considers the primary key prefix constrained by the predicate and uses ordered execution when that is sufficient, f8dbc00.
  • Allowed background compaction and partition tasks to be executed within the two-level-fair-share thread pool, 55c2dfd.
  • Added cardinality_state and cardinality_merge functions for QL, a267e20.
  • Added timestamp functions for arbitrary time zones to QL, 55178fe.
  • Implemented array_agg function for QL, d656eec.
  • Added support for RegisterChunkReplicasOnStoresUpdate for ordered tables, which reduces the number of master requests required for reading flushed chunks via the tablet node API, 155fe69.
  • Casts from double to (unsigned) integer now avoid undefined behavior and behave the same regardless of the execution engine: values are clamped and NaNs are converted to zero, 8c492fa.
  • Optimized writes into tables with secondary indices under certain conditions, 4066880.
  • Enabled decommission_through_extra_peers by default, which significantly reduces downtime during tablet node maintenance, c34ff21.
  • Allowed columns with type Any to be unfolded via indices; their contents are checked at runtime, d0eb7ad.
  • Added smooth movement for tablets with hunks, d5357ea.
  • Added option to decrease serialization time by serializing transaction within each lock group in each row separately, fe90e97.
  • Optimized QL: SELECT queries with GROUP BY and JOIN now group rows before joining when possible, 75e8e64.
  • Added a method to return freezing or unmounting tables back to mounted state, bbf1101.
  • Added support for simple subqueries in FROM clause in QL, a7e0701.
  • Added profiling counters in tablet nodes for pull_queue/pull_queue_consumer commands, 6aebfc1.
  • Select queries now properly choose a random in-sync replica even if candidates belong to the same cluster, efdf083.
  • Added total_grouped_row_count to QL statistics, e37b81f.
  • Added log drop tracker in overload controller, 8cd772d.
  • Switched to per-bundle tablet resource accounting by default, 4f78e41.
  • The @resource_quota attribute for tablet cell bundles is now interned and mirrored with @resource_limits, 27ed1ef.
  • Added support for remote copy of dynamic tables with compression dictionaries, 0791aea.
  • Introduced "evaluatable schema" extension for secondary indices, which allows indexation of expressions, bf45155.
  • Enabled hunks remote copy by default, 9f2c5f4.
  • Optimized QL: a lookup join is now used when the left subplan is selective, 03d25a9.
  • Improved performance of timestamp_floor_week function, 06a643c.
  • Added erasure codec validation in journal writer, ae194de.
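
The clamping behaviour described above for double-to-integer casts can be sketched as follows. This is an illustrative model rather than the QL implementation: the function names are hypothetical, and the exact bounds and the truncation toward zero for in-range values are assumptions; the release note itself only states that values are clamped and NaNs become zero.

```python
import math

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def clamp_double_to_int64(x: float) -> int:
    """Hypothetical model of a saturating double -> int64 cast."""
    if math.isnan(x):
        return 0  # NaN is converted to zero
    if x >= 2.0**63:
        return INT64_MAX  # too large (including +inf): clamp to the maximum
    if x <= -(2.0**63):
        return INT64_MIN  # too small (including -inf): clamp to the minimum
    return int(x)  # in range: truncate toward zero (assumption)


def clamp_double_to_uint64(x: float) -> int:
    """Hypothetical model of a saturating double -> uint64 cast."""
    if math.isnan(x):
        return 0
    if x <= 0.0:
        return 0  # negative values clamp to zero
    if x >= 2.0**64:
        return UINT64_MAX
    return int(x)
```

Because every input maps to a well-defined result, two execution engines that follow the same rule cannot disagree on out-of-range or NaN inputs.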

Fixes & Optimizations:

  • Fixed per-category tablet dynamic memory accounting at followers, d147efd.
  • Enlarged the signal handler stack and enabled memory protection for it to avoid memory corruption due to stack overflow during signal handler execution, 7ba96aa.
  • Fixed accounting of row cache memory, bcadf28.
  • Fixed the to_any function: casting EValueType::Composite to EValueType::Any now works as expected. Allowed some functions to work with both of these types, 20f22b7.
  • Fixed a bug involving unfolding secondary indices that led to crashes when the query predicate contained list_contains(expr) where expr was not a reference, 61b2b4e.
  • Fixed crashes in proxies when selecting from a table with a malformed computed column, 913edd3.
  • Fixed select_rows not waiting on locks when reading via lookup, 336ab90.
  • Fixed bad invoker choice when migrating to query thread pool, 8d93be7.
  • Misconfiguration of row cache no longer leads to OOM, 7db2519.
  • Fixed a bug with incorrect meta size estimation in scan format which caused overzealous chunk fragmentation during compaction, 11a59e3.
  • Fixed a bug where range inference produced misshapen keys and ranges, aa7597d.
  • Fixed data corruption of floating point values in dynamic tables in scan format, 8fcbf50.
  • Added push_down_group_by for replicated tables, cb64762.
  • Fixed time zone handling in timestamp_floor_*_localtime functions, 07f30b0.
  • Used physical chunk count instead of logical one in ordered dynamic tablet chunk lists, 96c1a7f.
  • Eliminated a memory leak caused by cancelled selects, 75521e1.
  • Join-predicate is now used in range inference for subqueries that fetch data from dictionaries, d62d232.
  • Fixed a bug leading to segfaults in tables with nested columns, 45cb542.
  • Fixed incompatibility of AVG function in QL for some cases, ef7e062.

MapReduce

New Features & Changes:

  • Added support for dynamic reconfiguration of the signature subsystem in the exec node, 029f6ce.
  • Fixed input data slicing issues, 3bb3594.
  • Added option that forces lower bound of user job CPU limit, c4675d4.
  • Added support for NBD network disks to disk_request, a83d345.
  • Added support for extra jobs for gang operations and introduced gang ranks, d45f1f7.
  • Added a new delivery fenced connection that works on a vanilla Linux kernel; it may be used for CPU-intensive or GPU jobs to prevent job abort on interruption, d7861cd.
  • Enhanced input compressed size estimation for operations with columnar statistics, 68474e7.
  • Controller agents (CAs) now fetch schemas from external cells by default, b5932ab.
  • Added initial support for multiple jobs in one allocation, ee037ac.
  • The hostname in containers is now built using the slot-{slot_index}.{exec_node_hostname} format, fe028ee.
  • RemoteCopy operations now set some system attributes on the destination table even if copy_attributes is set to false in the spec. These attributes are compression_codec, erasure_codec and optimize_for, 25be378.
  • Fixed the yt.exec_node.rpc_proxy_in_job_proxy_count metric and added a host label, 416d8bf.
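
For tooling that needs to produce or recognise the new container hostname format mentioned above, a minimal Python sketch might look like this. Both helper names are hypothetical, and the parser assumes the exec node hostname itself never starts with a dot; only the slot-{slot_index}.{exec_node_hostname} pattern comes from the release note.

```python
def container_hostname(slot_index: int, exec_node_hostname: str) -> str:
    """Build a hostname in the slot-{slot_index}.{exec_node_hostname} format."""
    return f"slot-{slot_index}.{exec_node_hostname}"


def parse_container_hostname(hostname: str):
    """Split a container hostname back into (slot_index, exec_node_hostname)."""
    # Everything before the first dot is the slot label; the rest is the node.
    slot_label, dot, node = hostname.partition(".")
    prefix = "slot-"
    if not dot or not node or not slot_label.startswith(prefix):
        raise ValueError(f"unexpected container hostname: {hostname!r}")
    return int(slot_label[len(prefix):]), node
```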

Fixes & Optimizations:

  • The actual error is now passed instead of "Job failed by external request", bc9b656.

Master Server

New Features & Changes:

  • Made the default optimize_for configurable for static and dynamic tables, 9e5d4a8.
  • Introduced TCompactTableSchema to reduce the master server process memory footprint (unlike TTableSchema, this holds just the protobuf-serialized schema), 666861a.
  • Introduced a hard limit on the response size for read requests to //sys/chunks, configurable via virtual_chunk_map_read_result_limit. Previously, accessing this virtual map could crash the master server due to excessive creation of fibers and subsequent memory allocation. This is a temporary mitigation while we work on a better fix, 61eb1f5.
  • Changed copy/move commands to require an explicit flag to drop secondary indices; previously this was the default behaviour, 0668453.
  • Allowed users to authenticate with their aliases. This is a preliminary compatibility step toward migrating system users to a new naming scheme (e.g. job -> yt-job), b52bbb5.
  • Annotated read requests to external attributes (typically served at secondary cells) with the proper user identity. This is primarily for proper load attribution in profiling and monitoring, 49fd63b.
  • Added per-chunk replica throttling of data node heartbeat processing, 5ca5f62.
  • Included more information about secondary indices in table attributes, cd9ebbe.
  • Moved operation locking output dynamic tables from the controller to the native protocol, 9ac2870.
  • Prohibited chunk_host role revocation from cells with chunks and cypress_node_host role revocation from cells with native Cypress nodes, 5d802d7.

Fixes & Optimizations:

  • Stopped generating mutation IDs for read requests. Mutation IDs are considered to be a part of the request message and generating a unique mutation ID for every request previously broke the object service cache, 3d1d9eb.
  • Used the persistent response keeper for commit/abort requests for Cypress transactions, b6da7ab.
  • Fixed crash in HydraUpdateMasterCellChunkStatistics when ChunkScanExecutor_ called OnChunkScan more than once before the execution of the committed HydraUpdateMasterCellChunkStatistics mutation, 0c2d095.
  • Fixed "list node creation is forbidden" error when attempting to set a list to an attribute of a nonexistent node, bca5761.
  • Removed the broadcast from DoGetMulticellOwningNodes, 75a99bd.
  • Fixed potential missing inherited attributes when a node created with the "force" flag overwrote another node, 86d07f9.
  • Fixed revision validation when revision paths differ from execution paths, f39e9e8, 1663a23.
  • Forbade creation of tables that are indices of themselves, f6404a8.
  • Fixed creation of secondary indices beyond portal, 96ad84e.
  • Fixed careful chunk requisition update in chunk merger, 996e11c.
  • Fixed statistics checks during the removal of a master-cell, b558b10.
  • Fixed job heartbeat processing on a yet non-registered new cell, 4733cbb.
  • Fixed the chunk replicator not respecting medium-specific replication factor override in certain scenarios, d6bab2f.
  • Fixed the ID of the admins built-in group, 29529ec.
  • Fixed nullptr dereference in HydraCreateForeignObject, a30d422.
  • Dropped legacy ZooKeeper shard, 558b7b5.
  • Fixed master-server not changing the reliability of the exec nodes, 4e9becf.
  • Fixed a race between the transaction coordinator committing a transaction and a cell with an exported object unreferencing said object, 5b72aad.
  • Fixed manual Cypress node merging for Scheduler transactions, 8a80023.
  • Fixed a master crash when setting a YSON dictionary with duplicated keys into a custom attribute, 867354b.
  • Fixed row comparison in shallow merge validation so that it does not fail the job, 404a790.
  • Fixed a crash triggered by reading @local_scan_flags attribute, d8743cb.
  • Fixed a non-deterministic error caused by the non-deterministic order of YSON struct field loading when two or more required fields are missing. Since the error message is part of the response for a master mutation, this could lead to a "state hashes differ" alert on the master, d907ada.
  • Fixed TAttributeFilter handling, 488b343.
  • Fixed locking for concatenation in append mode, c1f5c7e.
  • Fixed a bug related to the compatibility patch for the imaginary chunk locations, f591951.
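
The "state hashes differ" issue above arises whenever replicated state includes a message whose content depends on iteration order. The release note does not describe how the fix was implemented; the Python sketch below only illustrates the general idea of making such a message order-independent, using a hypothetical helper and message text.

```python
def missing_fields_error(required, provided):
    """Return a stable error message for missing required fields, or None."""
    # Sorting removes any dependence on the order in which fields were visited,
    # so every replica that sees the same input produces the same message.
    missing = sorted(set(required) - set(provided))
    if not missing:
        return None
    return "Missing required fields: " + ", ".join(missing)
```

With two or more fields missing, the message is identical no matter how the inputs were ordered, so anything derived from the response stays the same across replicas.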

Data Node

New Features & Changes:

  • Added support for erasure encoding in read size estimation, 4b3a28e.
  • Added flag enable_read_size_estimation to disable read size estimation (true by default), 4b3a28e.

Fixes & Optimizations:

  • Fixed chunk meta extensions absorption in meta aggregated writer, 677d2d4.
  • Fixed a bug in computing the compression ratio in read size estimation based on heavy columnar statistics and added unit tests to catch bugs of this kind, 4b3a28e.
  • Added config for master cell directory synchronizer into cluster node dynamic config, d049363.
  • Fixed start of node heartbeats before actual registration, 58e442b.
  • Fixed crash in meta aggregated writer on corrupted chunks, 5653dfb.
  • The node lease transaction is now reused during node re-registration, d0eb92b.
  • Fixed a crash when a node disconnects while starting a heartbeat report, 66efd89.

Other

New Features & Changes:

  • Added sensors for mlock calls, 085a74c.
  • Implemented stockpile relative to user-jobs memory limit (it is necessary for exec nodes in dynamic-tables-oriented clusters), 8b7c91b.

Fixes & Optimizations:

  • Stopped aborting the node lease transaction on re-registration, 6da69bd.
  • Fixed UB in chunked memory pool, ec99700.
  • Fixed UB in logging zstd compression, 870ca53.
  • Fixed a bug in RPC service where a heavy request which had been queued would use propagating storage (e.g. a trace context) from another request, 7745b84.
  • Improved tracking of memory used in concurrent cache, 122fd89.
  • Used 64-bit counters for histogram buckets, 94fe6d3.
