This notebook contains examples of how to work with [YSON documents](https://ytsaurus.tech/docs/en/user-guide/storage/yson-docs) on Tracto.

This notebook demonstrates how to:
1. Create a document.
2. Write content to a document.
3. Partially update a document.
4. Partially read a document.


Documents can be used to store small amounts of data, for example:
1. Common configuration for an operation runner.
2. Debug metadata about an ML training run.
3. Settings for a YT-based service (such as CHYT).
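For the first use case, a runner might keep its shared settings in a document and fall back to built-in defaults when the document does not exist yet. The sketch below is illustrative only: the function name `load_runner_config`, the shallow-merge policy, and the example keys are hypothetical, not a Tracto or YTsaurus convention. It relies only on `exists` and `get`, which the `yt.wrapper` module used in this notebook provides.

```python
from collections.abc import Mapping
from typing import Any, Dict


def load_runner_config(client: Any, path: str, defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return `defaults` overlaid with the top-level keys of the document at `path`.

    `client` is anything exposing `exists(path)` and `get(path)`, for example
    the `yt.wrapper` module. Hypothetical helper: the shallow merge (stored
    top-level keys win over defaults) is an illustrative choice.
    """
    config = dict(defaults)  # copy so the caller's defaults are never mutated
    if client.exists(path):
        stored = client.get(path)
        if not isinstance(stored, Mapping):
            raise TypeError(f"expected a map document at {path}, got {type(stored).__name__}")
        config.update(stored)
    return config
```

Usage in this notebook would look like `load_runner_config(yt, f"{workdir}/runner_config", {"pool_tree": "default", "max_failed_jobs": 10})`, where the path and both keys are made up for the example.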

A document behaves as a single Cypress node with respect to Cypress-specific features:
* locks
* owners
* revisions
* `creation_time`, `modification_time`, and `expiration_time`
* attributes
* and other features
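Because these features belong to the document node as a whole, its metadata is read from the node's attributes (the `@` suffix of its path), not from keys inside the content. The sketch below assumes that `get` on `<path>/@` returns a map of the node's attributes in a single request, which keeps the call count low given the rate limit below; `expiration_time` is only present when it has been set, so it comes back as `None` otherwise. The helper name `document_metadata` is hypothetical.

```python
from typing import Any, Dict, Iterable

# Attributes from the list above; they describe the whole document node.
DEFAULT_ATTRIBUTES = ("owner", "revision", "creation_time", "modification_time", "expiration_time")


def document_metadata(client: Any, path: str, names: Iterable[str] = DEFAULT_ATTRIBUTES) -> Dict[str, Any]:
    """Fetch all attributes of the node at `path` once, then pick `names`.

    `client` is anything exposing `get(path)`, for example the `yt.wrapper`
    module. Attributes that are not set on the node map to None.
    """
    attributes = client.get(f"{path}/@")  # one request for the whole attribute map
    return {name: attributes.get(name) for name in names}
```

In this notebook it could be called as `document_metadata(yt, document_path)` once the document below has been created.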

### Limits

* The request rate should stay below 1 RPS.
* A single document should not exceed a few KB.
* The total volume of all user documents must not exceed 10 MB.
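These limits are not enforced by the client, so a caller may want its own guard rails. The sketch below is an assumption-laden illustration, not part of any YTsaurus API: it reads "a few KB" as a 4 KiB budget, approximates the stored size with compact JSON (the size the cluster accounts for in YSON will differ), and spaces writes at least one second apart. The names `check_document_size` and `Throttle` are hypothetical.

```python
import json
import time
from typing import Any, Callable, Optional

MAX_DOCUMENT_BYTES = 4 * 1024   # assumed reading of "a few KB"
MIN_INTERVAL_SECONDS = 1.0      # at most one request per second


def approx_document_size(value: Any) -> int:
    """Approximate serialized size in bytes, using compact JSON as a stand-in for YSON."""
    return len(json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def check_document_size(value: Any, limit: int = MAX_DOCUMENT_BYTES) -> int:
    """Raise ValueError if `value` looks too large for a document; return its approximate size."""
    size = approx_document_size(value)
    if size > limit:
        raise ValueError(f"document is ~{size} bytes, above the {limit}-byte budget")
    return size


class Throttle:
    """Block so that consecutive `wait()` calls return at least `interval` seconds apart."""

    def __init__(self, interval: float = MIN_INTERVAL_SECONDS,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock    # injectable so the logic can be tested without real sleeping
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                self.sleep(delay)
                now += delay  # the moment the call is allowed to proceed
        self._last = now
```

A write would then look like `check_document_size(payload)`, `throttle.wait()`, `yt.set(document_path, payload)`, with a single `Throttle()` instance shared by all writers in the process.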



In [1]:
from yt import wrapper as yt
import uuid

In [2]:
# configure the environment to run this notebook
import uuid
import yt.wrapper as yt

username = yt.get_user_name()
if yt.exists(f"//sys/users/{username}/@user_info/home_path"):
    # prepare working directory on distributed file system
    user_info = yt.get(f"//sys/users/{username}/@user_info")
    homedir = user_info["home_path"]
    # find available VM presets (pool trees)
    cpu_pool_trees = [pool_tree for pool_tree in user_info["available_pool_trees"] if pool_tree.endswith("cpu")] or ["default"]
    h100_pool_trees = [pool_tree for pool_tree in user_info["available_pool_trees"] if pool_tree.endswith("h100")]
    h100_8_pool_trees = [pool_tree for pool_tree in user_info["available_pool_trees"] if pool_tree.endswith("h100-8")]
    workdir = f"{homedir}/tmp/demo_workdir/{uuid.uuid4().hex}"
else:
    cpu_pool_trees = ["default"]
    h100_pool_trees = ["gpu_h100"]
    h100_8_pool_trees = ["gpu_h100"]
    workdir = f"//tmp/examples/{uuid.uuid4().hex}"

yt.create("map_node", workdir, recursive=True, ignore_existing=True)
print("Current working directory:", workdir)

Current working directory: //home/equal_amethyst_vulture/tmp/demo_workdir/b83830f9f5a64c5dbaa5257079701ace


## Example

Let's create an empty document and write some data.

In [4]:
document_path = f"{workdir}/document"
yt.create("document", document_path)

'46d-2624-24dd01a5-368ef8f'

In [5]:
yt.set(document_path, {"data": {"ytsaurus": ["master", "proxies", "scheduler", "exec nodes", "data nodes"]}})
yt.get(document_path)

{'data': {'ytsaurus': ['master', 'proxies', 'scheduler', 'exec nodes', 'data nodes']}}

The document can also be read and updated partially.

In [7]:
yt.get(f"{document_path}/data/ytsaurus")

['master', 'proxies', 'scheduler', 'exec nodes', 'data nodes']

In [8]:
yt.set(f"{document_path}/object_class", "Safe")
yt.get(document_path)

{'data': {'ytsaurus': ['master', 'proxies', 'scheduler', 'exec nodes', 'data nodes']}, 'object_class': 'Safe'}
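Partial access works because the part of the path after the document node is resolved inside its content: each `/`-separated token selects a map key, or an element of a list by index. The in-memory model below only illustrates that addressing on plain Python values; it is not how the cluster implements it, and it ignores YPath escaping, attributes, and special list tokens. `get_in` and `set_in` are hypothetical names.

```python
from typing import Any, List


def _tokens(suffix: str) -> List[str]:
    # "/data/ytsaurus/0" -> ["data", "ytsaurus", "0"]; escaping is not handled.
    return [token for token in suffix.split("/") if token]


def _step(node: Any, token: str) -> Any:
    # Map keys are strings; list elements are selected by integer index.
    return node[int(token)] if isinstance(node, list) else node[token]


def get_in(document: Any, suffix: str) -> Any:
    """Model of yt.get(f"{document_path}{suffix}") on an in-memory value."""
    node = document
    for token in _tokens(suffix):
        node = _step(node, token)
    return node


def set_in(document: Any, suffix: str, value: Any) -> Any:
    """Model of yt.set(f"{document_path}{suffix}", value); returns the new document."""
    tokens = _tokens(suffix)
    if not tokens:
        return value  # an empty suffix replaces the whole document
    parent = get_in(document, "/" + "/".join(tokens[:-1]))  # parent must already exist
    if isinstance(parent, list):
        parent[int(tokens[-1])] = value
    else:
        parent[tokens[-1]] = value
    return document
```

With `doc = {"data": {"ytsaurus": ["master", "proxies"]}}`, `get_in(doc, "/data/ytsaurus")` mirrors cell 7 and `set_in(doc, "/object_class", "Safe")` mirrors cell 8.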

The document can also be completely rewritten.

In [10]:
yt.set(document_path, "[DATA EXPUNGED]")
yt.get(document_path)

'[DATA EXPUNGED]'