## This notebook demonstrates how to perform bulk inference of [DeepSeek R1 Zero](https://github.com/deepseek-ai/DeepSeek-R1) on the [Tracto.ai](https://tracto.ai/) platform.

In [1]:
import yt.wrapper as yt
import uuid

In [2]:
yt.config["pickling"]["dynamic_libraries"]["enable_auto_collection"] = False
yt.config["pickling"]["ignore_system_modules"] = True
yt.config["pickling"]["safe_stream_mode"] = False

In [3]:
working_dir = f"//tmp/examples/bulk-inference-deepseek_{uuid.uuid4()}"
yt.create("map_node", working_dir, recursive=True)
print(working_dir)

//tmp/examples/bulk-inference-deepseek_012804c3-2615-45cf-8afb-8bd00f397bc5


Prepare the data for inference as a YTsaurus table.

In [5]:
from datasets import load_dataset

dataset = load_dataset("Rapidata/Other-Animals-10")

table_path = f"{working_dir}/questions"
yt.create("table", table_path, force=True)

questions = [
    {"question": f"Can {animal} fly?"}
    for animal in set(dataset["train"].features["label"].int2str(dataset["train"]["label"]))
]

yt.write_table(table_path, questions)

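Run bulk inference of DeepSeek R1 Zero on 2 nodes. The map operation starts two jobs; each job requests 8 GPUs and loads its own copy of the model with tensor parallelism across those GPUs.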

In [7]:
from typing import Iterable
import logging
import sys
import random

@yt.aggregator
def bulk_inference(records: Iterable[dict[str, str]]) -> Iterable[dict[str, str]]:
    from vllm import LLM, SamplingParams

    # YT jobs must write all logs to stderr
    vllm_logger = logging.getLogger("vllm")
    vllm_logger.handlers.clear()
    vllm_logger.addHandler(logging.StreamHandler(sys.stderr))

    llm = LLM(
        model="deepseek-ai/DeepSeek-R1-Zero",
        tensor_parallel_size=8,
        seed=random.randint(0, 1000000),
        trust_remote_code=True,
    )
    sampling_params = SamplingParams(
        temperature=0.6,
        top_p=0.9,
        max_tokens=32000,
    )

    conversations = [
        [
            {
                "role": "user",
                "content": record["question"],
            },
        ]
        for record in records
    ]
    outputs = llm.chat(
        messages=conversations,
        sampling_params=sampling_params,
    )
    for output in outputs:
        yield {
            "prompt": output.prompt,
            "text": output.outputs[0].text,
        }
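
The `@yt.aggregator` decorator matters here because the decorated function receives an iterator over all rows of a job instead of being called once per row, so the model is loaded once per job. The sketch below illustrates that difference in plain Python. It is a stand-in, not the YT wrapper's implementation: `load_model`, `per_row_mapper`, `aggregating_mapper`, and the animal names other than "fly" are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict[str, str]

# Counts how many times the (hypothetical) expensive setup runs.
setup_calls = 0


def load_model() -> Callable[[str], str]:
    """Stand-in for an expensive step such as constructing a vLLM engine."""
    global setup_calls
    setup_calls += 1
    return lambda question: f"answer to: {question}"


def per_row_mapper(row: Row) -> Iterator[Row]:
    # Called once per row: the setup cost is paid for every row.
    model = load_model()
    yield {"text": model(row["question"])}


def aggregating_mapper(rows: Iterable[Row]) -> Iterator[Row]:
    # Called once per job with all of its rows: the setup cost is paid once.
    model = load_model()
    for row in rows:
        yield {"text": model(row["question"])}


rows = [{"question": f"Can {animal} fly?"} for animal in ("bee", "fly", "worm")]

setup_calls = 0
per_row_out = [out for row in rows for out in per_row_mapper(row)]
per_row_setups = setup_calls

setup_calls = 0
aggregated_out = list(aggregating_mapper(iter(rows)))
aggregated_setups = setup_calls

print(per_row_setups, aggregated_setups)
```

Both variants produce the same rows; only the number of setup calls differs, which is why the aggregator form suits a mapper that has to load a large model.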

In [8]:
result_path = f"{working_dir}/result"

yt.run_map(
    bulk_inference,
    table_path,
    result_path,
    job_count=2,
    spec={
        "pool": "fifo",
        "pool_trees": ["gpu_h200"],
        "mapper": {
            "gpu_limit": 8,
            "memory_limit": 322122547200,  # 300 GiB (300 * 1024**3 bytes)
            "cpu_limit": 64,
        },
    },
)

2025-02-07 00:15:43,067	INFO	Operation started: https://playground.yt.nebius.yt/playground/operations/e50fb14c-36c52d75-270703e8-4a1ada05/details


2025-02-07 00:15:43,088	INFO	( 0 min) operation e50fb14c-36c52d75-270703e8-4a1ada05 starting


2025-02-07 00:15:43,617	INFO	( 0 min) operation e50fb14c-36c52d75-270703e8-4a1ada05 initializing


2025-02-07 00:15:45,766	INFO	( 0 min) Unrecognized spec: {'enable_partitioned_data_balancing': false, 'mapper': {'title': 'bulk_inference'}}


2025-02-07 00:15:45,796	INFO	( 0 min) operation e50fb14c-36c52d75-270703e8-4a1ada05: running=0     completed=0     pending=2     failed=0     aborted=0     lost=0     total=2     blocked=0    


2025-02-07 00:15:46,894	INFO	( 0 min) operation e50fb14c-36c52d75-270703e8-4a1ada05: running=2     completed=0     pending=0     failed=0     aborted=0     lost=0     total=2     blocked=0    


2025-02-07 00:56:06,533	INFO	(40 min) operation e50fb14c-36c52d75-270703e8-4a1ada05: running=1     completed=1     pending=0     failed=0     aborted=0     lost=0     total=2     blocked=0    


2025-02-07 00:56:11,557	INFO	(40 min) operation e50fb14c-36c52d75-270703e8-4a1ada05 completed


2025-02-07 00:56:11,603	INFO	(40 min) Alerts: {'low_cpu_usage': {'code': 1, 'message': "Average CPU usage of some of your job types is significantly lower than requested 'cpu_limit'. Consider decreasing cpu_limit in spec of your operation", 'attributes': {'pid': 1, 'tid': 12985338020924340636, 'thread': 'Controller:1', 'fid': 18446262941903288191, 'host': 'man0-0460.hw.nebius.yt', 'datetime': '2025-02-07T00:56:08.492770Z', 'trace_id': '6ed8553e-217fb054-e4892abd-1f12ae0', 'span_id': 13671918639496277952}, 'inner_errors': [{'code': 1, 'message': 'Jobs of task "map" use 2.16% of requested cpu limit', 'attributes': {'pid': 1, 'tid': 12985338020924340636, 'thread': 'Controller:1', 'fid': 18446262941903288191, 'host': 'man0-0460.hw.nebius.yt', 'datetime': '2025-02-07T00:56:08.492757Z', 'trace_id': '6ed8553e-217fb054-e4892abd-1f12ae0', 'span_id': 13671918639496277952, 'cpu_time': 6691733, 'cpu_limit': 64.0, 'exec_time': 4838073}}]}}


<yt.wrapper.operation_commands.Operation at 0x7f62c242f200>
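
The `low_cpu_usage` alert above reports that the jobs used 2.16% of the requested CPU limit. As a rough check, that figure can be reproduced from the `cpu_time`, `exec_time`, and `cpu_limit` attributes in the same log line. The formula below is an assumption inferred from the numbers matching, not taken from the scheduler's documentation, and `cpu_utilization_percent` is a hypothetical helper name.

```python
def cpu_utilization_percent(cpu_time: float, exec_time: float, cpu_limit: float) -> float:
    """Share of the requested CPU budget actually used, in percent.

    Assumes cpu_time and exec_time are in the same time unit, so the
    budget is exec_time * cpu_limit core-time units.
    """
    return 100.0 * cpu_time / (exec_time * cpu_limit)


# Values copied from the alert attributes in the operation log.
pct = cpu_utilization_percent(cpu_time=6691733, exec_time=4838073, cpu_limit=64.0)
print(f"{pct:.2f}%")  # prints 2.16%
```

A value this low is expected for GPU-bound inference and suggests that `cpu_limit` could be reduced, as the alert message itself recommends.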

In [9]:
for record in yt.read_table(result_path):
    print(record)

{'prompt': '<｜begin▁of▁sentence｜><｜User｜>Can fly fly?<｜Assistant｜>', 'text': '<think>\nThe question "Can fly fly?" seems to be a play on words involving the word "fly," which has two different meanings in English:\n\n1. "Fly" as a noun refers to a small flying insect with two wings (e.g., a housefly).\n2. "Fly" as a verb refers to the action of moving through the air using wings (e.g., birds fly).\n\nTo answer the question "Can fly fly?" we need to interpret it correctly. The question could be interpreted as asking whether a "fly" (the insect) can perform the action of "flying."\n\nBased on this reasoning, the answer is yes, a fly (the insect) can indeed fly. Flies, such as houseflies, are well-known for their ability to fly. They have two wings (most insects have two pairs of wings, but flies belong to the order Diptera, which means "two wings") that enable them to fly quite adeptly.\n\nHowever, it is important to make sure that the question is interpreted correctly. Another part of t
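
Each `text` value starts with a `<think>` block containing the model's reasoning. A minimal sketch for separating that reasoning from the final answer follows. It assumes the reasoning is closed by a `</think>` tag, which is not visible in the truncated output above, and `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer).

    Assumes the reasoning is wrapped in <think>...</think>. If the closing
    tag is missing (for example, a generation cut off by max_tokens), the
    rest of the text is treated as reasoning and the answer is empty.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = text.find(open_tag)
    if start == -1:
        # No reasoning block at all: everything is the answer.
        return "", text.strip()
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        return text[body_start:].strip(), ""
    return text[body_start:end].strip(), text[end + len(close_tag):].strip()


reasoning, answer = split_reasoning("<think>\nFlies have wings.\n</think>\nYes, a fly can fly.")
print(answer)
```

Applied to the result table, this could be called on `record["text"]` inside the loop that reads `result_path`.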