
Conversation


@atrivedi-tsavoritesi commented Jun 17, 2025

…path

The changes are as follows:

  1. Change directory to the right folder before running the commands (see the sketch after this list).
  2. Add system-info and txe-restart functionality.
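
A minimal sketch of the directory-change pattern from item 1, assuming the installed layout shown in the test output below; the `run_in_bin` helper and the hard-coded version string are illustrative, not part of this PR:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: cd into the installed bin/ directory before invoking
# any tooling, so relative paths (../install/tsi-version, tsi-ggml/*.so)
# resolve correctly regardless of the caller's working directory.
set -euo pipefail

TSI_BIN="/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin"   # version string is illustrative

run_in_bin() {
    # Run a command from inside the bin directory in a subshell so the
    # caller's working directory is left untouched.
    ( cd "$TSI_BIN" && "$@" )
}

run_in_bin ./run_llama_cli.sh "Hello" 10 tinyllama-vo-5m-para.gguf tSavorite
```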


Test results
system-info endpoint
/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/../install/tsi-version;lscpu
TSI Software: v0.1.1.tsv31_06_06_2025
Compiled by: TSI Release Manager
Compile Date: Fri May 6th, 2025
FPGA Version: SKYLP G0221
RAL Version: SKYLP_G0221
SOF File Version: ia80m-hps.sof version:SKYLP_G0221
AOT Tests Version: 0.3.1_rt0.4.2
RT Libs Version: 0.4.2
GGML Version: 0.0.2
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A53
Model: 4
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r0p4
BogoMIPS: 800.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-3
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Reg file data sampling: Not affected
  Retbleed: Not affected
  Spec rstack overflow: Not affected
  Spec store bypass: Not affected
  Spectre v1: Mitigation; __user pointer sanitization
  Spectre v2: Not affected
  Srbds: Not affected
  Tsx async abort: Not affected
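
Judging from the first line of the capture, the system-info endpoint runs `tsi-version` followed by `lscpu`. A minimal sketch of such a handler, assuming that behavior (the `system_info` function name and the hard-coded path are assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a system-info handler: concatenate the TSI version
# report with lscpu output, mirroring the command shown in the capture above.
system_info() {
    local bin_dir="/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin"   # illustrative path
    "${bin_dir}/../install/tsi-version"
    lscpu
}

system_info
```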

health-check endpoint
free -h
              total        used        free      shared  buff/cache   available
Mem:        1924284     1130668      754480         232       39136      740140
Swap:             0           0           0
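
The capture suggests the health check reports memory usage via `free`. A minimal sketch of that idea, with an added availability threshold that is purely illustrative (the `health_check` wrapper is an assumption, not the actual script):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a health-check handler: print memory usage and
# flag the node unhealthy when available memory is low.
health_check() {
    free -h
    # MemAvailable is reported in kB; the 100 MiB threshold is illustrative.
    local avail_kb
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    [ "${avail_kb:-0}" -ge 102400 ]
}

if health_check; then
    echo "healthy"
else
    echo "unhealthy"
fi
```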

run_llama_cli.sh endpoint
cd /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/; ./run_llama_cli.sh "Context:\nJohn worked at Google for 10 years.\n\nQuestion:\nWhere did John work?\n\nAnswer:" 10 tinyllama-vo-5m-para.gguf tSavorite
register_backend: registered backend Tsavorite (1 devices)
register_device: registered device Tsavorite (txe)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (CPU)
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/tsi-ggml/libggml-tsavorite.so
load_backend: failed to find ggml_backend_init in /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/tsi-ggml/libggml-cpu.so
build: 5473 (a7b7e46) with gcc (GCC) 13.3.0 for x86_64-pc-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
TXE Device MEMORY Summary total 134217728 and free 134217728
llama_model_load_from_file_impl: using device Tsavorite (txe) - 128 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 75 tensors from /tsi/proj/model-cache/gguf/tinyllama-vo-5m-para.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Vicuna Hf
llama_model_loader: - kv 3: general.size_label str = 4.6M
llama_model_loader: - kv 4: general.license str = apache-2.0
llama_model_loader: - kv 5: llama.block_count u32 = 8
llama_model_loader: - kv 6: llama.context_length u32 = 2048
llama_model_loader: - kv 7: llama.embedding_length u32 = 64
llama_model_loader: - kv 8: llama.feed_forward_length u32 = 256
llama_model_loader: - kv 9: llama.attention.head_count u32 = 16
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: general.file_type u32 = 32
llama_model_loader: - kv 12: llama.vocab_size u32 = 32000
llama_model_loader: - kv 13: llama.rope.dimension_count u32 = 4
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.pre str = default
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 22: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 17 tensors
llama_model_loader: - type bf16: 58 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = BF16
print_info: file size = 8.82 MiB (16.00 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 3
load: token to piece cache size = 0.1914 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 2048
print_info: n_embd = 64
print_info: n_layer = 8
print_info: n_head = 16
print_info: n_head_kv = 16
print_info: n_rot = 4
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 4
print_info: n_embd_head_v = 4
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 64
print_info: n_embd_v_gqa = 64
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 256
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 2048
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = ?B
print_info: model params = 4.62 M
print_info: general.name = Vicuna Hf
print_info: vocab type = SPM
print_info: n_vocab = 32000
print_info: n_merges = 0
print_info: BOS token = 1 ''
print_info: EOS token = 2 ''
print_info: UNK token = 0 ''
print_info: PAD token = 0 ''
print_info: LF token = 13 '<0x0A>'
print_info: EOG token = 2 ''
print_info: max token length = 18
load_tensors: loading model tensors, this can take a while... (mmap = true)
TXE Device MEMORY Summary total 134217728 and free 134217728
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/9 layers to GPU
load_tensors: CPU_Mapped model buffer size = 8.82 MiB
..............
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 12288
llama_context: n_ctx_per_seq = 12288
llama_context: n_batch = 1024
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (12288) > n_ctx_train (2048) -- possible training context overflow
[2018-03-09 13:03:39.996790] 317:318 [info] :: TXE resource allocation request processed successfully.
llama_context: CPU output buffer size = 0.12 MiB
llama_kv_cache_unified: CPU KV buffer size = 24.00 MiB
llama_kv_cache_unified: size = 24.00 MiB ( 12288 cells, 8 layers, 1 seqs), K (f16): 12.00 MiB, V (f16): 12.00 MiB
ggml_backend_tsavorite_buffer_type_alloc_buffer is called from llama data Loader ANoop
Allocating memory from tsi_alloc with size 659456
Allocating memory from tsi_alloc with size 659456 starting memory 0xffff83e00080
Address of Newly Created BUffer 0xffff83e00080 and size 659456
llama_context: tsavorite compute buffer size = 0.63 MiB
llama_context: CPU compute buffer size = 408.51 MiB
llama_context: graph nodes = 294
llama_context: graph splits = 83 (with bs=512), 53 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 12288
main: llama threadpool init, n_threads = 4
main: model was trained on only 2048 context tokens (12288 specified)
system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : NEON = 1 | ARM_FMA = 1 | LLAMAFILE = 1 | AARCH64_REPACK = 1 |
sampler seed: 1848950838
sampler params:
        repeat_last_n = 5, repeat_penalty = 1.500, frequency_penalty = 0.000, presence_penalty = 0.000
        dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 12288
        top_k = 50, top_p = 0.900, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.000
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 12288, n_batch = 1024, n_predict = 10, n_keep = 1

Context:
John worked at Google for 10 years.

Question:
Where did John work?

Answer: the, it was a new and had the,

llama_perf_sampler_print: sampling time = 199.36 ms / 41 runs ( 4.86 ms per token, 205.66 tokens per second)
llama_perf_context_print: load time = 4815.19 ms
llama_perf_context_print: prompt eval time = 2899.57 ms / 31 tokens ( 93.53 ms per token, 10.69 tokens per second)
llama_perf_context_print: eval time = 1305.32 ms / 9 runs ( 145.04 ms per token, 6.89 tokens per second)
llama_perf_context_print: total time = 6371.25 ms / 40 tokens
TXE_ADD Operation, total tensor: 10 Number of Kernel Call: 10 Number of tensor got spilt: 0 Min Num of Elem 64 Max Num of Elem 64
TXE_SUB Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_MULT Operation, total tensor: 170 Number of Kernel Call: 620 Number of tensor got spilt: 0 Min Num of Elem 64 Max Num of Elem 1984
TXE_DIV Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SQRT Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_NEG Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_ABS Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SIN Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SIGMOID Operation, total tensor: 0 Number of Kernel Call: 0 Number of tensor got spilt: 0 Min Num of Elem 0 Max Num of Elem 0
TXE_SILU Operation, total tensor: 80 Number of Kernel Call: 1160 Number of tensor got spilt: 80 Min Num of Elem 256 Max Num of Elem 7936
[2018-03-09 13:03:46.694398] 317:318 [info] :: TXE resource release request processed successfully.
GGML Tsavorite Profiling Results:
------------------------------------------------------------------------------------------------------------------------
      Calls    Total(ms)       T/call     Self(ms)  Function
------------------------------------------------------------------------------------------------------------------------
       1790     1791.000        1.001        0.000  [27%] RuntimeHostShim::awaitCommandListCompletion
       1160     1757.075        1.515     1757.075    └─ [26%] [ txe_silu ]
        620      924.879        1.492      924.879    └─ [14%] [ txe_mult ]
         10       14.893        1.489       14.893    └─ [ 0%] [ txe_add ]
       1790        0.459        0.000        0.459    └─ [ 0%] TXE 0 Idle
          1       37.000       37.000        3.000  [ 1%] GGML Tsavorite
          1       34.000       34.000       34.000    └─ [ 1%] RuntimeHostShim::initialize
          1       35.000       35.000       35.000  [ 1%] RuntimeHostShim::finalize
       1790        2.000        0.001        2.000  [ 0%] RuntimeHostShim::loadBlob
       1791        1.000        0.001        1.000  [ 0%] RuntimeHostShim::allocate
       1790        1.000        0.001        1.000  [ 0%] RuntimeHostShim::addCommandToList
       1790        1.000        0.001        1.000  [ 0%] RuntimeHostShim::finalizeCommandList
       6000        0.000        0.000        0.000  [ 0%] RuntimeHostShim::getShmemManager
       1790        0.000        0.000        0.000  [ 0%] RuntimeHostShim::createCommandList
       1790        0.000        0.000        0.000  [ 0%] RuntimeHostShim::launchBlob
       1790        0.000        0.000        0.000  [ 0%] RuntimeHostShim::unloadBlob
       1790        0.000        0.000        0.000  [ 0%] RuntimeHostShim::deallocate
========================================================================================================================
      22113     6724.000        0.304     6724.000  [100%] TOTAL
========================================================================================================================
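
For reference, the invocation exercised above passes the prompt, token count, model file, and backend as positional arguments; the argument meanings below are inferred from the capture, not from the script itself:

```bash
# Inferred usage of run_llama_cli.sh, based on the capture above:
#   $1 - prompt string
#   $2 - number of tokens to predict
#   $3 - GGUF model file name (resolved from the model cache)
#   $4 - backend selector (e.g. tSavorite)
cd /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin/
./run_llama_cli.sh \
  "Context:\nJohn worked at Google for 10 years.\n\nQuestion:\nWhere did John work?\n\nAnswer:" \
  10 tinyllama-vo-5m-para.gguf tSavorite
```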

restart-txe endpoint

/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin//../install/tsi-start
Directory /usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin exists.
Do you want to run tnApcMgr? (yes/no)
Started tnApcMgr...
[2018-03-09 13:02:48.005595] 315:315 [info] :: APC Init started
[2018-03-09 13:02:48.005822] 315:315 [info] :: Logger initialized successfully.
[2018-03-09 13:02:48.014635] 317:315 [error] :: DB Persistent file doesnt exist for reading
[2018-03-09 13:02:48.014972] 317:315 [info] :: DB After parsing total node count 1, Package_count 1, chiplet_count 1, agent_count 1
[2018-03-09 13:02:48.015414] 317:315 [info] :: Database initialized successfully
[2018-03-09 13:02:48.015560] 317:315 [info] :: RSM message queue created successfully.
[2018-03-09 13:02:48.015743] 317:315 [info] :: RSM message queue pop thread created successfully.
[2018-03-09 13:02:49.014173] 315:315 [info] :: RSM fork process successfully initialized
[2018-03-09 13:02:49.015240] 319:315 [info] :: CLI server listening on port 8000
[2018-03-09 13:02:50.014741] 315:315 [info] :: CLI fork process successfully initialized
[2018-03-09 13:02:50.019959] 317:318 [info] :: DB available TXE count request processed successfully.
[2018-03-09 13:02:50.020022] 315:315 [info] :: Total TXE count = 1, Available TXE count = 1
[2018-03-09 13:02:50.020229] 317:318 [info] :: DB agent count request processed successfully.
[2018-03-09 13:02:50.020260] 315:315 [info] :: txe manager count = 1, total_txe_count = 1
[2018-03-09 13:02:50.021556] 317:318 [info] :: DB agent details request processed successfully.
[2018-03-09 13:02:50.021708] 317:318 [info] :: DB agent status update request processed successfully.
[2018-03-09 13:02:50.021761] 321:315 [info] :: Agent ID: 10000 successfully updated to 'Initialization in Progress' state
[2018-03-09 13:02:50.023202] 321:315 [info] :: Setup TXE {0, 0, 0}
[2018-03-09 13:02:50.023225] 321:315 [info] :: mmap() memory for chipletID:0.
[2018-03-09 13:02:50.025486] 321:315 [error] :: TXE reset status =0x0. Init for for TXE:0 will abort.
[2018-03-09 13:02:50.025811] 317:318 [info] :: DB agent status update request processed successfully.
[2018-03-09 13:02:50.025827] 321:315 [info] :: Agent ID: 10000 is now in 'Ready' state
[2018-03-09 13:02:50.025948] 317:318 [info] :: DB agent base pointer request processed successfully.
[2018-03-09 13:02:50.025976] 321:315 [info] :: Agent ID: 10000 base pointer successfully updated in DB
[2018-03-09 13:02:50.025997] 321:315 [info] :: Agent ID: 10000 is ready for use
[2018-03-09 13:02:55.021172] 315:315 [info] :: Total number of TXE Managers: 1
====================================================================
TXE Manager PID        Status
====================================================================
      321                 0
====================================================================
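
The restart path appears to call the installed `tsi-start` script, which prompts before launching `tnApcMgr`. A minimal sketch of a non-interactive wrapper; the `restart_txe` name, the piped `yes` answer, and the hard-coded path are assumptions, not the actual implementation:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a txe-restart handler: answer the tsi-start
# confirmation prompt automatically so the endpoint can run unattended.
restart_txe() {
    local bin_dir="/usr/bin/tsi/v0.1.1.tsv31_06_06_2025/bin"   # illustrative path
    echo "yes" | "${bin_dir}/../install/tsi-start"
}

restart_txe
```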


@akapoor3518 left a comment


lgtm
