Text Generation using CPP Inference

Overview

This example demonstrates how to run a model on AI 100 using Efficient Transformers and C++ APIs. The Efficient Transformers library transforms, exports, and compiles the model; the resulting QPC is then executed through C++ APIs. The example has been tested on both x86 and ARM platforms.

NOTE: This example supports batch sizes greater than 1 (BS>1) and chunking.
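
To illustrate what chunking means here, the following is a minimal sketch, assuming the prompt is right-padded to a multiple of the prompt length and then processed in fixed-size pieces. The function name `split_into_chunks`, the `pad_id` value, and the padding side are hypothetical and are not taken from this example's source.

```python
import math


def split_into_chunks(token_ids, prompt_len, pad_id=0):
    """Pad token_ids to a multiple of prompt_len, then split into equal chunks.

    Assumption: right-padding with pad_id; the real implementation may differ.
    """
    if prompt_len <= 0:
        raise ValueError("prompt_len must be positive")
    # Always produce at least one chunk, even for an empty prompt.
    num_chunks = max(1, math.ceil(len(token_ids) / prompt_len))
    padded_len = num_chunks * prompt_len
    padded = list(token_ids) + [pad_id] * (padded_len - len(token_ids))
    return [padded[i:i + prompt_len] for i in range(0, padded_len, prompt_len)]


# A 70-token prompt with prompt_len=32 is padded to 96 tokens.
chunks = split_into_chunks(list(range(1, 71)), prompt_len=32)
print(len(chunks), [len(c) for c in chunks])  # prints: 3 [32, 32, 32]
```

Each chunk has the same length, which suits a model compiled for a fixed prompt length such as the `--prompt_len 32` used in the command in Setup and Execution.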

Prerequisite

pip install pybind11
C++17 or above (tested with C++17 and g++ 11.4.0)
QEfficient, installed per the Quick Installation Guide

Setup and Execution

# Compile the C++ source using the following commands
mkdir build
cd build

cmake ..
make -j 8

cd ../../../  # Must be in the base folder (efficient-transformers) to run the command below

# Run the Python script to get the generated text
python examples/cpp_execution/text_inference_using_cpp.py --model_name gpt2 --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 14 --device_group [0] --prompt "My name is" --mos 1 --aic_enable_depth_first
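
The flags in the command above map naturally onto an `argparse` interface. The sketch below is a hypothetical parser, not the actual contents of text_inference_using_cpp.py: the flag names come from the command, while the types, the defaults, the helper `parse_device_group`, and the `prompt_len <= ctx_len` check are assumptions made for illustration.

```python
import argparse
import ast


def parse_device_group(text):
    """Parse a device list such as "[0]" or "[0,1]" into a list of ints."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError("device_group must look like [0] or [0,1]")
    if not isinstance(value, list) or not all(isinstance(d, int) for d in value):
        raise argparse.ArgumentTypeError("device_group must be a list of integers")
    return value


def parse_args(argv=None):
    """Parse the flags shown in the README command (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument("--model_name", required=True)
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--prompt_len", type=int, default=32)
    parser.add_argument("--ctx_len", type=int, default=128)
    parser.add_argument("--mxfp6", action="store_true")
    parser.add_argument("--num_cores", type=int, default=14)
    parser.add_argument("--device_group", type=parse_device_group, default=[0])
    parser.add_argument("--prompt", default="My name is")
    parser.add_argument("--mos", type=int, default=1)
    parser.add_argument("--aic_enable_depth_first", action="store_true")
    args = parser.parse_args(argv)
    # Assumed sanity check: the prompt window must fit inside the context.
    if args.prompt_len > args.ctx_len:
        parser.error("--prompt_len must not exceed --ctx_len")
    return args


# Parse an explicit argument list instead of sys.argv for demonstration.
args = parse_args(["--model_name", "gpt2", "--device_group", "[0]", "--mxfp6"])
print(args.model_name, args.device_group, args.mxfp6)
```

Depending on the shell, `[0]` may need quoting (for example `--device_group "[0]"`) so that it reaches the script unchanged.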

Future Enhancements

DMA Buffer Handling
Continuous Batching
Streamer handling
