lua-cgemma

Lua bindings for gemma.cpp.

Requirements

Before starting, you should have installed:

CMake
C++ compiler, supporting at least C++17
LuaJIT (installing OpenResty directly is recommended)

Installation

1st step: Clone the source code from GitHub: git clone https://github.com/ufownl/lua-cgemma.git

2nd step: Build and install:

To build and install using the default settings, just enter the repository's directory and run the following commands:

mkdir build
cd build
cmake .. && make
sudo make install

3rd step: See here to learn how to obtain model weights and tokenizer.

Usage

Synopsis

-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  tokenizer = "/path/to/tokenizer.spm",
  model = "2b-it",
  weights = "/path/to/2b-it-sfp.sbs"
})
if not gemma then
  print("Oops! ", err)
  return
end

-- Create a chat session
local session, seed = gemma:session()
if not session then
  print("Oops! ", seed)
  return
end

print("Random seed of session: ", seed)
while true do
  print("New conversation started")

  -- Multi-turn chat
  while session:ready() do
    io.write("> ")
    local text = io.read()
    if not text then
      print("End of file")
      return
    end
    local reply, err = session(text)
    if not reply then
      print("Oops! ", err)
      return
    end
    print("reply: ", reply)
  end

  print("Exceeded the maximum number of tokens")
  session:reset()
end

APIs for Lua

cgemma.info

syntax: cgemma.info()

Show information about the cgemma module.

cgemma.new

syntax: <cgemma.instance>inst, <string>err = cgemma.new(<table>options)

Create a Gemma instance.

A successful call returns a Gemma instance. Otherwise, it returns nil and a string describing the error.

Available options:

{
  tokenizer = "/path/to/tokenizer.spm",  -- Path of tokenizer model file. (required)
  model = "2b-it",  -- Model type:
                    -- 2b-it (Gemma 2B parameters, instruction-tuned),
                    -- 2b-pt (Gemma 2B parameters, pretrained),
                    -- 7b-it (Gemma 7B parameters, instruction-tuned),
                    -- 7b-pt (Gemma 7B parameters, pretrained),
                    -- 9b-it (Gemma2 9B parameters, instruction-tuned),
                    -- 9b-pt (Gemma2 9B parameters, pretrained),
                    -- 27b-it (Gemma2 27B parameters, instruction-tuned),
                    -- 27b-pt (Gemma2 27B parameters, pretrained),
                    -- gr2b-it (Griffin 2B parameters, instruction-tuned),
                    -- gr2b-pt (Griffin 2B parameters, pretrained),
                    -- gemma2-2b-it (Gemma2 2.6B parameters, instruction-tuned),
                    -- gemma2-2b-pt (Gemma2 2.6B parameters, pretrained).
                    -- (required)
  weights = "/path/to/2b-it-sfp.sbs",  -- Path of model weights file. (required)
  weight_type = "sfp",  -- Weight type:
                        -- sfp (8-bit FP, default)
                        -- f32 (float)
                        -- bf16 (bfloat16)
  scheduler = sched_inst,  -- Scheduler instance; if not provided, a default
                           -- scheduler will be attached.
  disabled_words = {...},  -- Words you don't want to generate.
}
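As a minimal sketch of how the optional settings above might be combined, the helper below wraps cgemma.new and prefixes the error message. The name new_gemma is hypothetical, and the assumption that disabled_words takes an array of strings is not confirmed by this document.

```lua
-- Hypothetical helper: create a Gemma instance with a few optional settings.
-- `cgemma` is the loaded module; the paths are supplied by the caller.
local function new_gemma(cgemma, tokenizer_path, weights_path)
  local inst, err = cgemma.new({
    tokenizer = tokenizer_path,
    model = "2b-it",
    weights = weights_path,
    weight_type = "sfp",           -- same as the documented default
    disabled_words = {"example"},  -- assumed format: array of strings
  })
  if not inst then
    return nil, "failed to create Gemma instance: " .. tostring(err)
  end
  return inst
end
```

It could be called as new_gemma(require("cgemma"), "/path/to/tokenizer.spm", "/path/to/2b-it-sfp.sbs").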

cgemma.scheduler

syntax: <cgemma.scheduler>sched, <string>err = cgemma.scheduler([<number>num_threads])

Create a scheduler instance.

A successful call returns a scheduler instance. Otherwise, it returns nil and a string describing the error.

The sole parameter, num_threads, sets the number of threads in the internal thread pool. If it is omitted or num_threads <= 0, a default scheduler is created whose thread count depends on the number of concurrent threads supported by the implementation.

cgemma.scheduler.pin_threads

syntax: sched:pin_threads()

Pin the scheduler's threads to logical processors.
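A short sketch of how the two scheduler calls might be used together. The helper name new_pinned_scheduler is hypothetical; it relies only on cgemma.scheduler and pin_threads as described above.

```lua
-- Hypothetical helper: create a scheduler with a fixed thread count and pin
-- its threads to logical processors. `cgemma` is the loaded module.
local function new_pinned_scheduler(cgemma, num_threads)
  local sched, err = cgemma.scheduler(num_threads)
  if not sched then
    return nil, err
  end
  sched:pin_threads()
  return sched
end
```

The returned scheduler can then be passed to cgemma.new through the scheduler option.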

cgemma.instance.disabled_tokens

syntax: <table>tokens = inst:disabled_tokens()

Query the disabled tokens of a Gemma instance.

cgemma.instance.session

syntax: <cgemma.session>sess, <number or string>seed = inst:session([<table>options])

Create a chat session.

A successful call returns the session and its random seed. Otherwise, it returns nil and a string describing the error.

Available options and default values:

{
  max_tokens = 3072,  -- Maximum number of tokens in prompt + generation.
  max_generated_tokens = 2048,  -- Maximum number of tokens to generate.
  prefill_tbatch = 64,  -- Prefill: max tokens per batch.
  decode_qbatch = 16,  -- Decode: max queries per batch.
  temperature = 1.0,  -- Temperature for top-K.
  seed = 42,  -- Random seed. (default: randomly chosen)
}
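The options can be overridden selectively. The sketch below creates a session with a caller-chosen seed; the helper name new_seeded_session and the specific option values are illustrative assumptions, not recommendations from the project.

```lua
-- Hypothetical helper: create a session with a fixed seed and a few
-- non-default options. `gemma` is an instance returned by cgemma.new.
local function new_seeded_session(gemma, seed)
  local session, seed_or_err = gemma:session({
    max_generated_tokens = 512,  -- illustrative value
    temperature = 0.7,           -- illustrative value
    seed = seed,
  })
  if not session then
    -- On failure the second return value is the error message.
    return nil, seed_or_err
  end
  -- On success the second return value is the session's random seed.
  return session, seed_or_err
end
```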

cgemma.session.ready

syntax: <boolean>ok = sess:ready()

Check if the session is ready to chat.

cgemma.session.reset

syntax: sess:reset()

Reset the session to start a new conversation.

cgemma.session.dumps

syntax: <string>data, <string>err = sess:dumps()

Dump the current state of the session to a Lua string.

A successful call returns a Lua string that stores state data (binary) of the session. Otherwise, it returns nil and a string describing the error.

cgemma.session.loads

syntax: <boolean>ok, <string>err = sess:loads(<string>data)

Load the state data from the given Lua string to restore a previous session.

A successful call returns true. Otherwise, it returns false and a string describing the error.
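As a sketch of how dumps and loads pair up, the helper below copies the state of one session into another in memory. The name copy_session_state is hypothetical, and whether the two sessions must share the same instance and options is not stated here, so treat that as an open assumption.

```lua
-- Hypothetical helper: copy the state of session `src` into session `dst`
-- via an in-memory Lua string.
local function copy_session_state(src, dst)
  local data, err = src:dumps()
  if not data then
    return false, err
  end
  -- loads returns true on success, or false and an error message.
  return dst:loads(data)
end
```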

cgemma.session.dump

syntax: <boolean>ok, <string>err = sess:dump(<string>path)

Dump the current state of the session to a specific file.

A successful call returns true. Otherwise, it returns false and a string describing the error.

cgemma.session.load

syntax: <boolean>ok, <string>err = sess:load(<string>path)

Load the state data from the given file to restore a previous session.

A successful call returns true. Otherwise, it returns false and a string describing the error.
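One possible use of the file-based variants is to checkpoint the conversation after every turn. The sketch below assumes this pattern is acceptable for your workload; chat_and_checkpoint is a hypothetical name, and the path is chosen by the caller.

```lua
-- Hypothetical helper: run one chat turn, then save the session state to
-- `path` so that a later run could restore it with sess:load(path).
local function chat_and_checkpoint(sess, text, path)
  local reply, err = sess(text)
  if not reply then
    return nil, err
  end
  local ok, dump_err = sess:dump(path)
  if not ok then
    return nil, dump_err
  end
  return reply
end
```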

cgemma.session.stats

syntax: <table>statistics = sess:stats()

Get statistics for the current session.

Example of statistics:

{
  prefill_tokens_per_second = 34.950446398036,
  generate_tokens_per_second = 9.0089134969039,
  time_to_first_token = 0.8253711364232,
  tokens_generated = 85
}
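The statistics table can be turned into a one-line summary for logging. This is a minimal sketch that uses only the four fields shown in the example above; format_stats is a hypothetical name, and no unit is printed for time_to_first_token because the unit is not stated here.

```lua
-- Hypothetical helper: format the table returned by sess:stats().
local function format_stats(stats)
  return string.format(
    "prefill: %.1f tok/s, generate: %.1f tok/s, time to first token: %.2f, tokens: %d",
    stats.prefill_tokens_per_second,
    stats.generate_tokens_per_second,
    stats.time_to_first_token,
    stats.tokens_generated
  )
end
```

It could be used as print(format_stats(sess:stats())) after a reply has been generated.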

metatable(cgemma.session).__call

syntax: <string or boolean>reply, <string>err = sess(<string>text[, <function>stream])

Generate reply.

A successful call returns the content of the reply (without a stream function) or true (with a stream function). Otherwise, it returns nil and a string describing the error.

The stream function is defined as follows:

function stream(token, pos, prompt_size)
  if pos < prompt_size then
    -- Gemma is processing the prompt
    io.write(pos == 0 and "reading and thinking ." or ".")
  elseif token then
    -- Stream the token text output by Gemma here
    if pos == prompt_size then
      io.write("\nreply: ")
    end
    io.write(token)
  else
    -- Gemma's output reaches the end
    print()
  end
  io.flush()
  -- Returning `true` indicates success; returning `false` indicates failure and terminates the generation.
  return true
end
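A stream function does not have to print. The sketch below collects the generated tokens into a string while ignoring the prompt-processing callbacks, following the pos and prompt_size convention shown above. The name collect_reply is hypothetical.

```lua
-- Hypothetical helper: generate a reply through a stream function and
-- return the collected text. `sess` is a cgemma session.
local function collect_reply(sess, text)
  local parts = {}
  local function stream(token, pos, prompt_size)
    -- Skip callbacks made while the prompt is being processed, and the
    -- final callback where `token` is nil.
    if pos >= prompt_size and token then
      parts[#parts + 1] = token
    end
    return true  -- keep generating
  end
  local ok, err = sess(text, stream)
  if not ok then
    return nil, err
  end
  return table.concat(parts)
end
```

Returning false from the inner stream function instead, for example once enough text has been collected, would terminate the generation early.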

License

BSD-3-Clause license. See LICENSE for details.
