
Replicate LLM Model API | Voiceflow

A simple Node.js server, built with the Express framework, that uses the Replicate library to generate text. The service receives a text prompt and returns the text generated by the selected LLM model.


If you are running this on Node.js 16, either:

  1. run the application with NODE_OPTIONS='--experimental-fetch' node ..., or
  2. install node-fetch and follow the instructions here.

If you are running this on Node.js 18 or 19, you do not need to do anything.
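To see which of these cases applies, you can check for a global `fetch` when the process starts. The snippet below is a hypothetical helper (`nodeMajor` and `hasGlobalFetch` are illustrative names, not part of this repository). It only inspects the runtime and prints the same advice given above.

```javascript
// Hypothetical startup check (not part of this repository): report whether
// the running Node.js version exposes a global fetch.

// Parse the major version from a version string such as "18.17.1".
function nodeMajor(version = process.versions.node) {
  return Number.parseInt(version.split(".")[0], 10);
}

// True when fetch is available globally (Node.js 18+ by default, or an
// older version started with --experimental-fetch).
function hasGlobalFetch() {
  return typeof globalThis.fetch === "function";
}

if (!hasGlobalFetch()) {
  console.warn(
    `Node.js ${nodeMajor()} has no global fetch: run with ` +
      "NODE_OPTIONS='--experimental-fetch' or install node-fetch."
  );
}
```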


To install and set up the repository, follow these steps:

  1. Clone the repository:

     git clone
     cd repo

  2. Install the dependencies:

     npm install

  3. Create a .env file in the root directory (or rename .env.example to .env) containing your Replicate token:

     REPLICATE_TOKEN=your_replicate_token

     Replace your_replicate_token with your own token.

  4. (Optional) To use a specific port, add the PORT variable to your .env file:

     PORT=your_preferred_port

     Replace your_preferred_port with your desired port number.


Once the dependencies are installed and the environment variables are set, start the server with:

npm start

This starts the server at http://localhost:PORT, where PORT is the value from your .env file, or 3210 if PORT is not set.
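The environment variables above can be read and validated in one place before the server starts. This is a minimal sketch under two assumptions: the .env file has already been loaded into process.env (for example with the dotenv package; this README does not say which loader the repository uses), and `loadConfig` is a hypothetical function name, not the repository's code.

```javascript
// Hypothetical config loader (a sketch, not the repository's code): read the
// variables described above from an env object such as process.env.
const DEFAULT_PORT = 3210; // default port stated in this README

function loadConfig(env = process.env) {
  const token = env.REPLICATE_TOKEN;
  if (!token) {
    throw new Error("REPLICATE_TOKEN is not set; add it to your .env file");
  }

  // PORT is optional: fall back to the default when it is missing or empty.
  const port =
    env.PORT === undefined || env.PORT === "" ? DEFAULT_PORT : Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env.PORT}`);
  }

  return { token, port };
}
```

In an Express app, the returned port would then be passed to `app.listen(port)`.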



This endpoint accepts a POST request whose JSON body contains the prompt, the model name, and the settings for that model. The parameters are listed below; for details on each setting, see the documentation of the corresponding model (listed under Available models).


| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | String | Required. The text prompt for generation. | |
| model | String | Model identifier, available in `models.json`. | "dolly-v2-12b" |
| max_tokens | Integer | Maximum number of tokens to generate. | 100 |
| max_length | Integer | Maximum length of the generated text. | 100 |
| top_p | Float | Top-p sampling value. | 1 |
| top_k | Integer | Top-k sampling value. | |
| decoding | String | Decoding strategy, either "top_p" or "top_k". | "top_p" |
| temperature | Float | Softmax temperature. | 0.75 |
| repetition_penalty | Float | Penalty for repetitive tokens. | 1.2 |
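One way to apply the defaults from the table is to merge them under the request body and reject requests without a prompt. This is a sketch only: `DEFAULTS` and `buildSettings` are hypothetical names, and the repository may validate requests differently or pass only a subset of these settings to each model.

```javascript
// Hypothetical request-body handling (a sketch; the repository's actual
// validation may differ). Defaults are copied from the table above.
const DEFAULTS = {
  model: "dolly-v2-12b",
  max_tokens: 100,
  max_length: 100,
  top_p: 1,
  decoding: "top_p",
  temperature: 0.75,
  repetition_penalty: 1.2,
};

// Returns the prompt plus every setting, with request values taking
// precedence over the defaults.
function buildSettings(body) {
  if (typeof body?.prompt !== "string" || body.prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  // top_k has no default, so it is only present when the caller sends it.
  const settings = { ...DEFAULTS, ...body };
  if (settings.decoding !== "top_p" && settings.decoding !== "top_k") {
    throw new Error('decoding must be "top_p" or "top_k"');
  }
  return settings;
}
```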


The endpoint responds with a JSON object containing the following fields:

| Field | Type | Description |
|---|---|---|
| success | Boolean | Whether the request succeeded |
| response | String | The generated text |
| processingTimeSec | Float | Processing time, in seconds |
| error | String | Error message, if any |
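The response fields can be produced by timing the generation call and catching any error. In this sketch, `runGeneration` is a hypothetical wrapper and `generate` stands in for whatever call the server makes to the Replicate library; including processingTimeSec in error responses is an assumption.

```javascript
// Hypothetical wrapper (a sketch, not the repository's code) that times a
// generation call and returns the response fields described above.
// `generate` is any async function that resolves to the generated text.
async function runGeneration(generate) {
  const start = process.hrtime.bigint();
  // Elapsed wall-clock time in seconds, from the nanosecond counter.
  const elapsedSec = () => Number(process.hrtime.bigint() - start) / 1e9;

  try {
    const response = await generate();
    return { success: true, response, processingTimeSec: elapsedSec() };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    // Assumption: the timing is reported on failures as well.
    return { success: false, error, processingTimeSec: elapsedSec() };
  }
}
```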

Example request body:

  {
    "prompt": "Who was Dolly the sheep?",
    "max_tokens": 500,
    "temperature": 0.75,
    "repetition_penalty": 1.2
  }

Example response:

  {
    "success": true,
    "response": "Dolly could have been any sheep, but she especially became famous because she was the first successfully cloned mammal\n\n",
    "processingTimeSec": 0.5
  }

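A client, such as a quick local test script, only needs to POST the JSON body and read the fields back. The endpoint path is not given in this README, so the sketch takes the full URL as a parameter; `requestGeneration` is a hypothetical name, and the code assumes Node.js 18+ (or a fetch implementation passed in explicitly).

```javascript
// Hypothetical client (a sketch): POST a request body to the endpoint and
// return the generated text. `endpointUrl` must be the full URL of the
// endpoint on your server (or your ngrok URL); the path is not given here.
async function requestGeneration(endpointUrl, body, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(endpointUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  const data = await res.json();

  // Surface the server's error field when the request did not succeed.
  if (!data.success) {
    throw new Error(data.error || `Request failed with status ${res.status}`);
  }
  return data.response;
}
```

Passing a fetch implementation as the third argument also makes the function easy to test without a running server.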
Available models

Here is a list of the models you can use with this API. You can edit the models.json file to add or remove models.

| Model Name | Creator |
|---|---|
| dolly-v2-12b | Databricks |
| stablelm-tuned-alpha-7b | Stability AI |
| flan-t5-xl | Google |
| llama-7b | Meta AI |
| oasst-sft-1-pythia-12b | Open-Assistant |
| gpt-j-6b | EleutherAI |

Each model has a split setting, which can be true or false. It is used to join the response array into a single string for models that return an array of strings.
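A model lookup and the split behavior could be combined as follows. This is a sketch under assumptions: the structure of models.json is not documented here beyond the split setting, `resolveModel` and `normalizeOutput` are hypothetical helpers, and treating `split: true` as "join the array of strings with no separator" is an interpretation of the sentence above, not confirmed behavior.

```javascript
// Hypothetical helpers (a sketch; the real models.json fields other than
// `split` are not documented here).
const DEFAULT_MODEL = "dolly-v2-12b"; // default from the parameter table

// Look up a model entry by name, falling back to the default model.
function resolveModel(models, name = DEFAULT_MODEL) {
  if (!Object.prototype.hasOwnProperty.call(models, name)) {
    throw new Error(`Unknown model "${name}"; add it to models.json first`);
  }
  return models[name];
}

// Assumption: `split: true` marks a model whose output is an array of
// string chunks that must be joined into one string; any other output is
// returned unchanged.
function normalizeOutput(output, split) {
  if (split && Array.isArray(output)) {
    return output.join("");
  }
  return output;
}
```

For example, `normalizeOutput(output, resolveModel(models, body.model).split)` would turn an array such as `["Dolly ", "was "]` into a single string.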

Using ngrok

To expose the app externally on the port set in your .env file, you can use ngrok:

  1. Install ngrok.
  2. Run ngrok http <port> in your terminal, replacing <port> with the port set in your .env file.
  3. Copy the public URL that ngrok generates and use it in the API step of your Voiceflow Assistant.

This is handy for quickly testing the service from an API step in your Voiceflow Assistant.

