
Streaming API #37

Open
bkutasi opened this issue Jun 6, 2023 · 5 comments

@bkutasi

bkutasi commented Jun 6, 2023

Foremost, this is a terrific project.
I've been trying to integrate it with other apps, but the API is a bit different from other implementations like KoboldAI and its API, or textgen-webui and its API examples.
I could get it to work (while the webapp is running) with the following script. My knowledge is limited, so it's probably not the best approach:

import requests
import json
import sys

url = 'http://0.0.0.0:5005/api/userinput'
data = {'user_input': 'What time is it? Write a very looong essay about time.'}
headers = {'Content-type': 'application/json'}

# send the POST request and stream the response
response = requests.post(url, data=json.dumps(data), headers=headers, stream=True)

# extract the text values from the JSON response,
# skipping any blank keep-alive lines iter_lines() may yield
text_values = (json.loads(line).get('text') for line in response.iter_lines() if line)
for text_value in text_values:
    print(text_value, end="")
    sys.stdout.flush()  # flush the output buffer

What do you think about the possibility of adding a streaming API endpoint at /api/stream that is decoupled from the backend's user handling and message saving, and is "stateless" so it follows REST principles? Since this is one of the most performant backends, that would surely boost its popularity.
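Such a stateless endpoint could be sketched with nothing but the standard library. This is only a sketch of the idea: the route, port, JSON-lines framing, and `fake_token_stream` (a dummy stand-in for the actual model generator) are all assumptions, not exllama's real API.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_token_stream(prompt):
    # Dummy stand-in for the model's token generator (assumption).
    for word in ("Hello", " from", " the", " stream"):
        yield word


class StreamHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # required for chunked transfer encoding

    def do_POST(self):
        if self.path != "/api/stream":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        prompt = body.get("user_input", "")

        # No server-side session state: everything needed comes in the request.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for token in fake_token_stream(prompt):
            # One JSON object per line, each sent as its own HTTP chunk.
            chunk = (json.dumps({"text": token}) + "\n").encode()
            self.wfile.write(f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # chunked-transfer terminator
        self.close_connection = True

    def log_message(self, *args):
        pass  # keep the example quiet


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 5005), StreamHandler).serve_forever()
```

A client like the script above (with `stream=True` and `iter_lines()`) can consume this directly; swapping `fake_token_stream` for a real generator is the only model-specific part.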

@turboderp
Owner

There are some people already working on APIs. But it is on my list. I just need to do a little more research to figure out what the best, minimal stateless API would look like.

@disarmyouwitha

disarmyouwitha commented Jun 6, 2023

@bkutasi I have a (very) basic "stateless" API wrapper for exllama that might point you in the right direction:
https://github.com/disarmyouwitha/exllama/blob/master/fast_api.py
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_chat.html
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_request.py

fast_api.py is just a FastAPI wrapper around the model and generate_simple functions. It takes the -d argument for the model directory, loads the model, and starts listening on port 7862 for POST requests to http://localhost:7862/generate

You can go to /chat to have FastAPI serve the HTML, which lets you use the page from a browser.

fastapi_request.py is an example script of how to call the API from python.

This is just a quick implementation, I will actually be revisiting this code to work in some of the new improvements Turboderp made... after I get in a bit of Diablo4 this week ^^;

@bkutasi
Author

bkutasi commented Jun 7, 2023

Your implementation looks great; I will try it out right away. I would love to see it merged into the main branch down the line (in some form).

@bkutasi
Author

bkutasi commented Jun 7, 2023

@disarmyouwitha your FastAPI wrapper is working great, but the web interface does not send generation requests when it isn't accessed through localhost, even when the server is listening on 0.0.0.0. Other requests are probably also not sent, but the page itself loads.
Basically, everything Jinja2-related works, but the other two requests do not.
Sorry for mentioning it here, but I didn't see issue reporting enabled on your repo. I hope turboderp won't mind; otherwise, let's move the discussion.

@disarmyouwitha

@bkutasi oh hm, I never noticed you had to enable issues. I have opened up the Issues tab in my repo; if you continue to have problems we can follow up there =]

Are you accessing the GUI by clicking the .html file, or by going to http://host:7862/chat?

If accessing it through the HTML file it will always assume localhost:

// Check if the page was loaded from FastAPI or opened independently
if (!window.location.href.startsWith("http://{{host}}:{{port}}/")) 
{
    host = "localhost";
    port = "7862";
}

If accessing through /chat it should be trying to determine your host like this:

@app.get("/chat")
async def chat(request: Request, q: Union[str, None] = None):
    return templates.TemplateResponse("fastapi_chat.html", {"request": request, "host": socket.gethostname(), "port": _PORT})

(But maybe I was trying to be too clever and broke something)

I have the FastAPI running on a headless server, so I access the page like this:
http://wintermute:7862/chat

And in fastapi_requests.py I use:
r = requests.post("http://wintermute:7862/generate", json=data, stream=True)
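That one-liner can be expanded into a small standalone client. A sketch only: the payload fields (`prompt`, `max_new_tokens`) and the plain-text response framing are assumptions here, so check fastapi_request.py for the actual fields; the `print_stream` helper is hypothetical.

```python
import sys


def print_stream(chunks):
    """Decode and print byte chunks as they arrive, flushing for live output.

    Returns the full concatenated text so the caller can reuse it.
    """
    parts = []
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            text = chunk.decode("utf-8", errors="replace")
            parts.append(text)
            print(text, end="", flush=True)
    return "".join(parts)


if __name__ == "__main__":
    # requests is imported here so the helper above stays dependency-free.
    import requests

    data = {"prompt": "What time is it?", "max_new_tokens": 128}
    r = requests.post("http://wintermute:7862/generate", json=data, stream=True)
    r.raise_for_status()
    # chunk_size=None yields chunks as soon as the server sends them.
    print_stream(r.iter_content(chunk_size=None))
    sys.stdout.write("\n")
```

If the endpoint emits JSON lines instead of raw text, swap `iter_content` for `iter_lines()` and parse each line with `json.loads` as in the script earlier in the thread.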

It may be worth mentioning that you will probably need to open port 7862 in the firewall to access it from another machine:
sudo ufw allow 7862
