
Streaming API #37

Open
bkutasi opened this issue Jun 6, 2023 · 5 comments

@bkutasi

bkutasi commented Jun 6, 2023

Foremost, this is a terrific project.
I've been trying to integrate it with other apps, but the API is a bit different from other implementations like KoboldAI and its API, or textgen-webui and its API examples.
I could get it to work (while the webapp is running) with the following script. My knowledge is limited, so it's probably not the best approach:

import requests
import json
import sys

url = 'http://0.0.0.0:5005/api/userinput'
data = {'user_input': 'What time is it? Write a very looong essay about time.'}
headers = {'Content-type': 'application/json'}

# send the POST request and stream the response
response = requests.post(url, data=json.dumps(data), headers=headers, stream=True)

# extract the text values from the JSON response,
# skipping any blank keep-alive lines iter_lines() may yield
text_values = (json.loads(line).get('text') for line in response.iter_lines() if line)
for text_value in text_values:
    print(text_value, end="")
    sys.stdout.flush()  # flush the output buffer

What do you think about the possibility of adding a streaming API endpoint at /api/stream that is decoupled from the backend's user handling and message saving, and is "stateless" so it follows REST principles? Since this is one of the most performant backends, that would surely boost its popularity.
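Such a stateless endpoint could be sketched with nothing but the standard library. This is only a sketch of the idea: the route, port, JSON-lines framing, and `fake_token_stream` (a dummy stand-in for the actual model generator) are all assumptions, not exllama's real API.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_token_stream(prompt):
    # Dummy stand-in for the model's token generator (assumption).
    for word in ("Hello", " from", " the", " stream"):
        yield word


class StreamHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # required for chunked transfer encoding

    def do_POST(self):
        if self.path != "/api/stream":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        prompt = body.get("user_input", "")

        # No server-side session state: everything needed comes in the request.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for token in fake_token_stream(prompt):
            # One JSON object per line, each sent as its own HTTP chunk.
            chunk = (json.dumps({"text": token}) + "\n").encode()
            self.wfile.write(f"{len(chunk):x}\r\n".encode() + chunk + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # chunked-transfer terminator
        self.close_connection = True

    def log_message(self, *args):
        pass  # keep the example quiet


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 5005), StreamHandler).serve_forever()
```

A client like the script above (with `stream=True` and `iter_lines()`) can consume this directly; swapping `fake_token_stream` for a real generator is the only model-specific part.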

@turboderp
Owner

There are some people already working on APIs. But it is on my list. I just need to do a little more research to figure out what the best, minimal stateless API would look like.

@disarmyouwitha

disarmyouwitha commented Jun 6, 2023

@bkutasi I have a (very) basic "stateless" API wrapper for exllama that might point you in the right direction:
https://github.com/disarmyouwitha/exllama/blob/master/fast_api.py
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_chat.html
https://github.com/disarmyouwitha/exllama/blob/master/fastapi_request.py

fast_api.py is just a FastAPI wrapper around the model and generate_simple functions. It takes the -d argument for the model directory, loads the model, and starts listening on port 7862 for POST requests to http://localhost:7862/generate

You can go to /chat to have FastAPI serve the HTML, which lets you use the page from a browser.

fastapi_request.py is an example script of how to call the API from python.

This is just a quick implementation, I will actually be revisiting this code to work in some of the new improvements Turboderp made... after I get in a bit of Diablo4 this week ^^;

@bkutasi
Author

bkutasi commented Jun 7, 2023

Your implementation looks great; I will try it out right away. I would love to see it merged into the main branch down the line (in some form).

@bkutasi
Author

bkutasi commented Jun 7, 2023

@disarmyouwitha your FastAPI wrapper is working great, but the web interface does not send generation requests when it isn't accessed through localhost, even when the server is listening on 0.0.0.0. Other requests are probably also not sent, but the page itself loads.
Basically, everything Jinja2-related works, but the other two requests do not.
Sorry for mentioning it here, but I didn't see issue reporting enabled on your repo. I hope turboderp won't mind; otherwise, let's move the discussion.

@disarmyouwitha

@bkutasi oh hm, I never noticed you had to enable issues. I have opened up the Issues tab in my repo; if you continue to have problems we can follow up there =]

Are you accessing the GUI by clicking the .html file, or by going to http://host:7862/chat?

If accessing it through the HTML file it will always assume localhost:

// Check if the page was loaded from FastAPI or opened independently
if (!window.location.href.startsWith("http://{{host}}:{{port}}/")) 
{
    host = "localhost";
    port = "7862";
}

If accessing through /chat it should be trying to determine your host like this:

@app.get("/chat")
async def chat(request: Request, q: Union[str, None] = None):
    return templates.TemplateResponse("fastapi_chat.html", {"request": request, "host": socket.gethostname(), "port": _PORT})

(But maybe I was trying to be too clever and broke something)

I have the FastAPI running on a headless server, so I access the page like this:
http://wintermute:7862/chat

And in fastapi_requests.py I use:
r = requests.post("http://wintermute:7862/generate", json=data, stream=True)
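That one-liner can be expanded into a small standalone client. A sketch only: the payload fields (`prompt`, `max_new_tokens`) and the plain-text response framing are assumptions here, so check fastapi_request.py for the actual fields; the `print_stream` helper is hypothetical.

```python
import sys


def print_stream(chunks):
    """Decode and print byte chunks as they arrive, flushing for live output.

    Returns the full concatenated text so the caller can reuse it.
    """
    parts = []
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            text = chunk.decode("utf-8", errors="replace")
            parts.append(text)
            print(text, end="", flush=True)
    return "".join(parts)


if __name__ == "__main__":
    # requests is imported here so the helper above stays dependency-free.
    import requests

    data = {"prompt": "What time is it?", "max_new_tokens": 128}
    r = requests.post("http://wintermute:7862/generate", json=data, stream=True)
    r.raise_for_status()
    # chunk_size=None yields chunks as soon as the server sends them.
    print_stream(r.iter_content(chunk_size=None))
    sys.stdout.write("\n")
```

If the endpoint emits JSON lines instead of raw text, swap `iter_content` for `iter_lines()` and parse each line with `json.loads` as in the script earlier in the thread.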

It may be worth mentioning that you will probably need to open port 7862 in the firewall to access it from another machine:
sudo ufw allow 7862
