JSON parse errors handling streamed responses #44
Hey @rcgtnick, thanks for the report. I'm confused by this. As far as I can tell, you are using your server as some kind of proxy between twinny and the Ollama instance? Are you running a proxy server between Ollama and your API? If so, are you also piping the request through, or taking the input and making an API call without streaming? Many thanks.
I may be wrong about the cause, then. The only thing between Twinny and Ollama is a reverse proxy, which shouldn't be modifying headers or request bodies at all. I captured some network traffic for a request between the proxy and Ollama to see what is happening on the wire. Looks like it's not the chunking: while the last chunk is quite large, it's still just one chunk. However, if I enable developer tools in VSCode and add a console.log, I can see the response object is cut off right before Twinny tries to parse it.

Interestingly, though, a capture from the local Ollama request shows an even larger response, but it's received by Twinny all together and parsed just fine. Any other ideas about what would cause this?
After looking at both captures, all of the JSON that is returned/printed in the captures seems to be parseable, so it's harder for me to understand what could be causing it. If it is as you say and large responses are the issue, I think a fix might be to check that the response is complete before parsing it.
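Whatever the exact check, a common guard for this situation is to buffer incoming text and treat a parse failure as "incomplete, wait for more data". A minimal sketch in TypeScript, with hypothetical names (this is a general pattern, not necessarily what twinny shipped):

```typescript
// Buffer incoming text and only act once it parses as a whole JSON value.
// A parse failure is treated as "incomplete, keep buffering".
let buffer = "";

function handleChunk(chunk: string, onObject: (obj: unknown) => void): void {
  buffer += chunk;
  try {
    const obj = JSON.parse(buffer);
    buffer = ""; // parsed cleanly, reset for the next object
    onObject(obj);
  } catch {
    // Incomplete JSON so far; wait for the next chunk.
  }
}
```

Note that this still breaks if two complete objects land in the buffer together, which is exactly the failure mode reported further down the thread.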
Just to rule out the proxy, I opened a port directly to Ollama and captured a Twinny request from the server side. The results look like the above two captures: complete JSON objects in each chunk, but the last chunk is very large. The same error occurred: a JSON parsing error expecting a comma or closing brace.
And just so it's all here, here's an example of a cut-off response object right before Twinny tries to parse it as JSON:
(That's for a different request than the capture, so the responses won't line up exactly.)
I just added a new version to cut off everything after …
Thanks, I'll give it a try.
@rcgtnick all good?
EDIT: This was on v2.6.18.

Now it looks like multiple chunks are getting processed at the same time. Here is the console.log with the object just before it's passed to JSON.parse:

{"model":"codellama:13b-code","created_at":"2024-01-24T13:40:28.671507727Z","response":"\n","done":false}
{"model":"codellama:13b-code","created_at":"2024-01-24T13:40:28.683629072Z","response":" ","done":false}

And here's the stack trace from the extension upon calling JSON.parse:

Based on packet captures, the server sometimes sends multiple JSON objects in one response, and it looks like the extension is processing them as a single JSON object.
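A quick way to see why that fails: JSON.parse expects exactly one document, so two concatenated objects throw even though each is valid on its own. A small TypeScript illustration (not twinny's code; the payload mimics the log above):

```typescript
const payload =
  '{"model":"codellama:13b-code","response":"\\n","done":false}\n' +
  '{"model":"codellama:13b-code","response":" ","done":false}';

try {
  JSON.parse(payload); // throws: two valid objects are not one valid document
} catch (e) {
  console.log("parse failed:", (e as Error).message);
}

// Splitting on the newline between objects parses each one cleanly:
for (const line of payload.split("\n")) {
  console.log(JSON.parse(line).response);
}
```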
On v2.6.21 I see the large, truncated JSON object and a JSON parsing error, same as before v2.6.18.

How about I give you a server you can send requests to?
I'd be happy to try a server-hosted version in case there is anything further that can be done. However, I'm tempted to revert the flimsy changes introduced in the previous release to remove the context, as it appears to be just a server issue. Feel free to email me at richardmacarthy@protonmail.com
If this issue isn't high priority for you, I get it! I've stuck with this so far because I need a VSCode extension that will use a remote Ollama API, and most extensions I've tried only work with local Ollama. I've done a bunch of testing with past versions, and I've found:

- I can't reproduce the issue with 2.6.14, but I can with 2.6.15 and later versions. The prompts are longer: I have a capture of 2.6.15 sending 25kB to the server and getting a 32kB response back (which triggers the JSON parsing issue).
- In 2.6.21, if I set the Context Length from 300 (the default) down to 20, I do not encounter the issue. I get responses of around 5kB, which seem to be fine. At 100 lines of context I get responses of around 11kB, which do cause the issue.

I'll try using the extension with small values here and see if it's still helpful. I'll send you an email and get you set up with a dev server in case you want to try it out.

EDIT: I found the "Num Predict Fim" setting; it looks like that gets me back smaller responses too, probably more reliably than limiting the context length.
As for the code, I don't know JS, but it looks like maybe this pattern would be better than calling JSON.parse on each chunk as it arrives. In the SO post, the chunks are accumulated as they come in and only parsed once the stream ends.
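A sketch of that accumulate-then-parse pattern in TypeScript, assuming a Node-style response stream (illustrative names, not the SO post's exact code):

```typescript
import type { IncomingMessage } from "node:http";

// Accumulate every chunk and parse once, when the stream ends. This is
// fine for an endpoint that returns one JSON document, but a streaming
// endpoint emits many objects, so the accumulated text is not valid JSON.
function readWholeBody(res: IncomingMessage, onDone: (obj: unknown) => void): void {
  let data = "";
  res.on("data", (chunk: Buffer) => {
    data += chunk.toString(); // no parsing here, just accumulate
  });
  res.on("end", () => {
    onDone(JSON.parse(data)); // throws for a stream of many objects
  });
}
```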
I tried this ^^ and ended up where you did in an earlier attempt: sending all the chunks at once, which is a series of JSON objects but not one valid JSON object. I tried looking for chunk delimiters in the stream instead, and that seems to have done it. I'll put up a PR.
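The delimiter approach, as a minimal TypeScript sketch (hypothetical names; it relies on Ollama emitting one JSON object per newline-terminated line):

```typescript
// Split each incoming chunk on newlines and parse every complete line
// as its own JSON object.
function parseChunkLines(chunk: string, onObject: (obj: unknown) => void): void {
  for (const line of chunk.split("\n")) {
    if (line.trim() === "") continue; // skip blank segments
    onObject(JSON.parse(line));
  }
}
```

This version still assumes every read ends exactly on a newline; a later comment below gets at the case where it doesn't.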
I see, I think that there may be a better way. The suggestion to add the callback in …
I just released version 2.6.23.
I don't see v2.6.23 yet, but I tested on v2.6.22. It's not raising the error any longer, but it seems harder to get it to suggest a completion. Maybe it's not related, but I really have to coax it now.
Version 2.6.23 should be available now.
I think it would be better to look for the delimiters in the HTTP chunked transfer encoding than to try parsing every response as JSON. Trying to parse everything as JSON is less efficient, and it won't work if you get a packet that is the end of one chunk and the start of another. Newlines are built into the protocol and are meant to tell you when a chunk ends, so splitting up the data based on newlines is more correct, more reliable, and more efficient. I don't care if you use my PR or not, but there's a packet trace in the description that shows exactly what I mean.
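Handling the boundary case described here means carrying the partial trailing line over to the next read, so an object split across two chunks is reassembled before parsing. A sketch in TypeScript, with hypothetical names (not necessarily the code in the PR):

```typescript
// Carry any partial trailing line over to the next read, so an object
// split across two chunks is reassembled before it is parsed.
let pending = "";

function onData(chunk: string, onObject: (obj: unknown) => void): void {
  pending += chunk;
  const lines = pending.split("\n");
  pending = lines.pop() ?? ""; // the last segment may be an incomplete line
  for (const line of lines) {
    if (line.trim() === "") continue;
    onObject(JSON.parse(line));
  }
}
```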
Ok, thanks @rcgtnick, I'll take another look soon.
Hey @rcgtnick, I just merged #48, which includes your changes. You're correct that this is a better solution for speed and efficiency. Please let me know if it works and we'll close the issue. Many thanks.
2.6.24 looks good! Glad I could help, and thanks for all your work on Twinny!
Funny thing happened to me while I was writing a small LLM chat app that streamed responses. After a while of chatting with an LLM I started getting JSON parsing errors on my chunks. It took me a minute, but eventually I realized I already knew what the problem was and how to solve it!
Describe the bug
When I run a local Ollama server on my M1 MacBook, Twinny seems to always receive a single HTTP chunk containing a complete JSON object with the query response.
However, when running Ollama on a server with a decent GPU, sometimes the responses are large enough that they get split up into multiple chunks, and Twinny attempts to parse incomplete JSON objects.
I believe there is a bug in Twinny's handling of Ollama's chunked HTTP transfer encoding, because if I inspect the object being passed to JSON.parse in the extension, I see incomplete objects and a stack trace from Twinny. It looks like Twinny is receiving a chunk and trying to parse it, but the chunk is not guaranteed to be valid JSON data.
I don't notice this locally, just when running against a server.
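A minimal repro of that failure mode in TypeScript, independent of Ollama (the reads are made up for illustration):

```typescript
// Simulate one JSON object arriving split across two stream reads.
const reads = ['{"response":"hello', ' world","done":true}'];

for (const chunk of reads) {
  try {
    JSON.parse(chunk); // naive per-chunk parsing, as described above
  } catch (e) {
    console.log("parse failed:", (e as Error).message);
  }
}

// Reassembling the reads before parsing succeeds:
console.log(JSON.parse(reads.join("")));
```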
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Twinny should accept the completion suggestion from the server and display it in the editor.
Actual Behaviour
Twinny attempts to parse an incomplete response, resulting in a JSON parsing stack trace, and no suggestion is shown.