
Checking of long documents fails with LanguageTool Premium API #215

Open
protyposis opened this issue Jan 27, 2023 · 9 comments
Labels
1-bug 🐛 Issue type: Bug report (something isn't working as expected) 2-unconfirmed Issue status: Bug that needs to be reproduced (all new bugs have this label)

Comments

@protyposis

Describe the bug

Long documents are not checked when the LanguageTool HTTP API is used. Requests fail with HTTP status 413 (Payload Too Large). Checking a (short) selection of the document works as expected, and the full document is also checked correctly when the local LanguageTool instance is used (i.e., when no languageToolHttpServerUri is configured). A "long" document here means one with, e.g., 8,000 words and 50,000 characters.

Steps to reproduce

Open a long document in VSCode and wait forever for language check results (or watch the failure in the LTeX Language Server logs).

Expected behavior

The document is fully checked. If the text is too long, I would expect it to be split into multiple requests that are processed successfully. In the worst case, it should at least show a visible warning to the user instead of failing silently.
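For illustration, the splitting idea above could be sketched as follows. This is a hypothetical illustration, not LTeX code: splitIntoChunks and the limit value are made up, and a real implementation would also need to preserve annotation offsets across chunks.

```kotlin
// Sketch: break text into chunks no longer than maxChars, preferring to cut
// at whitespace so words stay intact. Function name and limit are illustrative.
fun splitIntoChunks(text: String, maxChars: Int): List<String> {
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        var end = minOf(start + maxChars, text.length)
        if (end < text.length) {
            // Search backward for a space inside the window; fall back to a
            // hard cut if the window contains none.
            val lastSpace = text.lastIndexOf(' ', end - 1)
            if (lastSpace > start) end = lastSpace + 1
        }
        chunks.add(text.substring(start, end))
        start = end
    }
    return chunks
}
```

Each chunk could then be sent as its own /v2/check request, at the cost of more requests against any per-minute quota.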

Sample document

Reproduction sample can be generated on https://www.lipsum.com/ by choosing 8000 words.

LTeX configuration

    "ltex.languageToolHttpServerUri": "https://api.languagetoolplus.com/",
    "ltex.languageToolOrg.username": "[removed]",
    "ltex.languageToolOrg.apiKey": "[removed]"

LTeX LS log

Jan 27, 2023 7:19:22 PM org.bsplines.ltexls.server.DocumentChecker logTextToBeChecked
FINE: Checking the following text in language 'en-US' via LanguageTool: "[removed]"... (truncated to 100 characters)
Jan 27, 2023 7:19:23 PM org.bsplines.ltexls.languagetool.LanguageToolHttpInterface checkInternal
SEVERE: LanguageTool failed with HTTP status code 413
Jan 27, 2023 7:19:23 PM org.bsplines.ltexls.server.DocumentChecker checkAnnotatedTextFragment
FINE: Obtained 0 rule matches

Version information

  • Operating system: Windows 11
  • vscode-ltex: 13.1.0
  • ltex-ls: no idea how to figure this out from the VSCode extension
@protyposis protyposis added 1-bug 🐛 Issue type: Bug report (something isn't working as expected) 2-unconfirmed Issue status: Bug that needs to be reproduced (all new bugs have this label) labels Jan 27, 2023
@real-or-random

I see the same issue on Emacs / lsp-ltex-ls (though I can't find a log that confirms the 413 error code).

For me, the limit seems to be around 20,000 characters, and according to https://languagetoolplus.com/http-api/#/default, that would mean my credentials aren't really taking effect... Do you have any hints on how to debug this?

@real-or-random commented Feb 8, 2023

Okay, I did some more checking by enabling logging.

  • When I put a wrong username/API key in the config, I get a 403, and with the correct credentials I see corrections for Premium rules, so the credentials work in general.
  • When I try the API manually with the failing tests (using the web interface https://languagetoolplus.com/http-api/#/default), checking works.
  • In the log I see a lot of these:
FINEST: annotatedTextParts = [TEXT("L"), TEXT("o"), TEXT("r"), TEXT("e"), TEXT("m"), MARKUP(" "), FAKE_CONTENT(" "), TEXT("i"), TEXT("p"), TEXT("s"), TEXT("u"), TEXT("m"), MARKUP(" "), FAKE_CONTENT(" "), TEXT("d"), TEXT("o"), TEXT("l"), TEXT("o"), TEXT("r"), MARKUP(" "), FAKE_CONTENT(" "), TEXT("s"), TEXT("i"), TEXT("t"), MARKUP(" "), ...

Are the texts actually sent like this in the JSON, i.e., split into a separate element for every single character? If so, the request body becomes far larger than the text itself.

Edit: It seems the answer is yes:

private fun convertAnnotatedTextToJson(annotatedText: AnnotatedText): JsonElement {
  val jsonDataAnnotation = JsonArray()
  val parts: List<TextPart> = annotatedText.parts
  var i = 0

  while (i < parts.size) {
    val jsonPart = JsonObject()

    if (parts[i].type == TextPart.Type.TEXT) {
      jsonPart.addProperty("text", parts[i].part)
    } else if (parts[i].type == TextPart.Type.MARKUP) {
      jsonPart.addProperty("markup", parts[i].part)

      if ((i < parts.size - 1) && (parts[i + 1].type == TextPart.Type.FAKE_CONTENT)) {
        i++
        jsonPart.addProperty("interpretAs", parts[i].part)
      }
    } else {
      // should not happen
      i++
      continue
    }

    jsonDataAnnotation.add(jsonPart)
    i++
  }

  return jsonDataAnnotation
}

I'm not sure whether this is the root cause, but the conversion can certainly be changed to produce much shorter JSON.
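For illustration, the shortening could work by coalescing adjacent parts of the same type before serializing, so that per-character runs like TEXT("L"), TEXT("o"), ... collapse into a single element per run. The Part/PartType types below are hypothetical stand-ins for ltex-ls's TextPart (MARKUP/FAKE_CONTENT pairs would need extra care and are left out); this is a sketch of the idea, not the actual patch from #228.

```kotlin
// Hypothetical stand-ins for ltex-ls's TextPart; the real class is not
// reproduced here.
enum class PartType { TEXT, MARKUP }
data class Part(val type: PartType, val content: String)

// Merge runs of adjacent parts that share a type, so "L","o","r","e","m"
// becomes a single "Lorem" element. Fewer JSON objects means a much smaller
// request body for the same annotated text.
fun coalesce(parts: List<Part>): List<Part> {
    val merged = mutableListOf<Part>()
    for (part in parts) {
        val last = merged.lastOrNull()
        if (last != null && last.type == part.type) {
            merged[merged.size - 1] = Part(last.type, last.content + part.content)
        } else {
            merged.add(part)
        }
    }
    return merged
}
```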

@real-or-random

I'm unsure if this is the root cause, but the loop can certainly be optimized to produce shorter JSON.

Okay, this is the root cause... I have some local changes that optimize the JSON output. I can open a PR soon.

@Musta-Pollo

That would be very nice 👍

@real-or-random

See #228, which works well for me locally.

Still, more should be done. We should at least truncate the request at the API's character limit. We could also split the text into multiple requests, but I'm not convinced that would be much better, because then you easily hit the per-minute request limits.
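The truncation fallback mentioned above could look roughly like this: keep whole parts until a character budget is exhausted. truncateParts and maxChars are illustrative names, not LTeX or LanguageTool API.

```kotlin
// Sketch: keep whole text parts until adding the next one would exceed the
// character budget, then stop. The tail of the document simply goes unchecked
// (better than a silent 413 on the whole request).
fun truncateParts(parts: List<String>, maxChars: Int): List<String> {
    val kept = mutableListOf<String>()
    var used = 0
    for (part in parts) {
        if (used + part.length > maxChars) break
        kept.add(part)
        used += part.length
    }
    return kept
}
```

Paired with a user-visible warning when truncation happens, this would at least make the behavior predictable.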


By the way, it's still a good idea to set ltex.checkFrequency to "save" to avoid hitting the API limits. The low limits make Premium much less useful, and not clearly better than the open-source version. I complained about them at https://forum.languagetool.org/t/disappointing-api-limits-for-premium/8728; feel free to join in if this bothers you too.

@intractabilis

I still see this problem in VS Code. Was the extension updated on the VS Code marketplace? Should I install something manually?

@intractabilis

I tried a nightly build from the release section of the GitHub repository. VS Code shows "Starting LTeX..." at the bottom forever. The LTeX Language Server output shows

[Info  - 7:20:58 PM] Starting ltex-ls...
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Jul 25, 2023 7:21:01 PM org.bsplines.ltexls.server.LtexLanguageServer initialize
INFO: ltex-ls 16.0.1-alpha.1.nightly.2023-07-25 - initializing...
Jul 25, 2023 7:21:01 PM org.bsplines.ltexls.tools.I18n setLocale

I tried changing the Java runtime; that didn't help. How can I get the fix for this problem?

@ritscAlex

I get the same problem with nvim v0.9.1 and ltex-ls v16.0.0 set up via null-ls. However, the same file gets checked correctly in VS Code with vscode-ltex v13.1.0.

@intractabilis

@danielnaber I reached out to LanguageTooler GmbH. I explained to support that the advertised limit of 150,000 characters is misleading, because it counts the characters of the augmented (annotated) text rather than the actual text being checked, and users cannot control the former. Since this is entirely unexpected for any user, it amounts to lying to customers about the product they buy. I suggested applying the limit to the characters of the actual text. Support replied, “This is not intended to be changed,” and ghosted me. They didn't address the misrepresentation at all. I talked to a lawyer in the US, but he said that the company being based in Germany makes it difficult. So, if you are in Germany, you may be able to file a consumer fraud complaint.

Meanwhile, I've done the minimum I could: I have canceled my Premium subscription. The funny part is that they sent me an (automated) email asking me to cancel my cancellation. Among other things, the sales pitch mentioned that “It can also check longer texts with up to 100,000 characters.” I replied that the 100,000-character claim is misleading because it counts an arbitrary amount of augmentation, not the actual text. They didn't answer. Oh well. I switched to Grammarly.
