How do I spell-check multiple languages? #21

Closed
edward-martyr opened this issue Dec 23, 2019 · 11 comments

@edward-martyr

Describe the solution you'd like
Being able to assign a list of languages to ltex.language, e.g., "ltex.language": ["en-GB", "zh-CN"]

@edward-martyr edward-martyr added the 1-feature-request ✨ Issue type: Request for a desirable, nice-to-have feature label Dec 23, 2019
@valentjn
Owner

@edward-martyr What do you mean by "multiple languages"? Do you have multiple languages in one workspace, with each file in a single language? Or do you have multiple languages in one file?

In the former case, you can use VS Code's multi-root workspace feature to have different settings (which include ltex.language) for different resources. However, LanguageTool seems to have some kind of language detection, so maybe this could be simplified (if the LaTeX artifacts don't confuse the detector).

In the latter case, you're out of luck: LT doesn't have multilingual support right now.

@valentjn valentjn added the 2-needs-info Issue status: We need more information (usually) from the submitter before continuing label Dec 23, 2019
@edward-martyr
Author

@valentjn Thanks for the comment. In most cases, I work with multiple languages, each in a separate project, so that helps a lot. However, there are occasions where I work with Chinese and Japanese, or English and Chinese, in a single file. I guess I'll have to wait for any relevant updates.

@edward-martyr
Author

Especially considering that Chinese and Japanese share the same script, it would be hard to differentiate and spell-check them simultaneously. It would be much easier in the case of English and Chinese.

@valentjn valentjn removed the 2-needs-info Issue status: We need more information (usually) from the submitter before continuing label Jan 15, 2020
@valentjn
Owner

So LT has the concept of "alternative languages," which means that if a word is unknown in the main language, then the word will be looked up in these alternative languages. If it's in one of those, then there won't be a spelling error.
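
To make the fallback behavior concrete, here is a minimal sketch of the lookup described above. It is not LanguageTool's implementation: the function name and the toy word sets are assumptions made up for illustration, and a real checker would use full dictionaries.

```python
def find_spelling_errors(words, main_dictionary, alternative_dictionaries):
    """Return words unknown in the main language and in every alternative language."""
    errors = []
    for word in words:
        key = word.lower()
        if key in main_dictionary:
            continue
        # Unknown in the main language: fall back to the alternative languages.
        if any(key in alternative for alternative in alternative_dictionaries):
            continue
        errors.append(word)
    return errors


# Toy word lists, purely illustrative (hypothetical data).
ENGLISH = {"the", "meeting", "starts", "at", "noon"}
GERMAN = {"das", "treffen", "beginnt", "mittags"}

words = "The Treffen starts at noon tomorow".split()
print(find_spelling_errors(words, ENGLISH, [GERMAN]))  # prints ['tomorow']
```

Note that "Treffen" is accepted only because it exists in the alternative word list; nothing checks whether the surrounding grammar is correct, which is the limitation mentioned in the next paragraph.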

However, I'm a bit reluctant to implement this. This would only be about spell checking, not grammar checking. And the current architecture of LTeX is focused on single languages: each language has its own user dictionary and, optionally, additional rules from neural networks or Word2Vec. I could also imagine that resource consumption (CPU, memory) would increase even further from keeping these languages in memory. So this would be a big change, and then you don't even have grammar checking.

I'll mark this issue as upstream as I would wait until LT has proper multi-lingual support. In the meantime, if someone wants to implement these alternative languages, I won't stop them 😉

@valentjn valentjn added the 2-upstream Issue status: Bug is caused by some dependency, might have to wait before continuing label Feb 16, 2020
@ghost

ghost commented May 20, 2020

We could:

  1. Specify several languages
  2. Test a file against all of those
  3. Choose the language with the fewest errors as the matched language
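
The three steps above can be sketched as follows. This is only an illustration under simplifying assumptions: the "checker" is a toy word-set lookup rather than LanguageTool, and the names `count_errors`, `guess_language` and `DICTIONARIES` are hypothetical.

```python
def count_errors(text, dictionary):
    """Count the words of `text` that are missing from `dictionary`."""
    return sum(1 for word in text.lower().split() if word not in dictionary)


def guess_language(text, dictionaries):
    """Return the language code whose dictionary yields the fewest errors."""
    return min(dictionaries, key=lambda code: count_errors(text, dictionaries[code]))


# Step 1: the candidate languages (toy word lists, hypothetical data).
DICTIONARIES = {
    "en-US": {"the", "cat", "sleeps", "on", "sofa"},
    "de-DE": {"die", "katze", "schläft", "auf", "dem", "sofa"},
}

# Steps 2 and 3: check the text against every candidate and keep the best one.
print(guess_language("Die Katze schläft auf dem Sofa", DICTIONARIES))  # prints de-DE
```

With a real checker, every candidate language means one full check of the file, so the cost grows with the number of languages listed.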

@valentjn
Owner

@St-Ex-savadenn I think the current scope of the issue is having multiple languages in a single file. If your files are in one language each, and in different folders, then you can use multiple config.jsons with different values for ltex.language.

One way to cope with multiple languages in one file could be magic comments. Something like % ltex: language=en-US in LaTeX would work if your language parts are not just single words. You could also use that for whole files if the comment is at the top.
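
As a rough sketch of how such comments could split a file into per-language parts: the regular expression below only covers the single form `% ltex: language=<code>` mentioned above, and the function name and segment format are assumptions for illustration, not the extension's actual parser.

```python
import re

# Matches a LaTeX comment line such as "% ltex: language=en-US".
MAGIC_COMMENT = re.compile(r"^\s*%\s*ltex:\s*language=([A-Za-z-]+)\s*$")


def split_by_language(lines, default_language):
    """Group lines into (language, lines) segments, switching at magic comments."""
    segments = [(default_language, [])]
    for line in lines:
        match = MAGIC_COMMENT.match(line)
        if match:
            # Start a new segment; the comment line itself is not checked.
            segments.append((match.group(1), []))
        else:
            segments[-1][1].append(line)
    # Drop segments that ended up empty (e.g. a comment on the first line).
    return [(language, text) for language, text in segments if text]


document = [
    "% ltex: language=en-US",
    "This paragraph is written in English.",
    "% ltex: language=de-DE",
    "Dieser Absatz ist auf Deutsch geschrieben.",
]
for language, text in split_by_language(document, "en-US"):
    print(language, text)
```

Each segment could then be handed to the checker with its own language, which is why this approach works for paragraphs but not for single foreign words inside a sentence.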

Just for the record, LanguageTool has built-in language detection via https://github.com/optimaize/language-detector. It's based on n-grams, so it's very fast. The downside is that it only gives you generic languages like en or de instead of specific variants like en-US or de-DE, and with a generic language you only get grammar errors, not spelling errors.
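
For readers unfamiliar with n-gram detection, here is a toy sketch of the general idea using character trigrams. It is not how optimaize/language-detector is implemented; the scoring (a plain trigram overlap count) and the tiny training sentences are assumptions chosen to keep the example short.

```python
from collections import Counter


def trigrams(text):
    """Count the character trigrams of `text`, lower-cased."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def detect_language(text, profiles):
    """Return the language code whose profile shares the most trigrams with `text`."""
    grams = trigrams(text)

    def overlap(code):
        # A Counter returns 0 for trigrams that the profile has never seen.
        return sum(min(count, profiles[code][gram]) for gram, count in grams.items())

    return max(profiles, key=overlap)


# Tiny training samples, for illustration only; real profiles come from large corpora.
PROFILES = {
    "en": trigrams("the quick brown fox jumps over the lazy dog and this is a short sentence"),
    "de": trigrams("der schnelle braune fuchs springt über den faulen hund und das ist ein kurzer satz"),
}

print(detect_language("this is the dog", PROFILES))
```

Note that the profiles are keyed by generic codes (en, de): trigram statistics can tell English from German, but they say little about en-US versus en-GB, which matches the limitation described above.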

@valentjn valentjn removed the 2-upstream Issue status: Bug is caused by some dependency, might have to wait before continuing label May 20, 2020
@valentjn valentjn self-assigned this May 22, 2020
@valentjn valentjn added this to the 5.0.0 milestone May 22, 2020
@nweldev

nweldev commented May 22, 2020

@valentjn What do you mean by config.json? For now I only know about the VSCode settings.json. Therefore, I can't define different languages for different folders in the same workspace. If there is a specific configuration file for LTeX, that's awesome 😃 But I wasn't able to find any mention of it in the docs.

@valentjn
Owner

@noelmace Sorry, I meant settings.json, not config.json. There's no extra configuration for LTeX, apart from what can be set in settings.json. I was talking about multi-root workspaces. It's possible to have multiple workspaces in one project, each with its own settings.json (even nested). It's probably overkill to use this just for having different languages. Therefore, I did a first implementation of those magic comments, since that's more flexible. It will be in the next release.

@valentjn
Owner

The code for magic comments is now in master. It will be in the upcoming 5.0.0 release. Here's a sneak preview for the syntax: https://valentjn.github.io/vscode-ltex/docs/advanced-features.html#magic-comments. I think this should be enough for most multilingual users.

@valentjn valentjn added the 3-fixed Issue resolution: Issue has been fixed on the develop branch label May 24, 2020
@valentjn
Owner

valentjn commented Jun 1, 2020

Fix released in 5.0.0.

@RenatoLopes771

For those coming from Google: if you set "ltex.language": "auto", it can actually work with multiple languages. This is not the best way to go about things, as the documentation advises against using this setting. I'm using it with Brazilian Portuguese and English, and it's working well so far.

If I have a single sentence in English, it doesn't give me errors, but if I have an English word in a Brazilian Portuguese sentence, it gives me the "estrangeirismo" error, which is raised when you use a word from outside the language to describe something. Granted, it's not perfect: some English phrases still get the "estrangeirismo" error, but not everything is flagged, as it would be if you set the language to just pt-BR.

TIP: You can create a .vscode folder and a settings.json file inside that folder to keep this setting only to that project, while having a main language in your standard settings.json for your other projects.
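
If you set this up for several projects, the tip above can be scripted. The sketch below is only an illustration: the helper name `write_project_settings` is hypothetical, and it overwrites any existing .vscode/settings.json instead of merging with it, so use it only for fresh projects.

```python
import json
import tempfile
from pathlib import Path


def write_project_settings(project_dir, language):
    """Create <project_dir>/.vscode/settings.json with a project-specific ltex.language."""
    settings_path = Path(project_dir) / ".vscode" / "settings.json"
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings = {"ltex.language": language}
    # Assumption: no settings file exists yet; an existing one would be overwritten.
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings_path


# Demonstration in a throwaway directory so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as project_dir:
    path = write_project_settings(project_dir, "auto")
    print(path.read_text(encoding="utf-8"))
```

Your user-level settings.json stays untouched, so other projects keep their main language.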
