auto trans #21

Open
gedw99 opened this issue Nov 9, 2023 · 8 comments


gedw99 commented Nov 9, 2023

Hey @vorlif

Try this:

https://translate.google.com/?sl=en&tl=de&text=%7B%0A%20%20%22%25%5B1%5Dd%20byte%22%3A%20%7B%0A%20%20%20%20%22one%22%3A%20%22%25%5B1%5Dd%20byte%22%2C%0A%20%20%20%20%22other%22%3A%20%22%25%5B1%5Dd%20bytes%22%0A%20%20%7D%2C%0A%20%20%22%25s%20GB%22%3A%20%22%25s%20GB%22%2C%0A%20%20%22%25s%20KB%22%3A%20%22%25s%20KB%22%2C%0A%20%20%22%25s%20MB%22%3A%20%22%25s%20MB%22%2C%0A%20%20%22%25s%20PB%22%3A%20%22%25s%20PB%22%2C%0A%20%20%22%25s%20TB%22%3A%20%22%25s%20TB%22%2C%0A%20%20%22%25vth_ordinal%2011%2C%2012%2C%2013%22%3A%20%7B%0A%20%20%20%20%22context%22%3A%20%22ordinal%2011%2C%2012%2C%2013%22%2C%0A%20%20%20%20%22other%22%3A%20%22%25vth%22%0A%20%20%7D%2C%0A%20%20%22%2C%20%22%3A%20%22%2C%20%22%2C%0A%20%20%22AM%22%3A%20%22AM%22%2C%0A%20%20%22PM%22%3A%20%22PM%22%2C%0A%20%20%22a.m.%22%3A%20%22a.m.%22%2C%0A%20%20%22midnight%22%3A%20%22midnight%22%2C%0A%20%20%22noon%22%3A%20%22noon%22%2C%0A%20%20%22p.m.%22%3A%20%22p.m.%22%2C%0A%20%20%22today%22%3A%20%22today%22%2C%0A%20%20%22tomorrow%22%3A%20%22tomorrow%22%2C%0A%20%20%22yesterday%22%3A%20%22yesterday%22%0A%7D%0A&op=translate

It fails because it also translates the keys, so an extractor and a merger are needed just for building machine translation on top of this. For example, the first line below comes back as the second:

```
"midnight": "midnight",
„Mitternacht“: „Mitternacht“,
```
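The key problem described above can be avoided by translating only the values of the decoded JSON and leaving the keys alone. The following is a minimal sketch of that idea, not part of Spreak: `translateFunc` and `translateValues` are hypothetical names, the translator is a fake that only adds a prefix, and skipping the `"context"` field is an assumption based on the JSON in the link above. Plural objects are still translated form by form, which does not solve the plural category problem.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// translateFunc is a hypothetical stand-in for a call to any
// machine translation provider.
type translateFunc func(text string) (string, error)

// translateValues walks decoded JSON and translates string values only.
// Object keys are copied unchanged; arrays and other types pass through.
func translateValues(node any, tr translateFunc) (any, error) {
	switch v := node.(type) {
	case string:
		return tr(v)
	case map[string]any:
		out := make(map[string]any, len(v))
		for key, child := range v {
			// Assumption: "context" is metadata and must stay untranslated.
			if key == "context" {
				out[key] = child
				continue
			}
			translated, err := translateValues(child, tr)
			if err != nil {
				return nil, err
			}
			out[key] = translated
		}
		return out, nil
	default:
		return v, nil
	}
}

func main() {
	src := `{"midnight": "midnight", "%[1]d byte": {"one": "%[1]d byte", "other": "%[1]d bytes"}}`

	var doc any
	if err := json.Unmarshal([]byte(src), &doc); err != nil {
		panic(err)
	}

	// Fake translator for demonstration; a real one would call an API.
	fake := func(s string) (string, error) { return "[de] " + s, nil }

	out, err := translateValues(doc, fake)
	if err != nil {
		panic(err)
	}
	pretty, err := json.MarshalIndent(out, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(pretty))
}
```

A real provider may still mangle format verbs such as `%[1]d` inside the values, so the output needs a human check either way.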

So I was wondering if there is any code for machine translation using any of the providers at all?

The flow is that everything runs through machine translation first, and then humans check it as a second phase.

Owner

vorlif commented Dec 4, 2023

Hi @gedw99,

there is no such thing for Spreak and there will not be in the future.
The problem is that many languages have several plural categories. For example, English has two plural categories: One and Other. In contrast, Polish has four: One, Few, Many and Other. For machine translation, it is not easy to find out for which category you want a translation.
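To make the plural category point concrete, here is a small sketch of the CLDR cardinal rules for English and Polish, restricted to non-negative integers. The function names are made up for this example and are not Spreak's API. For integers, Polish never returns `other`, because that category only applies to fractions.

```go
package main

import "fmt"

// englishCategory returns the CLDR cardinal plural category
// for a non-negative integer in English.
func englishCategory(n int) string {
	if n == 1 {
		return "one"
	}
	return "other"
}

// polishCategory returns the CLDR cardinal plural category
// for a non-negative integer in Polish.
func polishCategory(n int) string {
	mod10, mod100 := n%10, n%100
	switch {
	case n == 1:
		return "one"
	case mod10 >= 2 && mod10 <= 4 && (mod100 < 12 || mod100 > 14):
		return "few"
	default:
		// Every remaining integer is "many" (0, 5-21, 25-31, ...).
		return "many"
	}
}

func main() {
	for _, n := range []int{1, 2, 5, 12, 22, 25} {
		fmt.Printf("%d: en=%s pl=%s\n", n, englishCategory(n), polishCategory(n))
	}
}
```

An English source with only `one` and `other` gives a translation service nothing to derive separate Polish `few` and `many` strings from.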

But I agree with you that it's an interesting idea. I am therefore currently working on redesigning the catalog processing and making it more usable. If you like, you can test this and write your own script. If you search for Go libraries for Google Translate, DeepL or similar, you will find plenty.

I have created a gist as a small template of how to start the process.

Author

gedw99 commented Dec 5, 2023

yep I already use some of those libraries in golang.

yep I know the pluralisation problem.

If you get something working I would love to contribute, as it's a very common problem.

this project is easily the best approach I have seen btw.

Author

gedw99 commented Dec 5, 2023

this is one of the plugins I wanted to integrate btw. You don't want it in the spreak plugins?

It's a good one with caching: you get an immediate translation, and the remote service is only called on a cache miss, so it's good enough for most projects because the local cache keeps you from being rate limited.

```go
package main

import (
	"encoding/json"
	"fmt"

	gtranslate "github.com/gilang-as/google-translate"
)

func main() {
	// Describe the request: the text and the target language.
	value := gtranslate.Translate{
		Text: "Halo Dunia",
		//From: "id",
		To: "en",
	}

	translated, err := gtranslate.Translator(value)
	if err != nil {
		panic(err)
	}

	// Pretty-print the result as indented JSON.
	prettyJSON, err := json.MarshalIndent(translated, "", "\t")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(prettyJSON))
}
```
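The caching behaviour described above (only hit the remote service on a cache miss) can be sketched independently of any provider. This is an illustration under assumptions, not the API of the library above: `translateFunc`, `cachedTranslator` and `newCachedTranslator` are hypothetical names, and the backend in `main` is a fake.

```go
package main

import (
	"fmt"
	"sync"
)

// translateFunc is a hypothetical stand-in for a provider call.
type translateFunc func(text, to string) (string, error)

// cachedTranslator memoizes results so the backend is only
// called on a cache miss.
type cachedTranslator struct {
	mu      sync.Mutex
	cache   map[string]string
	backend translateFunc
	misses  int
}

func newCachedTranslator(backend translateFunc) *cachedTranslator {
	return &cachedTranslator{cache: make(map[string]string), backend: backend}
}

// Translate returns the cached result if present, otherwise asks the
// backend and stores the answer. The lock is held during the backend
// call, which keeps the sketch simple but serializes requests.
func (c *cachedTranslator) Translate(text, to string) (string, error) {
	key := to + "\x00" + text

	c.mu.Lock()
	defer c.mu.Unlock()

	if hit, ok := c.cache[key]; ok {
		return hit, nil
	}
	c.misses++
	out, err := c.backend(text, to)
	if err != nil {
		return "", err // errors are not cached
	}
	c.cache[key] = out
	return out, nil
}

func main() {
	// Fake backend for demonstration; a real one would call an API.
	tr := newCachedTranslator(func(text, to string) (string, error) {
		return "[" + to + "] " + text, nil
	})

	for i := 0; i < 3; i++ {
		out, err := tr.Translate("Halo Dunia", "en")
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
	fmt.Println("backend calls:", tr.misses)
}
```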

Author

gedw99 commented Dec 6, 2023

Hey @vorlif

let me know what you think about incorporating the auto-translation idea above, and about the wider goals. Feel free to brainstorm with me.

Owner

vorlif commented Dec 6, 2023

Hi @gedw99,

I'm sorry, but I don't think that should be part of this library. There are too many tools/APIs for machine translation, and each user prefers a different one. So the library would have to support several, which I would like to avoid. Another problem is the different plural forms, which cannot be translated properly.

With the above library, legal aspects come into play that I don't want to deal with.

I see two options:

  1. For the translation of JSON files, each user can simply write a small script, similar to the Gist above, and decide for themselves which API they want to use and how they want to deal with the plural forms.
  2. For translations at runtime, you can simply write your own catalog. This could wrap the JSONCatalog and perform the translations at runtime.

If you create a catalog or script, I would be happy to link it in the README. But I don't think it should be part of the library itself.

Were you thinking more of a translation of files during development or a translation at runtime?

Author

gedw99 commented Dec 7, 2023

Ok got it @vorlif

Really appreciate the feedback as I like this system a lot.

So the machine translation will live in my repo, and then I can rig things up so that the catalog system uses my machine translator.
I am not sure how just yet, but that's the plan.

Is that cool?

Owner

vorlif commented Dec 7, 2023

Hi @gedw99,

I'm glad to hear that.

I think that sounds like a good plan. If you have something, I'd be happy to link it in the README.

If you need any clarification about creating a catalog, I will be happy to assist you. Would you like to have machine translation performed during development or at runtime of your program?

Author

gedw99 commented Dec 9, 2023

hey @vorlif

Will get back to you when I have something to link to. Have no time right now though …

Machine translation at dev time is my thinking, so the code holds everything. This means that the cache is committed to a repo, which is fine as it's just some JSON files. Not big, and merge-able.
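A committed JSON cache stays merge-able when its serialization is stable. The sketch below relies on the fact that Go's `encoding/json` writes map keys in sorted order; `mergeCache` and `encodeCache` are hypothetical helper names, and letting committed entries win over fresh machine output is an assumption that fits the "humans check in a second phase" flow.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeCache returns a new cache containing all committed entries plus
// any fresh entries that are not committed yet. Committed entries win,
// so human-reviewed translations are never overwritten.
func mergeCache(committed, fresh map[string]string) map[string]string {
	out := make(map[string]string, len(committed)+len(fresh))
	for k, v := range fresh {
		out[k] = v
	}
	for k, v := range committed {
		out[k] = v
	}
	return out
}

// encodeCache serializes the cache with sorted keys and a trailing
// newline, so repeated runs produce small, line-based diffs.
func encodeCache(cache map[string]string) ([]byte, error) {
	b, err := json.MarshalIndent(cache, "", "  ")
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	committed := map[string]string{"today": "heute"}
	fresh := map[string]string{"today": "Heute (mt)", "noon": "Mittag"}

	b, err := encodeCache(mergeCache(committed, fresh))
	if err != nil {
		panic(err)
	}
	fmt.Print(string(b))
}
```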
