Umlaut or similar ("ä, ö, ü ...") characters cannot be displayed correctly in the console. #116

Closed
delgermurun opened this issue Jan 7, 2022 · 16 comments

@delgermurun

While labeling, Zingg does not display characters such as ä, ö, and ü correctly in the console. The console shows ? in place of each such character.

The input is a CSV file encoded in UTF-8.

  • OS: macOS Monterey. But I am running Zingg from Docker.
  • iTerm2 with zsh.
  • Zingg version 0.3.1

Thank you in advance!

@sonalgoyal
Member

@navinrathore - can you please take a look at this? You can add some of the above characters to our febrl test file and test with that.

@navinrathore
Contributor

navinrathore commented Jan 7, 2022

I quickly checked by running Zingg in an xterm (Ubuntu 20). The characters were displayed on screen. In your iTerm2, could you run cat "data-filename" and let us know whether it shows these characters?
image

@sonalgoyal
Member

@delgermurun - can you please check?

@delgermurun
Author

@navinrathore @sonalgoyal thanks for the quick responses.

cat <filename> works fine on my iTerm2.
image

@sonalgoyal
Member

Which version of Java are you using, @delgermurun?

@delgermurun
Author

@sonalgoyal I am running Zingg's Docker image directly (with my data directory bound as a volume).
Here is the output of the java -version command:

I have no name!@0bf5a68b729f:/zingg-0.3.1-SNAPSHOT/scripts$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

@sonalgoyal
Member

Hmm, does cat on the file work fine inside the Docker container?

@delgermurun
Author

Just checked. Yes, cat works fine inside the Docker container.

@sonalgoyal
Member

sonalgoyal commented Jan 7, 2022

@delgermurun let us run a few tests to narrow down this problem.

  • Update test.csv in examples/febrl with the characters
  • Run match and see what the output under /tmp/zinggOutput looks like
  • Run findTrainingData and label and see what the console shows

@sonalgoyal
Member

@navinrathore have we tested the umlaut characters in Docker?

@navinrathore
Contributor

Yes. It is reproducible in the Docker image.
@delgermurun, we will work on it. Thanks.

@navinrathore
Contributor

@delgermurun, please set the locale environment variable below. Umlauts will then be displayed on screen while labeling.

export LANG=C.UTF-8

They are fine in the match output. It is only a display issue.
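
A minimal sketch of how this workaround could be applied only when needed, assuming a POSIX shell inside the container. The helper name is_utf8_locale is hypothetical and not part of Zingg. A likely explanation for the ? characters, though not confirmed in this thread, is that the JVM derives its default output encoding from the locale, and under a POSIX locale it falls back to ASCII and substitutes ? for characters it cannot encode.

```shell
# Hypothetical helper: succeeds if the locale variables that control
# character encoding name a UTF-8 locale.
is_utf8_locale() {
  # POSIX precedence for character handling: LC_ALL, then LC_CTYPE, then LANG.
  effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
  case "$effective" in
    *[Uu][Tt][Ff]-8* | *[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Apply the workaround from this thread only when no UTF-8 locale is active.
# Note: if LC_ALL or LC_CTYPE is set to a non-UTF-8 value, it overrides LANG
# and would have to be changed instead.
if ! is_utf8_locale; then
  export LANG=C.UTF-8
fi
```

Running this in the shell that later starts the labeling step should leave an existing UTF-8 locale untouched and set LANG otherwise.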

@sonalgoyal
Member

That’s a good find @navinrathore, can we set this in the Dockerfile for the future as well?

@navinrathore
Contributor

The current locale setting in the Docker image is:
$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"

Currently available locales:
$ locale -av
locale: C.UTF-8 directory: /usr/lib/locale/C.UTF-8

We will fix this issue in the Docker image. Thanks.
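
Before hard-coding C.UTF-8, one could confirm that the image actually ships a UTF-8 locale, as the locale -av listing above does. The following is a sketch under that assumption; the helper name pick_utf8_locale is hypothetical.

```shell
# Hypothetical helper: read a `locale -a` style listing on stdin and print
# one UTF-8 locale name, preferring C.UTF-8 when it is listed.
pick_utf8_locale() {
  listing="$(cat)"
  if printf '%s\n' "$listing" | grep -qxF 'C.UTF-8'; then
    printf '%s\n' 'C.UTF-8'
  else
    # Fall back to the first entry ending in utf8 or UTF-8, if any.
    printf '%s\n' "$listing" | grep -iE 'utf-?8$' | head -n 1
  fi
}

# Use it against the real listing when the locale command is available.
if command -v locale >/dev/null 2>&1; then
  chosen="$(locale -a 2>/dev/null | pick_utf8_locale || true)"
  if [ -n "$chosen" ]; then
    export LANG="$chosen"
  fi
fi
```

For a permanent fix in the image, the usual Dockerfile equivalent of the export is an ENV LANG=C.UTF-8 instruction. The exact line used in the Zingg Dockerfile is not shown in this thread.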

sonalgoyal added a commit that referenced this issue Jan 10, 2022
locale set to C.UTF-8 in Dockerfile #116

@delgermurun
Author

I can confirm that export LANG=C.UTF-8 works!

Thanks.

@sonalgoyal
Member

Awesome @delgermurun, thanks @navinrathore for the quick turnaround!
