error on recognizing some UTF-16 and UTF files #6

Closed
UVJkiNTQ opened this issue Mar 20, 2020 · 2 comments
Comments

@UVJkiNTQ

UVJkiNTQ commented Mar 20, 2020

Describe the bug
I use the tool to convert .cue and .log files.
I found problems with some UTF-8 encoded cue files and many UTF-16 encoded log files.
Some UTF-8 encoded files are not transcoded correctly and still have character-mapping problems.
UTF-16 encoded log files are recognized as latin_1 encoded.

An example is attached.
The .bak files are the originals.
error.zip

@x1angli
Owner

x1angli commented Mar 23, 2020

@UVJkiNTQ
Thank you for your feedback and the detailed information.

  • The SW-030GAMERS.cue file in your zip: sorry, but that file looks fine to me. Is there any problem with it?

  • The SW-030GAMERS.log file: yes, it is totally scrambled.

The reason: cvt2utf relies on a third-party library called chardet to detect codecs/charsets. chardet is generally good, but its probabilistic model is opaque, so the codec is detected incorrectly in certain cases. To mitigate this, cvt2utf uses some tricks. However, these tricks may themselves fail in other cases...

For now, you can follow the instructions below as a temporary workaround:

  1. git clone or download the source code.
  2. Go to line 20 of cvt2utf/main.py.
  3. Drop latin-1, so that the line reads 'codec_chain': ['ascii', 'utf_8_sig', 'chardet'],
  4. In a terminal / cmd, cd to the cvt2utf folder (the outer cvt2utf folder, not the inner sub-folder) and run python3 -m cvt2utf.main convert "./error" -i log (replace the path with your own).

As a result, the file is correctly recognized as UTF-16 and transcoded to Latin-1. I tried it on my machine and the generated SW-030GAMERS.log looks good.
Let me know if you run into any further problems.
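
Why dropping latin-1 changes the outcome can be illustrated with a minimal sketch. This is not cvt2utf's actual code: the function name `first_matching_codec`, the position of `latin_1` in the original chain, and the treatment of `'chardet'` as a "defer to the detector" marker are all assumptions made for illustration. The underlying fact is standard Python behavior: `latin_1` maps every possible byte to a character, so decoding with it never fails and it shadows anything listed after it.

```python
def first_matching_codec(data: bytes, chain):
    """Return the first entry in `chain` that accepts `data`.

    Hypothetical helper, not part of cvt2utf. The 'chardet' entry is
    treated as a marker meaning "hand the bytes to a detector".
    """
    for codec in chain:
        if codec == "chardet":
            return "chardet"  # assumption: defer to statistical detection
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue  # this codec rejects the bytes, try the next one
        return codec
    return None


# A UTF-16 sample; Python's utf_16 encoder prepends a byte order mark.
sample = "Exact Audio Copy".encode("utf_16")

# Assumed original chain (the exact position of latin_1 is a guess).
with_latin1 = ["ascii", "utf_8_sig", "latin_1", "chardet"]
# Chain after the workaround described above.
without_latin1 = ["ascii", "utf_8_sig", "chardet"]

print(first_matching_codec(sample, with_latin1))     # latin_1
print(first_matching_codec(sample, without_latin1))  # chardet
```

With `latin_1` in the chain, the UTF-16 bytes are accepted before detection is ever reached, which matches the symptom reported in this issue.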

Note 1: cvt2utf8 is not intended for general users; it is aimed at coding-savvy users. Although I've tried my best to make it easy to use, users are still expected to understand how it works.

Note 2: dropping latin-1 may work in this case, but may break things in other cases....
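
For anyone who wants to check by hand whether a given file is affected before editing the chain, one option is to look for a byte order mark (BOM) at the start of the file. The sketch below uses only Python's standard `codecs` module; `sniff_bom` is a hypothetical helper, not something cvt2utf provides, and it only helps for files that actually start with a BOM.

```python
import codecs

# Longer BOMs are listed first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so the order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8_sig"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
]


def sniff_bom(data: bytes):
    """Return a codec name if `data` starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


print(sniff_bom("log text".encode("utf_16")))  # utf_16
print(sniff_bom(b"plain text"))                # None
```

To apply it to a real file, read the first few bytes in binary mode, for example `sniff_bom(open(path, "rb").read(4))`, where `path` is the file in question.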

@UVJkiNTQ
Author

That cue seems good. I am sorry for that. It just looks bad in nano.
I will try the workaround.
Thanks.
