Opening text files with Notepad2: encoding conversion has no effect, text is garbled #386

Closed
jkqxl opened this issue Oct 14, 2021 · 9 comments

@jkqxl

jkqxl commented Oct 14, 2021

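Description of the problem:
1. Same encoding, a .txt file (originally UTF-8, opened as UTF-8 by default): after some time, opening it shows garbled text. Opening it in Notepad++ shows that the encoding has become GB2312 (Simplified Chinese).
2. Notepad2 does not seem to detect the encoding automatically.
3. Encoding conversion has no effect. When I open the garbled document and select the matching encoding, e.g. GB2312 (Simplified Chinese), nothing changes and the text is still garbled. I have to change the default encoding and reopen the file to make it display correctly.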

@zufuliu
Owner

zufuliu commented Oct 14, 2021

  1. After some time, opening the file shows garbled text

For a file that cannot be detected as UTF-16 or UTF-8, Notepad2 defaults to the system ANSI code page (GBK in your case, which should still render your file correctly). The status bar shows the encoding; please check what it says and attach your broken file (zipped) for further investigation 🙏.

  2. Notepad2 does not seem to detect the encoding automatically

Notepad2 only recognizes UTF-8 (this includes 7-bit ASCII) or UTF-16 with or without BOM.
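The detection order described above can be sketched roughly as follows. This is a simplified, hypothetical Python illustration, not Notepad2's actual implementation: it checks for a BOM, then tests whether the bytes are valid UTF-8 (which also covers plain 7-bit ASCII), and otherwise falls back to the ANSI code page, assumed here to be GBK. The heuristics Notepad2 uses for BOM-less UTF-16 are left out.

```python
import codecs


def detect_encoding(data: bytes, ansi_codepage: str = "gbk") -> str:
    """Hypothetical sketch of the detection order; not Notepad2's code."""
    # 1. Byte order marks (UTF-8, UTF-16 LE, UTF-16 BE)
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # 2. BOM-less UTF-8; pure 7-bit ASCII is also valid UTF-8
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Anything else falls back to the system ANSI code page (assumed GBK here)
    return ansi_codepage
```

A GBK file containing Chinese text is normally not valid UTF-8, so it ends up in the ANSI fallback; how it renders then depends entirely on what the system's ANSI code page actually is.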

  3. Encoding conversion has no effect.

Please use File -> Reload -> With Encoding, not File -> Encoding.
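The difference between the two menu items can be modelled in a few lines. This is a hypothetical Python sketch, not Notepad2 code, and it assumes that File -> Encoding converts the text already loaded in the editor, while File -> Reload -> With Encoding decodes the original bytes from disk again. Once GBK bytes have been misread as UTF-8, invalid sequences are replaced and the original bytes are gone from the editor text, so only reloading can recover it.

```python
# Bytes as stored on disk: GBK-encoded Chinese text
raw = "使用说明".encode("gbk")

# Misread as UTF-8: invalid byte sequences become U+FFFD, so information is lost
buffer_text = raw.decode("utf-8", errors="replace")

# Assumed File -> Encoding behaviour: convert the already garbled editor text.
# Characters GBK cannot represent become "?", but the Chinese never comes back.
converted = buffer_text.encode("gbk", errors="replace").decode("gbk")

# File -> Reload -> With Encoding: decode the original bytes with the chosen code page
reloaded = raw.decode("gbk")
```

Here `converted` is still garbage, while `reloaded` equals the original string.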

@jkqxl
Author

jkqxl commented Oct 15, 2021

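After changing the default encoding for .txt files to GB2312 (Simplified Chinese), my problem is basically solved, but I still want to report this series of issues.

1. In versions before 4.21.09r3900 (I forget exactly which one), with the default encoding set to UTF-8, opening GBK and similar files never produced garbled text.
2. Regarding zufuliu's reply point 1: under ANSI the document is displayed as garbled text; please see the file I uploaded: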
使用说明.zip

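Finally, since the status bar shows encoding information, the encoding can evidently be detected. So why not do it in one step and open the .txt file with the detected encoding right away? If you are worried about slowing down startup, you could add an option to enable or disable this feature.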

@zufuliu
Copy link
Owner

zufuliu commented Oct 15, 2021

Your file works fine for me with default settings.
Maybe you have "Open ANSI (unknown encoding) file in UTF-8 mode." enabled (via Edit -> Encoding -> Default), in which case a file that cannot be detected as UTF-16 will always be treated as UTF-8.
[Screenshot: Encoding settings]

One possible solution: try to decode the input file into UTF-16 (using MultiByteToWideChar) with the system default code page; if decoding succeeds without data loss or replacements, encode the UTF-16 result to UTF-8 (using WideCharToMultiByte).
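The proposed approach can be sketched in Python, with the standard codecs standing in for the Win32 calls: a strict decode plays the role of MultiByteToWideChar failing on invalid input (as it does with the MB_ERR_INVALID_CHARS flag), and encoding to UTF-8 plays the role of WideCharToMultiByte. The function name `load_as_utf8`, the `None` return for "not converted", and the fixed `gbk` default for the ANSI code page are assumptions for illustration only.

```python
from typing import Optional


def load_as_utf8(data: bytes, ansi_codepage: str = "gbk") -> Optional[bytes]:
    """Hypothetical sketch: convert a non-UTF-8 file to UTF-8 via the ANSI code page."""
    # Valid UTF-8 (including plain ASCII) needs no conversion
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass
    # Strict decoding stands in for MultiByteToWideChar(..., MB_ERR_INVALID_CHARS, ...):
    # any byte sequence that is invalid in the code page aborts the conversion
    try:
        text = data.decode(ansi_codepage, errors="strict")
    except UnicodeDecodeError:
        return None  # conversion would lose data, so leave the file unconverted
    # Encoding to UTF-8 stands in for WideCharToMultiByte(CP_UTF8, ...)
    return text.encode("utf-8")
```

Passing `ansi_codepage="utf-8"` mimics a system whose ANSI code page is UTF-8: the strict decode of GBK bytes fails and the function returns `None`, so the file is not converted, which is consistent with the garbled "ANSI" result reported later in this thread.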

@zufuliu zufuliu added this to the v4.21.11 milestone Oct 15, 2021
@jkqxl
Author

jkqxl commented Oct 16, 2021

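With the settings shown in the screenshot, opening as UTF-8 is still garbled. This may be a problem with my system.

Thanks to the author for the enthusiastic discussion, but I don't really understand encodings, so let's leave it at that.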

@zufuliu
Copy link
Owner

zufuliu commented Oct 16, 2021

@jkqxl if that does not work, most likely your system's ANSI code page is not 936 (GBK, GB2312). Please share a screenshot of the "Select Encoding" dialog (click File -> Reload -> With Encoding, then scroll to the top), like the following:
[Screenshot: Select Encoding dialog]

zufuliu added a commit that referenced this issue Oct 16, 2021
…issue #386.

Change the implementation to use the system ANSI code page to decode
the file being opened (which is not valid UTF-8); if decoding succeeds
without data loss, convert the decoded UTF-16 result into UTF-8.

This differs from the previous behavior, which always used UTF-8 to decode
the file being opened when the option was enabled and the file could not be
detected as UTF-16.
zufuliu added a commit that referenced this issue Oct 16, 2021
@zufuliu
Owner

zufuliu commented Oct 16, 2021

I think this was fixed (by af59a71 and db68b13); please test the latest builds (from https://github.com/zufuliu/notepad2/actions or https://ci.appveyor.com/project/zufuliu/notepad2).

@jkqxl
Copy link
Author

jkqxl commented Oct 17, 2021

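OK, sure.

Tested "Notepad2_zh-Hans_x64_v4.21.09r3900" again. Actual encoding: GBK. Default settings with items 1 and 4 checked, opened with UTF-8 as the default: the status bar shows ANSI and the Chinese text displays normally. Very strange; my guess is that yesterday's Windows 10 21H1 update KB4023057 is what made the garbled text display as proper Chinese again.

For comparison, a second test of "Notepad2_zh-Hans_x64_v4.21.09r3900". Actual encoding: GBK. Default settings with items 1, 2, 3 and 4 checked, opened with UTF-8 as the default: the status bar shows ANSI and the text is garbled.

Tested the latest build "Notepad2_GCC_en_x64: Notepad2_GCC_x64_v4.21.09r3900". Actual encoding: GBK. Default settings with items 1, 2, 3 and 4 checked, opened with UTF-8 as the default: the status bar shows GBK and the text is not garbled.

Results: for unknown reasons, opening a file whose actual encoding is GBK with UTF-8 no longer produces garbled text. Also, setting the default encoding to GBK, using Reload -> Select Encoding -> GBK, or checking items 1, 2, 3 and 4 in the default settings of the latest build now all successfully fix the garbled text.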

@zufuliu
Copy link
Owner

zufuliu commented Oct 17, 2021

Thanks for the tests.

The status bar shows ANSI and the text is garbled

This means your system's ANSI code page is not GBK (936); most likely it is UTF-8, see issue #39.

@jkqxl
Copy link
Author

jkqxl commented Oct 18, 2021

You're welcome. It was indeed caused by the "Use Unicode UTF-8 for worldwide language support" setting.


2 participants