Channel names encoding #482

Closed
programadorhedonista opened this Issue Aug 11, 2015 · 9 comments

@programadorhedonista

commented Aug 11, 2015

I use an IRC network (chathispano) that uses iso-8859-15, and its channel names are in iso-8859-15 too.
However, some people use utf-8 for their messages.

With the default charset set to iso-8859-15 in WeeChat:

  • With WeeChat 1.2 I can see all messages (including utf-8 messages) and all iso-8859-15 channel names.
  • Now, with WeeChat from GitHub (master branch, commit ca6854e), the iso-8859-15 channel names are corrupted with "?".
@Mikaela

Contributor

commented Aug 12, 2015

I think this is a duplicate of #218, or at least related to it or to its fix.

@flashcode

Member

commented Aug 12, 2015

Yes, it should be fixed by #218, so if possible, please try with the latest development version (1.3-rc1), which is stable enough for daily use.

@flashcode flashcode self-assigned this Aug 12, 2015

@flashcode flashcode changed the title channel names encoding Channel names encoding Aug 12, 2015

@programadorhedonista

Author

commented Aug 12, 2015

As I said, this problem occurs with the latest dev version.
There is no problem with weechat 1.2 version.
So #218 must be a different problem.

@flashcode

Member

commented Aug 12, 2015

Can you try with a default config, in a new WeeChat session, to be sure it's not an upgrade problem or something like that?
If you are able to reproduce the problem, please tell me the exact steps, this will make things easier for me to fix the problem.
I still think it's related to #218, so maybe that's a regression caused by this fix.

@flashcode flashcode removed the duplicate label Aug 12, 2015

@programadorhedonista

Author

commented Aug 12, 2015

After $ rm -rf ~/.weechat the problem persists.

Steps to reproduce:
Connect to the server irc.chathispano.com/6667 and pick a channel with an ISO-encoded character in its name, for example #matemáticas.

With the default charset config,

decode "iso-8859-1"
encode ""

I can't join this channel, so I set encode = "iso-8859-1" (or -15).

Now,

  • with WeeChat 1.2, I can send /MODE #matemáticas and see the modes, and I can /JOIN #matemáticas.
  • with WeeChat git, /MODE #matemáticas replies "#matemáticas: No such channel", and /JOIN works but shows #matem?aticas in the status bar.

P.S. My console locale is LANG=es_ES.UTF-8

@flashcode

Member

commented Aug 14, 2015

OK, so I understand your problem. Let me explain what WeeChat did before the fix for #218, and the current behavior.

In WeeChat <= 1.2, the whole message was decoded, including the nick, host, channel, target, and the body of the message itself.
For example:

:nick!user@example.com PRIVMSG #matemáticas :this is a test

Here everything was decoded when received from the IRC server and encoded when sent to it.
That was wrong and caused bugs (sometimes a channel was not found because WeeChat had decoded its name).
In any case, decoding the whole message was a bug, and in my view charset options should apply only to the user message, nothing else (here: this is a test).
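
To make the two behaviors concrete, here is a rough Python sketch (not WeeChat's actual C code). The function names are hypothetical, and the line is split naively at the first " :" separator, which is an assumption that holds for this simple example:

```python
# Hypothetical helpers for illustration; WeeChat's real parser is written in C.

def decode_whole_line(raw: bytes, charset: str) -> str:
    """WeeChat <= 1.2 style: decode every byte of the line, channel name included."""
    return raw.decode(charset)


def decode_message_only(raw: bytes, charset: str):
    """1.3-dev style: decode only the trailing user message, keep the rest as raw bytes."""
    head, _, trailing = raw.partition(b" :")
    return head, trailing.decode(charset)


# The example line from above, as it would arrive from an ISO-8859-15 server.
raw_line = b":nick!user@example.com PRIVMSG #matem\xe1ticas :this is a test"

whole = decode_whole_line(raw_line, "iso-8859-15")
head, message = decode_message_only(raw_line, "iso-8859-15")
```

In the second case the channel name stays as the raw byte string `#matem\xe1ticas`, which is why it no longer matches a UTF-8 name typed by the user.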

The current behavior in WeeChat >= 1.3-dev is to decode the user message only.
Since WeeChat is fully UTF-8 (everything is stored as UTF-8 internally), this is a problem for channels with ISO chars, because the channel name is no longer converted (from UTF-8 to ISO or the reverse).
There's a workaround to join or interact with ISO channels, using the /eval command (the .autojoin options in servers are evaluated automatically).
That means you can do this:

/eval /mode #matem${\xe1}ticas
/eval /join #matem${\xe1}ticas
/set irc.server.freenode.autojoin "#matem${\xe1}ticas"

I agree it is a bit weird to have to do that, and it leads to some display problems in the name afterwards, because the channel name is now stored as-is in WeeChat (in ISO and not as UTF-8, like everything else). That's why you'll see ? chars in the status bar or in the channel when the channel name is displayed (and even in the input bar when you complete the channel name).
But apart from these small problems, everything should work fine with the channel once joined.
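
The byte-level reason for both the `${\xe1}` workaround and the display glitch can be sketched in Python. This is only an illustration of the encodings involved, not of WeeChat internals; note that Python substitutes U+FFFD where WeeChat shows "?":

```python
name = "#matemáticas"

# On an ISO-8859-15 network, "á" is the single byte 0xE1,
# which is what ${\xe1} injects into the command.
iso_bytes = name.encode("iso-8859-15")

# In UTF-8 (WeeChat's internal charset) the same letter takes two bytes.
utf8_bytes = name.encode("utf-8")

# Treating the ISO bytes as UTF-8 fails on 0xE1, so a placeholder is shown instead.
shown = iso_bytes.decode("utf-8", errors="replace")

print(iso_bytes)
print(utf8_bytes)
print(shown)
```

Because the two byte strings differ, a server that stores the ISO form does not recognize the UTF-8 form, hence the "No such channel" reply.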

So what can we do for that?

Note: this solution could be developed for a future version, not for 1.3, which should be released in the next few days.

As I said, I think the current charset options should apply only to user messages, not channel names. This is because on the same IRC server you can have both UTF-8 and ISO channels (it would be much better if people used UTF-8 everywhere, but for now we must support both...).

So what about adding other charset options that would tell WeeChat to encode one specific channel to ISO (or all channels on a server if you are sure to always use ISO)?
So that internally the channel will be stored as UTF-8 in WeeChat (no more display bug), and WeeChat will have to decode/encode the channel name in all messages (from or to the server).
The channel name can then have a different encoding from the messages you are sending (this should rarely happen, though).
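
A minimal sketch of that idea in Python, purely for illustration: the table `CHANNEL_CHARSETS` and both helper functions are hypothetical, and nothing here corresponds to an existing WeeChat option.

```python
# Hypothetical per-channel charset table (assumption, not a real WeeChat setting).
CHANNEL_CHARSETS = {"#matemáticas": "iso-8859-15"}
DEFAULT_CHARSET = "utf-8"


def channel_to_wire(name: str) -> bytes:
    """Encode the internally stored (Unicode) channel name for the server."""
    return name.encode(CHANNEL_CHARSETS.get(name, DEFAULT_CHARSET))


def channel_from_wire(raw: bytes) -> str:
    """Map bytes received from the server back to the internal channel name."""
    for name, charset in CHANNEL_CHARSETS.items():
        if name.encode(charset) == raw:
            return name
    # Unknown channel: assume the default charset, never fail on bad bytes.
    return raw.decode(DEFAULT_CHARSET, errors="replace")
```

With such a table the name is always held internally in one form, so the status bar and log file names would show `#matemáticas`, while the server still receives the ISO bytes it expects.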

And what can we do for the version 1.3?

Since the version 1.3 should be released soon, I see only two options:

  1. keep the behavior as-is, with the drawbacks mentioned above: need to use /eval to join ISO channels, and the ? displayed in names
  2. add a temporary option to switch to the old behavior, i.e. decode/encode the channel inside messages according to your existing charset options (while waiting for a better solution).

Please give me quick feedback on that, because the release of 1.3 is still scheduled for Aug 16th
(the date can be shifted if this bug is considered blocking).

@programadorhedonista

Author

commented Aug 14, 2015

For now I think option 2 is better. Writing logs with � in the file name is ugly :)

@flashcode flashcode added this to the 1.3 milestone Aug 14, 2015

@flashcode flashcode removed the waiting info label Aug 14, 2015

flashcode added a commit that referenced this issue Aug 14, 2015

irc: add option irc.network.channel_encode (issue #218, issue #482)
This is a workaround (disabled by default) to join and chat on ISO
encoded channels (or another charset different from UTF-8).

This option may be removed in future if a better solution is
implemented.

@flashcode flashcode closed this Aug 14, 2015

@programadorhedonista

Author

commented Aug 14, 2015

Thank you, flashcode. Now I can see the ISO channels correctly :)

(Just a comment: the option irc.network.channel_encode = on doesn't work exactly like WeeChat <= 1.2 did.
This IRC network permits ISO characters in the host name (a hostname cloak),
and now, with channel_encode = on, those characters are shown as "?" in JOIN, etc.
A minor problem...)

@flashcode

Member

commented Aug 15, 2015

Yes, you're right: with the option enabled, only the channel is decoded (in addition to the user message).
The nick and host are not decoded any more, so I'll update the help for the option.
All these things should be improved in the future (for now I don't know exactly how).

flashcode added a commit that referenced this issue Aug 15, 2015

irc: update help on option irc.network.channel_encode (issue #218, issue #482)

Remove mention of WeeChat <= 1.2 since the behavior is not exactly the
same as old versions (when the option is enabled): only the
channel/message are decoded/encoded and not the nick/host.