w3mman does not render ansi escape sequences on redhat based distributions #201

Closed
albfan opened this issue Oct 19, 2021 · 10 comments · Fixed by #238

Comments

@albfan
Contributor

albfan commented Oct 19, 2021

OS: Fedora 34
package: w3m-0.5.3-50.git20210102.fc34.x86_64

Man pages use ANSI escape sequences for bold:

$ PAGER='cat -A' /usr/bin/man bash
BASH(1)                                                                                                  General Commands Manual                                                                                                  BASH(1)$
$
^[[1mNAME^[[0m$
       bash - GNU Bourne-Again SHell$
$
^[[1mSYNOPSIS^[[0m$
       ^[[1mbash ^[[22m[options] [command_string | file]$
$
^[[1mCOPYRIGHT^[[0m$
       Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.$
$
^[[1mDESCRIPTION^[[0m$
...

In a bash terminal it displays correctly:

$ /usr/bin/man bash
BASH(1)                                                                                                  General Commands Manual                                                                                                  BASH(1)

NAME
       bash - GNU Bourne-Again SHell

SYNOPSIS
       bash [options] [command_string | file]

COPYRIGHT
       Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.

DESCRIPTION
...

But w3mman does not render them correctly:

$ /usr/bin/w3mman bash
BASH(1)                                                                                                  General Commands Manual                                                                                                  BASH(1)

 [1mNAME [0m
       bash - GNU Bourne-Again SHell

 [1mSYNOPSIS [0m
        [1mbash  [22m[options] [command_string | file]

 [1mCOPYRIGHT [0m
       Bash is Copyright (C) 1989-2020 by the Free Software Foundation, Inc.

 [1mDESCRIPTION [0m
...

Am I missing a setting? This works correctly on Arch Linux.
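
A quick way to check whether a command's output still contains SGR sequences (such as the `^[[1m` shown by `cat -A` above) is to search for the pattern ESC `[` ... `m`. This is only a minimal diagnostic sketch; the helper name `has_sgr` is hypothetical and not part of w3m or man-db.

```shell
# has_sgr: hypothetical helper, not part of w3m or man-db.
# Reads standard input and exits with status 0 if it contains at least
# one SGR sequence (ESC [ <digits and semicolons> m), 1 otherwise.
has_sgr() {
  esc=$(printf '\033')            # literal ESC byte
  grep -q "${esc}\[[0-9;]*m"
}
```

For example, `/usr/bin/man bash | has_sgr && echo "SGR sequences present"` would report whether the piped man output carries escape sequences on a given system.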

@hboetes

hboetes commented Jun 3, 2022

Please change the title to w3mman does not render ansi escape sequences on redhat based distributions

That's a more accurate description of what's going on.

@hboetes

hboetes commented Jun 3, 2022

/usr/local/libexec/w3m/cgi-bin/w3mman2html.cgi man > man.html

This generates proper HTML on other platforms, but on Red Hat and friends the resulting output contains escape codes.

@hboetes

hboetes commented Jun 3, 2022

I compiled man-db the way it is compiled on Arch, and I get exactly the same problem...

albfan changed the title from "w3mman do not render ansi escape sequences" to "w3mman does not render ansi escape sequences on redhat based distributions" on Jun 4, 2022
@albfan
Contributor Author

albfan commented Jun 4, 2022

Arch Linux:

/usr/lib/w3m/cgi-bin/w3mman2html.cgi man >man.html
$ cat -A man.html | head
Content-Type: text/html$
$
<html>$
<head><title>man man</title></head>$
<body>$
<pre>$
MAN(1)                                                            Utilidades de paginador del manual                                                           MAN(1)$
$
<b>NOMBRE</b>$
       man - interfaz de los manuales de referencia del sistema$

Fedora:

/usr/libexec/w3m/cgi-bin/w3mman2html.cgi man > man.html
$ cat -A man.html | head
Content-Type: text/html$
$
<html>$
<head><title>man man</title></head>$
<body>$
<pre>$
MAN(1)                                                                                             Utilidades del paginador del manual                                                                                             MAN(1)$
$
^[[1mNOMBRE^[[0m$
       man - interfaz de los manuales de referencia del sistema$

@albfan
Contributor Author

albfan commented Jun 4, 2022

I started adding substitute commands:

diff --git i/w3mman2html.cgi w/w3mman2html.cgi
index b121470..0fa90f5 100755
--- i/w3mman2html.cgi
+++ w/w3mman2html.cgi
@@ -162,7 +162,15 @@ EOF
     next;
   }
 
-  s@[1m(\w+)[0m$@<b>$1</b>@g;
+  my $printchar='[\wÁÉÍÓÚáéíóú /\'.:;,&()\\"~=%*\$\?|!#\`\@\{\}\<\>_-]';
+  s@[1m($printchar+)[0m@<b>$1</b>@g;
+  s@[4m($printchar+)[24m@<u>$1</u>@g;
+  s@[1m($printchar+)[0m@<b>$1</b>@g;
+  s@[1m($printchar+)[22m@<b>$1</b>@g;
+  s@[1m($printchar+)[4m@<b>$1</b>@g;
+  s@[22m($printchar+)[0m@<u>$1</u>@g;
+  s@[22m($printchar+)[24m@<u>$1</u>@g;
+  s@[4m([\wÁÉÍÓÚáéíóú /'.:;,&()\\"~=%*\$\?|!#\`\@\{\}\<\>_-]+)[0m@<u>$1</u>@g;
   s@(http|ftp)://[\w.\-/~]+[\w/]@<a href="$&">$&</a>@g;
   s@\b(mailto:|)(\w[\w.\-]*\@\w[\w.\-]*\.[\w.\-]*\w)@<a href="mailto:$2">$1$2</a>@g;
   s@(\W)(\~?/[\w.][\w.\-/~]*)@$1 . &file_ref($2)@ge;

This almost does it. I tested with man bash and there are still some errors. Basically, we need to match anything that is a character, instead of enumerating all of these:

[\wÁÉÍÓÚáéíóú /'.:;,&()\"~=%*$?|!#`@{}<>_-]
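
One way to avoid enumerating the allowed characters is to match "anything that is not an ESC byte" between the opening and closing sequence. The following is a minimal sketch of that idea in shell with `sed -E`. It is not the patch from this thread, the function name `sgr_to_html` is hypothetical, and it assumes only plain bold (`1` closed by `0` or `22`) and underline (`4` closed by `0` or `24`) spans. Nested attributes and HTML escaping of `<`, `>` and `&` are not handled.

```shell
# sgr_to_html: hypothetical helper, not the code merged into w3mman2html.cgi.
# Converts simple bold and underline SGR spans on standard input to
# <b>...</b> and <u>...</u>. The span body is "any run of non-ESC bytes",
# so no explicit list of printable characters is needed.
sgr_to_html() {
  esc=$(printf '\033')            # literal ESC byte
  sed -E \
    -e "s@${esc}\[1m([^${esc}]*)${esc}\[(0|22)m@<b>\1</b>@g" \
    -e "s@${esc}\[4m([^${esc}]*)${esc}\[(0|24)m@<u>\1</u>@g"
}
```

Sequences that do not fit these two patterns (for example colours, or bold immediately followed by underline) pass through unchanged, so a real converter would need more cases.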

@hboetes

hboetes commented Oct 1, 2022

Please commit #238
Thanks!

@hboetes

hboetes commented Dec 24, 2022

I have still found one: [34m in dbus-run-session(1)
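
Colour sequences such as `[34m` have no bold or underline equivalent, so one fallback is to drop every SGR sequence outright. A minimal sketch follows, assuming the sequences have the usual ESC `[` digits-and-semicolons `m` form; `strip_sgr` is a hypothetical name, not an existing w3m script.

```shell
# strip_sgr: hypothetical helper, not part of w3m.
# Removes every SGR sequence (ESC [ <digits and semicolons> m) from
# standard input, leaving only the plain text.
strip_sgr() {
  esc=$(printf '\033')            # literal ESC byte
  sed "s/${esc}\[[0-9;]*m//g"
}
```

This loses all formatting, so it only makes sense as a last step after any sequences worth keeping have been converted.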

@rkta
Contributor

rkta commented Dec 25, 2022 via email

tats closed this as completed in #238 on Jan 5, 2023
tats added a commit that referenced this issue Jan 15, 2023
@albfan
Contributor Author

albfan commented Jan 18, 2023

So in the end, the missing setting is

GROFF_NO_SGR=1

8891eab...760d7ad

I wonder whether we should reopen this and consider a parameter that can be configured per distro. Or does this simply force the same behaviour on all distros?
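
To try the setting by hand, the variable can be placed in the environment of a single command. According to the grotty documentation, setting `GROFF_NO_SGR` makes grotty fall back to the older backspace-overstrike scheme for bold and underline and disables colours. The wrapper below is only a sketch for experimenting; `with_groff_no_sgr` is a hypothetical name, not something provided by w3m or groff.

```shell
# with_groff_no_sgr: hypothetical wrapper for experimenting.
# Runs the given command with GROFF_NO_SGR=1 in its environment only,
# without changing the calling shell's environment.
with_groff_no_sgr() {
  env GROFF_NO_SGR=1 "$@"
}
```

For example, `with_groff_no_sgr /usr/bin/man bash | cat -A | head` can be compared against the `PAGER='cat -A'` output at the top of this issue to see whether the `^[[1m` sequences are gone.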

@tats
Owner

tats commented Jan 18, 2023

I assume adding GROFF_NO_SGR=1 causes no problems with

  • groff >=1.18 with Debian default
  • groff >=1.18 default
  • groff <1.18, or
  • non-groff.

I don't expect SGR to be forcibly enabled when GROFF_NO_SGR=1 is set.

Anyway, if you do find a problem, please reopen.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Apr 27, 2023
2023-01-21  Tatsuya Kinoshita  <tats@debian.org>

	* NEWS: Update NEWS to 0.5.3+git20230121.

2023-01-15  Tatsuya Kinoshita  <tats@debian.org>

	* scripts/w3mman/w3mman2html.cgi.in:
	Add GROFF_NO_SGR=1 to w3mman2html.cgi for non-Debian groff.
	Bug-Debian: tats/w3m#238
	Bug-Debian: tats/w3m#201

	* scripts/w3mman/w3mman2html.cgi.in:
	Revert "Turn ansi escape sequences into html tags".
	This reverts commit 44af9271e0e984544762e2212549f134c86b4418.
	cf. tats/w3m#238

2023-01-12  Tatsuya Kinoshita  <tats@debian.org>

	* fm.h, rc.c: Do not expand config value of tmp_dir.

	* config.h.dist, config.h.in, configure, configure.ac, rc.c:
	Use faccessat for rc_dir and tmp_dir.

	* local.c: Allow writeLocalCookie even when no_rc_dir.

	* main.c, rc.c: Call wtf_init in sync_with_option.

	* rc.c: Avoid modifying read-only rc_dir.

	* fm.h, main.c, proto.h, rc.c: Make tmp_dir if not found.

2023-01-09  Tatsuya Kinoshita  <tats@debian.org>

	* NEWS: Prepare NEWS for w3m 0.5.3+git202301XX.

	* doc-de/FAQ.html, doc-jp/FAQ.html, doc/FAQ.html:
	Remove obsolete documents.

	* doc-de/FAQ.html, doc-de/MANUAL.html:
	Wrap long lines to avoid Lintian warnings.

2023-01-07  Tatsuya Kinoshita  <tats@debian.org>

	* file.c: Only read a first title.
	* file.c, fm.h: Revert "Only read title when in head".
	This reverts commit 0189e8aa5c4c4919a9bbc4dcbe0e521aada51e3c.
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020215

2023-01-06  Tatsuya Kinoshita  <tats@debian.org>

	* file.c: Indentation fix for HTMLtagproc1.

2023-01-06  Robert Alm Nilsson  <robert@robalni.org>

	* file.c, fm.h: Only read title when in head.
	Origin: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020215

2023-01-06  Tatsuya Kinoshita  <tats@debian.org>

	* libwc/charset.c: Avoid locale sensitive tolower in wc_charset_to_ces.

2023-01-06  Sertaç Ö. Yıldız  <sertacyildiz@gmail.com>

	* libwc/charset.c:
	Fix charset declaration parser fails with turkish locale.
	Origin: https://bugzilla-attachments.redhat.com/attachment.cgi?id=160014
	Bug-Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=249675

	* history.c: Use st_mtime instead of st_mtim.tv_sec to compile on macos.
	cf. tats/w3m#247

2023-01-06  Rene Kita  <mail@rkta.de>

	* html.c, html.h, tagtable.tab: Recognize link targets in dfn elements.
	Refactor html.c.  Align in html.c.
	Origin: tats/w3m#259
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1018696

	* Makefile.in, form.c, main.c, util.c, util.h:
	Handle failed system calls.
	* display.c, display.h, file.c, form.c, main.c, proto.h, terms.h:
	Move declarations to appropiate header files.
	Origin: tats/w3m#257
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=398989

	* entity.js, etc.c, table.c, tests/allentity.expected:
	* tests/allentity.html: Skip soft hyphen when reading token.
	Fix generated HTML for entity test.
	Origin: tats/w3m#256
	Bug-Debian: tats/w3m#224
	Bug-Debian: tats/w3m#258
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830173

	* file.c: Check LESSOPEN to avoid undefined behaviour.
	Refactor lessopen_stream.
	Origin: tats/w3m#254
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991608

2023-01-05  Markus Hiereth  <translation@hiereth.de>

	* po/de.po: Update German message catalogue.
	Origin: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1011945#10

2023-01-05  Rene Kita  <mail@rkta.de>

	* buffer.c: Exit with error if a new buffer can't be allocated.
	Origin: https://git.sr.ht/~rkta/w3m/commit/1f88544c1a009ed2088ff20973bcfffe6cbcb5de
	Bug-Debian: tats/w3m#232
	Bug-Debian: tats/w3m#233

	* history.c, history.h:
	Merge history file if it was modified after start.
	* history.h, proto.h: Move declarations to the appropriate header file.
	* history.c: Add comment to explain placement of the ifdef.
	* history.c, proto.h: Let loadHistory return an error code.
	* history.c: Use 'goto fail' to remove code duplication.
	Origin: tats/w3m#247
	Bug-Debian: tats/w3m#176

2023-01-05  Alberto Fanjul  <albertofanjul@gmail.com>

	* scripts/w3mman/w3mman2html.cgi.in:
	Turn ansi escape sequences into html tags.
	Origin: tats/w3m#238
	Bug-Debian: tats/w3m#201

2023-01-04  Tatsuya Kinoshita  <tats@debian.org>

	* po/de.po, po/it.po, po/ja.po, po/sv_SE.po, po/w3m.pot, po/zh_CN.po:
	* po/zh_TW.po: Update PO strings.

	* doc/MANUAL.html, doc/README.img, libwc/wc_types.h, main.c, rc.c:
	English fixes.
	cf. tats/w3m#241

2023-01-04  Rene Kita  <mail@rkta.de>

	* rc.c: Remove unused variable.
	* table.c: Remove a warning for bzero with GCC 12.
	* file.c: Fix potential null pointer dereference.
	* .github/workflows/build.yml:
	Don't error out on deprecated declaration warnings.
	Origin: tats/w3m#255
	cf. tats/w3m#252

2023-01-04  nico  <smnicolas@gmail.com>

	* doc/MANUAL.html, doc/w3m.1, fm.h, main.c, rc.c, terms.c:
	Add high-intensity colors option and cli flag.
	Origin: tats/w3m#251
	cf. tats/w3m#250
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626291

2023-01-04  Trafficone  <trafficone@gmail.com>

	* doc/README.SSL, doc/README.keymap, doc/README.menu: Translate from
	doc-jp.
	* doc/README.cookie, doc/README.func, doc/README.img, doc/README.m17n:
	* doc/README.passwd: Clarified wording.  Minor grammar changes.
	Origin: tats/w3m#241

2022-12-25  Tatsuya Kinoshita  <tats@debian.org>

	* configure: Update configure with acinclude.m4.

2022-12-25  Sam James  <sam@gentoo.org>

	* acinclude.m4: Fix configure tests broken with Clang 16.
	Origin: tats/w3m#248

2022-12-25  Rin Okuyama  <rokuyama.rk@gmail.com>

	* image.c, terms.c:
	For sixel, no need to round image size to multiple of character size.
	Origin: tats/w3m#246

	* image.c: Display resized image for OSC 5379 (mlterm).
	Origin: tats/w3m#245

2022-12-25  Rene Kita  <mail@rkta.de>

	* doc/README.siteconf: Say what the comment character is.
	Use the comment character in Examples.
	Origin: tats/w3m#237

	* main.c: Retry if loading of a file fails when argv_is_url.
	Origin: tats/w3m#235
	Bug-Debian: tats/w3m#210
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=537761
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=946440

2022-12-25  NRK  <nrk@disroot.org>

	* image.c: remove duplicate declaration.
	* cookie.c, entity.c, file.c, frame.c, func.c, image.c, linein.c:
	* mailcap.c, main.c, rc.c, rc.h, table.c, terms.c, terms.h:
	* w3mbookmark.c, w3mhelperpanel.c:
	fix all -Wmissing-prototypes warnings.
	* file.c, history.c, history.h, indep.c, indep.h, mailcap.c, proto.h:
	* rc.c, terms.c, url.c: fix some -Wstrict-prototypes warnings.
	Origin: tats/w3m#234

2022-12-25  Rene Kita  <mail@rkta.de>

	* .github/workflows/build.yml:
	Add GitHub Action to build source when pushing.
	Origin: tats/w3m#228

2022-12-21  Tatsuya Kinoshita  <tats@debian.org>

	* po/de.po, po/it.po, po/ja.po, po/sv_SE.po, po/w3m.pot, po/zh_CN.po:
	* po/zh_TW.po: Update PO strings.

2022-12-21  Rene Kita  <mail@rkta.de>

	* etc.c, fm.h, history.c, rc.c:
	Add option to set directory for temporary files.
	Origin: tats/w3m#219
	cf. tats/w3m#130

2022-12-21  Yash Lala  <yashlala@gmail.com>

	* rc.c: Use `Strnew_charp()` to create `char *` instead of `strdup()`.

	* rc.c:
	refactor: Substitute some clunky code with a `strdup()`.

	* doc/FAQ.html, doc/MANUAL.html, doc/w3m.1, rc.c:
	Set `rc_dir` based on `W3M_DIR` environment variable.
	Origin: tats/w3m#207
	cf. tats/w3m#130

2022-12-20  Tatsuya Kinoshita  <tats@debian.org>

	* etc.c: Fix potential overflow in checkType.

	* etc.c:
	Fix m17n backspace handling causes out-of-bounds write in checkType.
	[CVE-2022-38223]
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019599
	Bug-Debian: tats/w3m#242
bptato pushed a commit to bptato/w3m that referenced this issue Jul 29, 2023