Turn ansi escape sequences into html tags #238

Merged
merged 1 commit into from
Jan 5, 2023

Conversation

albfan
Contributor

@albfan albfan commented Jun 6, 2022

fixes #201

@hboetes

hboetes commented Jun 11, 2022

Much better! Thanks!

@albfan
Contributor Author

albfan commented Aug 8, 2022

I just set up a new laptop and applied this patch again.

I found another corner case with ^ symbols, so the PR is updated.

Is anything missing before it can be applied? Just run w3mman bash to see the result.

@hboetes

hboetes commented Aug 12, 2022

In w3mman bash I spotted this problem:

Compound Commands
[1m[[  [4mexpression  [1m]] [0m

With your latest version. We'll get there. :-)

@hboetes

hboetes commented Sep 22, 2022

Can you explain how you "translate" a missed character into a w3mman2html.cgi entry? It would be a shame if this fix does not get implemented.

@albfan
Contributor Author

albfan commented Sep 23, 2022

Let me check your last gotcha:

Compound Commands
[1m[[  [4mexpression  [1m]] [0m

Looks like I forgot to add [ and ] to the set of characters allowed inside a match.

It should be fixed now.

Basically I use:

/usr/bin/man bash

and compare with

w3mman bash

Basically the code takes any run of characters from printchar that is surrounded by these ANSI escape sequences and replaces it with HTML tags.
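The approach described here can be sketched in Python (the script itself is Perl). This is only an illustration: `PRINTCHAR` is a hypothetical stand-in for the script's `$printchar` class, and the rule table and function name are assumptions, not the PR's actual code.

```python
import re

ESC = "\x1b"

# Hypothetical stand-in for the script's $printchar class: word characters
# plus a few symbols mentioned in this thread. The real class may differ.
PRINTCHAR = r"[\w \[\]+\-_.,;:()]"

# (start code, accepted end codes, HTML tag) -- assumed mapping for illustration.
RULES = [
    ("1", ("0", "22"), "b"),  # bold ... reset or normal intensity
    ("4", ("0", "24"), "u"),  # underline ... reset or underline off
]


def sgr_to_html(text: str) -> str:
    """Replace <start>chars<end> escape pairs with an HTML tag pair."""
    for start, ends, tag in RULES:
        end_alt = "|".join(ends)
        pattern = re.compile(
            rf"{ESC}\[{start}m({PRINTCHAR}+?){ESC}\[(?:{end_alt})m"
        )
        text = pattern.sub(rf"<{tag}>\1</{tag}>", text)
    return text
```

Any character outside the class between the two sequences makes the whole match fail, which is why each newly reported symbol needed a pattern update.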

Anyway, checking what is different in some Fedora distros to cause this would be a better solution.

@hboetes

hboetes commented Sep 23, 2022

Thanks! I found a few more.

So [0m [1m [4m and [22m

% w3mman bash|grep '\[.m'
       -c        If the -c option is present, then commands are read from the first non-option  argument  command_string.   If  there  are  arguments  after  the   [4mcom‐ [0m
        [1m[-+]O [ [4mshopt_option [1m] [0m
                 value  of  that option;  [1m+O  [22munsets it.  If shopt_option is not supplied, the names and values of the shell options accepted by shopt are printed on the
                 standard output.  If the invocation option is  [1m+O [22m, the output is displayed in a format that may be reused as input.
        [1m! case  coproc  do done elif else esac fi for function if in select then until while { } time [[ ]] [0m
        [1m[[  [4mexpression  [1m]] [0m
              low  under CONDITIONAL EXPRESSIONS.  Word splitting and pathname expansion are not performed on the words between the  [1m[[  [22mand  [1m]] [22m; tilde expan
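Note that the grep pattern '\[.m' only matches codes with a single character between [ and m, so [22m shows up above only because those lines also contain [1m. A small Python sketch that tallies every leftover SGR code, whatever its length, might look like this. It assumes the real output carries an ESC byte in front of each bracket (the byte is not visible in the paste above), and the function name is made up for illustration.

```python
import re
from collections import Counter

# ESC [ <digits and semicolons> m  -- a Select Graphic Rendition sequence.
SGR = re.compile(r"\x1b\[([0-9;]*)m")


def leftover_sgr_codes(text: str) -> Counter:
    """Count each SGR parameter string that is still present in the text."""
    return Counter(SGR.findall(text))
```

Running it over the converted page and printing the counter gives a quick list of which sequences the substitutions still miss.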

@albfan
Contributor Author

albfan commented Sep 25, 2022

The + symbol is not covered by \w. It should be fixed now.

I see other paths to fix with:

w3mman bash|grep '\[[^-]m'

Working on it.

@albfan albfan marked this pull request as draft September 25, 2022 08:44
@albfan albfan marked this pull request as ready for review September 25, 2022 08:56
@albfan
Contributor Author

albfan commented Sep 25, 2022

Added more symbols, such as \. I just realized I'm only fixing the accents and ñ from my own locale, so the accents part probably needs a better regex to deal with all languages.

@hboetes

hboetes commented Sep 25, 2022

Thanks for the updates, almost there.

Since it happens at the ends of lines I suspect it has something to do with the line-breaks. This is with

COLUMNS=80 w3mman bash

       --rcfile file
              Execute commands from file instead of the standard personal ini‐
              tialization file ~/.bashrc if the shell is interactive (see   [1mIN‐ [0m
              VOCATION below).

It also happens if you don't set COLUMNS, but isn't as visible, since it happens in the wrapped line. Setting COLUMNS makes it stand out.

@albfan
Contributor Author

albfan commented Sep 25, 2022

Ah yes, the wrapped text does not include the newline. Fixing it.

@albfan
Contributor Author

albfan commented Sep 30, 2022

If the column limit splits a word, man wraps each part with its own start and end sequence, here \x1b[1m and \x1b[0m (ESC is decimal 27, i.e. hex 0x1b).

that pattern is here:

https://github.com/tats/w3m/pull/238/files#diff-7bd451f4ef63311cbda7ddcbbae207707823c3892c19ab25c3daec3e9bf093e4R166

so I think split words are correctly covered.

I tested it and it works on my side. Can you try again?

[Screenshot from 2022-09-30 12-50-00]

@hboetes

hboetes commented Sep 30, 2022

The diff hasn't changed, and I still see the same problem.

% echo $COLUMNS 
80
       --rcfile file
              Execute commands from file instead of the standard personal ini‐
              tialization file ~/.bashrc if the shell is interactive (see   [1mI
              VOCATION below).

@albfan
Contributor Author

albfan commented Sep 30, 2022

Yes, for me it works. The only thing I can think of is that the missing symbol is that low dash; as you can see, for me it is -. I added _ previously, but yours looks smaller. I have to check which Unicode character that is.

@hboetes

hboetes commented Sep 30, 2022

A fair point, whilst using env COLUMNS=80 LC_ALL=C w3mman bash the output is clean indeed.

The UTF-8 char is: ‐

Here is the hexl output:

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef
00000000: e280 900a 2d0a 0ae2 8090 0a2d 0a         ....-......-.

So 0a is a LF, 2d is the normal -, and our UTF char is e28090 which is …drum roll… U+2010 ‐ e2 80 90 HYPHEN
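The same identification can be reproduced in Python by decoding the dumped bytes as UTF-8 and asking `unicodedata` for each character's name. The helper name `describe` is made up for this sketch.

```python
import unicodedata

# The 13 bytes from the hexl dump above.
DUMP = bytes.fromhex("e280900a2d0a0ae280900a2d0a")


def describe(data: bytes) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character in UTF-8 data."""
    return [
        # Control characters such as LF have no name, so supply a default.
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>"))
        for ch in data.decode("utf-8")
    ]


for code_point, name in describe(DUMP):
    print(code_point, name)
```

The first entry is U+2010 HYPHEN, and the plain dash is reported separately as U+002D HYPHEN-MINUS.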

Does that help?

@albfan
Contributor Author

albfan commented Oct 1, 2022

Cool, I think now we have a solution that works for any char: anything that is not an escape.

Let me know if that works now
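The idea of matching "anything that is not an escape" can be sketched in Python with a negated character class. The pattern and function names are assumptions for illustration; the actual Perl pattern in the PR may be written differently.

```python
import re

# Narrow class, in the spirit of the earlier attempts: misses U+2010 HYPHEN.
BOLD_NARROW = re.compile(r"\x1b\[1m([\w\-]+)\x1b\[(?:0|22)m")

# Negated class: any run of characters that does not contain ESC.
BOLD_ANY = re.compile(r"\x1b\[1m([^\x1b]+)\x1b\[(?:0|22)m")


def bold_to_html(text: str, pattern: re.Pattern = BOLD_ANY) -> str:
    """Turn a bold start/end escape pair into <b>...</b>."""
    return pattern.sub(r"<b>\1</b>", text)


line = "(see \x1b[1mIN\u2010\x1b[0m"
print(bold_to_html(line, BOLD_NARROW) == line)  # True: the narrow class leaves it untouched
print(bold_to_html(line))                       # (see <b>IN‐</b>
```

Because the class excludes only ESC, the match cannot run past the next escape sequence, so no per-language list of accents or punctuation is needed.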

@hboetes

hboetes commented Oct 1, 2022

It looks like it should, much appreciated!

@albfan
Contributor Author

albfan commented Dec 27, 2022

Added option for

s@^[\[34m^[\[1m($printchar+)^[\[0m@<u><b>$1</b></u>@g;
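For readers less used to Perl, here is a rough Python equivalent of that substitution, where ^[ stands for the ESC byte. It assumes that `$printchar` at this point means "any character except ESC", as discussed earlier in the thread; the names are made up for this sketch.

```python
import re

# ESC[34m (blue) immediately followed by ESC[1m (bold), text, then ESC[0m (reset).
BLUE_BOLD = re.compile(r"\x1b\[34m\x1b\[1m([^\x1b]+)\x1b\[0m")


def blue_bold_to_html(text: str) -> str:
    """Render a blue+bold run as underlined bold HTML."""
    return BLUE_BOLD.sub(r"<u><b>\1</b></u>", text)
```

The rule only fires when the two start sequences are adjacent and a single reset closes both.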

@hboetes

hboetes commented Dec 27, 2022

Yet another one bites the dust. 😊

@tats tats merged commit c56a66a into tats:master Jan 5, 2023
@tats
Owner

tats commented Jan 5, 2023

Merged, thanks for your contribution.

@albfan albfan deleted the w3mman branch January 5, 2023 13:07
@hboetes

hboetes commented Jan 15, 2023

I've found another gem in maildirmake(1) from the maildrop package:

\-q \fIquota\fR
.RS 4
install a quota on the maildir\&. See
\m[blue]\fB\fBmaildirquota\fR(7)\fR\m[]\&\s-2\u[1]\d\s+2
for more information\&.

Which results in:

       -q quota
           install a quota on the maildir. See  [34mmaildirquota(7) [0m[1] for more
           information.

@albfan
Contributor Author

albfan commented Jan 15, 2023

This is problematic because currently nested syntax is not allowed:

[34m [1mmaildirquota [22m(7) [0m[1]

There's a rule for [34m ... [0m and another for [1m ... [22m, but the [34m match stops at the first escape. I need to find a different way to parse this, probably by checking which nested escape sequences are valid.
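One possible way to handle nesting, sketched in Python, is to stop pairing start and end sequences with regexes and instead track the active attributes while scanning. This is not what the PR did; the attribute table, the choice to render blue as underline, and all names are assumptions for illustration, and HTML escaping of the text itself is left out.

```python
import re

SGR = re.compile(r"\x1b\[([0-9;]*)m")

# Assumed subset of SGR codes: code -> (attribute, new value).
CODES = {
    "1": ("bold", True), "22": ("bold", False),
    "4": ("underline", True), "24": ("underline", False),
    "34": ("blue", True), "39": ("blue", False),
}


def wrap(chunk: str, state: dict) -> str:
    """Wrap one text chunk in the tags implied by the current state."""
    if not chunk:
        return ""
    tags = []
    if state["blue"] or state["underline"]:
        tags.append("u")
    if state["bold"]:
        tags.append("b")
    opening = "".join(f"<{t}>" for t in tags)
    closing = "".join(f"</{t}>" for t in reversed(tags))
    return opening + chunk + closing


def sgr_to_html(text: str) -> str:
    state = {"bold": False, "underline": False, "blue": False}
    out, pos = [], 0
    for m in SGR.finditer(text):
        out.append(wrap(text[pos:m.start()], state))
        pos = m.end()
        for code in (m.group(1) or "0").split(";"):
            if code in ("", "0"):            # reset everything
                state = dict.fromkeys(state, False)
            elif code in CODES:              # unknown codes are ignored
                name, value = CODES[code]
                state[name] = value
    out.append(wrap(text[pos:], state))
    return "".join(out)
```

Since every chunk is wrapped on its own, the output stays well-formed even when sequences overlap, at the cost of slightly more verbose markup.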

tats added a commit that referenced this pull request Jan 15, 2023
@tats
Owner

tats commented Jan 15, 2023

Fixed by setting GROFF_NO_SGR.

Note that Debian disables the use of SGR escape sequences by default.
cf. man grotty.
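As far as I understand grotty(1), setting GROFF_NO_SGR makes grotty fall back to its older backspace-based output for bold and underline instead of emitting SGR sequences. A minimal Python sketch of running man with that variable set follows; the function names are hypothetical, and it assumes a `man` command on PATH that formats pages through groff.

```python
import os
import subprocess


def man_env(base=None) -> dict:
    """Copy an environment and add GROFF_NO_SGR=1 to it."""
    env = dict(os.environ if base is None else base)
    env["GROFF_NO_SGR"] = "1"
    return env


def render_man(page: str) -> str:
    """Return the formatted page text; requires man to be installed."""
    result = subprocess.run(
        ["man", page],
        env=man_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the variable set there are no SGR sequences left for the CGI script to translate.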

@hboetes

hboetes commented Jan 16, 2023

Looking much better, thanks!

       -q quota
           install a quota on the maildir. See maildirquota(7)[1] for more
           information.

@albfan
Contributor Author

albfan commented Jan 18, 2023

So that probably removes the need for the merged changes to the CGI script?

@tats
Owner

tats commented Jan 18, 2023

Reverted this pull request.
cf. 8891eab...760d7ad

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Apr 27, 2023
2023-01-21  Tatsuya Kinoshita  <tats@debian.org>

	* NEWS: Update NEWS to 0.5.3+git20230121.

2023-01-15  Tatsuya Kinoshita  <tats@debian.org>

	* scripts/w3mman/w3mman2html.cgi.in:
	Add GROFF_NO_SGR=1 to w3mman2html.cgi for non-Debian groff.
	Bug-Debian: tats/w3m#238
	Bug-Debian: tats/w3m#201

	* scripts/w3mman/w3mman2html.cgi.in:
	Revert "Turn ansi escape sequences into html tags".
	This reverts commit 44af9271e0e984544762e2212549f134c86b4418.
	cf. tats/w3m#238

2023-01-12  Tatsuya Kinoshita  <tats@debian.org>

	* fm.h, rc.c: Do not expand config value of tmp_dir.

	* config.h.dist, config.h.in, configure, configure.ac, rc.c:
	Use faccessat for rc_dir and tmp_dir.

	* local.c: Allow writeLocalCookie even when no_rc_dir.

	* main.c, rc.c: Call wtf_init in sync_with_option.

	* rc.c: Avoid modifying read-only rc_dir.

	* fm.h, main.c, proto.h, rc.c: Make tmp_dir if not found.

2023-01-09  Tatsuya Kinoshita  <tats@debian.org>

	* NEWS: Prepare NEWS for w3m 0.5.3+git202301XX.

	* doc-de/FAQ.html, doc-jp/FAQ.html, doc/FAQ.html:
	Remove obsolete documents.

	* doc-de/FAQ.html, doc-de/MANUAL.html:
	Wrap long lines to avoid Lintian warnings.

2023-01-07  Tatsuya Kinoshita  <tats@debian.org>

	* file.c: Only read a first title.
	* file.c, fm.h: Revert "Only read title when in head".
	This reverts commit 0189e8aa5c4c4919a9bbc4dcbe0e521aada51e3c.
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020215

2023-01-06  Tatsuya Kinoshita  <tats@debian.org>

	* file.c: Indentation fix for HTMLtagproc1.

2023-01-06  Robert Alm Nilsson  <robert@robalni.org>

	* file.c, fm.h: Only read title when in head.
	Origin: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1020215

2023-01-06  Tatsuya Kinoshita  <tats@debian.org>

	* libwc/charset.c: Avoid locale sensitive tolower in wc_charset_to_ces.

2023-01-06  Sertaç Ö. Yıldız  <sertacyildiz@gmail.com>

	* libwc/charset.c:
	Fix charset declaration parser fails with turkish locale.
	Origin: https://bugzilla-attachments.redhat.com/attachment.cgi?id=160014
	Bug-Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=249675

	* history.c: Use st_mtime instead of st_mtim.tv_sec to compile on macos.
	cf. tats/w3m#247

2023-01-06  Rene Kita  <mail@rkta.de>

	* html.c, html.h, tagtable.tab: Recognize link targets in dfn elements.
	Refactor html.c.  Align in html.c.
	Origin: tats/w3m#259
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1018696

	* Makefile.in, form.c, main.c, util.c, util.h:
	Handle failed system calls.
	* display.c, display.h, file.c, form.c, main.c, proto.h, terms.h:
	Move declarations to appropriate header files.
	Origin: tats/w3m#257
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=398989

	* entity.js, etc.c, table.c, tests/allentity.expected:
	* tests/allentity.html: Skip soft hyphen when reading token.
	Fix generated HTML for entity test.
	Origin: tats/w3m#256
	Bug-Debian: tats/w3m#224
	Bug-Debian: tats/w3m#258
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=830173

	* file.c: Check LESSOPEN to avoid undefined behaviour.
	Refactor lessopen_stream.
	Origin: tats/w3m#254
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991608

2023-01-05  Markus Hiereth  <translation@hiereth.de>

	* po/de.po: Update German message catalogue.
	Origin: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1011945#10

2023-01-05  Rene Kita  <mail@rkta.de>

	* buffer.c: Exit with error if a new buffer can't be allocated.
	Origin: https://git.sr.ht/~rkta/w3m/commit/1f88544c1a009ed2088ff20973bcfffe6cbcb5de
	Bug-Debian: tats/w3m#232
	Bug-Debian: tats/w3m#233

	* history.c, history.h:
	Merge history file if it was modified after start.
	* history.h, proto.h: Move declarations to the appropriate header file.
	* history.c: Add comment to explain placement of the ifdef.
	* history.c, proto.h: Let loadHistory return an error code.
	* history.c: Use 'goto fail' to remove code duplication.
	Origin: tats/w3m#247
	Bug-Debian: tats/w3m#176

2023-01-05  Alberto Fanjul  <albertofanjul@gmail.com>

	* scripts/w3mman/w3mman2html.cgi.in:
	Turn ansi escape sequences into html tags.
	Origin: tats/w3m#238
	Bug-Debian: tats/w3m#201

2023-01-04  Tatsuya Kinoshita  <tats@debian.org>

	* po/de.po, po/it.po, po/ja.po, po/sv_SE.po, po/w3m.pot, po/zh_CN.po:
	* po/zh_TW.po: Update PO strings.

	* doc/MANUAL.html, doc/README.img, libwc/wc_types.h, main.c, rc.c:
	English fixes.
	cf. tats/w3m#241

2023-01-04  Rene Kita  <mail@rkta.de>

	* rc.c: Remove unused variable.
	* table.c: Remove a warning for bzero with GCC 12.
	* file.c: Fix potential null pointer dereference.
	* .github/workflows/build.yml:
	Don't error out on deprecated declaration warnings.
	Origin: tats/w3m#255
	cf. tats/w3m#252

2023-01-04  nico  <smnicolas@gmail.com>

	* doc/MANUAL.html, doc/w3m.1, fm.h, main.c, rc.c, terms.c:
	Add high-intensity colors option and cli flag.
	Origin: tats/w3m#251
	cf. tats/w3m#250
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626291

2023-01-04  Trafficone  <trafficone@gmail.com>

	* doc/README.SSL, doc/README.keymap, doc/README.menu: Translate from
	doc-jp.
	* doc/README.cookie, doc/README.func, doc/README.img, doc/README.m17n:
	* doc/README.passwd: Clarified wording.  Minor grammar changes.
	Origin: tats/w3m#241

2022-12-25  Tatsuya Kinoshita  <tats@debian.org>

	* configure: Update configure with acinclude.m4.

2022-12-25  Sam James  <sam@gentoo.org>

	* acinclude.m4: Fix configure tests broken with Clang 16.
	Origin: tats/w3m#248

2022-12-25  Rin Okuyama  <rokuyama.rk@gmail.com>

	* image.c, terms.c:
	For sixel, no need to round image size to multiple of character size.
	Origin: tats/w3m#246

	* image.c: Display resized image for OSC 5379 (mlterm).
	Origin: tats/w3m#245

2022-12-25  Rene Kita  <mail@rkta.de>

	* doc/README.siteconf: Say what the comment character is.
	Use the comment character in Examples.
	Origin: tats/w3m#237

	* main.c: Retry if loading of a file fails when argv_is_url.
	Origin: tats/w3m#235
	Bug-Debian: tats/w3m#210
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=537761
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=946440

2022-12-25  NRK  <nrk@disroot.org>

	* image.c: remove duplicate declaration.
	* cookie.c, entity.c, file.c, frame.c, func.c, image.c, linein.c:
	* mailcap.c, main.c, rc.c, rc.h, table.c, terms.c, terms.h:
	* w3mbookmark.c, w3mhelperpanel.c:
	fix all -Wmissing-prototypes warnings.
	* file.c, history.c, history.h, indep.c, indep.h, mailcap.c, proto.h:
	* rc.c, terms.c, url.c: fix some -Wstrict-prototypes warnings.
	Origin: tats/w3m#234

2022-12-25  Rene Kita  <mail@rkta.de>

	* .github/workflows/build.yml:
	Add GitHub Action to build source when pushing.
	Origin: tats/w3m#228

2022-12-21  Tatsuya Kinoshita  <tats@debian.org>

	* po/de.po, po/it.po, po/ja.po, po/sv_SE.po, po/w3m.pot, po/zh_CN.po:
	* po/zh_TW.po: Update PO strings.

2022-12-21  Rene Kita  <mail@rkta.de>

	* etc.c, fm.h, history.c, rc.c:
	Add option to set directory for temporary files.
	Origin: tats/w3m#219
	cf. tats/w3m#130

2022-12-21  Yash Lala  <yashlala@gmail.com>

	* rc.c: Use `Strnew_charp()` to create `char *` instead of `strdup()`.

	* rc.c:
	refactor: Substitute some clunky code with a `strdup()`.

	* doc/FAQ.html, doc/MANUAL.html, doc/w3m.1, rc.c:
	Set `rc_dir` based on `W3M_DIR` environment variable.
	Origin: tats/w3m#207
	cf. tats/w3m#130

2022-12-20  Tatsuya Kinoshita  <tats@debian.org>

	* etc.c: Fix potential overflow in checkType.

	* etc.c:
	Fix m17n backspace handling causes out-of-bounds write in checkType.
	[CVE-2022-38223]
	Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019599
	Bug-Debian: tats/w3m#242
bptato pushed a commit to bptato/w3m that referenced this pull request Jul 29, 2023
Successfully merging this pull request may close these issues.

w3mman does not render ansi escape sequences on redhat based distributions
3 participants