Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support plotting with multibyte characters #99

Merged
merged 4 commits into from
Nov 25, 2023

Conversation

edgar-bonet
Copy link
Collaborator

This pull request addresses issue #84. Note that, in order for ncurses to support non-ASCII characters, we need to:

  • Ask for wide character support by defining a suitable macro before including <curses.h>. I went for _XOPEN_SOURCE, but we could use the more explicit NCURSES_WIDECHAR instead.

  • Link with the version of the library that provides this support. On my system (Ubuntu 22.04), this is ncursesw.so. Some tests should be done on other OSes in order to find the suitable link options.

One drawback of using the wide character version of mvvline() is that it does not support ACS (alternative character set) characters. Since we cannot draw with ACS_VLINE, we default to the visually equivalent “│” (U+2502 box drawings light vertical). This, however, cannot be done on the C locale (or any other 8-bit locale). In this case we use the ASCII character “|” (U+007C vertical line), which unfortunately leaves tiny gaps between successive character cells.

With this PR, the command:

{ ./torture | head -75; sleep 1m; } | ./ttyplot -c ╳

displays:

screenshot

I tested with multiple plotting characters, inside and outside the BMP, and also with ASCII characters in the C locale. It works for me, but more testing would be welcome, especially on OSes other than Ubuntu.

Fixes #84.

@hartwork
Copy link
Collaborator

hartwork commented Nov 13, 2023

Hi @edgar-bonet, very nice! Compiled and ran on Gentoo. Two small things:

I get this warning when compiling:

# make
cc -Wall -Wextra    ttyplot.c  `pkg-config --libs ncursesw 2>/dev/null || echo '-lcursesw -ltinfo'` -o ttyplot
ttyplot.c: In function ‘draw_line’:
ttyplot.c:135:9: warning: implicit declaration of function ‘mvvline_set’; did you mean ‘mvvline’? [-Wimplicit-function-declaration]
  135 |         mvvline_set(ph+1-l1, x, c1, l1-l2 );
      |         ^~~~~~~~~~~
      |         mvvline

This is with GCC 12.

The other thing is that the ^ and > on the axes' end look "less ASCII" now. Maybe it's just my font but I liked before better personally:

Before

before_Screenshot_20231113_150724

After

after_Screenshot_20231113_150755

@edgar-bonet
Copy link
Collaborator Author

@hartwork: Thanks for testing!

implicit declaration of function ‘mvvline_set’

You may be using the wrong include file, or the wrong feature-test macro. On my Ubuntu, /usr/include/curses.h has this:

#ifndef NCURSES_WIDECHAR
#if defined(_XOPEN_SOURCE_EXTENDED) || (defined(_XOPEN_SOURCE) && (_XOPEN_SOURCE - 0 >= 500))
#define NCURSES_WIDECHAR 1
#else
#define NCURSES_WIDECHAR 0
#endif
#endif /* NCURSES_WIDECHAR */

It then declares all wide character functions (including mvvline_set()) if NCURSES_WIDECHAR is non-zero.

Could you check how it works on your Gentoo? Is mvvline_set() defined in a different file? Is its definition conditioned on something else? Would it be appropriate to directly define NCURSES_WIDECHAR? Does pkg-config --cflags ncursesw give some good clue?

The other thing is that the ^ and > on the axes' end look "less ASCII" now. Maybe it's just my font but I liked before better personally

I also like the “before” better. I'll have to check why this happened...

@hartwork
Copy link
Collaborator

#ifndef NCURSES_WIDECHAR
#if defined(_XOPEN_SOURCE_EXTENDED) || (defined(_XOPEN_SOURCE) && (_XOPEN_SOURCE - 0 >= 500))
#define NCURSES_WIDECHAR 1
#else
#define NCURSES_WIDECHAR 0
#endif
#endif /* NCURSES_WIDECHAR */

@edgar-bonet same here.

It then declares all wide character functions (including mvvline_set()) if NCURSES_WIDECHAR is non-zero.

Could you check how it works on your Gentoo? Is mvvline_set() defined in a different file?

A different file, yes:

# fgrep -R mvvline_set /usr/include/ncurses*
/usr/include/ncursestw/ncurses.h:extern NCURSES_EXPORT(int) mvvline_set (int, int, const cchar_t *, int);       /* generated:WIDEC */
/usr/include/ncursestw/ncurses.h:#define mvvline_set(y,x,c,n)           mvwvline_set(stdscr,(y),(x),(c),(n))
/usr/include/ncursestw/curses.h:extern NCURSES_EXPORT(int) mvvline_set (int, int, const cchar_t *, int);        /* generated:WIDEC */
/usr/include/ncursestw/curses.h:#define mvvline_set(y,x,c,n)            mvwvline_set(stdscr,(y),(x),(c),(n))
/usr/include/ncursesw/ncurses.h:extern NCURSES_EXPORT(int) mvvline_set (int, int, const cchar_t *, int);        /* generated:WIDEC */
/usr/include/ncursesw/ncurses.h:#define mvvline_set(y,x,c,n)            mvwvline_set(stdscr,(y),(x),(c),(n))
/usr/include/ncursesw/curses.h:extern NCURSES_EXPORT(int) mvvline_set (int, int, const cchar_t *, int); /* generated:WIDEC */
/usr/include/ncursesw/curses.h:#define mvvline_set(y,x,c,n)             mvwvline_set(stdscr,(y),(x),(c),(n))

None of these seem to be pulled in from curses.h automatically…

$ grep '#.*include' /usr/include/curses.h | sort
#include <libutf8.h>
#include <ncurses_dll.h>
#include <stdarg.h>     /* we need va_list */
#include <stdbool.h>
#include <stddef.h>     /* we want wchar_t */
#include <stdint.h>
#include <stdio.h>
#include <stdnoreturn.h>
#include <unctrl.h>
#include <wchar.h>              /* ...to get mbstate_t, etc. */

…so it may need an additional include.

Does pkg-config --cflags ncursesw give some good clue?

It does, good idea!

# pkg-config --cflags ncursesw
-D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -I/usr/include/ncursesw

So the Makefile would need respect these flags also and then it may work out of the box.

The other thing is that the ^ and > on the axes' end look "less ASCII" now. Maybe it's just my font but I liked before better personally

I also like the “before” better. I'll have to check why this happened...

My guess is that that is what ncursesw makes of the ACS_* values used in the code.

Copy link
Collaborator

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@edgar-bonet so it would be great to have the build system add pkg-config --cflags ncursesw to CFLAGS. PS: The pull request title has a small typo "mu[l]tibyte".

@edgar-bonet edgar-bonet changed the title Support plotting with mutibyte characters Support plotting with multibyte characters Nov 13, 2023
@edgar-bonet
Copy link
Collaborator Author

@hartwork wrote:

so it would be great to have the build system add pkg-config --cflags ncursesw to CFLAGS

I just pushed a commit with this change. Does it fix the warning you see?

Copy link
Collaborator

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a commit with this change. Does it fix the warning you see?

@edgar-bonet yes it does, thank you!

@edgar-bonet
Copy link
Collaborator Author

About the axes ends changing from ^/> to /, @hartwork wrote:

My guess is that that is what ncursesw makes of the ACS_* values used in the code.

Indeed. It appears that two conditions are needed for this to happen:

  • link with ncursesw
  • call setlocale() with a locale supporting /.

One way to avoid this change would be to unconditionally define T_*ARR to '^'/'>'. Or maybe make them wide characters too and use either L'▲'/L'▶' or L'△'/L'▷'.

@tenox7: Do your vintage Unix systems support wide characters? Specifically, is mbtowc() provided? Do they have a version of the curses library that provides mvvline_set()? If not, either this will have to be a build option, or we may try @MIvanchev's idea of outputting multibyte strings directly.

@hartwork
Copy link
Collaborator

One way to avoid this change would be to unconditionally define T_*ARR to '^'/'>'. Or maybe make them wide characters too and use either L'▲'/L'▶' or L'△'/L'▷'.

@edgar-bonet 👍

@edgar-bonet
Copy link
Collaborator Author

As this pull request's branch conflicted with master, I rebased and force-pushed.

@hartwork, could you please test this on Gentoo? I am worried about the include file: the previous version of this pull request added /usr/include/ncursesw to the include path, whereas commit 45e0db8 (which landed in master today) changed the header name to ncurses.h. Rebasing the PR on master changes the include file again:

this PR (old):  /usr/include/curses.h  → /usr/include/ncursesw/curses.h
commit 45e0db8: /usr/include/curses.h  → /usr/include/ncurses.h
this PR (new):  /usr/include/ncurses.h → /usr/include/ncursesw/ncurses.h

Also, is the last commit's message correct? It says

On Gentoo, however, /usr/include/ncurses.h and /usr/include/ncursesw/ncurses.h are different files, and only the latter defines the wide character functions

@tenox7
Copy link
Owner

tenox7 commented Nov 17, 2023

feel free to change it to ncursesw or whatever is needed for wide chars, I just wanted to break away from old curses; or revert my change if you think curses.h is better

@edgar-bonet
Copy link
Collaborator Author

@tenox7: For me, it makes no difference at all: on Ubuntu, the files

  • /usr/include/ncurses.h
  • /usr/include/ncursesw/curses.h
  • /usr/include/ncursesw/ncurses.h

are all symlinks to /usr/include/curses.h. That file declares everything we may need: the classic curses API, the ncurses extensions and – given the proper #define – the wide character extensions.

The relevant question is about portability: What would be the most portable way to get the declarations we want? Are there systems where ncurses.h is needed to get the ncurses or wchar extensions? Are there systems that lack ncurses.h but have everything in curses.h? Where should we search for those files? pkg-config seems like a good way to get the include paths but, if we want to support systems without pkg-config (do we want that?), what fallback path should we use? I do not have the answers to those questions. 😦

FWIW, the Debian package ncurses-doc contains man pages that recommend including curses.h, and a HOWTO that recommends ncurses.h instead. 😕 The ncurses package also came with two pkg-config package definitions: one with and one without the wchar extensions. They differ only in the name and the -l compiler option:

--- a/usr/lib/x86_64-linux-gnu/pkgconfig/ncurses.pc
+++ b/usr/lib/x86_64-linux-gnu/pkgconfig/ncursesw.pc
@@ -1,19 +1,19 @@
 # pkg-config file generated by gen-pkgconfig
 # vile:makemode
 
 prefix=/usr
 exec_prefix=${prefix}
 libdir=/usr/lib/x86_64-linux-gnu
 includedir=${prefix}/include
 abi_version=6
 major_version=6
 version=6.3.20211021
 
-Name: ncurses
+Name: ncursesw
 Description: ncurses 6.3 library
 Version: ${version}
 URL: https://invisible-island.net/ncurses
 Requires.private: 
-Libs:  -L/usr/lib/x86_64-linux-gnu -lncurses -ltinfo
+Libs:  -L/usr/lib/x86_64-linux-gnu -lncursesw -ltinfo
 Libs.private:  -ldl 
 Cflags:  -D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600

Makefile Outdated Show resolved Hide resolved
ttyplot.c Show resolved Hide resolved
@hartwork
Copy link
Collaborator

The relevant question is about portability: What would be the most portable way to get the declarations we want? Are there systems where ncurses.h is needed to get the ncurses or wchar extensions? Are there systems that lack ncurses.h but have everything in curses.h? Where should we search for those files? pkg-config seems like a good way to get the include paths but, if we want to support systems without pkg-config (do we want that?), what fallback path should we use? I do not have the answers to those questions. 😦

@edgar-bonet my vote would be for requiring widely available ncurses (https://repology.org/project/ncurses/versions) and command pkg-config (sometimes provided by pkgconf). That would allow dropping the || echo ... (that has a hard time working for e.g. Gentoo and Debian at the same, as we learned today, further up). @tenox7 what do you think?

@edgar-bonet
Copy link
Collaborator Author

@hartwork: This PR would not work with traditional curses: it relies on the wchar extensions, which come with ncurses. The thing is, we can usually import the ncurses API by including either curses.h or ncurses.h, and I don't know whether one option is more portable than the other.

Regarding pkg-config, I wonder how many systems would be left out by such requirement. According to Wikipedia, it is available for Linux, BSD, Windows, macOS, and Solaris. Does not look like an unreasonable requirement to me.

@hartwork
Copy link
Collaborator

@hartwork: This PR would not work with traditional curses: it relies on the wchar extensions, which come with ncurses. The thing is, we can usually import the ncurses API by including either curses.h or ncurses.h, and I don't know whether one option is more portable than the other.

@edgar-bonet I'd be happy either way, maybe with a slight preference for ncurses.h because it says what we developed against and doesn't pretend to support something else as of today. We could look at more distros if we find any with ncurses that does not install both files but I'd doubt we'd find any. One thing we could do is see how much of a chance the new code has to work with pdcurses either way but only supporting ncurses would probably be acceptable. My personal interest in pdcurses is limited by the fact that neither Debian nor Gentoo have a package for it, which makes it feel a bit too niche to me right now, but some distros have pdcurses packaged. How do you feel about the whole topic (not just pdcurses)?

Regarding pkg-config, I wonder how many systems would be left out by such requirement. According to Wikipedia, it is available for Linux, BSD, Windows, macOS, and Solaris. Does not look like an unreasonable requirement to me.

👍

@edgar-bonet
Copy link
Collaborator Author

@hartwork: I'm happy with including ncurses.h.

I hadn't heard about PDCurses before. Seems niche indeed: it's described as “A curses library for environments that don't fit the termcap/terminfo model” (not what we target), it doesn't provide a Makefile for Unix other than X11 and SDL, and most distro packages describe it as “Curses for Windows/MinGW/X11”. I would say don't bother unless someone volunteers to maintain the port.

Long term, it would be nice to have an easy way to run tests on multiple Linux and non-Linux OSes. A Vagrantfile could be an option if/when a list of officially supported systems is established.

@hartwork
Copy link
Collaborator

hartwork commented Nov 17, 2023

@hartwork: I'm happy with including ncurses.h.

I hadn't heard about PDCurses before. Seems niche indeed: it's described as “A curses library for environments that don't fit the termcap/terminfo model” (not what we target), it doesn't provide a Makefile for Unix other than X11 and SDL, and most distro packages describe it as “Curses for Windows/MinGW/X11”. I would say don't bother unless someone volunteers to maintain the port.

@edgar-bonet 👍

Long term, it would be nice to have an easy way to run tests on multiple Linux and non-Linux OSes. A Vagrantfile could be an option if/when a list of officially supported systems is established.

https://github.com/mpartel/bindfs/blob/5c8390f75925797a81973925a14cdf8ab092a3bc/.github/workflows/tests.yml#L108 has a CI using Vagrant based on GitHub Actions, but it's a bit slow whenever a CI runner does not come with KVM acceleration, there seems to be some roulette about it. Free alternatives for running Vagrant inside some CI may include AppVeyor and Cirrus CI, but it would need to be tried/verified. Also Vagrant images for all target OSes and the right driver would be needed, e.g. the VirtualBox Vagrant images only work if the driver is VirtualBox, same for KVM/Qemu.

@tenox7
Copy link
Owner

tenox7 commented Nov 18, 2023

yeah lets make ncurses a requirement, I'm happy with that; ncurses are available even for very old unixes; I haven't seen or used pdcurses in a very very long time

@edgar-bonet
Copy link
Collaborator Author

@tenox7: Did you have a chance to take look at this PR? As far an I am concerned, it is ready, but I would happily amend it if requested.

Should I change the arrows at the end of the axes (/) within this PR? If so, should I use ^/>, / or /?

@tenox7
Copy link
Owner

tenox7 commented Nov 18, 2023

sorry, for slow responses, let me check

@tenox7
Copy link
Owner

tenox7 commented Nov 18, 2023

I want to test it a little bit in different OSes and terminals.

@hartwork
Copy link
Collaborator

hartwork commented Nov 18, 2023

Should I change the arrows at the end of the axes (/) within this PR? If so, should I use ^/>, / or /?

@edgar-bonet my vote for any of these over the current arrows, yes please. @tenox7 how do you feel about the topic?

@edgar-bonet
Copy link
Collaborator Author

I pushed a commit that unconditionally sets the arrow tips to the ASCII characters ^/>.

@hartwork
Copy link
Collaborator

@edgar-bonet good call 👍

@tenox7
Copy link
Owner

tenox7 commented Nov 23, 2023

It doesn't build on macOS:

$ sw_vers 
ProductName:		macOS
ProductVersion:		14.1.1
BuildVersion:		23B81
$ cc -Wall -Wextra -lcursesw ttyplot.c -o ttyplot
ttyplot.c:47:1: error: unknown type name 'cchar_t'; did you mean 'wchar_t'?
cchar_t plotchar, max_errchar, min_errchar;
^~~~~~~
wchar_t

(repeated many times)

@edgar-bonet
Copy link
Collaborator Author

@tenox7 wrote:

ttyplot.c:47:1: error: unknown type name 'cchar_t'; did you mean 'wchar_t'?

It would seem the wrong ncurses.h is being included. @hartwork had a similar problem: he has two versions of ncurses.h in his Gentoo, one with and one without the wide-character extensions. Or maybe you have the right ncurses.h but you are lacking some feature-enabling macro that is needed for those extensions to be declared.

$ cc -Wall -Wextra -lcursesw ttyplot.c -o ttyplot

Did you try just typing make? The Makefile calls pkg-config in order to get the compiler's -I flag that will get you the correct ncurses.h, and the -D flags needed to enable the widechar API. Do you have pkg-config installed? What does pkg-config --cflags ncursesw say?

@tenox7
Copy link
Owner

tenox7 commented Nov 23, 2023

yes, just make on macos

$ git clone https://github.com/edgar-bonet/ttyplot.git && cd ttyplot && git checkout multibyte
$ make
cc -Wall -Wextra `pkg-config --cflags ncursesw 2>/dev/null || echo '-D_DEFAULT_SOURCE -D_XOPEN_SOURCE=600 -I/usr/include/ncursesw'`    ttyplot.c  `pkg-config --libs ncursesw 2>/dev/null || echo '-lcursesw -ltinfo'` -o ttyplot
ttyplot.c:47:1: error: unknown type name 'cchar_t'; did you mean 'wchar_t'?
cchar_t plotchar, max_errchar, min_errchar;
^~~~~~~
wchar_t
$ pkg-config --cflags ncursesw
-D_DARWIN_C_SOURCE

The options -c, -e and -E only support ASCII characters. Make them
support multibyte characters too, as box drawing and block elements can
make nice plotting glyphs.

This requires wide character support from the "ncursesw" version of the
ncurses library. Only characters supported by the user's locale can be
used, which includes the full Unicode repertoire on UTF-8 locales.

As the wide character API of ncurses does not support ACS characters,
the default plotting character on locales lacking mutibyte support is
now "|" (U+007C vertical line).
On Ubuntu, /usr/include/ncursesw/ncurses.h is a symlink to ../ncurses.h.
On Gentoo, however, /usr/include/ncurses.h and
/usr/include/ncursesw/ncurses.h are different files, and only the latter
defines the wide character functions: we need to give the compiler the
proper -I flag.

For the sake of portability, ask pgk-config for the correct compiler
flags (including -I). Default to the flags used on Gentoo, which also
work on Ubuntu.
Enabling multi-byte support changes the arrow tips from the ASCII
characters '^'/'>' to the Unicode arrows '↑'/'→' (U+2191 Upwards arrow
and U+2192 Rightwards arrow).

The ASCII characters look better on the graph: use them unconditionally.
@edgar-bonet
Copy link
Collaborator Author

@tenox7 wrote:

yes, just make on macos

There is something fishy going on: pkg-config does find the package ncursesw (where the w stands for “widechar”), yet the CFLAGS it gives cannot get us the widechar API. :-(

Could you try to see how many files matching the glob *curses*.h you have on your system? Do any of these define the type cchar_t? If so, is the definition protected by some #if or #ifdef? If not, does any of your *.h files define cchar_t?

PS: As there was a conflict, I rebased on master and force-pushed.

@tenox7
Copy link
Owner

tenox7 commented Nov 23, 2023

I think there may be a conflict between curses from the SDK/Xcode and Homebrew. Checking.

$ find /opt/homebrew/ -name ncurses\*.h
/opt/homebrew//Cellar/ncurses/6.4/include/ncurses.h
/opt/homebrew//Cellar/ncurses/6.4/include/ncursesw/ncurses.h
/opt/homebrew//Cellar/ncurses/6.4/include/ncursesw/ncurses_dll.h

$ find /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/ -name ncurses\*h
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk//usr/include/ncurses.h
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk//usr/include/ncurses_dll.h

@tenox7
Copy link
Owner

tenox7 commented Nov 23, 2023

I did this and it worked:

cc -Wall -Wextra -D_DARWIN_C_SOURCE -D_XOPEN_SOURCE_EXTENDED    -o ttyplot ttyplot.c -lncurses

perhaps something like this?

#if defined(__APPLE__)
#define _XOPEN_SOURCE_EXTENDED
#endif

This macro enables the wide character API. Although it has been
deprecated in favor of _XOPEN_SOURCE, macOS 14 ships a version of
ncurses that does not understand the later. Furthermore, pkg-config
fails to define it.
@edgar-bonet
Copy link
Collaborator Author

@tenox7: Thanks for debugging the issue!

I checked man ncurses, and it says _XOPEN_SOURCE_EXTENDED is deprecated in favor of _XOPEN_SOURCE. I tried the later, for consistency with what pkg-config sets on Linux, but the CI build failed on Mac. So I just followed your suggestion, and CI is happy. 😄

Copy link
Collaborator

@hartwork hartwork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not be happier that the new macOS CI from #108 could be of help here. Let's get this merged 👍

edgar-bonet added a commit to edgar-bonet/ttyplot that referenced this pull request Nov 23, 2023
* Define _XOPEN_SOURCE_EXTENDED on macOS
* Always use ASCII characters for the arrow tips
* Rely on pkg-config to find the proper ncurses.h
* Support plotting with mutibyte characters
@edgar-bonet edgar-bonet changed the base branch from master to development November 25, 2023 21:46
@edgar-bonet edgar-bonet merged commit e1c7f7e into tenox7:development Nov 25, 2023
5 checks passed
@edgar-bonet edgar-bonet deleted the multibyte branch November 25, 2023 22:38
@hartwork hartwork added this to the 1.6.0 milestone Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-byte characters not supported?
4 participants