Kitty Image Protocol Support #986

Open
7 of 11 tasks
wez opened this issue Jul 28, 2021 · 20 comments

Comments

@wez
Owner

wez commented Jul 28, 2021

This issue is tracking the status of supporting Kitty's image protocol.

Spec at: https://sw.kovidgoyal.net/kitty/graphics-protocol/

  • termwiz: parse APC escapes
  • termwiz: parse baseline Kitty Image protocol APC escapes
  • term: basic placement implementation on top of existing iTerm2/sixel image data model
  • term: add placement data model to support full range of kitty placement options
  • wezterm-gui: teach renderer to allocate quads for new placement model
  • termwiz: parse animation/composition kitty image protocol sequences
  • term: apply animation to model
  • handle animation control
  • termwiz: support data transmission via shared memory on unix
  • termwiz: support data transmission via shared memory on Windows
  • placeholder support, which might enable support to function in tmux

Using it:

Enable the protocol by setting enable_kitty_graphics=true in your config.

Known conformance issues

  • The spec models image placements independently from the terminal cells, but wezterm maps them to terminal cells at placement time. As a result, overwriting cells with text can poke holes in an image in wezterm that may not appear in kitty.
wez added a commit that referenced this issue Jul 28, 2021
These were parsed but swallowed. This commit expands the transitions
to be able to track the APC start, data and end and then adds
an `apc_dispatch` method to allow capturing APC sequences.

APC sequences are used in the kitty image protocol.

refs: #986
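
As a rough illustration of the start/data/end tracking this commit describes, here is a minimal Python sketch of a state machine that hands each complete APC sequence to a callback. It is not termwiz's parser: the class name and state names are made up for the example, it only handles the 7-bit form (`ESC _` ... `ESC \`), and it simply abandons a sequence when `ESC` is followed by anything other than `\`.

```python
class ApcParser:
    """Toy APC extractor; the names here are hypothetical, not termwiz's API."""

    def __init__(self, apc_dispatch):
        self.apc_dispatch = apc_dispatch  # called with the bytes between ESC _ and ESC \
        self.state = "ground"
        self.buf = bytearray()

    def feed(self, data: bytes) -> None:
        for b in data:
            if self.state == "ground":
                if b == 0x1B:
                    self.state = "escape"
            elif self.state == "escape":
                if b == ord("_"):
                    # APC start: begin collecting data
                    self.state = "apc"
                    self.buf.clear()
                elif b != 0x1B:
                    self.state = "ground"
            elif self.state == "apc":
                if b == 0x1B:
                    self.state = "apc_escape"
                else:
                    self.buf.append(b)
            elif self.state == "apc_escape":
                if b == ord("\\"):
                    # String terminator: APC end
                    self.apc_dispatch(bytes(self.buf))
                # Assumption: anything else cancels the sequence
                self.state = "ground"
```

State is kept between `feed` calls, so a sequence split across two reads is still dispatched once.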
wez added a commit that referenced this issue Jul 28, 2021
This teaches termwiz to recognize and encode the APC
sequences used by the kitty image protocol.

This doesn't include support for animations, just the
transmit, placement and delete requests.

refs: #986
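
For readers unfamiliar with the wire format, a kitty graphics APC body is `G`, then comma-separated `key=value` pairs, then an optional `;` followed by a base64 payload. The sketch below splits such a body into its parts. The function name is hypothetical, and unlike a real implementation it does no validation of keys or values.

```python
import base64


def parse_kitty_command(apc: bytes):
    """Split an APC body like b"Ga=T,f=100;<base64>" into (keys, payload).

    Returns None when the body is not a kitty graphics command.
    """
    if not apc.startswith(b"G"):
        return None
    control, _, payload = apc[1:].partition(b";")
    keys = {}
    for item in control.split(b","):
        key, sep, value = item.partition(b"=")
        if not sep:
            continue  # skip empty or malformed entries (a simplification)
        keys[key.decode("ascii")] = value.decode("ascii")
    return keys, base64.b64decode(payload)
```

For example, `parse_kitty_command(b"Ga=T,f=100,i=10;aGVsbG8=")` gives the three keys as strings plus the decoded payload `b"hello"`.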
wez added a commit that referenced this issue Jul 28, 2021
wez added a commit that referenced this issue Jul 28, 2021
This isn't complete; many of the placement options are not supported,
and the status reporting is missing in a number of cases, including
querying/probing, and shared memory objects are not supported yet.

However, this commit is sufficient to allow the kitty-png.py script
(that was copied from
https://sw.kovidgoyal.net/kitty/graphics-protocol/#a-minimal-example)
to render a PNG in the terminal.

This implementation routes the basic image display via the same
code that we use for iterm2 and sixel protocols, but it isn't
sufficient to support the rest of the placement options allowed
by the spec.

Notably, we'll need to add the concept of image placements to
the data model maintained by the terminal state and find a way
to efficiently manage placements both by id and by a viewport
range.

The renderer will need to manage separate quads for placements
and order them by z-index, and adjust the render phases so that
images can appear in the correct plane.

refs: #986
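
For context, the transmit path that a script like the spec's minimal example exercises looks roughly like the sketch below: the image bytes are base64-encoded, split into chunks of at most 4096 bytes, and every chunk except the last carries `m=1`. This is a paraphrase of the approach in the spec's example, not the kitty-png.py script itself; the function names are chosen for illustration.

```python
import base64


def serialize_gr_command(payload=b"", **keys):
    """Build one APC graphics command: ESC _ G <keys> ; <payload> ESC \\"""
    control = ",".join(f"{k}={v}" for k, v in keys.items())
    out = b"\x1b_G" + control.encode("ascii")
    if payload:
        out += b";" + payload
    return out + b"\x1b\\"


def chunked_commands(data: bytes, chunk_size=4096, **keys):
    """Return the list of commands needed to transmit `data`."""
    encoded = base64.standard_b64encode(data)
    commands = []
    while encoded:
        chunk, encoded = encoded[:chunk_size], encoded[chunk_size:]
        more = 1 if encoded else 0  # m=1 means more chunks follow
        commands.append(serialize_gr_command(payload=chunk, m=more, **keys))
        keys = {}  # only the first chunk carries the control keys
    return commands
```

Writing each element of `chunked_commands(png_bytes, a="T", f=100)` to `sys.stdout.buffer` and flushing should display a PNG in a terminal that implements this part of the protocol.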
wez added a commit that referenced this issue Jul 28, 2021
Untested... it compiles!

refs: #986
@kovidgoyal

I am happy to see this, please don't hesitate to ping me if you have any questions.

@wez
Owner Author

wez commented Jul 29, 2021

> I am happy to see this, please don't hesitate to ping me if you have any questions.

Great to hear! Do you have a set of tests or similar that could be made to run against wezterm to sanity check conformance?
The surface area of the spec is quite large and I'm sure I'm going to overlook something. My game plan is to try running @dankamongmen's notcurses demo when there's enough of an implementation working and see if anything looks egregiously bad.

@kovidgoyal

I do have unit tests in kitty itself, for individual bits of functionality from the protocol, but I doubt they can be easily lifted for another terminal emulator. If you wish to have a look, please see kitty_tests/graphics.py

I would actually be happy to collaborate to make those (and other) tests defined in a terminal independent fashion so anyone implementing the protocol could use them. Perhaps a simple txt or json based format that specifies the input as escape codes and the output as a set of image placement data or similar.

@dankamongmen
Contributor

> I am happy to see this, please don't hesitate to ping me if you have any questions.

> Great to hear! Do you have a set of tests or similar that could be made to run against wezterm to sanity check conformance?
> The surface area of the spec is quite large and I'm sure I'm going to overlook something. My game plan is to try running @dankamongmen's notcurses demo when there's enough of an implementation working and see if anything looks egregiously bad.

first off, i enthusiastically support this move. the kitty protocol is (at least for my purposes) a tremendous improvement upon both sixel and iterm. it simplifies things (no more requirement that one draw in terms of 6 rows, sane deletion behavior), improves performance in several areas (fast moves, fast changes to drawn images), and makes certain things possible that otherwise are not (text drawn atop bitmaps without killing entire cells). relative to iterm2, it's much more flexible and powerful. at the same time, it currently shows worse local performance in Notcurses due to (a) greater bandwidth demands than sixel and (b) time spent in zlib. i expect to be able to hide the latter in most applications via threading. it would be possible to reduce the local bandwidth via using the filesystem as a side channel to load images, but i have not yet embraced this, and might never do so (but might, who knows).

right now Notcurses selects the Kitty protocol strictly based on heuristics, as opposed to doing the recommended query. I intend to add support for the latter (and really ought have by now), but hadn't bothered since no one else had implemented it (and also because said query frustratingly doesn't let you determine the version of the kitty protocol supported--as you note, it's a large protocol). you can ensure kitty graphics are being used by running notcurses-info:

[Screenshot: notcurses-info output (2021-07-29-005228_1083x620_scrot)]

where it says "rgba pixel animation support", you want that. wezterm currently says "sixel graphics" or "iterm graphics", i forget which one. if it says "rgba ....", you're driving kitty graphics. i'll try to add this query soon.

Notcurses uses a pretty wide subset of the protocol, and indeed motivated/proposed some of it. Among the elements it exercises are:

  • image loading via 32-bit RGBA without display
  • display of loaded images
  • movement of loaded images via position commands
  • deletion of loaded images specified by id
  • deletion of all images present
  • scrolling of images
  • inhibition of responses via q=2
  • non-scrolling of images via C=1
  • images in both the regular screen and alternative screen
  • fast cell-sized wipes via the animation protocol + RGBA load
  • fast cell-sized restores via the animation protocol + reflection (composition)
  • display of images larger than the terminal width (kitty cuts off excess material on the right), but only due to an existing bug

i do not currently exercise: sideloading images via the filesystem, managed animations, scaling, or z-indices other than 0, though i expect to start using the last Really Soon Now.

notcurses-tester and ncplayer -bpixel will also effectively test portions of your implementation.

if you have any questions, don't hesitate to ask me; i reckon i know more about the kitty graphics protocol than anyone living save @kovidgoyal himself. happy hacking!

@dankamongmen
Contributor

> if you have any questions, don't hesitate to ask me; i reckon i know more about the kitty graphics protocol than anyone living save @kovidgoyal himself. happy hacking!

oh and let me add that i make use of kitty's honoring of transparency values in RGBA (at least at a bimodal level), just as I do in iterm, and by using unspecified pixels in sixel when P2=1.

@kovidgoyal

@dankamongmen regarding using heuristics for detection because you don't have a version: you shouldn't need a version. You can create a dummy image and try to perform every operation you want on it and see if it succeeds, i.e. the terminal does not respond with an error. Thus you can know exactly what the terminal you are running in supports.

And of course, as I'm sure you already know, if you really want a version, use XTVERSION or XTGETTCAP.
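
The probing approach described here can be sketched as follows. The query bytes follow the form given in the spec (a 1x1 RGB image sent with `a=q`, followed by a primary device attributes request so that terminals without graphics support still answer something). The helper names are hypothetical, and reading the reply from the tty (raw mode, timeouts) is deliberately left out.

```python
def build_query(image_id: int = 31) -> bytes:
    """Graphics query for a dummy 1x1 RGB image, followed by a DA1 request."""
    query = f"\x1b_Gi={image_id},s=1,v=1,a=q,t=d,f=24;AAAA\x1b\\"
    return (query + "\x1b[c").encode("ascii")


def supports_kitty_graphics(response: bytes, image_id: int = 31) -> bool:
    """True if the reply contains an OK graphics response for image_id.

    Assumption: `response` holds everything the terminal sent back up to and
    including the DA1 reply.
    """
    marker = f"\x1b_Gi={image_id};".encode("ascii")
    start = response.find(marker)
    if start == -1:
        return False  # only the DA1 reply came back: no graphics support
    end = response.find(b"\x1b\\", start)
    if end == -1:
        return False
    return response[start + len(marker):end] == b"OK"
```

The same pattern extends to any other operation: send it against the dummy image and treat an error reply (or no graphics reply at all) as "unsupported".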

@dankamongmen
Contributor

> And of course, as I'm sure you already know, if you really want a version, use XTVERSION or XTGETTCAP.

this is exactly what i'm doing, see https://github.com/dankamongmen/notcurses/blob/master/src/lib/termdesc.c#L501-L527

> @dankamongmen regarding using heuristics for detection because you don't have a version: you shouldn't need a version. You can create a dummy image and try to perform every operation you want on it and see if it succeeds, i.e. the terminal does not respond with an error. Thus you can know exactly what the terminal you are running in supports.

yep, i could do that. what i've got now works for me pretty well, though.

and let it be known in all lands the sun touches: i wholeheartedly affirm the use of the Kitty graphics protocol over others. indeed, i sat down a few weeks ago to describe my ideal terminal graphics protocol, and ended up with something very similar to Kitty's.

i will be adding coarse kitty graphics support very shortly, probably tonight, in dankamongmen/notcurses#1998.

wez added a commit that referenced this issue Jul 31, 2021
This is a stepping stone towards dynamically allocating
vertices.

refs: #986
wez added a commit that referenced this issue Jul 31, 2021
This commit removes the `Quads` struct which maintained pre-defined quad
indices for each of the cells, the background image and scrollbar thumb.

In its place, we now "dynamically" hand out quads to meet the needs of
what is being rendered.  There are some efficiency gains here with
things like the selection (which can now be a single stretched quad,
rather than `n` quads in width).

This isn't a fully dynamic allocation scheme, as we still allocate the
current worst case number of quads when resizing.

A following commit will adjust that so that we allocate a ballpark and
then employ a mechanism similar to OutOfTextureSpace to grow and retry a
render pass when we need more quads.

Furthermore, this dynamic approach may allow reducing the amount of stuff
we have in the Vertex and "simply" render some quads before others so
that we don't have to have so many draw() passes to build up the
complete scene.

refs: #986
wez added a commit that referenced this issue Jul 31, 2021
This removes the pre-allocated (at resize) number of quads
and replaces it with a dynamic mechanism that tracks how many
quads are needed for a frame and then will re-allocate and
re-render when there weren't enough.

We start with 1024 quads and try to allocate in multiples
of 1024 quads.

refs: #986
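
The grow-and-retry idea in this commit message can be modelled in a few lines. This is only a Python sketch of the arithmetic, not the renderer code: `render_frame` is a hypothetical callback that attempts a frame with the given capacity and reports how many quads the frame actually needed.

```python
QUAD_CHUNK = 1024  # start with 1024 quads and grow in multiples of 1024


def quads_to_allocate(needed: int, chunk: int = QUAD_CHUNK) -> int:
    """Round `needed` up to a multiple of `chunk`, with a minimum of one chunk."""
    chunks = max(1, -(-needed // chunk))  # ceiling division
    return chunks * chunk


def render_with_retry(render_frame, capacity: int = QUAD_CHUNK) -> int:
    """Re-render with a larger allocation until the frame fits.

    Returns the capacity that was finally sufficient.
    """
    while True:
        needed = render_frame(capacity)
        if needed <= capacity:
            return capacity
        capacity = quads_to_allocate(needed)
```

A frame that needs 3000 quads would fail once at 1024, then succeed at 3072.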
wez added a commit that referenced this issue Jul 31, 2021
This was added in 365a68d to free the
orca from its cage.  With the recent dynamic quad allocation changes, we
don't need a distinct 4th pass any more and can simply layer a separate
quad on top of the glyph quad.

refs: #986
wez added a commit that referenced this issue Jul 31, 2021
Taking further advantage of dynamic quad allocation, we can now
remove the multiple render passes in favor of allocating the quads
and painting them from back to front.

In turn, this means that we can reduce the amount of data that we
store in the vertex, which simplifies the shaders a bit, at the
expense of making the render code in rust a bit more complex.

However, we can take advantage of stretching runs of cells with
background colors into a single quad.

refs: #986
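
To illustrate the "stretch runs of cells" point: adjacent cells with the same background color can be collapsed into one (start column, width, color) run, and each run then needs only one quad. The Python sketch below shows just that grouping step; the function name and the use of plain strings for colors are assumptions for the example.

```python
from itertools import groupby


def background_runs(colors):
    """Collapse a row of per-cell background colors into (start, width, color) runs."""
    runs = []
    col = 0
    for color, group in groupby(colors):
        width = len(list(group))
        runs.append((col, width, color))
        col += width
    return runs
```

A row such as `["k", "k", "r", "r", "r", "k"]` yields three runs instead of six per-cell quads.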
wez added a commit that referenced this issue Jul 31, 2021
Now that we have a single pass, we don't need to "include" snippets,
and that makes the shaders a lot easier to follow.

refs: #986
wez added a commit that referenced this issue Jul 31, 2021
You can run `wezterm --config enable_kitty_graphics=true` to do ad-hoc
tests with the protocol enabled.

refs: #986
refs: #1998
wez added a commit that referenced this issue Jul 31, 2021
Until we have dual source blending, we need the background color
in order to have a more visually pleasing alpha blend.

refs: #986
refs: #932
wez added a commit that referenced this issue Jul 31, 2021
Allows selecting a source "sprite" from the image data, and offsetting
its position within the cell.

refs: #986
wez added a commit that referenced this issue Aug 1, 2021
Just use a single quad for a given split

refs: #986
wez added a commit that referenced this issue Aug 1, 2021
I noticed when running the notcurses demo that we're spending a
decent amount of time decoding png data whenever we need to
re-do the texture atlas.

Let's avoid that by allowing for ImageData at the termwiz layer
to represent both the image file format and decoded rgba8 data.

This commit is a bit muddy and also includes some stuff to try
to delete placements from the model.  It's not perfect by any
means--more expensive than I want, and there's something funky
that causes a large number of images to build up during some
phases of the demo.

refs: #986
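
One way to picture the "file format or decoded rgba8" idea is a small tagged union where decoding happens at most once. This is a Python sketch with made-up type names, not the termwiz `ImageData` definition; `decode` stands in for whatever PNG/JPEG decoder is in use.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class EncodedFile:
    data: bytes  # raw file bytes, e.g. PNG


@dataclass(frozen=True)
class Rgba8:
    width: int
    height: int
    data: bytes  # width * height * 4 bytes


ImageData = Union[EncodedFile, Rgba8]


def ensure_decoded(image: ImageData, decode: Callable[[bytes], Rgba8]) -> Rgba8:
    """Return decoded pixels, decoding only if the image is still in file form."""
    if isinstance(image, Rgba8):
        return image
    return decode(image.data)
```

Storing the returned `Rgba8` back in place of the `EncodedFile` is what avoids repeated decoding when the texture atlas is rebuilt.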
wez added a commit that referenced this issue Aug 1, 2021
This adds a simple garbage collection scheme; when adding an image,
check to see if we're over budget on the total amount of RAM used
by the image data.

If we are, remove unreferenced images (images that are not placed)
until we're below the budget.

refs: #986
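
The scheme described above can be sketched like this. It is an illustration only: the function name is hypothetical, and evicting the oldest unplaced image first is an assumption, since the commit message does not say in which order unreferenced images are removed.

```python
def collect_garbage(images: dict, placed_ids: set, budget: int) -> int:
    """Drop unplaced images until total size is within `budget`.

    `images` maps image id -> pixel bytes and is mutated in place.
    Returns the total number of bytes still held.
    """
    total = sum(len(data) for data in images.values())
    for image_id in list(images):  # insertion order, i.e. oldest first (assumption)
        if total <= budget:
            break
        if image_id in placed_ids:
            continue  # still referenced by a placement: keep it
        total -= len(images.pop(image_id))
    return total
```

Note that placed images are never evicted, so the total can stay above budget if everything is still on screen.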
wez added a commit that referenced this issue Aug 3, 2021
Moves the localized hashing logic from term -> termwiz
where it can be re-used.

refs: #986
wez added a commit that referenced this issue Dec 16, 2021
wez added a commit that referenced this issue Dec 16, 2021
@stevenxxiu

stevenxxiu commented Jan 5, 2022

@wez I'm trying to use Broot in WezTerm and opened an issue there. Mind taking a look at some problems in Canop/broot#473 (comment) with the protocol support?

Kitty's spec specifies that "The image will be scaled (enlarged/shrunk) as needed to fit the specified area" and Wezterm doesn't seem to respect that.

I've also another problem: the only way to correctly detect the support without having problems on some terminals seems to be to read an env var and there's no guarantee wezterm sets it in an adequate way.

@Canop

Canop commented Jan 7, 2022

Everything is fine now, Broot 1.9.1 displays high resolution images in WezTerm:

image

@AnonymouX47

AnonymouX47 commented May 9, 2022

Hello!

First of all, great work here and I do recognize that there's no claim that the terminal fully supports the kitty graphics protocol yet.

I'm currently working on a project that uses this protocol to display images (See AnonymouX47/term-image#40).
So far, everything I've implemented is working fine on Kitty, and then I decided to try it out on other terminal emulators that were mentioned to support the protocol, Wezterm being the first.

From my short try with Wezterm (nightly build, downloaded a few hours before sending this message), I've encountered a number of crippling (for my use case) inconsistencies with the protocol's specifications (or behaviour in kitty). Here are a few:

1. Every image transmitted without an ID is replaced by the next image also without an ID.

According to the specification:

> You can either simultaneously transmit and display an image using the action a=T, or first transmit the image with a id, such as i=10 and then display it with a=p,i=10 which will display the previously transmitted image at the current cursor position.

emphasis on the "or"
From my understanding, this means a=T without an ID should simply place the image without attaching it to an ID... which doesn't seem like what Wezterm does.

Trying a=T on kitty displayed as many images as I tried (a lot) without deleting any of them... but that was not the case with Wezterm.
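
For anyone wanting to reproduce this, the two forms from the spec quote can be built as in the sketch below. The helper names are mine, and it assumes the base64 payload fits in a single chunk (at most 4096 bytes), so no `m=` chunking is shown.

```python
import base64


def gr_command(payload=b"", **keys):
    """Build one APC graphics command: ESC _ G <keys> ; <payload> ESC \\"""
    control = ",".join(f"{k}={v}" for k, v in keys.items())
    out = b"\x1b_G" + control.encode("ascii")
    if payload:
        out += b";" + payload
    return out + b"\x1b\\"


def transmit_and_display(png: bytes) -> bytes:
    """One step: a=T with no image ID."""
    return gr_command(base64.standard_b64encode(png), a="T", f=100)


def transmit_then_place(png: bytes, image_id: int):
    """Two steps: a=t with an ID, then a=p referring to that ID."""
    transmit = gr_command(base64.standard_b64encode(png), a="t", f=100, i=image_id)
    place = gr_command(a="p", i=image_id)
    return transmit, place
```

Sending the output of `transmit_and_display` several times in a row, with different images, is the case where I saw the difference between kitty and Wezterm.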

2. Control codes c and r don't exactly work as specified.

If c and r are in relative proportion to the image, it'll scale the image... else it crops the image. (See the images below)

[image: Scaled]

[image: Cropped (and the scale is even way off)]

[image: Cropped]

3. Cursor advancement

According to the specs:

> After placing an image on the screen the cursor must be moved to the right by the number of cols in the image placement rectangle and down by the number of rows in the image placement rectangle.

Trying this without C=1 (which Wezterm doesn't support yet) and without the cursor reaching either edge of the window:

  • On kitty, the cursor ends up on the cell to the right of the bottom-leftmost cell occupied by the image.
  • On Wezterm, the cursor ends up on the first cell of the line immediately below the bottommost line occupied by the image.

Can't readily provide images now but I'm pretty sure those were the results


Please, keep in mind that inconsistency in interpretation and implementation of protocols is one of the major reasons we're where we are with terminal emulators today.

Thanks for your audience and I do hope you look into these and more... Keep up the great work.

@AnonymouX47

AnonymouX47 commented May 9, 2022

By the way, here are the results of c and r with kitty (0.25.0):

image

image

image

All properly scaled.
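
To make "properly scaled" concrete, this is how I read the spec's statement that the image is scaled to fit the area given by c and r: the target rectangle is c columns by r rows of cells, and the image is stretched independently on each axis to fill it. A small sketch of that arithmetic (the function name and the example cell size are assumptions):

```python
def placement_scale(src_w, src_h, cols, rows, cell_w, cell_h):
    """Return (dest_w, dest_h, scale_x, scale_y) for a c/r placement.

    Sizes are in pixels; cols/rows are the c and r keys.
    """
    dest_w = cols * cell_w
    dest_h = rows * cell_h
    return dest_w, dest_h, dest_w / src_w, dest_h / src_h
```

With a 100x50 image, c=10, r=2 and 8x16 cells, the target is 80x32 pixels, so the image is shrunk by different factors horizontally and vertically rather than cropped.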

@AnonymouX47

@wez 👆🏾

@wez
Owner Author

wez commented May 18, 2022

I saw this, and I appreciate the detail in your comments, but haven't had time to look at it. If you're eager to see this improved more quickly then I would be happy to accept a PR!

Otherwise, I'd like to remind you that this is free software that I hack on in my spare time, and I have a lot of demands on my spare time.

@AnonymouX47

Oh! I totally understand that, I was only expecting some form of acknowledgement (of the comment).

It's not urgent for me as I personally don't use Wezterm, I was only testing out my project in different terminal emulators and decided to report bugs/issues I found (just like kovidgoyal/kitty#5081).

Thanks for your response.

@dholth

dholth commented Dec 17, 2022

Do I remember correctly that wezterm used to have a "cat image to screen" utility? Where's a good one?

@AnonymouX47

AnonymouX47 commented Dec 18, 2022

The default tool is the imgcat subcommand. As described in the docs:

To render an image inline in your terminal:

$ wezterm imgcat /path/to/image.png

Also, you can use term-image, which provides an interactive user interface for browsing/viewing images... still under development though.

Disclosure: I might be biased about the latter :) but feel free to judge for yourself.

There are a number of other tools out there that also perform similar tasks, e.g. timg.

@bew
Sponsor Contributor

bew commented Apr 15, 2023

The latest release of Kitty added an interesting feature that allows displaying images without explicit support from a program, using unicode chars, diacritics and fg color to select and position an image.
Refs:

@AnonymouX47

AnonymouX47 commented Apr 15, 2023

A wezterm fork (by the author of the feature) that implements support for the feature: https://github.com/sergei-grechanik/wezterm/tree/unicode-placeholders

Might make sense to ask the author to open a PR.

@AndydeCleyre

Assuming that adding support for Kitty's unicode placeholder model means Wezterm + tmux can support the Kitty image protocol, can we get a todo-checkbox for it up top here?

@constantitus

constantitus commented Nov 22, 2023

I'm trying to find a good way to display images in neovim, but most plugins use the kitty protocol, which doesn't seem to work as intended on wezterm.
What would cause this discrepancy between kitty and wezterm?

edluffy/hologram.nvim:

wezterm.mp4

Edit: Another plugin, another issue

3rd/image.nvim:

image.nvim.mp4

BourgeoisBear added a commit to BourgeoisBear/rasterm that referenced this issue Apr 3, 2024
- renamed image protocol check functions
- switched back to base64.StdEncoding for Kitty format.  Kitty handles Std or
Raw, but wezterm only accepts Std.
- wezterm is also kitty-capable now (wez/wezterm#986)
- updated tests
- write kitty image preamble outside of chunk writer
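
The Std versus Raw distinction in this commit message appears to be about base64 padding: "Std" output is padded with `=` to a multiple of four characters, "Raw" output is not. The Python sketch below shows the difference and how an unpadded string can be re-padded before sending it to a terminal that only accepts the padded form. The function names are made up for illustration.

```python
import base64


def std_encode(data: bytes) -> bytes:
    """Padded base64 (the form reported above to work in both terminals)."""
    return base64.standard_b64encode(data)


def raw_encode(data: bytes) -> bytes:
    """Unpadded base64: same alphabet, trailing '=' removed."""
    return base64.standard_b64encode(data).rstrip(b"=")


def raw_to_std(encoded: bytes) -> bytes:
    """Restore padding so the length is a multiple of four."""
    return encoded + b"=" * (-len(encoded) % 4)
```

For `b"hi"` the padded form is `aGk=` and the unpadded form is `aGk`.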