Towards Generic Behaviors #213

Open
innovaker opened this issue Sep 26, 2020 · 17 comments
Labels
behaviors core Core functionality/behavior of ZMK enhancement New feature or request keymaps PRs and issues related to keymaps rfc

Comments

@innovaker
Contributor

Background

Outputs via HID are defined by 2 'coordinates':

  • Application Usage (32-bit) - e.g. Keyboard, Keypad, Consumer Control, Mouse, Joystick, Gamepad, etc.
    • Page (16-bit)
    • Id (16-bit)
  • Item Usage (32-bit) - e.g. A, B, C, mute, next, button 1, x axis, etc.
    • Page (16-bit)
    • Id (16-bit)

Technically there's other HID metadata that defines an input/output - i.e. Collection Physical, Collection Logical, Usage Switch, Usage Modifier - but my current understanding is that they're irrelevant for this context due to the constraints of our HID reports and how OS drivers interpret/treat them?  At least for keyboards, mice, joysticks - consumer too?  If I've got that wrong, please correct me because it's critical!

If utilized in their 32-bit forms, these 'coordinates' amount to 8-bytes, or 2 integers. This makes HID bindings tricky to define in the byte-array keymaps currently used by ZMK. It's partly why we currently have &kp and &cp which each symbolize the meaning of 6 of the bytes.
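
To make the 8-byte figure concrete, here is a minimal sketch (C11) of one binding with both 'coordinates' kept at full width. The struct and helper names are hypothetical and not part of ZMK; the page and ID values are the standard HID ones for Generic Desktop / Keyboard and Keyboard/Keypad / A.

```c
#include <assert.h>
#include <stdint.h>

/* Extended usage: usage page in the high 16 bits, usage ID in the low 16 bits. */
static inline uint32_t extended_usage(uint16_t page, uint16_t id) {
    return ((uint32_t)page << 16) | id;
}

/* Hypothetical unpacked binding holding both coordinates at full width. */
struct hid_coordinates {
    uint32_t application_usage; /* e.g. Generic Desktop (0x01) / Keyboard (0x06) */
    uint32_t item_usage;        /* e.g. Keyboard/Keypad (0x07) / A (0x04) */
};

/* Two 32-bit usages cost 8 bytes per binding, before any modifiers are added. */
static_assert(sizeof(struct hid_coordinates) == 8, "two extended usages occupy 8 bytes");

/* Example: the 'A' key in the context of the Keyboard application. */
static inline struct hid_coordinates keyboard_a(void) {
    struct hid_coordinates c = {
        .application_usage = extended_usage(0x01, 0x06),
        .item_usage = extended_usage(0x07, 0x04),
    };
    return c;
}
```

A keymap binding parameter is a single 32-bit value, so this full-width form does not fit without some kind of packing or mapping.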

Generic HID Behaviors

I'd like to propose moving towards generic HID behaviors.  They work with:

  • any HID application (keyboard, keypad, consumer, mouse, joystick, etc.)
  • any HID keys
  • any HID buttons
  • any HID controls (dials, sliders, etc.).
  • etc.

Example Syntax

&p A ... or ... p(A) ... // press A
&p M_1 ... or ... p(M_1) ... // press Mouse Button 1
&ht LSFT C_NEXT ... or ... ht(LSFT, C_NEXT) ... // hold for LSFT, tap for Next Track

I believe this syntax is concise, flexible and should be future-proof too.
It should also upgrade gracefully by making &kp and &cp aliases of their successor (they'd become legacy).

How?

It's a compromise between concise syntax, FLASH and SRAM. It relies on encoding (bit-packing) the 'coordinates' within the definition of each key (keys.h). Like this ...

  • Item Usage:
    • Page: raw HID code p => p or a remapping p => map(p)
    • Id: raw HID code id => id or a slight trimming id => id & 0x1FFF
  • Application Usage: mapping the meaning of the usage (page and id) into one numeric code (p, id) => map(p, id)
  • Modifiers (shifting): an 8-bit array

This combines well with @okke-formsma's modifier proposal #86 as well as my suggestion for improving the shift syntax.

Implementation

How precisely the encoding is achieved is implementation detail and I'm still finessing it. But an ongoing exploration of the possibilities can be found here:
https://docs.google.com/spreadsheets/d/1Il4vEMcD3YD7sVDYb6pnMQfNpK-Pmp5AYrStf97DtI0/

Constraints

  • Application Usage should be mapped.
    • There are only 3 - 8 applications that the majority of users will care about.
    • The application codes are spread across several usage pages.
  • Item Usage Id can be trimmed down to 13-bits (unmapped) without losing anything important (12-bits is probably possible too).
  • Item Usage Page can be trimmed down to 7-bits (unmapped) or 5-bits (unmapped) without losing anything potentially useful. But it might be better just to map them to our own set of codes, see notes ...
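
As a rough sketch of how those widths could share one 32-bit value: 8 bits of modifiers, 4 bits of mapped application, 7 bits of item usage page and 13 bits of item usage ID add up to exactly 32. The field order and every name below are assumptions for illustration; the spreadsheet may lay the bits out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths come from the discussion; the field order is an assumption. */
#define USE_ID_BITS   13u
#define USE_PAGE_BITS 7u
#define USE_APP_BITS  4u
#define USE_MODS_BITS 8u

#define USE_ID_SHIFT   0u
#define USE_PAGE_SHIFT (USE_ID_SHIFT + USE_ID_BITS)     /* 13 */
#define USE_APP_SHIFT  (USE_PAGE_SHIFT + USE_PAGE_BITS) /* 20 */
#define USE_MODS_SHIFT (USE_APP_SHIFT + USE_APP_BITS)   /* 24 */

#define USE_MASK(bits) ((1u << (bits)) - 1u)

static_assert(USE_MODS_SHIFT + USE_MODS_BITS == 32u, "fields must fill exactly 32 bits");

/* Pack modifiers, a mapped application code, a trimmed page and a trimmed ID. */
static inline uint32_t use_pack(uint8_t mods, uint8_t app, uint8_t page, uint16_t id) {
    assert(app <= USE_MASK(USE_APP_BITS));
    assert(page <= USE_MASK(USE_PAGE_BITS));
    assert(id <= USE_MASK(USE_ID_BITS));
    return ((uint32_t)mods << USE_MODS_SHIFT) | ((uint32_t)app << USE_APP_SHIFT) |
           ((uint32_t)page << USE_PAGE_SHIFT) | ((uint32_t)id << USE_ID_SHIFT);
}
```

For example, `use_pack(0x02, 1, 0x07, 0x1E)` (Left Shift, a hypothetical application code 1, Keyboard/Keypad page, usage 0x1E) yields `0x0210E01E`.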

Notes:

  • It's a living document - a work in progress. It will change over time as discussions progress. The latest version always represents my best attempts (so far).
  • USE represents the bit-packed codes previously described in How?
    • Why USE instead of KEY? Because for generic behaviors we're not just talking about keys anymore.
  • APPLICATION probably only needs 3-bits, but has 4-bits reserved at time of writing for flexibility. We could also leverage these bits if we wanted to incorporate non-HID codes too.
  • ITEM USAGE PAGE has 7-bits reserved at time of writing.
    • However, we'll probably only ever need 4 - 9 pages (?), so it might be better just to remap these to 3-bit or 4-bit numeric codes (and perhaps give the leftover 3-bits back to ITEM USAGE ID). I'm still studying this.
    • There is also the question of how to facilitate vendor defined pages which advanced users might want to utilize.
      • These often use 16-bit page codes.
      • I guess we could have an extra override property added to each behavior: usage_page.
  • The behaviors listed on the sheet are more granular and varied than we currently use in ZMK because it's an ongoing exploration & analysis document for the future. Please keep in mind that this aspect of the document is out of context for this issue. The focus of this issue is generic behaviors. For those curious however ...
    • I feel that the granular behaviors p (press), r (release), pr (press-release), rp (release-press), are a more flexible approach.
    • Behaviors which have multiple names listed (i.e. {press, on, down}) are simply to illustrate the meaning of that behavior in the document. Ideally we'd settle on one term per behavior.
    • 'press' in the context of the document has a different meaning than is currently used in ZMK. It refers only to the downward stroke which turns the switch on.
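
One possible reading of the granular behaviors, sketched as a lookup from the physical switch transition to the action applied to the HID item. The enum names are hypothetical, and the assumption that p and r fire on the switch's down stroke (and do nothing on the up stroke) is mine, not something settled in the sheet.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the granular behaviors listed above. */
enum granular_behavior { BHV_P, BHV_R, BHV_PR, BHV_RP };

/* What happens to the HID item (key, button, ...) as a result. */
enum item_action { ITEM_NONE, ITEM_ON, ITEM_OFF };

/* Map a behavior plus the physical switch transition to an item action. */
static inline enum item_action granular_action(enum granular_behavior b, bool switch_down) {
    switch (b) {
    case BHV_P:  /* press: turns the item on, nothing on the up stroke */
        return switch_down ? ITEM_ON : ITEM_NONE;
    case BHV_R:  /* release: turns the item off, nothing on the up stroke */
        return switch_down ? ITEM_OFF : ITEM_NONE;
    case BHV_PR: /* press-release: on while held (what &kp does today) */
        return switch_down ? ITEM_ON : ITEM_OFF;
    case BHV_RP: /* release-press: inverted, off while held */
        return switch_down ? ITEM_OFF : ITEM_ON;
    }
    return ITEM_NONE;
}
```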

Alternatives

  • Using the full 8 bytes per key binding, and wrapping them into macros to make it look prettier. This isn't really an option.
  • Magic numbers or mass re-mapping of all codes
    • That's something we've been trying to avoid because it leads to architectural issues, bloated code and synchronization issues further down the line (see ZMK Studio).
  • Non-generic behaviors (using various abbreviations - variations on a theme - to encapsulate parameters).
  • Developing a keyboard-orientated (or hci-orientated) binary-serialization-friendly DSL or serialization format, this is one of my ongoing longer-term pet projects. Ping me if you want to contribute or collaborate on it.
  • raw C structs/code ... not really an option for users who prefer form over function.

The $1,000,000 Question: Non-HID Generic Behaviors too?

  • How can we also generalize (some of?) the behaviors beyond HID for anything?
    • i.e. ht(&bt BT_CLR, A)
    • So for some behaviors, relevant parameters would be: { USE or BEHAVIOR }?
    • This would be valuable for:
      • Hold Tap
      • Tap Dancing
      • etc.
    • These types of behaviors would effectively become trigger behaviors, with a default action of tapping (?) a HID item.
    • Can we unionize/reconcile those two forms of parameter?
    • Could we also leverage the zero code of APPLICATION or HID_PAGE (whichever has the least bits) as a NOT_HID flag?
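
A minimal sketch of the NOT_HID idea, assuming a 4-bit APPLICATION field at bits 20 to 23 of a 32-bit parameter and a numeric behavior index in the low 20 bits. The bit positions, the index scheme and all names are hypothetical; nothing like this exists in ZMK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the APPLICATION field; zero is reserved as the NOT_HID flag. */
#define PARAM_APP_SHIFT   20u
#define PARAM_APP_MASK    0xFu
#define PARAM_APP_NOT_HID 0u
#define PARAM_INDEX_MASK  ((1u << PARAM_APP_SHIFT) - 1u)

/* A parameter is a HID 'USE' whenever its application code is non-zero. */
static inline bool param_is_hid(uint32_t param) {
    return ((param >> PARAM_APP_SHIFT) & PARAM_APP_MASK) != PARAM_APP_NOT_HID;
}

/* Store a (hypothetical) behavior index; the application field stays zero. */
static inline uint32_t param_from_behavior(uint32_t behavior_index) {
    assert(behavior_index <= PARAM_INDEX_MASK);
    return behavior_index;
}

/* Recover the behavior index from a non-HID parameter. */
static inline uint32_t param_behavior_index(uint32_t param) {
    assert(!param_is_hid(param));
    return param & PARAM_INDEX_MASK;
}
```

A hold-tap could then branch on `param_is_hid()` to decide whether to tap a HID item or invoke another behavior.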

I welcome all comments, suggestions and discussion.

@innovaker innovaker added enhancement New feature or request rfc keymaps PRs and issues related to keymaps core Core functionality/behavior of ZMK behaviors labels Sep 26, 2020
@innovaker
Contributor Author

I haven't touched the document since posting this proposal as I wanted to gauge interest before putting more time into it. If it gathers support then I'll spend more time nailing down the finer detail w.r.t. the bit-encoding.

@petejohanson
Contributor

This is needed definitely for the modifiers work. I'm still not sure we need to go to the level of encoding the application usage into these, so we can just use "one behaviour" for sending all HID data.

That feels too abstracted, for our needs. Can we not infer the application from the usage page for the given keycode? Have some convention at least for that, that could be stepped out of for crazy use cases we don't need for normal usage?

@innovaker
Contributor Author

innovaker commented Oct 16, 2020

Thanks @petejohanson.

I don't believe so. For simple keys that's possible, but for buttons or axes the application is needed for context (such as mouse, or joystick for instance). That context would otherwise have to be provided by a behavior - much like we currently do for item usage pages - which leads to:

  • duplicate code/logic - i.e. code pathways and memory
    and either:
    • the behavior essentially represents the application encoding, which will add to our growing pains as we move beyond DT because each behavior is going to have to be encoded and serialized somehow, preferably in the smallest package possible (memory)
    • extra metadata for application

Also, I'd like to have more than the current push-release behavior (which is effectively what &kp is). I can think of use cases for push (down only), release (up only), toggle, tap, hold, release-push (inverted). Taking mouse button behaviors into account, you'd end up with many more behavior tags if application wasn't encoded.

If someone wants to add Joystick or Gamepad controls further down the line too - either buttons or axes - it's yet another multiplier. The same goes for Wireless Radio Controls perhaps? I'm just spitballing here merely to show how it can snowball. More on that can be found here: https://usb.org/sites/default/files/hut1_2.pdf#page=31 - those are only the generic desktop application usages. There's others on other pages (less relevant).

Finally, without incorporating the application, it splits HID across behaviors with parallel conventions. With application, a mouse button simply becomes another HID (keys) code - i.e. &p M_1 or &p M1. That's less cognitive load for users and easier to grasp and document as well. Users probably don't want to care if it's keyboard, consumer or mouse - that's implementation detail for them. It's also more concise as the alternative would have to be .... &mp M_1 or &mp M1.

In many ways, HID has already solved this problem for us, we just need to squeeze/encode it into a smaller package whilst minimizing the mapping we do.

Any thoughts?

@innovaker
Contributor Author

Playing Devil's Advocate, I guess the other question is ... why wouldn't we want to approach it this way? What are the alternatives? What are their benefits and costs?

@petejohanson
Contributor

I think the concern I have is trying to automatically link a generic behavior to any number of HID applications, and not write ourselves into a corner. Are they entirely separate applications? Different logical groups? How does this work for custom behaviors looking to do custom HID stuff?

It seems like a lot to try to "Get Right" the first time.

@innovaker
Contributor Author

I think the concern I have is trying to automatically link a generic behavior to any number of HID applications, and not write ourselves into a corner. Are they entirely separate applications? Different logical groups?

That's an excellent point and it touches on my opening paragraph to the proposal. You're right, we do need to be confident that this is how HID works before we go down this route. Everything I've read and encountered so far suggests this is the case.
But I could be wrong! I need someone else to confirm it really.

My background with HID covers:

I've yet to encounter duplicate applications or duplicate usages within a single report. My understanding is that's partly why the concept of multiple reports exist. So I'm fairly confident that our likely use cases are covered.

But the devil is in the details. That detail is in this document: https://usb.org/sites/default/files/hid1_11.pdf. And I suspect these bits are relevant:

  • Page 17: A Report descriptor can have multiple Usage tags. There is a one-to-one correspondence between usages and controls, one usage control defined in the descriptor.
  • Page 33: Application: A group of Main items that might be familiar to applications. It could also be used to identify item groups serving different purposes in a single device. Common examples are a keyboard or mouse. A keyboard with an integrated pointing device could be defined as two different application collections. Data reports are usually (but not necessarily) associated with application collections (at least one report ID per application).
  • Page 34: Collection items may be nested, and they are always optional, except for the top-level application collection.
  • Page 42: Delimiters cannot be used when defining usages that apply to Application Collections or Array items.
  • Page 57: Each top level collection must be an application collection and reports may not span more than one top level collection.

It also depends on the finer details of each type of Collection. But I don't think they affect this discussion? Can anyone verify that?

This paragraph from the Windows documentation also raises my eyebrow:
An unnested collection is always a top-level collection, regardless of its HID type. In particular, a top-level collection does not have to be an Application collection, as defined by the USB HID Standard.
I don't know what usage you'd use as a top-level collection if it isn't an application collection - shrug. Presumably a code which is mappable to a Windows PDO? Perhaps to support quirky report implementations?

So, I think so, based on how I've interpreted it, but my understanding has never been validated by a domain expert. The only way I can see of us being sure, is for someone else to read the specification and confirm my instincts. Or for an experienced HID expert to validate it. But given that HID is the primary target for ZMK, it has to be worth it right?

How does this work for custom behaviors looking to do custom HID stuff?

As long as we plan ahead, I think we'll be alright. You can't magically fit 8 bytes into 1.5 bytes (allowing for modifiers) without cutting some corners, but the corners we'd cut are dead space, as well as the usages that are unfeasible for a keyboard or even a multi-application HCI device. That's the finessing I was talking about. Moreover, as it's all internal implementation detail, we are afforded a fallback or backup position if we find we screwed up. It's probably easier to attempt generic behaviors and then fall back onto application-specific behaviors, rather than the reverse. We've also the option of a bespoke encoding.

@innovaker
Contributor Author

For completeness and observers, the other alternative that hasn't been discussed in this thread yet is the one I listed as:

Magic numbers or mass re-mapping of all codes

I believe this is the approach taken by Linux, QMK and others. In essence the key codes are one long set of codes - effectively an enumeration. Most of the meaning (behavior, application, item) is implied by a single number and bit flags. We've tended to call them magic numbers.

The disadvantages of magic numbers include:

  • extra logic code to do the mappings (throughout the system)
  • keeping them in sync between different systems
  • facilitating custom codes for users
  • a lack of "namespaces" for the codes
  • it gets more complicated over time as new stuff gets tacked on and stuff is refactored

But they do have their merits too, especially for the lower-end chips!

It was an early decision for ZMK not to go down that route in favour of behaviors because at the time it felt like a better choice. In the context of ZMK, I guess the equivalent is currently two-fold:

  • behavior (+ metadata)
  • parameters

Effectively the meaning is wrapped up into the data associated with those two parts. Behaviors can also act as namespace divider to some extent (i.e. a parameter 0x01 has a different meaning for each behavior), although namespaces can also be shared across behaviors.

When we eventually start looking beyond the Device Tree (DT) for keymap/binding configuration, there's always the possibility this could change. Any approach that involves another system such as ZMK Studio, will have to encode and serialize the configuration. This includes behavior identities, regardless of whether it's strings or numbers. If it's numeric, we'll probably have more spare bits to play with (as the number of behaviors will always be relatively small) which is food for thought. That's a conversation for another day, but it will probably open doors on further "internal" encoding optimization in the future. We just need to be sure we don't close any doors too early with the decisions we make now.

@innovaker
Contributor Author

innovaker commented Oct 17, 2020

Had a thought when I woke up today.

We can both:

  • provide maximum flexibility for bespoke/future pages/applications
  • minimize the impact of any potential unforeseen fallout

by localising the use of the encoded keycode to the keymap interface only, which is its primary purpose anyhow.

In practice that means we would:

  • encode/pack in the key definitions
  • decode/unpack in behavior_key_press into the application page+id, item page+id, and modifiers
    • we can either use extended usages (page << 16 | id) or just integers
    • at this stage, it's transitive data so it won't impact memory much
  • refactor keycode_state_changed to carry these values
  • replace any references to keycode elsewhere in the system (such as hid_listener) with their usage equivalents, based on the work I did for feat(HID): Preprocessor definitions for HID Usage Tables 1.21 #217.
  • optional: also keep the encoded keycode in keycode_state_changed for debugging purposes

This would allow us to facilitate override behaviors or behavior metadata for any usage pages that don't make it into the encoding if the need arises (which if we do our homework for the finessing, will probably never even be an issue).

Effectively the keycodes only become keymap notation shorthand. The rest of the internal state can be more explicit.
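
The decode step could look roughly like the sketch below. It assumes a layout of 8 modifier bits, a 4-bit application code, a 7-bit page and a 13-bit ID (high to low), plus a small table mapping application codes back to extended usages. The layout, the table, and the struct are all hypothetical; this is not the current `keycode_state_changed` definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mapped application codes (0 is left free as a NOT_HID marker). */
enum app_code { APP_NOT_HID = 0, APP_KEYBOARD, APP_CONSUMER, APP_MOUSE, APP_COUNT };

/* Extended usages (page << 16 | id) for each mapped application. */
static const uint32_t app_usages[APP_COUNT] = {
    [APP_NOT_HID]  = 0x00000000u,
    [APP_KEYBOARD] = 0x00010006u, /* Generic Desktop / Keyboard */
    [APP_CONSUMER] = 0x000C0001u, /* Consumer / Consumer Control */
    [APP_MOUSE]    = 0x00010002u, /* Generic Desktop / Mouse */
};

/* Explicit, unpacked form that an event could carry through the system. */
struct usage_event {
    uint32_t application_usage;
    uint32_t item_usage;
    uint8_t implicit_mods;
};

/* Unpack the keymap shorthand into explicit usages; transient data only. */
static inline struct usage_event use_unpack(uint32_t use) {
    uint32_t app = (use >> 20) & 0xFu;
    uint32_t page = (use >> 13) & 0x7Fu;
    uint32_t id = use & 0x1FFFu;
    assert(app < APP_COUNT);
    struct usage_event ev = {
        .application_usage = app_usages[app],
        .item_usage = (page << 16) | id,
        .implicit_mods = (uint8_t)(use >> 24),
    };
    return ev;
}
```

Only the keymap-facing code would need to know the packed layout; listeners downstream would see plain extended usages.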

A cursory look at the current system suggests we'd also need to do some behavior refactors to avoid repeating ourselves, but that shouldn't be an issue.

@petejohanson
Copy link
Contributor

Looking at this, I really think I would favor an incremental approach to this, and if we can keep the "external contract" for the keymaps themselves stable, we'll have a win here. In particular, I was reading https://usb.org/sites/default/files/hut1_21.pdf the section "3.1 HID Usage Table Conventions", which states

Usages are 32-bit identifiers, where the high order 16 bits represents the Usage page and the low order 16 bits represents
the Usage ID. To allow more compact Report descriptors, Usage Page items can be declared to specify the high order
bits of the Usage item and the Usage items can declare only the ID portion of the Usage, as follows:

So, we would be very much in line w/ HID itself to use the single 32-bit parameter to various behaviors, e.g. &kp FOO to encode both the page and usage ID for a given location, avoiding the awkward cp versus kp craziness, and making hold-taps work properly for both, etc.

Reviewing the possible usage page values, I see no need for any pages that use the top 8 bits, e.g. we only need the 0x01 to 0x0F range, leaving those top bits as ripe space for storing the extra modifier information we need for shifted keycodes.

We can imagine then something akin to:

#define BANG Z_MOD_KEYCODE(MOD_LSFT, HID_USAGE_PAGE_KEYPAD, HID_USAGE_NUM_1)

To encode the mods, usage page, and usage ID all in one 32 bit value.
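
A minimal sketch of what such a macro might expand to, assuming the modifiers sit in bits 24 to 31, the usage page in bits 16 to 23 and the usage ID in bits 0 to 15. The macro bodies, the accessor macros and the constant values below are illustrative assumptions (0x07 is the Keyboard/Keypad page, 0x1E is the "1 and !" usage, 0x02 is the Left Shift modifier bit), not an existing ZMK header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for the names used in the example define. */
#define MOD_LSFT              0x02u /* Left Shift bit of the HID modifier byte */
#define HID_USAGE_PAGE_KEYPAD 0x07u /* Keyboard/Keypad usage page */
#define HID_USAGE_NUM_1       0x1Eu /* Keyboard "1 and !" */

/* mods: bits 24-31, usage page: bits 16-23, usage ID: bits 0-15. */
#define Z_MOD_KEYCODE(mods, page, id)                                    \
    ((((uint32_t)(mods)) << 24) | ((((uint32_t)(page)) & 0xFFu) << 16) | \
     (((uint32_t)(id)) & 0xFFFFu))

/* Hypothetical accessors for unpacking on the behavior side. */
#define Z_KEYCODE_MODS(kc) ((uint8_t)((kc) >> 24))
#define Z_KEYCODE_PAGE(kc) ((uint8_t)(((kc) >> 16) & 0xFFu))
#define Z_KEYCODE_ID(kc)   ((uint16_t)((kc) & 0xFFFFu))

#define BANG Z_MOD_KEYCODE(MOD_LSFT, HID_USAGE_PAGE_KEYPAD, HID_USAGE_NUM_1)

static_assert(BANG == 0x0207001Eu, "mods, page and ID share one 32-bit value");
```

With the modifier byte masked off, the low 24 bits are still a valid (page, ID) pair, which is what lets the usage page select the report to update.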

I do understand this doesn't encode the second coordinate at this point. I believe we at this point don't have a need for multiple reports that send keypad or consumer values, and it's reasonable, since this is a keyboard firmware, to unpack the encoded format, and use the usage page to determine what report field to update w/ new state, etc.

Should we later decide for some more compact encoding internally that is even more sneaky w/ our "wasted" bits to encode the other coordinate, the keymap consumer will still just say "I use the BANG define", and doesn't need to be any the wiser that behind the scenes that is encoded any differently.

Thoughts? Concerns?

@innovaker
Contributor Author

Thanks for looking at this @petejohanson.

Your suggested approach is the same as my initial plan for this proposal. The application coordinate however, becomes important once we start using any controls from the generic pages - such as buttons or axes. BUTTON 1 from button page or Axis X from the generic page are not indicative of their reports - because they're generic by design. Sure, in a strictly constrained system you might get away with declaring that BUTTON 1 should always go to the Mouse report and using extra layers of conditional logic to do so, but that breaks down as soon as someone wants to use BUTTON 1 for a different purpose. I believe that's one of the purposes of application - to provide the necessary context for generic HID controls.
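
To illustrate the ambiguity, here is a small sketch contrasting report selection by item usage page with selection by application usage. The report enum and function names are hypothetical; the page and usage values are the standard HID ones.

```c
#include <stdint.h>

/* Hypothetical report identifiers for a multi-application device. */
enum report_kind { REPORT_NONE, REPORT_KEYBOARD, REPORT_CONSUMER, REPORT_MOUSE, REPORT_GAMEPAD };

#define PAGE_KEYBOARD 0x07u
#define PAGE_BUTTON   0x09u
#define PAGE_CONSUMER 0x0Cu

#define APP_USAGE_MOUSE   0x00010002u /* Generic Desktop / Mouse */
#define APP_USAGE_GAMEPAD 0x00010005u /* Generic Desktop / Gamepad */

/* Inferring the report from the item's page works for keys and consumer controls... */
static inline enum report_kind report_from_item_page(uint16_t page) {
    switch (page) {
    case PAGE_KEYBOARD: return REPORT_KEYBOARD;
    case PAGE_CONSUMER: return REPORT_CONSUMER;
    default:            return REPORT_NONE; /* ...but Button 1 alone names no report */
    }
}

/* With the application coordinate, the same Button 1 has an unambiguous home. */
static inline enum report_kind report_from_application(uint32_t application_usage) {
    switch (application_usage) {
    case APP_USAGE_MOUSE:   return REPORT_MOUSE;
    case APP_USAGE_GAMEPAD: return REPORT_GAMEPAD;
    default:                return REPORT_NONE;
    }
}
```

Hard-coding `PAGE_BUTTON` to the mouse report would work in a constrained system, but it is exactly the convention that breaks once a second application wants buttons.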

I appreciate that you prefer incremental changes. So sure, for the interim:

  • change &cp into a simple alias of &kp, and hopefully &pr too (press-release - please see previous post or the sheet).
  • put the modifier flags into bits 24 through 31 of param1.
    (we don't really have any choice for where the bits can live within the current DT map design).
  • change the key codes to extended usages based on feat(HID): Preprocessor definitions for HID Usage Tables 1.21 #217. I'm probably best placed to do that.
  • ensure any user interface syntax for applying modifiers to arbitrary codes is flexible enough to seamlessly accommodate a change in the bit structure. The macros I suggested would be generic enough for that.
  • develop a strategy for handling the use cases where explicit modifiers (e.g. user holding down shift + ctrl) are simultaneously mixed with implicit modifiers (e.g. BANG).
  • implement implicitly shifted key codes (symbols). Again, I'm best placed to do that as I have the prerequisites prepared.

Let's continue discussing the encoding however - specifically application - because it's one of the main purposes of this proposal. It's necessary for these reasons:

  • As discussed the other day, I'm designing a mouse report as a prerequisite to MouseKeys etc. This needs application.
  • I'm exploring a System Control report (which fell out of the key standardization). This needs application.

Are there any other blockers besides verifying the purpose of application within HID?

Aside:

  • The 32-bit identifiers are extended usages as touched on in my previous post. They're mentioned on page 17 of the specification as well as Microsoft's documentation. Can we adopt this terminology?
  • As you spotted, unmapped usage pages are fine up to 7-bits - possibly lower (see first post). We should also be mindful to keep the door ajar for vendor-defined usage pages which are 16-bits. That said, that probably won't be an issue until DT is superseded anyway so it's more a concern for its successor.

@petejohanson
Contributor

The main other current blocker would be the lack of room in the current 32-bit param from the DT to allow encoding all of that nicely. And keeping this in one 32-bit param is really important to leveraging the existing behavior work from @okke-formsma and others to support hold-taps w/ modifier keycodes, e.g. ! in a hold-tap, or "Auto Shift" as noted in Discord.

So, I'm really happy to continue the discussion, especially for "post DT" targets, but in the interim, I think we're on the same page on an encoding strategy that we can do ASAP, building on your already awesome HID work.

I think we should work on getting your generated HID stuff in, then work on this as a next step for that, to unblock the modifier work @okke-formsma already has spearheaded.

Any concerns?

@innovaker
Contributor Author

That's what this proposal is attempting to solve though? I designed it with @okke-formsma's work in mind in conjunction with the upcoming needs (Mouse etc.).

The modifier work has never been blocked by this issue. It complements it.

@petejohanson
Contributor

I'm not saying this closes this issue, just talking about how I am proposing we implement things today to get the modifier and usage page stuff addressed ASAP, in a way that doesn't make this issue harder to work on as a follow up.

@innovaker
Contributor Author

Sure, the checklist in my post above is the step-by-step for the immediate concerns.

@okke-formsma
Collaborator

I like the approach you guys figured out, as it checks all the boxes we currently need and gives enough flexibility for the future. Let's update the keycode defines and get the modifiers up-to-date so we have some short term profit from all this work :)

@innovaker
Contributor Author

Each of the steps I described above is effectively its own issue / PR, the whole lot being a small epic or game plan.

@okke-formsma's probably best placed to do the modifier bits. I'm best placed to do the key code bits as a follow-up to #21. So it'll need a degree of coordination.

I suggest we create a tracking/checklist issue based on my original checklist so that we can assign/track each part, and continue using this current issue for ongoing discussions about the other aspects of the proposal.

@and-elf

and-elf commented May 25, 2024

I'm entirely new to the codebase, and have only skimmed through the current implementation, so I bet I'm way off..
The behaviors are already referenced/defined in the linker script, right?
Then, why not just set up a large-ish static section for it?
To update behaviors, the usb/ble hid could enumerate a device for it, and we could have a small tool (basically an llvm-based compiler) to generate the packed struct data as a blob.

Conceivably, the tool could even be javascript-based and run client-side, and using html5, the blob could be written to the enumerated device. Perhaps work together with https://github.com/nickcoutsos/keymap-editor?

I think pretty minor changes would need to be done in the code base, but the tool may be a bit complicated, especially in javascript..
