[WIP] mesa: add Directed Messaging protocol #6932

Draft · wants to merge 327 commits into base: next/kelvin/410
Conversation

@yosoyubik (Contributor) commented Mar 11, 2024

TODO:

  • vere [ ]

    • %memo cache growing out of control and causing crashes
    • %memo cache checks taking longer than expected after crash and reboot
    • look into dropped packets (authentication)
      • happens sometimes, gets fixed on re-send
    • make forwarding real
    • real jumbo packets
    • new lss streaming algorithm jets
    • prime scry cache on first %poke fragment send
      • XX
    • unified fake driver (ames/mesa.c)
      • make the unified driver real/ polish
    • symmetric routing
  • arvo [ ]

    • initial contact flow when lane is missing
      • MVP: send to the peer's galaxy lane—route everything through it
      • make it flexible to send to the sponsorship chain
      • replicate alien key retrieval from ++send-blob
    • lane management/routing
      • fake, using %dear
      • Fix: the port in %ames is different from the "real" one on a ship not behind a NAT. %ames works fine, but after migrating the peer and using |mesa we need to manually put the lane in, or we can't communicate
        • test
      • keeping multiple lanes and sending to all of them also breaks something (-hi stops working), but keeping just the last one we've heard makes it work again.
      • inject direct/indirect lane for pokes
        • inject hopcount into %arvo
          • useful for trace logging?
          • XX [ ] replace it with just direct/indirect
      • for %page, if ?=(~ .next) lane is direct, otherwise add all .next to route.peer
      • lane eviction; keep the last 5 heard lanes
        • some other eviction policy?
    • fix migrating .keens into +peeks
      • we need to have updated chums.ames-state before we start migrating peeks
    • add %mesa support in %clay—currently done in the yu/rate branch
    • look into pe-keen for a per-peer migration check
    • look into +peeking for a file whose mark we lack, causing a block in the queue—the request crashes the responder in ev-make-pact, in [=wire resp=$>(?(%boon %noon) gift)]—check what other gifts %clay gives to %ames
    • look into crash in %lull for lane when hearing a =task ((harden task) wrapped-task)
    • test new lss streaming algorithm
    • scry unification
      • XX how to make test work; we have to move the scry handler into the event-core to avoid calling +rof and do +ev-peek instead
    • weird +scry cancel
      • seen on a live moon: a scry to the parent planet of a moon shows up in the path as "~nec"
    • QoS
      • handled properly
    • encrypted payload path in poke:pact
    • remove crypto core
    • XX remove typed paths
      • or add missing jets
    • trace logs
    • ~debug dashboard support
    • test comets
  • Open questions:

    • Scrying on ++load
    • Larval stage—currently the unix duct in ames-state is the only data dependency and the latest version has the duct hardcoded
    • Boot: directly into the latest protocol version, how to do protocol negotiation?
      • Issue reads? currently not possible, see %alien +peeks
      • Try all protocols?
      • currently: boot into %ames, migrate peers manually.
      • ...

DONE:

  • vere

    • scry a list of fragments using the `/hunk/lop=@/len=@` namespace
    • inject response page into %arvo
    • inject completed, serialized, and encrypted %page using the %mess-ser task
      • (it currently uses %mess %page) see task below
      • XX better to not proliferate a task for each possible entry point (one fragment, multi-fragment, encrypted, public, etc.)
    • use one entry point for both packet/message %page responses
    • fix segfault caused by incorrect uv_timer_init
  • arvo

    • store private key in state

    • |hi ~zod ($poke via %chum)

    • -meen ~zod /c/x/1/sys/kelvin ($peek via %publ)

    • -meen ~zod /c/x/1/sys/kelvin [~ key=1 sec=0x0] ($peek via %shut)

    • hunk namespace for batches of fragments

    • fix wrong "fragment" inject for "fake" jumbo packets

    • meet %alien(s)

    • comet attestation

      • Instead of sending a special packet (i.e. open-packet in old %ames) we just
        read the attestation proof from the comet's namespace. There needs to be
        a special case for %alien comets that we pretend to know, so we can add the
        path into the pit to track it—this is a bit of a hack and might be fragile because
        of the way we need to check that we have bunted the state of the chum comet
    • refactor flow paths (remove +dire=?(%for %bak), use plea/ack-plea boon/ack-boon)

    • rethink %mesa-ser task, %mesa-response gift

    • test +peek cancel (%yawn/%wham)

    • fix %alien migration (fix: re-retrieving public keys on %prod if %alien or missing)

      • seen on a live moon: ~marzod keys were never received, so we have to manually ask %jael for them again to switch from %alien to %known, otherwise we can't contact the parent planet.
        • seems fixed—moves were not emitted, although the state not being updated from %alien to %known is weird...
          • this was either a stale solid pill, or something previously fixed. now there's only one wire when requesting the keys from %jael, and I haven't seen this again with the latest mesa-solid-live.pill.
    • fix state from .peers not being deleted after migration

    • fix missing /pump timers (at least on first attempt; %prod puts them back)

    • refactor per=[=ship sat=ship-state] in mesa core

      • tasks that include a ship are called in |mesa with the peer already in the door's sample
    • %alien +peeks

      • if we don't have the keys for ships that read into our namespace:
        • how to retrieve their keys so we can respond?
          • scry from %jael; revisit in the future
          • Every +peek to an alien ship gets preceded by an advisory +poke (e.g. please, get my keys)
          • Make %vere smart so it can detect that aliens are trying to read into our namespace, gather who they are, and send up a list of ships whose keys we need to retrieve.
          • ...
    • For a 6-fragment message, I'm seeing a failure decrypting the path (%heer crash); then, after prodding and retrying, the message comes in, |hi is successful, the poke is acknowledged, entries from the .pit (for both) are removed... and then bail: oops on the receiver of the |hi

    • subscriptions

      • %cork
      • %watch
      • %leave
      • %boon
        • XX repeated, rapid %facts from a %gall agent block
          • seems fixed? check again
      • fix truncated %corked flow response
      • clog machinery
        • remove mem from the state parameters?
      • %naxplanation flow
        • app crash (e.g. +on-leave)
        • vane (%g/%m) crash
        • reuse %near task?
        • check closing bones
    • remote scry

      • %keen (both encrypted and public)
      • %yawn
      • %wham
    • Full end-to-end test

      • Groups
        • install groups
        • search for host
          • search again
            • XX seeing subscribe wire not unique (for /gangs/index/~fen)
        • join a group
        • receive backlog
        • send a message (as subscriber)
        • send a message (as host)
        • update subscriber
        • leave group
        • kick subscriber
        • rejoin
          • seems to be working, but only sometimes? (more tests needed)
          • leaving and kicking several times and rejoining seems to be working, so the things listed below, as far as I can see, are %eyre/%gall subscription issues
            • ~~can't find ~host~~ host shows up eventually
            • seeing spider crash
            • seeing subscribe wire not unique (for /gangs/index/~fen)
            • general slowness searching for a group host
            • ...
        • load latest version of Groups
          • rejoining a group after leaving/kicking is slow
      • send DM
      • Suspend/Nuke/Revive agents
      • Install other Apps
        • %file-share
      • ...
    • system tasks

      • %tame
      • %dear
      • %kroc (probably not)
      • %heed
      • %jilt
      • %stun
      • %prod
      • %sift
      • %snub
      • %spew
      • %cong
        • remove mem from $axle?
      • %stir (what to put in here?)
      • %trim
    • system gifts

      • %turf
      • %nail
      • %stub
      • %near (only in %ames, captured by the used wire in +take)
      • %saxo
      • ...
    • Migration (see the sketch after this list)

      • At first, %ames is updated with:
        • a refactor to accommodate handling both protocols' logic: %ames and (wip) %mesa
        • a new type of %plea targeting the %$ vane (currently used for pre-cork ships)
          • plea/[vane=%$ path=/mesa payload=%ahoy], on a wire controlled by %ames.
        • a new intermediate core for per-peer checks on protocol switching
      • when on-load is called with the new version of the vane, %ames (the old protocol-handling core) will be used by default—nobody will speak the new protocol automatically
      • a generator will allow per-peer attempts to switch to the new protocol by using old-ames to send an %ahoy %plea.
        • if the receiver has not updated, it will crash, and the sender will keep trying.
        • if the receiver has updated, we will flag this peer as "speaking the new protocol".
      • any calls into %ames to send a request will use a per-peer check to confirm if we know that the receiver speaks the new protocol
        • if not, the %ames core will be used
          • XX send also the %ahoy %plea to be notified as soon as they switch?
        • if yes, the %mesa core will be used and all flows will be migrated to the new protocol
      • Migration flow:
        • (pre: only "quiet" flows (i.e. not live) will be migrated; a per-peer timer will check if there are any flows that were live but have achieved quiescence and can be migrated)
        • (XX not true anymore, we migrate all flows, including partial/live naxplanation flows, since we know how to distinguish those based on current != next and the lack of state for that message in the packet-pump)
        • A .ship sends an %ahoy plea to .peer.
        • .peer receives the %plea; if it doesn't speak the new protocol it will crash (the plea will be re-sent), otherwise it will migrate any non-live flows and ack the plea.
        • .ship receives the %ack for the %ahoy plea and then migrates any non-live flows.
        • if there are no flows that are still live, .ship and .peer will remove each other from peers.ames-state and everything will now live at chums.ames-state
        • live flows will be kept in peers.ames-state, and on a per-peer timer we will check if those flows have achieved quiescence.
        • If, after migrating flows, the ack for the %ahoy plea gets lost, the sender of the %ahoy plea could send something that, from their point of view, is "old" (i.e. still in peers.ames-state), but which the receiver will have in chums.ames-state. In order to avoid having two bones in two places, we always need to check if something we hear on the old protocol has been migrated, and in that case drop it, since eventually we will receive the %ack for the %ahoy plea and then migrate the flow to the new protocol.
          • XX now we have introduced a problem where we have a live flow that won't be migrated but that will never be acked.
  • merge last jb/dire commit

  • ...

  • merge jb/dire branch to keep git history
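
The migration handshake above, summarized as a minimal Python sketch (the real implementation is Hoon inside %ames; Ship, Peer, and the _send_* helpers are hypothetical stand-ins for peers.ames-state, chums.ames-state, and the per-peer protocol check):

```
from dataclasses import dataclass, field

@dataclass
class Peer:
    migrated: bool = False                    # confirmed to speak the new protocol
    live_flows: set = field(default_factory=set)
    quiet_flows: set = field(default_factory=set)

class Ship:
    def __init__(self):
        self.peers = {}                       # old-protocol state (cf. peers.ames-state)
        self.chums = {}                       # new-protocol state (cf. chums.ames-state)

    def _send_ames(self, who, msg):           # stub: send via the old core
        print("ames ->", who, msg)

    def _send_mesa(self, who, msg):           # stub: send via the new core
        print("mesa ->", who, msg)

    def send(self, who, msg):
        # every outgoing request does a per-peer check of which core to use
        peer = self.peers.get(who)
        if peer is None or peer.migrated:
            self._send_mesa(who, msg)
        else:
            self._send_ames(who, msg)

    def poke_ahoy(self, who):
        # generator-driven attempt to switch this peer; if the peer hasn't
        # updated, the plea crashes there and keeps being re-sent
        self.peers.setdefault(who, Peer())
        self._send_ames(who, ("plea", "%$", "/mesa", "%ahoy"))

    def hear_ahoy_ack(self, who):
        # on the %ack for the %ahoy plea, migrate every quiescent flow
        peer = self.peers[who]
        peer.migrated = True
        self.chums.setdefault(who, set()).update(peer.quiet_flows)
        peer.quiet_flows.clear()
        if not peer.live_flows:               # nothing live left: drop old-protocol state
            del self.peers[who]

    def hear_old(self, who, flow):
        # anything heard on the old protocol for an already-migrated flow is
        # dropped, so the same bone never lives in peers and chums at once
        return flow not in self.chums.get(who, set())
```

Live flows stay in peers until a per-peer timer finds them quiescent, at which point the same migration step runs again.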

@yosoyubik changed the title from "mesa: WIP Directed Messaging vane" to "[WIP] mesa: add Directed Messaging vane" on Mar 11, 2024
@yosoyubik marked this pull request as draft on March 11, 2024 19:07
@yosoyubik (Contributor, Author):

We ran into an issue decrypting a specific path with the +ctrc:aes algorithm (the decrypted path doesn't match the one we encrypted). I haven't found any other reproductions with paths of different lengths... Using chacha instead of AES works fine.

> =key `@`0w5ub.Wig2P.yo031.tZgj8.314QW.yDv56.Pe-Hp.fv2Zo.F~jQM 
> =path /m/x/1//flow/6/~fen/poke/~dev/bak/1

:: Encrypting

[%pat 0x58.d075.b0b1.3c07.b32b.23f7.c072.b5b7.b87c.0773.2b33.f7c0.76a0.f76f.6c66.f80c.e341.f1c1.dbc1]
[%tag 0x1bd1.df38.9d35.919d.e8d3.8678.9662.8003]
[%cyf 0x9e75.a089.0786.1021.bf82.b86d.ecc4.5d6e.744f.f313.5cf7.01bc.880c.5a6f.96a3.3e27.96d8.26af]

> `@ux`(seal-path:crypt:mesa key path)
0x1.3ceb.4112.0f0c.2043.7f05.70db.d988.badc.e89f.e626.b9ee.0379.1018.b4df.2d46.7c4f.2db0.4d5e.4100.37a3.be71.3a6b.233b.d1a7.0cf1.2cc5.0007.ec01


:: Decrypting

[%sealed 0x1.3ceb.4112.0f0c.2043.7f05.70db.d988.badc.e89f.e626.b9ee.0379.1018.b4df.2d46.7c4f.2db0.4d5e.4100.37a3.be71.3a6b.233b.d1a7.0cf1.2cc5.0007.ec01]
[%tag 0x1bd1.df38.9d35.919d.e8d3.8678.9662.8003]
[%cyf 0x9e75.a089.0786.1021.bf82.b86d.ecc4.5d6e.744f.f313.5cf7.01bc.880c.5a6f.96a3.3e27.96d8.26af]
[%pat 0xc63b.a099.3fbd.9182.b51e.cd15.f39d.2e8b.663c.cfcb.7c5c.36cb.9473.3959.9fcd.91fa.f0bf.3f52]
[%met-pat 36]
[%keyed 0xdbe5.aab6.9cd7.1199.1520.2d2b.4c8d.5fa4]
[%cue 0]
[%const-cmp 84.967.810.778.138.679.633.494.167.652.306.067.551]


> (open-path:crypt:mesa key enc-path)

/sys/vane/mesa/hoon:<[154 9].[164 27]>
/sys/vane/mesa/hoon:<[155 9].[164 27]>
/sys/vane/mesa/hoon:<[156 9].[164 27]>
/sys/vane/mesa/hoon:<[157 9].[164 27]>
/sys/vane/mesa/hoon:<[158 9].[164 27]>
/sys/vane/mesa/hoon:<[159 9].[164 27]>
/sys/vane/mesa/hoon:<[160 9].[164 27]>
/sys/vane/mesa/hoon:<[161 9].[164 27]>
/sys/vane/mesa/hoon:<[162 9].[164 27]>
/sys/vane/mesa/hoon:<[163 9].[164 27]>

cc: @joemfb @lukechampine

```
++  encrypt
  |=  [key=@uxI iv=@ msg=@]
  ^+  msg
  (~(en ctrc:aes:crypto key 7 (met 3 msg) iv) msg) :: TODO: chacha8
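::  ctr decryption is identical to encryption (same keystream xor)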
++  decrypt  encrypt
::
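::  +seal-path: jam the path, mac the jammed atom with keyed blake3 (tag),
::  then encrypt it under (mix key tag) with tag as iv; yields (jam [tag cyf])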
++  seal-path
  |=  [key=@uxI =path]
  ^-  @
  =/  pat  (jam path)
  ~&  pat/`@ux`pat
  =/  tag  ((keyed:blake3:blake:crypto 32^key) 16 (met 3 pat)^pat)
  ~&  tag/`@ux`tag
  =/  cyf  (encrypt (mix key tag) tag pat)
  ~&  cyf/`@ux`cyf
  (jam [tag cyf])
::
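::  +open-path: cue [tag cyf], decrypt, recompute the tag over the decrypted
::  jam and compare in constant time, then cue the path back out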
++  open-path
  |=  [key=@uxI sealed=@]
  ^-  path
  =+  ;;([tag=@ cyf=@] (cue sealed))
  ~&  tag/`@ux`tag
  ~&  cyf/`@ux`cyf
  =/  pat  (decrypt (mix key tag) tag cyf)
  ~&  pat/`@ux`pat
  ~&  met-pat/(met 3 pat)
  ~&  keyed/((keyed:blake3:blake:crypto 32^key) 16 (met 3 pat)^pat)
  ~&  cue/(cue pat)
  ?>  (const-cmp tag ((keyed:blake3:blake:crypto 32^key) 16 (met 3 pat)^pat))  :: XX crash!
  ;;(path (cue pat))
::
++  const-cmp
  |=  [a=@ b=@]
  ^-  ?
  ~&  const-cmp/(~(dif fe 7) a b)
  =(0 (~(dif fe 7) a b))  :: XX jet for constant-time
::
```

@lukechampine (Contributor):

Ok, I investigated this, and the root cause was using @ instead of byts. What happened was that, for that particular key and path, the first (last?) byte of the keystream and the plaintext happened to match exactly; thus, when they were xor'd, the result had a leading (trailing?) zero, causing met 3 to return 1 less than it should have. So when this ciphertext was decrypted, it was missing a byte.

I've rectified this in jb/dire, so you can copy from there and it should work fine. Of course, as you observed, it already worked with chacha anyway, because the bug is sensitive to exactly which bytes are produced by the keystream. So we can't know for sure that it's fixed without exhaustive testing. But I'm pretty sure. :P

As you might suspect, this same bug also affects encryption of the message bodies. For now, I've cheated by using met 3 in a few places, but really we should be using the actual payload length. AFAIK that isn't currently exposed anywhere, so either it needs to be added to the packet format, or jam'd alongside the message data and cue'd out before decryption. 🤔 @joemfb
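
For intuition, here is a tiny Python illustration of the length loss (the byte values are made up): holding the ciphertext as a bare atom (@) instead of length-tagged bytes (byts) silently drops a zero high byte produced by the keystream xor, so a later byte-count (met 3) comes up one short.

```
# hypothetical plaintext and keystream whose first bytes happen to match
plaintext = bytes.fromhex("12ab34")
keystream = bytes.fromhex("12ff00")

cipher = bytes(p ^ k for p, k in zip(plaintext, keystream))
print(cipher.hex(), len(cipher))   # 005434 3  -> still three bytes

# stored as a bare integer (an atom), the leading zero byte vanishes...
atom = int.from_bytes(cipher, "big")
stored = atom.to_bytes(max(1, (atom.bit_length() + 7) // 8), "big")
print(stored.hex(), len(stored))   # 5434 2    -> one byte short

# ...so the decryptor, measuring two bytes, can no longer recover the original
```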

Include the mug for poke data, mark for facts, separate out the on-agent
sign explicitly.
Using the timestamp of the first event, instead of the start of the time
block.

Also narrow the time block down from a day to an hour, for faster log
writes. Further experimentation needed here.