Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APP-7566]: Implement Bluetooth Agent Provisioning #77

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

maxhorowitz
Copy link
Member

Summary

This PR encompasses everything needed for adding bluetooth provisioning as a default provisioning method in the viam-agent (and is designed to run in parallel to the existing WiFi hotspot provisioning method).

Setup

I want you to test as if you are a customer who has just received a machine in the mail. The machine will just have a viam-agent binary installation. It will not have WiFi credentials (and thus no internet connectivity).

Machine setup

Using a Linux machine/laptop (that you are comfortable getting your /opt/viam/*, /etc/viam/* and other files tampered with) , pull down this branch. Because you will need to remove connectivity to emulate the user perspective, please do not SSH into a machine over WiFi (and don't use Ethernet because it can interfere with provisioning). Instead, test directly on the device (keyboard + mouse) needed.

Uninstall Viam

First, you will need to remove any Viam-related stuff from your computer (to emulate the "newly-out-of-the-box" scenario our customers will be in). Please run the following from the repository root:
sudo ./uninstall.sh

Remove local networks

You will then need to remove local networks from your machine to emulate the "offline-ness" that our customers will be dealing with. I've been running nmcli con show and subsequently sudo nmcli con delete <name> (for every network that could "interfere" with the provisioning flow).

Preinstall the Viam Agent

Then, run the following (again from the repository root):

  1. make
  2. sudo ./bin/viam-agent-custom-aarch64 --install
  3. sudo /opt/viam/bin/viam-agent --debug

I've been running the commands altogether as such:
make && sudo ./bin/viam-agent-custom-aarch64 --install && sudo /opt/viam/bin/viam-agent --debug

Phone setup

Download the LightBlue app (available on both Android and iPhone). This is what we will use to communicate with the bluetooth service that is being advertised from the Linux machine.

Testing Procedure

At this point, we are ready to test.

LightBlue

Pairing

In the app, use the search bar to find the name of the Linux machine that you are currently testing against. It should be there. Pair with your machine (one of the tests is to see if the pairing request is automatically accepted on the machine side, so I am hoping this part works!).

If it does not pair, check your screen for a six-digit pairing code and accept the request manually. This is a known limitation that I am working on fixing. The TLDR is it's complicated because of reliance on systemd bus signals which may be picked up and discarded as negligible elsewhere in the viam-agent.

Bluetooth service used to "transmit" credentials

Once paired, look at the bluetooth service and nested characteristics available to us. You should be able to take the provided UUIDs and map them to the logs on your viam-agent machine:

There is an encoding that will be helpful for you to know here. Characteristics whose last 4 characters of their first 8 character sequence (preceding the -) will always be the following:

  • xxxx1111-... is the encompassing service
  • xxxx2222-... is the write-only characteristic for SSID
  • xxxx3333-... is the write-only characteristic for passkey
  • xxxx4444-... is the write-only characteristic for part ID
  • xxxx5555-... is the write-only characteristic for part secret
  • xxxx6666-... is the write-only characteristic for app address
  • xxxx7777-... is the read-only characteristic for nearby available WiFi networks that the machine has detected
2025-02-25T20:41:24.245Z	DEBUG	viam-agent	networking/bluetooth.go:507	BlueZ version (5.66) meets the requirement (5.66 or later)
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:43	Bluetooth peripheral service UUID: 79ff1111-4f38-44b9-b3b5-78fb7e14757e
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:45	WiFi SSID can be written to the following bluetooth characteristic: db8f2222-baea-452f-bad0-62b440f98161
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:47	WiFi passkey can be written to the following bluetooth characteristic: 06af3333-4442-42e5-9e6a-1729ffa1378d
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:49	Robot part key ID can be written to the following bluetooth characteristic: 26d24444-161f-4474-a4a5-e83c563ac52b
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:51	Robot part key can be written to the following bluetooth characteristic: 12db5555-ca13-4a0b-879a-c76fb1542fe1
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:53	Viam app address can be written to the following bluetooth characteristic: 9eaa6666-14f0-47a9-b69f-a093f8358b18
2025-02-25T20:41:24.246Z	DEBUG	viam-agent	networking/bluetooth.go:55	Available WiFi networks can be read from the following bluetooth characteristic: a8ee7777-2496-485a-b0ea-dc63a1122f1

The LightBlue interface is pretty easy to follow. You may need to change LightBlue messages from hex or binary to utf-8 string. Once confirmed you're in utf-8 string mode, you can write individual messages to the SSID, passkey, part ID, part secret, and app address "characteristics" by clicking "write value." Similarly, you can read from the available WiFi networks "characteristic" by clicking "read value." Any value written will get "sucked in" to the viam-agent provisioning flow. You can check this in the viam-agent logs. Once all values are submitted, the provisioning loop should close out the BT connection, connect to WiFi, retrieve its cloud config, and should start up a viam-server (thus ending the provisioning loop).

Cases

  1. Hotspot provisioning flow (i.e. check the captive portal flow still works)
  2. Bluetooth provisioning flow (i.e. check the LightBlue flow works)

@maxhorowitz
Copy link
Member Author

maxhorowitz commented Mar 3, 2025

@Otterverse I'm the worst - I needed to create a new PR because when I force pushed from my other Pi (which was actually on origin/main the git commit history wiped itself). The other PR still exists (it's closed), and I am copying your review here. I will add my response to each subsection of your review below each inline quote.

I apologize in advance for how long this review is. I'm leaving a summary here at the top of (most) of the important things, so it's easier to discuss (and check off as you go) hopefully. But it will overlap a lot with the inline comments too.

As always, hit me up in slack to discuss things whenever. Plenty of this is non-obvious, as you're deep in the guts of the most complicated part of Agent here.

  1. Background go routines/threads need to be used only when/where required, and must be fully managable. Remember that calls to start/stop can come in at any point in the flow, so everything must be able to exit cleanly and quickly. No need to background threads if you're just going to wait for them in the same fuction. Just handle things linearly to keep it simpler.

I added health, which uses Sleep() to mark itself healthy when sleeping. I was a little confused if I should ever be marking this subsystem "unhealthy" - which hasn't occurred to be possible to me (likely I am misunderstanding something). I'm also unsure if I'm using the boolean logic properly. Please let me know if I am using properly, and if not, what to fix.

  1. Need to be careful about data races, and protect any data that can be changed from another thread with locks. Note that even if it's not explicitly in a thread, outside calls from start/stop/update can happen at any point, so everything in those paths has to be race-safe.

Places where I use mutexes that should be reviewed:

  1. getCharacteristic (one mutex per characteristic)
  2. registerBluezAgent (one mutex total for 2,3,4)
  3. listenForPropertyChanges
  4. trustDevice
  1. BT characteristics (and their associated UUIDs) are a bunch of indivudual variables resulting in ~1/3 of bluetooth.go being very repetitive boilerplate. Should get them arranged into a simpler data structure, with a map to use directly, or a true structure with (generalized) getter/setter methods. E.g. calls should be able to look like val, err := bt.GetCharacteristic("ssid") or bt.SetCharacteristic("ssid", myVal) and be human readable. Look at networkState and connState for hints.

My refactor includes:

  1. getCharacteristic(name) -- streamlines the reading of characteristic values
  2. bsl.getReadOnlyCharacteristicConfig -- streamlines defining read only characteristic (like WiFi list)
  3. bsl.getWriteOnlyCharacteristicConfig -- streamlines defining write only characteristic (ssid, psk, part ID, part secret, app address)

** setCharacteristic is N/A because there shouldn't be any "seting" - it should be read only. Separately, there is an internal BT callback func that writes the BT value to a peripheral from a client, but that is not for us to implement.

  1. As discussed in slack, the 20 character limit for charcteristic could be a real problem. SSIDs and PSKs can be longer than that by themselves. And scan results may contains dozens of networks.

Solved this, let me know if you want any changes or tweaks.

  1. Need a new config setting to disable bluetooth.

May punt this to a follow up ticket. Reasoning is you know the config code path the best, and bringing it into the scope of this PR adds a lot more to think about. This would mean that we can merge this PR but wouldn't put out the release candidate until its all done. Just wanted to call out, and if we punt I will make it clear in JIRA that the release is blocking on that ticket.

  1. If agent starts on a device without bluetooth (or otherwise unsupport version/etc.), it needs to detect this and quietly do nothing (beyond an initial log.) Likely best to integrate with above, and just have a "temporary" flag that disables it until a full restart.

This should be implemented already in validateSystem - but I will confirm.

  1. BLE needs to integrate with the health checking. Any backgrounded routines need to report health status regularly using health.healthySleep() (add a new one on Networking{} to track ble health.)

I did this, let me know if it was done incorrectly and I will fix.

  1. Ideally, this should be abstracted just like the portal is. E.g. whenever start/stopPortal() is called, start/stopBLE() is called, and otherwise works the same. They should even use the same inputChan, so the outer provisioning code only has to listen to that one channel.

They do 👍 - and thanks for the easy design pattern - was very easy to add a parallel system using that inputChan.

  1. Network scans (and other info) are updated in real time. This is why they're managed in their own structures like n.connState(). BLE needs to use those and not just pass in a static list one time at startup.

I pass in n.getVisibleNetworks into the prepareBluetooth setup code as a callback that can be used to get the latest visible networks.

@maxhorowitz maxhorowitz changed the title Add new files. Implement Bluetooth Agent Provisioning Mar 3, 2025
Comment on lines 91 to 96
Type string `json:"type,omitempty"`
SSID string `json:"ssid"`
Security string `json:"sec"`
Signal int32 `json:"sig"`
Connected bool `json:"conn,omitempty"`
LastError string `json:"lastErr,omitempty"`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Max note to self to not use shorthand.

@@ -88,12 +88,12 @@ func (n *network) getInfo() NetworkInfo {
}

type NetworkInfo struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Otterverse you mentioned to use a different, smaller struct for this use case. In a previous PR, Ale suggested reusing the NetworkInfo type specifically. You own this codebase, so lmk if you want something split out. My opinion is to use NetworkInfo because its in the same package and we're using it the prescribed way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF you can use it without changing it, that'd be fine. But you added shorthand tags, which does change the existing behavior (in theory... it may not actually harm anything currently.) You're also intentionally trying to skip several fields, so a more compact struct specifically for your parsing would probably be better regardless.

That said, unless you've solved the 20 byte limit, this doesn't matter, as json isn't going to be compact enough. If you HAVE solved it, then the question is "what's the new limit?" If there's still a tight limit (I read somewhere that BLE characteristics are 512 bytes long in the standards, not 20) then bytepacking might make more sense than json. If there's a more lax limit via some new method though, then you could possibly use the stucture as-is, without the new tags.

@maxhorowitz maxhorowitz changed the title Implement Bluetooth Agent Provisioning [APP-7566]: Implement Bluetooth Agent Provisioning Mar 3, 2025
Comment on lines 512 to 583
// getBlueZVersion retrieves the installed BlueZ version and extracts the numeric value correctly.
func getBlueZVersion() (float64, error) {
// Try to get version from bluetoothctl first, fallback to bluetoothd
versionCmds := []string{"bluetoothctl --version", "bluetoothd --version"}

var versionOutput bytes.Buffer
var err error

for _, cmd := range versionCmds {
versionOutput.Reset() // Clear buffer
cmdParts := strings.Fields(cmd)
execCmd := exec.Command(cmdParts[0], cmdParts[1:]...) //nolint:gosec
execCmd.Stdout = &versionOutput
err = execCmd.Run()
if err == nil {
break // Found a valid command
}
}

if err != nil {
return 0, fmt.Errorf("BlueZ is not installed or not accessible")
}

// Extract only the numeric version
versionStr := strings.TrimSpace(versionOutput.String())
parts := strings.Fields(versionStr)

// Ensure we have at least one part before accessing it
if len(parts) == 0 {
return 0, fmt.Errorf("failed to parse BlueZ version: empty output")
}

versionNum := parts[len(parts)-1] // Get the last word, which should be the version number

// Convert to float
versionFloat, err := strconv.ParseFloat(versionNum, 64)
if err != nil {
return 0, fmt.Errorf("failed to parse BlueZ version: %s", versionStr)
}

return versionFloat, nil
}

// validateSystem checks BlueZ installation/version.
func validateSystem(logger logging.Logger) error {
// 1. Check BlueZ version
blueZVersion, err := getBlueZVersion()
if err != nil {
return err
}

// 2. Validate BlueZ version is 5.66 or higher
if blueZVersion < 5.66 {
return fmt.Errorf("BlueZ version is %.2f, but 5.66 or later is required", blueZVersion)
}

logger.Debugf("BlueZ version (%.2f) meets the requirement (5.66 or later)", blueZVersion)
return nil
}

// convertDBusPathToMAC converts a DBus object path to a Bluetooth MAC address.
func convertDBusPathToMAC(path string) string {
parts := strings.Split(path, "/")
if len(parts) < 4 {
return ""
}

// Extract last part and convert underscores to colons
macPart := parts[len(parts)-1]
mac := strings.ReplaceAll(macPart, "_", ":")
return mac
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Max note to self did not fix this yet.

@@ -54,6 +54,10 @@ type Networking struct {
grpcServer *grpc.Server
portalData *portalData

// bluetooth
bluetoothService bluetoothService
bluetoothHealth *health
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nott to self to ask james about health

Comment on lines 111 to 122
// getProvisioning returns true if in provisioning mode, and the time of the last state change.
func (c *connectionState) getProvisioning() bool {
c.mu.Lock()
defer c.mu.Unlock()
switch c.provisioningMode {
case none:
return false
case hotspotOnly, bluetoothOnly, hotspotAndBluetooth:
return true
}
return false
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be all be either on or off

Comment on lines 186 to 198
if hotspotErr != nil && bluetoothErr != nil { //nolint:gocritic
n.connState.setProvisioningMode(none)
return errors.Join(hotspotErr, bluetoothErr)
} else if hotspotErr == nil && bluetoothErr != nil {
n.connState.setProvisioningMode(hotspotOnly)
return bluetoothErr
} else if hotspotErr != nil && bluetoothErr == nil {
n.connState.setProvisioningMode(bluetoothOnly)
return hotspotErr
} else {
n.connState.setProvisioningMode(hotspotAndBluetooth)
return nil
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch statement goes away. It will be just set to true once we get rid of enumeration

if err := errors.Join(err, err2); err != nil {
return err
}
n.logger.Info("Stopped hotspot provisioning mode.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make debug loglines for this and for bluetooth.

Comment on lines 237 to 247
func (n *Networking) prepareBluetooth() error {
if n.bluetoothService != nil {
return nil
}
deviceName := fmt.Sprintf("%s.%s.%s", n.Config().Manufacturer, n.Config().Model, n.Config().FragmentID)
bt, health, err := newBluetoothService(n.logger, deviceName, n.getVisibleNetworks)
if err != nil {
return err
}
n.bluetoothService = bt
n.bluetoothHealth = health
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to init

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and ensure it is locked there.

if ctx.Err() != nil {
return ctx.Err()
}
select {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove this its not needed.


// run spawns three goroutines, one to refresh available WiFi networks, another to listen for user input, and the last to listen for bluetooth
// pairing requests.
func (bsl *bluetoothServiceLinux) run(ctx context.Context, requiresCloudCredentials, requiresWiFiCredentials bool, inputChan chan<- userInput) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark this as unhealthy if we are returning early - something needs to know that this is no longer "actively bluetooth-ing"

}

// If we haven't received all required credentials, try again a second later.
if !bsl.health.Sleep(ctx, time.Second) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the healthy sleep is to be able to exit a sleep before the entire duration because context was cancelled during the sleep.

** bail out of the sleep **

Comment on lines 594 to 601
if err := bsl.listenForPropertyChanges(conn); err != nil {
return err
}

// Listen for properties changed events (includes bluetooth pairing requests).
signalChan := make(chan *dbus.Signal, 25)
conn.Signal(signalChan)
for {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactor so its clearer

Comment on lines 617 to 619
iface, ok := signal.Body[0].(string)
if !ok || iface != "org.bluez.Device1" {
continue
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see if i can naturally filter this

@Otterverse
Copy link
Member

From our discussion about bytepacking, my suggestion for the network list is to have the first byte be metadata, and the rest be the SSID, terminated by a null byte. For the "metadata" that would be signal and security. Use the high bit of that byte to indicate "secure" network, and the remaining 7 bits for 0-128 signal strength.

So a Network called "foobar" with 50% signal strength and security enabled would be something like this (did the full return-trip to check my own understanding.)

ssid := "foobar"
signal := uint8(50) // can't be more than 127, use 0-100)
secure := true

meta := byte(signal)
if secure {
	meta = meta | (1 << 7)
}

list := []byte{meta}
list = append(list, []byte(ssid)...)
list = append(list, 0x0) // separator/terminator

fmt.Printf("HEX: % x\n", list)

newMeta := list[0]
newSecure := uint8(newMeta) > 127
newSignal := newMeta &^ byte(1<<7)
newSsid := string(list[1:])

fmt.Printf("SSID: %s, Signal: %d, Secure: %t\n", newSsid, newSignal, newSecure)

Adding more entries (and splitting them afterwards) is left as an exercise to the reader.

listeningForPropertyChanges bool
adv *bluetooth.Advertisement
advActive bool
characteristicsByName map[string]*bluetoothCharacteristicLinux[*string]
Copy link
Member Author

@maxhorowitz maxhorowitz Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not review this - I still need to make the update to the map.

}

// Start begins advertising a bluetooth service that advertises nearby networks and acccepts WiFi and/or Viam cloud config credentials.
func (bsl *bluetoothServiceLinux) start(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need state external to the bluetoothServiceLinux type which will refrain from calling start or stop if the bluetoothServiceLinux is nil to begin with.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this with noBS (which sounds pretty funny in retrospect 😂 ) - it means "no bluetooth service"

refreshAvailableWiFiNetworks func(p []byte) (e error)

// Channel remains open until we have successfully completed provisioning.
provisioningComplete chan struct{}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

@maxhorowitz maxhorowitz requested a review from Otterverse March 6, 2025 22:21
@maxhorowitz
Copy link
Member Author

@Otterverse wanted to call out that I tested this with a soft-block on my onboard bluetooth and the behavior was correct - it bypassed all of the bluetooth service stuff without erroring, and permanently set the noBS field to true so that those startBluetoothProv and stopBluetoothProv calls would return before doing anything.

Copy link
Member

@Otterverse Otterverse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lot more stuff, sorry. Around half is kinda minor though. Note that I didn't add any comments around the characteristics/map/struct thing, as you said you'd work on that already.

We'll sync next week when I'm back, so you've got some time here. When we meet up, I'll probably ask you to walk me through (or provide instructions/links) for what to install on my phone and how to use it for testing this going forward.

Thanks!

Comment on lines 177 to 180
var hotspotErr error
if err := n.startProvisioningHotspot(ctx, inputChan); err != nil {
hotspotErr = err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var hotspotErr error
if err := n.startProvisioningHotspot(ctx, inputChan); err != nil {
hotspotErr = err
}
hotspotErr := n.startProvisioningHotspot(ctx, inputChan)

Comment on lines 181 to 184
var bluetoothErr error
if err := n.startProvisioningBluetooth(ctx, inputChan); err != nil {
bluetoothErr = err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var bluetoothErr error
if err := n.startProvisioningBluetooth(ctx, inputChan); err != nil {
bluetoothErr = err
}
bluetoothErr := n.startProvisioningBluetooth(ctx, inputChan)

}

n.connState.setProvisioning(hotspotErr == nil || bluetoothErr == nil)
return errors.Join(hotspotErr, bluetoothErr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if bluetooth fails, this returns an error, but setProvisioning acts like it was a success? Should probably log errors as they happen, but only RETURN an error if everything is foobar.

// startProvisioningBluetooth should only be called by 'StartProvisioning' (to ensure opMutex is acquired).
func (n *Networking) startProvisioningBluetooth(ctx context.Context, inputChan chan<- userInput,
) error {
if err := n.bluetoothService.start(ctx, true, true, inputChan); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this panic if n.bluetoothService is nil? E.g. if init failed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you have two arguments here that you're hardcoding to true. Looks like they say what credential are needed. Don't you have to change the inputs here to match reality? E.g. how does it know when the viam config is already done?

Note that there are bits of this on the state machine above. n.connState.GetConnected() an n.connState.GetConfigured() Should probably use those dynamically deeper in the code, rather than passing in values only at startup.

@@ -54,6 +54,10 @@ type Networking struct {
grpcServer *grpc.Server
portalData *portalData

// bluetooth
noBS bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
noBS bool
noBT bool

nit: but "noBS" sounds like something snarky/rude without context, I think noBT (or noBLE) is more readable/meaningful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that 😅

Comment on lines 603 to 606
bsl.refreshAvailableWiFiNetworks = func(bs []byte) error {
_, err := c.Write(bs)
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is creating this callback for ALL read-only characteristics?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I see the networklist is the only read-only one. I'd assumed you were passing the "needWiFi" and "needConfig" values to the mobile device this way,, the same way the wifi provisioning does (so the UI doesn't prompt for unneeded fields.)

Looks like you're also not sharing the fragment_id? I'm pretty sure that one is critical for provisioning to work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two important pieces here:

1 - the passing of fields "needs WiFi" and "needs config" values to the mobile app.

I can do this to mirror what you have for the captive portal. Should be an easy addition - and I do agree that its necessary.

2 - the fragment ID.

I think there are two very different provisioning scenarios here, so I want to make sure we're not talking past each other.

Scenario A: customer (i.e. Canyon Runner) sends devices that are pre-flashed with their company's fragment ID. Every customer gets the same fragment ID, and it doesn't get "set" by the end user. This is the most important / frequent use case according to Eliot. And in that case, the only things that get sent over should be WiFi credentials. I don't know this code path well at all - but the machine should be making a request directly to the cloud with a fragment ID only, and the cloud should create a new machine and return its cloud config to the robot.

Question for you: does no fragment ID make sense?

Scenario B: someone wants to set up their machine that already exists in the cloud. In that case we want to send the credentials from the phone to the machine (included in this PR).

Deciding whether we are in Scenario A or Scenario B should happen outside of the bluetooth.go code - I think it should be a provisioning decision, and that's why I included the [currently unused] boolean args for "needs WiFi credentials" and "needs cloud credentials" - essentially deferring to the caller to dictate which scenario we are in.

Question for you: does giving all control to the caller make sense?

if err := adapter.Enable(); err != nil {
return [4]uint32{}, nil, fmt.Errorf("failed to enable bluetooth adapter: %w", err)
}
serviceUUID := bluetooth.NewUUID(uuid.New()).Replace16BitComponent(0x1111)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I see these UUIDs getting used all over, but uuid.New() is generating NEW uuids every run of the software. Is that expected/normal? Or are you suppose to define the full thing once/permanantly for a "known service" or anything like that? Just feels weird to generate new UUIDs for consistent elements.

Comment on lines +661 to +671
versionCmds := []string{"bluetoothctl --version", "bluetoothd --version"}
for _, cmd := range versionCmds {
versionOutput.Reset() // Clear buffer
cmdParts := strings.Fields(cmd)
execCmd := exec.Command(cmdParts[0], cmdParts[1:]...) //nolint:gosec
execCmd.Stdout = &versionOutput
err = execCmd.Run()
if err == nil {
break // Found a valid command
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat minor for now, but I do think it'd be better to test this via D-BUS (and during initBluetooth()) in order to really validate that you're talking to bluez in the expected way. Running commands may not always work as expected, and even if they're installed, it doesn't mean the service is actually running.

Also, there's an easier way to just run a one-shot command without setting up your own buffers or anything.

cmd := exec.Command("bluetoothctl", "--version")
output, err := cmd.CombinedOutput()

Comment on lines 192 to 199
[]bluetooth.CharacteristicConfig{
ssidCharacteristicConfig,
pskCharacteristicConfig,
robotPartKeyIDCharacteristicConfig,
robotPartKeyCharacteristicConfig,
appAddressCharacteristicConfig,
availableWiFiNetworksCharacteristicConfig,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing some key fields. Ideally BT provisioning should provide the same info (and therefore UI flow) as wifi. Look at line 40 of grpc.go for the list there that's sent to the mobile device. Manufacturer and Model are probably okay to skip, but FragmentID is critical for provisioning. Similarly "HasSmartMachineCredentials" and "IsOnline" values inform the mobile UI what fields to show (or not) when asking the user for input. There's also a list of errors provided, so when, say, a bad wifi password is provided (or something else goes wrong) it can be reported to the user.

if n.bluetoothService != nil {
return nil
}
deviceName := fmt.Sprintf("%s-%s-%s", n.Config().Manufacturer, n.Config().Model, n.Config().FragmentID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there constraints on the device name? Manufacturer and Model could both be rather long, and fragment ID is a full 32-byte UUID, so that'll look really weird on a phone. Also, all robots with the same config would look the same, so you couldn't tell them apart. I would suggest using the same pattern as the wifi. That is, prefix+hostname:

n.Config().HotspotPrefix + "-" + strings.ToLower(hostname)
You can get the hostname from os.Hostname()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants