
async sensor API #56963

Closed
wants to merge 10 commits

Conversation

yperess
Collaborator

@yperess yperess commented Apr 18, 2023

  • Introduce a new data flow:
    • Request - data is requested and returned by the sensor asynchronously, without a syscall, via RTIO
    • Raw - data is in a raw buffer format directly from the sensor driver
    • Processing - sensor drivers provide a decoder that converts the data into a Q31 format compatible with the zDSP (Zephyr Digital Signal Processing) subsystem
  • The new API is fully backwards compatible. Benchmarking showed it within 1% of the performance of the existing API when USERSPACE was disabled (it may even beat the existing API when userspace is enabled, since the new API requires no syscalls). Additionally, data is much more compact in the raw format while it waits for processing.
  • Using RTIO enables the use of mempools, which allocate buffers as needed and allow several sensors to share the same context.
  • Migrate the sensor shell to use the new API

@yperess
Collaborator Author

yperess commented Apr 18, 2023

@teburd I got things working (kinda). I still can't get the thermistor to work side by side with the IMU (@asemjonovs do you have any thoughts?). But overall this is working really well. Let me know what you think.

@teburd
Collaborator

teburd commented Apr 18, 2023

> @teburd I got things working (kinda). I still can't get the thermistor to work side by side with the IMU (@asemjonovs do you have any thoughts?). But overall this is working really well. Let me know what you think.

I'll take a closer look today. At first glance I'm concerned about passing the rtio context to the sensor_read() call, but I need to read through this to better understand why that might be needed.

@teburd
Collaborator

teburd commented Apr 18, 2023

Ok just to kind of re-dump what we had brainstormed...

RTIO_DEFINE_WITH_MEMPOOL(r, 4, 4, BUF_POOL_COUNT, BUF_SIZE);

int main(void)
{
    const struct device *icm42688 = DEVICE_DT_GET(DT_NODELABEL(icm42688));
    const struct device *akm09918 = DEVICE_DT_GET(DT_NODELABEL(akm09918));

    /* TODO: Set up sensors including trigger source here (e.g. fifo en, watermark level, timers, whatever) */

    /* Set up a pool read for the sensors, using the sensor as the userdata */
    sensor_read_with_pool(icm42688, r);
    sensor_read_with_pool(akm09918, r);

    while (true) {
        const struct device *read_dev;
        void *buf;
        uint32_t len;

        /* Submit with wait count 1; immediately reads and deciphers the completion into the out params */
        int res = sensor_read_await(r, &read_dev, &buf, &len);

        /* We now have a buffer with the sensor data or an error; it's ours until we free it */
        if (res < 0) {
            LOG_ERR("failed to read sensor data %d", res);
            continue;
        }

        if (read_dev == icm42688) {
            process_icm42688(read_dev, buf, len);
        } else {
            process_akm09918(read_dev, buf, len);
        }
        sensor_read_free(r, buf, len);
        sensor_read_with_pool(read_dev, r);
    }
}

And honestly this doesn't seem too far off from that. I think the difference is that my mind was going toward sensor_read() being a wrapper around what amounts to the sensor acting like an I/O device that you can submit requests to, with sensor_read() doing all the setup for you.

So making sensor_read() a syscall is, I think, the unexpected part in this setup. I was expecting a driver-level submit()-type call (not a syscall) that takes the submission (which has the rtio context as well!) and produces a completion.

I guess the difficulty is matching the idea that you'd like to read some N channels.

This can be encoded in the iodev's data and then the read call can take the iodev (configured with the channels and such). The read call doesn't need to be a syscall but can be an inline helper that creates the submission and submits it with the given context.

SENSOR_DT_IODEV(icm42688_gyros, sensor_dt_node, somechannels, num_channels)

sensor_read(icm42688_gyros, r);

Does that make sense or maybe I'm overlooking something else?

To make this backwards compatible then, a sensor_submit API is available that does kind of what you do here already...

void sensor_iodev_submit(struct rtio_iodev_sqe *iodev_sqe) {
  const struct sensor_iodev_data *iodev_data = iodev_sqe->sqe->iodev->data;
  const struct device *sensor_dev = iodev_data->sensor_dev;
  const struct sensor_driver_api *api = sensor_dev->api;

  /* Device supports async reads with rtio */
  if (api->submit != NULL) {
    return api->submit(sensor_dev, iodev_sqe);
  }

  /* submit not implemented, but presumably fetch and get are, so use those to mimic submit */
  enum sensor_channel *channels = iodev_data->channels;
  const uint32_t channel_num = iodev_data->channel_num;

  sensor_sample_fetch(sensor_dev, SENSOR ...);

  /* for channel in channels */
  for (int i = 0; i < channel_num; i++) {
     sensor_channel_get(sensor_dev, ....);
  }

  /* Return completion... */
  rtio_cqe_submit(iodev_sqe->r, cqe, ...);
}

I do kind of like that you've made the processing take a function pointer so cleanup and everything is automatic. This too should probably be a helper function kind of thing. So rather than sensor_read_await you'd call sensor_read_process(r, process_func), kind of thing perhaps?

Anyways, I think this is really on the right track, but maybe some missed thoughts on my part weren't conveyed well. I hope that helps!

@yperess
Collaborator Author

yperess commented Apr 18, 2023

> @teburd I got things working (kinda). I still can't get the thermistor to work side by side with the IMU (@asemjonovs do you have any thoughts?). But overall this is working really well. Let me know what you think.
>
> I'll take a closer look today. At first glance I'm concerned about passing the rtio context to the sensor_read() call, but I need to read through this to better understand why that might be needed.

Yes, passing the context for every read was not my first choice, but it was the only way I could think of making this 100% backwards compatible. A workflow that does:

  1. Set RTIO context
  2. Trigger a read

will not work without driver changes, since we don't have anywhere to store the RTIO context. An alternative is to create a metadata array of {const struct device *, struct rtio *} entries and allocate one per sensor in an iterable section. I'm still open to that approach if you like it better.

@yperess
Collaborator Author

yperess commented Apr 18, 2023

> Ok just to kind of re-dump what we had brainstormed...
>
> RTIO_DEFINE_WITH_MEMPOOL(r, 4, 4, BUF_POOL_COUNT, BUF_SIZE);
>
> int main(void)
> {
>     const struct device *icm42688 = DEVICE_DT_GET(DT_NODELABEL(icm42688));
>     const struct device *akm09918 = DEVICE_DT_GET(DT_NODELABEL(akm09918));
>
>     /* TODO: Set up sensors including trigger source here (e.g. fifo en, watermark level, timers, whatever) */
>
>     /* Set up a pool read for the sensors, using the sensor as the userdata */
>     sensor_read_with_pool(icm42688, r);
>     sensor_read_with_pool(akm09918, r);
>
>     while (true) {
>         const struct device *read_dev;
>         void *buf;
>         uint32_t len;
>
>         /* Submit with wait count 1; immediately reads and deciphers the completion into the out params */
>         int res = sensor_read_await(r, &read_dev, &buf, &len);
>
>         /* We now have a buffer with the sensor data or an error; it's ours until we free it */
>         if (res < 0) {
>             LOG_ERR("failed to read sensor data %d", res);
>             continue;
>         }
>
>         if (read_dev == icm42688) {
>             process_icm42688(read_dev, buf, len);
>         } else {
>             process_akm09918(read_dev, buf, len);
>         }
>         sensor_read_free(r, buf, len);
>         sensor_read_with_pool(read_dev, r);
>     }
> }

Correct, and what I have here is basically the same. Your sample assumes a FIFO and that the sensor is generating the data (which will come soon after this PR). What I currently have in this PR is basically a triggered one-shot read (sensor_read()). The while loop you posted is basically what the extra thread in this PR is doing; I just didn't get around to FIFO support, so I needed two threads: one triggers and one processes data.

> And honestly this doesn't seem too far off from that. I think the difference is that my mind was going toward sensor_read() being a wrapper around what amounts to the sensor acting like an I/O device that you can submit requests to, with sensor_read() doing all the setup for you.
>
> So making sensor_read() a syscall is, I think, the unexpected part in this setup. I was expecting a driver-level submit()-type call (not a syscall) that takes the submission (which has the rtio context as well!) and produces a completion.
>
> I guess the difficulty is matching the idea that you'd like to read some N channels.
>
> This can be encoded in the iodev's data and then the read call can take the iodev (configured with the channels and such). The read call doesn't need to be a syscall but can be an inline helper that creates the submission and submits it with the given context.

I think the difficulty here was actually making this backwards compatible (though maybe I just wasn't creative enough). My current thinking is that with the existing approach sensor_read() is a single syscall, but it replaces 2+ syscalls (fetch/get). For FIFO reading we would add a new API after this: sensor_stream(const struct device *, struct rtio *). If the RTIO context is NULL we turn off the stream; if it's non-NULL we start streaming data to the processing loop associated with that RTIO context.

> SENSOR_DT_IODEV(icm42688_gyros, sensor_dt_node, somechannels, num_channels)
>
> sensor_read(icm42688_gyros, r);
>
> Does that make sense or maybe I'm overlooking something else?

> To make this backwards compatible then, a sensor_submit API is available that does kind of what you do here already...
>
> void sensor_iodev_submit(struct rtio_iodev_sqe *iodev_sqe) {
>   const struct sensor_iodev_data *iodev_data = iodev_sqe->sqe->iodev->data;
>   const struct device *sensor_dev = iodev_data->sensor_dev;
>   const struct sensor_driver_api *api = sensor_dev->api;
>
>   /* Device supports async reads with rtio */
>   if (api->submit != NULL) {
>     return api->submit(sensor_dev, iodev_sqe);
>   }
>
>   /* submit not implemented, but presumably fetch and get are, so use those to mimic submit */
>   enum sensor_channel *channels = iodev_data->channels;
>   const uint32_t channel_num = iodev_data->channel_num;
>
>   sensor_sample_fetch(sensor_dev, SENSOR ...);
>
>   /* for channel in channels */
>   for (int i = 0; i < channel_num; i++) {
>      sensor_channel_get(sensor_dev, ....);
>   }
>
>   /* Return completion... */
>   rtio_cqe_submit(iodev_sqe->r, cqe, ...);
> }

I was aiming to abstract away the RTIO details, though I'll admit it comes at the cost of granularity. If the developer wants to do a single read with a pre-allocated buffer on a mempool RTIO context, they won't be able to with my API, though we could add buffer/size arguments and let the developer pass NULL when the context has a mempool. That wouldn't be too difficult.

> I do kind of like that you've made the processing take a function pointer so cleanup and everything is automatic. This too should probably be a helper function kind of thing. So rather than sensor_read_await you'd call sensor_read_process(r, process_func), kind of thing perhaps?
>
> Anyways, I think this is really on the right track, but maybe some missed thoughts on my part weren't conveyed well. I hope that helps!

Collaborator

@teburd teburd left a comment


This is looking really good to me, some commentary on the mempool changes would be good

map_entry->state = RTIO_MEMPOOL_ENTRY_STATE_ALLOCATED;
map_entry->block_idx = __rtio_compute_mempool_block_index(r, *buf);
map_entry->block_count = num_blks;
mutable_sqe->buf = *buf;
Collaborator


I'm wondering what happened with the mempool situation here, seems like a complete change to what you had?

Collaborator Author

@yperess yperess Apr 21, 2023


Yes, the other way wasn't working. We can chat about it offline, but basically we were keeping the sqe index and using it to recover the buffer. That works when the sqe and cqe are consumed at the same pace, but here the sqe was already released and in my case was then assigned to another submit, which threw everything off.

@yperess
Collaborator Author

yperess commented Apr 21, 2023

> This is looking really good to me, some commentary on the mempool changes would be good

Something isn't quite right; it's not working well at high data rates. I'll try to debug today.

@yperess yperess force-pushed the peress/sensors branch 2 times, most recently from ca79839 to 5430bfb Compare April 25, 2023 05:22
@yperess
Collaborator Author

yperess commented Apr 25, 2023

@teburd, performance is abysmal currently (see output below). The fact that we're only getting 512 callbacks makes me wonder whether this is an issue with the CQE not being released properly.

Statistics:
Duration: 0.500
Num Requests: 36914
Sampling frequency: 73828Hz (ideal: 32000Hz)
Reading frequency: 512Hz
Num Callbacks: 512
Num Samples: 256
Drop rate: 50.00
Num Failed Reads: 256
Num Failed Timestamps: 0
Num Failed Frame Counts: 0
Num Failed Get Scale: 0


*************************
Legacy Statistics:
Duration: 0.500
Sampling frequency: 80582Hz (ideal: 32000Hz)
Reading frequency: 31658Hz
Num Requests: 40291
Num Samples: 15829

@teburd
Collaborator

teburd commented Apr 25, 2023

Ok correction, I do see what you are seeing now, but let's look at what's on the logic analyzer.

Between SPI reads, chip-select time is 30us in the polling loop. This is 50KHz polling over a SPI bus that itself uses a tight polling loop, meaning in effect zero computation time is available; that other 20us is the cost of doing business.

[Screenshot from 2023-04-25 08-21-02]

Then at a larger scale there's a large bank of time where no SPI work is done, nearly a second in fact. So something isn't quite right there. Your processing thread isn't being scheduled in between reads like I think you want it to be, is that right?

[Screenshot from 2023-04-25 08-23-01]

That seems like a different issue. The throughput of queueing reads and getting completions seems to be quite good to me.

@yperess
Collaborator Author

yperess commented Apr 26, 2023

@teburd @MaureenHelm I believe this is ready for first-pass review (the sample used here is a hacked sensor_shell; that final commit will be replaced tomorrow by actually updating the sensor shell implementation to use this API instead). Some metrics from the TDK Robokit1 board:

When running the sensor at 32kHz, and using a tight sampling loop:

Existing fetch/get APIs
31,678Hz (1.00625% of the samples were dropped) and used 48 bytes for 6 channels of data with 1µm/s² precision for the accelerometer and 1µrad/s for the gyroscope

Async API
31,330Hz (2.09375% of the samples were dropped) and used 48 bytes for 6 channels of data + a 64-bit timestamp, with a worst-case precision of 0.073µm/s² for the accelerometer and 0.0016µrad/s for the gyroscope.

(Precision is only theoretical while drivers don't implement this because the Q values are generated from the original APIs)

What's next?

  • Update the sample by actually implementing the sensor_shell get API with the new API.
  • Post a second PR to demonstrate the next layer, which is the streaming support. It will build on top of these changes.

@yperess yperess force-pushed the peress/sensors branch 2 times, most recently from f3a650c to 73ebfe5 Compare April 27, 2023 08:04
@yperess
Collaborator Author

yperess commented Apr 27, 2023

@teburd @MaureenHelm Updates:

  1. timestamp is now ticks
  2. uint32_t scale is now an int8_t shift, so data compression is better (reading accel_xyz + gyro_xyz uses a smaller buffer than the existing APIs). Also, using the shift instead of a free-form scale makes for faster math
  3. Updated the sensor_shell.c to use the new API which worked out a few bugs when dynamically configuring the read iodev.

Sample:

uart:~$ sensor get icm42688p@0 accel_xyz gyro_xyz die_temp
channel idx=0 accel_x value=1.626859
channel idx=1 accel_y value=1.321598
channel idx=2 accel_z value=5.795164
channel idx=4 gyro_x value=0.002663
channel idx=5 gyro_y value=-0.018384
channel idx=6 gyro_z value=0.005594
channel idx=12 die_temp value=28.313706

@yperess yperess marked this pull request as ready for review April 27, 2023 12:36
@yperess yperess added the DNM This PR should not be merged (Do Not Merge) label Apr 27, 2023
west.yml Outdated
@@ -22,16 +22,9 @@ manifest:
- name: upstream
url-base: https://github.com/zephyrproject-rtos

group-filter: [-babblesim]
Member


wonder why this is needed :)

Collaborator Author


I haven't been able to run west or twister without this revert

@yperess yperess force-pushed the peress/sensors branch 3 times, most recently from cd5c2b2 to 0cebe93 Compare June 6, 2023 02:26
@yperess yperess force-pushed the peress/sensors branch 3 times, most recently from c26fdc6 to 0a11578 Compare June 27, 2023 07:15
Introduce a streaming API that uses the same data path as the async API.

This adds features to the decoder:
* Checking if triggers are present

Adding streaming features built on top of existing triggers:
* Adding 3 operations to be done on a trigger
  * include - include the data with the trigger information
  * nop - do nothing
  * drop - drop the data (flush)
* Add a new sensor_stream() API to mirror sensor_read() but add an
optional handler to be able to cancel the stream.

Signed-off-by: Yuval Peress <peress@google.com>
topic#sensor_stream
Add streaming implementation for icm42688 using both threshold and
full FIFO triggers.

Signed-off-by: Yuval Peress <peress@google.com>
topic#sensor_stream
Signed-off-by: Yuval Peress <peress@google.com>
@github-actions

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.
