[Question] performance "optimisation" #11

false · 2019-10-22T05:09:30Z

Hello,
I am trying to figure out the most efficient way to read data, in terms of performance/latency, using the lib.
I have a loop that reads data (PCIeScreamer R02, 4.0 and 4.3) and then parses it (C++).
The loop reads about 45000 bytes of contiguous data, but I don't use most of it.
I made another version that reads (VMM readmem) only the parts I use, which results in 4 calls of 800, 400, 2 and 2 bytes.

Is it better to grab one big chunk of data or to split it into multiple smaller reads?

I profiled the execution with both implementations, but the results are vague and I can't determine the best way to use the lib/hardware to get optimal performance. Any recommendations? Thanks :-)

ufrisk · 2019-10-22T08:19:31Z

I'm assuming you use the VMMDLL_FLAG_NOCACHE flag on your reads to get fresh data from the device rather than reading from the internal cache.

In general, multiple smaller reads are almost always slower than one larger read (even if the larger read is significantly larger than the multiple smaller reads combined).

Also, internally the minimum read is 4096 bytes, aligned to 4096-byte boundaries. Reading 800 bytes won't speed things up one single bit compared to reading 4096 bytes.
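To make the 4096-byte granularity concrete, here is a small C++ sketch that computes which pages a single read touches. The names `kPageSize`, `PageSpan` and `page_span` are hypothetical illustrations, not part of the library. An 800-byte read that stays inside one page costs exactly one page, the same as a 4096-byte read, while even a 2-byte read that straddles a page boundary costs two.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the 4096-byte page size and alignment described above.
constexpr std::uint64_t kPageSize = 0x1000;

struct PageSpan {
    std::uint64_t first_page;  // page-aligned start address of the read
    std::uint64_t page_count;  // number of 4096-byte pages the read touches
};

// Returns the page-aligned range that a read of `size` bytes at `addr` touches.
PageSpan page_span(std::uint64_t addr, std::uint64_t size) {
    assert(size > 0);
    // Round the first and last byte addresses down to their page starts.
    const std::uint64_t first = addr & ~(kPageSize - 1);
    const std::uint64_t last = (addr + size - 1) & ~(kPageSize - 1);
    return {first, (last - first) / kPageSize + 1};
}
```

For example, the 45000-byte contiguous read from the question, if it starts on a page boundary, spans 11 pages.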

The amount of memory does matter, though. Your best shot at this is probably to read your multiple non-contiguous chunks of memory in one single call using the VMMDLL_MemReadScatter function.

The function is somewhat complex: it takes an array of pointers that point to pMEM structs. These structs must be initialized with some values and will allow you to read 0x1000 (page) sized memory at page boundaries. I just realized I don't have a good example of this in my example project. I'll try to add one in the next couple of days.
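Before filling in the scatter structs, the small reads have to be mapped onto page-aligned 0x1000-byte chunks. The sketch below assumes a hypothetical `Request` type and `pages_for` helper (neither is library API); it collects the distinct pages covering all requests, so values that share a page need only one entry. The real struct layout and the exact VMMDLL_MemReadScatter signature should be taken from vmmdll.h and the example project, not from this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: 4096-byte pages, as described above.
constexpr std::uint64_t kPageSize = 0x1000;

// Hypothetical description of one small value the caller wants to read.
struct Request {
    std::uint64_t addr;
    std::uint64_t size;
};

// Collects the distinct, sorted page-aligned addresses covering all requests.
// Each returned address would correspond to one page-sized scatter entry.
std::vector<std::uint64_t> pages_for(const std::vector<Request>& reqs) {
    std::vector<std::uint64_t> pages;
    for (const Request& r : reqs) {
        assert(r.size > 0);
        const std::uint64_t first = r.addr & ~(kPageSize - 1);
        const std::uint64_t last = (r.addr + r.size - 1) & ~(kPageSize - 1);
        // A request that crosses a page boundary contributes every page it touches.
        for (std::uint64_t p = first; p <= last; p += kPageSize) {
            pages.push_back(p);
        }
    }
    // Requests that share a page collapse into a single entry.
    std::sort(pages.begin(), pages.end());
    pages.erase(std::unique(pages.begin(), pages.end()), pages.end());
    return pages;
}
```

With the 800, 400, 2 and 2 byte reads from the question, three of which happen to fall in one page, only two page entries would be needed.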

ufrisk · 2019-10-22T10:24:05Z

Example updated with MemReadScatter function.

false · 2019-10-23T05:07:46Z

I was wondering: with that method, I can just pass an array of all the small values I need to read and it will read them in the most efficient way, right? I was using it with 800, 400, 2 and 2 bytes and then parsing, but if I pass an array of 15 pMem structs, can you confirm it will be as good, if not better, in terms of efficiency?

ufrisk · 2019-10-24T17:19:09Z

If you pass an array of 15 pMem structs, it will read 15 4096-byte (page-sized) chunks (provided that they are on page boundaries).

It does not matter whether you read 2, 4, 8 or any other number of bytes, as long as they are read from within the same memory page. Regardless of how many bytes you read inside the same memory page, it will read the whole memory page.
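Because the whole page is read regardless of how many bytes are requested, the 800-, 400- and 2-byte fields can all be copied out of the same page buffer afterwards. A minimal sketch, assuming the page contents have already been read into a hypothetical `PageMap` (page address to 4096-byte buffer); `extract` is an illustrative helper, not part of the library, and it also handles a value that straddles two pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Assumption: 4096-byte pages, as described above.
constexpr std::uint64_t kPageSize = 0x1000;

// Hypothetical result of a scatter read: page-aligned address -> 4096-byte buffer.
using PageMap = std::map<std::uint64_t, std::vector<std::uint8_t>>;

// Copies `size` bytes starting at `addr` out of already-read pages into `out`.
// Returns false if any needed page is missing or short (e.g. that page failed to read).
bool extract(const PageMap& pages, std::uint64_t addr, std::uint64_t size,
             std::uint8_t* out) {
    assert(out != nullptr);
    while (size > 0) {
        const std::uint64_t page = addr & ~(kPageSize - 1);
        const std::uint64_t offset = addr - page;
        // Copy at most up to the end of the current page, then continue in the next.
        const std::uint64_t chunk = std::min(size, kPageSize - offset);
        auto it = pages.find(page);
        if (it == pages.end() || it->second.size() < kPageSize) {
            return false;
        }
        std::memcpy(out, it->second.data() + offset, chunk);
        out += chunk;
        addr += chunk;
        size -= chunk;
    }
    return true;
}
```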

All pMem requested memory will be read in one single read (or at least up until the maximum supported by the FPGA).

false · 2019-10-28T05:20:25Z

Ok, thanks a lot for this precious information, it helped.

ufrisk added the "question" (Question and answer) label on Oct 22, 2019

false closed this as completed Oct 28, 2019
