[Question] performance "optimisation" #11

Closed
false opened this issue Oct 22, 2019 · 5 comments
Labels
question Question and answer

Comments


false commented Oct 22, 2019

Hello,
I am trying to figure out the most efficient way, in terms of performance and latency, to read data using the lib.
I have a loop which reads data (pciescreamer r02, 4.0 and 4.3) and then parses it (C++).
The loop reads about 45000 bytes of contiguous data, but I don't use most of it.
I made another version which reads (vmm readmem) only the parts that I use, which results in 4 calls of 800, 400, 2 and 2 bytes.

Is it better to grab one large block of data or to split it up into multiple smaller reads?

I profiled the execution of both implementations, but the results are inconclusive, and I can't determine the best way to use the lib/hardware to get optimal performance. Any recommendations? Thanks :-)

Owner

ufrisk commented Oct 22, 2019

I'm assuming you use the VMMDLL_FLAG_NOCACHE flag on your reads to get fresh data from the device rather than reading from the internal cache.

In general, multiple smaller reads are almost always slower than one larger read (even if the larger read is significantly larger than the multiple smaller reads combined).

Also, internally the minimum read is 4096 bytes, aligned to 4096-byte boundaries. Reading 800 bytes won't speed things up one single bit compared to reading 4096 bytes.
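
To make the alignment point concrete, here is a minimal sketch of the arithmetic, assuming only the 4096-byte page size mentioned above. The names `kPageSize`, `pageBase` and `pagesTouched` are made up for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Page size of the internal reads, as stated in the comment above.
constexpr std::uint64_t kPageSize = 0x1000;

// Base address of the 4096-byte page that contains addr.
constexpr std::uint64_t pageBase(std::uint64_t addr) {
    return addr & ~(kPageSize - 1);
}

// Number of whole pages a read of len bytes starting at addr touches.
// This is the amount of memory that actually has to be fetched.
inline std::uint64_t pagesTouched(std::uint64_t addr, std::uint64_t len) {
    assert(len > 0);
    const std::uint64_t first = pageBase(addr);
    const std::uint64_t last = pageBase(addr + len - 1);
    return (last - first) / kPageSize + 1;
}
```

With these helpers, an 800-byte read that sits inside one page and a full 4096-byte read of that page both touch exactly one page, while an 800-byte read that straddles a page boundary touches two.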

The amount of memory does matter though. Your best shot at this is probably to read your multiple non-contiguous chunks of memory in one single call using the VMMDLL_MemReadScatter function.

The function is somewhat complex: it takes an array of pointers to pMEM structs. These structs must be initialized with some values and will allow you to read 0x1000 (page) sized memory at page boundaries. I just now realized I don't have a good example of this in my example project. I'll try to add one in the next couple of days.
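
As a rough illustration of the preparation step, the sketch below turns a handful of small (address, size) requests into the sorted list of unique page-aligned addresses that a scatter read would need. It deliberately does not call VMMDLL_MemReadScatter or fill in the real pMEM structs; `ReadRequest` and `pagesToFetch` are hypothetical names, and the 0x1000 page size is the only fact taken from the comment above.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical description of one small read the caller wants.
struct ReadRequest {
    std::uint64_t addr;  // start address of the wanted bytes
    std::uint32_t size;  // number of wanted bytes
};

// Returns the sorted, de-duplicated page base addresses covering all requests.
// Each entry would correspond to one page-sized scatter entry.
std::vector<std::uint64_t> pagesToFetch(const std::vector<ReadRequest>& reqs) {
    constexpr std::uint64_t kPageSize = 0x1000;
    std::set<std::uint64_t> pages;
    for (const auto& r : reqs) {
        if (r.size == 0) continue;  // nothing to read
        const std::uint64_t first = r.addr & ~(kPageSize - 1);
        const std::uint64_t last = (r.addr + r.size - 1) & ~(kPageSize - 1);
        for (std::uint64_t p = first; p <= last; p += kPageSize) {
            pages.insert(p);
        }
    }
    return std::vector<std::uint64_t>(pages.begin(), pages.end());
}
```

Two requests that fall inside the same page collapse into a single entry, so the number of pages fetched can be smaller than the number of values requested.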

Owner

ufrisk commented Oct 22, 2019

Example updated with MemReadScatter function.

@ufrisk ufrisk added the question Question and answer label Oct 22, 2019
Author

false commented Oct 23, 2019

I was wondering: with that method, can I just pass an array of all the small values I have to read, and it will read them in the most efficient way? I was using it with 800, 400, 2 and 2 bytes and then parsing, but if I pass an array of 15 pMem structs, can you confirm it will be as good, if not better, in terms of efficiency?

Owner

ufrisk commented Oct 24, 2019

If you pass an array of 15 pMem structs it will read 15 4096-byte (page-sized) chunks (provided that they are on page boundaries).

It does not matter if you read 2, 4, 8 or any other number of bytes if they are read from within the same memory page. Regardless of how many bytes you read inside the same memory page, it will read the whole memory page.
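
Since whole pages come back either way, the small values can simply be copied out of the page buffers afterwards. The following is a minimal sketch of that step, assuming the fetched pages are kept in a map keyed by page base address. `PageMap` and `copyFromPages` are hypothetical names, not library types.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical container: page base address -> 4096 bytes of fetched data.
using PageMap = std::map<std::uint64_t, std::vector<std::uint8_t>>;

// Copies size bytes starting at addr out of already-fetched pages.
// Returns false if any page needed for the range is missing or too short.
bool copyFromPages(const PageMap& pages, std::uint64_t addr,
                   void* out, std::size_t size) {
    constexpr std::uint64_t kPageSize = 0x1000;
    auto* dst = static_cast<std::uint8_t*>(out);
    while (size > 0) {
        const std::uint64_t base = addr & ~(kPageSize - 1);
        const std::size_t off = static_cast<std::size_t>(addr - base);
        // Copy at most up to the end of the current page.
        const std::size_t n = std::min<std::size_t>(size, kPageSize - off);
        const auto it = pages.find(base);
        if (it == pages.end() || it->second.size() < kPageSize) return false;
        std::memcpy(dst, it->second.data() + off, n);
        dst += n;
        addr += n;
        size -= n;
    }
    return true;
}
```

A value that straddles a page boundary is assembled from the two neighbouring page buffers, which is why both pages have to be requested in that case.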

All pMem requested memory will be read in one single read (or at least up until the maximum supported by the FPGA).
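
The "maximum supported" caveat amounts to splitting the page list into batches. The sketch below only illustrates that idea. The actual limit is not stated in this thread, so `maxPerCall` is a made-up parameter, and the comment above suggests the library handles this itself, so a caller would not normally need code like this.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits a list of page addresses into consecutive batches of at most
// maxPerCall entries. maxPerCall is a hypothetical stand-in for whatever
// per-read limit the hardware imposes.
std::vector<std::vector<std::uint64_t>> splitIntoBatches(
    const std::vector<std::uint64_t>& pages, std::size_t maxPerCall) {
    std::vector<std::vector<std::uint64_t>> batches;
    if (maxPerCall == 0) return batches;  // no valid batch size
    for (std::size_t i = 0; i < pages.size(); i += maxPerCall) {
        const std::size_t end = std::min(pages.size(), i + maxPerCall);
        batches.emplace_back(pages.begin() + i, pages.begin() + end);
    }
    return batches;
}
```

For example, 15 pages with an assumed limit of 6 per read would give three batches of 6, 6 and 3 pages.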

Author

false commented Oct 28, 2019

OK, thanks a lot for this valuable information, it helped.

@false false closed this as completed Oct 28, 2019

2 participants