add memory snapshots #1820
Conversation
In unicorn.h we don't use uc_struct, but uc_engine.
Pretty nice work. For the API design, I have two ideas:
If we don't change the current signature of the
My approach was to use copy-on-write to implement snapshots, so making a snapshot is literally only incrementing a counter. On write operations it is then checked whether the priority of the
This doesn't work with this implementation. The problem is that dumps of the complete memory would be too big to be useful.
Some sort of combination with a context save is planned, but it's not that easy because the
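The counter-based CoW scheme described above can be sketched in a few lines of C. This is only a conceptual model under stated assumptions (a single page, a linear chain of snapshot levels, hypothetical names), not the PR's actual implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 0x1000

/* One CoW "layer" of a page: its data plus the snapshot level it was written at. */
typedef struct PageLayer {
    int level;                 /* snapshot level this copy belongs to */
    unsigned char data[PAGE_SIZE];
    struct PageLayer *below;   /* older copy shadowed by this one */
} PageLayer;

typedef struct {
    int level;        /* current snapshot level */
    PageLayer *top;   /* newest layer of this page */
} CowPage;

static CowPage *cow_page_new(void) {
    CowPage *p = calloc(1, sizeof(CowPage));
    p->top = calloc(1, sizeof(PageLayer)); /* level-0 baseline, zero-filled */
    return p;
}

/* Taking a snapshot is literally only incrementing a counter. */
static void cow_snapshot(CowPage *p) { p->level++; }

/* On write: if the top layer is older than the current level, copy it first. */
static void cow_write(CowPage *p, int off, unsigned char v) {
    if (p->top->level < p->level) {
        PageLayer *l = malloc(sizeof(PageLayer));
        memcpy(l->data, p->top->data, PAGE_SIZE);
        l->level = p->level;
        l->below = p->top;
        p->top = l;
    }
    p->top->data[off] = v;
}

static unsigned char cow_read(const CowPage *p, int off) {
    return p->top->data[off];
}

/* Restore: drop every layer newer than the target level. */
static void cow_restore(CowPage *p, int target_level) {
    while (p->top->level > target_level) {
        PageLayer *old = p->top;
        p->top = old->below;
        free(old);
    }
    p->level = target_level;
}
```

The key property is that a write only pays the copy cost once per page per snapshot level, and restoring simply pops the layers created after the snapshot.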
I got it. Then
I don't get it. Why not just track which memory regions we should restore? I mean, say:
Note this implies that "snapshots" must be organized as a tree structure (which is also a common design in hypervisors like VMware). That is to say, if we take a second snapshot s2 after s1, then when we restore s1, s2 should be invalidated.
To support this, the internal data structure of the memory would also need to be something that supports CoW (e.g. a Merkle tree). What could be done is to also add some serialization and combine it with the snapshot_level to copy only changed memory regions, but that is out of scope for this PR. The memory is internally stored in lists (QTAILQ), therefore I can't do CoW on the
In some way this is already done, but only internally. This stores the
What can be added is that
A bit more context on what we want to do with this feature: we would like to emulate a program until it reads an input, then make a snapshot (including a context save and saving the internal state of the OS emulation). After that, emulate until the next input, then check whether this has led to a new block; if not, roll back and test with a different input.
I got your point, and I forgot that memory regions are organized in lists, which is indeed not ideal for CoW.
You could assume a context is only restored with the same
Reference: 1044403
That sounds like what fuzzware once did and we would like to backport, and I perfectly understand your motivation. Could you try to merge this to
I will start reviewing the implementation details probably next week, sorry for the possible delay. XD
I have added this. If enabled, uc_context_save makes a snapshot. uc_context_restore will restore to the point directly after the snapshot was taken.
This sounds interesting. Is there some sort of overview of what they implemented and how?
Really nice, thanks!
They implemented CoW by manipulating memory protections, as explained in the file.
Force-pushed from c5f68ec to f0c6c89.
I think the code is complete now and works as expected. I'll write some tests and add more documentation in the next few days.
Found some issues with the alias handling, which need to be resolved before merging this. The problem is finding the region inside address_space_memory. A simple solution would be to disallow
Force-pushed from 59e9a16 to 90bf82e.
Sorry for the force push, but I had to do some rewriting. The snapshots still work the same way, but now
The snapshots should work as expected. I'll write some more tests and documentation.
Hi, just wanted to know if you have had time to look at the code, or when you will have time for it?
I've been sick recently and spend most of my time in bed. Sorry, I can't estimate when I could review this, but I will have another doctor's appointment soon.
@@ -1145,6 +1164,7 @@ void qemu_ram_free(struct uc_struct *uc, RAMBlock *block)
     QLIST_REMOVE_RCU(block, next);
     uc->ram_list.mru_block = NULL;
+    uc->ram_list.freed = true;
I didn't understand why a new field freed was added.
This is an optimization. With the benchmark we found that RAM write time increases with the number of snapshots. So I looked at this with a profiler and found that most time is spent in find_ram_offset(). This function searches for the best fit for a new RAMBlock in the ram_addr_t address space (a special internal address space for RAMBlocks). Because of the data structure this takes O(n^2).
When no RAMBlock has been freed in this address space, the new RAMBlock will simply be added at the end. ram_list.freed stores this information; based on this boolean, find_ram_offset_last() is called instead. This function skips the expensive search and just adds the RAMBlock at the end of the address space.
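A toy model may make the two code paths clearer. The function names mirror the ones discussed here, but the data structures are simplified stand-ins (a flat array instead of the real lists), so this is only a sketch of the idea, not the PR's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 64

/* Toy model of the ram_addr_t address space: blocks are (offset, size) pairs. */
typedef struct {
    uint64_t offset[MAX_BLOCKS], size[MAX_BLOCKS];
    int n;
    bool freed; /* set once any block was freed: gaps may now exist */
} RamSpace;

/* Slow path: best-fit search over all gaps, O(n^2) like the real
 * find_ram_offset (for every candidate gap start, scan for the next block). */
static uint64_t find_ram_offset(const RamSpace *rs, uint64_t sz) {
    uint64_t best = 0, best_gap = UINT64_MAX;
    for (int i = 0; i <= rs->n; i++) {
        /* candidate gap starts at 0 or right after an existing block */
        uint64_t cand = (i == rs->n) ? 0 : rs->offset[i] + rs->size[i];
        uint64_t next = UINT64_MAX;
        for (int j = 0; j < rs->n; j++)
            if (rs->offset[j] >= cand && rs->offset[j] < next)
                next = rs->offset[j];
        uint64_t gap = next - cand;
        if (gap >= sz && gap < best_gap) { best = cand; best_gap = gap; }
    }
    return best;
}

/* Fast path: nothing was ever freed, so there is no gap and the new
 * block can simply be appended at the end of the address space. */
static uint64_t find_ram_offset_last(const RamSpace *rs, uint64_t sz) {
    (void)sz;
    uint64_t end = 0;
    for (int i = 0; i < rs->n; i++)
        if (rs->offset[i] + rs->size[i] > end)
            end = rs->offset[i] + rs->size[i];
    return end;
}

static uint64_t ram_alloc(RamSpace *rs, uint64_t sz) {
    uint64_t off = rs->freed ? find_ram_offset(rs, sz)
                             : find_ram_offset_last(rs, sz);
    rs->offset[rs->n] = off;
    rs->size[rs->n] = sz;
    rs->n++;
    return off;
}

static void ram_free_at(RamSpace *rs, int i) {
    rs->offset[i] = rs->offset[--rs->n];
    rs->size[i] = rs->size[rs->n];
    rs->freed = true; /* gaps are possible from now on */
}
```

Since snapshots allocate many one-page blocks and never free them, the freed flag keeps every such allocation on the cheap append path.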
I read find_ram_offset again and thought about it a bit. Would it be possible to simply disable this fragmentation-avoidance optimization?
/* If it fits remember our place and remember the size
* of gap, but keep going so that we might find a smaller
* gap to fill so avoiding fragmentation.
*/
if (next - candidate >= size && next - candidate < mingap) {
offset = candidate;
mingap = next - candidate;
}
In other words:
/* If it fits remember our place and remember the size
* of gap, but keep going so that we might find a smaller
* gap to fill so avoiding fragmentation.
*/
if (next - candidate >= size && next - candidate < mingap) {
offset = candidate;
mingap = next - candidate;
break; // Stop searching.
}
Could you benchmark if this solution is fast enough?
As far as I understand, the ram_addr_t address space is an optimization for dirty memory tracking. The allocator avoids fragmentation to have less bookkeeping in the memory tracking. So changing this allocator to fragment more might influence execution speed in general, independent of the use of snapshots.
Could you benchmark if this solution is fast enough?
Will do, but my benchmark only benchmarks the snapshots, not the overall execution of a complex binary with dynamic maps/unmaps. So I can't tell if this change influences emulation speed.
Benchmarked it; it is about the same speed as without the optimization.
Looking at the code again, it's clear why: allowing fragmentation would only help if there were some free space to fragment. But in my case there is no free space, so it still needs to search for the block with the highest address. ram_list.blocks is ordered by decreasing block size, therefore the later one-page blocks created by snapshots are added at the end. So for my use case, allowing fragmentation does not help.
By the way, I'm actually curious why QEMU doesn't have this problem (see https://github.com/qemu/qemu/blob/154e3b61ac9cfab9639e6d6207a96fff017040fe/softmmu/physmem.c#L1433), because this optimization seems so straightforward and common. A possible explanation is that most QEMU system emulation cases don't have too many RAM blocks?
I also guess that having many RAM blocks is quite uncommon. Also, dynamically allocating many RAM blocks is not a thing in QEMU.
I carefully checked your code and I love the idea of utilizing the existing QEMU mechanisms to support this. However, here are some concerns I have come up with so far:
The
I have looked for a long time to find a solution to this problem. I haven't found a good one, so
Sorry for the force push; I wanted the CI to run and to remove the fixup commits.
That's exactly my concern, because the pointer will point to some outdated content. But I also don't see any obvious solution, so maybe we can only document this clearly.
What if we just drop all snapshots and unmap the region? Regarding uc_mem_protect, probably more cases have to be carefully handled.
This is kind of expected, because you can use the same pointer in two places with different writes (i.e. multiple mmap with MAP_PRIVATE). I'll write some examples and documentation for this.
For a full unmap it would be possible to unmap the region and store it inside the uc_struct together with the current level; on a restore, just remap the region. The problem is partial unmap. With the current way CoW is implemented, it would require iterating over the FlatView and copying the old region, in the worst case per page (except for unmapped parts). This could be done, but it would be too expensive for us. I have tried to implement this in a way without copying, so that unmap/protect without snapshots would be faster as well, but I have not found a sane way to do it.
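The full-unmap idea (store the region together with the current level, remap it on restore) might look roughly like this. All types and names here are hypothetical stand-ins, with a simple counter standing in for the real mapping machinery:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t base, size;
    int unmapped_at;   /* snapshot level at which the unmap happened */
} SavedRegion;

typedef struct {
    int level;         /* current snapshot level */
    SavedRegion saved[16];
    int n_saved;
    int mapped;        /* number of currently mapped regions (toy stand-in) */
} Uc;

static void uc_mem_unmap_toy(Uc *uc, uint64_t base, uint64_t size) {
    uc->mapped--;
    if (uc->level > 0) {
        /* keep the region around so a later restore can undo the unmap */
        uc->saved[uc->n_saved++] = (SavedRegion){base, size, uc->level};
    }
}

static void uc_restore_toy(Uc *uc, int target) {
    /* remap every region that was unmapped after the target snapshot */
    int kept = 0;
    for (int i = 0; i < uc->n_saved; i++) {
        if (uc->saved[i].unmapped_at > target) {
            uc->mapped++; /* remap */
        } else {
            uc->saved[kept++] = uc->saved[i];
        }
    }
    uc->n_saved = kept;
    uc->level = target;
}
```

This only covers the easy case of unmapping a whole region; the partial-unmap problem described above (splitting a region mid-snapshot) is exactly what this simple bookkeeping cannot express.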
I see, that makes sense, and my idea is:
So unmap is implemented. The hack for
For the types (
So I added some more tests and fixed some bugs; now unmap also works with snapshots. I'm not really happy with it, because it needs some dirty hacks. The read of the first subregion I can't fix. The hack with the snapshot_level at unmap can be fixed by using a new struct. All in all it's a lot of fiddling around with the memory regions. Also, I don't expect that this will be used very often.
To fix the merge conflicts I would like to rebase the branch (and force-push). This way I can also remove the fixup commits and update some commit messages.
By the way: can you add me to CREDITS.TXT?
We are fine with this, and please add a new line next time! Thanks for your contributions! ;)
Hi, just a friendly reminder for this PR. Or did I forget some change requests?
@@ -242,6 +244,10 @@ static uc_err uc_init(uc_engine *uc)
         uc->reg_reset(uc);
     }

+    uc->context_content = UC_CTL_CONTEXT_CPU;
+
+    uc->unmapped_regions = g_array_new(false, false, sizeof(MemoryRegion*));
I fail to find the location where this memory is freed. Am I missing something?
No, I missed the free.
The PR looks pretty good to me and is ready to go! Just a few minor changes left to be resolved. Thanks!
Did I miss something, or is everything now ready to go?
Really sorry, I thought I had already replied to you. This PR is ready to go; just fixing the conflicting files should be fine.
first version has bugs
Uses copy-on-write to make it possible to restore the memory state after a snapshot was made. To restore, all MemoryRegions created after the snapshot are removed.
simple benchmark for the snapshots
The ram_offset allocator searches for the smallest gap in the ram_offset address space. This is slow, especially in combination with many allocations (i.e. snapshots). When it is known that there is no gap, this is now optimized.
still has todos and need tests
Great, thanks!
Thanks
Hi,
To implement a fuzzer based on unicorn we would like to have the ability to make snapshots of the memory. This is the current version of my changes. There are some corner cases I have to check and think about how to best handle them.
The basic idea is to use the priority of MemoryRegion and overlap multiple MemoryRegions based on the current snapshot level. On a write (after a snapshot) a new MemoryRegion will be created and mapped with the priority of the current snapshot level. To restore a snapshot, the MemoryRegions with higher priority (created after the snapshot) will be unmapped.
This also includes an updated version of split_region which works with aliases. This allows unmapping parts of a mapping or changing priority without copying.
The implementation only adds a few zero-checks when snapshots are not used.
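The priority-based overlap described in this summary can be sketched as follows. This is a conceptual model only (one byte of payload per region, hypothetical names), not the actual QEMU MemoryRegion machinery:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REGIONS 16

typedef struct {
    uint64_t base, size;
    int priority;   /* = snapshot level the region was created at */
    uint8_t byte;   /* toy payload: one byte for the whole region */
} Region;

typedef struct {
    Region r[MAX_REGIONS];
    int n;
    int level;      /* current snapshot level */
} AddressSpace;

static void as_snapshot(AddressSpace *as) { as->level++; }

/* A write after a snapshot creates a new overlapping region whose
 * priority is the current snapshot level. */
static void as_write(AddressSpace *as, uint64_t addr, uint8_t v) {
    for (int i = 0; i < as->n; i++) {
        Region *g = &as->r[i];
        if (addr >= g->base && addr < g->base + g->size &&
            g->priority == as->level) {
            g->byte = v;
            return;
        }
    }
    as->r[as->n++] = (Region){addr & ~0xfffULL, 0x1000, as->level, v};
}

/* Reads resolve to the overlapping region with the highest priority,
 * mirroring QEMU's rule that higher-priority regions shadow lower ones. */
static uint8_t as_read(const AddressSpace *as, uint64_t addr) {
    int best = -1;
    for (int i = 0; i < as->n; i++) {
        const Region *g = &as->r[i];
        if (addr >= g->base && addr < g->base + g->size &&
            (best < 0 || g->priority > as->r[best].priority))
            best = i;
    }
    return best < 0 ? 0 : as->r[best].byte;
}

/* Restore: unmap every region created after the target snapshot level. */
static void as_restore(AddressSpace *as, int target) {
    int kept = 0;
    for (int i = 0; i < as->n; i++)
        if (as->r[i].priority <= target)
            as->r[kept++] = as->r[i];
    as->n = kept;
    as->level = target;
}
```

Because the shadow regions are ordinary mapped regions, reads and writes without any snapshot taken go through the normal single-region path, which is why the cost when snapshots are unused stays near zero.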