
add memory snapshots #1820

Merged: 8 commits merged into unicorn-engine:dev on Jul 11, 2023

Conversation

@PhilippTakacs

Hi,

To implement a fuzzer based on unicorn we would like to have the ability to make snapshots of the memory. This is the current version of my changes. There are some corner cases I still have to check and think about how best to handle.

The basic idea is to use the priority of MemoryRegion and overlap multiple MemoryRegions based on the current snapshot level. On a write (after a snapshot) a new MemoryRegion is created and mapped with the priority of the current snapshot level. To restore a snapshot, the MemoryRegions created after the snapshot (those with higher priority) are unmapped.

This also includes an updated version of split_region that works with aliases. This allows unmapping parts of a mapping or changing its priority without copying.

When snapshots are not used, the implementation only adds a few checks against zero.

@aquynh
Member

aquynh commented Apr 5, 2023 via email

@wtdcode
Member

wtdcode commented Apr 6, 2023

Pretty nice work. For API design, I have two ideas:

1. mimic the `uc_context` API:

```c
uc_err uc_alloc_snapshot(uc_engine* uc, uc_snapshot** snapshot);
uc_err uc_take_snapshot(uc_engine* uc, uc_snapshot* snapshot);
uc_err uc_restore_snapshot(uc_engine* uc, uc_snapshot* snapshot);
uc_err uc_free_snapshot(uc_engine* uc, uc_snapshot* snapshot);
```

2. merge into the `uc_context` API.

If we don't change the current signature of `uc_context`, we could internally first add a new `uc_ctl` to indicate the contents a context will save, say CPU, memory, or CPU+memory. That is to say, `uc_context_save` would save different contents depending on this `uc_ctl`. Note we would record what we have saved (CPU, memory, or both) in the `uc_context` struct so that when we restore or free the context, we know what we should overwrite.
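For the second idea, a minimal sketch of what such a switch might look like. `UC_CTL_CONTEXT_CPU` is the name that later appears in this PR's diff hunks; `UC_CTL_CONTEXT_MEMORY` and the struct field are assumptions here:

```c
/* Hypothetical sketch of a context-content switch (names partly assumed). */
typedef enum uc_context_content {
    UC_CTL_CONTEXT_CPU    = 1,   /* save CPU registers, as today */
    UC_CTL_CONTEXT_MEMORY = 2,   /* additionally snapshot guest memory */
} uc_context_content;

/* uc_context_save() would consult this setting and record in the uc_context
 * what was actually captured, so that restore/free know what to overwrite. */
```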

@PhilippTakacs
Author

PhilippTakacs commented Apr 6, 2023

My approach was to use copy-on-write (CoW) to implement snapshots, so making a snapshot is literally only incrementing a counter. On a write it is then checked whether the priority of the MemoryRegion is smaller than the current snapshot_level; when this is the case, a new MemoryRegion is created and mapped at that address with a higher priority. To restore a snapshot, the MemoryRegions are filtered by priority and the snapshot_level is decreased. So the snapshot_level can only be increased to take a snapshot and decreased to restore the last snapshot.
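A self-contained toy model of this scheme (illustration only, not the actual unicorn/QEMU code):

```c
#include <stdio.h>

/* Each "mapping" carries the snapshot level it was created at as its
 * priority; the entry with the highest priority is the visible one. */
typedef struct { int priority; char data; } Region;

int main(void) {
    Region overlay[8];
    int top = 0, level = 0;

    overlay[top++] = (Region){0, 'A'};        /* initial mapping, level 0 */

    level++;                                  /* take a snapshot: bump the counter */

    if (overlay[top - 1].priority < level)    /* write after snapshot: CoW */
        overlay[top++] = (Region){level, 'B'};
    printf("visible after write:   %c\n", overlay[top - 1].data);  /* B */

    /* restore: drop regions with priority >= level, then decrease the level */
    while (top > 1 && overlay[top - 1].priority >= level)
        top--;
    level--;
    printf("visible after restore: %c\n", overlay[top - 1].data);  /* A */
    return 0;
}
```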

> For API design, I have two ideas:
>
> 1. mimic `uc_context` API

This doesn't work with this implementation. The problem is that dumps of the complete memory would be too big to be useful.

> 2. merge into `uc_context` API.

Some sort of combination with a context save is planned, but it's not that easy, because uc_context and snapshots work differently. You can have multiple uc_context objects, but snapshots can only be increased and decreased.

@wtdcode
Member

wtdcode commented Apr 7, 2023

> My approach was to use copy-on-write (CoW) to implement snapshots, so making a snapshot is literally only incrementing a counter. On a write it is then checked whether the priority of the MemoryRegion is smaller than the current snapshot_level; when this is the case, a new MemoryRegion is created and mapped at that address with a higher priority. To restore a snapshot, the MemoryRegions are filtered by priority and the snapshot_level is decreased. So the snapshot_level can only be increased to take a snapshot and decreased to restore the last snapshot.

I got it. Then:

> > 1. mimic `uc_context` API
>
> This doesn't work with this implementation. The problem is that dumps of the complete memory would be too big to be useful.

I can't quite get this part. Why not just track which memory regions we should restore? I mean, say:

```c
uc_alloc_snapshot(uc, &s1);  // also save this snapshot pointer in the UC instance
uc_take_snapshot(uc, s1);    // point s1 to the current memory regions

uc_mem_write(uc, ...);       // CoW overlays a new memory region

uc_restore_snapshot(uc, s1); // s1 points to all previous memory regions, so we
                             // can remove all regions overlaid since then
```

Note this implies that "snapshots" must be organized as a tree structure (which is also a common design in hypervisors like VMware). That is to say, if we take a second snapshot s2 after s1, then restoring s1 should invalidate s2.

> > 2. merge into `uc_context` API.
>
> Some sort of combination with a context save is planned, but it's not that easy, because uc_context and snapshots work differently. You can have multiple uc_context objects, but snapshots can only be increased and decreased.

@PhilippTakacs
Author

To support this, the internal data structure of the memory would also need to be something that supports CoW (e.g. a Merkle tree). What could be done is to additionally add some serialization and combine it with the snapshot_level so that only changed memory regions are copied. But this is out of scope for this PR.

The memory is internally stored in lists (QTAILQ), therefore I can't do CoW on the AddressSpace/MemoryRegion. But I can use the priority feature to have one MemoryRegion override another without touching the data. This way, however, the "snapshot" is only stored internally and can only be increased and decreased. What I could add is a way to increase or decrease the level by more than one step, but this leads to some corner cases I don't know how to handle, so I haven't implemented it.

> Why not just track which memory regions we should restore

In some way this is already done, but only internally: the snapshot_level is stored as the priority of the MemoryRegion. So a rollback just removes the regions with a priority higher than or equal to the current level and then decreases the level. This comes more or less for free, because the priority is also used to decide which region is active.

What could be added is that context_save also makes a snapshot, and context_restore restores it. But this only works with the same uc_engine object.

A bit more context on what we want to do with this feature: we would like to emulate a program until it reads an input, then make a snapshot (including a context save and saving the internal state of the OS emulation). After that we emulate until the next input, then check whether this has led to a new block; if not, we roll back and test with a different input.

@wtdcode
Member

wtdcode commented Apr 8, 2023

> To support this, the internal data structure of the memory would also need to be something that supports CoW (e.g. a Merkle tree). [...]
>
> The memory is internally stored in lists (QTAILQ), therefore I can't do CoW on the AddressSpace/MemoryRegion. [...]

I got your point, and I forgot that memory regions are organized in lists, which is indeed not ideal for CoW.

> > Why not just track which memory regions we should restore
>
> In some way this is already done, but only internally: the snapshot_level is stored as the priority of the MemoryRegion. [...]
>
> What could be added is that context_save also makes a snapshot, and context_restore restores it. But this only works with the same uc_engine object.

You could assume a context is only restored with the same uc_engine object. Or we could first record all snapshots (contexts) we have saved and check against them when a context is going to be restored. Note this was once implemented as a workaround to support nested uc_emu_start but was soon removed.

Reference: 1044403

> A bit more context on what we want to do with this feature: we would like to emulate a program until it reads an input, then make a snapshot [...]. After that we emulate until the next input, then check whether this has led to a new block; if not, we roll back and test with a different input.

That sounds like what fuzzware once did and what we would like to backport, and I perfectly understand your motivation. Could you try to merge this into uc_context_save and provide a switch as a uc_ctl?

I will probably start to review the implementation details next week; sorry for the possible delay. XD

@PhilippTakacs
Author

> Could you try to merge this into uc_context_save and provide a switch as a uc_ctl?

I have added this. If enabled, uc_context_save makes a snapshot, and uc_context_restore restores to the point directly after the snapshot was taken.
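A minimal usage sketch of the resulting workflow. Only `UC_CTL_CONTEXT_CPU` is visible in the diff hunks quoted later in this thread; `UC_CTL_CONTEXT_MODE`, `UC_CTL_CONTEXT_MEMORY`, and the exact control call are assumptions following unicorn 2's `uc_ctl` conventions:

```c
#include <unicorn/unicorn.h>
#include <stdio.h>

int main(void) {
    uc_engine *uc;
    uc_context *ctx;
    char buf[4] = {0};

    uc_open(UC_ARCH_X86, UC_MODE_64, &uc);
    uc_mem_map(uc, 0x1000, 0x1000, UC_PROT_ALL);

    /* Ask uc_context_save to snapshot memory in addition to CPU state
     * (control and flag names assumed, see lead-in). */
    uc_ctl(uc, UC_CTL_WRITE(UC_CTL_CONTEXT_MODE, 1),
           UC_CTL_CONTEXT_CPU | UC_CTL_CONTEXT_MEMORY);

    uc_mem_write(uc, 0x1000, "old", 4);
    uc_context_alloc(uc, &ctx);
    uc_context_save(uc, ctx);            /* snapshot taken here */

    uc_mem_write(uc, 0x1000, "new", 4);  /* goes into a CoW overlay */
    uc_context_restore(uc, ctx);         /* overlay dropped again */

    uc_mem_read(uc, 0x1000, buf, 4);
    printf("%s\n", buf);                 /* prints "old" */

    uc_context_free(ctx);
    uc_close(uc);
    return 0;
}
```

Presumably, with the switch off (the default), uc_context_save behaves exactly as before.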

> That sounds like what fuzzware once did and what we would like to backport

This sounds interesting. Is there some sort of overview of what they implemented and how?

@wtdcode
Member

wtdcode commented Apr 11, 2023

> > Could you try to merge this into uc_context_save and provide a switch as a uc_ctl?
>
> I have added this. If enabled, uc_context_save makes a snapshot, and uc_context_restore restores to the point directly after the snapshot was taken.

Really nice, thanks!

> > That sounds like what fuzzware once did and what we would like to backport
>
> This sounds interesting. Is there some sort of overview of what they implemented and how?

Check this: https://github.com/fuzzware-fuzzer/fuzzware-emulator/blob/20a3deb9227606f2efd88baa345bc3ad695d34a2/harness/fuzzware_harness/native/uc_snapshot.c

They implemented CoW by manipulating memory protections, as explained in that file.
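For reference, a minimal self-contained sketch of that general technique on Linux (an illustration of the mprotect/fault-handler approach, not fuzzware's actual code): protect the snapshotted pages read-only, record each page on its first write fault, and copy those pages back on restore.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PAGES 4
#define PAGE  4096
static char *region;
static char backup[PAGES * PAGE];
static int dirty[PAGES];

/* First write to a protected page: mark it dirty, unprotect it, retry.
 * (A real implementation must check the fault lies inside the region.) */
static void on_fault(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    long page = ((char *)si->si_addr - region) / PAGE;
    dirty[page] = 1;
    mprotect(region + page * PAGE, PAGE, PROT_READ | PROT_WRITE);
}

static void take_snapshot(void) {
    memcpy(backup, region, sizeof(backup));
    mprotect(region, PAGES * PAGE, PROT_READ);  /* trap the next write per page */
}

static void restore_snapshot(void) {
    for (int i = 0; i < PAGES; i++)
        if (dirty[i]) {
            memcpy(region + i * PAGE, backup + i * PAGE, PAGE);
            dirty[i] = 0;
        }
    mprotect(region, PAGES * PAGE, PROT_READ);  /* re-arm the write traps */
}

int main(void) {
    struct sigaction sa = { .sa_sigaction = on_fault, .sa_flags = SA_SIGINFO };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    region = mmap(NULL, PAGES * PAGE, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    region[0] = 'A';
    take_snapshot();
    region[0] = 'B';                   /* faults once, page gets recorded */
    restore_snapshot();
    printf("%c\n", region[0]);         /* back to 'A' */
    return 0;
}
```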

@PhilippTakacs PhilippTakacs marked this pull request as ready for review April 17, 2023 14:52
@PhilippTakacs
Author

I think the code is complete now and works as expected.

I'll write some tests and add more documentation in the next few days.

@PhilippTakacs PhilippTakacs marked this pull request as draft April 24, 2023 11:22
@PhilippTakacs
Author

Found some issues with the alias handling, which need to be resolved before merging this.

The problem is finding the region inside address_space_memory. A simple solution would be to disallow uc_mem_unmap/uc_mem_protect in combination with snapshots.

@PhilippTakacs
Author

Sorry for the force push, but I had to do some rewriting.

The snapshots still work the same way, but uc_mem_unmap and uc_mem_protect can now not be used in combination with snapshots. I don't think this is a big issue, because with the vTLB and the TLB hook, changing protections and unmapping are still possible.

The snapshots should work as expected. I'll write some more tests and documentation.

@PhilippTakacs PhilippTakacs marked this pull request as ready for review April 27, 2023 14:19
@PhilippTakacs
Author

> I will probably start to review the implementation details next week; sorry for the possible delay. XD

Hi, just wanted to ask whether you have had time to look at the code yet, or when you will have time for it?

@wtdcode
Member

wtdcode commented May 15, 2023

> > I will probably start to review the implementation details next week; sorry for the possible delay. XD
>
> Hi, just wanted to ask whether you have had time to look at the code yet, or when you will have time for it?

I have been sick recently and spend most of my time in bed. Sorry, I can't estimate when I will be able to review this, but I have another doctor's appointment coming up soon.

Review thread on qemu/exec.c:
```diff
@@ -1145,6 +1164,7 @@ void qemu_ram_free(struct uc_struct *uc, RAMBlock *block)

     QLIST_REMOVE_RCU(block, next);
     uc->ram_list.mru_block = NULL;
+    uc->ram_list.freed = true;
```
Member

I didn't understand why a new field freed was added.

Author

This is an optimization. With the benchmark we found that RAM write time increases with the number of snapshots. I looked at this with a profiler and found that most time is spent in find_ram_offset(). This function searches for the best fit for a new RAMBlock in the ram_addr_t address space (a special internal address space for RAMBlocks). Because of the data structure, this takes O(n^2).

When no RAMBlock has been freed in this address space, the new RAMBlock can simply be added at the end; ram_list.freed stores this information. Based on this boolean, find_ram_offset_last() is called instead, which skips the expensive search and just adds the RAMBlock at the end of the address space.
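A sketch of what that fast path might look like (list macros and field names assumed, not the actual patch):

```c
/* With no gaps in the ram_addr_t space, the new block can go directly
 * after the highest existing end offset, skipping the O(n^2) gap search. */
static ram_addr_t find_ram_offset_last(struct uc_struct *uc, ram_addr_t size)
{
    RAMBlock *block;
    ram_addr_t end = 0;

    (void)size;  /* no gap to fit into; alignment handled by the caller */
    QLIST_FOREACH(block, &uc->ram_list.blocks, next) {
        if (block->offset + block->max_length > end)
            end = block->offset + block->max_length;
    }
    return end;
}
```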

Member

I read find_ram_offset again and thought about it a bit. Could we get the same effect by giving up the anti-fragmentation search instead?

```c
        /* If it fits remember our place and remember the size
         * of gap, but keep going so that we might find a smaller
         * gap to fill so avoiding fragmentation.
         */
        if (next - candidate >= size && next - candidate < mingap) {
            offset = candidate;
            mingap = next - candidate;
        }
```

In other words:

```c
        /* If it fits remember our place and remember the size
         * of gap, but keep going so that we might find a smaller
         * gap to fill so avoiding fragmentation.
         */
        if (next - candidate >= size && next - candidate < mingap) {
            offset = candidate;
            mingap = next - candidate;
            break; // Stop searching.
        }
```

Could you benchmark whether this solution is fast enough?

Author

As far as I understand, the ram_addr_t address space is an optimization for dirty memory tracking. The allocator avoids fragmentation to reduce the bookkeeping in the memory tracking, so changing the allocator to fragment more might influence execution speed in general, independent of the use of snapshots.

> Could you benchmark whether this solution is fast enough?

Will do, but my benchmark only measures the snapshots, not the overall execution of a complex binary with dynamic maps/unmaps. So I can't tell whether this change influences emulation speed.

Author

Benchmarked it; it is about the same speed as without the optimization.

Looking at the code again, it's clear why: breaking out early would only help if there were some free space to fragment. In my case there is no free space, so the search still has to walk to the block with the highest address. ram_list.blocks is ordered by decreasing block size, so the one-page blocks created by snapshots are added at the end. So for my use case, allowing fragmentation does not help.

Member

By the way, I'm actually curious why QEMU doesn't have this problem (see https://github.com/qemu/qemu/blob/154e3b61ac9cfab9639e6d6207a96fff017040fe/softmmu/physmem.c#L1433), because this optimization seems so straightforward and common. A possible explanation is that most QEMU system emulation cases don't have too many RAM blocks?

Author

I also guess that having many RAM blocks is quite uncommon; dynamically allocating many RAM blocks is not really a thing in QEMU.

Review thread on uc.c (outdated; resolved).
@wtdcode
Member

wtdcode commented May 19, 2023

I carefully checked your code and I love the idea of utilizing existing QEMU mechanisms to support this. However, I have come up with some concerns so far:

  • compatibility with uc_mem_map_ptr, e.g. will we invalidate the pointer?
  • API design, see comments above
  • what happens if the user unmaps regions while snapshot_level > 0?

@PhilippTakacs
Author

PhilippTakacs commented May 22, 2023

> * compatibility with `uc_mem_map_ptr`, e.g. will we invalidate the pointer?

uc_mem_map_ptr will work as before. The change is that on CoW a new RAMBlock is allocated and mapped overlapping, so without snapshots everything behaves the same. Only after a snapshot has been taken will writes through unicorn no longer reach the pointer, and reads from outside might not correspond to the current state.
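To make the caveat concrete, a small illustration (it reuses the snapshot-enabling control sketched earlier, whose exact names are assumptions):

```c
#include <unicorn/unicorn.h>
#include <stdio.h>

int main(void) {
    uc_engine *uc;
    uc_context *ctx;
    static char host[0x1000];            /* host buffer backing guest memory */

    uc_open(UC_ARCH_X86, UC_MODE_64, &uc);
    uc_mem_map_ptr(uc, 0x1000, 0x1000, UC_PROT_ALL, host);

    /* enable memory snapshots for contexts (control names assumed) */
    uc_ctl(uc, UC_CTL_WRITE(UC_CTL_CONTEXT_MODE, 1),
           UC_CTL_CONTEXT_CPU | UC_CTL_CONTEXT_MEMORY);
    uc_context_alloc(uc, &ctx);
    uc_context_save(uc, ctx);            /* snapshot: level goes up */

    uc_mem_write(uc, 0x1000, "X", 1);    /* goes into a CoW RAMBlock, not host */
    printf("host[0] = %d\n", host[0]);   /* still 0: host no longer tracks guest */

    uc_context_restore(uc, ctx);         /* overlay dropped; they agree again */
    uc_context_free(ctx);
    uc_close(uc);
    return 0;
}
```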

> * what if user unmaps the regions with `snapshot_level > 0`?

I have looked for a long time for a solution to this problem and haven't found a good one, so uc_mem_unmap (edit: same for uc_mem_protect) will just return an error if snapshot_level > 0. Unmapping is still possible through the TLB, so I would consider this nice to have but not important.

@PhilippTakacs
Author

Sorry for the force push; I wanted the CI to run and to remove the fixup commits.

@wtdcode
Member

wtdcode commented May 23, 2023

> > * compatibility with `uc_mem_map_ptr`, e.g. will we invalidate the pointer?
>
> uc_mem_map_ptr will work as before. [...] Only after a snapshot has been taken will writes through unicorn no longer reach the pointer, and reads from outside might not correspond to the current state.

That's exactly my concern, because the pointer will point to some outdated content. But I also don't see any obvious solution, so maybe we can only document this clearly.

> > * what if user unmaps the regions with `snapshot_level > 0`?
>
> I have looked for a long time for a solution to this problem and haven't found a good one, so uc_mem_unmap (edit: same for uc_mem_protect) will just return an error if snapshot_level > 0. [...]

What if we just drop all snapshots and unmap the region? Regarding uc_mem_protect, probably more cases have to be handled carefully.

@PhilippTakacs
Author

> That's exactly my concern, because the pointer will point to some outdated content. But I also don't see any obvious solution, so maybe we can only document this clearly.

This is kind of expected, because you can use the same pointer in two places with different writes (e.g. multiple mmaps with MAP_PRIVATE). I'll write some examples and documentation for this.

> What if we just drop all snapshots and unmap the region? Regarding uc_mem_protect, probably more cases have to be handled carefully.

For a full unmap it would be possible to unmap the region and store it inside uc_struct together with the current level; on restore, the region is simply remapped. The problem is partial unmap: with the current CoW implementation it would require iterating over the FlatView and copying the old region, in the worst case per page (except for the unmapped parts). This could be done, but it would be too expensive for us.

I have tried to implement this in a way without copying, so that unmap/protect without snapshots would be faster as well, but I have not found a sane way to do it.

@wtdcode
Member

wtdcode commented May 26, 2023

> > What if we just drop all snapshots and unmap the region? [...]
>
> For a full unmap it would be possible to unmap the region and store it inside uc_struct together with the current level; on restore, the region is simply remapped. The problem is partial unmap: with the current CoW implementation it would require iterating over the FlatView and copying the old region, in the worst case per page (except for the unmapped parts). This could be done, but it would be too expensive for us.

I see, that makes sense. My idea is:

  • If it's a full unmap, we do everything nicely: unmap the region and map it again when restoring the snapshot later on (a sketch of this bookkeeping follows below).
  • If it's a partial unmap, return an error to the user, as you previously proposed. This is fine as long as we document it.
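A rough sketch of the full-unmap bookkeeping this describes. The merged PR does store unmapped regions in the `uc->unmapped_regions` GArray visible in a later diff hunk, but the helper and the remaining field names here are assumed:

```c
/* On uc_mem_unmap while snapshot_level > 0: detach the region but keep it,
 * tagged with the current level, so a restore can remap it later. */
static void snapshot_unmap_region(struct uc_struct *uc, MemoryRegion *mr)
{
    mr->priority = uc->snapshot_level;           /* remember the unmap level */
    memory_region_del_subregion(uc->system_memory, mr);
    g_array_append_val(uc->unmapped_regions, mr);
}
```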

@PhilippTakacs
Author

So unmap is implemented. The hack for uc->mapped_blocks bugs me a bit, but making this clean would require a bigger redesign.

For the types (mr->addr and mr->size) I'm currently looking for a solution. Next I'll add some tests, samples, and documentation.

@PhilippTakacs
Author

Added some more tests and fixed some bugs; unmap now also works with snapshots. I'm not really happy with it, because it needs some dirty hacks. The read of the first subregion I can't fix; the hack with the snapshot_level at unmap could be fixed by using a new struct. All in all it's a lot of fiddling around with the memory regions, and I don't expect this to be used very often.

To fix the merge conflicts I would like to rebase the branch (and force-push). This way I can also remove the fixup commits and update some commit messages.

By the way: can you add me to CREDITS.TXT?

@wtdcode
Member

wtdcode commented Jun 12, 2023

> By the way: can you add me to CREDITS.TXT?

We are fine with this, and please add a trailing newline next time! Thanks for your contributions! ;)

@PhilippTakacs
Author

Hi, just a friendly reminder about this PR. Or did I forget some change requests?

```diff
@@ -242,6 +244,10 @@ static uc_err uc_init(uc_engine *uc)
         uc->reg_reset(uc);
     }

+    uc->context_content = UC_CTL_CONTEXT_CPU;
+
+    uc->unmapped_regions = g_array_new(false, false, sizeof(MemoryRegion*));
```
Member

I can't find the location where this memory is freed. Am I missing something?

Author

No, I missed the free.
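Presumably the fix amounts to releasing the array on engine teardown, along the lines of the following sketch (the exact location in the teardown path is an assumption):

```c
/* In the engine teardown path: free the GArray container as well. */
g_array_free(uc->unmapped_regions, true);
```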

@wtdcode
Member

wtdcode commented Jun 28, 2023

The PR looks pretty good to me and is ready to go! Just a few minor points left to be resolved.

Thanks!

@PhilippTakacs
Author

> The PR looks pretty good to me and is ready to go! Just a few minor points left to be resolved.

Did I miss something, or is everything ready to go now?

@wtdcode
Member

wtdcode commented Jul 11, 2023

> > The PR looks pretty good to me and is ready to go! Just a few minor points left to be resolved.
>
> Did I miss something, or is everything ready to go now?

Really sorry, I thought I had already replied to you. This PR is ready to go; just fixing the conflicting files should be fine.

Commit messages from this PR:

  • Uses copy-on-write to make it possible to restore the memory state after a snapshot was made. To restore, all MemoryRegions created after the snapshot are removed.
  • Simple benchmark for the snapshots.
  • The ram_offset allocator searches for the smallest gap in the ram_offset address space. This is slow, especially in combination with many allocations (i.e. snapshots). When it is known that there is no gap, this is now optimized.
  • Still has todos and needs tests.
@wtdcode wtdcode merged commit e88264c into unicorn-engine:dev Jul 11, 2023
27 checks passed
@wtdcode
Member

wtdcode commented Jul 11, 2023

Great thanks!

@PhilippTakacs
Author

Thanks
