Optimize memory usage when scanning processes and files #33
Comments
From plus...@gmail.com on December 22, 2011 02:06:05: YARA uses memory-mapped files. If the scanned file is big, your virtual memory usage will grow a lot, but that doesn't mean your physical memory usage grows by the same amount; that depends on the available RAM and the file size, and the operating system will do its magic to evict from physical memory the portions of the file that are not in use. Process scanning is done in chunks, but the chunks can be big if the target process has big blocks of contiguous memory allocated. I think there is some room for improvement here.
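A minimal sketch of the file-scanning side described here, not YARA's actual code: the mapping reserves virtual address space for the whole file, while the kernel pages the contents in and out on demand.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the file read-only; the whole file occupies virtual address space,
 * but physical pages are only faulted in as the scanner touches them and
 * can be reclaimed by the OS under memory pressure. */
int scan_mapped_file(const char* path)
{
  struct stat st;
  int fd = open(path, O_RDONLY);

  if (fd < 0)
    return -1;

  if (fstat(fd, &st) != 0 || st.st_size == 0)
  {
    close(fd);
    return -1;
  }

  const unsigned char* data =
      mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

  if (data == MAP_FAILED)
  {
    close(fd);
    return -1;
  }

  /* ...scan data[0 .. st.st_size - 1] here... */

  munmap((void*) data, st.st_size);
  close(fd);
  return 0;
}
```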
I want to try a fix for process memory reading, basically adding a new section-by-section reader. The caller would call it to read one section of the target's memory at a time. When the caller is done reading the section, it would have to call a release function to free that section's buffer. The benefit would be a much smaller memory footprint during process scanning at the expense of potential memory consistency. I need this behavior because the current memory characteristics will not work for the environment we want to use YARA in.
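The specifics of the proposed reader are truncated above; based on the description and the "SectionReader" name used in the reply below, the intended shape is roughly the following. All names here are hypothetical, not YARA APIs.

```c
#include <stddef.h>

/* Hypothetical interface: read a process one memory section at a time,
 * keeping only the current section in memory. */
typedef struct SECTION
{
  const unsigned char* data;  /* contents of one region of the target */
  size_t size;
  size_t base;                /* base address of the region in the target */
} SECTION;

typedef struct SECTION_READER SECTION_READER;

SECTION_READER* section_reader_open(int pid);           /* attach to the target */
SECTION* section_reader_next(SECTION_READER* reader);   /* NULL when done */
void section_reader_release(SECTION_READER* reader,     /* free the section's buffer */
                            SECTION* section);
void section_reader_close(SECTION_READER* reader);

/* Intended usage: only one section's worth of data is resident at a time,
 * at the cost of the sections not forming a consistent point-in-time snapshot. */
static void scan_process(int pid)
{
  SECTION_READER* reader = section_reader_open(pid);
  SECTION* section;

  if (reader == NULL)
    return;

  while ((section = section_reader_next(reader)) != NULL)
  {
    /* ...scan section->data[0 .. section->size - 1]... */
    section_reader_release(reader, section);
  }

  section_reader_close(reader);
}
```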
YARA does seem to be using memory-mapped files for OSX and Linux, but the Windows implementation, which uses ReadProcessMemory, does result in a huge increase in physical memory usage if I'm not mistaken. Good luck, Kallenreed. This would be hugely helpful for me as well. I had started coding a POC implementation similar to what you've described for Windows. So far, my solution was to no longer read data into each block in yr_process_get_memory and instead just stuff the process handle into the data field of the block structure. Then, when iterating over blocks in yr_rules_scan_mem_blocks, I can call ReadProcessMemory to load the data for that one block, scan it, release it, and move on to the next block. Your implementation using a SectionReader is much cleaner though; let me know if there is any way I can help!
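A rough sketch of that per-block approach on Windows, with placeholder types rather than YARA's block structure: carry the process handle instead of a data pointer and call ReadProcessMemory only when a block is about to be scanned.

```c
#include <windows.h>
#include <stdlib.h>

/* Placeholder block descriptor: it carries the process handle and the
 * block's location instead of a pointer to already-read data. */
typedef struct PROCESS_BLOCK
{
  HANDLE process;
  LPCVOID base;
  SIZE_T size;
} PROCESS_BLOCK;

/* Read one block's bytes right before scanning it; the caller frees the
 * buffer as soon as the block has been scanned, so only one block's worth
 * of data is resident at a time. */
static unsigned char* read_block(const PROCESS_BLOCK* block)
{
  SIZE_T bytes_read = 0;
  unsigned char* buffer = (unsigned char*) malloc(block->size);

  if (buffer == NULL)
    return NULL;

  if (!ReadProcessMemory(
          block->process, block->base, buffer, block->size, &bytes_read))
  {
    free(buffer);
    return NULL;
  }

  return buffer;
}
```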
Yeah, you don't really want to leak the platform-specific code into the rules handler. I'll give my design a try this weekend and see if I can make it work for all three platforms. I don't have a Linux build machine handy, so I'll need some help testing once I get a PR ready.
I have a POC that I'd like some feedback on. The gist is that the section maps are loaded all at once, and then sections are read and processed one at a time. I scanned notepad on Windows and the memory usage went from 90MB to 3MB. It's currently only implemented for Windows, but I think the pattern will work for BSD/Linux. Error handling isn't all there and the code's not cleaned up, but you can get a sense of where it's going. My main concern is how the scan context is handled when sections are scanned one at a time.
@plusvic I'd like to know if I should continue down this path and finish this up.
This solution is not correct: you are calling yr_rules_scan_mem_blocks once per block. If you have a rule like:

```
rule test {
  strings:
    $foo = "foo"
    $bar = "bar"
  condition:
    $foo and $bar
}
```

...and the strings "foo" and "bar" appear in different sections within the process address space, the rule won't match, because each block is scanned independently of the others. I think the way to go is replacing the loop in yr_rules_scan_mem_blocks:

```
while (block != NULL)
{
  // ...scan the block...
  block = block->next;
}
```

with this:

```
block = _yr_get_first_block(...);

while (block != NULL)
{
  // ...scan the block...
  block = _yr_get_next_block(...);
}
```

Instead of passing a linked list containing all the blocks to yr_rules_scan_mem_blocks, you would pass something that lets it retrieve the blocks one at a time.
I did point out the context problem in my comment above.
My initial thought was to move the context into yr_rules_scan_procs and then have a _yr_scan_mem_blocks overload that accepted a context. The only reason I liked this was that I considered it a smaller change. The block callback is a reasonable approach, but it will require updating more callers. I'll see what the code looks like. I did have another question re: BSD/Linux. It looks like it does allocate when reading process memory, but some of the comments above seemed to indicate otherwise. Is this change needed on those platforms as well?
POC with a block iterator concept. @plusvic This is basically the callback implementation: I added a new type, BLOCK_READER, that tracks the advancement of the pointer in the buffer linked list. This way the section reader and the block reader behave basically the same way from the caller's perspective. Also, it needed to return errors, so the callback sets the block through an out param. I wasn't really thrilled about the void* in the callback, but I don't know any other polymorphic idioms in C. Anyway, the basic concepts are all there. Should I spend the time to finish it? Any major concerns about the design or names?
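To illustrate the out-param-plus-error-code idiom being described (placeholder names only; this is not the POC's exact code):

```c
#include <stddef.h>

typedef struct BLOCK
{
  const unsigned char* data;
  size_t size;
} BLOCK;

/* Returns 0 on success; on success *out is the next block, or NULL once
 * the reader is exhausted. 'context' is whatever state the concrete
 * reader needs (a buffer list, a process handle, ...). */
typedef int (*GET_NEXT_BLOCK_FUNC)(void* context, BLOCK** out);

static int scan_all_blocks(GET_NEXT_BLOCK_FUNC get_next, void* context)
{
  BLOCK* block = NULL;
  int result;

  /* Stop on the first error or when the reader reports no more blocks. */
  while ((result = get_next(context, &block)) == 0 && block != NULL)
  {
    /* ...scan block->data[0 .. block->size - 1]... */
  }

  return result;
}
```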
This is getting better, but I still have a few things to point out. Why not something like:

```
YR_API int yr_rules_scan_mem_blocks(
    YR_RULES* rules,
    YR_BLOCK_ITERATOR* iterator,
    ...)
```

And YR_BLOCK_ITERATOR looks like this:

```
typedef struct _YR_BLOCK_ITERATOR
{
  void* context;
  YR_MEMORY_BLOCK* (*first)(struct _YR_BLOCK_ITERATOR* self);
  YR_MEMORY_BLOCK* (*next)(struct _YR_BLOCK_ITERATOR* self);
} YR_BLOCK_ITERATOR;
```

yr_rules_scan_mem_blocks receives a pointer to a structure of this type. In yr_rules_scan_mem_blocks you just do:

```
block = iterator->first(iterator);

while (block != NULL)
{
  // ...scan the block...
  block = iterator->next(iterator);
}
```

While scanning a process you simply create a new iterator with a pair of first/next functions that retrieve the blocks from the target process. It would be very useful to provide a way of iterating over the blocks without loading their data:

```
block = iterator->first(iterator);

while (block != NULL)
{
  // ...use the block's base address and size only...
  block = iterator->next(iterator);
}
```

Here YR_MEMORY_BLOCK doesn't contain a pointer to the data; you retrieve the data on demand only when it's actually needed.
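A self-contained toy version of that iterator shape (the names approximate the comment above, not YARA's eventual API): the context points at a linked list and first/next walk it, while the scanning loop never sees the storage behind the iterator.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct MEMORY_BLOCK
{
  const unsigned char* data;
  size_t size;
  struct MEMORY_BLOCK* next;
} MEMORY_BLOCK;

typedef struct BLOCK_ITERATOR
{
  void* context;
  MEMORY_BLOCK* (*first)(struct BLOCK_ITERATOR* self);
  MEMORY_BLOCK* (*next)(struct BLOCK_ITERATOR* self);
} BLOCK_ITERATOR;

/* A linked-list-backed iterator: the context tracks the current position. */
typedef struct LIST_CONTEXT
{
  MEMORY_BLOCK* head;
  MEMORY_BLOCK* current;
} LIST_CONTEXT;

static MEMORY_BLOCK* list_first(BLOCK_ITERATOR* self)
{
  LIST_CONTEXT* ctx = (LIST_CONTEXT*) self->context;
  ctx->current = ctx->head;
  return ctx->current;
}

static MEMORY_BLOCK* list_next(BLOCK_ITERATOR* self)
{
  LIST_CONTEXT* ctx = (LIST_CONTEXT*) self->context;
  if (ctx->current != NULL)
    ctx->current = ctx->current->next;
  return ctx->current;
}

/* The scanning loop only sees the iterator, never the storage behind it. */
static void scan(BLOCK_ITERATOR* iterator)
{
  for (MEMORY_BLOCK* block = iterator->first(iterator);
       block != NULL;
       block = iterator->next(iterator))
  {
    printf("scanning block of %zu bytes\n", block->size);
  }
}

int main(void)
{
  unsigned char a[16] = {0}, b[32] = {0};
  MEMORY_BLOCK b2 = { b, sizeof(b), NULL };
  MEMORY_BLOCK b1 = { a, sizeof(a), &b2 };
  LIST_CONTEXT ctx = { &b1, NULL };
  BLOCK_ITERATOR it = { &ctx, list_first, list_next };
  scan(&it);
  return 0;
}
```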
A few thoughts, mostly about whether the iterator has to open the process and enumerate its VM sections for every block, and about how the caller gets (and releases) the data for a block.
Given the above, we probably need a couple of new types.
EDIT: I think I have an idea that will work. I'll send out an update soon.
You don't need to open the process for each block; you open the process once and keep it attached for the whole scan.
You can do the same thing here: you can pre-fetch the list of VM sections up front.
Yes, that's true.
We can use int fetch_block(char** data); this doesn't mean that the caller owns the returned buffer.
Yes, the idea is removing the "data" field from MEMORY_BLOCK and accessing the data on demand instead. We need a PROCESS_ITERATOR_CONTEXT for iterating over the process sections, holding the attached process context.
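One possible shape for that context on Windows, sketched with placeholder names rather than the actual patch: the process is attached once, the region list is gathered up front, and one reusable buffer backs whichever block is currently being fetched.

```c
#include <windows.h>
#include <stdlib.h>

typedef struct REGION
{
  LPCVOID base;
  SIZE_T size;
} REGION;

typedef struct PROCESS_ITERATOR_CONTEXT
{
  HANDLE process;        /* opened once for the whole scan */
  REGION* regions;       /* pre-fetched (e.g. with VirtualQueryEx) */
  size_t region_count;
  size_t current;        /* index of the region being scanned */
  unsigned char* buffer; /* reused for every fetch */
  SIZE_T buffer_size;
} PROCESS_ITERATOR_CONTEXT;

/* Fetch the bytes of the current region into the shared buffer, growing it
 * if needed. Returns NULL on failure. */
static const unsigned char* fetch_data(PROCESS_ITERATOR_CONTEXT* ctx)
{
  const REGION* region = &ctx->regions[ctx->current];
  SIZE_T bytes_read = 0;

  if (region->size > ctx->buffer_size)
  {
    unsigned char* grown = (unsigned char*) realloc(ctx->buffer, region->size);
    if (grown == NULL)
      return NULL;
    ctx->buffer = grown;
    ctx->buffer_size = region->size;
  }

  if (!ReadProcessMemory(
          ctx->process, region->base, ctx->buffer, region->size, &bytes_read))
    return NULL;

  return ctx->buffer;
}
```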
I started the impl of the design and ran into some issues, some of which you addressed above; I added some comments in the commit.
I was also considering a second type. Is block->data accessed directly anywhere else?
Yes, block->data is used in some modules and in exec.c, but in all cases it can be replaced with fetching the data on demand.
Sounds good. I'll give that a try.
Here's what the change to add the block iterator everywhere looks like. I tried to minimize the touch, but I had to add null checks after each fetch and (hopefully) do the right thing on failure. This does not include the new process reader yet, but you know how that's going to look. My only concern is that before, reading process memory was done once, at the expense of memory. Now, if there are multiple modules enabled, process memory will be read multiple times, adding to CPU usage. As you mentioned, not all modules actually read all blocks, so it may not be that bad. I'm also concerned by the size of this change. How are big changes normally tested?
@kallanreed sorry for not answering before, I've been very busy these days. Yes, I've found a problem that we haven't solved yet. The YR_MATCH structure has a field named data that points into the scanned block, so the matches reported to the callback can reference memory that has already been released. We need to solve that issue before merging these changes. I haven't thought about the best solution yet, but if you have any proposal please tell me.
So I'm clear, the bug is in scan_mem_blocks, where the block's buffer is released before the callback reports the matches. The problem is that the callback is passed a rule which contains a pointer to the freed block. I don't quite understand how the rule references the match (the arena?), but I think I understand the issue. I can repro it. Are there any other places the callback will be called with match data that I'm missing? I didn't think so after a quick check. @plusvic I have a few top-of-mind thoughts about possible fixes.
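A sketch of one general way to avoid the dangling pointer, with placeholder types and not necessarily how it was eventually fixed: copy the matched bytes when the match is recorded, so the report no longer depends on the block's buffer staying alive.

```c
#include <stdlib.h>
#include <string.h>

typedef struct MATCH
{
  size_t offset;        /* offset of the match within the scanned data */
  unsigned char* data;  /* private copy of the matched bytes */
  size_t data_length;
} MATCH;

static int record_match(
    MATCH* match,
    const unsigned char* block_data,
    size_t match_offset,
    size_t match_length,
    size_t max_copy)     /* cap the copy so huge matches stay cheap */
{
  size_t copy_length = match_length < max_copy ? match_length : max_copy;

  match->offset = match_offset;
  match->data = (unsigned char*) malloc(copy_length);

  if (match->data == NULL)
    return -1;

  /* The copy keeps the match valid after the block's buffer is freed. */
  memcpy(match->data, block_data + match_offset, copy_length);
  match->data_length = copy_length;
  return 0;
}
```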
This issue was solved. See #418. |
From zeroStei...@gmail.com on December 20, 2011 19:01:35
This is a feature request, not a bug.
It would be helpful if Yara would not load entire files into memory to scan them, but instead step through them with a sizeable overlap to avoid false negatives. It would also be very helpful if Yara used a similar chunking method to scan processes, because files can at least be scanned in chunks by the controlling process.
Basically, Yara's memory usage needs to have some limitation.
Original issue: http://code.google.com/p/yara-project/issues/detail?id=33
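A sketch of the chunk-plus-overlap idea from the request, with arbitrary sizes: each chunk re-reads the tail of the previous one, so any string shorter than the overlap that straddles a chunk boundary is still seen (matches inside the overlap would need de-duplication).

```c
#include <stdio.h>

#define CHUNK_SIZE (4 * 1024 * 1024)
#define OVERLAP    (64 * 1024)

static int scan_file_in_chunks(const char* path)
{
  static unsigned char buffer[CHUNK_SIZE];
  FILE* file = fopen(path, "rb");
  long offset = 0;

  if (file == NULL)
    return -1;

  for (;;)
  {
    if (fseek(file, offset, SEEK_SET) != 0)
      break;

    size_t bytes_read = fread(buffer, 1, sizeof(buffer), file);
    if (bytes_read == 0)
      break;

    /* ...scan buffer[0 .. bytes_read - 1], remembering 'offset' so reported
       match offsets are absolute within the file... */

    if (bytes_read < sizeof(buffer))
      break;  /* reached end of file */

    /* Advance by a full chunk minus the overlap, so strings spanning the
       boundary appear whole in the next chunk. */
    offset += (long)(sizeof(buffer) - OVERLAP);
  }

  fclose(file);
  return 0;
}
```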