Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/isf unification #630

Merged
merged 19 commits into from
Jul 20, 2022
Merged

Feature/isf unification #630

merged 19 commits into from
Jul 20, 2022

Conversation

ikelos
Copy link
Member

@ikelos ikelos commented Jan 24, 2022

Big, but hopefully unnoticable change to how we handle a generic list of identifiers and ISF files. Previously this was only used for linux/mac, and the identifiers were the banner, however the guid/age/pdbname could also be used as an identifier for windows ISF files. This luckily should be stored in the ISF files in the metadata section, and the autogeneration code of volatility has always done this. Previously volatility used the file name and location to identify and load the right ISF for windows.

Now, the various OSes are unified under a single cache and allows symbol files to be dropped under any name in any subdirectory under the symbols directory. The primary implementation stores the data in a database that's updated each run (the first run will take a while depending on how many ISF files there are). If the database ever is corrupted, it's wiped and starts again, so it shouldn't be possible for a bad cache to hang volatility, but it's worth checking. The database can return multiple locations for a single identifier (although each location can only store one identifier), but that should result in warnings.|

ISF files are allowed to not contain an identifier, but then they must not have an operating system associated with them. The changes should not have been relied on, however the sqlcache is versionable to make sure that internal components that use it can check it's the right implementation (the interface isn't versioned, perhaps it should be instead of the concrete implementation?).\

isfinfo now makes use of it, as does the pdbutility code.

I think that covers everything, but this could do with a fair amount of tire kicking. There's been a lot changed and it could almost certainly do with tweaks, both in terms of bugs and in terms of how it's architected. Going to wait for review on this rather than the usual week for review...

@ikelos
Copy link
Member Author

ikelos commented Jan 26, 2022

@ikelos
Copy link
Member Author

ikelos commented Jan 26, 2022

And probably go through the code and take out anywhere that makes use of the subpath part of that function.

@ikelos ikelos linked an issue Feb 23, 2022 that may be closed by this pull request
@ikelos
Copy link
Member Author

ikelos commented Mar 3, 2022

Anyone had a chance to prod this yet? It's pretty big so I'd prefer we get it merged before too much else changes the ground beneath it...

@ikelos
Copy link
Member Author

ikelos commented May 28, 2022

This has been sat around for a while, there's been several months for testing. I'd really like someone to get some eyes on it if possible, but if not then I'll just end up merging it and fixing any issues that arise afterwards, so anyone that wants to give it a look please do so! 5:)

Copy link
Contributor

@digitalisx digitalisx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ikelos Thank you for your work! 🙌
I also tested the PR and it seems to work well.
I leave a minor comment in the code :)

volatility3/framework/plugins/isfinfo.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/symbol_cache.py Outdated Show resolved Hide resolved
Copy link
Contributor

@digitalisx digitalisx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ikelos Thank you for applying my suggestion, Additionally I leave small comments on the unification of type hint. It is a recommendation based on Python PEP, not mandatory.. :)

volatility3/framework/automagic/symbol_cache.py Outdated Show resolved Hide resolved
volatility3/framework/configuration/requirements.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/symbol_cache.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/symbol_cache.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/symbol_cache.py Outdated Show resolved Hide resolved
volatility3/framework/automagic/symbol_cache.py Outdated Show resolved Hide resolved
Copy link
Contributor

@digitalisx digitalisx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ikelos Thank you for apply my suggestion.
I think that's all I can review.
And this code looks better then before and it's a good change.
I hope the other maintainers think the same.. 🙂

@ikelos
Copy link
Member Author

ikelos commented May 29, 2022

@digitalisx thanks very much for looking it over and improving it. I'm glad I've now got confidence that it's been checked over at least once, so I'm much happier merging it... I expect it to get merged at the end of the week, I just need to figure out if I need to post an announcement about the change on slack, given the first run will be quite long when it needs to initially build the cache...

@npetroni
Copy link

npetroni commented Jun 1, 2022

When I run windows.pslist against a Windows 10 image on 9be57a4, it produces the correct output. However, subsequent runs of the same command seem to re-parse the PDB file. Is that expected? On develop, the same sequence of commands does not result in a second parsing of the PDB.

The tell-tale sign is the printing of Reading TPI layer in the output. The second run takes significantly longer on my Mac for isf-unification than for develop, unless I am doing something wrong (very possible!).

@ikelos
Copy link
Member Author

ikelos commented Jun 1, 2022

Hmmm, no that's very definitely wrong. It should have cached it and be pulling it back from the cache directory (although it has been a while since I've looked at the code). Lemme see if I can recreate it...

@ikelos
Copy link
Member Author

ikelos commented Jun 1, 2022

Hmmm, so it seems to process it, and update the cache, but the generated JSON never gets written to the directory... 5:S

@ikelos
Copy link
Member Author

ikelos commented Jun 1, 2022

Well, that's thoroughly disheartening. When we find a PDB entry, it has an age of say, 1. We then grab the corresponding pdb file from Microsoft, with a 1 as the age in the filename. We process it and it says that its age is in fact 2. We therefore stash its identifier including an age of 2. When we find the PDB entry in memory again, it's age 1, so we redownload it... 5:\

@ikelos
Copy link
Member Author

ikelos commented Jun 1, 2022

The pdb name with the age value taken from the pdb file doesn't exist on Microsoft servers, so it looks like we've got a mismatch between fields. Unfortunately, that means even if we sort this, we'll need to regenerate the entire library based off the filenames (doable, but annoying)... 5:S

@digitalisx
Copy link
Contributor

When I first checked the operation, I felt that there was no problem.
But when I checked again, the same phenomenon occurred. I couldn't double check it.. 😱
Watching your error tracking, I'll be happy to comment if I can think of anything.. 🙂

@ikelos
Copy link
Member Author

ikelos commented Jun 1, 2022

Ok, so, confusingly it looks like there's a second field called age in the DBI_HEADER structure, which is the one we should have been using (instead of the age in the PDB header itself). I've added a fix, but all previously generated PDB files will be inaccurate and not match correctly (which should just mean they're re-downloaded and updated, possibly taking up some disk space even though they'll no longer be used). We should really go through the old windows.zip file and update it based off the filename (which used the age value we'd wanted) and also add some more recent kernels in. The windows.zip file has some historical symbol tables whose data is no longer available, so it is still a useful resource and one worth updating...

Very happy to have people think over the consequences of changing the age value this late in the game... 5:S

@digitalisx
Copy link
Contributor

I don't know if it'll help, but... The relatively recent Symbol Tables appear to be organized. :)
(https://github.com/JPCERTCC/Windows-Symbol-Tables)

@ikelos ikelos mentioned this pull request Jul 20, 2022
ikelos and others added 19 commits July 20, 2022 20:46
Yep, good spot as ever, thanks!  5:)

Co-authored-by: Donghyun Kim <digitalisx99@gmail.com>
Cool, I always forget about that, I think it's just what I'm used to, thanks!  5:)

Co-authored-by: Donghyun Kim <digitalisx99@gmail.com>
Yep, not sure why I forgot, thanks  5:)

Co-authored-by: Donghyun Kim <digitalisx99@gmail.com>
Hehehe, I guess I'm just a little shy about handing out complex objects, but you're right and it is a private method.  5:)

Co-authored-by: Donghyun Kim <digitalisx99@gmail.com>
Quite right, thanks for the catch!  5:)

Co-authored-by: Donghyun Kim <digitalisx99@gmail.com>
Yep, you're quite right, not sure how that got left behind.  Thanks!  5:)

Co-authored-by: Donghyun Kim <digitalisx99@gmail.com>
@ikelos ikelos merged commit 590b4fa into develop Jul 20, 2022
@ikelos ikelos deleted the feature/isf-unification branch July 20, 2022 20:48
@mischw
Copy link

mischw commented Jul 4, 2024

... We should really go through the old windows.zip file and update it based off the filename (which used the age value we'd wanted) and also add some more recent kernels in. ...

Hey. Are there any plans to update the symbol tables? Or is any help wanted?

@ikelos
Copy link
Member Author

ikelos commented Jul 4, 2024

There hasn't been yet, but if you'd like to help we'll happy accept any progress you can make on the task? The win.zip pack as it stands now is useful because it contains some historical JSON files that were made before Microsoft seems to have changed some of their pdb files to remove some typing information from them. I'll look over your pull request and we can go from there... 5:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vol.py -s switch is too complicated
4 participants