Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AES and incorrect use or _malloca() and _freea() under Microsoft compilers #302

Closed
noloader opened this issue Sep 23, 2016 · 10 comments
Closed

Comments

@noloader
Copy link
Collaborator

noloader commented Sep 23, 2016

John Byrd privately reported a crash in AES under Microsoft compilers due to use of _malloca, AliasedWithTables and _freea. _malloca and _freea are Microsoft SDLC functions (alloca is on Microsoft's SDLC banned function list). Microsoft sometimes uses the heap rather than the stack for _malloca, and that's the reason _freea is needed.

The bug is specific to Windows and Microsoft compilers because its guarded by _MSC_VER. The bug does not affect Unix and Linux; and does not affect non-Microsoft compilers, like ICC and Borland. The bug was introduced at Commit 823bc93357da32a3 and only affects Crypto++ 5.6.4.

The code in question asks _malloca for a block of memory. The code also over-commits the size and adjust the pointer to a 256-byte boundary. The pointer is sent AliasedWithTables to flush cache lines. If AliasedWithTables cannot perform the flush, then it returns false. Upon the false return, the code reallocates, adjusts the pointer and calls AliasedWithTables again. Eventually the call succeeds. After the code in question completes, it frees the adjusted pointer and not the original pointer using _freea.

We believe we failed to detect the issue during testing because the pointer did not require adjustment. We got [un]lucky, but Byrd's testing revealed it.

Fortune favored us, and we picked up Byrd's suggestion of avoiding unaligned data access last week at at Commit f57c4dced5bfbcd1. The issue is side stepped "out of the box" in Master. However, an unsuspecting user could wander back into the configuration.

We immediately disclosed the memory bug so decision makers could assess the risk, provide feedback, and select a remediation commensurate with the security posture and comfort level:

@johnwbyrd
Copy link
Contributor

johnwbyrd commented Sep 23, 2016

A possible short-term workaround is to recompile Crypto++ with the CRYPTOPP_DISABLE_RIJNDAEL_ASM flag enabled.

I have developed a first attempt at a patch for this bug. In doing so, I refactored the code to make it less dire in general.

johnwbyrd@a33b953

@noloader
Copy link
Collaborator Author

@johnwbyrd, I hate to ask since you've helped so much, but do you have test data for the issue? I'd like to get it added to our test suite.

I realize its probably platform dependent. But I only need it to trigger on one person's machine to continue to alert us.

@johnwbyrd
Copy link
Contributor

johnwbyrd commented Sep 23, 2016

Unfortunately, the bug occurred on a bit of closed-source code, so I can't give out the entire thing.

In any case, you should be able to trigger it readily when compiling in debug mode with each malloc and free being audited... see also the _CrtSetDbgFlag( _CRTDBG_CHECK_ALWAYS_DF ) function, as well as compiling with the /RTC1 flag.

While debugging, if at any point the "space" pointer is changed from its original location by the 256-byte rounding function, you're guaranteed to be in for some pain at _freea time, as _freea refers to a memory address that was never allocated in the first place. Just compile and step through this function, and you'll see it happen now and then.

I will mention that the problem seems to occur more frequently when the Te variable happens to cross a 4K page boundary. I am not sure this is a necessary condition though.

@noloader
Copy link
Collaborator Author

noloader commented Sep 23, 2016

@johnwbyrd,

you should be able to trigger it readily when compiling in debug mode with each malloc and free being audited... see also the _CrtSetDbgFlag( _CRTDBG_CHECK_ALWAYS_DF ) function, as well as compiling with the /RTC1 flag.

Yeah, I would have thought so too. I regularly test Visual Studio 2002-2015 on Win32 and Win64. I have yet to see the issue surface. I also test compiles under ARM for Windows Phone, but I don't run it because I lack a test platform.

I'll need to verify _CrtSetDbgFlag( _CRTDBG_CHECK_ALWAYS_DF ). I know w are using _CrtSetDbgFlag, but I'm not sure if it includes _CRTDBG_CHECK_ALWAYS_DF.

@johnwbyrd
Copy link
Contributor

johnwbyrd commented Sep 23, 2016

You may not be moving the location of the Te structure in memory whenever you change the test suite. A different code path occurs based on whether the Te structure crosses a 4K page boundary or not. The location of Te is determined by the linker. In my case, I would assume that I have enough static variables declared before Te is instantiated, such that Te crosses a 4K boundary. In that case, the else statement is taken in AliasedWithTable(), which usually returns false for me. This might possibly explain why you're not seeing it in test cases -- in my code, Te always crosses a 4K boundary.

Additionally, do not lull yourself into the notion that patching AliasedWithTable() can fix this problem. _malloca() is called in a loop; only the last call to _malloca() is ever matched with _free(); and the rest of the _malloca()'s are dropped on the floor.

Candidly, rijndael.cpp needs to be taken out behind the barn and shot. It shouldn't be depending on _malloca() in the first place -- if it needs temp buffers it should set them up at initialization time from well-aligned buffers on the heap. "It needs to go fast" is not an excuse for this level of hackitude, especially in a security library.

@noloader
Copy link
Collaborator Author

noloader commented Sep 23, 2016

@johnwbyrd, thanks for the feedback. I knew that code had opportunity for improvement back in February or so. The struggle for me was two fold:

  1. do no harm and don't break things
  2. the intersection of the language with security requirements

For (1), I think we are at a point we can mostly detect violations. However, we don't have automated test scripts for Windows, and we don't have code generation tests for Windows (related, see Issue 159). Therefore I proceed with caution.

For (2), its a tricky problem. The C and C++ language does not always allow us to express ourselves well, and good intentions sometimes result in questionable hacks. Zeroizers are a perfect point - its a simple concept but amazingly difficult in practice due to compiler optimizations, C++ abstract machines and the "as if" rule.

This bug is another good example. AliasedWithTable exists to thwart timing attacks, but the language provides no facilities to address it. As far as I know, the C++ abstract machine has no concept of time in this context. The use of _malloca() and freea() is due to Microsoft SDLC and C&A requirements. Again, the C++ language does not know about C&A pressures.

There was no excuse for me to drop the ball on the cutover to _malloca() and freea(). All I can do is apologize for it.

noloader referenced this issue Sep 23, 2016
The 823 commit introduced a subtle bug we were not able to detect during testing. However, users experienced it in the field. We are reverting it because we violated the "do no harm" rule. The next steps are (1) completely remediation and (2) proper testing of the unit
@noloader
Copy link
Collaborator Author

noloader commented Sep 23, 2016

Thanks again @johnwbyrd. As a first step I backed out the portion of the 823 commit that introduced this bug. We are now back at where things were before the break.

I think we are going to pass on the 301 patch at the moment (but thank you very much for it). I think your critique was fair, but I can't justify doing the same thing after we were criticized for it.

@johnwbyrd
Copy link
Contributor

johnwbyrd commented Sep 23, 2016

Please don't apologize for the code! :) If you really feel bad, you can refund me the $0.00 that I paid for Crypto++.

Meanwhile, to your point about keeping rijndael.cpp intact. That code has gotten to this state by multiple independent efforts to get the block cipher running in assembly on multiple different platforms. I see machine code for at least 4 different architectures in there. And I don't see the new AVX or AVX2 being shoehorned into this. (It's tempting though to be able to do 4 simultaneous block ciphers with AVX2.)

So, the current structure of that file is unmaintainable as you know. I would suggest ripping out all the assembly and building a C++-only implementation that represents operations in a way which the compiler can easily vectorize to 2^n bits. That would give reasonable performance on all platforms while leaving open a door to future SIMD instruction sets.

I've had some success with armadillo, a template-based library for representing large matrices abstractly and then compiling the matrices down to platform-specific SIMD code. So you might see how Armadillo does this and rip off concepts as you see fit. http://arma.sourceforge.net/faq.html#speed

As for backing out _freea, won't that break your Microsoft SDLC compliance? (whatever that is)

@noloader
Copy link
Collaborator Author

noloader commented Sep 23, 2016

@johnwbyrd, @everyone,

Mitre assigned CVE-2016-7544 to the issue.

We applied John Byrd's patch at Commit 31e776d4e7aaaa9e.

We will keep the bug open until Crypto++ 5.6.5 ships.

@noloader noloader changed the title AES and incorrect argument to _freea() under Microsoft compilers AES and incorrect use or _malloca() and _freea() under Microsoft compilers Sep 25, 2016
@noloader
Copy link
Collaborator Author

Crypto++ 5.6.5 was released on 10/11/2016. Closing the issue.

@noloader noloader added the cve label Sep 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants