Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timing Attack Counter Measure AES #146

Closed
plestrin opened this issue Mar 11, 2016 · 13 comments
Closed

Timing Attack Counter Measure AES #146

plestrin opened this issue Mar 11, 2016 · 13 comments

Comments

@plestrin
Copy link

For both Rijndael::Enc::ProcessAndXorBlock and Rijndael::Dec::ProcessAndXorBlock there is some code to avoid timing attacks:

    word32 u = 0;
#ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
    for (i=0; i<2048; i+=cacheLineSize)
#else
    for (i=0; i<1024; i+=cacheLineSize)
#endif
        u &= *(const word32 *)(((const byte *)Te)+i);
    u &= Te[255];
    s0 |= u; s1 |= u; s2 |= u; s3 |= u;

As far as I understand it, the goal is to do at least one read per cache line in order to preload Te into the cache. However when looking at the x86 binary (obtained in the Debian package), I noticed that if the loop structure remains, the memory reads have been optimized away:

.text:00280499     mov     edx, ds:(_ZN8CryptoPP15g_cacheLineSizeE_ptr - 3A4000h)[ebx]
.text:0028049F     mov     ecx, [edx]
.text:002804A1     xor     edx, edx
.text:002804A3     nop
.text:002804A4     lea     esi, [esi+0]
.text:002804A8
.text:002804A8 loc_2804A8:                             ; CODE XREF: CryptoPP::Rijndael::Enc::ProcessAndXorBlock(uchar const*,uchar const*,uchar *)+D0�j
.text:002804A8     add     edx, ecx
.text:002804AA     cmp     edx, 7FFh
.text:002804B0     jbe     short loc_2804A8

This counter measure seems to be removed by the compiler. Hence, the binary may be vulnerable to timing attacks.

@noloader
Copy link
Collaborator

noloader commented Apr 6, 2016

Thanks for the report @plestrin.

_UPDATE_: CVE-2016-3995 has been assigned to this issue.

I've been mulling over it trying to decide the best way to approach the remdiation and verify it. The code for Rijndael is some of the hairiest I have seen, and I try to avoid it due to fears of knocking something loose.

I'm thinking it maybe as simple as:

volatile word32 u = 0;

Debian compiles Crypto++ with the following flags: -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -DCRYPTOPP_INIT_PRIORITY=250 -DCRYPTOPP_NO_UNALIGNED_DATA_ACCESS.

Would you mind doing the following and take a look at the generated assembly:

$ git clone https://github.com/weidai11/cryptopp.git
$ cd cryptopp
# Make the change  "volatile word32 u = 0;"
$ make CXXFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -DCRYPTOPP_INIT_PRIORITY=250 -DCRYPTOPP_NO_UNALIGNED_DATA_ACCESS"

If it tests OK, would you mind providing a Pull Request. The PR will ensure you receive credit at Crypto++ Contributors.

If you don't want credit, then that's fine, too. I'll wait a few days and check in the remediation once its verified.


László Böszörményi is our Debian package maintainer. I'm going to alert him to this report. I will also alert our other package maintainers.

@plestrin
Copy link
Author

plestrin commented Apr 6, 2016

Your solution seems perfectly correct. Here the assembly code that I got:

080000CB                 mov     esi, ds:_ZN8CryptoPP15g_cacheLineSizeE ; CryptoPP::g_cacheLineSize
080000D1                 mov     [esp+6Ch+var_34], 0
080000D9                 xor     edx, edx
080000DB                 mov     ebp, offset _ZN8CryptoPPL2TdE ; CryptoPP::Td
080000E0
080000E0 loc_80000E0:                            ; CODE XREF: CryptoPP::Rijndael::Dec::ProcessAndXorBlock(uchar const*,uchar const*,uchar *)+BB�j
080000E0                 mov     ecx, [esp+6Ch+var_34]
080000E4                 and     ecx, ds:_ZN8CryptoPPL2TdE[edx] ; CryptoPP::Td
080000EA                 add     edx, esi
080000EC                 cmp     edx, 7FFh
080000F2                 mov     eax, offset _ZN8CryptoPPL2TdE ; CryptoPP::Td
080000F7                 mov     [esp+6Ch+var_34], ecx
080000FB                 jbe     short loc_80000E0
080000FD                 mov     edx, [esp+6Ch+var_34]
08000101                 and     edx, ds:dword_8002F78
08000107                 mov     [esp+6Ch+var_34], edx
0800010B                 mov     edx, [esp+6Ch+var_34]
0800010F                 or      edx, [esp+6Ch+var_6C]
08000112                 mov     ecx, [esp+6Ch+var_34]
08000116                 or      ecx, edi

However, not to impact too much the performances I though of that little trick:

    volatile word32 u = 0;
    word32 u_ = u;
#if defined(CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS) || defined(CRYPTOPP_ALLOW_RIJNDAEL_UNALIGNED_DATA_ACCESS)
    for (i=0; i<2048; i+=cacheLineSize)
#else
    for (i=0; i<1024; i+=cacheLineSize)
#endif
        u_ &= *(const word32 *)(const void *)(((const byte *)Td)+i);
    u_ &= Td[255];
    s0 |= u_; s1 |= u_; s2 |= u_; s3 |= u_;

Because u is declared volatile, the compiler will not perform constant propagation, but still, it will be able to optimize u_ and store it in a register. Here, is the slightly improved assembly code that I obtained with my solution:

080000C7                 mov     [esp+6Ch+var_34], 0
080000CF                 mov     ecx, ds:_ZN8CryptoPP15g_cacheLineSizeE ; CryptoPP::g_cacheLineSize
080000D5                 xor     ebx, ebx
080000D7                 mov     eax, [esp+6Ch+var_34]
080000DB                 mov     ebp, offset _ZN8CryptoPPL2TdE ; CryptoPP::Td
080000E0
080000E0 loc_80000E0:                            ; CODE XREF: CryptoPP::Rijndael::Dec::ProcessAndXorBlock(uchar const*,uchar const*,uchar *)+B3�j
080000E0                 and     eax, ds:_ZN8CryptoPPL2TdE[ebx] ; CryptoPP::Td
080000E6                 add     ebx, ecx
080000E8                 mov     edx, offset _ZN8CryptoPPL2TdE ; CryptoPP::Td
080000ED                 cmp     ebx, 7FFh
080000F3                 jbe     short loc_80000E0
080000F5                 and     eax, ds:dword_8002F78
080000FB                 mov     ebx, [esp+6Ch+var_6C]
080000FE                 mov     ecx, esi
08000100                 or      ecx, eax
08000102                 or      edi, eax
08000104                 or      ebx, eax
08000106                 or      eax, [esp+6Ch+var_68]

But I don't known if it's worth the additional complexity / lack of readability / risk. Anyway, I did a pull request with your fix, as you have requested.

@noloader
Copy link
Collaborator

noloader commented Apr 6, 2016

@plestrin - Thank you very much for the extra work. Its appreciated.

Your solution seems perfectly correct. Here the assembly code that I got...

OK, good. I know volatile is an abuse under GCC, but its usually enough to tame the optimizer. The other compilers are OK because their interpretation of the standard is less narrow.

Here, is the slightly improved assembly code that I obtained with my solution...

OK, thanks. Let's pull that instead if you don't mind.


If you feel inclined, contact me off-list at noloader, gmail account and provide your real name. I'll make sure the name gets added to the Release Notes (on the next release) and any CVE information (if one gets assigned).

@noloader
Copy link
Collaborator

noloader commented Apr 7, 2016

Also see Pull Request 154 and Commit 22f493dda9674df5.

@noloader
Copy link
Collaborator

noloader commented Apr 7, 2016

Thanks again @plestrin. We picked up the improvement you suggested at Commit 5cce8c33cabd92af.

@noloader
Copy link
Collaborator

@plestrin,

Our Debian maintainer got a CVE assigned. You will be credited with CVE-2016-3995.

Can you verify the current code to ensure we've remediated it?

Do you want to use the handle "plestrin", or would you like to use another name?

We will be releasing Crypto++ 5.6.4 in 2 to 4 weeks.

@noloader
Copy link
Collaborator

Re-opening because of the CVE.

@noloader noloader reopened this Apr 11, 2016
@noloader
Copy link
Collaborator

_Crypto++ 5.6.2_

For those who need to backport to Crypto++ 5.6.2:

git clone https://github.com/weidai11/cryptopp.git
cd cryptopp
git checkout CRYPTOPP_5_6_2

And then:

$ cat rijndael562.diff 
diff --git a/rijndael.cpp b/rijndael.cpp
index c185032..bfa2988 100644
--- a/rijndael.cpp
+++ b/rijndael.cpp
@@ -379,7 +379,8 @@ void Rijndael::Enc::ProcessAndXorBlock(const byte *inBlock, const byte *xorBlock
        // timing attack countermeasure. see comments at top for more details
        const int cacheLineSize = GetCacheLineSize();
        unsigned int i;
-       word32 u = 0;
+       volatile word32 _u = 0;
+       word32 u = _u;
 #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
        for (i=0; i<2048; i+=cacheLineSize)
 #else
@@ -455,7 +456,8 @@ void Rijndael::Dec::ProcessAndXorBlock(const byte *inBlock, const byte *xorBlock
        // timing attack countermeasure. see comments at top for more details
        const int cacheLineSize = GetCacheLineSize();
        unsigned int i;
-       word32 u = 0;
+       volatile word32 _u = 0;
+       word32 u = _u;
 #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
        for (i=0; i<2048; i+=cacheLineSize)
 #else
@@ -495,7 +497,7 @@ void Rijndael::Dec::ProcessAndXorBlock(const byte *inBlock, const byte *xorBlock
        // timing attack countermeasure. see comments at top for more details
        // If CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS is defined, 
        // QUARTER_ROUND_LD will use Td, which is already preloaded.
-       u = 0;
+       u = _u;
        for (i=0; i<256; i+=cacheLineSize)
                u &= *(const word32 *)(Sd+i);
        u &= *(const word32 *)(Sd+252);

_Crypto++ 5.6.1_

For those who need to backport to Crypto++ 5.6.1:

git clone https://github.com/weidai11/cryptopp.git
cd cryptopp
git checkout CRYPTOPP_5_6_1

And then:

$ cat rijndael561.diff
diff --git a/rijndael.cpp b/rijndael.cpp
index 608b9d3..1f448a2 100644
--- a/rijndael.cpp
+++ b/rijndael.cpp
@@ -375,7 +375,8 @@ void Rijndael::Enc::ProcessAndXorBlock(const byte *inBlock, const byte *xorBlock
    // timing attack countermeasure. see comments at top for more details
    const int cacheLineSize = GetCacheLineSize();
    unsigned int i;
-   word32 u = 0;
+   volatile word32 _u = 0;
+   word32 u = _u;
 #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
    for (i=0; i<2048; i+=cacheLineSize)
 #else
@@ -451,7 +452,8 @@ void Rijndael::Dec::ProcessAndXorBlock(const byte *inBlock, const byte *xorBlock
    // timing attack countermeasure. see comments at top for more details
    const int cacheLineSize = GetCacheLineSize();
    unsigned int i;
-   word32 u = 0;
+   volatile word32 _u = 0;
+   word32 u = _u;
 #ifdef CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
    for (i=0; i<2048; i+=cacheLineSize)
 #else
@@ -491,7 +493,7 @@ void Rijndael::Dec::ProcessAndXorBlock(const byte *inBlock, const byte *xorBlock
    // timing attack countermeasure. see comments at top for more details
    // If CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS is defined, 
    // QUARTER_ROUND_LD will use Td, which is already preloaded.
-   u = 0;
+   u = _u;
    for (i=0; i<256; i+=cacheLineSize)
        u &= *(const word32 *)(Sd+i);
    u &= *(const word32 *)(Sd+252);

@plestrin
Copy link
Author

To be complete this line also needs to be modified: u=_u;

@noloader
Copy link
Collaborator

@plestrin - Got it. thanks.

@bs37208
Copy link

bs37208 commented Apr 28, 2016

Just curious about current status for the upcoming 5.6.4 release to include this fix. Do you foresee any target date for 5.6.4? Thanks!

@noloader
Copy link
Collaborator

@sbunify,

Sorry about the late reply. I don't know how I missed it...

Just curious about current status for the upcoming 5.6.4 release to include this fix. Do you foresee any target date for 5.6.4?

Yes, the weekend of September 10, 2016. Also see Please Test Crypto++ 5.6.4 Release Candidate on the mailing list.

@noloader
Copy link
Collaborator

The library state has been tested and release is eminent. Closing this report.

The release page is located at Crypto++ 5.6.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants