Openwall GNU/*/Linux 3.0 - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 17 Mar 2013 08:03:21 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: scrypt TMTO defeater

Colin, Anthony, all -

The message I am following up on is the one where I described an attack
on Anthony's scrypt time-memory tradeoff defeater:

http://www.openwall.com/lists/crypt-dev/2012/09/10/3

I'll skip most of its content now, and only comment on my attempt to
actually implement a time-memory tradeoff defeater of this sort (despite
of the possible attack).

On Mon, Sep 10, 2012 at 06:27:44AM +0400, Solar Designer wrote:
> For context, here's Anthony Ferrara's comment that I am referring to:
> 
> http://drupal.org/node/1201444#comment-4675994
> 
> Here's an attack on Anthony's tradeoff defeater ("simply writing to V_j
> in the second loop"):

[... lots of text skipped ...]

> Does this mean that Anthony's tradeoff defeater is not worthwhile?  Not
> quite.  The cold memory is nevertheless accessed pretty often - just not
> as often as the hot portion.  So there is some reduction in attack speed
> due to the defeater even in this special case.  Then, if we try to
> reduce our hot memory needs by more than a factor of 2, the corresponding
> revisions of the attack described above become a lot less efficient
> (mostly in terms of frequency of accesses to the cold portion).
> 
> Yet we could enhance the tradeoff defeater to make the attack less
> efficient: write into more than one V element per loop iteration and/or
> write into such V elements that more of them are written to (perhaps
> 100% instead of just 63% by loop end).

So I've tried to implement an enhancement like what I described above:
writing not only to V_j, but also to V_i (with another value).
Unfortunately, it turns out that BlockMix's shuffling(*) gets in the way
of efficient implementations, and _possibly_ the inefficiency mostly hurts
cleaner CPU code, as opposed to hacks and ASICs.  Specifically, when the
V element we write into is or might be the same as V_j, in a clean
implementation we have to postpone those writes until the BlockMix
completes, and this means re-reading the same values from memory
(hopefully, from L1 cache) rather than simply storing register values
into two locations at once (such as X and V_j).  In order to do the
latter without introducing extra checks, I had to limit the writes to
elements that are definitely not same as V_j; I chose (~j & (N - 1)) for
their indices.

(*) which I am still looking for an explanation of:
http://mail.tarsnap.com/scrypt/msg00059.html

I've attached a diff between two of my code revisions.  escrypt-23
implements the official scrypt, with only some optimizations to
crypto_scrypt-sse.c.  escrypt-24 renames the function to escrypt and
adds a Boolean defeat_tmto parameter.  When it is zero, the function
returns the same values as the official scrypt does.  When it is
non-zero, the trivial TMTO defeater as described above is enabled.
It is implemented in all three versions of the code (-ref, -nosse,
-sse), although only the -sse revision is (partially) optimized (reuses
register values instead of copying memory).  (Checking defeat_tmto
outside of the loop and then using one of two specialized loop versions
is an obvious further optimization.)

I may consider adding a TMTO defeater by means of changes to the first
loop as well - an approach needed for my "ROM-port hard functions" idea
anyway, because the ROM can't be written to in the second loop.

I'd appreciate any comments.

For testing:
MD5 of output of "tests" with defeat_tmto = 0:
4455b1ce0529e7f877de53f24ff78bec
MD5 of output of "tests" with defeat_tmto = 1:
43f6df099d2d6dec402960ae41ee1d08

Alexander

diff -urp escrypt-23/crypto_scrypt-nosse.c escrypt-24/crypto_scrypt-nosse.c
--- escrypt-23/crypto_scrypt-nosse.c	2010-01-16 20:48:20 +0000
+++ escrypt-24/crypto_scrypt-nosse.c	2013-03-17 03:07:41 +0000
@@ -1,5 +1,6 @@
 /*-
  * Copyright 2009 Colin Percival
+ * Copyright 2013 Alexander Peslyak
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -46,7 +47,7 @@ static void blkxor(void *, void *, size_
 static void salsa20_8(uint32_t[16]);
 static void blockmix_salsa8(uint32_t *, uint32_t *, uint32_t *, size_t);
 static uint64_t integerify(void *, size_t);
-static void smix(uint8_t *, size_t, uint64_t, uint32_t *, uint32_t *);
+static void smix(uint8_t *, size_t, uint64_t, uint32_t *, uint32_t *, int);
 
 static void
 blkcpy(void * dest, void * src, size_t len)
@@ -163,7 +164,7 @@ integerify(void * B, size_t r)
 }
 
 /**
- * smix(B, r, N, V, XY):
+ * smix(B, r, N, V, XY, defeat_tmto):
  * Compute B = SMix_r(B, N).  The input B must be 128r bytes in length;
  * the temporary storage V must be 128rN bytes in length; the temporary
  * storage XY must be 256r + 64 bytes in length.  The value N must be a
@@ -171,7 +172,8 @@ integerify(void * B, size_t r)
  * multiple of 64 bytes.
  */
 static void
-smix(uint8_t * B, size_t r, uint64_t N, uint32_t * V, uint32_t * XY)
+smix(uint8_t * B, size_t r, uint64_t N, uint32_t * V, uint32_t * XY,
+    int defeat_tmto)
 {
 	uint32_t * X = XY;
 	uint32_t * Y = &XY[32 * r];
@@ -207,6 +209,8 @@ smix(uint8_t * B, size_t r, uint64_t N,
 		/* 8: X <-- H(X \xor V_j) */
 		blkxor(X, &V[j * (32 * r)], 128 * r);
 		blockmix_salsa8(X, Y, Z, r);
+		if (defeat_tmto)
+			blkcpy(&V[(~j & (N - 1)) * (32 * r)], Y, 128 * r);
 
 		/* 7: j <-- Integerify(X) mod N */
 		j = integerify(Y, r) & (N - 1);
@@ -214,6 +218,8 @@ smix(uint8_t * B, size_t r, uint64_t N,
 		/* 8: X <-- H(X \xor V_j) */
 		blkxor(Y, &V[j * (32 * r)], 128 * r);
 		blockmix_salsa8(Y, X, Z, r);
+		if (defeat_tmto)
+			blkcpy(&V[(~j & (N - 1)) * (32 * r)], X, 128 * r);
 	}
 
 	/* 10: B' <-- X */
@@ -222,7 +228,8 @@ smix(uint8_t * B, size_t r, uint64_t N,
 }
 
 /**
- * crypto_scrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen):
+ * crypto_escrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen,
+ *     defeat_tmto):
  * Compute scrypt(passwd[0 .. passwdlen - 1], salt[0 .. saltlen - 1], N, r,
  * p, buflen) and write the result into buf.  The parameters r, p, and buflen
  * must satisfy r * p < 2^30 and buflen <= (2^32 - 1) * 32.  The parameter N
@@ -231,9 +238,9 @@ smix(uint8_t * B, size_t r, uint64_t N,
  * Return 0 on success; or -1 on error.
  */
 int
-crypto_scrypt(const uint8_t * passwd, size_t passwdlen,
+crypto_escrypt(const uint8_t * passwd, size_t passwdlen,
     const uint8_t * salt, size_t saltlen, uint64_t N, uint32_t r, uint32_t p,
-    uint8_t * buf, size_t buflen)
+    uint8_t * buf, size_t buflen, int defeat_tmto)
 {
 	void * B0, * V0, * XY0;
 	uint8_t * B;
@@ -309,7 +316,7 @@ crypto_scrypt(const uint8_t * passwd, si
 	/* 2: for i = 0 to p - 1 do */
 	for (i = 0; i < p; i++) {
 		/* 3: B_i <-- MF(B_i, N) */
-		smix(&B[i * 128 * r], r, N, V, XY);
+		smix(&B[i * 128 * r], r, N, V, XY, defeat_tmto);
 	}
 
 	/* 5: DK <-- PBKDF2(P, B, 1, dkLen) */
diff -urp escrypt-23/crypto_scrypt-ref.c escrypt-24/crypto_scrypt-ref.c
--- escrypt-23/crypto_scrypt-ref.c	2010-01-16 20:48:20 +0000
+++ escrypt-24/crypto_scrypt-ref.c	2013-03-17 03:06:17 +0000
@@ -1,5 +1,6 @@
 /*-
  * Copyright 2009 Colin Percival
+ * Copyright 2013 Alexander Peslyak
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -43,7 +44,7 @@ static void blkxor(uint8_t *, uint8_t *,
 static void salsa20_8(uint8_t[64]);
 static void blockmix_salsa8(uint8_t *, uint8_t *, size_t);
 static uint64_t integerify(uint8_t *, size_t);
-static void smix(uint8_t *, size_t, uint64_t, uint8_t *, uint8_t *);
+static void smix(uint8_t *, size_t, uint64_t, uint8_t *, uint8_t *, int);
 
 static void
 blkcpy(uint8_t * dest, uint8_t * src, size_t len)
@@ -170,7 +171,8 @@ integerify(uint8_t * B, size_t r)
  * XY must be 256r bytes in length.  The value N must be a power of 2.
  */
 static void
-smix(uint8_t * B, size_t r, uint64_t N, uint8_t * V, uint8_t * XY)
+smix(uint8_t * B, size_t r, uint64_t N, uint8_t * V, uint8_t * XY,
+    int defeat_tmto)
 {
 	uint8_t * X = XY;
 	uint8_t * Y = &XY[128 * r];
@@ -197,6 +199,8 @@ smix(uint8_t * B, size_t r, uint64_t N,
 		/* 8: X <-- H(X \xor V_j) */
 		blkxor(X, &V[j * (128 * r)], 128 * r);
 		blockmix_salsa8(X, Y, r);
+		if (defeat_tmto)
+			blkcpy(&V[(~j & (N - 1)) * (128 * r)], X, 128 * r);
 	}
 
 	/* 10: B' <-- X */
@@ -204,7 +208,8 @@ smix(uint8_t * B, size_t r, uint64_t N,
 }
 
 /**
- * crypto_scrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen):
+ * crypto_escrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen,
+ *     defeat_tmto):
  * Compute scrypt(passwd[0 .. passwdlen - 1], salt[0 .. saltlen - 1], N, r,
  * p, buflen) and write the result into buf.  The parameters r, p, and buflen
  * must satisfy r * p < 2^30 and buflen <= (2^32 - 1) * 32.  The parameter N
@@ -213,9 +218,9 @@ smix(uint8_t * B, size_t r, uint64_t N,
  * Return 0 on success; or -1 on error.
  */
 int
-crypto_scrypt(const uint8_t * passwd, size_t passwdlen,
+crypto_escrypt(const uint8_t * passwd, size_t passwdlen,
     const uint8_t * salt, size_t saltlen, uint64_t N, uint32_t r, uint32_t p,
-    uint8_t * buf, size_t buflen)
+    uint8_t * buf, size_t buflen, int defeat_tmto)
 {
 	uint8_t * B;
 	uint8_t * V;
@@ -260,7 +265,7 @@ crypto_scrypt(const uint8_t * passwd, si
 	/* 2: for i = 0 to p - 1 do */
 	for (i = 0; i < p; i++) {
 		/* 3: B_i <-- MF(B_i, N) */
-		smix(&B[i * 128 * r], r, N, V, XY);
+		smix(&B[i * 128 * r], r, N, V, XY, defeat_tmto);
 	}
 
 	/* 5: DK <-- PBKDF2(P, B, 1, dkLen) */
diff -urp escrypt-23/crypto_scrypt-sse.c escrypt-24/crypto_scrypt-sse.c
--- escrypt-23/crypto_scrypt-sse.c	2013-03-17 00:05:35 +0000
+++ escrypt-24/crypto_scrypt-sse.c	2013-03-17 03:04:40 +0000
@@ -1,6 +1,6 @@
 /*-
  * Copyright 2009 Colin Percival
- * Copyright 2012 Alexander Peslyak
+ * Copyright 2012,2013 Alexander Peslyak
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -151,8 +151,17 @@ blockmix_salsa8(__m128i * Bin, __m128i *
 	X2 = _mm_xor_si128(X2, (in)[2]); \
 	X3 = _mm_xor_si128(X3, (in)[3]);
 
+#define CONDOUT(do_out, out) \
+	if (do_out) { \
+		(out)[0] = X0; \
+		(out)[1] = X1; \
+		(out)[2] = X2; \
+		(out)[3] = X3; \
+	}
+
 static inline uint32_t
-blockmix_salsa8_xor(__m128i * Bin1, __m128i * Bin2, __m128i * Bout, size_t r)
+blockmix_salsa8_xor(__m128i * Bin1, __m128i * Bin2, __m128i * Bout,
+    __m128i * Bdup, size_t r)
 {
 	__m128i X0, X1, X2, X3;
 	size_t i;
@@ -168,6 +177,7 @@ blockmix_salsa8_xor(__m128i * Bin1, __m1
 	/* 6: B' <-- (Y_0, Y_2 ... Y_{2r-2}, Y_1, Y_3 ... Y_{2r-1}) */
 	XOR4(Bin1)
 	SALSA20_8_XOR(Bin2, Bout)
+	CONDOUT(Bdup, Bdup)
 
 	/* 2: for i = 0 to 2r - 1 do */
 	r--;
@@ -177,6 +187,7 @@ blockmix_salsa8_xor(__m128i * Bin1, __m1
 		/* 6: B' <-- (Y_0, Y_2 ... Y_{2r-2}, Y_1, Y_3 ... Y_{2r-1}) */
 		XOR4(&Bin1[i * 8 + 4])
 		SALSA20_8_XOR(&Bin2[i * 8 + 4], &Bout[(r + i) * 4 + 4])
+		CONDOUT(Bdup, &Bdup[(r + i) * 4 + 4])
 
 		i++;
 
@@ -185,6 +196,7 @@ blockmix_salsa8_xor(__m128i * Bin1, __m1
 		/* 6: B' <-- (Y_0, Y_2 ... Y_{2r-2}, Y_1, Y_3 ... Y_{2r-1}) */
 		XOR4(&Bin1[i * 8])
 		SALSA20_8_XOR(&Bin2[i * 8], &Bout[i * 4])
+		CONDOUT(Bdup, &Bdup[i * 4])
 	}
 
 	/* 3: X <-- H(X \xor B_i) */
@@ -192,6 +204,7 @@ blockmix_salsa8_xor(__m128i * Bin1, __m1
 	/* 6: B' <-- (Y_0, Y_2 ... Y_{2r-2}, Y_1, Y_3 ... Y_{2r-1}) */
 	XOR4(&Bin1[i * 8 + 4])
 	SALSA20_8_XOR(&Bin2[i * 8 + 4], &Bout[(r + i) * 4 + 4])
+	CONDOUT(Bdup, &Bdup[(r + i) * 4 + 4])
 
 	return _mm_cvtsi128_si32(X0);
 }
@@ -200,6 +213,7 @@ blockmix_salsa8_xor(__m128i * Bin1, __m1
 #undef SALSA20_2ROUNDS
 #undef SALSA20_8_XOR
 #undef XOR4
+#undef CONDOUT
 
 /**
  * integerify(B, r):
@@ -212,7 +226,7 @@ integerify(void * B, size_t r)
 }
 
 /**
- * smix(B, r, N, V, XY):
+ * smix(B, r, N, V, XY, defeat_tmto):
  * Compute B = SMix_r(B, N).  The input B must be 128r bytes in length;
  * the temporary storage V must be 128rN bytes in length; the temporary
  * storage XY must be 256r + 64 bytes in length.  The value N must be a
@@ -220,7 +234,7 @@ integerify(void * B, size_t r)
  * multiple of 64 bytes.
  */
 static void
-smix(uint8_t * B, size_t r, uint32_t N, void * V, void * XY)
+smix(uint8_t * B, size_t r, uint32_t N, void * V, void * XY, int defeat_tmto)
 {
 	__m128i * X = V, * Y, * V_j;
 	uint32_t * X32 = V;
@@ -264,19 +278,31 @@ smix(uint8_t * B, size_t r, uint32_t N,
 
 	/* 7: j <-- Integerify(X) mod N */
 	j = integerify(X, r) & (N - 1);
-	V_j = (void *)((uintptr_t)(V) + j * 128 * r);
 
 	/* 6: for i = 0 to N - 1 do */
 	for (i = 0; i < N; i += 2) {
+		__m128i * V_nj = NULL;
+
+		V_j = (void *)((uintptr_t)(V) + j * 128 * r);
+
+		if (defeat_tmto) {
+			uint32_t nj = ~j & (N - 1);
+			V_nj = (void *)((uintptr_t)(V) + nj * 128 * r);
+		}
+
 		/* 8: X <-- H(X \xor V_j) */
 		/* 7: j <-- Integerify(X) mod N */
-		j = blockmix_salsa8_xor(X, V_j, Y, r) & (N - 1);
+		j = blockmix_salsa8_xor(X, V_j, Y, V_nj, r) & (N - 1);
 		V_j = (void *)((uintptr_t)(V) + j * 128 * r);
 
+		if (defeat_tmto) {
+			uint32_t nj = ~j & (N - 1);
+			V_nj = (void *)((uintptr_t)(V) + nj * 128 * r);
+		}
+
 		/* 8: X <-- H(X \xor V_j) */
 		/* 7: j <-- Integerify(X) mod N */
-		j = blockmix_salsa8_xor(Y, V_j, X, r) & (N - 1);
-		V_j = (void *)((uintptr_t)(V) + j * 128 * r);
+		j = blockmix_salsa8_xor(Y, V_j, X, V_nj, r) & (N - 1);
 	}
 
 	/* 10: B' <-- X */
@@ -289,7 +315,8 @@ smix(uint8_t * B, size_t r, uint32_t N,
 }
 
 /**
- * crypto_scrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen):
+ * crypto_escrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen,
+ *     defeat_tmto):
  * Compute scrypt(passwd[0 .. passwdlen - 1], salt[0 .. saltlen - 1], N, r,
  * p, buflen) and write the result into buf.  The parameters r, p, and buflen
  * must satisfy r * p < 2^30 and buflen <= (2^32 - 1) * 32.  The parameter N
@@ -298,9 +325,9 @@ smix(uint8_t * B, size_t r, uint32_t N,
  * Return 0 on success; or -1 on error.
  */
 int
-crypto_scrypt(const uint8_t * passwd, size_t passwdlen,
+crypto_escrypt(const uint8_t * passwd, size_t passwdlen,
     const uint8_t * salt, size_t saltlen, uint64_t N, uint32_t r, uint32_t p,
-    uint8_t * buf, size_t buflen)
+    uint8_t * buf, size_t buflen, int defeat_tmto)
 {
 	void * B0, * V0, * XY0;
 	uint8_t * B;
@@ -380,7 +407,7 @@ crypto_scrypt(const uint8_t * passwd, si
 	/* 2: for i = 0 to p - 1 do */
 	for (i = 0; i < p; i++) {
 		/* 3: B_i <-- MF(B_i, N) */
-		smix(&B[i * 128 * r], r, N, V, XY);
+		smix(&B[i * 128 * r], r, N, V, XY, defeat_tmto);
 	}
 
 	/* 5: DK <-- PBKDF2(P, B, 1, dkLen) */
diff -urp escrypt-23/crypto_scrypt.h escrypt-24/crypto_scrypt.h
--- escrypt-23/crypto_scrypt.h	2010-01-16 20:48:20 +0000
+++ escrypt-24/crypto_scrypt.h	2013-03-17 01:29:31 +0000
@@ -32,7 +32,8 @@
 #include <stdint.h>
 
 /**
- * crypto_scrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen):
+ * crypto_escrypt(passwd, passwdlen, salt, saltlen, N, r, p, buf, buflen,
+ *     defeat_tmto):
  * Compute scrypt(passwd[0 .. passwdlen - 1], salt[0 .. saltlen - 1], N, r,
  * p, buflen) and write the result into buf.  The parameters r, p, and buflen
  * must satisfy r * p < 2^30 and buflen <= (2^32 - 1) * 32.  The parameter N
@@ -40,7 +41,7 @@
  *
  * Return 0 on success; or -1 on error.
  */
-int crypto_scrypt(const uint8_t *, size_t, const uint8_t *, size_t, uint64_t,
-    uint32_t, uint32_t, uint8_t *, size_t);
+int crypto_escrypt(const uint8_t *, size_t, const uint8_t *, size_t, uint64_t,
+    uint32_t, uint32_t, uint8_t *, size_t, int);
 
 #endif /* !_CRYPTO_SCRYPT_H_ */
diff -urp escrypt-23/tests.c escrypt-24/tests.c
--- escrypt-23/tests.c	2012-11-15 08:57:58 +0000
+++ escrypt-24/tests.c	2013-03-17 03:09:53 +0000
@@ -1,6 +1,8 @@
 #include <stdio.h>
 #include <string.h>
 
+#define defeat_tmto 0
+
 #undef TEST_PBKDF2_SHA256
 #define TEST_SCRYPT
 
@@ -52,8 +54,9 @@ print_scrypt(const char * passwd, const
 	printf("scrypt(\"%s\", \"%s\", %llu, %u, %u) =",
 	    passwd, salt, (unsigned long long)N, r, p);
 
-	if (crypto_scrypt((const uint8_t *) passwd, strlen(passwd),
-	    (const uint8_t *) salt, strlen(salt), N, r, p, dk, sizeof(dk))) {
+	if (crypto_escrypt((const uint8_t *) passwd, strlen(passwd),
+	    (const uint8_t *) salt, strlen(salt), N, r, p, dk, sizeof(dk),
+	    defeat_tmto)) {
 		puts(" FAILED");
 		return;
 	}

[ CONTENT OF TYPE application/x-gzip SKIPPED ]

[ CONTENT OF TYPE application/x-gzip SKIPPED ]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux - Powered by OpenVZ