Date: Fri, 1 Jul 2011 18:36:11 -0400
From: Anthony Ferrara <ircmaxell@...il.com>
To: crypt-dev@...ts.openwall.com
Subject: Some thoughts on hard to compute hashes

Hello all,

Let me preface this by saying that I'm no expert in cryptography. I'm
pretty good with the math and the algorithms behind the ciphers, but I've
never significantly studied them. I understand how and why they work, but
I'm not versed in the intricacies of their design or testing. So I'm
posting this question to see if the thought is even worth pursuing.

I was conversing with Solardiz on a Drupal topic
(http://drupal.org/node/1201444#comment-4675994) and we were discussing
scrypt hashing and how there are really two forms of the algorithm: one
that uses a lot of memory, and one that uses a lot of logic (as a
tradeoff). The logic-heavy version, while significantly more complicated
and slower, could potentially be run on a massively parallel GPU (not by
parallelizing the algorithm itself, just by running multiple hashes at
the same time), while the memory-bound version is not particularly suited
to high parallelism due to its memory demands.

That started me thinking: what can a general-purpose CPU do efficiently
that neither an ASIC nor a GPU could do in an efficient manner (at a
reasonable cost, at least)? The answer I came up with was branching
logic. Modern CPUs have lots of die space devoted to branch prediction
and prefetching. Therefore, a CPU should be more efficient (from my
limited view at least) at running a hashing algorithm with significant
branching than either a GPU or an inexpensive ASIC (since the
intelligent branching logic adds complexity and a significant number of
transistors to the design).

So, if we take a memory-hard algorithm such as the one used by scrypt
and add branching logic inside the smix() function (for example), we
could theoretically make the algorithm favor general-purpose CPUs. For
example, the current smix function is:

    X <-- B
    For i = 0 ... N - 1
        V_i <-- X
        X <-- H(X)
    For i = 0 ... N - 1
        j <-- Integerify(X) mod N
        X <-- H(X xor V_j)

What if we changed the second loop to something such as:

    For i = 0 ... N - 1
        j <-- Integerify(X) mod N
        If j > (N / B)
            X <-- H(X xor V_j)
        Elseif j > (N / 2B)
            V_i <-- H(X xor V_j)
        Elseif j > (N / 4B)
            V_j <-- H(X xor V_i)
        Else
            X <-- H(X xor V_i)

Here B is a "branching factor" with 2^N > 4B > 4. Adjusting the
branching factor adjusts the probability that each branch gets executed
(B=2 gives 50%, 25%, 12.5%, 12.5%). Increasing this factor should change
the processor's ability to optimize the branches (B=2 would favor
prefetching all branches, while large B values would favor prediction).
The exact contents of the branches are open; I just wanted something
meaningful to demonstrate the concept.

I'm just wondering whether this idea is interesting enough to do some
testing and modeling on, or whether it's been done before and isn't
worth it (or isn't worth it for other reasons).

Thanks,

Anthony
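P.S. To make the control flow concrete, here is a toy Python sketch of the branched second loop. SHA-256 stands in for scrypt's H, 32-byte strings stand in for the real block size, and the names (branched_smix, BRANCH, etc.) are purely illustrative; this is not the real smix, just the branching structure described above.

```python
# Toy, non-cryptographic sketch of the branched smix variant.
# ASSUMPTIONS: SHA-256 replaces scrypt's BlockMix H, blocks are 32 bytes,
# and N is small -- this only demonstrates the data-dependent branching.
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def integerify(x: bytes) -> int:
    # Interpret the last 8 bytes of the block as a little-endian integer.
    return int.from_bytes(x[-8:], "little")

def branched_smix(block: bytes, n: int, branch: int) -> bytes:
    x = block
    v = []
    for _ in range(n):           # first loop: fill V, unchanged from scrypt
        v.append(x)
        x = H(x)
    for i in range(n):           # second loop: data-dependent branching
        j = integerify(x) % n
        if j > n // branch:               # ~1 - 1/B of the time
            x = H(xor(x, v[j]))
        elif j > n // (2 * branch):       # ~1/2B of the time
            v[i] = H(xor(x, v[j]))
        elif j > n // (4 * branch):       # ~1/4B of the time
            v[j] = H(xor(x, v[i]))
        else:                             # ~1/4B of the time
            x = H(xor(x, v[i]))
    return x

out = branched_smix(b"\x00" * 32, n=1024, branch=2)
print(out.hex())
```

Since which branch is taken depends on the evolving hash state, the branch pattern is effectively unpredictable ahead of time, which is the property the argument above relies on.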