Date: Thu, 27 Apr 2006 19:39:36 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Performance tuning

On Thu, Apr 27, 2006 at 05:45:46PM +0200, Ami Schwartzman wrote:
> 1) More of a question to Solar Designer: I saw that you provide only the
> UNIX source, and we might (it's not sure yet) be doing it in Windows.
> Is it possible for you to provide us the Windows source?

Actually, the only reason the download link for the sources has "Unix"
in it is to ensure that Windows (l)users don't download the sources by
mistake (and quite a few of them do anyway...) ;-)

The official source tarball of JtR includes everything needed for
building on Win32 with Cygwin.

> I'm asking
> because we would eventually release the source back to the community.
> Will you have a problem with that?

Of course not, as long as you don't violate the GPL.

> 2)This is to everyone who knows the source: Do you think there is room
> for improvement?

Yes.

> I'm asking because I understand that DES is pretty
> hard to parallelize.

You don't need to parallelize individual instances of DES (or of other
underlying ciphers and hashes that John uses). With password cracking,
there are multiple candidate passwords to be tried - so you can try more
than one in parallel. John already does this sort of parallelization
with much success.

> Do you think threads,

If you want the absolute best performance, then separate processes are
likely to work better than threads. As soon as you start using threads,
you have to ensure that different threads store their private data at
different virtual addresses - which may require the use of an extra
register and more complicated addressing modes. I'd estimate the
slowdown for making the low-level crypto code in John thread-safe at 0%
to 10% for different ciphers/hashes and processor architectures.
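To illustrate the point about private per-thread data, here is a minimal
Python sketch. It is not code from John (which is written in C) and it
cannot show the register or addressing-mode cost; it only shows the
structural change: the scratch state moves out of a single shared
location into a context that each thread owns. The names `Context`,
`toy_hash`, and `hash_all` are hypothetical, and the mixing function is
a toy, not a real cipher or hash.

```python
import threading

MASK = 0xFFFFFFFF  # keep all arithmetic within 32 bits
ROUNDS = 4


class Context:
    """Private working state for one thread (hypothetical name)."""

    def __init__(self):
        self.state = [0, 0, 0, 0]


def toy_hash(ctx, candidate):
    """Toy mixing function, NOT a real hash. All scratch data lives in ctx."""
    s = ctx.state
    s[0], s[1], s[2], s[3] = 0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210
    for byte in candidate.encode():
        for _ in range(ROUNDS):
            s[0] = (s[0] + (s[1] ^ byte)) & MASK
            s[1] = ((s[1] << 5) | (s[1] >> 27)) & MASK  # rotate left by 5
            s[1] ^= s[2]
            s[2] = (s[2] + s[3]) & MASK
            s[3] ^= s[0]
    return tuple(s)


def hash_all(candidates, n_threads=2):
    """Hash every candidate; thread k handles indices k, k + n_threads, ..."""
    results = [None] * len(candidates)

    def worker(k):
        ctx = Context()  # one private context per thread, never shared
        for i in range(k, len(candidates), n_threads):
            results[i] = toy_hash(ctx, candidates[i])

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    words = ["alpha", "bravo", "charlie", "delta"]
    sequential = [toy_hash(Context(), w) for w in words]
    print(hash_all(words, n_threads=2) == sequential)
```

In C, the equivalent of `ctx` would typically be a pointer passed to (or
looked up by) every low-level routine, which is where the extra register
and the more complicated addressing modes mentioned above come from.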

For bcrypt (the Blowfish-based password hashing method), I actually did
the conversion to thread-safe code for the crypt_blowfish package. The
measured slowdown is 2% to 10% on different architectures.

> making it run on two cores,

Of course, this would help - to the same extent that starting two
instances of John does. If done properly, the combined c/s rate would
almost double.

It is important to ensure that the order in which candidate passwords
are tried is not significantly affected by the SMP/multi-core support.
Otherwise, such support would be little or no better than starting two
instances of John manually - which is supported already.

> or utilizing SSE2/3 for example could improve performance?

It is straightforward to hack the bitslice DES code currently in John to
use SSE instead of MMX - and I have that code already. For DES-based
crypt(3) hashes, it runs 10% to 40% faster on Pentium 4 (including P4
Celerons and P4 Xeons). For LM hashes, it runs a little bit faster on
some P4 models and a little bit slower on others. On Pentium 3 and on
AMD processors, the SSE code is slower in all cases (for the benchmarks
I've performed or have seen so far).

I imagine that on future processors SSE will provide a bigger advantage.
Also, there's likely room for improvement of the SSE code converted from
MMX.

> I'm interested in other algorithms, not just DES.

Indeed. The use of MMX and/or SSE is obviously beneficial for MD5-based
and similar hashes (MD4, SHA-1) - where several hashes may be computed
in parallel in the vector registers. IIRC, people have been hacking
John to use this approach for raw MD5 and SHA-1 hashes already.

--
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful? Please give your feedback here: http://rate.affero.net/solar
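
As a rough illustration of trying several candidates in parallel in wide
registers, the sketch below shows the bitslice idea in Python. It is not
John's DES code: the four-gate function is a made-up toy, Python integers
stand in for MMX/SSE registers, and the names `scalar_round`,
`to_bitslice`, `from_bitslice`, and `bitslice_round` are hypothetical.
Each word holds one bit position of every candidate, so a single bitwise
operation processes all candidates at once.

```python
def scalar_round(x):
    """Toy 4-bit nonlinear function (NOT DES), one input at a time."""
    a, b, c, d = x & 1, (x >> 1) & 1, (x >> 2) & 1, (x >> 3) & 1
    o0 = a ^ (b & c)
    o1 = b ^ (c | d)
    o2 = c ^ (a & d)
    o3 = d ^ a ^ b
    return o0 | (o1 << 1) | (o2 << 2) | (o3 << 3)


def to_bitslice(values, width=4):
    """Transpose: slices[j] has bit i set iff bit j of values[i] is set."""
    slices = [0] * width
    for i, v in enumerate(values):
        for j in range(width):
            slices[j] |= ((v >> j) & 1) << i
    return slices


def from_bitslice(slices, count):
    """Inverse transpose: rebuild the per-candidate values."""
    return [sum(((s >> i) & 1) << j for j, s in enumerate(slices))
            for i in range(count)]


def bitslice_round(slices):
    """Same gates as scalar_round, applied to every candidate at once."""
    a, b, c, d = slices
    return [a ^ (b & c), b ^ (c | d), c ^ (a & d), d ^ a ^ b]


if __name__ == "__main__":
    values = [i % 16 for i in range(64)]  # 64 "candidates", one per bit lane
    parallel = from_bitslice(bitslice_round(to_bitslice(values)), len(values))
    print(parallel == [scalar_round(v) for v in values])
```

With 64-bit MMX registers the lane count would be 64, and with 128-bit
SSE registers 128; the gate sequence itself does not change, which is
why converting bitslice code from MMX to SSE is mostly mechanical.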