Date: Mon, 25 Jun 2012 08:56:17 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: eliminating barrier between crypt_all() and hash comparison/lookups On Sat, Jan 21, 2012 at 09:41:25PM +0400, Solar Designer wrote: > On Mon, Jan 16, 2012 at 06:25:26PM +0400, Solar Designer wrote: > > Some further OpenMP speedup may be possible through eliminating the > > implicit barrier between crypt_all() and the bitmap lookups loop, but > > this is tricky to implement while preserving the generic program > > structure (not specific to a hash type). Well, maybe crypt_all() could > > use nowait and atomic writes to some flags (as well as write barriers), > > and the lookups loop or rather get_hash*() could spin when it sees a > > flag that is not yet set (after a read barrier, although this should not > > matter on x86). > > I looked into this possibility. No, it won't work without changing the > program structure significantly because nowait may only be used within a > parallel region, which has to be closed (and thus an implicit barrier > introduced) before the end of the function (crypt_all() in this case). Actually, the function holding the parallel region does not have to be crypt_all() itself. It can as well be the caller function in cracker.c. Then crypt_all() would use "#pragma omp for nowait" without the "parallel" keyword. Maybe a format could use a new FMT_* flag to tell cracker.c that its crypt_all() (and cmp_all() and get_hash*() maybe?) should be called from within a parallel region in the caller. Then this would fit in the existing formats interface. The attached standalone SHA-512 cracker demonstrates this approach. It runs slightly faster for me with "nowait" than without. There's some overhead to maintain the computed hash revision numbers, though - when I remove those, it runs faster yet, but then it's unreliable. (To actually trigger the problem in that case, reduce KEYS_PER_CRYPT to be same as thread count and use the reverse order loop in cmp_all() - I wrote that line specifically to make it more likely to hit not-yet-computed hashes.) Better efficiency may be achieved by having per-thread rather than per-hash flags. However, then we have to allocate key/hash indices to threads manually (not rely on OpenMP to do this magic). So we do in fact have the "nowait" alternative to the other non-pretty approach, which I described below: > What we could do is move (or duplicate) some logic from cracker.c, > cmp_*() and get_hash_*() into crypt_all(). The new crypt_all() would > then need to accept e.g. a pointer to struct db_salt, so it would check > if there are possibly any guesses or not before returning control to > cracker.c (which would then identify the specific guessed passwords with > a loop much like it does now - but only if crypt_all() returned that > there might be any). Unfortunately, this mixes up different layers of > the program. While it is no surprise that mixing them should improve > performance at fast hashes, it is bad for code readability and further > maintenance. Also, if implemented the obvious way, we may even get > circular #include's between formats.h and loader.h. Merging these into > one file would probably go against source code readability even further. > > So I haven't decided what to do on this yet. Alexander View attachment "nowait.c" of type "text/plain" (2384 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.