john-dev - RE: RAdmin, SIP speedup

Message-ID: <02df01cd5cef$c29cfda0$47d6f8e0$@net>
Date: Sun, 8 Jul 2012 04:55:09 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: RAdmin, SIP speedup

>From: Solar Designer [mailto:solar@...nwall.com]
>
>On Sat, Jul 07, 2012 at 09:58:36PM -0500, jfoug wrote:
>> Here is an update to dynamic.
>>
>> $ ../run/john -test=5 -form=dynamic_1010
>> Benchmarking: dynamic_1010 dynamic_1010: RAdmin v2.x MD5 [32/32 128x1
>> (MD5_Body)]... DONE
>> Raw:    2117K c/s real, 2122K c/s virtual
>>
>> $ ../run/john -test=5 -form=dynamic_1011
>> Benchmarking: dynamic_1011 dynamic_1011: RAdmin v2.x MD5 [32/32 128x1
>> (MD5_Body)]... DONE
>> Raw:    2582K c/s real, 2587K c/s virtual
>
>Maybe you should make your dynamic_1011 the new dynamic_1010 instead,
>when you commit this?  That is, have just one dynamic format for RAdmin.

If the original change has not been made yet, then I agree that would be the
proper change.

>> Func=DynamicFunc__set_input_len_100
>
>Maybe we need to support something like:
>
>Func=DynamicFunc__set_input_len(100)

At the current time, no parameters are possible.  I have had ideas on how
to add simple looping, variables and parameters, but have not tried much
yet.
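
For illustration only, here is a rough C sketch of how a script line in
the proposed form might be split into a function name and one integer
parameter.  parse_func_param() is a hypothetical helper, not something
dynamic has today; the harder question is what a generic function would
then do with that value.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a one-parameter script call such as
 * "Func=DynamicFunc__set_input_len(100)".  Copies the function name
 * into name[] and the integer argument into *param.
 * Returns 1 on success, 0 if the line is not in that form. */
static int parse_func_param(const char *line, char *name, size_t name_sz,
                            int *param)
{
    const char *start, *open;
    char *end;
    size_t n;
    long v;

    assert(line != NULL && name != NULL && param != NULL);
    if (strncmp(line, "Func=", 5) != 0)
        return 0;
    start = line + 5;
    open = strchr(start, '(');
    if (!open)
        return 0;                  /* plain call, no parameter list */
    n = (size_t)(open - start);
    if (n == 0 || n >= name_sz)
        return 0;                  /* empty name, or no room to copy it */
    v = strtol(open + 1, &end, 10);
    if (end == open + 1 || strcmp(end, ")") != 0)
        return 0;                  /* need exactly one integer, then ')' */
    if (v < 0 || v > INT_MAX)
        return 0;
    memcpy(name, start, n);
    name[n] = '\0';
    *param = (int)v;
    return 1;
}
```

With this, "Func=DynamicFunc__set_input_len(100)" yields the name
DynamicFunc__set_input_len and the parameter 100, while the existing
"Func=DynamicFunc__set_input_len_100" form is simply not matched.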

The problem is that many of these types of functions are not generic.  They
are written specifically for some format, or, more often, specifically to
optimize a few formats.

Take, for instance, set_len_32.  It was written for formats like
md5(md5($s).$p), or any others that put a 32-character hex salt hash first.

So what we do is use a flag that tells set_salt() to compute the md5 of the
salt and drop its base-16 (hex) form over the start of each input.  Then we
simply set the length to 32 and append the keys (or do whatever other work
there is to be done).  For most formats, the x86 (non-SSE2) builds do not
care about any leftover junk in the buffer past the current length.  Thus,
it is not cleaned up in the x86 version of set_len_32.  However, it is
cleaned up in the MMX versions.
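
A simplified C sketch of the x86 flavor of this idea.  The buffer layout,
MAX_KEYS, BUF_SZ and the helper names are assumptions for illustration, not
dynamic's real structures, and the md5 of the salt is assumed to be
computed elsewhere and passed in as 32 hex characters.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4     /* hypothetical batch size */
#define BUF_SZ   128   /* hypothetical per-candidate buffer size */

struct input {
    unsigned char buf[BUF_SZ];
    unsigned int len;  /* only bytes [0, len) are meaningful */
};

static struct input inputs[MAX_KEYS];

/* set_salt() step: copy the 32 hex chars of md5($s) (computed
 * elsewhere) over the start of every input buffer. */
static void write_salt_hex(const char salt_hex[32])
{
    for (int i = 0; i < MAX_KEYS; i++)
        memcpy(inputs[i].buf, salt_hex, 32);
}

/* x86 flavor of set_len_32: only the length changes.  Bytes past 32
 * may still hold junk from an earlier candidate; nothing reads them. */
static void set_len_32(void)
{
    for (int i = 0; i < MAX_KEYS; i++)
        inputs[i].len = 32;
}

/* Append a candidate password after the current length. */
static void append_key(int i, const char *key)
{
    size_t n = strlen(key);

    assert(inputs[i].len + n <= BUF_SZ);
    memcpy(inputs[i].buf + inputs[i].len, key, n);
    inputs[i].len += (unsigned int)n;
}
```

In this flavor nothing past len is ever read, so bytes left behind by a
longer previous candidate are harmless.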

However, for this new set_len_100, a few different special things happen:
1) the code throws an error if you try to run it in MMX mode, and 2) the
dirty part of the buffer gets cleaned, since the input is supposed to be
null padded.
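
A minimal C sketch of the cleanup, assuming each candidate buffer starts
zeroed and remembers how far earlier passwords reached.  The struct, the
dirty field and the function names are assumptions, not dynamic's actual
layout.  Only the stale tail is re-zeroed, so the usual case avoids a full
100-byte memset.

```c
#include <assert.h>
#include <string.h>

#define PAD_LEN 100   /* password is hashed null padded to 100 bytes */

/* Hypothetical per-candidate buffer; must start out all zero. */
struct input100 {
    unsigned char buf[PAD_LEN];
    unsigned int len;    /* current length */
    unsigned int dirty;  /* bytes [0, dirty) may hold password data */
};

/* set_key: copy the password in and remember how far it reached. */
static void set_key100(struct input100 *in, const char *key)
{
    size_t n = strlen(key);

    assert(n <= PAD_LEN);
    memcpy(in->buf, key, n);
    in->len = (unsigned int)n;
    if (in->dirty < n)
        in->dirty = (unsigned int)n;
}

/* set_input_len_100 analogue: refuse MMX mode, zero only the bytes a
 * longer earlier password left behind, then set the length to 100. */
static int set_input_len_100(struct input100 *in, int mmx_mode)
{
    if (mmx_mode)
        return 0;  /* the real code reports an error here */
    if (in->dirty > in->len)
        memset(in->buf + in->len, 0, in->dirty - in->len);
    in->dirty = in->len;
    in->len = PAD_LEN;
    return 1;
}
```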

So, if we made this generic, the generic function (dyna_set_len_param())
would have to contain all of that conditional logic, would always have to
clean the buffer, etc.

Then, what happens when this crypt needs to be supported:
md5($p . space_padding_100)?  Here null padding would be the wrong thing, so
either a special function would have to be written for it, or even more (and
much more costly) conditional logic would have to be added to our set_len
function to support an optional fill byte.
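
To make the cost concrete, a hypothetical C sketch of such a generic
helper.  The name follows dyna_set_len_param() above, but the parameters
(clean, fill, mmx_mode, mmx_allowed) are assumptions; the point is that
every call walks these branches, even for formats that need none of them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical generic replacement for the fixed-length helpers.
 * The caller must guarantee that buf holds at least target bytes.
 * Returns 0 on error, 1 on success. */
static int set_len_generic(unsigned char *buf, unsigned int *len,
                           unsigned int target, int clean,
                           unsigned char fill, int mmx_mode,
                           int mmx_allowed)
{
    assert(buf != NULL && len != NULL);
    if (mmx_mode && !mmx_allowed)
        return 0;                          /* e.g. the _100 case */
    if (clean && *len < target)
        memset(buf + *len, fill, target - *len);  /* 0 or ' ' padding */
    *len = target;
    return 1;
}
```

A dedicated set_len_32 or set_len_64 skips all of this and just stores the
length.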

>> And here is the set_input_len_100().  It now keeps data from the prior
>> password cleaned up.  No reason to memset the buffers each time
>> (100bytes is large for the memset).
>
>Perhaps the same applies to _64 as well?

Nope.  set_len_64 was made for functions such as md5(md5($s).md5($p)).  In
this case, I have a flag that tells set_key to compute the hash and store
the result at offset 32 within the buffer.  Then we simply overwrite the
first 32 bytes with the salt (without adding the null), which is what
DynamicFunc__overwrite_salt_to_input1_no_size_fix() does.  Then
set_length_64 simply sets the length value for each line to 64.  It does not
touch the buffer at all, since the buffer is already properly formatted.
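
The same flow, sketched in C.  The struct, BUF_SZ and the helper names are
assumptions for illustration (only the order of operations comes from the
description above), and both md5 values are assumed to be computed
elsewhere and passed in as 32 hex characters.

```c
#include <assert.h>
#include <string.h>

#define BUF_SZ 128   /* hypothetical per-candidate buffer size */

struct input {
    unsigned char buf[BUF_SZ];
    unsigned int len;
};

/* set_key step: the 32 hex chars of md5($p) are stored at offset 32. */
static void store_pass_hex(struct input *in, const char pass_hex[32])
{
    assert(in != NULL);
    memcpy(in->buf + 32, pass_hex, 32);
}

/* Same idea as DynamicFunc__overwrite_salt_to_input1_no_size_fix():
 * write the 32 hex chars of md5($s) over bytes 0..31, add no null,
 * and leave the length alone. */
static void overwrite_salt_hex(struct input *in, const char salt_hex[32])
{
    assert(in != NULL);
    memcpy(in->buf, salt_hex, 32);
}

/* set_len_64: the buffer already reads md5($s).md5($p) in hex, so
 * only the length is updated; the buffer itself is not touched. */
static void set_len_64(struct input *in)
{
    assert(in != NULL);
    in->len = 64;
}
```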

Many of these functions were written specifically to optimize a particular
hash type.  Where they can be used for more than one, they are.

>
>> >Benchmarking: RAdmin v2.x MD5 [32/64]... (8xOMP) DONE
>> >Raw:    10754K c/s real, 1344K c/s virtual
>>
>> Yes, it is 'faster', but you are burning all 8 cores, for just a 3x
>> improvement,
>
>Sure.  It's this way for many of the fast hashes for which we have
>OpenMP support in jumbo.  Personally, I wouldn't use this kind of
>parallelization except maybe for really quick tests where I'm lazy to do
>anything smarter and when I have the machine to waste anyway.
>
>BTW, note that this is not a true 8-core machine (it's Bulldozer with
>its 4 modules and 2 cores per module; also it reduces clock rate from
>4.0 GHz to 3.1 GHz or maybe slightly higher when under full load), so
>the perfect speedup would be about 6x.  Thus, 3x is "only" twice worse
>than perfect.

My faster systems are the same way.  You get about 6x if you saturate all
the cores.  For fast hashes, I have been running 4 parallel instances on
those machines, each with a different data set.  That gets me full speed and
leaves the machine very usable.  When running OMP, I usually do not get the
same speed (it is usually slower), and the machine feels sluggish unless I
set IDLE=Y in the .conf file, and then performance really takes a hit
whenever the machine is in use.  The only benefits of running the OMP build
are that it uses less memory, since only one instance is running, and that
if there is some long sequential task that cannot be split up, it completes
that one task faster.

For slow hashes, yes, OMP does seem to be a better option. At times, I have
seen OMP get more performance than running multiple instances (not much, but
some).

>> and with the dynamic changes I propose, you are probably only get a
>> 2.5x improvement, again, burning all 8 cores.
>
>I thought dynamic didn't support OpenMP at all.  Is this changing?
>That would be very nice, especially for phpass.

No, what I meant was that my RAdmin 2.x change improved the speed by about
20% (from 2120K to 2580K c/s).  I was just applying that improvement to the
prior 3x ratio between dynamic and the native OMP format.  The 2.5x was that
ratio: dynamic_1011 vs. the OMP native RAdmin.
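
The arithmetic, using the benchmark figures quoted at the top of this
message (2117K and 2582K c/s real) and the 3x ratio from earlier in the
thread.  speed_ratio() is just an illustrative helper.

```c
#include <assert.h>

/* Ratio of two benchmark speeds given in the same units (e.g. K c/s). */
static double speed_ratio(double before, double after)
{
    assert(before > 0.0);
    return after / before;
}
```

speed_ratio(2117, 2582) is about 1.22, and 3 / 1.22 is about 2.46, which
rounds to the 2.5x mentioned above.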

>On a related note, you could want to see if SIP is implementable as a
>dynamic too.  I think it might be.

I will look at it.


Btw, I did find the problem with the core.  I was supposed to return 0 when
a format failed to load: an error message was printed and 0 should have
been returned.  However, I was returning 1 along with all of those error
messages, and 1, which is also what the bottom of the function returns,
signifies success.  So loading continued, and when it tried to access the
test array structure, it was still NULL, because it had never been
allocated.  So I had to change all of those error returns to 0.
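
A small C sketch of that return-value convention.  The struct names, the
script_ok flag and the message text are hypothetical; only the "0 on a
failed load, 1 on success" rule comes from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a format and its self-test vectors. */
struct test_vec { const char *ciphertext, *plaintext; };

struct fmt_sketch {
    const char *label;
    struct test_vec *tests;      /* NULL until loading succeeds */
};

/* Convention: return 1 on success, 0 on failure.  The bug was error
 * paths that printed a message but returned 1, i.e. "success". */
static int load_format(struct fmt_sketch *f, int script_ok)
{
    if (!script_ok) {
        fprintf(stderr, "%s: bad format definition\n", f->label);
        return 0;                /* previously 'return 1' */
    }
    /* room for one vector plus a zeroed terminator entry */
    f->tests = calloc(2, sizeof *f->tests);
    return f->tests != NULL;
}

/* Caller: only use the test array after a success return. */
static int init_format(struct fmt_sketch *f, int script_ok)
{
    if (!load_format(f, script_ok))
        return 0;                /* skip this format rather than crash */
    assert(f->tests != NULL);
    return 1;
}
```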

I will spend a little more time and try to make the messages more
descriptive, at least listing WHICH format the error came from and where
the format is defined (such as dynamic_preloads.c or dynamic.conf).  When
you are testing a single format, you know where the problem is.  But when
you run john normally, without specifying -form=dynamic_1010, you get all
the error messages with no indication of where to look for the problem.
Also, once I have the 'better' location message, I will see if all of the
exit() calls can be removed from dynamic and the dynamic parser.  This has
been a thorn in dynamic for a while, and should not have made it into
production that way.
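
One possible shape for such a message, as a C sketch.  dyn_error() and its
arguments are hypothetical: it prefixes the text with the format label and
where that format was defined, prints it, and returns 0 so a caller can
return "failed to load" instead of calling exit().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical reporting helper.  Builds "label (origin): message" in
 * msg[], prints it to stderr, and returns 0 (the failed-load value). */
static int dyn_error(char *msg, size_t msg_sz, const char *label,
                     const char *origin, const char *fmt, ...)
{
    va_list ap;
    size_t used;

    assert(msg != NULL && msg_sz > 0);
    snprintf(msg, msg_sz, "%s (%s): ", label, origin);
    used = strlen(msg);              /* msg_sz - 1 if already truncated */
    va_start(ap, fmt);
    vsnprintf(msg + used, msg_sz - used, fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", msg);
    return 0;
}
```

A failing loader could then end with something like
return dyn_error(buf, sizeof buf, "dynamic_1010", "dynamic_preloads.c",
"unknown function %s", name); so the message says which format broke and
where to look.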

Jim.