Date: Sun, 8 Jul 2012 04:55:09 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: RAdmin, SIP speedup

>From: Solar Designer [mailto:solar@...nwall.com]
>
>On Sat, Jul 07, 2012 at 09:58:36PM -0500, jfoug wrote:
>> Here is an update to dynamic.
>>
>> $ ../run/john -test=5 -form=dynamic_1010
>> Benchmarking: dynamic_1010 dynamic_1010: RAdmin v2.x MD5 [32/32 128x1
>> (MD5_Body)]... DONE
>> Raw: 2117K c/s real, 2122K c/s virtual
>>
>> $ ../run/john -test=5 -form=dynamic_1011
>> Benchmarking: dynamic_1011 dynamic_1011: RAdmin v2.x MD5 [32/32 128x1
>> (MD5_Body)]... DONE
>> Raw: 2582K c/s real, 2587K c/s virtual
>
>Maybe you should make your dynamic_1011 the new dynamic_1010 instead,
>when you commit this? That is, have just one dynamic format for RAdmin.

If the original change has not been committed yet, then I would deem that the proper change.

>> Func=DynamicFunc__set_input_len_100
>
>Maybe we need to support something like:
>
>Func=DynamicFunc__set_input_len(100)

At the current time, no parameters are possible. I have had ideas on how to add simple looping, variables, and parameters, but have not tried much yet. The problem is that many of these kinds of functions are not generic. They are written specifically for some format, or, more often, specifically to optimize a few formats.

Take, for instance, set_len_32. It was written for formats like md5(md5($s).$p), or any others that put a 32-character hex salt first. What we do is use a flag that tells set_salt() to compute the md5 of the salt and drop its base-16 text over the start of each input. Then we simply set the length to 32 and append the keys (or do whatever other work there is to be done). For most formats, the x86 builds (non-SSE2) do not care about any leftover junk in the buffer past the current length, so that is not cleaned up in the x86 version of set_len_32. However, it is cleaned up in the MMX versions.
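To make the set_len_32 description concrete, here is a minimal C sketch. The input_line struct, BUF_SIZE, and every function name below are hypothetical illustrations, not the actual dynamic code. md5($s) is assumed to be already computed and passed in as 32 hex characters, and the "clean" variant only stands in for the cleanup the MMX builds need; it does not model their real buffer layout.

```c
#include <string.h>

#define BUF_SIZE 256  /* hypothetical per-candidate buffer size */

/* Hypothetical stand-in for one dynamic input buffer. */
typedef struct {
    unsigned char data[BUF_SIZE];
    unsigned int len;  /* number of bytes that take part in the hash */
} input_line;

/* Done once per salt: copy the 32 hex chars of md5($s) over the start
 * of the buffer. Computing the md5 itself is omitted (assumption). */
static void drop_salt_hex(input_line *in, const char *hex32)
{
    memcpy(in->data, hex32, 32);
}

/* x86-style: only the length changes. Stale bytes past it are left in
 * place, since the hashing code reads exactly `len` bytes. */
static void set_len_32_dirty(input_line *in)
{
    in->len = 32;
}

/* Cleaning variant: zero whatever the previous candidate left past
 * byte 32, for code paths that expect zero bytes after the data. */
static void set_len_32_clean(input_line *in)
{
    if (in->len > 32)
        memset(in->data + 32, 0, in->len - 32);
    in->len = 32;
}

/* Append a candidate password after the current data. */
static void append_key(input_line *in, const char *key)
{
    size_t n = strlen(key);
    if (in->len + n > BUF_SIZE)
        n = BUF_SIZE - in->len;  /* truncate rather than overflow */
    memcpy(in->data + in->len, key, n);
    in->len += (unsigned int)n;
}
```

For example, after a salt has been dropped in, appending "longpassword" and then "abc" (calling a set_len_32 variant before each key) leaves both buffers hashing the same 35 bytes. With set_len_32_dirty() a stale 'g' from the first key remains at offset 35, while set_len_32_clean() leaves a zero there. The dirty version saves the memset, which is why it is only acceptable where nothing reads past `len`.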
However, for this new set_len_100, a few different special things happen:

1. The code throws an error if you try to run it in MMX mode.
2. The dirty part of the buffer gets cleaned, since the input is supposed to be null-terminated.

So, if we made this generic, the generic function (say, dyna_set_len_param()) would have to carry all of that conditional logic, would always have to clean, etc. Then what happens when this crypt needs to be supported: md5($p . space_padding_100)? Here null padding would be the wrong thing, so either a special function would need to be written for it, or even more (and much more costly) conditional logic would need to be added to our set_len function to support an optional fill byte.

>> And here is the set_input_len_100(). It now keeps data from the prior
>> password cleaned up. No reason to memset the buffers each time
>> (100bytes is large for the memset).
>
>Perhaps the same applies to _64 as well?

Nope. set_len_64 was made for formats such as md5(md5($s).md5($p)). In this instance, I have a flag that tells set_key to do a hash, storing the result at offset 32 within the buffer. Then we simply overwrite the first 32 bytes with the salt (without adding the null), which is what happens in DynamicFunc__overwrite_salt_to_input1_no_size_fix(). Then set_length_64 simply sets the length value for each line to 64. It does not touch the buffer at all, since it is already properly formatted.

Many of these kinds of functions have been written specifically to optimize a specific hash type. If they can be used for more than one, they are.

>
>> >Benchmarking: RAdmin v2.x MD5 [32/64]... (8xOMP) DONE
>> >Raw: 10754K c/s real, 1344K c/s virtual
>>
>> Yes, it is 'faster', but you are burning all 8 cores, for just a 3x
>> improvement,
>
>Sure. It's this way for many of the fast hashes for which we have
>OpenMP support in jumbo.
>Personally, I wouldn't use this kind of
>parallelization except maybe for really quick tests where I'm lazy to do
>anything smarter and when I have the machine to waste anyway.
>
>BTW, note that this is not a true 8-core machine (it's Bulldozer with
>its 4 modules and 2 cores per module; also it reduces clock rate from
>4.0 GHz to 3.1 GHz or maybe slightly higher when under full load), so
>the perfect speedup would be about 6x. Thus, 3x is "only" twice worse
>than perfect.

My faster systems are the same way: you get about 6x if you saturate them. For fast hashes, I have been running 4 parallel instances on those machines, each with a different data set. That gets me full speed and leaves the machine very usable. When running OMP, I usually do not get the same speed (it is usually slower), and the machine feels sluggish unless I set IDLE=Y in the .conf file, and then performance really takes a hit whenever the machine is in use. The only benefits of running the OMP build are that it uses less memory, since only one instance is running, and that a long sequential task that cannot be split up completes faster.

For slow hashes, yes, OMP does seem to be the better option. At times, I have seen OMP get more performance than running multiple instances (not much, but some).

>> and with the dynamic changes I propose, you are probably only get a
>> 2.5x improvement, again, burning all 8 cores.
>
>I thought dynamic didn't support OpenMP at all. Is this changing?
>That would be very nice, especially for phpass.

No, what I meant was that my RAdmin 2.x change improved the speed by about 20% (my 2120K to 2580K c/s is about 20%). I was just applying that improvement to the prior 3x ratio between dynamic and the native OMP format. The 2.5x was that relationship: dynamic_1011 vs. the OMP native RAdmin format.

>On a related note, you could want to see if SIP is implementable as a
>dynamic too. I think it might be.

I will look at it.

Btw, I did find the problem with the core.
I was 'supposed' to be returning 0 when a format failed to load: each error path printed a message and was supposed to return 0. However, I was returning 1 along with all of those error messages, and 1 is also what the bottom of the function returns to signify success. So loading continued, and when it tried to access the tests array, it was still NULL, because it was never allocated. So I changed the return to 0 for all of those error paths.

I will spend a little more time and try to make the messages more descriptive, at least listing WHICH format the error came from and where that format is defined (such as dynamic_preloads.c or dynamic.conf). When you are testing a single format, you know where the problem is. But when you run john normally, without specifying -form=dynamic_1010, you get all of the error messages with no indication of where to look for the problem.

Also, once I have the 'better' location messages, I will see if all of the exit() calls can be removed from dynamic and the dynamic parser. This has been a thorn in dynamic for a while, and it should not have been left that way in production.

Jim.