Date: Sun, 5 May 2013 11:39:59 -0400
From: <jfoug@....net>
To: john-dev@...ts.openwall.com
Subject: SSE code in pbkdf2_hmac_sha1 and pbkdf2_hmac_sha256 code

I have modified both of these headers. They now build with either oSSL code or SSE intrinsic code, depending upon the build target.

The SSE code is a little different from the oSSL code: multiple candidates are bundled with each call. This requires a few 'bundling' code changes and #defines in crypt_all. However, I tried to hide as many of the differences as possible inside the actual pbkdf2 function calls. The SSE code reads all input data from 'flat' passwords, just like the oSSL code. They are bundled into an array of pointers, but the original data IN the format does not change. Also, the output crypt buffers are written 'flat', and not in interspersed SSE format. So the original 'flat' oSSL crypt buffers the format uses are still the same. Marshalling the data does slow things down just a touch, but the cost is so insignificant in comparison to the cost of the pbkdf2 that it cannot really be measured, and the reduction in complexity is pretty extreme.

At this time, there are at least 2 formats which have not been ported. DMG is one. That format will likely require some re-architecture to work. The other format is mscash2. This is the actual format where I first did the SSE porting of pbkdf2. It is also the format where I doubled the speed of processing PBKDF2 (by reducing the first crypt of each ipad/opad). However, all of that was done internally in that format. It will take quite a bit of work to carefully remove all of that code and get back to a 'clean', thin format which can use the new, probably better, logic in the pbkdf2_hmac_sha1.h file.

I also wrote pbkdf2_hmac_sha1.h to do multiple buffers (hashes requiring more than 20 bytes), and also to do the 'skipping' code, which recently sped up the zip format.
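The bundling described earlier in this message (flat passwords gathered into an array of pointers, results written back flat) might look roughly like the sketch below. Everything named here is hypothetical and is not taken from the actual headers: LANES, OUT_WORDS, the 32-byte key slots, toy_kernel and bundled_crypt are illustration only. In particular toy_kernel is just a byte sum standing in for the real SIMD pbkdf2; it exists only to produce an interspersed output layout that the wrapper then has to flatten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 4      /* hypothetical: candidates handled per bundled call */
#define OUT_WORDS 5  /* 20 bytes of output per candidate, as with SHA-1 */

/* Stand-in for the SIMD kernel. NOT real PBKDF2: it only sums the key bytes.
 * Word w of lane l is written to out[w * LANES + l], i.e. the lanes are
 * interspersed the way a SIMD kernel would naturally leave them. */
static void toy_kernel(const unsigned char *keys[LANES],
                       const size_t lens[LANES],
                       uint32_t out[OUT_WORDS * LANES])
{
    for (int l = 0; l < LANES; l++) {
        uint32_t h = 0;
        for (size_t i = 0; i < lens[l]; i++)
            h += keys[l][i];
        for (int w = 0; w < OUT_WORDS; w++)
            out[w * LANES + l] = h + (uint32_t)w;
    }
}

/* Wrapper: flat keys in, flat results out. The format's own buffers
 * (flat_keys, flat_out) keep their ordinary non-SIMD layout. */
static void bundled_crypt(char flat_keys[][32], int first,
                          uint32_t flat_out[][OUT_WORDS])
{
    const unsigned char *ptrs[LANES];
    size_t lens[LANES];
    uint32_t lanes_out[OUT_WORDS * LANES];

    assert(first >= 0);

    /* Marshal in: point at the flat keys, do not copy or re-lay them out. */
    for (int l = 0; l < LANES; l++) {
        ptrs[l] = (const unsigned char *)flat_keys[first + l];
        lens[l] = strlen(flat_keys[first + l]);
    }

    toy_kernel(ptrs, lens, lanes_out);

    /* Marshal out: undo the lane interleaving into flat per-candidate rows. */
    for (int l = 0; l < LANES; l++)
        for (int w = 0; w < OUT_WORDS; w++)
            flat_out[first + l][w] = lanes_out[w * LANES + l];
}
```

A crypt_all written this way would call bundled_crypt once per group of LANES candidates, stepping `first` by LANES, while the compare functions keep reading flat_out exactly as they would in a non-SSE build.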

The zip format, now using pbkdf2_hmac_sha1.h, is twice as fast as the recent change that used gladman's code. The gladman code, doing 'proper documented' hmac type code, is twice as slow as the optimized version. Also, the optimized version will use SSE2 whenever the build allows it. On my PC, zip went from 650 to about 1500 with the original change, then to about 3100 after my changes (oSSL build), and to about 9k for SSE2 builds. So from 650 to 9k is not a bad speedup ;)

Jim.
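
For readers who want to see where a factor of two can come from, here is a minimal scalar sketch of PBKDF2-HMAC-SHA1 with the ipad/opad blocks compressed only once per key. A textbook HMAC over a 20-byte message costs four SHA-1 compressions (ipad block, message block, opad block, inner digest block); starting from saved ipad/opad states leaves two per iteration. The sketch also handles output longer than 20 bytes and a byte offset to skip. This is not the code in pbkdf2_hmac_sha1.h: the names sha1_block, finish_from and pbkdf2_sha1_sketch are hypothetical, and reading 'skipping' as "do not compute output blocks that lie wholly before the requested offset" is an assumption. It also assumes a key of at most 64 bytes and a salt of at most 51 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One SHA-1 compression: fold a 64-byte block into the 5-word state. */
static void sha1_block(uint32_t st[5], const unsigned char blk[64])
{
    uint32_t w[80], a = st[0], b = st[1], c = st[2], d = st[3], e = st[4];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4*i] << 24 | (uint32_t)blk[4*i+1] << 16 |
               (uint32_t)blk[4*i+2] << 8 | (uint32_t)blk[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = ROL(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    for (int i = 0; i < 80; i++) {
        uint32_t f, k, t;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = ROL(a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROL(b, 30); b = a; a = t;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d; st[4] += e;
}

/* Finish a hash whose first 64-byte block is already folded into `from`:
 * pad the short message (len <= 55), compress once, write the 20-byte digest. */
static void finish_from(const uint32_t from[5], const unsigned char *msg,
                        size_t len, unsigned char out[20])
{
    uint32_t st[5], bits = (uint32_t)(64 + len) * 8;
    unsigned char blk[64] = {0};
    assert(len <= 55);
    memcpy(st, from, sizeof st);
    memcpy(blk, msg, len);
    blk[len] = 0x80;
    blk[62] = (unsigned char)(bits >> 8);
    blk[63] = (unsigned char)bits;
    sha1_block(st, blk);
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 4; j++)
            out[4*i + j] = (unsigned char)(st[i] >> (24 - 8*j));
}

/* Writes bytes [skip, skip + outlen) of the PBKDF2-HMAC-SHA1 derived key. */
static void pbkdf2_sha1_sketch(const unsigned char *key, size_t klen,
                               const unsigned char *salt, size_t slen,
                               unsigned iters, unsigned char *out,
                               size_t outlen, size_t skip)
{
    static const uint32_t iv[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE,
                                   0x10325476, 0xC3D2E1F0};
    uint32_t ipad[5], opad[5];
    unsigned char blk[64], msg[55], u[20], t[20], tmp[20];
    assert(klen <= 64 && slen + 4 <= 55 && iters >= 1);

    /* Compress the key-dependent ipad/opad blocks once, not once per HMAC. */
    memcpy(ipad, iv, sizeof iv);
    memcpy(opad, iv, sizeof iv);
    memset(blk, 0x36, 64);
    for (size_t i = 0; i < klen; i++) blk[i] ^= key[i];
    sha1_block(ipad, blk);
    memset(blk, 0x5c, 64);
    for (size_t i = 0; i < klen; i++) blk[i] ^= key[i];
    sha1_block(opad, blk);

    memcpy(msg, salt, slen);
    /* Start at the 20-byte block containing `skip`; earlier blocks are skipped. */
    for (size_t pos = skip - skip % 20; pos < skip + outlen; pos += 20) {
        uint32_t idx = (uint32_t)(pos / 20) + 1;   /* 1-based block index */
        for (int j = 0; j < 4; j++)
            msg[slen + j] = (unsigned char)(idx >> (24 - 8*j));
        finish_from(ipad, msg, slen + 4, tmp);
        finish_from(opad, tmp, 20, u);
        memcpy(t, u, 20);
        for (unsigned r = 1; r < iters; r++) {     /* 2 compressions per round */
            finish_from(ipad, u, 20, tmp);
            finish_from(opad, tmp, 20, u);
            for (int j = 0; j < 20; j++) t[j] ^= u[j];
        }
        for (size_t j = 0; j < 20; j++)            /* keep only requested bytes */
            if (pos + j >= skip && pos + j < skip + outlen)
                out[pos + j - skip] = t[j];
    }
}
```

With skip = 0 this behaves like ordinary PBKDF2. A caller that only needs a few bytes near the end of a long derived key can pass a non-zero skip and pay only for the blocks that overlap the requested range.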