Date: Tue, 22 Nov 2011 20:09:43 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA1 SSE2i R&D work

Good stuff. What's the gain for dcc2?

magnum

2011-11-22 19:58, jfoug wrote:
> I have done some R&D on the SSE2i code, for SHA1. Here is the existing
> SSEi data layout for input buffers:
>
> uint32 crypt
>
> Of this, the first 16 uint’s are read only, and contain the data to
> hash, the 0x80 bit, and the bit count of the buffer (standard SHA input
> layout). uint’s 17 to 80 are used as the expansion buffer. This data is
> written to, and then later read from.
>
> I have never really liked this layout, I consider it to be pretty
> wasteful. It can easily be seen, that this expansion buffer, ‘could’ be
> done with only 16 uints, and treated in a circular manner. I had always
> thought this would give an improvement in speed, due to a much smaller
> working set, along with other things, such as improved L1 caching.
>
> This same layout is used within sha1-mmx.S (32 bit SSE), and within
> sse-intrinsics.c. The sha1-mmx.S was complex enough, I was not going
> to make a stab at trying to work with the data layout, but now with
> sse-intrinsics-64.S and sse-intrinsics-32.S allowing the intrinsic code
> to work as fast as the 32 bit hand built asm, and on 64 bits, the
> intrinsic are the ONLY option. I worked at getting sha1 to use uint32
> crypt Then, within sse-intrinsics.c, I made changes, and added a
> new buffer __m128i tmpR[SHA1_SSE_PARA*16]; This buffer is loaded one
> element at a time, and used, the same way that the ‘longer’ buffer
> version would. This required several matched macros. The prior code had
> a loop, that assigned values to all expansion buffer items (from 17, to
> 80), prior to processing. With the circular buffer, I load the array
> element, prior to using it.
> However, due to 4 different ‘prior’
> variables within the data being used to generate the ‘next’ variable,
> the only way I could see to do this, is to have multiple SHA_ROUND
> macros. I built a, b, c, d, then the ‘normal’ one, then an x. The ‘a’
> ROUND is used until we hit array element 3. We pull all of the items
> from the data array. The ‘b’ ROUND is used from element 3 to 8. It
> pulls the first element out of the temp circular buffer, but pulls the
> other 3 out of the data. In the ‘normal’, I have to pull the current
> tmp buffer item into another tmp value, this is because before
> processing, this exact same variable will be overwritten. This variable
> is at our ‘current’ location, and also at the +16 location. The final
> ROUND macro (the SHA_ROUNDx), does not need to compute the expansion, so
> that part is out.
>
> Now, I do not have the Intel compiler, so I am using cygwin GCC,
> building sse-intrinsics.c. It is pretty slow. Also, I can not use
> SHA1_PARA 3, it slows things down a LOT, but now with improved memory
> usage, PARA=3 may be useful in the Intel produced code.
>
> However, I made changes to rawSHA1_fmt.c, so that I can comment 1 line
> out (or leave it in), and can build for either legacy SHA1 layout, or
> for the new layout. Both functions are in sse-intrinsics.c, so it can be
> built either way. Here are the timings, with little work being done for
> optimizations, I simply ‘got it working’, and wanted to see some speeds,
> more for proof of concept at this moment.
>
> $ ../run/john -test=5 -form=raw-sha1
> Benchmarking: Raw SHA-1 [SSE2i 8x]... DONE
> Raw: 5207K c/s
>
> $ ../run/john -test=5 -form=raw-sha1
> Benchmarking: Raw SHA-1 [SSE2i type-2 8x]... DONE
> Raw: 6909K c/s
>
> So by this quick proof of concept, I am getting about 1/3 speedup. Not
> bad.
>
> I have not put code up on the wiki yet. I would like to look a little
> deeper, and make sure I have not overcomplicated things.
> I may get the
> changes off to Simon and Magnum, to get them to look, and to also get a
> .S file, possibly several with different SHA_PARA values being set.
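
For anyone following along, the circular expansion buffer described in the quoted message can be sketched in plain scalar C. This is only an illustration of the indexing, not the code from sse-intrinsics.c: it uses uint32_t instead of __m128i, has no SHA1_SSE_PARA interleaving, and copies the 16 input words into the ring up front so that a single round body is enough (the quoted message keeps the input read-only, which is why it needs several ROUND variants). The names rol32, sha1_block_circular and sha1_short are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Compress one block. data[] holds the 16 input words (already big-endian
 * decoded). The message schedule lives in a 16-word ring, not 80 words. */
static void sha1_block_circular(uint32_t state[5], const uint32_t data[16])
{
    uint32_t ring[16];
    uint32_t a = state[0], b = state[1], c = state[2];
    uint32_t d = state[3], e = state[4];

    /* Simplification (assumption of this sketch): copy the input into the ring. */
    memcpy(ring, data, sizeof ring);

    for (int t = 0; t < 80; t++) {
        uint32_t f, k, wt;

        if (t < 16) {
            wt = ring[t];
        } else {
            /* W[t] = rol1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]), indices mod 16.
             * Slot t & 15 is both W[t-16] and the place W[t] is stored, so the
             * old value is read before it is overwritten. */
            wt = rol32(ring[(t - 3) & 15] ^ ring[(t - 8) & 15] ^
                       ring[(t - 14) & 15] ^ ring[t & 15], 1);
            ring[t & 15] = wt;
        }

        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }

        uint32_t tmp = rol32(a, 5) + f + e + k + wt;
        e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
    }

    state[0] += a; state[1] += b; state[2] += c;
    state[3] += d; state[4] += e;
}

/* Demo helper: hash a message of at most 55 bytes (one padded block). */
static void sha1_short(const unsigned char *msg, size_t len, uint32_t digest[5])
{
    unsigned char block[64] = {0};
    uint32_t data[16];
    uint64_t bits = (uint64_t)len * 8;

    assert(len <= 55);
    memcpy(block, msg, len);
    block[len] = 0x80;                      /* padding marker */
    block[62] = (unsigned char)(bits >> 8); /* bit count, big-endian */
    block[63] = (unsigned char)bits;

    for (int i = 0; i < 16; i++)
        data[i] = (uint32_t)block[4 * i] << 24 | (uint32_t)block[4 * i + 1] << 16 |
                  (uint32_t)block[4 * i + 2] << 8 | (uint32_t)block[4 * i + 3];

    digest[0] = 0x67452301u; digest[1] = 0xEFCDAB89u; digest[2] = 0x98BADCFEu;
    digest[3] = 0x10325476u; digest[4] = 0xC3D2E1F0u;
    sha1_block_circular(digest, data);
}
```

Calling sha1_short on "abc" should give the standard test vector a9993e36 4706816a ba3e2571 7850c26c 9cd0d89d. The working set for the schedule drops from 80 words to 16 per lane, which is the effect the quoted message attributes its speedup to.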