john-dev - Re: bitslice SHA-256



Message-ID: <20150530002433.GA10980@openwall.com>
Date: Sat, 30 May 2015 03:24:33 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

On Sat, May 30, 2015 at 01:59:46AM +0200, magnum wrote:
> On 2015-05-29 20:13, Alain Espinosa wrote:
> >Hand-crafted AVX2 assembly code done for "normal" SHA256. Performance
> >in a core i5-4670 3.4GHz, single thread:
> >
> >- 23.7 millions keys per second. 87% faster than the bitslice one
> >with AVX2 intrinsics.
> 
> Alain, Solar,
> 
> The bitslice track is very interesting, but on a side note: What's the 
> main cause for this huge difference between normal SHA256 implemented in 
> assembly versus intrinsics? Perhaps the optimizer makes some poor 
> choices? Could we learn something from analyzing compiled intrinsics and 
> tweak the source a little?

Perhaps.

> OTOH I think the JtR implementation of SHA256 is a lot faster than 12.5M 
> keys/s - benchmarking on well (i7-4770K 3.5GHz) shows over 19M, but we 
> might not be comparing apples to apples.

The i5-4670 is documented to have a max turbo of 3.8 GHz, and the
i7-4770K has 3.9 GHz (confirmed by my own testing).  Comparing single
thread speeds on otherwise idle CPUs, normalized by max turbo clock, we get:

(23.7/3.8) / (19.3/3.9) = 1.26

so Alain's assembly code is about 26% faster per clock than our intrinsics.
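
For anyone who wants to redo this normalization with other numbers, here is
a minimal sketch in Python.  The function name per_clock_ratio is made up for
illustration, and the calculation assumes both CPUs actually run at their max
turbo clocks during a single-thread benchmark, which is an assumption rather
than something measured here.

```python
def per_clock_ratio(speed_a, ghz_a, speed_b, ghz_b):
    """Return how much faster A is than B per clock cycle.

    speed_* are hash rates in millions of keys per second,
    ghz_* are the clock rates the CPUs are assumed to run at.
    """
    return (speed_a / ghz_a) / (speed_b / ghz_b)


# Alain's assembly on i5-4670 (assumed 3.8 GHz turbo) versus
# JtR intrinsics on i7-4770K (assumed 3.9 GHz turbo).
ratio = per_clock_ratio(23.7, 3.8, 19.3, 3.9)

# Raw ratio, ignoring the clock rate difference, for comparison.
raw = 23.7 / 19.3

print(f"per-clock: {ratio:.2f}, raw: {raw:.2f}")
```

The raw ratio comes out slightly lower than the per-clock one because the
i7-4770K has the higher turbo clock.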

Aleksey recently reported a 22% speedup for raw-sha256 relative to
what's committed in jumbo, using his john-devkit to generate a
C+intrinsics replacement raw-sha256 format.  This is on i7-3770.
He did not report the specific speed figures yet, just the percentage.

Alexander
