Date: Sat, 23 May 2015 18:45:11 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: cycle around crypt_all() body in raw-sha512

On Sat, May 23, 2015 at 04:22:33PM +0300, Aleksey Cherepanov wrote:
> On Sat, May 23, 2015 at 02:27:47PM +0300, Aleksey Cherepanov wrote:
> > On Sat, May 23, 2015 at 10:55:38AM +0800, Lei Zhang wrote:
> > > I managed to add interleaving to SHA256 & SHA512, but the work is incomplete yet. When the interleaving factor is set other than 1, SHA256 works with a few formats, and SHA512 only works with sapH currently. Below are some statistics obtained from experimenting various interleaving factors:
> > 
> > I am trying interleave in john-devkit on raw-sha512 with sse.

While interleaving gives me slowdowns, I also tried wrapping the body
of crypt_all() in a loop (cycle).

Results with the loop running a fixed N times (shown as xN),
regardless of count:

x1
Raw:	2248K c/s real, 2248K c/s virtual
Raw:	2290K c/s real, 2295K c/s virtual

x2
Raw:	2361K c/s real, 2365K c/s virtual
Raw:	2403K c/s real, 2403K c/s virtual

x3
Raw:	2375K c/s real, 2380K c/s virtual
Raw:	2414K c/s real, 2414K c/s virtual

x4
Raw:	2340K c/s real, 2345K c/s virtual
Raw:	2414K c/s real, 2419K c/s virtual

x10
Raw:	2424K c/s real, 2429K c/s virtual

x15
Raw:	2366K c/s real, 2366K c/s virtual
Raw:	2429K c/s real, 2434K c/s virtual

x16
Raw:	2283K c/s real, 2288K c/s virtual
Raw:	2373K c/s real, 2373K c/s virtual
Raw:	2425K c/s real, 2425K c/s virtual

x20
Raw:	2324K c/s real, 2324K c/s virtual
Raw:	2438K c/s real, 2438K c/s virtual

x32
Raw:	2373K c/s real, 2373K c/s virtual
Raw:	2424K c/s real, 2424K c/s virtual

There are fluctuations, but the speed is generally better than at x1.
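
For reference, the fixed-N variant measured above can be sketched roughly as follows. This is only an illustration, not the code john-devkit generates: VSIZE, N, the two buffers, and hash_one_vector() are hypothetical names, and the "hash" is a placeholder rather than SHA-512.

```c
#include <assert.h>
#include <stddef.h>

#define VSIZE 2  /* assumption: two 64-bit lanes per SSE vector */
#define N 4      /* the "xN" factor: fixed number of loop iterations */

/* Hypothetical buffers, sized for N vectors of VSIZE candidates each. */
static unsigned long long saved_key[N * VSIZE];
static unsigned long long crypt_out[N * VSIZE];

/* Stand-in for the original crypt_all() body: handles one vector,
 * i.e. VSIZE candidates. The bitwise NOT is a placeholder, not SHA-512. */
static void hash_one_vector(size_t index)
{
    for (size_t lane = 0; lane < VSIZE; lane++) {
        size_t i = index * VSIZE + lane;
        crypt_out[i] = ~saved_key[i];
    }
}

/* The body wrapped in a loop that always runs N times,
 * disregarding count. */
static int crypt_all_fixed(int count)
{
    assert(count >= 0);
    for (size_t index = 0; index < N; index++)
        hash_one_vector(index);
    return count;
}
```

Note that with this shape every call hashes N * VSIZE candidates even when count is small, which matters for the self-test behaviour described next.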

x32 should not fit into L1 cache. I tried higher values too, but a
constant iteration count seems to be a bad fit because the self tests
grow count gradually, starting from small values:

crypt_all(count = 2): 
crypt_all(count = 3): 
crypt_all(count = 4): 
crypt_all(count = 5): 
crypt_all(count = 7): 
crypt_all(count = 10): 
[...]
crypt_all(count = 699914): 
crypt_all(count = 1049870): 
crypt_all(count = 1574804): 
crypt_all(count = 2000000): 

So a loop with the condition index < count / $vsize + 1 may be used
instead. It gives good results even at x1M (times 2 due to SSE, so 2M
candidates in total; a bigger crypt_out buffer then has to be allocated
in dynamic memory): ~2400K c/s. So it seems that the data cache does
not matter much here. My guess: switching between crypt_all() and
candidate generation in the self tests is costly due to the code cache,
and the loop in crypt_all() smooths that out. Any comments?
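
A minimal sketch of the count-dependent loop, under the same caveats: VSIZE, vectors_for(), alloc_crypt_out(), and crypt_all_scaled() are hypothetical names, and the loop body only writes a placeholder value instead of computing SHA-512. The point is the loop bound and the matching buffer size.

```c
#include <assert.h>
#include <stdlib.h>

#define VSIZE 2  /* assumption: two 64-bit lanes per SSE vector */

/* Number of iterations for the condition index < count / VSIZE + 1. */
static size_t vectors_for(size_t count)
{
    return count / VSIZE + 1;
}

/* Allocate crypt_out dynamically, large enough for the biggest count.
 * The "+ 1" in the bound can touch one vector past count, so the
 * buffer is sized from vectors_for(), not from max_count directly. */
static unsigned long long *alloc_crypt_out(size_t max_count)
{
    return calloc(vectors_for(max_count) * VSIZE,
                  sizeof(unsigned long long));
}

/* Loop whose trip count follows count. Returns the number of output
 * slots written. The stored value is a placeholder, not a hash. */
static size_t crypt_all_scaled(size_t count, unsigned long long *out)
{
    size_t index;
    for (index = 0; index < count / VSIZE + 1; index++) {
        for (size_t lane = 0; lane < VSIZE; lane++)
            out[index * VSIZE + lane] = index * VSIZE + lane + 1;
    }
    return index * VSIZE;
}
```

When count is a multiple of VSIZE this bound does one vector more than strictly needed; (count + VSIZE - 1) / VSIZE would be the tight ceiling if that extra iteration ever matters.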

Thanks!

-- 
Regards,
Aleksey Cherepanov
