Date: Sat, 23 May 2015 18:45:11 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: cycle around crypt_all() body in raw-sha512

On Sat, May 23, 2015 at 04:22:33PM +0300, Aleksey Cherepanov wrote:
> On Sat, May 23, 2015 at 02:27:47PM +0300, Aleksey Cherepanov wrote:
> > On Sat, May 23, 2015 at 10:55:38AM +0800, Lei Zhang wrote:
> > > I managed to add interleaving to SHA256 & SHA512, but the work is
> > > incomplete yet. When the interleaving factor is set other than 1,
> > > SHA256 works with a few formats, and SHA512 only works with sapH
> > > currently. Below are some statistics obtained from experimenting
> > > various interleaving factors:
> > >
> > I am trying interleave in john-devkit on raw-sha512 with sse.

While interleave gives me slowdowns, I tried to wrap crypt_all()'s
body into a cycle. With the cycle running up to N (shown as xN),
disregarding count:

x1
Raw:	2248K c/s real, 2248K c/s virtual
Raw:	2290K c/s real, 2295K c/s virtual

x2
Raw:	2361K c/s real, 2365K c/s virtual
Raw:	2403K c/s real, 2403K c/s virtual

x3
Raw:	2375K c/s real, 2380K c/s virtual
Raw:	2414K c/s real, 2414K c/s virtual

x4
Raw:	2340K c/s real, 2345K c/s virtual
Raw:	2414K c/s real, 2419K c/s virtual

x10
Raw:	2424K c/s real, 2429K c/s virtual

x15
Raw:	2366K c/s real, 2366K c/s virtual
Raw:	2429K c/s real, 2434K c/s virtual

x16
Raw:	2283K c/s real, 2288K c/s virtual
Raw:	2373K c/s real, 2373K c/s virtual
Raw:	2425K c/s real, 2425K c/s virtual

x20
Raw:	2324K c/s real, 2324K c/s virtual
Raw:	2438K c/s real, 2438K c/s virtual

x32
Raw:	2373K c/s real, 2373K c/s virtual
Raw:	2424K c/s real, 2424K c/s virtual

There are fluctuations... But speed is better in general. x32 should
not fit into L1 cache.

I also tried higher values, but a constant cycle seems to be bad
because the self tests grow count slowly:

crypt_all(count = 2):
crypt_all(count = 3):
crypt_all(count = 4):
crypt_all(count = 5):
crypt_all(count = 7):
crypt_all(count = 10):
[...]
crypt_all(count = 699914):
crypt_all(count = 1049870):
crypt_all(count = 1574804):
crypt_all(count = 2000000):

So a cycle with the condition
  index < count / $vsize + 1
may be used. It gives good results even on x1M (x2 due to sse, so 2M
candidates in total; a bigger crypt_out buffer should then be
allocated in dynamic memory): ~2400K c/s. So it seems that data cache
does not matter much here.

My guess: switching between crypt_all() and candidate generation in
self tests is costly due to code cache, and the cycle in crypt_all()
smooths it.

Any comments?

Thanks!

-- 
Regards,
Aleksey Cherepanov
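The loop condition proposed in the message, index < count / $vsize + 1, can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the code john-devkit generates: hash_block() is a hypothetical stand-in for the vectorized SHA-512 body, and VSIZE, MAX_KEYS, saved_key and crypt_all_sketch() are names invented here. The real crypt_all() format method takes different arguments.

```c
#include <assert.h>
#include <stdint.h>

#define VSIZE 2        /* assumed: two 64-bit lanes per SSE2 register */
#define MAX_KEYS 1024  /* hypothetical upper bound on count */

/* One extra block of slack: the bound count / VSIZE + 1 can run
 * one block past count (for example when count is a multiple of VSIZE). */
static uint64_t saved_key[MAX_KEYS + VSIZE];
static uint64_t crypt_out[MAX_KEYS + VSIZE];

/* Hypothetical stand-in for the SIMD SHA-512 body: it handles VSIZE
 * candidates per call. The arithmetic is a placeholder, not a hash. */
static void hash_block(const uint64_t *in, uint64_t *out)
{
    for (int lane = 0; lane < VSIZE; lane++)
        out[lane] = in[lane] * 0x9E3779B97F4A7C15ULL + 1;
}

/* The former crypt_all() body, wrapped in a cycle whose trip count
 * follows count. Returns the number of blocks processed. */
static int crypt_all_sketch(int count)
{
    int index;

    assert(count >= 0 && count <= MAX_KEYS);
    for (index = 0; index < count / VSIZE + 1; index++)
        hash_block(&saved_key[index * VSIZE], &crypt_out[index * VSIZE]);
    return index;
}
```

With VSIZE set to 2, crypt_all_sketch(5) runs three blocks and covers candidates 0 to 5. crypt_all_sketch(4) also runs three blocks, so the last block is spent on slots beyond count, which is why the buffers carry VSIZE extra entries.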