Date: Tue, 11 Aug 2015 12:31:01 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: JtR on ARM (NEON)


> On Aug 10, 2015, at 12:16 AM, Solar Designer <solar@...nwall.com> wrote:
> 
> On Sun, Aug 09, 2015 at 08:38:35PM +0800, Lei Zhang wrote:
>> Another phenomenon observed: when I ran `john --test`, pbkdf2-hmac-sha512 and all other formats passed self-tests, but running `john --test --format=pbkdf2-hmac-sha512` would fail. This is really strange.
> 
> This suggests there's an uninitialized variable being used.  Running
> other tests first probably results in something suitable getting written
> to it (whether directly or via a previous use of the same memory).

Well, I finally got this issue resolved by using a newer compiler.

I debugged really hard but couldn't find what was wrong in the code. After all, I added the NEON intrinsics the same way as the AltiVec intrinsics, and nothing went wrong on POWER. I suspected the compiler was causing some trouble under the hood, so I tried building JtR with a newer gcc (gcc 4.7.3, cross-compiling; I didn't bother to build gcc for ARM manually). The newly built binary worked just fine.

I'm really not sure whether it's a bug in gcc 4.6 for ARM or some undefined behavior in JtR that happens to work with the newer gcc. Anyway, here's a comparison of JtR's performance with and without NEON, cross-compiled with the new compiler:

(NOTES: the processor is a Tegra K1; vroti is emulated with 2 instructions; OpenMP is disabled; no interleaving is involved; unaligned accesses are not handled yet)
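32-bit ARM NEON has no vector rotate instruction, which is why vroti has to be emulated. A minimal sketch, assuming a left rotate by a compile-time constant: one plausible two-instruction pairing is a shift left (vshlq_n_u32) followed by a shift-right-and-insert (vsriq_n_u32). This is not necessarily how JtR's NEON code does it; vtype, vload, vstore, vroti_scalar and vroti_epi32 are hypothetical names (the last one modeled on the "vroti" above), and a scalar fallback is included so the sketch also builds on non-ARM hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical: four 32-bit lanes in one 128-bit NEON register. */
typedef uint32x4_t vtype;

static inline vtype vload(const uint32_t *p) { return vld1q_u32(p); }
static inline void vstore(uint32_t *p, vtype v) { vst1q_u32(p, v); }

/*
 * Rotate each lane left by a constant n, 0 < n < 32, in two instructions:
 * vshlq_n_u32 computes x << n (the low n bits become zero), then
 * vsriq_n_u32 keeps the top 32 - n bits of that value and fills the low
 * n bits with x >> (32 - n). Note that x is evaluated twice.
 */
#define vroti_epi32(x, n) vsriq_n_u32(vshlq_n_u32((x), (n)), (x), 32 - (n))

#else
/* Portable scalar fallback (hypothetical) for non-ARM hosts. */
typedef struct { uint32_t w[4]; } vtype;

static inline vtype vload(const uint32_t *p)
{
    vtype v;
    memcpy(v.w, p, sizeof v.w);
    return v;
}

static inline void vstore(uint32_t *p, vtype v) { memcpy(p, v.w, sizeof v.w); }

static inline vtype vroti_scalar(vtype x, int n)
{
    /* n == 0 would compute x >> 32, which is undefined in C. */
    assert(n > 0 && n < 32);
    for (int i = 0; i < 4; i++)
        x.w[i] = (x.w[i] << n) | (x.w[i] >> (32 - n));
    return x;
}
#define vroti_epi32(x, n) vroti_scalar((x), (n))
#endif
```

On SSE/AVX hosts the same idea applies (shift left, shift right, OR), which costs three operations; the insert form is what lets NEON get away with two.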

[without NEON]
Benchmarking: PBKDF2-HMAC-MD4 [PBKDF2-MD4 32/32]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	2515 c/s real, 2675 c/s virtual

Benchmarking: PBKDF2-HMAC-MD5 [PBKDF2-MD5 32/32]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	1922 c/s real, 2023 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 32/32]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	1185 c/s real, 1247 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA256 [PBKDF2-SHA256 32/32 OpenSSL]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	794 c/s real, 827 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA512, GRUB2 / OS X 10.8+ [PBKDF2-SHA512 32/32 OpenSSL]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	197 c/s real, 211 c/s virtual

[with NEON]
Benchmarking: PBKDF2-HMAC-MD4 [PBKDF2-MD4 128/128 NEON 4x]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	4624 c/s real, 4867 c/s virtual

Benchmarking: PBKDF2-HMAC-MD5 [PBKDF2-MD5 128/128 NEON 4x]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	3576 c/s real, 3804 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 128/128 NEON 4x]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	2424 c/s real, 2551 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA256 [PBKDF2-SHA256 128/128 NEON 4x]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	1104 c/s real, 1162 c/s virtual

Benchmarking: PBKDF2-HMAC-SHA512, GRUB2 / OS X 10.8+ [PBKDF2-SHA512 128/128 NEON 2x]... DONE
Speed for cost 1 (iteration count) of 1000
Raw:	518 c/s real, 551 c/s virtual

Speedups (NEON vs. without NEON, real c/s):
-----------
md4	1.8
md5	1.9
sha1	2.0
sha256	1.4
sha512	2.6
-----------
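As a check on the table, each speedup is the NEON real c/s divided by the non-NEON real c/s, rounded to one decimal place (the virtual c/s figures give the same rounded values). A small sketch that recomputes them from the benchmark output above; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Real c/s figures copied from the benchmark output (iteration count 1000). */
struct bench {
    const char *name;
    double scalar_cs; /* without NEON */
    double neon_cs;   /* with NEON */
};

static const struct bench results[] = {
    {"md4",    2515, 4624},
    {"md5",    1922, 3576},
    {"sha1",   1185, 2424},
    {"sha256",  794, 1104},
    {"sha512",  197,  518},
};

/* Speedup = NEON throughput / non-NEON throughput. */
static double speedup(const struct bench *b)
{
    assert(b->scalar_cs > 0);
    return b->neon_cs / b->scalar_cs;
}

/* Print one tab-separated line per format, rounded like the table. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%s\t%.1f\n", results[i].name, speedup(&results[i]));
}
```

Calling print_speedups() prints the same five values as the table above.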

Lei
