Date: Wed, 03 Jun 2015 13:24:42 -0400
From: Alain Espinosa <alainesp@...ta.cu>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

Hi.

I had some free time and tried bitslice SHA256 with Neon. The results are as expected. The assembly output is 19KB, which is more than the L1 instruction cache of this CPU, but I do not see any performance drop because of it.

Benchmark configuration: Android 4.4.2, GCC 4.6, Snapdragon 801 2.45GHz, single thread
Performance is given in millions of keys per second
------------------------------------------------------------------------------------------------------------------
2.61 : Bitslice SHA256 implemented with hand-crafted Neon assembly (5.7% faster than normal, 35% faster than intrinsics)
2.47 : Normal   SHA256 implemented with hand-crafted Neon assembly
1.94 : Bitslice SHA256 implemented with Neon intrinsics
0.83 : Bitslice SHA256 implemented with 64-bit code

Attached are the Neon intrinsics and hand-crafted assembly source files. VBSL (the Neon bit-select instruction) appears to be more costly than the ordinary bitwise instructions. For practical speed-ups with bitslice SHA256 we need the XOP or AVX512 instruction sets. AVX512 will probably provide speed-ups for the SHA1 format as well. The MD5/MD4 formats use fewer rotations/shifts, so bitslice is less useful there and probably never practical.
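
To illustrate why rotations and bit-select matter here, below is a minimal portable C sketch of two bitsliced SHA-256 building blocks. It is not the attached code: the names (bs_word, bs_pack, bs_extract, bs_sigma1, bs_ch, bs_selfcheck) and the use of uint64_t as a 64-lane vector are assumptions made for illustration only. In bitslice form a rotation is just a different index into the array of bit slices, so Sigma1 costs only XORs, while Ch is a bit-select on every slice, which is the operation VBSL performs in a single instruction.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 64 /* one candidate per bit of a uint64_t (illustrative choice) */

/* A bitsliced 32-bit word: b[i] holds bit i of the word for all 64 lanes. */
typedef struct { uint64_t b[32]; } bs_word;

/* Transpose 64 ordinary 32-bit words into bitslice form (slow, for clarity). */
static void bs_pack(bs_word *out, const uint32_t in[LANES]) {
    for (int bit = 0; bit < 32; bit++) {
        uint64_t v = 0;
        for (int lane = 0; lane < LANES; lane++)
            v |= (uint64_t)((in[lane] >> bit) & 1u) << lane;
        out->b[bit] = v;
    }
}

/* Recover the ordinary 32-bit word stored in one lane. */
static uint32_t bs_extract(const bs_word *w, int lane) {
    assert(lane >= 0 && lane < LANES);
    uint32_t v = 0;
    for (int bit = 0; bit < 32; bit++)
        v |= (uint32_t)((w->b[bit] >> lane) & 1u) << bit;
    return v;
}

/* Sigma1(e) = ROTR6(e) ^ ROTR11(e) ^ ROTR25(e).
 * Bit i of ROTR_r(x) is bit (i + r) mod 32 of x, so each rotation is only
 * an index change; the real work is two XORs per slice. out must not alias in. */
static void bs_sigma1(bs_word *out, const bs_word *in) {
    for (int i = 0; i < 32; i++)
        out->b[i] = in->b[(i + 6) & 31] ^ in->b[(i + 11) & 31] ^ in->b[(i + 25) & 31];
}

/* Ch(e, f, g) = (e & f) ^ (~e & g): take f where e is 1, else g.
 * Written as g ^ (e & (f ^ g)) it needs three bitwise operations per slice;
 * a bit-select instruction does the same in one. */
static void bs_ch(bs_word *out, const bs_word *e, const bs_word *f, const bs_word *g) {
    for (int i = 0; i < 32; i++)
        out->b[i] = g->b[i] ^ (e->b[i] & (f->b[i] ^ g->b[i]));
}

/* Scalar reference rotation, valid for 0 < r < 32. */
static uint32_t rotr32(uint32_t x, int r) { return (x >> r) | (x << (32 - r)); }

/* Compare the bitsliced functions against scalar code on 64 pseudo-random
 * lanes. Returns 1 if every lane matches, 0 otherwise. */
int bs_selfcheck(void) {
    uint32_t e[LANES], f[LANES], g[LANES];
    uint32_t x = 12345u; /* arbitrary seed for a simple LCG */
    for (int i = 0; i < LANES; i++) {
        x = x * 1664525u + 1013904223u; e[i] = x;
        x = x * 1664525u + 1013904223u; f[i] = x;
        x = x * 1664525u + 1013904223u; g[i] = x;
    }

    bs_word be, bf, bg, s1, ch;
    bs_pack(&be, e);
    bs_pack(&bf, f);
    bs_pack(&bg, g);
    bs_sigma1(&s1, &be);
    bs_ch(&ch, &be, &bf, &bg);

    for (int lane = 0; lane < LANES; lane++) {
        uint32_t want_s1 = rotr32(e[lane], 6) ^ rotr32(e[lane], 11) ^ rotr32(e[lane], 25);
        uint32_t want_ch = (e[lane] & f[lane]) ^ (~e[lane] & g[lane]);
        if (bs_extract(&s1, lane) != want_s1) return 0;
        if (bs_extract(&ch, lane) != want_ch) return 0;
    }
    return 1;
}
```

In a real implementation the loops would be fully unrolled over vector registers, so the index arithmetic disappears at compile time. MD4 and MD5 use only one rotation per step, so they have less rotation work for bitslicing to remove.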

Regards,
Alain

Download attachment "bs_sha256_v3.zip" of type "application/zip" (15294 bytes)
