Date: Wed, 03 Jun 2015 13:24:42 -0400
From: Alain Espinosa <>
Subject: Re: bitslice SHA-256


I had some free time and tried bitslice SHA256 in Neon. The results are as expected. The assembly output is 19KB, which is larger than this CPU's L1 code cache, but I do not see any performance drop because of it.
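For readers unfamiliar with bitslicing, here is a minimal C sketch of the representation, assuming 64 candidates per uint64_t with one "slice" per bit position of a 32-bit SHA256 word. The names bs_word, bs_pack, bs_unpack, bs_Sigma0 and bs_add are hypothetical and not taken from the attached sources. It illustrates the trade-off: rotations cost nothing because they only pick different slices, while each 32-bit addition becomes a ripple-carry adder of five bitwise operations per bit.

```c
#include <stdint.h>

/* Hypothetical layout: b[i] holds bit i of a 32-bit word for 64 candidates
   (bit j of b[i] belongs to candidate j). */
typedef struct { uint64_t b[32]; } bs_word;

/* Transpose 64 ordinary words into bitsliced form. */
static void bs_pack(bs_word *out, const uint32_t in[64]) {
    for (int i = 0; i < 32; i++) {
        uint64_t s = 0;
        for (int j = 0; j < 64; j++)
            s |= (uint64_t)((in[j] >> i) & 1u) << j;
        out->b[i] = s;
    }
}

/* Transpose back from bitsliced form to 64 ordinary words. */
static void bs_unpack(uint32_t out[64], const bs_word *in) {
    for (int j = 0; j < 64; j++) {
        uint32_t w = 0;
        for (int i = 0; i < 32; i++)
            w |= (uint32_t)((in->b[i] >> j) & 1u) << i;
        out[j] = w;
    }
}

/* SHA256 Sigma0(a) = ROTR2(a) ^ ROTR13(a) ^ ROTR22(a).
   ROTRn only renames slices: bit i of ROTRn(a) is bit (i + n) mod 32 of a,
   so only the XORs cost instructions. A temporary allows r == a. */
static void bs_Sigma0(bs_word *r, const bs_word *a) {
    bs_word t;
    for (int i = 0; i < 32; i++)
        t.b[i] = a->b[(i + 2) & 31] ^ a->b[(i + 13) & 31] ^ a->b[(i + 22) & 31];
    *r = t;
}

/* 32-bit addition mod 2^32 as a ripple-carry adder over the slices.
   Inputs are read into locals first so r may alias x or y. */
static void bs_add(bs_word *r, const bs_word *x, const bs_word *y) {
    uint64_t carry = 0;
    for (int i = 0; i < 32; i++) {
        uint64_t xi = x->b[i], yi = y->b[i];
        uint64_t t = xi ^ yi;
        r->b[i] = t ^ carry;               /* sum bit */
        carry = (xi & yi) | (carry & t);   /* carry into bit i + 1 */
    }
}
```

The final carry out of bit 31 is simply dropped, which gives the mod 2^32 behaviour SHA256 needs.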

Benchmark configuration: Android 4.4.2, GCC 4.6, Snapdragon 801 at 2.45GHz, single thread.
Performance is given in millions of keys per second:
2.61 : Bitslice SHA256, hand-crafted Neon assembly (5.7% faster than normal, 35% faster than intrinsics)
2.47 : Normal   SHA256, hand-crafted Neon assembly
1.94 : Bitslice SHA256, Neon intrinsics
0.83 : Bitslice SHA256, 64-bit code

Attached are the Neon intrinsics and hand-crafted assembly source files. The VBSL (Neon bitselect) instruction appears to be more costly than the normal bitwise instructions. For practical speed-ups with bitslice SHA256 we need the XOP or AVX512 instruction sets. AVX512 would probably provide speed-ups for the SHA1 format as well. The MD5/MD4 formats use fewer rotations/shifts, which are exactly what bitslicing makes free, so bitslice is less useful there and probably never practical.
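To show where a bitselect enters bitslice SHA256, here is a small portable C sketch, assuming the same slice-per-bit layout as above (one uint64_t per bit position, so the functions apply slice by slice). It is not taken from the attached sources; bitselect, bs_ch and bs_maj are hypothetical names. Both SHA256 boolean functions, Ch and Maj, reduce to a single bitselect, which is why the cost of VBSL matters.

```c
#include <stdint.h>

/* Portable model of Neon VBSL: take bits of a where mask is 1, else bits of b.
   On Neon this is a single instruction (e.g. vbslq_u64 from <arm_neon.h>);
   with plain operators it costs three operations. */
static inline uint64_t bitselect(uint64_t mask, uint64_t a, uint64_t b) {
    return b ^ (mask & (a ^ b));
}

/* SHA256 Ch(e, f, g) = (e AND f) XOR (NOT e AND g): a bitselect keyed on e. */
static inline uint64_t bs_ch(uint64_t e, uint64_t f, uint64_t g) {
    return bitselect(e, f, g);
}

/* SHA256 Maj(a, b, c): where a and b differ, c decides the majority;
   where they agree, the result is their common value, i.e. b. */
static inline uint64_t bs_maj(uint64_t a, uint64_t b, uint64_t c) {
    return bitselect(a ^ b, c, b);
}
```

With this formulation Maj needs one XOR plus one bitselect, so if VBSL is slower than a plain AND/XOR, the bitselect-based forms lose part of their advantage.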


Download attachment "" of type "application/zip" (15294 bytes)
