Date: Mon, 23 Jul 2018 21:41:35 +0200
From: Solar Designer <>
Cc: Denis Burykin <>
Subject: Re: sha512crypt & Drupal 7+ password cracking on FPGA

On Mon, Jul 23, 2018 at 08:40:48PM +0200, Jens Steube wrote:
> For sha512crypt I'm getting around 377kH/s on all four GPU. That
> translates to ~94300 per 90W.
> For Drupal7 I'm getting around 156kH/s on all four GPU. That translates
> to ~39000 per 90W.
> This is a weird result on the first look.

Why, it looks reasonable to me.  Thanks for sharing it.

> If I understand your
> measurements correctly a single quad FPGA board is doing 54600H/s at 40W
> on sha512crypt and 16600H/s at 40W on Drupal7. If you scale this up to
> 90W, it's 122850H/s per sha512crypt and 37350H/s per Drupal7. That means
> from power consumption perspective it's 30% faster than the GPU for
> sha512crypt, but at the same time it's slower for Drupal7?
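
As a rough check of the per-watt arithmetic quoted above, here is a minimal Python sketch.  It uses only the figures from this thread.  It assumes that hash rate scales linearly with power and that each GPU draws about 90 W, as the quoted comparison implies.  The names are illustrative, not taken from any tool.

```python
# Figures quoted in this thread (hashes per second, watts).
GPU_COUNT = 4
GPU_TOTAL = {"sha512crypt": 377_000, "Drupal7": 156_000}  # all four GPUs together
FPGA_BOARD = {"sha512crypt": 54_600, "Drupal7": 16_600}   # one quad-FPGA board
FPGA_WATTS = 40
REFERENCE_WATTS = 90  # assumed per-GPU power draw used in the quoted comparison


def scale_to_watts(rate, watts, target_watts):
    """Linearly scale a hash rate to a different power budget (an assumption)."""
    return rate * target_watts / watts


def compare(algo):
    """Return (GPU rate, FPGA rate, FPGA/GPU ratio), all per REFERENCE_WATTS."""
    gpu = GPU_TOTAL[algo] / GPU_COUNT
    fpga = scale_to_watts(FPGA_BOARD[algo], FPGA_WATTS, REFERENCE_WATTS)
    return gpu, fpga, fpga / gpu


for algo in GPU_TOTAL:
    gpu, fpga, ratio = compare(algo)
    print(f"{algo}: GPU {gpu:.0f} H/s, FPGA {fpga:.0f} H/s per {REFERENCE_WATTS} W, "
          f"FPGA/GPU = {ratio:.2f}")
```

Under these assumptions the FPGA/GPU ratio comes out at roughly 1.30 for sha512crypt and a little under 1 for Drupal7, matching the figures in the quote.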

Right.  Our sha512crypt and Drupal7 on FPGA run at basically the same
speed in terms of the underlying SHA-512 hashes computed per second.  As
I mentioned, our Drupal7 could have been more optimal in a specialized
design without support for unaligned access and maybe without the soft
CPUs at all (freeing up that logic might have allowed up to 25% more
SHA-512 cores), but we got it almost for free here (on top of the
sha512crypt design), so we're happy.  On GPU, you actually take
advantage of Drupal7's relative simplicity, as you say:

> The reason
> here is the branches in the loop function in sha512crypt which is a
> special case. GPUs really don't like them.

Actually, when all passwords loaded on the GPU at once are of the same
length, I guess the branches don't hurt much.  What hurts is the need to
support unaligned accesses, and I guess you avoid this overhead in your
Drupal7 kernel.
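
To illustrate the unaligned-access point, here is a minimal Python sketch of where each piece lands in the per-round message.  The round structure follows the published SHA-crypt specification, and the Drupal7 loop is taken to be hash = SHA-512(hash . password).  The function names and the example lengths (7-byte password, 16-byte salt) are illustrative assumptions, and this is not the kernel code of any cracker.

```python
DIGEST_LEN = 64  # SHA-512 output size in bytes
WORD = 8         # SHA-512 works on 64-bit words


def sha512crypt_layout(i, plen, slen):
    """Return [(name, offset, length)] for the message hashed in round i."""
    # Odd rounds start with the P sequence, even rounds with the previous digest.
    parts = [("P", plen)] if i & 1 else [("digest", DIGEST_LEN)]
    if i % 3:
        parts.append(("S", slen))   # salt sequence, skipped every 3rd round
    if i % 7:
        parts.append(("P", plen))   # password sequence, skipped every 7th round
    # Odd rounds end with the previous digest, even rounds with P.
    parts.append(("digest", DIGEST_LEN) if i & 1 else ("P", plen))
    layout, offset = [], 0
    for name, length in parts:
        layout.append((name, offset, length))
        offset += length
    return layout


def drupal7_layout(plen):
    """Every Drupal7 iteration hashes digest || password: fixed offsets."""
    return [("digest", 0, DIGEST_LEN), ("P", DIGEST_LEN, plen)]


def unaligned(layout):
    """Pieces that do not start on a 64-bit word boundary."""
    return [(name, off) for name, off, _ in layout if off % WORD]


def digest_offsets(plen, slen):
    """Distinct digest offsets over one full period (lcm of 2, 3, 7 = 42)."""
    return {off for i in range(42)
            for name, off, _ in sha512crypt_layout(i, plen, slen)
            if name == "digest"}


print(sorted(digest_offsets(7, 16)))
print(unaligned(sha512crypt_layout(1, 7, 16)))
print(unaligned(drupal7_layout(7)))
```

Even with every candidate password the same length, so that no branch diverges, the previous digest in sha512crypt lands at several different byte offsets, most of them not word-aligned.  In the Drupal7 loop it always sits at offset 0 and the password at offset 64.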

> IOW, the GPU implementation
> for all *crypt algorithms is a bit below its theoretical maximum. In
> Drupal7 (and PBKDF2 and most other KDF) there's no such branches in the
> loop thus the GPU can perform at full speed on all compute units.
> As you can see here the GPU of today are pretty close when it comes to
> power consumption to a FPGA board. I know that ztex boards are old now
> and that there's better solutions, but the same as with newer GPU, see
> alone the V100. I'm happy with the results.

Right.  Spartan-6 was introduced in 2009(?) on a 45nm process, and as a
budget series (Virtex-6 was larger and faster).  NVIDIA Pascal was
introduced in 2016 on a 16nm process.  So there's more potential for
improvement in switching from Spartan-6 to current UltraScale+ FPGAs
(2016, 16nm) than from Pascal to Volta (2017-2018, 12nm).

V100 is about twice as large as GTX 1080.  VU9P as offered on AWS F1 is
~16x the size of our Spartan-6 LX150 (so ~4x the size of one of our
quad-FPGA boards) and also faster (we'd get a higher clock rate - e.g.,
I saw mentions of it running Keccak at 700+ MHz as a power consumption
stress-test that altcoin miners now use).  And this isn't even the
largest FPGA (but apparently larger ones are unrealistic to cool at full
utilization).  The drawback is price.  Thousands of those boards tweaked
for cryptocurrency mining (lower core voltage, etc.) were recently
offered and quickly sold out to altcoin miners for $3600 each.  The
original boards are called VCU1525 and the tweaked ones BCU1525 - you
might want to Google them and the reported altcoin mining speeds vs.
GPUs.  I didn't look into this closely yet, but if people are buying
these then there must be a significant advantage.

Thanks again,

