Date: Tue, 6 Oct 2015 03:19:11 +0300
From: Solar Designer <>
Subject: Re: PHC: yescrypt on GPU

On Tue, Oct 06, 2015 at 03:16:09AM +0300, Solar Designer wrote:
> Then there's also this weird trick I just posted about to the PHC list:
> where a BSTY miner implementation author chose to split the S-box
> lookups across multiple work-items.  He chose 16, but I think 4 would be
> optimal (or maybe 8, if additionally splitting loads of the two S-boxes
> across two sets of 4 work-items).  This might in fact speed them up, so
> might be worth trying (as an extra option, on top of 3 main ones).
> He reported 372 h/s at 2 MB (N=2048 r=8) on HD 7750.  Scaling to 7970,
> this could be up to 372*2048/512*1000/800 = 1860, but probably a lot
> less than that in practice (7750's narrower memory bus might be a better
> fit).  Your reported best result is 914 for 1.5 MB (r=6), so seemingly
> much slower than his:
> We have a 7750 (a version with DDR3 memory, though) in "well", so you
> may try your code on it and compare against the 372 figure directly.
> And like I wrote, his byte-granular loads are likely not optimal;
> uint-sized loads would likely be better.
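
To make the arithmetic in the quoted text easier to check, here is a rough
sketch in Python.  The figures (372 h/s, 2048/512, 1000/800, 914 h/s) come
from the quote; reading the two ratios as unit-count and clock scaling, the
128*N*r memory formula, the assumption of N=2048 for the r=6 run, and the
function names are illustrative assumptions.  The last step assumes hash rate
is inversely proportional to memory per hash, which is only a crude
normalization.

```python
def memory_bytes(n, r):
    """Memory per hash, assuming the scrypt-style 128 * N * r layout."""
    return 128 * n * r

def scale_hashrate(base_hs, units_from, units_to, mhz_from, mhz_to):
    """Naive linear upper bound: scale by unit count and by clock rate."""
    return base_hs * units_to / units_from * mhz_to / mhz_from

# Reported HD 7750 figure at N=2048 r=8 (2 MB), scaled up to a 7970.
upper_bound_7970 = scale_hashrate(372, 512, 2048, 800, 1000)

# Reported 7970 figure at r=6 (1.5 MB, N=2048 assumed), normalized to 2 MB
# under the assumption that speed is inversely proportional to memory.
normalized_914 = 914 * memory_bytes(2048, 6) / memory_bytes(2048, 8)

print(memory_bytes(2048, 8))   # 2097152 bytes, i.e. 2 MB
print(upper_bound_7970)        # 1860.0
print(normalized_914)          # 685.5
```

Under these assumptions the 914 h/s result corresponds to roughly 685 h/s at
2 MB, against a naive upper bound of 1860 h/s, which is why a direct
comparison on the same 7750 hardware would be more informative.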

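The idea of splitting one S-box lookup across work-items can be modeled on
the host side as below.  This is only a sketch: it assumes a 16-byte lookup
(which would make 16 work-items correspond to byte-granular loads and 4
work-items to uint-sized loads), and LOOKUP_BYTES, split_load and gather are
hypothetical names, not taken from any miner's code.

```python
LOOKUP_BYTES = 16  # assumed size of one S-box lookup

def split_load(sbox, offset, work_items):
    """Return the slice each simulated work-item would load."""
    if LOOKUP_BYTES % work_items != 0:
        raise ValueError("work_items must divide the lookup size")
    chunk = LOOKUP_BYTES // work_items
    return [sbox[offset + i * chunk: offset + (i + 1) * chunk]
            for i in range(work_items)]

def gather(parts):
    """Reassemble the full lookup from the per-work-item pieces."""
    return b"".join(parts)

sbox = bytes(range(256)) * 16          # dummy 4 KiB table for illustration
offset = 48                            # arbitrary 16-byte-aligned offset

byte_parts = split_load(sbox, offset, 16)   # 16 work-items, 1 byte each
uint_parts = split_load(sbox, offset, 4)    # 4 work-items, 4 bytes each

# Both splits recover the same 16 bytes; only the load granularity differs.
assert gather(byte_parts) == gather(uint_parts) == sbox[offset:offset + 16]
```

With 8 work-items, one could similarly assign 4 work-items to each of the
two S-boxes, as suggested in the quoted text.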
