Date: Mon, 24 Aug 2015 05:28:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Mon, Aug 24, 2015 at 01:52:35AM +0200, Agnieszka Bielec wrote:
> 2015-08-23 8:15 GMT+02:00 Solar Designer <solar@...nwall.com>:
> > While private memory might be larger and faster on specific devices, I
> > think that not making any use of local memory is wasteful.  By using
> > both private and local memory at once, we should be able to optimally
> > pack more concurrent Argon2 instances per GPU and thereby hide more of
> > the various latencies.
> 
> Why would we pack more Argon2 instances per GPU by using both types
> of memory?  I'm using only very small portions of private memory.

You're using several kilobytes per instance - that's not very small.

If not this, then what is limiting the number of concurrent instances
when we're not yet bumping into total global memory size?  For some of
the currently optimal LWS/GWS settings, we're nearly bumping into the
global memory size, but for some (across the different GPUs, as well as
2i vs. 2d) we are not.  And even when we are, maybe a higher LWS would
improve performance when we can afford it.
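
To illustrate the argument above, here is a rough sketch of the
arithmetic.  Every number in it is a hypothetical placeholder (the
per-compute-unit budgets, the 4 KiB per-instance state, and the 3:1
split are assumptions, not measurements from this thread), and the model
ignores register allocation granularity and wavefront limits.  It only
shows why splitting per-instance state across private and local memory
can raise the number of instances that fit on-chip at once.

```python
# Hypothetical per-compute-unit budgets; real values vary by GPU and
# are NOT taken from this thread.
PRIVATE_BYTES_PER_CU = 192 * 1024  # assumed private (register) capacity
LOCAL_BYTES_PER_CU = 64 * 1024     # assumed local memory capacity


def max_instances(private_per_instance, local_per_instance):
    """Instances per compute unit that fit within both budgets."""
    limits = []
    if private_per_instance:
        limits.append(PRIVATE_BYTES_PER_CU // private_per_instance)
    if local_per_instance:
        limits.append(LOCAL_BYTES_PER_CU // local_per_instance)
    # The tighter of the two budgets decides how many instances fit.
    return min(limits)


# Assumed 4 KiB of on-chip state per instance ("several kilobytes").
private_only = max_instances(4096, 0)   # all state in private memory
split = max_instances(3072, 1024)       # 3 KiB private + 1 KiB local

print(private_only)  # 48
print(split)         # 64
```

With these made-up budgets, local memory sits idle in the first case,
while the split lets both pools fill up at the same instance count.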

> BTW, in my vectorized kernels, shuffling between the two groups of
> Argon2 rounds takes a very long time, so I grouped kernel instances in
> fours and interleaved their data in local memory, which lets me avoid
> the shuffling.
> But on my laptop I get only 3k c/s for LWS=8, so no speedup (the
> bleeding-jumbo branch gets 4k).
> But I think this is not what you mean here.
> I uploaded this to the interleaving4 branch (argon2d only).
> I updated the vector8 branch and created vector16 some time ago.

I took a look at the interleaving4 branch, commit
11f932d9642604ca807e336fe286329651a87c49 (comment interleaving4).
No, that's not what I meant.  It's a weird mix of ulong8 and splitting
work across different work-items, whereas the latter should be attempted
as an alternative to the former.

Alexander
