Date: Sun, 30 Aug 2015 01:44:32 +0200
From: Agnieszka Bielec <>
Subject: Re: PHC: Argon2 on GPU

2015-08-29 8:48 GMT+02:00 Solar Designer <>:
> As to loop unrolling, there's "#pragma unroll N", and when you specify
> N=1 so "#pragma unroll 1" I think it prevents unrolling.  As an
> experiment, I tried adding "#pragma unroll 1" before all loops in
>, and the PTX instruction count reduced - but not a
> lot.

Can I get this code?
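
For readers following the thread, here is a minimal sketch of where such a pragma goes. It is not the modified file mentioned in the quote (that code is not shown here), and everything in it is an assumption made for illustration: the NO_UNROLL macro and the mix_words function are hypothetical names, and the sketch is plain host C so it can be compiled anywhere. In an OpenCL or CUDA kernel, the line "#pragma unroll 1" would be written directly before the loop instead of the macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro (not from the thread): expands to a
 * "do not unroll" hint where the host compiler offers one.
 * In an OpenCL or CUDA kernel, write "#pragma unroll 1" directly
 * before the loop instead. */
#if defined(__clang__)
#define NO_UNROLL _Pragma("unroll 1")
#elif defined(__GNUC__) && (__GNUC__ >= 8)
#define NO_UNROLL _Pragma("GCC unroll 1")
#else
#define NO_UNROLL /* no known hint: leave the loop to the optimizer */
#endif

/* Toy mixing loop standing in for a hash round function. */
uint32_t mix_words(const uint32_t *words, size_t count)
{
    uint32_t acc = 0;
    size_t i;

    /* The hint must sit immediately before the loop it applies to. */
    NO_UNROLL
    for (i = 0; i < count; i++) {
        acc = (acc << 5) | (acc >> 27); /* rotate left by 5 bits */
        acc ^= words[i];
    }
    return acc;
}
```

The hint only changes how the compiler emits the loop, not what it computes, so comparing the generated instruction count with and without it (as done with the PTX above) is a reasonable way to check whether unrolling is what inflates the kernel.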

> We need to figure out why it doesn't get lower.  ~80k is still a lot.
> Are there many inlined functions and unrolled loops in the .h files?

There are also the blake2 files.

> Maybe some pre- and/or post-processing should be kept on host to make
> the kernel simpler and smaller.  This is bad in terms of Amdahl's law,
> but it might help us figure things out initially.

I will think about it and split the kernels. Even the small pomelo was
slightly faster.
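
To make the Amdahl's law concern in the quoted text concrete, here is a small sketch of the standard formula. The function name amdahl_speedup and any numbers plugged into it are illustrative assumptions, not measurements from this thread.

```c
/* Amdahl's law: overall speedup when a fraction p of the run time
 * (0 <= p <= 1) is accelerated by a factor s (s > 0) and the
 * remaining 1 - p stays as slow as before.
 * Hypothetical helper for illustration only. */
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, if a hypothetical 10% of the work stayed on the host (p = 0.9), the overall speedup could never exceed 10x, however fast the kernel became. That is the cost being traded for a smaller kernel that is easier to analyze.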
