Date: Tue, 11 Aug 2015 08:00:54 +0200
From: Agnieszka Bielec <>
Subject: Agnieszka's weekly report #15

- I made optimizations for argon2i and argon2d, although argon2d is only
slightly faster. I didn't run benchmarks because super wasn't idle; a slow
hash probably shouldn't be affected much on GPU, but it's better not to
risk it.
In argon2i I did coalescing and vectorization, and both gave me better
speed, but I have a problem on my own NVIDIA card: argon2i has two
ComputeBlock functions. The first works on private memory; the second
first copies memory from __global to __private and then does the same
computations as ComputeBlock. Vectorizing the first function gave better
speed on all GPUs, but when I also vectorized the second, I got better
speed on the GPUs on super and a strange slowdown on my own card.
Speed wasn't better after vectorizing ComputeBlock in argon2d, although
it is better after vectorizing the memory accesses. I also did coalescing
in argon2d, but speed got worse, so for now argon2d is without coalescing,
though it's easy to turn it on/off. I don't know if my optimizations are
complete, because I couldn't understand how the new addresses are
computed in FillSegment().

- I will check whether the slowdown on my laptop is caused by the size
of the kernel
- I will be working on makwa
- I wrote somewhere on the ML about MEM_SIZE/4. Indeed, auto-tune
returns GWS properly after this division, but I discovered some problems
and will investigate this more
