Date: Sun, 1 Apr 2012 23:56:05 +0300
From: Milen Rangelov <>
Subject: Re: fast hashes on GPU

> Is this for AMD VLIW?

Yes, my Barts kernel (I have a machine with 2x5870s so that I can test the
Cypress one - but I don't think it would make much of a difference).

> Do you limit this to uint2 because you can't afford 100+ GPRs?

Yes. It does not get to more than 100 though, more like 80-90.

> Do you have similar stats for Nvidia?

Not at the moment. I have an Nvidia card, but not a free PCIe slot right
now.

> If things are so bad in terms of register pressure anyway, maybe
> bitslicing would be of help - at least we'll avoid the 64-bit rotates.

Yes, but unfortunately my design would not allow for processing 64 or 128
hashes per work-item, given the way I currently generate plaintexts, and I
guess that is not going to change. I am interested in the results if you do
that in JtR, though. In fact, I considered doing bitslice DES on GPUs before
and did some experiments. GPR usage is a disaster, but part of the data
could be moved to local memory, and I still believe it's quite possible.
Then again, it is completely incompatible with my model: if I were to
implement it, the changes would be so huge that it would become a "самоцел"
[an end in itself] (I believe you have the same word in Russian, I can think
of no good equivalent in English though :) )
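
To illustrate why bitslicing avoids the 64-bit rotates, here is a minimal
sketch in plain C (not OpenCL, and not code from either project). It assumes
32 independent inputs per slice; the names LANES, bitslice_pack,
bitslice_unpack and bitslice_rotr are hypothetical. Bit i of every input
lives in its own 32-bit slice, so a rotate only re-indexes the slices.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32  /* independent inputs per slice (assumed for illustration) */
#define BITS  64  /* width of the words being bitsliced */

/* Transpose: bit j of slices[i] holds bit i of words[j]. */
static void bitslice_pack(const uint64_t words[LANES], uint32_t slices[BITS])
{
    for (int i = 0; i < BITS; i++) {
        uint32_t s = 0;
        for (int j = 0; j < LANES; j++)
            s |= (uint32_t)((words[j] >> i) & 1u) << j;
        slices[i] = s;
    }
}

/* Inverse transpose, used here only to check results. */
static void bitslice_unpack(const uint32_t slices[BITS], uint64_t words[LANES])
{
    for (int j = 0; j < LANES; j++) {
        uint64_t w = 0;
        for (int i = 0; i < BITS; i++)
            w |= (uint64_t)((slices[i] >> j) & 1u) << i;
        words[j] = w;
    }
}

/* Rotate all lanes right by n: result bit i is input bit (i + n) mod 64.
 * No shift of the data itself is needed, only a different slice index. */
static void bitslice_rotr(const uint32_t in[BITS], uint32_t out[BITS], unsigned n)
{
    assert(n < BITS);
    for (int i = 0; i < BITS; i++)
        out[i] = in[(i + n) % BITS];
}
```

In a fully unrolled kernel the index remapping would amount to renaming
registers. The cost is that every 64-bit variable turns into 64 slices, which
is in line with the GPR usage problem mentioned above.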

Anyway, I am not quite happy with the generated ISA code. Perhaps the
situation would improve somewhat if I did not use 64-bit longs and instead
emulated the 64-bit operations on 32-bit uints myself. I wish I had more
time for that :(
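
A minimal sketch of what such emulation could look like, written in plain C
rather than OpenCL C. The u64pair type and the function names are
hypothetical, and whether this yields better ISA than the compiler's own
64-bit handling is untested here. to_pair and from_pair use a native 64-bit
type only for checking.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit word kept as two 32-bit halves (hypothetical helper type). */
typedef struct { uint32_t lo, hi; } u64pair;

static u64pair to_pair(uint64_t v)
{
    u64pair r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

static uint64_t from_pair(u64pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Rotate right by n (0 <= n < 64) using only 32-bit operations. */
static u64pair rotr64_pair(u64pair x, unsigned n)
{
    u64pair r;
    assert(n < 64);
    if (n >= 32) {            /* rotating by 32 just swaps the halves */
        uint32_t t = x.lo;
        x.lo = x.hi;
        x.hi = t;
        n -= 32;
    }
    if (n == 0)               /* avoid an undefined shift by 32 below */
        return x;
    r.lo = (x.lo >> n) | (x.hi << (32 - n));
    r.hi = (x.hi >> n) | (x.lo << (32 - n));
    return r;
}

/* 64-bit addition with a manually propagated carry. */
static u64pair add64_pair(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low half */
    return r;
}
```

When the rotate count is a compile-time constant, as it usually is in hash
round functions, the branches disappear and each rotate reduces to a few
32-bit shifts and ORs.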
