john-dev - LOP3.LUT (was: Re: SHA-1 H())

Message-ID: <9114e60f8325a2f84706bbb92ced358f@smtp.hushmail.com>
Date: Tue, 6 Oct 2015 02:32:17 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: LOP3.LUT (was: Re: SHA-1 H())

On 2015-09-14 23:06, Solar Designer wrote:
> On Mon, Sep 14, 2015 at 10:39:40PM +0200, magnum wrote:
>> BTW do you think we could use inline PTX to define a LOP3.LUT
>> instruction on nvidia, like you did with the funnel shifts?
>
> Yes, I thought of this too.  We could want to check the generated code
> first (it might already be using LOP3.LUT everywhere it should), or we
> could just do the inline asm right away to ensure we'll always have
> LOP3.LUT there no matter how the compiler might be changed.

I implemented a shared lop3_lut(a, b, c, imm) function in de6c7c6, but
it's not enabled anywhere yet: so far I have only tested md5crypt, and it
ran about 5% slower. I also tried using it for just one function at a
time, but each of them causes a slowdown - even F and G, which are
otherwise pure bitselects. I was expecting no difference at all, at
worst.
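
For reference, here is a minimal plain-C sketch of what a LOP3.LUT
operation computes and how the 8-bit immediate for MD5's F and G can be
derived. It is a software emulation for illustration only, not the inline
PTX from de6c7c6; the names lop3_lut_emul, TA/TB/TC and LUT_F/LUT_G are
hypothetical. The immediate is the function's truth table, obtained by
evaluating the function on the constants 0xF0, 0xCC and 0xAA.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table inputs: evaluating a 3-input boolean function on these
 * constants gives the 8-bit immediate that selects it in LOP3.LUT. */
#define TA 0xF0u
#define TB 0xCCu
#define TC 0xAAu

/* MD5 round functions F and G (RFC 1321). */
#define MD5_F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define MD5_G(x, y, z) (((x) & (z)) | ((y) & ~(z)))

/* Immediates derived from the truth-table inputs (hypothetical names). */
#define LUT_F (MD5_F(TA, TB, TC) & 0xFFu)
#define LUT_G (MD5_G(TA, TB, TC) & 0xFFu)

/* Software emulation of a 32-bit three-input LUT operation. On the GPU
 * this would be a single PTX instruction of the form
 *     lop3.b32 d, a, b, c, immLut;
 * here each bit position is looked up separately. */
static uint32_t lop3_lut_emul(uint32_t a, uint32_t b, uint32_t c,
                              uint32_t imm)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        /* The bits of a, b and c at position i form a 3-bit index
         * into the truth table held in imm. */
        uint32_t idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     |  ((c >> i) & 1u);
        r |= ((imm >> idx) & 1u) << i;
    }
    return r;
}
```

With these definitions, lop3_lut_emul(x, y, z, LUT_F) should give the
same result as MD5_F(x, y, z) for any 32-bit inputs, and likewise for G.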

>> Or would it
>> possibly be worse than having the optimizer miss one or two, due to the
>> caveats of inline asm?
>
> I saw no drawbacks from using inline PTX asm, since instruction
> scheduling is performed in the PTX to ISA translation anyway.
>
> This is very different from inline asm in C code compiled for a CPU,
> where using inline asm for tiny pieces of code (such as for individual
> instructions) breaks the C compiler's instruction scheduling.

Something went wrong, though. I'll compare the resulting PTX and ISA and
try to figure out what is happening.

magnum
