Date: Tue, 15 Sep 2015 00:06:41 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Mon, Sep 14, 2015 at 10:39:40PM +0200, magnum wrote:
> BTW do you think we could use inline PTX to define a LOP3.LUT 
> instruction on nvidia, like you did with the funnel shifts?

Yes, I thought of this too.  We may want to check the generated code
first (it might already be using LOP3.LUT everywhere it should), or we
could just use inline asm right away to ensure we always get LOP3.LUT
there no matter how the compiler changes.

> Or would it 
> possibly be worse than having the optimizer miss one or two, due to the 
> caveats of inline asm?

I saw no drawbacks from using inline PTX asm, since instruction
scheduling is performed in the PTX to ISA translation anyway.

This is very different from inline asm in C code compiled for a CPU,
where using inline asm for tiny pieces of code (such as for individual
instructions) breaks the C compiler's instruction scheduling.
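
To make the LOP3.LUT idea concrete, here is a rough sketch, not code from
the JtR tree.  It derives the 8-bit lookup-table immediates for the three
SHA-1 round functions by evaluating each function on the constants 0xF0,
0xCC and 0xAA, and checks them against a plain C emulation of the
instruction.  All names (lop3_emulate, LUT_CH, lop3_ch, and so on) are made
up for illustration.  The wrapper under __CUDACC__ is only compiled by a
CUDA compiler and assumes a target that supports the PTX lop3.b32
instruction (sm_50 or newer); an OpenCL kernel would need its own variant.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table columns: bit i of each constant is that operand's value
 * in row i, where i = (a << 2) | (b << 1) | c. */
enum { TA = 0xF0, TB = 0xCC, TC = 0xAA };

/* LUT immediates: evaluate the function on the three columns. */
enum {
    LUT_CH     = (TA & TB) | (~TA & TC),            /* rounds  0..19 */
    LUT_PARITY = TA ^ TB ^ TC,                      /* rounds 20..39, 60..79 */
    LUT_MAJ    = (TA & TB) | (TA & TC) | (TB & TC)  /* rounds 40..59 */
};

/* Bitwise software model of a 3-input LUT operation (hypothetical helper):
 * OR together the minterms whose LUT bit is set. */
static uint32_t lop3_emulate(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((lut >> i) & 1) {
            uint32_t ta = (i & 4) ? a : ~a;
            uint32_t tb = (i & 2) ? b : ~b;
            uint32_t tc = (i & 1) ? c : ~c;
            r |= ta & tb & tc;
        }
    }
    return r;
}

/* Sanity check of the derived immediates against the plain C expressions. */
void lop3_self_check(void)
{
    uint32_t x = 0x67452301u, y = 0xEFCDAB89u, z = 0x98BADCFEu;

    assert(LUT_CH == 0xCA && LUT_PARITY == 0x96 && LUT_MAJ == 0xE8);
    assert(lop3_emulate(x, y, z, LUT_CH) == ((x & y) | (~x & z)));
    assert(lop3_emulate(x, y, z, LUT_PARITY) == (x ^ y ^ z));
    assert(lop3_emulate(x, y, z, LUT_MAJ) == ((x & y) | (x & z) | (y & z)));
}

#ifdef __CUDACC__
/* Inline PTX wrapper (illustrative): forces a single lop3.b32 for Ch. */
static __device__ inline uint32_t lop3_ch(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t r;
    asm("lop3.b32 %0, %1, %2, %3, 0xCA;" : "=r"(r) : "r"(a), "r"(b), "r"(c));
    return r;
}
#endif
```

Deriving the immediates from 0xF0/0xCC/0xAA rather than hard-coding them
makes it easy to add further 3-input functions.  Whether the wrapper
actually beats what the compiler already generates would still have to be
checked in the generated code.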

Alexander
