john-dev - Re: bitslice DES on GPU

Message-ID: <CABh=JRF_rFHk0k-gacsch0+SkgKr0d4KzvT3qhWH-j5bgqhxaA@mail.gmail.com>
Date: Thu, 6 Dec 2012 23:32:41 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

Hello,

I did not follow the whole thread (and I should have: bitslice DES on GPU
sounds interesting, though I never managed to make it practical :( ).

Why do you want to patch your kernels?

I used to patch the kernel binaries for BFI (before AMD mapped it to
bitselect()). It was not the IL code that was being patched, but the
binary itself. I have never patched the AMDIL part. Binary patching was
easy because the kernels themselves were coded in such a way that all we
needed was to replace one instruction with another. The kernel binary is
indeed an ELF file, and from what I remember, it had one or more ELF images
embedded in it, so it's like ELF inside ELF. The general idea was to find
all occurrences of the instruction to replace and change each one to
another instruction (BFI) that had the same number of operands. There were several
potential candidates for such "replacement" instructions, but the
BYTEALIGN_INT one was best as it was easy to have it generated from OpenCL
code (by using an AMD extension to OpenCL). The VLIW5/VLIW4 ISA is in fact
simple: instructions are always 64 bits wide (though they may have 2 or
more operands), and each one encodes the instruction ID, src/dst register
IDs, some flags, etc. By exploiting the fact that the instruction ID field
is known and some flags should have fixed values, you can (heuristically)
find and replace your instructions.
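
As a rough illustration of the find-and-replace idea above, here is a
minimal Python sketch. The masks and instruction IDs in it are made-up
placeholders, not real VLIW4/VLIW5 encodings, and patch_instructions is a
hypothetical helper name; the real values would have to come from the ISA
documentation or from disassembling a known kernel.

```python
import struct

# All constants below are HYPOTHETICAL placeholders, not real ISA encodings.
ID_MASK = 0x0000FF0000000000      # assumed bits holding the instruction ID
FIXED_MASK = 0x8000000000000001   # assumed flag bits with known fixed values
FIXED_VALUE = 0x0000000000000000  # value those flag bits are expected to have
OLD_ID = 0x0000120000000000       # assumed ID of the instruction to replace
NEW_ID = 0x0000340000000000       # assumed ID of the replacement (e.g. BFI)


def patch_instructions(code: bytes):
    """Scan 64-bit little-endian words and swap OLD_ID for NEW_ID.

    Returns (patched_code, number_of_words_patched).
    """
    if len(code) % 8:
        raise ValueError("code length must be a multiple of 8 bytes")
    out = bytearray(code)
    count = 0
    for off in range(0, len(code), 8):
        (word,) = struct.unpack_from("<Q", code, off)
        # Heuristic match: right instruction ID and the expected flag bits.
        if (word & ID_MASK) == OLD_ID and (word & FIXED_MASK) == FIXED_VALUE:
            word = (word & ~ID_MASK) | NEW_ID  # operands and flags untouched
            struct.pack_into("<Q", out, off, word)
            count += 1
    return bytes(out), count
```

Checking the fixed flag bits in addition to the ID field is what keeps the
match from firing on arbitrary data that merely happens to contain the ID
pattern, although it remains a heuristic.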

In fact I had several versions of this, and the first one was dumb. It
assumed the ISA code started right after the ELF header, and since that's
not true, it tried several alignments and chose the one that produced the
most "instructions found" results. This of course was error-prone and I had
to implement per-kernel quirks that failed often :) Then I decided that it
would be better to parse the ELF file properly and find exactly where the
text section of the kernel lies. I failed a lot of times (most of them
ended with GPU crashes :) ) until I came across some bitcoin miner code
that _reliably_ patched the BFI thing. I was rather surprised to find out
that we have the ELF-inside-ELF situation. Once I understood that, I was
finally able to find out where the binary code starts so that I could
reliably patch the opcode.
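
To make the ELF-inside-ELF lookup concrete, here is a minimal Python sketch.
It assumes the embedded image is a 32-bit little-endian ELF with a standard
section header table; find_elf_offsets and find_text_sections are
hypothetical helper names, and which .text section actually holds the ISA
code is something to verify against a real kernel binary.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def find_elf_offsets(blob: bytes):
    """Return the offsets of every ELF magic in blob (outer image first)."""
    offsets = []
    pos = blob.find(ELF_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(ELF_MAGIC, pos + 1)
    return offsets


def find_text_sections(elf: bytes):
    """Return (offset, size) for each section named .text.

    Assumes a 32-bit little-endian ELF; offsets are relative to elf[0].
    """
    if elf[:4] != ELF_MAGIC or elf[4] != 1 or elf[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF image")
    (e_shoff,) = struct.unpack_from("<I", elf, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<3H", elf, 46)

    def section(i):
        # First six fields: sh_name, sh_type, sh_flags, sh_addr,
        # sh_offset, sh_size
        return struct.unpack_from("<6I", elf, e_shoff + i * e_shentsize)

    strtab_off = section(e_shstrndx)[4]  # section-name string table
    found = []
    for i in range(e_shnum):
        sh_name, _, _, _, sh_offset, sh_size = section(i)
        start = strtab_off + sh_name
        name = elf[start:elf.index(b"\0", start)]
        if name == b".text":
            found.append((sh_offset, sh_size))
    return found
```

A possible use: call find_elf_offsets on the whole kernel binary, slice the
blob at the second offset to get the inner image, and run
find_text_sections on that slice to get the byte range to scan and patch.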

Note that this is a rather simple case; more advanced binary patching
would be much more complex. I've seen some people on the AMD forums
posting about patching the IL inside a kernel binary. I don't really know
how that works. Perhaps they compile from source, patch the IL section
inside the binary, strip the text section, then pass that to
clBuildProgram again to get the final binary, but I am not sure about this.

Regards,
Milen


>>
>> I know some people "binary patch" AMD kernels for BFI and stuff but I
>> always thought they actually patch IL code in an ELF binary file and then
>> load that. This will of course be a lot faster than actually recompiling so
>> it might be a better alternative (of course vendor-dependent, but I think
>> that currently goes for any method).
>>
>> I bet Milen would know for sure how to proceed.
>>
>>
