Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Tue, 13 Aug 2013 17:53:03 +0200
From: Dániel Bali <>
Subject: Re: Daniel's weekly report #9

Oops, I noticed there was a typo in the subject.

2013/8/13 Dániel Bali <>
> What I will try to do tomorrow is generating the ISA very small/simple
> kernels and see the differences in the initialization process. The key
> questions are:

I worked on this today and found out a few things. I used the AMD APP
KernelAnalyzer to generate the ISA for different kernels. The output is
different from what OpenCL generates on bull, but the difference is minimal.

> - Where is the data for each parameter loaded from?

Here is a test-case:

__kernel void sample (__global int *data1, __global int *data2) {
    data1[11] = 42;
    data2[12] = 43;

The generated ISA:

// s0 will be the starting address of data1
s_buffer_load_dword  s0, s[12:15], 0x00
// s1 will be the starting address of data2
s_buffer_load_dword  s1, s[12:15], 0x04
s_waitcnt     lgkmcnt(0)

// data1[11] = 42
v_mov_b32     v0, s0
v_mov_b32     v1, 42
// offset = 11*4 = 44
tbuffer_store_format_x  v1, v0, s[8:11], 0 offen offset:44

// data2[12] = 43
v_mov_b32     v0, s1
s_waitcnt     expcnt(0) // this is due to the last store instruction
v_mov_b32     v1, 43
// offset = 12*4 = 48
tbuffer_store_format_x  v1, v0, s[4:7], 0 offen offset:48

> - Is there a way to know where the data will be stored?

Yes. What I don't understand is the second (SBASE) parameter of
s_buffer_load_dword (s[12:15]). According to the documentation:
"[SBASE] Specifies the SGPR-pair that holds the base byte-address for the

So the address is probably loaded there before the kernel starts execution.
However, when I only use 1 parameter for the kernel, the SBASE value will
be s[8:11]. With 3, 4 and 5 parameters it becomes s[4:7].

> - How do we access constant values?

When accessing constant buffers the SBASE value was always different, but
aside from that I didn't see any difference.

> - Where do we have to write the end results?

tbuffer_store_format_x also uses 4 scalar registers (s[4:7] above) for its
SRSRC parameter. The purpose of this is unclear to me.
The address where tbuffer_store_format_x writes is v0, which we acquired
from the initial s_buffer_load_dword instruction.
The offset for this instructions represents the array index that was passed.

After realizing that the KernelAnalyzer can generate ISA quickly, I decided
to try and generate the ISA of a "stripped down" blowfish kernel. I kept
the kernel parameters and changed the body to a single line that wrote data
to the BF_out buffer. The generated ISA looked good, but for some reason it
didn't work. I have to look into why this didn't work.

Tomorrow I'll continue comparing the ISA for slightly different kernels.
I want to find out how the global id is acquired, and what is stored in the
SGPRs that are used with s_buffer_load and tbuffer_store.


Content of type "text/html" skipped

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.