john-dev - Re: Katja's weekly report #6

Message-ID: <CAFYn=yDctQ1fe-axSjpnmSugg1dprggbMCmbO31qreUDJG9TKA@mail.gmail.com>
Date: Thu, 25 Jul 2013 06:51:49 -0400
From: Yaniv Sapir <yaniv@...pteva.com>
To: john-dev@...ts.openwall.com
Subject: Re: Katja's weekly report #6

Katja,

>
>
> Yaniv, in Epiphany Architecture Reference 4.13.01.04, p. 20, Table 1 - is
> this order guaranteed only for instructions executed by Epiphany cores? Are
> reads and writes to/from core local memory executed by host also non
> deterministic?
>

First, the table describes the on-chip traffic rules. Generally speaking,
in a system comprised of a host and an Epiphany, more out-of-order rules
may apply depending on the system architecture.

Specifically for the Parallella system, there are two domains - the
external memory domain, governed by the AXI system bus, and the Epiphany
domain, operating as described in that table. Transactions from the host to
an eCore travel via the system bus through the Epiphany (FPGA, actually) bus
interface to the chip's eLink port. From the eLink a transaction continues
on the appropriate on-chip network (xMesh for writes, rMesh for reads).

The key to understanding ordering is the division into networks and paths.
Transactions travelling on the same network between the same endpoints will
complete in-order. Otherwise, you cannot tell which one completes first.

This is also determined by the fact that in the Epiphany architecture,
transactions are performed in a "fire-and-forget" manner, meaning a master
does not know when a transaction reaches its destination (no built-in
hardware "ack"). For read transactions - remember that they are performed
as split transactions. When loading a register from a remote address, the
read request travels on the mesh, and it is not known when it arrives at
the remote agent. During this time, the eCore pipeline is stalled until the
writeback is performed.


Let's think about a couple of scenarios for host-generated transactions:

(I'll use the following symbols - wr(x): write to core #x. rd(y): read
from core #y. Dz: DRAM address z)

1. wr(1), wr(1) - same on-chip path on same mesh network, so the 2nd
transaction is guaranteed to complete after the 1st (so the #1 mem ends up
with the latest value).

2. wr(1), wr(2) - not necessarily same on-chip path, so we can't tell which
one completes first.

3. wr(1), rd(1) - different mesh networks, so not in-order.

4. wr(1), rd(2) - as #3.

5. wr(D1), rd(D1) - should be in-order, but TBD later today.
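
Scenarios 1-4 can be restated as a tiny C model of the rule "same network, same endpoints => in-order". This is only an illustrative sketch: the types and the names txn, wr(), rd() and completes_in_order() are made up for this example and are not part of the Epiphany SDK. Scenario 5 (DRAM) is left out because its behaviour is still TBD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types for illustration; not part of the Epiphany SDK. */
typedef enum { NET_WRITE, NET_READ } mesh_net;

typedef struct {
    mesh_net net;  /* on-chip network carrying the transaction */
    int      core; /* destination core number, i.e. the endpoint */
} txn;

/* Constructors mirroring the wr(x) / rd(y) notation used above. */
static txn wr(int core) { txn t = { NET_WRITE, core }; return t; }
static txn rd(int core) { txn t = { NET_READ,  core }; return t; }

/* True only when two transactions issued back-to-back by the same master
 * use the same network and the same endpoint; in every other case the
 * completion order is unknown. */
static bool completes_in_order(txn first, txn second)
{
    return first.net == second.net && first.core == second.core;
}

/* Scenarios 1-4 from the list above, written as assertions. */
static void check_scenarios(void)
{
    assert( completes_in_order(wr(1), wr(1))); /* 1: same path, same mesh  */
    assert(!completes_in_order(wr(1), wr(2))); /* 2: paths may differ      */
    assert(!completes_in_order(wr(1), rd(1))); /* 3: different meshes      */
    assert(!completes_in_order(wr(1), rd(2))); /* 4: as #3                 */
}
```

The model assumes the destination core is enough to identify the path, which holds for a single host master but is a simplification of the real routing.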




> On p.19 of same pdf: "To ensure that these effects do not occur in code
> that requires strong ordering of load and store operations, use run-time
> synchronization calls with order-dependent memory sequences." - where can I
> find more about those calls? I wasn't able to find them in SDK reference
> (at least not under that name).
>
>
I think that the purpose of this paragraph is to serve as advice or a
warning, rather than to point to an actual implementation.

Here's a piece of code that will be integrated into the e-hal. It should
ensure the completion of a host write transaction (soft ack). The idea is
to use a destination address which is in the same endpoint as the
data-write transactions:

void e_write_ack(unsigned *addr)
{
  volatile unsigned *probe = addr; //volatile: force a real access each time
  unsigned probe_data;
  probe_data = *probe;           //read old data
  probe_data = ~probe_data;      //toggle old data
  *probe     = probe_data;       //write new toggled data
  while (probe_data != *probe);  //keep reading until match is met
}
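
One possible way to use such a soft ack from host code is sketched below. The helper write_then_ack() and its scratch parameter are hypothetical names introduced for this example, not e-hal functions. The assumption is that scratch points to a spare word in the same endpoint (same core) as dst, so the probe write travels behind the data writes on the same path. Once the toggled value is read back, the earlier writes to that endpoint should have landed as well. Note that the probe word is overwritten, so it must not hold live data.

```c
#include <assert.h>
#include <stddef.h>

/* Soft ack as in the routine above; the volatile pointer is added here so
 * the compiler performs a real memory access on every loop iteration. */
static void e_write_ack(unsigned *addr)
{
    volatile unsigned *probe = addr;
    unsigned probe_data = ~(*probe);  /* toggle the old value */
    *probe = probe_data;              /* write the toggled value */
    while (probe_data != *probe)      /* poll until it reads back */
        ;
}

/* Hypothetical helper: write n words to dst, then block until a scratch
 * word in the same endpoint confirms that the writes have arrived. */
static void write_then_ack(unsigned *dst, const unsigned *src, size_t n,
                           unsigned *scratch)
{
    assert(dst != NULL && src != NULL && scratch != NULL);

    volatile unsigned *out = dst;
    for (size_t i = 0; i < n; i++)
        out[i] = src[i];              /* data writes, same endpoint */

    e_write_ack(scratch);             /* probe write follows the data */
}
```

On ordinary host memory the loop exits immediately; the polling only matters when the addresses map to eCore local memory.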
