Date: Fri, 26 Jul 2013 19:29:13 -0400
From: Yaniv Sapir <>
Subject: Re: Parallella: Litecoin mining

No (hardware) caches in the Epiphany.

The amount of memory required by the stack usually depends on the depth of
your function call tree (each called function allocates a new frame on the
stack) and on the amount of function-local variables (automatic variables,
which are allocated in the function's frame). It is hard to tell without a
deep analysis of the code, but one possible technique is to "color" the
stack memory with a known pattern, then run the program and see how deep
the stack went by checking which parts of the pattern were overwritten.
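A minimal sketch of the stack-coloring idea, in C. It assumes the stack grows downward from the top of a region (as with a stack initialized to the top of local memory) and that the program never legitimately writes the fill value; the pattern `STACK_COLOR` and the functions `stack_color` and `stack_high_water` are hypothetical names, not part of any Epiphany SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern: any value the program is unlikely to write. */
#define STACK_COLOR 0xDEADBEEFu

/* Fill the whole stack region with the color pattern before the program runs. */
static void stack_color(uint32_t *base, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        base[i] = STACK_COLOR;
}

/* The stack grows downward from base + nwords, so the deepest point reached
 * is the lowest word whose pattern was overwritten.
 * Returns the peak stack usage in bytes. */
static size_t stack_high_water(const uint32_t *base, size_t nwords)
{
    size_t i = 0;
    while (i < nwords && base[i] == STACK_COLOR)
        i++;                      /* skip words the stack never touched */
    return (nwords - i) * sizeof(uint32_t);
}
```

On real hardware only the part of the stack below the current stack pointer may be colored (coloring the live frame would clobber it), and the region bounds depend on the linker profile in use, so they are assumptions here. If the program happens to store the pattern value at the deepest point, the result underestimates usage; choosing an unusual pattern makes that unlikely.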

On Fri, Jul 26, 2013 at 7:00 PM, Rafael Waldo Delgado Doblas <> wrote:

> Mmm, is there any kind of per-core instruction or data cache? Also, how
> much memory does the stack need? We can use macros instead of functions to
> avoid calls, but how can I see the stack size that a binary will need?
> Sorry for not quoting; I'm using my mobile.
> On 26/07/2013 23:39, "Yaniv Sapir" <> wrote:
>> Technically, you could definitely run code from another core's local memory,
>> just like running from external memory. What you should keep in mind is that
>> the implementation must be thread-safe (which is true for your data buffers
>> as well), and that the stack must be private per thread (which is
>> provided by the internal.ldf and fast.ldf profiles, which initialize the stack
>> to the top of the local memory).
>> Also keep in mind that execution will be slower than from local memory
>> (yet much faster than execution from external memory).
>> Yaniv.
>> On Fri, Jul 26, 2013 at 6:26 PM, Rafael Waldo Delgado Doblas <> wrote:
>>> Hello Yaniv,
>>> 2013/7/26 Yaniv Sapir <>
>>>> Rafael,
>>>> As a starting point, consider using the local memory of one or more
>>>> neighboring cores to store parts of your big array. Retrieving data from
>>>> on-chip memory is much faster than from external memory. Obviously, this
>>>> means that you are either not going to use the adjacent core for parallel
>>>> processing, or you need to write the program so that groups of cores share
>>>> the same data buffer(s) (I am not familiar with the algorithm, so I don't
>>>> know whether "V" needs to be modified during the calculation).
>>>> Yaniv.
>>> V will be modified several times during the calculation and will need the
>>> whole 32 KB per core. Can I load the binary executable image into just one
>>> core's memory area and run it from the rest of the cores?
>>> If we can do that, the memory map will look something like:
>>> 0-31K for V on first core
>>> 32-63K for V on second core
>>> ...
>>> 416-447K for V on fourteenth core
>>> 448-479K for V on fifteenth core
>>> 480-?? for binary executable image
>>> ??-??? for XY B first core.
>>> ....
>>> ??-??? for XY B fifteenth core.
>>> With this approach, only 15 of the 16 cores can be used, because there is
>>> no memory left for the sixteenth core.
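The proposed map is plain arithmetic over 32 KB slots. A small C sketch of that arithmetic, assuming (as the map does) that the 16 cores' 32 KB local memories are treated as one linear 512 KB range; the macros and functions (`V_SIZE`, `v_start`, `code_start`, `remaining`, ...) are hypothetical names for illustration only. It makes explicit that the executable image and all fifteen XY/B buffers would have to fit in the last 32 KB.

```c
#include <stddef.h>

#define KB          1024u
#define V_SIZE      (32u * KB)        /* 32 KB of V per core, as stated above */
#define NUM_V_CORES 15u               /* cores 1..15 each own one V slot */
#define TOTAL_MEM   (16u * 32u * KB)  /* assumption: 16 x 32 KB as one linear range */

/* Start offset of core n's V block (n is 1-based: "first core" is n = 1). */
static size_t v_start(unsigned n) { return (size_t)(n - 1) * V_SIZE; }

/* Offset of the last byte of core n's V block. */
static size_t v_end(unsigned n) { return v_start(n) + V_SIZE - 1; }

/* First byte after all V blocks: where the shared executable image begins. */
static size_t code_start(void) { return (size_t)NUM_V_CORES * V_SIZE; }

/* Bytes left for the executable image plus every core's XY/B buffers. */
static size_t remaining(void) { return TOTAL_MEM - code_start(); }
```

For example, `v_start(15)` is 448 KB and `code_start()` is 480 KB, matching the rows above. On the actual chip each core's local memory is reached through its own global address rather than one contiguous range, so these offsets describe the layout, not literal addresses.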

Yaniv Sapir
Adapteva Inc.
1666 Massachusetts Ave, Suite 14
Lexington, MA 02420
Phone: (781)-328-0513 (x104)

