john-dev - Re: Parallella: Litecoin mining



Message-ID: <CAAepdCYS8_bLrfri9VWx8t_wGqn1X2PShz3JU7Fob4Gjnn4pBA@mail.gmail.com>
Date: Sat, 27 Jul 2013 00:00:41 +0100
From: Rafael Waldo Delgado Doblas <lord.rafa@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Hmm, is there any kind of per-core instruction or data cache? Also, how much
memory does the stack need? We can use macros instead of functions to avoid
calls, but how can I see the stack size that a binary will need?

Sorry for not quoting; I'm using my mobile.
On 26/07/2013 23:39, "Yaniv Sapir" <yaniv@...pteva.com> wrote:

> Technically, you could definitely run code from another core's local mem.,
> just like running from external mem. What you should keep in mind is that
> the implementation must be thread-safe (which is true for your data buffers
> as well), and that the stack should be private per-thread (which is
> provided by the internal.ldf and fast.ldf profiles, initializing the stack
> to the top of the local mem.).
>
> Also keep in mind that the execution will be slower than from local mem.
> (yet, much faster than execution from external mem.).
>
> Yaniv.
>
> On Fri, Jul 26, 2013 at 6:26 PM, Rafael Waldo Delgado Doblas <
> lord.rafa@...il.com> wrote:
>
>> Hello Yaniv,
>>
>>
>> 2013/7/26 Yaniv Sapir <yaniv@...pteva.com>
>>
>>> Rafael,
>>>
>>> As a starting point, consider using the local memory of one or more neighbor
>>> cores to store parts of your big array. It is much faster to retrieve data
>>> from on-chip memory than from external memory. Obviously, it means that you are
>>> either not going to use the adjacent core for processing in parallel, or
>>> you need to write the program in a way that core groups share the same data
>>> buffer(s) (I am not familiar with the algorithm so I don't know if "V"
>>> needs to be modified during the calculation).
>>>
>>> Yaniv.
>>>
>>>
>> V will be modified several times during the calculation and will need the
>> whole 32 KB per core. Can I load the binary executable image into just one
>> core's memory area and run it from the rest of the cores?
>>
>> If we can do that, the memory map will look something like:
>>
>> 0-31K for V on first core
>> 32-63K for V on second core
>> ...
>> 416-447K for V on fourteenth core
>> 448-479K for V on fifteenth core
>>
>> 480-?? for binary executable image
>> ??-??? for XY B first core.
>> ....
>> ??-??? for XY B fifteenth core.
>>
>> With this approach, only 15 of the 16 cores can be used, because there is no
>> memory left for the sixteenth core's V.
>>
>

