crypt-dev - Re: Yuri's Status Report

Message-ID: <CAFMirAGLNJxqVoQRMVWByYjhPbgHFaxiB7fwiR1R59Q-MHa14g@mail.gmail.com>
Date: Thu, 18 Aug 2011 02:46:10 -0300
From: Yuri Gonzaga <yuriggc@...il.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Yuri's Status Report - #14 of 15

> Would it be possible for you to invoke picoXfaceP->WriteDevice() just
> once, or just once per core, on the entire block of data?  Perhaps start
> by turning this loop:
>                for(int i = 0; i < 1024; i++) {
>                        if(picoXfaceP->WriteDevice(7,&sBoxes[i],4) < 0){
>                                printf("ERRO NA ESCRITA. ABORTANDO\n");
>                                exit(EXIT_FAILURE);
>                        }
>                }
> into one call to picoXfaceP->WriteDevice(), for the 4 KB of data.
>

I tried several times to call WriteDevice() with a whole block of data, but
it isn't working: the computation returns a wrong result.
Maybe I don't know how to call that function properly, or there is a
byte-ordering problem.
In fact, it works when passing 4 bytes at a time, but apparently not with
larger blocks.
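
One way to test the byte-ordering hypothesis is to serialize the words into a byte buffer explicitly and send that buffer in one call. This is only a minimal sketch: pack_words_le(), pack_words_be() and write_block() are hypothetical helpers, and `write` is a stand-in for the real transfer call, assumed to take (address, data, size) and return a negative value on error, as in the quoted loop. Which byte order the device expects is not known here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Serialize 32-bit words into bytes, least significant byte first.
// Whether the device expects this order is an assumption to be tested.
std::vector<std::uint8_t> pack_words_le(const std::uint32_t *words, std::size_t count) {
    assert(words != nullptr || count == 0);
    std::vector<std::uint8_t> out;
    out.reserve(count * 4);
    for (std::size_t i = 0; i < count; i++) {
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<std::uint8_t>(words[i] >> shift));
    }
    return out;
}

// Same as above, but most significant byte first, to try the other order.
std::vector<std::uint8_t> pack_words_be(const std::uint32_t *words, std::size_t count) {
    assert(words != nullptr || count == 0);
    std::vector<std::uint8_t> out;
    out.reserve(count * 4);
    for (std::size_t i = 0; i < count; i++) {
        for (int shift = 24; shift >= 0; shift -= 8)
            out.push_back(static_cast<std::uint8_t>(words[i] >> shift));
    }
    return out;
}

// Send the whole buffer with a single call instead of one call per word.
// `write` is a hypothetical stand-in for the real transfer function.
template <typename WriteFn>
bool write_block(WriteFn write, int address, const std::vector<std::uint8_t> &buf) {
    return write(address, buf.data(), static_cast<int>(buf.size())) >= 0;
}
```

With 1024 words, either packing function yields a 4096-byte buffer. Comparing the results from both orderings against the known-good 4-bytes-at-a-time path would show whether byte order is the cause.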


> On the other hand, if this is somehow time-consuming for you (would take
> more than a day to implement) and the API for M-501 under Linux is very
> different (I don't know if it is), then don't bother.
>

It is very similar.


> > With cost = 18, and 4 cores vs. 4 sequential invocations, I got:
> >
> >    - Sequential total time: ~ 33 minutes
> >    - Parallel total time: ~ 9 minutes
> These numbers looked reasonable to me at first, but then I did some math
> and they don't agree with the 0.06 seconds figure for cost=5 that you
> gave above.  Specifically:
> 33 * 60 / (2 ^ (18 - 5)) = 0.24
> I expected to see something close to 0.06.  Why is it 4 times slower
> here?  The difference between sequential and parallel times suggests
> that the reads/writes overhead is indeed pretty low at cost=18, so this
> overhead does not explain the 0.06 vs. 0.24 discrepancy.
> Do you have an explanation?
>

Could you please explain your math in more detail?
I'm sorry, but I didn't understand how you arrived at it.
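
For reference, the arithmetic in the quoted formula can be reproduced with a small sketch. It assumes, as the 2 ^ (18 - 5) factor implies, that running time doubles with each cost increment. scale_time() is a hypothetical helper, not part of any existing code.

```cpp
#include <cassert>
#include <cmath>

// Scale a time measured at cost_measured to the time expected at
// cost_target, assuming time is proportional to 2^cost.
double scale_time(double seconds, int cost_measured, int cost_target) {
    // std::ldexp(1.0, n) computes 2^n exactly for integer n.
    return seconds / std::ldexp(1.0, cost_measured - cost_target);
}
```

Here scale_time(33 * 60.0, 18, 5) gives about 0.24 seconds, which is the figure in the quoted message. If the ~33 minutes covers all 4 sequential invocations, dividing by 4 gives about 0.06 seconds per invocation.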


> > 4 cores, although they (plus Pico bus internal logic) are occupying 65%,
> > suggesting the increase of cores.
> > But if add only one more core the tool is not able anymore to fit
> > everything in that FPGA.
> Any idea why not?  What specific error message does it give when you try
> to fit 5 cores?


The first error is:

"ERROR:Place:543 - This design does not fit into the number of slices available
   in this device due to the complexity of the design and/or constraints."

And it gives the following guidance:

"Please evaluate the following:

   - If there are user-defined constraints or area groups:
     Please look at the "User-defined constraints" section below to determine
     what constraints might be impacting the fitting of this design.
     Evaluate if they can be moved, removed or resized to allow for fitting.
     Verify that they do not overlap or conflict with clock region restrictions.
     See the clock region reports in the MAP log file (*map) for more details
     on clock region usage.

   - If there is difficulty in placing LUTs:
     Try using the MAP LUT Combining Option (map lc area|auto|off).

   - If there is difficulty in placing FFs:
     Evaluate the number and configuration of the control sets in your design."

Regards,

---
Yuri
