john-dev - Re: Parallella: scrypt

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130627141234.GA21449@openwall.com>
Date: Thu, 27 Jun 2013 18:12:34 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: scrypt

Rafael,

On Tue, Jun 25, 2013 at 01:00:00PM +0100, Rafael Waldo Delgado Doblas wrote:
> I just finished the implementation of Scrypt format. Now I have some
> questions:
> 
> The SSE implementation have a really poor performance on my i5-3570K only
> 30 c/s

What are you comparing it against to state that this performance is
"poor"?  If I understand what test vector you're using correctly, this
one had been tuned (by scrypt author) for 100 ms on one core in a
Core 2 Duo.  This means 10 c/s per core.  However, since scrypt also
uses RAM (at these settings), its performance does not scale linearly on
multi-core systems.  On the other hand, interleaving instructions from
multiple independent Salsa20 core computations may improve performance
(an attack-specific optimization, which we don't have implemented yet).

I took a look at lordrafa_scrypt_fmt.c.  Although you based it on
c3_fmt.c, which included OpenMP support, you dropped that in your
revision of the code (yet you forgot to drop the FMT_OMP flag, setting
which without actually having OpenMP support is a minor bug).  So your
30 c/s is probably for one core, which is 3x better than the performance
achieved by scrypt author's SSE2 code on one core in his Core 2 Duo from
a few years ago.  That's fine performance - but indeed we should try to
improve it further.

Another detail I noticed - somehow you require HAVE_SCRYPT, but it's not
needed because this very source tree provides scrypt code.  (We did need
HAVE_CRYPT, because crypt(3) is provided by the system and may not be
available on some systems.)

For now, to test your code speed on FX-8120, which we have in our dev
box (bull), I added -DHAVE_SCRYPT to the linux-x86-64-xop target in my
copy of your tree.  I also lowered MIN_KEYS_PER_CRYPT and
MAX_KEYS_PER_CRYPT from 96 (inherited from c3_fmt.c) to 1, for faster
benchmarking, better interactivity when cracking, and less work lost
when interrupting/restoring.  (Higher values are needed are needed for
faster hashes and for OpenMP or similar.)  After these changes, I got
28 c/s, which is fine.

> Also I need to know the if this are the targets for my gsoc project:

Not quite.  We'll need to work on a project plan for you.

> For the first half of the summer:
>   First, implement scrypt in host mode.

OK.  You did it as a hack, just like I had suggested.  Now it feels like
I need to reimplement it cleanly. %-)

>   Second, experiment with scrypt time-memory tradeoff.

Yes.

>   Third, move it to epiphany.

I had suggested this, but upon a second thought I am not sure.
For practically relevant scrypt settings, except as used in crypto
currencies, performance on Epiphany will be extremely poor.  So maybe we
should skip this and jump to Litecoin mining only.

Perhaps you can have Litecoin mining working in two ways (and in two
source trees) by GSoC midterm:

1. Integrated into jumbo tree, on host CPU (usually x86) and/or GPU
and/or Xeon Phi.

2. Running on Epiphany as proof-of-concept - this can be implemented
perhaps as a patch to cgminer code base.  We might not even need to have
JtR involved in this, although having it implemented on top of JtR core
tree is an option too (especially if/since JtR will also have bcrypt on
Epiphany due to Katja's work).

> For the second half of the summer(early August or maybe before?) :
>   Fourth, implement epiphany based Litecoin mining.

I suggest that you approach this in the first half of summer.

>   Fifth, implement descrypt on Parallella.

Maybe, or/and (if time permits) you could work on implementing scrypt
and Litecoin mining on Xeon Phi (for both JtR and upstream cgminer
project - we'd submit a patch then).  Of course, this depends on us
actually getting Xeon Phi to work - but this is planned.

> Furthermore I need to know the next task, yesterday you told me about
> interleaving and opencl, may you tell me more about those tasks?

I've just CC'ed you on a message related to JtR/OpenCL on Epiphany.
This is also something you can work on during the summer, although I
think it should be among the stretch goals or tasks that you work on
when you would otherwise be idle (such as when waiting for guidance on
your other tasks) - not part of your GSoC project description.

As to interleaving, we'll definitely need to try it on x86 CPUs, but
probably not on Epiphany (we'd need to use even higher TMTO factors
then, which would neutralize any advantage from hiding instruction
latencies via interleaving, and besides Epiphany's latencies are
probably low enough as-is).

> is it time-memory tradeoff related?

Not directly.

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.