john-dev - Re: Parallella: scrypt

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAepdCaNw+-ZJz_jmvJqwYH9ZBb6rnScurkVVu-TAHtMuuN3rA@mail.gmail.com>
Date: Tue, 2 Jul 2013 22:58:49 +0100
From: Rafael Waldo Delgado Doblas <lord.rafa@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: scrypt

Hello Alexander,

I have checked the scrypt TMTO version that you send me. If I understand
the algorithm correctly, In order to reduce the memory used, we compute Xi
sequentially* in the first "for" but instead of store all Xi in V, we only
store those that makes i%exploit_tmto=0. Then in the second "for" we need
to calculate again Xj if j%exploit_tmto != 0, using the immediately smaller
Xb that makes b%exploit_tmto=0, as base to calculate it. Please tell me if
I make any mistake.

Also would like to know, why do you said to me that this approach is not
valid to be integrated into JtR? Is it only because as you said before,
only a smaller portion of the allocated region is actually used or there
any other reason?.

I almost know what the functions of scrypt do, at least in a general idea,
but I still having doubts with the integerify(X, r), may someone explain it
a little bit better? because in "http://www.tarsnap.com/scrypt/scrypt.pdf"
I couldn't see too much about it.

Furthermore the tmto is only useful for Epiphany because we have only 32K
and Litecoin needs 128K, is it right?. Also the only way to use paralleling
computer is create multiple instances of scrypt algorithm computing
different keys*, is it true?.

BTW Now, I was checking about Anthony's scrypt time-memory tradeoff
defeater and the solution proposed to attack it here: "
http://www.openwall.com/lists/crypt-dev/2013/03/17/1"

After this, what should be my next task? Litecoin integration with JtR or
try to move scrypt to Epiphany?

Best regards,
Rafael.

* because smix is sequential memory-hard under Random Oracle model


2013/6/27 Solar Designer <solar@...nwall.com>

> Rafael,
>
> On Tue, Jun 25, 2013 at 01:00:00PM +0100, Rafael Waldo Delgado Doblas
> wrote:
> > I just finished the implementation of Scrypt format. Now I have some
> > questions:
> >
> > The SSE implementation have a really poor performance on my i5-3570K only
> > 30 c/s
>
> What are you comparing it against to state that this performance is
> "poor"?  If I understand what test vector you're using correctly, this
> one had been tuned (by scrypt author) for 100 ms on one core in a
> Core 2 Duo.  This means 10 c/s per core.  However, since scrypt also
> uses RAM (at these settings), its performance does not scale linearly on
> multi-core systems.  On the other hand, interleaving instructions from
> multiple independent Salsa20 core computations may improve performance
> (an attack-specific optimization, which we don't have implemented yet).
>
> I took a look at lordrafa_scrypt_fmt.c.  Although you based it on
> c3_fmt.c, which included OpenMP support, you dropped that in your
> revision of the code (yet you forgot to drop the FMT_OMP flag, setting
> which without actually having OpenMP support is a minor bug).  So your
> 30 c/s is probably for one core, which is 3x better than the performance
> achieved by scrypt author's SSE2 code on one core in his Core 2 Duo from
> a few years ago.  That's fine performance - but indeed we should try to
> improve it further.
>
> Another detail I noticed - somehow you require HAVE_SCRYPT, but it's not
> needed because this very source tree provides scrypt code.  (We did need
> HAVE_CRYPT, because crypt(3) is provided by the system and may not be
> available on some systems.)
>
> For now, to test your code speed on FX-8120, which we have in our dev
> box (bull), I added -DHAVE_SCRYPT to the linux-x86-64-xop target in my
> copy of your tree.  I also lowered MIN_KEYS_PER_CRYPT and
> MAX_KEYS_PER_CRYPT from 96 (inherited from c3_fmt.c) to 1, for faster
> benchmarking, better interactivity when cracking, and less work lost
> when interrupting/restoring.  (Higher values are needed are needed for
> faster hashes and for OpenMP or similar.)  After these changes, I got
> 28 c/s, which is fine.
>
> > Also I need to know the if this are the targets for my gsoc project:
>
> Not quite.  We'll need to work on a project plan for you.
>
> > For the first half of the summer:
> >   First, implement scrypt in host mode.
>
> OK.  You did it as a hack, just like I had suggested.  Now it feels like
> I need to reimplement it cleanly. %-)
>
> >   Second, experiment with scrypt time-memory tradeoff.
>
> Yes.
>
> >   Third, move it to epiphany.
>
> I had suggested this, but upon a second thought I am not sure.
> For practically relevant scrypt settings, except as used in crypto
> currencies, performance on Epiphany will be extremely poor.  So maybe we
> should skip this and jump to Litecoin mining only.
>
> Perhaps you can have Litecoin mining working in two ways (and in two
> source trees) by GSoC midterm:
>
> 1. Integrated into jumbo tree, on host CPU (usually x86) and/or GPU
> and/or Xeon Phi.
>
> 2. Running on Epiphany as proof-of-concept - this can be implemented
> perhaps as a patch to cgminer code base.  We might not even need to have
> JtR involved in this, although having it implemented on top of JtR core
> tree is an option too (especially if/since JtR will also have bcrypt on
> Epiphany due to Katja's work).
>
> > For the second half of the summer(early August or maybe before?) :
> >   Fourth, implement epiphany based Litecoin mining.
>
> I suggest that you approach this in the first half of summer.
>
> >   Fifth, implement descrypt on Parallella.
>
> Maybe, or/and (if time permits) you could work on implementing scrypt
> and Litecoin mining on Xeon Phi (for both JtR and upstream cgminer
> project - we'd submit a patch then).  Of course, this depends on us
> actually getting Xeon Phi to work - but this is planned.
>
> > Furthermore I need to know the next task, yesterday you told me about
> > interleaving and opencl, may you tell me more about those tasks?
>
> I've just CC'ed you on a message related to JtR/OpenCL on Epiphany.
> This is also something you can work on during the summer, although I
> think it should be among the stretch goals or tasks that you work on
> when you would otherwise be idle (such as when waiting for guidance on
> your other tasks) - not part of your GSoC project description.
>
> As to interleaving, we'll definitely need to try it on x86 CPUs, but
> probably not on Epiphany (we'd need to use even higher TMTO factors
> then, which would neutralize any advantage from hiding instruction
> latencies via interleaving, and besides Epiphany's latencies are
> probably low enough as-is).
>
> > is it time-memory tradeoff related?
>
> Not directly.
>
> Alexander
>

Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.