john-dev - Re: Handling of hashes with different iteration counts

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <534714F0.1030001@mailbox.org>
Date: Fri, 11 Apr 2014 00:02:24 +0200
From: Frank Dittrich <frank.dittrich@...lbox.org>
To: john-dev@...ts.openwall.com
Subject: Re: Handling of hashes with different iteration counts

Solar, magnum, all,

On 01/28/2014 01:46 AM, Solar Designer wrote:
> Iteration count is just a special case of tunable cost.  We already have
> scrypt, which accepts three tunable cost parameters: N, r, p.  Maybe the
> format should export these as up to two - t_cost and m_cost (as they're
> called in PHC) - even if the underlying hash/KDF is more flexible.  For
> the scrypt example, as long as our implementation doesn't use TMTO and
> doesn't use scrypt's internal parallelism (since we've got plenty due to
> multiple candidate passwords), the format may export N*r as m_cost and
> N*r*p as t_cost. 

So far, most formats just had a tunable parameter "iteration count".
Now I am dealing with django-scrypt.
It has the tunable cost parameters known from scrypt:
  N - the CPU cost, must be a power of 2, defaults to 2^15
  r - the memory cost, defaults to 8
  p - the parallelization parameter, defaults to 1

If p is 1, then N*r = m_cost = N*r*p = t_cost. This doesn't seem to be
very useful.
For now (not committed yet, and thus no pull request), I decided to
report N*p as t_cost and r*p as m_cost.
(As long as p == 1, just N and r are reported; p > 1 influences both
t_cost and m_cost. Does that make sense?)

> And we'd need command-line options, similar to --salts,
> which will let us choose subsets of hashes based on ranges of these
> values (in fact, combined with --salts, this will be up to three ranges
> that the chosen hashes must or must not fall within).

Done, works in bleeding-jumbo (for bcrypt and a few other formats that
currently do report their tunable costs).
It is implemented as a single option --costs=, which allows a comma
separated list of values and/or ranges (one value/range per tunable cost).
For a single tunable cost, the syntax corresponds to the --salts=syntax
(as it is implemented in latest bleeding).

> t_cost shouldn't be in real time units, but rather in some fixed units -
> e.g., for bcrypt it will be exactly the cost parameter specified with
> hashes.  I think it's OK to keep it the base2-logarithm of actual cost,
> since that's how bcrypt is defined.  Or do we prefer to make it linear,
> so that regardless of hash type if one doubles the acceptable t_cost,
> they should expect the c/s rate to be roughly twice lower?  That would
> make sense to me, too.

I think reporting the actual cost (linear instead of log2) is less
confusing for users who are not that familiar with the definition of
certain hash algorithms, so that's what I've done so far.
(The other reason is that I prefer to print "iteration count" instead of
"log2(iterations)" when informing a user about hashes with varying cost
values.)

That's why, I reported as bcrypt's cost value
	1 << bf_salt->rounds
and intend to report as django-scrypt's t_cost value
	s->p * (1 << s->N)

> On the other hand, it'd be nice if our m_cost will be in bytes or KiB,
> and will not include build-specific memory allocation overhead.  That
> way, it'd be convenient to choose only hashes that fit in RAM, yet the
> same command-line will result in consistent behavior across systems.

I don't know how r = 8 translates into bytes, or how N and p influence
m_cost, so just reporting r * p as m_cost probably isn't good enough.

> We could also have a just.conf setting that will specify the maximum
> memory size that JtR is permitted to allocate, and its implementation
> could use formats' exported m_cost in order to detect would-be-excessive
> memory usage without actually reaching the threshold.

Before I can even think about that, I'll have to get reporting m_cost right.


Frank
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.