john-dev - Re: WinZip PBKDF2 use optimization

Message-ID: <fc6357e806a5f40b222b59cb22040991@smtp.hushmail.com>
Date: Thu, 2 Jun 2016 18:41:32 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: WinZip PBKDF2 use optimization

On 2016-05-13 14:01, Solar Designer wrote:
> atom just posted this:
>
> Behind the WinZip KDF optimization
> https://hashcat.net/forum/thread-5451.html
>
> It's about only needing to compute some of the PBKDF2 output blocks for
> AES key sizes larger than 128 bits.
>
> I vaguely recalled that we already had it, and I went to check - to my
> surprise, it looks like the code currently in jumbo is fully prepared
> for this optimization, but does not actually include it for WinZip.
> Specifically, pbkdf2_hmac_sha1.h says:
>
>  * simpler, AND contains an option to skip bytes, and only call the hashing
>  * function where needed (significant speedup for zip format).
>
> Indeed, it accepts a parameter skip_bytes, but somehow zip_fmt_plug.c
> passes 0 for that parameter all the time.  Looking through commits
> history for zip_fmt_plug.c, I found that the optimization was lost with:
>
> commit 528e6bcfb1a59f068b70c63b3c0d7ffc62c32ce4
> Author: JimF <jfoug@....net>
> Date:   Sun Jul 6 22:03:13 2014 -0500
>
>     zip2 format. #434 #691  Removed FMT_NOT_EXACT. Now fully detects passwords.
>
> Can the two of you look into this, please, and likely reintroduce the
> optimization?  Also check the OpenCL format for the same.

Fixed now. Wow, I had never realized this property of PBKDF2: each
output block is computed independently of the others. Good to keep in
mind. I always thought the second and later blocks depended on the
previous one.
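
To illustrate that property, here is a minimal Python sketch (not the
jumbo code). The helper name pbkdf2_sha1_block and the test values are
made up for this example. It computes a single PBKDF2-HMAC-SHA1 output
block on its own and compares it with the matching slice of a full
derivation from hashlib:

```python
import hashlib
import hmac
import struct


def pbkdf2_sha1_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute only output block T_index (1-based) of PBKDF2-HMAC-SHA1.

    Hypothetical helper for illustration; it depends only on the password,
    the salt, the iteration count and the block index, never on other blocks.
    """
    # U_1 = HMAC(password, salt || INT_32_BE(index))
    u = hmac.new(password, salt + struct.pack(">I", index), hashlib.sha1).digest()
    t = int.from_bytes(u, "big")
    # U_j = HMAC(password, U_{j-1}); T_index is the XOR of all U_j
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        t ^= int.from_bytes(u, "big")
    return t.to_bytes(20, "big")


# Arbitrary example inputs (assumptions, not taken from a real archive).
password, salt, iterations = b"password", b"saltsalt", 1000

# Full derivation of four 20-byte blocks, for comparison.
full = hashlib.pbkdf2_hmac("sha1", password, salt, iterations, dklen=80)

# Block 3 computed alone matches bytes 40..59 of the full output.
third = pbkdf2_sha1_block(password, salt, iterations, 3)
assert third == full[40:60]
```

Only the block index fed into the first HMAC differs between blocks,
which is why any block can be skipped or computed in isolation.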

I got a much bigger boost than atom reported, though. He said the
speedup would be 25% for AES256 and 33% for AES192, and nothing for
AES128. But that is only the gain from skipping (e.g. computing only the
last 3 of the 4 chunks for AES256). For early rejection we need just one
chunk of output, which saves us from calculating *another* 1-2 chunks.

That is, instead of calculating 4 chunks for AES256 (the naive way) or 3
chunks (as atom described), we skip straight to the chunk that holds the
verifier and calculate just *one* chunk out of the four. If the 16 bits
of verifier don't indicate a match, we're done. That makes the speedup
very close to 300% (4x).
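
A rough Python sketch of that early-rejection check follows. It is not
the code in zip_fmt_plug.c. It assumes the key layout from the WinZip AE
specification (AES key, then an HMAC key of the same length, then the
2-byte verifier), 20-byte SHA-1 chunks and 1000 iterations. The names
verifier_block, pbkdf2_sha1_block and quick_check are hypothetical:

```python
import hashlib
import hmac
import struct

SHA1_LEN = 20  # size of one PBKDF2-HMAC-SHA1 output chunk


def verifier_block(key_bits: int) -> tuple[int, int]:
    """Return (1-based chunk index, offset inside that chunk) of the verifier.

    Assumes the derived key is: AES key || HMAC key || 2-byte verifier.
    """
    offset = 2 * (key_bits // 8)
    return offset // SHA1_LEN + 1, offset % SHA1_LEN


def pbkdf2_sha1_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute a single PBKDF2-HMAC-SHA1 output chunk (illustrative helper)."""
    u = hmac.new(password, salt + struct.pack(">I", index), hashlib.sha1).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        t ^= int.from_bytes(u, "big")
    return t.to_bytes(SHA1_LEN, "big")


def quick_check(password: bytes, salt: bytes, key_bits: int,
                verifier: bytes, iterations: int = 1000) -> bool:
    """Compute only the chunk holding the verifier and compare 16 bits."""
    index, start = verifier_block(key_bits)
    chunk = pbkdf2_sha1_block(password, salt, iterations, index)
    return chunk[start:start + 2] == verifier


# Made-up example: build the reference verifier from a full derivation.
salt = bytes(range(16))
full = hashlib.pbkdf2_hmac("sha1", b"secret", salt, 1000, dklen=66)
verifier = full[64:66]

assert verifier_block(256) == (4, 4)
assert quick_check(b"secret", salt, 256, verifier)
```

A candidate that passes this check still needs the remaining chunks for
full verification, but with a 16-bit verifier that happens for only
about 1 in 65536 wrong candidates.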

For AES192, the speedup is 200% (3x, we compute one chunk instead of
three), and for AES128, even though we can't skip any chunk, we can
still compute just a single chunk instead of both, for a 100% (2x)
boost.
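
The chunk counts behind these figures can be tallied with a few lines of
Python. This is only arithmetic under the same assumptions as the WinZip
AE layout (derived key of 2 * keylen + 2 bytes, 20-byte SHA-1 chunks);
the function name chunk_counts is made up for the example:

```python
SHA1_LEN = 20  # bytes produced per PBKDF2-HMAC-SHA1 chunk


def chunk_counts(key_bits: int) -> tuple[int, int, int]:
    """Return (naive, skipping, early_reject) chunk counts for one key size."""
    key_len = key_bits // 8
    total_len = 2 * key_len + 2          # AES key + HMAC key + verifier
    naive = -(-total_len // SHA1_LEN)    # ceiling division
    # Skipping: drop the leading chunks that hold only AES key bytes.
    skipping = naive - key_len // SHA1_LEN
    # Early rejection: only the chunk containing the verifier.
    early_reject = 1
    return naive, skipping, early_reject


for key_bits in (128, 192, 256):
    naive, skipping, early = chunk_counts(key_bits)
    print(f"AES{key_bits}: naive={naive} skipping={skipping} early={early} "
          f"(up to {naive // early}x for rejected candidates)")
```

The ratio of naive to early-reject chunks gives the 2x, 3x and 4x
figures, which apply to candidates rejected by the verifier.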

Our OpenCL format never had this optimization before, but it has now.

magnum
