Date: Thu, 2 Jun 2016 18:41:32 +0200
From: magnum <>
Subject: Re: WinZip PBKDF2 use optimization

On 2016-05-13 14:01, Solar Designer wrote:
> atom just posted this:
> Behind the WinZip KDF optimization
> It's about only needing to compute some of the PBKDF2 output blocks for
> AES key sizes larger than 128 bits.
> I vaguely recalled that we already had it, and I went to check - to my
> surprise, it looks like the code currently in jumbo is fully prepared
> for this optimization, but does not actually include it for WinZip.
> Specifically, pbkdf2_hmac_sha1.h says:
>  * simpler, AND contains an option to skip bytes, and only call the hashing
>  * function where needed (significant speedup for zip format).
> Indeed, it accepts a parameter skip_bytes, but somehow zip_fmt_plug.c
> passes 0 for that parameter all the time.  Looking through commits
> history for zip_fmt_plug.c, I found that the optimization was lost with:
> commit 528e6bcfb1a59f068b70c63b3c0d7ffc62c32ce4
> Author: JimF <>
> Date:   Sun Jul 6 22:03:13 2014 -0500
>     zip2 format. #434 #691  Removed FMT_NOT_EXACT. Now fully detects passwords.
> Can the two of you look into this, please, and likely reintroduce the
> optimization?  Also check the OpenCL format for the same.

Fixed now. Wow, I never realized this property of PBKDF2 until now: each
output block is computed independently of the others. Good to keep in
mind. I always thought the second and later blocks depended on the
previous one.
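
To illustrate that property, here is a minimal Python sketch (not the jumbo code). The helper name pbkdf2_block is made up for this example; it computes a single PBKDF2-HMAC-SHA1 output block from its 1-based index alone, following the block definition in RFC 8018, and compares it with the matching slice of the full output from hashlib.

```python
import hashlib
import hmac
import struct

SHA1_LEN = 20  # HMAC-SHA1 output size in bytes


def pbkdf2_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute only output block T_index (1-based) of PBKDF2-HMAC-SHA1.

    Hypothetical helper for illustration. The block index enters only
    through the first HMAC input, so no other block is needed.
    """
    u = hmac.new(password, salt + struct.pack(">I", index), hashlib.sha1).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        t ^= int.from_bytes(u, "big")  # XOR-accumulate U_1 ^ U_2 ^ ... ^ U_c
    return t.to_bytes(SHA1_LEN, "big")


if __name__ == "__main__":
    # Reference: 80 bytes = four 20-byte blocks computed the usual way.
    full = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 1000, 4 * SHA1_LEN)
    # Block 3 on its own equals bytes 40..59 of the full output.
    assert pbkdf2_block(b"password", b"salt", 1000, 3) == full[40:60]
```

Because the index is the only thing that distinguishes the blocks, any one of them costs the same as the first, and the others can simply be left out.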

I got a much bigger boost than atom described, though. He said the
speedup would be 25% for AES256 and 33% for AES192, and nothing for
AES128. But that is only the speedup from skipping (e.g. doing only the
last 3 out of 4 chunks for AES256). For early rejection we need just one
chunk of output, which saves us from calculating *another* 1-2 chunks.

That is, instead of calculating 4 chunks for AES256 (the naive way) or 3
chunks (as atom described), we skip straight to the chunk that holds the
verifier and calculate *one* chunk out of the four total. If the 16 bits
of verifier don't indicate a match, we're done. That makes the speedup
very close to 300% (4x).
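
A rough Python sketch of that early-reject path, not the actual zip_fmt_plug.c code. It assumes the derived-key layout from the WinZip AES documentation (encryption key, then authentication key, then the 2-byte verifier) and 1000 iterations; the function names are made up for this example.

```python
import hashlib
import hmac
import struct

SHA1_LEN = 20      # HMAC-SHA1 output size in bytes
ITERATIONS = 1000  # assumed WinZip AES iteration count


def pbkdf2_block(password: bytes, salt: bytes, iterations: int, index: int) -> bytes:
    """Compute only output block T_index (1-based) of PBKDF2-HMAC-SHA1."""
    u = hmac.new(password, salt + struct.pack(">I", index), hashlib.sha1).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        t ^= int.from_bytes(u, "big")
    return t.to_bytes(SHA1_LEN, "big")


def verifier_only(password: bytes, salt: bytes, key_bits: int) -> bytes:
    """Derive just the 2-byte verifier by computing a single block."""
    key_len = key_bits // 8
    offset = 2 * key_len             # verifier follows both keys (assumed layout)
    index = offset // SHA1_LEN + 1   # 1-based block containing that offset
    start = offset % SHA1_LEN        # 12, 8, 4 for AES128/192/256: never straddles
    return pbkdf2_block(password, salt, ITERATIONS, index)[start:start + 2]


def early_reject(password: bytes, salt: bytes, key_bits: int, stored: bytes) -> bool:
    """True if the candidate can be discarded after one block of work."""
    return verifier_only(password, salt, key_bits) != stored
```

A wrong password still passes this 16-bit check about once in 65536 tries, so survivors need the remaining chunks and the HMAC check over the data.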

For AES192, the speedup is 200% (3x: we do one chunk instead of three).
For AES128, even though skipping alone gains nothing, we can still
calculate just a single chunk instead of both, for a 100% (2x) boost.
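
The chunk counts behind those figures can be checked with a few lines of Python. This is only arithmetic, under the assumption that the derived output is 2 * key length + 2 bytes and each HMAC-SHA1 chunk is 20 bytes; chunk_counts is a made-up name.

```python
SHA1_LEN = 20  # bytes per PBKDF2-HMAC-SHA1 chunk


def chunk_counts(key_bits: int) -> tuple[int, int, int]:
    """Return (naive, skip_only, early_reject) chunk counts for one candidate."""
    key_len = key_bits // 8
    total = 2 * key_len + 2            # enc key + auth key + verifier (assumed)
    naive = -(-total // SHA1_LEN)      # ceiling division: all chunks
    skipped = key_len // SHA1_LEN      # whole chunks lying inside the enc key
    skip_only = naive - skipped        # the skipping atom described
    early_reject = 1                   # only the chunk holding the verifier
    return naive, skip_only, early_reject


for bits in (128, 192, 256):
    naive, skip_only, early = chunk_counts(bits)
    print(f"AES{bits}: naive={naive} skip-only={skip_only} "
          f"early-reject={early} ({naive // early}x)")
# AES128: naive=2 skip-only=2 early-reject=1 (2x)
# AES192: naive=3 skip-only=2 early-reject=1 (3x)
# AES256: naive=4 skip-only=3 early-reject=1 (4x)
```

These ratios apply to candidates rejected by the verifier, which is nearly all of them.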

Our OpenCL format never had this optimization before, but it does now.

