john-dev - RE: pkzip encr

Message-ID: <00eb01cc5d20$1ac0ee90$5042cbb0$@net>
Date: Wed, 17 Aug 2011 15:56:15 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: pkzip encr

>> Raw:    6594M c/s

That should read 6594K.  Here is a test I just ran:

$ ./john -test=3 -form=pkzip
Benchmarking: pkzip [N/A]... DONE
Raw:    6553K c/s

There are two hashing steps within the pkzip format.

The first is a quick checksum.  It is simply a loop over the password, and
the first 12 bytes of the file.  It uses a couple calls to crcupdate, and a
lookup in a pre computed multiply table (and some shifting, etc).
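
As a rough illustration of that quick check, here is a minimal Python sketch
of the traditional PKZIP stream cipher applied to the 12-byte header. It is
not the code from the john format: the names ZipKeys, decrypt_header and
encrypt_header are hypothetical, and it multiplies directly rather than using
the precomputed multiply table mentioned above.

```python
# Standard CRC-32 table (reflected polynomial 0xEDB88320).
CRC_TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
    CRC_TABLE.append(c)


def crc_update(crc, byte):
    """One-byte CRC-32 step, without the initial/final inversion."""
    return CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)


class ZipKeys:
    """Three-word key state of the traditional PKZIP cipher (hypothetical name)."""

    def __init__(self, password):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:          # first loop: over the password
            self.update(b)

    def update(self, plain_byte):
        self.k0 = crc_update(self.k0, plain_byte)
        self.k1 = ((self.k1 + (self.k0 & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc_update(self.k2, self.k1 >> 24)

    def stream_byte(self):
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF


def decrypt_header(password, header12):
    """Second loop: decrypt the 12-byte encryption header."""
    keys = ZipKeys(password)
    out = bytearray()
    for c in header12:
        p = c ^ keys.stream_byte()
        keys.update(p)
        out.append(p)
    return bytes(out)


def encrypt_header(password, plain12):
    """Inverse of decrypt_header; only here so the sketch can be exercised."""
    keys = ZipKeys(password)
    out = bytearray()
    for p in plain12:
        out.append(p ^ keys.stream_byte())
        keys.update(p)
    return bytes(out)
```

The last one or two bytes returned by decrypt_header are what get compared
against the stored check value.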

Then, once that checksum is computed, it is matched against 1 or 2 bytes of
either the CRC32 or the DOSDATE field.  The Unix zips are less secure, and
checksum 2 bytes. PKZip 2.0+ and WinZip only checksum 1 byte.

Thus, for PKZip/WinZip, about 255 out of every 256 candidate passwords are
tossed very quickly. For Unix, only about one in 64K needs to be looked at
further.

I have added the ability to put multiple 'checksum' blocks into a single hash
(up to 8). This relies on the 'assumption' that all files in a single zip
have the same password.  Thus, if there are 4 of them, and it is a
Unix zip, then the checksum is effectively 8 bytes, which is pretty good all
on its own.  There can be up to 8 files in a john input line.
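
The arithmetic behind those figures can be written down as a small Python
sketch. It assumes (this is an assumption, not something measured) that the
check bytes of a wrong password behave as uniformly random and independent
across files; survival_rate is a hypothetical helper name.

```python
from fractions import Fraction


def survival_rate(check_bytes_per_file, num_files):
    """Chance that a random wrong password passes every per-file check."""
    return Fraction(1, 256 ** (check_bytes_per_file * num_files))


print(survival_rate(1, 1))   # one PKZip/WinZip file, 1 check byte
print(survival_rate(2, 1))   # one Unix zip file, 2 check bytes
print(survival_rate(2, 4))   # four Unix zip files, 8 check bytes in total
```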

In the call to crypt, all of the checksums are computed, and if they ALL
match properly (1 or 2 bytes each), crypt stores the expected checksum value
so that the comparison succeeds.  If any of them fail, crypt stores
~checksum instead.  That way, cmp_all works and cmp_one works.
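
A simplified model of that convention, as a Python sketch. The function names
echo crypt, cmp_all and cmp_one, but the signatures here are hypothetical and
are not the real formats interface; the 32-bit width is also an assumption.

```python
def crypt_result(expected, per_file_ok):
    """Store the expected value on success, its complement on any failure."""
    if all(per_file_ok):
        return expected
    return ~expected & 0xFFFFFFFF   # can never equal `expected`


def cmp_all(expected, results):
    """True if any candidate in the batch produced the expected value."""
    return any(r == expected for r in results)


def cmp_one(expected, results, index):
    """True if one specific candidate produced the expected value."""
    return results[index] == expected
```

Because the complement differs from the expected value in every bit, a failed
candidate can never compare equal by accident.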

Then in cmp_exact, I perform other tests.  These require doing the same
decryption, but not just on the first 12 bytes (which are the IV), but on
part or all of the file.  

I have built in another assumption, that someone would likely know that a
file is ASCII.  If that is the case, then only a small part of the encrypted
data needs to be there.  I then call inflate on the data provided, and make
sure that it inflated a 'normal' amount of bytes (or more), and that all of
them are ascii.  If not, then cmp_exact returns 0.
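
The ASCII test can be sketched in Python with the standard zlib module. This
is only an illustration under assumptions: the input is taken to be already
decrypted raw deflate data, the 16-byte threshold stands in for the 'normal'
amount, and looks_like_ascii and raw_deflate are hypothetical names.

```python
import zlib


def raw_deflate(data):
    """Helper for trying the sketch: raw deflate stream, as stored in a zip."""
    c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    return c.compress(data) + c.flush()


def looks_like_ascii(partial_deflate, min_bytes=16):
    """Inflate a (possibly truncated) raw deflate prefix and check the output."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    try:
        out = d.decompress(partial_deflate)
    except zlib.error:
        return False                       # not a valid deflate stream
    # Enough bytes came out, and every one of them is 7-bit ASCII.
    return len(out) >= min_bytes and all(b < 0x80 for b in out)
```

A wrong password usually yields data that either fails to inflate or inflates
to non-ASCII bytes, so only a prefix of the file is needed for this filter.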

In the testing I did, the false positives occurred where this ASCII test
succeeds.   So I also have to do a 'full' test, which is the 'binary' test.


For the binary test, I have to place all the data into the hash line
(small file), along with the lengths and CRC32. Then I decrypt the blob with
the found PW, uncompress it, and compute the CRC32.  If the blob uncompresses
to the proper size, and passes the CRC32, then I 'assume' the password is
correct.   I will still have to address handling files that are too large to
jam into a hash line.   In that case, the original .zip file will have to be
present, and the hash line will provide all of the information to quickly
load that data.  The data will be loaded one time only, and from that point
on, be used to decrypt/inflate/crc each time a possible candidate is found.
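
The inflate-and-CRC half of that full test can be sketched in Python as
follows. It assumes the blob has already been decrypted with the candidate
password; full_check and raw_deflate are hypothetical names, not functions
from john.

```python
import zlib


def raw_deflate(data):
    """Helper for trying the sketch: raw deflate stream, as stored in a zip."""
    c = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    return c.compress(data) + c.flush()


def full_check(deflated, expected_size, expected_crc):
    """Inflate the whole blob, then verify its length and CRC32."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    try:
        data = d.decompress(deflated) + d.flush()
    except zlib.error:
        return False                 # garbage: wrong password
    if not d.eof:
        return False                 # stream ended before the final block
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return len(data) == expected_size and crc == expected_crc
```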

This last part of the format is not yet done.  That is why I am getting
false positives.


However, in 'real' runtime, I am getting 1.5MB/s or so.  That is on the 2
byte checksum Unix hashes.  I am sure that the 1 byte checksum hashes will
be much slower.

As a side note, FCrackZip is able to test about 600/s (for challenge 4) if
you change it to ONLY test whether the 2 byte checksums match. By default it
checks whether the 1 byte checksum matches, which slows it down to a crawl.
I was able to test a 1.4GB dictionary file (against challenge 4), in about 8
hours.   With the new john pkzip format (again, NOT doing the full inflating
yet), it took about 2 minutes to run through this same file (130 million
lines).

There should be no appreciable slowdown for 'ASCII' optimized files, even
though they do have to fully decrypt/inflate/crc a file.  That is ONLY done
if the ASCII test says it is a possible password (probably 1 out of a couple
hundred million tests, for the 2 byte checksum).


Well, I hope to have this working at least to an alpha release level shortly.


Jim.

>From: Solar Designer [mailto:solar@...nwall.com]
>
>Jim -
>
>This is a very welcome addition, thank you!
>
>On Wed, Aug 17, 2011 at 02:34:52PM -0500, jfoug wrote:
>> Benchmarking: pkzip [N/A]... DONE
>> Raw:    6594M c/s
>
>How's that speed even possible with the current formats interface and on
>current CPUs?  "dummy" only achieves up to approx. 130M c/s on "--test"
>(which is reported more like 130000K c/s, indeed).
>
>Are you somehow skipping impossible keys, yet counting them?  Just a
>guess.  But even in that case you'd need to hack code outside of the new
>format definition in order for the speedup to be seen on benchmarks.
>So this got me curious.
>
>Thanks again,
>
>Alexander