john-dev - pkzip encr

Message-ID: <00db01cc5d14$bca65c10$35f31430$@net>
Date: Wed, 17 Aug 2011 14:34:52 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: pkzip encr

Note: I wrote this email yesterday, but I got sidetracked and did not send
it.  I am sending it out now, and will post separately about the progress I
have made since I wrote it.

 

Benchmarking: pkzip [N/A]... DONE
Raw:    6594M c/s

 

NOTE: I am far from being done.  PKZIP original encryption has a lot of
things to work through.  First, you only get a 1 or 2 byte checksum after
decrypting the first block.  So, if you have 'enough' files in the same
.zip file, all encrypted with the same password, then that password can be
found by simply finding a word which satisfies all of the checksums.  That
is the 'first' mode I am working on, and it is the one that gave the 'speed'
listed above.
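As a rough illustration of the checksum-only test described above, here is a minimal Python sketch of the traditional PKZIP stream cipher applied to the 12-byte encryption header of each file.  The key update and key stream formulas follow the published PKWARE description; the names ZipCryptoKeys and passes_all_checksums are made up for this example and are not john's actual code.

```python
# Sketch only: traditional PKZIP stream cipher plus a checksum-only test.
# All names here are illustrative; this is not the john implementation.

def _make_crc_table():
    # Standard reflected CRC-32 table (polynomial 0xEDB88320).
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = _make_crc_table()

def _crc32_step(crc, byte):
    # One raw CRC-32 table step, with no pre/post inversion.
    return (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]

class ZipCryptoKeys:
    def __init__(self, password: bytes):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self.update(b)

    def update(self, plain_byte):
        self.k0 = _crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = _crc32_step(self.k2, self.k1 >> 24)

    def stream_byte(self):
        t = (self.k2 | 2) & 0xFFFF
        return ((t * (t ^ 1)) >> 8) & 0xFF

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            p = c ^ self.stream_byte()
            self.update(p)          # keys are always updated with plaintext
            out.append(p)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self.stream_byte())
            self.update(p)
        return bytes(out)

def passes_all_checksums(password: bytes, files) -> bool:
    """files: list of (encrypted 12-byte header, expected check bytes).

    The expected check bytes (1 or 2 of them) are compared against the
    tail of the decrypted header.  A fresh key state is used per file.
    """
    for header, check in files:
        plain = ZipCryptoKeys(password).decrypt(header[:12])
        if plain[12 - len(check):] != check:
            return False
    return True
```

A candidate word is only reported when it satisfies the check bytes of every file on the line, which is why more files give more confidence.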

 

I have plans for how to handle zips where there are not enough files for
the checksums alone to be enough.  Examples would be the .zip files in the
latest contest.  In those files, there was only 1 encrypted file inside.  In
these situations, you have to fully decrypt the file, have it decompress to
the proper size, and then perform a CRC32 on the resultant data and have it
match.  I do have some shortcuts in mind, so we may be able to get this
working much faster.  Programs such as FZC do not perform any tests like
this (they simply output candidates based only on the checksums), and
programs such as FCrackZip spawn unzip with -P and the password to 'test'
the file.

 

I have 4 'types' of cracks planned.

 

Type 1 is simply checksum only.  The test data will mostly perform this way.
In this mode, if all the checksums match, then the word is 'cracked'.  This
can lead to false positives.

 

Type 2 will take a file that is assumed to be ASCII.  It will grab a small
part of the file (say, the first 40 bytes of compressed data).  Then, when a
'possible' key is found, we decrypt it and uncompress the available data.
We then make sure that all of the decompressed data is between 0x20 and 0x7F
(or is \r, \n, \t, etc).  If we find a password that has the proper checksum
and decrypts the file to valid ASCII, we assume we have found the password.
Grabbing only a small part of the file will make it easy to make a 'short
enough' input hash line.
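A minimal Python sketch of the type 2 text test, assuming the member is stored with raw deflate (zlib with wbits=-15) and that the prefix has already been decrypted.  The function names are hypothetical, and the byte range is taken literally from the description above.

```python
import zlib

def looks_like_text(data: bytes) -> bool:
    # Accept 0x20..0x7F plus tab, CR and LF, as described above.
    return all(0x20 <= b <= 0x7F or b in b"\t\r\n" for b in data)

def inflate_prefix(compressed_prefix: bytes) -> bytes:
    # A decompressobj returns whatever output the truncated input yields,
    # instead of failing because the stream is incomplete.
    d = zlib.decompressobj(wbits=-15)
    return d.decompress(compressed_prefix)

def type2_accepts(decrypted_prefix: bytes) -> bool:
    try:
        text = inflate_prefix(decrypted_prefix)
    except zlib.error:
        return False            # not even a valid deflate stream
    return looks_like_text(text)
```

Note that a very short prefix may inflate to little or no output (for example when a dynamic Huffman header takes up most of it), in which case this test says very little; how many bytes to keep is a tuning decision.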

 

Type 3 is 'binary_data'.  In this mode, we do not have any knowledge that
the file is all ASCII (such as a set of object files, a set of .zip files,
a set of Word documents, etc).  In this mode, we do have a file small enough
to put on the LINE_BUFFER_SIZE line (along with any other hashes).  We will
also need to list the compressed size, the decompressed size, and the CRC32.
In this mode, we still perform the checksum tests, and then when we have a
hit, we fully decrypt the buffer, fully explode it, and if all was correct,
we crc32 the resultant string.  If everything checks out, we have found the
password.
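The final verification step for type 3 could look roughly like the following Python sketch.  It assumes the data was compressed with deflate (the common method; other methods would need their own decompressor) and that the buffer passed in is already decrypted with the 12-byte encryption header removed.  The name type3_accepts is hypothetical.

```python
import zlib

def type3_accepts(decrypted: bytes, expected_size: int,
                  expected_crc: int) -> bool:
    d = zlib.decompressobj(wbits=-15)      # raw deflate, no zlib header
    try:
        data = d.decompress(decrypted) + d.flush()
    except zlib.error:
        return False                        # garbage: wrong password
    if not d.eof:
        return False                        # stream never reached its end
    if len(data) != expected_size:
        return False                        # wrong decompressed size
    return (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc
```

Most wrong passwords should fail early, either in the checksum test or with a deflate error, so the full CRC32 would only be computed for the few candidates that get this far.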

 

The final type (type 4) will be .zip files which have data too large to put
into a LINE_BUFFER_SIZE line.  In this mode, we store the name of the .zip
file.  Then john will pull the required data out of the .zip file at run
time (and will refuse to run if the .zip is not there).  However, once this
data is pulled out of the .zip file (at load time), this really becomes
similar to type 3.
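To illustrate the load-time step for type 4, here is a hedged Python sketch that pulls the raw (still compressed, and encrypted if applicable) bytes of each member out of a .zip by name, along with the sizes and CRC32 that a type 3 style check would need.  It uses the standard zipfile module to read the central directory; load_raw_members and the dictionary keys are invented for this example and say nothing about how john will actually store the data.

```python
import struct
import zipfile

# 30-byte local file header: signature, version, flags, method, time, date,
# crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")

def load_raw_members(zip_name: str) -> dict:
    # Opening a missing file raises FileNotFoundError, i.e. we refuse to run.
    with zipfile.ZipFile(zip_name) as zf:
        infos = zf.infolist()
    members = {}
    with open(zip_name, "rb") as fp:
        for info in infos:
            fp.seek(info.header_offset)
            fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
            if fields[0] != b"PK\x03\x04":
                raise ValueError("bad local header for " + info.filename)
            name_len, extra_len = fields[9], fields[10]
            fp.seek(name_len + extra_len, 1)   # skip file name and extra
            members[info.filename] = {
                "raw": fp.read(info.compress_size),
                "compress_size": info.compress_size,
                "file_size": info.file_size,
                "crc32": info.CRC,
                "flags": info.flag_bits,
                "encrypted": bool(info.flag_bits & 0x1),
            }
    return members
```

The sizes and CRC32 are taken from the central directory rather than the local header, since the local header fields may be zero when (headerflags & 8) is set.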

 

 

I am close to having type 1 working.  It will be set up so that up to 8
files (hashes and checksums) can be loaded in a single line.  This way,
once a line is built from enough files, you can be pretty sure a cracked
password is correct.

 

Some things I have found:

 

1. pkzip > 2.0 will only checksum 1 byte.  This is from the CRC32.

2. Winzip (in pkware encryption) also only uses a 1 byte checksum, from the
CRC32.

3. infozip (unix) checksums 2 bytes.  These are from the timestamp of the
file.  Also, it appears that (headerflags & 8) is set by infozip.  There is
also other information in the 'extra data' section of the main header, which
is set by infozip.
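The findings above could be turned into a helper that derives the expected check bytes for one file.  This Python sketch assumes that when (headerflags & 8) is set the check comes from the 16-bit DOS modification time and otherwise from the top 16 bits of the CRC32, and that the last header byte is the high byte (low byte first when 2 bytes are checked).  The function name and parameters are hypothetical.

```python
def expected_check_bytes(flags: int, crc32: int, dos_time: int,
                         two_bytes: bool) -> bytes:
    if flags & 0x8:
        source = dos_time & 0xFFFF          # 16-bit DOS modification time
    else:
        source = (crc32 >> 16) & 0xFFFF     # high-order word of the CRC32
    lo, hi = source & 0xFF, source >> 8
    # The final header byte is the high byte; with a 2 byte check the
    # low byte sits just before it.
    return bytes([lo, hi]) if two_bytes else bytes([hi])
```

Whether a given archive was written with a 1 or 2 byte check is not recorded in the file, so two_bytes has to be guessed from which program produced it.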

 

The 2 byte checksums are MUCH weaker against cracking, since far fewer
wrong passwords get past them (i.e. john will be able to run much faster,
or closer to 'type 1' cracking).
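A back-of-the-envelope estimate of why check size and file count matter, assuming each check byte of a wrong password matches independently with probability 1/256.  That is an idealized assumption, and the function names are illustrative.

```python
def false_positive_rate(check_bytes_per_file: int, n_files: int) -> float:
    # Chance that one wrong password satisfies every check byte.
    return 256.0 ** -(check_bytes_per_file * n_files)

def survivors_per_second(cands_per_sec: float, check_bytes_per_file: int,
                         n_files: int) -> float:
    # Wrong candidates per second that would still need a full
    # decrypt/decompress/CRC32 test (or be reported as false positives).
    return cands_per_sec * false_positive_rate(check_bytes_per_file, n_files)
```

For example, if the 6594M c/s figure above were candidates per second against a single file, a 1 byte check would let roughly 25.8 million wrong candidates through per second, while a 2 byte check would let through roughly 100 thousand.  With 8 files on a line, even 1 byte checks leave only a 256^-8 chance per candidate.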

 

 

Jim.

 

