Date: Tue, 20 Mar 2012 22:20:23 +0100
From: magnum <>
Subject: Re: RAR format tweaks (was: OpenSSL and AES-NI)

On 03/20/2012 12:31 PM, magnum wrote:
> On 03/19/2012 10:20 PM, Milen Rangelov wrote:
>> Perhaps more can be achieved by tweaking the RAR decompression routine
>> in the libclamav code. I am not that happy with the result though, I put so
>> much hope on AES-NI...
> I was planning on concentrating on the very first data block, trying to
> detect invalid dictionaries. And just step through the unpack code and
> see if there are some code paths that are more rigid than we want them
> to be. I'd like to think there are huge gains possible but I'm not sure
> we'll find them that easily.

I did some tests and research today that showed that the unrar code
mostly does what we want it to. This is a good thing, except it means
the gains I hoped for will probably not be there.

Actually, it bails out as-is after looking at just 15-20% of the data on
average. It could be a lot worse.

> Oh btw, I think I know one thing already, but haven't tested yet. The
> very first bit of the decrypted data tells you if it's LZSS or PPMII.
> But I think I saw somewhere in the code that if it was supposed to be
> PPMII but that engine detects an error, it tries to fall back to LZSS
> instead of aborting. This kind of behaviour is precisely what we're
> looking for!

This too was a red herring (it does abort), as was the item below.
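
For reference, the block-type check mentioned in the quote can be
sketched roughly as below. This is only an illustration, not code from
unrar or libclamav: the function name is made up, and it assumes bits
are read MSB-first (so the "first bit" is the top bit of the first
decrypted byte) and that a set bit means PPMII while a clear bit means
LZSS.

```python
def first_block_is_ppm(decrypted: bytes) -> bool:
    """Return True if the first bit of the decrypted data is set.

    Assumptions (not taken from the original post): bits are consumed
    MSB-first, a set first bit selects PPMII and a clear one LZSS.
    """
    if not decrypted:
        raise ValueError("need at least one decrypted byte")
    # 0x80 masks the most significant bit of the first byte.
    return (decrypted[0] & 0x80) != 0
```

On its own this check rejects nothing, since either value is a valid
block type; it only selects which decoder gets to look for errors next.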

> Also, I think I saw a suspicious function name in Valgrind that I'll get
> back to. It was like "restart" something. That too just might be some
> kind of rigidness we don't want.

So, back to square one. Meanwhile I'm trying to figure out how best to
deal with -p mode: My current code calculates the AES key and IV on the
GPU, and does the rest on the CPU (multi-threaded). I'm not sure how to
auto-scale OMP vs. GPU for the best balance. There's no point in
calculating 5000 keys in one second if it takes 9 seconds to verify them
afterwards. You might have one CPU core, or 96 of them.
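
The arithmetic behind that concern can be sketched as follows. This is
a back-of-the-envelope model, not the actual John the Ripper code: the
function names are hypothetical, and it assumes CPU verification scales
linearly with the number of cores, which real workloads rarely do
exactly.

```python
import math


def effective_rate(batch, gpu_seconds, cpu_seconds, overlapped=False):
    """Candidates per second for one batch of keys.

    If the GPU and CPU stages run back to back, the wall time is the
    sum of both; if they overlap (the GPU derives the next batch while
    the CPU verifies the current one), the slower stage sets the pace.
    """
    if overlapped:
        wall = max(gpu_seconds, cpu_seconds)
    else:
        wall = gpu_seconds + cpu_seconds
    return batch / wall


def cores_to_keep_up(gpu_seconds, cpu_seconds, current_cores):
    """Cores needed for CPU verification to finish within the GPU time.

    Assumes linear scaling with core count (an idealisation).
    """
    return math.ceil(current_cores * cpu_seconds / gpu_seconds)


if __name__ == "__main__":
    # Figures from the example above: 5000 keys, 1 s on GPU, 9 s on CPU.
    print(effective_rate(5000, 1.0, 9.0))
    print(effective_rate(5000, 1.0, 9.0, overlapped=True))
    print(cores_to_keep_up(1.0, 9.0, current_cores=1))
```

With those figures the back-to-back rate is 500 candidates per second,
a tenth of what the GPU alone delivers, which is why the balance depends
so heavily on how many cores are available.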

For -hp mode this problem could be mitigated by decrypting on the GPU,
but I don't expect it to be faster, just "easier". Actually I think -hp
mode will do just fine with the current code (I haven't tested on GPU yet).

Finally, the more I look at the unrar code, the smaller it gets, and I'm
starting to think I could migrate all of it to OpenCL. Maybe not a
one-beer job though.

