Date: Tue, 15 May 2012 10:05:25 +0300
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: RAR early reject (was: New RAR OpenCL kernel)

I am reaching the same conclusion. I was able to add another check for the
PPM part: the order should never be more than 63 (as this is the archiver's
limit), and I think that one is safe.
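
Roughly, the check amounts to this (a sketch only: the function and
variable names are illustrative, and only the limit of 63 comes from the
archiver):

    /* Early reject on the PPM model order parsed from the compressed
     * stream. The RAR archiver never writes an order above 63, so a
     * larger value can only come from a wrongly decrypted stream. */
    static int ppm_order_is_sane(unsigned int max_order)
    {
            return max_order <= 63;
    }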

I believe a more clever LZ test could be devised, but I am too lazy to dig
into the algorithm.

As for the memory leaks, I fixed a lot of them while migrating the code for
my own uses. Unfortunately, some bytes are still leaked and I can't identify
the root cause. This code does a lot of mallocs/reallocs for small blocks,
which is rather annoying (and slow).
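
For tracking the remaining leaks down, plain valgrind is probably the
easiest route (standard usage, nothing JtR-specific; substitute your own
options for the placeholder):

    valgrind --leak-check=full ./john <options>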

On Tue, May 15, 2012 at 3:08 AM, magnum <john.magnum@...hmail.com> wrote:

> Milen,
>
> I have now tested on a couple of thousand more files. Unfortunately this
> test is pointless: we're trading false negatives for performance.
>
> I have seen up to 61 identical values in a row from valid streams (jpeg
> files). In the same test set, the worst run for an *invalid* stream was a
> third of that.
>
> Still, you made me notice that "if (n >= decode->MaxNum)" check (which
> *is* a safe early reject, though not as effective as I was hoping), as
> well as lots of memory leaks I had lurking :)
>
> magnum
>
>
> On 05/09/2012 02:13 AM, magnum wrote:
> > OK, then I might have equivalent code now. I made 326 test rar files
> > and used a threshold of 3. All but 9 were cracked. Raising the threshold
> > to 4 got 4 more; raising it to 5 got all but one. That last one needed a
> > threshold of 10. All of these were pretty small files (*.[ch] from
> > john's src dir; the worst file was md5_eq.c rar'ed with -ptest).
> >
> > magnum
> >
> >
> > On 05/08/2012 09:26 AM, Milen Rangelov wrote:
> >> I've never seen more than 3 of them in a row for a correct stream, yet
> >> having 2 or 3 in a row is quite common. I am doing all my checks in
> >> unpack29 though.
> >>
> >> On Mon, May 7, 2012 at 10:00 PM, magnum <john.magnum@...hmail.com> wrote:
> >>
> >>> On 05/07/2012 09:27 AM, Milen Rangelov wrote:
> >>>> I think we can narrow the case further (that's what I did; I'm still
> >>>> not sure it's completely safe).
> >>>>
> >>>> To understand why a bad stream returns a long series of the same
> >>>> value, you need to have a look at the end of the rar_decode_number()
> >>>> function. You can do one more, stricter check: if rar_decode_number()
> >>>> repeatedly returns decode->DecodeNum[0], then it's "likely" a bad
> >>>> stream. Still, I think it might happen for some valid stream too. But
> >>>> I guess there must be some threshold that can be considered
> >>>> relatively safe.
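
Roughly, such a repeated-number check could look like this (a sketch only:
unpack_data_t, struct Decode and rar_decode_number() are from the code
under discussion, while the wrapper, the two counter fields and
REPEAT_THRESHOLD are made up; note magnum's observation above that valid
streams have shown runs of up to 61, so any low threshold trades false
negatives for speed):

    #define REPEAT_THRESHOLD 64

    static int decode_number_checked(unpack_data_t *unpack_data,
                                     struct Decode *decode)
    {
            int num = rar_decode_number(unpack_data, decode);

            if (num == unpack_data->last_num) {
                    /* a long run of identical symbols is, statistically,
                     * a sign of a wrongly decrypted stream */
                    if (++unpack_data->repeat_count >= REPEAT_THRESHOLD)
                            return -1;      /* bail out early */
            } else {
                    unpack_data->last_num = num;
                    unpack_data->repeat_count = 0;
            }
            return num;
    }
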
> >>>
> >>> I totally missed that one. Even if decode->DecodeNum[0] is valid
> >>> several times, I would *guess* this code is safe:
> >>>
> >>> @@ -387,7 +387,7 @@ int rar_decode_number(unpack_data_t *unpack_data, struct Decode *decode)
> >>>         rar_addbits(unpack_data, bits);
> >>>
> >>>         n=decode->DecodePos[bits]+((bit_field-decode->DecodeLen[bits-1])>>(16-bits));
> >>>         if (n >= decode->MaxNum) {
> >>> -               n=0;
> >>> +               return -1;
> >>>         }
> >>>         rar_dbgmsg("rar_decode_number return(%d)\n", decode->DecodeNum[n]);
> >>>
> >>> ...and add code in read_tables() and rar_unpack29() so this -1
> >>> propagates to an immediate bailout. Seems to work fine, but this
> >>> alone doesn't help performance as much as the threshold for repeated
> >>> numbers. BTW, forget what I wrote about seeing 26 of them; I did not
> >>> reset the counter properly. Besides, it depends on how/where you
> >>> count them. I had the check in read_tables(), but it should be inside
> >>> rar_decode_number() itself (again returning -1 for bailout) because
> >>> there are more callers.
> >>>
> >>> magnum
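
Each call site then needs something along these lines (a sketch; "decode"
stands for whichever decode table the caller passes in, and the enclosing
function's return convention may differ):

    int num = rar_decode_number(unpack_data, decode);
    if (num < 0)
            return 0;   /* bad stream: reject this candidate right away */
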
> >>>
> >>>
> >>>> I think with ZIP we have a somewhat similar situation - it is
> >>>> possible (though very unlikely) that for a small file the verifier
> >>>> matches, the file is inflated correctly and the checksum matches,
> >>>> yet the password candidate is wrong.
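
(Back-of-the-envelope, assuming the single password-verification byte of
traditional PKZIP encryption plus the CRC-32 over the inflated data: a
wrong key survives the verify byte with probability about 2^-8 and a CRC
match with about 2^-32, i.e. roughly 2^-40 overall before even counting
inflate consistency checks - possible, but very unlikely indeed.)
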
> >>>>
> >>>>
> >>>>
> >>>> On Mon, May 7, 2012 at 2:59 AM, magnum <john.magnum@...hmail.com> wrote:
> >>>>
> >>>>> On 05/05/2012 02:09 AM, Milen Rangelov wrote:
> >>>>>> Well, there are two cases here, a PPM block or an LZ block. The
> >>>>>> PPM case would very quickly give you an error if the stream is
> >>>>>> invalid, and unpack29 would switch back to the pure LZSS case.
> >>>>>
> >>>>> Do you mean you modified something in unpack29 for this case? When
> >>>>> I debugged it, I got the impression it already rejects properly in
> >>>>> this case.
> >>>>>
> >>>>>> Then, for Lempel-Ziv, it's highly unlikely that you'd get the same
> >>>>>> result from rar_decode_number() several times in a row unless the
> >>>>>> stream is bad. The key point here being "several", and that's where
> >>>>>> my concerns are.
> >>>>>
> >>>>> Nice. I tested a little and I've seen the same result at most 26
> >>>>> times in a row for a valid stream, while bad streams can have runs
> >>>>> of several hundred. The question is what threshold would be safe...
> >>>>>
> >>>>> I think I'll have a look at Jim's pkzip code again; maybe something
> >>>>> in there is usable for the LZ case.
> >>>>>
> >>>>> magnum
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Sat, May 5, 2012 at 2:45 AM, magnum <john.magnum@...hmail.com> wrote:
> >>>>>>
> >>>>>>> On 05/04/2012 08:19 PM, Milen Rangelov wrote:
> >>>>>>>> OK, my tests are all successful so far. Speed is about the same,
> >>>>>>>> no matter whether the compressed file is a 4KB text file or a
> >>>>>>>> 50MB avi movie :) Unfortunately there is still overhead compared
> >>>>>>>> to the HE case, speed being some 20% slower. I am still not 100%
> >>>>>>>> sure whether this would hold true in all cases, and if I am
> >>>>>>>> wrong, it might happen that for some archives we get false
> >>>>>>>> negatives.
> >>>>>>>>
> >>>>>>>> I am not yet 100% convinced I am correct, though. The approach is
> >>>>>>>> based on two assumptions: the first one can easily be proved; the
> >>>>>>>> second is related to the LZ algorithm, and I cannot yet prove
> >>>>>>>> that what I check is safe and can't occur in a valid LZ stream.
> >>>>>>>
> >>>>>>> This is good news, Milen, just what I hoped for. I am SURE there
> >>>>>>> are such short-cuts, because cRARk runs at full speed no matter
> >>>>>>> what you throw at it. Please do share your findings; you know I
> >>>>>>> would.
> >>>>>>>
> >>>>>>> magnum