oss-security - Re: Randomness Attacks Against PHP Applications

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABY-coMnp9p939d-2QbDxJYU1i-TvU5ibJj=cepVnd_ooMweWw@mail.gmail.com>
Date: Thu, 27 Sep 2012 21:00:31 -0400
From: George Argyros <argyros.george@...il.com>
To: Vladimir Vorontsov <vladimir.vorontsov@...ec.ru>
Cc: Solar Designer <solar@...nwall.com>, oss-security@...ts.openwall.com, 
	Aggelos Kiayias <aggelos@...yias.com>, gifts <gifts.antichat@...il.com>, 
	Anthony Ferrara <ircmaxell@...il.com>, Pierre Joye <pierre.php@...il.com>
Subject: Re: Randomness Attacks Against PHP Applications

Hi Vladimir,

On Sun, Sep 23, 2012 at 5:22 AM, Vladimir Vorontsov
<vladimir.vorontsov@...ec.ru> wrote:
> Hi all,
>
> George,
>
> First, thx for your work again.
> I have a question to
> http://crypto.di.uoa.gr/CRYPTO.SEC/Randomness_Attacks_files/paper.pdf.
> Why you are using Request Tweens expect of Triple Requests?
>
The problem is that the three requests that you send to the server
might not get processed sequentially. For example, the first request
might be processed second and the second first. In this case you will
search in the wrong time interval.  But even if you know that your
requests will be processed sequentially why not start searching all
the time values beginning from the timestamp of the first request? It
looks like the third request is a bit redundant in this case, although
it might allow you to find out that the target server is not
vulnerable without making too many requests.


> Please, pay your attention to slide 13 from:
> http://www.slideshare.net/d0znpp/dcg7812-cryptographyinwebapps-14052863
>
> Your Adversarial Time Synchronization (ATS) methodical is great.
> But it is not applicable to real web-projects whose architecture
> includes an application server and front-end.
>
> During the real audits on Web applications we are using many tricks to
> know application server timestamp and pid:
>
> 1. phpinfo() output with mod_uniqueid (very common). To decode uniqueid
> string use web service https://dev.onsec.ru/rands/
>
> 2. /server-status output discovered by at this post:
> http://habrahabr.ru/company/pt/blog/149746/
>
> 3. others application outputs
>
We focused on the more generic things we could come up with in order
to rely on the specific application and server setup as little as
possible. One thing that would be interesting is to find a way to
discover the pid or at least distinguish between different processes
as generically as possible. This would allow to attack suhosin
protected 5.4 installations, for which we currently need to use
application specific techniques like the ones you mentioned above
(which of course are important attack vectors too).

> 23.09.12, 9:14, Solar Designer пишет:
>> George,
>>
>> I watched a video of your USENIX Security talk the other day (after I
>> was done with my experiments with mt_rand() seed cracking, though):
>>
>> http://www.youtube.com/watch?v=yE0qTTi-_iQ
>>
>> Thanks.  Shortly before that, I released a faster version of my PHP
>> mt_rand() seed cracker, now down to 1 minute worst case on CPU:
>>
>> http://www.openwall.com/lists/john-users/2012/09/20/2
>> http://download.openwall.net/pub/projects/php_mt_seed/
>>
>> $ time ./php_mt_seed 1328851649
>> Found 0, trying 637534208 - 671088639, speed 63310249 seeds per second
>> seed = 658126103
>> Found 1, trying 1207959552 - 1241513983, speed 63343447 seeds per second
>> seed = 1234567890
>> Found 2, trying 4261412864 - 4294967295, speed 63347894 seeds per second
>> Found 2
>>
>> real    1m7.798s
>> user    9m1.942s
>> sys     0m0.008s
>>
>> In one minute of real time (on an FX-8120 CPU, using vectorized fused
>> multiply-add instructions), it found the original seed, another seed
>> that also produces the same mt_rand() output, and it searched the rest
>> of the 32-bit seed space (not finding other matches in this case).
>>
>> Gifts implemented the same thing in OpenCL:
>>
>> https://github.com/Gifts/pyphp_rand_ocl
>> https://github.com/Gifts/pyphp_rand_ocl/tree/SpeedyOCL
>>
>> His SpeedyOCL branch achieves similar performance (and even 10% better
>> than mine in his test of both on a Core i5-2500) on CPUs (with Intel's
>> OpenCL SDK) and several times better performance on GPUs.
>>
>> On Thu, Sep 20, 2012 at 02:31:19PM -0400, George Argyros wrote:
>>> FYI, we have also released some code to exploit such vulnerabilities
>>> about a month ago in github
>>> (https://github.com/GeorgeArgyros/Snowflake).  We hope that this
>>> framework will allow the easier development of exploits for such
>>> vulnerabilities.
>>> The main difference with the code mentioned in the previous posts, is
>>> that it may not always be possible to obtain the first output from
>>> mt_rand() after seeding. For example, many applications only leak
>>> truncated mt_rand() outputs in which case we should really compare
>>> bits of different mt_rand outputs to test whether we found the correct
>>> seed.
>>
>> Yes.  Your implementation is far more generic.
>>
>> The way I'd approach this is by limiting the set of possible seeds based
>> on the first mt_rand() output (perhaps with a min-max range, so _many_
>> seeds will be potentially valid - could be millions, yet significantly
>> fewer than the full 4 billion set).  This can be accomplished almost as
>> quickly as cracking the seed based on an exact mt_rand() output (full
>> 31 bits of it, no min-max range) - that is, in one minute on CPU.  Then
>> slower, but more generic code may be used to filter out the impossible
>> seeds based on further mt_rand() outputs, until there's just one seed
>> value left.  The slowness of that second cracker would not matter much
>> because it'd only need to search a much smaller seed space.
>>
>> A limitation, though, is that the very first mt_rand() output after
>> seeding must be among those available, even if in truncated form.  If it
>> is not, then more of the state has to be maintained in the initial
>> cracking pass, thereby making it slightly slower.  When the first output
>> is (at least partially) available, we only need 3 state elements, so
>> they're kept in registers nicely.
>>
>>> For this reason, in our tool the user defines a "hash" function
>>> that expresses any mt_rand dependent output and then we search for a
>>> preimage for this hash function.
>>
>> That's a generic and curious approach, but I think it's slower than what
>> I described above while not being _more_ generic.
>>
>>> We also include a python API for this functionalities as well as a
>>> sample exploit for mediawiki 1.18.1. Some POCs might indeed help...
>>>
>>> We will also release the gaussian solver based tool we developed in
>>> order to recover the internal state of mt_rand from its (truncated)
>>> outputs in the following days.
>>
>> Cool!
>>
>>> I agree too that education is important. This is something that we
>>> came to an agreement with the PHP team (for example that additional
>>> information is needed on the mt_rand manual). However, as pointed out
>>> nothing has changed yet (the conversations between us and the PHP team
>>> took place in March/April).
>>
>> Did PHP 5.4's change of session IDs (vs. 5.3's) occur before or after
>> your conversations with them?
>>
>>> However I think that the specific issue goes beyond education. First
>>> of all, We believe that adding simple exploit mitigations such as
>>> secure seeding in these functions is something that should definitely
>>> happen. Secure seeding is easy to add using randomness from the
>>> operating system and furthermore it will incur a negligible
>>> performance overhead since it would happen only once in the process
>>> lifetime.
>>
>> Agreed.
>>
>>> For a more concrete solution I think that the existence of secure
>>> PRNGs is not enough. Even if mt_rand() was producing secure random
>>> numbers we would still be having a lot of vulnerable applications just
>>> from the way this function is used.
>>
>> Right.
>>
>>> Just as PHP devs use
>>> session_start() to start a new PHP session there is a need for a
>>> generate_token() function that will return a random token and take
>>> care of providing the necessary level of security for this token.
>>
>> Perhaps, and this is in line with Anthony Ferrara's work on a new
>> password hashing API for PHP 5.5 (which will eventually obsolete phpass,
>> I guess).  Simpler interfaces that are easier to use correctly than not.
>>
>> However, a reason why many PHP app devs use mt_rand() and the like is
>> because they want to generate tokens of a certain format, such as to
>> meet a standard.  One such use is crypt() salts, which we're helping
>> eliminate with phpass and then with Anthony Ferrara's new API.  But I
>> guess other uses will remain.  So perhaps that generate_token() function
>> would need to accept arguments and return tokens of different target
>> formats accordingly.  Maybe it should have a format argument similar to
>> that of pack().  Its advantage over separate calls to a PRNG and then to
>> pack() on the PRNG's output would be that it'd keep the distribution
>> uniform.
>>
>> It is curious how it is often overlooked that simple modulo division may
>> turn a uniform distribution into non-uniform.  Here's an illustration
>> (in case this is news to someone on oss-security):
>>
>> $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand(); $n += ($x < 0x40000000); } echo "$n\n";'
>> 499567
>> $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand(); $n += ($x < 0x40000000); } echo "$n\n";'
>> 500558
>>
>> $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand() % 500000000; $n += ($x < 250000000); } echo "$n\n";'
>> 535117
>> $ php -r 'for ($n = $i = 0; $i < 1000000; $i++) { $x = mt_rand() % 500000000; $n += ($x < 250000000); } echo "$n\n";'
>> 534124
>>
>> First two runs show uniform distribution of mt_rand() outputs themselves
>> (assuming the 0 to 0x7fffffff range).  The other two runs show
>> non-uniform distribution over the 0 to 500 million minus 1 range,
>> after the modulo division by 500 million.  (Values lower than half the
>> maximum suddenly became more common than higher values.  It was 50/50
>> before the modulo division, but became 53.5/46.5 afterwards.  The reason
>> is obvious: the original space is not divisible into a whole number of
>> 500 million sub-spaces.)
>>
>> This is why having separate functions that return a random number and
>> those that process it to the desired format might not be good enough.
>> A single function that does both jobs at once (avoiding the above
>> problem by using a smarter approach) could work better.
>>
>> Alexander
>>
>>
>
>
Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.