Date: Mon, 18 Sep 2023 13:37:20 -0700
From: Steve Thompson <susurrus.of.qualia@...il.com>
To: Steve Thompson <susurrus.of.qualia@...il.com>, oss-security@...ts.openwall.com
Subject: Possible AMD Zen2 CVE

I've been beating my head against a wall for a while on this.  I'm not a
security researcher, or even currently employed in the industry, so my
ability to analyze the problem I've seemingly discovered here is somewhat
limited.

I have a laptop with an AMD Ryzen 5700U.  I've been fooling around with
spinlocks for a while and for various reasons.  Back in late March I
basically finished a R/W ticket spinlock that I instrumented for testing
purposes.  A short test program was written and I found I was getting
deadlocks and other odd symptoms.  I was unsure of the implementation of
the algorithm and so I looked and looked at the code until my eyes started
bleeding.  The errors were occurring within a few thousand iterations with
moderate parallelism.

I eventually wrote several alternate implementations of naive spinlocks,
ticket spinlocks, and MCS spinlocks.   Many of them were problematic.   I
eventually developed a much simplified test program implementing a very
basic ticket spinlock that can be made to fail with a trivial code change
that should not affect the operation of the algorithm.

The code is included as an attachment; it is relatively short at ~300 LOC,
and most of those lines are boilerplate or initialization code.  The
business end is the wr_thread() function, which is the start routine passed
to pthread_create().  In a loop, the following code is found:

      nr_spin = t1lock_acquire(&obj.lock);
#if defined BROKEN
      temp = ++obj.value;
#else
      ++obj.value;
#endif
      t1lock_release(&obj.lock);

If "BROKEN" is defined, you can see that an additional cache-line write is
made with the assignment to 'temp'.  When this code path is enabled, the
underlying cmpxchg operation in t1lock_acquire() occasionally succeeds when
it shouldn't, with a probability on the order of 1 in 5*10^6 when the CPU
frequency is allowed to climb to 4.3 GHz or thereabouts.  I should
investigate, but have not yet, whether using an attached 4K, 60 Hz monitor
notably affects this problem.
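
For readers without the attachment, here is a minimal sketch of what a
ticket spinlock built on compare-exchange can look like.  It is not the code
from bug_src.c: the struct layout, the field names 'next' and 'owner', and
the memory orderings are assumptions made for illustration.  Only the names
t1lock_acquire() and t1lock_release() and the returned spin count come from
the post.  On x86-64, compilers normally lower the C11 compare-exchange
below to a LOCK CMPXCHG instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout; the lock in the attached bug_src.c may differ. */
typedef struct {
    _Atomic uint32_t next;   /* next ticket to hand out */
    _Atomic uint32_t owner;  /* ticket currently being served */
} t1lock_t;

/* Take a ticket with a compare-exchange loop, then spin until that
 * ticket is served.  Returns the number of spin iterations, mirroring
 * the nr_spin value used in the post. */
static unsigned long t1lock_acquire(t1lock_t *l)
{
    unsigned long nr_spin = 0;
    uint32_t ticket = atomic_load_explicit(&l->next, memory_order_relaxed);

    /* On failure, 'ticket' is refreshed with the current value of
     * l->next, so the next attempt uses an up-to-date ticket. */
    while (!atomic_compare_exchange_weak_explicit(&l->next, &ticket,
                                                  ticket + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        ;

    while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
        ++nr_spin;

    return nr_spin;
}

/* Pass the lock to the next ticket.  Only the current holder writes
 * 'owner', so a load followed by a release store is sufficient. */
static void t1lock_release(t1lock_t *l)
{
    uint32_t cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```

With a lock of this shape, the statement between acquire and release
(either '++obj.value;' or 'temp = ++obj.value;') should run in at most one
thread at a time, whichever variant is compiled.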

The test program is essentially a bank-account simulator that adds $.01 to
'obj.value' each iteration for N threads.  If the cmpxchg operation in
t1lock_acquire() functions correctly, the final "balance" in obj.value will
be the number of threads multiplied by the number of iterations each thread
performs.  When the "-DBROKEN" codepath is enabled, the final result may be
less than expected, indicating data loss from colliding threads.  Very
occasionally, a deadlock of all threads is observed.
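
The check described above can be sketched as follows.  This is a reduced
stand-in for the attached program, not a copy of it: run_test() and the
global names are hypothetical, and a plain test-and-set lock replaces the
ticket lock to keep the example short.  Only wr_thread() and the "threads
times iterations" invariant are taken from the post.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the attachment's lock and shared object. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static uint64_t value;     /* the "balance"; only touched under 'lock' */
static uint64_t nr_iters;  /* iterations per thread */

static void *wr_thread(void *arg)
{
    (void)arg;
    for (uint64_t i = 0; i < nr_iters; ++i) {
        /* Test-and-set spinlock, swapped in for the ticket lock. */
        while (atomic_flag_test_and_set_explicit(&lock,
                                                 memory_order_acquire))
            ;
        ++value;  /* deposit one cent */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

/* Runs nr_threads writers and returns the number of lost updates.
 * Zero means every increment survived, i.e. mutual exclusion held. */
static uint64_t run_test(unsigned nr_threads, uint64_t iters)
{
    pthread_t *tid = malloc(nr_threads * sizeof *tid);
    assert(tid != NULL);

    value = 0;
    nr_iters = iters;

    for (unsigned t = 0; t < nr_threads; ++t) {
        int rc = pthread_create(&tid[t], NULL, wr_thread, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (unsigned t = 0; t < nr_threads; ++t)
        pthread_join(tid[t], NULL);

    free(tid);
    return (uint64_t)nr_threads * iters - value;
}
```

A nonzero return from run_test(4, 10000000) would correspond to the "final
result may be less than expected" symptom; a hang in pthread_join() would
correspond to the deadlock.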

As the probability of this error occurring is relatively low in a test
program that really hammers on a single shared resource, I would expect
this bug to manifest relatively rarely under typical usage patterns for
code that is found to be vulnerable.  However, different test programs,
such as one using the previously mentioned R/W ticket lock, show much
higher error rates.  In that case, the lock structure is five fields packed
into a 32- or 64-bit word.  One bit is used for mutual exclusion between
threads and the
other fields track queue depth and/or the number of instantaneous active
read-only threads.  It appears that the act of using a cmpxchg operation
followed by non-atomic field updates and a release operation on a single
machine word vastly increases the probability of an error occurring in
comparison to the included test code.
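
To make the single-word arrangement concrete, here is a sketch of a packed
lock word.  The layout is hypothetical: the post does not give the field
widths and names only some of the five fields, so the masks, shifts and
function names below are assumptions.  The sketch takes the mutex bit with
compare-exchange and then, unlike the non-atomic update described above,
also performs the field update and the release with compare-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout; the real R/W lock's field widths are not given. */
#define RW_LOCKED      0x00000001u  /* bit 0: mutual-exclusion bit */
#define RW_QUEUE_SHIFT 1
#define RW_QUEUE_MASK  0x0000fffeu  /* bits 1-15: queue depth */
#define RW_READ_SHIFT  16
#define RW_READ_MASK   0xffff0000u  /* bits 16-31: active readers */

static uint32_t rw_queue_depth(uint32_t w)
{
    return (w & RW_QUEUE_MASK) >> RW_QUEUE_SHIFT;
}

static uint32_t rw_readers(uint32_t w)
{
    return (w & RW_READ_MASK) >> RW_READ_SHIFT;
}

/* Try to set the mutex bit.  Returns 1 on success, 0 if the bit was
 * already set by someone else. */
static int rw_try_lock_bit(_Atomic uint32_t *word)
{
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);

    while (!(old & RW_LOCKED)) {
        if (atomic_compare_exchange_weak_explicit(word, &old,
                                                  old | RW_LOCKED,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return 1;
        /* 'old' now holds the current word; re-check the bit. */
    }
    return 0;
}

/* Add one reader and clear the mutex bit in a single atomic step.
 * The retry loop preserves any change another thread made to the
 * other fields in the meantime. */
static void rw_add_reader_and_unlock(_Atomic uint32_t *word)
{
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t next;

    do {
        next = (old + (1u << RW_READ_SHIFT)) & ~RW_LOCKED;
    } while (!atomic_compare_exchange_weak_explicit(word, &old, next,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

A plain (non-atomic) read-modify-write of such a word is only safe if no
other thread can change any field while the mutex bit is set.  If waiters
can adjust the queue depth concurrently, a plain store can overwrite their
update, which is one thing worth ruling out when characterizing the
failures.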

I have not yet found the underlying microarchitectural features responsible
for the manifestation of this apparent CPU bug, which implies that
individual spinlock algorithms must be tested in-situ to identify code
arrangements that trigger the bug. It is my impression thus far that most
spinlock implementations do not do this testing, which suggests that the
number of spinlocks in the wild that are vulnerable to this bug is
currently unknown.  This bug might be exploitable to cause scheduler
malfunctions, database corruption, etc. in a deterministic fashion,
although I have yet to generate an exploit to this end -- that is beyond my
expertise at this stage.

Currently, I lack access to a lab where this can be tested on other CPUs,
Intel or otherwise, to determine the scope of affected processors.  (I have,
however, detected the problem on a Core 2 Duo Macbook Pro from the Jurassic
period, which is interesting.)

The bug has not been verified yet.  I have been dealing with HP as the
laptop is under warranty, but in approximately two months they have been
unable to find a technician able to understand the source code or who is
able to interpret the results.  It is still possible I have made some sort
of stupid error, but at this point I am reasonably confident I am using
atomic operations correctly as per the x86-64 architecture specification
documents.

I've posted this here to acquire feedback, and I would greatly appreciate
advice on how to better characterize what is going on here, etc.   Calling
the test program with four threads and 10^7 for the number of loop
iterations will usually trigger the bug.

View attachment "bug_src.c" of type "text/x-csrc" (8213 bytes)
