Date: Tue, 18 Nov 2014 14:15:45 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: [PATCH] ARM atomics overhaul, try 2

Here's a new version of the ARM atomics overhaul patch which I'm much
happier with. Whereas the old version imposed a heavy address
computation in the caller at each point where an atomic was used, the
new version achieves a light computed jump inside the callee, using an
idiom of the form:

	ldr ip,1f
	ldr ip,[pc,ip]
	add pc,pc,ip
1:	.word relativeptr-1b

When relativeptr contains zero, as at program startup, the code
continues with the instruction after the .word directive (a dummy
version that's safe to use before initialization). Later, relativeptr
is filled with the difference between the address of the desired
version and the address of this dummy code.

As before, v7+ is the most highly optimized, with special versions of
the various atomics using ldrex/strex directly to avoid a nested cas
loop.

For atomics, compile-time v6 builds are not significantly better than
baseline (v4t) builds, although the thread-pointer load is optimized
with a hard-coded instruction. I could make v6 builds use the inline
asm like v7+ does, but with "bl __a_barrier" instead of "dmb ish", but
I'm not sure how much of a win this would be, if any.

Comments? If no problems are noticed right away I'll probably commit
this soon as a basis for any future work that needs to be done
improving it, since I think it's already reasonably good (and much
better than what we had).

Rich

View attachment "arm_atomics_overhaul_try2.diff" of type "text/plain" (10131 bytes)
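
For readers who want the relativeptr dispatch spelled out, here is a rough C model of the idea. It is only a sketch and is not taken from the attached patch. All names (versions, cas_dummy, cas_v6, cas_v7, a_cas_model, select_version, last_version) are made up for illustration, the bodies are plain non-atomic C standing in for the real asm, and an array index stands in for the byte offset between code addresses. What it is meant to show is the key property: an offset of zero lands on the dummy version, and initialization only has to store one difference.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none come from the patch. */
typedef int (*cas_fn)(volatile int *p, int expected, int desired);

static int last_version; /* records which version ran, for illustration only */

/* Plain C compare-and-swap; NOT atomic, it only stands in for the real asm. */
static int cas_plain(volatile int *p, int expected, int desired)
{
	int old = *p;
	if (old == expected)
		*p = desired;
	return old;
}

static int cas_dummy(volatile int *p, int e, int d)
{
	last_version = 0;
	return cas_plain(p, e, d);
}

static int cas_v6(volatile int *p, int e, int d)
{
	last_version = 6;
	return cas_plain(p, e, d);
}

static int cas_v7(volatile int *p, int e, int d)
{
	last_version = 7;
	return cas_plain(p, e, d);
}

/* Slot 0 plays the role of the dummy code placed right after the .word. */
static const cas_fn versions[] = { cas_dummy, cas_v6, cas_v7 };

/* Zero at program startup, like relativeptr in the idiom. */
static ptrdiff_t relativeptr;

/* Callee-side dispatch: the caller does no address computation. */
static int a_cas_model(volatile int *p, int expected, int desired)
{
	return (*(versions + relativeptr))(p, expected, desired);
}

/* Initialization: store the distance from the dummy to the chosen version. */
static void select_version(size_t idx)
{
	assert(idx < sizeof versions / sizeof versions[0]);
	relativeptr = &versions[idx] - &versions[0];
}
```

In this model, calling a_cas_model() before select_version() runs cas_dummy, and calling select_version(2) afterwards routes every later call to cas_v7 without any change at the call sites.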