kernel-hardening - Re: [RFC v4 PATCH 12/13] x86: implementation for HARDENED

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161111102913.GE3117@twins.programming.kicks-ass.net>
Date: Fri, 11 Nov 2016 11:29:13 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: "Reshetova, Elena" <elena.reshetova@...el.com>
Cc: Kees Cook <keescook@...omium.org>,
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>,
	Arnd Bergmann <arnd@...db.de>, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"Anvin, H Peter" <h.peter.anvin@...el.com>,
	Will Deacon <will.deacon@....com>,
	Hans Liljestrand <ishkamiel@...il.com>,
	David Windsor <dwindsor@...il.com>
Subject: Re: [RFC v4 PATCH 12/13] x86: implementation for HARDENED_ATOMIC

On Fri, Nov 11, 2016 at 09:32:45AM +0000, Reshetova, Elena wrote:
> It is covered in the documentation file (Documentation/security/
> hardened-atomic.txt) included in the first patch.
> There are just so many details that including them all in commit
> messages will produce pages long commit messages, so some
> things are moved to the documentation only. 

No, never skimp on Changelogs. Nobody reads documentation.

Also, this really should also have a very explicit code comment,
non-atomic constructs in atomic.h are 'surprising' at the very least.

> >Now is this harmless? If you have two increments racing like:
> 
> 	  inc
> 	  jno 1 // overflow
> 
> 			  inc
> 			  jno 1 // !overflow
> 
> 	  dec
> 	1:		1:
> 
> >The second thread will still affect your wrap and not BUG.

This still wants an answer, because attackers never exploit races?

> >I have a benchmark somewhere, I can run numbers tomorrow, but it
> >really shows once you get a bit of contention going. Once you hit 4
> >nodes contending on a variable its completely out there IIRC.

> This would help to get more numbers on this, thank you. 

		LOCK addl		LOCK cmpxchg-addl

1-node		1: 22.038250		1: 41.572270
		2: 174.019700           2: 198.965635
		3: 185.852060           3: 274.293927
		4: 389.169783           4: 266.738485
		6: 347.827897           6: 454.785715
		8: 369.649510           8: 463.125426


2-nodes		2: 428.448130		2: 1422.221850
		4: 616.203497           4: 1166.427205
		6: 855.639025           6: 1424.131080
		8: 1083.613291          8: 1402.484560

4-nodes		 4: 1180.591315		 4: 1830.301125
		 8: 1480.023056          8: 2043.418720
		16: 2602.128429         16: 2611.188079


Results are in cycles:u, average of 100000 loops.

As measured on a 4 socket IVB-EX (E7-4890 v2).

> >The unconditional atomic ops really are loads faster than cmpxchg
> >loops.
> 
> Yes, and this is what we saw when doing performance measurements. So,
> as a result we went with a faster method, which we believed still has
> a low risk of getting into race.  If you numbers prove otherwise, then
> we have to reconsider. 

Its fundamentally a question of semantics though. These are _atomic_
ops, they really should be, well, atomic. No exceptions.

If you want to play funny games, don't call them atomic.

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.