kernel-hardening - Re: Linux guest kernel threat model for Confidential Computing

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <702f22df28e628d41babcf670c909f1fa1bb3c0c.camel@linux.ibm.com>
Date: Fri, 27 Jan 2023 14:24:55 -0500
From: James Bottomley <jejb@...ux.ibm.com>
To: "Reshetova, Elena" <elena.reshetova@...el.com>,
        Leon Romanovsky
	 <leon@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Shishkin, Alexander"
 <alexander.shishkin@...el.com>,
        "Shutemov, Kirill"
 <kirill.shutemov@...el.com>,
        "Kuppuswamy, Sathyanarayanan"
 <sathyanarayanan.kuppuswamy@...el.com>,
        "Kleen, Andi"
 <andi.kleen@...el.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        Thomas
 Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        "Wunner, Lukas" <lukas.wunner@...el.com>,
        Mika Westerberg
 <mika.westerberg@...ux.intel.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        "Poimboe, Josh" <jpoimboe@...hat.com>,
        "aarcange@...hat.com" <aarcange@...hat.com>,
        Cfir Cohen <cfir@...gle.com>, Marc Orr <marcorr@...gle.com>,
        "jbachmann@...gle.com"
 <jbachmann@...gle.com>,
        "pgonda@...gle.com" <pgonda@...gle.com>,
        "keescook@...omium.org" <keescook@...omium.org>,
        James Morris
 <jmorris@...ei.org>,
        Michael Kelley <mikelley@...rosoft.com>,
        "Lange, Jon"
 <jlange@...rosoft.com>,
        "linux-coco@...ts.linux.dev"
 <linux-coco@...ts.linux.dev>,
        Linux Kernel Mailing List
 <linux-kernel@...r.kernel.org>,
        Kernel Hardening
 <kernel-hardening@...ts.openwall.com>
Subject: Re: Linux guest kernel threat model for Confidential Computing

On Thu, 2023-01-26 at 13:28 +0000, Reshetova, Elena wrote:
> > On Thu, Jan 26, 2023 at 11:29:20AM +0000, Reshetova, Elena wrote:
> > > > On Wed, Jan 25, 2023 at 03:29:07PM +0000, Reshetova, Elena
> > > > wrote:
> > > > > Replying only to the not-so-far addressed points.
> > > > > 
> > > > > > On Wed, Jan 25, 2023 at 12:28:13PM +0000, Reshetova, Elena
> > > > > > wrote:
> > > > > > > Hi Greg,
> > > > 
> > > > <...>
> > > > 
> > > > > > > 3) All the tools are open-source and everyone can start
> > > > > > > using them right away even without any special HW (readme
> > > > > > > has description of what is needed).
> > > > > > > Tools and documentation is here:
> > > > > > > https://github.com/intel/ccc-linux-guest-hardening
> > > > > > 
> > > > > > Again, as our documentation states, when you submit patches
> > > > > > based on these tools, you HAVE TO document that.  Otherwise
> > > > > > we think you all are crazy and will get your patches
> > > > > > rejected.  You all know this, why ignore it?
> > > > > 
> > > > > Sorry, I didn’t know that for every bug that is found in
> > > > > linux kernel when we are submitting a fix that we have to
> > > > > list the way how it has been found. We will fix this in the
> > > > > future submissions, but some bugs we have are found by
> > > > > plain code audit, so 'human' is the tool. 
> > > > My problem with that statement is that by applying different
> > > > threat model you "invent" bugs which didn't exist in a first
> > > > place.
> > > > 
> > > > For example, in this [1] latest submission, authors labeled
> > > > correct behaviour as "bug".
> > > > 
> > > > [1] https://lore.kernel.org/all/20230119170633.40944-1-
> > > > alexander.shishkin@...ux.intel.com/
> > > 
> > > Hm.. Does everyone think that when kernel dies with unhandled
> > > page fault (such as in that case) or detection of a KASAN out of
> > > bounds violation (as it is in some other cases we already have
> > > fixes or investigating) it represents a correct behavior even if
> > > you expect that all your pci HW devices are trusted?
> > 
> > This is exactly what I said. You presented me the cases which exist
> > in your invented world. Mentioned unhandled page fault doesn't
> > exist in real world. If PCI device doesn't work, it needs to be
> > replaced/blocked and not left to be operable and accessible from
> > the kernel/user.
> 
> Can we really assure correct operation of *all* pci devices out
> there? How would such an audit be performed given a huge set of them
> available? Isnt it better instead to make a small fix in the kernel
> behavior that would guard us from such potentially not correctly
> operating devices? 

I think this is really the wrong question from the confidential
computing (CC) point of view.  The question shouldn't be about assuring
that the PCI device is operating completely correctly all the time (for
some value of correct).  It's if it were programmed to be malicious
what could it do to us?  If we take all DoS and Crash outcomes off the
table (annoying but harmless if they don't reveal the confidential
contents), we're left with it trying to extract secrets from the
confidential environment.

The big threat from most devices (including the thunderbolt classes) is
that they can DMA all over memory.  However, this isn't really a threat
in CC (well until PCI becomes able to do encrypted DMA) because the
device has specific unencrypted buffers set aside for the expected DMA.
If it writes outside that CC integrity will detect it and if it reads
outside that it gets unintelligible ciphertext.  So we're left with the
device trying to trick secrets out of us by returning unexpected data.

If I set this as the problem, verifying device correct operation is a
possible solution (albeit hugely expensive), but there are likely many
other cheaper ways to defeat or detect a device trying to trick us into
revealing something.

James
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.