Message-ID: <CAJ33NAWbTShcWk04-Nu+eeUtkkXOWbXM-nNb64xtwp=gN6dUmA@mail.gmail.com>
Date: Mon, 30 Mar 2026 11:34:02 +0530
From: Sandipan Roy <saroy@...hat.com>
To: oss-security@...ts.openwall.com
Cc: Paolo Bonzini <pbonzini@...hat.com>, denis.pilipchuk@...cle.com, bkov@...zon.com, 
	fgriffo@...zon.com, Yogesh Mittal <ymittal@...hat.com>, 
	Mauro Matteo Cascella <mcascell@...hat.com>
Subject: KVM shadow EPT stale rmap use-after-free

Hello OSS-Sec,

Alexander Bulekov (bkov@...zon.com) and Fred Griffoul (fgriffo@...zon.com)
reported a use-after-free in KVM's shadow paging code. The issue was found
through fuzzing. It is exploitable from any x86 guest with nested
virtualization enabled, on either Intel or AMD processors, or when using
shadow paging (ept=0 / npt=0). The bug leads to kernel memory corruption
and denial of service.


Summary:

mark_mmio_spte() overwrites a present shadow SPTE via mmu_spte_set()
without first calling drop_spte() to remove the rmap entry. When the
shadow page is later freed through a level-conflict zap, the stale rmap
points into freed memory. Subsequent rmap traversal (dirty page tracking,
ksmd, NUMA balancing, ...) dereferences the stale pointer, resulting in a
use-after-free.

The bug was introduced by commit a54aa15c6bda ("KVM: x86/mmu: Handle
MMIO SPTEs directly in mmu_set_spte()"). It is present from v5.13
through current upstream/kvm-next.


OVE ID: OVE-20260330-0003

CVE ID: Pending from kernel.org CNA.

Root cause:

Commit a54aa15c6bda moved MMIO SPTE handling to an early return in
mmu_set_spte() that bypasses the rmap cleanup. The original code was
safe because mark_mmio_spte() was called inside set_spte(), which ran
after mmu_set_spte() had already cleaned up the rmap via drop_spte().
The reasoning in the commit message ("it should be impossible to
convert a valid SPTE to an MMIO SPTE") is motivated by the earlier
commit e0c378684b65 ("KVM: x86/mmu: Retry page faults that hit an
invalid memslot"). However, that protection does not apply if the page
table entry is rewritten via DMA, which bypasses the write protection
on the guest's shadowed page tables.


Trigger Mechanism:

L1 (e.g. a normal QEMU VM) cooperates with L2. Two EPT12 PD entries
both point to the same guest PT page, so L0 creates one shadow page
for both paths.

1. L2 accesses GPA_A (via PD[0] -> PT[0]). L0 creates a shadow page
   for the guest PT with a present SPTE at spt[0] and an rmap entry.

2. L1 rewrites PT[0] to a noslot GPA via DMA (virtio-blk, bypassing
   EPT01 write protection). L0 does not invalidate EPT02.

3. L2 accesses GPA_B (via PD[N] -> PT[0]). This is a fresh EPT02 miss
   on a different GPA, so no invalidation is needed. L0 resolves PT[0]
   to noslot. mark_mmio_spte() overwrites spt[0] without drop_spte(),
   leaving a stale rmap.

4. L1 rewrites EPT12 via DMA to create a level conflict (the guest PT
   page is reused as a PD).

5. L2 accesses through the level-conflict path. L0 zaps the shadow
   page. mmu_page_zap_pte() clears the MMIO SPTE with
   mmu_spte_clear_no_track() (no rmap removal). The shadow page is
   freed; the stale rmap points to freed slab memory.

6. Rmap traversal (dirty logging, fork/madvise via MMU notifier, ksmd)
   dereferences the freed sptep, yielding use-after-free reads and
   writes.

On 6.1.74, CONFIG_KASAN=y reports issues such as:

1. use-after-free Write of size 8 in mmu_spte_clear_track_bits
2. use-after-free Read of size 8 in mmu_spte_clear_track_bits
3. use-after-free Read of size 8 in rmap_write_protect
4. use-after-free Read of size 4 in mmu_spte_clear_track_bits
5. null-ptr-deref at addr 0x24 in mmu_spte_clear_track_bits
6. user-memory-access in mmu_spte_clear_track_bits
7. slab-out-of-bounds Read in mmu_spte_clear_track_bits
8. kernel BUG at mmu.c:1110, BUG_ON(!is_shadow_present_pte(*sptep))
   in rmap_write_protect

These demonstrate guest-to-host DoS and guest-to-host kernel heap
corruption, potentially aiding VM escape.

On kernels 6.16 and newer the reproducer also triggers a WARN, present
since commit 11d45175111d ("KVM: x86/mmu: Warn if PFN changes on
shadow-present SPTE in shadow MMU").


Backport instructions:

The code has seen small changes but the logic has not changed
substantially since Linux v5.13. The "if (flush)" branch added by the
patch is the same as the one already present at the end of
mmu_set_spte().


Timeline:

March 4, 2026: vulnerability reported to security@...nel.org by
Alexander Bulekov <bkov@...zon.com>, with Cc to the KVM x86 maintainers
(Paolo Bonzini <pbonzini@...hat.com>, Sean Christopherson
<seanjc@...gle.com>)

March 5, 2026: related WARN reported by Sean Christopherson

March 6, 2026: final patches posted

March 29, 2026: patches released publicly


Patches:

[1] https://lore.kernel.org/kvm/20260329162258.106549-1-pbonzini@redhat.com/T/#u
[2] https://lore.kernel.org/kvm/20260329162258.106549-2-pbonzini@redhat.com/T/#u


-- 
*Sandipan Roy*

Senior Product Security Engineer, Product Security

Secure Engineering - Incident Response

Email: sandipan@...hat.com

PGP:0x4B5C7470051BB332 <https://bytehackr.fedorapeople.org/saroy.asc>

*secalert@...hat.com <secalert@...hat.com>* For Urgent Response.
<https://www.redhat.com/>
