  Jun 18, 2024
    • KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running · 11896456
      David Matlack authored
      Mark a vCPU as preempted/ready if-and-only-if it's scheduled out while
      running. i.e. Do not mark a vCPU preempted/ready if it's scheduled out
      during a non-KVM_RUN ioctl() or when userspace is doing KVM_RUN with
      immediate_exit.
      
      Commit 54aa83c9 ("KVM: x86: do not set st->preempted when going back
      to user space") stopped marking a vCPU as preempted when returning to
      userspace, but if userspace then invokes a KVM vCPU ioctl() that gets
      preempted, the vCPU will be marked preempted/ready. This is arguably
      incorrect behavior since the vCPU was not actually preempted while the
      guest was running, it was preempted while doing something on behalf of
      userspace.
      
      Marking a vCPU preempted iff it's running also avoids KVM dirtying guest
      memory after userspace has paused vCPUs, e.g. for live migration, which
      allows userspace to collect the final dirty bitmap before or in parallel
      with saving vCPU state, without having to worry about saving vCPU state
      triggering writes to guest memory.
      
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Link: https://lore.kernel.org/r/20240503181734.1467938-4-dmatlack@google.com
      [sean: massage changelog]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
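
      A minimal sketch of the scheduled-out path this change describes (the
      preempt notifier callback in kvm_main.c; simplified, and the exact
      guard may differ by kernel version):

        static void kvm_sched_out(struct preempt_notifier *pn,
                                  struct task_struct *next)
        {
                struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

                /*
                 * Mark the vCPU preempted/ready only if it was scheduled out
                 * while actually running the guest, i.e. not during a
                 * non-KVM_RUN ioctl() or an immediate_exit KVM_RUN.
                 */
                if (current->on_rq && vcpu->wants_to_run) {
                        WRITE_ONCE(vcpu->preempted, true);
                        WRITE_ONCE(vcpu->ready, true);
                }
                kvm_arch_vcpu_put(vcpu);
        }
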
    • KVM: Ensure new code that references immediate_exit gets extra scrutiny · 4b23e0c1
      David Matlack authored
      
      Ensure that any new KVM code that references immediate_exit gets extra
      scrutiny by renaming it to immediate_exit__unsafe in kernel code.
      
      All fields in struct kvm_run are subject to TOCTOU races since they are
      mapped into userspace, which may be malicious or buggy. To protect KVM,
      introduce a new macro that appends __unsafe to select field names in
      struct kvm_run, hinting to developers and reviewers that accessing such
      fields must be done carefully.
      
      Apply the new macro to immediate_exit, since userspace can make
      immediate_exit inconsistent with vcpu->wants_to_run, i.e. accessing
      immediate_exit directly could lead to unexpected bugs in the future.
      
      Signed-off-by: David Matlack <dmatlack@google.com>
      Link: https://lore.kernel.org/r/20240503181734.1467938-3-dmatlack@google.com
      [sean: massage changelog]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
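
      A sketch of the macro pattern the changelog describes; the macro name
      below is illustrative, not necessarily the one used in the patch:

        /* Kernel code sees immediate_exit__unsafe, flagging it for extra
         * review; userspace continues to see the plain field name. */
        #ifdef __KERNEL__
        #define SUFFIX_UNSAFE(x) x##__unsafe
        #else
        #define SUFFIX_UNSAFE(x) x
        #endif

        struct kvm_run {
                /* in */
                __u8 request_interrupt_window;
                __u8 SUFFIX_UNSAFE(immediate_exit);
                __u8 padding1[6];
                /* ... */
        };
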
    • KVM: Introduce vcpu->wants_to_run · a6816314
      David Matlack authored
      
      Introduce vcpu->wants_to_run to indicate when a vCPU is in its core run
      loop, i.e. when the vCPU is running the KVM_RUN ioctl and immediate_exit
      was not set.
      
      Replace all references to vcpu->run->immediate_exit with
      !vcpu->wants_to_run to avoid TOCTOU races with userspace. For example, a
      malicious userspace could invoke KVM_RUN with immediate_exit=true and
      then, after KVM reads it to set wants_to_run=false, flip it to false.
      This would result in the vCPU running in KVM_RUN with
      wants_to_run=false. This wouldn't cause any real bugs today but is a
      dangerous landmine.
      
      Signed-off-by: David Matlack <dmatlack@google.com>
      Link: https://lore.kernel.org/r/20240503181734.1467938-2-dmatlack@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
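
      A simplified fragment of the KVM_RUN entry path after this change (a
      later commit in the series renames the field to immediate_exit__unsafe):

        case KVM_RUN:
                /*
                 * Snapshot immediate_exit exactly once; kvm_run is mapped
                 * into userspace, which could flip the field concurrently.
                 */
                vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit);
                r = kvm_arch_vcpu_ioctl_run(vcpu);
                vcpu->wants_to_run = false;
                break;
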
    • KVM: Reject overly excessive IDs in KVM_CREATE_VCPU · 8b8e57e5
      Mathias Krause authored
      If, on a 64-bit system, a vCPU ID is provided that has the upper 32 bits
      set to a non-zero value, it may get accepted if the value truncated to
      32 bits is below KVM_MAX_VCPU_IDS and 'max_vcpus'. This feels very
      wrong and triggered the reporting logic of PaX's SIZE_OVERFLOW plugin.
      
      Instead of silently truncating and accepting such values, pass the full
      value to kvm_vm_ioctl_create_vcpu() and make the existing limit checks
      return an error.
      
      Even though this is a userland ABI-breaking change, no sane userland
      could ever have relied on that behaviour.
      
      Reported-by: PaX's SIZE_OVERFLOW plugin running on grsecurity's syzkaller
      Fixes: 6aa8b732 ("[PATCH] kvm: userspace interface")
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: PaX Team <pageexec@freemail.hu>
      Signed-off-by: Mathias Krause <minipli@grsecurity.net>
      Link: https://lore.kernel.org/r/20240614202859.3597745-2-minipli@grsecurity.net
      [sean: tweak comment about INT_MAX assertion]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
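
      The shape of the fix, as the changelog describes it (a sketch;
      parameter naming and the surrounding plumbing follow kvm_main.c):

        static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
        {
                /*
                 * 'id' now carries the full, untruncated userspace value, so
                 * an ID with any of the upper 32 bits set fails here instead
                 * of being silently truncated to a plausible 32-bit value.
                 */
                if (id >= KVM_MAX_VCPU_IDS)
                        return -EINVAL;
                /* ... */
        }
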
    • KVM: Stop processing *all* memslots when "null" mmu_notifier handler is found · c3f3edf7
      Babu Moger authored
      Bail from outer address space loop, not just the inner memslot loop, when
      a "null" handler is encountered by __kvm_handle_hva_range(), which is the
      intended behavior.  On x86, which has multiple address spaces thanks to
      SMM emulation, breaking from just the memslot loop results in undefined
      behavior due to assigning the non-existent return value from kvm_null_fn()
      to a bool.
      
      In practice, the bug is benign as kvm_mmu_notifier_invalidate_range_end()
      is the only caller that passes handler=kvm_null_fn, and it doesn't set
      flush_on_ret, i.e. assigning garbage to r.ret is ultimately ignored.  And
      for most configurations the compiler elides the entire sequence, i.e. there
      is no undefined behavior at runtime.
      
        ------------[ cut here ]------------
        UBSAN: invalid-load in arch/x86/kvm/../../../virt/kvm/kvm_main.c:655:10
        load of value 160 is not a valid value for type '_Bool'
        CPU: 370 PID: 8246 Comm: CPU 0/KVM Not tainted 6.8.2-amdsos-build58-ubuntu-22.04+ #1
        Hardware name: AMD Corporation Sh54p/Sh54p, BIOS WPC4429N 04/25/2024
        Call Trace:
         <TASK>
         dump_stack_lvl+0x48/0x60
         ubsan_epilogue+0x5/0x30
         __ubsan_handle_load_invalid_value+0x79/0x80
         kvm_mmu_notifier_invalidate_range_end.cold+0x18/0x4f [kvm]
         __mmu_notifier_invalidate_range_end+0x63/0xe0
         __split_huge_pmd+0x367/0xfc0
         do_huge_pmd_wp_page+0x1cc/0x380
         __handle_mm_fault+0x8ee/0xe50
         handle_mm_fault+0xe4/0x4a0
         __get_user_pages+0x190/0x840
         get_user_pages_unlocked+0xe0/0x590
         hva_to_pfn+0x114/0x550 [kvm]
         kvm_faultin_pfn+0xed/0x5b0 [kvm]
         kvm_tdp_page_fault+0x123/0x170 [kvm]
         kvm_mmu_page_fault+0x244/0xaa0 [kvm]
         vcpu_enter_guest+0x592/0x1070 [kvm]
         kvm_arch_vcpu_ioctl_run+0x145/0x8a0 [kvm]
         kvm_vcpu_ioctl+0x288/0x6d0 [kvm]
         __x64_sys_ioctl+0x8f/0xd0
         do_syscall_64+0x77/0x120
         entry_SYSCALL_64_after_hwframe+0x6e/0x76
         </TASK>
        ---[ end trace ]---
      
      Fixes: 071064f1 ("KVM: Don't take mmu_lock for range invalidation unless necessary")
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Link: https://lore.kernel.org/r/b8723d39903b64c241c50f5513f804390c7b5eec.1718203311.git.babu.moger@amd.com
      [sean: massage changelog]
      Signed-off-by: Sean Christopherson <seanjc@google.com>
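
      A sketch of the fix in __kvm_handle_hva_range (the label name below is
      illustrative):

        for (i = 0; i < kvm_arch_nr_memslot_as_ids(kvm); i++) {
                /* ... */
                kvm_for_each_memslot_in_hva_range(node, slots,
                                                  range->start, range->end - 1) {
                        /* ... */
                        /*
                         * A "null" handler has nothing to do; bail from the
                         * outer address space loop, not just this memslot
                         * loop, so kvm_null_fn()'s non-existent return value
                         * is never assigned to r.ret.
                         */
                        if (IS_KVM_NULL_FN(range->handler))
                                goto out_unlock;        /* was: break; */

                        r.ret |= range->handler(kvm, &gfn_range);
                }
        }
        out_unlock:
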
  Jun 05, 2024
    • KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() · 49f683b4
      Breno Leitao authored
      Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the
      loads and stores are atomic.  In the extremely unlikely scenario the
      compiler tears the stores, it's theoretically possible for KVM to attempt
      to get a vCPU using an out-of-bounds index, e.g. if the write is split
      into multiple 8-bit stores, and is paired with a 32-bit load on a VM with
      257 vCPUs:
      
        CPU0                              CPU1
        last_boosted_vcpu = 0xff;
      
                                          (last_boosted_vcpu = 0x100)
                                          last_boosted_vcpu[15:8] = 0x01;
        i = (last_boosted_vcpu = 0x1ff)
                                          last_boosted_vcpu[7:0] = 0x00;
      
        vcpu = kvm->vcpu_array[0x1ff];
      
      As detected by KCSAN:
      
        BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm]
      
        write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16:
        kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm
        handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
        vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
      		 arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
        vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
        kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
        kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
        __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
        __x64_sys_ioctl (fs/ioctl.c:890)
        x64_sys_call (arch/x86/entry/syscall_64.c:33)
        do_syscall_64 (arch/x86/entry/common.c:?)
        entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      
        read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4:
        kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm
        handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel
        vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:?
      			arch/x86/kvm/vmx/vmx.c:6606) kvm_intel
        vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm
        kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm
        kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
        __se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
        __x64_sys_ioctl (fs/ioctl.c:890)
        x64_sys_call (arch/x86/entry/syscall_64.c:33)
        do_syscall_64 (arch/x86/entry/common.c:?)
        entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
      
        value changed: 0x00000012 -> 0x00000000
      
      Fixes: 217ece61 ("KVM: use yield_to instead of sleep in kvm_vcpu_on_spin")
      Cc: stable@vger.kernel.org
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Link: https://lore.kernel.org/r/20240510092353.2261824-1-leitao@debian.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
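
      The pattern of the fix in kvm_vcpu_on_spin(), sketched with the
      surrounding boost logic elided:

        int last_boosted_vcpu;
        unsigned long i;
        struct kvm_vcpu *vcpu;

        /* Load the index once; a plain load can be torn by the compiler. */
        last_boosted_vcpu = READ_ONCE(kvm->last_boosted_vcpu);
        kvm_for_each_vcpu(i, vcpu, kvm) {
                /* ... skip ineligible vCPUs, start after last_boosted_vcpu ... */
                if (kvm_vcpu_yield_to(vcpu) > 0) {
                        /* Store atomically so a concurrent reader never
                         * observes a torn index. */
                        WRITE_ONCE(kvm->last_boosted_vcpu, i);
                        break;
                }
        }
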
  Jun 03, 2024
    • Revert "KVM: async_pf: avoid recursive flushing of work items" · 778c350e
      Sean Christopherson authored
      Now that KVM does NOT gift async #PF workers a "struct kvm" reference,
      don't bother skipping "done" workers when flushing/canceling queued
      workers, as the deadlock that was being fudged around can no longer occur.
      When workers, i.e. async_pf_execute(), were gifted a reference, it was
      possible for a worker to put the last reference and trigger VM destruction,
      i.e. trigger flushing of a workqueue from a worker in said workqueue.
      
      Note, there is no actual lock; the deadlock was that a worker would be
      stuck waiting for itself (the workqueue code simulates a lock/unlock via
      lock_map_{acquire,release}()).
      
      Skipping "done" workers isn't problematic per se, but using work->vcpu as
      a "done" flag is confusing, e.g. it's not clear that async_pf.lock is
      acquired to protect the work->vcpu, NOT the processing of async_pf.queue
      (which is protected by vcpu->mutex).
      
      This reverts commit 22583f0d.
      
      Suggested-by: Xu Yilun <yilun.xu@linux.intel.com>
      Link: https://lore.kernel.org/r/20240423191649.2885257-1-seanjc@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
    • KVM: Enable halt polling shrink parameter by default · aeb1b22a
      Parshuram Sangle authored
      
      The default halt_poll_ns_shrink value of 0 always resets the polling
      interval to 0 on an unsuccessful poll, i.e. one where no vcpu wakeup is
      received. This is mostly to avoid pointless polling across a number of
      shorter intervals, but disabling shrink assumes a vcpu wakeup is less
      likely to be received in subsequent shorter polling intervals. Another
      side effect of a 0 shrink value is that, even on a successful poll, if
      the total block time was greater than the current polling interval, the
      polling interval starts over from 0 instead of shrinking by a factor.
      
      Enabling shrink with a value of 2 allows the polling interval to
      gradually decrease on unsuccessful poll events as well. This gives
      subsequent polling intervals a fair chance at a successful poll rather
      than resetting to 0 and starting over from grow_start.
      
      The kvm stat log snippet below shows interleaved growth and shrinking
      of the polling interval:
      87162647182125: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
      87162647637763: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
      87162649627943: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 40000 (grow 20000)
      87162650892407: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (shrink 40000)
      87162651540378: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 40000 (grow 20000)
      87162652276768: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (shrink 40000)
      87162652515037: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 40000 (grow 20000)
      87162653383787: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (shrink 40000)
      87162653627670: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (shrink 20000)
      87162653796321: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
      87162656171645: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (shrink 20000)
      87162661607487: kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
      
      Having both grow and shrink enabled creates a balance between polling
      interval growth and shrink behavior. Tests show an improved ratio of
      successful polling attempts, which contributes to VM performance. The
      power penalty is quite negligible, as shrunk polling intervals create
      bursts of very short durations.
      
      Performance assessment results show 3-6% improvements in CPU+GPU,
      Memory, and Storage Android VM workloads, and a 5-9% improvement in the
      average FPS of gaming VM workloads.

      The power penalty is below 1% when the host OS is either idle or
      running a native workload with 2 VMs enabled. CPU/GPU-intensive gaming
      workloads likewise show no increased power overhead with shrink enabled.
      
      Co-developed-by: Rajendran Jaishankar <jaishankar.rajendran@intel.com>
      Signed-off-by: Rajendran Jaishankar <jaishankar.rajendran@intel.com>
      Signed-off-by: Parshuram Sangle <parshuram.sangle@intel.com>
      Link: https://lore.kernel.org/r/20231102154628.2120-2-parshuram.sangle@intel.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
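
      For reference, the shrink path this tunable feeds, roughly as it
      exists in kvm_main.c (tracepoint omitted):

        static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
        {
                unsigned int old, val, shrink, grow_start;

                old = val = vcpu->halt_poll_ns;
                shrink = READ_ONCE(halt_poll_ns_shrink); /* now 2 by default */
                grow_start = READ_ONCE(halt_poll_ns_grow_start);
                if (shrink == 0)
                        val = 0;        /* old default: reset to zero */
                else
                        val /= shrink;  /* new default: halve the interval */

                if (val < grow_start)
                        val = 0;

                vcpu->halt_poll_ns = val;
        }
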
    • KVM: Unexport kvm_debugfs_dir · 96a02b9f
      Borislav Petkov authored
      After commit faf01aef ("KVM: PPC: Merge powerpc's debugfs entry content
      into generic entry"), kvm_debugfs_dir is not used anywhere outside of
      kvm_main.c.

      Unexport it and make it static.
      
      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
      Link: https://lore.kernel.org/r/20240515150804.9354-1-bp@kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
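
      The change itself is small; in kvm_main.c it amounts to roughly:

        /* Before: visible to modules. */
        struct dentry *kvm_debugfs_dir;
        EXPORT_SYMBOL_GPL(kvm_debugfs_dir);

        /* After: file-local, since no other file references it. */
        static struct dentry *kvm_debugfs_dir;
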
  May 10, 2024
    • KVM: guest_memfd: Add hook for invalidating memory · a90764f0
      Michael Roth authored
      
      In some cases, like with SEV-SNP, guest memory needs to be updated in a
      platform-specific manner before it can be safely freed back to the host.
      Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
      allow for special handling of this sort when freeing memory in response
      to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode. Go
      ahead and define an arch-specific hook for x86, since it will be needed
      for handling memory used for SEV-SNP guests.
      
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20231230172351.574091-6-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
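
      A sketch of the wiring described above (exact signatures vary by
      kernel version):

        static void kvm_gmem_free_folio(struct folio *folio)
        {
                struct page *page = folio_page(folio, 0);
                kvm_pfn_t pfn = page_to_pfn(page);
                int order = folio_order(folio);

                /* Let the arch scrub/reclaim the range, e.g. SNP RMP updates. */
                kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
        }

        static const struct address_space_operations kvm_gmem_aops = {
                /* ... */
                .free_folio = kvm_gmem_free_folio,
        };
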
    • KVM: guest_memfd: Add interface for populating gmem pages with user data · 1f6c06b1
      Paolo Bonzini authored
      
      During guest run-time, kvm_arch_gmem_prepare() is issued as needed to
      prepare newly-allocated gmem pages prior to mapping them into the guest.
      In the case of SEV-SNP, this mainly involves setting the pages to
      private in the RMP table.
      
      However, for the GPA ranges comprising the initial guest payload, which
      are encrypted/measured prior to starting the guest, the gmem pages need
      to be accessed prior to setting them to private in the RMP table so they
      can be initialized with the userspace-provided data. Additionally, an
      SNP firmware call is needed afterward to encrypt them in-place and
      measure the contents into the guest's launch digest.
      
      While it is possible to bypass the kvm_arch_gmem_prepare() hooks so that
      this handling can be done in an open-coded/vendor-specific manner, this
      may expose more gmem-internal state/dependencies to external callers
      than necessary. Try to avoid this by implementing an interface that
      tries to handle as much of the common functionality inside gmem as
      possible, while also making it generic enough to potentially be
      usable/extensible for TDX as well.
      
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Co-developed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
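
      The interface's rough shape, per the description above (a sketch; the
      callback lets the arch do its per-page in-place encrypt-and-measure
      step):

        typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn,
                                            kvm_pfn_t pfn, void __user *src,
                                            int order, void *opaque);

        /*
         * Allocate and prepare gmem pages for [start_gfn, start_gfn + npages),
         * copy in userspace data from 'src' (if provided), then invoke the
         * arch callback, e.g. for SNP's encrypt-and-measure firmware call.
         */
        long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn,
                               void __user *src, long npages,
                               kvm_gmem_populate_cb post_populate,
                               void *opaque);
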
    • KVM: guest_memfd: extract __kvm_gmem_get_pfn() · 17573fd9
      Paolo Bonzini authored
      
      In preparation for adding a function that walks a set of pages
      provided by userspace and populates them in a guest_memfd,
      add a version of kvm_gmem_get_pfn() that has a "bool prepare"
      argument and passes it down to kvm_gmem_get_folio().
      
      Populating guest memory has to call __kvm_gmem_get_pfn() repeatedly on
      the same file, so make the new function take a struct file *.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
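
      A sketch of the extraction (details may differ):

        static int __kvm_gmem_get_pfn(struct file *file,
                                      struct kvm_memory_slot *slot,
                                      gfn_t gfn, kvm_pfn_t *pfn,
                                      int *max_order, bool prepare);

        int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
                             gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
        {
                struct file *file = kvm_gmem_get_file(slot);
                int r;

                if (!file)
                        return -EFAULT;

                /* Callers that populate many pages grab the file once and
                 * call __kvm_gmem_get_pfn() directly in a loop. */
                r = __kvm_gmem_get_pfn(file, slot, gfn, pfn, max_order, true);
                fput(file);
                return r;
        }
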
    • KVM: guest_memfd: Add hook for initializing memory · 3bb2531e
      Paolo Bonzini authored
      guest_memfd pages are generally expected to be in some arch-defined
      initial state prior to using them for guest memory. For SEV-SNP this
      initial state is 'private', or 'guest-owned', and requires additional
      operations to move these pages into a 'private' state by updating the
      corresponding entries in the RMP table.
      
      Allow for an arch-defined hook to handle updates of this sort, and go
      ahead and implement one for x86 so KVM implementations like AMD SVM can
      register a kvm_x86_ops callback to handle these updates for SEV-SNP
      guests.
      
      The preparation callback is always called when allocating/grabbing
      folios via gmem, and it is up to the architecture to keep track of
      whether or not the pages are already in the expected state (e.g. the RMP
      table in the case of SEV-SNP).
      
      In some cases, it is necessary to defer the preparation of the pages to
      handle things like in-place encryption of initial guest memory payloads
      before marking these pages as 'private'/'guest-owned'.  Add an argument
      (always true for now) to kvm_gmem_get_folio() that allows for the
      preparation callback to be bypassed.  To detect possible issues in
      the way userspace initializes memory, it is only possible to add an
      unprepared page if it is not already included in the filemap.
      
      Link: https://lore.kernel.org/lkml/ZLqVdvsF11Ddo7Dq@google.com/
      Co-developed-by: Michael Roth <michael.roth@amd.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-Id: <20231230172351.574091-5-michael.roth@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
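
      A sketch of the hook's x86 wiring (the dispatch through kvm_x86_ops is
      simplified here; the real code may go through a static call):

        int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
                                  int max_order)
        {
                /* Vendor modules (e.g. AMD SVM for SEV-SNP) register a
                 * callback that moves the pages into the expected state. */
                if (kvm_x86_ops.gmem_prepare)
                        return kvm_x86_ops.gmem_prepare(kvm, gfn, pfn, max_order);
                return 0;
        }
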
    • KVM: guest_memfd: limit overzealous WARN · fa30b0dc
      Paolo Bonzini authored
      
      Because kvm_gmem_get_pfn() is called from the page fault path without
      any of the slots_lock, filemap lock or mmu_lock taken, it is
      possible for it to race with kvm_gmem_unbind().  This is not a
      problem, as any PTE that is installed temporarily will be zapped
      before the guest has the occasion to run.
      
      However, it is not possible to have a complete unbind+bind
      racing with the page fault, because deleting the memslot
      will call synchronize_srcu_expedited() and wait for the
      page fault to be resolved.  Thus, we can still warn if
      the file is there and is not the one we expect.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
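
      A sketch of the narrowed check in the fault path (assuming the
      surrounding code of kvm_gmem_get_pfn()):

        /*
         * Racing with unbind is fine: any PTE installed from a stale file
         * will be zapped before the guest runs. Only WARN when a file is
         * still bound to the slot but is not the one we looked up.
         */
        if (file != slot->gmem.file) {
                WARN_ON_ONCE(slot->gmem.file);
                r = -EFAULT;
                goto out;
        }
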
    • KVM: guest_memfd: pass error up from filemap_grab_folio · 70623723
      Paolo Bonzini authored
      
      Some SNP ioctls will require the page not to be in the pagecache, and as such they
      will want to return EEXIST to userspace.  Start by passing the error up from
      filemap_grab_folio.
      
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
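
      The pattern, sketched; filemap_grab_folio() returns an ERR_PTR on
      failure:

        folio = filemap_grab_folio(inode->i_mapping, index);
        /* Return the ERR_PTR as-is so callers (and ultimately userspace)
         * can distinguish failure modes such as an already-present page. */
        if (IS_ERR(folio))
                return folio;
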
    • KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode · 1d23040c
      Michael Roth authored
      truncate_inode_pages_range() may attempt to zero pages before truncating
      them, and this will occur before arch-specific invalidations can be
      triggered via .invalidate_folio/.free_folio hooks via kvm_gmem_aops. For
      AMD SEV-SNP this would result in an RMP #PF being generated by the
      hardware, which is currently treated as fatal (and even if specifically
      allowed for, would not result in anything other than garbage being
      written to guest pages due to encryption). On Intel TDX this would also
      result in undesirable behavior.
      
      Set the AS_INACCESSIBLE flag to prevent the MM from attempting
      unexpected accesses of this sort during operations like truncation.
      
      This may also in some cases yield a decent performance improvement for
      guest_memfd userspace implementations that hole-punch ranges immediately
      after private->shared conversions via KVM_SET_MEMORY_ATTRIBUTES, since
      the current implementation of truncate_inode_pages_range() always ends
      up zeroing an entire 4K range if it is backed by a 2M folio.
      
      Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Michael Roth <michael.roth@amd.com>
      Message-ID: <20240329212444.395559-6-michael.roth@amd.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
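
      A sketch of where the flag is applied during inode setup (setting the
      bit directly here; upstream may use a dedicated mapping helper):

        inode->i_mapping->a_ops = &kvm_gmem_aops;
        mapping_set_unevictable(inode->i_mapping);
        /* Tell the MM it may never read or zero these pages directly,
         * e.g. during truncation; only the arch hooks may touch them. */
        set_bit(AS_INACCESSIBLE, &inode->i_mapping->flags);
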
  May 02, 2024
    • KVM: Remove kvm_make_all_cpus_request_except() · 82e9c84d
      Venkatesh Srinivas authored
      Remove kvm_make_all_cpus_request_except() as it effectively has no users,
      and arguably should never have been added in the first place.
      
      Commit 54163a34 ("KVM: Introduce kvm_make_all_cpus_request_except()")
      added the "except" variation for use in SVM's AVIC update path, which used
      it to skip sending a request to the current vCPU (commit 7d611233
      ("KVM: SVM: Disable AVIC before setting V_IRQ")).
      
      But the AVIC usage of kvm_make_all_cpus_request_except() was essentially a
      hack-a-fix that simply squashed the most likely scenario of a racy WARN
      without addressing the underlying problem(s).  Commit f1577ab2 ("KVM:
      SVM: svm_set_vintr don't warn if AVIC is active but is about to be
      deactivated") eventually fixed the WARN itself, and the "except" usage was
      subsequently dropped by df63202f ("KVM: x86: APICv: drop immediate
      APICv disablement on current vCPU").
      
      That kvm_make_all_cpus_request_except() hasn't gained any users in the
      last ~3 years isn't a coincidence.  If a VM-wide broadcast *needs* to skip
      the current vCPU, then odds are very good that there is an underlying
      bug that could be better fixed elsewhere.
      
      Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
      Link: https://lore.kernel.org/r/20240404232651.1645176-1-venkateshs@chromium.org
      [sean: rewrite changelog with --verbose]
      Signed-off-by: Sean Christopherson <seanjc@google.com>