  1. Aug 01, 2024
      PCI: pciehp: Retain Power Indicator bits for userspace indicators · 5560a612
      Blazej Kucman authored
      The sysfs "attention" file normally controls the Slot Control Attention
      Indicator with 0 (off), 1 (on), 2 (blink) settings.
      
      576243b3 ("PCI: pciehp: Allow exclusive userspace control of
      indicators") added pciehp_set_raw_indicator_status() to allow userspace to
      directly control all four bits in both the Attention Indicator and the
      Power Indicator fields via the "attention" file.
      
      This is used on Intel VMD bridges so utilities like "ledmon" can use sysfs
      "attention" to control up to 16 indicators for NVMe device RAID status.
      
      abaaac48 ("PCI: hotplug: Use FIELD_GET/PREP()") broke this by masking
      the sysfs data with PCI_EXP_SLTCTL_AIC, which discards the upper two bits
      intended for the Power Indicator Control field (PCI_EXP_SLTCTL_PIC).
      
      For NVMe devices behind an Intel VMD, ledmon settings that use the
      PCI_EXP_SLTCTL_PIC bits, i.e., ATTENTION_REBUILD (0x5), ATTENTION_LOCATE
      (0x7), ATTENTION_FAILURE (0xD), ATTENTION_OFF (0xF), no longer worked
      correctly.
      
      Mask with PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC to retain both the
      Attention Indicator and the Power Indicator bits.
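      In user-space terms the regression and the fix amount to the width of one mask.  A minimal sketch, assuming the standard pci_regs.h field values; the function names and the raw-value shift are illustrative, not the driver's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Field values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_SLTCTL_AIC 0x00C0 /* Attention Indicator Control (bits 7:6) */
#define PCI_EXP_SLTCTL_PIC 0x0300 /* Power Indicator Control (bits 9:8)     */

/* Hypothetical model of the fix: the 4-bit raw value written to sysfs
 * "attention" is shifted into place and must be masked with BOTH
 * indicator fields, not just AIC. */
static uint16_t indicator_bits_fixed(uint8_t raw)
{
    return ((uint16_t)raw << 6) & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC);
}

/* The regressed behavior for comparison: masking with AIC alone
 * discards the two Power Indicator bits. */
static uint16_t indicator_bits_broken(uint8_t raw)
{
    return ((uint16_t)raw << 6) & PCI_EXP_SLTCTL_AIC;
}
```

      For ATTENTION_REBUILD (0x5), the fixed mask keeps bit 8 (a PIC bit) while the regressed mask silently drops it.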
      
      Fixes: abaaac48 ("PCI: hotplug: Use FIELD_GET/PREP()")
      Link: https://lore.kernel.org/r/20240722141440.7210-1-blazej.kucman@intel.com
      
      
      Signed-off-by: Blazej Kucman <blazej.kucman@intel.com>
      [bhelgaas: commit log]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org	# v6.7+
  2. May 30, 2024
  3. Oct 24, 2023
  4. Aug 10, 2023
  5. Jun 14, 2023
  6. May 24, 2023
  7. Feb 14, 2023
  8. Dec 07, 2022
  9. Feb 10, 2022
  10. Feb 03, 2022
  11. Jan 12, 2022
      PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors · 085a9f43
      Hans de Goede authored
      Use down_read_nested() and down_write_nested() when taking the
      ctrl->reset_lock rw-sem, passing the number of PCIe hotplug controllers in
      the path to the PCI root bus as lock subclass parameter.
      
      This fixes the following false-positive lockdep report when unplugging a
      Lenovo X1C8 from a Lenovo 2nd gen TB3 dock:
      
        pcieport 0000:06:01.0: pciehp: Slot(1): Link Down
        pcieport 0000:06:01.0: pciehp: Slot(1): Card not present
        ============================================
        WARNING: possible recursive locking detected
        5.16.0-rc2+ #621 Not tainted
        --------------------------------------------
        irq/124-pciehp/86 is trying to acquire lock:
        ffff8e5ac4299ef8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_check_presence+0x23/0x80
      
        but task is already holding lock:
        ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
      
         other info that might help us debug this:
         Possible unsafe locking scenario:
      
      	 CPU0
      	 ----
          lock(&ctrl->reset_lock);
          lock(&ctrl->reset_lock);
      
         *** DEADLOCK ***
      
         May be due to missing lock nesting notation
      
        3 locks held by irq/124-pciehp/86:
         #0: ffff8e5ac4298af8 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_ist+0xf3/0x180
         #1: ffffffffa3b024e8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pciehp_unconfigure_device+0x31/0x110
         #2: ffff8e5ac1ee2248 (&dev->mutex){....}-{3:3}, at: device_release_driver+0x1c/0x40
      
        stack backtrace:
        CPU: 4 PID: 86 Comm: irq/124-pciehp Not tainted 5.16.0-rc2+ #621
        Hardware name: LENOVO 20U90SIT19/20U90SIT19, BIOS N2WET30W (1.20 ) 08/26/2021
        Call Trace:
         <TASK>
         dump_stack_lvl+0x59/0x73
         __lock_acquire.cold+0xc5/0x2c6
         lock_acquire+0xb5/0x2b0
         down_read+0x3e/0x50
         pciehp_check_presence+0x23/0x80
         pciehp_runtime_resume+0x5c/0xa0
         device_for_each_child+0x45/0x70
         pcie_port_device_runtime_resume+0x20/0x30
         pci_pm_runtime_resume+0xa7/0xc0
         __rpm_callback+0x41/0x110
         rpm_callback+0x59/0x70
         rpm_resume+0x512/0x7b0
         __pm_runtime_resume+0x4a/0x90
         __device_release_driver+0x28/0x240
         device_release_driver+0x26/0x40
         pci_stop_bus_device+0x68/0x90
         pci_stop_bus_device+0x2c/0x90
         pci_stop_and_remove_bus_device+0xe/0x20
         pciehp_unconfigure_device+0x6c/0x110
         pciehp_disable_slot+0x5b/0xe0
         pciehp_handle_presence_or_link_change+0xc3/0x2f0
         pciehp_ist+0x179/0x180
      
      This lockdep warning is triggered because with Thunderbolt, hotplug ports
      are nested. When removing multiple devices in a daisy-chain, each hotplug
      port's reset_lock may be acquired recursively. It's never the same lock, so
      the lockdep splat is a false positive.
      
      Because locks at the same hierarchy level are never acquired recursively, a
      per-level lockdep class is sufficient to fix the lockdep warning.
      
      One lockdep subclass per PCIe hotplug controller in the path to the
      root bus was chosen to conserve class keys: their number is limited,
      and according to Documentation/locking/lockdep-design.rst, validation
      complexity grows quadratically with the number of keys.
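      A user-space sketch of the subclass computation (the kernel walks pci_dev->parent; the struct and function names here are illustrative): count the hotplug controllers between a port and the root bus, and pass that depth to down_read_nested()/down_write_nested() as the lock subclass, so nested reset_locks in a daisy-chain get distinct subclasses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCI device hierarchy. */
struct port {
    struct port *parent;    /* NULL at the root bus */
    bool is_hotplug;        /* port is handled by pciehp */
};

/* Number of hotplug controllers strictly above this port: the
 * lockdep subclass for this port's reset_lock. */
static int reset_lock_subclass(const struct port *p)
{
    int depth = 0;

    for (p = p->parent; p; p = p->parent)
        if (p->is_hotplug)
            depth++;
    return depth;
}
```

      The dock's own hotplug port sits at depth 0; a hotplug port behind the dock gets subclass 1, so lockdep no longer sees the two acquisitions as recursion on one class.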
      
      Link: https://lore.kernel.org/linux-pci/20190402021933.GA2966@mit.edu/
      Link: https://lore.kernel.org/linux-pci/de684a28-9038-8fc6-27ca-3f6f2f6400d7@redhat.com/
      Link: https://lore.kernel.org/r/20211217141709.379663-1-hdegoede@redhat.com
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=208855
      
      
      Reported-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      Cc: stable@vger.kernel.org
  12. Nov 19, 2021
  13. Nov 18, 2021
  14. Oct 15, 2021
      PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset · ea401499
      Lukas Wunner authored
      Stuart Hayes reports that an error handled by DPC at a Root Port results
      in pciehp gratuitously bringing down a subordinate hotplug port:
      
        RP -- UP -- DP -- UP -- DP (hotplug) -- EP
      
      pciehp brings the slot down because the Link to the Endpoint goes down.
      That is caused by a Hot Reset being propagated as a result of DPC.
      Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
      
        For a Switch, the following must cause a hot reset to be sent on all
        Downstream Ports: [...]
      
        * The Data Link Layer of the Upstream Port reporting DL_Down status.
          In Switches that support Link speeds greater than 5.0 GT/s, the
          Upstream Port must direct the LTSSM of each Downstream Port to the
          Hot Reset state, but not hold the LTSSMs in that state. This permits
          each Downstream Port to begin Link training immediately after its
          hot reset completes. This behavior is recommended for all Switches.
      
        * Receiving a hot reset on the Upstream Port.
      
      Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
      invokes pcie_portdrv_slot_reset() to restore each port's config space.
      At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
      section 6.7.3.4 "Software Notification of Hot-Plug Events":
      
        If the Port is enabled for edge-triggered interrupt signaling using
        MSI or MSI-X, an interrupt message must be sent every time the logical
        AND of the following conditions transitions from FALSE to TRUE: [...]
      
        * The Hot-Plug Interrupt Enable bit in the Slot Control register is
          set to 1b.
      
        * At least one hot-plug event status bit in the Slot Status register
          and its associated enable bit in the Slot Control register are both
          set to 1b.
      
      Prevent pciehp from gratuitously bringing down the slot by clearing the
      error-induced Data Link Layer State Changed event before restoring
      config space.  Afterwards, check whether the link has unexpectedly
      failed to retrain and synthesize a DLLSC event if so.
      
      Allow each pcie_port_service_driver (one of them being pciehp) to define
      a slot_reset callback and re-use the existing pm_iter() function to
      iterate over the callbacks.
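      The callback-iteration pattern can be sketched in a few lines of user-space C; the struct and function names are illustrative, not the portdrv's actual symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a port service driver: the slot_reset hook is
 * optional, so iteration must skip services that don't provide one. */
struct service_driver {
    const char *name;
    void (*slot_reset)(void);
};

static int resets_seen;
static void pciehp_slot_reset_stub(void) { resets_seen++; }

static struct service_driver services[] = {
    { "pciehp", pciehp_slot_reset_stub },
    { "pme",    NULL },            /* no slot_reset hook registered */
};

/* Invoke slot_reset on every service that defines it, analogous to
 * reusing the existing iterator over per-service callbacks. */
static void call_slot_reset_all(void)
{
    for (size_t i = 0; i < sizeof(services) / sizeof(services[0]); i++)
        if (services[i].slot_reset)
            services[i].slot_reset();
}
```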
      
      Thereby, the Endpoint driver remains bound throughout error recovery and
      may restore the device to working state.
      
      Surprise removal during error recovery is detected through a Presence
      Detect Changed event.  The hotplug port is expected to not signal that
      event as a result of a Hot Reset.
      
      The issue isn't DPC-specific; it also occurs when an error is handled
      by AER through aer_root_reset().  So while the issue was noticed only
      now, it's been around since 2006, when AER support was first
      introduced.
      
      [bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
      preparatory patch]
      Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
      Fixes: 6c2b374d ("PCI-Express AER implemetation: AER core and aerdriver")
      Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de
      
      
      Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org # v2.6.19+: ba952824: PCI/portdrv: Report reset for frozen channel
      Cc: Keith Busch <kbusch@kernel.org>
  15. Aug 18, 2021
  16. Jun 16, 2021
  17. Sep 17, 2020
      PCI: pciehp: Reduce noisiness on hot removal · 8a614499
      Lukas Wunner authored
      When a PCIe card is hot-removed, the Presence Detect State and Data Link
      Layer Link Active bits often do not clear simultaneously.  I've seen delays
      of up to 244 msec between the two events with Thunderbolt.
      
      After pciehp has brought down the slot in response to the first event, the
      other bit may still be set.  It's not discernible whether it's set because
      a new card is already in the slot or if it will soon clear.  So pciehp
      tries to bring up the slot and in the latter case fails with a bunch of
      messages, some of them at KERN_ERR severity.  If the slot is no longer
      occupied, the messages are false positives and annoy users.
      
      Stuart Hayes reports the following splat on hot removal:
      
        KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
        KERN_INFO pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
        KERN_ERR  pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
        KERN_ERR  pcieport 0000:3c:06.0: pciehp: Failed to check link status
      
      Dongdong Liu complains about a similar splat:
      
        KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
        KERN_INFO iommu: Removing device 0000:87:00.0 from group 12
        KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
        KERN_INFO pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
        KERN_ERR  pciehp 0000:80:10.0:pcie004: Failed to check link status
      
      Users are particularly irritated to see a bringup attempt even though the
      slot was explicitly brought down via sysfs.  In a perfect world, we could
      avoid this by setting Link Disable on slot bringdown and re-enabling it
      upon a Presence Detect State change.  In reality however, there are broken
      hotplug ports which hardwire Presence Detect to zero, see 80696f99
      ("PCI: pciehp: Tolerate Presence Detect hardwired to zero").  Conversely,
      PCIe r1.0 hotplug ports hardwire Link Active to zero because Link Active
      Reporting wasn't specified before PCIe r1.1.  On unplug, some ports first
      clear Presence then Link (see Stuart Hayes' splat) whereas others use the
      inverse order (see Dongdong Liu's splat).  To top it off, there are hotplug
      ports which flap the Presence and Link bits on slot bringup, see
      6c35a1ac ("PCI: pciehp: Tolerate initially unstable link").
      
      pciehp is designed to work with all of these variants.  Surplus attempts at
      slot bringup are a lesser evil than not being able to bring up slots at
      all.  Although we could try to perfect the behavior for specific hotplug
      controllers, we'd risk breaking others or increasing code complexity.
      
      But we can certainly minimize annoyance by emitting only a single message
      with KERN_INFO severity if bringup is unsuccessful:
      
      * Drop the "Timeout waiting for Presence Detect" message in
        pcie_wait_for_presence().  The sole caller of that function,
        pciehp_check_link_status(), ignores the timeout and carries on.  It emits
        error messages of its own and I don't think this particular message adds
        much value.
      
      * There's a single error condition in pciehp_check_link_status() which
        does not emit a message.  Adding one allows dropping the "Failed to check
        link status" message emitted by board_added() if
        pciehp_check_link_status() returns a non-zero integer.
      
      * Tone down all messages in pciehp_check_link_status() to KERN_INFO
        severity and rephrase them to look as innocuous as possible.  To this
        end, move the message emitted by pcie_wait_for_link_delay() to its
        callers.
      
      As a result, Stuart Hayes' splat becomes:
      
        KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
        KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Cannot train link: status 0x0001
      
      Dongdong Liu's splat becomes:
      
        KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
        KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): No link
      
      The messages now merely serve as information that presence or link bits
      were set a little longer than expected.  Bringup failures which are not
      false positives are still reported, albeit no longer at KERN_ERR severity.
      
      Link: https://lore.kernel.org/linux-pci/20200310182100.102987-1-stuart.w.hayes@gmail.com/
      Link: https://lore.kernel.org/linux-pci/1547649064-19019-1-git-send-email-liudongdong3@huawei.com/
      Link: https://lore.kernel.org/r/b45e46fd8a6aa6930aaac9d7718c2e4b787a4e5e.1595935071.git.lukas@wunner.de
      
      
      Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Reported-by: Dongdong Liu <liudongdong3@huawei.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
  18. Mar 31, 2020
      PCI: pciehp: Fix MSI interrupt race · 8edf5332
      Stuart Hayes authored
      Without this commit, a PCIe hotplug port can stop generating interrupts on
      hotplug events, so device adds and removals will not be seen:
      
      The pciehp interrupt handler pciehp_isr() reads the Slot Status register
      and then writes back to it to clear the bits that caused the interrupt.  If
      a different interrupt event bit gets set between the read and the write,
      pciehp_isr() returns without having cleared all of the interrupt event
      bits.  If this happens when the MSI isn't masked (which by default it isn't
      in handle_edge_irq(), and which it will never be when MSI per-vector
      masking is not supported), we won't get any more hotplug interrupts from
      that device.
      
      That is expected behavior, according to the PCIe Base Spec r5.0, section
      6.7.3.4, "Software Notification of Hot-Plug Events".
      
      Because the Presence Detect Changed and Data Link Layer State Changed event
      bits can both get set at nearly the same time when a device is added or
      removed, this is more likely to happen than it might seem.  The issue was
      found (and can be reproduced rather easily) by connecting and disconnecting
      an NVMe storage device on at least one system model where the NVMe devices
      were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
      
      Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
      Status register immediately after writing to it, until it sees that all of
      the event status bits have been cleared.
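      The loop can be modeled in user space with a simulated write-1-to-clear register; everything here (the event mask macro, the injected race, the function names) is illustrative, assuming the Slot Status bit values from pci_regs.h:

```c
#include <assert.h>
#include <stdint.h>

#define EVENT_BITS 0x011F  /* hot-plug event status bits (Slot Status) */
#define DLLSC      0x0100  /* Data Link Layer State Changed */
#define PDC        0x0008  /* Presence Detect Changed */

static uint16_t slot_status;   /* simulated Slot Status register (RW1C) */
static int race_pending = 1;   /* one event will arrive mid-handling */

static uint16_t reg_read(void) { return slot_status; }

static void reg_write(uint16_t v)
{
    slot_status &= ~v;         /* write-1-to-clear semantics */
    if (race_pending) {        /* simulate DLLSC being set between the
                                  handler's read and its write-back */
        slot_status |= DLLSC;
        race_pending = 0;
    }
}

/* Hypothetical model of the fixed handler: after clearing the events
 * just read, re-read and loop until no event bit remains, so an event
 * landing between read and write cannot be silently lost. */
static uint16_t isr_collect_events(void)
{
    uint16_t events, all = 0;

    while ((events = reg_read() & EVENT_BITS)) {
        reg_write(events);
        all |= events;
    }
    return all;
}
```

      Without the loop, the injected DLLSC bit would stay set with no fresh edge-triggered MSI to announce it.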
      
      [lukas: drop loop count limitation, write "events" instead of "status",
      don't loop back in INTx and poll modes, tweak code comment & commit msg]
      Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
      
      
      Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Joerg Roedel <jroedel@suse.de>
      PCI: pciehp: Fix indefinite wait on sysfs requests · 3e487d2e
      Lukas Wunner authored
      David Hoyer reports that powering pciehp slots up or down via sysfs may
      hang:  The call to wait_event() in pciehp_sysfs_enable_slot() and
      _disable_slot() does not return because ctrl->ist_running remains true.
      
      This flag, which was introduced by commit 157c1062 ("PCI: pciehp: Avoid
      returning prematurely from sysfs requests"), signifies that the IRQ thread
      pciehp_ist() is running.  It is set to true at the top of pciehp_ist() and
      reset to false at the end.  However there are two additional return
      statements in pciehp_ist() before which the commit neglected to reset the
      flag to false and wake up waiters for the flag.
      
      That omission opens up the following race when powering up the slot:
      
      * pciehp_ist() runs because a PCI_EXP_SLTSTA_PDC event was requested
        by pciehp_sysfs_enable_slot()
      
      * pciehp_ist() turns on slot power via the following call stack:
        pciehp_handle_presence_or_link_change() -> pciehp_enable_slot() ->
        __pciehp_enable_slot() -> board_added() -> pciehp_power_on_slot()
      
      * after slot power is turned on, the link comes up, resulting in a
        PCI_EXP_SLTSTA_DLLSC event
      
      * the IRQ handler pciehp_isr() stores the event in ctrl->pending_events
        and returns IRQ_WAKE_THREAD
      
      * the IRQ thread is already woken (it's bringing up the slot), but the
        genirq code remembers to re-run the IRQ thread after it has finished
        (such that it can deal with the new event) by setting IRQTF_RUNTHREAD
        via __handle_irq_event_percpu() -> __irq_wake_thread()
      
      * the IRQ thread removes PCI_EXP_SLTSTA_DLLSC from ctrl->pending_events
        via board_added() -> pciehp_check_link_status() in order to deal with
        presence and link flaps per commit 6c35a1ac ("PCI: pciehp:
        Tolerate initially unstable link")
      
      * after pciehp_ist() has successfully brought up the slot, it resets
        ctrl->ist_running to false and wakes up the sysfs requester
      
      * the genirq code re-runs pciehp_ist(), which sets ctrl->ist_running
        to true but then returns with IRQ_NONE because ctrl->pending_events
        is empty
      
      * pciehp_sysfs_enable_slot() is finally woken but notices that
        ctrl->ist_running is true, hence continues waiting
      
      The only way to get the hung task going again is to trigger a hotplug
      event which brings down the slot, e.g. by yanking out the card.
      
      The same race exists when powering down the slot because remove_board()
      likewise clears link or presence changes in ctrl->pending_events per commit
      3943af9d ("PCI: pciehp: Ignore Link State Changes after powering off a
      slot") and thereby may cause a re-run of pciehp_ist() which returns with
      IRQ_NONE without resetting ctrl->ist_running to false.
      
      Fix by adding a goto label before the teardown steps at the end of
      pciehp_ist() and jumping to that label from the two return statements which
      currently neglect to reset the ctrl->ist_running flag.
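      The control-flow change boils down to the classic goto-out teardown pattern.  A hedged user-space model (struct members and names are illustrative; wakeups stands in for waking the sysfs requester):

```c
#include <assert.h>
#include <stdbool.h>

enum irqreturn { IRQ_NONE, IRQ_HANDLED };

struct ctrl {
    bool ist_running;
    int wakeups;    /* stands in for wake_up() on the sysfs waiter */
};

/* Hypothetical model of the fixed IRQ thread: the early "nothing to do"
 * exit now funnels through a common label, so ist_running is always
 * cleared and the waiter always woken, on every path out. */
static enum irqreturn ist_model(struct ctrl *ctrl, unsigned int pending_events)
{
    enum irqreturn ret = IRQ_HANDLED;

    ctrl->ist_running = true;
    if (!pending_events) {
        ret = IRQ_NONE;
        goto out;   /* previously: plain return, leaving ist_running set */
    }
    /* ... handle presence/link events ... */
out:
    ctrl->ist_running = false;
    ctrl->wakeups++;
    return ret;
}
```

      With the direct return, a re-run that finds pending_events empty would leave ist_running true forever, which is exactly the hang described above.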
      
      Fixes: 157c1062 ("PCI: pciehp: Avoid returning prematurely from sysfs requests")
      Link: https://lore.kernel.org/r/cca1effa488065cb055120aa01b65719094bdcb5.1584530321.git.lukas@wunner.de
      
      
      Reported-by: David Hoyer <David.Hoyer@netapp.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Cc: stable@vger.kernel.org	# v4.19+
  19. Feb 20, 2020
  20. Nov 12, 2019
      PCI: pciehp: Prevent deadlock on disconnect · 87d0f2a5
      Mika Westerberg authored
      This addresses deadlocks in these common cases in hierarchies containing
      two switches:
      
        - All involved ports are runtime suspended and they are unplugged. This
          can happen easily if the drivers involved automatically enable runtime
          PM (xHCI for example does that).
      
        - System is suspended (e.g., closing the lid on a laptop) with a dock +
          something else connected, and the dock is unplugged while suspended.
      
      These cases lead to the following deadlock:
      
        INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
        irq/126-pciehp  D    0   198      2 0x80000000
        Call Trace:
         schedule+0x2c/0x80
         schedule_timeout+0x246/0x350
         wait_for_completion+0xb7/0x140
         kthread_stop+0x49/0x110
         free_irq+0x32/0x70
         pcie_shutdown_notification+0x2f/0x50
         pciehp_remove+0x27/0x50
         pcie_port_remove_service+0x36/0x50
         device_release_driver+0x12/0x20
         bus_remove_device+0xec/0x160
         device_del+0x13b/0x350
         device_unregister+0x1a/0x60
         remove_iter+0x1e/0x30
         device_for_each_child+0x56/0x90
         pcie_port_device_remove+0x22/0x40
         pcie_portdrv_remove+0x20/0x60
         pci_device_remove+0x3e/0xc0
         device_release_driver_internal+0x18c/0x250
         device_release_driver+0x12/0x20
         pci_stop_bus_device+0x6f/0x90
         pci_stop_bus_device+0x31/0x90
         pci_stop_and_remove_bus_device+0x12/0x20
         pciehp_unconfigure_device+0x88/0x140
         pciehp_disable_slot+0x6a/0x110
         pciehp_handle_presence_or_link_change+0x263/0x400
         pciehp_ist+0x1c9/0x1d0
         irq_thread_fn+0x24/0x60
         irq_thread+0xeb/0x190
         kthread+0x120/0x140
      
        INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
        irq/190-pciehp  D    0  2288      2 0x80000000
        Call Trace:
         __schedule+0x2a2/0x880
         schedule+0x2c/0x80
         schedule_preempt_disabled+0xe/0x10
         mutex_lock+0x2c/0x30
         pci_lock_rescan_remove+0x15/0x20
         pciehp_unconfigure_device+0x4d/0x140
         pciehp_disable_slot+0x6a/0x110
         pciehp_handle_presence_or_link_change+0x263/0x400
         pciehp_ist+0x1c9/0x1d0
         irq_thread_fn+0x24/0x60
         irq_thread+0xeb/0x190
         kthread+0x120/0x140
      
      What happens here is that the whole hierarchy is runtime resumed and the
      parent PCIe downstream port, which got the hot-remove event, starts
      removing devices below it, taking pci_lock_rescan_remove() lock. When the
      child PCIe port is runtime resumed it calls pciehp_check_presence() which
      ends up calling pciehp_card_present() and pciehp_check_link_active().  Both
      of these use pcie_capability_read_word(), which notices that the underlying
      device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND with the
      capability value set to 0. When pciehp gets this value it thinks that its
      child device is also hot-removed and schedules its IRQ thread to handle the
      event.
      
      The deadlock happens when the child's IRQ thread runs and tries to acquire
      pci_lock_rescan_remove() which is already taken by the parent and the
      parent waits for the child's IRQ thread to finish.
      
      Prevent this from happening by checking the return value of
      pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
      performing any hot-removal activities.
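      A minimal sketch of the check, with the config read stubbed out (the helper names are illustrative; the PCIBIOS_* and PCI_EXP_SLTSTA_PDS values match the kernel headers):

```c
#include <assert.h>
#include <stdint.h>

#define PCIBIOS_SUCCESSFUL       0x00
#define PCIBIOS_DEVICE_NOT_FOUND 0x86   /* as in include/linux/pci.h */
#define PCI_EXP_SLTSTA_PDS       0x0040 /* Presence Detect State */
#define ENODEV_ERR               (-19)  /* -ENODEV */

/* Stubbed capability read: a removed bridge reads back all zeroes and
 * an error code, which must not be mistaken for "slot empty". */
static int config_read(int bridge_present, uint16_t *val)
{
    if (!bridge_present) {
        *val = 0;
        return PCIBIOS_DEVICE_NOT_FOUND;
    }
    *val = PCI_EXP_SLTSTA_PDS;  /* pretend a card is present */
    return PCIBIOS_SUCCESSFUL;
}

/* Hypothetical model of the fix: propagate -ENODEV so the caller skips
 * hot-removal instead of contending for pci_lock_rescan_remove(). */
static int card_present(int bridge_present)
{
    uint16_t slot_status;

    if (config_read(bridge_present, &slot_status) != PCIBIOS_SUCCESSFUL)
        return ENODEV_ERR;
    return !!(slot_status & PCI_EXP_SLTSTA_PDS);
}
```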
      
      [bhelgaas: add common scenarios to commit log]
      Link: https://lore.kernel.org/r/20191029170022.57528-2-mika.westerberg@linux.intel.com
      
      
      Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
  21. Nov 11, 2019
  22. Oct 04, 2019
  23. Sep 05, 2019
  24. May 09, 2019
  25. Feb 15, 2019
      PCI: pciehp: Disable Data Link Layer State Changed event on suspend · bbe54ea5
      Mika Westerberg authored
      Commit 0e157e52 ("PCI/PME: Implement runtime PM callbacks") tried to
      solve an issue where the hierarchy immediately wakes up when it is
      transitioned into D3cold.  However, it turns out to prevent PME
      propagation on some systems that do not support D3cold.
      
      I looked more closely at what might cause the immediate wakeup.  It happens
      when the ACPI power resource of the root port is turned off.  The AML code
      associated with the _OFF() method of the ACPI power resource starts a PCIe
      L2/L3 Ready transition and waits for it to complete.  Right after the L2/L3
      Ready transition is started the root port receives a PME from the
      downstream port.
      
      The simplest hierarchy where this happens looks like this:
      
        00:1d.0 PCIe Root Port
          ^
          |
          v
          05:00.0 PCIe switch #1 upstream port
            06:01.0 PCIe switch #1 downstream hotplug port
              ^
              |
              v
              08:00.0 PCIe switch #2 upstream port
      
      It seems that the PCIe link between the two switches, before
      PME_Turn_Off/PME_TO_Ack is complete for the whole hierarchy, goes
      inactive and triggers PME towards the root port bringing it back to D0.
      The L2/L3 Ready sequence is described in PCIe r4.0 spec sections 5.2 and
      5.3.3 but unfortunately they do not state what happens if DLLSCE is
      enabled during the sequence.
      
      Disabling Data Link Layer State Changed event (DLLSCE) seems to prevent
      the issue and still allows the downstream hotplug port to notice when a
      device is plugged/unplugged.
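      Masking only DLLSCE across the power transition, while leaving Presence Detect Changed armed, can be sketched with the Slot Control bit values from pci_regs.h (the suspend/resume helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_SLTCTL_DLLSCE 0x1000 /* Data Link Layer State Changed Enable */
#define PCI_EXP_SLTCTL_PDCE   0x0008 /* Presence Detect Changed Enable */

/* Simulated Slot Control register with both events enabled at runtime. */
static uint16_t slot_ctrl = PCI_EXP_SLTCTL_DLLSCE | PCI_EXP_SLTCTL_PDCE;

/* Hypothetical model of the pairing: clear only DLLSCE going into
 * suspend so the L2/L3 Ready transition can't fire the event, and
 * restore it on resume.  PDCE stays set, so a card inserted or removed
 * while suspended is still noticed. */
static void hp_suspend(void) { slot_ctrl &= ~PCI_EXP_SLTCTL_DLLSCE; }
static void hp_resume(void)  { slot_ctrl |=  PCI_EXP_SLTCTL_DLLSCE; }
```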
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202593
      Fixes: 0e157e52 ("PCI/PME: Implement runtime PM callbacks")
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: stable@vger.kernel.org	# v4.20+
  26. Feb 01, 2019
  27. Jan 14, 2019
  28. Oct 02, 2018
      PCI: pciehp: Do not handle events if interrupts are masked · 720d6a67
      Mika Westerberg authored
      
      PCIe native hotplug shares an MSI vector with native PME, so the
      interrupt handler may be called even when the hotplug interrupt is
      masked.  In that case we should not handle any events because the
      interrupt was not meant for us.
      
      Modify the PCIe hotplug interrupt handler to check this and bail out
      if the interrupt was not about hotplug.
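      A hedged sketch of the bail-out, keyed on the Hot-Plug Interrupt Enable bit (value from pci_regs.h; the handler name and parameters are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_EXP_SLTCTL_HPIE 0x0020 /* Hot-Plug Interrupt Enable */

enum irqreturn { IRQ_NONE, IRQ_HANDLED };

/* Hypothetical model of the check: since the MSI vector is shared with
 * PME, the handler may run while hotplug interrupts are masked.  Then
 * the interrupt was not for us: report IRQ_NONE and touch nothing. */
static enum irqreturn hp_isr_model(uint16_t slot_ctrl, uint16_t events)
{
    if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE))
        return IRQ_NONE;        /* masked: leave events for the owner */
    return events ? IRQ_HANDLED : IRQ_NONE;
}
```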
      
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      PCI: pciehp: Disable hotplug interrupt during suspend · eb34da60
      Mika Westerberg authored
      
      When a PCIe hotplug port is transitioned into D3hot, the link to the
      downstream component goes down.  If hotplug interrupt generation is
      enabled when that happens, the interrupt triggers immediately, waking
      up the system and bringing the link back up.
      
      To prevent this, disable hotplug interrupt generation when system suspend
      is entered. This does not prevent wakeup from low power states according
      to PCIe 4.0 spec section 6.7.3.4:
      
        Software enables a hot-plug event to generate a wakeup event by
        enabling software notification of the event as described in Section
        6.7.3.1. Note that in order for software to disable interrupt generation
        while keeping wakeup generation enabled, the Hot-Plug Interrupt Enable
        bit must be cleared.
      
      So as long as we have set the slot event mask accordingly, wakeup
      should work even if the slot interrupt is disabled.  The port should
      trigger wake and then send a PME to the root port when the PCIe
      hierarchy is brought back up.
      
      Limit this to systems using native PME mechanism to make sure older Apple
      systems depending on commit e3354628c376 ("PCI: pciehp: Support interrupts
      sent from D3hot") still continue working.
      
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      PCI: Make link active reporting detection generic · f0157160
      Keith Busch authored
      
      The spec has timing requirements when waiting for a link to become active
      after a conventional reset.  Implement those hard delays when waiting for
      an active link so pciehp and dpc drivers don't need to duplicate this.
      
      For devices that don't support data link layer active reporting, wait the
      fixed time recommended by the PCIe spec.
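      The two paths can be sketched as one helper; the timeout constants, parameter names, and callback shape are illustrative assumptions, not the kernel's actual interface:

```c
#include <assert.h>
#include <stdbool.h>

#define LINK_WAIT_TIMEOUT_MS 1000  /* hypothetical polling cap */
#define LINK_POLL_STEP_MS      10

/* Hypothetical model of the generic wait: poll the Data Link Layer Link
 * Active bit when the port reports it; otherwise there is nothing to
 * poll, so wait the fixed delay the spec recommends and assume up. */
static bool wait_for_link(bool active_reporting,
                          bool (*link_active)(int elapsed_ms))
{
    if (!active_reporting)
        return true;    /* a fixed msleep()-style delay would go here */

    for (int ms = 0; ms <= LINK_WAIT_TIMEOUT_MS; ms += LINK_POLL_STEP_MS)
        if (link_active(ms))
            return true;
    return false;
}

/* Test doubles standing in for a hardware status read. */
static bool up_after_200ms(int elapsed_ms) { return elapsed_ms >= 200; }
static bool never_up(int elapsed_ms) { (void)elapsed_ms; return false; }
```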
      
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      [bhelgaas: changelog]
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Sinan Kaya <okaya@kernel.org>
  29. Sep 18, 2018
      PCI: hotplug: Embed hotplug_slot · 125450f8
      Lukas Wunner authored
      When the PCI hotplug core and its first user, cpqphp, were introduced in
      February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
      struct for its internal use plus a hotplug_slot struct to be registered
      with the hotplug core and linked the two with pointers:
      https://git.kernel.org/tglx/history/c/a8a2069f432c
      
      Nowadays, the predominant pattern in the tree is to embed ("subclass")
      such structures in one another and cast to the containing struct with
      container_of().  But it wasn't until July 2002 that container_of() was
      introduced with historic commit ec4f214232cf:
      https://git.kernel.org/tglx/history/c/ec4f214232cf
      
      
      
      pnv_php, introduced in 2016, did the right thing and embedded struct
      hotplug_slot in its internal struct pnv_php_slot, but all other drivers
      cargo-culted cpqphp's design and linked separate structs with pointers.
      
      Embedding structs is preferable to linking them with pointers because
      it requires fewer allocations, thereby reducing overhead and simplifying
      error paths.  Casting an embedded struct to the containing struct
      becomes a cheap subtraction rather than a dereference.  And having fewer
      pointers reduces the risk of them pointing nowhere either accidentally
      or due to an attack.
      
      Convert all drivers to embed struct hotplug_slot in their internal slot
      struct.  The "private" pointer in struct hotplug_slot thereby becomes
      unused, so drop it.
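      The pattern in miniature, with the kernel's container_of() simplified and drv_slot standing in as a hypothetical driver-private struct:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified form of the kernel macro: recover the containing struct
 * by subtracting the member's offset from the member's address. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct hotplug_slot { int number; };

struct drv_slot {
    int state;
    struct hotplug_slot hs;  /* embedded, not linked by pointer */
};

/* One allocation covers both structs; the conversion is a constant
 * subtraction rather than a pointer dereference. */
static struct drv_slot *to_drv_slot(struct hotplug_slot *hs)
{
    return container_of(hs, struct drv_slot, hs);
}
```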
      
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>  # drivers/pci/hotplug/rpa*
      Acked-by: Sebastian Ott <sebott@linux.ibm.com>        # drivers/pci/hotplug/s390*
      Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
      Cc: Len Brown <lenb@kernel.org>
      Cc: Scott Murray <scott@spiteful.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
      Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Corentin Chary <corentin.chary@gmail.com>
      Cc: Darren Hart <dvhart@infradead.org>