  1. Jun 12, 2024
  2. Jun 05, 2024
  3. May 24, 2024
    • arm64: patching: fix handling of execmem addresses · b1480ed2
      Will Deacon authored
      Klara Modin reported warnings for a kernel configured with BPF_JIT but
      without MODULES:
      
      [   44.131296] Trying to vfree() bad address (000000004a17c299)
      [   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a0082 #25
      [   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
      [   44.164433] Workqueue: events bpf_prog_free_deferred
      [   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.188772] sp : ffff800082a13c70
      [   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
      [   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
      [   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
      [   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
      [   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
      [   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
      [   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
      [   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
      [   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
      [   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
      [   44.275703] Call trace:
      [   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.283858] vfree (mm/vmalloc.c:3322)
      [   44.287835] execmem_free (mm/execmem.c:70)
      [   44.292347] bpf_jit_free_exec+0x10/0x1c
      [   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
      [   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
      [   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
      [   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
      [   44.317785] process_one_work (kernel/workqueue.c:3273)
      [   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
      [   44.327292] kthread (kernel/kthread.c:388)
      [   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)
      
      The problem is that bpf_arch_text_copy() silently fails to write to the
      read-only area: patch_map() faults and the resulting -EFAULT is
      discarded.
      
      Update patch_map() to use CONFIG_EXECMEM instead of
      CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
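
      A minimal sketch of the shape of the fix, assuming the surrounding code
      in arch/arm64/kernel/patching.c (see the commit itself for the exact
      diff):

        static void __kprobes *patch_map(void *addr, int fixmap)
        {
                unsigned long uintaddr = (uintptr_t) addr;
                bool image = core_kernel_text(uintaddr);
                struct page *page;

                if (image)
                        page = phys_to_page(__pa_symbol(addr));
                else if (IS_ENABLED(CONFIG_EXECMEM))  /* was: CONFIG_STRICT_MODULE_RWX */
                        page = vmalloc_to_page(addr);
                else
                        return addr;

                BUG_ON(!page);
                return (void *)set_fixmap_offset(fixmap, page_to_phys(page) +
                                (uintaddr & ~PAGE_MASK));
        }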
      
      Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
      Fixes: 2c9e5d4a ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reported-by: Klara Modin <klarasmodin@gmail.com>
      Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
      Tested-by: Klara Modin <klarasmodin@gmail.com>
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. May 22, 2024
    • tracing/treewide: Remove second parameter of __assign_str() · 2c92ca84
      Steven Rostedt (Google) authored
      With the rework of how __string() handles dynamic strings, where it
      saves off the source string in a field in the helper structure[1], the
      assignment of that value to the trace event field is stored in the helper
      value and does not need to be passed in again.
      
      This means that with:
      
        __string(field, mystring)
      
      which used to be assigned with __assign_str(field, mystring), the second
      parameter is no longer needed and is unused. With this, __assign_str()
      now takes only a single parameter.
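
      As an illustration (a hypothetical event, not one from the tree), a
      typical TRACE_EVENT() before and after the conversion:

        TRACE_EVENT(sample_event,
                TP_PROTO(const char *name),
                TP_ARGS(name),
                TP_STRUCT__entry(
                        __string(name, name)
                ),
                TP_fast_assign(
                        __assign_str(name);     /* was: __assign_str(name, name); */
                ),
                TP_printk("name=%s", __get_str(name))
        );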
      
      There are over 700 users of __assign_str(), and because coccinelle does
      not handle the TRACE_EVENT() macro, I ended up using the following sed
      script:
      
        git grep -l __assign_str | while read a ; do
            sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
            mv /tmp/test-file $a;
        done
      
      I then searched for __assign_str() calls that did not end with ';', as
      those were multi-line assignments that the sed script above would fail
      to catch.
      
      Note, the...
    • arm64/fpsimd: Avoid erroneous elide of user state reload · e92bee9f
      Ard Biesheuvel authored
      TIF_FOREIGN_FPSTATE is a 'convenience' flag that should reflect whether
      the current CPU holds the most recent user mode FP/SIMD state of the
      current task. It combines two conditions:
      - whether the current CPU's FP/SIMD state belongs to the task;
      - whether that state is the most recent associated with the task (as a
        task may have executed on other CPUs as well).
      
      When a task is scheduled in and TIF_KERNEL_FPSTATE is set, it means the
      task was in a kernel mode NEON section when it was scheduled out, and so
      the kernel mode FP/SIMD state is restored. Since this implies that the
      current CPU is *not* holding the most recent user mode FP/SIMD state of
      the current task, the TIF_FOREIGN_FPSTATE flag is set too, so that the
      user mode FP/SIMD state is reloaded from memory when returning to
      userland.
      
      However, the task may be scheduled out after completing the kernel mode
      NEON section, but before returning to userland. When this happens, the
      TIF_FOREIGN_FPSTATE flag will not be preserved, but will be set as usual
      the next time the task is scheduled in, and will be based on the above
      conditions.
      
      This means that, rather than setting TIF_FOREIGN_FPSTATE when scheduling
      in a task with TIF_KERNEL_FPSTATE set, the underlying state should be
      updated so that TIF_FOREIGN_FPSTATE will assume the expected value as a
      result.
      
      So instead, call fpsimd_flush_cpu_state(), which takes care of this.
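
      A hedged sketch of the shape of the fix, assuming the helper names used
      by arch/arm64/kernel/fpsimd.c:

        static void fpsimd_thread_switch(struct task_struct *next)
        {
                /* ... save the outgoing task's FP/SIMD state ... */

                if (test_tsk_thread_flag(next, TIF_KERNEL_FPSTATE)) {
                        fpsimd_load_kernel_state(next);
                        /*
                         * Invalidate this CPU's record of which task's user
                         * state it holds, so that TIF_FOREIGN_FPSTATE is
                         * derived correctly on every reschedule, rather than
                         * setting the flag directly here, where it is lost
                         * if the task is scheduled out again before
                         * returning to userland.
                         */
                        fpsimd_flush_cpu_state();
                } else {
                        /* ... usual user-state bookkeeping ... */
                }
        }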
      
      Closes: https://lore.kernel.org/all/cb8822182231850108fa43e0446a4c7f@kernel.org
      Reported-by: Johannes Nixdorf <mixi@shadowice.org>
      Fixes: aefbab8e ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch")
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: Janne Grunau <j@jannau.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Janne Grunau <j@jannau.net>
      Tested-by: Johannes Nixdorf <mixi@shadowice.org>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240522091335.335346-2-ardb+git@google.com
      Signed-off-by: Will Deacon <will@kernel.org>
    • Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD" · f481bb32
      Will Deacon authored
      This reverts commit b8995a18.
      
      Ard managed to reproduce the dm-crypt corruption problem and got to the
      bottom of it, so re-apply the problematic patch in preparation for
      fixing things properly.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
  5. May 17, 2024
  6. May 14, 2024
  7. May 10, 2024
  8. May 09, 2024
    • kbuild: use $(src) instead of $(srctree)/$(src) for source directory · b1992c37
      Masahiro Yamada authored
      Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
      checked-in source files. It is merely a convention without any functional
      difference. In fact, $(obj) and $(src) are exactly the same, as defined
      in scripts/Makefile.build:
      
          src := $(obj)
      
      When the kernel is built in a separate output directory, $(src) does
      not accurately reflect the source directory location. While Kbuild
      resolves this discrepancy by specifying VPATH=$(srctree) to search for
      source files, it does not cover all cases. For example, when adding a
      header search path for local headers, -I$(srctree)/$(src) is typically
      passed to the compiler.
      
      This introduces inconsistency between upstream and downstream Makefiles
      because $(src) is used instead of $(srctree)/$(src) for the latter.
      
      To address this inconsistency, this commit changes the semantics of
      $(src) so that it always points to the directory in the source tree.
      
      Going forward, the variables used ...
  9. May 08, 2024
  10. May 03, 2024
  11. May 02, 2024
  12. Apr 28, 2024
    • arm64: defer clearing DAIF.D · 080297be
      Mark Rutland authored
      For historical reasons we unmask debug exceptions in __cpu_setup(), but
      it's not necessary to unmask debug exceptions this early in the
      boot/idle entry paths. It would be better to unmask debug exceptions
      later in C code as this simplifies the current code and will make it
      easier to rework exception masking logic to handle non-DAIF bits in
      future (e.g. PSTATE.{ALLINT,PM}).
      
      We started clearing DAIF.D in __cpu_setup() in commit:
      
        2ce39ad1 ("arm64: debug: unmask PSTATE.D earlier")
      
      At the time, we needed to ensure that DAIF.D was clear on the primary
      CPU before scheduling and preemption were possible, and chose to do this
      in __cpu_setup() so that this occurred in the same place for primary and
      secondary CPUs. As we cannot handle debug exceptions this early, we
      placed an ISB between initializing MDSCR_EL1 and clearing DAIF.D so that
      no exceptions should be triggered.
      
      Subsequently we rewrote the return-from-{idle,suspend} paths to use
      __cpu_setup() in commit:
      
        cabe1c81 ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va")
      
      ... which allowed for earlier use of the MMU and had the desirable
      property of using the same code to reset the CPU in the cold and warm
      boot paths. This introduced a bug: DAIF.D was clear while
      cpu_do_resume() restored MDSCR_EL1 and other control registers (e.g.
      breakpoint/watchpoint control/value registers), and so we could
      unexpectedly take debug exceptions.
      
      We fixed that in commit:
      
        744c6c37 ("arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1")
      
      ... by having cpu_do_resume() use the `disable_dbg` macro to set DAIF.D
      before restoring MDSCR_EL1 and other control registers. This relies on
      DAIF.D being subsequently cleared again in cpu_resume().
      
      Subsequently we reworked DAIF masking in commit:
      
        0fbeb318
      
       ("arm64: explicitly mask all exceptions")
      
      ... where we began enforcing a policy that DAIF.D being set implies all
      other DAIF bits are set, and so e.g. we cannot take an IRQ while DAIF.D
      is set. As part of this the use of `disable_dbg` in cpu_resume() was
      replaced with `disable_daif` for consistency with the rest of the
      kernel.
      
      These days, there's no need to clear DAIF.D early within __cpu_setup():
      
      * setup_arch() clears DAIF.DA before scheduling and preemption are
        possible on the primary CPU, avoiding the problem we were originally
        trying to work around.

        Note: DAIF.IF are cleared later, when interrupts are enabled for the
        first time.
      
      * secondary_start_kernel() clears all DAIF bits before scheduling and
        preemption are possible on secondary CPUs.
      
        Note: with pseudo-NMI, the PMR is initialized here before any DAIF
        bits are cleared. Similar will be necessary for the architectural NMI.
      
      * cpu_suspend() restores all DAIF bits when returning from idle,
        ensuring that we don't unexpectedly leave DAIF.D clear or set.
      
        Note: with pseudo-NMI, the PMR is initialized here before DAIF is
        cleared. Similar will be necessary for the architectural NMI.
      
      This patch removes the unmasking of debug exceptions from __cpu_setup(),
      relying on the above locations to initialize DAIF. This allows some
      other cleanups:
      
      * It is no longer necessary for cpu_resume() to explicitly mask debug
        (or other) exceptions, as it is always called with all DAIF bits set.
        Thus we drop the use of `disable_daif`.
      
      * The `enable_dbg` macro is no longer used, and so is dropped.
      
      * It is no longer necessary to have an ISB immediately after
        initializing MDSCR_EL1 in __cpu_setup(), and we can revert to relying
        on the context synchronization that occurs when the MMU is enabled
        between __cpu_setup() and the code which clears DAIF.D.
      
      Comments are added to setup_arch() and secondary_start_kernel() to
      explain the initial unmasking of the DAIF bits.
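
      For illustration, a hedged sketch of the early unmasking on the primary
      CPU, assuming the local_daif_restore() helper from
      arch/arm64/include/asm/daifflags.h:

        void __init setup_arch(char **cmdline_p)
        {
                /* ... early setup runs with all DAIF bits set ... */

                /*
                 * Unmask Debug and SError (DAIF.DA); IRQ and FIQ (DAIF.IF)
                 * stay masked until interrupts are enabled for the first
                 * time.
                 */
                local_daif_restore(DAIF_PROCCTX_NOIRQ);

                /* ... scheduling and preemption become possible later ... */
        }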
      
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Link: https://lore.kernel.org/r/20240422113523.4070414-3-mark.rutland@arm.com
      Signed-off-by: Will Deacon <will@kernel.org>
  13. Apr 25, 2024
    • fix missing vmalloc.h includes · 0069455b
      Kent Overstreet authored
      Patch series "Memory allocation profiling", v6.
      
      Overview:
      Low overhead [1] per-callsite memory allocation profiling. Not just for
      debug kernels, overhead low enough to be deployed in production.
      
      Example output:
        root@moria-kvm:~# sort -rn /proc/allocinfo
         127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
          56373248     4737 mm/slub.c:2259 func:alloc_slab_page
          14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
          14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
          13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
          11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
           9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
           4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
           4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
           3940352      962 mm/memory.c:4214 func:alloc_anon_folio
           2894464    22613 fs/kernfs/dir.c:6...
  14. Apr 22, 2024
    • arm64/io: Provide a WC friendly __iowriteXX_copy() · ead79118
      Jason Gunthorpe authored
      The kernel provides driver support for using write combining IO memory
      through the __iowriteXX_copy() API which is commonly used as an optional
      optimization to generate 16/32/64 byte MemWr TLPs in a PCIe environment.
      
      iomap_copy.c provides a generic implementation as a simple 4/8 byte at a
      time copy loop that has worked well with past ARM64 CPUs, giving a high
      frequency of large TLPs being successfully formed.
      
      However, modern ARM64 CPUs are quite sensitive to how the write-combining
      CPU HW is operated, and a compiler-generated loop with intermixed loads
      and stores is not sufficient to frequently generate a large TLP. The CPUs
      would like to see the entire TLP generated by consecutive store
      instructions from registers. Compilers like gcc tend to intermix loads
      and stores and have poor code generation, in part due to the ARM64
      situation that writeq() does not codegen anything other than "[xN]".
      However, even with that resolved, compilers like clang still do not have
      good code generation.
      
      This means on modern ARM64 CPUs the rate at which __iowriteXX_copy()
      successfully generates large TLPs is very small (less than 1 in 10,000
      tries), to the point that the use of WC is pointless.
      
      Implement __iowrite32/64_copy() specifically for ARM64 and use inline
      assembly to build consecutive blocks of STR instructions. Provide direct
      support for 64/32/16 large TLP generation in this manner. Optimize for
      common constant lengths so that the compiler can directly inline the store
      blocks.
      
      This brings the frequency of large TLP generation up to a high level that
      is comparable with older CPU generations.
      
      As the __iowriteXX_copy() family of APIs is intended for use with WC,
      incorporate the DGH hint directly into the functions.
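
      The core idea, as a hedged sketch (the function name, register count and
      offsets are illustrative, not the exact kernel code): a block of
      back-to-back STR instructions followed by the DGH hint:

        /*
         * Copy 32 bytes with consecutive STR instructions so the
         * write-combining buffer sees back-to-back stores, then issue
         * DGH (hint #6) to push the WC buffer out.
         */
        static inline void __iowrite64_copy_32b(void __iomem *to, const u64 *from)
        {
                asm volatile("str %x[v0], [%[p], #0]\n"
                             "str %x[v1], [%[p], #8]\n"
                             "str %x[v2], [%[p], #16]\n"
                             "str %x[v3], [%[p], #24]\n"
                             : : [v0] "r" (from[0]), [v1] "r" (from[1]),
                                 [v2] "r" (from[2]), [v3] "r" (from[3]),
                                 [p] "r" (to)
                             : "memory");
                asm volatile("hint #6" : : : "memory");  /* DGH */
        }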
      
      Link: https://lore.kernel.org/r/4-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  15. Apr 18, 2024
  16. Apr 12, 2024
  17. Apr 03, 2024
  18. Apr 01, 2024
  19. Mar 13, 2024
  20. Mar 07, 2024
  21. Mar 04, 2024
    • arm64: prohibit probing on arch_kunwind_consume_entry() · 2c79bd34
      Puranjay Mohan authored
      Make arch_kunwind_consume_entry() __always_inline, otherwise the
      compiler might not inline it, which would allow probes to be attached
      to it.
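
      The shape of the fix, as a hedged sketch against the kunwind code in
      arch/arm64/kernel/stacktrace.c:

        static __always_inline bool     /* was: static bool */
        arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
        {
                struct kunwind_consume_entry_data *data = cookie;

                return data->consume_entry(data->cookie, state->common.pc);
        }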
      
      Without this, just probing arch_kunwind_consume_entry() via
      <tracefs>/kprobe_events will crash the kernel on arm64.
      
      The crash can be reproduced using the following compiler and kernel
      combination:
      clang version 19.0.0git (https://github.com/llvm/llvm-project.git d68d29516102252f6bf6dc23fb22cef144ca1cb3)
      commit 87adedeb ("Merge tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
      
       [root@localhost ~]# echo 'p arch_kunwind_consume_entry' > /sys/kernel/debug/tracing/kprobe_events
       [root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
      
       Modules linked in: aes_ce_blk aes_ce_cipher ghash_ce sha2_ce virtio_net sha256_arm64 sha1_ce arm_smccc_trng net_failover failover virtio_mmio uio_pdrv_genirq uio sch_fq_codel dm_mod dax configfs
       CPU: 3 PID: 1405 Comm: bash Not tainted 6.8.0-rc6+ #14
       Hardware name: linux,dummy-virt (DT)
       pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
       pc : kprobe_breakpoint_handler+0x17c/0x258
       lr : kprobe_breakpoint_handler+0x17c/0x258
       sp : ffff800085d6ab60
       x29: ffff800085d6ab60 x28: ffff0000066f0040 x27: ffff0000066f0b20
       x26: ffff800081fa7b0c x25: 0000000000000002 x24: ffff00000b29bd18
       x23: ffff00007904c590 x22: ffff800081fa6590 x21: ffff800081fa6588
       x20: ffff00000b29bd18 x19: ffff800085d6ac40 x18: 0000000000000079
       x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000004
       x14: ffff80008277a940 x13: 0000000000000003 x12: 0000000000000003
       x11: 00000000fffeffff x10: c0000000fffeffff x9 : aa95616fdf80cc00
       x8 : aa95616fdf80cc00 x7 : 205d343137373231 x6 : ffff800080fb48ec
       x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
       x2 : 0000000000000000 x1 : ffff800085d6a910 x0 : 0000000000000079
       Call trace:
       kprobes: Failed to recover from reentered kprobes.
       kprobes: Dump kprobe:
       .symbol_name = arch_kunwind_consume_entry, .offset = 0, .addr = arch_kunwind_consume_entry+0x0/0x40
       ------------[ cut here ]------------
       kernel BUG at arch/arm64/kernel/probes/kprobes.c:241!
       kprobes: Failed to recover from reentered kprobes.
       kprobes: Dump kprobe:
       .symbol_name = arch_kunwind_consume_entry, .offset = 0, .addr = arch_kunwind_consume_entry+0x0/0x40
      
      Fixes: 1aba06e7 ("arm64: stacktrace: factor out kunwind_stack_walk()")
      Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Link: https://lore.kernel.org/r/20240229231620.24846-1-puranjay12@gmail.com
      Signed-off-by: Will Deacon <will@kernel.org>
  22. Mar 01, 2024
  23. Feb 28, 2024