  1. Jul 29, 2024
    • Linus Torvalds's avatar
      profiling: remove stale percpu flip buffer variables · 94ede2a3
      Linus Torvalds authored
      For some reason I didn't see this issue on my arm64 or x86-64 builds,
      but Stephen Rothwell reports that commit 2accfdb7 ("profiling:
      attempt to remove per-cpu profile flip buffer") left these static
      variables around, and the powerpc build is unhappy about them:
      
        kernel/profile.c:52:28: warning: 'cpu_profile_flip' defined but not used [-Wunused-variable]
           52 | static DEFINE_PER_CPU(int, cpu_profile_flip);
              |                            ^~~~~~~~~~~~~~~~
        ..
      
      So remove these stale left-over remnants too.
      
       Fixes: 2accfdb7 ("profiling: attempt to remove per-cpu profile flip buffer")
       Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      94ede2a3
    • Linus Torvalds's avatar
      task_work: make TWA_NMI_CURRENT handling conditional on IRQ_WORK · cec6937d
      Linus Torvalds authored
      The TWA_NMI_CURRENT handling very much depends on IRQ_WORK, but that
      isn't universally enabled everywhere.
      
      Maybe the IRQ_WORK infrastructure should just be unconditional - x86
      ends up indirectly enabling it through unconditionally enabling
      PERF_EVENTS, for example.  But it also gets enabled by having SMP
      support, or even if you just have PRINTK enabled.
      
      But in the meantime TWA_NMI_CURRENT causes tons of build failures on
      various odd minimal configs.  Which did show up in linux-next, but
      despite that nobody bothered to fix it or even inform me until -rc1 was
      out.
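       A minimal sketch of the shape of such a guard (illustrative only; the exact
       symbols and placement in kernel/task_work.c are assumptions, not the literal
       patch):
       
       	/* Sketch: only build the irq_work plumbing when IRQ_WORK exists. */
       	#ifdef CONFIG_IRQ_WORK
       	static void task_work_set_notify_irq(struct irq_work *entry)
       	{
       		test_and_set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
       	}
       	static DEFINE_PER_CPU(struct irq_work, irq_work_NMI_resume) =
       		IRQ_WORK_INIT_HARD(task_work_set_notify_irq);
       	#endif /* callers of TWA_NMI_CURRENT then depend on this being built */
       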
      
       Fixes: 466e4d80 ("task_work: Add TWA_NMI_CURRENT as an additional notify mode")
       Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
       Reported-by: kernelci.org bot <bot@kernelci.org>
       Reported-by: Guenter Roeck <linux@roeck-us.net>
       Cc: Peter Zijlstra <peterz@infradead.org>
       Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cec6937d
    • Linus Torvalds's avatar
      profiling: attempt to remove per-cpu profile flip buffer · 2accfdb7
      Linus Torvalds authored
      This is the really old legacy kernel profiling code, which has long
      since been obviated by "real profiling" (ie 'prof' and company), and
      mainly remains as a source of syzbot reports.
      
      There are anecdotal reports that people still use it for boot-time
      profiling, but it's unlikely that such use would care about the old NUMA
      optimizations in this code from 2004 (commit ad02973d42: "profile: 512x
      Altix timer interrupt livelock fix" in the BK import archive at [1])
      
      So in order to head off future syzbot reports, let's try to simplify
      this code and get rid of the per-cpu profile buffers that are quite a
      large portion of the complexity footprint of this thing (including CPU
      hotplug callbacks etc).
      
      It's unlikely anybody will actually notice, or possibly, as Thomas put
      it: "Only people who indulge in nostalgia will notice :)".
      
      That said, if it turns out that this code is actually actively used by
      somebody, we can always revert this removal.  Thus the "attempt" in the
      summary line.
      
      [ Note: in a small nod to "the profiling code can cause NUMA problems",
        this also removes the "increment the last entry in the profiling array
        on any unknown hits" logic. That would account any program counter in
        a module to that single counter location, and might exacerbate any
        NUMA cacheline bouncing issues ]
      
      Link: https://lore.kernel.org/all/CAHk-=wgs52BxT4Zjmjz8aNvHWKxf5_ThBY4bYL1Y6CTaNL2dTw@mail.gmail.com/
       Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git [1]
       Cc: Thomas Gleixner <tglx@linutronix.de>
       Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2accfdb7
    • Tetsuo Handa's avatar
      profiling: remove prof_cpu_mask · 7c51f7bb
      Tetsuo Handa authored
      
      syzbot is reporting uninit-value at profile_hits(), for there is a race
      window between
      
        if (!alloc_cpumask_var(&prof_cpu_mask, GFP_KERNEL))
          return -ENOMEM;
        cpumask_copy(prof_cpu_mask, cpu_possible_mask);
      
      in profile_init() and
      
        cpumask_available(prof_cpu_mask) &&
        cpumask_test_cpu(smp_processor_id(), prof_cpu_mask))
      
       in profile_tick(); prof_cpu_mask remains uninitialized until cpumask_copy()
       completes, while cpumask_available(prof_cpu_mask) returns true as soon as
       alloc_cpumask_var(&prof_cpu_mask) completes.
      
      We could replace alloc_cpumask_var() with zalloc_cpumask_var() and
      call cpumask_copy() from create_proc_profile() on only UP kernels, for
      profile_online_cpu() calls cpumask_set_cpu() as needed via
      cpuhp_setup_state(CPUHP_AP_ONLINE_DYN) on SMP kernels. But this patch
      removes prof_cpu_mask because it seems unnecessary.
      
      The cpumask_test_cpu(smp_processor_id(), prof_cpu_mask) test
      in profile_tick() is likely always true due to
      
        a CPU cannot call profile_tick() if that CPU is offline
      
      and
      
        cpumask_set_cpu(cpu, prof_cpu_mask) is called when that CPU becomes
        online and cpumask_clear_cpu(cpu, prof_cpu_mask) is called when that
        CPU becomes offline
      
       This test could be false during the transition between online and offline.
      
       But according to include/linux/cpuhotplug.h, CPUHP_PROFILE_PREPARE belongs
       to the PREPARE section, which means that the CPU subjected to
       profile_dead_cpu() cannot be inside profile_tick() (i.e. no risk of a
       use-after-free bug) because interrupts for that CPU are disabled during the
       PREPARE section. Therefore, this test is guaranteed to be true and can be
       removed. (Since profile_hits() checks prof_buffer != NULL, we don't need to
       check prof_buffer != NULL here unless get_irq_regs() or user_mode() is so
       slow that we want to avoid calling them when prof_buffer == NULL.)
      
       do_profile_hits() is called from profile_tick() from a timer interrupt
       only if cpumask_test_cpu(smp_processor_id(), prof_cpu_mask) is true and
       prof_buffer is not NULL. But syzbot is also reporting that sometimes
       do_profile_hits() is called while the current thread is still doing vzalloc(),
       where prof_buffer must be NULL at that moment. This indicates that multiple
       threads concurrently tried to write to the /sys/kernel/profiling interface,
       causing one thread to try to re-allocate prof_buffer even though another
       thread had already allocated it. Fix this by serializing the writers.
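       
       A minimal sketch of the kind of serialization meant here, assuming a
       dedicated mutex around the writer path (the lock name and exact placement
       are illustrative, not the literal fix):
       
       	/* Illustrative only: make sure only one writer allocates prof_buffer. */
       	static DEFINE_MUTEX(profile_write_mutex);
       
       	static int profiling_store_locked(int enable)
       	{
       		int err = 0;
       
       		mutex_lock(&profile_write_mutex);
       		if (enable && !prof_buffer)
       			err = profile_init();	/* allocates prof_buffer once */
       		mutex_unlock(&profile_write_mutex);
       		return err;
       	}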
      
       Reported-by: syzbot <syzbot+b1a83ab2a9eb9321fbdd@syzkaller.appspotmail.com>
       Closes: https://syzkaller.appspot.com/bug?extid=b1a83ab2a9eb9321fbdd
       Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
       Tested-by: syzbot <syzbot+b1a83ab2a9eb9321fbdd@syzkaller.appspotmail.com>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7c51f7bb
  2. Jul 28, 2024
    • Linus Torvalds's avatar
      minmax: make generic MIN() and MAX() macros available everywhere · 1a251f52
      Linus Torvalds authored
      
      This just standardizes the use of MIN() and MAX() macros, with the very
      traditional semantics.  The goal is to use these for C constant
      expressions and for top-level / static initializers, and so be able to
      simplify the min()/max() macros.
      
      These macro names were used by various kernel code - they are very
      traditional, after all - and all such users have been fixed up, with a
      few different approaches:
      
       - trivial duplicated macro definitions have been removed
      
         Note that 'trivial' here means that it's obviously kernel code that
         already included all the major kernel headers, and thus gets the new
         generic MIN/MAX macros automatically.
      
       - non-trivial duplicated macro definitions are guarded with #ifndef
      
         This is the "yes, they define their own versions, but no, the include
         situation is not entirely obvious, and maybe they don't get the
         generic version automatically" case.
      
       - strange use case #1
      
         A couple of drivers decided that the way they want to describe their
         versioning is with
      
      	#define MAJ 1
      	#define MIN 2
      	#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)
      
         which adds zero value and I just did my Alexander the Great
         impersonation, and rewrote that pointless Gordian knot as
      
      	#define DRV_VERSION "1.2"
      
         instead.
      
       - strange use case #2
      
         A couple of drivers thought that it's a good idea to have a random
         'MIN' or 'MAX' define for a value or index into a table, rather than
         the traditional macro that takes arguments.
      
         These values were re-written as C enums instead. The new
         function-like macros only expand when followed by an open
         parenthesis, and thus don't clash with enum use.
      
      Happily, there weren't really all that many of these cases, and a lot of
      users already had the pattern of using '#ifndef' guarding (or in one
      case just using '#undef MIN') before defining their own private version
      that does the same thing. I left such cases alone.
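       
       For reference, the "very traditional semantics" amount to something like the
       following sketch (not the exact minmax.h wording); note that a bare enum
       constant named MAX does not collide, because a function-like macro only
       expands when followed by '(':
       
       	/* Sketch: double-evaluating, constant-expression-friendly MIN()/MAX(). */
       	#define MIN(a, b) ((a) < (b) ? (a) : (b))
       	#define MAX(a, b) ((a) > (b) ? (a) : (b))
       
       	static char ring_buf[MAX(64, 128)];	/* usable in static initializers */
       	enum { MAX = 5 };			/* no clash: MAX not followed by '(' */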
      
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1a251f52
  3. Jul 24, 2024
    • Joel Granados's avatar
      sysctl: treewide: constify the ctl_table argument of proc_handlers · 78eb4ea2
      Joel Granados authored
       const-qualify the struct ctl_table argument in the proc_handler function
       signatures. This is a prerequisite to moving the static ctl_table
       structs into .rodata, which will ensure that proc_handler function
       pointers cannot be modified.
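       
       As an illustration of the resulting signature (a hypothetical handler, not
       one taken from the patch):
       
       	/* Hypothetical proc_handler after the treewide constification. */
       	static int example_dointvec(const struct ctl_table *ctl, int write,
       				    void *buffer, size_t *lenp, loff_t *ppos)
       	{
       		/* ctl can be read but not written through this pointer. */
       		return proc_dointvec(ctl, write, buffer, lenp, ppos);
       	}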
      
      This patch has been generated by the following coccinelle script:
      
      ```
        virtual patch
      
        @r1@
        identifier ctl, write, buffer, lenp, ppos;
        identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos);
      
        @r2@
        identifier func, ctl, write, buffer, lenp, ppos;
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos)
        { ... }
      
        @r3@
        identifier func;
        @@
      
        int func(
        - stru...
      78eb4ea2
  4. Jul 22, 2024
    • Anna-Maria Behnsen's avatar
      timers/migration: Spare write when nothing changed · 2367e28e
      Anna-Maria Behnsen authored
      
      The wakeup value is written unconditionally in tmigr_cpu_new_timer(). When
      there was no new next timer expiry that needs to be propagated, then the
      value that was read before is written. This is not required.
      
       Move the write to the place where the wakeup value is actually changed.
      
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-7-757baa7803fe@linutronix.de
      2367e28e
    • Anna-Maria Behnsen's avatar
      timers/migration: Rename childmask by groupmask to make naming more obvious · 835a9a67
      Anna-Maria Behnsen authored
      
       childmask in the group reflects the mask that is required to 'reference'
       this group in the parent. When reading childmask, this might be confusing,
       as it suggests that this is the mask of the child of the group.
       
       Clarify this by renaming childmask in tmigr_group and tmc_group to
       groupmask.
      
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-6-757baa7803fe@linutronix.de
      835a9a67
    • Anna-Maria Behnsen's avatar
      timers/migration: Read childmask and parent pointer in a single place · d47be589
      Anna-Maria Behnsen authored
      
       Reading the childmask and parent pointer is required when propagating
       changes through the hierarchy. At the moment these reads are spread all over
       the place, which makes the code harder to follow.
      
      Move those reads to a single place to keep code clean.
      
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-5-757baa7803fe@linutronix.de
      d47be589
    • Anna-Maria Behnsen's avatar
      timers/migration: Use a single struct for hierarchy walk data · 3ba11103
      Anna-Maria Behnsen authored
      
       Two different structs are defined for propagating data from one level to
       another when walking the hierarchy. Several struct members exist in both
       structs, which makes generalization harder.
      
      Merge those two structs into a single one and use it directly in
      walk_groups() and the corresponding function pointers instead of
      introducing pointer casting all over the place.
      
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-4-757baa7803fe@linutronix.de
      3ba11103
    • Anna-Maria Behnsen's avatar
      timers/migration: Improve tracing · 92506741
      Anna-Maria Behnsen authored
      
       Trace points of inactive and active propagation are located at the end of
       the related functions. The interesting information of those trace points is
       the updated group state. When trace points are not located directly at the
       place where the group state changes, the order of trace points in traces can
       be confusing.
      
      Move inactive and active propagation trace points directly after update of
      group state values.
      
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-3-757baa7803fe@linutronix.de
      92506741
    • Anna-Maria Behnsen's avatar
      timers/migration: Move hierarchy setup into cpuhotplug prepare callback · 10a0e6f3
      Anna-Maria Behnsen authored
       When a CPU comes online for the first time, it is possible that a new top level
       group will be created. In general all propagation is done from the bottom to
       the top. This minimizes complexity and prevents possible races. But when a
       new top level group is created, the formerly top level group needs to be
       connected to the new level. This is the only time when the propagation
       direction is reversed: the changes are propagated from the top (new top
       level group) to the bottom (formerly top level group).
      
      This introduces two races (see (A) and (B)) as reported by Frederic:
      
       (A) This race happens when marking the formerly top level group as active
       while the last active CPU of the formerly top level group goes idle. Then it's
       likely that the formerly top level group is no longer active, but is
       nevertheless marked as active in the new top level group:
      
      		  [GRP0:0]
      	       migrator = 0
      	       active   = 0
      	       nextevt  = KTIME_MAX
      	       /         \
      	      0         1 .. 7
      	  active         idle
      
      0) Hierarchy has for now only 8 CPUs and CPU 0 is the only active CPU.
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = NONE
      			nextevt  = KTIME_MAX
      					 \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = 0              migrator = TMIGR_NONE
      	      active   = 0              active   = NONE
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      		/         \
      	      0          1 .. 7                8
      	  active         idle                !online
      
       1) CPU 8 is booting and creates a new group in the first level, GRP0:1, and
          therefore also a new top group GRP1:0. For now the setup code has proceeded
          only up to the connection between GRP0:1 and the new top group. The
          connection between CPU 8 and GRP0:1 is not yet established and CPU 8 is
          still !online.
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = NONE
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = 0              migrator = TMIGR_NONE
      	      active   = 0              active   = NONE
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      		/         \
      	      0          1 .. 7                8
      	  active         idle                !online
      
      2) Setup code now connects GRP0:0 to GRP1:0 and observes while in
         tmigr_connect_child_parent() that GRP0:0 is not TMIGR_NONE. So it
         prepares to call tmigr_active_up() on it. It hasn't done it yet.
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = NONE
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = TMIGR_NONE        migrator = TMIGR_NONE
      	      active   = NONE              active   = NONE
      	      nextevt  = KTIME_MAX         nextevt  = KTIME_MAX
      		/         \
      	      0          1 .. 7                8
      	    idle         idle                !online
      
      3) CPU 0 goes idle. Since GRP0:0->parent has been updated by CPU 8 with
         GRP0:0->lock held, CPU 0 observes GRP1:0 after calling
         tmigr_update_events() and it propagates the change to the top (no change
         there and no wakeup programmed since there is no timer).
      
      			     [GRP1:0]
      			migrator = GRP0:0
      			active   = GRP0:0
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = TMIGR_NONE       migrator = TMIGR_NONE
      	      active   = NONE             active   = NONE
      	      nextevt  = KTIME_MAX        nextevt  = KTIME_MAX
      		/         \
      	      0          1 .. 7                8
      	    idle         idle                !online
      
       4) Now the setup code finally calls tmigr_active_up() and sets GRP0:0
          active in GRP1:0.
      
      			     [GRP1:0]
      			migrator = GRP0:0
      			active   = GRP0:0, GRP0:1
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = TMIGR_NONE       migrator = 8
      	      active   = NONE             active   = 8
      	      nextevt  = KTIME_MAX        nextevt  = KTIME_MAX
      		/         \                    |
      	      0          1 .. 7                8
      	    idle         idle                active
      
      5) Now CPU 8 is connected with GRP0:1 and CPU 8 calls tmigr_active_up() out
         of tmigr_cpu_online().
      
      			     [GRP1:0]
      			migrator = GRP0:0
      			active   = GRP0:0
      			nextevt  = T8
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = TMIGR_NONE         migrator = TMIGR_NONE
      	      active   = NONE               active   = NONE
      	      nextevt  = KTIME_MAX          nextevt  = T8
      		/         \                    |
      	      0          1 .. 7                8
      	    idle         idle                  idle
      
       6) CPU 8 goes idle with a timer T8 and relies on GRP0:0 as the migrator.
          But it's not really active, so T8 gets ignored.
      
       --> The update done in the third step is not noticed by the setup code. So a
           wrong migrator is set in the top level group and a timer could get
           ignored.
      
       (B) Reading group->parent and group->childmask while a hierarchy update is
       ongoing and reaches the formerly top level group is racy, as those values
       could be inconsistent. (The notation of migrator and active now changes
       slightly in contrast to the above example, as the childmasks are used now.)
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = 0x00
      			nextevt  = KTIME_MAX
      					 \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = TMIGR_NONE     migrator = TMIGR_NONE
      	      active   = 0x00           active   = 0x00
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      	      childmask= 0		childmask= 1
      	      parent   = NULL		parent   = GRP1:0
      		/         \
      	      0          1 .. 7                8
      	  idle           idle                !online
      	  childmask=1
      
      1) Hierarchy has 8 CPUs. CPU 8 is at the moment in the process of onlining
         but did not yet connect GRP0:0 to GRP1:0.
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = 0x00
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = TMIGR_NONE     migrator = TMIGR_NONE
      	      active   = 0x00           active   = 0x00
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      	      childmask= 0		childmask= 1
      	      parent   = GRP1:0		parent   = GRP1:0
      		/         \
      	      0          1 .. 7                8
      	  idle           idle                !online
      	  childmask=1
      
      2) Setup code (running on CPU 8) now connects GRP0:0 to GRP1:0, updates
         parent pointer of GRP0:0 and ...
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = 0x00
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = 0x01           migrator = TMIGR_NONE
      	      active   = 0x01           active   = 0x00
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      	      childmask= 0		childmask= 1
      	      parent   = GRP1:0		parent   = GRP1:0
      		/         \
      	      0          1 .. 7                8
      	  active          idle                !online
      	  childmask=1
      
      	  tmigr_walk.childmask = 0
      
       3) ... CPU 0 becomes active at the same time. As the migrator in GRP0:0 was
          TMIGR_NONE, the childmask of GRP0:0 is stored in the update propagation
          data structure tmigr_walk (as the update of childmask is not yet
          visible/updated). And now ...
      
      			     [GRP1:0]
      			migrator = TMIGR_NONE
      			active   = 0x00
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = 0x01           migrator = TMIGR_NONE
      	      active   = 0x01           active   = 0x00
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      	      childmask= 2		childmask= 1
      	      parent   = GRP1:0		parent   = GRP1:0
      		/         \
      	      0          1 .. 7                8
      	  active          idle                !online
      	  childmask=1
      
      	  tmigr_walk.childmask = 0
      
      4) ... childmask of GRP0:0 is updated by CPU 8 (still part of setup
         code).
      
      			     [GRP1:0]
      			migrator = 0x00
      			active   = 0x00
      			nextevt  = KTIME_MAX
      		       /                  \
      		 [GRP0:0]                  [GRP0:1]
      	      migrator = 0x01           migrator = TMIGR_NONE
      	      active   = 0x01           active   = 0x00
      	      nextevt  = KTIME_MAX      nextevt  = KTIME_MAX
      	      childmask= 2		childmask= 1
      	      parent   = GRP1:0		parent   = GRP1:0
      		/         \
      	      0          1 .. 7                8
      	  active          idle                !online
      	  childmask=1
      
      	  tmigr_walk.childmask = 0
      
      5) CPU 0 sees the connection to GRP1:0 and now propagates active state to
         GRP1:0 but with childmask = 0 as stored in propagation data structure.
      
      --> Now GRP1:0 always has a migrator as 0x00 != TMIGR_NONE and for all CPUs
          it looks like GRP1:0 is always active.
      
       To prevent those races, the setup of the hierarchy is moved into the
       cpuhotplug prepare callback. The prepare callback is not executed by the
       CPU which will come online; it is executed by the CPU which prepares the
       onlining of the other CPU. This CPU is active while it is connecting the
       formerly top level to the new one. This prevents (A) from happening, and it
       also prevents any further walk above the formerly top level until that
       active CPU becomes inactive, at which point the new ->parent and ->childmask
       updates become visible to any subsequent walk up above the formerly top
       level hierarchy. This prevents (B) from happening. The direction of the
       updates is now forced to look like "from bottom to top".
      
       However, while the active CPU prevents tmigr_cpu_(in)active() from walking up
       with the update not yet (or only half) visible, nothing prevents
       tmigr_handle_remote_up() or tmigr_requires_handle_remote_up() from walking up
       to the new top with a 0 childmask if the active CPU doing the prepare is
       not the migrator. But that is fine because:
      
        * tmigr_check_migrator() should just return false
        * The migrator is active and should eventually observe the new childmask
          at some point in a future tick.
      
       Split the setup functionality of the online callback into the cpuhotplug
       prepare callback and set up the hotplug state. Change the init call into an
       early_initcall() to make sure an already active CPU prepares everything for
       newly upcoming CPUs. Reorder the code so that all prepare-related functions
       are close to each other and the online and offline callbacks are also close
       together.
      
       Fixes: 7ee98877 ("timers: Implement the hierarchical pull model")
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240717094940.18687-1-anna-maria@linutronix.de
      10a0e6f3
    • Anna-Maria Behnsen's avatar
      timers/migration: Do not rely always on group->parent · facd40aa
      Anna-Maria Behnsen authored
       Reading group->parent without holding the group lock is racy against CPUs
       coming online for the first time and thereby creating another level of the
       hierarchy. This is not a problem when this value is read once to decide
       whether to abort a propagation or not. The worst outcome is an
       unnecessary/early CPU wake up. But it is racy when reading it several times
       during a single 'action' (like activation, deactivation, checking for
       remote timer expiry, ...) and relying on the consistency of this value
       without holding the lock. This happens at the moment e.g. in
       tmigr_inactive_up(), which also calls tmigr_update_events(). The code relies
       on group->parent not changing during this 'action'.
      
       Update the parent struct member description to explain the above only
       once. Remove parent pointer checks when they are not mandatory (like the
       update of data->childmask). Remove a warning which would be nice to have but
       whose trigger is not reliable, and expand the data structure member
       description instead. Expand a comment explaining why it is safe to rely on
       the parent pointer here (inside the hierarchy update).
      
       Fixes: 7ee98877 ("timers: Implement the hierarchical pull model")
       Reported-by: Borislav Petkov <bp@alien8.de>
       Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20240716-tmigr-fixes-v4-1-757baa7803fe@linutronix.de
      facd40aa
  8. Jul 15, 2024
    • Lai Jiangshan's avatar
      workqueue: Remove unneeded lockdep_assert_cpus_held() · aa868475
      Lai Jiangshan authored
       Commit 19af4575 ("workqueue: Remove cpus_read_lock() from
       apply_wqattrs_lock()") removed the cpus_read_lock() after the pwq
       creations and installations had been reworked to be based on wq_online_cpumask
       rather than cpu_online_mask, making cpus_read_lock() unneeded during
       wqattrs changes.
      
       But it didn't remove the lockdep_assert_cpus_held() checks during wqattrs
       changes, which leads to complaints from lockdep, reported by the kernel test
       robot:
      
      [   15.726567][  T131] ------------[ cut here ]------------
      [ 15.728117][ T131] WARNING: CPU: 1 PID: 131 at kernel/cpu.c:525 lockdep_assert_cpus_held (kernel/cpu.c:525)
      [   15.731191][  T131] Modules linked in: floppy(+) parport_pc(+) parport qemu_fw_cfg rtc_cmos
      [   15.733423][  T131] CPU: 1 PID: 131 Comm: systemd-udevd Tainted: G                T  6.10.0-rc2-00254-g19af45757383 #1 df6f039f42e8818bf9a534449362ebad1aad32e2
      [   15.737011][  T131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
      [ 15.739760][ T131] EIP: lockdep_assert_cpus_held (kernel/cpu.c:525)
      [ 15.741326][ T131] Code: 97 c2 03 72 20 83 3d f4 73 97 c2 00 74 17 55 89 e5 b8 fc bd 4d c2 ba ff ff ff ff e8 e4 57 d1 00 85 c0 74 06 5d 31 c0 31 d2 c3 <0f> 0b eb f6 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 b8
      
       Fix it by removing the unneeded lockdep_assert_cpus_held().
       Also remove the unneeded cpus_read_lock() from wq_affn_dfl_set().
      
      tj: Dropped the removal of cpus_read_lock/unlock() in wq_affn_dfl_set() to
          keep this patch fix only.
      
      Cc: kernel test robot <oliver.sang@intel.com>
       Fixes: 19af4575 ("workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()")
       Reported-by: kernel test robot <oliver.sang@intel.com>
       Closes: https://lore.kernel.org/oe-lkp/202407141846.665c0446-lkp@intel.com
       Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
       Signed-off-by: Tejun Heo <tj@kernel.org>
      aa868475
    • levi.yun's avatar
      trace/pid_list: Change gfp flags in pid_list_fill_irq() · 7dc83618
      levi.yun authored
       pid_list_fill_irq() runs via irq_work.
       When CONFIG_PREEMPT_RT is disabled, it runs in hard irq context,
       so it must not sleep while allocating memory.
      
      Change gfp flags from GFP_KERNEL to GFP_NOWAIT to prevent sleep in
      irq_work.
      
       This change shouldn't impact functionality in practice because the
       worst-case allocation size is 2K.
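       
       The change itself is a one-flag swap; a sketch under the assumption that the
       chunk allocation in pid_list_fill_irq() looks roughly like this (the chunk
       type and helper name are illustrative):
       
       	/* Sketch: irq_work context must not sleep, so avoid GFP_KERNEL here. */
       	static union pid_chunk *pid_list_alloc_chunk(void)
       	{
       		return kzalloc(sizeof(union pid_chunk), GFP_NOWAIT);
       	}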
      
      Cc: stable@goodmis.org
      Fixes: 8d6e9098 ("tracing: Create a sparse bitmask for pid filtering")
      Link: https://lore.kernel.org/20240704150226.1359936-1-yeoreum.yun@arm.com
      
      
       Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
       Signed-off-by: levi.yun <yeoreum.yun@arm.com>
       Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      7dc83618
    • Masahiro Yamada's avatar
      kbuild: remove PROVIDE() for kallsyms symbols · c442db3f
      Masahiro Yamada authored
       This reimplements commit 951bcae6 ("kallsyms: Avoid weak references
       for kallsyms symbols") because I am not a big fan of PROVIDE().
      
      As an alternative solution, this commit prepends one more kallsyms step.
      
          KSYMS   .tmp_vmlinux.kallsyms0.S          # added
          AS      .tmp_vmlinux.kallsyms0.o          # added
          LD      .tmp_vmlinux.btf
          BTF     .btf.vmlinux.bin.o
          LD      .tmp_vmlinux.kallsyms1
          NM      .tmp_vmlinux.kallsyms1.syms
          KSYMS   .tmp_vmlinux.kallsyms1.S
          AS      .tmp_vmlinux.kallsyms1.o
          LD      .tmp_vmlinux.kallsyms2
          NM      .tmp_vmlinux.kallsyms2.syms
          KSYMS   .tmp_vmlinux.kallsyms2.S
          AS      .tmp_vmlinux.kallsyms2.o
          LD      vmlinux
      
      Step 0 takes /dev/null as input, and generates .tmp_vmlinux.kallsyms0.o,
      which has a valid kallsyms format with the empty symbol list, and can be
      linked to vmlinux. Since it is really small, the added compile-time cost
      is negligible.
      
       Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
       Acked-by: Ard Biesheuvel <ardb@kernel.org>
       Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
      c442db3f
    • Lai Jiangshan's avatar
      workqueue: Always queue work items to the newest PWQ for order workqueues · 58629d48
      Lai Jiangshan authored
      
      To ensure non-reentrancy, __queue_work() attempts to enqueue a work
      item to the pool of the currently executing worker. This is not only
      unnecessary for an ordered workqueue, where order inherently suggests
      non-reentrancy, but it could also disrupt the sequence if the item is
      not enqueued on the newest PWQ.
      
       Just queue it to the newest PWQ and let order management guarantee
       non-reentrancy.
      
       Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
       Fixes: 4c065dbc ("workqueue: Enable unbound cpumask update on ordered workqueues")
       Cc: stable@vger.kernel.org # v6.9+
       Signed-off-by: Tejun Heo <tj@kernel.org>
      (cherry picked from commit 74347be3edfd11277799242766edf844c43dd5d3)
      58629d48
  9. Jul 14, 2024
    • Kent Overstreet's avatar
      lockdep: lockdep_set_notrack_class() · 1a616c2f
      Kent Overstreet authored
      
      Add a new helper to disable lockdep tracking entirely for a given class.
      
       This is needed for bcachefs, which takes too many btree node locks for
       lockdep to track. Instead, we have a single lockdep_map for "btree_trans
       has any btree nodes locked", which makes more sense given that we have
       centralized lock management and a cycle detector.
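       
       A usage sketch, assuming the new helper is applied per lock instance in the
       same way as the existing lockdep_set_class() family (the struct and lock
       here are illustrative):
       
       	/* Sketch: opt a heavily nested lock out of lockdep's held-lock tracking. */
       	struct cached_node {
       		spinlock_t lock;
       	};
       
       	static void cached_node_init(struct cached_node *n)
       	{
       		spin_lock_init(&n->lock);
       		lockdep_set_notrack_class(&n->lock);	/* assumed: takes the lock */
       	}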
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
       Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      1a616c2f
  11. Jul 12, 2024
    • Kees Cook's avatar
      tsacct: replace strncpy() with strscpy() · 0fe23564
      Kees Cook authored
      Replace the deprecated[1] use of strncpy() in bacct_add_tsk().  Since this
      is UAPI, include trailing padding in the copy.
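       
       A sketch of the replacement pattern (field names assumed to match the
       taskstats/task_struct members involved):
       
       	/* strscpy_pad() NUL-terminates and zero-fills the remainder of the
       	 * destination, keeping the UAPI struct's trailing padding initialized. */
       	strscpy_pad(stats->ac_comm, tsk->comm, sizeof(stats->ac_comm));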
      
      Link: https://github.com/KSPP/linux/issues/90 [1]
      Link: https://lkml.kernel.org/r/20240711171308.work.995-kees@kernel.org
      
      
       Signed-off-by: Kees Cook <kees@kernel.org>
       Cc: "Dr. Thomas Orgis" <thomas.orgis@uni-hamburg.de>
       Cc: Eric W. Biederman <ebiederm@xmission.com>
       Cc: Ismael Luceno <ismael@iodev.co.uk>
       Cc: Peng Liu <liupeng256@huawei.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0fe23564
    • Christophe Leroy's avatar
      mm: define __pte_leaf_size() to also take a PMD entry · 18d095b2
      Christophe Leroy authored
      On powerpc 8xx, when a page is 8M size, the information is in the PMD
      entry.  So allow architectures to provide __pte_leaf_size() instead of
      pte_leaf_size() and provide the PMD entry to that function.
      
       When __pte_leaf_size() is not defined, define it as pte_leaf_size() so
       that architectures not interested in the PMD argument are not impacted.
      
      Only define a default pte_leaf_size() when __pte_leaf_size() is not
      defined to make sure nobody adds new calls to pte_leaf_size() in the core.
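       
       A sketch of the fallback wiring described above (assumed shape, following the
       usual default-definition pattern in include/linux/pgtable.h):
       
       	/* Sketch: only architectures that care about the PMD entry provide
       	 * __pte_leaf_size(); everyone else falls back to pte_leaf_size(). */
       	#ifndef __pte_leaf_size
       	#ifndef pte_leaf_size
       	#define pte_leaf_size(x) PAGE_SIZE
       	#endif
       	#define __pte_leaf_size(x, y) pte_leaf_size(y)
       	#endif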
      
      Link: https://lkml.kernel.org/r/c7c008f0a314bf8029ad7288fdc908db1ec7e449.1719928057.git.christophe.leroy@csgroup.eu
      
      
       Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
       Reviewed-by: Oscar Salvador <osalvador@suse.de>
       Cc: Jason Gunthorpe <jgg@nvidia.com>
       Cc: Michael Ellerman <mpe@ellerman.id.au>
       Cc: Nicholas Piggin <npiggin@gmail.com>
       Cc: Peter Xu <peterx@redhat.com>
       Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      18d095b2
    • Xiu Jianfeng's avatar
      cgroup/misc: Introduce misc.events.local · 6a26f9c6
      Xiu Jianfeng authored
      
       Currently the event counting provided by misc.events is hierarchical,
       which is not practical if the user is only concerned with the events of a
       specific cgroup. Therefore, introduce misc.events.local, which collects
       events specific to the given cgroup.
      
      This is analogous to memory.events.local and pids.events.local.
      
       Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
       Signed-off-by: Tejun Heo <tj@kernel.org>
      6a26f9c6
    • Shung-Hsi Yu's avatar
      bpf: use check_sub_overflow() to check for subtraction overflows · deac5871
      Shung-Hsi Yu authored
      
      Similar to previous patch that drops signed_add*_overflows() and uses
      (compiler) builtin-based check_add_overflow(), do the same for
      signed_sub*_overflows() and replace them with the generic
      check_sub_overflow() to make future refactoring easier and have the
      checks implemented more efficiently.
      
       The unsigned overflow checks for subtraction do not use helpers and are
       simple enough already, so they're left untouched.
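       
       A self-contained sketch of the resulting pattern (the wrapper function here
       is hypothetical; check_sub_overflow() itself is the generic helper from
       <linux/overflow.h>):
       
       	#include <linux/overflow.h>
       	#include <linux/limits.h>
       
       	/* Sketch: clamp the signed bounds if the subtraction overflows. */
       	static void sub_scalar_bounds(s64 *dst_smin, s64 *dst_smax,
       				      s64 src_smin, s64 src_smax)
       	{
       		if (check_sub_overflow(*dst_smin, src_smax, dst_smin) ||
       		    check_sub_overflow(*dst_smax, src_smin, dst_smax)) {
       			*dst_smin = S64_MIN;
       			*dst_smax = S64_MAX;
       		}
       	}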
      
      After the change GCC 13.3.0 generates cleaner assembly on x86_64:
      
      	if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) ||
         139bf:	mov    0x28(%r12),%rax
         139c4:	mov    %edx,0x54(%r12)
         139c9:	sub    %r11,%rax
         139cc:	mov    %rax,0x28(%r12)
         139d1:	jo     14627 <adjust_reg_min_max_vals+0x1237>
      	    check_sub_overflow(*dst_smax, src_reg->smin_value, dst_smax)) {
         139d7:	mov    0x30(%r12),%rax
         139dc:	sub    %r9,%rax
         139df:	mov    %rax,0x30(%r12)
      	if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) ||
         139e4:	jo     14627 <adjust_reg_min_max_vals+0x1237>
         ...
      		*dst_smin = S64_MIN;
         14627:	movabs $0x8000000000000000,%rax
         14631:	mov    %rax,0x28(%r12)
      		*dst_smax = S64_MAX;
         14636:	sub    $0x1,%rax
         1463a:	mov    %rax,0x30(%r12)
      
      Before the change it gives:
      
      	if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
         13a50:	mov    0x28(%r12),%rdi
         13a55:	mov    %edx,0x54(%r12)
      		dst_reg->smax_value = S64_MAX;
         13a5a:	movabs $0x7fffffffffffffff,%rdx
         13a64:	mov    %eax,0x50(%r12)
      		dst_reg->smin_value = S64_MIN;
         13a69:	movabs $0x8000000000000000,%rax
      	s64 res = (s64)((u64)a - (u64)b);
         13a73:	mov    %rdi,%rsi
         13a76:	sub    %rcx,%rsi
      	if (b < 0)
         13a79:	test   %rcx,%rcx
         13a7c:	js     145ea <adjust_reg_min_max_vals+0x119a>
      	if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
         13a82:	cmp    %rsi,%rdi
         13a85:	jl     13ac7 <adjust_reg_min_max_vals+0x677>
      	    signed_sub_overflows(dst_reg->smax_value, smin_val)) {
         13a87:	mov    0x30(%r12),%r8
      	s64 res = (s64)((u64)a - (u64)b);
         13a8c:	mov    %r8,%rax
         13a8f:	sub    %r9,%rax
      	return res > a;
         13a92:	cmp    %rax,%r8
         13a95:	setl   %sil
      	if (b < 0)
         13a99:	test   %r9,%r9
         13a9c:	js     147d1 <adjust_reg_min_max_vals+0x1381>
      		dst_reg->smax_value = S64_MAX;
         13aa2:	movabs $0x7fffffffffffffff,%rdx
      		dst_reg->smin_value = S64_MIN;
         13aac:	movabs $0x8000000000000000,%rax
      	if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
         13ab6:	test   %sil,%sil
         13ab9:	jne    13ac7 <adjust_reg_min_max_vals+0x677>
      		dst_reg->smin_value -= smax_val;
         13abb:	mov    %rdi,%rax
      		dst_reg->smax_value -= smin_val;
         13abe:	mov    %r8,%rdx
      		dst_reg->smin_value -= smax_val;
         13ac1:	sub    %rcx,%rax
      		dst_reg->smax_value -= smin_val;
         13ac4:	sub    %r9,%rdx
         13ac7:	mov    %rax,0x28(%r12)
         ...
         13ad1:	mov    %rdx,0x30(%r12)
         ...
      	if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
         145ea:	cmp    %rsi,%rdi
         145ed:	jg     13ac7 <adjust_reg_min_max_vals+0x677>
         145f3:	jmp    13a87 <adjust_reg_min_max_vals+0x637>
      
       Suggested-by: Jiri Olsa <jolsa@kernel.org>
       Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
       Acked-by: Jiri Olsa <jolsa@kernel.org>
       Link: https://lore.kernel.org/r/20240712080127.136608-4-shung-hsi.yu@suse.com
       Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      deac5871
    • Shung-Hsi Yu's avatar
      bpf: use check_add_overflow() to check for addition overflows · 28a44110
      Shung-Hsi Yu authored
       signed_add*_overflows() was added back when there was no overflow-check
       helper. With the introduction of such helpers in commit f0907827
       ("compiler.h: enable builtin overflow checkers and add fallback code"), we
       can drop signed_add*_overflows() in kernel/bpf/verifier.c and use the
       generic check_add_overflow() instead.
      
      This will make future refactoring easier, and takes advantage of
      compiler-emitted hardware instructions that efficiently implement these
      checks.
      
      After the change GCC 13.3.0 generates cleaner assembly on x86_64:
      
      	err = adjust_scalar_min_max_vals(env, insn, dst_reg, *src_reg);
         13625:	mov    0x28(%rbx),%r9  /*  r9 = src_reg->smin_value */
         13629:	mov    0x30(%rbx),%rcx /* rcx = src_reg->smax_value */
         ...
      	if (check_add_overflow(*dst_smin, src_reg->smin_value, dst_smin) ||
         141c1:	mov    %r9,%rax
         141c4:	add    0x28(%r12),%rax
         141c9:	mov    %rax,0x28(%r12)
         141ce:	jo     146e4 <adjust_reg_min_max_vals+0x1294>
      	    check_add_overflow(*dst_smax, src_reg->smax_value, dst_smax)) {
         141d4:	add    0x30(%r12),%rcx
         141d9:	mov    %rcx,0x30(%r12)
      	if (check_add_overflow(*dst_smin, src_reg->smin_value, dst_smin) ||
         141de:	jo     146e4 <adjust_reg_min_max_vals+0x1294>
         ...
      		*dst_smin = S64_MIN;
         146e4:	movabs $0x8000000000000000,%rax
         146ee:	mov    %rax,0x28(%r12)
      		*dst_smax = S64_MAX;
         146f3:	sub    $0x1,%rax
         146f7:	mov    %rax,0x30(%r12)
      
      Before the change it gives:
      
      	s64 smin_val = src_reg->smin_value;
           675:	mov    0x28(%rsi),%r8
      	s64 smax_val = src_reg->smax_value;
      	u64 umin_val = src_reg->umin_value;
      	u64 umax_val = src_reg->umax_value;
           679:	mov    %rdi,%rax /* rax = dst_reg */
      	if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
           67c:	mov    0x28(%rdi),%rdi /* rdi = dst_reg->smin_value */
      	u64 umin_val = src_reg->umin_value;
           680:	mov    0x38(%rsi),%rdx
      	u64 umax_val = src_reg->umax_value;
           684:	mov    0x40(%rsi),%rcx
      	s64 res = (s64)((u64)a + (u64)b);
           688:	lea    (%r8,%rdi,1),%r9 /* r9 = dst_reg->smin_value + src_reg->smin_value */
      	return res < a;
           68c:	cmp    %r9,%rdi
           68f:	setg   %r10b /* r10b = (dst_reg->smin_value + src_reg->smin_value) > dst_reg->smin_value */
      	if (b < 0)
           693:	test   %r8,%r8
           696:	js     72b <scalar_min_max_add+0xbb>
      	    signed_add_overflows(dst_reg->smax_value, smax_val)) {
      		dst_reg->smin_value = S64_MIN;
      		dst_reg->smax_value = S64_MAX;
           69c:	movabs $0x7fffffffffffffff,%rdi
      	s64 smax_val = src_reg->smax_value;
           6a6:	mov    0x30(%rsi),%r8
      		dst_reg->smin_value = S64_MIN;
           6aa:	00 00 00 	movabs $0x8000000000000000,%rsi
      	if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
           6b4:	test   %r10b,%r10b /* (dst_reg->smin_value + src_reg->smin_value) > dst_reg->smin_value ? goto 6cb */
           6b7:	jne    6cb <scalar_min_max_add+0x5b>
      	    signed_add_overflows(dst_reg->smax_value, smax_val)) {
           6b9:	mov    0x30(%rax),%r10   /* r10 = dst_reg->smax_value */
      	s64 res = (s64)((u64)a + (u64)b);
           6bd:	lea    (%r10,%r8,1),%r11 /* r11 = dst_reg->smax_value + src_reg->smax_value */
      	if (b < 0)
           6c1:	test   %r8,%r8
           6c4:	js     71e <scalar_min_max_add+0xae>
      	if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
           6c6:	cmp    %r11,%r10 /* (dst_reg->smax_value + src_reg->smax_value) <= dst_reg->smax_value ? goto 723 */
           6c9:	jle    723 <scalar_min_max_add+0xb3>
      	} else {
      		dst_reg->smin_value += smin_val;
      		dst_reg->smax_value += smax_val;
      	}
           6cb:	mov    %rsi,0x28(%rax)
           ...
           6d5:	mov    %rdi,0x30(%rax)
           ...
      	if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
           71e:	cmp    %r11,%r10
           721:	jl     6cb <scalar_min_max_add+0x5b>
      		dst_reg->smin_value += smin_val;
           723:	mov    %r9,%rsi
      		dst_reg->smax_value += smax_val;
           726:	mov    %r11,%rdi
           729:	jmp    6cb <scalar_min_max_add+0x5b>
      		return res > a;
           72b:	cmp    %r9,%rdi
           72e:	setl   %r10b
           732:	jmp    69c <scalar_min_max_add+0x2c>
           737:	nopw   0x0(%rax,%rax,1)
      
       Note: unlike adjust_ptr_min_max_vals() and scalar*_min_max_add(), it is
       necessary to introduce an intermediate variable in adjust_jmp_off() to keep
       the functional behavior unchanged. Without an intermediate variable,
       imm/off would be altered even on overflow.
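       
       A sketch of why the intermediate variable matters (hypothetical helper;
       check_add_overflow() stores the wrapped result even when it reports
       overflow):
       
       	/* Sketch: write into a temporary first so the original field survives
       	 * an overflowing addition unchanged. */
       	static int bump_jmp_off(struct bpf_insn *insn, s16 delta)
       	{
       		s16 off = insn->off;
       
       		if (check_add_overflow(off, delta, &off))
       			return -ERANGE;
       		insn->off = off;
       		return 0;
       	}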
      
       Suggested-by: Jiri Olsa <jolsa@kernel.org>
       Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
       Link: https://lore.kernel.org/r/20240712080127.136608-3-shung-hsi.yu@suse.com
       Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      28a44110
    • Shung-Hsi Yu's avatar
      bpf: fix overflow check in adjust_jmp_off() · 4a04b4f0
      Shung-Hsi Yu authored
       adjust_jmp_off() used the insn->imm field for all overflow checks, which is
       incorrect as that should only be done for the BPF_JMP32 | BPF_JA case,
       not the general jump instruction case. Fix it by using insn->off for the
       overflow check in the general case.
      
       Fixes: 5337ac4c ("bpf: Fix the corner case with may_goto and jump to the 1st insn.")
       Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
       Link: https://lore.kernel.org/r/20240712080127.136608-2-shung-hsi.yu@suse.com
       Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      4a04b4f0
    • Alan Maguire's avatar
      bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o · 2454075f
      Alan Maguire authored
      As reported by Mirsad [1] we still see format warnings in kernel/bpf/btf.o
      at W=1 warning level:
      
        CC      kernel/bpf/btf.o
      ./kernel/bpf/btf.c: In function ‘btf_type_seq_show_flags’:
      ./kernel/bpf/btf.c:7553:21: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
       7553 |         sseq.showfn = btf_seq_show;
            |                     ^
      ./kernel/bpf/btf.c: In function ‘btf_type_snprintf_show’:
      ./kernel/bpf/btf.c:7604:31: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
       7604 |         ssnprintf.show.showfn = btf_snprintf_show;
            |                               ^
      
      Combined with CONFIG_WERROR=y these can halt the build.
      
       The fix suggested by Mirsad (annotating the structure field with __printf())
       resolves these. Apologies, I missed this last time.
       No other W=1 warnings were observed in kernel/bpf after this fix.
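       
       The fix pattern, as a generic illustration (the real member lives in the BTF
       show structs; __printf(2, 0) is the usual form for a va_list-style callback):
       
       	#include <linux/compiler_attributes.h>
       	#include <linux/stdarg.h>
       
       	/* Sketch: the format attribute on the function-pointer member is what
       	 * lets GCC accept assignments of printf-like callbacks without the
       	 * -Wsuggest-attribute=format warning. */
       	struct show_ops {
       		void (*showfn)(void *ctx, const char *fmt, va_list args) __printf(2, 0);
       	};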
      
      [1] https://lore.kernel.org/bpf/92c9d047-f058-400c-9c7d-81d4dc1ef71b@gmail.com/
      
       Fixes: b3470da3 ("bpf: annotate BTF show functions with __printf")
       Reported-by: Mirsad Todorovac <mtodorovac69@gmail.com>
       Suggested-by: Mirsad Todorovac <mtodorovac69@gmail.com>
       Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
       Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20240712092859.1390960-1-alan.maguire@oracle.com
      2454075f
  12. Jul 11, 2024