Forum | Documentation | Website | Blog

Skip to content
Snippets Groups Projects
  1. Jul 29, 2024
  2. Jul 28, 2024
  3. Jul 27, 2024
  4. Jul 26, 2024
  5. Jul 24, 2024
    • Joel Granados's avatar
      sysctl: treewide: constify the ctl_table argument of proc_handlers · 78eb4ea2
      Joel Granados authored
      const qualify the struct ctl_table argument in the proc_handler function
      signatures. This is a prerequisite to moving the static ctl_table
      structs into .rodata data which will ensure that proc_handler function
      pointers cannot be modified.
      
      This patch has been generated by the following coccinelle script:
      
      ```
        virtual patch
      
        @r1@
        identifier ctl, write, buffer, lenp, ppos;
        identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos);
      
        @r2@
        identifier func, ctl, write, buffer, lenp, ppos;
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos)
        { ... }
      
        @r3@
        identifier func;
        @@
      
        int func(
        - stru...
      78eb4ea2
    • Linus Torvalds's avatar
      hostfs: fix folio conversion · e44be002
      Linus Torvalds authored
      Commit e3ec0fe9
      
       ("hostfs: Convert hostfs_read_folio() to use a
      folio") simplified hostfs_read_folio(), but in the process of converting
      to using folios natively also mis-used the folio_zero_tail() function
      due to the very confusing API of that function.
      
      Very arguably it's folio_zero_tail() API itself that is buggy, since it
      would make more sense (and the documentation kind of implies) that the
      third argument would be the pointer to the beginning of the folio
      buffer.
      
      But no, the third argument to folio_zero_tail() is where we should start
      zeroing the tail (even if we already also pass in the offset separately
      as the second argument).
      
      So fix the hostfs caller, and we can leave any folio_zero_tail() sanity
      cleanup for later.
      
      Reported-and-tested-by: default avatarMaciej Żenczykowski <maze@google.com>
      Fixes: e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio")
      Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/
      
      
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e44be002
    • Christian Brauner's avatar
      inode: clarify what's locked · f5e5e97c
      Christian Brauner authored
      
      In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held
      but the inode is unhashed. We then release the inode_lock. So using
      "locked" as parameter name is confusing. Use is_inode_hash_locked as
      parameter name instead.
      
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      f5e5e97c
    • David Howells's avatar
      vfs: Fix potential circular locking through setxattr() and removexattr() · c3a5e3e8
      David Howells authored
      
      When using cachefiles, lockdep may emit something similar to the circular
      locking dependency notice below.  The problem appears to stem from the
      following:
      
       (1) Cachefiles manipulates xattrs on the files in its cache when called
           from ->writepages().
      
       (2) The setxattr() and removexattr() system call handlers get the name
           (and value) from userspace after taking the sb_writers lock, putting
           accesses of the vma->vm_lock and mm->mmap_lock inside of that.
      
       (3) The afs filesystem uses a per-inode lock to prevent multiple
           revalidation RPCs and in writeback vs truncate to prevent parallel
           operations from deadlocking against the server on one side and local
           page locks on the other.
      
      Fix this by moving the getting of the name and value in {get,remove}xattr()
      outside of the sb_writers lock.  This also has the minor benefits that we
      don't need to reget these in the event of a retry and we never try to take
      the sb_writers lock in the event we can't pull the name and value into the
      kernel.
      
      Alternative approaches that might fix this include moving the dispatch of a
      write to the cache off to a workqueue or trying to do without the
      validation lock in afs.  Note that this might also affect other filesystems
      that use netfslib and/or cachefiles.
      
       ======================================================
       WARNING: possible circular locking dependency detected
       6.10.0-build2+ #956 Not tainted
       ------------------------------------------------------
       fsstress/6050 is trying to acquire lock:
       ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0
      
       but task is already holding lock:
       ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #4 (&vma->vm_lock->lock){++++}-{3:3}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              down_write+0x3b/0x50
              vma_start_write+0x6b/0xa0
              vma_link+0xcc/0x140
              insert_vm_struct+0xb7/0xf0
              alloc_bprm+0x2c1/0x390
              kernel_execve+0x65/0x1a0
              call_usermodehelper_exec_async+0x14d/0x190
              ret_from_fork+0x24/0x40
              ret_from_fork_asm+0x1a/0x30
      
       -> #3 (&mm->mmap_lock){++++}-{3:3}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              __might_fault+0x7c/0xb0
              strncpy_from_user+0x25/0x160
              removexattr+0x7f/0x100
              __do_sys_fremovexattr+0x7e/0xb0
              do_syscall_64+0x9f/0x100
              entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
       -> #2 (sb_writers#14){.+.+}-{0:0}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              percpu_down_read+0x3c/0x90
              vfs_iocb_iter_write+0xe9/0x1d0
              __cachefiles_write+0x367/0x430
              cachefiles_issue_write+0x299/0x2f0
              netfs_advance_write+0x117/0x140
              netfs_write_folio.isra.0+0x5ca/0x6e0
              netfs_writepages+0x230/0x2f0
              afs_writepages+0x4d/0x70
              do_writepages+0x1e8/0x3e0
              filemap_fdatawrite_wbc+0x84/0xa0
              __filemap_fdatawrite_range+0xa8/0xf0
              file_write_and_wait_range+0x59/0x90
              afs_release+0x10f/0x270
              __fput+0x25f/0x3d0
              __do_sys_close+0x43/0x70
              do_syscall_64+0x9f/0x100
              entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
       -> #1 (&vnode->validate_lock){++++}-{3:3}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              down_read+0x95/0x200
              afs_writepages+0x37/0x70
              do_writepages+0x1e8/0x3e0
              filemap_fdatawrite_wbc+0x84/0xa0
              filemap_invalidate_inode+0x167/0x1e0
              netfs_unbuffered_write_iter+0x1bd/0x2d0
              vfs_write+0x22e/0x320
              ksys_write+0xbc/0x130
              do_syscall_64+0x9f/0x100
              entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
       -> #0 (mapping.invalidate_lock#3){++++}-{3:3}:
              check_noncircular+0x119/0x160
              check_prev_add+0x195/0x430
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              down_read+0x95/0x200
              filemap_fault+0x26e/0x8b0
              __do_fault+0x57/0xd0
              do_pte_missing+0x23b/0x320
              __handle_mm_fault+0x2d4/0x320
              handle_mm_fault+0x14f/0x260
              do_user_addr_fault+0x2a2/0x500
              exc_page_fault+0x71/0x90
              asm_exc_page_fault+0x22/0x30
      
       other info that might help us debug this:
      
       Chain exists of:
         mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         rlock(&vma->vm_lock->lock);
                                      lock(&mm->mmap_lock);
                                      lock(&vma->vm_lock->lock);
         rlock(mapping.invalidate_lock#3);
      
        *** DEADLOCK ***
      
       1 lock held by fsstress/6050:
        #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250
      
       stack backtrace:
       CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956
       Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
       Call Trace:
        <TASK>
        dump_stack_lvl+0x57/0x80
        check_noncircular+0x119/0x160
        ? queued_spin_lock_slowpath+0x4be/0x510
        ? __pfx_check_noncircular+0x10/0x10
        ? __pfx_queued_spin_lock_slowpath+0x10/0x10
        ? mark_lock+0x47/0x160
        ? init_chain_block+0x9c/0xc0
        ? add_chain_block+0x84/0xf0
        check_prev_add+0x195/0x430
        __lock_acquire+0xaf0/0xd80
        ? __pfx___lock_acquire+0x10/0x10
        ? __lock_release.isra.0+0x13b/0x230
        lock_acquire.part.0+0x103/0x280
        ? filemap_fault+0x26e/0x8b0
        ? __pfx_lock_acquire.part.0+0x10/0x10
        ? rcu_is_watching+0x34/0x60
        ? lock_acquire+0xd7/0x120
        down_read+0x95/0x200
        ? filemap_fault+0x26e/0x8b0
        ? __pfx_down_read+0x10/0x10
        ? __filemap_get_folio+0x25/0x1a0
        filemap_fault+0x26e/0x8b0
        ? __pfx_filemap_fault+0x10/0x10
        ? find_held_lock+0x7c/0x90
        ? __pfx___lock_release.isra.0+0x10/0x10
        ? __pte_offset_map+0x99/0x110
        __do_fault+0x57/0xd0
        do_pte_missing+0x23b/0x320
        __handle_mm_fault+0x2d4/0x320
        ? __pfx___handle_mm_fault+0x10/0x10
        handle_mm_fault+0x14f/0x260
        do_user_addr_fault+0x2a2/0x500
        exc_page_fault+0x71/0x90
        asm_exc_page_fault+0x22/0x30
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk
      
      
      cc: Alexander Viro <viro@zeniv.linux.org.uk>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jan Kara <jack@suse.cz>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: Gao Xiang <xiang@kernel.org>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: netfs@lists.linux.dev
      cc: linux-erofs@lists.ozlabs.org
      cc: linux-fsdevel@vger.kernel.org
      [brauner: fix minor issues]
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      c3a5e3e8
    • Jann Horn's avatar
      filelock: Fix fcntl/close race recovery compat path · f8138f2a
      Jann Horn authored
      When I wrote commit 3cad1bc0 ("filelock: Remove locks reliably when
      fcntl/close race is detected"), I missed that there are two copies of the
      code I was patching: The normal version, and the version for 64-bit offsets
      on 32-bit kernels.
      Thanks to Greg KH for stumbling over this while doing the stable
      backport...
      
      Apply exactly the same fix to the compat path for 32-bit kernels.
      
      Fixes: c293621b ("[PATCH] stale POSIX lock handling")
      Cc: stable@kernel.org
      Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563
      
      
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com
      
      
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      f8138f2a
    • Christian Brauner's avatar
      fs: use all available ids · 8eac5358
      Christian Brauner authored
      The counter is unconditionally incremented for each mount allocation.
      If we set it to 1ULL << 32 we're losing 4294967296 as the first valid
      non-32 bit mount id.
      
      Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org
      
      
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      8eac5358
    • David Howells's avatar
      cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT · 51d37982
      David Howells authored
      
      Set the maximum size of a subrequest that writes to cachefiles to be
      MAX_RW_COUNT so that we don't overrun the maximum write we can make to the
      backing filesystem.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk
      
      
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      51d37982
    • David Howells's avatar
      netfs: Fix writeback that needs to go to both server and cache · 212be98a
      David Howells authored
      When netfslib is performing writeback (ie. ->writepages), it maintains two
      parallel streams of writes, one to the server and one to the cache, but it
      doesn't mark either stream of writes as active until it gets some data that
      needs to be written to that stream.
      
      This is done because some folios will only be written to the cache
      (e.g. copying to the cache on read is done by marking the folios and
      letting writeback do the actual work) and sometimes we'll only be writing
      to the server (e.g. if there's no cache).
      
      Now, since we don't actually dispatch uploads and cache writes in parallel,
      but rather flip between the streams, depending on which has the lowest
      so-far-issued offset, and don't wait for the subreqs to finish before
      flipping, we can end up in a situation where, say, we issue a write to the
      server and this completes before we start the write to the cache.
      
      But because we only activate a stream when we first add a subreq to it, the
      result...
      212be98a
    • Christian Brauner's avatar
      pidfs: handle kernels without namespaces cleanly · 9b3e1504
      Christian Brauner authored
      The nsproxy structure contains nearly all of the namespaces associated
      with a task. When a given namespace type is not supported by this kernel
      the rules whether the corresponding pointer in struct nsproxy is NULL or
      always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be
      the case and for all namespace types we'd always set it to
      init_<ns_type>_ns when the corresponding namespace type isn't supported.
      
      Make sure we handle all namespaces where the pointer in struct nsproxy
      can be NULL when the namespace type isn't supported.
      
      Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner
      Fixes: 5b08bd40
      
       ("pidfs: allow retrieval of namespace file descriptors") # mainline only
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      9b3e1504
    • Edward Adam Davis's avatar
      pidfs: when time ns disabled add check for ioctl · f60d38cb
      Edward Adam Davis authored
      syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled
      CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in
      open_namespace.
      
      Fixes: 5b08bd40
      
       ("pidfs: allow retrieval of namespace file descriptors") # mainline only
      Reported-and-tested-by: default avatar <syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
      
      
      Signed-off-by: default avatarEdward Adam Davis <eadavis@qq.com>
      Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com
      
      
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      f60d38cb
    • Congjie Zhou's avatar
      vfs: correct the comments of vfs_*() helpers · b40c8e7a
      Congjie Zhou authored
      correct the comments of vfs_*() helpers in fs/namei.c, including:
      1. vfs_create()
      2. vfs_mknod()
      3. vfs_mkdir()
      4. vfs_rmdir()
      5. vfs_symlink()
      
      All of them come from the same commit:
      6521f891
      
       "namei: prepare for idmapped mounts"
      
      The @dentry is actually the dentry of child directory rather than
      base directory(parent directory), and thus the @dir has to be
      modified due to the change of @dentry.
      
      Signed-off-by: default avatarCongjie Zhou <zcjie0802@qq.com>
      Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com
      
      
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      b40c8e7a
    • Mateusz Guzik's avatar
      vfs: handle __wait_on_freeing_inode() and evict() race · 5bc9ad78
      Mateusz Guzik authored
      Lockless hash lookup can find and lock the inode after it gets the
      I_FREEING flag set, at which point it blocks waiting for teardown in
      evict() to finish.
      
      However, the flag is still set even after evict() wakes up all waiters.
      
      This results in a race where if the inode lock is taken late enough, it
      can happen after both hash removal and wakeups, meaning there is nobody
      to wake the racing thread up.
      
      This worked prior to RCU-based lookup because the entire ordeal was
      synchronized with the inode hash lock.
      
      Since unhashing requires the inode lock, we can safely check whether it
      happened after acquiring it.
      
      Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
      
      
      Reported-by: default avatarDominique Martinet <asmadeus@codewreck.org>
      Fixes: 7180f8d9
      
       ("vfs: add rcu-based find_inode variants for iget ops")
      Signed-off-by: default avatarMateusz Guzik <mjguzik@gmail.com>
      Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com
      Reviewed-by: Jan Kara <j...
      5bc9ad78
    • David Howells's avatar
      netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG · fcad9336
      David Howells authored
      
      CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do
      that now.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk
      
      
      cc: Uwe Kleine-König <ukleinek@kernel.org>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      fcad9336
    • David Howells's avatar
      netfs: Revert "netfs: Switch debug logging to pr_debug()" · a9d47a50
      David Howells authored
      Revert commit 163eae0f
      
       to get back the
      original operation of the debugging macros.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org
      Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk
      
      
      cc: Uwe Kleine-König <ukleinek@kernel.org>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: default avatarChristian Brauner <brauner@kernel.org>
      a9d47a50
  6. Jul 23, 2024
  7. Jul 22, 2024