  1. Jul 26, 2024
  2. Jul 24, 2024
    • Joel Granados's avatar
      sysctl: treewide: constify the ctl_table argument of proc_handlers · 78eb4ea2
      Joel Granados authored
      Const-qualify the struct ctl_table argument in the proc_handler function
      signatures. This is a prerequisite to moving the static ctl_table
      structs into .rodata, which will ensure that proc_handler function
      pointers cannot be modified.
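
As a rough userspace illustration of what the const qualification buys, the sketch below uses a hypothetical, heavily simplified `struct ctl_table` and handler signature (the real kernel handler also takes `buffer`, `lenp` and `ppos`). The handler can still write through `ctl->data`, but it cannot alter the table entry, so the table itself can be declared `const`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct ctl_table. */
struct ctl_table {
    const char *procname;
    void *data;      /* points at the tunable's storage */
    int maxval;
};

/*
 * Simplified handler with a const-qualified table argument. It may write
 * through ctl->data (the pointee is not const), but it cannot modify the
 * table entry itself.
 */
static int demo_handler(const struct ctl_table *ctl, int write, int *value)
{
    int *target = ctl->data;

    assert(ctl->data != NULL);
    if (write) {
        if (*value > ctl->maxval)
            return -1;
        *target = *value;
    } else {
        *value = *target;
    }
    return 0;
}

static int demo_tunable = 5;

/* Because the handler takes a const pointer, the table can be const too. */
static const struct ctl_table demo_table = {
    .procname = "demo_tunable",
    .data = &demo_tunable,
    .maxval = 100,
};
```

An assignment such as `ctl->maxval = 0` inside `demo_handler()` would be rejected at compile time, which is the property the treewide change relies on.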
      
      This patch has been generated by the following coccinelle script:
      
      ```
        virtual patch
      
        @r1@
        identifier ctl, write, buffer, lenp, ppos;
        identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos);
      
        @r2@
        identifier func, ctl, write, buffer, lenp, ppos;
        @@
      
        int func(
        - struct ctl_table *ctl
        + const struct ctl_table *ctl
          ,int write, void *buffer, size_t *lenp, loff_t *ppos)
        { ... }
      
        @r3@
        identifier func;
        @@
      
        int func(
        - stru...
    • Linus Torvalds's avatar
      hostfs: fix folio conversion · e44be002
      Linus Torvalds authored
      Commit e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio")
      simplified hostfs_read_folio(), but in the process of converting
      to using folios natively also mis-used the folio_zero_tail() function
      due to the very confusing API of that function.
      
      Very arguably it's the folio_zero_tail() API itself that is buggy, since it
      would make more sense (and the documentation kind of implies) that the
      third argument would be the pointer to the beginning of the folio
      buffer.
      
      But no, the third argument to folio_zero_tail() is where we should start
      zeroing the tail (even if we already also pass in the offset separately
      as the second argument).
      
      So fix the hostfs caller, and we can leave any folio_zero_tail() sanity
      cleanup for later.
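
The calling convention described above can be sketched in userspace. `zero_tail()` and `fill_and_zero()` below are hypothetical helpers, not the kernel API; they only mirror the point that the third argument must already point at the first byte to be zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16  /* stand-in for the folio size; an assumption */

/*
 * Hypothetical helper mirroring the convention described above:
 * 'offset' is how many bytes of the buffer are valid, and 'start' must
 * already point at the first byte to zero, not at the buffer's base.
 */
static void zero_tail(size_t buf_size, size_t offset, char *start)
{
    assert(offset <= buf_size);
    memset(start, 0, buf_size - offset);
}

/* Fill a buffer with 'x', keep 'valid' bytes, and zero the rest. */
static void fill_and_zero(char *buf, size_t valid)
{
    memset(buf, 'x', BUF_SIZE);
    /*
     * Correct: pass buf + valid. Passing buf instead would wipe the start
     * of the valid data and leave the tail untouched.
     */
    zero_tail(BUF_SIZE, valid, buf + valid);
}
```

Passing the base pointer is the natural mistake because the offset is already supplied separately, which is the confusion the commit message points out.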
      
      Reported-and-tested-by: Maciej Żenczykowski <maze@google.com>
      Fixes: e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio")
      Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/
      
      
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Christian Brauner's avatar
      inode: clarify what's locked · f5e5e97c
      Christian Brauner authored
      
      In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held
      but the inode is unhashed. We then release the inode_lock. So using
      "locked" as parameter name is confusing. Use is_inode_hash_locked as
      parameter name instead.
      
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • David Howells's avatar
      vfs: Fix potential circular locking through setxattr() and removexattr() · c3a5e3e8
      David Howells authored
      
      When using cachefiles, lockdep may emit something similar to the circular
      locking dependency notice below.  The problem appears to stem from the
      following:
      
       (1) Cachefiles manipulates xattrs on the files in its cache when called
           from ->writepages().
      
       (2) The setxattr() and removexattr() system call handlers get the name
           (and value) from userspace after taking the sb_writers lock, putting
           accesses of the vma->vm_lock and mm->mmap_lock inside of that.
      
       (3) The afs filesystem uses a per-inode lock to prevent multiple
           revalidation RPCs and in writeback vs truncate to prevent parallel
           operations from deadlocking against the server on one side and local
           page locks on the other.
      
      Fix this by moving the getting of the name and value in {set,remove}xattr()
      outside of the sb_writers lock.  This also has the minor benefits that we
      don't need to reget these in the event of a retry and we never try to take
      the sb_writers lock in the event we can't pull the name and value into the
      kernel.
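
The reordering can be illustrated with a small userspace sketch. All names here (`fetch_name()`, `writers_lock()`, `demo_setxattr()`) are hypothetical stand-ins, and the "lock" is just a counter; the point is only that arguments are fetched before the lock is taken, and that a failed fetch never touches the lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 32  /* assumed limit for this sketch */

static int lock_taken;   /* counts acquisitions of the stand-in lock */

static void writers_lock(void)   { lock_taken++; }
static void writers_unlock(void) { }

/* Stand-in for copying the attribute name from userspace. */
static int fetch_name(char *dst, const char *src)
{
    if (src == NULL)
        return -EFAULT;
    if (strlen(src) >= NAME_MAX_LEN)
        return -ERANGE;
    strcpy(dst, src);
    return 0;
}

static int demo_setxattr(const char *user_name)
{
    char name[NAME_MAX_LEN];
    int err;

    /*
     * Step 1: fetch the arguments with no lock held, so any fault taken
     * here cannot nest inside the writers lock.
     */
    err = fetch_name(name, user_name);
    if (err)
        return err;      /* the lock is never touched on failure */

    /* Step 2: only now take the lock and do the update. */
    writers_lock();
    assert(name[0] != '\0' || user_name[0] == '\0');
    /* ... apply the attribute ... */
    writers_unlock();
    return 0;
}
```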
      
      Alternative approaches that might fix this include moving the dispatch of a
      write to the cache off to a workqueue or trying to do without the
      validation lock in afs.  Note that this might also affect other filesystems
      that use netfslib and/or cachefiles.
      
       ======================================================
       WARNING: possible circular locking dependency detected
       6.10.0-build2+ #956 Not tainted
       ------------------------------------------------------
       fsstress/6050 is trying to acquire lock:
       ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0
      
       but task is already holding lock:
       ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #4 (&vma->vm_lock->lock){++++}-{3:3}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              down_write+0x3b/0x50
              vma_start_write+0x6b/0xa0
              vma_link+0xcc/0x140
              insert_vm_struct+0xb7/0xf0
              alloc_bprm+0x2c1/0x390
              kernel_execve+0x65/0x1a0
              call_usermodehelper_exec_async+0x14d/0x190
              ret_from_fork+0x24/0x40
              ret_from_fork_asm+0x1a/0x30
      
       -> #3 (&mm->mmap_lock){++++}-{3:3}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              __might_fault+0x7c/0xb0
              strncpy_from_user+0x25/0x160
              removexattr+0x7f/0x100
              __do_sys_fremovexattr+0x7e/0xb0
              do_syscall_64+0x9f/0x100
              entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
       -> #2 (sb_writers#14){.+.+}-{0:0}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              percpu_down_read+0x3c/0x90
              vfs_iocb_iter_write+0xe9/0x1d0
              __cachefiles_write+0x367/0x430
              cachefiles_issue_write+0x299/0x2f0
              netfs_advance_write+0x117/0x140
              netfs_write_folio.isra.0+0x5ca/0x6e0
              netfs_writepages+0x230/0x2f0
              afs_writepages+0x4d/0x70
              do_writepages+0x1e8/0x3e0
              filemap_fdatawrite_wbc+0x84/0xa0
              __filemap_fdatawrite_range+0xa8/0xf0
              file_write_and_wait_range+0x59/0x90
              afs_release+0x10f/0x270
              __fput+0x25f/0x3d0
              __do_sys_close+0x43/0x70
              do_syscall_64+0x9f/0x100
              entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
       -> #1 (&vnode->validate_lock){++++}-{3:3}:
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              down_read+0x95/0x200
              afs_writepages+0x37/0x70
              do_writepages+0x1e8/0x3e0
              filemap_fdatawrite_wbc+0x84/0xa0
              filemap_invalidate_inode+0x167/0x1e0
              netfs_unbuffered_write_iter+0x1bd/0x2d0
              vfs_write+0x22e/0x320
              ksys_write+0xbc/0x130
              do_syscall_64+0x9f/0x100
              entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
       -> #0 (mapping.invalidate_lock#3){++++}-{3:3}:
              check_noncircular+0x119/0x160
              check_prev_add+0x195/0x430
              __lock_acquire+0xaf0/0xd80
              lock_acquire.part.0+0x103/0x280
              down_read+0x95/0x200
              filemap_fault+0x26e/0x8b0
              __do_fault+0x57/0xd0
              do_pte_missing+0x23b/0x320
              __handle_mm_fault+0x2d4/0x320
              handle_mm_fault+0x14f/0x260
              do_user_addr_fault+0x2a2/0x500
              exc_page_fault+0x71/0x90
              asm_exc_page_fault+0x22/0x30
      
       other info that might help us debug this:
      
       Chain exists of:
         mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         rlock(&vma->vm_lock->lock);
                                      lock(&mm->mmap_lock);
                                      lock(&vma->vm_lock->lock);
         rlock(mapping.invalidate_lock#3);
      
        *** DEADLOCK ***
      
       1 lock held by fsstress/6050:
        #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250
      
       stack backtrace:
       CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956
       Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
       Call Trace:
        <TASK>
        dump_stack_lvl+0x57/0x80
        check_noncircular+0x119/0x160
        ? queued_spin_lock_slowpath+0x4be/0x510
        ? __pfx_check_noncircular+0x10/0x10
        ? __pfx_queued_spin_lock_slowpath+0x10/0x10
        ? mark_lock+0x47/0x160
        ? init_chain_block+0x9c/0xc0
        ? add_chain_block+0x84/0xf0
        check_prev_add+0x195/0x430
        __lock_acquire+0xaf0/0xd80
        ? __pfx___lock_acquire+0x10/0x10
        ? __lock_release.isra.0+0x13b/0x230
        lock_acquire.part.0+0x103/0x280
        ? filemap_fault+0x26e/0x8b0
        ? __pfx_lock_acquire.part.0+0x10/0x10
        ? rcu_is_watching+0x34/0x60
        ? lock_acquire+0xd7/0x120
        down_read+0x95/0x200
        ? filemap_fault+0x26e/0x8b0
        ? __pfx_down_read+0x10/0x10
        ? __filemap_get_folio+0x25/0x1a0
        filemap_fault+0x26e/0x8b0
        ? __pfx_filemap_fault+0x10/0x10
        ? find_held_lock+0x7c/0x90
        ? __pfx___lock_release.isra.0+0x10/0x10
        ? __pte_offset_map+0x99/0x110
        __do_fault+0x57/0xd0
        do_pte_missing+0x23b/0x320
        __handle_mm_fault+0x2d4/0x320
        ? __pfx___handle_mm_fault+0x10/0x10
        handle_mm_fault+0x14f/0x260
        do_user_addr_fault+0x2a2/0x500
        exc_page_fault+0x71/0x90
        asm_exc_page_fault+0x22/0x30
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk
      
      
      cc: Alexander Viro <viro@zeniv.linux.org.uk>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jan Kara <jack@suse.cz>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: Gao Xiang <xiang@kernel.org>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: netfs@lists.linux.dev
      cc: linux-erofs@lists.ozlabs.org
      cc: linux-fsdevel@vger.kernel.org
      [brauner: fix minor issues]
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • Jann Horn's avatar
      filelock: Fix fcntl/close race recovery compat path · f8138f2a
      Jann Horn authored
      When I wrote commit 3cad1bc0 ("filelock: Remove locks reliably when
      fcntl/close race is detected"), I missed that there are two copies of the
      code I was patching: The normal version, and the version for 64-bit offsets
      on 32-bit kernels.
      Thanks to Greg KH for stumbling over this while doing the stable
      backport...
      
      Apply exactly the same fix to the compat path for 32-bit kernels.
      
      Fixes: c293621b ("[PATCH] stale POSIX lock handling")
      Cc: stable@kernel.org
      Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563
      
      
      Signed-off-by: Jann Horn <jannh@google.com>
      Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com
      
      
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • Christian Brauner's avatar
      fs: use all available ids · 8eac5358
      Christian Brauner authored
      The counter is unconditionally incremented for each mount allocation.
      If we set it to 1ULL << 32 we're losing 4294967296 as the first valid
      non-32 bit mount id.
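
A minimal sketch of the off-by-one, assuming the counter is incremented before its value is handed out (as the message describes). `next_id()` and `FIRST_64BIT_ID` are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define FIRST_64BIT_ID (1ULL << 32)   /* 4294967296 */

/* The counter is bumped before its value is handed out. */
static uint64_t next_id(uint64_t *counter)
{
    return ++(*counter);
}

/* Compare the two possible starting values of the counter. */
static void check_first_ids(void)
{
    uint64_t old_start = FIRST_64BIT_ID;
    uint64_t new_start = FIRST_64BIT_ID - 1;

    /* Starting at 1ULL << 32 skips 4294967296 itself. */
    assert(next_id(&old_start) == FIRST_64BIT_ID + 1);
    /* Starting one lower makes 4294967296 the first id handed out. */
    assert(next_id(&new_start) == FIRST_64BIT_ID);
}
```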
      
      Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org
      
      
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • David Howells's avatar
      cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT · 51d37982
      David Howells authored
      
      Set the maximum size of a subrequest that writes to cachefiles to be
      MAX_RW_COUNT so that we don't overrun the maximum write we can make to the
      backing filesystem.
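
For illustration, the clamp might look like the sketch below. The kernel defines MAX_RW_COUNT as INT_MAX rounded down to a page boundary; the 4096-byte page size and the `DEMO_`-prefixed names are assumptions of this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL   /* assumed page size */
#define DEMO_MAX_RW_COUNT ((size_t)INT_MAX & ~(DEMO_PAGE_SIZE - 1))

/* Limit a requested subrequest length to the largest single write. */
static size_t clamp_subreq_len(size_t want)
{
    size_t len = want < DEMO_MAX_RW_COUNT ? want : DEMO_MAX_RW_COUNT;

    assert(len <= DEMO_MAX_RW_COUNT);
    return len;
}
```

With a 4096-byte page this caps a subrequest at 0x7ffff000 bytes, just under 2 GiB.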
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk
      
      
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • David Howells's avatar
      netfs: Fix writeback that needs to go to both server and cache · 212be98a
      David Howells authored
      When netfslib is performing writeback (ie. ->writepages), it maintains two
      parallel streams of writes, one to the server and one to the cache, but it
      doesn't mark either stream of writes as active until it gets some data that
      needs to be written to that stream.
      
      This is done because some folios will only be written to the cache
      (e.g. copying to the cache on read is done by marking the folios and
      letting writeback do the actual work) and sometimes we'll only be writing
      to the server (e.g. if there's no cache).
      
      Now, since we don't actually dispatch uploads and cache writes in parallel,
      but rather flip between the streams, depending on which has the lowest
      so-far-issued offset, and don't wait for the subreqs to finish before
      flipping, we can end up in a situation where, say, we issue a write to the
      server and this completes before we start the write to the cache.
      
      But because we only activate a stream when we first add a subreq to it, the
      result...
    • Christian Brauner's avatar
      pidfs: handle kernels without namespaces cleanly · 9b3e1504
      Christian Brauner authored
      The nsproxy structure contains nearly all of the namespaces associated
      with a task. When a given namespace type is not supported by this kernel,
      the rules for whether the corresponding pointer in struct nsproxy is NULL
      or always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be
      the case and for all namespace types we'd always set it to
      init_<ns_type>_ns when the corresponding namespace type isn't supported.
      
      Make sure we handle all namespaces where the pointer in struct nsproxy
      can be NULL when the namespace type isn't supported.
      
      Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner
      Fixes: 5b08bd40 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • Edward Adam Davis's avatar
      pidfs: when time ns disabled add check for ioctl · f60d38cb
      Edward Adam Davis authored
      syzbot calls pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" while
      CONFIG_TIME_NS is disabled. Since time_ns is then NULL, this causes a
      NULL pointer dereference in open_namespace().
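
The kind of guard involved can be sketched in userspace as follows. The types, function names and the `-EOPNOTSUPP` return value are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_ns { int id; };            /* hypothetical namespace object */

struct demo_nsproxy {
    struct demo_ns *time_ns;           /* NULL when the type is compiled out */
};

/* Stand-in for opening a namespace; it dereferences its argument. */
static int demo_open_namespace(const struct demo_ns *ns)
{
    assert(ns != NULL);
    return ns->id;
}

/* Returns a non-negative handle, or an error if the type is missing. */
static int demo_get_time_ns(const struct demo_nsproxy *proxy)
{
    if (proxy->time_ns == NULL)
        return -EOPNOTSUPP;            /* error code is an assumption */
    return demo_open_namespace(proxy->time_ns);
}
```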
      
      Fixes: 5b08bd40 ("pidfs: allow retrieval of namespace file descriptors") # mainline only
      Reported-and-tested-by: <syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d
      
      
      Signed-off-by: Edward Adam Davis <eadavis@qq.com>
      Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com
      
      
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • Congjie Zhou's avatar
      vfs: correct the comments of vfs_*() helpers · b40c8e7a
      Congjie Zhou authored
      Correct the comments of the vfs_*() helpers in fs/namei.c, including:
      1. vfs_create()
      2. vfs_mknod()
      3. vfs_mkdir()
      4. vfs_rmdir()
      5. vfs_symlink()
      
      All of them come from the same commit:
      6521f891 ("namei: prepare for idmapped mounts")
      
      The @dentry is actually the dentry of the child directory rather than
      of the base (parent) directory, so the @dir comment has to be updated
      to match.
      
      Signed-off-by: Congjie Zhou <zcjie0802@qq.com>
      Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com
      
      
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • Mateusz Guzik's avatar
      vfs: handle __wait_on_freeing_inode() and evict() race · 5bc9ad78
      Mateusz Guzik authored
      Lockless hash lookup can find and lock the inode after it gets the
      I_FREEING flag set, at which point it blocks waiting for teardown in
      evict() to finish.
      
      However, the flag is still set even after evict() wakes up all waiters.
      
      This results in a race where if the inode lock is taken late enough, it
      can happen after both hash removal and wakeups, meaning there is nobody
      to wake the racing thread up.
      
      This worked prior to RCU-based lookup because the entire ordeal was
      synchronized with the inode hash lock.
      
      Since unhashing requires the inode lock, we can safely check whether it
      happened after acquiring it.
      
      Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
      
      
      Reported-by: Dominique Martinet <asmadeus@codewreck.org>
      Fixes: 7180f8d9 ("vfs: add rcu-based find_inode variants for iget ops")
      Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
      Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com
      Reviewed-by: Jan Kara <j...
    • David Howells's avatar
      netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG · fcad9336
      David Howells authored
      
      CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do
      that now.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk
      
      
      cc: Uwe Kleine-König <ukleinek@kernel.org>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
    • David Howells's avatar
      netfs: Revert "netfs: Switch debug logging to pr_debug()" · a9d47a50
      David Howells authored
      Revert commit 163eae0f to get back the original operation of the
      debugging macros.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org
      Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk
      
      
      cc: Uwe Kleine-König <ukleinek@kernel.org>
      cc: Christian Brauner <brauner@kernel.org>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
  3. Jul 22, 2024
  4. Jul 20, 2024
  5. Jul 19, 2024
    • Jason A. Donenfeld's avatar
      mm: add MAP_DROPPABLE for designating always lazily freeable mappings · 9651fced
      Jason A. Donenfeld authored
      
      The vDSO getrandom() implementation works with a buffer allocated with a
      new system call that has certain requirements:
      
      - It shouldn't be written to core dumps.
        * Easy: VM_DONTDUMP.
      - It should be zeroed on fork.
        * Easy: VM_WIPEONFORK.
      
      - It shouldn't be written to swap.
        * Uh-oh: mlock is rlimited.
        * Uh-oh: mlock isn't inherited by forks.
      
      - It shouldn't reserve actual memory, but it also shouldn't crash when
        page faulting in memory if none is available
        * Uh-oh: VM_NORESERVE means segfaults.
      
      It turns out that the vDSO getrandom() function has three really nice
      characteristics that we can exploit to solve this problem:
      
      1) Due to being wiped during fork(), the vDSO code is already robust to
         having the contents of the pages it reads zeroed out midway through
         the function's execution.
      
      2) In the absolute worst case of whatever contingency we're coding for,
         we have the option to fall back to the getrandom() syscall, and
         everything is fine.
      
      3) The buffers the function uses are only ever useful for a maximum of
         60 seconds -- a sort of cache, rather than a long term allocation.
      
      These characteristics mean that we can introduce VM_DROPPABLE, which
      has the following semantics:
      
      a) It never is written out to swap.
      b) Under memory pressure, mm can just drop the pages (so that they're
         zero when read back again).
      c) It is inherited by fork.
      d) It doesn't count against the mlock budget, since nothing is locked.
      e) If there's not enough memory to service a page fault, it's not fatal,
         and no signal is sent.
      
      This way, allocations used by vDSO getrandom() can use:
      
          VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE
      
      And there will be no problem with OOMing, crashing on overcommitment,
      using memory when not in use, not wiping on fork(), coredumps, or
      writing out to swap.
      
      In order to let vDSO getrandom() use this, expose these via mmap(2) as
      MAP_DROPPABLE.
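
From userspace, requesting such a mapping might look roughly like the sketch below. The value 0x08 for `MAP_DROPPABLE` is assumed to match the kernel's uapi header when libc does not yet define it, and `map_cache()` is a hypothetical helper. On kernels or architectures without support the first mmap() is expected to fail, so the sketch falls back to an ordinary private mapping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_DROPPABLE
#define MAP_DROPPABLE 0x08   /* assumed to match the kernel's uapi header */
#endif

/*
 * Try to map 'len' bytes of droppable anonymous memory. If that fails,
 * fall back to a plain private anonymous mapping. '*droppable' reports
 * which kind was obtained. Returns NULL only if both attempts fail.
 */
static unsigned char *map_cache(size_t len, int *droppable)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_ANONYMOUS | MAP_DROPPABLE, -1, 0);
    *droppable = (p != MAP_FAILED);
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Code using a droppable mapping has to tolerate its contents reading back as zero at any time, which is why it suits a cache that can be regenerated.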
      
      Note that this involves removing the MADV_FREE special case from
      sort_folio(), which according to Yu Zhao is unnecessary and will simply
      result in an extra call to shrink_folio_list() in the worst case. The
      chunk removed reenables the swapbacked flag, which we don't want for
      VM_DROPPABLE, and we can't conditionalize it here because there isn't a
      vma reference available.
      
      Finally, the provided self test ensures that this is working as desired.
      
      Cc: linux-mm@kvack.org
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • David Howells's avatar
      cifs: Add a tracepoint to track credits involved in R/W requests · 519be989
      David Howells authored
      
      Add a tracepoint to track the credit changes and server in_flight value
      involved in the lifetime of a R/W request, logging it against the
      request/subreq debugging ID.  This requires the debugging IDs to be
      recorded in the cifs_credits struct.
      
      The tracepoint can be enabled with:
      
      	echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable
      
      Also add a three-state flag to struct cifs_credits to note if we're
      interested in determining when the in_flight contribution ends and, if so,
      to track whether we've decremented the contribution yet.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • David Howells's avatar
      cifs: Fix setting of zero_point after DIO write · 61ea6b3a
      David Howells authored
      At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to
      adjust the size of the file if it needs it.  This will reduce the
      zero_point (the point above which we assume a read will just return zeros)
      if it's more than the new i_size, but won't increase it.
      
      With DIO writes, however, we definitely want to increase it as we have
      clobbered the local pagecache and then written some data that's not
      available locally.
      
      Fix cifs to make the zero_point above the end of a DIO or unbuffered write.
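
A minimal sketch of the intended bookkeeping, using a hypothetical `struct demo_inode` rather than the real netfs structures: after a direct or unbuffered write, the zero_point is raised to at least the end of the written range.

```c
#include <assert.h>
#include <stdint.h>

struct demo_inode {
    uint64_t i_size;       /* file size */
    uint64_t zero_point;   /* reads at or above this are assumed zeros */
};

/* Account for a direct or unbuffered write covering [pos, pos + len). */
static void demo_dio_write_done(struct demo_inode *ino,
                                uint64_t pos, uint64_t len)
{
    uint64_t end = pos + len;

    if (end > ino->i_size)
        ino->i_size = end;
    /*
     * The data now lives only on the server, so reads below 'end' must
     * not be short-circuited to zeros.
     */
    if (ino->zero_point < end)
        ino->zero_point = end;
    assert(ino->zero_point >= end);
}
```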
      
      This fixes corruption seen occasionally with the generic/708 xfstest.  In
      that case, the read-back of some of the written data is being
      short-circuited and replaced with zeroes.
      
      Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib")
      Cc: stable@vger.kernel.org
      Reported-by: Steve French <sfrench@samba.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • David Howells's avatar
      cifs: Fix missing error code set · d2c5eb57
      David Howells authored
      In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by
      a successful return from netfs_start_io_direct(), such that if
      cifs_find_lock_conflict() fails, we don't return an error.
      
      Fix this by resetting the default error code.
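
The pattern can be reproduced in a few lines of userspace C. `demo_start_io()` and `demo_read()` are hypothetical stand-ins, and the `fixed` parameter exists only so the sketch can show both behaviours side by side.

```c
#include <assert.h>
#include <errno.h>

static int demo_start_io(void) { return 0; }   /* always succeeds here */

static int demo_read(int has_lock_conflict, int fixed)
{
    int rc = -EACCES;            /* default: access denied */

    rc = demo_start_io();        /* success overwrites the default */
    if (rc < 0)
        return rc;
    assert(rc == 0);

    if (fixed)
        rc = -EACCES;            /* restore the default before the check */

    if (!has_lock_conflict)
        rc = 16;                 /* pretend 16 bytes were read */
    return rc;
}
```

Without the reset, a lock conflict falls through with `rc == 0` and is reported as success.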
      
      Fixes: 14b1cd25 ("cifs: Fix locking in cifs_strict_readv()")
      Cc: stable@vger.kernel.org
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • David Howells's avatar
      cifs: Fix server re-repick on subrequest retry · de40579b
      David Howells authored
      When a subrequest is marked for needing retry, netfs will call
      cifs_prepare_write() which will make cifs repick the server for the op
      before renegotiating credits; it then calls cifs_issue_write() which
      invokes smb2_async_writev() - which re-repicks the server.
      
      If a different server is then selected, this causes the increment of
      server->in_flight to happen against one record and the decrement to happen
      against another, leading to misaccounting.
      
      Fix this by just removing the repick code in smb2_async_writev().  As this
      is only called from netfslib-driven code, cifs_prepare_write() should
      always have been called first, and so server should never be NULL and the
      preparatory step is repeated in the event that we do a retry.
      
      The problem manifests as a warning looking something like:
      
       WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs]
       ...
       RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs]
       ...
        smb2_writev_callback+0x334/0x560 [cifs]
        cifs_demultiplex_thread+0x77a/0x11b0 [cifs]
        kthread+0x187/0x1d0
        ret_from_fork+0x34/0x60
        ret_from_fork_asm+0x1a/0x30
      
      Which may be triggered by a number of different xfstests running against an
      Azure server in multichannel mode.  generic/249 seems the most repeatable,
      but generic/215 and generic/308 may also show it.
      
      Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib")
      Cc: stable@vger.kernel.org
      Reported-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
      Acked-by: Tom Talpey <tom@talpey.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Jeff Layton <jlayton@kernel.org>
      cc: Aurelien Aptel <aaptel@suse.com>
      cc: linux-cifs@vger.kernel.org
      cc: netfs@lists.linux.dev
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • Steve French's avatar
      cifs: fix noisy message on copy_file_range · ae4ccca4
      Steve French authored
      
      There are common cases where copy_file_range can noisily
      log "source and target of copy not on same server",
      e.g. the mv command across mounts of shares on two different servers.
      Change this to informational rather than logging as an error.
      
      A follow-on patch will add dynamic trace points, e.g. for
      cifs_file_copychunk_range.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • Qu Wenruo's avatar
      btrfs: change BTRFS_MOUNT_* flags to 64bit type · c3ece6b7
      Qu Wenruo authored
      The BTRFS_MOUNT_* flags already extend beyond 32 bits. This is
      going to cause compilation errors on some 32-bit systems, as their
      unsigned long is only 32 bits wide, so the flag
      BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors.
      
      Fix the problem by:
      
      - Migrate all existing BTRFS_MOUNT_* flags to unsigned long long
      - Migrate all mount option related variables to unsigned long long
        * btrfs_fs_info::mount_opt
        * btrfs_fs_context::mount_opt
        * mount_opt parameter of btrfs_check_options()
        * old_opts parameter of btrfs_remount_begin()
        * old_opts parameter of btrfs_remount_cleanup()
        * mount_opt parameter of btrfs_check_mountopts_zoned()
        * mount_opt and opt parameters of check_ro_option()
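
The width problem is easy to demonstrate outside the kernel. In the sketch below `DEMO_FLAG_BIT32` is a hypothetical flag at bit 32, and `uint32_t` stands in for a 32-bit `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_FLAG_BIT32 (1ULL << 32)   /* hypothetical flag above bit 31 */

/* Storing the flag in a 32-bit field silently drops it. */
static int flag_survives_32(void)
{
    uint32_t opts = (uint32_t)DEMO_FLAG_BIT32;

    return opts != 0;
}

/* A 64-bit field keeps it. */
static int flag_survives_64(void)
{
    unsigned long long opts = DEMO_FLAG_BIT32;

    assert(sizeof(opts) * 8 >= 64);
    return (opts & DEMO_FLAG_BIT32) != 0;
}
```

Widening the flag definitions alone is not enough; every variable and parameter that carries the option mask has to be widened as well, which is what the list above covers.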
      
      Fixes: 32e62165 ("btrfs: introduce new "rescue=ignoresuperflags" mount option")
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. Jul 18, 2024
  7. Jul 17, 2024