- Jul 26, 2024
-
-
Ryusuke Konishi authored
Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987 ("nilfs2: B-tree node cache") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Jul 24, 2024
-
-
Joel Granados authored
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - stru...
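To make the transformation concrete, here is a hedged before/after of a typical handler signature in C; the handler name is hypothetical, and only the const qualifier on the first parameter changes:
```
/* Before: the handler could, in principle, modify the table entry. */
static int example_dointvec_handler(struct ctl_table *ctl, int write,
				    void *buffer, size_t *lenp, loff_t *ppos);

/* After: the table is read-only from the handler's point of view, which is
 * what lets the static ctl_table arrays eventually move into .rodata. */
static int example_dointvec_handler(const struct ctl_table *ctl, int write,
				    void *buffer, size_t *lenp, loff_t *ppos);
```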
-
Linus Torvalds authored
Commit e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio") simplified hostfs_read_folio(), but in the process of converting to using folios natively also mis-used the folio_zero_tail() function due to the very confusing API of that function. Very arguably it's folio_zero_tail() API itself that is buggy, since it would make more sense (and the documentation kind of implies) that the third argument would be the pointer to the beginning of the folio buffer. But no, the third argument to folio_zero_tail() is where we should start zeroing the tail (even if we already also pass in the offset separately as the second argument). So fix the hostfs caller, and we can leave any folio_zero_tail() sanity cleanup for later. Reported-and-tested-by:
Maciej Żenczykowski <maze@google.com> Fixes: e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio") Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/ Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
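For reference, a hedged sketch of the call pattern implied by the fix above; the helper name is hypothetical and the hostfs-specific read is elided. The point is that folio_zero_tail()'s third argument is the mapped address at which zeroing starts, not the base of the folio buffer:
```
/* Sketch: zero the part of a folio that a short read left unfilled.
 * 'bytes_read' is how many bytes were actually read into the folio. */
static void zero_read_tail(struct folio *folio, size_t bytes_read)
{
	void *kaddr = kmap_local_folio(folio, 0);

	/* Both the offset and the address corresponding to that offset are
	 * passed; folio_zero_tail() returns an address for kunmap_local(). */
	kaddr = folio_zero_tail(folio, bytes_read, kaddr + bytes_read);
	kunmap_local(kaddr);
}
```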
-
Christian Brauner authored
In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held but the inode is unhashed. We then release the inode_lock. So using "locked" as parameter name is confusing. Use is_inode_hash_locked as parameter name instead. Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
When using cachefiles, lockdep may emit something similar to the circular locking dependency notice below. The problem appears to stem from the following: (1) Cachefiles manipulates xattrs on the files in its cache when called from ->writepages(). (2) The setxattr() and removexattr() system call handlers get the name (and value) from userspace after taking the sb_writers lock, putting accesses of the vma->vm_lock and mm->mmap_lock inside of that. (3) The afs filesystem uses a per-inode lock to prevent multiple revalidation RPCs and in writeback vs truncate to prevent parallel operations from deadlocking against the server on one side and local page locks on the other. Fix this by moving the getting of the name and value in {get,remove}xattr() outside of the sb_writers lock. This also has the minor benefits that we don't need to reget these in the event of a retry and we never try to take the sb_writers lock in the event we can't pull the name and value into the kernel. Alternative approaches that might fix this include moving the dispatch of a write to the cache off to a workqueue or trying to do without the validation lock in afs. Note that this might also affect other filesystems that use netfslib and/or cachefiles. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-build2+ #956 Not tainted ------------------------------------------------------ fsstress/6050 is trying to acquire lock: ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0 but task is already holding lock: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&vma->vm_lock->lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_write+0x3b/0x50 vma_start_write+0x6b/0xa0 vma_link+0xcc/0x140 insert_vm_struct+0xb7/0xf0 alloc_bprm+0x2c1/0x390 kernel_execve+0x65/0x1a0 call_usermodehelper_exec_async+0x14d/0x190 ret_from_fork+0x24/0x40 ret_from_fork_asm+0x1a/0x30 -> #3 (&mm->mmap_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 __might_fault+0x7c/0xb0 strncpy_from_user+0x25/0x160 removexattr+0x7f/0x100 __do_sys_fremovexattr+0x7e/0xb0 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (sb_writers#14){.+.+}-{0:0}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 percpu_down_read+0x3c/0x90 vfs_iocb_iter_write+0xe9/0x1d0 __cachefiles_write+0x367/0x430 cachefiles_issue_write+0x299/0x2f0 netfs_advance_write+0x117/0x140 netfs_write_folio.isra.0+0x5ca/0x6e0 netfs_writepages+0x230/0x2f0 afs_writepages+0x4d/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 __filemap_fdatawrite_range+0xa8/0xf0 file_write_and_wait_range+0x59/0x90 afs_release+0x10f/0x270 __fput+0x25f/0x3d0 __do_sys_close+0x43/0x70 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&vnode->validate_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 afs_writepages+0x37/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 filemap_invalidate_inode+0x167/0x1e0 netfs_unbuffered_write_iter+0x1bd/0x2d0 vfs_write+0x22e/0x320 ksys_write+0xbc/0x130 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (mapping.invalidate_lock#3){++++}-{3:3}: check_noncircular+0x119/0x160 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 
down_read+0x95/0x200 filemap_fault+0x26e/0x8b0 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&vma->vm_lock->lock); lock(&mm->mmap_lock); lock(&vma->vm_lock->lock); rlock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by fsstress/6050: #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 stack backtrace: CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x80 check_noncircular+0x119/0x160 ? queued_spin_lock_slowpath+0x4be/0x510 ? __pfx_check_noncircular+0x10/0x10 ? __pfx_queued_spin_lock_slowpath+0x10/0x10 ? mark_lock+0x47/0x160 ? init_chain_block+0x9c/0xc0 ? add_chain_block+0x84/0xf0 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 ? __pfx___lock_acquire+0x10/0x10 ? __lock_release.isra.0+0x13b/0x230 lock_acquire.part.0+0x103/0x280 ? filemap_fault+0x26e/0x8b0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? lock_acquire+0xd7/0x120 down_read+0x95/0x200 ? filemap_fault+0x26e/0x8b0 ? __pfx_down_read+0x10/0x10 ? __filemap_get_folio+0x25/0x1a0 filemap_fault+0x26e/0x8b0 ? __pfx_filemap_fault+0x10/0x10 ? find_held_lock+0x7c/0x90 ? __pfx___lock_release.isra.0+0x10/0x10 ? __pte_offset_map+0x99/0x110 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 ? __pfx___handle_mm_fault+0x10/0x10 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Gao Xiang <xiang@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org [brauner: fix minor issues] Signed-off-by:
Christian Brauner <brauner@kernel.org>
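Below is a hedged sketch of the reordering described in the entry above, shown for the removexattr() side; the function name and surrounding details are assumptions rather than the actual patch. The point is that the userspace copy, which may fault and take mm->mmap_lock / vma->vm_lock, now happens before sb_writers is taken via mnt_want_write():
```
/* Sketch only: copy the xattr name in before taking write access. */
static int removexattr_reordered(struct mnt_idmap *idmap,
				 const struct path *path,
				 const char __user *uname)
{
	char kname[XATTR_NAME_MAX + 1];
	int error;

	/* May fault, so do it outside of the sb_writers freeze protection. */
	error = strncpy_from_user(kname, uname, sizeof(kname));
	if (error == 0 || error == sizeof(kname))
		return -ERANGE;
	if (error < 0)
		return error;

	/* Only now take sb_writers; the user copy can no longer nest inside it. */
	error = mnt_want_write(path->mnt);
	if (error)
		return error;
	error = vfs_removexattr(idmap, path->dentry, kname);
	mnt_drop_write(path->mnt);
	return error;
}
```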
-
Jann Horn authored
When I wrote commit 3cad1bc0 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621b ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Christian Brauner authored
The counter is unconditionally incremented for each mount allocation. If we set it to 1ULL << 32 we're losing 4294967296 as the first valid non-32 bit mount id. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
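A hedged arithmetic illustration of the point above, with a plain counter standing in for the kernel's atomic mount-id counter: with unconditional pre-increment, starting at 1ULL << 32 means the first id handed out is (1ULL << 32) + 1, so 4294967296 itself is never allocated.
```
#include <stdio.h>

int main(void)
{
	/* Stand-in for the mount id counter; the real one is atomic. */
	unsigned long long mnt_id_ctr = 1ULL << 32;

	/* Allocation increments unconditionally, then uses the value. */
	unsigned long long first_id = ++mnt_id_ctr;

	printf("first allocated id: %llu\n", first_id);   /* 4294967297 */
	printf("never allocated:    %llu\n", 1ULL << 32); /* 4294967296 */
	return 0;
}
```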
-
David Howells authored
Set the maximum size of a subrequest that writes to cachefiles to be MAX_RW_COUNT so that we don't overrun the maximum write we can make to the backing filesystem. Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
When netfslib is performing writeback (ie. ->writepages), it maintains two parallel streams of writes, one to the server and one to the cache, but it doesn't mark either stream of writes as active until it gets some data that needs to be written to that stream. This is done because some folios will only be written to the cache (e.g. copying to the cache on read is done by marking the folios and letting writeback do the actual work) and sometimes we'll only be writing to the server (e.g. if there's no cache). Now, since we don't actually dispatch uploads and cache writes in parallel, but rather flip between the streams, depending on which has the lowest so-far-issued offset, and don't wait for the subreqs to finish before flipping, we can end up in a situation where, say, we issue a write to the server and this completes before we start the write to the cache. But because we only activate a stream when we first add a subreq to it, the result...
-
Christian Brauner authored
The nsproxy structure contains nearly all of the namespaces associated with a task. When a given namespace type is not supported by this kernel, the rules for whether the corresponding pointer in struct nsproxy is NULL or always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be the case and for all namespace types we'd always set it to init_<ns_type>_ns when the corresponding namespace type isn't supported. Make sure we handle all namespaces where the pointer in struct nsproxy can be NULL when the namespace type isn't supported. Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner Fixes: 5b08bd40 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Edward Adam Davis authored
syzbot called pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" on a kernel with CONFIG_TIME_NS disabled; since time_ns is NULL there, this causes a NULL pointer dereference in open_namespace(). Fixes: 5b08bd40 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Reported-and-tested-by:
<syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d Signed-off-by:
Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
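A hedged sketch of the kind of guard the fix implies before handing the namespace to open_namespace(); the helper name, error code and refcounting details are assumptions:
```
/* Sketch: PIDFD_GET_TIME_NAMESPACE with CONFIG_TIME_NS=n leaves
 * nsproxy->time_ns NULL, so bail out instead of dereferencing it. */
static struct ns_common *pidfd_time_ns_or_err(struct nsproxy *nsp)
{
	if (!nsp->time_ns)
		return ERR_PTR(-EOPNOTSUPP);
	return &nsp->time_ns->ns;
}
```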
-
Congjie Zhou authored
Correct the comments of the vfs_*() helpers in fs/namei.c, including: 1. vfs_create() 2. vfs_mknod() 3. vfs_mkdir() 4. vfs_rmdir() 5. vfs_symlink() All of them come from the same commit: 6521f891 ("namei: prepare for idmapped mounts"). The @dentry is actually the dentry of the child directory rather than of the base (parent) directory, and thus the description of @dir has to be adjusted to match the change to @dentry. Signed-off-by:
Congjie Zhou <zcjie0802@qq.com> Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
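As an illustration of the comment fix above, a hedged example of how the corrected vfs_mkdir() kerneldoc might read; the exact wording is an assumption, the point is that @dentry names the child while @dir is the parent's inode:
```
/**
 * vfs_mkdir - create directory
 * @idmap:	idmap of the mount the inode was found from
 * @dir:	inode of the parent directory
 * @dentry:	dentry of the child directory to be created
 * @mode:	mode of the child directory
 */
```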
-
Mateusz Guzik authored
Lockless hash lookup can find and lock the inode after it gets the I_FREEING flag set, at which point it blocks waiting for teardown in evict() to finish. However, the flag is still set even after evict() wakes up all waiters. This results in a race where if the inode lock is taken late enough, it can happen after both hash removal and wakeups, meaning there is nobody to wake the racing thread up. This worked prior to RCU-based lookup because the entire ordeal was synchronized with the inode hash lock. Since unhashing requires the inode lock, we can safely check whether it happened after acquiring it. Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/ Reported-by:
Dominique Martinet <asmadeus@codewreck.org> Fixes: 7180f8d9 ("vfs: add rcu-based find_inode variants for iget ops") Signed-off-by:
Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com Reviewed-by: Jan Kara <j...
-
David Howells authored
CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do that now. Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
Revert commit 163eae0f to get back the original operation of the debugging macros. Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Jul 22, 2024
-
-
Kees Cook authored
Move the exec KUnit tests into a separate directory to avoid polluting the local directory namespace. Additionally update MAINTAINERS for the new files. Reviewed-by:
David Gow <davidgow@google.com> Reviewed-by:
SeongJae Park <sj@kernel.org> Acked-by:
Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240720170310.it.942-kees@kernel.org Signed-off-by:
Kees Cook <kees@kernel.org>
-
Kent Overstreet authored
Reported-by:
<syzbot+f765e51170cf13493f0b@syzkaller.appspotmail.com> Fixes: f12410bb ("bcachefs: Add an error message for insufficient rw journal devs") Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jul 20, 2024
-
-
David Howells authored
A network filesystem needs to implement a netfslib hook to invalidate fscache if it's to be able to use the cache. Fix cifs to implement the cache invalidation hook. Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib") Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- Jul 19, 2024
-
-
Jason A. Donenfeld authored
The vDSO getrandom() implementation works with a buffer allocated with a new system call that has certain requirements:
  - It shouldn't be written to core dumps.
    * Easy: VM_DONTDUMP.
  - It should be zeroed on fork.
    * Easy: VM_WIPEONFORK.
  - It shouldn't be written to swap.
    * Uh-oh: mlock is rlimited.
    * Uh-oh: mlock isn't inherited by forks.
  - It shouldn't reserve actual memory, but it also shouldn't crash when page faulting in memory if none is available.
    * Uh-oh: VM_NORESERVE means segfaults.
It turns out that the vDSO getrandom() function has three really nice characteristics that we can exploit to solve this problem:
  1) Due to being wiped during fork(), the vDSO code is already robust to having the contents of the pages it reads zeroed out midway through the function's execution.
  2) In the absolute worst case of whatever contingency we're coding for, we have the option to fallback to the getrandom() syscall, and everything is fine.
  3) The buffers the function uses are only ever useful for a maximum of 60 seconds -- a sort of cache, rather than a long term allocation.
These characteristics mean that we can introduce VM_DROPPABLE, which has the following semantics:
  a) It never is written out to swap.
  b) Under memory pressure, mm can just drop the pages (so that they're zero when read back again).
  c) It is inherited by fork.
  d) It doesn't count against the mlock budget, since nothing is locked.
  e) If there's not enough memory to service a page fault, it's not fatal, and no signal is sent.
This way, allocations used by vDSO getrandom() can use: VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE
And there will be no problem with OOMing, crashing on overcommitment, using memory when not in use, not wiping on fork(), coredumps, or writing out to swap. In order to let vDSO getrandom() use this, expose these via mmap(2) as MAP_DROPPABLE. Note that this involves removing the MADV_FREE special case from sort_folio(), which according to Yu Zhao is unnecessary and will simply result in an extra call to shrink_folio_list() in the worst case. The chunk removed reenables the swapbacked flag, which we don't want for VM_DROPPABLE, and we can't conditionalize it here because there isn't a vma reference available. Finally, the provided self test ensures that this is working as desired. Cc: linux-mm@kvack.org Acked-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
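A hedged userspace sketch of the new mapping type; it assumes headers and a kernel that already provide MAP_DROPPABLE:
```
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1 << 20;

	/* Droppable anonymous memory: never swapped, may be dropped under
	 * memory pressure (reads back as zeroes), exempt from the mlock
	 * budget, and a failed fault is not fatal. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_ANONYMOUS | MAP_PRIVATE | MAP_DROPPABLE, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap(MAP_DROPPABLE)");
		return 1;
	}

	/* Only cache re-derivable data here, as vDSO getrandom() does. */
	memset(p, 0xaa, len);
	munmap(p, len);
	return 0;
}
```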
-
David Howells authored
Add a tracepoint to track the credit changes and server in_flight value involved in the lifetime of a R/W request, logging it against the request/subreq debugging ID. This requires the debugging IDs to be recorded in the cifs_credits struct. The tracepoint can be enabled with: echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable Also add a three-state flag to struct cifs_credits to note if we're interested in determining when the in_flight contribution ends and, if so, to track whether we've decremented the contribution yet. Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-
David Howells authored
At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to adjust the size of the file if it needs it. This will reduce the zero_point (the point above which we assume a read will just return zeros) if it's more than the new i_size, but won't increase it. With DIO writes, however, we definitely want to increase it as we have clobbered the local pagecache and then written some data that's not available locally. Fix cifs to make the zero_point above the end of a DIO or unbuffered write. This fixes corruption seen occasionally with the generic/708 xfs-test. In that case, the read-back of some of the written data is being short-circuited and replaced with zeroes. Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by:
Steve French <sfrench@samba.org> Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-
David Howells authored
In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by a successful return from netfs_start_io_direct(), such that if cifs_find_lock_conflict() fails, we don't return an error. Fix this by resetting the default error code. Fixes: 14b1cd25 ("cifs: Fix locking in cifs_strict_readv()") Cc: stable@vger.kernel.org Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-
David Howells authored
When a subrequest is marked for needing retry, netfs will call cifs_prepare_write() which will make cifs repick the server for the op before renegotiating credits; it then calls cifs_issue_write() which invokes smb2_async_writev() - which re-repicks the server. If a different server is then selected, this causes the increment of server->in_flight to happen against one record and the decrement to happen against another, leading to misaccounting. Fix this by just removing the repick code in smb2_async_writev(). As this is only called from netfslib-driven code, cifs_prepare_write() should always have been called first, and so server should never be NULL and the preparatory step is repeated in the event that we do a retry. The problem manifests as a warning looking something like: WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] ... RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] ... smb2_writev_callback+0x334/0x560 [cifs] cifs_demultiplex_thread+0x77a/0x11b0 [cifs] kthread+0x187/0x1d0 ret_from_fork+0x34/0x60 ret_from_fork_asm+0x1a/0x30 Which may be triggered by a number of different xfstests running against an Azure server in multichannel mode. generic/249 seems the most repeatable, but generic/215, generic/249 and generic/308 may also show it. Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by:
Steve French <smfrench@gmail.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by:
Tom Talpey <tom@talpey.com> Signed-off-by:
David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Aurelien Aptel <aaptel@suse.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
There are common cases where copy_file_range can noisily log "source and target of copy not on same server", e.g. the mv command across mounts to two different servers' shares. Change this to informational rather than logging it as an error. A follow-on patch will add dynamic trace points, e.g. for cifs_file_copychunk_range. Cc: stable@vger.kernel.org Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
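A hedged before/after of the logging change described above; the exact call site is an assumption, but the mechanism is just dropping the cifs_dbg() severity from VFS (error) to FYI (informational):
```
	/* Before (sketch): a routine cross-server mv shows up as an error. */
	cifs_dbg(VFS, "source and target of copy not on same server\n");

	/* After (sketch): logged as informational only. */
	cifs_dbg(FYI, "source and target of copy not on same server\n");
```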
-
Qu Wenruo authored
The BTRFS_MOUNT_* flags now extend beyond 32 bits. This causes compilation errors on some 32-bit systems, as their unsigned long is only 32 bits wide, so the flag BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors. Fix the problem by:
  - Migrating all existing BTRFS_MOUNT_* flags to unsigned long long
  - Migrating all mount option related variables to unsigned long long:
    * btrfs_fs_info::mount_opt
    * btrfs_fs_context::mount_opt
    * mount_opt parameter of btrfs_check_options()
    * old_opts parameter of btrfs_remount_begin()
    * old_opts parameter of btrfs_remount_cleanup()
    * mount_opt parameter of btrfs_check_mountopts_zoned()
    * mount_opt and opt parameters of check_ro_option()
Fixes: 32e62165 ("btrfs: introduce new "rescue=ignoresuperflags" mount option") Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
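A stand-alone C illustration of the 32-bit issue above (not the btrfs code itself): on a 32-bit target, unsigned long is 32 bits, so a flag defined past bit 31 cannot be represented, while unsigned long long always provides 64 bits:
```
#include <stdio.h>

int main(void)
{
	/* On a 32-bit system, (1UL << 32) shifts by the full width of the
	 * type, which is undefined and is what made a flag like
	 * BTRFS_MOUNT_IGNORESUPERFLAGS break the build there. */
	unsigned long long wide_flag = 1ULL << 32;

	printf("sizeof(unsigned long)      = %zu\n", sizeof(unsigned long));
	printf("sizeof(unsigned long long) = %zu\n", sizeof(unsigned long long));
	printf("1ULL << 32                 = %llu\n", wide_flag);
	return 0;
}
```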
-
- Jul 18, 2024
-
-
Kent Overstreet authored
When we're called via trans commit -> btree split -> allocator, we may already have arbitrarily many btree_paths for the transaction commit we're trying to do; when this happens, the btree_trans_too_many_iters() call causes us to livelock. Since the allocator calls btree_iter_dontneed to release paths as it iterates, this shouldn't cause any problems. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Tavian Barnes authored
Shifting a value by the width of its type or more is undefined. Signed-off-by:
Tavian Barnes <tavianator@tavianator.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
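For context, a minimal stand-alone example of the undefined behaviour being avoided (not the bcachefs code itself): shifting a value by its type's width or more is undefined in C.
```
#include <stdint.h>

uint32_t shift_undefined(uint32_t v, unsigned int n)
{
	/* Undefined if n >= 32: shift count >= width of uint32_t. */
	return v << n;
}

uint32_t shift_guarded(uint32_t v, unsigned int n)
{
	/* One common guard: handle out-of-range counts explicitly. */
	return n >= 32 ? 0 : v << n;
}
```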
-
Kent Overstreet authored
We can't have more updates than paths, so btree_path_idx_t is the correct type to use. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
If a btree_trans is in use it's supposed to be passed to fsck_err so that it can be unlocked if we're waiting on userspace input; but the btree IO paths do call fsck errors where a btree_trans exists on the stack but it's not passed through. But it's ok, because it's unlocked while doing IO. Fixes: a850bde6 ("bcachefs: fsck_err() may now take a btree_trans") Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
This causes us to go read-only - need an error message saying why. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Tavian Barnes authored
Shifting a negative value left is undefined. Signed-off-by:
Tavian Barnes <tavianator@tavianator.com> Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
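Likewise, a minimal stand-alone example of the undefined behaviour in question (not the bcachefs code itself): left-shifting a negative value is undefined in C.
```
#include <stdint.h>

int64_t negative_shift_undefined(int64_t v)
{
	/* Undefined whenever v is negative. */
	return v << 1;
}

int64_t negative_shift_defined(int64_t v)
{
	/* Shift the unsigned representation instead; converting back is
	 * implementation-defined rather than undefined. */
	return (int64_t)((uint64_t)v << 1);
}
```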
-
Christian Brauner authored
Ensure that rcu read lock is given up before returning. Link: https://lore.kernel.org/r/20240716-elixier-fliesen-1ab342151a61@brauner Fixes: ca567df7 ("nsfs: add pid translation ioctls") Reported-by:
<syzbot+a3e82ae343b26b4d2335@syzkaller.appspotmail.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Jeff Johnson authored
Fix the 'make W=1' issue: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o Signed-off-by:
Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240523-md-adfs-v1-1-364268e38370@quicinc.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Donet Tom authored
generic_hugetlb_get_unmapped_area() was returning an address less than mmap_min_addr if the mmap argument addr, after alignment, was less than mmap_min_addr, causing mmap to fail. This is because current generic_hugetlb_get_unmapped_area() code does not take into account mmap_min_addr. This patch ensures that generic_hugetlb_get_unmapped_area() always returns an address that is greater than mmap_min_addr. Additionally, similar to generic_get_unmapped_area(), vm_end_gap() checks are included to maintain stack gap.

How to reproduce
================

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define HUGEPAGE_SIZE (16 * 1024 * 1024)

int main()
{
	void *addr = mmap((void *)-1, HUGEPAGE_SIZE,
			  PROT_READ | PROT_WRITE,
			  MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(EXIT_FAILURE);
	}
	snprintf((char *)addr, HUGEPAGE_SIZE, "Hello, Huge Pages!");
	printf("%s\n", (char *)addr);
	if (munmap(addr, HUGEPAGE_SIZE) == -1) {
		perror("munmap");
		exit(EXIT_FAILURE);
	}
	return 0;
}

Result without fix
==================
# cat /proc/meminfo |grep -i HugePages_Free
HugePages_Free: 20
# ./test
mmap: Permission denied
#

Result with fix
===============
# cat /proc/meminfo |grep -i HugePages_Free
HugePages_Free: 20
# ./test
Hello, Huge Pages!
#

Link: https://lkml.kernel.org/r/20240710051912.4681-1-donettom@linux.ibm.com Signed-off-by:
Donet Tom <donettom@linux.ibm.com> Reported-by: Pavithra Prakash <pavrampu@linux.vnet.ibm.com> Acked-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Tony Battersby <tonyb@cybernetics.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Jul 17, 2024
-
-
Christoph Hellwig authored
nfs_read_folio is a bit hard to follow because it mixes high-level logic with the actual data read. Split the latter into a helper and update the comments to be more accurate. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Christoph Hellwig authored
nfs_folio_length is unsafe to use without holding the folio lock and without a check for a NULL ->f_mapping that protects against truncation, and its use can lead to kernel crashes, e.g. when running xfstests generic/065 with all nfs trace points enabled. Follow the model of the XFS trace points and pass in an explicit offset and length. This has the additional benefit that these values can be more accurate, as some of the users touch partial folio ranges. Fixes: eb5654b3 ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()") Reported-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Jiri Pirko authored
Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by:
Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
Jiri Pirko authored
Instead of passing separate names and callbacks arrays to virtio_find_vqs(), allocate a single array of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by:
Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by:
Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-16-jiri@resnulli.us> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-