- Jul 28, 2024
-
-
Linus Torvalds authored
We only had a couple of array[] declarations, and changing them to just use 'MAX()' instead of 'max()' fixes the issue. This will allow us to simplify our min/max macros enormously, since they can now unconditionally use temporary variables to avoid using the argument values multiple times. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jul 27, 2024
-
-
Hongbo Li authored
hostfs not keep the host directory when mounting. When the host directory is none (default), fc->source is used as the host root directory, and this is wrong. Here we use `parse_monolithic` to handle the old mount path for parsing the root directory. For new mount path, The `parse_param` is used for the host directory parse. Reported-and-tested-by:
Maciej Żenczykowski <maze@google.com> Fixes: cd140ce9 ("hostfs: convert hostfs to use the new mount API") Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/ Cc: Christian Brauner <brauner@kernel.org> Signed-off-by:
Hongbo Li <lihongbo22@huawei.com> Link: https://lore.kernel.org/r/20240725065130.1821964-1-lihongbo22@huawei.com [brauner: minor fixes] Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Seth Forshee (DigitalOcean) authored
Christian noticed that it is possible for a privileged user to mount most filesystems with a non-initial user namespace in sb->s_user_ns. When fsopen() is called in a non-init namespace the caller's namespace is recorded in fs_context->user_ns. If the returned file descriptor is then passed to a process priviliged in init_user_ns, that process can call fsconfig(fd_fs, FSCONFIG_CMD_CREATE), creating a new superblock with sb->s_user_ns set to the namespace of the process which called fsopen(). This is problematic. We cannot assume that any filesystem which does not set FS_USERNS_MOUNT has been written with a non-initial s_user_ns in mind, increasing the risk for bugs and security issues. Prevent this by returning EPERM from sget_fc() when FS_USERNS_MOUNT is not set for the filesystem and a non-initial user namespace will be used. sget() does not need to be updated as it always uses the user namespace of the current context, or the initial user namespace if SB_SUBMOUNT is set. Fixes: cb50b348 ("convenience helpers: vfs_get_super() and sget_fc()") Reported-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Seth Forshee (DigitalOcean) <sforshee@kernel.org> Link: https://lore.kernel.org/r/20240724-s_user_ns-fix-v1-1-895d07c94701@kernel.org Reviewed-by:
Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Jul 26, 2024
-
-
Ryusuke Konishi authored
Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987 ("nilfs2: B-tree node cache") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Steve French authored
There are cases where services need to remount (or change their credentials files) when keys have expired, but it can be helpful to have a dynamic trace point to make it easier to notify the service to refresh the storage account key. Here is sample output, one from mount with bad password, one from a reconnect where the password has been changed or expired and reconnect fails (requiring remount with new storage account key) TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | mount.cifs-11362 [000] ..... 6000.241620: smb3_key_expired: rc=-13 user=testpassu conn_id=0x2 server=localhost addr=127.0.0.1:445 kworker/4:0-8458 [004] ..... 6044.892283: smb3_key_expired: rc=-13 user=testpassu conn_id=0x3 server=localhost addr=127.0.0.1:445 Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
Add more dynamic tracepoints to help debug copy_file_range (copychunk) and clone_range ("duplicate extents"). These are tracepoints for entering the function and completing without error. For example: "trace-cmd record -e smb3_copychunk_enter -e smb3_copychunk_done" or "trace-cmd record -e smb3_clone_enter -e smb3_clone_done" Here is sample output: TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | cp-5964 [005] ..... 2176.168977: smb3_clone_enter: xid=17 sid=0xeb275be4 tid=0x7ffa7cdb source fid=0x1ed02e15 source offset=0x0 target fid=0x1ed02e15 target offset=0x0 len=0xa0000 cp-5964 [005] ..... 2176.170668: smb3_clone_done: xid=17 sid=0xeb275be4 tid=0x7ffa7cdb source fid=0x1ed02e15 source offset=0x0 target fid=0x1ed02e15 target offset=0x0 len=0xa0000 Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
There are cases where debugging clone_range ("smb2_duplicate_extents" function) and in the future copy_range ("smb2_copychunk_range") can be helpful. Add dynamic trace points for any errors in clone, and a followon patch will add them for copychunk. "trace-cmd record -e smb3_clone_err" Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Chen Ni authored
Replace a comma between expression statements by a semicolon. Signed-off-by:
Chen Ni <nichen@iscas.ac.cn> Link: https://lore.kernel.org/r/20240724020721.2389738-1-nichen@iscas.ac.cn Reviewed-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com>
-
Gao Xiang authored
If the requested page is part of the previous multi-page folio, there is no need to call read_mapping_folio() again. Also, get rid of the remaining one of page->index [1] in our codebase. [1] https://lore.kernel.org/r/Zp8fgUSIBGQ1TN0D@casper.infradead.org Cc: Matthew Wilcox <willy@infradead.org> Reviewed-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240723073024.875290-1-hsiangkao@linux.alibaba.com
-
Huang Xiaojia authored
FS_IOC_GETFSSYSFSPATH ioctl exposes /sys/fs path of a given filesystem, potentially standarizing sysfs reporting. This patch add support for FS_IOC_GETFSSYSFSPATH for erofs, "erofs/<dev>" will be outputted for bdev cases, "erofs/[domain_id,]<fs_id>" will be outputted for fscache cases. Signed-off-by:
Huang Xiaojia <huangxiaojia2@huawei.com> Link: https://lore.kernel.org/r/20240720082335.441563-1-huangxiaojia2@huawei.com Reviewed-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com>
-
Gao Xiang authored
In z_erofs_get_gbuf(), the current task may be migrated to another CPU between `z_erofs_gbuf_id()` and `spin_lock(&gbuf->lock)`. Therefore, z_erofs_put_gbuf() will trigger the following issue which was found by stress test: <2>[772156.434168] kernel BUG at fs/erofs/zutil.c:58! .. <4>[772156.435007] <4>[772156.439237] CPU: 0 PID: 3078 Comm: stress Kdump: loaded Tainted: G E 6.10.0-rc7+ #2 <4>[772156.439239] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.0.0 01/01/2017 <4>[772156.439241] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) <4>[772156.439243] pc : z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.439252] lr : z_erofs_lz4_decompress+0x600/0x6a0 [erofs] .. <6>[772156.445958] stress (3127): drop_caches: 1 <4>[772156.446120] Call trace: <4>[772156.446121] z_erofs_put_gbuf+0x64/0x70 [erofs] <4>[772156.446761] z_erofs_lz4_decompress+0x600/0x6a0 [erofs] <4>[772156.446897] z_erofs_decompress_queue+0x740/0xa10 [erofs] <4>[772156.447036] z_erofs_runqueue+0x428/0x8c0 [erofs] <4>[772156.447160] z_erofs_readahead+0x224/0x390 [erofs] .. Fixes: f36f3010 ("erofs: rename per-CPU buffers to global buffer pool and make it configurable") Cc: <stable@vger.kernel.org> # 6.10+ Reviewed-by:
Chunhai Guo <guochunhai@vivo.com> Reviewed-by:
Sandeep Dhavale <dhavale@google.com> Reviewed-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240722035110.3456740-1-hsiangkao@linux.alibaba.com
-
Hongbo Li authored
Add support for STATX_DIOALIGN to EROFS, so that direct I/O alignment restrictions are exposed to userspace in a generic way. [Before] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:0 dio offset align:0 ``` [After] ``` ./statx_test /mnt/erofs/testfile statx(/mnt/erofs/testfile) = 0 dio mem align:512 dio offset align:512 ``` Signed-off-by:
Hongbo Li <lihongbo22@huawei.com> Reviewed-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240718083243.2485437-1-hsiangkao@linux.alibaba.com
-
- Jul 24, 2024
-
-
Joel Granados authored
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - stru...
-
Linus Torvalds authored
Commit e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio") simplified hostfs_read_folio(), but in the process of converting to using folios natively also mis-used the folio_zero_tail() function due to the very confusing API of that function. Very arguably it's folio_zero_tail() API itself that is buggy, since it would make more sense (and the documentation kind of implies) that the third argument would be the pointer to the beginning of the folio buffer. But no, the third argument to folio_zero_tail() is where we should start zeroing the tail (even if we already also pass in the offset separately as the second argument). So fix the hostfs caller, and we can leave any folio_zero_tail() sanity cleanup for later. Reported-and-tested-by:
Maciej Żenczykowski <maze@google.com> Fixes: e3ec0fe9 ("hostfs: Convert hostfs_read_folio() to use a folio") Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/ Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Christian Brauner authored
In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held but the inode is unhashed. We then release the inode_lock. So using "locked" as parameter name is confusing. Use is_inode_hash_locked as parameter name instead. Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
When using cachefiles, lockdep may emit something similar to the circular locking dependency notice below. The problem appears to stem from the following: (1) Cachefiles manipulates xattrs on the files in its cache when called from ->writepages(). (2) The setxattr() and removexattr() system call handlers get the name (and value) from userspace after taking the sb_writers lock, putting accesses of the vma->vm_lock and mm->mmap_lock inside of that. (3) The afs filesystem uses a per-inode lock to prevent multiple revalidation RPCs and in writeback vs truncate to prevent parallel operations from deadlocking against the server on one side and local page locks on the other. Fix this by moving the getting of the name and value in {get,remove}xattr() outside of the sb_writers lock. This also has the minor benefits that we don't need to reget these in the event of a retry and we never try to take the sb_writers lock in the event we can't pull the name and value into the kernel. Alternative approaches that might fix this include moving the dispatch of a write to the cache off to a workqueue or trying to do without the validation lock in afs. Note that this might also affect other filesystems that use netfslib and/or cachefiles. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-build2+ #956 Not tainted ------------------------------------------------------ fsstress/6050 is trying to acquire lock: ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0 but task is already holding lock: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&vma->vm_lock->lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_write+0x3b/0x50 vma_start_write+0x6b/0xa0 vma_link+0xcc/0x140 insert_vm_struct+0xb7/0xf0 alloc_bprm+0x2c1/0x390 kernel_execve+0x65/0x1a0 call_usermodehelper_exec_async+0x14d/0x190 ret_from_fork+0x24/0x40 ret_from_fork_asm+0x1a/0x30 -> #3 (&mm->mmap_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 __might_fault+0x7c/0xb0 strncpy_from_user+0x25/0x160 removexattr+0x7f/0x100 __do_sys_fremovexattr+0x7e/0xb0 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (sb_writers#14){.+.+}-{0:0}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 percpu_down_read+0x3c/0x90 vfs_iocb_iter_write+0xe9/0x1d0 __cachefiles_write+0x367/0x430 cachefiles_issue_write+0x299/0x2f0 netfs_advance_write+0x117/0x140 netfs_write_folio.isra.0+0x5ca/0x6e0 netfs_writepages+0x230/0x2f0 afs_writepages+0x4d/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 __filemap_fdatawrite_range+0xa8/0xf0 file_write_and_wait_range+0x59/0x90 afs_release+0x10f/0x270 __fput+0x25f/0x3d0 __do_sys_close+0x43/0x70 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&vnode->validate_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 afs_writepages+0x37/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 filemap_invalidate_inode+0x167/0x1e0 netfs_unbuffered_write_iter+0x1bd/0x2d0 vfs_write+0x22e/0x320 ksys_write+0xbc/0x130 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (mapping.invalidate_lock#3){++++}-{3:3}: check_noncircular+0x119/0x160 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 filemap_fault+0x26e/0x8b0 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&vma->vm_lock->lock); lock(&mm->mmap_lock); lock(&vma->vm_lock->lock); rlock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by fsstress/6050: #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 stack backtrace: CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x80 check_noncircular+0x119/0x160 ? queued_spin_lock_slowpath+0x4be/0x510 ? __pfx_check_noncircular+0x10/0x10 ? __pfx_queued_spin_lock_slowpath+0x10/0x10 ? mark_lock+0x47/0x160 ? init_chain_block+0x9c/0xc0 ? add_chain_block+0x84/0xf0 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 ? __pfx___lock_acquire+0x10/0x10 ? __lock_release.isra.0+0x13b/0x230 lock_acquire.part.0+0x103/0x280 ? filemap_fault+0x26e/0x8b0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? lock_acquire+0xd7/0x120 down_read+0x95/0x200 ? filemap_fault+0x26e/0x8b0 ? __pfx_down_read+0x10/0x10 ? __filemap_get_folio+0x25/0x1a0 filemap_fault+0x26e/0x8b0 ? __pfx_filemap_fault+0x10/0x10 ? find_held_lock+0x7c/0x90 ? __pfx___lock_release.isra.0+0x10/0x10 ? __pte_offset_map+0x99/0x110 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 ? __pfx___handle_mm_fault+0x10/0x10 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Gao Xiang <xiang@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org [brauner: fix minor issues] Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Jann Horn authored
When I wrote commit 3cad1bc0 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621b ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Christian Brauner authored
The counter is unconditionally incremented for each mount allocation. If we set it to 1ULL << 32 we're losing 4294967296 as the first valid non-32 bit mount id. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
Set the maximum size of a subrequest that writes to cachefiles to be MAX_RW_COUNT so that we don't overrun the maximum write we can make to the backing filesystem. Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
When netfslib is performing writeback (ie. ->writepages), it maintains two parallel streams of writes, one to the server and one to the cache, but it doesn't mark either stream of writes as active until it gets some data that needs to be written to that stream. This is done because some folios will only be written to the cache (e.g. copying to the cache on read is done by marking the folios and letting writeback do the actual work) and sometimes we'll only be writing to the server (e.g. if there's no cache). Now, since we don't actually dispatch uploads and cache writes in parallel, but rather flip between the streams, depending on which has the lowest so-far-issued offset, and don't wait for the subreqs to finish before flipping, we can end up in a situation where, say, we issue a write to the server and this completes before we start the write to the cache. But because we only activate a stream when we first add a subreq to it, the result...
-
Christian Brauner authored
The nsproxy structure contains nearly all of the namespaces associated with a task. When a given namespace type is not supported by this kernel the rules whether the corresponding pointer in struct nsproxy is NULL or always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be the case and for all namespace types we'd always set it to init_<ns_type>_ns when the corresponding namespace type isn't supported. Make sure we handle all namespaces where the pointer in struct nsproxy can be NULL when the namespace type isn't supported. Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner Fixes: 5b08bd40 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Edward Adam Davis authored
syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in open_namespace. Fixes: 5b08bd40 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Reported-and-tested-by:
<syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d Signed-off-by:
Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Congjie Zhou authored
correct the comments of vfs_*() helpers in fs/namei.c, including: 1. vfs_create() 2. vfs_mknod() 3. vfs_mkdir() 4. vfs_rmdir() 5. vfs_symlink() All of them come from the same commit: 6521f891 "namei: prepare for idmapped mounts" The @dentry is actually the dentry of child directory rather than base directory(parent directory), and thus the @dir has to be modified due to the change of @dentry. Signed-off-by:
Congjie Zhou <zcjie0802@qq.com> Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Mateusz Guzik authored
Lockless hash lookup can find and lock the inode after it gets the I_FREEING flag set, at which point it blocks waiting for teardown in evict() to finish. However, the flag is still set even after evict() wakes up all waiters. This results in a race where if the inode lock is taken late enough, it can happen after both hash removal and wakeups, meaning there is nobody to wake the racing thread up. This worked prior to RCU-based lookup because the entire ordeal was synchronized with the inode hash lock. Since unhashing requires the inode lock, we can safely check whether it happened after acquiring it. Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/ Reported-by:
Dominique Martinet <asmadeus@codewreck.org> Fixes: 7180f8d9 ("vfs: add rcu-based find_inode variants for iget ops") Signed-off-by:
Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com Reviewed-by: Jan Kara <j...
-
David Howells authored
CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do that now. Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
David Howells authored
Revert commit 163eae0f to get back the original operation of the debugging macros. Signed-off-by:
David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Jul 23, 2024
-
-
Steve French authored
Although by default we negotiate CIFS Unix Extensions for SMB1 mounts to Samba (and they work if the user does not specify "unix" or "posix" or "linux" on mount), and we do properly handle when a user turns them off with "nounix" mount parm. But with the changes to the mount API we broke cases where the user explicitly specifies the "unix" option (or equivalently "linux" or "posix") on mount with vers=1.0 to Samba or other servers which support the CIFS Unix Extensions. "mount error(95): Operation not supported" and logged: "CIFS: VFS: Check vers= mount option. SMB3.11 disabled but required for POSIX extensions" even though CIFS Unix Extensions are supported for vers=1.0 This patch fixes the case where the user specifies both "unix" (or equivalently "posix" or "linux") and "vers=1.0" on mount to a server which supports the CIFS Unix Extensions. Cc: stable@vger.kernel.org Reviewed-by:
David Howells <dhowell@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
When mounting with the SMB1 Unix Extensions (e.g. mounts to Samba with vers=1.0), reconnects no longer reset the Unix Extensions (SetFSInfo SET_FILE_UNIX_BASIC) after tcon so most operations (e.g. stat, ls, open, statfs) will fail continuously with: "Operation not supported" if the connection ever resets (e.g. due to brief network disconnect) Cc: stable@vger.kernel.org Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
Dan Carpenter reported a Smack static checker warning: fs/smb/client/cifsfs.c:1981 init_cifs() error: we previously assumed 'serverclose_wq' could be null (see line 1895) The patch which introduced the serverclose workqueue used the wrong oredering in error paths in init_cifs() for freeing it on errors. Fixes: 173217bd ("smb3: retrying on failed server close") Cc: stable@vger.kernel.org Cc: Ritvik Budhiraja <rbudhiraja@microsoft.com> Reported-by:
Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by:
Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by:
David Howells <dhowell@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
ethanwu authored
The kmalloc size of pagevec mempool is incorrectly calculated. It misses the size of page pointer and only accounts the number for the array. Fixes: a0102bda ("ceph: move sb->wb_pagevec_pool to be a global mempool") Signed-off-by:
ethanwu <ethanwu@synology.com> Reviewed-by:
Xiubo Li <xiubli@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
The MDS could be waiting the caps releases infinitely in some corner case and then reporting the caps revoke stuck warning. To fix this we should periodically flush the cap releases. Link: https://tracker.ceph.com/issues/57244 Signed-off-by:
Xiubo Li <xiubli@redhat.com> Reviewed-by:
Venky Shankar <vshankar@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Chen Ni authored
Replace a comma between expression statements by a semicolon. Signed-off-by:
Chen Ni <nichen@iscas.ac.cn> Reviewed-by:
Xiubo Li <xiubli@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Max Kellermann authored
Only debugfs uses this list. By omitting it, we save some memory and reduce lock contention on `caps_list_lock`. Signed-off-by:
Max Kellermann <max.kellermann@ionos.com> Reviewed-by:
Xiubo Li <xiubli@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- Jul 22, 2024
-
-
Kees Cook authored
Move the exec KUnit tests into a separate directory to avoid polluting the local directory namespace. Additionally update MAINTAINERS for the new files. Reviewed-by:
David Gow <davidgow@google.com> Reviewed-by:
SeongJae Park <sj@kernel.org> Acked-by:
Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240720170310.it.942-kees@kernel.org Signed-off-by:
Kees Cook <kees@kernel.org>
-
Kent Overstreet authored
Reported-by:
<syzbot+f765e51170cf13493f0b@syzkaller.appspotmail.com> Fixes: f12410bb ("bcachefs: Add an error message for insufficient rw journal devs") Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
Kent Overstreet authored
Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev>
-
- Jul 20, 2024
-
-
David Howells authored
A network filesystem needs to implement a netfslib hook to invalidate fscache if it's to be able to use the cache. Fix cifs to implement the cache invalidation hook. Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib") Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- Jul 19, 2024
-
-
Jason A. Donenfeld authored
The vDSO getrandom() implementation works with a buffer allocated with a new system call that has certain requirements: - It shouldn't be written to core dumps. * Easy: VM_DONTDUMP. - It should be zeroed on fork. * Easy: VM_WIPEONFORK. - It shouldn't be written to swap. * Uh-oh: mlock is rlimited. * Uh-oh: mlock isn't inherited by forks. - It shouldn't reserve actual memory, but it also shouldn't crash when page faulting in memory if none is available * Uh-oh: VM_NORESERVE means segfaults. It turns out that the vDSO getrandom() function has three really nice characteristics that we can exploit to solve this problem: 1) Due to being wiped during fork(), the vDSO code is already robust to having the contents of the pages it reads zeroed out midway through the function's execution. 2) In the absolute worst case of whatever contingency we're coding for, we have the option to fallback to the getrandom() syscall, and everything is fine. 3) The buffers the function uses are only ever useful for a maximum of 60 seconds -- a sort of cache, rather than a long term allocation. These characteristics mean that we can introduce VM_DROPPABLE, which has the following semantics: a) It never is written out to swap. b) Under memory pressure, mm can just drop the pages (so that they're zero when read back again). c) It is inherited by fork. d) It doesn't count against the mlock budget, since nothing is locked. e) If there's not enough memory to service a page fault, it's not fatal, and no signal is sent. This way, allocations used by vDSO getrandom() can use: VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE And there will be no problem with OOMing, crashing on overcommitment, using memory when not in use, not wiping on fork(), coredumps, or writing out to swap. In order to let vDSO getrandom() use this, expose these via mmap(2) as MAP_DROPPABLE. Note that this involves removing the MADV_FREE special case from sort_folio(), which according to Yu Zhao is unnecessary and will simply result in an extra call to shrink_folio_list() in the worst case. The chunk removed reenables the swapbacked flag, which we don't want for VM_DROPPABLE, and we can't conditionalize it here because there isn't a vma reference available. Finally, the provided self test ensures that this is working as desired. Cc: linux-mm@kvack.org Acked-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Jason A. Donenfeld <Jason@zx2c4.com>
-
David Howells authored
Add a tracepoint to track the credit changes and server in_flight value involved in the lifetime of a R/W request, logging it against the request/subreq debugging ID. This requires the debugging IDs to be recorded in the cifs_credits struct. The tracepoint can be enabled with: echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable Also add a three-state flag to struct cifs_credits to note if we're interested in determining when the in_flight contribution ends and, if so, to track whether we've decremented the contribution yet. Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-
David Howells authored
At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to adjust the size of the file if it needs it. This will reduce the zero_point (the point above which we assume a read will just return zeros) if it's more than the new i_size, but won't increase it. With DIO writes, however, we definitely want to increase it as we have clobbered the local pagecache and then written some data that's not available locally. Fix cifs to make the zero_point above the end of a DIO or unbuffered write. This fixes corruption seen occasionally with the generic/708 xfs-test. In that case, the read-back of some of the written data is being short-circuited and replaced with zeroes. Fixes: 3ee1a1fc ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by:
Steve French <sfrench@samba.org> Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-