- Jul 06, 2024
-
-
Kefeng Wang authored
The folio_migrate_copy() is just a wrapper of folio_copy() and folio_migrate_flags(), it is simple and only aio use it for now, unfold it and remove folio_migrate_copy(). Link: https://lkml.kernel.org/r/20240626085328.608006-7-wangkefeng.wang@huawei.com Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by:
Jane Chu <jane.chu@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Jul 03, 2024
-
-
Kefeng Wang authored
Commit 2916ecc0 ("mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY") introduce a new MIGRATE_SYNC_NO_COPY mode to allow to offload the copy to a device DMA engine, which is only used __migrate_device_pages() to decide whether or not copy the old page, and the MIGRATE_SYNC_NO_COPY mode only set in hmm, as the MIGRATE_SYNC_NO_COPY set is removed by previous cleanup, it seems that we could remove the unnecessary MIGRATE_SYNC_NO_COPY. Link: https://lkml.kernel.org/r/20240524052843.182275-6-wangkefeng.wang@huawei.com Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by:
Jane Chu <jane.chu@oracle.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev...
-
- Jun 20, 2024
-
-
Prasad Singamsetty authored
An atomic write is a write issued with torn-write protection, meaning that for a power failure or any other hardware failure, all or none of the data from the write will be stored, but never a mix of old and new data. Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the write is to be issued with torn-write prevention, according to special alignment and length rules. For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for iocb->ki_flags field to indicate the same. A call to statx will give the relevant atomic write info for a file: - atomic_write_unit_min - atomic_write_unit_max - atomic_write_segments_max Both min and max values must be a power-of-2. Applications can avail of atomic write feature by ensuring that the total length of a write is a power-of-2 in size and also sized between atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications must ensure that the write is at a naturally-aligned offset in the file wrt the total write length. The value in atomic_write_segments_max indicates the upper limit for IOV_ITER iovcnt. Add file mode flag FMODE_CAN_ATOMIC_WRITE, so files which do not have the flag set will have RWF_ATOMIC rejected and not just ignored. Add a type argument to kiocb_set_rw_flags() to allows reads which have RWF_ATOMIC set to be rejected. Helper function generic_atomic_write_valid() can be used by FSes to verify compliant writes. There we check for iov_iter type is for ubuf, which implies iovcnt==1 for pwritev2(), which is an initial restriction for atomic_write_segments_max. Initially the only user will be bdev file operations write handler. We will rely on the block BIO submission path to ensure write sizes are compliant for the bdev, so we don't need to check atomic writes sizes yet. Signed-off-by:
Prasad Singamsetty <prasad.singamsetty@oracle.com> jpg: merge into single patch and much rewrite Acked-by:
Darrick J. Wong <djwong@kernel.org> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
John Garry <john.g.garry@oracle.com> Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20240620125359.2684798-4-john.g.garry@oracle.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Apr 15, 2024
-
-
Miklos Szeredi authored
These have no clear purpose. This is effectively a revert of commit bb7462b6 ("vfs: use helpers for calling f_op->{read,write}_iter()"). The patch was created with the help of a coccinelle script. Fixes: bb7462b6 ("vfs: use helpers for calling f_op->{read,write}_iter()") Reviewed-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Apr 05, 2024
-
-
Kefeng Wang authored
Since aio use folios in most functions, convert ring/internal_pages to ring/internal_folios, let's directly use folio instead of page throughout aio to remove hidden calls to compound_head(), eg, flush_dcache_page(). Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20240321131640.948634-4-wangkefeng.wang@huawei.com Reviewed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Kefeng Wang authored
Use a folio throughout aio_free_ring() to remove calls to compound_head(), also move pr_debug after folio check to remove unnecessary print. Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20240321131640.948634-3-wangkefeng.wang@huawei.com Reviewed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Kefeng Wang authored
Use a folio throughout aio_setup_ring() to remove calls to compound_head(), also use folio_end_read() to simultaneously mark the folio uptodate and unlock it. Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Link: https://lore.kernel.org/r/20240321131640.948634-2-wangkefeng.wang@huawei.com Reviewed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Kent Overstreet authored
list_del_init_careful() needs to be the last access to the wait queue entry - it effectively unlocks access. Previously, finish_wait() would see the empty list head and skip taking the lock, and then we'd return - but the completion path would still attempt to do the wakeup after the task_struct pointer had been overwritten. Fixes: 71eb6b6b ("fs/aio: obey min_nr when doing wakeups") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/CAHTA-ubfwwB51A5Wg5M6H_rPEQK9pNf8FkAGH=vr=FEkyRrtqw@mail.gmail.com/ Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/stable/20240331215212.522544-1-kent.overstreet%40linux.dev Link: https://lore.kernel.org/r/20240331215212.522544-1-kent.overstreet@linux.dev Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Mar 05, 2024
-
-
Bart Van Assche authored
The first kiocb_set_cancel_fn() argument may point at a struct kiocb that is not embedded inside struct aio_kiocb. With the current code, depending on the compiler, the req->ki_ctx read happens either before the IOCB_AIO_RW test or after that test. Move the req->ki_ctx read such that it is guaranteed that the IOCB_AIO_RW test happens first. Reported-by:
Eric Biggers <ebiggers@kernel.org> Cc: Benjamin LaHaise <ben@communityfibre.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Fixes: b820de74 ("fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio") Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240304235715.3790858-1-bvanassche@acm.org Reviewed-by:
Jens Axboe <axboe@kernel.dk> Reviewed-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Bart Van Assche authored
Patch "fs/aio: Make io_cancel() generate completions again" is based on the assumption that calling kiocb->ki_cancel() does not complete R/W requests. This is incorrect: the two drivers that call kiocb_set_cancel_fn() callers set a cancellation function that calls usb_ep_dequeue(). According to its documentation, usb_ep_dequeue() calls the completion routine with status -ECONNRESET. Hence this revert. Cc: Benjamin LaHaise <ben@communityfibre.ca> Cc: Eric Biggers <ebiggers@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Reported-by:
<syzbot+b91eb2ed18f599dd3c31@syzkaller.appspotmail.com> Fixes: 54cbc058 ("fs/aio: Make io_cancel() generate completions again") Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240304182945.3646109-1-bvanassche@acm.org Acked-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Feb 27, 2024
-
-
Bart Van Assche authored
The following patch accidentally removed the code for delivering completions for cancelled reads and writes to user space: "[PATCH 04/33] aio: remove retry-based AIO" (https://lore.kernel.org/all/1363883754-27966-5-git-send-email-koverstreet@google.com/) >From that patch: - if (kiocbIsCancelled(iocb)) { - ret = -EINTR; - aio_complete(iocb, ret, 0); - /* must not access the iocb after this */ - goto out; - } This leads to a leak in user space of a struct iocb. Hence this patch that restores the code that reports to user space that a read or write has been cancelled successfully. Fixes: 41003a7b ("aio: remove retry-based AIO") Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240215204739.2677806-3-bvanassche@acm.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Feb 21, 2024
-
-
Bart Van Assche authored
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears: WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8 Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio. Suggested-by:
Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Dec 28, 2023
-
-
Joel Granados authored
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/ ) Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size. Signed-off-by:
Joel Granados <j.granados@samsung.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
"Darrick J. Wong" <djwong@kernel.org> Acked-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org>
-
- Dec 05, 2023
-
-
Jens Axboe authored
With the removal of the 'iov' argument to import_single_range(), the two functions are now fully identical. Convert the import_single_range() callers to import_ubuf(), and remove the former fully. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-3-axboe@kernel.dk Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Jens Axboe authored
It is entirely unused, just get rid of it. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Nov 28, 2023
-
-
Kent Overstreet authored
I've been observing workloads where IPIs due to wakeups in aio_complete() are ~15% of total CPU time in the profile. Most of those wakeups are unnecessary when completion batching is in use in io_getevents(). This plumbs min_nr through via the wait eventry, so that aio_complete() can avoid doing unnecessary wakeups. Signed-off-by:
Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20231122234257.179390-1-kent.overstreet@linux.dev Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Christian Brauner <brauner@kernel.org> Cc: <linux-aio@kvack.org> Cc: <linux-fsdevel@vger.kernel.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Christian Brauner authored
Ever since the eventfd type was introduced back in 2007 in commit e1ad7468 ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by:
Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Nov 21, 2023
-
-
Matthew Wilcox (Oracle) authored
It is hard to find where mapping->private_lock, mapping->private_list and mapping->private_data are used, due to private_XXX being a relatively common name for variables and structure members in the kernel. To fit with other members of struct address_space, rename them all to have an i_ prefix. Tested with an allmodconfig build. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20231117215823.2821906-1-willy@infradead.org Acked-by:
Darrick J. Wong <djwong@kernel.org> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Sep 20, 2023
-
-
Kees Cook authored
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct kioctx_table. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: linux-aio@kvack.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
"Gustavo A. R. Silva" <gustavoars@kernel.org> Message-Id: <20230915201413.never.881-kees@kernel.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Aug 21, 2023
-
-
Amir Goldstein authored
Use helpers instead of the open coded dance to silence lockdep warnings. Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Message-Id: <20230817141337.1025891-6-amir73il@gmail.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Jul 11, 2023
-
-
Yu-cheng Yu authored
There was no more caller passing vm_flags to do_mmap(), and vm_flags was removed from the function's input by: commit 45e55300 ("mm: remove unnecessary wrapper function do_mmap_pgoff()"). There is a new user now. Shadow stack allocation passes VM_SHADOW_STACK to do_mmap(). Thus, re-introduce vm_flags to do_mmap(). Co-developed-by:
Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by:
Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by:
Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Peter Collingbourne <pcc@google.com> Reviewed-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by:
Mark Brown <broonie@kernel.org> Acked-by:
Mike Rapoport (IBM) <rppt@kernel.org> Acked-by:
David Hildenbrand <david@redhat.com> Tested-by:
Pengfei Xu <pengfei.xu@intel.com> Tested-by:
John Allen <john.allen@amd.com> Tested-by:
Kees Cook <keescook@chromium.org> Tested-by:
Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/all/20230613001108.3040476-5-rick.p.edgecombe%40intel.com
-
- Jun 15, 2023
-
-
Fabio M. De Francesco authored
There is no need to allocate aio rings from HIGHMEM because of very little memory needed here. Therefore, use GFP_USER flag in find_or_create_page() and get rid of kmap*() mappings. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ira Weiny <ira.weiny@intel.com> Suggested-by:
Matthew Wilcox <willy@infradead.org> Signed-off-by:
Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by:
Ira Weiny <ira.weiny@intel.com> Message-Id: <20230609145937.17610-1-fmdefrancesco@gmail.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Feb 09, 2023
-
-
Suren Baghdasaryan authored
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by:
Suren Baghdasaryan <surenb@google.com> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Mel Gorman <mgorman@techsingularity.net> Acked-by:
Mike Rapoport (IBM) <rppt@kernel.org> Acked-by:
Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by:
Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by:
Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Feb 03, 2023
-
-
Seth Jenkins authored
Commit e4a0d3e7 ("aio: Make it possible to remap aio ring") introduced a null-deref if mremap is called on an old aio mapping after fork as mm->ioctx_table will be set to NULL. [jmoyer@redhat.com: fix 80 column issue] Link: https://lkml.kernel.org/r/x49sffq4nvg.fsf@segfault.boston.devel.redhat.com Fixes: e4a0d3e7 ("aio: Make it possible to remap aio ring") Signed-off-by:
Seth Jenkins <sethjenkins@google.com> Signed-off-by:
Jeff Moyer <jmoyer@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Jann Horn <jannh@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 25, 2022
-
-
Al Viro authored
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Sep 12, 2022
-
-
Uros Bizjak authored
Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old in __get_reqs_available. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, atomic_try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. No functional change intended. Link: https://lkml.kernel.org/r/20220714164851.3055-1-ubizjak@gmail.com Signed-off-by:
Uros Bizjak <ubizjak@gmail.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Aug 02, 2022
-
-
Matthew Wilcox (Oracle) authored
Use a folio throughout this function. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Christoph Hellwig <hch@lst.de>
-
- Jun 10, 2022
-
-
Al Viro authored
* calculate at the time we set FMODE_OPENED (do_dentry_open() for normal opens, alloc_file() for pipe()/socket()/etc.) * update when handling F_SETFL * keep in a new field - file->f_iocb_flags; since that thing is needed only before the refcount reaches zero, we can put it into the same anon union where ->f_rcuhead and ->f_llist live - those are used only after refcount reaches zero. Reviewed-by:
Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 16, 2022
-
-
Matthew Wilcox (Oracle) authored
This is a mechanical change. Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
-
- Mar 15, 2022
-
-
Lukas Bulwahn authored
Commit 84c4e1f8 ("aio: simplify - and fix - fget/fput for io_submit()") refactored aio_read() and some error cases into early return, which made some intermediate assignment of the return variable needless. Drop this needless assignment in aio_read(). No functional change. No change in resulting object code. Signed-off-by:
Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 08, 2022
-
-
Christoph Hellwig authored
This field is entirely unused now except for a tracepoint in f2fs, so remove it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Reviewed-by:
Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220308060529.736277-2-hch@lst.de Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Jan 22, 2022
-
-
Xiaoming Ni authored
The kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. Move aio sysctl to aio.c and use the new register_sysctl_init() to register the sysctl interface for aio. [mcgrof@kernel.org: adjust commit log to justify the move] Link: https://lkml.kernel.org/r/20211123202347.818157-9-mcgrof@kernel.org Signed-off-by:
Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by:
Luis Chamberlain <mcgrof@kernel.org> Reviewed-by:
Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Qing Wang <wangqing@vivo.com> Cc: Sebastian Reichel <sre@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Stephen Kitt <steve@sk2.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Antti Palosaari <crope@iki.fi> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Julia Lawall <julia.lawall@inria.fr> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 09, 2021
-
-
Xie Yongji authored
We should defer eventfd_signal() to the workqueue when eventfd_signal_allowed() return false rather than return true. Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit") Signed-off-by:
Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com Reviewed-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Eric Biggers <ebiggers@google.com>
-
Eric Biggers authored
signalfd_poll() and binder_poll() are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, by sending a POLLFREE notification to all waiters. Unfortunately, only eventpoll handles POLLFREE. A second type of non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't handle POLLFREE. This allows a use-after-free to occur if a signalfd or binder fd is polled with aio poll, and the waitqueue gets freed. Fix this by making aio poll handle POLLFREE. A patch by Ramji Jiyani <ramjiyani@google.com> (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com) tried to do this by making aio_poll_wake() always complete the request inline if POLLFREE is seen. However, that solution had two bugs. First, it introduced a deadlock, as it unconditionally locked the aio context while holding the waitqueue lock, which inverts the normal locking order. Second, it didn't consider that POLLFREE notifications are missed while the request has been temporarily de-queued. The second problem was solved by my previous patch. This patch then properly fixes the use-after-free by handling POLLFREE in a deadlock-free way. It does this by taking advantage of the fact that freeing of the waitqueue is RCU-delayed, similar to what eventpoll does. Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org Signed-off-by:
Eric Biggers <ebiggers@google.com>
-
Eric Biggers authored
Currently, aio_poll_wake() will always remove the poll request from the waitqueue. Then, if aio_poll_complete_work() sees that none of the polled events are ready and the request isn't cancelled, it re-adds the request to the waitqueue. (This can easily happen when polling a file that doesn't pass an event mask when waking up its waitqueue.) This is fundamentally broken for two reasons: 1. If a wakeup occurs between vfs_poll() and the request being re-added to the waitqueue, it will be missed because the request wasn't on the waitqueue at the time. Therefore, IOCB_CMD_POLL might never complete even if the polled file is ready. 2. When the request isn't on the waitqueue, there is no way to be notified that the waitqueue is being freed (which happens when its lifetime is shorter than the struct file's). This is supposed to happen via the waitqueue entries being woken up with POLLFREE. Therefore, leave the requests on the waitqueue until they are actually completed (or cancelled). To keep track of when aio_poll_complete_work needs to be scheduled, use new fields in struct poll_iocb. Remove the 'done' field which is now redundant. Note that this is consistent with how sys_poll() and eventpoll work; their wakeup functions do *not* remove the waitqueue entries. Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL") Cc: <stable@vger.kernel.org> # v4.18+ Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org Signed-off-by:
Eric Biggers <ebiggers@google.com>
-
- Oct 25, 2021
-
-
Jens Axboe authored
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument. Reviewed-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 20, 2021
-
-
Len Baker authored
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the struct_size() helper to do the arithmetic instead of the argument "size + size * count" in the kzalloc() function. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Signed-off-by:
Len Baker <len.baker@gmx.com> Reviewed-by:
Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by:
Gustavo A. R. Silva <gustavoars@kernel.org>
-
- Aug 27, 2021
-
-
Thomas Gleixner authored
The recursion protection for eventfd_signal() is based on a per CPU variable and relies on the !RT semantics of spin_lock_irqsave() for protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither disables preemption nor interrupts which allows the spin lock held section to be preempted. If the preempting task invokes eventfd_signal() as well, then the recursion warning triggers. Paolo suggested to protect the per CPU variable with a local lock, but that's heavyweight and actually not necessary. The goal of this protection is to prevent the task stack from overflowing, which can be achieved with a per task recursion protection as well. Replace the per CPU variable with a per task bit similar to other recursion protection bits like task_struct::in_page_owner. This works on both !RT and RT kernels and removes as a side effect the extra per CPU storage. No functional change for !RT kernels. Reported-by: Daniel Bristot de Oliveira <bristo...
-
- Apr 30, 2021
-
-
Brian Geffon authored
This reverts commit cd544fd1. As discussed in [1] this commit was a no-op because the mapping type was checked in vma_to_resize before move_vma is ever called. This meant that vm_ops->mremap() would never be called on such mappings. Furthermore, we've since expanded support of MREMAP_DONTUNMAP to non-anonymous mappings, and these special mappings are still protected by the existing check of !VM_DONTEXPAND and !VM_PFNMAP which will result in a -EINVAL. 1. https://lkml.org/lkml/2020/12/28/2340 Link: https://lkml.kernel.org/r/20210323182520.2712101-2-bgeffon@google.com Signed-off-by:
Brian Geffon <bgeffon@google.com> Acked-by:
Hugh Dickins <hughd@google.com> Reviewed-by:
Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alejandro Colomar <alx.manpages@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 15, 2020
-
-
Dmitry Safonov authored
As kernel expect to see only one of such mappings, any further operations on the VMA-copy may be unexpected by the kernel. Maybe it's being on the safe side, but there doesn't seem to be any expected use-case for this, so restrict it now. Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com Fixes: commit e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()") Signed-off-by:
Dmitry Safonov <dima@arista.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-