  1. Apr 15, 2024
  2. Apr 05, 2024
  3. Mar 05, 2024
  4. Feb 27, 2024
  5. Feb 21, 2024
    • fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio · b820de74
      Bart Van Assche authored
      
      If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the
      following kernel warning appears:
      
      WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8
      Call trace:
       kiocb_set_cancel_fn+0x9c/0xa8
       ffs_epfile_read_iter+0x144/0x1d0
       io_read+0x19c/0x498
       io_issue_sqe+0x118/0x27c
       io_submit_sqes+0x25c/0x5fc
       __arm64_sys_io_uring_enter+0x104/0xab0
       invoke_syscall+0x58/0x11c
       el0_svc_common+0xb4/0xf4
       do_el0_svc+0x2c/0xb0
       el0_svc+0x2c/0xa4
       el0t_64_sync_handler+0x68/0xb4
       el0t_64_sync+0x1a4/0x1a8
      
      Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is
      submitted by libaio.
      
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Sandeep Dhavale <dhavale@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: stable@vger.kernel.org
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org
      Signed-off-by: Christian Brauner <brauner@kernel.org>
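The gating described in b820de74 can be pictured with a small userspace model. This is only a sketch under stated assumptions: the struct, the helper names ending in `_model`, and the bit value chosen for `IOCB_AIO_RW` are hypothetical and are not taken from fs/aio.c. The only idea borrowed from the commit message is that read and write I/O submitted by libaio carries the flag, and that registering a cancel callback is skipped when the flag is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit value; the commit message only names the flag. */
#define IOCB_AIO_RW (1u << 0)

struct kiocb_model;
typedef int (*cancel_fn_t)(struct kiocb_model *iocb);

/* Simplified stand-in for the kernel's per-request control block. */
struct kiocb_model {
    unsigned int ki_flags;  /* submission-path flags */
    cancel_fn_t ki_cancel;  /* cancel callback; only meaningful for libaio */
};

/* libaio-style submission: tag the request as aio read/write I/O. */
void aio_prep_rw_model(struct kiocb_model *iocb)
{
    assert(iocb != NULL);
    iocb->ki_flags = IOCB_AIO_RW;
    iocb->ki_cancel = NULL;
}

/* io_uring-style submission: the flag is never set. */
void uring_prep_rw_model(struct kiocb_model *iocb)
{
    assert(iocb != NULL);
    iocb->ki_flags = 0;
    iocb->ki_cancel = NULL;
}

/*
 * Register a cancel callback only for libaio requests.
 * Returns true if the callback was registered, false if it was ignored.
 */
bool set_cancel_fn_model(struct kiocb_model *iocb, cancel_fn_t fn)
{
    if (!(iocb->ki_flags & IOCB_AIO_RW))
        return false;  /* not libaio: skip quietly instead of warning */
    iocb->ki_cancel = fn;
    return true;
}

/* Trivial callback used only to exercise the model. */
int dummy_cancel(struct kiocb_model *iocb)
{
    (void)iocb;
    return 0;
}
```

In this model a driver can call `set_cancel_fn_model()` unconditionally from its read or write path; whether the callback is actually stored depends only on how the request was submitted.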
  6. Dec 28, 2023
  7. Dec 05, 2023
  8. Nov 28, 2023
  9. Nov 21, 2023
  10. Sep 20, 2023
  11. Aug 21, 2023
  12. Jul 11, 2023
  13. Jun 15, 2023
  14. Feb 09, 2023
  15. Feb 03, 2023
  16. Nov 25, 2022
    • use less confusing names for iov_iter direction initializers · de4eda9d
      Al Viro authored
      
      READ/WRITE proved to be actively confusing - the meanings are
      "data destination, as used with read(2)" and "data source, as
      used with write(2)", but people keep interpreting those as
      "we read data from it" and "we write data to it", i.e. exactly
      the wrong way.
      
      Call them ITER_DEST and ITER_SOURCE - at least that is harder
      to misinterpret...
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
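The naming point in de4eda9d is easier to see with a toy iterator. The sketch below is a userspace illustration, not the kernel's iov_iter: the `iter_model` struct, the `*_model` helpers, and the numeric enum values are assumptions made for the example. It shows that an ITER_DEST iterator is one that data is copied into (as with read(2)), while an ITER_SOURCE iterator is one that data is copied out of (as with write(2)).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Direction of the iterator; numeric values are illustrative only. */
enum iter_dir {
    ITER_DEST = 0,    /* data destination, as used with read(2) */
    ITER_SOURCE = 1   /* data source, as used with write(2) */
};

/* Toy single-segment iterator over a caller-owned buffer. */
struct iter_model {
    enum iter_dir dir;
    char *buf;
    size_t len;
    size_t off;
};

void iter_init_model(struct iter_model *it, enum iter_dir dir,
                     char *buf, size_t len)
{
    assert(it != NULL && buf != NULL);
    it->dir = dir;
    it->buf = buf;
    it->len = len;
    it->off = 0;
}

/* Copy data INTO the iterator; only valid for ITER_DEST. Returns bytes copied. */
size_t copy_to_iter_model(const void *src, size_t n, struct iter_model *it)
{
    size_t room = it->len - it->off;

    if (it->dir != ITER_DEST)
        return 0;
    if (n > room)
        n = room;
    memcpy(it->buf + it->off, src, n);
    it->off += n;
    return n;
}

/* Copy data OUT OF the iterator; only valid for ITER_SOURCE. Returns bytes copied. */
size_t copy_from_iter_model(void *dst, size_t n, struct iter_model *it)
{
    size_t left = it->len - it->off;

    if (it->dir != ITER_SOURCE)
        return 0;
    if (n > left)
        n = left;
    memcpy(dst, it->buf + it->off, n);
    it->off += n;
    return n;
}
```

With the old READ/WRITE names, the first helper would have been paired with an iterator initialized as "READ", which is the reading the commit message calls out as easy to get backwards.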
  17. Sep 12, 2022
  18. Aug 02, 2022
  19. Jun 10, 2022
  20. Mar 16, 2022
  21. Mar 15, 2022
  22. Mar 08, 2022
  23. Jan 22, 2022
    • aio: move aio sysctl to aio.c · 86b12b6c
      Xiaoming Ni authored
      kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
      dishes, which makes it very difficult to maintain.
      
      To help with this maintenance let's start by moving sysctls to places
      where they actually belong.  The proc sysctl maintainers do not want to
      know what sysctl knobs you wish to add for your own piece of code, we
      just care about the core logic.
      
      Move aio sysctl to aio.c and use the new register_sysctl_init() to
      register the sysctl interface for aio.
      
      [mcgrof@kernel.org: adjust commit log to justify the move]
      
      Link: https://lkml.kernel.org/r/20211123202347.818157-9-mcgrof@kernel.org
      Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Qing Wang <wangqing@vivo.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Stephen Kitt <steve@sk2.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Lukas Middendorf <kernel@tuxforce.de>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Phillip Potter <phil@philpotter.co.uk>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: James E.J. Bottomley <jejb@linux.ibm.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. Dec 09, 2021
    • aio: Fix incorrect usage of eventfd_signal_allowed() · 4b374986
      Xie Yongji authored
      We should defer eventfd_signal() to the workqueue when
      eventfd_signal_allowed() returns false, rather than when it
      returns true.
      
      Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit")
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
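The condition fixed in 4b374986 can be written out as a tiny decision function. This is a sketch with hypothetical names: `completion_model`, `signal_allowed_model()`, and the counters are invented for illustration, and the single boolean stands in for whatever state the real eventfd_signal_allowed() consults. The point it shows is the polarity: signal inline when signalling is allowed, defer to a worker when it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state for one completion; all fields are illustrative. */
struct completion_model {
    bool in_eventfd_signal;  /* true while already inside a signal call */
    int signalled_inline;    /* times the eventfd was signalled directly */
    int deferred;            /* times the signal was pushed to a worker */
};

/* Stand-in for eventfd_signal_allowed(): false while a signal is in progress. */
static bool signal_allowed_model(const struct completion_model *c)
{
    return !c->in_eventfd_signal;
}

/*
 * Complete a request that has an eventfd attached.
 * Returns true if the signal was deferred, false if it ran inline.
 */
bool complete_model(struct completion_model *c)
{
    assert(c != NULL);
    if (!signal_allowed_model(c)) {
        c->deferred++;        /* not allowed here: hand off to a worker */
        return true;
    }
    c->signalled_inline++;    /* allowed: signal directly */
    return false;
}
```

Dropping the `!` in `complete_model()` reproduces the inverted behaviour the commit message describes, where the deferral happens exactly when it is not needed.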
    • aio: fix use-after-free due to missing POLLFREE handling · 50252e4b
      Eric Biggers authored
      signalfd_poll() and binder_poll() are special in that they use a
      waitqueue whose lifetime is the current task, rather than the struct
      file as is normally the case.  This is okay for blocking polls, since a
      blocking poll occurs within one task; however, non-blocking polls
      require another solution.  This solution is for the queue to be cleared
      before it is freed, by sending a POLLFREE notification to all waiters.
      
      Unfortunately, only eventpoll handles POLLFREE.  A second type of
      non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
      handle POLLFREE.  This allows a use-after-free to occur if a signalfd or
      binder fd is polled with aio poll, and the waitqueue gets freed.
      
      Fix this by making aio poll handle POLLFREE.
      
      A patch by Ramji Jiyani <ramjiyani@google.com>
      (https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
      tried to do this by making aio_poll_wake() always complete the request
      inline if POLLFREE is seen.  However, that solution had two bugs.
      First, it introduced a deadlock, as it unconditionally locked the aio
      context while holding the waitqueue lock, which inverts the normal
      locking order.  Second, it didn't consider that POLLFREE notifications
      are missed while the request has been temporarily de-queued.
      
      The second problem was solved by my previous patch.  This patch then
      properly fixes the use-after-free by handling POLLFREE in a
      deadlock-free way.  It does this by taking advantage of the fact that
      freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
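The core obligation that 50252e4b places on a waiter can be sketched in a few lines. This is a deliberately reduced, single-threaded model: the struct names, the `*_MODEL` mask values, and the waiter counter are hypothetical, and the locking and RCU delay that make the real fix deadlock-free are left out entirely. It only shows that, once POLLFREE is seen, the request detaches itself and forgets the queue, so that a later cancel does not touch memory that may already be freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative mask bits; the values are not the kernel's. */
#define POLLIN_MODEL   0x0001u
#define POLLFREE_MODEL 0x4000u

/* Toy waitqueue: only tracks how many waiters are attached. */
struct waitqueue_model {
    int nr_waiters;
};

/* Toy poll request. head is NULL once the queue has gone away. */
struct poll_req_model {
    struct waitqueue_model *head;
    bool cancelled;
};

void poll_req_add_model(struct poll_req_model *req, struct waitqueue_model *wq)
{
    assert(req != NULL && wq != NULL);
    req->head = wq;
    req->cancelled = false;
    wq->nr_waiters++;
}

/* Wake callback: on POLLFREE, detach and drop the pointer to the queue. */
void poll_wake_model(struct poll_req_model *req, unsigned int mask)
{
    if ((mask & POLLFREE_MODEL) && req->head != NULL) {
        req->head->nr_waiters--;
        req->head = NULL;  /* the queue is about to be freed */
    }
}

/*
 * Cancel the request. Returns true if the waitqueue had to be touched,
 * false if the request was already detached by POLLFREE.
 */
bool poll_cancel_model(struct poll_req_model *req)
{
    req->cancelled = true;
    if (req->head == NULL)
        return false;
    req->head->nr_waiters--;
    req->head = NULL;
    return true;
}
```

A waiter that ignores `POLLFREE_MODEL` would keep a dangling `head` pointer here, which is the use-after-free pattern the commit message attributes to aio poll before the fix.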
    • aio: keep poll requests on waitqueue until completed · 363bee27
      Eric Biggers authored
      Currently, aio_poll_wake() will always remove the poll request from the
      waitqueue.  Then, if aio_poll_complete_work() sees that none of the
      polled events are ready and the request isn't cancelled, it re-adds the
      request to the waitqueue.  (This can easily happen when polling a file
      that doesn't pass an event mask when waking up its waitqueue.)
      
      This is fundamentally broken for two reasons:
      
        1. If a wakeup occurs between vfs_poll() and the request being
           re-added to the waitqueue, it will be missed because the request
           wasn't on the waitqueue at the time.  Therefore, IOCB_CMD_POLL
           might never complete even if the polled file is ready.
      
        2. When the request isn't on the waitqueue, there is no way to be
           notified that the waitqueue is being freed (which happens when its
           lifetime is shorter than the struct file's).  This is supposed to
           happen via the waitqueue entries being woken up with POLLFREE.
      
      Therefore, leave the requests on the waitqueue until they are actually
      completed (or cancelled).  To keep track of when aio_poll_complete_work
      needs to be scheduled, use new fields in struct poll_iocb.  Remove the
      'done' field which is now redundant.
      
      Note that this is consistent with how sys_poll() and eventpoll work;
      their wakeup functions do *not* remove the waitqueue entries.
      
      Fixes: 2c14fa83 ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.18+
      Link: https://lore.kernel.org/r/20211209010455.42744-5-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
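The scheduling rule from 363bee27 can be modelled as a small state machine. The sketch below is single-threaded and ignores locking; the struct and the field names `work_scheduled` and `work_need_resched` are assumptions for illustration, since the commit message only says that new fields in struct poll_iocb track when the work needs to be scheduled. What it shows is that a wakeup never removes the request from the waitqueue, and that a wakeup arriving while the worker is busy is remembered instead of lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy poll request; field names are illustrative. */
struct poll_req_model {
    bool on_waitqueue;       /* stays true until completion or cancel */
    bool work_scheduled;     /* a worker run is pending or in progress */
    bool work_need_resched;  /* a wakeup arrived while work was scheduled */
    bool completed;
};

void poll_req_init_model(struct poll_req_model *r)
{
    assert(r != NULL);
    r->on_waitqueue = true;
    r->work_scheduled = false;
    r->work_need_resched = false;
    r->completed = false;
}

/* Wakeup callback: schedule the worker, but leave the request queued. */
void poll_wake_model(struct poll_req_model *r)
{
    if (!r->on_waitqueue)
        return;
    if (r->work_scheduled)
        r->work_need_resched = true;  /* remember the extra wakeup */
    else
        r->work_scheduled = true;
}

/*
 * Worker body. ready is the result of re-polling the file.
 * Returns true if the request completed.
 */
bool poll_complete_work_model(struct poll_req_model *r, bool ready)
{
    if (ready) {
        r->on_waitqueue = false;  /* only now does the request leave the queue */
        r->work_scheduled = false;
        r->work_need_resched = false;
        r->completed = true;
        return true;
    }
    if (r->work_need_resched) {
        r->work_need_resched = false;  /* run again; work stays scheduled */
        return false;
    }
    r->work_scheduled = false;  /* idle again, still on the waitqueue */
    return false;
}
```

Because `on_waitqueue` stays true across every not-ready pass, there is no window in this model where a wakeup can arrive and find the request absent, which is the first of the two problems listed in the commit message.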
  25. Oct 25, 2021
  26. Oct 20, 2021
  27. Aug 27, 2021
    • eventfd: Make signal recursion protection a task bit · b542e383
      Thomas Gleixner authored
      
      The recursion protection for eventfd_signal() is based on a per CPU
      variable and relies on the !RT semantics of spin_lock_irqsave() for
      protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither
      disables preemption nor interrupts which allows the spin lock held section
      to be preempted. If the preempting task invokes eventfd_signal() as well,
      then the recursion warning triggers.
      
      Paolo suggested to protect the per CPU variable with a local lock, but
      that's heavyweight and actually not necessary. The goal of this protection
      is to prevent the task stack from overflowing, which can be achieved with a
      per task recursion protection as well.
      
      Replace the per CPU variable with a per task bit similar to other recursion
      protection bits like task_struct::in_page_owner. This works on both !RT and
      RT kernels and removes as a side effect the extra per CPU storage.
      
      No functional change for !RT kernels.
      
      Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
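The per-task protection described in b542e383 can be illustrated with a userspace model. Everything here is a sketch: `task_model`, `eventfd_model`, the `chained` pointer, and the bit name `in_eventfd_signal` are hypothetical stand-ins, chosen to mirror the commit message's comparison with task_struct::in_page_owner. The guard lives in the task, so a nested signal from the same task is refused while a different task is unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy task: one bit of recursion state, analogous to a task_struct bit. */
struct task_model {
    unsigned int in_eventfd_signal : 1;
};

/* Toy eventfd; chained models a waiter that signals another eventfd. */
struct eventfd_model {
    unsigned long long count;
    struct eventfd_model *chained;  /* NULL if no nested signal */
};

/*
 * Signal ctx on behalf of task cur.
 * Returns the number of eventfds actually signalled, including nested ones.
 */
int eventfd_signal_model(struct task_model *cur, struct eventfd_model *ctx)
{
    int done = 1;

    assert(cur != NULL && ctx != NULL);
    if (cur->in_eventfd_signal)
        return 0;  /* recursion from the same task: refuse */

    cur->in_eventfd_signal = 1;
    ctx->count++;
    if (ctx->chained != NULL)
        done += eventfd_signal_model(cur, ctx->chained);
    cur->in_eventfd_signal = 0;
    return done;
}
```

Since the bit travels with the task and not with the CPU, preempting the task in the middle of `eventfd_signal_model()` would not cause another task's call to be mistaken for recursion, which is the RT problem the commit message describes for the per-CPU variable.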
  28. Apr 30, 2021
  29. Dec 15, 2020
    • mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio · cd544fd1
      Dmitry Safonov authored
      As the kernel expects to see only one of such mappings, any further
      operations on the VMA copy may be unexpected by the kernel.  This may
      just be erring on the safe side, but there doesn't seem to be any
      expected use case for this, so restrict it now.
      
      Link: https://lkml.kernel.org/r/20201013013416.390574-4-dima@arista.com
      Fixes: commit e346b381 ("mm/mremap: add MREMAP_DONTUNMAP to mremap()")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. Nov 10, 2020
  31. Nov 06, 2020
  32. Oct 03, 2020