  1. Jul 01, 2024
  2. Apr 24, 2024
  3. Apr 15, 2024
  4. Apr 07, 2024
    • fs: claw back a few FMODE_* bits · 210a03c9
      Christian Brauner authored
      There's a bunch of flags that are purely based on what the file
      operations support and are never conditionally set or unset.
      IOW, they're not subject to change for individual files. Imho, such
      flags don't need to live in f_mode; they might as well live in the fops
      struct itself. And the fops struct already has that lonely
      mmap_supported_flags member. We might as well turn that into a generic
      fop_flags member and move a few flags from FMODE_* space into FOP_*
      space. That gets us four FMODE_* bits back, and new static flags that
      describe what the file ops support can live in their own FOP_* space
      instead of FMODE_* space. It's not the most beautiful thing ever, but
      it gets the job done. Yes, there'll be an additional pointer chase, but
      hopefully that won't matter for these flags.
      
      I suspect there are a few more flags we can move in there, and we can
      also redirect new flag suggestions that follow this pattern into the
      fop_flags field instead of f_mode.
      
      ...
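
      Below is a minimal userspace-style sketch of the idea described above.
      The FOP_EXAMPLE_SUPPORTED flag, the *_sketch struct names, and the
      helper are hypothetical illustrations, not the actual kernel patch;
      only the notion of a static fop_flags field on the file operations
      comes from the commit message.

          /*
           * Sketch: a capability that depends only on the file_operations is
           * recorded once in a static fop_flags field on the fops struct and
           * looked up through the file's f_op pointer, instead of occupying
           * a per-file f_mode bit.
           */
          #include <stdbool.h>
          #include <stdio.h>

          #define FOP_EXAMPLE_SUPPORTED  (1u << 0)  /* hypothetical static flag */

          struct fops_sketch {
                  unsigned int fop_flags;  /* set once where the fops are defined */
                  /* method pointers (read, write, mmap, ...) would live here */
          };

          struct file_sketch {
                  const struct fops_sketch *f_op;
                  unsigned int f_mode;     /* per-file state that can change at runtime */
          };

          static bool supports_example(const struct file_sketch *file)
          {
                  /* one extra pointer chase, but no per-file bit consumed */
                  return file->f_op->fop_flags & FOP_EXAMPLE_SUPPORTED;
          }

          int main(void)
          {
                  static const struct fops_sketch example_fops = {
                          .fop_flags = FOP_EXAMPLE_SUPPORTED,
                  };
                  struct file_sketch f = { .f_op = &example_fops, .f_mode = 0 };

                  printf("supported: %d\n", supports_example(&f));
                  return 0;
          }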
  5. Feb 19, 2024
  6. Oct 23, 2023
  7. Aug 24, 2023
  8. Jun 12, 2023
  9. Jun 09, 2023
  10. May 24, 2023
  11. Apr 05, 2023
  12. Apr 03, 2023
  13. Feb 09, 2023
  14. Jan 19, 2023
    • fs: port ->setattr() to pass mnt_idmap · c1632a0f
      Christian Brauner authored
      Convert to struct mnt_idmap.
      
      Last cycle we merged the necessary infrastructure in 256c8aed
      ("fs: introduce dedicated idmap type for mounts"). This is just the
      conversion to struct mnt_idmap.
      
      Currently we still pass around the plain namespace that was attached to a
      mount. This is in general pretty convenient, but it makes it easy to
      conflate namespaces that are relevant on the filesystem level with
      namespaces that are relevant on the mount level. Especially for non-vfs
      developers without detailed knowledge in this area, this can be a source
      of bugs.
      
      Once the conversion to struct mnt_idmap is done, all helpers down to the
      really low-level ones will take a struct mnt_idmap argument instead of
      two namespace arguments. This way it becomes impossible to conflate the
      two, eliminating this class of bugs. All of the vfs and all filesystems
      then operate only on struct mnt_idmap.
      
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
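
      As a rough illustration of why a dedicated type helps, here is a small
      userspace-style sketch (not kernel code). The struct layouts and
      function names are made up for the example; only the idea that a
      distinct struct mnt_idmap argument replaces a plain namespace argument
      comes from the commit message.

          /*
           * With two arguments of the same namespace type, swapping them
           * still compiles; with a dedicated mnt_idmap type, a swap becomes
           * a type error the compiler catches.
           */
          struct user_namespace { int id; };
          struct mnt_idmap      { struct user_namespace owner; };

          /* old style: both arguments share a type, so the compiler can't help */
          static int setattr_two_namespaces(struct user_namespace *mnt_userns,
                                            struct user_namespace *fs_userns)
          {
                  (void)mnt_userns;
                  (void)fs_userns;
                  return 0;
          }

          /* new style: the mount idmapping has its own type */
          static int setattr_idmap(struct mnt_idmap *idmap,
                                   struct user_namespace *fs_userns)
          {
                  (void)idmap;
                  (void)fs_userns;
                  return 0;
          }

          int main(void)
          {
                  struct user_namespace fs_ns = { 1 };
                  struct mnt_idmap idmap = { { 2 } };

                  setattr_two_namespaces(&fs_ns, &idmap.owner); /* swapped, still compiles */
                  setattr_idmap(&idmap, &fs_ns);                /* passing &fs_ns as idmap won't compile */
                  return 0;
          }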
  15. Nov 06, 2022
    • xfs: write page faults in iomap are not buffered writes · 118e021b
      Dave Chinner authored
      
      When we reserve a delalloc region in xfs_buffered_write_iomap_begin,
      we mark the iomap as IOMAP_F_NEW so that the write context
      understands that it allocated the delalloc region.
      
      If we then fail that buffered write, xfs_buffered_write_iomap_end()
      checks for the IOMAP_F_NEW flag and if it is set, it punches out
      the unused delalloc region that was allocated for the write.
      
      The assumption this code makes is that all buffered write operations
      that can allocate space are run under an exclusive lock (i_rwsem).
      This is an invalid assumption: page faults in mmap()d regions call
      through this same function pair to map the file range being faulted
      and this runs only holding the inode->i_mapping->invalidate_lock in
      shared mode.
      
      IOWs, races between page faults and write() calls that fail the
      nested page cache write operation can result in data loss: the
      failing iomap_end call will punch out the data that the other racing
      iomap iteration brought into the page cache. This can be reproduced
      with generic/34[46] if we arbitrarily fail page cache copy-in
      operations from write() syscalls.
      
      Code analysis tells us that the iomap_page_mkwrite() function holds
      the already instantiated and uptodate folio locked across the iomap
      mapping iterations. Hence the folio cannot be removed from memory
      whilst we are mapping the range it covers, and as such we do not
      care if the mapping changes state underneath the iomap iteration
      loop:
      
       1. if the folio is not already dirty, there are no writeback races
          possible.
      2. if we allocated the mapping (delalloc or unwritten), the folio
         cannot already be dirty. See #1.
      3. If the folio is already dirty, it must be up to date. As we hold
         it locked, it cannot be reclaimed from memory. Hence we always
         have valid data in the page cache while iterating the mapping.
      4. Valid data in the page cache can exist when the underlying
         mapping is DELALLOC, UNWRITTEN or WRITTEN. Having the mapping
         change from DELALLOC->UNWRITTEN or UNWRITTEN->WRITTEN does not
         change the data in the page - it only affects actions if we are
          initialising a new page. Hence #3 applies and we don't care
         about these extent map transitions racing with
         iomap_page_mkwrite().
       5. iomap_page_mkwrite() checks for page invalidation races
          (truncate, hole punch, etc) after it locks the folio. We also
          hold the mapping->invalidate_lock here, and hence the mapping
          cannot change due to extent removal operations while we are
          iterating the folio.
      
      As such, filesystems that don't use bufferheads will never fail
      the iomap_folio_mkwrite_iter() operation on the current mapping,
      regardless of whether the iomap should be considered stale.
      
      Further, the range we are asked to iterate is limited to the range
      inside EOF that the folio spans. Hence, for XFS, we will only map
      the exact range we are asked for, and we will only do speculative
      preallocation with delalloc if we are mapping a hole at the EOF
      page. The iterator will consume the entire range of the folio that
      is within EOF, and anything beyond the EOF block cannot be accessed.
      We never need to truncate this post-EOF speculative prealloc away in
      the context of the iomap_page_mkwrite() iterator because if it
      remains unused we'll remove it when the last reference to the inode
      goes away.
      
      Hence we don't actually need an .iomap_end() cleanup/error handling
      path at all for iomap_page_mkwrite() for XFS. This means we can
      separate the page fault processing from the complexity of the
      .iomap_end() processing in the buffered write path. This also means
      that the buffered write path will be able to take the
      mapping->invalidate_lock as necessary.
      
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
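
      The separation this message argues for can be pictured as giving the
      page fault path its own iomap_ops without an ->iomap_end callback, so
      a failed fault never runs the delalloc punch-out meant for failed
      buffered writes. The fragment below is a hedged sketch of that shape
      in kernel-tree context; the xfs_page_mkwrite_iomap_ops name is an
      assumption and the fragment is not the literal patch.

          /* buffered writes keep both callbacks: a failed write must punch
           * out the unused delalloc region it allocated */
          const struct iomap_ops xfs_buffered_write_iomap_ops = {
                  .iomap_begin = xfs_buffered_write_iomap_begin,
                  .iomap_end   = xfs_buffered_write_iomap_end,
          };

          /* page faults reuse the same ->iomap_begin but, per the analysis
           * above, need no cleanup/error handling path at all */
          const struct iomap_ops xfs_page_mkwrite_iomap_ops = {
                  .iomap_begin = xfs_buffered_write_iomap_begin,
          };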
  16. Oct 31, 2022
    • xfs: fix incorrect return type for fsdax fault handlers · 47ba8cc7
      Darrick J. Wong authored
      The kernel robot complained about this:
      
      >> fs/xfs/xfs_file.c:1266:31: sparse: sparse: incorrect type in return expression (different base types) @@     expected int @@     got restricted vm_fault_t @@
         fs/xfs/xfs_file.c:1266:31: sparse:     expected int
         fs/xfs/xfs_file.c:1266:31: sparse:     got restricted vm_fault_t
         fs/xfs/xfs_file.c:1314:21: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [usertype] ret @@     got int @@
         fs/xfs/xfs_file.c:1314:21: sparse:     expected restricted vm_fault_t [usertype] ret
         fs/xfs/xfs_file.c:1314:21: sparse:     got int
      
      Fix the incorrect return type for these two functions.
      
      While we're at it, make the !fsdax version return VM_FAULT_SIGBUS
      because a zero return value will cause some callers to try to lock
      vmf->page, which we never set here.
      
      Fixes: ea6c49b7 ("xfs: support CoW in fsdax mode")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
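
      A minimal sketch of the two points above follows; the handler name
      and signature are hypothetical stand-ins rather than the actual
      xfs_file.c code, and only the vm_fault_t return type and the
      VM_FAULT_SIGBUS behaviour of the !fsdax stub come from the commit
      message.

          #ifdef CONFIG_FS_DAX
          /* fault handlers must return vm_fault_t, not int */
          static vm_fault_t example_dax_fault(struct vm_fault *vmf, bool write_fault)
          {
                  /* a real handler would call into the dax/iomap machinery here */
                  return VM_FAULT_NOPAGE;
          }
          #else
          static vm_fault_t example_dax_fault(struct vm_fault *vmf, bool write_fault)
          {
                  /* not 0: a zero return makes some callers try to lock
                   * vmf->page, which was never set here */
                  return VM_FAULT_SIGBUS;
          }
          #endif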
  17. Aug 05, 2022
  18. Jul 24, 2022
  19. Jul 17, 2022
  20. May 22, 2022
  21. May 16, 2022
  22. Apr 20, 2022
  23. Apr 11, 2022
  24. Feb 01, 2022