  1. Jan 15, 2021
  2. Jan 14, 2021
    • fs: add LSM-supporting anon-inode interface · e7e832ce
      Daniel Colascione authored
      
      This change adds a new function, anon_inode_getfd_secure, that creates
      anonymous-inode file with an individual non-S_PRIVATE inode to which security
      modules can apply policy. Existing callers continue using the original
      singleton-inode kind of anonymous-inode file. We can transition anonymous
      inode users to the new kind of anonymous inode in individual patches for
      the sake of bisection and review.
      
      The new function accepts an optional context_inode parameter that callers
      can use to provide additional contextual information to security modules.
      For example, in the case of userfaultfd, the created inode is a 'logical child'
      of the context_inode (userfaultfd inode of the parent process) in the sense
      that it provides the security context required during creation of the child
      process' userfaultfd inode.
      
      Signed-off-by: Daniel Colascione <dancol@google.com>
      [LG: Delete obsolete comments to alloc_anon_inode()]
      [LG: Add context_inode description in comments to anon_inode_getfd_secure()]
      [LG: Remove definition of anon_inode_getfile_secure() as there are no callers]
      [LG: Make __anon_inode_getfile() static]
      [LG: Use correct error cast in __anon_inode_getfile()]
      [LG: Fix error handling in __anon_inode_getfile()]
      Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  3. May 25, 2019
    • vfs: Convert anon_inodes to use the new mount API · 33cada40
      David Howells authored
      
      Convert the anon_inodes filesystem to the new internal mount API as the old
      one will be obsoleted and removed.  This allows greater flexibility in
      communication of mount parameters between userspace, the VFS and the
      filesystem.
      
      See Documentation/filesystems/mount_api.txt for more information.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • mount_pseudo(): drop 'name' argument, switch to d_make_root() · 1f58bb18
      Al Viro authored
      
      Once upon a time we used to set ->d_name of e.g. pipefs root
      so that d_path() on pipes would work.  These days it's
      completely pointless - dentries of pipes are not even connected
      to pipefs root.  However, mount_pseudo() had set the root
      dentry name (passed as the second argument) and callers
      kept inventing names to pass to it.  Including those that
      didn't *have* any non-root dentries to start with...
      
      All of that had been pointless for about 8 years now; it's
      time to get rid of that cargo-culting...
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. May 21, 2019
  5. Jul 12, 2018
  6. Dec 24, 2016
  7. Mar 27, 2014
  8. Mar 25, 2014
  9. Nov 09, 2013
  10. Jul 16, 2013
  11. Feb 26, 2013
  12. Feb 22, 2013
    • fs: Preserve error code in get_empty_filp(), part 2 · 39b65252
      Anatol Pomozov authored
      
      Allocating a file structure in function get_empty_filp() might fail because
      of several reasons:
       - not enough memory for file structures
       - operation is not allowed
       - user is over its limit
      
      Currently the function returns NULL in all cases and we lose the exact
      reason of the error. All callers of get_empty_filp() assume that the function
      can fail with ENFILE only.
      
      Return error through pointer. Change all callers to preserve this error code.
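
      Returning an error "through the pointer" is the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom. The following is a userspace sketch under stated assumptions: the three helpers are re-implemented locally rather than taken from include/linux/err.h, and alloc_file_like(), struct file_like and the two-file limit are hypothetical stand-ins for get_empty_filp() and the system-wide file limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the ERR_PTR idiom. Values in [-MAX_ERRNO, -1]
 * map to the last page of the address space, which never holds a valid
 * object, so one return value can carry either a pointer or an errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);  /* model-only sanity check */
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator standing in for get_empty_filp(): each failure
 * reports its own reason instead of a bare NULL. */
struct file_like { int id; };

static int nr_files;
static const int max_files = 2;   /* hypothetical limit */

static struct file_like *alloc_file_like(int id)
{
    if (nr_files >= max_files)
        return ERR_PTR(-ENFILE);  /* file table is full */
    struct file_like *f = malloc(sizeof(*f));
    if (!f)
        return ERR_PTR(-ENOMEM);  /* out of memory */
    f->id = id;
    nr_files++;
    return f;
}
```

      A caller then forwards the specific reason with "if (IS_ERR(f)) return PTR_ERR(f);" instead of assuming ENFILE.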
      
      [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
      (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
      
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. Mar 20, 2012
  14. Jul 26, 2011
  15. Jul 24, 2011
    • VFS : mount lock scalability for internal mounts · 423e0ab0
      Tim Chen authored
      
      A number of file systems that don't have a mount point (e.g. sockfs
      and pipefs) are not marked as long term. Therefore, in
      mntput_no_expire, all of the per-CPU vfsmount locks are taken instead of
      just the local CPU's lock to aggregate reference counts when we release
      references to file objects.  In fact, only the local lock needs to be
      taken to update ref counts, as these file systems are in no danger of
      going away until we are ready to unregister them.
      
      The attached patch marks file systems using kern_mount without a
      mount point as long term.  Contention on the vfsmount lock
      is now eliminated.  Before unregistering such a file system,
      kern_unmount should be called to remove the long term flag and
      make the mount point ready to be freed.
      
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  16. Jan 16, 2011
    • sanitize vfsmount refcounting changes · f03c6599
      Al Viro authored
      
      Instead of splitting refcount between (per-cpu) mnt_count
      and (SMP-only) mnt_longrefs, make all references contribute
      to mnt_count again and keep track of how many are longterm
      ones.
      
      Accounting rules for longterm count:
      	* 1 for each fs_struct.root.mnt
      	* 1 for each fs_struct.pwd.mnt
      	* 1 for having non-NULL ->mnt_ns
      	* decrement to 0 happens only under vfsmount lock exclusive
      
      That allows a nice common case for mntput(): since we can't drop the
      final reference until after mnt_longterm has reached 0 due to the rules
      above, mntput() can grab vfsmount lock shared and check mnt_longterm.
      If it turns out to be non-zero (which is the common case), we know
      that this is not the final mntput() and can just blindly decrement
      percpu mnt_count.  Otherwise we grab vfsmount lock exclusive and
      do usual decrement-and-check of percpu mnt_count.
      
      For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
      namespace.c uses the latter in places where we don't already hold
      vfsmount lock exclusive and opencodes a few remaining spots where
      we need to manipulate mnt_longterm.
      
      Note that we mostly revert the code outside of fs/namespace.c back
      to what we used to have; in particular, normal code doesn't need
      to care about two kinds of references, etc.  And we get to keep
      the optimization Nick's variant had bought us...
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  17. Jan 12, 2011
  18. Jan 07, 2011
    • fs: scale mntget/mntput · b3e19d92
      Nicholas Piggin authored
      
      The problem that this patch aims to fix is vfsmount refcounting scalability.
      We need to take a reference on the vfsmount for every successful path lookup,
      which often go to the same mount point.
      
      The fundamental difficulty is that a "simple" reference count can never be made
      scalable, because any time a reference is dropped, we must check whether that
      was the last reference. To do that requires communication with all other CPUs
      that may have taken a reference count.
      
      We can make refcounts more scalable in a couple of ways, involving keeping
      distributed counters, and checking for the global-zero condition less
      frequently.
      
      - check the global sum once every interval (this will delay zero detection
        for some interval, so it's probably a showstopper for vfsmounts).
      
      - keep a local count and only take the global sum when local reaches 0 (this
        is difficult for vfsmounts, because we can't hold preempt off for the life of
        a reference, so a counter would need to be per-thread or tied strongly to a
        particular CPU which requires more locking).
      
      - keep a local difference of increments and decrements, which allows us to sum
        the total difference and hence find the refcount when summing all CPUs. Then,
        keep a single integer "long" refcount for slow and long lasting references,
        and only take the global sum of local counters when the long refcount is 0.
      
      This last scheme is what I implemented here. Attached mounts and process root
      and working directory references are "long" references, and everything else is
      a short reference.
      
      This allows scalable vfsmount references during path walking over mounted
      subtrees and unattached (lazy umounted) mounts with processes still running
      in them.
      
      This results in one fewer atomic op in the fastpath: mntget is now just a
      per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
      and non-atomic decrement in the common case. However code is otherwise bigger
      and heavier, so single threaded performance is basically a wash.
      
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: improve scalability of pseudo filesystems · 4b936885
      Nicholas Piggin authored
      
      Regardless of how much we possibly try to scale dcache, there is likely
      always going to be some fundamental contention when adding or removing children
      under the same parent. Pseudo filesystems do not seem to need connected
      dentries because by definition they are disconnected.
      
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: dcache reduce branches in lookup path · fb045adb
      Nicholas Piggin authored
      
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
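
      The idea of caching "which d_op methods exist" in d_flags can be illustrated with a simplified userspace model. The DCACHE_OP_* names follow the kernel's flags, but the flag values, the reduced struct layouts and method signatures, and the always_valid()/revalidating_ops example are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values; the kernel's real flag values differ. */
#define DCACHE_OP_HASH       0x1u
#define DCACHE_OP_COMPARE    0x2u
#define DCACHE_OP_REVALIDATE 0x4u
#define DCACHE_OP_DELETE     0x8u

struct dentry;

/* Simplified method signatures, not the kernel's. */
struct dentry_operations {
    int (*d_revalidate)(struct dentry *);
    int (*d_hash)(const struct dentry *);
    int (*d_compare)(const struct dentry *);
    int (*d_delete)(const struct dentry *);
};

struct dentry {
    unsigned int d_flags;
    const struct dentry_operations *d_op;
};

/* Set d_op once and record in d_flags which operations are present. */
static void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
{
    assert(dentry->d_op == NULL);   /* the kernel warns here instead */
    dentry->d_op = op;
    if (!op)
        return;
    if (op->d_hash)
        dentry->d_flags |= DCACHE_OP_HASH;
    if (op->d_compare)
        dentry->d_flags |= DCACHE_OP_COMPARE;
    if (op->d_revalidate)
        dentry->d_flags |= DCACHE_OP_REVALIDATE;
    if (op->d_delete)
        dentry->d_flags |= DCACHE_OP_DELETE;
}

/* Lookup-path check: one test on d_flags, which lookup already reads,
 * instead of loading dentry->d_op and then op->d_revalidate. */
static bool needs_revalidate(const struct dentry *dentry)
{
    return (dentry->d_flags & DCACHE_OP_REVALIDATE) != 0;
}

/* Hypothetical operations table with only d_revalidate set. */
static int always_valid(struct dentry *dentry)
{
    (void)dentry;
    return 1;
}

static const struct dentry_operations revalidating_ops = {
    .d_revalidate = always_valid,
};
```

      The cost moves into d_set_d_op(), which runs once per dentry, while the far hotter lookup path only tests a bit.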
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  19. Dec 10, 2010
  20. Oct 29, 2010
  21. Oct 25, 2010
  22. May 27, 2010
  23. May 21, 2010
  24. Mar 30, 2010
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there; i.e., if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  25. Mar 12, 2010
    • anon_inodes: mark the anon inode private · 3836a03d
      Eric Paris authored

      Inotify was switched to use anon_inode instead of its own private filesystem
      which only had one inode in commit c44dcc56 ("switch inotify_user to
      anon_inode").
      
      The problem with this is that now the inotify inode is not a distinct inode
      which can be managed by LSMs.  Userspace tools which use inotify were allowed
      to use the inotify inode but may not have had permission to do read/write type
      operations on the anon_inode.  After looking at the anon_inode and its users
      it looks like the best solution is to just mark the anon_inode as S_PRIVATE
      so the security system will ignore it.
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. Dec 22, 2009
  27. Dec 17, 2009
  28. Dec 16, 2009
  29. Oct 04, 2009
  30. Sep 23, 2009
    • anonfd: split interface into file creation and install · 562787a5
      Davide Libenzi authored
      
      Split the anonfd interface into a bare file pointer creation one, and a
      file pointer creation plus install one.
      
      There are cases, like the usage of eventfds inside other kernel
      interfaces, where the file pointer created by anonfd needs to be used
      inside the initialization of other structures.
      
      As it is right now, as soon as anon_inode_getfd() returns, the kernel can
      race with userspace closing the newly installed file descriptor.
      
      This patch, while keeping the old anon_inode_getfd(), introduces a new
      anon_inode_getfile() (whose services are reused in anon_inode_getfd())
      that allows splitting the file creation phase from the fd install one.
      
      Once all the kernel structures are initialized, the code can call the
      proper fd_install().
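
      The resulting calling pattern (reserve a descriptor, create the file, finish initialization, then publish it) can be sketched in userspace. Everything here is a hypothetical model: the *_model functions, create_and_install() and the fixed-size table only mimic the roles of get_unused_fd_flags(), anon_inode_getfile(), put_unused_fd() and fd_install(), and do not reproduce their real signatures.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_FDS 16   /* hypothetical table size */

/* Hypothetical stand-in for struct file. */
struct file_model {
    int initialized;
    void *private_data;
};

static struct file_model *fd_table[MAX_FDS];  /* what "userspace" can reach */
static int fd_reserved[MAX_FDS];              /* numbers handed out, maybe unpublished */

/* Reserve a descriptor number without exposing a file (cf. get_unused_fd_flags()). */
static int get_unused_fd_model(void)
{
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (!fd_reserved[fd]) {
            fd_reserved[fd] = 1;
            return fd;
        }
    }
    return -1;
}

/* Give back a reserved, never-published number (cf. put_unused_fd()). */
static void put_unused_fd_model(int fd)
{
    fd_reserved[fd] = 0;
}

/* Create the file object only; nothing can reach it yet (cf. anon_inode_getfile()). */
static struct file_model *getfile_model(void *priv)
{
    struct file_model *f = calloc(1, sizeof(*f));
    if (f)
        f->private_data = priv;
    return f;
}

/* Publish a fully initialized file (cf. fd_install()). */
static void fd_install_model(int fd, struct file_model *f)
{
    assert(fd_reserved[fd] && fd_table[fd] == NULL);
    fd_table[fd] = f;
}

/* Hypothetical caller: all setup that needs the file pointer happens
 * before the file becomes visible through the descriptor table. */
static int create_and_install(void *priv)
{
    int fd = get_unused_fd_model();
    if (fd < 0)
        return -1;
    struct file_model *f = getfile_model(priv);
    if (!f) {
        put_unused_fd_model(fd);
        return -1;
    }
    f->initialized = 1;   /* e.g. hook the file into other structures */
    fd_install_model(fd, f);
    return fd;
}
```

      Until fd_install_model() runs, the number is reserved but no file is reachable through it, so nothing can close it while the caller is still wiring the file into other structures.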
      
      Gregory manifested the need for something like this inside KVM.
      
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  31. Jun 18, 2009