  1. Feb 28, 2019
    • Al Viro's avatar
      convenience helpers: vfs_get_super() and sget_fc() · cb50b348
      Al Viro authored
      
      the former is an analogue of mount_{single,nodev} for use in
      ->get_tree() instances, the latter - analogue of sget() for the
      same.
      
      These are fairly similar to the originals, but the callback signature
      for sget_fc() is different from sget() ones, so getting bits and
      pieces shared would be too convoluted; we might get around to that
      later, but for now let's just remember to keep them in sync.  They
      do live next to each other, and changes in either won't be hard
      to spot.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cb50b348
    • David Howells's avatar
      vfs: Implement a filesystem superblock creation/configuration context · 3e1aeb00
      David Howells authored
      [AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living
      mounts]
      [AV - reordering fs/namespace.c is badly overdue, but let's keep it
      separate from that series]
      [AV - drop simple_pin_fs() change]
      [AV - clean vfs_kern_mount() failure exits up]
      
      Implement a filesystem context concept to be used during superblock
      creation for mount and superblock reconfiguration for remount.
      
      The mounting procedure then becomes:
      
       (1) Allocate new fs_context context.
      
       (2) Configure the context.
      
       (3) Create superblock.
      
       (4) Query the superblock.
      
       (5) Create a mount for the superblock.
      
       (6) Destroy the context.
      
      Rather than calling fs_type->mount(), an fs_context struct is created and
      fs_type->init_fs_context() is called to set it up.  Pointers exist for the
      filesystem and LSM to hang their private data off.
      
      A set of operations has to be set by ->init_fs_context() to provide
      freeing, duplication, option parsing, binary data parsing, vali...
      3e1aeb00
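The six-step procedure above can be modelled in userspace as a rough sketch. Every type and function name here (`sketch_ctx`, `ctx_get_tree()`, and so on) is a hypothetical stand-in invented for illustration; none of it is the kernel's actual fs_context interface, and the only "options" understood are a source string and a read-only flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; these are NOT kernel types. */
struct sketch_sb    { char source[32]; int read_only; };
struct sketch_ctx   { char source[32]; int read_only; struct sketch_sb *root; };
struct sketch_mount { struct sketch_sb *sb; };

/* (1) Allocate a new, zeroed context. */
static struct sketch_ctx *ctx_alloc(void)
{
	return calloc(1, sizeof(struct sketch_ctx));
}

/* (2) Configure the context one key/value pair at a time. */
static int ctx_set(struct sketch_ctx *c, const char *key, const char *val)
{
	if (strcmp(key, "source") == 0) {
		if (!val || strlen(val) >= sizeof(c->source))
			return -1;
		strcpy(c->source, val);
		return 0;
	}
	if (strcmp(key, "ro") == 0) {
		c->read_only = 1;
		return 0;
	}
	return -1;	/* unknown key */
}

/* (3) Create the superblock from the configured context. */
static int ctx_get_tree(struct sketch_ctx *c)
{
	struct sketch_sb *sb = calloc(1, sizeof(*sb));

	if (!sb)
		return -1;
	strcpy(sb->source, c->source);
	sb->read_only = c->read_only;
	c->root = sb;
	return 0;
}

/* (4) Query the superblock; here we only check that it has a source. */
static int ctx_query_ok(const struct sketch_ctx *c)
{
	return c->root && c->root->source[0] != '\0';
}

/* (5) Create a mount; ownership of the superblock moves to the mount. */
static struct sketch_mount *ctx_create_mount(struct sketch_ctx *c)
{
	struct sketch_mount *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->sb = c->root;
	c->root = NULL;
	return m;
}

/* (6) Destroy the context, dropping a superblock that was never mounted. */
static void ctx_free(struct sketch_ctx *c)
{
	if (c) {
		free(c->root);
		free(c);
	}
}

/* Run all six steps in order; returns NULL if any step fails. */
static struct sketch_mount *sketch_do_mount(const char *source, int ro)
{
	struct sketch_ctx *c = ctx_alloc();
	struct sketch_mount *m = NULL;

	if (!c)
		return NULL;
	if (ctx_set(c, "source", source) == 0 &&
	    (!ro || ctx_set(c, "ro", NULL) == 0) &&
	    ctx_get_tree(c) == 0 &&
	    ctx_query_ok(c))
		m = ctx_create_mount(c);
	ctx_free(c);
	return m;
}
```

The point of the model is that the context exists only for the duration of the call: once the mount holds the superblock, the context is destroyed regardless of whether the earlier steps succeeded.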
    • David Howells's avatar
      vfs: Put security flags into the fs_context struct · 846e5662
      David Howells authored
      
      Put security flags, such as SECURITY_LSM_NATIVE_LABELS, into the filesystem
      context so that the filesystem can communicate them to the LSM more easily.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      846e5662
    • David Howells's avatar
      smack: Implement filesystem context security hooks · 2febd254
      David Howells authored
      
      Implement filesystem context security hooks for the smack LSM.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Casey Schaufler <casey@schaufler-ca.com>
      cc: linux-security-module@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2febd254
    • David Howells's avatar
      selinux: Implement the new mount API LSM hooks · 442155c1
      David Howells authored
      
      Implement the new mount API LSM hooks for SELinux.  At some point the old
      hooks will need to be removed.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Paul Moore <paul@paul-moore.com>
      cc: Stephen Smalley <sds@tycho.nsa.gov>
      cc: selinux@tycho.nsa.gov
      cc: linux-security-module@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      442155c1
    • David Howells's avatar
      vfs: Add LSM hooks for the new mount API · da2441fd
      David Howells authored
      Add LSM hooks for use by the new mount API and filesystem context code.
      This includes:
      
       (1) Hooks to handle allocation, duplication and freeing of the security
           record attached to a filesystem context.
      
       (2) A hook to snoop source specifications.  There may be multiple of these
           if the filesystem supports it.  They will be local files/devices if
           fs_context::source_is_dev is true and will be something else, possibly
           remote server specifications, if false.
      
       (3) A hook to snoop superblock configuration options in key[=val] form.
           If the LSM decides it wants to handle it, it can suppress the option
           being passed to the filesystem.  Note that 'val' may include commas
           and binary data with the fsopen patch.
      
       (4) A hook to perform validation and allocation after the configuration
           has been done but before the superblock is allocated and set up.
      
       (5) A hook to transfer the security from the context to a newly created
           superblock...
      da2441fd
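Hook (3) above can be illustrated with a small userspace sketch. It assumes an option arrives as a `key[=val]` string, splits it at the first `=` only (so commas in the value survive), and lets a stand-in "LSM" claim the option before the "filesystem" sees it. All names (`sk_split_param()`, `sk_lsm_claims()`, `sk_fs_seen`) are hypothetical, and the choice of `context` as the claimed key is purely illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical parsed option: key copied out, val points into the input. */
struct sk_param {
	char key[32];
	const char *val;	/* NULL when the option has no "=val" part */
};

/* Split "key[=val]" at the first '=' only, so val may contain commas. */
static int sk_split_param(const char *opt, struct sk_param *p)
{
	const char *eq = strchr(opt, '=');
	size_t klen = eq ? (size_t)(eq - opt) : strlen(opt);

	if (klen == 0 || klen >= sizeof(p->key))
		return -EINVAL;
	memcpy(p->key, opt, klen);
	p->key[klen] = '\0';
	p->val = eq ? eq + 1 : NULL;
	return 0;
}

/* Stand-in LSM hook: returns non-zero if it wants to consume the option. */
static int sk_lsm_claims(const struct sk_param *p)
{
	return strcmp(p->key, "context") == 0;
}

/* Counts the options that actually reach the stand-in filesystem. */
static int sk_fs_seen;

/* Offer the option to the LSM first; pass it on only if unclaimed. */
static int sk_handle_option(const char *opt)
{
	struct sk_param p;
	int ret = sk_split_param(opt, &p);

	if (ret)
		return ret;
	if (sk_lsm_claims(&p))
		return 0;	/* suppressed: the filesystem never sees it */
	sk_fs_seen++;
	return 0;
}
```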
    • David Howells's avatar
      vfs: Add configuration parser helpers · 31d921c7
      David Howells authored
      Because the new API passes in key,value parameters, match_token() cannot be
      used with it.  Instead, provide three new helpers to aid with parsing:
      
       (1) fs_parse().  This takes a parameter and a simple static description of
           all the parameters and maps the key name to an ID.  It returns 1 on a
           match, 0 on no match if unknowns should be ignored and some other
           negative error code on a parse error.
      
           The parameter description includes a list of key names to IDs, desired
           parameter types and a list of enumeration name -> ID mappings.
      
           [!] Note that for the moment I've required that the key->ID mapping
           array be sorted and unterminated.  The size of the
           array is noted in the fsconfig_parser struct.  This allows me to use
           bsearch(), but I'm not sure any performance gain is worth the hassle
           of requiring people to keep the array sorted.
      
           The parameter type array is sized according to the number of parameter
         ...
      31d921c7
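The sorted key->ID lookup described in (1) might look roughly like this in userspace, using the standard C `bsearch()`. The table, the `sketch_parse()` name and the choice of `-EINVAL` for a rejected unknown key are assumptions made for the example; they are not taken from the patch. The return convention (1 on match, 0 on an ignored unknown, negative on error) follows the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key -> ID entry. */
struct sketch_key {
	const char *name;
	int id;
};

/* Must stay sorted by name (strcmp() order) for bsearch() to work. */
static const struct sketch_key sketch_keys[] = {
	{ "gid",  2 },
	{ "mode", 3 },
	{ "uid",  1 },
};

/* The array is unterminated, so its size is recorded separately. */
#define SKETCH_NR_KEYS (sizeof(sketch_keys) / sizeof(sketch_keys[0]))

/* bsearch() comparator: first argument is the search key string. */
static int sketch_cmp_key(const void *name, const void *elem)
{
	const struct sketch_key *k = elem;

	return strcmp(name, k->name);
}

/*
 * Returns 1 and stores the ID on a match, 0 on an unknown key when
 * unknowns are to be ignored, and -EINVAL (an assumption of this
 * sketch) otherwise.
 */
static int sketch_parse(const char *key, int ignore_unknown, int *id)
{
	const struct sketch_key *k;

	k = bsearch(key, sketch_keys, SKETCH_NR_KEYS,
		    sizeof(sketch_keys[0]), sketch_cmp_key);
	if (k) {
		*id = k->id;
		return 1;
	}
	return ignore_unknown ? 0 : -EINVAL;
}
```

The trade-off raised in the note above is visible here: the lookup is O(log n), but adding `{ "fmask", 4 }` at the end of the table instead of at the front would silently break it.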
  2. Jan 30, 2019
    • David Howells's avatar
      vfs: Introduce logging functions · c6b82263
      David Howells authored
      
      Introduce a set of logging functions through which informational messages,
      warnings and error messages incurred by the mount procedure can be logged
      and, in a future patch, passed to userspace instead by way of the
      filesystem configuration context file descriptor.
      
      There are four functions:
      
       (1) infof(const char *fmt, ...);
      
           Logs an informational message.
      
       (2) warnf(const char *fmt, ...);
      
           Logs a warning message.
      
       (3) errorf(const char *fmt, ...);
      
           Logs an error message.
      
       (4) invalf(const char *fmt, ...);
      
           As errorf(), but returns -EINVAL so it can be used in a return statement.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c6b82263
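A userspace approximation of the four logging functions, mainly to show how an invalf()-style helper can log and yield -EINVAL in a single expression. The `sketch_` names, the single-character level prefix and the message buffer are assumptions of this sketch; the kernel versions may take different arguments and route messages elsewhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Last formatted message, kept so callers (and tests) can inspect it. */
static char sketch_last_msg[128];

/* Format "<level> <message>" into the buffer and echo it to stderr. */
static void sketch_log(char level, const char *fmt, ...)
{
	va_list ap;

	sketch_last_msg[0] = level;
	sketch_last_msg[1] = ' ';
	va_start(ap, fmt);
	vsnprintf(sketch_last_msg + 2, sizeof(sketch_last_msg) - 2, fmt, ap);
	va_end(ap);
	fprintf(stderr, "%s\n", sketch_last_msg);
}

#define sketch_infof(...)  sketch_log('i', __VA_ARGS__)
#define sketch_warnf(...)  sketch_log('w', __VA_ARGS__)
#define sketch_errorf(...) sketch_log('e', __VA_ARGS__)

/* Comma operator: log the error, then evaluate to -EINVAL. */
#define sketch_invalf(...) (sketch_errorf(__VA_ARGS__), -EINVAL)

/* Example user: parse an octal mode, rejecting bad input in one line. */
static int sketch_parse_mode(const char *val, unsigned int *mode)
{
	unsigned int m;

	if (sscanf(val, "%o", &m) != 1 || m > 07777)
		return sketch_invalf("bad mode '%s'", val);
	*mode = m;
	return 0;
}
```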
    • Al Viro's avatar
      introduce fs_context methods · f3a09c92
      Al Viro authored
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      f3a09c92
    • Al Viro's avatar
      fs_context flavour for submounts · e1a91586
      Al Viro authored
      
      This is an eventual replacement for vfs_submount() uses.  Unlike the
      "mount" and "remount" cases, the users of that thing are not in VFS -
      they are buried in various ->d_automount() instances and rather than
      converting them all at once we introduce the (thankfully small and
      simple) infrastructure here and deal with the prospective users in
      afs, nfs, etc. parts of the series.
      
      Here we just introduce a new constructor (fs_context_for_submount())
      along with the corresponding enum constant to be put into fc->purpose
      for those.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e1a91586
    • David Howells's avatar
      convert do_remount_sb() to fs_context · 8d0347f6
      David Howells authored
      
      Replace do_remount_sb() with a function, reconfigure_super(), that's
      fs_context aware.  The fs_context is expected to be parameterised already
      and have ->root pointing to the superblock to be reconfigured.
      
      A legacy wrapper is provided that is intended to be called from the
      fs_context ops when those appear, but for now is called directly from
      reconfigure_super().  This wrapper invokes the ->remount_fs() superblock op
      for the moment.  It is intended that the remount_fs() op will be phased
      out.
      
      The fs_context->purpose is set to FS_CONTEXT_FOR_RECONFIGURE to indicate
      that the context is being used for reconfiguration.
      
      do_umount_root() is provided to consolidate remount-to-R/O for umount and
      emergency remount by creating a context and invoking reconfiguration.
      
      do_remount(), do_umount() and do_emergency_remount_callback() are switched
      to use the new process.
      
      [AV -- fold UMOUNT and EMERGENCY_REMOUNT in; fixes the
      umount / bug, gets rid of pointless complexity]
      [AV -- set ->net_ns in all cases; nfs remount will need that]
      [AV -- shift security_sb_remount() call into reconfigure_super(); the callers
      that didn't do security_sb_remount() have NULL fc->security anyway, so it's
      a no-op for them]
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8d0347f6
    • Al Viro's avatar
      vfs_get_tree(): evict the call of security_sb_kern_mount() · c9ce29ed
      Al Viro authored
      
      Right now vfs_get_tree() calls security_sb_kern_mount() (i.e.
      mount MAC) unless it gets MS_KERNMOUNT or MS_SUBMOUNT in flags.
      Doing it that way is both clumsy and imprecise.
      
      Consider the callers' tree of vfs_get_tree():
      vfs_get_tree()
              <- do_new_mount()
      	<- vfs_kern_mount()
      		<- simple_pin_fs()
      		<- vfs_submount()
      		<- kern_mount_data()
      		<- init_mount_tree()
      		<- btrfs_mount()
      			<- vfs_get_tree()
      		<- nfs_do_root_mount()
      			<- nfs4_try_mount()
      				<- nfs_fs_mount()
      					<- vfs_get_tree()
      			<- nfs4_referral_mount()
      
      do_new_mount() always does need MAC (we are guaranteed that neither
      MS_KERNMOUNT nor MS_SUBMOUNT will be passed there).
      
      simple_pin_fs(), vfs_submount() and kern_mount_data() pass explicit
      flags inhibiting that check.  So does nfs4_referral_mount() (the
      flags there are ultimately coming from vfs_submount()).
      
      init_mount_tree() is called too early for anything LSM-related; it
      doesn't matter whether we attempt those checks, they'll do nothing.
      
      Finally, in case of btrfs_mount() and nfs_fs_mount(), doing MAC
      is pointless - either the caller will do it, or the flags are
      such that we wouldn't have done it either.
      
      In other words, the one and only case when we want that check
      done is when we are called from do_new_mount(), and there we
      want it unconditionally.
      
      So let's simply move it there.  The superblock is still locked,
      so nobody is going to get access to it (via ustat(2), etc.)
      until we get a chance to apply the checks - we are free to
      move them to any point up to where we drop ->s_umount (in
      do_new_mount_fc()).
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c9ce29ed
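The argument above boils down to two predicates that agree for every caller: the old one keyed on flags, the new one keyed on the call site. A toy userspace model follows; the enum, the `SK_*` bit values and the mapping from caller to flag are assumptions made for the sketch, and only the callers that the message says pass explicit flags are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for MS_KERNMOUNT / MS_SUBMOUNT; bit values are arbitrary. */
#define SK_KERNMOUNT (1u << 0)
#define SK_SUBMOUNT  (1u << 1)

/* A subset of the callers discussed above. */
enum sk_caller {
	SK_DO_NEW_MOUNT,
	SK_SIMPLE_PIN_FS,
	SK_VFS_SUBMOUNT,
	SK_KERN_MOUNT_DATA,
};

/* Flags each caller is assumed to pass down (assumption of this sketch). */
static unsigned int sk_flags_for(enum sk_caller c)
{
	switch (c) {
	case SK_SIMPLE_PIN_FS:
	case SK_KERN_MOUNT_DATA:
		return SK_KERNMOUNT;
	case SK_VFS_SUBMOUNT:
		return SK_SUBMOUNT;
	default:
		return 0;	/* do_new_mount(): neither flag is set */
	}
}

/* Old scheme: the callee decides by inspecting the flags. */
static bool sk_mac_old(unsigned int flags)
{
	return !(flags & (SK_KERNMOUNT | SK_SUBMOUNT));
}

/* New scheme: only the do_new_mount() path performs the check. */
static bool sk_mac_new(enum sk_caller c)
{
	return c == SK_DO_NEW_MOUNT;
}
```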
    • David Howells's avatar
      new helper: do_new_mount_fc() · 132e4608
      David Howells authored
      
      Create an fs_context-aware version of do_new_mount().  This takes an
      fs_context with a superblock already attached to it.
      
      Make do_new_mount() use do_new_mount_fc() rather than do_new_mount(); this
      allows the consolidation of the mount creation, check and add steps.
      
      To make this work, mount_too_revealing() is changed to take a superblock
      rather than a mount (which the fs_context doesn't have available), allowing
      this check to be done before the mount object is created.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      132e4608
    • Al Viro's avatar
      teach vfs_get_tree() to handle subtype, switch do_new_mount() to it · a0c9a8b8
      Al Viro authored
      
      Roll the handling of subtypes into do_new_mount() and vfs_get_tree().  The
      former determines any subtype string and hangs it off the fs_context; the
      latter applies it.
      
      Make do_new_mount() create, parameterise and commit an fs_context and
      create a mount for itself rather than calling vfs_kern_mount().
      
      [AV -- missing kstrdup()]
      [AV -- ... and no kstrdup() if we get to setting ->s_submount - we
      simply transfer it from fc, leaving NULL behind]
      [AV -- constify ->s_submount, while we are at it]
      
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a0c9a8b8
    • Al Viro's avatar
      new helpers: vfs_create_mount(), fc_mount() · 8f291889
      Al Viro authored
      
      Create a new helper, vfs_create_mount(), that creates a detached vfsmount
      object from an fs_context that has a superblock attached to it.
      
      Almost all uses will be paired with immediately preceding vfs_get_tree();
      add a helper for such combination.
      
      Switch vfs_kern_mount() to use this.
      
      NOTE: mild behaviour change; passing NULL as 'device name' to
      something like procfs will change /proc/*/mountstats - "device none"
      instead of "no device".  That is consistent with /proc/mounts et al.
      
      [do'h - EXPORT_SYMBOL_GPL slipped in by mistake; removed]
      [AV -- remove confused comment from vfs_create_mount()]
      [AV -- removed the second argument]
      
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8f291889
    • David Howells's avatar
      vfs: Introduce fs_context, switch vfs_kern_mount() to it. · 9bc61ab1
      David Howells authored
      
      Introduce a filesystem context concept to be used during superblock
      creation for mount and superblock reconfiguration for remount.  This is
      allocated at the beginning of the mount procedure and into it is placed:
      
       (1) Filesystem type.
      
       (2) Namespaces.
      
       (3) Source/Device names (there may be multiple).
      
       (4) Superblock flags (SB_*).
      
       (5) Security details.
      
       (6) Filesystem-specific data, as set by the mount options.
      
      Accessor functions are then provided to set up a context, parameterise it
      from monolithic mount data (the data page passed to mount(2)) and tear it
      down again.
      
      A legacy wrapper is provided that implements what will be the basic
      operations, wrapping access to filesystems that aren't yet aware of the
      fs_context.
      
      Finally, vfs_kern_mount() is changed to make use of the fs_context and
      mount_fs() is replaced by vfs_get_tree(), called from vfs_kern_mount().
      [AV -- add missing kstrdup()]
      [AV -- put_cred() can be unconditional - fc->cred can't be NULL]
      [AV -- take legacy_validate() contents into legacy_parse_monolithic()]
      [AV -- merge KERNEL_MOUNT and USER_MOUNT]
      [AV -- don't unlock superblock on success return from vfs_get_tree()]
      [AV -- kill 'reference' argument of init_fs_context()]
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Co-developed-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9bc61ab1
    • Al Viro's avatar
      saner handling of temporary namespaces · 74e83122
      Al Viro authored
      
      mount_subtree() creates (and soon destroys) a temporary namespace,
      so that automounts could function normally.  These beasts should
      never become anyone's current namespaces; they don't, but it would
      be better to make prevention of that more straightforward.  And
      since they don't become anyone's current namespace, we don't need
      to bother with reserving procfs inums for those.
      
      Teach alloc_mnt_ns() to skip inum allocation if told so, adjust
      put_mnt_ns() accordingly, make mount_subtree() use temporary
      (anon) namespace.  is_anon_ns() checks if a namespace is such.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      74e83122
    • Al Viro's avatar
      separate copying and locking mount tree on cross-userns copies · 3bd045cc
      Al Viro authored
      
      Rather than having propagate_mnt() check doing unprivileged copies,
      lock them before commit_tree().
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3bd045cc
  3. Jan 17, 2019
    • Al Viro's avatar
      kill kernfs_pin_sb() · 6d7fbce7
      Al Viro authored
      
      unused now and impossible to use safely anyway.
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6d7fbce7
    • Al Viro's avatar
      cgroup: saner refcounting for cgroup_root · 35ac1184
      Al Viro authored
      * make the reference from superblock to cgroup_root counting -
      do cgroup_put() in cgroup_kill_sb() whether we'd done
      percpu_ref_kill() or not; matching grab is done when we allocate
      a new root.  That gives the same refcounting rules for all callers
      of cgroup_do_mount() - a reference to cgroup_root has been grabbed
      by caller and it either is transferred to new superblock or dropped.
      
      * have cgroup_kill_sb() treat an already killed refcount as "just
      don't bother killing it, then".
      
      * after successful cgroup_do_mount() have cgroup1_mount() recheck
      if we'd raced with mount/umount from somebody else and cgroup_root
      got killed.  In that case we drop the superblock and bugger off
      with -ERESTARTSYS, same as if we'd found it in the list already
      dying.
      
      * don't bother with delayed initialization of refcount - it's
      unreliable and not needed.  No need to prevent attempts to bump
      the refcount if we find cgroup_root of another mount in progress -
      sget will reuse an existing sup...
      35ac1184
    • Al Viro's avatar
      fix cgroup_do_mount() handling of failure exits · 399504e2
      Al Viro authored
      same story as with last May fixes in sysfs (7b745a4e
      "unfuck sysfs_mount()"); new_sb is left uninitialized
      in case of early errors in kernfs_mount_ns() and papering
      over it by treating any error from kernfs_mount_ns() as
      equivalent to !new_ns ends up conflating the cases when
      objects had never been transferred to a superblock with
      ones when that has happened and resulting new superblock
      had been dropped.  Easily fixed (same way as in sysfs
      case).  Additionally, there's a superblock leak on
      kernfs_node_dentry() failure *and* a dentry leak inside
      kernfs_node_dentry() itself - the latter on probably
      impossible errors, but the former not impossible to trigger
      (as a matter of fact, injecting allocation failures
      at that point *does* trigger it).
      
      Cc: stable@kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      399504e2
  4. Jan 13, 2019
  5. Jan 12, 2019
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190112' of git://git.kernel.dk/linux-block · b8c3b899
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request from Christoph, with little fixes all over the map
      
       - Loop caching fix for offset/bs change (Jaegeuk Kim)
      
       - Block documentation tweaks (Jeff, Jon, Weiping, John)
      
       - null_blk zoned tweak (John)
      
       - ahci mvebu suspend/resume support. Should have gone into the merge
         window, but there was some confusion on which tree had it. (Miquel)
      
      * tag 'for-linus-20190112' of git://git.kernel.dk/linux-block: (22 commits)
        ata: ahci: mvebu: request PHY suspend/resume for Armada 3700
        ata: ahci: mvebu: add Armada 3700 initialization needed for S2RAM
        ata: ahci: mvebu: do Armada 38x configuration only on relevant SoCs
        ata: ahci: mvebu: remove stale comment
        ata: libahci_platform: comply to PHY framework
        loop: drop caches if offset or block_size are changed
        block: fix kerneldoc comment for blk_attempt_plug_merge()
        nvme: don't initlialize ctrl->cntlid twice
        nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN
        nvme: pad fake subsys NQN vid and ssvid with zeros
        nvme-multipath: zero out ANA log buffer
        nvme-fabrics: unset write/poll queues for discovery controllers
        nvme-tcp: don't ask if controller is fabrics
        nvme-tcp: remove dead code
        nvme-pci: fix out of bounds access in nvme_cqe_pending
        nvme-pci: rerun irq setup on IO queue init errors
        nvme-pci: use the same attributes when freeing host_mem_desc_bufs.
        nvme-pci: fix the wrong setting of nr_maps
        block: doc: add slice_idle_us to bfq documentation
        block: clarify documentation for blk_{start|finish}_plug
        ...
      b8c3b899
    • Linus Torvalds's avatar
      Merge tag 'remove-dma_zalloc_coherent-5.0' of git://git.infradead.org/users/hch/dma-mapping · 66c56cfa
      Linus Torvalds authored
      Pull dma_zalloc_coherent() removal from Christoph Hellwig:
       "We've always had a weird situation around dma_zalloc_coherent. To
        safely support mapping the allocations to userspace major
        architectures like x86 and arm have always zeroed allocations from
        dma_alloc_coherent, but a couple other architectures were missing that
        zeroing either always or in corner cases.
      
        Then later we grew another dma_zalloc_coherent interface to explicitly
        request zeroing, but that just added __GFP_ZERO to the allocation
        flags, which for some allocators that didn't end up using the page
        allocator ended up being a no-op and still not zeroing the
        allocations.
      
        So for this merge window I fixed up all remaining architectures to
        zero the memory in dma_alloc_coherent, and made dma_zalloc_coherent a
        no-op wrapper around dma_alloc_coherent, which fixes all of the above
        issues.
      
        dma_zalloc_coherent is now pointless and c...
      66c56cfa
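The end state described in the pull message, where the base allocator always zeroes and the "zalloc" variant is just a pass-through, can be pictured with a userspace analogy. The `sk_` functions below are hypothetical stand-ins built on `malloc()`; they do not reproduce the signatures of the DMA API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in base allocator: always returns zeroed memory (or NULL). */
static void *sk_alloc_coherent(size_t size)
{
	void *p = malloc(size);

	if (p)
		memset(p, 0, size);
	return p;
}

/* With zeroing guaranteed above, the explicit variant adds nothing. */
static void *sk_zalloc_coherent(size_t size)
{
	return sk_alloc_coherent(size);
}
```

Once every implementation of the base allocator zeroes, callers of the wrapper can be switched to the base function mechanically and the wrapper dropped.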