  1. Jun 15, 2024
  2. Jun 03, 2024
  3. Apr 24, 2024
  4. Dec 20, 2023
  5. Oct 04, 2023
  6. Aug 21, 2023
    • memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy · 9876cfe8
      Aleksa Sarai authored
      This sysctl has the very unusual behaviour of not allowing any user (even
      CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you were
      to set this sysctl to a more restrictive option in the host pidns you
      would need to reboot your machine in order to reset it.
      
      The justification given in [1] is that this is a security feature and thus
      it should not be possible to disable.  Aside from the fact that we have
      plenty of security-related sysctls that can be disabled after being
      enabled (fs.protected_symlinks for instance), the protection provided by
      the sysctl is to stop users from being able to create a binary and then
      execute it.  A user with CAP_SYS_ADMIN can trivially do this without
      memfd_create(2):
      
        % cat mount-memfd.c
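        #define _GNU_SOURCE	/* for asprintf() */
        #include <assert.h>
        #include <fcntl.h>
        #include <string.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>
        #include <sys/mount.h>	/* fsopen(), fsconfig(), fsmount(); assumes glibc >= 2.36 */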
      
        #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
      
        int main(void)
        {
        	int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
        	assert(fsfd >= 0);
        	assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0));
      
        	int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
        	assert(dfd >= 0);
      
        	int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0755);
        	assert(execfd >= 0);
        	assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
        	assert(!close(execfd));
      
        	char *execpath = NULL;
        	char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
        	execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
        	assert(execfd >= 0);
        	assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
        	assert(!execve(execpath, argv, envp));
        }
        % ./mount-memfd
        this file was executed from this totally private tmpfs: /proc/self/fd/5
        %
      
      Given that it is possible for CAP_SYS_ADMIN users to create executable
      binaries without memfd_create(2) and without touching the host filesystem
      (not to mention the many other things a CAP_SYS_ADMIN process would be
      able to do that would be equivalent or worse), it seems strange to cause a
      fair amount of headache to admins when there doesn't appear to be an
      actual security benefit to blocking this.  There appear to be concerns
      about confused-deputy-esque attacks[2] but a confused deputy that can
      write to arbitrary sysctls is a bigger security issue than executable
      memfds.
      
      /* New API */
      
      The primary requirement from the original author appears to be more based
      on the need to be able to restrict an entire system in a hierarchical
      manner[3], such that child namespaces cannot re-enable executable memfds.
      
      So, implement that behaviour explicitly -- the vm.memfd_noexec scope is
      evaluated up the pidns tree to &init_pid_ns and you have the most
      restrictive value applied to you.  The new lower limit you can set
      vm.memfd_noexec to is whatever limit applies to your parent.
      
      Note that a pidns will inherit a copy of the parent pidns's effective
      vm.memfd_noexec setting at unshare() time.  This matches the existing
      behaviour, and it also ensures that a pidns will never have its
      vm.memfd_noexec setting *lowered* behind its back (but it will be raised
      if the parent raises theirs).
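
The hierarchy rule can be sketched in userspace C.  This is only a minimal
model under stated assumptions, not the kernel implementation: struct
model_pidns, effective_noexec() and set_noexec() are hypothetical names
invented for illustration.  The effective value is the maximum found while
walking up to the root, and a write is rejected when it would go below what
the parent chain imposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a pid namespace; not the kernel struct. */
struct model_pidns {
	struct model_pidns *parent;	/* NULL for the initial namespace */
	int memfd_noexec;		/* locally stored setting: 0, 1 or 2 */
};

/*
 * Effective setting: the most restrictive (largest) value found while
 * walking from ns up to the root of the tree.
 */
static int effective_noexec(const struct model_pidns *ns)
{
	int scope = 0;

	for (; ns != NULL; ns = ns->parent)
		if (ns->memfd_noexec > scope)
			scope = ns->memfd_noexec;
	return scope;
}

/*
 * Accept a write only if it stays at or above the limit that applies to
 * the parent.  Returns 0 on success and -1 if the write is rejected.
 */
static int set_noexec(struct model_pidns *ns, int value)
{
	assert(ns != NULL);
	if (value < 0 || value > 2)
		return -1;
	if (ns->parent != NULL && value < effective_noexec(ns->parent))
		return -1;
	ns->memfd_noexec = value;
	return 0;
}
```

In this model the root namespace can lower its own value again, because no
parent constrains it, while a child can only move at or above its parent's
effective value.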
      
      /* Backwards Compatibility */
      
      As the previous version of the sysctl didn't allow you to lower the
      setting at all, there are no backwards compatibility issues with this
      aspect of the change.
      
      However, it should be noted that the setting is now completely
      hierarchical.  Previously, a cloned pidns would just copy the current
      pidns setting, meaning that if the parent's vm.memfd_noexec was changed it
      wouldn't propagate to existing pid namespaces.  Now, the restriction
      applies recursively.  This is a uAPI change, however:
      
       * The sysctl is very new, having been merged in 6.3.
       * Several aspects of the sysctl were broken up until this patchset and
         the other patchset by Jeff Xu last month.
      
      And thus it seems incredibly unlikely that any real users would run into
      this issue. In the worst case, if this causes userspace issues we could
      make it so that modifying the setting follows the hierarchical rules but
      the restriction checking uses the cached copy.
      
      [1]: https://lore.kernel.org/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/
      [2]: https://lore.kernel.org/CALmYWFs_dNCzw_pW1yRAo4bGCPEtykroEQaowNULp7svwMLjOg@mail.gmail.com/
      [3]: https://lore.kernel.org/CALmYWFuahdUF7cT4cm7_TGLqPanuHXJ-hVSfZt7vpTnc18DPrw@mail.gmail.com/
      
      Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e10ba6@cyphar.com
      Fixes: 105ff533 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
      Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Daniel Verkamp <dverkamp@chromium.org>
      Cc: Jeff Xu <jeffxu@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  7. Jul 01, 2023
  8. Jun 30, 2023
  9. May 02, 2023
  10. Jan 18, 2023
    • mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC · 105ff533
      Jeff Xu authored
      The new MFD_NOEXEC_SEAL and MFD_EXEC flags allow an application to set
      the executable bit at creation time (memfd_create).
      
      When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
      (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to be
      executable (mode: 0777) after creation.
      
      When the MFD_EXEC flag is set, memfd is created with executable bit
      (mode:0777), this is the same as the old behavior of memfd_create.
      
      The new pid namespaced sysctl vm.memfd_noexec has 3 values:
      0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
              MFD_EXEC was set.
      1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
              MFD_NOEXEC_SEAL was set.
      2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
      
      The sysctl allows finer control of memfd_create for old-software that
      doesn't set the executable bit, for example, a container with
      vm.memfd_noexec=1 means the old-software will create non-executable memfd
      by default.  Also, the value of memfd_noexec is passed to child namespace
      at creation time.  For example, if the init namespace has
      vm.memfd_noexec=2, all its children namespaces will be created with 2.
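
The three sysctl values can be read as a small decision function.  The sketch
below is a userspace model of the rules listed above, not the kernel code:
resolve_memfd_flags() and the MODEL_* constants are hypothetical, and the
constants are deliberately not the real flag values from the uAPI headers.
It encodes only the three rules stated in this message.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real flags; values chosen for the model. */
#define MODEL_MFD_NOEXEC_SEAL	0x1u
#define MODEL_MFD_EXEC		0x2u

/*
 * Returns the flags memfd_create() would effectively act on under the
 * given vm.memfd_noexec value, or -1 if the call would be rejected.
 */
static int resolve_memfd_flags(unsigned int flags, int sysctl)
{
	assert(sysctl >= 0 && sysctl <= 2);

	/* 2: anything without MFD_NOEXEC_SEAL is rejected. */
	if (sysctl == 2 && !(flags & MODEL_MFD_NOEXEC_SEAL))
		return -1;

	/* 0 and 1 only pick a default when neither flag was passed. */
	if (!(flags & (MODEL_MFD_EXEC | MODEL_MFD_NOEXEC_SEAL)))
		flags |= (sysctl == 0) ? MODEL_MFD_EXEC : MODEL_MFD_NOEXEC_SEAL;

	return (int)flags;
}
```

For old software that passes neither flag, the model yields an executable
memfd under 0, a sealed non-executable one under 1, and a rejection under 2.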
      
      [akpm@linux-foundation.org: add stub functions to fix build]
      [akpm@linux-foundation.org: remove unneeded register_pid_ns_ctl_table_vm() stub, per Jeff]
      [akpm@linux-foundation.org: s/pr_warn_ratelimited/pr_warn_once/, per review]
      [akpm@linux-foundation.org: fix CONFIG_SYSCTL=n warning]
      Link: https://lkml.kernel.org/r/20221215001205.51969-4-jeffxu@google.com
      Signed-off-by: Jeff Xu <jeffxu@google.com>
      Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
      Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  11. Jan 03, 2023
    • rcu-tasks: Fix synchronize_rcu_tasks() VS zap_pid_ns_processes() · 28319d6d
      Frederic Weisbecker authored
      RCU Tasks and PID-namespace unshare can interact in do_exit() in a
      complicated circular dependency:
      
      1) TASK A calls unshare(CLONE_NEWPID), this creates a new PID namespace
         that every subsequent child of TASK A will belong to. But TASK A
         doesn't itself belong to that new PID namespace.
      
      2) TASK A forks() and creates TASK B. TASK A stays attached to its PID
         namespace (let's say PID_NS1) and TASK B is the first task belonging
         to the new PID namespace created by unshare()  (let's call it PID_NS2).
      
      3) Since TASK B is the first task attached to PID_NS2, it becomes the
         PID_NS2 child reaper.
      
      4) TASK A forks() again and creates TASK C which gets attached to PID_NS2.
         Note how TASK C has TASK A as a parent (belonging to PID_NS1) but has
         TASK B (belonging to PID_NS2) as a pid_namespace child_reaper.
      
      5) TASK B exits and since it is the child reaper for PID_NS2, it has to
         kill all other tasks attached to PID_NS2, and wait for all of them to
         die before getting reaped itself (zap_pid_ns_processes()).
      
      6) TASK A calls synchronize_rcu_tasks() which leads to
         synchronize_srcu(&tasks_rcu_exit_srcu).
      
      7) TASK B is waiting for TASK C to get reaped. But TASK B is under a
         tasks_rcu_exit_srcu SRCU critical section (exit_notify() is between
         exit_tasks_rcu_start() and exit_tasks_rcu_finish()), blocking TASK A.
      
      8) TASK C exits and since TASK A is its parent, it waits for it to reap
         TASK C, but it can't because TASK A waits for TASK B that waits for
         TASK C.
      
      Pid_namespace semantics can hardly be changed at this point. But the
      coverage of tasks_rcu_exit_srcu can be reduced instead.
      
      The current task is assumed not to be concurrently reapable at this
      stage of exit_notify() and therefore tasks_rcu_exit_srcu can be
      temporarily relaxed without breaking its constraints, providing a way
      out of the deadlock scenario.
      
      [ paulmck: Fix build failure by adding additional declaration. ]
      
      Fixes: 3f95aa81 ("rcu: Make TASKS_RCU handle tasks that are almost done exiting")
      Reported-by: Pengfei Xu <pengfei.xu@intel.com>
      Suggested-by: Boqun Feng <boqun.feng@gmail.com>
      Suggested-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Suggested-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Eric W . Biederman <ebiederm@xmission.com>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  12. Apr 29, 2022
  13. Sep 03, 2021
  14. Oct 16, 2020
  15. Aug 19, 2020
    • pid: Use generic ns_common::count · 8eb71d95
      Kirill Tkhai authored
      
      Switch over pid namespaces to use the newly introduced common lifetime
      counter.
      
      Currently every namespace type has its own lifetime counter which is stored
      in the specific namespace struct. The lifetime counters are used
      identically for all namespaces types. Namespaces may of course have
      additional unrelated counters and these are not altered.
      
      This introduces a common lifetime counter into struct ns_common. The
      ns_common struct encompasses information that all namespaces share. That
      should include the lifetime counter since it's common for all of them.
      
      It also allows us to unify the type of the counters across all namespaces.
      Most of them use refcount_t but one uses atomic_t and at least one uses
      kref. Especially the last one doesn't make much sense since it's just a
      wrapper around refcount_t since 2016 and actually complicates cleanup
      operations by having to use container_of() to cast the correct namespace
      struct out of struct ns_common.
      
      Having the lifetime counter for the namespaces in one place reduces
      maintenance cost. Not just because after switching all namespaces over we
      will have removed more code than we added but also because the logic is
      more easily understandable and we indicate to the user that the basic
      lifetime requirements for all namespaces are currently identical.
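
As a rough illustration of the idea, the sketch below embeds one shared
counter in a common struct that every namespace type carries.  It is a
userspace model only: the model_* names are hypothetical, and a plain int
stands in for the kernel's refcount_t, so it is not safe for concurrent use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the part shared by every namespace type. */
struct model_ns_common {
	int count;	/* lifetime counter; plain int here, not refcount_t */
};

/* Two namespace types embedding the same common part. */
struct model_pid_ns {
	struct model_ns_common ns;
	int level;
};

struct model_uts_ns {
	struct model_ns_common ns;
	char name[16];
};

/* Take a reference; valid only while at least one reference is held. */
static void model_ns_get(struct model_ns_common *ns)
{
	assert(ns != NULL && ns->count > 0);
	ns->count++;
}

/* Drop a reference; returns 1 when the caller dropped the last one. */
static int model_ns_put(struct model_ns_common *ns)
{
	assert(ns != NULL && ns->count > 0);
	return --ns->count == 0;
}
```

Because the counter lives in the embedded common struct, the same get and put
helpers serve every namespace type in the model.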
      
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Link: https://lore.kernel.org/r/159644979226.604812.7512601754841882036.stgit@localhost.localdomain
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  16. Jul 19, 2020
  17. May 09, 2020
    • nsproxy: add struct nsset · f2a8d52e
      Christian Brauner authored
      
      Add a simple struct nsset. It holds all necessary pieces to switch to a new
      set of namespaces without leaving a task in a half-switched state which we
      will make use of in the next patch. This patch switches the existing setns
      logic over without causing a change in setns() behavior. This brings
      setns() closer to how unshare() works. The prepare_ns() function is
      responsible for preparing all necessary information. There are two
      reasons for this. First, it minimizes dependencies between individual
      namespaces, i.e. all install handlers can expect that all fields are
      properly initialized regardless of the order in which they are called.
      Second, this makes the code easier to maintain and easier to follow if
      it needs to be changed.
      
      The prepare_ns() helper will only be switched over to use a flags argument
      in the next patch. Here it will still use nstype as a simple integer
      argument which was argued would be clearer. I'm not particularly
      opinionated about this if it really helps or not. The struct nsset itself
      already contains the flags field since its name already indicates that it
      can contain information required by different namespaces. None of this
      should have functional consequences.
      
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Reviewed-by: Serge Hallyn <serge@hallyn.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Aleksa Sarai <cyphar@cyphar.com>
      Link: https://lore.kernel.org/r/20200505140432.181565-2-christian.brauner@ubuntu.com
  18. Apr 27, 2020
  19. Feb 28, 2020
  20. Nov 15, 2019
    • fork: extend clone3() to support setting a PID · 49cb2fc4
      Adrian Reber authored
      
      The main motivation to add set_tid to clone3() is CRIU.
      
      To restore a process with the same PID/TID CRIU currently uses
      /proc/sys/kernel/ns_last_pid. It writes the desired (PID - 1) to
      ns_last_pid and then (quickly) does a clone(). This works most of the
      time, but it is racy. It is also slow as it requires multiple syscalls.
      
      Extending clone3() to support *set_tid makes it possible to restore a
      process using CRIU without accessing /proc/sys/kernel/ns_last_pid and
      race free (as long as the desired PID/TID is available).
      
      This clone3() extension places the same restrictions (CAP_SYS_ADMIN)
      on clone3() with *set_tid as they are currently in place for ns_last_pid.
      
      The original version of this change was using a single value for
      set_tid. At the 2019 LPC, after presenting set_tid, it was, however,
      decided to change set_tid to an array to enable setting the PID of a
      process in multiple PID namespaces at the same time. If a process is
      created in a PID namespace it is possible to influence the PID inside
      and outside of the PID namespace. Details also in the corresponding
      selftest.
      
      To create a process with the following PIDs:
      
            PID NS level         Requested PID
              0 (host)              31496
              1                        42
              2                         1
      
      For that example the two newly introduced parameters to struct
      clone_args (set_tid and set_tid_size) would need to be:
      
        set_tid[0] = 1;
        set_tid[1] = 42;
        set_tid[2] = 31496;
        set_tid_size = 3;
      
      If only the PIDs of the two innermost nested PID namespaces should be
      defined it would look like this:
      
        set_tid[0] = 1;
        set_tid[1] = 42;
        set_tid_size = 2;
      
      The PID of the newly created process would then be the next available
      free PID in the PID namespace level 0 (host) and 42 in the PID namespace
      at level 1 and the PID of the process in the innermost PID namespace
      would be 1.
      
      The set_tid array is used to specify the PID of a process starting
      from the innermost nested PID namespaces up to set_tid_size PID namespaces.
      
      set_tid_size cannot be larger than the current PID namespace level.
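
The innermost-first ordering is easy to get backwards, so here is a small
helper that builds the array from a host-first list of PIDs.  It is only a
sketch: fill_set_tid() and its parameters are hypothetical names, and the
helper does not call clone3() itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * pids_by_level[0] is the PID wanted at level 0 (host) and
 * pids_by_level[depth - 1] the PID wanted in the innermost namespace.
 * Fills set_tid innermost-first for the `count` innermost levels and
 * returns the value to use as set_tid_size.
 */
static size_t fill_set_tid(const pid_t *pids_by_level, size_t depth,
			   size_t count, pid_t *set_tid)
{
	assert(pids_by_level != NULL && set_tid != NULL);
	assert(count <= depth);

	for (size_t i = 0; i < count; i++)
		set_tid[i] = pids_by_level[depth - 1 - i];
	return count;
}
```

With the table above as input ({31496, 42, 1}), a count of 3 gives the first
example and a count of 2 gives the second, leaving the host PID to be chosen
by the kernel.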
      
      Signed-off-by: Adrian Reber <areber@redhat.com>
      Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Acked-by: Andrei Vagin <avagin@gmail.com>
      Link: https://lore.kernel.org/r/20191115123621.142252-1-areber@redhat.com
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
  21. Jul 18, 2019
  22. May 27, 2019
    • signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig · f9070dc9
      Eric W. Biederman authored
      The locking in force_sig_info is not prepared to deal with a task that
      exits or execs (as sighand may change).  This is not a locking problem
      in force_sig as force_sig is only built to handle synchronous
      exceptions.
      
      Further the function force_sig_info changes the signal state if the
      signal is ignored, or blocked or if SIGNAL_UNKILLABLE will prevent the
      delivery of the signal.  The signal SIGKILL can not be ignored and can
      not be blocked and SIGNAL_UNKILLABLE won't prevent it from being
      delivered.
      
      So using force_sig rather than send_sig for SIGKILL is confusing
      and pointless.
      
      Because it won't impact the sending of the signal and because
      using force_sig is wrong, replace force_sig with send_sig.
      
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Fixes: cf3f8921 ("pidns: add reboot_pid_ns() to handle the reboot syscall")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  23. May 21, 2019
  24. Sep 16, 2018
    • signal: Use group_send_sig_info to kill all processes in a pid namespace · 82058d66
      Eric W. Biederman authored
      
      Replace send_sig_info in zap_pid_ns_processes with
      group_send_sig_info.  This makes more sense as the entire process
      group is being killed.  More importantly this allows the kill of those
      processes with PIDTYPE_MAX to indicate all of the processes in the pid
      namespace are being signaled.  This is needed for fork to detect when
      signals are sent to a group of processes.
      
      Admittedly fork has another case to catch SIGKILL but the principle remains
      that it is desirable to know when a group of processes is being signaled.
      
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  25. Sep 11, 2018
  26. Apr 02, 2018
  27. Mar 21, 2018
  28. Nov 17, 2017
  29. Jul 20, 2017
    • userns,pidns: Verify the userns for new pid namespaces · a2b42626
      Eric W. Biederman authored
      It is pointless and confusing to allow a pid namespace hierarchy and
      the user namespace hierarchy to get out of sync.  The owner of a child
      pid namespace should be the owner of the parent pid namespace or
      a descendant of the owner of the parent pid namespace.
      
      Otherwise it is possible to construct scenarios where a process has a
      capability over a parent pid namespace but does not have the
      capability over a child pid namespace.  Which confusingly makes
      permission checks non-transitive.
      
      It requires use of setns into a pid namespace (but not into a user
      namespace) to create such a scenario.
      
      Add the function in_userns to help in making this determination.
      
      v2: Optimized in_userns by using level as suggested
          by: Kirill Tkhai <ktkhai@virtuozzo.com>
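
The level-based check mentioned in the v2 note can be sketched as follows.
This is a userspace model with hypothetical names (struct model_userns,
model_in_userns()), assuming each namespace records its parent and its depth
in the hierarchy; it is not a copy of the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a user namespace in the hierarchy. */
struct model_userns {
	struct model_userns *parent;	/* NULL for the root */
	int level;			/* 0 for the root, parent->level + 1 otherwise */
};

/*
 * True if child is ancestor itself or one of its descendants.  Using the
 * level lets the walk stop as soon as it reaches the ancestor's depth
 * instead of always climbing to the root.
 */
static bool model_in_userns(const struct model_userns *ancestor,
			    const struct model_userns *child)
{
	const struct model_userns *ns = child;

	assert(ancestor != NULL && child != NULL);
	while (ns->level > ancestor->level)
		ns = ns->parent;
	return ns == ancestor;
}
```

A sibling at the same depth stops the walk immediately and compares unequal,
so the check stays cheap for unrelated namespaces.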
      
      Ref: 49f4d8b9 ("pidns: Capture the user namespace and filter ns_last_pid")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  30. May 13, 2017
  31. May 08, 2017
    • pidns: expose task pid_ns_for_children to userspace · eaa0d190
      Kirill Tkhai authored
      pid_ns_for_children set by a task is known only to the task itself, and
      it's impossible to identify it from outside.
      
      It's a big problem for checkpoint/restore software like CRIU, because it
      can't correctly handle tasks that do setns(CLONE_NEWPID) in the process
      of their work.
      
      This patch solves the problem, and it exposes pid_ns_for_children to ns
      directory in standard way with the name "pid_for_children":
      
        ~# ls /proc/5531/ns -l | grep pid
        lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
        lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]
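
A checkpoint tool could compare the two links programmatically.  The sketch
below is one possible way to do that, assuming /proc is mounted and the
caller may read the target's ns directory; read_ns_link() and
children_get_new_pidns() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read /proc/<pid>/ns/<name> into buf; returns the length or -1 on error. */
static ssize_t read_ns_link(pid_t pid, const char *name, char *buf, size_t len)
{
	char path[64];
	ssize_t r;
	int n;

	assert(buf != NULL && len > 1);
	n = snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	r = readlink(path, buf, len - 1);
	if (r >= 0)
		buf[r] = '\0';	/* readlink() does not NUL-terminate */
	return r;
}

/*
 * Returns 1 if children of pid would be created in a different pid
 * namespace than pid itself, 0 if in the same one, -1 on error.
 */
static int children_get_new_pidns(pid_t pid)
{
	char cur[64], child[64];

	if (read_ns_link(pid, "pid", cur, sizeof(cur)) < 0 ||
	    read_ns_link(pid, "pid_for_children", child, sizeof(child)) < 0)
		return -1;
	return strcmp(cur, child) != 0;
}
```

For the task shown in the listing above, the two link targets differ, so
this helper would report 1.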
      
      Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Andrei Vagin <avagin@virtuozzo.com>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  32. Mar 02, 2017
  33. Jan 09, 2017
    • pid: fix lockdep deadlock warning due to ucount_lock · add7c65c
      Andrei Vagin authored
      =========================================================
      [ INFO: possible irq lock inversion dependency detected ]
      4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G        W
      ---------------------------------------------------------
      swapper/1/0 just changed the state of lock:
       (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffffbd0a1bc6>] __lock_task_sighand+0xb6/0x2c0
      but this lock took another, HARDIRQ-unsafe lock in the past:
       (ucounts_lock){+.+...}
      and interrupts could create inverse lock ordering between them.
      other info that might help us debug this:
      Chain exists of:                 &(&sighand->siglock)->rlock --> &(&tty->ctrl_lock)->rlock --> ucounts_lock
       Possible interrupt unsafe locking scenario:
             CPU0                    CPU1
             ----                    ----
        lock(ucounts_lock);
                                     local_irq_disable();
                                     lock(&(&sighand->siglock)->rlock);
                                     lock(&(&tty->ctrl_lock)->rlock);
        <Interrupt>
          lock(&(&sighand->siglock)->rlock);
      
       *** DEADLOCK ***
      
      This patch removes a dependency between rlock and ucount_lock.
      
      Fixes: f333c700 ("pidns: Add a limit on the number of pid namespaces")
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrei Vagin <avagin@openvz.org>
      Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  34. Sep 22, 2016