  1. Jul 19, 2024
    • random: introduce generic vDSO getrandom() implementation · 4ad10a5f
      Jason A. Donenfeld authored
      
      Provide a generic C vDSO getrandom() implementation, which operates on
      an opaque state returned by vgetrandom_alloc() and produces random bytes
      the same way as getrandom(). This has the following API signature:
      
        ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags,
                           void *opaque_state, size_t opaque_len);
      
      The return value and the first three arguments are the same as ordinary
      getrandom(), while the last two arguments are a pointer to the opaque
      allocated state and its size. Were all five arguments passed to the
      getrandom() syscall, nothing different would happen, and the functions
      would have the exact same behavior.
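A minimal sketch of how a caller might treat the two entry points interchangeably. The function-pointer type mirrors the signature quoted above. `sketch_getrandom` is a hypothetical name, resolving the vDSO symbol is left to libc and not shown, and the fallback is the ordinary getrandom(2) wrapper from <sys/random.h>.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/* Function-pointer type matching the vgetrandom() signature quoted above. */
typedef ssize_t (*vgetrandom_fn)(void *buffer, size_t len, unsigned int flags,
				 void *opaque_state, size_t opaque_len);

/*
 * Hypothetical libc-side wrapper: use the vDSO entry point when it has been
 * resolved and this thread owns a state; otherwise fall back to the syscall.
 * Both paths take the same first three arguments and return the same value.
 */
static ssize_t sketch_getrandom(vgetrandom_fn vdso_fn, void *state,
				size_t state_len, void *buf, size_t len,
				unsigned int flags)
{
	if (vdso_fn && state)
		return vdso_fn(buf, len, flags, state, state_len);
	return getrandom(buf, len, flags);
}
```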
      
      The vDSO RNG algorithm is the same one implemented by
      drivers/char/random.c, using the same fast-erasure techniques. Should
      the in-kernel implementation change, so too will the vDSO one.
      
      It requires an implementation of ChaCha20 that does not use any stack,
      in order to maintain forward secrecy if a multi-threaded program forks
      (though this does not account for a similar issue with SA_SIGINFO
      copying registers to the stack), so this is left as an
      architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm
      to implement on a variety of architectures, so this shouldn't be too
      onerous.
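The fast-erasure idea can be sketched in portable C as follows. This is an illustration under stated assumptions, not the kernel's code: it uses the RFC 8439 ChaCha20 block layout with a zero counter and nonce, it keeps a 64-byte block on the stack (so it is not the stack-less, architecture-specific routine the vDSO requires), and the names `chacha20_block` and `fast_key_erasure` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
/* ChaCha quarter round on four 32-bit words. */
#define QR(a, b, c, d) (                   \
	a += b, d ^= a, d = ROTL32(d, 16), \
	c += d, b ^= c, b = ROTL32(b, 12), \
	a += b, d ^= a, d = ROTL32(d, 8),  \
	c += d, b ^= c, b = ROTL32(b, 7))

/* ChaCha20 block function (RFC 8439 layout: 32-bit counter, 96-bit nonce). */
static void chacha20_block(const uint32_t key[8], uint32_t counter,
			   const uint32_t nonce[3], uint8_t out[64])
{
	uint32_t in[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
	uint32_t x[16];
	int i;

	memcpy(&in[4], key, 32);
	in[12] = counter;
	memcpy(&in[13], nonce, 12);
	memcpy(x, in, sizeof(x));
	for (i = 0; i < 10; i++) {	/* 20 rounds = 10 double rounds */
		QR(x[0], x[4], x[8], x[12]);	/* column rounds */
		QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);	/* diagonal rounds */
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);
		QR(x[3], x[4], x[9], x[14]);
	}
	for (i = 0; i < 16; i++) {	/* add input, serialize little-endian */
		uint32_t v = x[i] + in[i];
		out[4 * i + 0] = (uint8_t)v;
		out[4 * i + 1] = (uint8_t)(v >> 8);
		out[4 * i + 2] = (uint8_t)(v >> 16);
		out[4 * i + 3] = (uint8_t)(v >> 24);
	}
}

/*
 * Fast key erasure: produce one block, immediately replace the key with its
 * first 32 bytes, hand out the other 32 bytes, then wipe the block. A later
 * compromise of `key` reveals nothing about bytes already handed out.
 */
static void fast_key_erasure(uint32_t key[8], uint8_t out[32])
{
	static const uint32_t zero_nonce[3] = { 0, 0, 0 };
	uint8_t block[64];

	chacha20_block(key, 0, zero_nonce, block);
	memcpy(key, block, 32);		/* new key overwrites the old one */
	memcpy(out, block + 32, 32);	/* remainder goes to the caller */
	/* Plain memset may be optimized out; real code needs an explicit wipe. */
	memset(block, 0, sizeof(block));
}
```

Because the key is overwritten before any output leaves the function, each call ratchets forward; the stack copy in `block` is exactly the residue that a stack-less implementation avoids.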
      
      Initially, the state is keyless, and so the first call makes a
      getrandom() syscall to generate that key, and then uses it for
      subsequent calls. By keeping track of a generation counter, it knows
      when its key is invalidated and it should fetch a new one using the
      syscall. Later, more than just a generation counter might be used.
      
      Since MADV_WIPEONFORK is set on the opaque state, the key and related
      state are wiped during a fork(), so secrets don't roll over into new
      processes, and the same state doesn't accidentally generate the same
      random stream. The generation counter is also always >0, so a 0
      counter is a useful indication of a fork() or otherwise uninitialized
      state.
      
      If the kernel RNG is not yet initialized, then the vDSO always calls the
      syscall, because that behavior cannot be emulated in userspace;
      fortunately, that state is short-lived and occurs only during early
      boot. Once the RNG has been initialized, there is no need to inspect
      the `flags` argument, because the behavior does not change
      post-initialization regardless of the `flags` value.
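The key-lifecycle rules in the last three paragraphs can be condensed into one decision function. This is a hypothetical model: `toy_vgetrandom_state`, `toy_rng_info`, and `toy_choose_path` are illustrative names, the real opaque layout is private to the kernel, and the real getrandom(2) call stands in for the key-fetching syscall.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>

/* Hypothetical per-thread state; the real layout is opaque. */
struct toy_vgetrandom_state {
	uint64_t generation;	/* 0 = never keyed, or wiped by fork() */
	uint8_t key[32];
};

/* Hypothetical stand-in for the data the kernel exposes to the vDSO. */
struct toy_rng_info {
	bool is_ready;		/* has the kernel RNG been initialized? */
	uint64_t generation;	/* changes when the kernel rekeys; always > 0 */
};

enum toy_path { TOY_SYSCALL, TOY_REKEY, TOY_FAST };

/* The ordinary getrandom(2) call stands in for the key-fetching syscall. */
static void toy_fetch_key(uint8_t key[32])
{
	if (getrandom(key, 32, 0) != 32)
		abort();
}

/*
 * Decide how one call would be served:
 *  - kernel RNG not ready: always use the syscall, which honors `flags`;
 *  - stale or zero generation: fetch a fresh key via the syscall, then
 *    continue in userspace;
 *  - otherwise: generate in userspace without looking at `flags`.
 */
static enum toy_path toy_choose_path(const struct toy_rng_info *info,
				     struct toy_vgetrandom_state *state)
{
	uint64_t current;

	if (!info->is_ready)
		return TOY_SYSCALL;
	current = info->generation;	/* snapshot before rekeying */
	if (state->generation != current) {
		state->generation = current;
		toy_fetch_key(state->key);
		return TOY_REKEY;
	}
	return TOY_FAST;
}
```

Snapshotting the generation before fetching the key errs on the safe side: if the kernel rekeys in between, the stored generation is already stale and the next call fetches again.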
      
      Since the opaque state passed to it is mutated, vDSO getrandom() is not
      reentrant when used with the same opaque state; libc should be mindful
      of this.
      
      The function works over an opaque per-thread state of a particular size,
      which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and
      VM_DROPPABLE for proper operation. Over time, the nuances of these
      allocations may change or grow or even differ based on architectural
      features.
      
      The opaque state passed to vDSO getrandom() must be allocated using the
      mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params
      struct, which also contains the size of each state. That struct can be
      obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then,
      libc can call mmap(2) and slice up the returned array into one state
      per thread, while ensuring that no single state straddles a page
      boundary. Libc is expected to allocate a chunk of these on first use,
      and then dole them out to threads as they're created, allocating more
      when needed.
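The slicing rule can be sketched as pure offset arithmetic: pack as many whole states as fit into each page and start the next state on a fresh page. The helper names are hypothetical; `state_size` would come from the vgetrandom_opaque_params struct, and the region would then be mapped with that struct's mmap_prot and mmap_flags.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte offset of the i-th per-thread state inside an
 * mmap'd region, packing whole states into each page so that no state
 * straddles a page boundary. Assumes 0 < state_size <= page_size.
 */
static size_t state_offset(size_t i, size_t state_size, size_t page_size)
{
	size_t per_page = page_size / state_size;

	return (i / per_page) * page_size + (i % per_page) * state_size;
}

/* Bytes to request from mmap(2) for n states under the same packing. */
static size_t region_size(size_t n, size_t state_size, size_t page_size)
{
	size_t per_page = page_size / state_size;

	return ((n + per_page - 1) / per_page) * page_size;
}
```

With 200-byte states and 4096-byte pages, 20 states fit per page (4000 bytes), so state 20 starts at offset 4096 rather than at 4000, where it would cross into the next page.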
      
      vDSO getrandom() provides the ability for userspace to generate random
      bytes quickly and safely, and is intended to be integrated into libc's
      thread management. As an illustrative example, the introduced code in
      the vdso_test_getrandom self test later in this series might be used to
      do the same outside of libc. In a libc, the various pthread-isms are
      expected to be elided into libc internals.
      
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • sbitmap: fix io hung due to race on sbitmap_word::cleared · 72d04bdc
      Yang Yang authored

      Configuration for sbq:
        depth=64, wake_batch=6, shift=6, map_nr=1
      
      1. There are 64 requests in progress:
        map->word = 0xFFFFFFFFFFFFFFFF
      2. After all the 64 requests complete, and no more requests come:
        map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF
      3. Now two tasks try to allocate requests:
        T1:                                       T2:
        __blk_mq_get_tag                          .
        __sbitmap_queue_get                       .
        sbitmap_get                               .
        sbitmap_find_bit                          .
        sbitmap_find_bit_in_word                  .
        __sbitmap_get_word  -> nr=-1              __blk_mq_get_tag
        sbitmap_deferred_clear                    __sbitmap_queue_get
        /* map->cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
          if (!READ_ONCE(map->cleared))           sbitmap_find_bit_in_word
            return false;                         __sbitmap_get_word -> nr=-1
          mask = xchg(&map->cleared, 0)           sbitmap_deferred_clear
          atomic_long_andnot()                    /* map->cleared=0 */
                                                    if (!(map->cleared))
                                                      return false;
                                           /*
                                            * map->cleared is cleared by T1
                                             * T2 fails to acquire the tag
                                            */
      
      4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
      up because wake_batch is set to 6. If no more requests come, T2 will
      wait here indefinitely.
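Why a single put is not enough can be seen with a deliberately simplified model of batched wakeups, in which a waiter is woken only once `wake_batch` frees have accumulated. This is a hypothetical toy (`toy_wait`, `toy_put_tag`), not the sbitmap wakeup code.

```c
#include <stdbool.h>

/* Hypothetical model: wake the waiter only after wake_batch frees. */
struct toy_wait {
	unsigned int wake_batch;
	unsigned int frees_since_wake;
	bool waiter_woken;
};

/* Called once per freed tag. */
static void toy_put_tag(struct toy_wait *w)
{
	if (++w->frees_since_wake >= w->wake_batch) {
		w->frees_since_wake = 0;
		w->waiter_woken = true;
	}
}
```

With wake_batch = 6, T1's single put leaves the counter at 1, so T2 stays asleep until five more frees arrive, which never happens if no further requests come.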
      
      This patch achieves two purposes:
      1. The check on ->cleared and the updates to both ->cleared and ->word
      need to be done atomically; using a spinlock could be the simplest
      solution.
      2. Add an extra check in sbitmap_deferred_clear() to identify whether
      ->word has free bits.
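The two changes can be modeled in userspace as below. This is a simplified sketch, not the patch itself: a pthread mutex stands in for the kernel spinlock, plain loads and stores replace xchg()/atomic_long_andnot() because everything runs under the lock, and any allocation-hint handling in the real code is omitted. `toy_sbitmap_word` and `toy_deferred_clear` are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of struct sbitmap_word: a set bit in `word` means
 * allocated; a set bit in `cleared` means freed but not yet folded back.
 */
struct toy_sbitmap_word {
	uint64_t word;
	uint64_t cleared;
	pthread_mutex_t swap_lock;
};

/*
 * Returns true if the caller should retry allocating from this word.
 * The ->cleared check and the ->cleared/->word update happen under one
 * lock (fix 1); with nothing pending in ->cleared, report whether ->word
 * still has free bits (fix 2). depth must be in 1..64.
 */
static bool toy_deferred_clear(struct toy_sbitmap_word *map, unsigned int depth)
{
	uint64_t word_mask = ~UINT64_C(0) >> (64 - depth);
	bool retry;

	pthread_mutex_lock(&map->swap_lock);
	if (!map->cleared) {
		retry = (map->word & word_mask) != word_mask;
	} else {
		map->word &= ~map->cleared;	/* return freed bits to ->word */
		map->cleared = 0;
		retry = true;
	}
	pthread_mutex_unlock(&map->swap_lock);
	return retry;
}
```

Replaying the scenario: T1's call folds ->cleared into ->word and returns true; T2's call then finds ->cleared empty but still returns true because ->word now has free bits, so T2 retries instead of going to sleep.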
      
      Fixes: ea86ea2c ("sbitmap: ammortize cost of clearing bits")
      Signed-off-by: Yang Yang <yang.yang@vivo.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Jul 14, 2024
    • fortify: fix warnings in fortify tests with KASAN · 84679f04
      Masahiro Yamada authored
      When a software KASAN mode is enabled, the fortify tests emit warnings
      on some architectures.
      
      For example, for ARCH=arm, the combination of CONFIG_FORTIFY_SOURCE=y
      and CONFIG_KASAN=y produces the following warnings:
      
          TEST    lib/test_fortify/read_overflow-memchr.log
        warning: unsafe memchr() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memchr.c
          TEST    lib/test_fortify/read_overflow-memchr_inv.log
        warning: unsafe memchr_inv() usage lacked '__read_overflow' symbol in lib/test_fortify/read_overflow-memchr_inv.c
          TEST    lib/test_fortify/read_overflow-memcmp.log
        warning: unsafe memcmp() usage lacked '__read_overflow' warning in lib/test_fortify/read_overflow-memcmp.c
          TEST    lib/test_fortify/read_overflow-memscan.log
        warning: unsafe memscan() usage lacked '__read_overflow' symbol in lib/test_fortify/read_overflow-memscan.c
          TEST    lib/test_fortify/read_overflow2-memcmp.log
        warning: unsafe memcmp() usage lacked '__read_overflow2' warning in lib/test_fortify/read_overflow2-memcmp.c
           [ more and more similar warnings... ]
      
      Commit 9c2d1328 ("kbuild: provide reasonable defaults for tool
      coverage") removed KASAN flags from non-kernel objects by default.
      This was intended, because lib/test_fortify/*.c are unit tests that
      are not linked into the kernel.
      
      As it turns out, some architectures require -fsanitize=kernel-(hw)address
      to define __SANITIZE_ADDRESS__ for the fortify tests.
      
      Without __SANITIZE_ADDRESS__ defined, arch/arm/include/asm/string.h
      defines __NO_FORTIFY, thus excluding <linux/fortify-string.h>.
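A tiny model of the guard described above, reduced to a function of two booleans so the outcome can be checked at run time. It is a hypothetical simplification: `fortify_string_included` is not a kernel function, the `config_kasan` input is an assumption about the surrounding condition, and the real check in arch/arm/include/asm/string.h is evaluated by the preprocessor and may differ in detail.

```c
#include <stdbool.h>

/*
 * Hypothetical model: with KASAN configured but the compiler not defining
 * __SANITIZE_ADDRESS__ (no -fsanitize=kernel-(hw)address on the command
 * line), the arch header defines __NO_FORTIFY and the fortified string
 * helpers are excluded, so the tests never see the expected warnings.
 */
static bool fortify_string_included(bool config_kasan, bool sanitize_address)
{
	bool no_fortify = config_kasan && !sanitize_address;

	return !no_fortify;
}
```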
      
      This issue does not occur on x86 thanks to commit 4ec4190b
      ("kasan, x86: don't rename memintrinsics in uninstrumented files"),
      but there are still some architectures that define __NO_FORTIFY
      in such a situation.
      
      Set KASAN_SANITIZE=y explicitly for the fortify tests.
      
      Fixes: 9c2d1328 ("kbuild: provide reasonable defaults for tool coverage")
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Closes: https://lore.kernel.org/all/0e8dee26-41cc-41ae-9493-10cd1a8e3268@app.fastmail.com/
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  3. Jul 12, 2024
  4. Jul 11, 2024
  5. Jul 10, 2024
    • closures: fix closure_sync + closure debugging · 29f1c1ae
      Kent Overstreet authored
      
      Originally, stack closures were only used synchronously, and with the
      original implementation of closure_sync() the ref never hit 0; thus, in
      debug mode, closure_put_after_sub() assumes that if the ref hits 0, the
      closure is on the debug list.

      That's no longer true with the current implementation of
      closure_sync(), so we need a new magic value so that
      closure_debug_destroy() doesn't pop an assert.
      
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  6. Jul 06, 2024
  7. Jul 05, 2024
  8. Jul 03, 2024
  9. Jul 02, 2024
  10. Jun 28, 2024
  11. Jun 25, 2024