  1. Nov 01, 2023
  2. Oct 27, 2023
  3. Oct 20, 2023
    • crypto: adiantum - add fast path for single-page messages · dadf5e56
      Eric Biggers authored
      
      When the source scatterlist is a single page, optimize the first hash
      step of adiantum to use crypto_shash_digest() instead of
      init/update/final, and use the same local kmap for both hashing the bulk
      part and loading the narrow part of the source data.
      
      Likewise, when the destination scatterlist is a single page, optimize
      the second hash step of adiantum to use crypto_shash_digest() instead of
      init/update/final, and use the same local kmap for both hashing the bulk
      part and storing the narrow part of the destination data.
      
      In some cases these optimizations improve performance significantly.
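
       The shape of the single-page fast path, as a rough sketch (illustrative
       only; hash_single_page() and its arguments are made-up names, not the
       actual adiantum.c code):

           static int hash_single_page(struct crypto_shash *tfm, struct page *page,
                                       unsigned int offset, unsigned int len, u8 *out)
           {
                   SHASH_DESC_ON_STACK(desc, tfm);
                   /* One local kmap covers all accesses to the page. */
                   void *virt = kmap_local_page(page);
                   int err;

                   desc->tfm = tfm;
                   /* A single digest call instead of init/update/final. */
                   err = crypto_shash_digest(desc, virt + offset, len, out);
                   kunmap_local(virt);
                   return err;
           }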
      
      Note: ideally, for optimal performance each architecture should
      implement the full "adiantum(xchacha12,aes)" algorithm and fully
      optimize the contiguous buffer case to use no indirect calls.  That's
      not something I've gotten around to doing, though.  This commit just
      makes a relatively small change that provides some benefit with the
      existing template-based approach.
      
       Signed-off-by: Eric Biggers <ebiggers@google.com>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      dadf5e56
  4. Oct 13, 2023
  5. Feb 13, 2023
  6. Jan 02, 2021
  7. Aug 07, 2020
    • mm, treewide: rename kzfree() to kfree_sensitive() · 453431a5
      Waiman Long authored
      As said by Linus:
      
        A symmetric naming is only helpful if it implies symmetries in use.
        Otherwise it's actively misleading.
      
        In "kzalloc()", the z is meaningful and an important part of what the
        caller wants.
      
        In "kzfree()", the z is actively detrimental, because maybe in the
        future we really _might_ want to use that "memfill(0xdeadbeef)" or
        something. The "zero" part of the interface isn't even _relevant_.
      
      The main reason that kzfree() exists is to clear sensitive information
      that should not be leaked to other future users of the same memory
      objects.
      
      Rename kzfree() to kfree_sensitive() to follow the example of the recently
      added kvfree_sensitive() and make the intention of the API more explicit.
      In addition, memzero_explicit() is used to clear the memory to make sure
      that it won't get optimized away by the compiler.
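
       In outline, the renamed helper clears the whole allocation before
       freeing it (a sketch of the idea, not necessarily the exact mm code):

           void kfree_sensitive(const void *p)
           {
                   size_t ks = ksize(p);      /* actual size of the allocation */

                   if (ks)
                           memzero_explicit((void *)p, ks); /* not optimized away */
                   kfree(p);
           }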
      
      The renaming is done by using the command sequence:
      
        git grep -w --name-only kzfree |\
        xargs sed -i 's/kzfree/kfree_sen...
      453431a5
  8. Jul 16, 2020
    • crypto: algapi - use common mechanism for inheriting flags · 7bcb2c99
      Eric Biggers authored
      The flag CRYPTO_ALG_ASYNC is "inherited" in the sense that when a
      template is instantiated, the template will have CRYPTO_ALG_ASYNC set if
      any of the algorithms it uses has CRYPTO_ALG_ASYNC set.
      
      We'd like to add a second flag (CRYPTO_ALG_ALLOCATES_MEMORY) that gets
      "inherited" in the same way.  This is difficult because the handling of
      CRYPTO_ALG_ASYNC is hardcoded everywhere.  Address this by:
      
        - Add CRYPTO_ALG_INHERITED_FLAGS, which contains the set of flags that
          have these inheritance semantics.
      
        - Add crypto_algt_inherited_mask(), for use by template ->create()
          methods.  It returns any of these flags that the user asked to be
          unset and thus must be passed in the 'mask' to crypto_grab_*().
      
        - Also modify crypto_check_attr_type() to handle computing the 'mask'
          so that most templates can just use this.
      
        - Make crypto_grab_*() propagate these flags to the template instance
          being created so that templates don't have to do this themselves.
      
      Make crypto/simd.c propagate these flags too, since it "wraps" another
      algorithm, similar to a template.
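
       A sketch of what a simple template's ->create() looks like with this
       mechanism (illustrative and heavily trimmed; example_create() and its
       error handling are not taken from any real template):

           static int example_create(struct crypto_template *tmpl, struct rtattr **tb)
           {
                   struct skcipher_instance *inst;
                   u32 mask;
                   int err;

                   /* Computes 'mask' from the user-requested type/mask,
                    * covering the CRYPTO_ALG_INHERITED_FLAGS handling. */
                   err = crypto_check_attr_type(tb, CRYPTO_ALG_TYPE_SKCIPHER, &mask);
                   if (err)
                           return err;

                   inst = kzalloc(sizeof(*inst) + sizeof(struct crypto_skcipher_spawn),
                                  GFP_KERNEL);
                   if (!inst)
                           return -ENOMEM;

                   /* crypto_grab_skcipher() now propagates the inherited flags
                    * of the underlying algorithm to the instance being created. */
                   err = crypto_grab_skcipher(skcipher_instance_ctx(inst),
                                              skcipher_crypto_instance(inst),
                                              crypto_attr_alg_name(tb[1]), 0, mask);
                   if (err)
                           kfree(inst);
                   return err;
           }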
      
       Based on a patch by Mikulas Patocka <mpatocka@redhat.com>
       (https://lore.kernel.org/r/alpine.LRH.2.02.2006301414580.30526@file01.intranet.prod.int.rdu2.redhat.com).
      
       Signed-off-by: Eric Biggers <ebiggers@google.com>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      7bcb2c99
  9. Jan 16, 2020
    • crypto: poly1305 - add new 32 and 64-bit generic versions · 1c08a104
      Jason A. Donenfeld authored
      
      These two C implementations from Zinc -- a 32x32 one and a 64x64 one,
      depending on the platform -- come from Andrew Moon's public domain
      poly1305-donna portable code, modified for usage in the kernel. The
      precomputation in the 32-bit version and the use of 64x64 multiplies in
      the 64-bit version make these perform better than the code it replaces.
      Moon's code is also very widespread and has received many eyeballs of
      scrutiny.
      
       There's a bit of interference with the x86 implementation, which
       relies on internal details of the old scalar implementation. In the next
      commit, the x86 implementation will be replaced with a faster one that
      doesn't rely on this, so none of this matters much. But for now, to keep
      this passing the tests, we inline the bits of the old implementation
      that the x86 implementation relied on. Also, since we now support a
      slightly larger key space, via the union, some offsets had to be fixed
      up.
      
      Nonce calculation was folded in with the emit function, to take
      advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no
      nonce handling in emit, so this path was conditionalized. We also
      introduced a new struct, poly1305_core_key, to represent the precise
      amount of space that particular implementation uses.
      
      Testing with kbench9000, depending on the CPU, the update function for
      the 32x32 version has been improved by 4%-7%, and for the 64x64 by
      19%-30%. The 32x32 gains are small, but I think there's great value in
      having a parallel implementation to the 64x64 one so that the two can be
      compared side-by-side as nice stand-alone units.
      
       Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      1c08a104
  10. Jan 08, 2020
  11. Dec 09, 2019
  12. Nov 16, 2019
  13. Apr 25, 2019
    • crypto: shash - remove shash_desc::flags · 877b5691
      Eric Biggers authored
      
      The flags field in 'struct shash_desc' never actually does anything.
      The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
      However, no shash algorithm ever sleeps, making this flag a no-op.
      
      With this being the case, inevitably some users who can't sleep wrongly
      pass MAY_SLEEP.  These would all need to be fixed if any shash algorithm
      actually started sleeping.  For example, the shash_ahash_*() functions,
      which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
      from the ahash API to the shash API.  However, the shash functions are
      called under kmap_atomic(), so actually they're assumed to never sleep.
      
      Even if it turns out that some users do need preemption points while
      hashing large buffers, we could easily provide a helper function
      crypto_shash_update_large() which divides the data into smaller chunks
      and calls crypto_shash_update() and cond_resched() for each chunk.  It's
      not necessary to have a flag in 'struct shash_desc', nor is it necessary
      to make individual shash algorithms aware of this at all.
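
       For reference, such a helper would be only a few lines (hypothetical,
       as the paragraph above says; it does not exist in the kernel):

           static int crypto_shash_update_large(struct shash_desc *desc,
                                                const u8 *data, unsigned int len)
           {
                   unsigned int chunk = 4096;   /* arbitrary chunk size */
                   int err;

                   while (len) {
                           unsigned int n = min(len, chunk);

                           err = crypto_shash_update(desc, data, n);
                           if (err)
                                   return err;
                           data += n;
                           len -= n;
                           cond_resched();      /* preemption point between chunks */
                   }
                   return 0;
           }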
      
      Therefore, remove shash_desc::flags, and document that the
      crypto_shash_*() functions can be called from any context.
      
       Signed-off-by: Eric Biggers <ebiggers@google.com>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      877b5691
  14. Apr 18, 2019
    • crypto: run initcalls for generic implementations earlier · c4741b23
      Eric Biggers authored
      
      Use subsys_initcall for registration of all templates and generic
      algorithm implementations, rather than module_init.  Then change
      cryptomgr to use arch_initcall, to place it before the subsys_initcalls.
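
       Concretely, the registration of a generic implementation changes along
       these lines (a sketch; "xyz" stands in for the real algorithm names):

           /* xyz_generic_alg is the algorithm's crypto_alg definition (not
            * shown).  Registration now happens at subsys_initcall time
            * instead of module_init time: */
           static int __init xyz_generic_mod_init(void)
           {
                   return crypto_register_alg(&xyz_generic_alg);
           }

           /* was: module_init(xyz_generic_mod_init); */
           subsys_initcall(xyz_generic_mod_init);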
      
      This is needed so that when both a generic and optimized implementation
      of an algorithm are built into the kernel (not loadable modules), the
      generic implementation is registered before the optimized one.
      Otherwise, the self-tests for the optimized implementation are unable to
      allocate the generic implementation for the new comparison fuzz tests.
      
      Note that on arm, a side effect of this change is that self-tests for
      generic implementations may run before the unaligned access handler has
      been installed.  So, unaligned accesses will crash the kernel.  This is
      arguably a good thing as it makes it easier to detect that type of bug.
      
       Signed-off-by: Eric Biggers <ebiggers@google.com>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      c4741b23
  15. Jan 10, 2019
    • crypto: adiantum - initialize crypto_spawn::inst · 6db43410
      Eric Biggers authored
      crypto_grab_*() doesn't set crypto_spawn::inst, so templates must set it
      beforehand.  Otherwise it will be left NULL, which causes a crash in
      certain cases where algorithms are dynamically loaded/unloaded.  E.g.
      with CONFIG_CRYPTO_CHACHA20_X86_64=m, the following caused a crash:
      
          insmod chacha-x86_64.ko
          python -c 'import socket; socket.socket(socket.AF_ALG, 5, 0).bind(("skcipher", "adiantum(xchacha12,aes)"))'
          rmmod chacha-x86_64.ko
          python -c 'import socket; socket.socket(socket.AF_ALG, 5, 0).bind(("skcipher", "adiantum(xchacha12,aes)"))'
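
       The rule itself is tiny; roughly (an illustrative fragment, not the
       exact adiantum.c hunk):

           /* Illustrative: what a template must do before crypto_grab_*(),
            * since crypto_grab_*() does not set crypto_spawn::inst itself. */
           static int example_grab(struct crypto_instance *inst,
                                   struct crypto_spawn *spawn, const char *name)
           {
                   spawn->inst = inst;          /* the missing initialization */
                   return crypto_grab_spawn(spawn, name, CRYPTO_ALG_TYPE_CIPHER,
                                            CRYPTO_ALG_TYPE_MASK);
           }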
      
       Fixes: 059c2a4d ("crypto: adiantum - add Adiantum support")
       Signed-off-by: Eric Biggers <ebiggers@google.com>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      6db43410
  16. Dec 13, 2018
  17. Nov 20, 2018
    • crypto: adiantum - add Adiantum support · 059c2a4d
      Eric Biggers authored
      Add support for the Adiantum encryption mode.  Adiantum was designed by
      Paul Crowley and is specified by our paper:
      
           Adiantum: length-preserving encryption for entry-level processors
           (https://eprint.iacr.org/2018/720.pdf)
      
      See our paper for full details; this patch only provides an overview.
      
      Adiantum is a tweakable, length-preserving encryption mode designed for
      fast and secure disk encryption, especially on CPUs without dedicated
      crypto instructions.  Adiantum encrypts each sector using the XChaCha12
      stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash
      function, and an invocation of the AES-256 block cipher on a single
      16-byte block.  On CPUs without AES instructions, Adiantum is much
      faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors
      Adiantum encryption is about 4 times faster than AES-256-XTS encryption,
      and decryption about 5 times faster.
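
       In rough form (simplified from the paper, which has the exact
       definitions), encrypting a plaintext split into PL (the bulk) and PR
       (the final 16 bytes) proceeds as follows, and the ciphertext is
       CL || CR:

           PM = PR  +  H(T, PL)         (first hash pass; addition mod 2^128)
           CM = AES-256-Encrypt(PM)     (the single 16-byte block cipher call)
           CL = PL XOR XChaCha12(CM)    (stream cipher; nonce derived from CM)
           CR = CM  -  H(T, CL)         (second hash pass; subtraction mod 2^128)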
      
      Adiantum is a specialization of the more general HBSH construction.  Our
       earlier proposal, HPolyC, was also an HBSH specialization, but it used a
      different εA∆U hash function, one based on Poly1305 only.  Adiantum's
      εA∆U hash function, which is based primarily on the "NH" hash function
      like that used in UMAC (RFC4418), is about twice as fast as HPolyC's;
      consequently, Adiantum is about 20% faster than HPolyC.
      
      This speed comes with no loss of security: Adiantum is provably just as
      secure as HPolyC, in fact slightly *more* secure.  Like HPolyC,
      Adiantum's security is reducible to that of XChaCha12 and AES-256,
      subject to a security bound.  XChaCha12 itself has a security reduction
      to ChaCha12.  Therefore, one need not "trust" Adiantum; one need only
       trust ChaCha12 and AES-256.  Note that the εA∆U hash function is only
       used for its proven combinatorial properties, so it cannot be "broken".
      
      Adiantum is also a true wide-block encryption mode, so flipping any
      plaintext bit in the sector scrambles the entire ciphertext, and vice
      versa.  No other such mode is available in the kernel currently; doing
      the same with XTS scrambles only 16 bytes.  Adiantum also supports
      arbitrary-length tweaks and naturally supports any length input >= 16
      bytes without needing "ciphertext stealing".
      
      For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in
      order to make encryption feasible on the widest range of devices.
      Although the 20-round variant is quite popular, the best known attacks
      on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial
      security margin; in fact, larger than AES-256's.  12-round Salsa20 is
      also the eSTREAM recommendation.  For the block cipher, Adiantum uses
      AES-256, despite it having a lower security margin than XChaCha12 and
      needing table lookups, due to AES's extensive adoption and analysis
      making it the obvious first choice.  Nevertheless, for flexibility this
      patch also permits the "adiantum" template to be instantiated with
      XChaCha20 and/or with an alternate block cipher.
      
      We need Adiantum support in the kernel for use in dm-crypt and fscrypt,
      where currently the only other suitable options are block cipher modes
      such as AES-XTS.  A big problem with this is that many low-end mobile
      devices (e.g. Android Go phones sold primarily in developing countries,
      as well as some smartwatches) still have CPUs that lack AES
      instructions, e.g. ARM Cortex-A7.  Sadly, AES-XTS encryption is much too
      slow to be viable on these devices.  We did find that some "lightweight"
      block ciphers are fast enough, but these suffer from problems such as
      not having much cryptanalysis or being too controversial.
      
      The ChaCha stream cipher has excellent performance but is insecure to
      use directly for disk encryption, since each sector's IV is reused each
      time it is overwritten.  Even restricting the threat model to offline
      attacks only isn't enough, since modern flash storage devices don't
      guarantee that "overwrites" are really overwrites, due to wear-leveling.
      Adiantum avoids this problem by constructing a
      "tweakable super-pseudorandom permutation"; this is the strongest
      possible security model for length-preserving encryption.
      
      Of course, storing random nonces along with the ciphertext would be the
      ideal solution.  But doing that with existing hardware and filesystems
      runs into major practical problems; in most cases it would require data
      journaling (like dm-integrity) which severely degrades performance.
      Thus, for now length-preserving encryption is still needed.
      
       Signed-off-by: Eric Biggers <ebiggers@google.com>
       Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
       Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      059c2a4d