  1. May 31, 2024
    • crypto: Add missing MODULE_DESCRIPTION() macros · 7c699fe9
      Jeff Johnson authored
      
      Fix the 'make W=1' warnings:
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/cast_common.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/af_alg.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/algif_hash.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/algif_skcipher.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/ecc.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/curve25519-generic.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/xor.o
      WARNING: modpost: missing MODULE_DESCRIPTION() in crypto/crypto_simd.o
      
      Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  2. Feb 10, 2021
  3. Jan 07, 2021
    • crypto: xor - Fix divide error in do_xor_speed() · 3c02e04f
      Kirill Tkhai authored
      crypto: Fix divide error in do_xor_speed()
      
      From: Kirill Tkhai <ktkhai@virtuozzo.com>
      
      Latest (but not only latest) linux-next panics with divide
      error on my QEMU setup.
      
      The patch at the bottom of this message fixes the problem.
      
      xor: measuring software checksum speed
      divide error: 0000 [#1] PREEMPT SMP KASAN
      PREEMPT SMP KASAN
      CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-next-20201223+ #2177
      RIP: 0010:do_xor_speed+0xbb/0xf3
      Code: 41 ff cc 75 b5 bf 01 00 00 00 e8 3d 23 8b fe 65 8b 05 f6 49 83 7d 85 c0 75 05 e8
       84 70 81 fe b8 00 00 50 c3 31 d2 48 8d 7b 10 <f7> f5 41 89 c4 e8 58 07 a2 fe 44 89 63 10 48 8d 7b 08
       e8 cb 07 a2
      RSP: 0000:ffff888100137dc8 EFLAGS: 00010246
      RAX: 00000000c3500000 RBX: ffffffff823f0160 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000808 RDI: ffffffff823f0170
      RBP: 0000000000000000 R08: ffffffff8109c50f R09: ffffffff824bb6f7
      R10: fffffbfff04976de R11: 0000000000000001 R12: 0000000000000000
      R13: ffff888101997000 R14: f...
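The register dump is consistent with a 32-bit division by zero: in the Code: line, `b8 00 00 50 c3` loads the constant 0xc3500000 into EAX, `31 d2` clears EDX, and the faulting bytes `<f7> f5` decode to `div ebp`, while RBP is 0. Assuming 4096-byte blocks and the 800 repetitions described in c055e3ea, 1000 * 800 * 4096 = 3,276,800,000 = 0xc3500000, so the dividend looks like a "bytes * 1000" constant and the divisor like a measured time of 0 ns. The message is truncated before the patch, so the following is only a hypothetical userspace sketch of one way to guard such a calculation; the helper name and the clamp-to-1-ns policy are assumptions, not necessarily what 3c02e04f does.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert "bytes processed in elapsed_ns" into MB/s
 * (decimal megabytes, since bytes/ns == GB/s). A zero elapsed time, which a
 * coarse or non-advancing clock can report, is clamped to 1 ns so that the
 * division can never trap.
 */
static uint64_t xor_speed_mb_s(uint64_t bytes, uint64_t elapsed_ns)
{
	if (elapsed_ns == 0)
		elapsed_ns = 1;		/* avoid a divide error */

	/* bytes/ns == GB/s; multiply by 1000 to get MB/s */
	return (bytes * 1000) / elapsed_ns;
}
```

Clamping keeps the arithmetic safe but reports an inflated speed when the clock did not advance; repeating or lengthening the measurement would give a more meaningful number.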
  4. Oct 08, 2020
  5. Oct 02, 2020
    • crypto: xor - use ktime for template benchmarking · c055e3ea
      Ard Biesheuvel authored
      Currently, we use the jiffies counter as a time source, by staring at
      it until a HZ period elapses, and then staring at it again and performing
      as many XOR operations as we can at the same time until another HZ
      period elapses, so that we can calculate the throughput. This takes
      longer than necessary, and depends on HZ, which is undesirable, since
      HZ is system dependent.
      
      Let's use the ktime interface instead, and use it to time a fixed
      number of XOR operations, which can be done much faster, and makes
      the time spent depend on the performance level of the system itself,
      which is much more reasonable. To ensure that we have the resolution
      we need even on systems with 32 kHz time sources, while not spending too
      much time in the benchmark on a slow CPU, let's switch to 3 attempts of
      800 repetitions each: that way, we will only misidentify algorithms that
      perform within 10% of each other as the fastest if they are faster than
      10 GB/s to begin with, which is not expected to occur on systems with
      such coarse clocks.
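The scheme above can be sketched in userspace C. POSIX `clock_gettime(CLOCK_MONOTONIC)` stands in for the kernel's ktime interface, a plain `p1 ^= p2` loop stands in for one XOR template, and the 4096-byte block size and all names are assumptions for illustration. Under that block-size assumption, one attempt processes 800 * 4096 = 3,276,800 bytes, which takes about 328 µs at 10 GB/s; a 32 kHz clock ticks roughly every 31 µs, so a single tick is close to 10% of the measurement, which is consistent with the 10% / 10 GB/s bound stated above.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define BENCH_SIZE 4096		/* assumed block size in bytes */
#define REPS 800		/* repetitions per attempt, per the message */
#define ATTEMPTS 3		/* attempts; the fastest one wins */

/* Stand-in for one XOR template: p1 ^= p2 over 'bytes' bytes. */
static void xor_2(size_t bytes, unsigned long *p1, const unsigned long *p2)
{
	for (size_t i = 0; i < bytes / sizeof(unsigned long); i++)
		p1[i] ^= p2[i];
}

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time a fixed amount of work ATTEMPTS times; return the best MB/s. */
static uint64_t bench_xor_2(unsigned long *b1, const unsigned long *b2)
{
	uint64_t min = UINT64_MAX;

	for (int a = 0; a < ATTEMPTS; a++) {
		uint64_t t0 = now_ns();

		for (int r = 0; r < REPS; r++) {
			xor_2(BENCH_SIZE, b1, b2);
			/* GCC/Clang compiler barrier: keep every pass */
			__asm__ __volatile__("" ::: "memory");
		}
		uint64_t diff = now_ns() - t0;
		if (diff < min)
			min = diff;
	}
	if (min == 0)
		min = 1;	/* guard against a clock that did not advance */

	/* bytes/ns == GB/s, so multiply by 1000 for MB/s */
	return (uint64_t)REPS * BENCH_SIZE * 1000 / min;
}
```

Keeping the minimum of the three attempts discards runs that were slowed by interrupts or preemption, and the total time spent now scales with the speed of the CPU rather than with HZ.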
      
      On ThunderX2, I get the following results:
      
      Before:
      
        [72625.956765] xor: measuring software checksum speed
        [72625.993104]    8regs     : 10169.000 MB/sec
        [72626.033099]    32regs    : 12050.000 MB/sec
        [72626.073095]    arm64_neon: 11100.000 MB/sec
        [72626.073097] xor: using function: 32regs (12050.000 MB/sec)
      
      After:
      
        [72599.650216] xor: measuring software checksum speed
        [72599.651188]    8regs           : 10491 MB/sec
        [72599.652006]    32regs          : 12345 MB/sec
        [72599.652871]    arm64_neon      : 11402 MB/sec
        [72599.652873] xor: using function: 32regs (12345 MB/sec)
      
      Link: https://lore.kernel.org/linux-crypto/20200923182230.22715-3-ardb@kernel.org/
      
      
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: xor - defer load time benchmark to a later time · 524ccdbd
      Ard Biesheuvel authored
      
      Currently, the XOR module performs its boot time benchmark at core
      initcall time when it is built-in, to ensure that the RAID code can
      make use of it when it is built-in as well.
      
      Let's defer this to a later stage during the boot, to avoid impacting
      the overall boot time of the system. Instead, just pick an arbitrary
      implementation from the list, and use that as the preliminary default.
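A minimal sketch of the two-phase selection described above, assuming a simple array of candidates: early in boot the first entry becomes the default without any measurement; later, once a benchmark has recorded a speed for each candidate, callers are switched to the fastest. The struct, the global pointer, and both function names are hypothetical, and the later boot hook that runs the second step is not specified by the message.

```c
#include <stddef.h>

/* Hypothetical XOR implementation plus its measured speed. */
struct xor_impl {
	const char *name;
	unsigned int speed_mb_s;	/* filled in by the deferred benchmark */
};

static struct xor_impl *active_impl;	/* what callers use right now */

/* Early boot: no benchmark yet, so take the first entry as the default. */
static void xor_select_default(struct xor_impl *list, size_t n)
{
	if (n > 0)
		active_impl = &list[0];
}

/* Later in boot: after speeds are recorded, switch to the fastest entry. */
static void xor_select_fastest(struct xor_impl *list, size_t n)
{
	struct xor_impl *fastest = NULL;

	for (size_t i = 0; i < n; i++)
		if (!fastest || list[i].speed_mb_s > fastest->speed_mb_s)
			fastest = &list[i];
	if (fastest)
		active_impl = fastest;
}
```

For example, with the ThunderX2 speeds quoted for c055e3ea (8regs 10491, 32regs 12345, arm64_neon 11402 MB/s), callers would use 8regs until the later step switches them to 32regs. A real implementation must also make the pointer switch safe for concurrent callers, which this sketch does not address.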
      
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  6. May 24, 2019
  7. Nov 15, 2017
  8. Aug 31, 2016
  9. Aug 24, 2016
  10. Oct 10, 2012
  11. May 21, 2012
  12. Apr 09, 2012
  13. Mar 30, 2010
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        bloc...
  14. Jun 15, 2009
  15. Mar 30, 2009
  16. Jul 13, 2007
    • async_tx: add the async_tx api · 9bc89cd8
      Dan Williams authored
      
      The async_tx api provides methods for describing a chain of asynchronous
      bulk memory transfers/transforms with support for inter-transactional
      dependencies.  It is implemented as a dmaengine client that smooths over
      the details of different hardware offload engine implementations.  Code
      that is written to the api can optimize for asynchronous operation and the
      api will fit the chain of operations to the available offload resources. 
       
      	I imagine that any piece of ADMA hardware would register with the
      	'async_*' subsystem, and a call to async_X would be routed as
      	appropriate, or be run in-line. - Neil Brown
      
      async_tx exploits the capabilities of struct dma_async_tx_descriptor to
      provide an api of the following general format:
      
      struct dma_async_tx_descriptor *
      async_<operation>(..., struct dma_async_tx_descriptor *depend_tx,
      			dma_async_tx_callback cb_fn, void *cb_param)
      {
      	struct dma_chan *chan = async_tx_find_channel(depend_tx, <operation>);
      	struct dma_device *device = chan ? chan->device : NULL;
      	int int_en = cb_fn ? 1 : 0;
      	struct dma_async_tx_descriptor *tx = device ?
      		device->device_prep_dma_<operation>(chan, len, int_en) : NULL;
      
      	if (tx) { /* run <operation> asynchronously */
      		...
      		tx->tx_set_dest(addr, tx, index);
      		...
      		tx->tx_set_src(addr, tx, index);
      		...
      		async_tx_submit(chan, tx, flags, depend_tx, cb_fn, cb_param);
      	} else { /* run <operation> synchronously */
      		...
      		<operation>
      		...
      		async_tx_sync_epilog(flags, depend_tx, cb_fn, cb_param);
      	}
      
      	return tx;
      }
      
      async_tx_find_channel() returns a capable channel from its pool.  The
      channel pool is organized as a per-cpu array of channel pointers.  The
      async_tx_rebalance() routine is tasked with managing these arrays.  In the
      uniprocessor case, async_tx_rebalance() tries to spread responsibility
      evenly over channels of similar capabilities.  For example, if there are two
      copy+xor channels, one will handle copy operations and the other will
      handle xor.  In the SMP case, async_tx_rebalance() attempts to spread the
      operations evenly over the cpus, e.g. cpu0 gets copy channel0 and xor
      channel0 while cpu1 gets copy channel 1 and xor channel 1.  When a
      dependency is specified, async_tx_find_channel() defaults to keeping the
      operation on the same channel.  An xor->copy->xor chain will stay on one
      channel if it supports both operation types; otherwise the transaction will
      transition between a copy and an xor resource.
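The channel-spreading policy just described can be sketched as a small table-filling routine. This is an illustrative model, not async_tx_rebalance() itself: each hypothetical channel carries a bitmask of supported operations, a uniprocessor system spreads the operation types across capable channels, an SMP system spreads the CPUs across them, and -1 marks an operation with no capable channel (to be run synchronously). Dependency handling is not modeled.

```c
#include <stddef.h>

enum op { OP_COPY, OP_XOR, OP_COUNT };

#define MAX_CHANS 16	/* assumed upper bound on channels */
#define MAX_CPUS 8	/* assumed upper bound on CPUs */

/* Hypothetical channel: bit (1u << op) is set if the channel supports op. */
struct chan {
	unsigned int caps;
};

/*
 * Fill table[cpu][op] with the index of the channel that cpu should use for
 * op, or -1 if no channel supports op (the operation then runs in software).
 */
static void rebalance(const struct chan *chans, int n_chans, int n_cpus,
		      int table[][OP_COUNT])
{
	if (n_chans > MAX_CHANS)
		n_chans = MAX_CHANS;

	for (int op = 0; op < OP_COUNT; op++) {
		int capable[MAX_CHANS];
		int n = 0;

		/* collect the channels that can perform this operation */
		for (int c = 0; c < n_chans; c++)
			if (chans[c].caps & (1u << op))
				capable[n++] = c;

		for (int cpu = 0; cpu < n_cpus; cpu++) {
			if (n == 0)
				table[cpu][op] = -1;
			else if (n_cpus == 1)
				/* uniprocessor: spread operation types over channels */
				table[cpu][op] = capable[op % n];
			else
				/* SMP: spread CPUs over channels */
				table[cpu][op] = capable[cpu % n];
		}
	}
}
```

With two copy+xor channels this reproduces both examples from the text: on one CPU, copy goes to channel 0 and xor to channel 1; on two CPUs, cpu0 uses channel 0 for both and cpu1 uses channel 1 for both.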
      
      Currently the raid5 implementation in the MD raid456 driver has been
      converted to the async_tx api.  A driver for the offload engines on the
      Intel XScale series of I/O processors, iop-adma, is provided in a later
      commit.  With the iop-adma driver and async_tx, raid456 is able to offload
      copy, xor, and xor-zero-sum operations to hardware engines.
       
      On iop342, tiobench showed higher throughput for sequential writes (20 - 30%
      improvement) and sequential reads to a degraded array (40 - 55%
      improvement).  For the other cases performance was roughly equal, +/- a few
      percentage points.  On an x86-smp platform the performance of the async_tx
      implementation (in synchronous mode) was also within +/- a few percentage
      points of the original implementation.  According to 'top' on iop342, CPU
      utilization drops from ~50% to ~15% during a 'resync' while the speed
      according to /proc/mdstat doubles from ~25 MB/s to ~50 MB/s.
       
      The tiobench command line used for testing was:
        tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5
      * iop342 had 1GB of memory available
      
      Details:
      * if CONFIG_DMA_ENGINE=n the asynchronous path is compiled away by making
        async_tx_find_channel a static inline routine that always returns NULL
      * when a callback is specified for a given transaction, an interrupt will
        fire at operation completion time and the callback will occur in a
        tasklet.  If the channel does not support interrupts then a live
        polling wait will be performed
      * the api is written as a dmaengine client that requests all available
        channels
      * In support of dependencies the api implicitly schedules channel-switch
        interrupts.  The interrupt triggers the cleanup tasklet which causes
        pending operations to be scheduled on the next channel
      * Xor engines treat an xor destination address differently than a software
        xor routine.  To the software routine the destination address is an implied
        source, whereas engines treat it as a write-only destination.  This patch
        modifies the xor_blocks routine to take an explicit destination address
        to mirror the hardware.
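The difference between the two destination conventions in the last item can be shown with two hypothetical byte-wise helpers: the software form folds the old destination contents into the result, while the engine form overwrites them. Zeroing the destination first makes the in-place form compute the same result as the write-only one. The names are illustrative, and real routines typically work a machine word at a time rather than a byte at a time.

```c
#include <stddef.h>
#include <string.h>

/* Software convention: dest is an implied source, so dest ^= every src. */
static void xor_in_place(unsigned char *dest, unsigned char **srcs,
			 size_t src_count, size_t len)
{
	for (size_t s = 0; s < src_count; s++)
		for (size_t i = 0; i < len; i++)
			dest[i] ^= srcs[s][i];
}

/* Engine convention: dest is write-only, so dest = src0 ^ src1 ^ ... */
static void xor_write_only(unsigned char *dest, unsigned char **srcs,
			   size_t src_count, size_t len)
{
	memset(dest, 0, len);	/* old contents must not leak into the result */
	xor_in_place(dest, srcs, src_count, len);
}
```

Starting from the same destination bytes {0xF0, 0x0F} and sources {0x01, 0x02} and {0x04, 0x08}, the in-place form yields {0xF5, 0x05}, while the write-only form yields {0x05, 0x0A}.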
      
      Changelog:
      * fixed a leftover debug print
      * don't allow callbacks in async_interrupt_cond
      * fixed xor_block changes
      * fixed usage of ASYNC_TX_XOR_DROP_DEST
      * drop dma mapping methods, suggested by Chris Leech
      * printk warning fixups from Andrew Morton
      * don't use inline in C files, Adrian Bunk
      * select the API when MD is enabled
      * BUG_ON xor source counts <= 1
      * implicitly handle hardware concerns like channel switching and
        interrupts, Neil Brown
      * remove the per operation type list, and distribute operation capabilities
        evenly amongst the available channels
      * simplify async_tx_find_channel to optimize the fast path
      * introduce the channel_table_initialized flag to prevent early calls to
        the api
      * reorganize the code to mimic crypto
      * include mm.h as not all archs include it in dma-mapping.h
      * make the Kconfig options non-user visible, Adrian Bunk
      * move async_tx under crypto since it is meant as 'core' functionality, and
        the two may share algorithms in the future
      * move large inline functions into c files
      * checkpatch.pl fixes
      * gpl v2 only correction
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-By: NeilBrown <neilb@suse.de>
    • xor: make 'xor_blocks' a library routine for use with async_tx · 685784aa
      Dan Williams authored
      
      The async_tx api tries to use a dma engine for an operation, but will fall
      back to an optimized software routine otherwise.  Xor support is
      implemented using the raid5 xor routines.  For organizational purposes this
      routine is moved to a common area.
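The fall-back behavior can be sketched as a lookup that may return an offload routine, with the software loop used whenever none is available. HAVE_XOR_OFFLOAD, find_xor_offload(), and the other names are hypothetical; the stub branch mirrors the idea, described for CONFIG_DMA_ENGINE=n in the async_tx commit, of compiling the asynchronous path away by having the lookup always return NULL.

```c
#include <stddef.h>

/* Signature shared by a (hypothetical) engine routine and the lookup. */
typedef int (*xor_offload_fn)(unsigned char *dest, unsigned char **srcs,
			      size_t src_count, size_t len);

#ifdef HAVE_XOR_OFFLOAD
xor_offload_fn find_xor_offload(void);	/* provided by an engine driver */
#else
/* No offload support built in: the lookup is a stub that returns NULL,
 * so the compiler can discard the engine path entirely. */
static inline xor_offload_fn find_xor_offload(void)
{
	return NULL;
}
#endif

/* Software routine: dest is an implied source, dest ^= every src. */
static void xor_software(unsigned char *dest, unsigned char **srcs,
			 size_t src_count, size_t len)
{
	for (size_t s = 0; s < src_count; s++)
		for (size_t i = 0; i < len; i++)
			dest[i] ^= srcs[s][i];
}

/* Returns 1 if an engine accepted the job, 0 if it was done in line. */
static int xor_try_offload(unsigned char *dest, unsigned char **srcs,
			   size_t src_count, size_t len)
{
	xor_offload_fn engine = find_xor_offload();

	if (engine && engine(dest, srcs, src_count, len))
		return 1;
	xor_software(dest, srcs, src_count, len);
	return 0;
}
```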
      
      The following fixes are also made:
      * rename xor_block => xor_blocks, suggested by Adrian Bunk
      * ensure that xor.o initializes before md.o in the built-in case
      * checkpatch.pl fixes
      * mark calibrate_xor_blocks __init, Adrian Bunk
      
      Cc: Adrian Bunk <bunk@stusta.de>
      Cc: NeilBrown <neilb@suse.de>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  17. Apr 16, 2005
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      v2.6.12-rc2