- Sep 09, 2023
-
-
David Howells authored
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with as they require userspace VM interaction. ITER_DISCARD isn't dealt with either as that can't be extracted. Signed-off-by:
David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
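For context, a minimal sketch of how such a test might set up an ITER_KVEC iterator and run the extractor. This assumes the usual iov_iter_kvec() and iov_iter_extract_pages() kernel APIs; the buffer names and page count are hypothetical, and this is not the actual test code:

    /* Build a one-segment kvec iterator over a kernel buffer. */
    struct kvec kv = { .iov_base = buf, .iov_len = len };
    struct iov_iter iter;
    struct page **pages = NULL;   /* extractor may allocate the array */
    size_t offset0;
    ssize_t copied;

    iov_iter_kvec(&iter, ITER_SOURCE, &kv, 1, len);
    /* Returns the number of bytes whose backing pages were extracted;
     * offset0 receives the offset into the first page. */
    copied = iov_iter_extract_pages(&iter, &pages, len, 8, 0, &offset0);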
-
David Howells authored
Add some kunit tests for copying to and from ITER_BVEC, ITER_KVEC and ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with as they require userspace VM interaction. ITER_DISCARD isn't dealt with either as that does nothing. Signed-off-by:
David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Howells authored
iov_iter_extract_pages() doesn't correctly handle skipping over initial zero-length entries in ITER_KVEC and ITER_BVEC-type iterators. The problem is that it accidentally reduces maxsize to 0 while skipping, and thus runs to the end of the array and returns 0. Fix this by sticking the calculated size-to-copy in a new variable rather than back in maxsize. Fixes: 7d58fe73 ("iov_iter: Add a function to extract a page list from an iterator") Signed-off-by:
David Howells <dhowells@redhat.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
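A minimal sketch of the bug class being fixed here, with hypothetical variable names rather than the actual iov_iter code: clobbering the caller's limit while skipping empty entries leaves nothing left to extract.

    /* Buggy: the running limit is clamped against a zero-length entry
     * and becomes 0, so every later entry also extracts 0 bytes. */
    maxsize = min(maxsize, entry_len);      /* entry_len == 0 => maxsize == 0 */

    /* Fixed: keep the per-entry size in its own variable. */
    size_t size = min(maxsize, entry_len);  /* maxsize stays intact */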
-
- Sep 06, 2023
-
-
WANG Xuerui authored
Similar to the syndrome calculation, the recovery algorithms also work on 64 bytes at a time to align with the L1 cache line size of current and future LoongArch cores (that we care about), which means unrolled-by-4 LSX and unrolled-by-2 LASX code. The assembly is originally based on the x86 SSSE3/AVX2 ports, but the register allocation has been redone to take advantage of LSX/LASX's 32 vector registers, and the instruction sequence has been optimized to suit (e.g. LoongArch can perform per-byte srl and andi on vectors, but x86 cannot). Performance numbers measured by instrumenting the raid6test code, on a 3A5000 system clocked at 2.5GHz:

> lasx  2data: 354.987 MiB/s
> lasx  datap: 350.430 MiB/s
> lsx   2data: 340.026 MiB/s
> lsx   datap: 337.318 MiB/s
> intx1 2data: 164.280 MiB/s
> intx1 datap: 187.966 MiB/s

Because recovery algorithms are chosen solely based on priority and availability, lasx is marked as priority 2 and lsx priority 1. At least for the current generation of LoongArch micro-architectures, LASX should always be faster than LSX whenever supported, and have similar power consumption characteristics (because the only known LASX-capable uarch, the LA464, always computes the full 256-bit result for vector ops). Acked-by:
Song Liu <song@kernel.org> Signed-off-by:
WANG Xuerui <git@xen0n.name> Signed-off-by:
Huacai Chen <chenhuacai@loongson.cn>
-
WANG Xuerui authored
The algorithms work on 64 bytes at a time, which is the L1 cache line size of all current and future LoongArch cores (that we care about), as confirmed by Huacai. The code is based on the generic int.uc algorithm, unrolled 4 times for LSX and 2 times for LASX. Further unrolling does not meaningfully improve the performance according to experiments. Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

> raid6: lasx    gen() 12726 MB/s
> raid6: lsx     gen() 10001 MB/s
> raid6: int64x8 gen() 2876 MB/s
> raid6: int64x4 gen() 3867 MB/s
> raid6: int64x2 gen() 2531 MB/s
> raid6: int64x1 gen() 1945 MB/s

Comparison of xor() speeds (from different boots but meaningful anyway):

> lasx:    11226 MB/s
> lsx:     6395 MB/s
> int64x4: 2147 MB/s

Performance as measured by raid6test:

> raid6: lasx    gen() 25109 MB/s
> raid6: lsx     gen() 13233 MB/s
> raid6: int64x8 gen() 4164 MB/s
> raid6: int64x4 gen() 6005 MB/s
> raid6: int64x2 gen() 5781 MB/s
> raid6: int64x1 gen() 4119 MB/s
> raid6: using algorithm lasx gen() 25109 MB/s
> raid6: .... xor() 14439 MB/s, rmw enabled

Acked-by:
Song Liu <song@kernel.org> Signed-off-by:
WANG Xuerui <git@xen0n.name> Signed-off-by:
Huacai Chen <chenhuacai@loongson.cn>
-
- Sep 05, 2023
-
-
Ariel Marcovitch authored
The relevant parameter is 'start' and not 'nextid' Fixes: 460488c5 ("idr: Remove idr_alloc_ext") Signed-off-by:
Ariel Marcovitch <arielmarcovitch@gmail.com> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
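For context, a hedged usage sketch assuming the usual idr_alloc_cyclic() signature, where 'start' is the lower bound of the allocated ID range (names here are illustrative):

    /* Allocate the next free ID >= 10, cycling back to 'start' when the
     * space is exhausted; end == 0 means no upper bound. */
    int id = idr_alloc_cyclic(&my_idr, ptr, 10, 0, GFP_KERNEL);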
-
Philipp Stanner authored
Add a new line to the docstrings of functions wrapping __xa_alloc() and __xa_alloc_cyclic(), noting that the flag XA_FLAGS_ALLOC must have been set beforehand. The documentation so far says that functions wrapping __xa_alloc() and __xa_alloc_cyclic() are supposed to return either -ENOMEM or -EBUSY in case of an error. If the xarray has been initialized without the flag XA_FLAGS_ALLOC, however, they fail with a different, undocumented error code. As hinted at in Documentation/core-api/xarray.rst, wrappers around these functions should only be invoked when the flag has been set. The functions' documentation should reflect that as well. Signed-off-by:
Philipp Stanner <pstanner@redhat.com> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org>
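A minimal sketch of the requirement being documented, assuming the standard XArray allocation API (the stored pointer is a placeholder):

    #include <linux/xarray.h>

    /* DEFINE_XARRAY_ALLOC sets XA_FLAGS_ALLOC, so xa_alloc() may be used. */
    static DEFINE_XARRAY_ALLOC(my_xa);

    u32 id;
    /* Returns 0 on success, -ENOMEM/-EBUSY on failure; on an xarray
     * initialized without XA_FLAGS_ALLOC the call fails differently. */
    int err = xa_alloc(&my_xa, &id, ptr, xa_limit_32b, GFP_KERNEL);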
-
- Aug 31, 2023
-
-
Nathan Chancellor authored
When building for ARCH=riscv using LLVM < 14, there is an error with CONFIG_DEBUG_INFO_SPLIT=y:

error: A dwo section may not contain relocations

This was worked around in LLVM 15 by disallowing '-gsplit-dwarf' with '-mrelax' (the default), so CONFIG_DEBUG_INFO_SPLIT is not selectable with newer versions of LLVM:

$ clang --target=riscv64-linux-gnu -gsplit-dwarf -c -o /dev/null -x c /dev/null
clang: error: -gsplit-dwarf is unsupported with RISC-V linker relaxation (-mrelax)

GCC silently had a similar issue that was resolved with GCC 12.x. Restrict CONFIG_DEBUG_INFO_SPLIT for RISC-V when using LLVM or GCC < 12.x to avoid these known issues. Link: https://github.com/ClangBuiltLinux/linux/issues/1914 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99090 Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/all/202308090204.9yZffBWo-lkp@intel.com/ Signed-off-by:
Nathan Chancellor <nathan@kernel.org> Reviewed-by:
Fangrui Song <maskray@google.com> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230816-riscv-debug_info_split-v1-1-d1019d6ccc11@kernel.org Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
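A hedged sketch of the kind of Kconfig restriction described, assuming the dependency lives on the DEBUG_INFO_SPLIT entry in lib/Kconfig.debug (the exact expression is reconstructed, not quoted from the patch):

    config DEBUG_INFO_SPLIT
    	bool "Produce split debuginfo in .dwo files"
    	# RISC-V linker relaxation breaks split DWARF with LLVM,
    	# and with GCC before 12.x
    	depends on !RISCV || GCC_VERSION >= 120000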
-
Jisheng Zhang authored
Allow forcing all function addresses to be 64B-aligned, as is possible on other architectures. This may be useful when verifying whether a performance bump is caused by function alignment changes. Before commit 1bf18da6 ("lib/Kconfig.debug: add ARCH dependency for FUNCTION_ALIGN option"), riscv supported enabling the DEBUG_FORCE_FUNCTION_ALIGN_64B option, but after that commit, each arch needs to claim the support explicitly. Signed-off-by:
Jisheng Zhang <jszhang@kernel.org> Reviewed-by:
Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230727160356.3874-1-jszhang@kernel.org Signed-off-by:
Palmer Dabbelt <palmer@rivosinc.com>
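A sketch of what claiming such support looks like, assuming the option's dependency is an explicit arch list in lib/Kconfig.debug (the exact list shown here is illustrative):

    config DEBUG_FORCE_FUNCTION_ALIGN_64B
    	bool "Force all function address 64B aligned"
    	# since commit 1bf18da6, each architecture must opt in explicitly
    	depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC || RISCV)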
-
- Aug 28, 2023
-
-
Rob Herring authored
The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct headers. Link: https://lore.kernel.org/r/20230714175056.4066297-1-robh@kernel.org Signed-off-by:
Rob Herring <robh@kernel.org>
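A short sketch of the explicit-include style the cleanup asks for, instead of relying on one DT header implicitly pulling in the others:

    /* include exactly what the code uses, not what of_device.h drags in */
    #include <linux/of.h>               /* struct device_node, of_*() helpers */
    #include <linux/of_platform.h>      /* of_platform_populate() and friends */
    #include <linux/platform_device.h>  /* struct platform_device */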
-
- Aug 25, 2023
-
-
Helge Deller authored
On some architectures, the gcc compiler translates the 64-bit __builtin_clzll() function into a call to the libgcc function __clzdi2(), which should take a 64-bit parameter on both 32- and 64-bit platforms. But in the current kernel code, the built-in __clzdi2() function is defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG == 32, thus the return values on 32-bit kernels are in the range [0..31] instead of the expected [0..63] range. This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to take a 64-bit parameter on 32-bit kernels as well, making the functions identical for 32- and 64-bit kernels. This bug went unnoticed since kernel 3.11 for over 10 years; here are some possible reasons for that:

a) Some architectures have assembly instructions to count the bits, which are used instead of calling __clzdi2(), e.g. on x86 the bsr instruction and on ppc cntlz is used. On such architectures the wrong __clzdi2() implementation isn't used and as such the bug has no effect and won't be noticed.

b) Some architectures link to libgcc.a, and the in-kernel weak functions get replaced by the correct 64-bit variants from libgcc.a.

c) __builtin_clzll() and __clzdi2() don't seem to be used in many places in the kernel, and most likely only in uncritical functions, e.g. when printing hex values via seq_put_hex_ll(). The wrong return value will still print the correct number, just with wrong formatting (e.g. with too many leading zeroes).

d) 32-bit kernels aren't used that much any longer, so they are less tested.

A trivial testcase to verify if the currently running 32-bit kernel is affected by the bug is to look at the output of /proc/self/maps. Here the kernel uses a correct implementation of __clzdi2():

root@debian:~# cat /proc/self/maps
00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat
00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat
0001a000-0003b000 rwxp 00000000 00:00 0 [heap]
f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...

and this kernel uses the broken implementation of __clzdi2():

root@debian:~# cat /proc/self/maps
0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat
0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat
000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap]
00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...

Signed-off-by:
Helge Deller <deller@gmx.de> Fixes: 4df87bb7 ("lib: add weak clz/ctz functions") Cc: Chanho Min <chanho.min@lge.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org # v3.11+ Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
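A minimal sketch of the shape of the fix, assuming the kernel's fls64()/__ffs64() helpers; the actual lib/clz_ctz.c code may differ in detail:

    #include <linux/bitops.h>
    #include <linux/export.h>

    /* Count leading zeros of a 64-bit value, identical on 32- and
     * 64-bit kernels; result is undefined for val == 0, as in libgcc. */
    int __weak __clzdi2(u64 val)
    {
    	return 64 - fls64(val);
    }

    /* Count trailing zeros of a 64-bit value. */
    int __weak __ctzdi2(u64 val)
    {
    	return __ffs64(val);
    }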
-
Mateusz Guzik authored
Allocations and frees are globally serialized on the pcpu lock (and the CPU hotplug lock if enabled, which is the case on Debian). At least one frequent consumer allocates 4 back-to-back counters (and frees them in the same manner), exacerbating the problem. While this does not fully remedy scalability issues, it is a step towards that goal and provides immediate relief. Signed-off-by:
Mateusz Guzik <mjguzik@gmail.com> Reviewed-by:
Dennis Zhou <dennis@kernel.org> Reviewed-by:
Vegard Nossum <vegard.nossum@oracle.com> Link: https://lore.kernel.org/r/20230823050609.2228718-2-mjguzik@gmail.com [Dennis: reflowed a few lines] Signed-off-by:
Dennis Zhou <dennis@kernel.org>
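A hedged usage sketch of the batched API this change introduces, assuming a percpu_counter_init_many()-style helper taking the number of counters; the signature is reconstructed from memory, so treat it as an assumption:

    /* One pcpu-lock/hotplug round-trip for all four counters
     * instead of four separate ones. */
    struct percpu_counter counters[4];

    if (percpu_counter_init_many(counters, 0, GFP_KERNEL, ARRAY_SIZE(counters)))
    	return -ENOMEM;
    /* ... use the counters ... */
    percpu_counter_destroy_many(counters, ARRAY_SIZE(counters));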
-
Christophe Leroy authored
On powerpc64le checksum kunit tests work:

[ 2.011457][ T1] KTAP version 1
[ 2.011662][ T1] # Subtest: checksum
[ 2.011848][ T1] 1..3
[ 2.034710][ T1] ok 1 test_csum_fixed_random_inputs
[ 2.079325][ T1] ok 2 test_csum_all_carry_inputs
[ 2.127102][ T1] ok 3 test_csum_no_carry_inputs
[ 2.127202][ T1] # checksum: pass:3 fail:0 skip:0 total:3
[ 2.127533][ T1] # Totals: pass:3 fail:0 skip:0 total:3
[ 2.127956][ T1] ok 1 checksum

But on powerpc64 and powerpc32 they fail:

[ 1.859890][ T1] KTAP version 1
[ 1.860041][ T1] # Subtest: checksum
[ 1.860201][ T1] 1..3
[ 1.861927][ T58] # test_csum_fixed_random_inputs: ASSERTION FAILED at lib/checksum_kunit.c:243
[ 1.861927][ T58] Expected result == expec, but
[ 1.861927][ T58] result == 54991 (0xd6cf)
[ 1.861927][ T58] expec == 33316 (0x8224)
[ 1.863742][ T1] not ok 1 test_csum_fixed_random_inputs
[ 1.864520][ T60] # test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267
[ 1.864520][ T60] Expected result == expec, but
[ 1.864520][ T60] result == 255 (0xff)
[ 1.864520][ T60] expec == 65280 (0xff00)
[ 1.868820][ T1] not ok 2 test_csum_all_carry_inputs
[ 1.869977][ T62] # test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:306
[ 1.869977][ T62] Expected result == expec, but
[ 1.869977][ T62] result == 64515 (0xfc03)
[ 1.869977][ T62] expec == 0 (0x0)
[ 1.872060][ T1] not ok 3 test_csum_no_carry_inputs
[ 1.872102][ T1] # checksum: pass:0 fail:3 skip:0 total:3
[ 1.872458][ T1] # Totals: pass:0 fail:3 skip:0 total:3
[ 1.872791][ T1] not ok 3 checksum

This is because all expected values were calculated for X86, which is little endian. On big endian systems, all precalculated 16-bit halves must be byte swapped. This is also confirmed by a huge number of sparse errors when building with C=2, so fix all the sparse errors and it will naturally work on all endiannesses. Fixes: 688eb819 ("x86/csum: Improve performance of `csum_partial`") Signed-off-by:
Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by:
David S. Miller <davem@davemloft.net>
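A minimal sketch of the endian-correct expectation, assuming precomputed little-endian constants; the helper name is hypothetical, not the actual test code:

    #include <asm/byteorder.h>

    /* Expected values were precomputed on little-endian x86; byte-swap
     * each 16-bit half when running on a big-endian kernel. */
    static u16 expected(u16 le_value)
    {
    #ifdef __BIG_ENDIAN
    	return swab16(le_value);
    #else
    	return le_value;
    #endif
    }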
-
- Aug 24, 2023
-
-
Liam R. Howlett authored
Avoid setting the variables until necessary, and actually use the variables where applicable. Introducing a variable for the slots array avoids spanning multiple lines. Add the missing argument to the documentation. Use the node type when setting the metadata instead of blindly assuming the type. Finally, add a trace point to the function for successful store. Link: https://lkml.kernel.org/r/20230819004356.1454718-3-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Patch series "New page table range API", v6. This patchset changes the API used by the MM to set up page table entries. The four APIs are: set_ptes(mm, addr, ptep, pte, nr) update_mmu_cache_range(vma, addr, ptep, nr) flush_dcache_folio(folio) flush_icache_pages(vma, page, nr) flush_dcache_folio() isn't technically new, but no architecture implemented it, so I've done that for them. The old APIs remain around but are mostly implemented by calling the new interfaces. The new APIs are based around setting up N page table entries at once. The N entries belong to the same PMD, the same folio and the same VMA, so ptep++ is a legitimate operation, and locking is taken care of for you. Some architectures can do a better job of it than just a loop, but I have hesitated to make too deep a change to architectures I don't understand well. One thing I have changed in every architecture is that PG_arch_1 is now a per-folio bit instead of a per-page bit when used for dcache clean/dirty tracking. This was something that would have to happen eventually, and it makes sense to do it now rather than iterate over every page involved in a cache flush and figure out if it needs to happen. The point of all this is better performance, and Fengwei Yin has measured improvement on x86. I suspect you'll see improvement on your architecture too. Try the new will-it-scale test mentioned here: https://lore.kernel.org/linux-mm/20230206140639.538867-5-fengwei.yin@intel.com/ You'll need to run it on an XFS filesystem and have CONFIG_TRANSPARENT_HUGEPAGE set. This patchset is the basis for much of the anonymous large folio work being done by Ryan, so it's received quite a lot of testing over the last few months. This patch (of 38): Determine if a value lies within a range more efficiently (subtraction + comparison vs two comparisons and an AND). It also has useful (under some circumstances) behaviour if the range exceeds the maximum value of the type. Convert all the conflicting definitions of in_range() within the kernel; some can use the generic definition while others need their own definition. Link: https://lkml.kernel.org/r/20230802151406.3735276-1-willy@infradead.org Link: https://lkml.kernel.org/r/20230802151406.3735276-2-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
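A minimal sketch of the in_range() trick described above; the generic kernel macro is type-generic, this fixed-type version just shows the idea:

    /* One subtraction and one unsigned comparison replace
     * (val >= start && val < start + len); unsigned wraparound makes
     * values below 'start' compare as huge numbers, so they fail too. */
    static inline bool in_range(unsigned long val, unsigned long start,
    			    unsigned long len)
    {
    	return val - start < len;
    }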
-
Liam R. Howlett authored
The current implementation of append may cause duplicate data and/or incorrect ranges to be returned to a reader during an update. Although this has not been reported or seen, disable the append write operation while the tree is in rcu mode out of an abundance of caution. During the analysis of mas_next_slot() the following interleaving was artificially created by separating the writer and reader code:

Writer:                                 Reader:
mas_wr_append
    set end pivot
    updates end metadata
    detects write to last slot
    last slot write is to start of slot
    store current contents in slot
    overwrite old end pivot
                                        mas_next_slot():
                                            read end metadata
                                            read old end pivot
                                            return with incorrect range
    store new value

Alternatively:

Writer:                                 Reader:
mas_wr_append
    set end pivot
    updates end metadata
    detects write to last slot
    last slot write is to end of slot
    store value
                                        mas_next_slot():
                                            read end metadata
                                            read old end pivot
                                            read new end pivot
                                            return with incorrect range
    set old end pivot

There may be other accesses that are not safe since we are now updating both metadata and pointers, so disabling append if there could be rcu readers is the safest action. Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com Fixes: 54a611b6 ("Maple Tree: add new data structure") Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Aug 21, 2023
-
-
Andy Shevchenko authored
We already use _tolower() in other places, so convert the one place that open-codes it. Link: https://lkml.kernel.org/r/20230817145919.543251-1-andriy.shevchenko@linux.intel.com Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
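For reference, a sketch of the conversion pattern, assuming the ctype helpers from linux/ctype.h; the surrounding code is hypothetical:

    #include <linux/ctype.h>

    /* before: open-coded lowercasing of a known ASCII letter */
    if (c >= 'A' && c <= 'Z')
    	c += 'a' - 'A';

    /* after: the kernel helper (it ORs in 0x20, so it is only
     * meaningful when 'c' is known to be a letter) */
    c = _tolower(c);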
-
Andy Shevchenko authored
Sparse is not happy to see a non-static variable without a declaration: lib/vsprintf.c:61:6: warning: symbol 'no_hash_pointers' was not declared. Should it be static? Declare the respective variable in sprintf.h. With this, add a comment to discourage its use if there is no real need. Link: https://lkml.kernel.org/r/20230814163344.17429-3-andriy.shevchenko@linux.intel.com Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by:
Marco Elver <elver@google.com> Reviewed-by:
Petr Mladek <pmladek@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
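A minimal sketch of the declaration being added, assuming it lands next to the printf() prototypes in include/linux/sprintf.h:

    /* Only for use in debugging; do not rely on this in new code. */
    extern bool no_hash_pointers;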
-
Andy Shevchenko authored
Patch series "lib/vsprintf: Rework header inclusions", v3. Some patches that reduce the mess with the header inclusions related to vsprintf.c module. Each patch has its own description, and has no dependencies to each other, except the collisions over modifications of the same places. Hence the series. This patch (of 2): kernel.h is being used as a dump for all kinds of stuff for a long time. sprintf() and friends are used in many drivers without need of the full kernel.h dependency train with it. Here is the attempt on cleaning it up by splitting out sprintf() and friends. Link: https://lkml.kernel.org/r/20230814163344.17429-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20230814163344.17429-2-andriy.shevchenko@linux.intel.com Signed-off-by:
Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by:
Petr Mladek <pmladek@suse.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Reorder the operations for split and spanning stores so that new data is placed in the tree prior to marking the old data as dead. This will limit re-walks on dead data to just once instead of a retry loop. The order of operations is as follows: Create the new data, put the new data in place, mark the top node of the old data as dead. Then repair parent links in the reused nodes through all levels of the tree, following the new nodes downwards. Finally walk the top dead node looking for nodes that are no longer used, or subtrees that should be destroyed (marked dead throughout then freed), follow the partially used nodes downwards to discover other dead nodes and subtrees. Link: https://lkml.kernel.org/r/20230804165951.2661157-7-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
All calls to mas_adopt_children() currently pass the parent as the node in the maple state. Allow for the parent pointer that is passed in to be used instead. Link: https://lkml.kernel.org/r/20230804165951.2661157-6-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Add a definition to shorten long code lines and clarify what the code is doing. Use the new definition to get the maple tree parent pointer from the maple state where possible. Link: https://lkml.kernel.org/r/20230804165951.2661157-5-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
mas_replace() has a single user that takes a flag which is now always true. Replace this function with mas_put_in_tree() to better align with mas_replace_node(). Inline the remaining logic into the only caller; mas_wmb_replace(). Link: https://lkml.kernel.org/r/20230804165951.2661157-4-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Replacing nodes may cause a live lock-up if CPU resources are saturated by write operations on the tree by continuously retrying on dead nodes. To avoid the continuous retry scenario, ensure the new node is inserted into the tree prior to marking the old data as dead. This will define a window where old and new data is swapped. When reusing lower level nodes, ensure the parent pointer is updated after the parent is marked dead. This ensures that the child is still reachable from the top of the tree, but walking up to a dead node will result in a single retry that will start a fresh walk from the top down through the new node. Link: https://lkml.kernel.org/r/20230804165951.2661157-3-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Patch series "maple_tree: Change replacement strategy". The maple tree marks nodes dead as soon as they are going to be replaced. This could be problematic when used in the RCU context since the writer may be starved of CPU time by the readers. This patch set addresses the issue by switching the data replacement strategy to one that will only mark data as dead once the new data is available. This series changes the ordering of the node replacement so that the new data is live before the old data is marked 'dead'. When readers hit 'dead' nodes, they will restart from the top of the tree and end up in the new data. In more complex scenarios, the replacement strategy means a subtree is built and graphed into the tree leaving some nodes to point to the old parent. The view of tasks into the old data will either remain with the old data, or see the new data once the old data is marked 'dead'. Iterators will see the 'dead' node and restart on their own and switch to the new data. There is no risk of the reader seeing old data in these cases. The 'dead' subtree of data is then fully marked dead, but reused nodes will still point to the dead nodes until the parent pointer is updated. Walking up to a 'dead' node will cause a re-walk from the top of the tree and enter the new data area where old data is not reachable. Once the parent pointers are fully up to date in the active data, the 'dead' subtree is iterated to collect entirely 'dead' subtrees, and dead nodes (nodes that partially contained reused data). This patch (of 6): When dumping the tree, honour formatting request to output hex for the maple node type arange64. Link: https://lkml.kernel.org/r/20230804165951.2661157-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20230804165951.2661157-2-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Arnd Bergmann authored
Recent versions of clang warn about an unused variable, though older versions saw the 'slot++' as a use and did not warn: radix-tree.c:1136:50: error: parameter 'slot' set but not used [-Werror,-Wunused-but-set-parameter] It's clearly not needed any more, so just remove it. Link: https://lkml.kernel.org/r/20230811131023.2226509-1-arnd@kernel.org Fixes: 3a08cd52 ("radix tree: Remove multiorder support") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Rong Tao <rongtao@cestc.cn> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
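A tiny illustration of the warning class, with a hypothetical function rather than the radix-tree code:

    /* clang -Wunused-but-set-parameter: 'slot' is written but the
     * resulting value is never read, so the parameter can be dropped */
    static void skip_entries(void **slot, int count)
    {
    	while (count--)
    		slot++;
    }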
-
Rae Moar authored
Add parameter descriptions to struct kunit_attr header for the parameters attr_default and print. Fixes: 39e92cb1 ("kunit: Add test attributes API structure") Reported-by:
kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308180127.VD7YRPGa-lkp@intel.com/ Signed-off-by:
Rae Moar <rmoar@google.com> Reviewed-by:
David Gow <davidgow@google.com> Signed-off-by:
Shuah Khan <skhan@linuxfoundation.org>
-
- Aug 19, 2023
-
-
Zhen Lei authored
When adding a kobject or kset, we have made sure that the ktype cannot be NULL. Therefore, after adding a kobject or kset, there is no need to worry about the ktype being NULL. Remove all the redundant ktype-related checks. Signed-off-by:
Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230805084114.1298-3-thunder.leizhen@huaweicloud.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zhen Lei authored
When I register a kset in the following way:

	static struct kset my_kset;
	kobject_set_name(&my_kset.kobj, "my_kset");
	ret = kset_register(&my_kset);

a null pointer dereference exception occurs:

[ 4453.568337] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028
... ...
[ 4453.810361] Call trace:
[ 4453.813062] kobject_get_ownership+0xc/0x34
[ 4453.817493] kobject_add_internal+0x98/0x274
[ 4453.822005] kset_register+0x5c/0xb4
[ 4453.825820] my_kobj_init+0x44/0x1000 [my_kset]
... ...

because I didn't initialize my_kset.kobj.ktype. According to the description in Documentation/core-api/kobject.rst:

- A ktype is the type of object that embeds a kobject. Every structure that embeds a kobject needs a corresponding ktype.

So add a sanity check to make sure kset->kobj.ktype is not NULL. Signed-off-by:
Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230805084114.1298-2-thunder.leizhen@huaweicloud.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
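A minimal sketch of the added sanity check, assuming it sits at the top of kset_register(); the error-message text here is illustrative:

    int kset_register(struct kset *k)
    {
    	if (!k)
    		return -EINVAL;

    	/* a kset's embedded kobject must carry a ktype */
    	if (!k->kobj.ktype) {
    		pr_err("must have a ktype to be initialized properly!\n");
    		return -EINVAL;
    	}

    	/* ... proceed with kset_init() / kobject_add_internal() ... */
    }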
-
- Aug 18, 2023
-
-
Douglas Anderson authored
The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case. Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask. Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change. [akpm@linux-foundation.org: fix trigger_allbutcpu_cpu_backtrace() stub] Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid Signed-off-by:
Douglas Anderson <dianders@chromium.org> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Lecopzer Chen <lecopzer.chen@mediatek.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Pingfan Liu <kernelfans@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
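A hedged sketch contrasting the two call styles; the new-style name appears in the akpm fixup note above, while the old boolean-style form is reconstructed from memory rather than quoted from the patch:

    /* old: a bool could only ever exclude the *current* CPU */
    trigger_allbutself_cpu_backtrace();

    /* new: exclude any CPU by ID, e.g. one the hardlockup detector
     * already traced; the caller must keep that CPU ID stable */
    trigger_allbutcpu_cpu_backtrace(cpu);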
-
John Sanpe authored
Replace internal logic with separate bitrev library. Link: https://lkml.kernel.org/r/20230730081717.1498217-1-sanpeqf@gmail.com Signed-off-by:
John Sanpe <sanpeqf@gmail.com> Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Wang Ming authored
It is expected that most callers should _ignore_ the errors returned by debugfs_create_dir() in ei_debugfs_init(). Link: https://lkml.kernel.org/r/20230719144355.6720-1-machel@vivo.com Signed-off-by:
Wang Ming <machel@vivo.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
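A minimal sketch of the pattern, with a hypothetical init function: debugfs_create_dir() returns an ERR_PTR on failure, and the debugfs API is designed to absorb such values in subsequent calls, so the result need not be checked:

    /* Don't check or propagate the result: debugfs is best-effort,
     * and later debugfs_create_*() calls handle an ERR_PTR parent. */
    static struct dentry *dir;

    static int __init my_debugfs_init(void)
    {
    	dir = debugfs_create_dir("my_feature", NULL);
    	return 0;
    }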
-
Wang Ming authored
It is expected that most callers should _ignore_ the errors returned by debugfs_create_dir() in err_inject_init(). Link: https://lkml.kernel.org/r/20230713082455.2415-1-machel@vivo.com Signed-off-by:
Wang Ming <machel@vivo.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Sumitra Sharma authored
kmap() has been deprecated in favor of kmap_local_page() due to its high cost, restricted mapping space, the overhead of a global lock for synchronization, and because it makes the process sleep in the absence of free slots. kmap_local_page() is faster than kmap() and offers thread-local and CPU-local mappings, takes pagefaults in a local kmap region, and preserves preemption by saving the mappings of outgoing tasks and restoring those of the incoming one during a context switch. The mappings are kept thread-local in the functions “dmirror_do_read” and “dmirror_do_write” in test_hmm.c. Therefore, replace kmap() with kmap_local_page() and use memcpy_from/to_page() to avoid open coding kmap_local_page() + memcpy() + kunmap_local(). Remove the unused variable “tmp”. Link: https://lkml.kernel.org/r/20230610175712.GA348514@sumitra.com Signed-off-by:
Sumitra Sharma <sumitraartsy@gmail.com> Suggested-by:
Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by:
Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by:
Ira Weiny <ira.weiny@intel.com> Cc: Deepak R Varma <drv@mailo.com> Cc: Jérôme Glisse <jglisse@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
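A minimal sketch of the conversion, assuming the linux/highmem.h helpers; the buffer names are hypothetical:

    #include <linux/highmem.h>

    /* before: global kmap with a manual copy */
    void *tmp = kmap(page);
    memcpy(buf, tmp, PAGE_SIZE);
    kunmap(page);

    /* after: the helper wraps kmap_local_page() + memcpy() + kunmap_local() */
    memcpy_from_page(buf, page, 0, PAGE_SIZE);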
-
Liam R. Howlett authored
mas_prealloc() may walk partially down the tree before finding that a split or spanning store is needed. When the write occurs, relax the logic on resetting the walk so that partial walks will not restart, but walks that have gone too far (a store that affects beyond the current node) should be restarted. Link: https://lkml.kernel.org/r/20230724183157.3939892-15-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Calculate the number of nodes based on the pending write action instead of assuming the worst case. This addresses a performance regression introduced in platforms that have longer allocation timing. Link: https://lkml.kernel.org/r/20230724183157.3939892-14-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
Relocate it and call mas_wr_extend_null() from within mas_wr_end_piv(). Extending the NULL may affect the end pivot value, so calling mas_wr_extend_null() from within mas_wr_end_piv() keeps it all together. Link: https://lkml.kernel.org/r/20230724183157.3939892-12-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
mas_rebalance() is called to rebalance an insufficient node into a single node or two sufficient nodes. The preallocation estimate is always too many in this case as the height of the tree will never grow and there is no possibility to have a three way split in this case, so revise the node allocation count. Link: https://lkml.kernel.org/r/20230724183157.3939892-9-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Liam R. Howlett authored
The current preallocation strategy is to preallocate the absolute worst-case allocation for a tree modification. The entry (or NULL) is needed to know how many nodes are needed to write to the tree. Start by adding the argument to the mas_preallocate() definition. Link: https://lkml.kernel.org/r/20230724183157.3939892-8-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
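A hedged usage sketch of the revised call, assuming the entry becomes the second argument of mas_preallocate() and the usual MA_STATE/mas_store_prealloc() pattern (tree and range names are illustrative):

    MA_STATE(mas, &tree, index, last);

    /* the entry is passed in so the node estimate can match the actual
     * pending write instead of the worst case */
    if (mas_preallocate(&mas, entry, GFP_KERNEL))
    	return -ENOMEM;
    mas_store_prealloc(&mas, entry);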
-
Liam R. Howlett authored
Add some benchmarking functions in testing for mas_prev(). This is useful to ensure there are no regressions added during modifications. Link: https://lkml.kernel.org/r/20230724183157.3939892-3-Liam.Howlett@oracle.com Signed-off-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-