diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
index c5c2c7dbb155689bcff657591716ad652717a70d..45b98390e938d05816a6f35a93c64b81ec0746c9 100644
--- a/Documentation/admin-guide/mm/zswap.rst
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -49,7 +49,7 @@ compressed pool.
 Design
 ======
 
-Zswap receives pages for compression through the Frontswap API and is able to
+Zswap receives pages for compression from the swap subsystem and is able to
 evict pages from its own compressed pool on an LRU basis and write them back to
 the backing swap device in the case that the compressed pool is full.
 
@@ -70,19 +70,19 @@ means the compression ratio will always be 2:1 or worse (because of half-full
 zbud pages). The zsmalloc type zpool has a more complex compressed page
 storage method, and it can achieve greater storage densities.
 
-When a swap page is passed from frontswap to zswap, zswap maintains a mapping
+When a swap page is passed from swapout to zswap, zswap maintains a mapping
 of the swap entry, a combination of the swap type and swap offset, to the zpool
 handle that references that compressed swap page. This mapping is achieved
 with a red-black tree per swap type. The swap offset is the search key for the
 tree nodes.
 
-During a page fault on a PTE that is a swap entry, frontswap calls the zswap
-load function to decompress the page into the page allocated by the page fault
-handler.
+During a page fault on a PTE that is a swap entry, the swapin code calls the
+zswap load function to decompress the page into the page allocated by the page
+fault handler.
 
 Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
-in the swap_map goes to 0) the swap code calls the zswap invalidate function,
-via frontswap, to free the compressed entry.
+in the swap_map goes to 0) the swap code calls the zswap invalidate function
+to free the compressed entry.
 
 Zswap seeks to be simple in its policies. Sysfs attributes allow for one user
 controlled policy:
diff --git a/Documentation/mm/frontswap.rst b/Documentation/mm/frontswap.rst
deleted file mode 100644
index c892412988af264d34d39a98b31f719ad84aabda..0000000000000000000000000000000000000000
--- a/Documentation/mm/frontswap.rst
+++ /dev/null
@@ -1,264 +0,0 @@
-=========
-Frontswap
-=========
-
-Frontswap provides a "transcendent memory" interface for swap pages.
-In some environments, dramatic performance savings may be obtained because
-swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.
-
-.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/
-
-Frontswap is so named because it can be thought of as the opposite of
-a "backing" store for a swap device. The storage is assumed to be
-a synchronous concurrency-safe page-oriented "pseudo-RAM device" conforming
-to the requirements of transcendent memory (such as Xen's "tmem", or
-in-kernel compressed memory, aka "zcache", or future RAM-like devices);
-this pseudo-RAM device is not directly accessible or addressable by the
-kernel and is of unknown and possibly time-varying size. The driver
-links itself to frontswap by calling frontswap_register_ops to set the
-frontswap_ops funcs appropriately and the functions it provides must
-conform to certain policies as follows:
-
-An "init" prepares the device to receive frontswap pages associated
-with the specified swap device number (aka "type"). A "store" will
-copy the page to transcendent memory and associate it with the type and
-offset associated with the page. A "load" will copy the page, if found,
-from transcendent memory into kernel memory, but will NOT remove the page
-from transcendent memory. An "invalidate_page" will remove the page
-from transcendent memory and an "invalidate_area" will remove ALL pages
-associated with the swap type (e.g., like swapoff) and notify the "device"
-to refuse further stores with that swap type.
-
-Once a page is successfully stored, a matching load on the page will normally
-succeed. So when the kernel finds itself in a situation where it needs
-to swap out a page, it first attempts to use frontswap. If the store returns
-success, the data has been successfully saved to transcendent memory and
-a disk write and, if the data is later read back, a disk read are avoided.
-If a store returns failure, transcendent memory has rejected the data, and the
-page can be written to swap as usual.
-
-Note that if a page is stored and the page already exists in transcendent memory
-(a "duplicate" store), either the store succeeds and the data is overwritten,
-or the store fails AND the page is invalidated. This ensures stale data may
-never be obtained from frontswap.
-
-If properly configured, monitoring of frontswap is done via debugfs in
-the `/sys/kernel/debug/frontswap` directory. The effectiveness of
-frontswap can be measured (across all swap devices) with:
-
-``failed_stores``
-	how many store attempts have failed
-
-``loads``
-	how many loads were attempted (all should succeed)
-
-``succ_stores``
-	how many store attempts have succeeded
-
-``invalidates``
-	how many invalidates were attempted
-
-A backend implementation may provide additional metrics.
-
-FAQ
-===
-
-* Where's the value?
-
-When a workload starts swapping, performance falls through the floor.
-Frontswap significantly increases performance in many such workloads by
-providing a clean, dynamic interface to read and write swap pages to
-"transcendent memory" that is otherwise not directly addressable to the kernel.
-This interface is ideal when data is transformed to a different form
-and size (such as with compression) or secretly moved (as might be
-useful for write-balancing for some RAM-like devices). Swap pages (and
-evicted page-cache pages) are a great use for this kind of slower-than-RAM-
-but-much-faster-than-disk "pseudo-RAM device".
-
-Frontswap with a fairly small impact on the kernel,
-provides a huge amount of flexibility for more dynamic, flexible RAM
-utilization in various system configurations:
-
-In the single kernel case, aka "zcache", pages are compressed and
-stored in local memory, thus increasing the total anonymous pages
-that can be safely kept in RAM. Zcache essentially trades off CPU
-cycles used in compression/decompression for better memory utilization.
-Benchmarks have shown little or no impact when memory pressure is
-low while providing a significant performance improvement (25%+)
-on some workloads under high memory pressure.
-
-"RAMster" builds on zcache by adding "peer-to-peer" transcendent memory
-support for clustered systems. Frontswap pages are locally compressed
-as in zcache, but then "remotified" to another system's RAM. This
-allows RAM to be dynamically load-balanced back-and-forth as needed,
-i.e. when system A is overcommitted, it can swap to system B, and
-vice versa. RAMster can also be configured as a memory server so
-many servers in a cluster can swap, dynamically as needed, to a single
-server configured with a large amount of RAM... without pre-configuring
-how much of the RAM is available for each of the clients!
-
-In the virtual case, the whole point of virtualization is to statistically
-multiplex physical resources across the varying demands of multiple
-virtual machines. This is really hard to do with RAM and efforts to do
-it well with no kernel changes have essentially failed (except in some
-well-publicized special-case workloads).
-Specifically, the Xen Transcendent Memory backend allows otherwise
-"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple
-virtual machines, but the pages can be compressed and deduplicated to
-optimize RAM utilization. And when guest OS's are induced to surrender
-underutilized RAM (e.g. with "selfballooning"), sudden unexpected
-memory pressure may result in swapping; frontswap allows those pages
-to be swapped to and from hypervisor RAM (if overall host system memory
-conditions allow), thus mitigating the potentially awful performance impact
-of unplanned swapping.
-
-A KVM implementation is underway and has been RFC'ed to lkml. And,
-using frontswap, investigation is also underway on the use of NVM as
-a memory extension technology.
-
-* Sure there may be performance advantages in some situations, but
-  what's the space/time overhead of frontswap?
-
-If CONFIG_FRONTSWAP is disabled, every frontswap hook compiles into
-nothingness and the only overhead is a few extra bytes per swapon'ed
-swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend"
-registers, there is one extra global variable compared to zero for
-every swap page read or written. If CONFIG_FRONTSWAP is enabled
-AND a frontswap backend registers AND the backend fails every "store"
-request (i.e. provides no memory despite claiming it might),
-CPU overhead is still negligible -- and since every frontswap fail
-precedes a swap page write-to-disk, the system is highly likely
-to be I/O bound and using a small fraction of a percent of a CPU
-will be irrelevant anyway.
-
-As for space, if CONFIG_FRONTSWAP is enabled AND a frontswap backend
-registers, one bit is allocated for every swap page for every swap
-device that is swapon'd. This is added to the EIGHT bits (which
-was sixteen until about 2.6.34) that the kernel already allocates
-for every swap page for every swap device that is swapon'd. (Hugh
-Dickins has observed that frontswap could probably steal one of
-the existing eight bits, but let's worry about that minor optimization
-later.) For very large swap disks (which are rare) on a standard
-4K pagesize, this is 1MB per 32GB swap.
-
-When swap pages are stored in transcendent memory instead of written
-out to disk, there is a side effect that this may create more memory
-pressure that can potentially outweigh the other advantages. A
-backend, such as zcache, must implement policies to carefully (but
-dynamically) manage memory limits to ensure this doesn't happen.
-
-* OK, how about a quick overview of what this frontswap patch does
-  in terms that a kernel hacker can grok?
-
-Let's assume that a frontswap "backend" has registered during
-kernel initialization; this registration indicates that this
-frontswap backend has access to some "memory" that is not directly
-accessible by the kernel. Exactly how much memory it provides is
-entirely dynamic and random.
-
-Whenever a swap-device is swapon'd frontswap_init() is called,
-passing the swap device number (aka "type") as a parameter.
-This notifies frontswap to expect attempts to "store" swap pages
-associated with that number.
-
-Whenever the swap subsystem is readying a page to write to a swap
-device (c.f swap_writepage()), frontswap_store is called. Frontswap
-consults with the frontswap backend and if the backend says it does NOT
-have room, frontswap_store returns -1 and the kernel swaps the page
-to the swap device as normal. Note that the response from the frontswap
-backend is unpredictable to the kernel; it may choose to never accept a
-page, it could accept every ninth page, or it might accept every
-page. But if the backend does accept a page, the data from the page
-has already been copied and associated with the type and offset,
-and the backend guarantees the persistence of the data. In this case,
-frontswap sets a bit in the "frontswap_map" for the swap device
-corresponding to the page offset on the swap device to which it would
-otherwise have written the data.
-
-When the swap subsystem needs to swap-in a page (swap_readpage()),
-it first calls frontswap_load() which checks the frontswap_map to
-see if the page was earlier accepted by the frontswap backend. If
-it was, the page of data is filled from the frontswap backend and
-the swap-in is complete. If not, the normal swap-in code is
-executed to obtain the page of data from the real swap device.
-
-So every time the frontswap backend accepts a page, a swap device read
-and (potentially) a swap device write are replaced by a "frontswap backend
-store" and (possibly) a "frontswap backend loads", which are presumably much
-faster.
-
-* Can't frontswap be configured as a "special" swap device that is
-  just higher priority than any real swap device (e.g. like zswap,
-  or maybe swap-over-nbd/NFS)?
-
-No. First, the existing swap subsystem doesn't allow for any kind of
-swap hierarchy. Perhaps it could be rewritten to accommodate a hierarchy,
-but this would require fairly drastic changes. Even if it were
-rewritten, the existing swap subsystem uses the block I/O layer which
-assumes a swap device is fixed size and any page in it is linearly
-addressable. Frontswap barely touches the existing swap subsystem,
-and works around the constraints of the block I/O subsystem to provide
-a great deal of flexibility and dynamicity.
-
-For example, the acceptance of any swap page by the frontswap backend is
-entirely unpredictable. This is critical to the definition of frontswap
-backends because it grants completely dynamic discretion to the
-backend. In zcache, one cannot know a priori how compressible a page is.
-"Poorly" compressible pages can be rejected, and "poorly" can itself be
-defined dynamically depending on current memory constraints.
-
-Further, frontswap is entirely synchronous whereas a real swap
-device is, by definition, asynchronous and uses block I/O. The
-block I/O layer is not only unnecessary, but may perform "optimizations"
-that are inappropriate for a RAM-oriented device including delaying
-the write of some pages for a significant amount of time. Synchrony is
-required to ensure the dynamicity of the backend and to avoid thorny race
-conditions that would unnecessarily and greatly complicate frontswap
-and/or the block I/O subsystem. That said, only the initial "store"
-and "load" operations need be synchronous. A separate asynchronous thread
-is free to manipulate the pages stored by frontswap. For example,
-the "remotification" thread in RAMster uses standard asynchronous
-kernel sockets to move compressed frontswap pages to a remote machine.
-Similarly, a KVM guest-side implementation could do in-guest compression
-and use "batched" hypercalls.
-
-In a virtualized environment, the dynamicity allows the hypervisor
-(or host OS) to do "intelligent overcommit". For example, it can
-choose to accept pages only until host-swapping might be imminent,
-then force guests to do their own swapping.
-
-There is a downside to the transcendent memory specifications for
-frontswap: Since any "store" might fail, there must always be a real
-slot on a real swap device to swap the page. Thus frontswap must be
-implemented as a "shadow" to every swapon'd device with the potential
-capability of holding every page that the swap device might have held
-and the possibility that it might hold no pages at all. This means
-that frontswap cannot contain more pages than the total of swapon'd
-swap devices. For example, if NO swap device is configured on some
-installation, frontswap is useless. Swapless portable devices
-can still use frontswap but a backend for such devices must configure
-some kind of "ghost" swap device and ensure that it is never used.
-
-* Why this weird definition about "duplicate stores"? If a page
-  has been previously successfully stored, can't it always be
-  successfully overwritten?
-
-Nearly always it can, but no, sometimes it cannot. Consider an example
-where data is compressed and the original 4K page has been compressed
-to 1K. Now an attempt is made to overwrite the page with data that
-is non-compressible and so would take the entire 4K. But the backend
-has no more space. In this case, the store must be rejected. Whenever
-frontswap rejects a store that would overwrite, it also must invalidate
-the old data and ensure that it is no longer accessible. Since the
-swap subsystem then writes the new data to the read swap device,
-this is the correct course of action to ensure coherency.
-
-* Why does the frontswap patch create the new include file swapfile.h?
-
-The frontswap code depends on some swap-subsystem-internal data
-structures that have, over the years, moved back and forth between
-static and global. This seemed a reasonable compromise: Define
-them as global but declare them in a new include file that isn't
-included by the large number of source files that include swap.h.
-
-Dan Magenheimer, last updated April 9, 2012
diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index 5a94a921ea40482648c17bd703ac50aff2b1510d..31d2ac3064387b8b2b709361cb8d6ddc91301e95 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -44,7 +44,6 @@ above structured documentation, or deleted if it has served its purpose.
balance damon/index free_page_reporting - frontswap hmm hwpoison hugetlbfs_reserv diff --git a/Documentation/translations/zh_CN/mm/frontswap.rst b/Documentation/translations/zh_CN/mm/frontswap.rst deleted file mode 100644 index 434975390b480cf1222a5cbfc19a97badb9d0270..0000000000000000000000000000000000000000 --- a/Documentation/translations/zh_CN/mm/frontswap.rst +++ /dev/null @@ -1,196 +0,0 @@ -:Original: Documentation/mm/frontswap.rst - -:翻译: - - å¸å»¶è…¾ Yanteng Si <siyanteng@loongson.cn> - -:æ ¡è¯‘: - -========= -Frontswap -========= - -Frontswap为交æ¢é¡µæ供了一个 “transcendent memory†的接å£ã€‚在一些环境ä¸ï¼Œç”± -于交æ¢é¡µè¢«ä¿å˜åœ¨RAM(或类似RAM的设备)ä¸ï¼Œè€Œä¸æ˜¯äº¤æ¢ç£ç›˜ï¼Œå› æ¤å¯ä»¥èŽ·å¾—巨大的性能 -节çœï¼ˆæ高)。 - -.. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/ - -Frontswap之所以这么命åï¼Œæ˜¯å› ä¸ºå®ƒå¯ä»¥è¢«è®¤ä¸ºæ˜¯ä¸Žswap设备的“backâ€å˜å‚¨ç›¸åã€‚å˜ -储器被认为是一个åŒæ¥å¹¶å‘安全的é¢å‘页é¢çš„“伪RAM设备â€ï¼Œç¬¦åˆtranscendent memory -(如Xen的“tmemâ€ï¼Œæˆ–å†…æ ¸å†…åŽ‹ç¼©å†…å˜ï¼Œåˆç§°â€œzcacheâ€ï¼Œæˆ–未æ¥çš„类似RAMçš„è®¾å¤‡ï¼‰çš„è¦ -求;这个伪RAM设备ä¸èƒ½è¢«å†…æ ¸ç›´æŽ¥è®¿é—®æˆ–å¯»å€ï¼Œå…¶å¤§å°æœªçŸ¥ä¸”å¯èƒ½éšæ—¶é—´å˜åŒ–。驱动程åºé€šè¿‡ -调用frontswap_register_ops将自己与frontswap链接起æ¥ï¼Œä»¥é€‚当地设置frontswap_ops -的功能,它æ供的功能必须符åˆæŸäº›ç–略,如下所示: - -一个 “init†将设备准备好接收与指定的交æ¢è®¾å¤‡ç¼–å·ï¼ˆåˆç§°â€œç±»åž‹â€ï¼‰ç›¸å…³çš„frontswap -交æ¢é¡µã€‚一个 “store†将把该页å¤åˆ¶åˆ°transcendent memory,并与该页的类型和å移 -é‡ç›¸å…³è”。一个 “load†将把该页,如果找到的è¯ï¼Œä»Žtranscendent memoryå¤åˆ¶åˆ°å†…æ ¸ -内å˜ï¼Œä½†ä¸ä¼šä»Žtranscendent memoryä¸åˆ 除该页。一个 “invalidate_page†将从 -transcendent memoryä¸åˆ 除该页,一个 “invalidate_areaâ€ å°†åˆ é™¤æ‰€æœ‰ä¸Žäº¤æ¢ç±»åž‹ -相关的页(例如,åƒswapoff)并通知 “device†拒ç»è¿›ä¸€æ¥å˜å‚¨è¯¥äº¤æ¢ç±»åž‹ã€‚ - -一旦一个页é¢è¢«æˆåŠŸå˜å‚¨ï¼Œåœ¨è¯¥é¡µé¢ä¸Šçš„匹é…åŠ è½½é€šå¸¸ä¼šæˆåŠŸã€‚å› æ¤ï¼Œå½“å†…æ ¸å‘现自己处于需 -è¦äº¤æ¢é¡µé¢çš„情况时,它首先å°è¯•ä½¿ç”¨frontswap。如果å˜å‚¨çš„结果是æˆåŠŸçš„,那么数æ®å°±å·² -ç»æˆåŠŸçš„ä¿å˜åˆ°äº†transcendent memoryä¸ï¼Œå¹¶ä¸”é¿å…了ç£ç›˜å†™å…¥ï¼Œå¦‚æžœåŽæ¥å†è¯»å›žæ•°æ®ï¼Œ -也é¿å…了ç£ç›˜è¯»å–。如果å˜å‚¨è¿”回失败,transcendent 
memoryå·²ç»æ‹’ç»äº†è¯¥æ•°æ®ï¼Œä¸”该页 -å¯ä»¥åƒå¾€å¸¸ä¸€æ ·è¢«å†™å…¥äº¤æ¢ç©ºé—´ã€‚ - -请注æ„,如果一个页é¢è¢«å˜å‚¨ï¼Œè€Œè¯¥é¡µé¢å·²ç»å˜åœ¨äºŽtranscendent memoryä¸ï¼ˆä¸€ä¸ª “é‡å¤â€ -çš„å˜å‚¨ï¼‰ï¼Œè¦ä¹ˆå˜å‚¨æˆåŠŸï¼Œæ•°æ®è¢«è¦†ç›–,è¦ä¹ˆå˜å‚¨å¤±è´¥ï¼Œè¯¥é¡µé¢è¢«åºŸæ¢ã€‚这确ä¿äº†æ—§çš„æ•°æ®æ°¸è¿œ -ä¸ä¼šä»Žfrontswapä¸èŽ·å¾—。 - -如果é…ç½®æ£ç¡®ï¼Œå¯¹frontswap的监控是通过 `/sys/kernel/debug/frontswap` 目录下的 -debugfs完æˆçš„。frontswap的有效性å¯ä»¥é€šè¿‡ä»¥ä¸‹æ–¹å¼æµ‹é‡ï¼ˆåœ¨æ‰€æœ‰äº¤æ¢è®¾å¤‡ä¸ï¼‰: - -``failed_stores`` - 有多少次å˜å‚¨çš„å°è¯•æ˜¯å¤±è´¥çš„ - -``loads`` - å°è¯•äº†å¤šå°‘æ¬¡åŠ è½½ï¼ˆåº”è¯¥å…¨éƒ¨æˆåŠŸï¼‰ - -``succ_stores`` - 有多少次å˜å‚¨çš„å°è¯•æ˜¯æˆåŠŸçš„ - -``invalidates`` - å°è¯•äº†å¤šå°‘次作废 - -åŽå°å®žçŽ°å¯ä»¥æä¾›é¢å¤–çš„æŒ‡æ ‡ã€‚ - -ç»å¸¸é—®åˆ°çš„问题 -============== - -* 价值在哪里? - -当一个工作负载开始交æ¢æ—¶ï¼Œæ€§èƒ½å°±ä¼šä¸‹é™ã€‚Frontswap通过æ供一个干净的ã€åŠ¨æ€çš„接å£æ¥ -读å–和写入交æ¢é¡µåˆ° “transcendent memoryâ€ï¼Œä»Žè€Œå¤§å¤§å¢žåŠ äº†è®¸å¤šè¿™æ ·çš„å·¥ä½œè´Ÿè½½çš„æ€§ -能,å¦åˆ™å†…æ ¸æ˜¯æ— æ³•ç›´æŽ¥å¯»å€çš„。当数æ®è¢«è½¬æ¢ä¸ºä¸åŒçš„å½¢å¼å’Œå¤§å°ï¼ˆæ¯”如压缩)或者被秘密 -移动(对于一些类似RAM的设备æ¥è¯´ï¼Œè¿™å¯èƒ½å¯¹å†™å¹³è¡¡å¾ˆæœ‰ç”¨ï¼‰æ—¶ï¼Œè¿™ä¸ªæŽ¥å£æ˜¯ç†æƒ³çš„ã€‚äº¤æ¢ -页(和被驱é€çš„页é¢ç¼“å˜é¡µï¼‰æ˜¯è¿™ç§æ¯”RAM慢但比ç£ç›˜å¿«å¾—多的“伪RAM设备â€çš„一大用途。 - -Frontswapå¯¹å†…æ ¸çš„å½±å“相当å°ï¼Œä¸ºå„ç§ç³»ç»Ÿé…ç½®ä¸æ›´åŠ¨æ€ã€æ›´çµæ´»çš„RAM利用æ供了巨大的 -çµæ´»æ€§ï¼š - -在å•ä¸€å†…æ ¸çš„æƒ…å†µä¸‹ï¼Œåˆç§°â€œzcacheâ€ï¼Œé¡µé¢è¢«åŽ‹ç¼©å¹¶å˜å‚¨åœ¨æœ¬åœ°å†…å˜ä¸ï¼Œä»Žè€Œå¢žåŠ 了å¯ä»¥å®‰ -å…¨ä¿å˜åœ¨RAMä¸çš„匿å页é¢æ€»æ•°ã€‚Zcache本质上是用压缩/解压缩的CPU周期æ¢å–更好的内å˜åˆ© -用率。Benchmarks测试显示,当内å˜åŽ‹åŠ›è¾ƒä½Žæ—¶ï¼Œå‡ 乎没有影å“,而在高内å˜åŽ‹åŠ›ä¸‹çš„一些 -工作负载上,则有明显的性能改善(25%以上)。 - -“RAMster†在zcacheçš„åŸºç¡€ä¸Šå¢žåŠ äº†å¯¹é›†ç¾¤ç³»ç»Ÿçš„ “peer-to-peer†transcendent memory -的支æŒã€‚Frontswap页é¢åƒzcacheä¸€æ ·è¢«æœ¬åœ°åŽ‹ç¼©ï¼Œä½†éšåŽè¢«â€œremotified†到å¦ä¸€ä¸ªç³» -统的RAM。这使得RAMå¯ä»¥æ ¹æ®éœ€è¦åŠ¨æ€åœ°æ¥å›žè´Ÿè½½å¹³è¡¡ï¼Œä¹Ÿå°±æ˜¯è¯´ï¼Œå½“系统A超载时,它å¯ä»¥ -交æ¢åˆ°ç³»ç»ŸB,å之亦然。RAMster也å¯ä»¥è¢«é…ç½®æˆä¸€ä¸ªå†…å˜æœåŠ¡å™¨ï¼Œå› æ¤é›†ç¾¤ä¸çš„许多æœåŠ¡å™¨ -å¯ä»¥æ 
¹æ®éœ€è¦åŠ¨æ€åœ°äº¤æ¢åˆ°é…置有大é‡å†…å˜çš„å•ä¸€æœåŠ¡å™¨ä¸Š......而ä¸éœ€è¦é¢„å…ˆé…ç½®æ¯ä¸ªå®¢æˆ· -有多少内å˜å¯ç”¨ - -在虚拟情况下,虚拟化的全部æ„义在于统计地将物ç†èµ„æºåœ¨å¤šä¸ªè™šæ‹Ÿæœºçš„ä¸åŒéœ€æ±‚ä¹‹é—´è¿›è¡Œå¤ -用。对于RAMæ¥è¯´ï¼Œè¿™çœŸçš„很难åšåˆ°ï¼Œè€Œä¸”在ä¸æ”¹å˜å†…æ ¸çš„æƒ…å†µä¸‹ï¼Œè¦åšå¥½è¿™ä¸€ç‚¹çš„努力基本上 -是失败的(除了一些广为人知的特殊情况下的工作负载)。具体æ¥è¯´ï¼ŒXen Transcendent Memory -åŽç«¯å…许管ç†å™¨æ‹¥æœ‰çš„RAM “fallowâ€ï¼Œä¸ä»…å¯ä»¥åœ¨å¤šä¸ªè™šæ‹Ÿæœºä¹‹é—´è¿›è¡Œâ€œtime-sharedâ€ï¼Œ -而且页é¢å¯ä»¥è¢«åŽ‹ç¼©å’Œé‡å¤åˆ©ç”¨ï¼Œä»¥ä¼˜åŒ–RAM的利用率。当客户æ“作系统被诱导交出未充分利用 -çš„RAM时(如 “selfballooningâ€ï¼‰ï¼Œçªç„¶å‡ºçŽ°çš„æ„外内å˜åŽ‹åŠ›å¯èƒ½ä¼šå¯¼è‡´äº¤æ¢ï¼›frontswap -å…许这些页é¢è¢«äº¤æ¢åˆ°ç®¡ç†å™¨RAMä¸æˆ–从管ç†å™¨RAMä¸äº¤æ¢ï¼ˆå¦‚果整体主机系统内å˜æ¡ä»¶å…许), -从而å‡è½»è®¡åˆ’外交æ¢å¯èƒ½å¸¦æ¥çš„å¯æ€•çš„性能影å“。 - -一个KVM的实现æ£åœ¨è¿›è¡Œä¸ï¼Œå¹¶ä¸”å·²ç»è¢«RFC'ed到lkml。而且,利用frontswap,对NVM作为 -内å˜æ‰©å±•æŠ€æœ¯çš„调查也在进行ä¸ã€‚ - -* 当然,在æŸäº›æƒ…况下å¯èƒ½æœ‰æ€§èƒ½ä¸Šçš„优势,但frontswap的空间/时间开销是多少? - -如果 CONFIG_FRONTSWAP 被ç¦ç”¨ï¼Œæ¯ä¸ª frontswap é’©å都会编译æˆç©ºï¼Œå”¯ä¸€çš„å¼€é”€æ˜¯æ¯ -个 swapon'ed swap è®¾å¤‡çš„å‡ ä¸ªé¢å¤–å—节。如果 CONFIG_FRONTSWAP 被å¯ç”¨ï¼Œä½†æ²¡æœ‰ -frontswapçš„ “backend†寄å˜å™¨ï¼Œæ¯è¯»æˆ–写一个交æ¢é¡µå°±ä¼šæœ‰ä¸€ä¸ªé¢å¤–的全局å˜é‡ï¼Œè€Œä¸ -是零。如果 CONFIG_FRONTSWAP 被å¯ç”¨ï¼Œå¹¶ä¸”有一个frontswapçš„backend寄å˜å™¨ï¼Œå¹¶ä¸” -åŽç«¯æ¯æ¬¡ “store†请求都失败(å³å°½ç®¡å£°ç§°å¯èƒ½ï¼Œä½†æ²¡æœ‰æ供内å˜ï¼‰ï¼ŒCPU 的开销ä»ç„¶å¯ä»¥ -忽略ä¸è®¡ - å› ä¸ºæ¯æ¬¡frontswap失败都是在交æ¢é¡µå†™åˆ°ç£ç›˜ä¹‹å‰ï¼Œç³»ç»Ÿå¾ˆå¯èƒ½æ˜¯ I/O 绑定 -çš„ï¼Œæ— è®ºå¦‚ä½•ä½¿ç”¨ä¸€å°éƒ¨åˆ†çš„ CPU 都是ä¸ç›¸å…³çš„。 - -至于空间,如果CONFIG_FRONTSWAP被å¯ç”¨ï¼Œå¹¶ä¸”有一个frontswapçš„backend注册,那么 -æ¯ä¸ªäº¤æ¢è®¾å¤‡çš„æ¯ä¸ªäº¤æ¢é¡µéƒ½ä¼šè¢«åˆ†é…ä¸€ä¸ªæ¯”ç‰¹ã€‚è¿™æ˜¯åœ¨å†…æ ¸å·²ç»ä¸ºæ¯ä¸ªäº¤æ¢è®¾å¤‡çš„æ¯ä¸ªäº¤æ¢ -页分é…çš„8ä½ï¼ˆåœ¨2.6.34之å‰æ˜¯16ä½ï¼‰ä¸Šå¢žåŠ 的。(Hugh Dickins观察到,frontswapå¯èƒ½ -会å·å–现有的8个比特,但是我们以åŽå†æ¥æ‹…心这个å°çš„优化问题)ã€‚å¯¹äºŽæ ‡å‡†çš„4K页é¢å¤§å°çš„ -éžå¸¸å¤§çš„交æ¢ç›˜ï¼ˆè¿™å¾ˆç½•è§ï¼‰ï¼Œè¿™æ˜¯æ¯32GB交æ¢ç›˜1MB开销。 - -当交æ¢é¡µå˜å‚¨åœ¨transcendent memoryä¸è€Œä¸æ˜¯å†™åˆ°ç£ç›˜ä¸Šæ—¶ï¼Œæœ‰ä¸€ä¸ªå‰¯ä½œç”¨ï¼Œå³è¿™å¯èƒ½ä¼š 
-产生更多的内å˜åŽ‹åŠ›ï¼Œæœ‰å¯èƒ½è¶…过其他的优点。一个backend,比如zcache,必须实现ç–ç•¥ -æ¥ä»”细(但动æ€åœ°ï¼‰ç®¡ç†å†…å˜é™åˆ¶ï¼Œä»¥ç¡®ä¿è¿™ç§æƒ…况ä¸ä¼šå‘生。 - -* 好å§ï¼Œé‚£å°±ç”¨å†…æ ¸éª‡å®¢èƒ½ç†è§£çš„术è¯æ¥å¿«é€Ÿæ¦‚述一下这个frontswapè¡¥ä¸çš„作用如何? - -我们å‡è®¾åœ¨å†…æ ¸åˆå§‹åŒ–过程ä¸ï¼Œä¸€ä¸ªfrontswap çš„ “backend†已ç»æ³¨å†Œäº†ï¼›è¿™ä¸ªæ³¨å†Œè¡¨ -明这个frontswap çš„ “backend†å¯ä»¥è®¿é—®ä¸€äº›ä¸è¢«å†…æ ¸ç›´æŽ¥è®¿é—®çš„â€œå†…å˜â€ã€‚它到底æ -供了多少内å˜æ˜¯å®Œå…¨åŠ¨æ€å’Œéšæœºçš„。 - -æ¯å½“一个交æ¢è®¾å¤‡è¢«äº¤æ¢æ—¶ï¼Œå°±ä¼šè°ƒç”¨frontswap_init(),把交æ¢è®¾å¤‡çš„ç¼–å·ï¼ˆåˆç§°â€œç±» -åž‹â€ï¼‰ä½œä¸ºä¸€ä¸ªå‚æ•°ä¼ ç»™å®ƒã€‚è¿™å°±é€šçŸ¥äº†frontswap,以期待 “store†与该å·ç 相关的交 -æ¢é¡µçš„å°è¯•ã€‚ - -æ¯å½“交æ¢å系统准备将一个页é¢å†™å…¥äº¤æ¢è®¾å¤‡æ—¶ï¼ˆå‚è§swap_writepage()),就会调用 -frontswap_store。Frontswap与frontswap backendå商,如果backend说它没有空 -间,frontswap_store返回-1ï¼Œå†…æ ¸å°±ä¼šç…§å¸¸æŠŠé¡µæ¢åˆ°äº¤æ¢è®¾å¤‡ä¸Šã€‚注æ„,æ¥è‡ªfrontswap -backendçš„å“åº”å¯¹å†…æ ¸æ¥è¯´æ˜¯ä¸å¯é¢„测的;它å¯èƒ½é€‰æ‹©ä»Žä¸æŽ¥å—一个页é¢ï¼Œå¯èƒ½æŽ¥å—æ¯ä¹ä¸ª -页é¢ï¼Œä¹Ÿå¯èƒ½æŽ¥å—æ¯ä¸€ä¸ªé¡µé¢ã€‚但是如果backend确实接å—了一个页é¢ï¼Œé‚£ä¹ˆè¿™ä¸ªé¡µé¢çš„æ•° -æ®å·²ç»è¢«å¤åˆ¶å¹¶ä¸Žç±»åž‹å’Œå移é‡ç›¸å…³è”了,而且backendä¿è¯äº†æ•°æ®çš„æŒä¹…性。在这ç§æƒ…况 -下,frontswap在交æ¢è®¾å¤‡çš„“frontswap_map†ä¸è®¾ç½®äº†ä¸€ä¸ªä½ï¼Œå¯¹åº”于交æ¢è®¾å¤‡ä¸Šçš„ -页é¢å移é‡ï¼Œå¦åˆ™å®ƒå°±ä¼šå°†æ•°æ®å†™å…¥è¯¥è®¾å¤‡ã€‚ - -当交æ¢å系统需è¦äº¤æ¢ä¸€ä¸ªé¡µé¢æ—¶ï¼ˆswap_readpage()),它首先调用frontswap_load(), -检查frontswap_map,看这个页é¢æ˜¯å¦æ—©å…ˆè¢«frontswap backend接å—。如果是,该页 -çš„æ•°æ®å°±ä¼šä»ŽfrontswapåŽç«¯å¡«å……,æ¢å…¥å°±å®Œæˆäº†ã€‚如果ä¸æ˜¯ï¼Œæ£å¸¸çš„交æ¢ä»£ç 将被执行, -以便从真æ£çš„交æ¢è®¾å¤‡ä¸ŠèŽ·å¾—这一页的数æ®ã€‚ - -所以æ¯æ¬¡frontswap backend接å—一个页é¢æ—¶ï¼Œäº¤æ¢è®¾å¤‡çš„读å–和(å¯èƒ½ï¼‰äº¤æ¢è®¾å¤‡çš„写 -入都被 “frontswap backend store†和(å¯èƒ½ï¼‰â€œfrontswap backend loads†-所å–代,这å¯èƒ½ä¼šå¿«å¾—多。 - -* frontswapä¸èƒ½è¢«é…置为一个 “特殊的†交æ¢è®¾å¤‡ï¼Œå®ƒçš„优先级è¦é«˜äºŽä»»ä½•çœŸæ£çš„äº¤æ¢ - 设备(例如åƒzswap,或者å¯èƒ½æ˜¯swap-over-nbd/NFS)? 
- -首先,现有的交æ¢å系统ä¸å…许有任何ç§ç±»çš„交æ¢å±‚次结构。也许它å¯ä»¥è¢«é‡å†™ä»¥é€‚应层次 -结构,但这将需è¦ç›¸å½“大的改å˜ã€‚å³ä½¿å®ƒè¢«é‡å†™ï¼ŒçŽ°æœ‰çš„交æ¢å系统也使用了å—I/O层,它 -å‡å®šäº¤æ¢è®¾å¤‡æ˜¯å›ºå®šå¤§å°çš„,其ä¸çš„任何页é¢éƒ½æ˜¯å¯çº¿æ€§å¯»å€çš„。Frontswapå‡ ä¹Žæ²¡æœ‰è§¦ -åŠçŽ°æœ‰çš„交æ¢å系统,而是围绕ç€å—I/Oå系统的é™åˆ¶ï¼Œæ供了大é‡çš„çµæ´»æ€§å’ŒåŠ¨æ€æ€§ã€‚ - -例如,frontswap backend对任何交æ¢é¡µçš„接å—是完全ä¸å¯é¢„测的。这对frontswap backend -的定义至关é‡è¦ï¼Œå› 为它赋予了backend完全动æ€çš„决定æƒã€‚在zcacheä¸ï¼Œäººä»¬æ— 法预 -先知é“一个页é¢çš„å¯åŽ‹ç¼©æ€§å¦‚何。å¯åŽ‹ç¼©æ€§ “差†的页é¢ä¼šè¢«æ‹’ç»ï¼Œè€Œ â€œå·®â€ æœ¬èº«ä¹Ÿå¯ -ä»¥æ ¹æ®å½“å‰çš„内å˜é™åˆ¶åŠ¨æ€åœ°å®šä¹‰ã€‚ - -æ¤å¤–,frontswap是完全åŒæ¥çš„,而真æ£çš„交æ¢è®¾å¤‡ï¼Œæ ¹æ®å®šä¹‰ï¼Œæ˜¯å¼‚æ¥çš„,并且使用 -å—I/O。å—I/O层ä¸ä»…是ä¸å¿…è¦çš„,而且å¯èƒ½è¿›è¡Œ “优化â€ï¼Œè¿™å¯¹é¢å‘RAM的设备æ¥è¯´æ˜¯ -ä¸åˆé€‚的,包括将一些页é¢çš„写入延迟相当长的时间。åŒæ¥æ˜¯å¿…须的,以确ä¿åŽç«¯çš„动 -æ€æ€§ï¼Œå¹¶é¿å…棘手的竞争æ¡ä»¶ï¼Œè¿™å°†ä¸å¿…è¦åœ°å¤§å¤§å¢žåŠ frontswapå’Œ/或å—I/Oå系统的 -å¤æ‚性。也就是说,åªæœ‰æœ€åˆçš„ “store†和 “load†æ“作是需è¦åŒæ¥çš„。一个独立 -的异æ¥çº¿ç¨‹å¯ä»¥è‡ªç”±åœ°æ“作由frontswapå˜å‚¨çš„页é¢ã€‚例如,RAMsterä¸çš„ “remotification†-çº¿ç¨‹ä½¿ç”¨æ ‡å‡†çš„å¼‚æ¥å†…æ ¸å¥—æŽ¥å—,将压缩的frontswap页é¢ç§»åŠ¨åˆ°è¿œç¨‹æœºå™¨ã€‚åŒæ ·ï¼Œ -KVM的客户方实现å¯ä»¥è¿›è¡Œå®¢æˆ·å†…压缩,并使用 “batched†hypercalls。 - -在虚拟化环境ä¸ï¼ŒåŠ¨æ€æ€§å…许管ç†ç¨‹åºï¼ˆæˆ–主机æ“作系统)åšâ€œintelligent overcommitâ€ã€‚ -例如,它å¯ä»¥é€‰æ‹©åªæŽ¥å—页é¢ï¼Œç›´åˆ°ä¸»æœºäº¤æ¢å¯èƒ½å³å°†å‘生,然åŽå¼ºè¿«å®¢æˆ·æœºåšä»–们 -自己的交æ¢ã€‚ - -transcendent memoryè§„æ ¼çš„frontswap有一个åå¤„ã€‚å› ä¸ºä»»ä½• “storeâ€ éƒ½å¯ -能失败,所以必须在一个真æ£çš„交æ¢è®¾å¤‡ä¸Šæœ‰ä¸€ä¸ªçœŸæ£çš„æ’槽æ¥äº¤æ¢é¡µé¢ã€‚å› æ¤ï¼Œ -frontswap必须作为æ¯ä¸ªäº¤æ¢è®¾å¤‡çš„ “影å†æ¥å®žçŽ°ï¼Œå®ƒæœ‰å¯èƒ½å®¹çº³äº¤æ¢è®¾å¤‡å¯èƒ½ -容纳的æ¯ä¸€ä¸ªé¡µé¢ï¼Œä¹Ÿæœ‰å¯èƒ½æ ¹æœ¬ä¸å®¹çº³ä»»ä½•é¡µé¢ã€‚è¿™æ„味ç€frontswapä¸èƒ½åŒ…å«æ¯” -swap设备总数更多的页é¢ã€‚例如,如果在æŸäº›å®‰è£…上没有é…置交æ¢è®¾å¤‡ï¼Œfrontswap -å°±æ²¡æœ‰ç”¨ã€‚æ— äº¤æ¢è®¾å¤‡çš„便æºå¼è®¾å¤‡ä»ç„¶å¯ä»¥ä½¿ç”¨frontswap,但是这ç§è®¾å¤‡çš„ -backendå¿…é¡»é…ç½®æŸç§ “ghost†交æ¢è®¾å¤‡ï¼Œå¹¶ç¡®ä¿å®ƒæ°¸è¿œä¸ä¼šè¢«ä½¿ç”¨ã€‚ - - -* 为什么会有这ç§å…³äºŽ “é‡å¤å˜å‚¨â€ 的奇怪定义?如果一个页é¢ä»¥å‰è¢«æˆåŠŸåœ°å˜å‚¨è¿‡ï¼Œ - 
éš¾é“它ä¸èƒ½æ€»æ˜¯è¢«æˆåŠŸåœ°è¦†ç›–å—? - -å‡ ä¹Žæ€»æ˜¯å¯ä»¥çš„,ä¸ï¼Œæœ‰æ—¶ä¸èƒ½ã€‚考虑一个例å,数æ®è¢«åŽ‹ç¼©äº†ï¼ŒåŽŸæ¥çš„4K页é¢è¢«åŽ‹ -缩到了1K。现在,有人试图用ä¸å¯åŽ‹ç¼©çš„æ•°æ®è¦†ç›–è¯¥é¡µï¼Œå› æ¤ä¼šå 用整个4K。但是 -backend没有更多的空间了。在这ç§æƒ…况下,这个å˜å‚¨å¿…须被拒ç»ã€‚æ¯å½“frontswap -æ‹’ç»ä¸€ä¸ªä¼šè¦†ç›–çš„å˜å‚¨æ—¶ï¼Œå®ƒä¹Ÿå¿…须使旧的数æ®ä½œåºŸï¼Œå¹¶ç¡®ä¿å®ƒä¸å†è¢«è®¿é—®ã€‚å› ä¸ºäº¤ -æ¢å系统会把新的数æ®å†™åˆ°è¯»äº¤æ¢è®¾å¤‡ä¸Šï¼Œè¿™æ˜¯ç¡®ä¿ä¸€è‡´æ€§çš„æ£ç¡®åšæ³•ã€‚ - -* 为什么frontswapè¡¥ä¸ä¼šåˆ›å»ºæ–°çš„头文件swapfile.h? - -frontswap代ç ä¾èµ–于一些swapå系统内部的数æ®ç»“构,这些数æ®ç»“构多年æ¥ä¸€ç›´ -在é™æ€å’Œå…¨å±€ä¹‹é—´æ¥å›žç§»åŠ¨ã€‚这似乎是一个åˆç†çš„妥å:将它们定义为全局,但在一 -个新的包å«æ–‡ä»¶ä¸å£°æ˜Žå®ƒä»¬ï¼Œè¯¥æ–‡ä»¶ä¸è¢«åŒ…å«swap.h的大é‡æºæ–‡ä»¶æ‰€åŒ…å«ã€‚ - -Dan Magenheimer,最åŽæ›´æ–°äºŽ2012å¹´4月9æ—¥ diff --git a/Documentation/translations/zh_CN/mm/index.rst b/Documentation/translations/zh_CN/mm/index.rst index 2f53e37b80497f3174438e2d3ad548d0bdaef51c..b950dd118be73e63587f9eb44b0d9ad7164bc956 100644 --- a/Documentation/translations/zh_CN/mm/index.rst +++ b/Documentation/translations/zh_CN/mm/index.rst @@ -42,7 +42,6 @@ Linux内å˜ç®¡ç†æ–‡æ¡£ damon/index free_page_reporting ksm - frontswap hmm hwpoison hugetlbfs_reserv diff --git a/MAINTAINERS b/MAINTAINERS index 9e4cfcd7998a01e02e95c7f35fc90b9d00ae6e86..9f0179682d91ce3c8c48254831c563b5c48fbf79 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8404,13 +8404,6 @@ F: Documentation/power/freezing-of-tasks.rst F: include/linux/freezer.h F: kernel/freezer.c -FRONTSWAP API -M: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> -L: linux-kernel@vger.kernel.org -S: Maintained -F: include/linux/frontswap.h -F: mm/frontswap.c - FS-CACHE: LOCAL CACHING FOR NETWORK FILESYSTEMS M: David Howells <dhowells@redhat.com> L: linux-cachefs@redhat.com (moderated for non-subscribers) diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c index 8dca4d6d96c7c7c6fe52e78249b964d9839d5564..74e3c3815696a6129640c91876c4b1195799d178 100644 --- a/fs/proc/meminfo.c +++ b/fs/proc/meminfo.c @@ -17,6 +17,7 @@ #ifdef CONFIG_CMA #include 
<linux/cma.h>
 #endif
+#include <linux/zswap.h>
 #include <asm/page.h>
 #include "internal.h"
 
diff --git a/include/linux/frontswap.h b/include/linux/frontswap.h
deleted file mode 100644
index eaa0ac5f9003035629a25bdd76bf493a276179ea..0000000000000000000000000000000000000000
--- a/include/linux/frontswap.h
+++ /dev/null
@@ -1,91 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_FRONTSWAP_H
-#define _LINUX_FRONTSWAP_H
-
-#include <linux/swap.h>
-#include <linux/mm.h>
-#include <linux/bitops.h>
-#include <linux/jump_label.h>
-
-struct frontswap_ops {
-	void (*init)(unsigned); /* this swap type was just swapon'ed */
-	int (*store)(unsigned, pgoff_t, struct page *); /* store a page */
-	int (*load)(unsigned, pgoff_t, struct page *, bool *); /* load a page */
-	void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */
-	void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */
-};
-
-int frontswap_register_ops(const struct frontswap_ops *ops);
-
-extern void frontswap_init(unsigned type, unsigned long *map);
-extern int __frontswap_store(struct page *page);
-extern int __frontswap_load(struct page *page);
-extern void __frontswap_invalidate_page(unsigned, pgoff_t);
-extern void __frontswap_invalidate_area(unsigned);
-
-#ifdef CONFIG_FRONTSWAP
-extern struct static_key_false frontswap_enabled_key;
-
-static inline bool frontswap_enabled(void)
-{
-	return static_branch_unlikely(&frontswap_enabled_key);
-}
-
-static inline void frontswap_map_set(struct swap_info_struct *p,
-				     unsigned long *map)
-{
-	p->frontswap_map = map;
-}
-
-static inline unsigned long *frontswap_map_get(struct swap_info_struct *p)
-{
-	return p->frontswap_map;
-}
-#else
-/* all inline routines become no-ops and all externs are ignored */
-
-static inline bool frontswap_enabled(void)
-{
-	return false;
-}
-
-static inline void frontswap_map_set(struct swap_info_struct *p,
-				     unsigned long *map)
-{
-}
-
-static inline unsigned long *frontswap_map_get(struct swap_info_struct *p)
-{
-	return NULL;
-}
-#endif
-
-static inline int frontswap_store(struct page *page)
-{
-	if (frontswap_enabled())
-		return __frontswap_store(page);
-
-	return -1;
-}
-
-static inline int frontswap_load(struct page *page)
-{
-	if (frontswap_enabled())
-		return __frontswap_load(page);
-
-	return -1;
-}
-
-static inline void frontswap_invalidate_page(unsigned type, pgoff_t offset)
-{
-	if (frontswap_enabled())
-		__frontswap_invalidate_page(type, offset);
-}
-
-static inline void frontswap_invalidate_area(unsigned type)
-{
-	if (frontswap_enabled())
-		__frontswap_invalidate_area(type);
-}
-
-#endif /* _LINUX_FRONTSWAP_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 456546443f1f3041ed8e50b325922523d14e7a16..bb5adc6041448a15f834b3e4fd8919e3731ef357 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -302,10 +302,6 @@ struct swap_info_struct {
 	struct file *swap_file; /* seldom referenced */
 	unsigned int old_block_size; /* seldom referenced */
 	struct completion comp; /* seldom referenced */
-#ifdef CONFIG_FRONTSWAP
-	unsigned long *frontswap_map; /* frontswap in-use, one bit per page */
-	atomic_t frontswap_pages; /* frontswap pages in-use counter */
-#endif
 	spinlock_t lock; /*
 					 * protect map scan related fields like
 					 * swap_map, lowest_bit, highest_bit,
@@ -630,11 +626,6 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
 }
 #endif
 
-#ifdef CONFIG_ZSWAP
-extern u64 zswap_pool_total_size;
-extern atomic_t zswap_stored_pages;
-#endif
-
 #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP)
 void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp);
 static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h
index 7ed529a77c5b368bb105a0b4c36cdfed3605ea0b..99e3ed469e8877b5983d52827d3d3bd514e18880 100644
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -2,11 +2,6 @@
 #ifndef _LINUX_SWAPFILE_H
 #define _LINUX_SWAPFILE_H
 
-/*
- * these were static in swapfile.c but frontswap.c needs them and we don't
- * want to expose them to the dozens of source files that include swap.h
- */
-extern struct swap_info_struct *swap_info[];
 extern unsigned long generic_max_swapfile_size(void);
 unsigned long arch_max_swapfile_size(void);
 
diff --git a/include/linux/zswap.h b/include/linux/zswap.h
new file mode 100644
index 0000000000000000000000000000000000000000..850c377d9b6df8fcbc15eabd5312336b07e0dffa
--- /dev/null
+++ b/include/linux/zswap.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_ZSWAP_H
+#define _LINUX_ZSWAP_H
+
+#include <linux/types.h>
+#include <linux/mm_types.h>
+
+extern u64 zswap_pool_total_size;
+extern atomic_t zswap_stored_pages;
+
+#ifdef CONFIG_ZSWAP
+
+bool zswap_store(struct page *page);
+bool zswap_load(struct page *page);
+void zswap_invalidate(int type, pgoff_t offset);
+void zswap_swapon(int type);
+void zswap_swapoff(int type);
+
+#else
+
+static inline bool zswap_store(struct page *page)
+{
+	return false;
+}
+
+static inline bool zswap_load(struct page *page)
+{
+	return false;
+}
+
+static inline void zswap_invalidate(int type, pgoff_t offset) {}
+static inline void zswap_swapon(int type) {}
+static inline void zswap_swapoff(int type) {}
+
+#endif
+
+#endif /* _LINUX_ZSWAP_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1959d048bbf560319fa401c098884aed5d6bb6ac..5fe49c030961ec469a98f4851194d617f542de80 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -25,7 +25,6 @@ menuconfig SWAP
 config ZSWAP
 	bool "Compressed cache for swap pages"
 	depends on SWAP
-	select FRONTSWAP
 	select CRYPTO
 	select ZPOOL
 	help
@@ -873,9 +872,6 @@ config USE_PERCPU_NUMA_NODE_ID
 config HAVE_SETUP_PER_CPU_AREA
 	bool
 
-config FRONTSWAP
-	bool
-
 config CMA
 	bool "Contiguous Memory Allocator"
 	depends on MMU
diff --git a/mm/Makefile b/mm/Makefile
index 678530a073261f1af59384b857eb915ee2029cfa..e6d9a1d5e84df15d5c5a7415d009b2e318a74fa8 100644
--- 
a/mm/Makefile +++ b/mm/Makefile @@ -72,7 +72,6 @@ ifdef CONFIG_MMU endif obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o swap_slots.o -obj-$(CONFIG_FRONTSWAP) += frontswap.o obj-$(CONFIG_ZSWAP) += zswap.o obj-$(CONFIG_HAS_DMA) += dmapool.o obj-$(CONFIG_HUGETLBFS) += hugetlb.o diff --git a/mm/frontswap.c b/mm/frontswap.c deleted file mode 100644 index 2fb5df3384b8ebc9b1ceab96c3778e0b1bd8fd43..0000000000000000000000000000000000000000 --- a/mm/frontswap.c +++ /dev/null @@ -1,283 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-only -/* - * Frontswap frontend - * - * This code provides the generic "frontend" layer to call a matching - * "backend" driver implementation of frontswap. See - * Documentation/mm/frontswap.rst for more information. - * - * Copyright (C) 2009-2012 Oracle Corp. All rights reserved. - * Author: Dan Magenheimer - */ - -#include <linux/mman.h> -#include <linux/swap.h> -#include <linux/swapops.h> -#include <linux/security.h> -#include <linux/module.h> -#include <linux/debugfs.h> -#include <linux/frontswap.h> -#include <linux/swapfile.h> - -DEFINE_STATIC_KEY_FALSE(frontswap_enabled_key); - -/* - * frontswap_ops are added by frontswap_register_ops, and provide the - * frontswap "backend" implementation functions. Multiple implementations - * may be registered, but implementations can never deregister. This - * is a simple singly-linked list of all registered implementations. - */ -static const struct frontswap_ops *frontswap_ops __read_mostly; - -#ifdef CONFIG_DEBUG_FS -/* - * Counters available via /sys/kernel/debug/frontswap (if debugfs is - * properly configured). These are for information only so are not protected - * against increment races. 
- */ -static u64 frontswap_loads; -static u64 frontswap_succ_stores; -static u64 frontswap_failed_stores; -static u64 frontswap_invalidates; - -static inline void inc_frontswap_loads(void) -{ - data_race(frontswap_loads++); -} -static inline void inc_frontswap_succ_stores(void) -{ - data_race(frontswap_succ_stores++); -} -static inline void inc_frontswap_failed_stores(void) -{ - data_race(frontswap_failed_stores++); -} -static inline void inc_frontswap_invalidates(void) -{ - data_race(frontswap_invalidates++); -} -#else -static inline void inc_frontswap_loads(void) { } -static inline void inc_frontswap_succ_stores(void) { } -static inline void inc_frontswap_failed_stores(void) { } -static inline void inc_frontswap_invalidates(void) { } -#endif - -/* - * Due to the asynchronous nature of the backends loading potentially - * _after_ the swap system has been activated, we have chokepoints - * on all frontswap functions to not call the backend until the backend - * has registered. - * - * This would not guards us against the user deciding to call swapoff right as - * we are calling the backend to initialize (so swapon is in action). - * Fortunately for us, the swapon_mutex has been taken by the callee so we are - * OK. The other scenario where calls to frontswap_store (called via - * swap_writepage) is racing with frontswap_invalidate_area (called via - * swapoff) is again guarded by the swap subsystem. - * - * While no backend is registered all calls to frontswap_[store|load| - * invalidate_area|invalidate_page] are ignored or fail. - * - * The time between the backend being registered and the swap file system - * calling the backend (via the frontswap_* functions) is indeterminate as - * frontswap_ops is not atomic_t (or a value guarded by a spinlock). - * That is OK as we are comfortable missing some of these calls to the newly - * registered backend. 
- * - * Obviously the opposite (unloading the backend) must be done after all - * the frontswap_[store|load|invalidate_area|invalidate_page] start - * ignoring or failing the requests. However, there is currently no way - * to unload a backend once it is registered. - */ - -/* - * Register operations for frontswap - */ -int frontswap_register_ops(const struct frontswap_ops *ops) -{ - if (frontswap_ops) - return -EINVAL; - - frontswap_ops = ops; - static_branch_inc(&frontswap_enabled_key); - return 0; -} - -/* - * Called when a swap device is swapon'd. - */ -void frontswap_init(unsigned type, unsigned long *map) -{ - struct swap_info_struct *sis = swap_info[type]; - - VM_BUG_ON(sis == NULL); - - /* - * p->frontswap is a bitmap that we MUST have to figure out which page - * has gone in frontswap. Without it there is no point of continuing. - */ - if (WARN_ON(!map)) - return; - /* - * Irregardless of whether the frontswap backend has been loaded - * before this function or it will be later, we _MUST_ have the - * p->frontswap set to something valid to work properly. - */ - frontswap_map_set(sis, map); - - if (!frontswap_enabled()) - return; - frontswap_ops->init(type); -} - -static bool __frontswap_test(struct swap_info_struct *sis, - pgoff_t offset) -{ - if (sis->frontswap_map) - return test_bit(offset, sis->frontswap_map); - return false; -} - -static inline void __frontswap_set(struct swap_info_struct *sis, - pgoff_t offset) -{ - set_bit(offset, sis->frontswap_map); - atomic_inc(&sis->frontswap_pages); -} - -static inline void __frontswap_clear(struct swap_info_struct *sis, - pgoff_t offset) -{ - clear_bit(offset, sis->frontswap_map); - atomic_dec(&sis->frontswap_pages); -} - -/* - * "Store" data from a page to frontswap and associate it with the page's - * swaptype and offset. Page must be locked and in the swap cache. 
- * If frontswap already contains a page with matching swaptype and - * offset, the frontswap implementation may either overwrite the data and - * return success or invalidate the page from frontswap and return failure. - */ -int __frontswap_store(struct page *page) -{ - int ret = -1; - swp_entry_t entry = { .val = page_private(page), }; - int type = swp_type(entry); - struct swap_info_struct *sis = swap_info[type]; - pgoff_t offset = swp_offset(entry); - - VM_BUG_ON(!frontswap_ops); - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(sis == NULL); - - /* - * If a dup, we must remove the old page first; we can't leave the - * old page no matter if the store of the new page succeeds or fails, - * and we can't rely on the new page replacing the old page as we may - * not store to the same implementation that contains the old page. - */ - if (__frontswap_test(sis, offset)) { - __frontswap_clear(sis, offset); - frontswap_ops->invalidate_page(type, offset); - } - - ret = frontswap_ops->store(type, offset, page); - if (ret == 0) { - __frontswap_set(sis, offset); - inc_frontswap_succ_stores(); - } else { - inc_frontswap_failed_stores(); - } - - return ret; -} - -/* - * "Get" data from frontswap associated with swaptype and offset that were - * specified when the data was put to frontswap and use it to fill the - * specified page with data. Page must be locked and in the swap cache. - */ -int __frontswap_load(struct page *page) -{ - int ret = -1; - swp_entry_t entry = { .val = page_private(page), }; - int type = swp_type(entry); - struct swap_info_struct *sis = swap_info[type]; - pgoff_t offset = swp_offset(entry); - bool exclusive = false; - - VM_BUG_ON(!frontswap_ops); - VM_BUG_ON(!PageLocked(page)); - VM_BUG_ON(sis == NULL); - - if (!__frontswap_test(sis, offset)) - return -1; - - /* Try loading from each implementation, until one succeeds. 
*/ - ret = frontswap_ops->load(type, offset, page, &exclusive); - if (ret == 0) { - inc_frontswap_loads(); - if (exclusive) { - SetPageDirty(page); - __frontswap_clear(sis, offset); - } - } - return ret; -} - -/* - * Invalidate any data from frontswap associated with the specified swaptype - * and offset so that a subsequent "get" will fail. - */ -void __frontswap_invalidate_page(unsigned type, pgoff_t offset) -{ - struct swap_info_struct *sis = swap_info[type]; - - VM_BUG_ON(!frontswap_ops); - VM_BUG_ON(sis == NULL); - - if (!__frontswap_test(sis, offset)) - return; - - frontswap_ops->invalidate_page(type, offset); - __frontswap_clear(sis, offset); - inc_frontswap_invalidates(); -} - -/* - * Invalidate all data from frontswap associated with all offsets for the - * specified swaptype. - */ -void __frontswap_invalidate_area(unsigned type) -{ - struct swap_info_struct *sis = swap_info[type]; - - VM_BUG_ON(!frontswap_ops); - VM_BUG_ON(sis == NULL); - - if (sis->frontswap_map == NULL) - return; - - frontswap_ops->invalidate_area(type); - atomic_set(&sis->frontswap_pages, 0); - bitmap_zero(sis->frontswap_map, sis->max); -} - -static int __init init_frontswap(void) -{ -#ifdef CONFIG_DEBUG_FS - struct dentry *root = debugfs_create_dir("frontswap", NULL); - if (root == NULL) - return -ENXIO; - debugfs_create_u64("loads", 0444, root, &frontswap_loads); - debugfs_create_u64("succ_stores", 0444, root, &frontswap_succ_stores); - debugfs_create_u64("failed_stores", 0444, root, - &frontswap_failed_stores); - debugfs_create_u64("invalidates", 0444, root, &frontswap_invalidates); -#endif - return 0; -} - -module_init(init_frontswap); diff --git a/mm/page_io.c b/mm/page_io.c index ff4156a44d5d72ab0186abab948775860cebbfbe..5d0baba3578b2e70ae5d95d0f4074a94dc92330d 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -19,12 +19,12 @@ #include <linux/bio.h> #include <linux/swapops.h> #include <linux/writeback.h> -#include <linux/frontswap.h> #include <linux/blkdev.h> #include <linux/psi.h> 
#include <linux/uio.h> #include <linux/sched/task.h> #include <linux/delayacct.h> +#include <linux/zswap.h> #include "swap.h" static void __end_swap_bio_write(struct bio *bio) @@ -195,7 +195,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc) folio_unlock(folio); return ret; } - if (frontswap_store(&folio->page) == 0) { + if (zswap_store(&folio->page)) { folio_start_writeback(folio); folio_unlock(folio); folio_end_writeback(folio); @@ -512,7 +512,7 @@ void swap_readpage(struct page *page, bool synchronous, struct swap_iocb **plug) } delayacct_swapin_start(); - if (frontswap_load(page) == 0) { + if (zswap_load(page)) { SetPageUptodate(page); unlock_page(page); } else if (data_race(sis->flags & SWP_FS_OPS)) { diff --git a/mm/swapfile.c b/mm/swapfile.c index 346e22b8ae970cbeed27665cccc64872aa480c15..e04eb9c0482db22092b94491762398f1cfc5417a 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -35,13 +35,13 @@ #include <linux/memcontrol.h> #include <linux/poll.h> #include <linux/oom.h> -#include <linux/frontswap.h> #include <linux/swapfile.h> #include <linux/export.h> #include <linux/swap_slots.h> #include <linux/sort.h> #include <linux/completion.h> #include <linux/suspend.h> +#include <linux/zswap.h> #include <asm/tlbflush.h> #include <linux/swapops.h> @@ -95,7 +95,7 @@ static PLIST_HEAD(swap_active_head); static struct plist_head *swap_avail_heads; static DEFINE_SPINLOCK(swap_avail_lock); -struct swap_info_struct *swap_info[MAX_SWAPFILES]; +static struct swap_info_struct *swap_info[MAX_SWAPFILES]; static DEFINE_MUTEX(swapon_mutex); @@ -744,7 +744,7 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, swap_slot_free_notify = NULL; while (offset <= end) { arch_swap_invalidate_page(si->type, offset); - frontswap_invalidate_page(si->type, offset); + zswap_invalidate(si->type, offset); if (swap_slot_free_notify) swap_slot_free_notify(si->bdev, offset); offset++; @@ -2343,11 +2343,10 @@ static void _enable_swap_info(struct 
swap_info_struct *p) static void enable_swap_info(struct swap_info_struct *p, int prio, unsigned char *swap_map, - struct swap_cluster_info *cluster_info, - unsigned long *frontswap_map) + struct swap_cluster_info *cluster_info) { - if (IS_ENABLED(CONFIG_FRONTSWAP)) - frontswap_init(p->type, frontswap_map); + zswap_swapon(p->type); + spin_lock(&swap_lock); spin_lock(&p->lock); setup_swap_info(p, prio, swap_map, cluster_info); @@ -2390,7 +2389,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) struct swap_info_struct *p = NULL; unsigned char *swap_map; struct swap_cluster_info *cluster_info; - unsigned long *frontswap_map; struct file *swap_file, *victim; struct address_space *mapping; struct inode *inode; @@ -2515,12 +2513,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) p->swap_map = NULL; cluster_info = p->cluster_info; p->cluster_info = NULL; - frontswap_map = frontswap_map_get(p); spin_unlock(&p->lock); spin_unlock(&swap_lock); arch_swap_invalidate_area(p->type); - frontswap_invalidate_area(p->type); - frontswap_map_set(p, NULL); + zswap_swapoff(p->type); mutex_unlock(&swapon_mutex); free_percpu(p->percpu_cluster); p->percpu_cluster = NULL; @@ -2528,7 +2524,6 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) p->cluster_next_cpu = NULL; vfree(swap_map); kvfree(cluster_info); - kvfree(frontswap_map); /* Destroy swap account information */ swap_cgroup_swapoff(p->type); exit_swap_address_space(p->type); @@ -2995,7 +2990,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) unsigned long maxpages; unsigned char *swap_map = NULL; struct swap_cluster_info *cluster_info = NULL; - unsigned long *frontswap_map = NULL; struct page *page = NULL; struct inode *inode = NULL; bool inced_nr_rotate_swap = false; @@ -3135,11 +3129,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) error = nr_extents; goto bad_swap_unlock_inode; } - /* frontswap enabled? 
set up bit-per-page map for frontswap */ - if (IS_ENABLED(CONFIG_FRONTSWAP)) - frontswap_map = kvcalloc(BITS_TO_LONGS(maxpages), - sizeof(long), - GFP_KERNEL); if ((swap_flags & SWAP_FLAG_DISCARD) && p->bdev && bdev_max_discard_sectors(p->bdev)) { @@ -3192,16 +3181,15 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (swap_flags & SWAP_FLAG_PREFER) prio = (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT; - enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map); + enable_swap_info(p, prio, swap_map, cluster_info); - pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n", + pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s\n", p->pages<<(PAGE_SHIFT-10), name->name, p->prio, nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10), (p->flags & SWP_SOLIDSTATE) ? "SS" : "", (p->flags & SWP_DISCARDABLE) ? "D" : "", (p->flags & SWP_AREA_DISCARD) ? "s" : "", - (p->flags & SWP_PAGE_DISCARD) ? "c" : "", - (frontswap_map) ? "FS" : ""); + (p->flags & SWP_PAGE_DISCARD) ? "c" : ""); mutex_unlock(&swapon_mutex); atomic_inc(&proc_poll_event); @@ -3231,7 +3219,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) spin_unlock(&swap_lock); vfree(swap_map); kvfree(cluster_info); - kvfree(frontswap_map); if (inced_nr_rotate_swap) atomic_dec(&nr_rotate_swap); if (swap_file) diff --git a/mm/zswap.c b/mm/zswap.c index 258e4e17799a0295416ada1fac653ad69a1ce319..be1b6417ef5c377b899e542a6bcc4cc8d2a56efb 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -2,7 +2,7 @@ /* * zswap.c - zswap driver file * - * zswap is a backend for frontswap that takes pages that are in the process + * zswap is a cache that takes pages that are in the process * of being swapped out and attempts to compress and store them in a * RAM-based memory pool. 
This can result in a significant I/O reduction on * the swap device and, in the case where decompressing from RAM is faster @@ -20,7 +20,6 @@ #include <linux/spinlock.h> #include <linux/types.h> #include <linux/atomic.h> -#include <linux/frontswap.h> #include <linux/rbtree.h> #include <linux/swap.h> #include <linux/crypto.h> @@ -28,7 +27,7 @@ #include <linux/mempool.h> #include <linux/zpool.h> #include <crypto/acompress.h> - +#include <linux/zswap.h> #include <linux/mm_types.h> #include <linux/page-flags.h> #include <linux/swapops.h> @@ -1084,7 +1083,7 @@ static int zswap_get_swap_cache_page(swp_entry_t entry, * * This can be thought of as a "resumed writeback" of the page * to the swap device. We are basically resuming the same swap - * writeback path that was intercepted with the frontswap_store() + * writeback path that was intercepted with the zswap_store() * in the first place. After the page has been decompressed into * the swap cache, the compressed version stored by zswap can be * freed. 
@@ -1224,13 +1223,11 @@ static void zswap_fill_page(void *ptr, unsigned long value) memset_l(page, value, PAGE_SIZE / sizeof(unsigned long)); } -/********************************* -* frontswap hooks -**********************************/ -/* attempts to compress and store an single page */ -static int zswap_frontswap_store(unsigned type, pgoff_t offset, - struct page *page) +bool zswap_store(struct page *page) { + swp_entry_t swp = { .val = page_private(page), }; + int type = swp_type(swp); + pgoff_t offset = swp_offset(swp); struct zswap_tree *tree = zswap_trees[type]; struct zswap_entry *entry, *dupentry; struct scatterlist input, output; @@ -1238,23 +1235,22 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, struct obj_cgroup *objcg = NULL; struct zswap_pool *pool; struct zpool *zpool; - int ret; unsigned int dlen = PAGE_SIZE; unsigned long handle, value; char *buf; u8 *src, *dst; gfp_t gfp; + int ret; + + VM_WARN_ON_ONCE(!PageLocked(page)); + VM_WARN_ON_ONCE(!PageSwapCache(page)); /* THP isn't supported */ - if (PageTransHuge(page)) { - ret = -EINVAL; - goto reject; - } + if (PageTransHuge(page)) + return false; - if (!zswap_enabled || !tree) { - ret = -ENODEV; - goto reject; - } + if (!zswap_enabled || !tree) + return false; /* * XXX: zswap reclaim does not work with cgroups yet. Without a @@ -1262,10 +1258,8 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, * local cgroup limits. 
*/ objcg = get_obj_cgroup_from_page(page); - if (objcg && !obj_cgroup_may_zswap(objcg)) { - ret = -ENOMEM; + if (objcg && !obj_cgroup_may_zswap(objcg)) goto reject; - } /* reclaim space if needed */ if (zswap_is_full()) { @@ -1275,10 +1269,9 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, } if (zswap_pool_reached_full) { - if (!zswap_can_accept()) { - ret = -ENOMEM; + if (!zswap_can_accept()) goto shrink; - } else + else zswap_pool_reached_full = false; } @@ -1286,7 +1279,6 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, entry = zswap_entry_cache_alloc(GFP_KERNEL); if (!entry) { zswap_reject_kmemcache_fail++; - ret = -ENOMEM; goto reject; } @@ -1303,17 +1295,13 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, kunmap_atomic(src); } - if (!zswap_non_same_filled_pages_enabled) { - ret = -EINVAL; + if (!zswap_non_same_filled_pages_enabled) goto freepage; - } /* if entry is successfully added, it keeps the reference */ entry->pool = zswap_pool_current_get(); - if (!entry->pool) { - ret = -EINVAL; + if (!entry->pool) goto freepage; - } /* compress */ acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx); @@ -1333,19 +1321,17 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, * synchronous in fact. * Theoretically, acomp supports users send multiple acomp requests in one * acomp instance, then get those requests done simultaneously. but in this - * case, frontswap actually does store and load page by page, there is no + * case, zswap actually does store and load page by page, there is no * existing method to send the second page before the first page is done - * in one thread doing frontswap. + * in one thread doing zswap. * but in different threads running on different cpu, we have different * acomp instance, so multiple threads can do (de)compression in parallel.
*/ ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait); dlen = acomp_ctx->req->dlen; - if (ret) { - ret = -EINVAL; + if (ret) goto put_dstmem; - } /* store */ zpool = zswap_find_zpool(entry); @@ -1381,15 +1367,12 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, /* map */ spin_lock(&tree->lock); - do { - ret = zswap_rb_insert(&tree->rbroot, entry, &dupentry); - if (ret == -EEXIST) { - zswap_duplicate_entry++; - /* remove from rbtree */ - zswap_rb_erase(&tree->rbroot, dupentry); - zswap_entry_put(tree, dupentry); - } - } while (ret == -EEXIST); + while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) { + zswap_duplicate_entry++; + /* remove from rbtree */ + zswap_rb_erase(&tree->rbroot, dupentry); + zswap_entry_put(tree, dupentry); + } if (entry->length) { spin_lock(&entry->pool->lru_lock); list_add(&entry->lru, &entry->pool->lru); @@ -1402,7 +1385,7 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, zswap_update_total_size(); count_vm_event(ZSWPOUT); - return 0; + return true; put_dstmem: mutex_unlock(acomp_ctx->mutex); @@ -1412,23 +1395,20 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, reject: if (objcg) obj_cgroup_put(objcg); - return ret; + return false; shrink: pool = zswap_pool_last_get(); if (pool) queue_work(shrink_wq, &pool->shrink_work); - ret = -ENOMEM; goto reject; } -/* - * returns 0 if the page was successfully decompressed - * return -1 on entry not found or error -*/ -static int zswap_frontswap_load(unsigned type, pgoff_t offset, - struct page *page, bool *exclusive) +bool zswap_load(struct page *page) { + swp_entry_t swp = { .val = page_private(page), }; + int type = swp_type(swp); + pgoff_t offset = swp_offset(swp); struct zswap_tree *tree = zswap_trees[type]; struct zswap_entry *entry; struct scatterlist input, output; @@ -1436,15 +1416,16 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, u8 *src, *dst, *tmp; struct zpool *zpool; unsigned 
int dlen; - int ret; + bool ret; + + VM_WARN_ON_ONCE(!PageLocked(page)); /* find */ spin_lock(&tree->lock); entry = zswap_entry_find_get(&tree->rbroot, offset); if (!entry) { - /* entry was written back */ spin_unlock(&tree->lock); - return -1; + return false; } spin_unlock(&tree->lock); @@ -1452,7 +1433,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, dst = kmap_atomic(page); zswap_fill_page(dst, entry->value); kunmap_atomic(dst); - ret = 0; + ret = true; goto stats; } @@ -1460,7 +1441,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, if (!zpool_can_sleep_mapped(zpool)) { tmp = kmalloc(entry->length, GFP_KERNEL); if (!tmp) { - ret = -ENOMEM; + ret = false; goto freeentry; } } @@ -1481,7 +1462,8 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, sg_init_table(&output, 1); sg_set_page(&output, page, PAGE_SIZE, 0); acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen); - ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait); + if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait)) + WARN_ON(1); mutex_unlock(acomp_ctx->mutex); if (zpool_can_sleep_mapped(zpool)) @@ -1489,16 +1471,16 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, else kfree(tmp); - BUG_ON(ret); + ret = true; stats: count_vm_event(ZSWPIN); if (entry->objcg) count_objcg_event(entry->objcg, ZSWPIN); freeentry: spin_lock(&tree->lock); - if (!ret && zswap_exclusive_loads_enabled) { + if (ret && zswap_exclusive_loads_enabled) { zswap_invalidate_entry(tree, entry); - *exclusive = true; + SetPageDirty(page); } else if (entry->length) { spin_lock(&entry->pool->lru_lock); list_move(&entry->lru, &entry->pool->lru); @@ -1510,8 +1492,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset, return ret; } -/* frees an entry in zswap */ -static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset) +void zswap_invalidate(int type, pgoff_t offset) { 
struct zswap_tree *tree = zswap_trees[type]; struct zswap_entry *entry; @@ -1528,8 +1509,22 @@ static void zswap_frontswap_invalidate_page(unsigned type, pgoff_t offset) spin_unlock(&tree->lock); } -/* frees all zswap entries for the given swap type */ -static void zswap_frontswap_invalidate_area(unsigned type) +void zswap_swapon(int type) +{ + struct zswap_tree *tree; + + tree = kzalloc(sizeof(*tree), GFP_KERNEL); + if (!tree) { + pr_err("alloc failed, zswap disabled for swap type %d\n", type); + return; + } + + tree->rbroot = RB_ROOT; + spin_lock_init(&tree->lock); + zswap_trees[type] = tree; +} + +void zswap_swapoff(int type) { struct zswap_tree *tree = zswap_trees[type]; struct zswap_entry *entry, *n; @@ -1547,29 +1542,6 @@ static void zswap_frontswap_invalidate_area(unsigned type) zswap_trees[type] = NULL; } -static void zswap_frontswap_init(unsigned type) -{ - struct zswap_tree *tree; - - tree = kzalloc(sizeof(*tree), GFP_KERNEL); - if (!tree) { - pr_err("alloc failed, zswap disabled for swap type %d\n", type); - return; - } - - tree->rbroot = RB_ROOT; - spin_lock_init(&tree->lock); - zswap_trees[type] = tree; -} - -static const struct frontswap_ops zswap_frontswap_ops = { - .store = zswap_frontswap_store, - .load = zswap_frontswap_load, - .invalidate_page = zswap_frontswap_invalidate_page, - .invalidate_area = zswap_frontswap_invalidate_area, - .init = zswap_frontswap_init -}; - /********************************* * debugfs functions **********************************/ @@ -1658,16 +1630,11 @@ static int zswap_setup(void) if (!shrink_wq) goto fallback_fail; - ret = frontswap_register_ops(&zswap_frontswap_ops); - if (ret) - goto destroy_wq; if (zswap_debugfs_init()) pr_warn("debugfs initialization failed\n"); zswap_init_state = ZSWAP_INIT_SUCCEED; return 0; -destroy_wq: - destroy_workqueue(shrink_wq); fallback_fail: if (pool) zswap_pool_destroy(pool);