- Jul 29, 2024
-
-
Zheng Zucheng authored
In extreme test scenarios: the 14th field utime in /proc/xx/stat is greater than sum_exec_runtime, utime = 18446744073709518790 ns, rtime = 135989749728000 ns In cputime_adjust() process, stime is greater than rtime due to mul_u64_u64_div_u64() precision problem. before call mul_u64_u64_div_u64(), stime = 175136586720000, rtime = 135989749728000, utime = 1416780000. after call mul_u64_u64_div_u64(), stime = 135989949653530 unsigned reversion occurs because rtime is less than stime. utime = rtime - stime = 135989749728000 - 135989949653530 = -199925530 = (u64)18446744073709518790 Trigger condition: 1). User task run in kernel mode most of time 2). ARM64 architecture 3). TICK_CPU_ACCOUNTING=y CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set Fix mul_u64_u64_div_u64() conversion precision by reset stime to rtime Fixes: 3dc167ba ("sched/cputime: Improve cputime_adjust()") Signed-off-by:
Zheng Zucheng <zhengzucheng@h...>
-
- May 27, 2024
-
-
Ingo Molnar authored
Do a spell-checking pass. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 17, 2024
-
-
Alexander Gordeev authored
The generic vtime_task_switch() implementation gets built only if __ARCH_HAS_VTIME_TASK_SWITCH is not defined, but requires an architecture to implement arch_vtime_task_switch() callback at the same time, which is confusing. Further, arch_vtime_task_switch() is implemented for 32-bit PowerPC architecture only and vtime_task_switch() generic variant is rather superfluous. Simplify the whole vtime_task_switch() wiring by moving the existing generic implementation to PowerPC. Signed-off-by:
Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Frederic Weisbecker <frederic@kernel.org> Reviewed-by:
Nicholas Piggin <npiggin@gmail.com> Acked-by:
Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2cb6e3caada93623f6d4f78ad938ac6cd0e2fda8.1712760275.git.agordeev@linux.ibm.com
-
- Dec 27, 2022
-
-
Nicholas Piggin authored
The archs that use cputime_to_nsecs() internally provide their own definition and don't need the fallback. cputime_to_usecs() unused except in this fallback, and is not defined anywhere. This removes the final remnant of the cputime_t code from the kernel. Signed-off-by:
Nicholas Piggin <npiggin@gmail.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/20221220070705.2958959-1-npiggin@gmail.com
-
- Jul 04, 2022
-
-
Josh Don authored
4feee7d1 previously added per-task forced idle accounting. This patch extends this to also include cgroups. rstat is used for cgroup accounting, except for the root, which uses kcpustat in order to bypass the need for doing an rstat flush when reading root stats. Only cgroup v2 is supported. Similar to the task accounting, the cgroup accounting requires that schedstats is enabled. Signed-off-by:
Josh Don <joshdon@google.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by:
Tejun Heo <tj@kernel.org> Link: https://lkml.kernel.org/r/20220629211426.3329954-1-joshdon@google.com
-
- Feb 23, 2022
-
-
Ingo Molnar authored
Similarly to kernel/sched/build_utility.c, collect all 'scheduling policy' related source code files into kernel/sched/build_policy.c: kernel/sched/idle.c kernel/sched/rt.c kernel/sched/cpudeadline.c kernel/sched/pelt.c kernel/sched/cputime.c kernel/sched/deadline.c With the exception of fair.c, which we continue to build as a separate file for build efficiency and parallelism reasons. Signed-off-by:
Ingo Molnar <mingo@kernel.org> Reviewed-by:
Peter Zijlstra <peterz@infradead.org>
-
- Dec 02, 2021
-
-
Frederic Weisbecker authored
getrusage(RUSAGE_THREAD) with nohz_full may return shorter utime/stime than the actual time. task_cputime_adjusted() snapshots utime and stime and then adjust their sum to match the scheduler maintained cputime.sum_exec_runtime. Unfortunately in nohz_full, sum_exec_runtime is only updated once per second in the worst case, causing a discrepancy against utime and stime that can be updated anytime by the reader using vtime. To fix this situation, perform an update of cputime.sum_exec_runtime when the cputime snapshot reports the task as actually running while the tick is disabled. The related overhead is then contained within the relevant situations. Reported-by:
Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Hasegawa Hitomi <hasegawa-hitomi@fujitsu.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Phil Aul...
-
- Nov 23, 2021
-
-
Andrey Ryabinin authored
cpuacct.stat in no-root cgroups shows user time without guest time included int it. This doesn't match with user time shown in root cpuacct.stat and /proc/<pid>/stat. This also affects cgroup2's cpu.stat in the same way. Make account_guest_time() to add user time to cgroup's cpustat to fix this. Fixes: ef12fefa ("cpuacct: add per-cgroup utime/stime statistics") Signed-off-by:
Andrey Ryabinin <arbn@yandex-team.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by:
Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211115164607.23784-1-arbn@yandex-team.com
-
- Mar 21, 2021
-
-
Ingo Molnar authored
Fix ~42 single-word typos in scheduler code comments. We have accumulated a few fun ones over the years. :-) Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: linux-kernel@vger.kernel.org
-
- Mar 17, 2021
-
-
Thomas Gleixner authored
vtime_account_irq and irqtime_account_irq() base checks on preempt_count() which fails on RT because preempt_count() does not contain the softirq accounting which is seperate on RT. These checks do not need the full preempt count as they only operate on the hard and softirq sections. Use irq_count() instead which provides the correct value on both RT and non RT kernels. The compiler is clever enough to fold the masking for !RT: 99b: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax - 9a2: 25 ff ff ff 7f and $0x7fffffff,%eax + 9a2: 25 00 ff ff 00 and $0xffff00,%eax Reported-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by:
Paul E. McKenney <paulmck@kernel.org> Reviewed-by:
Frederic Weisbecker <frederic@kernel.org> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210309085727.153926793@linutronix.de
-
- Dec 02, 2020
-
-
Frederic Weisbecker authored
IRQ time entry is currently accounted before HARDIRQ_OFFSET or SOFTIRQ_OFFSET are incremented. This is convenient to decide to which index the cputime to account is dispatched. Unfortunately it prevents tick_irq_enter() from being called under HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ entry accounting due to the necessary clock catch up. As a result we don't benefit from appropriate lockdep coverage on tick_irq_enter(). To prepare for fixing this, move the IRQ entry cputime accounting after the preempt offset is incremented. This requires the cputime dispatch code to handle the extra offset. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20201202115732.27827-5-frederic@kernel.org
-
Frederic Weisbecker authored
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatch the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-4-frederic@kernel.org
-
Frederic Weisbecker authored
s390 has its own version of IRQ entry accounting because it doesn't account the idle time the same way the other architectures do. Only the actual idle sleep time is accounted as idle time, the rest of the idle task execution is accounted as system time. Make the generic IRQ entry accounting aware of architectures that have their own way of accounting idle time and convert s390 to use it. This prepares s390 to get involved in further consolidations of IRQ time accounting. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-3-frederic@kernel.org
-
Frederic Weisbecker authored
account_irq_enter_time() and account_irq_exit_time() are not called from modules. EXPORT_SYMBOL_GPL() can be safely removed from the IRQ cputime accounting functions called from there. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201202115732.27827-2-frederic@kernel.org
-
- Jun 15, 2020
-
-
Oleg Nesterov authored
People report that utime and stime from /proc/<pid>/stat become very wrong when the numbers are big enough, especially if you watch these counters incrementally. Specifically, the current implementation of: stime*rtime/total, results in a saw-tooth function on top of the desired line, where the teeth grow in size the larger the values become. IOW, it has a relative error. The result is that, when watching incrementally as time progresses (for large values), we'll see periods of pure stime or utime increase, irrespective of the actual ratio we're striving for. Replace scale_stime() with a math64.h helper: mul_u64_u64_div_u64() that is far more accurate. This also allows architectures to override the implementation -- for instance they can opt for the old algorithm if this new one turns out to be too expensive for them. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200519172506.GA317395@hirez.programming.kicks-ass.net
-
- Apr 15, 2020
-
-
Borislav Petkov authored
Work around this warning: kernel/sched/cputime.c: In function ‘kcpustat_field’: kernel/sched/cputime.c:1007:6: warning: ‘val’ may be used uninitialized in this function [-Wmaybe-uninitialized] because GCC can't see that val is used only when err is 0. Acked-by:
Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200327214334.GF8015@zn.tnic
-
- Mar 06, 2020
-
-
Chris Wilson authored
As the vtime is sampled under loose seqcount protection by kcpustat, the vtime fields may change as the code flows. Where logic dictates a field has a static value, use a READ_ONCE. Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Fixes: 74722bb2 ("sched/vtime: Bring up complete kcpustat accessor") Link: https://lkml.kernel.org/r/20200123180849.28486-1-frederic@kernel.org
-
- Jan 17, 2020
-
-
Alex Shi authored
Every time we call irqtime_account_process_tick() is in a interrupt, Every caller will get and assign a parameter rq = this_rq(), This is unnecessary and increase the code size a little bit. Move the rq getting action to irqtime_account_process_tick internally is better. base with this patch cputime.o 578792 bytes 577888 bytes Signed-off-by:
Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1577959674-255537-1-git-send-email-alex.shi@linux.alibaba.com
-
- Nov 21, 2019
-
-
Frederic Weisbecker authored
Many callsites want to fetch the values of system, user, user_nice, guest or guest_nice kcpustat fields altogether or at least a pair of these. In that case calling kcpustat_field() for each requested field brings unecessary overhead when we could fetch all of them in a row. So provide kcpustat_cpu_fetch() that fetches the whole kcpustat array in a vtime safe way under the same RCU and seqcount block. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191121024430.19938-3-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Provide support for user, nice, guest and guest_nice fields through kcpustat_field(). Whether we account the delta to a nice or not nice field is decided on top of the nice value snapshot taken at the time we call kcpustat_field(). If the nice value of the task has been changed since the last vtime update, we may have inacurrate distribution of the nice VS unnice cputime. However this is considered as a minor issue compared to the proper fix that would involve interrupting the target on nice updates, which is undesired on nohz_full CPUs. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191121024430.19938-2-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Oct 29, 2019
-
-
Frederic Weisbecker authored
Kcpustat is not correctly supported on nohz_full CPUs. The tick doesn't fire and the cputime therefore doesn't move forward. The issue has shown up after the vanishing of the remaining 1Hz which has made the stall visible. We are solving that with checking the task running on a CPU through RCU and reading its vtime delta that we add to the raw kcpustat values. We make sure that we fetch a coherent raw-kcpustat/vtime-delta couple sequence while checking that the CPU referred by the target vtime is the correct one, under the locked vtime seqcount. Only CPUTIME_SYSTEM is handled here as a start because it's the trivial case. User and guest time will require more preparation work to correctly handle niceness. Reported-by:
Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Link: https://lkml.kernel.org/r/20191025020303.19342-1-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Standardize the naming on top of the vtime_accounting_enabled_*() base. Also make it clear we are checking the vtime state of the *current* CPU with this function. We'll need to add an API to check that state on remote CPUs as well, so we must disambiguate the naming. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-9-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Record guest as a VTIME state instead of guessing it from VTIME_SYS and PF_VCPU. This is going to simplify the cputime read side especially as its state machine is going to further expand in order to fully support kcpustat on nohz_full. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-4-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
Record idle as a VTIME state instead of guessing it from VTIME_SYS and is_idle_task(). This is going to simplify the cputime read side especially as its state machine is going to further expand in order to fully support kcpustat on nohz_full. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-3-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
In order to compute the kcpustat delta on a nohz CPU, we'll need to fetch the task running on that target. Checking that its vtime state snapshot actually refers to the relevant target involves recording that CPU under the seqcount locked on task switch. This is a step toward making kcpustat moving forward on full nohz CPUs. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191016025700.31277-2-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Oct 09, 2019
-
-
Frederic Weisbecker authored
On context switch we are locking the vtime seqcount of the scheduling-out task twice: * On vtime_task_switch_common(), when we flush the pending vtime through vtime_account_system() * On arch_vtime_task_switch() to reset the vtime state. This is pointless as these actions can be performed without the need to unlock/lock in the middle. The reason these steps are separated is to consolidate a very small amount of common code between CONFIG_VIRT_CPU_ACCOUNTING_GEN and CONFIG_VIRT_CPU_ACCOUNTING_NATIVE. Performance in this fast path is definitely a priority over artificial code factorization so split the task switch code between GEN and NATIVE and mutualize the parts than can run under a single seqcount locked block. As a side effect, vtime_account_idle() becomes included in the seqcount protection. This happens to be a welcome preparation in order to properly support kcpustat under vtime in the future and fetch CPUTIME_IDLE without race. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191003161745.28464-3-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
vtime_account_system() decides if we need to account the time to the system (__vtime_account_system()) or to the guest (vtime_account_guest()). So this function is a misnomer as we are on a higher level than "system". All we know when we call that function is that we are accounting kernel cputime. Whether it belongs to guest or system time is a lower level detail. Rename this function to vtime_account_kernel(). This will clarify things and avoid too many underscored vtime_account_system() versions. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com> Link: https://lkml.kernel.org/r/20191003161745.28464-2-frederic@ker...
-
Frederic Weisbecker authored
vtime_account_system() assumes that the target task to account cputime to is always the current task. This is most often true indeed except on task switch where we call: vtime_common_task_switch(prev) vtime_account_system(prev) Here prev is the scheduling-out task where we account the cputime to. It doesn't match current that is already the scheduling-in task at this stage of the context switch. So we end up checking the wrong task flags to determine if we are accounting guest or system time to the previous task. As a result the wrong task is used to check if the target is running in guest mode. We may then spuriously account or leak either system or guest time on task switch. Fix this assumption and also turn vtime_guest_enter/exit() to use the task passed in parameter as well to avoid future similar issues. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpengli@tencent.com> Fixes: 2a42eb95 ("sched/cputime: Accumulate vtime on top of nsec clocksource") Link: https://lkml.kernel.org/r/20190925214242.21873-1-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- May 21, 2019
-
-
Thomas Gleixner authored
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Dec 03, 2018
-
-
Ingo Molnar authored
Go over the scheduler source code and fix common typos in comments - and a typo in an actual variable name. No change in functionality intended. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Mar 04, 2018
-
-
Ingo Molnar authored
Do the following cleanups and simplifications: - sched/sched.h already includes <asm/paravirt.h>, so no need to include it in sched/core.c again. - order the <linux/sched/*.h> headers alphabetically - add all <linux/sched/*.h> headers to kernel/sched/sched.h - remove all unnecessary includes from the .c files that are already included in kernel/sched/sched.h. Finally, make all scheduler .c files use a single common header: #include "sched.h" ... which now contains a union of the relied upon headers. This makes the various .c files easier to read and easier to handle. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Mar 03, 2018
-
-
Ingo Molnar authored
A good number of small style inconsistencies have accumulated in the scheduler core, so do a pass over them to harmonize all these details: - fix speling in comments, - use curly braces for multi-line statements, - remove unnecessary parentheses from integer literals, - capitalize consistently, - remove stray newlines, - add comments where necessary, - remove invalid/unnecessary comments, - align structure definitions and other data types vertically, - add missing newlines for increased readability, - fix vertical tabulation where it's misaligned, - harmonize preprocessor conditional block labeling and vertical alignment, - remove line-breaks where they uglify the code, - add newline after local variable definitions, No change in functionality: md5: 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.before.asm 1191fa0a890cfa8132156d2959d7e9e2 built-in.o.after.asm Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Nov 08, 2017
-
-
Frederic Weisbecker authored
Use lockdep to check that IRQs are enabled or disabled as expected. This way the sanity check only shows overhead when concurrency correctness debug code is enabled. Signed-off-by:
Frederic Weisbecker <frederic@kernel.org> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Cc: David S . Miller <davem@davemloft.net> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1509980490-4285-12-git-send-email-frederic@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Sep 25, 2017
-
-
Tejun Heo authored
cfb766da ("sched/cputime: Expose cputime_adjust()") made cputime_adjust() public for cgroup basic cpu stat support; however, the commit forgot to add a dummy implementaiton for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE leading to compiler errors on some s390 configurations. Fix it by adding the missing dummy implementation. Reported-by:
“kbuild-all@01.org” <kbuild-all@01.org> Fixes: cfb766da ("sched/cputime: Expose cputime_adjust()") Signed-off-by:
Tejun Heo <tj@kernel.org>
-
Tejun Heo authored
Introduce cgroup_account_cputime[_field]() which wrap cpuacct_charge() and cgroup_account_field(). This doesn't introduce any functional changes and will be used to add cgroup basic resource accounting. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com>
-
Tejun Heo authored
Will be used by basic cgroup resource stat reporting later. Signed-off-by:
Tejun Heo <tj@kernel.org> Acked-by:
Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org>
-
- Jul 14, 2017
-
-
Wanpeng Li authored
Recent kernels trigger this warning: BUG: using smp_processor_id() in preemptible [00000000] code: 99-trinity/181 caller is debug_smp_processor_id+0x17/0x19 CPU: 0 PID: 181 Comm: 99-trinity Not tainted 4.12.0-01059-g2a42eb9 #1 Call Trace: dump_stack+0x82/0xb8 check_preemption_disabled() debug_smp_processor_id() vtime_delta() task_cputime() thread_group_cputime() thread_group_cputime_adjusted() wait_consider_task() do_wait() SYSC_wait4() do_syscall_64() entry_SYSCALL64_slow_path() As Frederic pointed out: | Although those sched_clock_cpu() things seem to only matter when the | sched_clock() is unstable. And that stability is a condition for nohz_full | to work anyway. So probably sched_clock() alone would be enough. This patch fixes it by replacing sched_clock_cpu() with sched_clock() to avoid calling smp_processor_id() in a preemptible context. Reported-by:
Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by:
Wanpeng Li <wanpeng.li@hotmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1499586028-7402-1-git-send-email-wanpeng.li@hotmail.com [ Prettified the changelog. ] Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Jul 05, 2017
-
-
Wanpeng Li authored
Currently the cputime source used by vtime is jiffies. When we cross a context boundary and jiffies have changed since the last snapshot, the pending cputime is accounted to the switching out context. This system works ok if the ticks are not aligned across CPUs. If they instead are aligned (ie: all fire at the same time) and the CPUs run in userspace, the jiffies change is only observed on tick exit and therefore the user cputime is accounted as system cputime. This is because the CPU that maintains timekeeping fires its tick at the same time as the others. It updates jiffies in the middle of the tick and the other CPUs see that update on IRQ exit: CPU 0 (timekeeper) CPU 1 ------------------- ------------- jiffies = N ... run in userspace for a jiffy tick entry tick entry (sees jiffies = N) set jiffies = N + 1 tick exit tick exit (sees jiffies = N + 1) account 1 jiffy as stime Fix this with using a nanosec clock source instead of jiffies. The cputime is then accumulated and flushed everytime the pending delta reaches a jiffy in order to mitigate the accounting overhead. [ fweisbec: changelog, rebase on struct vtime, field renames, add delta on cputime readers, keep idle vtime as-is (low overhead accounting), harmonize clock sources. ] Suggested-by:
Thomas Gleixner <tglx@linutronix.de> Reported-by:
Luiz Capitulino <lcapitulino@redhat.com> Tested-by:
Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by:
Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-6-git-send-email-fweisbec@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
We are about to add vtime accumulation fields to the task struct. Let's avoid more bloatification and gather vtime information to their own struct. Tested-by:
Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Frederic Weisbecker authored
The current "snapshot" based naming on vtime fields suggests we record some past event but that's a low level picture of their actual purpose which comes out blurry. The real point of these fields is to run a basic state machine that tracks down cputime entry while switching between contexts. So lets reflect that with more meaningful names. Tested-by:
Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by:
Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by:
Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <kernellwp@gmail.com> Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-