CPU Utilization and how to monitor high CPU usage in Linux
What is CPU Utilization?
CPU utilization refers to a computer’s usage of processing resources or the amount of work handled by a CPU. Actual CPU utilization varies depending on the amount and type of managed computing tasks. Certain tasks require heavy CPU time, while others require less because of non-CPU resource requirements. CPU utilization may be used to measure system performance.
CPU utilization should not be confused with CPU load, which measures how many processes are running or waiting to run rather than how busy the CPU is.
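One way to put a load average in context is to divide it by the number of CPUs. The following is a minimal sketch, assuming a Linux system that provides /proc/loadavg and the nproc utility; the function name load_per_cpu is made up for this illustration.

```shell
# Divide a load average by the number of CPUs to get a per-CPU figure.
# A result above 1.00 suggests more runnable tasks than CPUs.
# load_per_cpu is a hypothetical helper name, not a standard command.
load_per_cpu() {
    # $1 = load average (for example 3.20), $2 = number of CPUs (for example 4)
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Usage on a live system (the 1-minute load average is the first field):
#   load_per_cpu "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

A system can show a high load per CPU with low utilization, for example when many processes are blocked on disk I/O, which is why the two figures are worth reading together.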
Understanding Linux CPU stats:
The top command produces a frequently-updated list of processes. By default, the processes are ordered by percentage of CPU usage, with only the “top” CPU consumers shown. The top command shows how much processing power and memory are being used, as well as other information about the running processes.
# top
The 3 CPU states:
Idle, which means it has nothing to do.
Running a user space program, like a command shell, an email server, or a compiler.
Running the kernel, servicing interrupts or managing resources.
The CPU statistics explained:
The output from top is divided into two sections. The first few lines give a summary of the system resources including a breakdown of the number of tasks, the CPU statistics, and the current memory usage. Beneath these stats is a live list of the current running processes. This list can be sorted by PID, CPU usage, memory usage, and so on.
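Using the values discussed below, the CPU line will look something like this:
Cpu(s):  1.6%us,  3.7%sy,  0.0%ni, 94.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st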
1.6%us: This tells us that the processor is spending 1.6% of its time running user space processes. A user space program is any process that doesn’t belong to the kernel. Shells, compilers, databases, web servers, and the programs associated with the desktop are all user space processes. If the processor isn’t idle, it is quite normal that most of the CPU time should be spent running user space processes.
94.6%id: The id statistic tells us that the processor was idle about 94.6% of the time during the last sampling period. On a lightly loaded system, the user space percentage (us), the niced percentage (ni), and the idle percentage (id) should together be close to 100%, which is the case here. If the CPU is spending more time in the other states, then something is probably awry.
3.7%sy: This is the amount of time that the CPU spent running the kernel. All the processes and system resources are handled by the Linux kernel. When a user space process needs something from the system, for example when it needs to allocate memory, perform some I/O, or it needs to create a child process, then the kernel is running. In fact, the scheduler itself which determines which process runs next is part of the kernel. The amount of time spent in the kernel should be as low as possible. In this case, just 3.7% of the time given to the different processes was spent in the kernel. This number can peak much higher, especially when there is a lot of I/O happening.
0.0%ni: User space programs can be categorized as those running under their initial priority level or those running with a nice priority. Niceness is a way to tweak the priority level of a process, usually so that it runs less frequently. The niceness level ranges from -20 (most favorable scheduling) to 19 (least favorable). By default, processes on Linux are started with a niceness of 0. The ni stat shows how much time the CPU spent running user space processes that have been niced. On a system where no processes have been niced, the number will be 0.
0.0%wa: Input and output operations, like reading or writing to a disk, are slow compared to the speed of a CPU. Although these operations happen very fast compared to everyday human activities, they are still slow when compared to the performance of a CPU. There are times when the processor has initiated a read or write operation and then must wait for the result but has nothing else to do. In other words, it is idle while waiting for an I/O operation to complete. The time the CPU spends in this state is shown by the wa statistic.
0.0%hi & 0.0%si: These two statistics show how much time the processor has spent servicing interrupts: ‘hi’ is for hardware interrupts, and ‘si’ is for software interrupts. Hardware interrupts are physical interrupts sent to the CPU from various peripherals like disks and network interfaces. A hardware interrupt will cause the CPU to stop what it is doing and go handle the interrupt. Software interrupts (softirqs) do not occur at the CPU level but at the kernel level: the kernel raises them itself, typically to finish work that a hardware interrupt handler deferred, such as processing received network packets.
0.0%st: This applies only to virtual machines. When Linux is running as a virtual machine on a hypervisor, the st (short for stolen) statistic shows how long the virtual CPU has spent waiting for the hypervisor to service another virtual CPU running on a different virtual machine. Since in the real world these virtual processors share the same physical processor(s), there will be times when the virtual machine wanted to run but the hypervisor scheduled another virtual machine instead.
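The percentages described above can be derived from the cumulative tick counters that the kernel exposes on the aggregate "cpu" line of /proc/stat. The sketch below takes two samples of that line and prints a top-style breakdown of the interval between them. The function name cpu_breakdown is hypothetical, and the sketch assumes the usual field order of user, nice, system, idle, iowait, irq, softirq, steal.

```shell
# Compute top-style percentages from two samples of the aggregate "cpu"
# line in /proc/stat. Fields after the label are cumulative ticks:
#   user nice system idle iowait irq softirq steal [guest guest_nice]
# cpu_breakdown is a hypothetical helper name.
cpu_breakdown() {
    # $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) { print "no ticks elapsed"; exit 1 }
            # Print in the same order as the top summary line.
            printf "%.1f%%us %.1f%%sy %.1f%%ni %.1f%%id %.1f%%wa %.1f%%hi %.1f%%si %.1f%%st\n",
                100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total,
                100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
                100 * d[8] / total, 100 * d[9] / total
        }'
}

# Usage on a live system: sample, wait, sample again.
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$a" "$b"
```

Guest time is left out of the total on purpose, because the kernel already counts it inside the user and nice counters.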
Command line tools to monitor CPU utilization and Linux performance:
- top
The top command is a performance monitoring program that many system administrators use to monitor Linux performance, and it is available on many Linux and Unix-like operating systems. It displays the running processes in an ordered list and updates it regularly.
It displays CPU usage, memory usage, swap memory, cache size, buffer size, process PID, user, command, and much more, which makes processes with high memory or CPU utilization easy to spot.
Some more basic ways to use top:
# top -u root (display only the processes of a specific user; in this example the user is root)
# Pressing ‘z’ while top is running displays the output in color, which can make the running processes easier to identify.
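For scripts, top can also run non-interactively with -b (batch mode) and -n (number of iterations). The sketch below extracts the idle percentage from the CPU summary line. It is an illustration under assumptions: the helper name top_idle is made up, and the parsing only covers the two common layouts of the line ("94.6%id" in older versions, "94.6 id" in newer ones).

```shell
# Print the idle percentage from top summary output read on stdin.
# top_idle is a hypothetical helper name.
top_idle() {
    awk '/^%?Cpu/ {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /%id,?$/) {            # older layout: "94.6%id,"
                v = $i; sub(/%id,?$/, "", v); sub(/^.*,/, "", v)
                print v; exit
            }
            if ($i ~ /^id,?$/) {            # newer layout: "94.6 id,"
                v = $(i - 1); sub(/^.*,/, "", v)
                print v; exit
            }
        }
    }'
}

# Usage on a live system (one batch iteration):
#   top -b -n 1 | top_idle
```

Subtracting the result from 100 gives a rough busy percentage that is easy to compare against a threshold in a monitoring script.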
- vmstat (Virtual memory statistics)
The vmstat command displays statistics about virtual memory, kernel threads, disks, system processes, I/O blocks, interrupts, CPU activity, and much more. vmstat is part of the procps (procps-ng) package, which most Linux distributions install by default; the sysstat package provides related tools such as sar and iostat.
free – Amount of free (idle) memory.
si – Amount of memory swapped in from disk per second, in kilobytes.
so – Amount of memory swapped out to disk per second, in kilobytes.
More useful vmstat commands:
# Run vmstat every ‘x’ seconds, ‘n’ times
# vmstat 2 6 (vmstat reports every two seconds and stops automatically after six reports)
# Disk statistics
# vmstat -d (the -d option displays statistics for all disks)
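When watching for swapping or I/O wait, it can help to pull just the si, so, and wa columns out of the vmstat output. The sketch below looks the columns up by name in the header row instead of assuming fixed positions, since the set of columns differs between vmstat versions. The function name vmstat_swap_wait is hypothetical.

```shell
# Print "si so wa" for every sample row of vmstat output read on stdin.
# vmstat_swap_wait is a hypothetical helper name.
vmstat_swap_wait() {
    awk '
        # Header row: remember the position of every column name.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number (the run queue length).
        ("si" in col) && $1 ~ /^[0-9]+$/ {
            print $(col["si"]), $(col["so"]), $(col["wa"])
        }
    '
}

# Usage on a live system:
#   vmstat 2 6 | vmstat_swap_wait
```

Keep in mind that the first row vmstat prints shows averages since the last reboot, so the later rows are the ones that reflect current activity.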
- lsof (List Open Files)
The lsof command is available on many Linux and Unix-like systems and displays a list of all open files and the processes that opened them. Open files include disk files, network sockets, pipes, and devices. One of the main reasons for using this command is when a disk cannot be unmounted because files on it are in use or open. With this command, you can easily identify which files are in use.
The main columns in lsof output are explained below.
FD – stands for file descriptor. Some of the values you may see are:
cwd current working directory
rtd root directory
txt program text (code and data)
mem memory-mapped file
Also, in the FD column, a value like 1u is an actual file descriptor number followed by a letter (u, r, or w) for its mode:
r for read access.
w for write access.
u for read and write access.
TYPE – the type of the file, for example:
DIR – Directory
REG – Regular file
CHR – Character special file.
FIFO – First In First Out
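The FD values listed above can be decoded mechanically. The sketch below only covers the values described in this section; the function name describe_fd is made up, and anything else (for example a descriptor followed by a lock character) falls through to the last branch.

```shell
# Translate a value from the FD column of lsof output into plain words.
# describe_fd is a hypothetical helper name.
describe_fd() {
    # $1 = FD value, for example "cwd" or "1u"
    case "$1" in
        cwd) echo "current working directory" ;;
        rtd) echo "root directory" ;;
        txt) echo "program text (code and data)" ;;
        mem) echo "memory-mapped file" ;;
        [0-9]*r) echo "descriptor ${1%r}, read access" ;;
        [0-9]*w) echo "descriptor ${1%w}, write access" ;;
        [0-9]*u) echo "descriptor ${1%u}, read and write access" ;;
        *) echo "not covered by this sketch: $1" ;;
    esac
}

# Usage on a live system, assuming the default lsof column layout
# where FD is the fourth column:
#   lsof -p $$ | awk 'NR > 1 { print $4 }' | while read -r fd; do
#       describe_fd "$fd"
#   done
```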
- sar
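The sar command, part of the sysstat package, writes to standard output the contents of selected cumulative activity counters in the operating system. Based on the values of the count and interval parameters, the accounting system writes information the specified number of times, spaced at the specified interval in seconds.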
# sar -u 2 5
Display CPU utilization 5 times, 2 seconds apart.
The output contains one line for each 2-second interval, five lines in total, followed by an average.
Where,
-u 2 5 : Report CPU utilization. The following values are displayed:
- %user: Percentage of CPU utilization that occurred while executing at the user level (application).
- %nice: Percentage of CPU utilization that occurred while executing at the user level with nice priority.
- %system: Percentage of CPU utilization that occurred while executing at the system level (kernel).
- %iowait: Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
- %idle: Percentage of time that the CPU or CPUs were idle, and the system did not have an outstanding disk I/O request.
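Since %idle is the last column of the sar -u report, a single busy figure can be read off the Average line as 100 minus %idle. This is a minimal sketch that assumes English (C locale) output, where the summary line starts with "Average:"; the function name sar_busy is hypothetical.

```shell
# Print 100 - %idle from the "Average:" line of "sar -u" output on stdin.
# sar_busy is a hypothetical helper name.
sar_busy() {
    awk '$1 == "Average:" && $2 == "all" { printf "%.2f\n", 100 - $NF }'
}

# Usage on a live system (LC_ALL=C keeps the "Average:" label in English):
#   LC_ALL=C sar -u 2 5 | sar_busy
```

Note that this busy figure includes time spent in %iowait and %steal, so a high value does not always mean the CPU was executing code.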
Monitor High CPU usage in Linux:
On a busy system, you can expect the amount of time the CPU spends idle to be small. However, if a system rarely has any idle time, then either a) it is overloaded or b) something is wrong.
Here is a brief look at some of the things that can go wrong and how they affect the CPU utilization.
High user mode – If a system suddenly jumps from having spare CPU cycles to running flat out, then the first thing to check is the amount of time the CPU spends running user space processes. If this is high, then it probably means that a process has gone crazy and is eating up all the CPU time. Using the top command, you will be able to see which process is to blame and restart the service or kill the process.
High kernel usage – Sometimes this is acceptable. For example, a program that does lots of console I/O can cause the kernel usage to spike. However, if it remains high for long periods of time, then it could be an indication that something isn’t right. A possible cause of such spikes could be a problem with a driver or kernel module.
High niced value – If the amount of time the CPU is spending running processes with a nice priority value jumps then it means that someone has started some intensive CPU jobs on the system, but they have niced the task.
If the niceness level is greater than zero, then the user has been courteous enough to lower the priority of the process and therefore avoid a CPU overload. But if the niceness level is less than 0, then you will need to investigate what is happening and who is responsible, as such a task could easily cripple the responsiveness of the system.
High waiting on I/O – This means that there are some intensive I/O tasks running on the system that don’t use up much CPU time. If this number is high for anything other than short bursts, then it means that either the I/O performed by the task is very inefficient, or the data is being transferred to a very slow device, or there is a potential problem with a hard disk that is taking a long time to process reads and writes.
High interrupt processing – This could be an indication of a broken peripheral that is causing lots of hardware interrupts or of a workload that is generating lots of software interrupts.
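For the high niced value case, the processes worth investigating first are the ones running with a negative niceness. The sketch below filters ps output for them. It assumes the procps version of ps, where the ni format specifier prints the nice value, and the function name negative_nice is made up for this illustration.

```shell
# Print the rows of "ps -eo ni,pid,user,comm" output (read on stdin)
# whose nice value is below zero. Rows where ps prints "-" instead of
# a number are skipped.
# negative_nice is a hypothetical helper name.
negative_nice() {
    awk 'NR > 1 && $1 ~ /^-[0-9]+$/'
}

# Usage on a live system:
#   ps -eo ni,pid,user,comm | negative_nice
```

On many systems some kernel threads legitimately run at a negative niceness, so the interesting entries are usually ordinary user space commands.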
Some useful commands to check who is monopolizing the CPUs
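The following command displays the top 10 CPU users on the Linux system:
# ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10
The following command lists every process with its PID, parent PID, user, CPU and memory percentage, virtual and resident memory size, start time, and command line:
# ps -eo pid,ppid,user,pcpu,pmem,vsz,rss,stime,args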
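To see which user, and not just which process, is monopolizing the CPUs, the per-process figures can be added up per user. This is a minimal sketch; the function name cpu_by_user is hypothetical, and the totals are only as precise as the %CPU column that ps reports.

```shell
# Sum the %CPU column per user from "ps -eo user,pcpu" output on stdin
# and print the users in descending order of total CPU usage.
# cpu_by_user is a hypothetical helper name.
cpu_by_user() {
    awk 'NR > 1 { sum[$1] += $2 }
         END { for (u in sum) printf "%s %.1f\n", u, sum[u] }' |
        sort -k 2 -nr
}

# Usage on a live system (top five users):
#   ps -eo user,pcpu | cpu_by_user | head -n 5
```

With the procps version of ps, %CPU is the CPU time a process has used divided by the time it has been running, so it is a lifetime average rather than a snapshot of the current moment.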
I hope you find this post useful.
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.
Thank you!
Linux System Admin | shell scripting | Ansible developer