TL;DR: The operating system controls when and for how long processes run on the CPUs.
A program is a static sequence of instructions. When it is being executed, it becomes a process managed by the operating system. The operating system controls which processes have access to the CPU at a given point in time.
Apart from the construction and destruction states, a process mostly switches between the following states:
The kernel divides time into slices (a.k.a. time quanta) and schedules processes onto those slices. The length of a time quantum is a trade-off between fairness and efficiency:
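One way to make this trade-off concrete is a back-of-the-envelope model of plain round-robin scheduling. The sketch below is only an illustration: the function name, the fixed context-switch cost, and the numbers are assumptions, not measurements from a real kernel.

```python
def quantum_tradeoff(quantum_ms, switch_cost_ms, num_processes):
    """Return (cpu_efficiency, worst_case_wait_ms) for simple round-robin."""
    # Fraction of wall-clock time spent on useful work rather than switching.
    efficiency = quantum_ms / (quantum_ms + switch_cost_ms)
    # A process that just ran waits for every other process to take a turn.
    worst_case_wait_ms = (num_processes - 1) * (quantum_ms + switch_cost_ms)
    return efficiency, worst_case_wait_ms


# Hypothetical numbers: 0.1 ms per context switch, 10 runnable processes.
for q in (1, 10, 100):
    eff, wait = quantum_tradeoff(q, switch_cost_ms=0.1, num_processes=10)
    print(f"quantum={q:>3} ms  efficiency={eff:.1%}  worst-case wait={wait:.1f} ms")
```

Under this model, a longer quantum wastes less time on switching, but every process waits longer for its next turn.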
Timer interrupt handler:
How does the kernel decide which process to run next? It should give every process a chance to run so none of them is starved. This is the main topic of the post.
Linux uses a scheduler called the Completely Fair Scheduler (CFS) (reference).
Threads have priorities (e.g. normal, real-time). Threads with high priority can preempt lower-priority ones.
There's one queue per priority. The scheduler picks the thread at the head of the queue, runs it, and puts it back at the tail of the queue.
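The queue-per-priority behavior described above can be sketched as a small simulation. This is a toy model, not kernel code: the function name, the thread names, and the convention that a larger number means higher priority are all assumptions made for the example.

```python
from collections import deque


def run_round_robin(queues, ticks):
    """Simulate `ticks` scheduling decisions.

    `queues` maps a priority (larger number = more urgent, an assumption of
    this sketch) to a deque of thread names that are ready to run.
    Returns the order in which threads were given the CPU.
    """
    order = []
    for _ in range(ticks):
        # Find the highest priority that has a runnable thread.
        ready = [p for p, q in queues.items() if q]
        if not ready:
            break
        queue = queues[max(ready)]
        thread = queue.popleft()   # pick the head of the queue
        order.append(thread)       # "run" it for one time slice
        queue.append(thread)       # put it back at the tail
    return order


queues = {0: deque(["a", "b", "c"])}
print(run_round_robin(queues, 5))  # ['a', 'b', 'c', 'a', 'b']
```

In this simplified model threads never block, so a thread in a higher-priority queue would monopolize the CPU. A real scheduler only gives the CPU to lower priorities when the higher-priority threads sleep or wait.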
Scheduling policies
Cgroups (control groups) can be used to isolate resources and specify CPU quotas. Without such isolation, the processes are free to run on the CPUs and run as fast as the hardware permits.
A cgroup quota is controlled by two factors: period (usually 100ms) and quota. If a process exceeds its CPU quota within a period, it won't be able to run until the next period.
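To see what throttling does to wall-clock time, here is a minimal simulation. It assumes a single, always-runnable thread that is alone in its cgroup. The function and parameter names are made up for the example.

```python
def simulate_throttling(demand_ms, quota_ms, period_ms=100):
    """Return the wall-clock time (ms) at which `demand_ms` of CPU work finishes.

    Assumes one always-runnable thread, alone in its cgroup.
    """
    if quota_ms <= 0:
        raise ValueError("quota_ms must be positive")
    elapsed = 0
    remaining = demand_ms
    while remaining > 0:
        # One thread can use at most one period's worth of CPU per period.
        run = min(remaining, quota_ms, period_ms)
        remaining -= run
        if remaining > 0:
            elapsed += period_ms   # quota used up: throttled until next period
        else:
            elapsed += run         # work finished partway through the period
    return elapsed


print(simulate_throttling(250, quota_ms=50))   # 450
print(simulate_throttling(250, quota_ms=100))  # 250
```

With a 50ms quota per 100ms period, 250ms of CPU work takes 450ms of wall-clock time, because the thread sits throttled for the second half of each of the first four periods.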
The length of period is a trade-off between throughput and latency.
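The value of quota only matters relative to the period: a quota of 50ms per 100ms period allows half a CPU's worth of time, and a quota larger than the period allows more than one CPU's worth on a multi-core machine.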
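As a worked example of the ratio, the sketch below converts a quota/period pair into a CPU count. It assumes the cgroup v2 `cpu.max` file format, which is "<quota> <period>" in microseconds, with "max" in place of the quota when no limit is set. The post does not say which cgroup version it has in mind, and the helper name is hypothetical.

```python
def cpu_limit(cpu_max: str):
    """Return the CPU limit, in CPUs, for a cgroup v2 `cpu.max` value.

    The value has the form "<quota> <period>" in microseconds, or
    "max <period>" when no limit is set (None is returned in that case).
    """
    quota, period = cpu_max.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


print(cpu_limit("50000 100000"))   # 0.5
print(cpu_limit("5000 10000"))     # 0.5 (same ratio, shorter period)
print(cpu_limit("200000 100000"))  # 2.0
print(cpu_limit("max 100000"))     # None
```

The first two calls give the same limit even though the periods differ, which is why the length of the period is a separate tuning decision from the ratio.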
cpuset allows a process to be pinned to specific CPU cores. With dedicated cores, throttling is no longer necessary and the process is free to use the cores assigned to it.
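The cpuset controller itself is configured through cgroup files such as `cpuset.cpus`, which usually needs elevated privileges. A related, per-process way to try out pinning is the CPU affinity API. The sketch below uses Python's `os.sched_setaffinity`, which is Linux-only. It illustrates the idea of pinning and is not a replacement for a cpuset configuration. The helper name is hypothetical.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 = the calling process) to a single CPU.

    Returns the previous affinity set so the caller can restore it.
    Linux-only: os.sched_setaffinity is not available on every platform.
    """
    previous = os.sched_getaffinity(pid)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(pid, {cpu})
    return previous


if hasattr(os, "sched_setaffinity"):
    target = min(os.sched_getaffinity(0))  # lowest-numbered allowed CPU
    old = pin_to_cpu(target)
    print(os.sched_getaffinity(0))         # a set containing only `target`
    os.sched_setaffinity(0, old)           # restore the original affinity
```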
There are many considerations about the physical layout of the CPU cores when actually enabling cpuset. See this blog post for more.