cgroups - Resource control
If we imagine containers built with only namespaces and chroot, we can isolate processes and networking. Now imagine a setup where multiple such containers are running. Chances are that:
- a single container can use 100% cpu
- another may consume the entire RAM

This would make the host machine unusable, as there is no limit set on each container's CPU/memory usage. Google faced similar challenges while managing their internal cluster platform, Borg, so they built cgroups and contributed them to the Linux kernel. cgroups is a Linux kernel feature that lets you:
- Group processes
- Measure resource usage
- Limit resource consumption
- Prioritize workloads
In simple words
cgroups decide how much of the system a process is allowed to use.
If we take pizza as the system resources (CPU, RAM), then with cgroups we slice the pizza and assign one slice per container.
Where do cgroups live in the filesystem
All cgroup control files are exposed via a special virtual filesystem mounted at /sys/fs/cgroup. Depending on the Linux version, there are two layouts for cgroups:
1️⃣ cgroups v1 (older / still common in some setups)
Resources are separated by subsystem.
/sys/fs/cgroup/
├── cpu/
├── memory/
├── pids/
├── blkio/
├── cpuset/
└── devices/
Let's say a Docker container exists with id abc1234. The memory limit is stored in a file called memory.limit_in_bytes, under the memory subsystem, at a path that looks like:
/sys/fs/cgroup/memory/docker/abc1234/memory.limit_in_bytes
The CPU quota is stored in a file named cpu.cfs_quota_us, under the cpu subsystem:
/sys/fs/cgroup/cpu/docker/abc1234/cpu.cfs_quota_us
The PID limit is stored under the pids subsystem:
/sys/fs/cgroup/pids/docker/abc1234/pids.max
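As a sketch of what this file interface looks like, the snippet below recreates the v1 layout under /tmp with a made-up container id (abc1234), since writing to the real /sys/fs/cgroup requires root and a running container. On a real system the kernel and Docker create these directories and enforce the values:

```shell
# Simulated cgroup v1 tree; the real root is /sys/fs/cgroup.
CGROOT=/tmp/cgv1-demo
rm -rf "$CGROOT"
mkdir -p "$CGROOT/memory/docker/abc1234" \
         "$CGROOT/cpu/docker/abc1234" \
         "$CGROOT/pids/docker/abc1234"

# Writing a number into a control file is how a limit gets set.
echo 536870912 > "$CGROOT/memory/docker/abc1234/memory.limit_in_bytes"  # 512 MB
echo 100000    > "$CGROOT/cpu/docker/abc1234/cpu.cfs_quota_us"          # 1 CPU worth of quota
echo 256       > "$CGROOT/pids/docker/abc1234/pids.max"                 # max 256 processes

# Reading a file back shows the enforced limit.
cat "$CGROOT/memory/docker/abc1234/memory.limit_in_bytes"
```

Note how each resource lives in its own subtree (memory/, cpu/, pids/), which is the defining trait of the v1 layout.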
2️⃣ cgroups v2 (modern unified hierarchy)
Most modern distros and Kubernetes use this version, which has a single hierarchy:
/sys/fs/cgroup
Inside this are the container slices/folders:
/sys/fs/cgroup/docker/abc1234/<file-name for cpu/ram/pid>
There is also a slight change in the file names (memory.max / cpu.max / pids.max).
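The v2 files also use slightly different value formats: memory.max holds a byte count (or the string "max" for unlimited), and cpu.max holds two numbers, "<quota> <period>". A simulated sketch (fake /tmp path, made-up container id):

```shell
# Simulated cgroup v2 unified layout; real path: /sys/fs/cgroup/docker/<id>/
CG=/tmp/cgv2-demo/docker/abc1234
rm -rf /tmp/cgv2-demo
mkdir -p "$CG"

echo "1073741824"    > "$CG/memory.max"  # 1 GiB hard limit, in bytes
echo "200000 100000" > "$CG/cpu.max"     # quota/period in µs: 2 CPUs worth of time
echo "max"           > "$CG/pids.max"    # "max" means no limit

cat "$CG/cpu.max"
```

The "200000 100000" in cpu.max means: in every 100000 µs scheduling period, this group may run for at most 200000 µs of CPU time, i.e. two full cores.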
Run the following command to check which version of cgroups is present on your system:
mount | grep cgroup
If the output is:
cgroup2 on /sys/fs/cgroup type cgroup2
then you are on v2. If instead you see multiple mounts like:
cgroup on /sys/fs/cgroup/memory
cgroup on /sys/fs/cgroup/cpu
then you are on v1.
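The check above can be scripted. Below is a small classifier for mount lines, run on sample strings so it works anywhere; on a real system you would feed it the actual `mount | grep cgroup` output:

```shell
# Classify a mount line as cgroup v1 or v2 by its filesystem type.
cgroup_version() {
  case "$1" in
    *"type cgroup2"*) echo v2 ;;   # must be checked first: "type cgroup"
    *"type cgroup"*)  echo v1 ;;   # would also match the cgroup2 line
    *)                echo unknown ;;
  esac
}

cgroup_version "cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid)"
cgroup_version "cgroup on /sys/fs/cgroup/memory type cgroup (rw,memory)"
```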
What resources cgroups control
CPU
This includes:
- CPU shares (priority)
- CPU quotas (hard limits)
- CPU pinning (specific cores)
docker run --cpus=1 nginx
This command ensures that the nginx container can use CPU time equivalent to only one core (enforced via a quota, not by pinning it to a specific core).
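The arithmetic behind --cpus is simple: on cgroup v1, Docker writes quota = cpus × period into cpu.cfs_quota_us, where the period defaults to 100000 µs. A sketch of that conversion:

```shell
# How --cpus maps to a CFS quota (cgroup v1).
PERIOD_US=100000   # default CFS period: 100 ms

cpus_to_quota() {
  # Integer math is enough for whole-CPU examples; Docker itself also
  # accepts fractions like --cpus=0.5, which would give a quota of 50000.
  echo $(( $1 * PERIOD_US ))
}

cpus_to_quota 1   # the value written to cpu.cfs_quota_us for --cpus=1
cpus_to_quota 2   # and for --cpus=2
```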
RAM
docker run -m 512m nginx
This makes sure the container cannot use more than 512 MB of RAM. If it tries to exceed that, Linux has a built-in mechanism called the OOM (out-of-memory) killer, which the kernel uses to enforce the cgroup rule by killing a process inside the container.
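The `512m` suffix is just shorthand: what actually lands in the cgroup file is a byte count. A small converter sketch (the helper name `to_bytes` is made up for illustration):

```shell
# Convert Docker-style size suffixes (k/m/g) into the byte count that
# gets written to memory.limit_in_bytes (v1) or memory.max (v2).
to_bytes() {
  local n=${1%[kmg]}               # numeric part, e.g. "512"
  local unit=${1#"${1%[kmg]}"}     # suffix part, e.g. "m"
  case "$unit" in
    k) echo $(( n * 1024 )) ;;
    m) echo $(( n * 1024 * 1024 )) ;;
    g) echo $(( n * 1024 * 1024 * 1024 )) ;;
    *) echo "$1" ;;                # no suffix: already bytes
  esac
}

to_bytes 512m
```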
Block I/O (Disk Throughput)
This controls disk priority and read/write speeds.
Process count (PIDs)
This limits how many processes a container can spawn
Example: Docker
docker run -m 1g --cpus=2 myapp
This command creates a cgroup, assigns the container's processes to it, and writes the limits into the kernel control files.
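Those flags become plain numbers in the container's recorded config (visible under HostConfig in `docker inspect`, assuming a running Docker daemon): memory as bytes, CPUs as nano-CPUs. The conversion itself:

```shell
# What -m 1g and --cpus=2 translate to numerically.
MEMORY_BYTES=$(( 1 * 1024 * 1024 * 1024 ))   # -m 1g    -> bytes
NANO_CPUS=$((   2 * 1000000000 ))            # --cpus=2 -> nano-CPUs

echo "$MEMORY_BYTES $NANO_CPUS"
```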
Kubernetes
Kubernetes maps resource requests and limits directly to cgroups. In a pod spec:
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"
| Term | Meaning |
|---|---|
| Request | Guaranteed allocation |
| Limit | Hard ceiling |
Let's take the example of running an LLM in a pod.
| Model | RAM needed |
|---|---|
| BERT base | ~1–2 GB |
| Llama 7B | ~14 GB |
| Llama 70B | 100+ GB |
If you set the wrong limit, i.e. a limit smaller than what the model needs, the pod will crash repeatedly (OOMKilled). So sizing is very important: getting it wrong leads to failed deployments or even node instability.
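A pre-deployment sanity check can be as simple as comparing the model's RAM need (from the table above) against the pod's memory limit. The helper name `fits` is made up for illustration:

```shell
# fits <needed_gb> <limit_gb>: will the model fit inside the pod's limit?
fits() {
  if [ "$1" -le "$2" ]; then
    echo "OK"
  else
    echo "will be OOMKilled"
  fi
}

fits 2 4    # BERT base (~2 GB) with a 4 GB limit
fits 14 8   # Llama 7B (~14 GB) with an 8 GB limit
```

In practice you would also leave headroom above the model's weights for activations, the runtime, and the KV cache, so the real limit should sit comfortably above the table's numbers.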