If we imagine containers built with only namespaces and chroot, we can isolate processes and networking. Now imagine a setup where multiple such containers are running. Chances are that:

  • a single container can use 100% of the CPU
  • another may consume the entire RAM

This would make the host machine unusable, because there is no per-container limit on CPU or memory usage. Google faced similar challenges while managing their cluster platform, Borg, so they built cgroups (control groups) and contributed them to the Linux kernel. cgroups is a Linux kernel feature that lets you:

  • Group processes
  • Measure resource usage
  • Limit resource consumption
  • Prioritize workloads

In simple words

cgroups decide how much of the system's resources a process is allowed to use.

If we take pizza as the system's resources (CPU, RAM), then with cgroups we slice the pizza and assign one slice per container.


Where do cgroups live in the filesystem

All cgroup control files are exposed via a special virtual filesystem mounted at /sys/fs/cgroup. Depending on the Linux version, there are two layouts for cgroups:

1️⃣ cgroups v1 (older / still common in some setups)

Resources are separated by subsystem.

/sys/fs/cgroup/
├── cpu/
├── memory/
├── pids/
├── blkio/
├── cpuset/
└── devices/

Let's say a Docker container exists with the ID abc1234. Its memory limit is stored in a file called memory.limit_in_bytes, at a path that looks like:

/sys/fs/cgroup/memory/docker/abc1234/memory.limit_in_bytes

The CPU quota is stored in a file named cpu.cfs_quota_us, under the cpu subsystem rather than memory:

/sys/fs/cgroup/cpu/docker/abc1234/cpu.cfs_quota_us

The PID limit lives under the pids subsystem:

/sys/fs/cgroup/pids/docker/abc1234/pids.max
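The v1 pattern is always the same: subsystem, then the container's group, then the control file. A small illustrative helper (the `docker/` path component assumes Docker's default cgroup parent; other runtimes use different group names):

```python
def cgroup_v1_path(subsystem: str, container_id: str, control_file: str) -> str:
    """Build the cgroups v1 path for a container's control file.

    In v1 each resource subsystem (memory, cpu, pids, ...) has its own
    hierarchy under /sys/fs/cgroup.
    """
    return f"/sys/fs/cgroup/{subsystem}/docker/{container_id}/{control_file}"

print(cgroup_v1_path("memory", "abc1234", "memory.limit_in_bytes"))
# /sys/fs/cgroup/memory/docker/abc1234/memory.limit_in_bytes
```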

2️⃣ cgroups v2 (modern unified hierarchy)

Most modern distros and Kubernetes use this version, which has a single unified hierarchy:

/sys/fs/cgroup

Inside it are per-container slices/folders:

/sys/fs/cgroup/docker/abc1234/<file-name for cpu/ram/pid>

There is also a slight change in file names: memory.max, cpu.max, and pids.max replace the v1 equivalents.

Run the following command to check which version of cgroups is present on your system:

mount | grep cgroup

If the output is

cgroup2 on /sys/fs/cgroup type cgroup2

then you are on v2. If instead you see multiple mounts like:

cgroup on /sys/fs/cgroup/memory
cgroup on /sys/fs/cgroup/cpu

then you are on v1.
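The same check can be automated. A heuristic sketch that classifies `mount` output (note that hybrid systems can mount both; this reports v2 whenever a cgroup2 mount is present):

```python
def detect_cgroup_version(mount_output: str) -> int:
    """Classify cgroups v1 vs v2 from the text output of `mount`.

    Checks for "type cgroup2" first, since "type cgroup" is a substring
    of it.
    """
    if "type cgroup2" in mount_output:
        return 2
    if "type cgroup" in mount_output:
        return 1
    raise ValueError("no cgroup mounts found")

print(detect_cgroup_version("cgroup2 on /sys/fs/cgroup type cgroup2 (rw)"))  # 2
```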


What resources cgroups control


CPU

This includes:

  • CPU shares (priority)
  • CPU quotas (hard limits)
  • CPU pinning (specific cores)

docker run --cpus=1 nginx

This command ensures that the nginx container can use at most the equivalent of one CPU core's worth of time. Note that this is a quota, not pinning: the work may run on any core, and pinning to specific cores uses --cpuset-cpus instead.
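Under the hood, --cpus becomes a CFS quota: the container gets at most quota microseconds of CPU time per scheduling period (100 ms by default). A sketch of the arithmetic; in v1 the result lands in cpu.cfs_quota_us / cpu.cfs_period_us, while v2 packs both numbers into cpu.max:

```python
def cpus_to_quota(cpus: float, period_us: int = 100_000) -> int:
    """Translate a --cpus value into a CFS quota in microseconds per period.

    period_us=100_000 is the kernel's default 100 ms CFS period.
    """
    return int(cpus * period_us)

print(cpus_to_quota(1))    # 100000: one full CPU per 100 ms period
print(cpus_to_quota(0.5))  # 50000: half a CPU
```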

RAM

  docker run -m 512m nginx

This makes sure that the container cannot use more than 512 MB of RAM. If it tries to exceed that limit, the kernel's built-in OOM (out-of-memory) killer enforces the cgroup rule by killing processes in the container.
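The -m value is a size string that gets converted to bytes before being written into the control file. An illustrative parser, assuming the common b/k/m/g suffixes with binary (1024-based) multiples:

```python
def parse_docker_mem(size: str) -> int:
    """Convert a Docker -m/--memory string like "512m" into bytes.

    Sketch: supports b/k/m/g suffixes as binary multiples; a bare
    number is taken as bytes.
    """
    units = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
    suffix = size[-1].lower()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)

print(parse_docker_mem("512m"))  # 536870912 bytes
```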

Block I/O (Disk Throughput)

This controls disk priority and read/write speeds.

Process count (PIDs)

This limits how many processes a container can spawn, which protects the host from runaway process creation such as fork bombs.

Example: Docker

docker run -m 1g --cpus=2 myapp

This command creates a cgroup, assigns the container's processes to it, and writes the limits into the kernel control files.
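You can read those control files back to see what was actually written. A hedged sketch for a cgroups v2 layout; the exact directory varies by distro and cgroup driver, so it takes the path as a parameter:

```python
from pathlib import Path

def read_limits(cgroup_dir: str) -> dict[str, str]:
    """Read the raw cgroups v2 control files from one cgroup directory.

    Returns the stripped file contents keyed by file name, skipping
    files that do not exist.
    """
    out = {}
    for name in ("memory.max", "cpu.max", "pids.max"):
        f = Path(cgroup_dir) / name
        if f.exists():
            out[name] = f.read_text().strip()
    return out

# e.g. read_limits("/sys/fs/cgroup/docker/abc1234")
```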

Kubernetes

Kubernetes maps resource requests and limits directly to cgroups. In a pod spec:

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"

Term      Meaning
Request   Guaranteed allocation
Limit     Hard ceiling

Let's take the example of running an LLM in a pod.

Model       RAM needed
BERT base   ~1–2 GB
Llama 7B    ~14 GB
Llama 70B   100+ GB

If you set the wrong limit, where the allowed memory is less than what the model needs, the pod will crash repeatedly (OOMKilled). Sizing is therefore very important: getting it wrong leads to failed deployments or even node instability.
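A quick sanity check can catch this before deployment. An illustrative sketch using the ballpark figures from the table above; the 20% headroom factor is an assumption to cover the runtime, tokenizer, and caches, not a hard rule:

```python
# Ballpark RAM needs in GiB, taken from the table above (assumed figures)
MODEL_RAM_GIB = {"bert-base": 2, "llama-7b": 14, "llama-70b": 100}

def limit_is_safe(model: str, memory_limit_gib: float, headroom: float = 1.2) -> bool:
    """True if the pod's memory limit leaves headroom above the model's needs.

    headroom=1.2 (20% slack) is an illustrative assumption.
    """
    return memory_limit_gib >= MODEL_RAM_GIB[model] * headroom

print(limit_is_safe("llama-7b", 16))  # False: 14 GiB * 1.2 = 16.8 GiB needed
print(limit_is_safe("llama-7b", 20))  # True
```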
