Resource Management
- RNREDDY

- Aug 26
- 5 min read

Efficiently pack containers on finite nodes without starving critical workloads, while balancing predictability, performance, and cost.
Core Levers
Requests
What the scheduler sees.
Minimum guaranteed resources (CPU/memory).
Drive pod placement across nodes.
If requests are inaccurate → wrong bin-packing and potential starvation.
Limits
What the kubelet enforces at runtime via cgroups.
Ceiling for resource usage.
Prevents runaway containers from hogging a node.
If limits are too low → throttling (CPU) or OOMKilled (memory).
Example: Running Two Pods on a Node
Node Capacity
CPU: 4 cores
Memory: 8 GiB
Pod A (critical workload)
Requests: 2 CPU, 4 GiB
Limits: 3 CPU, 6 GiB
What this means:
Scheduler sees 2 CPU, 4 GiB → ensures Pod A always has that much reserved on a node.
At runtime, Pod A can burst up to 3 CPU, 6 GiB if the node has slack.
Pod A gets Guaranteed-ish performance because it reserves enough upfront.
Pod B (batch job)
Requests: 1 CPU, 1 GiB
Limits: 4 CPU, 4 GiB
What this means:
Scheduler only sees 1 CPU, 1 GiB → very cheap placement!
But at runtime, Pod B might try to spike up to 4 CPU (all cores) if available.
If the node is busy, the kubelet throttles it back toward its fair share (other pods’ requests are honored first).
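A sketch of how these two pods could be declared (pod names and images are illustrative; the requests and limits match the values above):

```yaml
# Pod A: critical workload, generous requests with modest headroom
apiVersion: v1
kind: Pod
metadata:
  name: pod-a-critical            # illustrative name
spec:
  containers:
    - name: app
      image: example/app:latest   # placeholder image
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
        limits:
          cpu: "3"
          memory: 6Gi
---
# Pod B: batch job, small request with a large burst ceiling
apiVersion: v1
kind: Pod
metadata:
  name: pod-b-batch                 # illustrative name
spec:
  containers:
    - name: batch
      image: example/batch:latest   # placeholder image
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
        limits:
          cpu: "4"
          memory: 4Gi
```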
QoS Classes
QoS classes: who gets evicted first under pressure
Guaranteed: For every container in the pod, CPU and memory have requests == limits.
Highest priority for memory; least likely to be evicted.
Burstable: At least one container has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria.
Middle priority.
BestEffort: No requests/limits set.
Evicted first. Avoid in production.
Tip: Make critical control‑plane‑adjacent workloads Guaranteed when possible.
Applying this to the two pods above:
Pod A → Burstable (requests ≠ limits) but well-protected.
Pod B → Burstable but riskier (low request, high limit).
If they had no requests/limits → BestEffort → first to be killed.
If requests == limits → Guaranteed → strongest QoS protection.
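For instance, giving Pod A identical requests and limits would move it into the Guaranteed class. A minimal sketch of the resources block (values are illustrative):

```yaml
# Guaranteed variant: requests == limits for every container in the pod
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 4Gi
```

Kubernetes derives the QoS class automatically from these values and records it in the pod's status.qosClass field.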
Golden Rule
The scheduler only considers requests, never limits.
Scheduler = “Where do I put it?”
Kubelet/cgroups = “How much can it actually eat?”
If requests are wrong, the cluster gets mispacked → wasted capacity, noisy neighbors, and unpredictable performance.
Deep Dive Use Case
Understanding Kubernetes Resource Management
If you're just getting into Kubernetes, one of the most misunderstood yet critical topics is resource management. Whether you’re deploying a single microservice or running a production-grade system, how you configure CPU and memory goes a long way toward deciding how stable, efficient, and secure your workloads are.
Let’s break it down from the basics and gradually build towards real-world application.
Requests and Limits - What They Really Mean
Containers run inside pods, and every container consumes compute resources. But you don’t just let them use as much as they want. You tell Kubernetes two things:
Request: This is the minimum resource the container is guaranteed. Kubernetes uses this during scheduling.
Limit: This is the upper cap. A container cannot exceed this.
Think of it like booking a hotel room.
The "request" is the space you block ahead of time.
The "limit" is the maximum you’re allowed to use. You can’t burst beyond it.

In case of CPU:
If usage exceeds the request but stays under the limit, the container keeps running; when the node is under contention, the extra CPU is shared in proportion to requests.
If it tries to go beyond the limit, it is throttled immediately, CPU usage is never allowed past the limit.
For memory:
If it uses more than requested, that’s allowed, but if it crosses the limit, the container is OOMKilled (Out Of Memory).
A real-world pod manifest might look something like this (names and values here are illustrative):
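```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app            # illustrative name
spec:
  containers:
    - name: web
      image: nginx:1.27    # placeholder image
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi
```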

This kind of setup is essential in shared clusters. It avoids noisy neighbor issues and makes sure pods don’t hog all the resources.
Next, you need to know about Resource Quotas.
A Resource Quota sets hard limits on how much CPU, memory, and other resources a namespace can use. It prevents teams from overusing shared cluster resources and ensures fair access across environments.
How Kubernetes Applies Resource Quotas

The API server receives the pod request and runs admission checks.
The ResourceQuota admission plugin checks whether the pod’s CPU and memory requests fit within the namespace quota.
If over quota, the request is denied before scheduling.
If valid, usage is updated and the pod is linked to a service account.
The final state is saved in etcd for tracking and enforcement.
Having established how requests, limits, and resource quotas work in Kubernetes, let’s discuss with visuals how they are structured per namespace and how they apply directly to pods and containers.
Each namespace in a Kubernetes cluster can have its own ResourceQuota. This lets platform teams allocate fixed CPU and memory budgets for different teams, environments, or applications.
For example, suppose Namespace 1 and Namespace 2 are each assigned a quota of 250 millicores of CPU and 600 MiB of memory, as sketched below.
All pods running inside that namespace must fit within that total budget, across all containers.
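A minimal sketch of one of these quota objects (the object name is illustrative; the budget is applied to declared requests):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota          # illustrative name
  namespace: namespace-1    # an equivalent object would exist in namespace-2
spec:
  hard:
    requests.cpu: 250m      # total CPU that pods in the namespace may request
    requests.memory: 600Mi  # total memory that pods in the namespace may request
```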

When a pod is created, Kubernetes looks at the sum of all current requests in that namespace.
It adds the new pod’s request values to the total.
If the result crosses the namespace’s quota, the pod is rejected by the API server.
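To make the check concrete (numbers are illustrative): if pods in namespace-1 already request 200m CPU against the 250m quota, a new pod asking for another 100m pushes the declared total to 300m and is rejected at admission. In manifest form:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: one-pod-too-many             # illustrative name
  namespace: namespace-1
spec:
  containers:
    - name: worker
      image: example/worker:latest   # placeholder image
      resources:
        requests:
          cpu: 100m       # 200m already used + 100m = 300m > 250m quota, so the pod is rejected
          memory: 100Mi
```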
Here’s the catch:
Teams forget the quota is cumulative and assume it applies to each pod.
Quota usage doesn’t reset unless pods are deleted or updated, so stale resources stay counted.
If you don’t define requests in your pod spec and a quota is set for that resource, the API server rejects the pod outright (unless a LimitRange supplies defaults).
Always make sure:
Every pod defines requests and limits.
Teams are aware of their namespace budgets.
Quota values are regularly audited and updated based on real usage.

Resource quotas operate at the namespace level but track usage based on pod level requests and limits.
This means quotas do not directly limit running usage, but they block new pod creation if declared requests push the namespace over the quota. It’s not the actual usage that matters, it’s what you declare.
Even a small pod with multiple containers can breach the quota if each container sets a request, leading to failed deployments, as sketched below.
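A sketch of how this plays out (values are illustrative): two modest containers in one pod still count as the sum of their requests against the quota.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar              # illustrative name
spec:
  containers:
    - name: app
      image: example/app:latest       # placeholder image
      resources:
        requests:
          cpu: 150m
          memory: 300Mi
    - name: sidecar
      image: example/sidecar:latest   # placeholder image
      resources:
        requests:
          cpu: 150m
          memory: 300Mi
# Declared total: 300m CPU and 600Mi memory. A 250m CPU quota is already breached
# even though each container looks small on its own.
```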
Final Takeaways
1. Start with sensible defaults
Use LimitRange to enforce minimum and maximum requests and limits for all pods in a namespace (a sketch follows after this list).
2. Use kubectl describe quota
Regularly check actual resource usage vs quota per namespace. This helps avoid surprises during deployments.
3. Monitor and alert on throttling and OOM kills
Use Prometheus metrics like container_cpu_cfs_throttled_seconds_total, and alert on OOMKilled container terminations, to catch issues early.
4. Revisit quotas monthly
Teams grow. Workloads shift. Quotas that worked last quarter may now block scale. Adjust based on usage patterns.
5. Automate dry runs
Add admission checks or GitOps CI jobs that simulate pod scheduling against quota before merging YAML to main.
6. Document team level budgets
Even if quotas are in place, maintain a central view of how much each team or environment is allocated and using.
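Tying back to item 1, a minimal LimitRange sketch for a namespace (names and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits    # illustrative name
  namespace: team-a       # illustrative namespace
spec:
  limits:
    - type: Container
      defaultRequest:     # used when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:            # used when a container omits limits
        cpu: 500m
        memory: 512Mi
      min:
        cpu: 50m
        memory: 64Mi
      max:
        cpu: "2"
        memory: 2Gi
```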


