
Understanding Kubernetes Architecture for Modern Applications in 2025

Updated: Aug 4


If you are seeking to:

  1. Comprehend the architecture of Kubernetes

  2. Acquire a thorough understanding of the fundamental concepts behind Kubernetes components

  3. Study the components of Kubernetes architecture

  4. Investigate the workflows that integrate these components

then this guide walks you through all of it, component by component.



A Kubernetes cluster consists of control plane nodes and worker nodes.


Control Plane

The control plane is tasked with container orchestration and ensuring the cluster's desired state is maintained. It includes the following components:

  1. kube-apiserver

  2. etcd

  3. kube-scheduler

  4. kube-controller-manager

  5. cloud-controller-manager

A cluster may comprise one or more control plane nodes.


Worker Node

The worker nodes are responsible for executing containerized applications. Each worker node includes the following components:

  1. kubelet

  2. kube-proxy

  3. Container runtime


Core Components of the Kubernetes Control Plane


First, let's take a look at each control plane component and the important concepts behind each component.



1. kube-apiserver


The kube-apiserver is the central hub of the Kubernetes cluster that exposes the Kubernetes API. It is highly scalable and can handle a large number of concurrent requests.


End users and other cluster components talk to the cluster via the API server. Occasionally, monitoring systems and third-party services also talk to the API server to interact with the cluster.


So when you use kubectl to manage the cluster, at the backend you are actually communicating with the API server through HTTP REST APIs over TLS.


Also, the communication between the API server and other components in the cluster happens over TLS to prevent unauthorized access to the cluster.
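To make that TLS-secured REST communication concrete, here is a minimal Go sketch that calls the API server directly with a client certificate, the same kind of request kubectl issues under the hood. The API server address and certificate paths are placeholder assumptions; adjust them for your cluster (in kubeadm clusters the CA typically lives under /etc/kubernetes/pki).

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Placeholder paths and address; adjust for your cluster.
	caCert, err := os.ReadFile("/etc/kubernetes/pki/ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caCert)

	// Client certificate used for authentication (one of the mechanisms kubectl can use).
	clientCert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		panic(err)
	}

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:      pool,
				Certificates: []tls.Certificate{clientCert},
			},
		},
	}

	// Roughly equivalent to "kubectl get pods -n default": a raw REST call over TLS.
	resp, err := client.Get("https://127.0.0.1:6443/api/v1/namespaces/default/pods")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	// Print only the beginning of the JSON response.
	n := 200
	if len(body) < n {
		n = len(body)
	}
	fmt.Println(string(body[:n]))
}
```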


The Kubernetes API server is responsible for the following functions:

  1. API Management: It exposes the cluster API endpoint and processes all API requests. The API is versioned and supports multiple API versions concurrently.

  2. Authentication (via client certificates, bearer tokens, and OpenID Connect tokens) and Authorization (through ABAC and RBAC evaluation).

  3. It processes API requests and validates data for API objects such as pods and services, utilizing Validation and Mutation Admission controllers.

  4. The API server coordinates all operations between the control plane and worker node components.

  5. The API server includes an aggregation layer, allowing for the extension of the Kubernetes API to create custom API resources and controllers.

  6. The only component to which the kube-apiserver initiates a connection is the etcd component. All other components connect to the API server.

  7. The API server supports watching resources for changes. Clients can establish a watch on specific resources and receive real-time notifications when those resources are created, modified, or deleted (see the sketch after this list).

  8. Each component (Kubelet, scheduler, controllers) independently monitors the API server to determine its tasks.
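Item 7 above describes the watch mechanism. Below is a minimal sketch using client-go that watches pod events in the default namespace; it assumes a kubeconfig at the standard location and the k8s.io/client-go module being available.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a client from the local kubeconfig (path is an assumption).
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Establish a watch on pods in the default namespace.
	watcher, err := clientset.CoreV1().Pods("default").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	// Receive ADDED / MODIFIED / DELETED notifications as they happen.
	for event := range watcher.ResultChan() {
		pod, ok := event.Object.(*corev1.Pod)
		if !ok {
			continue
		}
		fmt.Printf("%s: %s/%s\n", event.Type, pod.Namespace, pod.Name)
	}
}
```

This is the same pattern the kubelet, scheduler, and controllers rely on: rather than polling, they watch the API server and react to change notifications.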



Additionally, the API server includes a built-in apiserver proxy.

This proxy is part of the API server process and is primarily used to enable access to ClusterIP services from outside the cluster.

Security Note: To minimize the cluster's attack surface, securing the API server is essential. The Shadowserver Foundation conducted an experiment revealing 380,000 publicly accessible Kubernetes API servers.



2. etcd

Kubernetes operates as a distributed system and requires a robust distributed database like etcd to support its architecture. etcd functions as both the backend service discovery mechanism and the cluster database, essentially serving as the brain of the Kubernetes cluster.


etcd is an open-source, strongly consistent, distributed key-value store. What does this entail?


  1. Strongly consistent: When an update occurs on one node, strong consistency ensures that the update is immediately reflected across all nodes in the cluster. According to the CAP theorem, achieving 100% availability while maintaining strong consistency and partition tolerance is unattainable.

  2. Distributed: etcd is engineered to operate across multiple nodes as a cluster without compromising consistency.

  3. Key-value store: This non-relational datastore keeps data as key-value pairs and offers a key-value API. The datastore is built on bbolt, a fork of BoltDB.

etcd employs the Raft consensus algorithm to maintain strong consistency and availability. It operates with a leader and follower members to ensure high availability and resilience against node failures.


How does etcd integrate with Kubernetes?


In essence, when you use kubectl to retrieve Kubernetes object details, the information is sourced from etcd. Similarly, when deploying an object like a pod, an entry is created in etcd.

In summary, here are the key points about etcd:

  1. etcd stores all configurations, states, and metadata of Kubernetes objects (e.g., pods, secrets, daemonsets, deployments, configmaps, statefulsets, etc.).

  2. etcd allows clients to subscribe to events using the Watch() API. The Kubernetes API server utilizes etcd’s watch functionality to monitor changes in object states.

  3. etcd exposes a key-value API using gRPC. Additionally, the gRPC gateway acts as a RESTful proxy, translating HTTP API calls into gRPC messages, making it an ideal database for Kubernetes.

  4. etcd stores all objects under the /registry directory key in a key-value format. For instance, details of a pod named nginx in the default namespace can be found under /registry/pods/default/nginx.
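To see how objects land under /registry, here is a hedged sketch using the official go.etcd.io/etcd/client/v3 package. It assumes direct access to the etcd endpoint and its client certificates (in kubeadm clusters these live under /etc/kubernetes/pki/etcd), which is normally restricted to the control plane.

```go
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder TLS setup; a real client must load the etcd CA and client cert/key.
	tlsConfig := &tls.Config{ /* certificates omitted for brevity */ }

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
		TLS:         tlsConfig,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// List every key stored for pods in the default namespace, e.g. /registry/pods/default/nginx.
	// Keys only: the values are stored in a binary (protobuf) encoding.
	resp, err := cli.Get(ctx, "/registry/pods/default/", clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
}
```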


Additionally, etcd is the only stateful component in the control plane.

The fault tolerance of an etcd cluster is directly influenced by the number of nodes it contains. The breakdown is as follows:

  1. 3 nodes: Can tolerate 1 node failure (quorum = 2)

  2. 5 nodes: Can tolerate 2 node failures (quorum = 3)

  3. 7 nodes: Can tolerate 3 node failures (quorum = 4)

And so forth. The general formula for determining the number of node failures a cluster can withstand is:

fault tolerance = floor((n - 1) / 2)

Where n represents the total number of nodes.
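As a quick sanity check of the formula, the small Go helper below computes the quorum and fault tolerance for a given cluster size and reproduces the table above.

```go
package main

import "fmt"

// quorum is the minimum number of healthy members needed for the cluster to accept writes.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance is how many members can fail while the cluster stays available.
func faultTolerance(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d nodes: quorum=%d, tolerates %d failure(s)\n", n, quorum(n), faultTolerance(n))
	}
}
```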


3. kube-scheduler

The kube-scheduler is tasked with scheduling Kubernetes pods on worker nodes.


When deploying a pod, you specify its requirements, such as CPU and memory requests, affinity and anti-affinity rules, tolerations, priority, and persistent volumes (PVs), among others. The scheduler's main function is to recognize the creation request and select the most suitable node that meets these requirements.

Here is the process by which the scheduler operates:


  1. The kube-scheduler employs filtering and scoring operations to identify the optimal node.

  2. During filtering, the scheduler identifies the nodes on which the pod can be scheduled. For instance, if five worker nodes have sufficient resources, all five pass the filter. If no node qualifies, the pod is marked unschedulable and placed in the scheduling queue. In large clusters, the scheduler does not evaluate every node on each pass. Instead, it uses a configuration parameter called percentageOfNodesToScore, whose default is adaptive: roughly 50% for small and mid-sized clusters, dropping to a floor of 5% for very large ones. When worker nodes span multiple zones, the scheduler samples nodes across zones so that candidates are drawn from different zones. (A simplified sketch of the filter-and-score flow follows this list.)

  3. In the scoring phase, the scheduler assigns scores to the filtered worker nodes. It utilizes multiple scheduling plugins to perform this scoring. The node with the highest score is selected for scheduling the pod. If nodes have identical scores, a node is chosen randomly.

  4. After a node is selected, the scheduler creates a binding event in the API server, effectively binding the pod to the node.
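The snippet below is a deliberately simplified illustration of the filter-then-score idea, not the actual kube-scheduler code: nodes are filtered on requested CPU and memory, scored on remaining capacity, and the highest scorer wins.

```go
package main

import (
	"fmt"
	"sort"
)

type node struct {
	name             string
	freeCPU, freeMem int64 // millicores, MiB
}

type podRequest struct {
	cpu, mem int64
}

// filter keeps only nodes that can fit the pod's requests.
func filter(nodes []node, req podRequest) []node {
	var feasible []node
	for _, n := range nodes {
		if n.freeCPU >= req.cpu && n.freeMem >= req.mem {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// score prefers the node with the most remaining capacity after placement.
func score(n node, req podRequest) int64 {
	return (n.freeCPU - req.cpu) + (n.freeMem - req.mem)
}

func main() {
	nodes := []node{
		{"node-1", 500, 1024},
		{"node-2", 2000, 4096},
		{"node-3", 100, 256},
	}
	req := podRequest{cpu: 250, mem: 512}

	feasible := filter(nodes, req)
	if len(feasible) == 0 {
		fmt.Println("pod is unschedulable; it stays in the scheduling queue")
		return
	}
	sort.Slice(feasible, func(i, j int) bool {
		return score(feasible[i], req) > score(feasible[j], req)
	})
	fmt.Println("selected node:", feasible[0].name) // node-2 in this toy example
}
```

The real scheduler runs many filtering and scoring plugins (node affinity, taints and tolerations, topology spread, and so on), but the overall shape is the same.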


4. kube-controller-manager

What is a controller? Controllers are programs that operate continuous control loops, monitoring the actual versus desired state of objects. If discrepancies are detected, they ensure that the Kubernetes resource or object reaches the desired state.


According to the official documentation,

In Kubernetes, controllers are control loops that observe the cluster's state and make or request changes as necessary. Each controller aims to align the current cluster state with the desired state.


For instance, when creating a deployment, the desired state is specified in the manifest YAML file (a declarative approach). This might include details like two replicas, a volume mount, and a configmap. The built-in deployment controller ensures that the deployment consistently remains in the desired state. If a user updates the deployment to five replicas, the deployment controller detects this and adjusts to maintain five replicas as the desired state.
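The following is a minimal, hypothetical sketch of the control-loop pattern, not the actual deployment controller: it compares the desired replica count with the actual count and acts to close the gap on each iteration.

```go
package main

import (
	"fmt"
	"time"
)

// clusterState stands in for what a real controller would read from the API server.
type clusterState struct {
	desiredReplicas int
	actualReplicas  int
}

// reconcile drives the actual state toward the desired state, one step at a time.
func reconcile(s *clusterState) {
	switch {
	case s.actualReplicas < s.desiredReplicas:
		s.actualReplicas++ // a real controller would create a pod via the API server
		fmt.Println("scaling up, actual =", s.actualReplicas)
	case s.actualReplicas > s.desiredReplicas:
		s.actualReplicas-- // ...or delete one
		fmt.Println("scaling down, actual =", s.actualReplicas)
	default:
		fmt.Println("in sync, nothing to do")
	}
}

func main() {
	state := &clusterState{desiredReplicas: 5, actualReplicas: 2}

	// The control loop: observe, compare, act, repeat.
	for i := 0; i < 5; i++ {
		reconcile(state)
		time.Sleep(100 * time.Millisecond)
	}
}
```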


The kube-controller-manager is the component that runs all of these Kubernetes controllers as a single process. Resources and objects such as pods, namespaces, jobs, and replicasets are managed by their respective controllers. (The kube-scheduler, although it also follows the controller pattern, is a separate control plane component and is not managed by the kube-controller-manager.)


Below is a list of significant built-in Kubernetes controllers:

  1. Deployment controller

  2. Replicaset controller

  3. DaemonSet controller

  4. Job Controller (Kubernetes Jobs)

  5. CronJob Controller

  6. Endpoints controller

  7. Namespace controller

  8. Service accounts controller

  9. Node controller


Key points about the kube-controller-manager:


  1. It administers all controllers, which work to maintain the cluster in the desired state.

  2. Kubernetes can be extended with custom controllers associated with a custom resource definition.


5. Cloud Controller Manager (CCM)

In cloud environments, when Kubernetes is deployed, the Cloud Controller Manager serves as a conduit between Cloud Platform APIs and the Kubernetes cluster.


This architecture allows the core Kubernetes components to operate independently, enabling cloud providers to integrate with Kubernetes through plugins. For instance, it acts as an interface between a Kubernetes cluster and AWS cloud API.


Cloud controller integration enables a Kubernetes cluster to provision cloud resources such as instances (for nodes), Load Balancers (for services), and Storage Volumes (for persistent volumes).


The Cloud Controller Manager comprises a set of cloud platform-specific controllers that maintain the desired state of cloud-specific components (nodes, Load Balancers, storage, etc.). The following are the three primary controllers within the Cloud Controller Manager.

  1. Node controller: This controller updates node-related information by interfacing with the cloud provider API. Examples include node labeling and annotation, obtaining hostname, CPU and memory availability, node health, etc.

  2. Route controller: It is tasked with configuring networking routes on a cloud platform, enabling communication between pods on different nodes.

  3. Service controller: This controller manages the deployment of load balancers for Kubernetes services, IP address assignment, etc.


Below are some classic examples of the Cloud Controller Manager's functionality.


  1. Deploying a Kubernetes Service of type LoadBalancer, where Kubernetes provisions a cloud-specific load balancer and integrates it with the Kubernetes Service (a sketch of this follows the list).

  2. Provisioning storage volumes (PV) for pods supported by cloud storage solutions.
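As a sketch of the first example, the client-go snippet below creates a Service of type LoadBalancer; on a cloud cluster, the cloud-controller-manager's service controller would react by provisioning a cloud load balancer for it. The service name, selector, ports, and kubeconfig path are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// A Service of type LoadBalancer; the cloud-controller-manager notices it
	// and provisions a cloud load balancer that forwards to the backing pods.
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "web-lb", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "web"},
			Ports: []corev1.ServicePort{
				{Port: 80, TargetPort: intstr.FromInt(8080)},
			},
		},
	}

	created, err := clientset.CoreV1().Services("default").Create(context.Background(), svc, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created service:", created.Name)
}
```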


Overall, the Cloud Controller Manager oversees the lifecycle of cloud-specific resources utilized by Kubernetes.

Core Components of a Kubernetes Worker Node

Let's examine each of the worker node components.

1. Kubelet

Kubelet is an agent component that runs on every node in the cluster. It typically runs as a daemon managed by systemd, rather than as a container.

It is responsible for registering worker nodes with the API server and working with the podSpec (Pod specification - YAML or JSON) primarily obtained from the API server. The podSpec defines the containers to run inside the pod, their resources (e.g., CPU and memory limits), and other settings such as environment variables, volumes, and labels.

It then brings the podSpec to its desired state by creating containers.


In summary, kubelet is responsible for the following tasks:

  1. Creating, modifying, and deleting containers for the pod.

  2. Managing liveness, readiness, and startup probes.

  3. Mounting volumes by reading pod configuration and creating corresponding directories on the host for volume mounts.

  4. Collecting and reporting node and pod status to the API server, using cAdvisor and the CRI.

Kubelet also acts as a controller, monitoring pod changes and utilizing the node's container runtime to pull images, run containers, etc.


In addition to podSpecs from the API server, kubelet can accept podSpecs from a file or an HTTP endpoint. A notable example of “podSpec from a file” is Kubernetes static pods.


Static pods are managed directly by the kubelet, not by the API server.

This means pods can be created by pointing the kubelet at a pod YAML location. The kubelet runs these static pods itself; the API server only exposes a read-only mirror pod for them.


Here is a practical use case of a static pod:

During the bootstrapping of the control plane, kubelet starts the API server, scheduler, and controller manager as static pods from podSpecs located at /etc/kubernetes/manifests.


Here are some key aspects of kubelet:

  1. Kubelet uses the CRI (Container Runtime Interface) gRPC interface to communicate with the container runtime.

  2. It also exposes an HTTP endpoint to stream logs and provides exec sessions for clients.

  3. Utilizes the CSI (Container Storage Interface) gRPC interface to configure volumes.

  4. Employs the CNI plugin configured in the cluster to allocate the pod IP address and set up necessary network routes and firewall rules for the pod.


2. Kube Proxy

To comprehend Kube Proxy, it is essential to have a fundamental understanding of Kubernetes Service and Endpoint objects.

In Kubernetes, a Service is a method to expose a group of pods either internally or to external traffic. Upon creating a Service object, it is assigned a virtual IP, known as ClusterIP, which is only accessible within the Kubernetes cluster.

The Endpoints object contains all the IP addresses and ports of the pods backing a Service object. The Endpoints (and EndpointSlice) controller watches Services and pods and keeps this list of pod IP addresses (endpoints) up to date for each Service.

The ClusterIP cannot be pinged as it is solely used for service discovery, unlike pod IPs which are pingable.
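To make the Service-to-Endpoints relationship concrete, here is a small client-go sketch that reads a Service and its matching Endpoints object. The Service name web, the namespace default, and the kubeconfig path are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// The Service carries the virtual ClusterIP...
	svc, err := clientset.CoreV1().Services("default").Get(ctx, "web", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("ClusterIP:", svc.Spec.ClusterIP)

	// ...while the Endpoints object of the same name lists the backing pod IPs and ports.
	eps, err := clientset.CoreV1().Endpoints("default").Get(ctx, "web", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, subset := range eps.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				fmt.Printf("endpoint: %s:%d\n", addr.IP, port.Port)
			}
		}
	}
}
```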


Now, let's explore Kube Proxy.

Kube-proxy is a daemon that runs on every node as a DaemonSet. It is the proxy component that implements the Kubernetes Service concept for pods, providing a stable virtual IP with load balancing for a set of pods (DNS names for Services are handled by CoreDNS, not kube-proxy). It proxies TCP, UDP, and SCTP, and does not understand HTTP.

When pods are exposed using a Service (ClusterIP), Kube-proxy establishes network rules to direct traffic to the backend pods (endpoints) associated with the Service object. This means that all load balancing and service discovery are managed by Kube Proxy.

How does Kube-proxy function?

Kube Proxy communicates with the API server to obtain details about the Service (ClusterIP) and the corresponding pod IPs and ports (endpoints). It also monitors changes in services and endpoints.


Kube-proxy then employs one of the following modes to create or update rules for routing traffic to pods behind a Service:


  1. IPTables: This is the default mode. In IPTables mode, traffic is managed by IPtable rules, which are created for each service. These rules capture traffic directed to the ClusterIP and forward it to the backend pods. In this mode, kube-proxy randomly selects a backend pod for load balancing. Once a connection is established, requests are sent to the same pod until the connection is terminated.

  2. NFTables: A newer mode that addresses the performance and scalability limitations of iptables, especially in large clusters with thousands of services.

  3. IPVS: For clusters with more than 1000 services, IPVS offers enhanced performance. It supports various load-balancing algorithms for the backend.

    1. rr: round-robin (default mode).

    2. lc: least connection (fewest open connections).

    3. dh: destination hashing.

    4. sh: source hashing.

    5. sed: shortest expected delay.

    6. nq: never queue.

  4. Userspace: Legacy mode, removed in recent Kubernetes releases.

  5. Kernelspace: This mode is exclusive to Windows systems.



3. Container Runtime

You might be familiar with Java Runtime (JRE), which is the software necessary to run Java programs on a host. Similarly, a container runtime is a software component required to run containers.

The container runtime operates on all nodes in the Kubernetes cluster. It is responsible for pulling images from container registries, running containers, allocating and isolating resources for containers, and managing the entire lifecycle of a container on a host.


To gain a deeper understanding, consider these two key concepts:

  1. Container Runtime Interface (CRI): This is a set of APIs that allows Kubernetes to interact with various container runtimes. It enables different container runtimes to be used interchangeably with Kubernetes. The CRI defines the API for creating, starting, stopping, and deleting containers, as well as managing images and container networks.

  2. Open Container Initiative (OCI): This is a set of standards for container formats and runtimes.


Kubernetes supports multiple container runtimes (CRI-O, containerd, Docker Engine via cri-dockerd, etc.) that comply with the Container Runtime Interface (CRI). All of these runtimes implement the CRI interface and expose gRPC CRI APIs (runtime and image service endpoints).

How does Kubernetes utilize the container runtime?

As discussed in the Kubelet section, the kubelet agent interacts with the container runtime using CRI APIs to manage the lifecycle of a container. It also retrieves all container information from the container runtime and provides it to the control plane.
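As a rough illustration of the CRI gRPC interface (an assumption-laden sketch, run on a node with access to the runtime's socket), the code below connects to containerd's CRI endpoint and asks for its version, which is essentially what crictl version does.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// containerd's CRI socket; other runtimes expose their own (e.g. CRI-O at /var/run/crio/crio.sock).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// The same RuntimeService API the kubelet uses to manage container lifecycles.
	client := runtimeapi.NewRuntimeServiceClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	resp, err := client.Version(ctx, &runtimeapi.VersionRequest{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("runtime: %s %s (CRI %s)\n", resp.RuntimeName, resp.RuntimeVersion, resp.RuntimeApiVersion)
}
```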


Kubernetes Cluster Addon Components

Beyond the core components, a Kubernetes cluster requires addon components to be fully operational. The choice of an addon depends on the project requirements and use cases.


Below are some popular addon components that might be needed in a cluster:

  1. CNI Plugin (Container Network Interface)

  2. CoreDNS (For DNS server): CoreDNS functions as a DNS server within the Kubernetes cluster, enabling DNS-based service discovery.

  3. Metrics Server (For Resource Metrics): This addon collects performance data and resource usage of nodes and pods in the cluster.

  4. Web UI (Kubernetes Dashboard): This addon enables the Kubernetes dashboard for managing objects via a web UI.


1. CNI Plugin

First, it is important to understand the Container Networking Interface (CNI)

CNI is a plugin-based architecture with vendor-neutral specifications and libraries for creating network interfaces for containers.

It is not specific to Kubernetes. CNI standardizes container networking across orchestration tools like Kubernetes, Mesos, CloudFoundry, Podman, Docker, etc.

In container networking, companies may have different requirements such as network isolation, security, and encryption. As container technology evolved, many network providers developed CNI-based solutions for containers with diverse networking capabilities. These are referred to as CNI Plugins.

This allows users to select a networking solution that best fits their needs from various providers.


How does the CNI Plugin integrate with Kubernetes?

  1. The kube-controller-manager assigns a pod CIDR to each node, and each pod receives a unique IP address from its node's pod CIDR (see the sketch after this list).

  2. Kubelet interacts with the container runtime to launch the scheduled pod. The CRI plugin, part of the container runtime, interacts with the CNI plugin to configure the pod network.

  3. The CNI Plugin facilitates networking between pods across the same or different nodes using an overlay network.
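Relating to step 1 above, the sketch below uses client-go to print each node's assigned pod CIDR and the pod IPs allocated from it. The kubeconfig path is assumed, and note that some managed setups and CNIs (such as Amazon VPC CNI) allocate pod IPs without using node pod CIDRs.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()

	// Pod CIDR assigned to each node by the kube-controller-manager (when node IPAM is in use).
	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		fmt.Printf("node %s podCIDR=%s\n", n.Name, n.Spec.PodCIDR)
	}

	// Pod IPs handed out by the CNI plugin from those ranges.
	pods, err := clientset.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("pod %s ip=%s node=%s\n", p.Name, p.Status.PodIP, p.Spec.NodeName)
	}
}
```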



The following are high-level functionalities provided by CNI plugins:

  1. Pod Networking

  2. Pod network security and isolation using Network Policies to control traffic flow between pods and namespaces.


Some popular CNI plugins include:

  1. Calico

  2. Flannel

  3. Weave Net

  4. Cilium (Uses eBPF)

  5. Amazon VPC CNI (For AWS VPC)

  6. Azure CNI (For Azure Virtual network)


Kubernetes Native Objects

Until now, we have explored the core Kubernetes components and their functionalities.


All these components work towards managing the following key Kubernetes objects:

  1. Pod

  2. Namespaces

  3. Replicaset

  4. Deployment

  5. Daemonset

  6. Statefulset

  7. Jobs & Cronjobs

  8. ConfigMaps and Secrets


Regarding networking, the following Kubernetes objects play a crucial role:

  1. Services

  2. Ingress

  3. Network policies

Additionally, Kubernetes can be extended using CRDs and Custom Controllers. Therefore, cluster components also manage objects created using custom controllers and custom resource definitions.


