

Kubernetes Monitoring with Prometheus and CI/CD Pipelines

Before jumping into Prometheus or Kubernetes specific tooling, it’s important to align on what monitoring actually is and why it matters.


What is Monitoring?

Monitoring is the process of collecting, processing, and analyzing system-level data to understand the health and performance of your infrastructure and applications. It answers questions like:


Is the application running as expected?


Are there bottlenecks or anomalies?


Is resource usage within safe limits?


Monitoring is not the same as alerting; alerting is an outcome derived from monitoring.



The Four Golden Signals

Google’s Site Reliability Engineering (SRE) guidelines define four key metrics to monitor any system:


Latency – How long does it take to serve a request?


Traffic – How much demand is being placed on your system?


Errors – What is the rate of failing requests?


Saturation – How full is your service (CPU, memory, I/O, etc.)?


These signals help prioritize what to observe in a Kubernetes environment, and Prometheus is designed to collect these metrics at scale.


Why Prometheus for Kubernetes Monitoring?

Prometheus is a pull-based time-series database with native Kubernetes support. It discovers targets automatically, scrapes metrics over HTTP endpoints, and stores the data locally for analysis and alerting. Reasons Prometheus fits Kubernetes well:


Native support for Kubernetes Service Discovery


Works seamlessly with kube-state-metrics and node-exporter


Rich query language (PromQL)


Integration with Alertmanager and Grafana
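
As an illustration of the service-discovery support, a minimal scrape configuration that discovers pods automatically might look like the sketch below. The job name and annotation convention are common practice, not taken from this article:

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod            # discover every pod in the cluster
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```

With this in place, new pods carrying the annotation are picked up without any change to the Prometheus config.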


Typical Kubernetes Monitoring Pipeline:



1. Metric Sources

Application level metrics (/metrics endpoint with libraries like prometheus-client)


Node-level metrics via node-exporter


Cluster state metrics via kube-state-metrics
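
In practice you expose application metrics with a client library such as prometheus-client rather than by hand. Purely to illustrate the text exposition format a /metrics endpoint returns, here is a minimal stdlib-only sketch; the metric name app_requests_total and its labels are made-up examples:

```python
# Sketch of the Prometheus text exposition format served by a
# /metrics endpoint. Real services should use a client library
# (e.g. prometheus-client) instead of hand-rolling this.

def render_counter(name: str, help_text: str, value: float, labels: dict) -> str:
    """Render one counter in Prometheus text format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{{{label_str}}} {value}\n"
    )

if __name__ == "__main__":
    print(render_counter(
        "app_requests_total",
        "Total HTTP requests handled.",
        42,
        {"app": "demo", "env": "dev"},
    ))
```

Prometheus scrapes exactly this plain-text output from each target on every scrape interval.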


2. Prometheus Server

Scrapes metrics from above sources


Applies relabeling and filters via config


Stores time-series data locally
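
The relabeling step is how you keep the stored data manageable. As a sketch, this configuration scrapes node-exporter and drops one metric series before storage (the target address and dropped metric are illustrative):

```yaml
scrape_configs:
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]   # placeholder address
    metric_relabel_configs:
      # drop a metric we do not need before it is stored
      - source_labels: [__name__]
        action: drop
        regex: "node_filesystem_device_error"
```

relabel_configs runs before the scrape (shaping targets and labels), while metric_relabel_configs runs after the scrape and before storage.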


3. Alerting

Prometheus pushes alert conditions to Alertmanager


Alertmanager routes alerts to email, Slack, PagerDuty, etc.


You define conditions such as: fire when job_duration_seconds stays above its 95th percentile for 5 minutes.
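
As a sketch, a Prometheus alerting rule and a matching Alertmanager route might look like this; rule names, thresholds, and the Slack webhook are all illustrative placeholders:

```yaml
# prometheus-rules.yaml (illustrative)
groups:
  - name: job-alerts
    rules:
      - alert: LongRunningJob
        expr: job_duration_seconds > 300   # placeholder threshold
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Job running longer than expected"
---
# alertmanager.yaml (illustrative)
route:
  receiver: slack-oncall
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: "https://hooks.slack.com/services/placeholder"
        channel: "#alerts"
```

Prometheus evaluates the rule and pushes firing alerts to Alertmanager, which handles grouping, silencing, and routing to the receiver.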


4. Visualization

Data from Prometheus is queried via Grafana


Dashboards show node health, pod status, memory, CPU usage, and custom app metrics


Example PromQL to alert on long-running jobs:


max_over_time(kube_job_duration_seconds_sum[5m])
  /
max_over_time(kube_job_duration_seconds_count[5m])
> 120

This triggers when the average job duration over the last 5 minutes exceeds 2 minutes.
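
One way to make such an expression reusable across dashboards and alerts is to precompute it as a recording rule; the rule name below is an illustrative example, not a standard one:

```yaml
groups:
  - name: job-duration
    rules:
      - record: job:duration_seconds:avg_5m   # illustrative rule name
        expr: |
          max_over_time(kube_job_duration_seconds_sum[5m])
            /
          max_over_time(kube_job_duration_seconds_count[5m])
```

Alerts and Grafana panels can then query the precomputed series instead of re-evaluating the full expression.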


Having established the basics of monitoring and what a pipeline looks like, let us move on to how the metric flow actually happens behind the scenes inside a Kubernetes cluster.



The default source of resource metrics like CPU and memory usage in Kubernetes is the metrics-server. It powers commands like:


kubectl top pod

kubectl top node

Here’s how the flow works internally:


Each kubelet on a worker node collects resource usage from the cAdvisor component and aggregates pod-level metrics.


Metrics-server, deployed inside the cluster, pulls these metrics from all kubelets using the Summary API.


The metrics-server then exposes this data through its own API endpoint.


The API server fetches metrics from the metrics-server, making it available to kubectl and Horizontal Pod Autoscalers.


When you run kubectl top, it queries the API server, which relays data coming from metrics-server.


This flow is purely for resource usage monitoring, not event based alerting or long term storage. For detailed, persistent, and custom metrics, Prometheus must be added on top.
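
The "exposes this data through its own API endpoint" step works via Kubernetes API aggregation: metrics-server registers an APIService object so the API server forwards metrics.k8s.io requests to it. The manifest looks roughly like this (abbreviated from the standard metrics-server deployment):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  version: v1beta1
  service:
    name: metrics-server      # the in-cluster Service fronting metrics-server
    namespace: kube-system
  groupPriorityMinimum: 100
  versionPriority: 100
```

This registration is what lets kubectl top and the Horizontal Pod Autoscaler talk to the normal API server and transparently receive data from metrics-server.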


Before creating a pipeline like this, consider these actionable insights:


Define clear SLOs and SLIs

Know what "healthy" looks like for your services. Define latency, error rate, and availability objectives up front.


Standardize metric exposure

Use a consistent /metrics endpoint format across services using Prometheus client libraries. Avoid custom metric formats.


Tag everything with labels

Ensure all metrics include labels like app, env, namespace, and job to enable flexible querying and filtering later.
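
Consistent labels pay off at query time. For example, a hypothetical query (metric and label values are made up for illustration) that slices error rate by namespace for one app in production:

```promql
# 5xx error rate for one app in prod, grouped by namespace
sum by (namespace) (
  rate(http_requests_total{app="checkout", env="prod", status=~"5.."}[5m])
)
```

Without the app and env labels in place from day one, queries like this require brittle name-matching instead of simple selectors.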


Start with node, pod, and job-level metrics

Include node-exporter, kube-state-metrics, and kubelet as part of your base setup. These cover 80 percent of operational use cases.


Limit alert fatigue

Don’t start with hundreds of alerts. Begin with a few high-signal rules tied to the four golden signals and refine from real incidents.


Plan data retention and storage

Prometheus stores time-series data on disk. Configure retention, remote storage, or Thanos if you're scaling beyond default limits.
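
Retention is set with server flags rather than in prometheus.yml. As a sketch, the relevant excerpt of a Prometheus Deployment might look like this (image tag and paths are illustrative):

```yaml
# Excerpt of a Prometheus Deployment spec (illustrative)
containers:
  - name: prometheus
    image: prom/prometheus:latest
    args:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=15d"   # keep 15 days of data
      - "--storage.tsdb.path=/prometheus"
```

If you need longer retention or a global view across clusters, remote-write to long-term storage or a system like Thanos sits alongside these flags.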


Separate environments

Keep staging and production metrics isolated. Use federation or label filters if you need aggregate views.


These foundations ensure your monitoring pipeline is scalable, actionable, and maintainable from day one.


 
 
 
