Prometheus is a popular choice for application monitoring. It is easy to set up and can be deployed alongside the applications inside a Kubernetes cluster. However, things get trickier when you cannot run your main Prometheus instance within the same cluster. At work we recently set up Prometheus monitoring for a web service which is deployed to independent Kubernetes clusters in different regions. The official documentation is a bit sparse on this topic. In this post I will show what I learned about combining the metrics of applications running in separate Kubernetes clusters.
In contrast to other monitoring systems, Prometheus follows a pull model. The monitoring targets have to expose their metrics on an HTTP endpoint from which Prometheus can “scrape” the data at its own pace. This is typically done by modules called “exporters”, e.g. the “node exporter” for system metrics. Together with Grafana, the visualization tool often used to create dashboards, the setup looks like this:
The exporters simply expose the metrics as an HTTP endpoint, say https://example-api.test/metrics. The content of the page then looks, for example, like this:
http_requests_total{method="get",code="400"} 3 1395076383000
http_requests_total{method="get",code="200"} 1555 1395076383000
http_requests_total{method="get",code="500"} 0 1395076383000
...
In this example one can see how many requests were served up to the time marked by the timestamp 1395076383000: 1555 HTTP GET requests were answered with status code 200, three with 400, and none resulted in a server error with code 500.
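Counter metrics like this are usually not read raw but converted into rates. As an illustration of how such a counter is typically consumed (the rule and alert names and the threshold here are hypothetical, not from the setup described in this post), a Prometheus rule file could look like this:

```yaml
# rules.yml -- hypothetical example; names and threshold are illustrative
groups:
  - name: example-api
    rules:
      # Per-second rate of server errors, averaged over the last 5 minutes
      - record: job:http_requests_errors:rate5m
        expr: rate(http_requests_total{code="500"}[5m])
      # Fire if the error rate stays above 0.1 requests/s for 10 minutes
      - alert: HighErrorRate
        expr: rate(http_requests_total{code="500"}[5m]) > 0.1
        for: 10m
```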
This is how Prometheus could be configured to scrape from such an endpoint:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

scrape_configs:
  - job_name: example-api
    honor_labels: true
    honor_timestamps: true
    metrics_path: /metrics
    scheme: https
    static_configs:
      - targets:
          - example-api.test
        labels:
          env: prod
          location: us
Monitoring a Kubernetes deployment
For an application hosted as a “Deployment” object on Kubernetes the same approach still works. We can expose the metrics as an HTTP endpoint and let Prometheus scrape it. However, one needs to take care to point Prometheus at the endpoint of every Kubernetes pod, and not to access it via an Ingress route and Service object, i.e. not via the outward-facing URL as in the first example above. The reason is that multiple replicas of the pod might exist. Even if that is not the case in normal operation, there might for example be an old and a new version during a rolling update. Each pod then only serves some of the requests, and with every scraping action Prometheus would get routed to a different pod. Every time it would see the metrics of another replica, and the numbers would be inconsistent and unusable.
To avoid this problem, Prometheus provides service discovery functionality for Kubernetes, the kubernetes_sd_config module. This module accesses the Kubernetes API to discover the pod objects and their IP addresses. Because these IP addresses are only reachable inside the Kubernetes cluster, Prometheus also needs to be run inside the cluster for this. With service discovery we get a setup as follows:
(Diagram: a Prometheus instance running inside the cluster pulls the metrics directly from the exporter endpoints of the individual pods.)
The following listing shows a configuration for Prometheus which scrapes the /metrics endpoint of all pods with a certain label (application=example-api) in the namespace example-api-prod in the same cluster. The name of the pod is added as an additional label. Here I excluded network port 9443 from scraping, because the same metrics endpoint was also provided on a different port. Prometheus creates one scraping target per pod and exposed network port, so the values would otherwise be duplicated.
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

scrape_configs:
  - job_name: 'example-api-pods'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - example-api-prod
        selectors:
          - label: "application=example-api"
            role: pod
    metrics_path: /metrics
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: '9443'
        action: drop
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
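The StatefulSet below mounts this configuration from a ConfigMap named prometheus-config. A sketch of how the scrape configuration above could be packaged (assuming the file name prometheus.yml, which the Prometheus image reads from /etc/prometheus/ by default):

```yaml
# Sketch: the key "prometheus.yml" becomes a file under the mount path
# /etc/prometheus/, where the Prometheus image looks for it by default.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: example-api-prod
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s
    scrape_configs:
      # ... scrape configuration from the listing above ...
```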
In order to get this up and running we also need to deploy Prometheus on Kubernetes. We can deploy it as a StatefulSet:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    cluster: us
    environment: prod
  name: prometheus
  namespace: example-api-prod
spec:
  replicas: 1
  selector:
    matchLabels:
      cluster: us
      environment: prod
      module: monitoring
  serviceName: prometheus  # required for StatefulSets; names the governing service
  template:
    metadata:
      labels:
        app: prometheus
        cluster: us
        environment: prod
        module: monitoring
    spec:
      containers:
        - image: prom/prometheus:v2.24.0
          name: prometheus
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 15
            periodSeconds: 20
          ports:
            - containerPort: 9090
              name: default
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            limits:
              cpu: 100m
              memory: 200Mi
            requests:
              cpu: 10m
              memory: 50Mi
          volumeMounts:
            - mountPath: /etc/prometheus/
              name: config-volume
            - mountPath: /prometheus
              name: prometheus-volume
      serviceAccountName: prometheus
      volumes:
        - configMap:
            name: prometheus-config
          name: config-volume
        - emptyDir: {}
          name: prometheus-volume
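The pod spec above references a service account named prometheus. For kubernetes_sd_config to list pods via the Kubernetes API, this account needs read permissions on the corresponding resources. Since discovery here is limited to a single namespace, a namespaced Role suffices; a minimal sketch (the exact rules may need adjusting for your setup):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: example-api-prod
---
# Read-only access to the pod objects that service discovery needs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus
  namespace: example-api-prod
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus
  namespace: example-api-prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: example-api-prod
```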
After deploying this and creating a port forwarding, we can browse the metrics in the Prometheus UI:
kubectl apply -f prometheus.yaml
kubectl port-forward prometheus-0 9090:9090
Note that with this configuration the data does not survive a container recreation, because I used an “emptyDir”, a temporary directory, as the data volume. This only makes sense when this Prometheus instance merely serves as a relay for some central instance, as described in the next section. But with this setup we could even deploy it as a “Deployment” instead of a “StatefulSet”.
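If the data should survive pod recreation instead, the StatefulSet could request persistent storage via volumeClaimTemplates rather than the emptyDir. A sketch (the storage size is an assumption):

```yaml
# Sketch: replace the "prometheus-volume" emptyDir with a persistent claim;
# the existing volumeMount at /prometheus can stay as it is.
spec:
  volumeClaimTemplates:
    - metadata:
        name: prometheus-volume
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi  # assumed size
```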
Centralizing monitoring across clusters
OK, now we have a working Prometheus inside our Kubernetes cluster which collects all the metrics of the locally running services. But what if we have multiple clusters, for example in separate regions? Or if we have a central Prometheus instance outside of the cluster and want to relay our data to it?
Nothing could be easier, because for this we can use one of the central concepts of the Prometheus monitoring landscape: federation. Prometheus provides this mechanism to create a hierarchy of Prometheus instances, where the individual instances each scrape only a few targets and provide the already consolidated metrics for another round of scraping by another Prometheus server. This is what it looks like:
In order to put this concept into practice we can add a Service and an Ingress route to our Kubernetes setup which expose the built-in /federate endpoint. This endpoint provides access to all the metrics.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
    cluster: us
    environment: prod
    module: monitoring
  name: prometheus-svc
  namespace: example-api-prod
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 9090
  selector:
    app: prometheus
    cluster: us
    environment: prod
    module: monitoring
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: example-api-prod
  labels:
    app: prometheus
    cluster: us
    environment: prod
    module: monitoring
spec:
  rules:
    - host: "monitoring.example-api.test"
      http:
        paths:
          - path: /federate
            pathType: Prefix
            backend:
              service:
                name: prometheus-svc
                port:
                  number: 80
Note: You will want to add TLS encryption and probably also authentication, so that the federate endpoint and the metrics are not exposed to the internet. In Kubernetes this job can be handled by the Ingress controller. I am leaving this out here, because there are different implementations of Ingress controllers with differing configuration syntax, and this is not the focus of this blog post. See for example the documentation of the NGINX ingress controller.
Now we can configure an external Prometheus instance to scrape the metrics from the federated instance. The configuration looks as if it were a normal metrics endpoint; we only add a match[] parameter which selects the scraping targets that should be pulled from the federated Prometheus instance:
global:
  scrape_interval: 15s
  scrape_timeout: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: example-api-federate
    honor_labels: true
    honor_timestamps: true
    params:
      'match[]':
        - '{job!=""}'
    scrape_interval: 15s
    scrape_timeout: 15s
    metrics_path: /federate
    scheme: http
    static_configs:
      - targets:
          - monitoring.example-api.test
        labels:
          env: prod
          location: us
Authentication and encryption are missing here as well. A good choice could be TLS client certificates; in newer Prometheus versions support for this is built in. The documentation describes the configuration needed on the central server.
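As a sketch of what that could look like: the scraping side presents a client certificate via tls_config, and the federated instance requires one via its web configuration file (passed with --web.config.file). The file paths here are placeholders:

```yaml
# On the central (scraping) Prometheus: present a client certificate.
scrape_configs:
  - job_name: example-api-federate
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt  # CA that signed the server cert
      cert_file: /etc/prometheus/certs/client.crt
      key_file: /etc/prometheus/certs/client.key
---
# On the federated instance: web config file requiring client certificates.
tls_server_config:
  cert_file: /etc/prometheus/certs/server.crt
  key_file: /etc/prometheus/certs/server.key
  client_auth_type: RequireAndVerifyClientCert
  client_ca_file: /etc/prometheus/certs/ca.crt
```

Note that if TLS is instead terminated at the Ingress controller, as mentioned above, the server-side certificate configuration would live there rather than in Prometheus itself.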
Now the central Prometheus server and possibly the Grafana dashboards have access to the metrics of the applications inside the kubernetes cluster!
Summary
With the setup described above it is possible to monitor applications running in remote Kubernetes clusters with a central Prometheus instance. The official documentation on this topic is a bit scarce, which is why it took us a while to find this approach. But with the steps explained here it is quite simple to set up such a monitoring system in a reliable manner. I hope this helps, and that you spread the word.