
Kubernetes Multi-Cluster Monitoring using Prometheus and Thanos

VAIBHAV THAKUR

In this setup, we monitor multiple clusters (prod, staging, etc.) from a single Ops cluster. At the same time, the monitoring stack in each cluster remains robust and complete, so metrics can still be viewed per cluster should the need arise.

Prerequisites for the set-up:

In order to completely understand this tutorial, the following are needed:

  1. Working knowledge of Kubernetes and using kubectl
  2. A running Kubernetes cluster with at least 3 nodes (for the purpose of this demo, a GKE cluster is being used)
  3. An Ingress Controller and ingress objects (for the purpose of this demo, the Nginx Ingress Controller is being used). Although this is not mandatory, it is highly recommended in order to decrease the number of external endpoints created.

Architecture

The Thanos architecture and its components are as follows:

Thanos Architecture

Thanos consists of the following components:

Thanos Sidecar: This is the main component that runs alongside Prometheus. It reads and archives data in the object store. Moreover, it manages Prometheus’ configuration and lifecycle. To distinguish each Prometheus instance, the sidecar component injects external labels into the Prometheus configuration. This component is capable of running queries against the Prometheus servers’ PromQL interface.
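
For orientation, here is a minimal sketch of what such a sidecar container could look like in the Prometheus pod spec. The image tag, mount paths, and the /etc/thanos/gcs.yaml object-store config file are assumptions for illustration, not taken verbatim from the chart used in this tutorial.

- name: thanos-sidecar
  image: quay.io/thanos/thanos:<THANOS_VERSION>
  args:
    - sidecar
    - --tsdb.path=/prometheus                       # same volume Prometheus writes its TSDB blocks to
    - --prometheus.url=http://127.0.0.1:9090        # Prometheus runs in the same pod
    - --objstore.config-file=/etc/thanos/gcs.yaml   # points at the GCS bucket used for long-term storage
    - --grpc-address=0.0.0.0:10901                  # Store API endpoint queried by Thanos Query
    - --http-address=0.0.0.0:10902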

Thanos Store: This component implements the Store API on top of historical data in an object storage bucket. It acts primarily as an API gateway and therefore does not need significant amounts of local disk space. It joins a Thanos cluster on startup and advertises the data it can access. It keeps a small amount of information about all remote blocks on local disk and keeps it in sync with the bucket. This data is generally safe to delete across restarts at the cost of increased startup times.
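
A minimal sketch of the store gateway’s container arguments, under the same assumptions about the object-store config file path:

- name: thanos-store-gateway
  image: quay.io/thanos/thanos:<THANOS_VERSION>
  args:
    - store
    - --data-dir=/data                              # small local cache of metadata about remote blocks
    - --objstore.config-file=/etc/thanos/gcs.yaml   # same bucket the sidecars upload to
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902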

Thanos Query: The Query component listens on HTTP and translates queries to Thanos gRPC format. It aggregates the query results from different sources, and can read data from Sidecars and Stores. In an HA setup, it even deduplicates the results.

Run-time deduplication of HA groups

Prometheus is stateful and does not allow replicating its database. This means that increasing availability by running multiple Prometheus replicas is not straightforward. Simple load balancing does not work: after a crash, for example, a replica might come back up, but querying it will return a small gap for the period it was down. The second replica might have been up during that window, yet it could be down at another moment (e.g. during a rolling restart), so load balancing on top of those replicas does not work well.

Thanos Querier instead pulls the data from both replicas, deduplicates those signals, and fills in any gaps, transparently to the consumer of the Querier.
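
As an illustration, the querier could be started roughly as follows. The --query.replica-label value must match the replica label set in the Prometheus config (prometheus_replica, shown further below), and the service name assumes the thanos-store-gateway headless service used later in this tutorial.

- name: thanos-querier
  image: quay.io/thanos/thanos:<THANOS_VERSION>
  args:
    - query
    - --http-address=0.0.0.0:9090
    - --grpc-address=0.0.0.0:10901
    # treat series that differ only in this label as replicas and deduplicate them
    - --query.replica-label=prometheus_replica
    # discover every Store API endpoint (sidecars, store gateway, ruler) behind the headless service
    - --store=dns+thanos-store-gateway.monitoring.svc.cluster.local:10901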

Thanos Compact: The compactor component of Thanos applies the compaction procedure of the Prometheus 2.0 storage engine to block data stored in object storage. It is generally not semantically concurrency safe and must be deployed as a singleton against a bucket.

It is also responsible for downsampling of data — performing 5m downsampling after 40 hours and 1h downsampling after 10 days.
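
A hedged sketch of the compactor arguments; the retention values are examples only, and the object-store config path is an assumption:

- name: thanos-compactor
  image: quay.io/thanos/thanos:<THANOS_VERSION>
  args:
    - compact
    - --data-dir=/data
    - --objstore.config-file=/etc/thanos/gcs.yaml
    - --retention.resolution-raw=30d    # how long to keep raw samples (example value)
    - --retention.resolution-5m=120d    # how long to keep 5m-downsampled data (example value)
    - --retention.resolution-1h=1y      # how long to keep 1h-downsampled data (example value)
    - --wait                            # keep running and compact continuously instead of exiting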

Thanos Ruler: It basically does the same thing as Prometheus’ rule evaluation. The only difference is that it can communicate with other Thanos components.
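
For reference, a ruler could be wired up roughly as follows; the rule-file path, Alertmanager URL, and label value here are assumptions for illustration:

- name: thanos-ruler
  image: quay.io/thanos/thanos:<THANOS_VERSION>
  args:
    - rule
    - --data-dir=/data
    - --rule-file=/etc/thanos/rules/*.yaml                                     # same rule syntax as Prometheus
    - --query=thanos-querier.monitoring.svc.cluster.local:9090                 # evaluate rules against the global query layer
    - --alertmanagers.url=http://alertmanager.monitoring.svc.cluster.local:9093
    - --objstore.config-file=/etc/thanos/ruler-gcs.yaml                        # ships rule results to the thanos-ruler bucket
    - --label=ruler_cluster="prometheus-ha"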

Implementation

All the manifests used in this tutorial can be found here.

To get started, run

git clone https://github.com/Thakurvaibhav/k8s.git
kubectl create ns monitoring

Please update the domain name for the ingress objects.

sed -i -e 's/<your-domain>/yourdomain/g' k8s/monitoring/prometheus-ha/values.yaml

Next, we create the GCS buckets for long-term metric storage and querying:

  • Create 2 GCS buckets and name them prometheus-long-term and thanos-ruler (a command-line sketch of these steps follows this list)
  • Create a service account with the roles Storage Object Creator and Storage Object Viewer
  • Download the key file as JSON credentials and name it thanos-gcs-credentials.json
  • Create kubernetes secret using the credentials, kubectl create secret generic thanos-gcs-credentials --from-file=thanos-gcs-credentials.json -n monitoring
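
If you prefer the command line over the Cloud Console, the bucket and service-account steps above could look roughly like this. The project ID and service-account name are placeholders, and GCS bucket names must be globally unique, so adjust them to your environment:

# Create the two buckets (names must be globally unique)
gsutil mb gs://prometheus-long-term
gsutil mb gs://thanos-ruler

# Create a service account and grant it object creator/viewer on storage
gcloud iam service-accounts create thanos-gcs --display-name "thanos-gcs"
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member "serviceAccount:thanos-gcs@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role "roles/storage.objectCreator"
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member "serviceAccount:thanos-gcs@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role "roles/storage.objectViewer"

# Download the key file used for the Kubernetes secret below
gcloud iam service-accounts keys create thanos-gcs-credentials.json \
  --iam-account thanos-gcs@<PROJECT_ID>.iam.gserviceaccount.com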

Deploy Resources

  • Deploy Helm Release
helm upgrade --install <RELEASE_NAME> prometheus-ha/

Now you can check all the workloads as follows:

$ kubectl -n monitoring get deploy
NAME                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
alertmanager         1         1         1            1           51d
kube-state-metrics   1         1         1            1           51d
thanos-querier       1         1         1            1           51d
$ kubectl -n monitoring get statefulsets
NAME               DESIRED   CURRENT   AGE
grafana            1         1         51d
prometheus         3         3         55d
thanos-compactor   1         1         51d
$ kubectl -n monitoring get pods
NAME                                  READY   STATUS    RESTARTS   AGE
alertmanager-7b77799974-2lxkz         1/1     Running   0          50d
grafana-0                             1/1     Running   0          51d
kube-state-metrics-6c7f456ddc-7cq8w   2/2     Running   0          51d
node-exporter-4lrxj                   1/1     Running   0          29d
node-exporter-575rp                   1/1     Running   0          51d
node-exporter-vvwkd                   1/1     Running   0          51d
prometheus-0                          2/2     Running   0          50d
prometheus-1                          2/2     Running   0          50d
prometheus-2                          2/2     Running   0          50d
thanos-compactor-0                    1/1     Running   0          51d
thanos-querier-7dd85d6cb5-8d5s6       1/1     Running   0          50d

If we fire up an interactive shell in the same namespace as our workloads to check which pods thanos-store-gateway resolves to, we will see something like this:

$ nslookup thanos-store-gateway
Server: 10.63.240.10
Address: 10.63.240.10#53
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.25.2
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.25.4
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.30.2
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.30.8
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.31.2

The IPs returned above correspond to our Prometheus pods, thanos-store and thanos-ruler. This can be verified as follows:

$ kubectl get pods -o wide -l thanos-store-api="true"
NAME                     READY   STATUS    RESTARTS   AGE    IP           NODE                              NOMINATED NODE   READINESS GATES
prometheus-0             2/2     Running   0          100m   10.60.31.2   gke-demo-1-pool-1-649cbe02-jdnv   <none>           <none>
prometheus-1             2/2     Running   0          14h    10.60.30.2   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
prometheus-2             2/2     Running   0          31h    10.60.25.2   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
thanos-ruler-0           1/1     Running   0          100m   10.60.30.8   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
thanos-store-gateway-0   1/1     Running   0          14h    10.60.25.4   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>

Once Grafana is running:

  • Access Grafana at grafana.<your-domain>.com
  • You should be able to see some dashboards baked in. You can always customise them or add more.

You should be able to filter dashboards per cluster. Each metric carries an external label with the cluster name, which helps with filtering and de-duplication of data. You can set it in the Prometheus config as shown below, or it can simply be set in the values.yaml of the Helm chart:

...
global:
  scrape_interval: 5s
  evaluation_interval: 5s
  external_labels:
    cluster: prometheus-ha
    # Each Prometheus has to have unique labels.
    prometheus_replica: $(POD_NAME)
...
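
The $(POD_NAME) placeholder is resolved from the Kubernetes downward API; here is a minimal sketch of the env entry that would go on the Prometheus (or sidecar) container, assuming the sidecar’s config reloader performs the substitution:

env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name   # yields prometheus-0, prometheus-1, prometheus-2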

Now, in order to monitor another Kubernetes cluster in the same pane, do the following:

  1. Deploy all the components except for Grafana in your second cluster.
  2. Make sure you update the cluster name in the Prometheus ConfigMap.
  3. Expose the Thanos Querier of the second cluster on port 10901 so that it is reachable from the first cluster (a Service sketch follows the snippet below).
  4. Update the Querier deployment in the first cluster to query metrics from the second cluster. This can be done by adding the store endpoint (alternatively, the Query endpoint can also be used) to the Querier deployment of the first cluster, for example:
# Update the querier container's arguments and add the following
- --store=<IP_THANOS_QUERY_CLUSTER_2>:10901
# Alternatively, you can also use a DNS name as follows
- --store=dns+<DNS_STORE_GATEWAY_CLUSTER_2>:10901
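
One way to expose the second cluster’s querier gRPC endpoint (step 3 above) is a load balancer Service. This is only a sketch: it assumes the querier pods carry an app: thanos-querier label, so adjust the selector and name to your chart.

apiVersion: v1
kind: Service
metadata:
  name: thanos-querier-grpc
  namespace: monitoring
  # on GKE, an internal load balancer can be requested with the
  # cloud.google.com/load-balancer-type: "Internal" annotation
spec:
  type: LoadBalancer
  selector:
    app: thanos-querier        # assumption: must match the querier deployment's pod labels
  ports:
    - name: grpc
      port: 10901
      targetPort: 10901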

Once the Querier deployment is updated, the new store gateway should appear under the Stores tab of the UI.

You should be able to access the Thanos Querier at thanos-querier.<your-domain>.com

Thanos Querier

If you have imported the dashboards from k8s/monitoring/dashboards-ha then you should be able to filter metrics for each cluster as shown below:

Kubernetes Deployment metrics for each connected cluster

Note:

  1. One distinct advantage of running Thanos is that we no longer need to reload Prometheus configs manually. Each time the Prometheus ConfigMap is updated, the Thanos sidecar performs a hot reload of Prometheus.
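
This works because the sidecar can run a config reloader that watches the mounted ConfigMap and triggers Prometheus’ reload endpoint when it changes; a sketch of the relevant sidecar flags (the paths here are assumptions):

- sidecar
- --reloader.config-file=/etc/prometheus-shared/prometheus.yaml.tmpl   # template mounted from the ConfigMap
- --reloader.config-envsubst-file=/etc/prometheus/prometheus.yaml      # env-substituted file Prometheus actually loads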

Feel free to reach out should you have any questions around implementation or architecture. If you enjoyed reading this please check out some of my other blogs here:

  1. Continuous Delivery pipelines for Kubernetes using Spinnaker
  2. HA Elasticsearch over Kubernetes
  3. Scaling MongoDB on Kubernetes
  4. AWS ECS and Gitlab-CI
