We will monitor multiple clusters (prod, staging, etc.) from a single Ops Cluster. At the same time, the monitoring set-up in each cluster remains robust and complete, so we can also view each cluster's metrics separately should the need arise.
Prerequisites for the set-up:
To follow this tutorial completely, you will need:
- Working knowledge of Kubernetes and using kubectl
- A running Kubernetes cluster with at least 3 nodes (a GKE cluster is used for this demo)
- Familiarity with implementing an Ingress Controller and ingress objects (the Nginx Ingress Controller is used for this demo). This is not mandatory, but it is highly recommended in order to reduce the number of external endpoints created.
The Thanos architecture consists of the following components:
Thanos Sidecar: This is the main component that runs alongside Prometheus. It reads and archives data in the object store. It also manages Prometheus’ configuration and lifecycle. To distinguish each Prometheus instance, the sidecar injects external labels into the Prometheus configuration. This component is also capable of running queries against the Prometheus server’s PromQL interface.
Thanos Store: This component implements the Store API on top of historical data in an object storage bucket. It acts primarily as an API gateway and therefore does not need significant amounts of local disk space. It joins a Thanos cluster on startup and advertises the data it can access. It keeps a small amount of information about all remote blocks on local disk and keeps it in sync with the bucket. This data is generally safe to delete across restarts at the cost of increased startup times.
Thanos Query: The Query component listens on HTTP and translates queries to Thanos gRPC format. It aggregates the query results from different sources, and can read data from both Sidecar and Store. In an HA setup, it even deduplicates the results.
Run-time deduplication of HA groups
Prometheus is stateful and does not allow replicating its database, so increasing availability by running multiple Prometheus replicas is not straightforward. Simple load balancing will not work: after a crash, for example, a replica might come back up, but querying it will show a small gap for the period it was down. A second replica might have been up during that gap, but it could be down at some other moment (e.g. during a rolling restart), so load balancing on top of those replicas does not work well.
Thanos Querier instead pulls the data from both replicas and deduplicates those signals, filling the gaps, if any, transparently to the consumer.
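The deduplication above is driven by a replica label. As a minimal sketch, the Querier could be started like this — the addresses and the label name `replica` are placeholders and must match the external label that differs only between your HA replicas:

```shell
# Sketch of a Thanos Querier invocation (addresses and label name are placeholders).
# --query.replica-label names the external label that differs only between HA
# replicas; series identical apart from this label are merged into one result.
thanos query \
  --http-address=0.0.0.0:9090 \
  --grpc-address=0.0.0.0:10901 \
  --query.replica-label=replica \
  --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc
```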
Thanos Compact: The compactor component of Thanos applies the compaction procedure of the Prometheus 2.0 storage engine to block data stored in object storage. It is generally not semantically concurrency safe and must be deployed as a singleton against a bucket.
It is also responsible for downsampling of data — performing 5m downsampling after 40 hours and 1h downsampling after 10 days.
Thanos Ruler: It evaluates rules in essentially the same way as Prometheus’ rule engine. The only difference is that it can communicate with Thanos components.
All the manifests used in this tutorial can be found here.
To get started, run
git clone https://github.com/Thakurvaibhav/k8s.git
kubectl create ns monitoring
Please update the domain name for the ingress objects.
sed -i -e "s/<your-domain>/yourdomain/g" k8s/monitoring/prometheus-ha/values.yaml
Next, we create the GCS buckets for long-term metric storage and querying:
- Create 2 GCS buckets and name them as
- Create a service account with the Storage Object Creator and Storage Object Viewer roles
- Download its key file as JSON credentials and name it as
- Create a Kubernetes secret from the credentials:
kubectl create secret generic thanos-gcs-credentials --from-file=thanos-gcs-credentials.json -n monitoring
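The bucket and service-account steps above can be sketched with the gcloud/gsutil CLIs. The project ID, bucket names, and account name below are placeholders, not values from this tutorial — substitute your own:

```shell
# All names below are placeholders -- substitute your own project, buckets, account.
PROJECT_ID=my-gcp-project

# Two buckets: one for long-term Prometheus metrics, one for Thanos Ruler output.
gsutil mb -p "$PROJECT_ID" gs://my-prometheus-metrics-bucket
gsutil mb -p "$PROJECT_ID" gs://my-thanos-ruler-bucket

# Service account with object read/write access on Cloud Storage.
gcloud iam service-accounts create thanos-gcs --project "$PROJECT_ID"
for role in roles/storage.objectCreator roles/storage.objectViewer; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:thanos-gcs@${PROJECT_ID}.iam.gserviceaccount.com" \
    --role "$role"
done

# Download the key file that the Kubernetes secret is created from.
gcloud iam service-accounts keys create thanos-gcs-credentials.json \
  --iam-account "thanos-gcs@${PROJECT_ID}.iam.gserviceaccount.com"
```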
- Deploy the Helm release:
helm upgrade --install <RELEASE_NAME> prometheus-ha/
Now you can check all the workloads as follows:
$ kubectl -n monitoring get deploy
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
alertmanager 1 1 1 1 51d
kube-state-metrics 1 1 1 1 51d
thanos-querier 1 1 1 1 51d
$ kubectl -n monitoring get statefulsets
NAME DESIRED CURRENT AGE
grafana 1 1 51d
prometheus 3 3 55d
thanos-compactor 1 1 51d
$ kubectl -n monitoring get pods
NAME READY STATUS RESTARTS AGE
alertmanager-7b77799974-2lxkz 1/1 Running 0 50d
grafana-0 1/1 Running 0 51d
kube-state-metrics-6c7f456ddc-7cq8w 2/2 Running 0 51d
node-exporter-4lrxj 1/1 Running 0 29d
node-exporter-575rp 1/1 Running 0 51d
node-exporter-vvwkd 1/1 Running 0 51d
prometheus-0 2/2 Running 0 50d
prometheus-1 2/2 Running 0 50d
prometheus-2 2/2 Running 0 50d
thanos-compactor-0 1/1 Running 0 51d
thanos-querier-7dd85d6cb5-8d5s6 1/1 Running 0 50d
If we fire up an interactive shell in the same namespace as our workloads and check which pods thanos-store-gateway resolves to, we will see something like this:
/ # nslookup thanos-store-gateway
Address: 10.63.240.10#53
Name: thanos-store-gateway.monitoring.svc.cluster.local
Address: 10.60.31.2
The IPs returned above correspond to our Prometheus pods, thanos-store and thanos-ruler. This can be verified as follows:
$ kubectl get pods -o wide -l thanos-store-api="true"
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
prometheus-0 2/2 Running 0 100m 10.60.31.2 gke-demo-1-pool-1-649cbe02-jdnv <none> <none>
prometheus-1 2/2 Running 0 14h 10.60.30.2 gke-demo-1-pool-1-7533d618-kxkd <none> <none>
prometheus-2 2/2 Running 0 31h 10.60.25.2 gke-demo-1-pool-1-4e9889dd-27gc <none> <none>
thanos-ruler-0 1/1 Running 0 100m 10.60.30.8 gke-demo-1-pool-1-7533d618-kxkd <none> <none>
thanos-store-gateway-0 1/1 Running 0 14h 10.60.25.4 gke-demo-1-pool-1-4e9889dd-27gc <none> <none>
Once Grafana is running:
- Access grafana at
- You should see some dashboards baked in. You can always customise them or add more.
You should be able to filter dashboards per cluster. Each metric carries an external label with the cluster name, which helps with filtering and de-duplication of data. You can set it in the Prometheus config as shown below, or simply set it in the values.yaml of the Helm chart:
# Each Prometheus has to have unique labels.
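As an illustration, such labels could look like this in the Prometheus config; the label names and values below are placeholders, not taken from the chart:

```yaml
global:
  scrape_interval: 15s
  external_labels:
    cluster: prod-cluster-1   # placeholder; unique per cluster, used for filtering
    replica: prometheus-0     # placeholder; unique per replica, used for deduplication
```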
Now, in order to monitor another Kubernetes cluster in the same pane, do the following:
- Deploy all the components except Grafana in your second cluster.
- Make sure you update the cluster name in the Prometheus ConfigMap.
- Expose the Thanos Querier of the second cluster on port 10901 so that it is accessible from the first cluster.
- Update the Querier deployment in the first cluster to query metrics from the second cluster. This can be done by adding the store endpoint (alternatively, the Query endpoint can also be used) to the Querier deployment of the first cluster.
# Update the querier container's argument and add the following
- --store=<IP_THANOS_QUERY_CLUSTER_2>:10901
# Alternatively, you can also use a DNS name as follows:
- --store=dns+<DNS_STORE_GATEWAY_CLUSTER_2>:10901
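Exposing the second cluster's Querier gRPC endpoint can be sketched as an internal LoadBalancer Service. The annotation shown is GKE-specific, and the Service name and selector are placeholders that must match your chart's labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-querier-grpc   # placeholder name
  namespace: monitoring
  annotations:
    # GKE-specific: keep the load balancer on the internal network
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: thanos-querier       # must match the Querier pod labels in your chart
  ports:
    - name: grpc
      port: 10901
      targetPort: 10901
```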
Once the Querier deployment is updated, the new store gateway should appear under the Stores tab in the UI.
You should be able to access the Thanos Querier at
If you have imported the dashboards from k8s/monitoring/dashboards-ha, then you should be able to filter metrics for each cluster as shown below:
- One distinct advantage of running Thanos is that we no longer need to reload Prometheus manually when its config changes. Each time the Prometheus ConfigMap is updated, the Thanos sidecar performs a hot reload.
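This hot reload is driven by the sidecar's reloader flags. A minimal sketch of the relevant container arguments, assuming the ConfigMap is mounted at the paths shown (the paths are placeholders):

```shell
# Sketch of the Thanos sidecar arguments that enable config hot-reloading.
# The reloader watches the mounted ConfigMap template, substitutes environment
# variables into it, and triggers Prometheus' reload endpoint on every change.
thanos sidecar \
  --prometheus.url=http://localhost:9090 \
  --tsdb.path=/prometheus \
  --reloader.config-file=/etc/prometheus-shared/prometheus.yaml.tmpl \
  --reloader.config-envsubst-file=/etc/prometheus/prometheus.yaml
```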
Feel free to reach out should you have any questions about the implementation or architecture. If you enjoyed reading this, please check out some of my other blogs here:
- Continuous Delivery pipelines for Kubernetes using Spinnaker
- HA Elasticsearch over Kubernetes
- Scaling MongoDB on Kubernetes
- AWS ECS and Gitlab-CI