
Know GOMAXPROCS before deploying your Go app to Kubernetes

Poom Wettayakorn

If you are writing a concurrent Go application and running it on Kubernetes, this is a configuration you need to be aware of.

The Problem

During our testing, we noticed a drastic drop in performance when the service was put under high load. As we increased the number of users from 30 to 50 to 100, the median and p95 latencies rose steeply.

Response times slow down as the number of load-testing users increases.

After looking through multiple traces, we found unusually long GC pause times (i.e., stop-the-world garbage collection) in New Relic's Go runtime monitor, which could be the root cause of the slow response times.

The Root

But why is the GC pause time so high? Does it have something to do with the Pod, the Go runtime, or the goroutine configuration?

While searching for solutions, we found an article, Running Go application on Kubernetes, that mentions a Go runtime variable called GOMAXPROCS:

“that controls the number of system threads that it can spawn. That means the number of goroutines that can actually run in parallel. In Kubernetes, all the available CPU cores on the node are visible by its pods (instead of the limits configured in the manifest)”

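> This means that if you set a Pod CPU limit of 1 core but your node has 64 CPU cores, the Go runtime sees all of the node's cores and sets GOMAXPROCS to 64. You end up running more threads in parallel than the Pod's CPU quota can serve, and that hurts performance: the threads burn through the quota quickly, and the container is then throttled until the next CFS period begins.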

I had to test whether this is true, so I updated the code and ran it on K8s with cpu.limit=1. It showed GOMAXPROCS as 64, the same as the number of CPUs on the node.
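A minimal way to reproduce this check is to print what the runtime sees. The sketch below is not the author's code, and `procsReport` is a hypothetical helper. `runtime.GOMAXPROCS(0)` queries the current value without changing it, and `runtime.NumCPU()` reports the logical CPUs usable by the process (on Linux this follows CPU affinity, not the CPU limit). Note that Go 1.25 and later take the cgroup CPU limit into account when choosing the default GOMAXPROCS on Linux (per the Go 1.25 release notes), so on newer toolchains the printed value may already match the Pod limit.

```go
package main

import (
	"fmt"
	"runtime"
)

// procsReport returns the current GOMAXPROCS value and the number of
// logical CPUs visible to the process. Passing 0 to runtime.GOMAXPROCS
// only queries the setting; it does not change it.
func procsReport() (gomaxprocs, numCPU int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	g, n := procsReport()
	// On older Go versions without automaxprocs, a Pod limited to 1 core
	// on a 64-core node would print GOMAXPROCS=64 NumCPU=64 here.
	fmt.Printf("GOMAXPROCS=%d NumCPU=%d\n", g, n)
}
```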

This is also confirmed by github.com/uber-go/automaxprocs/issues/12:

“When GOMAXPROCS is higher than the CPU quota allocated, we also saw significant throttling”.

The chart below shows benchmark results in which performance was best when GOMAXPROCS was set equal to the CPU quota.


The solution

As the benchmark results show, GOMAXPROCS should be set to the number of cores available to the Pod. The easiest way to do this is Uber's automaxprocs, which only requires adding one import line to your code:

import _ "go.uber.org/automaxprocs"
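Roughly speaking, this import reads the container's CPU quota from the Linux cgroup filesystem at startup and calls `runtime.GOMAXPROCS` with a matching value. The sketch below approximates that idea for cgroup v2 only, where the limit lives in `/sys/fs/cgroup/cpu.max` as `<quota> <period>` (or `max <period>` when unlimited). It is not automaxprocs' code: `parseCPUMax` and `procsFromQuota` are hypothetical helpers, cgroup v1 is ignored, and the rounding rule (round down, never below 1) is an assumption meant to mirror the library's default, consistent with the 200m behavior described later in this post. In production, use the real library.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseCPUMax parses cgroup v2 cpu.max contents such as "200000 100000"
// (2 cores) or "max 100000" (no limit). It returns the quota in cores.
func parseCPUMax(contents string) (cores float64, limited bool, err error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return 0, false, fmt.Errorf("unexpected cpu.max format: %q", contents)
	}
	if fields[0] == "max" {
		return 0, false, nil
	}
	quota, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, err
	}
	period, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return 0, false, err
	}
	if period <= 0 {
		return 0, false, fmt.Errorf("invalid period in %q", contents)
	}
	return quota / period, true, nil
}

// procsFromQuota rounds the quota down to whole cores, never below 1.
func procsFromQuota(cores float64) int {
	if n := int(math.Floor(cores)); n > 1 {
		return n
	}
	return 1
}

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		fmt.Println("no cgroup v2 cpu.max found; leaving GOMAXPROCS unchanged")
		return
	}
	cores, limited, err := parseCPUMax(string(data))
	if err != nil || !limited {
		fmt.Println("no CPU limit detected; leaving GOMAXPROCS unchanged")
		return
	}
	prev := runtime.GOMAXPROCS(procsFromQuota(cores))
	fmt.Printf("GOMAXPROCS %d -> %d (quota %.2f cores)\n", prev, runtime.GOMAXPROCS(0), cores)
}
```

Rounding down means a 1500m limit yields GOMAXPROCS=1, which is one reason whole-core limits fit this model best.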


Now that the configuration is right, let's verify the results. The figure below compares response times before and after adding automaxprocs: under high load, the response time is much lower and more stable.

Locust load testing

In New Relic's monitor, notice that the GC pause time dropped to about 1 ms, and the GC pause frequency (calls per minute of stop-the-world garbage collection) decreased as well.

New Relic Go runtime monitoring

Finally, the CPU is utilized more efficiently, and the Pod consumes fewer resources.

OpenShift resource metrics

In a more goroutine-intensive app, you are likely to see an even larger difference in performance. In our case, the service was more than 50% slower without setting GOMAXPROCS.

Plus: what happens if the CPU quota on the Kubernetes side is less than 1 core? I tested this as well. It turns out that 1 Pod with 1 core is more efficient than 5 Pods with 200m each. In addition, automaxprocs did not set GOMAXPROCS lower than 1 when the CPU limit was 200m.

Final thoughts

If you're running your Go app in Kubernetes, GOMAXPROCS is crucial and should be configured, or you can let automaxprocs do the work. It's also better to give Pods whole-core CPU limits, because GOMAXPROCS is an integer number of cores. This ensures there is enough CPU for the threads the Go runtime runs in parallel.