If you are writing a concurrent Go application and running it on Kubernetes, this is a configuration you need to be aware of.
The Problem
During our testing, we noticed a drastic drop in performance when the service was stressed under high load. As we increased the number of users from 30 to 50 and then to 100, the median and p95 latencies ramped up exponentially.
After looking through multiple traces, we found unusually high GC pause times (i.e., stop-the-world garbage collection) in New Relic’s Go runtime monitor, which could be the root of the slow response times.
The Root Cause
But WHY is the GC pause time so high? Does it have something to do with the Pod, the Go runtime, or the goroutine configuration?
While searching for solutions, we found an article, Running Go application on Kubernetes, that mentions a Go runtime variable called GOMAXPROCS:
“that controls the number of system threads that it can spawn. That means the number of goroutines that can actually run in parallel. In Kubernetes, all the available CPU cores on the node are visible by its pods (instead of the limits configured in the manifest)”
This means that if you set a Pod’s CPU limit to 1 core but your node has 64 CPU cores, your Go app will see the node’s full resources and set GOMAXPROCS to 64.
Thus, you over-assign threads relative to the Pod’s CPU quota, and that hurts performance.
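To see this for yourself, here is a minimal sketch (assuming it runs inside a container whose CPU limit is lower than the node’s core count):

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Inside a container, NumCPU reports the node's cores,
	// not the Pod's CPU limit, so GOMAXPROCS defaults to the node size.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // passing 0 reads the value without changing it
}
```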
We confirmed this on a Pod with a CPU limit=1: it reports GOMAXPROCS as 64 (the same as the node’s CPU count). This is also confirmed in github.com/uber-go/automaxprocs/issues/12:
“When GOMAXPROCS is higher than the CPU quota allocated, we also saw significant throttling”.
The chart below shows the benchmark results: performance was best when GOMAXPROCS was set equal to the CPU quota given. The throttling also helps explain the long GC pauses: with far more runnable threads than the quota covers, the container exhausts its CFS quota early in each scheduling period and stalls for the remainder, stretching every stop-the-world pause.
The Solution
As the benchmark results show, GOMAXPROCS should be set to the number of cores available to the Pod. For an easy configuration, Uber’s automaxprocs does this for you; just add one import line to your code:
import _ "go.uber.org/automaxprocs
Results
Now that we have the configuration right, let’s prove the results. The figure below shows response times before and after setting automaxprocs: they are much lower and more stable under high load.
As for New Relic’s monitor, notice that the GC pause time dropped to about 1 ms, and the GC pause frequency (calls per minute of stop-the-world garbage collection) dropped as well.
Lastly, the CPU is better utilized while consuming fewer resources overall.
In a more goroutine-intensive app, you will see even bigger differences in performance. In our case, the service was more than 50% slower without this setting.
Plus: what happens if the CPU quota on the Kubernetes side is less than 1 core? I tested this assumption as well. It turns out that 1 Pod with 1 core is more efficient than 5 Pods with 200m each. In addition, automaxprocs does not set GOMAXPROCS lower than 1 when the CPU limit is 200m.
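If you want to see what the library decides at startup, automaxprocs also exposes an explicit API that accepts a logger (the maxprocs.Set and maxprocs.Logger calls are from the package’s README; the rest is a sketch):

```go
package main

import (
	"log"
	"runtime"

	"go.uber.org/automaxprocs/maxprocs"
)

func main() {
	// Set adjusts GOMAXPROCS to the CPU quota and logs its decision;
	// with a 200m limit it stays at its floor of 1 rather than dropping to 0.
	undo, err := maxprocs.Set(maxprocs.Logger(log.Printf))
	if err != nil {
		log.Printf("failed to set GOMAXPROCS: %v", err)
	}
	defer undo()

	log.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```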
Final Thoughts
If you’re running your Go app on Kubernetes, GOMAXPROCS is crucial and should be configured; otherwise, you can let automaxprocs do the work for you. It’s also better to use whole-core CPU limits, since GOMAXPROCS takes an integer number of cores. This ensures there are enough CPU resources to keep the threads utilized.
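If you’d rather not pull in a dependency, one alternative is to set the GOMAXPROCS environment variable yourself; the Go runtime reads it at startup. Here is a sketch of a Deployment snippet using the Kubernetes Downward API, which exposes the CPU limit rounded up to a whole core (container name and image are placeholders):

```yaml
containers:
  - name: my-go-app        # placeholder name
    image: my-go-app:latest
    resources:
      limits:
        cpu: "1"
    env:
      - name: GOMAXPROCS
        valueFrom:
          resourceFieldRef:
            resource: limits.cpu  # rounded up to an integer, so 200m becomes 1
```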