If you are writing a concurrent Go application and running it on Kubernetes, this is a configuration you need to be aware of.
During our testing, we noticed a drastic drop in performance when the service was stressed under high load. As we increased the number of users from 30 to 50 and then to 100, the median and p95 latency ramped up exponentially.
After looking through multiple traces, we found unusually long GC pause times (i.e., stop-the-world garbage collection) in New Relic's Go runtime monitor, which could be the root cause of the slow response times.
But WHY is the GC pause time so high? Is there something in the Pod, Go runtime, or goroutine configuration that explains it?
In search of a solution, we found an article, Running Go application on Kubernetes, that mentions a Go runtime variable called GOMAXPROCS:

“[GOMAXPROCS] controls the number of system threads that it can spawn. That means the number of goroutines that can actually run in parallel. In Kubernetes, all the available CPU cores on the node are visible to its pods (instead of the limits configured in the manifest)”
> This means: if you set a pod's CPU limit to 1 core but your node has 64 CPU cores, your Go app will see the full node resources and set GOMAXPROCS to 64. You therefore over-assign far more threads than the pod's CPU quota allows, and that hurts performance.
This is also confirmed in github.com/uber-go/automaxprocs/issues/12:
“When GOMAXPROCS is higher than the CPU quota allocated, we also saw significant throttling”.
The chart below shows benchmark results: performance is best when GOMAXPROCS is set equal to the CPU quota given.
As the benchmark results show, GOMAXPROCS should be set to the number of cores available to the pod. For easy configuration, Uber's automaxprocs does this for you with a single import line in your code.
import _ "go.uber.org/automaxprocs"
Now that we have the configuration right, let's prove the results. The figure below shows before and after setting automaxprocs: the response time is much lower and more stable under high load.
As for New Relic's monitor, notice that the GC pause time is reduced to about 1 ms, and the GC pause frequency (calls per minute of stop-the-world garbage collection) is reduced as well.
Lastly, CPU utilization is higher while overall resource consumption is lower.
In a more goroutine-intensive app, you will see an even bigger difference in performance. In our case, the service was more than 50% slower without this setting.
Plus: what happens if the CPU quota on the Kubernetes side is less than 1 core? I also tested this scenario. It turns out that 1 pod with 1 core is more efficient than 5 pods with 200m each. In addition, automaxprocs never set GOMAXPROCS lower than 1, even when the CPU limit was 200m.
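That floor-at-1 behavior follows from how the value is derived from the CFS quota. A rough sketch of the computation (the file names in the comment are where a pod's cgroup v1 quota actually lives; the function itself uses illustrative values):

```go
package main

import "fmt"

// maxProcsFromQuota mirrors, roughly, what automaxprocs derives from the
// cgroup CFS quota: floor(quota/period), never going below a minimum of 1.
// In a pod these values come from /sys/fs/cgroup/cpu/cpu.cfs_quota_us and
// cpu.cfs_period_us; here they are passed in directly for illustration.
func maxProcsFromQuota(quotaUs, periodUs int64) int {
	if quotaUs <= 0 || periodUs <= 0 {
		return 0 // no limit configured; leave GOMAXPROCS at its default
	}
	procs := int(quotaUs / periodUs) // integer division floors the ratio
	if procs < 1 {
		procs = 1 // a 200m limit (quota 20000, period 100000) still yields 1
	}
	return procs
}

func main() {
	fmt.Println(maxProcsFromQuota(20000, 100000))   // 200m limit  -> 1
	fmt.Println(maxProcsFromQuota(100000, 100000))  // 1 core      -> 1
	fmt.Println(maxProcsFromQuota(6400000, 100000)) // 64 cores    -> 64
}
```

So with a 200m limit the pod still runs with GOMAXPROCS=1 but only gets a fifth of a core's quota, which is consistent with the 5×200m pods performing worse than 1×1-core.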
If you're running your Go app in Kubernetes, GOMAXPROCS is crucial and should be configured; alternatively, let automaxprocs do the work for you. It is also better to allocate whole CPU cores, since GOMAXPROCS only takes an integer number of cores; this ensures there are enough resources to keep the threads utilized.