
Learning from Discord’s Approach — Request Coalescing with Golang

Mohammad Hoseini Rad

As you may have seen, Discord published a valuable article last year describing how they store trillions of messages. While there are numerous YouTube videos and write-ups about it, I think one section, titled “Data Services Serving Data,” didn’t receive enough attention. In this article, we look at Discord’s approach to data services and explore how Golang’s concurrency features can reduce database load in certain scenarios.

Data Services to Rescue Hot Partitions

Messaging and channels are the most heavily used parts of Discord. Imagine the admin of a channel with 500k members mentions @everyone. What happens? Thousands of simultaneous requests hit the same database partition, all trying to retrieve the same message. If this pattern keeps repeating, the partition eventually can no longer respond to other requests.

Discord introduced an intermediary service, which they call a data service, that sits between the Python API and the database cluster. It exposes roughly one gRPC endpoint per database query and contains no business logic. The main feature it provides for Discord is request coalescing.

Request Coalescing

As discussed above, a mention in a huge channel sends many identical requests to the same database partition. With request coalescing, if multiple users request the same database row at the same time, the data service merges those requests into a single SELECT query and returns its result to all of them.
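To make the idea concrete, here is a minimal hand-rolled sketch of coalescing in Go. It is not Discord's implementation. The names `Coalescer`, `call`, and `coalescedFetches` are made up for illustration, and the "database" is simulated with a short sleep. Unlike a production-grade implementation, it does not handle a panicking fetch function.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight lookup and the result shared by its waiters.
type call struct {
	done chan struct{} // closed when the lookup finishes
	val  string
	err  error
}

// Coalescer (hypothetical name) merges concurrent lookups for the same key.
type Coalescer struct {
	mu       sync.Mutex
	inflight map[string]*call
}

func NewCoalescer() *Coalescer {
	return &Coalescer{inflight: make(map[string]*call)}
}

// Do runs fetch at most once per key at a time. Callers that arrive while a
// lookup for the same key is in flight wait for it and get the same result.
func (c *Coalescer) Do(key string, fetch func() (string, error)) (string, error) {
	c.mu.Lock()
	if existing, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-existing.done // wait for the leader to finish
		return existing.val, existing.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fetch() // only the first caller reaches the database

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // wake every waiter
	return cl.val, cl.err
}

// coalescedFetches fires n concurrent lookups for one key and returns how
// many times the simulated database was actually queried.
func coalescedFetches(n int) int64 {
	var queries int64
	c := NewCoalescer()
	fetch := func() (string, error) {
		atomic.AddInt64(&queries, 1)
		time.Sleep(50 * time.Millisecond) // stand-in for a slow SELECT
		return "Message #42", nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Do("42", fetch)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&queries)
}

func main() {
	fmt.Println("database queries for 1000 concurrent requests:", coalescedFetches(1000))
}
```

The query count depends on timing: requests that arrive after the first lookup has finished trigger a new one. Coalescing only merges requests that overlap in time; it is not a cache.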

Having a data service between the API and the database, instead of connecting directly, also makes other features possible, such as bulk queries, which can significantly reduce database overhead and improve query latency, both the mean and especially the 99th percentile.

Implementing simple request coalescing with Golang

Like numerous other companies, Discord uses Python as its primary backend language. Whether a microservice or a monolith, backend services usually connect directly to a data source to run queries. While Python is a versatile language, it falls short on concurrency. Implementing concurrent, high-throughput services in Python can be challenging, and performance tends to be lower than that of similar services written in compiled languages such as C++, Rust, and Golang.

Before changing anything, let’s simulate the situation described above. Imagine the service receives a total of 5k requests at a concurrency of 1k:

  • Total requests: 5,000
  • Concurrency: 1,000
  • Number of unique messages that need to be retrieved: 100
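
    type Message struct {
        gorm.Model // assumption: supplies the ID column that the queries below filter on
        Text string
        User string // some random properties that a message row may have
    }

    func generateRandomData(db *gorm.DB) {
        for i := 0; i < 100; i++ {
            msg := &messages.Message{Text: fmt.Sprintf("Message #%d", i)}
            db.Create(msg) // assumed: the original snippet is cut off before the insert
        }
    }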

I built a simple database model for the Message table with Gorm and then filled the table with 100 dummy messages.

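    e := echo.New()
    e.GET("/randomMessage", func(c echo.Context) error {
        // Pick one of the 100 messages (IDs 1..100, assuming auto-increment primary keys).
        randomMessageID := rand.Intn(100) + 1
        var msg messages.Message
        // Every request runs its own SELECT, even when many ask for the same row.
        if err := db.Where("id=?", randomMessageID).First(&msg).Error; err != nil {
            return err
        }
        return c.JSON(200, msg)
    })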

I made a simple endpoint that simulates a SELECT query for a random ID out of the 100 messages. Now we can benchmark this endpoint to see what happens in this scenario:

  • Average RPS: 300
  • Mean Response Time: 3.2s
  • 50%: 546ms
  • 99%: 14.7s
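The article does not say which benchmarking tool produced these numbers. As a rough illustration of how such figures can be collected, here is a minimal load-generator sketch in Go. The names `runLoad`, `percentile`, and `summarize` are hypothetical, and the request is simulated with a short sleep rather than a real HTTP call to the endpoint.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
	"time"
)

// runLoad issues total calls to do, keeping at most concurrency of them in
// flight, and returns the latency of every call.
func runLoad(total, concurrency int, do func()) []time.Duration {
	latencies := make([]time.Duration, total)
	sem := make(chan struct{}, concurrency) // limits in-flight calls
	var wg sync.WaitGroup
	for i := 0; i < total; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` calls are running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }()
			start := time.Now()
			do()
			latencies[i] = time.Since(start) // each goroutine writes its own slot
		}(i)
	}
	wg.Wait()
	return latencies
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of a
// slice that is already sorted in ascending order.
func percentile(sorted []time.Duration, p float64) time.Duration {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// summarize returns the mean, median, and 99th percentile latency.
func summarize(latencies []time.Duration) (mean, p50, p99 time.Duration) {
	if len(latencies) == 0 {
		return 0, 0, 0
	}
	sorted := append([]time.Duration(nil), latencies...) // leave the input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	var sum time.Duration
	for _, l := range sorted {
		sum += l
	}
	return sum / time.Duration(len(sorted)), percentile(sorted, 50), percentile(sorted, 99)
}

func main() {
	// 5,000 requests at a concurrency of 1,000, as in the scenario above.
	// The sleep stands in for an HTTP request to the endpoint.
	latencies := runLoad(5000, 1000, func() { time.Sleep(time.Millisecond) })
	mean, p50, p99 := summarize(latencies)
	fmt.Println("mean:", mean, "50%:", p50, "99%:", p99)
}
```

To benchmark the real endpoint, the function passed to `runLoad` would perform the HTTP request instead of sleeping.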

With a 10-second timeout policy, around 2% of the requests would not get a response. Now let’s change the code. The Go project provides a package called singleflight (golang.org/x/sync/singleflight; it is not part of the standard library). This package provides a duplicate function call suppression mechanism. You give it a key and a function, and instead of running that function once per caller, singleflight holds concurrent calls with the same key until the first call completes and then returns the same result to all of them.

    var g singleflight.Group
    e.GET("/randomMessage", func(c echo.Context) error {
        // Pick one of the 100 messages (IDs 1..100, assuming auto-increment primary keys).
        randomMessageID := rand.Intn(100) + 1
        // Concurrent calls with the same key share a single execution of the function.
        msg, err, _ := g.Do(fmt.Sprint(randomMessageID), func() (interface{}, error) {
            var msg messages.Message
            if err := db.Where("id=?", randomMessageID).First(&msg).Error; err != nil {
                return nil, err
            }
            return &msg, nil
        })
        if err != nil {
            return err
        }
        return c.JSON(200, msg)
    })

func (g *Group) Do(key string, fn func() (interface{}, error)) (v interface{}, err error, shared bool)

Do executes and returns the results of the given function, making sure that only one execution is in-flight for a given key at a time. If a duplicate comes in, the duplicate caller waits for the original to complete and receives the same results. The return value shared indicates whether v was given to multiple callers.

Now let’s rerun the simulation and compare the results.

  • Average RPS: 2309
  • Mean Response Time: 433ms
  • 50%: 389ms
  • 99%: 777ms

As you can see, this simple technique cut the 99th percentile from 14.7s to 777ms, a reduction of about 14 seconds, and raised throughput from 300 to 2,309 requests per second, roughly 7.7 times as many.


We have been using data services at my company for around three years, and we have found that there is a lot of potential to improve overall application performance just by optimizing database queries. While the approach discussed here is situational, Discord has been using it for over a year, and it has helped them a lot.

Be aware that data services bring their own complications. For instance, if you run multiple data service instances, your Python API needs a mechanism for sending identical requests to the same instance; otherwise they cannot be coalesced.
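One way to route identical requests to the same instance is to hash the coalescing key and use the hash to pick an instance. The sketch below is only an illustration: the function name `pickInstance`, the instance names, and the key format are all made up. It uses plain modulo hashing, which reassigns most keys whenever the number of instances changes. Consistent hashing, which Discord's post mentions for routing, reduces that reshuffling but is more involved.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickInstance (hypothetical helper) maps a coalescing key to one of the data
// service instances, so identical requests always land on the same instance.
// It panics if instances is empty.
func pickInstance(key string, instances []string) string {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return instances[int(h.Sum32()%uint32(len(instances)))]
}

func main() {
	// Instance names are placeholders, not real hosts.
	instances := []string{"data-service-0", "data-service-1", "data-service-2"}
	for _, key := range []string{"message:1", "message:2", "message:1"} {
		fmt.Println(key, "->", pickInstance(key, instances))
	}
}
```

Both lookups for `message:1` resolve to the same instance, so the instance that receives them can coalesce them into one query.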