
Boosting API Performance with Docker Compose Replicas

Enhance API performance using replicas and load balancing for significant request handling improvements.

January 23, 2025 · 10 min read

A few weeks ago, a customer came to me with a big performance problem in their legacy application: whenever many people used the app at the same time, the website froze, and only a restart of the service helped. So we decided to take a closer look at the issue. What I found was a procedure in the code that generated very heavy load through a lot of calculations. It blocked the main thread for long stretches, during which the service could not answer any other requests. We agreed that this part should be refactored to be more efficient, or moved off the main thread so it no longer blocks. But since the freezes and constant restarts were already a big problem, we needed a quick solution first. The workaround we settled on is simple: the service was already hosted with Docker, so we gave the replica mode of Docker Compose, with a load balancer in front, a chance. It solved the problem for now and buys us the time to refactor the problematic part.

What Are Docker Compose Replicas?

Docker Compose replicas mean that a service is not started just once, which is the default, but as many times as you want. So instead of a single instance of a service, you could start 10 instances, which together can process far more requests than one. When replicas are used for an API service, you need a load balancer like NGINX in front of them to route each request to an instance that currently has little load.
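As a quick side note, you do not even have to touch the compose file to experiment with replicas. A minimal sketch, assuming a service named app that does not publish a fixed host port (more on that below):

docker compose up --scale app=10

The --scale flag overrides any replica count defined in the compose file.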

Example Implementation

Now, let's look at an example of how you could implement replica mode in your own Docker Compose setup.

To test a Docker setup without needing a big application, I wrote a small and basic Node.js application that generates a random number as an identifier and counts up every time a request is sent to the service. It answers each request with the identifier and the current hit count of the instance. To simulate heavy-load tasks, each instance delays its responses by a random multiple of 500 ms, chosen once at startup.

Here you can have a look at the implementation:

const http = require("http");

// Generate a random number between 1 and 10000
const randomNumber = Math.floor(Math.random() * 10000) + 1;
let hitCounter = 0;
// Pick a delay factor (0-4) once at startup; each response is delayed by timeout * 500 ms
const timeout = Math.floor(Math.random() * 5);

const server = http.createServer((req, res) => {
    hitCounter += 1;
    res.statusCode = 200;
    res.setHeader("Content-Type", "text/plain");
    // Simulate a slow, heavy task by delaying the response
    setTimeout(() => {
        res.end(`number: ${randomNumber}; request: ${hitCounter}`);
    }, timeout * 500);
});

const port = 3000;
server.listen(port, () => {
    console.log(`Server running at http://localhost:${port}/`);
    console.log(`Random Number: ${randomNumber}`);
    console.log(`Timeout: ${timeout}`);
});
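If you want to try the server without Docker first, a quick smoke test could look like this; the identifier value is illustrative and differs on every start:

node index.js
curl http://localhost:3000/
# e.g. number: 4821; request: 1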

To run this application in a Docker container, I used a simple Docker setup with the following Dockerfile:

# Use the official Node.js image from the Docker Hub
FROM node:20

# Create and set the working directory
WORKDIR /app

# Copy the application files to the working directory
COPY . .

# Expose the port the app runs on
EXPOSE 3000

# Command to run the application
CMD ["node", "index.js"]
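Before wiring the image into Compose, you can build and run it by hand to check that everything works; replica-demo is just a hypothetical tag I picked for this sketch:

docker build -t replica-demo .
docker run --rm -p 3000:3000 replica-demo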

Now I have a ready-to-use application, and I can set up the pre-replica and the post-replica Docker Compose files to compare the changes that are needed to get more performance out of the application without changing the application itself.

Pre-Replica Setup

Now, let's set up the pre-replica Docker Compose file, where I only spin up a single instance of the service.

services:
    app:
        build:
            context: .
            dockerfile: ./Dockerfile
        ports:
            - 3000:3000

As you can see, it is a very simple setup, and after starting it I can test the performance of the application with an autocannon load test.
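Starting the setup is a single command; the -d flag runs the container in the background:

docker compose up --build -d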

I used the command npx autocannon -c 500 -d 5 http://localhost:3000/, which opens 500 concurrent connections and keeps them busy for 5 seconds. On my device, I got a performance of about 900 Req/Sec on average.

Post-Replica Setup

To improve the performance, I need to adjust the Docker Compose file in two places. First, I add the replication option to the app service. Since the service is now started multiple times, I can no longer map a host port directly, because that would bind the same host port multiple times, which is not possible. Instead, I need an NGINX load balancer in front of the application. The load balancer exposes the port and routes requests internally to the different instances of the application.

The new Docker Compose setup now looks like this:

services:
    app:
        build:
            context: .
            dockerfile: ./Dockerfile
        deploy:
            replicas: 10

    nginx:
        image: nginx:latest
        volumes:
            - ./nginx.conf:/etc/nginx/nginx.conf:ro
        depends_on:
            - app
        ports:
            - 3000:3000
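Once the stack is up (it needs the nginx.conf shown below next to the compose file), you can verify that all ten replicas are actually running; the exact container names depend on your project directory:

docker compose up --build -d
docker compose ps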

Since Docker Compose always creates a network for all the containers it starts, I can refer to the service by its name, app, in the NGINX configuration. NGINX supports different load-balancing methods, such as the default round robin, least connections, or (in NGINX Plus) least time. In this case, I used the least-connections method, since it is a good default most of the time.
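For comparison, here is a sketch of a hypothetical upstream block that pins each client to one instance instead; plain round robin needs no directive at all, since it is the default. The actual configuration I used follows below:

upstream worker {
    ip_hash;          # route each client IP to the same instance
    server app:3000;
}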

Here you can have a look at the nginx.conf file:

user  nginx;

events {
    worker_connections   1000;
}

http {
    upstream worker {
        least_conn;
        server app:3000;
    }

    server {
        listen 3000;
        location / {
            proxy_pass http://worker;
        }
    }
}

As you can see, the NGINX configuration sets up an upstream called worker, which points to the app by its service name from the Docker Compose file and passes every request on to it. NGINX listens on the same port the service used before, so I can run the exact same autocannon command. This time I got a performance of about 7500 Req/Sec on average, so the same application handled far more requests than before.
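To see the load balancing in action outside of a benchmark, you can fire a handful of requests and watch the instance identifier change between replies; a small sketch:

for i in $(seq 1 5); do curl -s http://localhost:3000/; echo; done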

Conclusion

It can be surprisingly simple to improve the performance of an application by scaling horizontally, as long as the server has enough resources to spin up multiple instances of a single service. It is important to understand that this is not the final solution to the underlying problem, but a great improvement that buys time for a real refactoring. It also does not work in every situation, so it is important to test things out. For example, if I reduce the delay step in the service to 10 ms instead of 500 ms, the replicated setup performs worse, since my device could not handle the number of requests that would then be generated. It's good to know techniques like this replication method and to test whether they improve performance, but always validate your changes to verify that they actually help.

If you want to have a look at the full example, the complete code is available in the repository here on GitHub.
