Thursday, March 21, 2024

Introducing BullMQ Proxy

As BullMQ has gained popularity, a recurring question keeps coming up: what about support for other languages? After all, if most of the core logic is written in Lua scripts, porting it should not be that difficult.

We ported it to Python and soon discovered that the work required to implement all the features and fix issues is non-trivial. Although the Python version is usable, it still lacks some features present in the NodeJS version.

Attempting to port to every other language and platform out there would be an immense amount of work, not to mention delving into new domains with which I am not particularly familiar.

So, how do we fulfill this need to run BullMQ everywhere while still keeping it manageable for us and everybody else? It turns out that the simplest solutions are often the best ones. In this case, I am quite satisfied with how well this turned out: namely, a proxy server that acts as a middleman between any service and BullMQ queues.

We wanted to create a service that is lightweight, capable of extracting as much performance from BullMQ as possible with minimal overhead. Hence, we figured BunJS would be the best candidate for the job, as it provides the fastest HTTP and WebSocket servers for JavaScript-based code (and probably outperforms other runtimes as well).

How does it work?

Initially, the proxy supports a REST HTTP API, but we are planning to support WebSockets as well to further minimize the overhead. The HTTP version of the proxy has some advantages, though, as will become clear in a moment. The way it works is that you register workers with the proxy, at most one worker per queue, but you can choose any concurrency factor you want, just as when using BullMQ natively. If you want several workers running in different processes, you just spawn more proxy instances, and everything is automatically kept in sync between them.
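Because the proxy speaks plain HTTP, producing jobs from any language boils down to a single request. Here is a minimal sketch in TypeScript; the endpoint path, payload shape, auth token, and port below are assumptions made for illustration, not the proxy's documented API.

```typescript
// Illustrative sketch only: the URL, path, payload shape, and auth header are
// assumptions made for this example, not the proxy's documented API.
const PROXY_URL = "http://localhost:8080"; // hypothetical proxy address
const TOKEN = "my-proxy-token";            // hypothetical auth token

async function addJob(queueName: string, jobName: string, data: unknown) {
  const res = await fetch(`${PROXY_URL}/queues/${queueName}/jobs`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${TOKEN}`,
    },
    body: JSON.stringify([{ name: jobName, data, opts: { attempts: 3 } }]),
  });
  if (!res.ok) throw new Error(`Failed to add job: ${res.status}`);
  return res.json();
}

// Any service, in any language, can do the equivalent with its own HTTP client.
await addJob("paint", "red-car", { color: "red" });
```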

Workers are thus registered with the proxy by specifying your typical BullMQ worker options, but this time you also specify an endpoint. This endpoint is what the worker will call when a job needs to be processed. You can think of this mechanism as similar to webhooks: you register the webhook, and the service calls it with the job. Retries, rate limits, and all the other features that make BullMQ so powerful are still there.
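As a rough sketch of what that looks like, the snippet below registers a worker over HTTP and then exposes the endpoint the proxy will call for each job. The /workers path, the payload fields, and the "respond with 2xx to complete the job" convention are assumptions for illustration; the actual contract is described in the proxy's documentation.

```typescript
// Illustrative sketch only: paths, payload fields, and the completion
// convention are assumptions, not the proxy's documented contract.
const PROXY_URL = "http://localhost:8080"; // hypothetical proxy address

// 1) Register a worker for the "paint" queue, pointing at our own service.
await fetch(`${PROXY_URL}/workers`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    queue: "paint",
    endpoint: { url: "http://my-service:3000/process-job", method: "POST" },
    opts: { concurrency: 200 }, // the usual BullMQ worker options
  }),
});

// 2) The endpoint the proxy calls with each job, webhook style.
// Shown with Bun.serve here, but any HTTP server in any language works.
Bun.serve({
  port: 3000,
  async fetch(req) {
    const job = await req.json(); // job name, data, attempts made, etc.
    console.log(`Processing job with data ${JSON.stringify(job.data)}`);
    // ... do the actual work here ...
    return new Response(JSON.stringify({ ok: true }), { status: 200 });
  },
});
```

The idea is that a failing response gets retried according to the job's retry settings, just as a throwing processor function would in a native worker.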

I think this approach opens up a lot of new possibilities. Not only does it simplify communication between services implemented in different languages, it also allows for new patterns. For instance, it is easy to implement the job processing code in lambdas and have the proxy call them with the same delivery guarantees provided by BullMQ.
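To make the lambda idea concrete, here is a hypothetical AWS Lambda handler (behind a function URL or API Gateway) that could serve as the registered endpoint. The event shape and the assumption that a non-2xx response triggers a retry are illustrative, not taken from the proxy's docs.

```typescript
// Hypothetical sketch: a job processor deployed as an AWS Lambda that the
// proxy calls over HTTP, like any other registered endpoint.
import type {
  APIGatewayProxyEventV2,
  APIGatewayProxyResultV2,
} from "aws-lambda";

export const handler = async (
  event: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  const job = JSON.parse(event.body ?? "{}");

  // ... process job.data here ...

  // Returning an error status would let BullMQ's retry/backoff settings
  // kick in, giving the lambda the same delivery guarantees as a worker.
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```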

It would also be easy to improve the authorisation mechanism, so that queues can be shared between different actors in a secure way. This is something that is not so easy to do with BullMQ natively.

What about performance?

I wrote a simple benchmark to get an indication of the overhead of using the proxy. My average results on a MacBook Pro 2018 (i7, 2.2 GHz), running in a single process against a local stock Redis 7.2.1, were as follows:

Add jobs per second (1 queue):
NodeJS 20.11.0: 22,780
BunJS 1.0.33: 22,682
BullMQ Proxy: 6,230

The variation between runs was between 22k and 28k for native BullMQ, regardless of whether Node or Bun was used, whereas for the proxy it was between 6k and 8k.

Process jobs per second (200 concurrency):
NodeJS 20.11.0: 13,026
BunJS 1.0.33: 12,742
BullMQ Proxy: 2,836

For processed jobs, the variation between runs was between 11k and 14k for Node and Bun, and quite stable at around 3k for the proxy.

I found these results quite interesting. The first thing that was surprising was that Bun did not outperform NodeJS in the BullMQ benchmarks; I guess the bottleneck is not in the runtime itself but in the TCP stack or the Redis client.

The second thing is that the proxy delivers roughly one-fourth of the throughput of the native versions. This is still quite good in my opinion, as the HTTP overhead plays a big role here and can be mitigated by running more proxy servers in parallel. Still, 2k jobs per second is a pretty good number for a lot of use cases. Obviously, these figures depend heavily on how long the actual endpoint takes to process each job, so they will vary a lot in practice.

Production readiness

As mentioned, the proxy is quite thin, so all the heavy lifting is still performed by the battle-tested BullMQ library, and we expect the service to be quite stable. Still, it is new code, and despite our efforts, there are likely unknown bugs. Furthermore, we are running on BunJS, which is itself a relatively new runtime and may thus introduce unknown issues of its own.

What we can promise is that if you find any issues, we will do our best to fix them as soon as possible.

What's next?

All in all, I think the proxy is quite usable already, but we have an ambitious roadmap ahead of us: significant enhancements such as support for flows, alongside smaller features like queue actions (pause/resume, clean, etc.), job actions such as promoting delayed jobs, and manual job fetching, among others. The goal is to enable users to do exactly the same things with the proxy as they can by using BullMQ directly.

We have prepared some initial documentation for your perusal, and as always, feel free to contact me if you have any questions or suggestions.