Tuesday, October 10, 2023

Dragonfly + BullMQ = massive performance

Over the past year, there's been increasing buzz around Dragonfly, a new Redis™ drop-in replacement that aims to be the fastest and most memory-efficient Redis™-compatible database.

BullMQ makes very heavy use of LUA scripts, since the kind of advanced features it provides really requires this level of complexity. The downside is that this puts a lot of pressure on the data store, especially if you want to provide very high throughput while remaining fully compatible with Redis™.

When I heard about Dragonfly over one year ago, I was very eager to try it out and see how it would perform with BullMQ. Some of the features I thought were going to be beneficial for BullMQ users were:

  • Faster and more memory-efficient data structures (up to 30%).
  • Support for multicore processing.
  • Faster and more memory-efficient snapshotting.
  • A newer LUA engine (5.4), up to 2x faster than 5.1.

Unfortunately, the initial versions of Dragonfly were not compatible with BullMQ, as our LUA scripts created dynamic keys, which were not supported at the time. This incompatibility stemmed from one of Dragonfly's key features: its support for multicore processing. This allowed it to distribute requests across different threads, processing them simultaneously.

Since Dragonfly could not know which keys were going to be touched by a given LUA script, it could not process the requests in parallel without breaking the data consistency.

To address this, a temporary solution was introduced: the "default_lua_flags=allow-undeclared-keys" flag. This flag essentially directed Dragonfly not to parallelize LUA scripts, which allowed BullMQ to run on Dragonfly. The performance was not great, though: around 50% slower than Redis™, because Dragonfly's multicore architecture required global locks to make sure the LUA scripts were never interleaved.
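The constraint described above can be illustrated with a toy model. This is only a sketch of the reasoning, not Dragonfly's actual scheduler: the `Script` type, the `mayInterleave` function and the key names are all hypothetical. The idea is that two scripts can safely run at the same time only when both declare their keys up front and those key sets do not overlap; a script with unknown keys has to be treated as if it could touch anything.

```typescript
// Toy model (hypothetical, not Dragonfly code): a script either declares
// the keys it will touch, or its keys are unknown (null).
type Script = { name: string; declaredKeys: string[] | null };

// Two scripts may run concurrently only if both declare their keys
// and the declared key sets are disjoint.
function mayInterleave(a: Script, b: Script): boolean {
  const keysA = a.declaredKeys;
  const keysB = b.declaredKeys;
  if (keysA === null || keysB === null) {
    // Unknown keys: assume a conflict, which forces serial execution.
    return false;
  }
  return !keysA.some((key) => keysB.indexOf(key) !== -1);
}

// Illustrative key names only.
const addToEmails: Script = { name: "addJob", declaredKeys: ["bull:emails:wait", "bull:emails:id"] };
const addToVideos: Script = { name: "addJob", declaredKeys: ["bull:videos:wait", "bull:videos:id"] };
const dynamicScript: Script = { name: "moveToActive", declaredKeys: null };

console.log(mayInterleave(addToEmails, addToVideos));   // disjoint declared keys
console.log(mayInterleave(addToEmails, dynamicScript)); // one side has unknown keys
```

In this model a single script with undeclared keys blocks parallelism with every other script, which is the same effect the global lock has.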

However, the Dragonfly team did ultimately find a solution to this problem, which I think is pretty smart: inspired by Redis™ Cluster, it uses hashtags to tie a specific queue to an individual thread. The advantage here is scalability: as more queues are added, they're distributed evenly across available threads. Consequently, each thread can process its jobs concurrently, leading to a marked performance boost. Detailed instructions on how to configure Dragonfly for BullMQ can be found here.
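To make the hashtag idea more concrete, here is a minimal sketch of the Redis™ Cluster hash tag rule: if a key contains a non-empty `{...}` section, only that section is hashed (CRC16, modulo 16384 slots). The last step, `threadFor`, is purely an assumption for illustration; how Dragonfly actually assigns tags to threads is not described in this post. The key names assume a queue called `{orders}` and ASCII-only keys.

```typescript
// CRC16 (XMODEM variant), the checksum Redis Cluster uses for key slots.
// Assumes ASCII keys: each character is treated as one byte.
function crc16(data: string): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc ^= (data.charCodeAt(i) & 0xff) << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Redis Cluster hash tag rule: hash only the text between the first "{"
// and the next "}", provided that text is non-empty.
function hashTag(key: string): string {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}

function keySlot(key: string): number {
  return crc16(hashTag(key)) % 16384;
}

// Hypothetical mapping from slot to thread, for illustration only.
function threadFor(key: string, threadCount: number): number {
  return keySlot(key) % threadCount;
}

// Every key of the "{orders}" queue shares one tag, so they land together.
const orderKeys = ["bull:{orders}:wait", "bull:{orders}:active", "bull:{orders}:meta"];
orderKeys.forEach((key) => console.log(key, keySlot(key), threadFor(key, 8)));
```

Because all keys of one queue hash to the same value, a script that touches only that queue never needs to coordinate with another thread, while different queues can end up on different threads.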

In order to benchmark and compare Dragonfly with Redis™, we have created a small benchmark tool that can be used to add and process jobs as fast as possible in different setups. It can be run against Redis™ or Dragonfly, and we can specify parameters such as the number of queues, the number of workers, the concurrency and so on.

Add jobs per second (1 queue)

  • Redis 6.2.13: 64,032
  • Redis 7.2.1: 69,035
  • Dragonfly 1t: 76,838
  • Dragonfly 8t: 60,430

Process jobs per second (1 queue). 100 concurrency.

  • Redis 6.2.13: 28,327
  • Redis 7.2.1: 30,851
  • Dragonfly 1t: 30,394
  • Dragonfly 8t: 27,344

The benchmark above tests the simplest scenario, where only 1 queue is used; here Dragonfly is on par with or slightly faster than Redis™. Adding more threads in this case does not increase performance but actually affects it negatively. However, a more common scenario is a system with several queues, and there the picture is completely different. See the benchmarks below, where we use 16 queues.

Add jobs per second (16 queues)

  • Redis 6.2.13: 59,674
  • Redis 7.2.1: 62,899
  • DFLY 1t: 66,299
  • DFLY 4t: 93,946
  • DFLY 8t: 120,355

Process jobs per second (16 queues). 100 concurrency.

  • Redis 6.2.13: 27,899
  • Redis 7.2.1: 28,807
  • DFLY 1t: 25,789
  • DFLY 4t: 35,141
  • DFLY 8t: 59,539

Now we can clearly see the performance improvements that a single Dragonfly instance can deliver when running on a multicore machine. We also tested with more queues, 64 in this case. Although such a large number of queues is not so common in practice, it serves as an upper bound on the performance we can expect right now.

Add jobs per second (64 queues)

  • Redis 6.2.13: 58,982
  • Redis 7.2.1: 63,898
  • DFLY 1t: 59,597
  • DFLY 4t: 121,856
  • DFLY 8t: 141,107

Process jobs per second (64 queues). 100 concurrency.

  • Redis 6.2.13: 29,269
  • Redis 7.2.1: 30,996
  • DFLY 1t: 25,789
  • DFLY 4t: 54,590
  • DFLY 8t: 72,775

(All the tests were run with node index.js -h $SERVER_IP -c 100 -d 10 -r 8 -w 8 -q $NUM_QUEUES, with Dragonfly/Redis running on AWS c7i.2xlarge and BullMQ running on AWS c7i.16xlarge.)
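To put the multi-queue figures above in relative terms, the short sketch below divides the Dragonfly 8-thread result by the Redis 7.2.1 result for each scenario. The numbers are copied from the benchmarks in this post; the `Result` type and `speedup` function are just illustrative names.

```typescript
// Jobs per second, copied from the 16-queue and 64-queue benchmarks above.
interface Result {
  scenario: string;
  redis721: number;
  dragonfly8t: number;
}

const results: Result[] = [
  { scenario: "add, 16 queues", redis721: 62899, dragonfly8t: 120355 },
  { scenario: "process, 16 queues", redis721: 28807, dragonfly8t: 59539 },
  { scenario: "add, 64 queues", redis721: 63898, dragonfly8t: 141107 },
  { scenario: "process, 64 queues", redis721: 30996, dragonfly8t: 72775 },
];

// Ratio of Dragonfly (8 threads) throughput to Redis 7.2.1 throughput.
function speedup(r: Result): number {
  return r.dragonfly8t / r.redis721;
}

results.forEach((r) => {
  console.log(r.scenario + ": " + speedup(r).toFixed(2) + "x");
});
```

With these figures the ratio ranges from roughly 1.9x (adding jobs, 16 queues) to roughly 2.3x (processing jobs, 64 queues).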

I think these numbers are quite impressive, and as this is just the start of the journey it will only get better from here. Another thing that I find remarkable about Dragonfly is the comprehensive and user-friendly documentation, including in-depth technical explanations that instill confidence in the team's expertise.

It is great news for BullMQ users to have another highly competitive alternative to choose from, which also demonstrates that there is a vibrant community betting on a technology that will continue to be highly relevant in years to come.

Finally, I would like to give a big thanks to the Dragonfly team for their hard work and for making this possible, and special thanks to Shahar Mike, who provided the hardware and ran the tests. I am very excited to see how Dragonfly evolves in the future, as it is surely going to be a game changer in the Redis™ ecosystem.

You are welcome to clone the repo, try it out yourself and see how it performs on your machine with different configurations. You can also check our older benchmark tool, which we have used in the past to compare BullMQ performance with previous versions.