BullMQ Python vs RQ Performance Benchmark
Python developers have long relied on RQ (Redis Queue) as the go-to solution for background job processing with Redis. It's simple, well-documented, and has been around since 2012. But as Python applications scale, the question arises: is there a faster alternative?
BullMQ Python is a native Python port of the popular BullMQ library, bringing the same battle-tested architecture that powers millions of jobs in production Node.js applications. With its async-first design and optimized Lua scripts, BullMQ Python promises significant performance improvements over traditional synchronous queues.
In this benchmark, we'll put both libraries head-to-head to see how they perform in real-world scenarios.
Architecture Differences
Before diving into the numbers, it's important to understand the fundamental architectural differences between these two libraries:
RQ (Redis Queue)
- Synchronous, blocking architecture
- One job processed at a time per worker
- Simple and straightforward design
- To scale, you run multiple worker processes
BullMQ Python
- Asynchronous, non-blocking architecture (built on asyncio)
- Configurable concurrency per worker
- Lua scripts for atomic Redis operations
- Single worker can handle many concurrent jobs
These design choices have real implications. RQ's simplicity is its strength for basic use cases, but BullMQ's async architecture allows a single worker process to handle multiple jobs concurrently, significantly improving resource efficiency.
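To make the contrast concrete, here is a minimal sketch of the two worker models, assuming the `bullmq` Python package's async `Worker` API and a local Redis; the queue name and timings are placeholders:

```python
# Minimal sketch (assumed API surface): one BullMQ Python worker handling
# many jobs concurrently inside a single asyncio event loop.
import asyncio
from bullmq import Worker

async def process(job, token):
    # Any awaited I/O here yields the event loop, letting other jobs run.
    await asyncio.sleep(0.01)
    return job.data

async def main():
    worker = Worker("jobs", process, {"concurrency": 50})  # one process, 50 in-flight jobs
    await asyncio.sleep(60)  # let the worker run for a while
    await worker.close()

asyncio.run(main())
```

The RQ equivalent of `concurrency=50` is fifty separate processes, e.g. launching `rq worker jobs` fifty times under a process manager.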
Feature Comparison
Beyond performance, the two libraries differ significantly in available features:
| Feature | BullMQ Python | RQ |
|---|---|---|
| Async support | Native (asyncio) | No (synchronous) |
| Concurrency per worker | Configurable (1–1000+) | 1 (scale via processes) |
| Job priorities | Yes (numeric) | Separate queues only |
| Rate limiting | Yes (global, per-queue) | No |
| Delayed jobs | Yes (built-in) | Via rq-scheduler |
| Repeatable / cron jobs | Yes (built-in) | Via rq-scheduler |
| Retries with backoff | Yes (exponential, custom) | Basic (fixed count) |
| Parent–child flows | Yes (FlowProducer) | Basic dependencies |
| Job progress tracking | Yes | No |
| Global events | Yes | No |
| Stalled job recovery | Automatic | Manual |
| Sandboxed processors | Yes | Via forking worker |
| Dashboard / UI | Taskforce.sh / Bull Board | rq-dashboard |
| Atomic operations | Lua scripts | Python + Redis calls |
Benchmark Configuration
I ran these benchmarks on my local machine:
- Machine: MacBook Pro with M2 Pro chip, 16GB RAM
- Python: 3.13
- Redis: 6.4.0 (local Docker instance)
- BullMQ Python: 2.19.5
- RQ: 2.6.1
- Methodology: 5 runs per test, reporting mean values
For RQ, we used SimpleWorker, which processes jobs in the same process without forking; this provides the fairest comparison, since BullMQ also processes jobs in-process. For processing tests with simulated I/O work, we also tested RQ with multiple worker processes to match BullMQ's concurrency level.
Benchmark Results
Bulk Job Insertion
Adding 50,000 jobs at once using bulk insertion:
For bulk insertions, BullMQ holds a modest ~17% edge (~21,900 vs ~18,700 jobs/sec). Both libraries batch Redis round-trips efficiently: BullMQ uses pipelined Lua scripts, while RQ uses enqueue_many() with its own pipelining, so the results land in the same ballpark.
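For reference, the bulk paths being exercised look roughly like this (a sketch, not the benchmark harness itself; queue names and payloads are placeholders):

```python
# BullMQ Python: addBulk() batches the inserts into pipelined Lua calls.
from bullmq import Queue as BullQueue

async def bulk_add_bullmq():
    queue = BullQueue("bench")
    await queue.addBulk([{"name": "task", "data": {"i": i}} for i in range(50_000)])
    await queue.close()

# RQ: enqueue_many() with prepared job data uses a Redis pipeline.
from redis import Redis
from rq import Queue as RQQueue

def task(i):  # RQ jobs must be importable functions
    return i

def bulk_add_rq():
    q = RQQueue("bench", connection=Redis())
    q.enqueue_many([RQQueue.prepare_data(task, args=(i,)) for i in range(50_000)])
```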
Single Job Insertion (Concurrent)
Adding 5,000 jobs with concurrent add() calls (concurrency=10):
Here BullMQ shows a ~2.2x advantage (6,200 vs 2,800 jobs/sec). The async architecture allows multiple add() calls to be in flight simultaneously, while RQ's synchronous design means each insertion blocks until it completes. This is a significant win for applications that need to enqueue jobs from async web frameworks like FastAPI or Starlette.
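The pattern under test looks something like the following sketch, with a semaphore capping the number of in-flight add() calls (queue and job names are illustrative):

```python
import asyncio
from bullmq import Queue

async def concurrent_add(n=5_000, concurrency=10):
    queue = Queue("bench")
    sem = asyncio.Semaphore(concurrency)  # cap in-flight add() calls

    async def add_one(i):
        async with sem:
            await queue.add("task", {"i": i})

    # Launch all inserts; the semaphore keeps 10 in flight at a time.
    await asyncio.gather(*(add_one(i) for i in range(n)))
    await queue.close()

asyncio.run(concurrent_add())
```

With RQ, each enqueue() call blocks the calling thread until Redis acknowledges it, so achieving the same concurrency requires threads or extra processes.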
Processing with Simulated I/O Work
Most real-world jobs involve some I/O: calling APIs, querying databases, reading files. To simulate this, each job performs a 10ms async sleep (BullMQ) or 10ms thread sleep (RQ).
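The two job bodies are trivially small, but they behave very differently under the hood (a sketch of the workloads described above):

```python
import asyncio
import time

async def bullmq_io_job(job, token):
    await asyncio.sleep(0.01)  # yields the event loop: other jobs keep running

def rq_io_job():
    time.sleep(0.01)  # blocks the entire worker process for 10ms
```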
For a fair scaling comparison, we match BullMQ's concurrency level with an equal number of RQ worker processes. At concurrency=10:
With 10 concurrent workers, BullMQ is 39% faster (654 vs 471 jobs/sec). RQ performs respectably here: 10 OS processes provide real parallelism. But the gap widens dramatically when we scale to 50:
At concurrency=50, BullMQ is 2.6x faster (3,100 vs 1,200 jobs/sec). BullMQ scales near-linearly (654 → 3,100, a 4.7x increase for 5x more concurrency), while RQ shows sub-linear scaling (471 → 1,200, only 2.5x for 5x more workers). The reason? 50 RQ processes all compete for the same Redis queue, creating contention. BullMQ handles all 50 concurrent jobs in a single process with zero contention.
Pure Queue Overhead
To isolate the queue machinery cost from the job work itself, we process no-op jobs (jobs that return immediately without doing any work). This test deliberately uses a single RQ worker to measure the true per-job overhead floor; note that RQ's throughput here is lower than in the I/O test above, which used 10 parallel worker processes:
BullMQ is ~10x faster for raw job turnover (3,400 vs 335 jobs/sec). This explains why RQ struggles with high-throughput workloads: each RQ job cycle involves approximately 24 sequential Redis round-trips (dequeue, deserialize, status updates, result storage, cleanup). At ~0.14ms per Redis round-trip on localhost, that's ~3.3ms of unavoidable per-job overhead, capping a single worker at ~300 jobs/sec regardless of how lightweight the job is.
BullMQ uses optimized Lua scripts that batch multiple Redis operations into atomic calls, reducing per-job overhead to a fraction of RQ's.
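To illustrate the technique (this is not BullMQ's actual script, just the same idea), a Lua script can collapse several dependent Redis commands into a single atomic round trip via redis-py:

```python
import time
import redis

r = redis.Redis()

# One EVALSHA replaces three separate commands: pop from the wait list,
# push onto the active list, and stamp the job hash, all atomically.
move_to_active = r.register_script("""
local job_id = redis.call('RPOPLPUSH', KEYS[1], KEYS[2])
if job_id then
  redis.call('HSET', KEYS[3] .. job_id, 'processedOn', ARGV[1])
end
return job_id
""")

job_id = move_to_active(
    keys=["bench:wait", "bench:active", "bench:"],
    args=[int(time.time() * 1000)],
)
```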
CPU-Bound Processing
Processing jobs with CPU work (1,000 sin/cos operations per job, matching the Elixir benchmark methodology):
BullMQ is ~8x faster for CPU-bound work (2,900 vs 351 jobs/sec). Since both libraries are single-threaded for CPU work (Python's GIL limits parallelism), the difference comes entirely from per-job overhead: BullMQ's ~0.3ms vs RQ's ~3ms. When the job work itself is lightweight (~0.3ms for 1000 sin/cos), BullMQ's lower overhead translates directly into higher throughput.
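For reference, the per-job CPU workload is along these lines (a sketch matching the stated methodology):

```python
import math

def cpu_work():
    # ~1,000 sin/cos operations per job, roughly 0.3ms of pure CPU time
    acc = 0.0
    for i in range(1_000):
        acc += math.sin(i) * math.cos(i)
    return acc
```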
Summary
| Benchmark | BullMQ Python | RQ | Winner |
|---|---|---|---|
| Bulk Insert | 21,900/sec | 18,700/sec | BullMQ (1.2x) |
| Single Insert | 6,200/sec | 2,800/sec | BullMQ (2.2x) |
| 10ms I/O (c=10) | 654/sec | 471/sec (10 workers) | BullMQ (1.4x) |
| 10ms I/O (c=50) | 3,100/sec | 1,200/sec (50 workers) | BullMQ (2.6x) |
| Pure Overhead | 3,400/sec | 335/sec (1 worker) | BullMQ (10x) |
| CPU Processing | 2,900/sec | 351/sec | BullMQ (8x) |
Understanding Python's Performance Ceiling
Readers familiar with BullMQ's Node.js implementation may notice that the Python version tops out at ~3,400 jobs/sec for pure overhead, while the Node.js counterpart routinely exceeds 30,000 jobs/sec. The gap is not a design issue; it is a platform-level constraint rooted in the performance of Python's Redis client.
To pinpoint the bottleneck we profiled every layer of BullMQ Python's job-processing pipeline:
| Operation | Time per call |
|---|---|
| Redis round-trip (PING) | 0.22 ms |
| Lua script eval (simple) | 0.23 ms |
| Full job cycle (c=1) | 0.51 ms |
| JSON encode / decode | < 0.002 ms |
| asyncio scheduling | 0.05 ms |
The dominant cost is the Redis round-trip itself. BullMQ Python uses redis-py, the standard async Redis client for Python. Node.js uses ioredis, which benefits from several structural advantages:
- C++ networking via libuv: ioredis performs socket I/O through Node.js's libuv layer, which is written in C/C++. redis-py uses Python's asyncio stream layer, which is implemented in pure Python.
- Native protocol parsing: ioredis can use hiredis (a C library) to parse Redis protocol responses. redis-py parses them in Python by default (hiredis is available as an optional extra).
- Lower event-loop overhead: Node.js's libuv dispatches callbacks roughly 10× faster than Python's asyncio event loop.
The net effect is that a single Redis round-trip takes ~0.22 ms in Python versus ~0.05–0.08 ms in Node.js: a 3–4× difference. Because every job requires at least one Redis Lua-script call, this per-call gap directly caps throughput.
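You can reproduce the round-trip floor with a few lines against a local Redis (numbers will vary by machine):

```python
import asyncio
import time
import redis.asyncio as redis

async def ping_latency(n=1_000):
    r = redis.Redis()
    start = time.perf_counter()
    for _ in range(n):
        await r.ping()  # one full round trip per iteration
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n * 1000:.3f} ms per round trip")
    await r.aclose()

asyncio.run(ping_latency())
```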
Unfortunately, redis-py is the only mature async Redis client available for Python, so this ceiling is effectively a platform limitation rather than a BullMQ design issue. Despite this constraint, BullMQ Python still delivers the fastest job processing in the Python ecosystem, up to 10× faster than RQ and considerably ahead of any other Python queue library we've tested.
The Resource Efficiency Story
The raw numbers tell only part of the story. Consider what these results mean in practice:
Single Process Comparison:
- 1 BullMQ worker (concurrency=50): ~3,100 jobs/sec (10ms I/O work)
- 50 RQ worker processes: ~1,200 jobs/sec (same 10ms I/O work)
BullMQ achieves 2.6x higher throughput in a single process compared to 50 RQ processes. Each RQ process consumes memory independently, needs its own Redis connection, and adds operational complexity. BullMQ's async architecture eliminates this overhead entirely.
Scaling Behavior:
- BullMQ scales near-linearly with concurrency: c=10 → c=50 yields 4.7x throughput
- RQ scales sub-linearly with workers: 10 → 50 workers yields only 2.5x throughput
The sub-linear scaling for RQ comes from Redis contention: 50 processes all polling the same queue simultaneously. BullMQ avoids this by scheduling all work in a single event loop.
For a production workload processing 100,000 jobs per hour (with real I/O work):
- BullMQ: 1 worker process
- RQ: 25–30 worker processes
This translates directly to infrastructure savings, simpler deployments, and reduced resource consumption.
When to Choose Each
Choose RQ when:
- You need simplicity above all else
- Your job volume is moderate (less than 1,000 jobs/min)
- You prefer synchronous Python code
- You're already running multiple worker processes anyway
Choose BullMQ Python when:
- You need high throughput from minimal workers
- You're using async Python (FastAPI, Starlette, etc.; see the sketch after this list)
- You want advanced features (priorities, rate limiting, job dependencies)
- Resource efficiency matters for your infrastructure costs
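As an example of the async-framework fit, enqueuing from a FastAPI handler never blocks the event loop (the endpoint, queue, and job names here are hypothetical):

```python
from bullmq import Queue
from fastapi import FastAPI

app = FastAPI()
queue = Queue("emails")  # hypothetical queue name

@app.post("/signup")
async def signup(email: str):
    # add() is a coroutine, so the handler awaits Redis without blocking the loop.
    await queue.add("welcome-email", {"to": email})
    return {"status": "queued"}
```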
Conclusion
BullMQ Python delivers substantial performance improvements over RQ across all benchmark categories. The async architecture and optimized Lua scripts provide up to 10x speedups for pure job throughput, 2.6x gains for realistic I/O workloads at scale, 8x for CPU-bound work, and 2.2x for concurrent single insertions. BullMQ also scales near-linearly with concurrency, while RQ's multi-process scaling hits diminishing returns from Redis contention.
The key architectural insight: RQ's SimpleWorker performs ~24 sequential Redis round-trips per job, creating an unavoidable ~3ms overhead floor. BullMQ batches these operations into atomic Lua scripts, keeping per-job overhead under 0.3ms, a 10x reduction that compounds across every job processed.
For Python applications that need to process high volumes of background jobs efficiently, BullMQ Python offers a compelling alternative to traditional synchronous queues. The ability to handle thousands of jobs per second with a single worker process can significantly reduce infrastructure complexity and costs.
The benchmark code is available at bullmq-python-bench if you'd like to run these tests yourself.
Ready to try BullMQ Python? Check out the documentation to get started.