Wednesday, June 21, 2023

Asymptotically faster priority jobs

For quite some time, the performance characteristics of the 'Priority Jobs' feature in BullMQ weren't meeting our expectations. Ideally, maintaining a priority list should have a performance complexity of O(logn). However, due to the mechanism implemented to evade polling, we had to adopt a different approach with a complexity of O(n).

Our original algorithm utilized a sorted set to determine the position of a priority job relative to other jobs in the queue. Whenever a new job was added, we calculated its insertion point based on its priority and added it to the waitlist at that position. This methodology was functional, but when dealing with a large number of jobs in the queue, the time required to add a job could become significant. Moreover, maintaining the mechanism was a complex process.

With the advancement of our understanding and new techniques to manage polling-free queues without needing a list for all queued jobs, we were able to develop an approach with a complexity of O(logn). This is easier to understand and maintain, as it relies solely on the sorted set.

While this new method offers simplicity and superior asymptotic complexity, it presented a challenge: maintaining order within a given priority. In Redis, sorted sets are organized by a score, which would be the job's priority in our context. However, if two jobs have identical priority, Redis sorts them lexico-graphically by job id, disrupting our desired order of job addition to the queue.

To resolve this, we introduced a counter key that increments every time a new job enters the priority set. This counter helps calculate the score. Instead of using the priority directly as a score, we shift the priority 32 bits to the left and add the counter to it. This ensures a unique score for each job and maintains the order of job addition to the queue. However, this approach brings some limitations that we should be aware of:

  • The maximum priority value is 2^21 (0 to 2,097,152). This is because the score is a 64-bit floating-point number, which only guarantees 53 bits for integer numbers, and we are shifting 32 bits to the left to accommodate the counter.

  • After exactly 2^32 jobs (over 4 billion) are added to the queue, the counter will overflow and disrupt the job order temporarily. While this won't affect most use cases, it's something to note. We are mitigating this issue by resetting the counter each time the queue is emptied. While not perfect, this solution should suffice for most scenarios.

  • Jobs without specified priority now gain maximum priority. In the previous version, a job without priority was added at the end of the waitlist, requiring it to wait for all prioritized jobs to be processed first. Now, with priority jobs existing only in the priority set, any job added to the waitlist (i.e., jobs without priority) will be processed immediately. This adjustment actually brings more consistency if we consider priority 0 as the highest priority, implying it is logical for a job without assigned priority to be processed before any other job.

  • The new mechanism implies a breaking change. The zset score isn't compatible with the old one. Furthermore, in the older version, the job ids existed both in the waitlist and the priority set simultaneously, whereas now they only exist in the priority (called "prioritized" now in Redis) set. To alleviate this, we use a different key for the new priority jobs. You would need to manually remove the old set (using the new Queue method "removeDeprecatedPriorityKey") when you have fully upgraded all instances using the old library version.

Since the feature implies a breaking change it is available in BullMQ 4.0.0. You can also find the PR that implements the new feature here.