Issue with retrying connection
An issue has been identified where workers mysteriously stop processing jobs after a connection loss
and a subsequent reconnection.
The issue manifests itself once the connection is re-established: although the workers are indeed
connected to Redis again, they no longer process any jobs.
We have traced this issue to the speed at which reconnections are retried. In the default
configuration, reconnection is retried using an exponential backoff strategy, and the first retries
are performed within milliseconds. This triggers the following bug in the Redis client library:
ioredis #1718
The following code was used in BullMQ to define the default retry strategy for Redis connections:
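```typescript
this.opts = {
  port: 6379,
  host: "127.0.0.1",
  // ioredis calls this with the attempt number (1, 2, 3, ...) and waits
  // the returned number of milliseconds before reconnecting.
  retryStrategy: function (times: number) {
    return Math.min(Math.exp(times), 20000);
  },
  // User-supplied options, including a custom retryStrategy, override the defaults.
  ...opts,
};
```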
As can be seen in the code above, the delay is e^times milliseconds, capped at 20 seconds: the first retry
is performed after about 2.7 ms, the second after about 7.4 ms, and so on. The backoff is exponential, but
the first retries are so fast that they trigger the bug in the Redis client library.
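To make the timing concrete, the sketch below evaluates the same formula for the first five attempts. It assumes, as ioredis does, that `retryStrategy` receives the attempt number starting at 1 and returns the delay in milliseconds; `oldRetryStrategy` is only a local name used for illustration.

```typescript
// The pre-v3.6.2 default retry strategy, reproduced as a standalone function.
// The argument is the reconnection attempt number (1, 2, 3, ...);
// the return value is the delay in milliseconds before that attempt.
function oldRetryStrategy(times: number): number {
  return Math.min(Math.exp(times), 20000);
}

// Collect and print the delay for the first five attempts.
const oldDelays: number[] = [];
for (let times = 1; times <= 5; times++) {
  const delay = oldRetryStrategy(times);
  oldDelays.push(delay);
  console.log(`attempt ${times}: ${delay.toFixed(1)} ms`);
}
// attempt 1: 2.7 ms
// attempt 2: 7.4 ms
// attempt 3: 20.1 ms
// attempt 4: 54.6 ms
// attempt 5: 148.4 ms
```

All five reconnection attempts fall within roughly the first quarter of a second.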
The issue has been fixed in BullMQ v3.6.2, and the default retry strategy is now as follows:
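```typescript
this.opts = {
  port: 6379,
  host: "127.0.0.1",
  // Exponential backoff, capped at 20 s and never shorter than 1 s.
  retryStrategy: function (times: number) {
    return Math.max(Math.min(Math.exp(times), 20000), 1000);
  },
  // User-supplied options, including a custom retryStrategy, override the defaults.
  ...opts,
};
```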
So every retry now waits at least 1 second, after which the delay follows the same exponential backoff, still capped at 20 seconds.
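Evaluating the new formula for the first ten attempts shows where the floor and the cap apply. Because e^6 ≈ 403 ms is still below 1000 ms, the one-second floor governs the first six retries; the exponential term only takes over from the seventh attempt, until it hits the 20-second cap. `newRetryStrategy` is a local name used for illustration.

```typescript
// The v3.6.2 default retry strategy, reproduced as a standalone function.
function newRetryStrategy(times: number): number {
  // Exponential delay, capped at 20 s and floored at 1 s.
  return Math.max(Math.min(Math.exp(times), 20000), 1000);
}

// Delays (rounded to whole milliseconds) for attempts 1 through 10.
const newDelays: number[] = [];
for (let times = 1; times <= 10; times++) {
  newDelays.push(Math.round(newRetryStrategy(times)));
}
console.log(newDelays.join(", "));
// 1000, 1000, 1000, 1000, 1000, 1000, 1097, 2981, 8103, 20000
```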
However, if you define your own custom retry strategy, make sure that the first retries are not
performed within milliseconds: wait a minimum of one second between retries, at least until the issue in the Redis client library is fixed.
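As one possible shape for such a custom strategy, the sketch below uses a hypothetical linear backoff of 500 ms per attempt and clamps it between 1 s and 30 s. The function name, the step size, and the 30 s ceiling are illustrative assumptions, not BullMQ defaults; the only point carried over from the text above is the one-second floor. The commented-out `Worker` call assumes the strategy is passed through the ioredis options in the `connection` setting.

```typescript
// Hypothetical bounds for a custom strategy; only the 1 s floor follows the advice above.
const MIN_RETRY_DELAY_MS = 1000;
const MAX_RETRY_DELAY_MS = 30000;

// Hypothetical linear backoff: 500 ms per attempt, clamped to [1 s, 30 s].
function customRetryStrategy(times: number): number {
  const delay = times * 500;
  return Math.min(Math.max(delay, MIN_RETRY_DELAY_MS), MAX_RETRY_DELAY_MS);
}

// It would be supplied alongside the other connection options, for example:
//   new Worker("my-queue", processor, {
//     connection: { host: "127.0.0.1", port: 6379, retryStrategy: customRetryStrategy },
//   });

for (const times of [1, 2, 5, 100]) {
  console.log(`attempt ${times}: ${customRetryStrategy(times)} ms`);
}
// attempt 1: 1000 ms
// attempt 2: 1000 ms
// attempt 5: 2500 ms
// attempt 100: 30000 ms
```

Clamping after computing the raw delay keeps the floor independent of the growth curve, so the same pattern works if the linear step is swapped for an exponential one.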