Hangfire.Pro.Redis 3.2.0 Performance Issues

We have been updating our Redis connections to support Sentinel, and recently upgraded Hangfire.Pro.Redis from 3.0.0 to 3.2.0. Since doing so, we have been running into performance issues in our performance testing environments.

Most of our API requests that write to the database also enqueue a Hangfire job into Redis to dispatch some messages in the background.
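
For context, the enqueue happens through the standard Hangfire client API. A minimal sketch of the pattern (MessageDispatcher and its Dispatch method are hypothetical placeholders for our real message-sending code):

    using Hangfire;

    // Inside an API request handler, after the database write succeeds,
    // enqueue a background job into Redis (requires JobStorage to be configured).
    BackgroundJob.Enqueue<MessageDispatcher>(d => d.Dispatch("message-id"));

    // Hypothetical placeholder for our real message-sending code.
    public class MessageDispatcher
    {
        public void Dispatch(string messageId) { /* send the message */ }
    }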

Here is a normal percentile graph from our perf suite:

Here is a percentile graph after applying our Redis updates and upgrading to Hangfire.Pro.Redis 3.2.0:

Today we started running tests on the new version, then deployed the old version of the code mid-test, and we could see the percentiles immediately improve. We then created a temporary branch that contains all of the logic changes, where the only other change is downgrading Hangfire.Pro.Redis from 3.2.0 back to 3.0.0.

On this run:

10:54 - 11:12 = newest build with logic changes + updated libraries
11:12 - 11:40 = revert to build before logic changes + old libraries
11:40 onwards = custom build with new logic changes + old Hangfire.Pro.Redis library

We also see that while the tests are running, the background job creation API now randomly throws an exception saying it is attempting to access a disposed connection.

Method: System.String Create(Hangfire.Common.Job, Hangfire.States.IState, System.Collections.Generic.IDictionary`2[System.String,System.Object]) (163, 17)
	Exception 02 inner --------------------------
	Type: StackExchange.Redis.RedisConnectionException
	Source: StackExchange.Redis.ConnectionMultiplexer, Hangfire.Pro.Redis, Version=3.2.0.0, Culture=neutral, PublicKeyToken=null
	Message: ConnectionDisposed on EXEC
	Trace:    at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in C:\projects\stackexchange-redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 2237
   at Hangfire.Pro.Redis.RedisConnection.CreateExpiredJob(Job job, IDictionary`2 parameters, DateTime createdAt, TimeSpan expireIn) in C:\projects\hangfire-pro-redis\src\Hangfire.Pro.Redis\RedisConnection.cs:line 92
   at Hangfire.Client.CoreBackgroundJobFactory.RetryOnException[TContext,TResult](Int32& attemptsLeft, Func`3 action, TContext context) in C:\projects\hangfire-525\src\Hangfire.Core\Client\CoreBackgroundJobFactory.cs:line 154
--- End of stack trace from previous location ---
   at Hangfire.Client.CoreBackgroundJobFactory.RetryOnException[TContext,TResult](Int32& attemptsLeft, Func`3 action, TContext context) in C:\projects\hangfire-525\src\Hangfire.Core\Client\CoreBackgroundJobFactory.cs:line 179
   at Hangfire.Client.CoreBackgroundJobFactory.Create(CreateContext context) in C:\projects\hangfire-525\src\Hangfire.Core\Client\CoreBackgroundJobFactory.cs:line 73
   at Hangfire.Client.BackgroundJobFactory.InvokeClientFilter(Enumerator& enumerator, IBackgroundJobFactory innerFactory, CreateContext context, CreatingContext preContext) in C:\projects\hangfire-525\src\Hangfire.Core\Client\BackgroundJobFactory.cs:line 141
   at Hangfire.Client.BackgroundJobFactory.Create(CreateContext context) in C:\projects\hangfire-525\src\Hangfire.Core\Client\BackgroundJobFactory.cs:line 76
   at Hangfire.BackgroundJobClient.Create(Job job, IState state, IDictionary`2 parameters) in C:\projects\hangfire-525\src\Hangfire.Core\BackgroundJobClient.cs:line 156
	Location: C:\projects\stackexchange-redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs
	Method: T ExecuteSyncImpl[T](StackExchange.Redis.Message, StackExchange.Redis.ResultProcessor`1[T], StackExchange.Redis.ServerEndPoint) (2237, 37)

I’m currently asking whoever has access to the private NuGet feed to grab a copy of 3.1.0 to see if it also has issues.

Note: Sentinel currently isn’t being used; we are connecting to all 3 Redis nodes directly.

Thank you for the detailed report! The ObjectDisposedException is thrown because the connection is being re-established for some reason, and we should understand why that is being triggered. Could you also provide logs related to the Hangfire.Pro.Redis namespace?

Hey,

Sorry for the delays on this; our perf testing environment has been pretty busy over the past few weeks, with public holidays mixed in.

We’ve managed to confirm over the last few days that this issue only exists in Hangfire.Pro.Redis 3.2.0. We’ve verified the problem does not occur on 3.0.0, 3.1.0, or 3.1.1, which unblocks our Sentinel support, as we can launch on the 3.1.x versions for now.

I’ve included some screenshots from setting up a Hangfire logger at trace level, connected to our ELK stack. The Hangfire.Pro.Redis 3.2.0 library reports frequent disconnections from, and reconnections to, the Redis cluster on the server side. Breaking it down to a single Kubernetes pod, we can see this occurs roughly every 30 seconds. No exceptions are being thrown.
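
In case it helps, here is roughly how we wired up the trace-level logger. This is a sketch: we use Serilog here purely as an illustration (the real output is shipped to our ELK stack through our logging pipeline), and the host names are placeholders:

    using Hangfire;
    using Serilog;

    Log.Logger = new LoggerConfiguration()
        .MinimumLevel.Verbose()    // Verbose also captures Hangfire's trace-level messages
        .WriteTo.Console()         // simplification; we actually ship logs to ELK
        .CreateLogger();

    GlobalConfiguration.Configuration
        .UseSerilogLogProvider()   // route Hangfire.Logging output through Serilog
        .UseRedisStorage("node0:6379,node1:6379,node2:6379");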

These logs are not present at all before 3.2.0.


Thanks for the details. So the problem is that Hangfire.Pro.Redis incorrectly determines the presence of the primary node for background processing. Previously such detection was used only for display purposes, but since 3.2.0 it drives some decisions.

I will roll back any decision-making steps for this feature in the upcoming version to solve the issue.

However, I’m curious why no primary node is detected in your case. Could you tell me more about the configuration you are using:

  1. Standalone nodes (and how many endpoints are configured)
  2. Redis Cluster (and how many primary / replica endpoints you have)
  3. Redis Sentinel (and what’s the configuration)
  4. Azure Cache for Redis (and its tier and essential config)
  5. ElastiCache for Redis (and whether it’s serverless)
  6. Any other scenario

Hey,

So we are just connecting to 3 standalone Redis nodes directly, and the connection string looks something like:

node0:6379,node1:6379,node2:6379
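
For completeness, that string is passed straight into the Hangfire storage configuration, roughly like this (a sketch with all other options omitted):

    using Hangfire;

    // All three standalone nodes are listed directly in the connection string.
    GlobalConfiguration.Configuration
        .UseRedisStorage("node0:6379,node1:6379,node2:6379");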

We’re planning to move this to use Sentinel as part of our migration of Redis from virtual machines to Kubernetes, as we have noticed this method of connecting can cause outages where Hangfire begins throwing UnableToConnect or DNS resolution errors while a config change is rolled out through a rolling restart, due to the k8s pod DNS addresses.
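
The Sentinel setup we want to move to would look something like the sketch below. Note this assumes Hangfire.Pro.Redis accepts the StackExchange.Redis-style serviceName option for Sentinel connections; the host names and the master name are placeholders:

    using Hangfire;

    // Connect via the Sentinel endpoints; serviceName names the monitored
    // master ("mymaster" is a placeholder for the actual master name).
    GlobalConfiguration.Configuration
        .UseRedisStorage("sentinel0:26379,sentinel1:26379,sentinel2:26379,serviceName=mymaster");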

It’s probably worth mentioning that the version of Redis we use on our virtual machines is fairly old, as this is legacy hardware we’re looking to retire in the near future; it’s currently running 4.0.9.

Thanks for the details, and this is the reason – primary node detection doesn’t work in a multi-master setup without cluster or Sentinel support.

You can pass only the single node that’s actually used, because a multi-master setup doesn’t work anyway with transactional workloads such as Hangfire’s (transactions are required for eventual, at-least-once processing). And unfortunately there’s no easy way to spread the workload across multiple nodes without sharding, even in cluster mode.
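
For example, a minimal sketch of that suggestion, assuming node0 is the node that actually serves the Hangfire workload:

    using Hangfire;

    // Point Hangfire at the single node it actually uses, instead of
    // listing all three standalone nodes.
    GlobalConfiguration.Configuration.UseRedisStorage("node0:6379");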

> We’re planning to move this to use Sentinel as part of our migration of Redis from virtual machines to Kubernetes, as we have noticed this method of connecting can cause outages where Hangfire begins throwing UnableToConnect or DNS resolution errors while a config change is rolled out through a rolling restart, due to the k8s pod DNS addresses.

If you have any exception details and/or log messages for this case, you can send them to support [at] hangfire.io. In recent versions of Hangfire.Pro.Redis, logs captured at the debug level and above include messages emitted during the connection-establishment pipeline and should reveal the issue.

> It’s probably worth mentioning that the version of Redis we use on our virtual machines is fairly old, as this is legacy hardware we’re looking to retire in the near future; it’s currently running 4.0.9.

This is not a problem, since the minimal required version is 2.6.12 and I see no reason to change this yet.

Sorry, just to clarify: this isn’t multi-master; it’s 3 standalone Redis nodes with 1 master and 2 slaves.

There is a Sentinel service running in the background, but we do not support connecting to it on these virtual machines (and Hangfire.Pro.Redis below 3.1.0 would not work with it anyway); the nodes have typically been exposed via direct connection and HAProxy. We specifically don’t use the HAProxy side, as the StackExchange.Redis library has issues with this setup (a failover without a dropped connection can end up with a connection to a slave and read-only errors coming back).

Thank you for the additional details; it’s now super-clear what happened, and I have released Hangfire.Pro.Redis 3.3.0 with everything supported now:

  1. Previously, custom primary/replica setups weren’t fully supported. Now I have implemented correct detection of the primary node for this case, and together with the role checks implemented in 3.2.0, such a setup should work even if the connection string is the same.
  2. The role check will no longer drop a multiplexer if no “main” node is detected for Hangfire.Pro.Redis, so connections will not be dropped in this case (the original problem in your case); a warning will be emitted in the logs instead.

So please try upgrading to the newest version when you have time, and let me know if you have any problems!