No master available & Connection to Redis isn't available yet, reconnect is in progress: please try again later

Hi,

I’m using Hangfire.Core 1.7.9, Hangfire.Pro 2.2.1 and Hangfire.Pro.Redis 2.5.5, and the following exception sometimes occurs on Windows Service startup.

2020-11-07 20:06:24,786 [36] INFO  Hangfire.BackgroundJobServer [(null)] - Starting Hangfire Server using job storage: 'redis://no master available/360'

After that message, I see the following exception:

2020-11-07 20:06:39,849 [BackgroundServerProcess #1] DEBUG Hangfire.Processing.BackgroundExecution [(null)] - Execution loop BackgroundServerProcess:9280dbcb caught an exception and will be retried in 00:00:15
Hangfire.Pro.Redis.RedisStorageException: Connection to Redis isn't available yet, reconnect is in progress: please try again later.
at Hangfire.Pro.Redis.RedisStorage.ThrowConnectionUnavailableException()
at Hangfire.Pro.Redis.RedisStorage.GetDatabase()
at Hangfire.Pro.Redis.RedisConnection.TryGetServerTime(DateTime& now, String& reason)
at Hangfire.Pro.Redis.RedisConnection.AnnounceServer(String serverId, ServerContext context)
at Hangfire.Server.BackgroundServerProcess.CreateServer(BackgroundServerContext context)
at Hangfire.Server.BackgroundServerProcess.Execute(Guid executionId, BackgroundExecution execution, CancellationToken stoppingToken, CancellationToken stoppedToken, CancellationToken shutdownToken)
at Hangfire.Server.BackgroundProcessingServer.RunServer(Guid executionId, Object state)
at Hangfire.Processing.BackgroundExecution.Run(Action`2 callback, Object state)

What confuses me is the “no master available” message and the subsequent exception. This is a single-instance Redis 5.0 server (running on CentOS 7) that was available at the time, with no server-side errors or network issues.

While the service was trying to reconnect/recover, I was able to connect to the Redis server (through Redis Desktop Manager) without any issues from the same machine where the service was running.

This usually lasts for about 2-5 minutes, and then Hangfire recovers and things start working properly.
Although it is good that Hangfire manages to recover, I’d really like to figure out why this is happening and resolve it, especially because I currently need to delay execution and processing until the connection is established.

Unfortunately, this keeps popping up intermittently, so it is a bit hard to reproduce.

Can you please assist?

Thanks.

Hi @ikalafat, have you tried updating Hangfire.Pro.Redis to the latest version? There have been a lot of changes to the underlying StackExchange.Redis fork since 2.5.5 was released. Also, there’s the experimental Hangfire.Pro.Redis.SEv2 package based on the latest StackExchange.Redis packages, but I don’t have any evidence that it improves anything.

Hi @odinserj,

Thanks for the tip - I’ll give it a shot and try to update Hangfire.Pro.Redis.

Ivan


Hi @odinserj

I updated the Hangfire.Pro libraries a few days ago and I’m no longer seeing this issue. However, I also made an additional change that affects overall app performance: an unrelated TPL Dataflow pipeline now uses a dedicated thread scheduler (instead of TaskScheduler.Default, which the TPL logic used by default).

I believe that due to a lot of tasks being created by TPL ActionBlocks, I have hit a multithreading issue (thread exhaustion) which caused that connection to Redis was opened after several attempts, when TaskScheduler actually got a chance to run the connection-establishing task.
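For anyone who wants to try the same approach, here is a minimal sketch of a dedicated-thread scheduler for Dataflow blocks. It is not the exact code from my service: the names DedicatedThreadTaskScheduler and DataflowDemo, the thread count and the thread names are made up for illustration. The only library pieces it relies on are the TaskScheduler base class and the TaskScheduler property of ExecutionDataflowBlockOptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical scheduler: runs tasks on its own fixed set of threads,
// so work queued here does not compete for .NET thread pool threads.
public sealed class DedicatedThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly List<Thread> _threads;

    public DedicatedThreadTaskScheduler(int threadCount)
    {
        _threads = Enumerable.Range(0, threadCount)
            .Select(i => new Thread(Work) { IsBackground = true, Name = "dataflow-worker-" + i })
            .ToList();
        _threads.ForEach(t => t.Start());
    }

    public override int MaximumConcurrencyLevel => _threads.Count;

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Inline execution is allowed only on one of this scheduler's own threads.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => _threads.Contains(Thread.CurrentThread) && TryExecuteTask(task);

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    // Stops accepting new tasks; workers exit after draining the queue.
    public void Dispose() => _queue.CompleteAdding();

    private void Work()
    {
        foreach (var task in _queue.GetConsumingEnumerable())
            TryExecuteTask(task);
    }
}

public static class DataflowDemo
{
    // Posts itemCount items to an ActionBlock bound to the dedicated scheduler
    // and returns the name of the thread that processed each item.
    public static string[] Run(int itemCount)
    {
        using (var scheduler = new DedicatedThreadTaskScheduler(threadCount: 2))
        {
            var threadNames = new ConcurrentBag<string>();
            var block = new ActionBlock<int>(
                item => threadNames.Add(Thread.CurrentThread.Name),
                new ExecutionDataflowBlockOptions
                {
                    TaskScheduler = scheduler,
                    MaxDegreeOfParallelism = 2
                });

            for (var i = 0; i < itemCount; i++) block.Post(i);
            block.Complete();
            block.Completion.Wait();
            return threadNames.ToArray();
        }
    }
}
```

With this in place, the ActionBlock delegates run on the "dataflow-worker-N" threads, which should leave the thread pool free for other components such as the Redis connection logic.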

Hope this will help someone.

Great that the issue is gone now. Actually, there were a lot of changes in Hangfire.Pro.Redis to avoid using the thread pool even more (continuing the work done in 2.5.X). But as far as I remember, the connection reconfiguration logic can indeed still use the thread pool. I tried to avoid it, but it required too many changes, and the implementation relies on async logic, so I gave up.

You wrote “caused that connection to Redis was opened after several attempts, when TaskScheduler actually got a chance to run the connection-establishing task”; if you are confident in that statement, I can try once again.

Hi @odinserj,

I apologize for the very late reply; I have no idea how I missed your message. I came back to the forum after googling something and noticed the missed notification :)

You wrote “caused that connection to Redis was opened after several attempts, when TaskScheduler actually got a chance to run the connection-establishing task”; if you are confident in that statement, I can try once again.

Well, judging by the log entries (several unsuccessful attempts, then a successful one), after some time Hangfire did connect and started processing tasks.

So yes, I’d say I’m sure that the default task scheduler was getting overloaded and couldn’t keep up with the amount of work I gave it, which caused issues for Hangfire as well.
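If someone wants to check for the same condition in their own service, one rough way is to measure how long a trivial work item waits in the thread pool queue before it runs. The sketch below is only a diagnostic idea, not something taken from Hangfire; ThreadPoolProbe and its method names are hypothetical, and it uses only the standard ThreadPool and Stopwatch APIs.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical diagnostic helper for spotting thread pool starvation.
public static class ThreadPoolProbe
{
    // Queues an empty work item and returns how long it waited before a
    // thread pool thread picked it up. Delays of seconds rather than
    // milliseconds suggest the pool is saturated.
    public static TimeSpan MeasureQueueDelay()
    {
        var stopwatch = Stopwatch.StartNew();
        var delay = TimeSpan.Zero;
        using (var started = new ManualResetEventSlim())
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                delay = stopwatch.Elapsed;
                started.Set();
            });
            started.Wait();
        }
        return delay;
    }

    // Formats the current number of free worker and I/O threads.
    public static string DescribeAvailability()
    {
        ThreadPool.GetAvailableThreads(out var worker, out var io);
        ThreadPool.GetMaxThreads(out var maxWorker, out var maxIo);
        return $"worker {worker}/{maxWorker}, io {io}/{maxIo}";
    }
}
```

Logging both values periodically during startup, from a dedicated thread rather than from a pool thread, would show whether the failed connection attempts coincide with long queue delays.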

Have a good one! Cheers!

Thanks for the reply. I’ll try again to avoid thread pool usage while the connection is being established, to prevent such behavior.

Finally, I was able to eliminate thread pool usage completely; please see the Hangfire.Pro.Redis 2.8.6 release.

Hi Sergej,

Thank you for your feedback. I’ll update the NuGet packages as soon as I can.

Cheers!