Distributed Lock Timeout Exception - Timeout expired


Hello Everyone, I have a DistributedLockTimeoutException. Below are the details.

Issue: A Hangfire job is running when the server hosting it goes down (for example, during a deployment) before the job completes. After the server restarts, any new jobs that start cannot acquire the lock needed to execute, because the lock is still held by the old job that was running when the server went down. These are recurring jobs that run every minute. Below is the Hangfire setup:

services.AddHangfire(config => config
    .UseSerilogLogProvider()
    .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
    .UseSimpleAssemblyNameTypeSerializer()
    .UseRecommendedSerializerSettings()
    .UsePostgreSqlStorage(hangfireConnectionString, new PostgreSqlStorageOptions
    {
        DistributedLockTimeout = TimeSpan.FromMinutes(2),
        PrepareSchemaIfNecessary = true
    }));

Here is the error I got during a deployment:

Hangfire.PostgreSql.PostgreSqlDistributedLockException: Could not place a lock on the resource 'HangFire:Processor.ExecuteAsync-1/1': Lock timeout.
   at Hangfire.PostgreSql.PostgreSqlDistributedLock.PostgreSqlDistributedLock_Init_Transaction(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options)
   at Hangfire.PostgreSql.PostgreSqlDistributedLock..ctor(String resource, TimeSpan timeout, IDbConnection connection, PostgreSqlStorageOptions options)
   at Hangfire.PostgreSql.PostgreSqlConnection.AcquireDistributedLock(String resource, TimeSpan timeout)
   at Hangfire.MaximumConcurrentExecutionsAttribute.OnPerforming(PerformingContext filterContext)
   at Hangfire.Server.BackgroundJobPerformer.InvokeOnPerforming(Tuple2 x)
   at Hangfire.Profiling.ProfilerExtensions.InvokeAction[TInstance](InstanceAction1 tuple)
   at Hangfire.Profiling.SlowLogProfiler.InvokeMeasured[TInstance,TResult](TInstance instance, Func2 action, String message)
   at Hangfire.Profiling.ProfilerExtensions.InvokeMeasured[TInstance](IProfiler profiler, TInstance instance, Action1 action, String message)
   at Hangfire.Server.BackgroundJobPerformer.InvokePerformFilter(IServerFilter filter, PerformingContext preContext, Func1 continuation)

After the deployment, the new jobs fail with the following error:

Hangfire.Storage.DistributedLockTimeoutException: Timeout expired. The timeout elapsed prior to obtaining a distributed lock on the 'Processor.ExecuteAsync' resource.
   at Hangfire.MaximumConcurrentExecutionsAttribute.OnPerforming(PerformingContext filterContext)
   at Hangfire.Server.BackgroundJobPerformer.InvokeOnPerforming(Tuple2 x)
   at Hangfire.Profiling.ProfilerExtensions.InvokeAction[TInstance](InstanceAction1 tuple)
   at Hangfire.Profiling.SlowLogProfiler.InvokeMeasured[TInstance,TResult](TInstance instance, Func2 action, String message)
   at Hangfire.Profiling.ProfilerExtensions.InvokeMeasured[TInstance](IProfiler profiler, TInstance instance, Action1 action, String message)
   at Hangfire.Server.BackgroundJobPerformer.InvokePerformFilter(IServerFilter filter, PerformingContext preContext, Func1 continuation)

Additional info: I am using the Hangfire.MaximumConcurrentExecutions extension. Below are the packages I am using:
Hangfire.AspNetCore - Version="1.7.18"
Hangfire.MaximumConcurrentExecutions - Version="1.1.0"
Hangfire.PostgreSql - Version="1.8.1"

Any help is really appreciated. Thank you in advance.

Were you able to find a solution to this PostgreSqlDistributedLockException?

To get rid of the locks, I am deleting them from the lock table as a temporary solution. As a permanent solution in .NET Core, I am using IHostApplicationLifetime to gracefully stop all the Hangfire jobs, and the Hangfire server if needed, before the host is terminated, so that the stale locks don't occur in the first place. You can also check the other options suggested in the Hangfire GitHub discussion (https://github.com/HangfireIO/Hangfire/issues/1799).
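
For anyone looking for a starting point, here is a minimal sketch of the graceful-stop idea. It assumes you create and hold the BackgroundJobServer instance yourself; the HangfireGracefulShutdown class, the StopOnHostShutdown method, and the waitFor parameter are hypothetical names for illustration, not part of Hangfire. It is not the exact code from my project, so adapt it to your setup.

```csharp
using System;
using Hangfire;
using Microsoft.Extensions.Hosting;

// Hypothetical helper (not part of Hangfire): stops a manually created
// BackgroundJobServer when the host begins shutting down.
public static class HangfireGracefulShutdown
{
    public static void StopOnHostShutdown(
        IHostApplicationLifetime lifetime,
        BackgroundJobServer server,
        TimeSpan waitFor)
    {
        // ApplicationStopping fires when the host starts a graceful shutdown,
        // for example on SIGTERM during a deployment.
        lifetime.ApplicationStopping.Register(() =>
        {
            // Signal Hangfire's background processes to begin stopping.
            server.SendStop();

            // Block shutdown for a bounded time so running jobs get a chance
            // to finish and release their distributed locks.
            server.WaitForShutdown(waitFor);
        });
    }
}
```

You would call StopOnHostShutdown once at startup, after resolving IHostApplicationLifetime from the service provider. This only helps if the job method exits promptly when asked to stop (for instance by accepting a CancellationToken parameter and honoring it), and if the deployment gives the process enough time to shut down before killing it. If the server is registered through AddHangfireServer(), the host should already stop it on shutdown, so the extra hook may be unnecessary in that case.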
