Long-running jobs cause "Distributed Lock Timeout Exception" and multiple instances of the same job

sa4720k · August 6, 2021, 12:51pm

Hello,
I’ve been struggling with this issue for a while now. I have a Hangfire background job that can run for a very long time (18+ hours) before it completes. However, at some point while the job is running I get a DistributedLockTimeoutException, and then the job starts again as a duplicate, which creates a duplicate record in the database. The number of duplicate jobs seems to correspond to the number of workers on the server the job is running on.

I’ve tried several of the suggestions posted here and elsewhere that I found through Google, such as setting DisableConcurrentExecution and adjusting the timeouts in SqlServerStorageOptions, to no avail. I’ve also tried other users’ code, such as SkipConcurrentExecution and MaximumConcurrentExecutions.

SkipConcurrentExecution is the closest I’ve gotten to a solution. However, when it encounters the distributed lock timeout exception it just stops the job in the middle, whereas I want it to ignore the exception and continue to the end of the job.

I’m using the latest version of Hangfire (1.7.24) for ASP.NET Core 2.2. I’ve noticed that a lot of people have had this issue, but I haven’t found a solution. I’d appreciate some suggestions on how to fix this issue.

This is what I have in my ConfigureServices section:
GlobalConfiguration.Configuration
    .UseSqlServerStorage(Configuration.GetConnectionString("DB"), new SqlServerStorageOptions
    {
        CommandBatchMaxTimeout = TimeSpan.FromMinutes(30),
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero,
        UseRecommendedIsolationLevel = true,
        DisableGlobalLocks = true,
        CommandTimeout = TimeSpan.FromMinutes(30),
    });
