Long-running jobs cause DistributedLockTimeoutException and multiple instances of the same job

Hello,
I’ve been struggling with this issue for a while now. I have a Hangfire background job that can run for a very long time (18+ hours) before it completes. However, at some point while the job is running, a DistributedLockTimeoutException is thrown and the job starts again as a duplicate, which creates a duplicate record in the database. The number of duplicate jobs seems to match the number of workers on the server that the job is running on.

I’ve tried several of the suggestions posted here and elsewhere in my Google searches, such as applying the DisableConcurrentExecution attribute and adjusting the timeouts in SqlServerStorageOptions, to no avail. I’ve also tried other users’ code, such as the SkipConcurrentExecution and MaximumConcurrentExecutions attributes.

SkipConcurrentExecution is the closest I’ve gotten to a solution. However, when it encounters the distributed lock timeout exception, it stops the job midway, whereas I want the job to somehow ignore the exception and run to completion.

I’m using the latest version of Hangfire (1.7.24) with ASP.NET Core 2.2. I’ve noticed that a lot of people have had this issue, but I haven’t found a solution. I’d appreciate any suggestions on how to fix it.

This is what I have in my ConfigureServices method:

    GlobalConfiguration.Configuration
        .UseSqlServerStorage(Configuration.GetConnectionString("DB"), new SqlServerStorageOptions
        {
            CommandBatchMaxTimeout = TimeSpan.FromMinutes(30),
            SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
            QueuePollInterval = TimeSpan.Zero,
            UseRecommendedIsolationLevel = true,
            DisableGlobalLocks = true,
            CommandTimeout = TimeSpan.FromMinutes(30),
        });