Worker changed during job execution

Vladimirs_Surajevs · May 17, 2017, 11:43am

Hi,

I have a job that is meant to do some heavy work for about 1.5 hours, but it fails halfway through: it looks like the worker stopped, and my job is not re-entrant. I understand that I should design my job better, but I’d like to understand why the worker stopped or changed. In the State table I see the following:

Id	JobId	Name	Reason	CreatedAt	Data
27	8	Processing	NULL	2017-05-17 10:34:51.640	{"StartedAt":"2017-05-17T10:34:51.6389278Z","ServerId":"mypc:3332","WorkerNumber":"22"}
28	8	Processing	NULL	2017-05-17 11:04:51.683	{"StartedAt":"2017-05-17T11:04:51.6819303Z","ServerId":"mypc:3332","WorkerNumber":"14"}
29	8	Failed	An exception occurred during performance of the job.	2017-05-17 11:04:51.727	{"FailedAt":"2017-05-17T11:04:51.7209303Z","ExceptionType":"System.IO.FileNotFoundException","ExceptionMessage":"blahblah you job is not reentrant.."}
30	8	Deleted	Automatic deletion after retry count exceeded 0	2017-05-17 11:04:51.727	{"DeletedAt":"2017-05-17T11:04:51.7209303Z"}

How can I find out what happened to worker 22? In the logs I can only see that the job started again, then failed and was deleted (Failed to process the job ‘8’: an exception occured. Job was automatically deleted because the retry attempt count exceeded 0.)

I am using Hangfire 1.4.3.0 with Web API, configured to use SQL Server storage:

SqlServerStorage storage = new SqlServerStorage(HANGFIRE_CONNECTION);
BackgroundJobServerOptions options = new BackgroundJobServerOptions();
_app.UseHangfireServer(options, storage);

GlobalConfiguration.Configuration
	.UseSqlServerStorage(HANGFIRE_CONNECTION)
	.UseActivator(new ContainerJobActivator());

_app.UseHangfireDashboard();

GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute
{
	Attempts = 0,
	OnAttemptsExceeded = AttemptsExceededAction.Delete
});

Update: I also noticed that this happens after exactly 30 minutes. Is there a timeout that recycles workers, or something similar?

aidmsu · May 24, 2017, 12:58pm

Hi @Vladimirs_Surajevs!

This behavior is caused by the SqlServerStorageOptions.InvisibilityTimeout setting (30 minutes by default). If a job stays in the Processing state longer than InvisibilityTimeout, another worker is allowed to fetch it.

Since Hangfire 1.5.0, SqlServerStorageOptions.InvisibilityTimeout no longer makes sense. The new Hangfire.SqlServer implementation uses transactions to fetch background jobs and hide them from other workers.

So there are at least two ways to solve your problem:

1. Increase InvisibilityTimeout, for example to 2 hours. I don’t recommend this, though: if the Hangfire server goes down, jobs left in the Processing state will not be picked up again until the 2-hour timeout has elapsed. Also, if a job runs for more than 2 hours, the problem will recur.
2. Update Hangfire to version 1.5.0 or higher. You don’t need to configure anything else. Long-running tasks work out of the box.
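
For reference, the first option could be wired into the configuration from the original post roughly as follows. This is a fragment, not a complete program: `_app` and `HANGFIRE_CONNECTION` come from that post, and the 2-hour value is only the example figure mentioned above, so pick something longer than your slowest job.

```csharp
using System;
using Hangfire;
using Hangfire.SqlServer;

// Assumption: 2 hours is just the example value from this thread.
var storageOptions = new SqlServerStorageOptions
{
    // A fetched job stays hidden from other workers for this long.
    InvisibilityTimeout = TimeSpan.FromHours(2)
};

// Pass the same options wherever the storage is created.
var storage = new SqlServerStorage(HANGFIRE_CONNECTION, storageOptions);
_app.UseHangfireServer(new BackgroundJobServerOptions(), storage);

GlobalConfiguration.Configuration
    .UseSqlServerStorage(HANGFIRE_CONNECTION, storageOptions)
    .UseActivator(new ContainerJobActivator());
```

On Hangfire 1.5.0 and later the compiler reports this property as obsolete, which matches the advice to upgrade instead.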

apilavakis · June 2, 2017, 3:09pm

I am running on version 1.6.5 and got the same issue.

Could somebody also tell me what the expected effect is in such a case? In mine, it looks like things got processed multiple times. My job is a very long one, and the workers kept changing, usually every 30 minutes and sometimes after just 1-2 minutes.

Server: ??? Worker: 36618529 +30m 1.175s Processing
Server: ??? Worker: 744a14ce +30m 37.460s Processing
Server: ??? Worker: 82fcf18c +30m 27.906s Processing
Server: ??? Worker: ec3999d1 +30m 11.517s Processing
Server: ??? Worker: da83b3da +1m 22.105s Processing
Server: ??? Worker: 5cfa73ab +2m 14.910s Processing
Server: ??? Worker: a1313895 +7m 32.283s Processing
Server: ??? Worker: 74003647 +30m 6.293s Processing
Server: ??? Worker: 4958f19f +30m 2.724s Processing
Server: ??? Worker: 224894b0 +30m 18.407s Processing
Server: ??? Worker: e26c2657 +30m 1.462s Processing
Server: ??? Worker: fc03c109 +30m 36.489s Processing
Server: ??? Worker: 0267941e +31m 10.761s Processing
Server: ??? Worker: 6249fa69 +30m 51.836s Processing
Server: ??? Worker: e5fb2a0c +30m 6.746s Processing
Server: ??? Worker: 44f3ae92 +30m 14.208s Processing
Server: ??? Worker: e38f26f9 +30m 54.943s Processing
Server: ??? Worker: f6fe79dd +32m 10.359s Processing
Server: ??? Worker: b6599844 +65ms Processing

Any ideas how to sort this out? Am I correct to think that this is the reason things got processed multiple times?

Oleg_Pak · April 22, 2025, 5:01pm

In my project, long-running jobs were re-enqueued and immediately picked up by another worker after running for 30 minutes. The first re-queue happens after 30 minutes, but later re-queues may occur at random intervals. After a re-queue, the previous session continues to run but doesn’t report logs to the dashboard (the logs appear later, when the job finishes).

The easiest fix is to set a large InvisibilityTimeout (ignore the deprecation warning; it still helps).

Another workaround is to update the Hangfire.JobQueue table: during job execution, refresh FetchedAt from time to time so that the default 30-minute invisibility timeout never expires.
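
A minimal sketch of that FetchedAt refresh, under several assumptions: `JobQueueHeartbeat` is a hypothetical helper (not part of Hangfire or Hangfire.MySqlStorage), the caller supplies an already opened ADO.NET connection, the queue table has `JobId` and `FetchedAt` columns, and the exact table name is passed in because it depends on your storage configuration. Check all of these against your own schema before relying on it.

```csharp
using System;
using System.Data.Common;
using System.Linq;

// Hypothetical helper, not part of Hangfire or any storage provider.
public static class JobQueueHeartbeat
{
    // Sets FetchedAt to the current UTC time for one fetched job.
    // Returns the number of rows updated (0 if the job is not in the queue table).
    public static int Touch(DbConnection openConnection, string queueTable, long jobId)
    {
        // The table name cannot be a SQL parameter, so restrict it to safe characters.
        if (string.IsNullOrEmpty(queueTable) ||
            !queueTable.All(c => char.IsLetterOrDigit(c) || c == '_'))
        {
            throw new ArgumentException("Unexpected table name.", nameof(queueTable));
        }

        using (var command = openConnection.CreateCommand())
        {
            // Assumption: MySQL syntax, with JobId and FetchedAt columns.
            command.CommandText =
                "UPDATE `" + queueTable + "` SET FetchedAt = UTC_TIMESTAMP() " +
                "WHERE JobId = @jobId AND FetchedAt IS NOT NULL";

            var parameter = command.CreateParameter();
            parameter.ParameterName = "@jobId";
            parameter.Value = jobId;
            command.Parameters.Add(parameter);

            return command.ExecuteNonQuery();
        }
    }
}
```

The job would call `Touch` every few minutes, well inside the 30-minute window, passing its own job id (for example, obtained from a `PerformContext` parameter on the job method). This writes directly to a table the storage provider owns, so treat it as a stopgap rather than a supported API.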

I also tried to use different MySQL storage providers, but they seem outdated.

Hangfire.AspNetCore 1.8.18
Hangfire.Console 1.4.3
Hangfire.MySqlStorage 2.0.3 (GitHub - arnoldasgudas/Hangfire.MySqlStorage: MySql storage for Hangfire - fire-and-forget, delayed and recurring tasks runner)
