Worker changed during job execution

Hi,

I have a job that is meant to do some heavy work for about 1.5 hours, but it fails about halfway through because it looks like the worker stopped and my job is not re-entrant. I understand that I should design my job better, but I'd like to understand why the worker stopped/changed. In the State table I see the following:

Id	JobId	Name	Reason	CreatedAt	Data
27	8	Processing	NULL	2017-05-17 10:34:51.640	{"StartedAt":"2017-05-17T10:34:51.6389278Z","ServerId":"mypc:3332","WorkerNumber":"22"}
28	8	Processing	NULL	2017-05-17 11:04:51.683	{"StartedAt":"2017-05-17T11:04:51.6819303Z","ServerId":"mypc:3332","WorkerNumber":"14"}
29	8	Failed	An exception occurred during performance of the job.	2017-05-17 11:04:51.727	{"FailedAt":"2017-05-17T11:04:51.7209303Z","ExceptionType":"System.IO.FileNotFoundException","ExceptionMessage":"blahblah you job is not reentrant.."}
30	8	Deleted	Automatic deletion after retry count exceeded 0	2017-05-17 11:04:51.727	{"DeletedAt":"2017-05-17T11:04:51.7209303Z"}

How can I find out what happened to worker 22? In the logs I can only see that the job started again and then failed and was deleted (Failed to process the job '8': an exception occurred. Job was automatically deleted because the retry attempt count exceeded 0.)

I am using Hangfire version 1.4.3.0 with WebAPI, and it is configured to use SQL Server storage:

// SQL Server storage with default options, shared by the server and the dashboard.
SqlServerStorage storage = new SqlServerStorage(HANGFIRE_CONNECTION);
BackgroundJobServerOptions options = new BackgroundJobServerOptions();
_app.UseHangfireServer(options, storage);

GlobalConfiguration.Configuration
	.UseSqlServerStorage(HANGFIRE_CONNECTION)
	.UseActivator(new ContainerJobActivator());

_app.UseHangfireDashboard();

// Retries are disabled: failed jobs are deleted instead of being retried.
GlobalJobFilters.Filters.Add(new AutomaticRetryAttribute
{
	Attempts = 0,
	OnAttemptsExceeded = AttemptsExceededAction.Delete
});

Update: I also noticed that this happens exactly after 30 minutes. Is there some timeout that recycles workers or something similar?

Hi @Vladimirs_Surajevs!

This behavior is caused by the SqlServerStorageOptions.InvisibilityTimeout setting. If a job stays in the Processing state longer than the InvisibilityTimeout, another worker is allowed to fetch the same job again.

Since Hangfire 1.5.0, SqlServerStorageOptions.InvisibilityTimeout no longer applies. The new Hangfire.SqlServer implementation uses transactions to fetch background jobs and keep them hidden from other workers.

So there are at least two ways to solve your problem:

  1. Increase InvisibilityTimeout, for example to 2 hours (see the sketch after this list). I don't recommend this, because if the Hangfire server goes down, jobs that were in the Processing state will only be picked up again after the 2-hour timeout has elapsed, and if any job runs for longer than 2 hours the problem will repeat.
  2. Update Hangfire to version 1.5.0 or higher. You don't need to configure anything additionally; long-running jobs work out of the box.
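
For option 1, here is a minimal sketch of how the timeout could be raised, assuming a Hangfire 1.4.x setup where SqlServerStorage accepts a SqlServerStorageOptions instance; the 2-hour value is only an example and should exceed your longest job:

// Assumption: Hangfire 1.4.x with the Hangfire.SqlServer package.
// InvisibilityTimeout controls how long a fetched job stays hidden
// from other workers before it can be fetched again.
var storageOptions = new SqlServerStorageOptions
{
	InvisibilityTimeout = TimeSpan.FromHours(2) // example value, longer than the job's expected runtime
};

SqlServerStorage storage = new SqlServerStorage(HANGFIRE_CONNECTION, storageOptions);
_app.UseHangfireServer(new BackgroundJobServerOptions(), storage);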

I am running on version 1.6.5 and got the same issue.

Can somebody also let me know what the anticipated effect would be in such a case? In my case it looks like things got processed multiple times. My job is a very long one, and the workers kept changing, usually every 30 minutes and sometimes even every 1-2 minutes.

Server: ??? Worker: 36618529 +30m 1.175s Processing
Server: ??? Worker: 744a14ce +30m 37.460s Processing
Server: ??? Worker: 82fcf18c +30m 27.906s Processing
Server: ??? Worker: ec3999d1 +30m 11.517s Processing
Server: ??? Worker: da83b3da +1m 22.105s Processing
Server: ??? Worker: 5cfa73ab +2m 14.910s Processing
Server: ??? Worker: a1313895 +7m 32.283s Processing
Server: ??? Worker: 74003647 +30m 6.293s Processing
Server: ??? Worker: 4958f19f +30m 2.724s Processing
Server: ??? Worker: 224894b0 +30m 18.407s Processing
Server: ??? Worker: e26c2657 +30m 1.462s Processing
Server: ??? Worker: fc03c109 +30m 36.489s Processing
Server: ??? Worker: 0267941e +31m 10.761s Processing
Server: ??? Worker: 6249fa69 +30m 51.836s Processing
Server: ??? Worker: e5fb2a0c +30m 6.746s Processing
Server: ??? Worker: 44f3ae92 +30m 14.208s Processing
Server: ??? Worker: e38f26f9 +30m 54.943s Processing
Server: ??? Worker: f6fe79dd +32m 10.359s Processing
Server: ??? Worker: b6599844 +65ms Processing

Any ideas how to sort this out? Am I correct to think that this is the reason things got processed multiple times?