I have a job that meant to do some hard work for about 1.5h, however it fails on a halfway through because it looks like worker stopped and my job is not re-entrant. I understand that I should design my job better, however I’d like to understand why worker stopped/changed. In State table I see following:
Id JobId Name Reason CreatedAt Data
27 8 Processing NULL 2017-05-17 10:34:51.640 {"StartedAt":"2017-05-17T10:34:51.6389278Z","ServerId":"mypc:3332","WorkerNumber":"22"}
28 8 Processing NULL 2017-05-17 11:04:51.683 {"StartedAt":"2017-05-17T11:04:51.6819303Z","ServerId":"mypc:3332","WorkerNumber":"14"}
29 8 Failed An exception occurred during performance of the job. 2017-05-17 11:04:51.727 {"FailedAt":"2017-05-17T11:04:51.7209303Z","ExceptionType":"System.IO.FileNotFoundException","ExceptionMessage":"blahblah you job is not reentrant.."}
30 8 Deleted Automatic deletion after retry count exceeded 0 2017-05-17 11:04:51.727 {"DeletedAt":"2017-05-17T11:04:51.7209303Z"}
How I can find out what happened to worker 22? In logs I can see only that job started again and then failed and deleted (Failed to process the job ‘8’: an exception occured. Job was automatically deleted because the retry attempt count exceeded 0.)
I am using 1.4.3.0 hangfire version with WebAPI and it is configured to use sql server storage:
This behavior is caused by SqlServerStorageOptions.InvisibilityTimeout settings. It means that if the job is staying in processing state too long (InvisibilityTimeout), another worker becomes able to fetch this job.
Since Hangfire 1.5.0 SqlServerStorageOptions.InvisibilityTimeout does not make sense.New Hangfire.SqlServer implementation uses transactions to fetch background jobs and hide them from other workers.
So there are at least two ways to solve your problem:
Increase InvisibilityTimeout to 2h for example. But I don’t recommend this way because if Hangfire server is down then jobs, which is in Processing state, will be performed in 2 hours after created. Other reason if some job is performed more than 2 hour the problem will be repeated.
Update Hangfire to version 1.5.0 or higher. You shouldn’t configure something additionally. Long-running tasks work out of the box.
I am running on version 1.6.5 and got the same issue.
Can also somebody let me know what would be the anticipated effect in such case? Mine looks like things got processed multiple times. My job is a veeery long one and the workers kept changing every 30 mins usually and sometimes even 1-2 minutes.