I got the error message below while processing a job:
The job was aborted - it is processed by server XXXXX which is not in the active servers list for now. It will be retried automatically after invisibility timeout, but you can also re-queue or delete it manually.
How should I handle this? Any help is appreciated!
Thanks
I have the same problem. Did you solve it?
It says the Hangfire server that was handling the job is no longer active, so once another Hangfire server is started to take its place, the job will be retried after a timeout. Pretty self-explanatory.
Ensure your Hangfire server is still running, and that it is kept running correctly.
I personally would advise against running the Hangfire server in IIS, since it seems difficult to configure IIS to keep a program running without stopping it every few minutes because of site inactivity.
It is much simpler to run the Hangfire server as a Windows Service using Topshelf or something similar. Even running it as a console app seems more reliable than IIS.
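For illustration, here is a minimal sketch of hosting the Hangfire server in a console app. It assumes SQL Server storage (the Hangfire.SqlServer package), and the connection string is a placeholder you would replace with your own. The same body could be wrapped in a Windows Service.

```csharp
using System;
using Hangfire;

public static class Program
{
    public static void Main()
    {
        // Assumption: SQL Server job storage; the connection string is a placeholder.
        GlobalConfiguration.Configuration.UseSqlServerStorage(
            "Server=.;Database=Hangfire;Integrated Security=true;");

        // The server processes jobs until it is disposed.
        using (var server = new BackgroundJobServer())
        {
            Console.WriteLine("Hangfire server started. Press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```

Disposing the server at the end of the using block gives it a chance to shut down gracefully, which is not the case when the hosting process is killed.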
I would like to understand more about how aborted jobs are retried.
In my experience, starting a new Hangfire instance sometimes causes outstanding aborted jobs left over from the previous launch to be retried and sometimes does not, and not all of the jobs are restarted. It looks like the older a job is, the less chance it has of being resurrected.
What if I have multiple Hangfire instances in a web garden? Will other instances pick up aborted jobs?
Does it have something to do with InvisibilityTimeout?
OK, I ran some experiments and came to the conclusion that Hangfire probably does not support automatic retry of jobs from a server that went down suddenly.
If a server managed to place its running jobs into the retry queue, then they will be retried.
If a server just quits (OOM, power loss, or container shutdown), then they will remain in an orphaned state, with only a manual resume possible.
To mitigate this problem, I added the following code to my service's startup. It detects orphaned jobs (that is, jobs without an active server) and requeues them. So if one instance of a service fails and another one starts, the new one takes care of these jobs.
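using System.Linq;
using Hangfire;

// Requeues jobs in the Processing state whose server is no longer registered.
private static void RequeueOrphanedJobs()
{
    var api = JobStorage.Current.GetMonitoringApi();
    // Note: only the first 100 processing jobs are inspected.
    var processingJobs = api.ProcessingJobs(0, 100);
    var servers = api.Servers();
    // A job is orphaned when no active server matches its ServerId.
    var orphanJobs = processingJobs.Where(j => !servers.Any(s => s.Name == j.Value.ServerId));
    foreach (var orphanJob in orphanJobs)
    {
        BackgroundJob.Requeue(orphanJob.Key);
    }
}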
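Since the snippet above only looks at the first 100 processing jobs, here is a rough sketch of a paged variant, in case more jobs than that can be in flight. The class name, method name and pageSize parameter are my own; it is untested against a real storage, so treat it as a starting point rather than a definitive implementation.

```csharp
using System.Collections.Generic;
using System.Linq;
using Hangfire;

public static class OrphanedJobRecovery
{
    // Hypothetical helper: returns the number of jobs that were requeued.
    public static int RequeueAllOrphanedJobs(int pageSize = 100)
    {
        var api = JobStorage.Current.GetMonitoringApi();
        var activeServers = new HashSet<string>(api.Servers().Select(s => s.Name));

        // Collect the ids first, so that requeueing does not shift the pages
        // while they are still being read.
        var orphanIds = new List<string>();
        var total = (int)api.ProcessingCount();
        for (var from = 0; from < total; from += pageSize)
        {
            foreach (var job in api.ProcessingJobs(from, pageSize))
            {
                // Skip entries without state data; orphaned means no active server matches.
                if (job.Value != null && !activeServers.Contains(job.Value.ServerId))
                {
                    orphanIds.Add(job.Key);
                }
            }
        }

        var requeued = 0;
        foreach (var id in orphanIds)
        {
            if (BackgroundJob.Requeue(id))
            {
                requeued++;
            }
        }
        return requeued;
    }
}
```

Calling OrphanedJobRecovery.RequeueAllOrphanedJobs() once at service startup, after the storage has been configured, would mirror how the original method is used. Note that a server that crashed very recently may still appear in the servers list, so its jobs would only be picked up on a later run.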
I had the same problem. SQL Server was running on one machine, and the application with the dashboard UI was running on another. The problem was that the system clocks on the two machines differed by 1 minute. After syncing the time on both servers, the issue was resolved.