Recurring jobs stop working after a while

Diego_Germano · June 12, 2019, 10:43am

Hi guys,

I know this is a known issue, but I tried a lot of things and nothing worked for me.

I am using ASP.NET Core 2.2 on Azure Kubernetes (Linux) behind nginx. The problem is that after a while the recurring jobs stop working.

Do you guys have any idea how to prevent recurring jobs from stopping in this scenario?

Thanks a lot

Steven_Desrochers · June 12, 2019, 11:41am

What is your current setup?

Is the dashboard separated from the workers?
Do you have different workers that run on different queues?
What storage provider are you using?
Are all your workers on the same version of Hangfire?

Have you tried adding logging to your Hangfire instance (both dashboard and workers; see the Hangfire logging documentation) to check whether any errors occur while your workers run?

I had this problem recently too. In my case I had two workers, each with different jobs, and neither had the other's interfaces. The catch is that when a recurring job is due to be put in a queue, any worker can pick it up. When the wrong worker picked it up, an exception was thrown (because the interfaces didn’t exist in that worker) and the job failed silently: if a job fails while being picked up, Hangfire marks it as an invalid job in the database and stops running it at the configured interval. After a while the dashboard shows that the “Next execution” should be “a day ago”, which clearly doesn’t make sense.

The solution to my problem was to create an interface project containing ALL the interfaces of ALL the jobs. That way any worker can pick up jobs from the database and put them in the right queue.

Diego_Germano · June 12, 2019, 1:09pm

Is the dashboard separated from the workers?
R: I am using Docker with several nodes, and different applications share the same SQL Server database.
Do you have different workers that run on different queues?
R: Yes, each application (microservice) uses one queue (all sharing the same database)
What storage provider are you using?
R: Azure Sql server
Are all your workers on the same version of Hangfire?
R: Yes

Have you tried adding logging to your Hangfire instance (both dashboard and workers; see the Hangfire logging documentation) to check whether any errors occur while your workers run?
R: No, but thanks, I will check and post here later

I had this problem recently too. In my case I had two workers, each with different jobs, and neither had the other's interfaces. The catch is that when a recurring job is due to be put in a queue, any worker can pick it up. When the wrong worker picked it up, an exception was thrown (because the interfaces didn’t exist in that worker) and the job failed silently: if a job fails while being picked up, Hangfire marks it as an invalid job in the database and stops running it at the configured interval. After a while the dashboard shows that the “Next execution” should be “a day ago”, which clearly doesn’t make sense.
R: Hmm, I think that is my issue. I am using Docker, and for each queue (app) I have 2 nodes :\

The solution to my problem was to create an interface project containing ALL the interfaces of ALL the jobs. That way any worker can pick up jobs from the database and put them in the right queue.
R: Sorry, I didn't get that. But do you think it would work even in my case, using Docker?

Thanks a lot for your help

Steven_Desrochers · June 13, 2019, 11:56am

I am also using docker and the issue isn’t coming from there.

If you go into your Hangfire database and check the “Set” table, you should see your recurring job. If its “Score” is -1, it is not going to get picked up by your workers, and that’s your problem. The score is set to -1 when there’s an error while queuing the job (I think).
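
For anyone who would rather check this from code than from the table, here is a minimal sketch, assuming Hangfire 1.7 with the storage already configured at startup. `GetRecurringJobs()` is an extension method in `Hangfire.Storage`. The `RecurringJobInspector` class name is made up for illustration, and treating a missing `NextExecution` as the counterpart of a score of -1 is an assumption, not something confirmed in this thread.

```csharp
using System;
using Hangfire;
using Hangfire.Storage;

public static class RecurringJobInspector // hypothetical helper name
{
    public static void PrintSuspectJobs()
    {
        // Assumes JobStorage.Current was initialised (e.g. by UseSqlServerStorage) at startup.
        using (var connection = JobStorage.Current.GetConnection())
        {
            foreach (var job in connection.GetRecurringJobs())
            {
                // LoadException is populated when THIS process cannot resolve
                // the job's type or method, e.g. because an assembly is missing.
                if (job.LoadException != null || job.NextExecution == null)
                {
                    Console.WriteLine(
                        $"{job.Id}: next={job.NextExecution}, load error={job.LoadException?.Message}");
                }
            }
        }
    }
}
```

Running it inside each worker is the interesting part: a job that loads fine in one worker can still report a load error in another that lacks the assembly.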

You’ll have to create a csproj containing all the interfaces of your jobs. Then reference that csproj from all your worker projects and use the interfaces to enqueue the jobs. That way, every worker “knows” the jobs of the other workers and will be able to put them in a queue properly.

Example of a recurring job flow: the job starts in the database. It can then be picked up by ANY worker and put into a queue. After that, only the workers of that queue process the job.

The problem arises because your worker doesn’t have the assembly required to enqueue the other worker’s job (Hangfire finds the job by reflection). That’s also why the job stops working, but seemingly at random. If the job is picked up by the right worker, that worker has the assembly and can queue it properly; if the job ends up in the wrong worker, it silently fails and is never requeued.
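
The shared-interface setup described above could look roughly like this. It is only a sketch, assuming Hangfire 1.7 (where `RecurringJob.AddOrUpdate` accepts a `queue` argument); the project name, interface names, job ids and queue names are all invented for illustration.

```csharp
// Jobs.Contracts project (hypothetical name), referenced by EVERY worker.
using Hangfire;

public interface IInvoiceJobs   // hypothetical: implemented only by service A
{
    void SendReminders();
}

public interface IReportJobs    // hypothetical: implemented only by service B
{
    void BuildDailyReport();
}

public static class RecurringJobRegistration
{
    public static void Register()
    {
        // Each recurring job is pinned to the queue of the service that implements it.
        // Because the interfaces live in the shared project, any worker's scheduler
        // can resolve the job type and enqueue it without a load error.
        RecurringJob.AddOrUpdate<IInvoiceJobs>(
            "invoice-reminders", jobs => jobs.SendReminders(), Cron.Hourly(), queue: "invoices");

        RecurringJob.AddOrUpdate<IReportJobs>(
            "daily-report", jobs => jobs.BuildDailyReport(), Cron.Daily(), queue: "reports");
    }
}
```

Each worker would then listen only on its own queue through `BackgroundJobServerOptions.Queues`, while still referencing the shared project so that it can resolve every job type.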

Diego_Germano · June 13, 2019, 12:17pm

Hi Steven,

Thanks, that helps a lot. I have just one more question: with Docker and two or more nodes, will Hangfire still work properly? The jobs will be in the same queue, and my concern is that Hangfire could execute a job twice. Right?

Steven_Desrochers · June 13, 2019, 1:21pm

Hangfire locks a job when it starts processing it, so it won’t be executed by every worker.

Just be careful when a job takes more than 30 minutes, as the lock sometimes falls off and another worker will start processing the same job. You can check your storage options and increase that timeout if you need to.
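
As a sketch of where that timeout lives, assuming the Hangfire.SqlServer 1.7 package: `SqlServerStorageOptions` exposes `SlidingInvisibilityTimeout`, and the older `InvisibilityTimeout` is most likely the source of the 30 minute figure. The `HangfireStorageSetup` class name and the 5 minute value are illustrative assumptions, not recommendations from this thread.

```csharp
using System;
using Hangfire;
using Hangfire.SqlServer;

public static class HangfireStorageSetup // hypothetical helper name
{
    public static void Configure(string connectionString)
    {
        var options = new SqlServerStorageOptions
        {
            // With a sliding timeout the worker keeps renewing its claim on the job
            // while it runs, so a long job is not handed to a second worker
            // just because a fixed window elapsed.
            SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        };

        GlobalConfiguration.Configuration.UseSqlServerStorage(connectionString, options);
    }
}
```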

Diego_Germano · June 14, 2019, 2:41pm

Hi Steven,

I set up logging with Azure Application Insights and did not get any errors. I am wondering whether I need to set up something on my server, nginx, Docker, etc.

Now I am not sure whether it is an error or just a setting needed to keep the application “Always running”.

Did you face something like that?

Ram123 · May 15, 2024, 12:55pm

I am also facing the same situation.
I created a recurring job and set it to run every 4 minutes, but it is not firing at the scheduled time.

RSantana · September 14, 2024, 12:19am

I have the same problem. Has anyone managed to resolve this, or does anyone have suggestions as to what it could be?
Thanks!
