We’ve been using Hangfire in production for over a year. Over the past few days, the recurring jobs have spontaneously stopped working properly.
We have close to 20 recurring jobs, and each now says its next execution is in the past. I am triggering them manually (and they work properly when I do), but they don’t “requeue” themselves. Also suspiciously, the Last Execution is blank, which makes no sense, as I’m requeuing them regularly and they’ve been running for months.
Any ideas on how to recover my installation?
I saw that there might be DST issues with the release I was using, so I updated to the latest version of Hangfire to try to fix my broken recurring tasks.
Unfortunately, no luck, still all blanks.
So my next step has been to delete a couple of the pending jobs and recreate them. This is really arduous, because I don’t have an easy way to recreate each of them, but I recreated the ones where I knew how other users had originally created them. Again, these are chores that have been running just fine for over six months.
My new jobs just say “N/A” for both next and previous execution time… My old ones still say their next execution time is in the past.
Anyone have any suggestions?
I’m currently doing the second-worst solution possible: I’m scripting the 29 jobs, dropping the tables, and letting Hangfire recreate itself.
I wouldn’t be willing to do this for 100+ recurring jobs. I need this fixed this hour, so I have to do it this way, but I’d really appreciate anyone telling me how I could have fixed this in C# or in T-SQL.
Oh dear, what version did you have before the update?
1.5.3, apparently 1 point before the DST fix in 1.5.4. :’(
Rescripting them fixed it; my recurring schedules are working properly now. My assumption is that something differs in how the jobs are serialized between 1.5.3 and the current version, so simply upgrading Hangfire wasn’t enough. I would have loved some technique to tell Hangfire to re-initialize the jobs. I understand that perhaps Hangfire doesn’t even store enough information to re-initialize them, but it would be nice if it did.
Essentially, I sent Hangfire information like the chore ID, the CRON expression, and the method with its parameters. After DST, it stopped working. It would have helped if I could have iterated through the jobs (without knowing their methods, CRONs, chore IDs, etc.) and told each one to “reinitialize”, or, even simpler, told Hangfire to reinitialize all jobs at once. Ultimately this could become a button in the dashboard, and it would be a nice way to “save” the corrupted jobs.
This assumes the jobs can get corrupted again by some other circumstance (presumably not DST, since hopefully that was fixed). If they can’t, and DST was the only cause ever, that’s great, and I guess this isn’t needed. But as it is, I would assume there is always going to be something else that can corrupt a job, and rebuilding all chores by hand probably isn’t the best solution.
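For what it’s worth, a minimal sketch of the “reinitialize all jobs” idea might look like the following. It assumes a Hangfire version where the `GetRecurringJobs()` storage extension returns each job’s ID, CRON expression, time zone, and deserialized method, and it simply feeds those back into `AddOrUpdate`. The `RecurringJobReinitializer` class and its `ReinitializeAll` method are hypothetical names, not part of Hangfire. It is untested against the corrupted state described above, so it may or may not have been enough to repair it.

```csharp
using System;
using Hangfire;
using Hangfire.Storage;

// Hypothetical helper, not part of Hangfire itself.
public static class RecurringJobReinitializer
{
    // Re-registers every recurring job found in storage using the data
    // Hangfire already has for it. Returns the number of jobs re-registered.
    public static int ReinitializeAll()
    {
        // The parameterless constructor uses JobStorage.Current.
        var manager = new RecurringJobManager();
        var count = 0;

        using (var connection = JobStorage.Current.GetConnection())
        {
            foreach (var dto in connection.GetRecurringJobs())
            {
                // Job is null when the stored method could not be loaded;
                // such entries cannot be re-registered from storage alone.
                if (dto.Job == null)
                {
                    Console.WriteLine("Skipping '" + dto.Id + "': job could not be loaded.");
                    continue;
                }

                // Assumption: fall back to UTC when no time zone was stored.
                var timeZone = string.IsNullOrEmpty(dto.TimeZoneId)
                    ? TimeZoneInfo.Utc
                    : TimeZoneInfo.FindSystemTimeZoneById(dto.TimeZoneId);

                // Note: this overload does not carry over a custom queue,
                // so a job on a non-default queue would need extra handling.
                manager.AddOrUpdate(dto.Id, dto.Job, dto.Cron, timeZone);
                count++;
            }
        }

        return count;
    }
}
```

Calling `RecurringJobReinitializer.ReinitializeAll()` once at application startup, after storage is configured, would be the obvious place to try it. Any job it skips would still have to be recreated by hand.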