GetFirstByLowestScoreFromSet

We have an odd problem. We’ve been using Hangfire for months without any issues, but yesterday around midnight UTC the following query started running about 150 times per second, and it has been doing so consistently ever since.

```sql
SELECT top 1 Value
FROM HangFire.[Set]
WHERE [Key] = @key
AND Score BETWEEN @from AND @to
ORDER BY Score
```

I see the query is part of Hangfire.SqlServer.SqlServerConnection.GetFirstByLowestScoreFromSet.

The number of times the query is being executed seems excessive, but I have no idea what triggers the query or why it would suddenly start doing this.

I have confirmed with my colleagues that no changes were made to the servers or the code at any time near when this began.

Any thoughts?

Thanks
Monte

From what I can see, GetFirstByLowestScoreFromSet (such a catchy name) is only ever called by the SchedulePoller, and this should only happen once every poll interval.

Could it be that the poll interval for the SchedulePoller has been set to zero? (The SchedulePollingInterval property on the BackgroundJobServerOptions class defaults to 15 seconds.) Can you confirm that the query is being executed with a key of “schedule”? How many servers do you have? Could it be that many BackgroundJobServer instances are being created?
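To put 150 queries per second in perspective, here is a rough calculation. It assumes, as the reply above suggests, that each server’s poller issues the schedule query about once per polling interval when nothing is due; the helper names are made up for illustration.

```python
# Rough rate check. Assumption: each server issues the schedule query about
# once per polling interval when no jobs are due.
DEFAULT_INTERVAL_S = 15.0  # default SchedulePollingInterval quoted above


def expected_queries_per_second(servers: int, interval_s: float = DEFAULT_INTERVAL_S) -> float:
    """Query rate if every server polls exactly once per interval."""
    return servers / interval_s


def servers_needed(observed_qps: float, interval_s: float = DEFAULT_INTERVAL_S) -> float:
    """Number of normally polling servers needed to explain an observed rate."""
    return observed_qps * interval_s


print(round(expected_queries_per_second(1), 3))  # 0.067
print(servers_needed(150))                       # 2250.0
```

At the 15-second default, one server accounts for roughly 0.067 queries per second, so 150 queries per second would take about 2,250 servers polling normally. A few extra BackgroundJobServer instances cannot explain that rate on their own.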

I verified that we don’t override the poll interval so it should be using the default. I checked directly against the database and the query is using the key ‘schedule’ and when I run the query it returns values. Now the weird part is that no scheduled jobs are being run at all. According to the from and to values being passed into the query there should be thousands of jobs available to run but none are being queued up. I can’t see what would be preventing those jobs from running.

Figured it out: the Hangfire.Set table had a scheduled job that had no corresponding row in Hangfire.Job. The poller kept pulling that broken job as the next one to schedule, failed to run it, and then repeated. I’m assuming the failure kept triggering another attempt, which is why there were so many queries from the poller. I deleted the row from the Set table and all the other scheduled jobs started running.
Still no idea how the broken scheduled job got there.
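The check done by hand here can be sketched as a query for “schedule” entries that have no matching job row. To keep it runnable anywhere, the sketch uses Python’s built-in sqlite3 module with simplified stand-in tables, not the real Hangfire schema; it assumes Set.Value stores the job id as text.

```python
import sqlite3

# Simplified stand-ins for HangFire.Job and HangFire.[Set]; only the columns
# used here are modelled. Assumption: Set.Value holds the job id as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Job ("Id" INTEGER PRIMARY KEY);
    CREATE TABLE "Set" ("Key" TEXT, "Score" REAL, "Value" TEXT);
    INSERT INTO Job ("Id") VALUES (1), (2);
    INSERT INTO "Set" ("Key", "Score", "Value") VALUES
        ('schedule', 100, '1'),
        ('schedule', 50, '99'),   -- dangling: there is no job 99
        ('schedule', 200, '2');
""")

# Entries in the "schedule" set whose job row no longer exists.
ORPHAN_FILTER = """
    "Key" = 'schedule'
    AND NOT EXISTS (
        SELECT 1 FROM Job WHERE CAST(Job."Id" AS TEXT) = "Set"."Value"
    )
"""

orphans = conn.execute(f'SELECT "Value", "Score" FROM "Set" WHERE {ORPHAN_FILTER}').fetchall()
print(orphans)  # [('99', 50.0)]

# Delete them so the poller can move on to the jobs behind them.
conn.execute(f'DELETE FROM "Set" WHERE {ORPHAN_FILTER}')
conn.commit()
print(conn.execute('SELECT COUNT(*) FROM "Set"').fetchone()[0])  # 2
```

On SQL Server the same NOT EXISTS filter would target HangFire.[Set] and HangFire.Job, casting Id to nvarchar instead of TEXT. Running the SELECT on its own first, and backing up before any DELETE, seems prudent.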

I saw this as well today on one of our production servers. We had 2 records in the Set table without corresponding Job records.

@mrattray, @JohanAlkemade, it is very strange that those job identifiers were not removed from the set. The only explanation I can see is that someone removed the jobs from storage manually, bypassing Hangfire’s state change mechanism.

However, this behavior is still wrong: SchedulePoller should remove the identifiers of missing jobs. I’ve just created a bug on GitHub; thank you for reporting this!
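Why one dangling identifier can produce a flood of queries is easier to see in a toy model of the poller. This is a hypothetical simplification, not Hangfire’s source: it assumes the poller only sleeps when no job is due, and that a failed state change for a missing job still counts as “work done”, which matches the symptoms reported above. The `remove_missing` flag models the behavior proposed here, dropping the identifier of a missing job.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeStorage:
    """Hypothetical in-memory stand-in for the Job and Set tables."""
    jobs: dict[str, str] = field(default_factory=dict)        # job id -> state name
    schedule: dict[str, float] = field(default_factory=dict)  # "schedule" set: job id -> score
    queries: int = 0                                           # lookups issued so far

    def first_by_lowest_score(self, now: float) -> str | None:
        # Plays the role of the SELECT top 1 ... ORDER BY Score query.
        self.queries += 1
        due = [(score, job_id) for job_id, score in self.schedule.items() if score <= now]
        return min(due)[1] if due else None


def poll_once(storage: FakeStorage, now: float, remove_missing: bool) -> bool:
    """Handle one due job; return True if the poller would look again immediately."""
    job_id = storage.first_by_lowest_score(now)
    if job_id is None:
        return False  # nothing due: the poller would now sleep for the interval
    if job_id not in storage.jobs:
        # The job row is gone, so the state change cannot happen.
        if remove_missing:
            del storage.schedule[job_id]  # the fix: drop the dangling identifier
        return True  # assumed: counted as "work done", so no sleep either way
    storage.jobs[job_id] = "Enqueued"
    del storage.schedule[job_id]
    return True


def run_cycle(storage: FakeStorage, now: float, remove_missing: bool,
              max_iterations: int = 1000) -> int:
    """Loop until the poller would sleep (or give up); return the query count."""
    for _ in range(max_iterations):  # cap stands in for a loop that never ends
        if not poll_once(storage, now, remove_missing):
            break
    return storage.queries


buggy = FakeStorage(jobs={"2": "Scheduled"}, schedule={"99": 50.0, "2": 100.0})
print(run_cycle(buggy, now=1000.0, remove_missing=False))  # 1000, and job 2 is never enqueued

fixed = FakeStorage(jobs={"2": "Scheduled"}, schedule={"99": 50.0, "2": 100.0})
print(run_cycle(fixed, now=1000.0, remove_missing=True))   # 3, and job 2 is enqueued
```

Without the removal, the lowest-scored (dangling) identifier is returned on every lookup, so the poller never sleeps and never reaches the valid job behind it, which lines up with both the query storm and the stalled scheduled jobs described earlier.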

Thanks for looking into it. I haven’t been able to reproduce the missing job identifier, so it’s quite possible it was caused by someone manually messing around.

Guys, the fix is available in the 1.4.6 release.