GetFirstByLowestScoreFromSet

mrattray · August 14, 2015, 5:14pm

We have an odd problem. We’ve been using Hangfire for months without issue, but yesterday around midnight UTC the following query started running about 150 times per second and has continued at that rate ever since.

SELECT TOP 1 Value
FROM HangFire.[Set]
WHERE [Key] = @key
AND Score BETWEEN @from AND @to
ORDER BY Score

I see the query is part of Hangfire.SqlServer.SqlServerConnection.GetFirstByLowestScoreFromSet.

The number of times the query is being executed seems excessive, but I have no idea what triggers the query and why it would suddenly start doing this.

I have confirmed with my colleagues that no changes were made to the servers or code anytime near the time this began.

Any thoughts?

Thanks
Monte

yngndrw · August 14, 2015, 7:20pm

From what I can see, GetFirstByLowestScoreFromSet (such a catchy name) is only ever called by the SchedulePoller, and this should only occur once every poll interval:

https://github.com/HangfireIO/Hangfire/blob/5a770aab402b820f262f0872b18c2d3304bb7128/src/Hangfire.Core/Server/SchedulePoller.cs#L58


public void Execute(CancellationToken cancellationToken)
{
    if (!EnqueueNextScheduledJob())
    {
        if (_enqueuedCount != 0)
        {
            Logger.InfoFormat("{0} scheduled jobs were enqueued.", _enqueuedCount);
            _enqueuedCount = 0;
        }

        cancellationToken.WaitHandle.WaitOne(_pollInterval);
    }
    else
    {
        // No wait, try to fetch next scheduled job immediately.
        _enqueuedCount++;
    }
}


Could it be that the poll interval for the SchedulePoller has been set to zero? (The SchedulePollingInterval property on the BackgroundJobServerOptions class defaults to 15 seconds.) Can you confirm that the query is being executed with a key of “schedule”? How many servers do you have? Could it be that many BackgroundJobServer instances are being created?

mrattray · August 17, 2015, 5:05am

I verified that we don’t override the poll interval, so it should be using the default. I checked directly against the database: the query is using the key ‘schedule’, and when I run it myself it returns values. The weird part is that no scheduled jobs are being run at all. According to the from and to values being passed into the query, there should be thousands of jobs available to run, but none are being queued up. I can’t see what would be preventing those jobs from running.

mrattray · August 17, 2015, 5:24am

Figured it out: the Hangfire.Set table had a scheduled job with no corresponding row in Hangfire.Job. The poller kept pulling that broken job as the next one to schedule, failed to run it, and then repeated. I’m assuming the failure kept triggering an immediate retry, which is why there were so many queries from the poller. I deleted the row from the Set table and all the other scheduled jobs started running.
I still have no idea how the broken scheduled job got there.
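
For anyone wanting to check for the same condition, here is a minimal sketch of the lookup described above: find entries in the schedule set that have no matching job row. It is written in Python against an in-memory SQLite database purely so it can run standalone. The simplified two-table layout, the assumption that Value holds the job id as text, and the helper names `make_demo_db` and `find_orphans` are illustrative assumptions, not Hangfire's actual schema or API. The query would need adapting for SQL Server and should be tried on a copy of the data first.

```python
import sqlite3

# Assumption: Set.Value stores the job id as text and Job.Id is an integer.
ORPHAN_QUERY = """
SELECT s.Value, s.Score
FROM HangFire.[Set] AS s
LEFT JOIN HangFire.Job AS j ON j.Id = CAST(s.Value AS INTEGER)
WHERE s.[Key] = 'schedule'
  AND j.Id IS NULL
ORDER BY s.Score
"""


def make_demo_db():
    """Build a toy database: three scheduled entries, but only jobs 2 and 3 exist."""
    conn = sqlite3.connect(":memory:")
    # Attach a second in-memory database so the HangFire. prefix resolves.
    conn.execute("ATTACH DATABASE ':memory:' AS HangFire")
    conn.executescript("""
        CREATE TABLE HangFire.Job (Id INTEGER PRIMARY KEY);
        CREATE TABLE HangFire.[Set] ([Key] TEXT, Score REAL, Value TEXT);
        INSERT INTO HangFire.Job (Id) VALUES (2), (3);
        INSERT INTO HangFire.[Set] ([Key], Score, Value) VALUES
            ('schedule', 10, '1'),
            ('schedule', 20, '2'),
            ('schedule', 30, '3');
    """)
    return conn


def find_orphans(conn):
    """Return (Value, Score) rows in the schedule set with no matching Job row."""
    return conn.execute(ORPHAN_QUERY).fetchall()


if __name__ == "__main__":
    for value, score in find_orphans(make_demo_db()):
        print(f"orphaned schedule entry: job id {value}, score {score}")
```

In the toy data only the entry for job id 1 is reported, since it is the only scheduled entry without a job row.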

JohanAlkemade · August 21, 2015, 7:10am

I saw this as well today on one of our production servers. We had 2 records in the Set table without corresponding Job records.

odinserj · August 21, 2015, 8:45am

@mrattray, @JohanAlkemade, it is very strange that those job identifiers were not removed from the set. The only explanation I can think of is that someone removed the job from storage manually, bypassing Hangfire’s state change mechanism.

However, this behavior is weird, and SchedulePoller should remove identifiers of a missing job. I’ve just created a bug on GitHub, thank you for reporting this!
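
To illustrate why a single orphaned identifier can stall the whole schedule, here is a small simulation of the polling loop. It is a sketch under stated assumptions, not Hangfire's code: it assumes the poller treats "an id was returned" as success and polls again immediately even when the state change fails, and the `remove_orphans` flag models the proposed behaviour of dropping identifiers of missing jobs. `FakeStorage`, `poll_once`, and `run` are hypothetical names invented for this example.

```python
class FakeStorage:
    """In-memory stand-in for the schedule set and the job table."""

    def __init__(self, schedule, jobs):
        self.schedule = dict(schedule)  # job id -> score (due timestamp)
        self.jobs = set(jobs)           # ids that actually have a job row
        self.enqueued = []
        self.queries = 0

    def get_first_by_lowest_score(self, lo, hi):
        """Mimic the query in this thread: lowest-scored id within [lo, hi]."""
        self.queries += 1
        due = [(score, job_id) for job_id, score in self.schedule.items()
               if lo <= score <= hi]
        return min(due)[1] if due else None

    def try_enqueue(self, job_id):
        """Move a job to the queue; fails and leaves the set untouched if the job is missing."""
        if job_id not in self.jobs:
            return False
        del self.schedule[job_id]
        self.enqueued.append(job_id)
        return True


def poll_once(storage, now, remove_orphans):
    """Return True if the caller should poll again without waiting."""
    job_id = storage.get_first_by_lowest_score(0, now)
    if job_id is None:
        return False  # nothing due: the real poller would wait for the poll interval
    if not storage.try_enqueue(job_id) and remove_orphans:
        storage.schedule.pop(job_id, None)  # assumed fix: drop the dangling id
    return True


def run(storage, now, remove_orphans, max_polls):
    """Poll until nothing is due or max_polls is hit; return polls that found an id."""
    polls = 0
    while polls < max_polls and poll_once(storage, now, remove_orphans):
        polls += 1
    return polls


if __name__ == "__main__":
    schedule = {"1": 10, "2": 20, "3": 30}  # "1" has no job row
    stuck = FakeStorage(schedule, jobs={"2", "3"})
    print(run(stuck, 100, remove_orphans=False, max_polls=1000), stuck.enqueued)
    fixed = FakeStorage(schedule, jobs={"2", "3"})
    print(run(fixed, 100, remove_orphans=True, max_polls=1000), fixed.enqueued)
```

Without the cleanup, the loop keeps fetching the same orphaned id until the poll cap is reached and never enqueues the jobs behind it. With the cleanup, the orphan is dropped on the first pass and the remaining jobs are enqueued in score order.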

mrattray · August 21, 2015, 2:28pm

Thanks for looking into it. I haven’t been able to reproduce the missing job identifier, so it is very possible that it was caused by someone manually messing around.

odinserj · August 29, 2015, 2:00pm

Guys, the fix is available with the 1.4.6 release.
