Why does Hangfire pause checking for jobs from sqlserver for several seconds?

sql-server
Tags: #<Tag:0x00007f8ba0b12140>

#1

I’ve inherited a system that uses Hangfire with sql server job storage. Usually when a job is scheduled to be run immediately we notice it takes a few seconds before it’s triggered.

Looking at SQL Profiler when running in my dev environment, the SQL run against Hangfire db looks like this:

exec sp_executesql N'delete top (1) JQ
output DELETED.Id, DELETED.JobId, DELETED.Queue
from [HangFire].JobQueue JQ with (readpast, updlock, rowlock, forceseek)
where Queue in (@queues1) and (FetchedAt is null or FetchedAt < DATEADD(second, @timeout, GETUTCDATE()))',N'@queues1 nvarchar(4000),@timeout float',@queues1=N'MYQUEUENAME_master',@timeout=-1800

-- Exactly the same SQL as above is executed about 6 times/second for about 3-4 seconds,
-- then nothing for about 2 seconds, then: 

exec sp_getapplock @Resource=N'HangFire:recurring-jobs:lock',@DbPrincipal=N'public',@LockMode=N'Exclusive',@LockOwner=N'Session',@LockTimeout=5000
exec sp_getapplock @Resource=N'HangFire:locks:schedulepoller',@DbPrincipal=N'public',@LockMode=N'Exclusive',@LockOwner=N'Session',@LockTimeout=5000
exec sp_executesql N'select top (@count) Value from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key and Score between @from and @to order by Score',N'@count int,@key nvarchar(4000),@from float,@to float',@count=1000,@key=N'recurring-jobs',@from=0,@to=1596053348
exec sp_executesql N'select top (@count) Value from [HangFire].[Set] with (readcommittedlock, forceseek) where [Key] = @key and Score between @from and @to order by Score',N'@count int,@key nvarchar(4000),@from float,@to float',@count=1000,@key=N'schedule',@from=0,@to=1596053348
exec sp_releaseapplock @Resource=N'HangFire:recurring-jobs:lock',@LockOwner=N'Session'
exec sp_releaseapplock @Resource=N'HangFire:locks:schedulepoller',@LockOwner=N'Session'

-- Then nothing is executed for about 8-10 seconds, then: 

exec sp_executesql N'update [HangFire].Server set LastHeartbeat = @now where Id = @id',N'@now datetime,@id nvarchar(4000)',@now='2020-07-29 20:09:19.097',@id=N'ps12345:19764:fe362d1a-5ee4-4d97-b70d-134fdfab2b87'

-- Then about 500ms-2s later I get 
exec sp_executesql N'delete top (1) JQ ... -- i.e. Same as first query

The update LastHeartbeat query is only there every second time (from just a brief inspection, maybe that’s not exactly right).

It looks like there’s at least 3 threads running the DELETE query against JQ, since I can see several RPC:Starting before the RPC:Completed, suggesting they’re being executed in parallel instead of sequentially.
I don’t know if that’s normal but seems weird as I thought we had just one ‘consumer’ of the jobs.

I only have one Queue in my dev environment, although in live we’d have 20-50 I’d guess.

Any suggestions on where I should look for the configuration that’s causing:
a) the 8-10s pause between checking for jobs
b) the number of threads that are checking for jobs - it seems like I have too many

thanks,

Rory


#2

It turned out we were configuring the SqlServerStorageOptions after calling app.UseHangfireServer(), which unsurprisingly means it’s left at the default. Which for sql server is to poll every 15 seconds.

The number of threads was to do with calling app.UseHangfireServer() a bunch of times instead of just once.

PS: it’s really annoying on this forum that you can’t edit posts (too similar to existing post) or seemingly reply to your own post.