We are using HangFire in a production Azure SaaS environment and DTU usage often reaches 100%. On further analysis, we found that the statement below is causing the spike.
(@now datetime,@count int)delete top (@count) T from [HangFire].[Job] T with (forceseek)
where ExpireAt < @now
option (loop join, optimize for (@count = 20000))
- How can we reduce the batch count from 20,000?
- Is that configurable? Where?
- Do we need to run any optimization scripts, such as a reindex? Any recommendations and scripts for that?
Hi There
Did you find a solution to this?
We are experiencing the same issue. This query runs every half hour (on the hour and at half past). When it does, our DTU hits 100% and other queries experience SQL timeouts.
I was thinking it would be preferable for us to schedule the deletion at a quiet time of day. I assume this is the cleanup job that removes any jobs older than the default of 1 day.
I think the properties we need to set are the JobExpirationCheckInterval and DeleteExpiredBatchSize of the SqlServerStorageOptions.
JobExpirationCheckInterval defaults to 30 minutes. DeleteExpiredBatchSize defaults to -1 on instantiation, so I assume HangFire interprets -1 as the default 20,000.
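If it helps, here is a sketch of how I believe those two options would be set when configuring SQL Server storage. The interval and batch size values are examples, not recommendations, and I'm assuming a Hangfire version recent enough to expose DeleteExpiredBatchSize:

```csharp
using System;
using Hangfire;
using Hangfire.SqlServer;

GlobalConfiguration.Configuration.UseSqlServerStorage(
    "<your connection string>",
    new SqlServerStorageOptions
    {
        // Run the expiration manager less often (default is 30 minutes).
        JobExpirationCheckInterval = TimeSpan.FromHours(6),

        // Delete fewer rows per batch to reduce the DTU spike
        // (default -1 is treated as 20,000, as far as I can tell).
        DeleteExpiredBatchSize = 1000
    });
```

A smaller batch with a shorter interval spreads the deletions out; a larger batch with a longer interval concentrates them, which may only be acceptable off-peak.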
In our case HangFire is processing a high volume of very small tasks, and once they complete we don't really care about them. So I think we should set these jobs to expire quickly, i.e. 15 minutes rather than 1 day, and then I believe the 30-minute deletion of expired jobs will be more efficient. Alternatively, we could set the deletion interval to, say, every 6 hours with a higher batch size such as 50,000.
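For the shorter per-job expiry, I believe the usual approach is a state filter that overrides JobExpirationTimeout on the ApplyStateContext. A rough sketch (the attribute name is mine; the interfaces are Hangfire's):

```csharp
using System;
using Hangfire;
using Hangfire.Common;
using Hangfire.States;
using Hangfire.Storage;

// Hypothetical filter: expires finished jobs after 15 minutes
// instead of the default 1 day.
public class ShortExpirationAttribute : JobFilterAttribute, IApplyStateFilter
{
    public void OnStateApplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
        context.JobExpirationTimeout = TimeSpan.FromMinutes(15);
    }

    public void OnStateUnapplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
    }
}

// Apply globally so every job picks up the shorter expiry:
// GlobalJobFilters.Filters.Add(new ShortExpirationAttribute());
```

With jobs expiring after 15 minutes, each cleanup pass should find far fewer rows to delete, which ought to flatten the DTU spike.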