High Data IO for two SQL Queries

Tags: #<Tag:0x00007faff6a92150>

We run Hangfire in Production for two sites and are having a lot of issues with SQL server and resource usage, specifically data IO.

Every 30 minutes or so the Data IO metrics will spike on SQL and cause SQL timeouts in hangfire. SQL Query Performance identifies the following two queries as being responsible.

(@now datetime,@count int)delete top (@count) from [HangFire].[Set]
where ExpireAt < @now
option (loop join, optimize for (@count = 20000))

and this one:

(@key nvarchar(26),@value nvarchar(256),@score float)merge [HangFire].[Set] with (xlock) as Target
using (VALUES (@key, @value, @score)) as Source ([Key], Value, Score)
on Target.[Key] = Source.[Key] and Target.Value = Source.Value
when matched then update set Score = Source.Score
when not matched then insert ([Key], Value, Score) values (Source.[Key], Source.Value, Source.Score); 

Looking at the queries the common factor is the HangFire.Set table. I’ve checked this and it sits at around 700,000 rows.

What does this table do and are there any settings we can change, or behaviour we can implement to make these queries more manageable. On the server is question there is about 6.5 million completed jobs. About 15k sitting in failed and another 15k sitting in a holding queue. Is it the high number of jobs sitting in failed or success that is causing this?