I have a typical Hangfire + SQL Server setup (on Azure).
I have to crunch through around 160,000 jobs. My default concurrency is ~20 workers, but when all 20 workers are busy, pretty much all of Hangfire slows to a crawl.
DB load is very low: we're on a P11 instance right now, DTU usage is under 10%, and other queries run fine. I've traced the slowdown to lock contention.
Running this query:

    SELECT sqltext.TEXT, *
    FROM sys.dm_exec_requests
    OUTER APPLY sys.dm_exec_sql_text(sql_handle) AS sqltext
    WHERE DB_NAME(database_id) = 'dbname' AND blocking_session_id <> 0

returns ~6-7 blocked requests, all waiting on the resource

    APPLICATION: 6:0:[HangFire:Set:Lock]:(cb774589)

which appears to be a global lock.
I'm about to start digging into the code to figure out what can be done about it, but some pointers would be helpful. It looks like when I have many short-running jobs and a concurrency level of ~20, Hangfire's own overhead drags everything down.
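To illustrate why adding workers stops helping once every job has to pass through one global lock, here is a rough bottleneck model in Python. It assumes (hypothetically) that each job spends `work_ms` doing real work and holds a single global lock, like HangFire:Set:Lock, for `lock_ms` exactly once. The timings below are made-up, not measurements, and this is a back-of-envelope sketch, not a description of how Hangfire actually acquires its locks.

```python
# Back-of-envelope bottleneck model for workers that share one global lock.
# Assumption (hypothetical): every job spends `work_ms` outside the lock and
# holds a single global lock (like HangFire:Set:Lock) for `lock_ms`.


def max_jobs_per_second(workers: int, work_ms: float, lock_ms: float) -> float:
    """Upper bound on throughput when each job holds one global lock for lock_ms."""
    # Without contention, each worker finishes one job every (work_ms + lock_ms).
    parallel_rate = workers * 1000.0 / (work_ms + lock_ms)
    # Only one worker can hold the lock at a time, so at most
    # 1000 / lock_ms jobs per second can pass through it, however many workers.
    lock_rate = 1000.0 / lock_ms
    return min(parallel_rate, lock_rate)


def hours_for(total_jobs: int, workers: int, work_ms: float, lock_ms: float) -> float:
    """Lower bound on wall-clock hours to drain total_jobs under the same model."""
    return total_jobs / max_jobs_per_second(workers, work_ms, lock_ms) / 3600.0


if __name__ == "__main__":
    # Hypothetical timings: 200 ms of real work, 50 ms inside the global lock.
    for n in (1, 5, 10, 20):
        rate = max_jobs_per_second(n, work_ms=200, lock_ms=50)
        print(f"{n:2d} workers: <= {rate:5.1f} jobs/s, "
              f">= {hours_for(160_000, n, 200, 50):.2f} h for 160,000 jobs")
```

With these made-up numbers throughput caps at 20 jobs/s from 5 workers onward, so going to 20 workers buys nothing. Real contention also adds waiting overhead on top, so actual behavior would likely be worse than this best case.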