Processing Queue under capacity

ObsidianPhoenix · May 13, 2020, 3:37pm

We have a Hangfire instance running on Hangfire 1.7.9 (using an ACE license, as far as I’m aware). This instance is backed by a SQL Server database and has 7 servers hooked in (each with 32 workers, and set up for all queues).

When we have a large backlog on one of the queues (~130k) and no items in the other queues, I would expect to see our processing count sitting at close to 224 (7 servers * 32 workers). However, we rarely see the Processing section climb above 150, and it normally sits around 120.

I appreciate that Processing doesn’t necessarily paint the complete picture, as it probably doesn’t account for things like polling for the next job, etc.

Is there a better way to work out what the workers are actually doing, and whether we’re being bottlenecked somewhere?

Jonah_Simpson · May 13, 2020, 6:36pm

First guess would be you’re hitting the default limit for threads per process in IIS.

ObsidianPhoenix · May 13, 2020, 7:09pm

We’re running the actual workers off a Windows service, rather than from within the Web API project. Is that likely to hit the same limit, or an equivalent limit in the Windows service?

Jonah_Simpson · May 13, 2020, 7:36pm

No, there shouldn’t be a (relevant) thread limit in a Windows Service like there would be for IIS. I doubt you’re hitting a problem where “workers are doing other things”, but you might be hitting database query limits for updating state and pulling new Jobs.

How long are your Jobs processing for?

ObsidianPhoenix · May 13, 2020, 7:39pm

Looking at the logs, they seem to be sub-1-second, usually 1-200ms.

ObsidianPhoenix · May 14, 2020, 4:05pm

So, the reason we get 100k jobs in the queue is because we’re running a peak load test through the system, which generates around 30-odd RPS. The queue starts to back up and just keeps climbing.

My original thinking was that because the jobs table is high traffic (in and out), they’re just all stepping on each other. That does seem to be visible in the db, with many sessions getting locked by others.

However, after the peak load completes, there’s little to no traffic on the system, so Hangfire has its pick. I don’t see the processing count climb above 150, maybe 170 occasionally.

ObsidianPhoenix · May 18, 2020, 1:05pm

We managed to figure out what was causing the backlog. We’re using v1.7, but hadn’t seen the recommended changes in the config:

UseRecommendedIsolationLevel = true
UsePageLocksOnDequeue = true
DisableGlobalLocks = true

I ran a performance test on Friday with these new settings. In the test env, after an hour I had a 19k backlog. With these settings enabled, I had none.
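
For anyone landing here later, this is roughly where those three settings live. It is a minimal sketch, assuming the storage is configured through `GlobalConfiguration` at service startup with the `Hangfire.SqlServer` package; the `HangfireStartup` class name and the `"HangfireConnection"` connection string name are placeholders, not taken from our actual setup.

```csharp
using Hangfire;
using Hangfire.SqlServer;

public static class HangfireStartup // hypothetical class name
{
    public static void Configure()
    {
        GlobalConfiguration.Configuration
            // "HangfireConnection" is a placeholder for your own
            // connection string (or connection string name).
            .UseSqlServerStorage("HangfireConnection", new SqlServerStorageOptions
            {
                // The three options mentioned above.
                UseRecommendedIsolationLevel = true,
                UsePageLocksOnDequeue = true,
                DisableGlobalLocks = true
            });
    }
}
```

These options are opt-in on 1.7, so an instance upgraded from an older version will not pick them up unless they are set explicitly. Check the upgrade notes for your version before enabling them, since some of them depend on the storage schema having been migrated.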

Jonah_Simpson · May 25, 2020, 2:56pm

Nice work, glad you came to a resolution!
