"Expired" enqueued jobs are blocking new jobs

Simon_Green · August 22, 2019, 9:14am

Setup: Hangfire 1.6.17
We are using SQL Server and MSMQ.

Our Hangfire environment has gotten into a bad state. On the dashboard, we see this:

Note the “15/6” next to the “Enqueued” label, yet there are 659 items in the MEDIUMPRIORITY queue. When you drill into this queue, you see that all but 15 of them are marked “Job expired”:

If you look in the DB, there is no record in the Job table for any of the expired ones.

When we enqueue new items on the MEDIUMPRIORITY queue, they go to the back and are blocked. The “expired” (non-existent) job items are being cleared out of the queue very slowly, and they hold up the new job items for hours until they are fully cleared out.

So my questions are:
-How could this have happened?
-Where is Hangfire sourcing these 659 job items from, if they are not in the DB?
-How can we clear these “expired” items out of whatever storage they are in, so that newly enqueued jobs aren’t blocked and can be processed immediately?

Thanks!
Simon

Simon_Green · August 22, 2019, 9:53am

OK, I have been able to resolve this issue, although I still can’t explain how it happened.

I had to delete all of the MSMQ items in the queue that were referencing a job that did not exist in the hangfire.job table.
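
For illustration only, the selection step of that cleanup can be sketched in Python. This is a minimal sketch under assumptions: `find_orphaned_messages` is a hypothetical helper, and the two inputs stand in for the job ids read from the MSMQ messages and the ids present in the job table. Reading from MSMQ and querying SQL Server are not shown.

```python
from typing import Iterable, List


def find_orphaned_messages(queued_job_ids: Iterable[str],
                           existing_job_ids: Iterable[str]) -> List[str]:
    """Return queued job ids that have no matching row in the job table.

    Hypothetical helper: both arguments are plain collections of ids that
    the caller has already collected from the queue and from the database.
    """
    existing = set(existing_job_ids)
    # Keep queue order so the oldest orphaned messages come first.
    return [job_id for job_id in queued_job_ids if job_id not in existing]


# Made-up ids, purely for demonstration.
queued = ["101", "102", "103", "104"]
in_job_table = ["102", "104"]
print(find_orphaned_messages(queued, in_job_table))  # ['101', '103']
```

Only the ids returned here would be candidates for deletion from the queue; messages whose job still exists are left alone.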

odinserj · August 22, 2019, 2:04pm

That’s because each worker waits for some time before removing a background job identifier from a queue. This feature was added because in some cases (especially when using MSMQ + SQL Azure) the enqueue operation can be performed before the transaction is fully committed.

If the job storage doesn’t support linearizable reads (i.e. reads don’t block on a pending transaction), then a null value is returned when trying to fetch a background job. The problem is that we can’t distinguish between two cases: the job has already expired for some reason, or the corresponding transaction will be committed a few moments later.

Starting from 1.7.0, it’s possible to specify that the storage supports linearizable reads (READCOMMITTEDLOCK is used for SQL Server), and in this case workers will not wait on non-existing jobs.
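
The behaviour described above can be sketched roughly as follows. This is an illustrative model in Python, not Hangfire's actual implementation: `handle_dequeued_id`, `load_job`, `max_attempts` and `wait` are hypothetical names, and the retry count is an arbitrary assumption.

```python
from typing import Callable, Optional


def handle_dequeued_id(job_id: str,
                       load_job: Callable[[str], Optional[dict]],
                       linearizable_reads: bool,
                       max_attempts: int = 3,
                       wait: Callable[[], None] = lambda: None) -> str:
    """Decide what to do with a job id taken from the queue.

    Returns "process" if the job record is found, "remove" otherwise.
    With linearizable reads, a missing record is trusted immediately.
    Without them, the record may belong to a transaction that has not
    committed yet, so the lookup is retried after waiting.
    """
    attempts = 1 if linearizable_reads else max_attempts
    for attempt in range(attempts):
        if load_job(job_id) is not None:
            return "process"
        if attempt < attempts - 1:
            wait()  # a real worker would sleep here; no-op by default
    return "remove"


# A job that never appears is dropped at once when reads are linearizable.
print(handle_dequeued_id("42", lambda _id: None, linearizable_reads=True))  # remove
```

With `linearizable_reads=False`, each missing job costs several waits before its id is removed, which matches the slow draining of a queue full of ids for jobs that no longer exist.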

Carlos_Ribeiro · February 9, 2021, 5:31pm

Can you kindly provide an example of how to enable this configuration?
Thanks.

odinserj · February 23, 2021, 10:45am

Just upgrade your Hangfire.Core and Hangfire.SqlServer packages to the latest version; these changes are enabled by default.
