Ok, I think I have a repro case.
Let's assume we have a simple job:

```csharp
public static void NoopJob() { }
```
We then declare a method that creates a batch within a transaction; an artificial 30-second delay keeps the transaction open long enough to demonstrate the issue:

```csharp
public async Task CreateJob1()
{
    using (var scope = new TransactionScope(TransactionScopeOption.Required,
        TimeSpan.FromMinutes(2), TransactionScopeAsyncFlowOption.Enabled))
    {
        var batchId = BatchJob.StartNew(c =>
        {
            for (var i = 0; i < 10; i++)
            {
                c.Enqueue(() => NoopJob());
            }
        });

        // Artificial delay: hold the transaction open for 30 seconds.
        await Task.Delay(TimeSpan.FromSeconds(30));
        scope.Complete();
    }
}
```
And finally, we create a harmless method that does nothing fancy but simply creates a job:

```csharp
public async Task CreateJob2()
{
    BackgroundJob.Enqueue(() => NoopJob());
}
```
Now, the expected behavior: if I call CreateJob1() (which blocks for 30 seconds) and then call CreateJob2() within those 30 seconds, there should be no blocking -- and indeed things work OK.
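For reference, this happy path can be driven from a harness roughly like the following (a sketch only; the driver is not part of the repro, and the 5-second stagger is an assumption):

```csharp
// Non-blocking scenario (Dashboard closed).
// Assumes CreateJob1/CreateJob2 are callable on the same instance.
var first = CreateJob1();                   // holds its transaction open for ~30 seconds
await Task.Delay(TimeSpan.FromSeconds(5));  // invoke the second method while the first is still pending
await CreateJob2();                         // in this scenario it returns without blocking
await first;
```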
However, now open the Dashboard, call CreateJob1() (which blocks), and then call CreateJob2(), in that order -- CreateJob2() now blocks as well. If it doesn't, call CreateJob2() a few more times; it normally blocks within 5-10 tries. The query it appears to block on is:

```sql
(@key nvarchar(4000))
select count([Key]) from [HangFire].[Set] with (readcommittedlock) where [Key] = @key
```
You'll also notice the Dashboard is completely blocked while CreateJob1() is running -- I think I narrowed that down to the batches:started metric, which the Dashboard tries to load. If I remove the batch from CreateJob1, there is no blocking anymore. So this scalability issue appears only when batches are created within transactions. All of this is running inside an ASP.NET Web API application.
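One mitigation worth trying (a sketch under my own assumptions, not something this report confirms): suppress the ambient transaction around the batch creation, so Hangfire's connection does not enlist in the long-running TransactionScope and hold locks on [HangFire].[Set] for the full 30 seconds. The variant name below is hypothetical:

```csharp
public async Task CreateJob1WithSuppress() // hypothetical variant for illustration
{
    using (var scope = new TransactionScope(TransactionScopeOption.Required,
        TimeSpan.FromMinutes(2), TransactionScopeAsyncFlowOption.Enabled))
    {
        // Suppress the ambient transaction so BatchJob.StartNew commits on its
        // own connection immediately instead of enlisting in the outer scope.
        using (new TransactionScope(TransactionScopeOption.Suppress))
        {
            BatchJob.StartNew(c =>
            {
                for (var i = 0; i < 10; i++)
                    c.Enqueue(() => NoopJob());
            });
        }

        await Task.Delay(TimeSpan.FromSeconds(30));
        scope.Complete();
    }
}
```

The trade-off: with suppression the batch is created even if the outer transaction later rolls back, so this only fits cases where job creation does not need to be atomic with the surrounding work.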
If you have any idea what's going on, please let me know. If not, I'll try to find a more deterministic repro case where CreateJob2() is blocked.