Thanks @odinserj.
Ok, I think I have a repro case.
Let’s assume we have a simple job:
public static void NoopJob()
{
    // NO-OP
}
We then declare a method that creates a batch within a transaction, with an artificial delay of 30 seconds added to demonstrate the issue:
public async Task CreateJob1()
{
    using (var scope = new TransactionScope(TransactionScopeOption.Required, TimeSpan.FromMinutes(2), TransactionScopeAsyncFlowOption.Enabled))
    {
        var batchId = BatchJob.StartNew(c =>
        {
            for (var i = 0; i < 10; i++)
            {
                c.Enqueue(() => NoopJob());
            }
        });
        await Task.Delay(TimeSpan.FromSeconds(30));
        scope.Complete();
    }
}
And finally, we create a harmless method that does nothing fancy but simply creates a job:
public async Task CreateJob2()
{
    BackgroundJob.Enqueue(() => NoopJob());
}
Now, the expected behavior: when I call CreateJob1(), which blocks for 30 seconds, and then call CreateJob2() within those 30 seconds, there should be no blocking, and indeed things work OK.
However, now open the Dashboard, call CreateJob1() (which blocks), and then call CreateJob2(), in that order. This should block. If it doesn't, call CreateJob2 a few more times; it normally blocks within 5-10 tries. The query it appears to block on is:
(@key nvarchar(4000))select count([Key]) from [HangFire].[Set] with (readcommittedlock) where [Key] = @key
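In case it helps with reproducing, here is a rough sketch of how the two calls could be driven so that each CreateJob2 attempt is timed while CreateJob1 is still in flight. ReproHarness and MeasureAsync are names I made up for illustration; they are not part of Hangfire, and the sketch uses only the standard library:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of Hangfire.
public static class ReproHarness
{
    // Starts the long-running call, then times each short call
    // issued one after another while (ideally) the long call is still running.
    public static async Task<TimeSpan[]> MeasureAsync(
        Func<Task> longRunning, Func<Task> shortCall, int attempts)
    {
        var timings = new TimeSpan[attempts];

        // Run the long call in the background so we can issue short calls meanwhile.
        var pending = Task.Run(longRunning);

        for (var i = 0; i < attempts; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            await Task.Run(shortCall);
            timings[i] = stopwatch.Elapsed;
        }

        // Wait for the long call so any exception it threw surfaces here.
        await pending;
        return timings;
    }
}
```

Usage would be something like `var timings = await ReproHarness.MeasureAsync(CreateJob1, CreateJob2, 10);`. An attempt that takes close to the 30-second delay, rather than a few milliseconds, would suggest that CreateJob2 was blocked behind the open transaction.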
You’ll also notice that the Dashboard is completely blocked while CreateJob1() is running. I think I narrowed that down to the batches:started metric, which the Dashboard tries to load.
If I remove the batch from CreateJob1, there is no blocking anymore. So this scalability issue seems to occur when batches are used within transactions. All of this is running within an ASP.NET Web API application.
If you have any idea what’s going on, please let me know. If not, I’ll try to find a more deterministic repro case where CreateJob2 is blocked.