Multiple IIS servers with a single queue name (i.e. RunOnlyOneJobAtaTime), a worker count of 1, and a single SQL server

We are trying to use multiple IIS servers to process jobs from the same SQL Server job queues. This is mostly for failover, so if one of the IIS servers goes down the recurring jobs continue to run.
Each IIS server creates its own Hangfire server instance with the same queue name. The recurring jobs are registered from both IIS servers at startup with AddOrUpdate, so only one entry appears on the Recurring Jobs page on both servers.
The queue in question has a worker count of 1. When a job gets queued into this queue, we are seeing 2 jobs running, one on each server, even though the worker count is set to 1.
The job is set to run once every 2 minutes, but the job itself may take anywhere from 3 to 5 minutes, and only one instance can run at a time due to underlying business logic requirements.
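Roughly, each server is configured like this (connection string, job class, and names below are simplified placeholders, not our exact code):

```csharp
using System;
using Hangfire;

public class OrderSyncJob
{
    // Pin the job to the single-worker queue (Hangfire queue names must be lowercase).
    [Queue("runonlyonejobatatime")]
    public void Run()
    {
        // ... business logic that must never run concurrently ...
    }
}

public static class HangfireSetup
{
    public static BackgroundJobServer Start()
    {
        // Both IIS servers point at the same SQL Server storage.
        GlobalConfiguration.Configuration
            .UseSqlServerStorage("Server=.;Database=HangfireDb;Integrated Security=true");

        var options = new BackgroundJobServerOptions
        {
            WorkerCount = 1,                               // only one worker on this server
            Queues = new[] { "runonlyonejobatatime" }      // listen on the shared queue only
        };

        // Every server registers the same recurring job id, so AddOrUpdate keeps a single entry.
        RecurringJob.AddOrUpdate<OrderSyncJob>("order-sync", job => job.Run(), "*/2 * * * *");

        return new BackgroundJobServer(options);
    }
}
```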

All of this works if only one server is running: only 1 job runs in this queue at a time, and the others show as enqueued, waiting to run one at a time.

What are we possibly missing to make only one job run at a time across both servers while the others stay enqueued?

Regards,
Reddy


I use Mutex instances to make sure only one instance of a job is running. A named Mutex is system-wide on a single machine, so this only works if both job servers are running on the same machine.

For example:

using (var mutex = new Mutex(false, $"JOB_{environment}_{jobName}", out bool createdNew))
{
	if (createdNew)
	{
		bool canExecute = mutex.WaitOne(3 * 1000); // wait up to 3 seconds to acquire the wait handle

		if (canExecute)
		{
			try
			{
				// run the relevant code...
			}
			finally
			{
				mutex.ReleaseMutex(); // always release so later runs don't see an abandoned mutex
			}
		}
	}
}

where environment and jobName are derived from a combination of reflection and appsettings values. You can use whatever you want for the job name (within reason - refer to the linked article for any caveats).

If you’re running a web farm scenario, you would probably need to look into the concurrency tooling provided by Hangfire.Ace: Concurrency & Rate Limiting — Hangfire Documentation
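If I recall the throttling docs correctly, a distributed mutex there ends up as little more than an attribute on the job method, something like the sketch below (attribute and mutex names are from memory, so verify against the documentation; the throttling package also has to be enabled at configuration time):

```csharp
using Hangfire; // attribute ships with the paid Hangfire.Throttling package

public class ReportJob
{
    // Unlike a machine-local named Mutex, this limits the method to one concurrent
    // execution across every server sharing the same job storage.
    [Mutex("report-job-mutex")]
    public void Run()
    {
        // ... job body ...
    }
}
```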

I’m not sure if anything exists outside of Ace. You could probably implement something using Redis or similar, but I haven’t had a need to do so and can’t really provide any guidance.


Thank you for your insights.
I have looked into the Hangfire source directly, and it looks like SQL Server as a data source will not allow true distributed locking (it uses sp_getapplock and sp_releaseapplock), and these are session- and transaction-scoped, which means they effectively cover only one server.
The only way to get true distributed application locking out of the box with Hangfire is to use Redis as your data source, which provides true distributed locking (irrespective of how many servers request the lock with a unique lock name).
We have implemented a transaction-based row lock in a SQL Server table: every server requests the lock when it tries to run the job and skips the run if the lock is already in use (each lock has an expiration datetime, so a job cannot hold the lock forever). We plan to transition to Redis as our data source for queuing jobs, and it should work out of the box from that point. A rough sketch of the lock table approach is below.
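To illustrate the idea (table, column, and class names here are examples for clarity, not our exact schema):

```csharp
using System;
using System.Data.SqlClient;

public static class SqlRowLock
{
    // Tries to claim a pre-seeded lock row. Returns true if this server won the lock,
    // false if another server holds a non-expired lock. Assumes a table like:
    //   CREATE TABLE JobLocks (LockName NVARCHAR(100) PRIMARY KEY, ExpiresAtUtc DATETIME2 NOT NULL);
    // with one row per lock name, seeded with ExpiresAtUtc in the past.
    public static bool TryAcquire(string connectionString, string lockName, TimeSpan holdFor)
    {
        const string sql = @"
            UPDATE JobLocks
            SET ExpiresAtUtc = DATEADD(SECOND, @holdSeconds, SYSUTCDATETIME())
            WHERE LockName = @lockName
              AND ExpiresAtUtc < SYSUTCDATETIME();"; // only wins if the previous hold has expired

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@lockName", lockName);
            command.Parameters.AddWithValue("@holdSeconds", (int)holdFor.TotalSeconds);

            connection.Open();

            // The single UPDATE is atomic, so only one server can flip the row per expiry window.
            return command.ExecuteNonQuery() == 1;
        }
    }

    // Releases the lock early by expiring the row, so the next run doesn't wait for the timeout.
    public static void Release(string connectionString, string lockName)
    {
        const string sql = @"
            UPDATE JobLocks
            SET ExpiresAtUtc = SYSUTCDATETIME()
            WHERE LockName = @lockName;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@lockName", lockName);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

The job calls TryAcquire at the start and simply returns (skips the run) when it gets false, and calls Release in a finally block when the work is done.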

I think we are saying the same thing, but I wanted to spell it out in case somebody runs into the same problem in the future.

Regards,
Reddy


Sounds about right - I think the docs mention as much. Thankfully (or not?) I haven’t had to deal with the kind of volume where server farms become necessary, so my experience is limited to spreading jobs across specialized service applications (basically Topshelf-powered console apps which get installed as Windows services), which has worked well enough up to now.

If you hadn’t seen it yet, there is an official Redis plugin: Using Redis — Hangfire Documentation

Again, it requires the Ace version of Hangfire, but I thought I’d mention it just in case.
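From memory, swapping storage is roughly a one-liner at configuration time (the connection string is a placeholder, and the exact extension method depends on which Redis storage package you install):

```csharp
using Hangfire;

public static class HangfireRedisSetup
{
    public static void Configure()
    {
        // With a Redis storage package installed (e.g. Hangfire.Pro.Redis or a community
        // Hangfire.Redis package), storage is swapped at startup; distributed locks then
        // come from Redis instead of SQL Server.
        GlobalConfiguration.Configuration.UseRedisStorage("localhost:6379");
    }
}
```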
