Hangfire stops fetching jobs

hermandejager · November 16, 2015, 12:50pm

Hi,

We are using Hangfire in production on a cluster of 5 servers. In one Windows service we start 3 Hangfire cores. Each core works with 1 queue.
When the servers are very busy processing jobs (high CPU and bandwidth usage), one of the queues sometimes stops processing jobs.
None of the servers will fetch jobs from that queue, while the other queues in the same Windows service still process jobs.

If we restart one service, all the servers start processing jobs from that specific queue again. We use Redis. We get the feeling that some kind of semaphore in Redis is blocking the servers from fetching jobs from that queue.

Can you help me find out what the problem could be? We have already upgraded to the latest Hangfire and Redis versions.

Herman

caltuntas · November 16, 2015, 1:22pm

We had a similar problem but have no solution yet. You can see the details at
http://hangfire.discourse.group/t/distrubuted-lock-on-sql-server-is-never-released/1527/2

hermandejager · November 16, 2015, 1:34pm

Hi,

I thought it would be different on Redis, but we have exactly the same issue. We also upgraded from 1.1 to 1.5.x; before the upgrade we had no issues, but now we have problems.

Hope to hear from you very soon. This is a big problem!!!

odinserj · November 16, 2015, 4:29pm

Do you use logging, and are there any exceptions? The Worker class, which processes background jobs in the Enqueued state, applies distributed locks only to individual background jobs; there are no queue-level locks. Are the other background jobs processed immediately after a service restart? If the problem were caused by distributed locks, there would be a delay of several minutes.

odinserj · November 16, 2015, 4:36pm

@hermandejager, can you send me the output of the following Redis command, run while you are experiencing the issue, so I can learn more about the locked resources?

KEYS *lock*

UPD. Do you have any custom filters?

odinserj · November 16, 2015, 5:18pm

Can you also create a dump file (Task Manager -> Right click on a process -> Create dump file), archive it and send it via email to support@hangfire.io or share it through Dropbox, Google Drive, etc.?

hermandejager · November 17, 2015, 2:07pm

Hi,

Yes, we have custom filters.
Besides that, we are now changing our implementation from 1 instance per queue to multiple queues in one instance, with this code:

var processes = new List<IBackgroundProcess>
{
    new Worker("default"),
    new DelayedJobScheduler(),
    new RecurringJobScheduler()
};

You describe it like this:
Want 3 workers listening the default queue and 7 listening the critical queue? No problem. Don’t want to use recurring job scheduler on some instances? You can do this! Just pass the processes you need:

Can you give us an example of what the code should look like with 3 queues?

queue     workers
default   10
critical  2
normal    5

When this works, we can cut out 2 Hangfire instances on our Windows server.

hermandejager · November 17, 2015, 2:14pm

Yes, when I restart one service, that specific queue processes jobs again.

hermandejager · November 17, 2015, 2:15pm

Yes, I will do that when it happens again. The files are large (8 GB); I will RAR them and send them.
We have a particularly busy time around 10:00; it normally happens then.

odinserj · November 17, 2015, 3:41pm

Can you send me the source code of your filters? Sometimes they cause problems. Here is the sample code for your configuration. Will wait for the dump.

var processes = new List<IBackgroundProcess>();
processes.Add(new DelayedJobScheduler());
processes.Add(new RecurringJobScheduler());

var queues = new Dictionary<string, int>
{
    { "default", 10 },
    { "critical", 2 },
    { "normal", 5 }
};

foreach (var queue in queues)
{
    for (var i = 0; i < queue.Value; i++)
    {
        processes.Add(new Worker(queue.Key));
    }
}

var properties = new Dictionary<string, object>
{
    { "Queues", queues.Keys.ToArray() },
    { "WorkerCount", queues.Values.Sum() }
};

using (new BackgroundProcessingServer(processes, properties))
{
    Console.ReadLine();
}
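
To sanity-check the per-queue worker counts before wiring them into a server, the expansion step can be run on its own. This is only a sketch under stated assumptions: `WorkerPlan` and `Expand` are hypothetical names, not part of Hangfire, and the method simply returns one queue name for each worker that would be created with `new Worker(queueName)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Hangfire: it follows the same nested-loop idea
// of turning a "queue -> worker count" map into one entry per worker.
public static class WorkerPlan
{
    // Returns one entry per worker, holding the queue that worker would listen to.
    public static List<string> Expand(IDictionary<string, int> queues)
    {
        var plan = new List<string>();
        foreach (var queue in queues)
        {
            // Counting from 0 adds exactly queue.Value entries
            // (a loop from 1 with "<" would add one fewer).
            for (var i = 0; i < queue.Value; i++)
            {
                plan.Add(queue.Key);
            }
        }
        return plan;
    }

    public static void Main()
    {
        var queues = new Dictionary<string, int>
        {
            { "default", 10 },
            { "critical", 2 },
            { "normal", 5 }
        };

        var plan = Expand(queues);

        // Print the number of workers planned for each queue, then the total.
        foreach (var group in plan.GroupBy(name => name))
        {
            Console.WriteLine("{0}: {1}", group.Key, group.Count());
        }
        Console.WriteLine("total: {0}", plan.Count);
    }
}
```

The total should match the value passed as "WorkerCount", which is the sum of the per-queue counts.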

hermandejager · November 23, 2015, 2:50pm

Hi,

We have not had this problem since we started using your code for multiple queues with different worker counts.
Thanks so much for your help so far Sergey!
