Jobs stuck in enqueued state

oeaoaueaa · September 17, 2014, 1:48pm

Hi,
We are using Hangfire with SQL Server/MSMQ, and sometimes the Hangfire dashboard shows X jobs as enqueued (Queues X/3), but the “hangfire/queues” page says all the queues are empty.
Then in the database:
select * from HangFire.JobQueue -- returns nothing
select * from HangFire.Job where StateName = 'Enqueued' -- returns X records
And there are no messages in the MSMQ queues.

Might it be that the jobs have been run but the database has not been updated?

Thanks

nirinchev · January 12, 2015, 1:12am

Hi everyone, we hit a similar issue on one of our test servers. Our setup is as follows:

B2 Azure Website (2 cores, 3.5 GB RAM).
Web (retired) SQL Database.
Hangfire is using SQL Server storage.
10 workers.
~200 very lightweight scheduled jobs per day.

Today I noticed that we had 280 enqueued jobs that were not being processed. I didn’t have access to a PC, so I couldn’t examine the database; I restarted the website, and Hangfire immediately started processing the enqueued jobs. We’re not too concerned at this point, because our database server is fairly underpowered and might have caused the problem, but I’ll share more information if it happens again.

I’m considering setting up a lightweight monitoring task that alerts us if the queue becomes unusually long. Has anyone done something similar? I’m thinking about exposing a GET endpoint that queries the monitoring API and returns the queue length, but if anyone has a more elegant solution, let me know.
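A minimal sketch of the check such a GET endpoint could run, assuming the enqueued count is supplied as a delegate so the logic stays self-contained. In a Hangfire app that delegate might read the monitoring API, e.g. `JobStorage.Current.GetMonitoringApi().GetStatistics().Enqueued`; the `QueueLengthCheck` class, its threshold and its result shape are hypothetical names for illustration, and the web framework wiring is left out.

```csharp
using System;

// Hypothetical health check: unhealthy when the number of enqueued jobs
// exceeds a configured threshold.
public sealed class QueueLengthCheck
{
    private readonly Func<long> _getEnqueuedCount;
    private readonly long _maxEnqueued;

    // Assumption: in a Hangfire app getEnqueuedCount could be
    // () => JobStorage.Current.GetMonitoringApi().GetStatistics().Enqueued
    public QueueLengthCheck(Func<long> getEnqueuedCount, long maxEnqueued)
    {
        _getEnqueuedCount = getEnqueuedCount ?? throw new ArgumentNullException(nameof(getEnqueuedCount));
        if (maxEnqueued < 0) throw new ArgumentOutOfRangeException(nameof(maxEnqueued));
        _maxEnqueued = maxEnqueued;
    }

    // Returns the current queue length and whether it is within the threshold.
    // An endpoint could map Healthy == false to an HTTP 503 response.
    public (bool Healthy, long Enqueued) Evaluate()
    {
        long enqueued = _getEnqueuedCount();
        return (enqueued <= _maxEnqueued, enqueued);
    }
}
```

A fixed threshold is the simplest choice; because a long queue can also just mean a burst of work, an alert might only fire after the threshold is exceeded on several consecutive checks.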

paulduran · January 12, 2015, 3:29am

This is a really timely thread because I was about to add a question for the same reason.

We’ve had several occasions where Hangfire stopped processing jobs too, and we had to restart the app to get it going again.

Our setup is similar to @nirinchev:

Hangfire with SQL Server storage.
4 web servers (on appharbor/AWS) with 5 workers per server.
several hundred jobs per day

It would be awesome to get to the bottom of why it stops working. If there’s any diagnostic information that we can provide, please let me know.

Regarding monitoring, my idea was to have a recurring job that is put on the queue and updates some value (e.g. ‘last executed’). Then you could have an endpoint that returns an error if that value hasn’t been updated within a certain amount of time (e.g. 2 minutes).
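The heartbeat idea above can be sketched as follows. The `Heartbeat` class is hypothetical; the recurring job would call `Beat()`, and the health endpoint would call `IsStale()`. Registering it with something like `RecurringJob.AddOrUpdate("heartbeat", () => Heartbeat.Beat(), Cron.Minutely())` is an assumption about how it might be wired up, shown only in a comment.

```csharp
using System;
using System.Threading;

// Hypothetical heartbeat: a recurring job records when it last ran, and a
// health endpoint reports an error if that timestamp is too old.
public static class Heartbeat
{
    // UTC ticks of the last beat; 0 means "never executed".
    // Interlocked keeps 64-bit reads and writes atomic across threads.
    private static long _lastBeatTicks;

    // Body of the recurring job (assumed registration, e.g.
    // RecurringJob.AddOrUpdate("heartbeat", () => Heartbeat.Beat(), Cron.Minutely());)
    public static void Beat() => Record(DateTime.UtcNow);

    public static void Record(DateTime utcNow) =>
        Interlocked.Exchange(ref _lastBeatTicks, utcNow.Ticks);

    // True when the job has never run, or last ran longer ago than maxAge.
    public static bool IsStale(DateTime utcNow, TimeSpan maxAge)
    {
        long ticks = Interlocked.Read(ref _lastBeatTicks);
        if (ticks == 0) return true;
        return utcNow - new DateTime(ticks, DateTimeKind.Utc) > maxAge;
    }
}
```

With several web servers (4 in the setup above), the recurring job runs on whichever server picks it up, so an in-memory field would only be updated in that one process. For the endpoint to be meaningful on every server, the timestamp would need to live in shared storage such as a database row; the static field here only keeps the sketch self-contained.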

chris · January 13, 2015, 3:40pm

I’ve got the same issue: http://discuss.hangfire.io/t/hangfire-stops-processing-jobs-in-queue-after-database-errors-on-azure-sql

All five workers get stopped by Hangfire when the database is unavailable or failing for a short time.

odinserj · January 28, 2015, 10:29pm

@oeaoaueaa, the recent Hangfire 1.3.3 release fixes problems with “stuck” enqueued jobs with MSMQ.

odinserj · January 28, 2015, 10:29pm

@chris, @paulduran your problem was fixed in Hangfire 1.3.2.

Michael_Greenspan · April 9, 2015, 1:40pm

I am using SQL Server (not MSMQ) but this same behaviour is happening to me.

I have a long-running background task that gets fired off on startup; for illustration purposes:


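using System;
using System.Threading;
using Hangfire;
using Owin;

public class OwinStartup
{
    public void Configuration(IAppBuilder app)
    {
        // Hangfire 1.3-style configuration: SQL Server storage plus an in-process server.
        app.UseHangfire(config =>
        {
            config.UseSqlServerStorage("Hangfire");
            config.UseServer();
        });

        // Enqueue the long-running job on every startup;
        // Hangfire substitutes a real token for JobCancellationToken.Null at execution time.
        BackgroundJob.Enqueue(() => Foo(JobCancellationToken.Null));
    }

    public static void Foo(IJobCancellationToken cancellationToken)
    {
        Console.WriteLine("Starting FOO");
        while (true)
        {
            // Check on every iteration so a graceful shutdown can stop the job.
            cancellationToken.ThrowIfCancellationRequested();
            Thread.Sleep(100);
        }
    }
}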

When I first launch my site from Visual Studio in IIS Express (F5), Foo gets enqueued and shows up in the Processing list. However, if I then stop my debug session in VS and restart the site with F5, I see the original Foo in the Processing list and a second Foo perpetually stuck in the queue.
If I then stop the site using the IIS Express controller in my taskbar (as opposed to just cancelling the debug session in VS) and restart it, the subsequent Foo begins processing (but the second Foo remains in the queue).
I struggled with Hangfire for a full work day yesterday, and I’m almost ready to throw in the towel and use a different solution! Please help!

odinserj · April 9, 2015, 2:58pm

This happens because Hangfire Server was killed; I call this an ungraceful shutdown. In Hangfire, job queues based on a SQL Server table work as follows. When a worker picks up a job, it marks the job as invisible to other workers (to prevent parallel processing of the same job). However, to handle ungraceful shutdowns and still be able to process aborted jobs, we use an invisibility timeout that defaults to 30 minutes. So after this timeout, your job will be re-queued.
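To make the mechanism concrete, here is a simplified in-memory model of a queue with an invisibility timeout. It is not Hangfire’s actual SQL Server implementation, and the `InvisibilityQueue` type and its members are hypothetical. A fetched job stays in the queue, stamped with the time it was fetched, and is hidden from other fetches until it is acknowledged (removed) or until the timeout elapses.

```csharp
using System;
using System.Collections.Generic;

// Simplified model (not Hangfire's code) of a table-based queue
// with an invisibility timeout.
public sealed class InvisibilityQueue
{
    private sealed class Entry
    {
        public string JobId = "";
        public DateTime? FetchedAt; // null = visible to all workers
    }

    private readonly List<Entry> _entries = new List<Entry>();
    private readonly TimeSpan _invisibilityTimeout;

    public InvisibilityQueue(TimeSpan invisibilityTimeout) =>
        _invisibilityTimeout = invisibilityTimeout;

    public void Enqueue(string jobId) => _entries.Add(new Entry { JobId = jobId });

    // Returns the first job that was never fetched, or whose fetch is older than
    // the timeout (its worker presumably died), and stamps it as fetched now.
    // Returns null when every entry is currently invisible or the queue is empty.
    public string TryFetch(DateTime now)
    {
        foreach (var entry in _entries)
        {
            if (entry.FetchedAt == null || now - entry.FetchedAt.Value > _invisibilityTimeout)
            {
                entry.FetchedAt = now;
                return entry.JobId;
            }
        }
        return null;
    }

    // Called when a job completes. A killed worker never gets here,
    // so its entry stays invisible until the timeout expires.
    public void Acknowledge(string jobId) => _entries.RemoveAll(e => e.JobId == jobId);
}
```

With a 30-minute timeout, a job fetched by a worker that was then killed is invisible to every other worker for half an hour, which is why it looks stuck until it is re-queued.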

This is a very confusing scenario, and I’ve added some hints to the job details page in Hangfire 1.4.0.


odinserj · April 9, 2015, 2:59pm

You can also use MSMQ; it does not have an invisibility timeout, and jobs are re-queued instantly – http://docs.hangfire.io/en/latest/configuration/using-sql-server-with-msmq.html.
