Jobs stuck in enqueued state

Hi,
We are using Hangfire with SQL Server/MSMQ. Sometimes the Hangfire dashboard shows some jobs (X) as Enqueued (Queues X/3), but “hangfire/queues” says all the queues are empty.
Then, in the database:
select * from HangFire.JobQueue -> nothing
select * from HangFire.job where statename='Enqueued' -> X records
And there are no messages in the MSMQ queues.

Might it be that the jobs have been run but the database has not been updated?

Thanks

Hi everyone, we hit a similar issue on one of our test servers. Our setup is as follows:

  • B2 Azure Website (2 cores, 3.5 GB RAM).
  • Web (retired) SQL Database.
  • Hangfire is using SQL Server storage.
  • 10 workers.
  • ~200 very lightweight scheduled jobs per day.

Today I noticed that we had 280 enqueued jobs that were not being processed. I didn’t have access to a PC, so I couldn’t examine the database; I restarted the website, and Hangfire immediately started processing the enqueued jobs. We’re not too concerned at this point: our database server is fairly underpowered, which might have caused the problem. I’ll share more information if it occurs again.

I’m considering setting up some lightweight monitoring task that will alert us if the queue becomes unusually long. Has anyone done something similar? I’m thinking about exposing a GET method that will query the monitoring API and return the queue length, but if anyone has a more elegant solution, let me know :smile:
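One way to sketch the queue-length check described above, kept free of any Hangfire reference so it can be tested on its own. `QueueLengthCheck` and the threshold value are hypothetical names chosen for illustration; the comment shows how the count could plausibly be wired to Hangfire’s monitoring API, assuming a queue named "default".

```csharp
using System;

// Hypothetical helper: reports whether a queue is longer than expected.
// The count source is injected; with Hangfire it could be wired as:
//   () => JobStorage.Current.GetMonitoringApi().EnqueuedCount("default")
public sealed class QueueLengthCheck
{
    private readonly Func<long> _getEnqueuedCount;
    private readonly long _threshold;

    public QueueLengthCheck(Func<long> getEnqueuedCount, long threshold)
    {
        if (getEnqueuedCount == null) throw new ArgumentNullException("getEnqueuedCount");
        _getEnqueuedCount = getEnqueuedCount;
        _threshold = threshold;
    }

    // Current number of enqueued jobs, as reported by the injected source.
    public long CurrentLength()
    {
        return _getEnqueuedCount();
    }

    // True when the queue holds more jobs than the configured threshold.
    public bool IsBacklogged()
    {
        return _getEnqueuedCount() > _threshold;
    }
}
```

A GET endpoint could return `CurrentLength()` in the body and an error status when `IsBacklogged()` is true, so that any external uptime monitor can alert on it.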

This is a really timely thread because I was about to add a question for the same reason.

We’ve had several occasions where Hangfire stopped processing jobs too, and we had to restart the app to get it going again.

Our setup is similar to @nirinchev:

  • Hangfire with SQL Server storage.
  • 4 web servers (on AppHarbor/AWS) with 5 workers per server.
  • Several hundred jobs per day.

It would be awesome to get to the bottom of why it stops working. If there’s any diagnostic information that we can provide, please let me know.

Regarding monitoring, my idea was to have a recurring job that updates some piece of data (e.g. ‘last executed’) each time it runs. You could then have an endpoint that returns an error if that value hasn’t been updated within a certain amount of time (e.g. 2 minutes).
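A minimal sketch of that heartbeat idea. The `Heartbeat` class and its members are hypothetical, and the timestamp is held in memory for brevity; with several web servers it would need to live in shared storage (for example the database), since the job may run on a different server than the one answering the health check. The comment shows how the job could be registered with Hangfire’s recurring job API.

```csharp
using System;

// Hypothetical heartbeat helper; none of these names come from Hangfire.
public static class Heartbeat
{
    private static readonly object Gate = new object();
    private static DateTime _lastExecutedUtc = DateTime.MinValue;

    // Intended to be run by a recurring job, e.g. registered with:
    //   RecurringJob.AddOrUpdate("heartbeat", () => Heartbeat.Touch(), Cron.Minutely());
    public static void Touch()
    {
        Record(DateTime.UtcNow);
    }

    // Stores an explicit timestamp; separated from Touch() so it can be tested.
    public static void Record(DateTime nowUtc)
    {
        lock (Gate) { _lastExecutedUtc = nowUtc; }
    }

    // True when the last heartbeat is older than maxAge (or never happened).
    public static bool IsStale(DateTime nowUtc, TimeSpan maxAge)
    {
        lock (Gate) { return nowUtc - _lastExecutedUtc > maxAge; }
    }
}
```

The health endpoint would then return an error whenever `Heartbeat.IsStale(DateTime.UtcNow, TimeSpan.FromMinutes(2))` is true. Unlike a queue-length check, this also catches the case where the queue looks empty but no worker is actually running jobs.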

I’ve got the same issue: http://discuss.hangfire.io/t/hangfire-stops-processing-jobs-in-queue-after-database-errors-on-azure-sql

All five workers are stopped by Hangfire because the database is unavailable or failing for a short time.

@oeaoaueaa, the recent Hangfire 1.3.3 release fixes problems with “stuck” enqueued jobs with MSMQ.

@chris, @paulduran your problem was fixed in Hangfire 1.3.2.

I am using SQL Server (not MSMQ) but this same behaviour is happening to me.

I have a long-running background task that gets fired off on startup. For illustration purposes:

using System;
using System.Threading;
using Hangfire;
using Owin;

public class OwinStartup
{
    public void Configuration(IAppBuilder app)
    {
        app.UseHangfire(config =>
        {
            config.UseSqlServerStorage("Hangfire");
            config.UseServer();
        });

        // Enqueue the long-running job on every application start.
        BackgroundJob.Enqueue(() => Foo(JobCancellationToken.Null));
    }

    public static void Foo(IJobCancellationToken cancellationToken)
    {
        Console.WriteLine("Starting FOO");
        while (true)
        {
            // Check on each iteration so the job can stop on a graceful shutdown.
            cancellationToken.ThrowIfCancellationRequested();
            Thread.Sleep(100);
        }
    }
}

When I first launch my site from Visual Studio in IIS Express (F5), Foo gets enqueued and appears in the Processing list. However, if I then stop my debug session in VS and restart the site using F5, I see the original Foo in the Processing list and a second Foo perpetually stuck in the queue.

If I then stop the site using the IIS Express controller in my task bar (as opposed to just cancelling the debug session in VS), and restart, the subsequent Foo will begin processing (but the second Foo remains in the queue).

I struggled with Hangfire for a full work day yesterday, and am almost ready to throw in the towel and use a different solution! Please help!

:frowning: This happens because the Hangfire Server was killed; I call this an ungraceful shutdown. Job queues based on SQL Server tables work in the following way in Hangfire. When a worker picks up a job, it marks the job as invisible to other workers (to prevent parallel processing of the same job). However, to handle ungraceful shutdowns and to be able to process aborted jobs, we use an invisibility timeout that defaults to 30 minutes. So, after this timeout, your job will be re-queued.

This is a very confusing scenario, so I’ve added some hints to the job details page in Hangfire 1.4.0 (screenshots not shown).

You can also use MSMQ: it does not have an invisibility timeout, and jobs are re-queued instantly. See http://docs.hangfire.io/en/latest/configuration/using-sql-server-with-msmq.html.
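If you stay on SQL Server storage, the invisibility timeout described above can be changed through the storage options in the Hangfire versions discussed in this thread. A minimal sketch, assuming a connection string named "Hangfire" as in the earlier example; the `HangfireStorageSetup` class and the 5-minute value are hypothetical. The timeout should stay longer than your longest-running job, otherwise another worker may pick up a job that is still running.

```csharp
using System;
using Hangfire;
using Hangfire.SqlServer;

// Hypothetical setup class, for illustration only.
public static class HangfireStorageSetup
{
    public static void Configure()
    {
        var options = new SqlServerStorageOptions
        {
            // Assumed value: shorter than the 30-minute default so that jobs
            // aborted by an ungraceful shutdown are re-queued sooner.
            InvisibilityTimeout = TimeSpan.FromMinutes(5)
        };

        // "Hangfire" is the connection string name used earlier in the thread.
        JobStorage.Current = new SqlServerStorage("Hangfire", options);
    }
}
```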