Transaction Scopes and MSMQ


I am currently using Hangfire with MSMQ. I have come across a strange issue: if I enqueue a job into Hangfire from within a TransactionScope, the job is added to the [HangFire].[Job] table but is never executed. I have also noticed that this only occurs when I enqueue the job from within the execution of another job.

I am using transaction scopes because I am performing several SQL operations and if any of them fail, I want them to all be rolled back, including the insertion of the job into the hangfire DB.

To summarize:

  1. Queue a job to run (I used recurring but I do not think it matters)
  2. Within that job's execution code, enqueue a new job wrapped inside a TransactionScope.

I have included some sample code below.

RecurringJob.AddOrUpdate<Test>(x => x.Test1(), Cron.Minutely);

public class Test
{
    public void Test1()
    {
        Console.WriteLine("In test1");

        using (TransactionScope scope = new TransactionScope())
        {
            BackgroundJob.Enqueue(() => Test2());
        }
    }

    public void Test2()
    {
        Console.WriteLine("In test2");
    }
}

What Hangfire version are you using? I’ve tried to reproduce this with Hangfire 1.4.6, and got the following results.

When there is no call to scope.Complete() (as in the code snippet above), the Test2 background job is, as expected, not created in storage. The console output shows only “In test1” lines.

When I place scope.Complete() just before the closing brace of the using statement, the Test2 background job is created and executed successfully.
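For reference, the working variant described above would look roughly like the following. This is a reconstruction based on the original sample with the single scope.Complete() call added, not the exact code that was run, and it assumes Hangfire storage has already been configured elsewhere.

```csharp
using System;
using System.Transactions;
using Hangfire;

public class Test
{
    public void Test1()
    {
        Console.WriteLine("In test1");

        using (TransactionScope scope = new TransactionScope())
        {
            // The job row is inserted as part of the ambient transaction.
            BackgroundJob.Enqueue(() => Test2());

            // Mark the transaction as successful. Without this call, disposing
            // the scope rolls the transaction back and the job row is discarded.
            scope.Complete();
        }
    }

    public void Test2()
    {
        Console.WriteLine("In test2");
    }
}
```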

Here is my configuration:

GlobalConfiguration.Configuration
    .UseColouredConsoleLogProvider()
    .UseSqlServerStorage(@"server=.\sqlexpress;database=Hangfire.Test2;integrated security=sspi;")
    .UseMsmqQueues(@".\Private$\hangfire{0}", "default");

Please post your configuration here.

Sorry about that. I actually already had scope.Complete() in my project but accidentally left it out when I created the small sample for you. I am using V1.4.6.0.

This problem appears to be much deeper than I thought. When I change the worker count to 1, it works every time. When I leave it at its default (4 cores, 20 workers), I start to see the issue. It does not happen every time and appears to be some kind of timing issue. I have to keep manually triggering the top-level job (via the dashboard), and every 5 or 6 tries the sub-job won’t fire and stays stuck. The top-level job always fires.

I can prevent the problem from occurring by doing one of the following. Unfortunately, none of these are options for me.

  1. Disable MSMQ
  2. Don’t use a transaction.
  3. Use a single thread.

My configuration is GlobalConfiguration.Configuration.UseSqlServerStorage("string").UseMsmqQueues(@".\Private$\hangfire-{0}", "default");

I appreciate your help. I will continue to debug this and see what I can find. If you have any ideas, please share.

Thanks

I pulled down the latest Hangfire source so I could dig into this a little more, and I believe I have located the issue.

When Execute is called for the second job, it eventually makes a call to GetJobData in SqlServerConnection.cs (line 155). In some cases, GetJobData returns null because the job does not yet exist in the DB. It appears that Hangfire sometimes executes the second job before the transaction that inserted it has committed. My guess is that something like the following is occurring:

- Job A executes.
- Job A starts a transaction and enqueues Job B from inside that transaction.
- Job B is picked up immediately via MSMQ, but the transaction from Job A has not yet committed. Because the job row is not visible yet, GetJobData returns null and Job B fails to run.

Let me know your thoughts on this.

Thanks,
Otis

@ogoodwin, many thanks for the investigation! With your help, I’ve just made a fix and released Hangfire 1.4.7. It now waits for a non-null JobData instance (just as it previously waited for a non-null state), with a 1-minute timeout.
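To illustrate the general idea of waiting for a non-null result with a timeout, here is a minimal sketch. It is not Hangfire's actual implementation: JobDataWaiter, WaitForNonNull, the fetch delegate, and the poll interval are all hypothetical names and choices made for this example.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper for illustration only; not part of Hangfire.
public static class JobDataWaiter
{
    // Calls fetch repeatedly until it returns a non-null value or the timeout
    // elapses. Returns null if nothing showed up in time.
    public static T WaitForNonNull<T>(Func<T> fetch, TimeSpan timeout, TimeSpan pollInterval)
        where T : class
    {
        var stopwatch = Stopwatch.StartNew();

        while (true)
        {
            T result = fetch();
            if (result != null)
            {
                return result;
            }

            if (stopwatch.Elapsed >= timeout)
            {
                // The data never became visible, e.g. the inserting
                // transaction rolled back or the identifier is wrong.
                return null;
            }

            Thread.Sleep(pollInterval);
        }
    }
}
```

Note the cost of this approach: an identifier that never resolves keeps the calling worker busy for the whole timeout before it can give up.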

That appears to have resolved the issue. Thanks so much for your quick response to this problem.

Thanks again,
Otis

Resurrecting the old topic. I don’t like the fix introduced in Hangfire 1.4.7 that solves a problem with enqueued jobs that aren’t executed when using TransactionScope and MSMQ. I was able to reproduce this problem recently by enabling snapshot isolation.

Does your database use the READ_COMMITTED_SNAPSHOT or ALLOW_SNAPSHOT_ISOLATION option?
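If it helps, both options can be read from the sys.databases catalog view. The following is a small sketch, assuming the connection string for the Hangfire database is passed as the first command-line argument; SnapshotCheck is a hypothetical name.

```csharp
using System;
using System.Data.SqlClient;

// Hypothetical console utility for illustration only.
public static class SnapshotCheck
{
    public static void Main(string[] args)
    {
        // Assumption: args[0] holds the connection string of the Hangfire database.
        string connectionString = args[0];

        const string sql =
            "SELECT is_read_committed_snapshot_on, snapshot_isolation_state_desc " +
            "FROM sys.databases WHERE name = DB_NAME();";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();

            using (var reader = command.ExecuteReader())
            {
                if (reader.Read())
                {
                    // bit column: true when READ_COMMITTED_SNAPSHOT is ON
                    Console.WriteLine("READ_COMMITTED_SNAPSHOT: {0}", reader.GetBoolean(0));
                    // text column: ON, OFF, or a transitional state
                    Console.WriteLine("ALLOW_SNAPSHOT_ISOLATION: {0}", reader.GetString(1));
                }
            }
        }
    }
}
```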

I’ve already added a note to the documentation that snapshot isolation isn’t supported by Hangfire, and I want to roll back the fix. The problem is that, with the fix in place, we can’t efficiently handle non-existent background jobs, and a number of wrong identifiers in a queue may cause background job processing to get stuck.