Jobs Hang - 10 state change attempt(s) - New transaction is not allowed

Tags: #<Tag:0x00007fb50e2fb0f0> #<Tag:0x00007fb50e2fafd8> #<Tag:0x00007fb50e2faee8>

We are seeing jobs spontaneously “hang”, which causes all sorts of issues because we use DisableConcurrentExecution quite a bit.

Any idea if this is a configuration issue on our end - or a race condition within hangfire?

Message - New transaction is not allowed because there are other threads running in the session.
Exception type - System.Data.SqlClient.SqlException

10 state change attempt(s) failed due to an exception, moving job to the FailedState

System.Data.SqlClient.SqlException:
at System.Data.SqlClient.SqlConnection.OnError (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Data.SqlClient.SqlInternalConnection.OnError (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Data.SqlClient.TdsParser.TryRun (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at System.Data.SqlClient.SqlCommand.ExecuteNonQuery (System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a)
at Hangfire.SqlServer.SqlCommandBatch.ExecuteNonQuery (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerWriteOnlyTransaction.b__7_0 (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerStorage+<>c__DisplayClass26_0.b__0 (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerStorage+<>c__DisplayClass27_0`1.b__0 (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerStorage.UseConnection (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerStorage.UseTransaction (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerStorage.UseTransaction (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.SqlServer.SqlServerWriteOnlyTransaction.Commit (Hangfire.SqlServer, Version=1.6.20.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.States.BackgroundJobStateChanger.ChangeState (Hangfire.Core, Version=1.7.12.0, Culture=neutral, PublicKeyToken=null)
at Hangfire.Server.Worker.TryChangeState (Hangfire.Core, Version=1.7.12.0, Culture=neutral, PublicKeyToken=null)

.NET Core 3.1

SQL Server Versions (we use both, see error on both)

Microsoft SQL Azure (RTM) - 12.0.2000.8 Oct 1 2020 18:48:35 Copyright © 2019 Microsoft Corporation
Microsoft SQL Server 2017 (RTM-CU14-GDR) (KB4494352) - 14.0.3103.1 (X64) Mar 22 2019 22:33:11 Copyright © 2017 Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2016 Datacenter 10.0 (Build 14393: ) (Hypervisor)

I think I have all useful Startup.cs info here:

            var options = new SqlServerStorageOptions {
                SlidingInvisibilityTimeout = TimeSpan.FromSeconds(ApplicationSettings.HangfireSlidingInvisibilityTimeoutSeconds),
                QueuePollInterval = TimeSpan.FromSeconds(ApplicationSettings.HangfireQueuePollIntervalSeconds)
            };

            GlobalConfiguration.Configuration.UseSqlServerStorage(hangfireConnectionString, options);

        app.UseHangfireDashboard("/hangfire", new DashboardOptions {
            Authorization = new[] {
                new HangfireCustomBasicAuthenticationFilter {
                    User = _configuration.GetSection("HangfireSettings:UserName").Value,
                    Pass = _configuration.GetSection("HangfireSettings:Password").Value
                }
            }
        });
        app.UseHangfireServer(new BackgroundJobServerOptions {
            WorkerCount = 20
        });

I should also mention! We had to start using SlidingInvisibilityTimeout and QueuePollInterval options, because these hung jobs would leave a spid open and blow up our transaction log until we restarted hangfire.

I’ve been having a similar issue, i have 900+ recurring jobs that rely on DisableConcurrentExecution too, we are experiencing heartbeat and statechange problems, except the error i got is

“A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)”.

I rely heavly on hangfire to run batch jobs for my application and it’s really taking a toll on my processing time.

Any help would be great!

toutboxrj - I think your error is just a generic networking error between your hangfire instance and SQL server - are you seeing that error occur within your jobs as well? Are you using the same SQL connection string for hangfire as well as your jbos?

When talking about .NET Core, sometimes problems like this are caused by bugs in the System.Data.SqlClient package, and an explicit reference to the latest version in the application project resolves all the problems – there were a lot of bugs during early stages of .NET Core (and looks like we can consider even 3.0 as an early stage).

Thanks Sergey!

bugs in System.Data.SqlClient explain my New transaction is not allowed because there are other threads running in the session. error?

Or @toutboxrj’s error of A network-related or instance-specific error occurred while establishing a connection to SQL Server.

or both?

Will update and advise