Batching hundreds of jobs causes connection pool timeout

geirsagberg · August 24, 2016, 4:52pm

I have a case where I want to enqueue hundreds of jobs in a batch, and the enqueued jobs contain database operations on the same database server as the one Hangfire runs on.

The problem is that after starting the batch, the Hangfire dashboard becomes unresponsive, and I start getting lots of errors like this:

Error occurred during execution of 'Worker #d726f5b8' process. Execution will be retried (attempt 1 of 2147483647) in 00:00:00 seconds.

Stack trace:

System.InvalidOperationException: Timeout expired.  The timeout period elapsed prior to obtaining a connection from the pool.  This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at Hangfire.SqlServer.SqlServerStorage.CreateAndOpenConnection()
   at Hangfire.SqlServer.SqlServerStorage.UseConnection[T](Func`2 func)
   at Hangfire.SqlServer.SqlServerStorage.UseTransaction[T](Func`3 func, Nullable`1 isolationLevel)
   at Hangfire.SqlServer.SqlServerStorage.UseTransaction(Action`2 action)
   at Hangfire.SqlServer.SqlServerWriteOnlyTransaction.Commit()
   at Hangfire.Batches.BatchJobSupportAttribute.OnStateApplied(ApplyStateContext context, IWriteOnlyTransaction outerTransaction)
   at Hangfire.States.StateMachine.ApplyState(ApplyStateContext initialContext)
   at Hangfire.States.BackgroundJobStateChanger.ChangeState(StateChangeContext context, BackgroundJob backgroundJob, IState toState, String oldStateName)
   at Hangfire.States.BackgroundJobStateChanger.ChangeState(StateChangeContext context)
   at Hangfire.Server.Worker.Execute(BackgroundProcessContext context)
   at Hangfire.Server.ServerProcessExtensions.Execute(IServerProcess process, BackgroundProcessContext context)
   at Hangfire.Server.AutomaticRetryProcess.Execute(BackgroundProcessContext context)

It would seem the batching of the jobs causes some kind of connection pool shortage.

These errors do not appear if I queue the jobs up normally.

Using Hangfire 1.6.3 with Hangfire Pro 1.4.7.

Any idea how to fix this?

odinserj · August 30, 2016, 5:28pm

There was a problem related to connection leaks, when nested distributed locks acquired for the same resource. The problem is related to Hangfire.SqlServer, and introduced in version 1.5.0, the latest Pro version helped to reveal it. Both batches and continuations are affected.

I’ve just released Hangfire 1.6.4 with the fix, thanks for reporting this!

mahdi_negahi · November 23, 2016, 7:33pm

Hi,

It looks the problem is exists in version 1.6.4…

              foreach ( i=0-316)
                {                   
                    foreach (x=0-2) 
                    {
                        parentId = BackgroundJob.Enqueue<Task1Wapper>(task => task.Execute(i,x));
                    }                 
                }

and the code task wrapper is:

public class Task1Wrapper
{

    public void Execute(int i, int x)
    {
        using (MyDbContext context = new MyDbContext())
        {
              // do sth     
        }
    }
}

The following photo shows my Performance counter, the blue line refers to Number of Connection Pooled. At the end of diagram, the system throws a timeout exception.

odinserj · November 24, 2016, 11:26am

The NumberOfPooledConnections performance counter does not explain the timeout exception. This is how the connection pool works: it creates a lot of connections in advance, to decrease the delays caused by opening a connection. The correct performance counter that may show connections aren’t closed is NumberOfActiveConnections, but it requires the following code in web.config or app.config:

<system.diagnostics>
  <switches>
    <add name="ConnectionPoolPerformanceCounterDetail"
         value="4"/>
  </switches>
</system.diagnostics>

Can you post here the result of NumberOfActiveConnections counter, as well as full information about exception, including message and stack trace?

Topic		Replies	Views
Error in sql connection pool question	0	1967	January 29, 2018
Timeout/Deadlock occurring in infrastructure question msdtc , recurring , sql-server , queues	6	7545	November 11, 2022
DistributedLock TimeoutException with batches, but batch operations get executed question hangfire-pro , sql-server	9	8057	March 20, 2017
Error occurred during execution of 'Server Heartbeat'-Load Balancer question sql-server	3	2878	September 8, 2016
Timeout on connectin pool bug? sql-server , queues	0	2602	February 5, 2018

Batching hundreds of jobs causes connection pool timeout

Related topics