I set my logging level to TRACE and I got the following when the issue happend (since it happens every few days):
09:18:24.382 Hangfire.Server.Worker |Fatal| Fatal error occurred during execution of 'Worker #2' component. It will be stopped. See the exception for details. System.Data.SqlClient.SqlException (0x80131904): The service has encountered an error processing your request.
09:18:28.102 Hangfire.Server.Worker |Fatal| Fatal error occurred during execution of 'Worker #1' component. It will be stopped. See the exception for details. System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server.
09:18:30.174 Hangfire.Server.Worker |Fatal| Fatal error occurred during execution of 'Worker #4' component. It will be stopped. See the exception for details. System.Data.SqlClient.SqlException (0x80131904): The service has encountered an error processing your request. Please try again. Error code 40143.
09:18:30.368 Hangfire.Server.Worker |Fatal| Fatal error occurred during execution of 'Worker #5' component. It will be stopped. See the exception for details. System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
09:18:30.651 Hangfire.Server.Worker |Fatal| Fatal error occurred during execution of 'Worker #3' component. It will be stopped. See the exception for details. System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
Looks like Hangfire is stopping the worker when a database error occurs.
Unfortunately on Azure SQL errors can occurs and it is considered normal.
The application should implementing some sort of retry logic, Transient Fault Handling (Building Real-World Cloud Apps with Azure)
Is hangfire using such a logic ?
Also would I be better of using the Azure service bus to schedule the jobs ?
or if hangfire hasn’t some retry logic for the database I will have the same issue anyway ?!
Is there some way to restart the workers when they are stopped because of a database error ?
One full error:
09:18:28.102 Hangfire.Server.Worker |Fatal| Fatal error occurred during execution of 'Worker #1' component. It will be stopped. See the exception for details. System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception (0x80004005): An existing connection was forcibly closed by the remote host
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
at Dapper.SqlMapper.<QueryImpl>d__11`1.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at Dapper.SqlMapper.Query[T](IDbConnection cnn, String sql, Object param, IDbTransaction transaction, Boolean buffered, Nullable`1 commandTimeout, Nullable`1 commandType)
at Hangfire.SqlServer.SqlServerJobQueue.Dequeue(String[] queues, CancellationToken cancellationToken)
at Hangfire.SqlServer.SqlServerConnection.FetchNextJob(String[] queues, CancellationToken cancellationToken)
at Hangfire.Server.Worker.Execute(CancellationToken cancellationToken)
at Hangfire.Server.ServerSupervisor.ExecuteComponent()
at Hangfire.Server.ServerSupervisor.RunComponent()
ClientConnectionId:b9fe70b0-43b8-4e0e-9cd3-b6a1f7b7609d
Error Number:10054,State:0,Class:20, System.Data.SqlClient.SqlException (0x80131904): A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) ---> System.ComponentModel.Win32Exception (0x80004005): An existing connection was forcibly closed by the remote host
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(CommandBehavior behavior)
at Dapper.SqlMapper.<QueryImpl>d__11`1.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at Dapper.SqlMapper.Query[T](IDbConnection cnn, String sql, Object param, IDbTransaction transaction, Boolean buffered, Nullable`1 commandTimeout, Nullable`1 commandType)
at Hangfire.SqlServer.SqlServerJobQueue.Dequeue(String[] queues, CancellationToken cancellationToken)
at Hangfire.SqlServer.SqlServerConnection.FetchNextJob(String[] queues, CancellationToken cancellationToken)
at Hangfire.Server.Worker.Execute(CancellationToken cancellationToken)
at Hangfire.Server.ServerSupervisor.ExecuteComponent()
at Hangfire.Server.ServerSupervisor.RunComponent()
ClientConnectionId:b9fe70b0-43b8-4e0e-9cd3-b6a1f7b7609d
Error Number:10054,State:0,Class:20