AutomaticRetry - Am I missing somthing!

Hi,

I have set [AutomaticRetry(Attempts = 0)] to my method which is having Hangfire.Enqueue call. but somehow it tries to attempt 10 times if my job has failed.I don’t want the method to be processed again if it fails.
It gives me error message like for each attempt : Error occurred during execution of ‘Worker #11’ component. Execution will be retried (attempt 5 of 10)

Am I missing something??? Please suggest.

This message says that there are problems outside of your job, there is also an internal retrying mechanisms to repair Hangfire threads automatically. Please tell me the exception itself and its stacktrace. To see them, configure your logging library to log exceptions also.

Hi ,

For some background, I am using Hangfire to upload data of excel file(small-large) to database table(Insert\update) as background process in Asp.Net MVC web application.

Below is the more error detail from elmah :

Error : Error occurred during execution of ‘Worker #9’ component. Execution will be retried (attempt 1 of 10) in 00:00:00 seconds.

Detail :

  1. System.Threading.ThreadAbortException: Thread was being aborted.
    at Hangfire.Server.Worker.ProcessJob(String jobId, IStorageConnection connection, IJobPerformanceProcess process, CancellationToken shutdownToken)
    at Hangfire.Server.Worker.Execute(CancellationToken cancellationToken)
    at Hangfire.Server.AutomaticRetryServerComponentWrapper.ExecuteWithAutomaticRetry(CancellationToken cancellationToken)

Sometimes it gives like below in elmah :

  1. System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
    at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource1 retry, DbConnectionOptions userOptions)
    at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource1 retry) at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource1 retry)
    at System.Data.SqlClient.SqlConnection.Open()
    at Hangfire.SqlServer.SqlServerStorage.CreateAndOpenConnection()
    at Hangfire.SqlServer.SqlServerStorage.GetConnection()
    at Hangfire.Server.Worker.Execute(CancellationToken cancellationToken)
    at Hangfire.Server.AutomaticRetryServerComponentWrapper.ExecuteWithAutomaticRetry(CancellationToken cancellationToken)

  2. Error occurred during execution of ‘Recurring Job Scheduler’ component.
    Execution will be retried (attempt 1 of 10) in 00:00:01 seconds.
    System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. —> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
    at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
    at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite) at System.Data.SqlClient.SqlCommand.ExecuteNonQuery() at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action2 paramReader)
    at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command)
    at Hangfire.SqlServer.SqlServerDistributedLock…ctor(String resource, TimeSpan timeout, IDbConnection connection)
    at Hangfire.SqlServer.SqlServerConnection.AcquireDistributedLock(String resource, TimeSpan timeout)
    at Hangfire.Server.RecurringJobScheduler.Execute(CancellationToken cancellationToken)
    at Hangfire.Server.AutomaticRetryServerComponentWrapper.ExecuteWithAutomaticRetry(CancellationToken cancellationToken)
    ClientConnectionId:835c6fab-be41-4047-8ac9-b794e675825a


Please find attachments for example exception detail in State table in Hangfire database.

As you can see there are 2 kind of errors in state table.( 2 images) First, Timeout. Second,File Directory not found.
I have checked excel file is there in the directory.

In jobs table, In duration of all this,existing (but not succeeded) job remains in “Scheduled” state and if I upload new excel file then new jobs are going in Scheduled state instead of processing.

And surprisingly, some times the job with this file not found excption gets succeeded after some duration !!!

I am not sure how hangfire is processing job. Please have a look and suggest.


State table exception in detail :

{“FailedAt”:“2015-01-28T06:26:11.0467558Z”,

“ExceptionType”:“System.Data.SqlClient.SqlException”,

“ExceptionMessage”:“Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.”,

“ExceptionDetails”:"System.Data.SqlClient.SqlException (0x80131904): Timeout expired.
The timeout period elapsed prior to completion of the operation or the server is not responding. —> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out\r\n
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)\r\n

at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)\r\n

at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)\r\n
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)\r\n

at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)\r\n
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource1 completion, Int32 timeout, Task& task, Boolean asyncWrite)\r\n at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)\r\n at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()\r\n
ClientConnectionId:bae2a6d1-d4a5-4606-9a42-ab3e78f480e1"}

{“FailedAt”:“2015-01-28T23:06:25.0085674Z”,"

ExceptionType":“System.IO.FileNotFoundException”,

“ExceptionMessage”:“Could not find file ‘C:\Elastic Grid\CampaignLauncher\temp\446f055b-1f5d-4656-925c-b947e16559aa25000_contacts_XLSX_Step 7.xlsx’.”,

“ExceptionDetails”:"System.IO.FileNotFoundException: Could not find file ‘C:\Elastic Grid\CampaignLauncher\temp\446f055b-1f5d-4656-925c-b947e16559aa25000_contacts_XLSX_Step 7.xlsx’.\r\nFile name: ‘C:\Elastic Grid\CampaignLauncher\temp\446f055b-1f5d-4656-925c-b947e16559aa25000_contacts_XLSX_Step 7.xlsx’\r\n
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)\r\n at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)\r\n

at System.IO.FileStream…ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy)\r\n at System.IO.FileStream…ctor(String path, FileMode mode, FileAccess access, FileShare share)}


As per my understanding , When I upload the excel file, hangfire creates a job and starts processing. The processing takes more then 30 min and Hangfire creates one more job for the same excel file upload. So two jobs are accessing same file and one of them gets File not found error as other is processing it. This retry count of errored job gets increasing and connections gets increasing. This happens for each upload excel file task and all gets down. This is just what I am thinking/assuming, Please suggest your views/solution and do the needful asap.

@Vibhuti, thank you for the detailed information!

Have you tried to increase the InvisibilityTimeout option to a higher value, if you have huge files as written here – http://docs.hangfire.io/en/latest/configuration/using-sql-server.html#configuring-the-invisibility-timeout?

Worker thread is being aborted (and ThreadAbortException occurs) when Hangfire Server is unable to stop the background job on application shutdown using IJobCancellationToken (you can read about it – http://docs.hangfire.io/en/latest/background-methods/using-cancellation-tokens.html).

Hi,

Thank you for the reply.
I have implemented your suggestions in my code and now no file not found error is coming anymore.
In between, I noticed one thing, I uploaded 10000 records and job is created in jobs table with enqueued/ processing state.

In State table it went through Enqueued->processing->processing->processing->processing->processing->processing->succeeded states.
so for processing state it shows 6 rows in state table with different worker number. Is it right??

State table is log table for jobs? Why each processing state shows different worker number??
Is it processing multiple time each job on different worker?

Please provide some details to understand the processing mechanism followed by Hangfire.

Hi,

This is urgent !!! In my jobs table, There is a job is in processing state and after that goes to succeeded state.
But actually job has not succeeded and nothing is showing in my web page. It has done nothing !!!
I am not sure why it is in succeeded state !!!

In state table, It shows “System.OperationCanceledException” - The operation was canceled.
In jobs table, It shows succeeded

Please do the needful.

Hi,

Somehow I managed to solve issue.But I had to remove code of IJobCancellationToken to make functionality running properly.
Thanks for help.