Hangfire deadlocks on server restart


We have an Azure Web App running Hangfire that appears to deadlock when we restart.

[SqlException (0x80131904): Transaction (Process ID 1161) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Installing Hangfire SQL objects...
Database schema [HangFire] already exists
Table [HangFire].[Schema] already exists
Current HangFire schema version: 3]
   System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) +1787814
   System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) +5341674
   System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) +546
   System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) +1693
   System.Data.SqlClient.SqlCommand.RunExecuteNonQueryTds(String methodName, Boolean async, Int32 timeout, Boolean asyncWrite) +869
   System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite) +413
   System.Data.SqlClient.SqlCommand.ExecuteNonQuery() +163
   Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action`2 paramReader) +93
   Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command) +761
   Dapper.SqlMapper.Execute(IDbConnection cnn, String sql, Object param, IDbTransaction transaction, Nullable`1 commandTimeout, Nullable`1 commandType) +175
   Hangfire.SqlServer.SqlServerObjectsInstaller.Install(SqlConnection connection) +182
   Hangfire.SqlServer.SqlServerStorage..ctor(String nameOrConnectionString, SqlServerStorageOptions options) +203
   Thunder.Startup.<Configuration>b__2(IBootstrapperConfiguration config) +87
   Hangfire.OwinBootstrapper.UseHangfire(IAppBuilder app, Action`1 configurationAction) +82
   Thunder.Startup.Configuration(IAppBuilder app) +207

See the full stack trace here: https://jsfiddle.net/np0dc1zm/embedded/result/

We scaled down to just one Azure instance and restarted, and this is no longer an issue. Any idea what is going on?

Hm, interesting. Do you have any deadlock graphs logged by the DBCC TRACEON(1204, 3406, -1) command? My guess is that the deadlock occurs between the SqlServerObjectsInstaller.Install method and the regular Hangfire workload (not between Install/Install, because there the order of operations is the same).

We can’t simply ignore the deadlock, because there is an edge case: another instance (that uses the old schema) is shutting down while a new instance with the new schema starts up. So we should retry the transaction. I’ve added a pull request for this; it will be shipped in 1.4.0. Until then, you can temporarily disable automatic migrations:

var options = new SqlServerStorageOptions
{
    PrepareSchemaIfNecessary = false
};

var storage = new SqlServerStorage("<name or connection string>", options);
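With PrepareSchemaIfNecessary disabled, the schema must then be created or upgraded from a single, controlled place, such as a deployment step, so that concurrent instances never race each other. A minimal sketch using the public SqlServerObjectsInstaller.Install method (the same one that appears in the stack trace above); the connection string placeholder is yours to fill in:

```csharp
using System.Data.SqlClient;
using Hangfire.SqlServer;

// Run once per deployment, before any instance starts its Hangfire server.
using (var connection = new SqlConnection("<connection string>"))
{
    connection.Open();

    // Creates or upgrades the [HangFire] schema objects.
    // Safe to run when the schema is already current.
    SqlServerObjectsInstaller.Install(connection);
}
```

Because only the deployment step touches the schema, application instances can start and stop in any order without contending over the installer's transaction.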

Thanks, we will update our code to do this. How common is a schema change? We have BE and FE instances that run Hangfire, and it’s possible that we don’t always deploy to them simultaneously, meaning the FE schema could be out of date with the BE. Is it only a concern on version changes? FYI, in our current case we did not change the Hangfire version.

You should expect schema changes in minor releases. SemVer says nothing about database schemas, but since migrations are automatic, I decided to allow schema version changes in minor releases. I’ll include information about schema changes in the release notes.

Hi guys, the retry functionality was added in version 1.4.0.
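For anyone who can't upgrade yet, the idea can be approximated by wrapping storage initialization in a loop that retries on SQL Server deadlock errors (error number 1205, the "deadlock victim" error from the stack trace above). This is a hedged sketch, not the actual 1.4.0 implementation; the attempt count and back-off are arbitrary:

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;
using Hangfire.SqlServer;

static class StorageFactory
{
    public static SqlServerStorage CreateWithRetry(string nameOrConnectionString)
    {
        const int maxAttempts = 3;
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                // The constructor runs the schema installer, which may
                // deadlock against a concurrent Hangfire workload running
                // on another instance.
                return new SqlServerStorage(nameOrConnectionString);
            }
            catch (SqlException ex)
            {
                // 1205 = chosen as the deadlock victim; rerun the transaction.
                if (ex.Number != 1205 || attempt >= maxAttempts)
                    throw;

                Thread.Sleep(TimeSpan.FromSeconds(attempt));
            }
        }
    }
}
```

The linear back-off gives the competing instance time to finish its transaction before the installer is rerun.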

Did this resolve the issue? We had this exact scenario hit us on the 1.3.4 build when our Amazon instance auto-scaled on. We have both scheduled and demand-based auto scaling, and this was our scheduled 8:00 am scale-up. Three instances came on and only one threw this error.

Have you tried 1.4.0?

Can’t test in production yet, as we are on a slower release cycle. I was curious about @dlfu 's experience to ease the fears of the boss. :smile:

We may do a point release on top of our recent release to resolve this. I’ll update later if we do, but the scenario isn’t easily reproducible: we did a release Saturday and this only just happened. Unless you can instruct me on a method to reproduce it manually? Thanks!

Implemented the hotfix in our production environment and haven’t had any issues as of yet. Auto scaling worked and no deadlocks were experienced. Will continue to monitor. THANKS!
