Hangfire and separation of concerns

Hello, I am using Hangfire to schedule order-processing jobs at our company. It has been successful, but with some caveats I would really like to overcome.

Basically, I have several separate logical services that all need to be performed on our orders, depending on what state the orders are in (overly simplified: submitted orders need step 1 run on them, step 1 orders need step 2, etc.).

The services are currently set up as completely independent Windows services, each running its own Hangfire server that points to a unique queue via BackgroundJobServerOptions, with Queues set to the queue that service should process.
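To make the routing concrete, here is a small Python model (not Hangfire code) of that setup: one shared storage, and several independent services that each fetch only from their own queue, the way setting Queues on BackgroundJobServerOptions restricts which queues a Hangfire server processes. The class names, queue names and job type names are hypothetical. Each poll_once call stands in for one query against the shared database, which is a simple way to see why every added service adds its own polling load.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobRecord:
    job_id: int
    queue: str
    type_name: str  # the job's type/interface is stored only as a name
    args: tuple


@dataclass
class SharedStorage:
    """Stand-in for the one shared Hangfire database (hypothetical model)."""
    queues: dict = field(default_factory=dict)
    next_id: int = 1

    def enqueue(self, queue, type_name, *args):
        job = JobRecord(self.next_id, queue, type_name, args)
        self.next_id += 1
        self.queues.setdefault(queue, deque()).append(job)
        return job.job_id

    def fetch(self, queue_names):
        """Pop the next job from the first non-empty listed queue, or return None."""
        for name in queue_names:
            pending = self.queues.get(name)
            if pending:
                return pending.popleft()
        return None


class WorkerService:
    """An independently deployed service that only listens to its own queues."""

    def __init__(self, name, queue_names, storage):
        self.name = name
        self.queue_names = queue_names  # plays the role of BackgroundJobServerOptions.Queues
        self.storage = storage

    def poll_once(self):
        # One poll = one round-trip to the shared database in this model.
        job = self.storage.fetch(self.queue_names)
        if job is None:
            return None
        return (self.name, job.type_name, job.args)


storage = SharedStorage()
storage.enqueue("step1", "IStep1Job", 1001)
storage.enqueue("step2", "IStep2Job", 1002)
step1_service = WorkerService("step1-service", ["step1"], storage)
step2_service = WorkerService("step2-service", ["step2"], storage)
print(step2_service.poll_once())  # ('step2-service', 'IStep2Job', (1002,))
print(step1_service.poll_once())  # ('step1-service', 'IStep1Job', (1001,))
print(step1_service.poll_once())  # None
```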

The services that run the jobs are developed and released independently by various developers, using a shared library for the Hangfire server implementation.

I then have a single website that hosts the Hangfire user interface for scheduling the jobs and doesn’t run any backend job logic itself.

This works quite well, but has some hurdles I can’t seem to overcome. One is that the services are flooding SQL Server with queries as more services are added to automate more tasks in the business.

The other is that the job website uses an interface to schedule each job, and the services contain the implementations of those interfaces. That would be fine, except that the interface is needed to schedule a job, and every service needs to be able to schedule every job (even though each one only runs the jobs from its own queue). So I have to recompile and redeploy every service every time I change any of the interfaces, including when I add one for a new service…
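A simplified Python model of this failure mode may help. It is not Hangfire’s implementation; it assumes only that storage keeps a job’s type as a name (a string), and that any process which re-materializes a job, for example to move a due scheduled job onto its queue, must resolve that name to a type it actually references. In this model, a service that never runs step 4 still fails on a step 4 job it is merely asked to promote. All names here are hypothetical.

```python
from dataclasses import dataclass


class JobLoadError(Exception):
    """Raised when a process cannot resolve a stored job's type name."""


@dataclass
class StoredJob:
    job_id: int
    type_name: str  # e.g. "IStep4Job": in storage it is only a string
    method: str
    args: tuple
    state: str = "Scheduled"


class ServiceProcess:
    """A hypothetical process that only references some job interfaces."""

    def __init__(self, name, known_types):
        self.name = name
        self.known_types = set(known_types)

    def resolve(self, job):
        # Turning the stored name back into a type only works if this
        # process actually references (has loaded) that type.
        if job.type_name not in self.known_types:
            raise JobLoadError(f"{self.name} cannot load {job.type_name!r}")
        return job.type_name

    def promote_due_job(self, job):
        """Move a due scheduled job to Enqueued; the type must be resolvable."""
        try:
            self.resolve(job)
            job.state = "Enqueued"
        except JobLoadError:
            job.state = "Failed"
        return job.state


step1_service = ServiceProcess("step1-service", ["IStep1Job"])
website = ServiceProcess("job-website", ["IStep1Job", "IStep2Job", "IStep4Job"])
print(step1_service.promote_due_job(StoredJob(42, "IStep4Job", "Run", (1001,))))  # Failed
print(website.promote_due_job(StoredJob(43, "IStep4Job", "Run", (1002,))))  # Enqueued
```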

I don’t think I have designed this optimally, but in my reading I can’t find the recommended or “right” way to do this. Basically, I want someone to be able to write all the non-Hangfire code for a new job and have it scheduled and run by Hangfire without touching the existing jobs. Is there a better way?

Thanks for any input!

I don’t think you’ve necessarily made a wrong architectural choice. Your design is completely fine for smaller projects, but you may begin to suffer more as you add Services and Jobs. Don’t feel bad: I’d rather see people under-engineer, build a pragmatic solution and then scale out as the needs justify the effort, than over-engineer a project because they anticipated needs that never materialize.

The main issue with this type of design is that you have two central dependencies (your Hangfire library of Jobs and your interface abstractions) that every other project depends on. Every new project makes maintaining those two central dependencies more complicated. It sounds like both dependencies are a direct result of having a single dashboard and of letting each of your unique servers schedule work for the other servers.

There are a couple of evolutions you can make to move to a more scalable architecture:

  1. Stand up a Hangfire Dashboard project for each “unique” Webservice. We use always-on Websites instead of a Hangfire Windows Service plus a Dashboard Website, to reduce the amount of deployed code. This is the first step in eliminating the shared Job library, as you will have one fewer project that knows about all Jobs.

  2. Think about your integration points between different “unique” Servers. Instead of queuing up a Hangfire Job to be run on a different server, you could kick off the job via a Webservice call (or another similar “messaging” pattern). For example, if Step 1 runs on Server A, when Step 1 is complete, Server A makes a Webservice call to Server B to kick off/queue up Step 2, which will be run on Server B.

  3. Think about a strategy for eliminating point-to-point calls between Webservices. We’ve used pub/sub messaging patterns (e.g. Azure Service Bus, Google Pub/Sub) to announce that work is available without building a spiderweb of point-to-point calls. This step might be complete overkill if you really only have a couple of calling “contracts” between Webservices.
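Points 2 and 3 above can be sketched with a tiny in-process publish/subscribe bus in Python. It is only a stand-in for a real broker such as Azure Service Bus or Google Pub/Sub, and the topic names, message shape and StepService class are all hypothetical. What it shows is that each service enqueues only its own job type, in its own local queue, in response to an event, so no service needs to know another service’s job types.

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-process pub/sub; a stand-in for a real message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


class StepService:
    """A service that owns one step and keeps its own private job queue."""

    def __init__(self, name, bus, listen_topic, done_topic):
        self.name = name
        self.bus = bus
        self.done_topic = done_topic
        self.local_queue = []  # stands in for this service's own Hangfire storage
        bus.subscribe(listen_topic, self.on_work_available)

    def on_work_available(self, message):
        # Only this service knows its job type, so only it enqueues the job.
        self.local_queue.append(message["order_id"])

    def run_pending(self):
        while self.local_queue:
            order_id = self.local_queue.pop(0)
            # Step-specific work would run here, then the next step is announced.
            self.bus.publish(self.done_topic, {"order_id": order_id, "by": self.name})


bus = MessageBus()
service_a = StepService("service-a", bus, "order.submitted", "order.step1-done")
service_b = StepService("service-b", bus, "order.step1-done", "order.step2-done")
completed = []
bus.subscribe("order.step2-done", completed.append)

bus.publish("order.submitted", {"order_id": 1001})
service_a.run_pending()  # A finishes step 1 and publishes an event
service_b.run_pending()  # B picks it up from its own queue
print(completed)  # [{'order_id': 1001, 'by': 'service-b'}]
```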

The downside of doing this work is that you’re going to have to write more code, and you might not be able to easily keep each developer responsible for one piece. So how far you go depends on your need to scale versus your need to maintain a sane working environment. There are also other approaches to scaling, but it really depends on the types of problems you’re trying to solve.

Thanks so much for your thoughtful reply. Yes, your recommendations make a lot of sense. One alternative I was considering as we scale up is to stop doing this with Hangfire and instead use a message queue to communicate between the services, rather than our current setup of database statuses plus polling, since this whole process is really a triggered one. This is a difficult step for our organization, but I think it would be better in the long run.

The main trouble I’m having right now is that I did not expect the interface for step 4, or for a new step we create tomorrow, to be required in the service that will only run step 1. It seems to be required just so the service can act as a redundant way to queue the job up. I did want this redundancy in case our web frontend went down, but I didn’t foresee that it would mean every service needs every interface for every job Hangfire touches.

The “shared job library” may have been misrepresented on my part: it is just an internal DLL you include in your project that contains the Hangfire server and Windows service configuration code. It is compiled separately and included as a static DLL in each service, and it doesn’t need to be redeployed unless the Hangfire version changes, so I think it is not much different from your always-on websites in that respect. Does that mean each of your always-on websites also has a separate Hangfire database? Otherwise, I assume they would run into the same interface issue?

What I would love is one of two things. Either a service could queue an already scheduled job without requiring its interface (I don’t quite understand why the interface is required, when the job was already scheduled from the website that does have it and will eventually run on a service that also has it), or a service would only act as a backup for queuing the jobs in its own queue (since it will have the interfaces for those), but I don’t think that is possible. I may settle for not having the services act as redundancy for queuing scheduled jobs at all, which I believe is possible.

I’m not 100% sure what you mean by interface in this context. Do you mean the BackgroundJob’s Type?

Every one of our unique Hangfire Servers has its own storage. We use SQL Server storage, so this can take one of two forms: a unique database, or a unique schema in a shared database. For example:

  • Server A runs on Database A with Schema A
  • Server B runs on Database A with Schema B
  • Server C runs on Database B

Servers A, B and C don’t share any Background Job Types, and each of them runs its own Dashboard. If I finish work on Server A and the next logical step is work done on Server B, I either make a web service call from Service A to Service B or send an MSMQ or Pub/Sub message, and Service B then queues up the Background Job to be executed on one of its instances.

We might have 4 instances of Server A running, 2 instances of Server B and 8 instances of Server C, all behind a load balancer for the Dashboard, each with 10 Hangfire Workers dequeuing BackgroundJobs and doing work.

The key thing is that we don’t rely on Server A being able to queue up BackgroundJob B, which will ultimately run on Server B. Doing so would require shared knowledge of BackgroundJob B’s Type.

If you need to manage state for the object each of the servers is working on, you can have a shared database, separate from the Hangfire database, in which each BackgroundJob updates whatever state data the larger process needs. This is not Hangfire functionality; it is purely driven by your application’s state requirements.
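For that shared application-state database, here is a minimal sketch using Python’s built-in sqlite3 module. The orders table and the status values are hypothetical, and the conditional UPDATE (only advance an order that is still in the expected state) is an assumed technique, not something Hangfire provides. It lets a retried or duplicated job run notice that the order has already moved on and skip it.

```python
import sqlite3


def open_state_db():
    """Application state database, separate from Hangfire's storage (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    return db


def advance(db, order_id, expected, new_status):
    """Move an order from `expected` to `new_status` only if it is still `expected`.

    Returns True if this call performed the transition, False if the order
    was not in the expected state (e.g. another run already moved it on).
    """
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "UPDATE orders SET status = ? WHERE id = ? AND status = ?",
            (new_status, order_id, expected),
        )
    return cur.rowcount == 1


def step1_job(db, order_id):
    # Hypothetical step 1: claim the order, do the work, then mark it done.
    if not advance(db, order_id, "submitted", "step1-running"):
        return "skipped"
    # Step 1 business logic would run here.
    advance(db, order_id, "step1-running", "step1-done")
    return "done"


db = open_state_db()
with db:
    db.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (1001, "submitted"))
print(step1_job(db, 1001))  # done
print(step1_job(db, 1001))  # skipped
print(db.execute("SELECT status FROM orders WHERE id = 1001").fetchone()[0])  # step1-done
```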

Yes, it is the equivalent of the background job’s type. I use dependency injection in the Windows service to supply the actual type, so the shared knowledge between the scheduler and the runner is the job’s interface rather than its implementation type. Currently each job type has its own interface.

We keep our state management separate from Hangfire, but we use a single Hangfire storage so that we can use Hangfire’s logging to track failed jobs in the nice dashboard, without having to check multiple places for each job type. We have 12 different jobs right now, and I think it would be quite difficult to manage if each had its own storage in our case.

That’s interesting; I have never felt the need to separate interface and implementation for Hangfire Background Jobs. Where we need to treat BackgroundJobs in a homogeneous manner, we use a shared abstract class, and each extending BackgroundJob implements an abstract method. Are you doing it this way because it’s the typical OO expectation, or are you gaining something by using interfaces?
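A rough Python analogue of the shared-abstract-class approach: the base class holds the behaviour every job shares and declares one abstract method that each concrete job implements. This is a language-neutral illustration, not Hangfire code, and the job names and their work are hypothetical placeholders.

```python
from abc import ABC, abstractmethod


class OrderStepJob(ABC):
    """Hypothetical shared base class for all order-step jobs."""

    step_name = "unknown"

    def run(self, order_id):
        # Shared behaviour every job gets; here it just records start/end.
        log = [f"{self.step_name}: start {order_id}"]
        log.append(self.execute(order_id))
        log.append(f"{self.step_name}: end {order_id}")
        return log

    @abstractmethod
    def execute(self, order_id):
        """Step-specific work supplied by each concrete job."""


class Step1Job(OrderStepJob):
    step_name = "step1"

    def execute(self, order_id):
        return f"validated order {order_id}"


class Step2Job(OrderStepJob):
    step_name = "step2"

    def execute(self, order_id):
        return f"allocated stock for order {order_id}"


for job in (Step1Job(), Step2Job()):
    print(job.run(1001))
# ['step1: start 1001', 'validated order 1001', 'step1: end 1001']
# ['step2: start 1001', 'allocated stock for order 1001', 'step2: end 1001']
```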

What constraints are you working with that require you to have multiple Servers to execute 12 BackgroundJob types? I ask mostly out of curiosity, as each of our unique BackgroundJob servers is responsible for between 10 and 200 different BackgroundJob types.

That logging constraint might become a growing pain as a limiting factor. What information are you looking for in your executions that requires the Hangfire logging? The only time we dig into that is if we see something executed when it shouldn’t have been; otherwise we look at application logging and/or business state data.

I know I’m asking a lot of questions, but for me at least, architectural concerns get solved by talking about constraints and seeing how others solve them.

I’m using interfaces because they don’t change as often as the types do, so needing to redeploy all the services when they change is less of a factor. With my current setup I would otherwise need to redeploy all the services any time the code in any of them changed. Edit: actually, I’m not sure this is entirely true. The queuing system seems to fail at runtime when an interface signature changes in any way, and it may be the same with the types, but I would need to include all the actual types in every service, whereas right now I just have a single job interface library that they all use.

The idea behind the separate services is just to keep our tasks independent of each other, so a developer can fix or change one without the possibility of affecting the others (although obviously that hasn’t worked completely). The various tasks are non-trivial but not huge codebases, and their releases are easier to manage separately.

We actually use the Hangfire logging quite a bit. I use the console extension and dump stored procedure output and internal logging from the job into the console, so that when I need details about what happened in a job run, I can bring up the job in the job control site.

I appreciate you taking the time to ask these questions and consider my situation 🙂

Yeah, makes sense. Going back to my original post, you have a set of constraints in front of you, from Developer experience to infrastructure to business needs and you’ve made your choices to come to the solution you have.

I don’t think I would be too worried about your architecture, but you may need to change it in the future if business needs change in such a way that you can’t meet them with the current solution without making the Developer experience terrible.

We’ve kinda meandered a bit in this conversation. Hopefully the conversation has helped but if you’ve come out of this with specific technical questions I’m happy to try and answer those for you as well.

Thanks, yes, I have some options. I really like the idea of handling this with message queues, but in the short term I think I will disable the queuing redundancy on my Windows services entirely, and maybe also put out a copy of the job control website, or a single Windows service with queuing enabled, so I still have some queuing redundancy.
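That short-term plan can be written down as a simple deployment check. This Python sketch assumes a hypothetical per-process flag, promotes_scheduled_jobs, meaning “this process moves due scheduled jobs onto their queues”, and reports any such process that does not reference every job interface. How to actually switch that behaviour off on a given Hangfire server depends on the Hangfire version and is not shown here; the process and interface names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessConfig:
    name: str
    known_types: frozenset
    promotes_scheduled_jobs: bool  # hypothetical flag, not a Hangfire option


def misconfigured(processes, all_job_types):
    """Names of processes that promote scheduled jobs without knowing every job type."""
    return [
        p.name
        for p in processes
        if p.promotes_scheduled_jobs and not all_job_types <= p.known_types
    ]


all_types = frozenset({"IStep1Job", "IStep2Job", "IStep4Job"})
plan = [
    ProcessConfig("job-website", all_types, True),
    ProcessConfig("backup-queuing-service", all_types, True),
    ProcessConfig("step1-service", frozenset({"IStep1Job"}), False),
    ProcessConfig("step2-service", frozenset({"IStep2Job"}), False),
]
print(misconfigured(plan, all_types))  # []
```

If a step service were left with queuing enabled, its name would be reported, which is exactly the situation that causes the missing-interface failures described earlier in the thread.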