I am really frustrated and disappointed that Hangfire is not able to enqueue file uploads


#1

In 3 months I have not found a way in ASP.NET Core for a user to select a file from their PC to be read by CsvHelper and loaded into SQL Server, using Enqueue() to have it done in a background job. There is just no way to do it… the browser will not pass back a path except inside the IFormFile object, and you cannot serialize IFormFile in order to use it as a parameter in the background job.

Seems like that would be a primary use for Hangfire and it fails. Disappointing.


#2

Read the CSV file normally and process each line with the Enqueue function. That would do it.
Can you show us your code?


#3

You’ve smashed two problems together:

1 - Uploading a file from a browser to a server
The file the user has uploaded doesn’t exist in file form on the ASP.NET Core server; it exists as a stream with some metadata, all contained in the IFormFile. If you want to work on a File object as opposed to a MemoryStream, you’ll have to save that IFormFile somewhere on your server/network share/wherever. This part has nothing to do with Hangfire.
ASP.NET Core 3.1 File Upload Guidance

2 - Processing your file via CsvHelper into SqlServer in a Hangfire Background Job
If it were me, after step 1 was complete I would queue up a Background Job to process the file, passing in the file path. Inside that Background Job you would use CsvHelper and load the data into SQL Server. Finally, you should probably delete the file/folder created in step 1 at the end of your Background Job.

It’s best practice not to pass large arguments into Enqueue (e.g. a File or CSV data), as that information is serialized and saved to the persistence layer Hangfire is running on.
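In code, steps 1 and 2 might look something like this rough sketch (the `C:\Uploads` path and the `CsvImportJob` class are placeholders, not anything from this thread):

```csharp
using System.IO;
using System.Threading.Tasks;
using Hangfire;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public class UploadsController : Controller
{
    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile file)
    {
        // Step 1: persist the uploaded stream to a path every Hangfire
        // server can reach (local disk, network share, etc.).
        var path = Path.Combine(@"C:\Uploads", Path.GetRandomFileName());
        using (var stream = System.IO.File.Create(path))
        {
            await file.CopyToAsync(stream);
        }

        // Step 2: enqueue the job with only the small, serializable file
        // path -- never the stream or the file contents themselves.
        BackgroundJob.Enqueue<CsvImportJob>(job => job.Import(path));

        return Accepted();
    }
}
```

The only thing that ends up in Hangfire’s storage is the path string, which keeps the job payload tiny.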


#4

Create a new background job for each line? I’m not sure if that would be any faster though, because I think reading in the physical file would be the slowest part. And for a 10,000 or 100,000 line file, that’s a lot of background jobs!


#5

You are correct that the file contents are in the IFormFile object in memory. And that object also does not give me a local path to the file, because of security concerns. So uploading the file to the web server first would be the only way to work with a file object. I am thinking that would be my only option, but allowing file uploads has its own security issues.

I guess I’ll have to deal with that at some point. For now, it’s just going to have to be a thread-blocking process of reading and loading the file using IFormFile and CSVReader, and the user will have to wait until it’s done.

I thought Hangfire would be able to process that in background, but now I see why I could not find much info about using Hangfire to load local files. I still might be able to use it for periodic jobs like email subscriptions, db cleanup, etc. though.


#6

It seems like you’re a bit confused about the relationship between a user’s browser, the request it makes to the server and how a Background Job fits, unless I’m really misunderstanding your question.

IFormFile doesn’t provide you with “a local path” because, as far as the server is concerned, it only has access to the file stream and metadata passed in the request, which together represent the file and form data the user uploaded. What you choose to do with the information in the request is up to you. If you want to save the memory stream to a file, that’s something you’re going to have to do in whatever is handling the request (i.e. the controller method).

You won’t find any examples of what you’re describing because passing a stream or the entire file contents to Hangfire are both anti-patterns. The stream, because its lifecycle is not in alignment with Hangfire’s requirements; the file contents, because the parameters in a BackgroundJob are stored in Hangfire’s persistence layer.

Hangfire is fine at processing local files; we do it all the time. But Hangfire isn’t going to magically turn a memory stream from a web request into a local file for you. You have to do that step yourself. You’re best off doing what I said in my original response: save the file to some place all your Hangfire Servers are going to have access to, and pass the Hangfire Background Job the path to that file. Do your work and then delete the file.
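A rough sketch of the job side, assuming CsvHelper’s documented `GetRecords<T>` API; `MyRecord` and `SaveToSqlServer` are placeholders for your own record type and insert logic:

```csharp
using System.Globalization;
using System.IO;
using CsvHelper;

public class CsvImportJob
{
    public void Import(string path)
    {
        using (var reader = new StreamReader(path))
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            // GetRecords<T> streams rows out of the CSV one at a time.
            foreach (var record in csv.GetRecords<MyRecord>())
            {
                SaveToSqlServer(record); // placeholder: your insert logic
            }
        }

        // The upload was only a staging copy; remove it once loaded.
        File.Delete(path);
    }
}
```

If the job throws partway through, Hangfire will retry it, so the delete only happens after a fully successful run.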

Security is a valid concern, but I doubt there’s any more risk reading the contents of a stream compared to a file. You’re not executing the file. I’m sure there’s good asp.net guidance you can find if you’re really concerned.


#7

Thank you for your thorough responses, I appreciate it. You are right that I am more than a bit confused about browsers, requests and background jobs. I’ve been a mainframe programmer since 1985, but this is my first production web app. So coming from a COBOL and then Intersystems and then Visual Basic and then C# world, I’m learning lots of new stuff. :wink:

I rely pretty heavily on Microsoft’s documentation, since I’m developing this in Visual Studio ASP.Net Core and using SQL Server for the db. And MS says to be careful with file uploads, on the page you linked to:

Use caution when providing users with the ability to upload files to a server. Attackers may attempt to:

  • Execute denial of service attacks.
  • Upload viruses or malware.
  • Compromise networks and servers in other ways.

Anyway… somehow the file gets from the user’s storage location to the web server, is processed by my code, and is stored in the db. Currently, the IFormFile object is the easiest way to read in the file using CsvHelper, and I don’t have to worry about security because I control the file type and content the user can select, and the data is never stored on the server.

But it takes several minutes per 1000 records to upload. So I thought I could plug Hangfire into my current processing logic, but that wasn’t the case, due to my misunderstanding for sure. But it’s still a frustrating learning process; I posted this same topic about 3 months ago and never got an answer and could find little to nothing on the subject using the tools I wanted to use.

So it looks like uploading to the web server and then maybe scheduling an hourly job to sweep the upload folder and process csv files might be the way to do it. That adds a bunch of complexity (like who owns which file, which table is the file for, what if the data is invalid, how to notify the user when their file is processed, how to prevent hackers from uploading undesirable stuff…) but I guess that’s a can of worms I have to open. :slight_smile:

If there’s an example app or code or project or repo that uses Hangfire in this way, I’d love to see how it’s been done before.

Thanks for your help!


#8

I understand, and it sucks about the lack of response on the Hangfire forum… however, the problem you’ve posted is about 70% ASP.NET, 20% architecture, and 10% Hangfire. This isn’t a super active forum, and a lot of similar or unrelated questions get asked.

Web programming is often thought of as simple, but there are always multiple modalities in play (browser, server, database) and it’s very important that each boundary is understood. There are lots of good courses out there that can teach you more about the interactions between browser and server, which would probably be a worthwhile time investment if you want to ensure you write a secure app. I’d bet you’d pick it up relatively quickly, as it’s not rocket science, just different from writing desktop applications or mainframe code.

When you’re programming at the web server level, you have to understand that the only interaction with the browser you have is the data that is sent via the browser’s TCP request and the corresponding TCP response that you want to send back. You don’t actually have access to the user’s browser directly, or any of the resources on the user’s machine.

In the simplest case, when a user on a browser picks a file, maybe fills in some form fields, and clicks the submit button on their form, a TCP request is made from the browser to the web server. In your case, this is a “multipart/form-data” request, which looks like some string data with blocks of binary data in between, plus some header data, URL, etc. Then, generally, the browser sits there waiting for the response from the server, with several timeouts (both server and browser) that may result in the waiting being canceled.

When the request hits the web server, there’s a pipeline of different “modules” that the request will flow through, eventually landing in ASPNET. ASPNET is kind enough to take care of turning that TCP data into an IFormFile abstraction and saves you the work of reading and splitting up the stream of data in the TCP request. It then gives you the reins to decide what you want to do next with the data and what response you want to send back to the user’s browser.

As a best practice you don’t want to do any long processing in your controller method:

  1. The user experience sucks for the user making the browser request (they’re sitting there waiting)
  2. You run the risk of the request timing out or an IIS app pool restart, which will result in incomplete work

In the BH times (before Hangfire), the systems architecture for this problem would be much more complicated. You’d still be saving the file and saving state info to the database, but then you’d pass the information via some out-of-process communication to a Windows service (daemon) that would then do the work. Hangfire allows us to skip all the complexity of a fourth process/modality, but not the requirements for good async programming.

As for the other concerns and problems you mentioned:

  1. File upload concerns are valid, I guess. DoS is always a potential problem. To prevent virus/malware uploads and network or server compromise: restrict the file upload extension type (this information is on the IFormFile object), don’t run ASP.NET Core under an elevated user permission, don’t execute the file, validate that the contents look the way you expect, employ virus scanning on your working directories, and delete the file when you’re done with it
  2. Saving state is always going to be required for async processing. We typically use a pattern like: create a “Working Directory”, save file to Working Directory, save state about request/file to database table(s), queue up Background Job with the primary key for the state data row(s), return the primary key as the http response. When we do our processing work, we may also save additional progress/logging state back to that table to understand what to do if the Background Job is restarted. Processing state might include start/finish times, current batch of records being uploaded, success/failure codes/logs, etc.
  3. Communicating back to the user is a very application-specific requirement. Maybe you send an email. Maybe you have a piece of web user interface that polls your web server while the request is being processed and asks about the state you stored in step 2 until it is complete/errored, and presents that processing state to the user. There’s a bunch of different approaches to this problem.
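Point 2 as a sketch; `AppDbContext`, `UploadState`, `CsvImportJob`, and the `C:\Work` root are illustrative names, not real types from this thread:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Hangfire;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public class ImportController : Controller
{
    private readonly AppDbContext _db; // placeholder EF Core context
    public ImportController(AppDbContext db) => _db = db;

    [HttpPost]
    public async Task<IActionResult> Upload(IFormFile file)
    {
        // 1. Working directory, then save the uploaded stream into it.
        var dir = Directory.CreateDirectory(
            Path.Combine(@"C:\Work", Guid.NewGuid().ToString()));
        var path = Path.Combine(dir.FullName, "upload.csv");
        using (var fs = System.IO.File.Create(path))
            await file.CopyToAsync(fs);

        // 2. State row: who, which file, current status.
        var state = new UploadState
        {
            UserName = User.Identity?.Name,
            FilePath = path,
            Status = "Queued"
        };
        _db.Add(state);
        await _db.SaveChangesAsync();

        // 3. Queue the job with just the primary key, and hand the key
        // back so the browser can poll for progress.
        BackgroundJob.Enqueue<CsvImportJob>(j => j.Import(state.Id));
        return Ok(new { uploadId = state.Id });
    }
}
```

The job then loads the state row by its key, does the work, and writes progress/success/failure back to the same row.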

I don’t know of any example code you could use for the whole solution, but I would start with getting your web server to save the file and the state code. Once you’ve got a file sitting there you can process, plus some state data, I think the challenge of queuing up a Background Job and using the state data to determine your processing logic will seem a lot more surmountable (and maybe similar to mainframe logic).

Hope this helps.


#9

In Hangfire you define how many workers (threads) should work in parallel, and the rest will be queued for processing. So no worries there.
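For example, with Hangfire’s ASP.NET Core integration the worker count is just a server option (the 5 here is an arbitrary number, not a recommendation):

```csharp
// In Startup.ConfigureServices: cap concurrent job processing at 5
// workers; anything enqueued beyond that simply waits in the queue.
services.AddHangfireServer(options => options.WorkerCount = 5);
```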


#10

That’s the key right there and helps me a lot, thank you! I will need a new db table to store the uploaded file state for reference by Hangfire when I kick off the background job.

Yes, web programming definitely has its challenges, although in my mainframe programming we do deal with client/server processes. I just didn’t do the design on that so I am learning all the details now.

Regarding learning courses for this stuff, I have been coding this project for about 12 months now and have sat through multiple Microsoft and YouTube videos on the basics of web programming. In fact, a shout out to Wes Doyle, who gives a basic template for an ASP.NET library app that gave me a starting point: [Wes Doyle YouTube Channel](https://www.youtube.com/channel/UCfniixfhHqpIGbU7z2JCNJw).

Now that I have the basics down, I don’t really have time or patience to sit through hours of instruction that iterates the same points I already know. So StackOverflow has been an excellent resource for questions and answers.

So thanks for your help. When this web app gets out of beta, I’ll share the URL with you and you can try it out and see how I did. :slight_smile:


#11

That’s definitely bad. A plain SQL Server table should be able to handle thousands of inserts per second. If this delay is happening on the POST or PUT request, then the browser will be waiting all that time without the user knowing what’s happening.

It may help to just copy the data into a temporary file (which should be super fast), then enqueue a Hangfire job with the temp file’s full path as its parameter, and then return a response to the browser immediately. That way the subsequent work (parsing the CSV, inserting into the db) is all done soon after, without the browser having to wait. Also, if it hits a transient error during execution of the job, Hangfire will auto-retry in a sensible way (backing off by increasing the delay between attempts).

Also, unrelated to Hangfire, but given how slow your processing is, maybe look into the SqlBulkCopy class included with the .NET Core SQL client.
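For reference, a minimal SqlBulkCopy sketch; the destination table name and batch size here are placeholders:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

public static class BulkLoader
{
    public static void BulkInsert(DataTable rows, string connectionString)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var bulk = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.ImportedRecords", // placeholder
            BatchSize = 5000
        };

        // Streams all rows to the server in one bulk operation instead
        // of issuing one INSERT statement per record.
        bulk.WriteToServer(rows);
    }
}
```

You’d build the `DataTable` (or an `IDataReader`) from the parsed CSV rows inside your background job.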


#12

Thanks, Daniel. That’s what Jonah suggested too, so I will go that route. I’ll have to figure out how/where to upload the user’s file to my web server, create a db record to store the file and user info, then enqueue a background job to open, read, and load the data.

Bryan


#13

I’m not sure about your hosting situation, but both Azure and AWS have good, safe ways to store files without you worrying about the details. (Azure Blob Storage and AWS S3 are examples, but they have other solutions that are more like a filesystem.) These file storage solutions are fast and reliable, and take the file IO burden off your web server too.

These alternatives to using your web server storage take a lot of the security issues out of the equation. Typically though you would need to be able to process the IO stream when your Hangfire job runs – which shouldn’t be a problem in most cases.


#14

Good points about file storage. I’ll use my web server for now, but if our volume grows a lot, I’ll probably switch to another storage place. But my file uploads are temporary files until the data is loaded anyway, so I will create a periodic Hangfire job to clean out the quarantine folder of files that have been successfully loaded.
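A sketch of what such a periodic sweep could look like with Hangfire’s `RecurringJob`; the folder path and the age-based rule are placeholder logic (in a real app you’d consult your upload-state table instead):

```csharp
using System;
using System.IO;
using Hangfire;

public class MaintenanceJobs
{
    public void CleanQuarantine(string folder)
    {
        foreach (var file in Directory.GetFiles(folder))
        {
            // Placeholder rule: delete files older than a day. In
            // practice, check the state table for a "loaded" status.
            if (File.GetLastWriteTimeUtc(file) < DateTime.UtcNow.AddDays(-1))
                File.Delete(file);
        }
    }
}

public static class JobRegistration
{
    public static void Register()
    {
        // Run the sweep at the top of every hour.
        RecurringJob.AddOrUpdate<MaintenanceJobs>(
            "quarantine-cleanup",
            j => j.CleanQuarantine(@"C:\Uploads\Quarantine"),
            Cron.Hourly());
    }
}
```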

And I’m psyched…I spent a couple days coding the file upload logic from Microsoft Docs and then the background job class and methods from the Hangfire docs. I did my first test tonight and don’t you know it worked on the first try? :slight_smile:


#15

Great! I’ve used Hangfire for a variety of things over the years. It can be useful for lots of things but it isn’t the right fit for all out of band work. Glad you were able to get something functional.

I noticed you mentioned Intersystems. Cache? I’ve partnered with that company in the past and seen lots of healthcare / hospitals have used their technology for decades.

Good luck with your new application!


#16

Yes, Cache. Most of my career has been working with Cache, and when it was called Mumps and then “M” before it was Cache. I worked for IDX Systems Corp. and then for GE Healthcare when they bought IDX. Same software for 30 years, so learning new stuff like C#, web apps, java, android is a welcome challenge.

We’ll see where this app goes… if it takes off maybe I’ll be able to retire some day. :wink: