Task.Run occasionally fails silently when launched from MVC Controller - c#

I am attempting to generate PDF copies of specific forms within my MVC application. As this is time-consuming, and the client does not need to wait for the generation to happen, I'm trying to trigger it as a series of fire-and-forget tasks.
One hang-up of note is that I need to have the HttpContext established, or some underlying pieces of the code that I can't alter won't work. I believe I have dealt with this problem, but I wanted to call it out in case it matters.
Here is the function I am calling...
private void AsyncPDFFormGeneration(string htmlOutput, string serverRelativePath, string serverURL, string signature, ScannedDocument document, HttpContext httpContext)
{
    try
    {
        System.Web.HttpContext.Current = httpContext;
        using (StreamWriter stw = new StreamWriter(Server.MapPath(serverRelativePath), false, System.Text.Encoding.Default))
        {
            stw.Write(htmlOutput);
        }
        Doc ABCDoc = new Doc();
        ABCDoc.HtmlOptions.Engine = EngineType.Gecko;
        int DocID = 0;
        DocID = ABCDoc.AddImageUrl(serverURL + serverRelativePath + "?dumb=" + DateTime.Now.Hour.ToString() + DateTime.Now.Minute.ToString() + DateTime.Now.Second + DateTime.Now.Millisecond);
        while (true)
        {
            ABCDoc.FrameRect();
            if (!ABCDoc.Chainable(DocID))
                break;
            ABCDoc.TextStyle.LeftMargin = 100;
            ABCDoc.Page = ABCDoc.AddPage();
            DocID = ABCDoc.AddImageToChain(DocID);
        } //End while (true...
        for (int i = 1; i <= ABCDoc.PageCount; i++)
        {
            ABCDoc.PageNumber = i;
            ABCDoc.Flatten();
        }
        ScannedDocuments.AddScannedDocument(document, ABCDoc.GetData());
        System.IO.File.Delete(Server.MapPath(serverRelativePath));
    }
    catch (Exception e)
    {
        //Exception is logged to the database, and if that fails, to the Event Log
    }
}
Within it, I write the string output of the HTML contents of the MVC form in question to an HTML file, hand the path to that file to the PDF writer, generate the PDF, and then delete the HTML file.
I'm calling it inside of a Controller POST method, like so:
Task.Run(() => AsyncPDFFormGeneration(htmlOutput, serverRelativePath,
serverURL, signature, document, HttpContext.ApplicationInstance.Context));
This command is called as part of a foreach loop that constructs the forms, loads them into string format, and then passes them into a task. I've also tried this with
Task.Factory.StartNew
just in case something weird was going on with Task.Run, but that didn't produce a different result.
The problem I am having is that not all of the Tasks execute every time. If I run in Visual Studio and step my way through debugging, it works properly every time. However, when attempting to generate 11 forms sequentially, sometimes it generates all of them, sometimes it generates 3 or 4, sometimes it generates all but 1.
I have error logging set up to be as extensive as possible, but no exceptions are being thrown that I can find, and no generated HTML files are left lying around in my file structure on account of a thread having been aborted part-way.
There seems to be a slight correlation between how quickly the page comes back from the post, and how many of the forms are generated. A longer load time generally correlates to more of the forms being generated...but I was under the impression that shouldn't matter. I'm spinning these off to separate threads with their own copy of the HttpContext to take with them and carry around. Once launched, I did not think that the original thread should impact them.
Any ideas on why I'm only getting 3 successful Tasks on some attempts, all 11 on another attempt, and no exceptions?

Task.Run(() => AsyncPDFFormGeneration(htmlOutput, serverRelativePath,
serverURL, signature, document, HttpContext.ApplicationInstance.Context));
You have a subtle race condition on this line. The problem is with the HttpContext.ApplicationInstance.Context property. It will be evaluated when the task starts. If it happens before the end of the request, this is fine. But if for some reason the task takes a bit of time to start, then the request will complete first, and the HttpContext will be null. Therefore, you will have a null-reference exception, giving you the impression that the task didn't start (when, in fact, it did but crashed immediately outside of your try/catch).
To avoid that, just store the context in a local variable, and use it for Task.Run:
var context = HttpContext; // Or HttpContext.ApplicationInstance.Context, but I don't really see the point
Task.Run(() => AsyncPDFFormGeneration(htmlOutput, serverRelativePath, serverURL, signature, document, context));
That said, I don't know what API you are using that requires System.Web.HttpContext.Current to be set, but it seems a very bad choice for a fire-and-forget task. Even if you locally save the HttpContext, it'll still have been cleaned up, so I'm not sure it'll behave as expected.
Also, as was mentioned in the comments, launching fire-and-forget tasks on ASP.NET is dangerous. You should use HostingEnvironment.QueueBackgroundWorkItem instead.
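For illustration, here's a minimal sketch of what that could look like with the method from the question (this assumes .NET 4.5.2 or later, a using System.Web.Hosting; directive, and that context is a System.Web.HttpContext captured before the request completes):
// Sketch only: registering the work with the ASP.NET runtime means it knows about
// the background work and won't tear down the app domain while it is still running.
HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
    AsyncPDFFormGeneration(htmlOutput, serverRelativePath,
        serverURL, signature, document, context));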

I would try using await Task.WhenAll(task1, task2, task3, etc) as your application may be closing before all tasks have completed.
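If you go that route, a rough sketch of what the POST action could look like (the collection and loop variable names here are illustrative, not from the original code):
// Illustrative only: collect the tasks in the POST action and await them all before
// returning, so the request does not complete while generation is still in flight.
var tasks = new List<Task>();
foreach (var form in formsToGenerate) // hypothetical loop over the forms being built
{
    // ... build htmlOutput, serverRelativePath, etc. for this form ...
    tasks.Add(Task.Run(() => AsyncPDFFormGeneration(htmlOutput, serverRelativePath,
        serverURL, signature, document, context)));
}
await Task.WhenAll(tasks); // requires the action method to be async Task<ActionResult>
Note that this gives up the fire-and-forget behaviour: the client now waits until the PDFs are generated.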

Related

SqlBulkCopy.WriteToServerAsync() does not write to target SQL Server table, bulkCopy.WriteToServer() does

Just as the title states. I am trying to load a ~8.45 GB CSV file with ~330 columns (~7.5 million rows) into a SQL Server instance, but I'm doing the parsing internally, as the file has some quirks to it (comma delimiters, quotes, etc.). The heavy-duty bulk insert and line parsing is done as below:
var dataTable = new DataTable(TargetTable);
using var streamReader = new StreamReader(FilePath);
using var bulkCopy = new SqlBulkCopy(this._connection, SqlBulkCopyOptions.TableLock, null)
{
    DestinationTableName = TargetTable,
    BulkCopyTimeout = 0,
    BatchSize = BatchSize,
};
// ...
var outputFields = new string[columnsInCsv];
this._connection.Open();
while ((line = streamReader.ReadLine()) != null)
{
    // get data
    CsvTools.ParseCsvLineWriteDirect(line, ref outputFields);
    // insert into datatable
    dataTable.LoadDataRow(outputFields, true);
    // update counters
    totalRows++;
    rowCounter++;
    if (rowCounter >= BatchSize)
    {
        try
        {
            // load data
            bulkCopy.WriteToServer(dataTable); // this works.
            //Task.Run(async () => await bulkCopy.WriteToServerAsync(dataTable)); // this does not.
            //bulkCopy.WriteToServerAsync(dataTable); // this does not write to the table either.
            rowCounter = 0;
            dataTable.Clear();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex.ToString());
            return;
        }
    }
}
// check if we have any remnants to load
if (dataTable.Rows.Count > 0)
{
    bulkCopy.WriteToServer(dataTable); // same here as above
    //Task.Run(async () => await bulkCopy.WriteToServerAsync(dataTable));
    //bulkCopy.WriteToServerAsync(dataTable);
    dataTable.Clear();
}
this._connection.Close();
Obviously I would like this to be as fast as possible. I noticed via profiling that the WriteToServerAsync method was almost 2x as fast (in its execution duration) as the WriteToServer method, but when I use the async version, no data appears to be written to the target table (whereas the non-async version commits the data fine, just much more slowly). I'm assuming there is something here I forgot (to somehow trigger the commit to the table), but I am not sure what could prevent committing the data to the target table.
Note that I am aware that SQL Server has a BULK INSERT statement but I need more control over the data for other reasons and would prefer to do this in C#. Also perhaps relevant is that I am using SQL Server 2022 Developer edition.
Fire and forget tasks
Performing Task.Run(...) or calling DoSomethingAsync() without a corresponding await essentially makes the task a fire-and-forget task. The "fire" refers to the creation of the task, and the "forget" to the fact that the coder appears not to be interested in any return value (if applicable) and does not care when the task completes.
Though not immediately problematic, it becomes a problem if the calling thread or process exits before the task completes: the task will be terminated before completion. This problem typically occurs in short-lived processes such as console apps, and not so much in, say, Windows services or web apps with 20-minute app-domain timeouts and the like.
Example
sending an asynchronous keep-alive/heartbeat to a remote service or monitor.
there is no return value to monitor, asynchronous or otherwise
It won't matter if it fails as a more up-to-date call will eventually replace it
It won't matter if it doesn't complete in time because the hosting process exits for some reason (after all, we are a heartbeat; if the process ends naturally, there is no heart to beat).
Awaited tasks
Consider prefixing the call with an await, as in await bulkCopy.WriteToServerAsync(...);. This way the task is linked to the calling task/thread, ensuring the caller (unless it is terminated by some other means) does not exit before the task completes.
Naturally, the containing method and those in the call stack will need to be marked async and have await prefixes on the corresponding calls. This "async all the way" creates a nice daisy chain of linked tasks all the way up to the parent (or at least to the last method in the call chain with an await or a legacy ContinueWith()).
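Applied to the loop in the question, a minimal sketch of the awaited version (assuming the containing method can be marked async Task; the variable names are the ones from the question):
// Inside an 'async Task' method: await each batch so the write actually completes
// (and surfaces any exception) before the loop clears and reuses the DataTable.
if (rowCounter >= BatchSize)
{
    await bulkCopy.WriteToServerAsync(dataTable);
    rowCounter = 0;
    dataTable.Clear();
}
// ... and after the read loop, flush any remnants the same way ...
if (dataTable.Rows.Count > 0)
{
    await bulkCopy.WriteToServerAsync(dataTable);
    dataTable.Clear();
}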

Can this async code complete out of order?

I have the following C# code in an AspNet WebApi controller:
private static async Task<string> SaveDocumentAsync(HttpContent content) {
    var path = "something";
    using (var file = File.OpenWrite(path)) {
        await content.CopyToAsync(file);
    }
    return path;
}

public async Task<IHttpActionResult> Put() {
    var path = await SaveDocumentAsync(Request.Content);
    await SaveDbRecordAsync(path); // writes something to the database using System.Data and awaiting Async methods
    return Ok();
}
I am sometimes seeing the database record visible before the document has finished being written. Is this a possible execution sequence? (It is also possible my file system isn't giving me the semantics I want).
To clarify how I'm observing this: an application is reading the path out of the database and then trying to read the file, and finding it isn't there. The file does appear shortly afterwards.
This doesn't happen every time, normally the file comes first. Maybe 1 in 1000 it happens the wrong way.
This turned out to be down to file system semantics. I thought I'd excluded my replicated file system, but I'd done it wrong. The code is behaving as expected.
Since you await SaveDocumentAsync before you call SaveDbRecordAsync, the latter executes only after SaveDocumentAsync completes.
If you were to fire the tasks in parallel then await them:
var saveTask = SaveDocumentAsync(Request.Content);
var dbTask = SaveDbRecordAsync("a/path.ext");
await saveTask;
await dbTask;
then you wouldn't be able to guarantee the completion order.
@Neiston touches on a good point: it might be that the app you're using to view the results updates with a delay, making you think the order is switched.
As you are writing to two different destinations (one file, one database), the OS is perfectly within its remit to perform the writes in whatever order is 'best' for the storage medium.
In the old days of spinning storage, the two requests would sit in the write queue, and if the r/w heads were currently nearer to the tracks for the database than to those for the file, the OS (or maybe the HDD controller) would write the database data first, followed by the file data.
This assumes that both your file and your database server are running on the same physical machine. If you are writing to a shared folder, and/or the DB server is also on a different machine, then who knows what order they will finish in.

Is there a way to retrieve an existing NSUrlSession/cancel its task?

I am creating an NSUrlSession for a background upload using a unique identifier.
Is there a way, say after closing and reopening the app, to retrieve that NSUrlSession and cancel the upload task in case it has not been processed yet?
I tried simply recreating the NSUrlSession using the same identifier to check whether it still contains the upload task; however, it does not even allow me to create the session, throwing an exception like "A background URLSession with identifier ... already exists", which is unsurprising, as the documentation explicitly says that a session identifier must be unique.
I am trying to do this with Xamarin.Forms 2.3.4.270 in an iOS platform project.
Turns out I was on the right track. The error message "A background URLSession with identifier ... already exists" seems to be more of a warning; no exception is actually thrown (the exception I had did not come from duplicate session creation).
So, you can in fact reattach to an existing NSUrlSession and will find the contained tasks still present, even after restarting the app. Just create a new configuration with the same identifier, use that to create a new session, ignore the warning that's printed out, and go on from there.
I am not sure if this is recommended for production use, but it works fine for my needs.
private async Task EnqueueUploadInternal(string uploadId)
{
    NSUrlSessionConfiguration configuration = NSUrlSessionConfiguration.CreateBackgroundSessionConfiguration(uploadId);
    INSUrlSessionDelegate urlSessionDelegate = (...);
    NSUrlSession session = NSUrlSession.FromConfiguration(configuration, urlSessionDelegate, new NSOperationQueue());
    NSUrlSessionUploadTask uploadTask = await (...);
    uploadTask.Resume();
}

private async Task CancelUploadInternal(string uploadId)
{
    NSUrlSessionConfiguration configuration = NSUrlSessionConfiguration.CreateBackgroundSessionConfiguration(uploadId);
    NSUrlSession session = NSUrlSession.FromConfiguration(configuration); // this will print a warning
    NSUrlSessionTask[] tasks = await session.GetAllTasksAsync();
    foreach (NSUrlSessionTask task in tasks)
        task.Cancel();
}

Crystal Reports reaches job limit despite garbage collection

I have a C# application I recently converted into a service. As part of its normal operation, it creates PDF invoices via CR using the following code:
foreach (string docentry in proformaDocs)
    using (ReportDocument prodoc = new ReportDocument()) {
        string filename = outputFolder + docentry + ".pdf";
        prodoc.Load(/* .rpt file */);
        prodoc.SetParameterValue(0, docentry);
        prodoc.SetParameterValue(1, 17);
        prodoc.SetDatabaseLogon(/* login data */);
        prodoc.ExportToDisk(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat,
            filename);
        prodoc.Close();
        prodoc.Dispose();
    }

foreach (string docentry in invoiceDocs)
    using (ReportDocument invdoc = new ReportDocument()) {
        string filename = differentOutputFolder + docentry + ".pdf";
        invdoc.Load(/* different .rpt file */);
        invdoc.SetParameterValue(0, docentry);
        invdoc.SetParameterValue(1, 13);
        invdoc.SetDatabaseLogon(/* login data */);
        invdoc.ExportToDisk(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat,
            filename);
        invdoc.Close();
        invdoc.Dispose();
    }

GC.Collect();
Problem is, after about 3-4 hours of runtime with the above code executing at most every two minutes, the Load() operation hits the processing job limit despite me explicitly disposing the report objects. However, if I leave the service running and launch a non-service instance of the same application, that one executes properly even while the service is still throwing the job limit exception. With the non-service instance having taken care of the processing, the service has nothing to do for the moment - but the instant it does, it throws the error again until I manually stop and restart the service, at which point the error goes away for another 3-4 hours.
How am I hitting the job limit if I'm manually disposing every single report object as soon as I'm done with it and calling garbage collection after each round of processing and disposing? And if the job limit is reached, how can a parallel instance of the same code not be affected by it?
UPDATE: I managed to track down the problem and as it turns out, it's not with CR. I take CR's database login credentials from a SAP Company object inside a Database wrapper class stored in a Dictionary, fetched with this:
public Company GetSAP(string name) {
    Database db;                   // wrapper class
    SAP.TryGetValue(name, out db); // fetching from the Dictionary
    return db.SAP;                 // Company object in the wrapper class
}
For some reason, calling this freezes the thread, but the Timer that launches the service's normal operation naturally doesn't wait for it to complete and launches another thread, which freezes too upon calling this. This keeps up until the number of frozen threads hits the job limit, at which point each new thread throws an exception because the still-frozen threads are filling the job limit. I put in a check to not launch a new thread if one is still running, and the application simply froze upon calling the above function.
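For what it's worth, the "check to not launch a new thread if one is still running" can be expressed as an interlocked flag around the Timer callback; a rough sketch (method and field names here are illustrative, not from the actual service):
// Illustrative guard against overlapping Timer callbacks (requires System.Threading).
private static int _running = 0;

private void OnTimerElapsed(object state)
{
    // Only enter if no previous run is still in progress; otherwise skip this tick.
    if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
        return;
    try
    {
        GenerateInvoices(); // hypothetical name for the service's normal operation
    }
    finally
    {
        Interlocked.Exchange(ref _running, 0);
    }
}
With the underlying freeze in the COM getter, of course, this only stops the pile-up; the first frozen call still never returns.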
The getter called by the return db.SAP above has literally nothing in it other than a return.
Alright, the problem was kinda solved. For some reason, the getters in the COM object I was trying to fetch the login credentials from freeze if accessed from a service, but not from a non-service application. Testing this COM-object-stuffed-into-wrapper-class-stuffed-into-Dictionary setup in an IIS application also yielded no freezes. I have no idea why, and short of SAP sharing the source code for said COM object, I'm unlikely to ever find out. So I simply declared a few string fields for storing the credentials and cut out accessing the COM object entirely, since I didn't need it, only its fields.

TPL DataFlow confusion around pipelines - should I create a new pipeline for each data call? How can I track data that's flowing through?

I'm struggling with how to apply TPL DataFlow to my application.
I've got a bunch of parallel data operations I want to track and manage; previously I was just using Tasks, but I'm trying to implement DataFlow to give me more control.
I'm composing a pipeline of tasks to, say, get the data and process it. Here's an example of a pipeline to get data, process it, and log it as complete:
TransformBlock<string, string> loadDataFromFile = new TransformBlock<string, string>(filename =>
{
    // read the data file (takes a long time!)
    Console.WriteLine("Loading from " + filename);
    Thread.Sleep(2000);
    // return our result, for now just use the filename
    return filename + "_data";
});

TransformBlock<string, string> prodcessData = new TransformBlock<string, string>(data =>
{
    // process the data
    Console.WriteLine("Processiong data " + data);
    Thread.Sleep(2000);
    // return our result, for now just use the data string
    return data + "_processed";
});

TransformBlock<string, string> logProcessComplete = new TransformBlock<string, string>(data =>
{
    // Doesn't do anything to the data, just performs an 'action' (but still passes the data along, unlike ActionBlock)
    Console.WriteLine("Result " + data + " complete");
    return data;
});
I'm linking them together like this:
// create a pipeline
loadDataFromFile.LinkTo(prodcessData);
prodcessData.LinkTo(logProcessComplete);
I've been trying to follow this tutorial.
My confusion is that in the tutorial this pipeline seems to be a 'fire once' operation: it creates the pipeline, fires it off once, and it completes. This seems counter to how the Dataflow library is designed to be used; I've read:
The usual way of using TPL Dataflow is to create all the blocks, link
them together, and then start putting data in one end.
From "Concurrency in C# Cookbook" by Stephen Cleary.
But I'm not sure how to track the data after I've put said data 'in one end'. I need to be able to get the processed data from multiple parts of the program, say the user presses two buttons, one to get the data from "File1" and do something with it, one to get the data from "File2", I'd need something like this I think:
public async Task loadFile1ButtonPress()
{
    loadDataFromFile.Post("File1");
    var data = await logProcessComplete.ReceiveAsync();
    Console.WriteLine($"Got data1: {data}");
}

public async Task loadFile2ButtonPress()
{
    loadDataFromFile.Post("File2");
    var data = await logProcessComplete.ReceiveAsync();
    Console.WriteLine($"Got data2: {data}");
}
If these are performed 'synchronously' it works just fine, as there's only one piece of information flowing through the pipeline:
Console.WriteLine("waiting for File 1");
await loadFile1ButtonPress();
Console.WriteLine("waiting for File 2");
await loadFile2ButtonPress();
Console.WriteLine("Done");
Produces the expected output:
waiting for File 1
Loading from File1
Processiong data File1_data
Result File1_data_processed complete
Got data1: File1_data_processed
waiting for File 2
Loading from File2
Processiong data File2_data
Result File2_data_processed complete
Got data2: File2_data_processed
Done
This makes sense to me; it's just doing them one at a time.
However, the point is I want to run these operations in parallel and asynchronously. If I simulate this (say, the user pressing both 'buttons' in quick succession) with:
Console.WriteLine("waiting");
await Task.WhenAll(loadFile1ButtonPress(), loadFile2ButtonPress());
Console.WriteLine("Done");
Does this work if the second operation takes longer than the first?
I was expecting both to return the first data, however. (Originally this didn't work, but that was a bug I've since fixed; it does return the correct items now.)
I was thinking I could link an ActionBlock<string> to perform the action with the data, something like:
public async Task loadFile1ButtonPress()
{
    loadDataFromFile.Post("File1");
    // instead of var data = await logProcessComplete.ReceiveAsync();
    logProcessComplete.LinkTo(new ActionBlock<string>(data =>
    {
        Console.WriteLine($"Got data1: {data}");
    }));
}
But this changes the pipeline completely; now loadFile2ButtonPress won't work at all, as it's using that same pipeline.
Can I create multiple pipelines with the same blocks? Or should I be creating a whole new pipeline (and new blocks) for each 'operation'? That seems to defeat the point of using the Dataflow library at all.
Not sure if Stack Overflow is the best place for this or somewhere like Code Review; it might be a bit subjective.
If you need some events to happen after some data has been processed, you should expose your last block with AsObservable and add a small amount of code with Rx.Net:
var observable = logProcessComplete.AsObservable();
var subscription = observable.Subscribe(i => Console.WriteLine(i));
As was said in the comments, you can link your blocks to more than one target block, with a predicate. Note that in that case a message will be delivered only to the first matching block. You may also create a BroadcastBlock, which delivers a copy of the message to each linked block.
Make sure that messages not wanted by any other block are linked to NullTarget, as otherwise they will stay in your pipeline forever and will block its completion.
Check that your pipeline handles completion correctly, as in the case of multiple links the completion is also propagated only to the first linked block.
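Putting those points together, a rough sketch of the wiring (block names reuse the ones from the question; file1Handler and file2Handler are assumed ActionBlock<string> consumers, not part of the original code):
// Illustrative wiring: propagate completion down the chain, route messages by
// predicate, and discard anything unmatched so it can't clog the pipeline.
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

loadDataFromFile.LinkTo(prodcessData, linkOptions);
prodcessData.LinkTo(logProcessComplete, linkOptions);

// Route results to different consumers based on the data itself.
logProcessComplete.LinkTo(file1Handler, linkOptions, data => data.StartsWith("File1"));
logProcessComplete.LinkTo(file2Handler, linkOptions, data => data.StartsWith("File2"));

// Anything no other link accepts goes to NullTarget instead of sitting in the block.
logProcessComplete.LinkTo(DataflowBlock.NullTarget<string>());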
