How to process directory files with the Task Parallel Library? - C#

I have a scenario in which I have to process multiple files (e.g. 30) in parallel, based on the number of processor cores. I have to assign these files to separate tasks based on the number of cores, but I don't know how to set the start and end limits of the range each task should process, so that every task knows how many files it has to handle.
private void ProcessFiles(object e)
{
    try
    {
        var directoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
        var FilePaths = Directory.EnumerateFiles(directoryPath);
        int numCores = System.Environment.ProcessorCount;
        int NoOfTasks = FilePaths.Count() > numCores ? (FilePaths.Count() / numCores) : FilePaths.Count();
        for (int i = 0; i < NoOfTasks; i++)
        {
            Task.Factory.StartNew(() =>
            {
                // This is the part I am stuck on: how should each task
                // get its own start and end index?
                int startIndex = 0, endIndex = 0;
                for (int Count = startIndex; Count < endIndex; Count++)
                {
                    this.ProcessFile(FilePaths);
                }
            });
        }
    }
    catch (Exception ex)
    {
        throw;
    }
}

For problems such as yours, there are concurrent data structures available in C#. You want to use a BlockingCollection<string> and store all the file names in it.
Your idea of statically dividing the files among a calculated number of tasks is not a good one. Why? Because ProcessFile() may not take the same time for each file. It is better to start as many tasks as you have cores, and then let each task take file names one by one from the BlockingCollection and process them, until the collection is empty.
try
{
    var directoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
    var filePaths = CreateBlockingCollection(directoryPath);

    // Start the same number of tasks as there are cores (assuming #files > #cores)
    int taskCount = System.Environment.ProcessorCount;
    for (int i = 0; i < taskCount; i++)
    {
        Task.Factory.StartNew(() =>
        {
            string fileName;
            while (!filePaths.IsCompleted)
            {
                // TryTake returns false if nothing is available yet; just retry.
                if (!filePaths.TryTake(out fileName)) continue;
                this.ProcessFile(fileName);
            }
        });
    }
}
catch (Exception)
{
    throw; // handle/log as in the original method
}
And the CreateBlockingCollection() would be as follows:
private BlockingCollection<string> CreateBlockingCollection(string path)
{
    var allFiles = Directory.EnumerateFiles(path);
    var filePaths = new BlockingCollection<string>(allFiles.Count());
    foreach (var fileName in allFiles)
    {
        filePaths.Add(fileName);
    }
    filePaths.CompleteAdding();
    return filePaths;
}
You will have to modify your ProcessFile() to receive a file name now instead of taking all the file paths and processing its chunk.
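For instance, a minimal sketch of the reshaped method; the body shown here is hypothetical, keep whatever per-file work you had before:
private void ProcessFile(string filePath)
{
    // Whatever was previously done per file, now scoped to a single path.
    var contents = File.ReadAllText(filePath); // hypothetical per-file work
    // ...
}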
The advantage of this approach is that now your CPU won't be over- or under-subscribed, and the load will be evenly balanced too.
I haven't run the code myself, so there might be some syntax error in my code. Feel free to correct the error, if you come across any.

Based on my admittedly limited understanding of the TPL, I think your code could be rewritten like this:
private void ProcessFiles(object e)
{
    try
    {
        var directoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
        var filePaths = Directory.EnumerateFiles(directoryPath);
        Parallel.ForEach(filePaths, path => this.ProcessFile(path));
    }
    catch (Exception)
    {
        throw;
    }
}
Regards


System.AggregateException at System.Threading.Tasks.TaskExceptionHolder.Finalize()

Update:
Adding TaskCreationOptions.LongRunning solved the issue, but is this a good approach? If not, what is the best solution to get past this exception?
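For reference, the change described is presumably along these lines, using the full StartNew overload so the option can be passed:
tasks[i] = Task<List<MyBusinessObject>>.Factory.StartNew(
    MyDelegate, dto,
    CancellationToken.None,
    TaskCreationOptions.LongRunning, // hint the scheduler to use a dedicated thread
    TaskScheduler.Default);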
There is an issue I am trying to troubleshoot. I have implemented the suggestions that were provided on Stack Overflow, but those have not helped solve it. I have tried other alternatives, like a ContinueWith continuation instead of Task.WaitAll, by attaching the extension method shown below. This did not help either.
I have used ex.Handle { } and I have tried rethrowing inside catch (AggregateException ex), but that did not help me catch the actual exception.
I only have .NET 4.0 installed, so I cannot try the .NET 4.5 resolution for this.
The exception that I keep getting is:
"System.AggregateException: The task's exception was not observed either by waiting on the task or accessing the Exception property."
After this it simply kills the worker process and the app crashes, and I see an entry in the Event Viewer.
Any help here will be appreciated.
We have the below code:
Task<List<MyBusinessObject>>[] tasks = new Task<List<MyBusinessObject>>[MyCollection.Count];
for (int i = 0; i < MyCollection.Count; i++)
{
    MyDTO dto = new MyDTO();
    // Some property assignment for the MyDTO object
    tasks[i] = Task<List<MyBusinessObject>>.Factory.StartNew(MyDelegate, dto);
}
try
{
    Task.WaitAll(tasks);
}
catch (AggregateException e)
{
    AddToLogFile("Exceptions thrown by WaitAll(): ");
    for (int j = 0; j < e.InnerExceptions.Count; j++)
    {
        AddToLogFile(e.InnerExceptions[j].ToString());
    }
}
catch (Exception ex)
{
    AddToLogFile(ex.Message);
}
Second Alternative
public static class Extensions
{
    public static Task<List<MyBusinessObject>> LogExceptions(this Task<List<MyBusinessObject>> task)
    {
        task.ContinueWith(t =>
        {
            var aggException = t.Exception.Flatten();
            foreach (var exception in aggException.InnerExceptions)
            {
                AddToLogFile("Task Exception: " + exception.Message);
            }
        },
        TaskContinuationOptions.OnlyOnFaulted);
        return task; // return the original task so the call can be chained
    }
}
// In a different class, call the extension method after starting the new tasks:
Task<List<MyBusinessObject>>[] tasks = new Task<List<MyBusinessObject>>[MyCollection.Count];
for (int i = 0; i < MyCollection.Count; i++)
{
    MyDTO dto = new MyDTO();
    // Some property assignment for the MyDTO object
    tasks[i] = Task<List<MyBusinessObject>>.Factory.StartNew(MyDelegate, dto).LogExceptions();
}
Instead of creating all the tasks at once, I created 100 tasks at a time. I then waited until those tasks were complete and spawned another 100 tasks, until all the work was finished:
int count = MyCollection.Count();
int nbrOfTasks = 100; // batch size: how many tasks to run at a time
Task<List<MyBusinessObject>>[] tasks;
int i = 0;
int j;
while (i < count)
{
    j = 0;
    tasks = new Task<List<MyBusinessObject>>[nbrOfTasks];
    foreach (var p in MyCollection.Skip(i).Take(nbrOfTasks))
    {
        MyRequestDto dto = new MyRequestDto();
        // Some property assignment
        tasks[j] = Task<List<MyBusinessObject>>.Factory.StartNew(MyDelegate, dto);
        i++;
        j++;
    }
    try
    {
        // Wait for all the tasks in this batch to finish.
        if (tasks != null && tasks.Length > 0)
        {
            tasks = tasks.Where(t => t != null).ToArray();
            Task.WaitAll(tasks);
        }
    }
    catch (AggregateException e)
    {
        // Log e.InnerExceptions here so the exceptions count as observed.
    }
}
This has to do with the memory available to service multiple tasks. If the required RAM is not available to service the concurrent tasks along the way, the app will just crash. That is why it is best to spawn a limited number of tasks at a time, instead of spawning them all and leaving it to the thread pool to manage.
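A side note: on .NET 4.0, another way to cap concurrency is Parallel.ForEach with MaxDegreeOfParallelism. A minimal sketch, assuming MyDelegate can simply be invoked per item and reusing the names from the code above:
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 }; // cap on concurrent operations
Parallel.ForEach(MyCollection, options, item =>
{
    MyDTO dto = new MyDTO();
    // Some property assignment for the MyDTO object, presumably from 'item'
    List<MyBusinessObject> result = MyDelegate(dto);
    // Parallel.ForEach observes exceptions itself and rethrows them as a single
    // AggregateException from the call, so nothing goes unobserved.
});
This keeps a bounded number of operations in flight without hand-rolled batching.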

C# WPF Speedup (Thread) Total FileInfo.Length from Multiple Files

I'm trying to speed up the sum calculation of the sizes of all files in all folders, recursively, for a given path.
Let's say I choose "E:\" as the folder.
I get the entire recursive file list via "SafeFileEnumerator" into an IEnumerable in milliseconds (works like a charm).
Now I would like to gather the sum of all bytes of all files in this enumerable.
Right now I loop over them via foreach and get FileInfo(oFileInfo.FullName).Length for each file.
This works, but it is slow - it takes about 30 seconds. If I look up the space consumption via right-click - Properties on all selected folders in Windows Explorer, I get it in about 6 seconds (~1600 files, 26 gigabytes of data, on an SSD).
So my first thought was to speed up the gathering by using threads, but I don't get any speedup here.
The code without the threads is below:
public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;
    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }
        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }
    return FolderSize;
}
The multithreaded code is below:
public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    // NOTE: "FolderSize +=" runs on multiple threads below without any
    // synchronization, and iCountTasks++/-- is racy too, so this version
    // can produce wrong results in addition to not being faster.
    long FolderSize = 0;
    int iCountTasks = 0;
    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }
        if (iCountTasks < 10)
        {
            iCountTasks++;
            Thread oThread = new Thread(delegate()
            {
                try
                {
                    FolderSize += new FileInfo(oFileInfo.FullName).Length;
                }
                catch (Exception oException)
                {
                    Debug.WriteLine(oException.Message);
                }
                iCountTasks--;
            });
            oThread.Start();
            continue;
        }
        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }
    return FolderSize;
}
Could someone please give me advice on how I could speed up the folder-size calculation?
Kind regards
Edit 1 (Parallel.ForEach suggestion - see comments)
public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;
    ParallelOptions oParallelOptions = new ParallelOptions();
    oParallelOptions.CancellationToken = oCancelToken.Token;
    oParallelOptions.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray();
    Parallel.ForEach(aFiles, oParallelOptions, oFileInfo =>
    {
        try
        {
            // Interlocked.Add keeps the shared sum correct across parallel iterations;
            // a plain += here would be a data race.
            Interlocked.Add(ref FolderSize, new FileInfo(oFileInfo.FullName).Length);
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    });
    return FolderSize;
}
Side-note about SafeFileEnumerator performance:
Once you get an IEnumerable, it doesn't mean you have the entire collection, because it is a lazy proxy. Try the snippet below - I'm sure you'll see the performance difference (sorry if it doesn't compile - it's just to illustrate the idea):
var tmp = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray(); // fetch all records explicitly to populate the array
IEnumerable<FileSystemInfo> aFiles = tmp;
Now, about the actual result you want to achieve:
If you just need file sizes, it's better to ask the OS about the file system rather than querying files one by one. I'd start with the DirectoryInfo class (see for instance http://www.tutorialspoint.com/csharp/csharp_windows_file_system.htm).
If you need to calculate a checksum for each file, it will definitely be a slow task, because you have to load each file first (a lot of memory transfers). Threads are not a booster here, because they'll be limited by the OS filesystem throughput, not by your CPU power.
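For the just-file-sizes case, a minimal sketch of that idea (note that, unlike SafeFileEnumerator, this will throw on folders the process cannot access):
public static long GetFolderSize(string folder)
{
    // Let the framework walk the tree and hand back FileInfo objects directly,
    // instead of constructing a new FileInfo per path by hand.
    return new DirectoryInfo(folder)
        .EnumerateFiles("*", SearchOption.AllDirectories)
        .Sum(fileInfo => fileInfo.Length);
}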
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.IO;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            long size = fetchFolderSize(@"C:\Test", new CancellationTokenSource());
        }

        public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
        {
            ParallelOptions po = new ParallelOptions();
            po.CancellationToken = oCancelToken.Token;
            po.MaxDegreeOfParallelism = System.Environment.ProcessorCount;
            long folderSize = 0;

            string[] files = Directory.GetFiles(Folder);
            Parallel.ForEach<string, long>(files,
                po,
                () => 0,
                (fileName, loop, fileSize) =>
                {
                    // Accumulate into the thread-local subtotal; overwriting it
                    // would drop all but the last file of each partition.
                    fileSize += new FileInfo(fileName).Length;
                    po.CancellationToken.ThrowIfCancellationRequested();
                    return fileSize;
                },
                (finalResult) => Interlocked.Add(ref folderSize, finalResult)
            );

            string[] subdirEntries = Directory.GetDirectories(Folder);
            Parallel.For<long>(0, subdirEntries.Length, () => 0, (i, loop, subtotal) =>
            {
                // Skip reparse points (junctions/symlinks) to avoid cycles and double counting.
                if ((File.GetAttributes(subdirEntries[i]) & FileAttributes.ReparsePoint) !=
                    FileAttributes.ReparsePoint)
                {
                    subtotal += fetchFolderSize(subdirEntries[i], oCancelToken);
                }
                return subtotal;
            },
            (finalResult) => Interlocked.Add(ref folderSize, finalResult)
            );
            return folderSize;
        }
    }
}

How to pass different instances while multithreading?

I am building a scraper. My goal is to start X browsers (where X is the number of threads) and have them scrape a list of URLs together, splitting that list into X parts.
Say I decide to use 3 threads (3 browsers) with a list of 10 URLs.
Question: how do I divide the work between the browsers like this:
Browser1 scrapes items in the list from 0 to 3
Browser2 scrapes items in the list from 4 to 7
Browser3 scrapes items in the list from 8 to 9
All browsers should be working at the same time, scraping the passed list of URLs.
I already have this BlockingCollection:
BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();

public Multithreading(int workerCount)
{
    // Create and start a separate Task for each consumer:
    for (int i = 0; i < workerCount; i++)
        Task.Factory.StartNew(Consume);
}

public void Dispose() { _taskQ.CompleteAdding(); }

public void EnqueueTask(Action action) { _taskQ.Add(action); }

void Consume()
{
    // This sequence that we're enumerating will block when no elements
    // are available and will end when CompleteAdding is called.
    foreach (Action action in _taskQ.GetConsumingEnumerable())
        action(); // Perform task.
}

public int ItemsCount()
{
    return _taskQ.Count;
}
It can be used like this:
Multithreading multithread = new Multithreading(3); // 3 threads
foreach (string url in urlList)
{
    multithread.EnqueueTask(new Action(() =>
    {
        startScraping(browser1); // or browser2 or browser3
    }));
}
I need to create the browser instances before scraping, because I do not want to start a new browser with every thread.
Taking Henk Holterman's comment into account that you may want maximum speed, i.e. to keep the browsers busy as much as possible, use this:
private static void StartScraping(int id, IEnumerable<Uri> urls)
{
    // Construct browser here
    foreach (Uri url in urls)
    {
        // Use browser to process url here
        Console.WriteLine("Browser {0} is processing url {1}", id, url);
    }
}
in main:
int nrWorkers = 3;
int nrUrls = 10;
BlockingCollection<Uri> taskQ = new BlockingCollection<Uri>();
foreach (int i in Enumerable.Range(0, nrWorkers))
{
    Task.Run(() => StartScraping(i, taskQ.GetConsumingEnumerable()));
}
foreach (int i in Enumerable.Range(0, nrUrls))
{
    taskQ.Add(new Uri(String.Format("http://Url{0}", i)));
}
taskQ.CompleteAdding();
I think the usual approach is to have a single blocking queue, a provider thread and an arbitrary pool of workers.
The provider thread is responsible for adding URLs to the queue. It blocks when there are none to add.
A worker thread instantiates a browser, and then retrieves a single URL from the queue, scrapes it and then loops back for more. It blocks when the queue is empty.
You can start as many workers as you like, and they just sort it out between them.
The mainline starts all the threads and retires to the sidelines. It looks after the UI, if there is one.
Multithreading can be really hard to debug. You might want to look at using Tasks for at least part of the job.
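A minimal sketch of that shape, where Browser, startScraping and urlList are hypothetical stand-ins for the real types and calls:
BlockingCollection<string> urlQueue = new BlockingCollection<string>();

// Workers: each owns one browser and pulls urls until the queue is drained.
Task[] workers = Enumerable.Range(0, 3).Select(workerId => Task.Factory.StartNew(() =>
{
    var browser = new Browser(); // hypothetical browser type, one per worker
    foreach (string url in urlQueue.GetConsumingEnumerable())
    {
        startScraping(browser, url); // hypothetical scraping call
    }
})).ToArray();

// Provider: add urls, then signal completion so the workers can exit.
foreach (string url in urlList)
{
    urlQueue.Add(url);
}
urlQueue.CompleteAdding();
Task.WaitAll(workers);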
You could give an Id to each task and also to each worker. Then you'll have a BlockingCollection<Action>[] instead of just a BlockingCollection<Action>. Every consumer will consume from its own BlockingCollection in the array. Our job is then to find the right consumer and post the job to it.
BlockingCollection<Action>[] _taskQ;
private int taskCounter = -1;

public Multithreading(int workerCount)
{
    _taskQ = new BlockingCollection<Action>[workerCount];
    for (int i = 0; i < workerCount; i++)
    {
        int workerId = i; // To avoid closure issue
        _taskQ[workerId] = new BlockingCollection<Action>();
        Task.Factory.StartNew(() => Consume(workerId));
    }
}

public void EnqueueTask(Action action)
{
    int value = Interlocked.Increment(ref taskCounter);
    int index = value / 4; // Your own logic to find the index here
    _taskQ[index].Add(action);
}

void Consume(int workerId)
{
    foreach (Action action in _taskQ[workerId].GetConsumingEnumerable())
        action(); // Perform task.
}
A simple solution using background workers can limit the number of threads:
public class Scraper : IDisposable
{
    // Initialize the fields here; the original left them unassigned,
    // which would throw a NullReferenceException in the constructor.
    private readonly BlockingCollection<Action> tasks = new BlockingCollection<Action>();
    private readonly IList<BackgroundWorker> workers = new List<BackgroundWorker>();

    public Scraper(IList<Uri> urls, int numberOfThreads)
    {
        for (var i = 0; i < urls.Count; i++)
        {
            var url = urls[i];
            tasks.Add(() => Scrape(url));
        }
        for (var i = 0; i < numberOfThreads; i++)
        {
            var worker = new BackgroundWorker();
            worker.DoWork += (sender, args) =>
            {
                Action task;
                while (tasks.TryTake(out task))
                {
                    task();
                }
            };
            workers.Add(worker);
            worker.RunWorkerAsync();
        }
    }

    public void Scrape(Uri url)
    {
        Console.WriteLine("Scraping url {0}", url);
    }

    public void Dispose()
    {
        throw new NotImplementedException();
    }
}

Use Task.Run instead of Delegate.BeginInvoke

I have recently upgraded my projects to ASP.NET 4.5 and I have been waiting a long time to use 4.5's asynchronous capabilities. After reading the documentation I'm not sure whether I can improve my code at all.
I want to execute a task asynchronously and then forget about it. The way that I'm currently doing this is by creating delegates and then using BeginInvoke.
Here's one of the filters in my project which creates an audit in our database every time a user accesses a resource that must be audited:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
    var request = filterContext.HttpContext.Request;
    var id = WebSecurity.CurrentUserId;
    var invoker = new MethodInvoker(delegate
    {
        var audit = new Audit
        {
            Id = Guid.NewGuid(),
            IPAddress = request.UserHostAddress,
            UserId = id,
            Resource = request.RawUrl,
            Timestamp = DateTime.UtcNow
        };
        var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
        database.Audits.InsertOrUpdate(audit);
        database.Save();
    });
    invoker.BeginInvoke(StopAsynchronousMethod, invoker);
    base.OnActionExecuting(filterContext);
}
But in order to finish this asynchronous task, I always need to define a callback, which looks like this:
public void StopAsynchronousMethod(IAsyncResult result)
{
    var state = (MethodInvoker)result.AsyncState;
    try
    {
        state.EndInvoke(result);
    }
    catch (Exception e)
    {
        var username = WebSecurity.CurrentUserName;
        Debugging.DispatchExceptionEmail(e, username);
    }
}
I would rather not use the callback at all due to the fact that I do not need a result from the task that I am invoking asynchronously.
How can I improve this code with Task.Run() (or async and await)?
If I understood your requirements correctly, you want to kick off a task and then forget about it. When the task completes, and if an exception occurred, you want to log it.
I'd use Task.Run to create a task, followed by ContinueWith to attach a continuation task. This continuation task will log any exception that was thrown from the parent task. Also, use TaskContinuationOptions.OnlyOnFaulted to make sure the continuation only runs if an exception occurred.
Task.Run(() =>
{
    var audit = new Audit
    {
        Id = Guid.NewGuid(),
        IPAddress = request.UserHostAddress,
        UserId = id,
        Resource = request.RawUrl,
        Timestamp = DateTime.UtcNow
    };
    var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
    database.Audits.InsertOrUpdate(audit);
    database.Save();
}).ContinueWith(task =>
{
    task.Exception.Handle(ex =>
    {
        var username = WebSecurity.CurrentUserName;
        Debugging.DispatchExceptionEmail(ex, username);
        return true; // Handle takes a Func<Exception, bool>; true marks the exception as handled
    });
}, TaskContinuationOptions.OnlyOnFaulted);
As a side-note, background tasks and fire-and-forget scenarios in ASP.NET are highly discouraged. See The Dangers of Implementing Recurring Background Tasks In ASP.NET
It may sound a bit out of scope, but if you just want to fire and forget after you launch it, why not use the ThreadPool directly?
Something like:
ThreadPool.QueueUserWorkItem(
    x =>
    {
        try
        {
            // Do something
            ...
        }
        catch (Exception e)
        {
            // Log something
            ...
        }
    });
I had to do some performance benchmarking of different async call methods, and I found that (not surprisingly) the ThreadPool works much better, but also that, actually, BeginInvoke is not that bad (I am on .NET 4.5). That's what I found out with the code at the end of this post; I did not find something like this online, so I took the time to check it myself. Each call is not exactly equal, but all of them are more or less functionally equivalent in terms of what they do:
ThreadPool: 70.80ms
Task: 90.88ms
BeginInvoke: 121.88ms
Thread: 4657.52ms
public class Program
{
    public delegate void ThisDoesSomething();

    // Perform a very simple operation to see the overhead of
    // different async call types.
    public static void Main(string[] args)
    {
        const int repetitions = 25;
        const int calls = 1000;
        var results = new List<Tuple<string, double>>();
        Console.WriteLine(
            "{0} parallel calls, {1} repetitions for better statistics\n",
            calls,
            repetitions);

        // Threads
        Console.Write("Running Threads");
        results.Add(new Tuple<string, double>("Threads", RunOnThreads(repetitions, calls)));
        Console.WriteLine();

        // BeginInvoke
        Console.Write("Running BeginInvoke");
        results.Add(new Tuple<string, double>("BeginInvoke", RunOnBeginInvoke(repetitions, calls)));
        Console.WriteLine();

        // Tasks
        Console.Write("Running Tasks");
        results.Add(new Tuple<string, double>("Tasks", RunOnTasks(repetitions, calls)));
        Console.WriteLine();

        // Thread Pool
        Console.Write("Running Thread pool");
        results.Add(new Tuple<string, double>("ThreadPool", RunOnThreadPool(repetitions, calls)));
        Console.WriteLine();
        Console.WriteLine();

        // Show results
        results = results.OrderBy(rs => rs.Item2).ToList();
        foreach (var result in results)
        {
            Console.WriteLine(
                "{0}: Done in {1}ms avg",
                result.Item1,
                (result.Item2 / repetitions).ToString("0.00"));
        }
        Console.WriteLine("Press a key to exit");
        Console.ReadKey();
    }

    /// <summary>
    /// The do stuff.
    /// </summary>
    public static void DoStuff()
    {
        Console.Write("*");
    }

    public static double RunOnThreads(int repetitions, int calls)
    {
        var totalMs = 0.0;
        for (var j = 0; j < repetitions; j++)
        {
            Console.Write(".");
            var toProcess = calls;
            var stopwatch = new Stopwatch();
            var resetEvent = new ManualResetEvent(false);
            var threadList = new List<Thread>();
            for (var i = 0; i < calls; i++)
            {
                threadList.Add(new Thread(() =>
                {
                    // Do something
                    DoStuff();
                    // Safely decrement the counter
                    if (Interlocked.Decrement(ref toProcess) == 0)
                    {
                        resetEvent.Set();
                    }
                }));
            }
            stopwatch.Start();
            foreach (var thread in threadList)
            {
                thread.Start();
            }
            resetEvent.WaitOne();
            stopwatch.Stop();
            totalMs += stopwatch.ElapsedMilliseconds;
        }
        return totalMs;
    }

    public static double RunOnThreadPool(int repetitions, int calls)
    {
        var totalMs = 0.0;
        for (var j = 0; j < repetitions; j++)
        {
            Console.Write(".");
            var toProcess = calls;
            var resetEvent = new ManualResetEvent(false);
            var stopwatch = new Stopwatch();
            var list = new List<int>();
            for (var i = 0; i < calls; i++)
            {
                list.Add(i);
            }
            stopwatch.Start();
            for (var i = 0; i < calls; i++)
            {
                ThreadPool.QueueUserWorkItem(
                    x =>
                    {
                        // Do something
                        DoStuff();
                        // Safely decrement the counter
                        if (Interlocked.Decrement(ref toProcess) == 0)
                        {
                            resetEvent.Set();
                        }
                    },
                    list[i]);
            }
            resetEvent.WaitOne();
            stopwatch.Stop();
            totalMs += stopwatch.ElapsedMilliseconds;
        }
        return totalMs;
    }

    public static double RunOnBeginInvoke(int repetitions, int calls)
    {
        var totalMs = 0.0;
        for (var j = 0; j < repetitions; j++)
        {
            Console.Write(".");
            var beginInvokeStopwatch = new Stopwatch();
            var delegateList = new List<ThisDoesSomething>();
            var resultsList = new List<IAsyncResult>();
            for (var i = 0; i < calls; i++)
            {
                delegateList.Add(DoStuff);
            }
            beginInvokeStopwatch.Start();
            foreach (var delegateToCall in delegateList)
            {
                resultsList.Add(delegateToCall.BeginInvoke(null, null));
            }
            // We lose a bit of accuracy, but if the loop is big enough,
            // it should not really matter
            while (resultsList.Any(rs => !rs.IsCompleted))
            {
                Thread.Sleep(10);
            }
            beginInvokeStopwatch.Stop();
            totalMs += beginInvokeStopwatch.ElapsedMilliseconds;
        }
        return totalMs;
    }

    public static double RunOnTasks(int repetitions, int calls)
    {
        var totalMs = 0.0;
        for (var j = 0; j < repetitions; j++)
        {
            Console.Write(".");
            var resultsList = new List<Task>();
            var stopwatch = new Stopwatch();
            stopwatch.Start();
            for (var i = 0; i < calls; i++)
            {
                resultsList.Add(Task.Factory.StartNew(DoStuff));
            }
            // We lose a bit of accuracy, but if the loop is big enough,
            // it should not really matter
            while (resultsList.Any(task => !task.IsCompleted))
            {
                Thread.Sleep(10);
            }
            stopwatch.Stop();
            totalMs += stopwatch.ElapsedMilliseconds;
        }
        return totalMs;
    }
}
Here's one of the filters in my project which creates an audit in our database every time a user accesses a resource that must be audited
Auditing is certainly not something I would call "fire and forget". Remember, on ASP.NET, "fire and forget" means "I don't care whether this code actually executes or not". So, if your desired semantics are that audits may occasionally be missing, then (and only then) you can use fire and forget for your audits.
If you want to ensure your audits are all correct, then either wait for the audit save to complete before sending the response, or queue the audit information to reliable storage (e.g., Azure queue or MSMQ) and have an independent backend (e.g., Azure worker role or Win32 service) process the audits in that queue.
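For illustration, a minimal MSMQ enqueue sketch; the queue path is hypothetical, the Audit type would have to be serializable, and System.Messaging.dll must be referenced. A separate backend service would dequeue and persist these:
const string queuePath = @".\private$\audits"; // hypothetical queue
if (!MessageQueue.Exists(queuePath))
{
    MessageQueue.Create(queuePath);
}
using (var queue = new MessageQueue(queuePath))
{
    queue.Send(audit); // enqueue; a worker role / Win32 service drains the queue
}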
But if you want to live dangerously (accepting that occasionally audits may be missing), you can mitigate the problems by registering the work with the ASP.NET runtime. Using the BackgroundTaskManager from my blog:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
    var request = filterContext.HttpContext.Request;
    var id = WebSecurity.CurrentUserId;
    BackgroundTaskManager.Run(() =>
    {
        try
        {
            var audit = new Audit
            {
                Id = Guid.NewGuid(),
                IPAddress = request.UserHostAddress,
                UserId = id,
                Resource = request.RawUrl,
                Timestamp = DateTime.UtcNow
            };
            var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
            database.Audits.InsertOrUpdate(audit);
            database.Save();
        }
        catch (Exception e)
        {
            var username = WebSecurity.CurrentUserName;
            Debugging.DispatchExceptionEmail(e, username);
        }
    });
    base.OnActionExecuting(filterContext);
}

Producer Consumer model using TPL, Tasks in .net 4.0

I have a fairly large XML file (around 1-2 GB).
The requirement is to persist the XML data into a database.
Currently this is achieved in 3 steps:
Read the large file with as small a memory footprint as possible
Create entities from the XML data
Store the data from the created entities into the database using SqlBulkCopy
To achieve better performance I want to create a producer-consumer model, where the producer creates a set of entities, say a batch of 10K, and adds it to a queue, and the consumer takes the batch of entities from the queue and persists them to the database using SqlBulkCopy.
Thanks,
Gokul
void Main()
{
    int iCount = 0;
    string fileName = @"C:\Data\CatalogIndex.xml";
    DateTime startTime = DateTime.Now;
    Console.WriteLine("Start Time: {0}", startTime);
    FileInfo fi = new FileInfo(fileName);
    Console.WriteLine("File Size:{0} MB", fi.Length / 1048576.0);

    /* I want to change this loop to create a producer-consumer pattern here
       to process the data in parallel. */
    foreach (var element in StreamElements(fileName, "title"))
    {
        iCount++;
    }

    Console.WriteLine("Count: {0}", iCount);
    Console.WriteLine("End Time: {0}, Time Taken:{1}", DateTime.Now, DateTime.Now - startTime);
}

private static IEnumerable<XElement> StreamElements(string fileName, string elementName)
{
    using (var rdr = XmlReader.Create(fileName))
    {
        rdr.MoveToContent();
        while (!rdr.EOF)
        {
            if ((rdr.NodeType == XmlNodeType.Element) && (rdr.Name == elementName))
            {
                var e = XElement.ReadFrom(rdr) as XElement;
                yield return e;
            }
            else
            {
                rdr.Read();
            }
        }
        rdr.Close();
    }
}
Is this what you are trying to do?
void Main()
{
    const int inputCollectionBufferSize = 1024;
    const int bulkInsertBufferCapacity = 100;
    const int bulkInsertConcurrency = 4;

    BlockingCollection<object> inputCollection = new BlockingCollection<object>(inputCollectionBufferSize);

    Task loadTask = Task.Factory.StartNew(() =>
    {
        foreach (object nextItem in ReadAllElements(...))
        {
            // this will potentially block if there are already enough items
            inputCollection.Add(nextItem);
        }
        // mark this collection as done
        inputCollection.CompleteAdding();
    });

    Action parseAction = () =>
    {
        List<object> bulkInsertBuffer = new List<object>(bulkInsertBufferCapacity);
        foreach (object nextItem in inputCollection.GetConsumingEnumerable())
        {
            if (bulkInsertBuffer.Count == bulkInsertBufferCapacity)
            {
                CommitBuffer(bulkInsertBuffer);
                bulkInsertBuffer.Clear();
            }
            bulkInsertBuffer.Add(nextItem);
        }
        // commit whatever is left over once the collection is drained
        if (bulkInsertBuffer.Count > 0)
        {
            CommitBuffer(bulkInsertBuffer);
        }
    };

    List<Task> parseTasks = new List<Task>(bulkInsertConcurrency);
    for (int i = 0; i < bulkInsertConcurrency; i++)
    {
        parseTasks.Add(Task.Factory.StartNew(parseAction));
    }

    // wait before exiting
    loadTask.Wait();
    Task.WaitAll(parseTasks.ToArray());
}
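CommitBuffer() is left undefined above; here is a minimal sketch using SqlBulkCopy, assuming a hypothetical ToDataTable helper that projects the buffered entities into rows, plus a placeholder connection string and table name:
private static void CommitBuffer(List<object> buffer)
{
    DataTable table = ToDataTable(buffer); // hypothetical entity-to-rows projection
    using (var bulkCopy = new SqlBulkCopy("your connection string here"))
    {
        bulkCopy.DestinationTableName = "dbo.Titles"; // hypothetical target table
        bulkCopy.WriteToServer(table);
    }
}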
