ParallelForEach get Directories c#

ParallelForEach get Directories c# - c#

Here is what I've done so far, I don't know if it is the best way to realize this Parallel.ForEach, because sometimes it crashes and sometimes does not, can you guys please tell me what I'm doing wrong or what can I improve on this code?
Also I've got a problem with the StopWatch it's not showing correctly at all in my textbox, always stops after end the list of all directories....
private async void omplirParallel()
{
Stopwatch clock = new Stopwatch();
clock.Restart();
int contador = 0;
DirectoryInfo nodeDir = new DirectoryInfo(#"c:\files");
Parallel.ForEach(nodeDir.GetDirectories(), async dir =>
{
foreach (string s in Directory.GetFiles(dir.FullName))
{
Invoke(new MethodInvoker(delegate { lbxParallel.Items.Add(s); }));
contador++;
await Task.Delay(1);
}
});
await Task.Delay(1);
clock.Stop();
tbTimerParallel.Text = clock.Elapsed.TotalSeconds.ToString() + " segons";
tbcontadorParallel.Text = contador + " arxius";
}
EDIT
This is my ForEach without Parallel, the thing that I've tried is implement this code adapting with the ParallalelForEach
Stopwatch stopWatch = new Stopwatch();
foreach (string d in Directory.GetDirectories(#"C:\files"))
{
foreach (string s in Directory.GetFiles(d))
{
stopWatch.Start();
listBox1.Items.Add(s);
await Task.Delay(1);
btIniciar1.Enabled = false;
}
}
btIniciar1.Enabled = true;
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
textBox1.Text = ts.ToString("mm\\:ss\\.ff") + (" minuts");

Paralel, or threads may be crash if there is common objects(shared object), so you have to make sure that there is no shared object (e.g. if two of them tried to write, edit or delete the shared object it will crash, and if not it will not crash).

Related

Why my workers work distribution count does not total the number of produced items in this System.Threading.Channel sample?

Following this post, I have been playing with System.Threading.Channel to get confident enough and use it in my production code, replacing the Threads/Monitor.Pulse/Wait based approach I am currently using (described in the referred post).
Basically I created a sample with a bounded channel where I run a couple of producer tasks at the beginning and, without waiting, start my consumer tasks, which start pushing elements from the channel.
After waiting for the producers tasks to complete, I then signal the channel as complete, so the consumer tasks can quit listening to new channel elements.
My channel is a Channel<Action>, and in each action I increment the count for each given worker in the WorkDistribution concurrent dictionary, and at the end of the sample I print it so I can check I consumed as many items as I expected, and also how did the channel distributed the actions between the consumers.
For some reason this "Work Distribution footer" is not printing the same number of items as the total items produced by producer tasks.
What am I missing ?
Some of the variables present were added for the sole purpose of helping troubleshoot.
Here's the full code:
public class ChannelSolution
{
object LockObject = new object();
Channel<Action<string>> channel;
int ItemsToProduce;
int WorkersCount;
int TotalItemsProduced;
ConcurrentDictionary<string, int> WorkDistribution;
CancellationToken Ct;
public ChannelSolution(int workersCount, int itemsToProduce, int maxAllowedItems,
CancellationToken ct)
{
WorkersCount = workersCount;
ItemsToProduce = itemsToProduce;
channel = Channel.CreateBounded<Action<string>>(maxAllowedItems);
Console.WriteLine($"Created channel with max {maxAllowedItems} items");
WorkDistribution = new ConcurrentDictionary<string, int>();
Ct = ct;
}
async Task ProduceItems(int cycle)
{
for (var i = 0; i < ItemsToProduce; i++)
{
var index = i + 1 + (ItemsToProduce * cycle);
bool queueHasRoom;
var stopwatch = new Stopwatch();
stopwatch.Start();
do
{
if (Ct.IsCancellationRequested)
{
Console.WriteLine("exiting read loop - cancellation requested !");
break;
}
queueHasRoom = await channel.Writer.WaitToWriteAsync();
if (!queueHasRoom)
{
if (Ct.IsCancellationRequested)
{
Console.WriteLine("exiting read loop - cancellation"
+ " requested !");
break;
}
if (stopwatch.Elapsed.Seconds % 3 == 0)
Console.WriteLine("Channel reached maximum capacity..."
+ " producer waiting for items to be freed...");
}
}
while (!queueHasRoom);
channel.Writer.TryWrite((workerName) => action($"A{index}", workerName));
Console.WriteLine($"Channel has room, item {index} added"
+ $" - channel items count: [{channel.Reader.Count}]");
Interlocked.Increment(ref TotalItemsProduced);
}
}
List<Task> GetConsumers()
{
var tasks = new List<Task>();
for (var i = 0; i < WorkersCount; i++)
{
var workerName = $"W{(i + 1).ToString("00")}";
tasks.Add(Task.Run(async () =>
{
while (await channel.Reader.WaitToReadAsync())
{
if (Ct.IsCancellationRequested)
{
Console.WriteLine("exiting write loop - cancellation"
+ "requested !");
break;
}
if (channel.Reader.TryRead(out var action))
{
Console.WriteLine($"dequed action in worker [{workerName}]");
action(workerName);
}
}
}));
}
return tasks;
}
void action(string actionNumber, string workerName)
{
Console.WriteLine($"processing {actionNumber} in worker {workerName}...");
var secondsToWait = new Random().Next(2, 5);
Thread.Sleep(TimeSpan.FromSeconds(secondsToWait));
Console.WriteLine($"action {actionNumber} completed by worker {workerName}"
+ $" after {secondsToWait} secs! channel items left:"
+ $" [{channel.Reader.Count}]");
if (WorkDistribution.ContainsKey(workerName))
{
lock (LockObject)
{
WorkDistribution[workerName]++;
}
}
else
{
var succeeded = WorkDistribution.TryAdd(workerName, 1);
if (!succeeded)
{
Console.WriteLine($"!!! failed incremeting dic value !!!");
}
}
}
public void Summarize(Stopwatch stopwatch)
{
Console.WriteLine("--------------------------- Thread Work Distribution "
+ "------------------------");
foreach (var kv in this.WorkDistribution)
Console.WriteLine($"thread: {kv.Key} items consumed: {kv.Value}");
Console.WriteLine($"Total actions consumed: "
+ $"{WorkDistribution.Sum(w => w.Value)} - Elapsed time: "
+ $"{stopwatch.Elapsed.Seconds} secs");
}
public void Run(int producerCycles)
{
var stopwatch = new Stopwatch();
stopwatch.Start();
var producerTasks = new List<Task>();
Console.WriteLine($"Started running at {DateTime.Now}...");
for (var i = 0; i < producerCycles; i++)
{
producerTasks.Add(ProduceItems(i));
}
var consumerTasks = GetConsumers();
Task.WaitAll(producerTasks.ToArray());
Console.WriteLine($"-------------- Completed waiting for PRODUCERS -"
+ " total items produced: [{TotalItemsProduced}] ------------------");
channel.Writer.Complete(); //just so I can complete this demo
Task.WaitAll(consumerTasks.ToArray());
Console.WriteLine("----------------- Completed waiting for CONSUMERS "
+ "------------------");
//Task.WaitAll(GetConsumers().Union(producerTasks/*.Union(
// new List<Task> { taskKey })*/).ToArray());
//Console.WriteLine("Completed waiting for tasks");
Summarize(stopwatch);
}
}
And here is the calling code in Program.cs
var workersCount = 5;
var itemsToProduce = 10;
var maxItemsInQueue = 5;
var cts = new CancellationTokenSource();
var producerConsumerTests = new ProducerConsumerTests(workersCount, itemsToProduce,
maxItemsInQueue, cts.Token);
producerConsumerTests.Run(2);

From a quick look there is a race condition in the ProduceItems method, around the queueHasRoom variable. You don't need this variable. The channel.Writer.TryWrite method will tell you whether there is room in the channel's buffer or not. Alternatively you could simply await the WriteAsync method, instead of using the WaitToWriteAsync/TryWrite combo. AFAIK this combo is intended as a performance optimization of the former method. If you absolutely need to know whether there is available space before attempting to post a value, then the Channel<T> is probably not a suitable container for your use case. You'll need to find something that can be locked during the whole operation of "check-for-available-space -> create-the-value -> post-the-value", so that this operation can be made atomic.
As a side note, using a lock to protect the updating of the ConcurrentDictionary is redundant. The ConcurrentDictionary offers the AddOrUpdate method, that can replace atomically a value it contains with another value. You may had to lock if the dictionary contained mutable objects, and you needed to mutate that objects with thread-safety. But in your case the values are of type Int32, which is an immutable struct. You don't change it, you just replace it with a new Int32, which is created based on the existing value:
WorkDistribution.AddOrUpdate(workerName, 1, (_, existing) => existing + 1);

Creating Tasks dynamically and wait for completion (C#)

In my C# project I have to open a bunch of images.
Let's say we need to open 50. My plan is to create 10 Tasks, do some stuff, and then wait for each to complete before the next 10 Tasks are created.
var fd = new OpenFileDialog
{
Multiselect = true,
Title = "Open Image",
Filter = "Image|*.jpg"
};
using (fd)
{
if (fd.ShowDialog() == DialogResult.OK)
{
int i = 1;
foreach (String file in fd.FileNames)
{
if (i <= 10) {
i++;
Console.WriteLine(i + ";" + file);
Task task = new Task(() =>
{
// do some stuff
});
task.Start();
}
else
{
Task.WaitAll();
i = 1;
}
}
}
}
Console.WriteLine("Wait for Tasks");
Task.WaitAll();
Console.WriteLine("Waited);
The Code is not waiting when i=10 and at the end it is also not waiting.
Does anyone have an idea how to fix it?

Task.WaitAll expects a Task array to wait, you never pass anything in. The following change will wait all the tasks you start.
List<Task> tasksToWait = new List<Task>();
foreach (String file in fd.FileNames)
{
if (i <= 10) {
i++;
Console.WriteLine(i + ";" + file);
Task task = new Task(() =>
{
// do some stuff
});
task.Start();
tasksToWait.Add(task);
}
else
{
Task.WaitAll(tasksToWait.ToArray());
tasksToWait.Clear();
i = 1;
}
}
This is a code fragment from your code above that has changes
Note This answer does not contain a critique on your choice of design and the possible pitfalls thereof.

Use Task.Run instead of Delegate.BeginInvoke

I have recently upgraded my projects to ASP.NET 4.5 and I have been waiting a long time to use 4.5's asynchronous capabilities. After reading the documentation I'm not sure whether I can improve my code at all.
I want to execute a task asynchronously and then forget about it. The way that I'm currently doing this is by creating delegates and then using BeginInvoke.
Here's one of the filters in my project with creates an audit in our database every time a user accesses a resource that must be audited:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
var invoker = new MethodInvoker(delegate
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
});
invoker.BeginInvoke(StopAsynchronousMethod, invoker);
base.OnActionExecuting(filterContext);
}
But in order to finish this asynchronous task, I need to always define a callback, which looks like this:
public void StopAsynchronousMethod(IAsyncResult result)
{
var state = (MethodInvoker)result.AsyncState;
try
{
state.EndInvoke(result);
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
}
I would rather not use the callback at all due to the fact that I do not need a result from the task that I am invoking asynchronously.
How can I improve this code with Task.Run() (or async and await)?

If I understood your requirements correctly, you want to kick off a task and then forget about it. When the task completes, and if an exception occurred, you want to log it.
I'd use Task.Run to create a task, followed by ContinueWith to attach a continuation task. This continuation task will log any exception that was thrown from the parent task. Also, use TaskContinuationOptions.OnlyOnFaulted to make sure the continuation only runs if an exception occurred.
Task.Run(() => {
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}).ContinueWith(task => {
task.Exception.Handle(ex => {
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(ex, username);
});
}, TaskContinuationOptions.OnlyOnFaulted);
As a side-note, background tasks and fire-and-forget scenarios in ASP.NET are highly discouraged. See The Dangers of Implementing Recurring Background Tasks In ASP.NET

It may sound a bit out of scope, but if you just want to forget after you launch it, why not using directly ThreadPool?
Something like:
ThreadPool.QueueUserWorkItem(
x =>
{
try
{
// Do something
...
}
catch (Exception e)
{
// Log something
...
}
});
I had to do some performance benchmarking for different async call methods and I found that (not surprisingly) ThreadPool works much better, but also that, actually, BeginInvoke is not that bad (I am on .NET 4.5). That's what I found out with the code at the end of the post. I did not find something like this online, so I took the time to check it myself. Each call is not exactly equal, but it is more or less functionally equivalent in terms of what it does:
ThreadPool: 70.80ms
Task: 90.88ms
BeginInvoke: 121.88ms
Thread: 4657.52ms
public class Program
{
public delegate void ThisDoesSomething();
// Perform a very simple operation to see the overhead of
// different async calls types.
public static void Main(string[] args)
{
const int repetitions = 25;
const int calls = 1000;
var results = new List<Tuple<string, double>>();
Console.WriteLine(
"{0} parallel calls, {1} repetitions for better statistics\n",
calls,
repetitions);
// Threads
Console.Write("Running Threads");
results.Add(new Tuple<string, double>("Threads", RunOnThreads(repetitions, calls)));
Console.WriteLine();
// BeginInvoke
Console.Write("Running BeginInvoke");
results.Add(new Tuple<string, double>("BeginInvoke", RunOnBeginInvoke(repetitions, calls)));
Console.WriteLine();
// Tasks
Console.Write("Running Tasks");
results.Add(new Tuple<string, double>("Tasks", RunOnTasks(repetitions, calls)));
Console.WriteLine();
// Thread Pool
Console.Write("Running Thread pool");
results.Add(new Tuple<string, double>("ThreadPool", RunOnThreadPool(repetitions, calls)));
Console.WriteLine();
Console.WriteLine();
// Show results
results = results.OrderBy(rs => rs.Item2).ToList();
foreach (var result in results)
{
Console.WriteLine(
"{0}: Done in {1}ms avg",
result.Item1,
(result.Item2 / repetitions).ToString("0.00"));
}
Console.WriteLine("Press a key to exit");
Console.ReadKey();
}
/// <summary>
/// The do stuff.
/// </summary>
public static void DoStuff()
{
Console.Write("*");
}
public static double RunOnThreads(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var stopwatch = new Stopwatch();
var resetEvent = new ManualResetEvent(false);
var threadList = new List<Thread>();
for (var i = 0; i < calls; i++)
{
threadList.Add(new Thread(() =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
}));
}
stopwatch.Start();
foreach (var thread in threadList)
{
thread.Start();
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnThreadPool(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var resetEvent = new ManualResetEvent(false);
var stopwatch = new Stopwatch();
var list = new List<int>();
for (var i = 0; i < calls; i++)
{
list.Add(i);
}
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
ThreadPool.QueueUserWorkItem(
x =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
},
list[i]);
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnBeginInvoke(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var beginInvokeStopwatch = new Stopwatch();
var delegateList = new List<ThisDoesSomething>();
var resultsList = new List<IAsyncResult>();
for (var i = 0; i < calls; i++)
{
delegateList.Add(DoStuff);
}
beginInvokeStopwatch.Start();
foreach (var delegateToCall in delegateList)
{
resultsList.Add(delegateToCall.BeginInvoke(null, null));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(rs => !rs.IsCompleted))
{
Thread.Sleep(10);
}
beginInvokeStopwatch.Stop();
totalMs += beginInvokeStopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnTasks(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var resultsList = new List<Task>();
var stopwatch = new Stopwatch();
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
resultsList.Add(Task.Factory.StartNew(DoStuff));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(task => !task.IsCompleted))
{
Thread.Sleep(10);
}
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
}

Here's one of the filters in my project with creates an audit in our database every time a user accesses a resource that must be audited
Auditing is certainly not something I would call "fire and forget". Remember, on ASP.NET, "fire and forget" means "I don't care whether this code actually executes or not". So, if your desired semantics are that audits may occasionally be missing, then (and only then) you can use fire and forget for your audits.
If you want to ensure your audits are all correct, then either wait for the audit save to complete before sending the response, or queue the audit information to reliable storage (e.g., Azure queue or MSMQ) and have an independent backend (e.g., Azure worker role or Win32 service) process the audits in that queue.
But if you want to live dangerously (accepting that occasionally audits may be missing), you can mitigate the problems by registering the work with the ASP.NET runtime. Using the BackgroundTaskManager from my blog:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
BackgroundTaskManager.Run(() =>
{
try
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
});
base.OnActionExecuting(filterContext);
}

Is there any issue with using a StopWatch around Parallel.For or Parallel.ForEach?

I am using this logic in a WPF application. I am seeing the execution done in 3seconds visually (from the database), but for some reason the stop watch prints out 17s or more.
The code within the Parallel.For block does a lot of calculations and writes to db. So by visually, I mean from the database select query.
Stopwatch stopwatch = new Stopwatch();
// Begin timing
stopwatch.Start();
String currentTime = System.DateTime.Now.ToString();
statusLabel.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Background,
new Action(delegate() { statusLabel.Content = "New Minute data processing..." + currentTime; }));
int size = NumSymWatching;
Parallel.For(0, size, x =>
{
CalculateStudies(SymbolsToWatch[x], SymbolSessions[x]);
});
UpdateApplicationStatusTable();
stopwatch.Stop();
statusLabel.Dispatcher.Invoke(System.Windows.Threading.DispatcherPriority.Background,
new Action(delegate() { lblTimer.Content = "Time taken to run: " + stopwatch.Elapsed; }));
isRunning = false;
EDIT
After a few tests, I've found that there is no issue with using stopwatch around a Parallel.For loop.

Task Parallel Library - I don't understand what I'm doing wrong

This is a two part question.
I have a class that gets all processes asynchronously and polls them for CPU usage. Yesterday I had a bug with it and it was solved here.
The first part of the question is why the solution helped. I didn't understand the explanation.
The second part of the question is that I still get an "Object reference not set to an instance of object" exception occasionally when I try to print the result at the end of the process. This is because item.Key is indeed null. I don't understand why that is because I put a breakpoint checking for (process == null) and it was never hit. What am I doing wrong?
Code is below.
class ProcessCpuUsageGetter
{
private IDictionary<Process, int> _usage;
public IDictionary<Process, int> Usage { get { return _usage; } }
public ProcessCpuUsageGetter()
{
while (true)
{
Process[] processes = Process.GetProcesses();
int processCount = processes.Count();
Task[] tasks = new Task[processCount];
_usage = new Dictionary<Process, int>();
for (int i = 0; i < processCount; i++)
{
var localI = i;
var localProcess = processes[localI];
tasks[localI] = Task.Factory.StartNew(() => DoWork(localProcess));
}
Task.WaitAll(tasks);
foreach (var item in Usage)
{
Console.WriteLine("{0} - {1}%", item.Key.ProcessName, item.Value);
}
}
}
private void DoWork(object o)
{
Process process = (Process)o;
PerformanceCounter pc = new PerformanceCounter("Process", "% Processor Time", process.ProcessName, true);
pc.NextValue();
Thread.Sleep(1000);
int cpuPercent = (int)pc.NextValue() / Environment.ProcessorCount;
if (process == null)
{
var x = 5;
}
if (_usage == null)
{
var t = 6;
}
_usage.Add(process, cpuPercent);
}
}

The line
_usage.Add(process, cpuPercent);
is accessing a not-threadsafe collection from a thread.
Use a ConcurrentDictionary<K,V> instead of the normal dictionary.
The 'null reference' error is just a random symptom, you could get other errors too.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

ParallelForEach get Directories c# - c#

Paralel, or threads may be crash if there is common objects(shared object), so you have to make sure that there is no shared object (e.g. if two of them tried to write, edit or delete the shared object it will crash, and if not it will not crash).

Related

Why my workers work distribution count does not total the number of produced items in this System.Threading.Channel sample?

Creating Tasks dynamically and wait for completion (C#)

Use Task.Run instead of Delegate.BeginInvoke

Is there any issue with using a StopWatch around Parallel.For or Parallel.ForEach?

Task Parallel Library - I don't understand what I'm doing wrong

Categories

Resources