Optimize performance of a Parallel.For - c#

I have replaced a for loop in my code with a Parallel.For, and the performance improvement is awesome (about one third of the original running time). I've tried to account for shared resources by using an array to gather the result codes; I then process the array outside the Parallel.For. Is this the most efficient way, or will blocking still occur even though no two iterations can ever share the same loop index? Would a CompareExchange perform much better?
int[] pageResults = new int[arrCounter];
Parallel.For(0, arrCounter, i =>
{
    AlertToQueueInput input = new AlertToQueueInput();
    input.Message = Messages[i];
    pageResults[i] = scCommunication.AlertToQueue(input).ReturnCode;
});
foreach (int r in pageResults)
{
    if (r != 0 && outputPC.ReturnCode == 0) outputPC.ReturnCode = r;
}

It depends on whether the main loop has any (valuable) side effects.
If outputPC.ReturnCode is the only result, you can use PLINQ:
outputPC.ReturnCode = Messages
    .AsParallel()
    .Select(msg =>
    {
        AlertToQueueInput input = new AlertToQueueInput();
        input.Message = msg;
        return scCommunication.AlertToQueue(input).ReturnCode;
    })
    .FirstOrDefault(r => r != 0);
This assumes scCommunication.AlertToQueue() is thread-safe and that you don't want to call it for the remaining items after the first error.
Note that FirstOrDefault() in PLINQ is only efficient in .NET Framework 4.5 and later.

You could replace:
foreach (int r in pageResults)
{
    if (r != 0 && outputPC.ReturnCode == 0) outputPC.ReturnCode = r;
}
with:
foreach (int r in pageResults)
{
    if (r != 0)
    {
        outputPC.ReturnCode = r;
        break;
    }
}
This stops the loop at the first failure.

I like David Arno's solution, but you can improve the speed further by putting the check inside the parallel loop and breaking out of it directly. Since the overall operation fails if any iteration fails, there is no need to run the remaining iterations.
Something like this:
Parallel.For(0, arrCounter, (i, loopState) =>
{
    AlertToQueueInput input = new AlertToQueueInput();
    input.Message = Messages[i];
    var code = scCommunication.AlertToQueue(input).ReturnCode;
    if (code != 0)
    {
        outputPC.ReturnCode = code;
        loopState.Break();
    }
});
Update 1:
If you need to save the result of every iteration, you can do something like this:
int[] pageResults = new int[arrCounter];
Parallel.For(0, arrCounter, (i, loopState) =>
{
    AlertToQueueInput input = new AlertToQueueInput();
    input.Message = Messages[i];
    var code = scCommunication.AlertToQueue(input).ReturnCode;
    pageResults[i] = code;
    // Note: this check-then-write is racy across threads; it records
    // *some* non-zero code, not necessarily the first one.
    if (code != 0 && outputPC.ReturnCode == 0)
        outputPC.ReturnCode = code;
});
This saves you the separate foreach loop, which is an improvement, although a small one.
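The question also asked whether CompareExchange would perform better. Raw speed aside, Interlocked.CompareExchange fixes the race above: it records exactly the first observed non-zero code, once, without a lock. A sketch reusing the question's types (AlertToQueueInput, scCommunication, Messages come from the question; the assumption is that any thread's first error is an acceptable "first"):

```csharp
int firstError = 0; // 0 means "no error yet"

Parallel.For(0, arrCounter, (i, loopState) =>
{
    AlertToQueueInput input = new AlertToQueueInput();
    input.Message = Messages[i];
    int code = scCommunication.AlertToQueue(input).ReturnCode;
    if (code != 0)
    {
        // Atomically set firstError to code only if it is still 0,
        // so exactly one error wins even under contention.
        Interlocked.CompareExchange(ref firstError, code, 0);
        loopState.Break();
    }
});

outputPC.ReturnCode = firstError;
```

The CompareExchange itself is a single atomic instruction, so there is no measurable cost per iteration; the win over the plain assignment is determinism, not speed.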
Update 2:
I just found this post and think a custom parallel partitioner is a good solution too, but it's your call whether it fits your task.

Related

trying to figure out how to do a fast complex sorting without LINQ

I guess I'm too used to using LINQ, but this is slow. I used a profiler, and this consumes 65% of the time of what I'm trying to do:
var unlock = Locked.OrderBy(x => x.Weight)            // double
    .ThenByDescending(x => x.Stuff?.Level ?? 100)     // int
    .ThenBy(x => x.Penalty)                           // double
    .FirstOrDefault();
Locked is a list. I know the sort will change the list, but I don't really care; I just want to make it work (if possible). The code below doesn't give the same result as the LINQ above:
Locked.Sort(delegate (Node a, Node b)
{
    int xdiff = a.Weight.CompareTo(b.Weight);
    if (xdiff != 0) return xdiff;
    var aStuff = a.Stuff?.Level ?? 100;
    var bStuff = b.Stuff?.Level ?? 100;
    xdiff = -1 * aStuff.CompareTo(bStuff);
    if (xdiff != 0) return xdiff;
    return a.Penalty.CompareTo(b.Penalty);
});
var unlock = Locked[0];
First: is it possible to do that complex sort (ascending, then descending, then ascending) with List.Sort?
If yes, where is my error?
Next: is there a faster way of doing what I'm trying to do?
If you're just after the "first or default" (min/max), you don't need to sort; you can do this in a single O(N) pass. Pick the first item and store it in a variable; now loop over all the other items in turn: if one is preferable by your criteria, shove it in the variable. When you get to the end, you have your winner.
You could create a custom comparer:
var comparer = Comparer<SomeItem>.Create((a, b) =>
    a.A == b.A
        ? a.B == b.B
            ? a.C == b.C
                ? 0
                : b.C - a.C // the order of these subtractions controls asc/desc
            : a.B - b.B
        : a.A - b.A);
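One caveat on the subtraction trick (my note, not part of the original answer): b.C - a.C can overflow for large or negative values, which silently corrupts the ordering. Delegating to CompareTo is safe and reads about the same; a sketch assuming the same hypothetical SomeItem with int properties A, B, C:

```csharp
var safeComparer = Comparer<SomeItem>.Create((a, b) =>
{
    int cmp = a.A.CompareTo(b.A);   // ascending
    if (cmp != 0) return cmp;
    cmp = a.B.CompareTo(b.B);       // ascending
    if (cmp != 0) return cmp;
    return b.C.CompareTo(a.C);      // descending: operands swapped
});
```

CompareTo never overflows, and swapping the operands (b before a) is the idiomatic way to express a descending key.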
then use morelinq's MinBy:
IEnumerable<SomeItem> items = ...;
var best = items.MinBy(x => x, comparer); // .First() if it's morelinq3
...or if creating that comparer looks scary, I wrote a ComparerBuilder library for making it a bit easier:
var builder = new ComparerBuilder<Item>();
var comparer = builder
    .SortKey(x => x.A)
    .ThenKeyDescending(x => x.B)
    .ThenKey(x => x.C)
    .Build();
var selectedItem = items.MinBy(x => x, comparer).First();
Based on Mark's answer, I came up with this:
Node unlock = null;
Node locked = null;
if (Locked.Count > 0)
{
    unlock = Locked[0];
    for (var i = 1; i < Locked.Count; ++i)
    {
        locked = Locked[i];
        if (unlock.Weight > locked.Weight)
        {
            unlock = locked;
        }
        else if (unlock.Weight == locked.Weight)
        {
            var unlockStuffLevel = unlock.Stuff?.Level ?? 100;
            var lockedStuffLevel = locked.Stuff?.Level ?? 100;
            if (unlockStuffLevel < lockedStuffLevel)
            {
                unlock = locked;
            }
            else if (unlockStuffLevel == lockedStuffLevel)
            {
                if (unlock.Penalty > locked.Penalty)
                {
                    unlock = locked;
                }
            }
        }
    }
}
Profiling shows this now takes about 15% instead of the 65% before, and it seems to replicate the same result as the LINQ. I might refine it more later, but for now I got what I wanted.

Does Linq provide a way to easily spot gaps in a sequence?

I am managing a directory of files. Each file will be named similarly to Image_000000.png, with the numeric portion being incremented for each file that is stored.
Files can also be deleted, leaving gaps in the number sequence. I am asking because I recognize that at some point in the future, the user could use up the number sequence unless I take steps to reuse numbers when they become available. I realize a million is a lot, but we have 20-plus year users, so "someday" is not out of the question.
So, I am specifically asking whether or not there exists a way to easily determine the gaps in the sequence without simply looping. I realize that because it's a fixed range, I could simply loop over the expected range.
And I will unless there is a better/cleaner/easier/faster alternative. If so, I'd like to know about it.
This method is called to obtain the next available file name:
public static String GetNextImageFileName()
{
    String retFile = null;
    DirectoryInfo di = new DirectoryInfo(userVars.ImageDirectory);
    FileInfo[] fia = di.GetFiles("*.*", SearchOption.TopDirectoryOnly);
    String lastFile = fia.Where(i => i.Name.StartsWith("Image_") &&
                                     i.Name.Substring(6, 6).ContainsOnlyDigits())
                         .OrderBy(i => i.Name)
                         .Last().Name;
    if (!String.IsNullOrEmpty(lastFile))
    {
        Int32 num;
        String strNum = lastFile.Substring(6, 6);
        String strExt = lastFile.Substring(13);
        if (!String.IsNullOrEmpty(strNum) &&
            !String.IsNullOrEmpty(strExt) &&
            strNum.ContainsOnlyDigits() &&
            Int32.TryParse(strNum, out num))
        {
            num++;
            retFile = String.Format("Image_{0:D6}.{1}", num, strExt);
            while (num <= 999999 && File.Exists(retFile))
            {
                num++;
                retFile = String.Format("Image_{0:D6}.{1}", num, strExt);
            }
        }
    }
    return retFile;
}
EDIT: in case it helps anyone, here is the final method, incorporating Daniel Hilgarth's answer:
public static String GetNextImageFileName()
{
    DirectoryInfo di = new DirectoryInfo(userVars.ImageDirectory);
    FileInfo[] fia = di.GetFiles("Image_*.*", SearchOption.TopDirectoryOnly);
    List<Int32> fileNums = new List<Int32>();
    foreach (FileInfo fi in fia)
    {
        Int32 i;
        if (Int32.TryParse(fi.Name.Substring(6, 6), out i))
            fileNums.Add(i);
    }
    fileNums.Sort(); // the gap detection below assumes ascending order
    var result = fileNums.Select((x, i) => new { Index = i, Value = x })
                         .Where(x => x.Index != x.Value)
                         .Select(x => (Int32?)x.Index)
                         .FirstOrDefault();
    Int32 index;
    if (result == null)
        index = fileNums.Count - 1;
    else
        index = result.Value - 1;
    // Guard the empty-directory case, and format the *next* number,
    // not the gap index.
    Int32 nextNumber = index >= 0 ? fileNums[index] + 1 : 0;
    if (nextNumber >= 0 && nextNumber <= 999999)
        return String.Format("Image_{0:D6}.png", nextNumber);
    return null;
}
A very simple approach to find the first number of the first gap would be the following:
int[] existingNumbers = /* extract all numbers from all filenames and order them */
var allNumbers = Enumerable.Range(0, 1000001);
var result = allNumbers.Where(x => !existingNumbers.Contains(x)).First();
This will return 1,000,000 if all numbers have been used and no gaps exist. (The range runs one past the maximum so that First() always has something to find.)
This approach has the drawback that it performs rather badly, as it iterates existingNumbers multiple times.
A somewhat better approach would be to use Zip:
allNumbers.Zip(existingNumbers, (a, e) => new { Number = a, ExistingNumber = e })
          .Where(x => x.Number != x.ExistingNumber)
          .Select(x => x.Number)
          .First();
An improved version of DuckMaestro's answer that actually returns the first value of the first gap - and not the first value after the first gap - would look like this:
var tmp = existingNumbers.Select((x, i) => new { Index = i, Value = x })
                         .Where(x => x.Index != x.Value)
                         .Select(x => (int?)x.Index)
                         .FirstOrDefault();
int index;
if (tmp == null)
    index = existingNumbers.Length - 1;
else
    index = tmp.Value - 1;
var nextNumber = existingNumbers[index] + 1;
Improving over the other answer, use the overload of Where that supplies the element index.
int[] existingNumbers = ...
var result = existingNumbers.Where( (x,i) => x != i ).FirstOrDefault();
The value i is a counter starting at 0.
This version of where is supported in .NET 3.5 (http://msdn.microsoft.com/en-us/library/bb549418(v=vs.90).aspx).
var firstNonExistingFile = Enumerable.Range(0, 1000000)
    .Select(x => String.Format("Image_{0:D6}.{1}", x, strExt))
    .FirstOrDefault(x => !File.Exists(x));
This iterates from 0 to 999999, projects each number through String.Format() into an IEnumerable<string>, and then finds the first name in that sequence for which File.Exists() returns false.
It's an old question, but it has been suggested (in the comments) that you could use .Except() instead. I tend to like this solution a little better since it gives you the first missing number (the gap) or, if there is no gap, the next number after the sequence. Here's an example:
var allNumbers = Enumerable.Range(0, 999999); // 999999 is arbitrary; int.MaxValue would work but degrades performance
var existingNumbers = new int[] { 0, 1, 2, 4, 5, 6 };
int result;
var missingNumbers = allNumbers.Except(existingNumbers);
if (missingNumbers.Any())
    result = missingNumbers.First();
else // no missing numbers -- you've reached the max
    result = -1;
Running the above code would set result to:
3
Additionally, if you changed existingNumbers to:
var existingNumbers = new int[] { 0, 1, 3, 2, 4, 5, 6 };
so that there is no gap, you would get 7 back.
Anyway, that's why I prefer Except over the Zip solution -- just my two cents.
Thanks!
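For completeness, the Except idea can be wrapped as a small helper. This is my sketch, not code from the question; the method name and nullable return are my choices:

```csharp
static int? FirstFreeNumber(IEnumerable<int> usedNumbers, int max = 999999)
{
    // Except materializes usedNumbers into a hash set internally, so this
    // is one O(n) pass over the range, not a Contains() scan per candidate.
    return Enumerable.Range(0, max + 1)
        .Except(usedNumbers)
        .Cast<int?>()
        .FirstOrDefault(); // null when every number up to max is taken
}
```

With usedNumbers = { 0, 1, 2, 4, 5, 6 } this would yield 3, matching the example above, and it returns null (rather than throwing) when the range is exhausted.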

iterating through IEnumerable<string> causing serious performance issue

I am clueless about what has happened to the performance of the for loop when I try to iterate through an IEnumerable type.
The following code causes a serious performance issue:
foreach (IEdge ed in edcol)
{
    IEnumerable<string> row =
        from r in dtRow.AsEnumerable()
        where (((r.Field<string>("F1") == ed.Vertex1.Name) &&
                (r.Field<string>("F2") == ed.Vertex2.Name))
            || ((r.Field<string>("F1") == ed.Vertex2.Name) &&
                (r.Field<string>("F2") == ed.Vertex1.Name)))
        select r.Field<string>("EdgeId");
    int co = row.Count();
    //foreach (string s in row)
    //{
    //}
    x++;
}
The outer foreach (IEdge ed in edcol) runs about 11,000 iterations.
It runs in a fraction of a second if I remove the line
int co = row.Count();
from the code.
row.Count() is at most 10 in every loop.
If I uncomment the
//foreach (string s in row)
//{
//}
it takes about 10 minutes to complete.
Does the IEnumerable type really have such serious performance issues?
This answer is for the implicit question of "how do I make this much faster"? Apologies if that's not actually what you were after, but...
You can go through the rows once, grouping by the names. (I haven't done the ordering like Marc has - I'm just looking up twice when querying :)
var lookup = dtRow.AsEnumerable()
                  .ToLookup(r => new { F1 = r.Field<string>("F1"),
                                       F2 = r.Field<string>("F2") });
Then:
foreach (IEdge ed in edcol)
{
    // Need to check both ways round...
    var first = new { F1 = ed.Vertex1.Name, F2 = ed.Vertex2.Name };
    var second = new { F1 = ed.Vertex2.Name, F2 = ed.Vertex1.Name };
    var firstResult = lookup[first];
    var secondResult = lookup[second];
    // Due to the way Lookup works, this is quick - much quicker than
    // calling query.Count()
    var count = firstResult.Count() + secondResult.Count();
    var query = firstResult.Concat(secondResult);
    foreach (var row in query)
    {
        ...
    }
}
At the moment you have O(N*M) performance, which could be problematic if both N and M are large. I would be inclined to pre-compute some of the DataTable info. For example, we could try:
var lookup = dtRows.AsEnumerable().ToLookup(
    row => string.Compare(row.Field<string>("F1"), row.Field<string>("F2")) < 0
        ? Tuple.Create(row.Field<string>("F1"), row.Field<string>("F2"))
        : Tuple.Create(row.Field<string>("F2"), row.Field<string>("F1")),
    row => row.Field<string>("EdgeId"));
then we can iterate that:
foreach (IEdge ed in edCol)
{
    var name1 = string.Compare(ed.Vertex1.Name, ed.Vertex2.Name) < 0
        ? ed.Vertex1.Name : ed.Vertex2.Name;
    var name2 = string.Compare(ed.Vertex1.Name, ed.Vertex2.Name) < 0
        ? ed.Vertex2.Name : ed.Vertex1.Name;
    var matches = lookup[Tuple.Create(name1, name2)];
    // ...
}
(note I enforced ascending alphabetical pairs in there, for convenience)
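Both answers replace the O(N*M) nested scan with a one-time O(M) index build plus O(1) probes per edge, giving O(N+M) overall. A stripped-down sketch of the normalized-pair idea using plain value tuples instead of a DataTable, with illustrative data of my own, just to show the shape:

```csharp
var rows = new[]
{
    (F1: "a", F2: "b", EdgeId: "e1"),
    (F1: "b", F2: "a", EdgeId: "e2"),
    (F1: "a", F2: "c", EdgeId: "e3"),
};

// Normalize each pair so ("a","b") and ("b","a") hash to the same key.
var lookup = rows.ToLookup(
    r => string.CompareOrdinal(r.F1, r.F2) < 0 ? (r.F1, r.F2) : (r.F2, r.F1),
    r => r.EdgeId);

var matches = lookup[("a", "b")]; // e1 and e2, found in a single probe
```

The normalization is what lets one probe replace the two symmetric comparisons in the original where clause.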

Null Reference while handling a List in multiple threads

Basically, I have a collection of objects; I am chopping it into small collections and doing some work on each small collection on its own thread, simultaneously.
int totalCount = SomeDictionary.Values.ToList().Count;
int singleThreadCount = (int)Math.Round((decimal)(totalCount / 10));
int lastThreadCount = totalCount - (singleThreadCount * 9);
Stopwatch sw = new Stopwatch();
Dictionary<int, Thread> allThreads = new Dictionary<int, Thread>();
List<rCode> results = new List<rCode>();
for (int i = 0; i < 10; i++)
{
    int count = i;
    if (i != 9)
    {
        Thread someThread = new Thread(() =>
        {
            List<rBase> objects = SomeDictionary.Values
                .Skip(count * singleThreadCount)
                .Take(singleThreadCount).ToList();
            List<rCode> result = objects.Where(r => r.ZBox != null)
                .SelectMany(r => r.EffectiveCBox, (r, CBox) => new rCode
                {
                    RBox = r,
                    // A ZBox may refer to an object that can be shared
                    // by many rCode objects, even on different threads
                    ZBox = r.ZBox,
                    CBox = CBox
                }).ToList();
            results.AddRange(result);
        });
        allThreads.Add(i, someThread);
        someThread.Start();
    }
    else
    {
        Thread someThread = new Thread(() =>
        {
            List<rBase> objects = SomeDictionary.Values
                .Skip(count * singleThreadCount)
                .Take(lastThreadCount).ToList();
            List<rCode> result = objects.Where(r => r.ZBox != null)
                .SelectMany(r => r.EffectiveCBox, (r, CBox) => new rCode
                {
                    RBox = r,
                    // A ZBox may refer to an object that can be shared
                    // by many rCode objects, even on different threads
                    ZBox = r.ZBox,
                    CBox = CBox
                }).ToList();
            results.AddRange(result);
        });
        allThreads.Add(i, someThread);
        someThread.Start();
    }
}
sw.Start();
while (allThreads.Values.Any(th => th.IsAlive))
{
    if (sw.ElapsedMilliseconds >= 60000)
    {
        results = null;
        allThreads.Values.ToList().ForEach(t => t.Abort());
        sw.Stop();
        break;
    }
}
return results != null ? results.OrderBy(r => r.ZBox.Name).ToList() : null;
My issue is that SOMETIMES I get a NullReferenceException while performing the OrderBy operation before returning the results, and I couldn't determine exactly where the exception comes from. I press back, click the same button that runs this operation on the same data again, and it works! If someone can help me identify this issue I would be more than grateful. NOTE: a ZBox may refer to an object that can be shared by many rCode objects, even on different threads; can this be the issue? I can't determine this through testing, because the error is not deterministic.
The bug is correctly identified in the chosen answer, although I do not agree with the fix. You should switch to a concurrent collection, in your case a ConcurrentBag or ConcurrentQueue. Some of these are (partially) lock-free for better performance, and they give you more readable code with less of it, since you do not need manual locking.
Your code would also shrink to less than half its size, and double in readability, if you avoided manually created threads and manual partitioning:
Parallel.ForEach(objects, MyObjectProcessor);

public void MyObjectProcessor(Object o)
{
    // Create result and add to results
}
Use a ParallelOptions object if you want to limit the number of threads used by Parallel.ForEach.
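Putting both suggestions together, a hedged sketch; rCode, SomeDictionary, and the property names come from the question, and the rest is my assumption about how the pieces fit:

```csharp
var results = new ConcurrentBag<rCode>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

Parallel.ForEach(SomeDictionary.Values, options, r =>
{
    if (r.ZBox == null) return;
    foreach (var cBox in r.EffectiveCBox)
    {
        // ConcurrentBag.Add is thread-safe; no lock, no manual partitioning.
        results.Add(new rCode { RBox = r, ZBox = r.ZBox, CBox = cBox });
    }
});

return results.OrderBy(r => r.ZBox.Name).ToList();
```

Parallel.ForEach also blocks until all items are processed, which replaces the manual IsAlive polling loop entirely.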
Well, one obvious problem is here:
results.AddRange(result);
where you're updating a list from multiple threads. Try using a lock:
object resultsLock = new object(); // globally visible
...
lock (resultsLock)
{
    results.AddRange(result);
}
I suppose the problem is in results = null:
while (allThreads.Values.Any(th => th.IsAlive))
{
    if (sw.ElapsedMilliseconds >= 60000)
    {
        results = null;
        allThreads.Values.ToList().ForEach(t => t.Abort());
    }
}
If the threads don't finish within 60000 ms, results becomes null and you can't call results.OrderBy(r => r.ZBox.Name).ToList(); it throws an exception.
You should add something like:
if (results != null)
    return results.OrderBy(r => r.ZBox.Name).ToList();
else
    return null;

C# MultiThreading Loop entire DataTable while limiting threads to 4

This may be a tricky question to ask, but here it is: I have a DataTable that contains 1000 rows. For each of these rows I want to do processing on a new thread, but I want to limit it to 4 threads, so basically I'm constantly keeping 4 threads running until the whole DataTable has been processed.
currently I have this;
foreach (DataRow dtRow in urlTable.Rows)
{
    for (int i = 0; i < 4; i++)
    {
        Thread thread = new Thread(() => MasterCrawlerClass.MasterCrawlBegin(dtRow));
        thread.Start();
    }
}
I know this is backwards, but I'm not sure how to achieve what I'm looking for. I thought of a very complicated while loop, but maybe that's not the best way? Any help is always appreciated.
The simplest solution, if you have 4 CPU cores, is Parallel LINQ with a degree of parallelism of 4, which gives you one thread per core. Otherwise you have to distribute records between threads/tasks manually; see both solutions below.
PLINQ solution:
urlTable.AsEnumerable()     // DataRowCollection is non-generic, so get IEnumerable<DataRow> first
        .AsParallel()
        .WithDegreeOfParallelism(4)
        .Select(....)
Manual distribution:
You can distribute items to worker threads manually using a simple trick: thread N picks up every item whose index is congruent to N modulo 4, for instance:
First thread: indices 0, 4, 8...
Second thread: indices 1, 5, 9...
Third thread: indices 2, 6, 10...
Fourth thread: indices 3, 7, 11...
Task Parallel Library solution:
private void ProcessItems(IEnumerable<string> items)
{
    // TODO: ..
}

var items = new List<string>(Enumerable.Range(0, 1000)
    .Select(i => i + "_ITEM"));

var items1 = items.Where((item, index) => (index + 0) % 4 == 0);
var items2 = items.Where((item, index) => (index + 1) % 4 == 0);
var items3 = items.Where((item, index) => (index + 2) % 4 == 0);
var items4 = items.Where((item, index) => (index + 3) % 4 == 0);

var factory = Task.Factory;
var tasks = new Task[]
{
    factory.StartNew(() => ProcessItems(items1)),
    factory.StartNew(() => ProcessItems(items2)),
    factory.StartNew(() => ProcessItems(items3)),
    factory.StartNew(() => ProcessItems(items4))
};
Task.WaitAll(tasks);
MSDN:
WithDegreeOfParallelism():
Introduction to PLINQ
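For the original requirement (process every row with at most 4 concurrent workers), Parallel.ForEach with MaxDegreeOfParallelism is the least code. A sketch assuming the question's urlTable and MasterCrawlerClass:

```csharp
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

Parallel.ForEach(
    urlTable.AsEnumerable(),   // IEnumerable<DataRow> via System.Data.DataSetExtensions
    options,
    dtRow => MasterCrawlerClass.MasterCrawlBegin(dtRow));
```

This also sidesteps the captured-loop-variable pitfall in the original foreach, since each row is passed to the delegate as a parameter rather than captured from an outer loop.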
