NullReferenceException while handling a List on multiple threads - C#

Basically, I have a collection of objects; I am chopping it into smaller collections and doing some work on a thread over each small collection simultaneously.
int totalCount = SomeDictionary.Values.ToList().Count;
int singleThreadCount = (int)Math.Round((decimal)(totalCount / 10));
int lastThreadCount = totalCount - (singleThreadCount * 9);
Stopwatch sw = new Stopwatch();
Dictionary<int,Thread> allThreads = new Dictionary<int,Thread>();
List<rCode> results = new List<rCode>();
for (int i = 0; i < 10; i++)
{
int count = i;
if (i != 9)
{
Thread someThread = new Thread(() =>
{
List<rBase> objects = SomeDictionary.Values
.Skip(count * singleThreadCount)
.Take(singleThreadCount).ToList();
List<rCode> result = objects.Where(r => r.ZBox != null)
.SelectMany(r => r.EffectiveCBox, (r, CBox) => new rCode
{
RBox = r,
// A Zbox may refer an object that can be
// shared by many
// rCode objects even on different threads
ZBox = r.ZBox,
CBox = CBox
}).ToList();
results.AddRange(result);
});
allThreads.Add(i, someThread);
someThread.Start();
}
else
{
Thread someThread = new Thread(() =>
{
List<rBase> objects = SomeDictionary.Values
.Skip(count * singleThreadCount)
.Take(lastThreadCount).ToList();
List<rCode> result = objects.Where(r => r.ZBox != null)
.SelectMany(r => r.EffectiveCBox, (r, CBox) => new rCode
{
RBox = r,
// A Zbox may refer an object that
// can be shared by many
// rCode objects even on different threads
ZBox = r.ZBox,
CBox = CBox
}).ToList();
results.AddRange(result);
});
allThreads.Add(i, someThread);
someThread.Start();
}
}
sw.Start();
while (allThreads.Values.Any(th => th.IsAlive))
{
if (sw.ElapsedMilliseconds >= 60000)
{
results = null;
allThreads.Values.ToList().ForEach(t => t.Abort());
sw.Stop();
break;
}
}
return results != null ? results.OrderBy(r => r.ZBox.Name).ToList() : null;
So, my issue is that SOMETIMES I get a NullReferenceException while performing the OrderBy operation before returning the results, and I couldn't determine where exactly the exception occurs. I press back, click the same button that runs this operation on the same data again, and it works! If someone can help me identify this issue I would be more than grateful. NOTE: A ZBox may refer to an object that can be shared by many rCode objects, even on different threads; can this be the issue?
I can't determine this through testing, because the error is not deterministic.

The bug is correctly identified in the accepted answer, although I do not agree with the proposed fix. You should switch to using a concurrent collection, in your case a ConcurrentBag or ConcurrentQueue. Some of these are (partially) lock-free for better performance, and they give you more readable code with less of it, since you do not need manual locking.
Your code would also shrink to less than half its size and double in readability if you avoid manually created threads and manual partitioning:
Parallel.ForEach(objects, MyObjectProcessor);
public void MyObjectProcessor(Object o)
{
// Create result and add to results
}
Use a ParallelOptions object if you want to limit the number of threads used by Parallel.ForEach, as in the sketch below.
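A minimal sketch of that approach, assuming the rBase/rCode types and SomeDictionary from the question; the ConcurrentBag replaces the shared List<rCode>, so no manual locking is needed:
// Requires System.Collections.Concurrent and System.Threading.Tasks.
var results = new ConcurrentBag<rCode>();
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.ForEach(SomeDictionary.Values, options, r =>
{
    if (r.ZBox == null) return;
    foreach (var cBox in r.EffectiveCBox)
    {
        // ConcurrentBag.Add is thread-safe, so no lock is required here
        results.Add(new rCode { RBox = r, ZBox = r.ZBox, CBox = cBox });
    }
});
// Materialize and order once, after all parallel work has completed
var ordered = results.OrderBy(r => r.ZBox.Name).ToList();
The library handles the partitioning itself, so the Skip/Take arithmetic and the special last-chunk case disappear entirely.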

Well, one obvious problem is here:
results.AddRange(result);
where you're updating a list from multiple threads. Try using a lock:
object resultsLock = new object(); // globally visible
...
lock(resultsLock)
{
results.AddRange(result);
}

I suppose the problem is in results = null:
while (allThreads.Values.Any(th => th.IsAlive))
{
    if (sw.ElapsedMilliseconds >= 60000)
    {
        results = null;
        allThreads.Values.ToList().ForEach(t => t.Abort());
        ...
If the threads have not finished within 60000 ms, results becomes null and you can't call results.OrderBy(r => r.ZBox.Name).ToList(); it throws an exception.
You should add something like this:
if (results != null)
return results.OrderBy(r => r.ZBox.Name).ToList();
else
return null;
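As an aside, a sketch of a less CPU-hungry wait, assuming the same allThreads dictionary and the same 60-second budget from the question; Thread.Join with a timeout replaces the busy-wait loop, while the Abort path stays the same:
sw.Start();
foreach (var t in allThreads.Values)
{
    // Wait only for the time remaining in the overall 60-second budget
    var remaining = (int)Math.Max(0, 60000 - sw.ElapsedMilliseconds);
    if (!t.Join(remaining))
    {
        results = null;
        allThreads.Values.ToList().ForEach(th => th.Abort());
        break;
    }
}
sw.Stop();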

Related

C# List.AddRange in Parallel.For occasionally throws ArgumentException

My code is like this:
List<MyPanel> list_panel = new List<MyPanel>();
.......
List<string> list_sql = new List<string>();
Parallel.For(0, list_panel.Count, i =>
{
if (list_panel[i].R == 0)
{
list_sql.AddRange(list_panel[i].MakeSqlForSave()); // it returns two strings
}
});
But AddRange sometimes throws a System.ArgumentException.
I found that 'List isn't safe for multi-threaded writes', so I tried to fix it using a lock:
string[] listLock = new string[2];
Parallel.For(0, list_panel.Count, i =>
{
if (list_panel[i].R == 0)
{
listLock = list_panel[i].MakeSqlForSave();
lock(listLock)
list_sql.AddRange(listLock);
}
});
But it still sometimes throws a System.ArgumentException: 'Source array was not long enough. Check srcIndex and length, and the array's lower bounds.'
The error occurs in list_sql: the Count is 34, but an IndexOutOfRangeException occurs when I access list_sql[32] and list_sql[33].
How can I handle it?
Use ConcurrentBag<T> as a thread-safe collection that you can safely append to from multiple threads:
ConcurrentBag<String> result = new ConcurrentBag<String>();
Parallel.For(0, list_panel.Count, i =>
{
if (list_panel[i].R == 0)
{
foreach( String s in list_panel[i].MakeSqlForSave() )
{
result.Add( s );
}
}
});
List<String> list_sql = result.ToList(); // Serialize to a single List<T> after the concurrent operations are complete.
You must use a dedicated lock for the specific List, and use it every time you access this list (for both read and write).
List<MyPanel> list_panel = new List<MyPanel>();
List<string> list_sql = new List<string>();
object listSqlLock = new object();
Parallel.For(0, list_panel.Count, i =>
{
if (list_panel[i].R == 0)
{
var sqlCommands = list_panel[i].MakeSqlForSave();
lock (listSqlLock)
list_sql.AddRange(sqlCommands);
}
});

I have two large lists and I need to get the diff between them

I have two large lists and I need to get the diff between them.
The first list comes from another system via a webservice; the second list comes from a database (the destination of the data).
I will compare them, take the items from the first list that are not in the second list, and insert those into the database (the second list's source).
Is there another solution with better performance?
Using List.Any(), the process takes many hours and does not finish...
Using a for loop, the process takes 10 hours or more.
Each list has 1,300,000 records.
newItensForInsert = List1.Where(item1 => !List2.Any(item2 => item1.prop1 == item2.prop1 && item1.prop2 == item2.prop2)).ToList();
//or
for (int i = 0; i < List1.Count; i++)
{
if (!List2.Any(x => x.prop1 == List1[i].prop1 && x.prop2 == List1[i].prop2))
{
ListForInsert.Add(List1[i]);
}
}
//or
ListForInsert = List1.AsParallel().Except(List2.AsParallel(), IEqualityComparer).ToList();
You could use List.Except
List<object> webservice = new List<object>();
List<object> database = new List<object>();
IEnumerable<object> toPutIntoDatabase = webservice.Except(database);
database.AddRange(toPutIntoDatabase);
EDIT:
You can even use the new PLINQ (parallel LINQ) like this
IEnumerable<object> toPutIntoDatabase = webservice.AsParallel().Except(database.AsParallel());
EDIT:
Maybe you could use a Hashset to speed up lookups.
HashSet<object> databaseHash = new HashSet<object>(database);
foreach (var item in webservice)
{
if (databaseHash.Contains(item) == false)
{
database.Add(item);
}
}
If the items are of the same data type, you can use List.Exists.
Otherwise it is better to go with a group join and keep the items that have no match:
var newdata = from c in list1
              join p in dblist on c.Category equals p.Category into ps
              from p in ps.DefaultIfEmpty()
              where p == null
              select c;
This selects the items of list1 whose data is not present in dblist.
HashSet<T> is optimized for executing this kind of set operation. In many cases it's worth the effort to create HashSets from the Lists and do the set operation on the HashSets. I demonstrated this with a little LINQPad program.
The program creates two lists containing 1,300,000 objects each. It uses your method to get the difference (or rather: attempts to, because I ran out of patience). And it uses LINQ's Except and HashSets with ExceptWith, both with an IEqualityComparer. The program is listed below.
The result was:
Lists created: 00:00:00.9221369
Hashsets created: 00:00:00.1057532
Except: 00:00:00.2564191
ExceptWith: 00:00:00.0696830
So creating the HashSets and executing ExceptWith (together 0.18 s) beats Except (0.26 s).
One caveat: creating HashSets may take too much memory since the large lists already take a fair amount of memory.
void Main()
{
var sw = Stopwatch.StartNew();
var amount = 1300000;
//amount = 50000;
var list1 = Enumerable.Range(0, amount).Select(i => new Demo(i)).ToList();
var list2 = Enumerable.Range(10, amount).Select(i => new Demo(i)).ToList();
sw.Stop();
sw.Elapsed.Dump("Lists created");
sw.Restart();
var hs1 = new HashSet<Demo>(list1, new DemoComparer());
var hs2 = new HashSet<Demo>(list2, new DemoComparer());
sw.Stop();
sw.Elapsed.Dump("Hashsets created");
sw.Restart();
// var list3 = list1.Where(item1 => !list2.Any(item2 => item1.ID == item2.ID)).ToList();
// sw.Stop();
// sw.Elapsed.Dump("Any");
// sw.Restart();
var list4 = list1.Except(list2, new DemoComparer()).ToList();
sw.Stop();
sw.Elapsed.Dump("Except");
sw.Restart();
hs1.ExceptWith(hs2);
sw.Stop();
sw.Elapsed.Dump("ExceptWith");
// list3.Count.Dump();
list4.Count.Dump();
hs1.Count.Dump();
}
// Define other methods and classes here
class Demo
{
public Demo(int id)
{
ID = id;
Name = id.ToString();
}
public int ID { get; set; }
public string Name { get; set; }
}
class DemoComparer : IEqualityComparer<Demo>
{
public bool Equals(Demo x, Demo y)
{
return (x == null && y == null)
|| (x != null && y != null) && x.ID.Equals(y.ID);
}
public int GetHashCode(Demo obj)
{
return obj.ID.GetHashCode();
}
}
Use List.Exists; it is better than List.Any performance-wise. For example:
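A minimal sketch, reusing the question's List1/List2 and prop1/prop2 names; List<T>.Exists is an instance method taking a Predicate<T>, which avoids the enumerator allocation of the Any extension method:
// Same check as the List.Any version in the question, via List<T>.Exists.
var newItensForInsert = List1.Where(item1 =>
    !List2.Exists(item2 => item1.prop1 == item2.prop1 &&
                           item1.prop2 == item2.prop2)).ToList();
// Note: this is still O(n*m); the HashSet approaches above scale far better.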

How to increase performance of a for loop using C#

I compare task data from Microsoft Project using a nested for loop. But since the project has many records (more than 1000), it is very slow.
How do I improve the performance?
for (int n = 1; n < thisProject.Tasks.Count; n++)
{
string abc = thisProject.Tasks[n].Name;
string def = thisProject.Tasks[n].ResourceNames;
for (int l = thisProject.Tasks.Count; l > n; l--)
{
// MessageBox.Show(thisProject.Tasks[l].Name);
if (abc == thisProject.Tasks[l].Name && def == thisProject.Tasks[l].ResourceNames)
{
thisProject.Tasks[l].Delete();
}
}
}
As you can see, I am comparing the Name and ResourceNames of the individual Tasks, and when I find a duplicate, I call Task.Delete to get rid of it.
A hash check should be a lot faster in this case than nested looping, i.e. O(n) vs O(n^2).
First, provide an equality comparer of your own:
class TaskComparer : IEqualityComparer<Task> {
public bool Equals(Task x, Task y) {
if (ReferenceEquals(x, y)) return true;
if (ReferenceEquals(x, null)) return false;
if (ReferenceEquals(y, null)) return false;
if (x.GetType() != y.GetType()) return false;
return string.Equals(x.Name, y.Name) && string.Equals(x.ResourceNames, y.ResourceNames);
}
public int GetHashCode(Task task) {
unchecked {
return
((task?.Name?.GetHashCode() ?? 0) * 397) ^
(task?.ResourceNames?.GetHashCode() ?? 0);
}
}
}
Don't worry too much about the GetHashCode implementation; this is just boilerplate code which composes a hash code from the two properties.
Now that you have this class for comparison and hashing, you can use the code below to remove your dupes:
var set = new HashSet<Task>(new TaskComparer());
for (int i = thisProject.Tasks.Count - 1; i >= 0; --i) {
if (!set.Add(thisProject.Tasks[i]))
thisProject.Tasks[i].Delete();
}
As you can see, you simply scan all the elements while storing them in a HashSet. The HashSet checks, based on our equality comparer, whether the provided element is a duplicate.
Since you want to delete duplicates, the detected dupes are deleted. You can modify this code to extract the unique items instead of deleting the dupes, by reversing the condition to if (set.Add(thisProject.Tasks[i])) and processing within that if, as sketched below.
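A minimal sketch of that variation, assuming the same Tasks collection and TaskComparer; iterating forward keeps the first occurrence of each Name/ResourceNames pair:
var set = new HashSet<Task>(new TaskComparer());
var uniqueTasks = new List<Task>();
for (int i = 0; i < thisProject.Tasks.Count; ++i) {
    // Add returns true only the first time a Name/ResourceNames pair is seen
    if (set.Add(thisProject.Tasks[i]))
        uniqueTasks.Add(thisProject.Tasks[i]);
}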
Microsoft Project has a Sort method which makes simple work of this problem. Sort the tasks by Name, Resource Names, and Unique ID, then loop through comparing adjacent tasks and delete the duplicates. By using Unique ID as the third sort key you can be sure to delete the duplicate that was added later. Alternatively, you can use the task ID to remove tasks that are lower down in the schedule. Here's a VBA example of how to do this:
Sub RemoveDuplicateTasks()
Dim proj As Project
Set proj = ActiveProject
Application.Sort Key1:="Name", Ascending1:=True, Key2:="Resource Names", Ascending2:=True, Key3:="Unique ID", Ascending3:=True, Renumber:=False, Outline:=False
Application.SelectAll
Dim tsks As Tasks
Set tsks = Application.ActiveSelection.Tasks
Dim i As Integer
i = 1
Do While i < tsks.Count
If tsks(i).Name = tsks(i + 1).Name And tsks(i).ResourceNames = tsks(i + 1).ResourceNames Then
tsks(i + 1).Delete
Else
i = i + 1
End If
Loop
Application.Sort Key1:="ID", Renumber:=False, Outline:=False
Application.SelectBeginning
End Sub
Note: This question relates to the algorithm, not syntax; the VBA is easy to translate to C#.
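For illustration, a rough C# translation of the same sort-then-compare-adjacent idea, assuming a plain in-memory list of a hypothetical ProjectTask class rather than the Project interop API:
// ProjectTask is a hypothetical stand-in for the interop Task type.
class ProjectTask
{
    public int UniqueID;
    public string Name;
    public string ResourceNames;
}
static void RemoveDuplicateTasks(List<ProjectTask> tasks)
{
    // Sort by Name, ResourceNames, then UniqueID, so that the
    // earliest-added duplicate sorts first and survives.
    tasks.Sort((a, b) =>
    {
        int c = string.CompareOrdinal(a.Name, b.Name);
        if (c != 0) return c;
        c = string.CompareOrdinal(a.ResourceNames, b.ResourceNames);
        return c != 0 ? c : a.UniqueID.CompareTo(b.UniqueID);
    });
    // Compare adjacent tasks and remove the later duplicate.
    for (int i = tasks.Count - 1; i > 0; i--)
    {
        if (tasks[i].Name == tasks[i - 1].Name &&
            tasks[i].ResourceNames == tasks[i - 1].ResourceNames)
            tasks.RemoveAt(i);
    }
}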
This should give you all the items which are duplicates, so you can delete them from your original list:
var duplicates = thisProject.Tasks
    .GroupBy(x => new { x.Name, x.ResourceNames })
    .Where(g => g.Count() > 1)
    .SelectMany(g => g);
Note that you probably do not want to remove all of them, only the extra copies, so be careful how you loop through this list; one safe variant is sketched below.
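A sketch of that safer loop, under the same assumptions as the answer above; it keeps the first task of each group and deletes only the extras:
var extras = thisProject.Tasks
    .GroupBy(x => new { x.Name, x.ResourceNames })
    .Where(g => g.Count() > 1)
    .SelectMany(g => g.Skip(1)) // keep the first, delete the rest
    .ToList();
foreach (var t in extras)
    t.Delete();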
A Linq way of getting distinct elements from your Tasks list :
public class Task
{
public string Name {get;set;}
public string ResourceName {get;set;}
}
public class Program
{
public static void Main()
{
List<Task> Tasks = new List<Task>();
Tasks.Add(new Task(){Name = "a",ResourceName = "ra"});
Tasks.Add(new Task(){Name = "b",ResourceName = "rb"});
Tasks.Add(new Task(){Name = "c",ResourceName = "rc"});
Tasks.Add(new Task(){Name = "a",ResourceName = "ra"});
Tasks.Add(new Task(){Name = "b",ResourceName = "rb"});
Tasks.Add(new Task(){Name = "c",ResourceName = "rc"});
Console.WriteLine("Initial List :");
foreach(var t in Tasks){
Console.WriteLine(t.Name);
}
// Here comes the interesting part
List<Task> Tasks2 = Tasks.GroupBy(x => new {x.Name, x.ResourceName})
.Select(g => g.First()).ToList();
Console.WriteLine("Final List :");
foreach(Task t in Tasks2){
Console.WriteLine(t.Name);
}
}
}
This selects the first element of every group of elements having the same Name and ResourceName.

Optimizing LINQ routines

I run a build system. Data-wise, the simplified description would be that I have Configurations and each config has 0..n Builds.
Now, builds produce artifacts and some of these are stored on the server. What I am doing is writing a kind of rule that sums all the bytes produced per configuration's builds and checks if that is too much.
The code for the routine at the moment is the following:
private void CalculateExtendedDiskUsage(IEnumerable<Configuration> allConfigurations)
{
var sw = new Stopwatch();
sw.Start();
// Lets take only confs that have been updated within last 7 days
var items = allConfigurations.AsParallel().Where(x =>
x.artifact_cleanup_type != null && x.build_cleanup_type != null &&
x.updated_date > DateTime.UtcNow.AddDays(-7)
).ToList();
using (var ctx = new LocalEntities())
{
Debug.WriteLine("Context: " + sw.Elapsed);
var allBuilds = ctx.Builds;
var ruleResult = new List<Notification>();
foreach (var configuration in items)
{
// all builds for current configuration
var configurationBuilds = allBuilds.Where(x => x.configuration_id == configuration.configuration_id)
.OrderByDescending(z => z.build_date);
Debug.WriteLine("Filter conf builds: " + sw.Elapsed);
// Since I don't know which builds/artifacts have been cleaned up, calculate it manually
if (configuration.build_cleanup_count != null)
{
var buildCleanupCount = "30"; // default
if (configuration.build_cleanup_type.Equals("ReserveBuildsByDays"))
{
var buildLastCleanupDate = DateTime.UtcNow.AddDays(-int.Parse(buildCleanupCount));
configurationBuilds = configurationBuilds.Where(x => x.build_date > buildLastCleanupDate)
.OrderByDescending(z => z.build_date);
}
if (configuration.build_cleanup_type.Equals("ReserveBuildsByCount"))
{
var buildLastCleanupCount = int.Parse(buildCleanupCount);
configurationBuilds =
configurationBuilds.Take(buildLastCleanupCount).OrderByDescending(z => z.build_date);
}
}
if (configuration.artifact_cleanup_count != null)
{
// skipped, similar to previous block
}
Debug.WriteLine("Done cleanup: " + sw.Elapsed);
const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB
// Sum all disc usage per configuration
var confDiscSizePerConfiguration = configurationBuilds
.GroupBy(c => new {c.configuration_id})
.Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
.Select(groupedBuilds =>
new
{
configurationId = groupedBuilds.FirstOrDefault().configuration_id,
configurationPath = groupedBuilds.FirstOrDefault().configuration_path,
Total = groupedBuilds.Sum(c => c.artifact_dir_size),
Average = groupedBuilds.Average(c => c.artifact_dir_size)
}).ToList();
Debug.WriteLine("Done db query: " + sw.Elapsed);
ruleResult.AddRange(confDiscSizePerConfiguration.Select(iter => new Notification
{
ConfigurationId = iter.configurationId,
CreatedDate = DateTime.UtcNow,
RuleType = (int) RulesEnum.TooMuchDisc,
ConfigrationPath = iter.configurationPath
}));
Debug.WriteLine("Finished loop: " + sw.Elapsed);
}
// find owners and insert...
}
}
This does exactly what I want, but I am wondering if I could make it any faster. Currently I see:
Context: 00:00:00.0609067
// first round
Filter conf builds: 00:00:00.0636291
Done cleanup: 00:00:00.0644505
Done db query: 00:00:00.3050122
Finished loop: 00:00:00.3062711
// avg round
Filter conf builds: 00:00:00.0001707
Done cleanup: 00:00:00.0006343
Done db query: 00:00:00.0760567
Finished loop: 00:00:00.0773370
The SQL generated by .ToList() looks very messy. (Everything that is used in the WHERE is covered by an index in the DB.)
I am testing with 200 configurations, so this adds up to 00:00:18.6326722. I have a total of ~8k items that need to get processed daily (so the whole routine takes more than 10 minutes to complete).
I have been randomly googling around the internet, and it seems to me that Entity Framework is not very good with parallel processing. Knowing that, I still decided to give this async/await approach a try (first time I tried it, so sorry for any nonsense).
Basically, if I move all the processing out of scope like:
foreach (var configuration in items)
{
var confDiscSizePerConfiguration = await GetData(configuration, allBuilds);
ruleResult.AddRange(confDiscSizePerConfiguration.Select(iter => new Notification
{
... skipped
}
And:
private async Task<List<Tmp>> GetData(Configuration configuration, IQueryable<Build> allBuilds)
{
var configurationBuilds = allBuilds.Where(x => x.configuration_id == configuration.configuration_id)
.OrderByDescending(z => z.build_date);
//..skipped
var confDiscSizePerConfiguration = configurationBuilds
.GroupBy(c => new {c.configuration_id})
.Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
.Select(groupedBuilds =>
new Tmp
{
ConfigurationId = groupedBuilds.FirstOrDefault().configuration_id,
ConfigurationPath = groupedBuilds.FirstOrDefault().configuration_path,
Total = groupedBuilds.Sum(c => c.artifact_dir_size),
Average = groupedBuilds.Average(c => c.artifact_dir_size)
}).ToListAsync();
return await confDiscSizePerConfiguration;
}
This, for some reason, drops the execution time for 200 items from 18 to 13 seconds. Anyway, from what I understand, since I am awaiting each .ToListAsync(), it is still processed in sequence; is that correct?
The "can't process in parallel" claim starts showing up when I replace the foreach (var configuration in items) with Parallel.ForEach(items, async configuration =>. Doing this change results in:
A second operation started on this context before a previous
asynchronous operation completed. Use 'await' to ensure that any
asynchronous operations have completed before calling another method
on this context. Any instance members are not guaranteed to be thread
safe.
It was a bit confusing to me at first, as I await practically everywhere the compiler allows it, but possibly the data gets seeded too fast.
I tried to overcome this by being less greedy and added new ParallelOptions { MaxDegreeOfParallelism = 4 } to that parallel loop; my naive assumption was that the default connection pool size is 100, and all I want to use is 4, which should be plenty. But it still fails.
I have also tried to create new DbContexts inside the GetData method, but it still fails. If I remember correctly (I can't test now), I got:
Underlying connection failed to open
What possibilities are there to make this routine go faster?
Before going parallel, it is worth optimizing the query itself. Here are some suggestions that might improve your times:
1) Use the Key when working with GroupBy. This might solve the issue of the complex & nested SQL query, as that way you instruct LINQ to use the same keys defined in GROUP BY and not to create a sub-select.
var confDiscSizePerConfiguration = configurationBuilds
.GroupBy(c => new { ConfigurationId = c.configuration_id, ConfigurationPath = c.configuration_path})
.Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
.Select(groupedBuilds =>
new
{
configurationId = groupedBuilds.Key.ConfigurationId,
configurationPath = groupedBuilds.Key.ConfigurationPath,
Total = groupedBuilds.Sum(c => c.artifact_dir_size),
Average = groupedBuilds.Average(c => c.artifact_dir_size)
})
.ToList();
2) It seems that you are bitten by the N+1 problem: in simple words, you execute one SQL query to get all configurations and N more to get the build information. In total that would be ~8k small queries, where 2 bigger queries would suffice. If memory usage is not a constraint, fetch all build data into memory and optimize for fast lookup using ToLookup.
var allBuilds = ctx.Builds.ToLookup(x => x.configuration_id);
Later you can look up builds by:
var configurationBuilds = allBuilds[configuration.configuration_id].OrderByDescending(z => z.build_date);
3) You are doing OrderBy on configurationBuilds multiple times. Filtering does not affect record order, so you can safely remove the extra calls to OrderBy:
...
configurationBuilds = configurationBuilds.Where(x => x.build_date > buildLastCleanupDate);
...
configurationBuilds = configurationBuilds.Take(buildLastCleanupCount);
...
4) There is no point in doing a GroupBy, since the builds are already filtered down to a single configuration; a plain Sum suffices, as sketched below.
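A sketch of the simplified per-configuration check, reusing the variable names from the question; the cast to long? is an assumption to keep EF's Sum happy when a configuration has no builds left after filtering:
var total = configurationBuilds.Sum(z => (long?)z.artifact_dir_size) ?? 0;
if (total > maxDiscAllocationPerConfiguration)
{
    ruleResult.Add(new Notification
    {
        ConfigurationId = configuration.configuration_id,
        ConfigrationPath = configuration.configuration_path,
        CreatedDate = DateTime.UtcNow,
        RuleType = (int)RulesEnum.TooMuchDisc
    });
}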
UPDATE:
I took it one step further and created code that retrieves the same results as your provided code with a single request. It should be more performant and use less memory.
private void CalculateExtendedDiskUsage()
{
using (var ctx = new LocalEntities())
{
const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB
const int buildCleanupCount = 30;
var buildLastCleanupDate = DateTime.UtcNow.AddDays(-buildCleanupCount);
var ruleResult = ctx.Configurations
.Where(x => x.build_cleanup_count != null &&
(
(x.build_cleanup_type == "ReserveBuildsByDays" && ctx.Builds.Where(y => y.configuration_id == x.configuration_id).Where(y => y.build_date > buildLastCleanupDate).Sum(y => y.artifact_dir_size) > maxDiscAllocationPerConfiguration) ||
(x.build_cleanup_type == "ReserveBuildsByCount" && ctx.Builds.Where(y => y.configuration_id == x.configuration_id).OrderByDescending(y => y.build_date).Take(buildCleanupCount).Sum(y => y.artifact_dir_size) > maxDiscAllocationPerConfiguration)
)
)
.Select(x => new Notification
{
ConfigurationId = x.configuration_id,
ConfigrationPath = x.configuration_path,
CreatedDate = DateTime.UtcNow,
RuleType = (int)RulesEnum.TooMuchDisc,
})
.ToList();
}
}
First, make a new context for every Parallel.ForEach iteration if you are going to go that route. But you really need to write a query that gets all the needed data in one trip. To speed up EF you can also disable change tracking or proxy creation on the context when you are only reading data, as in the sketch below.
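A sketch of those read-path settings, using the EF6 DbContext configuration flags (property names as in EF6; someConfigurationId is a hypothetical stand-in):
using (var ctx = new LocalEntities())
{
    // Read-only tuning: skip snapshot change tracking and proxy creation
    ctx.Configuration.AutoDetectChangesEnabled = false;
    ctx.Configuration.ProxyCreationEnabled = false;
    ctx.Configuration.LazyLoadingEnabled = false;
    // AsNoTracking avoids attaching the results to the context at all
    var builds = ctx.Builds.AsNoTracking()
        .Where(b => b.configuration_id == someConfigurationId)
        .ToList();
}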
There are a lot of places for optimization...
There are places where you should put .ToArray() to avoid asking the server multiple times...
I did a lot of refactoring, but I'm unable to check it, due to lack of more information.
Maybe this can lead you to a better solution...
private void CalculateExtendedDiskUsage(IEnumerable<Configuration> allConfigurations)
{
var sw = new Stopwatch();
sw.Start();
using (var ctx = new LocalEntities())
{
Debug.WriteLine("Context: " + sw.Elapsed);
var allBuilds = ctx.Builds;
var ruleResult = GetRulesResult(sw, allConfigurations, allBuilds); // Clean Code!!!
// find owners and insert...
}
}
private static IEnumerable<Notification> GetRulesResult(Stopwatch sw, IEnumerable<Configuration> allConfigurations, ICollection<Build> allBuilds)
{
// Lets take only confs that have been updated within last 7 days
var ruleResult = allConfigurations
.AsParallel() // Check if you really need this right here...
.Where(IsConfigElegible) // Clean Code!!!
.SelectMany(x => CreateNotifications(sw, allBuilds, x))
.ToArray();
Debug.WriteLine("Finished loop: " + sw.Elapsed);
return ruleResult;
}
private static bool IsConfigElegible(Configuration x)
{
return x.artifact_cleanup_type != null &&
x.build_cleanup_type != null &&
x.updated_date > DateTime.UtcNow.AddDays(-7);
}
private static IEnumerable<Notification> CreateNotifications(Stopwatch sw, IEnumerable<Build> allBuilds, Configuration configuration)
{
// all builds for current configuration
var configurationBuilds = allBuilds
.Where(x => x.configuration_id == configuration.configuration_id);
// .OrderByDescending(z => z.build_date); <<< You should order only when needed (usually at the very end)
Debug.WriteLine("Filter conf builds: " + sw.Elapsed);
configurationBuilds = BuildCleanup(configuration, configurationBuilds); // Clean Code!!!
configurationBuilds = ArtifactCleanup(configuration, configurationBuilds); // Clean Code!!!
Debug.WriteLine("Done cleanup: " + sw.Elapsed);
const int maxDiscAllocationPerConfiguration = 1000000000; // 1GB
// Sum all disc usage per configuration
var confDiscSizePerConfiguration = configurationBuilds
.OrderByDescending(z => z.build_date) // I think you can do this even later (or drop it entirely)
.GroupBy(c => c.configuration_id) // No need to create a new object, just use the property
.Where(c => (c.Sum(z => z.artifact_dir_size) > maxDiscAllocationPerConfiguration))
.Select(CreateSumPerConfiguration);
Debug.WriteLine("Done db query: " + sw.Elapsed);
// Extracting to variable to be able to return it as function result
var notifications = confDiscSizePerConfiguration
.Select(CreateNotification);
return notifications;
}
private static IEnumerable<Build> BuildCleanup(Configuration configuration, IEnumerable<Build> builds)
{
// Since I don't know which builds/artifacts have been cleaned up, calculate it manually
if (configuration.build_cleanup_count == null) return builds;
const int buildCleanupCount = 30; // Why 'string' if you always need it as an integer?
builds = GetDiscardBelow(configuration, buildCleanupCount, builds); // Clean Code (almost)
builds = GetDiscardAbove(configuration, buildCleanupCount, builds); // Clean Code (almost)
return builds;
}
private static IEnumerable<Build> ArtifactCleanup(Configuration configuration, IEnumerable<Build> configurationBuilds)
{
if (configuration.artifact_cleanup_count != null)
{
// skipped, similar to previous block
}
return configurationBuilds;
}
private static SumPerConfiguration CreateSumPerConfiguration(IGrouping<object, Build> groupedBuilds)
{
var firstBuild = groupedBuilds.First();
return new SumPerConfiguration
{
configurationId = firstBuild.configuration_id,
configurationPath = firstBuild.configuration_path,
Total = groupedBuilds.Sum(c => c.artifact_dir_size),
Average = groupedBuilds.Average(c => c.artifact_dir_size)
};
}
private static IEnumerable<Build> GetDiscardBelow(Configuration configuration,
int buildCleanupCount,
IEnumerable<Build> configurationBuilds)
{
if (!configuration.build_cleanup_type.Equals("ReserveBuildsByDays"))
return configurationBuilds;
var buildLastCleanupDate = DateTime.UtcNow.AddDays(-buildCleanupCount);
var result = configurationBuilds
.Where(x => x.build_date > buildLastCleanupDate);
return result;
}
private static IEnumerable<Build> GetDiscardAbove(Configuration configuration,
int buildLastCleanupCount,
IEnumerable<Build> configurationBuilds)
{
if (!configuration.build_cleanup_type.Equals("ReserveBuildsByCount"))
return configurationBuilds;
var result = configurationBuilds
.Take(buildLastCleanupCount);
return result;
}
private static Notification CreateNotification(SumPerConfiguration iter)
{
return new Notification
{
ConfigurationId = iter.configurationId,
CreatedDate = DateTime.UtcNow,
RuleType = (int)RulesEnum.TooMuchDisc,
ConfigrationPath = iter.configurationPath
};
}
}
internal class SumPerConfiguration {
public object configurationId { get; set; }
public object configurationPath { get; set; } // I used 'object' because I don't know your data types
public int Total { get; set; }
public double Average { get; set; }
}

Optimize performance of a Parallel.For

I have replaced a for loop in my code with a Parallel.For. The performance improvement is awesome (1/3 of the original running time). I've tried to account for shared resources by using an array to gather result codes; I then process the array outside the Parallel.For. Is this the most efficient way, or will blocking still occur even if no iteration can ever share the same loop index? Would a CompareExchange perform much better?
int[] pageResults = new int[arrCounter];
Parallel.For(0, arrCounter, i =>
{
AlertToQueueInput input = new AlertToQueueInput();
input.Message = Messages[i];
pageResults[i] = scCommunication.AlertToQueue(input).ReturnCode;
});
foreach (int r in pageResults)
{
if (r != 0 && outputPC.ReturnCode == 0) outputPC.ReturnCode = r;
}
It depends on whether you have any (valuable) side effects in the main loop.
When outputPC.ReturnCode is the only result, you can use PLINQ:
outputPC.ReturnCode = Messages
.AsParallel()
.Select(msg =>
{
AlertToQueueInput input = new AlertToQueueInput();
input.Message = msg;
return scCommunication.AlertToQueue(input).ReturnCode;
})
.FirstOrDefault(r => r != 0);
This assumes scCommunication.AlertToQueue() is thread-safe and that you don't want to call it for the remaining items after the first error.
Note that FirstOrDefault() in PLINQ is only efficient in Framework 4.5 and later.
You could replace:
foreach (int r in pageResults)
{
if (r != 0 && outputPC.ReturnCode == 0) outputPC.ReturnCode = r;
}
with:
foreach (int r in pageResults)
{
if (r != 0)
{
outputPC.ReturnCode = r;
break;
}
}
This will then stop the loop on the first fail.
I like David Arno's solution, but as I see it, you can improve the speed by putting the check inside the parallel loop and breaking directly out of it. You already make the main code fail if any iteration failed, so there is no need for further iterations.
Something like this:
Parallel.For(0, arrCounter, (i, loopState) =>
{
AlertToQueueInput input = new AlertToQueueInput();
input.Message = Messages[i];
var code = scCommunication.AlertToQueue(input).ReturnCode;
if (code != 0)
{
outputPC.ReturnCode = code;
loopState.Break();
}
});
UPD 1:
If you need to save the result of all iterations you can do something like this:
int[] pageResults = new int[arrCounter];
Parallel.For(0, arrCounter, (i, loopState) =>
{
AlertToQueueInput input = new AlertToQueueInput();
input.Message = Messages[i];
var code = scCommunication.AlertToQueue(input).ReturnCode;
pageResults[i] = code;
if (code != 0 && outputPC.ReturnCode == 0)
outputPC.ReturnCode = code;
});
It will save you the foreach loop, which is an improvement, although a small one. And to the question's CompareExchange point, see the sketch below.
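A sketch, assuming the question's types, that records the first non-zero return code with Interlocked.CompareExchange instead of racing on outputPC.ReturnCode from multiple iterations:
int firstError = 0;
Parallel.For(0, arrCounter, i =>
{
    AlertToQueueInput input = new AlertToQueueInput();
    input.Message = Messages[i];
    int code = scCommunication.AlertToQueue(input).ReturnCode;
    // Atomically set firstError only if it is still 0, so concurrent
    // writers cannot overwrite an already-recorded error code.
    if (code != 0)
        Interlocked.CompareExchange(ref firstError, code, 0);
});
outputPC.ReturnCode = firstError;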
UPD 2:
I just found this post, and I think a custom parallel loop is a good solution too. But it's your call to decide whether it fits your task.
