I'm trying to use the Parallel library for my code and I'm facing a strange issue.
I made a short program to demonstrate the behavior. In short, I make 2 loops (one inside another). The first loop generates a random array of 200 integers and the second loop adds all the arrays in a big list.
The issue is, in the end, I don't get a multiple of 200 integers, instead I see some runs doesn't wait for the random array to fully be loaded.
It's difficult to explain so here the sample code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace TestParallel
{
class Program
{
static int RecommendedDegreesOfParallelism = 8;
static int DefaultMaxPageSize = 200;
static void Main(string[] args)
{
int maxPage = 50;
List<int> lstData = new List<int>();
Parallel.For(0, RecommendedDegreesOfParallelism, new ParallelOptions() { MaxDegreeOfParallelism = RecommendedDegreesOfParallelism },
(index) =>
{
int cptItems = 0;
int cptPage = 1 - RecommendedDegreesOfParallelism + index;
int idx = index;
do
{
cptPage += RecommendedDegreesOfParallelism;
if (cptPage > maxPage) break;
int Min = 0;
int Max = 20;
Random randNum = new Random();
int[] test2 = Enumerable
.Repeat(0, DefaultMaxPageSize)
.Select(i => randNum.Next(Min, Max))
.ToArray();
var lstItems = new List<int>();
lstItems.AddRange(test2);
var lstRes = new List<int>();
lstItems.AsParallel().WithDegreeOfParallelism(8).ForAll((item) =>
{
lstRes.Add(item);
});
Console.WriteLine($"{Task.CurrentId} = {lstRes.Count}");
lstData.AddRange(lstRes);
cptItems = lstRes.Count;
} while (cptItems == DefaultMaxPageSize);
}
);
Console.WriteLine($"END: {lstData.Count}");
Console.ReadKey();
}
}
}
And here is an execution log :
4 = 200
1 = 200
2 = 200
3 = 200
6 = 200
5 = 200
7 = 200
8 = 200
1 = 200
6 = 194
2 = 191
5 = 200
7 = 200
8 = 200
4 = 200
5 = 200
3 = 182
4 = 176
8 = 150
7 = 200
5 = 147
1 = 200
7 = 189
1 = 200
1 = 198
END: 4827
We can see some loops return less than 200 items.
How is it possible?
This here is not threadsafe:
lstItems.AsParallel().WithDegreeOfParallelism(8).ForAll((item) =>
{
lstRes.Add(item);
});
From the documentation for List<T>:
It is safe to perform multiple read operations on a List, but
issues can occur if the collection is modified while it's being read.
To ensure thread safety, lock the collection during a read or write
operation. To enable a collection to be accessed by multiple threads
for reading and writing, you must implement your own synchronization.
It doesn't explicitly mention it, but .Add() can also fail when called simultaneously by multiple threads.
The solution would be to lock the calls to List<T>.Add() in the loop above, but if you do that it will likely make it slower than just adding the items in a loop in a single thread.
var locker = new object();
lstItems.AsParallel().WithDegreeOfParallelism(8).ForAll((item) =>
{
lock (locker)
{
lstRes.Add(item);
}
});
Related
This question already has answers here:
Captured variable in a loop in C#
(10 answers)
Closed 2 years ago.
I have a .NET 3.1 Core app. I am learning to use the Task Parallel library. To do this, I want to create a basic Task. I want to run this Task via an extension method. At this time, I have the following:
var tasks = new List<Task<int>>(); // Used to track the calculated result
var details = new List<Details>(); // Used for debugging purposes
var random = new Random(); // Used to generate a random wait time
var offset = 10; // This value is used to perform a basic calculation
// The following tasks should run the tasks OUT OF ORDER (i.e. 4, 5, 2, 3, 1)
var stopwatch = Stopwatch.StartNew(); // Keep track of how long all five calculations take
for (var i=1; i<=5; i++)
{
var shouldBe = i + offset;
var d = new Details();
var t = new Task<int>(() => {
var ms = random.Next(1001); // Choose a wait time between 0 and 1 seconds
Thread.Sleep(ms);
d.SleepTime = ms;
var result = i + offset;
Console.WriteLine($"{i} + {offset} = {result} (over {timeout} ms.)");
return result;
});
tasks.Add(t.ExecuteAsync<int>());
details.Add(d);
}
stopwatch.Stop();
Task.WaitAll(tasks.ToArray());
foreach (var t in tasks)
{
Console.WriteLine($"Result: {t.Result}");
}
// Calculate how long the whole thing took
var totalMilliseconds = 0.0;
foreach (var log in logs)
{
totalMilliseconds = totalMilliseconds + log.TotalMilliseconds;
}
Console.WriteLine($"Runtime: {stopwatch.Elapsed.TotalMilliseconds}, Execution: {totalMilliseconds}");
MyExtensions.cs
namespace System.Threading.Tasks
{
public static class MyExtensions
{
public static async Task<T> ExecuteAsync<T>(this Task<T> code, Details details)
{
T result = default(T);
Stopwatch stopwatch = Stopwatch.StartNew();
code.Start();
await code.ConfigureAwait(false);
result = code.Result;
stopwatch.Stop();
return result;
}
}
// The following class is only for debugging purposes
public class Details
{
public int SleepTime { get; set; }
public double TotalMilliseconds { get; set; }
}
}
When I run this app, I see something like the following in the Terminal window:
6 + 10 = 16 (over 44 ms.)
6 + 10 = 16 (over 197 ms.)
6 + 10 = 16 (over 309 ms.)
6 + 10 = 16 (over 687 ms.)
6 + 10 = 16 (over 950 ms.)
Result: 16
Result: 16
Result: 16
Result: 16
Result: 16
Runtime: 3.5835, Execution: 2204.62970000004
Based on this output, it appears I have five tasks running asynchronously. The sleep time changes each time (which is good). The runtime is less than the execution time, which implies the tasks are running in parallel. However, it's almost like the tasks get ran after the 'for' loop has finished. That's the only place I can see where the 6 is coming from. But, I don't understand why, or how to fix it. The results should be more like:
4 + 10 = 14
5 + 10 = 15
2 + 10 = 12
3 + 10 = 13
1 + 10 = 11
What am I doing wrong? Why is the same i value used each time? How do I fix this?
Thank you so much for your help!
the value of i is not local to the function, aka the mem allication is the same across so its changed.
for (var i=1; i<=5; i++)
{
var shouldBe = i + offset;
var d = new Details();
var t = new Task<int>(() => {
var ms = random.Next(1001); // Choose a wait time between 0 and 1 seconds
Thread.Sleep(ms);
d.SleepTime = ms;
var result = i + offset;
Console.WriteLine($"{i} + {offset} = {result} (over {timeout} ms.)");
return result;
});
tasks.Add(t.ExecuteAsync<int>());
details.Add(d);
}
this should print correctly
for (var i=1; i<=5; i++)
{
var localCounter = i;
var shouldBe = localCounter + offset;
var d = new Details();
var t = new Task<int>(() => {
var ms = random.Next(1001); // Choose a wait time between 0 and 1 seconds
Thread.Sleep(ms);
d.SleepTime = ms;
var result = localCounter + offset;
Console.WriteLine($"{localCounter} + {offset} = {result} (over {timeout} ms.)");
return result;
});
tasks.Add(t.ExecuteAsync<int>()); //<- this extention method is just making it more completcated to read. i would remove it.
details.Add(d);
}
this should help a bucket
static async Task Main(string[] args)
{
var random = new Random();
List<Task<Details>> tasks = new List<Task<Details>>();
for (var i = 1; i <= 20; i++)
{
var localCounter = i;
var t = new Task<Details>(() => {
var ms = random.Next(1001);
Task.Delay(ms);
var result = new Details
{
Id = localCounter,
DelayTime = ms
};
Console.WriteLine($"basically done id: {localCounter}");
return result;
});
tasks.Add(t);
}
tasks.ForEach(t => t.Start());
Task.WaitAll(tasks.ToArray());
foreach (var item in tasks)
{
Console.WriteLine($"id: {item.Result.Id}");
Console.WriteLine($"random delay: {item.Result.DelayTime}");
}
}
If I need just the maximum [or the 3 biggest items] of an array, and I do it with myArray.OrderBy(...).First() [or myArray.OrderBy(...).Take(3)], it is 20 times slower than calling myArray.Max(). Is there a way to write a faster linq query? This is my sample:
using System;
using System.Linq;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
var array = new int[1000000];
for (int i = 0; i < array.Length; i++)
{
array[i] = i;
}
var maxResults = new int[10];
var linqResults = new int[10];
var start = DateTime.Now;
for (int i = 0; i < maxResults.Length; i++)
{
maxResults[i] = array.Max();
}
var maxEnd = DateTime.Now;
for (int i = 0; i < maxResults.Length; i++)
{
linqResults[i] = array.OrderByDescending(it => it).First();
}
var linqEnd = DateTime.Now;
// 00:00:00.0748281
// 00:00:01.5321276
Console.WriteLine(maxEnd - start);
Console.WriteLine(linqEnd - maxEnd);
Console.ReadKey();
}
}
}
You sort the initial array 10 times in a loop:
for (int i = 0; i < maxResults.Length; i++)
{
linqResults[i] = array.OrderByDescending(it => it).First();
}
Let's do it once:
// 10 top item of the array
var linqResults = array
.OrderByDescending(it => it)
.Take(10)
.ToArray();
please, note, that
for (int i = 0; i < maxResults.Length; i++)
{
maxResults[i] = array.Max();
}
just repeat the same Max value 10 times (it doesn't return 10 top items)
Max method time consumption is O(n) and Ordering in the best time is O(n log(n))
First error of your code is that you are ordering 10 times which is worst scenario. You can order one time and take 10 of them like what Dmitry answered.
And also, calling Max method for 10 times does not give you 10 biggest values, just the biggest value for 10 times.
However Max method does iterating list once and keep the Max value in a seperate variable. You can rewrite this method to iterate you array and keep you 10 biggest values in your maxResults and this is the fastest way that you can get result.
It seems that others have filled the efficiency gap that Microsoft has left in linq-to-objects:
https://morelinq.github.io/3.1/ref/api/html/M_MoreLinq_MoreEnumerable_PartialSort__1_3.htm
I'm trying to get the greatest average values for different duration in a list.
Let's say I have the following data:
var randomList = new List<int>();
var random = new Random(1969);
for (var i = 0; i < 10; i++)
{
randomList.Add(random.Next(0, 500));
}
That produces the following list:
190
279
37
413
90
131
64
129
287
172
I'm trying to get the highest average values for the different sets 0-9.
Set 0 (one item in a row) = 413 (index 3)
Set 1 (two items in a row) = 252 (average index 3,4)
Set 9 (10 items in a row) = 179 (average of the entire list)
I've been beating my head on this a while. I'm trying to find an efficient way to write this so I have the least traversals as possible. In production, I'll have lists with 3500-6000 points.
How do I find the highest average values for the different sets 0-9?
This probably isn't the most efficient way to do it, but it works fine:
Basically, we use a stack to track the items we've traversed. Then to calculate the average for n last items, we peek at n items from the stack.
void Main()
{
var randomList = new List<int>();
var random = new Random(1969);
for (var i = 0; i < 10; i++)
{
randomList.Add(random.Next(0, 500));
}
// Use the values from the original post for validation
randomList = new List<int> { 190, 279, 37, 413, 90, 131, 64, 129, 287, 172 };
const int numSets = 9;
var avgDict = Enumerable.Range(1, numSets).ToDictionary(e => e, e => (double)0);
var s = new Stack<int>();
foreach (var item in randomList)
{
s.Push(item);
for (var i = 1; i <= numSets; i++)
{
if (s.Count >= i)
{
var avg = s.Take(i).Average();
if (avg > avgDict[i])
avgDict[i] = avg;
}
}
}
avgDict.Dump();
}
Yields the result:
1 413
2 251.5
3 243
4 229.75
5 201.8
6 190
7 183.714285714286
8 178.75
9 180
I'm unsure as to the implications of using a Stack for large lists, when we only need 9-10 items. Might be a good case for a custom limited size stack
In your comment, you mentioned Avg(items:0,1,2) vs Avg(items:1,2,3) vs Avg(items:2,3,4)
Not sure if this is what you want but I came up with this.
First, get random number, then get average of 3 numbers. Then, get the largest average value.
static void Main(string[] args)
{
var randomList = new List<int>();
var random = new Random(1969);
int TotalRandomNumber = 10; //Change this accordingly
for (var i = 0; i < TotalRandomNumber ; i++)
{
randomList.Add(random.Next(0, 500));
}
foreach (var item in randomList)
{
Console.WriteLine("Random Number: " + item);
}
var AveNum = new List<double>();
int range = 3; //Change this for different range
for (int i = 1; i < TotalRandomNumber - range; i++)
{
var three = randomList.GetRange(i, range);
double result = three.Average();
Console.WriteLine("Average Number: " + result);
AveNum.Add(result);
}
Console.WriteLine("Largest: " + AveNum.Max());
}
I'm trying to determine the optimal solution for this tough problem. I've got a length (let's say 11). So it's a one dimensional space 0-10. Now I've got these intervals with same length (let's assume 2 in this example). Now they're randomly distributed (overlapping, or not). Let me draw an example:
Situation:
|00|01|02|03|04|05|06|07|08|09|10| <- Space (length = 11)
|-----|
|-----|
|-----|
|-----|
|-----|
|-----| <- single interval of length = 2
Now the solution needs to find the maximal number of intervals that can fit at once without overlap.
The solution is: 4 intervals
There are three results of 4 intervals:
|00|01|02|03|04|05|06|07|08|09|10|
|-----| |-----|-----| |-----| <- result 1
|-----| |-----| |-----| |-----| <- result 2
|-----| |-----|-----| |-----| <- result 3
But there are also two more constraints as well.
If there are more results (of best solution, in this case = 4), then the one with the least number of gaps.
If there are more results still the one with the highest minimal length of all its spaces. For example the one with spaces (of length) 2 & 3 has minimal length of space = 2, that is better than 1 & 4 where the minimal length of space is only 1.
So the result 2 has 4 "continual" chunks, the other two have only 3 so the refinement is:
|00|01|02|03|04|05|06|07|08|09|10|
|-----| |-----------| |-----| <- result 1
|-----| |-----------| |-----| <- result 3
Those two got same space distributions between them, so let's take first one.
The result for the input set is:
Interval count : 4
Optimal solution: |-----| |-----------| |-----|
The algorithm has to work universally for all the space length (not only 11), all interval lengths (interval length is always <= space length) and any number of intervals.
Update:
Problematic scenario:
|00|01|02|03|04|05|06|07|08|09|
|-----|
|-----|
|-----|
|-----|
|-----|
This is a simple dynamic programming problem.
Let the total length be N and the length of a task be L.
Let F(T) be maximum number of tasks that can be selected from the sub interval (T, N), then at each unit time T, there are 3 possibilities:
There is no task that starts at T.
There is a task that starts at T, but we do not include it in the result set.
There is a task that starts at T, and we do include it in the result set.
Case 1 is simple, we just have F(T) = F(T + 1).
In case 2/3, notice that selecting a task that start a T means we must reject all tasks that start while this task is running, i.e. between T and T + L. So we get F(T) = max(F(T + 1), F(T + L) + 1).
Finally, F(N) = 0. So you just start from F(N) and work your way back to F(0).
EDIT: This will give you the maximum number of intervals, but not the set that fulfils your 2 constraints. Your explanation of the constraints is unclear to me, so I'm not sure how to help you there. In particular, I can't tell what constraint 1 means since all the solutions to your example set are apparently equal.
EDIT 2: Some further explanation as requested:
Consider your posted example, we have N = 11 and L = 2. There are tasks that start at T = 0, 3, 4, 5, 6, 9. Starting from F(11) = 0 and working backwards:
F(11) = 0
F(10) = F(11) = 0 (Since no task starts at T = 10)
F(9) = max(F(10), F(11) + 1) = 1
...
Eventually we get to F(0) = 4:
T |00|01|02|03|04|05|06|07|08|09|10|
F(T)| 4| 3| 3| 3| 3| 2| 2| 1| 1| 1| 0|
EDIT 3: Well I was curious enough about this that I wrote a solution, so may as well post it. This will give you the set that has the most tasks, with the least number of gaps, and the smallest minimum gap. The output for the examples in the question is:
(0, 2) -> (4, 6) -> (6, 8) -> (9, 11)
(0, 2) -> (4, 6) -> (8, 10)
Obviously, I make no guarantees about correctness! :)
private class Task
{
public int Start { get; set; }
public int Length { get; set; }
public int End { get { return Start + Length; } }
public override string ToString()
{
return string.Format("({0:d}, {1:d})", Start, End);
}
}
private class CacheEntry : IComparable
{
public int Tasks { get; set; }
public int Gaps { get; set; }
public int MinGap { get; set; }
public Task Task { get; set; }
public Task NextTask { get; set; }
public int CompareTo(object obj)
{
var other = obj as CacheEntry;
if (Tasks != other.Tasks)
return Tasks - other.Tasks; // More tasks is better
if (Gaps != other.Gaps)
return other.Gaps = Gaps; // Less gaps is better
return MinGap - other.MinGap; // Larger minimum gap is better
}
}
private static IList<Task> F(IList<Task> tasks)
{
var end = tasks.Max(x => x.End);
var tasksByTime = tasks.ToLookup(x => x.Start);
var cache = new List<CacheEntry>[end + 1];
cache[end] = new List<CacheEntry> { new CacheEntry { Tasks = 0, Gaps = 0, MinGap = end + 1 } };
for (int t = end - 1; t >= 0; t--)
{
if (!tasksByTime.Contains(t))
{
cache[t] = cache[t + 1];
continue;
}
foreach (var task in tasksByTime[t])
{
var oldCEs = cache[t + task.Length];
var firstOldCE = oldCEs.First();
var lastOldCE = oldCEs.Last();
var newCE = new CacheEntry
{
Tasks = firstOldCE.Tasks + 1,
Task = task,
Gaps = firstOldCE.Gaps,
MinGap = firstOldCE.MinGap
};
// If there is a task that starts at time T + L, then that will always
// be the best option for us, as it will have one less Gap than the others
if (firstOldCE.Task == null || firstOldCE.Task.Start == task.End)
{
newCE.NextTask = firstOldCE.Task;
}
// Otherwise we want the one that maximises MinGap.
else
{
var ce = oldCEs.OrderBy(x => Math.Min(x.Task.Start - newCE.Task.End, x.MinGap)).Last();
newCE.NextTask = ce.Task;
newCE.Gaps++;
newCE.MinGap = Math.Min(ce.MinGap, ce.Task.Start - task.End);
}
var toComp = cache[t] ?? cache[t + 1];
if (newCE.CompareTo(toComp.First()) < 0)
{
cache[t] = toComp;
}
else
{
var ceList = new List<CacheEntry> { newCE };
// We need to keep track of all subsolutions X that start on the interval [T, T+L] that
// have an equal number of tasks and gaps, but a possibly a smaller MinGap. This is
// because an earlier task may have an even smaller gap to this task.
int idx = newCE.Task.Start + 1;
while (idx < newCE.Task.End)
{
toComp = cache[idx];
if
(
newCE.Tasks == toComp.First().Tasks &&
newCE.Gaps == toComp.First().Gaps &&
newCE.MinGap >= toComp.First().MinGap
)
{
ceList.AddRange(toComp);
idx += toComp.First().Task.End;
}
else
idx++;
}
cache[t] = ceList;
}
}
}
var rv = new List<Task>();
var curr = cache[0].First();
while (true)
{
rv.Add(curr.Task);
if (curr.NextTask == null) break;
curr = cache[curr.NextTask.Start].First();
}
return rv;
}
public static void Main()
{
IList<Task> tasks, sol;
tasks = new List<Task>
{
new Task { Start = 0, Length = 2 },
new Task { Start = 3, Length = 2 },
new Task { Start = 4, Length = 2 },
new Task { Start = 5, Length = 2 },
new Task { Start = 6, Length = 2 },
new Task { Start = 9, Length = 2 },
};
sol = F(tasks);
foreach (var task in sol)
Console.Out.WriteLine(task);
Console.Out.WriteLine();
tasks = new List<Task>
{
new Task { Start = 0, Length = 2 },
new Task { Start = 3, Length = 2 },
new Task { Start = 4, Length = 2 },
new Task { Start = 8, Length = 2 },
};
sol = F(tasks);
foreach (var task in sol)
Console.Out.WriteLine(task);
Console.Out.WriteLine();
tasks = new List<Task>
{
new Task { Start = 0, Length = 5 },
new Task { Start = 6, Length = 5 },
new Task { Start = 7, Length = 3 },
new Task { Start = 8, Length = 9 },
new Task { Start = 19, Length = 1 },
};
sol = F(tasks);
foreach (var task in sol)
Console.Out.WriteLine(task);
Console.Out.WriteLine();
Console.In.ReadLine();
}
in my program i need to write large text files (~300 mb), the text files contains numbers seperated by spaces, i'm using this code :
TextWriter guessesWriter = TextWriter.Synchronized(new StreamWriter("guesses.txt"));
private void QueueStart()
{
while (true)
{
if (writeQueue.Count > 0)
{
guessesWriter.WriteLine(writeQueue[0]);
writeQueue.Remove(writeQueue[0]);
}
}
}
private static void Check()
{
TextReader tr = new StreamReader("data.txt");
string guess = tr.ReadLine();
b = 0;
List<Thread> threads = new List<Thread>();
while (guess != null) // Reading each row and analyze it
{
string[] guessNumbers = guess.Split(' ');
List<int> numbers = new List<int>();
foreach (string s in guessNumbers) // Converting each guess to a list of numbers
numbers.Add(int.Parse(s));
threads.Add(new Thread(GuessCheck));
threads[b].Start(numbers);
b++;
guess = tr.ReadLine();
}
}
private static void GuessCheck(object listNums)
{
List<int> numbers = (List<int>) listNums;
if (!CloseNumbersCheck(numbers))
{
writeQueue.Add(numbers[0] + " " + numbers[1] + " " + numbers[2] + " " + numbers[3] + " " + numbers[4] + " " + numbers[5] + " " + numbers[6]);
}
}
private static bool CloseNumbersCheck(List<int> numbers)
{
int divideResult = numbers[0]/10;
for (int i = 1; i < 6; i++)
{
if (numbers[i]/10 != divideResult)
return false;
}
return true;
}
the file data.txt contains data in this format : (dots mean more numbers following the same logic)
1 2 3 4 5 6 1
1 2 3 4 5 6 2
1 2 3 4 5 6 3
.
.
.
1 2 3 4 5 6 8
1 2 3 4 5 7 1
.
.
.
i know this is not very efficient and i was looking for some advice on how to make it quicker.
if you night know how to save LARGE amount of numbers more efficiently than a .txt i would appreciate it.
One way to improve efficiency is with a larger buffer on your output stream. You are using the defaults, which give you probably a 1k buffer, but you won't see maximum performance with less than a 64k buffer. Open your file like this:
new StreamWriter("guesses.txt", new UTF8Encoding(false, true), 65536)
Instead of reading and writing line by line (ReadLine and WriteLine), you should read and write big block of data (ReadBlock and Write). This way you will access disk alot less and have a big performance boost. But you will need to manage the end of each line (look at Environment.NewLine).
The effeciency could be improved by using BinaryWriter. Then you could just write out integers directly. This would allow you to skip the parsing step on the read and the ToString conversion on the write.
It also looks like you are creating a bunch of threads in there. Additional threads will slow down your performance. You should do all of the work on a single thread, since threads are very heavyweight objects.
Here is a more-or-less direct conversion of your code to use a BinaryWriter. (This does not address the thread problem.)
BinaryWriter guessesWriter = new BinaryWriter(new StreamWriter("guesses.dat"));
private void QueueStart()
{
while (true)
{
if (writeQueue.Count > 0)
{
lock (guessesWriter)
{
guessesWriter.Write(writeQueue[0]);
}
writeQueue.Remove(writeQueue[0]);
}
}
}
private const int numbersPerThread = 6;
private static void Check()
{
BinaryReader tr = new BinaryReader(new StreamReader("data.txt"));
b = 0;
List<Thread> threads = new List<Thread>();
while (tr.BaseStream.Position < tr.BaseStream.Length)
{
List<int> numbers = new List<int>(numbersPerThread);
for (int index = 0; index < numbersPerThread; index++)
{
numbers.Add(tr.ReadInt32());
}
threads.Add(new Thread(GuessCheck));
threads[b].Start(numbers);
b++;
}
}
Try using a bufferi n between. There is a BGufferdSTream. Right now you use very inefficient disc access patterns.