No performance gains with Parallel.ForEach and Regex?


I have a program that color-codes a returned result set in a certain way depending on what the results are. Because color-coding takes so long (it is currently done with Regex and RichTextBox.Select + .SelectionColor), I cut it off at 400 results. At around that number it takes about 20 seconds, which is about the maximum I'd consider reasonable.
To try to improve performance I rewrote the Regex part to use a Parallel.ForEach loop to iterate through the MatchCollection, but the time was about the same (18-19 seconds vs 20)! Is this just not a job that lends itself to parallel programming very well? Should I try something different? Any advice is welcome. Thanks!
PS: I thought it was a bit strange that my CPU utilization never went above 14%, with or without Parallel.ForEach.
Code
MatchCollection startMatches = Regex.Matches(tempRTB.Text, startPattern);
object locker = new object();
System.Threading.Tasks.Parallel.ForEach(startMatches.Cast<Match>(), m =>
{
int i = 0;
foreach (Group g in m.Groups)
{
if (i > 0 && i < 5 && g.Length > 0)
{
tempRTB.Invoke(new Func<bool>(
delegate
{
lock (locker)
{
tempRTB.Select(g.Index, g.Length);
if ((i & 1) == 0) // Even number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.ValueColor;
else // Odd number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.AttributeColor;
return true;
}
}));
}
else if (i == 5 && g.Length > 0)
{
var result = tempRTB.Invoke(new Func<string>(
delegate
{
lock (locker)
{
return tempRTB.Text.Substring(g.Index, g.Length);
}
}));
MatchCollection subMatches = Regex.Matches((string)result, pattern);
foreach (Match subMatch in subMatches)
{
int j = 0;
foreach (Group subGroup in subMatch.Groups)
{
if (j > 0 && subGroup.Length > 0)
{
tempRTB.Invoke(new Func<bool>(
delegate
{
lock (locker)
{
tempRTB.Select(g.Index + subGroup.Index, subGroup.Length);
if ((j & 1) == 0) // Even number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.ValueColor;
else // Odd number
tempRTB.SelectionColor = Namespace.Properties.Settings.Default.AttributeColor;
return true;
}
}));
}
j++;
}
}
}
i++;
}
});

Virtually no aspect of your program is actually able to run in parallel.
The generation of the matches needs to be done sequentially. It can't find the second match until it has already found the first. Parallel.ForEach will, at best, allow you to process the results of the sequence in parallel, but they are still generated sequentially. This is where the majority of your time consuming work seems to be, and there are no gains there.
On top of that, you aren't really processing the results in parallel either. The majority of code run in the body of your loop is all inside an invoke to the UI thread, which means it's all being run by a single thread.
In short, only a tiny, tiny bit of your program is actually run in parallel, and parallelization in general adds some overhead; it sounds like you're just barely gaining more than that overhead costs. There isn't really much that you did wrong. The operation just inherently doesn't lend itself to parallelization, unless there is an effective way of breaking up the initial string into several smaller chunks that the regex can parse individually (in parallel).

Most of the time in your code is most likely spent in the part that actually selects the text in the RichTextBox and sets the color.
This code cannot execute in parallel, because it has to be marshalled to the UI thread, which you do via tempRTB.Invoke.
Furthermore, you explicitly make sure that the highlighting is executed sequentially rather than in parallel by using the lock statement. This is unnecessary, because all of that code runs on the single UI thread anyway.
You could try to improve performance by suspending the layout of the control while you select and color the text in the RTB:
tempRTB.SuspendLayout();
// your loop
tempRTB.ResumeLayout();
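Since the answers above agree that only the UI thread can touch the RichTextBox, one way to restructure the work is to collect all the spans to color from the plain string first, with no Invoke and no lock, and then apply them in a single pass on the UI thread. The following is only a sketch under assumptions: HighlightPlanner and CollectSpans are hypothetical names, the pattern is whatever startPattern is in the question, and the nested sub-pattern for the fifth group is left out for brevity.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: does the regex work on a plain string, away from the UI.
public static class HighlightPlanner
{
    // Returns one entry per non-empty capture group: where it starts, how long
    // it is, and whether its group number is even (the question colors even
    // groups as values and odd groups as attributes).
    public static List<(int Index, int Length, bool IsEvenGroup)> CollectSpans(
        string text, string pattern)
    {
        var spans = new List<(int Index, int Length, bool IsEvenGroup)>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            // Group 0 is the whole match, so start at 1.
            for (int i = 1; i < m.Groups.Count; i++)
            {
                Group g = m.Groups[i];
                if (g.Success && g.Length > 0)
                    spans.Add((g.Index, g.Length, (i & 1) == 0));
            }
        }
        return spans;
    }
}
```

On the UI thread, a single loop over the returned list can then call tempRTB.Select(span.Index, span.Length) and set SelectionColor, which keeps the number of cross-thread calls at zero. Whether this is faster in practice depends on how much of the 20 seconds is spent inside the RichTextBox itself.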

Related

No memory leaks or errors but my code slows down exponentially C#

I am perplexed by this issue. I believe I'm just missing an easy problem right in front of my face but I'm at the point where I need a second opinion to point out anything obvious that I'm missing. I minimized my code and simplified it so it only shows a small part of what it does. The full code is just many different calculations added on to what I have below.
for (int h = 2; h < 200; h++)
{
var List1 = CalculateSomething(testValues, h);
var masterLists = await AddToRsquaredList("Calculation1", h, actualValuesList, List1, masterLists.Item1, masterLists.Item2);
var List2 = CalculateSomething(testValues, h);
masterLists = await AddToRsquaredList("Calculation2", h, actualValuesList, List2, masterLists.Item1, masterLists.Item2);
var List3 = CalculateSomething(testValues, h);
masterLists = await AddToRsquaredList("Calculation3", h, actualValues, List3, masterLists.Item1, masterLists.Item2);
}
public static async Task<(List<RSquaredValues3>, List<ValueClass>)> AddToRsquaredList(string valueName, int days,
IEnumerable<double> estimatedValuesList, IEnumerable<double> actualValuesList,
List<RSquaredValues3> rSquaredList, List<ValueClass> valueClassList)
{
try
{
RSquaredValues3 rSquaredValue = new RSquaredValues3
{
ValueName = valueName,
Days = days,
RSquared = GoodnessOfFit.CoefficientOfDetermination(estimatedValuesList, actualValuesList),
StdError = GoodnessOfFit.PopulationStandardError(estimatedValuesList, actualValuesList)
};
int comboSize = 15;
double max = 0;
var query = await rSquaredList.OrderBy(i => i.StdError - i.RSquared).DistinctBy(i => i.ValueName).Take(comboSize).ToListAsync().ConfigureAwait(false);
if (query.Count > 0)
{
max = query.Last().StdError - query.Last().RSquared;
}
else
{
max = 10000000;
}
if ((rSquaredValue.StdError - rSquaredValue.RSquared < max || query.Count < comboSize) && rSquaredList.Contains(rSquaredValue) == false)
{
rSquaredList.Add(rSquaredValue);
valueClassList.Add(new ValueClass { ValueName = rSquaredValue.ValueName, ValueList = estimatedValuesList, Days = days });
}
}
catch (Exception ex)
{
ThrowExceptionInfo(ex);
}
return (rSquaredList, valueClassList);
}

There is clearly a significance to StdError - RSquared, so change RSquaredValues3 to expose that value (i.e. calculate it once, on construction, since the values do not change) rather than recalculating it in multiple places during the processing loop.
The value in this new property is the way that the list is being sorted. Rather than sorting the list over and over again, consider keeping the items in the list in that order in the first place. You can do this by ensuring that each time an item gets added, it is inserted in the right place in the list. This is called an insertion sort. (I have assumed that SortedList<TKey,TValue> is inappropriate due to duplicate 'key's.)
Similar improvements can be made to avoid the need for DistinctBy(i => i.ValueName). If you are only interested in distinct value names, then consider avoiding inserting the item if it is not providing an improvement.
Your List needs to grow during your processing - under the hood, the list doubles every time it grows, so the number of growths is O(log(n)). You can specify a suggested capacity in construction. If you specify the expected size large enough at the start, then the list will not need to do this during your processing.
The await of the ToListAsync is not adding any advantage to this code, as far as I can see.
The check for rSquaredList.Contains(rSquaredValue) == false looks like a redundant check, since this is a reference comparison of a newly instantiated item which cannot have been inserted in the list. So you can remove it to make it run faster.
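The first two suggestions above (compute StdError - RSquared once, and keep the list ordered as items are added) could look roughly like the sketch below. ScoredItem and SortedInsert are hypothetical stand-ins for RSquaredValues3 and the list handling in the question; List<T>.BinarySearch returns the bitwise complement of the insertion point when the item is not found, which is what makes the insert cheap to locate.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for RSquaredValues3 with the sort key precomputed.
public sealed class ScoredItem
{
    public ScoredItem(string name, double stdError, double rSquared)
    {
        Name = name;
        Score = stdError - rSquared; // calculated once, on construction
    }

    public string Name { get; }
    public double Score { get; }
}

public static class SortedInsert
{
    private static readonly Comparer<ScoredItem> ByScore =
        Comparer<ScoredItem>.Create((a, b) => a.Score.CompareTo(b.Score));

    // Inserts item so that the list stays ordered by ascending Score.
    // Assumes the list is already ordered that way.
    public static void Insert(List<ScoredItem> sorted, ScoredItem item)
    {
        int index = sorted.BinarySearch(item, ByScore);
        if (index < 0)
            index = ~index; // not found: complement is the insertion point
        sorted.Insert(index, item);
    }
}
```

With the list always in order, the best comboSize entries are simply the first comboSize elements, so the OrderBy call on every iteration is no longer needed.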

With all that use of Task and await, you are not actually gaining anything at the moment, since a single thread handles everything and each call is awaited sequentially, so it appears to all be overhead. I am not sure whether you can parallelize this workload, but the main loop from 2 to 200 seems like a prime candidate for a Parallel.For() loop instead. If you implement parallelism, you should also look into using a System.Collections.Concurrent.ConcurrentBag<T> for your master list to avoid race conditions from concurrent writes.

How to loop a list while using Multi-threading in c#

I have a List to loop over using multiple threads. I get the first item of the List, do some processing, then remove the item.
When the count of the List is no longer greater than 0, I fetch more data from the database.
In a word:
I have a lot of records in my database. I need to publish them to my server. In the process of publishing, multithreading is required and the number of threads may be 10 or less.
For example:
private List<string> list;
void LoadDataFromDatabase(){
list=...;//load data from database...
}
void DoMethod()
{
while(list.Count>0)
{
var item=list.FirstOrDefault();
list.RemoveAt(0);
DoProcess();//how to use multi-thread (custom the count of threads)?
if(list.Count<=0)
{
LoadDataFromDatabase();
}
}
}
Please help me. I'm a beginner in C#. I have searched for a lot of solutions, but found nothing similar.
In addition, I need to be able to customize the number of threads.

Does your processing of the list have to be sequential? In other words, is it impossible to process element n + 1 before element n has finished processing? If so, multi-threading is not the right solution.
Otherwise, if your elements can be processed fully independently, you can use m threads, giving each thread Elements.Count / m elements to work on.
Example: printing a list:
List<int> a = new List<int> { 1, 2, 3, 4,5 , 6, 7, 8, 9 , 10 };
int num_threads = 2;
int thread_elements = a.Count / num_threads;
// start the threads
Thread[] threads = new Thread[num_threads];
for (int i = 0; i < num_threads; ++i)
{
// Work takes an argument, so it needs ParameterizedThreadStart, not ThreadStart
threads[i] = new Thread(new ParameterizedThreadStart(Work));
threads[i].Start(i);
}
// this works fine if the total number of elements is divisible by num_threads
// but if we have 500 elements, 7 threads, then thread_elements = 500 / 7 = 71
// but 71 * 7 = 497, so that there are 3 elements not processed
// process them here:
int actual = thread_elements * num_threads;
for (int i = actual; i < a.Count; ++i)
Console.WriteLine(a[i]);
// wait all threads to finish
for (int i = 0; i < num_threads; ++i)
{
threads[i].Join();
}
void Work(object arg)
{
Console.WriteLine("Thread #" + arg + " has begun...");
// calculate my working range [start, end)
int id = (int)arg;
int mystart = id * thread_elements;
int myend = (id + 1) * thread_elements;
// start work on my range !!
for (int i = mystart; i < myend; ++i)
Console.WriteLine("Thread #" + arg + " Element " + a[i]);
}
ADD: For your case (uploading to a server), it is the same as the code above. You choose a number of threads, and each thread is assigned a number of elements (calculated automatically in the variable thread_elements, so you only need to change num_threads). In the Work method, all you need to do is replace the line Console.WriteLine("Thread #" + arg + " Element " + a[i]); with your uploading code.
One more thing to keep in mind is that multi-threading depends on your machine's CPU. If your CPU has 4 cores, for example, then the best performance for CPU-bound work is obtained with at most 4 threads, one per core. If you have 10 threads, for example, they will be slower than 4 threads because they compete for CPU cores. The exception is when the threads are idle, waiting for some event to occur (e.g. uploading). In that case 10 threads can run, because they don't take 100% of the CPU.
WARNING: DO NOT modify the list while any thread is working on it (add, remove, set element...), and do not assign the same element to two threads. Such things cause a lot of bugs and exceptions!

This is a simple scenario that can be expanded in multiple ways if you add some details to your requirements:
IEnumerable<Data> LoadDataFromDatabase()
{
return ...
}
void ProcessInParallel()
{
while(true)
{
var data = LoadDataFromDatabase().ToList();
if(!data.Any()) break;
data.AsParallel().ForAll(ProcessSingleData); // PLINQ's method is ForAll; ParallelQuery has no ForEach
}
}
void ProcessSingleData(Data d)
{
// do something with data
}

There are many ways to approach this. You can create threads and partition the list yourself, or you can take advantage of the TPL and use Parallel.ForEach. In the linked example, an Action is called for each member of the list being iterated over. If this is your first taste of threading, I would also attempt to do it the old-fashioned way.
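For the "custom count of threads" part of the question, Parallel.ForEach accepts a ParallelOptions object whose MaxDegreeOfParallelism caps how many items are processed at once. A minimal sketch, assuming the items are independent of each other; BoundedParallel and Run are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedParallel
{
    // Processes every item, using at most maxThreads concurrent workers.
    // The process callback must be safe to call from several threads at once.
    public static void Run<T>(IEnumerable<T> items, int maxThreads, Action<T> process)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.ForEach(items, options, process);
    }
}
```

A call such as BoundedParallel.Run(records, 10, Publish) would then publish with up to ten workers, where records and Publish stand in for the asker's data and upload method.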

Here is my opinion ;)
You can avoid using multiple threads if your "List" is not really huge.
Instead of a List, you can use a Queue (FIFO - First In, First Out). Then just use the Dequeue() method to get one element from the Queue, do some work on it, and get the next one. Something like:
while(queue.Count > 0)
{
var temp = DoSomeWork(queue.Dequeue());
}
I think that this will suit your purpose better.

I will get the first item of the List and do some processing,then remove the item.
Bad.
First, you want a queue, not a list.
Second, you do not process then remove, you remove THEN process.
Why?
So that you keep the locks small. Lock list access (note that you need to synchronize access), remove the item, THEN unlock immediately, and only then process. This way you hold the lock only briefly. If you take, process, then remove, you are basically single-threaded, because you have to hold the lock while processing so that the next thread does not take the same item again.
And since you need to synchronize access and want multiple threads, this is about the only way.
Read up on the lock statement for a start (you can later move to something like SpinLock). Do NOT use threads unless you have to; schedule Tasks instead (using the Task API introduced in .NET 4.0), which gives you more flexibility.
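The "remove THEN process" advice above can be sketched as follows. This is an illustration rather than a definitive implementation: QueueWorkers and ProcessAll are hypothetical names, and the worker count is whatever the caller chooses. The lock is held only long enough to dequeue one item, and the processing itself runs outside it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class QueueWorkers
{
    // Drains the queue with workerCount tasks. Each worker takes one item
    // under the lock, releases the lock, and only then processes the item.
    public static void ProcessAll<T>(IEnumerable<T> items, int workerCount, Action<T> process)
    {
        var queue = new Queue<T>(items);
        var gate = new object();
        var workers = new Task[workerCount];

        for (int w = 0; w < workerCount; w++)
        {
            workers[w] = Task.Run(() =>
            {
                while (true)
                {
                    T item;
                    lock (gate)
                    {
                        if (queue.Count == 0)
                            return;         // nothing left for this worker
                        item = queue.Dequeue();
                    }
                    process(item);          // runs outside the lock
                }
            });
        }

        Task.WaitAll(workers);
    }
}
```

System.Collections.Concurrent.ConcurrentQueue<T> with TryDequeue would remove the need for the explicit lock; the version above just keeps the locking visible to match the explanation.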

ThreadPool uses excessive amounts of memory in just a few seconds

I have made a simple console application for printing prime numbers. I am using ThreadPool for the function which checks whether a number is prime or not.
In Task Manager, this program starts to take too much memory (1 GB in a few seconds).
How do I improve that if I have to still use ThreadPool?
Here is the code which I wrote
class Program
{
static void Main(string[] args)
{
Console.WriteLine(2);
Console.WriteLine(3);
Console.WriteLine(5);
Console.WriteLine(7);
Console.WriteLine(11);
Console.WriteLine(13);
Console.WriteLine(17);
for (long i = 19; i < Int64.MaxValue; i = i+2)
{
if(i % 3 == 0 || i % 5 == 0 || i % 7 == 0 || i % 11 == 0 || i % 13 == 0 || i % 17 == 0 )
continue;
ThreadPool.QueueUserWorkItem(CheckForPrime, i);
}
Console.Read();
}
private static void CheckForPrime(object i)
{
var i1 = i as long?;
var val = Math.Sqrt(i1.Value);
for (long j = 19; j <= val; j = j + 2)
{
if (i1 % j == 0) return;
}
Console.WriteLine(i1);
}
}

The simplest way to fix your code is to limit the work queue using a semaphore:
class Program
{
// Max 100 items in queue
private static readonly Semaphore WorkLimiter = new Semaphore(100, 100);
static void Main(string[] args)
{
Console.WriteLine(2);
Console.WriteLine(3);
Console.WriteLine(5);
Console.WriteLine(7);
Console.WriteLine(11);
Console.WriteLine(13);
Console.WriteLine(17);
for (long i = 19; i < Int64.MaxValue; i = i + 2)
{
if (i % 3 == 0 || i % 5 == 0 || i % 7 == 0 || i % 11 == 0 || i % 13 == 0 || i % 17 == 0)
continue;
// Get one of the 100 "allowances" to add to the queue.
WorkLimiter.WaitOne();
ThreadPool.QueueUserWorkItem(CheckForPrime, i);
}
Console.Read();
}
private static void CheckForPrime(object i)
{
var i1 = i as long?;
try
{
var val = Math.Sqrt(i1.Value);
for (long j = 19; j <= val; j = j + 2)
{
if (i1%j == 0) return;
}
Console.WriteLine(i1);
}
finally
{
// Allow another add to the queue
WorkLimiter.Release();
}
}
}
This will allow you to keep the queue full (100 items in queue) at all times, without over-filling it or adding a Sleep.

To put it rather bluntly, you're doing multi-threading wrong. Threads are a powerful tool when used correctly, but like all tools, they're not the correct solution in every case. A glass bottle works well for holding beer, but not so great for hammering nails.
In the general case, creating more threads is not going to make things run any faster, and that is particularly true here as you've discovered. The code you've written queues up a new thread each iteration through your loop, and each of those threads will allocate a stack. Since the default size for a stack in the .NET world is 1 MB, it doesn't take very long for your memory commitment to skyrocket. It therefore comes as no particular surprise that you're exceeding 1 GB. Eventually, you'll run into a hard memory limit and get an OutOfMemoryException thrown at you. And memory is just the most obvious of the resources that your design is quickly starving your system for. Unless your system resources can grow exponentially with your thread pool, you're not going to experience any performance advantages.
Adil suggests inserting a call to Thread.Sleep to give the new thread you create time to run before continuing through the loop (and creating additional threads). As I mentioned in a comment, although this "works", it seems like an awfully ugly hack to me. But it's hard for me to suggest a better solution because the real problem is the design. You say that you have to use a thread pool, but you don't say why that is the case.
If you absolutely have to use a thread pool, the best workaround is probably to set an arbitrary limit on the size of the thread pool (i.e., how many new threads it can spawn), which is accomplished by calling the SetMaxThreads method. This feels at least a little less hacky to me than Thread.Sleep.
Note: If you decide to pursue the SetMaxThreads approach, you should be aware that you cannot set the maximum to less than the minimum. The default value for the minimum is the number of CPU cores, so if you have a dual-core processor, you can't set the maximum to 1 without first lowering the minimum.
Finally, although it doesn't really change the answer in this case, it is worth noting that Task Manager is not a memory profiler. Relying on it as if it were one will frequently get you bad (or at least very misleading) data.
Edit: After further thought, it occurs to me that the problem really lies not with exponential execution but with exponential queuing. The maximum number of allowed threads is probably irrelevant, because the code will still queue work items faster than they can ever hope to be processed. So never mind about limiting the size. You probably want to go with Joachim's solution involving the creation of a semaphore, or the implicit suggestion that everyone has made of not using a thread pool.

You are creating threads in a loop without any pause. You should pause between creating them so that some threads finish executing before you queue more work on the ThreadPool. You can use System.Threading.Thread.Sleep for that.
for (long i = 19; i < Int64.MaxValue; i = i+2)
{
if(i % 3 == 0 || i % 5 == 0 || i % 7 == 0 || i % 11 == 0 || i % 13 == 0 || i % 17 == 0 )
continue;
ThreadPool.QueueUserWorkItem(CheckForPrime, i);
System.Threading.Thread.Sleep(100);
}
You should know where threads are beneficial, how many you need, and what their impact on application performance will be. How long to suspend the current thread depends on the application. I just used 100 milliseconds; adjust it for your application.

How come this algorithm in Ruby runs faster than in Parallel'd C#?

The following ruby code runs in ~15s. It barely uses any CPU/Memory (about 25% of one CPU):
def collatz(num)
num.even? ? num/2 : 3*num + 1
end
start_time = Time.now
max_chain_count = 0
max_starter_num = 0
(1..1000000).each do |i|
count = 0
current = i
current = collatz(current) and count += 1 until (current == 1)
max_chain_count = count and max_starter_num = i if (count > max_chain_count)
end
puts "Max starter num: #{max_starter_num} -> chain of #{max_chain_count} elements. Found in: #{Time.now - start_time}s"
And the following TPL C# puts all my 4 cores to 100% usage and is orders of magnitude slower than the ruby version:
static void Euler14Test()
{
Stopwatch sw = new Stopwatch();
sw.Start();
int max_chain_count = 0;
int max_starter_num = 0;
object locker = new object();
Parallel.For(1, 1000000, i =>
{
int count = 0;
int current = i;
while (current != 1)
{
current = collatz(current);
count++;
}
if (count > max_chain_count)
{
lock (locker)
{
max_chain_count = count;
max_starter_num = i;
}
}
if (i % 1000 == 0)
Console.WriteLine(i);
});
sw.Stop();
Console.WriteLine("Max starter i: {0} -> chain of {1} elements. Found in: {2}s", max_starter_num, max_chain_count, sw.Elapsed.ToString());
}
static int collatz(int num)
{
return num % 2 == 0 ? num / 2 : 3 * num + 1;
}
How come ruby runs faster than C#? I've been told that Ruby is slow. Is that not true when it comes to algorithms?
Perf AFTER correction:
Ruby (Non parallel): 14.62s
C# (Non parallel): 2.22s
C# (With TPL): 0.64s

Actually, the bug is quite subtle, and has nothing to do with threading. The reason that your C# version takes so long is that the intermediate values computed by the collatz method eventually start to overflow the int type, resulting in negative numbers which may then take ages to converge.
This first happens when i is 134,379, for which the 129th term (assuming one-based counting) is 2,482,111,348. This exceeds the maximum value of 2,147,483,647 and therefore gets stored as -1,812,855,948.
To get good performance (and correct results) on the C# version, just change:
int current = i;
…to:
long current = i;
…and:
static int collatz(int num)
…to:
static long collatz(long num)
That will bring your running time down to a respectable 1.5 seconds.
Edit: CodesInChaos raises a very valid point about enabling overflow checking when debugging math-oriented applications. Doing so would have allowed the bug to be immediately identified, since the runtime would throw an OverflowException.
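To illustrate the overflow-checking point, here is a small sketch using C#'s checked operator, which makes int arithmetic throw an OverflowException instead of silently wrapping to a negative number. The class and method names are hypothetical; only the collatz formula comes from the question.

```csharp
using System;

public static class CollatzChecked
{
    // The int version from the question, but with the multiplication checked:
    // an overflow now throws instead of producing a negative value.
    public static int NextChecked(int num)
    {
        return num % 2 == 0 ? num / 2 : checked(3 * num + 1);
    }

    // The corrected version using long, as suggested in the answer.
    public static long Next(long num)
    {
        return num % 2 == 0 ? num / 2 : 3 * num + 1;
    }

    // Number of steps needed to reach 1 from start (start must be >= 1).
    public static int ChainLength(long start)
    {
        int steps = 0;
        for (long n = start; n != 1; n = Next(n))
            steps++;
        return steps;
    }
}
```

Overflow checking can also be turned on for a whole project with the CheckForOverflowUnderflow build property, which avoids having to mark individual expressions.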

Should be:
Parallel.For(1L, 1000000L, i =>
{
Otherwise, you have integer overflow and start checking negative values. Likewise, the collatz method should operate on long values.

I experienced something like that, and I figured out it's because each of your loop iterations needs to start another thread, which takes some time. In this case that time is comparable to (I think greater than) the time taken by the operations you actually do in the loop body.
There is an alternative: get the number of CPU cores you have and then use a parallel loop with as many iterations as you have cores. Each iteration evaluates part of the actual loop you want, using an inner for loop whose range depends on the parallel loop's index.
EDIT: EXAMPLE
int start = 1, end = 1000000;
Parallel.For(0, N_CORES, n =>
{
int s = start + (end - start) * n / N_CORES;
int e = n == N_CORES - 1 ? end : start + (end - start) * (n + 1) / N_CORES;
for (int i = s; i < e; i++)
{
// Your code
}
});
You should try this code, I'm pretty sure this will do the job faster.
EDIT: ELUCIDATION
Well, quite a long time since I answered this question, but I faced the problem again and finally understood what's going on.
I had been using the AForge implementation of the parallel for loop, and it seems to fire a thread for each iteration of the loop. That's why, if each iteration takes a relatively small amount of time to execute, you end up with inefficient parallelism.
So, as some of you pointed out, System.Threading.Tasks.Parallel methods are based on Tasks, which are kind of a higher level of abstraction of a Thread:
"Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms that determine and adjust to the number of threads and that provide load balancing to maximize throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism."
So yes, if you use the default library's implementation, you won't need this kind of workaround.
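For completeness, the standard library also offers a built-in form of the manual per-core chunking shown earlier: Partitioner.Create(from, to) splits an integer range into sub-ranges, and Parallel.ForEach hands each sub-range to a worker. The sketch below is an assumption-laden illustration (ChunkedCollatz and Longest are hypothetical names) that also uses long for the running value, as the other answers recommend, and takes the lock once per chunk rather than once per number.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedCollatz
{
    // Finds the start value in [fromInclusive, toExclusive) with the longest
    // Collatz chain. Assumes fromInclusive >= 1 and toExclusive > fromInclusive.
    public static (int Start, int Steps) Longest(int fromInclusive, int toExclusive)
    {
        int bestStart = 0, bestSteps = -1;
        var gate = new object();

        Parallel.ForEach(Partitioner.Create(fromInclusive, toExclusive), range =>
        {
            // Each worker tracks its own best result without any locking.
            int localStart = 0, localSteps = -1;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                int steps = 0;
                for (long n = i; n != 1; steps++)
                    n = n % 2 == 0 ? n / 2 : 3 * n + 1;
                if (steps > localSteps) { localSteps = steps; localStart = i; }
            }

            // Merge into the shared result once per chunk.
            lock (gate)
            {
                if (localSteps > bestSteps ||
                    (localSteps == bestSteps && localStart < bestStart))
                {
                    bestSteps = localSteps;
                    bestStart = localStart;
                }
            }
        });

        return (bestStart, bestSteps);
    }
}
```

Comparing and updating the shared maximum inside the same lock also avoids the race in the question's code, where count is compared against max_chain_count before the lock is taken.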

Why does this threaded code run so much slower?

I'm writing an N-Body simulation, and for computational simplification I've divided the whole space into a number of uniformly-sized regions.
For each body, I compute the force of all other bodies in the same region, and for the other regions I aggregate the mass and distances together so there's less work to be done.
I have a List<Region> and Region defines public void Index() which sums the total mass at this iteration.
I have two variants of my Space.Tick() function:
public void Tick()
{
foreach (Region r in Regions)
r.Index();
}
This is very quick. For 20x20x20 = 8000 regions with 100 bodies each = 800000 bodies in total, it only takes about 0.1 seconds to do this. The CPU graph shows 25% utilisation on my quad-core, which is exactly what I would expect.
Now I write this multi-threaded variant:
public void Tick()
{
Thread[] threads = new Thread[Environment.ProcessorCount];
foreach (Region r in Regions)
while (true)
{
bool queued = false;
for (int i = 0; i < threads.Length; i++)
if (threads[i] == null || !threads[i].IsAlive)
{
Region s = r;
threads[i] = new Thread(s.Index);
threads[i].Start();
queued = true;
break;
}
if (queued)
break;
}
}
So a quick explanation in case it's not obvious: threads is an array of 4, in the case of my CPU. It starts off being 4xnull. For each region, I loop through all 4 Thread objects (which could be null). When I find one that's either null or isn't IsAlive, I queue up the Index() of that Region and Start() it. I set queued to true so that I can tell that the region has started indexing.
This code takes about 7 seconds. That's 70x slower. I understand that there's a bit of overhead involved with setting up the threads, finding a thread that's vacant, etc. But I would still expect that I would have at least some sort of performance gain.
What am I doing wrong?

Why not try PLINQ?
Regions.AsParallel().ForAll(x=>x.Index());
PLINQ is usually very fast for me, and it scales depending on your environment. If the query shouldn't run in parallel, it runs on a single thread.
So, if you had to have a multidimensional array come into the function, you could just do this:
Regions.AsParallel().Cast<Region>().ForAll(x=>x.Index());
