Unwanted delay between starting each BackgroundWorker array element - c#

I want to create an array of BackgroundWorkers which work alongside each other to take advantage of multicore processors. Let's say I want eight of them working simultaneously. I also want a separate Timer thread to report in the GUI on some data which each of the threads has been processing.
My problem is this. It takes around 3-4 seconds for each worker to start work. So the first worker element in the array starts straight away. The second starts a few seconds after. The third starts a few seconds after that etc. etc.
I want them all to start straight away. How can I fix this?
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
BackgroundWorker[] bw;
int threads = 8;
data[] d;
private void Form1_Load(object sender, EventArgs e)
{
d = new data[threads];
for (int i = 0; i < threads; i++) d[i] = new data(i);
bw = new BackgroundWorker[threads];
for (int i = 0; i < threads; i++)
{
bw[i] = new BackgroundWorker();
bw[i].DoWork += new DoWorkEventHandler(Work);
}
timer1.Enabled = true;
for (int i = 0; i < threads; i++) bw[i].RunWorkerAsync(d[i]);
}
private void Work(object sender, DoWorkEventArgs e)
{
data o = (data)e.Argument;
while (true) o.count += o.id;
}
private void timer1_Tick(object sender, EventArgs e)
{
StringBuilder sb = new StringBuilder();
long total = 0;
for (int n = 0; n < threads; n++) {
sb.Append(d[n].id + ": " + d[n].count + "\n");
total+=d[n].count;
}
sb.Append("TOTAL: "+total);
richTextBox1.Text = sb.ToString();
}
}
public class data {
public int id;
public long count;
public data(int id)
{
this.id = id;
this.count = 0;
}
}
EDIT: I found out later that the 3-4 second delay only applies beyond the number of logical cores you have. So if you have 20 cores and set 20 threads, that's fine. However, if you have 4 cores and set 20 threads, there will be a delay between creating each of the 16 threads beyond the first 4.

I want them all to start straight away.
Then BackgroundWorkers are not the right tool. And maybe even Windows is not the right OS.
BackgroundWorkers run on the ThreadPool, and the guideline is to use that only for short (< 500 ms) tasks. Once the pool has reached its minimum thread count, it creates new threads at roughly 2 per second.
The short solution is to raise the minimum with ThreadPool.SetMinThreads, but that's still a wacky solution.
You did not provide enough (real) information for better advice.
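A minimal sketch of the two options, written as a console program rather than the asker's form. ThreadPool.GetMinThreads and ThreadPool.SetMinThreads are real framework methods; the WorkerStartup class, its helper names and the worker count of 8 are assumptions made for illustration only.

```csharp
using System;
using System.Threading;

public static class WorkerStartup
{
    // Option 1: raise the pool minimum so the pool creates up to `workers`
    // threads on demand instead of injecting them slowly.
    public static bool RaisePoolMinimum(int workers)
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        return ThreadPool.SetMinThreads(Math.Max(minWorker, workers), minIo);
    }

    // Option 2: bypass the pool and give every long-running job its own thread.
    public static Thread[] StartDedicatedWorkers(int workers, Action<int> body)
    {
        var threads = new Thread[workers];
        for (int i = 0; i < workers; i++)
        {
            int id = i; // private copy so each lambda sees its own value
            threads[i] = new Thread(() => body(id)) { IsBackground = true };
            threads[i].Start();
        }
        return threads;
    }

    public static void Main()
    {
        Console.WriteLine("SetMinThreads succeeded: " + RaisePoolMinimum(8));

        using (var started = new CountdownEvent(8))
        {
            // Each worker only signals that it is running; real work would go here.
            Thread[] threads = StartDedicatedWorkers(8, id => started.Signal());
            started.Wait();
            foreach (Thread t in threads) t.Join();
        }
        Console.WriteLine("All 8 workers started.");
    }
}
```

Dedicated threads fit work that never returns, such as the while (true) loop in the question, because they do not occupy pool threads that other code may be waiting for.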

I suppose that the line while (true) o.count += o.id; consumes so much CPU that other operations may be blocked.
Try adding Thread.Sleep(100); before o.count += o.id; or reduce the number of threads (8 worker threads + the GUI thread = 9 threads just for your application).

Related

Trying to assign a large workload into a thread pool (Unity)

I have a very specific and demanding workload that I am trying to multithread. This is very new to me, so I am struggling to find an effective solution.
Program Description: UpdateEquations() cycles through a list of mathematical functions to update the coordinates of rendered lines. By default, func.Count = 3, so this will call CordCalc() 1500 times every frame. I am using NCalc to parse a function string and write the result to the Function list, which will be used later, before the end of the frame (irrelevant here).
Goal: I want to put each cycle of the for(int a) loop inside its own thread. Since for(int a) will only loop 3 times, I just need to start three threads. I cannot continue the for(int i) loop until for(int a) is fully calculated. I am calculating a very large number of small tasks, so it would be too expensive to assign each task to its own thread.
What I am currently trying to do: I am trying to use a ThreadPool queue, but I'm not sure how to wait for all the work items to finish before continuing on to the next for(int i) iteration. Furthermore, while the program compiles and executes, the performance is disastrous, probably 5% of my original performance. I am not sure whether creating a "new WaitCallback" is expensive or not. I was looking for a way to predefine the threads somehow so that I don't have to reinitialize them 1500 times a frame (which is what I suspect the issue is).
Other things I've tried: I tried using new Thread(() => CordCalc(a, i)); but this seemed to have much worse performance. I saw online somewhere that using a ThreadPool would be less expensive.
(This code is shortened for readability and relevance)
public List<Function> func;
private Expression[] exp;
private int lines_i;
private int lines_a;
public void Start()
{
func = new List<Function>();
exp = new Expression[func.Count];
for (int i = 0; i < func.Count; i++) exp[i] = new Expression(func[i].function);
}
//Calculate
public void CordCalc(object state)
{
for (int b = 0; b < func.Count; b++)
exp[lines_a].Parameters[func[b].name] = func[b].mainCords[lines_i - 1];
exp[lines_a].Parameters["t"] = t;
try
{
func[lines_a].mainCords[lines_i] = Convert.ToSingle(exp[lines_a].Evaluate());
}
catch
{
Debug.Log("input Error");
func[lines_a].mainCords[lines_i] = 0;
}
}
private void UpdateEquations()
{
//Initialize equations
for (int a = 0; a < func.Count; a++)
{
func[a].mainCords[0] = t;
}
lines_i = 1;
for (int i = 1; i < 500; i++)
{
lines_a = 0;
for (int a = 0; a < func.Count; a++)
{
//Calculate
ThreadPool.QueueUserWorkItem(new WaitCallback(CordCalc));
//This was something else that I tried, which gave worse results:
//threads[a] = new Thread(() => CordCalc(a, i));
//threads[a].Start();
//t.Join();
//This was my original method call without multithreading
//func[a].mainCords[i] = CordCalc(a, i);
lines_a++;
}
lines_i++;
}
}
private void FixedUpdate()
{
t += step * (2 + step) * 0.05f;
UpdateEquations();
}
//Function List
public class Function
{
public string name;
public string function;
public float[] mainCords;
//Constructor
public Function(string nameIn, string funcIn)
{
name = nameIn;
function = funcIn;
}
public void SetSize(int len)
{
mainCords = new float[len];
}
}
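One possible way to meet the stated goal (a few long-lived workers, with no step i+1 starting before step i is complete) is a System.Threading.Barrier. The sketch below is not the asker's code and does not use NCalc or Unity: SteppedUpdate, Run and the rules delegates are hypothetical stand-ins for the expression evaluation, and each worker receives its indices explicitly instead of reading shared fields such as lines_a and lines_i.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SteppedUpdate
{
    // rules[a] computes the next value of series a from the previous value of every series.
    public static float[][] Run(Func<float[], float>[] rules, float start, int steps)
    {
        int n = rules.Length;
        var cords = new float[n][];
        for (int a = 0; a < n; a++)
        {
            cords[a] = new float[steps];
            cords[a][0] = start;
        }

        using (var barrier = new Barrier(n))
        {
            var workers = new Task[n];
            for (int a = 0; a < n; a++)
            {
                int series = a; // private copy for the lambda
                workers[a] = Task.Factory.StartNew(() =>
                {
                    var previous = new float[n];
                    for (int i = 1; i < steps; i++)
                    {
                        for (int b = 0; b < n; b++) previous[b] = cords[b][i - 1];
                        try { cords[series][i] = rules[series](previous); }
                        catch (Exception) { cords[series][i] = 0f; } // same fallback as the original
                        barrier.SignalAndWait(); // wait until every series has finished step i
                    }
                }, TaskCreationOptions.LongRunning);
            }
            Task.WaitAll(workers);
        }
        return cords;
    }

    public static void Main()
    {
        var rules = new Func<float[], float>[]
        {
            p => p[0] + 1f,   // x' = x + 1
            p => p[0] + p[1], // y' = x + y
            p => 2f,          // z' = 2
        };
        float[][] cords = Run(rules, 1f, 4);
        Console.WriteLine(cords[1][3]); // prints 7
    }
}
```

This replaces 1500 queued work items per frame with one worker per function, but it still synchronizes 499 times per call, so whether it beats the single-threaded loop depends on how expensive each evaluation is; measuring both is the only way to know.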

Why Lock statement doesn't work as expected

static List<int> sharedCollection = new List<int>();
static readonly Object obj = new Object();
static void Main(string[] args)
{
var writeThread = new Thread(() =>
{
for (int i = 0; i < 10; i++)
{
lock (obj)
{
Write();
}
}
});
var readThread = new Thread(() =>
{
for (int i = 0; i < 10; i++)
{
lock (obj)
{
Read();
}
}
});
writeThread.Start();
readThread.Start();
Console.ReadLine();
}
static void Read()
{
Console.Write("Current collection state: ");
sharedCollection.ForEach((e) => Console.Write($"{e} "));
Console.WriteLine();
}
static void Write()
{
Random generator = new Random();
var addedValue = generator.Next(1, 20);
sharedCollection.Add(addedValue);
Console.WriteLine($"Added value is: {addedValue}");
}
I spent a lot of time trying to understand why I receive this:
[screenshot: console result]
Could someone explain to me what is wrong with this code?
Mutex works fine, but I need to illustrate the lock statement too...
I expect that after every add in the first thread I obtain the collection state from the second thread. Like this:
Added value: 1
Collection state: 1
Added value: 15
Collection state: 1 15
Added value: 4
Collection state: 1 15 4
I understand you expected those threads to run somewhat in parallel, but instead they executed sequentially. Your expectation is correct.
I do not think it has anything to do with lock, however. lock will only prevent a read and a write from happening at the same time; it will not produce this behavior. Try it without the lock to verify. (However, due to things like the JIT compiler, CPU cache invalidation and optimizations, results may still differ if there is a lock, even if it has no direct effect.)
My best bet is that the read thread is simply so slow that it does not finish even once before the write thread is through all its iterations. Writing to the UI is expensive, even on something as trivial as the console. Or even especially there. I do a lot of backups of user profiles using robocopy, and if it hits a lot of very small files, just writing to the console becomes the actual program bottleneck, even over disk access. And something out-bottlenecking disk access is not something that happens often.
If you write to the UI only once per user-triggered event, you will not notice the cost. But do it from any form of loop, especially one running in another thread, and you will start to notice it. I was specifically told that a foreach apparently iterates at about half the speed of a for loop.
I even made an example for this, albeit in a Windows Forms environment:
using System;
using System.Windows.Forms;
namespace UIWriteOverhead
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
int[] getNumbers(int upperLimit)
{
int[] ReturnValue = new int[upperLimit];
for (int i = 0; i < ReturnValue.Length; i++)
ReturnValue[i] = i;
return ReturnValue;
}
void printWithBuffer(int[] Values)
{
textBox1.Text = "";
string buffer = "";
foreach (int Number in Values)
buffer += Number.ToString() + Environment.NewLine;
textBox1.Text = buffer;
}
void printDirectly(int[] Values){
textBox1.Text = "";
foreach (int Number in Values)
textBox1.Text += Number.ToString() + Environment.NewLine;
}
private void btnPrintBuffer_Click(object sender, EventArgs e)
{
MessageBox.Show("Generating Numbers");
int[] temp = getNumbers(10000);
MessageBox.Show("Printing with buffer");
printWithBuffer(temp);
MessageBox.Show("Printing done");
}
private void btnPrintDirect_Click(object sender, EventArgs e)
{
MessageBox.Show("Generating Numbers");
int[] temp = getNumbers(1000);
MessageBox.Show("Printing directly");
printDirectly(temp);
MessageBox.Show("Printing done");
}
}
}
But even this overhead is pretty unlikely to produce such a consistent result. At some point the read thread should get the lock first, blocking the write. But still, there are too many variables to say for sure. You should probably try a simpler example, with more consistent (and a whole lot less) write work. What about writing "A" and "B" to the console, instead of complex stuff like this?
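The simpler experiment could look roughly like this. It is a sketch under one assumption that goes beyond the suggestion: the two threads append "A" and "B" to a shared StringBuilder under the lock and the result is printed once at the end, so console cost does not influence the interleaving. LockOrderDemo and Run are made-up names.

```csharp
using System;
using System.Text;
using System.Threading;

public static class LockOrderDemo
{
    // Two threads each append their tag `iterations` times under the same lock.
    public static string Run(int iterations)
    {
        var gate = new object();
        var log = new StringBuilder();

        Thread Make(char tag) => new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                lock (gate) { log.Append(tag); }
            }
        });

        Thread a = Make('A'), b = Make('B');
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return log.ToString();
    }

    public static void Main()
    {
        // The order differs from run to run; only the totals are fixed.
        Console.WriteLine(Run(10));
    }
}
```

Long runs of a single letter would be consistent with one thread finishing most of its loop before the other gets going. lock makes no promise about alternation or fairness, so a strict A, B, A, B order would need explicit signalling between the threads.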

Threading on Winforms

I am teaching myself how to operate with large numbers in complex loops.
The main program calls a method to perform some action.
In the example that I am working on, it just displays the time in seconds.
While it is working, the form goes to Not Responding and crashes.
The screen looks like this: [screenshot not included]
While the program is running, every second should be output to the screen, letting the user see that the program is still running and has not stopped responding.
The code is as follows:
private void BtnStart_Click(object sender, EventArgs e)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
label1.Text = "Start";
label2.Text = "Started";
dataGridView1.ColumnCount = 1;
dataGridView1.Columns[0].Name = "Number";
for (int index1 = 0; index1 < limit; index1++)
{
for (int index2 = 0; index2 < limit; index2++)
{
for (int index3 = 0; index3 < limit; index3++)
{
if ((stopwatch.ElapsedMilliseconds % 1000) == 0)
{
timeCount++;
AddRowToDG();
}
count++;
}
}
}
label1.Text = "The count is " + count.ToString();
// Stop.
stopwatch.Stop();
Double myTime = stopwatch.ElapsedMilliseconds;
label2.Text = (myTime / 1000).ToString();
}
private void AddRowToDG()
{
dataGridView1.Rows.Add(timeCount.ToString());
}
If I use a limit above 150, the program stops responding.
In the real program, the limit will be 10 to the power of 12.
From the research that I have done, there are tasks and threads that can be used.
Which approach should I use, and where would I find the best resources to help me make these choices in future?
C# has the System.Threading.Tasks.Task and System.Threading.Thread classes, but for Windows Forms you should use System.ComponentModel.BackgroundWorker. Its DoWork event runs your application logic on a different thread, and when the worker is created on the UI thread, its ProgressChanged and RunWorkerCompleted events are raised back on the UI thread, where it is safe to update controls.
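A rough sketch of that pattern, written as a console program so that it can run without a form. BackgroundWorker, DoWork, ReportProgress, ProgressChanged and RunWorkerCompleted are the real framework members; CountingWorker, Run and the onProgress callback are illustrative names, and the blocking wait at the end exists only for the console demo.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class CountingWorker
{
    public static long Run(int limit, Action<int> onProgress)
    {
        long result = 0;
        using (var done = new ManualResetEventSlim(false))
        using (var worker = new BackgroundWorker { WorkerReportsProgress = true })
        {
            worker.DoWork += (s, e) =>
            {
                // Runs on a pool thread, so the long loop does not freeze the UI.
                long count = 0;
                for (int i = 0; i < limit; i++)
                {
                    for (int j = 0; j < limit; j++) count++;
                    ((BackgroundWorker)s).ReportProgress((int)((i + 1L) * 100 / limit));
                }
                e.Result = count;
            };
            // In a form these two handlers would update label1 or dataGridView1.
            worker.ProgressChanged += (s, e) => onProgress(e.ProgressPercentage);
            worker.RunWorkerCompleted += (s, e) => { result = (long)e.Result; done.Set(); };

            worker.RunWorkerAsync();
            // Console demo only: never block the UI thread like this in a form,
            // because RunWorkerCompleted would then be unable to run.
            done.Wait();
        }
        return result;
    }

    public static void Main()
    {
        long count = Run(1000, percent => { /* update a progress display here */ });
        Console.WriteLine("The count is " + count);
    }
}
```

In a real form, the click handler would simply call RunWorkerAsync and return, leaving the handlers to report progress and the final count.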

Creating X amount of threads that execute a task at the same time

I am trying to dynamically create X threads (specified by the user) and then have all of them execute some code at the exact same time, at intervals of 1 second.
The issue I am having is that the task I am trying to complete relies on a loop to determine whether the current IP is equal to the last. (It scans hosts.) Since I have this loop inside, the first thread goes off and then the other threads are not created and do not execute the code. I would like them all to go off at the same time, then wait 1 second (using a timer or something else that doesn't lock the thread, since the code it is executing has a timeout it waits for). Can anyone help me out? Here is my current code:
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(() =>
{
for (int t = 0; t < Convert.ToInt32(total); t += i)
{
while (current <= last)
{
current = Convert.ToUInt32(current + t);
var ip = current.ToIPAddress();
doSomething(ip);
}
}
});
workerThreads.Add(thread);
thread.Start();
}
Don't use a lambda as the body of your thread, otherwise the i value isn't doing what you think it's doing. Instead pass the value into a method.
As for starting all of the threads at the same time do something like the following:
private readonly object syncObj = new object();
private bool go; // guarded by syncObj; set once all threads have been started
void ThreadBody(object boxed)
{
    Params p = (Params)boxed; // "params" is a C# keyword, so the variable needs another name
    lock (syncObj)
    {
        // Loop on a flag so that a thread arriving after PulseAll does not wait forever
        while (!go) Monitor.Wait(syncObj);
    }
    // do work here
}
struct Params
{
    // passed values here
}
void InitializeThreads()
{
    int threads = Convert.ToInt32(txtThreads.Text);
    List<Thread> workerThreads = new List<Thread>();
    string from = txtStart.Text, to = txtEnd.Text;
    uint current = from.ToUInt(), last = to.ToUInt();
    ulong total = last - current;
    for (int i = 0; i < threads; i++)
    {
        Thread thread = new Thread(new ParameterizedThreadStart(this.ThreadBody));
        workerThreads.Add(thread);
        // The argument is handed over by Start, not by the delegate constructor
        thread.Start(new Params { /* initialize values here */ });
    }
    lock (syncObj)
    {
        go = true;
        Monitor.PulseAll(syncObj);
    }
}
You're running into closure problems. There's another question that somewhat addresses this, here.
Basically you need to capture the value of i as you create each task. What's happening is by the time the task gets around to actually running, the value of i across all your tasks is the same -- the value at the end of the loop.
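A small sketch of the capture fix, assuming nothing beyond the standard library: copying i into a variable declared inside the loop body gives every lambda its own value. ClosureCapture and StartWithCopies are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ClosureCapture
{
    // Each thread records index + 1 in its own slot, so an untouched slot stays 0.
    public static int[] StartWithCopies(int threads)
    {
        var seen = new int[threads];
        var workers = new List<Thread>();
        for (int i = 0; i < threads; i++)
        {
            int index = i; // fresh variable per iteration; capturing i itself would be shared
            var thread = new Thread(() => seen[index] = index + 1);
            workers.Add(thread);
            thread.Start();
        }
        foreach (Thread t in workers) t.Join();
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", StartWithCopies(4))); // prints 1 2 3 4
    }
}
```

If the lambda captured i directly, several threads could observe the same value, or even the final value equal to threads, and index past the end of the array.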

Dual-core performance worse than single core?

The following NUnit test compares performance between running a single thread versus running 2 threads on a dual core machine. Specifically, this is a VMware dual core virtual Windows 7 machine running on a quad core Linux SLED host, which is a Dell Inspiron 503.
Each thread simply loops and increments 2 counters, addCounter and readCounter. This test was originally testing a Queue implementation which was discovered to perform worse on a multi-core machine. So in narrowing the problem down to a small piece of reproducible code, you have here no queue, only incrementing variables, and to my shock and dismay, it's far slower with 2 threads than with one.
When running the first test, the Task Manager shows 1 of the cores 100% busy with the other core almost idle. Here's the test output for the single thread test:
readCounter 360687000
readCounter2 0
total readCounter 360687000
addCounter 360687000
addCounter2 0
You see over 360 Million increments!
Next, the dual thread test shows 100% busy on both cores for the whole 5 second duration of the test. However, its output shows only:
readCounter 88687000
readCounter2 134606500
totoal readCounter 223293500
addCounter 88687000
addCounter2 67303250
addFailure0
That's only 223 Million read increments. What in God's creation are those 2 CPUs doing for those 5 seconds to get less work done?
Any possible clue? And can you run the tests on your machine to see if you get different results? One idea is that perhaps the VMWare dual core performance isn't what you would hope.
using System;
using System.Threading;
using NUnit.Framework;
namespace TickZoom.Utilities.TickZoom.Utilities
{
[TestFixture]
public class ActiveMultiQueueTest
{
private volatile bool stopThread = false;
private Exception threadException;
private long addCounter;
private long readCounter;
private long addCounter2;
private long readCounter2;
private long addFailureCounter;
[SetUp]
public void Setup()
{
stopThread = false;
addCounter = 0;
readCounter = 0;
addCounter2 = 0;
readCounter2 = 0;
}
[Test]
public void TestSingleCoreSpeed()
{
var speedThread = new Thread(SpeedTestLoop);
speedThread.Name = "1st Core Speed Test";
speedThread.Start();
Thread.Sleep(5000);
stopThread = true;
speedThread.Join();
if (threadException != null)
{
throw new Exception("Thread failed: ", threadException);
}
Console.Out.WriteLine("readCounter " + readCounter);
Console.Out.WriteLine("readCounter2 " + readCounter2);
Console.Out.WriteLine("total readCounter " + (readCounter + readCounter2));
Console.Out.WriteLine("addCounter " + addCounter);
Console.Out.WriteLine("addCounter2 " + addCounter2);
}
[Test]
public void TestDualCoreSpeed()
{
var speedThread1 = new Thread(SpeedTestLoop);
speedThread1.Name = "Speed Test 1";
var speedThread2 = new Thread(SpeedTestLoop2);
speedThread2.Name = "Speed Test 2";
speedThread1.Start();
speedThread2.Start();
Thread.Sleep(5000);
stopThread = true;
speedThread1.Join();
speedThread2.Join();
if (threadException != null)
{
throw new Exception("Thread failed: ", threadException);
}
Console.Out.WriteLine("readCounter " + readCounter);
Console.Out.WriteLine("readCounter2 " + readCounter2);
Console.Out.WriteLine("totoal readCounter " + (readCounter + readCounter2));
Console.Out.WriteLine("addCounter " + addCounter);
Console.Out.WriteLine("addCounter2 " + addCounter2);
Console.Out.WriteLine("addFailure" + addFailureCounter);
}
private void SpeedTestLoop()
{
try
{
while (!stopThread)
{
for (var i = 0; i < 500; i++)
{
++addCounter;
}
for (var i = 0; i < 500; i++)
{
readCounter++;
}
}
}
catch (Exception ex)
{
threadException = ex;
}
}
private void SpeedTestLoop2()
{
try
{
while (!stopThread)
{
for (var i = 0; i < 500; i++)
{
++addCounter2;
i++;
}
for (var i = 0; i < 500; i++)
{
readCounter2++;
}
}
}
catch (Exception ex)
{
threadException = ex;
}
}
}
}
Edit: I tested the above on a quad core laptop without VMware and got similarly degraded performance. So I wrote another test similar to the above, but with each thread method in a separate class. My purpose in doing that was to test 4 cores.
Well, that test showed excellent results, which improved almost linearly with 1, 2, 3, or 4 cores.
With some experimentation now on both machines, it appears that the proper performance only happens if the main thread methods are on different instances instead of the same instance.
In other words, if the main entry method of multiple threads is on the same instance of a particular class, then the performance on a multi-core machine will be worse for each thread you add, instead of better as you might assume.
It almost appears that the CLR is "synchronizing" so that only one thread at a time can run that method. However, my testing says that isn't the case. So it's still unclear what's happening.
But my own problem seems to be solved simply by using separate instances for the methods that the threads run as their starting point.
Sincerely,
Wayne
EDIT:
Here's an updated unit test that tests 1, 2, 3, and 4 threads, all on the same instance of a class. It uses arrays, with the variables used in each thread loop at least 10 elements apart. Performance still degrades significantly for each thread added.
using System;
using System.Threading;
using NUnit.Framework;
namespace TickZoom.Utilities.TickZoom.Utilities
{
[TestFixture]
public class MultiCoreSameClassTest
{
private ThreadTester threadTester;
public class ThreadTester
{
private Thread[] speedThread = new Thread[400];
private long[] addCounter = new long[400];
private long[] readCounter = new long[400];
private bool[] stopThread = new bool[400];
internal Exception threadException;
private int count;
public ThreadTester(int count)
{
for( var i=0; i<speedThread.Length; i+=10)
{
speedThread[i] = new Thread(SpeedTestLoop);
}
this.count = count;
}
public void Run()
{
for (var i = 0; i < count*10; i+=10)
{
speedThread[i].Start(i);
}
}
public void Stop()
{
for (var i = 0; i < stopThread.Length; i+=10 )
{
stopThread[i] = true;
}
for (var i = 0; i < count * 10; i += 10)
{
speedThread[i].Join();
}
if (threadException != null)
{
throw new Exception("Thread failed: ", threadException);
}
}
public void Output()
{
var readSum = 0L;
var addSum = 0L;
for (var i = 0; i < count; i++)
{
readSum += readCounter[i];
addSum += addCounter[i];
}
Console.Out.WriteLine("Thread readCounter " + readSum + ", addCounter " + addSum);
}
private void SpeedTestLoop(object indexarg)
{
var index = (int) indexarg;
try
{
while (!stopThread[index*10])
{
for (var i = 0; i < 500; i++)
{
++addCounter[index*10];
}
for (var i = 0; i < 500; i++)
{
++readCounter[index*10];
}
}
}
catch (Exception ex)
{
threadException = ex;
}
}
}
[SetUp]
public void Setup()
{
}
[Test]
public void SingleCoreTest()
{
TestCores(1);
}
[Test]
public void DualCoreTest()
{
TestCores(2);
}
[Test]
public void TriCoreTest()
{
TestCores(3);
}
[Test]
public void QuadCoreTest()
{
TestCores(4);
}
public void TestCores(int numCores)
{
threadTester = new ThreadTester(numCores);
threadTester.Run();
Thread.Sleep(5000);
threadTester.Stop();
threadTester.Output();
}
}
}
That's only 223 Million read increments. What in God's creation are those 2 CPUs doing for those 5 seconds to get less work done?
You're probably running into cache contention. When a single CPU is incrementing your integer, it can do so in its own L1 cache, but as soon as two CPUs start "fighting" over the same value (or over neighbouring values that share a cache line), the cache line it's on has to be copied back and forth between their caches each time each one accesses it. The extra time spent copying data between caches adds up fast, especially when the operation you're doing (incrementing an integer) is so trivial.
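The effect can be probed with a sketch like the one below, which times two threads incrementing array slots that are either adjacent or 16 elements (128 bytes) apart. The names CacheLineDemo and Run, the spacing of 16 and the iteration count are all assumptions chosen for illustration, and the timings depend on the hardware and on what the JIT does with the loop, so no particular result is claimed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class CacheLineDemo
{
    // Two threads increment counters[slotA] and counters[slotB]; returns the wall time.
    public static TimeSpan Run(long[] counters, int slotA, int slotB, long iterations)
    {
        Thread Make(int slot) => new Thread(() =>
        {
            for (long i = 0; i < iterations; i++) counters[slot]++;
        });

        Thread a = Make(slotA), b = Make(slotB);
        var watch = Stopwatch.StartNew();
        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return watch.Elapsed;
    }

    public static void Main()
    {
        const long iterations = 50_000_000;

        // Slots 0 and 1 are neighbours and very likely share a cache line.
        TimeSpan adjacent = Run(new long[32], 0, 1, iterations);
        // Slots 0 and 16 are 128 bytes apart, beyond a typical 64-byte line.
        TimeSpan spaced = Run(new long[32], 0, 16, iterations);

        Console.WriteLine("adjacent slots: " + adjacent.TotalMilliseconds + " ms");
        Console.WriteLine("spaced slots:   " + spaced.TotalMilliseconds + " ms");
    }
}
```

Each thread writes only its own slot, so the final counts are correct in both runs; only the time taken is expected to differ.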
A few things:
You should probably test each setup at least 10 times and take the average
As far as I know, Thread.Sleep is not exact - it depends on how the OS switches your threads
Thread.Join is not immediate. Again, it depends on how the OS switches your threads
A better way to test would be to run a computationally intensive operation (say, sum from one to a million) on two configurations and time them. For example:
Time how long it takes to sum from one to a million
Time how long it takes to sum one to 500000 on one thread and one 500001 to 1000000 on another
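The two measurements could be sketched as follows, repeated 10 times and averaged as suggested earlier in the list. SumTiming and its methods are hypothetical names, and the timings are machine dependent: for a range as small as one million, the cost of creating the threads may well outweigh the work itself.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class SumTiming
{
    // Sums the integers from `from` to `to`, inclusive.
    public static long SumRange(long from, long to)
    {
        long sum = 0;
        for (long n = from; n <= to; n++) sum += n;
        return sum;
    }

    // Splits 1..to into two halves and sums each half on its own thread.
    public static long SumOnTwoThreads(long to)
    {
        long low = 0, high = 0;
        long mid = to / 2;
        var first = new Thread(() => low = SumRange(1, mid));
        var second = new Thread(() => high = SumRange(mid + 1, to));
        first.Start();
        second.Start();
        first.Join();
        second.Join();
        return low + high;
    }

    // Runs `work` several times and returns the average duration in milliseconds.
    public static double AverageMilliseconds(Func<long> work, int runs)
    {
        var watch = new Stopwatch();
        for (int r = 0; r < runs; r++)
        {
            watch.Start();
            work();
            watch.Stop();
        }
        return watch.Elapsed.TotalMilliseconds / runs;
    }

    public static void Main()
    {
        const long limit = 1_000_000;
        Console.WriteLine("one thread:  " + AverageMilliseconds(() => SumRange(1, limit), 10) + " ms");
        Console.WriteLine("two threads: " + AverageMilliseconds(() => SumOnTwoThreads(limit), 10) + " ms");
    }
}
```

Because each thread accumulates into a local variable and writes its result only once, the two threads do not contend for the same memory while they run.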
You were right when you thought that two threads would work faster than one thread. But yours are not the only threads running - the OS has threads, your browser has threads, and so on. Keep in mind that your timings will not be exact and may even fluctuate.
Lastly, there are other reasons (see slide 24) why threads work slower.
