So I'm not the most experienced with the C# programming language, however I've been making a few test applications here and there.
I've noticed that the more threads I create for an application I'm working on, the more my GUI starts to freeze. I'm not sure why this occurs; I previously thought that part of the point of multi-threading an application was to keep the GUI from freezing up.
An explanation would be appreciated.
Also, here's the code I use to create the threads:
private void runThreads(int amount, ThreadStart address)
{
for (int i = 0; i < amount; i++)
{
threadAmount += 1;
Thread currentThread = new Thread(address);
currentThread.Start();
}
}
and here's what the threads run:
private void checkProxies()
{
while (started)
{
try
{
WebRequest request = WebRequest.Create("http://google.co.nz/");
request.Timeout = (int)timeoutCounter.Value * 1000;
request.Proxy = new WebProxy(proxies[proxyIndex]);
Thread.SetData(Thread.GetNamedDataSlot("currentProxy"), proxies[proxyIndex]);
if (proxyIndex != proxies.Length)
{
proxyIndex += 1;
}
else
{
started = false;
}
request.GetResponse();
workingProxies += 1;
}
catch (WebException)
{
deadProxies += 1;
}
lock ("threadAmount")
{
if (threadAmount > proxies.Length - proxyIndex)
{
threadAmount -= 1;
break;
}
}
}
}
While I cannot tell you exactly why your code is slowing the GUI down, there are a few things you should change to make it better all round. If the problem persists after that, it should be a lot easier to pinpoint the issue.
Creating Thread objects is expensive. This is why newer classes were added to .NET to better handle multi-threading. Now you have access to the Task class or the Parallel class (described below).
Judging from the comments, you're running a LOT of threads at the same time. While it shouldn't be an issue to just run them, you're not really getting much use out of them if what you're doing is firing WebRequests (unless you have an awesome network). Use multiple threads, sure, but limit their number.
A Task is great when you want to do a specific operation in the background. But when you want to repeat a single operation for a specific set of data in the background... why not use the System.Threading.Tasks.Parallel class? Specifically, Parallel.ForEach (where you can specify your list of proxies as the parameter). This method also lets you set up how many threads are supposed to run concurrently at any given moment by using ParallelOptions.
Another way to write this would be to use the async and await keywords, available since C# 5 with .NET 4.5. In this case your GUI event handler (a button press?) should invoke an async method.
Use thread-safe methods like Interlocked.Increment or Interlocked.Add to increase / decrease counters that are accessed from multiple threads. Also, consider that you could change your list of proxies to a ConcurrentDictionary<string, bool> (where bool indicates if the proxy works or not) and set the values without any worry, as each thread will only access its own entry in the dictionary. You can easily query the totals at the end using LINQ: dictionary.Count(q => q.Value) gives the number of working proxies, for example. Of course other classes are also available, depending on how you want to tackle the issue - perhaps a Queue (or ConcurrentQueue)?
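As a rough sketch of the counter advice above: the class below keeps shared counters with Interlocked.Increment and per-proxy results in a ConcurrentDictionary. The ProxyStats class and its member names are made up for illustration and are not part of the question's code; it also assumes each proxy is recorded only once, otherwise the counters and the dictionary would disagree.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical helper; the names are illustrative only.
public class ProxyStats
{
    private int _working;
    private int _dead;
    private readonly ConcurrentDictionary<string, bool> _results =
        new ConcurrentDictionary<string, bool>();

    // Safe to call from any number of threads at once.
    public void Record(string proxy, bool works)
    {
        _results[proxy] = works;
        if (works) Interlocked.Increment(ref _working);
        else Interlocked.Increment(ref _dead);
    }

    public int Working => Volatile.Read(ref _working);
    public int Dead => Volatile.Read(ref _dead);

    // The same total, derived from the dictionary with LINQ.
    public int WorkingFromDictionary => _results.Count(kv => kv.Value);
}
```

Each worker thread would call Record once per proxy it checks, and the GUI can read Working and Dead whenever it wants to refresh its labels.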
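Your lock shouldn't really work... as in, it seems to be working more by accident than by design (thanks Luaan for the comment): locking on the string literal "threadAmount" only works because identical string literals are interned, so every lock on that literal anywhere in the process shares one object. You really shouldn't do that. Consult the MSDN documentation on lock to get a better understanding of how it works. The Object created in the MSDN example isn't just for show.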
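For comparison, here is a minimal sketch of the usual pattern, with a private object that exists only to be locked on. ThreadCounter and its members are hypothetical names chosen for this example.

```csharp
public class ThreadCounter
{
    // A private object that no other code can lock on.
    private readonly object _sync = new object();
    private int _threadAmount;

    public void Add(int delta)
    {
        lock (_sync) { _threadAmount += delta; }
    }

    public int Value
    {
        get { lock (_sync) { return _threadAmount; } }
    }
}
```

Because _sync is private, no unrelated code can accidentally take the same lock, which is exactly what a shared string literal cannot guarantee.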
You can also make the requests themselves multi-threaded by using BeginGetResponse and EndGetResponse methods. You can, in fact, combine this with the Task class for a much cleaner code (the Task class can convert Begin/End method pairs into a single Task object).
So, a quick recap - use the Parallel class for multi-threading, and use concurrency classes for keeping things in place.
Here's a quick example I wrote:
private ConcurrentDictionary<string, bool?> values = new ConcurrentDictionary<string, bool?>();
private async void Button_Click(object sender, RoutedEventArgs e)
{
var result = await CheckProxies();
label.Content = result.ToString();
}
async Task<int> CheckProxies()
{
//I don't actually HAVE a list of proxies, so I make up some data
for (int i = 0; i < 1000; i++)
values[Guid.NewGuid().ToString()] = null;
await Task.Run(() => Parallel.ForEach(values, new ParallelOptions() { MaxDegreeOfParallelism = 10 }, this.PerformOperation));
//note that with MaxDegreeOfParallelism set to a high value (like 1000)
//I'll get a TON of failed requests simply because I'm overloading the network
//either that or google thinks I'm DDOSing them... >_<
return values.Count(v => v.Value == true);
}
void PerformOperation(KeyValuePair<string, bool?> kvp)
{
try
{
WebRequest request = WebRequest.Create("http://google.co.nz/");
request.Timeout = 100;
//I'm not actually setting up the proxy from kvp,
//because it's populated with bogus data
//dispose the response so the underlying connection is released
using (request.GetResponse()) { }
values[kvp.Key] = true;
}
catch (WebException)
{
values[kvp.Key] = false;
}
}
Despite the other comments being correct in that you should use either the Task class, or better yet, the async API, that is not the reason you're locking up.
The line of code that is causing your threads to lock up is this:
request.Timeout = (int)timeoutCounter.Value * 1000;
I am assuming that timeoutCounter is a control on the WinForm - which is running on the main GUI thread.
In other words, your threads' code is trying to access a control that does not belong to its own thread, which is not really "allowed", at least not so simply.
For example, this question shows how to do this, albeit most of the answers there are a bit dated.
From a quick google (okay who am I kidding, I binged it) I found this article that explains the problem rather well.
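One simple way around the cross-thread access, sketched under the assumption that the timeout does not need to change while the workers run, is to read the control once on the GUI thread and hand the plain value to the threads. WorkerLauncher and its parameters are hypothetical names, not part of the question's code.

```csharp
using System;
using System.Threading;

public static class WorkerLauncher
{
    // timeoutMs is read ONCE by the caller on the GUI thread, for example
    //   int timeoutMs = (int)timeoutCounter.Value * 1000;
    // so the worker threads never touch the control themselves.
    public static Thread[] Start(int amount, int timeoutMs, Action<int> work)
    {
        var threads = new Thread[amount];
        for (int i = 0; i < amount; i++)
        {
            // Background threads do not keep the process alive on exit.
            threads[i] = new Thread(() => work(timeoutMs)) { IsBackground = true };
            threads[i].Start();
        }
        return threads;
    }
}
```

If the value does have to be read while the workers are running, the read would instead need to be marshalled to the GUI thread (for example with Control.Invoke).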
I have the following use case. Multiple threads are creating data points which are collected in a ConcurrentBag. Every x ms a single consumer thread looks at the data points that came in since the last time and processes them (e.g. count them + calculate average).
The following code more or less represents the solution that I came up with:
private static ConcurrentBag<long> _bag = new ConcurrentBag<long>();
static void Main()
{
Task.Run(() => Consume());
var producerTasks = Enumerable.Range(0, 8).Select(i => Task.Run(() => Produce()));
Task.WaitAll(producerTasks.ToArray());
}
private static void Produce()
{
for (int i = 0; i < 100000000; i++)
{
_bag.Add(i);
}
}
private static void Consume()
{
while (true)
{
var oldBag = _bag;
_bag = new ConcurrentBag<long>();
var average = oldBag.DefaultIfEmpty().Average();
var count = oldBag.Count;
Console.WriteLine($"Avg = {average}, Count = {count}");
// Wait x ms
}
}
Is a ConcurrentBag the right tool for the job here?
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
Should I use Interlocked.Exchange() for switching the variables?
EDIT
I guess the above code was not really a good representation of what I'm trying to achieve. So here is some more code to show the problem:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
private readonly List<string> _logMessageBuffer;
public LogCollectorTarget()
{
_logMessageBuffer = new List<string>();
}
protected override void Write(LogEventInfo logEvent)
{
var logMessage = Layout.Render(logEvent);
lock (_logMessageBuffer)
{
_logMessageBuffer.Add(logMessage);
}
}
public string GetBuffer()
{
lock (_logMessageBuffer)
{
var messages = string.Join(Environment.NewLine, _logMessageBuffer);
_logMessageBuffer.Clear();
return messages;
}
}
}
The class's purpose is to collect logs so they can be sent to a server in batches. Every x seconds GetBuffer is called. This should get the current log messages and clear the buffer for new messages. It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program. That's why I wanted to use a ConcurrentBag as a buffer. But then I still need to switch or clear it when I call GetBuffer without losing any log messages that arrive during the switch.
Since you have a single consumer, you can work your way with a simple ConcurrentQueue, without swapping collections:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
private readonly ConcurrentQueue<string> _logMessageBuffer;
public LogCollectorTarget()
{
_logMessageBuffer = new ConcurrentQueue<string>();
}
protected override void Write(LogEventInfo logEvent)
{
var logMessage = Layout.Render(logEvent);
_logMessageBuffer.Enqueue(logMessage);
}
public string GetBuffer()
{
// How many messages should we dequeue?
var count = _logMessageBuffer.Count;
var messages = new StringBuilder();
while (count > 0 && _logMessageBuffer.TryDequeue(out var message))
{
messages.AppendLine(message);
count--;
}
return messages.ToString();
}
}
If memory allocations become an issue, you can instead dequeue them to a fixed-size array and call string.Join on it. This way, you're guaranteed to do only two allocations (whereas the StringBuilder could do many more if the initial buffer isn't properly sized):
public string GetBuffer()
{
// How many messages should we dequeue?
var count = _logMessageBuffer.Count;
var buffer = new string[count];
for (int i = 0; i < count; i++)
{
_logMessageBuffer.TryDequeue(out var message);
buffer[i] = message;
}
return string.Join(Environment.NewLine, buffer);
}
Is a ConcurrentBag the right tool for the job here?
It's the right tool for a job; whether it's the right one for yours really depends on what you are trying to do, and why. The example you have given is very simplistic and has no context, so it's hard to tell.
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
The answer is no, for probably many reasons. What happens if a thread writes to it, while you are switching it?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
No, you have just copied the reference, this will achieve nothing.
Should I use Interlocked.Exchange() for switching the variables?
Interlocked methods are great things, however they will not help you with your current problem. Interlocked.Exchange would swap the reference atomically, but a producer thread that has already read the old reference can still add to the old bag after the swap. You are really confused and you need to look up more thread-safety examples.
However, let's point you in the right direction. Forget about ConcurrentBag and those fancy classes. My advice is to start simple and use locking so you understand the nature of the problem.
If you want multiple tasks/threads to access a list, you can easily use the lock statement and guard access to the list/array so other nasty threads aren't modifying it.
Obviously the code you have written is a nonsensical example; I mean, you are just adding consecutive numbers to a list and getting another thread to average them. This hardly needs to be producer-consumer at all, and would make more sense as plain synchronous code.
At this point I would point you to better architectures that would allow you to implement this pattern, e.g. TPL Dataflow, but I fear this is just a learning exercise, and unfortunately you really need to do more reading on multithreading and try more examples before we can truly help you with a problem.
It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program.
Acquiring an uncontended lock is actually quite cheap. Quoting from Joseph Albahari's book:
You can expect to acquire and release a lock in as little as 20 nanoseconds on a 2010-era computer if the lock is uncontended.
Locking becomes expensive when it is contended. You can minimize the contention by reducing the work inside the critical region to the absolute minimum. In other words don't do anything inside the lock that can be done outside the lock. In your second example the method GetBuffer does a String.Join inside the lock, delaying the release of the lock and increasing the chances of blocking other threads. You can improve it like this:
public string GetBuffer()
{
string[] messages;
lock (_logMessageBuffer)
{
messages = _logMessageBuffer.ToArray();
_logMessageBuffer.Clear();
}
return String.Join(Environment.NewLine, messages);
}
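But it can be optimized even further. You could use the technique of your first example and, instead of clearing the existing List<string>, just swap it with a new list. For this the _logMessageBuffer field can no longer be readonly, and the lock has to be taken on a dedicated object (here a _locker field, which the Write method must lock on as well), because a lock taken on a field that gets reassigned no longer protects anything:
private readonly object _locker = new();
public string GetBuffer()
{
List<string> oldList;
lock (_locker)
{
oldList = _logMessageBuffer;
_logMessageBuffer = new();
}
return String.Join(Environment.NewLine, oldList);
}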
Starting from .NET Core 3.0, the Monitor class has the property Monitor.LockContentionCount, that returns the number of times there was contention at the entry point of a lock. You could watch the delta of this property every second, and see if the number is concerning. If you get single-digit numbers, there is nothing to worry about.
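Watching that delta could look roughly like the sketch below. ContentionProbe is a made-up helper name; the only framework member it relies on is Monitor.LockContentionCount, so it needs .NET Core 3.0 or later. Note that the counter is process-wide, not specific to one lock.

```csharp
using System;
using System.Threading;

public static class ContentionProbe
{
    // Returns how many contended lock acquisitions happened in the whole
    // process while 'interval' elapsed.
    public static long Sample(TimeSpan interval)
    {
        long before = Monitor.LockContentionCount;
        Thread.Sleep(interval);
        return Monitor.LockContentionCount - before;
    }
}
```

Calling ContentionProbe.Sample(TimeSpan.FromSeconds(1)) from a diagnostics thread while the application is under load gives the per-second figure described above.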
Touching some of your questions:
Is a ConcurrentBag the right tool for the job here?
No. The ConcurrentBag<T> is a very specialized collection intended for mixed producer-consumer scenarios, mainly object pools. You don't have such a scenario here. A ConcurrentQueue<T> is preferable to a ConcurrentBag<T> in almost all scenarios.
Should I use Interlocked.Exchange() for switching the variables?
Only if the collection was immutable. If the _logMessageBuffer was an ImmutableQueue<T>, then it would be excellent to swap it with Interlocked.Exchange. With mutable types you have no idea if the old collection is still in use by another thread, and for how long. The operating system can suspend any thread at any time for a duration of 10-30 milliseconds or even more (demo). So it's not safe to use lock-free techniques. You have to lock.
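To illustrate the immutable case, here is a minimal sketch of a lock-free buffer built on ImmutableQueue<string>. The class name ImmutableLogBuffer is hypothetical, and the sketch needs the System.Collections.Immutable package on older frameworks. It is meant to show why the swap is safe, not to claim it is faster than a lock; under heavy contention the retry loop inside Enqueue may well be slower.

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

public class ImmutableLogBuffer
{
    private ImmutableQueue<string> _messages = ImmutableQueue<string>.Empty;

    public void Write(string message)
    {
        // Retries a compare-and-swap internally until the enqueue succeeds,
        // so a message is never added to a queue that was already swapped out.
        ImmutableInterlocked.Enqueue(ref _messages, message);
    }

    public string GetBuffer()
    {
        // Atomically take the current snapshot and leave an empty queue behind.
        ImmutableQueue<string> snapshot =
            Interlocked.Exchange(ref _messages, ImmutableQueue<string>.Empty);
        return string.Join(Environment.NewLine, snapshot);
    }
}
```

The snapshot returned by Interlocked.Exchange can never change afterwards, which is the property a mutable List<string> or ConcurrentBag<T> lacks.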
Is there a way in C# to call a method so that, if the method takes too long to complete, it is canceled and control returns to the calling method? I think I can do this with threading, but what if threading is not needed?
For reference, the method I may need to kill/stop/abort is calling the CorelDraw 15 API. This opens an instance of CorelDraw and I have received non-repeatable errors in this method. Meaning, I can process the same image twice and one time it will freeze or error and the other it will not.
The current solution to the issue I am using is to have a second application that does Process.Start(firstAppExecutablePath) and then checks a variable in a text file and if the variable doesn't change after 10 minutes, .Kill(); is called on the instance of the process. I would prefer to avoid this solution if possible as it seems clunky and prone to issues. Since it runs .Kill(); it is being very messy in how things close but generally does not cause an issue.
Not built-in, no, since interrupting arbitrary code cannot be done safely (what if it's in the middle of calling a C library function (that doesn't support exceptions) which has just taken a global lock and needs to release it?).
But you can write such support yourself. I wouldn't add threads to the mix unless absolutely necessary, since they come with an entire new dimension of potential problems.
Example:
void Caller()
{
int result;
if (TryDoSomething(out result, 100)) {
System.Console.WriteLine("Result: {0}", result);
}
}
bool TryDoSomething(out int result, int timeoutMillis)
{
var sw = Stopwatch.StartNew();
result = 0x12345678;
for (int i = 0; i != 100000000; ++i) {
if (sw.ElapsedMilliseconds > timeoutMillis)
return false;
result += i / (System.Math.Abs(result % 43) + 1) + (i % 19); // +1 keeps the divisor from ever being zero
}
return true;
}
Threading is absolutely needed unless you are OK with checking the timeout from within the function, which you probably aren't. So here is a minimalistic approach with threads:
private static bool ExecuteWithTimeout(TimeSpan timeout, Action action)
{
Thread x = new Thread(() => { action(); });
x.Start();
if (!x.Join(timeout))
{
x.Abort(); //Or Interrupt instead, if you use e.g. Thread.Sleep in your method. Note: Abort throws PlatformNotSupportedException on .NET Core and .NET 5+
return false;
}
return true;
}
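On runtimes where Thread.Abort is unavailable, a cooperative variant is possible, sketched here with a CancellationToken. The Timeouts class is a hypothetical name. The important assumption is that the action checks the token itself; code that ignores it (such as a call stuck inside a third-party API) keeps running in the background, in which case a separate process remains the only way to stop it forcibly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Timeouts
{
    // Runs 'action' on a pool thread and signals cancellation after 'timeout'.
    // Returns true if the action finished in time. Nothing here can forcibly
    // stop an action that never looks at the token.
    public static bool ExecuteWithTimeout(TimeSpan timeout, Action<CancellationToken> action)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task task = Task.Run(() => action(cts.Token));
            if (task.Wait(timeout))
                return true;
            cts.Cancel();
            return false;
        }
    }
}
```

A caller would pass something like token => { while (!token.IsCancellationRequested) { /* one unit of work */ } }. If the action throws, task.Wait rethrows the failure wrapped in an AggregateException.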
I have an integration service which runs a calculation-heavy, data-bound process. I want to make sure that there are never more than, say, n = 5 (but n will be configurable and changeable at runtime) of these processes running at the same time. The idea is to throttle the load on the server to a safe level. The amount of data processed by the method is limited by batching, so I don't need to worry about one process representing a much bigger load than another.
The processing method is called by another process, where requests to run payroll are held on a queue, and I can insert some logic at that point to determine whether to process this request now, or leave it on the queue.
So I want a separate method on the same service as the processing method, which can tell me if the server can accept another call to the processing method. It's going to ask, "How many payroll runs are going on? Is that less than n?" What's a neat way of achieving this?
-----------edit------------
I think I need to make it clear: the process that decides whether to take the request off the queue is separated from the service that processes the payroll data by a WCF boundary. Stopping a thread in the payroll processing service isn't going to prevent more requests coming in.
You can use a Semaphore to do this.
public class Foo
{
private Semaphore semaphore;
public Foo(int numConcurrentCalls)
{
semaphore = new Semaphore(numConcurrentCalls, numConcurrentCalls);
}
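public bool isReady()
{
//WaitOne(0) takes a slot when one is free, so give it straight back;
//the answer is only a snapshot and may be stale by the time Bar is called
if (!semaphore.WaitOne(0))
return false;
semaphore.Release();
return true;
}
public void Bar()
{
semaphore.WaitOne();//it will only get past this line if there are fewer than
//"numConcurrentCalls" threads in this method currently.
try
{
//do stuff
}
finally
{
semaphore.Release();
}
}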
}
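Because a separate "is it ready?" check can be out of date by the time the work is started, one alternative is to combine the check and the claim in a single call. The sketch below does that with SemaphoreSlim; Throttle, TryRun and FreeSlots are hypothetical names, and it does not cover changing the limit at runtime.

```csharp
using System;
using System.Threading;

public class Throttle
{
    private readonly SemaphoreSlim _slots;

    public Throttle(int maxConcurrentCalls)
    {
        _slots = new SemaphoreSlim(maxConcurrentCalls, maxConcurrentCalls);
    }

    // Snapshot only: the value can change immediately after it is read.
    public int FreeSlots => _slots.CurrentCount;

    // Claims a slot and runs 'work' if one is free; returns false otherwise.
    public bool TryRun(Action work)
    {
        if (!_slots.Wait(0))
            return false;
        try { work(); }
        finally { _slots.Release(); }
        return true;
    }
}
```

The caller that owns the queue can then leave a request queued whenever TryRun returns false, instead of asking first and hoping the answer still holds.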
Review the Object Pool pattern. This is what you're describing. While not strictly required by the pattern, you can expose the number of objects currently in the pool, the maximum (configured) number, the high-watermark, etc.
I think that you might want a BlockingCollection, where each item in the collection represents one of the concurrent calls.
Also see IProducerConsumerCollection.
If you were just using threads, I'd suggest you look at the methods for limiting thread concurrency (e.g. the TaskScheduler.MaximumConcurrencyLevel property, and this example.).
Also see ParallelEnumerable.WithDegreeOfParallelism
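A small sketch of how WithDegreeOfParallelism caps concurrency, assuming the work items are independent. BoundedParallel is a made-up name, and the Thread.Sleep call merely stands in for the real processing; the method returns the highest concurrency it observed so the cap can be checked.

```csharp
using System.Linq;
using System.Threading;

public static class BoundedParallel
{
    // Processes the inputs with at most 'maxParallel' concurrent workers and
    // returns the highest concurrency that was actually observed.
    public static int Run(int[] inputs, int maxParallel)
    {
        int current = 0, peak = 0;
        inputs.AsParallel()
              .WithDegreeOfParallelism(maxParallel)
              .ForAll(item =>
              {
                  int now = Interlocked.Increment(ref current);
                  // Record a new peak with a compare-and-swap loop.
                  int seen;
                  while (now > (seen = Volatile.Read(ref peak)) &&
                         Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
                  Thread.Sleep(10); // stand-in for the real work
                  Interlocked.Decrement(ref current);
              });
        return peak;
    }
}
```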
void ThreadTest()
{
ConcurrentQueue<int> q = new ConcurrentQueue<int>();
int MaxCount = 5;
Random r = new Random();
for (int i = 0; i <= 10000; i++)
{
q.Enqueue(r.Next(100000, 200000));
}
ThreadStart proc = () =>
{
int read;
//loop instead of recursing, so a long queue cannot overflow the stack
while (q.TryDequeue(out read))
{
int delay;
lock (r) { delay = r.Next(100, 1000); } //Random is not thread-safe
Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] starting: {0}... #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
Thread.Sleep(delay);
Console.WriteLine(String.Format("[{1:HH:mm:ss}.{1:fff}] {0} ended! #Thread {2}", read, DateTime.Now, Thread.CurrentThread.ManagedThreadId));
}
};
for (int i = 0; i < MaxCount; i++) //start exactly MaxCount threads
{
new Thread(proc).Start();
}
}
Current implementation: Waits until parallelCount values are collected, uses ThreadPool to process the values, waits until all threads complete, re-collect another set of values and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects;
// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];
private void MultiThreadedGenerate(object paramObject)
{
paramObjects[taskIndex] = paramObject;
taskIndex++;
if (taskIndex == parallelCount)
{
MultiThreadedGenerate();
// Reset
taskIndex = 0;
}
}
/*
* Called when 'paramObjects' array gets filled
*/
private void MultiThreadedGenerate()
{
int remainingToGenerate = paramObjects.Count;
resetEvent.Reset();
for (int i = 0; i < paramObjects.Count; i++)
{
ThreadPool.QueueUserWorkItem(delegate(object obj)
{
try
{
int currentIndex = (int) obj;
Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
}
finally
{
if (Interlocked.Decrement(ref remainingToGenerate) == 0)
{
resetEvent.Set();
}
}
}, i);
}
resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach, however there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronization using resetEvent can be avoided as there is no dependency between the threads (or current set of values with the next set of values). I'm only doing this to manage access to reusableObjects (when a set paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its unique 'reusableObj' to work with).
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to have 10 objects, and say due to some limitations, ThreadPool can run only 3 threads for my MultiThreadedGenerate() method, then I'm wasting memory.
So, by getting rid of paramObjects, how can the above code be refined so that as soon as one thread completes its job, it returns the taskIndex (or the reusableObj) it used and no longer needs, making it available to the next value? Also, the code should create a reusable object and add it to some collection only when there is a demand for it. Is using a Queue here a good idea?
Thank you.
There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .NET 4.0 but was back-ported to .NET 3.5. Download here.
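One way the TPL can take over the bookkeeping for reusable objects is the Parallel.ForEach overload with thread-local state, sketched below. ReusableWorker is a hypothetical name, and the StringBuilder merely stands in for the question's reusable object and Generate call. Each worker task creates its object on demand in localInit and keeps reusing it, so no index array or reset event is needed. Note that the number of objects created is not strictly capped at maxParallel, since the loop may retire and replace worker tasks over time.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class ReusableWorker
{
    // Returns how many reusable objects were actually created.
    public static int GenerateAll(IEnumerable<int> inputs, int maxParallel)
    {
        int created = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(
            inputs,
            options,
            // localInit: runs once per worker task, only when one starts.
            () => { Interlocked.Increment(ref created); return new StringBuilder(); },
            // body: each worker task keeps reusing its own object.
            (input, loopState, reusable) =>
            {
                reusable.Clear();
                reusable.Append(input); // stand-in for Generate(...)
                return reusable;
            },
            // localFinally: runs once when the worker task retires.
            reusable => { });
        return created;
    }
}
```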
What is the most recommended .NET custom threadpool that can have separate instances i.e more than one threadpool per application?
I need an unlimited queue size (building a crawler), and need to run a separate threadpool in parallel for each site I am crawling.
Edit :
I need to mine these sites for information as fast as possible, using a separate threadpool for each site would give me the ability to control the number of threads working on each site at any given time. (no more than 2-3)
Thanks
Roey
I believe Smart Thread Pool can do this. Its thread pool class can be instantiated, so you should be able to create and manage separate site-specific instances as you require.
Ami Bar wrote an excellent Smart Thread Pool that can be instantiated.
Take a look here.
Ask Jon Skeet: http://www.yoda.arachsys.com/csharp/miscutil/
Parallel Extensions for .NET (TPL) should actually work much better if you want a large number of tasks running in parallel.
A BlockingCollection can be used as a work queue for the threads.
Here is an implementation of it.
Updated at 2018-04-23:
public class WorkerPool<T> : IDisposable
{
BlockingCollection<T> queue = new BlockingCollection<T>();
List<Task> taskList;
private CancellationTokenSource cancellationToken;
int maxWorkers;
private bool wasShutDown;
int waitingUnits;
public WorkerPool(CancellationTokenSource cancellationToken, int maxWorkers)
{
this.cancellationToken = cancellationToken;
this.maxWorkers = maxWorkers;
this.taskList = new List<Task>();
}
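public void enqueue(T value)
{
//count the item before it becomes visible to the workers;
//Interlocked is needed because several threads update the counter
Interlocked.Increment(ref waitingUnits);
queue.Add(value);
}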
//call to signal that there are no more item
public void CompleteAdding()
{
queue.CompleteAdding();
}
//create workers and put then running
public void startWorkers(Action<T> worker)
{
for (int i = 0; i < maxWorkers; i++)
{
taskList.Add(new Task(() =>
{
string myname = "worker " + Guid.NewGuid().ToString();
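try
{
while (!cancellationToken.IsCancellationRequested)
{
//passing the token lets a blocked Take wake up on cancellation
var value = queue.Take(cancellationToken.Token);
Interlocked.Decrement(ref waitingUnits);
worker(value);
}
}
catch (InvalidOperationException) //thrown by Take once CompleteAdding has been called and the queue is empty
{
//do nothing
}
catch (OperationCanceledException) //thrown by Take when the token is cancelled
{
//do nothing
}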
}));
}
foreach (var task in taskList)
{
task.Start();
}
}
//wait for all workers to be finish their jobs
public void await()
{
while (waitingUnits >0 || !queue.IsAddingCompleted)
Thread.Sleep(100);
shutdown();
}
private void shutdown()
{
wasShutDown = true;
Task.WaitAll(taskList.ToArray());
}
//case something bad happen dismiss all pending work
public void Dispose()
{
if (!wasShutDown)
{
queue.CompleteAdding();
shutdown();
}
}
}
Then use like this:
WorkerPool<int> workerPool = new WorkerPool<int>(new CancellationTokenSource(), 5);
workerPool.startWorkers(value =>
{
log.Debug(value);
});
//enqueue all the work
for (int i = 0; i < 100; i++)
{
workerPool.enqueue(i);
}
//Signal no more work
workerPool.CompleteAdding();
//wait all pending work to finish
workerPool.await();
You can have as many pools as you like simply by creating new WorkerPool objects.
This free NuGet library here: CodeFluentRuntimeClient has a CustomThreadPool class that you can reuse. It's very configurable: you can change the pool threads' priority, number, COM apartment state, even name (for debugging), and also culture.
Another approach is to use a Dataflow pipeline. I added this later answer because I find Dataflow a much better approach for this kind of problem, the problem of having several thread pools. It provides a more flexible and structured approach and scales easily.
You can break your code into one or more blocks, link them with Dataflow, and let the Dataflow blocks schedule the work on the thread pool.
I suggest breaking it into 3 blocks: one for preparing the query to the site page, one to access the site page, and the last one to analyse the data.
This way the slow block (get) can be given a higher degree of parallelism to compensate.
Here is how the Dataflow setup would look:
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
prepareBlock.LinkTo(getBlock, linkOptions);
getBlock.LinkTo(analiseBlock, linkOptions);
Data will flow from prepareBlock to getBlock and then to analiseBlock.
The messages passed between blocks can be of any class; the output type of one block just has to be the same as the input type of the next. See the full example on Dataflow Pipeline
Using the Dataflow would be something like this:
while ...{
...
prepareBlock.Post(...); //to send data to the pipeline
}
prepareBlock.Complete(); //when done
analiseBlock.Completion.Wait(cancellationTokenSource.Token); //to wait for all queues to empty or cancel
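Putting the pieces above together, a self-contained sketch of such a three-block pipeline might look like this. It assumes the System.Threading.Tasks.Dataflow package is available; CrawlPipeline, the made-up example.invalid URL and the fake "download" are illustrative stand-ins, not a real crawler.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

public static class CrawlPipeline
{
    // Builds prepare -> get -> analise and returns the collected results.
    public static ConcurrentBag<int> Run(int pageCount)
    {
        var results = new ConcurrentBag<int>();
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        var prepareBlock = new TransformBlock<int, string>(
            page => "http://example.invalid/page/" + page); // made-up URL

        // The slow stage gets more parallelism than the others.
        var getBlock = new TransformBlock<string, string>(
            url => "<html>" + url + "</html>", // stand-in for a real download
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });

        var analiseBlock = new ActionBlock<string>(html => results.Add(html.Length));

        prepareBlock.LinkTo(getBlock, linkOptions);
        getBlock.LinkTo(analiseBlock, linkOptions);

        for (int page = 0; page < pageCount; page++)
            prepareBlock.Post(page);

        prepareBlock.Complete();        // no more input
        analiseBlock.Completion.Wait(); // completion propagates down the links
        return results;
    }
}
```

For the per-site limit from the question, one pipeline (or at least one getBlock) per site with MaxDegreeOfParallelism set to 2 or 3 would play the role of the separate thread pools.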