I want to loop through a list of URLs and check each URL if the website is down or not using multiple threads.
My approach:
while (_lURLs.Count > 0)
{
    while (_iRunningThreads < _iNumThreads)
    {
        Thread t = new Thread(new ParameterizedThreadStart(CheckWebsite));
        string strUrl = GetNextURL();
        if (!string.IsNullOrEmpty(strUrl))
        {
            t.Start(strUrl);
            _iRunningThreads++;
        }
        else
        {
            break;
        }
    }
}
private string GetNextURL()
{
    lock (_lURLs)
    {
        if (_lURLs.Count > 0)
        {
            string strRetVal = _lURLs[0];
            _lURLs.RemoveAt(0);
            return strRetVal;
        }
        else
        {
            return string.Empty;
        }
    }
}
When a thread finishes, the _iRunningThreads counter is decremented.
My problem is: the outer while loop, while (_lURLs.Count > 0), blocks everything.
Adding an Application.DoEvents() in the outer while loop helps, but I want to use the code in a C# library where Application.DoEvents() is not available.
Thank you for your help.
Instead of managing the threads yourself, you can use the TPL.
Also, if you're using .NET Framework 4.5, you can even add async/await and the WhenAll method to prevent blocking...
Here is a small example:
private async Task CheckUrl()
{
    List<Task> tasks = new List<Task>();
    string url = GetNextUrl();
    while (!String.IsNullOrEmpty(url))
    {
        // Copy to a local so each task captures its own URL,
        // not the loop variable that is reassigned below.
        string currentUrl = url;
        tasks.Add(Task.Run(() => CheckWebSite(currentUrl)));
        url = GetNextUrl();
    }
    await Task.WhenAll(tasks);
    // All tasks have finished...
}
I think using the .NET ThreadPool would be a good idea in this case, if the tasks take quite a short time to complete.
Check out: http://msdn.microsoft.com/en-us/library/4yd16hza.aspx
This allows you to simplify your code a bit as the ThreadPool automatically manages the count of the worker threads. You just have to call ThreadPool.QueueUserWorkItem for each URL you have and increment a running task counter. Queuing items into the ThreadPool won't block the UI thread.
Have the ThreadPool tasks decrement the counter (as you have now), and when the counter gets to zero (all tasks have been run) call a callback function so that your main code knows when all the URLs have been processed. You can update the UI or whatever else you want to do from that callback.
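For illustration, a minimal sketch of that counter-plus-callback idea, assuming your existing CheckWebsite(string) method and a hypothetical AllDone() callback of your own:

private int _iPendingCount = 1; // starts at 1 to guard against finishing before queuing completes

public void QueueAllUrls(IEnumerable<string> urls)
{
    foreach (string url in urls)
    {
        Interlocked.Increment(ref _iPendingCount);
        ThreadPool.QueueUserWorkItem(state =>
        {
            try
            {
                CheckWebsite((string)state); // your existing check
            }
            finally
            {
                // The last work item to finish fires the callback.
                if (Interlocked.Decrement(ref _iPendingCount) == 0)
                    AllDone();
            }
        }, url);
    }
    // Release the guard; if everything already finished, fire the callback here instead.
    if (Interlocked.Decrement(ref _iPendingCount) == 0)
        AllDone();
}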
Basic overview: the program should launch a task to parse some array of data and occasionally enqueue tasks to process it one at a time. The test rig has a button and two labels to display debug info. TaskQueue is a wrapper class around SemaphoreSlim, taken from this thread.
Dispatcher dispatch = Application.Current.Dispatcher;

async void Test_Click(object s, RoutedEventArgs e)
{
    TaskQueue queue = new TaskQueue();
    // Blocks thread if SimulateParse does not have await inside
    await SimulateParse(queue);
    //await Task.Run(() => SimulateParse(queue));
    lblStatus2.Content = string.Format("Awaiting queue");
    await queue.WaitAsync(); //this is just SemaphoreSlim.WaitAsync()
    lblStatus.Content = string.Format("Ready");
    lblStatus2.Content = string.Format("Ready");
    MessageBox.Show("Ok");
}
async Task SimulateParse(TaskQueue queue)
{
    Random rnd = new Random();
    int counter = 0; // representing some piece of data
    for (int i = 0; i < 500; i++)
    {
        dispatch.Invoke(() => lblStatus2.Content = string.Format("Check {0}", ++counter));
        Thread.Sleep(25); //no await variant
        //await Task.Delay(25);

        // if some condition matched - queue work
        if (rnd.Next(1, 11) < 2)
        {
            // Blocks thread even though Enqueue() has await inside
            queue.Enqueue(SimulateWork, counter);
            //Task.Run(() => queue.Enqueue(SimulateWork, counter));
        }
    }
}
async Task SimulateWork(object par)
{
    dispatch.Invoke(() => lblStatus.Content = string.Format("Working with {0}", par));
    Thread.Sleep(400); //no await variant
    //await Task.Delay(400);
}
It seems that it only works if the launched task has an await inside it; if you try to launch a task without an await inside it, it blocks the current thread.
This rig works as intended if the commented lines are used instead, but that looks like an excessive number of calls; also, the real versions of SimulateParse and SimulateWork do not need to await anything. The main question is: what is the optimal way to launch a task that wraps a non-async function? Do I just need to wrap them in Task.Run() as in the commented rows?
TaskQueue is used here to run tasks one by one
It will run them one at a time, yes. SemaphoreSlim does have an implicit queue, but it's not strictly a FIFO-queue. Most synchronization primitives have a mostly-but-not-quite-FIFO implementation, which is Close Enough. This is because they are synchronization primitives, and not queues.
If you want an actual queue (i.e., with guaranteed FIFO order), then you should use a queue, such as TPL Dataflow or System.Threading.Channels.
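For example, here is a rough System.Threading.Channels sketch that keeps strict FIFO order by using a single consumer (ProcessAsync is a placeholder for the real work):

var channel = Channel.CreateUnbounded<int>();

// Single consumer: items are handled one at a time, in the order they were written.
var consumer = Task.Run(async () =>
{
    await foreach (int item in channel.Reader.ReadAllAsync())
        await ProcessAsync(item); // placeholder for the real work
});

// Producers enqueue without blocking the caller.
await channel.Writer.WriteAsync(42);
channel.Writer.Complete();  // no more items
await consumer;             // wait for the queue to drain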
if you try to launch a task without an await inside it, it blocks the current thread.
All async methods begin executing on the current thread, as described on my blog. async does not mean "run on a different thread". If you want to run a method on a thread pool thread, then wrap that method call in Task.Run. That's a much cleaner solution than sprinkling Task.Delay throughout, and it's more efficient, too (no delays).
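In the rig above that would mean keeping SimulateParse synchronous (Thread.Sleep and all) and pushing the whole call onto a pool thread at the call site, roughly:

// Test_Click: the UI thread awaits, but the synchronous parse runs on a thread pool thread.
await Task.Run(() => SimulateParse(queue));

The synchronous SimulateWork can be wrapped in Task.Run in the same way where it is queued.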
My program is creating some Tasks to do some work.
Now I want to wait for all tasks to complete before I will exit my program.
I know the Task.WaitAll(...) method, but in my case I want to create some Tasks dynamically, so they don't all exist yet when Task.WaitAll() is called.
My question now is:
how can I wait for all running Tasks until they are completed?
This is what I'm doing now.
It works but I want to be sure this is also a good way to do it.
public class Test
{
    public ConcurrentBag<Task> RunningTasks { get; set; }

    public Test()
    {
        RunningTasks = new ConcurrentBag<Task>();
    }

    public void RunIt()
    {
        //...create some Tasks asynchronously so they may be created in a few seconds and add them to the list like
        //RunningTasks.Add(...);

        //Just wait until all tasks finished.
        var waitTask = Task.Run(() =>
        {
            while (true)
            {
                if (RunningTasks.All(t => t.IsCompleted))
                    break;
                Thread.Sleep(1000);
            }
        });
        waitTask.Wait();
    }
}
Task.WhenAll() will take an array of Task, and the ConcurrentBag<T> supports the ToArray() extension method, so you can simply do:
await Task.WhenAll(RunningTasks.ToArray())
or if you don't want to use the await keyword:
Task.WhenAll(RunningTasks.ToArray()).Wait()
UPDATE
So if your RunningTasks is changing after the initial call you can do this pretty easily to handle it:
while (RunningTasks.Any(t => !t.IsCompleted))
{
    Task.WhenAll(RunningTasks.ToArray()).Wait();
}
The main benefit over your method is that this will yield the thread to other work while it waits, where your code will tie up the thread in a sleep state until all the tasks are completed.
I'm trying to transition from the Event-based Asynchronous Pattern, where I tracked running methods using unique IDs and the AsyncOperationManager. As this has now been dropped from Windows 8 apps, I'm trying to get a similar effect with async/await but can't quite figure out how.
What I'm trying to achieve is something like
private async Task updateSomething()
{
    if (***the method is already running***)
    {
        runagain = true;
    }
    else
    {
        await someMethod();
        if (runagain)
        {
            // run the method again
        }
    }
}
The part I'm struggling with is finding out whether the method is running. I've tried creating a Task and looking at its Status, and at the Status of the async method, but they don't appear to be the correct place to look.
Thanks
UPDATE: This is the current code I use in .NET 4 to achieve the same result. _updateMetaDataAsync is a class based on the Event-Based Asynchronous Pattern.
private void updateMetaData()
{
    if (_updateMetaDataAsync.IsTaskRunning(_updateMetaDataGuid_CheckAllFiles))
    {
        _updateMetaDataGuid_CheckAllFiles_Again = true;
    }
    else
    {
        _updateMetaDataGuid_CheckAllFiles_Again = false;
        _updateMetaDataAsync.UpdateMetaDataAsync(_updateMetaDataGuid_CheckAllFiles);
    }
}

private void updateMetaDataCompleted(object sender, UpdateMetaDataCompletedEventArgs e)
{
    if (_updateMetaDataGuid_CheckAllFiles_Again)
    {
        updateMetaData();
    }
}
async/await itself is intended to be used to create sequential operations executed asynchronously from the UI thread. You can get it to do parallel operations, but generally the operations "join" back to the UI thread with some sort of result. (there's also the possibility of doing "fire-and-forget" types of asynchronous operations with await but it's not recommended). i.e. there's nothing inherent to async/await to support progress reporting.
You can get progress out of code using async/await; but you need to use new progress interfaces like IProgress<T>. For more info on progress reporting with async/await, see http://blogs.msdn.com/b/dotnet/archive/2012/06/06/async-in-4-5-enabling-progress-and-cancellation-in-async-apis.aspx. Migrating to this should just be a matter of calling an IProgress delegate instead of a Progress event.
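As a rough sketch of what that migration might look like (the method, label, and file-count names here are illustrative, not from your code):

private async Task UpdateMetaDataAsync(IProgress<int> progress)
{
    for (int i = 0; i < _fileCount; i++)
    {
        await ProcessFileAsync(i);  // the per-file work, whatever that is
        progress.Report(i + 1);     // replaces raising a ProgressChanged-style event
    }
}

// Caller on the UI thread: Progress<T> captures the current SynchronizationContext,
// so the callback runs back on the UI thread.
var progress = new Progress<int>(done => statusLabel.Text = done + " files checked");
await UpdateMetaDataAsync(progress);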
If you're using a Task you've created, you can check the Task's Status property (or just see Task.IsCompleted if completion is the only state you are interested in).
That being said, await will not "return" until the operation either completes, raises an exception, or cancels. You can basically safely assume that, if you're still waiting on the "await", your task hasn't completed.
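Putting those two points together, a minimal sketch of the "run it again" pattern with a stored Task (assuming all calls come from the UI thread; the field names are made up):

private Task _updateTask;
private bool _runAgain;

private async Task updateSomething()
{
    if (_updateTask != null && !_updateTask.IsCompleted)
    {
        _runAgain = true;   // someMethod is still running; remember the request
        return;
    }

    do
    {
        _runAgain = false;
        _updateTask = someMethod();  // your existing async work
        await _updateTask;
    } while (_runAgain);             // another request arrived while we were running
}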
SemaphoreSlim queueToAccessQueue = new SemaphoreSlim(1);
object queueLock = new object();
long queuedRequests = 0;
Task _loadingTask;

public void RetrieveItems()
{
    lock (queueLock)
    {
        queuedRequests++;
        if (queuedRequests == 1) // 1 is the minimum size of the queue before another instance is queued
        {
            _loadingTask = _loadingTask?.ContinueWith(async () =>
            {
                RunTheMethodAgain();
                await queueToAccessQueue.WaitAsync();
                queuedRequests = 0; // indicates that the queue has been cleared
                queueToAccessQueue.Release();
            }) ?? Task.Run(async () =>
            {
                RunTheMethodAgain();
                await queueToAccessQueue.WaitAsync();
                queuedRequests = 0; // indicates that the queue has been cleared
                queueToAccessQueue.Release();
            });
        }
    }
}

public void RunTheMethodAgain()
{
    // ... run the method again ...
}
The added bonus is that you can see how many items are sitting in the queue!
I have scenarios where I need a main thread to wait until every one of a set of possibly more than 64 threads has completed its work, and for that I wrote the following helper utility (to avoid the 64-handle limit on WaitHandle.WaitAll()):
public static void WaitAll(WaitHandle[] handles)
{
    if (handles == null)
        throw new ArgumentNullException("handles",
            "WaitHandle[] handles was null");
    foreach (WaitHandle wh in handles) wh.WaitOne();
}
With this utility method, however, each wait handle is only examined after every preceding one in the array has been signalled, so it is in effect synchronous and will not work if the wait handles are AutoResetEvents (which reset as soon as a waiting thread has been released).
To fix this issue I am considering changing the code to the following, but would like others to check whether it looks like it will work, whether anyone sees any issues with it, or whether anyone can suggest a better way.
Thanks in advance:
public static void WaitAllParallel(WaitHandle[] handles)
{
    if (handles == null)
        throw new ArgumentNullException("handles",
            "WaitHandle[] handles was null");
    int actThreadCount = handles.Length;
    object locker = new object();
    foreach (WaitHandle wh in handles)
    {
        WaitHandle qwH = wh;
        ThreadPool.QueueUserWorkItem(
            delegate
            {
                try { qwH.WaitOne(); }
                finally { lock (locker) --actThreadCount; }
            });
    }
    while (actThreadCount > 0) Thread.Sleep(80);
}
If you know how many threads you have, you can use an interlocked decrement. This is how I usually do it:
// fields
AutoResetEvent eventDone = new AutoResetEvent(false);
int totalCount = 128;

// queue the work items
for (int i = 0; i < 128; i++)
{
    ThreadPool.QueueUserWorkItem(ThreadWorker, /* ... per-item state ... */ null);
}

void ThreadWorker(object state)
{
    try
    {
        // ... work and more work
    }
    finally
    {
        int runningCount = Interlocked.Decrement(ref totalCount);
        if (0 == runningCount)
        {
            // This is the last thread, notify the waiters
            eventDone.Set();
        }
    }
}
Actually, most of the time I don't even signal, but instead invoke a callback that continues the processing from where the waiter would continue. Fewer blocked threads, more scalability.
I know this is different and may not apply to your case (e.g. it certainly won't work if some of those handles are not threads but I/O or events), but it may be worth thinking about.
I'm not sure what exactly you're trying to do, but would a CountdownEvent (.NET 4.0) conceptually solve your problem?
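For reference, a minimal CountdownEvent sketch (DoWork stands in for the real per-item job):

int workItems = 100;                       // can exceed 64, unlike WaitHandle.WaitAll
var countdown = new CountdownEvent(workItems);

for (int i = 0; i < workItems; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try { DoWork(); }                  // placeholder for the real work
        finally { countdown.Signal(); }    // count one item as done
    });
}

countdown.Wait();                          // blocks until every item has signalled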
I'm not a C# or .NET programmer, but you could use a semaphore that is posted when one of your worker threads exits. The monitoring thread would simply wait on the semaphore n times where n is the number of worker threads. Semaphores are traditionally used to count resources in use but they can be used to count jobs completed by waiting on the same semaphore for n times.
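In .NET that idea could look roughly like this with SemaphoreSlim (DoWork is a placeholder for the worker's job):

int workerCount = 100;
var finished = new SemaphoreSlim(0);   // starts at 0; each worker releases it once on exit

for (int i = 0; i < workerCount; i++)
{
    new Thread(() =>
    {
        try { DoWork(); }              // placeholder for the worker's job
        finally { finished.Release(); }
    }).Start();
}

// Monitoring thread: wait on the same semaphore n times, once per worker.
for (int i = 0; i < workerCount; i++)
    finished.Wait();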
When working with lots of simultaneous threads, I prefer to add each thread's ManagedThreadId into a Dictionary when I start the thread, and then have each thread invoke a callback routine that removes the dying thread's id from the Dictionary. The Dictionary's Count property tells you how many threads are active. Use the value side of the key/value pair to hold info that your UI thread can use to report status. Wrap the Dictionary with a lock to keep things safe.
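A small sketch of that bookkeeping, assuming a status string as the value side (the method and field names are made up):

private readonly object _gate = new object();
private readonly Dictionary<int, string> _activeThreads = new Dictionary<int, string>();

private void StartTrackedThread(ThreadStart work, string statusInfo)
{
    var t = new Thread(() =>
    {
        try { work(); }
        finally
        {
            // "Callback" on the way out: remove the dying thread's id.
            lock (_gate) _activeThreads.Remove(Thread.CurrentThread.ManagedThreadId);
        }
    });
    lock (_gate) _activeThreads[t.ManagedThreadId] = statusInfo;
    t.Start();
}

// UI thread: lock (_gate) { int running = _activeThreads.Count; /* report status */ }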
ThreadPool.QueueUserWorkItem(o =>
{
    try
    {
        using (var h = (o as WaitHandle))
        {
            if (!h.WaitOne(100000))
            {
                // Alert main thread of the timeout
            }
        }
    }
    finally
    {
        Interlocked.Decrement(ref actThreadCount);
    }
}, wh);
What is the most recommended .NET custom thread pool that can have separate instances, i.e. more than one thread pool per application?
I need an unlimited queue size (building a crawler), and need to run a separate threadpool in parallel for each site I am crawling.
Edit:
I need to mine these sites for information as fast as possible; using a separate thread pool for each site would give me the ability to control the number of threads working on each site at any given time (no more than 2-3).
Thanks
Roey
I believe Smart Thread Pool can do this. Its thread pool class can be instantiated, so you should be able to create and manage your separate site-specific instances as you require.
Ami Bar wrote an excellent Smart Thread Pool that can be instantiated.
Take a look here.
Ask Jon Skeet: http://www.yoda.arachsys.com/csharp/miscutil/
The Parallel Extensions for .NET (TPL) should actually work much better if you want a large number of parallel running tasks.
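For example, rather than one pool per site, a capped Parallel.ForEach per site gives you the "no more than 2-3 threads per site" control directly (urlsForThisSite and CrawlPage are placeholders for your own data and download/parse code):

var perSiteOptions = new ParallelOptions { MaxDegreeOfParallelism = 3 };

// Run one of these loops per site; each loop is throttled independently.
Parallel.ForEach(urlsForThisSite, perSiteOptions, url =>
{
    CrawlPage(url);  // placeholder for the real work
});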
A BlockingCollection can be used as the queue for the threads.
Here is an implementation of it.
Updated at 2018-04-23:
public class WorkerPool<T> : IDisposable
{
    BlockingCollection<T> queue = new BlockingCollection<T>();
    List<Task> taskList;
    private CancellationTokenSource cancellationToken;
    int maxWorkers;
    private bool wasShutDown;
    int waitingUnits;

    public WorkerPool(CancellationTokenSource cancellationToken, int maxWorkers)
    {
        this.cancellationToken = cancellationToken;
        this.maxWorkers = maxWorkers;
        this.taskList = new List<Task>();
    }

    public void enqueue(T value)
    {
        queue.Add(value);
        waitingUnits++;
    }

    //call to signal that there are no more items
    public void CompleteAdding()
    {
        queue.CompleteAdding();
    }

    //create workers and set them running
    public void startWorkers(Action<T> worker)
    {
        for (int i = 0; i < maxWorkers; i++)
        {
            taskList.Add(new Task(() =>
            {
                string myname = "worker " + Guid.NewGuid().ToString();
                try
                {
                    while (!cancellationToken.IsCancellationRequested)
                    {
                        var value = queue.Take();
                        waitingUnits--;
                        worker(value);
                    }
                }
                catch (Exception ex) when (ex is InvalidOperationException) //thrown when the collection is closed with the CompleteAdding method. No pretty way to do this.
                {
                    //do nothing
                }
            }));
        }
        foreach (var task in taskList)
        {
            task.Start();
        }
    }

    //wait for all workers to finish their jobs
    public void await()
    {
        while (waitingUnits > 0 || !queue.IsAddingCompleted)
            Thread.Sleep(100);
        shutdown();
    }

    private void shutdown()
    {
        wasShutDown = true;
        Task.WaitAll(taskList.ToArray());
    }

    //in case something bad happens, dismiss all pending work
    public void Dispose()
    {
        if (!wasShutDown)
        {
            queue.CompleteAdding();
            shutdown();
        }
    }
}
Then use it like this:
WorkerPool<int> workerPool = new WorkerPool<int>(new CancellationTokenSource(), 5);
workerPool.startWorkers(value =>
{
    log.Debug(value);
});

//enqueue all the work
for (int i = 0; i < 100; i++)
{
    workerPool.enqueue(i);
}

//Signal no more work
workerPool.CompleteAdding();

//wait all pending work to finish
workerPool.await();
You can have as many pools as you like simply by creating new WorkerPool objects.
This free NuGet library, CodeFluentRuntimeClient, has a CustomThreadPool class that you can reuse. It's very configurable; you can change pool thread priority, count, COM apartment state, even the name (for debugging), and also the culture.
Another approach is to use a Dataflow pipeline. I added this later answer because I find Dataflow a much better approach for this kind of problem, the problem of having several thread pools. It provides a more flexible and structured approach and can easily scale vertically.
You can break your code into one or more blocks, link them with Dataflow, and let the Dataflow engine allocate threads according to CPU and memory availability.
I suggest breaking it into three blocks: one to prepare the query for the site page, one to access the site page, and a last one to analyse the data.
This way the slow block (the get) may have more threads allocated to compensate.
Here is how the Dataflow setup would look:
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
prepareBlock.LinkTo(getBlock, linkOptions);
getBlock.LinkTo(analiseBlock, linkOptions);
Data will flow from prepareBlock to getBlock and then to analiseBlock.
The interfaces between blocks can be any class; they just have to be the same. See the full example on Dataflow Pipeline.
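For completeness, a hedged sketch of how the three blocks themselves might be declared (the PageRequest/PageContent types and the BuildRequest/DownloadAsync/Analyse methods are placeholders, not part of the original code):

var prepareBlock = new TransformBlock<string, PageRequest>(
    site => BuildRequest(site));   // cheap work, default parallelism

var getBlock = new TransformBlock<PageRequest, PageContent>(
    async request => await DownloadAsync(request),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 }); // the slow block gets more threads

var analiseBlock = new ActionBlock<PageContent>(
    page => Analyse(page));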
Using the Dataflow would be something like this:
while (...)
{
    ...
    prepareBlock.Post(...); //to send data to the pipeline
}
prepareBlock.Complete(); //when done
analiseBlock.Completion.Wait(cancellationTokenSource.Token); //to wait for all queues to empty or cancel