How to pass different instances while multithreading? - c#

I am building a scraper. My goal is to start X browsers (where X is number of threads) and proceed to scrape a list of URLs with each of them by splitting that list in X parts.
I decide to use 3 threads (3 browsers) with list of 10 URLs.
Question: How to separate each task between the browsers like this:
Browser1 scrapes items in the list from 0 to 3
Browser2 scrapes items in the list from 4 to 7
Browser3 scrapes items in the list from 8 to 9
All browsers should be working at the same time scraping the passed list of URLs.
I already have this BlockingCollection:
BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();
public Multithreading(int workerCount)
{
// Create and start a separate Task for each consumer:
for (int i = 0; i < workerCount; i++)
Task.Factory.StartNew(Consume);
}
public void Dispose() { _taskQ.CompleteAdding(); }
public void EnqueueTask(Action action) { _taskQ.Add(action); }
void Consume()
{
// This sequence that we’re enumerating will block when no elements
// are available and will end when CompleteAdding is called.
foreach (Action action in _taskQ.GetConsumingEnumerable())
action(); // Perform task.
}
public int ItemsCount()
{
return _taskQ.Count;
}
It can be used like this:
Multithreading multithread = new Multithreading(3); //3 threads
foreach(string url in urlList){
multithread.EnqueueTask(new Action(() =>
{
startScraping(browser1); //or browser2 or browser3
}));
}
I need to create the browsers instances before scraping, because I do not want to start a new browser with every thread.

Taking Henk Holterman's comment into account that you may want maximum speed, i.e. to keep the browsers as busy as possible, use this:
private static void StartScraping(int id, IEnumerable<Uri> urls)
{
// Construct browser here
foreach (Uri url in urls)
{
// Use browser to process url here
Console.WriteLine("Browser {0} is processing url {1}", id, url);
}
}
in main:
int nrWorkers = 3;
int nrUrls = 10;
BlockingCollection<Uri> taskQ = new BlockingCollection<Uri>();
foreach (int i in Enumerable.Range(0, nrWorkers))
{
Task.Run(() => StartScraping(i, taskQ.GetConsumingEnumerable()));
}
foreach (int i in Enumerable.Range(0, nrUrls))
{
taskQ.Add(new Uri(String.Format("http://Url{0}", i)));
}
taskQ.CompleteAdding();

I think the usual approach is to have a single blocking queue, a provider thread and an arbitrary pool of workers.
The provider thread is responsible for adding URLs to the queue. It blocks when there are none to add.
A worker thread instantiates a browser, and then retrieves a single URL from the queue, scrapes it and then loops back for more. It blocks when the queue is empty.
You can start as many workers as you like, and they just sort it out between them.
The mainline starts all the threads and retires to the sidelines. It looks after the UI, if there is one.
Multithreading can be really hard to debug. You might want to look at using Tasks for at least part of the job.
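The provider/worker arrangement described above can be sketched roughly as follows. This is only a minimal illustration: FakeBrowser and ScraperPool are hypothetical names standing in for a real browser driver and your own scraper class, and the calling thread plays the role of the provider.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ScraperPool
{
    // Hypothetical stand-in for a real browser driver; one instance per worker.
    public sealed class FakeBrowser
    {
        public int Id { get; }
        public FakeBrowser(int id) { Id = id; }
        public string Scrape(Uri url) => $"browser {Id} scraped {url}";
    }

    // Starts workerCount workers, each owning one browser, and returns all results.
    public static List<string> Run(IEnumerable<Uri> urls, int workerCount)
    {
        var queue = new BlockingCollection<Uri>();
        var results = new ConcurrentBag<string>();

        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(id => Task.Factory.StartNew(() =>
            {
                var browser = new FakeBrowser(id); // created once per worker
                // Blocks while the queue is empty; ends after CompleteAdding.
                foreach (Uri url in queue.GetConsumingEnumerable())
                    results.Add(browser.Scrape(url));
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        // Provider: here simply the calling thread.
        foreach (Uri url in urls)
            queue.Add(url);
        queue.CompleteAdding();

        Task.WaitAll(workers); // wait until every worker has drained the queue
        return results.ToList();
    }
}
```

Because every worker pulls from the same queue, a browser that hits slow pages simply takes fewer URLs, and nobody has to decide up front which browser gets which URL.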

You could give some Id to the tasks and also Workers. Then you'll have BlockingCollection[] instead of just BlockingCollection. Every consumer will consume from its own BlockingCollection from the array. Our job is to find the right consumer and post the job.
BlockingCollection<Action>[] _taskQ;
private int taskCounter = -1;
public Multithreading(int workerCount)
{
_taskQ = new BlockingCollection<Action>[workerCount];
for (int i = 0; i < workerCount; i++)
{
int workerId = i;//To avoid closure issue
_taskQ[workerId] = new BlockingCollection<Action>();
Task.Factory.StartNew(()=> Consume(workerId));
}
}
public void EnqueueTask(Action action)
{
int value = Interlocked.Increment(ref taskCounter);
int index = value / 4;//Your own logic to find the index here
_taskQ[index].Add(action);
}
void Consume(int workerId)
{
foreach (Action action in _taskQ[workerId].GetConsumingEnumerable())
action();// Perform task.
}
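If you really want the fixed split from the question (items 0 to 3, 4 to 7, and the rest), you could also cut the list into contiguous chunks up front and hand each chunk to one worker. A minimal sketch, where UrlPartitioner and Split are hypothetical names of my own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlPartitioner
{
    // Splits items into `parts` contiguous chunks of (almost) equal size.
    // Trailing chunks may be shorter, or empty if there are fewer items than parts.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items, int parts)
    {
        if (parts <= 0) throw new ArgumentOutOfRangeException(nameof(parts));

        var chunks = new List<List<T>>(parts);
        int chunkSize = (items.Count + parts - 1) / parts; // ceiling division
        for (int p = 0; p < parts; p++)
        {
            int start = Math.Min(p * chunkSize, items.Count);
            int count = Math.Min(chunkSize, items.Count - start);
            chunks.Add(items.Skip(start).Take(count).ToList());
        }
        return chunks;
    }
}
```

With 10 URLs and 3 workers this gives chunks of 4, 4 and 2 items. Each chunk can then be passed to its own task together with its browser. Keep in mind that a static split can leave one browser idle while another is still busy, which the single shared queue avoids.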

A simple solution using background workers can limit the number of threads:
public class Scraper : IDisposable
{
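// Both collections must be created, otherwise the constructor throws a NullReferenceException
private readonly BlockingCollection<Action> tasks = new BlockingCollection<Action>();
private readonly IList<BackgroundWorker> workers = new List<BackgroundWorker>();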
public Scraper(IList<Uri> urls, int numberOfThreads)
{
for (var i = 0; i < urls.Count; i++)
{
var url = urls[i];
tasks.Add(() => Scrape(url));
}
for (var i = 0; i < numberOfThreads; i++)
{
var worker = new BackgroundWorker();
worker.DoWork += (sender, args) =>
{
Action task;
while (tasks.TryTake(out task))
{
task();
}
};
workers.Add(worker);
worker.RunWorkerAsync();
}
}
public void Scrape(Uri url)
{
Console.WriteLine("Scraping url {0}", url);
}
public void Dispose()
{
throw new NotImplementedException();
}
}

Related

C# How to let processes with different speeds work together

In the test scenario below I would like to trigger some tasks using multiple timers. One event can trigger another event.
An event must finish its processing before a new one can be started. Events that are triggered while another event is processing should queue up and start once nothing is processing. The timers don't need to be accurate.
Once a line has executed the code, which takes just a few seconds, the line can't take any new orders for minutes. That's the reason I'm using timers.
The current problem with the code below is that things get mixed up in the real app: Line2 starts processing while Line1 still hasn't finished. How can I make the orders queue up properly and be processed in turn?
In the real app, MyTask starts by running its first lines of code back and forth; after a while the last lines of MyTask are executed.
I'm a beginner, so please be patient.
public partial class Form1 : Form
{
readonly System.Windows.Forms.Timer myTimer1 = new System.Windows.Forms.Timer();
readonly System.Windows.Forms.Timer myTimer2 = new System.Windows.Forms.Timer();
int leadTime1 = 100;
int leadTime2 = 100;
public Form1()
{
InitializeComponent();
TaskStarter();
}
private void TaskStarter()
{
myTimer1.Tick += new EventHandler(myEventTimer1);
myTimer2.Tick += new EventHandler(myEventTimer2);
myTimer1.Interval = leadTime1;
myTimer2.Interval = leadTime2;
myTimer1.Start();
}
private void myEventTimer1(object source, EventArgs e)
{
myTimer1.Stop();
Console.WriteLine("Line1 Processing ");
MyTask();
Console.Write(" Line1 Completed");
myTimer1.Interval = 5000; // this lead time is variable and determines how long the line can't be used again after the code has executed
myTimer2.Start();
myTimer1.Enabled = true;
}
private void myEventTimer2(object source, EventArgs e)
{
myTimer2.Stop();
Console.WriteLine("Line2 Processing ");
MyTask();
Console.Write(" Line2 Completed");
myTimer2.Interval = 5000; // this lead time is variable
myTimer2.Enabled = true;
}
private void MyTask()
{
Random rnd = new Random();
int timeExecuteCode = rnd.Next(1000, 5000); // This leadtime does reflect the execution of the real code
Thread.Sleep(timeExecuteCode );
}
}
Update
Thanks to the input I was able to sort out the problems, which made me remove all the timers, as they were causing the asynchronous task processing. I now just lock the lines in a while loop until all orders are completed. Everything is done in a single thread. I think to most pros my code will look very ugly, but this solution is understandable with my 4 weeks of C# experience :)
The two lists I use and their properties:
public class Orders
{
public string OrderID { get ; set ; }
public Orders(string orderID) { OrderID = orderID; }
}
public class LineData
{
string lineID;
public string LineID { get { return lineID; } set { lineID = value; } }
private string orderId;
public string OrderID { get { return orderId; } set { orderId = value; } }
public string ID { get { return lineID + OrderID; } private set {; } }
public double TaskTime { get; set; }
}
Creating the Line data with the lead times per Line and Part
Adding some sample orders
while loop till all orders are completed
public class Production
{
readonly static List<LineData> listLineData = new List<LineData>();
readonly static List<Orders> listOrders = new List<Orders>();
static void Main()
{
// List Line Processing Master Data
listLineData.Add(new LineData { LineID = "Line1", OrderID = "SubPart1", TaskTime = 3 });
listLineData.Add(new LineData { LineID = "Line1", OrderID = "SubPart2", TaskTime = 3 });
listLineData.Add(new LineData { LineID = "Line2", OrderID = "Part1", TaskTime = 1 });
listLineData.Add(new LineData { LineID = "Line3", OrderID = "Part1", TaskTime = 1 });
listLineData.Add(new LineData { LineID = "Line3", OrderID = "Part2", TaskTime = 2 });
// Create Order Book
listOrders.Add(new Orders("SubPart1"));
listOrders.Add(new Orders("SubPart2"));
listOrders.Add(new Orders("Part1"));
listOrders.Add(new Orders("Part2"));
listOrders.Add(new Orders("SubPart1"));
listOrders.Add(new Orders("SubPart2"));
listOrders.Add(new Orders("Part1"));
listOrders.Add(new Orders("Part2"));
listOrders.Add(new Orders("SubPart1"));
listOrders.Add(new Orders("SubPart2"));
listOrders.Add(new Orders("Part1"));
listOrders.Add(new Orders("Part2"));
while (listOrders.Count > 0)
{
CheckProductionLines();
Thread.Sleep(100);
}
}
Picking orders from listOrders and assigning them to the correct line.
Using DateTime.Now plus the taskTime to determine whether a line is busy or not.
Sending the orders to void InitializeProduction(int indexOrder, string line) to process the order.
In a later step I'm going to make one function for Line1 to LineX, as the code is repetitive.
static DateTime timeLine1Busy = new DateTime();
static DateTime timeLine2Busy = new DateTime();
static DateTime timeLine3Busy = new DateTime();
static void CheckProductionLines()
{
// Line 1
int indexOrderLine1 = listOrders.FindIndex(x => x.OrderID == "SubPart1" || x.OrderID == "SubPart2");
if (indexOrderLine1 >= 0 && timeLine1Busy < DateTime.Now)
{
string id = "Line1" + listOrders[indexOrderLine1].OrderID.ToString();// Construct LineID (Line + Part) for Task
int indexTasktime = listLineData.FindIndex(x => x.ID == id); // Get Index LineData where the tasktime is stored
double taskTime = (listLineData[indexTasktime].TaskTime); // Get the Task Time for the current order (min.)
InitializeProduction(indexOrderLine1, "Line1"); // Push the start button to run the task
timeLine1Busy = DateTime.Now.AddSeconds(taskTime); // Set the Line to busy
}
// Line2
int indexOrderLine2 = listOrders.FindIndex(x => x.OrderID == "Part1"); // Pick order Line2
if (indexOrderLine2 >= 0 && timeLine2Busy < DateTime.Now)
{
string id = "Line2" + listOrders[indexOrderLine2].OrderID.ToString(); // Line2 + Order is unique ID in listLineData List
int indexTasktime = listLineData.FindIndex(x => x.ID == id);// Get Index LineData where the tasktime is stored
double taskTime = (listLineData[indexTasktime].TaskTime); // Get the Task Time for the current order (min.)
InitializeProduction(indexOrderLine2, "Line2"); // Push the start button to run the task
timeLine2Busy = DateTime.Now.AddSeconds(taskTime); // Set the Line to busy
}
// Line 3
int indexOrderLine3 = listOrders.FindIndex(x => x.OrderID == "Part1" || x.OrderID == "Part2"); // Pick order
if (indexOrderLine3 >= 0 && timeLine3Busy < DateTime.Now)
{
string id = "Line3" + listOrders[indexOrderLine3].OrderID.ToString(); // Line3 + Order is unique ID in listLineData List
int indexTasktime = listLineData.FindIndex(x => x.ID == id);// Get Index LineData where the tasktime is stored
double taskTime = (listLineData[indexTasktime].TaskTime); // Get the Task Time for the current order (min.)
InitializeProduction(indexOrderLine3, "Line3"); // Push the start button to run the task
timeLine3Busy = DateTime.Now.AddSeconds(taskTime); // Set the Line to busy
}
}
Here I initialize the production:
Remove the order from listOrders.
In the real app many tasks will be processed here.
static void InitializeProduction(int indexOrder, string line)
{
Thread.Sleep(1000); // simulates the initialization code
Debug.WriteLine($"{line} {listOrders[indexOrder].OrderID} Completed ");
listOrders.RemoveAt(indexOrder); //Remove Order from List
}
}
I'm sure you will see a lot of room for improvement. If simple things can or even must be applied, I'm listening :)
Addition after comments at the end
Your problem screams for a producer-consumer pattern. This lesser known pattern has a producer who produces things that a consumer consumes.
The speed in which the producer produces items can be different than the speed in which the consumer can consume. Sometimes the producer produces faster, sometimes the producer produces slower.
In your case, the producer produces "requests to execute a task". The consumer will execute a task one at a time.
For this I use the NuGet package Microsoft.Tpl.Dataflow (published in newer versions as System.Threading.Tasks.Dataflow). It can do a lot more, but in your case, usage is simple.
Normally there are a lot of multi-threading issues you have to think about, like critical sections in the send-receive buffer. TPL will handle them for you.
When the producer is started, it produces requests to do something: to execute and await a Func<Task>. The producer puts these requests in a BufferBlock<Func<Task>>. It produces as fast as possible.
First a factory that creates a Func<Task> with a random execution time. Note that a created function has not been executed yet, so its task is not running!
class ActionFactory
{
private readonly Random rnd = new Random();
public Func<Task> Create()
{
TimeSpan timeExecuteCode = TimeSpan.FromMilliseconds(rnd.Next(1000, 5000));
return () => Task.Delay(timeExecuteCode);
// if you want, you can use Task.Run(() => Thread.Sleep(timeExecuteCode)) instead
}
}
The producer is fairly simple:
class Producer
{
private readonly BufferBlock<Func<Task>> buffer = new BufferBlock<Func<Task>>();
public ActionFactory ActionFactory {get; set;}
public ISourceBlock<Func<Task>> ProducedActions => buffer;
public async Task ProduceAsync()
{
// Create several actions and put them on the buffer
for (int i=0; i<10; ++i)
{
Func<Task> createdAction = this.ActionFactory.Create();
await this.buffer.SendAsync(createdAction);
}
// notify listeners to my output that I won't produce anything anymore
this.buffer.Complete();
}
}
If you want, you can optimize this: while SendAsync is running, you could already create the next action, then await the SendAsync task before sending that next action. For simplicity I didn't do this.
The Consumer needs an input that provides Func<Task> objects. It reads this input, executes the function and waits until the returned task is completed before fetching the next input from the buffer.
class Consumer
{
public ISourceBlock<Func<Task>> ActionsToConsume {get; set;}
public async Task ConsumeAsync()
{
// wait until the producer has produced something,
// or says that nothing will be produced anymore
while (await this.ActionsToConsume.OutputAvailableAsync())
{
// the Producer has produced something; fetch it
Func<Task> actionToExecute = await this.ActionsToConsume.ReceiveAsync();
// execute the action, and await the returned Task
await actionToExecute();
// wait until Producer produces a new action.
}
// if here: producer notifies completion: nothing is expected anymore
}
}
Put it all together:
ActionFactory factory = new ActionFactory();
Producer producer = new Producer
{
ActionFactory = factory,
};
Consumer consumer = new Consumer
{
ActionsToConsume = producer.ProducedActions,
};
// Start producing and consuming and wait until everything is ready
var taskProduce = producer.ProduceAsync();
var taskConsume = consumer.ConsumeAsync();
// now the producer is happily producing actions and sending them to the consumer.
// the consumer is waiting for actions to consume
// await until both tasks are finished:
await Task.WhenAll(new Task[] {taskProduce, taskConsume});
Addition after comment: do it with less code
The above seems a lot of work. I created separate classes, so you could see who is responsible for what. If you want, you can do it all with one buffer and two methods: a method that produces and a method that consumes:
private readonly BufferBlock<Func<Task>> buffer = new BufferBlock<Func<Task>>();
public async Task ProduceTasksAsync()
{
// Create several actions and put them on the buffer
for (int i=0; i<10; ++i)
{
Func<Task> createdAction = ...
await this.buffer.SendAsync(createdAction);
}
// producer will not produce anything anymore:
buffer.Complete();
}
async Task ConsumeAsync()
{
while (await this.buffer.OutputAvailableAsync())
{
// the producer has produced something; fetch it, execute it
Func<Task> actionToExecute = await this.buffer.ReceiveAsync();
await actionToExecute();
}
}
Usage:
async Task ProduceAndConsumeAsync()
{
var taskProduce = ProduceTasksAsync();
var taskConsume = ConsumeAsync();
await Task.WhenAll(new Task[] {taskProduce, taskConsume});
}
Your problem is that both timers run on the same UI event loop. That means that while timer1 is handling its event, no other events are executed on that thread. The solution is to use the async/await pattern, which lets the waiting happen in the background. In your case you can do something like this:
private async void myEventTimer1(object source, EventArgs e)
{
myTimer1.Stop();
Console.WriteLine("Line1 Processing ");
await MyTask();
Console.Write(" Line1 Completed");
myTimer1.Interval = 5000; // this leadtime is variable
myTimer2.Start();
myTimer1.Enabled = true;
}
private async void myEventTimer2(object source, EventArgs e)
{
myTimer2.Stop();
Console.WriteLine("Line2 Processing ");
await MyTask();
Console.Write(" Line2 Completed");
myTimer2.Interval = 5000; // this leadtime is variable
myTimer2.Enabled = true;
}
private async Task MyTask()
{
Random rnd = new Random();
int tleadtime = rnd.Next(1000, 5000);
await Task.Delay(tleadtime);
}
This runs MyTask (really just the Delay part) in the background, but continues in the foreground once it is complete.
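One thing to watch out for: once the handlers are async, the UI thread is free while a handler is awaiting, so a Tick from the other timer may start its own MyTask in the meantime. If the lines must never overlap, a SemaphoreSlim with a count of one can act as a gate. The sketch below is only an illustration; SerialGate and RunAsync are hypothetical names, not part of any framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class SerialGate
{
    // At most one caller can hold the gate at any time.
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    // Waits (without blocking the thread) until no other work is running,
    // then runs the given work and releases the gate even if the work throws.
    public async Task RunAsync(Func<Task> work)
    {
        await gate.WaitAsync();
        try
        {
            await work();
        }
        finally
        {
            gate.Release();
        }
    }
}
```

In the handlers you would then write something like await gate.RunAsync(MyTask); instead of await MyTask();, sharing one SerialGate field between both handlers. Requests that arrive while the gate is taken wait their turn, although I would not rely on the exact order in which waiting requests are admitted.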
Now to be clear, this isn't technically an answer to your question as you have asked it, but I believe it will produce the underlying behavior that you are asking for (in comments), and I believe in helping people.
We have three classes, Order, Line and Factory, written in a console application as an example.
Order is straightforward: it has two properties, an identifying name and a lead time in seconds.
public class Order
{
public string OrderName { get; set; }
public int LeadTimeSeconds { get; set; }
public Order(string orderName, int leadTimeSeconds)
{
OrderName = orderName;
LeadTimeSeconds = leadTimeSeconds;
}
}
Line inherits from BackgroundWorker (see the MSDN documentation for BackgroundWorker). I won't go into detail here as there are many posts on the subject, but you attach a handler to the DoWork event, which is invoked asynchronously. A BackgroundWorker lets you do something continuously (or for prolonged periods) without blocking, and it exposes a CancelAsync() method to stop the work. Line also holds a reference to your Queue<Order>. A Queue<T> is a nice collection as it allows you to easily Dequeue() the next item in line. Within the constructor, Line calls RunWorkerAsync(), which raises the DoWork event and in turn invokes the handler Line_ProcessOrder.
public class Line: BackgroundWorker
{
public string LineName { get; set; }
public Queue<Order> OrderQueue { get; set; }
public Line (string lineName, Queue<Order> orderQueue)
{
LineName = lineName;
OrderQueue = orderQueue;
DoWork += Line_ProcessOrder;
RunWorkerAsync();
}
private void Line_ProcessOrder(object sender, DoWorkEventArgs e)
{
Order targetOrder;
BackgroundWorker worker = sender as BackgroundWorker;
while (true)
{
if (worker.CancellationPending == true)
{
e.Cancel = true;
break;
}
else
{
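targetOrder = null;
// Queue<T> is not thread-safe and is shared by all lines, so guard the check and the dequeue together
lock (OrderQueue)
{
if (OrderQueue.Count > 0)
{
targetOrder = OrderQueue.Dequeue();
}
}
if (targetOrder != null)
{
Console.WriteLine($"{LineName} is processing {targetOrder.OrderName}");
Thread.Sleep(targetOrder.LeadTimeSeconds * 1000);
Console.WriteLine($"{LineName} finished {targetOrder.OrderName}");
}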
}
}
}
}
Finally, Factory brings this all together. We can have any number of Lines sharing a Queue<Order>, created from any IEnumerable<Order> that you may otherwise have had. Note that the Lines start working immediately on construction. You may wish to add Start() and Stop() methods, for example.
public class Factory
{
static void Main(string[] args)
{
List<Order> Orders = new List<Order>()
{
new Order("Order1",10),
new Order("Order2",8),
new Order("Order3",5),
new Order("Order4",15)
};
Queue<Order> OrderQueue = new Queue<Order>(Orders);
Line Line1 = new Line("Line1", OrderQueue);
Line Line2 = new Line("Line2", OrderQueue);
while (true) { }
}
}
This may not be exactly what you needed, but I hope it can take you away from the timer approach towards asynchronous programming.

Parallel.For & getting the next free thread/worker

I have the following parallelised code. What I'm not sure of is how to set the workerIndex variable:
// Initializing Worker takes time & must be done before the actual work
Worker[] w = new Worker[3]; // I would like to limit the parallelism to 3
for (int i = 0; i < 3; i++)
w[i] = new Worker();
...
element[] elements = GetArrayOfElements(); // elements.Length > 3
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 3;
Parallel.For(0, elements.Length, options, i =>
{
element e = elements[i];
w[workerIndex].Work(e); // how to set "workerIndex"?
});
Is there some mechanism that says which worker thread id is free next?
How about you just new Worker() in the Parallel.For loop, and add them to w (you would need to change w to a concurrent list).
Possibly (if it is not too complex) move the contents of the .Work(e) method to the body of the loop, eliminating the need for the Worker class.
Edit:
If you change your Worker array to an IEnumerable (e.g. a List<Worker>) you can use .AsParallel() to make it parallel. Then you can use .ForAll(worker => worker.Work()) to do the work in parallel. This would require you to pass the element into the worker through the constructor.
It looks like the object pool pattern is the best fit for you. You can write the code like this:
const int Limit = 3;
using (var pool = new QueueObjectPool<Worker>(a => new Worker(), Limit)) {
element[] elements = GetArrayOfElements();
var options = new ParallelOptions { MaxDegreeOfParallelism = Limit };
Parallel.For(0, elements.Length, options, i => {
element e = elements[i];
// Acquire outside the try block, so a failed Acquire never releases a worker that was not taken
Worker worker = pool.Acquire();
try {
worker.Work(e);
} finally {
pool.Release(worker);
}
});
}
At the start every element waits for an available worker, and only three workers are initialized from the beginning. Here is a simplified queue-based object pool implementation:
public sealed class QueueObjectPool<TObject> : IDisposable {
private readonly Queue<TObject> _poolQueue;
private readonly Func<QueueObjectPool<TObject>, TObject> _factory;
private readonly int _capacity;
private readonly SemaphoreSlim _throttler;
public QueueObjectPool(Func<QueueObjectPool<TObject>, TObject> factory, int capacity) {
_factory = factory;
_capacity = capacity;
_throttler = new SemaphoreSlim(initialCount: capacity, maxCount: capacity);
_poolQueue = CreatePoolQueue();
}
public TObject Acquire() {
_throttler.Wait();
lock (_poolQueue) {
return _poolQueue.Dequeue();
}
}
public void Release(TObject poolObject) {
lock (_poolQueue) {
_poolQueue.Enqueue(poolObject);
}
_throttler.Release();
}
private Queue<TObject> CreatePoolQueue() {
var queue = new Queue<TObject>(_capacity);
int itemsLeft = _capacity;
while (itemsLeft > 0) {
TObject queueObject = _factory(this);
queue.Enqueue(queueObject);
itemsLeft -= 1;
}
return queue;
}
public void Dispose() {
// The using block calls Dispose, so it must not throw; release the semaphore here
_throttler.Dispose();
}
}
This code is for demo purposes. In real work it is better to use async/await based logic which is easily achieved using SemaphoreSlim.WaitAsync and you can replace Parallel.For with simple loop.
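As a rough idea of what the async/await variant could look like, here is a minimal sketch built on SemaphoreSlim.WaitAsync. AsyncObjectPool and UseAsync are hypothetical names of my own, and the sketch assumes the work itself can be expressed as a Task-returning function.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class AsyncObjectPool<T> : IDisposable
{
    private readonly ConcurrentQueue<T> items = new ConcurrentQueue<T>();
    private readonly SemaphoreSlim available;

    // Creates all pooled objects up front, like the synchronous pool above.
    public AsyncObjectPool(Func<T> factory, int capacity)
    {
        for (int i = 0; i < capacity; i++)
            items.Enqueue(factory());
        available = new SemaphoreSlim(capacity, capacity);
    }

    // Borrows an object, runs the work with it and always returns it to the pool.
    public async Task<TResult> UseAsync<TResult>(Func<T, Task<TResult>> work)
    {
        await available.WaitAsync(); // waits without blocking a thread
        // The semaphore count never exceeds the number of queued items,
        // so an item is available once the wait has succeeded.
        items.TryDequeue(out T item);
        try
        {
            return await work(item);
        }
        finally
        {
            items.Enqueue(item);   // put the object back first,
            available.Release();   // then let the next waiter in
        }
    }

    public void Dispose() => available.Dispose();
}
```

With this in place, Parallel.For can be replaced by starting one UseAsync call per element and awaiting Task.WhenAll over the resulting tasks; the semaphore then limits how many elements are processed at the same time.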

Use Task.Run instead of Delegate.BeginInvoke

I have recently upgraded my projects to ASP.NET 4.5 and I have been waiting a long time to use 4.5's asynchronous capabilities. After reading the documentation I'm not sure whether I can improve my code at all.
I want to execute a task asynchronously and then forget about it. The way that I'm currently doing this is by creating delegates and then using BeginInvoke.
Here's one of the filters in my project, which creates an audit in our database every time a user accesses a resource that must be audited:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
var invoker = new MethodInvoker(delegate
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
});
invoker.BeginInvoke(StopAsynchronousMethod, invoker);
base.OnActionExecuting(filterContext);
}
But in order to finish this asynchronous task, I need to always define a callback, which looks like this:
public void StopAsynchronousMethod(IAsyncResult result)
{
var state = (MethodInvoker)result.AsyncState;
try
{
state.EndInvoke(result);
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
}
I would rather not use the callback at all due to the fact that I do not need a result from the task that I am invoking asynchronously.
How can I improve this code with Task.Run() (or async and await)?
If I understood your requirements correctly, you want to kick off a task and then forget about it. When the task completes, and if an exception occurred, you want to log it.
I'd use Task.Run to create a task, followed by ContinueWith to attach a continuation task. This continuation task will log any exception that was thrown from the parent task. Also, use TaskContinuationOptions.OnlyOnFaulted to make sure the continuation only runs if an exception occurred.
Task.Run(() => {
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}).ContinueWith(task => {
task.Exception.Handle(ex => {
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(ex, username);
return true; // Handle expects a Func<Exception, bool>; true marks the exception as handled
});
}, TaskContinuationOptions.OnlyOnFaulted);
As a side-note, background tasks and fire-and-forget scenarios in ASP.NET are highly discouraged. See The Dangers of Implementing Recurring Background Tasks In ASP.NET
It may sound a bit out of scope, but if you just want to forget about it after you launch it, why not use ThreadPool directly?
Something like:
ThreadPool.QueueUserWorkItem(
x =>
{
try
{
// Do something
...
}
catch (Exception e)
{
// Log something
...
}
});
I had to do some performance benchmarking for different async call methods and I found that (not surprisingly) ThreadPool works much better, but also that, actually, BeginInvoke is not that bad (I am on .NET 4.5). That's what I found out with the code at the end of the post. I did not find something like this online, so I took the time to check it myself. Each call is not exactly equal, but it is more or less functionally equivalent in terms of what it does:
ThreadPool: 70.80ms
Task: 90.88ms
BeginInvoke: 121.88ms
Thread: 4657.52ms
public class Program
{
public delegate void ThisDoesSomething();
// Perform a very simple operation to see the overhead of
// different async calls types.
public static void Main(string[] args)
{
const int repetitions = 25;
const int calls = 1000;
var results = new List<Tuple<string, double>>();
Console.WriteLine(
"{0} parallel calls, {1} repetitions for better statistics\n",
calls,
repetitions);
// Threads
Console.Write("Running Threads");
results.Add(new Tuple<string, double>("Threads", RunOnThreads(repetitions, calls)));
Console.WriteLine();
// BeginInvoke
Console.Write("Running BeginInvoke");
results.Add(new Tuple<string, double>("BeginInvoke", RunOnBeginInvoke(repetitions, calls)));
Console.WriteLine();
// Tasks
Console.Write("Running Tasks");
results.Add(new Tuple<string, double>("Tasks", RunOnTasks(repetitions, calls)));
Console.WriteLine();
// Thread Pool
Console.Write("Running Thread pool");
results.Add(new Tuple<string, double>("ThreadPool", RunOnThreadPool(repetitions, calls)));
Console.WriteLine();
Console.WriteLine();
// Show results
results = results.OrderBy(rs => rs.Item2).ToList();
foreach (var result in results)
{
Console.WriteLine(
"{0}: Done in {1}ms avg",
result.Item1,
(result.Item2 / repetitions).ToString("0.00"));
}
Console.WriteLine("Press a key to exit");
Console.ReadKey();
}
/// <summary>
/// The do stuff.
/// </summary>
public static void DoStuff()
{
Console.Write("*");
}
public static double RunOnThreads(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var stopwatch = new Stopwatch();
var resetEvent = new ManualResetEvent(false);
var threadList = new List<Thread>();
for (var i = 0; i < calls; i++)
{
threadList.Add(new Thread(() =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
}));
}
stopwatch.Start();
foreach (var thread in threadList)
{
thread.Start();
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnThreadPool(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var toProcess = calls;
var resetEvent = new ManualResetEvent(false);
var stopwatch = new Stopwatch();
var list = new List<int>();
for (var i = 0; i < calls; i++)
{
list.Add(i);
}
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
ThreadPool.QueueUserWorkItem(
x =>
{
// Do something
DoStuff();
// Safely decrement the counter
if (Interlocked.Decrement(ref toProcess) == 0)
{
resetEvent.Set();
}
},
list[i]);
}
resetEvent.WaitOne();
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnBeginInvoke(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var beginInvokeStopwatch = new Stopwatch();
var delegateList = new List<ThisDoesSomething>();
var resultsList = new List<IAsyncResult>();
for (var i = 0; i < calls; i++)
{
delegateList.Add(DoStuff);
}
beginInvokeStopwatch.Start();
foreach (var delegateToCall in delegateList)
{
resultsList.Add(delegateToCall.BeginInvoke(null, null));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(rs => !rs.IsCompleted))
{
Thread.Sleep(10);
}
beginInvokeStopwatch.Stop();
totalMs += beginInvokeStopwatch.ElapsedMilliseconds;
}
return totalMs;
}
public static double RunOnTasks(int repetitions, int calls)
{
var totalMs = 0.0;
for (var j = 0; j < repetitions; j++)
{
Console.Write(".");
var resultsList = new List<Task>();
var stopwatch = new Stopwatch();
stopwatch.Start();
for (var i = 0; i < calls; i++)
{
resultsList.Add(Task.Factory.StartNew(DoStuff));
}
// We lose a bit of accuracy, but if the loop is big enough,
// it should not really matter
while (resultsList.Any(task => !task.IsCompleted))
{
Thread.Sleep(10);
}
stopwatch.Stop();
totalMs += stopwatch.ElapsedMilliseconds;
}
return totalMs;
}
}
Here's one of the filters in my project, which creates an audit in our database every time a user accesses a resource that must be audited
Auditing is certainly not something I would call "fire and forget". Remember, on ASP.NET, "fire and forget" means "I don't care whether this code actually executes or not". So, if your desired semantics are that audits may occasionally be missing, then (and only then) you can use fire and forget for your audits.
If you want to ensure your audits are all correct, then either wait for the audit save to complete before sending the response, or queue the audit information to reliable storage (e.g., Azure queue or MSMQ) and have an independent backend (e.g., Azure worker role or Win32 service) process the audits in that queue.
But if you want to live dangerously (accepting that occasionally audits may be missing), you can mitigate the problems by registering the work with the ASP.NET runtime. Using the BackgroundTaskManager from my blog:
public override void OnActionExecuting(ActionExecutingContext filterContext)
{
var request = filterContext.HttpContext.Request;
var id = WebSecurity.CurrentUserId;
BackgroundTaskManager.Run(() =>
{
try
{
var audit = new Audit
{
Id = Guid.NewGuid(),
IPAddress = request.UserHostAddress,
UserId = id,
Resource = request.RawUrl,
Timestamp = DateTime.UtcNow
};
var database = (new NinjectBinder()).Kernel.Get<IDatabaseWorker>();
database.Audits.InsertOrUpdate(audit);
database.Save();
}
catch (Exception e)
{
var username = WebSecurity.CurrentUserName;
Debugging.DispatchExceptionEmail(e, username);
}
});
base.OnActionExecuting(filterContext);
}

What is the best scenario for one fast producer multiple slow consumers?

I'm looking for the best scenario to implement one producer multiple consumer multithreaded application.
Currently I'm using one queue for shared buffer but it's much slower than the case of one producer one consumer.
I'm planning to do it like this:
Queue<item>[] buffs = new Queue<item>[N];
object[] _locks = new object[N];
static void Produce()
{
int curIndex = 0;
while(true)
{
// Produce item;
lock(_locks[curIndex])
{
buffs[curIndex].Enqueue(curItem);
Monitor.Pulse(_locks[curIndex]);
}
curIndex = (curIndex+1)%N;
}
}
static void Consume(int myIndex)
{
item curItem;
while(true)
{
lock(_locks[myIndex])
{
while(buffs[myIndex].Count == 0)
Monitor.Wait(_locks[myIndex]);
curItem = buffs[myIndex].Dequeue();
}
// Consume item;
}
}
static void main()
{
int N = 100;
Thread[] consumers = new Thread[N];
for(int i = 0; i < N; i++)
{
consumers[i] = new Thread(Consume);
consumers[i].Start(i);
}
Thread producer = new Thread(Produce);
producer.Start();
}
Use a BlockingCollection
BlockingCollection<item> _buffer = new BlockingCollection<item>();
static void Produce()
{
while(true)
{
// Produce item;
_buffer.Add(curItem);
}
// eventually stop producing
_buffer.CompleteAdding();
}
static void Consume(int myIndex)
{
foreach (var curItem in _buffer.GetConsumingEnumerable())
{
// Consume item;
}
}
static void main()
{
int N = 100;
Thread[] consumers = new Thread[N];
for(int i = 0; i < N; i++)
{
consumers[i] = new Thread(Consume);
consumers[i].Start(i);
}
Thread producer = new Thread(Produce);
producer.Start();
}
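For reference, here is a self-contained sketch of the same idea that compiles and terminates. The class name BlockingCollectionDemo, the Run helper, the bounded capacity of 100 and the Thread.Sleep standing in for slow work are all illustrative assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class BlockingCollectionDemo
{
    // Runs one producer and consumerCount consumers.
    // Returns the total number of items consumed.
    public static int Run(int itemCount, int consumerCount)
    {
        // Bounded capacity (100 is an arbitrary choice): Add blocks while the
        // buffer is full, so a fast producer cannot run ahead without limit.
        using (var buffer = new BlockingCollection<int>(100))
        {
            int consumed = 0;
            var consumers = new Thread[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = new Thread(() =>
                {
                    // Blocks while the buffer is empty; the loop ends once
                    // CompleteAdding was called and the buffer is drained.
                    foreach (int item in buffer.GetConsumingEnumerable())
                    {
                        Thread.Sleep(1); // stand-in for slow work
                        Interlocked.Increment(ref consumed);
                    }
                });
                consumers[i].Start();
            }

            var producer = new Thread(() =>
            {
                for (int i = 0; i < itemCount; i++)
                    buffer.Add(i);
                buffer.CompleteAdding(); // lets the consumers finish
            });
            producer.Start();

            producer.Join();
            foreach (Thread consumer in consumers)
                consumer.Join();
            return consumed;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run(200, 4)); // prints 200
    }
}
```

All consumers share the single collection, so a slow consumer simply takes fewer items; there is no need for one queue per consumer.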
If you don't want to specify the number of threads up front, you can use Parallel.ForEach instead.
static void Consume(item curItem)
{
// consume item
}
void Main()
{
Thread producer = new Thread(Produce);
producer.Start();
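// Note: GetConsumingPartitioner is not a built-in BlockingCollection method; it is assumed
// here to be an extension method such as the one from the ParallelExtensionsExtras samples.
Parallel.ForEach(_buffer.GetConsumingPartitioner(), Consume);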
}
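If no such extension method is at hand, a similar effect can probably be had with built-in types only. The sketch below assumes .NET 4.5 or later, where Partitioner.Create accepts EnumerablePartitionerOptions.NoBuffering; the class name ParallelConsumeDemo and the Run helper are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelConsumeDemo
{
    // Produces itemCount items on a background task and consumes them
    // with Parallel.ForEach. Returns the number of items consumed.
    public static int Run(int itemCount)
    {
        using (var buffer = new BlockingCollection<int>())
        {
            int consumed = 0;

            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < itemCount; i++)
                    buffer.Add(i);
                buffer.CompleteAdding(); // ends the consuming enumerable
            });

            // NoBuffering hands items to workers one at a time instead of in
            // chunks, so an item is not held back while a chunk fills up.
            var partitioner = Partitioner.Create(
                buffer.GetConsumingEnumerable(),
                EnumerablePartitionerOptions.NoBuffering);

            Parallel.ForEach(partitioner, item =>
            {
                Interlocked.Increment(ref consumed); // stand-in for real work
            });

            producer.Wait();
            return consumed;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run(100)); // prints 100
    }
}
```

Parallel.ForEach returns only after CompleteAdding has been called and the collection is empty, so the calling thread is blocked for the whole run.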
Using more threads won't help. It may even reduce performance. I suggest you try to use ThreadPool where every work item is one item created by the producer. However, that doesn't guarantee the produced items to be consumed in the order they were produced.
Another way could be to reduce the number of consumers (to 4, for example) and modify the way they work as follows:
The producer adds the new work to the queue. There's only one global queue for all worker threads. It then sets a flag to indicate there is new work like this:
ManualResetEvent workPresent = new ManualResetEvent(false);
Queue<item> workQueue = new Queue<item>();
static void Produce()
{
while(true)
{
// Produce item;
lock(workQueue)
{
workQueue.Enqueue(newItem);
workPresent.Set();
}
}
}
The consumers wait for work to be added to the queue. Only one consumer will get to do its job. It then takes all the work from the queue and resets the flag. The producer will not be able to add new work until that is done.
static void Consume()
{
while(true)
{
if (workPresent.WaitOne())
{
workPresent.Reset();
Queue<item> localWorkQueue = new Queue<item>();
lock(workQueue)
{
while (workQueue.Count > 0)
localWorkQueue.Enqueue(workQueue.Dequeue());
}
// Handle items in local work queue
...
}
}
}
The outcome of this, however, is a bit unpredictable. It could be that one thread does all the work and the others do nothing.
I don't see why you have to use multiple queues. Just reduce the amount of locking. Here is a sample where you can have a large number of consumers and they all wait for new work.
public class MyWorkGenerator
{
ConcurrentQueue<object> _queuedItems = new ConcurrentQueue<object>();
private object _lock = new object();
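public void Produce()
{
while (true)
{
_queuedItems.Enqueue(new object());
// Monitor.Pulse throws unless the calling thread owns the lock
lock (_lock)
{
Monitor.Pulse(_lock);
}
}
}
public object Consume(TimeSpan maxWaitTime)
{
object workItem;
// Monitor.Wait also requires the lock; checking the queue before waiting
// avoids missing a pulse that was sent while no consumer was waiting
lock (_lock)
{
while (!_queuedItems.TryDequeue(out workItem))
{
if (!Monitor.Wait(_lock, maxWaitTime))
return null;
}
}
return workItem;
}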
}
Do note that Pulse() will only trigger one consumer at a time.
Example usage:
static void main()
{
var generator = new MyWorkGenerator();
var consumers = new Thread[20];
for (int i = 0; i < consumers.Length; i++)
{
consumers[i] = new Thread(DoWork);
consumers[i].Start(generator);
}
generator.Produce();
}
public static void DoWork(object state)
{
var generator = (MyWorkGenerator) state;
var workItem = generator.Consume(TimeSpan.FromHours(1));
while (workItem != null)
{
// do work
workItem = generator.Consume(TimeSpan.FromHours(1));
}
}
Note that the actual queue is hidden in the producer, as it is, in my opinion, an implementation detail. The consumers don't really have to know how the work items are generated.

waitall for multiple handles on sta thread is not supported [duplicate]

Hi all, I get this exception when I run my app:
WaitAll for multiple handles on a STA thread is not supported
I work on .NET 3.5, so I cannot use Task.
This is the code:
private void ThreadPopFunction(ContactList SelectedContactList, List<User> AllSelectedUsers)
{
int NodeCount = 0;
AllSelectedUsers.EachParallel(user =>
{
NodeCount++;
if (user != null)
{
if (user.OCSEnable)
{
string messageExciption = string.Empty;
if (!string.IsNullOrEmpty(user.SipURI))
{
//Lync.Lync.Lync lync = new Lync.Lync.Lync(AdObjects.Pools);
List<Pool> myPools = AdObjects.Pools;
if (new Lync.Lync.Lync(myPools).Populate(user, SelectedContactList, out messageExciption))
{
}
}
}
}
});
}
And this is the extension method I use for multithreading:
public static void EachParallel<T>(this IEnumerable<T> list, Action<T> action)
{
// enumerate the list so it can't change during execution
// TODO: why is this happening?
list = list.ToArray();
var count = list.Count();
if (count == 0)
{
return;
}
else if (count == 1)
{
// if there's only one element, just execute it
action(list.First());
}
else
{
// Launch each method in its own thread
const int MaxHandles = 64;
for (var offset = 0; offset <= count/MaxHandles; offset++)
{
// break up the list into 64-item chunks because of a limitation in WaitHandle
var chunk = list.Skip(offset*MaxHandles).Take(MaxHandles);
// Initialize the reset events to keep track of completed threads
var resetEvents = new ManualResetEvent[chunk.Count()];
// spawn a thread for each item in the chunk
int i = 0;
foreach (var item in chunk)
{
resetEvents[i] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex =
(int) ((object[]) data)[0];
// Execute the method and pass in the enumerated item
action((T) ((object[]) data)[1]);
// Tell the calling thread that we're done
resetEvents[methodIndex].Set();
}), new object[] {i, item});
i++;
}
// Wait for all threads to execute
WaitHandle.WaitAll(resetEvents);
}
}
}
If you can help me, I'd appreciate your support.
OK, as you're using .Net 3.5, you can't use the TPL introduced with .Net 4.0.
STA thread or not, in your case there is a way more simple/efficient approach than WaitAll. You could simply have a counter and a unique WaitHandle. Here's some code (can't test it right now, but it should be fine):
// No MaxHandle limitation ;)
for (var offset = 0; offset <= count; offset++)
{
// Initialize the reset event
var resetEvent = new ManualResetEvent();
// Queue action in thread pool for each item in the list
int counter = count;
foreach (var item in list)
{
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex =
(int) ((object[]) data)[0];
// Execute the method and pass in the enumerated item
action((T) ((object[]) data)[1]);
// Decrements counter atomically
Interlocked.Decrement(ref counter);
// If we're at 0, then last action was executed
if (Interlocked.Read(ref counter) == 0)
{
resetEvent.Set();
}
}), new object[] {i, item});
}
// Wait for the single WaitHandle
// which is only set when the last action executed
resetEvent.WaitOne();
}
Also FYI, ThreadPool.QueueUserWorkItem doesn't spawn a thread each time it's called (I'm saying that because of the comment "spawn a thread for each item in the chunk"). It uses a pool of threads, so it mostly reuses existing threads.
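The reuse of pool threads is easy to observe. The following sketch queues many short work items and counts how many distinct managed threads ran them; the class and method names are invented for this illustration, and the exact count will vary from machine to machine and run to run.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadPoolReuseDemo
{
    // Queues workItems short callbacks and returns the number of
    // distinct pool threads that executed them.
    public static int CountDistinctThreads(int workItems)
    {
        if (workItems <= 0)
            return 0;

        var threadIds = new HashSet<int>();
        var done = new ManualResetEvent(false);
        int remaining = workItems;

        for (int i = 0; i < workItems; i++)
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                // HashSet is not thread-safe, so guard it with a lock
                lock (threadIds)
                {
                    threadIds.Add(Thread.CurrentThread.ManagedThreadId);
                }
                // The callback that brings the counter to 0 was the last one
                if (Interlocked.Decrement(ref remaining) == 0)
                    done.Set();
            });
        }

        done.WaitOne();
        lock (threadIds)
        {
            return threadIds.Count;
        }
    }

    public static void Main()
    {
        int distinct = CountDistinctThreads(1000);
        Console.WriteLine("1000 work items ran on {0} pool thread(s)", distinct);
    }
}
```

The reported number is typically far below the number of work items, which is the point made above: queuing a work item is not the same as creating a thread.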
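For those who, like me, need to use these examples:
ken2k's solution is great and it works, but it needs a few corrections (he said he didn't test it): the ManualResetEvent constructor needs an initial state, the counter has to be a long for Interlocked.Read, and i has to be declared. Here is ken2k's working example (worked for me):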
// No MaxHandles limitation ;)
// Note: no outer chunking loop is needed any more; wrapping this code in
// such a loop would execute every action once per iteration.
// Initialize the reset event
var resetEvent = new ManualResetEvent(false);
// Number of actions that have not finished yet
long counter = count;
int i = 0;
// Queue action in thread pool for each item in the list
foreach (var item in list)
{
ThreadPool.QueueUserWorkItem(new WaitCallback((object data) =>
{
int methodIndex =
(int) ((object[]) data)[0];
// Execute the method and pass in the enumerated item
action((T) ((object[]) data)[1]);
// Decrements counter atomically
Interlocked.Decrement(ref counter);
// If we're at 0, then last action was executed
if (Interlocked.Read(ref counter) == 0)
{
resetEvent.Set();
}
}), new object[] {i, item});
i++;
}
// Wait for the single WaitHandle
// which is only set when the last action executed
resetEvent.WaitOne();
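Put together as a complete extension method, the counter-plus-single-handle idea could look roughly like the sketch below. It is an illustration rather than the exact code from either answer: the names ParallelExtensions, pending and allDone are made up, it uses the return value of Interlocked.Decrement instead of a separate read, and it passes the item through a captured variable rather than an object array. Since it waits on one handle with WaitOne, it should not run into the WaitAll restriction on STA threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ParallelExtensions
{
    // Runs action for every item on the thread pool and blocks
    // until all of them have finished.
    public static void EachParallel<T>(this IEnumerable<T> source, Action<T> action)
    {
        // Snapshot the sequence so it cannot change during execution
        T[] items = source.ToArray();
        if (items.Length == 0)
            return;

        int pending = items.Length;
        using (var allDone = new ManualResetEvent(false))
        {
            foreach (T item in items)
            {
                T captured = item; // one copy per work item
                ThreadPool.QueueUserWorkItem(delegate
                {
                    try
                    {
                        action(captured);
                    }
                    finally
                    {
                        // Decrement returns the new value, so exactly one
                        // work item observes 0 and signals the waiter
                        if (Interlocked.Decrement(ref pending) == 0)
                            allDone.Set();
                    }
                });
            }
            // A single handle, so no 64-handle limit applies
            allDone.WaitOne();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int sum = 0;
        Enumerable.Range(1, 100).EachParallel(n => Interlocked.Add(ref sum, n));
        Console.WriteLine(sum); // prints 5050
    }
}
```

Exceptions thrown by action are not handled in this sketch; on a thread pool thread an unhandled exception normally terminates the process, so real code would want to catch and collect them.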
Actually, there is a way to use (at least a good part of) the TPL in .NET 3.5: a backport was done for the Rx project.
You can find it here: http://www.nuget.org/packages/TaskParallelLibrary
Maybe this will help.
