I am using two threads in a C# application that access the same BlockingCollection. This works fine, but I want to retrieve the first value twice so the two threads retrieve the same value *.
After a few seconds I want to poll the currentIndex of both threads and delete every value < index. For example, if the lowest currentIndex of any thread is 5, the application deletes the items at index 0-5 in the queue. Another solution would be to delete a value from the queue once all threads have processed it.
How can I accomplish this? I think I need another type of buffer..?
Thank you in advance!
*If .Take() is called by thread1, the item is removed in the collection and thread2 can't get the same item again.
Update:
I want to store data in a buffer, so for example thread1 saves the data to a HDD and thread2 analyzes the (same) data (concurrent).
Use a producer-consumer to add Value1 to two separate ConcurrentQueues. Have the threads dequeue then process them from their own queue.
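A minimal sketch of that idea, under some assumptions: it uses two BlockingCollection instances (which wrap a ConcurrentQueue by default) rather than bare ConcurrentQueues so the consumers can block instead of polling, and the names FanOutDemo, saveQueue and analyzeQueue are made up for illustration. The "save" and "analyze" steps are stand-ins that only record what they received.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: copies every produced value into one queue per consumer,
// so each consumer sees the complete sequence independently.
public static class FanOutDemo
{
    public static (List<int> saved, List<int> analyzed) Run(int itemCount)
    {
        var saveQueue = new BlockingCollection<int>();
        var analyzeQueue = new BlockingCollection<int>();
        var saved = new List<int>();
        var analyzed = new List<int>();

        // Each consumer drains only its own queue until CompleteAdding() is called.
        Task saver = Task.Run(() =>
        {
            foreach (int value in saveQueue.GetConsumingEnumerable())
                saved.Add(value);      // stand-in for "write to disk"
        });
        Task analyzer = Task.Run(() =>
        {
            foreach (int value in analyzeQueue.GetConsumingEnumerable())
                analyzed.Add(value);   // stand-in for "analyze"
        });

        // The producer adds the same value to both queues.
        for (int i = 0; i < itemCount; i++)
        {
            saveQueue.Add(i);
            analyzeQueue.Add(i);
        }
        saveQueue.CompleteAdding();
        analyzeQueue.CompleteAdding();

        Task.WaitAll(saver, analyzer);
        return (saved, analyzed);
    }
}
```

Because each value is removed from a queue only by the one consumer that owns that queue, there is no need to track indexes or to decide when an item has been seen by everyone.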
Edit 7/4/14:
Here's a hazy, hacky, and half-thought-out solution: Create a custom object that is buffered. It could include space for both the information you're trying to buffer in thread 1 and the analysis results in thread 2.
Add the objects to a buffer in thread 1 and a BlockingCollection. Use thread 2 to analyse the results and update the objects with the results. The blocking collection shouldn't get too big, and since it's only dealing with references shouldn't hit your memory. This assumes that you won't be modifying the info in the buffer at the same time on both threads.
Another, also half thought out solution is to feed the info into the buffer and a blocking collection simultaneously. Analyse the data from the BlockingCollection, feed it into an output collection and match them up with the buffer again. This option can handle concurrent modification if you do it right, but is probably more work.
I think option one is better. As I've pointed out, these are only half-formed, but they might help you find something that suits your specific needs. Good luck.
I would suggest rethinking your design.
When you have a list of items that have to be processed, give each thread its own queue of items to work on.
With such a solution it wouldn't be a problem to give two or more threads the same value to process.
Something like this (not tested, just typed):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Collections.Concurrent;
namespace ConsoleApplication2
{
class Item
{
private int _value;
public int Value
{
get
{
return _value;
}
}
// all you need
public Item(int i)
{
_value = i;
}
}
class WorkerParameters
{
public ConcurrentQueue<Item> Items = new ConcurrentQueue<Item>();
}
class Worker
{
private Thread _thread;
private WorkerParameters _params = new WorkerParameters();
public void EnqueueItem(Item item)
{
_params.Items.Enqueue(item);
}
public void Start()
{
_thread = new Thread(new ParameterizedThreadStart(ThreadProc));
_thread.Start(_params); // pass the queue holder; without an argument ThreadProc receives null
}
public void Stop()
{
// build something to stop your thread
}
public static void ThreadProc(object threadParams)
{
WorkerParameters p = (WorkerParameters)threadParams;
while (true)
{
while (p.Items.Count > 0)
{
Item item = null;
p.Items.TryDequeue(out item);
if (item != null)
{
// do something
}
}
System.Threading.Thread.Sleep(50);
}
}
}
class Program
{
static void Main(string[] args)
{
Worker w1 = new Worker();
Worker w2 = new Worker();
w1.Start();
w2.Start();
List<Item> itemsToProcess = new List<Item>();
for (int i = 1; i < 1000; i++)
{
itemsToProcess.Add(new Item(i));
}
for (int i = 0; i < itemsToProcess.Count; i++) // list indexes start at 0; the old bounds skipped item 0 and ran past the end
{
w1.EnqueueItem(itemsToProcess[i]);
w2.EnqueueItem(itemsToProcess[i]);
}
}
}
}
Related
I am a C# programmer, and I have run into a threading problem.
Assets are entities, and I need to process each asset in parallel by running a method, "doSomethingOnAsset", on it.
I have a program with 100 threads (one thread per asset that I manipulate). In general, each thread runs on the same time interval, and each one calls the "doSomethingOnAsset" method.
Each thread's interval is, for example, 10 milliseconds.
I don't want that many threads, so I created one queue for each asset, but when they call the central method "doSomethingOnAsset", the threads no longer run on the same interval.
For example, the 1st thread's interval cycle is 300 milliseconds,
the 2nd thread's interval cycle is 700 milliseconds,
the 3rd thread's interval cycle is 2 seconds.
...
What is the best way to run a predefined method 100 times in parallel? (The parallel entry point may be an external service that, when running, triggers an event that runs my "doSomethingOnAsset" code.)
public void doSomethingOnAsset(object obj)
{
// infinite loop when thread.
while (true)
{
doSomething(obj);
Thread.Sleep(100);
}
}
public void doSomething(object obj)
{
// do something.
}
public void Run()
{
Thread t;
for (int i = 0; i < 100; i++)
{
t = new Thread(new ParameterizedThreadStart(this.doSomethingOnAsset));
t.Start(new object());
}
Console.ReadLine();
}
Or call doSomething on an event signal, when an external program triggers it.
Thanks :)
For these kinds of producer-consumer situations I usually define a blocking collection, create one or more consumers, and start adding data to the collection. Each consumer instance tries to take an item and, if there is one, consumes it. Otherwise, it waits for an item.
You could add a cancellation token to support stopping the processing.
You can scale it easily by adding more consumers. Of course, which number is the most efficient depends on the machine and the number of cores, in combination with the processing time per item.
The consumer:
public class MyConsumer<T> {
public MyConsumer(BlockingCollection<T> collection, Action<T> action) {
_collection = collection;
_action = action;
}
private readonly BlockingCollection<T> _collection;
private readonly Action<T> _action;
public void StartConsuming() {
new Task(Consume).Start();
}
private void Consume() {
while (true) {
var obj = _collection.Take();
_action(obj);
}
}
}
Usage:
public void doSomething(object obj) {
// do something.
}
public void Run() {
var collection = new BlockingCollection<object>();
// Start workers
for (int i = 0; i < 5; i++) {
new MyConsumer<object>(collection, doSomething).StartConsuming(); // start each consumer, otherwise nothing is ever taken
}
// Create object to consume
for (int i = 0; i < 100; i++) {
collection.Add(new object());
}
}
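The cancellation token mentioned in this answer could be wired in roughly as follows. This is only a sketch: CancellableConsumer is a hypothetical variant of MyConsumer, and it relies on BlockingCollection<T>.GetConsumingEnumerable(CancellationToken), which throws OperationCanceledException once the token is cancelled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical variant of the consumer above that stops when a token is cancelled.
public class CancellableConsumer<T>
{
    private readonly BlockingCollection<T> _collection;
    private readonly Action<T> _action;

    public CancellableConsumer(BlockingCollection<T> collection, Action<T> action)
    {
        _collection = collection;
        _action = action;
    }

    // Returns the task so callers can wait for the consumer to finish.
    public Task StartConsuming(CancellationToken token)
    {
        return Task.Run(() =>
        {
            try
            {
                // Blocks while the collection is empty; throws when the token is cancelled.
                foreach (T item in _collection.GetConsumingEnumerable(token))
                    _action(item);
            }
            catch (OperationCanceledException)
            {
                // Cancellation requested: leave the loop quietly.
            }
        });
    }
}
```

A caller would create a CancellationTokenSource, pass its Token to StartConsuming, and call Cancel() to stop the consumers. Calling CompleteAdding() on the collection instead lets the consumers drain the remaining items before they exit.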
I have a scenario in which I need to create a number of threads dynamically, based on a configurable variable. I can only run that number of threads at a time, and as soon as one of the threads completes, I need to assign the next queued method to that same thread.
Can anyone help me resolve this scenario with an example?
I have been researching for a week but have not been able to find a concrete solution.
There are many ways to approach this, but which is best depends on your specific problem.
However, let's assume that you have a collection of items that you want to do some work on, with a separate thread processing each item - up to a maximum number of simultaneous threads that you specify.
One very simple way to do that is to use PLINQ via AsParallel() and WithDegreeOfParallelism(), as the following console application demonstrates:
using System;
using System.Linq;
using System.Threading;
namespace Demo
{
static class Program
{
static void Main()
{
int maxThreads = 4;
var workItems = Enumerable.Range(1, 100);
var parallelWorkItems = workItems.AsParallel().WithDegreeOfParallelism(maxThreads);
parallelWorkItems.ForAll(worker);
}
static void worker(int value)
{
Console.WriteLine($"Worker {Thread.CurrentThread.ManagedThreadId} is processing {value}");
Thread.Sleep(1000); // Simulate work.
}
}
}
If you run this and inspect the output, you'll see that multiple threads are processing the work items, but the maximum number of threads is limited to the specified value.
You should have a look at thread pooling. See this link for more information Threadpooling in .NET. You will most likely have to work with the callbacks to accomplish your task to call a method as soon as work in one thread was done
There might be a smarter solution for you using async/await, depending on what you are trying to achieve. But since you explicitly ask about threads, here is a short class that does what you want:
public class MultiThreadWorker : IDisposable
{
private readonly ConcurrentQueue<Action> _actions = new ConcurrentQueue<Action>();
private readonly List<Thread> _threads = new List<Thread>();
private bool _disposed;
private void ThreadFunc()
{
while (true)
{
Action action;
while (!_actions.TryDequeue(out action)) Thread.Sleep(100);
action();
}
}
public MultiThreadWorker(int numberOfThreads)
{
for (int i = 0; i < numberOfThreads; i++)
{
Thread t = new Thread(ThreadFunc);
_threads.Add(t);
t.Start();
}
}
public void Dispose()
{
Dispose(true);
}
protected virtual void Dispose(bool disposing)
{
_disposed = true;
foreach (Thread t in _threads)
t.Abort();
if (disposing)
GC.SuppressFinalize(this);
}
public void Enqueue(Action action)
{
if (_disposed)
throw new ObjectDisposedException("MultiThreadWorker");
_actions.Enqueue(action);
}
}
This class starts the required number of threads when instantiated as this:
int requiredThreadCount = 16; // your configured value
MultiThreadWorker mtw = new MultiThreadWorker(requiredThreadCount);
It then uses a ConcurrentQueue<T> to keep track of the tasks to do. You can add methods to the queue via
mtw.Enqueue(() => DoThisTask());
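I made it IDisposable to make sure the threads are stopped in the end. Of course this would need a little improvement, since aborting threads like this is not best practice (and Thread.Abort is not supported on .NET Core and later, where it throws PlatformNotSupportedException).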
The ThreadFunc itself checks repeatedly if there are queued actions and executes them. This could also be improved a little by patterns using Monitor.Pulse and Monitor.Wait etc.
And as I said, async/await may lead to better solutions, but you asked for threads explicitly.
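As a rough illustration of the Monitor.Wait / Monitor.Pulse improvement mentioned above, the polling loop could be replaced by something like the following. SignalledWorker and its Shutdown method are hypothetical names, and the sketch swaps the ConcurrentQueue for a plain Queue guarded by a lock, since Monitor needs a lock object anyway. It also stops the threads cooperatively instead of aborting them.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical replacement for the polling loop: workers sleep in Monitor.Wait
// until Enqueue or Shutdown pulses them.
public class SignalledWorker
{
    private readonly Queue<Action> _actions = new Queue<Action>();
    private readonly object _sync = new object();
    private readonly List<Thread> _threads = new List<Thread>();
    private bool _shutdown;

    public SignalledWorker(int numberOfThreads)
    {
        for (int i = 0; i < numberOfThreads; i++)
        {
            var t = new Thread(ThreadFunc) { IsBackground = true };
            _threads.Add(t);
            t.Start();
        }
    }

    public void Enqueue(Action action)
    {
        lock (_sync)
        {
            if (_shutdown) throw new InvalidOperationException("Worker is shut down.");
            _actions.Enqueue(action);
            Monitor.Pulse(_sync); // wake one waiting worker
        }
    }

    // Lets queued actions finish, then waits for every worker thread to exit.
    public void Shutdown()
    {
        lock (_sync)
        {
            _shutdown = true;
            Monitor.PulseAll(_sync); // wake all workers so they can observe the flag
        }
        foreach (Thread t in _threads) t.Join();
    }

    private void ThreadFunc()
    {
        while (true)
        {
            Action action;
            lock (_sync)
            {
                // Wait releases the lock while sleeping and reacquires it on wake-up.
                while (_actions.Count == 0 && !_shutdown) Monitor.Wait(_sync);
                if (_actions.Count == 0) return; // shut down and nothing left to run
                action = _actions.Dequeue();
            }
            action(); // run outside the lock so other workers can proceed
        }
    }
}
```

Usage mirrors the original class: construct it with the configured thread count, call Enqueue(() => DoThisTask()) for each job, and call Shutdown() when no more work will arrive.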
I have a main thread which is controlling a windows form, upon pressing a button two threads are executed. One is used for recording information, the other is used for reading it. The idea behind putting these in threads is to enable the user to interact with the interface while they are executing.
Here is the creation of the two threads:
Thread recordThread = new Thread(() => RecordData(data));
recordThread.Name = "record";
recordThread.Start();
Thread readThread = new Thread(() => ReadData(data));
readThread.Name = "read";
readThread.Start();
The data is simply a Data-object that stores the data that is recorded during the recording.
The problem that I am facing is that the first thread executes fine, but the second refuses to run until the first one completes. Putting a breakpoint in the second thread's function, ReadData, shows me that it is only called after the first thread is done with all of its recording.
I have been trying to solve this for a few hours now and I can't get my head around why it would do this. Adding a
while(readThread.IsAlive) { }
right after the start halts the execution of anything after it, and the thread's state is Running. But it will not enter the given method.
Any ideas?
Edit:
The two functions that are called upon by the threads are;
private void RecordData(Data d)
{
int i = 0;
while (i < time * freq)
{
double[] data = daq.Read();
d.AddData(data);
i++;
}
}
private void ReadData(Data d)
{
UpdateLabelDelegate updateData =
new UpdateLabelDelegate(UpdateLabel);
int i = 0;
while (i < time * freq)
{
double[] data = d.ReadLastData();
this.Invoke(updateData, new object[] { data });
i++;
}
}
The data object has locking in both the functions that are called upon: AddData and ReadLastData.
Here are the methods in the Data object.
public void AddData(double[] data)
{
lock (this)
{
int i = 0;
foreach (double d in data)
{
movementData[i].Add(d);
i++;
}
}
}
public double[] ReadLastData()
{
double[] data = new double[channels];
lock (this)
{
int i = 0;
foreach (List<double> list in movementData)
{
data[i] = list[list.Count - 1];
i++; // advance to the next channel; without this only data[0] is ever written
}
}
return data;
}
Looks like you have a race condition between your reading/writing. In your first thread you lock down the object whilst you add data to it and in the second thread you attempt to get an exclusive lock on it to start reading. However, the problem is the first thread is executing so fast that the second thread never really gets a chance to acquire the lock.
The solution to this problem really depends on what sort of behaviour you are after here. If you expect after every write you get a consecutive read then what you need to do is control the execution between the reading/writing operations e.g.
static AutoResetEvent canWrite = new AutoResetEvent(true); // default to true so the first write happens
static AutoResetEvent canRead = new AutoResetEvent(false);
...
private void RecordData(Data d)
{
int i = 0;
while (i < time * freq)
{
double[] data = daq.Read();
canWrite.WaitOne(); // wait for the second thread to finish reading
d.AddData(data);
canRead.Set(); // let the second thread know we have finished writing
i++;
}
}
private void ReadData(Data d)
{
UpdateLabelDelegate updateData =
new UpdateLabelDelegate(UpdateLabel);
int i = 0;
while (i < time * freq)
{
canRead.WaitOne(); // wait for the first thread to finish writing
double[] data = d.ReadLastData();
canWrite.Set(); // let the first thread know we have finished reading
this.Invoke(updateData, new object[] { data });
i++;
}
}
Could you try adding a Sleep inside RecordData?
Maybe it's just your (mono cpu??) windows operating system that doesn't let the second thread get its hand on cpu resources.
Don't do this:
lock (this)
Do something like this instead:
private object oLock = new object();
[...]
lock (this.oLock)
EDIT:
Could you try calls like this:
Thread recordThread = new Thread((o) => RecordData((Data)o));
recordThread.Name = "record";
recordThread.Start(data);
I have a class in C# like this:
public class MyClass
{
public void Start() { ... }
public void Method_01() { ... }
public void Method_02() { ... }
public void Method_03() { ... }
}
When I call the "Start()" method, an external class starts working and creates many parallel threads, which call "Method_01()" and "Method_02()" from the class above. After the external class has finished its work, "Method_03()" is run in another parallel thread.
The threads for "Method_01()" and "Method_02()" are created before the thread for "Method_03()", but there is no guarantee that they end before the "Method_03()" thread starts. I mean that "Method_01()" or "Method_02()" may lose its CPU turn, and "Method_03()" may get the CPU and run to completion.
In the "Start()" method I know the total number of threads that are supposed to be created to run "Method_01()" and "Method_02()". I'm looking for a way, using a semaphore or mutex, to ensure that the first statement of "Method_03()" runs only after all threads running "Method_01()" or "Method_02()" have ended.
Three options that come to mind are:
Keep an array of Thread instances and call Join on all of them from Method_03.
Use a single CountdownEvent instance and call Wait from Method_03.
Allocate one ManualResetEvent for each Method_01 or Method_02 call and call WaitHandle.WaitAll on all of them from Method_03 (this is not very scalable).
I prefer to use a CountdownEvent because it is a lot more versatile and is still super scalable.
public class MyClass
{
private CountdownEvent m_Finished = new CountdownEvent(1); // Starts at 1 to account for the thread running Start(); a count of 0 is already signaled and AddCount would throw.
public void Start()
{
for (int i = 0; i < NUMBER_OF_THREADS; i++)
{
m_Finished.AddCount(); // Increment to indicate another active thread.
new Thread(Method_01).Start();
}
for (int i = 0; i < NUMBER_OF_THREADS; i++)
{
m_Finished.AddCount(); // Increment to indicate another active thread.
new Thread(Method_02).Start();
}
new Thread(Method_03).Start();
m_Finished.Signal(); // Signal to indicate that this thread is done.
}
private void Method_01()
{
try
{
// Add your logic here.
}
finally
{
m_Finished.Signal(); // Signal to indicate that this thread is done.
}
}
private void Method_02()
{
try
{
// Add your logic here.
}
finally
{
m_Finished.Signal(); // Signal to indicate that this thread is done.
}
}
private void Method_03()
{
m_Finished.Wait(); // Wait for all signals.
// Add your logic here.
}
}
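For comparison, the first option in the list (keeping the Thread instances and calling Join from Method_03) might look like the sketch below. JoinExample and its two counter fields are hypothetical and exist only to make the ordering observable. The method bodies are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of the Join-based option; the method bodies are placeholders.
public class JoinExample
{
    private readonly List<Thread> _workers = new List<Thread>();
    public int CompletedWorkers;          // counts finished Method_01/Method_02 calls
    public int CompletedSeenByMethod03;   // what Method_03 observed when its own work began

    public void Start(int numberOfThreads)
    {
        // Build the complete list first so Method_03 never sees it change.
        for (int i = 0; i < numberOfThreads; i++)
        {
            _workers.Add(new Thread(Method_01));
            _workers.Add(new Thread(Method_02));
        }
        foreach (Thread t in _workers) t.Start();

        var last = new Thread(Method_03);
        last.Start();
        last.Join(); // only so this demo returns after Method_03 has finished
    }

    private void Method_01() { Interlocked.Increment(ref CompletedWorkers); }
    private void Method_02() { Interlocked.Increment(ref CompletedWorkers); }

    private void Method_03()
    {
        // Join blocks until each worker thread has terminated.
        foreach (Thread t in _workers) t.Join();
        CompletedSeenByMethod03 = Volatile.Read(ref CompletedWorkers);
    }
}
```

This only works when the code that starts the threads can hand the Thread objects to Method_03, which is why the CountdownEvent version is usually the more flexible choice.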
This appears to be a perfect job for Tasks. Below I assume that Method01 and Method02 are allowed to run concurrently with no specific order of invocation or finishing (no guarantees; typed from memory without testing):
int cTaskNumber01 = 3, cTaskNumber02 = 5;
Task tMaster = new Task(() => {
for (int tI = 0; tI < cTaskNumber01; ++tI)
new Task(Method01, TaskCreationOptions.AttachedToParent).Start();
for (int tI = 0; tI < cTaskNumber02; ++tI)
new Task(Method02, TaskCreationOptions.AttachedToParent).Start();
});
// after master and its children are finished, Method03 is invoked
tMaster.ContinueWith(_ => Method03()); // ContinueWith passes the finished task; ignore it and call the parameterless method
// let it go...
tMaster.Start();
What it sounds like you need to do is to create a ManualResetEvent (initialized to unset) or some other WaitHandle for each of Method_01 and Method_02, and then have Method_03's thread use WaitHandle.WaitAll on the set of handles.
Alternatively, if you can reference the Thread variables used to run Method_01 and Method_02, you could have Method_03's thread use Thread.Join to wait on both. This assumes however that those threads are actually terminated when they complete execution of Method_01 and Method_02- if they are not, you need to resort to the first solution I mention.
Why not use a static variable, static volatile int threadRuns, which is initialized with the number of threads that will run Method_01 and Method_02?
Then you modify each of those two methods to decrement threadRuns just before exit:
...
lock(typeof(MyClass)) {
--threadRuns;
}
...
Then in the beginning of Method_03 you wait until threadRuns is 0 and then proceed:
while(threadRuns != 0)
Thread.Sleep(10);
Did I understand the question correctly?
There is actually an alternative in the Barrier class that is new in .NET 4.0. This simplifies how you can do the signalling across multiple threads.
You could do something like the following code, but this is mostly useful when synchronizing different processing threads.
public class Synchro
{
private Barrier _barrier;
public void Start(int numThreads)
{
_barrier = new Barrier((numThreads * 2)+1);
for (int i = 0; i < numThreads; i++)
{
new Thread(Method1).Start();
new Thread(Method2).Start();
}
new Thread(Method3).Start();
}
public void Method1()
{
//Do some work
_barrier.SignalAndWait();
}
public void Method2()
{
//Do some other work.
_barrier.SignalAndWait();
}
public void Method3()
{
_barrier.SignalAndWait();
//Do some other cleanup work.
}
}
I would also like to suggest that, since your problem statement was quite abstract, actual problems that used to be solved with CountdownEvent are often now better solved using the new Parallel or PLINQ capabilities. If you were actually processing a collection or something similar in your code, you might have something like the following.
public class Synchro
{
public void Start(List<someClass> collection)
{
new Thread(() => Method3(collection)).Start(); // the thread must be started explicitly
}
public void Method1(someClass item)
{
//Do some work.
}
public void Method2(someClass item)
{
//Do some other work.
}
public void Method3(List<someClass> collection)
{
//Do your work on each item in parallel threads.
Parallel.ForEach(collection, x => { Method1(x); Method2(x); });
//Do some work on the total collection like sorting or whatever.
}
}
I'm working on a multi-threaded scraper for a website and as per a different question I've decided to use the ThreadPool with QueueUserWorkItem().
How can I continually Queue work items without queuing them all at once? I need to queue > 300k items (one for each userID) and if I loop to queue them all I'll run out of memory.
So, what I would like is:
// 1 = startUserID, 300000 = endUserID, 25 = MaxThreads
Scraper webScraper = new Scraper(1, 300000, 25);
webScraper.Start();
// return immediately while webScraper runs in the background
During this time, webScraper is continuously adding all 300000 work items as threads become available.
Here is what I have so far:
public class Scraper
{
private int MaxUserID { get; set; }
private int MaxThreads { get; set; }
private static int CurrentUserID { get; set; }
private bool Running { get; set; }
private Parser StatsParser = new Parser();
public Scraper()
: this(0, Int32.MaxValue, 25)
{
}
public Scraper(int CurrentUserID, int MaxUserID, int MaxThreads)
{
this.CurrentUserID = CurrentUserID;
this.MaxUserID = MaxUserID;
this.MaxThreads = MaxThreads;
this.Running = false;
ThreadPool.SetMaxThreads(MaxThreads, MaxThreads);
}
public void Start()
{
int availableThreads;
// Need to start a new thread to spawn the new WorkItems so Start() will return right away?
while (Running)
{
// if (!CurrentUserID >= MaxUserID)
// {
// while (availableThreads > 0)
// {
// ThreadPool.QueueUserWorkItem(new WaitCallBack(Process));
// }
// }
// else
// { Running = false; }
}
}
public void Stop()
{
Running = false;
}
public static void process(object state)
{
var userID = Interlocked.Increment(ref CurrentUserID);
// ... fetch stats for userID
}
}
Is this the right approach?
Can anyone point me in the right direction for handling the creation of my work items while in the background once Start() is called, and not creating all Work items at once?
Would this be better implemented with fewer work items that steal work from a queue of work? Just because you have 300,000 pieces of work to do doesn't mean you need 300,000 workers to do it. As you only have a few cores, only a few of these pieces of work can be happening in parallel, so why not hand out chunks of work to far fewer workers?
Depending on how constant the time taken for each piece of work is, you can either split it all evenly across each worker or have a central queue (that you'll have to lock around) and each worker can grab some work as it runs out.
EDIT:
Joe Duffy seems to have a series about writing a Work Stealing Queue here: http://www.bluebytesoftware.com/blog/2008/08/12/BuildingACustomThreadPoolSeriesPart2AWorkStealingQueue.aspx. It also looks like .NET 4's ThreadPool is going to be a bit smarter. But I don't think you need something particularly complex for this scenario.
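Since the work here is just a contiguous range of user IDs, the "central queue" described above can be as small as a shared counter. The following is a sketch under that assumption: RangeScraper and the processUser callback are hypothetical stand-ins for the real Scraper and its fetch-and-parse step, and a handful of long-running tasks replace the per-ID work items.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: a few long-running workers share one counter instead of
// queuing one work item per user ID.
public class RangeScraper
{
    private readonly int _lastUserId;
    private readonly Action<int> _processUser; // stand-in for the real fetch-and-parse step
    private int _nextUserId;
    private volatile bool _running;

    public RangeScraper(int firstUserId, int lastUserId, Action<int> processUser)
    {
        _nextUserId = firstUserId - 1; // Interlocked.Increment returns the incremented value
        _lastUserId = lastUserId;
        _processUser = processUser;
    }

    // Returns immediately; the returned task completes when all workers are done.
    public Task Start(int workerCount)
    {
        _running = true;
        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
            workers[i] = Task.Factory.StartNew(WorkLoop, TaskCreationOptions.LongRunning);
        return Task.WhenAll(workers);
    }

    public void Stop() { _running = false; }

    private void WorkLoop()
    {
        while (_running)
        {
            // Each call hands out a unique ID, so no two workers process the same user.
            int userId = Interlocked.Increment(ref _nextUserId);
            if (userId > _lastUserId) return;
            _processUser(userId);
        }
    }
}
```

With this shape, memory use does not grow with the number of user IDs, because nothing is queued per ID. Only workerCount loops exist at any time.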
I think creating a queue of queued items doesn't seem quite right somehow, so how about making the WorkItems queue themselves again after they've finished?
Your Start method could queue up, say, 3 times MaxThreads items (75 in your example) and then your Process method queues itself when it's finished. That way your Start method returns quickly but fires off a number of work items, which as I say then fire themselves:
public class Scraper
{
private int MaxUserID { get; set; }
private int MaxThreads { get; set; }
private int currentUserID;
private bool Running { get; set; }
private Parser StatsParser = new Parser();
private int Multiplier { get; set; }
public Scraper()
: this(0, Int32.MaxValue, 25)
{
}
public Scraper(int currentUserID, int maxUserID, int maxThreads)
{
this.currentUserID = currentUserID;
this.MaxUserID = maxUserID;
this.MaxThreads = maxThreads;
this.Running = false;
ThreadPool.SetMaxThreads(maxThreads, maxThreads);
Multiplier = 3;
}
public void Start()
{
Running = true;
for (int i = 0; i < MaxThreads * Multiplier; i++)
{
ThreadPool.QueueUserWorkItem(Process);
}
}
public void Stop()
{
Running = false;
}
public void Process(object state)
{
if (Running == false)
{
return;
}
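// Capture the value returned by Increment; reading the shared field afterwards
// could see an ID that another thread has already claimed.
int userID = Interlocked.Increment(ref currentUserID);
if (userID <= MaxUserID)
{
//Parse stats for userID
ThreadPool.QueueUserWorkItem(Process);
}
else
{ Running = false; }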
}
}
The Running flag should really be read and written in a thread-safe way (for example a volatile field, or an int updated through Interlocked). I've made the multiplier a property, which could be passed to the constructor; I'm fairly sure it could be adjusted to tweak performance, depending on how long those stats take to parse.
It looks like you need a Master process control class that governs the amount of workers that are firing off and keeps the Queue full.
You could work with two queues then:
One to hold all the items you need to scrape
Second to do the work
This Master/Governor object would then keep a loop until all your items from Queue #1 are gone and it would keep adding to Queue #2 when you have available cycles.
I definitely wouldn't use ThreadPool.SetMaxThreads. Remember that the thread pool is shared by all code running in the process, so capping the maximum number of threads can hurt the performance of everything else that uses it. The whole idea behind the thread pool is that you don't need to specify things like the maximum number of threads: the .NET Framework works out the optimum number of threads to allocate for you.
Note that queuing 300 000 items would not cause 300 000 threads to spawn - the ThreadPool class will manage the number of threads for you and re-use threads as necessary. If you are simply worried that too many resources will be consumed this way I would recommend that you refine your process - perhaps create a 'Spawner' class which in turn runs 1000 of the scraper instances?
You can use a different thread pool. Here is one: http://www.codeplex.com/smartthreadpool
It allows you to queue up all your items at once. You can assign a max number of threads to create. Say you have 1000 work items and you assign 100 threads. It will immediately take the first 100 items and get them going while the rest wait. As soon as one of those items is done and a thread frees up, the next queued item is started. It manages all the work but won't saturate threads and memory. Also, it doesn't use threads from the .net thread pool.