Proper data access in Multithreading - c#

I have a method that is used by multiple threads at the same time. Each of these threads calls another method to receive the data it needs from a List (each one should get different data, not the same).
I wrote this code to get data from a list and use it in the threads.
public static List<string> ownersID;
static int idIdx = 0;
public static string[] GetUserID()
{
if (idIdx < ownersID.Count-1)
{
string[] ret = { ownersID[idIdx], idIdx.ToString() };
idIdx++;
return ret;
}
else if (idIdx >= ownersID.Count)
{
string[] ret = { "EndOfThat" };
return ret;
}
return new string[0];
}
Then each thread uses this code to receive the data and remove it from the list:
string[] arrOwner = GetUserID();
string id = arrOwner[0];
ownersID.RemoveAt(Convert.ToInt32(arrOwner[1]));
But sometimes two or more threads end up with the same data.
Is there a better way to do this?

If you want to do it with a List, just add a little bit of locking:
private object _lock = new object();
private List<string> _list = new List<string>();
public void Add(string someStr)
{
lock(_lock)
{
if (_list.Any(s => s == someStr)) // already added (inside lock)
return;
_list.Add(someStr);
}
}
public void Remove(string someStr)
{
lock(_lock)
{
if (!_list.Any(s => s == someStr)) // already removed (inside lock)
return;
_list.Remove(someStr);
}
}
With that, no thread can add or remove anything while another thread is doing the same, so your list is protected from concurrent access, and you are guaranteed to hold at most one copy of each string. You can achieve the same with ConcurrentDictionary<TKey, TValue>.
Update: I removed pre-lock check due to this MSDN thread safety statement
It is safe to perform multiple read operations on a List (read - multithreading), but issues can occur if the collection is modified while it's being read.
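The ConcurrentDictionary<TKey, TValue> alternative mentioned above could look roughly like this. It is only a sketch: the class name UniqueStringSet and the dummy byte value are assumptions made for illustration, and it requires .NET 4 or later.

```csharp
using System.Collections.Concurrent;

// Hypothetical wrapper: uses the dictionary keys as a thread-safe set of strings.
public class UniqueStringSet
{
    // The byte value is a dummy; only the keys matter.
    private readonly ConcurrentDictionary<string, byte> _items =
        new ConcurrentDictionary<string, byte>();

    // Returns true if the string was added, false if it was already present.
    public bool Add(string someStr)
    {
        return _items.TryAdd(someStr, 0);
    }

    // Returns true if the string was removed, false if it was not present.
    public bool Remove(string someStr)
    {
        byte ignored;
        return _items.TryRemove(someStr, out ignored);
    }

    public int Count
    {
        get { return _items.Count; }
    }
}
```

TryAdd and TryRemove perform the "check, then change" step atomically, so no explicit lock is needed around them.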

In a larger application you can use a .NET queue to communicate between threads.
The benefit of a queue is that your own code doesn't need to lock the shared object, which reduces latency: the main thread adds data, and thread A, thread B and thread C receive it through the queue, with no explicit locking. Note that this only holds for a thread-safe queue such as ConcurrentQueue<T> (.NET 4 and later); the plain Queue<T> is not thread-safe.
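As a minimal sketch of the queue idea applied to the original question (assuming .NET 4 or later; the names OwnerQueueDemo and Distribute are made up for illustration), each worker pulls ids from a ConcurrentQueue<string>, so no id can be handed to two threads:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class OwnerQueueDemo
{
    // Hands every id to exactly one worker thread and returns what each worker received.
    public static List<string>[] Distribute(IEnumerable<string> ownerIds, int workerCount)
    {
        var queue = new ConcurrentQueue<string>(ownerIds);
        var received = new List<string>[workerCount];
        var threads = new Thread[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            int slot = i; // copy so each lambda sees its own index
            received[slot] = new List<string>();
            threads[slot] = new Thread(() =>
            {
                string id;
                // TryDequeue atomically removes one item; false means the queue is empty.
                while (queue.TryDequeue(out id))
                {
                    received[slot].Add(id); // each thread writes only to its own list
                }
            });
        }

        foreach (var t in threads) t.Start();
        foreach (var t in threads) t.Join();
        return received;
    }
}
```

Because taking an item and removing it are a single operation, the separate index counter and RemoveAt call from the question are no longer needed.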

Related

Async Producer/Consumer

I have an instance of a class that is accessed from several threads. This class takes these calls and adds a tuple to a database. I need this to be done serially because, due to some DB constraints, parallel threads could leave the database inconsistent.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
public void AddDData(string info)
{
Task t = new Task(() => { InsertDataIntoBase(info); });
_tasks.Add(t);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Task t;
if (_tasks.TryTake(out t))
{
t.Start();
t.Wait();
}
}
});
}
AddDData is the method called by multiple threads, and InsertDataIntoBase is a very simple insert that should take a few milliseconds.
The problem is that, for some reason that my lack of knowledge doesn't allow me to figure out, sometimes a task is called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and as I was suspecting, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how to properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();
public void AddDData(Data info)
{
_tasks.Add(info);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Data info;
if (_tasks.TryTake(out info))
{
InsertIntoDB(info);
}
}
});
}
Note that I got rid of Tasks as I'm relying on the synchronous InsertIntoDB call (as it is inside a loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances are going into the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();
static void Main(string[] args)
{
InsertWorker();
GenerateItems(10, 1000);
_itemsToProcess.CompleteAdding();
}
private static void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_itemsToProcess.IsCompleted)
{
string t;
if (_itemsToProcess.TryTake(out t))
{
// Do whatever needs doing here
// Order should be guaranteed since BlockingCollection
// uses a ConcurrentQueue as a backing store by default.
// http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
Console.WriteLine(t);
}
}
});
}
private static void GenerateItems(int count, int maxDelayInMs)
{
Random r = new Random();
string[] items = new string[count];
for (int i = 0; i < count; i++)
{
items[i] = i.ToString();
}
// Simulate many threads adding items to the collection
items
.AsParallel()
.WithDegreeOfParallelism(4)
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.Select((x) =>
{
Thread.Sleep(r.Next(maxDelayInMs));
_itemsToProcess.Add(x);
return x;
}).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure that the caller is not using the same info instance for multiple calls, since that reference is captured in the Task lambda.
Based on the trace that you provided the only logical possibility is that you have called InsertWorker twice (or more). There are thus two background threads waiting for items to appear in the collection and occasionally they both manage to grab an item and begin executing it.
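If a second InsertWorker call is indeed the cause, one way to rule it out is to make starting the consumer idempotent. The following is a sketch under assumptions: the class name SerialInserter is hypothetical, the Action<string> passed to the constructor stands in for InsertIntoDB, and string replaces the question's Data type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class SerialInserter
{
    private readonly BlockingCollection<string> _items = new BlockingCollection<string>();
    private readonly Action<string> _insert; // stand-in for InsertIntoDB
    private readonly object _startLock = new object();
    private Task _worker;

    public SerialInserter(Action<string> insert)
    {
        _insert = insert;
    }

    // Safe to call many times: only the first call creates the consumer.
    public void Start()
    {
        lock (_startLock)
        {
            if (_worker != null) return;
            _worker = Task.Factory.StartNew(() =>
            {
                // Blocks while the queue is empty and ends once CompleteAdding
                // has been called and the remaining items are drained.
                foreach (var item in _items.GetConsumingEnumerable())
                {
                    _insert(item);
                }
            }, TaskCreationOptions.LongRunning);
        }
    }

    // Called by any number of producer threads.
    public void Add(string info)
    {
        _items.Add(info);
    }

    // Signals that no more items will arrive and waits for the consumer to finish.
    public void CompleteAndWait()
    {
        _items.CompleteAdding();
        Task worker;
        lock (_startLock) { worker = _worker; }
        if (worker != null) worker.Wait();
    }
}
```

GetConsumingEnumerable also replaces the busy loop around TryTake, so the consumer sleeps instead of spinning while the queue is empty.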

Multithreading with method containing 2 parameters to return dictionary

I wrote a method that goes through a list of files, extracts values from each file, stores them in a dictionary and returns the dictionary. This method goes through a large number of files and I receive a ContextSwitchDeadLock error because of it. I have looked into this error and I need to use a thread to fix it. I am brand new to threads and would like some help with threading.
I create a new thread and use a delegate to pass the parameters dictionary and fileNames into the method getValuesNEW(). I am wondering how I can return the dictionary. I have attached the method that I would like to call as well as the code in the main program that creates the new thread. Any suggestions to improve my code will be greatly appreciated!
//dictionary and fileNames are manipulated a bit before use in thread
Dictionary<string, List<double>> dictionary = new Dictionary<string, List<double>>();
List<string> fileNames = new List<string>();
...
Thread thread = new Thread(delegate()
{
getValuesNEW(dictionary, fileNames);
});
thread.Start();
//This is the method that I am calling
public Dictionary<string, List<double>> getValuesNEW(Dictionary<string, List<double>> dictionary, List<string> fileNames)
{
foreach (string name in fileNames)
{
XmlReader reader = XmlReader.Create(name);
var collectValues = false;
string ertNumber = null;
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.Element))
{
if (reader.Name == "ChannelID" && reader.HasAttributes)
{
if (dictionary.ContainsKey(sep(reader.GetAttribute("EndPointChannelID"))))
{
//collectValues = sep(reader.GetAttribute("EndPointChannelID")) == ertNumber;
collectValues = true;
ertNumber = sep(reader.GetAttribute("EndPointChannelID"));
}
else
{
collectValues = false;
}
}
else if (collectValues && reader.Name == "Reading" && reader.HasAttributes)
{
dictionary[ertNumber].Add(Convert.ToDouble(reader.GetAttribute("Value")));
}
}
}
}
return dictionary;
}
Others have explained why the current approach isn't getting you anywhere. If you're using .NET 4 you can use the ConcurrentDictionary and Parallel.ForEach
private List<double> GetValuesFromFile(string fileName)
{
//TBD
}
private void RetrieveAllFileValues()
{
IEnumerable<string> files = ...;
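// The key is the file name, so the key type must be string
ConcurrentDictionary<string, List<double>> dict = new ConcurrentDictionary<string, List<double>>();
Parallel.ForEach(files, file =>
{
var values = GetValuesFromFile(file);
// ConcurrentDictionary exposes TryAdd rather than Add
dict.TryAdd(file, values);
});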
}
You don't need to return the dictionary: the main thread already has a reference to it and it'll see the changes that your thread makes. All the main thread has to do is wait until the worker thread is done (for example using thread.Join()).
However, doing things this way, you get no benefit from multithreading because nothing is done in parallel. What you can do is have multiple threads, and multiple dictionaries (one per thread). When everyone's done, the main thread can put all those dictionaries together.
The reason you don't want multiple threads accessing the same dictionary is that the Dictionary class is not thread-safe: its behavior is undefined if more than one thread uses it at the same time. However, you could use a ConcurrentDictionary, which is thread-safe: it synchronizes internally, so concurrent reads and writes from several threads cannot corrupt it. Keep in mind that this protects only the dictionary itself, not the List<double> objects stored in it.
Which of the two techniques is faster depends on how often your threads would be accessing the shared dictionary: if they access it rarely then a ConcurrentDictionary will work well. If they access it very often then it might be preferable to use multiple dictionaries and merge in the end. In your case since there is file I/O involved, I suspect that the ConcurrentDictionary approach will work best.
So, in short, change getValuesNEW to this:
//This is the method that I am calling
public void getValuesNEW(ConcurrentDictionary<string, List<double>> dictionary, List<string> fileNames)
{
foreach (string name in fileNames)
{
// (code in there is unchanged)
}
// no need to return the dictionary
//return dictionary;
}
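The other option mentioned above, one dictionary per thread merged at the end, might be sketched as follows. This is an illustration under assumptions, not the asker's code: LocalDictionaryMerge and Collect are made-up names, and the (key, value) pairs stand in for the readings that would be parsed from the XML files. It relies on the Parallel.ForEach overload with thread-local state (.NET 4 or later).

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LocalDictionaryMerge
{
    // readings: (key, value) pairs standing in for values parsed from the files.
    public static Dictionary<string, List<double>> Collect(
        IEnumerable<KeyValuePair<string, double>> readings)
    {
        var merged = new Dictionary<string, List<double>>();
        var mergeLock = new object();

        Parallel.ForEach<KeyValuePair<string, double>, Dictionary<string, List<double>>>(
            readings,
            // localInit: each worker gets its own private dictionary
            () => new Dictionary<string, List<double>>(),
            // body: touches only the private dictionary, so no locking is needed
            (reading, loopState, local) =>
            {
                List<double> values;
                if (!local.TryGetValue(reading.Key, out values))
                {
                    values = new List<double>();
                    local[reading.Key] = values;
                }
                values.Add(reading.Value);
                return local;
            },
            // localFinally: runs once per worker; merge its results under a lock
            local =>
            {
                lock (mergeLock)
                {
                    foreach (var pair in local)
                    {
                        List<double> target;
                        if (!merged.TryGetValue(pair.Key, out target))
                        {
                            target = new List<double>();
                            merged[pair.Key] = target;
                        }
                        target.AddRange(pair.Value);
                    }
                }
            });

        return merged;
    }
}
```

The lock is taken only once per worker rather than once per reading. The order of the values within each list is not preserved, which matters if the readings are order-sensitive.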
If you want to wait for your thread to finish, you can call Thread.Join after Thread.Start. To get the result of your thread, create a class variable or something similar that both your main program and your thread can use. However, I don't see the point of a thread here unless you want to process all the files in parallel.

What is the best way to send data from multiple Threads back to the Main Thread?

I have a main thread that creates n other threads. Each of the n threads populates a List<String>. When all threads have finished, they are joined, and I would like to have all of their lists in a List<List<String>>, but on the main thread. The main thread should be able to operate on that List<List<String>>; each of the n threads contributes one List<String>.
I am using C# with .NET 3.5 and I would like to avoid a static List<List<String>>.
Thread t = new Thread(someObjectForThreading.goPopulateList);
list_threads.Add(t);
list_threads.Last().Start();
All the threads in list_threads go on and populate their lists, and when they are finished I would like to have something like
//this = mainThread
this.doSomethingWith(List<List<String>>)
Edit: Hmm, is there no "standard concept" for solving such a task? Many threads operate on a list, and once they are all joined, the main thread can proceed with operating on the list.
Edit2: The List<List<String>> listOfLists must not be static. It can be public or private. First I need the n threads to operate on (and lock) the listOfLists and insert their lists. After all n threads are done inserting their lists, I would join the threads, and the main thread could proceed with the business logic and operate on the listOfLists.
I think I will reread some parts of http://www.albahari.com/threading/ and report back.
Here's a simple implementation using wait handles (in this case ManualResetEvent) to allow each worker thread to signal the main thread that it's done with its work. I hope this is somewhat self-explanatory:
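// Must be initialized: lock (null) would throw an ArgumentNullException
private List<List<string>> _listOfLists = new List<List<string>>();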
public void CreateListOfLists()
{
var waitHandles = new List<WaitHandle>();
foreach (int count in Enumerable.Range(1, 5))
{
var t = new Thread(CreateListOfStringWorker);
var handle = new ManualResetEvent(false);
t.Start(handle);
waitHandles.Add(handle);
}
// wait until every worker thread has signalled completion by calling Set()
WaitHandle.WaitAll(waitHandles.ToArray());
// do something with _listOfLists
// ...
}
public void CreateListOfStringWorker(object state)
{
var list = new List<string>();
lock (_listOfLists)
{
_listOfLists.Add(list);
}
list.Add("foo");
list.Add("bar");
((ManualResetEvent) state).Set(); // i'm done
}
Note how I'm only locking while I add each thread's List to the main list of lists. There is no need to lock the main list for each add, as each thread has its own List. Make sense?
Edit:
The point of using the waithandle is to wait for each thread to complete before working on your list of lists. If you don't wait, then you run the risk of trying to enumerate one of the List instances while the worker is still adding strings to it. This will cause an InvalidOperationException to be thrown, and your thread(s) will die. You cannot enumerate a collection and simultaneously modify it.
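Since the question already joins its threads, the same guarantee can be had without wait handles. The sketch below is an alternative, not the answer's method; JoinCollector and Collect are hypothetical names. Each worker fills its own slot in an array, and the main thread reads the results only after Join has returned. It uses nothing newer than .NET 3.5.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class JoinCollector
{
    // Starts threadCount workers, waits for all of them, and returns their lists.
    public static List<List<string>> Collect(int threadCount)
    {
        var results = new List<string>[threadCount]; // one slot per worker
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            int slot = i; // copy so each lambda captures its own index
            threads[slot] = new Thread(() =>
            {
                var list = new List<string>();
                list.Add("foo from worker " + slot);
                list.Add("bar from worker " + slot);
                results[slot] = list; // each worker writes only its own slot
            });
            threads[slot].Start();
        }

        // After Join returns, the worker has finished and its writes are visible here.
        foreach (var t in threads) t.Join();

        return new List<List<string>>(results);
    }
}
```

No lock is needed because no two threads ever write to the same slot, and the list of lists is a local value rather than a static field.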
Rather than making the List<List<String>> static, make a local List<List<String>> and pass it to the object the thread will be running. Of course, you'll need to wrap the list in a synchronized wrapper, since it's being accessed by multiple threads. In C# that could be a SynchronizedCollection<T> (System.ServiceModel assembly, .NET 3.0 and later):
SynchronizedCollection<List<String>> list = new SynchronizedCollection<List<String>>();
// later
SomeObjectForThreading o = new SomeObjectForThreading(list);
Thread t = new Thread(o.goPopulateList);
list_threads.Add(t);
t.Start();
// even later
this.doSomethingWith(list);
In o.goPopulateList, you might have
List<String> temp = new List<String>();
temp.Add("random text");
temp.Add("other random text");
this.list.Add(temp); // this.list was passed in at construct time
I would provide each thread with a call-back method that updates the list in the main thread, protected with a lock statement.
Edit:
class Program
{
static List<string> listOfStuff = new List<string>();
static void Main(string[] args)
{
List<Thread> threads = new List<Thread>();
for (int i = 0; i < 20; i++)
{
var thread = new Thread(() => { new Worker(new AppendToListDelegate(AppendToList)).DoWork(); });
thread.IsBackground = true;
threads.Add(thread);
}
threads.ForEach(n => n.Start());
threads.ForEach(n => n.Join());
Console.WriteLine("Count: " + listOfStuff.Count());
Console.ReadLine();
}
static void AppendToList(string arg)
{
lock (listOfStuff)
{
listOfStuff.Add(arg);
}
}
}
public delegate void AppendToListDelegate(string arg);
class Worker
{
AppendToListDelegate Appender;
public Worker(AppendToListDelegate appenderArg)
{
Appender = appenderArg;
}
public void DoWork()
{
for (int j = 0; j < 10000; j++)
{
Appender(Thread.CurrentThread.ManagedThreadId.ToString() + "." + j.ToString());
}
}
}
private void goPopulateList()
{
// Do threaded stuff...
// Populate the threaded list...
// All done: copy the results into the main list under a lock
lock (myMainList)
{
foreach (List<string> list in myThreadedList)
{
myMainList.Add(list);
}
}
}

How to add to a List while using Multi-Threading?

I'm kinda new to multithreading and have only played around with it in the past. But I'm curious whether it is possible to have a List of byte arrays on a main thread and still be able to add to that List while creating the new byte array in a separate thread. Also, I'll be using a for-each loop that will go through a list of forms that will be parsed into the byte arrays. So basically the pseudocode would be like this...
reports = new List();
foreach (form in forms)
{
newReport = new Thread(ParseForm(form));
reports.Add(newReport);
}
void ParseForm(form)
{
newArray = new byte[];
newArray = Convert.ToBytes(form);
return newArray;
}
Hopefully the pseudo-code above makes some sense. If anyone could tell me if this is possible and point me in the direction of a good example, I'm sure I can figure out the actual code.
If you need to access a collection from multiple threads, you should either use synchronization, or use a SynchronizedCollection if your .NET version is 3.0 or higher.
Here is one way to make the collection accessible to your thread:
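SynchronizedCollection<byte[]> reports = new SynchronizedCollection<byte[]>();
foreach (var form in forms) {
var current = form; // copy: before C# 5 the loop variable is shared by all lambdas
var reportThread = new Thread(() => ParseForm(current, reports));
reportThread.Start();
}
void ParseForm(Form form, SynchronizedCollection<byte[]> reports) {
// Convert.ToBytes is the question's placeholder for the real parsing, not a .NET method
byte[] newArray = Convert.ToBytes(form);
reports.Add(newArray);
}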
If you are on .NET 4 or later, a much better alternative to managing your threads manually is presented by various classes of the System.Threading.Tasks namespace. Consider exploring this alternative before deciding on your threading implementation.
This was written before we realized the question was about .NET 3.5; it is kept for reference for .NET 4.
If you don't need any order within the list, an easy "fix" is to use the ConcurrentBag<T> class instead of a list. If you need ordering, there is also the ConcurrentQueue<T> collection.
If you really need something more custom, you can implement your own blocking collection using BlockingCollection<T>. Here's a good article on the topic.
You can also use Parallel.ForEach to avoid the explicit thread creation:
private void ParseForms()
{
var reports = new ConcurrentBag<byte[]>();
Parallel.ForEach(forms, (form) =>
{
reports.Add(ParseForm(form));
});
}
private byte[] ParseForm(Form form)
{
// Convert.ToBytes is the question's placeholder for the real parsing, not a .NET method
byte[] newArray = Convert.ToBytes(form);
return newArray;
}
See the question "Why is enumerate files returning the same file more than once?". I think it shows exactly what you want to do.
It creates a list on the main thread and then adds to it from a different thread.
You're going to need
using System.Threading.Tasks;
Files.Clear(); //List<string>
Task.Factory.StartNew( () =>
{
this.BeginInvoke( new Action(() =>
{
Files.Add("Hi");
}));
});
Below is a simple Blocking Collection (as a queue only) that I just whipped up now since you don't have access to C# 4.0. It's most likely less efficient than the 4.0 concurrent collections, but it should work well enough. I didn't re-implement all of the Queue methods, just enqueue, dequeue, and peek. If you need others and can't figure out how they would be implemented just mention it in the comments.
Once you have the working blocking collection you can simply add to it from the producer threads and remove from it using the consumer threads.
// Requires: using System.Collections.Generic; using System.Threading;
public class MyBlockingQueue<T>
{
private readonly Queue<T> queue = new Queue<T>();
private readonly object padLock = new object();
public void Enqueue(T item)
{
lock (padLock)
{
queue.Enqueue(item);
// Wake up any waiting consumers. Monitor.Wait releases padLock while
// blocked, so a producer can always get in here (waiting on an event
// while holding the lock would deadlock).
Monitor.PulseAll(padLock);
}
}
public T Peek()
{
lock (padLock)
{
while (queue.Count < 1)
{
Monitor.Wait(padLock);
}
return queue.Peek();
}
}
public T Dequeue()
{
lock (padLock)
{
while (queue.Count < 1)
{
Monitor.Wait(padLock);
}
return queue.Dequeue();
}
}
}

New inside a lock

I noticed the following code from our foreign programmers:
private Client[] clients = new Client[0];
public void CreateClients(int count)
{
lock (clients)
{
clients = new Client[count];
for(int i=0; i<count; i++)
{
clients[i] = new Client();//Stripped
}
}
}
It's not exactly proper code but I was wondering what exactly this will do. Will this lock on a new object each time this method is called?
To answer your question of "I was wondering what exactly this will do" consider what happens if two threads try to do this.
Thread 1: locks on the clients reference, which is `new Client[0]`
Thread 1 has entered the critical block
Thread 1: makes a new array and assigns it to the clients reference
Thread 2: locks on the clients reference, which is the array just made in thread 1
Thread 2 has entered the critical block
You now have two threads in the critical block at the same time. That's bad.
This lock really does nothing. It locks an instance of an object which is immediately changed such that other threads entering this method will lock on a different object. The result is 2 threads executing in the middle of the lock which is probably not what was intended.
A much better approach here is to use a different, non-changing object to lock on
private readonly object clientsLock = new object();
private Client[] clients = new Client[0];
public void CreateClients(int count) {
lock (clientsLock) {
clients = new Client[count];
...
}
}
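To see why the original lock protects nothing, here is a small sketch (LockIdentityDemo is a made-up name, and Monitor.IsEntered needs .NET 4.5 or later). The lock statement evaluates its expression once on entry, so reassigning the field inside the block leaves the monitor held on the old object only:

```csharp
using System.Threading;

public class LockIdentityDemo
{
    private object _target = new object();

    // Returns { lock held on the original object, lock held on the new object }.
    public bool[] ReassignInsideLock()
    {
        object original = _target;
        lock (_target)
        {
            // Same move as "clients = new Client[count]" in the question
            _target = new object();
            return new[] { Monitor.IsEntered(original), Monitor.IsEntered(_target) };
        }
    }
}
```

ReassignInsideLock returns { true, false }: the current thread owns the monitor of the original object, while the newly assigned object is unlocked, so a second thread calling lock (_target) at that moment would enter immediately.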
This code is wrong - it will lock on a new instance every time it's called.
It should look like this:
private static readonly object clientsLock = new object();
private static Client[] clients = null;
public void CreateClients(int count)
{
if (clients == null)
{
lock (clientsLock)
{
if (clients == null)
{
// Fill a local array first so other threads never see a half-built one
var created = new Client[count];
for (int i = 0; i < count; i++)
{
created[i] = new Client(); // Stripped
}
clients = created;
}
}
}
}
There's no point in locking every time the method is called - that's why the surrounding if clause.
Use:
private readonly object syncRoot = new object();
lock (syncRoot) {
//your code
}
I think you're correct to doubt this code!
This code will lock on the previous instance each time - this might be the desired effect, but I doubt it. It won't stop multiple threads from creating multiple arrays.
