How to add to a List while using Multi-Threading? - c#

I'm kinda new to multi-threading and have only played around with it in the past. But I'm curious whether it is possible to have a List of byte arrays on a main thread and still be able to add to that List while creating each new byte array in a separate thread. Also, I'll be using a foreach loop that goes through a list of forms, each of which will be parsed into a byte array. So basically the pseudo-code would be like this...
reports = new List();
foreach (form in forms)
{
newReport = new Thread(ParseForm(form));
reports.Add(newReport);
}
void ParseForm(form)
{
newArray = new byte[];
newArray = Convert.ToBytes(form);
return newArray;
}
Hopefully the pseudo-code above makes some sense. If anyone could tell me if this is possible and point me in the direction of a good example, I'm sure I can figure out the actual code.

If you need to access a collection from multiple threads, you should either use synchronization, or use a SynchronizedCollection if your .NET version is 3.0 or higher.
Here is one way to make the collection accessible to your thread:
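var reports = new SynchronizedCollection<byte[]>();
foreach (var form in forms) {
    // Copy the loop variable: before C# 5 all closures share the same 'form' variable
    var current = form;
    var reportThread = new Thread(() => ParseForm(current, reports));
    reportThread.Start();
}
void ParseForm(Form form, SynchronizedCollection<byte[]> reports) {
    // Convert.ToBytes is the placeholder conversion from the question's pseudo-code
    byte[] newArray = Convert.ToBytes(form);
    reports.Add(newArray);
}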
If you are on .NET 4 or later, a much better alternative to managing your threads manually is presented by various classes of the System.Threading.Tasks namespace. Consider exploring this alternative before deciding on your threading implementation.
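As a rough sketch of the Task-based alternative (assuming .NET 4 or later), each form can get its own task, with the results collected on the calling thread once all tasks have finished. The names ReportBuilder and BuildReports are made up for illustration, and ParseForm here is a hypothetical stand-in that just encodes a string, since the real form-to-bytes conversion is not shown in the question:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public static class ReportBuilder
{
    // Hypothetical stand-in for the real form parser.
    public static byte[] ParseForm(string form)
    {
        return Encoding.UTF8.GetBytes(form);
    }

    public static List<byte[]> BuildReports(IEnumerable<string> forms)
    {
        // One task per form; each task returns its own array,
        // so no shared list is written from several threads.
        Task<byte[]>[] tasks = forms
            .Select(form => Task.Factory.StartNew(() => ParseForm(form)))
            .ToArray();

        Task.WaitAll(tasks);

        // Results are gathered on the calling thread, in the original order.
        return tasks.Select(t => t.Result).ToList();
    }
}
```

Because every task hands back its own result, the list itself is only ever touched by one thread and needs no synchronization.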

This was posted before we realized the question targets .NET 3.5; I'm keeping it for reference for anyone on .NET 4.
If you don't need any order within the list, an easy "fix" is to use the ConcurrentBag<T> class instead of a list. If you need ordering, there is also the ConcurrentQueue<T> collection.
If you really need something more custom, you can implement your own blocking collection using BlockingCollection<T>. Here's a good article on the topic.
You can also use Parallel.ForEach to avoid the explicit thread creation:
private void ParseForms()
{
var reports = new ConcurrentBag<byte[]>();
Parallel.ForEach(forms, (form) =>
{
reports.Add(ParseForm(form));
});
}
private byte[] ParseForm(Form form)
{
    // Convert.ToBytes is the placeholder conversion from the question's pseudo-code
    byte[] newArray = Convert.ToBytes(form);
    return newArray;
}

See the question "Why is enumerate files returning the same file more than once?". I think it shows exactly what you want to do: it creates a list on the main thread and then adds to it from a different thread.
You're going to need
using System.Threading.Tasks;
Files.Clear(); //List<string>
Task.Factory.StartNew( () =>
{
this.BeginInvoke( new Action(() =>
{
Files.Add("Hi");
}));
});

Below is a simple blocking collection (a queue only) that I just whipped up, since you don't have access to .NET 4.0. It's most likely less efficient than the 4.0 concurrent collections, but it should work well enough. I didn't re-implement all of the Queue methods, just enqueue, dequeue, and peek. If you need others and can't figure out how they would be implemented, just mention it in the comments.
Once you have the working blocking collection you can simply add to it from the producer threads and remove from it using the consumer threads.
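// Requires: using System.Collections.Generic; using System.Threading;
public class MyBlockingQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private readonly object padLock = new object();

    public void Enqueue(T item)
    {
        lock (padLock)
        {
            queue.Enqueue(item);
            // Wake the waiting consumers so they can re-check the count.
            Monitor.PulseAll(padLock);
        }
    }

    public T Peek()
    {
        lock (padLock)
        {
            // Monitor.Wait releases padLock while blocked, so producers can still enqueue.
            // Waiting on an AutoResetEvent here would keep the lock held and deadlock.
            while (queue.Count < 1)
            {
                Monitor.Wait(padLock);
            }
            return queue.Peek();
        }
    }

    public T Dequeue()
    {
        lock (padLock)
        {
            while (queue.Count < 1)
            {
                Monitor.Wait(padLock);
            }
            return queue.Dequeue();
        }
    }
}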

Related

Proper data access in Multithreading

I have a method that is used by multiple threads at the same time. Each of these threads calls another method to receive the data it needs from a List (each one should get different data, not the same).
I wrote this code to get data from the list and use it in the threads.
public static List<string> ownersID;
static int idIdx = 0;
public static string[] GetUserID()
{
if (idIdx < ownersID.Count-1)
{
string[] ret = { ownersID[idIdx], idIdx.ToString() };
idIdx++;
return ret;
}
else if (idIdx >= ownersID.Count)
{
string[] ret = { "EndOfThat" };
return ret;
}
return new string[0];
}
Then each thread uses this code to receive the data and remove it from the list:
string[] arrOwner = GetUserID();
string id = arrOwner[0];
ownersID.RemoveAt(Convert.ToInt32(arrOwner[1]));
But sometimes 2 or more threads end up with the same data.
Is there a better way to do this?
If you want to do it with a List, just add a little bit of locking:
private object _lock = new object();
private List<string> _list = new List<string>();
public void Add(string someStr)
{
lock(_lock)
{
if (_list.Any(s => s == someStr)) // already added (inside lock)
return;
_list.Add(someStr);
}
}
public void Remove(string someStr)
{
lock(_lock)
{
if (!_list.Any(s => s == someStr)) // already removed (inside lock)
return;
_list.Remove(someStr);
}
}
With that, no thread will be adding or removing anything while another thread does the same, so your list is protected from concurrent access. It also ensures that each value appears in the list only once. However, you can also achieve this with a ConcurrentDictionary<TKey, TValue>.
Update: I removed pre-lock check due to this MSDN thread safety statement
It is safe to perform multiple read operations on a List (read - multithreading), but issues can occur if the collection is modified while it's being read.
In a larger application you can use a thread-safe .NET queue (for example ConcurrentQueue<T>; the plain Queue<T> is not thread-safe) to communicate between threads.
The benefit of using such a queue is that you don't need to lock the object yourself, which decreases latency. Data travels from the main thread to thread A, thread B and thread C through the queue, with no explicit locking.
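To illustrate the queue idea for the original question (several threads that must each receive different IDs), here is a minimal sketch built on ConcurrentQueue<T>, which requires .NET 4 or later. The class and method names are invented for the example, and the workers simply record what they took instead of doing real work:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class OwnerIdDispatcher
{
    // Drains the ids with several workers and returns what each worker received.
    public static List<string>[] Drain(IEnumerable<string> ownerIds, int workerCount)
    {
        var pending = new ConcurrentQueue<string>(ownerIds);
        var taken = new List<string>[workerCount];

        Task[] workers = Enumerable.Range(0, workerCount).Select(w => Task.Factory.StartNew(() =>
        {
            var mine = new List<string>();   // private to this worker, no locking needed
            string id;
            // TryDequeue removes and returns an item atomically,
            // so no two workers can receive the same id.
            while (pending.TryDequeue(out id))
            {
                mine.Add(id);
            }
            taken[w] = mine;                 // each worker writes only its own slot
        })).ToArray();

        Task.WaitAll(workers);
        return taken;
    }
}
```

Fetching and removing happen in one atomic step here, which is exactly what the separate GetUserID and RemoveAt calls in the question cannot guarantee.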

Batch process all items in ConcurrentBag

I have the following use case. Multiple threads are creating data points which are collected in a ConcurrentBag. Every x ms a single consumer thread looks at the data points that came in since the last time and processes them (e.g. count them + calculate average).
The following code more or less represents the solution that I came up with:
private static ConcurrentBag<long> _bag = new ConcurrentBag<long>();
static void Main()
{
Task.Run(() => Consume());
var producerTasks = Enumerable.Range(0, 8).Select(i => Task.Run(() => Produce()));
Task.WaitAll(producerTasks.ToArray());
}
private static void Produce()
{
for (int i = 0; i < 100000000; i++)
{
_bag.Add(i);
}
}
private static void Consume()
{
while (true)
{
var oldBag = _bag;
_bag = new ConcurrentBag<long>();
var average = oldBag.DefaultIfEmpty().Average();
var count = oldBag.Count;
Console.WriteLine($"Avg = {average}, Count = {count}");
// Wait x ms
}
}
Is a ConcurrentBag the right tool for the job here?
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
Should I use Interlocked.Exchange() for switching the variables?
EDIT
I guess the above code was not really a good representation of what I'm trying to achieve. So here is some more code to show the problem:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
private readonly List<string> _logMessageBuffer;
public LogCollectorTarget()
{
_logMessageBuffer = new List<string>();
}
protected override void Write(LogEventInfo logEvent)
{
var logMessage = Layout.Render(logEvent);
lock (_logMessageBuffer)
{
_logMessageBuffer.Add(logMessage);
}
}
public string GetBuffer()
{
lock (_logMessageBuffer)
{
var messages = string.Join(Environment.NewLine, _logMessageBuffer);
_logMessageBuffer.Clear();
return messages;
}
}
}
The class's purpose is to collect logs so they can be sent to a server in batches. Every x seconds GetBuffer is called. This should get the current log messages and clear the buffer for new messages. It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program. That's why I wanted to use a ConcurrentBag as a buffer. But then I still need to switch or clear it when I call GetBuffer without losing any log messages that arrive during the switch.
Since you have a single consumer, you can work your way with a simple ConcurrentQueue, without swapping collections:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
private readonly ConcurrentQueue<string> _logMessageBuffer;
public LogCollectorTarget()
{
_logMessageBuffer = new ConcurrentQueue<string>();
}
protected override void Write(LogEventInfo logEvent)
{
var logMessage = Layout.Render(logEvent);
_logMessageBuffer.Enqueue(logMessage);
}
public string GetBuffer()
{
// How many messages should we dequeue?
var count = _logMessageBuffer.Count;
var messages = new StringBuilder();
while (count > 0 && _logMessageBuffer.TryDequeue(out var message))
{
messages.AppendLine(message);
count--;
}
return messages.ToString();
}
}
If memory allocations become an issue, you can instead dequeue them to a fixed-size array and call string.Join on it. This way, you're guaranteed to do only two allocations (whereas the StringBuilder could do many more if the initial buffer isn't properly sized):
public string GetBuffer()
{
// How many messages should we dequeue?
var count = _logMessageBuffer.Count;
var buffer = new string[count];
for (int i = 0; i < count; i++)
{
_logMessageBuffer.TryDequeue(out var message);
buffer[i] = message;
}
return string.Join(Environment.NewLine, buffer);
}
Is a ConcurrentBag the right tool for the job here?
It's the right tool for some jobs; whether it fits here really depends on what you are trying to do, and why. The example you have given is very simplistic and has no context, so it's hard to tell.
Is switching the bags the right way to achieve clearing the list for
new data points and then processing the old ones?
The answer is no, for probably many reasons. What happens if a thread writes to it, while you are switching it?
Is it safe to operate on oldBag or could I run into trouble when I
iterate over oldBag and a thread is still adding an item?
No, you have just copied the reference, this will achieve nothing.
Should I use Interlocked.Exchange() for switching the variables?
Interlocked methods are great things, however they will not help you with your current problem. They make a single operation on one variable atomic (mostly for integer types); they do not stop a producer from adding to the old bag after you have swapped it. You are really confused and you need to look up more thread-safe examples.
However, let's point you in the right direction. Forget about ConcurrentBag and those fancy classes. My advice is to start simple and use locking so you understand the nature of the problem.
If you want multiple tasks/threads to access a list, you can easily use the lock statement and guard access to the list/array so other nasty threads aren't modifying it.
Obviously the code you have written is a nonsensical example; I mean, you are just adding consecutive numbers to a list and getting another thread to average them. This hardly needs to be producer-consumer at all, and it would make more sense for it to just be synchronous.
At this point I would point you to better architectures that would allow you to implement this pattern, e.g. TPL Dataflow, but I fear this is just a learning exercise, and unfortunately you really need to do more reading on multithreading and try more examples before we can truly help you with a problem.
It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program.
Acquiring an uncontended lock is actually quite cheap. Quoting from Joseph Albahari's book:
You can expect to acquire and release a lock in as little as 20 nanoseconds on a 2010-era computer if the lock is uncontended.
Locking becomes expensive when it is contended. You can minimize the contention by reducing the work inside the critical region to the absolute minimum. In other words don't do anything inside the lock that can be done outside the lock. In your second example the method GetBuffer does a String.Join inside the lock, delaying the release of the lock and increasing the chances of blocking other threads. You can improve it like this:
public string GetBuffer()
{
string[] messages;
lock (_logMessageBuffer)
{
messages = _logMessageBuffer.ToArray();
_logMessageBuffer.Clear();
}
return String.Join(Environment.NewLine, messages);
}
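But it can be optimized even further. You could use the technique of your first example, and instead of clearing the existing List<string>, just swap it with a new list. Note that the field can then no longer be readonly, and it should no longer serve as the lock object, because threads could end up locking on different list instances. The version below assumes a dedicated private readonly object _locker field that Write also locks on:
public string GetBuffer()
{
    List<string> oldList;
    lock (_locker) // assumed dedicated lock object, also used by Write
    {
        oldList = _logMessageBuffer;
        _logMessageBuffer = new();
    }
    return String.Join(Environment.NewLine, oldList);
}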
Starting from .NET Core 3.0, the Monitor class has the property Monitor.LockContentionCount, that returns the number of times there was contention at the entry point of a lock. You could watch the delta of this property every second, and see if the number is concerning. If you get single-digit numbers, there is nothing to worry about.
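Watching the delta can be as simple as reading the property before and after an interval or a piece of work. The helper below is only a sketch with made-up names (ContentionProbe, Measure); note that the counter is process-wide, so it also includes contention from unrelated locks:

```csharp
using System;
using System.Threading;

public static class ContentionProbe
{
    // Returns how many lock contentions occurred in the whole process while 'work' ran.
    public static long Measure(Action work)
    {
        long before = Monitor.LockContentionCount;
        work();
        return Monitor.LockContentionCount - before;
    }
}
```

For periodic monitoring, the same subtraction could be done from a timer callback that remembers the previous reading.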
Touching some of your questions:
Is a ConcurrentBag the right tool for the job here?
No. The ConcurrentBag<T> is a very specialized collection intended for mixed producer-consumer scenarios, mainly object pools. You don't have such a scenario here. A ConcurrentQueue<T> is preferable to a ConcurrentBag<T> in almost all scenarios.
Should I use Interlocked.Exchange() for switching the variables?
Only if the collection was immutable. If the _logMessageBuffer was an ImmutableQueue<T>, then it would be excellent to swap it with Interlocked.Exchange. With mutable types you have no idea if the old collection is still in use by another thread, and for how long. The operating system can suspend any thread at any time for a duration of 10-30 milliseconds or even more (demo). So it's not safe to use lock-free techniques. You have to lock.
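For completeness, here is roughly what the immutable variant could look like. This is a sketch, not a drop-in replacement for the NLog target: the class name ImmutableLogBuffer is invented, and it assumes the System.Collections.Immutable package is available (it ships with .NET Core and later):

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

public class ImmutableLogBuffer
{
    private ImmutableQueue<string> _messages = ImmutableQueue<string>.Empty;

    public void Write(string message)
    {
        // Atomically replaces the field with a new queue that also contains 'message';
        // the helper retries internally if another thread changed the field meanwhile.
        ImmutableInterlocked.Enqueue(ref _messages, message);
    }

    public string GetBuffer()
    {
        // Swap in an empty queue. The returned snapshot is immutable,
        // so no writer can still be modifying it.
        ImmutableQueue<string> snapshot =
            Interlocked.Exchange(ref _messages, ImmutableQueue<string>.Empty);
        return string.Join(Environment.NewLine, snapshot);
    }
}
```

A writer that races with GetBuffer simply fails its compare-and-swap and retries against the fresh empty queue, so its message ends up in the next batch rather than being lost.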

Async Producer/Consumer

I have an instance of a class that is accessed from several threads. This class takes these calls and adds a tuple to a database. I need this to be done serially because, due to some db constraints, parallel threads could result in an inconsistent database.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
public void AddDData(string info)
{
Task t = new Task(() => { InsertDataIntoBase(info); });
_tasks.Add(t);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Task t;
if (_tasks.TryTake(out t))
{
t.Start();
t.Wait();
}
}
});
}
AddDData is the method that is called by multiple threads, and InsertDataIntoBase is a very simple insert that should take a few milliseconds.
The problem is that, for some reason that my lack of knowledge doesn't allow me to figure out, sometimes a task is called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and as I was suspecting, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how to properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();
public void AddDData(Data info)
{
_tasks.Add(info);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Data info;
if (_tasks.TryTake(out info))
{
InsertIntoDB(info);
}
}
});
}
Note that I got rid of the Tasks, as I'm relying on a synchronous InsertIntoDB call (it is inside a loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances are going into the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();
static void Main(string[] args)
{
InsertWorker();
GenerateItems(10, 1000);
_itemsToProcess.CompleteAdding();
}
private static void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_itemsToProcess.IsCompleted)
{
string t;
if (_itemsToProcess.TryTake(out t))
{
// Do whatever needs doing here
// Order should be guaranteed since BlockingCollection
// uses a ConcurrentQueue as a backing store by default.
// http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
Console.WriteLine(t);
}
}
});
}
private static void GenerateItems(int count, int maxDelayInMs)
{
Random r = new Random();
string[] items = new string[count];
for (int i = 0; i < count; i++)
{
items[i] = i.ToString();
}
// Simulate many threads adding items to the collection
items
.AsParallel()
.WithDegreeOfParallelism(4)
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.Select((x) =>
{
Thread.Sleep(r.Next(maxDelayInMs));
_itemsToProcess.Add(x);
return x;
}).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure that the caller is not reusing the same info instance for multiple calls, since that reference is captured in the Task lambda.
Based on the trace that you provided the only logical possibility is that you have called InsertWorker twice (or more). There are thus two background threads waiting for items to appear in the collection and occasionally they both manage to grab an item and begin executing it.
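One way to rule out a second consumer by construction is to create the worker in exactly one place, for example in the constructor of the class that owns the queue. The sketch below assumes .NET 4 or later; SerialInserter and its members are hypothetical names, and the insert delegate stands in for the real database call:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class SerialInserter<T> : IDisposable
{
    private readonly BlockingCollection<T> _items = new BlockingCollection<T>();
    private readonly Task _worker;

    public SerialInserter(Action<T> insert)
    {
        // The single consumer is created here and nowhere else,
        // so it cannot accidentally be started twice.
        _worker = Task.Factory.StartNew(() =>
        {
            // Blocks while the collection is empty; the loop ends
            // after CompleteAdding once everything has been drained.
            foreach (T item in _items.GetConsumingEnumerable())
            {
                insert(item);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any number of producer threads.
    public void Add(T item)
    {
        _items.Add(item);
    }

    // Stops accepting items and waits until everything queued has been inserted.
    public void Dispose()
    {
        _items.CompleteAdding();
        _worker.Wait();
        _items.Dispose();
    }
}
```

GetConsumingEnumerable also avoids the busy loop around TryTake, because the consumer sleeps until an item arrives.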

Multithreading with method containing 2 parameters to return dictionary

I wrote a method that goes through a list of files, extracts values from each file, stores them in a dictionary and returns the dictionary. This method goes through a large number of files, and I receive a ContextSwitchDeadLock error because of it. I have looked into this error, and I need to use a thread to fix it. I am brand new to threads and would like some help with threading.
I create a new thread and use a delegate to pass the parameters dictionary and fileNames into the method getValuesNEW(). I am wondering how I can return the dictionary. I have attached the method that I would like to call, as well as the code in the main program that creates the new thread. Any suggestions to improve my code will be greatly appreciated!
//dictionary and fileNames are manipulated a bit before use in thread
Dictionary<string, List<double>> dictionary = new Dictionary<string, List<double>>();
List<string> fileNames = new List<string>();
...
Thread thread = new Thread(delegate()
{
getValuesNEW(dictionary, fileNames);
});
thread.Start();
//This is the method that I am calling
public Dictionary<string, List<double>> getValuesNEW(Dictionary<string, List<double>> dictionary, List<string> fileNames)
{
foreach (string name in fileNames)
{
XmlReader reader = XmlReader.Create(name);
var collectValues = false;
string ertNumber = null;
while (reader.Read())
{
if ((reader.NodeType == XmlNodeType.Element))
{
if (reader.Name == "ChannelID" && reader.HasAttributes)
{
if (dictionary.ContainsKey(sep(reader.GetAttribute("EndPointChannelID"))))
{
//collectValues = sep(reader.GetAttribute("EndPointChannelID")) == ertNumber;
collectValues = true;
ertNumber = sep(reader.GetAttribute("EndPointChannelID"));
}
else
{
collectValues = false;
}
}
else if (collectValues && reader.Name == "Reading" && reader.HasAttributes)
{
dictionary[ertNumber].Add(Convert.ToDouble(reader.GetAttribute("Value")));
}
}
}
}
return dictionary;
}
Others have explained why the current approach isn't getting you anywhere. If you're using .NET 4 you can use the ConcurrentDictionary and Parallel.ForEach:
private List<double> GetValuesFromFile(string fileName)
{
    //TBD
}
private void RetrieveAllFileValues()
{
    IEnumerable<string> files = ...;
    // Keyed by file name, so the key type is string.
    var dict = new ConcurrentDictionary<string, List<double>>();
    Parallel.ForEach(files, file =>
    {
        var values = GetValuesFromFile(file);
        // ConcurrentDictionary has no public Add; the indexer (or TryAdd) is thread-safe.
        dict[file] = values;
    });
}
You don't need to return the dictionary: the main thread already has a reference to it and will see the changes that your thread makes. All the main thread has to do is wait until the worker thread is done (for example using thread.Join()).
However, doing things this way, you get no benefit from multithreading because nothing is done in parallel. What you can do is have multiple threads, and multiple dictionaries (one per thread). When everyone's done, the main thread can put all those dictionaries together.
The reason you don't want multiple threads accessing the same dictionary is that the Dictionary class is not thread-safe: its behavior is undefined if more than one thread uses it at the same time. However, you could use a ConcurrentDictionary, which is thread-safe. This means that the ConcurrentDictionary synchronizes access internally (writes take fine-grained locks, while reads are lock-free), so concurrent operations cannot corrupt it.
Which of the two techniques is faster depends on how often your threads would be accessing the shared dictionary: if they access it rarely then a ConcurrentDictionary will work well. If they access it very often then it might be preferable to use multiple dictionaries and merge in the end. In your case since there is file I/O involved, I suspect that the ConcurrentDictionary approach will work best.
So, in short, change getValuesNEW to this:
//This is the method that I am calling
public void getValuesNEW(ConcurrentDictionary<string, List<double>> dictionary, List<string> fileNames)
{
foreach (string name in fileNames)
{
// (code in there is unchanged)
}
// no need to return the dictionary
//return dictionary;
}
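For comparison, the "one dictionary per thread, merge at the end" alternative described above could be sketched with the thread-local overload of Parallel.ForEach. FileValueCollector and the readFile delegate are hypothetical; the delegate stands in for the XML parsing and simply yields (key, value) pairs, which simplifies the original logic of only collecting values for keys that already exist:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FileValueCollector
{
    // 'readFile' is a hypothetical parser returning the (key, value) pairs found in one file.
    public static Dictionary<string, List<double>> Collect(
        IEnumerable<string> fileNames,
        Func<string, IEnumerable<KeyValuePair<string, double>>> readFile)
    {
        var merged = new Dictionary<string, List<double>>();
        var mergeLock = new object();

        Parallel.ForEach(
            fileNames,
            // localInit: each worker starts with its own private dictionary.
            () => new Dictionary<string, List<double>>(),
            // body: fill the private dictionary without any locking.
            (fileName, loopState, local) =>
            {
                foreach (var pair in readFile(fileName))
                {
                    List<double> values;
                    if (!local.TryGetValue(pair.Key, out values))
                    {
                        local[pair.Key] = values = new List<double>();
                    }
                    values.Add(pair.Value);
                }
                return local;
            },
            // localFinally: merge each private dictionary under one short lock.
            local =>
            {
                lock (mergeLock)
                {
                    foreach (var entry in local)
                    {
                        List<double> values;
                        if (!merged.TryGetValue(entry.Key, out values))
                        {
                            merged[entry.Key] = values = new List<double>();
                        }
                        values.AddRange(entry.Value);
                    }
                }
            });

        return merged;
    }
}
```

The lock is taken only once per worker rather than once per value, but the order of the values within each list depends on scheduling and is not deterministic.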
If you want to wait for your thread to finish, you can call Thread.Join after Thread.Start. To get the result of your thread, create a class variable (or something similar) that both your main program and your thread can use. However, I don't see the point of a thread here, unless you want to process all the files in parallel.

Threads interaction (data from one thread to another) c#

I need to pass information from the thread that scans data to the thread that records the information (writes it to an xml file).
It should look something like this:
Application.Run() - complete
Scanning thread - complete
Writing to xml thread - ???
UI update thread - I think I did it
And this is what I've got so far:
private void StartButtonClick(object sender, EventArgs e)
{
if (FolderPathTextBox.Text == String.Empty || !Directory.Exists(FolderPathTextBox.Text)) return;
{
var nodeDrive = new TreeNode(FolderPathTextBox.Text);
FolderCatalogTreeView.Nodes.Add(nodeDrive);
nodeDrive.Expand();
var t1 = new Thread(() => AddDirectories(nodeDrive));
t1.Start();
}
}
private void AddDirectories(TreeNode node)
{
string strPath = node.FullPath;
var dirInfo = new DirectoryInfo(strPath);
DirectoryInfo[] arrayDirInfo;
FileInfo[] arrayFileInfo;
try
{
arrayDirInfo = dirInfo.GetDirectories();
arrayFileInfo = dirInfo.GetFiles();
}
catch
{
return;
}
//Write data to xml file
foreach (FileInfo fileInfo in arrayFileInfo)
{
WriteXmlFolders(null, fileInfo);
}
foreach (DirectoryInfo directoryInfo in arrayDirInfo)
{
WriteXmlFolders(directoryInfo, null);
}
foreach (TreeNode nodeFil in arrayFileInfo.Select(file => new TreeNode(file.Name)))
{
FolderCatalogTreeView.Invoke(new ThreadStart(delegate { node.Nodes.Add(nodeFil); }));
}
foreach (TreeNode nodeDir in arrayDirInfo.Select(dir => new TreeNode(dir.Name)))
{
FolderCatalogTreeView.Invoke(new ThreadStart(delegate
{node.Nodes.Add(nodeDir);
}));
StatusLabel.BeginInvoke(new MethodInvoker(delegate
{
//UI update...some code here
}));
AddDirectories(nodeDir);
}
}
private void WriteXmlFolders(DirectoryInfo dir, FileInfo file)
{
    //writing information into the file...some code here
}
How do I pass data from the AddDirectories thread (a recursive method) to the WriteXmlFolders thread?
Here is a generic mechanism by which one thread generates data that another thread consumes. No matter which approach (read: ready-made classes) you use, the underlying principle stays the same. There are many synchronization classes in the System.Threading namespace that could be used, but these are the most appropriate for this scenario:
AutoResetEvent - this allows a thread to go to sleep (without consuming CPU) until another thread wakes it up. The 'auto' part means that once the thread wakes up, the event is reset, so the next WaitOne() call will put it to sleep again without the need to reset anything.
ReaderWriterLock or ReaderWriterLockSlim (the second is recommended if you are using .NET 4) - this allows just one thread at a time to lock for writing data, while multiple threads can read the data. In this particular case there is only one reading thread, but the approach would not be different if there were many.
// The mechanism for waking up the second thread once data is available
AutoResetEvent _dataAvailable = new AutoResetEvent(false); // starts unsignaled
// The mechanism for making sure that the data object is not overwritten while it is being read.
ReaderWriterLockSlim _readWriteLock = new ReaderWriterLockSlim();
// The object that contains the data (you might use a collection or something similar, but anything works)
object _data = null;
void FirstThread()
{
while (true)
{
// do something to calculate the data, but do not store it in _data
// create a lock so that the _data field can be safely updated.
_readWriteLock.EnterWriteLock();
try
{
// assign the data (add into the collection etc.)
_data = ...;
// notify the other thread that data is available
_dataAvailable.Set();
}
finally
{
// release the lock on data
_readWriteLock.ExitWriteLock();
}
}
}
void SecondThread()
{
while (true)
{
object local; // this will hold the data received from the other thread
// wait for the other thread to provide data
_dataAvailable.WaitOne();
// create a lock so that the _data field can be safely read
_readWriteLock.EnterReadLock();
try
{
// read the data (take it from the collection etc.)
local = _data;
}
finally
{
// release the lock on data
_readWriteLock.ExitReadLock();
}
// now do something with the data
}
}
In .NET 4 it is possible to avoid using ReaderWriterLock and use one of the concurrency-safe collections such as ConcurrentQueue, which internally makes sure that reading and writing are thread-safe. The AutoResetEvent is still needed though.
.NET 4 provides a mechanism that could be used to avoid the need of even AutoResetEvent - BlockingCollection - this class provides methods for a thread to sleep until data is available. MSDN page contains example code on how to use it.
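A minimal sketch of that BlockingCollection approach, applied to the scan-then-write scenario from the question, might look like this. ScanToWriterPipeline is an invented name, the scan sequence stands in for the recursive directory walk, and the write delegate stands in for WriteXmlFolders:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ScanToWriterPipeline
{
    // 'scan' produces entries (e.g. file paths); 'write' consumes them (e.g. appends to the XML file).
    public static void Run(IEnumerable<string> scan, Action<string> write)
    {
        // Bounded so the scanner cannot run arbitrarily far ahead of the writer.
        using (var entries = new BlockingCollection<string>(boundedCapacity: 100))
        {
            var writer = new Thread(() =>
            {
                // Sleeps while the collection is empty; the loop ends after CompleteAdding.
                foreach (string entry in entries.GetConsumingEnumerable())
                {
                    write(entry);
                }
            });
            writer.Start();

            try
            {
                foreach (string entry in scan)
                {
                    entries.Add(entry);   // blocks only when the collection is full
                }
            }
            finally
            {
                entries.CompleteAdding(); // tell the writer that no more data is coming
                writer.Join();
            }
        }
    }
}
```

Since BlockingCollection uses a ConcurrentQueue by default, the writer receives the entries in the order in which the scanner added them.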
In case you use it as the answer
Take a look at the producer-consumer pattern.
BlockingCollection Class
How to: Implement Various Producer-Consumer Patterns
