I need a linked list that can have items added at both ends. The list will hold data to be shown in a trend viewer. Because there is a huge amount of data, I need to show it before it has been completely read, so what I want to do is read a block of data that I know has already been written, and while I read that block, have two threads filling the two ends of the collection:
I thought of using LinkedList, but the documentation says it does not support this scenario. Any ideas on something in the Framework that can help me, or will I have to develop my custom list from scratch?
Thanks in advance.
EDIT: The main idea of the solution is to do it without locking anything, because I'm reading a piece of the list that is not going to be changed while writing happens at other places. I mean, the read thread will only read one chunk (from A to B), a section that has already been written completely. When it finishes, and other chunks have been completely written, the reader will read those chunks while the writers write new data.
See the updated diagram:
If you are on .NET 4 you can use two ConcurrentQueue<T> instances: one for the left side and one for the right side.
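A minimal sketch of that idea (the class and variable names are illustrative): each writer owns one queue, and ConcurrentQueue<T> supports Enqueue and TryDequeue running concurrently, so the reader can drain a side while it is still being filled.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class TwoSidedFeed
{
    static void Main()
    {
        var leftSide = new ConcurrentQueue<double>();
        var rightSide = new ConcurrentQueue<double>();

        //Two writers, each filling its own side.
        Task left = Task.Run(() => { for (int i = 1; i <= 100; i++) leftSide.Enqueue(-i); });
        Task right = Task.Run(() => { for (int i = 1; i <= 100; i++) rightSide.Enqueue(i); });

        //TryDequeue is safe to call while the writers are still running;
        //here we simply wait first so the sample output is complete.
        Task.WaitAll(left, right);
        double value;
        while (leftSide.TryDequeue(out value)) Console.WriteLine("left: " + value);
        while (rightSide.TryDequeue(out value)) Console.WriteLine("right: " + value);
    }
}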
If I understand correctly, you have a linked list to which you are adding data at the beginning and the end, and you never add or remove anywhere else. If that is the case you do not have to worry about threading, since the other threads will never interfere with the section you are reading.
Simply do something like this:
//Everything between first and last is thread safe since
//the other threads only add before and after.
LinkedListNode<object> first = myList.First;
LinkedListNode<object> current = first;
LinkedListNode<object> last = myList.Last;
bool done = false;

if (current == null) return; //empty list

do
{
    //do stuff
    if (current == last) done = true;
    current = current.Next;
} while (!done && current != null);
After you are done with this section you can do the same with two more sections: from the new myList.First to first, and from last to the new myList.Last.
You could use the linked list and just use normal .NET threading constructs like the lock keyword in order to protect access to the list.
Any custom list you developed would probably do something like that anyway.
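A minimal sketch of that approach, assuming a shared LinkedList<double> guarded by a dedicated lock object (all names here are illustrative):

using System.Collections.Generic;
using System.Linq;

public class SynchronizedTrendList
{
    private readonly object _sync = new object();
    private readonly LinkedList<double> _list = new LinkedList<double>();

    public void AddLeft(double value)  { lock (_sync) _list.AddFirst(value); }
    public void AddRight(double value) { lock (_sync) _list.AddLast(value); }

    //Copy under the lock so the caller can read the snapshot without holding it.
    public double[] Snapshot()
    {
        lock (_sync) return _list.ToArray();
    }
}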
I would recommend considering another approach, with a single data structure to persist the incoming data; this way you can keep the order of the incoming data messages.
For instance, you can use a blocking queue; this SO post has a nice example: Creating a blocking Queue in .NET?.
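Note that on .NET 4 and later the framework already ships a blocking queue, BlockingCollection<T>, so a hand-rolled one may not be needed; a minimal sketch (the message strings are illustrative):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var queue = new BlockingCollection<string>();

        //Producer: items come out in the same order they were added.
        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < 5; i++) queue.Add("message " + i);
            queue.CompleteAdding(); //signal that no more data will arrive
        });

        //Consumer: blocks until an item is available and
        //finishes when adding is marked complete.
        foreach (string message in queue.GetConsumingEnumerable())
            Console.WriteLine(message);

        producer.Wait();
    }
}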
Why not use the LinkedList class?
The documentation says it is not thread safe, so you have to synchronize access to the list yourself, but that is true of almost any data structure accessed by multiple threads.
Performance should be quite good; here is what MSDN says about inserting nodes at any position:
LinkedList<T> provides separate nodes of type LinkedListNode<T>, so insertion and removal are O(1) operations.
You just have to lock read and insert operations with the lock construct.
EDIT
OK, I think I understand what you want: a list-like data structure that is split into chunks of items, so that you can write and read chunks independently without locking the whole list.
I suggest using a LinkedList that holds your chunks of data items.
The chunks themselves can be represented as simple List instances, or as LinkedList instances as well.
You have to lock only the access to the outer LinkedList.
Your writer threads each fill a private List with n items at a time. When finished, the writer locks the LinkedList and adds its private list of data items to it.
The reader thread locks the LinkedList, takes one chunk, and releases the lock. Now it can process n data items without locking them.
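A sketch of that layout (the class and member names are illustrative): only the outer list is ever locked, and a chunk never changes once it has been published.

using System.Collections.Generic;

public class ChunkedBuffer<T>
{
    private readonly object _sync = new object();
    private readonly LinkedList<List<T>> _chunks = new LinkedList<List<T>>();

    //Writers fill a private List<T>, then publish it with one short lock.
    public void PublishLeft(List<T> chunk)  { lock (_sync) _chunks.AddFirst(chunk); }
    public void PublishRight(List<T> chunk) { lock (_sync) _chunks.AddLast(chunk); }

    //Reader: advance the cursor under the lock, then process the returned
    //chunk outside the lock (a published chunk is complete and immutable).
    public List<T> NextChunk(ref LinkedListNode<List<T>> cursor)
    {
        lock (_sync)
        {
            LinkedListNode<List<T>> next = (cursor == null) ? _chunks.First : cursor.Next;
            if (next == null) return null; //no new chunk yet; keep the cursor
            cursor = next;
            return next.Value;
        }
    }
}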
I have a List that I am adding values to every interval seconds, from code running in a thread:
var point = GetPoint(presentValue);
DataSource[itemIndex].Add(point);
In an event handler I then read values from that list; to be exact, I search for the closest value to my target. I create a local copy of that list to work with, but sometimes I get the exception
"Destination array not long enough" when creating this copy.
I've figured out that this must mean the list was changed while the new list was being created, so it's got something to do with the code above. After a bit of research I found out about thread safety and the "lock" keyword, which I then tried to use. I tried locking on the list itself, on the list's SyncRoot, and on a custom sync object, but the error still occurred.
lock (SyncHelper.TrendDataPointLock)
{
    var point = GetPoint(presentValue);
    DataSource[itemIndex].Add(point);
}
and
lock (SyncHelper.TrendDataPointLock)
{
    points = new List<DataPoint>(ActualPoints);
}
I know that I'm not fully familiar with all aspects of thread safety, but after looking at many different approaches I still can't seem to make this work.
1: Any advice on how to fix my error?
2: Do I need a lock statement on every access to that list in order to be sure that one thread will pause until the other releases the lock?
3: If not 2, does locking on the list itself make every thread block, whether or not it also has a lock statement around its access to the list? In that case locking around the Add statement "should" fix my problem.
EDIT:
DataSource is a Dictionary<int, List<DataPoint>>.
ActualPoints is a reference to the list DataSource[itemIndex].
The only places where I edit this list are in the code above and when I clear the list.
The points variable is only there for accessing certain indexes to find the closest value to my target, but the index is always lower than points.Count; to be exact, I binary search through the list, so I'm starting in the middle. The application only crashes when accessing ActualPoints to create the points list, so everything after that shouldn't make a difference.
Try a collection that is already thread safe. Check out Thread-Safe Collections.
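On question 2: yes, every access to the shared list, including the place where you clear it, must go through the same lock object; one unguarded access is enough to keep the exception occurring. A sketch reusing the names from the question:

//Writer thread: guarded by the shared lock object.
lock (SyncHelper.TrendDataPointLock)
{
    var point = GetPoint(presentValue);
    DataSource[itemIndex].Add(point);
}

//The clearing code must take the SAME lock; an unguarded
//Clear() is a common reason the error keeps coming back.
lock (SyncHelper.TrendDataPointLock)
{
    DataSource[itemIndex].Clear();
}

//Reader: copy under the lock, then search the copy outside it.
List<DataPoint> points;
lock (SyncHelper.TrendDataPointLock)
{
    points = new List<DataPoint>(ActualPoints);
}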
I've read on here that iterating through a dictionary is generally considered abusing the data structure, and that you should use something else.
However, I'm having trouble coming up with a better way to accomplish what I'm trying to do.
When a tag is scanned I use its ID as the key and the value is a list of zones it was seen in. About every second I check to see if a tag in my dictionary has been seen in two or more zones and if it has, queue it up for some calculations.
for (int i = 0; i < TagReads.Count; i++)
{
    var tag = TagReads.ElementAt(i).Value;
    if (tag.ZoneReads.Count > 1)
    {
        Report.Tags.Enqueue(tag);
        Boolean success = false;
        do
        {
            success = TagReads.TryRemove(tag.Epc, out TagInfo outTag);
        } while (!success);
    }
}
I feel like a dictionary is the correct choice here because there can be many tags to look up, but something about this code nags me as being poor.
As far as efficiency goes, the speed is fine for now in our small-scale test environment, but I don't have a good way to find out how it will behave at massive scale until it is put to use, hence my concern.
I believe that there's an alternative approach which doesn't involve iterating a big dictionary.
First of all, you need to create a HashSet<T> in which you'll store those tags that have been detected in two or more zones. We'll call it tagsDetectedInTwoOrMoreZones.
And you may refactor your code flow as follows (a code sketch is given below):
A. Whenever you detect a tag in one zone...
Add the tag and the zone to the main dictionary.
Acquire an exclusive lock on tagsDetectedInTwoOrMoreZones to avoid undesired behavior in B.
Check whether the key now has more than one zone. If it does, add it to tagsDetectedInTwoOrMoreZones.
Release the lock on tagsDetectedInTwoOrMoreZones.
B. Whenever you need to process a tag which has been detected in more than one zone...
Acquire an exclusive lock on tagsDetectedInTwoOrMoreZones so that no more than one thread tries to process the tags.
Iterate tagsDetectedInTwoOrMoreZones.
Use each tag in tagsDetectedInTwoOrMoreZones to get its zones from your current dictionary.
Clear tagsDetectedInTwoOrMoreZones.
Release the exclusive lock on tagsDetectedInTwoOrMoreZones.
Now you iterate only those tags that you already know have been detected in more than one zone!
In the long run you can even make per-region partitions, so that you never get a tagsDetectedInTwoOrMoreZones set with too many items to iterate, and each set can be consumed by a dedicated thread!
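A rough sketch of steps A and B. TagInfo's shape is assumed from the question; the tracker class, the reportQueue field (standing in for the question's Report.Tags), and the method names are illustrative.

using System.Collections.Concurrent;
using System.Collections.Generic;

public class TagTracker
{
    private readonly object _setLock = new object();
    private readonly HashSet<string> tagsDetectedInTwoOrMoreZones = new HashSet<string>();
    private readonly ConcurrentDictionary<string, TagInfo> TagReads =
        new ConcurrentDictionary<string, TagInfo>();
    private readonly ConcurrentQueue<TagInfo> reportQueue = new ConcurrentQueue<TagInfo>();

    //A. Called on every scan: record the zone, flag the tag if needed.
    public void OnTagRead(string epc, string zone)
    {
        TagInfo tag = TagReads.GetOrAdd(epc, key => new TagInfo { Epc = key });
        lock (tag.ZoneReads)
        {
            if (!tag.ZoneReads.Contains(zone)) tag.ZoneReads.Add(zone);
            if (tag.ZoneReads.Count > 1)
                lock (_setLock) tagsDetectedInTwoOrMoreZones.Add(epc);
        }
    }

    //B. Called about every second: only flagged tags are visited,
    //so the big dictionary is never iterated.
    public void ProcessFlaggedTags()
    {
        List<string> flagged;
        lock (_setLock)
        {
            flagged = new List<string>(tagsDetectedInTwoOrMoreZones);
            tagsDetectedInTwoOrMoreZones.Clear();
        }

        foreach (string epc in flagged)
        {
            TagInfo tag;
            if (TagReads.TryRemove(epc, out tag))
                reportQueue.Enqueue(tag);
        }
    }
}

//Hypothetical minimal shape of the question's TagInfo.
public class TagInfo
{
    public string Epc;
    public List<string> ZoneReads = new List<string>();
}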
If you are going to do a lot of lookups in your code and only sometimes iterate through the whole thing, then I think the dictionary use is OK. I would like to point out, though, that your use of ElementAt is more alarming. ElementAt performs very poorly when used on objects that do not implement IList<T>, and the dictionary does not. For an IEnumerable<T> that does not implement IList<T>, the nth element is found through iteration, so your for loop iterates the dictionary once for each element. You are better off with a standard foreach.
I feel like this is a good use for a dictionary, giving you good access speed when you want to check if an ID is already in the collection.
I want to limit the size of a BlockingCollection. If I want to add another item and the collection is full, the oldest must be removed. Is there some class specific to this task, or is my solution OK?
BlockingCollection<string> collection = new BlockingCollection<string>(10);
string newString = "";
//Not an elegant solution?
if (collection.Count == collection.BoundedCapacity)
{
    string dummy;
    collection.TryTake(out dummy);
}
collection.Add(newString);
EDIT1: Similar question here: ThreadSafe FIFO List with Automatic Size Limit Management
What you are describing is essentially a bounded FIFO buffer that discards the oldest item (related to an LRU cache, but evicting by age rather than by use). There is no implementation that I know of in the standard libraries, but it would not be hard to create. Look at this C++ implementation for some clues.
Edit
Try this one from code project
Your solution will function correctly, but it is not thread safe; BlockingCollection<T> does not provide a mechanism to handle this directly.
Your solution may still block (if another thread calls Add() between your TryTake and your Add) or remove an extra item (if another thread removes one while you're also removing).
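If the remove-oldest-then-add step needs to be atomic, one option is to wrap a plain Queue<T> in a lock instead of using BlockingCollection<T>; a minimal sketch (the class name is illustrative):

using System.Collections.Generic;

public class DroppingQueue<T>
{
    private readonly object _sync = new object();
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly int _capacity;

    public DroppingQueue(int capacity) { _capacity = capacity; }

    public void Add(T item)
    {
        lock (_sync)
        {
            //Evict the oldest entry and enqueue in one critical section,
            //so no other thread can interleave between the two steps.
            if (_queue.Count == _capacity)
                _queue.Dequeue();
            _queue.Enqueue(item);
        }
    }

    public bool TryTake(out T item)
    {
        lock (_sync)
        {
            if (_queue.Count == 0) { item = default(T); return false; }
            item = _queue.Dequeue();
            return true;
        }
    }
}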
This is an algorithmic question.
I have got Dictionary<object,Queue<object>>. Each queue contains one or more elements in it. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);
It is easy to do it in a loop (not foreach, of course, since you cannot remove entries while enumerating), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in dictionary is kind of a hash of the object, the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in associated queue.
Update:
It may be important to know that in the regular case there are just a few duplicates in a large set of objects, let's assume 1% or less. So it could possibly be faster to leave the dictionary as is and create a new one from scratch with just the selected elements from the first one, and then delete the first dictionary completely. I think it depends on the computational complexity of the Dictionary class's methods used in the particular algorithms.
I really want to look at this problem on a theoretical level, because as a teacher I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do. The question is which approach is the best, the fastest.
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();

foreach (var item in itemsWithOneEntry) {
    dict.Remove(item);
}
Instead of trying to optimize the traversal of the collection, how about optimizing the content of the collection so that it only ever contains the duplicates? That would require changing your collection-building algorithm to something like this:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();

foreach (var item in original)
{
    if (possibleDuplicates.ContainsKey(item))
    {
        //second occurrence: promote to a real duplicate
        duplicates.Add(item, new Queue<object>(new[] { possibleDuplicates[item], item }));
        possibleDuplicates.Remove(item);
    }
    else if (duplicates.ContainsKey(item))
    {
        duplicates[item].Enqueue(item); //third or later occurrence
    }
    else
    {
        possibleDuplicates.Add(item, item); //first occurrence
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then, when you create a new queue, subscribe to its ContentsChanged event with a handler that checks whether the queue has exactly one item. Based on that, you can either insert the queue into the index container or remove it.
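A minimal sketch of such a wrapper (ObservableQueue is not a framework class; the event shape here is illustrative):

using System;
using System.Collections.Generic;

public class ObservableQueue<T>
{
    private readonly Queue<T> _inner = new Queue<T>();

    //Raised whenever the number of items changes; passes the new count.
    public event Action<ObservableQueue<T>, int> ContentsChanged;

    public int Count { get { return _inner.Count; } }

    public void Enqueue(T item)
    {
        _inner.Enqueue(item);
        RaiseContentsChanged();
    }

    public T Dequeue()
    {
        T item = _inner.Dequeue();
        RaiseContentsChanged();
        return item;
    }

    private void RaiseContentsChanged()
    {
        var handler = ContentsChanged;
        if (handler != null) handler(this, _inner.Count);
    }
}

A handler can then maintain the index container:

//Index of queues that currently hold exactly one item.
var singletons = new HashSet<ObservableQueue<object>>();
var queue = new ObservableQueue<object>();
queue.ContentsChanged += (q, count) =>
{
    if (count == 1) singletons.Add(q);
    else singletons.Remove(q);
};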
If Peek returns the next object in a queue, is there a method I can use to get a specific object? For example, I want to find the third object in the queue and change one of its values.
Right now I'm just doing a foreach through the queue, which might be the best solution, but I didn't know if there was something special you can use with Peek, e.g. Queue.Peek(2).
If you want to access elements directly (as an O(1) operation), use an array instead of a queue, because a queue serves a different purpose (FIFO).
A random-access operation on a queue is O(n), because it needs to iterate over the elements of the collection; that makes it sequential access rather than direct random access.
Then again, since you're using C#, you can use queue.ElementAt(n) from System.Linq (since Queue<T> implements IEnumerable<T>), but that won't be O(1); it will still iterate over the elements.
Although this is still O(n), it's certainly easier to read if you use the LINQ extension methods ElementAt() or ElementAtOrDefault(); these are extensions of IEnumerable<T>, which Queue<T> implements.
using System.Linq;
Queue<T> queue = new Queue<T>();
T result;
result = queue.ElementAt(2);
result = queue.ElementAtOrDefault(2);
Edit
If you do go with the other suggestions of converting your Queue to an array just for this operation, you need to decide whether the likely size of your queue, and the distance of the index you'll be looking for from the start of the queue, justify the O(n) operation of calling .ToArray() before indexing into it, not to mention the space requirements of creating a secondary storage location for it.
foreach through a queue. Kind of a paradox.
However, if you can foreach over it, it is an IEnumerable, so the usual LINQ extensions apply:
queue.Skip(1).FirstOrDefault()
or
queue.ElementAt(1)
You could do something like this as a one off:
object thirdObjectInQueue = queue.ToArray()[2];
I wouldn't recommend using it a lot, however, as it copies the whole queue to an array, thereby iterating over the whole queue anyway.