I have a QueueList table:
{Id, Queue_Instance_ID, Parent_Queue_Instance_ID, Child_Queue_Instance_ID}
What would be the most efficient way to stuff this into a LinkedList<QueueList>? Do you think I can go lower than O(n^2)?
O(n) is better than O(n^2), right? ;)
I assume that the parent and child id are actually the previous and next id in the list.
First put the items in a dictionary so that you can easily look them up by QueueInstanceId. In this loop you also locate the first item. Then you just add the items to the linked list, using the dictionary to get the next item each time. Example in C#:
Dictionary<int, QueueList> lookup = new Dictionary<int, QueueList>();
QueueList first = null;

// Index every item by its id, and remember the head (the item with no parent).
foreach (QueueList item in source) {
    lookup.Add(item.QueueInstanceId, item);
    if (item.ParentQueueInstanceId == -1) {
        first = item;
    }
}

// Follow the child ids from the head, appending each item to the linked list.
LinkedList<QueueList> list = new LinkedList<QueueList>();
do {
    list.AddLast(first);
} while (lookup.TryGetValue(first.ChildQueueInstanceId, out first));
Adding items to a dictionary and getting them by key are O(1) operations, and each loop runs once per item in the source list, so the whole thing is an O(n) operation.
Thinking about this a little, you can do it in O(n) time, although it takes some extra machinery that adds constant overhead (still O(n)).
You could use some sort of universal hashing scheme so that you can build a good hashtable. Essentially, hash by the instance id.
- Take each queue and throw it into a doubly linked node of some sort.
- Insert each node into the hashtable.
- When you insert, check if the parent and/or child are already in the hashtable and link accordingly.
- When you are done, start at the head (which you should be able to figure out in the first pass), walk the list and add each item to your LinkedList, or just use the resulting doubly linked list.
This takes one pass that is O(n); the hashtable takes O(n) to initialize and inserts are O(1). A sketch of the idea is below.
The problem left for you is choosing a good hashtable implementation that will give you those results.
Also, you have to consider the memory overhead, as I do not know how many results you expect. Hopefully few enough to keep in memory.
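A minimal sketch of that idea in C# might look like this. The Node class and Build method are just illustrative names; it assumes QueueList exposes QueueInstanceId, ParentQueueInstanceId and ChildQueueInstanceId, with -1 meaning "no parent", plus the usual System.Collections.Generic using:

class Node {
    public QueueList Item;
    public Node Prev;
    public Node Next;
}

LinkedList<QueueList> Build(IEnumerable<QueueList> source) {
    var nodes = new Dictionary<int, Node>();
    Node head = null;

    foreach (QueueList item in source) {
        var node = new Node { Item = item };
        nodes[item.QueueInstanceId] = node;

        // Link to the parent (previous) node if it has already been seen.
        if (nodes.TryGetValue(item.ParentQueueInstanceId, out Node parent)) {
            parent.Next = node;
            node.Prev = parent;
        }

        // Link to the child (next) node if it has already been seen.
        if (nodes.TryGetValue(item.ChildQueueInstanceId, out Node child)) {
            node.Next = child;
            child.Prev = node;
        }

        // Assumption: -1 marks the head of the chain.
        if (item.ParentQueueInstanceId == -1) {
            head = node;
        }
    }

    // One more O(n) walk from the head copies the chain into a LinkedList<QueueList>.
    var result = new LinkedList<QueueList>();
    for (Node current = head; current != null; current = current.Next) {
        result.AddLast(current.Item);
    }
    return result;
}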
I don't know of (or think there is) an SQL-query-based solution, but my SQL knowledge is very limited.
Hope this helps!
Related
I have read several different sources over the years that indicate that when storing a collection of data, a List<T> is efficient when you want to insert objects, and an IEnumerable<T> is best for enumerating over a collection.
In LINQ to Entities, there is the AsEnumerable() function, which returns an IEnumerable<T>, but it will not resolve the SQL created by the LINQ statement until you start enumerating over it.
What if I want to store objects from LINQ to Entities in a collection and then query on that collection later?
Using this strategy causes the SQL to be resolved by adding a WHERE clause and querying each record separately. I specifically don't want to do that because I'm trying to limit network chatter:
var myDataToLookup = context.MyData.AsEnumerable();
foreach (var myOtherDatum in myOtherDataList)
{
    // gets a single record from the database on each iteration
    var myDatum = myDataToLookup.SingleOrDefault(w => w.key == myOtherDatum.key);
}
How do I resolve the SQL upfront so myDataToLookup actually contains the data in memory? I've tried ToArray:
var myDataToLookup = context.MyData.ToArray();
But I recently learned that it actually uses more memory than ToList does:
Is it better to call ToList() or ToArray() in LINQ queries?
Should I use a join instead?
var myCombinedData = from o in myOtherDataList
                     join d in myDataToLookup on o.key equals d.key
                     select new { myOtherData = o, myData = d };
Should I use ToDictionary and store my key as the key to the dictionary? Or am I worrying too much about this?
If you're using LINQ to Entities then you should not worry about whether ToArray is slower than ToList. There is almost no difference between them in terms of performance, and LINQ to Entities itself will be the bottleneck anyway.
Regarding a dictionary: it is a structure optimized for reads by key. There is an additional cost when adding new items, though. So if you will read by key a lot and add new items only occasionally, that's the way to go. But to be honest, you probably should not bother at all; if the data set is small, you won't see a difference.
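If you do go the dictionary route, a minimal sketch could look like this (names are taken from the question; it assumes key is unique within MyData):

// Materialize once: one query, the data is now in memory keyed by `key`.
var myDataToLookup = context.MyData.ToDictionary(d => d.key);

foreach (var myOtherDatum in myOtherDataList)
{
    if (myDataToLookup.TryGetValue(myOtherDatum.key, out var myDatum))
    {
        // no further database round trips here
    }
}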
Think of IEnumerable, ICollection and IList/IDictionary as a hierarchy, each one building on the previous one. Simply put, IEnumerable gives you iteration only. ICollection adds counting, and IList then gives richer functionality, including adding, removing and accessing elements by index (List<T> also adds searching via lambda expressions). Dictionaries provide efficient access via a key. Arrays add a level of restriction and complexity on top of lists and are much more static.
So, the answer then depends on your requirements. If it is appropriate to hold the data in memory and you need to frequently re-query it then I usually convert the Entity result to a List. This also loads the data.
If access via a set of keys is paramount then I use a Dictionary.
I cannot remember the last time I used an array, except for infrequent and very specific purposes.
So, not a direct answer, but as your question and the other replies indicate, there isn't a single answer and the solution will be a compromise.
When I code and measure performance and data carried over the network, here is how I look at things based on your example above.
Let's say your result returns 100 records. Your code has now run a query on the server and performed 1 second of processing (I made the number up for sake of argument).
Then you need to convert it to a list, which is going to be 1 more second of processing. Then you want to find all records that have a value of 1. The code will now loop through the entire list to find the values with 1 and then return the result. That is, let's say, another 1 second of processing, and it finds 10 records.
Your network is going to carry over 10 records that took 3 seconds to process.
If you move your logic to your data layer and make your query search right away for the records that you want, you can save 2 seconds of processing and still only carry 10 records across the network. The bonus is also that you can just use IEnumerable<T> as the result and not have to convert it to a list, eliminating the 1 second of converting to a list and the 1 second of iterating through it.
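As a rough sketch of that data-layer idea, using the names from the question (it assumes the key list is small enough for a Contains filter, which the provider translates to a SQL IN clause):

// Collect the keys you need, then let the provider translate the filter to SQL.
var keys = myOtherDataList.Select(o => o.key).ToList();

// Only the matching rows cross the network.
var matches = context.MyData.Where(d => keys.Contains(d.key));

foreach (var myDatum in matches)
{
    // process only the rows that matched on the server
}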
I hope this helps answer your question.
What is the best performance-optimized alternative to choose among Dictionary<TKey,TValue>, HashSet<T> and List<T> with respect to:
- Adding values (without duplicates)
- Lookups
- Deleting values
I have to avoid adding duplicate values to the collection. I know HashSet is good, since it skips the add if a duplicate is detected; a dictionary, on the other hand, throws an exception if a duplicate key is found. A List would need an additional exists check on the items before adding a value. But adding values to a HashSet<T> without duplicates seems to take around 1 minute for 10K records. Is there a way to optimize this?
Ok... in terms of theory, all the data structures you mention (HashSet, Dictionary and List) have asymptotically O(1) time complexity for adding items. Hashing data structures are O(1) for delete as well. For lists, it depends a lot on where you are performing the delete operation: if you remove at a random position i, then you have O(n) complexity, because all items from i+1 to the end of the list must be shifted to the left by one position. If you always remove the last element, it is O(1).
But most importantly, data structures based on hashing have a big bonus: O(1) lookup complexity. That is only in theory, though. In practice, if you define a very bad hash code for your type, you can fall back to O(n) complexity. A simple example would be overriding the GetHashCode function and returning a constant int. I suspect that your bad performance comes from a bad GetHashCode design.
Another thing to remember: Dictionary and HashSet are data structures to be used in different scenarios. You could view Dictionary as a kind of array for which the index can be any type, and HashSet as a special list that does not allow duplicates.
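To illustrate the GetHashCode point, here is a small sketch with a hypothetical Record type. Deriving the hash from the key keeps Add/Contains/Remove at O(1); a constant hash would push every item into the same bucket:

using System;
using System.Collections.Generic;

class Record
{
    public int Key;
    public string Name;

    public override bool Equals(object obj) =>
        obj is Record other && other.Key == Key;

    // Derive the hash from the key so items spread across buckets.
    // Returning a constant here (e.g. 1) would degrade every HashSet
    // operation from O(1) to O(n).
    public override int GetHashCode() => Key.GetHashCode();
}

class Demo
{
    static void Main()
    {
        var set = new HashSet<Record>();
        Console.WriteLine(set.Add(new Record { Key = 1, Name = "a" })); // True: added
        Console.WriteLine(set.Add(new Record { Key = 1, Name = "b" })); // False: duplicate skipped
    }
}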
This post covers the performance stats for Dictionary, List and HashSet w.r.t. Add, Lookup and Remove:
http://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/
When it comes to performance and storing unique values, I would prefer a HashSet or a Dictionary, depending on my requirement.
A HashSet is used when you don't have a key/value pair to enter but still don't want duplicates in your collection. So, a HashSet is a collection for storing unique values without a key/value pair.
Whereas, when I have a pair of key and value, I prefer a Dictionary to store unique values by key.
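A small sketch of the duplicate-handling difference described above (assuming the usual System.Collections.Generic using):

var set = new HashSet<string>();
set.Add("a");            // true: added
set.Add("a");            // false: duplicate silently skipped

var dict = new Dictionary<string, int>();
dict.Add("a", 1);
// dict.Add("a", 2);     // would throw ArgumentException: key already exists
dict["a"] = 2;           // the indexer overwrites instead of throwing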
I am writing a method that does some manipulation on a List. During each iteration, the algorithm will either delete, add or move an existing item (so the list is changing, one item at a time). Move operation will be performed using an extension method that is implemented with the following logic:
T item = base[oldIndex];
base.RemoveItem(oldIndex);       // O(n): shifts everything after oldIndex
base.InsertItem(newIndex, item); // O(n): shifts everything after newIndex
Also, during each iteration, one item in the list will be queried for its current index. List.IndexOf(item) is an expensive operation (O(n)), and I wonder if there is a way to improve the performance of this operation. I thought about keeping a dictionary of the indices, but then on every insertion and removal of an item I would need to update the indices of the O(n) items that come after it, so that does not help.
Note: I am aware that List.InsertItem and List.RemoveItem take O(n) anyway, but I am specifically asking about how to improve the performance of the IndexOf operation in such a scenario.
Thanks
I am trying to find out more details about the sorted list so that I can analyze whether it will perform like I need it to in several specific situations.
Does anyone know the specific sorting algorithm that it uses to sort its elements?
SortedList<TKey,TValue> uses an array internally for its keys, and performs the "sort" by inserting items in the correct position on Add. When you call Add with a new item, it does an Array.BinarySearch to find the position and then inserts there, shifting the following elements.
This is why it has these characteristics:
This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n).
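The real implementation is more involved, but a toy sketch of that BinarySearch-then-shift mechanism could look roughly like this (illustrative only, not the actual BCL source):

using System;
using System.Collections.Generic;

class MiniSortedList<TKey, TValue>
{
    private TKey[] keys = new TKey[4];
    private TValue[] values = new TValue[4];
    private int count;
    private readonly IComparer<TKey> comparer = Comparer<TKey>.Default;

    public void Add(TKey key, TValue value)
    {
        // Binary search over the already-sorted keys: O(log n).
        int index = Array.BinarySearch(keys, 0, count, key, comparer);
        if (index >= 0)
            throw new ArgumentException("An entry with the same key already exists.");
        int insertAt = ~index; // bitwise complement = insertion point

        if (count == keys.Length)
        {
            // A resize copies everything, which is why that case is O(n).
            Array.Resize(ref keys, keys.Length * 2);
            Array.Resize(ref values, values.Length * 2);
        }

        // Shifting the tail makes a middle insert O(n); appending at the end shifts nothing.
        Array.Copy(keys, insertAt, keys, insertAt + 1, count - insertAt);
        Array.Copy(values, insertAt, values, insertAt + 1, count - insertAt);
        keys[insertAt] = key;
        values[insertAt] = value;
        count++;
    }
}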
At the moment I am using a custom class derived from HashSet. There's a point in the code where I select items under a certain condition:
var c = clusters.Where(x => x.Label != null && x.Label.Equals(someLabel));
It works fine and I get those elements. But is there a way I could receive the index of each such element within the collection, to use with the ElementAt method, instead of the whole objects?
It would look more or less like this:
var c = select element index in collection under certain condition;
int index = c.ElementAt(0); //get first index
clusters.ElementAt(index).RunObjectMthod();
Is manually iterating over the whole collection a better way? I need to add that it's in a bigger loop, so this Where clause is performed multiple times for different someLabel strings.
Edit
What do I need this for? clusters is a set of clusters over some document collection. Documents are grouped into clusters by topic similarity. One of the last steps of the algorithm is to discover a label for each cluster, but the algorithm is not perfect and sometimes produces two or more clusters with the same label. What I want to do is simply merge those clusters into one big cluster.
Sets don't generally have indexes. If position is important to you, you should be using a List<T> instead of (or possibly as well as) a set.
Now SortedSet<T> in .NET 4 is slightly different, in that it maintains a sorted value order. However, it still doesn't implement IList<T>, so access by index with ElementAt is going to be slow.
If you could give more details about why you want this functionality, it would help. Your use case isn't really clear at the moment.
In the case where you hold elements in a HashSet and sometimes need to get elements by index, consider using the ToList() extension method in such situations. That way you use the features of the HashSet and still take advantage of indexes when you need them.
HashSet<T> hashset = new HashSet<T>();
// the special situation where we need index-based access to elements
List<T> list = hashset.ToList();
// doing our special job, for example mapping the elements to an EF entities collection (that was my case)
// we can still operate on the hashset, for example when we want to keep uniqueness across the elements
There's no such thing as an index with a hash set. One of the ways hash sets gain efficiency in some cases is by not having to maintain one.
I also don't see what the advantage would be here. If you were to obtain the index and then use it, that would be less efficient than just obtaining the element (obtaining the index costs as much as obtaining the element, and then you have an extra operation on top).
If you want to do several operations on the same object, just hold onto that object.
If you want to do something with several objects, do so by iterating through them (a normal foreach, or a foreach over the results of a Where(), etc.). If you want to do something with several objects, then do something else with those same objects, and you have to do it in such batches rather than doing all the operations in the same foreach, then store the results of the Where() in a List<T>.
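For the merge scenario from the question's edit, a minimal sketch of that "store the Where() results in a List<T>" advice might look like this (the Documents member and the merge details are assumptions; only the pattern is the point):

// Snapshot the matching clusters so the set can be modified while merging.
var duplicates = clusters
    .Where(x => x.Label != null && x.Label.Equals(someLabel))
    .ToList();

if (duplicates.Count > 1)
{
    var target = duplicates[0];
    foreach (var cluster in duplicates.Skip(1))
    {
        target.Documents.AddRange(cluster.Documents); // hypothetical member
        clusters.Remove(cluster);
    }
}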
Why not use a dictionary?
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < 10; i++)
{
    dic.Add("value " + i, dic.Count + 1);
}

string find = "value 3";
int position = dic[find];
Console.WriteLine("the position of " + find + " is " + position);