At the moment I am using a custom class derived from HashSet. At one point in the code I select items under a certain condition:
var c = clusters.Where(x => x.Label != null && x.Label.Equals(someLabel));
It works fine and I get those elements. But is there a way to get the index of each matching element within the collection, to use with the ElementAt method, instead of the whole objects?
It would look more or less like this:
var c = select element index in collection under certain condition;
int index = c.ElementAt(0); //get first index
clusters.ElementAt(index).RunObjectMthod();
Is manually iterating over the whole collection a better way? I should add that this is inside a bigger loop, so the Where clause runs multiple times for different someLabel strings.
Edit
What do I need this for? clusters is a set of clusters over a document collection. Documents are grouped into clusters by topic similarity. One of the last steps of the algorithm is to discover a label for each cluster, but the algorithm is not perfect and sometimes produces two or more clusters with the same label. What I want to do is simply merge those clusters into one big cluster.
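For what it's worth, this merge can be done without indexes at all by grouping on the label. In the sketch below, Cluster, Documents, and MergeInto are hypothetical stand-ins for the asker's actual types:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Cluster
{
    public string Label;
    public List<string> Documents = new List<string>();

    // Hypothetical merge: absorb the other cluster's documents.
    public Cluster MergeInto(Cluster other)
    {
        Documents.AddRange(other.Documents);
        return this;
    }
}

public static class ClusterMerge
{
    // Group same-labelled clusters and fold each group into one cluster,
    // with no index bookkeeping at all.
    public static List<Cluster> MergeByLabel(IEnumerable<Cluster> clusters)
        => clusters
            .Where(c => c.Label != null)
            .GroupBy(c => c.Label)
            .Select(g => g.Aggregate((acc, next) => acc.MergeInto(next)))
            .ToList();
}
```

Each distinct label yields exactly one cluster in the result, so duplicate-labelled clusters are merged in a single pass.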
Sets don't generally have indexes. If position is important to you, you should be using a List<T> instead of (or possibly as well as) a set.
Now SortedSet<T> in .NET 4 is slightly different, in that it maintains a sorted value order. However, it still doesn't implement IList<T>, so access by index with ElementAt is going to be slow.
If you could give more details about why you want this functionality, it would help. Your use case isn't really clear at the moment.
In the case where you hold elements in a HashSet and sometimes need to get them by index, consider using the ToList() extension method in such situations. That way you keep the features of HashSet and also take advantage of indexes.
HashSet<T> hashset = new HashSet<T>();

// The special situation where we need index-based access:
List<T> list = hashset.ToList();

// Do our special job, for example mapping the elements to an EF
// entities collection (that was my case). We can still operate on
// the hashset when we want to keep the elements unique.
There's no such thing as an index with a hash set. One of the ways hash sets gain efficiency in some cases is precisely by not having to maintain one.
I also don't see what the advantage is here. If you were to obtain the index and then use it, this would be less efficient than just obtaining the element (obtaining the index costs at least as much, and then you have an extra operation on top).
If you want to do several operations on the same object, just hold onto that object.
If you want to do something with several objects, do so by iterating through them (a normal foreach, or a foreach over the results of a Where(), etc.). If you want to do something with several objects, and then do something else with those same objects, and you have to work in such batches rather than doing all the operations in the same foreach, then store the results of the Where() in a List<T>.
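A small sketch of that last point, with illustrative data (a set of strings stands in for the asker's cluster objects):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDemo
{
    // Run the Where once and keep the results, so several batched
    // passes can reuse the same objects without re-querying the set.
    public static List<string> Materialize(IEnumerable<string> source, string prefix)
        => source.Where(w => w.StartsWith(prefix)).ToList();
}
```

Usage: after `var matches = BatchDemo.Materialize(words, "a");`, each batch is its own foreach over matches, and the Where never runs again.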
Why not use a dictionary?
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < 10; i++)
{
    dic.Add("value " + i, dic.Count + 1); // store the (1-based) position as the value
}

string find = "value 3";
int position = dic[find];
Console.WriteLine("the position of " + find + " is " + position); // prints 4
Related
I am creating a HashSet in C# of type int, filled with randomly generated numbers. I'd like to select one of these numbers from it at random and return it to the user. The problem is, I only see solutions that return the index of the item in the HashSet, not the actual value. I'd like to know how I can access the value of my HashSet items.
Here's what I have at the moment. First I create a list from 10 to 20, which forms part of my list control; the reason for this is to set a limit on how many items a list can hold.
List<int> list_length = Enumerable.Range(10, 20).ToList();
I then iterate over list_length and add a random number to my primary list which is called hash_values
private readonly HashSet<int> hash_values = new HashSet<int>();
foreach (var i in list_length)
{
hash_values.Add(rnd.Next(10, 20));
}
I then access the hash_values list and return a random value from it.
int get_random_number = rnd.Next(hash_values.Count());
return Json(get_random_number);
You'll notice that the random return is actually selecting a number from the list's total count. This is no use. What I'd like to do is pick an item at random and return its value, not its index.
Let's say I have a list with 5 values, the hash_values table will store it like this:
[0] 100
[1] 200
[2] 300
[3] 400
[4] 500
Returning a random number drawn from the list's total count only gives me an index, so I would receive [1] instead of 200. So, how do I return the value instead of the index?
I would suggest using the random index to access the value in the HashSet:
int get_random_value = hash_values.ElementAt(rnd.Next(hash_values.Count()));
EDIT:
the reason for this is to set a limit on how many items a list can hold.
You could also simply store this limit in an int and use a normal for loop:
int maxNumberOfItems = 10;
for (int i = 0; i < maxNumberOfItems; i++)
{
hash_values.Add(rnd.Next(10, 20));
}
EDIT 2 Disclaimer:
The approach of using an index on a HashSet is actually counterproductive: if you want to access a value, the HashSet can do it in O(1), but going through an index with ElementAt does it in O(n), as described in this answer.
HashSet is a set collection; it only implements the ICollection interface and places strong restrictions on individual element access. Compared with List, you cannot use an indexer to access elements, such as list[1]. HashSet stores a single item per entry and does not use a key-value scheme; in other words, in a HashSet the key is the value. If you already know the key, there is no need to query to obtain the value; all you need to do is check whether the value already exists.
If you get the index, then you can always go to that index, fetch the value, and return it. I don't see any harm in doing this.
HashSet is not ordered by nature, but you can always convert your HashSet to a List or an Array by calling ToList or ToArray respectively.
The other solution with ElementAt will probably work the same, but depending on the implementation details it might not be as efficient as e.g. an array (if it has to loop through the IEnumerable).
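A sketch of the ToArray route (the helper name is illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomPick
{
    // Copy the set into an array once (O(n)), then index into it (O(1))
    // instead of paying ElementAt's linear walk on every pick.
    public static int PickRandom(HashSet<int> set, Random rnd)
    {
        int[] snapshot = set.ToArray();
        return snapshot[rnd.Next(snapshot.Length)];
    }
}
```

If you pick repeatedly from an unchanged set, keep the snapshot around instead of rebuilding it on every call.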
I'd like to know if there's a more efficient way to get an ordered list of groups by value from an initially unordered list, than using GroupBy() followed by OrderBy(), like this:
List<int> list = new List<int>();
IEnumerable<IEnumerable<int>> orderedGroups = list.GroupBy(x => x).OrderBy(x => x.Key);
For more detail, I have a large List<T> which I'd like to sort, however there are lots of duplicate values so I want to return the results as IEnumerable<IEnumerable<T>>, much as GroupBy() returns an IEnumerable of groups. If I use OrderBy(), I just get IEnumerable<T>, with no easy way to know whether the value has changed from one item to the next. I could group the list then sort the groups, but the list is large so this ends up being slow. Since OrderBy() returns an OrderedEnumerable which can then be sorted on a secondary field using ThenBy(), it must internally distinguish between adjacent items with the same or different values.
Is there any way I can make use of the fact that OrderedEnumerable<T> must internally group its results by value (in order to facilitate ThenBy()), or otherwise what's the most efficient way to use LINQ to get an ordered list of groups?
You can use ToLookup, which returns an IEnumerable<IGrouping<TKey, TElement>>, and then do OrderBy on the values of each key on demand. Building the lookup is O(n); ordering the elements under one group (the values for a key) is then O(h log h), where h is the number of elements in that group.
You can improve the performance to amortized O(n) by maintaining an IDictionary<TKey, IOrderedEnumerable<T>>. But if you want to order by multiple properties, you again pay the per-group ordering cost. See this answer for more info on IOrderedEnumerable. You can also use SortedList<TKey, TValue> instead of IOrderedEnumerable.
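A sketch of the ToLookup route, with illustrative integer data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // ToLookup builds all groups in a single O(n) pass; ordering then
    // touches only the k distinct keys (O(k log k)) rather than all items.
    public static List<IGrouping<int, int>> OrderedGroups(IEnumerable<int> source)
        => source.ToLookup(x => x).OrderBy(g => g.Key).ToList();
}
```

For example, OrderedGroups(new[] { 3, 1, 2, 3, 1 }) yields groups keyed 1, 2, 3 with counts 2, 1, 2.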
[Update]:
Here is another answer which you can take a look. But again, it involves doing OrderBy on top of the result.
Further, you can come up with your own data structure, as I don't see any data structure available in the BCL meeting this requirement.
One possible implementation:
You can have a binary search tree, which does search/delete/insert in O(log N) on average. An in-order traversal will then give you the sorted keys. Each node in the tree holds an ordered collection of the values for that key.
A node roughly looks like this:
public class MyNode
{
    public string Key { get; set; }
    public SortedSet<string> MyCollection { get; set; }
}
You can traverse over the initial collection once and create this special data structure which can be queried to get fast results.
[Update 2]:
If you have fewer than about 100k possible keys, then I feel implementing your own data structure is overkill. Generally an OrderBy returns pretty fast and the time taken is tiny. Unless you have large data and order by multiple times, ToLookup should work fairly well.
Honestly, you're not going to do much better than
items.GroupBy(i => i.KeyProperty).OrderBy(g => g.Key);
GroupBy is an O(n) operation. The OrderBy is then O(k log k) where k is the number of groups.
If you call OrderBy first... well, firstly, your O(n log n) is now in your number of items rather than your number of groups, so it's already slower than the above.
And secondly, an IOrderedEnumerable doesn't have the internal magic you think it does. It isn't an ordered sequence that contains groups of same-ordered items which can then be reordered with ThenBy; it's an unordered sequence with a list of sort keys which ThenBy adds to, and which is eventually ordered by each of those keys when you iterate over it.
You may be able to eke out a little more speed by rolling your own "group and sort" loop, maybe manually adding to a SortedDictionary<TKey, IList<TItem>>, but I don't think you're going to get a better big O than what out-of-the-box LINQ gets you.
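That hand-rolled loop might look like this sketch (integer data, illustrative only):

```csharp
using System;
using System.Collections.Generic;

public static class GroupSort
{
    // One pass over the items; SortedDictionary keeps its keys ordered,
    // so enumerating at the end yields the groups in key order.
    public static SortedDictionary<int, List<int>> GroupAndSort(IEnumerable<int> items)
    {
        var groups = new SortedDictionary<int, List<int>>();
        foreach (var item in items)
        {
            if (!groups.TryGetValue(item, out var bucket))
                groups[item] = bucket = new List<int>();
            bucket.Add(item);
        }
        return groups;
    }
}
```

Each insert into the SortedDictionary is O(log k) in the number of distinct keys, so the whole pass is O(n log k), matching the GroupBy-then-OrderBy bound.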
I think iterating through the list with a plain for loop while populating a Dictionary<T, int>, where the value is the count of repeated elements, will be faster.
I have two different lists of objects, one of them an IQueryable set (rolled up into an array) and the other a List set. Objects in both sets share a field called ID; each of the objects in the second set will match an object in the first set, but not necessarily vice versa. I need to be able to handle both groups (matched and unmatched). The size of both collections is between 300 and 350 objects in this case (for reference, the XML generated for the objects in the second set is usually no more than 7k, so think maybe half to two-thirds of that size for the actual memory used by each object in each set).
The way I have it currently set up is a for-loop that iterates through an array representation of the IQueryable set, using a LINQ statement to query the List set for the matching record. This takes too much time; I'm running a Core i7 with 10GB of RAM and it's taking anywhere from 10 seconds to 2.5 minutes to match and compare the objects. Task Manager doesn't show any huge memory usage--a shade under 25MB. None of my system threads are being taxed either.
Is there a method or algorithm that would allow me to pair up the objects in each set one time and thus iterate through the pairs and unmatched objects at a faster pace? This set of objects is just a small subset of the 8000+ this program will have to chew through each day once it goes live...
EDIT: Here's the code I'm actually running...
for (int i = 0; i < draftRecords.Count(); i++)
{
    sRecord record = (from r in sRecords
                      where r.id == draftRecords.ToArray()[i].ID
                      select r).FirstOrDefault();
    if (record != null)
    {
        // Do stuff with the draftRecords element based on the rest
        // of the content of the sRecord object
    }
}
You should use a method such as Enumerable.Join or Enumerable.GroupJoin to match items from the two collections. This will be far faster than doing nested for loops.
Since you want to match a collection of keys to an item in the second list which may or may not exist, GroupJoin is likely more appropriate. This would look something like:
var results = firstSet.GroupJoin(secondSet, f => f.Id, s => s.Id, (f, sset) => new { First = f, Seconds = sset });
foreach (var match in results)
{
    Console.WriteLine("Item {0} matches:", match.First);
    foreach (var second in match.Seconds)
        Console.WriteLine("  {0}", second); // each matching second item, one at a time
}
Your question is lacking sample code/information, but I would personally look at methods like Join, Intersect, or Contains. If necessary, use Select to project the fields you want to match, or define a custom IEqualityComparer.
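As an illustration of the dictionary-lookup idea behind these join methods (the record types and the matched/unmatched handling here are hypothetical stand-ins for the asker's classes):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SRecord { public int Id; public string Name; }
public class DraftRecord { public int Id; }

public static class JoinDemo
{
    // Build a dictionary over the second set once (O(n)); each lookup
    // is then O(1) instead of a linear scan per draft record.
    public static (int matched, int unmatched) Pair(
        IEnumerable<DraftRecord> drafts, IEnumerable<SRecord> srecords)
    {
        var byId = srecords.ToDictionary(r => r.Id);
        int matched = 0, unmatched = 0;
        foreach (var draft in drafts)
        {
            if (byId.TryGetValue(draft.Id, out var record))
                matched++;   // do stuff with draft + record here
            else
                unmatched++; // handle the no-match case here
        }
        return (matched, unmatched);
    }
}
```

This turns the original O(n*m) nested scan into O(n + m), which is the same win GroupJoin gives you internally.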
I need to add a bunch of items to a data structure and then later access ALL of the items within it in a random order. How can I do this?
To be more specific, I am currently adding URLs to a List<string> object. They are added in a way such that adjacent URLs are likely to be on the same server. When I access the List using a Parallel.ForEach statement, it just returns the items in the order that I added them. Normally this is okay, but when I am making web requests in parallel, this tends to overwhelm some servers and leads to timeouts. What data structure can I use that will return items in a more random way when I run a Parallel.ForEach statement on the object (i.e., not in the order that I added them)?
ORIGINAL SOLUTION
Fisher–Yates shuffle
public static void Shuffle<T>(this IList<T> list)
{
Random rng = new Random();
int n = list.Count;
while (n > 1) {
n--;
int k = rng.Next(n + 1);
T value = list[k];
list[k] = list[n];
list[n] = value;
}
}
List<Product> products = GetProducts();
products.Shuffle();
I think shuffling is a better answer, but an answer to your specific question would be a Hashtable. You would add your string url as the key and null for value. The Keys property will return the strings in the order of where they happened to be placed in the hash table, which will be fairly random since the strings' hashcodes and collision handling will result in the order not well correlated to the sorted order of the string values themselves.
Dictionary and HashSet won't work the same way. Their internal implementation ends up returning items in the order they were added.
Although this is how Hashtable actually works, you'd be counting on an internal implementation detail, which has its potential perils. That's why I prefer just shuffling.
I have a QueueList table:
{Id, Queue_Instance_ID, Parent_Queue_Instance_ID, Child_Queue_Instance_ID}
What would be the most efficient way to stuff this into a LinkedList<QueueList>? Do you think I can go lower than O(n^2)?
O(n) is better than O(n^2), right? ;)
I assume that the parent and child id are actually the previous and next id in the list.
First put the items in a dictionary so that you can easily look them up by QueueInstanceId; in this loop you also locate the first item. Then you just add the items to the linked list, using the dictionary to fetch each next item. Example in C#:
Dictionary<int, QueueList> lookup = new Dictionary<int, QueueList>();
QueueList first = null;
foreach (QueueList item in source)
{
    lookup.Add(item.QueueInstanceId, item);
    if (item.ParentQueueInstanceId == -1)
    {
        first = item;
    }
}

LinkedList<QueueList> list = new LinkedList<QueueList>();
do
{
    list.AddLast(first);
} while (lookup.TryGetValue(first.ChildQueueInstanceId, out first));
Adding items to a dictionary and getting them by key are O(1) operations, and each loop is the length of the source list, so it's all an O(n) operation.
Thinking about this a little, you can do it in O(n) time, although it takes some extra machinery that adds constant overhead (still O(n)).
You could use some sort of universal hashing scheme so that you can build a good hashtable. Essentially, hash by the instance id.
- Take each queue and wrap it in a doubly linked node of some sort.
- Insert each node into the hashtable.
- When you insert, check whether the parent and/or child are already in the hashtable and link accordingly.
- When you are done, start at the head (which you should be able to identify in the first pass), walk the list and add each node to your LinkedList, or just use the resulting doubly linked list.
This takes one pass that is O(n), and the hashtable takes O(n) to initialize and inserts are O(1).
The problem left for you is choosing a good hashtable implementation that will give you those results.
Also, you have to consider the memory overhead, as I do not know how many results you expect. Hopefully few enough to keep in memory.
I don't know of or think there is an SQL query based solution, but my SQL knowledge is very limited.
Hope this helps!