I need to add a bunch of items to a data structure and then later access ALL of the items within it in a random order. How can I do this?
To be more specific, I am currently adding URLs to a List<string> object. They are added in a way such that adjacent URLs are likely to be on the same server. When I access the List using a Parallel.ForEach statement, it just returns the items in the order that I added them. Normally this is okay, but when I am making web requests in parallel, this tends to overwhelm some servers and leads to timeouts. What data structure can I use that will return items in a more random way when I run a Parallel.ForEach statement on the object (i.e., not in the order that I added them)?
ORIGINAL SOLUTION
Fisher–Yates shuffle
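// Shared instance: creating a new Random on every call can repeat seeds on older .NET versions.
// Random is not thread-safe, so call Shuffle from one thread at a time.
private static readonly Random rng = new Random();

// Extension method: must be declared inside a static class.
public static void Shuffle<T>(this IList<T> list)
{
    int n = list.Count;
    while (n > 1)
    {
        n--;
        int k = rng.Next(n + 1); // pick an index in [0, n]
        T value = list[k];
        list[k] = list[n];
        list[n] = value;
    }
}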
List<Product> products = GetProducts();
products.Shuffle();
I think shuffling is the better answer, but a direct answer to your specific question would be a Hashtable. Add each URL string as the key, with null as the value. The Keys property returns the strings in the order in which they happen to be placed in the hash table, which will be fairly random: the strings' hash codes and the collision handling leave that order poorly correlated with the sorted order of the string values themselves.
Dictionary and HashSet won't work the same way. Their internal implementations end up returning items in the order they were added, at least as long as nothing has been removed.
Although this is how Hashtable actually works, you'd be counting on an internal implementation detail, which has its potential perils. That's why I prefer just shuffling.
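A minimal sketch of the shuffle-then-process approach for the URL case, assuming the original list should stay untouched. The names UrlShuffler and ShuffledCopy are made up for illustration, and the caller is assumed to pass in one shared Random rather than creating a new one per call:

```csharp
using System;
using System.Collections.Generic;

public static class UrlShuffler
{
    // Returns a shuffled copy of the input (Fisher-Yates); the source is not modified.
    public static List<T> ShuffledCopy<T>(IEnumerable<T> source, Random rng)
    {
        var copy = new List<T>(source);
        for (int n = copy.Count - 1; n > 0; n--)
        {
            int k = rng.Next(n + 1); // 0 <= k <= n
            T tmp = copy[k];
            copy[k] = copy[n];
            copy[n] = tmp;
        }
        return copy;
    }
}
```

The copy can then be passed to Parallel.ForEach in place of the original list. Shuffling only makes it less likely that URLs from the same server end up next to each other; it does not guarantee it.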
Related
I am creating a HashSet in C# of type int. The HashSet is filled with randomly generated numbers. I'd like to select one of these numbers from it at random and return it to the user. The problem is, I only see solutions that return the index of the item in the HashSet, not the actual value. I'd like to know how I can access the value of my HashSet items.
Here's what I have at the moment. First I create a list from 10 to 20, which forms part of my list control; the reason for this is to set a limit on how many items a list can hold.
List<int> list_length = Enumerable.Range(10, 20).ToList();
I then iterate over list_length and add a random number to my primary list which is called hash_values
private readonly HashSet<int> hash_values = new HashSet<int>();
foreach (var i in list_length)
{
    hash_values.Add(rnd.Next(10, 20));
}
I then access the hash_values list and return a random value from it.
int get_random_number = rnd.Next(hash_values.Count());
return Json(get_random_number);
You'll notice that this just picks a random number below the list's total count, which is no use. What I'd like to do is pick an item at random and return its value, not its index.
Let's say I have a list with 5 values, the hash_values table will store it like this:
[0] 100
[1] 200
[2] 300
[3] 400
[4] 500
Returning a random number below the list's total count only selects the index, so I would receive [1] instead of 200. So, how do I return the value instead of the index?
I would suggest using the returned index to access the value in the HashSet:
int get_random_value = hash_values.ElementAt(rnd.Next(hash_values.Count()));
EDIT:
the reason for this is to set a limit on how many items a list can hold.
You could also simply store this limit in an int and use a normal for loop:
int maxNumberOfItems = 10;
for (int i = 0; i < maxNumberOfItems; i++)
{
    hash_values.Add(rnd.Next(10, 20));
}
EDIT 2 Disclaimer:
The approach of using an index on a HashSet is actually counterproductive: a HashSet can look up a known value in O(1), but ElementAt has to enumerate the set, so reaching an element by index takes O(n), as described in this answer:
HashSet is a set collection. It implements ICollection<T> but not IList<T>, which puts strong restrictions on individual element access: unlike a List, you cannot use subscripts such as list[1] to access elements. A HashSet stores only one item for each piece of data and does not use the key-value model; in other words, the key in the HashSet is the value. If the key is already known, there is no need to query for the value. All you need to do is check whether the value already exists.
If you get the index, then you can always go to that index, fetch the value, and return it. I don't see any harm in doing this.
HashSet is not ordered by nature, but you can always convert your HashSet to a List or an array by calling ToList or ToArray respectively.
The other solution with ElementAt will probably work the same, but depending on the implementation details it might not be as efficient as, for example, an array (if it loops through the IEnumerable).
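The ToArray suggestion might look like the sketch below; RandomPicker and Pick are hypothetical names. It copies the set once, which is O(n), and then indexes the array directly:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RandomPicker
{
    // Copies the set into an array and returns the element at a random position.
    public static int Pick(HashSet<int> values, Random rnd)
    {
        if (values.Count == 0)
            throw new InvalidOperationException("The set is empty.");
        int[] snapshot = values.ToArray();
        return snapshot[rnd.Next(snapshot.Length)];
    }
}
```

If many picks are needed from a set that does not change in between, keeping the array around and indexing it repeatedly avoids copying the set on every call.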
Noob here. I've been scouring the internet for days, and cannot find a decent structure that auto-sorts data (like SortedSet), while still allowing that data to be accessible (like List). Here's what I have:
A list containing 100,000 nodes, added and modified regularly.
List<Node> nodes;
The node object, containing data I need to access/change
public class Node { public Node(string name, int index) { } public void doSomething() { } }
I don't wish to be vague, but I can't sort the actual list because the index is a history of when nodes were added. Thus, I want to use a structure that auto-sorts KeyValuePair<string, int> pairs (where string is the name to sort by, and int is the index of the node in my list), but I must be able to access the value. Here's what I want to do:
// Add a node to the list, then to the structure
int index = nodes.Count;
nodes.Add(new Node("someName", index));
someStructure.Add("someName", index);
// Give name to structure, which returns int value for use in finding node
nodes[someStructure.findValueOf("someName")].doSomething();
This would tell the node with the name "someName" to doSomething();
I am positive that I am missing something. I've tried using SortedSet, SortedList, Dictionary, etc. In each case, I can't retrieve the sorted object. What is the purpose of auto-sorting if I can't find out where they are at? Please help my poor life.
You are looking for a SortedDictionary.
As per the documentation: Represents a collection of key/value pairs that are sorted on the key. Although, as some comments say, those 100k objects would be better kept in a database...
Link: https://msdn.microsoft.com/en-us/library/f7fta44c(v=vs.110).aspx
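As a rough sketch of how a SortedDictionary could serve as the name-to-index lookup described in the question (NameIndex and Build are hypothetical names, and the names are assumed to be unique):

```csharp
using System;
using System.Collections.Generic;

public static class NameIndex
{
    // Builds a name -> list-position map that is kept sorted by name.
    public static SortedDictionary<string, int> Build(IList<string> names)
    {
        var index = new SortedDictionary<string, int>(StringComparer.Ordinal);
        for (int i = 0; i < names.Count; i++)
        {
            index[names[i]] = i; // assumes unique names; a duplicate overwrites the earlier position
        }
        return index;
    }
}
```

With this in place, index["someName"] returns the position of that node in the list, and enumerating index.Keys yields the names in sorted order. TryGetValue is the safer call when a name may be missing.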
You can use SortedList and LINQ:
SortedList<int,string> list = new SortedList<int, string>();
list.Add(1, "name1");
list.Add(2, "name2");
// Linear search by value; c is a KeyValuePair<int, string>, so c.Key is 2 here.
var c = list.FirstOrDefault(x => x.Value == "name2");
However, I agree with Christopher's comment about using a database.
Is it possible to retrieve the index of a value in a HashSet?
I have a HashSet:
HashSet<int> allE = mesh.GetAllNGonEdges(nGonTV);
And I would like to retrieve the index of a value, similar to this array function:
Array.IndexOf(...)
The "index" is meaningless in a HashSet - it's not guaranteed to be the same as the insertion order, and it can change over time as you add and remove entries (in non-guaranteed ways, e.g. if you add a new entry it could end up in the middle, at the end, at the start; it could reorder everything else...) There's not even a guarantee that you'll see the same order if you iterate over the set multiple times without modifying it between times, although I'd expect that to be okay.
You can get the current index with something like:
var valueAndIndex = hashSet.Select((Value, Index) => new { Value, Index })
.ToList();
... but you need to be very aware that the index isn't inherent in the entry, and is basically unstable.
While reviewing the code of an application, I found that it assumes the order of Dictionary.Values is the same as the order in which elements were added to the collection.
I wrote an application to test whether this is true:
using System;
using System.Collections.Generic;
namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, int> values = new Dictionary<string, int>();
            values.Add("apple2", 2);
            values.Add("apple3", 3);
            values.Add("apple4", 4);
            values.Add("apple5", 5);
            values.Add("apple6", 6);
            values.Add("apple1", 1);
            var list = new List<int>(values.Values);
            for (int i = 0; i < list.Count; i++)
            {
                Console.WriteLine(list[i]);
            }
        }
    }
}
And output is:
2
3
4
5
6
1
First of all I wonder how is that possible. Isn't Dictionary supposed to use unordered Tree or something like that?
Moreover MSDN states that:
The order of the values in the Dictionary<TKey, TValue>.ValueCollection is unspecified, but it is the same order as the associated keys in the Dictionary<TKey, TValue>.KeyCollection returned by the Keys property.
So why does MSDN say that the "order is unspecified" when the implementation happens to keep the order? Am I correct that I'm better off not relying on that fact?
Am I correct that I'm better not to rely on that fact?
Absolutely. Just because sometimes it keeps the order doesn't mean that either it will in future implementations, or indeed that it will do in all cases right now.
When the internal data structure is resized or if items are deleted, the order can change.
For example, if you add this code before you construct the list:
values.Remove("apple4");
values.Add("jon", 10);
on my box, I see the value 10 come where 4 was before... even though it was added after the entries for 5, 6 and 1.
You should definitely, definitely not rely on the ordering.
You are getting the values in insertion order by chance, possibly because of how your data happens to be added. Add the elements in a mixed order, then remove and add some elements, and the order will change. You cannot rely on the order.
If you want that behaviour then you should use an explicit ordered dictionary.
http://msdn.microsoft.com/en-us/library/system.collections.specialized.ordereddictionary.aspx
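A small sketch of the difference, reusing the remove-then-add scenario from the other answer with the non-generic OrderedDictionary from System.Collections.Specialized. OrderedDemo and ValuesAfterEdit are hypothetical names used only for this illustration:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;

public static class OrderedDemo
{
    // Returns the values in insertion order after one removal and one addition.
    public static List<int> ValuesAfterEdit()
    {
        var values = new OrderedDictionary();
        values.Add("apple2", 2);
        values.Add("apple3", 3);
        values.Add("apple4", 4);
        values.Remove("apple4");
        values.Add("jon", 10); // appended at the end, not placed in the freed slot

        var result = new List<int>();
        foreach (DictionaryEntry entry in values)
        {
            result.Add((int)entry.Value); // values are stored as object, so a cast is needed
        }
        return result;
    }
}
```

Here the returned list is 2, 3, 10, because entries are enumerated in the order they were added. The trade-off is that OrderedDictionary is not generic, so keys and values are boxed and need casts.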
At the moment I am using a custom class derived from HashSet. There's a point in the code when I select items under certain condition:
var c = clusters.Where(x => x.Label != null && x.Label.Equals(someLabel));
It works fine and I get those elements. But is there a way that I could receive an index of that element within the collection to use with ElementAt method, instead of whole objects?
It would look more or less like this:
var c = select element index in collection under certain condition;
int index = c.ElementAt(0); //get first index
clusters.ElementAt(index).RunObjectMethod();
Is manually iterating over the whole collection a better way? I need to add that it's in a bigger loop, so this Where clause is performed multiple times for different someLabel strings.
Edit
What do I need this for? clusters is a set of clusters from some document collection. Documents are grouped into clusters by topic similarity, so one of the last steps of the algorithm is to discover a label for each cluster. But the algorithm is not perfect, and sometimes it produces two or more clusters with the same label. What I want to do is simply merge those clusters into one big cluster.
Sets don't generally have indexes. If position is important to you, you should be using a List<T> instead of (or possibly as well as) a set.
Now SortedSet<T> in .NET 4 is slightly different, in that it maintains a sorted value order. However, it still doesn't implement IList<T>, so access by index with ElementAt is going to be slow.
If you could give more details about why you want this functionality, it would help. Your use case isn't really clear at the moment.
In the case where you hold elements in HashSet and sometimes you need to get elements by index, consider using extension method ToList() in such situations. So you use features of HashSet and then you take advantage of indexes.
HashSet<T> hashset = new HashSet<T>();
//the special situation where we need index way of getting elements
List<T> list = hashset.ToList();
//doing our special job, for example mapping the elements to EF entities collection (that was my case)
//we can still operate on hashset for example when we still want to keep uniqueness through the elements
There's no such thing as an index with a hash set. One of the ways that hash sets gain efficiency in some cases is by not having to maintain them.
I also don't see what the advantage is here. If you were to obtain the index, and then use it this would be less efficient than just obtaining the element (obtaining the index would be equally efficient, and then you've an extra operation).
If you want to do several operations on the same object, just hold onto that object.
If you want to do something on several objects, do so on the basis of iterating through them (normal foreach or doing foreach on the results of a Where() etc.). If you want to do something on several objects, and then do something else on those several same objects, and you have to do it in such batches, rather than doing all the operations in the same foreach then store the results of the Where() in a List<T>.
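For the merge use case from the question, storing the Where() results in a List<T> might look like the sketch below. The Cluster class with its Label and Documents members, and the ClusterMerger helper, are assumptions made for illustration; the asker's real types are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's cluster type.
public class Cluster
{
    public string Label { get; set; }
    public List<string> Documents { get; } = new List<string>();
}

public static class ClusterMerger
{
    // Merges every cluster carrying the given label into the first match and
    // removes the others from the set. Returns the surviving cluster, or null.
    public static Cluster MergeByLabel(HashSet<Cluster> clusters, string label)
    {
        // ToList() materializes the query so the set can be modified afterwards.
        List<Cluster> matches = clusters
            .Where(c => c.Label != null && c.Label.Equals(label))
            .ToList();
        if (matches.Count == 0) return null;

        Cluster target = matches[0];
        foreach (Cluster other in matches.Skip(1))
        {
            target.Documents.AddRange(other.Documents);
            clusters.Remove(other);
        }
        return target;
    }
}
```

No index is needed at any point: the objects themselves are held in the list. Which matching cluster counts as "first" depends on the set's enumeration order, so the sketch assumes it does not matter which one survives.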
Why not use a dictionary?
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < 10; i++)
{
    dic.Add("value " + i, dic.Count + 1);
}
string find = "value 3";
int position = dic[find];
Console.WriteLine("the position of " + find + " is " + position);